Web Information Retrieval refers to methods and technologies for the search, analysis, and automatic organization of data collections on the World Wide Web: text documents, multimedia content, and structured and semi-structured knowledge representations. It has quickly become one of the most important areas in Computer and Information Sciences because of its direct applications in e-commerce, e-CRM, corporate knowledge bases and data repositories, Web analytics, and Web information systems.
The course will introduce mathematical models and algorithms widely used by Web search engines, intranets, and modern digital libraries. In doing so, we will consider state-of-the-art techniques from linear algebra, statistics, graph mining, and machine learning. The course will also provide a brief overview of other areas in Web mining, such as Web content mining and Web structure mining.
This course is part of the international Summer Academy (Track "Web Science - Engineering") and will be organized in block form over the last weeks of the summer term 2012. Students from Uni Koblenz may attend this lecture (and earn ECTS credits) as a common course in their curriculum.
The course "Web Information Retrieval" will be held in parallel in two languages: English and Russian.
- Technical basics (linear algebra, stochastics, graph algorithms, text processing)
- Content analysis - vector space models
- Link analysis and authority ranking
- Top-k retrieval
- Multi-modal analysis of Web data
- Thematically focused crawling
- Search engine optimization
Basic knowledge of linear algebra, stochastics, and graph algorithms. Recommended prerequisites are courses in data mining and database systems.
External students can earn credits per course (3 ECTS for Web Information Retrieval).
Internal diploma students either earn credits per course or take it as part of the oral diploma exam (3 ECTS / 2 SWS).
Internal master students in CV and Computer Science earn credit for the module INSS08 "Web Search & Data Mining" (offered by Prof. Dr. York Sure and Dr. Dr. Sergej Sizov), i.e. one joint module for two lectures with integrated exercises, together 6 ECTS = 4 SWS.
Schedule (for course in English)
Tue 26 June 2012 18:00-20:00 in D-239
Introduction and Motivation
Thu 28 June 2012 12:00-14:00 in D-239
Technical Basics (lecture / classes)
Fri 29 June 2012 16:00-20:00 in D-239
Text Mining and Retrieval (lecture / classes)
Mon 02 July 2012 16:00-20:00 in D-239
Link Analysis and Authority Ranking (lecture)
Fri 06 July 2012 16:00-20:00 in D-239
Link Analysis and Authority Ranking (classes)
Tue 10 July 2012 08:00-12:00 in D-239
Search Engine Optimization (lecture / classes)
Wed 11 July 2012 12:00-14:00 in D-239
Semantic Technologies in Web Retrieval
Thu 12 July 2012 08:00-12:00 in D-239
Wrap-up discussion, exam preparation
Individual oral exam (30 min) at the end of the course. Key topics of the oral exam 'Web Information Retrieval' include: basic methods of text mining and text retrieval; methods of link analysis and authority ranking; basics of search engine design and optimization.
External students: in order to attend the course, you have to apply for Summer Academy participation.
Internal students: the course is maintained in KLIPS (English part under course ID 04179 and Russian part as a special course) as a common lecture and can be booked by regular students in the usual way.
Dr. Dr. Sergej Sizov
Ricardo Baeza-Yates, Berthier Ribeiro-Neto:
Modern Information Retrieval: The Concepts and Technology behind Search
Addison-Wesley Professional, 2011
Christopher D. Manning, Hinrich Schütze:
Foundations of Statistical Natural Language Processing
MIT Press, 1999