Web Information Retrieval refers to methods and technologies for the search, analysis, and automatic organization of data collections on the World Wide Web: text documents, multimedia content, and structured and semi-structured knowledge representations. It has quickly become one of the most important areas in Computer and Information Sciences because of its direct applications in e-commerce, e-CRM, corporate knowledge bases and data repositories, Web analytics, and Web information systems.
The course will introduce mathematical models and algorithms widely used by Web search engines, intranets, and modern digital libraries. In doing so, we will consider state-of-the-art techniques from linear algebra, statistics, graph mining, and machine learning. The course will also provide a brief overview of other areas in Web mining, such as Web content mining and Web structure mining.
This course, 04IN1021 (Web Information Retrieval), is part of the international Summer Academy (Track "Web Science - Engineering") and will be held as a block course during the last weeks of the summer term 2013.
Students from Uni Koblenz may attend this lecture (and earn ECTS credits) for the module "Information Retrieval" in their curriculum.
- Technical basics (linear algebra, stochastics, graph algorithms, text processing)
- Content analysis - vector space models
- Link analysis and authority ranking
- Top-k retrieval
- Multi-modal analysis of Web data
- Thematically focused crawling
- Search engine optimization
Basic knowledge of linear algebra, stochastics, and graph algorithms is required. Recommended prerequisites are courses in data mining and database systems.
External students can earn credits per course (3 ECTS for 04IN1021 Web Information Retrieval).
Internal diploma students earn credits per course OR have it as part of the oral diploma exam worth 3 ECTS / 2 SWS.
Internal master students in CV and Computer Science earn credit for the MODULE "Information Retrieval". The module includes the lecture "Information Retrieval" and the seminar "Advances in Information Retrieval", together 6 ECTS = 4 SWS. The seminar can be attended in the summer term 2013 as well, in parallel to the course.
Schedule (2013 course in English)
Academy WEEK 1 (2013 KW 26):
- Mon 24 Jun 2013 12:15-13:45 in A-308: Introduction
- Tue 25 Jun 2013 12:15-13:45 in A-308: Technical Basics
- Wed 26 Jun 2013 12:15-13:45 in A-308: Text Retrieval (1)
- Thu 27 Jun 2013 12:15-13:45 in A-308: Text Retrieval (2)
- Fri 28 Jun 2013 12:15-13:45 in A-308: Text Retrieval - Exercises
Academy WEEK 2 (2013 KW 27):
- Wed 03 Jul 2013 12:15-13:45 in A-308: Technical Basics Part 2
Academy WEEK 3 (2013 KW 28):
- Wed 10 Jul 2013 12:15-13:45 in A-308: Authority Ranking (1)
- Thu 11 Jul 2013 12:15-13:45 in A-308: Authority Ranking (2)
- Fri 12 Jul 2013 12:15-13:45 in A-308: Authority Ranking - Exercises
Academy WEEK 4 (2013 KW 29):
- Wed 17 Jul 2013 12:15-13:45 in A-308: Classification and Clustering
- Thu 18 Jul 2013 12:15-13:45 in A-308: Classification and Clustering - Exercises
- Fri 19 Jul 2013 12:15-13:45 in A-308: SEO, Web Spam & Advertising
Individual oral exam (20 min) at the end of the course. Key topics of the oral exam 'Web Information Retrieval' include: basic methods of text mining and text retrieval; methods of link analysis and authority ranking; basics of search engine design and optimization. Appointments for the oral exam will be announced during the course.
External students: in order to attend the course, you have to apply for Summer Academy participation.
Internal students: the course is listed in KLIPS as a regular module/lecture option and can be booked by regular students in the usual way.
Dr. Dr. Sergej Sizov
(old slides from 2012 can be found here)
- Exercise 1: Technical Basics (PDF slides)
- Exercise 2: Technical Basics 2 (PDF slides)
- Exercise 3: Text Retrieval, Authority Ranking (PDF slides)
- Exercise 4: Spam, Classification & Clustering (PDF slides)
- Chapter 1: Introduction (PDF slides, PDF handout)
- Chapter 2: Technical Basics (PDF slides, PDF handout)
- Chapter 3: Text Retrieval (PDF slides, PDF handout)
- Chapter 4: Authority Ranking (PDF slides, PDF handout)
- Chapter 5: Classification and Clustering (PDF slides, PDF handout)
- Chapter 6: SEO, Web Spam, Advertising (PDF slides, PDF handout)
Ricardo Baeza-Yates, Berthier Ribeiro-Neto:
Modern Information Retrieval: The Concepts and Technology behind Search
Addison-Wesley Professional, 2011
Christopher D. Manning, Hinrich Schütze:
Foundations of Statistical Natural Language Processing
MIT Press, 1999