Web Information Retrieval
Summer Term 2015
Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.
(Samuel Johnson)
Information Retrieval (IR) deals with the storage, representation and management of information items. In a classical setting, the information items correspond to text documents. With the advent of the World Wide Web, the methods of IR have been transferred to retrieval on the web. This poses different challenges and has spawned the area of Web Retrieval.
The lecture gives an introduction to established retrieval models for text-based documents, models that exploit the graph structure of the WWW, the evaluation of retrieval system performance, and related tasks such as classification and clustering of web documents.
News
- The written exam takes place on August 4th, 2015, in lecture room E-011 from 10.00 (s.t.) to 12.00. Prior registration in KLIPS is required to participate in the exam.
- No lecture or tutorial on Monday, April 27th.
- Lecture and tutorial start on April 13th.
Organisational information
KLIPS (Lecture): Link
KLIPS (Tutorial): Link
Lecture - Web Information Retrieval
Course number: 04179
Lecturer |
|
Date(s) |
|
Tutorial - Web Information Retrieval
Lecturer |
|
Date(s) |
|
Lecture Material
Slides and additional material will be provided along with the progress of the lecture.
Lecture
- Introduction (PDF) (Powerpoint)
- Information Seeking (PDF) (Powerpoint)
- Evaluation - Cranfield (PDF) (Powerpoint)
- Evaluation - Set-based Metrics (PDF) (Powerpoint)
- Evaluation - Ranking-aware Metrics (PDF) (Powerpoint)
- Evaluation - Significance (PDF) (Powerpoint)
- Pre-processing - Document Access (PDF) (Powerpoint)
- Pre-processing - Tokenization (PDF) (Powerpoint)
- Pre-processing - Filtering (PDF) (Powerpoint)
- Pre-processing - Static Quality Measures (PDF) (Powerpoint)
- Boolean Retrieval - Model (PDF) (Powerpoint)
- Boolean Retrieval - Inverted Index (PDF) (Powerpoint)
- Boolean Retrieval - More Complex Queries (PDF) (Powerpoint)
- Fast String Search (PDF) (Powerpoint)
- VSM - Model (PDF) (Powerpoint)
- VSM - Implementation (PDF) (Powerpoint)
- VSM - Relevance Feedback (PDF) (Powerpoint)
- Probabilistic Retrieval - Probability Ranking Principle (PDF) (Powerpoint)
- Probabilistic Retrieval - BIM (PDF) (Powerpoint)
- Probabilistic Retrieval - Relevance Feedback (PDF) (Powerpoint)
- Probabilistic Retrieval - BM25 (PDF) (Powerpoint)
- Crawler - Web as Graph (PDF) (Powerpoint)
- Authority Ranking - PageRank (PDF) (Powerpoint)
- Authority Ranking - HITS (PDF) (Powerpoint)
- Learning to Rank (PDF) (Powerpoint)
- Language Models for IR (PDF) (Powerpoint)
Tutorials
- Information Seeking (PDF)
- Evaluation (PDF)
- Evaluation - Metrics (PDF)
- Evaluation - Metrics (part 2) / Pre-processing (PDF)
- Pre-processing (PDF)
- Vector Space Model (PDF) - MANDATORY ASSIGNMENT
- Wiki Retrieval (PDF) - MANDATORY ASSIGNMENT
Additional Material
- Collection of all sentences on Wikipedia (and samples thereof): http://glm.rene-pickhardt.de/data/
- Stanford NLP software: http://nlp.stanford.edu/software/
- Stemmer Library: Snowball
- Exam of last year (PDF)
- Lucene minimal demo (Java source code), running with Lucene 5.2.1 (core) and using the following jars: lucene-core-5.2.1.jar, lucene-analyzers-common-5.2.1.jar and lucene-queryparsers-5.2.1.jar
The lecture material for Web Information Retrieval by Thomas Gottron is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License