Finding Good URLs Evaluation Dataset

Queries, corpus information and relevance judgements for evaluating the task of mapping entities in a knowledge base to public web documents. The dataset provides graded relevance judgements in a format that can be processed directly by the trec_eval tool. Two additional files map document IDs to URLs and topic IDs to human-readable queries.
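As a rough illustration of how the judgements could be read programmatically, the sketch below parses lines in the standard trec_eval qrels layout (topic ID, iteration, document ID, relevance grade, separated by whitespace). The function name parse_qrels and the sample lines are hypothetical and not taken from the dataset; the actual file names and the layout of the two mapping files are not specified here, so they are left out.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse trec_eval-style qrels lines into {topic_id: {doc_id: grade}}.

    Assumes four whitespace-separated columns per line:
    topic ID, iteration (ignored), document ID, relevance grade.
    """
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic_id, _iteration, doc_id, relevance = parts
        qrels[topic_id][doc_id] = int(relevance)
    return dict(qrels)


# Made-up sample lines for illustration only; not real dataset content.
sample = [
    "1 0 doc12 2",
    "1 0 doc47 0",
    "2 0 doc03 1",
]
judgements = parse_qrels(sample)
```

To read a real file, the same function can be passed an open file handle instead of the sample list, since it only iterates over lines.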

The dataset was introduced in a paper at the ISWC workshop on Web of Linked Entities 2012.


The URL collection, relevance judgements and query selection by Christian Hachenberg and Thomas Gottron are licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

If you use the files in scientific work that leads to a publication, please cite the related publication listed below.


Compressed in ZIP format:

Related Publications

C. Hachenberg and T. Gottron, “Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations” in WoLE’12: Proceedings of the ISWC workshop on Web of Linked Entities, Nov. 2012.