This dataset contains queries, corpus information and relevance judgements for evaluating the task of mapping entities in a knowledge base to public web documents. The graded relevance judgements are provided in a format that the trec_eval tool can process directly. Two additional files map document IDs to URLs and topic IDs to human-readable queries.
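For readers who want to inspect the judgements outside of trec_eval, the following is a minimal Python sketch of how such a file could be read. It assumes the usual trec_eval qrels layout of four whitespace-separated columns per line (topic ID, iteration, document ID, relevance grade). The function name parse_qrels and the sample lines are hypothetical and are not taken from the dataset; the actual file names and ID formats in the archive may differ.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse trec_eval-style qrels lines into {topic_id: {doc_id: grade}}.

    Assumes four whitespace-separated columns per line:
    topic ID, iteration (ignored), document ID, relevance grade.
    """
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic_id, _iteration, doc_id, relevance = parts
        qrels[topic_id][doc_id] = int(relevance)
    return dict(qrels)


# Made-up sample lines for illustration only, not real dataset content.
sample = [
    "1 0 doc-17 2",
    "1 0 doc-42 0",
    "2 0 doc-17 1",
]
judgements = parse_qrels(sample)
```

To read a real file, the same function can be applied to an open file handle, since iterating over a file yields its lines. The document-ID and topic-ID mapping files could then be joined against the resulting dictionary once their layout has been checked.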
The dataset was introduced in a paper at the ISWC workshop on Web of Linked Entities 2012.
The URL collection, relevance judgements and query selection by Christian Hachenberg and Thomas Gottron are licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
If you use the files in scientific work that leads to a publication, please cite the related publication provided below.
Compressed in ZIP format: wole-2012-dataset.zip
C. Hachenberg and T. Gottron, “Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations” in WoLE’12: Proceedings of the ISWC workshop on Web of Linked Entities, Nov. 2012.