Focused Crawling of Linked Open Data: A Probabilistic Model Making Use of Path Information
The Linked Data cloud has grown tremendously over the last few years and continues to grow. Many scenarios for consuming Linked Data require complex and computationally intensive operations on focused subsets of the Linked Open Data (LOD) cloud. In this paper we address focused crawling of Linked Data as a way to efficiently construct Linked Data subsets that match a configurable relevance criterion. We motivate the need for focused crawling of Linked Data, formalise the corresponding task, and describe an adaptive, path-based probabilistic ranking model for guiding a focused crawler. We establish an evaluation setup and empirically compare our proposed model to unfocused baseline approaches and simple heuristics. The evaluation is performed on three different datasets using three different relevance criteria. Our results demonstrate that focused crawling on Linked Data is feasible and show that adaptive approaches, which learn how to find relevant resources while crawling the Linked Data graph, perform well throughout our experiments.
09.04.15 - 10:15