Strategies for Efficiently Keeping Local Linked Open Data Caches Up-To-Date
Quite often, Linked Open Data (LOD) applications pre-fetch data from the Web and store local copies of it in a cache for faster access at runtime. Yet, recent investigations have shown that data published and interlinked on the LOD cloud is subject to frequent changes. As the data in the cloud changes, the local copies of the data need to be updated as well. However, LOD applications must deal with limitations of the available computational resources when keeping their local data up-to-date. These limitations imply the need to prioritise which data sources should be considered first when retrieving data and synchronising the local copy with the original. In order to make the best use of the available resources, it is vital to choose a good scheduling strategy that determines when to update which data source. In this talk we present an evaluation of different strategies on a large-scale LOD dataset obtained from the LOD cloud by weekly crawls over the course of three years. We investigate two different setups: (i) in the single-step setup we evaluate the quality of update strategies for a single, isolated update of a local data cache, while (ii) the iterative progression setup measures the quality of the local data cache under iterative updates over a longer period of time. Our evaluation indicates the effectiveness of each strategy for updating local copies of LOD sources, i.e., we demonstrate, for given bandwidth limitations, each strategy's performance in terms of data accuracy and freshness. The evaluation shows that measures capturing the change behavior or dynamics of the data sources are the most suitable ones for scheduling updates.
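The core idea of a dynamics-based strategy can be illustrated with a small sketch: rank the cached sources by an observed change measure and refresh the most volatile ones first, until a per-iteration bandwidth budget is exhausted. This is a hypothetical illustration, not the implementation evaluated in the talk; the `Source` fields, the greedy selection, and all names are assumptions for demonstration purposes.

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    size: int           # assumed fetch cost (e.g. number of triples)
    change_rate: float  # observed changes per crawl interval (dynamics measure)

def schedule_updates(sources, bandwidth):
    """Greedy dynamics-based scheduling sketch: refresh the most
    volatile sources first within a fixed bandwidth budget."""
    ranked = sorted(sources, key=lambda s: s.change_rate, reverse=True)
    chosen, used = [], 0
    for s in ranked:
        if used + s.size <= bandwidth:  # skip sources that no longer fit
            chosen.append(s.name)
            used += s.size
    return chosen

# Example: with a budget of 9, the two most dynamic sources that fit are chosen.
sources = [Source("dbpedia", 5, 0.9), Source("geonames", 3, 0.5), Source("foaf", 4, 0.8)]
print(schedule_updates(sources, 9))  # → ['dbpedia', 'foaf']
```

A strategy that instead ranks by age since last visit, or by PageRank-style importance, only needs a different sort key; the evaluation in the talk compares such alternatives against change-based measures.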
25.06.15 - 10:15