Distributed SPARQL

The Semantic Web is an inherently distributed system, lacking any central schema or control. RDF is the standard language for representing knowledge in the Semantic Web. Recently, the Semantic Web query language SPARQL has reached the status of a W3C candidate recommendation. SPARQL lets a query specify the dataset it is evaluated against in terms of RDF graphs.

In a distributed setting, we can easily imagine scenarios in which such a query ranges over data stored at various SPARQL endpoints. For example, consider the query shown below. It selects co-authors of Tim Berners-Lee and, if available, their dates of birth. The co-authors are determined using an RDF version of DBLP, and the dates of birth are selected from an RDF version of Wikipedia.

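PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
# The dbpedia: namespace below is an assumption; the original listing omits all prefix declarations.
PREFIX dbpedia: <http://dbpedia.org/property/>

SELECT ?coauthor ?birthdate
FROM NAMED <http://www4.wiwiss.fu-berlin.de/dblp/>
FROM NAMED <http://www.dbpedia.org>
WHERE  {
  # Co-authors: everyone who shares a paper with the given DBLP person.
  GRAPH <http://www4.wiwiss.fu-berlin.de/dblp/> {
    ?paper dc:creator <http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007>.
    ?paper dc:creator ?coauthor.
    ?coauthor foaf:name ?name. }
  # Dates of birth are joined on ?name; OPTIONAL keeps co-authors without one.
  OPTIONAL {
    GRAPH <http://www.dbpedia.org> {
      ?person foaf:name ?name.
      ?person dbpedia:birth ?birthdate. } }
}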

Current approaches require the application either to issue two queries and combine the results itself, or to integrate the data sources into a single repository, which then evaluates the query. In the scenario introduced above, the latter would probably be impossible because of the amount of data involved.
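To make the first of these approaches concrete, the following is a minimal Python sketch of the work an application would have to do itself: run one query per endpoint, then combine the two result sets on the shared ?name variable. The function name left_join_on and all sample rows are hypothetical placeholders rather than real DBLP or DBpedia data, and the network calls to the endpoints are deliberately left out.

```python
from typing import Dict, List

Binding = Dict[str, str]  # one result row: variable name -> value


def left_join_on(left: List[Binding], right: List[Binding], key: str) -> List[Binding]:
    """Left outer join of two result sets on a shared variable.

    Rows from `left` without a partner in `right` are kept unchanged,
    which mirrors the "if available" part of the example query.
    """
    # Index the right-hand rows by the join variable for fast lookup.
    index: Dict[str, List[Binding]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined: List[Binding] = []
    for row in left:
        matches = index.get(row[key])
        if not matches:
            joined.append(dict(row))  # no birth date known: keep the co-author
            continue
        for match in matches:
            joined.append({**row, **match})
    return joined


# Placeholder rows standing in for the results of the two separate queries.
coauthors = [
    {"coauthor": "ex:person/1", "name": "Author A"},
    {"coauthor": "ex:person/2", "name": "Author B"},
]
birthdates = [
    {"person": "ex:resource/A", "name": "Author A", "birthdate": "1970-01-01"},
]

for row in left_join_on(coauthors, birthdates, "name"):
    print(row["coauthor"], row.get("birthdate"))
```

Every application that spans more than one endpoint has to repeat this kind of glue code, which is the burden a distributed evaluation layer is meant to remove.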

We work on algorithms and protocol extensions for the distributed evaluation of SPARQL queries. Using these extensions, applications can transparently query the whole Semantic Web without caring where the data is actually stored. This involves query rewriting, distributed query evaluation, and distributed indexing of enormous datasets. An important design criterion is that endpoints need not cooperate in building this infrastructure. Instead, every standards-conformant SPARQL endpoint can be used (although possibly with less precise index information).
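As a rough illustration of how index information could guide query rewriting, the Python sketch below assigns each triple pattern to the endpoints that might answer it. It is not the project's actual algorithm: the predicate-based index, the function name select_sources, the endpoint http://example.org/sparql, and the predicate sets are all assumptions made for this example. Endpoints without index information are always kept as candidates, which reflects the trade-off of less precise index information for non-cooperating endpoints.

```python
from typing import Dict, List, Set, Tuple

TriplePattern = Tuple[str, str, str]  # (subject, predicate, object)


def select_sources(
    patterns: List[TriplePattern],
    index: Dict[str, Set[str]],
    unindexed: List[str],
) -> Dict[TriplePattern, List[str]]:
    """Return, for each triple pattern, the endpoints that may hold matches.

    `index` maps an endpoint URL to the predicates known to occur there.
    `unindexed` lists endpoints for which nothing is known, so they can
    never be ruled out.
    """
    plan: Dict[TriplePattern, List[str]] = {}
    for pattern in patterns:
        predicate = pattern[1]
        candidates = set(unindexed)
        for endpoint, predicates in index.items():
            # A variable predicate can match anywhere; a constant one
            # only where the index has seen it.
            if predicate.startswith("?") or predicate in predicates:
                candidates.add(endpoint)
        plan[pattern] = sorted(candidates)
    return plan


# Assumed index contents, loosely based on the example query above.
index = {
    "http://www4.wiwiss.fu-berlin.de/dblp/": {"dc:creator", "foaf:name"},
    "http://www.dbpedia.org": {"foaf:name", "dbpedia:birth"},
}
unindexed = ["http://example.org/sparql"]  # hypothetical endpoint without index data

patterns = [
    ("?paper", "dc:creator", "?coauthor"),
    ("?person", "dbpedia:birth", "?birthdate"),
]

for pattern, endpoints in select_sources(patterns, index, unindexed).items():
    print(pattern, "->", endpoints)
```

A coarser index never produces wrong answers in this scheme. It only causes more endpoints to be contacted than strictly necessary.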

An implementation based on Sesame2 is available. The source code, bug tracker, questions and answers, and more are available on Launchpad.