Impact Analysis of Data Placement Strategies on Query Efforts in Distributed RDF Stores
In recent years, scalable RDF stores in the cloud have been developed in which graph data is distributed over compute and storage nodes so that query processing effort and memory requirements can scale. One main challenge in these RDF stores is the data placement strategy, which can be formalized in terms of graph covers. These graph covers determine whether (a) the triples are distributed evenly over all storage nodes (storage balance), (b) different query results can be computed on several compute nodes in parallel (vertical parallelization), and (c) individual query results can be produced from triples assigned to only a few storage nodes, ideally one (horizontal containment). We analyse the impact of the three most commonly used graph cover strategies in these terms and find that balancing the query workload reduces query execution time more than reducing data transfer over the network does. To this end, we present our novel benchmark and open-source evaluation platform Koral.
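To make the terms storage balance and horizontal containment concrete, the following minimal Python sketch assigns triples to storage nodes by hashing the subject and then measures both properties. It is an illustration only and not Koral's implementation: the subject-hash placement, the function names, and the sample triples are all assumptions made for this example.

```python
import hashlib

# Hypothetical sample data: (subject, predicate, object) triples.
SAMPLE = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:worksAt", "ex:uni"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:carol", "ex:worksAt", "ex:uni"),
    ("ex:dave", "ex:knows", "ex:alice"),
]


def node_for(subject, num_nodes):
    """Map a subject to a storage node with a process-independent hash."""
    digest = hashlib.md5(subject.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def hash_cover(triples, num_nodes):
    """Assumed placement strategy: each triple goes to the node of its subject."""
    cover = {node: [] for node in range(num_nodes)}
    for triple in triples:
        cover[node_for(triple[0], num_nodes)].append(triple)
    return cover


def storage_imbalance(cover):
    """Largest chunk size divided by the mean chunk size (1.0 = perfectly balanced)."""
    sizes = [len(chunk) for chunk in cover.values()]
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


def nodes_touched(cover, match):
    """Number of storage nodes holding at least one triple of a query match."""
    wanted = set(match)
    return sum(1 for chunk in cover.values() if wanted.intersection(chunk))


if __name__ == "__main__":
    cover = hash_cover(SAMPLE, num_nodes=3)
    star = [t for t in SAMPLE if t[0] == "ex:alice"]
    print("imbalance:", storage_imbalance(cover))
    print("nodes touched by star match:", nodes_touched(cover, star))
```

Under this assumed placement, all triples that share a subject land on the same node, so a star-shaped match around one subject touches a single storage node, while a path-shaped match across several subjects may touch several.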
This presentation extends the Oberseminar talk of 16.03.2017 with an in-depth examination of the results presented there and of how the different graph cover strategies are affected by scaling up the number of slaves and the dataset size.
06.07.17 - 10:15