Big Data on Diet 2: Reducing the Statistics Size
RDF graphs are sets of triples, each consisting of a subject, a property and an object resource. Nowadays, an RDF graph can consist of trillions of triples. To cope with these huge graphs, distributed RDF stores combine the computational power of several compute nodes. To query an RDF graph stored in such a distributed RDF store efficiently, statistical information about the occurrences of resources on the different compute nodes is required. A naive way to store this statistical information is a table kept in a single random access file. This naive implementation leads to files with a size of several tens of gigabytes and thus to slow read and write operations. This talk presents a compression method that reduces the storage consumption of the statistical information by over 96% while still providing efficient access to the data.
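To make the naive baseline concrete, the following is a minimal sketch of a statistics table held in a single random access file. It is not the implementation discussed in the talk. The layout (one row per integer resource ID, one column per compute node, one unsigned 64-bit counter per cell) and all names such as `NaiveStatisticsTable` and `naive_file_size` are assumptions made for illustration only.

```python
import struct

# Assumption: each cell is one unsigned 64-bit little-endian counter (8 bytes).
COUNTER = struct.Struct("<Q")


class NaiveStatisticsTable:
    """Fixed-width table: one row per resource ID, one column per compute node."""

    def __init__(self, path, num_resources, num_nodes):
        self.num_nodes = num_nodes
        self.row_size = num_nodes * COUNTER.size
        # Pre-allocate the whole file so every cell has a fixed offset.
        # On common platforms the extended region reads back as zero bytes.
        with open(path, "wb") as f:
            f.truncate(num_resources * self.row_size)
        self.file = open(path, "r+b")

    def _offset(self, resource_id, node):
        # Byte position of the counter for (resource_id, node).
        return resource_id * self.row_size + node * COUNTER.size

    def get(self, resource_id, node):
        """Return how often the resource occurs on the given compute node."""
        self.file.seek(self._offset(resource_id, node))
        return COUNTER.unpack(self.file.read(COUNTER.size))[0]

    def increment(self, resource_id, node, amount=1):
        """Add `amount` occurrences of the resource on the given compute node."""
        value = self.get(resource_id, node) + amount
        self.file.seek(self._offset(resource_id, node))
        self.file.write(COUNTER.pack(value))

    def close(self):
        self.file.close()


def naive_file_size(num_resources, num_nodes):
    """File size in bytes of the fixed-width layout assumed above."""
    return num_resources * num_nodes * COUNTER.size
```

Under these assumptions the file grows with the product of resources and compute nodes, whether or not a resource occurs on a node. With purely illustrative figures of one billion distinct resources and four compute nodes, `naive_file_size(10**9, 4)` gives 32,000,000,000 bytes, which is in the range of tens of gigabytes mentioned above.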
21.03.19 - 10:15