Evaluating Compression Strategies of Databases

In relational databases, data is stored in tables. Each row represents a record and each column an attribute. In many real-world applications, most rows have values for only a few columns, and all remaining columns are null. Traditional relational databases store and compress data row-wise (see, for instance, [1,2]). In contrast, column-oriented databases store and compress tables column-wise (see, for instance, [3,4]). One of their advantages is that they can compress the data more efficiently, because values of the same attribute are stored next to each other. However, column-oriented databases may perform worse when updates occur or when a complete row has to be read. In this bachelor thesis, the performance of the compression strategies used by prominent row-oriented and column-oriented databases such as MySQL, PostgreSQL, MongoDB, or Impala is to be evaluated. To this end, these compression strategies are first to be reimplemented. Afterwards, an evaluation methodology is to be developed, with which the compression strategies are finally evaluated.
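
The effect of the storage layout on compression can be illustrated with a small sketch. It lays out a hypothetical sparse table once row-wise and once column-wise and counts the runs that a simple run-length encoding produces. Both the table contents and the choice of run-length encoding are assumptions made for illustration only; they are not meant to reflect the strategies the databases named above actually use.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical sparse table: the first column is always set,
# the other two columns are almost always null (None).
rows = [
    (1, None, None),
    (2, None, None),
    (3, "x", None),
    (4, None, None),
    (5, None, None),
    (6, None, None),
]

# Row-wise layout: the values of one row are stored next to each other.
row_major = [value for row in rows for value in row]

# Column-wise layout: the values of one column are stored next to each other.
column_major = [value for column in zip(*rows) for value in column]

row_runs = run_length_encode(row_major)
column_runs = run_length_encode(column_major)

print(len(row_runs), "runs in the row-wise layout")
print(len(column_runs), "runs in the column-wise layout")
```

For this toy table, the column-wise layout yields fewer runs because the nulls of the sparse columns end up adjacent. Whether a comparable advantage appears on real data with real compression schemes is exactly the kind of question the evaluation is meant to answer.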

[1] https://dl.acm.org/citation.cfm?id=560733
[2] https://ieeexplore.ieee.org/document/1617426/
[3] https://parquet.apache.org/
[4] https://dl.acm.org/citation.cfm?doid=2882903.2915964

Type of thesis: 
Bachelor
Date of announcement: 
2018