Institute for Web Science and Technologies · Universität Koblenz - Landau

Scale-out evaluation of news feed retrieval algorithms on Neo4j and Titan cluster

[go to overview]
Sebastian Schlicht

News feed platforms, such as Facebook and Twitter, are growing continuously. Their primary requirements are scalability, low request latencies and high availability [7] for read and write requests. This requires to scale out the system to multiple machines [10]. STOU and Graphity were reported as high performant algorithms to power a news feed system. However, their evaluation took place on a single machine with one thread, the algorithms didn’t support to run in a distributed system. We identified a proper experimental setup for the evaluation of scale-out. We implemented versions of STOU and Graphity, that support a distributed execution. We evaluated the news feed algorithms on two distributed graph databases, that use different distribution strategies: a) Neo4j with its database replication and b) Titan using Cassandra, that bases upon distributed hash-tables. Cassandra is an eventually-consistent database. Neo4j shows good read scale-out properties, for both STOU and Graphity, but its database replication approach can’t provide write scaleout. On Titan Graphity provides better read scale-out than STOU, while STOU shows good write scale-out properties. We weren’t able to implement Graphity in a manner, that would support concurrent writes in a distributed system. Graphity seems too complex, to be used on eventually-consistent databases. With caching enabled, its read performance is similar to STOU, while its runtime complexity for writes is higher. We conclude that STOU is more suitable to run in a distributed system. Both distribution strategies have strengths that can be used to power a news feed system and a combination of both approaches will show the better scale-out properties.

06.08.15 - 10:15
B 016