Scale-out evaluation of news feed retrieval algorithms on Neo4j and Titan cluster[go to overview]
News feed platforms, such as Facebook and Twitter, are growing continuously. Their primary requirements are scalability, low request latencies and high availability  for read and write requests. This requires to scale out the system to multiple machines . STOU and Graphity were reported as high performant algorithms to power a news feed system. However, their evaluation took place on a single machine with one thread, the algorithms didn’t support to run in a distributed system. We identified a proper experimental setup for the evaluation of scale-out. We implemented versions of STOU and Graphity, that support a distributed execution. We evaluated the news feed algorithms on two distributed graph databases, that use different distribution strategies: a) Neo4j with its database replication and b) Titan using Cassandra, that bases upon distributed hash-tables. Cassandra is an eventually-consistent database. Neo4j shows good read scale-out properties, for both STOU and Graphity, but its database replication approach can’t provide write scaleout. On Titan Graphity provides better read scale-out than STOU, while STOU shows good write scale-out properties. We weren’t able to implement Graphity in a manner, that would support concurrent writes in a distributed system. Graphity seems too complex, to be used on eventually-consistent databases. With caching enabled, its read performance is similar to STOU, while its runtime complexity for writes is higher. We conclude that STOU is more suitable to run in a distributed system. Both distribution strategies have strengths that can be used to power a news feed system and a combination of both approaches will show the better scale-out properties.
06.08.15 - 10:15