After a long hiatus from technical posts, I’m finally getting around to blogging about my PhD research. Today, I’ll give a brief overview of some of my recent work on the C3 system, which was published at NSDI 2015.

My research has focused on techniques to reduce latency in the context of large-scale distributed storage systems. A common pattern in the way people architect scalable web services today is to have large request fanouts, where even a single end-user request can trigger tens to thousands of data accesses to the storage tier. In the presence of such access patterns, the tail latency of your storage servers becomes very important, since the overall query time ends up being dominated by the slowest of those accesses.
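To see why the tail matters so much under fanout, here is a quick back-of-the-envelope calculation (the numbers are purely illustrative, not from our experiments): if each individual access is “slow”, say above a single server’s own 99th-percentile latency, only 1% of the time, a request that fans out to many servers and has to wait for all of them will still hit at least one slow access most of the time.

```python
# Back-of-the-envelope: how often does a fan-out request have to wait for at
# least one "slow" access, i.e. one above a single server's own
# 99th-percentile latency? Numbers are illustrative only.

def prob_at_least_one_slow(fanout, p_slow=0.01):
    """Probability that at least one of `fanout` independent accesses
    falls into the slowest 1% of a single server's latency distribution."""
    return 1.0 - (1.0 - p_slow) ** fanout

for fanout in (1, 10, 100, 1000):
    print(f"fanout={fanout:4d} -> {prob_at_least_one_slow(fanout):.3%}")

# fanout=   1 -> 1.000%
# fanout=  10 -> 9.562%
# fanout= 100 -> 63.397%
# fanout=1000 -> 99.996%
```

In other words, at a fanout of 100, roughly two out of every three requests experience at least one 99th-percentile access, so the “rare” tail quickly becomes the common case.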

At the same time, storage servers are typically chaotic. Skewed demands across storage servers, queueing delays across various layers of the stack, background activities such as garbage collection and SSTable compaction, as well as resource contention with co-located workloads are some of the many factors that lead to performance fluctuations across storage servers. These sources of performance fluctuations can quickly inflate the tail latency of your storage system, and degrade the performance of application services that depend on the storage tier.

In light of this issue, we investigate how replica selection, wherein a database client can select one of multiple replicas to service a read request, can be used to cope with server-side performance fluctuations at the storage layer. That is, can clients carefully select replicas for serving reads with the objective of improving response times?

This is challenging for several reasons. First, clients need a way to reliably measure and adapt to performance fluctuations across storage servers. Second, a fleet of clients must avoid herd behaviour and load oscillations, which arise when every client chases the fastest servers in an attempt to improve its own response times. As it turns out, many popular systems either do a poor job of replica selection because they are agnostic to performance heterogeneity across storage servers, or are prone to herd behaviour because they get performance-aware replica selection wrong.

C3 addresses these problems through a careful combination of two mechanisms. First, clients in a C3 system, with some help from the servers, rank replicas so as to balance request queues across servers in proportion to their performance differences. We refer to this as replica ranking. Second, C3 clients use a congestion-control-esque approach to distributed rate control: each client adjusts and throttles its sending rates to individual servers in a fully decentralized fashion. This ensures that C3 clients do not collectively send more requests per second to a server than it can actually process.
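To make the ranking side a bit more concrete, here is a minimal sketch of the kind of client-side bookkeeping and scoring involved, assuming servers piggyback their current queue size and recent service time on responses. The variable names, constants, and the exact way the terms are combined below are my own illustration; the real scoring function, including how it compensates for a client’s own in-flight requests and penalises long queues, is defined in the paper.

```python
# Sketch of client-side replica ranking: score each replica using locally
# observed response times plus feedback piggybacked by servers on responses.
# Names, constants, and the exact scoring formula are illustrative, not C3's
# actual implementation.

ALPHA = 0.9          # EWMA smoothing factor for all estimates
CONCURRENCY = 10     # rough guess at how many clients share each server
QUEUE_EXPONENT = 3   # penalise long queues super-linearly

class ReplicaStats:
    """A client's view of one replica, updated from response feedback."""
    def __init__(self):
        self.response_time = 1.0   # EWMA of end-to-end response time (ms)
        self.service_time = 1.0    # EWMA of server-reported service time (ms)
        self.queue_size = 0.0      # EWMA of server-reported queue size
        self.outstanding = 0       # our requests still in flight to this replica

def on_response(stats, rtt_ms, reported_queue, reported_service_ms):
    """Fold feedback from one response into the client's estimates."""
    stats.outstanding -= 1
    stats.response_time = ALPHA * stats.response_time + (1 - ALPHA) * rtt_ms
    stats.service_time = ALPHA * stats.service_time + (1 - ALPHA) * reported_service_ms
    stats.queue_size = ALPHA * stats.queue_size + (1 - ALPHA) * reported_queue

def score(stats):
    """Estimated cost of sending the next read to this replica. The queue
    estimate is inflated by our own in-flight requests (scaled by a
    concurrency factor, since other clients are doing the same), and the
    queueing term grows super-linearly so that long queues look very expensive."""
    est_queue = 1 + stats.queue_size + stats.outstanding * CONCURRENCY
    queueing_delay = (est_queue ** QUEUE_EXPONENT) * stats.service_time
    return stats.response_time - stats.service_time + queueing_delay

def pick_replica(replica_stats):
    """Send the read to the lowest-scoring replica in the replica set."""
    best = min(replica_stats, key=lambda r: score(replica_stats[r]))
    replica_stats[best].outstanding += 1
    return best
```

The important property is that the score reacts both to what the client observes and to what the servers report, so queue lengths end up balanced in proportion to how quickly each server is actually draining its queue.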

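The rate-control side can be sketched in a similar spirit. Below is a deliberately simplified per-server rate limiter based on plain additive-increase/multiplicative-decrease, just to show the shape of the idea; C3’s actual rate adaptation is more refined and is spelled out in the paper. The window length, step sizes, and class names here are all illustrative.

```python
import time

class ServerRateLimiter:
    """Per-server send-rate limiter on the client side: a simplified AIMD
    stand-in for C3's rate control. All constants and names are illustrative."""

    def __init__(self, initial_rate=100.0, window_s=0.02):
        self.rate = initial_rate        # allowed requests/sec to this server
        self.window_s = window_s        # measurement window length (seconds)
        self.sent = 0                   # requests sent in the current window
        self.received = 0               # responses received in the current window
        self.window_start = time.monotonic()

    def try_acquire(self):
        """Return True if the client may send one more request right now;
        otherwise the caller can fall back to the next-ranked replica."""
        self._maybe_roll_window()
        if self.sent < self.rate * self.window_s:
            self.sent += 1
            return True
        return False

    def on_response(self):
        self.received += 1

    def _maybe_roll_window(self):
        now = time.monotonic()
        if now - self.window_start < self.window_s:
            return
        # If the server kept pace with us over the last window, probe for a
        # higher rate; otherwise back off towards the rate at which responses
        # actually came back.
        if self.received >= self.sent:
            self.rate += 10.0                          # additive increase
        else:
            observed_rate = self.received / self.window_s
            self.rate = max(1.0, 0.8 * observed_rate)  # multiplicative decrease
        self.sent = 0
        self.received = 0
        self.window_start = now
```

Because every client runs this independently against its own measurements, no coordination between clients is required, yet the aggregate request rate towards a struggling server still backs off.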
The combination of these two mechanisms gives C3 some impressive performance improvements over Cassandra’s Dynamic Snitching, which we used as a baseline. In experiments conducted on Amazon EC2, we found C3 to improve the 99.9th percentile latency by a factor of 3, while improving read throughput by up to 50%. See the paper for details regarding the various experiments we ran as well as the settings considered.

While the system evaluation in the paper was conducted using the Yahoo Cloud Serving Benchmark (YCSB), I’m currently investigating how C3 performs in production settings, with the help of a few companies that have agreed to give it a test run. So far, the tests have been rather positive and we’ve been learning a lot more about C3 and the problem of replica selection in general. Stay tuned for more results!