Given the news lately, The New Colossus by Emma Lazarus seems very relevant.
Not like the brazen giant of Greek fame,
With conquering limbs astride from land to land;
Here at our sea-washed, sunset gates shall stand
A mighty woman with a torch, whose flame
Is the imprisoned lightning, and her name
MOTHER OF EXILES. From her beacon-hand
Glows world-wide welcome; her mild eyes command
The air-bridged harbor that twin cities frame.
"Keep, ancient lands, your storied pomp!" cries she
With silent lips. "Give me your tired, your poor,
Your huddled masses yearning to breathe free,
The wretched refuse of your teeming shore.
Send these, the homeless, tempest-tost to me,
I lift my lamp beside the golden door!
Reproducing an old answer of mine from Quora.
Is -0 a number? Yes it is, and it is equal to 0.
Interestingly though, signed zeros are a necessary representation in computing because of rounding off errors and limitations with floating point precision. In certain classes of computations, the sign of a number before it was rounded off to zero is of practical importance.
Here’s an example from the Wikipedia article on signed zeros:
Informally, one may use the notation “−0” for a negative value that was rounded to zero. This notation may be useful when a negative sign is significant; for example, when tabulating Celsiu temperatures, where a negative sign means below freezing.
More than an hour in, I still haven’t made it past the central panel.
And I finally defended my PhD thesis.
After a long hiatus of technical posts, I’m finally getting around to blogging about my PhD research. Today, I’ll give a brief overview of some of my recent work on the C3 system that was published at NSDI 2015.
My research has focused on techniques to reduce latency in the context of large-scale distributed storage systems. A common pattern in the way people architect scalable web-services today is to have large request fanouts, where even a single end-user request can trigger tens to thousands of data accesses to the storage tier. In the presence of such access patterns, the tail latency of your storage servers becomes very important since they begin to dominate the overall query time.
At the same time, storage servers are typically chaotic. Skewed demands across storage servers, queueing delays across various layers of the stack, background activities such as garbage collection and SSTable compaction, as well as resource contention with co-located workloads are some of the many factors that lead to performance fluctuations across storage servers. These sources of performance fluctuations can quickly inflate the tail-latency of your storage system, and degrade the performance of application services that depend on the storage tier.
In light of this issue, we investigate how replica selection, wherein a database client can select one out of multiple replicas to service a read request, can be used to cope with server-side performance fluctuations at the storage layer. That is, can clients carefully select replicas for serving reads with the objective of improving their response times?
This is challenging for several reasons. First of all, clients need a way to reliably measure and adapt to performance fluctuations across storage servers. Secondly, a fleet of clients needs to ensure that they do not enter herd behaviours or load oscillations because all of them are trying to improve their response times by going after faster servers. As it turns out, many popular systems either do a poor job of replica selection because they are agnostic to performance heterogeneity across storage servers, or are prone to herd behaviours because they get performance-aware replica selection wrong.
C3 addresses these problems through a careful combination of two mechanisms. First, clients in a C3 system, with some help from the servers, carefully rank replicas in order to balance request queues across servers in proportion to their performance differences. We refer to this as replica ranking. Second, C3 clients use a congestion-control-esque approach to distributed rate control, where clients adjust and throttle their sending rates to individual servers in a fully decentralized fashion. This ensures that C3 clients do not collectively send more requests per second to a server than it can actually process.
The combination of these two mechanisms gives C3 some impressive performance improvements over Cassandra’s Dynamic Snitching, which we used as a baseline. In experiments conducted on Amazon EC2, we found C3 to improve the 99.9th percentile latency by factors of 3x, while improving read throughput by up to 50%. See the paper for details regarding the various experiments we ran as well as the settings considered.
While the system evaluation in the paper was conducted using the Yahoo Cloud Serving Benchmark (YCSB), I’m currently investigating how C3 performs under production settings through some companies who’ve agreed to give it a test run. So far, the tests have been rather positive and we’ve been learning a lot more about C3 and the problem of replica selection in general. Stay tuned for more results!