The Emperor has no Clothes

2013-10-17

“In theory, theory and practice are the same. In practice, they are not.” — Albert Einstein

The original Dynamo paper created a wave of interest in the CAP theorem and gave rise to the recent crop of distributed databases: Cassandra, Riak, et al. These systems are generally AP where C can be tuned to provide some guarantee of consistency, i.e. they do their best to provide CAP according to the application’s needs. For instance, you might have a cluster of 5 nodes where a write to the cluster will return success if 3 of the nodes acknowledge the write. The cluster will still be available even if two of the machines fail.

In theory they are a great way to ensure availability to your application in the face of network failures. In practice, I believe these databases are so complex that they often provide less availability than a simpler CP system like a SQL database.

Consider:

As the recent series of Jepsen posts show, even the most highly regarded of these systems have serious bugs in their handling of network failures. Cascading failure happens. Split brain happens. Distributed databases do not scale linearly vs a single system; having a 5 node cluster will not handle 5x the throughput of a single system so your costs will increase super-linearly and your chances of network failure increase 5x (and thus exposing those hard-to-test network failure bugs). Distributed databases are useful only if:

My belief is simple: avoid distributed databases if possible. You will pay a heavy tax for their use.