Why Is Everyone Talking About the CAP Theorem?

With all the hype and buzz around micro services and containers and APIs and  (insert buzz word here) many in the industry are talking about the CAP theorem. It seems like every software engineer is expected to know what it is all of a sudden. OK, that maybe an over exaggeration, clearly if you're not building distributed applications you won't have a clue as to what I'm ranting about here... But lets face who isn't writing distributed applications these days? 

So what is it?

Well, around 1999 a computer scientist Eric Brewer published what he called a CAP principle, which simply states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: Consistency, Availability, Partitioning (CAP). Wow, more buzz words here. Not really, these are actually mean something.

Consistency refers to all nodes in a cluster seeing the same data all the time at the same time. Or another way to put it: every read is guaranteed to return data from the most recent write. Availability refers to every node in the cluster being able to execute queries in reasonable amount of time. The last one, Partitioning Tolerance basically means that even network connections between nodes are down, the other two promises are kept. 

Now, the theorem simply postulates that in a distributed system only two of the three above promises can be guaranteed at any given time, and it is impossible to satisfy all three.

Some math people will say, "oh hey wait a minute it's a theorem where the proof"? OK, lets try to put something together:

Lets assume we have a set N {n1, n2}  representing a distributed system of nodes. And all nodes in N are connected. So we have something like this:


In simpler terms this means that both nodes have the same exact data set. Now let's see what C, A and Pmeans in this setup.

Consistency: If Ichange the data set in n1, I'll have to replicate this change in n2 sothat my data looks the same if I see it from either n1 or n2.

Availability: As long as both n1 and n2 are up and running I should be able to perform CRUD queries on data from any of them. 

Partition Tolerance: If the link between n1 and n2 fails, I should still be able to perform CRUD queries on my data set.

One way to understand CAP theorem is to see what happens during a network partition.


Clearly  n1 cannot communicate with n2 anymore. Let's assume we have some meansto discover partition. Now, if someone talks to n1 and changes the dataset, n1 does not have any means to propagate this change to n2.

Ifavailability is what we desire, according to the definition, we'll haveto allow changes in both n1 and n2. This causes a "split brain problem"  where some updates are in n1, others in n2 making the entire system as awhole, inconsistent.

If we choose consistency we'll have to block all changes on n1 and n2, which by definition makes the system unavailable.

What if we direct all communication to one node,  either n1 or n2. This will reduce availability because we have only onenode in the system which not all clients can reach. Essentially, weassume that the other node is unavailable.

Once the link isrestored, we can propagate all the changes made in n1 while the link wasdown (we can keep a log of all changes once we discover a partition) to  n2 so that both have the same data set.

It's evident here that we can chose CP or AP but not both. Also it doesn't really make sense to chose CA. Also, it's worth mentioning that these principles aren't hard switches, rather a sliding scale.

OK math people, breathe, I know it's not a traditional mathematical proof, but I think it's good enough.

My oversimplified explanation here should supplemented with actual academic text here: 

CAP Twelve Years Later: How the "Rules" Have Changed

Why do we care?

That's the most important question isn't it. Well, we care only when we are architecting a distributed platform as a service type of system. Do we care about the CAP theorem when it comes to APIs or containers...? I'll say this, and take this with a grain of salt, I don't think we do. Yes, it's important to understand these principles when deciding how to configure replication/synchronization of data across, for example, NoSQL database nodes. We also should be aware of these principles when configuring message brokers for say AMQP or JMS etc... Configuration servers like etcd and consul also inherently depend on data replication.

So stateless APIs ? And yes APIs should be stateless. The word stateless here answers our question here, and my answer is no. However, a modern application platform isn't complete without data so on some level every engineer should understand the CAP theorem.

OK, rant is coming to an end, and I guess my main point here is, next time some really really smart engineer asks you, "Do you know the CAP theorem" you should wow them with your theoretical CS skills, then go on your day write your code...