About Adam Warski

Adam is one of the co-founders of SoftwareMill, a company specialising in delivering customised software solutions. He is also involved in open-source projects, as a founder, lead developer or contributor to: Hibernate Envers, a Hibernate core module, which provides entity versioning/auditing capabilities; ElasticMQ, an SQS-compatible messaging server written in Scala; Veripacks, a tool to specify and verify inter-package dependencies, and others.

Trying to understand CAP

The CAP theorem, stated by Brewer and proved by Gilbert and Lynch, specifies a property of distributed systems. It states that such a system cannot simultaneously guarantee Consistency, Availability and Partition tolerance. It is also often expressed as a catchy phrase:

Consistency, Availability, Partition Tolerance – pick any two

used mostly when talking about NoSQL databases and suggesting that a distributed system can be characterized as either CA, AP or CP (see e.g. here).

For some time I’ve been trying to understand the different combinations and what they mean in practice. While spending some time at various airports I caught up on my reading, and here’s what I came up with.

What’s C, A and P?

First, let’s define how I understand the three guarantees, based on some of the articles I’ve read.

Consistency is the easiest one. It roughly means that all clients get the same view of the data. By saying that a system is consistent we usually mean strong consistency, but consistency also comes in weaker flavors, e.g. causal.
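Causal consistency is commonly explained with vector clocks. The following minimal Python sketch is an illustration under assumptions, not something from the original post: each write is stamped with a vector clock (a map from node name to an event counter). A causally consistent store must make writes visible in an order that respects the happens-before relation, while concurrent writes may appear in any order. The names `happens_before` and `concurrent` are hypothetical.

```python
from typing import Dict

# A vector clock maps a node name to the number of events seen from that node.
VectorClock = Dict[str, int]


def happens_before(a: VectorClock, b: VectorClock) -> bool:
    """True if the event stamped `a` causally precedes the event stamped `b`."""
    nodes = set(a) | set(b)
    # Missing entries count as 0.
    no_entry_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_entry_smaller = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_entry_greater and some_entry_smaller


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if neither event precedes the other and they are not the same event."""
    nodes = set(a) | set(b)
    identical = all(a.get(n, 0) == b.get(n, 0) for n in nodes)
    return not identical and not happens_before(a, b) and not happens_before(b, a)


# A writes x; B reads that write and then writes y; C writes z independently.
w_a = {"A": 1}
w_b = {"A": 1, "B": 1}
w_c = {"C": 1}
print(happens_before(w_a, w_b))  # True: every replica must show w_a before w_b
print(concurrent(w_b, w_c))      # True: replicas may show these in either order
```

Strong consistency would additionally require all replicas to agree on one order for `w_b` and `w_c`; causal consistency leaves that choice open.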

Availability is a property saying that every request to a non-failing node will return a (meaningful) response. The response may not contain all data (so the harvest will not be 100%, see the appropriate section in [3]), but it should be useful for the client.
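A hypothetical Python sketch of an available but incomplete answer: a query is fanned out over shards, unreachable shards are skipped instead of failing the request, and the response reports its harvest. For simplicity, harvest is computed here as the fraction of shards that answered, which assumes equally sized shards. `Shard`, `Response` and `query` are illustrative names only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shard:
    name: str
    items: List[str]
    reachable: bool = True


@dataclass
class Response:
    items: List[str]
    harvest: float  # fraction of shards that contributed to this answer


def query(shards: List[Shard], keyword: str) -> Response:
    results: List[str] = []
    answered = 0
    for shard in shards:
        if not shard.reachable:
            continue  # skip the shard rather than failing the whole request
        answered += 1
        results.extend(item for item in shard.items if keyword in item)
    harvest = answered / len(shards) if shards else 1.0
    return Response(results, harvest)


shards = [
    Shard("s1", ["cap theorem", "cap notes"]),
    Shard("s2", ["cap paper"], reachable=False),
]
print(query(shards, "cap"))
# Response(items=['cap theorem', 'cap notes'], harvest=0.5)
```

The client still gets a useful response, just not a complete one, which is the sense of "available" used above.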

Partition tolerance means that the system will continue working even if any number of messages sent between nodes is lost. This can be e.g. a network failure between two datacenters, where nodes in each datacenter form a partition. Also note that a failure of any number of nodes forms a partition (it is not possible to distinguish between a network failure and a node that has failed and stopped responding to messages).
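To make the last point concrete, here is a minimal Python sketch (all class and function names are hypothetical). From the sending node's point of view, a crashed peer and a link that drops every message produce exactly the same observation: no reply, i.e. a timeout.

```python
from typing import Optional


class Peer:
    """A remote node that acknowledges messages while it is alive."""

    def __init__(self, alive: bool = True) -> None:
        self.alive = alive

    def handle(self, msg: str) -> Optional[str]:
        return f"ack:{msg}" if self.alive else None  # a crashed node says nothing


class Link:
    """The network between two nodes; when cut, every message is lost."""

    def __init__(self, cut: bool = False) -> None:
        self.cut = cut

    def send(self, peer: Peer, msg: str) -> Optional[str]:
        if self.cut:
            return None  # the message never arrives, so no reply comes back
        return peer.handle(msg)


def observe(link: Link, peer: Peer) -> str:
    """What the sending node can actually see: a reply or a timeout."""
    return "reply" if link.send(peer, "ping") is not None else "timeout"


print(observe(Link(), Peer(alive=False)))  # timeout: the peer has crashed
print(observe(Link(cut=True), Peer()))     # timeout: the network is partitioned
```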

The hardest part for me is understanding the difference between Availability and Partition tolerance. Also, the various articles don’t specify what they mean by saying that a system is “working” after being e.g. partitioned: does it mean that every request gets a response with useful data, or are responses like “Sorry, I can’t give you data right now” also acceptable?

P+C?

Let’s assume that a system is partition tolerant and that it has more than one node. If a partition forms, splitting the system in two, the system should continue working; hence both partitions allow clients to write. But then, how can consistency be guaranteed if one client writes to partition 1 and another client writes to partition 2? Hence: P => ~C.
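The argument can be played out in a toy model. This Python sketch assumes, as the paragraph does, that “continue working” means both sides keep accepting writes; `TwoNodeStore` is a hypothetical two-replica store that replicates synchronously while the link is up and writes locally only during a partition.

```python
from typing import Dict, Optional


class TwoNodeStore:
    def __init__(self) -> None:
        self.replicas: Dict[str, Dict[str, str]] = {"n1": {}, "n2": {}}
        self.partitioned = False

    def write(self, node: str, key: str, value: str) -> None:
        # Link up: replicate synchronously to every node.
        # Partitioned: keep "working" by accepting the write locally only.
        targets = [node] if self.partitioned else list(self.replicas)
        for target in targets:
            self.replicas[target][key] = value

    def read(self, node: str, key: str) -> Optional[str]:
        return self.replicas[node].get(key)

    def consistent(self, key: str) -> bool:
        # Consistent if every replica returns the same value for `key`.
        return len({replica.get(key) for replica in self.replicas.values()}) == 1


store = TwoNodeStore()
store.write("n1", "x", "0")
store.partitioned = True
store.write("n1", "x", "1")  # client 1 writes to partition 1
store.write("n2", "x", "2")  # client 2 writes to partition 2
print(store.read("n1", "x"), store.read("n2", "x"))  # 1 2
```

The sketch hard-codes one reading of “working”. If one side were allowed to reject writes during the partition, the replicas would not diverge, which is exactly the ambiguity about what “working” means raised above.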

A+~P?

Suppose now that a system is available and that it has more than one node. As the system is available, it should respond to requests even if some nodes die. As noted above, some nodes dying is equivalent to a partition. So if the system is still working, we have partition tolerance. Hence: A => P.

A+C?

Combining the two implications above (A => P and P => ~C), we get A => ~C: an available system cannot be consistent (if it has more than one node). In practice, however, there are of course AC systems, e.g. a single-node RDBMS, or even a master-slave/master-master replicated RDBMS, provided there’s a central router that knows which nodes are live and directs clients appropriately. Such a router is then a single point of failure (SPoF).
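The router-based AC setup can be sketched as follows. This is a hypothetical Python model, not any real database’s failover mechanism: the router alone decides which nodes are live, sends every read to the first live node, replicates writes synchronously to all live nodes, and makes the whole system unavailable when the router itself fails. It assumes that a node marked dead never rejoins, so no resync logic is needed.

```python
from typing import Dict, List, Optional


class Router:
    """Hypothetical central router in front of replicated database nodes."""

    def __init__(self, node_names: List[str]) -> None:
        self.live: List[str] = list(node_names)  # router's view, in failover order
        self.data: Dict[str, Dict[str, str]] = {n: {} for n in node_names}
        self.up = True  # the router itself

    def mark_dead(self, node: str) -> None:
        # Crashed or merely cut off: the router cannot tell, so it stops using the node.
        # Simplification: a node marked dead never rejoins.
        if node in self.live:
            self.live.remove(node)

    def _master(self) -> str:
        if not self.up:
            raise ConnectionError("router is down: the whole system is unavailable")
        if not self.live:
            raise ConnectionError("no live database nodes")
        return self.live[0]

    def write(self, key: str, value: str) -> None:
        self._master()  # fail fast if the router or all nodes are down
        for node in self.live:  # synchronous replication to every live node
            self.data[node][key] = value

    def read(self, key: str) -> Optional[str]:
        return self.data[self._master()].get(key)


router = Router(["db1", "db2"])
router.write("x", "1")
router.mark_dead("db1")  # failover: db2 becomes master and already has x
print(router.read("x"))  # 1
```

Because the router, not the nodes, decides who is live, clients never see two diverging masters; the price is that setting `router.up = False` makes everything unavailable.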

Relax?

I suspect that in reality, when e.g. NoSQL/NewSQL systems are characterized with the CAP properties, some relaxed form of C/A/P is assumed. Unfortunately, all of the definitions flying around seem to be pretty vague and are more hand-waving than precise statements. I think it would be much easier to explore the ever-growing ecosystem of new datastores if they could be characterized more easily; maybe the CAP vocabulary is just not enough?

Please correct me if I’m wrong somewhere, and I probably am! :)


Reference: Trying to understand CAP from our JCG partner Adam Warski at the Blog of Adam Warski.

Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd