CAP Theorem – Brewer’s Theorem | Hadoop HBase

In this post, we will understand about CAP theorem or Brewer’s theorem. This theorem was proposed by Eric Brewer of  University of California, Berkeley.

CAP Theorem or Brewer’s Theorem

CAP theorem, also known as Brewer’s theorem states that it is impossible for a distributed computing system to simultaneously provide all the three guarantee i.e.  Consistency, Availability or Partition tolerance.

Therefore, at any point of time for any distributed system, we can choose only two of consistency, availability or partition tolerance.

Availability

Even if any of one node goes down, we can still access the data.

Consistency

You access the most recent data.

Partition Tolerance

Between the nodes, it should tolerate network outage.

The above of the three guarantees are shown in three vertices of a triangle and we are free to choose any side of the triangle.

Therefore, we can choose (Availability and Consistency) or (Availability and Partition Tolerance) or (Consistency and Partition Tolerance).

Please refer to figure below:

CAP theorem
CAP Theorem

Relational Databases such as Oracle, MySQL choose Availability and Consistency while databases such as Cassandra, Couch, DynoDB choose Availability and Partition Tolerance and the databases such as HBase, MongoDB choose Consistency and Partition Tolerance.

CAP Theorem Example 1:  Consistency and Partition Tolerance

Let us take an example to understand one of the use cases say (Consistency and Partition Tolerance).

These databases are usually shared or distributed data and they tend to have master or primary node through which they can handle the right request. A good example is MongoDB.

What happens when the master goes down?

In this case, usually another master will get elected and till then data can’t be read from other nodes as it is not consistent. Therefore, availability is sacrificed.

However, if the write operation went fine and there is network outage between the nodes, there is no problem because the secondary node can serve the data. Therefore, partition tolerance is achieved.

CAP Theorem Example 2: Availability and Partition Tolerance

Let us try to understand an example for Availability and Partition Tolerance.

These databases are also shared and distributed in nature and usually master-less. This means every node is equal. Cassandra is a good example of this kind of databases.

Let us consider we have an overnight batch job that writes the data from a mainframe to Cassandra database and the same database is read throughout a day. If we have to read the data as and when it is written then we might get stale data and hence the consistency is sacrificed.

Since this is the read heavy and write once use case, I don’t care about reading data immediately. I just care about once the write has happened, we can read from any of the nodes.

But Availability is one of the important parameters because if one of the nodes goes down we can be able to read the data from another backup node. The system as a whole is available.

Partition tolerance will help us in any network outage between the nodes. If any of the nodes goes down due to network issue another node can take it up.