Posts

Showing posts from April, 2026

Consistency 3/3 - Data Consistency and the "Theory of relativity"

  "If a tree falls in a forest and no one is around to hear it, does it make a sound?" If a system has an inconsistency but no one  is able  to observe it, is it still an inconsistency? Introduction We need to scale data processing systems geographically  to achieve lower Latency and (at least partial) Availability in case of network Partitioning. But  CAP / PACELC  tells us that we cannot achieve strong Consistency in this case. When we increase the Consistency requirements, we have to accept lower Availability and higher Latency. What is the minimum consistency level that we need? If eventual consistency is enough for your system, things are pretty clear and relatively simple. Most probably you want to achieve Strong Eventual Consistency that is relatively cheap and provides nice guaranties. For this you will have to you something like  CRDT . Some theoretical results assure us that you cannot find something way cleverer than CRDT that achieves ...

Consistency 2/3 - Flow consistency - read-your-writes consistency

   2. Consistency, Availability and low Latency in Distributed systems (working around the CAP/PACELC theorems)

   Introduction: "1. Cache and Data Consistency in Distributed systems (CAP/PACELC/CRDT)"

   TL;DR

   Full strong Consistency in geographically Distributed systems can only be achieved by sacrificing Availability (per the CAP theorem) and with prohibitive Latency costs (per the PACELC theorem). However, we can still design consistent-enough systems that continue to function when one geographical region is down, without paying the inter-region latency most of the time. While eventual Consistency is often good enough, there are still cases where we want strong consistency for certain read-after-write flows. There is an optimal design that assures strong Consistency inside read-after-write flows. Arguably, this is the highest Consistency level that can be assured without a prohibitive impact on Availability and Latency.

Consistency 1/3 - Cache and Data Consistency in Distributed systems (CAP/PACELC/CRDT)

  Abstract

  There is always a tension between data Consistency and system Availability when Partitioning a system across datacenters (think CAP). Data caching in particular poses interesting challenges. This tension becomes far more acute as soon as you have 2 data centers separated by more than 10ms latency. I present below some of the problems along with possible solutions. In the end I will present an elegant solution that maximizes Availability while providing the needed Consistency level for read-after-write flows. The solution requires the client to carry a monotonic id along the flow. I would postulate that any solution where the client does not carry some consistency info will have higher latency than the presented solution (see chapter "Flow consistency"). The examples below are simplified to be intuitive and easy to understand; however, these learnings also apply to N datacenters.

  How it starts

  Suppose you started with ...
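The excerpt does not show the post's actual design, so the following is only a rough sketch of the general shape of "the client carries a monotonic id along the flow". All names here (`Primary`, `Replica`, `min_version`) are hypothetical. A write returns a monotonically increasing version; the client passes it back on the next read, and the local replica answers only if it has already applied that version, otherwise the read falls back to the primary.

```python
class Primary:
    """Hypothetical remote primary that orders all writes."""

    def __init__(self):
        self.version = 0
        self.data = {}
        self.log = []  # entries are (version, key, value), in version order

    def write(self, key, value):
        self.version += 1
        self.data[key] = value
        self.log.append((self.version, key, value))
        return self.version  # the monotonic id the client carries

    def read(self, key):
        return self.data.get(key)


class Replica:
    """Hypothetical local replica that lags behind the primary."""

    def __init__(self, primary):
        self.primary = primary
        self.applied = 0  # highest version applied locally
        self.data = {}

    def replicate(self):
        # Apply every log entry that has not been applied yet.
        for version, key, value in self.primary.log[self.applied:]:
            self.data[key] = value
            self.applied = version

    def read(self, key, min_version=0):
        # Serve locally only if the replica is at least as fresh as
        # the client's id; otherwise pay the inter-region round trip.
        if self.applied >= min_version:
            return self.data.get(key), "replica"
        return self.primary.read(key), "primary"


primary = Primary()
replica = Replica(primary)
token = primary.write("cart", ["book"])
print(replica.read("cart", min_version=token))  # (['book'], 'primary')
replica.replicate()
print(replica.read("cart", min_version=token))  # (['book'], 'replica')
```

In this sketch, a client that carries no id always gets the fast local answer, which may be stale; only flows that have just written pay the extra latency, and only until replication catches up.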