Posts

Consistency 3/3 - Data Consistency and the "Theory of relativity"

"If a tree falls in a forest and no one is around to hear it, does it make a sound?" If a system has an inconsistency but no one is able to observe it, is it still an inconsistency? Introduction We need to scale data processing systems geographically to achieve lower Latency and (at least partial) Availability in case of network Partitioning. But CAP / PACELC tells us that we cannot achieve strong Consistency in this case. When we increase the Consistency requirements, we have to accept lower Availability and higher Latency. What is the minimum consistency level that we need? If eventual consistency is enough for your system, things are pretty clear and relatively simple. Most probably you want to achieve Strong Eventual Consistency, which is relatively cheap and provides nice guarantees. For this you will have to use something like CRDT. Some theoretical results assure us that you cannot find something way cleverer than CRDT that achieves ...
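The Strong Eventual Consistency guarantee mentioned above can be illustrated with one of the simplest CRDTs, a grow-only counter (G-Counter): each replica increments only its own slot, and merging takes the per-replica maximum, so all replicas converge to the same value regardless of message order or duplication. This is a minimal sketch; the class and method names are illustrative, not from any particular library.

```java
import java.util.HashMap;
import java.util.Map;

// G-Counter: a grow-only counter CRDT.
class GCounter {
    private final String replicaId;
    private final Map<String, Long> counts = new HashMap<>();

    GCounter(String replicaId) { this.replicaId = replicaId; }

    // A replica only ever increments its own slot.
    void increment() {
        counts.merge(replicaId, 1L, Long::sum);
    }

    // Merge is commutative, associative and idempotent: element-wise max.
    // These three properties are what make convergence order-independent.
    void merge(GCounter other) {
        other.counts.forEach((id, c) -> counts.merge(id, c, Math::max));
    }

    // The counter value is the sum over all replica slots.
    long value() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

Two replicas can increment concurrently and exchange merges in any order; both end up reading the same total.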

Consistency 2/3 - Flow consistency - read-your-writes consistency

2. Consistency, Availability and low Latency in Distributed systems (working around the CAP/PACELC theorems) Introduction: "1. Cache and Data Consistency in Distributed systems (CAP/PACELC/CRDT)" TL;DR Full strong Consistency in geographically Distributed systems can only be achieved by sacrificing Availability (per the CAP theorem) and with prohibitive Latency costs (per the PACELC theorem). However, we can still design consistent-enough systems that continue to function when one geographical region is down, and without paying the inter-region latency most of the time. While eventual Consistency is often acceptable, there are still cases where we want strong read-after-write consistency for certain flows. There is an optimal design that assures strong Consistency within read-after-write flows. Arguably, this is the highest Consistency level that can be assured without a prohibitive impact on Availability and Latency.
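The read-after-write guarantee described above can be sketched as follows: the write returns a monotonic version token, the client carries that token along the flow, and a replica serves the read only if it has caught up to that version (otherwise the caller falls back to a fresher replica or the leader). All names here are hypothetical, a sketch of the idea rather than the article's actual design.

```java
import java.util.concurrent.atomic.AtomicLong;

// One replica of a trivially replicated value, with a monotonic version.
class Replica {
    private final AtomicLong appliedVersion = new AtomicLong(0);
    private volatile String data = "";

    // Leader-side write: apply locally and return the new version token,
    // which the client carries for the rest of the flow.
    long write(String value) {
        data = value;
        return appliedVersion.incrementAndGet();
    }

    // Follower-side: apply a write received via (asynchronous) replication.
    void replicate(String value, long version) {
        data = value;
        appliedVersion.set(version);
    }

    // Serve the read only if this replica is at least as fresh as the
    // token the client carries; null means "not caught up, retry elsewhere".
    String readAtLeast(long minVersion) {
        if (appliedVersion.get() < minVersion) return null;
        return data;
    }
}
```

A follower that has not yet applied the write refuses the read, so the client never observes data older than its own write, while reads that carry no token (or an older token) can still be served locally at low latency.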

Consistency 1/3 - Cache and Data Consistency in Distributed systems (CAP/PACELC/CRDT)

Abstract There is always a tension between data Consistency and system Availability when Partitioning a system across datacenters (think CAP). Data caching in particular poses interesting challenges. This tension becomes much more acute as soon as you have 2 data centers separated by more than 10ms of latency. I present below some of the problems along with possible solutions. At the end I present an elegant solution that maximizes Availability while providing the needed Consistency level for read-after-write flows. The solution requires the client to carry a monotonic id along the flow. I would postulate that any solution where the client doesn't carry some consistency info will have a higher latency than the presented solution (see the chapter "Flow consistency"). The examples below are simplified to be intuitive and easy to understand; however, these learnings also apply to N datacenters. How it starts Suppose you started with ...

Reusable building blocks in software

When we design a building, we usually base the design on flat walls and right angles. Why is that? Sometimes we design a round form; however, that becomes much more complex to build. Even then, we try to stick with simple forms like circular arcs. The Sydney Opera House was architected with unusual round forms for the roof. That project "was completed ten years late and 1,357% over budget" ( source ). The project really gained speed when they reduced the complexity of the sail-like round forms to reusable (smaller) building blocks. In software you have the same problem. If you don't find reusable building blocks - as reusable code or reusable patterns - the complexity of the project grows exponentially, and you find yourself over budget (in time and cost). In software, you rarely need something as sophisticated as the Sydney Opera House. Therefore, try to reduce complexity as much as possible if you want to finish your project in a realistic time. Simplicity is the ultimate sophistication ...

Bounded contexts or consistency contexts?

"Embrace modularity but beware of granularity" ( Mark Richards )  While the microservice architecture can buy you some agility if done right, I often see architectures where microservices bring extra complexity that actually increases implementation time. The microservice architecture often brings incidental complexity, frequently caused by an uninspired choice of microservice boundaries. Bounded contexts should guide the choice of microservice boundaries. However, I find "bounded context" too ambiguous a concept. Any unit of software can be seen as a bounded context, even a class. A Payment sounds like a bounded context. What about a CreditCard - can it have its own bounded context? We certainly don't want to create a microservice for each class. Think about this when you consider creating another microservice: are you getting too close to the "microservice per class" anti-pattern? On the other hand, most real-life software syste...

Dependency Inversion for Entities - software architecture

Problem to solve: We create a core software entity, for example a Product - this will become the root or parent entity. Then we create multiple "child" entities that depend on Product, for example Order and Warranty. We have business rules for the "parent" entity that depend on the "child" entities. For example, you cannot remove a Product if there are in-flight Orders, or if the Warranty period is not over for all instances of that Product. We want to avoid putting all those business rules in Product, because such "tight coupling" would make the software harder to maintain as the system grows. Solution: TL;DR: It helps to imagine the entities Product, Order and Warranty as defined in separate software modules, either in a modular monolith or in a microservice architecture. This highlights the loose coupling we want to achieve. Howeve...
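One way the inversion above can look in code: the Product module owns an abstraction that the "child" modules implement, so Product never references Order or Warranty directly - the dependency arrow points from child to parent. This is only a sketch of the pattern; the interface and class names are illustrative, not taken from the article.

```java
import java.util.List;

// Defined in the Product module: child modules implement this to veto removal.
interface RemovalVeto {
    boolean forbidsRemoval(String productId);
}

// Also in the Product module: knows only the abstraction, not Order/Warranty.
class ProductService {
    private final List<RemovalVeto> vetoes;

    ProductService(List<RemovalVeto> vetoes) { this.vetoes = vetoes; }

    boolean tryRemove(String productId) {
        // Ask every registered child module whether removal is allowed.
        for (RemovalVeto veto : vetoes) {
            if (veto.forbidsRemoval(productId)) return false;
        }
        // ... actually delete the product here
        return true;
    }
}

// Defined in the Order module: vetoes removal while orders are in flight.
class OrderRemovalVeto implements RemovalVeto {
    public boolean forbidsRemoval(String productId) {
        return hasInFlightOrders(productId);
    }
    // Stand-in for a real repository/query.
    private boolean hasInFlightOrders(String productId) {
        return "p1".equals(productId);
    }
}
```

Adding a Warranty rule later means adding another RemovalVeto implementation in the Warranty module, with no change to the Product module at all.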

Microservices: Software decomposition is not for performance, it is for the human brain

You can actually prove that most software systems would run faster as a monolith than in a microservice architecture. Just put all the microservices back together into a monolith, mentally. Replace all the REST calls with direct Java calls; this eliminates a lot of the extra work to serialize/deserialize. The resulting system will consume fewer resources overall just from less serialization. Latency should be lower without the network calls. If you eliminate all the workarounds added just to make distributed transactions work, performance should be significantly higher - with proper load balancing, of course. There are some edge cases, for example if the resulting application does not fit into DRAM; however, this is rarely the case. Horizontal scalability should still work if you deploy enough of those monoliths and correctly balance the work among them. Of course, the startup time might be higher for the monolith. The risk of Out Of Memory from one module to the o...