Acronyms

SOLID

Design principles for object-oriented languages:

Single responsibility

  • Each class should have only one purpose (do one thing)
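A minimal sketch of the idea, assuming a hypothetical invoicing example (the names `Invoice`, `InvoiceFormatter` and `InvoiceRepository` are made up for illustration): calculation, rendering and storage each live in their own class, so each class has one reason to change.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    """Holds invoice data and the calculation that belongs to it."""
    net_cents: int
    tax_percent: int

    def total_cents(self) -> int:
        return self.net_cents + self.net_cents * self.tax_percent // 100


class InvoiceFormatter:
    """Only renders an invoice as text."""

    def render(self, invoice: Invoice) -> str:
        total = invoice.total_cents()
        return f"Total due: {total // 100}.{total % 100:02d}"


class InvoiceRepository:
    """Only stores invoices (in memory here, for illustration)."""

    def __init__(self):
        self._rows = []

    def save(self, invoice: Invoice) -> None:
        self._rows.append(invoice)

    def count(self) -> int:
        return len(self._rows)
```

A change to the output format touches only `InvoiceFormatter`; a change to storage touches only `InvoiceRepository`.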

Open-closed principle

  • A class should be open for extension and closed for modification

  • You should not need to rewrite an existing class to implement a new feature

Polymorphism helps here: define an interface and add a new implementation for each new behavior. Long switch/case blocks or if/else-if chains that branch on a type are a code smell.
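As a rough illustration, assuming a hypothetical discount calculation (all class and function names here are invented): `checkout` depends only on the `Discount` interface, so a new discount type is added as a new class instead of another `elif` branch.

```python
from abc import ABC, abstractmethod


class Discount(ABC):
    """Extension point: one implementation per discount rule."""

    @abstractmethod
    def apply(self, price: float) -> float:
        ...


class NoDiscount(Discount):
    def apply(self, price: float) -> float:
        return price


class PercentDiscount(Discount):
    def __init__(self, percent: float):
        self.percent = percent

    def apply(self, price: float) -> float:
        return price * (100 - self.percent) / 100


def checkout(price: float, discount: Discount) -> float:
    # No branching on the discount type: the object decides.
    return discount.apply(price)


# A new feature is a new class; checkout() is not modified.
class FlatDiscount(Discount):
    def __init__(self, amount: float):
        self.amount = amount

    def apply(self, price: float) -> float:
        return max(0, price - self.amount)
```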

Liskov substitution

  • Every subclass should be substitutable for its parent class

E.g. when a child class throws Exception("operation is not supported") from an overridden method, it breaks this rule.
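The violation can be sketched as follows, assuming a hypothetical `Document` hierarchy (names are illustrative only). Code written against the parent class works with `Document` but fails as soon as it receives the subclass.

```python
class Document:
    def __init__(self, text: str):
        self.text = text

    def read(self) -> str:
        return self.text

    def write(self, text: str) -> None:
        self.text = text


class ReadOnlyDocument(Document):
    # Violation: callers holding a Document can no longer rely on write().
    def write(self, text: str) -> None:
        raise NotImplementedError("operation is not supported")


def append_footer(doc: Document) -> str:
    """Written against the parent type; assumes write() works."""
    doc.write(doc.read() + "\n-- end --")
    return doc.read()
```

One possible fix is to make the read-only type the parent (with `read` only) and add `write` in a writable subclass, so no subclass has to remove behavior.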

Interface segregation

  • Interfaces should not force a class to implement operations it cannot perform

  • Large interfaces should be split into small ones
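A small sketch, assuming hypothetical printer and scanner devices (all names invented for illustration): instead of one large "multifunction device" interface, each capability gets its own small interface, and a class implements only what it can actually do.

```python
from typing import Protocol


class Printer(Protocol):
    def print_page(self, text: str) -> str:
        ...


class Scanner(Protocol):
    def scan(self) -> str:
        ...


class BasicPrinter:
    """Can only print, so it implements only the Printer interface."""

    def print_page(self, text: str) -> str:
        return f"printed: {text}"


class OfficeMachine:
    """Can do both, so it satisfies both small interfaces."""

    def print_page(self, text: str) -> str:
        return f"printed: {text}"

    def scan(self) -> str:
        return "scanned page"


def print_report(device: Printer, text: str) -> str:
    # Needs only the printing capability, nothing else.
    return device.print_page(text)
```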

Dependency inversion

  • Components should depend on abstractions, not on concretions (details)

Dependency Inversion Principle (DIP) is neither dependency injection (DI) nor inversion of control (IoC). DI is just a way to achieve IoC.

Classes depend on interfaces, not on concrete implementations.
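A minimal sketch, assuming a hypothetical sign-up flow (`MessageSender`, `SignupService` and `InMemorySender` are made-up names): the high-level service knows only the abstraction. Passing the sender through the constructor is dependency injection, which here is merely the way the abstraction gets supplied.

```python
from abc import ABC, abstractmethod


class MessageSender(ABC):
    """Abstraction that the high-level code depends on."""

    @abstractmethod
    def send(self, to: str, body: str) -> None:
        ...


class SignupService:
    """High-level policy: knows nothing about SMTP, queues, etc."""

    def __init__(self, sender: MessageSender):
        self._sender = sender

    def register(self, email: str) -> None:
        self._sender.send(email, "Welcome!")


class InMemorySender(MessageSender):
    """Low-level detail; a real one might talk to a mail server."""

    def __init__(self):
        self.outbox = []

    def send(self, to: str, body: str) -> None:
        self.outbox.append((to, body))
```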

What about Spring? Should we always create an interface -> implementation pair?

An interface makes sense only when we expect it to have multiple implementations; otherwise it just complicates the code.

ACID

ACID is a set of properties for DB transactions:

  • Atomicity - a transaction contains 1..N statements; either all of them succeed or none of them is applied.

  • Consistency - ensures that a transaction can only bring the database from one consistent state to another, preserving database invariants: any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. This prevents database corruption by an illegal transaction.

  • Isolation - ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.

  • Durability - guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure.
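Atomicity and consistency can be illustrated with a short sketch using Python's built-in `sqlite3` module. The `accounts` table, the account names and the helper functions are assumptions made up for this example. The transfer credits the receiver first on purpose: when the debit then violates the `CHECK` constraint, the already executed credit is rolled back as well.

```python
import sqlite3


def open_bank():
    """Create an in-memory database with a 'no negative balance' invariant."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Both updates are committed together, or neither is applied."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Calling `transfer(conn, "alice", "bob", 500)` on a fresh database returns `False` and leaves both balances untouched, while a transfer of 30 succeeds and changes both rows.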

ACID-compliant storage: all SQL engines (Oracle, MySQL, PostgreSQL, and MS SQL Server are a few examples) and Neo4j. MongoDB is often listed as well, but Jepsen testing has shown that it actually is not.

ACID-compliant databases are widely used in industries like finance, healthcare, and government administration, where data storage is highly regulated.

BASE

BASE is the model followed by many NoSQL databases:

  • Basically Available – Rather than enforcing immediate consistency, BASE-modelled NoSQL databases will ensure availability of data by spreading and replicating it across the nodes of the database cluster.

  • Soft State – Due to the lack of immediate consistency, data values may change over time. The BASE model breaks with the concept of a database which enforces its own consistency, delegating that responsibility to developers.

  • Eventually Consistent – The fact that BASE does not enforce immediate consistency does not mean that it never achieves it. However, until it does, data reads are still possible (even though they might not reflect the reality).
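The three properties can be sketched with a toy in-memory model. This is not how any real database is implemented; `EventuallyConsistentStore` and its `sync` step are assumptions for illustration. A write is acknowledged as soon as one replica has it, other replicas serve stale reads in the meantime, and they converge once the pending updates are delivered.

```python
class EventuallyConsistentStore:
    def __init__(self, replica_count: int = 2):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica_index, key, value) not yet delivered

    def write(self, key, value):
        # Acknowledge after updating a single replica (basically available).
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica: int = 0):
        # Reads always succeed, but may return stale data (soft state).
        return self.replicas[replica].get(key)

    def sync(self):
        # Background replication catches up (eventually consistent).
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()
```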

BASE-compliant databases are almost exclusively used by large companies in relatively unregulated spaces that process several terabytes or more of data every day.

Popular BASE-compliant databases include BigTable and DynamoDB, as well as Cassandra and Hadoop.

Details

BASE is diametrically opposed to ACID. Where ACID is pessimistic and forces consistency at the end of every operation, BASE is optimistic and accepts that the database consistency will be in a state of flux. Although this sounds impossible to cope with, in reality it is quite manageable and leads to levels of scalability that cannot be obtained with ACID.

CAP theorem

  1. Consistency: Every read receives the most recent write or an error.

  2. Availability: Every request receives a response, without guarantee that it contains the most recent version of the information. Access continues even during a partial system failure.

  3. Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

An interesting explanation using the example of a company called "Call us, we'll remind you"

AP: assume the connection between C1 and C2 is degraded; the system still continues to operate (this is P). If clients write different data to C1 and C2, the nodes cannot sync the values over the network, so clients will read inconsistent values. But the system remains Available, as it accepts all read and write requests.

CP: assume the connection between C1 and C2 is degraded; the system still continues to operate (this is P). If the system requires Consistency, then when a client tries to write a new value to C1, C1 realises that it cannot sync this value with C2 and rejects the write. Reads are still accepted; only writes are rejected. So the system is not fully available.

CA: all writes are synced between C1 and C2 and every request is served (this is Availability), and all reads return the same value (this is Consistency). But such a system has no tolerance for partitions. E.g. single-node systems.
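The AP and CP behaviours described above can be sketched with a toy two-node model. The `Cluster` class and its `partitioned` flag are assumptions for illustration, not a real replication protocol. During a partition the AP cluster accepts writes and lets the nodes diverge, while the CP cluster rejects writes and keeps the nodes equal.

```python
class Node:
    def __init__(self):
        self.value = None


class Cluster:
    def __init__(self, mode: str):
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.c1, self.c2 = Node(), Node()
        self.partitioned = False  # True: C1 and C2 cannot talk to each other

    def write(self, node: Node, value) -> bool:
        other = self.c2 if node is self.c1 else self.c1
        if not self.partitioned:
            node.value = other.value = value  # synced write
            return True
        if self.mode == "CP":
            return False  # cannot sync, so reject: consistent, less available
        node.value = value  # accept locally: available, nodes may diverge
        return True

    def read(self, node: Node):
        return node.value  # reads are served in both modes
```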

CAP Theorem Trade-Offs

For simplicity, let's apply the CAP theorem to a master-slave replication cluster. Let's also imagine that a network partition isolates the master node, so that it cannot communicate with the rest of the nodes in the cluster or with clients, or that the master node goes down.

Consistency and Availability Trade-Off

When the master node is down, Client 1 has a few options to retrieve information.

Consistency: to achieve consistency, the client can:

  • Connect to the master, in which case the request fails.

  • Wait and retry until the master node is up and running, which also violates availability.

Availability: the client can connect to another replica and read stale or inconsistent information.

This is the trade-off the client needs to make: as the CAP theorem describes, in case of a network partition the client can choose either Consistency or Availability.

PACELC theorem

The PACELC theorem is an extension to the CAP theorem. Both theorems were developed to provide a framework for comparing distributed systems. Like the CAP theorem, the PACELC theorem states that in case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C). PACELC extends the CAP theorem by introducing latency and consistency as additional attributes of distributed systems. The theorem states that, “else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consistency (C).”

Let’s take the previous example with the master-slave cluster setup. This time let’s assume that no network partition happens and the master node is up and running, but at some point it is overwhelmed by client requests and its performance degrades.

The Latency Consistency Trade-Off

Client 1 has the following options to retrieve information; this is the trade-off the client needs to make, as described by the PACELC theorem.

Consistency: to get consistent information, the client still queries the master node but receives a slow response (high latency).

Latency: the client can connect to another replica and receive a low-latency response, but the information may be stale or inconsistent.
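The two options can be sketched with a toy model in which latency is just a number attached to the response. The `ReplicatedStore` class, the `prefer` parameter and the latency figures are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ReadResult:
    value: str
    latency_ms: int


class ReplicatedStore:
    """Overloaded master (slow, fresh) plus a lagging replica (fast, stale)."""

    def __init__(self, master_latency_ms: int = 200, replica_latency_ms: int = 5):
        self.master_value = "v1"
        self.replica_value = "v1"
        self.master_latency_ms = master_latency_ms
        self.replica_latency_ms = replica_latency_ms

    def write(self, value: str) -> None:
        self.master_value = value  # the replica lags until replicate() runs

    def replicate(self) -> None:
        self.replica_value = self.master_value

    def read(self, prefer: str) -> ReadResult:
        if prefer == "consistency":
            return ReadResult(self.master_value, self.master_latency_ms)
        if prefer == "latency":
            return ReadResult(self.replica_value, self.replica_latency_ms)
        raise ValueError("prefer must be 'consistency' or 'latency'")
```

No partition is involved here: both nodes are reachable, and the client simply chooses between a fresh but slow answer and a fast but possibly stale one.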

Of course, the PACELC theorem applies to other scenarios too, wherever clients or nodes within a cluster need to trade off latency against consistency when no network partition occurs.

  • The default versions of DynamoDB, Cassandra, and Cosmos DB are PA/EL systems: if a partition occurs, they give up consistency for availability, and under normal operation they give up consistency for lower latency.

  • MongoDB can be classified as a PA/EC system. In the baseline case, the system guarantees reads and writes to be consistent.

  • Fully ACID systems such as VoltDB/H-Store, Megastore and MySQL Cluster are PC/EC: they refuse to give up consistency, and will pay the availability and latency costs to achieve it. BigTable and related systems such as HBase are also PC/EC.

KISS
