Meaning of ACID
We all know that the guarantees provided by transactions are often described by the well-known acronym ACID, which stands for Atomicity, Consistency, Isolation, and Durability. In practice, however, one database’s implementation of ACID does not always equal another’s. Systems that do not meet the ACID criteria are sometimes called BASE, which stands for Basically Available, Soft state, and Eventual consistency. Let’s go through the definitions of the terms in ACID.
Atomicity
Atomic usually refers to something that cannot be broken down into smaller parts (we have all been through those chemistry classes where the atom was described as the smallest unit of matter). The word means different things in different contexts. For example, in multi-threaded programming, if one thread executes an atomic operation, there is no way another thread could see the half-finished result of that operation. The system can only be in the state it was in before the operation or after the operation, not something in between. By contrast, in the context of ACID, atomicity is not about concurrency. It does not describe what happens if several processes try to access the same data at the same time, because that is covered under the separate term Isolation. Rather, ACID atomicity describes what happens if a client wants to make several writes, but a fault occurs after some of the writes have been processed, for example:
- A process crashes
- A network connection gets interrupted
- A disk becomes full
- Some integrity constraint violation occurs
In any of the above cases, if the writes are grouped together into an atomic transaction and the transaction cannot be completed (committed) due to a fault, then the transaction is aborted and the database must discard or undo any writes it has made so far in that transaction.
Without this guarantee, if an error occurs partway through making multiple changes, it’s difficult to know which changes have taken effect and which haven’t. The application could try again, but that risks making the same change twice, leading to duplicate or incorrect data. Atomicity solves this problem: if a transaction is aborted, the application can be sure that none of its writes took effect, so it can safely retry without corrupting the data.
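To make the abort-and-retry idea concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The accounts table, the names alice and bob, and the fail_midway flag are all made up for illustration; the flag simply simulates a fault after the first of two writes.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, fail_midway=False):
    """Debit src and credit dst as one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            if fail_midway:
                # Assumption for the demo: a fault strikes between the two writes.
                raise RuntimeError("simulated fault after the first write")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = make_bank()
print(transfer(conn, "alice", "bob", 30, fail_midway=True), balances(conn))  # aborted, nothing changed
print(transfer(conn, "alice", "bob", 30), balances(conn))                    # retry applies the change once
```

Because the failed attempt leaves both balances untouched, the retry cannot debit alice twice.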
Consistency
Consistency in terms of ACID is the idea that you have certain invariants about the data that must always be true. For example, in a banking system, credits and debits across all accounts must always be balanced. If a transaction starts with a database that is valid according to these invariants, and any writes during the transaction preserve that validity, then we can be sure that the invariants are always satisfied. If we think about this a little more deeply, we can see that the idea of consistency depends on the application’s notion of invariants, and it’s the application’s responsibility to define its transactions correctly so that consistency is preserved. This is not something the database can guarantee on its own. If we write bad data that violates our invariants, the database won’t stop us. (Some specific invariants, like foreign key constraints and uniqueness constraints, can be checked by the database. In general, however, the application defines what data is valid or invalid, and the database only stores it.)
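A small sketch of that split in responsibility, again with sqlite3. The ledger table and its columns are hypothetical. The database is told about one invariant (a uniqueness constraint) and enforces it, but it knows nothing about the rule that credits and debits must balance, so it happily accepts a write that breaks it.

```python
import sqlite3


def make_ledger():
    """Create a hypothetical ledger; only the uniqueness rule is declared to the database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ledger ("
        " txn_id INTEGER NOT NULL,"
        " account TEXT NOT NULL,"
        " amount INTEGER NOT NULL,"  # positive = credit, negative = debit
        " UNIQUE (txn_id, account))"
    )
    return conn


def try_insert(conn, txn_id, account, amount):
    """Return True if the database accepted the row, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO ledger VALUES (?, ?, ?)", (txn_id, account, amount))
        return True
    except sqlite3.IntegrityError:
        return False


def ledger_is_balanced(conn):
    """Application-level invariant: all credits and debits sum to zero."""
    (total,) = conn.execute("SELECT COALESCE(SUM(amount), 0) FROM ledger").fetchone()
    return total == 0


conn = make_ledger()
try_insert(conn, 1, "alice", -30)
try_insert(conn, 1, "bob", 30)
print(ledger_is_balanced(conn))           # balanced so far
print(try_insert(conn, 1, "alice", -30))  # duplicate row: the database rejects it
print(try_insert(conn, 2, "alice", -10))  # one-sided debit: the database accepts it
print(ledger_is_balanced(conn))           # the application's invariant is now broken
```

Keeping the ledger balanced is therefore up to the application, for example by always writing the debit and the matching credit in the same transaction.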
Isolation
Most databases are accessed by several clients at the same time. This isn’t a problem if they are reading and writing different parts of the database, but if they are accessing the same database records, we can run into concurrency problems, often called race conditions.
The following diagram shows a simple example of the problem. Say we have two clients simultaneously incrementing a counter that is stored in a database. Each client needs to read the current value, add 1, and write the new value back (assuming there is no increment operation built into the database). In the diagram, the counter should have increased from 42 to 44, because two increments happened, but it actually only went to 43 because of the race condition: both clients read 42, and the second write overwrote the first.
Isolation in the sense of ACID means that concurrently executing transactions are isolated from each other; they cannot step on each other’s toes. In its strongest form, called serializability, the database guarantees that the result is the same as if the transactions had run serially (one after the other), even though they may actually have run concurrently. One simple way to achieve this is to literally execute them one at a time, for example by having a single thread perform all updates to the database. In practice, however, serializable isolation is rarely used, because it carries a performance penalty.
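The counter example can be reproduced in a few lines of plain Python. This is only a sketch: CounterStore is a made-up stand-in for a database record, the racy schedule is written out by hand so that it is deterministic, and a lock stands in for whatever mechanism a real database would use to make each read-modify-write cycle behave as if it ran alone.

```python
import threading


class CounterStore:
    """Hypothetical stand-in for a single database record holding a counter."""

    def __init__(self, value):
        self.value = value

    def read(self):
        return self.value

    def write(self, value):
        self.value = value


def interleaved_increments(store):
    """Two clients both read before either writes, so one update is lost."""
    seen_by_client_1 = store.read()
    seen_by_client_2 = store.read()
    store.write(seen_by_client_1 + 1)
    store.write(seen_by_client_2 + 1)  # overwrites client 1's update
    return store.read()


def serial_increments(store, clients=2):
    """Each read-modify-write runs as one unit, as if the clients ran one after the other."""
    lock = threading.Lock()

    def increment():
        with lock:
            store.write(store.read() + 1)

    threads = [threading.Thread(target=increment) for _ in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return store.read()


print(interleaved_increments(CounterStore(42)))  # 43: one increment was lost
print(serial_increments(CounterStore(42)))       # 44: both increments took effect
```

The lock makes the two increments effectively serial, which is also why this approach costs throughput: the second client has to wait for the first.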
Durability
The purpose of a database system is to provide a safe place where data can be stored without the fear of losing it (the data outlives the program that created it). Durability is the promise that once a transaction has committed successfully, any data it has written will not be forgotten, even if there is a hardware fault or the database crashes. If you have read Do’s and don’t of error handling, we saw there that a single-node database can never be fully reliable and that 100% reliability can never be achieved. However, to get as close to durability as possible, databases use mechanisms like replication to reduce the risk of data being lost.
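As a rough illustration of the “once committed, not forgotten” promise, the sketch below writes to a SQLite file in a temporary directory, closes the connection, and opens the file again. The notes table and its contents are made up, and closing the connection is only a gentle stand-in for a crash; it does not simulate a hardware fault or say anything about replication.

```python
import os
import sqlite3
import tempfile


def rows_after_reopen():
    """Write one committed and one uncommitted row, then reopen the file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE notes (body TEXT)")
        conn.execute("INSERT INTO notes VALUES ('committed')")
        conn.commit()  # from here on, this row should survive
        conn.execute("INSERT INTO notes VALUES ('never committed')")
        conn.close()   # closing without commit discards the pending write

        conn = sqlite3.connect(path)  # a fresh connection, as after a restart
        rows = [body for (body,) in conn.execute("SELECT body FROM notes")]
        conn.close()
        return rows


print(rows_after_reopen())  # only the committed row is still there
```

Only the write that was committed comes back after the reopen, which is exactly the boundary that durability is about.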