Saturday, August 9, 2014

ACID Properties

Relational databases guarantee data integrity by maintaining the ACID properties. What do we mean by data integrity? It means that the data is in a valid state before and after every transaction. Let's look at each property in detail.

Atomicity
Atomicity means that a transaction either succeeds completely or fails completely. There is no in-between state. While the transaction is running, the system may pass through partially completed states, but once the transaction is finished the result is either all of its changes or none of them. For example, if you transfer money from Account A to Account B, there are only two possible outcomes: the money is debited from Account A and credited to Account B, or no transfer happens at all. It cannot happen that the money is debited from Account A but not credited to Account B, or vice versa.
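
Here is a minimal sketch of the transfer example, assuming Python's built-in sqlite3 module and an in-memory database. The accounts table, the transfer function and its fail_midway flag are hypothetical names made up for illustration; the flag simulates a failure between the debit and the credit.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one transaction (illustrative helper)."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                # Simulated failure after the debit but before the credit.
                raise RuntimeError("simulated failure")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts").fetchall())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 0)])
conn.commit()

failed = transfer(conn, "A", "B", 100, fail_midway=True)
after_failure = balances(conn)   # the debit was rolled back, nothing changed

succeeded = transfer(conn, "A", "B", 100)
after_success = balances(conn)   # both the debit and the credit were applied
```

The failed attempt leaves both accounts untouched, so only the two outcomes described above are ever visible.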

Consistency
Consistency means that a transaction never leaves the data in a bad state. The data is in a valid state before the transaction and is in a valid state again after it. It may pass through an invalid state while the transaction is running, but not once the transaction is complete. For example, in the bank account transfer, if \$100 is transferred from Account A to Account B, then A must be debited \$100 and B must be credited \$100. It cannot happen that Account A is debited \$200 while B is credited only \$100.
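
One way to picture a "valid state" is as a rule the database refuses to break. The sketch below assumes Python's sqlite3 module and a hypothetical accounts table with a CHECK constraint that forbids negative balances; the rule itself and the helper names are assumptions chosen for illustration, not something every bank schema has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # assumed validity rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

def total(conn):
    """Sum of all balances; a transfer should never change it."""
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchall()[0][0]

def transfer(conn, src, dst, amount):
    """Credit dst, then debit src; the whole transfer is rolled back if a rule is broken."""
    try:
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the credit is undone too.
        return False

total_before = total(conn)
overdraft_ok = transfer(conn, "A", "B", 150)   # A only has 100, so this is refused
total_after_refusal = total(conn)
valid_ok = transfer(conn, "A", "B", 100)
total_after_transfer = total(conn)
```

The total across both accounts is the same before and after each call, whether the transfer went through or was refused.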

Isolation
Multiple transactions can happen concurrently. While Account A is transferring money to Account B, Account B might be transferring money to Account C at the same time. Let's say Account B starts with \$0, Account A is transferring \$100 to Account B in one transaction, and Account B wants to transfer \$100 to Account C in a second transaction. Should the transfer from B to C be allowed while the first transaction is still incomplete? That's where the notion of isolation comes into the picture.

There are multiple isolation levels, which differ in how much of an in-progress transaction other transactions can see. If the second transaction sees the \$100 credited by the first transaction before the first transaction is complete, it is really seeing the dirty (uncommitted) state of the first transaction. What happens if B transfers \$100 to C and the first transaction then fails? C has been paid \$100 that B never actually had, so \$100 vanishes from the bank.

The right isolation level is very important for data integrity. At the same time, you don't want to ensure integrity by allowing only one transaction at a time, because that would kill the system's performance. Databases employ a variety of techniques to provide both the right performance and the right isolation level. On top of that, applications can employ their own techniques to get the performance they need without compromising the integrity of the data.

Durability
Durability means that once a transaction completes, its changes are persisted to long-term storage. The data may change later as part of another transaction, but if no transaction happens, the data should remain in the same state. It essentially means the changes are persisted to the file system. Suppose Account A transfers \$100 to Account B and the transaction succeeds, but some time later Account A still has \$100 less while the money has vanished from Account B. That would be a gain for the bank, but not a desirable situation in terms of data integrity.
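
As a rough illustration, the sketch below commits a transfer to a database file, closes the connection, and reopens the file to check that the change is still there. It assumes Python's sqlite3 module and a temporary file; closing and reopening is only a stand-in for a restart, not a real crash test, and run_demo is a made-up helper name.

```python
import os
import sqlite3
import tempfile

def run_demo(db_path):
    """Commit a transfer, close the database, reopen it and return the balances."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
    conn.commit()

    # The transfer is committed when the "with" block exits without an error.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'A'")
        conn.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'B'")
    conn.close()  # stand-in for the application shutting down

    # A fresh connection reads whatever was persisted to the file.
    reopened = sqlite3.connect(db_path)
    result = dict(reopened.execute("SELECT name, balance FROM accounts").fetchall())
    reopened.close()
    return result

with tempfile.TemporaryDirectory() as tmp:
    after_restart = run_demo(os.path.join(tmp, "bank.db"))
```

After the reopen, Account A still shows the debit and Account B still shows the credit, which is exactly what durability promises.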