Sunday, October 12, 2014

In-Memory Databases

In-memory databases differ from traditional databases in that an in-memory database keeps its data in main memory, whereas a traditional database stores its data on the file system. Because the data sits in memory, in-memory databases are much faster than traditional databases. The primary reason for the performance difference is that in-memory databases do not have to seek on disk.

In terms of ACID properties, in-memory databases satisfy all the requirements apart from durability. As the database exists in memory, it can lose data in case of a power failure. However, there are many techniques that are employed to handle this situation. They range from maintaining transaction logs and writing snapshots to disk to using certain kinds of memory that do not lose data in case of a power failure.
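To make the transaction-log idea concrete, here is a minimal sketch in Python. It is not how any particular product implements durability; the class name TinyStore, the JSON record format and the file name store.log are all made up for illustration. Every write is appended to a log file and flushed to disk before memory is updated, and on startup the log is replayed to rebuild the in-memory state.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory key-value store with an append-only log (hypothetical)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}  # the "database" lives in this dict
        self._replay()

    def _replay(self):
        # Rebuild the in-memory state from the log after a restart.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # Append to the log and force it to disk before updating memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data[key] = value

    def get(self, key):
        # Reads are served from memory only.
        return self.data.get(key)


def demo():
    log_path = os.path.join(tempfile.mkdtemp(), "store.log")
    store = TinyStore(log_path)
    store.put("sensor-1", 21.5)
    store.put("sensor-1", 22.0)
    # Simulate a crash: throw the object away and recover from the log.
    recovered = TinyStore(log_path)
    return recovered.get("sensor-1")
```

A real system would also write periodic snapshots so that the log can be truncated and recovery does not have to replay every write since the beginning.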

In-memory databases are also different from embedded databases: an embedded database runs in the same process as the application, whereas an in-memory database runs in its own process and operates within a client-server architecture.

With the Internet of Things gaining momentum, we are seeing a huge amount of data being generated, much of it falling into the category of fast and big data. For big data that is relatively static, does not arrive at high speed, and does not need to be processed in real time, solutions like Hadoop have proven to work well. Even traditional databases with sharding can handle it. One advantage of Hadoop over traditional SQL databases is that Hadoop can work without a schema being defined upfront. The situation changes when the data is both huge in volume and arriving at a constantly high velocity. If we do not have to process the data in real time, we are still okay, since we can batch it up and put it into persistent storage such as HDFS or a relational database. But what do we do if we need to process the data in real time and run analytics on top of it?

If we only have to do event processing, the situation is still fairly easy to handle: we can inspect the data, raise events based on conditions or filters, and move the data into persistent storage. It becomes more complex when the data has to be processed for analytical insight, which requires running complex queries that involve a variety of data (multiple tables). At times an event cannot be looked at in isolation but has to be examined in context. That is where in-memory databases can be used most effectively.
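As a rough illustration of looking at events in context, the sketch below joins incoming readings with a table of device metadata and aggregates per site, all in memory. It uses SQLite's ":memory:" mode from the Python standard library purely as a stand-in: SQLite is an embedded engine, not a client-server in-memory database, and the table names, columns and sample values are made up for the example.

```python
import sqlite3


def hot_sites(threshold):
    """Return (site, average temperature) for sites above the threshold."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    conn.execute("CREATE TABLE devices (id INTEGER PRIMARY KEY, site TEXT)")
    conn.execute("CREATE TABLE readings (device_id INTEGER, temp REAL)")

    # Hypothetical sample data: two devices, two readings each.
    conn.executemany(
        "INSERT INTO devices VALUES (?, ?)",
        [(1, "plant-a"), (2, "plant-b")],
    )
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [(1, 70.0), (1, 90.0), (2, 40.0), (2, 50.0)],
    )

    # A single reading says little; the join puts it in the context of its site.
    rows = conn.execute(
        """
        SELECT d.site, AVG(r.temp) AS avg_temp
        FROM readings r
        JOIN devices d ON d.id = r.device_id
        GROUP BY d.site
        HAVING AVG(r.temp) > ?
        ORDER BY d.site
        """,
        (threshold,),
    ).fetchall()
    conn.close()
    return rows
```

Calling hot_sites(60) on this sample data returns only the site whose average temperature is above 60, which is a question a simple per-event filter could not answer.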

There is another line of thinking as well: if we know that the data can fit into memory, then why not use an in-memory database? Queries do not have to wait on disk I/O, so everything is fast. Many commercial database vendors are adopting this idea and offering products in which the database sits entirely in memory. SAP HANA and VoltDB are two examples.
