Sunday, February 17, 2019

Migrating from DynamoDB to Postgres

The mass popularity of NoSQL databases has resulted in them being adopted for all kinds of use cases without evaluating whether they actually fit. One fundamental rule that is usually forgotten is that NoSQL schemas are designed around queries: the schema is initially shaped by the access patterns of the first business use cases. However, business use cases evolve and change in multiple ways, and soon the new ways of interacting with the database become unwieldy. This is the fundamental problem that people usually hit with NoSQL databases.

Where DynamoDB gets into trouble:
  • As the business use cases evolve and change, the need to query the database in multiple ways arises. DynamoDB is hard to query efficiently on anything other than the partition key. One can build secondary indexes, but there is a cost associated with them. Filters can be used instead, but a filter runs on top of a scan, and as the data grows this becomes costly both in money and in time (see the first sketch after this list).
  • DynamoDB is schemaless, so over time the data evolves in multiple ways. In older records, fields may be missing or may carry a different interpretation. Developers keep handling these cases deep in the code to keep the world moving, but soon that turns into too many if-else statements. Migrating such data is painful: one has to be ready for missing fields and handle them with suitable defaults (see the second sketch after this list).
  • There is no referential integrity, so it is easy to write wrong data into relationships, and very difficult to even detect that it has happened. In SQL it is also possible to store a wrong key that happens to satisfy a valid foreign-key constraint, but SQL still provides much better integrity primitives (see the third sketch after this list).
  • This repeats the points above, but from a different perspective. Since it's fine to add any kind of data to the table, people start putting all kinds of data in it. Imagine everyone in the world being given every kind of freedom. Sounds romantic. However, it will soon be chaos, with everyone living in all different ways.
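
To make the first point concrete, here is a minimal sketch (the table and attribute names are hypothetical) contrasting a Query on the partition key with a Scan plus filter. The Query touches only the items under one key, while the Scan reads, and bills for, every item in the table before the filter is applied.

import boto3

dynamo = boto3.client('dynamodb', region_name='<region>')

# Cheap: a Query touches only the items under one partition key.
# Assumes a hypothetical User table with Email as the partition key.
matches = dynamo.query(TableName='User',
                       KeyConditionExpression='Email = :email',
                       ExpressionAttributeValues={':email': {'S': 'alice@example.com'}})

# Costly: a Scan with a filter still reads (and bills for) every item
# in the table; the FilterExpression is applied only after the read.
signups = dynamo.scan(TableName='User',
                      FilterExpression='SignupYear = :year',
                      ExpressionAttributeValues={':year': {'N': '2018'}})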
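
For the second point, a sketch of the defensive defaults that old, schemaless records force on the reading code. The optional attribute names here are hypothetical:

def parse_user(item):
    # Map a raw DynamoDB item to a plain dict, tolerating missing fields.
    # Older records predate the newer attributes, so every optional field
    # needs an explicit default.
    return {
        'email': item['Email']['S'],                      # key attribute, always present
        'name': item.get('Name', {}).get('S', ''),        # added later; default to empty
        'plan': item.get('Plan', {}).get('S', 'free'),    # absent means a legacy free plan
        'logins': int(item.get('LoginCount', {}).get('N', '0')),
    }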
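
For the third point, a small sketch of what Postgres gives you for free: with a foreign key in place, a dangling reference is rejected at write time instead of silently corrupting the data. The schema here is illustrative:

import psycopg2

db = psycopg2.connect(host="<db_host>", database="<db>", user="<user>", password="<pass>")
cur = db.cursor()

# Illustrative schema: every order must point at an existing user.
cur.execute('CREATE TABLE orders (id serial PRIMARY KEY, ' \
            'user_id integer NOT NULL REFERENCES users(id))')
db.commit()

try:
    # user_id 9999 does not exist, so Postgres rejects the insert outright,
    # a write that DynamoDB would have silently accepted.
    cur.execute('INSERT INTO orders(user_id) VALUES (%s)', (9999,))
    db.commit()
except psycopg2.IntegrityError:
    db.rollback()
    print('dangling reference caught at write time')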
Sample code for migration

import boto3
import psycopg2

# Create a connection to DynamoDB. Fill in the required keys. A better way is to
# put them in a config file and pass them through. Inside AWS it's better to use IAM roles.
dynamo = boto3.client('dynamodb', aws_access_key_id='<access-key>', \
                      aws_secret_access_key='<access-secret>', \
                      region_name='<region>')

# Create the database connection
db = psycopg2.connect(host="<db_host>", database="<db>", user="<user>", password="<pass>")
dbCurr = db.cursor()

# Assume we have a User table in DynamoDB; fetch one record by email so it
# can be inserted into the Postgres users table.
email = '<user-email>'
user = dynamo.query(TableName="User", KeyConditionExpression="Email = :email", \
                    ExpressionAttributeValues={":email": {'S': email}})

# The response is a plain dict; each attribute value is wrapped in a type descriptor.
email = user['Items'][0]['Email']['S']

# Note the RETURNING id so that we can use the id of the newly persisted record.
# This can be used to create foreign key relationships for further tables.
userSql = 'INSERT INTO users(email) VALUES(%s) RETURNING id'
userValues = (email,)  # execute() expects a tuple, hence the trailing comma

dbCurr.execute(userSql,userValues)
dbUserId = int(dbCurr.fetchone()[0]) 

db.commit()
dbCurr.close()
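
The snippet above migrates a single record. A real migration has to walk the whole table; below is a minimal sketch of that loop, reusing the dynamo client and db connection from above and paginating the Scan via LastEvaluatedKey:

# Scan returns at most 1 MB of items per call, so follow LastEvaluatedKey
# until the whole table has been copied across.
dbCurr = db.cursor()
scan_kwargs = {'TableName': 'User'}
while True:
    page = dynamo.scan(**scan_kwargs)
    for item in page['Items']:
        dbCurr.execute('INSERT INTO users(email) VALUES(%s)', (item['Email']['S'],))
    db.commit()  # commit one page at a time so a failure loses at most one page
    if 'LastEvaluatedKey' not in page:
        break
    scan_kwargs['ExclusiveStartKey'] = page['LastEvaluatedKey']
dbCurr.close()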
