Sunday, August 28, 2016

Pandas : Plotting a data series

A common need is to plot data from a pandas DataFrame that holds several series. For example, consider the following data:

Student, Class, Marks
A,       1,     56
A,       2,     67
A,       3,     89
A,       4,     76
A,       5,     76
B,       1,     78
B,       2,     99
B,       3,     75
B,       4,     44
B,       5,     77
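
One way to plot this data, sketched under a few assumptions: the table is available as CSV text, each student should become one line on the chart, and matplotlib is installed for the drawing step. The function names marks_by_student and plot_marks are made up for this illustration.

```python
import io

import pandas as pd

# The sample data from the table above, as CSV text.
CSV_DATA = """Student,Class,Marks
A,1,56
A,2,67
A,3,89
A,4,76
A,5,76
B,1,78
B,2,99
B,3,75
B,4,44
B,5,77
"""


def marks_by_student(csv_text):
    """Return one column of marks per student, indexed by class."""
    # skipinitialspace tolerates padding after the commas, as in the table above.
    df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
    return df.pivot(index="Class", columns="Student", values="Marks")


def plot_marks(wide):
    """Draw one line per student; this step needs matplotlib."""
    ax = wide.plot(marker="o", title="Marks by class")
    ax.set_xlabel("Class")
    ax.set_ylabel("Marks")
    return ax
```

Calling plot_marks(marks_by_student(CSV_DATA)) and then matplotlib.pyplot.show() should display the chart, with class on the x-axis and one line each for students A and B.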

Python finding the encoding of a file

chardet is a universal character encoding detector for Python. It detects the encoding of a file and also reports a confidence score for its guess. To use chardet in your environment, install it with your package manager. With pip it would look like

pip install chardet
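
Once chardet is installed, a minimal sketch of using it could look like the following. It relies on chardet.detect, which takes raw bytes and returns a dictionary with 'encoding' and 'confidence' keys. The helper names are made up for this illustration.

```python
import chardet


def detect_encoding(raw):
    """Return (encoding, confidence) as guessed by chardet for raw bytes."""
    result = chardet.detect(raw)  # dict with 'encoding' and 'confidence' keys
    return result["encoding"], result["confidence"]


def detect_file_encoding(path):
    """Read a file in binary mode and guess its encoding."""
    with open(path, "rb") as fh:
        return detect_encoding(fh.read())
```

The result is a guess, not a guarantee, so it is worth checking the confidence value before trusting the encoding, especially for short files.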

Tuesday, June 7, 2016

Serverless Architecture

The first thing to understand about serverless architecture is that it's not about the absence of servers. It means that, as a developer, you are not concerned with the server. You provide a piece of code to an environment, which executes it and returns the results to you. You are only responsible for providing the code, and the code generally has to adhere to some contract so that the execution environment can understand it. AWS Lambda is an example of serverless architecture.
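
To make the idea of a contract concrete, here is a minimal sketch of a Python function written in the style of an AWS Lambda handler, which receives an event and a context argument. The shape of the event (a "name" field) and of the returned dictionary are assumptions made for this illustration.

```python
import json


def lambda_handler(event, context):
    """Entry point called by the platform; `event` carries the input payload."""
    # The "name" field is a hypothetical input chosen for this example.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

The function can be exercised locally by calling lambda_handler({"name": "Ada"}, None). Nothing in it deals with servers, ports, or processes, which is the point of the model.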

Monday, June 6, 2016

Streaming data from Result Set

This example shows a way to stream database records in JSON format. This post uses a PostgreSQL database and assumes a table exists. However, you can point it at any database and table structure after making the required adjustments.

public class StreamingService {

  public void handleRequest(String sql, OutputStream op) throws IOException 
   //Initialize the driver
   try {
   } catch (Exception e) {
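
The same idea can be sketched in Python. This is not the Java service above: it swaps in the standard-library sqlite3 module for PostgreSQL so that it runs without a server, and the function names are made up for this illustration. Each row is written to the output as soon as it is read, so the full result set is never held in memory as a list.

```python
import io
import json
import sqlite3


def stream_rows_as_json(conn, sql, out):
    """Write the rows returned by `sql` to `out` as a JSON array, row by row."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    out.write("[")
    for index, row in enumerate(cursor):  # the cursor yields one row at a time
        if index:
            out.write(",")
        out.write(json.dumps(dict(zip(columns, row))))
    out.write("]")


def rows_to_json_text(conn, sql):
    """Convenience wrapper that collects the streamed output in a string."""
    buffer = io.StringIO()
    stream_rows_as_json(conn, sql, buffer)
    return buffer.getvalue()
```

In a real service, `out` would be the response stream rather than an in-memory buffer.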


Tuesday, April 19, 2016

Mysql Cheat Sheet

To list all the databases, their tables, and the number of records in each table:

select table_schema,table_name,table_rows from information_schema.tables;

The output is displayed in the following format:

 | table_schema | table_name | table_rows |

Note: table_rows is only an estimate for some storage engines (InnoDB in particular), so it may not match the real row count. Do not rely on it when you need exact numbers.
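
When an exact figure matters, running COUNT(*) against each table is the usual alternative, at the cost of a scan on large tables. A minimal sketch that generates such statements is shown below. The helper name exact_count_query is made up for this illustration, and the schema and table names are expected to come from the information_schema query above.

```python
def exact_count_query(schema, table):
    """Build a COUNT(*) statement for one table, quoting names with backticks."""

    def quote(name):
        # Double any backtick inside the identifier, then wrap it in backticks.
        return "`" + name.replace("`", "``") + "`"

    return f"SELECT COUNT(*) FROM {quote(schema)}.{quote(table)};"
```

For example, exact_count_query("shop", "orders") builds a statement that counts the rows of the orders table in the shop schema.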

Extended Display

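select * from abc \G

Ending the statement with \G instead of ; produces the extended (vertical) display, with one column per line. A trailing ; is not needed after \G.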

Dumping database

mysqldump -u<username> -p<password> <db_name> > dump.sql

dump.sql will contain the dump of the database.

Saturday, April 16, 2016

Airflow - Beginners Tutorial

Airflow is a workflow engine from Airbnb. Airbnb developed it for internal use and recently open sourced it. In Airflow, workflows are defined programmatically. The Airflow documentation says that workflows built this way are more maintainable; I will leave that to your judgement. A developer would love anything programmatic anyway. :) Airflow also comes with a UI, and I can say that it is very clean and impressive.

The main concept in Airflow is the DAG (Directed Acyclic Graph). Yes, it's the same graph you have seen in maths, if you have seen it. A DAG contains vertices and directed edges. In a DAG, you can never get back to the vertex you started from by following the directed edges; otherwise your workflow could get into an infinite loop. In the workflow context, tasks are the vertices and the sequence between them is represented by the directed edges. The sequence decides the order in which the tasks are performed.
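
The DAG idea can be illustrated without Airflow itself. The sketch below is not Airflow's API: it uses the standard-library graphlib module (Python 3.9 or later) to turn task dependencies into a run order and to reject a graph that contains a cycle. The function name and the task names in the usage note are made up for this illustration.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return tasks in a valid run order, or raise ValueError on a cycle.

    `dependencies` maps each task to the set of tasks that must finish first.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # The second element of args holds the cycle that was detected.
        raise ValueError(f"workflow contains a cycle: {exc.args[1]}") from exc
```

With the dependencies {"extract": set(), "transform": {"extract"}, "load": {"transform"}}, the tasks come out in the order extract, transform, load. If two tasks depend on each other, the function raises an error instead of looping forever.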

Wednesday, April 6, 2016

Comparing Mysql databases

There is often a need to compare two databases in terms of schema and data, for example to compare against a production database, or in a test environment to ensure that data values have not changed. For this, MySQL has a utility called mysqldbcompare. Let's see how to use it. First, the installation.


You can either install it from your OS repositories or download it directly from the MySQL site. Search for mysql-utilities and you will reach the page. One way to do it on Linux is as follows


This will download the file to your current directory. Then extract it and run the Python setup script to install it.

tar -xvf mysql-utilities-1.5.6.tar.gz
cd mysql-utilities-1.5.6
sudo python setup.py install

Now let's say we want to compare two databases, db1 and db2, on a machine called s1. The command looks like

mysqldbcompare --server1=<user>:<password>@s1:3306 db1:db2

Here <user> is the database user and <password> is that user's password. If the databases reside on two different machines, s1 and s2, the command becomes

mysqldbcompare --server1=<user>:<password>@s1:3306 --server2=<user>:<password>@s2:3306 db1:db2

By default it stops at the first failed test. If you want it to run to completion, provide the --run-all-tests option

mysqldbcompare --server1=<user>:<password>@s1:3306 --server2=<user>:<password>@s2:3306 db1:db2  --run-all-tests

It comes with a lot of switches. For details, refer to the MySQL documentation page.

The downside of this approach seems to be that the comparison cannot be restricted to certain columns of a table. This matters because sometimes one wants to ignore audit columns such as creation and update dates.
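
When audit columns need to be ignored, one possible workaround is to fetch the rows yourself and compare them in code. The sketch below is not part of mysqldbcompare. It assumes the rows of both tables have already been loaded as lists of dictionaries, and the function name and column names are made up for this illustration.

```python
def diff_rows(rows_a, rows_b, key, ignore=()):
    """Compare two lists of row dicts by `key`, skipping columns in `ignore`.

    Returns the sorted keys that are missing on one side or whose
    remaining column values differ.
    """

    def index(rows):
        # Map each key value to its row with the ignored columns removed.
        return {
            row[key]: {col: val for col, val in row.items() if col not in ignore}
            for row in rows
        }

    a, b = index(rows_a), index(rows_b)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Passing ignore={"created_at", "updated_at"} would, for example, treat two rows as equal even when only their audit dates differ. This suits small tables; for large ones the comparison is better pushed into SQL.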