Sunday, July 27, 2014

Object Relational Mapping (ORM)

ORM stands for Object Relational Mapping. ORM is an attempt to bridge the object world and the relational world so that they can talk to each other easily. Almost every non-trivial application has a database behind it, and Java applications are no exception. In fact, if we look closely at any application, we realize that it is more or less modeled around its data model. In database technology, relational databases are the clear winners. Other database technologies have come and gone. The relational model of data management was first introduced by E. F. Codd in 1970.

An analogy for the relational model can be drawn with spreadsheets. Each sheet represents a table, and the columns in the sheet represent the table's attributes. Each instance of data is represented by a row. Data in different sheets is connected by referring to a data point in another sheet using its sheet number, column number and row number. This is what is called a foreign key relationship in database technology. In fact, most GUI tools for databases show the data in a spreadsheet format.

To interact with the database, Structured Query Language (SQL) has emerged as the standard. The SQL standard is maintained by ANSI and ISO. However, vendors still ship proprietary variations of it. SQL provides two types of mechanism:

  • Data Definition Language (DDL) : Provides ways to create and alter tables.
  • Data Manipulation Language (DML) : Provides ways to manipulate and retrieve data. It includes inserting, updating and deleting data.

To interact with the database, an application has to issue SQL to it. How SQL is issued is proprietary to each database. Each database exposes its own APIs for this, and those APIs might be written in different languages. For example, a database written in C might expose C-based APIs. Since database independence is considered a virtue for any application, it would be a lot of work for an application developer to understand the interfaces of each database and code against them. To solve this kind of problem, Java came up with the JDBC API.

JDBC is the most popular way of connecting to databases in Java. It is an interface-based API in which the implementation for each database is provided by that database's driver. Though JDBC is very popular, it is inherently relational in nature. The basic problem is the conceptual mismatch between relational technology and object-oriented technology. Since Java is an object-oriented language, this mismatch is important to deal with. It is also known as the object-relational impedance mismatch. ORM tries to solve it.

Let's look at the kinds of mismatch that exist:

Inheritance

Java supports inheritance. For example, we might have a User class from which the Student and Teacher classes are derived.

User

public class User{
   private String name;

   //Setters and getters
}

Student

public class Student extends User{

    private double percentage;
   
    //Setter and Getter
}

Teacher

public class Teacher extends User{
    private int experienceYears;

    //Setters and Getters
}

Now think for a moment about how you would map these classes to a table structure. ORM frameworks adopt different strategies to solve this, which can be seen in the Hibernate section.
Along with this comes a mismatch in terms of polymorphism. A reference of type User can refer to a Student or a Teacher object. A list might also contain a mix of Teacher, Student and User objects. How do you build such a list by querying the database? The ORM framework has to somehow determine whether a given row belongs to a User, a Student or a Teacher.
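One common way to tell the rows apart is to keep an extra column that records the type of each row. The sketch below illustrates only that idea and is not how any particular framework implements it. The column names USER_TYPE, PERCENTAGE and EXPERIENCE_YEARS, the values "STUDENT" and "TEACHER", and the UserRowMapper class are all assumptions made for the example, and a row is modeled as a plain Map rather than a real JDBC result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Trimmed, package-private versions of the classes above.
class User {
    String name;
}

class Student extends User {
    double percentage;
}

class Teacher extends User {
    int experienceYears;
}

// Hypothetical helper: picks the Java class from a type column in the row.
class UserRowMapper {

    static User fromRow(Map<String, Object> row) {
        // USER_TYPE is an assumed discriminator column, not part of the original schema.
        String type = (String) row.get("USER_TYPE");
        User user;
        if ("STUDENT".equals(type)) {
            Student student = new Student();
            student.percentage = (Double) row.get("PERCENTAGE");
            user = student;
        } else if ("TEACHER".equals(type)) {
            Teacher teacher = new Teacher();
            teacher.experienceYears = (Integer) row.get("EXPERIENCE_YEARS");
            user = teacher;
        } else {
            user = new User();
        }
        user.name = (String) row.get("NAME");
        return user;
    }

    // Builds a mixed list of User, Student and Teacher objects from the rows.
    static List<User> fromRows(List<Map<String, Object>> rows) {
        List<User> users = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            users.add(fromRow(row));
        }
        return users;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("USER_TYPE", "TEACHER");
        row.put("NAME", "Ravi");
        row.put("EXPERIENCE_YEARS", 12);
        User user = fromRow(row);
        System.out.println(user.getClass().getSimpleName() + " " + user.name);
    }
}
```

The price of this approach is that columns belonging to one subclass stay empty for rows of the other types.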

Granularity

The granularity problem arises when the number of classes does not match the number of tables they map to. For example, let's say we have a User class which has an Address object:

public class User{
   private String name;
   private Address address;

   //Setters and getters
}

Address

public class Address{
    private String city;
    private String country;

    //Setters and getters
}

The table structure for User is:

Table USER:

 NAME
 CITY
 COUNTRY

There is one table, but the data sits in two objects. The same problem can occur the other way round, where you have two tables and one class containing all the data points. ORM frameworks have to take care of this mismatch between the number of tables and the number of classes.
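To make the one-table, two-objects case concrete, here is a minimal sketch of the translation an ORM framework has to perform in both directions. The UserMapper class and its method names are hypothetical, and a row of the USER table is modeled as a plain Map instead of a real database result.

```java
import java.util.HashMap;
import java.util.Map;

// Trimmed, package-private versions of the classes above.
class Address {
    String city;
    String country;
}

class User {
    String name;
    Address address;
}

// Hypothetical helper that maps one USER row to two objects and back.
class UserMapper {

    // Splits the flat row (NAME, CITY, COUNTRY) into a User and its Address.
    static User fromRow(Map<String, String> row) {
        Address address = new Address();
        address.city = row.get("CITY");
        address.country = row.get("COUNTRY");

        User user = new User();
        user.name = row.get("NAME");
        user.address = address;
        return user;
    }

    // Flattens the two objects back into a single row.
    static Map<String, String> toRow(User user) {
        Map<String, String> row = new HashMap<>();
        row.put("NAME", user.name);
        row.put("CITY", user.address.city);
        row.put("COUNTRY", user.address.country);
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("NAME", "Asha");
        row.put("CITY", "Pune");
        row.put("COUNTRY", "India");
        User user = fromRow(row);
        System.out.println(user.name + " lives in " + user.address.city);
    }
}
```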

Identity and Equality

The notion of identity is driven by the primary key in the relational model. Given a primary key, you will always retrieve the same row, and it does not matter how many clients are retrieving it. With the right isolation level, all of them will see the same data. In Java the situation is more complex. The same row might be represented by more than one object in the Java layer. For example, the User row with primary key 1 might be loaded as separate objects in different threads. The question is which object holds the right data. And if all threads try to save their copy, who wins? A similar problem arises with equality.

In Java the default is reference equality: two references are equal only if they point to the same object. The corollary is that two objects representing the same row will come out as different. To solve this we have to implement the equals method (and hashCode along with it), but that is not always trivial. ORM solutions have to make provisions to maintain the correct notion of equality and identity. The frameworks usually ask you to explicitly map the primary key to an attribute.
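As an illustration, the sketch below bases equals and hashCode on an id attribute that stands for the primary key. The id field, the constructor and the IdentityDemo class are assumptions added for the example. Comparing by key is only one option and it has its own pitfalls, for instance with objects that have not been saved yet and so have no key.

```java
// A User whose equality follows the primary key instead of the reference.
class User {
    private final long id;      // assumed to be mapped to the primary key column
    private final String name;

    User(long id, String name) {
        this.id = id;
        this.name = name;
    }

    long getId() {
        return id;
    }

    String getName() {
        return name;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof User)) {
            return false;
        }
        return id == ((User) other).id;
    }

    // Must agree with equals: equal objects return equal hash codes.
    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}

class IdentityDemo {
    public static void main(String[] args) {
        // Two objects loaded separately for the row with primary key 1.
        User first = new User(1L, "Asha");
        User second = new User(1L, "Asha");

        System.out.println(first == second);      // false: different references
        System.out.println(first.equals(second)); // true: same primary key
    }
}
```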

Association

In Java, a relationship is expressed as an association, that is, an object reference. For example, User has a reference to Address. In tables, however, the relationship is expressed with a foreign key. Java also has the notion of directionality. For example, you can reach the Address from the User but not the other way round. To navigate the relationship from both sides, you have to put a reference on both sides.

User

public class User{
    private Address address;
    //Setters and Getters
}

Address

public class Address{
    private User user;

    //setters and getters
}

However, there is no notion of directionality in the relational world. The foreign key can be placed at either end and the relationship is built. With SQL you can navigate from either end. For example, with the foreign key placed on the User side, the tables will be

Table ADDRESS

 ADDRESS_ID
 CITY
 COUNTRY

Table USER

 USER_ID
 NAME
 ADDRESS_ID (FK)

ORM solutions have to deal with these associations while setting and getting data from the database.
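The following sketch shows roughly what that involves when reading data: the ADDRESS_ID foreign key value of a USER row has to be turned into object references, and for a bidirectional association both ends have to be set. The AssociationLoader class, the loadUser method and the in-memory addressesById map (standing in for the ADDRESS table) are hypothetical names used only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class Address {
    long addressId;
    String city;
    String country;
    User user;          // back-reference; it has no column of its own
}

class User {
    long userId;
    String name;
    Address address;    // backed by the ADDRESS_ID foreign key column

    // Keeps both ends of the association in sync.
    void setAddress(Address address) {
        this.address = address;
        if (address != null) {
            address.user = this;
        }
    }
}

// Hypothetical loader that resolves the foreign key into object references.
class AssociationLoader {

    static User loadUser(long userId, String name, long addressIdFk,
                         Map<Long, Address> addressesById) {
        User user = new User();
        user.userId = userId;
        user.name = name;
        // One foreign key value becomes two references, one in each direction.
        user.setAddress(addressesById.get(addressIdFk));
        return user;
    }

    public static void main(String[] args) {
        Address address = new Address();
        address.addressId = 10L;
        address.city = "Pune";
        address.country = "India";

        Map<Long, Address> addressesById = new HashMap<>();
        addressesById.put(address.addressId, address);

        User user = loadUser(1L, "Asha", 10L, addressesById);
        System.out.println(user.address.city);      // navigate User -> Address
        System.out.println(user.address.user.name); // navigate Address -> User
    }
}
```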

Type Systems

The object-oriented and relational worlds have different type systems. For example, in the relational world a string column can be constrained in size, whereas on the Java side a String reference can point to a string of any length that fits in memory. Dates and times are also handled differently. For example, some databases distinguish between date, time and timestamp, where a timestamp contains both date and time. In Java, a java.util.Date wraps a long value that encodes both the date and the time. So when you fetch a date or a time from the relational world, how do you map it to a Java type such as Date or Calendar?
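JDBC itself hints at this gap: the java.sql package ships separate Date, Time and Timestamp classes that mirror the SQL types, and all of them extend java.util.Date and carry a single long value. The small sketch below, in which the TemporalTypesDemo class and its split method are hypothetical, takes one instant and views it as each of the three types.

```java
import java.sql.Date;
import java.sql.Time;
import java.sql.Timestamp;

class TemporalTypesDemo {

    // Returns the date part and the time part of a timestamp as text.
    // The input must use the JDBC escape format "yyyy-mm-dd hh:mm:ss".
    static String[] split(String timestampText) {
        Timestamp timestamp = Timestamp.valueOf(timestampText); // date and time
        long millis = timestamp.getTime();                      // the underlying long value

        Date dateOnly = new Date(millis); // prints only the date part
        Time timeOnly = new Time(millis); // prints only the time part
        return new String[] { dateOnly.toString(), timeOnly.toString() };
    }

    public static void main(String[] args) {
        String[] parts = split("2014-07-27 10:15:30");
        System.out.println(parts[0]); // 2014-07-27
        System.out.println(parts[1]); // 10:15:30

        // All three SQL types are still java.util.Date objects on the Java side.
        java.util.Date plain = Timestamp.valueOf("2014-07-27 10:15:30");
        System.out.println(plain instanceof Timestamp);
    }
}
```

Note that dateOnly and timeOnly still hold the full millisecond value. Only their text form is restricted, which is part of why the mapping needs care.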

Databases are Different

There are many popular databases in the market. In spite of SQL being a standard, SQL support still varies from database to database, and each database has its own vendor extensions and flavors. Though this is not really an ORM issue, it is important from the perspective of database portability, if that is what you are looking for. Ensuring database portability with plain JDBC is usually a Herculean task if you wish to use database-specific features for performance or other reasons. ORM frameworks attempt to handle most of the important databases transparently. They also have extension mechanisms if you wish to support another database.
ORM frameworks adopt different strategies to solve these kinds of mismatches. They strive to preserve object-world concepts and shield developers from the relational world by taking care of the mappings. This should not be taken as an excuse to skip learning relational concepts. On the contrary, to be a good user of an ORM framework, one should understand how the mapping works and what its implications are. The number one issue in using ORM frameworks is performance, and most of the time it is caused by not understanding how the framework maps to the relational world and by not having a good grasp of relational concepts.
