Monday, July 28, 2014

Sharding in Hibernate

Sharding is basically horizontal partitioning of data. If handled carefully, sharding can improve performance. For example let's say for a online mail provider, the mail subscribers can run into huge numbers. If all the rows of the database are sitting in the same server, it would hog a huge amount of memory. Sharding can help in horizontal partitioning of data. We can partition the records on the basic of continents. The continent information derived from the country information. So the rows belonging to North America and Asia would be sitting on different servers. If the queries are always continent specific most of the time, this partitioning will help in improving the performance and also indexing will be better. The care should be taken that if we have queries which involve going to different databases to fetch the result set, and if this is a prevalent scenario, sharding would result in degradation of performance.
Hibernate Shards helps in dealing with horizontal partitioning of data. To use shards, let's assume we have a EmailUser table where the user data is saved. We will save the records alternately in two
different database and will see that how hibernate shields us from the knowledge that it is talking to two different database. To make this tutorial, make a java project as outlined in Hibernate Introduction with Annotation. Also make sure Hypersonic jar is in the path. We will make a tutorial where the data is saved to two databases. The source code can be downloaded from the attachment section down below.
Let's make the EmailUser class first. Note that it is a simple mapping class.
@Entity
public class EmailUser {
 
 private BigInteger id;
 private String name; 
 private String continent;
 
        //The id is generated by using a Shard specific generator
 @Id 
 @GeneratedValue(generator="EmailIdGenerator")
 @GenericGenerator(name="EmailIdGenerator",  
             strategy="org.hibernate.shards.id.ShardedUUIDGenerator")
 @Column(name="ID")
 public BigInteger getId() {
  return id;
 }
 
 public void setId(BigInteger id) {
  this.id = id;
 } 

 //Getters and setters for other properties
        ...
}
As we are interacting with two different databases, we need to provide two configuration file to Hibernate. If there are more than two databases, we have to provide same number of configuration files. Each file representing the details of a database. The first configuration file is usually the master file, where we provide all the details about other settings. Rest of the configuration files just contains the details of connection to database and shard specific id's.
hibernate0.cfg.xml (First configuration file)
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE hibernate-configuration PUBLIC
        "-//Hibernate/Hibernate Configuration DTD 3.0//EN"
        "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">

<hibernate-configuration>
    <!-- note the different name -->
    <session-factory name="HibernateSessionFactory0"> 
    
    <!-- Database connection settings -->
    <property name="connection.driver_class">org.hsqldb.jdbcDriver</property>
    <property name="connection.url">jdbc:hsqldb:hsql://localhost</property>
    <property name="connection.username">sa</property>
    <property name="connection.password"></property>
        
    <!-- JDBC connection pool (use the built-in) -->
    <property name="connection.pool_size">1</property>

    <!-- SQL dialect -->
    <property name="dialect">org.hibernate.dialect.HSQLDialect</property>

    <!-- Enable Hibernate's automatic session context management -->
    <property name="current_session_context_class">thread</property>

    <!-- Second-level cache  -->
    <property name="cache.provider_class">org.hibernate.cache.EhCacheProvider/>

    <!-- Echo all executed SQL to stdout -->
    <property name="show_sql">true</property>
    <!--  property name="format_sql">true</property -->
    <!-- property name="use_sql_comments">true</property -->

    <!-- Drop and re-create the database schema on startup -->
    <property name="hbm2ddl.auto">create-drop</property>
        
    <!-- Shard specific confifuration -->
    <property name="hibernate.connection.shard_id">0</property> 
    <property name="hibernate.shard.enable_cross_shard_relationship_checks">
    true
    </property>

 </session-factory>
</hibernate-configuration>

The hibernate configuration for second database
hibernate1.cfg (Second Configuration file)


<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE hibernate-configuration PUBLIC
        "-//Hibernate/Hibernate Configuration DTD 3.0//EN"
        "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">

<hibernate-configuration>
   <!-- note the different name -->
   <session-factory name="HibernateSessionFactory0"> 
   
   <!-- Database connection settings, this database is in memory -->
   <property name="connection.driver_class">org.hsqldb.jdbcDriver</property>
   <property name="connection.url">jdbc:hsqldb:file:1</property>
   <property name="connection.username">sa</property>
   <property name="connection.password"></property>
   
   <!-- SQL dialect - This tells the SQL grammer to be used -->
   <property name="dialect">org.hibernate.dialect.HSQLDialect</property>
        
   <property name="hibernate.connection.shard_id">1</property> 
   <property 
      name="hibernate.shard.enable_cross_shard_relationship_checks">
     true
   </property>
</session-factory>
</hibernate-configuration>
In the true tradition of Hibernate, now we write the HibernateUtil class which is responsible for building sessionFactory
public class HibernateUtil {
 
   private static SessionFactory sessionFactory;
 
   static{
     try{
 //Provide the master hibernate config
 AnnotationConfiguration prototypeConfig = 
             new AnnotationConfiguration().configure("hibernate0.cfg.xml");
 prototypeConfig.addAnnotatedClass(EmailUser.class);
 List<ShardConfiguration> shardConfigs = 
             new ArrayList<ShardConfiguration>();
 shardConfigs.add(buildShardConfig("hibernate0.cfg.xml"));
 shardConfigs.add(buildShardConfig("hibernate1.cfg.xml"));
 ShardStrategyFactory shardStrategyFactory = buildShardStrategyFactory();
        ShardedConfiguration shardedConfig = 
           new ShardedConfiguration(prototypeConfig,
        shardConfigs,
        shardStrategyFactory);

        sessionFactory =  shardedConfig.buildShardedSessionFactory();
 }catch(Throwable ex){
  throw new ExceptionInInitializerError(ex);
 }
} 
 
   static ShardConfiguration buildShardConfig(String configFile) {
 Configuration config = new Configuration().configure(configFile);
 return new ConfigurationToShardConfigurationAdapter(config);
   }
 
   static ShardStrategyFactory buildShardStrategyFactory() {
 ShardStrategyFactory shardStrategyFactory = 
  new ShardStrategyFactory() {
         public ShardStrategy newShardStrategy(List<ShardId> shardIds) {
         RoundRobinShardLoadBalancer loadBalancer = 
   new RoundRobinShardLoadBalancer(shardIds);
  ShardSelectionStrategy shardSelection= 
   new RoundRobinShardSelectionStrategy(loadBalancer);
  ShardResolutionStrategy shardResoultion = 
   new AllShardsShardResolutionStrategy(shardIds);
  ShardAccessStrategy shardAccess = 
   new SequentialShardAccessStrategy();
  return new ShardStrategyImpl(shardSelection, 
                                             shardResoultion , 
                                             shardAccess );
 }
   };
   return shardStrategyFactory;
  
  public static SessionFactory getSessionFactory(){
 return sessionFactory;
 }
 
 public static void shutdown(){
  //Close caches and connection pool
  getSessionFactory().close();
 }
}
Now let's access the session and do database operations
public class HibernateShardsMain {
 
   public static void main(String[] args){
 Session session = HibernateUtil.getSessionFactory().openSession();
 Transaction tx= session.beginTransaction();
  
 EmailUser e1= new EmailUser();
 e1.setName("abc");
 e1.setContinent("North America");
 System.out.println(session.save(e1));
  
 EmailUser e2= new EmailUser();
 e2.setName("def");
 e2.setContinent("Asia");
 System.out.println(session.save(e2));
  
 EmailUser e3= new EmailUser();
 e3.setName("fgh");
 e3.setContinent("Asia");
 System.out.println(session.save(e3));
 
 EmailUser e4= new EmailUser();
 e4.setName("ijk");
 e4.setContinent("North America");
 System.out.println(session.save(e4));
 
 tx.commit();
 session.close();
  
 session = HibernateUtil.getSessionFactory().openSession();
 tx= session.beginTransaction();
  
 Criteria crit = session.createCriteria(EmailUser.class);
 crit.add(Restrictions.ilike("continent", "North America"));
 crit.list();
    
 tx.commit();
 session.close();
   }

}
Note that the main code has no knowledge that it is working against multiple databases. It looks similar to the code as we see while we interact with a single database. As we are using a round robin strategy, Hibernate will save the data alternately in two different databases.
Before you plan to use Hibernate shards, be careful of following points:
  • It looks like Shards in not under active development.
  • Shards has serious problems in dealing with queries, especially order by and distinct. These problems are itself very complex when we are dealing in a sharded environment. For the order by and distinct to work, we have to bring the data from all shards at a common location and than build the result. But than we run into book keeping and memory issues.

2 comments:

  1. This was done way back. I doubt if can get my hands on the code. Also I would suggest to refer Hibernate docs about the latest state of things on sharding.

    ReplyDelete