Thursday, October 16, 2014

Exceptions while Batch processing in Postgres

If you are encountering the following exception

Caused by: org.postgresql.util.PSQLException: ERROR: current transaction is aborted, commands ignored until end of transaction block

you have been hit with the way postgres handles a batch. When postgres is handling a batch of queries,  even if one of the query while executing fails with an exception, Postgres aborts the whole
batch. 

There are both pros and cons to this approach. From pros perspective, as Postgres fails the whole batch even if one issue happens, it ensures that we are not working with potentially a bad state of data. However on the cons side sometimes you might just want for the batch to pass even if a particular statement fails. So for example if you are inserting a set of audit logs, even if one particular statement fails, you may just want for the rest of the queries to pass. Handling the failures become tricky especially if you have put a batching mechanism in place for performance reasons.

The situation becomes tricky in the context of Postgres, if while batch processing the exception thrown is gobbled in the middle. For example in following case:

start transaction
query 1
try{
     query 2
catch(Exception ex){
   //Gobble the exception and do nothing
}
query 3
commit transaction

If all three queries are running in the context of a batch and if query 2 throws and exception and is gobbled, the connection gets into a bad state. Now when query 3 is executed as part of batch, the whole transaction will fail. So be careful about eating up exceptions like above.

In this case, databases like Oracle and mySQL will still process the query 3 successfully.

One solution to handle this is to run the queries in auto commit to true. This will result in each query running in it's own transaction but performance might hit. 

Postgres provides  a savepoint capability to handle rolling back partially. For example the above query snippet can be handled as follows

start transaction
query 1

SAVEPOINT point1

query 2

ROLLBACK TO point 1

query 3
commit transaction

For more details on savepoint see this

In Java JDBC Connection object provides interfaces to set and release savepoints. In the context of Java the above query using savepoint would become

//Get the connection object and set auto commit to false
//so that everything will be in one batched.
conn.setAutoCommit(false);

Statement stmt = conn.createStatement();
   
String sqlQuery1 = //Put a query which passes

stmt.executeUpdate(sqlQuery1 ); 

//Set a savepoint 
Savepoint savepoint = conn.setSavepoint("Savepoint"); 

try{
   String sqlQuery2 = //Put a query which fails

   stmt.executeUpdate(sqlQuery2 );  
}catch(Exception ex){
   conn.rollback(savepoint);

}   

//Release the savepoint if not required
//Any reference to savepoint after this will raise exception
conn.releaseSavepoint(savepoint)

String sqlQuery2 = //Put a query which fails

stmt.executeUpdate(sqlQuery2 ); 

//Commit the transaction so all queries will go in a bath
conn.commit();

No comments:

Post a Comment