The Benefits of Combining Fork Join Pools with PostgreSQL

Warning: As I am incredibly busy at the moment, benchmarks are not provided. This is a performance review based on work experience and the accompanying documents for my superiors.

PostgreSQL is a terrific free tool. In my opinion, it can do almost everything an Oracle product can, and it has quick functions for performing critical tasks such as dumping an entire database to CSV from a programming language. However, insertion can be slow. I am constantly updating three or more databases with 15 or more attributes apiece. Unfortunately, my company's estimated revenue is roughly the cost of an Oracle or Microsoft license. While the PostgreSQL developers promise multi-threading in the future, Java, together with Apache's connection pooling, has already solved a significant portion of this task for us using Fork Join Pools. The result can be an improvement of thousands of records per minute.

Fork Join Pools and Why to Use SE 8

Java SE 7 introduced the Fork Join Pool to the java.util.concurrent package. Oracle recommends using it only for extremely intensive tasks. The pool works by distributing tasks among threads through work stealing: idle worker threads take queued work from busy ones. Java SE 8 improves on the algorithm and reduces the number of tasks that are dropped.
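To illustrate the divide-and-conquer pattern the pool is built around, here is a minimal sketch; the class name, record range, and threshold of 100 are hypothetical:

import java.util.concurrent.RecursiveAction;

class BatchAction extends RecursiveAction {
      private final int start;
      private final int end;

      BatchAction(int start, int end){
            this.start = start;
            this.end = end;
      }

      @Override
      protected void compute(){
            if(end - start <= 100){
                  // small enough: process records[start..end) directly
            } else {
                  // split in half; idle worker threads steal one of the halves
                  int mid = (start + end) / 2;
                  invokeAll(new BatchAction(start, mid), new BatchAction(mid, end));
            }
      }
}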

Setting Up the Pool In Spring 4

I reviewed several different connection pools before settling on the vastly improved, newest version of Apache DBCP. BoneCP offers improved performance, but at the cost of critical flaws: chiefly, the current version fails to close connections properly, leading its developer to recommend reverting to version 0.8. The new Apache DBCP also outperformed C3P0 in my benchmark tests. My summary is provided below. It is based on connections both over the internet to a co-location facility and from one machine to a local machine.

Connection Pool   Pros       Cons
Apache DBCP       Reliable   Somewhat slower than BoneCP
BoneCP            Fast       Somewhat unreliable
C3P0              Reliable   Slow; a worse option than DBCP

Setting up DBCP is incredibly simple and configurable. Spring can be configured through Java code or XML. For my purpose, XML was a decent option since my teammates use XML in their daily work, so the XML configuration is provided below. A decent way to set this up is to use the data source bean as a reference in the declaration of the DAO template's data source.

<bean id="dataSource" destroy-method="close"
  class="org.apache.commons.dbcp.BasicDataSource">
    <property name="driverClassName" value="${jdbcdriver}"/>
    <property name="url" value="${jdbcurl}"/>
    <property name="username" value="${dbcusername}"/>
    <property name="password" value="${dbcpass}"/>
    <property name="initialSize" value="3"/>
    <property name="validationQuery" value="Select 1" />
</bean>

Other options are described at the DBCP site. A validation query is specified here since I ran into an issue with the validation query in my own work.
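The ${...} placeholders above also need a property source. A minimal sketch, assuming a db.properties file on the classpath (the file name is hypothetical) and the context namespace declared in the beans element:

<context:property-placeholder location="classpath:db.properties"/>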

A reference can be provided to the DAO template using:

     <bean id="jdbcTemplate" ref="dataSource"/>

Delivering the Code with a DAO Template: Avoiding Race Conditions

An important consideration is how multiple connections will be handled by the DAO template. Fortunately, Spring allows a method to be executed asynchronously, without manually managing threads or declaring the method synchronized, using the @Async annotation.

   @Async
   public void postData(Map<String, String> data){
         // runs on a separate thread managed by Spring's task executor
   }

The method above is now executed asynchronously. Note that Spring's proxy-based AOP only intercepts public methods, and asynchronous support must be enabled first.
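A minimal sketch of enabling the annotation in the XML configuration, assuming the task namespace is declared in the beans element (the executor id and pool size are illustrative):

<task:annotation-driven executor="taskExecutor"/>
<task:executor id="taskExecutor" pool-size="8"/>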

Spring accepts the highest class in the collection hierarchy that the more common collections implement or extend. Basically, due to inheritance, a HashMap “is-a” Map, but a Map reference cannot blindly be cast to HashMap. Call the getClass() method to see this in action: if getClass() returns java.util.HashMap, it is possible to cast the Map to HashMap to gain any benefits beyond the Map interface. Java passes object references by value (a memory location, more specifically), so the runtime type travels with the object.
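A quick sketch of the check described above:

   Map<String, String> data = new HashMap<String, String>();
   System.out.println(data.getClass()); // prints class java.util.HashMap
   if(data instanceof HashMap){
         // safe downcast: the underlying object really is a HashMap
         HashMap<String, String> hashMap = (HashMap<String, String>) data;
   }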

The Fork Join Pool

In this instance, the Fork Join Pool should accept RecursiveActions. These are not only useful for recursive tasks; due to the work stealing, they improve I/O and other intensive tasks as well. Oddly, I ran into an issue where value-returning RecursiveTasks (my parsers) would not completely close, so I switched to using Callables instead. Fork Join Pools explicitly accept Runnable, Callable, and ForkJoinTask instances, so Callables, Runnables, ForkJoinTasks, and RecursiveActions (a subclass of ForkJoinTask) are all acceptable.
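A minimal sketch of submitting a Callable, given a pool fjp like the one instantiated below; parseAndPost() is a hypothetical stand-in for a parser task returning a row count:

   Future<Integer> result = fjp.submit(new Callable<Integer>(){
         @Override
         public Integer call() throws Exception {
               // hypothetical parse-and-post step
               return parseAndPost();
         }
   });
   int rowsHandled = result.get(); // blocks until the task finishes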

Instantiation for this task requires:

ForkJoinPool fjp = new ForkJoinPool(Runtime.getRuntime().availableProcessors() * procnum);

The argument is the desired level of parallelism; here, the number of available processors multiplied by procnum. Any value that is not a positive integer in the allowed range throws an IllegalArgumentException.

Keep in mind that Amdahl’s law applies and that too many threads will cause the process to slow down, as does inappropriate use.

To submit a task, use:

fjp.execute(new SQLRecursiveAction(/**parameters**/));

The SQL class (in this case an inner class) would be:

private class SQLRecursiveAction extends RecursiveAction{
      private final String aString;

      public SQLRecursiveAction(String aString){
            this.aString = aString;
      }

      @Override
      protected void compute(){
            /**call to DAO Template using aString**/
      }
}

The compute method is required: RecursiveAction declares it as a protected abstract method, and any concrete subclass must implement it.

Once tasks are submitted, do not shut down the Fork Join Pool. Instead, there are other ways to wait for completion that allow the pool to be reused.

int w = 0;
while(!fjp.isQuiescent() && fjp.getActiveThreadCount() > 0){
      w++;
}
//log.info("Waited for " + w + " cycles");
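Alternatively, Java SE 8 offers a blocking wait instead of the busy loop above; a minimal sketch, with an illustrative 60-second cap:

import java.util.concurrent.TimeUnit;

// blocks until the pool is quiescent or the timeout elapses
fjp.awaitQuiescence(60, TimeUnit.SECONDS);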

When finally finished with the pool, a shutdown is necessary:

fjp.shutdown(); // orderly shutdown
//fjp.shutdownNow(); // for immediate shutdown

Findings

In the end, the Fork Join Pool significantly improved the performance of my application, which can parse any type of document into attributes stored in JSON strings and posted to the database. The number of records I was able to parse and post increased dramatically, from several hundred per minute to over 10,000 with the same number of attributes. It may also be wise to consider other improvements, such as Tokutek’s fractal tree indexing, for large and normalized databases.
