Checking and Increasing the Connection Limit in PostgreSQL

It finally happened. I blew past the maximum connection limit in my new PostgreSQL install by starting too many grid nodes, each with its own connection pool, for a system I am writing.

The default limit of 100 connections is far too few, especially with two people doing database-intensive tasks. This article explains how to check the number of connections in use as a standard user or administrator, and how an administrator can change the connection limit.

Default Connection Limit and Buffer Size

By default, PostgreSQL has a relatively low number of maximum allowed connections. The default limit is 100.

The limit is related to memory: every connection uses some, and the shared buffers are the largest consumer of it. The stock default for shared_buffers is small (128MB in recent releases); my installation has it set to 8 gigabytes, as shown in the configuration file below.

PostgreSQL is a versatile database; Amazon built Redshift on top of it. However, the default maximums are low, and arguably should stay that way to discourage developers and students from abusing their system's resources.

Common Error When Exceeding Size Limit

With the amount of data available ever increasing and the demands on technology still growing, many systems will eventually exceed their connection limit. PostgreSQL throws the following error when this occurs:

psql: FATAL: remaining connection slots are reserved for non-replication superuser connections

In my case, the connection pool failed to reserve the ten to twenty connections required by each highly database-intensive job and instantly released whatever connections it did acquire back to the database. This left me with 96 connections in use and four reserved slots that were unattainable. Some of the 96 connections were held by an ETL tool or pgAdmin.

Querying the Connection Limit

It is possible for non-administrators to check the number of connections in use with the following query:

SELECT sum(numbackends) FROM pg_stat_database;

This query reads the pg_stat_database statistics view and sums the number of backends (connections) currently attached to each database.

The configured connection limit can be queried from the psql command line using:

SHOW max_connections;

This asks the server to return the value of the max_connections configuration variable described below.

Most configuration variables can be queried in psql, and some can even be changed at runtime from the console. The max_connections variable cannot be changed from the command line because of the associated increase or decrease in memory usage.
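
For application code, the same two checks can be run over JDBC. The following is a minimal sketch, not from my setup: the URL, database, and credentials are placeholders, and the PostgreSQL JDBC driver is assumed to be on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ConnectionCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; substitute your own host, database, and credentials.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/postgres", "postgres", "secret");
             Statement stmt = conn.createStatement()) {

            // Total connections currently attached across all databases.
            try (ResultSet used = stmt.executeQuery("SELECT sum(numbackends) FROM pg_stat_database")) {
                used.next();
                System.out.println("Connections in use: " + used.getInt(1));
            }

            // The configured limit.
            try (ResultSet limit = stmt.executeQuery("SHOW max_connections")) {
                limit.next();
                System.out.println("max_connections: " + limit.getString(1));
            }
        }
    }
}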

Setting Connection Related Variables

While the initial settings are low by today's standards, they can be changed in your postgresql.conf file. The file is typically found at a location such as:

/var/lib/pgsql/X.X/data/postgresql.conf  or /usr/local/postgres/postgresql.conf

The default location is the data directory under your PostgreSQL installation.

Look for the following variables in the file:

shared_buffers = 8000MB
max_connections = 100

Choosing Size Limits

It is important to set size limits appropriate to the size of the shared buffers. Dividing current RAM usage by the number of connections in use (or by max_connections) gives a rough over-estimate of memory use per connection. Ensure that this figure, multiplied by the number of connections you want, exceeds neither the shared buffer size nor the capacity of your RAM. The shared buffer size itself should generally be less than the amount of RAM in the machine.

The inequalities to check are:

shared_buffers < (RAM - padding)
(current_ram_use / max_connections) * desired_connections < (RAM - padding)
(current_ram_use / max_connections) * desired_connections < shared_buffers

Basically, ensure that you have enough RAM and that the projected memory use exceeds neither the RAM nor the shared buffer size. The inequalities use max_connections rather than the actual number of connections in use for some extra assurance. A short worked example follows.
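
For instance, with hypothetical numbers (not measurements from my system): 32GB of RAM, 2GB of padding, shared_buffers at 8GB, 4GB currently used with max_connections = 100, and 150 desired connections:

shared_buffers:      8GB < 32GB - 2GB = 30GB       (ok)
per connection:      4GB / 100 connections = ~40MB
projected use:       150 * 40MB = 6GB < 30GB       (ok)
versus buffers:      6GB < 8GB                     (ok)

All three checks pass, so raising max_connections to 150 would be reasonable on such a machine.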

Reloading and Restarting

The configuration file can be reloaded into an active server, without a restart, from the SQL command line console or through pgAdmin with the query:

SELECT pg_reload_config()

By default this function requires superuser privileges, and it will not apply changes to the maximum number of allotted connections or the buffer size. Any variable in the configuration file marked with the following comment requires restarting the database server:

# (change requires restart)

Both max_connections and shared_buffers change the server's memory usage, which cannot be reworked on the fly. From the Linux command line, restart the server with the init script, for example:

/etc/init.d/postgresql restart

On Ubuntu, the restart script normally resides in /etc/init.d.

If PostgreSQL is set up as a service, use:

service postgresql restart

Conclusion

In today’s world where data is bountiful, it is easy to exceed the connection limit in PostgreSQL. This article reviewed how to check and raise the connection limit, along with considerations for sizing it and the associated shared buffers.

Tune PostgreSQL for Faster Inserts

Insertions in PostgreSQL can take a while, especially for large amounts of data. One project of mine at work performs large batch updates, and another issues single insert statements where those are preferable. There are ways to tune PostgreSQL to make sure you are getting the most out of your database.

These variables should be set in postgresql.conf.

  • Tune write-ahead logging (WAL): set wal_level to minimal and give wal_writer_delay a reasonable value (sample settings follow this list). When WAL is flushed very frequently and the content is large, this can greatly affect the server. In my case, 10,000-record inserts went from 30 minutes to 2 seconds by extending the effective delay from per-transaction (0 seconds) to 100 milliseconds; the default wal_writer_delay is 200 milliseconds.
  • Increase thread limits: higher thread limits help large transactions. A 10,000-record batch update in my case went from 15 minutes to 3 seconds when the thread limit went from 10,000 to 50,000 threads.
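
For reference, here is a sketch of the corresponding postgresql.conf lines for the WAL settings above; the values mirror the example and should be tuned to your own workload and PostgreSQL version:

wal_level = minimal            # write only the WAL needed for crash recovery; not usable with replication
wal_writer_delay = 100ms       # how often the WAL writer flushes WAL to disk (default is 200ms)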

Be aware that other configuration settings will also affect your system, and that WAL logging is necessary for replication: WAL records are used to recreate state in a hot-standby environment.

The Benefits of Combining Fork Join Pools with PostgreSQL

Warning: as I am incredibly busy at the moment, benchmarks are not provided here. This is a performance review based on work experience and the accompanying documents written for my superiors.

PostgreSQL is a terrific free tool. In my opinion, it can do almost everything an Oracle product can, and it has quick functions for performing critical tasks such as dumping an entire database to CSV from a programming language. However, insertion can be slow. I am constantly updating three or more databases with 15 or more attributes apiece. Unfortunately, my company's estimated revenue is roughly the cost of an Oracle or Microsoft license. While the PostgreSQL developers promise more parallelism in the future, Java, along with Apache's connection pooling, has already solved a significant portion of this problem for us through Fork Join Pools. The result can be an improvement of thousands of records processed in the same amount of time.

Fork Join Pools and Why to use SE 8

Java SE 7 introduced the Fork Join Pool in the java.util.concurrent package. Oracle recommends using it only for extremely intensive tasks. The pool works by sharing tasks among threads through work-stealing: idle threads take queued work from busy ones. Java SE 8 improves on the algorithm and reduces the number of tasks that end up dropped.

Setting Up the Pool In Spring 4

I reviewed several different connection pools before settling on the vastly improved, newest version of Apache DBCP. BoneCP offers better performance, but at the cost of some critical flaws: chiefly, the current version fails to close connections properly, leading its developer to recommend reverting to version 0.8. The new Apache DBCP outperformed C3P0 in my benchmark tests. My assessment is provided below. It is based on connections both from a machine over the internet to a co-location facility and from a machine to a local database.

Connection Pool     Pros        Cons
Apache DBCP         Reliable    Somewhat slower than BoneCP
BoneCP              Fast        Somewhat unreliable
C3P0                Reliable    Slow; a worse option than DBCP

Setting up DBCP is incredibly simple and configurable. Spring can be configured with Java-based configuration or XML. For my purpose, XML provided a decent option since my teammates use XML in their daily work, so the XML configuration is provided below. A decent way to set this up is to use the data source bean as a reference in the declaration of the DAO template's data source.

<bean id="dataSource" destroy-method="close"
  class="org.apache.commons.dbcp.BasicDataSource">
    <property name="driverClassName" value="${jdbcdriver}"/>
    <property name="url" value="${jdbcurl}"/>
    <property name="username" value="${dbcusername}"/>
    <property name="password" value="${dbcpass}"/>
    <property name="initialSize" value="3"/>
    <property name="validationQuery" value="Select 1" />
</bean>

Other options are provided at the DBCP site. A validation query is provided here since I ran into an issue with the validation query in my own work.

The data source can then be wired into the DAO template (Spring's JdbcTemplate) using:

     <bean id="jdbcTemplate" class="org.springframework.jdbc.core.JdbcTemplate">
           <property name="dataSource" ref="dataSource"/>
     </bean>

Delivering the Code with a DAO Template: Avoiding Race Conditions

An important consideration is how multiple connections will be handled by the DAO template. Fortunately, Spring allows a method to be executed asynchronously, without declaring it synchronized, by using the @Async annotation.

   // Requires async support to be enabled (e.g. @EnableAsync or <task:annotation-driven/>).
   @Async
   public void postData(Map data){
         // hand the data off to the DAO template here
   }

The method above is now executed asynchronously.

Spring method signatures should accept the highest interface in the collection hierarchy that the common concrete collections implement or extend. Due to inheritance, a HashMap "is-a" Map, but a reference of type Map can only be cast back to HashMap when the underlying object actually is a HashMap. Call the getClass() method to see this in action: if getClass() returns java.util.HashMap, it is possible to cast the Map to HashMap to gain any benefits beyond the Map interface. Objects are passed by reference (by memory location, more specifically), so the runtime type travels with the object.
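
To make the point concrete, here is a small standalone sketch; it is illustrative only and not taken from the project code:

import java.util.HashMap;
import java.util.Map;

public class MapCastDemo {
    public static void main(String[] args) {
        Map<String, Object> data = new HashMap<>();   // declared as the Map interface, backed by a HashMap
        data.put("rows", 10000);

        System.out.println(data.getClass());          // prints class java.util.HashMap

        if (data instanceof HashMap) {
            HashMap<String, Object> hashMap = (HashMap<String, Object>) data; // safe: the object really is a HashMap
            System.out.println(hashMap.size());
        }
    }
}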

The Fork Join Pool

In this instance, the Fork Join Pool should accept RecursiveAction tasks. They are not only useful for recursive work; thanks to work-stealing, they improve I/O-bound and other intensive tasks as well. Oddly, I ran into an issue where tasks returning values through RecursiveTask (my parsers) would never completely finish, so I switched to using Callables instead. Fork Join Pools explicitly accept Runnable, Callable, and ForkJoinTask classes, so Threads (as Runnables), Callables, Runnables, ForkJoinTasks, and RecursiveAction objects are all acceptable.
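
A minimal sketch of the Callable route follows; it is illustrative only, and the task body stands in for a real parser:

import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

public class CallableSubmitDemo {
    public static void main(String[] args) throws Exception {
        ForkJoinPool fjp = new ForkJoinPool(Runtime.getRuntime().availableProcessors());

        // submit() returns a Future, so the parsed result can be collected later.
        Future<String> result = fjp.submit(new Callable<String>() {
            @Override
            public String call() {
                return "{\"parsed\":true}"; // placeholder for real parsing work
            }
        });

        System.out.println(result.get()); // blocks until the task has completed
        fjp.shutdown();
    }
}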

Instantiation for this task requires:

ForkJoinPool fjp = new ForkJoinPool(Runtime.getRuntime().availableProcessors() * procnum); // procnum: an application-defined multiplier

The argument is the desired level of parallelism. A value less than one (or greater than the implementation's maximum) throws an IllegalArgumentException.

Keep in mind that Amdahl's law applies and that too many threads, or inappropriate use, will slow the processing down.

To submit a task, use:

fjp.execute(new SQLRecursiveAction(/**parameters**/));

The SQL class (in this case an inner class) would be:

private class SQLRecursiveAction extends RecursiveAction{
      private final String aString;

      public SQLRecursiveAction(String aString){
            this.aString = aString; // keep the parameters compute() will need
      }

      @Override
      protected void compute(){
            /**call to DAO Template using aString**/
      }
}

The compute method is required because RecursiveAction declares it as an abstract method, and every concrete subclass must implement it.

Once the work is submitted, do not shut down the Fork Join Pool right away. Instead, there are other ways to wait for completion that allow the pool to be reused.

int w = 0;
// Busy-wait until the pool is quiescent so it can be reused for the next batch.
while (!fjp.isQuiescent() && fjp.getActiveThreadCount() > 0) {
      w++;
}
//log.info("Waited for " + w + " cycles");

When all work is complete and the pool is no longer needed, shut it down.

fjp.shutdown();      // orderly shutdown: previously submitted tasks are allowed to finish
//fjp.shutdownNow(); // immediate shutdown: attempts to halt running tasks

Findings

In the end, the Fork Join Pool significantly improved the performance of my application, which can parse any type of document into attributes stored as JSON strings and posted to the database. The number of records I was able to parse and post increased dramatically, from several hundred in a minute to over 10,000 records with the same number of attributes. It may also be wise to consider other improvements, such as Tokutek's fractal tree indexing, for large and normalized databases.