pgbouncer: Proving the point

PostgreSQL has proven to be an excellent database for web applications of all kinds. The important thing to note is that many web applications generate a load profile quite different from that of a typical desktop application.
A web application will typically open a connection, fire a couple of very short queries and disconnect. The SQL involved is usually quite simple, so why care? Well, you must not forget that opening and closing a database connection over and over again is, compared to trivial SQL, not free; it is overhead worth thinking about.

To solve the problem you can turn to a connection pooler, and pgbouncer may be exactly what you have been looking for. Its key advantage is that it is an extremely lightweight solution to the problem of PostgreSQL connection pooling: the reported memory overhead is only around 2 kB per connection.
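For reference, a minimal pgbouncer configuration for a test like the one below might look as follows. This is only a sketch: the database name p1 and the port 6432 match the pgbench invocations in this article, while the host, user handling and pool sizes are assumptions, not taken from the original setup.

```ini
; pgbouncer.ini - minimal sketch (host and pool sizes are assumed)
[databases]
; forward connections for database "p1" to the local PostgreSQL server
p1 = host=127.0.0.1 port=5432 dbname=p1

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432            ; the port pgbench connects to with -p 6432
auth_type = trust             ; fine for a local benchmark, not for production
pool_mode = session           ; one server connection per client session
max_client_conn = 100
default_pool_size = 20
```

With pool_mode = session a client that disconnects hands its server connection back to the pool, so the expensive PostgreSQL backend is reused instead of forked anew.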

Recently I was pretty surprised by the kind of impact it can have in some extreme cases, so I decided to run the simplest possible test to see what saving on this overhead really means.

We can write ourselves a nice sample SQL file (select.sql), which pgbench will execute repeatedly:

SELECT 1;

Now we can run an extreme test against our standard PostgreSQL installation (a single CPU core in a VirtualBox VM, PostgreSQL 9.2.4):

hs@VM:test$ pgbench -t 1000 -c 20 -S p1 -C -f select.sql
starting vacuum…end.
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 20000/20000
tps = 67.540663 (including connections establishing)
tps = 15423.090062 (excluding connections establishing)

The main point here is the difference between the numbers including and excluding connection establishment: more than a 200-fold gap. A major difference has to be expected, because forking a backend process for each connection is far more expensive than executing the most trivial SQL possible. Note that we used the -C switch of pgbench, which makes sure that the connection is really closed after every transaction. This means that we had to open 20,000 connections instead of just 20.

Let us repeat the very same test, only this time we won't connect to PostgreSQL directly but to pgbouncer, which then passes the queries on to the real PostgreSQL database instance:

hs@VM:test$ pgbench -t 1000 -c 20 -S p1 -C -f select.sql -p 6432
starting vacuum…end.
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 20000/20000
tps = 1013.264853 (including connections establishing)
tps = 2765.711593 (excluding connections establishing)

We run the very same test here again. Keep in mind that pgbouncer and PostgreSQL share the same lonely CPU core, which keeps the numbers directly comparable.
In the nasty case (connect + trivial statement + disconnect) the overall throughput has increased more than 15-fold. Sure, when connections are kept open the raw speed goes down, because the pgbouncer layer itself consumes some of the capacity; but that is not the case we wanted to optimize for. What really matters is the scenario that constantly needs new connections.
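The two headline ratios quoted above can be reproduced directly from the pgbench figures; this one-liner simply redoes the arithmetic on the reported tps numbers:

```shell
#!/bin/sh
# Reproduce the two ratios from the pgbench runs above.
awk 'BEGIN {
    # direct connection: tps excluding vs. including connection establishment
    printf "connection overhead factor (direct): %.0fx\n", 15423.090062 / 67.540663
    # tps including connections: via pgbouncer vs. direct
    printf "throughput gain via pgbouncer:       %.1fx\n", 1013.264853 / 67.540663
}'
```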

It is interesting to see just how big the performance gain can be in such extreme cases.

Hans-Juergen Schoenig
Hans-Jürgen Schönig has 15 years of experience with PostgreSQL. He is a consultant and the CEO of Cybertec Schönig & Schönig GmbH (www.postgresq-support.de), which has served countless customers around the globe.