In my previous post on PostgreSQL 9.4 I wrote about aggregate FILTER clauses, which are a neat way to make partial aggregates more readable. Inspired by some comments on that post, I decided to write a follow-up to see what impact the new FILTER clause has on performance.

Loading some demo data

To get an idea of the performance impact you can expect, let us load some demo data:


test=# BEGIN;
BEGIN

test=# CREATE TABLE t_test (id int, dummy char(200));
CREATE TABLE

test=# INSERT INTO t_test SELECT *, 'dummy' FROM generate_series(1, 20000000) AS x;
INSERT 0 20000000

test=# COMMIT;
COMMIT

The table is roughly 4.5 GB in size. On the test box this still fits nicely into memory:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 4596 MB
(1 row)

A classical approach

For years I have used the classical approach. The trick is that aggregate functions ignore NULL values, so the idea behind partial aggregation is to map non-matching rows to NULL on the fly. The code is the same as in the previous post:

test=# SELECT  
        count(CASE WHEN id % 2 = 0 THEN 1 ELSE NULL END) AS even,
        count(CASE WHEN id % 2 = 1 THEN 1 ELSE NULL END) AS odd
FROM    t_test;
   even   |   odd   
----------+----------
 10000000 | 10000000
(1 row)

Time: 6208.092 ms

In this example the aggregation takes slightly over 6 seconds (with all hint bits set, etc.). This is not too bad.
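A variant of the same classical trick, which you will also see in the wild, sums up ones and zeros instead of counting non-NULL values. It returns the same numbers (a sketch for illustration, not taken from the benchmark above):

```sql
-- equivalent classical variant: add 1 for matching rows, 0 otherwise
SELECT  sum(CASE WHEN id % 2 = 0 THEN 1 ELSE 0 END) AS even,
        sum(CASE WHEN id % 2 = 1 THEN 1 ELSE 0 END) AS odd
FROM    t_test;
```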

The modern approach – filter clauses

Let us see now what happens when the new approach is used:

SELECT  count(id) FILTER (WHERE id % 2 = 0) AS even,
        count(id) FILTER (WHERE id % 2 = 1) AS odd
FROM    t_test;
   even   |   odd   
----------+----------
 10000000 | 10000000
(1 row)

Time: 5353.172 ms

The query is almost one second faster than the original. The per-row overhead of the new FILTER clause is simply much smaller than that of the CASE construct. Of course, the more clauses you add, the larger the difference will become.
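Note that FILTER is not restricted to count(); it can be attached to any aggregate function. A quick sketch (not part of the timings above) using other aggregates on the same table:

```sql
-- FILTER works with any aggregate, not just count()
SELECT  sum(id)  FILTER (WHERE id % 2 = 0) AS sum_even,
        avg(id)  FILTER (WHERE id % 2 = 1) AS avg_odd,
        count(*) FILTER (WHERE id > 10000000) AS upper_half
FROM    t_test;
```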

More realistic scenarios

Before a more complex query is fired I decided to increase work_mem to make sure that all aggregations can happen in memory:

test=# SET work_mem TO '1 GB';
SET
Time: 0.375 ms
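The setting applies only to the current session. As a side note (not part of the benchmark), SHOW and RESET can be used to inspect and revert it:

```sql
-- verify the session-local setting
SHOW work_mem;

-- revert to the server default once the tests are done
RESET work_mem;
```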

This time an additional GROUP BY clause is added to split the data set into smaller groups. After all, a GROUP BY clause is what most real-world queries using FILTER will contain:

test=# SELECT  id % 3 AS g,
               count(CASE WHEN id % 2 = 0 THEN 1 ELSE NULL END) AS even,
               count(CASE WHEN id % 2 = 1 THEN 1 ELSE NULL END) AS odd
FROM    t_test
GROUP BY 1;
 g |  even   |   odd  
---+---------+---------
 0 | 3333333 | 3333333
 1 | 3333333 | 3333334
 2 | 3333334 | 3333333
(3 rows)

Time: 22119.977 ms

The result is returned in roughly 22 seconds, which is quite acceptable.

The next query shows the same thing using the new syntax:

SELECT  id % 3 AS g,
        count(id) FILTER (WHERE id % 2 = 0) AS even,
        count(id) FILTER (WHERE id % 2 = 1) AS odd
FROM    t_test
GROUP BY 1;
 g |  even   |   odd  
---+---------+---------
 0 | 3333333 | 3333333
 1 | 3333333 | 3333334
 2 | 3333334 | 3333333
(3 rows)

Time: 21379.657 ms

Summary – FILTER clauses

The important observation is that the performance difference is again around one second. This is expected: the time saved by the FILTER clause comes from the cheaper per-row evaluation of the filter condition, so it is independent of the grouping around it.

In case you need any assistance, please feel free to contact us.
 

