Partial aggregation: The beautiful way

02.2015 / Category: How To / Tags: development | sql help

Partial aggregation: Aggregations are a fundamental feature of SQL. Functions such as min, max, count, sum, avg and so on are used in virtually every application - it is basically impossible to write a reasonably complex applications without them.

Here is a trivial aggregate: All it does is summing up numbers from 1 to 10:

test=# SELECT sum(x) FROM generate_series(1, 10) AS x;
 sum
-----
  55
(1 row)

1

2

3

4

5

test=# SELECT sum(x) FROM generate_series(1, 10) AS x;

sum

-----

55

(1 row)

So far so good. But what if we only want to sum up even numbers? We can rely on a simple trick: NULL values are ignored by aggregation functions. This makes a lot of sense because NULL really means “undefined” and you cannot average or sum up undefined values. A simple CASE WHEN can do the job:

test=# SELECT sum(CASE WHEN x % 2 = 0 THEN x ELSE NULL END)
            FROM generate_series(1, 10) AS x;
 sum
-----
  30
(1 row)

1

2

3

4

5

6

test=# SELECT sum(CASE WHEN x % 2 = 0 THEN x ELSE NULL END)

FROM generate_series(1, 10) AS x;

sum

-----

30

(1 row)

This works perfectly - however, it is a bit old fashioned and pretty nasty to write (especially in case of more complex operations).

The fancy way to partial aggregation

There is a more fancy way to do that:

test=# SELECT     sum(x) FILTER (WHERE x % 2 = 0) AS even,
                  sum(x) FILTER (WHERE x % 2 = 1) AS odd
       FROM  generate_series(1, 10) AS x;
 even | odd
------+-----
   30 |  25
(1 row)

1

2

3

4

5

6

7

test=# SELECT sum(x) FILTER (WHERE x % 2 = 0) AS even,

sum(x) FILTER (WHERE x % 2 = 1) AS odd

FROM generate_series(1, 10) AS x;

even | odd

------+-----

30 | 25

(1 row)

The FILTER keywords allow users to tell the systems which rows should make it into the aggregate function. The new syntax is far more readable, as well as a lot shorter.

Find out more about partial aggregation in PostgreSQL 16 in this blog post about Parallel Aggregates by Pavlo Golub.

In case you need any assistance, please feel free to contact us.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

0 0 votes

Article Rating

Subscribe

7 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Efi

9 years ago

Is there any difference performance wise between the FILTER version and the CASE WHEN version?

0

Reply

David Fetter

9 years ago

Reply to Efi

Clarity and speed. The planner knows about FILTER, so it has what it needs to make it faster.

0

Reply

Slava Moudry

9 years ago

Reply to David Fetter

Thanks! This is really great. David, can you elaborate a bit on planner optimizations for FILTER.
Would it use data stats on columns or can it pull data using index-only scan?

0

Reply

David Fetter

9 years ago

Reply to Slava Moudry

Check the git commit logs for FILTER. The initial mechanics of the optimization are there.

0

Reply

David Fetter

9 years ago

Reply to Slava Moudry

FILTER is about eliminating, not adding. See the patch for places where that happens. http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b560ec1b0d7b910ce13edc51ffaafaca72136e3b