
Distributed aggregates / aggregate pushdown in PostgreSQL

10.2017

PostgreSQL 10 will provide end users with countless new features. One of them is related to “Foreign Data Wrappers” and is generally known as “aggregate pushdown”. To be honest: this is one of my favorite new features of PostgreSQL 10, so it is worth sharing with a broader audience. If you are interested in remote aggregation, distributed queries, distributed aggregates and aggregate pushdown in PostgreSQL, keep reading.

Preparing PostgreSQL for a test

To show what the optimizer is already capable of, we need two databases:
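A minimal sketch of this step, assuming both databases live in the same local instance and are named db01 and db02 as in the rest of this post:

```sql
-- Run as a user that is allowed to create databases.
CREATE DATABASE db01;  -- local side, will hold the foreign tables
CREATE DATABASE db02;  -- "remote" side, will hold the real data
```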

Then we can deploy some simple test data in db02:
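A script along the following lines would do the job. The table name t_test and its column layout are assumptions made for illustration:

```sql
-- Run while connected to db02 (in psql: \c db02).
BEGIN;

-- t_test is a hypothetical table name used throughout these sketches.
CREATE TABLE t_test (id serial, name text);

-- 1 million rows, all carrying the same name.
INSERT INTO t_test (name)
    SELECT 'dummy' FROM generate_series(1, 1000000);

COMMIT;

-- Refresh planner statistics for the new table.
ANALYZE t_test;
```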

The script generates 1 million rows, all carrying the same name (“dummy”).

Create a “database link” in PostgreSQL

For many years now, PostgreSQL has provided means to access remote data sources using “Foreign Data Wrappers” (FDWs).
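As a sketch, linking db02 into db01 could look like this. The server name pgserver, the host and the user name are assumptions; depending on your authentication setup, the user mapping may also need a password option:

```sql
-- Run while connected to db01 (in psql: \c db01).
CREATE EXTENSION postgres_fdw;

-- "pgserver" is a hypothetical name for the virtual server pointing to db02.
CREATE SERVER pgserver
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (dbname 'db02', host 'localhost');

-- Map the current local user to a remote user (assumed here: postgres).
CREATE USER MAPPING FOR CURRENT_USER
    SERVER pgserver
    OPTIONS (user 'postgres');

-- Link every table of the remote "public" schema into the local one.
IMPORT FOREIGN SCHEMA public
    FROM SERVER pgserver
    INTO public;
```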

The script shown here loads the postgres_fdw extension, which allows us to connect to a remote PostgreSQL database. Then a virtual server pointing to db02 is created in db01. Finally, a user mapping is created and the foreign schema is imported. All tables in the “public” schema of the remote database will be linked and visible in db01.

Running a simple query in PostgreSQL

Once the test data is in place, we can give PostgreSQL a try and see how it behaves with aggregates. Here is an example:
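A query of the following shape can be used to check this. It assumes the hypothetical t_test table has been imported into db01 as a foreign table:

```sql
-- Run while connected to db01.
-- VERBOSE makes the plan show the SQL that is sent to the remote side.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT name, count(*)
FROM t_test
GROUP BY name;
```

If the aggregate is pushed down, the plan should consist of a single Foreign Scan node whose “Remote SQL” line already contains the count(*) and the GROUP BY clause.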

The most important observation here is that PostgreSQL is able to push down the complete aggregate. As you can see, the remote SQL is basically the same as the local query. The main advantage is that by pushing down the aggregate, PostgreSQL can drastically reduce the load on your local machine and the amount of data that has to be sent over the network.

PostgreSQL Foreign Data Wrappers and joins

However, a word of caution is in order at this point: yes, aggregates can be pushed down to a remote server, but joins happen before the aggregate. In other words, PostgreSQL has to transfer all the data from the remote host in this case:
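One way to provoke this situation is to join the foreign table with data that only exists locally. The query below is an illustrative sketch using the hypothetical t_test table and a local generate_series call:

```sql
-- Run while connected to db01.
-- The join partner is produced locally, so the join cannot run on db02,
-- and the aggregate on top of the join has to be computed locally as well.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t.name, count(*)
FROM t_test AS t
     JOIN generate_series(1, 4) AS y(x) ON y.x = t.id
GROUP BY t.name;
```

In the resulting plan, look at the “Remote SQL” line of the Foreign Scan: it should no longer contain the aggregate, which now shows up as a local node above the join.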

Further development

For PostgreSQL 11.0 we are working on a patch that will hopefully make it into core. It allows PostgreSQL to perform many aggregations before the join happens, which makes joining cheaper because less data ends up in the join. Many more improvements are possible, and they may be added to the planner in the near future.
However, as of PostgreSQL 10 a large step forward has already been made toward allowing PostgreSQL to dynamically distribute queries in a cluster.
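Until the planner can do this on its own, the idea of aggregating before joining can be written by hand. This is only a sketch of the general shape, not the patch itself, and it reuses the hypothetical t_test table and the local generate_series join partner:

```sql
-- Run while connected to db01.
-- Step 1 (subquery): aggregate on the foreign table alone, which is a
--                    candidate for being pushed down to db02.
-- Step 2 (outer query): join the pre-aggregated rows locally and combine
--                       the partial counts.
SELECT agg.name, sum(agg.cnt) AS total
FROM (
        SELECT id, name, count(*) AS cnt
        FROM t_test
        GROUP BY id, name
     ) AS agg
     JOIN generate_series(1, 4) AS y(x) ON y.x = agg.id
GROUP BY agg.name;
```

With this particular test data, every id occurs only once, so the inner aggregate does not shrink anything. The rewrite only pays off when the join key has many duplicates, and whether the inner aggregate really runs remotely should be verified with EXPLAIN VERBOSE.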

 

