
pgstrom: PostgreSQL on a GPU with joins

09.2015

In my previous post I shared some GPU performance data, mostly related to aggregation. Given the information I found on the pgstrom wiki page, I decided to give joins a try to see how much speed we can gain by offloading some of PostgreSQL's work to the graphics card. Let's see how PostgreSQL does on a GPU with joins.

First I created some test data:

25 million rows are joined with 1 million rows, which form a 100% subset of the original data.
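The original statements did not survive the page conversion; a sketch of how such test data could be generated follows (table and column names are my assumption, not the original):

```sql
-- hypothetical reconstruction: 25 million rows, plus a
-- 1 million row table that is a 100% subset of the big one
CREATE TABLE t_test AS
    SELECT id, random() AS x
    FROM   generate_series(1, 25000000) AS id;

CREATE TABLE t_join AS
    SELECT id
    FROM   generate_series(1, 1000000) AS id;
```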

A first test shows a very nice improvement. First, two queries on my CPU:

The CPU in use here is an “AMD FX(tm)-8350 Eight-Core Processor” running at 4 GHz.
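The query text itself is not reproduced above; a plausible reconstruction of the join being timed could look like this (table and column names are my assumption):

```sql
-- join the 25M row table against the 1M row subset and aggregate
EXPLAIN ANALYZE
SELECT count(*)
FROM   t_test AS a
       JOIN t_join AS b ON a.id = b.id;
```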

Let us try the same thing on the GPU:
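PG-Strom can be toggled per session via its `pg_strom.enabled` GUC (assuming the extension is installed; the query is the same hypothetical join as before):

```sql
SET pg_strom.enabled = on;   -- offload eligible plan nodes to the GPU

EXPLAIN ANALYZE
SELECT count(*)
FROM   t_test AS a
       JOIN t_join AS b ON a.id = b.id;
```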

What we see here is a nice improvement. The query is several times faster.

Making use of indexes

The interesting part is to see what happens if indexes are added to both sides of the join:
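The indexes in question would be plain B-trees on the join columns; a sketch (index and table names are my assumption):

```sql
CREATE INDEX idx_test_id ON t_test (id);
CREATE INDEX idx_join_id ON t_join (id);
```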

The standard query shown before does not show any difference, because too much data is needed inside the join. So the test is repeated with a reasonable filter:
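A filtered variant along these lines would give the indexes a chance (the filter value and names are my invention):

```sql
-- restrict the join to a small range so the indexes pay off
EXPLAIN ANALYZE
SELECT count(*)
FROM   t_test AS a
       JOIN t_join AS b ON a.id = b.id
WHERE  a.id < 100000;
```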

The CPU version is pretty fast: PostgreSQL scans the indexes and uses the sorted input to perform a nice merge join. The aggregation is pretty fast as well.

In the GPU-enabled case the plan does not look as efficient with the default settings:

Even though the query is slower in this case, it should not be a major issue. If the cost models are adjusted properly and the planner is told how to pick the right plan, the system should be able to figure out that the CPU version is the faster one here. So the loss in speed is really not a problem. Once all the infrastructure is in place, balancing the cost model and adjusting the planner here and there does not look like a showstopper (at least not given the achievements already made).

Joining more tables with pgstrom

Let us just add more tables to the join and see what happens. To make sure that CPU and GPU get a fair chance, I have removed my two indexes again:
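A multi-way join of this kind could be sketched as follows (the extra tables and names are my assumption; the original used its own tables):

```sql
-- no indexes: every join partner has to be scanned and hashed
EXPLAIN ANALYZE
SELECT count(*)
FROM   t_test AS a
       JOIN t_join AS b ON a.id = b.id
       JOIN t_join AS c ON b.id = c.id
       JOIN t_join AS d ON c.id = d.id;
```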

As expected, the CPU has a hard time joining all of this together. It is a pretty nasty join, and in the end 34 seconds are needed.

The GPU can really shine under these circumstances. Expensive joins seem to be exactly what this is all about:

Given the number of rows involved, this is a really nice number to see. 8 seconds just rocks and shows the real potential of a GPU.

Of course, you have to keep in mind that I am running arbitrary SQL statements, which are not too realistic in many cases. I have yet to test pgstrom on a real workload involving the more everyday queries you would typically see in PostgreSQL.

Comments
Alexander Korotkov
8 years ago

You should also keep in mind that you are comparing SQL running in a single CPU thread with the same SQL running in many GPU threads. I suppose that pgstrom utilizes the full GPU. You could run your test queries in parallel using pgbench; then you could figure out the real average performance.

Hans-Jürgen Schönig
8 years ago

I actually planned some larger-scale pgbench tests but have not gotten around to it yet. I also still have to figure out where the CPU time is burned. We are definitely planning some more tests as soon as there is enough time.

Matthew Seal
8 years ago

This article (and your previous one) were really useful. I recently finished some performance benchmarking for a production system with various DB implementations. One was PGStrom, and your examples let me quickly find what wasn't available for GPUPreAggr calls. It would be great if you decide to post another blog on PGStrom where you point out where the library is still lacking and what's needed to get it to production capability, i.e. not being able to aggregate involving Numeric/UUIDs, GPU worker limits, using the debug logger to identify decision trees, etc. Also, more content on bottlenecks and on the types of queries that will be sub-optimal would be awesome. I found, for example, that being unable to stage data in the graphics card's memory for later execution gave all queries an underlying latency regardless of size, which made my use case less ideal.
