In version 9.6, PostgreSQL introduced parallel queries. The ability to use more than one CPU core per query is a giant leap forward and has made PostgreSQL an even more desirable database. Many workloads can be sped up considerably by parallel queries. However, in this article I want to point out a small missing feature, which bugged one of our support clients recently: the lack of parallel query support at the SERIALIZABLE isolation level. UPDATE Sept. 2023: This restriction was lifted as of v12; see this note in the documentation.

Consider the following example:


test=# CREATE TABLE t_demo AS
SELECT         id
FROM           generate_series(1, 10000000) AS id;

SELECT 10000000

The sample table contains 10 million rows.

To tell the optimizer about the content of the table, running ANALYZE seems like a good idea:


test=# ANALYZE t_demo;
ANALYZE

Then we can tell PostgreSQL the maximum number of worker processes allowed for the query we are about to run:


test=# SET max_parallel_workers_per_gather TO 4;
SET
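Keep in mind that this setting is only an upper limit per Gather node; the number of workers actually started is also bounded by the instance-wide max_worker_processes setting. A quick check (the value shown assumes a default configuration):


test=# SHOW max_worker_processes;
 max_worker_processes
----------------------
 8
(1 row)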

After that we can already run a simple aggregation:


test=# explain SELECT count(*) FROM t_demo;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=76498.57..76498.58 rows=1 width=8)
   ->  Gather  (cost=76498.15..76498.56 rows=4 width=8)
         Workers Planned: 4
         ->  Partial Aggregate  (cost=75498.15..75498.16 rows=1 width=8)
               ->  Parallel Seq Scan on t_demo  (cost=0.00..69248.12 rows=2500012 width=0)
(5 rows)

PostgreSQL will execute a so-called parallel sequential scan using 4 worker processes (and thus up to 4 CPU cores). The system scales nicely, and significantly better execution times can be observed.
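Note that a plain EXPLAIN only shows how many workers were planned. To verify that they are really started at run time, EXPLAIN (ANALYZE) can be used:


test=# EXPLAIN (ANALYZE) SELECT count(*) FROM t_demo;

In the output, look for the lines "Workers Planned: 4" and "Workers Launched: 4" under the Gather node. If fewer workers are launched than planned, the pool limited by max_worker_processes was exhausted at execution time.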

SERIALIZABLE and parallelism

We have seen a couple of people use SERIALIZABLE transactions for analytical queries. While this might not be a good idea for other reasons, it also has implications for the use of parallel queries. At this point (as of PostgreSQL 9.6) some code in PostgreSQL is not yet fully parallel-safe. Therefore, parallel queries are not supported if you happen to use SERIALIZABLE. If you are affected by this, consider using REPEATABLE READ instead.

Here is what happens:

test=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN
test=# explain SELECT count(*) FROM t_demo;
                               QUERY PLAN
------------------------------------------------------------------------
 Aggregate  (cost=169248.60..169248.61 rows=1 width=8)
   ->  Seq Scan on t_demo  (cost=0.00..144248.48 rows=10000048 width=0)
(2 rows)

test=# COMMIT;
COMMIT

As you can see, only a single core is used.
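Switching to REPEATABLE READ, as suggested above, brings the parallel plan back. Here is a quick check in the same session (the plan shown is the parallel plan from above; your cost estimates may differ):


test=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
test=# explain SELECT count(*) FROM t_demo;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=76498.57..76498.58 rows=1 width=8)
   ->  Gather  (cost=76498.15..76498.56 rows=4 width=8)
         Workers Planned: 4
         ->  Partial Aggregate  (cost=75498.15..75498.16 rows=1 width=8)
               ->  Parallel Seq Scan on t_demo  (cost=0.00..69248.12 rows=2500012 width=0)
(5 rows)

test=# COMMIT;
COMMIT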

The code inside the query planner (planner.c) is very clear about this:

We can’t use parallelism in serializable mode because the predicate locking code is not parallel-aware.  It’s not catastrophic if someone tries to run a parallel plan in serializable mode; it just won’t get any workers and will run serially.  But it seems like a good heuristic to assume that the same serialization level will be in effect at plan time and execution time, so don’t generate a parallel plan if we’re in serializable mode.

Future versions of PostgreSQL will surely relax this at some point so that all isolation levels can benefit from parallel queries.

UPDATE:

This limitation was removed as of PostgreSQL v12: see the documentation on “Allow parallelized queries when in SERIALIZABLE isolation mode (Thomas Munro)” for more information.

See also this note about the limitations of parallel query (v15).

In case you need any assistance, please feel free to contact us.
 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.