Checkpoint distance and amount of WAL

UPDATED September 2023: Most people who use PostgreSQL database systems are aware of the fact that the database engine must send changes to the so-called “Write Ahead Log” (= WAL) to ensure that in case of a crash, the database will be able to recover to a consistent state safely and reliably. However, not everybody knows that tuning the database server will actually have an impact on the amount of WAL which Postgres writes to disk. This is due to Postgres' use of checkpoints.

Table of Contents

WAL and checkpoint

PostgreSQL must write changes to the WAL (or xlog as some call it) before those changes even make it to the underlying data files. In most cases PostgreSQL will simply log what has to be changed in the data files. However, it must write much much more.

To ensure that the WAL will not grow infinitely, PostgreSQL creates checkpoints. These are a perfect opportunity to recycle WAL. The size of your transaction should be more or less constant, so cleaning up WAL from time to time is an absolute must. Fortunately checkpointing is done automatically.

By default, PostgreSQL uses the following parameters to determine when a checkpoint is supposed to happen:

# checkpoint_timeout = 5min # range 30s-1h

# max_wal_size = 1GB

# min_wal_size = 80MB

These default settings are somewhat ok for a reasonable small database. However, for a big system, having fewer checkpoints will help to increase performance. Setting max_wal_size to, say, 20GB is definitely a good thing to do if your system is reasonably large and heavily loaded (writes).

What is widely known is that the distance between two checkpoints does not only improve speed due to reduced checkpointing – fewer checkpoints will also have an impact on the amount of transaction log written.

Reducing the amount of transaction log written

Increasing the distance between checkpoints leads to less WAL - but why does that actually happen? Remember: The whole point of having the transaction log in the first place, is to ensure that the system will always survive a crash. Applying these changes in the WAL to the data files will fix the data files and recover the system at startup. To do that safely, PostgreSQL cannot simply log the changes made to a block – in case a block is changed for THE FIRST TIME after a checkpoint, the entire page has to be sent to the WAL. All subsequent changes can be incremental. The point now is: If checkpoints are close together there are many “first times” and full pages have to be written to the WAL quite frequently. However, if checkpoints are far apart the number of full page writes will drop dramatically leading to a lot less WAL.

On heavily loaded systems we are not talking about peanuts – the difference can be quite significant.

Measuring the amount of WAL written

To measure how much transaction log the system actually produces during a normal benchmark, I have conducted a simple test using empty database instances:

[hs@linuxpc db]$ createdb test
[hs@linuxpc db]$ psql test
psql (9.6.1)
Type 'help' for help.

[hs@linuxpc db]$ createdb test

[hs@linuxpc db]$ psql test

psql (9.6.1)

Type 'help' for help.

To reduce the number of disk flushes and to give my SSD a long and prosperous life, I set synchronous_commit to off:

test=# ALTER DATABASE test SET synchronous_commit TO off;
ALTER DATABASE

1 2	test=# ALTER DATABASE test SET synchronous_commit TO off; ALTER DATABASE

Then, a small set of test data is loaded:

[hs@linuxpc db]$ pgbench -i test
NOTICE:  table 'pgbench_history' does not exist, skipping
NOTICE:  table 'pgbench_tellers' does not exist, skipping
NOTICE:  table 'pgbench_accounts' does not exist, skipping
NOTICE:  table 'pgbench_branches' does not exist, skipping
creating tables...
100000 of 100000 tuples (100%) done (elapsed 0.09 s, remaining 0.00 s)
vacuum...
set primary keys...
done.

[hs@linuxpc db]$ pgbench -i test

NOTICE: table 'pgbench_history' does not exist, skipping

NOTICE: table 'pgbench_tellers' does not exist, skipping

NOTICE: table 'pgbench_accounts' does not exist, skipping

NOTICE: table 'pgbench_branches' does not exist, skipping

creating tables...

100000 of 100000 tuples (100%) done (elapsed 0.09 s, remaining 0.00 s)

vacuum...

set primary keys...

done.

100.000 rows are absolutely enough to conduct this simple test.

Then 4 concurrent connections will perform 2 million transactions each.

For the first test default, PostgreSQL config parameters are used:

[hs@linuxpc db]$ time pgbench -c 4 -t 2000000 test
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 1
number of transactions per client: 2000000
number of transactions actually processed: 8000000/8000000
latency average = 0.558 ms
tps = 7173.597623 (including connections establishing)
tps = 7173.616874 (excluding connections establishing)

real      18m35.222s
user     2m23.980s
sys       3m4.830s

[hs@linuxpc db]$ time pgbench -c 4 -t 2000000 test

starting vacuum...end.

transaction type: <builtin: TPC-B (sort of)>

scaling factor: 1

query mode: simple

number of clients: 4

number of threads: 1

number of transactions per client: 2000000

number of transactions actually processed: 8000000/8000000

latency average = 0.558 ms

tps = 7173.597623 (including connections establishing)

tps = 7173.616874 (excluding connections establishing)

real 18m35.222s

user 2m23.980s

sys 3m4.830s

The amount of WAL produced is around 3.5 GB:

test=# SELECT pg_size_pretty(pg_current_xlog_location() - '0/00000000'::pg_lsn);
 pg_size_pretty
----------------
 3447 MB
(1 row)

test=# SELECT pg_size_pretty(pg_current_xlog_location() - '0/00000000'::pg_lsn);

pg_size_pretty

----------------

3447 MB

(1 row)

However, what happens if the checkpoint distances are decreased dramatically?

checkpoint_timeout = 30s
max_wal_size = 32MB
min_wal_size = 32MB

checkpoint_timeout = 30s

max_wal_size = 32MB

min_wal_size = 32MB

Skyrocketing WAL occurs due to checkpoints being too close together.

The amount of WAL will skyrocket because checkpoints are so close that most changes will be a “first change” made to a block. Using these parameters, the WAL will skyrocket to staggering 23 GB. As you can see the amount of WAL can easily multiply if those settings are not ideal.

Increasing the distance between checkpoints will certainly speed things up and have a positive impact on the WAL volume:

In case you need any assistance, please feel free to contact us.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

0 0 votes

Article Rating

1 Comment

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Kristjan Mustkivi

4 years ago

Thank you for the great explanation! I am having trouble understanding one thing though - "The amount of WAL will skyrocket because checkpoints are so close that most changes will be a “first change” made to a block." What does the "first change" to a block actually mean here?

Stay tuned with our

Checkpoint distance and amount of WAL

WAL and checkpoint

Reducing the amount of transaction log written

Measuring the amount of WAL written

Then, a small set of test data is loaded:

For the first test default, PostgreSQL config parameters are used:

The amount of WAL produced is around 3.5 GB:

Skyrocketing WAL occurs due to checkpoints being too close together.

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts