CYBERTEC Logo

Speed up PostgreSQL data loading with COPY (FREEZE)

08.2014 / Category: , / Tags: |

UPDATED JULY 19, 2023 - When I do training here at CYBERTEC, people often ask how to load data fast and efficiently. There are many ways to achieve this. One way not too many know about is COPY (FREEZE). It makes sure that PostgreSQL can organize data nicely straight away by instantly writing hint bits (as explained in my previous post).

Sample data for our bulk load test

To demonstrate COPY (FREEZE), first generate some data:

We will load the data into this table:

Importing the data with COPY

To import the data on my test system (single SATA disk in a Linux box) I use a normal COPY:

As you can see, it takes around 10 minutes to do so.

Speeding up the load with COPY (FREEZE)

To speed things up, let's try COPY (FREEZE):

The important thing here is that this command can only operate inside the transaction that created or truncated the table. Otherwise, it doesn't work (I'll explain the reason later). So, let's try the following:

The execution time goes down dramatically!

Explanation for the speed of COPY (FREEZE)

The main reason it's now more efficient is that PostgreSQL did not have to write transaction log data. That's because I configured wal_level = minimal. With this setting, you can't use WAL archiving or replication, and PostgreSQL only logs enough data for crash recovery. Because of that, PostgreSQL does not need to write WAL for data modifications that take place in the transaction that created or truncated the table. Skipping WAL writes speeds up data loading considerably! Changing wal_level requires a restart of the database, but it may be worth the pain if you need to load data quickly.

But there's also another gain: The second major benefit will be visible when the data is read for the first time after the import. As already shown in my previous posts, PostgreSQL has to set hint bits during the first reading operation. This translates to a large number of writes later on down the road.

COPY (FREEZE) has been made to fix exactly this kind of problem - the data are already loaded in the frozen state, and there is no need to write hint bits any more. Frozen rows are unconditionally visible, and that is the reason why the table had to be created or truncated in the same transaction: otherwise, concurrent readers could see the rows before the transaction commits, which would violate transaction isolation.

Analytical applications can benefit greatly
from the COPY (FREEZE) feature!

Finally...

I’ve recently seen some really broad tables (hundreds of columns) in a somewhat inefficiently structured database. Our PostgreSQL support customer complained about strange runtime behavior which could not be easily explained. To help other PostgreSQL users in this same situation, I decided to reveal the secrets of a fairly common performance problem many people don’t understand: Column order and column access.
 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

0 0 votes
Article Rating
Subscribe
Notify of
guest
1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Craig Ringer (2ndQuadrant)
Craig Ringer (2ndQuadrant)
9 years ago

Note that there are some non-performance consequences of page size, too:

* A single tuple can't span multiple pages (though TOAST lets you store big individual fields out-of-line), so with 4k pages you're limited to tuples roughly half as wide; and

* Index entries may not span multiple pages, even those for TOASTed values, so the existing limitation on the size of indexed text fields is made smaller with a smaller page size, so it's more likely you'll hit it in production.

For that reason, I'd be pretty reluctant to set page size smaller than 4k.

Also, DBs with different page sizes aren't on-disk compatible as the page size is a compile-time parameter. This means that any change must involve a dump and reload, and it means you have to use custom-compiled Pg instances for all replicas, etc.

I don't think that's worth a 5-10% boost.

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    1
    0
    Would love your thoughts, please comment.x
    ()
    x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram