Post UPDATED June 2023: PostgreSQL is a highly sophisticated relational database system, capable of far more than most users ever touch. That sophistication also means there is a fair amount of complexity under the surface which is not always well understood. One thing people usually don’t know about is hint bits.

What are hint bits?

Hint bits are an internal optimization that speeds up visibility checks considerably. As you might know, PostgreSQL’s MVCC design means that it has to decide for every single row whether that row is visible to the current transaction or not. The best way to speed up this check is to avoid repeating the expensive part of it over and over again, and that is exactly why hint bits were invented.

When PostgreSQL inspects a row and determines that the transaction which created it has committed, and that the row is therefore visible to everyone, it sets a hint bit in the row header recording that fact. The next time someone reads this row, the expensive part of the visibility check can be skipped, because the hint bit already reveals the row’s fate.
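
If you want to see these bits with your own eyes, the pageinspect contrib extension exposes the raw row headers. A minimal sketch, assuming the extension is installed and using the t_test table created in the next section; HEAP_XMIN_COMMITTED corresponds to bit 0x0100 (decimal 256) of t_infomask:

test=# CREATE EXTENSION IF NOT EXISTS pageinspect;
test=# SELECT lp, t_xmin,
              (t_infomask & 256) > 0 AS xmin_committed_hint
       FROM heap_page_items(get_raw_page('t_test', 0));

Right after loading, the flag will typically still be false for freshly inserted rows; once the page has been read (or vacuumed), it flips to true.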

Creating some demo data

To demonstrate how things work, let’s create some demo data:

test=# CREATE TABLE t_test (a int, b int);
CREATE TABLE

A simple Perl script can create the data nicely. In this example, half a billion rows are used (big enough to demonstrate the point and yet small enough to do the test in reasonable time):

[hs@jacqueline blog]$ cat numbers.pl
#!/usr/bin/perl

for ($count = 1; $count <= 500000000; $count++)
{
        # COPY's default text format expects tab-separated columns
        print "$count\t$count\n";
}

To ensure that autovacuum does not spoil our day, it is turned off for this test:

test=# ALTER TABLE t_test SET (autovacuum_enabled = off);
ALTER TABLE

Then data is loaded into PostgreSQL. The easiest way to do that is to use “COPY … FROM PROGRAM” and to simply read things in through a pipe. It works like this:

test=# \timing
Timing is on.

test=# COPY t_test FROM PROGRAM '/path/numbers.pl';
COPY 500000000
Time: 612716.438 ms

On our system…

With an Intel i5 CPU and a single 750 GB SATA disk, it takes around 10 minutes to load the data. Note that default PostgreSQL parameters were used (no checkpoint optimizations). “top” leaves us with output as shown in the next listing:

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                
27570 hs        20   0  257m  24m  23m R 48.6  0.2   0:10.33 postgres                                                                                               
27574 hs        20   0  119m 1584 1252 R 31.6  0.0   0:06.66 numbers.pl                                                                                             
27356 hs        20   0  256m  16m  16m S  1.3  0.1   0:00.18 postgres

You can clearly see that generating the numbers accounts for a sizeable share of the CPU load. That is actually good news: it means that PostgreSQL itself is comparatively cheap next to the data creation part.
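
As a side note: the load above ran with stock settings. If the goal is simply to load data as fast as possible, relaxing the checkpoint configuration usually helps quite a bit. A minimal sketch; the values below are illustrative assumptions, not something tuned for this machine:

test=# ALTER SYSTEM SET max_wal_size = '16GB';
test=# ALTER SYSTEM SET checkpoint_timeout = '30min';
test=# ALTER SYSTEM SET checkpoint_completion_target = 0.9;
test=# SELECT pg_reload_conf();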

Hint bits: first contact …

Up until now, nothing special could be observed. Let’s try to run a SELECT:

test=# SELECT count(*) FROM t_test;
   count  
-----------
 500000000
(1 row)

Time: 359669.393 ms

The interesting part here is not the result of the query, but what happens below the surface while it runs.

Here is vmstat:

[hs@jacqueline blog]$ vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0    164 4968092  46412 10534028    0    0  3486  1903    3   13  1  0 91  7  0 
 0  0    164 4967960  46412 10534144    0    0     0     0   78  225  0  0 100  0  0   
 0  0    164 4967960  46420 10534144    0    0     0     8   72  203  0  0 99  1  0     
 0  0    164 4967960  46420 10534144    0    0     0     0  103  301  0  0 100  0  0  
 0  1    164 4946260  46420 10555520    0    0 10586    20  246  437  1  0 97  2  0           
 1  1    164 4732608  46420 10763488    0    0 104000     0 1403 2046 10  2 76 12  0      
 0  1    164 4517592  46420 10972988    0    0 104704     0 1415 2042 10  3 76 11  0      
 1  0    164 4301832  46420 11183036    0    0 105024     0 1410 2038 10  3 75 12  0       
 1  0    164 4082228  46420 11396740    0    0 106880     4 1449 2048 11  3 75 11  0       
 0  1    164 3868576  46420 11604996    0    0 104064     0 1414 2050 10  3 75 12  0      
 1  0    164 3650460  46420 11817164    0    0 106176     0 1415 2058 11  2 75 12  0       
 0  1    164 3433460  46420 12028640    0    0 105664     0 1419 2078 10  2 75 12  0      
 1  1    164 3262588  46420 12194508    0    0 82944 45356 1199 2209  8  3 66 23  0     
 1  1    164 3143176  46420 12310644    0    0 58112 43818  949 1738  6  2 75 18  0      
 1  1    164 3018680  46420 12432100    0    0 60736 36900  940 1760  6  2 75 17  0      
 1  2    164 2897284  46420 12550404    0    0 59136 29726  917 1727  6  2 75 16  0      
 0  2    164 2775020  46420 12669348    0    0 59392 36642  912 1746  6  2 76 17  0      
 0  2    164 2656848  46420 12784480    0    0 57600 36388  883 1687  6  2 75 17  0      
 1  1    164 2537560  46420 12900636    0    0 58048 36644  937 1755  5  2 73 19  0

“bi” (= blocks in) and “bo” (= blocks out) are the two columns of interest here.

At first, things look pretty normal.

We start to read at roughly 100 MB / second, a reasonable value for a desktop PC featuring a single SATA drive. But then things start to go south: at some point, PostgreSQL begins to write data out to disk. Why is that? Freshly loaded rows don’t carry hint bits yet, so the first reader has to perform the full visibility check for each of them. Once that check succeeds, PostgreSQL sets the hint bit marking the row as “visible to everybody”, which dirties the page, and those dirty pages eventually have to be written back to disk. This is exactly where the write I/O comes from. The advantage is that all subsequent visibility checks are sped up tremendously.
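
You can observe the same effect from inside PostgreSQL: EXPLAIN (ANALYZE, BUFFERS) reports how many shared buffers a statement dirtied, and on the first scan after a bulk load that counter is essentially the hint-bit writes. A quick sketch; the actual numbers will of course differ from system to system:

test=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t_test;

In the resulting plan, watch the “Buffers: shared read=… dirtied=…” line: the dirtied pages are the ones whose hint bits were just set, and on a repeated run that counter drops to (almost) zero.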

When the query is executed again, the I/O pattern is totally different:

[hs@jacqueline blog]$ vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st         
 0  1    164 191076   1688 15325392    0    0 104256     0 1401 2077  8  2 75 15  0          
 0  1    164 183140   1688 15334056    0    0 105792     0 1394 2113  8  2 75 15  0          
 0  1    164 187728   1688 15329596    0    0 106880     0 1429 2121  9  2 75 15  0          
 0  1    164 190828   1688 15327804    0    0 105472     0 1413 2124  8  2 75 15  0          
 2  0    164 184008   1688 15335792    0    0 104640     0 1389 2108  8  2 75 15  0          
 1  0    164 188348   1688 15336880    0    0 106240     0 1466 2107  9  2 75 14  0          
 1  0    164 191944   1692 15337824    0    0 100226     4 1372 2060  8  2 75 15  0

As you can see, there is no more writing going on – just a clean 100+ MB / second. Exactly what we want to see.
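
You can double-check from inside the database that the second scan no longer dirties any pages, for example by counting dirty buffers belonging to the table via the pg_buffercache contrib extension. A minimal sketch, assuming that extension is available:

test=# CREATE EXTENSION IF NOT EXISTS pg_buffercache;
test=# SELECT count(*) AS dirty_buffers
       FROM pg_buffercache
       WHERE relfilenode = pg_relation_filenode('t_test')
         AND isdirty;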

The read is a lot faster now:

test=# SELECT count(*) FROM t_test;
   count  
-----------
 500000000
(1 row)

Time: 110263.714 ms

Wow, the time needed to perform this one has gone down dramatically! And more importantly, the execution time is pretty stable:

Time: 110382.362 ms

The reasoning behind this is that in most workloads reads happen far more often than writes, so paying the one-off cost of setting hint bits is a good investment: every subsequent read benefits from the cheaper visibility check.

Turning to VACUUM

But there is a bit more. What if we VACUUM the table?

Here is vmstat again:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  1    164 161936    616 15427528    0    0 50718   356  845 1262  4  2 79 15  0
 0  1    164 168136    616 15421384    0    0 87680   636 1267 2517  6  4 75 15  0           
 0  1    164 169624    616 15419204    0    0 88704   624 1254 2529  6  3 75 15  0           
 0  1    164 179296    616 15409532    0    0 90304   560 1279 2435  6  4 75 15  0           
 0  1    164 185868    616 15402548    0    0 89216   628 1319 1898  7  4 74 15  0           
 1  1    164 178924    616 15409580    0    0 88832   628 1295 1960  7  3 75 15  0           
 0  1    164 183636    616 15404972    0    0 80166   624 1196 1802  6  3 75 16  0
 0  1    164 186240    616 15402652    0    0 91648   568 1354 1951  7  4 75 15  0           
 0  2    164 180660    624 15408296    0    0 89472   670 1307 1982  6  3 75 16  0           
 1  1    164 170120    624 15418500    0    0 72260 49908 1080 2035  5  3 70 22  0         
 0  2    164 164788    624 15423656    0    0 67072 22782  995 1955  4  2 70 23  0          
 0  2    164 168012    624 15420556    0    0 61888 22492  984 1928  4  2 73 20  0          
 0  2    164 164044    624 15424496    0    0 60672 29582  959 1887  4  2 73 20  0          
 0  2    164 172352    624 15416452    0    0 67712 22524 1053 2097  5  3 72 20  0

Again, some writing is going on. We are seriously bound by the disk here, as the following couple of lines taken from “top” indicate:

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27570 hs        20   0  257m  30m  29m R 26.3  0.2   9:34.80 postgres
   59 root      20   0     0    0    0 S  1.7  0.0  18:00.13 kswapd0                                                                                                
   36 root      20   0     0    0    0 S  1.0  0.0  10:42.29 kblockd/2                                                                                               
27245 root      20   0     0    0    0 D  0.3  0.0   0:03.53 flush-8:16                                                                                             
30861 root      20   0 15036 1196  836 R  0.3  0.0   0:03.75 top

VACUUM takes quite a while as well:

test=# VACUUM t_test;
VACUUM
Time: 363773.532 ms

The beauty now is that we not only know whether a single row is visible or not: VACUUM can also record, in the so-called visibility map, whether an entire block is visible to everyone.
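
A quick way to see how much of the table is covered by the visibility map is to look at pg_class (relallvisible is an estimate maintained by VACUUM); the pg_visibility extension can show the individual per-page bits if you need more detail:

test=# SELECT relpages, relallvisible
       FROM pg_class
       WHERE relname = 't_test';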

Again, we see an improvement in speed – but this time, it’s not very big (just two seconds):

test=# SELECT count(*) FROM t_test;
   count  
-----------
 500000000
(1 row)

Time: 108715.332 ms

The main benefit, however, comes from the hint bits attached to individual rows, and that benefit is massive.

Finally…

If you liked this article, check out the next article in this series: Speed up PostgreSQL data loading with COPY (FREEZE)

For further information on how to speed up performance in PostgreSQL, see this post about CLUSTER and HOT updates, or see Laurenz Albe’s first article on HOT updates.

