Detecting table bloat

If your PostgreSQL database is purely maintained (lack of VACUUM) or badly structured, you might face some table bloat.

Table bloat is somewhat nasty because it slows down your database and eats up more storage than needed. Therefore it can be important to figure out, if a table is bloated or not. The PostgreSQL contrib package provides a package called pgstattuple to diagnose table bloat. 

To use the module we have to activate it:

test=# CREATE EXTENSION pgstattuple;

CREATE EXTENSION

For a test we create a table and add some 10.000 rows to it on the fly:

test=# CREATE TABLE t_test AS SELECT * FROM generate_series(1, 10000);

SELECT 10000

We can already see, how pgstattuple works. To make sure the psql output stays readable, we have called …

test=# \x

Expanded display is on.

In our example pgstattuple will return the following data:

test=# SELECT * FROM pgstattuple('t_test');

-[ RECORD 1 ]------+-------

table_len | 368640

tuple_count | 10000

tuple_len | 280000

tuple_percent | 75.95

dead_tuple_count | 0

dead_tuple_len | 0

dead_tuple_percent | 0

free_space | 7380

free_percent | 2

As you can see the size of the table is somewhat over 368k. Our table has a fill grade of around 76%. Note that those numbers don’t add up to 100% completely. This is due to some overhead. In reality the fill grade of a freshly loaded table will be a lot higher than in our trivial single column case.

To demonstrate table bloat we can delete some data. In this example we delete one third of those rows:

test=# DELETE FROM t_test WHERE generate_series % 3 = 0;

DELETE 3333

Note, the size of the table on disk is still the same. Some of those valid rows have just been moved to the “dead tuple” section of the output:

test=# SELECT * FROM pgstattuple('t_test');

-[ RECORD 1 ]------+-------

table_len | 368640

tuple_count | 6667

tuple_len | 186676

tuple_percent | 50.64

dead_tuple_count | 3333

dead_tuple_len | 93324

dead_tuple_percent | 25.32

free_space | 7380

free_percent | 2

The first lesson here is that DELETE does not shrink a table on disk. It merely marks rows as dead. This is highly important – many people are misled by this behavior.

To reclaim the space occupied by those dead rows we can call VACUUM:

test=# VACUUM t_test;

VACUUM

What we see here is that those dead rows have been converted to free space. And yes, the size of the table is still the same:

test=# SELECT * FROM pgstattuple('t_test');

-[ RECORD 1 ]------+-------

table_len | 368640

tuple_count | 6667

tuple_len | 186676

tuple_percent | 50.64

dead_tuple_count | 0

dead_tuple_len | 0

dead_tuple_percent | 0

free_space | 114036

free_percent | 30.93

This free space can now be used to store new rows inside your table.

Checking many tables

In our example we have checked one table. But what if you want to check your entire database to see who is to blame?

In this case you have to make use of the PostgreSQL type system and a system catalog. Let us discuss the type system first: In the previous example we have seen that pgstattuple can be called in the FROM-clause because it returns an entire row. But we have to do this for all tables – not just for some. To make sure that we can put the list of objects we want to check into the FROM-clause (in the form of the pg_class system table) we have to move the function call to the SELECT clause. The call will return a composite type consisting of all fields:

test=# SELECT relname, pgstattuple(oid) FROM pg_class WHERE relkind = 'r' LIMIT 3;

relname | pgstattuple

--------------+----------------------------------------------------

pg_statistic | (147456,387,115800,78.53,27,8161,5.53,19840,13.45)

pg_type | (73728,357,61965,84.05,20,3440,4.67,4828,6.55)

pg_authid | (8192,3,324,3.96,0,0,0,7816,95.41)

(3 rows)

This is not really readable so we have to break up those fields again. Some brackets and a star will do the job:

test=# SELECT relname, (pgstattuple(oid)).* FROM pg_class WHERE relkind = 'r' LIMIT 3;-[ RECORD 1 ]------+-------------

relname | pg_statistic

table_len | 147456

tuple_count | 387

tuple_len | 115800

tuple_percent | 78.53

dead_tuple_count | 27

dead_tuple_len | 8161

dead_tuple_percent | 5.53

free_space | 19840

free_percent | 13.45

-[ RECORD 2 ]------+-------------

relname | pg_type

table_len | 73728

tuple_count | 357

tuple_len | 61965

tuple_percent | 84.05

dead_tuple_count | 20

dead_tuple_len | 3440

dead_tuple_percent | 4.67

free_space | 4828

free_percent | 6.55

-[ RECORD 3 ]------+-------------

relname | pg_authid

table_len | 8192

tuple_count | 3

tuple_len | 324

tuple_percent | 3.96

dead_tuple_count | 0

dead_tuple_len | 0

dead_tuple_percent | 0

free_space | 7816

free_percent | 95.41

We can now sort and filter at will to figure out, which table is to blame.

———-
We welcome your feedback on our brand new Facebook page: www.fb.com/cybertec.postgresql

Hans-Juergen Schoenig
Hans-Jürgen Schönig has 15 years of experience with PostgreSQL. He is consultant and CEO of the company „Cybertec Schönig & Schönig GmbH“ (www.postgresq-support.de), which has served countless customers around the globe.