pg_squeeze – PostgreSQL extension to auto-rebuild bloated tables


pg_squeeze, an open-source PostgreSQL extension from Cybertec, automatically and transparently fixes one of PostgreSQL's few weak points: bloated tables.

Unlike the built-in commands “VACUUM FULL” and “CLUSTER”, pg_squeeze does not hold a full table lock for extended periods, so reads and writes are not blocked during the rebuild! The rebuild is also very efficient. Instead of triggers, it uses a novel approach: the transaction log files and logical decoding capture any data changes made to the table while it is being rebuilt. This saves disk space and I/O throughput and, even more importantly, keeps locking times very short, which makes pg_squeeze a perfect fit for mission-critical OLTP systems.
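To make the trigger-free change capture concrete, here is a small Python simulation. It is not pg_squeeze code. It assumes that a dict keyed by primary key stands in for the table and that an append-only list stands in for the transaction log. The `Table`, `apply_changes` and `rebuild` names and the change-tuple format are all hypothetical. The rebuild records the log position, copies the current rows, lets other writers continue, and then replays the logged changes onto the copy. No trigger ever fires on the original table. A dict has no dead tuples, so the sketch models only the change capture and not the space reclamation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# One decoded row change: (operation, primary key, new row value or None).
# This tuple format is hypothetical; real logical decoding output depends
# on the output plugin in use.
Change = Tuple[str, int, object]


@dataclass
class Table:
    rows: Dict[int, object] = field(default_factory=dict)
    # Stand-in for the transaction log: every write is appended here as a
    # side effect of normal operation, so no trigger on the table is needed.
    wal: List[Change] = field(default_factory=list)

    def insert(self, key: int, value: object) -> None:
        self.rows[key] = value
        self.wal.append(("INSERT", key, value))

    def update(self, key: int, value: object) -> None:
        self.rows[key] = value
        self.wal.append(("UPDATE", key, value))

    def delete(self, key: int) -> None:
        del self.rows[key]
        self.wal.append(("DELETE", key, None))


def apply_changes(target: Dict[int, object], changes: List[Change]) -> None:
    """Replay decoded changes onto the new copy, locating rows by key."""
    for op, key, value in changes:
        if op == "DELETE":
            target.pop(key, None)
        else:  # INSERT and UPDATE both leave the newest row version
            target[key] = value


def rebuild(table: Table, concurrent_writes: Callable[[Table], None]) -> Dict[int, object]:
    start = len(table.wal)        # like creating a replication slot: remember the log position
    new_copy = dict(table.rows)   # copy the rows visible at that position
    concurrent_writes(table)      # other sessions keep reading and writing meanwhile
    apply_changes(new_copy, table.wal[start:])  # catch up from the decoded log
    return new_copy               # ready to be switched in under a brief lock


t = Table()
for k in range(5):
    t.insert(k, f"row{k}")


def writes(tbl: Table) -> None:
    tbl.update(1, "row1-v2")
    tbl.delete(3)
    tbl.insert(5, "row5")


rebuilt = rebuild(t, writes)
print(rebuilt == t.rows)  # True
```

Because changes reach the log as a by-product of normal writes, the writers do no extra work during the rebuild. With a trigger-based approach, every write would also fire a trigger to record the change.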

How does pg_squeeze work?

The extension is implemented as a background worker process (a framework available since PostgreSQL 9.3, with dynamically registered workers added in 9.4). The worker periodically monitors user-defined tables, and when it detects that a table has exceeded the “bloat threshold”, it kicks in and rebuilds that table automatically! The rebuild runs concurrently in the background with minimal storage and computational overhead. This is possible because Postgres’ built-in replication slots and logical decoding extract any table changes made during the rebuild from the transaction log (XLOG).

The bloat threshold is of course configurable. The bloat ratio is calculated from the Free Space Map (also taking FILLFACTOR into account) or, under certain conditions, with the “pgstattuple” extension when it is available. You can also set many other customization parameters, such as a “minimum table size”, and tables that do not qualify are ignored. The extension can also reorder a table by an index or move the table or its indexes to a new tablespace.
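The bloat check can be sketched as follows. This is a hypothetical decision rule and not the extension's actual formula. It assumes per-page free-space figures like the approximate values the Free Space Map keeps. It treats the space that FILLFACTOR deliberately reserves as intended rather than as bloat, and it ignores tables below a minimum size. The names `TableStats`, `needs_squeeze`, `bloat_threshold` and `min_size_bytes`, and their default values, are made up for illustration. The pgstattuple path is not modeled.

```python
from dataclasses import dataclass
from typing import Sequence

PAGE_SIZE = 8192  # default PostgreSQL block size in bytes


@dataclass
class TableStats:
    name: str
    fsm_free_bytes: Sequence[int]  # free space per heap page, as an FSM scan might report it
    fillfactor: int = 100          # percent; 100 is the default for heap tables


def extra_free_ratio(stats: TableStats) -> float:
    """Share of the table that is free beyond the space FILLFACTOR reserves."""
    pages = len(stats.fsm_free_bytes)
    if pages == 0:
        return 0.0
    total_bytes = pages * PAGE_SIZE
    reserved = total_bytes * (100 - stats.fillfactor) / 100
    extra = sum(stats.fsm_free_bytes) - reserved
    return max(extra, 0.0) / total_bytes


def needs_squeeze(stats: TableStats,
                  bloat_threshold: float = 0.5,
                  min_size_bytes: int = 8 * PAGE_SIZE) -> bool:
    """Hypothetical rule: large enough, and bloated beyond the threshold."""
    if len(stats.fsm_free_bytes) * PAGE_SIZE < min_size_bytes:
        return False  # below the minimum table size: ignored
    return extra_free_ratio(stats) > bloat_threshold


bloated = TableStats("orders", [7000] * 100, fillfactor=90)
healthy = TableStats("customers", [900] * 100, fillfactor=90)
tiny = TableStats("settings", [8000] * 2)
print(needs_squeeze(bloated), needs_squeeze(healthy), needs_squeeze(tiny))  # True False False
```

In the "customers" example, almost all of the free space is the 10% that FILLFACTOR=90 reserves on purpose. Subtracting it keeps a deliberately sparse table from being treated as bloated.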




pg_squeeze is released under the PostgreSQL License.


Grab the code HERE and give it a try (version Beta1)! If you have questions or need support, we would be happy to hear from you: CONTACT.


Q: Is it safe? What happens if the power fails during a table rebuild?

A: Yes, it is safe. The rebuild happens in a transaction, so an interrupted rebuild is rolled back. You can also set a maximum lock time for the extension, which limits how long the table stays locked while the rebuilt copy is switched in.
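Here is a rough sketch of how a lock-time limit can bound the final switch. It assumes that the extension gives up rather than holding the lock past the limit, which the answer above does not state. The `final_switch` function, the `LockTimeExceeded` exception and the `max_lock_seconds` parameter are hypothetical. The injected `fake_clock` exists only to make the example deterministic. Changes are applied to a scratch copy, so an abort leaves everything as it was, much like a rolled-back transaction.

```python
import time
from typing import Callable, Dict, Iterable, Tuple

Change = Tuple[str, int, object]  # (operation, key, new value or None); hypothetical format


class LockTimeExceeded(Exception):
    """Raised when the final phase would hold the lock longer than allowed."""


def final_switch(new_rows: Dict[int, object],
                 pending: Iterable[Change],
                 max_lock_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> Dict[int, object]:
    """Apply the last decoded changes while the table is (conceptually) locked."""
    deadline = clock() + max_lock_seconds  # the lock is taken here
    result = dict(new_rows)                # scratch copy: an abort discards it
    for op, key, value in pending:
        if clock() > deadline:
            raise LockTimeExceeded("lock time limit reached; giving up the switch")
        if op == "DELETE":
            result.pop(key, None)
        else:
            result[key] = value
    return result                          # switched in, lock released


def fake_clock(step: float) -> Callable[[], float]:
    """Deterministic clock that advances by `step` seconds on every call."""
    now = [0.0]

    def tick() -> float:
        now[0] += step
        return now[0]
    return tick


pending = [("UPDATE", 1, "b2"), ("DELETE", 2, None), ("INSERT", 3, "c")]
new_rows = {1: "b", 2: "x"}

print(final_switch(new_rows, pending, max_lock_seconds=10, clock=fake_clock(1)))
# {1: 'b2', 3: 'c'}
try:
    final_switch(new_rows, pending, max_lock_seconds=1.5, clock=fake_clock(1))
except LockTimeExceeded as exc:
    print("aborted:", exc)
print(new_rows)  # {1: 'b', 2: 'x'} (unchanged)
```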


Q: How does it differ from “pg_repack” ?

A: pg_squeeze is more resource-friendly because it does not use triggers. It also determines on its own which tables are bloated, and it does not require a separate command-line tool.


Q: What are the requirements for tables to be rebuilt?

A: Besides hitting the “bloat threshold”, the only hard-coded requirement is that the table must have an identity key, i.e. a primary key or a unique constraint.
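Why does the identity key matter? When a decoded UPDATE or DELETE is replayed on the new copy, the affected row has to be found again using only its key columns. The Python sketch below uses hypothetical `TableMeta`, `identity_key` and `locate_row` helpers. It picks the primary key or, failing that, a unique constraint. It then shows that a non-unique column can match several rows, and in that case the change cannot be replayed safely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]


@dataclass
class TableMeta:
    """Hypothetical summary of one table's key definitions."""
    name: str
    primary_key: Optional[Tuple[str, ...]] = None
    unique_constraints: List[Tuple[str, ...]] = field(default_factory=list)


def identity_key(meta: TableMeta) -> Optional[Tuple[str, ...]]:
    """Prefer the primary key, otherwise fall back to the first unique constraint."""
    if meta.primary_key:
        return meta.primary_key
    if meta.unique_constraints:
        return meta.unique_constraints[0]
    return None  # no identity key: such a table would not be rebuilt


def locate_row(rows: List[Row], key_cols: Sequence[str], key: Row) -> int:
    """Return the index of the single row a decoded UPDATE/DELETE refers to."""
    matches = [i for i, row in enumerate(rows)
               if all(row[c] == key[c] for c in key_cols)]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} rows match {key}; cannot replay the change")
    return matches[0]


rows = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
orders = TableMeta("orders", primary_key=("id",))
audit_log = TableMeta("audit_log")  # neither a primary key nor a unique constraint

print(identity_key(orders))                  # ('id',)
print(identity_key(audit_log))               # None
print(locate_row(rows, ("id",), {"id": 2}))  # 1
try:
    locate_row(rows, ("status",), {"status": "open"})  # not unique: two matches
except LookupError as exc:
    print(exc)
```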