Reducing space consumption

After digging through some customer source code yesterday I thought that it might be worth publishing a post about space consumption and enum types. It seems to be quite common to store status information as text fields. This is highly inefficient.

Here is an example:

test=# CREATE TABLE t_log (id serial, t timestamp, status text);

Unless we have defined a check constraint we can add any data we want:

test=# INSERT INTO t_log (status) VALUES ('online');

INSERT 0 1

test=# INSERT INTO t_log (status) VALUES ('offline');

INSERT 0 1

test=# INSERT INTO t_log (status) VALUES ('some useless stuff');

INSERT 0 1

The problem I want to point out here is that a string needs a fair amount of space. In addition to that, it is pretty likely that there is only a handful of status flags available anyway. Just imagine that you have to store billions of rows – a couple of bytes can definitely make a difference.

Space optimization

To reduce the amount of space needed, we can create an enum-type. The advantage here is that we need just one byte to store up to 255 different values. An enum is also an implicit “check constraint” making sure that only those values we really want are allowed:

test=# CREATE TYPE log_status AS ENUM ('online', 'offline', 'waiting');

CREATE TYPE

In our example three values are allowed.

We can now use this type in our table:

test=# CREATE TABLE t_log (id serial, t timestamp, status log_status);

CREATE TABLE

When we try to insert data, we will say that bad rows are not accepted:

test=# INSERT INTO t_log (status) VALUES ('online');

INSERT 0 1

test=# INSERT INTO t_log (status) VALUES ('some useless stuff');

ERROR: invalid input value for enum log_status: "some useless stuff"

LINE 1: INSERT INTO t_log (status) VALUES ('some useless stuff');

Enum data types are really a nice way to model a finite list of textual values in an efficient way. One cool feature is also that those types can preserve order. What does that mean? It basically means that if you sort an enum column, data will be returned in the same order as the enum.

Changing enums later on in the process is definitely possible. ALTER TYPE provides a nice interface to add values to an enum type.

Visit us on facebook: www.fb.com/cybertec.postgresql

Hans-Juergen Schoenig
Hans-Jürgen Schönig has 15 years of experience with PostgreSQL. He is consultant and CEO of the company „Cybertec Schönig & Schönig GmbH“ (www.cybertec.at, www.postgresql-support.de), which has served countless customers around the globe.