PostgreSQL offers a great deal of flexibility when it comes to resource
allocation and efficient usage of hardware capacities. This allows fancy
monitoring of what PostgreSQL does in production. Sometimes, however, better
operating system integration is needed and helps to use hardware even more
efficiently.
On Linux one way to limit or allocate resources is to use the Linux kernel’s
cgroup interface. cgroups and PostgreSQL are a perfect team for …
- more efficient monitoring
- restriciting use of resources
- resource accounting.
How do cgroups work?
On a Linux system cgroups (= control groups) are basically a way to group
various operating system processes into groups. A group like that can then be
assigned to a certain amount of RAM, a fraction of a CPU or to a certain number
of I/O operations.
Processes can be moved to a group manually or with the help of a daemon which
moves processes to the right group automatically given some user-defined rules.
It is easily possible to move all processes belonging to a user to some special
cgroup and apply constraints to it. Following this approach you can move an
entire PostgreSQL instance and all its processes to a certain cgroup.
A simple config could look like that (/etc/cgconfig.conf):
mount {
cpu = /cgroup/main;
cpuacct = /cgroup/main;
memory = /cgroup/main;
blkio = /cgroup/main;
}
group user_0004 {
perm {
admin {
uid = root;
gid = root;
}
task {
uid = user_0004;
gid = user_0004;
}
}
memory {
memory.limit_in_bytes = 256M;
}
cpu {
cpu.shares = "75";
}
}
group user_0005 {
perm {
admin {
uid = root;
gid = root;
}
task {
uid = user_0005;
gid = user_0005;
}
}
memory {
memory.limit_in_bytes = 512M;
}
cpu {
cpu.shares = "125";
}
}
We got two users which form separate groups. Each group will get certain
fractions of the system’s CPU as well as a certain amount of memory. In other
words: You can box a database instance nicely inside a group.
In the next step you can nicely put processes into a group. This is best done
using /etc/cgrules.conf:
user_0004 * /user_0004 user_0005 * /user_0005
In our case every user’s process is automatically moved to the right process on
startup.
More on cgroups will be available in further blog posts.
Thoughts about autonomous transactions
When reading Gianni Ciolli’s wiki page about autonomous transactions this week I spent some time to think about things in more detail.
The idea of an autonomous transaction is actually quite simple:
START TRANSACTION;
some_sql_action;
START AUTONONOUS TRANSACTION;
INSERT INTO log_table VALUES (some_data);
COMMIT AUTONOMOUS TRANSACTION;
some_more_sql_action;
ROLLBACK;
The main advantage here is actually that you can commit the log entry without having to commit the parent transaction. This makes the log entry visible if the parent transaction fails badly. Reporting errors to a table is ways more robust in this case. Technically the idea floating around is to push transactions to a stack doing some push / pop operations as autonomous transactions come and go.
BUT: What should happen in this case?
START TRANSACTION;
SELECT some_data FROM some_table WHERE id = 1 FOR UPDATE;
START AUTONONOUS TRANSACTION;
SELECT some_data FROM some_table WHERE id = 1 FOR UPDATE; -- > problem
COMMIT AUTONOMOUS TRANSACTION;
some_more_sql_action;
ROLLBACK;
If the child transaction is really supposed to be fully independent of its parent this should actually never be possible. The parent transaction locks up a row which is then used by the autonomous transaction. This leaves us with two choices: Either the child transaction is NOT fully independent or you allow users to “kill” themselves by locking up the connection.
When being out on the road doing consulting it is sometimes impressing to figure out how little people actually know about how joins work. Especially outer joins seem to be a mystery for most people.
Here is an example which might bring some enlightenment:
test=# CREATE TABLE a (id int4); CREATE TABLE test=# CREATE TABLE b (id int4); CREATE TABLE test=# INSERT INTO a VALUES (1), (2), (3); INSERT 0 3 test=# INSERT INTO b VALUES (2), (3), (4); INSERT 0 3
We just insert a couple of rows and see how things work out.
Let us start with a simple LEFT JOIN:
test=# SELECT * FROM a LEFT JOIN b ON (a.id = b.id); id | id ----+---- 1 | 2 | 2 3 | 3 (3 rows)
The goal here is to get all rows from the left hand side and see which rows match on the right side. The “missing” value is filled with a NULL token.
One of the key misunderstandings is this magical ON-clause:
test=# SELECT * FROM a LEFT JOIN b ON (a.id = b.id AND b.id = 2); id | id ----+---- 1 | 2 | 2 3 | (3 rows)
To most people this result is surprising. It is important to notice that the filter is only applied to the right side. So, we get one more NULL value. This is pretty easy to see and comprehend if the usecase is that simple – but, in case of an outer-join orgy it is not that trivial anymore.
The example I have just shown is not what people usually expect. Here is an alternative:
test=# SELECT * FROM a LEFT JOIN b ON (a.id = b.id) WHERE b.id = 2; id | id ----+---- 2 | 2 (1 row)
The WHERE-clause will filter on both columns at the same time. This will actually turn the LEFT JOIN into a useless operation because the WHERE-clause will filter stuff away anyway. NOTE: This is a mistake which can commonly observed; most people don’t get ON and WHERE right when it comes to outer joins.
FULL OUTER joins
FULL OUTER joins are the next bug miracle:
test=# SELECT * FROM a FULL JOIN b ON (a.id = b.id); id | id ----+---- 1 | 2 | 2 3 | 3 | 4 (4 rows)
We take all rows from the left hand side, take the matches on the right hand side and add NULL values in case no matches are found. So far this is no surprise …
The problem usually starts when we try to make the ON clause a little bit more fancy:
test=# SELECT * FROM a FULL JOIN b ON (a.id = b.id AND b.id = 2); id | id ----+---- 1 | 2 | 2 3 | | 4 | 3 (5 rows)
Most people would not expect “3” to exist at all on the right hand side. As you can see it is moved to the NULL-part of the join. This is not too intuitive for most people. However, this is perfectly expected behavior.
If we filter in the WHERE clause we will again turn the outer join into a useless operation:
test=# SELECT * FROM a FULL JOIN b ON (a.id = b.id) WHERE b.id = 2; id | id ----+---- 2 | 2 (1 row)
Maybe this will bring some enlightenment to some users out there.
PostgreSQL is famous for its scalability and especially for its exceptionally good support for transactions. But, not only this makes PostgreSQL fairly unique.
We are currently already experimenting with some customer applications to see if they will work nicely with PostgreSQL 9.1. One of the features which will make PostgreSQL really cool is the fact that we now have writable CTEs.
I think most of us will know this scenario: It is 3 o’clock in the morning. You are dead tired and something just does not want to work. You want to delete data from a table … a lot of data. Are you brave enough to hit commit? You want to, let’s say, 20 mio rows … wonna risk it?
Well, with writable CTEs things will be a little easier.
Here is a demo:
test=# CREATE TABLE t_test AS SELECT * FROM generate_series(1, 10000) AS id; SELECT 10000
First we create a small demo with some simple data.
Now we do our fancy cleanup. For the sake of security we do it inside a transaction:
test=# BEGIN; BEGIN
How about analyzing data while it is being deleted? Here is how it works …
test=# WITH x AS ( DELETE FROM t_test RETURNING * )
SELECT min(id), max(id), avg(id), count(*) FROM x;
min | max | avg | count
-----+-------+-----------------------+-------
1 | 10000 | 5000.5000000000000000 | 10000
(1 row)
We put the DELETE inside a so called WITH-clause. WITH will give us a “virtual table” called “x” which contains the data returned by DELETE. Note, we used DELETE in combination with a “RETURNING *” … so we essentially tell DELETE to give us the data we are about to kick. As we might want to do it for 20 mio rows or so it is definitely too much for a night-time reading exercise. So, we turn this data coming back from the DELETE into something we can then nicely analyze inside our SELECT part. Given this analysis it is a lot easier to judge if our DELETE has really touched the data we wanted it to touch. It greatly reduces the odds of breaking something late at night.
If we feel safe, we can nicely commit …
test=# COMMIT; COMMIT
Sometimes people have to delete a lot of data. Common table expressions (CTEs) make this process a lot more reliable and robust.
