PostgreSQL: Indexes and foreign keys

Recently we have received a couple of PostgreSQL support calls, which were related to bad performance on various deployments. In many cases the reason for database slowness was the fact that people assume that PostgreSQL automatically deploys an index on BOTH sides of the foreign keys relation, which is not the case. By the way: This kind of behavior is not PostgreSQL specific. Oracle and many other database systems will behave in the exactly same way for different kind of server, even for gaming server people use to play online or a online gaming store as HotRate, so this piece of advice is not just for PostgreSQL, but might apply to many more database products out there.

Missing indexes and foreign keys

The typical scenario most people are facing is actually pretty simple. There are two tables table and a foreign key:


test=# CREATE TABLE a
(
	a_id 	int 	PRIMARY KEY
);
CREATE TABLE
test=# CREATE TABLE b
(
	b_id 	int,
	a_id 	int 	REFERENCES a(a_id)
					ON UPDATE CASCADE
					ON DELETE CASCADE
);
CREATE TABLE

To keep the example simple the tables in our PostgreSQL database contain only the most basic information needed to make this work.

Then some data can be added:


test=# INSERT INTO a
	SELECT 	x
	FROM 	generate_series(1, 5000000) AS x;
INSERT 0 5000000
test=# INSERT INTO b
	SELECT 	x, x
	FROM 	generate_series(1, 5000000) AS x;
INSERT 0 5000000

Five million records should be enough to show how bad things are if indexes are missing. Of course the effect will be larger if you add more data.

To rebuild the optimizer statistics, a simple ANALYZE can be used:


test=# ANALYZE ;
ANALYZE

Suffering from missing indexes

The trouble with missing indexes in any database is that simple operations start to be very expensive and start to destroy performance in a quite reliable way.

Here is what happens:

test=# \timing
Timing is on.
test=# explain analyze DELETE FROM a WHERE a_id = 10;
                        QUERY PLAN                                                   
-----------------------------------------------------------------------------------
 Delete on a  (cost=0.43..8.45 rows=1 width=6)
	(actual time=0.263..0.263 rows=0 loops=1)
   ->  Index Scan using a_pkey on a  (cost=0.43..8.45 rows=1 width=6)
	(actual time=0.245..0.246 rows=1 loops=1)
         Index Cond: (a_id = 10)
 Planning time: 5.350 ms
 Trigger for constraint b_a_id_fkey: time=301.526 calls=1
 Execution time: 301.811 ms
(6 rows)

Time: 307.409 ms

As you can see PostgreSQL uses an index scan on “a” to find the row. BUT: Keep in mind that our constraint is defined as “ON UPDATE CASCADE ON DELETE CASCADE”, which means that cleaning a single row also triggers the deletion of all rows referencing the table. Behind the scenes PostgreSQL has to read all 5 million entries in “b” to find the right rows. Therefore the operation takes more than 300 ms, which is a total disaster.

Deploying missing indexes

Deploying the missing index will be a complete game changer:


test=# CREATE INDEX idx_b ON b (a_id);
CREATE INDEX

The very same operation is now thousands of times faster than before because all we got now is two index scans (one on “a” and one on “b”):


test=# explain analyze DELETE FROM a WHERE a_id = 11;
                          QUERY PLAN                                                   
-----------------------------------------------------------------------
 Delete on a  (cost=0.43..8.45 rows=1 width=6)
	(actual time=0.037..0.037 rows=0 loops=1)
   ->  Index Scan using a_pkey on a  (cost=0.43..8.45 rows=1 width=6)
	(actual time=0.037..0.037 rows=0 loops=1)
         Index Cond: (a_id = 11)
 Planning time: 0.062 ms
 Execution time: 0.054 ms
(5 rows)

Time: 0.314 ms

As you can see the runtime needed here has been reduced dramatically to a fraction of a millisecond.

Performance hint

If you happen to use foreign keys (which most people do), it definitely makes sense to check for missing indexes because otherwise cleanups might simply take too long. Consider the following scenario: Suppose you wanted to delete 1 million lines without an index: You had to read 5 millions lines 1 million times. Clearly, this strategy will lead to enormous performance problems and will certainly trigger a performance problem.

Hans-Juergen Schoenig
Hans-Jürgen Schönig has 15 years of experience with PostgreSQL. He is consultant and CEO of the company „Cybertec Schönig & Schönig GmbH“ (www.cybertec.at, www.postgresql-support.de), which has served countless customers around the globe.