(func()).* – Hidden performance issues

PostgreSQL support cases are coming in on a regular basis. This week an especially noteworthy one reached our desks here at Cybertec, which is so interesting, that I decided to sit down and share some information. I guess many people out there have similar issues and therefore this post might be helpful to developers and administrators out there.

Composite data types in PostgreSQL

PostgreSQL offers a very interesting feature: It is possible to create so called “composite types”, which are simply a collection of fields..

Here is an example:


CREATE TYPE sample_type AS (a int, b int, c int);

In this example a type called “sample_type” is created. For the sake of simplicity it consists of three integer values. The beauty of PostgreSQL is that composite types can be used just like normal types (e.g. as columns, array, etc.). The use of composite types offers database developers a great deal of flexibility. Passing more complex data to functions is a lot easier than without composite types and therefore composites are a joy to work with.

However, when using a composite type it is also necessary to be cautious and to understand the basic inner workings of PostgreSQL.

Consider the following function:


CREATE FUNCTION sample_func()

RETURNS setof sample_type

AS

$$

DECLARE

 

BEGIN

RAISE NOTICE 'function call was here: %',

clock_timestamp();

RETURN NEXT (1, 2, 3);

RETURN NEXT (4, 5, 6);

EXECUTE 'SELECT pg_sleep(2)';

RETURN;

END;

$$ LANGUAGE 'plpgsql';

The function returns two rows and sleeps for two seconds. So far things are fine.

However, consider the following trivial SQL statement:


test=# \timing

Timing is on.

test=# SELECT (sample_func()).*;

NOTICE:  function call was here: 2017-04-07 16:05:44.238239+02

NOTICE:  function call was here: 2017-04-07 16:05:46.241073+02

NOTICE:  function call was here: 2017-04-07 16:05:48.243333+02

a | b | c

---+---+---

1 | 2 | 3

4 | 5 | 6

(2 rows)

Time: 6010.778 ms

As you can see the statement takes around 6 seconds. The reason is simple: The code is executed three times. Note that there is just one function call. However, PostgreSQL does something important behind the scenes: It expands the SELECT-clause. To PostgreSQL “(sample_func()).*” is not just a single call. It expands things before execution in a way that the function is called three times (one call for each column). While this is not a big deal for fast, tiny functions, it can be a HUGE issue for more complicated or slow code such as the function I have just written.

Reducing the number of function calls

To reduce the number of function calls the function can be moved to the FROM-clause of the query:


test=# SELECT * FROM sample_func();

NOTICE:  function call was here: 2017-04-07 16:06:00.810724+02

a | b | c

---+---+---

1 | 2 | 3

4 | 5 | 6

(2 rows)

Time: 2002.869 ms

As you can see the execution time has dropped by 2/3 and the code only emits one NOTICE.

Additional performance tips

Sometimes it is hard to simply move the composite type to a FROM-clause. In those special cases it can make sense to work with subselects and NOTICE messages for debugging to get around the problem. Usually trying multiple variations of the query can lead to success.

If you have any questions regarding this issue, feel free to contact us and to leave a comment in the feedback section.

Hans-Juergen Schoenig
Hans-Jürgen Schönig has 15 years of experience with PostgreSQL. He is consultant and CEO of the company „Cybertec Schönig & Schönig GmbH“ (www.cybertec.at, www.postgresql-support.de), which has served countless customers around the globe.