Operator classes explained

Posted on 2013-11-15 by Hans-Jürgen Schönig

After my morning session in Dublin some people asked me to share the codes of an operator class I created for demo reasons. This posting is supposed to cover those codes.

What is an operator class?

In PostgreSQL an index is not a hardwired thing, capable of following just one strategy. Instead an index is more of a framework people can use to implement their own indexing strategies. What is the point of all that?

Let us consider two values stored in a database:

CREATE TABLE t_sva (sva text);

INSERT INTO t_sva VALUES ('1118090878');
INSERT INTO t_sva VALUES ('2345010477');

At first glance those two numbers seem to be in their right order (alphabetical). However, in reality they are not. Those two odd numbers are in fact Austrian social security numbers. The first number is related to the 9^th of August 1978 (09 = day of month, 08 = month, 78 = year). 1118 is just a sequential number issued by the social security system. Taking into account that 1977 was actually before 1978 the second number should be the first entry in a sorted list.

The fact that those numbers need a “strange” sort order create some challenge when it comes to indexing this stuff. This is exactly when PostgreSQL operator classes kick in. An operator allows a user to create his own handcrafted indexing strategy. As the name (= operator class) already implies, creating an indexing strategy is all about operators. The goal is to create our own operators and use them alongside a normal btree.

Here is how it works:

CREATE OR REPLACE FUNCTION normalize_si(text)
RETURNS text AS $$
        BEGIN
        RETURN substring($1, 9, 2) ||
                substring($1, 7, 2) || substring($1, 5, 2) || substring($1, 1, 4);
        END; $$

LANGUAGE 'plpgsql' IMMUTABLE;

First of all we create a simple stored procedure capable of transforming our strange string and turn it into something which can be sorted normally:

SELECT normalize_si('1118090878');
 normalize_si
--------------
 7808091118
(1 row)

The clue here is that we can now use our own logic to compare values and turn this logic into operators. To do so we need a set of functions:

CREATE OR REPLACE FUNCTION si_lt(text, text)
RETURNS boolean AS
$$
        BEGIN
                RETURN normalize_si($1) < normalize_si($2);
        END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;

-- lower equals
CREATE OR REPLACE FUNCTION si_le(text, text)
RETURNS boolean AS
$$
        BEGIN
                RETURN normalize_si($1) <= normalize_si($2);
        END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;

-- greater equal

CREATE OR REPLACE FUNCTION si_ge(text, text)
RETURNS boolean AS
$$
        BEGIN
                RETURN normalize_si($1) >= normalize_si($2);
        END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;

-- greater
CREATE OR REPLACE FUNCTION si_gt(text, text)
RETURNS boolean AS
$$
        BEGIN
                RETURN normalize_si($1) > normalize_si($2);
        END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;

In PostgreSQL every operator is simply based on a stored procedure. So if you have proper procedures at hand already, it is pretty simple to turn those into operators. CREATE OPERATOR does the job:

-- define operators
CREATE OPERATOR <# ( PROCEDURE=si_lt,
                        LEFTARG=text,
                        RIGHTARG=text);

CREATE OPERATOR <=# ( PROCEDURE=si_le,
                        LEFTARG=text,
                        RIGHTARG=text);

CREATE OPERATOR >=# ( PROCEDURE=si_ge,
                        LEFTARG=text,
                        RIGHTARG=text);

CREATE OPERATOR ># ( PROCEDURE=si_gt,
                        LEFTARG=text,
                        RIGHTARG=text);

If you want to create an operator you have to tell PostgreSQL which function to call. Then we have to map the left and the right argument to our two arguments passed to the function. Of course, there are more features here which can be used to make this work even better, but for the most simplistic stuff this is enough.

The names of those operators make no difference here. However, I would recommend to use something readable.

Support functions

Before an index shows us how to handle a new strategy, we have to define some support functions. Depending on the type of index we will need different sets of support functions. In our case we are planning to use a simple btree so all we need is a comparison function returning a simple integer value:

CREATE OR REPLACE FUNCTION si_same(text, text)
        RETURNS int AS
$$
        BEGIN
                -- ugly code for clarity
                IF      normalize_si($1) < normalize_si($2)
                THEN
                        RETURN -1;
                ELSIF   normalize_si($1) > normalize_si($2)
                THEN
                        RETURN +1;
                ELSE
                        RETURN 0;
                END IF;
        END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;

Creating the operator class

Now we have all ingredients in place to use our own indexing strategy. All it takes now is one last command:

CREATE OPERATOR CLASS sva_special_ops
       FOR TYPE text USING btree
AS
    OPERATOR        1       <#  ,
    OPERATOR        2       <=# ,
    OPERATOR        3       =  ,
    OPERATOR        4       >=# ,
    OPERATOR        5       >#  ,
    FUNCTION        1       si_same(text, text)
;

We have to tell the index which operators to use when and which support functions to use. In case of btrees this is really simple.

We can already create an index using our own operator class now:

CREATE INDEX idx_special ON t_sva (sva sva_special_ops);

The optimizer is already capable of using our index:

SET enable_seqscan TO off;              
     -- to make sure that this work with just two rows

explain SELECT * FROM t_sva WHERE sva = '0000112276';
                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Index Only Scan using idx_special on t_sva  (cost=0.13..8.14 rows=1 width=32)
   Index Cond: (sva = '0000112276'::text)
(2 rows)