GIN - Just A Kind Of Index
As long as PostgreSQL is concerned, GIN means an access method (index type) rather than a drink. The acronym stands for Generalized Inverted Index. This article tries to explain how data is organized in the GIN index and how search works.
The current implementation of GIN index has much to do with B-tree, so let’s look at B-tree first.
The index consists of pages, each containing as many index tuples as space allows. Index tuple is a structure containing:
- key datum – value of the indexed attribute (column). On leaf page the key datum is that of the referenced heap tuple (table row). On internal page it describes the referenced subtree. (It’s a minimum key value of the referenced page.)
- tuple identifier (TID) – information about physical location of the referenced index tuple or heap tuple. Note that each row of the indexed heap (table) is referenced by exactly one TID.
(High key is a special case of index tuple. It merely contains key value such that all tuples on the page are lower than or equal to. It does not point to any other index tuple.)
Tuple Identifier (TID) (sometimes called item pointer) is a structure pointing to physical location of data on disk. It tells on which page within a table or index the tuple is located and where exactly on that page.
A new index tuple must be inserted into the tree for each table row that we index, even if another row having the same key value already has been indexed. Such an insertion can possibly cause page split, affecting one or more pages of the index.
As for the internal pages, there’s no meaningful difference between B-tree and GIN. What makes the difference are leaf pages. Depending on the number of entries having the same key value, one of the following takes place.
The first kind of GIN index leaf page is one containing a posting list.
Picture 2: GIN index with a posting list
A single index tuple exists for each key entry (i.e. unique key value) on the index leaf page. That index tuple in fact does not point directly to the corresponding row(s). Instead it points to a posting list, which is a list of pointers (TIDs) to all rows containing the appropriate key value. In order to index a new row that already has key entry in the tree for the indexed attribute (column), we only need to find the entry and add TID of the new row to the posting list. This does not trigger any page split in the index tree.
One thing worth to mention is that items of a posting list are sorted by the TID in ascending order. In other words, arrows from the TIDs of particular posting list do not cross each other.
If too many rows having the same value of the indexed attribute are inserted, multiple pages are allocated for the TIDs and these constitute so called posting tree.
Picture 3: GIN index with a posting tree
The posting tree is a separate B-tree structure stored on pages of the GIN index, where TID of the indexed row is used as a key – this way we can easily maintain the TIDs sorted as we do in posting list.
Compared to the standard B-tree, the posting tree is a little bit simplified:
- Page of the posting tree is a little bit different from the page that the entry tree uses. While the usual page format (used for both heap and index tuples) is designed to accommodate tuples of variable size (because key size is not constant in general), the posting tree page is always used for the same type (the TID) and therefore the key size is constant. Thus content of the posting tree page becomes array of constant-sized values, where the first item is reserved to keep the maximum TID value for the whole page.
- It’s not supposed to handle concurrent insertions. When a new entry is to be added, the referencing leaf page is locked in exclusive mode, so that only 1 backend can change the underlying posting tree.
Exactly one index tuple of the entry tree (i.e. the “main tree”) points to particular posting tree. That means, there can be many posting trees in a single GIN index, each representing specific key value.Specific feature of the GIN index is that a single table row can be referenced by multiple (or many) TIDs, each located in different posting
list/tree. The next post will explain such a case in detail.