PostgreSQL: Finding Christmas presents

12.2013

Christmas is approaching fast, so I thought I'd give PostgreSQL users some inspiration when it comes to buying their Christmas presents. I assume that I am not the only one who constantly has trouble finding the right present for somebody. The trouble is that prices are rarely round numbers: maybe EUR 49.95 or EUR 49.99. This makes it very hard to look for something that costs, say, EUR 50.

Let us assume you want to buy something that costs around EUR 50. Ideally it would be EUR 49.95 or EUR 48.99, but if nothing is available at that price then, given that it is Christmas, you are also fine with EUR 54.99.

KNN comes to the rescue

Using the = operator is clearly not enough to fix your problem. What you really want is some kind of fuzzy search. The trivial approach to this database problem is to just look for a range of values. In many cases this seems like a good idea, but remember: if you happen to be a large website, looking for everything between EUR 40 and EUR 60 might yield hundreds of thousands of products. This is clearly not an option. All you want is a handful of suggestions that are as close as possible to your desired target price.

Finding something close to what you are looking for is exactly what KNN search (K Nearest Neighbor Search) was made for. It allows you to break the chains of the = operator and perform a fuzzy search.

An example

To show you how you can use PostgreSQL to search for similar numbers, we can create a table containing some randomly generated entries:
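A minimal sketch of such a table, assuming the hypothetical names t_product and price and a size of 10,000 rows (the row count is an assumption chosen to match the estimates discussed further down):

```sql
-- Hypothetical test table: 10,000 products with random prices in [0, 100).
-- random() returns double precision, so price becomes a float8 column.
CREATE TABLE t_product AS
    SELECT id, 100 * random() AS price
    FROM generate_series(1, 10000) AS id;

-- Collect statistics so the optimizer has up-to-date estimates.
ANALYZE t_product;

-- Look at a few of the generated rows.
SELECT * FROM t_product LIMIT 5;
```

Because the prices come from random(), the actual values differ on every run.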

The generate_series function is a nice and easy way to generate a list of values. Combined with random prices, it gives us values evenly distributed between 0 and 100, which is exactly what we need for our test.

The trivial approach: Ranges

As mentioned before, most people would go for the trivial approach here and just come up with a range. To do so we first create an index:
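As a sketch, assuming the hypothetical t_product table with a float8 price column and an index name picked purely for illustration, a plain B-tree index would look like this:

```sql
-- Standard B-tree index on the price column; supports =, <, >, BETWEEN.
-- t_product and idx_product_price are illustrative names.
CREATE INDEX idx_product_price ON t_product (price);
```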

Then we can just do the query:
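A range query of this kind could be written as follows, again assuming the hypothetical t_product table. Prefixing it with EXPLAIN shows the optimizer's row estimate without running the query:

```sql
-- Everything priced between EUR 45 and EUR 55 (bounds inclusive).
-- EXPLAIN prints the plan and the estimated number of rows.
EXPLAIN
SELECT *
FROM t_product
WHERE price BETWEEN 45 AND 55;
```

The exact estimate depends on the data and on the statistics collected by ANALYZE, so it will vary from one installation to the next.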

Just take a look at the estimates of the PostgreSQL optimizer: the query is expected to return 950 rows. This is not surprising, as it is close to 10% of the data. Just imagine doing that on 1 million rows: you would have to read 100,000 rows to display just a handful of them. Clearly, performance will go down the drain in this case.

A second problem is: What if there are no products in the range of 45 – 55? What if the cheapest product is EUR 56?

Supersonic fast KNN search

KNN can come to the rescue here.

To use KNN we first have to install the extension containing some fancy operator classes:
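For scalar types such as numbers and timestamps, the operator classes in question ship with the btree_gist contrib module. Assuming that is the extension meant here and that the contrib packages are installed on the server, it can be enabled like this:

```sql
-- btree_gist provides GiST operator classes for scalar types.
-- Requires the contrib package and sufficient privileges in the database.
CREATE EXTENSION IF NOT EXISTS btree_gist;
```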

Then we can deploy a GiST index capable of doing KNN:
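A sketch of such an index, assuming btree_gist is installed and the hypothetical t_product table has a float8 price column (btree_gist supports distance search for float8 and the integer types, among others, but not for numeric):

```sql
-- GiST index on price; with btree_gist it can serve ORDER BY price <-> x.
-- idx_product_price_gist is an illustrative name.
CREATE INDEX idx_product_price_gist ON t_product USING gist (price);
```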

The module introduces the so-called “distance operator” (<->). It measures the distance between a stored value and the target we are looking for. The beauty of the PostgreSQL KNN mechanism is that we can sort by this distance to fetch rows in the order we desire:
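As an illustration, the following sketch fetches the five products closest to EUR 50. It assumes the hypothetical t_product table, a float8 price column, and a GiST index on that column; the limit of five is arbitrary:

```sql
-- Five nearest neighbors of EUR 50, closest first.
-- The explicit cast keeps both operands of <-> at type float8.
SELECT id,
       price,
       price <-> 50::float8 AS distance
FROM t_product
ORDER BY price <-> 50::float8
LIMIT 5;
```

Unlike the range query, this returns results even if nothing falls between 45 and 55: if the cheapest product costs EUR 56, it simply comes back as the nearest match. Running the same statement under EXPLAIN ANALYZE shows whether the GiST index is used and how long the query takes.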

Our query takes just a fraction of a millisecond. More importantly, runtimes are pretty stable, even if the number of rows grows.

On behalf of my entire team, I wish you all a Merry Christmas and efficient Christmas shopping 😉

