
Flexible grouping: Some dirty SQL trickery

11.2015

While doing PostgreSQL consulting for a German client, I stumbled over an interesting issue this week which might be worth sharing with some folks out on the Internet: it is all about grouping.

Suppose you are measuring the same thing on various sensors every, say, 15 minutes. Maybe some temperature, some air pressure, or whatever.
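For the examples that follow, assume a small table t_data holding those measurements (the column types are an assumption; the values are the sample data used throughout this post):

create table t_data (t time, val int);

insert into t_data (t, val) values
    ('14:00', 12), ('14:01', 22), ('14:01', 43),
    ('14:14', 32), ('14:15', 33), ('14:16', 27),
    ('14:30', 19);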

The human eye can instantly spot that 14:00 and 14:01 could be candidates for grouping (maybe the differences are just related to latency or some slightly inconsistent timing). The same applies to 14:14 to 14:16. You might want to have this data in the same group during aggregation.

The question now is: How can that be achieved with PostgreSQL?

Some dirty SQL trickery

The first thing to do is to look at the differences from one timestamp to the next.

The lag function offers a nice way to solve this kind of problem:
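A minimal sketch, using the t_data table from above:

select t, val,
       t - lag(t) over (order by t) as diff
from t_data
order by t;

The diff column holds the gap to the previous row; for the first row, lag returns NULL because there is no predecessor.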

Now that we have used lag to "move" the time to the next row, there is a simple trick which can be applied:

Moving the lag to a subselect allows us to start all over again and to create those groups. The trick now is: if the difference from one row to the next is large, start a new group - otherwise, stay within the group.
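Here is a sketch of that idea, using a sequence to hand out group numbers (the sequence name seq_g and the 10-minute threshold are assumptions):

create sequence seq_g;

select t, val,
       case when diff is null or diff >= '10 minutes'
            then nextval('seq_g')  -- big gap (or first row): open a new group
            else currval('seq_g')  -- small gap: stay in the current group
       end as g
from (select t, val,
             t - lag(t) over (order by t) as diff
      from t_data) as x
order by t;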

This leaves us with a simple result set:
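With the sample data from above, the output looks like this (assuming the sequence starts at 1):

    t     | val |   diff   | g
----------+-----+----------+---
 14:00:00 |  12 |          | 1
 14:01:00 |  22 | 00:01:00 | 1
 14:01:00 |  43 | 00:00:00 | 1
 14:14:00 |  32 | 00:13:00 | 2
 14:15:00 |  33 | 00:01:00 | 2
 14:16:00 |  27 | 00:01:00 | 2
 14:30:00 |  19 | 00:14:00 | 3
(7 rows)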

From now on, life is easy: we can take this output and aggregate on it. "GROUP BY g" will give us one group for each value of "g".
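For instance, to see when each group starts and what its average value is, one sketch might be (reusing the query and sequence from above):

select g,
       min(t) as t_start,
       count(*) as measurements,
       avg(val) as avg_val
from (select t, val,
             case when diff is null or diff >= '10 minutes'
                  then nextval('seq_g')
                  else currval('seq_g')
             end as g
      from (select t, val,
                   t - lag(t) over (order by t) as diff
            from t_data) as x) as y
group by g
order by g;

Because a sequence keeps counting across runs, the group numbers stay unique even when the job is executed again and again - which is exactly why a sequence was used here instead of a windowed sum.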

 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

Comments
Corey Huinker
8 years ago

Here's a way to do that without sequences:


select x.*, sum(edge) over (order by t) as group_num
from (select *,
             case when (t - lag(t, 1) over (order by t)) >= '10 minutes'
                  then 1
                  else 0
             end as edge
      from t_data) x
order by t;

t | val | edge | group_num
----------+-----+------+-----------
14:00:00 | 12 | 0 | 0
14:01:00 | 22 | 0 | 0
14:01:00 | 43 | 0 | 0
14:14:00 | 32 | 1 | 1
14:15:00 | 33 | 0 | 1
14:16:00 | 27 | 0 | 1
14:30:00 | 19 | 1 | 2
(7 rows)

Hans-Jürgen Schönig
8 years ago
Reply to Corey Huinker

this is a cool one as well. we used the sequence because you can run jobs continuously.
i like your query :). i got to remember that one.

Corey Huinker
8 years ago

Ah, didn't realize you wanted the group_num to be unique across all queries, not just within this query.

Hans-Jürgen Schönig
8 years ago
Reply to Corey Huinker

in this business case yes. but, your query is pretty cool as well :). maybe i just communicated badly.
