Custom replication handlers for Postgres-BDR

04.2017 / Category: How To / Tags: replication

Although normally one should try to avoid using non-core extensions/forks of Postgres, in some rare cases it could be even worse not to use them. But as BDR project is also here to stay (as I’ve understood main functionality will be gradually integrated into Postgres core), then the following info might still be useful in the future. So the thing is that when we got a client request to support them with custom replication handlers, we discovered to our surprise that the official documentation (and also our big brother, the Internet) is sadly pretty empty in that area with no working code examples. Maybe even on purpose, as kind of a warning that you’re already on slippery ice and should watch out. Who knows. But in short, after some poking around, I got acquainted with the topic and just to give you an idea what’s involved, a short overview.

Table of Contents

About BDR in two sentences

First, for those not too familiar with the relatively new project of BDR (Bi-Directional Replication) – in short it enables asynchronously replicated Master–Master Postgres clusters (up to 48 nodes) where nodes are kind of independent of each other and they just try to communicate with all other nodes and fetch and apply (synchronize) their data changes on “best effort” terms. But to have any real chance at keeping data in sync BDR imposes quite some limitations on the database design – the biggest one being maybe that you must have UNIQUE or PRIMARY KEY constraints on all of your tables and exclusion constraints and DDL operations causing full table rewrites are disabled. Which shouldn’t be a big problem though in practice if you follow good database design best practices. But there are also some other aspects, so for general concepts see here and for DDL restrictions see for example here. And as there are some other corner-case issues (see GitHub) I would recommend doing some thorough testing for your concrete use case before going “all in” on BDR. But in general still a very powerful tool and definitely the future.

BDR conflict handling basics

But back to conflicts. In short – when you model your data according to BDR guidelines, most of the time the framework will take care of conflict situations where 2 nodes changed/inserted the same data-row simultaneously. This is possible as no global locking is done - all nodes operate on local node data and just try to sync with others automatically, by applying the “last timestamp wins” policy. That should also suffice for 99% of use cases. One just needs to remember that this also means that some data inserts or updates will just be thrown away and one needs to design the applications accordingly. More info on types of conflicts that BDR can resolve automatically can be found here.

But what to do when we still want to customize our conflict handling? For that the framework offers a possibility to declare standard PL/pgSQL stored procedures with a specific signature that will get the conflicting rows as input and the procedure can then decide whether it wants to discard one of the rows or to merge some column values. This merging use case though is probably the most wanted approach for custom handlers, so we’ll do a sample with that. Also note that custom handling is possible only where one UNIQUE constraints is violated (thus you shouldn't have too many)!

Setting up a sample handler function

User written conflict handler stored procedures have specific table (type) based inputs, meaning you’ll need at least one separate function for every handled table. But at least the handler function will get the conflict type (INSERT vs INSERT, INSERT vs UPDATE, …) as input so you can choose to have some “IF statements” to cover all the necessary conflict types for one table in the code.

A simple INSERT VS INSERT conflict handler that will merge for example our website hit counters (imagine we have two Data Centers with a Webserver and one BDR database node and some kind of inflexible log monitoring tool that can only do an INSERT on midnight with a fixed Primary Key ID translating to yesterday's date) will look like that:

CREATE OR REPLACE FUNCTION public.hitcount_conflict_handler_ins_ins (

 row1 public.hitcount,

 row2 public.hitcount,

 table_name text,

 table_regclass regclass,

 conflict_type bdr.bdr_conflict_type, /* [insert_insert | insert_update | update_update | update_delete | delete_delete | unhandled_tx_abort] */

 OUT row_out public.hitcount,

 OUT handler_action bdr.bdr_conflict_handler_action) /* [IGNORE | ROW | SKIP] */

 RETURNS record AS

$BODY$

BEGIN

 raise warning 'conflict detected for public.hitcount, old_row: %, incoming_row: %', row1, row2;

 -- code to choose the output row or to merge values

 row1.counter = row1.counter + row2.counter;

 row_out := row1;

 handler_action := 'ROW';

END;

$BODY$

LANGUAGE plpgsql;



-- after writing the handler procedure we also need to register it as an handler

select * from bdr.bdr_create_conflict_handler(

 ch_rel := 'hitcount',

 ch_name := 'hitcount_ins_ins_handler',

 ch_proc := 'public.hitcount_conflict_handler_ins_ins(public.hitcount, public.hitcount, text, regclass, bdr.bdr_conflict_type)',

 ch_type := 'insert_insert');

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

CREATE OR REPLACE FUNCTION public.hitcount_conflict_handler_ins_ins (

row1 public.hitcount,

row2 public.hitcount,

table_name text,

table_regclass regclass,

conflict_type bdr.bdr_conflict_type, /* [insert_insert | insert_update | update_update | update_delete | delete_delete | unhandled_tx_abort] */

OUT row_out public.hitcount,

OUT handler_action bdr.bdr_conflict_handler_action) /* [IGNORE | ROW | SKIP] */

RETURNS record AS

$BODY$

BEGIN

raise warning 'conflict detected for public.hitcount, old_row: %, incoming_row: %', row1, row2;

-- code to choose the output row or to merge values

row1.counter = row1.counter + row2.counter;

row_out := row1;

handler_action := 'ROW';

END;

$BODY$

LANGUAGE plpgsql;

-- after writing the handler procedure we also need to register it as an handler

select * from bdr.bdr_create_conflict_handler(

ch_rel := 'hitcount',

ch_name := 'hitcount_ins_ins_handler',

ch_proc := 'public.hitcount_conflict_handler_ins_ins(public.hitcount, public.hitcount, text, regclass, bdr.bdr_conflict_type)',

ch_type := 'insert_insert');

After successful defining our conflict resolution handler jumps into action and we can see according messages in the server logs:

2017-03-17 09:56:20.430 UTC,,,23833,,58cba8e3.5d19,10,,2017-03-17 09:14:11 UTC,5/32,719,WARNING,01000,"conflict detected for public.hitcount, old_row: (20170317,100), incoming_row: (20170317,50) ",,,,,"apply INSERT from remote relation public.t1 in commit 0/18DC780, xid 739 commited at 

2017-03-17 09:56:20.425756+00 (action #2) from node (6398391759225146474,1,16385)",,,,"bdr (6398391788660796561,1,16385,): apply" 2017-03-17 09:56:20.430 UTC,,,23833,,58cba8e3.5d19,12,,2017-03-17 09:14:11 UTC,5/32,719,LOG,23000,"CONFLICT: remote INSERT: row was previously INSERTed at node 6398391788660796561:1. Resolution: conflict_trigger_returned_tuple; PKEY: id[text]:20170317 counter[int4]:100",,,,,"apply INSERT from remote relation public.hitcount in commit 0/18DC780, xid 739 commited at 2017-03-17 09:56:20.425756+00 (action #2) from node (6398391759225146474,1,16385)",,,,"bdr (6398391788660796561,1,16385,): apply"

1

2

3

2017-03-17 09:56:20.430 UTC,,,23833,,58cba8e3.5d19,10,,2017-03-17 09:14:11 UTC,5/32,719,WARNING,01000,"conflict detected for public.hitcount, old_row: (20170317,100), incoming_row: (20170317,50) ",,,,,"apply INSERT from remote relation public.t1 in commit 0/18DC780, xid 739 commited at

2017-03-17 09:56:20.425756+00 (action #2) from node (6398391759225146474,1,16385)",,,,"bdr (6398391788660796561,1,16385,): apply" 2017-03-17 09:56:20.430 UTC,,,23833,,58cba8e3.5d19,12,,2017-03-17 09:14:11 UTC,5/32,719,LOG,23000,"CONFLICT: remote INSERT: row was previously INSERTed at node 6398391788660796561:1. Resolution: conflict_trigger_returned_tuple; PKEY: id[text]:20170317 counter[int4]:100",,,,,"apply INSERT from remote relation public.hitcount in commit 0/18DC780, xid 739 commited at 2017-03-17 09:56:20.425756+00 (action #2) from node (6398391759225146474,1,16385)",,,,"bdr (6398391788660796561,1,16385,): apply"

In addition to outputting to the server log it’s also possible to enable conflict handler “audit logging” into the specific bdr.bdr_conflict_history table by setting bdr.log_conflicts_to_table=on in the server config. There the same information will be just more nicely structured – but then you also might need some kind of a Cronjob to clean up the table periodically.

Avoid conflicts if possible

Although we’ve seen that conflict handling is kind of manageable, just some friendly advice – the best conflict handling scheme is to have no conflicts at all. So before starting with your application I’d recommend trying to design such a data model where all applications work with only some partitions of data or with UUIDs. An extract from the BDR documentation:

Conflicts can only happen if there are things happening at the same time on multiple nodes, so the simplest way to avoid conflicts is to only ever write to one node, or to only ever write to independent subsets of the database on each node.

NB! For production stuff make sure you use the latest 1.0.2 version of BDR as older ones had a bug where for insert/insert handlers the remotely changed tuple was not populated correctly. Also in case of more question or for support on this topic feel free to contact us.