Guru: MERGE, Chicken, And Eggs
September 30, 2019 Ted Holt
Which came first: the chicken or the egg? I don’t have time to ponder such trivialities. However, I am glad to know that SQL has a way to help me with chicken-and-egg database updates, i.e., when two statements need to run but each politely needs for the other to go first.
Suppose you support an IBM i system that keeps up with inventory. It has an item master table (physical file) that stores general information such as a description, the standard cost, and the list price of an item.
create table ItemMaster (ItemNumber char(6), Revision dec(3), Description char(20), Cost dec(5,2), Price dec(5,2), EffectiveFrom date not null, EffectiveThru date not null with default '9999-12-31', primary key (ItemNumber, Revision)); insert into ItemMaster values ('AA-101', 1, 'Widget', 2.50, 3.00, '2019-01-01', '9999-12-31'), ('BB-202', 1, 'Doodad', 1.00, 2.00, '2019-02-01', '2019-07-31'), ('BB-202', 2, 'Doodad', 1.25, 2.15, '2019-08-01', '9999-12-31'), ('CC-303', 1, 'Thingie', 3.50, 4.25, '2019-01-01', '9999-12-31');
In many systems, the item master has one row (record) for each item. You can store only one set of values at a time. One description, one cost, one price. When you update a column (field), the old value of that column is lost.
This system, however, permits multiple rows for an item. Each row has a range of effective dates. The dates in the rows of any item do not overlap, so that we can know what the description, standard cost, and catalog price were for any item on any given date.
The rows for an item are distinguished by a revision (version) number, which is incremented when a new revision of the item is created. Thus, there are two ways to identify the active revision of an item: by looking for the highest revision number or by looking for the expiration (effective through) date 9999-12-31.
To change the description, cost, and/or price for an item requires two operations.
- Change the effective-through date of the latest revision from 9999-12-31 to the day before the new values take effect.
- Add a new row with the new data and an effective-through date of 9999-12-31.
In other words, we must UPDATE the existing active row and we must INSERT a new active row.
Think for a moment. How would you make these changes? Would you UPDATE first or INSERT first? Either way you can run into problems. If you INSERT first, you have multiple rows for an item with an expiration date of 9999-12-31. If you UPDATE first, retrieving values from the last-active row in order to generate the release number for the new row becomes more difficult because you have to look for the MAX(REVISION) for each item. This is the sort of chicken-and-egg situation I was referring to.
And then there’s the problem of the first of the two running to completion and the second one canceling with an error, which is a very real possibility. None of this is insurmountable. I just want to point out that this sort of thing needs to be thought out carefully.
Fortunately, there is a better way. It’s called MERGE.
Consider a file-maintenance table of changes to be applied to the item master.
create table ItemUpdates (ItemNumber char(6), Description char(20), Cost dec(5,2), Price dec(5,2)); insert into ItemUpdates values ( 'AA-101', 'Widget', 3.00, 3.55), ( 'BB-202', 'Doodad', 1.50, 2.25);
We have data — a description, a cost, and a price — for two items. These may be the same values that are in the current row for the item, or they may be new values. In this example, I chose to leave the description as it is and change only cost and price for both items.
MERGE needs two transactions per item — one to force an UPDATE and one to force an INSERT. But we have only one row per item in the ItemUpdates table. Here’s how we generate two transactions from one.
select m.ItemNumber, m.Revision, m.Description, m.cost, m.price from ItemMaster as m join ItemUpdates as u on m.ItemNumber = u.ItemNumber where m.EffectiveThru = '9999-12-31' union all select m.ItemNumber, m.Revision + 1, u.Description, u.cost, u.price from ItemMaster as m join ItemUpdates as u on m.ItemNumber = u.ItemNumber where m.EffectiveThru = '9999-12-31'
This query transforms two rows of input into four rows of output. The first SELECT retrieves data from the current revision. We’ll use the revision number and ignore the description, cost, and price columns. The second select generates the new revision with the new description, cost, and price.
Item | Revision | Description | Cost | Price |
AA-101 | 1 | Widget | 2.50 | 3.00 |
AA-101 | 2 | Widget | 3.00 | 3.55 |
BB-202 | 2 | Doodad | 1.25 | 2.15 |
BB-202 | 3 | Doodad | 1.50 | 2.25 |
This query becomes the source data set in MERGE. That is, it follows USING. Watch MERGE do its magic.
merge into ItemMaster as tgt using (select m.ItemNumber, m.Revision, m.Description, m.cost, m.price from ItemMaster as m join ItemUpdates as u on m.ItemNumber = u.ItemNumber where m.EffectiveThru = '9999-12-31' union all select m.ItemNumber, m.Revision + 1, u.Description, u.cost, u.price from ItemMaster as m join ItemUpdates as u on m.ItemNumber = u.ItemNumber where m.EffectiveThru = '9999-12-31') as src on (src.ItemNumber, src.Revision) = (tgt.ItemNumber, tgt.Revision) when matched then update set tgt.EffectiveThru = '2019-09-30' when not matched then insert values (src.ItemNumber, src.Revision, src.Description, src.Cost, src.Price, '2019-10-01', '9999-12-31')
The UPDATE only changes one column — it updates the expiration date to one day before the day the new revision takes effect. The INSERT creates the new revision. Amazing!
It would be more realistic to see it in a production situation, such as in an RPG program.
dcl-s NewEffectiveDate date; dcl-s NoExpirationDate date inz(d'9999-12-31'); exec sql merge into ItemMaster as tgt using (select m.ItemNumber, m.Revision, m.Description, m.cost, m.price from ItemMaster as m join ItemUpdates as u on m.ItemNumber = u.ItemNumber where m.EffectiveThru = :NoExpirationDate union all select m.ItemNumber, m.Revision + 1, u.Description, u.cost, u.price from ItemMaster as m join ItemUpdates as u on m.ItemNumber = u.ItemNumber where m.EffectiveThru = :NoExpirationDate) as src on (src.ItemNumber, src.Revision) = (tgt.ItemNumber, tgt.Revision) when matched then update set tgt.EffectiveThru = :NewEffectiveDate - 1 day when not matched then insert values (src.ItemNumber, src.Revision, src.Description, src.Cost, src.Price, :NewEffectiveDate, :NoExpirationDate);
So what’s the result? See for yourself. First, the before:
Item | Revision | Description | Cost | Price | Effective from | Effective thru |
AA-101 | 1 | Widget | 2.50 | 3.00 | 2019-01-01 | 9999-12-31 |
BB-202 | 1 | Doodad | 1.00 | 2.00 | 2019-02-01 | 2019-07-31 |
BB-202 | 2 | Doodad | 1.25 | 2.15 | 2019-08-01 | 9999-12-31 |
CC-303 | 1 | Thingie | 3.50 | 4.25 | 2019-01-01 | 9999-12-31 |
Now, the after (assuming the new effective date is October 1, 2019):
Item | Revision | Description | Cost | Price | Effective from | Effective thru |
AA-101 | 1 | Widget | 2.50 | 3.00 | 2019-01-01 | 2019-09-30 |
AA-101 | 2 | Widget | 3.00 | 3.55 | 2019-10-01 | 9999-12-31 |
BB-202 | 1 | Doodad | 1.00 | 2.00 | 2019-02-01 | 2019-07-31 |
BB-202 | 2 | Doodad | 1.25 | 2.15 | 2019-08-01 | 2019-09-30 |
BB-202 | 3 | Doodad | 1.50 | 2.25 | 2019-10-01 | 9999-12-31 |
CC-303 | 1 | Thingie | 3.50 | 4.25 | 2019-01-01 | 9999-12-31 |
MERGE came through for us again!
A word of warning is appropriate here. When you do this type of update, you need to do it under commitment control. It is possible for the UPDATE part of MERGE to succeed and the INSERT part to fail, and vice versa. If one part succeeds and one part fails, you need to be able to ROLLBACK. I would not want a partially updated database.
Joe Celko wrote about this sort of problem almost 11 years ago, but with a couple of differences. If you’re interested in reading more about this topic, see the link below. Anything he writes, whether book or article, is worth reading.
How is using this Merge technique under commitment control any different than just doing the original update and insert directly under commitment control? Also, IBM now natively supports temporal tables, so that could be an alternate way to solve this problem, although queries against older time frames would need to be done using temporal based SQL syntax.