Merge Into the Synchronization Fast Lane with DB2 for i 7.1
September 22, 2010 Michael Sansoterra
One of the tedious chores that many developers face is keeping many different data sources synchronized. It is common for mid- to large-sized companies to have multiple software packages with requirements to keep synchronized copies of the item master, customer master, employee master, and other master files. Other common synchronization tasks involve keeping aggregate tables of detailed data or even synchronizing DB2 data with a relational or non-relational remote data source, including remote databases, Web services, flat files, etc. Fortunately, to help with all of this, IBM has updated DB2 for i 7.1 to include support for the SQL MERGE statement. MERGE (part of the SQL:2008 standard) is a statement that combines the functionality of INSERT and UPDATE statements into a single statement, generally for the purpose of synchronizing data. MERGE makes use of a target table (or updateable view) and a source query to do the synchronization. A Simple Example Let’s look at a quick example of how we might use MERGE to synchronize exact copies of an EMPLOYEE master file (from the DB2 for i EMPLOYEE sample table), which are placed in two different libraries (a.k.a., schemas). These files have a single column primary key: EMPNO. We’ll also assume this is a unidirectional synchronization, with the source library being merged into the target library. In the past, a complete synchronization of the tables would be done with three SQL statements: /* Step 1 - Update Target Rows from Source Rows */ UPDATE TARGETLIB.EMPLOYEE AS TARGET SET ROW=(SELECT * FROM SOURCELIB.EMPLOYEE AS SOURCE WHERE TARGET.EMPNO=SOURCE.EMPNO) WHERE EXISTS (SELECT * FROM SOURCELIB.EMPLOYEE AS SOURCE WHERE TARGET.EMPNO=SOURCE.EMPNO) /* Step 2 - Remove Target Rows that are not in Source */ DELETE FROM TARGETLIB.EMPLOYEE AS TARGET WHERE NOT EXISTS (SELECT * FROM SOURCELIB.EMPLOYEE AS SOURCE WHERE TARGET.EMPNO=SOURCE.EMPNO) /* Step 3 - Insert New Rows from Source into Target */ INSERT INTO TARGETLIB.EMPLOYEE SELECT * FROM SOURCELIB.EMPLOYEE AS SOURCE WHERE NOT EXISTS (SELECT * FROM TARGETLIB.EMPLOYEE AS TARGET WHERE TARGET.EMPNO=SOURCE.EMPNO) Now let’s take a look at a sample MERGE statement that is similar to the above statements: MERGE INTO TARGETLIB.EMPLOYEE AS EMPLOYEE_TARGET USING SOURCELIB.EMPLOYEE AS EMPLOYEE_SOURCE ON (Employee_Target.EmpNo=Employee_Source.EmpNo) WHEN MATCHED THEN UPDATE SET Employee_Target.FIRSTNME =Employee_Source.FIRSTNME, Employee_Target.MIDINIT =Employee_Source.MIDINIT, Employee_Target.LASTNAME =Employee_Source.LASTNAME, Employee_Target.WORKDEPT =Employee_Source.WORKDEPT, Employee_Target.PHONENO =Employee_Source.PHONENO, Employee_Target.HIREDATE =Employee_Source.HIREDATE, Employee_Target.JOB =Employee_Source.JOB, Employee_Target.EDLEVEL =Employee_Source.EDLEVEL, Employee_Target.SEX =Employee_Source.SEX, Employee_Target.BIRTHDATE=Employee_Source.BIRTHDATE, Employee_Target.SALARY =Employee_Source.SALARY, Employee_Target.BONUS =Employee_Source.BONUS, Employee_Target.COMM =Employee_Source.COMM WHEN NOT MATCHED THEN INSERT (EMPNO,FIRSTNME,MIDINIT,LASTNAME,WORKDEPT,PHONENO, HIREDATE,JOB,EDLEVEL,SEX,BIRTHDATE,SALARY,BONUS,COMM) VALUES( Employee_Source.EMPNO,Employee_Source.FIRSTNME, Employee_Source.MIDINIT,Employee_Source.LASTNAME, Employee_Source.WORKDEPT,Employee_Source.PHONENO, Employee_Source.HIREDATE,Employee_Source.JOB, Employee_Source.EDLEVEL,Employee_Source.SEX, Employee_Source.BIRTHDATE,Employee_Source.SALARY, Employee_Source.BONUS,Employee_Source.COMM) ELSE IGNORE; Let’s examine the components of MERGE in a little more detail. MERGE INTO begins the statement and is followed by an updateable table or view, known as the “target.” The USING keyword specifies a “table reference, ” which can include just about any query, but does not include a common table expression. This is considered the source data for the merge. The ON keyword specifies the correlation between the source and the target, usually a unique row identifier. However, you can also specify additional criteria here, such as limiting the MERGE to employees within a certain department, etc. WHEN MATCHED THEN is used to indicate the operation to be performed when the ON clause condition has been met. In this case, an abbreviated UPDATE statement, using only a SET assignment clause, is used within the MERGE statement to set all the target row columns to be equal to the corresponding source row columns. WHEN NOT MATCHED THEN indicates the operation to be done when a source row does not have a matching row in the target table. Typically, this is an abridged INSERT/VALUES statement. Please note that the table name is not specified because the target table is automatically used by MERGE. For the instance when the source and the target should always contain an exact number of matching rows, the SIGNAL statement can be coded to indicate an error condition. While this example was somewhat trivial, please note that multiple WHEN MATCHED and NOT MATCHED clauses can be specified with additional criteria. These clauses will be evaluated in the order specified in the statement. This is similar to the evaluation of a CASE statement; once the first condition is true the specified operation will be evaluated with the remainder being ignored. ELSE IGNORE specifies that nothing further is to be done when the matching condition for all prior WHEN clauses are false. Please note when using MERGE that the synchronization is a uni-directional proposition–always from the source to the target. In the example above, you cannot update the source table based on data in the target. If you need two-way synchronization, then you will need a second MERGE statement with the source and target reversed. Deletion Considerations I said MERGE is “similar” to the three SQL statements because while DB2 can add new rows to the target and can update existing rows in the target, it cannot DELETE rows in the target for the circumstance where there is no matching row in the source. However, it can delete rows from the target in the case where there is a matching row from the source. I mention this because SQL Server’s implementation of MERGE has the option to delete rows from the target when there is no matching row in the source. I initially expected DB2’s MERGE to be able to perform a target DELETE when there is no matching source row. However this behavior, I learned, is SQL Server specific and not part of the ANSI SQL standard. Bummer! A DELETE statement is allowed in the WHEN MATCHED clause and can be used to remove rows from the target table when a match is found. However, a DELETE usually isn’t done unless additional criteria are specified (unless you want to delete all rows in the target when there is a matching source row). For example, if we wanted to remove MERGED employees from the target when the department is T01 (temporary employees), we could code a WHEN MATCHED THEN clause to the above MERGE statement like this: WHEN MATCHED AND Employee_Source.WORKDEPT='T01' THEN DELETE Because order of evaluation is important, this specific WHEN MATCHED would have to be placed prior to the more generic WHEN MATCHED clause shown earlier. If temporary employees are not supposed to be in the source data, an error condition can be artificially created using the SIGNAL statement like this: WHEN MATCHED AND Employee_Source.WORKDEPT='T01' THEN SIGNAL SQLSTATE '70001' SET MESSAGE_TEXT='Temporary Employees Found' Another Example Let’s take a look at one more example. Consider the following Sales Summary table that summarizes customer sales by year and customer ID: CREATE TABLE SalesSummaryByYear ( SalesYear SMALLINT NOT NULL, CustomerId INT NOT NULL, OrderQty INT, OrderLineValue DEC(19,4), NumberLines INT, RankBySales INT, RankByOrders INT, CreatedOn TIMESTAMP, LastUpdatedOn TIMESTAMP, CONSTRAINT pk_SalesSummaryByYear PRIMARY KEY (SalesYear,CustomerId)) You can use the MERGE statement to INSERT new and UPDATE existing rows in this table: MERGE INTO SalesSummaryByYear Target USING ( SELECT YEAR(OrderDate) AS SalesYear, CustomerId, COUNT(DISTINCT SOD.SalesOrderId) As NumberOrders, SUM(OrderQty) AS OrderQty, SUM(UnitPrice*OrderQty*(1-UnitPriceDiscount)) AS OrderLineValue, COUNT(*) AS NumberLines, RANK() OVER(PARTITION BY CustomerId ORDER BY SUM(UnitPrice*OrderQty* (1-UnitPriceDiscount)) DESC) AS RankBySales, RANK() OVER(PARTITION BY CustomerId ORDER BY COUNT(DISTINCT SOD.SalesOrderId) DESC) AS RankByOrders FROM SalesOrderDetail SOD JOIN SalesOrderHeader SOH ON SOH.SalesOrderId=SOD.SalesOrderId GROUP BY YEAR(OrderDate),CustomerId ) Source ON Target.SalesYear=Source.SalesYear AND Target.CustomerId=Source.CustomerId WHEN MATCHED THEN UPDATE SET SalesYear=Source.SalesYear, CustomerId=Source.CustomerId, OrderQty=Source.OrderQty, OrderLineValue=Source.OrderLineValue, NumberLines=Source.NumberLines, RankBySales=Source.RankBySales, RankByOrders=Source.RankByOrders, LastUpdatedOn=CURRENT_TIMESTAMP WHEN NOT MATCHED THEN INSERT (SalesYear,CustomerId, OrderQty,OrderLineValue,NumberLines, RankBySales,RankByOrders,CreatedOn) VALUES (Source.SalesYear,Source.CustomerId, Source.OrderQty,Source.OrderLineValue,Source.NumberLines, Source.RankBySales,Source.RankByOrders,CURRENT_TIMESTAMP) ; In this example, a complex query is used as MERGE’s source. If the sales order history was extremely large, a WHERE clause could be added to limit the query’s sales to only the past year or two, which should be OK because sales orders over one year old are generally never updated. Using this technique is beneficial because:
If you need to synchronize with remote data sources, table functions can be used to aid with this task, if the situation is appropriate. For example, if table function TFXML reads customer master data from an XML source (such as a Web service or an integrated file system (IFS) file), then a MERGE statement like this is valid: MERGE INTO MYDATA.CUSTOMER AS TARGET USING (SELECT * FROM TABLE(MYPGM.TFXML()) AS SOURCE ON (SOURCE.CustomerId=TARGET.CustomerId) ... Other Considerations One other important thing to know about MERGE concerns how it acts when running within a transaction. By default, the statement will run as an ATOMIC statement, that is, all of the modifications must succeed or the entire statement will fail and no changes are made. However, NOT ATOMIC may also be specified. In this case the statement is processed row by row, and all of the rows successfully processed up to the point of an error will be committed. Further, when NOT ATOMIC is specified, an additional option to continue processing (i.e., ignore the error) or stop processing is available. Benefits of using MERGE include:
In summary, MERGE is another one of those great new statements that makes the coding of a common task easier and squeezes even more work out of DB2. Michael Sansoterra is a DBA for Broadway Systems in Grand Rapids, Michigan. Send your questions or comments for Mike via the IT Jungle Contact page.
|