Guru: A More Efficient Way To Merge With SQL
October 15, 2018 Mike Larsen
Lately, I’ve been using the merge statement in my programs to insert or update rows in a table. I recently came across a situation where a program using merge was running every few minutes and performing updates to thousands of rows each time it ran. Since this table was journaled, it was obvious some re-design was necessary.
For those unfamiliar with the merge statement, it is sometimes referred to as an “upsert.” That means it will either perform an update or an insert. In RPG terms, I like to compare it to a chain operation. With a chain, you check whether or not the row was found in the table. If it was found, update the row, otherwise insert it.
To help illustrate the merge statement, I put together a quick program that maintains employee master information. The program contains two versions of the merge statement; how the original statement was coded and how it looks after the enhancement. I’ll show the original merge statement (Figure 1), followed by the enhanced statement (Figure 2) and point out the differences.
Figure 1. The original merge statement
Exec sql Merge into Employee_Master as tgt using (values(:employeeNumber, :in_employee_status, :in_employee_last_name, :in_employee_first_name, :in_employee_title)) as src (sEmployee_number, sEmployee_status, sEmployee_last_name, sEmployee_first_name, sEmployee_title) on (tgt.employee_number) = (src.sEmployee_number) // ***** the update occurs unconditionally when matched then update set tgt.employee_status = sEmployee_status, tgt.employee_last_name = sEmployee_last_name, tgt.employee_first_name = sEmployee_first_name, tgt.employee_title = sEmployee_title, tgt.row_updated_program = :pgm_stat.Program when not matched then insert (tgt.employee_number, tgt.employee_status, tgt.employee_last_name, tgt.employee_first_name, tgt.employee_title, tgt.row_added_program) values(sEmployee_number, sEmployee_status, sEmployee_last_name, sEmployee_first_name, sEmployee_title, :pgm_stat.Program);
The merge statement in Figure 1 uses values passed into the program as parameters to either update or insert a row in the employee master. The employee master table is defined as the target for the actions and the source is the data passed into the program. If the employee number parameter is not found in the table, it inserts a row. Conversely, if the employee already exists on the table, it updates the columns for that employee’s row. Pay particular attention to the commented section of code where the update occurs without any conditions. That means every time this process runs and the employee exists on the table, it updates their information regardless if there were any changes. Now imagine that this process is one that runs every few minutes and there happens to be many updates occurring to employee information. Can you see the problem? It would perform many unnecessary updates and also create many entries to the journal.
I thought of a few ways to correct this problem. I could turn off journaling for this table. But that would leave me with other challenges. I could also rewrite this part of the program using RPG and the chain operation to check if the row already existed and only update it if any columns had changed. I didn’t like that option either as it would have taken some time to rewrite the code. Or, I could add a few lines of code to my existing merge statement to check if any of the employee information had changed. I liked this idea the best, and that’s what is shown in Figure 2.
Figure 2. Enhanced merge statement
Exec sql Merge into Employee_Master as tgt using (values(:employeeNumber, :in_employee_status, :in_employee_last_name, :in_employee_first_name, :in_employee_title)) as src (sEmployee_number, sEmployee_status, sEmployee_last_name, sEmployee_first_name, sEmployee_title) on (tgt.employee_number) = (src.sEmployee_number) // ***** only perform updates if one or more of the fields we're // updating has changed. when matched and (tgt.employee_status <> sEmployee_status or tgt.employee_last_name <> sEmployee_last_name or tgt.employee_first_name <> sEmployee_first_name or tgt.employee_title <> sEmployee_title ) then update set tgt.employee_status = sEmployee_status, tgt.employee_last_name = sEmployee_last_name, tgt.employee_first_name = sEmployee_first_name, tgt.employee_title = sEmployee_title , tgt.row_updated_program = :pgm_stat.Program when not matched then insert (tgt.employee_number, tgt.employee_status, tgt.employee_last_name, tgt.employee_first_name, tgt.employee_title, tgt.row_added_program) values(sEmployee_number, sEmployee_status, sEmployee_last_name, sEmployee_first_name, sEmployee_title, :pgm_stat.Program);
I added some code where I perform the check if the employee exists (directly after the ‘when matched’ condition) to check the values of all the columns that are updated. If those conditions aren’t met, the update will not occur.
That’s it! Just a few lines of code to improve performance and reduce the number of times I write to the journal.
Whether you’ve used merge in the past or if it’s totally new to you, you can now feel confident using it in your programs.
In our shop, we have a sequence number column that gets updated when there is a column that has changed. It is that sequence number I check …