Guru: Concerning The Stepping-On Of Feet
October 19, 2020 Ted Holt
It has come to my attention that once again I did not tell the truth, the whole truth, and nothing but the truth. In Three Suboptimal I/O Practices, I said that a simple SELECT INTO was preferable to a cursor that fetches one row. It turns out that there is at least one situation in which SELECT INTO will not serve the purpose, and one has no choice but to use a cursor that fetches one row.
The situation is this: the program must lock the fetched row for update. Despite its power and simplicity, SELECT INTO cannot lock a row, or so I am told. Consider the following code, embedded in an RPG program.
exec sql select name into :name from vemp where clock = :clock; . . . more code . . . exec sql update vemp set name = :NewName where clock = :clock;
It is possible for another job to change the value of the NAME column in VEMP between the SELECT and UPDATE statements. In such a case, this program would wipe out the other update.
One way to keep one program from stepping on another program is to lock the row, and that requires a cursor.
exec sql declare c1 cursor for select name from vemp where clock = :clock for update of name; exec sql open c1; exec sql fetch c1 into :name; . . . more code . . . exec sql update vemp set name = :NewName where current of c1; exec sql close c1;
The FOR UPDATE OF clause locks the row. Problem solved. Case closed. Court adjourned.
Or is it?
I have written or worked on many programs, usually of a file-maintenance persuasion, that locked a record while someone entered changes to the data. However, these are not the 1980’s, and we have better ways these days to ensure the integrity of data. I refer you to Michael Sansoterra’s article Emulate RPG’s Pessimistic Locking in SQL, published just over nine years ago in this august publication. Mike treated the subject with his usual adept thoroughness, explaining why optimistic locking is better than pessimistic locking. Reader Jamie shared his experience, enriching the discussion.
There is no reason why two or more jobs should not change the same row at the same time, provided they are not changing the same column(s), i.e. field(s). Consider a database table with employee data.
- A program retrieves some or all columns (fields) of a row (record), using SELECT INTO, and saves the data into a before-image data structure.
- The program copies the before-image into an after-image.
- The user changes one or more columns in the after-image.
- The program retrieves the same row, using SELECT INTO a second time and saving the data into a third data structure.
- The program compares the three data structures. If no other job has changed the same columns that this job is changing, the program updates the database. If some other job has changed the same columns, then the database does not get the update and the user is informed and told to redo the transaction.
In this example, the current job changes the name and another job changes the telephone number. There is no conflict.
Clock | Name | Date of Birth | Telephone | |
Before image | 123 | Sally Varygland | 1980-01-01 | 662-840-5546 |
After image | 123 | Sally Mander | 1980-01-01 | 662-840-5546 |
Second SELECT INTO | 123 | Sally Varygland | 1980-01-01 | 662-841-1245 |
This takes a little more work than locking the row, but it’s not complicated.
Also, consider that some updates do not need locking. For instance, suppose a program allows workers to store items in a warehouse. Suppose two workers both store item A into two different locations at the same time. The program updates two rows, one for each location, and also updates a summary record that contains the total quantity in the warehouse.
UPDATE INVENTORY SET QTYONHAND = QTYONHAND + :Qty WHERE ITEM = :SomeItem;
Two copies of the same program should be able to update the summary record without interfering with one another. It would not matter which user updated first or last.
I should also mention that IBM has placed a lot of thought into the matter of concurrency, such that the database manager handles many contention problems so we don’t have to. You can read more than you probably want to know about concurrency in the IBM Knowledge Center.
Over the years, I have handled my share of help-desk calls, and a common complaint has been that a user could not access a row that someone else had tied up. That someone else might have gone to a meeting, to lunch, or home for the day. I’ve known users to step on their own feet, having a record locked in one 5250 session and trying to update that same record from another session. Anything we can do to avoid such foolishness is worth the effort.
RELATED STORIES
Guru: Three Suboptimal I/O Practices
“If no other job has changed the same columns that this job is changing…” Shops I have worked at in the past do something similar. Though rather than comparing the to-be-updated columns, we usually check the last changed timestamp column of the record (which is always up to date thanks to the “for each row on update as row change timestamp” clause on the column).
Never thought of conditioning it on the to-be-updated rows, but the way our DB operates I think in most cases we would want users to be aware that someone else altered a part of the same record, as it may have an impact on the updates they want to make.
Typically we would record the timestamp in the initial SELECT, and then check the timestamp in the WHERE clause of the UPDATE. Though this does mean we can’t tell the difference between a concurrent update failure and a record not found error (without doing further selects), but then a record not found then there was a successful SELECT a few moments ago is highly unlikely.
Hi Ted,
Another scenario where a simple SELECT INTO would not be allowed in embedded SQL program is when a UNION is involved in SQL statement.
Fortunately, a simple workaround exists for that scenario. A view (or, indeed, a table function like in my actual scenario below) could be created to encapsulate the original SQL statement with a UNION. Then a SELECT from that view (or table function) could be coded in with INTO.
The original SQL statement was something like this.
SELECT
*
FROM TABLE ( CUSTOMER_RATE_DEFAULT_EXCLUSION ( ) ) RATE
UNION
SELECT
*
FROM TABLE ( CUSTOMER_RATE_DEFAULT_INCLUSION ( ) ) RATE
You cannot squeeze the INTO into that construct. However, a new table function, say, CUSTOMER_RATE, could be created with the following Return statement.
RETURN (
SELECT
*
FROM TABLE ( CUSTOMER_RATE_DEFAULT_EXCLUSION ( @PARM1 , @PARM2 , …) ) RATE
UNION
SELECT
*
FROM TABLE ( CUSTOMER_RATE_DEFAULT_INCLUSION ( @PARM1 , @PARM2 , … ) ) RATE
)
Then, the statement in the consumer program would be something like this.
SELECT
*
INTO :hostDataStructure :nullIndicatorArray
FROM TABLE ( CUSTOMER_RATE ( ) ) RATE
FETCH FIRST ROW ONLY;