Guru: Faster Fetching, Revisited
November 13, 2017 Paul Tuohy
While visiting a client recently, I was shown how they were using a multi-row FETCH into a multiple-occurrence data structure to retrieve large sets of data. When I asked why they were not using a data structure array instead, I was referred to an IT Jungle article in which Ted Holt answered a question on whether it was faster to perform a row-at-a-time FETCH or a multi-row FETCH when using embedded SQL.
Ted provided a comparison between a row-at-a-time FETCH, a multi-row FETCH into a data structure array, and a multi-row FETCH into a multiple-occurrence data structure. At the end of the article, Ted concluded, “The file with which I tested had 1.3 million records. The first two programs used 11 seconds of CPU time. However, the last one used only seven seconds of CPU time. Using an array data structure was no faster than the single-row fetch. The multiple-occurrence data structure was faster than the array data structure.”
Loath as I am to disagree with Ted, his conclusion did not correspond with my experience. I would expect the data structure array and the multiple-occurrence data structure to be comparable in performance. (Underneath the covers, it should be the exact same process for both.) And I would expect both of them to be faster than a row-at-a-time FETCH.
I copied the programs from the original article, removed the logic for printing details, and added logic to calculate the elapsed time (in microseconds, the unit returned by %diff with *MS) between declaring the cursor and closing the cursor. All three programs ran over a table with the same format and the same number of rows.
This is the code for the program that performed a row-at-a-time FETCH.
ctl-Opt option(*srcstmt:*nodebugio);

dcl-Ds Transaction qualified inz;
   Batch     packed(3:0);
   Code      char(2);
   Order     char(7);
   Item      char(15);
   Warehouse char(3);
   Quantity  packed(11:3);
end-Ds;

dcl-C SqlEof const('02000');

dcl-S DetailLine Char(132);
dcl-S fromstamp  Timestamp;
dcl-S tostamp    Timestamp;
dcl-S diff       Int(10);

exec sql set option commit = *none;

*inlr = *on;
fromstamp = %timestamp();

exec sql declare Input cursor for
            select batch, tcode, ordno, itnbr, house, trqty
              from transaction
             order by key;

exec sql open Input;

dow '1';
   exec sql fetch next from Input into :Transaction;
   if sqlstt >= SqlEof;
      leave;
   endif;
enddo;

exec sql close Input;

tostamp = %timestamp();
diff = %diff(toStamp: fromStamp: *ms);
dsply ('Single: ' + %char(diff));

return;
This is the equivalent program using a multi-row FETCH into a data structure array.
ctl-Opt option(*srcstmt:*nodebugio);

dcl-Ds Transaction qualified inz dim(1000);
   Batch     packed(3:0);
   Code      char(2);
   Order     char(7);
   Item      char(15);
   Warehouse char(3);
   Quantity  packed(11:3);
end-Ds;

dcl-S getRows Int(10) inz(%elem(transaction));

dcl-C SqlEof const('02000');

dcl-S DetailLine Char(132);
dcl-S fromstamp  Timestamp;
dcl-S tostamp    Timestamp;
dcl-S diff       Int(10);

exec sql set option commit = *none;

*inlr = *on;
fromstamp = %timestamp();

exec sql declare Input cursor for
            select batch, tcode, ordno, itnbr, house, trqty
              from transaction
             order by key;

exec sql open Input;

dou sqlstt >= SqlEof;
   exec sql fetch next from Input for :getRows rows into :Transaction;
enddo;

exec sql close Input;

tostamp = %timestamp();
diff = %diff(toStamp: fromStamp: *ms);
dsply ('Array: ' + %char(diff));

return;
This is the equivalent program using a multi-row FETCH into a multiple-occurrence data structure.
ctl-Opt option(*srcstmt:*nodebugio);

dcl-Ds Transaction qualified inz occurs(1000);
   Batch     packed(3:0);
   Code      char(2);
   Order     char(7);
   Item      char(15);
   Warehouse char(3);
   Quantity  packed(11:3);
end-Ds;

dcl-S getRows Int(10) inz(1000);

dcl-C SqlEof const('02000');

dcl-S DetailLine Char(132);
dcl-S fromstamp  Timestamp;
dcl-S tostamp    Timestamp;
dcl-S diff       Int(10);

exec sql set option commit = *none;

%occur(transaction) = 1;

*inlr = *on;
fromstamp = %timestamp();

exec sql declare Input cursor for
            select batch, tcode, ordno, itnbr, house, trqty
              from transaction
             order by key;

exec sql open Input;

dou sqlstt >= SqlEof;
   exec sql fetch next from Input for :getRows rows into :Transaction;
enddo;

exec sql close Input;

tostamp = %timestamp();
diff = %diff(toStamp: fromStamp: *ms);
dsply ('MODS: ' + %char(diff));

return;
I called the three programs multiple times and in different combinations. There were always minor differences on each call, but the duration differences between the three were always similar to these:
Method | Time (microseconds) |
Single | 14,161,000 |
Array | 1,835,000 |
MODS | 1,842,000 |
Both multi-row FETCH methods were always approximately eight times faster than a single-row FETCH. The difference between the FETCH into a multiple-occurrence data structure and the FETCH into a data structure array was negligible, and easily accounted for by the fact that my system was doing other things. Although the data structure array program was usually faster (by a small margin), in a few of the tests the multiple-occurrence data structure program was the faster of the two.
So, it would appear that a multi-row FETCH is significantly faster than a row-at-a-time FETCH. Given the choice between a data structure array and a multiple-occurrence data structure, I will opt for the data structure array as the easier of the two to manage and to comprehend.
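For completeness: the test programs above discard the rows they fetch. In a real program each block would be processed before the next FETCH. A minimal sketch of that pattern, reusing the declarations from the data structure array program (rowsFetched, i, and the processing loop are additions for illustration, not part of the original tests):

```rpg
dcl-S rowsFetched Int(10);
dcl-S i           Int(10);

exec sql open Input;

dou sqlstt >= SqlEof;
   exec sql fetch next from Input for :getRows rows into :Transaction;

   // ROW_COUNT reports how many rows this FETCH actually returned;
   // the final block is usually a partial one.
   exec sql get diagnostics :rowsFetched = ROW_COUNT;

   for i = 1 to rowsFetched;
      // process Transaction(i) here
   endfor;
enddo;

exec sql close Input;
```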
Editor’s Note:
I am grateful to Paul Tuohy for revisiting this subject.
I did not mean for my earlier article to be taken as authoritative. I was asked a question, and I expressed my opinion that multiple-row fetch was faster than single-row fetch, even though I had no statistics to back up that opinion. I shared the results of one test that I had run, hardly a conclusive finding.
To anyone who uses multiple-occurrence data structures for multi-row fetch because of my article, I say that what you’re doing is not wrong and I wouldn’t change existing programs. Like Paul, I prefer data structure arrays to multiple-occurrence data structures, and I would recommend you use data structure arrays also.
–Ted
Hi Ted,
Instead of a loop, can you simply do one fetch and specify “fetch first 1000 rows only”?
Glenn
Yes, Glenn, but only if you won’t need more than 1000 rows in the result set.
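Glenn’s suggestion can be sketched by adding the FETCH FIRST clause to the cursor from the programs above, after which a single multi-row FETCH (no loop) reads the whole capped result set:

```rpg
exec sql declare Input cursor for
            select batch, tcode, ordno, itnbr, house, trqty
              from transaction
             order by key
             fetch first 1000 rows only;

exec sql open Input;

// One FETCH suffices: the cursor can never return more than 1000 rows,
// which matches the dimension of the Transaction array.
exec sql fetch next from Input for :getRows rows into :Transaction;

exec sql close Input;
```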
I can easily use the MODS subfields in any other embedded SQL statement; I can’t do this with data structure array elements, as the precompiler doesn’t seem to handle array subscripts in host variables. Unless there was an enhancement I missed, this seems to be a large advantage for the MODS over data structure arrays.
If I want a fixed/max number of records where that number is not too high, I will frequently do a single fetch with “for x rows” into an array DS and then loop through the array, as opposed to looping through multiple fetches.
I have a question – how does the value chosen for “Exec SQL Set Option ALWBLK = ???” affect the results, if at all?
Additionally, once you get your data into the data structure array, you can perform array operations on it – SORTA, for example, to reorder the data in a subfile based on a column the user clicked on.
D.J.
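D.J.’s SORTA point can be sketched as follows, using the Transaction array from the programs above (rowsFetched, holding the number of rows actually fetched, and the choice of sort keys are arbitrary, for illustration only):

```rpg
// Sort the fetched rows ascending by item number.
// %subarr limits the sort to the elements actually filled by the FETCH,
// so unused trailing elements are left untouched.
sorta %subarr(Transaction(*).Item: 1: rowsFetched);

// Or descending by quantity:
sorta(d) %subarr(Transaction(*).Quantity: 1: rowsFetched);
```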