Guru: Speeding Up RPG By Reducing I/O Operations, Part 1
May 8, 2023 Gregory Simmons
Perhaps one of the easiest ways to speed up an RPG program is to reduce the number of I/O operations it needs to perform. In this article, let's explore one simple method for moving from record-at-a-time processing toward set-based processing.
Here I have a simple RPG program. Okay, admittedly, we don’t often get to write “simple” RPG programs, but for this example, I have stripped the RPG program down to just the read loop so I can demonstrate the conversion.
1  Dcl-f AcctMstr Usage(*Input) Keyed;
2  Dcl-pr entry ExtPgm('RPGRPT1');
3    n Packed(3:0);
4  End-Pr;
5  Dcl-pi entry;
6    inBranch Packed(3:0);
7  End-pi;

8  Setll inBranch AcctMstr;
9  Reade inBranch AcctMstr;
10 Dow not %eof(AcctMstr);
11   //... Do Important Stuff
12   Reade inBranch AcctMstr;
13 EndDo;

14 *Inlr = *On;
15 Return;
In the above program, we need to read all records from the AcctMstr file where the branch is equal to the branch that was passed in. This program needs to perform some formatting and gather data from other files as well, but for today let's tackle the simplest of changes: swapping out the read loop for iteration through an SQL cursor.
By swapping out this standard read loop for an SQL cursor, we can reduce the number of I/O operations from, in many cases, several thousand (or even millions) down to a handful: each multi-row fetch pulls in thousands of rows with a single I/O operation. The benefit here is that there is a direct correlation between the number of I/O operations a program performs and the speed with which it can do its work.
1  Ctl-Opt Option(*SrcStmt:*NoDebugIO);
2  Ctl-Opt DftActGrp(*No);
3  Dcl-ds t_acctMstr Extname('ACCTMSTR') Qualified Template End-ds;
4  Dcl-ds dsAcctInfo Extname('ACCTMSTR') End-ds;
5  Dcl-ds dsAcctInfoAry Likeds(dsAcctInfo) Dim(25000);
6  Dcl-s index Uns(10);
7  Dcl-s rows Uns(10) Inz(%Elem(dsAcctInfoAry));
8  Dcl-pr entry ExtPgm('RPGRPT1A');
9    n Like(t_acctMstr.branch);
10 End-Pr;
11 Dcl-pi entry;
12   inBranch Like(t_acctMstr.branch);
13 End-pi;

14 If Prepare_For_Data_Pull(inBranch);
15   Dow Pull_Data();
16     For index = 1 to rows;
17       dsAcctInfo = dsAcctInfoAry(index);
18       //... Do Important Stuff
19     EndFor;
20   EndDo;
21 EndIf;

22 Exec SQL Close AcctInfoCsr;
23 *Inlr = *On;
24 Return;
Line 1: Some legacy programs don’t have this CTL-Opt set, so I add it in.
Line 2: I always code the On-Exit operation in my procedures, and a program that contains procedures cannot be created with DFTACTGRP(*YES), which is the default when the keyword isn't specified. So I need to change this program to no longer run in the default activation group. I don't have time in this article to go into detail about activation groups; suffice it to say that if you need to brush up on how they work, there are some great articles available.
Line 3: This is a template data structure that I use to define my input parameter the same way it's defined in the file.
Line 4: Note that I define this data structure just like the template above it, but NOT as LikeDS of that template. This is deliberate: if you use LikeDS on the definition, the data structure is automatically qualified. Since I want this program to function as it used to, with the full AcctMstr record in memory, I don't want to have to change every reference to those fields into a qualified reference (e.g., Accnt vs. dsAcctInfo.Accnt).
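To illustrate the difference, here is a quick sketch (the name qualAcct is purely illustrative, not part of the program above):

// LikeDS: the data structure is automatically qualified
Dcl-ds qualAcct LikeDs(t_acctMstr);
// Extname without Qualified: the subfields are addressable directly
Dcl-ds dsAcctInfo Extname('ACCTMSTR') End-ds;

qualAcct.Accnt = 12345;  // must be qualified with the DS name
Accnt = 12345;           // unqualified; resolves to dsAcctInfo's subfield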
Line 5: I make the number of elements as big as I can while still keeping the data structure array within the 16 MB size limit. (If, say, the record format were 640 bytes, 25,000 elements would come to 16,000,000 bytes, just under the limit.)
Line 7: The variable rows is initialized to the number of elements in the array defined on line 5. Later, I'll adjust it down to the number of rows actually fetched.
Lines 9 and 12: Notice that I've changed the definition from a hard-coded value to one defined like the appropriate field in the file. This is a bonus tip: whenever possible, define variables "like" they are defined in the applicable file. If and when the length of that field changes, your program will need fewer programmer touches, and more scenarios will require nothing more than a recompile.
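For example (a sketch; the name myBranch is illustrative):

// Hard-coded: must be edited by hand if BRANCH ever grows
Dcl-s myBranch Packed(3:0);

// Defined like the file field: a recompile picks up a size change
Dcl-s myBranch2 Like(t_acctMstr.branch);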
Line 14: This procedure will declare and open the cursor.
Line 15: This procedure will perform the fetch to populate the array I’ll be iterating through.
Line 17: As noted above, because each fetched element of the array is copied into the non-qualified data structure, all of the fields from my AcctMstr file are populated in memory.
Line 22: We always want to remember to close our SQL cursor. Normally I would do this within an On-Exit section, but since I'm not converting this program to a Linear-Main program, I just close the cursor when I'm done with my loop.
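For the curious, here is a minimal sketch of what that Linear-Main cleanup could look like; the procedure name RunReport is illustrative and the body simply reuses the declarations and procedures from the program above:

Ctl-Opt Main(RunReport) DftActGrp(*No);

Dcl-Proc RunReport;
  Dcl-Pi RunReport;
    inBranch Like(t_acctMstr.branch);
  End-Pi;

  If Prepare_For_Data_Pull(inBranch);
    Dow Pull_Data();
      //... Do Important Stuff
    EndDo;
  EndIf;

  Return;

On-Exit;
  // Runs whether the procedure ends normally or abnormally
  Exec SQL Close AcctInfoCsr;
End-Proc RunReport;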
// Prepare and open the cursor
25 Dcl-Proc Prepare_For_Data_Pull;
26   Dcl-Pi Prepare_For_Data_Pull Ind;
27     p_branch Like(t_acctMstr.branch);
28   End-Pi;

29   Dcl-s result Ind Inz(*On);

30   Exec SQL Declare AcctInfoCsr INSENSITIVE Cursor For
31     Select *
32       From AcctMstr
33       Where branch = :p_branch
34       Order By accnt
35       For Read Only With NC;

36   Exec SQL Open AcctInfoCsr;
37   If sqlCode < 0;
38     result = *Off;
39   EndIf;

40   Return result;
41 On-Exit;
42 End-Proc Prepare_For_Data_Pull;
Line 30: A read-only cursor like this one will generally behave as insensitive anyway (meaning that if the data changes, your cursor will not reflect it; it's just a snapshot of the data at the time the fetch occurred), but I prefer to specify INSENSITIVE explicitly, just to make my intent clear to myself and anyone else reading my code.
Line 31: I would normally never use Select * and would instead select only the fields I need. In this instance, however, since we are replacing a read statement, which pulls all fields into memory, I'm keeping it simple. Selecting every field from the file means all of them are in memory and no other code changes are required to keep using them. Ideally, I would research which of the file's fields are actually needed, but again, I'm keeping it simple this go around.
Line 35: Ever since hearing of a nightmarish incident in which unexpected commitment boundaries were opened during a simple select statement, I've been in the habit of always being explicit about what kind of commitment control I want to use for each SQL statement.
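For reference, Db2 for i accepts the isolation clause right on the statement itself. A couple of sketches (the variable recCount is illustrative):

Dcl-s recCount Int(10);

// No commitment control: no commit boundary for this statement
Exec SQL Select Count(*) Into :recCount From AcctMstr With NC;

// Uncommitted read: may see other jobs' uncommitted changes
Exec SQL Select Count(*) Into :recCount From AcctMstr With UR;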
Line 36: Opening the cursor; required prior to fetching any data from it.
Line 41: I always, always code the On-Exit operation in my procedures. Even when it has nothing to do, it's there for future coders, subliminally asking, "Is there anything you should put in here?"
Line 42: I always code the name of the procedure on the End-Proc statement. This just gives another nice hint, as you’re scrolling upwards, of which procedure you’ve scrolled into.
43 Dcl-Proc Pull_Data;
44   Dcl-Pi Pull_Data Ind;
45   End-Pi;

46   Dcl-s result Ind Inz(*On);

47   Clear dsAcctInfoAry;
48   Exec SQL
49     Fetch Next From AcctInfoCsr For :rows Rows Into :dsAcctInfoAry;
50   If sqlCode < 0;
51     result = *Off;
52   Else;
53     Exec SQL GET DIAGNOSTICS :rows = ROW_COUNT;
54   EndIf;

55   Return result and (rows > 0); // *Off on error or when no rows were fetched
56 On-Exit;
57 End-Proc Pull_Data;
Lines 46 and 51: Most of my procedures return an indicator so I know whether the procedure worked or not. Note that the return on line 55 also goes *Off when the fetch comes back empty; that's what ends the Dow loop back in the mainline. Another idea, sketched below, would be to have this procedure return the number of rows fetched. I would recommend whichever approach you and your fellow developers in your shop feel makes the best sense.
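As a sketch of that alternative (the procedure name Pull_Data_Rows is illustrative, not part of the program above), the fetch procedure could return the row count and the caller could loop until it comes back zero:

Dcl-Proc Pull_Data_Rows;
  Dcl-Pi Pull_Data_Rows Uns(10);
  End-Pi;

  Dcl-s rowsFetched Uns(10) Inz(0);

  Clear dsAcctInfoAry;
  Exec SQL
    Fetch Next From AcctInfoCsr For :rows Rows Into :dsAcctInfoAry;
  If sqlCode >= 0;
    Exec SQL GET DIAGNOSTICS :rowsFetched = ROW_COUNT;
  EndIf;

  Return rowsFetched;
On-Exit;
End-Proc Pull_Data_Rows;

// The caller's loop then becomes:
//   rows = Pull_Data_Rows();
//   Dow rows > 0;
//     ... process the rows elements ...
//     rows = Pull_Data_Rows();
//   EndDo;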
Line 47: Clearing the array that’s going to receive the rows from the fetch. Admittedly, since I’m using the rows variable to know when I have data or not, I probably could get away with not clearing the array, but it feels safer/cleaner to clear it anyway.
Lines 48 and 49: This is where the single read happens, pulling in up to 25,000 rows all at once.
Line 53: This is where I set the rows variable to the actual number of rows retrieved. You can also use SQLERRD(3), though I recently heard secondhand that it shouldn't always be trusted. I've never found it to be inaccurate, but you have a couple of options.
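The SQLCA alternative is a one-line assignment right after the fetch (a sketch of the same fetch as above):

Exec SQL
  Fetch Next From AcctInfoCsr For :rows Rows Into :dsAcctInfoAry;
If sqlCode >= 0;
  rows = SQLERRD(3); // rows actually fetched, straight from the SQLCA
EndIf;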
In my testing, the I/O was decreased from 265,114 reads to 11: that's 10 reads pulling in 25,000 records at a time and one final read pulling in the remaining 15,114 records. That's a 99.9959% decrease in I/O for this file in this program! And for my fellow speed demons, the original program ran in 21.87 seconds; after the cursor overhaul, 0.41 seconds, a 98.1253% reduction!
Until next time, happy coding.
Gregory Simmons is a software engineer with PC Richard & Son. He started on the IBM i platform in 1994, graduated with a degree in Computer Information Systems in 1997 and has been working on the OS/400 and IBM i platform ever since. He has been a registered instructor with the IBM Academic Initiative since 2007, and holds a COMMON Application Developer certification. When he’s not trying to figure out how to speed up legacy programs, he enjoys speaking at COMMON and other technical conferences, running, backpacking, SCUBA diving, hunting, and fishing.
I’ve used this technique myself a fair bit, though I don’t usually do nearly that many rows.
One thing worth mentioning: I don't think you need to call GET DIAGNOSTICS to get the row count. It should already be available in the SQL communication area data structure, SQLCA (the same place SQLCode and SQLState come from), right after the fetch, in the variable SQLER3.
It is not beneficial to have complex code in order to boost speed, even if you do manage to save some processing time and disk I/O, unless the program desperately needs to be faster. The IBM i operating system already takes care of much of this (it doesn't physically perform each I/O against the disk).
Thanks for this guru corner.
Just for information, in recent RPG you can avoid the prototype for the current module.
and you can just declare an unnamed entry like this:
Dcl-pi *N;
inBranch Packed(3:0);
End-pi;
That's it; the source is not tied to a particular object name, and it's more compact.
i.e., this is a fully compilable RPG program with one parameter:
DCL-PI *N;
test1 char(1);
END-PI;
*INLR = *ON;
If you don't have parameters, this is effectively a complete, compilable RPG program 😉
—
*INLR = *ON;
—
Beware of WITH NC
It can bite you in production 😉
You can actually read records that will be rolled back by concurrent transactions on error; basically, records that do not exist.
Done that, seen that 😉