Guru: Speeding Up RPG By Reducing I/O Operations, Part 2
June 12, 2023 Gregory Simmons
Legacy code. One usually admits to having legacy code either with a chuckle or a wince; either way, the admission carries a negative connotation. We must remember, though, that legacy code becomes legacy code because it works. It performs its tasks day in and day out for many years and is forgotten about. Only the squeaky wheel gets the grease, right?
The problem with this code is that, while it works, decades slip by and technology evolves. As those decades roll along, we don't end up with just a few programs that fit the 'legacy code' stereotype; we often end up with thousands of them.
I'm sure you can agree that if you stroll into your CTO's office and say you'd like to address all this legacy code at once, you may get words of encouragement, but it won't become a top-priority project. That is why we have to tackle legacy code one bit at a time. I like the analogy Michael Feathers provides in his book, Working Effectively with Legacy Code, where he likens a source library littered with legacy code to a muddy field, with a few blades of grass here and there representing the quality RPG programs. While we would all like to imagine our source code as a finely manicured football field, the truth of the matter is that you're likely in the same situation I am: in the muddy field. There are hundreds if not thousands of programs in your source library that were written years before you joined your company, or in some cases before you were born. We all can (and should) dream of having a source library that emulates a perfectly groomed field of grass, but Mr. Feathers advises that we focus on one blade of grass at a time.
When we do find opportunities to work on that one blade of grass, we need to have the right tools at the ready in our toolbelt. A while back, in my article, Guru: Speeding Up RPG By Reducing I/O Operations, Part One, I demonstrated how to take an RPG program with a simple read loop and upgrade it with an SQL cursor. Let's take this a bit further and look at how we can deal with other file accesses within that read loop.
Let’s have a look at a simple RPG program:
1  **FREE
2  Dcl-f AcctMstr Usage(*Input) Keyed;
3  Dcl-f AcctMsExt Usage(*Input) Keyed;
4  Dcl-f AcctMsOpt Usage(*Input) Keyed;
5  Dcl-pi *n;
6    inBranch Packed(3:0);
7  End-pi;
8  setll inBranch AcctMstr;
9  reade inBranch AcctMstr;
10 dow not %eof(AcctMstr);
11   Chain (branch:acct) AcctMsExt;
12   If %Found(AcctMsExt);
13     Chain (branch:acct) AcctMsOpt;
14     //... Do Important Stuff
15   Endif;
16   reade inBranch AcctMstr;
17 EndDo;
18 *Inlr = *On;
19 Return;
Line 1: **Free – We'll be going 100 percent free-format RPG for this program. All the cool kids are doing it.
Line 2: AcctMstr is the primary read file for this program.
Line 3: AcctMsExt is a file that must have a record matching the primary file's record; otherwise, we don't want to process that record.
Line 4: AcctMsOpt is a file whose data we will use if a record exists for the branch/account key.
Lines 5-7: A nice comment from a reader of one of my previous articles brought to my attention that you don't actually need to specify a prototype for a program. So I chose to save myself three lines of code and just provide the procedure interface. My thanks to the commenter for reminding me of this. Old habits die hard.
Lines 11 and 12: This is the traditional RPG Chain and If combo. This is where the RPG program dictates that there must be a match in AcctMsExt for us to stay interested in the record we just read.
Line 13: Here we have another chain, but note that this one isn't followed by a %Found built-in function check. The data coming from this file is optional.
In the first iteration of this RPG program, where I only swapped out the read loop for an SQL cursor, I reduced the I/O operations from 265,411 to 11. Now look at this program with its two extra chain operations for each read operation: one read plus two chains per record means the I/O triples. That is, this program isn't performing 265,411 I/O operations, it's performing 3 x 265,411 = 796,233 I/O operations.
Now, let's have a look at the same program rewritten to use an SQL cursor to grab data from all three files at once.
1  **FREE
2  Ctl-Opt Option(*SrcStmt:*NoDebugIO);
3  Ctl-Opt DftActGrp(*No);
4  Dcl-ds t_acctMstr Extname('ACCTMSTR') Qualified Template End-ds;
5  Dcl-ds t_acctMsExt Extname('ACCTMSEXT') Qualified Template End-ds;
6  Dcl-ds t_acctMsOpt Extname('ACCTMSOPT') Qualified Template End-ds;
7  Dcl-ds dsAcctInfo;
8    account Like(t_acctMstr.ACCOUNT);
9    branch Like(t_acctMstr.BRANCH);
10   actAge Like(t_acctMsExt.ACTAGE);
11   coOwner Like(t_acctMsOpt.COOWNER);
12 End-ds;
13 Dcl-ds dsAcctInfoAry Likeds(dsAcctInfo) Dim(25000);
14 Dcl-s index Uns(10);
15 Dcl-s rows Uns(10) Inz(%Elem(dsAcctInfoAry));
16 Dcl-pi *n;
17   inBranch Like(t_acctMstr.BRANCH);
18 End-pi;
19 If Prepare_For_Data_Pull(inBranch);
20   Dow Pull_Data();
21     For index = 1 to rows;
22       dsAcctInfo = dsAcctInfoAry(index);
23       //... Do Important Stuff
24     EndFor;
25   EndDo;
26 EndIf;
27 Exec SQL Close AcctInfoCsr;
28 *Inlr = *On;
29 Return;
Line 1: This program will continue to be in fully free format.
Line 2: Some programs may not have this control option, so I usually add it. *SrcStmt keeps the statement numbers in the compile listing in sync with the source, and *NoDebugIO keeps the debugger from stepping through the compiler-generated I/O code.
Line 3: Since I always code the On-Exit operation in my procedures, this program will contain subprocedures, and DFTACTGRP defaults to *YES when it isn't specified. So I need to change this program to no longer run in the default activation group.
Lines 4 through 6: These are the templates I use to define my input parameter as well as the data structure used later.
Lines 7 through 12: This is the data structure that defines what each element of the array I will fetch into looks like.
Line 13: This is the array the result rows will be fetched into. Adding the fields from my AcctMsExt and AcctMsOpt files didn't push the overall size of the array over the 16 MB limit, so I can continue to fetch up to 25,000 records at a time.
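If you want to sanity-check that math for your own files, a quick calculation like this hypothetical sketch will tell you the largest DIM your element size allows (16,773,104 bytes is RPG's documented maximum data structure size, just shy of 16 MB):

// Hypothetical sizing check: how many elements of dsAcctInfo fit
// under RPG's maximum data structure size of 16,773,104 bytes?
Dcl-c MaxDsSize Const(16773104);
Dcl-s maxElems Uns(10);
maxElems = MaxDsSize / %Size(dsAcctInfo);
// For example, a 100-byte element would allow up to 167,731 elements,
// so Dim(25000) leaves plenty of headroom.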
Line 15: The variable rows is initialized to the number of elements in the array defined on line 13. I will adjust it down later to the number of rows I actually fetched.
Line 19: This procedure will declare and open the cursor.
Line 20: This procedure performs the fetch that populates the array I'll be iterating through.
Line 22: Move this element of the array into the data structure. Note that the data structure is not qualified, so all the fields needed from the files are addressable by their plain names when it's time to do the important stuff.
Line 27: We always want to remember to close our SQL cursor. Normally I would do this within an On-Exit section, but since I'm not converting this program to a Linear-Main program, I just close the cursor when I'm done with my loop.
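For comparison, here is a minimal sketch of what that close would look like in a Linear-Main program (hypothetical; the procedure name AcctReport is made up for illustration), where the On-Exit section guarantees the cursor is closed no matter how the procedure ends:

Ctl-Opt Main(AcctReport) DftActGrp(*No);

Dcl-Proc AcctReport;
  Dcl-Pi AcctReport;
    inBranch Like(t_acctMstr.BRANCH);
  End-Pi;
  If Prepare_For_Data_Pull(inBranch);
    Dow Pull_Data();
      //... Do Important Stuff
    EndDo;
  EndIf;
  On-Exit;
    // Runs on any exit from the procedure, normal or abnormal
    Exec SQL Close AcctInfoCsr;
End-Proc AcctReport;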
1  // Prepare and open the cursor
2  Dcl-Proc Prepare_For_Data_Pull;
3    Dcl-Pi Prepare_For_Data_Pull Ind;
4      p_branch Like(t_acctMstr.branch);
5    End-Pi;
6    Dcl-s result Ind Inz(*On);
7    Exec SQL Declare AcctInfoCsr INSENSITIVE Cursor For
8      Select a.account, a.branch, e.ActAge, Coalesce(o.CoOwner,'')
9        From AcctMstr a
10       Inner Join AcctMsExt e on a.account = e.account and a.branch = e.branch
11       Left Outer Join AcctMsOpt o on a.account = o.account and a.branch = o.branch
12       Where a.branch = :p_branch
13       Order By a.account
14       For Read Only;
15   Exec SQL Open AcctInfoCsr;
16   If sqlCode < 0;
17     result = *Off;
18   Endif;
19   Return result;
20   On-Exit;
21 End-Proc Prepare_For_Data_Pull;
Line 1: I always add a simple one-line comment describing what the procedure does, even though the name of the procedure should be fairly self-documenting. If nothing else, it helps visually set the procedure apart from the lines above it.
Line 7: An insensitive cursor means that if the underlying data changes after the cursor is opened, your cursor will not reflect it – it's just a snapshot of the data at the time of the fetch. A read-only cursor like this one will generally behave as insensitive anyway, but I prefer to specify INSENSITIVE to make that intent clear to myself and anyone else reading my code.
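If you did want the cursor to see committed changes made while it is open, DB2 for i also accepts the opposite keyword. A hypothetical sketch (the cursor name is made up for illustration):

// Hypothetical contrast: a SENSITIVE cursor reflects committed changes
// to the underlying rows while the cursor remains open.
Exec SQL Declare AcctInfoSnsCsr SENSITIVE Cursor For
  Select a.account, a.branch
    From AcctMstr a
    Where a.branch = :p_branch;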
Line 8: Select the fields I need for 'doing my important stuff'. Note the fields I'm selecting:
- Two fields from AcctMstr which I’ve given an alias of ‘a’.
- One field from AcctMsExt, which I've given an alias of 'e'. Since this file is accessed via an Inner Join, I can trust that the 'e' fields will always contain data – they won't be null.
- One field from AcctMsOpt, which I've given an alias of 'o'. Since this file is accessed via a Left Outer Join, there may or may not be a matching record, so the 'o' fields could be null. So ALWAYS wrap these fields in a Coalesce. Coalesce tests whether its first parameter is null; if it isn't, it returns that value, otherwise it returns whatever you specify as the second parameter. Pretty cool, huh? In this case, if there is no co-owner for the account, it simply returns blank (see the short example below).
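Here is a quick standalone illustration of the same pattern (hypothetical; the host variable :inAccount is made up for this example):

// If no AcctMsOpt row exists for the key, o.CoOwner is null
// and Coalesce substitutes the blank instead.
Exec SQL
  Select Coalesce(o.CoOwner, '')
    Into :coOwner
    From AcctMstr a
    Left Outer Join AcctMsOpt o
      on a.account = o.account and a.branch = o.branch
    Where a.branch = :inBranch and a.account = :inAccount;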
1  // Pull the data for the report
2  Dcl-Proc Pull_Data;
3    Dcl-Pi Pull_Data Ind;
4    End-Pi;
5    Dcl-s result Ind Inz(*On);
6    Clear dsAcctInfoAry;
7    Exec SQL
8      Fetch Next From AcctInfoCsr For :rows Rows Into :dsAcctInfoAry;
9    If sqlCode < 0;
10     result = *Off;
11   Else;
12     Exec SQL GET DIAGNOSTICS :rows = ROW_COUNT;
13     result = (rows > 0);
14   EndIf;
15   Return result;
16   On-Exit;
17 End-Proc Pull_Data;
This procedure, in short, grabs as many records as are available, up to 25,000 at a time. If we imagine all our data records as a book, an SQL cursor lets us read a bunch of pages, then keep a finger where we left off. Subsequent calls to this procedure grab the next batch of records, up to another 25,000.
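To put numbers to that analogy: with the 265,411 records from my test data and blocks of 25,000 rows, the cursor hands everything back in eleven fetches (ten full blocks of 25,000 rows, plus a final block of 15,411), which is where the figure of 11 I/O operations quoted below comes from.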
Line 6: As a good practice, I always clear whatever I'm going to fetch into, even if I know the program will only call the procedure once. You never know if or when a fellow developer will have to come in later and modify your program.
Lines 7 and 8: Recall that in the procedure Prepare_For_Data_Pull I declared the cursor and then opened it on line 15. That opened the data path. Here, then, I can fetch the next block of data into my data structure array.
Lines 9 through 14: If the fetch hit an error, I return *Off. Otherwise, I adjust my rows variable to the number of rows actually fetched. Note that a fetch that reaches the end of the data can still hand back a partial block, so I only return *Off when the fetch fails or comes back with zero rows; that way the final partial block still gets processed.
In this converted program I've reduced the I/O operations from 796,233 to 11. For you number crunchers, that's a 99.9986 percent decrease in I/O operations. If you recall from my Part 1 article, the decrease there was 99.9959 percent. Okay, by itself, an additional 0.0027 percent reduction in I/O is not going to get your CTO excited. However, the intention is to show you how it's done. Now you have a couple more tools at your disposal to tackle that next blade of grass.
Until next time, happy coding.
Gregory Simmons is a software engineer with PC Richard & Son. He started on the IBM i platform in 1994, graduated with a degree in Computer Information Systems in 1997 and has been working on the OS/400 and IBM i platform ever since. He has been a registered instructor with the IBM Academic Initiative since 2007, and holds a COMMON Application Developer certification. When he’s not trying to figure out how to speed up legacy programs, he enjoys speaking at COMMON and other technical conferences, running, backpacking, SCUBA diving, hunting, and fishing.