Guru: Refactoring RPG – GOTO
August 27, 2018 Ted Holt
When I first learned COBOL, I coded loops the way all the programmers in my shop did — with GO TO. Paragraph names were labels, not routines. Then I took a class in COBOL and learned structured programming. I’ve never looked back. I wish other people felt the same way, because I don’t like to work on GOTO-laden programs.
Injudicious use of branching — in RPG that would be the GOTO and CABxx op codes — is a major reason I refactor. GOTO plays havoc with program “logic”, a word I hesitate to use in this context. The minute someone asks me to insert a pop-up window or otherwise alter the flow of a program, the whole thing falls apart.
In the following paragraphs, I share a few ways to replace branching operations with more robust programming practice. I use RPG for my examples, but what I write applies to other languages, especially procedural languages like COBOL and CL.
1. Classic Patterns
A good place to begin is to look for classic patterns and replace branching with structured op codes. The classic patterns and their RPG op codes are:
- Top-tested loop (DOW)
- Bottom-tested loop (DOU)
- Selection
- Simple selection (IF, IF/ELSE, IF/ELSEIF/ELSE).
- Case structure (SELECT)
I feel safe in saying that all modern languages support these patterns. I would add two more patterns to the list:
- Counted loop (FOR, DO)
- Middle-tested loop (Not implemented in RPG)
In general, branching backwards through the code is a loop; branching forward is selection.
Here’s a question for you. Which pattern does this fit?
C CHGCOD CABNE 1 TAG200 C MOVE . . . C MOVE . . . C ADD . . . C MOVE . . . C GOTO TAG220 C TAG200 TAG C MOVE . . . C MOVE . . . C SUB . . . C MOVE . . . C TAG220 TAG
If you said that this is selection or if/then/else or something similar, give yourself a point. It is indeed selection, and is easily refactored like this:
C CHGCOD IFEQ 1 C MOVE . . . C MOVE . . . C ADD . . . C MOVE . . . C ELSE C MOVE . . . C MOVE . . . C SUB . . . C MOVE . . . C ENDIF
Change those MOVE’s, that ADD, and that SUB to EVAL’s and you’re only a step away from free-form code.
How about this one?
C Z-ADD1 NX C NXTOPT TAG C OPT,NX CASEQ'A' ADDSR C OPT,NX CASEQ'C' CHGSR C OPT,NX CASEQ'D' DELSR C END C EXCPTE1 C ADD 1 NX C NX CABLE12 NXTOPT
If you said that it’s a bottom-tested loop, give yourself a point. That answer is correct. However, a closer look shows that it’s a counted loop, with NX ranging from 1 to 12, so if you said that it’s a counted loop, give yourself two points.
C FOR NDX = 1 to 12 C OPT(NDX) CASEQ 'A' ADDSR C OPT(NDX) CASEQ 'C' CHGSR C OPT(NDX) CASEQ 'D' DLTSR C END C EXCEPT E1 C ENDFOR
What pattern does this one fit?
C RD100 TAG C READ QCUSTCDT 44 C *IN44 CABEQ *ON RD110 C MOVE . . . C MOVE . . . C MOVE . . . C MOVE . . . C EXCEPT E1 C GOTO RD100 C RD110 TAG
Did you identify this as a middle-tested loop? The READ must be done at least once. The MOVE’s and EXCPT may not be done at all. RPG, CL, and COBOL programs are full of this type of loop because this is the way programmers think. This is why I add middle-tested loops to the list of classic patterns.
Within such loops I typically find I/O operations, especially EXFMT and the various READx op codes, and scanning operations, such as the SCAN op code, the %SCAN function, and the various table-and-array lookup functions and op codes. I have already written about the practicality of middle-tested loops and I won’t belabor the point.
This code easily converts like this:
C DOW '1' C READ QCUSTCDT C IF %EOF() C LEAVE C ENDIF C MOVE . . . C MOVE . . . C MOVE . . . C MOVE . . . C EXCEPT E1 C ENDDO
The following table contains some common patterns in pseudocode and their structured equivalents.
Pattern | Structured programming equivalent |
A:
if condition goto B . . . goto A B: |
top-tested loop (do while) |
A:
. . . if condition goto A |
bottom-tested loop (do until) |
A:
. . . if condition goto B . . . goto A B: |
middle-tested loop |
A:
. . . if condition goto A . . . if condition goto A . . . if condition goto A . . . goto A |
any loop with ITER |
. . .
if condition goto A . . . if condition goto A . . . if condition goto A . . . A: |
case structure (SELECT) |
if condition
. . . goto A if . . . . . . goto A . . . A: |
case structure (SELECT) |
if condition goto A
. . . A: |
if/then |
if condition goto A
. . . goto B A: . . . B: |
if/then/else |
2. Indiscriminate Branching
Not all code fits these patterns. After all, they don’t call it “spaghetti code” for nothing. Here’s another example. What would you do with this one?
C OPTION CABNE '1' TAG300 . . . C TAG300 TAG
Would you convert this to an if/then structure? Not so fast! How many lines do the three dots represent? If only a few, then yes, such a conversion would probably be appropriate. If a lot, then maybe not, or at least not yet.
“A few” and “a lot” are subjective terms. I prefer to keep the number of lines between an IF and the corresponding ENDIF to what I can easily view on a screen, something like 25 or fewer, certainly no more than 50. (If that seems unrealistic, keep in mind that it’s a preference, not a hard-and-fast rule.) The same applies to loops.
If those three dots represent dozens or even hundreds of lines, I look for ways to reduce that number before tackling the GOTO. I might try to refactor that code into one or more routines, as I wrote about previously.
In this case, I might try to move the lines between the GOTO and TAG into a subroutine, like this:
C OPTION IFEQ '1' C EXSR OPTION1 C ENDIF . . . C OPTION1 BEGSR . . . (the code that followed the CABNE goes here) C ENDSR
Whether I can create such a subroutine or not depends on what’s in those lines of code. In most cases I can get away with this. However, one thing that tends to foul up the whole works is (as you may have guessed) GOTO. Let me expand the example slightly.
C OPTION CABNE '1' TAG300 C ADD . . . C MOVE . . . C CTYPE IFNE 1 C MOVE . . . C GOTO SKIP1 C ENDIF C MOVE . . . C MOVE . . . C SKIP1 TAG C MOVE . . . C MOVE . . . C MOVE . . . C TAG300 TAG
There is another GOTO/TAG pair, but they are within the new routine. There are no branches to destinations before the first line (CABNE line) or after the last line (TAG300). If there are no branches to SKIP1 from elsewhere in the program, we can create a new subroutine.
C IF OPTION = 1 C EXSR OPTION1 C ENDIF . . . C OPTION1 BEGSR C ADD . . . C MOVE . . . C CTYPE IFNE 1 C MOVE . . . C GOTO SKIP1 C ENDIF C MOVE . . . C MOVE . . . C SKIP1 TAG C MOVE . . . C MOVE . . . C MOVE . . . C ENDSR
But suppose that the code between the CABNE and the TAG does branch out of the range. Do we lose hope? No. Branching out of the range does not automatically prevent you from creating a routine. Consider this:
C TAG120 TAG . . . C OPTION CABNE 1 TAG300 C ADD . . . C MOVE . . . C CTYPE IFNE 1 C MOVE . . . C GOTO TAG120 C ENDIF C MOVE . . . C MOVE . . . C TAG300 TAG
We have a problem: Control may branch to TAG120 from within the new subroutine, and branching in and out of subroutines is a no-no. Here’s an easy way around that.
D TAGNAME S 10 . . . C TAG120 TAG . . . C IF OPTION = 1 C EXSR OPTION1 C TAGNAME CABEQ 'TAG120' TAG120 C ENDIF . . . C OPTION1 BEGSR C EVAL TAGNAME = *BLANKS C ADD . . . C MOVE . . . C CTYPE IFNE 1 C MOVE . . . C EVAL TAGNAME = 'TAG120' C LEAVESR C ENDIF C MOVE . . . C MOVE . . . C ENDSR
I’ve defined a 10-byte character variable called TAGNAME, and I use it to control branching after calling the subroutine. One TAGNAME variable is sufficient to handle all such cases in a program.
I’ve added a new first line to the subroutine to set the TAGNAME to blanks. If the OPTION1 subroutine needs to branch to TAG120, it loads the TAGNAME variable with an appropriate value (I use the name of the TAG itself) and exits the subroutine. Upon return to the caller, control branches as it always did.
While it might seem that I made no progress by removing one GOTO and adding one CABEQ, I have made progress by creating a new subroutine. Further refactoring may eventually eliminate the TAGNAME variable.
3. Purism Is Not Required
I’ve had success with these simple techniques. I’ve no doubt they will work for you, too. I don’t refactor to eliminate GOTO out of philosophical reasons. I eliminate GOTO when the use of GOTO prevents me from changing the program source code to handle new requirements. Otherwise, I leave it alone. You don’t have to remove all GOTO’s from working code. They won’t breed, as rats or roaches would.
RELATED STORIES
Guru: Refactoring into Routines
Back in 1979 when I was programming in Cobol on IBM 370, I read the book “Structured Programming in Cobol”, and never used GO TO again. Replaced it with a variety of PERFORM statements. You have put it beautifully, so a man with average programming aptitude can also understand.