Guru: When Is An Error Not An Error?
November 26, 2018 Jon Paris
When is an error not an error? When it is expected! In this article I want to discuss RPG's MONITOR op-code and the ways in which it might change how you code RPG. I was prompted to write up my thoughts on this subject after being quizzed by students at a recent RPG & DB2 Summit as to why I was using Monitor blocks rather than more conventional RPG techniques in my examples.
So what do I mean by expected? Basically I mean those errors that you know are going to happen sometimes and that you can hopefully do something about. For example, an error caused by a record lock that could be handled by triggering a retry. Or a divide by zero error could result in a sensible default value being used to allow processing to continue.
Before I get into discussing Monitor, though, there is a point I would like to make about my general philosophy on error handling. In my opinion, there is NEVER a valid excuse for a program to display to a user the “green screen of death.” You know, the one with the options to Cancel, Dump, etc. Or at least they should never see it until after you have put out a “pretty” screen explaining the error. I feel that error handling in RPG is a much-neglected art, so much so that, some years ago, together with my colleagues Susan Gantner and Paul Tuohy, I authored the IBM Redpiece “RPG: Exception and Error Handling.” If you really want to understand your RPG error handling options, check it out. OK, I’ll get off the soapbox now and get back to the article.
The MONITOR Op-code
While many different methods of error handling are available in RPG, MONITOR is probably the least-used member of the family. In part this may be because it is a relatively recent addition to the language. But it may also be because people fail to realize the benefits it offers in terms of program readability.
Monitor allows us to react to errors rather than have to attempt to pre-empt them. Take the EVAL op-code for example. It has no provision for an error (E) extender, consequently division by zero, or numeric overflow, among others, will normally result in a terminal error. In the past the only way to deal with that was to pre-emptively attempt to ensure that the condition could not occur. The same thing applies when we have to convert character or numeric strings to dates, or extract a substring from another string, or. . . . As a result we may encounter code like this:
If totalQuantity = 0;
   averagePrice = 0;
Else;
   averagePrice = totalSales / totalQuantity;
EndIf;

Test(DE) *MDY charDate;
If %Error;
   realDate = *LoVal;
Else;
   realDate = %Date(charDate: *mdy);
EndIf;
Now that you have looked at that code, a couple of questions for you. In the case of the first test, is the programmer expecting the value to be zero? Given the names of the variables and the subsequent calculation, you could reasonably guess that the norm is in fact a non-zero condition. And that in turn would make an old programmer like me grit their teeth and wonder why the programmer did not test for greater than zero so that the expected condition came first . . . but I digress.
Same question for the second example. Is it normal for charDate to contain an invalid date? Probably not, but we don’t know from the code. Of course we could always rely on the ever-accurate comments in the code to tell us, if we’re lucky enough to have any.
One thing I can pretty much guarantee you is that at least one of the two tests will have change flags against it indicating that it was added to the program at some time, probably after some poor user suffered the green screen of death!
Now let’s look at the same logic written to use Monitor.
Monitor;
   averagePrice = totalSales / totalQuantity;
On-Error;
   averagePrice = 0;
End-Mon;

Monitor;
   realDate = %Date(charDate: *mdy);
On-Error;
   realDate = *LoVal;
End-Mon;
Doesn’t that make things a lot clearer? Now you know exactly what the original programmer expected to happen. You also know exactly how they handled anticipated errors.
Notice any similarity with the way you program in CL? In CL we normally attempt an operation and then use MONMSG to react to any errors that are triggered. Think of Monitor as giving us that same flexibility in RPG!
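For readers who know the CL pattern, here is a minimal CL sketch of that "attempt first, react second" approach (the library/file names and message text are hypothetical; CPF1002 is the escape message ALCOBJ sends when it cannot allocate the object):

```
PGM
   /* Attempt the operation first... */
   ALCOBJ     OBJ((MYLIB/ORDERS *FILE *EXCL)) WAIT(5)
   /* ...then react if the expected error occurs */
   MONMSG     MSGID(CPF1002) EXEC(DO)
      SNDPGMMSG  MSG('Orders file is in use - please try later')
      RETURN
   ENDDO
   /* Normal processing continues here */
   DLCOBJ     OBJ((MYLIB/ORDERS *FILE *EXCL))
ENDPGM
```

Monitor plays the same role in RPG that MONMSG plays here.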
Another nice feature of Monitor is that we can control exactly which errors we want to handle. In fact I would normally have coded the first of the examples more like this:
Monitor;
   averagePrice = totalSales / totalQuantity;
On-Error errDivZero;
   averagePrice = 0;
End-Mon;
Where errDivZero is a constant defined like this:
Dcl-C errDivZero 00102; // Attempt to divide by zero
In practice, this constant would have been included via a /Copy of the standard set of status code values that I use in most programs.
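As a sketch, such a copybook might look like this (the member name and constant names are my own convention; the status values themselves are RPG's documented codes):

```
// STDSTATUS: common RPG status codes, included via /Copy
Dcl-C errDivZero     00102; // Attempt to divide by zero
Dcl-C errNumericOvf  00103; // Target too small to hold result (overflow)
Dcl-C errInvalidDate 00112; // Invalid date, time, or timestamp value
Dcl-C errArrayIndex  00121; // Array index not valid
Dcl-C errRecordLock  01218; // Record already locked
```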
Why do it this way? Because divide by zero isn’t the only potential error that could occur here. The code could have, for example, triggered a numeric overflow error. Should that have occurred in the first Monitor version, then processing would have continued using the default value. This happens because an On-error op-code with no qualification traps any and all errors. And that probably would not have been a good thing to do.
By specifying only the statuses that I am prepared to handle I am assured that RPG’s default error handling will be triggered in the event that any unexpected error occurs. And of course that would result in the PSSR being called to log the error, notify the user, capture the current screen image, and . . . . You do always include such a standard PSSR, don’t you? If not, then you really should read the Redpiece I referenced earlier!
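The Redpiece covers PSSR design in detail, but a minimal sketch of the idea looks something like this (logError and notifyUser are hypothetical helper procedures):

```
// Program status data structure gives us diagnostics to log
Dcl-Ds pgmInfo PSDS;
   procName  *Proc;    // procedure name
   pgmStatus *Status;  // status code of the failing operation
End-Ds;

Begsr *PSSR;
   // Log the failure, show the user a "pretty" screen, end cleanly
   logError(procName: pgmStatus);
   notifyUser('An unexpected error occurred. Support has been notified.');
   *InLR = *On;
   Return;
Endsr;
```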
Another reason I like using the Monitor approach is that it is generally more efficient — particularly if the majority of the time the error will not be triggered. Perhaps we only get one invalid date in every 10,000 records. The Test/If combination would be applied to 9,999 records to no effect. Monitor, on the other hand, simply sets up the error monitoring, which is a much less resource-intensive operation.
Other Monitor Options
In addition to being able to trap specific error statuses, On-Error allows for the use of two special values. Actually there are three if you count *ALL, but since that is the same as not coding anything, I’ll ignore it for now.
The first value is *PROGRAM. This covers all status codes from 00100 to 00999. The second is *FILE which, as you’ve probably guessed, covers all the file-related status codes, from 01000 to 09999. I have used these from time to time, coding conditional logic based on specific status codes. Usually such a routine would default to calling the PSSR routine if the error is not one I can handle. In general, though, I usually code On-Error blocks for the specific errors I can process, and leave the default handler to catch anything I had not anticipated.
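As a sketch of how the pieces combine, specific statuses can be handled first, with *FILE as a broader safety net (CustFile, custNo, and queueForRetry are hypothetical, and errRecordLock is assumed to be a named constant for status 01218):

```
Monitor;
   Chain (custNo) CustFile;
On-Error errRecordLock;    // 01218: record locked by another job
   queueForRetry(custNo);  // handle the expected case
On-Error *FILE;            // any other file error (01000-09999)
   Exsr *PSSR;             // unexpected: fall back to standard handling
End-Mon;
```

On-Error groups are checked in the order coded, so the specific status is caught before the broader one.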
Summary
MONITOR is a terrific and underutilized addition to the RPG language. I hope in this tip I have encouraged you to try it. Not just because it will improve your error handling abilities, but because of the contribution it can make to making your code more understandable.
RELATED STORY
RPG: Exception and Error Handling
Jon Paris is one of the world’s foremost experts on programming on the IBM i platform. A frequent author, forum contributor, and speaker at user groups and technical conferences around the world, he is also an IBM Champion and a partner at Partner400 and System i Developer. He hosts the RPG & DB2 Summit twice per year with partners Susan Gantner and Paul Tuohy.
I always coded with a PSSR and I went a little further using the value that could be put on the dump header. Every SR or monitored routine would change the field and leave that mark on the dump so when I got called out the report would tell me where to go look. This shortened my diagnostic time substantially.
In my view almost all software should be built assuming incoming data is invalid and outputs at risk of database enforced rules or record locks.
We’ve started using this technique more and more recently. However we’ve started getting complaints from the sys admins that the job logs for these programs that are running all day are being filled up with messages generated from the errors caught in these monitor statements. Any suggestions to placate them?
You can remove unwanted messages from the job log.
https://www.itjungle.com/2005/02/16/fhg021605-story02/
*** Another reason I like using the Monitor approach is that it is generally more efficient — particularly if the majority of the time the error will not be triggered. ***
I would change that last part to say “BUT ONLY if the majority of the time the error will not be triggered.” Monitor blocks and (E) opcode extenders should be used judiciously. The problem is that people tend to forget that last bit and use Monitor blocks all the time because they are convenient. Processing exceptions is expensive. Here’s a real-life example of the problems that inappropriate use of Monitor blocks can cause. Consider this program that processes a file with 10M records:
**free
Ctl-Opt DftActGrp(*No) ActGrp('TEST');

Dcl-F InputFile Usage(*Input);
Dcl-S Amount Zoned(7:0) Inz;

DoU %Eof(InputFile);
   Read InputFile;
   If %Eof(InputFile);
      Leave;
   EndIf;

   Monitor;
      Amount = %Dec(Data: 7: 0);
   On-Error;
      Amount = 0;
   EndMon;
EndDo;

*InLR = *On;
I’ve mocked up the file to have 5 million records with field DATA being blank and the other 5 million where DATA has some combination of letters and numbers. Either way, they will all trigger an exception. What happens when we run this program?
1) The joblog fills up with 10 million entries saying “A character representation of a numeric value is in error”, rendering it completely useless. If JOBMSGQFL is set to *NOWRAP, the job will abend after a few seconds when the joblog fills up. If JOBMSGQFL is set to *PRTWRAP, it will abend after a few minutes when it has produced 9999 joblog spool files, which take up a lot of space and are a pain to clean up. If JOBMSGQFL is set to *WRAP, we’ll end up with a 12,686-page joblog that only shows the tail end of the job.
2) QSYSOPR message queue fills up with thousands of messages saying “Job message queue for 374807/DAN/TEST37 has been wrapped.”
3) The job uses up to 18% CPU while active
4) It runs for 12 minutes 42 seconds and consumes 416 CPU seconds
With a simple modification to do preemptive monitoring all this can be avoided:
**free
Ctl-Opt DftActGrp(*No) ActGrp('TEST');

Dcl-F InputFile Usage(*Input);
Dcl-S Amount Zoned(7:0) Inz;
Dcl-C NUMBERS '1234567890 ';

DoU %Eof(InputFile);
   Read InputFile;
   If %Eof(InputFile);
      Leave;
   EndIf;

   If Data <> *Blanks and %Check(NUMBERS: Data) = 0;
      Monitor; // just in case
         Amount = %Dec(Data: 7: 0);
      On-Error;
         Amount = 0;
      EndMon;
   Else;
      Amount = 0;
   EndIf;
EndDo;

*InLR = *On;
1) Joblog less than 1 page showing start and end of the job and 0 errors.
2) No messages over the console
3) CPU usage while active 0.1%.
4) Runtime < 1 second. CPU consumption: 2.4 CPU seconds.
Checking a field for blanks and/or doing a %check is an in-memory operation and is very fast and inexpensive. Monitor blocks are fine to deal with very infrequent exceptions or interactive programs, but they can cause a lot of problems in high-volume batch environments. Because of the overhead and joblog issues mentioned above, I do not recommend reactive monitoring with Monitor or (E) extenders as the first option. I prefer to do inexpensive tests that do not cause exception and only use Monitor for the final catch-all.
The other thing I don’t like about Monitor and (E) extenders that some people abuse them:
Monitor;
   // 5000 lines of shoddy code
On-Error;
EndMon;
*InLR = *On;
Look Ma, my program never bombs!
Error and exception handling in RPG is indeed a neglected art. The most difficult part may very well be deciding what to do when an exception occurs. Roll back? Abort? Log, continue and notify? These considerations are rarely defined upfront. I’ve had heated debates in my shop about error handling. Some people deliberately don’t take provisions to deal with eval overflows because they want the job to go to MSGW so they learn that the total field on the report was not big enough. IMO, letting jobs go to MSGW when it can reasonably be avoided is never a good idea. If the job runs interactively, the user will get the dreaded green screen of death. If it runs in batch, it can hold up other jobs, especially in single-threaded JOBQs. Evals where there is more than a remote chance of overflow should always be monitored. Then you can abend the job and alert the user, operations, yourself or the police – you are in control.
Dan D