Guru: Application Tracing With The QTRC APIs
May 20, 2019 Dawn May
The IBM i operating system is extensively instrumented for diagnostic purposes to help IBM resolve issues should they arise. Instrumenting your application for diagnostics is also possible in a manner very similar to what IBM has done in the operating system.
STRTRC and the QTRC APIs
The operating system includes a trace facility called QTRC. This trace facility is a set of APIs (QTRC APIs) that allow you to instrument your application to collect trace data. This trace data can be whatever you may find helpful for debugging applications in a production environment, where using a debugger is impractical.
The Start Trace (STRTRC) command is used to turn on the trace. You specify the job to trace, which may be a single job, a set of jobs, generic jobs, or all jobs in a subsystem. STRTRC can also “Join” jobs to the trace; if you start worker jobs with spawn or are connecting to a prestart job via JDBC (for example), join trace automatically adds those jobs to the trace. This makes the trace collection dynamic based upon the flow of work for the application. There is a Join Trace API you can use for your application to join an active trace.
The “Job Trace Type” parameter allows you to collect a call/return flow trace, collect operating system trace data, as well as specify that a component trace should be collected. The component trace is selected with the TRCTYPE parameter; when you review the available values for the TRCTYPE component parameter, you will see all sorts of selections to collect operating system component traces. This is where the QTRC APIs come into play. You can create your own component trace and specify your component on the TRCTYPE parameter. The trace level allows you to collect the trace data at three levels of detail. The trace levels are cumulative, such that at *VERBOSE, you also collect data specified at the *ERROR and *INFO levels.
Your application checks to see if trace is active before writing any trace data; QTRC API tips recommend caching the value for fast checks, refreshing the cached value periodically. There are several QTRC APIs that allow you to write data when trace is active; you can write formatted or unformatted hexadecimal data, formatted or unformatted text data, and call stacks. The system automatically collects additional information when trace data is written including the timestamp, job name, thread ID, current user profile, and more.
If you have an application issue and have instrumented your application with the QTRC APIs, the general flow is to start the trace, run the application until the error occurs, then end the trace with ENDTRC. With the trace complete, you have the option to select jobs from the trace collection. This may be useful if you know which jobs encountered the error. PRTTRC prints the trace data, or it can go to an output file for analysis (although IBM does not document the output file format). When you print the trace data, you can order the output by thread, or have all the trace output ordered by time. This latter option would have trace data from different threads or jobs intermixed in time order.
STRTRC is efficient; it uses Performance Explorer (PEX) under the covers to collect the trace data. If you manage performance collections in Navigator for i, you will see the completed PEX collections generated by STRTRC.
STRTRC also supports automatically ending the trace if a watched condition occurs. In my last article, Finding the Needle in the Haystack with Watches, I reviewed watching for messages. If a message is an indication of the application’s issue, you can specify the watched condition on the STRTRC command; when the watched condition occurs, the system will automatically end the trace for you. This can be very useful if the error is intermittent; you would start the trace with a relatively large maximum trace size, let the trace wrap when the size limit is met, and then let the trace run until the message occurs. At this point a watch is triggered and the trace automatically ends. You can also set a time limit for how long the system should watch for the message and have the trace end within that given parameter. You can optionally have a watch program that gets called so you can take some additional action.
Below is an example of the STRTRC command. The session ID may be generated automatically by the system or you can specify the name; I prefer to use a name I can remember. The JOB parameter allows you to select the jobs you want to trace; in my example, I am tracing all jobs in the DAWNMAY subsystem. My application uses spawn() and JOINTRC(*YES) will include those spawned jobs in my trace collection. I left the JOBTRCTYPE parameter default to *ALL, so I will collect a call/return trace, any operating system trace data, as well as any components I specify on the TRCTYPE parameter. On the QTRC APIs, I have used DAWNMAY as my component and the STRTRC API allows me to specify my component ID on the TRCTYPE parameter. Notice that I am collecting detailed trace data with the *VERBOSE trace level.
STRTRC SSNID(DAWNMAYTRC) JOB((*ALL/*ALL *ALL DAWNMAY)) JOINTRC(*YES) TRCTYPE((DAWNMAY *VERBOSE))
The various xxxTRC commands do require *SERVICE special authority. However, the QIBM_SERVICE_TRACE function ID can authorize the use of these commands without granting the user a powerful special authority.