Admin Alert: When System Job Tables Attack, Part I

September 24, 2008 Joe Hertvik

Vigilance pays, negligence punishes. Want proof? Try ignoring i5/OS message CPI1468 (System job tables nearing capacity) the next time it shows up. A system job table overflow can prevent your system from accepting new jobs, delete your spooled files, fill up your DASD, or stop an IPL. And those are its good points. This week and next, I’ll look at system job table overflows and how they affect your system.

What Are System Job Tables and Why Should I Care?

System job tables are internal system objects used by the i5/OS operating system to track every job on a partition. By default, there can be up to 10 job tables on a partition and each table can track up to 16352 jobs, for a maximum of 163520 jobs. The maximum number of system jobs for a partition is designated in the Maximum number of jobs (QMAXJOB) system value, and that number can be changed to any valid number between 32000 and 485000 jobs. QMAXJOB’s shipped value is 163520.
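The arithmetic behind those figures is easy to check. The following is only a small illustrative sketch in Python (it does not run on or query the system); the constant and function names are my own, and the numbers are the ones quoted above.

```python
# Figures quoted in this article; the names are illustrative only.
ENTRIES_PER_TABLE = 16352     # jobs one job table can track
DEFAULT_TABLE_COUNT = 10      # job tables on a partition by default
QMAXJOB_MIN = 32000           # lowest value QMAXJOB accepts
QMAXJOB_MAX = 485000          # highest value QMAXJOB accepts


def default_capacity(tables=DEFAULT_TABLE_COUNT, per_table=ENTRIES_PER_TABLE):
    """Return the number of jobs the default set of job tables can track."""
    return tables * per_table


def is_valid_qmaxjob(value):
    """Return True if value falls inside the range quoted for QMAXJOB."""
    return QMAXJOB_MIN <= value <= QMAXJOB_MAX


print(default_capacity())         # 10 tables x 16352 entries
print(is_valid_qmaxjob(163520))   # the shipped value is inside the range
```

Ten tables of 16352 entries each give the 163520 figure, which matches the shipped QMAXJOB value.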

Tracked jobs include active jobs, completed jobs, and jobs that are waiting to be run from job queues. A new system job entry is created every time work (in the form of a job) is submitted to the system. Completed jobs are tracked and remain on the system for as long as there is spooled file output present for the job.

If too many jobs remain on the system, whether they are active, waiting on a job queue, or completed with spooled output still present, the job tables can grow until they approach their maximum size. As your system approaches its maximum number of jobs, a number of problems can occur, including the following reported issues.

Slow backups
Your system may stop accepting new jobs
High DASD usage, because of the high number of spooled files on the system
Performance problems with certain i5/OS commands and APIs
IPL problems if the system can’t start up any new jobs

It’s not a pretty picture, which is why it’s wise to occasionally check your system job table usage. i5/OS reminds you when system job table usage is approaching a critical point by issuing the following CPI1468 message.

CPI1468 - System job tables nearing capacity.

According to IBM, the job table checking code was changed in OS/400 V4R5M0 to send a CPI1468 message to the QSYSOPR message queue, QHST history log, and the QSYSMSG critical message queue (if it exists) every time the tenth system job table is extended to track more jobs. If you ignore the messages and the system job tables fill up, you may see some of the problems listed above.

The Simple Solution that Doesn’t Exist

Since excessive job table entries can cause so many problems, you may want to set the system warning message threshold lower, similar to what you might do with other i5/OS thresholds. A lower threshold would provide you with more time to remove excessive jobs and to clean up your system job tables. But IBM won’t let you do that. The operating system has no mechanism for changing the threshold value at which the CPI1468 message is sent.

Since you can’t lower the warning threshold value, that leaves you with three options for detecting and dealing with system job table problems.

Monitoring the system to determine when you are approaching high job table usage.
Detecting and deleting excessive jobs that are cluttering your system.
Maintaining the job table entries to remove excessive unused entries, to check for job table damage, and to compress the job tables.

This week, I’ll cover how to monitor the system for high job table usage. Next week, I’ll look at how to detect and delete excessive system jobs and how to maintain your job table entries.

Monitoring System Job Table Usage

To monitor job table usage, use the green screen Display Job Table command (DSPJOBTBL), as follows:

DSPJOBTBL OUTPUT(*)

DSPJOBTBL provides a screen display similar to the following:

                               Display Job Tables 

 Permanent job structures:                Temporary job structures:
   Initial  . . . . :   30                  Initial  . . . . :   20
   Additional . . . :   10                  Additional . . . :   10
   Available  . . . :   72480               Available  . . . :   583
   Total  . . . . . :   126625                                      
   Maximum  . . . . :   163520                                      
                                                                    
                                                                    
                          ---------------------Entries--------------
      Table         Size        Total    Available       In-use      Other
          1     16752384        16352            0        16352        0
          2     16749312        16352           99        16253        0
          3     16749312        16352         1718        14634        0
          4     16749312        16352        11136         5216        0
          5     16749312        16352        15891          461        0
          6     16749312        16352        16312           40        0
          7     16749312        16352        15833          519        0

There are three sections to this display: the Permanent job structures, the Temporary job structures, and the Entries.

The Permanent job structures area summarizes the number of permanent job structure entries that exist in the system. A permanent job structure is assigned to each new job that enters the system. These entries are available for reuse but they cannot be recycled until the entry’s current job either: a) ends without producing any spooled file output; or b) all the spooled file output for the job is either printed or deleted.

The three fields to pay attention to in this section are the Available, Total, and Maximum permanent job structures. The Maximum figure is the maximum number of jobs that are currently allowed on the system, which is the value contained in the QMAXJOB system value discussed above. The Total figure reports the total number of entries contained in all system job tables. Total includes permanent job entries for currently active jobs as well as reusable entries for jobs that have already ended without leaving any spooled file output on the system. Available entries are the number of permanent entries that are available for reuse by new jobs entering the system.

To determine if your system job tables need work, check the permanent job structure entry numbers against each other. If there are a small number of Available entries in relation to the Total entries, the system may experience performance degradation because it will have to extend the job tables when a new job enters the system. Curiously, if there are a large number of Available entries in relation to the Total, system performance will also suffer when performing functions that examine jobs. Too many available permanent entries can also cause degraded performance during IPL steps that process table functions. Finally, if the Total number of jobs is approaching the Maximum system jobs, the CPI1468 message will soon appear and you may start experiencing some of the problems listed above.
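Those three comparisons can be written down as simple ratios. Here is a minimal Python sketch that works on numbers you have copied from the DSPJOBTBL screen by hand. The class and function names are hypothetical, and so are the threshold values (5 percent, 50 percent, and 90 percent). IBM does not publish cutoffs like these, so treat them as placeholders to tune for your own shop.

```python
from dataclasses import dataclass


@dataclass
class PermanentJobStructures:
    """Available, Total, and Maximum as shown by DSPJOBTBL."""
    available: int
    total: int
    maximum: int


def assess(p, low_available=0.05, high_available=0.50, near_maximum=0.90):
    """Return (available ratio, fill ratio, findings).

    The three thresholds are assumptions made for illustration,
    not values documented by IBM.
    """
    if p.total <= 0 or p.maximum <= 0:
        raise ValueError("Total and Maximum must be positive")
    available_ratio = p.available / p.total   # Available relative to Total
    fill_ratio = p.total / p.maximum          # Total relative to Maximum
    findings = []
    if available_ratio < low_available:
        findings.append("few available entries: tables may need extending")
    if available_ratio > high_available:
        findings.append("many available entries: job functions may slow down")
    if fill_ratio >= near_maximum:
        findings.append("Total is nearing Maximum: expect CPI1468")
    return available_ratio, fill_ratio, findings


# Numbers taken from the sample display in this article.
sample = PermanentJobStructures(available=72480, total=126625, maximum=163520)
available_ratio, fill_ratio, findings = assess(sample)
print(f"Available/Total: {available_ratio:.1%}")
print(f"Total/Maximum:   {fill_ratio:.1%}")
for finding in findings:
    print(finding)
```

With the sample numbers, more than half of the permanent entries are available, so under these assumed thresholds only the "many available entries" finding is reported.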

For job table overflow purposes, you can pretty much ignore the Temporary job structures section of the DSPJOBTBL display. New jobs entering the system are also assigned a reusable temporary job structure entry. This entry is returned to the Temporary job structure pool when the job ends. The Temporary job structures area also lists out the current number of available temporary entries.

The Entries area indicates how heavily used each of the partition’s 10 possible job tables is. It shows you the total size of each job table, how many entries each job table can hold, and the number of available and in-use entries. You can use the Entries information to back up what you see in the permanent job structures area. If there are a small number of available entries in the permanent job structure area, that should be reflected in the job table detail in the Entries area. Similarly, if the permanent job structure area shows a large number of available entries, you can see which tables have the most available entries in the Entries area.
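To see at a glance which tables are full and which are nearly empty, you can turn the Entries rows into a percentage in use for each table. The Python sketch below uses the seven rows from the sample display above, typed in by hand. The names are illustrative, and nothing here reads the display from the system.

```python
# (table, total entries, available, in-use) from the sample DSPJOBTBL display.
SAMPLE_ROWS = [
    (1, 16352, 0, 16352),
    (2, 16352, 99, 16253),
    (3, 16352, 1718, 14634),
    (4, 16352, 11136, 5216),
    (5, 16352, 15891, 461),
    (6, 16352, 16312, 40),
    (7, 16352, 15833, 519),
]


def table_utilization(rows):
    """Return {table number: fraction of entries in use}."""
    return {table: in_use / total for table, total, _available, in_use in rows}


for table, used in table_utilization(SAMPLE_ROWS).items():
    print(f"table {table}: {used:.1%} in use")
```

In the sample, the first three tables are full or nearly full while tables 5 through 7 are almost empty, which is the kind of pattern the Entries area makes visible.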

The other nice feature about the Entries area is that it can show how the in-use job entries are distributed. Press F11 and the entries screen will shift from a summary of the total, available, and in-use entries for each job table to a listing of how all the in-use job entries are allocated for each table. Here’s an example of what that screen looks like.

                               Display Job Tables                  
                                                                   
 Permanent job structures:                Temporary job structures:
   Initial  . . . . :   30                  Initial  . . . . :   20
   Additional . . . :   10                  Additional . . . :   10
   Available  . . . :   72474               Available  . . . :   592
   Total  . . . . . :   126625                                      
   Maximum  . . . . :   163520                                      
                                                                    
             ------------------In-use Entries------------------     
                                 Job        Output      Job Log     
      Table       Active        Queue        Queue      Pending     
          1          127            0        16197           28     
          2           43            0        16285            4     
          3          300            0        14665           14     
          4            1            0         4775           22     
          5            0            0          461            0     
          6            0            0           40            0     
          7            0            0          519            0

With F11, the Entries area changes its display to show you how many of its in-use jobs are active (running), in a job queue (waiting to run), ended but sitting on an output queue with spooled files waiting to print, or sitting in a Job Log pending state. This can provide direction as to whether it will be worth it to compress the job tables to create more space or if you need to remove old jobs from the system.
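Adding up the F11 columns across all tables shows where the in-use entries are going. The following Python sketch totals the seven sample rows shown above. The names are illustrative, and the rows were keyed in by hand from the display.

```python
# (table, active, job queue, output queue, job log pending) from the F11 view.
F11_ROWS = [
    (1, 127, 0, 16197, 28),
    (2, 43, 0, 16285, 4),
    (3, 300, 0, 14665, 14),
    (4, 1, 0, 4775, 22),
    (5, 0, 0, 461, 0),
    (6, 0, 0, 40, 0),
    (7, 0, 0, 519, 0),
]

CATEGORIES = ("active", "job_queue", "output_queue", "job_log_pending")


def in_use_totals(rows):
    """Sum each in-use category across all job tables."""
    totals = dict.fromkeys(CATEGORIES, 0)
    for _table, *counts in rows:
        for name, count in zip(CATEGORIES, counts):
            totals[name] += count
    return totals


totals = in_use_totals(F11_ROWS)
grand_total = sum(totals.values())
for name, count in totals.items():
    print(f"{name:16} {count:7} {count / grand_total:.1%}")

# The category holding the most entries is the first place to look.
dominant = max(totals, key=totals.get)
print("largest category:", dominant)
```

For the sample system, almost all in-use entries belong to ended jobs with spooled files on an output queue, which points toward cleaning up old spooled output rather than worrying about active jobs.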

What’s Next?

This week, I’ve explained what the system job tables are, how they can affect the system, and how to detect the warning signals when the job tables are beginning to fill up. Next week, I’ll look at how you remove excessive jobs that are filling up the tables and how you maintain the tables to solve these problems. See you then.
