Guru: Health Indicators
January 28, 2019 Dawn May
In Configure Collection Services, I reviewed how you can improve the configuration of Collection Services. In this tip, I’m going to demonstrate how to easily review your Collection Services data and understand your partition’s performance characteristics using the Health Indicators.
Navigator for i has a rich set of performance tasks, one of which is Investigate Data, also referred to as the Performance Data Investigator (PDI). The base operating system allows you to manage and visualize Collection Services data. The Performance tasks (and PDI) also support other types of performance data, but those graphical features require installing the Performance Tools (5770-PT1) product.
With PDI, you can review your Collection Services data with hundreds of different charts. I recommend everyone become familiar with the Collection Services system overview charts. Regardless of the environment, if you understand your performance signature, you know what “normal” looks like when (not if) something goes wrong, which can make troubleshooting more focused.
If you are a beginner, getting started with PDI may seem intimidating. Never fear: IBM has provided an easy way to get started with the Health Indicators.
Health Indicators are found under the Investigate Data task in Navigator for i, as the following screen capture shows.
There are several different perspectives to select; the first, System Resources Health Indicators, is a good starting point to review overall partition health. You must select a Collection Services collection when you review the health indicators. The default is Most Recent, which is the currently active collection. Using the most recent collection, you can review the performance of your partition for the current day. You can select other collections to review performance characteristics of prior days.
The System Resources Health Indicators chart provides a summary of CPU, disk, memory, 5250 response time, and database performance characteristics. In the screen capture below, you can see CPU, disk, and database performance require further investigation as there are yellow and red sections in the chart.
This chart needs a brief explanation. Let’s assume the chart is using performance data from last Friday. On that day, I had a CPU issue 20 percent of the time, and it was a significant issue 2 percent of the time. The chart does not tell me when the CPU issues occurred, nor does it tell me what type of issue they were. All I know is I had a CPU issue of some sort on Friday. Likewise, I also had performance issues with disk and database.
The System Resources Health Indicators chart is the simplest way to tell if you have performance concerns.
From the System Resources Health Indicators chart, use the Actions drop-down to take your next steps. The following screen capture shows the drill-down options; you can investigate each metric further, or you can define the health indicator thresholds.
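To make that reading of the chart concrete, here is a minimal Python sketch of how a health indicator bar could be derived from interval data. It is an illustration only, not how PDI is implemented: the function name, the sample values, and the rule that a value at or above a threshold counts as exceeding it are all assumptions, and it treats the 20 percent figure as including the 2 percent of significant issues.

```python
from collections import Counter


def summarize_health(values, warning, action):
    """Return the percent of intervals in each zone (hypothetical helper).

    Assumption: a value at or above a threshold falls into that zone.
    """
    if not values:
        raise ValueError("no interval samples")
    zones = Counter()
    for value in values:
        if value >= action:
            zones["red"] += 1
        elif value >= warning:
            zones["yellow"] += 1
        else:
            zones["green"] += 1
    total = len(values)
    return {zone: 100.0 * zones[zone] / total
            for zone in ("green", "yellow", "red")}


# Made-up CPU utilization samples for one day: 50 collection intervals.
samples = [50.0] * 40 + [87.0] * 9 + [95.0] * 1

summary = summarize_health(samples, warning=85.0, action=90.0)
print(summary)  # {'green': 80.0, 'yellow': 18.0, 'red': 2.0}
```

Note that the summary keeps only the share of intervals in each zone. The timestamps are gone, which is why the chart cannot tell you when the issue occurred.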
The system-supplied defaults may be good for most environments, but probably not all. Every environment is different, and you may need to customize the threshold settings to get an accurate reflection of your overall system health. For example, the default health indicator threshold for CPU utilization is set to warning (yellow) at 85 percent and action (red) at 90 percent. But if your normal CPU utilization is 60 percent, you may want to customize the thresholds to put the warning level at 65 percent and the action level at 80 percent.
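The effect of customizing thresholds can be sketched in a few lines of Python. This is only an illustration using the numbers from the example above; the function, the sample utilization values, and the at-or-above comparison are assumptions, not the product's logic.

```python
def classify(utilization, warning, action):
    """Label one CPU utilization sample (hypothetical helper).

    Assumption: a value at or above a threshold triggers that level.
    """
    if utilization >= action:
        return "action"
    if utilization >= warning:
        return "warning"
    return "ok"


# Threshold pairs taken from the example in the text.
DEFAULT = {"warning": 85.0, "action": 90.0}
CUSTOM = {"warning": 65.0, "action": 80.0}

# Made-up samples for a partition that normally runs around 60 percent.
samples = [58.0, 61.0, 72.0, 83.0, 88.0]

default_view = [classify(u, **DEFAULT) for u in samples]
custom_view = [classify(u, **CUSTOM) for u in samples]

print(default_view)  # ['ok', 'ok', 'ok', 'ok', 'warning']
print(custom_view)   # ['ok', 'ok', 'warning', 'action', 'action']
```

With the defaults, a climb from 60 to 83 percent never leaves the green zone, even though it is well outside normal for this partition. The tighter thresholds flag it.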
Going back to the System Resources Health Indicators, if I take the CPU Health Indicators drill-down, I can get more information about my CPU issue. The screen capture below tells me my CPU issue is due to Jobs CPU Queueing Percent. CPU queuing generally indicates you need more CPU resources on your system. In this collection, however, I know that workload groups were being used to license only one core for DB2 Web Query. Since the workload was limited to one core, latency was introduced, which shows up as CPU queuing. If you are interested, you can read more on this topic in the blog article Workload Groups and Performance Considerations.
If I drill down to Disk Health Indicators, I see an issue with Average Disk Response Time and Average Disk Percent Busy. Again, these charts don’t tell me when, just that I had this disk performance concern. From Disk Health Indicators, I can drill down into additional PDI charts that can help me understand the issue in more detail.
Finally, if I were to drill down to Database Health Indicators, I find several more issues, as indicated in the screen capture below.
The drill-downs from the Database Health Indicators allow me to investigate the issue further, but not all database options are available unless the Performance Tools product (5770-PT1) is installed.
As you can see, it is very easy to get started with PDI and Collection Services data using the Health Indicators. Once you get started, you will find a tremendous amount of information available with a few clicks of the mouse. If you are interested in some additional reading, see the IBM Redpaper publication Accessing IBM i Health Indicators Using Performance Data Investigator. That document was written in 2014 but is still a good reference.