Guru: Configure Collection Services
November 12, 2018 Dawn May
Collection Services collects valuable information about your partition and the workloads you are running. While many think of Collection Services as performance data, it is really systems management data and at some point you will need that data. Collection Services helps you understand the overall workload running on your system and trends over time. It provides the ability to look backwards in time to understand “How did I get here?” and has the data to answer a wide variety of other questions.
Collection Services is on by default, so whether you know it or not, you are running Collection Services. Do not turn it off. If you turned it off, turn it back on. The default configuration for Collection Services, as shipped with the system, is good, but you can change it to make it even better.
Below I review the important configuration parameters for Collection Services. A couple of them involve environmental considerations that you need to take into account. The preferred interface for configuring Collection Services is the Navigator for i web console. You can also use the CFGPFRCOL command, but that command has grown more complex over the past few releases.
The following screen capture shows where you find the Configure Collection Services task within Navigator for i; it is part of the Performance Tasks.
The next screen capture shows the General tab from the Configure Collection Services task. You can see that I have numbered four key parameters.
The first parameter to consider is the collection interval; the default is 15 minutes. Fifteen minutes is an eternity in computer time. IBM originally selected this default because it was relatively safe, given the amount of disk space that Collection Services data consumes. Today most shops have plenty of disk space, so you should consider reducing the collection interval. IBM recommends five-minute intervals, and I agree with IBM’s recommendation. You will need about three times as much disk space as you do with 15-minute intervals, but this is generally a good trade-off. The system automatically expires collections as they age, so you don’t need to worry about accumulating huge amounts of data over time.
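The "three times" figure follows from simple arithmetic on the number of intervals recorded per day. Here is a minimal Python sketch of that arithmetic. It assumes disk use grows roughly in proportion to the number of intervals collected, which is a simplification, and the function name is my own for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def intervals_per_day(interval_minutes: float) -> int:
    """Number of collection intervals recorded in one 24-hour day."""
    return int(MINUTES_PER_DAY / interval_minutes)


default_count = intervals_per_day(15)      # shipped default interval
recommended_count = intervals_per_day(5)   # recommended interval

# Assumption: disk space scales roughly with the number of intervals.
growth_factor = recommended_count / default_count

print(default_count, recommended_count, growth_factor)
```

Moving from 15-minute to 5-minute intervals takes you from 96 to 288 intervals per day, which is where the factor of three comes from.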
The second parameter tells when to cycle the collection. Cycling closes out the current collection and starts a new one. The default is generally good, assuming you review your performance data on a daily basis. However, there are two major considerations for why you may want to change the cycle time and frequency.
- Cycle time of day. The default time to cycle the collection is midnight. If you have a critical application that runs across midnight, change the cycle time so that the collection does not cycle in the middle of that application. For example, if you have a batch application that runs from 11:00 PM to 1:00 AM, set the cycle time to before 11:00 PM or after 1:00 AM.
- Cycle frequency. The default is every 24 hours, and again, this is a good default with one collection per day. However, on large, busy systems, the collection can grow rather large. By cycling more than once a day, you have smaller collections to work with. The queries that run to analyze the data will run more quickly on smaller collections. The trade-off is that you will need to stitch them together to view a full day’s worth of data.
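Both considerations can be checked with a few lines of code before you change anything. The Python sketch below is only an illustration: the function names are hypothetical, the batch window is the 11:00 PM to 1:00 AM example from above, and I treat the window as inclusive at both ends so that only times strictly before the start or strictly after the end are considered safe.

```python
from datetime import time


def cycle_conflicts(cycle_at: time, batch_start: time, batch_end: time) -> bool:
    """True if the cycle time falls inside the batch window (inclusive).

    Handles windows that wrap past midnight, such as 23:00 to 01:00.
    """
    if batch_start <= batch_end:
        return batch_start <= cycle_at <= batch_end
    # Window wraps past midnight: late evening OR early morning.
    return cycle_at >= batch_start or cycle_at <= batch_end


def collections_per_day(cycle_hours: int) -> int:
    """How many collections one day is split into at this cycle frequency."""
    return 24 // cycle_hours


batch_start, batch_end = time(23, 0), time(1, 0)

midnight_conflicts = cycle_conflicts(time(0, 0), batch_start, batch_end)
ten_thirty_pm_conflicts = cycle_conflicts(time(22, 30), batch_start, batch_end)

print(midnight_conflicts, ten_thirty_pm_conflicts, collections_per_day(12))
```

With this window, the default midnight cycle lands in the middle of the batch run, while 10:30 PM does not. Cycling every 12 hours gives you two collections to stitch together for a full day.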
The third parameter to review is the option to create historical data. Historical data was introduced in the 7.3 release and is off by default, which is not a good default. I recommend that you turn on historical data collection. The system automatically manages historical data, summarizing it over time. Historical data lets you visualize important metrics and understand how your workload grows over time, and it will be very valuable before you upgrade hardware or roll out application changes. You should also ensure that you have the PM Agent running so you can extend the retention period for historical data.
Be sure to leave the fourth parameter, create database files during collection, set to the default of *YES. This allows you to investigate Collection Services data in near real-time. The Performance Data Investigator, which I’ll write about in the future, requires the data to be in the Db2 files for analysis.
There are many additional configuration parameters for Collection Services on the other tabs, but the defaults are good for most of them. You may want to review the settings in the Data Retention tab. Be sure to keep Collection Services standard data (that is, the data in the management collection object) on the system for at least 10 days, or longer if you have the disk space. You’ll also want to keep baseline collections from key timeframes for reference.
The example Configure Performance Collection command below sets the collection interval to five minutes, sets the cycle time to 11PM, cycles the collection every 12 hours, and creates historical data.
CFGPFRCOL INTERVAL(5.0) CYCTIME(230000) CYCITV(12) CRTPFRHST(*YES) CRTHSTDTL(*YES)
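If you script your configuration, note how the values map onto the command: the interval is given in minutes with a decimal, and the cycle time is written as HHMMSS, so 11PM becomes 230000. The Python sketch below only assembles the command text. The helper name is hypothetical, and the keyword names are copied from the command above rather than checked against the command documentation for your release, so prompt the command on your own system before relying on it.

```python
from datetime import time


def build_cfgpfrcol(interval_minutes: float, cycle_at: time, cycle_hours: int) -> str:
    """Assemble a CFGPFRCOL command string like the one shown above.

    Keyword names are taken from the example in this article; this
    function only formats text and does not run anything on the system.
    """
    cyctime = cycle_at.strftime("%H%M%S")  # 11PM -> "230000"
    return (
        f"CFGPFRCOL INTERVAL({float(interval_minutes)}) "
        f"CYCTIME({cyctime}) CYCITV({cycle_hours}) "
        "CRTPFRHST(*YES) CRTHSTDTL(*YES)"
    )


command = build_cfgpfrcol(5, time(23, 0), 12)
print(command)
```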
Now that you have improved your Collection Services configuration, you can use that data for a wide variety of purposes, which I’ll cover in future tips.