Admin Alert: Making Educated Guesses on CPU Utilization

January 9, 2008 Joe Hertvik

In previous issues, I discussed how to activate and deactivate trial Capacity on Demand (CoD) processors on a System i 550 box. Once the trial ends, however, you have to decide whether the additional capacity helped system performance and whether your organization should permanently activate those processors. To help with that decision, I’ll demonstrate a rough method for comparing before and after CPU performance in a trial CoD situation.

The Situation

Continuing with the case study discussed in my earlier articles, I activated an additional processor on a partition that was going to experience a sharp increase in demand during the Christmas season. The trial CoD period lasted one month, and during that time we increased the number of active processors for that partition from two to three (out of a possible four).

Our unconfirmed feeling was that the additional processor did its job, as there were no delays in processing and all orders were filled according to customer specifications. However, we needed to quantify the results with harder numbers that reflected what happened while the third processor was activated and what would have happened if we had continued to run without the additional processor. To do this, I wanted to present management with the following pieces of information to determine the processor’s effectiveness in servicing high workload during a busy period.

What was the average CPU utilization after adding an extra processor during the trial period?
What was the average CPU utilization when using three processors during the busiest day of the trial period?
What would have been the average CPU utilization for the entire trial period if we had not added the extra processor?
What would have been the average CPU utilization during the busiest day of the trial period if we had not added the extra processor?

The first two numbers can be obtained through the performance analysis techniques I described in the first article of my series, while the last two numbers can be mathematically derived from the first two. With this information, it’s possible to come up with comparison numbers that show management how adding an extra CPU helped processing and what would have happened if the extra processor had not been added to the system. Once you know the net effect of using an extra CPU on the system, it’s easier to decide whether it would be worth reactivating the CPU on a permanent basis.

What’s At Stake?

The balance here is between the cost of purchasing or renting an additional CPU versus the effect on system processing of not having the additional processor. Without revealing exact prices, permanently adding a CPU to an i 550 machine can be expensive. Purchasing an additional CPU for a System i runs in the tens of thousands of dollars, while renting a CPU on a daily basis could easily cost several hundred dollars a day. So quantifying the results is not a trivial problem, and knowing how to do apples-to-apples comparisons can be helpful in trying to make this decision.

Measuring Apples-to-Apples

In this case, the only effective way to evaluate the extra CPU was on an apples-to-apples basis, comparing results for the same workload with and without the additional CPU. This means that I need to obtain CPU utilization results for the holiday period using three processors and then derive what the results would have been during that time if the partition had been running with only two processors. That way, I can quantify the net result of adding an additional processor.

To perform this comparison, I need two things: 1) CPU utilization data for the busy period; and 2) a conversion formula for extrapolating what the results would have been if we had been running with two processors during that time instead of three.

To get performance data for the holiday season, I went back to my business partner and asked them to calculate the average CPU utilization during my trial CoD period. I also asked them to show me what the average CPU utilization was on the busiest day of the season.

From that performance data analysis, I retrieved the following CPU utilization rates for the busy period:

The average peak CPU utilization during that time was 65 percent. This means that, on average, the three CPUs were busy up to 65 percent of the time.
On the heaviest traffic day during the busy period, the average peak utilization was 85 percent, meaning that the CPUs were busy up to 85 percent of the time.

At this point, I had a fairly reasonable idea of how the three processors would work under a heavy workload. The next task was to see how the utilization would have changed if the system were using its permanent configuration of two processors during that same period.

A Word About Average and Peak CPU Utilization

Before I continue, it’s important to understand what the term average CPU utilization means. First, there is no such thing as a processor being 65 percent busy. Processors are either in use (100 percent busy) or they are not in use (0 percent busy). CPU utilization can only be understood by looking at utilization measurements over time. So performance analysis looks at the average CPU utilization over a specific time period and calculates a probability that the CPU will be busy at any one point during that time. This means that when you say that your average CPU utilization was 65 percent over 30 days, it actually means that 65 times out of every 100 intervals, your CPUs will be busy. For our busiest day with an average CPU utilization of 85 percent, we can statistically count on the CPUs being busy 85 times out of every 100 intervals sampled.
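To make the sampling idea concrete, here is a minimal Python sketch. The sample data is invented for illustration (65 busy intervals out of 100) and the function name is hypothetical; it is not meant to show how the System i performance tools actually collect their data.

```python
def utilization_from_samples(samples):
    """Return the fraction of sampling intervals in which the CPU was busy.

    Each sample is True (busy) or False (idle).
    """
    if not samples:
        raise ValueError("at least one sample is required")
    # True counts as 1 and False as 0, so the sum is the number of busy samples.
    return sum(samples) / len(samples)


# Invented data: 65 busy intervals and 35 idle intervals.
samples = [True] * 65 + [False] * 35
print(f"Average CPU utilization: {utilization_from_samples(samples):.0%}")
```

No single sample is ever "65 percent busy"; the 65 percent figure only appears once the busy and idle samples are averaged together.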

The second point to note is that since we are dealing with probabilities and averages, it is possible to calculate CPU utilization rates over 100 percent. It’s not a desirable situation: it means that the CPUs have more work than they can process on average, and most work will sit in a queue or in a subsystem for a fairly long time before it can be completed. Your system will be CPU bound at this point.

Converting the Averages

So we’re looking at the average peak CPU utilization for the system with and without the third processor. We already know that when we added the extra CPU, the peak (highest) average CPU utilizations during the trial CoD period and the busiest day within that period were 65 percent and 85 percent, respectively. What we need to find out is what those averages would have been if we had run the system with only two processors during those times.

Fortunately, I discovered the following IBM formula for calculating average CPU utilization over a specific time period:

((CPU time / elapsed time) / number of processors) = average CPU utilization
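As a rough illustration, the formula translates directly into a small Python function. The function and parameter names are my own, and the sample numbers in the usage line are hypothetical rather than taken from the case study.

```python
def average_cpu_utilization(cpu_time, elapsed_time, processors):
    """Average CPU utilization as a fraction (1.0 means 100 percent).

    Implements ((CPU time / elapsed time) / number of processors).
    cpu_time and elapsed_time must be in the same unit (seconds, for example).
    """
    if elapsed_time <= 0:
        raise ValueError("elapsed_time must be positive")
    if processors <= 0:
        raise ValueError("processors must be positive")
    return (cpu_time / elapsed_time) / processors


# Hypothetical example: 150 CPU-seconds used in 100 elapsed seconds on 2 processors.
print(average_cpu_utilization(150, 100, 2))  # prints 0.75
```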

Understanding this calculation, I can convert my average CPU utilization rates from using three processors to using two processors. I do this by performing the following two-step process:

Solve the equation for all variables when using three processors and average CPU utilizations of 65 percent and 85 percent.
Change the number of processors in the formula from three CPUs to two and re-solve the equation for two processors.

By doing this, I can determine what my average CPU utilization would have been during the busy periods had I been running my partition with two processors instead of three.
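The two-step process can be sketched in Python as follows. The function names are hypothetical, and the sketch rests on the same simplifying assumption as the method itself: the workload consumes the same amount of CPU time no matter how many processors are active.

```python
def cpu_time_for(utilization, processors, elapsed_time=100.0):
    """Step 1: solve the formula for CPU time, given a known utilization."""
    return utilization * processors * elapsed_time


def utilization_with(cpu_time, processors, elapsed_time=100.0):
    """Step 2: re-solve the formula for a different processor count."""
    return (cpu_time / elapsed_time) / processors


def convert_utilization(utilization, old_processors, new_processors,
                        elapsed_time=100.0):
    """Convert an average utilization from one processor count to another.

    Assumes the same CPU time is consumed under both configurations.
    """
    cpu_time = cpu_time_for(utilization, old_processors, elapsed_time)
    return utilization_with(cpu_time, new_processors, elapsed_time)


# Hypothetical example: 50 percent busy on four processors, converted to two.
print(convert_utilization(0.50, old_processors=4, new_processors=2))  # prints 1.0
```

The elapsed_time argument defaults to 100 seconds purely for convenience; any positive value gives the same converted utilization.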

And the rest is (relatively) simple math.

Running the Numbers For the 30-day Average

Using basic algebra, I can solve the equation for my 30-day average CPU utilization of 65 percent by plugging the following values into the equation:

CPU utilization = .65 (65 %)
Number of processors = 3
Elapsed time = 100 seconds

The CPU utilization and number of processors were obtained from my actual values. Since I’m dealing with averages and I’m just trying to solve the equation for substitution purposes, it doesn’t matter what value I plug in for elapsed time, as long as I use the same value when I later re-solve the equation for two processors. So to make the math easier, I’ll use a simple elapsed time of 100 seconds. The unknown value in the equation is the CPU time, which is the last value I need to solve the equation.
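The reason the elapsed time can be arbitrary is that it cancels out when the same CPU time is reused with a different processor count. As a sketch, write U_n for the average utilization on n processors, T_cpu for the CPU time, and T_e for the elapsed time (symbols introduced here only for illustration), and assume the workload consumes the same CPU time under either configuration:

```latex
U_3 = \frac{T_{cpu} / T_e}{3}
\quad\Longrightarrow\quad
T_{cpu} = 3 \, U_3 \, T_e

U_2 = \frac{T_{cpu} / T_e}{2}
    = \frac{3 \, U_3 \, T_e}{2 \, T_e}
    = \frac{3}{2} \, U_3
```

In other words, under that assumption the two-processor utilization is simply the three-processor utilization multiplied by 3/2, whatever elapsed time is chosen.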

Once I plug these numbers in, my equation now looks like this:

((CPU time / 100) / 3) = .65

To solve the equation for CPU time, I rearrange it in the following manner:

CPU time / 100 = (.65 * 3)
CPU time = (.65 * 3) * 100
CPU time = 195

And once I know what the CPU time is for these particular values, my solved equation looks like this:

((195 / 100) / 3) = .65

Having solved one instance of the equation for 65 percent utilization, I can now calculate what the utilization would have been using the same amount of CPU time when using two instead of three processors. I do that simply by using the same equation, but this time I calculate the answer by dividing by two processors instead of by three processors.

((195 / 100) / 2) = .975

What this tells me is that under the same processing circumstances that produced a 65 percent peak average CPU utilization for three processors, my CPU utilization would have jumped to 97.5 percent if the system had been running with two processors under the same workload. With two processors, the CPUs would have (on average) been busy 97.5 times out of 100 sampling intervals. So here I can demonstrate that adding the extra CPU added significant value to the system by reducing the CPU utilization rate.

Running the Numbers For the Busiest Day of the Period

Once I have the basic technique worked out, I can also determine what would have happened if I were running two processors on the busiest day of the holiday season. As I did above, I first solve the equation for an average peak CPU utilization of 85 percent with three processors. To do this, I use the following variables:

CPU utilization = .85 (85 %)
Number of processors = 3
Elapsed time = 100 seconds

Which allows me to solve the equation in the following way:

((CPU time / 100 ) / 3) = .85
CPU time / 100 = (.85 * 3)
CPU time = (.85 * 3) * 100
CPU time = 255

And the solved equation would read:

((255 / 100) / 3) = .85

To convert this equation for two processors, I substitute the number 2 for the number 3 and get the following answer for my new average CPU utilization:

((255 / 100) / 2) = 1.275

Which means that had I been using two processors instead of three on the busiest day of my trial CoD period, my peak average CPU utilization would have been 127.5 percent instead of 85 percent. This means that the processors would have always been busy, with much more work than they could handle at any one time. In a situation like this, the system would have been processor bound and it would have taken a very long time to complete any work at all.

Apples-to-Apples Again

By using these rough calculations, I can show my results to management and illustrate what the effect of adding the third processor to my partition has been. Management will be able to understand that the extra processor had a beneficial effect on the system. It can help them make an intelligent decision on whether to permanently add the third processor to the partition or whether to rent additional processor capability through On/Off Activation, Reserve CoD, or Utility CoD. The key lies in being able to understand what the net effect of the change is.
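One possible way to lay these results out for management is a small side-by-side report. The Python sketch below uses the two measured values from this article (65 percent and 85 percent on three processors); the function names and report layout are my own invention, and the conversion assumes the workload would have consumed the same CPU time on two processors.

```python
def convert_utilization(utilization, old_processors, new_processors):
    """Rescale an average utilization to a different processor count."""
    return utilization * old_processors / new_processors


def build_report(measurements, measured_processors=3, baseline_processors=2):
    """Return (label, measured, projected, over_capacity) for each measurement."""
    rows = []
    for label, measured in measurements.items():
        projected = convert_utilization(
            measured, measured_processors, baseline_processors)
        # A projection above 1.0 means more work than the CPUs could process.
        rows.append((label, measured, projected, projected > 1.0))
    return rows


measurements = {"30-day average": 0.65, "Busiest day": 0.85}

print(f"{'Period':<16}{'3 CPUs':>8}{'2 CPUs':>8}")
for label, measured, projected, over in build_report(measurements):
    note = "  <-- CPU bound" if over else ""
    print(f"{label:<16}{measured:>8.1%}{projected:>8.1%}{note}")
```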

These calculations are rough, but by using them you can create an apples-to-apples comparison that can make it easier to understand what (if any) beneficial effects adding an extra processor to your system can produce. In addition to using this technique to look back at how adding an extra processor helped the system after a trial CoD period, you can also reverse the calculations and use them to predict the future. Specifically, you can use the equation to make an educated guess as to how adding an extra processor will help a system that is currently having problems.
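As a sketch of that forward-looking use, the Python below estimates the utilization after adding processors and the number of processors needed to get under a target utilization. The function names, the 95 percent starting point, and the 70 percent target are all hypothetical, and the estimate assumes the workload consumes the same CPU time regardless of processor count, so treat the output as a rough guess rather than a sizing guarantee.

```python
import math


def predict_utilization(current_utilization, current_processors,
                        added_processors=1):
    """Estimate average utilization after adding processors."""
    new_processors = current_processors + added_processors
    return current_utilization * current_processors / new_processors


def processors_needed(current_utilization, current_processors,
                      target_utilization):
    """Estimate the smallest processor count that meets the target utilization."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    # Total demand expressed as a number of fully busy processors.
    demand = current_utilization * current_processors
    # The small epsilon keeps floating-point noise from rounding up an exact fit.
    return math.ceil(demand / target_utilization - 1e-9)


# Hypothetical partition: two processors running at 95 percent.
print(f"With one more CPU: {predict_utilization(0.95, 2):.1%}")
print(f"CPUs needed for a 70% target: {processors_needed(0.95, 2, 0.70)}")
```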