Using i5/OS Performance Adjuster to Better Manage Memory
November 1, 2006 Doug Mewmaw
My boss has a saying that goes: “Do you want to get your brain surgery by a doctor that has done one operation, or do you want the operation done by a doctor that has done 1,000 surgeries?” He uses this analogy to give our customers peace of mind when we analyze their performance data. After all, in 2005 alone we did more than 3,000 capacity plans. As a result, we get the pleasure of helping customers understand their performance needs. Big shops, small shops: our company has seen it all with regard to performance. I say this not as a commercial for my company, but to establish that we not only know performance, we also understand how to analyze performance data efficiently. I think sometimes I see performance data in my sleep. (Insert “get a life” here.)
As someone who talks to a lot of people in the industry, I have come to the conclusion that performance management is not an easy concept for everyone. Memory analysis in particular is a performance component that drives most people bananas. In the old days, one company had a product that predicted the impact of adding memory to the system. We also had very specific guidelines given to us by IBM for System/3X and AS/400 systems that helped us manage the memory component of our systems. To this day, I still have these guidelines (now faded to yellow) pinned to my bulletin board.
What some people learned from experience, however, was that these guidelines didn’t work for all shops. The reason was simple: all applications are different. That is, 100 memory faults per second might be fine for company A, while the same 100 faults per second would bring company B’s system to its knees. Since not everyone is proficient at memory performance management, IBM created a performance process that helps manage the memory component on AS/400, iSeries, and System i5 servers. This process is called Performance Adjuster (QPFRADJ), and it has been a part of the operating system for years.
The IBM iSeries Information Center V5R3 Experience Report states: “The iSeries server has the ability to automatically manage the shared memory pools without any user interaction.” [Emphasis added.] However, after looking at tons of performance data, I would offer this analogy about IBM’s statement concerning Performance Adjuster: My car will go forward if I turn it on and press the accelerator. However, will it get me to my destination if I merely turn the car on? Of course not. I’m required to steer the car, add gas when needed, and do periodic maintenance and checkups. In other words, managing memory with Performance Adjuster requires a lot more than merely turning it on.
Let me explain. But before I do that, let’s be clear: it is not my goal to teach you about Performance Adjuster. You can get the documentation from IBM. I do, however, want to show you how important user intervention is when using Performance Adjuster.
Monitoring How QPFRADJ Is Affecting System Performance
Today, I’m constantly asked for guidelines that will help shops manage their memory performance component better. As I said before, IBM stopped giving out guidelines because they were not applicable to all types of environments. However, since machine pool faulting is so critical to the overall performance of the box, it is a metric that we must measure against a best practice guideline. The best practice guideline is to keep machine pool faulting under 10 faults per second. When I managed my iSeries environment in my previous job, my philosophy was to be a little more aggressive. That is, it was not uncommon for my boxes to be under 5 faults per second. You will have to determine what is acceptable for your location.
With that said, let’s look at the graph below, where Performance Adjuster is merely turned on and there is no user intervention.
Here are some observations:
1. This is a graph with two Y axes. The left Y axis data (shown in orange) shows the machine pool faulting rate, and the right Y axis data (shown in gray) shows the QPFRADJ memory movement throughout the day.
2. This machine has a high faulting rate. Between 16:00 and 23:45, notice that there are many intervals where the machine pool faulting rate is way over the recommended best practice guideline of 10 faults per second. During this period, the average is 20 faults per second.
3. This machine has an unstable machine pool memory allocation. Between 16:00 and 23:45, notice how QPFRADJ keeps moving memory in and out of the pool. In fact, the machine pool memory allocation is extremely unstable, fluctuating from 1.2 GB to 2.9 GB.
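The same observations can be checked numerically rather than by eye. The following is a minimal sketch in Python, assuming you have exported machine pool interval data (time, faults per second, pool size in GB) from whatever performance collection tool you use. The function name, the data layout, and the sample values are hypothetical; only the guideline of 10 faults per second comes from this article.

```python
from statistics import mean

# Best practice guideline cited in this article (faults per second).
GUIDELINE_FAULTS_PER_SEC = 10.0


def summarize_machine_pool(samples, guideline=GUIDELINE_FAULTS_PER_SEC):
    """Summarize machine pool faulting and memory stability.

    samples: list of (time_label, faults_per_sec, pool_size_gb) tuples,
    one per collection interval (hypothetical layout).
    """
    if not samples:
        raise ValueError("no samples supplied")

    faults = [f for _, f, _ in samples]
    sizes = [s for _, _, s in samples]
    # Intervals where the faulting rate exceeds the guideline.
    over = [t for t, f, _ in samples if f > guideline]

    return {
        "avg_faults_per_sec": mean(faults),
        "intervals_over_guideline": over,
        "pct_intervals_over": 100.0 * len(over) / len(samples),
        "min_pool_gb": min(sizes),
        "max_pool_gb": max(sizes),
        # A large swing suggests QPFRADJ keeps moving memory in and out.
        "pool_swing_gb": max(sizes) - min(sizes),
    }


# Made-up interval data for illustration only.
samples = [
    ("16:00", 8.0, 2.9),
    ("16:15", 25.0, 1.2),
    ("16:30", 32.0, 1.5),
    ("16:45", 15.0, 2.0),
]

summary = summarize_machine_pool(samples)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Passing a different `guideline` value (for example 5.0) lets you apply a more aggressive target if that is what your location has settled on.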
Since the machine pool faulting rate affects how efficiently the system tasks are processed, it’s imperative that there is enough memory in the pool to keep the faulting rate within the best practice guideline. In other words, you don’t want a system that is spending all its time simply trying to manage the OS and related system tasks. Your CIO expects you to create a stable environment so your company’s applications can run successfully. That’s why the machine pool faulting metric is so critical to your system. In this example, we need to tell Performance Adjuster not only that we need more memory in the machine pool, but also that we need to keep it there. This is done in two steps:
Remember, the goal is to have zero problems in the machine pool. Only then can your normal applications have a chance to run efficiently. In the screen shot below, notice an environment where the system administrator has enough memory in the machine pool. This not only creates a stable QPFRADJ environment (total memory in the pool is not fluctuating in huge movements), but also keeps the faulting rate within the best practice guideline.
Ensuring the Machine Pool is Set Up Appropriately
Once you understand how QPFRADJ is affecting the machine pool and you make the necessary tuning changes, you must measure the machine pool service level. That is, with any tuning exercise, you must prove the change had a positive effect. In this case, we must ensure the machine pool faulting rate is within the best practice guideline:
Do a Performance Adjuster MRI on Your System
In talking to people in the industry, a lot of performance tuners simply don’t understand the big picture of how Performance Adjuster is affecting their system throughout the day. A neat trick is to simply measure the total memory in each pool throughout the day (for all pools). By doing this Performance Adjuster MRI, you can see how QPFRADJ is moving memory around on the entire box. By understanding the memory component better, you can make educated performance tuning decisions.
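The arithmetic behind such an MRI is simple enough to sketch. The Python below is a minimal illustration, assuming you have already collected the size of every pool at each interval; the function names, the data layout, and all the numbers are hypothetical and are not taken from any IBM tool or from the system discussed in this article.

```python
def pool_shares(snapshot):
    """Return each pool's share of total memory as a percentage.

    snapshot: dict mapping pool name -> pool size (any consistent unit).
    """
    total = sum(snapshot.values())
    if total <= 0:
        raise ValueError("total memory must be positive")
    return {pool: 100.0 * size / total for pool, size in snapshot.items()}


def static_pools(snapshots):
    """Return the pools whose size never changed across all intervals.

    snapshots: dict mapping time label -> snapshot dict (see pool_shares).
    """
    sizes_seen = {}
    for snapshot in snapshots.values():
        for pool, size in snapshot.items():
            sizes_seen.setdefault(pool, set()).add(size)
    # A pool with exactly one distinct size was never adjusted.
    return sorted(pool for pool, sizes in sizes_seen.items() if len(sizes) == 1)


# Made-up pool sizes (in GB) for two intervals, for illustration only.
day = {
    "08:00": {"*MACHINE": 20, "*BASE": 30, "*INTERACT": 110, "*SPOOL": 40},
    "08:15": {"*MACHINE": 22, "*BASE": 18, "*INTERACT": 120, "*SPOOL": 40},
}

shares = pool_shares(day["08:15"])
for pool, pct in shares.items():
    print(f"{pool}: {pct:.1f}% of total memory")

print("Pools with no movement:", static_pools(day))
```

Plotting the shares for every interval of the day gives the kind of stacked view the MRI is after, and the list of static pools points at pools Performance Adjuster never touched.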
In this example graph, it’s obvious that pool 3 (which was *INTERACT) is consuming a majority of the total memory resources. The above screen shot shows that at 8:15 a.m., just under 62 percent of the total memory is in pool 3. Also notice that pool 5 and pool 6 have had no movement throughout the day.
Another View of the Performance Adjuster MRI Results
In our 8:15 a.m. example, we see exactly how Performance Adjuster allocated the total memory in each pool. Notice that the machine pool is using 11 percent of the total system memory. Pool 3 (which was *INTERACT) indeed had just under 62 percent of the total memory. This kind of analysis is critical to setting appropriate minimums and maximums within the WRKSHRPOOL command. Once you understand the big picture, you can get more granular and focus on the peak processing time on your system. Another best practice technique is to look at the actual total memory in the pool. The next graph shows that view.
This graph shows how much total memory is in each pool. The above screen shot shows that at 8:15 a.m., just over 119 GB of total memory is in pool 3.
Performance Adjuster (QPFRADJ): User Intervention Required!
Measuring the memory component of a system is not an easy task. You have to play detective and really do the due diligence to fully understand how Performance Adjuster is affecting memory on your iSeries or System i5 system. First, we must understand the big picture. The steps are:
Understanding the big picture first gives you a starting point to ensure Performance Adjuster is set up efficiently. Just as we measured the machine pool, we can apply the same methodology to our other shared pools. In my next article, I will teach you how to measure your core production applications and help you understand how to measure the memory component during your peak processing times.
Doug Mewmaw is a 25-year “jack of all trades” IT veteran who currently is director of Education & Analysis at Midrange Performance Group, an iSeries business partner that specializes in performance management and capacity planning. He can be reached at DMewmaw@mpginc.com.