Admin Alert: When Journaling Slows Down Your System, And What To Do About It
May 14, 2014 Joe Hertvik
One interesting feature of the IBM i operating system is that under certain circumstances, journaling can actually slow down batch job performance. Here’s a real-life case study of how journaling can slow down processing and what tools IBM provides to handle the situation.

When Journaling Attacks

After migrating a production IBM i partition from Power 6 hardware to a new Power 7+ machine, a batch job that previously took five hours to complete was taking over 11 hours and had to be cancelled each time it ran. This job rebuilt an item table in the company’s ERP system. The IBM i partition ran a high availability package that used remote journaling to replicate data from the production IBM i box to a Capacity BackUp (CBU) system.

The machine administrator also identified two other jobs whose run times had increased significantly compared with their historical run times on the old Power 6 hardware. After investigation, they determined that all three jobs produced unusually intense disk activity: clearing, writing, and updating large ERP files.

The Culprit: Journaling

After opening a ticket with IBM, the shop determined that the intensive disk activity was lowering performance. Specifically, the high number of journal writes was slowing down the system. The high availability package was generating synchronous journal entry writes to a journal receiver whenever a file was updated. As the batch job performed database writes and updates, it repeatedly went into a journal wait state while additional journal entries (JEs) were written to the journal receiver used for replication. Because these were synchronous writes, the application had to wait until each journal entry was written before it could write or update the next record on disk. IBM and the customer concluded that journaling was slowing down processing on the system and raising the job run times.
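The journal wait state described above can be approximated with a toy model. The sketch below is only an illustration: the function name, the record count, and the millisecond figures are hypothetical and are not measurements from the case study. It assumes each journal write blocks the job for a fixed time, and that grouping several entries into one write (the idea behind the journal caching covered under What IBM Recommended) reduces the number of waits.

```python
import math


def estimated_job_seconds(records, work_ms_per_record, journal_write_ms,
                          entries_per_write=1):
    """Toy estimate of batch run time when journal writes block the job.

    entries_per_write=1 models one synchronous journal write per record.
    Larger values model entries being grouped before they go to disk.
    All inputs are hypothetical; this is not an IBM formula.
    """
    if entries_per_write < 1:
        raise ValueError("entries_per_write must be at least 1")
    # Number of times the job has to wait on a journal write.
    journal_writes = math.ceil(records / entries_per_write)
    total_ms = records * work_ms_per_record + journal_writes * journal_write_ms
    return total_ms / 1000.0


# Hypothetical workload: 10 million records, 0.5 ms of work per record,
# 2 ms spent waiting on each journal write.
synchronous = estimated_job_seconds(10_000_000, 0.5, 2.0)
grouped = estimated_job_seconds(10_000_000, 0.5, 2.0, entries_per_write=100)

print(f"one journal write per record: {synchronous / 3600:.1f} hours")
print(f"100 entries per journal write: {grouped / 3600:.1f} hours")
```

In this model the per-record wait dominates the run time, which matches the pattern IBM and the customer identified. Real journal behavior depends on disk hardware, receiver configuration, and the HA product, so treat the numbers as a way to reason about the effect, not as a prediction.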
Journaling slowdowns like this are fairly common for high availability packages that use remote journaling for CBU replication.

What IBM Recommended

IBM recommended that the customer purchase and install option 42 of the IBM i operating system, HA Journal Performance. Option 42 is available on i 6.1 and 7.1 under the following licensed program features.
Option 42 can also be loaded under the older i5/OS V5R4Mx operating system, but it’s unclear whether the product is still available for licensing to V5R4Mx shops. Option 42 is a relatively inexpensive chargeable IBM i licensed program option (costing somewhere between $1,000 and $10,000 depending on model and pricing) that is available on your installation media. In cases like this, where it’s a potential cure for a performance issue, HA Journal Performance can be installed on a customer system for a 70-day evaluation period.

HA Journal Performance caches (bundles) journal writes in main memory before writing them to disk. This improves performance because the batch program generating the journal entries no longer has to wait for each journal entry to be written to disk before processing the next record.

After installing option 42, our case study company activated journal caching for the journals that service the files used by the long-running batch jobs. They turned it on by running the following Change Journal (CHGJRN) command for every journal tracking the job’s file changes.

    CHGJRN JRN(jrnlib/journal) JRNCACHE(*YES)

Setting a journal’s Journal caching (JRNCACHE) parameter to *YES turns on journal caching. Note: for HA installations, check with your HA vendor for any additional configuration changes that need to be made.

Then they ran the problem job again and found a significant improvement in performance. Here are the run times for the job running on the old system, the job running on the new system after migration, and the job running on the new system after turning on journal caching.
The other two slow batch jobs showed similar improvements. Before activating journal caching, the jobs took twice as long on the new system as they did on the old system; after activation, the jobs took about a third as long as they had on the old system. This solved the issue. The company bought the package as a permanent fix to speed up disk-intensive batch jobs.

Using A Little Or A Lot Of Journal Caching

Besides replication, several other IBM i functions such as ODBC and SQL use journaling to track database changes. Journaling is a requirement for several of these functions and, as such, may affect performance for batch jobs with intense disk activity. If you have speed issues with batch jobs using journaled files, it’s worth trying out option 42 to see if it improves batch job performance, especially in a journaling environment such as a high availability setup.

You should also note that journal caching can be turned on selectively. If you’re generally happy with system performance but there are a few jobs that may benefit from journal caching, you can turn on caching just for the journals that are affected by those jobs. Journal caching is local to each journal and must be turned on journal by journal; it is not a global setting for the entire system.

What’s The Catch? There’s Gotta Be A Catch

For things like ODBC connections and SQL updates, there is little downside to turning on caching. The only effect is that journal entries from those jobs will be cached in memory and not immediately written to the journal. There are, however, two downsides to using journal caching.

The first downside occurs with applications that rely on remote journaling, such as high availability packages. Caching journal entries means that your journal entries are not immediately written to disk.
Because journal entries are not transmitted to a remote system until they are written to disk, journal caching slows down how quickly a remote CBU system is updated, which can also affect your high availability recovery point objective (RPO). With journal caching on, transactions will take longer to post to a CBU than they would without it. The real downside is the increased risk that, in the event of a catastrophic system failure where an IBM i partition is no longer available, more in-process transactions held in memory will be lost than if you had not enabled journal caching. IBM’s guidance is that “it is not recommended to use journal caching if it is unacceptable to lose even one recent change, as in the event of a system failure, where the contents of main memory are not preserved.” Keep this downside in mind before you turn on journal caching.

The second downside is that there are special considerations for journal caching when you use commitment control on your system. According to IBM i expert Gary Patterson, commitment control is set up to “… flush journal entries to disk, regardless of caching or bundle size.” Commitment control breaks journal caching by putting your job back into the journal wait state that journal caching is designed to avoid. Fortunately, there is a fix. You can turn on a “soft commit” for an individual job or for your entire system, which lets you continue using journal caching for batch jobs. See Gary Patterson’s article on soft commits for more information on enabling this feature.

The Bottom Line For Journal Caching

Be aware that there is both a benefit and a risk when using journal caching to speed up batch processing for journaled files. The benefit is that for I/O-intensive applications, it can significantly decrease processing time by eliminating synchronous journal entry posting.
The risk is that in the event of a system failure, more data held in memory will be lost before it is posted to a remote journal or a CBU. You may also have to adjust your job or system settings if you’re using commitment control on your partition. Journal caching can be a great benefit to your system, but you should be aware of the risks before turning it on.

Joe Hertvik is an IBM i subject matter expert (SME) and the owner of Hertvik Business Services, a service company that provides written marketing content and presentation services for the computer industry, including white papers, case studies, and other marketing material. Email Joe for a free quote for any upcoming projects. He also runs a data center for two companies outside Chicago, featuring multiple IBM i ERP systems. Joe is a contributing editor for IT Jungle and has written the Admin Alert column since 2002. Check out his blog, where he features practical information for tech users, at joehertvik.com.