Myths, Misconceptions Run Wild in World of High Availability
November 29, 2004 Dan Burger
The myths and misconceptions about high availability in the iSeries market make discussion of this topic a swamp. You can easily lose a boot in the deep mud, and if that’s the worst thing that happens to you, you’ll be lucky. Many iSeries shops lose more than a boot; they lose their way. Or they go far out of their way to find another route around the swamp in search of 24/7 reliability and more efficient ways to perform backup and recovery. Because different companies have different goals, and different vendors offer different solutions, OS/400 shops need to clearly understand the benefits of high availability. In the not too distant past, high availability was widely believed to be a solution only applicable to large enterprises (primarily banks and financial institutions) that had humongous investments in IT staff and equipment and, more important, needed to provide continuous access to their applications. (When you go to an ATM for cash, or to the Web to check your stock portfolio, you don’t want to wait for a batch job or a tape backup to finish running.) That perception was accurate as recently as five years ago, but the technology has changed, and the market has changed. Dramatically changed. In the early development stages, high availability software required high-performance AS/400 and iSeries boxes with the processing power to run it. Because the purpose of high availability is to mirror databases and business-critical applications, the software created extremely heavy workloads. Without big-horsepower servers, high availability software would bog down systems: jobs backed up, and complaints about slow systems reached all the way up to the CIO. Avoiding that required budgets that could absorb the purchase of additional processors, more disk arms, and other costly items. Some people think high availability is still only for Fortune 500-type companies, a fairytale land where pockets are deep and anything is possible. Not so.
Small and midsized businesses are the hotbed of HA activity. For most of these organizations, this does not mean a large clustered environment with several iSeries boxes. Most often it is a pair of machines: a source box (your production machine) and a target box (the one you hope you never have to use for production work, but hope to use for other work, like data warehousing or doing tape backups). That target server is, in many instances, a smaller box than the production machine, an entry-level iSeries, which is then equipped with the appropriate bandwidth and the high availability software to automate the failover from the source to the target machine. The reason why high availability is moving down into small and midsized businesses is the introduction of remote journaling technology, which IBM first included in OS/400 in 1999. A journal, as many people know, is a record of the transactions a system processes, written as the processing takes place. Remote journaling changed the playing field, opened the door to a number of new vendors, and brought the price down to a level where the average small or midsized business could get a good return on its investment in HA software. Not all high availability products make use of remote journaling, but it is far and away the most used technology in new HA implementations.
Because remote journaling has many advantages over local journaling (which was originally used on OS/400 servers when implementing HA solutions before 1999), and because IBM has promoted its benefits to the iSeries installed base, it is frequently talked about and is almost as frequently misunderstood. A quick explanation is that remote journaling writes data to the target server, as well as to the source (production) server. It does this as a function of the operating system, which makes it possible to send a copy of every journal entry to a journal receiver on a target server. (Technically, you could use remote journaling on a single physical machine with one logical partition as a source and another as a target, too.) Remote journaling notes the changes and writes a journal entry to the journal receiver on the source server, and then sends a copy to the journal receiver on the target. Local journaling writes only to the source server; getting the journal data and moving it to the target server is not a function of the operating system but the job of the third-party HA software running on the source box. For an in-depth explanation of journaling, including remote journaling, download the IBM Redbook Striving for Optimal Journal Performance on DB2 Universal Database for iSeries (PDF format). Remote journaling is part of the operating system, not of the vendors’ solutions, and it is simply a very fast transport mechanism. It runs at the same speed, no matter which third-party vendor’s product is being used. HA products are not a factor in the speed at which the remote journaling process takes place. These HA programs will, however, have an impact on how quickly and efficiently the entire replication process takes place. Whether or not high speed is a business objective, one should never underestimate the allure of speed as a sales topic.
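The difference between the two journaling styles described above can be pictured with a small toy model. This is only an illustrative sketch in Python, not OS/400 code: the class and function names (JournalReceiver, remote_journal_write, local_journal_write, ha_software_transfer) and the sample order entries are all hypothetical, and real journal receivers are far more than in-memory lists.

```python
from dataclasses import dataclass, field


@dataclass
class JournalReceiver:
    """Toy stand-in for a journal receiver: an append-only list of entries."""
    system: str
    entries: list = field(default_factory=list)


def remote_journal_write(change, source, target):
    """Remote journaling (modeled): the entry is written to the source
    receiver and a copy is sent to the target receiver in the same step."""
    source.entries.append(change)
    target.entries.append(change)


def local_journal_write(change, source):
    """Local journaling (modeled): the entry lands only on the source."""
    source.entries.append(change)


def ha_software_transfer(source, target):
    """With local journaling, separate HA software has to read the source
    receiver and move the entries the target has not yet received."""
    missing = source.entries[len(target.entries):]
    target.entries.extend(missing)
    return len(missing)


def demo():
    changes = ["ORDER 1001 added", "ORDER 1001 shipped"]  # made-up entries

    # Remote journaling: the target is current after every write.
    src_r, tgt_r = JournalReceiver("source"), JournalReceiver("target")
    for change in changes:
        remote_journal_write(change, src_r, tgt_r)

    # Local journaling: the target lags until the HA software moves the data.
    src_l, tgt_l = JournalReceiver("source"), JournalReceiver("target")
    for change in changes:
        local_journal_write(change, src_l)
    lag_before = len(src_l.entries) - len(tgt_l.entries)
    moved = ha_software_transfer(src_l, tgt_l)

    return (tgt_r.entries == src_r.entries, lag_before, moved,
            tgt_l.entries == src_l.entries)
```

In this model both approaches end with identical receivers on source and target; what differs is who does the transport and when the target catches up.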
Some HA sales people have been known to exaggerate performance claims, so much so that getting a second opinion on performance numbers is important. A myth is only a myth if left unchallenged. A related performance issue is the topic of role swapping, which is the change of primary databases and application software from the source iSeries to the target iSeries. There are several phases to a role swap, and figures pertaining to the process are often unrealistic and are seldom an apples-to-apples comparison. An important aspect of the role swapping procedure is advance testing. Pulling the trigger on a role swap is not without its preparation. It is good advice to test the system regularly to ensure that all the interfaces, devices, user profiles, and complete data are in order. The ease with which this testing takes place is considered as important as the speed at which the swap occurs after the “go” button is pushed. If the testing procedures are overly complex, the IT staff will be less inclined to use the testing, and the HA project will probably fail. Without performing role swaps, the value of the HA system is diminished greatly because the product is not being used to reduce downtime during such things as hardware or operating system upgrades. This type of use adds value to the HA system because otherwise it is mainly an insurance policy in case of a disaster–and one that you don’t know is good until it is too late to change. Ideally, customers would have identical source and target machines and role swap once a month to prove to their company that this HA stuff is working. This month’s source machine is next month’s target machine. So the time it takes to complete the role swap includes the time it takes to prepare, the brief time involved in executing the role swap, and the time involved with the verification that the swap went as anticipated. 
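The point that a role swap is more than the moment the button is pushed can be made with simple arithmetic. The sketch below adds up the three phases named above; the class name RoleSwapTiming and the minute values are invented purely for illustration and are not measurements from any vendor or customer.

```python
from dataclasses import dataclass


@dataclass
class RoleSwapTiming:
    """Minutes spent in each phase of one role swap (illustrative only)."""
    prepare_minutes: float
    execute_minutes: float
    verify_minutes: float

    def total_minutes(self):
        # The full cost of a swap is all three phases, not just execution.
        return self.prepare_minutes + self.execute_minutes + self.verify_minutes

    def execute_share(self):
        # Fraction of the total that a quoted "swap time" might describe
        # if it covers only the execution phase.
        return self.execute_minutes / self.total_minutes()


# Hypothetical numbers chosen for the example, not real benchmarks.
example = RoleSwapTiming(prepare_minutes=45, execute_minutes=5, verify_minutes=30)
print(example.total_minutes())   # 80
print(example.execute_share())   # 0.0625
```

With these made-up figures, a quoted five-minute swap would account for only about six percent of the elapsed time, which is why asking which phases a number covers matters when comparing claims.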
Out in the field, where the sales battles are fought, a good question to ask is whether the salesperson understands all the issues involved in the role swap procedure. Make sure you ask questions about all of the phases of completing a role swap, and don’t be fooled into thinking that it’s about as complicated as flipping on a light. During this process applications are being restarted, network attached devices are being switched, and users are coming back online. When making a high availability purchasing decision, it is important to define some basic goals. A good place to begin is deciding how much downtime you are willing to accept when recovering from planned or unplanned outages. How many seconds or hours are acceptable? In the small and midsized business portion of the iSeries market, many organizations do not require the highest of high availability, but without a realistic plan it’s easy to overbuild a system or to simply end up with more bells and whistles than you need. It is easier said than done, but if you focus on actual business needs, both today’s and tomorrow’s, the likelihood of overbuilding or underestimating what you need in an HA setup is reduced. For many companies, the differences in speed are not particularly relevant. Their goals are not real-time synchronization. Management of the HA process is a higher priority, and in this case control trumps speed. The administration aspect is another messy area where overstating and understating facts is common. The original HA systems were more complex than the current technology and, as a result, much more labor-intensive. But even today, the complexity of an HA setup and its administration requirements are going to change from one implementation to the next, depending on the customer’s requirements, which again relate to the acceptable amount of downtime in a business plan.
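One way to make the downtime question above concrete is to convert between a yearly downtime budget and an availability percentage. The following is a back-of-the-envelope sketch only; the function names are made up for this example, and it assumes a 365-day year of round-the-clock operation.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def availability_percent(downtime_hours_per_year):
    """Availability implied by a given number of down hours per year."""
    return 100.0 * (1 - downtime_hours_per_year / HOURS_PER_YEAR)


def downtime_hours_per_year(availability_pct):
    """Down hours per year permitted by a given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100.0)


# A 99.9 percent target leaves roughly 8.76 hours of downtime per year.
print(round(downtime_hours_per_year(99.9), 2))
```

Working the numbers in both directions can help a shop decide whether a goal stated in seconds or in hours is the one it actually needs.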
To a large degree, the growing use of high availability among small and midsized businesses in the iSeries market comes down to price and ease of use. Software prices do not vary widely, but they are directly related to how easily the products install, the amount of training necessary, and the costs involved in maintaining or upgrading the system as time goes on. In the next issue of The Four Hundred, the high availability discussion will take a look at the individual vendors, their products, their technology, and how it is being applied to the needs of iSeries businesses.