The Real World, Versus Real-World Load Testing
October 25, 2005 Robert Gast
We universally accept that it’s dangerous to run with sharp objects like scissors. People do it nevertheless, and occasionally get hurt. And while it’s risky to go live with newly written software before knowing how it will behave under heavy loads, lots of development shops take the chance. Ironically, load testing is most often left to customers, dealers, salesmen, and others who, in the course of transacting routine business, interact with skittish new systems at their own peril.

Such was the case for a small 100-year-old scissor maker that had based a century of success on quality and accountability. Two weeks after launching a new order entry system, the company realized that in the time between querying the amount of inventory in stock and confirming an order, available inventory could disappear, so ordered items could not be shipped. For a short time, customer service representatives fielded calls from discouraged customers, and confidence waned.

In effect, if functional and regression testing are one blade of a pair of scissors, load testing is the other. The functional and regression tests that development shops perform routinely verify only that new code works in isolation, and nothing more. With an increasing number of customers, dealers, partners, agents, and resellers coming face to face with self-service applications, the pressure is on to release zero-defect code. And because computing resources are expensive, it makes sense to exploit the full potential of existing hardware assets. Even if the new application you’ve written doesn’t fail under duress, it may require more processor, memory, and disk than is really necessary.

Ruth Willenborg, a WebSphere application server performance manager at IBM, writes in the IBM WebSphere Development Technical Journal, “Common scalability problems, including synchronization issues and database contention, do not surface until load tests are performed. Anticipate these types of problems and invest in a good load test environment, as close to the production environment as possible.”

The tab handed to the U.S. economy for poor software quality was $60 billion, according to a 2002 study conducted by the U.S. Department of Commerce’s National Institute of Standards and Technology (NIST). Framingham, Massachusetts, researcher IDC recently took a look at the demand for automated software quality (ASQ) tools, the product category that includes load testing tools, and reported that the ASQ market grew by double digits in 2004. Among the reasons, says IDC, are the need for improved software development productivity, higher software quality, and reduced time to market, along with the need to manage increasingly complex software and the business criticality of software.

The development trenches are filled with managers and programmers who constantly feel pressured to get new applications finished and into production. Hard as it may be to fit load testing into the development cycle, you must be able to predict the performance of software before it goes live, and understand how performance will degrade as loads increase. Despite talent, meticulous diligence, and dedication, new code will always surprise you.

Several questions relating to capacity, scalability, and performance should be considered, according to a white paper written by Stewart Bishop, an authority on automated software testing and product development manager for The Original Software Group, a developer of test automation software for the iSeries. These questions include:
Cut Risk

To limit your exposure to load-related problems, you should first choose between different automated load testing strategies. The stress testing method simulates multiple users simultaneously interacting with applications, and is most often used to identify performance bottlenecks caused by both software and hardware. By incrementally increasing repetitive application inputs and measuring performance degradation, you can pinpoint an application’s breakpoint. Stress testing does not account for variation in the time that elapses before the enter key is pressed, the so-called user think time.

In contrast, the method called real-world simulation measures the impact of ‘X’ users over a given period of time. It assumes that users have different habits and therefore varied think times. Real-world testing is more flexible and telling than stress testing because it illustrates how an application accommodates user loads that fluctuate. Stress testing will uncover load-related problems, but not under real-world circumstances. Basic performance issues will get flagged, but not ones caused by more complicated application input scenarios. With real-world testing, glitches will already have been dealt with by the time an application is finally placed into production.

Load Testing Tools

It might make sense to write your own test script that will, to a limited degree, simulate user activity. Homegrown test script development takes time and sometimes requires special tools and skills. To build your own load testing tool, you have to write a program that runs one thread for each simulated user session. This program needs to mimic some action performed by a production application. Each thread must submit a specific request and retrieve an answer. This action should be repeated in parallel a varying number of times, at the discretion of the tester. Since users have different work habits, the way requests are made must vary slightly.
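
To make the homegrown approach concrete, here is a minimal Python sketch of the steps just described: one thread per simulated user, each submitting a request, retrieving an answer, and pausing for a randomized think time. It is an illustration under stated assumptions, not a production tool. The function `fake_order_request` is a hypothetical stand-in for whatever call your application actually exposes, and the user counts, think-time range, and item numbers are arbitrary values chosen for the example.

```python
import random
import statistics
import threading
import time


def fake_order_request(item_id):
    """Hypothetical stand-in for one request to the application under test."""
    time.sleep(0.001)  # pretend the system needs about 1 ms to answer
    return {"item": item_id, "status": "ok"}


def simulated_user(user_id, requests_per_user, think_range, send, results, lock):
    """One simulated session: submit a request, record the latency, then 'think'."""
    rng = random.Random(user_id)  # per-user generator, so habits differ by user
    for _ in range(requests_per_user):
        started = time.perf_counter()
        answer = send(rng.randint(1, 100))
        elapsed = time.perf_counter() - started
        with lock:  # the results list is shared by all threads
            results.append((user_id, elapsed, answer["status"]))
        time.sleep(rng.uniform(*think_range))  # varied user think time


def run_load_test(users, requests_per_user, think_range=(0.0, 0.01),
                  send=fake_order_request):
    """Run one thread per simulated user and return every measurement taken."""
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(
            target=simulated_user,
            args=(uid, requests_per_user, think_range, send, results, lock),
        )
        for uid in range(users)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def summarize(results):
    """Reduce raw measurements to a few figures a tester can compare across runs."""
    latencies = [elapsed for _, elapsed, _ in results]
    return {
        "requests": len(results),
        "errors": sum(1 for _, _, status in results if status != "ok"),
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }


if __name__ == "__main__":
    for users in (5, 10, 20):  # step the load up, at the tester's discretion
        print(users, summarize(run_load_test(users, requests_per_user=5)))
```

Setting `think_range` to `(0.0, 0.0)` approximates the stress testing method, while a wider range moves the run toward real-world simulation. Comparing `requests` in the summary with `users * requests_per_user` is a simple check that the number of inputs submitted matches the number of answers received.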
A simulated database needs to be in place to work against, and data needs to be collected on how the system is responding. For the sake of accuracy, other parts of the system must be under realistic loads at the time these simulations are performed. This process must be repeated for each different test performed, and if the production application changes even minutely, the test must change as well. And once all of these bases are covered, you need to be sure that non-technical users can use it.

Mature load-testing tools, available from companies that specialize in such things, are designed to consume minimal resources and produce reams of analytical documentation. These tools often incorporate the capability to infinitely vary the way test transactions are submitted, and they can be used throughout the development process to catch problems as incremental progress is made, with minimal maintenance. Furthermore, many vendors offer assistance in assessing requirements prior to building test packs. Also, a good tool should be able to impose a variety of artificial background loads to simulate other unrelated active jobs, and monitor batch activity, server jobs, and green-screen activity.

Expectations need to be set properly on how much time it takes to get a test pack up and running. In short, the more experience and familiarity a technician has with a load testing tool, the less time it will take to prepare a test pack. Experienced technicians can put a simple test pack together in one day. Vendors offer training to help technicians get past initial hurdles and become productive more quickly.

Setting up a load-testing environment is sometimes perceived as expensive, because people, time, and enough hardware to simulate production environment load levels must be allocated to the task. By contrast, consider the cost of fixing a problem once it has occurred. In light of competitive forces, what is excellent customer service worth? What’s the cost of acquiring new business? For some companies, new customers just show up on their doorstep, but for most, finding one new client can cost tens of thousands of dollars.

One major US electronics retailer is repeatedly pushing the limits of some iSeries applications that support a new POS system because, when the biggest day of the year for retailers arrives (the day after Thanksgiving), it wants to be certain that the system can handle its customers’ transactions without fail. Solid load testing lets the IT department better align itself with the strategic goals of the organization by helping to ensure that critical systems remain available.

Sound load testing practices could have helped the scissor maker gauge the resiliency of its order entry application by stressing it with simulated transactions and comparing the input load with what the system actually processed (inputs should equal outputs). Instead, by tracking backwards through a cascade of problems, the company realized that in the interval between checking inventory availability and striking the enter key to commit the order, the quantity available could change if two or more people were accessing the same product. Also, the subroutine to direct users to a substitute product did not catch the change quickly enough because of excessive locking time.

Robert Gast has written about technology and business management since 1986. He is the managing partner of Chicago-area-based Evant Group, and can be reached at bobgast@evantgroup.com.