Business Continuity Planning Part 2: Disaster Without Warning
May 30, 2006 Mary Lou Roberts
April 18th marked the 100th anniversary of the Great San Francisco Earthquake, which was by far the most devastating earthquake ever to occur in North America and which resulted in approximately 3,000 deaths. Compared with that event, the 1994 Northridge, California, quake that claimed 60 lives looked like child’s play. And you might be thinking that earthquakes are not your problem, that they are something that happens mostly in California or in the Pacific Ocean. You would be wrong about that.

Scientists have been working for decades in an attempt to uncover some means of predicting when earthquakes will occur, but without much success. The U.S. Department of the Interior, through the U.S. Geological Survey (USGS), does offer a 24-hour earthquake forecast map, showing the probability of experiencing a quake for any particular day. And the National Earthquake Prediction Evaluation Council (NEPEC) has recently been re-established to advise the director of the USGS on earthquake prediction, forecasting, and hazard assessment. Despite some progress in locating fault lines and measuring and reporting the earth’s movements, most quakes still take us by surprise with deadly consequences. In fact, the University of California Riverside notes, “The USGS reports that earthquakes are one of the most costly natural hazards facing the nation, posing a significant risk to 75 million Americans in 39 states.”

Last week, I began a discussion of business continuity planning (BCP), looking at the plans some companies in the Gulf Coast and Atlantic seaboard of the U.S. have put into place in preparation for what is predicted to be another harsh hurricane season. While companies located in the Gulf States and on the eastern Atlantic seaboard have much to be concerned about from Mother Nature’s wind and storm wrath, they at least have the benefit of some advance warning of hurricanes and can take some precautions in the event that a violent storm is approaching.
For many other types of disasters–major power outages, terrorist attacks, tornadoes, and earthquakes, for example–there is no warning beyond the fact that, inevitably, they will occur sooner or later. Our job is to plan for that inevitability, as most companies in earthquake-prone regions have done.

Karl Storz, a global company that produces medical instruments and devices, has locations across North America, but its largest office, which houses its primary IT facility, is in Culver City, California. Its primary manufacturing facility is in Santa Barbara. Matt Butcher, the chief technology officer for Karl Storz’ North American region, reports that the company runs its SAP business applications on an iSeries 570. The primary server is located in Culver City, and replicated in Boston using Lakeview Technology’s MIMIX. The company has also recently implemented VoIP for all of North America, and the Culver City location acts as a hub for that.

According to Butcher, Karl Storz has many operating units that, until recently, have acted somewhat independently of each other, creating some difficulties in standardizing technology and a business continuity plan. Now, however, the company is moving toward working together as a single unit. But that means enhancing the disaster plan. “Each of the companies operated fairly independently, and each had its own systems. Once we moved to SAP, we realized we were all relying on a single system that is located in the earthquake zone,” he says.

At that point, the company looked more closely at the business continuity plan that had been in place, found it wanting, and began working on a new plan. To accomplish this, Butcher says that the company formed a team with representatives from IT, facilities, and human resources. This team is tasked with both revising and updating the business continuity plan on a quarterly basis. The team is also planning to engage an expert to provide input to the plan and to help Karl Storz test it.
Further, the company is looking at software packages that are available to make the documentation easier to maintain.

Pointing out some of the differences between different types of disasters, Butcher notes that, although it’s conceivable that an earthquake could wipe out both of the California IT locations, which are 100 miles apart, it’s not very likely. Unlike hurricanes, earthquakes do not tend to affect a very wide geographic area.

So the company recently tested a new plan using a real event when its Canadian location lost the use of its building for a while. “We were able to route the calls to Culver City and continue business,” Butcher says. “With the VoIP system, if we lose a location and lose contact with the LAN, each location can still operate locally. They may not be able to dial other locations, but they will be able to make and receive calls. If we lose Culver City, we can take that call center and easily move that somewhere else. We don’t have a completely redundant call system, but we have close to it.”

One real remaining area of concern to the company is the manufacturing operation itself. “To reestablish a medical device manufacturing location is not a simple task,” Butcher notes. “There hasn’t been an earthquake in that area for about 20 years, so we might be ripe for one. From that standpoint, we do have some concerns.”

To date, the plan involves backup and switchover for the enterprise SAP system and phone communications with the VoIP system. Butcher says that next the planning team will be looking at network storage and coming up with a storage area network to provide file redundancy. “That’s our roadmap. Is it perfect? No. But we’ll keep moving and building on it.
From an IT perspective, we’ve done pretty well, but from a BCP perspective, we have more to do in terms of making sure we get people into the right places so that we can start transacting business again as soon as possible.”

In fact, Karl Storz recently experienced a problem that was not accounted for in its plan and didn’t even involve harm or damage to the company’s location: A mudslide cut off the roads between the two locations. Butcher reports that he had to fly rather than drive to get to the other locations because the roads were inaccessible. Such are the lessons learned in the creation and maintenance of an ever-evolving business continuity plan.

Another iSeries user in the earthquake zone is Pharmavite, a manufacturing company that produces dietary supplements. Pharmavite produces 11.5 billion tablets, capsules, and soft gels annually, and makes more than 35,000 shipments per year to both domestic and international customers. The majority of its 900 employees are located in the Los Angeles area, with 38 of them working on the company’s IT staff.

Located in Northridge, California, Pharmavite has first-hand experience in the need for a well thought-out and tested business continuity plan. In 1994 the company headquarters was in Mission Hills, right next to Northridge, and its facility took a significant hit. Greg Krietemeyer, manager of technical services, also points out that “around here, people say that you can expect a significant quake about every 10 years. Since the last one happened in 1994, there’s a lot of talk that we’re overdue for another one. It’s not a matter of if, but when.”

Pharmavite runs an iSeries 570 as its enterprise server, and also has 50 to 60 Windows boxes that are used as domain controllers and print servers in ancillary functions. Currently, the company is implementing a new ERP software package: the J.D. Edwards suite, now owned by Oracle.
By the end of the year, all of Pharmavite’s business applications will be moved over to the new ERP system. Lakeview Technology’s MIMIX is used to replicate the data to another model 570 at a Sungard facility in Scottsdale, Arizona.

At the time that the 1994 earthquake hit, the company had been in the process of beginning a business continuity plan. However, the plan had not been tested and documented. The quake primarily affected the corporate office and, says Krietemeyer, “The building we were in was red-tagged [no one was allowed in] and remained unavailable for a few months. The staff that had been located in that facility had to be relocated to other facilities and then later to a temporary location that could house everyone. A lot of people who were with the company then are still with us, and they have delivered a lot of shared learning that is helping us deliver on our plan today.”

Pharmavite’s overall BCP process is broken into multiple areas. The Emergency Operations Center (EOC) team is composed of the executives in the company–the decision makers. The IT team focuses on the IT infrastructure. And the Emergency Response (ER) team, whose role it is to look for anyone who needs medical assistance, is trained on a yearly basis to deliver CPR and first aid.

Further, the company has a standing BCP team that meets on a bi-weekly basis to discuss issues and vulnerabilities–and it’s not just earthquakes that they plan for. Krietemeyer reports that the team is currently talking about avian flu as something that could potentially affect the organization and is determining what preparations the company can make in advance. Other topics of discussion include terrorist attacks, chemical spills, toxic fumes, and the possibility of an airplane crashing into the building, since its facility is very close to an airport.

Right now, Krietemeyer is working on a disaster analysis from a telecommunications perspective.
He is basing his analysis on three different scenarios. The first of these (which he calls “the black hole”) assumes an event such as an airplane crash into the building during a weekday, in which the entire building and the people are lost. The second scenario (for example, a chemical spill) assumes that the building is unavailable. The systems would be up and running, but the people can’t get to them. The third scenario (for example, a fire in the data center) assumes that the systems are gone but the people are all right. For each of these scenarios, the company will analyze what is necessary to continue to operate the business.

There are processes in place for each of the subteams, documenting what actions they should take in the event of a disaster, and regular testing occurs, says Krietemeyer. “From an IT perspective, we do at least two tests per year, and we try to test with the business community every other year, by actually rolling over to our DR site and letting the business people sign on and perform their regular work and then roll back to validate and make sure that everything was successful.”

Beyond that, Pharmavite runs true disaster simulations in which a large group of the company as a whole participates. They work with MLC & Associates, a consulting firm, which initially helped Pharmavite to develop its BCP. “They will shoot us an email,” Krietemeyer says, “and say something like, ‘An earthquake has just occurred in such and such an area. What do you do?’ All of the subteams then take action and say what they would do if that actually happened. Then they’ll throw another simulation at us–and then another.”

The most recent simulation started with an earthquake that then caused fires that were spreading toward some of the company’s buildings. Then rioting broke out.
“It really forced each of the subteams to look at decision making in terms of a situation that could really happen.”

Even the suppliers and customers are involved in these simulations and are informed in advance when a simulation is going to take place. (To date the simulations have been scheduled–but consideration is being given to doing spontaneous ones.) “Whenever we actually go through one of these tests,” says Krietemeyer, “I contact Sungard (the DR site) and Iron Mountain (offsite file storage) and other key vendors who will participate in the simulation. For example, I’ll call Iron Mountain and say, ‘This is Greg from Pharmavite. I need to have these tapes retrieved,’ and they will actually pull the tapes and put them on a truck and take them to the location we specify.”

Does this level of business continuity planning sound like a lot of work? It is. Krietemeyer offers a rough estimate that Pharmavite dedicates approximately 5,000 person-hours a year to maintain and update the plan. And it will have additional work to do at the end of this year when the new ERP implementation is complete, performing a new business impact analysis to look at how the new systems will impact the company. But Krietemeyer maintains it’s worth it. “Having gone through the Northridge earthquake in 1994, this company is very committed to BCP, and the executive offices are very supportive.”

Even with all this work, however, there are still areas of vulnerability. Like Butcher at Karl Storz, Krietemeyer believes that the company’s biggest area of vulnerability is from a manufacturing, packaging, and distribution perspective. “We have put processes in place to work with other vendors and suppliers in the event of a disaster, but the process for doing that is not going to be as easy as from an IT perspective, where we are fully replicated to another site.
The timeframe for getting us back up and running is going to be longer if it’s non-IT impact.”

For any companies that have not yet developed and tested and maintained a business continuity plan, Krietemeyer offers some advice: “Get started. Make it a priority. Make sure you have management support. Everybody assumes it’s never going to happen to me. But no matter where you are, something will eventually happen and you have to plan for it. And it’s a good idea to get an outside firm involved. If you’ve never done this, you don’t know what you don’t know.”