Turning a System i into a Time Machine: Nippon Express and CCSS Show How It’s Done
April 22, 2008 Greer Hahn
As the spectre of recession begins its inevitable and indiscriminate creep into boardrooms across the world, CEOs are scanning their organizations for fat they can trim to see them through the tough times. So how do IT managers justify and protect their existing budgets and prove that their System i environment can punch above its weight? Simple: they turn their System i into a time machine.
Within the heart of any logistics organization beats a ticking clock; to call it a time-sensitive industry would be an understatement. Nippon Express (USA), part of the global logistics heavyweight Nippon Express, has built an enviable reputation for reliable and competitive land, sea, and air transportation and distribution services. Based in New York, the company serves thousands of customers across the United States. For Nippon, the clock is always ticking in line with customer expectations; increasing efficiency lets the company raise those expectations higher still while also improving its own profit margins. Helping it meet these demands are two IBM System i 825s that directly support the business applications and productivity of more than 1,500 users.
Nippon already runs a ‘trim’ systems management team and, like many System i shops, that team was struggling to meet the demands placed on it. The nature of the business means the core team is spread from coast to coast, working out of New York, Chicago, and San Francisco. The business operates seven days a week, from 6 a.m. EST to 10 p.m. PST. The stretched team not only had to ensure maximum availability and performance during these hours, but also had to cover any out-of-hours issues among themselves. Shin Nakamura, systems analyst at Nippon, explains the central issue they faced in their System i environment.
“We have great processing power and reliability with the 825 boxes, but without the right systems management approach, it was clear that we were not passing those benefits directly on to our user community and, in turn, our customers,” he says. “We needed to change our approach and employ the tools that would help us do more with fewer resources. In short, we needed more time! That’s the challenge we presented to CCSS, and together, that’s exactly what we’ve achieved.”
Proactive Monitoring
CCSS is an IBM System i systems management software specialist. It recognized that, while the problems Nippon was experiencing were not unique among System i environments, they had become a barrier to increasing efficiency in the organization and therefore had a direct impact on profitability. To resolve these problems, CCSS recommended that Nippon install its three major solutions: QSystem Monitor, QMessage Monitor, and QRemote Control. Doing so would also help Nippon implement a new approach to managing its systems, one consistent with its time-sensitive needs.
Before installing the solutions, Nippon’s team was burdened with the laborious daily task of logging in to every partition and checking that everything was running properly. The team used a simple checklist kept in a Word document. Three times a day, an operator would work through the checklist and then e-mail it to a colleague for checking. If the operator was busy with an urgent task, the checklist would either be completed without actually looking at the system or not completed at all. The result was sporadic and inaccurate monitoring, and many issues were discovered only after calls from internal users or even external customers.
QSystem Monitor, a performance monitoring and reporting solution, has helped the team move from reactive to proactive monitoring.
Now all team members have a real-time, centralized view of their critical performance parameters across all partitions. With system visibility now at 100 percent, the team is able to deliver twice as much with the same ISD resources. What’s more, they have been able to ditch the time-consuming checklists in favor of automated monitoring and alerts that warn them when a parameter threshold has been breached or a status has changed. This leaves the team free to concentrate on other tasks, knowing they need only attend to the system when an issue is impending. Catching issues ahead of time like this effectively resets the clock in their favor. Users are no longer the ones reporting problems, productivity is not impacted, and the organization is getting the most efficiency out of its systems.
Disk space monitoring allowed the team to pinpoint wasted space, whether from excessive deleted records or unnecessarily large files, so that it could be easily reclaimed. As a result, Nippon was able to remove millions of deleted records and free up valuable disk resources for better use. Distribution queue monitoring has also brought significant efficiency gains: an 84 percent improvement in time saved at the branches and another 84 percent improvement in the time taken to receive DI data. The team can now easily share the results of the systems’ performance turnaround with upper management. This gives managers the same degree of visibility that operators enjoy and offers proof-positive, easily understood verification of virtually any systems management issue, whether real-time or historical.
Resolving the Application Situation
Nippon’s business applications are the hub of its productivity, allowing users to log, track, and manage each scheduled delivery for customers. With the user community stretching across 67 branch locations serving 55 cities in the US, Canada, and Mexico, it is crucial that these applications run efficiently.
Nakamura explains the chain of dependence between users, the applications, and their System i machines. “In a worst-case scenario, if our system were to fail during weekday off-hours or on weekends, our business operation would effectively stop. The impact on other areas, such as the air transportation business, would be extreme. Even in a less serious situation, where an application is held up waiting for a programmer to answer a message, users are impacted, and that means, to varying degrees, lost productivity and revenue. These are the situations we needed to avoid.”
Programmers had no easy or fast way of responding to urgent system messages relating to applications. When an issue occurred, operators first had to find the message on their own schedule. They then copied the message into an e-mail and sent it to the programmers to resolve. Because there was no instant way of determining which problem message belonged to which programming team, every team received the same e-mail. All programmers were therefore disturbed and had to check every message to see whether it fell within their area of responsibility, which wasted a lot of their time.
Since Nippon implemented QMessage Monitor, the operator no longer acts as the middleman. Automated monitoring, filtering, and escalation procedures ensure that urgent messages are flagged for immediate attention and sent directly to the person responsible for resolving them. Operators never need to hunt manually for problem messages, and programmers can be more productive without constant interruptions about jobs that are not relevant to them. Users also benefit, because application messages that could stop them working if left unanswered are now answered immediately. Like QSystem Monitor, this allows the team at Nippon to stay one step ahead of problems.
This proactive approach means operators are free to work on more important tasks instead of spending their time on painstaking manual monitoring. Nakamura says, “We have a real-time view of our systems now, both performance and messages. This alone has reduced downtime and saved our team countless hours that were spent on manual monitoring. We’ve calculated that our operators now spend 87 percent less time monitoring the system; that’s an incredible gain. For us, saving time is saving money, so it’s been very worthwhile.”
For Every Inaction, There is a Reaction
Nippon creates tape backups every day. If a backup fails to run, system data, including important customer transactions, could be lost, which could delay shipments. Nippon now uses QMessage Monitor’s event monitoring feature to alert managers when an expected event, in this case the daily tape backup, has not occurred as it should have. Out of hours, managers receive a text message alerting them to the problem and can arrange for the backup to be rerun.
Nippon uses 12 separate escalations to 12 different groups. During office hours, each message or problem is routed instantly and precisely to the e-mail inbox of the person responsible. If the message remains unanswered, it is passed on to the next person responsible, and so on until it is resolved. This gives more accountability for each issue and each person. Outside office hours, 12 additional escalations send messages as SMS text messages directly to the mobile phone of the person responsible. All urgent messages have the potential to impact business operations if they are not resolved quickly. “We recently had a hardware failure message that was immediately identified by QMM, and we were able to resolve it,” Nakamura says.
“In the past, given the same circumstances, there would have been no way to respond as quickly, and we would certainly have experienced downtime as a result. We expected an ROI payback on the CCSS solutions within 11 months, but after seven months, we already have substantial measurable results. We’re very pleased.”
QRemote Control completes the systems management installation. It gives Nippon a fast and effective means of responding to out-of-hours issues without using laptops to connect to the system. This has virtually eliminated the team’s weekend monitoring hours. Urgent messages are now sent to managers, who can respond by running commands and programs directly from their mobile devices. Routing can be based on the calendar availability of authorized personnel. The burden of being “on call” is greatly reduced, and managers are free to be anywhere without their laptops and without neglecting urgent system issues. This has proven valuable for application messages too, because the input and processing of data is no longer held up by application messages awaiting a response.
By taking a proactive approach to systems management, the company has made significant savings in man-hours and potential downtime across a number of areas. These far-reaching benefits extend not only to the team that manages the systems, but also to its user community and customers. Much like the company itself, Nippon’s lean and highly productive network now looks fit to outpace any economic uncertainty and keep on delivering.
Greer Hahn is a freelance IT writer and can be contacted at greerhahn@gmail.com