Lakeview Touts Customer Win at Competitor’s Expense
May 22, 2007 Alex Woodie
Lakeview Technology is using the bad i5/OS high availability experience of a Nevada casino to tout its own product, MIMIX, and rub dirt in the face of its longtime competitor, Vision Solutions, which now owns and sells the popular Echo2 high availability software. As the story unfolds, it will become apparent that the high availability industry still has a ways to go on two fronts: making HA software more automatic and self-managing, and separating HA fact from fiction.

Black Gaming Operations is a hotel and casino business based in Mesquite, Nevada, about 100 miles northeast of Las Vegas. The company operates three properties–Casa Blanca, Oasis, and Virgin River–encompassing nearly 2,000 hotel rooms, 1,500 slot machines, half a dozen restaurants, and three golf courses. To manage these assets, it relies heavily on IBM’s System i server and industry-standard software, including Agilysys’s Lodging Management System (LMS) and Materials Management System (MMS), Infor’s Infinium Financials, and Kronos’ iSeries Central time and attendance package.

With applications for three properties residing in one central location, Black Gaming has a powerful incentive to minimize server downtime. The company installed iTera’s Echo2 high availability software in March 2006 to replicate data and objects between its i5/OS servers. Almost immediately, the company started having problems. According to both parties involved in the installation, the problems had to do with DASD. A problem would occur that would fill up the production server’s disk drives, and the server would go down. When this occurred, Black Gaming would contact iTera (which was acquired last fall, in the middle of the problems Black Gaming was having), and the HA vendor would troubleshoot the issue. According to Vision Solutions, Black Gaming’s production server crashed for the first time in June 2006.
According to Vision, which keeps logs of customer activity when customers call in for help, the initial problem had to do with an incorrect configuration of the Echo2 software. Vision says it found the journal manager was not deleting journals because it was set to require a save first, which resulted in journal entries piling up, eventually using up all the i5’s disk space and causing the server to crash. Once the customer changed the configuration to not require a save first, everything was cleaned up and Echo2 was working properly, according to Vision.

Later that month, Black Gaming contacted its vendor again with an issue concerning the server’s Integrated File System (IFS). The vendor advised Black Gaming to get current with Program Temporary Fixes (PTFs). The company contacted the vendor again in July when it discovered replication errors during the daily audit. Again, PTFs and an incorrect configuration of Echo2 were to blame, and the vendor assisted Black Gaming with fine-tuning the configuration for its particular environment.

Black Gaming’s i5 crashed again in March 2007. However, when this happened, the Black Gaming IT professional in charge of the high availability setup was out of town, and so was his Echo2 consultant, who was out of the country on an installation. The responsibility fell to Justin Nelson, director of IT support for Black Gaming.

Support Problems

Nelson says that when he called Vision for help, he was met with resistance. “I couldn’t get them to help me,” he says. “They said it wasn’t our problem.”

However, there are differences in both parties’ accounts of what happened. According to Vision, when Nelson called, the customer support person offered to walk Nelson through a role swap. However, because Black Gaming had yet to initiate its first role swap using Echo2, Vision says, Nelson decided not to attempt the role swap.
Nelson said it was true that they had not yet performed a role swap with Echo2, aside from a single role swap performed in a test environment, but he says Vision never offered to perform a role swap. Instead, Nelson says he asked Vision for help in fixing what he concluded was the same replication error that had been occurring for the past 10 months. The signs were the same–DASD all used up–and the result was the same–crashed i5. He wanted Vision to fix the problem on his production machine, and he says they refused.

According to Vision, it refused to help Nelson get the i5 back online because it deemed that the problem Black Gaming was having with its i5 was unrelated to Echo2. Vision claims the crash was the result of a runaway query that was taking up all the i5’s disk space, but Nelson was convinced the problem was related to Echo2.

“I knew it wasn’t a hardware issue. I knew it was a replication issue,” Nelson says, adding that he knew he could bypass Echo2 by deleting all the journals it created and clearing out the DASD. That was basically the workaround they had used before. But Nelson, in the main System i programmer’s absence, wanted Vision to at least try to fix the problem. “I said, ‘This is absolutely absurd that I can’t get support. We’ve got to find out what the hell is going on here.’”

Angered and frustrated by Vision’s refusal to provide support, Nelson instead turned to his distributor and application vendor, Agilysys, for help with getting the production server back online, and Agilysys provided the support.

With the lines of communication between Black Gaming and Vision a little bit frayed and the people most familiar with the installation out of town, it was difficult for Nelson to get a clear picture of what was going on with Echo2. He surmised that the Echo2 product was responsible for the disk problem that was causing the production iSeries to crash.
Regardless of whose fault it was, Nelson felt a lack of confidence in his high availability vendor, and that was enough to warrant action. “Maybe it was an installation gone wrong,” he says. “But they should have made it right.”

For the record, Vision admits that it made a mistake in not providing Nelson more assistance. The company says that, even if the problem was unrelated to Echo2, it should have provided more help in bringing the crashed server back online, and that it should do so in similar situations as a “value add,” according to Vision spokesman Bill Rice.

Foot in the Door

Upset with Vision, Nelson started exploring alternative products. Because Agilysys is a close partner with Lakeview Technology, a MIMIX reseller, and a key part of its channel–especially in the lucrative and highly iSeries-centric Nevada gaming industry–it seemed like a natural fit to check out MIMIX. After all, it was Agilysys that had recommended Echo2 as a solution in the first place. Agilysys also backed Nelson’s assertion that Echo2, not a runaway query, was responsible for the system crash in March.

As part of the pre-sales process, Lakeview brought in an auditing tool called MIMIX FactFinder. This software, which runs remotely, is designed to detect whether data and objects are properly synchronized from the production box to the backup box. Lakeview says the software is designed to work with the high availability products from other vendors.

When Black Gaming ran MIMIX FactFinder, they discovered 20,000 objects that weren’t in proper synch, according to Nelson and Lakeview. To make matters worse, they were general ledger objects–the financial guts of their ERP systems.

All of this–the crashed server, the ill-fated support call to Vision, the turn to Agilysys, the FactFinder audit tool, and the eventual replacement of Echo2 with MIMIX–occurred in a very short amount of time while the person normally responsible for Black Gaming’s high availability software was out of town.
Nelson signed off on Lakeview’s Safe Passage program, which enabled the company to get MIMIX for free, with only the maintenance due.

Disputed Claims

Vision strongly denies the accusation of the 20,000 unreplicated objects and rejects Lakeview’s assertion that the MIMIX FactFinder audit tool can accurately spot errors with Echo2 replication. According to Vision, it is impossible to accurately analyze the current state of replication with an outside tool due to the use of commitment control, referential constraints, and other advanced database capabilities.

Things get pretty complicated, according to Rice, the Vision spokesman. “A lot of stuff goes into how and why things get synchronized,” he says. “For the sake of argument, say a box gets destroyed. Our product on the other side has kept track of everything. When you do a failover, it’s applying changes and bringing it into proper state. It ties up all those loose ends.”

Vision also has an audit tool like MIMIX FactFinder, but its usefulness in gauging other vendors’ products is limited. “We can’t even do it accurately,” Rice says. “The only way you can really know if an HA product is going to work is if you do a switchover.” However, Black Gaming never did perform a switchover with Echo2, meaning that it lacked true visibility into the system.

Today, Black Gaming runs regular role swaps with MIMIX and has confidence in its high availability system, according to Nelson. He cites other benefits too, notably around MIMIX’s browser-based interface. “Just to see a help screen from a browser, that’s huge for us,” Nelson says. “I no longer have to have [the iSeries guy] run through the audit for 30 minutes at the end of the day to make sure the subsystems are running.”

Vision says it hasn’t developed a graphical interface for its Echo2 software because most System i administrators prefer the speed and simplicity that the green-screen interface has to offer.
Lessons Learned

Despite improvements, HA software is still far from being completely self-managing and self-healing, as Black Gaming learned. Problems can still crop up, and it takes a human to fix them. Products from Vision and Lakeview are both getting better at simplifying management and pushing as much of the intricacies of replication as possible under the covers and into autonomic systems, but the complex and varied nature of each customer’s implementation mandates a certain degree of manual oversight, especially considering the million-dollar price tags that failure can bring.

Perhaps the most important lesson learned is that role swaps absolutely must be performed on a scheduled basis. There is no other way that a user can be sure the product is working properly and is doing its job. This is probably the only thing that all the HA software vendors can agree on. But all too often, customers do not perform these critical tests, which has the effect of putting a giant question mark on the usefulness of all the hardware, software, and services the customer has invested in.

The IT department at Black Gaming, like the data center at most companies today, is pressed for time and under increasing pressure to do more with less. It’s up to the administrators, programmers, and analysts–the grunts on the ground–to convince the IT directors, CIOs, and even the CEO if need be of the importance of testing. This point is even more important considering that these servers are running multiple critical applications. No server–not even the System i–is foolproof.

Users should also be aware of the competitive nature of their software vendors–particularly in the System i high availability space, which has a history of being particularly raucous–and use it to their advantage. Vendors often have offers in place that can bring a deep discount on license or maintenance fees if users switch from a competing product, as Black Gaming did.
However, be wary of any tools that the vendors bring in and run against the other product–the results can be difficult to verify at best, and it could end up spinning you the wrong way.