Giving IBM i The Storage Of Last Resort
July 26, 2021 Timothy Prickett Morgan
Every IT ecosystem has its niche players, and they are a vital part of that ecosystem, just as niche players are in a natural ecosystem. Lots of companies bridge the gaps between products and allow customers to do something that would be hard for them to replicate on their own. This brings real value.
Entrepid Corporation is one such niche player in the IBM i and broader Power Systems market, and it bridges the gap between IBM’s Power-based systems commonly used as database and application servers in midrange and large enterprises and the storage offered by EMC before and after it was acquired by Dell. We sat down with Brian Barth, the company’s president, to chat about what is happening with high-end storage and virtual tape library backup in the IBM i base, and got quite an education about the complexity that many IBM i shops are dealing with – and how they cope by tapping Entrepid.
Timothy Prickett Morgan: Let’s start with a little history about yourself and your company. Not everyone is acquainted with who Entrepid is and the niches where you provide services to the IBM i market. My goal is to make people more aware of what you do.
Brian Barth: I have had a few different businesses over the years, and Entrepid was formed more than two decades ago. We were the original AS/400 partner to EMC, and when they wanted to support their Symmetrix RAID arrays on the AS/400, they came to us. EMC has gotten bigger over the years, and has been acquired by Dell as well, and we have kept pace with that and we now provide the services of an embedded partner to Dell EMC. We did a lot of the original development and QA for the Data Domain VTL. We do automation, we provide Level 2 tech support for Symmetrix with the AS/400, and we provide their education, etc. Around half of our business comes from integrating Dell EMC products with Power Systems.
TPM: I remember when Symmetrix took off ahead of the dot-com boom. EMC was really shaking things up. When you say Symmetrix and AS/400, you probably mean also VMAX and PowerMAX and IBM i. Just so we get that clear and people don’t think you are supporting ancient machinery.
Brian Barth: [Laughter] Yes, I mean the Symmetrix family, and even the new PowerMAX is considered a member of the Symmetrix family, as is the DMX and the VMAX.
TPM: What Power Systems customers use those things? I mean, these must be very large IBM i and AIX customers, who might also be running Linux workloads that tap into the SAN storage.
Brian Barth: Up until recently, most of our customers were huge enterprises from all over the world. There are probably 700 or 800 very large companies running Power Systems with Symmetrix-class storage. And then when Data Domain was introduced, that started to push the EMC products down into the midrange. And then IBM started to support the 512 block disk architecture with files, which allowed connectivity to midrange storage for the IBM i. So our small and medium business sales picked up, and today enterprise customers are probably still 75 percent of our business, but the other 25 percent are more traditional SMB type businesses that are running on Unity, which is the latest version of the midrange storage for Dell EMC. And we also implement IBM storage on occasion, but that’s a smaller part of our business.
TPM: Do you provide services directly to customers or indirectly through Dell EMC?
Brian Barth: We work directly with Dell EMC as an embedded partner and then we also work directly with the customers. It’s probably 50/50 split on that.
TPM: What is happening in the high end of storage for Power Systems where Entrepid plays? This is block storage mostly, sometimes with a file system added, but not object storage, and I would not expect SAN storage to be a big, exploding part of the storage at most companies, but an important one nonetheless. What’s happening out there?
Brian Barth: As you know, IBM i is fairly difficult to integrate with anything but block storage. And so from what I see, this business is fairly steady and tends to grow with the companies. A lot of companies have talked about getting off the IBM i platform, but we see very little of that actually happening. The attrition rate is lower than people might think.
TPM: My guess is 1 to 2 percent per year at this point for the whole base, and slightly less so for larger customers that are more risk averse.
Brian Barth: I was going to say exactly that.
TPM: It’s very tiny, but still a bleed of course. And there are always some people being added to the base as well. All I know is that the attrition rate is a lot lower than it used to be.
Brian Barth: I see new people coming onto the IBM i platform, and probably at a lower rate than the bleed rate. What we see – and this is partly due to the fact that we are working with enterprise customers – is customers who used to have a mainframe backend migrate to Power Systems because it has very close to the same uptime, performs well, but it is half the cost. We don’t see people adopting Power Systems from the other direction, being application driven or anything like that. So the growth in storage is primarily organic.
As far as block storage on Power Systems goes, people also don’t archive their data because in order to archive it, you really have to take it off the system and put it in another format. And then you have to train your people how to use that other format to get to data that they could just leave on the system. So we see very little archiving going on and the storage requirements just tend to grow and grow. We have a few customers out there that are well over 100 terabytes of data on a single partition. I think this storage growth is fairly stable. It grows naturally, organically, but we don’t see a huge boom in sudden growth of requirements.
TPM: And just to be clear, 100 TB of storage for what is mostly relational databases is a lot of storage. If they’re not archiving their data, are they doing hot backup or disk to disk replication using Symmetrix Remote Data Facility or something like that?
Brian Barth: Local replication on the Symmetrix has gone from Business Continuance Volumes, or BCVs, to clones to snaps, which are known as SnapVX, and it is essentially a hard copy that’s created from the production data. And then in the case of a full system save, you IPL that image and then back it up from the IPL image. If it’s PowerHA, you present that image to a backup LPAR and vary it on and back it up from there, and typically that’s done in conjunction with SRDF. Quite often what people will do is they’ll replicate the data to another datacenter with a second Power System and then use that second Power System as the backup as well because it’s normally idle because it’s a hot DR system. And then during the day they will IPL and back up each LPAR on that target system. And if there’s a failover, the roles are reversed, and if they have a planned failover, then they will start doing backups in the opposite direction. If it’s an unplanned failover, then they might decide to do both their production and their backups at the target site.
A lot of our business is automating these processes and integrating with PowerHA. Even when IBM implements PowerHA, IBM Lab Services will usually customize it to the customer requirements. We fulfill that function for Dell EMC. So as far as people that do what we do, there’s us and there’s Lab Services.
Typically customers choose between Dell EMC or IBM for the back-end storage for the IBM i platform. While you can hook in other kinds of storage, the support infrastructure isn’t there. So if you connect it to Hitachi or Pure Storage, it’ll run, but you’re on your own if you have an issue because Hitachi and Pure Storage do not have an IBM i support infrastructure. We have yet to see Pure Storage implemented on IBM i, although we have seen IBM, strangely enough, recommend Pure Storage twice with the Storage Virtual Controller in front of it. So theoretically, I suppose Pure Storage could be out there, but we really haven’t seen it. We’ve seen Hitachi once or twice, and that’s probably about it for the IBM i platform at large customers.
And then quite often, in the midrange where IBM would implement V7000s, we’re integrating Dell EMC Unity and we actually provide an integrated stack called Power Stack, which is typically a Power System scale out model with the switches and a Unity backend and Data Domain as the backup. We support that as an integrated, converged system.
TPM: Interesting. Are you an IBM reseller? You can resell Power Systems or V7000s or whatever?
Brian Barth: We are an IBM reseller for Power Systems. We don’t really get involved in the IBM storage. Typically, when we sell storage, we sell an integrated system: Dell EMC storage with IBM Power Systems.
TPM: What is the hot part of the business now, in terms of growth?
Brian Barth: These days, I would say most of our new business is cyber recovery. We’re seeing three or four new cyber recovery opportunities come across every week for IBM i. Some of those are existing Data Domain implementations where we are adding a cyber recovery vault, and some of them are folks who want to improve backup time by moving off of physical tape and in conjunction with that move they are implementing a cyber recovery vault to ensure that they have an offline copy at all times.
TPM: What do you use for your vaulting software? Is it just the hot site and Data Domain, or do you do something else?
Brian Barth: Dell EMC has a cyber recovery appliance that manages most types of data within the Data Domain as far as creating that secure copy, but it doesn’t support VTL. So we actually have an automation program that runs in a VM in the vault. And that automation program, which is cyber recovery for VTL, is delivered and supported by Entrepid. And it does the same thing for VTL images.
To create the vault, we establish an isolated connection directly from the production Data Domain to the cyber recovery vault that is on its own network. And then within the vault, there is also a switch and a management server that creates a retention lock copy each day and that opens the gap to let the replication flow through. That replication connection is limited to the replication port only; it’s firewalled at the Data Domain. And then once that replication is complete, that air gap closes. A retention lock copy is created that can’t be altered for the period of the retention, which is typically 15 days to 30 days, because beyond that, you know, nobody is going to recover from data that’s over a month old. And this isn’t intended as a repository for all data. It’s really intended as the recovery of last resort so that you have a copy of your data that is off the network and unavailable to bad actors, as it were. At the end of the retention term, the automation will then go through and clean up the retention lock copy and the cycle just continues. So it’s intended to be a completely hands off copy that is air-gapped off of the network and can’t be reached.
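The daily cycle Barth describes (open the gap, replicate, close the gap, lock a copy, expire old copies) can be illustrated with a small simulation. This is only a sketch of the sequence as described above, not Entrepid's or Dell EMC's software: the `Vault` and `LockedCopy` names are hypothetical, replication itself is reduced to a comment, and the 15-day retention is taken from the low end of the range he quotes.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class LockedCopy:
    """A hypothetical retention-locked copy inside the vault."""
    created: date
    retention_days: int

    def expires(self) -> date:
        return self.created + timedelta(days=self.retention_days)


@dataclass
class Vault:
    """Hypothetical model of the vault's daily automation."""
    retention_days: int = 15
    gap_open: bool = False
    copies: list = field(default_factory=list)

    def daily_cycle(self, today: date) -> None:
        # 1. Open the gap so replication can flow in from production.
        self.gap_open = True
        # 2. Replication of the day's images would happen here.
        # 3. Close the gap once replication is complete.
        self.gap_open = False
        # 4. Create a copy that cannot be altered during its retention term.
        self.copies.append(LockedCopy(today, self.retention_days))
        # 5. Clean up copies whose retention term has ended.
        self.copies = [c for c in self.copies if c.expires() > today]


vault = Vault(retention_days=15)
start = date(2021, 7, 1)
for day in range(40):
    vault.daily_cycle(start + timedelta(days=day))

print(len(vault.copies), vault.gap_open)
```

In this model the vault settles at one locked copy per day of retention, and the gap is closed whenever the cycle is not running.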
That’s probably been most of our new business going back maybe a year, and it has really picked up in the past six months.
TPM: Do you know of companies that have had to recover from cyber attacks?
Brian Barth: There have been a couple. And, you know, it has certainly done what it was supposed to do.
For open systems, there’s another component in that vault which actually goes through and looks for anomalies by comparing the images that it sees. This software, called Cyber Sense, assumes the first image is the baseline. It looks for any signatures, for malware, et cetera, within that baseline. And then it compares the next iterations to that baseline and says, oh, I’m seeing unusual activity within the files. This may be an instance of compromise. Unfortunately, Cyber Sense does not scan VTL. So we also do a security audit on the system and we review things like shares and permissions, and then we do a review of IBM’s Backup Recovery and Media Services, or BRMS, to determine whether or not they have everything in the vault that they need to recover their systems.
You would be surprised at what we often find. Quite often, we will find that the recovery relies on a baseline tape that may be four or five years old and no longer exists.
TPM: How do you fix that?
Brian Barth: If we are not trying to recover, then we just redesign their BRMS infrastructure so that they do have full backups. If it’s during a recovery, that’s tougher. We were on a project in Brazil recently and the company lost a system during a migration because somebody presented the volumes to the wrong system and basically blew up an SAP setup. Then they went to recover it. They were doing full backups once a month and then incrementals every day for an 80 terabyte system. But what they didn’t realize was that they didn’t have enough tapes to actually keep enough of the incrementals, so they were reusing the tapes after a week and a half. So they were three weeks into this recovery, but they didn’t have the tapes to recover it. They had to recover from the journal receivers. From three weeks of journal receivers.
TPM: I guess luckily they had journal receivers backed up. . . .
Brian Barth: Yes. Why? They were keeping those for a month, but not their incrementals. I don’t know why they were doing what they were doing.
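The gap in that backup scheme is easy to check with a few lines of arithmetic. The sketch below is illustrative only: the function name is made up, and "a week and a half" is assumed to mean a 10-day tape reuse cycle. It lists which daily incrementals since the last full backup have already been overwritten.

```python
def missing_incrementals(days_since_full, tape_reuse_days):
    """Return the days (1 = first day after the full backup) whose
    incremental tape has already been reused."""
    missing = []
    for day in range(1, days_since_full + 1):
        age = days_since_full - day  # days since that incremental was written
        if age >= tape_reuse_days:
            missing.append(day)
    return missing


# Three weeks after the monthly full, with tapes reused after about 10 days
# (an assumed reading of "a week and a half").
lost = missing_incrementals(days_since_full=21, tape_reuse_days=10)
print(f"{len(lost)} of 21 incrementals are gone: days {lost[0]}..{lost[-1]}")
```

Because every incremental in the chain is needed to roll forward from the full backup, losing the earliest ones breaks the whole restore, which is why the journal receivers ended up being the only way back.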
In most cases, what we do is convince people to do hot backups as a full backup to VTL with de-duplication every night. Because if you’re doing that to physical tape, then you end up with a huge number of tapes. But if you’re doing it with a virtual tape that’s de-duplicated, you end up with about the same capacity stored as you would from incrementals because most of the data is the same. So the unchanged data de-duplicates 100 percent and only the changed data is actually stored on disk. That tends to be one of the drivers for people going to VTL because it becomes much more efficient to recover from.
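Why a nightly full backup to a de-duplicated target costs about the same capacity as an incremental can be shown with a toy content-addressed store. This is a minimal sketch of the general technique and makes no claim about how Data Domain implements it; the `DedupStore` class, the fixed block boundaries, and the SHA-256 keys are all assumptions for illustration.

```python
import hashlib


class DedupStore:
    """Toy de-duplicating store: each unique block is kept only once."""

    def __init__(self):
        self.chunks = {}  # digest -> block bytes

    def write_backup(self, blocks):
        """Store a full backup; return (recipe, bytes newly written)."""
        recipe = []
        new_bytes = 0
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.chunks:
                # Only blocks never seen before consume capacity.
                self.chunks[digest] = block
                new_bytes += len(block)
            recipe.append(digest)
        return recipe, new_bytes

    def restore(self, recipe):
        """Rebuild a full backup directly from its recipe."""
        return [self.chunks[d] for d in recipe]


store = DedupStore()

# Night 1: a full backup of 100 blocks.
night1 = [f"block-{i}-v1".encode() for i in range(100)]
recipe1, new1 = store.write_backup(night1)

# Night 2: another full backup, but only 5 blocks changed during the day.
night2 = list(night1)
for i in range(5):
    night2[i] = f"block-{i}-v2".encode()
recipe2, new2 = store.write_backup(night2)

print(new1, new2, len(store.chunks))
```

The second full backup adds only the five changed blocks to the store, yet it restores in a single pass from its own recipe, with no chain of incrementals to replay.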
Those are the kind of conversations we have every day with enterprise accounts. They want to know how they can recover if they have a problem in the shortest amount of time possible. Well, that’s not from a full backup plus incrementals every day for a month. I know because quite often the incrementals are 75 percent of the size of the whole system. So you’re essentially doing 25 restores that may take a day apiece.
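The arithmetic behind that point can be made explicit. The sketch below simply multiplies out the figures quoted above (25 incrementals, each about 75 percent of the system size). The function name is hypothetical, and it assumes the full backup is restored once before the incrementals are applied.

```python
def restore_volume(system_size, incrementals, incremental_ratio):
    """Total data read back when restoring one full backup plus a chain
    of incrementals, in the same units as system_size."""
    full = system_size
    return full + incrementals * incremental_ratio * system_size


# Normalize the system size to 1.0 so the result reads as a multiple of it.
total = restore_volume(system_size=1.0, incrementals=25, incremental_ratio=0.75)
print(f"Data restored: {total:.2f}x the size of the system")
```

Under those assumptions the recovery reads back nearly twenty times the size of the system, compared with roughly one times the size when restoring from a single de-duplicated full backup.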
TPM: Yikes! That’s some food for thought in these trying cyberattack times. That’s for sure.
This content was sponsored by Entrepid.