IBM Mulls Using DataMigrator as Cloud Warehouse Pipeline

June 29, 2022 Alex Woodie

IBMers are playing around with an update to a little-known ETL tool called DataMigrator for i that would make it easier for IBM i shops to move transactional data to cloud-based data warehousing platforms. The update would use Apache Kafka for the final connection to the cloud.

DataMigrator ETL Extension, as the product is formally known, is an extract, transform, and load (ETL) tool created by IBM back in 2015. The product was developed to work with Db2 Web Query, the Java-based business intelligence tool that it OEMed from Information Builders (now owned by TIBCO).

DataMigrator for i (as most users call it) was originally envisioned to create, populate, and maintain database tables on Db2 for i, usually from other Db2 for i instances, but also external data sources. The software has the capability to performs bulk loads as well as real-time updates of data via changed data capture (CDC) functionality based on IBM’s remote journaling technology.

While DataMigrator is typically used to load data onto Db2 for i so that it can be queried with Db2 Web Query, IBM is also looking at using DataMigrator to extract Db2 for i data and load it into a cloud data warehouse, such as those run by Snowflake, AWS, and Microsoft Azure, says Doug Mack, a Db2 for i consultant with IBM Systems Lab Services.

DataMigrator for i provides a native ETL tool for loading data warehouses on IBM i. (Graphic courtesy IBM.)

“We’ve been building some proof of concepts around using Apache Kafka to act as the streaming hub, if you will, to get data into any of these cloud-based services,” Mack told IT Jungle at the recent POWERUp conference in New Orleans, Louisiana.

“Kafka provides these connections to just about everything,” he continues. “So we view the architecture to be, for IBM i data . . . more of a push mode, where I’m using the ETL component to grab changed data – it could be near real time – stream it over to Kafka, and then Kafka pushes it up to the target.”

IBM i already supports Apache Kafka (supported was added nearly two years ago). Kafka, which was written in the JVM-compatible language Scala, is a distributed publish and subscribe (pub/sub) framework that was originally developed at LinkedIn to handle the large number of messages the social media site handles on a daily basis. Kafka essentially acts as a real-time message queue with built-in delivery guarantees. The open source Kafka community has built a number of connectors that enable developers to plug various databases, file systems, and applications as either sources or sinks. Confluent, the commercial entity behind Kafka, offers Kafka as a service in the cloud.

IBM Systems Lab Services is writing code that would help to glue DataMigrator and Kafka, Mack says. “We have to build programs to do that, whether it’s data queues or REST Web service,” he says. “There’s code that has to be tied to DataMigrator to interface into Kafka.”

While the IBM i server can function as a data warehouse – Mack certainly expressed his opinion on that at POWERUp – most IBM i shops prefer to use it strictly as a transactional machine. The reality is that many IBM i shops are adopting cloud data warehouses, such as Snowflake, Amazon Redshift, and Azure SQL Warehouse. Accommodating this reality makes good business sense.

However, there’s a bit of an impedance mismatch between the way the cloud data warehouse folks think about data and the way IBM i folks want to run their system. Bringing them together in a control manner can help both sides get what they want, according to Mack.

“The analytics team that’s sitting out here is saying, ‘We want to bring IBM i data into whatever it is we’re doing.’ They’re kind of looking at kind of a pull approach. ‘Let’s just get an ETL tool on the cloud side and pull data,’” Mack says. “But the IBM i people say ‘We don’t want you running these big extraction queries against our production database.’ They’re scared to death that somebody over there that doesn’t understand indexing and EVIs and things like that, or pulling data out of a remote journal.

“That’s why they reach out to us,” Mack says. “There’s got to be a better way.”

The work is still not quite production ready. “We’re a little bit in our infancy,” he says. Interested customers would most likely need to sign up for a Lab Services engagement to implement the solution, he says.

For more information on the product, check out https://www.ibm.com/support/pages/datamigrator-etl-extension.

This Issue Sponsored By

Table of Contents

Content archive

Recent Posts

Subscribe

Pages

Search