Coglin Mill Reaches Out to Salesforce.com with ETL Connector
October 2, 2012 Alex Woodie
Coglin Mill is close to shipping a new release of its RODIN extract, transform, and load (ETL) tool for building data warehouses on the IBM i server. With the upcoming release of RODIN version 7.2, the company will include a connector for loading customers’ Salesforce.com data into their IBM i analytics engines. Coglin Mill has also improved the product in the areas of slowly changing dimensions, user defined functions, and real-time ETL. Coglin Mill already supports the most widely used databases with its RPG-based RODIN tool, including DB2/400, DB2 for LUW, Sybase, SQL Server, and Oracle‘s 11g and MySQL. These databases are popular enough in Coglin Mill’s base of IBM i customers that it developed standard connectors for them. “Lately we’ve started to get requests for some of the more–I wouldn’t necessarily call them exotic–but different data sources,” says Alan Jordan, vice president of Coglin Mill. “And one of them is Salesforce.com. More and more customers are using Salesforce for CRM, and it’s nice to pull some of that information into your data warehouse.” Jordan doesn’t see a huge shift to cloud-based apps among his customer base, particularly when it comes to core transactional systems, which businesses still prefer to run on their own servers and in their own data centers. “You have customer information in your applications on the AS/400, but Salesforce might have information about new opportunities with customers or contact information,” he says. “So you just try to build a universe of ‘What do I know about this customer.'” Salesforce publishes APIs that allow customers to programmatically extract data from their hosted CRM environments. But rather than start from scratch with the APIs, Coglin Mill tapped a third-party software company that had already developed a connector with the Salesforce APIs. Coglin Mill could have built its own connector, but going this route cut down on Coglin Mill’s development time, and resulted in a better overall product, Jordan says. “It made it a little more robust, requires less error handling and less customization that we needed to do in our code,” he says. “We’ve taken this raw connector and built all the metadata structure around it, just like we do for all the other data sources. It allows us to easily access that data.” The Salesforce connector is described as a “JDBC-like” connector. That is, it isn’t 100 percent pure JDBC, but for all intents and purposes, it looks and behaves like a standard JDBC connector, and brings the customer all the advantages and protections of using a standard-like JDBC, Jordan says. The connector makes it easy for RODIN customers to locate and load their Salesforce data sources into their IBM i data warehouses. “It looks to us just like a DB2 table when we’re actually using it,” Jordan says. “But when we’re pulling the data, we’re actually using their cloud API through this JDBC-like functionality to pull the data.” Challenging Dimensions RODIN 7.2 also brings new tools for automating the handling of slowly changing dimensions. While “slowly changing dimensions” sounds like a relativistic conundrum that may have stumped Albert Einstein, it actually is a fairly common problem afflicting larger and older data warehouses and multi-dimensional databases. The problem has to do with the way that companies categorize the various entities they track in their database (such as customer size, region, or sales representative), and how those categories can change over time. For example, say Big Bob’s BBQs is classified as a small business in a supplier’s data warehouse in 2003. Over time, as Big Bob’s BBQ’s sales grow, the company is no longer a small business, but a jumbo-sized one. This can have an effect on how Big Bob’s is categorized in summary reports generated from the data warehouse, and can skew that report designed back in 2003, so that the results are no longer accurate in 2012. “I wouldn’t say it’s a big problem for everybody, because not everybody does this sort of thing, or they might accept that they can’t get those reports back from 2003 in the same way,” Jordan says. “The solution is to use a concept called slowly changing dimensions. That means that, every time an attribute of an entity changes–like changing a customer’s group from small to large–instead of just updating the customer record, we create a new customer record as of this date. This complicates my database simply in that I have multiple images of the same customer, or the same product or the same store or the same vendor or whatever it is that I’m tracking. But it gives me the tremendous flexibility to correctly report the attributes of that entity as it was at the time of the transaction.” RODIN already had the tools to support slowly changing dimensions, but the information had to be tracked manually. With version 7.1, the tracking is all done automatically. “Now when you build an ETL job that targets that table, we automatically recognize the changes and do all the work for you rather than you have to do that as part of ETL,” Jordan says. “Instead of taking a half day’s work, now it’s 10 minutes of work.” UDF: User’s New BFF The new user defined function (UDF) functionality in RODIN 7.2 is just that: It lets users define their own functions, their own logic, and implement them as part of RODIN’s ETL process. What’s more, it lets users reuse those UDFs as many times as they like. The new UDF builds upon the many pieces of logic that Coglin Mill has already built into the product, including RODIN’s support for calling exit points. Jordan expects the new feature to be used by companies with specific industry needs, such as insurance companies with complex functions based on actuarial tables. That sounds simple enough, but what makes the new UDF feature so powerful is the amount of work that it will cut down, Jordan says. “It’s kind of ‘ho-hum,’ until you have the need. And then it’s the best thing since sliced bread,” he says. Customers that find themselves doing the same calculation over and over will be able to define it once as a UDF and reuse it to their heart’s content. “Being able to create your own reusable functions just extends the functionality of the product. Reusability is what we’re all about now and making it simple to reuse building blocks,” he says. Going Real Time One of the final enhancements in RODIN 7.2 is support for real-time loading of data into a data warehouse. Coglin Mill has several customers that need to be able to flow new transactions or database changes from the ERP system to the data warehouse as soon as those transactions or changes occur. RODIN can capture the data in multiple ways, including the use of triggers, journals, and its existing change data capture (CDC) function. The new real-time data capture feature supports multiple platforms, enabling customers to capture a wide view of activities in the company as they occur. The most obvious use of this real-time feature is to build dashboards that tell executives how the business is performing in certain areas, such as order fulfillment or sales. This isn’t necessarily what a data warehouse is supposed to do, Jordan says, but RODIN provides the technological plumbing to accomplish this particular task, so why not use it that way? “My personal belief is you should minimize this as much as possible,” Jordan says. “If a real data warehouse is all about what has happened in the past, and analyzing the past to predict the future, or to at least help me understand what’s going on in my business, then do I really need to know what happened 30 seconds ago? Ninety percent of times, the answer is no.” Coglin Mill also changed how real-time CDC and ETL jobs are executed. Previously, if a customer was using CDC to grab data from 100 database tables for loading into a data warehouse, there would be one CDC job and 100 ETL jobs running on the system, making for 101 total jobs on the server. Now, all of those tasks can be combined into a single server job, which will reduce the amount of job management required on the part of the customer. Development on RODIN 7.2 is done, but there is more testing yet to do. The new release is expected to be generally available in December. For more information, see the company’s website at www.thinkrodin.com. RELATED STORIES Peppermill Gets Fast Results with IBM i Data Warehouse RODIN Admitted to IBM i Solution Edition Program Coglin Mill’s ETL Makes Data Warehousing on IBM i More Attractive Coglin Mill Debuts Lower Cost Versions of ETL Tools
|