QlikTech Adapts In-Memory Analytics for External Big Data
October 23, 2012 Alex Woodie
QlikTech made its mark in the business intelligence field by simplifying the BI experience and delivering results quickly from an in-memory associative database. But with the advent of “big data,” the company’s total reliance on in-memory technology was challenged. Last week, the company unveiled a new Direct Discovery mode that allows customers to process large data sets stored externally on disk, while keeping the associative data model intact.
QlikTech has ridden the in-memory wave quite successfully over the last decade. While its big BI competitors like Oracle, SAP, and IBM have developed or acquired in-memory database technologies to complement their heavy online analytical processing (OLAP) products, QlikTech has focused on enhancing and selling QlikView, its in-memory reporting tool that was rolled out in 1993. QlikTech went public in 2010, and today counts more than 26,000 customers and more than $300 million in annual sales. The Pennsylvania company isn’t the largest BI vendor, but it has influenced the direction toward simpler, smaller, and more nimble BI systems.
Then came Big Data, and everything changed. Even midsize companies, like the ones that QlikView catered to, now want to analyze massive stockpiles of data to gain insight and use it for competitive advantage.
This presented a challenge for QlikTech. It wasn’t so much that big data sets couldn’t be loaded into QlikView. You can get a Windows server equipped with multiple terabytes of RAM, and use QlikView’s compression algorithms to squeeze all that big data into its database. The bigger problem was that much of the data in customers’ big data sets was of questionable value, changed often, and wasn’t accessed regularly. One of QlikTech’s customers, for example, has billions of insurance claims from which to pull information. But loading all of the claims into memory simply wasn’t cost-effective, since it would require a big hardware upgrade.
This approach is also anathema to QlikTech’s smaller, nimbler mantra. So the QlikView developers went out and created Direct Discovery, a new feature that allows QlikView users to see external data sources on their screens, and to query those data sources for answers, just as they are used to doing with in-memory data. While the data is not in memory, the associative data model remains intact with the external data. This means that users continue to benefit from the “green, gray, white” color-coding of query results that shows them which categories of data fell outside of their query, thereby giving them the context to ask more intelligent questions the next time.
Elif Tutek, technical product marketing manager for QlikTech, last week briefed IT Jungle on how Direct Discovery works. “You may have SAP data or some Facebook information, or maybe Teradata or Google BigQuery. You don’t want to bring that data into memory as a part of the in-memory data model, but you still want to make it available to users,” she says. “With Direct Discovery, you can merge big data with other data sources. And as a user, I can still leverage the associative experience that allows me to ask the next question on the external data sets as well.”
Direct Discovery, which is enabled with a keyword during the ETL process, uses standard ODBC connections to load external data sets as they are needed. For some big data sources, such as Teradata, QlikTech has developed a custom connector (also announced last week). Standard ODBC should work for loading data from standard relational data stores, like DB2, Oracle 11g, MySQL, and SQL Server, as well as newer big data stores, like Hadoop, Cassandra, Google BigQuery, and others. Even big data sets stored in DB2/400 can be accessed.
Performance will not be as snappy when a user is perusing data in the Direct Discovery dashboard, since the data is being accessed directly from disk.
In a QlikTech test, Direct Discovery was able to return a query of more than 3 billion rows from a Teradata data warehouse in about three to four minutes. This is obviously much slower than the sub-second response that’s typical with QlikView’s in-memory technology. But it would take hours for competitors to return the same query, Tutek says.
“I think this will help truly to solve the problem with big data,” she says. “The performance with in memory will always be much faster. That’s why we truly would like to position this as a hybrid approach where people will be leveraging the in-memory power of QlikView, and also access external data sources as well.”
Direct Discovery also includes a data caching mechanism that customers can use to force QlikView to refresh the external data at regular intervals. The data caching enables any data already loaded to be used for further analysis within QlikView. But if customers want the latest real-time information, they can set the threshold very low and force QlikView to continually refresh the data.
Tutek expects Direct Discovery, which ships in December as part of QlikView version 11.2, to be very complementary to the new data governance dashboard that QlikTech launched several weeks ago. The data governance dashboard shows administrators which data is being used, what selections users are making, and what metrics they’re using. Together, these tools will allow customers to make decisions about their big data usage.
“Maybe, as a BI team leader, I just want to see how people will be using that data,” Tutek says. “Then by using our dashboard, I can see what type of data people are using, and then maybe I can decide that if they’re very frequently using Direct Discovery against a piece of data, to put it in memory, because the performance will be better.”