More Open Source Databases Coming To IBM i
June 15, 2020 Alex Woodie
IBM is in the process of bringing several new open source databases to the IBM i platform, including schema-less NoSQL databases as well as traditional relational systems. Among the databases that Rochester is targeting are MongoDB, arguably the most popular NoSQL database and a favorite among Web and mobile developers, as well as PostgreSQL, one of the industry’s oldest and most widely used relational databases.
Databases today are almost like programming languages, with developers mixing and matching databases to handle different functions, often within the same application. DB-Engines.com tracks more than 330 distinct databases, while there are somewhere around 700 programming languages, according to Career Karma.
Open source has been a big provider of new programming languages for IBM i, with PHP, Node.js, Ruby, Python, R, and others supported on the platform. IBM brought MySQL to IBM i over a decade ago specifically to support the array of pre-packaged PHP applications that were engineered to store data in MySQL. And when Oracle bought MySQL and killed its development on IBM i, IBM brought in MySQL’s drop-in replacement, MariaDB.
For many years, the MySQL/MariaDB line has been the only other database option on IBM i, besides, of course, Db2 for i. But that began to change in October 2019, when IBM announced that Redis, an open source key-value store developed by Redis Labs, would be supported on the platform with the delivery of IBM i 7.3 Technology Refresh 7 and 7.4 TR1.
Now we’re on the cusp of getting at least two more open source databases, and potentially more, available on the platform.
PostgreSQL on IBM i
First up is PostgreSQL, the object-relational database management system that emerged out of Michael Stonebraker’s work on Ingres at UC Berkeley in the early 1980s. PostgreSQL supports an array of data types, including Boolean expressions, arrays, characters, binary, date/time, bit strings, XML, and JSON documents among others.
PostgreSQL supports advanced features, including triggers, stored procedures, foreign keys, materialized views, and automatically updateable views. It supports all aspects of ACID (atomicity, consistency, isolation, and durability), and can be used as the basis for transactional applications, as well as data warehousing and business intelligence workloads.
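To make foreign keys and atomicity concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a relational engine, not PostgreSQL itself, so that it runs without a server. The table names, the sample data, and the transfer and balances helpers are all hypothetical.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for a relational engine.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.execute("INSERT INTO accounts VALUES (1, 1, 100), (2, 1, 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


def balances(conn):
    """Return account balances ordered by account id."""
    return [row[0] for row in conn.execute("SELECT balance FROM accounts ORDER BY id")]
```

A transfer that would overdraw the source account fails as a whole, leaving both balances untouched, and an account row that points at a nonexistent customer is rejected by the foreign key. The same ideas carry over to PostgreSQL, although the driver and some syntax details would differ.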
PostgreSQL can be used in scale-up (SMP) and scale-out (clustered) configurations. It supports all major operating systems, and its usage has surged in recent years, in part thanks to the success of an enterprise version of PostgreSQL offered by EnterpriseDB. According to DB-Engines.com, it is currently the fourth most popular database, behind Oracle, MySQL, and Microsoft SQL Server, respectively.
IBM has completed the work around PostgreSQL and the port of the database to IBM i has “shipped,” an IBM spokesman says. But the company has not yet communicated that to the outside world or created any external-facing Web pages about it. The software installs via Yum and RPM, as most IBM-sanctioned open source software on IBM i now does. More information is available on the project’s GitHub page.
MongoDB for IBM i
IBM is currently working on getting MongoDB running on IBM i, the IBM spokesman said. When that work is done, IBM i shops should also be able to download it via RPM and Yum, or, if you’re using Access Client Solutions (ACS), then simply through the ACS Open Source Package Management functionality.
MongoDB is the fifth most popular database in the world, per DB-Engines.com. The database was first created in 2007 by the folks behind the online advertising company DoubleClick (now owned by Google), which needed to serve 400,000 ads per second. Rather than modify an existing database to meet its specific needs, the team decided to create its own database.
MongoDB is a document-style NoSQL database that stores data in the Binary JSON format, or BSON, which is an extension to JSON (JavaScript Object Notation). The database arranges these documents into collections, which are roughly the equivalent of tables in a relational database. Developers tend to like MongoDB because these documents and collections more closely resemble the native data types in popular programming languages than the column-row and table abstractions in relational databases.
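The difference between a document and a set of relational rows can be sketched in plain Python, with no MongoDB driver involved. The order data, field names, and the document_to_rows helper below are hypothetical, and JSON text stands in for BSON, which the standard library does not provide.

```python
import json

# A hypothetical order stored as a single document: nested fields and arrays
# sit inside one record, much like a native Python dict.
order_document = {
    "_id": 1001,
    "customer": {"name": "Acme Corp", "city": "Rochester"},
    "items": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

# A collection is, loosely, a group of such documents.
orders_collection = [order_document]

# The same data flattened into relational-style rows across two tables.
orders_rows = [(1001, "Acme Corp", "Rochester")]
order_items_rows = [(1001, "A-100", 2), (1001, "B-200", 1)]


def document_to_rows(doc):
    """Flatten one order document into (order_row, item_rows)."""
    order_row = (doc["_id"], doc["customer"]["name"], doc["customer"]["city"])
    item_rows = [(doc["_id"], item["sku"], item["qty"]) for item in doc["items"]]
    return order_row, item_rows


# JSON text is used here as a readable stand-in; MongoDB itself stores BSON.
serialized = json.dumps(order_document)
restored = json.loads(serialized)
```

The document keeps the order and its line items together in one structure that maps directly onto a dict, while the relational form splits them across two tables that must be joined back together on the order ID.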
MongoDB features a flexible data schema that developers can change at any time, another big advantage over the fixed-schema approach of traditional relational databases, like Db2, PostgreSQL, and SQL Server. Like other NoSQL databases, MongoDB scales horizontally rather than vertically, allowing users to expand the size of the database by adding more nodes. For large data sets, MongoDB automatically splits up, or shards, the data onto separate nodes.
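As a rough illustration of horizontal scaling, the sketch below routes documents to nodes by hashing a shard key. It is a deliberately simplified model and not MongoDB's actual mechanism, which partitions data into ranges of shard-key values and rebalances them across shards. The node names, the sample documents, and the shard_for and distribute helpers are hypothetical.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def shard_for(key, nodes=NODES):
    """Pick a node by hashing the shard key (simplified hash-based routing)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def distribute(documents, shard_key, nodes=NODES):
    """Group documents by the node their shard key hashes to."""
    placement = {node: [] for node in nodes}
    for doc in documents:
        placement[shard_for(doc[shard_key], nodes)].append(doc)
    return placement


# Documents in one collection need not share the same fields.
docs = [
    {"user_id": 1, "name": "Ann"},
    {"user_id": 2, "name": "Raj", "email": "raj@example.com"},
    {"user_id": 3, "tags": ["admin"]},
]
placement = distribute(docs, "user_id")
```

Each document lands on exactly one node, and the same key always routes to the same node. The three documents carry different fields, which a fixed-schema table would not allow without altering the table first. Note that plain modulo hashing like this would move most keys whenever a node is added, which is one reason production systems use more elaborate schemes.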
The database features a pluggable storage engine architecture, enabling customers to use different engines to match different needs.
It supports data replication through its replica sets feature, and recently added ACID support for transactions. It supports indexes and triggers, but does not support foreign keys or joins (a concept that does not really apply to NoSQL databases). Developers can interact with MongoDB through the official API or through one of many official and unofficial drivers available for dozens of popular languages.
Backed by a New York City-based company of the same name, MongoDB today has more than 18,000 paying customers. MongoDB (the company) went public several years ago and today has a market capitalization around $11 billion. The company launched a cloud service in 2016 called Atlas that today accounts for more than 40 percent of the company’s revenue.
The IBM i server is widely used among the biggest companies in the world, and there are many IBM i shops among MongoDB’s customer base. At the company’s 2017 MongoDB World conference, the global bank HSBC discussed how it used MongoDB to create a new operational data store that would be used as the “single version of the truth” for its global equities and fixed income trading system.
HSBC’s architects were at first hesitant to use a “schema-less” approach, fearing that the whole thing could “spin out of control.” But as they learned more about the NoSQL database, they came to appreciate not only how it handles the data model, but other aspects of the database, like its built-in data replication. In the end, support for ACID helped convince the architects to use the database for the application.
But Wait, Could There Be More?
As previously mentioned, there are well over 300 databases in the world today. In the IBM i world, there are four officially supported – Db2, MySQL, MariaDB, and Redis – with another two (PostgreSQL and MongoDB) on the way. But if the IBM i server is going to support six databases, what’s to stop it from supporting a dozen or two more?
As you look at the list of the world’s top databases, there are several that stand out that could be a good fit. According to Erwin Early, a senior solutions consultant with Perforce (the company that now owns the Zend Server line of PHP runtimes and solutions), there’s at least one other database that could be a good fit.
“Another one that should be fairly easy on the IBM i platform . . . is Cassandra,” Early said in a recent COMMON iNSIGHT session titled “Exploring Open Source Databases on IBM i.”
“Cassandra is Java-based so it should be really a simple matter of just getting the source and bringing it over to the platform. It should just flat-out run. But I don’t know of anybody to date who has done that yet.”
Apache Cassandra is a NoSQL database that was based on Bigtable, a key-value store created by Google for storing large amounts of data and delivering fast read and write throughput with low latency. Cassandra, which is considered a wide column store, today supports multi-region global clusters, and while it can be complex to set up, is considered the gold standard for organizations that need to maintain the highest levels of data availability in a globally dispersed environment.
But why stop with MongoDB, Cassandra, and Redis? There are many other more exotic members of the NoSQL database family that enterprising IBM i customers could find a use for: Elasticsearch for log management and log analytics; Memcached for super-fast serving of data; Neo4j for graph data structures and analysis; Couchbase for edge-to-core analytics; Aerospike for extreme data serving performance across geographic clusters; and MarkLogic, one of the first multi-modal databases.
Supporting a database is no trivial matter. It’s where an organization stores its most important assets, so it has to be reliable. There’s no indication that Db2 will cease being the database of record for IBM i shops, even with the new databases that are coming to the platform. But clearly as the IBM i community broadens its acceptance of open source technologies, there will be advantages to supporting specialized databases that can do things that Db2 was never designed to do. And that’s a very good thing.