IBM Introduces watsonx for Governed Analytics, AI
May 17, 2023 Alex Woodie
At its annual Think conference last week, IBM took the wraps off watsonx, a new platform for crunching big data and developing AI and machine learning applications in a safe and governed environment.
You don’t need an industry analyst to tell you that artificial intelligence (AI) is having a moment in the sun this year. Thanks to the public launch of ChatGPT late last year, regular citizens (as opposed to AI experts and industry analysts) have become aware of the vast potential of large language models (LLMs) that can mimic humans in very convincing ways.
While LLMs like ChatGPT aren’t “intelligent” in the classic sense, they can still do some pretty impressive things, such as compose sonnets in the style of William Shakespeare, write high school term papers, or even generate syntactically correct RPG code.
Now the rush is on for businesses to take advantage of LLMs as well as their image-generating AI cousins, such as OpenAI’s DALL-E 2 and Stable Diffusion. Among the firms looking to find a foothold in this fast-moving world is IBM.
Watsonx isn’t IBM’s first foray into AI. It’s not even the first time Big Blue has used the name (anybody remember IBM Watson Analytics?). But the new watsonx offerings represent IBM’s attempt to build off the renewed interest in AI, including but not limited to generative AI like ChatGPT.
Watsonx is composed of three parts: a development studio called watsonx.ai, a data lakehouse called watsonx.data, and a governance toolkit called (you guessed it) watsonx.governance. The first two watsonx components are expected to be available in July, while the last one is due in October.
Watsonx.ai provides a place for developers to train, test, tune, and deploy traditional machine learning models (think logistic regression and K-means clustering) as well as new generative AI capabilities. The studio ships with a collection of “foundation models” that have already been trained on a “large, curated set of enterprise data backed by a robust filtering and cleansing process and auditable data lineage,” IBM says.
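For readers who have not worked with the traditional techniques named above, here is a minimal sketch of K-means clustering in plain Python. It is purely illustrative and has nothing to do with how watsonx.ai implements anything; the function name, the sample points, and the starting centroids are all made up for the example.

```python
# Minimal k-means sketch in pure Python (illustrative only, not watsonx.ai code).
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, centroids, iterations=10):
    """Cluster points around the given starting centroids (Lloyd's algorithm)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(c)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters


# Hypothetical sample data: two obvious groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

With these inputs the algorithm settles after two passes, with three points in each cluster. A real project would more likely reach for a library implementation than hand-roll the loop.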
Users will gain access to LLMs as well as models suitable for training upon source code, time-series data, tabular data, geospatial data, and IT events data, IBM says.
Three groups of models will be included in the watsonx.ai studio. The fm.code models will include code generation models that will be useful for developers who desire a “copilot” experience when writing code. The fm.NLP collection will include LLMs for specific domains that can be tuned by users. Lastly, the fm.geospatial model will be trained on climate data for assistance in planning for natural disasters.
IBM is also planning on including thousands of open source models developed by Hugging Face, which creates generative models suitable for building chatbots and interactive AI applications. It’s also planning to include Watson Code Assistant, Watson Assistant, Watson Orchestrate, and AIOps Insights as foundational models in watsonx.ai, the company says.
Watsonx.data serves as a lakehouse, a relatively new data architecture that blends elements of well-governed data warehouses, built on trusted relational database technologies like Db2, Oracle, or Teradata, with the more scalable but messier data lakes based on HDFS or object storage systems like Amazon S3. IBM has an S3-compatible object store of its own in Cleversafe, which it acquired several years ago and which today forms the basis for IBM Cloud storage.
Watsonx.data will be available both on-prem and in the cloud, and will enable users to bring the analytic engines of their choice to bear on data stored there. That includes IBM’s well-known Db2 and Netezza engines, but also popular open source options like Presto and Apache Spark. IBM’s recent acquisition of Ahana, a Silicon Valley startup that developed a cloud service based on Presto, will likely play a role here in the near future.
IBM will support multiple file formats in its watsonx.data lakehouse, including Parquet, Avro, and ORC, which came out of the Apache Hadoop data lake era. A key development is that IBM is also supporting Apache Iceberg, a relatively new table format that layers on the consistency and transactionality needed to ensure trust in data, which the older Hadoop-era formats lacked.
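The difference between a file format and a table format can be hard to picture, so here is a toy sketch of the general idea: data files are never changed in place, and the table is just a series of snapshots, each listing which files are visible. A write becomes visible only when a new snapshot is published in one atomic step. Every name below (ToyTable, commit_append, the file names) is hypothetical, and this is not the Iceberg API or how watsonx.data works internally.

```python
# Toy illustration of a "table format" layered over immutable data files.
# All class and method names here are hypothetical; this is not the Iceberg API.
import threading


class ToyTable:
    def __init__(self):
        self._lock = threading.Lock()
        self._snapshots = [()]  # snapshot 0: an empty table

    def current_snapshot_id(self):
        return len(self._snapshots) - 1

    def files(self, snapshot_id=None):
        """Return the data files visible in a snapshot (default: the latest)."""
        if snapshot_id is None:
            snapshot_id = self.current_snapshot_id()
        return self._snapshots[snapshot_id]

    def commit_append(self, base_snapshot_id, new_files):
        """Atomically publish a new snapshot, or refuse if the base is stale."""
        with self._lock:
            if base_snapshot_id != self.current_snapshot_id():
                return False  # another writer committed first; caller must retry
            self._snapshots.append(self._snapshots[-1] + tuple(new_files))
            return True


table = ToyTable()
base = table.current_snapshot_id()
ok = table.commit_append(base, ["sales-0001.parquet", "sales-0002.parquet"])
stale = table.commit_append(base, ["sales-0003.parquet"])  # built on an old snapshot
```

Here the first commit succeeds and the second is refused because it was prepared against a snapshot that is no longer current. Readers always see a complete snapshot, never a half-written one, and older snapshots remain readable by ID.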
Finally, there is watsonx.governance, which is an AI governance toolkit designed to help users build trusted AI workflows. IBM says this toolkit will help customers operationalize governance while mitigating risk associated with building AI models. The governance toolkit will provide the mechanisms to “protect customer privacy, proactively detect model bias and drift, and help organizations meet their ethics standards” while providing transparent and explainable outcomes.
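IBM has not said how the toolkit will detect drift. One widely used approach in the industry is the population stability index (PSI), which compares how a model’s scores were distributed at training time with how they are distributed today. The sketch below shows that calculation in plain Python; the function names, bin edges, and sample scores are invented for illustration and are not taken from watsonx.governance.

```python
# Population stability index (PSI) sketch; not IBM's method, just a common technique.
from bisect import bisect_right
from math import log


def bin_fractions(values, edges, floor=1e-6):
    """Share of values in each bin; `edges` are the interior bin boundaries."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    # Floor empty bins at a tiny value so the logarithm below stays defined.
    return [max(c / len(values), floor) for c in counts]


def population_stability_index(baseline, recent, edges):
    """PSI = sum over bins of (recent% - baseline%) * ln(recent% / baseline%)."""
    base = bin_fractions(baseline, edges)
    new = bin_fractions(recent, edges)
    return sum((n - b) * log(n / b) for b, n in zip(base, new))


# Hypothetical model scores: the recent batch skews much higher than the baseline.
baseline_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
recent_scores = [0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 0.9]
edges = [0.25, 0.5, 0.75]

same = population_stability_index(baseline_scores, baseline_scores, edges)
shifted = population_stability_index(baseline_scores, recent_scores, edges)
```

Identical distributions produce a PSI of zero, and the value grows as the two distributions pull apart. Practitioners often treat values above roughly 0.2 as a signal worth investigating, though that threshold is a rule of thumb rather than a standard.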
“With the development of foundation models, AI for business is more powerful than ever,” Arvind Krishna, IBM’s chairman and CEO, stated in a press release. “We built IBM watsonx for the needs of enterprises, so that clients can be more than just users, they can become AI advantaged. With IBM watsonx, clients can quickly train and deploy custom AI capabilities across their entire business, all while retaining full control of their data.”
There don’t appear to be any specific IBM i connections at this time. But there are likely few hurdles to moving data out of Db2 for i and into the new IBM lakehouse, aside from the traditional data integration and ETL headaches that perpetually plague developers and data engineers. For IBM i shops looking to begin exploring how AI could impact their businesses, watsonx is another offering that may be worth checking out when it becomes available in July.
For more information on watsonx, check out www.ibm.com/watsonx.