How To Contribute To IBM’s GenAI Code Assistant For RPG
July 15, 2024 Timothy Prickett Morgan
Back in October 2023, IBM launched its “Project Hopper” Watsonx Code Assistant for Z, which, as the name suggests, is a programming assistant that will eventually be built into the open source VS Code integrated development environment created by Microsoft, and which is being explicitly trained to help programmers take applications coded in COBOL and convert them to Java. We speculated back then about how LLMs and GenAI might be used to do similar – and different – things for the IBM i platform, and back in May at the POWERUp 2024 conference, Steve Will, the chief architect of the IBM i platform, confirmed that Big Blue was building RPG assist tools.
Now, IBM needs RPG code – lots of RPG code – to train its code assistant model. And it is soliciting help from the world’s RPG programmers. Which means you.
The resulting AI model trained on this donated code is almost certainly not going to be used to move RPG code to Java along the same lines as the code assistant for Z. (Although IBM will certainly have that capability with its AI models if it becomes something that customers want.) We can envision a whole lot of code assistants for IBM i, which could help generate new RPG, COBOL, Java, PHP, and Node.js application snippets. We have thought all along that a code assistant for RPG would do translations from prior RPG code to modern free form RPG – translation is what the large language models, or LLMs, at the heart of generative AI platforms were initially created to do – and Will has endorsed this idea in the past and is doing so now.
IBM has not said specifically what the plan is yet, but there are lots of possibilities, which Will hinted at in his POWERUp keynote back in May and which he discussed at length in a webcast last week. (That webcast, called IBM i & AI – Strategy & Update, will eventually be posted at this COMMON Guide Tours 2024 link.) Here is the current list of possibilities that Big Blue is considering, and your input, if you provide it, will help Will prioritize the work:
There are a lot of different LLMs that Team Rochester could have used to create Code Assistant for RPG – among them StarCoder from ServiceNow, Jurassic-2 from AI21 Labs, Luminous from Aleph Alpha, Phi-2 from Microsoft, and Llama 2 and Llama 3 from Meta Platforms. But according to Will, the IBM i folks have chosen IBM’s own Granite family of LLMs. IBM has three different families of AI models of its own in the Watsonx stack, which also includes dozens of other open source LLMs. All of the models are named after rocks. The Slate models are encoder-only models, which means they cannot be used for GenAI but they are good at classification and entity extraction work. The Granite models are decoder-only models, and they are used only for generative tasks. And the Sandstone models employ a mix of decoder and encoder AI approaches and can be used for generative and non-generative work.
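For those who like to see that encoder/decoder split in code rather than in rock metaphors, here is a minimal sketch using the open source Hugging Face transformers library; the checkpoints named are generic public models, not IBM’s Slate or Granite weights:

```python
# A minimal sketch of the encoder-only versus decoder-only split described above,
# using the open source Hugging Face transformers library. The checkpoints named
# here are generic public models, not IBM's Slate or Granite weights.
from transformers import AutoModelForCausalLM, AutoModelForSequenceClassification

# Encoder-only (Slate-style): suited to classification and entity extraction.
classifier = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Decoder-only (Granite-style): suited to generative tasks such as code completion.
generator = AutoModelForCausalLM.from_pretrained("gpt2")

print(type(classifier).__name__, type(generator).__name__)
```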
Given that Code Assist for RPG is about generating new code or converting code, Team Rochester is using the Granite model developed by IBM Research and embedded in the Watsonx stack.
The Granite model spans an unknown number of parameters, but we have to figure that number must be high enough to be effective but not so high as to require more performance than IBM Research has on its fairly modest GPU cluster, which is called “Vela” and which we wrote about over at The Next Platform here. This machine has a mere 360 nodes, each with eight “Ampere” A100 GPU accelerators from Nvidia, for a total of 2,880 GPUs. Those GPUs have 27.9 petaflops of peak 64-bit floating point (FP64) performance and 898.6 petaflops of peak FP16 performance, which is useful for training AI models. (For inference, INT8 quarter precision is available, and the throughput doubles to 1.8 exaops.) This is a big enough machine to create fairly large models over the course of weeks to months. Put a gun to our head and we would guess Granite tops out somewhere around 300 billion parameters, maybe 400 billion.
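If you want to check our math on those peak throughput figures, here is a quick back-of-the-envelope sketch; the per-GPU numbers are the dense (no sparsity) peaks from Nvidia’s A100 datasheet, with FP64 taken from the vanilla vector units rather than the tensor cores:

```python
# Back-of-the-envelope check on the "Vela" peak-throughput figures cited above.
# Per-GPU peaks are the dense (no sparsity) numbers from Nvidia's A100 datasheet.

NODES = 360
GPUS_PER_NODE = 8
GPUS = NODES * GPUS_PER_NODE                    # 2,880 GPUs

A100_FP64_TFLOPS = 9.7    # vanilla FP64 vector units, not the tensor-core path
A100_FP16_TFLOPS = 312.0  # FP16 tensor cores, dense
A100_INT8_TOPS   = 624.0  # INT8 tensor cores, dense

print(f"GPUs: {GPUS}")
print(f"FP64: {GPUS * A100_FP64_TFLOPS / 1e3:.1f} petaflops")   # ~27.9 PF
print(f"FP16: {GPUS * A100_FP16_TFLOPS / 1e3:.1f} petaflops")   # ~898.6 PF
print(f"INT8: {GPUS * A100_INT8_TOPS / 1e6:.2f} exaops")        # ~1.8 exaops
```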
You do not have to use a model with its full parameter span, however. And many do not, often because it is not necessary for the job, and more often because the amount of compute needed to train a model is directly proportional to the number of parameters that the model spans. If you want trillions of parameters, you need tens of thousands of GPUs to train a model in a reasonable amount of time, which will cost roughly $1 billion at today’s prices. That’s because the memory capacity and memory bandwidth needed to feed the GPUs are proportional to the amount of data you train on. If GPUs had more HBM stacked memory, you would need fewer of them. But GPU memory from Micron Technology, Samsung, and SK Hynix is scarce and expensive, and so AI accelerator makers (including those making GPUs) skimp on the memory as much as possible and end users get lower utilization than they might otherwise get on these very expensive devices. And, one other thing: Nvidia gets filthy stinking rich because it can corner the market on available HBM memory. This is why it takes months to train a big model like GPT-4. One way or the other, you pay with money or time.
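To put a rough number on that proportionality, the widely used rule of thumb is that training a dense transformer takes about 6 floating point operations per parameter per training token. A minimal sketch, with the parameter counts, token counts, and utilization figure below chosen by us purely for illustration:

```python
# Rough training-compute estimator using the common "6 * N * D" rule of thumb
# (about 6 FLOPs per parameter per training token for a dense transformer).
# The parameter counts, token counts, and utilization below are illustrative assumptions.

def training_flops(params: float, tokens: float) -> float:
    return 6.0 * params * tokens

def training_days(params, tokens, gpus, peak_tflops_per_gpu=312.0, utilization=0.4):
    """Days to train at the given sustained fraction of peak FP16 throughput."""
    sustained = gpus * peak_tflops_per_gpu * 1e12 * utilization   # FLOPs per second
    return training_flops(params, tokens) / sustained / 86_400

# An 8 billion parameter model on ~1 trillion tokens, on a Vela-sized 2,880-GPU cluster:
print(f"{training_days(8e9, 1e12, 2_880):.0f} days")     # a couple of days

# A 400 billion parameter model on the same data and the same cluster:
print(f"{training_days(400e9, 1e12, 2_880):.0f} days")   # a couple of months
```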
Will said that the RPG project is using the Granite 8B model, which means it spans only 8 billion parameters. This is a fairly modest model, but that might be a good thing. If the model has fewer parameters, its inference will take a lot less compute, which probably means it can run on the matrix math units inside of Power10 and later CPUs. It also means that IBM can train and retrain the model on a fairly modest cluster to keep improving it iteratively over time.
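Here is a rough sense of why 8 billion parameters matters for running inference close to the iron: each generated token costs on the order of 2 flops per parameter, and the weights have to fit in memory at whatever precision you quantize down to. The precisions in this sketch are our own illustrative assumptions, not anything IBM has committed to:

```python
# Why a small model matters for inference: rough per-token compute and memory
# footprint for an 8 billion parameter decoder. The precisions listed are
# illustrative assumptions, not anything IBM has announced.

PARAMS = 8e9

# Roughly 2 FLOPs per parameter per generated token for a dense decoder-only model
flops_per_token = 2 * PARAMS
print(f"~{flops_per_token / 1e9:.0f} GFLOPs per generated token")   # ~16 GFLOPs

for name, bytes_per_param in [("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{name}: ~{PARAMS * bytes_per_param / 1e9:.0f} GB of weights")
```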
The data that was used to train the Granite models comes largely from the Internet, and Will talked about how this is all boiled down from 6 petabytes of raw, multilingual data to 2.4 terabytes of usable data, which is then chopped up into more than 1 trillion tokens for training. (A token is a portion of a word, and is the unit of data in LLMs.)
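As a quick sanity check on those data pipeline numbers – treating “more than 1 trillion tokens” as simply 1 trillion – the filtering ratio and bytes-per-token work out like this:

```python
# Quick sanity check on the data-pipeline numbers quoted above.
# "More than 1 trillion tokens" is rounded down to 1 trillion for the arithmetic.
raw_bytes    = 6e15      # 6 PB of raw, multilingual data
usable_bytes = 2.4e12    # 2.4 TB after filtering and deduplication
tokens       = 1e12      # roughly 1 trillion training tokens

print(f"Filtering ratio: ~{raw_bytes / usable_bytes:,.0f} to 1")   # ~2,500 to 1
print(f"Bytes per token: ~{usable_bytes / tokens:.1f}")            # ~2.4 bytes per token
```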
Large language models are one class of transformer models, also known as foundation models, because they are trained using the statistical weightings of a large corpus of data to transform input data into a different kind of output data, such as converting a phrase from English into Chinese. You can train such a model with brute force against the largest clean dataset you have and a large number of parameters, and then you can refine that tuning with a more precise dataset. If you want to generate RPG code, you take a pretrained Granite model and you retrain it using a curated corpus of RPG code. And if you want to be able to speak in plain English to the code assistant and have it create RPG code, what you really need are a bunch of pairings of RPG code and English language explanations of what that code does.
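What might such a pairing look like? The sketch below is purely hypothetical – the field names and structure are our own invention, not IBM’s actual submission format, which is documented in the rpg-genai-data repository linked below – but it gives the flavor of a code-plus-explanation training record:

```python
# A hypothetical example of the kind of RPG-plus-English pairing described above.
# The field names and JSON structure here are illustrative assumptions, not IBM's
# actual submission format.
import json

pair = {
    "instruction": "Explain what this free form RPG procedure does.",
    "code": (
        "dcl-proc addTax;\n"
        "  dcl-pi *n packed(9:2);\n"
        "    amount packed(9:2) const;\n"
        "  end-pi;\n"
        "  return amount * 1.08;\n"
        "end-proc;"
    ),
    "answer": "The addTax procedure takes a packed-decimal amount and returns "
              "that amount with an 8 percent tax added.",
}

print(json.dumps(pair, indent=2))
```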
And that is exactly what Will and his colleagues at IBM are asking the IBM i community to donate. IBM has a bunch of its own RPG code, which it uses to create and enhance the RPG compilers and to test changes it makes in the code, but it needs more than that. IBM has also contacted coding legends in the IBM i community – Susan Gantner, Jon Paris, Scott Klement, Jim Buck, Paul Tuohy, Niels Liisberg, Yvonne Enselman, Mats Lidstrom, Koen DeCorte, and Steve Bradshaw were named, and others are offering their code as we go to press – to get good RPG code upon which to train the code assistant for RPG model. And now, you can help.
IBM wants to collect question/answer pairs for RPG code as well as RPG code itself, and it will have a means to grant rights to use the contributions for both purposes. The licensing will also differentiate between code that you are fine with the outside world seeing and code that only IBM can see as it trains its model for Code Assist for RPG. You get to choose, and you can sign up at https://ibm.github.io/rpg-genai-data/#/.
IBM will be collecting RPG code and code-explanation pairs throughout the third quarter (which means between now and the end of September) and will take another run at retraining the Granite 8B LLM using this data.
What IBM has not decided as yet is when the Code Assistant for RPG will be available, what the delivery mechanism will be, and what it might cost. Hopefully it is just a plug-in for VS Code with a modest price that does not require customers to buy the Watsonx stack – but don’t be surprised if the IBM marketeers go down that road to try to push the Watsonx stack.
RELATED STORIES
IBM Developing AI Coding Assistant for IBM i
The Time Is Now To Get A GenAI Strategy
Top Priorities in 2024: Security and AI
Thoroughly Modern: Proceed With Caution With AI In The Landscape Of Cybersecurity
IBM i Shops Are Still Getting Their Generative AI Acts Together
IBM To Add Generative AI To QRadar
How Long Before Big Blue Brings Code Assist To IBM i?
Generative AI Is Part Of Application Modernization Now
Sticking To The Backroads On This Journey
With Fresche’s New CEO, There Are No Problems, Just Solutions
Enterprises Are Not Going To Miss The Fourth Wave Of AI (The Next Platform)
IBM Introduces watsonx For Governed Analytics, AI
Technology Always Replaces People While Augmenting Others