Guru: RPG’s New DATA-INTO
June 18, 2018 Jon Paris
In this tip I’m going to give a brief introduction to the latest addition to the RPG language: the new DATA-INTO op-code. DATA-INTO is IBM’s response to the oft-asked question: “When is IBM going to introduce JSON-INTO so we can process JSON as easily as XML?”
DATA-INTO provides this capability, but IBM has very cleverly given it functionality that goes way beyond what a simple JSON-INTO op-code could ever have done. DATA-INTO is effectively a cross between XML-INTO and Open Access. Like XML-INTO it uses the names of items and their hierarchy to unpack the document into RPG variables. Like Open Access it provides an extensible framework that can handle just about any form of structured data. It can be applied to CSV files, properties files and, of course, JSON. Perhaps even more importantly for the future it can readily be applied to whatever is the next “big thing” to come along.
It is able to do this because the parser that interprets the raw data is supplied by you. This does not mean that you personally will have to write a parser before you can use the new facility. IBM supplies a couple of examples and others will be available from third parties. More on this later.
This story contains code, which you can download here.
The basic syntax should look very familiar to those of you who have used XML-INTO. Here’s an example from the program I will be discussing later:
DATA-INTO orders %Data( jsonData: 'case=any') %Parser('*LIBL/JSONPARSE');
The first parameter (orders) identifies the target for the operation, i.e. the variable into which the results of the operation will be placed. Typically this will be a data structure or data structure array. We will look at how RPG maps the content of the document being processed to the target structure when we discuss the sample program.
The second parameter (%DATA) names the source of the data to be parsed (jsonData) and the processing options to be applied (‘case=any’). DATA-INTO uses the same processing options that may already be familiar to you from XML-INTO. I will discuss some of these options when I work through the example.
The third parameter (%PARSER) identifies the program or subprocedure that will perform the parsing operation. The BIF has an optional second parameter that allows parser-specific parameters to be passed. For example, the IBM-supplied JSON parser that I will be using in this program has two parameters available: “diagMessages”, which controls debugging information, and “boolean”, which controls how JSON true/false values are handled.
Speaking of IBM-supplied parsers . . . IBM supplies the source code (all written in RPG) for three parsers. The first, and the one that many have been waiting for, parses JSON. Its name is JSONPARSE and for simplicity it is the one that I will be using in my example. I should mention, however, that this is not intended as a production-grade parser. It is provided by IBM as an example of how to write a parser.
The second and third parsers are variations on property file parsers. By “property file” I mean data that is structured in keyword/value pairs, such as DATA-INTO’s own processing options, e.g. ‘case=any’. One is designed to handle property files where each property is on a separate line. The other is designed to handle properties where the keyword/value pairs are separated by a character (typically a semi-colon), for example key1=value1 ; key2=value2 ; and so on …
Time For An Example
The JSON file to be parsed consists of a simplified orders document which contains details of the item codes and quantities being ordered by a specific customer. Here’s an extract from the document in “pretty” format to make it easier for you to see the structure. Some explanations follow the Orders document.
(A) { "Orders" : [
(B)   { "Customer" : "B012345",
        "Items" : [
(C)       { "Code" : "12-345", "Quantity" : 120 },
          { "Code" : "12-678", "Quantity" : 10 }
        ]
      },
(D)   { "Customer" : "C123456",
        "Items" : [
          { "Code" : "23-456", "Quantity" : 50 },
          ...
(A) Begins the document and identifies it as containing Orders. Since Orders will be represented by a JSON array the start of that array is indicated by the “[”.
(B) This is the first order in the document and identifies the Customer and the list (array) of Items required.
(C) Shows an individual item and consists of the Item Code and Quantity required. This line is repeated for as many items as there are in the order.
(D) Marks the beginning of the next Customer. This combination of Customer and Item lines will be repeated for all the orders in the document.
Now that we know what the JSON looks like, the first task is to create a data structure to map that data. DATA-INTO maps by name and hierarchy just as XML-INTO does. We begin with an orders data structure (E below) and, since it contains multiple orders, we designate it as an array DS. The size is up to you, but in my example I have assumed a maximum of 99 orders in a single document. The actual number processed will be placed in the 8-byte integer element count that starts in position 372 of the Program Status Data Structure (PSDS), just as it would be with XML-INTO.
The first field in the DS is customer (F), which is followed by a nested array DS to hold the items (G). In this case I assumed a maximum of 20 items per order.
The order lines themselves (H) consist of the item code followed by the quantity required.
Notice that the field and DS names match the names in the JSON document exactly (apart from upper/lower case, which the case=any option takes care of) and are at the same hierarchical levels. That is, orders contain customer and items. Items in turn contain code and quantity.
(E) dcl-ds orders Dim(99) Qualified;
(F)   customer char(7);
(G)   dcl-ds items Dim(20);
(H)     code char(6);
        quantity packed(5);
      end-ds items;
    end-ds orders;
This data structure would do the job, but it would require that we specify the DATA-INTO option “allowmissing=yes”, which is always a dangerous choice and to be avoided whenever possible. (If you don’t know why I say that, I suggest you read this article on XML-INTO, which describes this option.) So, rather than use allowmissing, I have added the field count_items to the DS and, as you will see in a moment, added the option countprefix=count_ as a parameter to the %DATA BIF. As a result, RPG will count the number of item elements that are found in each customer entry, and will not treat as an error any order that has fewer than the full 20 items.
This is the modified version of the DS that the program will use:
    dcl-ds orders Dim(99) Qualified;
      customer char(7);
(I)   count_items int(5);
      dcl-ds items Dim(20);
        code char(6);
        quantity packed(5);
      end-ds items;
    end-ds orders;
Now that we have the data structure in place we can code the DATA-INTO operation to process the document. Here it is:
DATA-INTO orders %Data( jsonData: 'case=any countprefix=count_') %Parser('QOAR/JSONPARSE');
The orders DS array is identified as the target for the extracted data. Next comes %DATA to identify the source of the JSON. In this example it is contained within the variable jsonData and the processing options to be used are case=any and countprefix=count_. Last but not least we use %PARSER to identify the program that will perform the actual parsing operation. In this case it is the IBM-supplied sample JSON parser JSONPARSE, which I compiled into the QOAR library.
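To show how the pieces described so far might fit together, here is a minimal free-form sketch. It is not the downloadable sample program. The JSON text is a hard-coded fragment of the extract shown earlier, and the names pgmStatus, orderCount, o and i, along with the MONITOR block, are my own assumptions for illustration. It also assumes that JSONPARSE has been compiled into QOAR and that the DATA-INTO PTFs are installed.

```rpgle
**free
// Sketch only: assumes QOAR/JSONPARSE exists and the DATA-INTO PTFs are applied.
ctl-opt dftactgrp(*no);

// Target structure: names and nesting mirror the JSON document
dcl-ds orders dim(99) qualified;
  customer    char(7);
  count_items int(5);          // filled in because of countprefix=count_
  dcl-ds items dim(20);
    code     char(6);
    quantity packed(5);
  end-ds items;
end-ds orders;

// PSDS: the number of orders elements loaded starts at position 372
// (pgmStatus and orderCount are hypothetical names)
dcl-ds pgmStatus psds;
  orderCount int(20) pos(372);
end-ds;

dcl-s jsonData varchar(500);
dcl-s o int(10);
dcl-s i int(10);

// Hard-coded test document based on the extract in the article
jsonData = '{ "Orders" : [ '
         + '{ "Customer" : "B012345", "Items" : [ '
         + '{ "Code" : "12-345", "Quantity" : 120 }, '
         + '{ "Code" : "12-678", "Quantity" : 10 } ] } ] }';

monitor;
  data-into orders %data(jsonData : 'case=any countprefix=count_')
                   %parser('QOAR/JSONPARSE');
on-error;
  // Any failure (parser not found, document/DS mismatch, ...) lands here
  dsply 'DATA-INTO failed';
  *inlr = *on;
  return;
endmon;

// Outer loop uses the PSDS count, inner loop uses the countprefix field
for o = 1 to orderCount;
  dsply ('Customer ' + orders(o).customer);
  for i = 1 to orders(o).count_items;
    dsply (orders(o).items(i).code + ' x '
           + %char(orders(o).items(i).quantity));
  endfor;
endfor;

*inlr = *on;
return;
```

The point of the two loops is that neither relies on the declared array sizes (99 and 20): the outer limit comes from the PSDS count and the inner limit from the count_items field that RPG populates for each order.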
Play Time!
The rest of my sample program is very simple so, rather than explain it all in detail, I suggest that you download it and play with it yourselves. You will, of course, need to ensure that you have the appropriate level PTFs on your system and that you have compiled JSONPARSE from the supplied source. Details of the PTFs required can be found in the RPG Cafe.
While you are “playing” you might like to try your hand with some different JSON documents. Be aware though that there are certain styles of JSON that the IBM parser will not handle without some code changes. For example, it is not uncommon in JSON to not bother with a name for the outer container (the Orders array in my example). The document would probably just start with an unnamed array like this:
[ { "Customer" : "B012345", "Items" : [ ...
As supplied, IBM’s JSON parser cannot handle this; it wants the top level to have a name, just as it would with an XML document. For documents like that you need either to make some small code changes to JSONPARSE or, better still, to switch to Scott Klement’s YAJLINTO parser, which you can find as part of his YAJL JSON package on his website.
I will be covering Scott’s parser and the options it provides along with some additional information on DATA-INTO in my next tip.
P.S. At this time I have no plans to write up a tip on “How To” write a parser for DATA-INTO, but if there are specific types of data you have that could benefit from the DATA-INTO approach and you would like to see how one goes about designing and building such a parser, please let me know via the comments.
RELATED STORIES
An Introduction to Processing XML With RPG, Part 2
An Introduction To Processing XML With RPG, Part 1: The Basics
Doesn’t this:
DATA-INTO orders %Data( jsonData: 'case=any')
%Parser('*LIBL/JSONPARSE');
in essence amount to something like this:
Orders = JSONPARSE(jsonData : optionsDS)
or
Orders = CSVPARSE(jsonData : optionsDS)
and so forth?
To clarify…
While I appreciate the functionality, could this functionality have been better surfaced as a service program rather than an op-code? It’s not really a new language capability per se, but a disguised procedure call. That procedure is now unavailable to (or at least hidden from) other languages, missing an opportunity to illustrate a benefit of ILE.
I’m glad you clarified because I was not sure what you meant.
I think you are underestimating the task. Just for one example, how does JSONPARSE know the names of the fields in the DS you just passed it? You can’t assume that the elements are in sequence for many reasons not the least of which being the common practice of omitting elements.
If you would prefer to see a simple subprocedure call in the mainline logic then wrap up the DATA-INTO within it. I quite often do that – but this is a teaching article so I don’t include “extra” stuff that might confuse the basic thrust of the topic.
Good day. I tried to compile this code but the compiler is complaining about the two fields “customer” and “count_items”. Also says “THE END-DS IS MISSING FOR A GROUP CONTAINING
END-DS IS ASSUMED”.
Sorry you’re having a problem but before I can help I need a little more detail please. There are two programs in the package – are they both giving problems? Which end-ds is flagged as missing? I’m looking at the source and everything is there.
Are you sure that your system is at the correct PTF level for DATA-INTO?
Any idea on when the YAJLINTO tip will be posted?
Apologies for the late responses to these questions – for some reason the system is not notifying me when comments are posted. I will try harder to make sure I check regularly.