Guru: An Introduction to RPG’s XML-INTO, Part 1
August 1, 2018 Jon Paris
Author’s Note: The original version of this article was written in the V6 timeframe and included references to V5R4. References to the V5R4 limitations have been removed from this updated version. I have also updated the data definitions to take into account RPG’s ability for the direct coding of nested data structures rather than having to use LikeDS as before.
RPG IV’s built-in XML support has been available for some time now, having been originally introduced with V5R4 back in 2006. However, it wasn’t until the advent of V6 with its removal of many of RPG’s size limits that it became the powerful tool it is today.
This story contains code, which you can download here.
In this series of tips I am going to start with the basics and then, as we progress, illustrate how to handle the more difficult situations that can present themselves. That said, let’s get started.
XML-INTO: A “Move The Mountain” Op-Code
The heart of RPG’s XML support is XML-INTO. On the surface it is a deceptively simple op-code, using only two factors. The first identifies the target for the extracted data and the second supplies information about the XML source. As you will see, it does a lot under the covers, so much so that I put it into the “move the mountain” category along with other powerful operations such as EXFMT.
This is the basic syntax:
XML-INTO xmlTarget %XML( xmlSource : processingOptions );
“xmlTarget” is where the extracted data will be placed. It can be almost anything: a simple field, an array, a Data Structure (DS), a DS array, and so on. The reason there are so many possibilities is that the “shape” (i.e., structure) of the target must match that of the original XML document. RPG determines that shape from the names of the variables and their hierarchy. This is one of those cases where it is far easier to explain what I mean with a couple of simple examples.
Suppose that the XML document contains address information for a number of customers and looks like this:
<Customers>
  <Customer>
    <Name>Brown and Sons</Name>
    <City>San Jose</City>
    <State>CA</State>
  </Customer>
  <Customer>
    <Name>Smith and Jones Inc.</Name>
    ....
</Customers>
Then the target would have to look like this:
Dcl-ds customer Dim(99) Qualified;
  name  char(40);
  city  char(40);
  state char(2);
End-ds;
Because the <Customer> element repeats, it has to be represented as an array. And because it is a compound element (consisting of the three elements Name, City, and State) it must, in RPG terms, be represented as a DS. Note that in this example I have placed the name, city, and state fields in the same sequence in the DS as they were in the XML document. This is not essential but in this case makes sense. In XML-INTO terms, the only requirement is that the fields be in the same hierarchical position, i.e. subordinate to customer. You’ll see this in action in the second example, where I deliberately changed the sequence.
If we assume that the XML document is contained within the IFS file Customers.xml in directory XMLDocs, then the XML-INTO operation needed to process the document would be:
XML-INTO customer %XML( fileLocn: 'doc=file case=any');
Where the character variable fileLocn would contain the value /XMLDocs/Customers.xml.
Note that I have used two processing options here. The first, “doc=file”, tells RPG that the variable “fileLocn” contains the name of the file to be processed. Without this option RPG would assume that the variable contained the XML itself. You will forget to add this option (we all do) and when you do there will be a runtime error indicating that the XML document does not appear to be valid. Not surprising, since it is trying to process a file name as if it were XML!
The second option, “case=any”, is one that you will have to use just about every time you use XML-INTO. Why? Because XML element names are case sensitive and in order to match them up to the RPG definitions the names must first be converted to upper-case since that is how all RPG names are seen by the compiler.
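To make the effect of leaving out “doc=file” concrete, here is a minimal sketch in which the XML is held in a variable rather than in an IFS file. The variable name xmlText and the one-customer document it contains are my own assumptions for illustration, and the fully free-form layout (**free) assumes a compiler level that supports it.

```rpgle
**free
// Sketch only: xmlText and its contents are illustrative assumptions,
// not part of the downloadable code for this article.
Dcl-ds customer Dim(99) Qualified;
  name  char(40);
  city  char(40);
  state char(2);
End-ds;

Dcl-s xmlText varchar(500);

xmlText = '<Customers>'
        + '<Customer><Name>Brown and Sons</Name>'
        + '<City>San Jose</City><State>CA</State></Customer>'
        + '</Customers>';

// No doc=file option, so RPG treats the content of xmlText
// as the XML document itself rather than as a file name.
XML-INTO customer %XML( xmlText : 'case=any' );

*InLR = *On;
```

If you were to add “doc=file” to the options in this version, RPG would instead try to open a file whose name is the XML text, which is the mirror image of the mistake described above.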
Once the XML-INTO operation completes, all that remains is to process the data that was retrieved. To do that we will almost certainly need to determine how many Customer elements were found. RPG supplies a very simple solution by placing a count of the number of elements filled in positions 372 – 379 of the program status data structure (PSDS) as a 20-digit integer. This count can be used to loop through each of the filled elements and process them. You will see this in action in my sample programs. The count field is only valid when, as in our example, the target of the XML-INTO is an array. In the original release of the XML support, this was the only kind of element count supported. You’ll learn about the enhanced support in the next part of this series.
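As a rough sketch of how that count might be used, the following defines the PSDS subfield at position 372 and loops through the filled elements. The names pgmStatus, xmlElements, and i are my own choices for this illustration, fileLocn is defined as varying-length for convenience, and DSPLY simply stands in for whatever real processing you need.

```rpgle
**free
// Sketch only: pgmStatus, xmlElements, and i are illustrative names.
Dcl-ds pgmStatus PSDS;
  xmlElements int(20) pos(372);   // Positions 372 - 379 of the PSDS
End-ds;

Dcl-ds customer Dim(99) Qualified;
  name  char(40);
  city  char(40);
  state char(2);
End-ds;

Dcl-s fileLocn varchar(256) inz('/XMLDocs/Customers.xml');
Dcl-s i int(10);

XML-INTO customer %XML( fileLocn : 'doc=file case=any' );

// Process only the elements that XML-INTO actually filled
For i = 1 to xmlElements;
  Dsply customer(i).name;
EndFor;

*InLR = *On;
```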
A More Complex Example
By now you are certain to have realized that very few XML documents will be this simple. For example, what if the Customer ID was included in the document as an attribute, and perhaps the City and State were part of a compound element named Address? XML for such a document would look something like this:
<Customers>
  <Customer Id="B012345">
    <Name>Brown and Sons</Name>
    <Address>
      <City>San Jose</City>
      <State>CA</State>
    </Address>
  </Customer>
Since a compound element maps to a DS in RPG terms we now have the need to nest one DS (Address) inside another (Customer). Luckily RPG gave us this facility back in V5R2. But how to handle the attribute ID? Turns out this is very simple. Basically, an attribute of an element is treated as being at the same hierarchical level as a child of that element. So the way we code the RPG structure is exactly the same as if the XML had been:
<Customers>
  <Customer>
    <Id>B012345</Id>
The changes needed to process this revised format are shown here:
Dcl-ds customer Dim(99) Qualified;
  id    char(7);
  name  char(40);
  dcl-ds address;
    state char(2);
    city  char(40);
  End-ds;
End-ds;
As you can see, by using RPG IV’s ability to directly code nested DSs it is easy to match the “shape” of the DS to the XML document. I have also added the “id” field to the structure.
For those of you unfamiliar with nested data structures such as these, I will just point out that in order to reference the ID field for a particular customer element you would refer to customer(index).id. To reference the state field in the address you would code customer(index).address.state.
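Here is a small sketch of those qualified references in use. It assumes that fileLocn points to a document in the revised format (the path shown is simply reused from the first example), and the names pgmStatus, xmlElements, and i are again my own.

```rpgle
**free
// Sketch only: the file path and the names pgmStatus, xmlElements,
// and i are assumptions made for illustration.
Dcl-ds pgmStatus PSDS;
  xmlElements int(20) pos(372);   // Count of array elements filled
End-ds;

Dcl-ds customer Dim(99) Qualified;
  id    char(7);
  name  char(40);
  dcl-ds address;
    state char(2);
    city  char(40);
  End-ds;
End-ds;

Dcl-s fileLocn varchar(256) inz('/XMLDocs/Customers.xml');
Dcl-s i int(10);

XML-INTO customer %XML( fileLocn : 'doc=file case=any' );

For i = 1 to xmlElements;
  // The Id attribute is referenced just like any other subfield
  Dsply ( customer(i).id + ' ' + %TrimR(customer(i).name) );
  // Nested DS subfields need the extra level of qualification
  Dsply ( %TrimR(customer(i).address.city) + ', '
        + customer(i).address.state );
EndFor;

*InLR = *On;
```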
If you would like to play with these examples you can download the code here.
In the next edition of Four Hundred Guru Classic, I will be taking a look at how to deal with repeating elements that occur within the body of the XML document and the basics of how to deal with optional elements.
Jon Paris is one of the world’s foremost experts on programming on the IBM i platform. A frequent author, forum contributor, and speaker at User Groups and technical conferences around the world, he is also an IBM Champion and a partner at Partner400 and System i Developer. He hosts the RPG & DB2 Summit twice per year with partners Susan Gantner and Paul Tuohy.