Guru: Web Services, DATA-INTO and DATA-GEN, Part 2
April 12, 2021 Jon Paris
In Part 1 of this series I discussed the use of DATA-GEN and DATA-INTO to create “blog entries” via a web service. This time I am going to focus on using the GET HTTP method to retrieve blog entries. As you will see, the basic process is very similar.
I am going to start by retrieving a single blog post, but rather than retrieve all of the data associated with the post I will show you how to restrict processing to specific elements. I will then move on to look at two approaches to processing multiple posts. The first processes all of the data in one go, the second processes it in “batches.”
The web service has the same base URL that we used before, but this time we will use the GET HTTP method rather than POST. For a GET request, the web service checks whether the URL ends with a post number (i.e., a number after the final “/”). If it does, that specific post is retrieved. If you want to see what the result looks like, click this link. You should see the “raw” JSON data for blog post 15. If no post number is present, all available blog posts are returned. You can see this in action by clicking the first link in this paragraph. We will work through examples of both cases.
This story contains code, which you can download here.
Of course, viewing the JSON in the browser like this is useful for checking that the web service works without writing any code, but it is not terribly practical. So, just as before, I will use DATA-INTO to parse it. As you have just seen, for blog post 15 the web service returns JSON that looks like this:
```json
{
  "userId": 2,
  "id": 15,
  "title": "eveniet quod temporibus",
  "body": "reprehenderit quos ... fugiat vitae"
}
```
However, for the purposes of this exercise I am going to assume that I have no need for the data contained in the “body” element. Omitting it from processing reduces the amount of memory the program uses, since this element can obviously be quite large. It also speeds up processing a little. So this is the DS that I am going to use:
```rpgle
Dcl-DS responseData Qualified Inz;
  id int(5);
  title varchar(30);
  userid int(5);
End-Ds;
```
As I noted earlier, this service uses the GET method, and any parameters are simply part of the URL. As a result, the HTTPAPI call is much simpler. Here’s what that processing looks like now:
```rpgle
// Build URL to be used by adding blog Id to base URL
url = baseUrl + %Char(id);
...
response = HTTP_string( 'GET' : url );
```
Once we get the response from the web service we use DATA-INTO to populate the DS. The code looks like this:
```rpgle
Data-Into responseData %DATA( response : 'case=any' )
          %PARSER( 'YAJL/YAJLINTO' );
```
But there is a problem. If you were to compile and run this, you would receive a run-time error to the effect that “The document for the DATA-INTO operation does not match the RPG variable; Reason code 5.”
And if you were to check, you would find that reason code 5 is defined as:
“5. The document contains extra names that do not match subfields.”
The “extra names” of course refers to the “body” element in the JSON that I decided not to process. For DATA-INTO to process this document, I need to add the %DATA option ‘allowextra=yes’. This tells RPG that it is OK if the JSON contains elements that have no match in the target DS. Exercise a little caution when using this option, because it provides no granularity. There is no way of saying: “Ignore the ‘body’ element, but treat any other unexpected element as an error.” As a result, if the response from the web service were to change and include additional elements, we might never know about it, because the “allowextra” option would cause them to be silently ignored as well.
The resulting code looks like this:
```rpgle
Data-Into responseData %DATA( response : 'case=any allowextra=yes' )
          %PARSER( 'YAJL/YAJLINTO' );
```
And that is all there is to it. If you study the complete code for RPG program USEWEBSRV2 (which can be downloaded here), you will see that the program uses a Monitor group to trap any errors signaled by HTTPAPI. For this particular web service, an attempt to retrieve details for a non-existent blog post causes HTTPAPI to throw an error. For simplicity, I treat all errors as indicating a “Not found” status for the requested post. With many web services this simplistic approach to error handling will be sufficient. For others I might need to examine the exact error message returned and act accordingly. I will discuss how to do this in later tips.
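The error handling described above might be sketched like this. It is a minimal, hypothetical program, not the author's USEWEBSRV2. It assumes that HTTPAPI is installed in library LIBHTTP (copy member HTTPAPI_H, binding directory HTTPAPI) and that YAJL is in library YAJL. It also uses a placeholder base URL, a hypothetical "found" indicator and an assumed Varchar(100000) response buffer. Only the HTTP_string call sits inside the Monitor group, so a DATA-INTO failure is not mistaken for a missing post.

```rpgle
**free
// Hypothetical sketch (not USEWEBSRV2): fetch one post and treat any
// error signaled by HTTPAPI as "not found".
// Assumes HTTPAPI in LIBHTTP and YAJL in library YAJL.
Ctl-Opt DftActGrp(*No) BndDir('HTTPAPI');

/copy LIBHTTP/QRPGLESRC,HTTPAPI_H

Dcl-C baseUrl 'https://example.com/posts/';  // hypothetical base URL

Dcl-DS responseData Qualified Inz;
  id int(5);
  title varchar(30);
  userid int(5);
End-Ds;

Dcl-S id       Int(5) Inz(15);        // post to retrieve
Dcl-S url      Varchar(256);
Dcl-S response Varchar(100000);       // assumed large enough for one post
Dcl-S found    Ind Inz(*On);          // hypothetical status flag

url = baseUrl + %Char(id);

Monitor;
  response = HTTP_string( 'GET' : url );
On-Error;
  // Simplistic: every HTTPAPI error is reported as "not found"
  found = *Off;
EndMon;

If found;
  Data-Into responseData %DATA( response : 'case=any allowextra=yes' )
            %PARSER( 'YAJL/YAJLINTO' );
  Dsply ( 'Title: ' + responseData.title );
Else;
  Dsply ( 'Post Id # ' + %Char(id) + ' not found' );
EndIf;

*InLr = *On;
```

Keeping DATA-INTO outside the Monitor group means a parse failure (such as the reason code 5 error above) still surfaces as a genuine error instead of being reported as "not found".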
Processing Multiple Results
Earlier I mentioned that if there was no blog post number at the end of the URL, the web service would respond with a list of all blog posts. So how can I process that list?
There are two approaches to handling lists like this: all at once, or a “chunk” at a time. The first is best suited to cases where you can either control the number of items returned or know that there can never be more than a given number. The second is useful when the number of results is unknown. It is also useful when the data can be processed in pieces, for example if the data returned is to be displayed in a subfile. Before IBM i V6 arrived with its greatly increased size limits, this second approach was also often needed to process results that included very large fields (descriptions, for example). The higher limits reduce how often we need to do this.
Let’s start by looking at the “all at once” approach. In simple cases, such as our example, all we need to do to facilitate this is to have DATA-INTO target a DS array. So the target definition changes to this:
```rpgle
Dcl-DS responseData Qualified Inz Dim(200);
  id int(5);
  title varchar(30);
  userid int(5);
End-Ds;
```
How can we tell how many elements of the array DATA-INTO loaded? RPG provides this information as an 8-byte count (an unsigned 20-digit integer) starting at position 372 of the Program Status Data Structure (PSDS). In my program I defined it like this:
```rpgle
Dcl-DS *n PSDS;
  itemCount Uns(20) Pos(372);  // Populated by DATA-INTO
End-Ds;
```
And that is all that we need. After successfully invoking the web service I can then use the value in itemCount to control the processing of the array elements. In my test program USEWEBSRV3 I used it to control the number of elements scanned by the %LOOKUP operation, as you can see below.
```rpgle
DoU forever;  // the "forever" indicator is off and will never be set!

  // Ask for Blog Post Id and exit when requested
  Dsply 'Which post do you want to see? ( 0 to exit ):' ' ' item;

  // Quit if user requests it
  If item = 0;
    Leave;
  EndIf;

  index = %LookUp( item : responseData(*).id : 1 : itemCount );

  If index <> 0;
    Dsply ( 'Title: ' + responseData(index).title );
    Dsply ( 'Author: ' + %Char(responseData(index).userid) );
  Else;
    Dsply ( 'Post Id # ' + %Char(item) + ' not found' );
  EndIf;

EndDo;
```
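For context, here is one way the array-targeted DATA-INTO and the PSDS count might fit together in a complete program. This is a hypothetical sketch rather than USEWEBSRV3. It makes the same assumptions about library names (LIBHTTP, YAJL), the placeholder base URL and the response buffer size. The loop variable i and the filter on user 1 are purely illustrative.

```rpgle
**free
// Hypothetical sketch (not USEWEBSRV3): load all posts in one DATA-INTO
// and use the PSDS count to bound processing of the DS array.
// Assumes HTTPAPI in LIBHTTP and YAJL in library YAJL.
Ctl-Opt DftActGrp(*No) BndDir('HTTPAPI');

/copy LIBHTTP/QRPGLESRC,HTTPAPI_H

Dcl-C baseUrl 'https://example.com/posts/';  // hypothetical base URL

Dcl-DS responseData Qualified Inz Dim(200);
  id int(5);
  title varchar(30);
  userid int(5);
End-Ds;

Dcl-DS *n PSDS;
  itemCount Uns(20) Pos(372);  // elements set by the last DATA-INTO
End-Ds;

Dcl-S response Varchar(100000);  // assumed large enough for all posts
Dcl-S i        Int(10);

// No post number after the final "/" so every post is returned
response = HTTP_string( 'GET' : baseUrl );

Data-Into responseData %DATA( response : 'case=any allowextra=yes' )
          %PARSER( 'YAJL/YAJLINTO' );

Dsply ( %Char(itemCount) + ' posts loaded' );

// Only the first itemCount elements hold data; the rest keep their INZ values
For i = 1 to itemCount;
  If responseData(i).userid = 1;  // illustrative filter: posts by user 1
    Dsply ( %Char(responseData(i).id) + ': ' + responseData(i).title );
  EndIf;
EndFor;

*InLr = *On;
```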
Processing The Results In “Chunks”
In essence the process is simple. We merely change the target of the DATA-INTO operation so that, rather than naming a DS, it identifies a handler subprocedure. It is this subprocedure that receives the “chunks” of data. So we change from this:
```rpgle
Data-Into responseData %Data( response : 'case=any allowextra=yes' )
          %PARSER( 'YAJL/YAJLINTO' );
```
To this:
```rpgle
Data-into %Handler( ProcessPosts : itemCount )
          %Data( response : 'case=any allowextra=yes' )
          %PARSER( 'YAJL/YAJLINTO' );
```
Notice that %HANDLER has replaced the DS name responseData. As far as DATA-INTO is concerned no other change is needed. The first parameter to %HANDLER identifies the subprocedure to do the processing. The second is known as the communications area and it is used to pass information between the code containing the DATA-INTO and the processing subprocedure. This is needed because your RPG code is not invoking your subprocedure directly, but rather indirectly via the RPG run-time. The use of such a value will be more obvious when we look at the code for the handler subprocedure.
I said that we would be processing the data in “chunks,” so how do we control the size of the chunk? The answer is very simple, although perhaps not immediately obvious. We control it by specifying in the subprocedure’s interface the number of elements to be handled at a time via a DIM. Here’s the procedure interface for my test program USEWEBSRV4.
```rpgle
Dcl-Proc ProcessPosts;
  Dcl-Pi *N Int(10);
    count    Int(5);
    postData LikeDs(responseData_T) Dim(40) Const;
    items    Uns(10) Value;
  End-Pi;
```
There are three parameters passed by RPG to my subprocedure.
- The first is the communications area I mentioned earlier.
- The second is effectively the -INTO target variable. It contains the data parsed out by RPG. As you can see, in this case I have chosen to process 40 items at a time, hence the DIM(40). Note that even if I want to process only one item at a time, I still have to code the parameter as a DS array. In other words, it would have to be coded as DIM(1).
- The third is a count supplied by RPG of the number of elements that have been populated in the -INTO array. Since my DS array has 40 elements this number will always be 40, except on the last call when it could be anything from 1 to 40. It will never be zero because if there is nothing to process your handler will not be called.
The other thing to note is that the procedure is defined as returning a four-byte integer ( Int(10) ). This gives the subprocedure a way to tell the RPG run time that it should abandon processing. I have never found a need to do anything other than return zero, which tells RPG to keep processing.
Within the subprocedure I am doing just enough processing to demonstrate that the data was received. Here’s the code:
```rpgle
Dsply ( 'Processing ' + %Char(items) + ' items' );
Dsply ( 'Starting with item ' + %Char(postData(1).id)
      + ' from user ' + %Char(postData(1).userid) );
count += items;  // Increment total count
Return 0;
```
I start by displaying the variable items, which contains the count of the number of elements loaded by RPG into the DS passed as the second parameter. Next I display the user ID and blog post ID from the first item in the DS array. I then add the number of items being processed on this call to the count variable (i.e., the communications area). In my example I am using this to make the total number of elements processed available to the mainline code. In the previous example RPG supplied this count in the PSDS as noted earlier. Because we are processing in “chunks” RPG can no longer supply that value and so it is up to us to build it if we need it. If you study the source you will see that the value is subsequently displayed back in the mainline following the DATA-INTO operation.
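Putting these pieces together, a complete chunk-processing program might look like the sketch below. It is not USEWEBSRV4. The definition of the responseData_T template is an assumption, because the text names it in LikeDs but does not show it. The library names, placeholder base URL and response buffer size are also assumptions. Here itemCount is Int(5) to match the handler's count parameter, rather than the Uns(20) PSDS field used in the all-at-once example.

```rpgle
**free
// Hypothetical sketch (not USEWEBSRV4): DATA-INTO passes the posts to a
// handler 40 at a time, and the handler totals them in the comm area.
// Assumes HTTPAPI in LIBHTTP and YAJL in library YAJL.
Ctl-Opt DftActGrp(*No) BndDir('HTTPAPI');

/copy LIBHTTP/QRPGLESRC,HTTPAPI_H

Dcl-C baseUrl 'https://example.com/posts/';  // hypothetical base URL

// Assumed definition of the template named in the handler's LikeDs
Dcl-DS responseData_T Qualified Template;
  id int(5);
  title varchar(30);
  userid int(5);
End-Ds;

Dcl-S response  Varchar(100000);  // assumed large enough for all posts
Dcl-S itemCount Int(5) Inz(0);    // communications area; same type as "count"

response = HTTP_string( 'GET' : baseUrl );

Data-Into %Handler( ProcessPosts : itemCount )
          %DATA( response : 'case=any allowextra=yes' )
          %PARSER( 'YAJL/YAJLINTO' );

// In this approach the handler, not the PSDS, supplies the total
Dsply ( 'Total posts processed: ' + %Char(itemCount) );

*InLr = *On;

Dcl-Proc ProcessPosts;
  Dcl-Pi *N Int(10);
    count    Int(5);                                // communications area
    postData LikeDs(responseData_T) Dim(40) Const;  // current chunk
    items    Uns(10) Value;                         // elements filled (1-40)
  End-Pi;

  Dsply ( 'Processing ' + %Char(items) + ' items' );
  Dsply ( 'Starting with item ' + %Char(postData(1).id)
        + ' from user ' + %Char(postData(1).userid) );
  count += items;  // running total, visible to the mainline
  Return 0;        // zero tells RPG to keep processing
End-Proc;
```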
Wrapping Up
As you can see, the combination of HTTPAPI and RPG’s DATA-INTO can make it very easy to interact with web services. Obviously, the example I have used here is a simple one, but it should give you a good idea of the basics. When dealing with more complex requirements, particularly ones involving nested elements, the biggest challenge is often defining the receiving data structure. As you gain experience this will become second nature, but when starting out you may find Scott Klement’s utility YAJLGEN useful. This tool is shipped with the YAJL library. It takes as input a sample of the anticipated response JSON and generates its best guess at the required DS. In fact, it generates a complete DATA-INTO test program. Of course, when it comes to the sizes of fields and arrays it is very much a guess, something that the comments at the beginning of the generated DS make abundantly clear.
Next Time
In the next part of this series I will be looking at how DATA-INTO and DATA-GEN can be used to process JSON that includes elements with names that do not map to RPG names. I will also touch on some additional features of the YAJLINTO parser and the YAJLDTAGEN generator.
In the meantime, if you have any questions or if there are any particular aspects of this very broad topic you would like me to delve into in future tips please let me know.
Jon Paris is one of the world’s foremost experts on programming on the IBM i platform. A frequent author, forum contributor, and speaker at User Groups and technical conferences around the world, he is also an IBM Champion and a partner at Partner400 and System i Developer. Until Covid-19 messed everything up he hosted the RPG & DB2 Summit with partners Susan Gantner and Paul Tuohy. These days he has to be content with everything being done over Zoom. That includes the upcoming Summit Hands-On Live! Workshops, and the Virtual RPG & DB2 Summit.