Guru: Dynamic Arrays Come To RPG
July 20, 2020 Jon Paris
Some 12 months ago, when the 7.4 release was announced, I wrote the Guru Tip “7.4 Brings New RPG Goodies” describing the features of 7.4 that were also available on 7.3. I said at the time that I would return later to discuss the 7.4-only features. Now that a significant number of shops have access to 7.4, that time has arrived.
Dynamic arrays are the answer to the perennial programmer question: “Just how big do I need to make this array?” In my experience it doesn’t actually matter how big you make it; at some point down the road it will be too small! At long last we have an answer to this question.
So just how do we define these arrays? Simple: we just specify one of the new special values *AUTO or *VAR as the first parameter of the DIM keyword, like this:
dcl-s autoArray Char(10) Dim(*Auto: 50);
dcl-s varyArray Char(10) Dim(*Var: 500);
So what is the difference between the two types? An Automatic (*Auto) array grows automatically whenever you assign a value to an element beyond its current capacity. A Variable (*Var) array is variable in size and can grow and shrink under the control of the programmer. In this first tip on the subject I am going to focus on the *Auto variant. In a subsequent tip I will cover the *Var version and discuss how to handle some of the limitations of this new support.
Before I begin, however, let me take a moment to establish a few terms that will become significant later on. Let’s take a look at a conventional RPG array.
dcl-s normalArray Char(10) Dim(50);
As you can see, the array has been defined with a maximum capacity of 50 elements. Regardless of how many entries we have populated, the array is considered to have a current capacity of 50 elements. It is this current capacity value that governs the scope of SORTA and %LOOKUP operations. In addition, note that the storage allocation for the array is also set for 50 elements.
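To see this in action, here is a minimal sketch (the variable count is introduced purely for illustration) that fills a single element of a conventional array, then asks %ELEM for its size and sorts it. Because the current capacity is the full 50 elements, SORTA treats the 49 untouched blank elements as data and moves them ahead of the one real value.

```rpgle
**free
// Conventional array: current capacity, maximum capacity and
// storage allocation are all fixed at the declared 50 elements.
dcl-s normalArray char(10) dim(50);
dcl-s count int(10);

normalArray(1) = 'Only one';      // populate just one element

count = %elem(normalArray);       // reports 50, not 1
dsply ('Current capacity = ' + %char(count));

// SORTA sorts all 50 elements, so the 49 blank entries sort
// ahead of 'Only one', which ends up in element 50.
sorta normalArray;
dsply ('Element 50 = ' + normalArray(50));

*inlr = *on;
```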
Now that we have those definitions in mind let’s look at how this new support works.
Using A Variable-Length Array
Arrays defined with *Auto or *Var differ from conventional arrays in that they start with a current capacity of zero. The second parameter of the DIM keyword specifies the maximum capacity of the array. Although the array can grow, it will never be allowed to exceed this size, so the maximum also provides a sanity check that prevents runaway code from growing the array forever.
Let’s look at an example:
- The stand-alone array autoArray defined at (A) starts with the current capacity set to zero.
- When the code at (B) uses an index value (1) that is greater than that value, the capacity is raised to 1 to accommodate this new element.
- At (C) I am placing a value in the 10th position. As you might expect, this results in RPG increasing the current capacity to 10. The intermediate elements 2 through 9 are initialized to the default value based on the type of data in the array. In this case that would be blanks.
- Line (D) has no impact on the capacity, since 9 is less than 10, so the value will simply be placed in the 9th position.
- Finally at (E) we see the effect of our having selected a “sanity check” value of 50 as the upper limit for the array. A runtime error will occur since the required extension is beyond the maximum capacity permitted.
(A) dcl-s autoArray Char(10) Dim(*Auto: 50);
    dcl-s index Int(10);
(B) autoArray(1) = 'One';
(C) autoArray(10) = 'Ten';
(D) autoArray(9) = 'Nine';
    index = 90;
(E) autoArray(index) = 'Ninety'; // This will go BOOM!
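The same sequence can be turned into a small test program that displays the current capacity after each step, using %ELEM (which, as explained a little later, returns the current capacity) and %ELEM with *MAX to show the fixed maximum. This is a sketch assuming a 7.4 system; the failed indicator is added purely for illustration, and MONITOR traps the error at (E) rather than letting the program end abnormally.

```rpgle
**free
// Sketch: watch the current capacity of an *Auto array as it grows.
// Requires IBM i 7.4 (or later) for Dim(*Auto).
dcl-s autoArray char(10) dim(*auto: 50);
dcl-s index int(10);
dcl-s failed ind inz(*off);

dsply ('Start: ' + %char(%elem(autoArray)));
autoArray(1) = 'One';
dsply ('After (B): ' + %char(%elem(autoArray)));
autoArray(10) = 'Ten';
dsply ('After (C): ' + %char(%elem(autoArray)));
autoArray(9) = 'Nine';
dsply ('After (D): ' + %char(%elem(autoArray)));
dsply ('Maximum: ' + %char(%elem(autoArray: *max)));

index = 90;
monitor;
  autoArray(index) = 'Ninety';    // beyond the maximum of 50
on-error;
  failed = *on;                   // the runtime error is trapped here
  dsply 'Index 90 rejected';
endmon;

*inlr = *on;
```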
A Nice Additional Benefit
In many cases where arrays are being used, the code increments the index as each new item is added to the array. Along with this new style of array comes a new index option that makes such scenarios easier to code: the special index value *Next. Take a look at the code below and you’ll see how it can make life simpler.
The code starting at (F) is an example of how you might do it today, by incrementing a counter before adding a new element to the array. At (G) you can see how this can be simplified by specifying *Next as the index. This causes RPG to increment the current capacity and to use that new index to store the data. For example, if the current capacity is 15 elements, using *Next causes the capacity to be raised to 16 and the new item to be stored at index 16.
(F) // Add new element to array
    count += 1;
    oldArray(count) = 'New Value';

(G) // *Auto version
    autoArray(*Next) = 'New Value';
Note that *Next extends the current capacity of autoArray, but the maximum capacity remains fixed at 50. Once data has been placed in the 50th element using *Next (or, for that matter, by specifying 50 as the index directly), any further attempt to use *Next will fail, because it would be attempting to create element 51 in an array whose maximum is defined as 50.
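A minimal sketch of that limit being reached, assuming a 7.4 system (the variable names are illustrative): the loop keeps adding elements with *Next, counting each successful addition, until an attempt fails and MONITOR traps the error.

```rpgle
**free
// Sketch: keep adding with *Next until the maximum of 50 is hit.
dcl-s autoArray char(10) dim(*auto: 50);
dcl-s attempt int(10);
dcl-s added int(10);
dcl-s stoppedAt int(10);

dow stoppedAt = 0 and attempt < 60;   // try up to 60 additions
  attempt += 1;
  monitor;
    autoArray(*next) = 'Item ' + %char(attempt);
    added += 1;                        // reached only if the add worked
  on-error;
    stoppedAt = attempt;               // expected to be attempt 51
  endmon;
enddow;

dsply ('Added ' + %char(added) + ', failed on attempt ' + %char(stoppedAt));
*inlr = *on;
```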
That is all well and good, but how can we determine just what the current capacity of an array is if we don’t keep track of it in our own code?
There are actually two answers to this question. The first is that you may not need to! The reason is that RPG will automatically limit array operations such as SORTA and %LOOKUP to the current array capacity, so there is no need to use %SUBARR to constrain such operations to the active elements.
The second answer is that, in cases where you do need to know the actual count, our old friend %ELEM will supply it.
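The sketch below (names illustrative) puts both answers together: it adds three values with *Next, relies on SORTA and %LOOKUP working only on those three elements, and uses %ELEM as the loop limit. With a conventional Dim(50) array, the same SORTA would have moved the 47 blank elements to the front.

```rpgle
**free
// Sketch: SORTA and %LOOKUP on an *Auto array only consider the
// elements currently in use, so no %SUBARR is needed.
dcl-s autoArray char(10) dim(*auto: 50);
dcl-s i int(10);
dcl-s pos int(10);

autoArray(*next) = 'Cherry';
autoArray(*next) = 'Apple';
autoArray(*next) = 'Banana';

sorta autoArray;                       // sorts just the 3 active elements
pos = %lookup('Banana': autoArray);    // searches just those 3 as well

for i = 1 to %elem(autoArray);         // %ELEM gives the current capacity
  dsply (%char(i) + ': ' + autoArray(i));
endfor;
dsply ('Banana found at ' + %char(pos));
*inlr = *on;
```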
Storage Allocation
It may have occurred to you that one of the differences between this and conventional RPG array support is that the memory used by the array must be dynamic in order for it to “grow on demand”. This is true, and it also means that, on occasion, RPG may need to move the whole array to a different memory location. Never fear though: RPG guarantees to preserve your array’s content, just as %REALLOC preserves the content of storage obtained with %ALLOC.
Those of you who use routines such as qsort to sequence arrays should note that, as a result of such a move, the address of the array is subject to change. So if you ever use the %ADDR function to obtain the array’s address, you must make sure to refresh the pointer value before use if there is any possibility that the array size has changed.
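Here is a sketch of what that looks like in practice, assuming the C runtime qsort function is bound in through the QC2LE binding directory; the compareElems procedure and the sample data are hypothetical. The key point is that %ADDR is evaluated in the qsort call itself, after the array has finished growing, rather than being saved in a pointer earlier on.

```rpgle
**free
// Sketch: sorting an *Auto array with the C runtime qsort().
// Because the array may move in storage as it grows, its address
// is taken only after the last element has been added.
ctl-opt dftactgrp(*no) actgrp(*new) bnddir('QC2LE');

dcl-pr qsort extproc('qsort');
  base    pointer value;          // address of the first element
  count   uns(10) value;          // number of elements to sort
  width   uns(10) value;          // size of one element
  compare pointer(*proc) value;   // comparison procedure
end-pr;

dcl-s autoArray char(10) dim(*auto: 1000);
dcl-s i int(10);

for i = 300 downto 1;             // grow the array: it may be moved
  autoArray(*next) = %editc(%dec(i: 10: 0): 'X');
endfor;

// Take the address NOW, after all growth is finished.
qsort(%addr(autoArray(1)): %elem(autoArray): %size(autoArray(1))
      : %paddr(compareElems));

dsply ('First = ' + autoArray(1) + ' Last = ' + autoArray(%elem(autoArray)));
*inlr = *on;

// qsort passes the addresses of two elements; by-reference
// parameters receive them directly.
dcl-proc compareElems;
  dcl-pi *n int(10);
    elem1 char(10) const;
    elem2 char(10) const;
  end-pi;

  if elem1 < elem2;
    return -1;
  elseif elem1 > elem2;
    return 1;
  endif;
  return 0;
end-proc;
```

For a simple ascending sequence like this SORTA would of course do the job; qsort earns its keep when you need a custom sequence.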
You may also wonder whether this repeated reallocation of memory causes a performance overhead. While it may be an issue in theory, in practical terms it is not likely to be a problem. This is in part because when RPG requests dynamic memory from the operating system, the system will typically allocate more than the amount requested. It does this to avoid memory fragmentation. If that statement piques your curiosity, you’ll find a good basic description of this topic here: en.wikipedia.org/wiki/Fragmentation_(computing) .
So, while RPG may ask for (say) 100 bytes, the system may well hand over 512. Even so, if you are concerned that frequent re-allocation could be required, read on to see how you can manage the amount of allocated storage to minimize the chance of frequent reallocations.
Here’s how. To find how many array entries can be accommodated by the current memory allocation, our friend %ELEM can once again help. In fact we can even use it to set the memory allocation if we have some idea in advance of how many elements we may need. You can see how this is done in the code sample below.
    dcl-s autoArray2 Char(5) Dim(*Auto: 10000);
(H) Dsply ('Initial size = ' + %Char(%Elem(autoArray2)));
(I) Dsply ('Allocation = ' + %Char(%Elem(autoArray2 : *Alloc)));
(J) %Elem(autoArray2 : *Alloc) = 150;
(K) Dsply ('Allocation = ' + %Char(%Elem(autoArray2 : *Alloc)));
If you run this code (assuming that your system behaves as mine does), the display at (I) will show that sufficient storage has been allocated to accommodate 100 elements, even though at (H) we were shown that the current capacity is zero.
At (J) a storage allocation sufficient for 150 elements is requested. However, the subsequent display at (K) shows that we were actually given storage for 250. In my experiments I have found that RPG seems to allocate space for 100 more elements than requested, but I have not been able to verify with IBM that this is always true. And even if it is true today, it could change in the future, so it is best not to rely on it.
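If you have a reasonable estimate of how many elements you will need, you can request that allocation explicitly rather than counting on any extra that RPG happens to add today. The sketch below uses a hypothetical estimate of 1,200 elements and compares the address of element 1 before and after filling the array, which is one way to check whether a reallocation took place.

```rpgle
**free
// Sketch: pre-allocate storage when you roughly know how many
// elements you will need, so the array need not move as it grows.
dcl-s autoArray2 char(5) dim(*auto: 10000);
dcl-s expected int(10) inz(1200);   // hypothetical estimate
dcl-s firstAddr pointer;
dcl-s i int(10);

%elem(autoArray2: *alloc) = expected;   // request room up front

for i = 1 to expected;
  autoArray2(*next) = %char(i);
  if i = 1;
    firstAddr = %addr(autoArray2(1));   // remember where element 1 lives
  endif;
endfor;

if %addr(autoArray2(1)) = firstAddr;
  dsply 'Array did not move while it grew';
else;
  dsply 'Array was reallocated';
endif;
dsply ('Allocation = ' + %char(%elem(autoArray2: *alloc)));
*inlr = *on;
```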
Next Time
In my next tip I will look at how the *VAR version of these new dynamic arrays is used, and discuss how to handle some of the limitations in this initial support.
Jon Paris is one of the world’s foremost experts on programming on the IBM i platform. A frequent author, forum contributor, and speaker at User Groups and technical conferences around the world, he is also an IBM Champion and a partner at Partner400. Together with Susan Gantner and Paul Tuohy he also runs System i Developer, which (until Covid-19 raised its ugly head) ran the twice-yearly RPG & DB2 Summit and currently offers a number of online education opportunities.
As far as I can tell, dynamic arrays work only at level 0. Arrays nested in data structures do not work. Or am I wrong?
Sorry for the delay in responding, Karl; sadly, I am not notified when a comment is posted.
You are correct. For the time being this is a limitation; whether that will change in the future, only time will tell.
For anyone wondering why this restriction exists take a look at the following DS:
dcl-ds testDs;
field1 char(10);
array char(10) Dim(100);
field2 char(10);
end-ds;
Compiled languages like RPG rely on the addresses of variables being determined at compile time. That is why (for example) RPG does not allow you to reference a field indirectly (e.g. by having the name of the variable in another field) whereas interpreted languages such as Basic do allow this.
So when array is a conventional RPG array, it is easy to determine the address of field2: it immediately follows element 100 of array.
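To make that concrete, the short sketch below (not part of the original discussion) uses pointer subtraction to report where field2 sits within the conventional version of testDs. The offset of 1010 bytes (10 for field1 plus 100 × 10 for array) is fixed when the program is compiled.

```rpgle
**free
// Sketch: with a fixed-size array, the offset of every subfield
// is known at compile time.
dcl-ds testDs;
  field1 char(10);
  array  char(10) dim(100);
  field2 char(10);
end-ds;

dcl-s offset int(10);

// 10 bytes of field1 + 100 * 10 bytes of array = offset 1010
offset = %addr(field2) - %addr(testDs);
dsply ('field2 starts at offset ' + %char(offset));
*inlr = *on;
```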
Now imagine that array is dynamic in size. What is the address of field2? It would have to be calculated every time you needed to use that field! Not only that but you would probably have to be prevented from using %Addr to capture the address of field2 because there is no way to ensure that the address is updated every time the size of the array changes.
There are other ramifications but hopefully this will give you an idea as to why this is not an easy fix.