Want a Fast and Easy Way To Sort Subfile Data?

October 8, 2008 Susan Gantner

Note: The code accompanying this article is available for download here.

Sorting data is something RPG programs often need to do. If you are just sorting a simple single-field array, for example so that %Lookup can use its much faster binary search, then SORTA works well and is simple. But what about a more complex task, like sorting the data in a subfile on a user-selected column? Surely that calls for a more involved technique, such as retrieving the data from the database again with a different ORDER BY on an SQL SELECT statement, using a different logical file, or calling the qsort C function to sort the array elements in the program. Something as simple as SORTA can’t be used for that, right?

Maybe it can. The circumstances where this is effective are limited, for sure, but if your requirements fit, then using SORTA with a group field can be the simplest way, and often a faster alternative to other methods you may have tried.

First of all, what’s a group field? It’s a field in a data structure that is broken down into smaller subfields. For example, group field SflData might be made up of information about products (name, price, quantity) by using the Overlay keyword, such as:

  D  SflDS          Ds                  Inz
  D  SflData                            Like(SflRecData)
  D                                     Dim(999)
  D   Name                              Like(ProdDS)
  D                                     Overlay(SflData)
  D   Price                             Like(SellPr)
  D                                     Overlay(SflData:*Next)
  D   Qty                               Like(STOH)
  D                                     Overlay(SflData:*Next)

The effect is similar to nested data structures, except without the requirement to use qualified names. (Likewise, there are many limitations on group fields because of the lack of name qualification.) One additional thing that’s nice about group fields compared to nested DSs is that we can use SORTA against any of the subfields in a group field array.

So this means if I wanted to sort the data in the SflData array by product name, I could do that with the following statement: SortA Name;. SORTA uses Name only as the sort key; it reorders the whole SflData elements, so each price and quantity stays with its name. Much simpler than any of those other options I mentioned above! Of course, in nearly all cases, it would require the use of the built-in function %SubArr (which refers to a portion of an array) because I’m not likely to have filled up all 999 elements of SflData. Even so, the entire bit of logic to accomplish sorting this subfile data in the sequence of any of the three fields could be as simple as:

        If SortByName;
           SortA %SubArr(Name:1:Count);
        ElseIf SortByQty;
           SortA %SubArr(Qty:1:Count);
        ElseIf SortByPrice;
           SortA %SubArr(Price:1:Count);
        EndIf;
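To see the effect on concrete data, here is a minimal, self-contained sketch. It is not taken from the downloadable program: the names Items, Item, ItemName, ItemPrice and ItemQty, the field sizes and the sample values are all made up for illustration, and it is written with fully free-form declarations, which need a more recent compiler than the fixed-form D-specs shown earlier.

```rpgle
**FREE
// Hypothetical group field array; names and sizes are illustrative only.
// Item is 27 bytes: 20 (name) + 4 (packed 7,2) + 3 (packed 5,0).
dcl-ds Items inz;
  Item char(27) dim(5);
  ItemName char(20) overlay(Item);
  ItemPrice packed(7:2) overlay(Item:*next);
  ItemQty packed(5:0) overlay(Item:*next);
end-ds;

// Number of elements actually loaded (3 of the 5 available)
dcl-s Count int(10) inz(3);

ItemName(1) = 'Widget';
ItemPrice(1) = 9.99;
ItemQty(1) = 40;

ItemName(2) = 'Gadget';
ItemPrice(2) = 2.50;
ItemQty(2) = 15;

ItemName(3) = 'Sprocket';
ItemPrice(3) = 5.00;
ItemQty(3) = 7;

// Sort only the loaded elements, using ItemPrice as the key.
// Each 27-byte Item element moves as a unit, so the order becomes
// Gadget (2.50), Sprocket (5.00), Widget (9.99).
sorta %subarr(ItemPrice : 1 : Count);

// Show the name that now sits in the first element
dsply ItemName(1);

*inlr = *on;
return;
```

Had the sort been coded as sorta ItemPrice; without %SubArr, the two unused elements (blank names, zero prices) would have sorted ahead of the three loaded ones.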

This technique is very simple and in most cases quite a fast way to sort subfile data (or any other kind of repeating data). It does have significant limitations. For example, you can only sort on one subfield at a time. (Of course, you could group two subfields together if they happen to be adjacent in the subfile record.) Also, you must be able to retrieve and store all the data destined for the subfile into an array so that you can sort it all together. For some very large subfiles, that won’t be practical. But for those occasions where it works, it couldn’t get much simpler.
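The parenthetical idea of grouping two adjacent subfields into one sort key can be sketched as follows. This is an assumption-laden illustration rather than code from the example program: Category, ItemName, ItemQty and CatName are hypothetical names, and explicit overlay positions are used so that the combined key CatName covers exactly the same bytes as the two character subfields.

```rpgle
**FREE
// Hypothetical layout: Category (1-10), ItemName (11-30), ItemQty (31-33)
dcl-ds Items inz;
  Item char(33) dim(5);
  Category char(10) overlay(Item:1);
  ItemName char(20) overlay(Item:11);
  ItemQty packed(5:0) overlay(Item:31);
  // Combined key covering Category and ItemName together
  CatName char(30) overlay(Item:1);
end-ds;

// Number of elements actually loaded
dcl-s Count int(10) inz(3);

Category(1) = 'TOOLS';
ItemName(1) = 'Wrench';
ItemQty(1) = 5;

Category(2) = 'PARTS';
ItemName(2) = 'Washer';
ItemQty(2) = 100;

Category(3) = 'TOOLS';
ItemName(3) = 'Hammer';
ItemQty(3) = 8;

// One SORTA on the combined key orders by category, then by name
// within category: PARTS/Washer, TOOLS/Hammer, TOOLS/Wrench.
sorta %subarr(CatName : 1 : Count);

*inlr = *on;
return;
```

A combined key like this is compared as character data, so it suits adjacent character subfields best; numeric subfields folded into a character key may not sort in numeric order, particularly when negative values are present.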

You also need to think about the timeliness of the data. Because this technique does not re-retrieve the data on each sort request, recent record additions or updates will not be reflected. This point could be considered either positive or negative. In some cases, the users may actually prefer to see the same data, without updates, in different sequences.

The process works something like this. You retrieve the data destined for the subfile and store it in the group field array, such as the one described above, keeping a count of the number of elements you have loaded. (In my example, this is stored in Count.) Then you load the actual subfile from the array. Those steps could be combined into one process for the initial subfile load if you prefer. Also, you can decide for yourself whether you want to load the entire subfile from the array at once or use the page-at-a-time approach. However, in order for the simple SORTA approach to work, you must have all the subfile data loaded into the array, so you won’t save as much time with the page-at-a-time subfile load as you would otherwise.

If/when the user requests a different sequence for the data, simply sort the group field array using logic similar to that shown above and then clear and refill the subfile from the array. I’ve found this to be a very fast technique for simple sorts of subfile data of a reasonable size.

Of course, there is no reason at all to limit this sorting technique to subfile data. It could be used for any similar sorting requirement. I’ve had many requests over the years for “the best way to accomplish user-controlled sequencing of subfile data” and this is one very simple option.

Here are a few things to keep in mind if you want to try it. Remember that the DIM keyword goes on the group field, not on the subfields that you will be sorting on. Also don’t forget that you will almost certainly need to use %SubArr to ensure you don’t get all the “empty” elements appearing first in your subfile.

I find it simpler to make the group field an exact duplicate of the subfile record, so I define my group field to be like a data structure that is created as either an externally described DS or a LIKEREC DS. Since my group field is a duplicate of the subfile output record, it simplifies the logic of writing to the subfile. In my simple example, I made every subfile field sortable. In real life, of course, this is not usually desirable.

In my example program, I’ve avoided using either pointers or qualified data names to keep the logic more understandable to a wider audience. As a result, of course, the logic is not quite as efficient as it could have been. I do, however, write to the subfile using a DS in the result field on the Write operation because the logic to load each field would be far too cumbersome. My logic to fill the subfile looks like the following (note that SflRecData is my externally described DS based on the output format of the subfile record):

    FOR RRN = 1 to Count;

        SflRecData = SflData(RRN);
        WRITE ProdSfl SflRecData;

    ENDFOR;

If you want to see the completed code for my very simple example program, you can view the full program code here.

Susan Gantner is one of the most respected System i gurus in the world and is one of the co-founders of System i Developer, an organization dedicated to RPG, DB2, and other relevant software technologies for the System i platform that hosts the new RPG & DB2 Summit conference. Gantner, who has worked in IBM’s Rochester and Toronto labs, left IBM to focus on training OS/400 and i5/OS shops on the latest programming technologies. She is also a regular speaker at COMMON and other user groups. Send your questions or comments for Susan to Ted Holt via the IT Jungle Contact page.
