qsort: A Better SORTA
January 25, 2012 Jon Paris
Over the past few years I’ve done quite a lot of PHP coding alongside my RPG work, and PHP’s extensive use of arrays has furthered my conviction that we RPGers just don’t use arrays nearly enough. There are many reasons for this, of course, one of which is RPG’s limited ability to sort and search arrays. Admittedly in recent releases the restrictions have been somewhat alleviated, but there are still many limitations. For example, take a look at the DS below: d customerData ds Dim(1000) Qualified d name 30a d address1 40a d address2 40a d city 30a d state 2a d zip 5s 0 Now suppose that you were asked to sort this DS in the sequence: name, city, state. Just the fact that we want to key the sort on more than one field effectively rules it out for RPG. Yes, I know that you could clone the data with the three fields in reverse order and then treat them as one huge field . . . but that’s cheating, not to mention inefficient. There is a solution though, and it offers a lot of possibilities beyond multi-key searching as you will soon see. It involves the use of the C library function: qsort. Now don’t let the mention of C put you off; you don’t have to know anything about the C language to use this feature. Some of the parameters may appear a little strange the first time you see them, but the good thing is that every time you use qsort you can pretty much clone the logic and just adjust the field names and definitions. The thing that makes qsort so flexible is the fact that it relies on your code to make all the important sequencing decisions. That’s right. Unlike RPG’s SORTA, qsort doesn’t make any sequence decisions. That is your job. That said, let’s begin our exploration by looking at how you control the sort sequence. This is the sequencing routine that we supply to qsort. p SeqNameCityState... p b d pi 10i 0 d elementA LikeDS(customerData) d elementB LikeDS(customerData) /Free If elementA.state > elementB.state; Return HIGH; ElseIf elementA.state < elementB.state; Return LOW; ElseIf elementA.city > elementB.city; Return HIGH; ElseIf elementA.city < elementB.city; Return LOW; ElseIf elementA.name > elementB.name; Return HIGH; ElseIf elementA.name < elementB.name; Return LOW; Else; Return EQUAL; EndIf; Notice that the sequencing routine takes two parameters, each of which is defined like an element of the DS array. The qsort function will call this procedure repeatedly, each time passing two elements for comparison. Our code has to decide which of the two elements has the highest position in the sort sequence. Since our highest priority key is the state field, we first compare those fields. If elementA’s state value is higher than that for elementB then qsort wants us to return a value of 1, which for ease of understanding we have defined as a constant named HIGH. If A’s value is lower than B’s then we need to return the value -1 (The constant LOW). And if the two state fields have the same value? Then we go on to compare the next component of the key–the city–and if those are equal, then the name fields are compared. Since the name is the final component in the key, then we simply inform qsort that the keys are equal by returning the value 0 (EQUAL). That’s really all there is to it. The qsort function takes care of reordering the array elements in memory. Once it knows that all the entries have been sequenced it returns control to the point from which it was called. Talking of which, it is time to see how that call is made. //************************************************** // Prototype for the qsort function //************************************************** d qsort pr extproc('qsort') d dataStart * value d elemCount 10u 0 value d elemSize 10u 0 value d compareFunc * ProcPtr value d pSeqProcedure s * ProcPtr d Inz(%PAddr(SeqNameCityState)) /Free qsort ( %Addr(customerData) : customerCount : %Size(customerData) : pSeqProcedure ); The qsort function requires four parameters. The first is a pointer to the start of the data to be sorted, which we specify by using the %Addr BIF and referencing the array name. The second is the count of the number of active elements, and the third the size of each element in the array. The fourth identifies your sequencing routine. This parameter has to be in the form of a procedure pointer, which we initialize on the D spec by using the BIF %PAddr (Procedure address) and identify the name of the prototype to our sequencing routine. That is really all there is to it. If you want to study a full working version of the code you can find it here. In this article I’ve tried to focus on a simple “cookbook” approach to using qsort that you can adapt to your own requirements without an in-depth knowledge of the function. In a future article I will discuss bsearch, qsort’s array lookup companion, and discuss some of the other ways in which these functions can be used. Jon Paris is one of the world’s most knowledgeable experts on programming on the System i platform. Paris cut his teeth on the System/38 way back when, and in 1987 he joined IBM’s Toronto software lab to work on the COBOL compilers for the System/38 and System/36. He also worked on the creation of the COBOL/400 compilers for the original AS/400s back in 1988, and was one of the key developers behind RPG IV and the CODE/400 development tool. In 1998, he left IBM to start his own education and training firm, a job he does to this day with his wife, Susan Gantner–also an expert in System i programming. Paris and Gantner, along with Paul Tuohy and Skip Marchesani, are co-founders of System i Developer, which hosts the new RPG & DB2 Summit conference. Send your questions or comments for Jon to Ted Holt via the IT Jungle Contact page.
|