bsearch: A Better %LOOKUP

February 8, 2012 Jon Paris

Note: The code accompanying this article is available for download here.

In qsort: A Better SORTA, I described the use of the qsort API. This time, we’ll be looking at its companion API–bsearch–which, as its name suggests, is used to perform searches.

Recent changes in array support in RPG, including V6.1’s abilities to define large arrays, means that I use bsearch less frequently than I used to. Nevertheless, the fact that I write the code that controls the matching still makes it useful. For example, I can code bsearch to search for a partial string, which is something that %LookUp can’t handle, no matter how hard I try. But before we get to that, we need to examine the basic operating requirements for bsearch.

The first thing to remember is that bsearch performs a binary search (hence the “b” at the start of the name), and that means that the array you are searching must be in key sequence before you begin. The principal advantage of a binary search is that it is much faster than the linear type of search used by the old LOOKUP op code. Indeed, a linear search is also used by %LookUp when no sequence is specified on the array. If you are unfamiliar with the binary search technique there is quite a good explanation on Wikipedia, or for those who are visual learners you might find this explanatory movie on YouTube useful.

Just as with qsort, you control the search by providing a comparison function, which is almost identical to the type of comparison function qsort uses. The only difference is that instead of being passed two array elements, your comparison code is passed the search key and a single element for comparison. For example: If we wanted to search the array for a specific name, we might code the comparison routine like this:

 p SearchForName   b
 d                 pi            10i 0
 d   searchName                        Like(customerData.name)
 d   candidate                         LikeDS(customerData)

  /Free

   If SearchName > candidate.name;
     Return HIGH;
   ElseIf SearchName < candidate.name;
     Return LOW;
   Else;
     Return EQUAL;
   EndIf;

Just as with qsort, if we wanted to do a multi-key comparison, we would do so by simply extending the range of the comparisons as we did with qsort.

So if the comparison routine is every bit as easy to code as for qsort, how do we call bsearch, and how do we determine when we have found a match? As you might expect, the prototype for bsearch is very similar to that for qsort, it just has an additional parameter at the beginning which is a pointer to the search key. All the other parameters remain the same as they were for qsort. Here’s the prototype:

 d bsearch         pr              *   extproc('bsearch')
 d  searchFor                      *   value
 d  dataStart                      *   value
 d  elemCount                    10u 0 value
 d  elemSize                     10u 0 value
 d  compareFunc                    *   ProcPtr value

As you can see from the prototype, bsearch returns a pointer, which will have one of two values:

*null, which indicates that no match was found
A valid pointer to the beginning of the array element in which the hit was found

This pointer (“presult” in the example) is used as the basing pointer for a copy of the DS array element. I have called this DS result.

Here’s the complete call and subsequent testing of the result.

d result          ds                  LikeDS(customerData)
d                                     Based(presult)

...

presult = bsearch(%Addr(searchString)
                 : %Addr(customerData)
                 : customerCount
                 : %Size(customerData)
                 : pSearchForName ); 

// Did we get a hit? If so presult will contain a valid pointer

If presult <> *null;
  Dsply ( 'Address is: ' + result.address);
Else;
  Dsply ( ' No address found for ' + searchString );
EndIf;

Although the %Addr() and %PAddr() functions may look a little odd, the manner of calling bsearch is essentially the same every time we use it.

To get a better feel for bsearch, compile and run the downloadable source code. It will demonstrate that bsearch provides a fast and flexible searching mechanism, and that, by virtue of its multi-key capability, goes far beyond what we can achieve with %Lookup.

It is important to note that bsearch can go well beyond the simple example I have shown here. For example, you could use bsearch to produce a case-insensitive search. How? Simply by converting the search string and candidate to all upper case (or all lower for that matter) before comparing them. Admittedly the search won’t be as fast, but you can easily set things up so that the search key is converted only on the first call.

Another thing we can do with bsearch is to perform a partial key search. We might choose to look for a match on just the first five or six characters. However, that will introduce another challenge that we have ignored in this introductory tip: how to handle a scenario in which multiple elements match the search criteria.

In fact, it is possible that this could happen even without allowing partial key searches. Suppose you were to search for a customer with the name of “Smith”? There would probably be more than one Smith in the database. In my example I have assumed that this does not matter, that a simple hit/miss answer is all that was required. But suppose the requirement was to list all customers with that name to allow the user to pick the one they wanted? This kind of processing, and a few more thoughts on the usage of these APIs, will be the subject of my next tip.

Jon Paris is one of the world’s most knowledgeable experts on programming on the System i platform. Paris cut his teeth on the System/38 way back when, and in 1987 he joined IBM’s Toronto software lab to work on the COBOL compilers for the System/38 and System/36. He also worked on the creation of the COBOL/400 compilers for the original AS/400s back in 1988, and was one of the key developers behind RPG IV and the CODE/400 development tool. In 1998, he left IBM to start his own education and training firm, a job he does to this day with his wife, Susan Gantner–also an expert in System i programming. Paris and Gantner, along with Paul Tuohy and Skip Marchesani, are co-founders of System i Developer, which hosts the new RPG & DB2 Summit conference. Send your questions or comments for Jon to Ted Holt via the IT Jungle Contact page.

qsort: A Better SORTA

                     Post this story to del.icio.us
               Post this story to Digg
    Post this story to Slashdot