A Bevy of BIFs: Look Up to %LookUp
February 4, 2009 Susan Gantner
The %LookUp built-in function is more than just a simple replacement for the LOOKUP operation code to allow use in /Free format logic. It offers the option of supercharging the performance of an array search operation. Let’s cover the basics first. %LOOKUP( searchfor : array {: startindex {: numelems}}) The first parameter is the argument you’re searching for, and the second parameter the name of the array where you’re looking; i.e., what would have been specified in Factor 1 and 2 of the LOOKUP op code, respectively. The optional third parameter is the starting index for the search, i.e., what you could have specified as an index on the Factor 2 array. The fourth parameter is the number of elements to search. This option didn’t exist in the LOOKUP op code, so this represents the first enhancement offered by %LookUp. If your array is not full, you may now simply specify the number of elements that are there, avoiding searching the entire array when looking for an equal match. The return value of %LookUp is the element where a matching value was found, or zero if it was not found. There are variants of the function for comparisons less or greater than the search argument: %LookUpGT, %LookUpLE, etc. Perhaps the best feature of %LookUp when compared with the op code is the ability to use a binary search rather than the much slower linear search when the data in the array is in sequence (ascending or descending). If you’re not familiar with the notion of a binary search, imagine you’re searching for an exact match and there is no match in an array of 100 elements. A linear search (the type always used by the op code) would require 100 comparisons to determine there was no match. A binary search would begin around element 50 and compare. If the search argument is lower than the value of element 50, it then compares to element 25 for the second comparison. After each comparison, the number of candidate elements is cut in half. So in the worst case (a no match condition), it takes seven comparisons with a binary search compared to 100 in a sequential search. The performance difference grows exponentially as the size of the array gets larger. If you’d like to read more about linear vs. binary search methods, check out http://en.wikipedia.org/wiki/Binary_search. The LookUp operation code always performs a linear search, even if the array is sequenced. Before we had %Lookup, some programmers learned to speed up their searches by specifying both the EQ and GT indicators on LookUp (and initializing any “empty” elements with high values) so that the linear search stopped before the end of the array on a not found condition. This is certainly helpful, but using a binary search will still be much faster still. It is important to note that the binary search can only be used if the data in the array is in sequence and the programmer specifies that sequence by using either the Ascend or Descend keyword on the array definition. If it’s not practical to load the array in sequence, it could prove worthwhile to sort the array in the program if the search will be performed multiple times. This might be done using SORTA or the qsort C function. A mistake that is made more commonly than you might think is to specify Ascend or Descend on an array even when the values are not in sequence. This didn’t cause a problem for the LookUp operation code when searching for an equal match, but it would cause a problem for %LookUp’s binary search. So consider replacing your use of the operation code with %LookUp even if you’re not doing /Free form logic. If you’re already using %LookUp, you may also want to examine whether sequencing the array elements–and specifying that sequence on the definition–is worth the effort to make multiple subsequent searches orders of magnitude faster. Susan Gantner is one of the most respected System i gurus in the world and is one of the co-founders of System i Developer, an organization dedicated to RPG, DB2, and other relevant software technologies for the System i platform that hosts the new RPG & DB2 Summit conference. Gantner, who has worked in IBM’s Rochester and Toronto labs, left IBM to focus on training OS/400 and i5/OS shops on the latest programming technologies. She is also a regular speaker at COMMON and other user groups. Send your questions or comments for Susan to Ted Holt via the IT Jungle Contact page.
|