The Efficiency of Varying Length Character Variables
September 10, 2008 Jon Paris
Remember the bad old days when dinosaurs still roamed the earth and the only way to build strings in RPG involved playing silly games with arrays? Or worse still, obscure combinations of MOVE operations? Thankfully those days are far behind us–although sadly there are still a few RPG/400 dinosaurs coding away! RPG IV introduced many powerful new string handling options, such as the %TRIMx family of BIFs, but even now there are capabilities in the language that few programmers fully exploit. One of my favorites is variable length fields. There are many good reasons to use these fields, but in this tip we’re going to focus mainly on performance.

For those of you unfamiliar with varying length fields, the following D specs show how they are defined and illustrate the constituent parts. Varying length fields have two components: the current length that is represented by a 2-byte integer in the first two positions, followed by the actual data. They are differentiated from regular character fields by the use of the keyword “Varying.” (See (A) in the code that follows.) You should train yourself to always code the INZ keyword to ensure that the length field is set correctly. This is critical when varying length fields are incorporated in data structures. Why? Because by default, data structures are initialized to spaces (hex 40) and that causes havoc when interpreted as the field length! At (B) and (C) in the code example that follows, I have defined the two components as separate fields–overlaying varyField–to demonstrate the layout.

     D varyingStruct   DS
 (A) D   varyField                  256a   Varying Inz
      // Following fields are defined just to show the layout of a varying field
 (B) D   length                       5i 0 Overlay(varyField)
 (C) D   data                       256a   Overlay(varyField: *Next)

Whenever the content of a varying length field is changed, the compiler adjusts the length to reflect the new content.
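To make the layout concrete, here is a minimal sketch written in Python rather than RPG, purely as a byte-level model. It assumes a big-endian 2-byte length prefix and uses code page 037 as a representative EBCDIC encoding; the helper names pack_varying and current_length are hypothetical and exist only for this illustration. It shows a 256-byte varying field occupying 258 bytes, and what the length looks like when the prefix holds two blanks (hex 40 40) instead of a proper value.

```python
import struct

EBCDIC = "cp037"  # assumption: a representative EBCDIC code page (blank = hex 40)


def pack_varying(text: str, capacity: int = 256) -> bytes:
    """Model a varying field: 2-byte length prefix followed by `capacity` data bytes."""
    data = text.encode(EBCDIC)
    if len(data) > capacity:
        raise ValueError("text is longer than the declared capacity")
    # Bytes past the current length are unused; this model simply zero-fills them.
    return struct.pack(">H", len(data)) + data.ljust(capacity, b"\x00")


def current_length(field: bytes) -> int:
    """Read the length prefix, the value that %Len() would report."""
    return struct.unpack(">H", field[:2])[0]


field = pack_varying("ABC")
print(len(field))             # 258: 2-byte prefix plus 256 data bytes
print(current_length(field))  # 3

# A structure initialized to blanks leaves hex 40 40 in the prefix.
blank_filled = " ".encode(EBCDIC) * 258
print(current_length(blank_filled))  # 16448, far beyond the declared 256
```

The last line is the "havoc" in miniature: two blanks read as a 2-byte integer give 16,448, which is why the INZ keyword matters.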
Note that you should always use %Trimx when loading data from a fixed length field into a varying length field, otherwise any trailing blanks will be counted in the field length. Any time you want to know how long the field is, use the %Len() built-in function to obtain the current value.

Now that we’ve reviewed the basics of variable length fields, let’s see how they can be used to boost the performance of some types of string operation. Take a look at the following two pieces of code. Both of them build a string of 100 comma separated values. At first glance there is very little difference in the logic, but would you believe that the second one can run hundreds or even thousands of times faster?

       For i = 1 to 10;
         For j = 1 to 10;
 (D)       fixedField = %Subst(baseString: i: j );
 (E)       longFixed = %TrimR(longFixed) + ',' + fixedField;
         EndFor;
       EndFor;

       For i = 1 to 10;
         For j = 1 to 10;
           fixedField = %Subst(baseString: i: j );
 (F)       longVarying += ',' + %TrimR(fixedField);
         EndFor;
       EndFor;

The reason is simple. The second one (F) makes use of a varying length field to build up the result string! This difference in speed is easy to understand if you think about what is going on under the hood. The first version (E) uses a fixed length target string so these are the steps that take place:
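As a rough model of that sequence, here is a sketch in Python (not RPG, and not a description of what the compiler actually generates). The function name append_fixed is hypothetical. Step 1 (finding the trailing blanks) and step 4 (padding with blanks) follow the discussion in this tip; steps 2 and 3 are my assumption about what happens in between.

```python
def append_fixed(target: str, value: str) -> str:
    """Model `target = %TrimR(target) + ',' + value` for a fixed length target."""
    size = len(target)  # the declared length never changes

    # Step 1: scan back from the end to find the last non-blank character.
    used = size
    while used > 0 and target[used - 1] == " ":
        used -= 1

    # Step 2 (assumed): build the concatenated value in a temporary.
    temp = target[:used] + "," + value

    # Step 3 (assumed): copy the temporary to the target, truncating if too long.
    temp = temp[:size]

    # Step 4: pad the remainder of the target with blanks.
    return temp + " " * (size - len(temp))


long_fixed = " " * 40
for value in ("A".ljust(10), "AB".ljust(10), "ABC".ljust(10)):
    long_fixed = append_fixed(long_fixed, value)

print(repr(long_fixed.rstrip()))  # ',A,AB,ABC'
print(len(long_fixed))            # 40
```

In this model, the blank scan in step 1 and the padding in step 4 both grow with the declared size of the target, however little data it actually holds.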
This process is repeated for each new value added to the string. Notice that having carefully padded the string with blanks (4), the very next thing we do (1) is to work out how many there are so that we can ignore them! Contrast this with the mechanics of the version using the variable length field (F):
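A comparable sketch for the varying length case, again in Python and again only a toy model under my own assumptions: the class name VaryingField is hypothetical, the length attribute stands in for the 2-byte prefix, and silent truncation at capacity is a simplification of this model. The point it illustrates is that the current length is already known, so an append only copies the new characters and updates the length.

```python
class VaryingField:
    """Toy model of a varying length field: a fixed buffer plus a current length."""

    def __init__(self, capacity: int):
        self.buffer = bytearray(capacity)
        self.length = 0  # stands in for the 2-byte length prefix

    def append(self, text: str) -> None:
        data = text.encode("ascii")
        room = len(self.buffer) - self.length
        data = data[:room]  # simplification: anything that does not fit is dropped
        # Copy the new characters to the current end; no scan, no blank padding.
        self.buffer[self.length:self.length + len(data)] = data
        self.length += len(data)

    def value(self) -> str:
        return self.buffer[:self.length].decode("ascii")


long_varying = VaryingField(25600)
for value in ("A".ljust(10), "AB".ljust(10), "ABC".ljust(10)):
    long_varying.append("," + value.rstrip())  # rstrip() stands in for %TrimR

print(long_varying.value())   # ,A,AB,ABC
print(long_varying.length)    # 9
```

Here the cost of each append depends on the size of the value being added, not on the 25,600-byte capacity of the target.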
Much simpler! And the resulting speed differences can be staggering. In tests I ran while preparing this tip, even with a target field length as small as 256 characters, the varying length field version took only half the time of the fixed length version. When I raised the field length to 25,600, which is a much more realistic size when building a CSV, HTML or XML string, the speed difference rose to 1,300 to 1!

Another point to consider is that the code shown above (E) is already much more efficient than much of the code I have seen in customers’ programs. The two variants below are both very common and both even less efficient. In the first case (G) the field being added is being trimmed of blanks, which are immediately added back if it does not fill the target field! In the second case (H) the separation of the two functions means that the calculations for the effective length of the target field and the subsequent blank filling occur twice for each loop. You can imagine what that does to the speed. And yes, I have seen cases where people combine both G and H!

 (G)   longFixed = %TrimR(longFixed) + ',' + %TrimR(fixedField);

 (H)   longFixed = %TrimR(longFixed) + ',';
       longFixed = %TrimR(longFixed) + fixedField;

That’s all for this first look at variable length fields. In a future tip we’ll look at their uses and abuses in the database.

P.S. For those of you wondering what the purpose of the code at (D) is, it is simply used to generate fields of different effective lengths (one to 10 characters) to act as the test data to be added to the target string.

Jon Paris is one of the world’s most knowledgeable experts on programming on the System i platform. Paris cut his teeth on the System/38 way back when, and in 1987 he joined IBM’s Toronto software lab to work on the COBOL compilers for the System/38 and System/36. He also worked on the creation of the COBOL/400 compilers for the original AS/400s back in 1988, and was one of the key developers behind RPG IV and the CODE/400 development tool. In 1998, he left IBM to start his own education and training firm, a job he does to this day with his wife, Susan Gantner–also an expert in System i programming. Paris and Gantner, along with Paul Tuohy, are co-founders of System i Developer, which hosts the new RPG & DB2 Summit conference. Send your questions or comments for Jon to Ted Holt via the IT Jungle Contact page.