Guru: Dealing With Non-Normalized Data
April 9, 2018 Jon Paris
From time to time, many of us have to find solutions for handling our old non-normalized tables in an efficient manner. It would be nice to have the luxury of redesigning and normalizing these databases, but real life is not like that. This is particularly true when the tables in question are part of an application package where you have no control over the file layout.
I should point out that by “efficient” in this context I don’t necessarily mean processing speed, but rather in terms of the number of lines of code needed to perform the necessary manipulations and/or how obvious (and therefore maintainable) the resulting code is.
The classic example, and one that I often reference when teaching this topic, is where some form of monthly sales data is stored as a series of 12 consecutive fields in the record. For example, the file might contain a product code, followed by January sales, February sales, and so on all the way to December.
This story contains code, which you can download here.
If the requirement is to produce the total sales for the year from such a record, how would I go about that in an RPG program? Of course I could do it field by field, i.e. totalSales = JANSALES + FEBSALES + … DECSALES;
But if I could redefine the record so that I can treat the monthly sales values as an array (which after all is what they really are) then I can simply say totalSales = %XFoot(salesByMonth)
Now wouldn’t that be a whole lot nicer? Plus of course I could also reference individual months by using the month number as an index. Much more flexible.
So how do we go about doing that? It is very simple really, as you can see in this code extract.
dcl-f salesHist keyed; // List all sales value fields from the history file in sequence // No size or type is needed - this will be derived from the file dcl-ds productItems; JANSALES; FEBSALES; MARSALES; ... NOVSALES; DECSALES; // Now simply define an array over the fields salesByMonth Like(JANSALES) Dim(12) Pos(1); end-ds; dcl-s totalSales Like(JANSALES); // Now we can sum all the months sales values with ... totalSales = %XFoot(salesByMonth);
While this provides a neat solution to this particular problem, often such legacy files are more complex than this. I was reminded of this during a recent series of emails with an RPGer who is faced with such requirements every day. In his case he was faced not with a simple one field “array,” but with a repeated series of fields. There was also a “twist” in his requirements that I will get to in a moment.
In the past I have most commonly encountered this type of record layout in apparel applications where a single record may hold (for example) pricing information for a number of sizes. So, a simplified version of such a file is what I will be using for this example.
The basic layout of my file is that a product code is followed by fields for size, cost, and selling price. These three fields are repeated, in the case of my test file, a total of four times. In reality I have seen them repeat eight to 10 times or more. These days we would not, hopefully, ever design a table like this but . . . .
The simplest approach might be to use the same technique I demonstrated above and create three separate data structures, one for each of the field types. But the result is somewhat clumsy and the relationship between the group of three fields is not immediately obvious. Instead I’m going to demonstrate how group fields can be used to achieve the same thing a little more cleanly.
The basic principal is the same, I simply list the columns in the order I want them like this:
dcl-ds productItems; SIZE1; COSTPR1; SELLPR1; SIZE2; COSTPR2; SELLPR2; ... // fields are repeated until SELLPR4; // Now define a group field to contain the array // Note the field sequence _must_ match (A) prodDetail Dim(4) Pos(1); prodSize Like(SIZE1) Overlay(ProdDetail); prodCost Like(COSTPR1) Overlay(ProdDetail: *Next); sellPr Like(SELLPR1) Overlay(ProdDetail: *Next); end-ds;
The difference begins at (A) where I define the group field prodDetail. What makes it a “group field”? Simply the fact that it does not have any length or type definition in and of itself, but rather is defined by the combined length of the fields that overlay it. In this case it “groups” the fields prodSize, prodCost and sellPr. Even though it does not have a length specified, it can be declared as an array and — this is the important bit — that in turn allows each of the three individual fields to be treated as an array. That means I can reference any of the groups (e.g. prodDetail(1) ) or any of the individual fields (e.g. prodCost(1) ). Because prodDetail is defined as starting in Pos(1), i.e. the beginning of the data structure, prodSize(1) maps SIZE1, and sellPr(4) maps SELLPR4.
I mentioned earlier that my correspondent had a “twist” in his requirements. Simply put, he needed to copy the fields in the record to corresponding, but differently sized, typed, and named, fields in display, print, and disk files. So, for example, a field that was packed 5,2 was perhaps packed 7,2 in another table and zoned 9,2 in display and print files. Even if he placed the fields in a DS, the size and type differences precluded the simple copying one DS to another. Plus of course the name differences precluded the use of EVAL-CORR to copy the data.
My approach was to create group field arrays as shown above for both the source and target records and then to simply assign each of the input arrays to their corresponding target array. The target array definitions (B) look like this:
dcl-ds printItems Inz; ITEMSIZE1; ITEMCOST1; ITEMSELL1; ITEMSIZE2; ... // fields are repeated until ITEMSELL4; (B) itemDetail Dim(4) Pos(1); size Like(ITEMSIZE1) Overlay(ItemDetail); cost Like(ITEMCOST1) Overlay(ItemDetail: *Next); sell Like(ITEMSELL1) Overlay(ItemDetail: *Next); end-ds;
Now that the arrays are defined, copying data from the input record to the target is simply a matter of coding:
size = prodSize; cost = prodCost; sell = sellPr;
RPG takes care of any size and type differences and effectively copies the fields correctly one by one to the target — with just three lines of code!
Interestingly this approach also takes care of another issue my correspondent had, which was that sometimes the source record was from a file with only (say) three repeats but needed to be presented in the same reports/displays as files with five or six repeats. Again RPG takes care of this because the array with the smaller number of entries governs the number of fields copied.
Advantages?
I have already mentioned a number of advantages of this approach but just to summarize:
- Because the record field names are used in the DS the arrays are populated the minute a record is read. No need to code individual field moves.
- Copying of a series of fields requires just one line of code.
- Any differences in data type and size are taken care of by RPG.
- The arrays can be sorted on any of their component fields. For example, SORTA prodCost will put the groups of three fields into order based on the product cost.
- A group of associated fields can be referenced via the group field array For example, itemDetail(4) references fields ITEMSIZE4, ITEMCOST4 and ITEMSELL4.
If you are interested in “playing” with this technique you can download the sample programs and data here. Got a similarly annoying problem you’re looking for a “fix” for? Or if you simply have questions or comments I’d love to hear from you.
Jon Paris is one of the world’s most knowledgeable experts on programming on the System i platform. Paris cut his teeth on the System/38 way back when, and in 1987 he joined IBM’s Toronto software lab to work on the COBOL compilers for the System/38 and System/36. He also worked on the creation of the COBOL/400 compilers for the original AS/400s back in 1988, and was one of the key developers behind RPG IV and the CODE/400 development tool. In 1998, he left IBM to start his own education and training firm, a job he does to this day with his wife, Susan Gantner (also an expert in System i programming). Paris and Gantner, along with Paul Tuohy and Skip Marchesani, are co-founders of System i Developer, which hosts the new RPG & DB2 Summit conference.