Guru Classic: Overlaid Packed Data In Data Structures
March 4, 2020 Jon Paris
When I re-read this tip while looking for a “Classic” candidate, I was reminded that the underlying issue it addresses, namely how data is actually stored in an RPG program, is one that many RPG programmers don’t really have a firm grip on. That alone made it a good candidate. But perhaps even more important is demonstrating this topic to the many new programmers coming onto the platform. Unlike those of us who started off with assembly languages, C, RPG, or COBOL, modern programmers trained in C#, Python, or PHP have never had any need to understand the mechanics of how data is stored. For that matter, even data structures as we understand them in RPG are a somewhat alien construct. So for young and old RPGers, here’s a look back at a conversation I had with a Four Hundred Guru reader to solve a problem with overlaid packed-decimal data in data structures.
This story contains code, which you can download here.
Hi, Jon:
My question has to do with overlaid packed-decimal data in a data structure. I store a date in CYMD format in a seven-digit, packed-decimal field. I am trying to extract the two-digit year from that.
Dcl-Ds cymdDate;
   currentDate packed(7);
   currentYear packed(2) overlay(currentDate:2);
End-Ds;
When I set currentDate to a value of 1160701 and looked at the value of currentYear in debug, I saw a value of 07, but I was expecting a value of 16.
I changed the current year to be defined as follows:
Dcl-Ds cymdDate;
   currentDate packed(7);
   currentYear packed(2) overlay(currentDate:1);
End-Ds;
Using this definition, currentYear comes out correctly as 16.
Can you help me to understand why the overlay starting at 2 gives me the 4-5 position while the overlay starting at 1 gives me the 2-3 position?
— A Four Hundred Guru Reader
Jon Paris answers: Thanks for the question and for the opportunity to explain a couple of things that can bite you if you are not aware of the technical details underpinning them.
In order to fully explain this I have to go through some basics, so if some of this seems a little elementary, please excuse me.
First of all we need to understand that on IBM i, packed numbers are stored in nyble/nibble form, i.e., each digit is represented by a half-byte (nyble). A seven-digit number containing the value 1234567 would be stored as shown below.
Byte 1 | Byte 2 | Byte 3 | Byte 4
1  2   | 3  4   | 5  6   | 7  F
The right-hand (low-order) nyble of the fourth byte is used to represent the sign of the number. For positive numbers this is normally hexadecimal F, although C, which is used on mainframes, is also considered valid. For negative numbers the value would be D.
For any given packed field, the number of bytes occupied is the number of digits plus 1 for the sign, divided by 2, with the result rounded up if required. So a seven-digit number occupies (7 + 1) / 2 = 4 bytes. But what if there is an even number of digits? How are they stored? Basically, a zero is normally placed in the first nyble of the number. But read on.
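The storage rule above can be sketched in C. This is only an illustration under assumptions: the names `packed_size` and `pack_digits` are made up for this example, and the sketch handles non-negative values only, always writing the F sign.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a packed field: (digits + 1) / 2, rounded up. */
size_t packed_size(int digits) {
    assert(digits > 0);
    return (size_t)(digits / 2 + 1);
}

/* Store a non-negative value as packed decimal: one digit per nibble,
   sign nibble F in the low-order half of the last byte.
   Returns the number of bytes written to out. */
size_t pack_digits(unsigned long value, int digits, unsigned char *out) {
    size_t size = packed_size(digits);
    size_t nibble = size * 2;          /* nibbles are counted from the left */
    size_t i;

    for (i = 0; i < size; i++) out[i] = 0x00;

    out[size - 1] = 0x0F;              /* sign nibble */
    nibble--;

    while (digits-- > 0) {             /* fill digits from right to left */
        unsigned char d = (unsigned char)(value % 10);
        value /= 10;
        nibble--;
        if (nibble % 2 == 0) out[nibble / 2] |= (unsigned char)(d << 4); /* high nibble */
        else                 out[nibble / 2] |= d;                        /* low nibble  */
    }
    return size;                       /* an unused first nibble stays zero */
}
```

Packing 1234567 as seven digits fills four bytes with 12 34 56 7F, while packing 16 as two digits gives 01 6F, with the zero in the first nyble.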
So instead of 1234567 let’s use a “date” such as you did in your example. It isn’t really a date, of course, just a number that by convention we treat as a date. Since I’m writing this on July 2nd, 2016, I’ll use that value (1160702). You can see the storage layout in the table below. Here currentYear1 is your first definition (overlay position 2) and currentYear2 is your second (overlay position 1):
             | Byte 1 | Byte 2 | Byte 3 | Byte 4
currentDate  | 1  1   | 6  0   | 7  0   | 2  F
currentYear1 |        | 6  0   | 7  0   |
currentYear2 | 1  1   | 6  0   |        |
So far so good, but when you did the overlay for currentYear by specifying that you wanted it to start in position 2, you were asking it to overlay starting in byte 2, not digit position 2. Since currentYear1 occupies two bytes ((2 + 1) / 2 = 1.5, rounded up to 2), it covers bytes 2 and 3 and contains the hex value 6070. When you displayed the value in the debugger, because the field was defined as a two-digit packed, the first nyble (the 6) was ignored and the last nyble (the 0) was treated as a positive sign. The result was that you saw the value 07.
When you changed the overlay position to 1 then you were defining currentYear2 as occupying bytes 1 and 2. So the value the debugger showed you was 16 because once again it ignored the first nyble (1) and the 0 was treated as the positive sign.
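To make the two cases concrete, here is a small C sketch that reads a two-digit packed field at a given byte position of the 1160702 storage. The function name `lenient_packed2` is hypothetical, and the "ignore the first and last nyble" behaviour simply mirrors what the debugger appeared to do in the description above; it is not a statement of how the debugger is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* 1160702 as a packed(7) field: nibbles 1 1 6 0 7 0 2 F */
static const unsigned char currentDate[4] = { 0x11, 0x60, 0x70, 0x2F };

/* Hypothetical lenient read of a packed(2) field overlaid at a 1-based
   BYTE position: use the two digit nibbles, ignore the unused first
   nibble and whatever sits in the sign nibble. */
int lenient_packed2(const unsigned char *ds, size_t ds_size, size_t position) {
    assert(position >= 1 && position + 1 <= ds_size);
    const unsigned char *field = ds + (position - 1);
    int tens  = field[0] & 0x0F;          /* low nibble of the first byte   */
    int units = (field[1] >> 4) & 0x0F;   /* high nibble of the second byte */
    return tens * 10 + units;
}
```

Calling `lenient_packed2(currentDate, sizeof currentDate, 2)` reads bytes 60 70 and yields 7, while position 1 reads bytes 11 60 and yields 16.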
Having said all that, had you actually run a program that used this technique you might have been writing to me with a slightly different question! This is because, in most cases, attempting to use currentYear would result in a decimal data error! The reason is that while the debugger may accept invalid values in the first and sign nybles, normal mathematical operations are not so tolerant.
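A stricter check shows why arithmetic is less forgiving. The sketch below assumes the usual packed-decimal rule that every digit nibble must be 0 through 9 and the final nibble must be a sign value (hex A through F); the function name `packed_is_valid` is invented for illustration, and the checks the system really performs may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every digit nibble is 0-9 and the last nibble
   is a sign (A-F). Assumed rule, see the text above. */
bool packed_is_valid(const unsigned char *field, size_t size) {
    assert(size > 0);
    for (size_t i = 0; i < size; i++) {
        unsigned hi = field[i] >> 4;
        unsigned lo = field[i] & 0x0F;
        if (hi > 9) return false;             /* high nibble is always a digit */
        if (i + 1 < size) {
            if (lo > 9) return false;         /* digit nibble */
        } else if (lo < 0x0A) {
            return false;                     /* last low nibble must be a sign */
        }
    }
    return true;
}
```

Under that assumption both overlaid values (hex 6070 and hex 1160) fail because their last nibble is 0, which is not a sign, whereas a genuine two-digit packed 16 (hex 016F) passes.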
Even if you happened upon a situation where you avoided a decimal data error, this would still not be a good practice and should not be used. After all, it confused you (and you wrote it), so imagine what it would do to the programmers who come after you!
The simplest modification to the code that would allow it to work correctly would be to use your original data-structure approach, but define the “date” field as zoned decimal (S), not packed. This will cause the individual digits to each be in a separate byte and then your original overlay approach (assuming the overlay is also changed to zoned) works just fine. The resulting code would look like this:
Dcl-Ds cymdDate;
   currentDate zoned(7);
   currentYear zoned(2) overlay(currentDate:2);
End-Ds;
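For readers who think in C, the zoned layout can be pictured as a union in which every digit has its own byte. This is a rough analogy only: the type and member names are made up for the sketch, and it uses ordinary character digits rather than real EBCDIC zoned bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C picture of the zoned data structure: one byte per
   digit, so byte position 2 is also digit position 2. */
union CymdDate {
    char currentDate[7];
    struct {
        char century;          /* byte 1 */
        char currentYear[2];   /* bytes 2-3, like overlay(currentDate:2) */
        char monthDay[4];      /* bytes 4-7 */
    } parts;
};

/* Copy the two year characters out as a C string. */
void zoned_year(const char digits[7], char out[3]) {
    union CymdDate ds;
    assert(digits != NULL && out != NULL);
    memcpy(ds.currentDate, digits, sizeof ds.currentDate);
    memcpy(out, ds.parts.currentYear, 2);
    out[2] = '\0';
}
```

Given the seven characters 1160702, `zoned_year` returns the string 16, which is what the overlay at position 2 now sees.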
The biggest problem with anything that relies on the underlying data storage layout is that it is not immediately obvious to those who come after what you are doing. I would recommend wrapping such code in a subprocedure named GetYYfromCYMD or something similar. That way anyone reading the code in the future will know immediately what you are doing without having to concern themselves with the mechanics. Here is a very simple example of such a subprocedure.
Dcl-Proc GetYYfromCYMD;
   Dcl-PI *N zoned(2);
      cymd zoned(7) Const;
   End-Pi;

   Dcl-Ds cymdDate;
      date zoned(7);
      year zoned(2) overlay(date:2);
   End-Ds;

   date = cymd;
   Return year;
End-Proc;
The downloadable code contains a short program that includes this subprocedure.
There are a variety of other options that you could also have used. For example:
currentYear = %Subst( %EditC( currentDate : 'X' ) : 2 : 2 );
The ‘X’ edit code on the %EditC preserves any leading zeros, and the %Subst extracts the relevant characters. In this case of course, currentYear would be an alpha value. Once again this would be a good candidate for a subprocedure to make the intent more obvious.
In fact, since a data structure is implicitly a character field, with the zoned version of the data structure you could avoid the %EditC and use this expression instead:
currentYear = %Subst( cymdDate : 2: 2 );
The result would be more efficient, but possibly less obvious.
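The edit-then-substring idea can be pictured in C as formatting the number with its leading zeros kept and then copying two characters. This is an analogy rather than a translation: `year_chars_from_cymd` is a hypothetical name, and `snprintf` with `%07ld` merely stands in for the 'X' edit code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a CYMD number as seven characters with
   leading zeros kept, then copy out characters 2-3 as a C string. */
void year_chars_from_cymd(long cymd, char out[3]) {
    char text[32];
    assert(cymd >= 0 && cymd <= 9999999L);
    snprintf(text, sizeof text, "%07ld", cymd);
    memcpy(out, text + 1, 2);   /* RPG position 2 is C index 1 */
    out[2] = '\0';
}
```

For 1160702 the result is the string 16. For a 1900s value such as 990101 the formatted text is 0990101, so the result is 99; without the leading zero the substring would land on the wrong characters.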
You could also use the date BIF %SubDt for example to extract the year portion after using %Date to convert your numeric “date”.
%SubDt( %Date( currentDate : *CYMD ) : *Y )
However, you only want the last two digits of the year, so it quickly gets clumsy and is far from the most efficient method. I could probably think of many more ways to extract the year, but the “best” method really depends on what you actually want to do with the extracted year number.
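To show why the date route is roundabout, here is the arithmetic behind it sketched in C. It relies on the CYMD convention that the century digit is 0 for 19xx and 1 for 20xx; the function names are hypothetical, and this is plain arithmetic, not a description of how %Date or %SubDt work internally.

```c
#include <assert.h>

/* Four-digit year from a CYMD number: C is 0 for 19xx, 1 for 20xx. */
int full_year_from_cymd(long cymd) {
    assert(cymd >= 0 && cymd <= 9999999L);
    int century = (int)(cymd / 1000000L);       /* leading C digit */
    int yy      = (int)(cymd / 10000L % 100);   /* the two YY digits */
    return 1900 + century * 100 + yy;
}

/* Getting back to two digits means throwing the century away again. */
int two_digit_year_from_cymd(long cymd) {
    return full_year_from_cymd(cymd) % 100;
}
```

For 1160702 the full year is 2016 and the two-digit year is 16, so the extra conversion adds a step only to discard its result.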
I hope this helps you to understand why your original examples behaved the way they did and also why you should sometimes be cautious of accepting the values shown to you in debug at face value. Remember: When you have any doubts about packed values in debug, the ability to use Eval variableName:x to display the value in hex is your friend.
Jon Paris is one of the world’s foremost experts on programming on the IBM i platform. A frequent author, forum contributor, and speaker at User Groups and technical conferences around the world, he is also an IBM Champion and a partner at Partner400 and System i Developer. He hosts the RPG & DB2 Summit twice per year with partners Susan Gantner and Paul Tuohy.
Why would you not recommend using standard date/timestamp data types? These are far more flexible and friendly than the examples, unless you have some view that these would not be suitable.
They would not be an answer to the question – which concerned the layout of data and why the OP did not see the values they expected. The whole tip is designed to explain that.
Personally I would be using real dates/timestamps/etc but that wasn’t the question.
A little off topic: old mainframe instructions missing in ILE C:
/*********************************************************************/
/* Pack (zoned to packed decimal)                                    */
/* (sizedec is in decimal digits, sizezone is in bytes)              */
/*********************************************************************/
#include <string.h>   /* memset, memcpy */

void pack (void* dec, char* zone, int sizedec, int sizezone) {
    int i, j, k, ds;
    unsigned char byte;

    if (sizedec%2 == 0) ds = (sizedec+2)/2;   /* size of dec in bytes */
    else                ds = (sizedec+1)/2;
    memset (dec, 0x00, ds);                   /* binary zeros to result */
    byte = zone[sizezone-1];                  /* last byte with a sign */
    if ( ((byte & 0xF0) != 0xF0) && ((byte & 0xF0) != 0xD0) )
        byte |= 0xF0;                         /* if bad sign force positive F sign */
    byte = (unsigned char)((byte << 4) | (byte >> 4));   /* swap halfbytes */
    memcpy ((char*)dec + ds - 1, &byte, 1);   /* set last byte of dec */
    if (ds == 1) return;                      /* one byte size is done */
    j = ds-1; k = 0;
    for (i = sizezone-2; i >= 0; i--) {       /* remaining digits, right to left */
        if (j == 0) break;
        if (k == 1) {                         /* high halfbyte */
            *((char*)dec+j-1) |= (zone[i] & 0x0F) << 4;
            j--;
            k = 0;
        }
        else {                                /* low halfbyte */
            *((char*)dec+j-1) |= (zone[i] & 0x0F);
            k = 1;
        }
    }
}
/*********************************************************************/
/* Unpack (packed to zoned decimal)                                  */
/* (sizezone is in bytes, sizedec is in decimal digits)              */
/*********************************************************************/
void unpk (char* zone, void* dec, int sizezone, int sizedec) {
    int i, j, k, ds;
    unsigned char byte;

    if (sizedec%2 == 0) ds = (sizedec+2)/2;   /* size of dec in bytes */
    else                ds = (sizedec+1)/2;
    memset (zone, '0', sizezone);             /* character zeros to result */
    memcpy (&byte, (char*)dec + ds - 1, 1);   /* last byte with a sign */
    byte = (unsigned char)((byte << 4) | (byte >> 4));   /* swap halfbytes */
    if ( ((byte & 0xF0) != 0xF0) && ((byte & 0xF0) != 0xD0) )
        byte |= 0xF0;                         /* if bad sign force positive F sign */
    memcpy (&zone[sizezone-1], &byte, 1);     /* set last byte of zone */
    if (sizezone == 1) return;                /* one byte size is done */
    j = ds-1; k = 0;
    for (i = sizezone-2; i >= 0; i--) {       /* remaining digits, right to left */
        if (j == 0) break;
        if (k == 1) {                         /* high halfbyte */
            zone[i] = 0xF0 | ((*((char*)dec+j-1) & 0xF0) >> 4);
            j--;
            k = 0;
        }
        else {                                /* low halfbyte */
            zone[i] = 0xF0 | (*((char*)dec+j-1) & 0x0F);
            k = 1;
        }
    }
}
/*********************************************************************/
/* Test pack() and unpk()                                            */
/*********************************************************************/
#include <stdio.h>     /* printf */
#include <decimal.h>   /* ILE C decimal type, digitsof */

/* Convert zoned to packed decimal */
void pack (void* dec, char* zone, int sized, int sizez);
/* Convert packed to zoned decimal */
void unpk (char* zone, void* dec, int sizez, int sized);

int main(void)
{
    char ZonedArg[4] = "4444";       /* exactly 4 digits, no terminating NUL */
    decimal( 8,0) DecResult;
    decimal( 2,0) DecArg = 22d;
    char ZonedResult[8];             /* filled by unpk, not NUL-terminated */

    pack (&DecResult, ZonedArg, digitsof(DecResult), sizeof(ZonedArg));
    printf ("%D(*,0) \n", digitsof(DecResult), DecResult);
    unpk (ZonedResult, &DecArg, sizeof(ZonedResult), digitsof(DecArg));
    /* print exactly sizeof(ZonedResult) characters; the buffer has no NUL */
    printf ("%.*s \n", (int)sizeof(ZonedResult), ZonedResult);
    return 0;
}