Mystifying OS/400 and i5/OS Mathematics
August 9, 2006 Howard Arner
A customer recently reported a bug in my SQL Thing product, but after investigation it turned out not to be a bug but the user’s perception of how decimal arithmetic works in the i5/OS operating system. The user was doing some arithmetic with decimal fields and then multiplying by 100 to show the value as a percent, and was getting a result of zero for all of his calculations. The real problem is in the way the AS/400, iSeries, and i5 handle promotion of precision and scale when decimal numbers are used in equations. If you are not careful, decimal arithmetic can lead to an overflow condition or the total loss of scale. I will show you a simple query over a set of test data to reproduce the problem, talk about the promotion of data types and the changes to precision and scale in decimal operations, and then explain how you can avoid these issues in your own code. First, the test data table needs to be created using the following statement: CREATE TABLE SQLBOOK.SALES (STORE CHAR(10) NOT NULL, YR INTEGER NOT NULL, MO INTEGER NOT NULL, SALES DECIMAL(10,2) NOT NULL, PRIMARY KEY (STORE, YR, MO)); The above statement creates the table SALES, which has columns that identify the store, year, month and total sales. The table is uniquely indexed by the combination of store, year and month. Take note that the SALES column is defined as a DECIMAL(10,2) number. Next, execute the following statements to put test data into the table: INSERT INTO SQLBOOK.SALES VALUES ('WESTWOOD',2005,1,23544.34); INSERT INTO SQLBOOK.SALES VALUES ('WESTWOOD',2005,2,18645.01); INSERT INTO SQLBOOK.SALES VALUES ('WESTWOOD',2005,3,19876.98); INSERT INTO SQLBOOK.SALES VALUES ('WESTWOOD',2006,1,27899.42); INSERT INTO SQLBOOK.SALES VALUES ('WESTWOOD',2006,2,19457.20); INSERT INTO SQLBOOK.SALES VALUES ('WESTWOOD',2006,3,28941.05); INSERT INTO SQLBOOK.SALES VALUES ('YULEE',2005,1,13544.34); INSERT INTO SQLBOOK.SALES VALUES ('YULEE',2005,2,8645.01); INSERT INTO SQLBOOK.SALES VALUES ('YULEE',2005,3,12876.98); INSERT INTO SQLBOOK.SALES VALUES ('YULEE',2006,1,14899.42); INSERT INTO SQLBOOK.SALES VALUES ('YULEE',2006,2,6457.20); INSERT INTO SQLBOOK.SALES VALUES ('YULEE',2006,3,14941.05); We now have data for months 1 through 3 for years 2005 and 2006 for our WESTWOOD and YULEE stores. Now, let’s attempt the following query: SELECT TY.Store,TY.Mo,TYSales, LYSales, (TYSales-LYSales)/LYSales*100 AS PctChange FROM (SELECT Store, MO, SUM(Sales) AS TYSales FROM SQLBOOK.SALES WHERE YR=2006 GROUP By Store, MO) AS TY INNER JOIN (SELECT Store, Mo, SUM(Sales) AS LYSales FROM SQLBOOK.SALES WHERE YR=2005 GROUP by Store, MO) AS LY ON (TY.Store = LY.Store AND TY.MO = LY.MO); The above query creates two tables. TY represents this year’s sales, while LY represents last year’s sales. The tables are then joined together on the STORE and MO (month) columns so that we can show this year’s sales versus last year’s. The fifth column in the SELECT clause is a formula that calculates the percent change from this year’s monthly sales over last year’s sales. However, take a look at the output of the query:
Notice something wrong? The PCTCHANGE column should be showing the percent change of the sales from last year to this year, but instead it is showing zero. This behavior is due to the casting of data types that happens during the mathematical operations being performed on the Sales field. The Sales field is losing its scale in this operation and generating an overflow condition, and thereby showing incorrect results. There are several solutions to this problem, one of which is to cast the divisor to a double precision number, as the following query demonstrates: SELECT TY.Store,TY.Mo,TYSales, LYSales, (TYSales-LYSales)/DOUBLE(LYSales)*100 AS PctChange FROM (SELECT Store, MO, SUM(Sales) AS TYSales FROM SQLBOOK.SALES WHERE YR=2006 GROUP By Store, MO) AS TY INNER JOIN (SELECT Store, Mo, SUM(Sales) AS LYSales FROM SQLBOOK.SALES WHERE YR=2005 GROUP by Store, MO) AS LY ON (TY.Store = LY.Store AND TY.MO = LY.MO); Ok, so that is the solution, or one variant of it, but I think it is important that you understand the underlying problem in order to be able to recognize where this might occur. So, let’s talk about type promotion and casting. Casting During Mathematical Operations The system casts data from one type to another when performing operations between data types, even if the types are the same. For example, multiplying a SMALLINT data type by an INTEGER data type results in an INTEGER result. Adding a DECIMAL(10,2) number to another DECIMAL(10,2) number results in a DECIMAL(11,2). Why does the system change the precision during addition? The OS/400 platform uses the following formula to determine the outcome of the operation. Assuming the addition of two decimal numbers, Decimal(P1,S1) + Decimal(P2,S2), the new scale is max(S1,S2) and the new precision is min(MP,max(P1-S2,P2-S2)+max(S1,S2)+1). MP in the formula denotes the Maximum Precision allowable in operations and is the value 31 unless you compiled your program using a value of 63 in the DECRESULT parameter of CRTSQLxxx command or the RUNSQLSTM command, or you perform the appropriate SET OPTION statement. In addition, MP can be 63 if either P1 or P2 are greater than 31. The same formula is used for subtraction operations. Multiplication operations can also change precision and scale. Multiplying DECIMAL(10,2) * DECIMAL(10,2) yields a DECIMAL(20,4). Here, the formula for the precision is min(MP, P1+P2) and the formula for scale is min(MS, S1+S2), where MS is the Maximum Scale value. The default for MS is 31 and it can be set to any number from zero to the maximum precision using the DECRESULT parameter of CRTSQLxxx and RUNSQLSTM commands or using the SET OPTION statement. Division gets a little dicey as DECIMAL(10,2) / DECIMAL(10,2) yields a DECIMAL(31,21). The formula for the cast precision is (P1-S1+S2) + max(MDS, min(ms,mp – (P1 -S1+S2))). This formula introduces the variable MDS which is the Minimum Divide Scale. The default of MDS is zero but it can be changed to any number from zero to the maximum scale using DECRESULT parameter of the CRTSQLxxx or RUNSQLSTM commands or by using the SET OPTION statement. Aggregates and Casting We have covered what happens with the precision and scale of a decimal during simple math operations, but what about when used in column functions? Consider this query: SELECT SUM(SALES) AS TOTAL FROM SQLBOOK.SALES The resulting column total will be a DECIMAL(31,2) number because the casting specification for the SUM aggregation is to create a decimal number as DECIMAL(MP,S), where S is the scale of the input decimal column and MP is the Maximum Precision (which I defined a few paragraphs previously). So, by summing the sales we are creating a very large decimal number. Let’s use our newfound knowledge about data type casting to see why the query fails to give the proper results. Try this one first: SELECT TY.Store,TY.Mo,TYSales, LYSales, (TYSales-LYSales)/LYSales FROM (SELECT Store, MO, SUM(Sales) AS TYSales FROM SQLBOOK.SALES WHERE YR=2006 GROUP By Store, MO) AS TY INNER JOIN (SELECT Store, Mo, SUM(Sales) AS LYSales FROM SQLBOOK.SALES WHERE YR=2005 GROUP by Store, MO) AS LY ON (TY.Store = LY.Store AND TY.MO = LY.MO); TYSales and LYSales are results of aggregate operations, so they are DECIMAL(31,2) numbers. Since the parenthesis indicate that the subtraction operation should occur first, we use the subtraction casting formula and the result of the operation is a DECIMAL(31,2) number. We are OK so far, as we have not yet lost precision. However, we are now dividing DECIMAL(31,2) by LYSales, which is also a DECIMAL(31,2) number. Remember the formula for casting scale in decimal division is max(MDS, min(MS, MP – (P-S1+S2))). So if we replace the values we yield the expression max(0, min(31,31 – (31-2+2))). If you work this out, you find that the scale operation will now be zero in the resulting expression, which is why the result looks like this:
Since the casting operation took the precision to zero, we have lost the information that we want to see, specifically the percent change calculation. Here is where we can introduce a trick of casting called type promotion in order to solve the problem. Type promotion occurs when you are doing math between different data types. For example, dividing a decimal number by a double precision number yields a double precision number. We can take advantage of this by casting the LYSales number to a double precision number and thereby avoid the loss of precision, which makes our query not work. Here is a sample of the query as you might want to code it: SELECT TY.Store,TY.Mo,TYSales, LYSales, DECIMAL((TYSales-LYSales)/DOUBLE(LYSales)*100,10,3) FROM (SELECT Store, MO, SUM(Sales) AS TYSales FROM SQLBOOK.SALES WHERE YR=2006 GROUP By Store, MO) AS TY INNER JOIN (SELECT Store, Mo, SUM(Sales) AS LYSales FROM SQLBOOK.SALES WHERE YR=2005 GROUP by Store, MO) AS LY ON (TY.Store = LY.Store AND TY.MO = LY.MO); I use the DOUBLE function to cast LYSales to a double precision number, thereby keeping the precision I need in the resulting operation. The result is then multiplied by an INTEGER 100 and the result is cast to a DECIMAL(10,3) value for display purposes. Here is what the output looks like:
So, the point of this tip is to let you know that math is fun, but beware creeping precision and loss of scale when playing with zoned and packed information! RELATED STORY Howard Arner is the author of iSeries and AS/400 SQL at Work and the vice president of Client Server Development, of Jacksonville, Florida. He spends his days writing utilities and software in Microsoft .NET that reads data on the OS/400 and i5/OS platform. You can reach him by e-mail through our contacts page. |