Rounding Over a Set of Rows
March 16, 2005 Michael Sansoterra
The code for this article is available for download.
Sometimes, a seemingly simple thing like rounding can present thorny issues. This is particularly true when the rounding has to take into account an arbitrary number of values. To illustrate, consider the example of a retail point-of-sale application that requires the rounded state sales tax and local sales tax to be recorded at the sales line level instead of at the header level.
This can get tricky because the rounded tax amount calculated at the header may not equal the sum of the tax amounts calculated against the line as shown here:
State Tax – |
7% |
|
|
Local Tax – |
1.5% |
|
|
|
|
|
|
Items |
Qty |
Ext Price |
Rounded Tax (8.5%) |
Candy |
1 |
$ 0.51 |
$ 0.04 |
Light Bulbs |
1 |
$ 2.28 |
$ 0.19 |
|
|
|
|
Item |
|
$ 2.79 |
|
|
|
|
|
Tax (8.5%) |
|
$ 0.24 |
$ |
|
|
|
|
Total |
|
$ 3.03 |
|
Of course, we don’t want the total tax to show as $.23 because the customer would think our computer doesn’t multiply correctly. Nevertheless, in this case the rounded accumulated tax for line items is less than the total tax as calculated against the item total.
This problem shows up in many applications where discounts, taxes, or any type of decimal multiplication occurs against a set of line items and against the header with the same scale (number of decimal positions). So the general question at hand is: How do we get the extended line item calculations to correctly match the same calculation against the header?
One simple solution is just to calculate the difference between the line totals and the header total and add it to one of the line items. The only problem with this scenario is in cases where a significant difference may have to be added to a single line.
To solve this dilemma, let’s review the root of the problem. Rounding discrepancies arise in this case because the rounding occurs once at the header but multiple times on the lines. As a result of the extra rounding, sometimes additional pennies are added or missed during the totaling. The appropriate solution then is to only round once on both the header and the lines so that the totals always match. So how do we round only once when there are multiple lines?
To answer this, I’ll present a simple rounding technique I came across in an old MAPICS payroll manual. To round once over multiple lines we only need to add half a penny on the first calc. The remainder from the calculation is carried over to the next line as demonstrated here:
|
Ext |
|
Rounding |
Tax |
Truncated |
|
Items |
Price |
Tax 8.5% |
Adjustment |
Amount |
Amount |
Remainder |
Candy |
$ 0.51 |
0.0434 |
0.0050 |
0.0484 |
$ 0.04 |
0.0084 |
Light Bulbs |
$ 2.28 |
0.1938 |
0.0084 |
0.2022 |
$ 0.20 |
0.0022 |
|
|
|
|
Total |
$ 0.24 |
|
For simplicity, the city and state tax were left combined, although the technique can be used for applications involving several multiplication extensions.
As you can see in the chart, for the first sales line a rounding adjustment of half a penny is automatically included. This is the equivalent of the standard half a cent rounding that’s done when rounding after a single calculation. The extended amount is then truncated at the desired number of decimals (in this case two.) To spread the rounding amount over the remaining lines the fraction of a penny left over from the first calculation is carried over to the next line.
With this rounding technique calculations against the header and line will come out equal because in each case the rounding is only done once. For each line item, the maximum rounding error will be ± 1 cent, which is a side effect of carrying over the remaining fractions of a penny across all the lines.
To code a program with this technique, we simply need to write a routine that sets the initial rounding adjustment amount to half a cent and remembers the remainder from the prior calculation so that it can be included in the current calculation.
We can pick just about any programming language to do this. Since SQL is adept at working with sets of rows, writing a scalar user-defined function (UDF) provides a particularly good solution. The only SQL trick to using this code is making the function remember the prior row’s rounding amount remainder. This can be done by using a special feature of UDFs called a scratchpad. The scratchpad is a programmer-defined storage area of arbitrary length that can be used to preserve the rounding value of the current row for use when the function is called for the next row.
If you need help understanding user-defined functions and the scratchpad, see “Scribble on SQL’s Scratchpad”, where a similar UDF is demonstrated to create a running total.
UDF RoundSet (which is in the downloadable code at the top of this story) is an RPG program written to do the rounding work over many rows. The function will accept three parameters:
- BreakData: This VARCHAR parameter is used as a “control break” indicator allowing the function to determine when it should re-start the rounding process for a new set of rows.
- ExtAmount: The extended amount parameter is a decimal field, usually containing the unrounded result of an extended amount.
- RoundScale: Rounding scale informs the function of how many decimal places should be rounded before the function returns its result.
The RoundSet RPG program is registered for use with SQL with the following Create Function statement:
Create Function xxxxx/RoundSet(BreakData VarChar(128), ExtAmount Dec(30,9), RoundScale Int) Returns Dec(30,9) External Called on null input Not deterministic Language RPGLE No Sql Parameter Style DB2SQL ScratchPad 156 Final Call Disallow Parallel
Review the IBM’s DB2 Universal Database for iSeries SQL Reference for help on the keywords specified on the Create Function statement.
To demonstrate the power of this function, let’s take a payroll example right from the MAPICS manual. Consider the following earnings register where we desire to print an employee’s daily earnings extended only to two decimal places:
|
|
|
|
Gross |
Gross |
Gross |
Emp |
Card |
Hours |
Pay |
No |
Round |
Set |
No. |
Date |
Worked |
Rate |
Round |
(2 |
Round |
1 |
|
4.50 |
9.863 |
44.38350 |
44.38 |
44.38 |
1 |
|
4.50 |
9.863 |
44.38350 |
44.38 |
44.39 |
1 |
|
4.50 |
9.863 |
44.38350 |
44.38 |
44.38 |
1 |
|
4.50 |
9.863 |
44.38350 |
44.38 |
44.38 |
1 |
|
4.50 |
9.863 |
44.38350 |
44.38 |
44.39 |
|
Total |
22.50 |
9.863 |
221.91750 |
221.90 |
221.92 |
|
|
|
|
|
|
|
2 |
|
8.00 |
9.731 |
77.84800 |
77.85 |
77.85 |
2 |
|
8.00 |
9.731 |
77.84800 |
77.85 |
77.85 |
2 |
|
8.00 |
9.731 |
77.84800 |
77.85 |
77.84 |
2 |
|
7.50 |
9.731 |
72.98250 |
72.98 |
72.99 |
2 |
|
8.00 |
9.731 |
77.84800 |
77.85 |
77.84 |
|
Total |
39.50 |
9.731 |
384.37450 |
384.38 |
384.37 |
By looking at the “Gross No Round” column we can see that the actual gross earnings for Employee Number 1 is $221.91750, but this column has too many decimal places. Rounded to two decimals the earnings are $221.92. From the “Gross Round (2 Dec)” column, if we round on every line, we’ll erroneously calculate a total of $221.90. Column “Gross Set Round” uses the rounding technique discussed above to effectively round once over a set of rows by only adding half a cent on the first calculation and then carrying the fractional remainder to successive rows. It also prints the desired two decimal places. The total of this column correctly sums to $221.92–the same amount we would’ve achieved by multiplying the total weekly hours by the pay rate. A similar scenario is shown for Employee Number 2.
The EmpTime table shown below shows the employee time card table that holds the same information as shown above, except for the calculated columns:
Create Table xxxxx/EmpTime ( EmpNo Int Not Null, TCDate Date Not Null, HoursWorked Dec(5,2) Not Null, PayRate Dec(7,3) Not Null, Primary Key (EmpNo,TCDate))
By using the RoundSet UDF, the above sample register can be generated in a single statement using a common table expression (i.e., the SQL WITH statement, which requires V5R1 or higher):
With EarningsRegister As ( Select TCDate,EmpNo,HoursWorked,PayRate, HoursWorked*PayRate As Gross_NoRound, Round(HoursWorked*PayRate,2) As Gross_LineRound, RoundSet(Cast(EmpNo As VarChar(128)), HoursWorked*PayRate,2) As Gross_SetRound From EmpTime Where TCDate Between '2004-02-02' And '2004-02-06') /* Detail */ Select * From EarningsRegister Union All /* Emp Total */ Select Cast(Null As Date),EmpNo, Cast(Null As Int),Cast(Null As Int), Sum(Gross_NoRound), Sum(Gross_LineRound), Sum(Gross_SetRound) From EarningsRegister Group By EmpNo Union All /* Grand Total */ Select Cast(Null As Date),Cast(Null As Int), Cast(Null As Int),Cast(Null As Int), Sum(Gross_NoRound), Sum(Gross_LineRound), Sum(Gross_SetRound) From EarningsRegister Order By 2,1
Tips for using the UDF
- Be sure to use a forward only cursor with this type of UDF. Fetching backwards through the rows can cause erroneous results.
- The function depends on the rows being pre-sorted according to the field provided in the break data parameter. If for some reason this order is broken the function will return incorrect values. While normally the ORDER BY clause will sort the rows correctly, keep in mind that the ORDER BY may only sort the final result set — not necessarily the intermediate results (depending on the complexity of the data and the keys on the file). This means that SQL may not necessarily feed the function the data in the same order as specified in the ORDER BY. For complex queries, sometimes it’s best to have a correctly sorted intermediate temporary table before invoking this function.
- A scale of zero or a negative number can also be passed to the function. A negative scale rounds to the left of the decimal point. For example, by placing -2 in the UDF’s scale parameter in the Select shown above, the function will correctly total by employees to $600.
- Do not call the function for rows that have an extension of zero, because a leftover rounding amount could be assigned to a line that did not contribute to the total.
Avoid Rounding Discrepancies
A mistake I made during my early programming days involved a payroll application that rounded the same data in several different ways–sometimes by employee, sometimes by department, and sometimes by general ledger account. Unfortunately, due to rounding a different number of times for each report, none of them ever added up exactly the same amount, which often made the reconciliation of pennies, nickels, and dimes quite a chore. By using this technique of rounding once over an entire set of rows, the discrepancies could have been avoided.
Michael Sansoterra is a programmer/analyst for i3 Business Solutions, an IT services firm based in Grand Rapids, Michigan. Click here to contact Michael Sansoterra by e-mail.