Use SQL User-Defined Functions to Avoid Data Repetition Problems
December 12, 2007 Michael Sansoterra
By now, most System i and AS/400 developers know they can enhance the SQL language in DB2 for i5/OS by creating their own user-defined functions (UDFs). In this tip I’d like to focus on one common problem encountered when writing queries and how a simple UDF template can often solve this problem. By way of review, in case you’re not familiar with them, UDFs are pieces of code that can bolt onto SQL. UDFs usually contain business logic for doing complex tasks that would be difficult or cumbersome to do using the SQL built-in functions. There are a variety of UDFs, but for now we will only consider a scalar UDF, which is a UDF that accepts zero or more input parameters and returns a single output parameter. Now let’s review this typical problem encountered when writing queries. To begin, a user requests a simple report that is formatted for Excel. Take the following sales order example: Select OrderId,OrderDate, CustomerId,ItemId,ItemDesc,Qty,ExtAmount From Order O Join OrderDetail D On D.OrderId=O.OrderId Where D.ItemType='ELECTRONIC' And D.Qty>10 You finish the query, dump the data to Excel and you’re done in under half an hour. Now the user comes back and wants to see all the serial numbers associated with each item number. You rewrite the query: Select OrderId,OrderDate, CustomerId,ItemId,ItemDesc,Qty,ExtAmount, SerialNo From Order O Join OrderDetail D On D.OrderId=O.OrderId Left Join Serial S On S.OrderId=D.OrderId And S.ItemId=D.ItemId Where D.ItemType='ELECTRONIC' And D.Qty>10 Now the user is getting the desired serial numbers. However, the unintended side effect is that the detail line is getting repeated for each serial number. If a quantity of 11 is sold for an item, then theoretically there should be 11 serial numbers that give you the same detail line repeated 11 times. The order detail data is the primary focus of the query so repeating detail data to show serial numbers just creates a mess. What can we do about this? Well, one solution is to let the user eliminate the duplicates in Excel, which can be time consuming. If you have a fancy Excel creation utility you may have the luxury of removing the duplicates, but usually that requires extra programming. Another possible solution is to combine this ancillary repetitive data all into one column. In this scenario the serial number data would be grouped into one large column, each serial number being concatenated together by a delimiter of some kind. This solution will keep the order detail data unique so that data isn’t repeated due to a join with the serial number file. A UDF is well suited to perform this task. Shown below is a sample UDF written with the SQL procedural language that will receive the “OrderId” and “ItemId” as parameters. The serial numbers for the given order and item combination will be joined together (delimited by a space) and returned in one large variable character (VARCHAR) column. Create Function GetSerialList (@OrderId Int, @ItemId Int) Returns VarChar(1024) Language SQL Reads SQL Data Set Option Commit=*None, UsrPrf=*Owner, DatFmt=*ISO Begin Declare @List VarChar(1024) Not Null Default ''; Declare @i Int Not Null Default 0; For SerialList AS SerialNumbers Cursor For Select RTrim(SerialNo) As Serial From Serial Where OrderId=@OrderId And ItemId=@ItemId Do Set @i=i+1; If @i>1 Then Set @List=@List||'' '; End If; Set @List=@List||Serial; End For; Return List; End This function uses the SQL FOR statement as a control statement that provides a template for loop processing. Every row returned from the FOR statement’s cursor is processed by any statements a developer wants to sandwich between the DO and END FOR statements. These statements can make use of declared variables or columns defined in the cursor’s SELECT clause. If you have column expressions in your SELECT, make sure you assign alias names so that they can be used within the row processing code. In our sample code, the only necessary functions to perform are to increment a counter variable (@i) and to append the serial number (column Serial) from the cursor to the @List variable. When all rows are processed, the FOR statement is exited automatically and the @List variable is returned as the function’s result. In fact, this template can be summarized as follows in four easy steps:
Now back to our original query. We can use the GETSERIALLIST UDF to give us all serial numbers in a single column: Select OrderId,OrderDate, CustomerId,ItemId,ItemDesc,Qty,ExtAmount, GetSerialList(OrderId,ItemId) As SerialNos From Order O Join OrderDetail D On D.OrderId=O.OrderId Where D.ItemType='ELECTRONIC' And D.Qty>10 Now instead of duplicating rows we simply create a large column for holding repetitive data. Of course, depending on the type and length of repetitive data, jamming it all in a single column may not always work. However, users I have worked with most often welcome this solution. A few more examples are:
Using this simple UDF template to combine repetitive data into a single column often provides an easy and effective way to deliver data to the user without having to muck up a query with additional joins. Michael Sansoterra is a programmer/analyst for i3 Business Solutions, an IT services firm based in Grand Rapids, Michigan. You can contact him through our contact page.
|