Guru: Create A UDTF Wrapper For A Stored Procedure Result Set
February 4, 2019 Michael Sansoterra
In the tip, Arranging Query Logic in DB2 for i Routines, I addressed a reader’s question about how to create a user-defined table function (UDTF) in DB2 for i that would return the same result set as an existing stored procedure. The purpose of having the UDTF would be to do additional processing on a result set, such as joining the result set with another query or dumping the result set to a temporary table for analysis. To accomplish this, I suggested moving the stored procedure query logic into a UDTF and then replace the query within the stored procedure with the new UDTF.
However, there may be a case when that suggestion can’t be implemented for a reason, such as when the stored procedure is part of a vendor-supplied application package that can’t be modified. Another possibility is that the stored procedure was written in a language unfamiliar to you, such as C, and the task of rewriting the logic as UDTF is time-consuming and error-prone. In cases like these, a UDTF can be written to execute the procedure, consume the result set, and format the data in a fashion that the table function can utilize.
This story contains code, which you can download here.
Using this wrapper technique is powerful, as it will allow a stored procedure result set to be used within other DML statements. This technique can be implemented using many high-level languages, including Java and RPG. I will demonstrate this method using an “external” Java UDTF. Keep in mind a performance tax is imposed with a wrapper such as this, so carefully weigh the cost of convenience vs. performance.
The Stored Procedure And Result Set
For simplicity, using the example from the last tip, let’s say there is a stored procedure named GetOrders and it accepts CustomerId (INT) as a parameter. For the sake of argument, we’ll assume this procedure is vendor-supplied and shouldn’t be modified, yet we want to use its result set in a DML statement such as SELECT or UPDATE.
The procedure can be invoked as follows:
CALL GetOrders (11091);
When executed, the procedure returns a result set consisting of the following five columns:
Column | Data Type |
SalesOrderId | INT |
CustomerId | INT |
OrderDate | DATE |
ShipDate | DATE |
SubTotal | DECIMAL(19,4) |
The UDTF Wrapper
An “external” user-defined table function (UDTF) named GetOrdersJava will be used to capture the stored procedure’s result set. This type of function is called external because the code is written in a high-level language, rather than SQL. The function can be invoked in a simple SELECT as follows:
SELECT * FROM TABLE(GetOrdersJava(11091)) X;
The GetOrders.java source code for the GetOrdersJava UDTF is attached here. This tip assumes you know how to compile Java code. Note that in order to be used by DB2 for i, standalone Java classes should be compiled or placed in the following special IFS path:
/qibm/userdata/os400/sqllib/function
Finally, do not confuse the Java source code and class name (GetOrders.java) with the UDTF name GetOrdersJava.
If you’re not familiar with external table functions, I’ll briefly highlight how they work. External table functions (as opposed to SQL table functions) are coded in a high-level language and must follow a specific protocol demanded by DB2.
The first concern when coding a table function is the parameter list. The number of parameters can vary depending on the language and parameter style specified on the CREATE FUNCTION statement (more on this in a minute). In general, the parameter list for an RPG program, subprocedure or, in this case, a Java method, should contain an input parameter for each UDTF input parameter and additional output parameters for each column the UDTF returns.
The next concern when coding a table function is DB2’s ability to communicate to the function what call type (i.e. processing stage) it is in. The list of the five possible call types is shown below and includes a description of what GetOrders.java is doing during each call type:
Call Type | UDTF’s Task | GetOrders.java Implementation |
First | Perform initialization | Get JDBC reference to current database connection. |
Open | Open any resources required to do the work | Use JDBC to call the GetOrders stored procedure and return a result set internal to the Java program. |
Fetch | Give DB2 a row of data by populating the output parameters (i.e. UDTF columns) and exit. If no data remains, set the SQLCode to ‘02000’ (end of data). DB2 will continually call the UDTF with the fetch call type until it receives the end of data indicator. | Read one row from stored procedure result set and populate output parameters (i.e. build one row and give it back to DB2). Continue this process until there are no more rows from the result set. |
Close | Close resources | Close JDBC result set, connection, etc. |
Last | Perform final cleanup | Nothing |
In other words, DB2 will call the function a few times and give it the option to initialize data and open or allocate resources. Thereafter, it will continually call the function with a “Fetch” request. Each Fetch invocation represents one row of data to be returned from the table function.
Note that the state of the table function variables can be maintained across invocations (depending on how the function is written). In the Java program, the stored procedure result set is left open in between fetch calls and advanced one row with each fetch call. It would be a bad thing if we had to call the procedure, open the result set and reposition the result set with each type of fetch call!
When the UDTF indicates to DB2 that there are no more rows (this varies by language but in Java it is done by setting the SQLState as ‘02000’ (end of data)), DB2 will call the function a few more times to give it a chance to clean up resources.
Coding a table function in Java using the DB2GENERAL parameter style has an advantage over using other languages because it inherits a bunch of “plumbing” code from an IBM provided Java class named UDF. Therefore references to things like the call type, methods to set the SQL state, detect nulls on primitive types, etc. are already supplied and there is less chance of making a coding mistake. However, Java can carry some additional overhead, especially on the first invocation of a Java routine, as DB2 must take time to create a Java Virtual Machine instance to run the code.
Once the Java code has been compiled as a class (see source code header for a sample javac QShell command), it can be registered for use with DB2 as a UDTF using the CREATE FUNCTION statement. The CREATE FUNCTION definition for GetOrdersJava is shown here:
CREATE OR REPLACE FUNCTION GetOrdersJava (@CustomerID INT) RETURNS TABLE ( SalesOrderId INT, CustomerId INT, OrderDate DATE, ShipDate DATE, SUBTOTAL DEC(19,4)) EXTERNAL NAME 'GetOrders.orders' LANGUAGE JAVA PARAMETER STYLE DB2GENERAL DISALLOW PARALLEL FENCED SCRATCHPAD FINAL CALL RETURNS NULL ON NULL INPUT MODIFIES SQL DATA
I have bolded a few of the options that merit attention. First, EXTERNAL NAME specifies the Java class name (GetOrders) and method name (orders) to be invoked when the UDTF is called.
Second, PARAMETER STYLE DB2GENERAL is associated with a specific way of coding the Java function. Using this option gives DB2 the expectation that the Java code provided will inherit from IBM’s UDF class.
Third, MODIFIES SQL DATA may or may not be required. If the UDTF calls a resource such as another UDTF or stored procedure that DB2 believes has the ability to modify data, then MODIFIES SQL DATA is required. If the stored procedure GETORDERS was defined as READS SQL DATA then this UDTF could also be defined as READS SQL DATA. What’s the difference? Nothing other than DB2 may have to be a little more cautious about how it runs code if it believes data can be modified using SQL and that implies additional overhead. Even if your stored procedure will not be updating data, if it has not been marked as READS SQL DATA, then the UDTF will likely need to be built with MODIFIES SQL DATA.
Finally, not shown in this example is the CARDINALITY option. Specifying CARDINALITY followed by an estimated number of rows (e.g. CARDINALITY 5000) lets DB2 know how many rows, on average, the table function will return. This information can help DB2 make a more efficient query plan when the table function participates in a complex query involving other functions, tables or views. If your UDTF will return a relatively consistent average number of rows, then the CARDINALITY option should be included. If the number of rows returned by the function will vary greatly, then this option can be omitted.
For more information on building external table functions, see the references at the end of the article.
After the Java UDTF GetOrdersJava has been created, when run it will call the GetOrders stored procedure, read its result set one row at a time (and give DB2 one row at a time), and then clean up after itself. Because it is wrapped by a UDTF, the stored procedure result set can now participate in a wide array of SQL DML statements.
A Caveat
This UDTF is tightly coupled to the underlying result set. The column names and data types returned from the procedure should not change lest the table function potentially break as well. While a more dynamic approach can be taken to get column names and data types from a stored procedure result set, in the end, the coding hassle may not be worth the trouble as ultimately, the column definitions in the table function can never be dynamic (unless you’re willing to continually DROP and CREATE a table function to match a result set on a session by session basis.)
Author’s Note: Here are a few helpful references.
The power of user-defined table functions: A great overview of SQL and external UDTFs by Birgitta Hauser. If you’re new to UDTFs, this is a great place to start.
DB2 for i: Process Stored Procedure Result Sets as Cursors: For ideas on how to implement this function in RPG instead of Java, you will need to know how to capture and manipulate a stored procedure result set in RPG using embedded SQL.
IBM Developer Kit for Java: This reference guide contains a section entitled “Java SQL Routines” that describes the intricacies of creating Java code that DB2 can run (including scalar functions, table functions and stored procedures). I left quite a bit unsaid on exactly how all of this works and this is the IBM documentation for the topic. Don’t be discouraged if you don’t digest all of the options right away.