Inline Table Functions In DB2 For i
June 14, 2016 Michael Sansoterra
In DB2 for i 7.2 TR4 and IBM i 7.3, IBM has made a special user-defined table function (UDTF) enhancement that should be shouted from the rooftops. This enhancement is referred to as an inline table function. Consider the simple example of this UDTF named Get_Customer_Orders: CREATE OR REPLACE FUNCTION Get_Customer_Orders ( @CustomerID INT, @StartDate DATE, @EndDate DATE) RETURNS TABLE ( SalesOrderId INT, CustomerId INT, OrderDate DATE, ShipDate DATE, SUBTOTAL DEC(19,4)) LANGUAGE SQL NO EXTERNAL ACTION DISALLOW PARALLEL SET OPTION COMMIT=*NONE,USRPRF=*OWNER,DATFMT=*ISO RETURN SELECT SalesOrderId,CustomerId, OrderDate,ShipDate,SubTotal FROM SalesOrderHeader WHERE CustomerId=@CustomerId AND OrderDate BETWEEN @StartDate AND @EndDate; What does DB2 do when the function is invoked? In older versions of DB2 for i, when you examine the following query within Visual Explain: SELECT * FROM TABLE(Get_Customer_Orders(29994, '2007-05-01','2008-04-30')) CO If you do that, you will get the result shown in Figure 1:
As you can see from the figure, the table function is shown as a single-step “black box” within the access plan, and the only row count estimate from the table function is the function’s cardinality, which in the above case was defaulted at 1,000 rows. This plan gives no indication of what tables or indexes are being used by the UDTF. Now, moving to the latest version of DB2 for i to run the same simple query, Visual Explain now shows this:
Why the difference in plans? The answer is that DB2 has now “inlined” the table function. The index probe shown in the plan is for an index on the SalesOrderHeader table. This means that DB2 has the ability to incorporate the UDTF’s table and logic into the parent query’s access plan. similar to how a view reference within a query would be “expanded” into the access plan by DB2. This is great news because DB2 has greater control over organizing the UDTF’s resources when building an access plan. To illustrate, consider this query that hosts the Get_Customer_Orders UDTF: SELECT CO.*,ProductId,UnitPrice,OrderQty FROM TABLE(Get_Customer_Orders(29994, '2007-05-01','2008-04-30')) CO JOIN SalesOrderDetail SD ON CO.SalesOrderId=SD.SalesOrderID As Visual Explain shows in Figure 3, the UDTF’s access plan is incorporated directly into the overall access plan:
Two index probe operators are included in the plan shown in Figure 3. The upper operator is the SalesOrderHeader table, with an estimated 12 rows returned. The lower operator is the SalesOrderDetail table, with an estimated four detail rows returned per header. For clarity, Figure 4 shows what the plan for the above query looks like prior to the new in-lining feature:
Notice the row count estimates given in the plan in Figure 4 compared with Figure 3. Prior to inlining (Figure 4), DB2 estimated the final query would result in about 4,000 rows (1,000 rows from the table function multiplied by an average of four detail lines per order header). With inlining (Figure 3), because the UDTF’s resources are included, DB2’s estimate is improved to 22 rows. The actual row count returned by the query with the given literal values is 109 rows. The benefit of inlining is that DB2 gets to examine a UDTF’s tables directly to make its row count estimates. Row counts are important to DB2 because it uses them to determine table join order, the estimated amount of memory to allocate for the query’s use, etc. Having these numbers out of whack can cause DB2 to generate a suboptimal plan. Of course, this trivial example was somewhat of a hack. A cardinality (i.e., estimated row count) should have been supplied with the table function definition to give DB2 a better idea about the number of rows being returned. Specifying CARDINALITY 50 when creating the function would have taken care of the vast row count discrepancy between the two plans. Even so, for complex queries with widely varying parameters, it is not easy for a developer to supply an accurate “one size fits all” cardinality estimate. So, inlining allows DB2 use table statistics to do this row count estimate. Moreover, recall that DB2 can rewrite a query to do a statement’s JOINs in the order it thinks will be optimal, instead of sticking with the order given in the SQL statement. Most often, DB2’s chosen join order is superior to what the developer supplies, especially in complex queries. In the past, the result set produced by a UDTF was considered as a standalone table in DB2 with respect to the join order. But since an inline UDTF’s tables are now open to DB2, it can build a plan that considers the tables in the host query along with the tables within the UDTF such that it can optimize the join order of all the tables. This is the same benefit that DB2 gives to views. A view’s referenced tables are incorporated directly into the access plan so that DB2 can consider all tables at once. There are a few rules to follow when creating a UDTF that can be inlined. The primary requirements for inlining eligibility are:
Once you have written a UDTF, you can check the INLINE column in the SYSFUNCS catalog to determine if it is eligible to be inlined: SELECT * FROM QSYS2.SYSFUNCS WHERE ROUTINE_NAME='GET_CUSTOMER_ORDERS' AND ROUTINE_SCHEMA=CURRENT_SCHEMA AND INLINE='YES' The INLINE attribute of a function only indicates that it’s eligible to be inlined into a parent query. Depending on the run-time situation of the query that references the UDTF, DB2 may choose not to inline it. (For example, if a host query user profile is running under *USER while the UDTF is defined with *OWNER.) Make Your Functions Get “Inline” DB2 for i’s ability to inline table functions is a great feature. It requires no new coding knowledge and only a few rules to understand inlining eligibility. The combined benefit is the ability to write re-usable parameterized functions while giving DB2 greater latitude in incorporating the function’s logic and tables into an overall query plan. The end result is better access plans that translates to better performance. Michael Sansoterra is a DBA for Broadway Systems in Grand Rapids, Michigan. Send your questions or comments for Mike via the IT Jungle Contact page.
|