Let One Row Represent A Group, Take Three
October 27, 2015 Ted Holt
Faithful reader Bill Cisne sent me an email with an SQL SELECT query. No comment, no explanation, just code. Bill had figured out how to use a lateral join to solve a problem and was kind enough to share his success with me. I have been aware of lateral joins for some time, but have not found a need for them in my work. Today I revisit a problem to illustrate one way that lateral joins can be useful. LATERAL is a type of for-each loop within a query. For each row that a SELECT statement (the “outer” query) returns, the system executes a second (or inner) query. The power of LATERAL is that the inner query may refer to columns from the outer query. Michael Sansoterra mentioned LATERAL a long while back in this august publication. To illustrate the workings of LATERAL, let’s return to an example I’ve used before: retrieving only one row per group. Let’s list an unshipped sales order with information from only one line item–the one with the earliest entry time. Here’s the raw detail data: Order Line Qty Shipped Balance Price Time 101 1 10 10 0 4.00 2015-10-21-08.14.06 101 2 12 0 12 1.00 2015-10-21-11.15.32 101 3 8 0 8 3.00 2015-10-28-14.14.37 102 1 10 5 5 2.00 2015-10-21-10.05.12 103 3 5 0 5 1.10 2015-10-21-14.37.25 104 1 5 5 0 3.50 2015-10-21-14.58.21 105 1 4 4 0 5.00 2015-10-21-15.30.44 105 2 6 0 6 2.00 2015-10-21-15.30.45 Notice that there are two incomplete lines for order 101. Here’s the query: select h.OrderNo, substr(char(h.OrderDate),1,10) as OrderDate, d.LineNo, (d.QtyOrdered - d.QtyShipped) as Balance, d.Price, dec((d.QtyOrdered - d.QtyShipped) * d.Price,7,2) as Extended, substr(char(d.crttime), 1, 16) as TimeEntered from SalesOrdH as h, lateral (select * from SalesOrdD as x where h.orderno = x.orderno and x.QtyShipped < x.QtyOrdered order by x.crttime fetch first row only) as d To deepen my understanding, I’ve been working through the examples in Michael R. Jones’ fabulous ebook, Converting RPG to Set Based SQL (Part 1: Input Operations). Michael uses a slightly different syntax, which works just as well as the previous example. select h.OrderNo, h.CustNo, substr(char(h.OrderDate),1,10) as OrderDate, d.LineNo, (d.QtyOrdered - d.QtyShipped) as Balance, d.Price, dec((d.QtyOrdered - d.QtyShipped) * d.Price,7,2) as Extended, substr(char(d.crttime), 1, 16) as TimeEntered from SalesOrdH as h cross join lateral (select * from SalesOrdD as x where h.orderno = x.orderno and x.QtyShipped < x.QtyOrdered order by x.crttime fetch first row only) as d Both queries return the same report: Order Customer Order date Line Balance Price Extended Time 101 80112300 2015-10-20 2 12 1.00 12.00 2015-10-21-11.15 102 25000300 2015-10-20 1 5 2.00 10.00 2015-10-21-10.05 103 65888800 2015-10-21 3 5 1.10 5.50 2015-10-21-14.37 105 42221700 2015-10-21 2 6 2.00 12.00 2015-10-21-15.30 To understand this, let’s begin with the outer query. It selects a few columns from the order header table and a few columns (some of which are calculated) from a lateral expression, i.e. an inner query. For each sales order header row, the query engine runs the inner query to retrieve one row from the sales order detail table. This inner query looks for the earliest-dated detail row which has not been completely shipped. Notice that the nested query refers to a table in the outer query (correlation name h). That’s that power of LATERAL that I mentioned above. LATERAL was added to DB2 some 11 years ago. Don’t you agree that it’s about time I got around to mastering it? RELATED STORIES For i Scalar Function Performance Considerations Let One Row Represent a Group, Take 2
|