Guru: Partitioning Result Sets Using SQL
November 27, 2023 Mike Larsen
While working on a project recently, I needed to retrieve attributes of an item. While that sounds like a simple task, there was a twist. While reviewing the contents of the item master table, I noticed there could be multiple rows for the same item and each row was active and valid from a business perspective.
I won’t get into the business end of it, but my goal was to return the most recent iteration of the item for further processing in the program. After doing some research and trying some potential solutions, I came across the SQL partition by clause and wanted to see if it could work for me.
It makes sense, at this point, to show what the items look like (Figure 1).
In this example, I have an item number (aka NDC_NUMBER), a GTIN that represents a case of the item, the case quantity, and a sequence number. When this item was originally created, it had a case GTIN identifier of 40300931135566 and the case held 48 items. At some point the manufacturer of the item changed the case quantity to 72 and assigned a new GTIN identifier to it. For the purposes of my program, I needed to return the most recent iteration of the item.
After working with the partition by clause for a bit, I was able to write a statement that gave me exactly what I wanted (Figure 2). I’ll show the entire SQL statement, then I’ll break it down and explain the code.
On line 1, I’m building a common table expression to hold my result set. Lines 3 – 7 is where the magic happens. On line 3, I select the NDC, case GTIN, and case quantity from my item master table. The Row_number() function, on line 4, assigns a sequential number to each row in the result set. I’ll use that later. The over clause defines the result set on which I will perform an OLAP (Online analytical expression) operation. Next, the partition by clause is used to divide the result set into partitions. Partitioning a table makes it easier to manage and query the data and can improve query performance. I added an order by clause to sort the results in descending order. I did that because I want to get the most recent iteration of the item.
In figure 3 and 4, I show the results of the SQL statement.
I see that I have the two rows for the item, and I have the most recent version of the item in the first row. My final step is to add criteria to the SQL statement, so it only returns the first row (Figure 5).
When I run this query, I get just the row in which I’m interested (Figure 6).
There are usually many options that solve a challenge when building a solution. This is the method I chose to solve mine and it is working perfectly.
Mike Larsen is a director of information technology at Auburn Pharmaceutical and has been working with IBM i systems for over 20 years. He specializes in RPG, CL, and SQL and recently has been working with PHP and Python. Current projects have given Mike the opportunity to work with generating and parsing XML and JSON from SQL and consuming SOAP and REST web services. Although his main area of expertise is on IBM i, Mike has a passion for learning other languages and how he can integrate other platforms with IBM i.
RELATED STORIES
Guru: Comparing IFS Directories Using SQL
Guru: String Manipulation Using SQL
Guru: Regular Expressions, Part 1
Guru: Regular Expressions, Part 2
Guru: Debugging SQL Stored Procedures With ACS
This should also give you the same result. Performance wise I don’t know which would be the best.
SELECT a.NDC_NUMBER, b.GTIN_CASE, b.GTIN_CQTY, b.GTIN_SEQ
FROM olsdta.GtinMasterTable a
JOIN TABLE( SELECT GTIN_CASE, GTIN_CQTY, GTIN_SEQ
FROM olsdta.GtinMasterTable b
WHERE b.NDC_NUMBER = a.NDC_NUMBER
ORDER BY GTIN_SEQ desc
LIMIT 1 ) AS X ON 1=1
WHERE trim(a.NDC_NUMBER) = ‘00093113556’;
If the case is using RPG in business logic, a simple CHAIN can be a very fast, idiomatic and terse solution for this kind of use cases, much less operations.
Personally, I would avoid “trim” function in a key field to go easy on the SQL engine as much as possibile (of course if this is a program execute many times).
Hi Mike,
Thank you for the continuous SQL info.
As you said, many options are available to solve a challenge.
Here is a simple alternate:
If the table is not reorganized by key during its life, you can obtain the same result with the – relatively simple – following SQL:
SELECT Ndc_number, Gtin_case, Gtin_Cqty FROM GtinMasterTable WHERE trim(ndc_number)=’xyz’ order by rrn(GtinMasterTable) desc LIMIT 1;
I am assuming that if you had a “log” date in the table you would have ordered by it desc & gotten the same result as well.
Thanks again for the info you continue to provide