Existence Tests and Left Exception Joins
March 23, 2005 Dear Colleagues
SQL has changed a lot since I first used it in 1984, and I can’t think of a single change that hasn’t been for the better. One of the best improvements, to my way of thinking, is the addition of the JOIN clause in SELECT queries.
I have never liked the EXISTS predicate. I find it hard to read. Sometimes it is difficult to make EXISTS yield the results I want. Fortunately, I have found that the LEFT EXCEPTION JOIN makes a good substitute for EXISTS in many cases. The left exception join returns rows from the primary table that have no match in the secondary table. There is also a right exception join, which returns rows from the secondary table that have no match in the primary table.
Suppose I wish to find sales orders that do not belong to a valid customer number. One way to make my request is with the negated IN predicate.
select * from orderhdr where custnbr not in (select custnbr from customer)
IN breaks down if there are two or more fields that I have to check. I talked about this problem in “Using SQL to Find Duplicate Records”. The solution I proposed to that challenge was to use the EXISTS predicate.
Suppose, then, that orders refer to customers by company number and customer number. IN is out, but I can use EXISTS.
select * from orderhdr as o where not exists (select * from customer as c where c.company = o.company and c.custnbr = o.custnbr)
The LEFT EXCEPTION JOIN equivalent looks more natural to me.
select o.* from orderhdr as o left exception join customer as c on o.company = c.company and o.custnbr = c.custnbr
It’s almost like the usual syntax I use to join two files. The word left is optional, by the way.
Let’s complicate the example even more. Let’s say there’s a record code in the customer file that indicates whether a customer is active or inactive. Let’s retrieve orders for non-existent customers and inactive customers. Here’s the query using EXISTS.
select * from orderhdr as o where not exists (select * from customer as c where c.company = o.company and c.custnbr = o.custnbr and reccd = 'A')
The LEFT EXCEPTION JOIN requires a bit of surgery. We must replace CUSTOMER with a nested query.
select o.* from orderhdr as o left exception join (select * from customer where reccd = 'A') as c on o.company = c.company and o.custnbr = c.custnbr
If you prefer the exception join, don’t give up EXISTS just yet. I have had less than stellar success in joining three or more tables with the left exception join.
Let’s add an order details file and an item file to the query. The order details file may have many records for each order header file record.
Let’s look for orders that reference a non-existent customer, an inactive customer, or a non-existent item.
My first attempt at a left exception join looks like this:
select o.ordernbr, o.company, o.custnbr, d.itemnbr from orderhdr as o join orderdtl as d on o.ordernbr = d.ordernbr left exception join (select * from customer where reccd = 'A') as c on o.company = c.company and o.custnbr = c.custnbr left exception join item as i on d.itemnbr = i.itemnbr
Here is the equivalent EXISTS query.
select o.ordernbr, o.company, o.custnbr, d.itemnbr from orderhdr as o join orderdtl as d on o.ordernbr = d.ordernbr where not exists (select * from customer as c where reccd = 'A' and o.company = c.company and o.custnbr = c.custnbr) and not exists (select * from item as i where d.itemnbr = i.itemnbr)
The problem with this query is that it only returns orders that fail to find an active customer and a valid part number. That does not satisfy the search criteria. The EXISTS version is easily corrected by replacing the AND before the last EXISTS clause with OR.
select o.ordernbr, o.company, o.custnbr, d.itemnbr from orderhdr as o join orderdtl as d on o.ordernbr = d.ordernbr where not exists (select * from customer as c where reccd = 'A' and o.company = c.company and o.custnbr = c.custnbr) or not exists (select * from item as i where d.itemnbr = i.itemnbr)
If there’s an easy way to code this query with an exception join, I hope one of you knows it and will write to me. The following works, but I think it’s nasty.
select o.ordernbr, d.orderline, o.company, o.custnbr, d.itemnbr from orderhdr as o join orderdtl as d on o.ordernbr = d.ordernbr exception join (select * from customer where reccd = 'A') as c on o.company = c.company and o.custnbr = c.custnbr union select o.ordernbr, d.orderline, o.company, o.custnbr, d.itemnbr from orderhdr as o join orderdtl as d on o.ordernbr = d.ordernbr exception join item as i on d.itemnbr = i.itemnbr
The SELECT preceding the UNION looks for orders for invalid or inactive customer numbers. The SELECT following the UNION looks for orders with invalid item numbers. The UNION operator combines the two result sets, eliminating the duplicates in the process.
with OrderData as ( select o.ordernbr, d.orderline, o.company, o.custnbr, d.itemnbr from orderhdr as o join orderdtl as d on o.ordernbr = d.ordernbr) select o.* from OrderData as o exception join (select * from customer where reccd = 'A') as c on o.company = c.company and o.custnbr = c.custnbr union select o.* from OrderData as o exception join item as i on o.itemnbr = i.itemnbr
The following might be slightly better, but I’m not thrilled with it either.
If you’re running i5/OS V5R3, you can also use the EXCEPT operation to find rows in one table that have no match in another table. See Michael Sansoterra’s excellent article “More V5R3 SQL Enhancements” for more information on this.
Looking for data that does not correlate to other tables is an everyday necessity in information processing installations. It’s good that IBM continues to give us newer ways to find it.
–Ted