Thinking in Sets
May 24, 2006 Hey, Ted
I have a program that contains a complicated SQL command that selects a group of records exactly the way I want them selected. My boss has added a new wrinkle by creating an exceptions file, which would indicate other records that are to be selected, whether they meet the selection criteria or not. I cannot find a way to add the exceptions file without breaking my query. How do I modify my SQL command to handle the exceptions file? –Clayton You were taught the solution to your problem in elementary school. Do you remember learning about sets in math class? E. F. “Ted” Codd based the whole idea of relational databases on set theory. A set is an unordered group of like things. The students in the classroom were a set. The boys in the class were a set. The redheaded boys were a set. Your teacher and textbook explained to you that you can relate sets in different ways. For example, overlapping the set of redheaded boys with the set of boys who play baseball gives you the set of redheaded baseball-playing boys. This is known as the intersection of two sets. You might combine the two sets into one, in which case the set contains all boys who are redheaded and all boys who like to play baseball. Some of the boys might fit in both categories, but all such boys appear in the result set only once, as there cannot be two instances of one person. This is known as the union of two sets. You can also remove the members of one set from another set. For instance, you might remove the redheaded boys from the set of baseball players, which results in the set of baseball players who are not redheaded. Or you could remove the baseball players from the set of redheaded boys, which gives you the set of redheaded boys who do not play baseball. This operation is known as set minus. So which of these set operations solves your problem? Union. You need the union of the set of things that pass the rigorous testing of your WHERE clause and the set of things that are to be included whether they satisfy the WHERE clause or not. Let me illustrate these set operations with a simpler example. Consider a set of customers, stored in file QIWS/QCUSTCDT.
Let’s suppose that your query finds customers who owe your company more than 250 simoleons (or whatever your local currency is.) SELECT cusnum, lstnam, init, city, state, cdtlmt, baldue FROM qcustcdt WHERE baldue > 250
Now let’s see who’s in the exceptions file. SELECT CusNumber FROM custexcept ORDER BY 1
The only thing lacking is to combine the two sets into one. SELECT cusnum, lstnam, init, city, state, cdtlmt, baldue FROM qcustcdt WHERE baldue > 250 UNION SELECT c.cusnum, c.lstnam, c.init, c.city, c.state, c.cdtlmt, c.baldue FROM custexcept AS e JOIN qcustcdt AS c ON e.CusNumber = c.cusnum ORDER BY 1
Notice that customer S. S. Vine is included once, not twice. The normal behavior of UNION is to omit duplicate rows. If you do not want to drop duplicate rows, use UNION ALL instead of UNION. That takes care of your problem, Clayton. For sake of completeness, I’ll illustrate the other two set operations. These only work at V5R3 and above. First, here’s the intersect operation, which returns only the rows that are members of both sets. SELECT cusnum, lstnam, init, city, state, cdtlmt, baldue FROM qcustcdt WHERE baldue > 250 INTERSECT SELECT c.cusnum, c.lstnam, c.init, c.city, c.state, c.cdtlmt, c.baldue FROM custexcept AS e JOIN qcustcdt AS c ON e.CusNumber = c.cusnum ORDER BY 1
Here’s the set minus. You would use this operation if the purpose of the exceptions file were to exclude certain customers. SELECT cusnum, lstnam, init, city, state, cdtlmt, baldue FROM qcustcdt WHERE baldue > 250 EXCEPT SELECT c.cusnum, c.lstnam, c.init, c.city, c.state, c.cdtlmt, c.baldue FROM custexcept AS e JOIN qcustcdt AS c ON e.CusNumber = c.cusnum ORDER BY 1
–Ted |