Guru: Alternate SQL Row-Selection Criteria Revisited
February 24, 2020 Ted Holt
At the RPG & DB2 Summit last October, I chanced to overhear Rob Bestgen of IBM telling Paul Tuohy of System i Developer about a practice that he and others in IBM Lab Services had seen, a practice which they found horrifying. “Oh, no!” I thought. “I’ve written about that for itjungle.com!” Are you ready to be horrified?
Rob was telling Paul that they have seen SQL queries in which portions of the WHERE clause are enabled or disabled through the use of switches. I did not invent this technique — I realized more than a few years ago that I’ve never had an original thought in my life. But I have used the technique many times — with great success, I might add — and even wrote about it in this august publication in 2003, almost 16 1/2 years ago. Here’s how it works.
Let’s say we have a query that retrieves a set of customers. (A more realistic example would be to retrieve customer orders, but I’m trying to minimize the amount of source code.) We need to be able to build this set from three different criteria. Sometimes we need to retrieve all the customers. Sometimes we need only those customers to whom we’ve assigned a certain class code. Sometimes we need only the customers within a range of ZIP (postal) codes.
How do we do this? We could write the query once and clone it twice, but many of us are averse to cloning source code. If we change one thing, we have to change the copies too. Or we could use switches to enable and disable pieces of a WHERE clause.
dcl-s SelectAll char(1) inz('0'); dcl-s SelectByClass char(1) inz('0'); dcl-s SelectByZip char(1) inz('0'); dcl-s iClass char(1); dcl-s iFromZip char(5); dcl-s iThruZip char(5); dcl-ds Customer qualified; Account packed ( 5); Name char (20); ZipCode char ( 5); Class char ( 1); end-ds; exec sql declare Inp cursor for select cusnbr, cusname, cuszip, cusclass from customers where ( (:SelectAll = '1') OR (:SelectByClass = '1' AND cusclass = :iClass) OR (:SelectByZip = '1' AND cuszip BETWEEN :iFromZip AND :iThruZip) ); exec sql open Inp; dow '1'; exec sql fetch Inp into :Customer; if SqlState >= '02000'; leave; endif; *** do something with the retrieved row enddo; exec sql close Inp;
These are the switches.
dcl-s SelectAll char(1) inz('0'); dcl-s SelectByClass char(1) inz('0'); dcl-s SelectByZip char(1) inz('0');
The values of these switches activate and deactivate row-selection criteria. If SelectAll has the value 1, this is the WHERE clause.
WHERE ( ('1' = '1') OR ('0' = '1' AND cusclass = ' ') OR ('0' = '1' AND cuszip BETWEEN ' ' AND ' ')
The first condition is true for every row, so every row gets selected. If SelectByClass is 1 and iClass is B, this is the WHERE clause:
WHERE ( ('0' = '1') OR ('1' = '1' AND cusclass = 'B') OR ('0' = '1' AND cuszip,1,5 BETWEEN ' ' AND ' ') )
The first condition is false, but the second one proves true for any rows that have a customer class of B. Class B customers are selected.
IBM does not like this technique. Rob told Paul and me — I could not restrain myself from joining the conversation — that this sort of thing is pretty much guaranteed to make the query optimizer ignore indexes.
What do the folks at IBM think we should do instead? Use dynamic SQL.
dcl-s SelectAll char(1) inz('0'); dcl-s SelectByClass char(1) inz('0'); dcl-s SelectByZip char(1) inz('0'); dcl-s iClass char(1); dcl-s iFromZip char(5); dcl-s iThruZip char(5); dcl-s Stmt varchar(512); dcl-ds Customer qualified; Account packed ( 5); Name char (20); ZipCode char ( 5); Class char ( 1); end-ds; Stmt = 'select cusnbr, cusname, cuszip, cusclass from customers; select; when SelectByClass = '1'; Stmt += ' where cusclass = ?'; when SelectByZip = '1'; Stmt += ' where cuszip between ? and ?'; endsl; exec sql prepare x from :Stmt; exec sql declare Inp cursor for x; select; when SelectByClass = '1'; exec sql open Inp using :iClass; when SelectByZip = '1'; exec sql open Inp using :iFromZip, :iThruZip; other; exec sql open Inp; endsl; dow '1'; exec sql fetch Inp into :Customer; if SqlState >= '02000'; leave; endif; *** do something with the retrieved row enddo; exec sql close Inp;
The code is slightly longer, but no more complex. It works the same way, in that switches still control the selection of rows. Notice that there are three opens, since each version of the WHERE has a different argument list.
However, the code is not identical. The static version allows you to enable both selection by class and selection by ZIP code at the same time. The dynamic version could be made to work that way, but I didn’t go to the trouble to make that happen.
Now that I have delivered the message from IBM to you, permit me to end with a few observations.
First, as technical editor of The Four Hundred, I do my best to deliver practical information to help you do your job. That’s true of the articles I write and those written by other authors. I do all I can to make sure that what we publish is not only accurate, but best practice. When I published this technique in 2003, I did so with a clear conscience, believing that I was sharing a legitimate programming technique.
Second, the performance of dynamic SQL has improved since 2003. In those days, every expert I listened to urged me to avoid dynamic SQL as much as possible.
Third, performance is relative. A table scan against a table with millions of rows does not perform well in general. A table scan against a few thousand rows may be no big deal.
Fourth, dynamic SQL still suffers from a limitation it’s always had — the program doesn’t know it’s using the tables. In the examples given above, Display Program References (DSPPGMREF) tells me that the program that uses static SQL uses the customers file. DSPPGMREF against the dynamic version doesn’t. That means that your documentation package probably doesn’t know either.
And last, since IBM knows that some of us use this technique, perhaps they could make the optimizer look for it and optimize the query accordingly. The folks at IBM are smart (I mean this sincerely, not sarcastically), and if such is possible, they can find a way to make it happen.
I still don’t see anything wrong with this static SQL technique. To me, it’s one more tool in my toolbox. I leave it to you to decide when and where to use it, if at all.
I use the switch technique (or a reasonable facsimile) all the time. Most notably, with custom built “Work With” type subfiles in a business application. I include a filter line for the user to narrow down the list. It has been very popular with my client base. And it involves a where clause made up of expressions such as these:
WHERE :c2custno = 0 or dbCustNo = :c2custno
So if the user leaves the customer number field on the filter line empty, the database row is selected. Otherwise the customer number in the row is compared to the filter.
Another technique I’ve seen is to always use the same where clauses but just stuff the ones you want all this time with loval and hival (IDK if you also have to state or is null).
This:
where
(
(:SelectAll = ‘1’)
OR
(:SelectByClass = ‘1’ AND cusclass = :iClass)
OR
(:SelectByZip = ‘1’ AND
cuszip BETWEEN :iFromZip AND :iThruZip)
);
Become:
where
((cusclass BETWEEN :iFromClass AND :iThruClass)
AND
(cuszip BETWEEN :iFromZip AND :iThruZip)
);
if you want all you just stuff low values into :iFromClass and iFromZip and high values into :iThruClass and :iThruZip
if you want a single class you stuff the same value in the from and to.
Horrified? No, I’m delighted! I was in the process of writing a dynamic cursor in a place where I really thought it was overkill, when I saw this tip. Now I can easily choose between a handful of unprocessed records from a small table, or one specific unprocessed record from that table, based on whether the key fields were passed to the program. (I’m converting a batch program that runs once an hour for five months, to trigger and data-queue driven real-time processing; but I need to leave the batch option in place for now.) Thanks, Ted!