Guru: Stub Testing And SQL
March 29, 2021 Ted Holt
A stub program is a program that does nothing but stand in as a place holder for a real program, which may or may not exist yet. I have used them for years to help me test program changes. There are also stub subroutines, stub subprocedures, etc. What I want to talk about today is how to use a similar concept for SQL queries.
You can use your favorite search engine to learn about stubs, but I’ll give you an example to increase the chances that you’ll know what I’m talking about. Let’s say that I am modifying a CL program that calls several RPG programs. The CL program has plenty of conditional logic that must be tested carefully, but the RPG programs require no modification.
. . . some code call rpgpgm1 parm(&Cancel) if cond(&Cancel *ne ’1’) do call rpgpgm2 parm(&SomeDate &SomeNumber &Status) select when (&Status *eq ‘A’) do . . . whatever enddo when (&Status *eq ‘F’) do . . . more code . . . etc. . . . and so forth
I might write a CL program and call it RPGPGM1.
pgm parm(&Status) dcl &Status *char 1 chgvar &Status ’0’ endpgm
It doesn’t matter that the real RPGPGM1 is written in RPG. It doesn’t matter what goes on in the real RPG program. This short CL program does what I need it to do for my test.
I could, of course, comment out the call to RPGPGM1 and temporarily add a CHGVAR command to set &CANCEL to zero, but then I wouldn’t be running the same CL code that will run in production.
Here’s the shortest stub program I use, and I use it a lot.
pgm endpgm
I compile this source member into a test library under what whatever name I need the program object to have.
Stubs programs come in very handy for testing program flow and logic. Now let’s think about SQL. Might we want to do the same sort of thing? That is, might we want to test just part of a query without having to access data that has no bearing on our test? Yes, we might.
In many shops, testing is done against production data or copies of production data. Whether this is a good or bad practice doesn’t matter, because it’s reality. Testing against such data is usually unreliable and/or difficult for two reasons:
- There is too much data (i.e., too many rows in the result set).
- The data does not include all the values needed to execute all conditional logic.
You can overcome these impediments by creating the necessary tables and views (physical and logical files) in a test library and loading the tables with just the required data. That’s a fine approach, but it can take a lot of time. For instance, what if one of the views is a view of view of a table? That may take a while to set up.
Here’s another approach that achieves the same purposes and is often easier.
To illustrate, consider the following highly sophisticated query that retrieves information about customers.
select c.CusNum, c.LstNam, c.City, c.State, c.ZIPCod from qcustcdt as c order by c.CusNum;
192837 | Lee | Hector | NY | 14841 |
389572 | Stevens | Denver | CO | 80226 |
392859 | Vine | Broton | VT | 5046 |
397267 | Tyron | Hector | NY | 14841 |
475938 | Doe | Sutter | CA | 95685 |
583990 | Abraham | Isle | MN | 56342 |
593029 | Williams | Dallas | TX | 75218 |
693829 | Thomas | Casper | WY | 82609 |
839283 | Jones | Clay | NY | 13041 |
846283 | Alison | Isle | MN | 56342 |
938472 | Henning | Dallas | TX | 75217 |
938485 | Johnson | Helen | GA | 30545 |
The people who use the output of this query have decided that they need the state name, not the two-character abbreviation, so I have just added the STATES table.
select c.CusNum, c.LstNam, c.City, coalesce(s.Name,c.State) as State, c.ZIPCod from qcustcdt as c left join states as s on c.State = s.Abbreviation order by c.CusNum;
I ran the query to test my change, and this is what I see:
192837 | Lee | Hector | New York | 14841 |
389572 | Stevens | Denver | Colorado | 80226 |
392859 | Vine | Broton | Vermont | 5046 |
397267 | Tyron | Hector | New York | 14841 |
475938 | Doe | Sutter | California | 95685 |
583990 | Abraham | Isle | Minnesota | 56342 |
593029 | Williams | Dallas | Texas | 75218 |
693829 | Thomas | Casper | Wyoming | 82609 |
839283 | Jones | Clay | New York | 13041 |
846283 | Alison | Isle | Minnesota | 56342 |
938472 | Henning | Dallas | Texas | 75217 |
938485 | Johnson | Helen | Georgia | 30545 |
It appears to have worked, but since all the states in QCUSTCDT are in the STATES table, I don’t know whether the COALESCE function worked properly or not. I can think of two ways to test it.
- Add a row with an invalid state code to a test copy of QCUSTCDT.
- Remove one or more states from the STATES table.
But there’s an easier way — use a stub to substitute for the STATES table, like this:
select c.CusNum, c.LstNam, c.City, coalesce(s.Name,c.State) as State, c.ZIPCod from qcustcdt as c -- left join states as s left join (values ('TX', 'Texas')) as s (Abbreviation, Name) on c.State = s.Abbreviation order by c.CusNum;
I commented out the reference to the STATES table and instead defined one row of substitute data as a row value expression.
-- left join states as s left join (values ('TX', 'Texas')) as s (Abbreviation, Name)
The row value expression has the same column (field) names — ABBREVIATION and NAME. The fact that the STATES table has other columns (fields) is irrelevant. I assigned the row value expression the same correlation name, a single letter S.
Since I only defined one row in my substitute table, I expect all rows in the result set to have the state name in the case of Texas and the state code in all others. Let’s see if that’s what happens.
192837 | Lee | Hector | NY | 14841 |
389572 | Stevens | Denver | CO | 80226 |
392859 | Vine | Broton | VT | 5046 |
397267 | Tyron | Hector | NY | 14841 |
475938 | Doe | Sutter | CA | 95685 |
583990 | Abraham | Isle | MN | 56342 |
593029 | Williams | Dallas | Texas | 75218 |
693829 | Thomas | Casper | WY | 82609 |
839283 | Jones | Clay | NY | 13041 |
846283 | Alison | Isle | MN | 56342 |
938472 | Henning | Dallas | Texas | 75217 |
938485 | Johnson | Helen | GA | 30545 |
Now I know that the COALESCE function does what I want it to do. I can remove the stub and reinstate the join to the STATES table.
You could make the argument that by using the row value expression, I am not testing the same code that will run in production. I would not disagree. I would only say that working at a high level of abstraction gives me the luxury of caring only about what the data looks like and not where it comes from.
I used one row in my stub, as one was enough, but I could have defined more rows if more had been needed, like this:
left join (values ('TX', 'Texas'), ('CO', 'Colorado')) as s (Abbreviation, Name)
Just keep adding row value expressions and separating them with commas.
I know that people sometimes install software into production environments without proper testing. Maybe sometimes it’s just too hard to set up a proper test. If so, here’s another tool that can help.
RELATED STORIES
Guru: Table Value Constructors Build Tables On The Fly
Row Value Expressions Simplify Complex Row Selection
Instead of replacing the “inner join states s” line, could you have created a CTE called states and inserted it before the select?
with states (Abbreviation, Name) as (
values (‘TX’, ‘Texas’)
)
select…
Then the statement you tested would have been closer to production?
Yes, Jay! That works too, and it’s better than my version. It does exactly what I was saying — it makes the query use row value expressions rather than the table or view — with less modification to the query itself.
Thanks for taking time to share your improvement, Jay.