Guru: Using SELECT * With Cursors
November 27, 2017 Ted Holt
From time to time someone brings to my attention the use of SELECT * with SQL cursors in RPG programs. Specifically, is that a good idea or a bad idea? I have learned that the answer to that question is “It depends.” Using SELECT * in a cursor declaration may or may not get you into trouble.
To set the stage, let’s begin with a simple example — an RPG program that reads one table (physical file) and prints each row (record). Even though most programs use data from more than one table, programs that read only one table are not uncommon, and a program that reads only one table is a perfect candidate for the use of SELECT * in a cursor.
Here’s the DDL for a table of employee data.
create table employees (clock dec(3) primary key name char(12), phone dec(7)); insert into employees values ( 101, 'Barney Fife' ,4445555), ( 102, 'Luther Heggs',2223333);
Let us consider two versions of a program that uses SELECT * in a cursor. First, static SQL:
H dftactgrp(*no) actgrp(*new) option(*srcstmt: *nodebugio) D EmployeeData e ds extname('EMPLOYEES') D cSQLEOF c const('02000') *inlr = *on; exec sql declare c1 cursor for select * from employees order by clock; exec sql open c1; assert (SQLState < cSQLEOF: 'Open failed, state=' + SQLState); exec sql fetch c1 into :EmployeeData; assert (SQLState < cSQLEOF: 'Fetch failed, state=' + SQLState); dump(a); exec sql close c1; assert (SQLState < cSQLEOF: 'Close failed, state=' + SQLState); return; * ============================================================= * Abruptly end the program if an unexpected condition arises. * ============================================================ P Assert B D Assert PI D Condition N Value D Message 80A Value D QMHSNDPM PR ExtPgm('QMHSNDPM') D MsgID 7 Const D MsgFile 20 Const D MsgDta 80 Const D MsgDtaLen 10I 0 Const D MsgType 10 Const D MsgQ 10 Const D MsgQNbr 10I 0 Const D MsgKey 4 D ErrorDS 16 D ErrorDS DS 16 D BytesProv 10I 0 inz(16) D BytesAvail 10I 0 D ExceptionID 7 D MsgDta S 80 D MsgKey S 4 IF (not Condition); QMHSNDPM ('CPF9898': 'QCPFMSG QSYS': Message: %len(Message): '*ESCAPE': '*PGMBDY': 1: MsgKey: ErrorDS); ENDIF; RETURN; P Assert E
Let me point out a few things about this program.
First, notice that the employees table — the table that the program reads — provides the external definition of the EmployeeData data structure. This is comforting to me. I know that the fields in the data structure will be adequate to receive the fetched data.
Second, if the Employees table can contain null data, then I would have to define a null indicator array. I don’t think most IBM i shops use nulls in the database, so I will not unnecessarily complicate the example by adding code to handle null values.
Third, to simplify the program I omitted the loop that would process the entire table. A program that only fetches one row doesn’t need a cursor, but a SELECT with the INTO clause.
Last, the purpose of the assertions is to let me know when an SQL statement fails. I would not use assertions in this manner in a production program.
Here’s the same program with dynamic SQL:
H dftactgrp(*no) actgrp(*new) option(*srcstmt: *nodebugio) D EmployeeData e ds extname('EMPLOYEES') D Statement s 96a D cSQLEOF c const('02000') *inlr = *on; Statement = 'select * from employees order by clock'; exec sql prepare s1 from :Statement; exec sql declare c1 cursor for s1; exec sql open c1; assert (SQLState < cSQLEOF: 'Open failed, state=' + SQLState); exec sql fetch c1 into :EmployeeData; assert (SQLState < cSQLEOF: 'Fetch failed, state=' + SQLState); dump(a); exec sql close c1; assert (SQLState < cSQLEOF: 'Close failed, state=' + SQLState); return; P Assert B . . . as before . . . P Assert E
What happens if I add a new column?
alter table employees add column email varchar(30) not null with default
Does the program continue to run properly without modification and without compilation? This is where it depends.
- Is the SQL static or dynamic?
- Where did I add the column?
In this case, I added the new column at the end of the row (i.e., the record format).
The static version runs as before. The reason for this is that the SQL precompiler expands the column list, so. . .
select * from employees
is equivalent to. . .
select clock, name, phone from employees
Since the program object selects three columns only, the addition of another column to the table does not affect the program. The program does not require recompilation.
The dynamic SQL version also continues to retrieve the data correctly, but the FETCH operation sets the SQL state to 01503 (Number of host variables less than result values.) That is, there is no room in the data structure for the email column. This is a warning, not an error. The program runs and retrieves the first three columns correctly.
Let’s add another column, but this time, let’s add it within the row.
alter table employees add column dept dec(3) not null with default before phone
Again, the static SQL continues to run properly because the precompiler expanded the column list. The static version continues to select the same three columns, even though there are now five columns in the table.
However, things are not so rosy with the dynamic SQL. I again get the 01503 value in the SQL state, but the data is not accurate. The phone number is zero for all employees, because the program retrieves the department number, which was initialized to zero and has not yet been populated with the true values.
The program runs, the data is bad, and no one is the wiser.
The answer is, of course, to recompile this program and all others that use a dynamic cursor over the employees table. After all, you don’t have anything better to do on the weekends. Be sure not to overlook any of them during your analysis.
Now that you understand how SELECT * works in a cursor, let me show you what I think is a better way.
If a database is even somewhat normalized, almost all programs require data from more than one table. Let’s add the name of the department to the previous query. To maintain third normal form, we must place the department name in a table that is keyed on department number.
create table departments for system name dept (ID for column DeptNo dec ( 3) primary key, Name char(16) not null with default); insert into departments values ( 1, 'Shipping' ), ( 2, 'Receiving');
To use data from both tables requires a join. You can place such a join in a lot of programs, but I propose that a better idea to place the join in one place — a view.
create or replace view empv1 as select e.clock, e.name, e.phone, e.dept, coalesce(d.name,'*Invalid*') as DeptName from employees as e left join departments as d on e.dept = d.id
Notice that this view does not return null values, even though it uses a left join. The coalesce function takes care of any possible nulls, replacing them with a dummy department name of *Invalid*.
All programs that need this data can use this view, and guess what? They can use SELECT *, just as the one-table example did. Here’s the static SQL version:
H dftactgrp(*no) actgrp(*new) option(*srcstmt: *nodebugio) D EmployeeData e ds extname('EMPV1') D cSQLEOF c const('02000') *inlr = *on; exec sql declare c1 cursor for select * from empv1 order by clock; exec sql open c1; assert (SQLState < cSQLEOF: 'Open failed, state=' + SQLState); exec sql fetch c1 into :EmployeeData; assert (SQLState < cSQLEOF: 'Fetch failed, state=' + SQLState); dump(a); exec sql close c1; assert (SQLState < cSQLEOF: 'Close failed, state=' + SQLState); return; P Assert B . . . as before . . . P Assert E
As I did in the one-table query, I’ve used the view to describe the data structure. SELECT * works well because it retrieves the values in the same order that they are listed in the view.
Using views is even more robust if you don’t change them. Suppose we need another column in the query. If we change the view, we may have to recompile the programs. But if instead we create another view, the only programs that have to be recompiled are those that need the new column. You can change those programs to use the new view, and you can do so without a code freeze.
The bottom line, then, seems to be that using SELECT * in a cursor definition in conjunction with an externally-described data structure is bulletproof. Well, not quite. Implicitly hidden columns fowl up the works if you query a table, because implicity hidden columns are included in the data structure, but not in the SELECT * field list. Chances that this will happen are very, very slim, as many if not most shops don’t use implicitly hidden columns. Columns cannot be hidden in a view, so you won’t encounter this problem if you run SELECT * against a view.
If it is customary in your shop to query tables, consider that there is much to be said for querying views instead.
what if a column size changes, like a column goes from char(30) to char(50)?