Embedding SQL in RPG IV–Episode I
August 10, 2005 Joel Cochran
I must admit that I’ve been a Star Wars geek since I was 7. The premier theater in Des Moines, Iowa, at the time was a three-screen stand-alone job that easily lent itself to the term “blockbuster.” The line literally wrapped around the building, and we waited hours to get in and see the first installment of the space epic long before the movie-going public called it “A New Hope” or “Episode IV.” Every kid I knew wanted to be Luke Skywalker or Han Solo. Almost 30 years later, I’m still entranced by the story, and I was more excited than my children (also Star Wars fans) when “Episode III” finally brought to life the missing pieces of the story.
What does Star Wars have to do with RPG?
Don’t worry, the nostalgia trip is almost over. As an adult, I realize that I’ll never soar through the galaxy in my X-Wing or own a real light saber; those technologies are well beyond my grasp. And while my hopes of someday rescuing the Princess and destroying the Death Star erode a bit more each day, I can take a cue from a real-life Star Wars icon, George Lucas: This is my prequel.
In a recent article entitled Embedding SQL in /free, I outlined some of the methods that I use when embedding SQL in free-format RPG. That article presumes that the reader already understands embedded SQL and largely addresses “the ugly factor.” In recent months, though, it’s become apparent to me that many of our RPG brethren are still looking for the back story. The e-mail lists are frequently abuzz with basic how-to questions about embedded SQL, so in this article I’m going to lay out a Primer for embedding SQL in RPG. While this article addresses RPG specifically, SQL may also be embedded in COBOL, C, and even REXX programs. Also, since my prior article focused on SQL in free-format, I’m going to deviate from my normal mode of operations and use fixed-format for the examples in this article.
SQL: Why and When
I want to get into the meat of the topic, so I’m not going to spend a lot of time here. Plenty of other good thinkers have weighed in on this topic before, so I’m just going to hit what I think are the highlights.
1. Working with Groups of Data. SQL is terrific at selecting and manipulating groups of data. If you want to update or delete all the records in a file matching certain criteria, use SQL. SQL can change or delete a group of records in a single statement, whereas native I/O would require you to loop through a file and issue individual update or delete statements.
2. Massaging Columns. SQL has a slew of columnar functions designed to tally, total, summarize, manipulate, and calculate columns of data. Many program features such as substringing, averaging, and math functions can be performed during the record selection phase. Columns can even be returned that don’t exist in the file, such as counts, calculations, literals, and dates.
3. Aggregate Data. SQL lets you easily aggregate columns in like rows. For instance, if you wanted to find a list of all the different zip codes in a mailing address file and count how many addresses were in each zip code, SQL can easily accomplish this in a single statement. In native I/O you would have to loop through a file and increment counter fields or use arrays and/or multiple-occurrence data structures to aggregate like data. Combine this capability with 1 and 2 above and SQL absolutely shines.
4. Performance. A constant debate rages on the e-mail lists about whether native I/O or SQL performs better. The answer, as usual, is "it depends." If I were doing anything that falls into numbers 1-3 above, I would use SQL. If you are doing record-level random retrieval (such as a CHAIN), then native I/O should perform better. This is hardly a given, though: IBM's DB2 development is largely focused on improving the SQL engine and its performance, so you may see different results at different OS levels. The SQL engine determines access paths at execution time. If possible, the access plan is then stored in the program object and reused on subsequent calls. Sometimes, such as in the case of dynamic SQL, this is not possible and the plan is recreated on each PREPARE execution. More information on access paths is available at the IBM Information Center. As a general rule, you should not reference logical files in SQL statements; reference the physical file name instead. If an appropriate access path does not exist, the engine must generate one on the fly, so building appropriate indexes can greatly benefit your SQL performance (see the sketch after this list).
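As a rough illustration of that last point, building an index over a column you frequently select or join on is a one-time SQL statement. The library, index, file, and column names below are placeholders, not objects from this article:

create index midserve/custzip_ix
    on midserve/custmast ( zip )

Once the index exists, the SQL engine can use it as an access path instead of building one on the fly.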
There are many other capabilities in SQL; this list is hardly exhaustive. I just wanted to lay out some basic ground rules. SQL is extremely flexible and capable, which is why I tend to prefer it, but I do not espouse the total replacement of native I/O with SQL. Pick the right tool for the right job. While some argue that you should use SQL because it is an "industry standard," I think that is a poor reason for completely dismissing a unique and powerful tool like native I/O. The database access method you choose is not an exclusive decision: you can use SQL and native I/O together in the same program. For example, you could use SQL to select a group of records or aggregates to work with, perhaps selecting only the key fields, and then use native I/O to CHAIN to another file with those keys to retrieve the detail.
Getting Started
In order to write programs using embedded SQL, you must have the DB2 Query Manager and SQL Development Kit installed on your machine (that’s licensed program 5722-ST1 according to my V5R2 box). A quick test to see if you already have this installed is to enter “STRSQL” on the command line. If you have it, the “Interactive SQL” program will start, which you can use to execute ad hoc queries against your DB2 database. If you do not have the Development Kit installed, you cannot write programs that use SQL, so be sure you have this before you proceed.
It is important to note that you do not need the development kit or any additional runtime to execute programs written with embedded SQL. The SQL engine used to process the SQL statements is built into the DB2 core, so you can run the programs anywhere; you just can't compile them anywhere. This can be very handy if you manage multiple machines but have only a single development environment. On machines without Interactive SQL you can still execute SQL with a tool like Operations Navigator, but neither tool is a very good option for executing a complex series of statements, especially if the function is a recurring job. That's what programs are for!
In order to create an RPG source member that can use embedded SQL, you must use a source type of SQLRPGLE. To compile an SQLRPGLE source member, use the "Create SQL ILE RPG Object" (CRTSQLRPGI) command. If you are using PDM, option 14 will create a bound program and option 15 will create a *MODULE object. If you are issuing the command manually, use the OBJTYPE parameter to indicate *PGM or *MODULE. Also, if your program updates a file that is not journaled, you must change the COMMIT parameter to *NONE, otherwise the updates will not occur. Personally, this was such a constant hassle for me that I changed my command default.
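For example, a manual compile of a bound program with commitment control turned off might look something like this; the library, source file, and program names are placeholders:

CRTSQLRPGI OBJ(MYLIB/MYPGM) SRCFILE(MYLIB/QRPGLESRC)
           SRCMBR(MYPGM) OBJTYPE(*PGM) COMMIT(*NONE)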
Before too many of you more experienced SQL programmers out there send me a million e-mails: yes, there are two other ways to handle the commitment control problem. First, you could add WITH NC to your SQL statements. This signifies that no commitment control should be used on that particular statement. The problem with this approach is the additional hassle of needing to remember it for every statement that updates the database. Second, you could use the SET OPTION SQL statement. SET OPTION is perhaps worthy of an article in its own right. It allows you to enforce certain program options in the code itself rather than on the compile command:
c/exec sql
c+ set option commit=*none,
c+            datfmt=*iso
c/end-exec
There are more options for SET OPTION, and many people swear by it. For whatever they are worth, here are my issues with SET OPTION. First of all, I'd have to remember to add a block in every embedded SQL program I write. Call it lazy or senile, I just don't like needing to remember stuff like that. By changing the command default, I only have to remember it when I upgrade my OS level. Second, I've run into some funky side effects when debugging programs that have SET OPTION blocks: namely, the lines presented don't appear to match the lines being executed. This may have been corrected in a PTF or subsequent release, but I haven't felt the need to address it since I changed my command defaults.
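For completeness, the first alternative mentioned above, WITH NC, is simply appended to the statement that touches the database. Here is a minimal sketch, using the SQLTEST table that is created later in this article:

c/exec sql
c+ delete from midserve/sqltest
c+ with nc
c/end-exec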
The SQL Pre-Compiler
SQLRPGLE programs go through a two-step compile process. Before the RPG code is sent to the standard RPG IV compiler, it must first pass muster with the SQL pre-compiler. The pre-compiler's main job is to validate all of your SQL statements and convert them to dynamic program calls, after which it passes the altered code on to the main RPG IV compiler. If you are interested in seeing what the converted code looks like, set the LIST parameter to *PRINT and then view the spooled output. Your original SQL statements will be commented out and followed by the code the pre-compiler inserted.
Now for "the good, the bad, and the ugly." The good news is that this process allows us to easily embed SQL. If we had to write code to validate our own SQL and then develop our own API calls, embedding SQL would be inordinately difficult. The bad news is that the SQL pre-compiler team and the RPG IV compiler team are not one and the same: in fact, they aren't even in the same location or the same division. This tends to create a lag between features supported in RPGLE and features supported in SQLRPGLE. I'm not commenting on this state of affairs, just pointing it out. Over the last several years the SQL pre-compiler team has done a lot of work to rectify the situation, and it is improving.
But there is some ugly. Should your compile fail at the pre-compiler level, the error listing you get back will be of very little assistance compared to the RPG IV error listings. The messages are short and cryptic and offer nothing in the way of corrective suggestions. Also, the pre-compiler appears to simply stop at the first sign of trouble, so even if you use LIST(*PRINT) you only receive the source code up to the first error. This may take a little getting used to and can surely be frustrating when you are first starting out. My advice is to stick with it: with a little practice, your SQL statements won't cause a lot of compile-time problems.
Embedding Basics
Issuing an SQL statement within your source code has some rules as well. In your RPG source, you must indicate to the SQL pre-compiler that you are in fact executing SQL. To do so, you wrap each SQL statement between exec sql and end-exec compiler directives, with the slash beginning in column 7:
c/exec sql
c+ create table midserve/sqltest
c+ ( id int not null primary key ,
c+ name char(35) not null with default ,
c+ address char(35) not null with default ,
c+ city char(35) not null with default ,
c+ state char(2) not null with default ,
c+ zip numeric(5,0) not null with default )
c/end-exec
Notice that the lines between the two compiler directives all begin with c+. Also, only one SQL statement is allowed in each pair of directives, so if you wanted to drop this table before issuing the CREATE TABLE statement it would look like so:
c/exec sql
c+ drop table midserve/sqltest
c/end-exec

c/exec sql
c+ create table midserve/sqltest
c+ ( id int not null primary key ,
c+ name char(35) not null with default ,
c+ address char(35) not null with default ,
c+ city char(35) not null with default ,
c+ state char(2) not null with default ,
c+ zip numeric(5,0) not null with default )
c/end-exec
You can insert SQL like this anywhere in your code, including subroutines and subprocedures. This SQL doesn’t use any program variables or return any values. Such SQL is not terribly functional beyond the sort of example listed above, so let’s turn our attention to using program variables with our SQL.
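For instance, nothing prevents you from tucking the DROP TABLE statement into a subroutine; here is a minimal sketch (the subroutine name is my own invention):

c     clearTest     begsr
c/exec sql
c+ drop table midserve/sqltest
c/end-exec
c                   endsr

You would invoke it with EXSR just like any other subroutine.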
Host Variables
Host variable is a term that you will see frequently in embedded SQL discussions. Since your RPG program is the "host" of these SQL statements, the term host variable refers to any variable declared in the RPG program. We can use these variables as parameters or as return values when we execute SQL. Host variables are always preceded in SQL by a colon (:). The following example adds ten records to the table created above by incrementing a counter variable and using the counter for the ID field:
d counter         s             10i 0 inz

c                   for       counter = 1 to 10
c/exec sql
c+ insert into midserve/sqltest
c+ ( id )
c+ values( :counter )
c/end-exec
c                   endfor
You can easily mix and match host variables and literals as well. In this example, I use three host variables and one literal as INSERT values.
d counter         s             10i 0 inz
d city            s             35a   inz( 'STAUNTON' )
d zip             s              5s 0 inz( 24401 )

c                   for       counter = 1 to 10
c/exec sql
c+                  insert into midserve/sqltest
c+                  ( id , city , state , zip )
c+                  values( :counter ,
c+                          :city ,
c+                          'VA' ,
c+                          :zip
c+                        )
c/end-exec
c                   endfor
Granted, this code doesn't do a whole lot, but it does demonstrate how to reference a host variable. It also demonstrates that, within the confines of the compiler directives, embedded SQL is basically free-format. In this case, I shifted the SQL text over to be in line with the fixed-format code around it, but I could just as easily have left it left-justified like the CREATE TABLE statements above. In the INSERT statement I put all the field names on one row, but I listed each insert value on its own row and indented the closing parenthesis of the VALUES clause. These are merely stylistic demonstrations: you should experiment and find a style that is comfortable for you to code and easy for you to read. It's also a good time to point out that the SQL engine is case-insensitive. I'm using lowercase above, but you could just as easily use uppercase if it suits you. Of course, the literal value 'VA' above is capitalized because that's how I want it in the database.
I should also point out that the host variable names are the same as the field names in the file. This is not a coincidence but an intentional choice. Unlike native I/O, the fields in the file are not automatically available to the program as variables, so there is no conflict in creating variables with the same names. I could just as easily have used a variable called "JOELS_TOWN" instead of "CITY".
There are some additional rules for host variable naming. I address them in more detail in the prior article, but to sum them up in one fell swoop: the names should be unique. In other words, don't use host variables in subprocedures that also have global names, and vice versa. Personally, I make all my host variables global variables, and I frequently prefix them with sql_ to prevent any questions. I am also fond of using externally defined data structures for my host variables to prevent the SQL engine from performing unnecessary conversions. Above we defined ID as "int"; if we tried to insert a variable defined as "3S 0", the engine would have to convert it to "int" on the fly. To avoid this, I try to use host variables that are defined exactly the same as the database fields. These conventions have worked well for me and, once instituted as a standard, are easy to maintain.
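As a sketch of that convention, an externally described data structure over the SQLTEST table created earlier might look something like this (the sql_ prefix is simply my habit, not a requirement):

d sqlRec        e ds                  extname(SQLTEST)
d                                     prefix(sql_)

The subfields (SQL_ID, SQL_NAME, and so on) are then defined exactly like the database columns, so no conversion is needed when they are used as host variables.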
Returning a Value into a Host Variable
SQL reads data from the database using the SELECT statement. In embedded SQL, you must specify the host variable(s) that the returned value(s) will populate. We accomplish this by adding an INTO clause to the SELECT statement:
d counter         s             10i 0 inz

c/exec sql
c+ select count(*)
c+ into :counter
c+ from midserve/sqltest
c/end-exec
c     counter       dsply
c                   eval      *inlr = *on
This complete program counts the total number of records in the file and retrieves that value into a program variable named "COUNTER". Notice again that the host variable is referenced in the SQL by the use of the ":". As I implied before, you can also use a data structure name as the return variable:
d myDS            ds
d  id                           10a   varying
d  name                         35a
d  zip                           5s 0

c/exec sql
c+ select trim(char(id)), name, zip
c+ into :myDS
c+ from midserve/sqltest
c/end-exec
I’ve mixed things up a bit here for demonstration purposes. ID is defined as an “int” in the database, but I want the character version of that value, so I’ve used a couple of SQL functions to convert it for me. More importantly though, I’ve retrieved 3 field values without needing to list all three host variables. This is very handy, especially if you are retrieving large numbers of fields from a file.
Handling Multiple Records
If you display these results, you will only see the first record, but there are 10 records in the file, and this particular SELECT statement should definitely return all 10 of them. Since SQL excels at processing groups of data, this is fairly typical. In order to process multiple rows, we need a facility for navigating the returned rows. In embedded SQL this is known as a cursor; in other languages it is frequently referred to as a RecordSet.
The order of events for using a cursor is thus:
Declare
Open
Fetch (Repeat as necessary)
Close
Declaring a cursor is fairly straightforward. A cursor for the SELECT statement above would look like this:
c/exec sql
c+ declare mainCursor Cursor
c+ for
c+ select trim(char(id)), name, zip
c+ from midserve/sqltest
c/end-exec
The differences are the addition of the DECLARE and FOR clauses and the removal of the INTO clause. We still need an INTO clause, but it will be part of the FETCH statement. In this example, "mainCursor" is the name that we will use to reference this cursor: you can basically name cursors however you like, but as always it's good advice to make the names mean something.
Now that we have declared a cursor, we need to indicate to the program that we are ready to use it. This is important because cursors can be reusable (more on this later). To make the cursor ready for use, we must open it:
c/exec sql
c+ open mainCursor
c/end-exec
This prepares the cursor and the data for retrieval. Now we must fetch the data into our host variable(s), and at this point we have some options. The first option is to make "myDS" a multiple-occurrence data structure (MODS) and fetch all 10 rows into the MODS at once. Below is a complete sample:
d myDS            ds                  occurs(10)
d  id                           10a   varying
d  name                         35a
d  zip                           5s 0
d i               s             10i 0 inz

c/exec sql
c+ declare mainCursor Cursor
c+ for
c+ select trim(char(id)), name, zip
c+ from midserve/sqltest
c/end-exec

c/exec sql
c+ open mainCursor
c/end-exec

c/exec sql
c+ fetch next
c+ from mainCursor
c+ for 10 rows
c+ into :myDS
c/end-exec

c                   for       i = 1 to %elem( myDS )
c                   eval      %occur( myDS ) = i
c     myDS          dsply
c                   endfor
c                   eval      *inlr = *on
The limitations of this approach should be obvious: it forces you to hard-code the number of occurrences, and you can never fetch more rows than that. You can return fewer rows than the size of the MODS, but if you try to return more rows than its defined size you will get an error. This hard coding may not be a problem, depending on the needs of your program. One good example would be a program that totals sales figures by month: since there can only be twelve (calendar) months, fetching all the records at once into a MODS with 12 occurrences makes perfect sense (see the sketch below).
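A sketch of that monthly-sales idea might look like the following. The SALES file and its SALEDATE and SALEAMT columns are hypothetical, and the open and the 12-row fetch into the MODS would follow exactly as in the sample above:

d salesDS         ds                  occurs(12)
d  saleMonth                     2s 0
d  saleTotal                    11p 2

c/exec sql
c+ declare salesCursor Cursor
c+ for
c+ select month(saledate), sum(saleamt)
c+ from midserve/sales
c+ group by month(saledate)
c+ order by month(saledate)
c/end-exec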
Looping through a Cursor
For the majority of cases, however, you probably want to be able to loop through the cursor as if you were reading a file. To do so, simply remove the for x rows clause. You will also need to add a conditioning test, typically on SQLCOD or SQLSTT:
d myDS            ds
d  id                           10a   varying
d  name                         35a
d  zip                           5s 0

c/exec sql
c+ declare mainCursor Cursor
c+ for
c+ select trim(char(id)), name, zip
c+ from midserve/sqltest
c/end-exec

c/exec sql
c+ open mainCursor
c/end-exec

c/exec sql
c+ fetch next
c+ from mainCursor
c+ into :myDS
c/end-exec

c                   dow       SQLSTT = '00000'
c     myDS          dsply
c/exec sql
c+ fetch next
c+ from mainCursor
c+ into :myDS
c/end-exec
c                   enddo
c                   eval      *inlr = *on
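One step from the list above that these samples have not shown yet is closing the cursor. When you are finished with it, the statement is simply:

c/exec sql
c+ close mainCursor
c/end-exec

Closing the cursor is also what makes it reusable: once closed, it can be opened again later in the program.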
You can also make a cursor scrollable by adding the scroll clause to the declare statement:
c/exec sql
c+ declare mainCursor scroll Cursor
c+ for
c+ select trim(char(id)), name, zip
c+ from midserve/sqltest
c/end-exec
If you do not include the scroll clause, the cursor can only be read forward, but a scrollable cursor has the additional capabilities of fetch first, fetch last, and fetch prior. There are additional fetch capabilities as well.
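For example, with the scrollable declaration above you could jump to the last row and then walk backward; here is a minimal sketch reusing myDS:

c/exec sql
c+ fetch last
c+ from mainCursor
c+ into :myDS
c/end-exec

c/exec sql
c+ fetch prior
c+ from mainCursor
c+ into :myDS
c/end-exec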
SQLCOD and SQLSTT
In the previous example I referred to these two variables, but you won’t find them defined in the source code. These variables both come from the “SQL Communications Area” (SQLCA). SQLCA is a data structure that is automatically included with every SQLRPGLE source member:
D* SQL Communications area
D SQLCA           DS
D  SQLAID                 1      8A
D  SQLABC                 9     12B 0
D  SQLCOD                13     16B 0
D  SQLERL                17     18B 0
D  SQLERM                19     88A
D  SQLERP                89     96A
D  SQLERRD               97    120B 0 DIM(6)
D  SQLERR                97    120A
D  SQLER1                97    100B 0
D  SQLER2               101    104B 0
D  SQLER3               105    108B 0
D  SQLER4               109    112B 0
D  SQLER5               113    116B 0
D  SQLER6               117    120B 0
D  SQLWRN               121    131A
D  SQLWN0               121    121A
D  SQLWN1               122    122A
D  SQLWN2               123    123A
D  SQLWN3               124    124A
D  SQLWN4               125    125A
D  SQLWN5               126    126A
D  SQLWN6               127    127A
D  SQLWN7               128    128A
D  SQLWN8               129    129A
D  SQLWN9               130    130A
D  SQLWNA               131    131A
D  SQLSTT               132    136A
D* End of SQLCA
The fields most used from SQLCA are SQLCOD and SQLSTT. Both of these fields return information to the program indicating the resulting state of the most recently executed SQL statement. SQLCODEs and SQLSTATEs always have corresponding values, so you can check either one, but I find the SQLSTATEs to be more informative.
Briefly, I primarily use SQLSTT values of ‘00000’ and ‘02000’. ‘00000’ indicates that the statement was successful and has no warnings: when fetching, this indicates that a row was found. ‘02000’ basically means no rows were found. For a complete overview of Codes and States, refer to IBM’s DB2 UDB for iSeries SQLCODEs and SQLSTATEs manual.
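As a small illustration of checking SQLSTT outside of a fetch loop, here is a sketch of a singleton SELECT against the SQLTEST table; the host variable definitions and the lookup value are my own:

d name            s             35a
d counter         s             10i 0 inz( 5 )

c/exec sql
c+ select name
c+ into :name
c+ from midserve/sqltest
c+ where id = :counter
c/end-exec

c                   if        SQLSTT = '02000'
c     'no row'      dsply
c                   endif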
Final Thoughts
I'd like to stress that this article is only a beginning primer. In my next article ("Episode II") I will discuss some of the more dynamic features of embedded SQL. If you are eager to move forward and just can't wait, there is a wealth of information and additional capabilities that are well worth investigating. I would recommend some additional resources:
IBM’s DB2 Universal Database for iSeries SQL Programming Concepts manual is a must-have document.
A couple of non-IBM SQL books that have permanent homes on my desktop are iSeries and AS/400 SQL at Work, written by IT Jungle’s own Howard Arner, and SQL/400 Developer’s Guide by Paul Conte and Mike Cravitz.
Joel Cochran is the director of research and development for a small software firm in Staunton, Virginia, and is the author and publisher of www.RPGNext.com and www.RPGBeans.com. You can reach Joel through our Contact page.