Guru: Search Source Code For Strings Using SQL
October 9, 2023 Bob Cozzi
The IBM-supplied Find String Using PDM (FNDSTRPDM) command and PDM option 25 are how most developers scan for a simple value in a list of source file members. But what if you need more? For example, what if you want to find something complex, such as the field named “CUST” but not the field named “CUSTNAME” “CUSTOMER” or “CUSTNO”? Go ahead, I’ll wait?
While I do use FNDSTRPDM all the time, I now use the SQL READSRC (Read Source File Member) Table function more often.
READSRC SQL UDTF
The READSRC SQL Table function or UDTF (User-defined table function) uses an SQL interface to read through a source file member. You can read through a single source member or all source members in the file. It returns the source file and member name, last changed date, sequence, and the source data itself. Here’s a simplified example I ran using IBM ACS Run SQL Scripts:
As you can see from this example, the source date field is an actual Date value. My ACS is setup to show Dates as USA-format so it appears as MM/DD/YYYY but in Europe and Australia/New Zealand it might appear as DD/MM/YYYY. The source sequence number is returned as an SQL DEC(6, 2) value and the Source Data field can be any size up to 32k-ish.
While this is an awesome result, what benefit is it over just using SQL like this:
SELECT SRCSEQ, SRCDAT, SRCDTA FROM COZTOOLS.QRPGLESRC;
SQL does not support member names directly, so this would return the content of the first member added to the file, not the one we want. To circumvent this short-coming, you could create an SQL ALIAS and then query that ALIAS, or issue an OVRDBF to the desired member, like this:
With this approach you can use all the SQL query capability to locate a value within a single source file member. Note that MBR(*ALL) does not work with SQL, you would get:
[SQL0204] *ALL in *N type *MEM not found.
READSRC Internals
The READSRC UDTF allows a member name or *ALL that will read through the first member and then moves to the next member. You will note the same CPI5209 “Moved from member A to member B” messages in the joblog. This continues until you hit the last row of the last member in the file.
Because it can read multiple members, it includes the source file, library, and member names in the resultSet columns/ These are obviously important when looking for information in those source members. When *ALL is NOT specified for the Member name, then the QUSRMBRD (Retrieve Member Description) API is used to find the member in the file/library specified. This is important if a library of *LIBL is used or when there is an OVRDBF applied to the file be queried.
If the source statement change date is not a valid date, then SRCDATE is set to NULL.
On more thing. There is a TOUPPER parameter on the READSRC function. When *YES is specified for this parameter, the SRCDATA column is returned in all UPPER case. If you are doing a simple search for some field or file name, the case of that name in the various source members may be inconsistent, so this option provides a quick way solution to that issue. Alternatively there are myriad SQL functions, such as UPPER() that provide a similar solution.
Source Member Query Using SQL
I enjoy search a source file’s members for specific values or even more fun is the ability to search for a Statement Last Changed Date. For example, if I need to know what source in QCLSRC was changed in the last 30 days, I can easily do that using something like the following:
The WHERE clause is used to compare the DATE column named SRCDATE to “today minus 30 days”. If the SRCDATE is greater than 30 days ago, we want to see that source line.
Using Regular Expressions to Search Source
This is already awesome, but what about searching the Source Statement itself? A simply WHERE SRCDATA LIKE … works great, so nothing new there. However, since we are using SQL all the Regular Expression built-in functions are also available to us, including the REGEXP_LIKE predicate.
Let’s search for the use of the crtusrspaceXXXX APIs in my COZTOOLS RPG IV source members.
This is awesome, but is it complete? Unlikely, so let’s redo the query and run it again:
Now we have 64 results vs the 10 or so rows previously returned.
Using regular expressions, we can do something similar, and avoid the upper/lower case conversion option and get our SRCDATA result in the original upper/lower case. Here’s what that might look like using REGEXP_LIKE instead of a simple LIKE:
Same results. Awesome, right?
The REGEXP_LIKE predicate is documented separately from the built-in SQL REGEXP_xxxx functions for some reason. You need to enter it in the search bar to find its docs.
Basically, the docs for it are:
- Parameter 1 is the column to be searched.
- Parameter 2 is the regular expression search criteria.
- Parameter 3 is optional and indicates the starting position within parameter 1 to start the search.
- Parameter 4 is optional and can be one or more flags to control the regular expression search. The flag I’m using here is the lowercase ‘i’ – which indicates that it should ignore upper/lower case differences. When ‘i’ is specified, parameters 1 and 2 are treated as case insensitive.
You must specify parameter 3 in order to specify parameter 4. Therefore specify 1 for parameter 3 to indicate position 1 (if applicable) of the SRCDATA column is where the search begins.
One of the things that bugs me about the LIKE clause as well as this style of REGEXP_LIKE is that if I search for ‘%USRSPACE%’ or REGEXP_LIKE(srcdata,’usrspace’,1,’i’) I get all source lines that contain that value. But what if I only want that specific value? That is, I want USRSPACE but not USRSPACETEMP or USRSPACEPTR, how do I code that?
Word Boundary Search
To do that you can use the various techniques with regular expressions, but a relatively easy one to implement is the so called “word boundary” option. To do that, you surround the searched for string in \b controls. For example:
WHERE REGEXP_LIKE( SRCDATA, '\busrspace\b',1,'i')
Now I will only see results where SRCDATA contains USRSPACE as a whole word. That is USRSPACE is returned but CRTUSRSPACE or CRTUSRSPACEPTR or PUSRSPACE is not.
The results from this can often mislead your brain. This is because USRSPACE may appear on a line along with another form of the word, such as the following result:
callP(e) QusCrtUsrSpace(usrSpace : extAttr : initSize :
When using REGEXP_LIKE and you think something is wrong, it is likely just the way you initial looked at the results.
To provide a more practical example, suppose you need to search for the field named “CUST” in source code in QRPGLESRC source. But you want to avoid CUSTOMER, CUSTNO, CUSTNBR, or CUSTINFO. Here’s the query to do that:
In this example, I use ‘\bcust\b’ to indicate that I only want rows with CUST in them. The results are interesting, particularly those with legacy RPG comments out past column 80. But note that I don’t have unwanted CUSTOMER, CUSTNAME, etc. in this result.
Let’s combine a couple of these. Suppose we have a project that requires that a Field be expanded. This project started about 2 months ago and targets the CUSTINFO field’s length. We want to change it from 50 bytes to 255 bytes. To check on the progress, we need to search for the use of CUSTINFO on source statements that were changed in the last 60 days. This is what that query would look like:
select * from table(sqltools.readsrc('aidsrc','qrpglesrc','*all')) where srcdate > (current_date - 60 days) AND REGEXP_LIKE(srcdata,'\bcustinfo\b',1,'i');
In this example, I’ve combined the two search criteria illustrated in this article: Date selection and Regular Expression search.
First I check if the source line has been changed within the last 60 days and then search for the CUSTINFO word. By using this periodically I can get an update on the progress of the project. Of course, I would probably use a hard-date vs the 60 days technique since the project start date would be fixed and I may want to see all changes since it started and not just the changes over the last 60 days. Something like: SRCDATE >= ‘2023-06-21’
The IBM Documentation for REGEXP_LIKE can be found at this link:
https://www.ibm.com/docs/en/i/7.4?topic=predicates-regexp-like-predicate
The SQL Tools Documentation for the READSRC SQL table function can be found at this link:
http://www.sqliquery.com/SQLTools/UDTFParmDoc.html?UDTF=ST_READSRC
That’s all there is to it.
Bob Cozzi helps IBM i clients solve complex issues using SQL, RPG IV, or C/C++ as well as the SQL iQuery licensed program (the best way to modernize your Query/400 with SQL). Visit his website at: http://www.SQLiQuery.com. Cozzi is also the author of The Modern RPG Language, developer of SQL iQuery and SQL Tools, and a speaker on SQL and RPG IV topics you can actually use.
RELATED STORIES
Guru: Generating XML Using SQL – The Easy Way
Guru: Retrieving The Long And Short Object Name
Guru: Binding Directory Entries
Guru: Find Unused Objects On IBM i Using SQL
What’s In the Top 5 Hottest IBM i RFEs