Refacing Your Database, Part 1
April 5, 2016 Paul Tuohy
Note: The code accompanying this article is available for download here. In this group of three articles, I will discuss a technique for refacing a traditional DDS defined database as an SQL DDL defined database. There will be no recreation of files or recompilation of programs, and it is not as much hard work as you think it might be. The stored procedures, functions, programs, and commands used in these articles are available in the library SQLFACE, which can be downloaded here. Before I get into the nitty gritty of refacing databases, I would just like to say that I think this is a great project to give to one or more interns. So now might be a good time to reach out to a local college or university. This does not require a deep knowledge of IBM i and college kids will probably have the required database knowledge. And on to the nitty gritty. Why Reface? This is a variation of a scenario that I have seen multiple (20-plus) times over the last few years. Your company is investing in a reporting or data warehousing tool. The vendor sends a database analyst (DBA) to have a look at your database (which they were told is DB2) and see how they might use it. The DBA has never heard of an IBM i or iSeries or AS/400. The DBA sits down with you and asks to have a look at some of the main tables that will be accessed by the new tool. You fire up Run SQL Scripts (in Navigator) or Data Studio and proceed to type in: select * from sacusmas; select * from saprodmas; select * from sasalhst; The DBA’s eyebrows raise slightly at the strange table names. Then the result sets, shown in Figure 1, pop up.
The DBA’s eyebrows are as far as they will go and their jaw is hanging agape. What is with the column names? Why do all the columns in a table start with the same two characters? And are those dates? The DBA slowly backs out of the room, returns to the office, and promptly recommends that the company invest in a proper database that can be populated from the data on the IBM i. Of course, that database runs on SQL server or Oracle or DB2 on Linux or anything except IBM i. The DBA was hoping to see you type in: select * from customers; select * from products; select * from sales_history; Hopefully the end result is something akin to the result sets shown below in Figure 2.
The DBA was right, to an extent. Ten-character, eight-character, or six-character column names, each with unique prefixes or suffixes depending on the table, are not what one would call a modern database. No matter that your database may be a well-designed, normalized database, it just doesn’t look right. What is Refacing? Refacing is where we can go from the results shown in Figure 1 to the results shown in Figure 2 without having to make any changes to our existing DDS-defined files. This means creating corresponding views for each physical file. The views rename the columns to their proper names. In this case, the views CUSTOMERS, PRODUCTS, and SALES_HISTORY would be created over the physical files SACUSMAS, SAPRODMAS, and SASALHST. That might work for these three simple physical files, you say, but I have 2,000 physical files, some with hundreds of fields. It would take forever. Not with a couple of tools to help you along the way. The Refacing Process I break the refacing process into three stages:
The concept is that you extract the required data for the current physical files and column definitions into two conversion tables (one for the physical files and one for the tables), correct the data in the two conversion tables, and then generate a script (from the conversion tables) to create all of the required views. In this article, we will look at the extraction stage and at the other two stages in each of the subsequent articles. Physical Files for Conversion Figure 3 shows the DDS for four physical files: a field reference file and three data files. Whether or not you are using a field reference file is irrelevant because required information is extracted from the actual file definitions. The field reference file is being shown for your benefit only. These files are in library SQLFACETST. The main points to note are (refer to the callouts in the code below): A–All fields in a file start with the same two character prefix. This could just as easily be one character and/or a suffix as opposed to a prefix. Or, if you are really lucky, there are no prefixes and suffixes involved and you just have to deal with short names. The norm (but not the rule) is that the name following the prefix is consistent across files. B–The columns CMTOTSAL (in SACUSMAS) and PMAMOUNT (in SAPRODMAS) both reference the field AMOUNT for their definition but PMAMOUNT overrides the text description. C–The STOH definition (in SAFLDREF) does not have a text description. D–SHPURDAT (in SASALHST) is a YYYYMMDD date stored in numeric format. It may also be 0 or 99999999. E–SHORYY, SHORMM and SHORDD are an order date, the year, month, and day being stored as separate fields. Year is a four-digit number. F–SHDLC, SHDLYY, SHDLMM, and SHDLDD are a delivery date, the century, year, month, and day being stored as separate fields. Century is a single digit number (0 for 1900s, 1 for 2000s) and year is a two-digit number. File SAFLDREF A R SAFLDREFR A AMOUNT 15P 2 TEXT('Amount') A BALANCE 13P 2 TEXT('Balance') A EDTCDE(L) A CFIRST 30A TEXT('Contact''s First Name A CLAST 30A TEXT('Contact''s Last Name' A CUSNO 10A TEXT('Customer ID Number') A DESC 40A TEXT('Product Description') A DLC 1S 0 TEXT('Delivery Century') A DLDD 2S 0 TEXT('Delivery Day') A DLMM 2S 0 TEXT('Delivery Month') A DLYY 2S 0 TEXT('Delivery Year') A NAME 40A TEXT('Customer Name') A ORDD 2S 0 TEXT('Order Day') A ORMM 2S 0 TEXT('Order Month') A ORYY 4S 0 TEXT('Order Year') A PRODCD 10A TEXT('Product Code') A PURDAT 8S 0 TEXT('Purchase Date') A EDTCDE(Y) A QTY 15P 5 TEXT('Quantity/Volume') A EDTCDE(L) (C) A STOH 15P 5 EDTCDE(L) File SACUSMAS A REF(SAFLDREF) A R SACUSMASR (A) A CMCUSNO R REFFLD(CUSNO) A CMNAME R REFFLD(NAME) A CMCFIRST R REFFLD(CFIRST) A CMCLAST R REFFLD(CLAST) A CMBALANCE R REFFLD(BALANCE) (B) A CMTOTSAL R REFFLD(AMOUNT) File SAPRODMAS A REF(SAFLDREF) A R SAPRODMASR (A) A PMPRODCD R REFFLD(PRODCD) A PMDESC R REFFLD(DESC) A PMSTOH R REFFLD(STOH) (B) A PMAMOUNT R REFFLD(AMOUNT) A TEXT('Stock Amount') File SASALHST A REF(SAFLDREF) A R SASALHSTR (A) A SHCUSNO R REFFLD(CUSNO) A SHPRODCD R REFFLD(PRODCD) A SHQTY R REFFLD(QTY) A SHAMOUNT R REFFLD(AMOUNT) (D) A SHPURDAT R REFFLD(PURDAT) (E) A SHORYY R REFFLD(ORYY) A SHORMM R REFFLD(ORMM) A SHORDD R REFFLD(ORDD) (F) A SHDLC R REFFLD(DLC) A SHDLYY R REFFLD(DLYY) A SHDLMM R REFFLD(DLMM) A SHDLDD R REFFLD(DLDD) The Conversion Files The next piece of code shows the DDL used to create the two conversion tables used in the refacing process. These tables will initially be populated by calling a stored procedure. We will look at the contents in a moment. CREATE TABLE TABLE_TRANSLATION FOR SYSTEM NAME TABTRANS ( LIBRARY CHAR(10) CCSID 37 NOT NULL DEFAULT '' , TABLE_NAME CHAR(10) CCSID 37 NOT NULL DEFAULT '' , LONG_TABLE_NAME FOR COLUMN LONGNAME VARCHAR(100) ALLOCATE(100) CCSID 37 NOT NULL DEFAULT '' , TABLE_TEXT CHAR(50) CCSID 37 NOT NULL DEFAULT '' ) RCDFMT TABTRANS ; LABEL ON TABLE TABLE_TRANSLATION IS 'Table Translation' ; LABEL ON COLUMN TABLE_TRANSLATION ( LIBRARY IS 'Library ' , TABLE_NAME IS 'Table Name ' , LONG_TABLE_NAME IS 'Long Table Name' , TABLE_TEXT IS 'Text' ) ; LABEL ON COLUMN TABLE_TRANSLATION ( LIBRARY TEXT IS 'Library/Schema' , TABLE_NAME TEXT IS 'Table Name' , LONG_TABLE_NAME TEXT IS 'Long Table Name' , TABLE_TEXT TEXT IS 'Text' ) ; CREATE TABLE COLUMN_TRANSLATION FOR SYSTEM NAME COLTRANS ( LIBRARY CHAR(10) CCSID 37 NOT NULL DEFAULT '' , TABLE_NAME CHAR(10) CCSID 37 NOT NULL DEFAULT '' , CURRENT_COLUMN FOR COLUMN CUR_COLUMN CHAR(10) CCSID 37 NOT NULL DEFAULT '' , COLUMN_SEQUENCE FOR COLUMN COLSEQ DECIMAL(10, 0) NOT NULL DEFAULT 0 , PROPER_COLUMN FOR COLUMN PROPCOLUMN CHAR(10) CCSID 37 NOT NULL DEFAULT '' , LONG_COLUMN FOR COLUMN LONGCOLUMN VARCHAR(100) ALLOCATE(100) CCSID 37 NOT NULL DEFAULT '' , USE_FUNCTION_NAME FOR COLUMN USEFUNC VARCHAR(20) ALLOCATE(20) CCSID 37 NOT NULL DEFAULT '' , EXTRA_PARAMETERS FOR COLUMN EXTRAPARM VARCHAR(100) ALLOCATE(100) CCSID 37 NOT NULL DEFAULT '' ) RCDFMT COLTRANS ; LABEL ON TABLE COLUMN_TRANSLATION IS 'Column Translation' ; LABEL ON COLUMN COLUMN_TRANSLATION ( LIBRARY IS 'Library ' , TABLE_NAME IS 'Table Name ' , CURRENT_COLUMN IS 'Current Column Name' , COLUMN_SEQUENCE IS 'Column Sequence' , PROPER_COLUMN IS 'Proper Column Name' , LONG_COLUMN IS 'Long Column Name' , USE_FUNCTION_NAME IS 'Use Function ' , EXTRA_PARAMETERS IS 'Extra Parameters For Function' ) ; LABEL ON COLUMN COLUMN_TRANSLATION ( LIBRARY TEXT IS 'Library/Schema' , TABLE_NAME TEXT IS 'Table Name' , CURRENT_COLUMN TEXT IS 'Current Column Name' , COLUMN_SEQUENCE TEXT IS 'Column Sequence in Row' , PROPER_COLUMN TEXT IS 'Proper Column Name' , LONG_COLUMN TEXT IS 'Long Column Name' , USE_FUNCTION_NAME TEXT IS 'Use Function' , EXTRA_PARAMETERS TEXT IS 'Extra Parameters for Function' ) ; Populating the Conversion Tables We populate the conversion tables by calling the EXTRACT_DEFINITIONS stored procedure as follows: call extract_definitions('SQLFACETST', DEFAULT, 2, DEFAULT); The four parameters are:
If you don’t want all files in a library, you can request the files one by one or, as I did above, request all files and then delete references to the files you don’t want. In this case, I am not interested in the field reference file (SAFLDREF), so I run the two following SQL statements: delete from table_translation where table_name in ('SAFLDREF'); delete from column_translation where table_name in ('SAFLDREF'); Figure 3 shows what has been placed in the TABLE_TRANSLATION table. For each file, the EXTRACT_DEFINITIONS stored procedure has attempted to construct a LONG_TABLE_NAME from the table text. Constructing this name consists of putting all the characters in upper case and replacing special characters with an underscore.
Figure 4 below shows what has been placed in the TABLE_TRANSLATION table. The important points to note are:
This next bit of code shows the DDL used to create the EXTRACT_DEFINITIONS stored procedure. The procedure generates and executes two dynamic SQL statements. The insert statements run select statements that extract the required definitions from the system catalog tables QADBXFIL and QADBIFLD. CREATE PROCEDURE EXTRACT_DEFINITIONS ( IN LIBRARY VARCHAR(10) , IN TABLE_NAME VARCHAR(10) DEFAULT '*ALL' , IN PREFIX_LENGTH DECIMAL(1, 0) DEFAULT 0 , IN SUFFIX_LENGTH DECIMAL(1, 0) DEFAULT 0 ) LANGUAGE SQL SPECIFIC EXTRACT_DEFINITIONS NOT DETERMINISTIC MODIFIES SQL DATA CALLED ON NULL INPUT PROGRAM TYPE SUB CONCURRENT ACCESS RESOLUTION DEFAULT SET OPTION ALWBLK = *ALLREAD , ALWCPYDTA = *OPTIMIZE , COMMIT = *NONE , DECRESULT = (31, 31, 00) , DFTRDBCOL = *NONE , DYNDFTCOL = *NO , DYNUSRPRF = *USER , SRTSEQ = *HEX BEGIN DECLARE SQLCODE INTEGER DEFAULT 0 ; DECLARE SQL_STATEMENT VARCHAR ( 5000 ) ; DECLARE START_AT INTEGER DEFAULT 0 ; DECLARE CUT_END INTEGER DEFAULT 0 ; DECLARE QT CHAR(1) DEFAULT ''''; -- Extract Table Names SET SQL_STATEMENT = 'insert into tabtrans (library, table_name, ' || 'long_table_name, table_text) ' || '(select dbxlib, dbxfil, ' || 'coalesce(translate(trim(upper(char(dbxtxt))), ' || (A) QT || '__________' || QT || ', ' || QT || ' ,.()/+-"''''' || QT || '), ' || QT || '*MISSING' || QT || '), ' || 'coalesce(char(dbxtxt), '' '') ' || 'from qsys/qadbxfil ' || 'where dbxlib = ' || QT || LIBRARY || QT || ' and ' || 'dbxatr = ' || QT || 'PF' || QT || ' and dbxtyp = ' || QT || 'D' || QT; IF ( TABLE_NAME <> '' AND TABLE_NAME <> '*ALL' ) THEN SET SQL_STATEMENT = SQL_STATEMENT || ' and dbxfil = ' || QT || TABLE_NAME || QT ; END IF ; SET SQL_STATEMENT = SQL_STATEMENT || ')' ; PREPARE D1 FROM SQL_STATEMENT ; IF ( SQLCODE <> 0 ) THEN RETURN ; END IF ; EXECUTE D1 ; -- Set parameters for dropping prefix/suffix SET START_AT = 1 + PREFIX_LENGTH ; SET CUT_END = PREFIX_LENGTH + SUFFIX_LENGTH ; -- Extract Column Definitions SET SQL_STATEMENT = 'insert into coltrans (library, table_name, cur_column, ' || 'column_sequence, propcolumn, longcolumn) ' || '(select dbilib, dbifil, dbifld, dbipos, ' || 'substr(dbifld, ' || DIGITS ( START_AT ) || ', ' || 'length(trim(dbifld))-' || DIGITS ( CUT_END ) || ') , ' || 'coalesce(translate(trim(upper(char(dbitxt))), ' || (A) QT || '__________' || QT || ', ' || QT || ' ,.()/+-"''''' || QT || '), ' || QT || '*MISSING' || QT || ') ' || 'from qsys/qadbxfil a inner join qsys/qadbifld b ' || 'on (dbxlib, dbxfil) = (dbilib, dbifil) ' || 'where dbilib = ' || QT || LIBRARY || QT || ' and ' || 'dbxatr = ' || QT || 'PF' || QT || ' and dbxtyp = ' || QT || 'D' || QT; IF ( TABLE_NAME <> '' AND TABLE_NAME <> '*ALL' ) THEN SET SQL_STATEMENT = SQL_STATEMENT || ' and dbifil = ' || QT || TABLE_NAME || QT ; END IF ; SET SQL_STATEMENT = SQL_STATEMENT || ')' ; PREPARE D2 FROM SQL_STATEMENT ; IF ( SQLCODE <> 0 ) THEN RETURN ; END IF ; EXECUTE D2 ; END ; COMMENT ON PARAMETER SPECIFIC PROCEDURE EXTRACT_DEFINITIONS ( LIBRARY IS 'Library Name' , TABLE_NAME IS 'Table Name' , PREFIX_LENGTH IS 'Prefix Length' , SUFFIX_LENGTH IS 'Suffix Length' ) ; LABEL ON SPECIFIC PROCEDURE EXTRACT_DEFINITIONS IS 'Extract Column Definitions' ; Writing dynamic SQL in SQL is never the prettiest of code. The only part of this procedure that you might have to change is to add extra characters to the translation of text to the new column name. Let’s say we want to have the ” (reverse slash) character also translated: in the two place where the translation is done ( (A) in Figure 7), we would need to add an extra underscore to the first string and the ” to the second string, as follows: QT || '___________' || QT || ', ' || QT || ' ,.()/+-"''''' || QT || '), ' || In case you are wondering, the four quotes at the end of the “to” string equate to one single quote in the final string used for the dynamic statement. Did I mention that writing dynamic SQL in SQL is never the prettiest of code? The Next Step The next step is to analyze and correct our new naming, which we will cover in my next article. Paul Tuohy is CEO of ComCon, an iSeries consulting company, and is one of the co-founders of System i Developer, which hosts the RPG & DB2 Summit conferences. He is an award-winning speaker who also speaks regularly at COMMON conferences, and is the author of “Re-engineering RPG Legacy Applications,” “The Programmers Guide to iSeries Navigator,” and the self-study course called “iSeries Navigator for Programmers.” Send your questions or comments for Paul to Ted Holt via the IT Jungle Contact page.
|