Refacing Your Database, Part 1

April 5, 2016 Paul Tuohy

Note: The code accompanying this article is available for download here.

In this group of three articles, I will discuss a technique for refacing a traditional DDS defined database as an SQL DDL defined database. There will be no recreation of files or recompilation of programs, and it is not as much hard work as you think it might be.

The stored procedures, functions, programs, and commands used in these articles are available in the library SQLFACE, which can be downloaded here.

Before I get into the nitty gritty of refacing databases, I would just like to say that I think this is a great project to give to one or more interns. So now might be a good time to reach out to a local college or university. This does not require a deep knowledge of IBM i and college kids will probably have the required database knowledge.

And on to the nitty gritty.

Why Reface?

This is a variation of a scenario that I have seen multiple (20-plus) times over the last few years. Your company is investing in a reporting or data warehousing tool. The vendor sends a database analyst (DBA) to have a look at your database (which they were told is DB2) and see how they might use it. The DBA has never heard of an IBM i or iSeries or AS/400.

The DBA sits down with you and asks to have a look at some of the main tables that will be accessed by the new tool. You fire up Run SQL Scripts (in Navigator) or Data Studio and proceed to type in:

select * from sacusmas;
select * from saprodmas;
select * from sasalhst;

The DBA’s eyebrows raise slightly at the strange table names. Then the result sets, shown in Figure 1, pop up.

Figure 1: Results of SELECT on traditional physical files.

The DBA’s eyebrows are as far as they will go and their jaw is hanging agape. What is with the column names? Why do all the columns in a table start with the same two characters? And are those dates?

The DBA slowly backs out of the room, returns to the office, and promptly recommends that the company invest in a proper database that can be populated from the data on the IBM i. Of course, that database runs on SQL server or Oracle or DB2 on Linux or anything except IBM i.

The DBA was hoping to see you type in:

select * from customers;
select * from products;
select * from sales_history;

Hopefully the end result is something akin to the result sets shown below in Figure 2.

Figure 2: Results of SELECT on DDL named views.

The DBA was right, to an extent. Ten-character, eight-character, or six-character column names, each with unique prefixes or suffixes depending on the table, are not what one would call a modern database. No matter that your database may be a well-designed, normalized database, it just doesn’t look right.

What is Refacing?

Refacing is where we can go from the results shown in Figure 1 to the results shown in Figure 2 without having to make any changes to our existing DDS-defined files.

This means creating corresponding views for each physical file. The views rename the columns to their proper names. In this case, the views CUSTOMERS, PRODUCTS, and SALES_HISTORY would be created over the physical files SACUSMAS, SAPRODMAS, and SASALHST.

That might work for these three simple physical files, you say, but I have 2,000 physical files, some with hundreds of fields. It would take forever.

Not with a couple of tools to help you along the way.

The Refacing Process

I break the refacing process into three stages:

Extract the table and column definitions
Analyze and correct table and column names
Generate, amend and run a CREATE VIEW script

The concept is that you extract the required data for the current physical files and column definitions into two conversion tables (one for the physical files and one for the tables), correct the data in the two conversion tables, and then generate a script (from the conversion tables) to create all of the required views.

In this article, we will look at the extraction stage and at the other two stages in each of the subsequent articles.

Physical Files for Conversion

Figure 3 shows the DDS for four physical files: a field reference file and three data files. Whether or not you are using a field reference file is irrelevant because required information is extracted from the actual file definitions. The field reference file is being shown for your benefit only.

These files are in library SQLFACETST.

The main points to note are (refer to the callouts in the code below):

A–All fields in a file start with the same two character prefix. This could just as easily be one character and/or a suffix as opposed to a prefix. Or, if you are really lucky, there are no prefixes and suffixes involved and you just have to deal with short names. The norm (but not the rule) is that the name following the prefix is consistent across files.

B–The columns CMTOTSAL (in SACUSMAS) and PMAMOUNT (in SAPRODMAS) both reference the field AMOUNT for their definition but PMAMOUNT overrides the text description.

C–The STOH definition (in SAFLDREF) does not have a text description.

D–SHPURDAT (in SASALHST) is a YYYYMMDD date stored in numeric format. It may also be 0 or 99999999.

E–SHORYY, SHORMM and SHORDD are an order date, the year, month, and day being stored as separate fields. Year is a four-digit number.

F–SHDLC, SHDLYY, SHDLMM, and SHDLDD are a delivery date, the century, year, month, and day being stored as separate fields. Century is a single digit number (0 for 1900s, 1 for 2000s) and year is a two-digit number.

File SAFLDREF
     A          R SAFLDREFR
     A            AMOUNT        15P 2       TEXT('Amount')
     A            BALANCE       13P 2       TEXT('Balance')
     A                                      EDTCDE(L)
     A            CFIRST        30A         TEXT('Contact''s First Name
     A            CLAST         30A         TEXT('Contact''s Last Name'
     A            CUSNO         10A         TEXT('Customer ID Number')
     A            DESC          40A         TEXT('Product Description')
     A            DLC            1S 0       TEXT('Delivery Century')
     A            DLDD           2S 0       TEXT('Delivery Day')   
     A            DLMM           2S 0       TEXT('Delivery Month')
     A            DLYY           2S 0       TEXT('Delivery Year') 
     A            NAME          40A         TEXT('Customer Name') 
     A            ORDD           2S 0       TEXT('Order Day')  
     A            ORMM           2S 0       TEXT('Order Month') 
     A            ORYY           4S 0       TEXT('Order Year')  
     A            PRODCD        10A         TEXT('Product Code')
     A            PURDAT         8S 0       TEXT('Purchase Date')
     A                                      EDTCDE(Y)   
     A            QTY           15P 5       TEXT('Quantity/Volume')
     A                                      EDTCDE(L)     
(C)  A            STOH          15P 5       EDTCDE(L)   

File SACUSMAS
     A                                      REF(SAFLDREF)  
     A          R SACUSMASR         
(A)  A            CMCUSNO   R               REFFLD(CUSNO)   
     A            CMNAME    R               REFFLD(NAME)    
     A            CMCFIRST  R               REFFLD(CFIRST)  
     A            CMCLAST   R               REFFLD(CLAST)   
     A            CMBALANCE R               REFFLD(BALANCE) 
(B)  A            CMTOTSAL  R               REFFLD(AMOUNT)  

File SAPRODMAS
     A                                      REF(SAFLDREF) 
     A          R SAPRODMASR              
(A)  A            PMPRODCD  R               REFFLD(PRODCD) 
     A            PMDESC    R               REFFLD(DESC)  
     A            PMSTOH    R               REFFLD(STOH)   
(B)  A            PMAMOUNT  R               REFFLD(AMOUNT)  
     A                                      TEXT('Stock Amount')

File SASALHST
     A                                      REF(SAFLDREF)
     A          R SASALHSTR         
(A)  A            SHCUSNO   R               REFFLD(CUSNO)
     A            SHPRODCD  R               REFFLD(PRODCD)
     A            SHQTY     R               REFFLD(QTY) 
     A            SHAMOUNT  R               REFFLD(AMOUNT)
(D)  A            SHPURDAT  R               REFFLD(PURDAT)
(E)  A            SHORYY    R               REFFLD(ORYY)
     A            SHORMM    R               REFFLD(ORMM)
     A            SHORDD    R               REFFLD(ORDD)
(F)  A            SHDLC     R               REFFLD(DLC) 
     A            SHDLYY    R               REFFLD(DLYY)
     A            SHDLMM    R               REFFLD(DLMM)
     A            SHDLDD    R               REFFLD(DLDD)

The Conversion Files

The next piece of code shows the DDL used to create the two conversion tables used in the refacing process. These tables will initially be populated by calling a stored procedure. We will look at the contents in a moment.

CREATE TABLE TABLE_TRANSLATION FOR SYSTEM NAME TABTRANS (
	LIBRARY CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,
	TABLE_NAME CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,
	LONG_TABLE_NAME FOR COLUMN LONGNAME   VARCHAR(100) ALLOCATE(100)
                          CCSID 37 NOT NULL DEFAULT '' ,
	TABLE_TEXT CHAR(50) CCSID 37 NOT NULL DEFAULT '' )
	  
	RCDFMT TABTRANS   ;
  
LABEL ON TABLE TABLE_TRANSLATION
	IS 'Table Translation' ;
  
LABEL ON COLUMN TABLE_TRANSLATION 
( LIBRARY IS 'Library              ' ,
   TABLE_NAME IS 'Table Name           ' ,
   LONG_TABLE_NAME IS 'Long                Table               Name' ,
   TABLE_TEXT IS 'Text' ) ;
  
LABEL ON COLUMN TABLE_TRANSLATION
( LIBRARY TEXT IS 'Library/Schema' ,
	TABLE_NAME TEXT IS 'Table Name' ,
	LONG_TABLE_NAME TEXT IS 'Long Table Name' ,
	TABLE_TEXT TEXT IS 'Text' ) ; 
  
CREATE TABLE COLUMN_TRANSLATION FOR SYSTEM NAME COLTRANS ( 
	LIBRARY CHAR(10) CCSID 37 NOT NULL DEFAULT '' , 
	TABLE_NAME CHAR(10) CCSID 37 NOT NULL DEFAULT '' , 
	CURRENT_COLUMN FOR COLUMN CUR_COLUMN CHAR(10) 
                         CCSID 37 NOT NULL DEFAULT '' , 
	COLUMN_SEQUENCE FOR COLUMN COLSEQ DECIMAL(10, 0) 
                          NOT NULL DEFAULT 0 , 
	PROPER_COLUMN FOR COLUMN PROPCOLUMN CHAR(10) 
                        CCSID 37 NOT NULL DEFAULT '' , 
	LONG_COLUMN FOR COLUMN LONGCOLUMN VARCHAR(100) ALLOCATE(100) 
                      CCSID 37 NOT NULL DEFAULT '' , 
	USE_FUNCTION_NAME FOR COLUMN USEFUNC  VARCHAR(20) ALLOCATE(20) 
                            CCSID 37 NOT NULL DEFAULT '' , 
	EXTRA_PARAMETERS FOR COLUMN EXTRAPARM  VARCHAR(100) ALLOCATE(100) 
                           CCSID 37 NOT NULL DEFAULT '' )  
	RCDFMT COLTRANS   ; 
  
LABEL ON TABLE COLUMN_TRANSLATION 
	IS 'Column Translation' ; 
  
LABEL ON COLUMN COLUMN_TRANSLATION 
( LIBRARY IS 'Library              ' , 
   TABLE_NAME IS 'Table Name           ' , 
   CURRENT_COLUMN IS 'Current             Column              Name' , 
   COLUMN_SEQUENCE IS 'Column              Sequence' , 
   PROPER_COLUMN IS 'Proper              Column              Name' , 
   LONG_COLUMN IS 'Long                Column              Name' , 
   USE_FUNCTION_NAME IS 'Use Function         ' , 
   EXTRA_PARAMETERS IS 'Extra               Parameters          For Function' ) ; 
  
LABEL ON COLUMN COLUMN_TRANSLATION 
( LIBRARY TEXT IS 'Library/Schema' , 
	TABLE_NAME TEXT IS 'Table Name' , 
	CURRENT_COLUMN TEXT IS 'Current Column Name' , 
	COLUMN_SEQUENCE TEXT IS 'Column Sequence in Row' , 
	PROPER_COLUMN TEXT IS 'Proper Column Name' , 
	LONG_COLUMN TEXT IS 'Long Column Name' , 
	USE_FUNCTION_NAME TEXT IS 'Use Function' , 
	EXTRA_PARAMETERS TEXT IS 'Extra Parameters for Function' ) ;

Populating the Conversion Tables

We populate the conversion tables by calling the EXTRACT_DEFINITIONS stored procedure as follows:

call extract_definitions('SQLFACETST', DEFAULT, 2, DEFAULT);

The four parameters are:

The name of the library/schema containing the files.
The name of a file. The default value, *ALL, means all physical files with an attribute of PF-DTA in the library.
The number of prefix characters. This number of characters will be removed from the start of every field name to arrive at a proper name (more in a moment).
The number of suffix characters. This number of characters will be removed from the end of every field name to arrive at a proper name (more in a moment).

If you don’t want all files in a library, you can request the files one by one or, as I did above, request all files and then delete references to the files you don’t want. In this case, I am not interested in the field reference file (SAFLDREF), so I run the two following SQL statements:

delete from table_translation where table_name in ('SAFLDREF'); 
delete from column_translation where table_name in ('SAFLDREF');

Figure 3 shows what has been placed in the TABLE_TRANSLATION table. For each file, the EXTRACT_DEFINITIONS stored procedure has attempted to construct a LONG_TABLE_NAME from the table text. Constructing this name consists of putting all the characters in upper case and replacing special characters with an underscore.

Figure 3: Populated TABLE_TRANSLATION.

Figure 4 below shows what has been placed in the TABLE_TRANSLATION table. The important points to note are:

LIBRARY, TABLE_NAME, CURRENT_COLUMN and COLUMN_SEQUENCE identify what is currently there.
PROPER_COLUMN is the CURRENT_COLUMN without the prefix and/or suffix characters.
LONG_COLUMN is an attempt to construct a long name from the column text. Again, constructing this name consists of putting all the characters in upper case and replacing special characters with an underscore.
The USE_FUNCTION_NAME and EXTRA_PARAMETERS columns are not populated by EXTRACT_DEFINITIONS. We will see how these columns are used in the second and third articles.

Figure 4: Populated COLUMN_TRANSLATION.

This next bit of code shows the DDL used to create the EXTRACT_DEFINITIONS stored procedure. The procedure generates and executes two dynamic SQL statements. The insert statements run select statements that extract the required definitions from the system catalog tables QADBXFIL and QADBIFLD.

CREATE PROCEDURE EXTRACT_DEFINITIONS ( 
	IN LIBRARY VARCHAR(10) , 
	IN TABLE_NAME VARCHAR(10) DEFAULT  '*ALL'  , 
	IN PREFIX_LENGTH DECIMAL(1, 0) DEFAULT  0  , 
	IN SUFFIX_LENGTH DECIMAL(1, 0) DEFAULT  0  ) 
	LANGUAGE SQL 
	SPECIFIC EXTRACT_DEFINITIONS 
	NOT DETERMINISTIC 
	MODIFIES SQL DATA 
	CALLED ON NULL INPUT 
	PROGRAM TYPE SUB 
	CONCURRENT ACCESS RESOLUTION DEFAULT 
	SET OPTION  ALWBLK = *ALLREAD , 
	ALWCPYDTA = *OPTIMIZE , 
	COMMIT = *NONE , 
	DECRESULT = (31, 31, 00) , 
	DFTRDBCOL = *NONE , 
	DYNDFTCOL = *NO , 
	DYNUSRPRF = *USER , 
	SRTSEQ = *HEX   
	BEGIN 
  
DECLARE SQLCODE INTEGER DEFAULT 0 ; 
DECLARE SQL_STATEMENT VARCHAR ( 5000 ) ; 
DECLARE START_AT INTEGER DEFAULT 0 ; 
DECLARE CUT_END INTEGER DEFAULT 0 ; 
DECLARE QT CHAR(1) DEFAULT '''';
  
-- Extract Table Names 
SET SQL_STATEMENT = 
    'insert into tabtrans (library, table_name, ' || 
    'long_table_name, table_text) ' || 
    '(select dbxlib, dbxfil, ' || 
    'coalesce(translate(trim(upper(char(dbxtxt))), ' || 
(A)                     QT || '__________' || QT || ', ' || 
                        QT || ' ,.()/+-"''''' || QT || '), ' || 
              QT || '*MISSING' || QT || '), ' || 
    'coalesce(char(dbxtxt), '' '') ' || 
    'from qsys/qadbxfil ' || 
    'where dbxlib = ' || QT || LIBRARY || QT || ' and ' ||
    'dbxatr = ' || QT || 'PF' || QT || 
    ' and dbxtyp = ' || QT || 'D' || QT; 
  
IF ( TABLE_NAME <> '' AND TABLE_NAME <> '*ALL' ) THEN 
SET SQL_STATEMENT = 
    SQL_STATEMENT || 
    ' and dbxfil = ' || QT || TABLE_NAME  || QT ; 
END IF ; 
  
SET SQL_STATEMENT = SQL_STATEMENT || ')' ; 
  
PREPARE D1 FROM SQL_STATEMENT ; 
  
IF ( SQLCODE <> 0 ) THEN 
RETURN ; 
END IF ; 
  
EXECUTE D1 ; 
  
  
-- Set parameters for dropping prefix/suffix 
SET START_AT = 1 + PREFIX_LENGTH ; 
SET CUT_END = PREFIX_LENGTH + SUFFIX_LENGTH ; 
  
-- Extract Column Definitions 
SET SQL_STATEMENT = 
    'insert into coltrans (library, table_name, cur_column, ' || 
    'column_sequence, propcolumn, longcolumn) ' || 
    '(select dbilib, dbifil, dbifld, dbipos, ' || 
    'substr(dbifld, ' || DIGITS ( START_AT ) || ', ' || 
            'length(trim(dbifld))-' || 
            DIGITS ( CUT_END ) || ') ,  ' || 
    'coalesce(translate(trim(upper(char(dbitxt))), ' || 
(A)                     QT || '__________' || QT || ', ' || 
                        QT || ' ,.()/+-"''''' || QT || '), ' || 
              QT || '*MISSING' || QT || ') ' || 
    'from qsys/qadbxfil a inner join qsys/qadbifld b ' || 
    'on (dbxlib, dbxfil) = (dbilib, dbifil) ' ||
    'where dbilib = ' || QT || LIBRARY || QT || ' and ' ||
    'dbxatr = ' || QT || 'PF' || QT || 
    ' and dbxtyp = ' || QT || 'D' || QT; 
  
IF ( TABLE_NAME <> '' AND TABLE_NAME <> '*ALL' ) THEN 
SET SQL_STATEMENT = 
    SQL_STATEMENT || 
    ' and dbifil = ' || QT || TABLE_NAME  || QT ; 
END IF ; 
  
SET SQL_STATEMENT = SQL_STATEMENT || ')' ; 

PREPARE D2 FROM SQL_STATEMENT ; 
  
IF ( SQLCODE <> 0 ) THEN 
RETURN ; 
END IF ; 
  
EXECUTE D2 ; 
  
END  ; 
  
COMMENT ON PARAMETER SPECIFIC PROCEDURE EXTRACT_DEFINITIONS 
( LIBRARY IS 'Library Name' , 
	TABLE_NAME IS 'Table Name' , 
	PREFIX_LENGTH IS 'Prefix Length' , 
	SUFFIX_LENGTH IS 'Suffix Length' ) ; 
  
LABEL ON SPECIFIC PROCEDURE EXTRACT_DEFINITIONS  
	IS 'Extract Column Definitions' ;

Writing dynamic SQL in SQL is never the prettiest of code. The only part of this procedure that you might have to change is to add extra characters to the translation of text to the new column name. Let’s say we want to have the ” (reverse slash) character also translated: in the two place where the translation is done ( (A) in Figure 7), we would need to add an extra underscore to the first string and the ” to the second string, as follows:

        QT || '___________' || QT || ', ' || 
        QT || ' ,.()/+-"''''' || QT || '), ' ||

In case you are wondering, the four quotes at the end of the “to” string equate to one single quote in the final string used for the dynamic statement. Did I mention that writing dynamic SQL in SQL is never the prettiest of code?

The Next Step

The next step is to analyze and correct our new naming, which we will cover in my next article.

Paul Tuohy is CEO of ComCon, an iSeries consulting company, and is one of the co-founders of System i Developer, which hosts the RPG & DB2 Summit conferences. He is an award-winning speaker who also speaks regularly at COMMON conferences, and is the author of “Re-engineering RPG Legacy Applications,” “The Programmers Guide to iSeries Navigator,” and the self-study course called “iSeries Navigator for Programmers.” Send your questions or comments for Paul to Ted Holt via the IT Jungle Contact page.

Table of Contents

Content archive

Recent Posts

Subscribe

Pages

Search