Guru: Data-Centrism, Step Zero
September 10, 2018 Ted Holt
I hear a lot these days about the need for data-centric information systems. That is as it should be. The proper way to support an organization is to remove logic from application programs and put it into the database through such devices as constraints and triggers. However, before many shops can take the first step in that direction, they need to take what I call step zero.
Step zero in data-centric computing is to remove hard-coded data values from programs and put them into the database. Just as the database manager should enforce business rules (e.g. we don’t ship to customers who owe us money), so should data drive special logic for specific entities. In my work, I see hard-coded warehouse IDs, customer numbers, device IDs, library names, output queue names, user profiles, etc. I wish that hard coding were so rare that I couldn’t justify writing an article on the topic, but unfortunately it is common.
In the following paragraphs, I tell how to identify hard coding, how to move hard-coded data into the database, and how to simplify the programming required to access the data.
Identifying Hard Coding
I suspect most people know hard coding when they see it, but if you’re ever in doubt, ask yourself one or more of the following questions:
- If this software were in use in other organizations, would this value make sense?
- Is this value the primary key, or part of the primary key, of a database table (physical file)?
- If I were to add another of this type of value, would I have to modify and recompile any programs?
Here’s an RPG example with hard-coded warehouse IDs.
C                   IF        WHSID = 1
C                   EVAL      WHSNAM = 'NEW YOLK'
C                   EVAL      LEADTIME = 14
C                   EVAL      MFGWHS = 'N'
C                   ELSE
C                   IF        WHSID = 2
C                   EVAL      WHSNAM = 'LOST ANGELES'
C                   EVAL      LEADTIME = 7
C                   EVAL      MFGWHS = 'Y'
C                   ENDIF
C                   ENDIF
Question 1: If this software were in use in other organizations, would this value make sense?
Answer: No. Other organizations might not have a warehouse 1 or 2. If they do have a warehouse with either ID, it’s probably not located in New Yolk or Lost Angeles, the lead times are probably different, and whether those warehouses are manufacturing warehouses is likely to differ.
Question 2: Is this value the primary key, or part of the primary key, of a database table (physical file)?
Answer: Maybe. If we have a warehouse master file, then yes, the warehouse ID is the key.
Question 3: If I were to add another of this type of value, would I have to modify and recompile any programs?
Answer: Yes. If our organization were to open a new warehouse, we’d have to modify this program and others that test for warehouse IDs.
The logical conclusion is that we should not be hard coding warehouse IDs.
Now consider this:
C                   IF        CUSCLASS = 'A'
 . . . do whatever
C                   ENDIF
In this application, customers are categorized as A (our best customers), B (occasional customers whom we don’t wish to lose), or C (customers who we wish would take their business elsewhere). Does it make sense to hard code the A? Yes, it does. All sorts of businesses have these types of customers, so making this type of classification a feature of our software makes sense. Consider also that customer type is not a primary key, but an attribute of a customer.
Identifying New Attributes
Sometimes attributes are not so obvious. That is, an attribute exists, but we don’t think of it as such. For example:
C                   IF        WHSID = 1  OR
C                             WHSID = 5  OR
C                             WHSID = 12
 . . . calculate lead time one way
C                   ELSE
C                   IF        WHSID = 2  OR
C                             WHSID = 6
 . . . calculate lead time a different way
C                   ELSE
 . . . calculate lead time using yet another way
C                   ENDIF
C                   ENDIF
This code differs from the first example in that different warehouse IDs drive different calculations. Rather than assign a lead time, we use different methods to calculate lead times. That is, we have three calculation methods. We need to assign codes, such as A, B, C or 1, 2, 3 — anything that makes sense — and add a lead-time calculation method column (field) to the appropriate warehouse table. The revised code looks like this:
C     WHSID         CHAIN     WHSMAST
 . . . handle not found
C                   IF        LTMETHOD = 'A'
 . . . calculate lead time one way
C                   ELSE
C                   IF        LTMETHOD = 'B'
 . . . calculate lead time a different way
C                   ELSE
 . . . calculate lead time using yet another way
C                   ENDIF
C                   ENDIF
Moving Into The Database
In relational databases, data is stored in tables (physical files). When moving hard-coded data into a table, you have three options:
- If there is no suitable table for the data, create one.
- If there is a suitable table for the data, add new columns (fields) to hold the attributes.
- If there is a suitable table, but adding new columns is impractical, create an auxiliary table with the same key.
Let’s consider each of these in turn.
You have created many tables, so there is no need for me to tell you how to create one, but for completeness, here’s the command to create a warehouse master table:
create table warehouses
  ( ID               dec(3) primary key,
    Name             varchar(20),
    LeadTime         dec(3) not null with default 0,
    IsManufacturing  for MfgWhs char(1) not null with default 'N'
                     check (IsManufacturing in ('Y', 'N')) )

insert into warehouses values
  ( 1, 'New Yolk', 14, 'N'),
  ( 2, 'Lost Angeles', 7, 'Y')
You can also use DDS to create a physical file, but I prefer and recommend SQL.
If you already have a warehouse master, consider adding new columns.
alter table warehouses
  add column Name             varchar(20)
  add column LeadTime         dec(3) not null with default 0
  add column IsManufacturing  for MfgWhs char(1) not null with default 'N'
             check (IsManufacturing in ('Y', 'N'))
Sometimes adding new columns is impractical. This is often the case in shops that run a software package. Modifying the software vendor’s table is not an option. In such a case, create an auxiliary table. Use the same SQL statement you would use to create a new master file, as in the example above. The two tables will have the same key fields. You may not always have a record in the auxiliary table for every record in the master table, so joins between the two will typically be outer joins with the master table in primary position.
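To make that concrete, here is a rough sketch of such a lookup using embedded SQL in free-form RPG. The names are hypothetical: VNDWHS stands in for the vendor’s warehouse master and WHSEXT for the auxiliary table that shares its key.

       // Hypothetical names: VNDWHS is the vendor's warehouse master,
       // WHSEXT is our auxiliary table with the same key (WhsID).
       dcl-s WhsID    packed(3: 0);
       dcl-s WhsNam   varchar(20);
       dcl-s LeadDays packed(3: 0);

       exec sql
          select m.WhsName, coalesce(x.LeadTime, 0)
            into :WhsNam, :LeadDays
            from VNDWHS m
            left outer join WHSEXT x
              on x.WhsID = m.WhsID
           where m.WhsID = :WhsID;
       // SQLCODE 100 here means the vendor's master has no such warehouse.

Because the vendor’s master sits in primary position of a left outer join, warehouses that have no auxiliary row still come back; coalesce supplies a default lead time for them.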
With the data in the appropriate table, the program in the first example can be revised to do a random read (in RPG terms, a CHAIN) to the table to retrieve the appropriate data values.
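For illustration only, that revision might look something like the following free-form sketch. It assumes the WAREHOUSES table created above and relies on its primary-key access path for the keyed read, the same approach the GETOVR program later in this article uses.

       ctl-opt option(*srcstmt: *nodebugio);

       // Sketch only: WAREHOUSES is the SQL table created above; the
       // record format is renamed because it defaults to the file name.
       dcl-f warehouses keyed rename(warehouses: WhsRec);

       dcl-s WhsID packed(3: 0);   // set from wherever the program gets it

       chain (WhsID) WhsRec;
       if %found(warehouses);
          // Name, LeadTime, and IsManufacturing now hold the row's
          // values, replacing the hard-coded EVAL statements
       else;
          // handle an unknown warehouse ID as shop standards dictate
       endif;

       *inlr = *on;
       return;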
Simplify The Programming
Here’s another common use of hard-coded data, this time in a CL program. Users Nosmo King, Malcolm Oron, and Donald Truck have requested certain enhancements to a report.
DCL VAR(&USER) TYPE(*CHAR) LEN(10)
RTVJOBA USER(&USER)
IF COND(&USER = 'NKING') THEN(OVRPRTF +
          FILE(SUPERPRTF) PRTTXT('Nosmo King') +
          OUTQ(PRODCTL3))
ELSE CMD(IF COND(&USER = 'MORON') THEN(OVRPRTF +
          FILE(SUPERPRTF) OUTQ(ACCTSPAY)))
ELSE CMD(IF COND(&USER = 'DTRUCK') THEN(OVRPRTF +
          FILE(SUPERPRTF) PRTTXT('Donald Truck') +
          COPIES(2)))
CALL PGM(SUPERPGM)
DLTOVR FILE(SUPERPRTF)
I’ve seen a lot of this. Chances are good that you have too. How can we move this override information into the database? Here’s one way.
First, let’s create a table of override commands.
create table overrides
  ( UserID    char(10),
    ReportID  char(10),
    Override  varchar(350) not null with default,
    primary key (UserID, ReportID))

insert into overrides values
  ('NKING',  'SUPERPRTF', 'PRTTXT(''Nosmo King'') OUTQ(PRODCTL3)'),
  ('MORON',  'SUPERPRTF', 'OUTQ(ACCTSPAY)'),
  ('DTRUCK', 'SUPERPRTF', 'PRTTXT(''Donald Truck'') COPIES(2)')
The overrides are in the database. How do I access them?
I could add Declare File (DCLF) and Receive File (RCVF) commands to all such CL programs, but that’s for the birds. I’m not even going to show you the code to make that happen. A better way is to write a program — such as this GETOVR example — to retrieve the override.
       ctl-opt option(*srcstmt: *nodebugio);

       dcl-f overrides keyed rename(overrides: OvrRec);

       dcl-pr GETOVR extpgm('GETOVR');
          inUser      char(10) const;
          inReport    char(10) const;
          ouOverride  char(350);
       end-pr GETOVR;

       dcl-pi GETOVR;
          inUser      char(10) const;
          inReport    char(10) const;
          ouOverride  char(350);
       end-pi GETOVR;

       chain (inUser: inReport) OvrRec;
       if %found();
          ouOverride = Override;
       else;
          ouOverride = *blanks;
       endif;

       return;
Modify the CL programs to call this program.
DCL VAR(&USER) TYPE(*CHAR) LEN(10)
DCL VAR(&OVERPARMS) TYPE(*CHAR) LEN(350)
DCL VAR(&OVERRIDE) TYPE(*CHAR) LEN(350)
DCL VAR(&OLENGTH) TYPE(*DEC) LEN(15 5) VALUE(350)
RTVJOBA USER(&USER)
CALL GETOVR (&USER SUPERPRTF &OVERPARMS)
IF COND(&OVERPARMS *NE ' ') THEN(DO)
   CHGVAR &OVERRIDE ('OVRPRTF SUPERPRTF' *BCAT &OVERPARMS)
   CALL QCMDEXC (&OVERRIDE &OLENGTH)
ENDDO
CALL PGM(SUPERPGM)
IF COND(&OVERPARMS *NE ' ') +
   THEN(DLTOVR FILE(SUPERPRTF))
It is much easier to add a CALL than I/O commands.
I did not invent this idea. I’ve seen it used many times. In fact, some people get pretty fancy with it. For instance, the systems I have used defined separate fields for all the override parameters, instead of one long command string. It is also common to allow generic values or special values like *ALL in the key fields. But even the barebones code in my example gets the data out of the program and into the database, where it belongs.
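As a small sketch of that *ALL idea (and only a sketch; it assumes default rows are stored with *ALL in the UserID column), GETOVR’s lookup could fall back to a catch-all row when no user-specific row exists:

       // Try the user-specific row first, then a '*ALL' default row.
       chain (inUser: inReport) OvrRec;
       if not %found(overrides);
          chain ('*ALL': inReport) OvrRec;
       endif;

       if %found(overrides);
          ouOverride = Override;
       else;
          ouOverride = *blanks;
       endif;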
If your shop is already data-centric, spectacular! If you’re still moving in that direction, magnificent! If your shop has yet to take step zero, get busy!
Reader Comments

Excellent article!
And I had to chuckle, because in one of the areas where the data is hard-coded, there appears to be a typo, which is yet another reason why you don’t want to hard-code… 🙂
EVAL WHSNAM = 'NEW YOLK'
Nice! The fixed format RPG tells us the reason why the programmer committed the “Cardinal Sin” of hard-coding. This mentality has resulted in the downfall of RPG, and subsequently that of IBM i itself.
The reason for this mentality is that many RPG developers lack higher education, and therefore the broader vision to foresee the hazards of their laziness.
I have seen high-level officials fired after hard coding resulted in a stoppage of business (a level 0 emergency).
The ideal thing is to go beyond the step zero that Ted suggested. There are Q&A tables, and then there are logic tables with SQL statements in them. These SQL statements are picked up from the table and embedded in the program, depending on the condition from the Q&A table.
There is a lot to be done, and not all shops do all of these things. Having worked in dozens of good shops gives you an idea of how things can be, should be, and are done (in some shops).
Excellent exposé about different forms of technical debt that many programmers would just gloss over. Another example that I run into at legacy shops is compile-time tables. My STDXREF (Standard Cross Reference) tool is a simple little open-source vehicle for one-to-one lookups with either alpha or numeric arguments (and results). One point of maintenance and one point of lookup. Sort of a “table file” on steroids. It really helped me burn down compile-time tables and hard-coded IF or SELECT blocks when doing modernization projects. It actually reminds me of the table-driven programming topic you’ve done at COMMON in the past. I added it to my GitHub collection earlier this year (after freshening it up from its 2005 birthday suit).
Great reminder of stuff that not everyone thinks about. Makes life much more flexible as business rules and users change.
I find cases where data (state names and product categories are a few examples) is hard-coded into tables or arrays in dozens of programs. While these programs were mostly written without using ILE, they still don’t need this hard coding.