Guru: Using Identity Columns For Complex Keys
February 17, 2020 Paul Tuohy
In September 2019, Ted Holt published an article that demonstrated how an identity column can be used to generate a unique key in a table. In this article, I want to expand on that theme and examine another use of an identity column — replacing complex keys.
Using an identity column in place of a complex key makes for much faster joins between tables, as you are joining based on two numbers as opposed to values of multiple columns. Identity columns also make for joins that are easier to comprehend, since they are based on just one column. This approach is well worth considering if you are creating a new database or looking at modernizing an existing database. We will look and contrast the definition of tables, constraints and views without and with identity columns.
The Tables
First of all, we have a Warehouse table. This is a simple table and does not require an identity column, but it is referenced by later tables and views.
create or replace table WAREHOUSES ( WAREHOUSE CHAR(3) NOT NULL, WAREHOUSE_DESCRIPTION for WARE_DESC VARCHAR(50) NOT NULL, -- Definition of other columns CONSTRAINT PK_WAREHOUSES PRIMARY KEY( WAREHOUSE ) ) RCDFMT WAREHOUSES;
Next we have an Item Master, which has a three part key (the three ITEM columns). This is the definition of the table without an identity column.
create or replace table ITEM_MASTER for system name ITEMMAST ( ITEM_GROUP CHAR(4) NOT NULL, ITEM_SUB_GROUP_1 for ITEM_SUB_1 CHAR(2) NOT NULL, ITEM_SUB_GROUP_2 for ITEM_SUB_2 CHAR(2) NOT NULL, ITEM_DESCRIPTION for ITEM_DESC VARCHAR(50) NOT NULL, -- Definition of other columns CONSTRAINT PK_ITEM_MASTER PRIMARY KEY( ITEM_GROUP, ITEM_SUB_GROUP_1, ITEM_SUB_GROUP_2 ) ) RCDFMT ITEMMAST;
If we include the definition of an identity column (A), the primary key constraint remains the same (B), but we also have a unique key constraint based on the identity column (C). A unique key constraint provides a different key for uniquely identifying rows in a table. A primary key or a unique key is a requirement, on the parent table, when defining a foreign key constraint, hence the requirement for a unique key constraint defined on the identity column.
create or replace table ITEM_MASTER for system name ITEMMAST ( ITEM_GROUP CHAR(4) NOT NULL, ITEM_SUB_GROUP_1 for ITEM_SUB_1 CHAR(2) NOT NULL, ITEM_SUB_GROUP_2 for ITEM_SUB_2 CHAR(2) NOT NULL, (A) ITEM_ID DECIMAL(11, 0) Generated Always As Identity NOT NULL, ITEM_DESCRIPTION for ITEM_DESC VARCHAR(50) NOT NULL, -- Definition of other columns (B) CONSTRAINT PK_ITEM_MASTER PRIMARY KEY( ITEM_GROUP, ITEM_SUB_GROUP_1, ITEM_SUB_GROUP_2 ), (C) CONSTRAINT UK_ITEM_MASTER UNIQUE ( ITEM_ID ) ) RCDFMT ITEMMAST;
Now we define a table for Items in Warehouses. If we are not using an identity column, the table has a four-column key (the warehouse and three item columns) and there are two foreign key constraints (to the Warehouse and Item Master tables). This is where we first see the potential for error. When listing multiple columns it is always an easy matter to list columns in the wrong sequence.
create or replace table ITEMS_IN_WAREHOUSES for system name WAREITEM ( WAREHOUSE CHAR(3) NOT NULL, ITEM_GROUP CHAR(4) NOT NULL, ITEM_SUB_GROUP_1 for ITEM_SUB_1 CHAR(2) NOT NULL, ITEM_SUB_GROUP_2 for ITEM_SUB_2 CHAR(2) NOT NULL, STOCK_LOCATION for STOCK_LOC CHAR(8) NOT NULL, -- Definition of other columns CONSTRAINT PK_ITEMS_IN_WAREHOUSES PRIMARY KEY( WAREHOUSE, ITEM_GROUP, ITEM_SUB_GROUP_1, ITEM_SUB_GROUP_2 ) ) RCDFMT WAREITEM; ALTER TABLE ITEMS_IN_WAREHOUSES ADD CONSTRAINT FK_ITEMS_IN_WAREHOUSES_TO_WAREHOUSE FOREIGN KEY( WAREHOUSE ) REFERENCES WAREHOUSES ( WAREHOUSE ) ON DELETE RESTRICT ON UPDATE RESTRICT ; ALTER TABLE ITEMS_IN_WAREHOUSES ADD CONSTRAINT FK_ITEMS_IN_WAREHOUSES_TO_ITEM FOREIGN KEY( ITEM_GROUP, ITEM_SUB_GROUP_1, ITEM_SUB_GROUP_2 ) REFERENCES ITEM_MASTER ( ITEM_GROUP, ITEM_SUB_GROUP_1, ITEM_SUB_GROUP_2 ) ON DELETE RESTRICT ON UPDATE RESTRICT ;
But, if we define the Items in Warehouse table and make use of the identity column from the Item Master, we now have a much simpler key and constraint definition. The table now uses the value of the ITEM_ID column (A) from the Item Master as opposed to the three ITEM columns.
This table also generates an identity column (B) to uniquely identify rows in this table.
create or replace table ITEMS_IN_WAREHOUSES for system name WAREITEM ( WAREHOUSE CHAR(3) NOT NULL, (A) ITEM_ID DECIMAL(11, 0) NOT NULL, ITEM_IN_WAREHOUSE_ID for ITEMWAREID DECIMAL(11, 0) Generated Always As Identity NOT NULL, STOCK_LOCATION for STOCK_LOC CHAR(8) NOT NULL, -- Definition of other columns CONSTRAINT PK_ITEMS_IN_WAREHOUSES PRIMARY KEY( WAREHOUSE, ITEM_ID ) , CONSTRAINT UK_ITEMS_IN_WAREHOUSES UNIQUE ( ITEM_IN_WAREHOUSE_ID ) ) RCDFMT WAREITEM; ALTER TABLE ITEMS_IN_WAREHOUSES ADD CONSTRAINT FK_ITEMS_IN_WAREHOUSES_TO_WAREHOUSE FOREIGN KEY( WAREHOUSE ) REFERENCES WAREHOUSES ( WAREHOUSE ) ON DELETE RESTRICT ON UPDATE RESTRICT ; ALTER TABLE ITEMS_IN_WAREHOUSES ADD CONSTRAINT FK_ITEMS_IN_WAREHOUSES_TO_ITEM FOREIGN KEY( ITEM_ID ) REFERENCES ITEM_MASTER ( ITEM_ID ) ON DELETE RESTRICT ON UPDATE RESTRICT ;
There will now be many more tables that are dependents of the Items in Warehouses. One example would be a Stock Transactions table. This is what it looks like without an identity column — all four key fields from the Items in Warehouses table are required.
create or replace table ITEMS_TRANSACTIONS for system name ITEMTRANS ( WAREHOUSE CHAR(3) NOT NULL, ITEM_GROUP CHAR(4) NOT NULL, ITEM_SUB_GROUP_1 for ITEM_SUB_1 CHAR(2) NOT NULL, ITEM_SUB_GROUP_2 for ITEM_SUB_2 CHAR(2) NOT NULL, TRANSACTION_SEQ for TRAN_SEQ INT NOT NULL, TRANSACTION_DATE for TRAN_DATE DATE NOT NULL, QUANTITY DECIMAL(9,0) NOT NULL, -- Definition of other columns CONSTRAINT PK_ITEMS_TRANSACTIONS PRIMARY KEY( WAREHOUSE, ITEM_GROUP, ITEM_SUB_GROUP_1, ITEM_SUB_GROUP_2, TRANSACTION_SEQ ) ) RCDFMT ITEMTRANS; ALTER TABLE ITEMS_TRANSACTIONS ADD CONSTRAINT FK_ITEMS_TRANSACTIONS_TO_ITEMS_IN_WAREHOUSES FOREIGN KEY( WAREHOUSE, ITEM_GROUP, ITEM_SUB_GROUP_1, ITEM_SUB_GROUP_2 ) REFERENCES ITEMS_IN_WAREHOUSES ( WAREHOUSE, ITEM_GROUP, ITEM_SUB_GROUP_1, ITEM_SUB_GROUP_2 ) ON DELETE CASCADE ON UPDATE RESTRICT ;
But, if we make use of the value of the identity column (from the Items in Warehouses table), the definition becomes a lot simpler (A).
create or replace table ITEMS_TRANSACTIONS for system name ITEMTRANS ( ITEM_IN_WAREHOUSE_ID for ITEMWAREID DECIMAL(11, 0) NOT NULL, TRANSACTION_SEQ for TRAN_SEQ INT NOT NULL, TRANSACTION_DATE for TRAN_DATE DATE NOT NULL, QUANTITY DECIMAL(9,0) NOT NULL, -- Definition of other columns (A) CONSTRAINT PK_ITEMS_TRANSACTIONS PRIMARY KEY( ITEM_IN_WAREHOUSE_ID, TRANSACTION_SEQ ) ) RCDFMT ITEMTRANS; ALTER TABLE ITEMS_TRANSACTIONS ADD CONSTRAINT FK_ITEMS_TRANSACTIONS_TO_ITEMS_IN_WAREHOUSES (A) FOREIGN KEY( ITEM_IN_WAREHOUSE_ID ) (A) REFERENCES ITEMS_IN_WAREHOUSES ( ITEM_IN_WAREHOUSE_ID ) ON DELETE CASCADE ON UPDATE RESTRICT;
And There Are Views
These multi-column requirements are just as tedious when we have to use them in a view. Whenever you use multiple columns on a join, there is the potential to make a mistake. Note the joins to the Item Master (A) and the Items in Warehouses (B) tables.
create or replace view STOCK_MOVEMENT for system name STOCK_MOVE AS SELECT WH.WAREHOUSE, WH.WAREHOUSE_DESCRIPTION, IM.ITEM_GROUP, IM.ITEM_SUB_GROUP_1, IM.ITEM_SUB_GROUP_2, IM.ITEM_DESCRIPTION, IT.TRANSACTION_SEQ, IT.TRANSACTION_DATE, IT.QUANTITY FROM WAREHOUSES WH inner join ITEMS_IN_WAREHOUSES IW on WH.WAREHOUSE = IW.WAREHOUSE inner join ITEM_MASTER IM (A) on (IW.ITEM_GROUP, IW.ITEM_SUB_GROUP_1, IW.ITEM_SUB_GROUP_2) = (IM.ITEM_GROUP, IM.ITEM_SUB_GROUP_1, IM.ITEM_SUB_GROUP_2) inner join ITEMS_TRANSACTIONS IT (B) on (IW.WAREHOUSE, IW.ITEM_GROUP, IW.ITEM_SUB_GROUP_1, IW.ITEM_SUB_GROUP_2) = (IT.WAREHOUSE, IT.ITEM_GROUP, IT.ITEM_SUB_GROUP_1, IT.ITEM_SUB_GROUP_2);
But look at how much easier this becomes when we use the identity columns instead (A)(B).
create or replace view STOCK_MOVEMENT for system name STOCK_MOVE AS SELECT WH.WAREHOUSE, WH.WAREHOUSE_DESCRIPTION, IM.ITEM_GROUP, IM.ITEM_SUB_GROUP_1, IM.ITEM_SUB_GROUP_2, IM.ITEM_DESCRIPTION, IT.TRANSACTION_SEQ, IT.TRANSACTION_DATE, IT.QUANTITY FROM WAREHOUSES WH inner join ITEMS_IN_WAREHOUSES IW on WH.WAREHOUSE = IW.WAREHOUSE inner join ITEM_MASTER IM (A) on IW.ITEM_ID = IM.ITEM_ID inner join ITEMS_TRANSACTIONS IT (B) on IW.ITEM_IN_WAREHOUSE_ID = IT.ITEM_IN_WAREHOUSE_ID;
Identity columns do require more thought in the design process for a database because you may need extra views to retrieve the primary key or you may even require a new table to relate a primary key to an identity column. But, as well as simplifying joins and dependencies, they also make for faster joins.
Paul Tuohy, IBM Champion and author of Re-engineering RPG Legacy Applications, is a prominent consultant and trainer for application modernization and development technologies on the IBM Midrange. He is currently CEO of ComCon, a consultancy firm in Dublin, Ireland, and partner at System i Developer. He hosts the RPG & DB2 Summit twice per year with partners Susan Gantner and Jon Paris.