An Overview of User-Defined Types in DB2 for i

September 2, 2009 Michael Sansoterra

UDTs or User-Defined Types (also known as distinct types) allow SQL developers to implement their own data types. For example, if a database contains many tables that each have one or more columns representing a general ledger account number defined as VARCHAR(30), a suitable UDT can be defined as follows:

Create Type My_Schema/GL_ACCOUNT_NUMBER As VarChar(30) CCSID 37
With Comparisons

In DB2 for i, implementing UDTs is currently limited to deriving a new data type from an existing DB2 built-in data type. This tip will briefly explore UDTs, how they can be used in modern databases, potential pitfalls, and how UDTs in DB2 for i (a.k.a., DB2 UDB for iSeries) may be enhanced in the future.

How Is a UDT Used Within SQL Statements?

A UDT can be referenced in almost any SQL statement that references a built-in data type. These statements include: CREATE TABLE, DECLARE, CAST, etc. For instance, the above UDT named GL_ACCOUNT_NUMBER can be used in a CREATE TABLE statement as follows:

CREATE TABLE My_Schema/My_Inventory_Test
(Inventory_Id As Int,
 Inventory_Account_Number As GL_ACCOUNT_NUMBER)

UDTs and Strong Typing

Even though UDTs are derived from a built-in type, a UDT is considered distinct from all other data types. This is sometimes called a “strong type.” In the above example, GL_ACCOUNT_NUMBER is derived from the VARCHAR data type. However, GL_ACCOUNT_NUMBER is considered by DB2 to be unique, that is, different from and incompatible with all other data types, including character types.

This means that DB2 won’t allow a column defined as a GL_ACCOUNT_NUMBER to be directly compared to, or concatenated to, character data without an explicit CAST. Assuming AR_ACCOUNT_NUMBER is a column based on the GL_ACCOUNT_NUMBER UDT, here’s an example of a column expression that concatenates the UDT column to a constant:

'A/R Account:' || CAST(AR_ACCOUNT_NUMBER As VarChar(30) CCSID 37)

If you attempt to do this operation without the CAST, DB2 will throw an error indicating that the data types are incompatible. In this case DB2 for i will return message ID: SQL0171.

The good news is that the CREATE TYPE statement automatically generates companion user-defined functions (UDFs) that can be used to assist with these tasks. In fact, it creates the following UDFs:

One function to convert from the distinct type to the source type
One function to convert from the source type to the distinct type
One function to convert from INTEGER to the distinct type if the source type is SMALLINT
One function to convert from DOUBLE to the distinct type if the source type is REAL
One function to convert from VARCHAR to the distinct type if the source type is CHAR
One function to convert from VARGRAPHIC to the distinct type if the source type is GRAPHIC

These new UDFs are called “comparison functions” and allow an easy way for the base data type to be CAST to the new distinct type, or vice versa, so the base type and derived distinct type can participate in comparisons and other basic operations. In our example, we can cast the constant to a type of GL_ACCOUNT_NUMBER or cast the account number back to a character type as follows using the newly created UDFs:

GL_ACCOUNT_NUMBER('A/R Account:') || AR_ACCOUNT_NUMBER

'A/R Account:' || VARCHAR(AR_ACCOUNT_NUMBER)

In this case, GL_ACCOUNT_NUMBER function is a UDF that was created at the time the distinct type was created. Also, a new VARCHAR function was created to accept a GL_ACCOUNT_NUMBER type and cast it to VARCHAR. The IBM SQL Reference manual did specify that for the LIKE predicate an explicit cast is still required, so don’t use the UDFs for this particular case.

The strong typing concept holds true for numeric columns also. For example, if column ORDER_NUMBER is defined as a type INT, a developer can use any numeric operator (+, -, *, /) or any numeric capable built-in function (i.e., SQRT) on it, although it may not always make sense to do so. (Who wants to divide an order number by 4?!) However, if ORDER_NUMBER is defined as a UDT of GENERIC_ORDER_NUMBER, which is based on the INT type, none of these operators will work unless the data is first CAST back to a numeric type using an explicit CAST or one of the generated comparison UDFs.

Creating Sourced Functions

If you don’t like the idea of casting a UDT to its base type every time before doing a concatenation or numeric operation, there is an option to create a “sourced” function. A sourced function allows you to create a new function based on an existing function. The concatenation (||) and plus (+) operators, for example, can be viewed as SQL functions. These functions can be overloaded to accept new data types.

In this next example, the concatenation function is enhanced to operate on two GL_ACCOUNT_NUMBER data types or a GL_ACCOUNT_NUMBER and a VARCHAR data type. When the concatenation function is enhanced in this manner, no explicit casts are required. [Note: The SQL naming convention used here.]

/* Concat will now be able to join two GL_ACCOUNT_NUMBER types */
Create Function My_Schema.Concat
(My_Schema.GL_ACCOUNT_NUMBER, My_Schema.GL_ACCOUNT_NUMBER)
Specific My_Schema.CONCAT_GL_ACCOUNT_NUMBER_GL_ACCOUNT_NUMBER
Returns My_Schema.GL_ACCOUNT_NUMBER
Source CONCAT (VarChar(30) CCSID 37,VarChar(30) CCSID 37)

/* Concat will now be able to join a GL_ACCOUNT_NUMBER and a VARCHAR */
Create Function My_Schema.Concat
(My_Schema.GL_ACCOUNT_NUMBER, VARCHAR(30))
Specific My_Schema.CONCAT_GL_ACCOUNT_NUMBER_VARCHAR
Returns My_Schema.GL_ACCOUNT_NUMBER
Source CONCAT (VarChar(30) CCSID 37,VarChar(30) CCSID 37)

/* Concat will now be able to join a VARCHAR and a GL_ACCOUNT_NUMBER */
Create Function My_Schema.Concat
(VARCHAR(30) CCSID 37, My_Schema.GL_ACCOUNT_NUMBER)
Specific My_Schema.CONCAT_VARCHAR_GL_ACCOUNT_NUMBER
Returns My_Schema.GL_ACCOUNT_NUMBER
Source CONCAT (VarChar(30) CCSID 37,VarChar(30) CCSID 37)

As evident in the examples, the CREATE FUNCTION statement specifies the input data types, the return data type, and the “base” types that will be “substituted” for the instances of the UDT. Sourcing the CONCAT built-in function in this manner also applies to the concatenation operator.

Creating these functions can be a little pesky because you need to cover all possibilities! The three examples above have three different parameter input combinations so that it won’t matter what order the parameters are passed to the function (GL and GL, GL and VarChar, or VarChar and GL). Also, notice the return type of these functions is a GL_ACCOUNT_NUMBER although they could’ve been defined to return the VARCHAR type. It just depends on how you want your application to view the resulting data type after the concatenation is done.

When creating UDTs and numeric fields, if you want to be able to perform standard numeric operations and numeric BIFs on these UDTs, you’ll first need to create sourced functions for each required mathematical operator and BIF. Here’s an example showing the overload of the + operator for a UDT called USDOLLAR.

Create Function My_Schema /"+"
(USDOLLAR, USDOLLAR)
Returns USDOLLAR
Source "+" (DEC(13,4), DEC(13,4))

Create Function My_Schema/"+"
(USDOLLAR,DEC(13,4))
 Returns USDOLLAR       
Source "+" (DEC(13,4), DEC(13,4))

Of course, creating these sourced functions for all numeric operators can be tedious! Hopefully you won’t have much of a need to create sourced functions for all the possible mathematical operators. Finally, avoid creating type names that may clash with reserved SQL keywords.

How Are Unqualified UDTs Located in Dynamic SQL Statements?

You may have noticed that you can create multiple UDTs with the same name but with different attributes in different libraries (a.k.a., schemas). How does SQL know which UDT to use when there are multiple options? The answer is easy–it searches the libraries currently identified in the SQL Current Path special register. You can use the SET CURRENT PATH statement to override the order and names of the libraries for the current SQL session.

It’s a good idea to place the UDT definitions in a commonly used data library to make sure they’re always accessible to SQL statements that reference them.

One last quick note before moving on. Any created sourced functions show up in System i Navigator under the functions node of the database (as expected). However, the comparison functions that are automatically generated by DB2 when creating a UDT do not show up within System i Navigator (although they can still be viewed in the QSYS2/SYSFUNCS catalog).

Using UDTs In Place of a Data Dictionary

As more IBM i developers increasingly rely on SQL instead of Data Definition Specifications (DDS), a few issues arise. One of these issues concerns how to implement a data dictionary to enforce consistency of data types in an application. While this is “old hat” to us old-time DDS/RPG guys, this is one area where SQL is a little weak. There are two ways to use SQL to implement a data dictionary:

1. Using CREATE TABLE AS.

Under this method, an empty dictionary table with columns representing common data types is defined. A small dictionary table may look something like this:

Create Table Application_Dictionary (
GL_Account_Number  VarChar(30) Not Null CCSID 37,
Dept_Number        VarChar(12) Not Null CCSID 37,
Employee_Number    Dec(9,0)    Not Null,
Generic_Name       VarGraphic(40) Not Null CCSID 1200, /* V6R1 - 
use NVARCHAR */
Order_Number       Int         Not Null)

The CREATE TABLE AS statement allows a developer to implement columns with consistent data types based on the column definitions within the data dictionary table. In the following example, columns “Employee_Number”, “Employee_Name”, and “Dept_Number” are derived from the dictionary table columns:

Create Table Employee_Master As
(Select Employee_Number,
        Dept_Number,
        Dec(0,9,0)   As Social_Security_No,
        Generic_Name As Employee_Name
   From Application_Dictionary) WITH NO DATA

2. Using Distinct Types.

An alternative method of creating tables using SQL is to define a UDT for every common data type. Notice that the WITH COMPARISON clause is optional, although IBM recommends it be included for compatibility with other versions of DB2):

Create Type GL_Account_Number As VarChar(30) CCSID 37
Create Type Dept_Number       As VarChar(30) CCSID 37
Create Type Employee_Number   As Dec(9,0)
Create Type Generic_Name      As VarGraphic(40) CCSID 1200
Create Type Order_Number      As Int

With UDTs defined, a CREATE TABLE statement can reference these newly defined types the same as a built-in data type:

Create Table Employee_Master (
(Employee           Employee_Number Not Null,
 Department         Dept_Number     Not Null,
 Social_Security_No Dec(9,0)        Not Null,
 Employee_Name      Generic_Name    Not Null)

Now which approach is better? It depends. I like the UDT option better. The following table list includes some pros and cons of using the data dictionary table method (CREATE TABLE AS) compared to UDTs:

With either the data dictionary or UDT method, data type changes are difficult to propagate (say when increasing a column from 30 to 50 characters). Each table using the modified data type will need to be altered. In the case of UDTs, changing the attributes of a UDT can be difficult because all dependent objects must be dropped first before altering the UDT (at present there is no ALTER TYPE statement in DB2 for i). For tables, they’ll need to be altered to change the UDT back to its base data type, the UDT will need to be dropped and re-created, and the tables will need to be altered again! (For the record, I tried this on our internal system and it ended with an SQL system error so the bugs may not all be worked out yet!)
With the data dictionary method using CREATE TABLE AS, you can’t include constraints such as PRIMARY KEYs or simple CHECK CONSTRAINTs with the table definition.
The data dictionary method isn’t usable for parameter and variable definitions within SQL routines, such as procedures, functions, and triggers. In other words, you can’t base a DECLARE statement variable on a column in the data dictionary table. With UDTs, you can do something like this in a procedure:

DECLARE ACCOUNT_NUMBER AS GL_ACCOUNT_NUMBER;

Judging by the other current versions of DB2, this may be doable on i sometime in the future without using UDTs!

Column level constraints cannot be propagated with either method/
For UDTs, the NULLABLE attribute must be considered for each CREATE TABLE that uses the type/

When Should I Define a UDT?

Good candidate columns to use for UDTs are those data entities that are defined frequently and might have a special attribute requirement such as DEC(13,0), which would be harder to remember than a simple type such as an INT. For example, a UDT called GENERIC_SEQUENCE_NUMBER (derived from DEC(13,0)) can be used to represent the following column names that are used often throughout a database: CUSTOMER_ORDER_NUMBER, QUOTE_NUMBER, CREDIT_MEMO_NUMBER, INVOICE_NUMBER, etc. All of these number columns can be defined consistently and will likely be used in many table definitions.

Further, any data entities that are used as UDTs should be free of embedded meaning within the data itself. For example, if your department number is defined as VARCHAR(12) and the first two characters of the department have special meaning, then a UDT may not be good because the data will have to be cast to its original data type before the SQL built-in functions, such as substring, can be applied to it. Even if this isn’t a performance problem it can make SQL statements look ugly when they contain lots of CASTs or UDFs.

In contrast, a data entity that will not be defined in many tables should not be used as a UDT. For example, SOCIAL_SECURITY_NO is a column that is generally used only once in an application in a table such as the employee master. So why bother defining it as a UDT?

Some Pros and Cons of Using UDTs

UDTs are potentially beneficial in the following areas:

1. UDTs offer many benefits in standardizing commonly used data entities. For example, I had an application that had a boatload of UDFs that referenced a “department” entity that was defined as NUMERIC(4,0). The business rules for defining department numbers changed and required all department numbers in the application to expand to NUMERIC(6,0). This would’ve been a piece of cake with UDTs, but was quite a chore making these changes manually.

2. UDTs can increase portability between different database engines. Microsoft’s SQL Server, for instance, contains an XML data type. This data type can be emulated in DB2 for i as follows:

CREATE TYPE XML AS CLOB(4M) WITH COMPARISON

This may allow easier sharing of code or migrating from SQL Server to DB2 by allowing SQL statements within DB2 to refer to the XML data type. However, the SQL Server XML data type has many features that will not be duplicated by simply creating a UDT called XML. Another potential caution is that it may be difficult to pick a one-size-fits-all UDT. What if most SQL Server XML instances store 2K or less but a few instances store 4MB? The XML UDT would have to be defined to accommodate the largest potential instance.

3. UDTs can prevent errors that occur by enforcing strong data type comparisons. For example, a character column defined with type DEPARTMENT_NUMBER can’t be concatenated to another character field without an explicit CAST. Likewise, a developer can’t multiply a numeric ORDER_NUMBER data type by 2 or compare it to the number 1000 without first performing an explicit CAST. This forces the developer to think carefully about the data types that are being used and when it is legitimate to perform numeric or string operations on these columns.

UDTs have the following potential pitfalls:

Using UDTs can make porting SQL code to other database systems a little more complicated. One possible solution to this is to create an equivalent UDT on the destination database if it’s supported.
For another problem, data transfer tools such as SQL Server’s linked server facility (using IBM’s ODBC driver) and Microsoft Access import/export facility don’t always interpret the distinct type correctly, which can cause import and export problems with DB2. In these cases the UDT data may need to be CAST back to its base data type before being used by these tools.
Any table that includes a UDT is no longer available to be used in a high-level language program using native I/O. For example, an RPG program that attempts to open a file containing a UDT will generate a CPF428A error. All operations against such a table within a high level language program must use embedded SQL.

Future Potential for UDTs

UDTs may have room for future enhancement for DB2 for i. One enhancement I’d like to see is the ability to define a complex object type as a UDT. Currently UDTs are only derived from base SQL data types. But think of the power if there was a Java object lurking behind a UDT definition instead of just a primitive data type!

Take a simple example. Say we want to create a complex data type called GEOLOCATION that contains two related pieces of information: latitude and longitude. Using the complex data type in a stored procedure might look like this:

DECLARE CellPhoneLocation AS GeoLocation;
Set CellPhoneLocation.Latitude=42.58;
Set CellPhoneLocation.Longitude= -86.027;

In a query the reference may look like this:

Select CellPhoneNumber,
       CellPhoneLocation.Latitude,  /* Property of Complex Type */
       CellPhoneLocation.Longitude, /* Property of Complex Type */
       CellTimeStamp
  From CellPhoneTracking;

Implementing a complex type is fabulous because the object’s code can perform validation, control whether or not a NULL is allowed, and allow user-defined properties to perform complex calculations (alleviating the need to implement one or more related UDFs).

In fact, SQL Server 2005 and later already offers this capability as developers can create their own object data types using a .NET language such as VB or C#. For readers interested in how this amazing concept works out in the various SQL statements (SELECT, INSERT, CREATE INDEX, etc.) see the Microsoft tutorial at here.

In other versions of DB2, some of these things can already be done with the built-in XML support.

Summary

UDTs are an intriguing facet of DB2 for i. When getting started with them, consider carefully when to appropriately use them, the implications of using them (for instance, whether the use of a UDT give a related ODBC application a problem), and, of course, always monitor the performance of queries that use them.

Michael Sansoterra is a programmer/analyst for i3 Business Solutions, an IT services firm based in Grand Rapids, Michigan. Send your questions or comments to Mike via the IT Jungle Contact page.

                     Post this story to del.icio.us
               Post this story to Digg
    Post this story to Slashdot