An Overview of User-Defined Types in DB2 for i
September 2, 2009 Michael Sansoterra
UDTs or User-Defined Types (also known as distinct types) allow SQL developers to implement their own data types. For example, if a database contains many tables that each have one or more columns representing a general ledger account number defined as VARCHAR(30), a suitable UDT can be defined as follows: Create Type My_Schema/GL_ACCOUNT_NUMBER As VarChar(30) CCSID 37 With Comparisons In DB2 for i, implementing UDTs is currently limited to deriving a new data type from an existing DB2 built-in data type. This tip will briefly explore UDTs, how they can be used in modern databases, potential pitfalls, and how UDTs in DB2 for i (a.k.a., DB2 UDB for iSeries) may be enhanced in the future. How Is a UDT Used Within SQL Statements? A UDT can be referenced in almost any SQL statement that references a built-in data type. These statements include: CREATE TABLE, DECLARE, CAST, etc. For instance, the above UDT named GL_ACCOUNT_NUMBER can be used in a CREATE TABLE statement as follows: CREATE TABLE My_Schema/My_Inventory_Test (Inventory_Id As Int, Inventory_Account_Number As GL_ACCOUNT_NUMBER) UDTs and Strong Typing Even though UDTs are derived from a built-in type, a UDT is considered distinct from all other data types. This is sometimes called a “strong type.” In the above example, GL_ACCOUNT_NUMBER is derived from the VARCHAR data type. However, GL_ACCOUNT_NUMBER is considered by DB2 to be unique, that is, different from and incompatible with all other data types, including character types. This means that DB2 won’t allow a column defined as a GL_ACCOUNT_NUMBER to be directly compared to, or concatenated to, character data without an explicit CAST. Assuming AR_ACCOUNT_NUMBER is a column based on the GL_ACCOUNT_NUMBER UDT, here’s an example of a column expression that concatenates the UDT column to a constant: 'A/R Account:' || CAST(AR_ACCOUNT_NUMBER As VarChar(30) CCSID 37) If you attempt to do this operation without the CAST, DB2 will throw an error indicating that the data types are incompatible. In this case DB2 for i will return message ID: SQL0171. The good news is that the CREATE TYPE statement automatically generates companion user-defined functions (UDFs) that can be used to assist with these tasks. In fact, it creates the following UDFs:
These new UDFs are called “comparison functions” and allow an easy way for the base data type to be CAST to the new distinct type, or vice versa, so the base type and derived distinct type can participate in comparisons and other basic operations. In our example, we can cast the constant to a type of GL_ACCOUNT_NUMBER or cast the account number back to a character type as follows using the newly created UDFs: GL_ACCOUNT_NUMBER('A/R Account:') || AR_ACCOUNT_NUMBER 'A/R Account:' || VARCHAR(AR_ACCOUNT_NUMBER) In this case, GL_ACCOUNT_NUMBER function is a UDF that was created at the time the distinct type was created. Also, a new VARCHAR function was created to accept a GL_ACCOUNT_NUMBER type and cast it to VARCHAR. The IBM SQL Reference manual did specify that for the LIKE predicate an explicit cast is still required, so don’t use the UDFs for this particular case. The strong typing concept holds true for numeric columns also. For example, if column ORDER_NUMBER is defined as a type INT, a developer can use any numeric operator (+, -, *, /) or any numeric capable built-in function (i.e., SQRT) on it, although it may not always make sense to do so. (Who wants to divide an order number by 4?!) However, if ORDER_NUMBER is defined as a UDT of GENERIC_ORDER_NUMBER, which is based on the INT type, none of these operators will work unless the data is first CAST back to a numeric type using an explicit CAST or one of the generated comparison UDFs. Creating Sourced Functions If you don’t like the idea of casting a UDT to its base type every time before doing a concatenation or numeric operation, there is an option to create a “sourced” function. A sourced function allows you to create a new function based on an existing function. The concatenation (||) and plus (+) operators, for example, can be viewed as SQL functions. These functions can be overloaded to accept new data types. In this next example, the concatenation function is enhanced to operate on two GL_ACCOUNT_NUMBER data types or a GL_ACCOUNT_NUMBER and a VARCHAR data type. When the concatenation function is enhanced in this manner, no explicit casts are required. [Note: The SQL naming convention used here.] /* Concat will now be able to join two GL_ACCOUNT_NUMBER types */ Create Function My_Schema.Concat (My_Schema.GL_ACCOUNT_NUMBER, My_Schema.GL_ACCOUNT_NUMBER) Specific My_Schema.CONCAT_GL_ACCOUNT_NUMBER_GL_ACCOUNT_NUMBER Returns My_Schema.GL_ACCOUNT_NUMBER Source CONCAT (VarChar(30) CCSID 37,VarChar(30) CCSID 37) /* Concat will now be able to join a GL_ACCOUNT_NUMBER and a VARCHAR */ Create Function My_Schema.Concat (My_Schema.GL_ACCOUNT_NUMBER, VARCHAR(30)) Specific My_Schema.CONCAT_GL_ACCOUNT_NUMBER_VARCHAR Returns My_Schema.GL_ACCOUNT_NUMBER Source CONCAT (VarChar(30) CCSID 37,VarChar(30) CCSID 37) /* Concat will now be able to join a VARCHAR and a GL_ACCOUNT_NUMBER */ Create Function My_Schema.Concat (VARCHAR(30) CCSID 37, My_Schema.GL_ACCOUNT_NUMBER) Specific My_Schema.CONCAT_VARCHAR_GL_ACCOUNT_NUMBER Returns My_Schema.GL_ACCOUNT_NUMBER Source CONCAT (VarChar(30) CCSID 37,VarChar(30) CCSID 37) As evident in the examples, the CREATE FUNCTION statement specifies the input data types, the return data type, and the “base” types that will be “substituted” for the instances of the UDT. Sourcing the CONCAT built-in function in this manner also applies to the concatenation operator. Creating these functions can be a little pesky because you need to cover all possibilities! The three examples above have three different parameter input combinations so that it won’t matter what order the parameters are passed to the function (GL and GL, GL and VarChar, or VarChar and GL). Also, notice the return type of these functions is a GL_ACCOUNT_NUMBER although they could’ve been defined to return the VARCHAR type. It just depends on how you want your application to view the resulting data type after the concatenation is done. When creating UDTs and numeric fields, if you want to be able to perform standard numeric operations and numeric BIFs on these UDTs, you’ll first need to create sourced functions for each required mathematical operator and BIF. Here’s an example showing the overload of the + operator for a UDT called USDOLLAR. Create Function My_Schema /"+" (USDOLLAR, USDOLLAR) Returns USDOLLAR Source "+" (DEC(13,4), DEC(13,4)) Create Function My_Schema/"+" (USDOLLAR,DEC(13,4)) Returns USDOLLAR Source "+" (DEC(13,4), DEC(13,4)) Of course, creating these sourced functions for all numeric operators can be tedious! Hopefully you won’t have much of a need to create sourced functions for all the possible mathematical operators. Finally, avoid creating type names that may clash with reserved SQL keywords. How Are Unqualified UDTs Located in Dynamic SQL Statements? You may have noticed that you can create multiple UDTs with the same name but with different attributes in different libraries (a.k.a., schemas). How does SQL know which UDT to use when there are multiple options? The answer is easy–it searches the libraries currently identified in the SQL Current Path special register. You can use the SET CURRENT PATH statement to override the order and names of the libraries for the current SQL session. It’s a good idea to place the UDT definitions in a commonly used data library to make sure they’re always accessible to SQL statements that reference them. One last quick note before moving on. Any created sourced functions show up in System i Navigator under the functions node of the database (as expected). However, the comparison functions that are automatically generated by DB2 when creating a UDT do not show up within System i Navigator (although they can still be viewed in the QSYS2/SYSFUNCS catalog). Using UDTs In Place of a Data Dictionary As more IBM i developers increasingly rely on SQL instead of Data Definition Specifications (DDS), a few issues arise. One of these issues concerns how to implement a data dictionary to enforce consistency of data types in an application. While this is “old hat” to us old-time DDS/RPG guys, this is one area where SQL is a little weak. There are two ways to use SQL to implement a data dictionary: 1. Using CREATE TABLE AS. Under this method, an empty dictionary table with columns representing common data types is defined. A small dictionary table may look something like this: Create Table Application_Dictionary ( GL_Account_Number VarChar(30) Not Null CCSID 37, Dept_Number VarChar(12) Not Null CCSID 37, Employee_Number Dec(9,0) Not Null, Generic_Name VarGraphic(40) Not Null CCSID 1200, /* V6R1 - use NVARCHAR */ Order_Number Int Not Null) The CREATE TABLE AS statement allows a developer to implement columns with consistent data types based on the column definitions within the data dictionary table. In the following example, columns “Employee_Number”, “Employee_Name”, and “Dept_Number” are derived from the dictionary table columns: Create Table Employee_Master As (Select Employee_Number, Dept_Number, Dec(0,9,0) As Social_Security_No, Generic_Name As Employee_Name From Application_Dictionary) WITH NO DATA 2. Using Distinct Types. An alternative method of creating tables using SQL is to define a UDT for every common data type. Notice that the WITH COMPARISON clause is optional, although IBM recommends it be included for compatibility with other versions of DB2): Create Type GL_Account_Number As VarChar(30) CCSID 37 Create Type Dept_Number As VarChar(30) CCSID 37 Create Type Employee_Number As Dec(9,0) Create Type Generic_Name As VarGraphic(40) CCSID 1200 Create Type Order_Number As Int With UDTs defined, a CREATE TABLE statement can reference these newly defined types the same as a built-in data type: Create Table Employee_Master ( (Employee Employee_Number Not Null, Department Dept_Number Not Null, Social_Security_No Dec(9,0) Not Null, Employee_Name Generic_Name Not Null) Now which approach is better? It depends. I like the UDT option better. The following table list includes some pros and cons of using the data dictionary table method (CREATE TABLE AS) compared to UDTs:
DECLARE ACCOUNT_NUMBER AS GL_ACCOUNT_NUMBER; Judging by the other current versions of DB2, this may be doable on i sometime in the future without using UDTs!
When Should I Define a UDT? Good candidate columns to use for UDTs are those data entities that are defined frequently and might have a special attribute requirement such as DEC(13,0), which would be harder to remember than a simple type such as an INT. For example, a UDT called GENERIC_SEQUENCE_NUMBER (derived from DEC(13,0)) can be used to represent the following column names that are used often throughout a database: CUSTOMER_ORDER_NUMBER, QUOTE_NUMBER, CREDIT_MEMO_NUMBER, INVOICE_NUMBER, etc. All of these number columns can be defined consistently and will likely be used in many table definitions. Further, any data entities that are used as UDTs should be free of embedded meaning within the data itself. For example, if your department number is defined as VARCHAR(12) and the first two characters of the department have special meaning, then a UDT may not be good because the data will have to be cast to its original data type before the SQL built-in functions, such as substring, can be applied to it. Even if this isn’t a performance problem it can make SQL statements look ugly when they contain lots of CASTs or UDFs. In contrast, a data entity that will not be defined in many tables should not be used as a UDT. For example, SOCIAL_SECURITY_NO is a column that is generally used only once in an application in a table such as the employee master. So why bother defining it as a UDT? Some Pros and Cons of Using UDTs UDTs are potentially beneficial in the following areas: 1. UDTs offer many benefits in standardizing commonly used data entities. For example, I had an application that had a boatload of UDFs that referenced a “department” entity that was defined as NUMERIC(4,0). The business rules for defining department numbers changed and required all department numbers in the application to expand to NUMERIC(6,0). This would’ve been a piece of cake with UDTs, but was quite a chore making these changes manually. 2. UDTs can increase portability between different database engines. Microsoft’s SQL Server, for instance, contains an XML data type. This data type can be emulated in DB2 for i as follows: CREATE TYPE XML AS CLOB(4M) WITH COMPARISON This may allow easier sharing of code or migrating from SQL Server to DB2 by allowing SQL statements within DB2 to refer to the XML data type. However, the SQL Server XML data type has many features that will not be duplicated by simply creating a UDT called XML. Another potential caution is that it may be difficult to pick a one-size-fits-all UDT. What if most SQL Server XML instances store 2K or less but a few instances store 4MB? The XML UDT would have to be defined to accommodate the largest potential instance. 3. UDTs can prevent errors that occur by enforcing strong data type comparisons. For example, a character column defined with type DEPARTMENT_NUMBER can’t be concatenated to another character field without an explicit CAST. Likewise, a developer can’t multiply a numeric ORDER_NUMBER data type by 2 or compare it to the number 1000 without first performing an explicit CAST. This forces the developer to think carefully about the data types that are being used and when it is legitimate to perform numeric or string operations on these columns. UDTs have the following potential pitfalls:
Future Potential for UDTs UDTs may have room for future enhancement for DB2 for i. One enhancement I’d like to see is the ability to define a complex object type as a UDT. Currently UDTs are only derived from base SQL data types. But think of the power if there was a Java object lurking behind a UDT definition instead of just a primitive data type! Take a simple example. Say we want to create a complex data type called GEOLOCATION that contains two related pieces of information: latitude and longitude. Using the complex data type in a stored procedure might look like this: DECLARE CellPhoneLocation AS GeoLocation; Set CellPhoneLocation.Latitude=42.58; Set CellPhoneLocation.Longitude= -86.027; In a query the reference may look like this: Select CellPhoneNumber, CellPhoneLocation.Latitude, /* Property of Complex Type */ CellPhoneLocation.Longitude, /* Property of Complex Type */ CellTimeStamp From CellPhoneTracking; Implementing a complex type is fabulous because the object’s code can perform validation, control whether or not a NULL is allowed, and allow user-defined properties to perform complex calculations (alleviating the need to implement one or more related UDFs). In fact, SQL Server 2005 and later already offers this capability as developers can create their own object data types using a .NET language such as VB or C#. For readers interested in how this amazing concept works out in the various SQL statements (SELECT, INSERT, CREATE INDEX, etc.) see the Microsoft tutorial at here. In other versions of DB2, some of these things can already be done with the built-in XML support. Summary UDTs are an intriguing facet of DB2 for i. When getting started with them, consider carefully when to appropriately use them, the implications of using them (for instance, whether the use of a UDT give a related ODBC application a problem), and, of course, always monitor the performance of queries that use them. Michael Sansoterra is a programmer/analyst for i3 Business Solutions, an IT services firm based in Grand Rapids, Michigan. Send your questions or comments to Mike via the IT Jungle Contact page.
|