V5R3 SQL Enhancements
June 30, 2004 Michael Sansoterra
Like a kid at Christmas, with every release of OS/400 (now i5/OS), I’m always eager to find out what enhancements IBM has made to SQL. Once again, IBM has been very good to SQL programmers with this latest release. Here is a review of many of the latest enhancements to SQL.
DISCLAIMER
Since I don’t yet have access to a machine with V5R3, this information was gleaned from the new V5R3 manuals and may be subject to revision if I interpreted something incorrectly! Where possible, I tried to validate the syntax of the material by testing some of these features that already exist in DB2 UDB for Windows Version 8.1.2.
NEW DATA TYPES
SQL has two new data types: Binary and VarBinary. These new data types are essentially equivalent to Char and VarChar columns defined with “For Bit Data” or CCSID 65535 attributes. So what are they good for? The new Binary and VarBinary data types are useful for two purposes: compatibility with other database servers and avoiding problems associated with the translation of CCSID 65535.
Character data is marked with a coded character set ID. The CCSID is used in translating data between EBCDIC and ASCII, for instance. CCSID 65535 is a special identifier that indicates the character data isn’t stored with a particular character set; this could even include binary data.
Many multinational applications define their tables to store data with CCSID 65535 so that the information is not tied to a specific character set. The problem is that, without knowing what character set is being used, ODBC, for instance, does not know how to translate the EBCDIC data to ASCII on a Windows PC. The ODBC “force translation of CCSID 65535” option allows the translation of CCSID 65535 by assuming that the iSeries data is stored in the same CCSID that the current job is using.
Until the advent of the Binary and VarBinary data types, Char and VarChar had to perform a double duty: by storing multinational character data and storing binary data. However, this caused a problem because forcing the translation of character data to a specific CCSID would also cause binary data be translated. Now binary data can be stored without worrying about translation issues.
Consider this sample table:
Create Table LabResults ( TestID Integer Not Null, PartID Char(15) CCSID 65535 Not Null, TestData Varchar(50) For Bit Data Not Null, TestDate Date Not Null Default Current_Date)
In the table, the PartID contains character data and TestData contains binary data. If an ODBC connection were specified with the automatic translation option, both columns would be translated. However, the TestData column containing binary data should not be translated. This problem is resolved by defining TestData with the VarBinary data type:
Create Table LabResults ( TestID Integer Not Null, PartID Char(15) CCSID 65535 Not Null, TestData VarBinary(50) Not Null, TestDate Date Not Null Default Current_Date)
NEW BUILT-IN SCALAR FUNCTIONS
This release comes with a plethora of new built-in functions. Here’s a brief overview of most of them (not including the new partition functions):
BINARY converts a string expression to a fixed-length binary representation. An optional second parameter allows the resulting length to be specified. As discussed above, this function is useful for preventing an existing character column storing binary data from being translated:
Select PartID, Binary(TestData) As TestData /* Part ID and TestData are CHAR fields defined with CCSID 65535. In a Client/Server environment with translation on, TestData will not be translated. */ From LabResults
DAYNAME accepts a date or timestamp (or character representation of either) and returns a mixed-case day name, based on the language used for the job’s messages. The result is VarChar(100).
Select DayName(Current_Date) /* If Current_Date is July 4, 2004, 'Sunday' is returned for U.S. English users.*/ From SysIbm.SysDummy1
MONTHNAME accepts a date or timestamp (or character representation of either) and returns a mixed-case month name, based on the language used for the job’s messages. The result is VarChar(100).
Select MonthName(Current_Date) /* If Current_Date is July 4, 2004, 'July' is returned*/ From SysIbm.SysDummy1
MULTIPLY_ALT provides an alternative to the multiplication operator (*). It is useful when multiplying numbers that can have a resulting precision greater than the maximum allowed (which is now 63!). While the multiplication operator allocates the precision so as to prevent decimal truncation, MULTIPLY_ALT allocates the maximum precision to the left of the decimal so as to prevent an overflow error.
Select Multiply_Alt(200000000000.000123, 900000000000.001456) /* These two numbers can't be multiplied using "*" because the resulting number of significant places would be too small DEC(31,12). Multiply_Alt returns DEC(31,7) This example was done in DB2 for Windows. The iSeries should now allow up 63 places. */ From SysIbm.SysDummy1
INSERT is similar to RPG IV’s %Replace BIF and is designed to insert a string into a source string at a fixed location. Optionally, a specific number of characters can be removed from the source string before the insertion is made. It accepts four parameters: SOURCE STRING , START IN SOURCE , LENGTH TO REMOVE, and INSERT-STRING and returns the modified source string.
Select Insert('<b>&1</b>',4,2,'My Address') /* The resulting string is <b>My Address</b> */ From SysIbm.SysDummy1
REPEAT returns a given string expression a specified number of times. This function is useful for padding a string with something other than spaces.
Select Left(RTrim(ItemDesc)||Repeat(X'00',30),30) /* Pad Description with binary zeros */ From ItemMaster
REPLACE accepts a source string, a search string, and a replace string and returns the modified source string if the requested search string was found:
Select Int(Replace(SocSecNo,'-','')) /* Remove dashes from SSN so that 321-72-8053 becomes 321728053 */ From SysIbm.SysDummy1
RIGHT is a counterpart to the LEFT function that will accept a string and a number of characters to return, starting from the end of the string.
Select Right(PhoneNo, Length(PhoneNo)-Posstr(PhoneNo,' ')) /* When PhoneNo contains 613 249-2110 this expression returns the right most eight characters: 249-2110 */ From Customers
VARBINARY is the same as BINARY, with the exception that the field is variable-length. An optional second parameter is used to define a maximum length.
Of these new functions, my two favorites are REPLACE and RIGHT. How these useful functions were omitted until V5R3 is beyond me. Many of you wrote user-defined functions to handle these tasks, but feel free to throw them away now, as UDFs suffer from having to define an arbitrary maximum string length.
DUPLICATE FUNCTIONS
These next few functions have limited usefulness, in that their function is already available in the form of other functions or their values are easily derived.
BIT_LENGTH calculates the number of bits required to store a value. It appears to simply take the length of a string (in bytes) and multiply it by 8. This function can accept numeric parameters, but numerics will be cast to VarChar before reporting the length.
Select Bit_Length('125') /*3 bytes -> Returns 24 */ From SysIbm.SysDummy1
DATABASE returns the same value as the special register Current_Server.
Select Database() /* Returns Server Name */ From SysIbm.SysDummy1
EXTRACT retrieves a specific portion (such as year or hour) of a date, time, or timestamp value. The only value I can find in this new function is if the portion to extract can be defined as a variable; otherwise, I’d just use the existing scalar functions, which have a shorter syntax.
Select Extract(Year From BirthDate) As BirthYear /* Equivalent to the Year function */ From Employees
OCTET_LENGTH returns the number of octets (bytes) for a given string. This function appears identical to the LENGTH and CHARACTER_LENGTH functions.
Select Octet_Length('125') /*3 bytes -> Returns 3 */ From SysIbm.SysDummy1
TIMESTAMP_ISO receives a date, time, or timestamp function and returns a timestamp. I don’t know what the difference is between this function and the TimeStamp function, other than TIMESTAMP_ISO doesn’t accept two arguments.
ENCRYPTION FUNCTIONS
Every now and then a programmer gets the job of storing sensitive data. This data is to be protected from peering eyes by disallowing any one (even with *ALLOBJ authority) from making sense of the data. This job of storing data in an encrypted format is now easy, as V5R3 introduces functions to encrypt and decrypt string data. The Cryptographic Access Provider 128-bit for AS/400 product (5722AC3) must be installed in order for these functions to work.
Here’s how the functions operate:
Encrypt_RC2 accepts string data to encrypt, an optional password, and an optional password hint. The function returns an encrypted string result using the RC2 encryption algorithm. The original password string must be preserved (by user’s memory, database file, etc.) in order to decrypt the data. The optional hint string can be used to assist a user in remembering the password.
Insert Into TreasureMaps (TreasureID, TreasureValue, Directions) Values(1,50000.00, /* Insert directions to treasure in encrypted format */ Encrypt_RC2('30 paces to the palm tree -- Turn Left','Caribbean','Pirates'))
Note that the data, the password, and the hint are stored in the encrypted result. The password can be between 6 and 127 bytes, and the hint can be up to 32 bytes. The IBM SQL reference manual (in PDF format) contains guidelines on how to calculate the number of bytes required to store all three pieces of information.
Decrypt–Once data has been encrypted, one of several decryption functions is available: Decrypt_Bit, Decrypt_Binary, Decrypt_Char, and Decrypt_DB.
Each decryption function is available for decrypting data into the original format (binary, character, etc.). These functions accept an RC2-encrypted string and an optional password string. The Decrypt_Char and Decrypt_DB functions accept an optional third parameter for specifying a CCSID for the resulting string.
Select Decrypt_Char(Directions,'Caribbean') /* Result will be: '30 paces to the palm tree -- Turn Left' */ From TreasureMaps Where TreasureID=1
If an invalid password is supplied, the statement fails with an error.
If a common password was used to encrypt data, the password can be set for a session as follows:
Set Encryption Password = 'Caribbean'
When the optional password is absent, the decrypt functions heed the password set by the SET ENCRYPTION PASSWORD statement:
Select Decrypt_Char(Directions) /* The password isn't needed here */ From TreasureMaps Where TreasureID=1
The SET ENCRYPTION PASSWORD also applies to encrypting data if the optional password is not specified when using the Encrypt_RC2 function.
GetHint is used to retrieve the password hint from a string encrypted with a hint. The only parameter is an encrypted string.
Select GetHint(Directions) /* Based on the above example, GetHint will return 'Pirates' */ From TreasureMaps Where TreasureID=1
The GetHint function is particularly valuable for applications that allow users to store their own passwords in an encrypted format.
The encryption functions are useful for storing sensitive data such as passwords or credit card numbers. Don’t forget, for client/server applications, these encryption routines will not protect your data by passing encrypted strings across the network. SSL and the like should be used for network protection. Also, don’t forget that there is a performance penalty for encrypting and decrypting data, so don’t overuse it!
SEQUENCES
A sequence is an SQL object that offers an alternative to an identity column as a way of automatically assigning a number to a column. A sequence is created using the CREATE SEQUENCE statement:
Create Sequence Domestic_Customers Start With 1 Increment By 1 MaxValue 700000 Cycle Create Sequence Foreign_Customers Start With 700001 Increment By 1 MaxValue 999999 Cycle
In these examples, a sequence is created with a name, an initial value, a maximum value, and an increment value. The cycle keyword indicates that the sequence will start over when the specified maximum is reached. Every time the sequence is used, the current value of the sequence is returned and incremented in preparation for the next use. Since a sequence isn’t tied to a particular table, it can be used with multiple tables. The ALTER SEQUENCE statement can be used to change the characteristics of a sequence.
To use a sequence, specify the NEXT VALUE FOR keywords:
Insert Into Customer(CustomerID,CustomerName,CustomerType) Values (Next Value For Domestic_Customers, 'New Domestic Customer','DOM')
Additionally, PREVIOUS VALUE FOR can be used to retrieve the previous value. However, using PREVIOUS VALUE FOR does not decrement the sequence; it simply provides a way of obtaining the last value used.
Sequences can be used in SELECT and SET statements as well. This means that sequences can be used for things like counters, which used to require a user-defined function. In DB2 for Windows, I was trying to use a sequence to pull out every other row in a query but found I could not use NEXT VALUE FOR in the WHERE clause. We’ll have to see if the iSeries has the same limitation.
THE BEST IS YET TO COME
In my next article, I’ll cover some of the join enhancements and discuss improvements to the SQL pre-compiler. There are plenty of new features in this release!
Michael Sansoterra is a programmer/analyst for i3 Business Solutions, an IT services firm based in Grand Rapids, Michigan. E-mail: msansoterra@itjungle.com
REFERENCES
(The following references are in PDF format.)
- DB2 Universal Database for iSeries SQL Programming
- DB2 Universal Database for iSeries SQL Reference
- Embedded SQL Programming
Editor’s Note: This article has been corrected since it was originally published. Guild Companies regrets the errors. [Changes made 6/30/04.]
- In the third paragraph under the subhead “NEW DATA TYPES,” the phrase “the ‘force translation of CCSID 65535’ option” now reads: “the ODBC ‘force translation of CCSID 65535’ option.”
- In the “Select Insert” code snippet under the subhead “NEW BUILT-IN SCALAR FUNCTIONS,” slash marks (/) have been added to the closing bold tags:
Select Insert('<b>&1</b>',4,2,'My Address') /* The resulting string is <b>My Address</b> */ From SysIbm.SysDummy1
/* The password isn’t needed in the function */
to read:
/* The password isn’t needed here */.