Data Scrubbing Functions In DB2 For i
November 17, 2015 Hey, Mike
Imagine a character database field that stores a phone number. No formatting rules are involved, so the values in the column vary such as 8370155738, 837/015-5738, 837-015.5738, etc. We’d like to write an RPG program that allows the user to enter a number, formatted or not, onto the screen and if that string of numbers is found in the table’s phone column, show it to the user. Is there a way to use SQL to strip the non-numeric characters from the phone field, and select the record if the result matches the user input? –Brad Hi, Brad. Scrubbing data is a common task for database applications, and fortunately DB2 for i has plenty of built in functions to help you with this task. There are two ways to approach this problem: using the new regular expression functionality in DB2 for i (available in IBM i 7.1 and 7.2 with recent group PTFs); or, for older IBM i installations, using the REPLACE and TRANSLATE functions. If you are on IBM i 7.1 TR9 or 7.2 TR1, the REGEXP_REPLACE function can be used to clean unwanted characters from a phone number as easy as this: WITH CUSTOMER_DATA AS ( SELECT * FROM (VALUES(1,'PH:800/234-1412'), (2,' 8370155738'), (3,'837/015-5738'), (4,'837-015.5738'), (5,'7218675309 x243') ) x(ID,PHONE_NO) ) SELECT LEFT(REGEXP_REPLACE(PHONE_NO,'[^d]',''),10) PHONE FROM CUSTOMER_DATA The pattern [^d] instructs the REGEXP_REPLACE function to identify all non-digit characters within the phone number and then the function replaces them with an empty string so that they effectively disappear. The query result is:
If your DB2 for i level doesn’t offer the REGEX_REPLACE function, other options are available. The simplest is to nest multiple REPLACE functions around the phone number to remove any specific unwanted characters. For example, consider this expression that removes all hyphens, spaces and parenthesis from column PHONE_NO: SELECT REPLACE( REPLACE( REPLACE( REPLACE(PHONE_NO,'-',''), ')',''), '(',''), ' ','') AS PHONE_NO FROM CUSTOMER_DATA The only problem with this approach is if you find other characters embedded in the phone number (such as X, ., /), you need to add additional REPLACE functions to handle them. As an alternative, a single instance of the TRANSLATE function can be used to change all non-numeric characters within a phone number to a space. Thereafter, the REPLACE function can be employed to remove all of the spaces from the phone number. The following example illustrates this:
WITH CUSTOMER_DATA AS (
SELECT *
FROM (VALUES(1,'PH:800/234-1412'),
(2,' 8370155738'),
(3,'837/015-5738'),
(4,'837-015.5738'),
(5,'7218675309 x243')
) x(ID,PHONE_NO)
)
SELECT LEFT(REPLACE(TRANSLATE(PHONE_NO,' ',
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
~`|"''!@#$%^&*()-=+[]{}
;:.,<>?/'),' ',''),10) AS PHONE
FROM CUSTOMER_DATA
Because you will likely need to use this same function twice in the same SQL statement (once to scrub the table data and once to scrub the phone number the user enters on screen), it may be worthwhile to encapsulate it in a scalar user-defined function: CREATE OR REPLACE FUNCTION CLEAN_PHONE_NO (@PHONE_NO VARCHAR(24)) RETURNS CHAR(10) LANGUAGE SQL NO EXTERNAL ACTION DETERMINISTIC NOT FENCED SET OPTION COMMIT=*NONE BEGIN RETURN LEFT(REPLACE(TRANSLATE(@PHONE_NO,' ', 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ~`|"''!@#$%^&*()-=+[]{} ;:.,<>?/'), ' ',''),10); END Once the function is built, it is easy to compare the phone number in the table (PHONE_NO column) with the RPG host variable SEARCH_PHONE_NO once they’re both cleaned up: /Exec SQL SELECT * FROM CUSTOMER_DATA WHERE CLEAN_PHONE_NO(PHONE_NO)=CLEAN_PHONE_NO(:SEARCH_PHONE_NO); Or, to go back to the REGEX_REPLACE example: /Exec SQL SELECT * FROM CUSTOMER_DATA WHERE LEFT(REGEXP_REPLACE(PHONE_NO,'[^d]',''),10)= LEFT(REGEXP_REPLACE(:SEARCH_PHONE_NO,'[^d]',''),10); If you want the RPG variable cleaned up within the program for other purposes (aside from the SQL lookup), you can scrub the user’s input (variable SEARCH_PHONE_NO) with an embedded SET statement in RPG: /Exec SQL SET :SEARCH_PHONE_NO=CLEAN_PHONE_NO(:SEARCH_PHONE_NO); On a related topic, if users are going to do frequent searches on the phone number, it may be a good idea to periodically clean-up the phone number column and put an index on it: UPDATE CUSTOMER_DATA SET PHONE_NO=CLEAN_PHONE_NO(PHONE_NO); CREATE INDEX IDX_CUSTOMER_DATA_PHONE ON CUSTOMER_DATA (PHONE_NO); If PHONE_NO is cleaned up in the table, this SELECT statement can take advantage of the index: SELECT * FROM CUSTOMER_DATA WHERE PHONE_NO=CLEAN_PHONE_NO(:SEARCH_PHONE_NO) Disclaimer: The “scrub” functionality shown here only allows for a 10-character phone number. You may need to tweak the code if you want to allow for international phone numbers, a leading 1 for long distance, etc. Michael Sansoterra is a DBA for Broadway Systems in Grand Rapids, Michigan. Send your questions or comments for Mike via the IT Jungle Contact page. RELATED STORY Native Regular Expressions In DB2 For i 7.1 And 7.2
|
I want to find out to remove spaces between character field ( ABC DE EFF CA). I cannot find one answer? in AS400 SQL .