• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Data Scrubbing Functions In DB2 For i

    November 17, 2015 Hey, Mike

    Imagine a character database field that stores a phone number. No formatting rules are involved, so the values in the column vary such as 8370155738, 837/015-5738, 837-015.5738, etc. We’d like to write an RPG program that allows the user to enter a number, formatted or not, onto the screen and if that string of numbers is found in the table’s phone column, show it to the user. Is there a way to use SQL to strip the non-numeric characters from the phone field, and select the record if the result matches the user input?

    –Brad

    Hi, Brad.

    Scrubbing data is a common task for database applications, and fortunately DB2 for i has plenty of built in functions to help you with this task. There are two ways to approach this problem: using the new regular expression functionality in DB2 for i (available in IBM i 7.1 and 7.2 with recent group PTFs); or, for older IBM i installations, using the REPLACE and TRANSLATE functions.

    If you are on IBM i 7.1 TR9 or 7.2 TR1, the REGEXP_REPLACE function can be used to clean unwanted characters from a phone number as easy as this:

    WITH CUSTOMER_DATA AS (
    SELECT *
      FROM (VALUES(1,'PH:800/234-1412'),
                  (2,' 8370155738'),
                  (3,'837/015-5738'),
                  (4,'837-015.5738'),
                  (5,'7218675309 x243')
           ) x(ID,PHONE_NO)
    )
    SELECT LEFT(REGEXP_REPLACE(PHONE_NO,'[^d]',''),10) PHONE
      FROM CUSTOMER_DATA
    

    The pattern [^d] instructs the REGEXP_REPLACE function to identify all non-digit characters within the phone number and then the function replaces them with an empty string so that they effectively disappear.

    The query result is:

    Phone

    8002341412

    8470155738

    8470155738

    8470155738

    7218675309

    If your DB2 for i level doesn’t offer the REGEX_REPLACE function, other options are available. The simplest is to nest multiple REPLACE functions around the phone number to remove any specific unwanted characters. For example, consider this expression that removes all hyphens, spaces and parenthesis from column PHONE_NO:

    SELECT REPLACE(
             REPLACE(
               REPLACE(
                 REPLACE(PHONE_NO,'-',''),
                ')',''),
              '(',''),
            ' ','') AS PHONE_NO 
       FROM CUSTOMER_DATA
    

    The only problem with this approach is if you find other characters embedded in the phone number (such as X, ., /), you need to add additional REPLACE functions to handle them.

    As an alternative, a single instance of the TRANSLATE function can be used to change all non-numeric characters within a phone number to a space. Thereafter, the REPLACE function can be employed to remove all of the spaces from the phone number. The following example illustrates this:

    WITH CUSTOMER_DATA AS (
    SELECT *
      FROM (VALUES(1,'PH:800/234-1412'),
                  (2,' 8370155738'),
                  (3,'837/015-5738'),
                  (4,'837-015.5738'),
                  (5,'7218675309 x243')
           ) x(ID,PHONE_NO)
    )
    SELECT LEFT(REPLACE(TRANSLATE(PHONE_NO,' ',
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
    ~`|"''!@#$%^&*()-=+[]{}
    ;:.,<>?/'),' ',''),10) AS PHONE
      FROM CUSTOMER_DATA
    

    Because you will likely need to use this same function twice in the same SQL statement (once to scrub the table data and once to scrub the phone number the user enters on screen), it may be worthwhile to encapsulate it in a scalar user-defined function:

    CREATE OR REPLACE FUNCTION CLEAN_PHONE_NO (@PHONE_NO VARCHAR(24))
    RETURNS CHAR(10)
    LANGUAGE SQL
    NO EXTERNAL ACTION
    DETERMINISTIC
    NOT FENCED
    SET OPTION COMMIT=*NONE
    BEGIN
        RETURN LEFT(REPLACE(TRANSLATE(@PHONE_NO,' ', 
    	'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
    	~`|"''!@#$%^&*()-=+[]{}
    	;:.,<>?/'),
    ' ',''),10);
    END
    

    Once the function is built, it is easy to compare the phone number in the table (PHONE_NO column) with the RPG host variable SEARCH_PHONE_NO once they’re both cleaned up:

    /Exec SQL
        SELECT *
          FROM CUSTOMER_DATA
         WHERE CLEAN_PHONE_NO(PHONE_NO)=CLEAN_PHONE_NO(:SEARCH_PHONE_NO);
    

    Or, to go back to the REGEX_REPLACE example:

    /Exec SQL
        SELECT *
          FROM CUSTOMER_DATA
         WHERE LEFT(REGEXP_REPLACE(PHONE_NO,'[^d]',''),10)=
               LEFT(REGEXP_REPLACE(:SEARCH_PHONE_NO,'[^d]',''),10);
    

    If you want the RPG variable cleaned up within the program for other purposes (aside from the SQL lookup), you can scrub the user’s input (variable SEARCH_PHONE_NO) with an embedded SET statement in RPG:

    /Exec SQL
        SET :SEARCH_PHONE_NO=CLEAN_PHONE_NO(:SEARCH_PHONE_NO);
    

    On a related topic, if users are going to do frequent searches on the phone number, it may be a good idea to periodically clean-up the phone number column and put an index on it:

    UPDATE CUSTOMER_DATA
    SET PHONE_NO=CLEAN_PHONE_NO(PHONE_NO);
    
    CREATE INDEX IDX_CUSTOMER_DATA_PHONE ON CUSTOMER_DATA (PHONE_NO);
    

    If PHONE_NO is cleaned up in the table, this SELECT statement can take advantage of the index:

    SELECT *
      FROM CUSTOMER_DATA
     WHERE PHONE_NO=CLEAN_PHONE_NO(:SEARCH_PHONE_NO)
    

    Disclaimer: The “scrub” functionality shown here only allows for a 10-character phone number. You may need to tweak the code if you want to allow for international phone numbers, a leading 1 for long distance, etc.

    Michael Sansoterra is a DBA for Broadway Systems in Grand Rapids, Michigan. Send your questions or comments for Mike via the IT Jungle Contact page.

    RELATED STORY

    Native Regular Expressions In DB2 For i 7.1 And 7.2

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags:

    Sponsored by
    WorksRight Software

    Do you need area code information?
    Do you need ZIP Code information?
    Do you need ZIP+4 information?
    Do you need city name information?
    Do you need county information?
    Do you need a nearest dealer locator system?

    We can HELP! We have affordable AS/400 software and data to do all of the above. Whether you need a simple city name retrieval system or a sophisticated CASS postal coding system, we have it for you!

    The ZIP/CITY system is based on 5-digit ZIP Codes. You can retrieve city names, state names, county names, area codes, time zones, latitude, longitude, and more just by knowing the ZIP Code. We supply information on all the latest area code changes. A nearest dealer locator function is also included. ZIP/CITY includes software, data, monthly updates, and unlimited support. The cost is $495 per year.

    PER/ZIP4 is a sophisticated CASS certified postal coding system for assigning ZIP Codes, ZIP+4, carrier route, and delivery point codes. PER/ZIP4 also provides county names and FIPS codes. PER/ZIP4 can be used interactively, in batch, and with callable programs. PER/ZIP4 includes software, data, monthly updates, and unlimited support. The cost is $3,900 for the first year, and $1,950 for renewal.

    Just call us and we’ll arrange for 30 days FREE use of either ZIP/CITY or PER/ZIP4.

    WorksRight Software, Inc.
    Phone: 601-856-8337
    Fax: 601-856-9432
    Email: software@worksright.com
    Website: www.worksright.com

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Sponsored Links

    NGS:  Nov. 18 FREE Webinar: Migration Alternatives for Query/400 Users
    United Computer Group:  VAULT400 BaaS delivers secure cloud backup and DR solutions
    BCD:  IBM i Webinar with Jon Paris - RPG OA: So Misunderstood! November 18 at 1pm EST

    HelpSystems’ Application Integration Begins With GUI ERP Upgrades: Love ‘Em or Leave ‘Em?

    One thought on “Data Scrubbing Functions In DB2 For i”

    • ajay says:
      October 17, 2018 at 12:32 pm

      I want to find out to remove spaces between character field ( ABC DE EFF CA). I cannot find one answer? in AS400 SQL .

      Reply

    Leave a Reply Cancel reply

Volume 15, Number 24 -- November 17, 2015
THIS ISSUE SPONSORED BY:

ProData Computer Services
Connectria
WorksRight Software

Table of Contents

  • An Introduction to Processing XML with RPG, Part 4
  • Formatting Dates with SQL, Take 3
  • Data Scrubbing Functions In DB2 For i

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Power Systems Grows Nicely In Q3, Looks To Grow For All 2025, Too
  • Beta Of MCP Server Opens Up IBM i For Agentic AI
  • Sundry IBM i And Power Stack Announcements For Your Consideration
  • Please Take The IBM i Marketplace Survey
  • IBM i PTF Guide, Volume 27, Number 43
  • IBM Pulls The Curtain Back A Smidge On Project Bob
  • IBM Just Killed Merlin. Here’s Why
  • Guru: Playing Sounds From An RPG Program
  • A Bit More Insight Into IBM’s “Spyre” AI Accelerator For Power
  • IBM i PTF Guide, Volume 27, Number 42

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle