Guru: Remove Extra Blanks, Or Why I Attend Conferences
October 28, 2019 Ted Holt
Have you been to a conference lately? If not, you may be shortchanging yourself. I attend several conferences every year and I get immense benefit from them. I learn a lot, I get a break from the day-to-day, and best of all, I build relationships with other people.
I recently attended the RPG & DB2 Summit in Minneapolis, where I met a bright young developer named Kevan Robinson. He was kind enough to share his version of a tip that I shared with him and other attendees. It’s a technique that I learned ages ago from Craig Mullins, a mainframe expert. It goes like this:
Suppose you have a character field that contains both a first and a last name, and that both names are aligned with their counterparts in other rows. For example, first names begin in position 1 and last names in position 12.
Sam Sung Willie Makit Betty Wont
How can we remove the extra blanks following the first name, leaving only one blank to separate the first and last names?
Sam Sung Willie Makit Betty Wont
We do it like this:
SELECT REPLACE( REPLACE( REPLACE(name, ' ', '<>'), '><', ''), '<>', ' ') FROM MyData
I wish I could find a better way to format that. It’s a bit tough to read. Here’s how it works.
The third REPLACE is innermost, so it runs first, replacing each blank with a <> pair. The second REPLACE replaces all the >< pairs with nothing, leaving the first < and the last >. The first REPLACE runs last, replacing <> with one blank. I admire this technique. I am amazed at how clever human beings can be.
Kevan was kind enough to email me and point out that nowadays we can use a regular expression to accomplish the same thing.
SELECT REGEXP_REPLACE(name, ' +', ' ') FROM MyData
Kevan wrote: “The second parameter to REGEXP_REPLACE (space followed by plus) matches sequences of one or more spaces, and the third parameter replaces each of them with one exactly one space.”
I have shared this technique with others many times, and it never occurred to me to use a regular expression. Like many other people — maybe everybody — I get stuck in old ways. (Not that I consider Mr. Mullins’ technique obsolete. I don’t.)
Kevan added more food for thought: “Changing the second parameter to ‘\s+’ will also match other kinds of white space, such as newlines and tabs, and replace them with a single space. This might change the string too much for some cases, but I’ve found it useful for normalizing certain kinds of legacy data, such as comment fields on a web app.”
I could have stayed home — I have plenty of work to do — but I’m glad I attended the conference. Even if I hated conferences, I would have to go anyway. The benefits are too great.
RELATED STORIES
A Few SQL Tips and Techniques by Craig S. Mullins