Joining On Ranges
April 21, 2015 Ted Holt
People are ingenious. Where there’s a need, they find a way. Unfortunately, some of the ways they find don’t jibe well with relational databases, which makes my work life messy. Fortunately, I know a few ways to clean it up, and here’s one of them. Suppose you work for a company that sells office supplies. One day someone walks into your office and asks for a report of sales of boxes. “Sure thing!” you reply. “How do I know that an item is a box?” “Easy,” he replies. “The item number begins with 134.” Easy, indeed. select ItemNumber, Description from items where ItemNumber like '134%' Or: select ItemNumber, Description from items where substr(ItemNumber,1,3) = '134' The query returns this result set. ItemNumber Description ========== ================= 134001 3x3x3" Corrugated 134002 4x4x4" Corrugated 134005 5x5x5" Corrugated Done. He’s happy, and you’re happy because he’s happy. Before long you have dozens of programs that look for boxes. Months later, he’s back again. “We’re selling a new line of boxes. I need them added to my reports. You’ll know them because the item numbers of the new line start with 88.” Here’s the required modification to the previous query. select ItemNumber, Description from items where ItemNumber like '134%' or ItemNumber like '88%' Or: select ItemNumber, Description from items where substr(ItemNumber,1,3) = '134' or substr(ItemNumber,1,2) = '88' Here’s the result set of the revised query. ItemNumber Description ========== ================= 134001 3x3x3" Corrugated 134002 4x4x4" Corrugated 134005 5x5x5" Corrugated 880001 6x6x2" Flat box 880002 7x7x3" Flat box Done. Now modify the other dozens of SQL queries that have to do with boxes. How do we get into these messes? Ideally the item master table (physical file) would have a category column (field), but nobody thought about it when the database was being designed. So users found a way–they assigned ranges of numbers to the different product categories. What’s more, they weren’t always logical in building their naming scheme. If you have programs that use native I/O (i.e., READx, CHAIN, WRITE, etc. in RPG terms), adding a category column to the item master table may be unworkable. Even if you were to add the column and load the initial values, the odds that the users would keep the column accurately updated would be slim to none, as nobody can ensure the integrity of hundreds of thousands of items. So what’s a database guru like you to do? I recommend creating what I call a range table. By range table, I mean a table that centers around a range of values in each row (record). (If there’s an accepted name for such a table, I don’t know what it is. Consider it my contribution to relational database terminology.) create table ItemCategories for system name ItemCateg (FromItemNumber for column FromItem char(6), ThruItemNumber for column ThruItem char(6), ItemCategory for column Category char(1), primary key (FromItemNumber)) Now assign categories. Involve the users, since they’ll be the ones who maintain the table. insert into ItemCategories values ('134', '134999', 1), ('88' , '889999', 1), ('488', '488999', 2), ('2', '209999', 3) I’ve defined three categories: 1 = boxes 2 = chairs 3 = desks The revised query looks like this: select itm.ItemNumber, itm.Description from items as itm join ItemCategories as cat on itm.ItemNumber between cat.FromItemNumber and cat.ThruItemNumber where cat.ItemCategory = 1 I won’t repeat the result set because it looks just like the one given above. Notice first that the hard coded item prefixes are gone. I hate hard coding and always welcome an opportunity to remove it from my programming. Second, notice that the join uses the BETWEEN operator. This is the usual way to join to range tables, and I much prefer it to using the equivalent compound condition. Third, notice also that the join is an inner join. I won’t say that outer joins are never used with BETWEEN, but in my experience it’s not the usual case. Range tables come in handy for many types of data. Here are a couple more examples to think about. (1) A discount table FromQuantity ThruQuantity DiscountMultiplier ============ ============ ================== 0 24 1.0000 25 48 .9500 49 72 .9000 73 96 .8500 97 120 .8000 121 99,999 .7500 If a customer orders two dozen or fewer widgets, his order total is multiplied by 1, meaning he gets no discount. If he buys more than two dozen widgets, but no more than four dozen, his order total is multiplied by .95, giving him a 5 percent discount. And so on. SQL handles this sort of math easily. I used quantities in this example, but it works just as well for monetary amounts. (2) Date ranges Lots and lots of things are driven by date ranges. Assume a company that begins its fiscal year on September 1 and uses a 5-4-4 calendar. You might use a range table to avoid CASE logic to determine the fiscal year. You could also save yourself a lot of IF’ing and CASE’ing to find the period within the fiscal year. FromDate ThruDate Year Period ========== ========== ==== ====== 2014-09-01 2014-10-05 2015 1 2014-10-06 2014-11-02 2015 2 2014-11-03 2014-11-30 2015 3 2014-12-01 2015-01-04 2015 4 2015-01-05 2015-02-01 2015 5 2015-02-02 2015-03-01 2015 6 2015-03-02 2015-04-05 2015 7 2015-04-06 2015-05-03 2015 8 2015-05-04 2015-05-31 2015 9 2015-06-01 2015-07-05 2015 10 2015-07-06 2015-08-02 2015 11 2015-08-03 2015-08-31 2015 12 In an ideal world, we would probably not need range tables. Or maybe we would need fewer of them. I don’t live in an ideal world. Thanks to SQL’s BETWEEN operator, I don’t have to. RELATED STORY
|