• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Distinctly Speeding Up DISTINCT

    May 26, 2010 Ted Holt

    The DISTINCT keyword is a great example of the power of the SQL SELECT statement. Add this powerful word to any SELECT command and voilà! Duplicate rows magically disappear. Comments from readers have alerted me to the fact that some people find use of this feature confusing. Here are a brief discussion of DISTINCT and a performance tip.

    Given a table (physical file) of sales order information and a table of customer information, how do I go about finding the account numbers of customers who have orders in the database? We could try this:

    select s.companyno, s.customerno
      from salesordh as s
     order by 1, 2
    

    And we would get something like this:

    COMPANYNO  CUSTOMERNO
         1       34567
         1       34567
         1       45678
         1       56789
         1       77777
         2       12345
         2       23456
         2       56789
    

    Marvelous, but do I really need to have each customer listed once per order? In this short example, only one customer—34567–has more than one order. In a production database, containing hundreds or thousands of orders, there could be many duplicates. We can eliminate duplicates by adding the word DISTINCT:

    select distinct s.companyno, s.customerno
      from salesordh as s
     order by 1, 2
    

    And we would get this instead:

    COMPANYNO  CUSTOMERNO
         1       34567
         1       45678
         1       56789
         1       77777
         2       12345
         2       23456
         2       56789
    

    Let’s take it a step farther. Suppose I also want the customer’s name. For that, I must go to the customer master file. Here’s the same query, but I’ve added the customer master file.

    select distinct s.companyno, s.customerno, c.cusnam
      from salesordh as s
      join customers as c
        on c.companyno = s.companyno
       and c.customerno = s.customerno
     order by 1, 2
    

    This works. I get this:

    COMPANYNO  CUSTOMERNO  CUSNAM
         1       34567     Polly Unsaturated
         1       45678     Molly Coddle
         1       56789     R. U. Furreal
         1       77777     Herman Nootix
         2       12345     Cal E. Phornya
         2       23456     Billy Rubin
         2       56789     Sally Mander
    

    But notice something. Notice that all three selected fields are in the customer master file. Notice also that two of the fields–company number and customer number–are the key fields that uniquely identify each customer. In other words, why join, which creates multiple rows for each customer, then throw away duplicates? It’s unnecessary. Here’s the rewritten query:

    select c.companyno, c.customerno, c.cusnam
      from customers as c
     where exists
        (select * 
           from salesordh as s
          where s.companyno  = c.companyno
            and s.customerno = c.customerno)
     order by 1, 2
    

    DISTINCT is gone. The main query reads CUSTOMERS only. The sales order table has been relegated to a subquery. The resulting data set is the same, but performance should be better.

    To sum it up, use DISTINCT when you’re retrieving data from one table (or view) only. When two or more tables and/or views are involved, and all selected columns are from one table, you can usually get a performance advantage by converting the query to include a subquery and throwing away DISTINCT.



                         Post this story to del.icio.us
                   Post this story to Digg
        Post this story to Slashdot

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags:

    Sponsored by
    Computer Keyes

    Fax Directly from your IBM i

    KeyesFax is a full function automated IBM i fax system. Spooled files are burst by fax number and auto transmitted with overlays.  It combines both a send and receive facsimile processing system with a complete image package.

    The fax software will edit, send, receive, display, print, and track fax documents or images using any standard IBM i without additional expensive hardware, software or subscriptions.

    Computer Keyes has been developing Software Solutions since 1978!

    www.computerkeyes.com

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Sponsored Links

    IBS:  Free e-book: The Six Margin Killers in Wholesale Distribution
    WorksRight Software:  ZIP codes, area codes, Canadian postal codes, CASS certification, and more
    COMMON:  Join us at the Fall 2010 Conference & Expo, Oct. 4 - 6, in San Antonio, Texas

    IT Jungle Store Top Book Picks

    Easy Steps to Internet Programming for AS/400, iSeries, and System i: List Price, $49.95
    The iSeries Express Web Implementer's Guide: List Price, $49.95
    The System i RPG & RPG IV Tutorial and Lab Exercises: List Price, $59.95
    The System i Pocket RPG & RPG IV Guide: List Price, $69.95
    The iSeries Pocket Database Guide: List Price, $59.00
    The iSeries Pocket SQL Guide: List Price, $59.00
    The iSeries Pocket Query Guide: List Price, $49.00
    The iSeries Pocket WebFacing Primer: List Price, $39.00
    Migrating to WebSphere Express for iSeries: List Price, $49.00
    Getting Started With WebSphere Development Studio Client for iSeries: List Price, $89.00
    Getting Started with WebSphere Express for iSeries: List Price, $49.00
    Can the AS/400 Survive IBM?: List Price, $49.00
    Chip Wars: List Price, $29.95

    Saudi Distributor Taps VAI for Logistics Improvement nuBridges Eases i/OS Integration for Tokenized Data

    Leave a Reply Cancel reply

Volume 10, Number 17 -- May 26, 2010
THIS ISSUE SPONSORED BY:

WorksRight Software
SEQUEL Software
Botz & Associates, Inc.

Table of Contents

  • Five Steps To Monitoring Your Server Log on IBM i
  • Distinctly Speeding Up DISTINCT
  • Hey! What Happened To My Last Used Dates

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • With Power11, Power Systems “Go To Eleven”
  • With Subscription Price, IBM i P20 And P30 Tiers Get Bigger Bundles
  • Izzi Buys CNX, Eyes Valence Port To System Z
  • IBM i Shops “Attacking” Security Concerns, Study Shows
  • IBM i PTF Guide, Volume 27, Number 26
  • Liam Allan Shares What’s Coming Next With Code For IBM i
  • From Stable To Scalable: Visual LANSA 16 Powers IBM i Growth – Launching July 8
  • VS Code Will Be The Heart Of The Modern IBM i Platform
  • The AS/400: A 37-Year-Old Dog That Loves To Learn New Tricks
  • IBM i PTF Guide, Volume 27, Number 25

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle