WHERE Versus HAVING
May 9, 2007 Hey, Ted
What is the purpose of SQL’s HAVING operator? It seems to me to be unnecessary, since the WHERE operator can choose which records are to be included in the query. What am I missing? –Rich Don’t sweat it, Rich. This is a very common point of confusion. Let me try to straighten it out for you, and in the process I’ll add a tip that some of the more experienced readers may not have seen before. Sometimes you can get the same result either WHERE or HAVING. The following two SQL commands produce the same result set. That is, both count the number of records found for the states of Texas and Georgia. SELECT state, COUNT(*) FROM qiws/qcustcdt WHERE state IN ('GA', 'TX') GROUP BY state ORDER BY state SELECT state, COUNT(*) FROM qiws/qcustcdt GROUP BY state HAVING state IN ('GA', 'TX') ORDER BY state So, what’s the difference? Which is better? I’ll let you answer those questions in a minute. The purpose of the WHERE clause is to select rows (records) that are to be included in the query. For example, assume table QCUSTCDT, which you can find on your system in the QIWS library. Suppose I want the names, account numbers, and balance due of all customers from Texas and Georgia. Since STATE is one of the fields in the record format, I can use WHERE to select those customers. SELECT cusnum, lstnam, init FROM qiws/qcustcdt WHERE state IN ('TX', 'GA') CUSNUM LSTNAM INIT BALDUE ====== ============ ==== ======== 938472 Henning G K 37.00 938485 Johnson J A 3987.50 593029 Williams E D 25.00 Suppose I want the total amount due from customers by state. In that case, I would need to use the GROUP BY clause to build an aggregate query. SELECT state,SUM(baldue) FROM qiws/qcustcdt GROUP by state ORDER BY state State Sum(Baldue) ===== =========== CA 250.00 CO 58.75 GA 3987.50 MN 510.00 NY 589.50 TX 62.00 VT 439.00 WY .00 Now, suppose I want the same information, but I don’t care about states where nobody owes me any money. Since the total owed by state is an aggregate figure, i.e., the figure is generated from a group of records, you must use HAVING to select the proper data. SELECT state,SUM(baldue) FROM qiws/qcustcdt GROUP by state HAVING SUM(baldue) > 0 ORDER BY state State Sum(Baldue) ===== =========== CA 250.00 CO 58.75 GA 3987.50 MN 510.00 NY 589.50 TX 62.00 VT 439.00 Notice that Wyoming is no longer in the picture. Here’s the rule. If a condition refers to an aggregate function, put that condition in the HAVING clause. Otherwise, use the WHERE clause. Here’s another rule: You can’t use HAVING unless you also use GROUP BY. Now, go back to the first example, where WHERE and HAVING produce the same result set. What’s the difference? The first query uses the WHERE clause to restrict the number of rows that the computer has to sum up. But the second query sums up all the rows in the table, then uses HAVING to discard the sums it calculated for all states except Texas and Georgia. The first query is obviously the better one, because there is no need to make the computer calculate sums and then throw them away. I mentioned a tip that some of the more experienced readers may not have seen. Some queries combine aggregate processing with non-aggregate processing. In a classic example, one table contains individual sales, while another contains a quota for a sales rep. The first table contains more than one record per rep, and those records must be summarized in order to get an aggregate total, which is then compared to the sole quota record for a rep. How do we find the reps who have not met their quotas? Like this: SELECT rep, SUM(amount) FROM sales AS a GROUP by rep HAVING SUM(a.amount) >= (SELECT quota FROM quotas AS b WHERE b.rep = a.rep) ORDER BY rep The first SELECT sums the individual sales by rep. Each rep’s sum is compared to one record from the QUOTAS table, in order to determine if the aggregate sum is at least as much as the quota. As before, it’s necessary to put the aggregate function (SUM) in the HAVING clause. If these are the sales: Rep Amount === ====== 1 20 1 30 2 40 3 30 3 20 And these are the quotas: Rep Quota === ===== 1 70 2 30 3 40 This is the result: Rep SUM(Sales) === ========== 2 40 3 50 How about that? A nested query in the HAVING clause. –Ted
|