Guru: SQL Facts Of UNION And ORDER BY
April 10, 2017 Ted Holt
UNION and ORDER BY are powerful SQL features, but put the two together and you may get some strange and frustrating error messages. Fortunately for us DB2 for i professionals, there are easy ways to make the two collaborate and cooperate. Today is a great day to be sure we understand them.
We need some data for examples. For some of the queries, I use a customer master table (file) and a vendor master table. These are a company’s trading partners, so the two have many attributes in common, especially names and addresses. I also use two sales history tables, one each for the years 2015 and 2016. Businesses commonly use such tables to speedily access summary data that does not change. Such a table would have the same data in various formats (e.g. monthly sales, quarterly sales, yearly sales). In these examples, I use the yearly sales.
The following tables show the column (field) names for these database tables.
CUST | VENDOR |
CUSTNBR | VENDORID |
NAME | NAME |
CITY | CITY |
STATE | STATE |
ZIP | ZIPCODE |
SALES2015, SALES2016 |
CUSTNBR |
YEARLY |
(other columns omitted) |
Fact 1: ORDER BY follows the last subselect and applies to the entire union.
A common mistake that people make is to put an ORDER BY clause on each subselect. Doing so is not allowed because it is unnecessary. The system combines the result sets from all the subselects, then sorts the combined data. In this example, trading partners are retrieved in order by city within state.
select state, city, name, custnbr
from cust
union all
select state, city, name, vendorid
from vendor
order by state, city
Fact 2: You can sort on column names that are in every component result sets. When combining data for trading partners, for example, you can sort on NAME, CITY, and STATE.
select custnbr, name, city, state, zip
from cust
union all
select vendorid, name, city, state, zipcode
from vendor
order by name
Fact 3: If a column has different names in different subselects, you must rename one or more of them in order to generate an acceptable column name. In this query, I rename the ZIPCODE column of the vendor table to ZIP.
select custnbr, name, city, state, zip
from cust
union all
select vendorid, name, city, state, zipcode as zip
from vendor
order by zip
Fact 4: You may use column numbers in the ORDER BY clause. (This is one of my favorite ways to sort a union.) In this case, the columns do not have to have consistent names across all the subselects. In this example, I order the data by the fifth column, which is ZIP in the customer table and ZIPCODE in the vendor table.
select custnbr, name, city, state, zip
from cust
union all
select vendorid, name, city, state, zipcode
from vendor
order by 5
Fact 5: When using UNION to create a table, you can’t use column numbers. Instead, you must be sure to name the columns consistently across all subselects. There are two ways to do so.
The first way is to use correlation names.
create table mylib.tpwork as (select custnbr as ID, name, city, state, zip from cust union all select vendorid as ID, name, city, state, zipcode as zip from vendor) with data
The second way is to list the columns in the CREATE TABLE command.
create table mylib.tpwork (ID, name, city, state, zip)
as
(select custnbr, name, city, state, zip
from cust
union all
select vendorid, name, city, state, zipcode
from vendor)
with data
Fact 6: You can use ORDER BY in the component SELECTs by enclosing the expressions in parentheses. The only time this makes sense is when you need an ORDER BY to help with row selection. Find the top two customers for the years 2015 and 2016.
(select * from sales2015 order by yearly desc fetch first 2 rows only) union all (select * from sales2016 order by yearly desc fetch first 2 rows only)
As with any union, we can add another ORDER BY clause for the entire union to sort the final result set.
(select * from sales2015
order by yearly desc
fetch first 2 rows only)
union
(select * from sales2016
order by yearly desc
fetch first 2 rows only)
order by yearly desc
Fact 7: You can’t use expressions when ordering a union. (This is a bummer.)
select custnbr, name, city, state, zip
from bbcust
union all
select vendorid, name, city, state, zipcode
from bbvendor
order by case when state = 'MN' then 0 else 1 end
The system heartlessly responds with the message “ORDER BY expression is not valid.” Maybe someday. I can dream.
Fact 8: These principles also apply to INTERSECT and EXCEPT. I did not include examples of those because unlike UNION, I don’t find them very useful.
The bottom line: Don’t panic when you see those goofy error messages. In most cases, you can find a way around them.
For Fact 7, you can use the union as a sub-query to allow the use of an expression for the ordering. An extra layer, but it gives you the result you are looking for.
select * from (
select custnbr, name, city, state, zip
from bbcust
union all
select vendorid, name, city, state, zipcode
from bbvendor
) as bb
order by case when state = ‘MN’ then 0 else 1 end
Thanks, S Sims. You’re right, and I didn’t think of that technique when I was writing the article. I just wanted to point out a place where expressions would not work. But I’m glad you mentioned it, because that’s the technique I need for a project I’m working on at the moment.