Thinking in Sets
March 31, 2004 Hey, Howard
I have an interesting problem and was wondering if you could show me how to code it in SQL. I have a file that records changes to packages in our facility. The file has a package ID number, date of activity, time of activity, and activity code. I want to find the last date and time of activity for each package but can’t get the query to work, because using MAX on the date and time can return a time value for when an activity did not occur.
–Michael
This is a really interesting problem, Michael, because your schema does not use the timestamp data type, but rather two separate fields–a date and a time–to record the activity. However, thinking in sets and relating the sets together can solve this problem.
First, I’ve created a small sample table to insert data for test queries:
CREATE TABLE SQLBOOK.T1AA (ID INTEGER NOT NULL, ACTDATE DATE NOT NULL, ACTTIME TIME NOT NULL); INSERT INTO SQLBOOK.T1AA VALUES (1,'01/01/2003','01:00'); INSERT INTO SQLBOOK.T1AA VALUES (1,'01/02/2003','02:00'); INSERT INTO SQLBOOK.T1AA VALUES (1,'01/03/2003','03:00'); INSERT INTO SQLBOOK.T1AA VALUES (2,'01/01/2003','02:00'); INSERT INTO SQLBOOK.T1AA VALUES (3,'01/02/2003','03:00'); INSERT INTO SQLBOOK.T1AA VALUES (3,'01/03/2003','02:00'); INSERT INTO SQLBOOK.T1AA VALUES (3,'01/04/2003','01:00');
If you query the table you should get results like the following:
ID |
ACTDATE |
ACTTIME |
1 |
01/01/2003 |
1:00:00 AM |
1 |
01/02/2003 |
2:00:00 AM |
1 |
01/03/2003 |
3:00:00 AM |
2 |
01/01/2003 |
2:00:00 AM |
3 |
01/02/2003 |
3:00:00 AM |
3 |
01/03/2003 |
2:00:00 AM |
3 |
01/04/2003 |
1:00:00 AM |
Now, one would assume that you could simply code a query using max and achieve the desired results, but doing that query leads to data that is not in the result set. Here is an example:
SELECT ID, MAX(ACTDATE) AS D, MAX(ACTTIME) AS T FROM SQLBOOK.T1AA GROUP BY ID;
That query would result in the following:
ID |
D |
T |
1 |
01/03/2003 |
3:00:00 AM |
2 |
01/01/2003 |
2:00:00 AM |
3 |
01/04/2003 |
3:00:00 AM |
What’s wrong with the above result set? Notice that result for ID 3 says that the last activity occurred on 1/4/2003 at 3:00 AM; however, if you look at the source data, the activity on 1/4/2003 occurred at 1:00 AM. By using MAX, you are creating a result set of incorrect data. Now, consider the following query:
SELECT ID, MAX(ACTDATE) AS D FROM SQLBOOK.T1AA GROUP BY ID;
This query returns a set I will call X:
ID |
D |
1 |
01/03/2003 |
2 |
01/01/2003 |
3 |
01/04/2003 |
The above query returns a result set that correctly identifies the date of each last activity. What you need to do is join that result set back to the table to retrieve the time of the last activity on the given date. Here is a query that returns the last time of an activity for a given ID on each date:
SELECT ID, ACTDATE, MAX(ACTTIME) AS T FROM SQLBOOK.T1AA GROUP BY ID, ACTDATE;
This query returns a set I will call Y:
ID |
ACTDATE |
T |
1 |
01/01/2003 |
1:00:00 AM |
1 |
01/02/2003 |
2:00:00 AM |
1 |
01/03/2003 |
3:00:00 AM |
2 |
01/01/2003 |
2:00:00 AM |
3 |
01/02/2003 |
3:00:00 AM |
3 |
01/03/2003 |
2:00:00 AM |
3 |
01/04/2003 |
1:00:00 AM |
Why did I elect to use MAX in returning the time? Just in case there are several entries for the same day. Remember, I want the last time that something happened to a given ID on a date. If I have more than one record for a single ID on a given date, without using MAX and a group by the query would return two records. Now that I have two sets of data (X and Y in the above examples), I can simply join them together to find the date and time of the last activity:
SELECT X.ID, D, T FROM (SELECT ID, MAX(ACTDATE) AS D FROM SQLBOOK.T1AA GROUP BY ID) AS X INNER JOIN (SELECT ID, ACTDATE, MAX(ACTTIME) AS T FROM SQLBOOK.T1AA GROUP BY ID, ACTDATE) AS Y ON (X.ID=Y.ID AND X.D = Y.ACTDATE);
Results in this set:
ID |
D |
T |
1 |
01/03/2003 |
3:00:00 AM |
2 |
01/01/2003 |
2:00:00 AM |
3 |
01/04/2003 |
1:00:00 AM |
See, the above query uses an inner join to join the set X to the set Y, where the ID codes are equal and the dates are equal. In this manner, I get a set of information showing the last date and time of an activity against an ID code.
Remember, a table is just a set of data. An SQL statement is a set of data. SQL allows you to join sets of data together. If you think in sets, relational gets a whole lots easier and opens up a lot of possibilities in your queries and reporting. Besides, it’s fun.
Howard F. Arner, Jr., is a writer and consultant with Client Server Development and is the author of the book iSeries and AS/400 SQL at Work. Howard also designed SQLThing Enterprise Edition, a query program and stored procedure editor specifically for the AS/400’s unique capabilities. You can purchase a copy of Howard’s book or learn more about SQLThing or SQL on the iSeries at www.sqlthing.com. Send your SQL questions to Howard at harner@sqlthing.com.