Guru: What Do People Have Against Timestamps?
August 3, 2020 Ted Holt
In my work I often see database tables and physical files with related date and time fields (columns). By related, I mean that the two fields together indicate a certain time on a certain date for a certain event. I have found these date/time pairs to be difficult to work with at times, so much so that I have come to prefer timestamps.
I wonder why database architects (I use the term loosely) specify separate date and time fields rather than a timestamp. Having done no scientific survey, I can only guess. I suspect that many tables date back to olden days, when there were no date/time/timestamp data types. But in those cases, the fields are typically defined as numeric, or perhaps character, not with the date and time data types. When I see the date and time data types, I assume it’s just inertia — that’s the way it’s always been done.
Let’s look at the difference in querying the two architectures. First, here’s a database table of machining operations we can play with.
create or replace table operlog ( sequence integer primary key, operation dec(3), status dec(3), WhenCompleted timestamp(0), DateCompleted date, TimeCompleted time); insert into operlog (sequence, operation, status, whencompleted) values ( 1, 300, 60, '2020-06-27-19.51.33'), ( 2, 300, 60, '2020-06-28-11.36.48'), ( 3, 300, 60, '2020-06-29-09.21.38'), ( 4, 300, 60, '2020-06-29-12.59.59'), ( 5, 300, 60, '2020-06-29-13.45.21'), ( 6, 300, 60, '2020-06-29-17.15.34'), ( 7, 300, 60, '2020-06-29-17.15.48'), ( 8, 300, 60, '2020-06-29-21.18.49'), ( 9, 300, 60, '2020-06-29-22.04.08'), ( 10, 300, 60, '2020-06-29-23.19.44'), ( 11, 300, 60, '2020-06-30-01.42.33'), ( 12, 300, 60, '2020-06-30-02.25.24'), ( 13, 300, 60, '2020-06-30-08.13.51'), ( 14, 300, 60, '2020-07-04-09.12.46'), ( 15, 300, 60, '2020-07-04-16.01.32'), ( 16, 300, 60, '2020-07-05-11.52.18') ; update operlog set DateCompleted = date(WhenCompleted), TimeCompleted = time(WhenCompleted);
I’ve put both the timestamp field and separate date and time fields in the table so that we can query the table either way. A real database table would have either the timestamp or the date and time fields, not both.
Notice that I defined the timestamp field to have zero microseconds positions. I recommend leaving off microseconds if you don’t need them. They can wreak havoc with queries. For example, searching for the time 01.02.03 is the same as searching for 01.02.03.000000 and won’t match a value like 01.02.03.456000.
In fact, you may even want to set seconds to zero in some timestamp fields. Maybe you care that something happened at 8:00 AM, but you really don’t care if it was 8.00.00, 8.00.15, or 8.00.59. You can use a trigger to set the seconds to zero when writing to or updating the database.
But that’s off the subject. Back to the matter at hand. Let’s consider some common queries. First, select all the operations for a calendar date.
-- timestamp select * from operlog where date(WhenCompleted) = '2020-06-29'; select * from operlog where WhenCompleted between '2020-06-29-00.00.00' and '2020-06-29-23.59.59'; -- date and time select * from operlog where DateCompleted = '2020-06-29';
I think date and time wins this one for simplicity, although querying the timestamp isn’t complicated.
Let’s try another one. Let’s select the operations that were completed at a certain time. We may or may not want to ignore seconds.
-- timestamp select * from operlog where WhenCompleted = '2020-06-29-17.15.34'; -- ignore seconds select * from operlog where WhenCompleted between '2020-06-29-17.15.00' and '2020-06-29-17.15.59'; -- ignore seconds select * from operlog where WhenCompleted - (extract ( seconds from WhenCompleted )) seconds = '2020-06-29-17.15.00'; -- date and time select * from operlog where DateCompleted = '2020-06-29' and TimeCompleted = '17.15.34'; -- ignore seconds select * from operlog where DateCompleted = '2020-06-29' and TimeCompleted between '17.15.00' and '17.15.59'; -- ignore seconds select * from operlog where DateCompleted = '2020-06-29' and TimeCompleted - (extract ( seconds from TimeCompleted )) seconds = '17.15.00';
I think timestamp comes out ahead here, but not by much. The last query in each set is ugly to me. Subtracting with EXTRACT is one way to ignore seconds. I don’t know of a good way. This is one case where I’d ask myself if storing seconds in the database is a good idea or not.
On to the third query. Select the operations completed by the second shift (3:00 PM – 11:00 PM).
-- timestamp select * from operlog where WhenCompleted between '2020-06-29-15.00.00' and '2020-06-29-22.59.59'; -- date and time select * from operlog where DateCompleted = '2020-06-29' and TimeCompleted between '15.00.00' and '22.59.59';
I prefer the timestamp query, but querying date and time is not a hardship.
Number four. Select the operations completed by third shift (11:00 PM – 7:00 AM).
-- timestamp select * from operlog where WhenCompleted between '2020-06-29-23.00.00' and '2020-06-30-06.59.59'; -- date and time select * from operlog where ( DateCompleted = '2020-06-29' and TimeCompleted >= '23.00.00') or ( DateCompleted = '2020-06-30' and TimeCompleted <= '06.59.59');
Querying the timestamp is the clear winner in my opinion. Let’s complicate the query slightly.
Select from noon of one day to noon several days later.
-- timestamp select * from operlog where WhenCompleted between '2020-06-28-12.00.00' and '2020-07-04-11.59.59'; -- date and time select * from operlog where ( DateCompleted = '2020-06-28' and TimeCompleted >= '12.00.00') or (DateCompleted between '2020-06-29' and '2020-07-03') or ( DateCompleted = '2020-07-04' and TimeCompleted <= '11.59.59'); select * from operlog where timestamp(DateCompleted, TimeCompleted) between '2020-06-28-12.00.00' and '2020-07-04-11.59.59';
To me there’s no contest. Querying the timestamp field is much cleaner.
These are not the only ways to query the data, but other methods involve more complicated operations, such as converting date/time/timestamp to character and putting built-in functions on the left side of the equal sign. Such operations increase the probability that the optimizer will use a table scan rather than indexes.
If you’ve already got tables with separate date and time fields, then you’re probably stuck with them and probably can’t make a business case for spending time to convert them to timestamp fields. But when creating a new table, I can’t think of any reason to use separate date and time fields instead of a timestamp.
Hi Ted, you are right on the money, as usual. I looked at table creation members at 2 of my old clients and I started using TIMESTAMP definitions back in 2003. There may have been some older but no access. Where I work now, when I create a table (we still use DDS), I use timestamps. But the VP of development (been there 30+ years), STILL defines dates as CYYMMD and a separate time field. Old habits die hard but as you show, it’s pretty easy to use new techniques.
I’ve come to realize while we like to blame IBM for poor marketing, I think we can lay the blame at the feet of most developers who won’t/refuse to modernize.
Take care, all the best. Russ
Another problem with separate columns is right at change of day. There is a small risk that setting a date column to particular date and a time column to a particular time may span the switch. For example the date may end up as 2020-08-03 but the time was just a second or so later and is now 00.00.01
Can very well relate to this, I was reprimanded by the client for using timestamp instead of date and time fields defined as 8 & 6 numeric.