Don’t Depend on Arrival Sequence
July 19, 2006

Hey, Ted:
We have a sequential (i.e., unkeyed) database file that contains several years of shipment transactions. We've built a lot of indexes over it, and we slice and dice it daily, all day long, from both programs and queries. We have two problems that surface occasionally.

The first problem is that we are not always able to see shipments in chronological order. Since we store only a shipment date, not a time, we depend on the order in which shipments are loaded into the file to tell us which shipment occurred before another. Depending on the order in which indexes are created or restored, we may or may not receive records with duplicate key values in the order in which they were created. Can you suggest a way we can be assured of receiving data in chronological sequence despite the lack of a timestamp?

The second problem is that the log file contains one field that we update, yet we are not always able to tell which of several records with duplicate keys should be updated. That is, program B has received information about a record from program A, but program B does not know which record the information came from, and therefore does not know which record to modify. How can we uniquely identify a record that has no unique key value?

–Louis

One of the primary ideas behind relational database management systems is that the sequence of rows and columns must not have any meaning. Depending on arrival sequence goes against this idea. Yet this is an understandable situation, since there is no combination of columns that can serve as a unique key. Here are a few ideas.

1. If your log file is defined with DDS, one easy way to address your first problem is to use the FIFO keyword at the file level of some or all of your logical files. The FIFO keyword tells the system that records with duplicate key values are to be accessed in relative record number sequence. (A short DDS sketch follows this list.) The FIFO keyword is one of those practical IBM extensions that help us get our work done. The primary disadvantage, to my way of thinking, of using the FIFO keyword is that you cannot reuse deleted records in the physical file.

2. Regarding your second problem, RPG can retrieve the relative record number of a record in a physical file, even if you access the record through a logical file. In the case of logical files that have more than one record format, you will also need to retrieve the record format name. Both of these items are in the file information data structure (INFDS).

     Fsomelf    ip   e           k disk
     F                                     infds(xactds)
     D xactds          ds
     D  RcdFmt                261    270
     D  RRN                   397    400i 0

Armed with this information, an RPG program can access the physical file by relative record number. (An RPG sketch follows this list.)

3. Adding a timestamp field would not be a good theoretical solution to your problems, since you could not be assured that two rows would never be inserted at the same time. Besides, unique keys normally have some logical relationship to the data they identify, but a timestamp does not. Still, adding a timestamp field would probably be a good practical solution, as it, combined with other fields if necessary, would give you a way to uniquely identify each record, as well as sort by date and time. I have dealt with a similar idea in a past issue of this newsletter.

4. The ideal situation is to give each record a unique key. I suggest using an identity column. However, your file already exists, and besides, we don't live in an ideal world, do we? (An SQL sketch of the timestamp and identity column ideas follows this list.) Log files like the one you describe are a sort of poor man's data warehouse. Every shop I've worked in had such files, and for the most part, such files work well.
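Here is a rough DDS sketch of idea 1. The names (physical file SHIPPF, record format SHIPREC, key field SHIPDT) are made up for illustration; the point is only that FIFO is coded at the file level, before the record format of the logical file.

     A                                      FIFO
     A          R SHIPREC                   PFILE(SHIPPF)
     A          K SHIPDT

With FIFO in effect, records that share the same SHIPDT value are returned in first-in, first-out order.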
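And here is a rough RPG sketch of idea 2, showing how a program that has been handed a relative record number might update the record directly in the physical file. SHIPPF, SHIPREC, and SHPSTS are hypothetical names, and the physical file is deliberately declared without keyed access, so CHAIN's search argument is treated as a relative record number.

      * Update one record in the physical file, located by the
      * relative record number passed in from another program.
      * (SHIPPF, SHIPREC, and SHPSTS are hypothetical names.)
     Fshippf    uf   e             disk
     D rrn             s             10i 0
     D newsts          s              2a
     C     *entry        plist
     C                   parm                    rrn
     C                   parm                    newsts
      /free
        // With no keyed access on SHIPPF, CHAIN positions
        // by relative record number.
        chain rrn shiprec;
        if %found(shippf);
          shpsts = newsts;   // the one field that gets updated
          update shiprec;
        endif;
        *inlr = *on;
      /end-free

Program A would capture the RRN (and, for a multiple-format logical file, the record format name) from its INFDS, as shown under idea 2, and pass it along to program B.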
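Finally, a rough SQL sketch of ideas 3 and 4, again with made-up names (SHIPLOG, LOADTS, SHIPID, and so on). The ALTER TABLE shows the practical timestamp fix; the CREATE TABLE shows what an identity column looks like when a file can be designed with one from the start.

-- Idea 3: a timestamp column that defaults to the time the
-- row is written. (Existing rows receive the default value.)
ALTER TABLE shiplog
  ADD COLUMN loadts TIMESTAMP NOT NULL DEFAULT CURRENT TIMESTAMP;

-- Idea 4: an identity column gives every row a
-- system-generated unique key.
CREATE TABLE shiplognew (
  shipid   INTEGER      NOT NULL GENERATED ALWAYS AS IDENTITY,
  shipdate DATE         NOT NULL,
  item     CHAR(15)     NOT NULL,
  qtyshp   DECIMAL(9,0) NOT NULL,
  PRIMARY KEY (shipid)
);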
I hope these ideas give you a way around the occasional problems you're having.

–Ted