Closing the Gaps

January 12, 2011 Hey, Ted

We have a database file to which records are added throughout the day. The key is a five-digit packed decimal whole number, and the program that writes to the file increments this number when adding a new record to the file. During the day, another program processes and deletes records throughout the file, leaving gaps between sequence numbers. At the end of each day, we would like to renumber the remaining records, beginning again at sequence number one, to remove the gaps and be prepared for a fresh start the next day. Can we use an SQL UPDATE statement to renumber the sequence numbers?

–Brian

Sure, it’s possible, but with one UPDATE command? That depends. Let’s consider some possibilities.

First, let’s set the stage. Assume 50 records were written to the file during the day, and all but four–sequence numbers 1, 4, 12, and 17–were processed and deleted from your file.

If the arrival sequence coincides with the sequence number sequence, you can probably get away with one UPDATE. (I can’t take credit for this technique. I learned it from Joe Celko. Visit this site and search for CloseMotorpoolGaps.)

D ReSeqUpd1       pr                  extpgm('RESEQUPD1')
D   ouStatus                     8a
D ReSeqUpd1       pi
D   ouStatus                     8a

D SqlEof          c                   const('02000')
D AllOK           c                   const('00-00000')
D UnexpectedError...
D                 c                   const('99-99999')

 /free
     *inlr = *on;
     ouStatus = AllOK;

     monitor;

        exec sql
           update reseq as a
              set a.sequence =
                      (select count(b.sequence)
                         from reseq as b
                        where b.sequence <= a.sequence);
        if SqlState > SqlEof;
           ouStatus = '10-' + SqlState;
           return;
        endif;

     on-error;
        ouStatus = UnexpectedError;
     endmon;

     return;

I say probably, because there is no guarantee the system will process the records in arrival sequence. However, since the entire file will be updated (i.e., there is no WHERE clause), chances are pretty close to certain that the system will update in arrival sequence.

Suppose, however, that the sequence of the records according to key does not match arrival sequence, as could occur if you’re reusing deleted records. What then?

Let’s say that the remaining records, in arrival sequence, have sequence numbers 17, 12, 4, and 1. When the system tries to assign 4 to the first record (sequence number 17), it chokes with SQL state 23505 (A violation of the constraint imposed by a unique index or a unique constraint occurred) because there is already another row with a key value of 4.

In this case, you could update using a two-step process:

Update the keys to an unused range of numbers.
Renumber them, beginning with 1, as in the previous example.

Here are a couple of possibilities.

In this version, I add the number of active records in the file to the sequence numbers before re-sequencing them.

D RowCount        s             10u 0

D ReSeqUpd2       pr                  extpgm('RESEQUPD2')
D   ouStatus                     8a
D ReSeqUpd2       pi
D   ouStatus                     8a

D SqlEof          c                   const('02000')
D AllOK           c                   const('00-00000')
D UnexpectedError...
D                 c                   const('99-99999')

 /free
     *inlr = *on;
     ouStatus = AllOK;

     monitor;
        exec sql
             select count(*)
               into :RowCount
               from reseq;
        if RowCount <= *zero;
           return;
        endif;
        if SqlState > SqlEof;
           ouStatus = '10-' + SqlState;
        endif;
        if SqlState >= SqlEof;
           return;
        endif;

        exec sql
            update reseq as a
              set a.sequence = a.sequence + :RowCount;
        if SqlState > SqlEof;
           ouStatus = '20-' + SqlState;
           return;
        endif;

        exec sql
            update reseq as a
              set a.sequence =
                      (select count(b.sequence)
                         from reseq as b
                        where b.sequence <= a.sequence);
        if SqlState > SqlEof;
           ouStatus = '30-' + SqlState;
           return;
        endif;

     on-error;
        ouStatus = UnexpectedError;
     endmon;

     return;

After the first update, the rows have sequence numbers 21, 16, 8, and 5. After the second update, the sequence numbers are 4, 3, 2, and 1, of course.

Here’s a similar version, built on the assumption that sequence numbers are always positive. In the first update, subtract the highest sequence number in use in order to convert the key values to numbers less than 1.

D MaxSequence     s              5p 0

D ReSeqUpd2       pr                  extpgm('RESEQUPD2')
D   ouStatus                     8a
D ReSeqUpd2       pi
D   ouStatus                     8a

D SqlEof          c                   const('02000')
D AllOK           c                   const('00-00000')
D UnexpectedError...
D                 c                   const('99-99999')

 /free
     *inlr = *on;
     ouStatus = AllOK;

     monitor;
        exec sql
             select max(Sequence)
               into :MaxSequence
               from reseq;
        if SqlState > SqlEof;
           ouStatus = '10-' + SqlState;
        endif;
        if SqlState >= SqlEof;
           return;
        endif;

        exec sql
            update reseq as a
              set a.sequence = a.sequence - :MaxSequence;
        if SqlState > SqlEof;
           ouStatus = '20-' + SqlState;
           return;
        endif;

        exec sql
            update reseq as a
              set a.sequence =
                      (select count(b.sequence)
                         from reseq as b
                        where b.sequence <= a.sequence);
        if SqlState > SqlEof;
           ouStatus = '30-' + SqlState;
           return;
        endif;

     on-error;
        ouStatus = UnexpectedError;
     endmon;

     return;

After the first update, the rows have key values of 0, -5, -13, and -16. After the second update, the key values are 4, 3, 2, and 1, as before.

Of course, if you reorganized the file, putting the records back into key sequence, the first example would work and you wouldn’t need two updates.

RGZPFM FILE(RESEQ) KEYFILE(*FILE)

If UPDATE had an ORDER BY clause (but it doesn’t, and I doubt it ever will), the first version would work, no matter the physical sequence of the records.

–Ted

                     Post this story to del.icio.us
               Post this story to Digg
    Post this story to Slashdot