Presenting Vertical Data Horizontally

July 10, 2013 Victor Pisman

Presenting “vertical” data in a “horizontal” format is a challenging task in data management. Here is one method you can use to make SQL carry out this type of transformation.

Consider table CUSTGRP, which has two fields: a customer ID and a group code. Notice that one customer can have multiple group codes.

GROUP

100

200

500

700

Here’s one way to build a result set that has only one record per customer, with multiple fields populated with different group codes.

First, determine the number of groups to which each customer belongs:

select id, count(*) as count
  from custgrp
 group by id

COUNT

100

200

500

700

Next, join the table to itself, defining the table once for each column of the resulting repeating group. That is, if you want to allow for four repetitions, define the table four times. In this example, I allow for six group codes per customer.

The join is the key to this technique. The common field, in this case customer ID, must be joined on an equal condition, but the repeating group must be joined on either a less than or a greater than condition.

select a.id, a.group as x1, b.group as x2,
             c.group as x3, d.group as x4,
             e.group as x5, f.group as x6
  from custgrp as a
  left join custgrp as b
    on a.id = b.id and a.group < b.group
  left join custgrp as c
    on b.id = c.id and b.group < c.group
  left join custgrp as d
    on c.id = d.id and c.group < d.group
  left join custgrp as e
    on d.id = e.id and d.group < e.group
  left join custgrp as f
    on e.id = f.id and e.group < f.group

100

null

100

Null

null

100

Null

null

100

null

Null

null

200

null

200

Null

null

200

Null

null

200

null

Null

null

500

null

Null

null

700

Null

null

700

null

Null

null

The result set contains multiple rows for each customer. One row per customer contains all of the group codes, and the other rows do not. The only thing remaining to do is to select only those rows with all the group codes for each customer.

with summary as (
   select id, count(*) as count
     from custgrp
    group by id),
temp1 as (
   select a.id, a.group as x1, b.group as x2,
                c.group as x3, d.group as x4,
                e.group as x5, f.group as x6
     from custgrp as a
     left join custgrp as b
       on a.id = b.id and a.group < b.group
     left join custgrp as c
       on b.id = c.id and b.group < c.group
     left join custgrp as d
       on c.id = d.id and c.group < d.group
     left join custgrp as e
       on d.id = e.id and d.group < e.group
     left join custgrp as f
       on e.id = f.id and e.group < f.group
    order by 1, 2
)

select t.* , s.count
  from temp1 as t
  join summary as s
    on t.id = s.id
 where s.count = 
     ( case when x1 is not null then 1 else 0 end +
       case when x2 is not null then 1 else 0 end +
       case when x3 is not null then 1 else 0 end +
       case when x4 is not null then 1 else 0 end +
       case when x5 is not null then 1 else 0 end +
       case when x6 is not null then 1 else 0 end )
 order by 1

The SQL query contains two common table expressions–SUMMARY and TEMP1–to carry out the first two SQL statements. The remainder of the SQL expression combines those common table expressions to produce the final result set.

select t.* , s.count
  from temp1 as t
  join summary as s
    on t.id = s.id
 where s.count = 
     ( case when x1 is not null then 1 else 0 end +
       case when x2 is not null then 1 else 0 end +
       case when x3 is not null then 1 else 0 end +
       case when x4 is not null then 1 else 0 end +
       case when x5 is not null then 1 else 0 end +
       case when x6 is not null then 1 else 0 end )

Each case statement returns either zero or one, depending on whether a repeating group value is null or not. The system counts the number of non-null values in the repeating group in the TEMP1 result set by summing the zeros and ones. Only one TEMP1 row of each group matches the count in the SUMMARY result set. Those are the rows that make it into the result set.

COUNT

100

null

200

null

500

null

Null

null

700

Null

null

Be sure to allow for enough repetitions. In this example, any customers that belonged to seven or more groups would not be included in the result set.