OLAP Is Simple, Once You Understand It
May 9, 2012 Hey, Ted
I have a problem that I solved with a simple RPG program, but I know it can be done with SQL as well. I needed to generate a simple table of employee ID, name, and average of the last five weeks paychecks. Is this simple or complicated to do with SQL? –Barry You are correct that SQL can do the calculation, Barry. However, you’ll have to decide for yourself if it’s simple or complicated. First, let’s get some data to work with. I’m going to simplify your question slightly by using a name only, without an ID, to identify an employee. create table paydata (Emp char(5), PayDate date, GrossPay dec(5,0)) insert into paydata values ('Joe', '2012-01-06', 50), ('Joe', '2012-01-12', 100), ('Joe', '2012-02-03', 120), ('Joe', '2012-02-17', 110), ('Joe', '2012-03-02', 140), ('Joe', '2012-03-16', 100), ('Joe', '2012-03-30', 100), ('Sam', '2012-01-12', 10), ('Sam', '2012-02-17', 30), ('Sam', '2012-03-02', 70), ('Sam', '2012-03-16', 50), ('Sam', '2012-03-30', 50), ('Bill','2012-03-30', 20) Now, let’s consider the question in two different ways, because there are applications for each one. First, let’s get the average of the paychecks for the last five weeks. For the PAYDATA table, that means February 3 through March 30. Some employees may not have five paychecks, because they may have started during that period (like Bill), or may have quit and come back after missing one or more pay periods (like Sam). We will get the average of five or fewer paychecks. with WeekNumbers as (select paydate, rank() over (order by paydate desc) as WeekNumber from (select distinct paydate from paydata) as x) select pd.emp, avg(pd.grosspay) as avgpay from paydata as pd join WeekNumbers as w on pd.paydate = w.paydate where w.WeekNumber <= 5 group by emp order by emp Let’s break it down to make it easier to understand. The common table expression uses the RANK function to assign numbers to all of the payroll dates, beginning with the most recent payroll date and working backward. March 30 is week 1, March 16 is week 2, and so forth. with WeekNumbers as (select paydate, rank() over (order by paydate desc) as WeekNumber from (select distinct paydate from paydata) as x) WeekNumbers looks like this: PAYDATE WEEKNUMBER 2012-03-30 1 2012-03-16 2 2012-03-02 3 2012-02-17 4 2012-02-03 5 2012-01-12 6 2012-01-06 7 The main select retrieves the data for weeks 1 through 5 only and averages gross pay for each employee. The result set looks like this: EMP AVGPAY Bill 20 Joe 114 Sam 50 That answers your question. Let’s consider another common, but slightly different query. Suppose we need the average of the last five paychecks, regardless of when they were issued. (A more realistic example would be to retrieve the last five payments for each customer, because those payments would be made on different days, not on regularly scheduled dates. However, I don’t want to introduce another data set, so we’ll make do with payroll.) Here’s the query to get the average of the last five paychecks, whenever they were issued: with t1 as (select emp, paydate, grosspay, rank () over (partition by Emp order by paydate desc) as PaymentNumber from paydata as x ) select Emp, avg(GrossPay) as AvgPay from t1 where PaymentNumber <= 5 group by Emp order by Emp The common table expression assigns sequential numbers to each payment, beginning with the most recent payment and working into the past. PARTITION BY makes the system give each employee his own set of numbers. T1 looks like this: EMP PAYDATE GROSSPAY PAYMENTNUMBER Bill 2012-03-30 20 1 Joe 2012-03-30 100 1 Joe 2012-03-16 100 2 Joe 2012-03-02 140 3 Joe 2012-02-17 110 4 Joe 2012-02-03 120 5 Joe 2012-01-12 100 6 Joe 2012-01-06 50 7 Sam 2012-03-30 50 1 Sam 2012-03-16 50 2 Sam 2012-03-02 70 3 Sam 2012-02-17 30 4 Sam 2012-01-12 10 5 The main select averages payments 1 through 5 for each employee. Here’s the result set: EMP AVGPAY Bill 20 Joe 114 Sam 42 So, what do you think, Barry? Is the SQL method simple or complicated? –Ted
|