Conditional sum using proc sql

Time: 2014-11-20 11:02:50

Tags: sas

I have a table containing accrual_date, absence_type, employee_id, and duration_days.

accrual_date        absence_type  duration_days  employee_id
01JAN2001:00:00:00  010           10.20          1
01JAN2001:00:00:00  014           11             1
01JAN2002:00:00:00  015           30             2
01JAN2001:00:00:00  015           20             2

I want to create a query that sums duration_days per absence type for each employee_id. So the result should be:

employee_id       duration_days_010   duration_days_014  duration_days_015
1                 10.20               11                 .
2                 .                   .                  50

First I add one column per absence_type holding the duration_days for each employee_id:

proc sql;
create table sort_second as
select 
        case when absence_type='014' then sum(duration_days) else . end as duration_days_014,
        case when absence_type='015' then sum(duration_days) else . end as duration_days_015,
        case when absence_type='010' then sum(duration_days) else . end as duration_days_010,
        employee_id, absence_type
    from sort_first
    group by employee_id;

quit;

Then I remove the duplicate keys:

proc sort data=sort_second out=test1 nodupkey;
by employee_id;
run;

But what this code actually does is ignore whether a row came from 014, 015, or 010 and add up everything for the employee. Like this:

employee_id       duration_days_010   duration_days_014  duration_days_015
    1                 21.20               21.20          .
    2                 .                   .                  50

Please advise what is going wrong. Thanks in advance.

1 Answer:

Answer 0: (score: 2)

First, if you're in SAS, I'd suggest using the SAS tools!

In this case PROC FREQ, or better yet PROC TABULATE, can do this directly, and if you need a dataset you can use ODS OUTPUT to get one.

ods output table=want;
proc tabulate data=have;
where absence_type in ('010','014','015');
class absence_type employee_id;
var duration_days;
tables employee_id,absence_type*duration_days*sum;
run;

proc transpose data=want out=final prefix=duration_days_;
by employee_id;
id absence_type;
var duration_days_sum;
run;

If you want to stick with SQL, all you need to do is change how the case statement works.

case when absence_type='014' then sum(duration_days) else . end as duration_days_014,

should be

sum(case when absence_type='014' then duration_days else . end) as duration_days_014,

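I.e., you want a hypothetical column that holds only the 014 durations, and you sum that column. In your version, sum(duration_days) is computed over all of the employee's rows regardless of absence_type, and the case expression only decides, row by row, which output column that overall total lands in. Because absence_type is selected but not in the group by, SAS also remerges the total back onto every detail row, which is why you needed the extra de-duplication step. You should be able to skip most of the steps above. You can do this straight from the initial dataset.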

proc sql;
create table final as
select 
        sum (case when absence_type='014' then duration_days else . end) as duration_days_014,
        sum (case when absence_type='015' then duration_days else . end) as duration_days_015,
        sum (case when absence_type='010' then duration_days else . end) as duration_days_010,
        employee_id
    from have
    group by employee_id;

quit;
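
To try this end to end, here is a minimal sketch that rebuilds the four sample rows from the question and runs the same conditional sum. It assumes absence_type is stored as a character variable (which the quoted '014' comparisons in the question suggest) and that accrual_date is a datetime; the dataset names have and final follow the answer above, and the informats are my guess at how the data would be read.

```sas
/* Sample rows from the question. Assumption: absence_type is character ($3.) */
/* and accrual_date is a SAS datetime value.                                  */
data have;
    input accrual_date :datetime18. absence_type :$3. duration_days employee_id;
    format accrual_date datetime18.;
    datalines;
01JAN2001:00:00:00 010 10.20 1
01JAN2001:00:00:00 014 11 1
01JAN2002:00:00:00 015 30 2
01JAN2001:00:00:00 015 20 2
;
run;

proc sql;
    create table final as
    select employee_id,
           /* the case expression filters rows BEFORE they are summed */
           sum(case when absence_type='010' then duration_days else . end) as duration_days_010,
           sum(case when absence_type='014' then duration_days else . end) as duration_days_014,
           sum(case when absence_type='015' then duration_days else . end) as duration_days_015
    from have
    group by employee_id;
quit;

proc print data=final noobs;
run;
```

With these four rows, employee 1 should show 10.2 and 11 with a missing value for 015, and employee 2 should show 50 for 015 only, which matches the result the question asks for. Since every selected column is either the group by key or an aggregate, there is one row per employee and no PROC SORT NODUPKEY step is needed.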