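SAS - Scan a subset of a dataset and populate an array with unique values

Time: 2015-08-18 14:57:24

Tags: sas

I have a set of unique customer IDs and purchases, and I need to condense every unique purchase for each customer into a single observation.

For example,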

CustID Purchase1 Purchase2 Purchase3 Purchase4
J Bike Shoes Shirt Pants
J Shirt Pants null null
J Bike Helmet Pants null
K Shoes Helmet null null
L Basketball Shoes Shirt null
L Bike Helmet null null

I would like my output to look like:

CustID P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 PN
J Bike Shoes Shirt Pants Helmet null null null null null null null
K Shoes Helmet null null ........  null
L Basketball Shoes Shirt Bike Helmet null .... null

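I can set a very large value for the maximum P so that I never hit it, but bonus points if someone can tell me how to scan the dataset and set the maximum P to the largest number of unique purchases made by any single customer.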

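1 answer:

Answer 0 (score: 0)

How about something like this? Put every purchase in a single column, use nodupkey to remove repeated purchases per customer, then transpose back to one row per customer (PROC TRANSPOSE automatically creates as many columns as needed and names them COL1, COL2, and so on).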

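/*sample dataset*/
data want;
   infile datalines delimiter=' ';
   /* the default $ length is 8, which would truncate "Basketball" */
   length CustID $8 Purchase1-Purchase4 $20;
   input CustID $ Purchase1 $ Purchase2 $ Purchase3 $ Purchase4 $;
   datalines;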
J Bike Shoes Shirt Pants
J Shirt Pants null null
J Bike Helmet Pants null
K Shoes Helmet null null
L Basketball Shoes Shirt null
L Bike Helmet null null
;


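/*every purchase on the same column*/
data want01;
   /* declare the output variable; a variable named PURCHASE would be picked up by the purchase: list */
   length purchall $200;
   set want;
   array purc[*] purchase:;   /* Purchase1-Purchase4 */
   do i=1 to dim(purc);
      purchall=purc[i];
      output;               /* one row per customer and purchase */
   end;
   keep custid purchall;
run;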

/*delete repeated purchases and blanks*/
proc sort data=want01 out=want02 nodupkey;
   where purchall not in ('' 'null');
   by custid purchall;
run;

/*returning on a row based dataset*/ 
proc transpose data=want02 out=want03;
by custid;
var purchall;
run;
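
If the columns should be named P1, P2, ... as in the question rather than COL1, COL2, ..., a variant of the step above might look like the following. This is only a sketch: it assumes WANT02 was built as shown earlier, and WANT03P is a hypothetical output dataset name. Note that because WANT02 is sorted by custid and purchall, the purchases appear in alphabetical order within each customer, not in order of first appearance.

```sas
/* sketch: same transpose, but with P-prefixed column names */
/* WANT03P is a hypothetical dataset name chosen for this example */
proc transpose data=want02
               out=want03p(drop=_name_)   /* drop the automatic _NAME_ column */
               prefix=P;                  /* columns become P1, P2, ... */
   by custid;
   var purchall;
run;
```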

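If you only want the number of unique purchases per customer (the largest of which is the number of P columns needed), just apply proc freq to the WANT02 dataset (the one containing the unique purchases, without blanks and nulls).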

proc freq data=want02 noprint;
table custid /out=want04;
run;

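WANT04 will contain (the frequency is stored in the variable COUNT; PROC FREQ also writes a PERCENT column, omitted here):

CUSTID | COUNT |
----------------
 J     |     5 |
 K     |     2 |
 L     |     5 |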
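
To reduce the per-customer counts to the single maximum the question asks for, one option is to read it into a macro variable with PROC SQL. This is a sketch under assumptions: it relies on WANT04 as produced by the proc freq step above, MAXP is a hypothetical macro variable name, and the TRIMMED keyword needs SAS 9.3 or later.

```sas
/* sketch: store the largest per-customer count in a macro variable */
/* MAXP is a hypothetical macro variable name chosen for this example */
proc sql noprint;
   select max(count) into :maxp trimmed
   from want04;
quit;

/* write the value to the log */
%put NOTE: maximum number of unique purchases per customer = &maxp;
```

The macro variable can then be used wherever the upper bound is needed, for example as the dimension of an array such as `array p[&maxp] $200;`.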