- CUSTOMER TIME PRODUCT
- 0 0 hering
- 0 1 corned_b
- 0 2 olives
- 0 3 ham
- 0 4 turkey
- 0 5 bourbon
- 0 6 ice_crea
- 1 0 baguette
- 1 1 soda
- 1 2 hering
- 1 3 cracker
- libname emlib 'C:\Users\Administrator\Desktop\emlib';
- proc dmdb batch data=emlib.assocs out=dmassoc dmdbcat=catassoc;
- id customer time;
- class product(desc);
- run;
- proc assoc data=emlib.assocs dmdbcat=catassoc
- out=datassoc(label='Output from Proc Assoc')
- items=5 support=20;
- cust customer;
- target product;
- run;
- proc rulegen in=datassoc
- out=datrule(label='Output from Proc Rulegen')
- minconf=75;
- run;
- proc print data=datrule;
- run;
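The association analysis code is shown above. It is quite simple and relies mainly on two procedures, PROC ASSOC and PROC RULEGEN.
1. PROC ASSOC generates all k-item sets and counts their frequencies. Its syntax is: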
- PROC ASSOC <option(s)>;
- CUSTOMER variable-list;
- TARGET variable;
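The CUSTOMER statement (written as CUST in the code above) and the TARGET statement specify the ID variable and the target variable, respectively.
One caveat applies when using this procedure. Quoting the official SAS documentation: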
- Processing an extremely large number of sets could cause your system to run out of disk and/or memory resources. However, by using a higher support level, you can reduce the item sets to a more manageable number.
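As a rough illustration of that advice, the PROC ASSOC step from the code above could be rerun with a larger `support=` value so that fewer item sets survive. This is only a sketch: the value 50 and the output name `datassoc_hi` are arbitrary choices made here for illustration and do not come from the original post. A suitable threshold depends on the data.

```sas
/* Same step as in the code above, with a stricter support threshold. */
/* support=50 and the name datassoc_hi are illustrative assumptions.  */
proc assoc data=emlib.assocs dmdbcat=catassoc
           out=datassoc_hi
           items=5 support=50;
   cust customer;
   target product;
run;
```

Comparing the number of observations in `datassoc_hi` with that in `datassoc` shows how much the higher threshold trims the item sets.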
2. PROC RULEGEN generates the association rules. Its syntax is also very simple:
- PROC RULEGEN <option(s)>;
3. The output produced by the run (shown only in part) mainly contains: support, confidence, lift, and rule.
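For reference, the three measures are commonly defined as follows for a rule A ⇒ B. These are the standard textbook definitions rather than something stated in the original post. The symbols N (total number of customers) and n(·) (number of customers whose purchases contain the given items) are notation introduced here, and SAS may report the values as percentages.

```latex
\begin{align*}
\operatorname{support}(A \Rightarrow B)    &= \frac{n(A \cup B)}{N} \\
\operatorname{confidence}(A \Rightarrow B) &= \frac{n(A \cup B)}{n(A)} \\
\operatorname{lift}(A \Rightarrow B)       &= \frac{\operatorname{confidence}(A \Rightarrow B)}{n(B)/N}
\end{align*}
```

A lift above 1 suggests that customers who buy A are more likely than average to buy B.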
Addendum (2013-11-3 17:13):
Original reference: http://support.sas.com/documenta ... iner/em43/assoc.pdf