----------------------------------------------------------------------------------
其实一直想写这么一个宏,今天得空写了一下,一开始觉得很难,着手写的时候发现其实挺简单。
目的:
用于从指定数据集中按组抽取代表情况。
这样,我们在对一个数据集进行某种规则时,可以先抽取一个全部情况的样本,看看自己的逻辑是否完备。
描述:
大伙都用过PROC SORT的nodupkey吧?他是按照给定的几个字段对数据集进行一条一条的去重;
可是有时我们需要对数据集进行一组一组的去重,以获得一个有代表性的样本数据集。
如下例:
| id | group1 | group2 | value |
1 | 1 | 1 | 1 |
1 | 2 | 1 | 2 |
1 | 3 | 1 | 3 |
1 | 3 | 2 | 4 |
1 | 4 | 1 | 5 |
1 | 5 | 1 | 6 |
2 | 11 | 1 | 7 |
2 | 22 | 2 | 8 |
2 | 33 | 3 | 9 |
3 | 1 | 1 | 10 |
3 | 2 | 1 | 11 |
3 | 3 | 1 | 12 |
3 | 3 | 2 | 13 |
3 | 4 | 1 | 14 |
3 | 5 | 1 | 15 |
4 | 11 | 1 | 16 |
4 | 22 | 2 | 17 |
4 | 33 | 3 | 18 |
5 | 11 | 0 | 19 |
5 | 22 | 2 | 20 |
5 | 33 | 3 | 21 |
这个例子,如果我们按照group1 + group2来看,id:1、2、5就可以代表这个数据集的不同情况
按照group1来看,id:1、2就可以代表这个数据集的不同情况。
程序:
- %macro extract_sample(in = have, out = result, id = id, group = group1 group2);
- proc sql;
- create table extract_sample1 as
- select *, count(*) as extract_sample_N
- from &in.(keep = &id. &group.)
- group by &id.
- order by &id., %sysfunc(tranwrd(%sysfunc(compbl(&group.)),%str( ),%str(,)))
- ;
- quit;
- data extract_sample1;
- set extract_sample1;
- by &id.;
- retain extract_sample_id;
- if first.&id. then extract_sample_id = 1;
- else extract_sample_id + 1;
- run;
- proc sort data = extract_sample1 out = extract_sample2 nodupkey;
- by extract_sample_N extract_sample_id &group.;
- run;
- proc sql;
- create table &out. as
- select * from &in.
- where &id. in (select &id. from extract_sample2)
- ;
- quit;
- proc datasets lib = WORK memtype = data nolist;
- delete extract_sample:;
- quit
- %mend extract_sample;
- %extract_sample(in = have, out = result, id = id, group = group1 group2);



雷达卡






京公网安备 11010802022788号







