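First, the dataset js: it contains a unique identifier ID, the matching variables school, grade, class and age, and the indicator variable case, where 1 marks a case and 0 marks a control. The sampling requirement is that school, grade and class must be identical and age may differ by no more than 2 years.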
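To try the code on something small, a made-up test dataset can be built first. This is only a sketch of what js might look like: it assumes that id, school, grade, class and age are all numeric and that case is a 0/1 flag. None of the values come from real data.

```sas
/* Hypothetical test data: all values are invented for illustration only */
data js;
  input id school grade class age case;
  datalines;
1 1 3 2 9 1
2 1 3 2 10 0
3 1 3 2 8 0
4 1 3 2 13 0
5 1 4 1 10 1
6 1 4 1 11 0
7 2 3 2 9 1
8 2 3 2 9 0
;
run;
```

With these rows, the control with id 4 is in the same class as case 1 but is 4 years older, so it should never be drawn for that case.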
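Here is my approach. First, split the data into a case dataset and a control dataset. Then take the cases one at a time: for each case, select the controls that satisfy the matching conditions to form a candidate pool, draw 1 record at random from that pool, and remove the drawn record from the control dataset so it cannot be used again. Repeat this until every case has been processed, then combine the drawn records.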
The code is as follows:
- proc sort data=js out=a1;
- by school grade class case age;
- run;
- data a1; set a1;
- pp=catx('_',school,grade,class); /* class-level matching key pp, the delimiter keeps 1/12/3 distinct from 11/2/3 */
- run;
- data a_case a_control; set a1;
- if case=1 then output a_case; /* case dataset a_case */
- else if case=0 then output a_control; /* control dataset a_control */
- run;
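- %macro ss;
- %local i num idx xx nc pp_v age_v;
- proc datasets lib=work nolist nowarn; delete sample:; quit; /* clear samples from a previous run so the macro can be rerun */
- proc sql noprint;
- select count(*) into :num trimmed from a_case; /* number of cases */
- select id into :idx separated by ' ' from a_case; /* case IDs, assumed numeric, into macro variable idx */
- quit;
- %do i=1 %to &num;
- %let xx=%scan(&idx,&i,%str( )); /* ID of the i-th case */
- data _null_; set a_case; where id=&xx;
- call symputx("pp_v",pp); /* exact-match key of this case */
- call symputx("age_v",age); /* age of this case for the range match */
- run;
- data control&i; set a_control;
- if pp="&pp_v" and nmiss(age,&age_v)=0 and abs(age-&age_v)<=2; /* eligible controls: same class, age known and within 2 years */
- run;
- proc sql noprint; select count(*) into :nc trimmed from control&i; quit;
- %if &nc>0 %then %do; /* skip cases that have no eligible control left */
- proc surveyselect noprint data=control&i method=srs n=1 out=sample&i seed=%eval(1000+&i); run; /* n=1 gives 1:1 matching, the seed varies by case but stays reproducible */
- proc sql noprint; delete from a_control where id in (select id from sample&i); quit; /* remove the chosen control from the pool */
- %end;
- %end;
- data sample; set sample:; run; /* all matched controls */
- data hb; set a_case sample; run; /* combined dataset of cases and matched controls */
- proc datasets lib=work nolist; save a1 js a_case a_control sample hb; quit; /* drop the intermediate datasets */
- %mend;
- %ss;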
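After the macro has run, a quick sanity check can confirm that the matching behaved as intended. The steps below are a sketch that assumes the datasets sample and hb produced above; the column aliases n_selected and n_distinct_ids are my own naming. PROC FREQ is used for the second check because CASE is a reserved word in PROC SQL and is awkward to use there as a column name.

```sas
/* A control must not be drawn twice: the two counts should be equal */
proc sql;
  select count(*) as n_selected, count(distinct id) as n_distinct_ids
  from sample;
quit;

/* Cases versus matched controls in the combined dataset */
proc freq data=hb;
  tables case;
run;
```

Fewer controls than cases in the frequency table would mean that some cases had no eligible control left in the pool.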