First, generate the simulated dataset sim_dat.
- data sim_dat;
- call streaminit(1234);
- do id = 1 to 500;
- ps_score = rand('uniform', 0.3, 0.7);
- treat = (id <= 150); /* 1 for the first 150 subjects, 0 for the remaining 350 */
- output;
- end;
- run;
- proc print data=sim_dat (firstobs=148 obs=152); run;
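1. Resampling with replacement
The bootstrap usually relies on sampling with replacement: resample the original sample to form 1000 bootstrap samples, then compute the statistic of interest on each bootstrap sample, which yields 1000 values of that statistic.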
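- data post_sim_dat (drop=i);
- /* draw 1000 bootstrap samples of 500 observations each (500 = size of sim_dat) */
- do i=1 to 500*1000;
- sample_id=ceil(i/500); /* draws 1-500 form sample 1, draws 501-1000 form sample 2, and so on */
- pickit=ceil(ranuni(666)*totobs); /* random row number in 1..totobs; 666 is the random seed */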
- set sim_dat point=pickit nobs=totobs;
- output;
- end;
- stop;
- label sample_id='Bootstrap sample number, used in later analysis'
- ;
- run;
- proc print data=post_sim_dat (firstobs=498 obs=502);
- title '1000 Bootstrap Samples';
- run;
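The "one statistic per bootstrap sample" step described above can be sketched as follows. This is only a minimal illustration: it assumes the statistic of interest is the mean of ps_score, and the names boot_stats and boot_mean are hypothetical, not part of the original code.

```sas
/* Sketch: one statistic per bootstrap sample.                      */
/* Assumption: the statistic of interest is the mean of ps_score.   */
/* boot_stats and boot_mean are illustrative names.                 */
/* post_sim_dat is already ordered by sample_id, because the DATA   */
/* step wrote the samples one after another, so BY works without    */
/* a prior PROC SORT.                                               */
proc means data=post_sim_dat noprint;
  by sample_id;
  var ps_score;
  output out=boot_stats (drop=_type_ _freq_) mean=boot_mean;
run;

/* boot_stats has one row per bootstrap sample, 1000 rows in total */
proc print data=boot_stats (obs=5);
run;
```

Any other statistic can be swapped in by changing the VAR and OUTPUT statements, or by replacing PROC MEANS with another procedure that supports a BY statement.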
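2. Resampling without replacement
Add an auxiliary column of random numbers and sort the data by that column; the result is a random permutation of the original rows.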
- data post_sim_dat_2;
- set sim_dat;
- index=ranuni(666); /* auxiliary random sort key; 666 is the random seed */
- run;
- proc sort data=post_sim_dat_2; by index; run;
- data post_sim_dat_2;
- new_id=_n_; /* position of the row after shuffling */
- set post_sim_dat_2 (drop=index);
- run;
- proc print data=post_sim_dat_2 (obs=4); run;
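The sort-by-random-key approach above shuffles all 500 rows. If only a subsample is needed, PROC SURVEYSELECT with METHOD=SRS (simple random sampling without replacement) is one possible alternative. The sketch below is not the method used above; the sample size of 100 and the output name srs_sample are assumptions made for illustration.

```sas
/* Sketch: draw 100 rows from sim_dat without replacement.          */
/* Assumptions: the sample size (100) and the output dataset name   */
/* (srs_sample) are illustrative choices.                           */
proc surveyselect data=sim_dat out=srs_sample
                  method=srs    /* simple random sampling, no replacement */
                  sampsize=100  /* number of rows to select */
                  seed=666;     /* random seed, for reproducibility */
run;

proc print data=srs_sample (obs=4);
run;
```

Taking the first 100 rows of post_sim_dat_2 would give an equivalent subsample, since a prefix of a random permutation is itself a simple random sample without replacement.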