- data a;
- input study_ID A01ab A02ab A03ab A04ab B01ab B02ab;
- cards;
- 1 1 0 0 0 0 0
- 1 1 0 1 0 0 0
- 1 1 0 1 0 1 0
- 2 1 0 0 0 0 0
- 2 0 1 0 0 0 0
- ;
- run;
- proc sort data=a out=in;
- by study_ID;
- run;
- proc print data=in;
- run;
- data out;
- set in;
- by study_ID;
- array rawvar {*} A01ab A02ab A03ab A04ab B01ab B02ab; /* Re-define the raw variables as elements of an array */
- array newvar {*} A01 A02 A03 A04 B01 B02; /* Define the new variables as elements of an array */
- do i = 1 to dim(rawvar);
- if first.study_ID then do; /* Reset the new variable to 0 at the first record of each study_ID */
- newvar{i} = 0;
- end;
- newvar{i} + rawvar{i}; /* Sum statement: the total is retained from the previous row, then the current raw value is added */
- if newvar{i} ^= 0 then newvar{i} = 1;
- end;
- if last.study_ID then output;
- keep study_ID A01 -- B02; /* Keep the ID so each output row can be traced back to its individual */
- run;
- proc print data=out;
- run;
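My programming approach is roughly as follows:
- Only an individual-level dataset is needed in the end, while the raw dataset has a "multiple rows for an individual" structure, so the LAST. variable can be used to write just the last row of each group to the output dataset.
- The DATA step processes the input row by row in an implicit loop, which makes it possible to accumulate a running total for each column. If the accumulated value at the last row of a group is greater than 0, that column contains at least one value of 1 within the group. Because a running total is needed, the SUM statement is used. I suggest first leaving out the "if last.study_ID then output;" statement and inspecting the resulting dataset, so that the programming logic can be verified.
- Finally, tidy the dataset as needed and keep only the required variables.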
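
The verification step suggested above can be sketched as a separate DATA step that repeats the same logic but writes every row, so the running flags can be inspected group by group. This is only a minimal sketch: the dataset name `debug` and the counter `row_in_group` are hypothetical names introduced here for illustration, and the step assumes the sorted dataset `in` created earlier.

```sas
/* Debugging sketch: same logic as the step that builds OUT, but every row
   is written. DEBUG and ROW_IN_GROUP are hypothetical names used only for
   illustration. */
data debug;
  set in;
  by study_ID;
  array rawvar {*} A01ab A02ab A03ab A04ab B01ab B02ab;
  array newvar {*} A01 A02 A03 A04 B01 B02;

  /* Row counter within each study_ID, to make the listing easier to read */
  if first.study_ID then row_in_group = 0;
  row_in_group + 1;

  do i = 1 to dim(rawvar);
    if first.study_ID then newvar{i} = 0;  /* reset at the start of each group */
    newvar{i} + rawvar{i};                 /* retained running total */
    if newvar{i} ^= 0 then newvar{i} = 1;  /* collapse the total to a 0/1 flag */
  end;

  drop i;
  /* No "if last.study_ID then output;" here, so all rows are kept */
run;

proc print data=debug;
run;
```

With the sample data, the last row printed for study_ID 1 should show the flags 1 0 1 0 1 0 and the last row for study_ID 2 should show 1 1 0 0 0 0. Those are the two rows that remain once the "if last.study_ID then output;" statement is added back.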