楼主: jingju11
2364 15

A tricky data manipulation [推广有奖]

11
jingju11 发表于 2013-1-19 08:38:12
jingju11 发表于 2013-1-19 08:27
Thank Pobel again. To fit my code, i did some minor adjustment. To my surprise, there is no optimal  ...
  1. this is the code from myself. No significant gain to the previous. it is just for sharing. Jingju

  2. data rate_all2nonRrr2;
  3. set rate_all2nonRrr1;
  4. by clnt_no acct_ref_no;
  5. retain s0;
  6. if (first.acct_ref_no) then do;
  7. s0 =qtr_dt; s =0;
  8. end;
  9. s ++1;
  10. if (s >3) then delete; /*once s adds upto 3, delete any following records till the next id*/
  11. if (intnx('month', s0, 3*(s-1), 'e') ^=qtr_dt) then do;
  12. s0 =qtr_dt; s =1;
  13. end;
  14. drop s0;
  15. run;
  16. data rate_all2nonRrr3;
  17. do _n_ =1 by 1 until(last.acct_ref_no);
  18. set rate_all2nonRrr2;
  19. by clnt_no acct_ref_no;
  20. i1 =sum(i1,1);
  21. if (^i2) then if (s =2) then do; /* fetch 1st record @ s =2*/
  22. i3 =s; i4 =i1; i2 =1;
  23. end;
  24. end;
  25. i5 =i1; /*number of records for that id */
  26. i6 =s ; /*s @ last record for that id */
  27. i1 =0 ;
  28. do _n_ =1 by 1 until(last.acct_ref_no);
  29. set rate_all2nonRrr2;
  30. by clnt_no acct_ref_no;
  31. i1 =sum(i1,1);
  32. if (i6 =3) then do;/*when s =3 (existing 3-consecutive records) then output the last 3 records*/
  33. if (i1 >i5 -3) then output;
  34. end;
  35. /*if no 3-consecutive records exist then output 1st 2-consecutive records, otherwise no output*/
  36. else if (i3 =2) then if (i1 >i4-2 & i1 <=i4) then output;
  37. end;
  38. drop s i1-i6;
  39. run;
复制代码


12
jingju11 发表于 2013-1-21 13:31:33
jingju11 发表于 2013-1-19 08:38
在POBEL的程序里,第二步是根据生成的记录序列数来从大的数据集里选择并输出数据。这是个非常简单的sas问题。但是因为数据很大,这个很简单的方法也许值得探索一下。
  1. proc sql;
  2. select a.* from a, b where a.id = b.id
  3. ;
复制代码


我使用了若干种方法去实现这种很简单选择,以期望得到一个较有效率的方案。但是,有点出乎意料的是,几种方法效率基本相同。也就是说,sas在这个问题上没有很明显的优化。具体的结果参阅我的程序结果和分析
京剧

13
zhangzachary 发表于 2013-1-21 15:09:38
  1. data have1;
  2.   merge have have(firstobs=2 keep=id dt rename=(id=id1 dt=dt1)) have(firstobs=3 keep=id dt rename=(id=id2 dt=dt2));
  3.   if intnx('month',dt,3,'end')=dt1 and id=id1 then grp=2;
  4.   if intnx('month',dt,3,'end')=dt1 and intnx('month',dt1,3,'end')=dt2 and id=id1=id2 then grp=3;
  5.   if grp;
  6.   drop id1 id2;
  7. run;

  8. proc sort data=have1 out=have2 nodupkey ;
  9.   by id descending grp dt;
  10. run;

  11. data want;
  12.   set have2;
  13.   by id;
  14.   if first.id;
  15.   output;
  16.   dt=dt1;
  17.   _obs+1;
  18.   output;
  19.   if grp=3 then do;
  20.   dt=dt2;
  21.   _obs+1;
  22.   output;
  23.   end;
  24.   drop dt1 dt2 grp;
  25. run;
复制代码
我也写了一个,不过用了一次sort~
已有 1 人评分学术水平 热心指数 信用等级 收起 理由
jingju11 + 3 + 5 + 5 分析的有道理

总评分: 学术水平 + 3  热心指数 + 5  信用等级 + 5   查看全部评分

寒冰凤凰 My blog: http://blog.sina.com.cn/u/1058955485

14
pobel 在职认证  发表于 2013-1-21 15:13:02
jingju11 发表于 2013-1-21 13:31
在POBEL的程序里,第二步是根据生成的记录序列数来从大的数据集里选择并输出数据。这是个非常简单的sas问 ...
There might be another way to do the subsetting: the FORMAT+PUT() way (see codes below).
Maybe sometimes this method is a bit faster.

/*Previous code is omitted*/
              
        *** Get the needed observations;
        fmtname="OBSFMT"; label="OK";
        if last.acct_ref_no then do i =1 by 1 while (scan(_obs_need,i,',') ^=' ');
                start=input(scan(_obs_need,i,','), best.);
                output null_2;
        end;
run;

proc format cntlin=null_2;
run;


data null_3;
           set null_1;
        flag=put(_obs,obsfmt.);
        if flag="OK";
        drop flag;

run;
和谐拯救危机

15
jingju11 发表于 2013-1-22 01:37:58
zhangzachary 发表于 2013-1-21 15:09
我也写了一个,不过用了一次sort~
very good, in particular, for a data not too large. By the way, nodupkey in sort may be redundant.
JingJu

16
jingju11 发表于 2013-1-23 11:36:13
pobel 发表于 2013-1-21 15:13
There might be another way to do the subsetting: the FORMAT+PUT() way (see codes below).
Maybe so ...
Thanks for the advice. I added the proc format as one method in the test.
I had a chance today to rerun the program in our Unix Server  andreported and summarized the results here.
Any comments are welcome.
Jingju


您需要登录后才可以回帖 登录 | 我要注册

本版微信群
加好友,备注cda
拉您进交流群
GMT+8, 2025-12-24 23:16