楼主: godtears
5217 30

[原创博文] 如何运用SAS对下列数据进行处理 [推广有奖]

21
地狱小子 发表于 2009-8-22 17:36:30
考虑缺失值的情况

*间隔的天数;
%let interval=2;
data a;
input code day event x;
datalines;
1         03        0       3.3
1         04        1       -4
1         05        1       18
2         01        0        6
2         02        0       4
2         03        1        7
;
run;
proc sql;
        create table result as
                select code,day,event,x,
                        (select std(x)
                                  from a t2
                                where t2.code=t1.code and t2.day<t1.day and t2.day>=t1.day-&interval)
                        *(case (exists (select 1 from a t3 where t3.code=t1.code and t3.day=t1.day-&interval))
                                when 0 then . else 1 end)
as mean_&interval._includemissing
from a t1
where event=1
;
quit;

22
地狱小子 发表于 2009-8-22 17:43:07
Although SQL PROC of SAS is so disabled, it could do most things.

23
ddd1000 发表于 2009-8-23 09:41:12
The new SQL code still doesnot cover the situation if one of the data in the middle is missing.
If you consider and think thoroughly, the SQL will become complicated and run slowly.

anyway proc sql and data step each has strength and weakness.
My opinion is data step is more stronger but proc sql sometimes is good for quick and dirty solution and sometimes proc sql has strength for some problems solving.

24
ddd1000 发表于 2009-8-23 09:45:48
as to std, from statistics, you know how to calculate the standard deviation. it is
square root of [sum (x-mean)*(x-mean)/&n], so you can add code for std calculation.

25
ddd1000 发表于 2009-8-23 09:47:57
square root of [sum (x-mean)*(x-mean)]/&n

26
ddd1000 发表于 2009-8-23 10:00:27
* the program includine mean and standar deviation;


revised total program.
Hope answer your question and get the rewards.
A general method.
%let n=2;
%let m=%eval(&n+1);
data tem;
input code day event x;
datalines;
1         02        0       1.9
1         03        0       3.3
1         04        1       -4
1         05        1       18
2         01        0        6
2         02        0       4
2         03        1        7
;
run;
proc sort data=tem;
    by code day;
run;
data tem;
    set tem;
   array m[&m] a1-a&m;
  retain a1-a&m;
    by code day;
  if first.code then do;
      count=.;
      do i=1 to &m;
          m[i]=.;
       end;
  end;
  count+1;
  if count<=&m then m[count]=x;
  else do;
     do i=1 to &n;
         m[i]=m[i+1];
     end;
     m[&m]=x;
  end;
  do j=1 to &n;
     sum=sum+m[j];
  end;
  avg=sum/&n;

  do j=1 to &n;
     sumsq=sumsq+(m[j]-avg)**2;
  end;
  std=sumsq/&n;

run;

data out;
   set tem;
   if event=1;
run;

27
ddd1000 发表于 2009-8-23 10:04:50
std=sqrt(sumsq/&n);

28
地狱小子 发表于 2009-8-23 10:04:54
看来还是不能偷懒啊

*间隔的天数;
%let interval=2;
data a;
input code day event x;
datalines;
1         02        0       .
1         03        0       3.3
1         04        1       .
1         05        1       18
2         01        0        6
2         02        0       4
2         03        1        7
;
run;
proc sql;
        create table result as
                select code,day,event,x,
                        (select std(x)
                                  from a t2
                                where t2.code=t1.code and t2.day<t1.day and t2.day>=t1.day-&interval)
                        *(case (exists (select 1 from a t3 where t3.code=t1.code and t3.day=t1.day-&interval))
                                when 0 then . else 1 end)
                        *(case t1.x when . then . else 1 end)
                        as mean_&interval._includemissing
                        from a t1
                        where event=1
;
quit;

29
ddd1000 发表于 2009-8-23 10:21:21
I do not think it is correct.
not actually t1.x is missing is missing.
should be any record is missing within interval before the record.

30
godtears 发表于 2009-8-23 16:04:49
29# ddd1000

从我个人角度来看,
地狱小子的程序有点小缺陷,但就我现在的研究而言,不是致命的。并且,程序很简炼,很适合我这个初学者来看。
而ddd1000的程序很严谨,唯一不足的地方:不容易看懂。

您需要登录后才可以回帖 登录 | 我要注册

本版微信群
加好友,备注cda
拉您进交流群
GMT+8, 2025-12-25 07:12