How to generate simulated datasets with SAS

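Could someone help me with the following SAS programming problem? I have tried many times and cannot solve it on my own. Attachment: birth_data.xls

The attachment holds a dataset named birth_data. Its variables are patient and age, and it has N observations. I would like to use SAS to generate simulated datasets from it, with these requirements:

1) Randomly generate 0.1*N simulated values between the minimum and the maximum of age.

2) Randomly select 10% of the clean data in birth_data and replace them with the simulated values from the previous step. The selection is repeated 1000 times, and so is the replacement. This yields 1000 simulated versions of birth_data.

3) For each of the 1000 simulated datasets, compute the mean and variance of age, and collect these values in a dataset laid out as follows.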

  

Simulation degree   Simulation dataset order   mean   std
0.1                 1
0.1                 2
0.1                 3
0.1                 4
0.1                 5
0.1                 ...
0.1                 1000
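
The three steps above could be sketched roughly as follows. This is only one possible approach, not a definitive solution. It assumes the spreadsheet has already been imported as work.birth_data with a numeric variable age; the names nrep, rate, seed, work.simulated and work.sim_stats are made up for the illustration. Uniform draws are generally not whole numbers, so they would need rounding if age must stay an integer.

```sas
%let nrep = 1000;    /* number of simulated datasets (assumed name) */
%let rate = 0.1;     /* share of observations to replace (assumed name) */
%let seed = 12345;   /* arbitrary seed for reproducibility */

/* Minimum, maximum and number of observations of age */
proc sql noprint;
  select min(age), max(age), count(*)
    into :min_age, :max_age, :n
  from work.birth_data;
quit;

/* All replicates stacked in one dataset, identified by sim */
data work.simulated;
  call streaminit(&seed);
  do sim = 1 to &nrep;
    need = floor(&rate * &n);   /* observations still to be replaced */
    left = &n;                  /* observations not yet visited */
    do i = 1 to &n;
      set work.birth_data point=i;
      /* Sequential selection: picks exactly floor(rate*N) rows at random */
      if rand('uniform') < need / left then do;
        age = &min_age + (&max_age - &min_age) * rand('uniform');
        need = need - 1;
      end;
      left = left - 1;
      output;
    end;
  end;
  stop;                         /* required when reading with POINT= */
  drop need left;
run;

/* One row of summary statistics per replicate */
proc means data=work.simulated noprint nway;
  class sim;
  var age;
  output out=work.sim_stats(drop=_type_ _freq_) mean=mean std=std;
run;

/* Add the simulation degree and put the columns in the requested order */
data work.sim_stats;
  retain degree sim mean std;
  set work.sim_stats;
  degree = &rate;
run;
```

The requested table has a std column, so the sketch asks PROC MEANS for std=; var= in the OUTPUT statement would give the variance instead.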








Attachment: birth_data.xls (249.5 KB)

32
小宝爱波1314, posted 2014-9-23 08:48:17
Quote: yongyitian, posted 2014-9-23 08:21
Thanks, I have solved it as well.
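/* Reshape F and M into one count per (row, column) cell */
data Gender_kafang_1(keep=row column count);
        set Gender_kafang_1;
        retain row 1;                        /* row index, starts at 1 */
        column=1;
        count=F; output Gender_kafang_1;     /* cell (row, 1) holds F */
        column+1;
        count=M; output Gender_kafang_1;     /* cell (row, 2) holds M */
        row+1;                               /* move on to the next input row */
run;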


31
yongyitian, posted 2014-9-23 08:21:26
Quote: 小宝爱波1314, posted 2014-9-22 21:56
Hello, could you take a look at how to solve this problem? I need to run a chi-square test, so I have to convert dataset a into dataset b. I ...
data a;
        input F M;
        cards;
1381 2606
1390 2597
;
run;
/* One output row per cell: row = input row, column 1 = F, column 2 = M */
data a_b;
        set a;
        row = _n_;
        column = 1;  count = f;  output;
        column = 2;  count = m;  output;
        drop f m;
run;
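
Since the reshaped layout is wanted for a chi-square test, here is a minimal sketch of how such a row/column/count table is commonly passed to PROC FREQ. The dataset name work.a_long is hypothetical; it simply repeats the four cell counts from this thread so the example runs on its own.

```sas
/* Same layout as dataset b / a_b: one row per cell of the 2x2 table */
data work.a_long;
  input row column count;
  datalines;
1 1 1381
1 2 2606
2 1 1390
2 2 2597
;
run;

proc freq data=work.a_long;
  weight count;                /* each row stands for COUNT observations */
  tables row*column / chisq;   /* requests the chi-square tests */
run;
```

The WEIGHT statement is what lets PROC FREQ treat pre-counted cells as if the individual observations had been supplied.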

30
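小宝爱波1314, posted 2014-9-22 21:56:43
Quote: yongyitian, posted 2014-3-18 23:10
2. Randomly select 10% of the clean data in birth_data; these data will be replaced by the simulated data generated in the previous step.

I don't understand what this means ...
Hello, could you take a look at how to solve this problem? I need to run a chi-square test, so I have to convert dataset a into dataset b. The programs that create datasets a and b are below. Many thanks.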
data a;
        input  F M;
        cards;
1381 2606
1390 2597
;
run;
data b;
       do row=1 to 2;
          do column=1 to 2;
                input count@@;
          output;
          end;
       end;
cards;
1381 2606 1390 2597
;
run;


29
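小宝爱波1314, posted 2014-4-14 19:50:55
Quote: yongyitian, posted 2014-4-14 08:24
I made a few changes to your program. The modified program and its log are below. It runs, but it is certainly not what you want, because it uses different ...
Hello, I ran your program. With %do simu_num=1 %to 2, the final dataset contains only the result of the second iteration (simu_num=2); the result of the first iteration (simu_num=1) is missing. My own program had the same problem.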
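
One common reason for seeing only the last iteration is that every pass of the %do loop overwrites the same output dataset and nothing accumulates the results. Without the original program this is only a guess. The sketch below shows the accumulation pattern in isolation; the macro name accumulate and the datasets work.one_result and work.all_results are invented for the illustration.

```sas
%macro accumulate(nrep=3);
  /* Start clean so rows from an earlier run do not pile up */
  proc datasets lib=work nolist nowarn;
    delete all_results;
  quit;

  %do simu_num = 1 %to &nrep;
    /* Stand-in for the per-iteration result; overwritten on every pass */
    data work.one_result;
      simu_num = &simu_num;
      value = &simu_num * 10;
    run;

    /* Append before the next pass replaces work.one_result */
    proc append base=work.all_results data=work.one_result;
    run;
  %end;
%mend accumulate;

%accumulate(nrep=3)
```

After the call, work.all_results should hold one row per iteration. If the summary step reads a dataset that is recreated inside the loop rather than the accumulated one, only the last iteration survives.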


28
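jingju11, posted 2014-4-14 11:06:21
c[4] =c[4]-(sum(of c[*])-sampSize);
My intent with this line was to make sure the generated data has exactly the requested size. For the later data generation step you could also consider PROC SURVEYSELECT. In essence the task is just selecting different proportions of observations from two datasets.
京剧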
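
The PROC SURVEYSELECT idea might look roughly like this. It is a sketch under assumptions, not the poster's own code: it assumes work.birth_weight has a variable birth_weight and work.outlier_weight has a variable x, as in the programs in this thread. The sizes are illustrative, taken from the 3987 observations reported in the log and a 5% replacement rate, and the macro variable and output dataset names are invented.

```sas
%let nrep   = 1000;   /* number of replicates */
%let n_out  = 199;    /* outliers per replicate, about 5% of 3987 */
%let n_keep = 3788;   /* clean observations kept per replicate (3987 - 199) */

/* Draw the outlier part of every replicate in one call */
proc surveyselect data=work.outlier_weight(keep=x rename=(x=birth_weight))
                  out=work.out_part method=srs sampsize=&n_out
                  reps=&nrep seed=12345 noprint;
run;

/* Draw the clean part of every replicate in one call */
proc surveyselect data=work.birth_weight(keep=birth_weight)
                  out=work.clean_part method=srs sampsize=&n_keep
                  reps=&nrep seed=54321 noprint;
run;

/* Stack both parts; the Replicate variable identifies each simulated dataset */
data work.all_reps;
  set work.clean_part work.out_part;
run;

/* Mean and standard deviation per replicate */
proc means data=work.all_reps noprint nway;
  class replicate;
  var birth_weight;
  output out=work.rep_stats(drop=_type_ _freq_) mean=mean_weight std=std_weight;
run;
```

Because REPS= creates all replicates at once, no macro loop is needed, and the per-replicate statistics come from a single CLASS statement.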


27
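yongyitian, posted 2014-4-14 08:24:59
I made a few changes to your program. The modified program and its log are below. It runs, but it is certainly not what you want, because it uses a different sample_size and rate. To see exactly what was changed, compare the logs from before and after the modification.
The program that generates the outliers fails when I run it here. I don't know whether it runs without errors for you.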


%macro cond(cond1, cond2);
  when (c[&cond1] >0 and &cond2  ) do;
    c[&cond1] +-1;
    sampSize +-1;
    output;
    end;
%mend cond;


%let mean_age=3283.95;
%let std_age=563.1736630;
data work.outlier_weight;
  call streaminit(12345);
  sampSize =1000;
  array p[4] _temporary_(5 5 45 45);
  array c[4] _temporary_;
  do i =1 to dim(p);
    c [i] =ceil(sampSize *p[i]/100);
    end;
  c[4] =c[4]-(sum(of c[*])-sampSize);  /* trim any rounding overshoot so the counts sum to sampSize */
  mean = &mean_age;  
  std = &std_age;
  do until (sampSize <=0);
    x =rand('normal', mean, std);
    select;
      %cond(1,%str(x>mean+3*std                 ) )
      %cond(2,%str(x>0          and x<mean-3*std) )
      %cond(3,%str(x>mean-3*std and x<mean-2*std) )
      %cond(4,%str(x>mean+2*std and x<mean+3*std) )
      otherwise;
      end;
    end;
  stop;
  run;

  %macro simulation (rate1 ,rate2);

/*count number of observations in outlier dataset and create macro variable &n1 &n2*/
        proc sql noprint;
                select count(*) into :n1
                from work.outlier_weight;
        quit;
        proc sql noprint;
                select count(*) into :n2
                from work.birth_weight;
        quit;

/* compute the sample size and create macro variable &sample_size*/
        data _null_;
        sample_size=int(&n1*&rate1); put sample_size=;
        call symputx("sample_size",sample_size);
        run;

/*create random number and create a loop*/

        %let seed=12345;
        %do simu_num=1 %to 2;
        %let seed=%eval(&seed+&simu_num);
        %let obs=%eval(&n2-&sample_size);  

        %put n1=&n1;
        %put n2=&n2;
        %put sample_size=&sample_size;
        %put seed=&seed;
        %put obs=&obs;
/*sample numbers of data from outlier dataset randomly*/
        proc sql noprint outobs=&sample_size;
                create table simu_weight as
                select x as birth_weight
                from work.outlier_weight
                order by ranuni(&seed);
        quit;

/*create variable simu_num and rate in SAS dataset simu_weight*/
        data simu_weight;
                set simu_weight;
                simu_num=&simu_num;
        *        put simu_num;
                rate=&rate2;
        run;

/*sample numbers of data from birth_weight dataset randomly*/

/* one random sort key per observation is enough; no DO loop over &n2 is needed */
data birth_weight_new;
    set work.birth_weight;
    order=ranuni(&seed);
run;
proc sort data=birth_weight_new out=weight_random(drop=order);
    by order;
run;

data weight_random;
        set weight_random;
                simu_num=&simu_num;
        *        put simu_num;
                rate=&rate2;
run;

/*replace original data by simulation data*/
        data sample;
                set weight_random(obs=&obs) simu_weight;
        run;
/* proc sql noprint;
        create table all
    like sample        ;
quit; */

  proc append base=all data=sample force;
        run;
        %put loop=&simu_num;
    %end;

  proc sql;
    create table final as
    select rate, simu_num, avg(birth_weight) as mean_weight, std(birth_weight) as std_weight
    from all
    group by rate, simu_num   /* group by every non-aggregated column to avoid remerging */
    order by simu_num;
  quit;

   proc datasets lib=work nolist;
    delete all;
   quit;
%mend;
%simulation (1, 0.05)

-----    LOG   -----
381  %macro cond(cond1, cond2);
382    when (c[&cond1] >0 and &cond2  ) do;
383      c[&cond1] +-1;
384      sampSize +-1;
385      output;
386      end;
387  %mend cond;
388
389
390  %let mean_age=3283.95;
391  %let std_age=563.1736630;
392  data work.outlier_weight;
393    call streaminit(12345);
394    sampSize =1000;
395    array p[4] _temporary_(5 5 45 45);
396    array c[4] _temporary_;
397    do i =1 to dim(p);
398      c [i] =ceil(sampSize *p[i]/100);
399      end;
400  *  c[4] =c[4]-(sum(of c-sampSize);
401    mean = &mean_age;
402    std = &std_age;
403    do until (sampSize <=0);
404      x =rand('normal', mean, std);
405      select;
406        %cond(1,%str(x>mean+3*std                 ) )
407        %cond(2,%str(x>0          and x<mean-3*std) )
408        %cond(3,%str(x>mean-3*std and x<mean-2*std) )
409        %cond(4,%str(x>mean+2*std and x<mean+3*std) )
410        otherwise;
411        end;
412      end;
413    stop;
414    run;

NOTE: The data set WORK.OUTLIER_WEIGHT has 1000 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds


415
416    %macro simulation (rate1 ,rate2);
417
418  /*count number of observations in outlier dataset and create macro variable &n1 &n2*/
419      proc sql noprint;
420          select count(*) into :n1
421          from work.outlier_weight;
422      quit;
423      proc sql noprint;
424          select count(*) into :n2
425          from work.birth_weight;
426      quit;
427
428  /* certian sample size and create macro variable &sample_size*/
429      data _null_;
430      sample_size=int(&n1*&rate1); put sample_size=;
431      call symputx("sample_size",sample_size);
432      run;
433
434  /*create random number and create a loop*/
435
436      %let seed=12345;
437      %do simu_num=1 %to 2;
438      %let seed=%eval(&seed+&simu_num);
439      %let obs=%eval(&n2-&sample_size);
440
441      %put n1=&n1;
442      %put n2=&n2;
443      %put sample_size=&sample_size;
444      %put seed=&seed;
445      %put obs=&obs;
446  /*sample numbers of data from outlier dataset randomly*/
447      proc sql noprint outobs=&sample_size;
448          create table simu_weight as
449          select x as birth_weight
450          from work.outlier_weight
451          order by ranuni(&seed);
452      quit;
453
454  /*create variable simu_num and rate in SAS dataset simu_weight*/
455      data simu_weight;
456          set simu_weight;
457          simu_num=&simu_num;
458      *   put simu_num;
459          rate=&rate2;
460      run;
461
462  /*sample numbers of data from brith_weight dataset randomly*/
463
464  data birth_weight_new;
465
466      set work.birth_weight;
467      do i=1 to &n2;
468          order=ranuni(&seed);
469      end;
470  run;
471  proc sort data=birth_weight_new out=weight_random(drop=order i);
472      by order;
473  run;
474
475  data weight_random;
476      set weight_random;
477          simu_num=&simu_num;
478      *   put simu_num;
479          rate=&rate2;
480  run;
481
482  /*replace oringnal data by simulation data*/
483      data sample;
484          set weight_random(obs=&obs) simu_weight;
485      run;
486  /* proc sql noprint;
487      create table all
488      like sample ;
489  quit; */
490
491    proc append base=all data=sample force;
492      run;
493      %put loop=&simu_num;
494      %end;
495
496    proc sql;
497      create table final as
498      select distinct rate, simu_num, avg(birth_weight) as mean_weight, std(birth_weight) as
498! std_weight
499      from all
500      group by simu_num
501      order by simu_num;
502    quit;
503
504     proc datasets lib=work nolist;
505      delete all;
506     quit;
507  %mend;
508  %simulation (1, 0.05)
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


NOTE: PROCEDURE SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds



sample_size=1000
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


n1=    1000
n2=    3987
sample_size=1000
seed=12346
obs=2987
NOTE: The query as specified involves ordering by an item that doesn't appear in its SELECT
      clause.
NOTE: Table WORK.SIMU_WEIGHT created, with 1000 rows and 1 columns.

NOTE: PROCEDURE SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds



NOTE: There were 1000 observations read from the data set WORK.SIMU_WEIGHT.
NOTE: The data set WORK.SIMU_WEIGHT has 1000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds



NOTE: There were 3987 observations read from the data set WORK.BIRTH_WEIGHT.
NOTE: The data set WORK.BIRTH_WEIGHT_NEW has 3987 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           2.25 seconds
      cpu time            2.25 seconds



NOTE: There were 3987 observations read from the data set WORK.BIRTH_WEIGHT_NEW.
NOTE: The data set WORK.WEIGHT_RANDOM has 3987 observations and 2 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds



NOTE: There were 3987 observations read from the data set WORK.WEIGHT_RANDOM.
NOTE: The data set WORK.WEIGHT_RANDOM has 3987 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds



NOTE: There were 2987 observations read from the data set WORK.WEIGHT_RANDOM.
NOTE: There were 1000 observations read from the data set WORK.SIMU_WEIGHT.
NOTE: The data set WORK.SAMPLE has 3987 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds



NOTE: Appending WORK.SAMPLE to WORK.ALL.
NOTE: BASE data set does not exist. DATA file is being copied to BASE file.
NOTE: There were 3987 observations read from the data set WORK.SAMPLE.
NOTE: The data set WORK.ALL has 3987 observations and 5 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


loop=1
n1=    1000
n2=    3987
sample_size=1000
seed=12348
obs=2987
NOTE: The query as specified involves ordering by an item that doesn't appear in its SELECT
      clause.
NOTE: Table WORK.SIMU_WEIGHT created, with 1000 rows and 1 columns.

NOTE: PROCEDURE SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds



NOTE: There were 1000 observations read from the data set WORK.SIMU_WEIGHT.
NOTE: The data set WORK.SIMU_WEIGHT has 1000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds



NOTE: There were 3987 observations read from the data set WORK.BIRTH_WEIGHT.
NOTE: The data set WORK.BIRTH_WEIGHT_NEW has 3987 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           2.40 seconds
      cpu time            2.20 seconds



NOTE: There were 3987 observations read from the data set WORK.BIRTH_WEIGHT_NEW.
NOTE: The data set WORK.WEIGHT_RANDOM has 3987 observations and 2 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds



NOTE: There were 3987 observations read from the data set WORK.WEIGHT_RANDOM.
NOTE: The data set WORK.WEIGHT_RANDOM has 3987 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds



NOTE: There were 2987 observations read from the data set WORK.WEIGHT_RANDOM.
NOTE: There were 1000 observations read from the data set WORK.SIMU_WEIGHT.
NOTE: The data set WORK.SAMPLE has 3987 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds



NOTE: Appending WORK.SAMPLE to WORK.ALL.
NOTE: There were 3987 observations read from the data set WORK.SAMPLE.
NOTE: 3987 observations added.
NOTE: The data set WORK.ALL has 7974 observations and 5 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


loop=2
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.FINAL created, with 2 rows and 4 columns.

NOTE: PROCEDURE SQL used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds



NOTE: Deleting WORK.ALL (memtype=DATA).
NOTE: PROCEDURE DATASETS used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds





26
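小宝爱波1314, posted 2014-4-13 14:30:11
Quote: yongyitian, posted 2014-3-19 11:20
Hello, I would like to ask you to modify the program you gave me last time. I have been trying to change it myself for two weeks, since before the holiday, and have read a book on macros, but I still cannot get it right. Could you please take a look?
The dataset is birth_data. Its variables are patient and age, and it has N observations.
1) Randomly select rate*N observations from the outlier dataset (the original program randomly generated 0.1*N simulated values between the minimum and maximum of age). The program that generates outlier is attached at the end. outlier contains 100000 values in total.
2) Randomly select 10% of the clean data in birth_data and replace them with the simulated data from the previous step. The selection is repeated 1000 times, and so is the replacement. This yields 1000 simulated versions of birth_data.
3) For each of the 1000 simulated datasets, compute the mean and variance of age, and collect these values in a dataset laid out as in the original post.
Apart from step 1, these requirements are the same as in the original post.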
%macro cond(cond1, cond2);
  when (c[&cond1] >0 and &cond2  ) do;
    c[&cond1] +-1;
    sampSize +-1;
    output;
    end;
%mend cond;
data B1841039.outlier_weight;
  call streaminit(12345);
  sampSize =100000;
  array p[4] _temporary_(5 5 45 45);
  array c[4] _temporary_;
  do i =1 to dim(p);
    c[i] =ceil(sampSize *p[i]/100);
    end;
  c[4] =c[4]-(sum(of c[*])-sampSize);
  mean = mean_age; * the mean and variance of age were entered here by hand;
  std = std_age;
  do until (sampSize <=0);
    x =rand('normal', mean, std);
    select;
      %cond(1,%str(x>mean+3*std                 ) )
      %cond(2,%str(x>0          and x<mean-3*std) )
      %cond(3,%str(x>mean-3*std and x<mean-2*std) )
      %cond(4,%str(x>mean+2*std and x<mean+3*std) )
      otherwise;
      end;
    end;
  stop;
  run;
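
The adjustment of c[4] only matters when rounding up with ceil makes the category counts overshoot the requested size. A small worked example, with a sample size and percentages invented purely for illustration, shows the effect:

```sas
data _null_;
  sampSize = 10;                           /* hypothetical small sample size */
  array p[4] _temporary_ (33 33 17 17);    /* hypothetical percentages */
  array c[4] _temporary_;
  do i = 1 to dim(p);
    c[i] = ceil(sampSize * p[i] / 100);    /* 4, 4, 2, 2 */
  end;
  before = sum(of c[*]);                   /* 12: two more than requested */
  c[4] = c[4] - (sum(of c[*]) - sampSize); /* remove the overshoot from the last category */
  after = sum(of c[*]);                    /* 10 */
  put before= after=;
run;
```

With the percentages 5, 5, 45, 45 and a sample size of 100000 used in the program above, every product is already a whole number, so the correction subtracts zero.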

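Quote: yongyitian, posted 2014-4-2 10:04
I really cannot solve this one. I spent all of yesterday afternoon debugging it and tried everything I have learned, and I have been at it again since 6 this morning without success, so I can only ask you for help.

Quote: yongyitian, posted 2014-4-2 10:04
Attachments: 新建 Microsoft Word Document (2).docx (14.01 KB), LT_3DKO9[7}5%`1(UA4__WD.jpg
Hello, I wrote a program following the approach you suggested and tested it with the loop %do simu_num=1 %to 10; but the final result contains only the data from the last iteration, and the data from the earlier iterations are missing. The log reports no errors, and I cannot find what went wrong. Could you please take a look? The program is in the attached Word document. Thank you.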
