楼主: 新人2012
42165 22

[学习分享] 有人问我,SAS中合并数据集有几种方法 [推广有奖]

11
龙潭丰乐 学生认证  发表于 2013-1-2 23:09:13
wangfengxi 发表于 2013-1-2 18:53
merge是按址合并,指针按by变量的顺序,不管数据集的先后顺序。个人理解。
不太懂,因为在 sas advance书上面是说循序的,不太懂

12
slimdell 发表于 2013-1-3 09:02:14
新人2012 发表于 2012-12-30 18:31
参与讨论的,我都给你们加点。
然后我的所有论坛币全分给你们,不过就21个,每楼5个,先来先得。
你的论坛币好像不止那么点吧
知乎专栏/微信公号同名:煮酒烹茶读书

13
kittyforever 发表于 2013-1-3 12:43:33
很有见解

14
zhuhongxu 发表于 2013-1-8 21:16:33
新人2012 发表于 2012-12-30 18:31
参与讨论的,我都给你们加点。
然后我的所有论坛币全分给你们,不过就21个,每楼5个,先来先得。
老师推荐李子奈的《计量经济学》和贺铿的《经济计量学》,我买了本李子奈的书。李子奈的书公式推导规范、重点突出,比较符合中国学生的口味。丁老师说李子奈的书不错,接着向我们推荐伍德里奇的《计量经济学导论:现代观点》,说最好看英文版的。我听了当时肠子都悔青了:本来开学的时候去了趟南湖找古扎拉蒂的计量,看到一排

15
xueyinchina 发表于 2013-5-23 19:07:19
冰雨狼 发表于 2012-12-31 11:27
还有一种是 PROC append ,不过这个是用来追加数据集的(也就是竖向的合并)。
示例:现有数据集A(包含变量  ...
请问第一句最后的那个force 是什么意思呀
心有多大,舞台就有多大。

16
chinaeu 发表于 2013-6-24 17:34:40
thankssssssssssssssssssssssss, muyou xiaoqianqiana

17
chinaeu 发表于 2013-6-24 17:37:55
学习了,谢谢了

18
dapangwang 发表于 2013-7-28 16:36:47
支持楼主

19
huahuaaiqingtia 发表于 2015-6-14 21:41:23
Data sets
(1)proc sql;
内外连接的实现的基本原理:首先生成两个数据表的迪卡尔积,然后再根据where语句来选择符合条件的数据作为输出结果
inner join: where a.XX=b.XX; 仅返回匹配的数据,最多可以有32个表同时进行内连接
left outer join: on a.XX=b.XX; 左表(a)的记录将会全部表示出来,而右表(b)只会显示符合搜索条件的记录(例子中为: A.aID = B.bID).
B表记录不足的地方均为NULL
right outer join: on a.XX=b.XX;
full outer join: on a.XX=b.XX;

(2)
One-to-One Reading:the new data set contains all the variables from all the input data sets. If the data sets contain same-named variables, the values that are read in from the last data set replace those that were read in from earlier ones
data one2one;
        set a;
        set b;
     run;
Concatenating:the new data set contains all the variables and the total number of observations from all input data sets
data concat;
        set a b;
     run;
Interleaving:input data set must be sorted or indexed in ascending order based on the BY variable(s),new data set contains all the variables and the total number of observations from all input data sets.
data interlv;
        set a b;
        by num;
     run;
Match-Merging: input data set must be sorted or indexed in ascending order based on the BY variable(s). During match-merging, SAS sequentially checks each observation of each data set to see whether the BY values match, then writes the combined observation to the new data set.
     data merged;
        merge a b;
        by num;
     run;


Data

Vertically:
Combining Raw Data Files Using a FILENAME Statement

     filename qtr1 ('c:\data\month1.dat''c:\data\month2.dat'
                    'c:\data\month3.dat');
     data work.firstqtr;
        infile qtr1;
        input Flight $ Origin $ Dest $
              Date : date9. RevCargo : comma15.2;
     run;

Combining Raw Data Files Using an INFILE Statement

     data quarter (drop=monthnum midmon lastmon);
        monthnum=month(today());
        midmon=month(intnx('month',today(),-1));
        lastmon=month(intnx('month',today(),-2));
        do i = monthnum, midmon, lastmon;
           nextfile="c:\sasuser\month"
                    !!compress(put(i,2.)!!".dat",' ');
           do until (lastobs);
              infile temp filevar=nextfile end=lastobs;
              input Flight $ Origin $ Dest $ Date : date9.
                    RevCargo : comma15.2;
              output;
           end;
        end;
        stop;
     run;

Combining SAS Data Sets Using PROC APPEND

     proc append base=work.acities
                 data=work.airports force;


Horizontally:
Combining Data with the IF-THEN/ELSE Statement
      data mylib.employees_new;
        set mylib.employees;
        if IDnum=1001 then Birthdate='01JAN1963'd;
        else if IDnum=1002 then Birthdate='08AUG1946'd;
        else if IDnum=1003 then Birthdate='23MAR1950'd;
        else if IDnum=1004 then Birthdate='17JUN1973'd;
     run;

Combining Data with the ARRAY Statement
      data mylib.employees_new;
        array birthdates{1001:1004} _temporary_ ('01JAN1963'd
              '08AUG1946'd '23MAR1950'd '17JUN1973'd);
        set mylib.employees;
        Birthdate=birthdates(IDnum);
     run;

Combining Data with the FORMAT Procedure
      proc format;
        value $birthdate '1001' = '01JAN1963'
                         '1002' = '08AUG1946'
                         '1003' = '23MAR1950'
                         '1004' = '17JUN1973';
     run;
     
     data mylib.employees_new;
        set mylib.employees;
        Birthdate=input(put(IDnum,$birthdate.),date9.);
     run;

Performing a DATA Step Match-Merge
      proc sort data=sasuser.expenses out=expenses;
        by flightid date;
     run;
         
     proc sort data=sasuser.revenue out=revenue;
        by flightid date;
     run;
         
     data revexpns (drop=rev1st revbusiness revecon
          expenses);
        merge expenses(in=e) revenue(in=r);
        by flightid date;
        if e and r;
        Profit=sum(rev1st, revbusiness, revecon,
               -expenses);
     run;
         
     proc sort data=revexpns;
        by dest;
     run;
         
     proc sort data=sasuser.acities out=acities;
        by code;
     run;
     
     data sasuser.alldata;
        merge revexpns(in=r) acities
              (in=a rename=(code=dest)
              keep=city name code);
        by dest;
        if r and a;
     run;

Performing a PROC SQL Join
      proc sql;
        create table sqljoin as
        select revenue.flightid,
               revenue.date format=date9.,
               revenue.origin, revenue.dest,
               sum(revenue.rev1st,
                   revenue.revbusiness,
                    revenue.revecon)
               -expenses.expenses as Profit,
               acities.city, acities.name
        from sasuser.expenses, sasuser.revenue,
             sasuser.acities
        where expenses.flightid=revenue.flightid
              and expenses.date=revenue.date
              and acities.code=revenue.dest
        order by revenue.dest, revenue.flightid,
                 revenue.date;
     quit;

Working with a Many-to-Many Match
      proc sql;
        create table flightemp as
        select flightschedule.*, firstname, lastname
           from sasuser.flightschedule, sasuser.flightattendants
           where flightschedule.empid=flightattendants.empid;
     quit;
         
     data fightemps3(drop=empnum jobcode)
        set sasuser.flightschedule;
        do i=1 to num;
           set sasuser.flightattendants
               (rename=(empid=empnum))
                nob=num point=i;
           if empid=empnum then output;
        end;
     run;

Combining Summary Data and Detail Data
      proc means data=sasuser.monthsum noprint;
        var revcargo;
        output out=sasuser.summary sum=Cargosum;
     run;

     data sasuser.percent1;
        if _n_=1 then set sasuser.summary
                          (keep=cargosum);
        set sasuser.monthsum
            (keep=salemon revcargo);
        PctRev=revcargo/cargosum;
     run;
      data sasuser.percent2(drop=totalrev);
        if _n_=1 then do until(lastobs);
           set sasuser.monthsum(keep=revcargo)
               end=lastobs;
           totalrev+revcargo;
        end;
        set sasuser.monthsum (keep=salemon revcargo);
        PctRev=revcargo/totalrev;
     run;

Using an Index to Combine Data
      data work.profit work.errors;
        set sasuser.dnunder;
        set sasuser.sale2000(keep=routeid
            flightid date rev1st revbusiness
            revecon revcargo)key=flightdate;
        if _iorc_=0 then do;
           Profit=sum(rev1st, revbusiness, revecon,
                  revcargo, -expenses);
           output work.profit;
        end;
        else do;
           _error_=0;
           output work.errors;
        end;
     run;

20
_appilepie_ 发表于 2016-11-9 09:56:44
使用set语句小心得:当合并的两个数据集变量顺序不一致时,set语句可以自动寻找相匹配的变量。如下边的两个数据集a和b使用set合并:
  1. data a;
  2.   input sex$ age height class;
  3.   cards;
  4.   boy 10 130 1
  5.   girl 11 125 1
  6.   boy 11 127 1
  7.   boy 10 131 1
  8.   ;
  9.   run;
  10.   data b;
  11.     input age height sex$ class;
  12.     cards;
  13.     12 137 boy 2
  14.     11 130 girl 2
  15.     13 145 boy 2
  16.     13 146 girl 2
  17. ;
  18. run;
  19. data c;
  20.   set a b;
  21. run;
复制代码

合并后c数据集如下:
c.png




您需要登录后才可以回帖 登录 | 我要注册

本版微信群
加好友,备注cda
拉您进交流群
GMT+8, 2026-1-7 23:43