Creating subsets with SAS - 经管之家

Original post

小宝爱波1314 posted on 2014-3-10 14:47:35

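I'd like to ask the experts here for help: how can I use SAS to split a dataset like the following into subsets?
ID    DATATYPE    ORI_DATA    COR_DATA
1     A           XXX         XYX
1     A           XYX         XXY
2     A           YYY         ZMZ
2     B           TTT         TRT
3     C           YYX         YYY
4     D           ZZZ         ZZX

Requirements:
Create the subsets by the datatype variable. For example, the first three rows above have datatype "A", so they go into one subset, and that subset is named A.
The real dataset is very large. The datatype values can be sorted, but producing the subsets with plain WHERE or IF statements would mean writing many near-identical lines. I would appreciate any help. Thanks in advance.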
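
For reference, the hand-written version that the question wants to avoid might look like the sketch below. It assumes the input dataset is named test (the name used in the replies) and hard-codes only the four datatype values from the sample, so every additional value would need another output dataset and another IF branch.

```sas
/* Hand-written split: one output dataset and one IF branch per datatype value.
   Assumes the input dataset is named test; only the four sample values are covered. */
data A B C D;
  set test;
  if datatype = 'A' then output A;
  else if datatype = 'B' then output B;
  else if datatype = 'C' then output C;
  else if datatype = 'D' then output D;
run;
```

The replies in this thread all generate this kind of code from the data instead of typing it out.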

#2

wwang111 posted on 2014-3-10 15:11:26

Assuming the dataset is named test:

/* Collect the distinct datatype values into one blank-separated macro variable */
proc sql noprint;
select distinct datatype into :typlist separated by " "
from test;
quit;

/* Approach 1: macro loop, one DATA step per datatype value */
%macro subset;
%local i;
%let i=1;
  %do %until(%scan(&typlist,&i)=);
data %scan(&typlist,&i);
   set test;
   where datatype="%scan(&typlist,&i)";
run;
  %let i=%eval(&i+1);
  %end;
%mend;
%subset

/* Approach 2: build a single DATA step as text and run it with CALL EXECUTE */
data _null_;
set test end=last;
by datatype;
length code $2000;
retain code;
if _n_=1 then code="data &typlist; set test;";
/* append one IF ... THEN OUTPUT statement per datatype value */
if last.datatype then code=trim(code)||"if datatype="||quote(strip(datatype))||" then output "
||strip(datatype)||';';
if last then do;
  code=trim(code)||'run;';
  call execute(code);
end;
run;

The CALL EXECUTE approach requires the dataset to be sorted by datatype.
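
If the data is not already in datatype order, the BY statement in the CALL EXECUTE step will stop with a "BY variables are not properly sorted" error. A minimal preparatory step could look like this sketch; test_sorted is a hypothetical name chosen here, and test is the dataset name assumed above.

```sas
/* Sort a copy of the data so that BY datatype is valid.
   test_sorted is a hypothetical name; test is the dataset name assumed above. */
proc sort data=test out=test_sorted;
  by datatype;
run;
```

The CALL EXECUTE step would then read `set test_sorted end=last;` in place of `set test end=last;`. Sorting into a copy leaves the original row order of test untouched.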

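#3

小宝爱波1314 posted on 2014-3-10 15:31:18

In reply to wwang111 (2014-3-10 15:11):

Hi, this is a bit complicated for me. I adapted it on my computer and got a lot of errors, and I can't tell what went wrong. I'm attaching the original dataset; could you take another look? I want to classify the data by datatype and create one subset per class, each saved separately.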

duplicate_document.xlsx (92.31 KB)

#5

nomad5 posted on 2014-3-10 19:32:04

/* Sort by datatype */
proc sort data=test;
by datatype;
run;
/* Number the datatype values consecutively: A=1, B=2, C=3, ... */
data test2;
set test end=final;
by datatype;
if first.datatype then order+1;
run;
/* Prepare the dataset used to build the macro variables */
proc sort data=test2 out=test_name nodupkey;
by order;
run;
/* Create the macro variables: name1=A, name2=B, name3=C, ..., max=number of datatype values */
data _null_;
set test_name end=final;
call symput("name"||strip(put(order,best.)),strip(datatype));
if final then call symput("max",strip(put(order,best.)));
run;
/* Building a format for the step above would actually be simpler */
%macro m;
%local i j;
data %do i=1 %to %eval(&max.);
&&name&i..
%end; ; /* two semicolons: the first closes %end, the second ends the DATA statement */
set test2;
%do j=1 %to %eval(&max.);
if order=&j. then output &&name&j..;
%end;
run;
%mend m;
%m;
/* Untested */

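#6

小宝爱波1314 posted on 2014-3-10 21:08:24

RE: Creating subsets with SAS

In reply to nomad5 (2014-3-10 19:32):

Sorry, it seems you may have misunderstood what I meant. I'll give you the dataset; could you take another look? I want to split the original dataset into many subsets by data_type, with rows that share the same data_type going into one subset, and each subset named after its data_type. I probably didn't explain this clearly before. Thank you, and sorry to ask you to look again.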

#7

小宝爱波1314 posted on 2014-3-10 21:09:07

In reply to nomad5 (2014-3-10 19:32):

I attached the dataset in my earlier reply above; please take a look.

#8

yongyitian posted on 2014-3-11 09:31:51

data test;
input ID DATATYPE $ ORI_DATA $ COR_DATA $;
datalines;
1 A XXX XYX
1 A XYX XXY
2 A YYY ZMZ
2 B TTT TRT
3 C YYX YYY
4 D ZZZ ZZX
; run;
/* using call execute */
proc sql;
create table types as
select distinct datatype as type
from test;
quit;
data _null_;
set types;
length ds $8.;
ds=cats('type_', compress(type));
call execute('data '||ds||'; set test_sort(where =(datatype="'||type||'")); run;');
run;
/* using hash */
proc sort data=test out=test_sort;
by datatype;
run;
data _null_ ;
declare hash h (ordered: 'a') ;
h.definekey ('datatype', '_n_') ;
h.definedata ('DATATYPE', 'ID', 'ORI_DATA', 'COR_DATA' ) ;
h.definedone ( ) ;
do _n_ = 1 by 1 until ( last.datatype ) ;
set test_sort;
by datatype ;
h.add() ;
end ;
h.output (dataset: 'Out_'|| compress(datatype)) ;
run ;

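#9

小宝爱波1314 posted on 2014-3-11 10:35:50

In reply to yongyitian (2014-3-11 09:31):

Hi, this is a bit complicated for me. I adapted it on my computer and got a lot of errors, and I can't tell what went wrong. I want to classify the data by datatype and create one subset per class, each saved separately.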

#10

yongyitian posted on 2014-3-11 11:10:44

Sorry, some errors were introduced while I was cleaning up and simplifying the code after testing.

Line 2 above seems to have a problem, possibly caused by copying and pasting text with different fonts between computers.

Line 2 should read:
input id datatype $ ori_data $ cor_data $;

Line 23 is not correct. It should read:
call execute('data '||ds||'; set test(where =(datatype="'||type||'")); run;');

The following is the corrected code.

data test;
input ID DATATYPE $ ORI_DATA $ COR_DATA $;
datalines;
1 A XXX XYX
1 A XYX XXY
2 A YYY ZMZ
2 B TTT TRT
3 C YYX YYY
4 D ZZZ ZZX
; run;
/* using call execute */
proc sql;
create table types as
select distinct datatype as type
from test;
quit;
data _null_;
set types;
length ds $8.;
ds=cats('type_', compress(type));
call execute('data '||ds||'; set test(where =(datatype="'||type||'")); run;');
run;
/* using hash */
proc sort data=test out=test_sort;
by datatype;
run;
data _null_ ;
declare hash h (ordered: 'a') ;
h.definekey ('datatype', '_n_') ;
h.definedata ('DATATYPE', 'ID', 'ORI_DATA', 'COR_DATA' ) ;
h.definedone ( ) ;
do _n_ = 1 by 1 until ( last.datatype ) ;
set test_sort;
by datatype ;
h.add() ;
end ;
h.output (dataset: 'Out_'|| compress(datatype)) ;
run ;

Here is the log:

235  data test;
236  input ID DATATYPE $ ORI_DATA $  COR_DATA $;
237  datalines;

NOTE: The data set WORK.TEST has 6 observations and 4 variables.
NOTE: DATA statement used (Total process time):
   real time          0.01 seconds
   cpu time          0.01 seconds

244  ; run;
245
246  /* using call execute */
247  proc sql;
248    create table types as
249    select distinct datatype as type
250    from test;
NOTE: Table WORK.TYPES created, with 4 rows and 1 columns.

251  quit;
NOTE: PROCEDURE SQL used (Total process time):
   real time          0.00 seconds
   cpu time          0.00 seconds

252
253  data _null_;
254    set types;
255    length ds $8.;
256    ds=cats('type_', compress(type));
257    call execute('data '||ds||'; set test(where =(datatype="'||type||'")); run;');
258  run;

NOTE: There were 4 observations read from the data set WORK.TYPES.
NOTE: DATA statement used (Total process time):
   real time          0.00 seconds
   cpu time          0.00 seconds

NOTE: CALL EXECUTE generated line.
1 + data type_A  ; set test(where =(datatype="A    ")); run;

NOTE: There were 3 observations read from the data set WORK.TEST.
   WHERE datatype='A    ';
NOTE: The data set WORK.TYPE_A has 3 observations and 4 variables.
NOTE: DATA statement used (Total process time):
   real time          0.01 seconds
   cpu time          0.00 seconds

2 + data type_B  ; set test(where =(datatype="B    ")); run;

NOTE: There were 1 observations read from the data set WORK.TEST.
   WHERE datatype='B    ';
NOTE: The data set WORK.TYPE_B has 1 observations and 4 variables.
NOTE: DATA statement used (Total process time):
   real time          0.00 seconds
   cpu time          0.00 seconds

3 + data type_C  ; set test(where =(datatype="C    ")); run;

NOTE: There were 1 observations read from the data set WORK.TEST.
   WHERE datatype='C    ';
NOTE: The data set WORK.TYPE_C has 1 observations and 4 variables.
NOTE: DATA statement used (Total process time):
   real time          0.00 seconds
   cpu time          0.00 seconds

4 + data type_D  ; set test(where =(datatype="D    ")); run;

NOTE: There were 1 observations read from the data set WORK.TEST.
   WHERE datatype='D    ';
NOTE: The data set WORK.TYPE_D has 1 observations and 4 variables.
NOTE: DATA statement used (Total process time):
   real time          0.00 seconds
   cpu time          0.00 seconds

259
260
261  /* using hash */
262  proc sort data=test out=test_sort;
263    by datatype;
264  run;

NOTE: There were 6 observations read from the data set WORK.TEST.
NOTE: The data set WORK.TEST_SORT has 6 observations and 4 variables.
NOTE: PROCEDURE SORT used (Total process time):
   real time          0.01 seconds
   cpu time          0.01 seconds

265
266  data _null_ ;
267 declare hash h (ordered: 'a') ;
268    h.definekey  ('datatype', '_n_') ;
269    h.definedata ('DATATYPE', 'ID', 'ORI_DATA', 'COR_DATA' ) ;
270    h.definedone ( ) ;
271 do _n_ = 1 by 1 until ( last.datatype ) ;
272    set test_sort;
273    by datatype ;
274    h.add() ;
275 end ;
276    h.output (dataset: 'Out_'|| compress(datatype)) ;
277  run ;

NOTE: The data set WORK.OUT_A has 3 observations and 4 variables.
NOTE: The data set WORK.OUT_B has 1 observations and 4 variables.
NOTE: The data set WORK.OUT_C has 1 observations and 4 variables.
NOTE: The data set WORK.OUT_D has 1 observations and 4 variables.
NOTE: There were 6 observations read from the data set WORK.TEST_SORT.
NOTE: DATA statement used (Total process time):
   real time          0.03 seconds
   cpu time          0.03 seconds
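
The generated lines in the log show the type value padded with trailing blanks (`"A    "`) and `ds` limited to 8 characters, which leaves only 3 characters for the type after the `type_` prefix. The padding is harmless for the comparison, as the observation counts in the log confirm, but the short length would truncate longer type values. The following is a sketch of one possible variant, not the poster's code: it widens `ds` to 32, the maximum length of a SAS dataset name, and strips the value before quoting it. The variable `code` is a name introduced here for illustration, and the sketch assumes WORK.TEST and WORK.TYPES exist as created above and that every type value contains only letters, digits, or underscores.

```sas
/* Sketch: same CALL EXECUTE idea with a wider dataset name and no padding.
   Assumes WORK.TEST and WORK.TYPES exist as created above, and that every
   type value is made up only of letters, digits, or underscores. */
data _null_;
  set types;
  length ds $32 code $200;           /* 32 = maximum SAS dataset name length */
  ds = cats('type_', type);          /* CATS removes leading and trailing blanks */
  /* CATX joins the pieces with single blanks, which SAS syntax tolerates */
  code = catx(' ', 'data', ds, ';',
              'set test(where=(datatype=', quote(strip(type)), '));',
              'run;');
  call execute(code);
run;
```

Type values that are not valid SAS names (for example, values containing blanks or starting with a digit after a different prefix) would still need to be mapped to valid names first.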

1 rating:
shenliang_111: academic level +1, helpfulness +1, credit rating +1. Reason: the use of hash here is spot on; I learned a lot.
