为简明起见,以 sashelp.class 数据集为例,要求以性别分组,分别求男、女身高总和。
sashelp.class 原表:
Obs Name Sex Age Height Weight
1 Joyce F 11 51.3 50.5
2 Louise F 12 56.3 77.0
3 Alice F 13 56.5 84.0
4 Jane F 12 59.8 84.5
5 Janet F 15 62.5 112.5
6 Carol F 14 62.8 102.5
7 Judy F 14 64.3 90.0
8 Barbara F 13 65.3 98.0
9 Mary F 15 66.5 112.0
10 James M 12 57.3 83.0
11 Thomas M 11 57.5 85.0
12 John M 12 59.0 99.5
13 Jeffrey M 13 62.5 84.0
14 Henry M 14 63.5 102.5
15 Robert M 12 64.8 128.0
16 William M 15 66.8 112.0
17 Ronald M 15 67.0 133.0
18 Alfred M 14 69.0 112.5
19 Philip M 16 72.0 150.0
方法1,first.varaiable
data height_sum1;
set sashelp.class; /* class数据集应先按sex sort*/
by sex;
if first.sex then tot_H=0;
tot_H + height;
if last.sex then output ;
run;
Obs Sex tot_H1 F 545.3
2 M 639.4
方法2, proc means
proc means data=class ; /* class数据集应先按sex sort*/
by sex;
output out=height_sum2 (drop=_type_ _freq_) sum(height)= /autoname;
run;
Height_
Obs Sex Sum
1 F 545.3
2 M 639.4
注1,也可以不用 /autoname命令,而自己指定变量名称 total_height :
proc means data=class ;by sex;
output out=height_sum3 (drop=_type_ _freq_) sum(height)=total_height;
run;
total_
Obs Sex height
1 F 545.3
2 M 639.4
注2,此处 by sex 可用 class sex 代替,数据集sashelp.class 也不必经过事先任何sort。其结果除了分组求和,还有额外一项,是两组的总和。
proc means data=class ;
class sex;
output out=height_sum4 (drop=_type_ _freq_) sum(height)=T_HEIGHT;
run;
Obs Sex T_HEIGHT
1 1184.7
2 F 545.3
3 M 639.4
方法3,proc summary
proc summary其实和上述proc means的用法一样,区别是,前者执行后不会自动给出分析的结果,必须用output/out 语句(或 print过程)来显示。
proc summary data=class ;
by sex;
output out=XXX (drop=_type_ _freq_) sum(height) = H_summary;
run;
Obs Sex H_summary
1 F 545.3
2 M 639.4
方法4,proc tabulate(数据集不用先 sort )
proc tabulate data=sashelp.class;
class sex;
var height;
table sex*height ;
run;
‚ Sex ‚
‚ F ‚ M ‚
‚ Height ‚ Height ‚
‚ Sum ‚ Sum ‚
‚ 545.30 ‚ 639.40 ‚
方法5,proc sql ( 数据集不用先 sort )
Proc sql;
SELECT sex,
sum(height) as tot_height from sashelp.class
GROUP BY sex;
quit;
Sex tot_height
F 545.3
M 639.4