SAS programming help - 经管之家

#1 (original post)

jerry0501 posted on 2013-11-18 01:56:12

Bounty: 100 forum coins

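How can I do the following data processing in SAS?
Suppose I have 100 observations and two variables, age and income. For each observation, I want the mean income of everyone in that observation's age band (for example, everyone whose age differs from it by no more than 2 years).

In more detail: I want to add a third variable, avg_income. For example, if the first person is 50, his avg_income is the mean income of everyone aged 48 to 52. If the second person is 30, his avg_income is the mean income of everyone aged 28 to 32, and so on.

I would also like the age gap to be configurable, for example changing 2 years to 5 years.

Thanks a lot.

(Sorry, my previous post did not include a bounty.)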

Best answer: 邓贵大


#2

邓贵大 posted on 2013-11-18 01:56:13

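/* Simulated data: 100 people with integer ages 19 to 78 */
data test;
  do id=1 to 100;
    age = 18+ceil(60*ranuni(12345));
    income = 100000+20000*rannor(12345);
    output;
  end;
run;
%let delta=2;   /* half-width of the age window, e.g. set to 5 */
/* Copy each row to every age within +/- &delta of its own age */
data repeat;
  set test;
  do age = age-&delta. to age+&delta.;
    output;
  end;
run;
/* Mean income per shifted age = mean over everyone inside that window */
/* NWAY keeps only the by-age rows and drops the overall (_TYPE_=0) row */
proc means data=repeat noprint nway;
  class age;
  var income;
  output out=mean_income mean=income;
run;
/* Attach the window mean back to each person by their own age */
proc sql;
  create table income as
  select a.*, b.income as avg_income
  from test a left join mean_income b on a.age=b.age
  order by a.id;
quit;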
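To see why the replication trick works, here is a tiny hand-checkable run of the same steps. The toy data set `toy` and its values are made up for illustration. Each row is copied to every age within +/- delta of its own age, so the PROC MEANS group for age 30 contains exactly the people aged 28 to 32.

```sas
/* Hypothetical toy data: two people share age 30 */
data toy;
  input id age income;
  datalines;
1 30 100
2 30 200
3 32 300
4 34 600
;
run;

%let delta=2;

/* Copy each row to every age within +/- &delta of its own age */
data toy_repeat;
  set toy;
  do age = age-&delta. to age+&delta.;
    output;
  end;
run;

/* Mean income for each shifted age; NWAY drops the overall row */
proc means data=toy_repeat noprint nway;
  class age;
  var income;
  output out=toy_mean(drop=_type_ _freq_) mean=avg_income;
run;

/* Look up each person by their own age */
proc sql;
  create table toy_result as
  select a.*, b.avg_income
  from toy a left join toy_mean b on a.age=b.age
  order by a.id;
quit;

proc print data=toy_result; run;
```

With delta=2 the window means work out by hand to 200 for the two 30-year-olds ((100+200+300)/3), 300 for the 32-year-old (1200/4) and 450 for the 34-year-old ((300+600)/2). Two people with the same age simply become two rows in the same CLASS group, so duplicate ages with different incomes are not a problem.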


Rated by 3 users	Academic	Helpfulness	Credibility	Comment
Tigflanker	+ 1	+ 1	+ 1	Dazzled again
牵你↗左手	+ 1	+ 1		What a unique approach
zhou.wen	+ 1	+ 1		You always make me shine at the moment

Total: Academic +3, Helpfulness +3, Credibility +1


#3

zhou.wen posted on 2013-11-18 16:00:09

Master Deng's code is not the most efficient,
but the idea is eye-opening.


#4

bobguy posted on 2013-11-19 07:26:44

Here is a simple solution.

data test;
      do id=1 to 10;
            age = 18+ceil(10*ranuni(12345));
            income = 100000+20000*rannor(12345);
            output;
      end;
run;

proc print;run;

proc sql;
  select a.id, a.age, mean(b.income) as average_inc
  from test a, test b
  where abs(a.age-b.age)<=2
  group by a.id, a.age
  order by a.id, a.age
  ;
quit;

Rated by 2 users	Academic	Helpfulness	Credibility	Comment
Tigflanker	+ 1	+ 1	+ 1	Good idea!
shenliang_111	+ 1	+ 1	+ 1	Good idea!

Total: Academic +2, Helpfulness +2, Credibility +2

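#5

牵你↗左手 posted on 2013-11-19 08:16:50

Quoting bobguy (posted 2013-11-19 07:26):
Here is a simple solution.

data test;

The code is shorter, but the SQL uses a Cartesian-product join: every row is compared with every other row (n² pairs for n rows), so it generates a lot of intermediate data and runs slowly.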

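#6

moyunzheng posted on 2013-11-20 14:52:22

With large data, just create an index on age.
Also, where abs(a.age-b.age)<=2 puts observations with missing age into every group: a missing age makes abs(a.age-b.age) missing, and SAS treats a missing value as smaller than any number, so the condition is true.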
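A minimal sketch of the self-join with both points from this thread addressed: the window half-width is a macro variable (the OP asked for a configurable gap), and rows with missing age are excluded explicitly. The output table name `sql_income` is hypothetical. The join still evaluates every pair of rows, so the pairwise-comparison cost mentioned above still applies, and people with a missing age drop out of the result (use a left join if they must be kept).

```sas
/* Simulated data, generated the same way as earlier in the thread */
data test;
  do id=1 to 100;
    age = 18+ceil(60*ranuni(12345));
    income = 100000+20000*rannor(12345);
    output;
  end;
run;

%let delta=2;   /* window half-width, e.g. set to 5 */

proc sql;
  create table sql_income as
  select a.id, a.age, mean(b.income) as avg_income
  from test a, test b
  where not missing(a.age)
    and not missing(b.age)   /* stop missing ages matching every window */
    and b.age between a.age-&delta. and a.age+&delta.
  group by a.id, a.age
  order by a.id;
quit;
```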

#7

wuyouheng posted on 2013-11-21 16:29:32

Master Deng's idea is brilliant.

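#8

lchw001 posted on 2013-11-23 12:55:28

Let me give it a try too. @邓贵大's code is brilliant! But I was not sure whether two people of the same age with different incomes would cause a problem, so I wrote a macro. It is a humble attempt. The test data are generated with @邓贵大's code.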
** Generate test data;
data test;
         do id=1 to 100;
                  age = 18+ceil(60*ranuni(12345));
                  income = 100000+20000*rannor(12345);
                  output;
         end;
run;
* Generate a one-row placeholder data set used as the PROC APPEND base;
data one;
input avg_income age;
cards;
. .
;
run;
** Note: max_age only needs to exceed the largest age in the data, and delta is the allowed age difference;
%macro est(max_age,delta);
data age_mean_income;
set one;
run;
%let j=1;
%do %while (&j.<=&max_age.);
/* everyone whose age is within &delta of &j */
data temp;
set test;
where &j.-&delta.<=age<=&j.+&delta.;
run;
proc means data=temp noprint mean;
var income;
output out=mean_income mean=avg_income;
run;
data mean_income;
set mean_income;
age=&j.;
drop _type_ _freq_;
run;
proc append base=age_mean_income data=mean_income;
run;
%let j=%eval(&j.+1);
%end;
proc sort data=age_mean_income;
by age;
run;
proc sort data=test;
by age;
run;
/* merge the window means onto test and keep only rows that come from test */
data income;
merge age_mean_income test(in=a);
by age;
if a;
run;
%mend;
%est(100,2);

#9

jjtww posted on 2013-12-20 11:21:33

You can use proc fcmp. The code is as follows:

data test;
do id=1 to 100;
age = 18+ceil(60*ranuni(12345));
income = 100000+20000*rannor(12345);
output;
end;
run;
proc fcmp outlib=work.functions.avg;
/* mean income of all rows in work.test whose age is within diff of the given age */
function avg0001(age,diff);
array d[1000,2]/nosymbols;
/* load the age and income columns of work.test into d */
rc=read_array('work.test',d,'age','income');
s=0;r=0;
do i=1 to dim(d);
/* test each row directly, because the data are not sorted by age */
if d[i,1] ne . and d[i,2] ne . and d[i,1]>=age-diff and d[i,1]<=age+diff then do;
r=r+1;
s=s+d[i,2];
end;
end;
if r=0 then avg=.;
else avg=s/r;
return (avg);
endsub;
run;quit;
options cmplib=work.functions;
data testPlus;
set test;
avg_income=avg0001(age,2);
run;


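The drawback of proc fcmp here is that every call re-reads the whole table with read_array, so the table is scanned once per row; with small data you will not notice.
With large data, e.g. ids in the hundreds of billions, proc fcmp can call a hash object, which is much more efficient.
Someone has studied this; link: http://support.sas.com/resources ... ings13/129-2013.pdf
In short, proc fcmp is a good proc.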
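A sketch of the lookup idea using a DATA step hash object rather than a hash inside PROC FCMP (swapped in here for simplicity; it is not the method from the linked paper). Incomes are first summarized once per age with PROC MEANS, so each row needs only 2*delta+1 hash lookups instead of a scan of the whole table. The data set names `age_tot` and `hash_income` are hypothetical.

```sas
/* Simulated data, generated the same way as earlier in the thread */
data test;
  do id=1 to 100;
    age = 18+ceil(60*ranuni(12345));
    income = 100000+20000*rannor(12345);
    output;
  end;
run;

%let delta=2;

/* One row per age: sum and count of non-missing income */
proc means data=test noprint nway;
  class age;
  var income;
  output out=age_tot(drop=_type_ _freq_) sum=tot n=cnt;
run;

data hash_income;
  if _n_ = 1 then do;
    /* define host variables k, tot, cnt without reading any rows */
    if 0 then set age_tot(rename=(age=k));
    declare hash h(dataset: 'age_tot(rename=(age=k))');
    h.defineKey('k');
    h.defineData('tot', 'cnt');
    h.defineDone();
  end;
  set test;
  s = 0;
  c = 0;
  if not missing(age) then do k = age-&delta. to age+&delta.;
    /* find() returns 0 when age k exists and loads tot and cnt */
    if h.find() = 0 then do;
      s = sum(s, tot);
      c = sum(c, cnt);
    end;
  end;
  if c > 0 then avg_income = s / c;
  else avg_income = .;
  drop k tot cnt s c;
run;
```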

#10

jingju11 posted on 2013-12-21 02:17:54

I used a data step + array to solve the problem. Here age has to be a positive integer. Since no table merge is involved, the efficiency is very high. How large a data set this code can handle depends on your data structure. I can run the code on a data set of 7 million records, 100 distinct BY levels, age from 1 to 101 (uniformly distributed), and 2 analysis variables in about 12 s.
Jingju

http://blog.sina.com.cn/s/blog_a3a926360101hh8z.html
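jingju11's actual code is on the linked blog and is not reproduced here. The following is only a minimal sketch of one way a data step + array approach can work, assuming integer ages between 0 and 150 (the bound `maxage` is a hypothetical parameter): a first pass accumulates income sums and counts by age in temporary arrays, and a second pass over the same data adds up the window for each row. No merge or sort is needed.

```sas
/* Simulated data, generated the same way as earlier in the thread */
data test;
  do id=1 to 100;
    age = 18+ceil(60*ranuni(12345));
    income = 100000+20000*rannor(12345);
    output;
  end;
run;

%let delta=2;
%let maxage=150;   /* assumed upper bound on integer age */

data array_income;
  array sum_by_age[0:&maxage.] _temporary_;   /* income sum per age */
  array cnt_by_age[0:&maxage.] _temporary_;   /* income count per age */

  /* Pass 1: accumulate sums and counts by integer age */
  do until (eof1);
    set test(keep=age income) end=eof1;
    if 0 <= age <= &maxage. and not missing(income) then do;
      sum_by_age[age] = sum(sum_by_age[age], income);
      cnt_by_age[age] = sum(cnt_by_age[age], 1);
    end;
  end;

  /* Pass 2: re-read the data and add up the window age-delta to age+delta */
  do until (eof2);
    set test end=eof2;
    tot = 0;
    cnt = 0;
    if 0 <= age <= &maxage. then
      do k = max(0, age-&delta.) to min(&maxage., age+&delta.);
        tot = sum(tot, sum_by_age[k]);
        cnt = sum(cnt, cnt_by_age[k]);
      end;
    if cnt > 0 then avg_income = tot / cnt;
    else avg_income = .;
    output;
  end;
  stop;
  drop tot cnt k;
run;
```

The two SET statements on the same data set keep independent read pointers, which is what lets one DATA step make both passes.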
