[Original post] How to run repeated linear regressions on a large batch of data

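#11

sxlion posted on 2011-9-29 23:09:40

This is simple. Repetitive work like this is exactly where macros come in handy.
Even with 10,000 regressions, it's so easy!

Of course, with IML it might run faster, and the code might be simpler.

SAS resources and news, SAS general-information blog: saslist.net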

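#12

sherrysmile posted on 2011-9-30 02:04:45

crazygoing posted on 2011-9-29 09:07
I spent some time writing code for you. I tested it and it should be fine. Adjust the details to fit your own needs.

Thank you, crazygoing, that's impressive! Much appreciated! I ran it and it works. The macro language really is convenient; I should study it more when I have time.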

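#13

sherrysmile posted on 2011-9-30 02:09:17

yunqingwang posted on 2011-9-29 22:16
With several thousand regressions, handling the results with macros isn't easy either.
IML and modular programming are so nice; why does nobody use them?

How would the IML program you mention be written? If it's convenient, could you sketch some code based on the sample data I provided? I'd like to learn more about anything in SAS. Thanks.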

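#14

情迷仲夏夜 posted on 2011-9-30 02:11:04

yunqingwang posted on 2011-9-29 22:16
With several thousand regressions, handling the results with macros isn't easy either.
IML and modular programming are so nice; why does nobody use them?

Since you say it's better written with IML modules, please show us the program you wrote!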

#15

jingju11 posted on 2011-9-30 05:51:57

A quick plug: SXLION is one of the highly skilled SAS users on this forum, both in theory and in practice.

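#16

denver posted on 2011-9-30 08:25:22

yunqingwang posted on 2011-9-29 22:16
With several thousand regressions, handling the results with macros isn't easy either.
IML and modular programming are so nice; why does nobody use them?

Looking forward to your IML program.

Denver's "Let's Read Papers Together" series index thread:
https://bbs.pinggu.org/thread-1430892-1-1.html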

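#17

sherrysmile posted on 2011-9-30 09:48:07

jingju11 posted on 2011-9-30 05:51
A quick plug: SXLION is one of the highly skilled SAS users on this forum, both in theory and in practice.
I personally use a lot of macro programs. Now ...

Actually, my goal in using PROC REG is just to get the coefficients and p-values for each dependent variable, plus the R-square and adjusted R-square of each regression. Because there are so many independent variables, several thousand of them, running plain PROC REG ends with an "output window full" message. Even if I could get all the output, copying and pasting the results one by one would be inconvenient. Ideally, a program would turn the statistics for all the independent variables in the output directly into a single summary table. The code crazygoing provided earlier solves this problem. The OUTEST= option that jingju11 suggested also solves it. Many thanks to everyone for the guidance!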
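For readers who want the summary-table idea in concrete form, here is a minimal sketch of one common way to do it: BY-group processing with ODS OUTPUT. It is not the code crazygoing or jingju11 posted. It assumes a data set named a1 with the variables ticker, return, smb and hml (names taken from the sample data used elsewhere in this thread); the output data set names coef_all and fit_all are hypothetical.

```sas
/* Assumption: a1 holds ticker, return, smb, hml (one row per date and ticker). */
proc sort data=a1 out=a1_sorted;
   by ticker;
run;

ods exclude all;                         /* show nothing, so the output window cannot fill up */
ods output ParameterEstimates=coef_all   /* estimates, t values and p-values for each ticker */
           FitStatistics=fit_all;        /* includes R-square and adjusted R-square for each ticker */
proc reg data=a1_sorted;
   by ticker;                            /* one regression per ticker */
   model return = smb hml;
run;
quit;
ods exclude none;                        /* restore normal output */
```

Both output data sets carry the BY variable ticker, so they can be merged or exported as one table instead of copying results by hand.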

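#18

yunqingwang posted on 2011-9-30 10:26:30

/* I haven't touched IML in my six months at work; I used it in graduate school */
/* The code is not very professional, but I still think IML is quite powerful */
/* You can do without macros entirely; the IML experts all keep a low profile */
/* I have no IML license, so I can't debug this; it may well contain errors */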
data a1;
input date $    ticker $       return       smb       hml;
cards;
19930104       AC       0.0298       -0.003       0.0023
19930105       AC       0.0334       0.0037       0.0026
19930106       AC       0.0432       0.0041       0.0025
19930107       AC       0.0256       0.0066       0.0017
19930108       AC       0.0355       0.0002       -0.0019
19930109       AC       0.056       0       0.003
19930110       AC       0.025       -0.0029       0.0018
19930111       AC       0.0345       0.001       0.009
19930112       AC       0.0435       0.0019       0.003
19930104       BDF       0.0435       0.0019       0.0018
19930105       BDF       0.0298       0.0016       0.009
19930106       BDF       0.0334       -0.0005       0.003
19930107       BDF       0.0432       0.0025       0.0003
19930108       BDF       0.0256       0.0062       0.0027
19930109       BDF       0.0355       -0.0019       0.0025
19930110       BDF       0.056       0.002       0.0026
19930111       BDF       0.025       -0.003       0.0023
19930112       BDF       0.0345       0.0037       0.0026
;
run;
/* Split the data into one data set per ticker; IML can do this too, but I am not familiar enough with it and forgot how */
proc sql;
create table t1 as
select distinct ticker
from a1;
quit;

proc sql noprint;
  select count(*) into: numrows
  from t1;
  %let nn=&numrows;
  select ticker into: ticker1- :ticker&nn
  from t1;
quit;

%macro one;
  %do i = 1 %to &nn;
    data b&i;
      set a1;
      where ticker = "&&ticker&i";
    run;
  %end;
%mend;
%one;

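%macro two;
proc iml;

/* User-defined regression module: x is the design matrix, y is the response vector */
/* The GLOBAL clause exposes the results to the caller; otherwise they stay local to the module */
start reg(x,y) global(b,resid,sse,t,probt);
   n=nrow(x);                   /* number of observations */
   k=ncol(x);                      /* number of variables */
   xpx=x`*x;                            /* cross-products */
   xpy=x`*y;
   xpxi=inv(xpx);                /* inverse crossproducts */
   b=xpxi*xpy;                      /* parameter estimates */
   yhat=x*b;                            /* predicted values */
   resid=y-yhat;                               /* residuals */
   sse=resid`*resid;             /* sum of squared errors */
   dfe=n-k;                   /* degrees of freedom error */
   mse=sse/dfe;                      /* mean squared error */
   rmse=sqrt(mse);             /* root mean squared error */
   covb=xpxi#mse;                /* covariance of estimates */
   stdb=sqrt(vecdiag(covb));             /* standard errors */
   t=b/stdb;                      /* ttest for estimates=0 */
   probt=1-probf(t#t,1,dfe);    /* significance probability */
finish reg;

free re;
  %do i= 1 %to &nn;
/* No second PROC IML here: restarting IML would discard the reg module and re */
use b&i; read all var _num_ into c; close b&i;   /* columns of c: return, smb, hml */
x=j(nrow(c),1,1)||c[,2:3];          /* intercept column plus smb and hml, as PROC REG fits by default */
call reg(x,c[,1]);
/* One row per ticker: estimates, SSE, t statistics, p-values */
/* RESID has n rows, so it cannot be joined side by side with these and is left out */
/* R-square can be added to the reg module yourself using its formula */
re=re//(b`||sse||t`||probt`);
  %end;

create result from re; append from re; close result;
quit;
%mend;
%two;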
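
On the claim that IML can do this without macros, here is a minimal, untested sketch of what a macro-free version might look like. It is an illustration, not yunqingwang's code: the module name RegRow, the output data set result_iml and the column names are hypothetical, an intercept is assumed, and it does not guard against a singular X`X. It reads the a1 data set defined above once and loops over the tickers inside a single PROC IML session.

```sas
proc iml;
/* Hypothetical function module: returns one row of statistics for one ticker. */
start RegRow(x, y);
   n      = nrow(x);                          /* number of observations */
   k      = ncol(x);                          /* number of parameters, intercept included */
   xpxi   = inv(x`*x);                        /* inverse cross-products */
   b      = xpxi*(x`*y);                      /* parameter estimates */
   resid  = y - x*b;                          /* residuals */
   sse    = ssq(resid);                       /* sum of squared errors */
   dfe    = n - k;                            /* error degrees of freedom */
   stdb   = sqrt(vecdiag(xpxi)#(sse/dfe));    /* standard errors */
   t      = b/stdb;                           /* t statistics */
   probt  = 1 - probf(t#t, 1, dfe);           /* two-sided p-values */
   sst    = ssq(y - y[:]);                    /* corrected total sum of squares */
   rsq    = 1 - sse/sst;                      /* R-square */
   adjrsq = 1 - (1 - rsq)#(n - 1)/dfe;        /* adjusted R-square */
   return( b` || t` || probt` || rsq || adjrsq );
finish RegRow;

use a1;
read all var {"ticker"} into tk;              /* character column of tickers */
read all var {"return" "smb" "hml"} into c;   /* numeric columns: return, smb, hml */
close a1;

ticker = t(unique(tk));                       /* distinct tickers as a column vector */
res = j(nrow(ticker), 11, .);                 /* 3 estimates + 3 t + 3 p + 2 fit statistics */
do i = 1 to nrow(ticker);
   idx = loc(tk = ticker[i]);                 /* rows belonging to this ticker */
   x = j(ncol(idx), 1, 1) || c[idx, 2:3];     /* intercept, smb, hml */
   res[i, ] = RegRow(x, c[idx, 1]);
end;

names = {"b0" "b_smb" "b_hml" "t0" "t_smb" "t_hml" "p0" "p_smb" "p_hml" "rsq" "adjrsq"};
create result_iml from res[colname=names rowname=ticker];
append from res[rowname=ticker];
close result_iml;
quit;
```

Because everything runs inside one IML session, there is no need to split a1 into b1, b2, ... first, and the summary table has one row per ticker.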