Original poster: cchen59

[Practical Application] A SAS logistic model exercise from abroad: can any expert answer it?!


cchen59 posted on 2015-7-1 08:32:01
Bounty: 30 forum coins
Could anyone help answer the following questions about logistic modeling?

1. Create the data

data credit_risk;
  do cus_id=1 to 1000;
    os=round(1000*ranuni(_n_),1);
    ut=round(min(2,max(0,rannor(99)+1)),0.01);
    tr_num_3m=round(31*ranuni(5),1);
    tr_num_1m=max(round(tr_num_3m*0.31+2.6*rannor(9),1),0);
    if ranuni(2)>0.7 then overdu_num=round(5*ranuni(8),1);
    else overdu_num=0;
    /* latent score; note that ut does not enter it */
    sc=overdu_num*(-1.26)+os/78+tr_num_3m/13+tr_num_1m/2.25+0.5+2*rannor(7);
    /* target=1 only when the logistic transform of sc exceeds 0.93 */
    if ranuni(32)>0.15 then target=(exp(sc)/(exp(sc)+1)>0.93);
    else target=0;
    output;
  end;
  drop sc;
run;


2. Modeling: fit a logistic model on the independent variables (all variables except target and cus_id)

    proc logistic data=credit_risk descending;
      model target=os ut tr_num_3m tr_num_1m overdu_num / stb;
    run;


The questions are as follows:

Question 1: How much is the concordance, and explain it. How should this "Concordance" be interpreted?

Question 2: Score each customer according to the model. I don't understand this one either; does it mean computing a probability for each customer?

Question 3: What is the average target rate for the top 10% group (the worst customers, i.e. the high-risk customer group)?

Question 4: What are the gaps between the actual target rate and the predicted probability in the top 10% group?

Question 5: Explain the odds ratio for each factor and the negative/positive impact of each variable.

Question 6: Is there any correlation among the independent variables? How should high correlation be dealt with?


420948492 posted on 2015-7-1 12:40:19
Reading the PROC LOGISTIC documentation should basically be enough to solve this.


wsddzr posted on 2015-7-1 13:40:18
I'm a beginner too~ This is just my own view, so I can't guarantee it's correct.
1.
Concordance is closely tied to the c statistic, I think (c is the concordant share plus half the tied share).
Here is an excerpt; not sure whether it will make sense to you. The idea is to pair up the observed cases two by two and keep only the pairs where one case is a 1 and the other is a 0.
Then check whether the predicted value of the 1 is higher than the predicted value of the 0; if it is, the pair is concordant.
The share of concordant pairs among all such pairs gives Percent Concordant.
For the 147 observations in the sample, there are 147(146)/2 = 10731 different ways to pair them up (without pairing an observation with itself). Of these, 5881 pairs have either both 1's on the dependent variable or both 0's. We ignore these, leaving 4850 pairs in which one case has a 1 and the other case has a 0. For each pair, we ask the question, "Does the case with a 1 have a higher predicted value (based on the model) than the case with a 0?" If the answer is yes, we call that pair concordant. If no, the pair is discordant. If the two cases have the same predicted value, we call it a tie.
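
The pairing logic described in the excerpt can be sketched directly in SAS. This is only an illustration, assuming the credit_risk data set from the original post exists; the names scored, phat, ev and ne are chosen here and are not from the thread. The percentages may differ slightly from the "Association of Predicted Probabilities and Observed Responses" table that PROC LOGISTIC prints, since the procedure may bin predicted probabilities before comparing them.

```sas
/* Fit the model quietly and keep each customer's predicted probability. */
proc logistic data=credit_risk descending noprint;
  model target = os ut tr_num_3m tr_num_1m overdu_num;
  output out=scored pred=phat;    /* scored and phat are illustrative names */
run;

/* Pair every target=1 case with every target=0 case and compare predictions. */
proc sql;
  select count(*)                         as n_pairs,
         sum(ev.phat > ne.phat)/count(*)  as pct_concordant format=percent8.2,
         sum(ev.phat < ne.phat)/count(*)  as pct_discordant format=percent8.2,
         sum(ev.phat = ne.phat)/count(*)  as pct_tied       format=percent8.2
  from scored(where=(target=1)) as ev,
       scored(where=(target=0)) as ne;
quit;
```

The join is a deliberate Cartesian product, so SAS will write a note about it to the log; with 1000 customers the number of pairs is small enough for this to run quickly.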

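2.
I'm not very sure about this one either. You could send an email to confirm; failing that, just compute both the probability and the predicted target value.
That said, the model's goodness of fit doesn't look very good to me, so take another look.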

proc logistic data=credit_risk descending;
  model target=os ut tr_num_3m tr_num_1m overdu_num / lackfit rsq stb ctable scale=none aggregate;
  output out=a pred=phat;        /* phat = predicted probability that target=1 */
run;

data a;
  set a (drop=_level_);
  target_hat = (phat > 0.5);     /* classify as 1 when the predicted probability exceeds 0.5 */
run;

3. I don't know how this group is defined, nor whether "target rate" here means the predicted target rate.
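
One common reading, which is an assumption here rather than something the exercise states, is that the "top 10% group" means the 10% of customers with the highest predicted probability. Under that reading, a minimal sketch for Questions 3 and 4 could look like this; it assumes the data set a with the variable phat from the PROC LOGISTIC step above, and the names ranked and decile are chosen for illustration.

```sas
/* Split customers into 10 groups by predicted probability, highest first. */
proc rank data=a out=ranked groups=10 descending;
  var phat;
  ranks decile;                  /* decile=0 holds the highest predicted probabilities */
run;

/* Actual target rate vs. average predicted probability in the top group. */
proc sql;
  select count(*)                   as n_customers,
         mean(target)               as actual_rate   format=8.4,
         mean(phat)                 as avg_predicted format=8.4,
         mean(target) - mean(phat)  as gap           format=8.4
  from ranked
  where decile = 0;
quit;
```

Ties in phat can make the groups slightly unequal in size, so n_customers is printed as a sanity check.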

4.
proc corr data=credit_risk;
run;


5. Odds ratio: for example, if the coefficient of os is 0.00258, the odds ratio is exp(0.00258) ≈ 1.0026 > 1, so os has a positive effect on target=1.
Interpretation: the estimated odds of target=1 increase by about 0.26% for each one-unit increase in os.
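
The arithmetic can be checked in a short data step. This is just a sketch using the coefficient quoted in this post; the 100-unit rescaling is an illustrative choice, not part of the exercise.

```sas
data _null_;
  beta = 0.00258;                        /* coefficient of os quoted in the post */
  odds_ratio = exp(beta);                /* odds ratio for a one-unit increase */
  pct_change = 100*(odds_ratio - 1);     /* percent change in the odds per unit */
  or_100     = exp(100*beta);            /* odds ratio for a 100-unit increase */
  put odds_ratio= 8.5 pct_change= 8.3 or_100= 8.4;
run;
```

Because os ranges up to 1000, the per-unit odds ratio looks tiny; expressed per 100 units it is roughly 1.29, which is easier to interpret.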

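6. There is fairly strong collinearity between tr_num_3m and tr_num_1m.
As for a fix, the crude approach is to simply drop the one with the smaller Wald Chi-Square, which here is tr_num_3m.
You have presumably also covered other ways of handling collinearity~
proc corr data=credit_risk;
  var os ut tr_num_3m tr_num_1m overdu_num;   /* independent variables only */
run;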

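Interestingly, following the approach in some books and checking with proc reg:
proc reg data=credit_risk;
  model target=os ut tr_num_3m tr_num_1m overdu_num / vif;
run;

the VIFs of these two variables are actually not that large, though maybe I've got something wrong somewhere.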
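
A rough back-of-the-envelope check suggests that a modest VIF is what the data step should produce. This sketch ignores the rounding and the truncation at zero in tr_num_1m, approximates tr_num_3m as uniform on [0, 31], and treats the other predictors as unrelated to these two; all three are simplifying assumptions.

```latex
\begin{aligned}
\operatorname{Var}(\mathrm{tr\_num\_3m}) &\approx \frac{31^2}{12} \approx 80.1,\\
\operatorname{Var}(\mathrm{tr\_num\_1m}) &\approx 0.31^2 \cdot 80.1 + 2.6^2 \approx 7.70 + 6.76 = 14.46,\\
r^2 &\approx \frac{7.70}{14.46} \approx 0.53, \qquad r \approx 0.73,\\
\mathrm{VIF} &= \frac{1}{1 - r^2} \approx 2.1.
\end{aligned}
```

So a pairwise correlation of about 0.7 and a VIF near 2 can coexist; whether that counts as a problem depends on the threshold being used.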
