
[Practical application] A SAS logistic model exercise from abroad: can any expert answer it?


cchen59 posted on 2015-7-1 08:32:01


Could any of the experts here help me with the following logistic modeling questions?

1. Create the data

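data credit_risk;
  do cus_id = 1 to 1000;
    /* os: uniform on 0..1000, rounded to an integer */
    os = round(1000*ranuni(_n_), 1);
    /* ut: normal with mean 1, clipped to [0, 2], two decimals */
    ut = round(min(2, max(0, rannor(99) + 1)), 0.01);
    tr_num_3m = round(31*ranuni(5), 1);
    /* tr_num_1m is built from tr_num_3m, so the two are correlated */
    tr_num_1m = max(round(tr_num_3m*0.31 + 2.6*rannor(9), 1), 0);
    if ranuni(2) > 0.7 then overdu_num = round(5*ranuni(8), 1);
    else overdu_num = 0;
    /* latent score; note that ut does not enter it */
    sc = overdu_num*(-1.26) + os/78 + tr_num_3m/13 + tr_num_1m/2.25 + 0.5 + 2*rannor(7);
    /* target=1 only when the logistic transform of sc exceeds 0.93 */
    if ranuni(32) > 0.15 then target = (exp(sc)/(exp(sc) + 1) > 0.93);
    else target = 0;
    output;
  end;
  drop sc;
run;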

2. Modeling: fit a logistic model on the independent variables (all variables except target and cus_id)

proc logistic data=credit_risk descending;
  /* descending: model the probability that target=1 */
  model target = os ut tr_num_3m tr_num_1m overdu_num / stb;
run;

The questions are:

Question 1: How much is the concordance, and explain it. How should this "Concordance" be interpreted?

Question 2: Score each customer according to the model. I don't understand this one either; does it mean computing a probability for every customer?

Question 3: What is the average target rate for the top 10% group (worst customers, or high-risk customer group)?

Question 4: What are the gaps between the actual target rate and the predicted probability in the top 10% group?

Question 5: Explain the odds ratio for each factor and the negative/positive impact of each variable.

Question 6: Is there any correlation among the independent variables? How should high correlation be dealt with?


420948492 posted on 2015-7-1 12:40:19

Reading the PROC LOGISTIC documentation should basically settle these.


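wsddzr posted on 2015-7-1 13:40:18

I'm a beginner too, so this is just my own take and I can't guarantee it's correct.
1.
Concordance is probably the c statistic.
Here is an excerpt; I'm not sure whether it will make sense to you. The idea is to pair up the observed cases two at a time and keep only the pairs whose outcomes are (1,0) or (0,1).
Then check whether the case with a 1 has a higher predicted value than the case with a 0; if it does, the pair is concordant.
The share of concordant pairs among all the retained pairs gives Percent Concordant.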
For the 147 observations in the sample, there are 147(146)/2 =10731 different
ways to pair them up (without pairing an observation with itself). Of these, 5881 pairs have either both 1's on the
dependent variable or both 0's. We ignore these, leaving 4850 pairs in which one case has a 1 and the other case has a 0.
For each pair, we ask the question, "Does the case with a 1 have a higher predicted value (based on the model) than the
case with a 0?" If the answer is yes, we call that pair concordant. If no, the pair is discordant. If the two cases have the
same predicted value, we call it a tie.
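The pairing rule in the excerpt can be checked by hand. The following is a minimal sketch, assuming the credit_risk data set from the original post; the data set name scored, the variable phat, and the aliases pos and neg are illustrative choices, not anything the exercise prescribes. It also computes c as (concordant + 0.5 × tied) / pairs, which is the usual definition of the c statistic. The percentages may differ slightly from the "Association of Predicted Probabilities and Observed Responses" table, because PROC LOGISTIC can bin the predicted probabilities before counting pairs.

```sas
/* Fit the model and keep the predicted P(target=1) for every customer */
proc logistic data=credit_risk descending noprint;
  model target = os ut tr_num_3m tr_num_1m overdu_num;
  output out=scored pred=phat;
run;

proc sql;
  /* Cross join: every target=1 case is paired with every target=0 case */
  select count(*)                       as n_pairs,
         100*mean(pos.phat > neg.phat)  as pct_concordant,
         100*mean(pos.phat < neg.phat)  as pct_discordant,
         100*mean(pos.phat = neg.phat)  as pct_tied,
         (calculated pct_concordant + 0.5*calculated pct_tied)/100 as c
  from scored(where=(target=1)) as pos,
       scored(where=(target=0)) as neg;
quit;
```

The log will show a note about a Cartesian product; that is expected here, since pairing every event with every non-event is the point of the query.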

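2.
I'm not sure about this one either. You could email to ask and confirm; failing that, just compute both the probability and the predicted target value.
That said, the model's goodness of fit doesn't look very good to me, so take another look.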

proc logistic data=credit_risk descending;
  model target = os ut tr_num_3m tr_num_1m overdu_num
        / lackfit rsq stb ctable scale=none aggregate;
  output out=a pred=phat;  /* phat = predicted P(target=1) */
run;

data a;
  set a(drop=_level_);
  /* classify at a 0.5 cutoff; a phat of exactly 0.5 stays 0 */
  target_hat = (phat > 0.5);
run;


3. I don't know how this group is defined, nor whether "target rate" here means the predicted target rate.
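One possible reading, offered as an assumption rather than the exercise's official definition, is that the top 10% group is the tenth of customers with the highest predicted probability. Under that reading, the sketch below ranks the scored data set a (with phat from the PROC LOGISTIC step in this reply) into deciles and compares the actual target rate with the mean predicted probability in each. The names ranked, decile, actual_rate, predicted_rate, and gap are illustrative.

```sas
/* With DESCENDING, decile=0 holds the 10% of customers with the highest phat */
proc rank data=a out=ranked groups=10 descending;
  var phat;
  ranks decile;
run;

proc sql;
  select decile,
         count(*)     as n,
         mean(target) as actual_rate,      /* observed share of target=1 */
         mean(phat)   as predicted_rate,   /* average predicted probability */
         calculated actual_rate - calculated predicted_rate as gap
  from ranked
  group by decile
  order by decile;
quit;
```

The row with decile=0 would then answer Questions 3 and 4 under this interpretation: actual_rate is the average target rate and gap is the difference between actual and predicted.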

4.

proc corr data=credit_risk;
run;


5. For the odds ratio: if, say, the coefficient of os is 0.00258, the odds ratio is exp(0.00258) ≈ 1.0026 > 1, so os has a positive effect on target=1.
Interpretation: the estimated odds of target=1 increase by about 0.26% for each one-unit increase in os.
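The arithmetic can be reproduced in a short DATA step. This is only a sketch: 0.00258 is the illustrative coefficient quoted in this reply, not a verified estimate, and the variable names are made up for the example. Because os ranges up to 1000, the odds ratio for a 100-unit change is often easier to read than the per-unit figure.

```sas
data _null_;
  beta = 0.00258;                       /* illustrative coefficient for os */
  odds_ratio = exp(beta);               /* odds ratio per one-unit change in os */
  pct_change = 100*(odds_ratio - 1);    /* percent change in the odds per unit */
  odds_ratio_100 = exp(100*beta);       /* odds ratio per 100-unit change in os */
  put odds_ratio= pct_change= odds_ratio_100=;
run;
```

A coefficient below zero would give an odds ratio below 1, that is, a negative effect on the odds of target=1.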

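6. tr_num_3m and tr_num_1m are fairly strongly collinear.
As for a fix, the crude approach is to simply drop the one with the smaller Wald Chi-Square, which here is tr_num_3m.
You have probably covered other ways of handling collinearity as well.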
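A common way to quantify the collinearity before dropping anything is to look at variance inflation factors. PROC LOGISTIC does not report them directly, so the sketch below borrows PROC REG for that purpose. This is a substitute diagnostic, not part of the original exercise: VIF and tolerance depend only on the predictors, so using target as the response of a linear regression here is just a convenience.

```sas
/* VIF and tolerance for the five predictors; ignore the regression fit itself */
proc reg data=credit_risk;
  model target = os ut tr_num_3m tr_num_1m overdu_num / vif tol;
run;
quit;
```

Large VIF values for tr_num_3m and tr_num_1m relative to the other predictors would support the observation above; which threshold counts as "too large" is a judgment call.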