Original poster: pxwendy

[Q&A] Logistic Regression with Heteroskedasticity?


Original post
pxwendy, posted 2012-5-7 12:34:18

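Hi everyone, I could use some help!
1. While fitting a logistic regression, I found that 3 of my 9 independent variables are collinear (correlation coefficients above 0.7), and I cannot get rid of the collinearity. I don't want to use stepwise regression, because I want to keep all 9 variables. Factor analysis is not an option either: I only have 371 observations and only 9 independent variables to begin with, and extracting common factors gives 3 factors that explain just over 70% of the variance. Ridge regression and partial least squares regression are too awkward to do in SPSS. What should I do? I can't just ignore the problem.
2. When fitting a logistic regression, do I need to test for heteroskedasticity? My data are cross-sectional, so I don't plan to test for autocorrelation, but I don't know how to test for heteroskedasticity. EViews offers no White test option for this model. Does that mean logistic regression doesn't need a heteroskedasticity test, or is it something else?
3. I had planned to plot the independent variables against the dependent variable to look at their relationships, but the plots look terrible and show no visible relationship at all. Some of my independent variables are qualitative (0/1), some are continuous, and some are more like categorical variables (some observations are a, some b, some c, some d, ...). A sample of my data is below. Is there something wrong with data like this, or is it something else?
Any advice would be sincerely appreciated. Thank you very much!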








0   136    555.3    0   0   0.2235   0.4933   0   0   0
0   131    784.71   0   1   0.2235   0.4933   0   0   0
0   135    909.3    0   1   0.2235   0.4933   0   0   0
0   146    584.3    0   0   0.2235   0.4933   0   0   0
0   131   1012.7    0   0   0.0797   0.5073   0   4   0
0   124    858.55   0   0   0.0797   0.5073   0   4   0
0   115    528.75   0   0   0.0797   0.5073   0   4   0








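2
mhzhou, posted 2012-11-5 22:11:31
I've run into the same problem as the original poster. I'd like to know whether the dependent variable can be a dummy variable when doing ridge regression.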

3
尤尤西子, posted 2014-3-22 16:21:15
Nobody ever seems to answer these questions... I'm stuck on the same thing.

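4
ReneeBK, posted 2014-3-24 08:59:11
mhzhou, posted 2012-11-5 22:11
I've run into the same problem as the original poster. I'd like to know whether the dependent variable can be a dummy variable when doing ridge regression.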
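The answer is no for standard ridge regression: it is a linear model that assumes a continuous dependent variable, and dummy coding is meant for independent variables. For a binary outcome, the ridge (L2) penalty would have to be applied to the logistic regression likelihood instead.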
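
Ordinary ridge regression assumes a continuous outcome, but the same L2 penalty can be attached to the logistic log-likelihood, which is one common way to keep all collinear predictors in a binary-outcome model. Here is a minimal sketch in plain Python, assuming the predictors are already standardized and the intercept is left unpenalized. The function name `ridge_logit` and the tuning values (`lam`, `lr`, `iters`) are hypothetical choices for illustration, not anything from the thread.

```python
import math

def ridge_logit(X, y, lam=1.0, lr=0.1, iters=2000):
    """Gradient descent on the L2-penalized logistic negative log-likelihood.

    X: list of rows of (standardized) predictors, without a constant column.
    y: list of 0/1 outcomes.
    lam: ridge penalty weight (hypothetical default); the intercept is not penalized.
    Returns (intercept, list of slope coefficients).
    """
    n, k = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * k
    for _ in range(iters):
        g0, g = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            eta = b0 + sum(b * v for b, v in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))      # fitted probability
            g0 += p - yi                           # gradient w.r.t. intercept
            for j in range(k):
                g[j] += (p - yi) * xi[j]           # gradient w.r.t. slope j
        b0 -= lr * g0 / n
        # The penalty adds lam * beta_j to each slope gradient, shrinking it toward 0.
        beta = [b - lr * (gj + lam * b) / n for b, gj in zip(beta, g)]
    return b0, beta
```

A larger `lam` shrinks the slopes more strongly; in practice it would be chosen by cross-validation rather than fixed in advance.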

5
ReneeBK, posted 2014-3-24 09:03:13
Heteroskedasticity is a very different problem in models like -probit- and -logit-. Think of it this way: your dependent variable is a probability. A probability embodies uncertainty, and that uncertainty comes from all the variables we have not included in our model. In one sense this makes it very easy to deal with heteroskedasticity: we just define our dependent variable of interest to be the probability given the control variables in our model. The results of your model then give an accurate description of what you have found in your data. However, we often want to give parameters a counterfactual interpretation (e.g. "if the men suddenly became women, then the probability changes by x percentage points"). Such a counterfactual interpretation is only correct if we can assume that there is no heteroskedasticity. Several solutions have been proposed and I trust none of them: they are just too sensitive. If you really want to do something about it, then you'll need to do some reading. Since these models are so sensitive, you really need to know what you are doing. A good entry point for that literature is (Williams 2009). But my position is that the problem is basically unsolvable, so not worth worrying about.

Hope that helps,


  • Williams, R. 2009. Using heterogeneous choice models to compare logit and probit coefficients across groups. Sociological Methods & Research 37: 531-559.
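
One way to see why heteroskedasticity behaves differently here is to write the logit model in its latent-variable form. The notation below ($y_i^*$, $\sigma$, $\Lambda$, $z_i$, $\gamma$) is introduced only for illustration and does not appear in the post; the variance function $\exp(z_i'\gamma)$ is the form used in heterogeneous choice models, taken here as an assumption.

```latex
\begin{aligned}
y_i^* &= x_i'\beta + \sigma\,\varepsilon_i, \qquad
  \varepsilon_i \sim \text{standard logistic}, \qquad
  y_i = \mathbf{1}[\,y_i^* > 0\,] \\
\Pr(y_i = 1 \mid x_i) &= \Lambda\!\left(\frac{x_i'\beta}{\sigma}\right), \qquad
  \Lambda(t) = \frac{1}{1 + e^{-t}} \\
\sigma_i = \exp(z_i'\gamma) \;&\Rightarrow\;
  \Pr(y_i = 1 \mid x_i, z_i) = \Lambda\!\left(\frac{x_i'\beta}{\exp(z_i'\gamma)}\right)
\end{aligned}
```

Only the ratio $\beta/\sigma$ is identified, so if the residual scale varies with $z_i$, the estimated coefficients mix differences in effects with differences in residual variation. With $\gamma = 0$ the last line reduces to the ordinary logit.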

6
ReneeBK, posted 2014-3-24 09:08:43
You could approach this problem using probit models. Once you've figured out whether there is an issue and how it should be handled, you could fit the equivalent logistic models for ease of interpretation if you didn't want to stick with probit. The two are essentially the same model in many ways, but probit has some options that relate to your question.

I believe you could fit your model with something like xtgee or oglm to get a first model. Then you can fit a heteroskedastic probit (oglm or a similar command). Since the ordinary probit model is nested within the heteroskedastic probit model, you can then do an LR test of nested models to see whether fit improves when using the heteroskedastic model.

I've read a surprising amount of "ignore it" advice regarding heteroskedasticity and binary outcomes. That seems like a bad idea, particularly with a lot of corrections available. Various robust options are available in Stata commands that address some related issues and are explained well in the Stata documentation.

  • http://www3.nd.edu/~rwilliam/oglm/oglm_Stata.pdf - pretty in depth discussion and explains things using reference to a specific Stata command.
  • Allison, Paul. 1999. Comparing Logit and Probit Coefficients Across Groups. Sociological Methods and Research 28(2): 186-208.
  • Yatchew, Adonis and Zvi Griliches. 1985. Specification Error in Probit Models. The Review of Economics and Statistics 67(1): 134-139.
Hope this helps.
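
To make the nesting concrete, here is a minimal Python sketch of the heteroskedastic probit log-likelihood, assuming the usual specification P(y=1) = Phi(x'beta / exp(z'gamma)). The function names (`norm_cdf`, `hetprob_loglik`) and the clipping constant are my own illustrative choices; an actual fit would maximize this function numerically, which Stata does for you.

```python
import math

def norm_cdf(t):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def hetprob_loglik(beta, gamma, X, Z, y):
    """Log-likelihood of a heteroskedastic probit.

    X: rows of mean-equation regressors (include a constant column).
    Z: rows of variance-equation regressors (no constant).
    With gamma all zero, exp(z'gamma) = 1 and this is the ordinary probit.
    """
    ll = 0.0
    for xi, zi, yi in zip(X, Z, y):
        xb = sum(b * v for b, v in zip(beta, xi))
        sigma = math.exp(sum(g * v for g, v in zip(gamma, zi)))
        p = norm_cdf(xb / sigma)
        p = min(max(p, 1e-12), 1.0 - 1e-12)   # guard against log(0)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll
```

Because setting `gamma` to zero recovers the plain probit, the two maximized log-likelihoods can be compared with a likelihood ratio test whose degrees of freedom equal the number of columns in `Z`.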

7
ReneeBK, posted 2014-3-24 09:14:57
In Stata, you could estimate the model with both -hetprob- and -probit- and do a likelihood ratio test (-lrtest-). This is a test for heteroskedasticity in probit regression, which is very close to logistic regression, except you don't get the nice odds ratios.
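
The arithmetic behind a likelihood ratio test is short enough to sketch in plain Python. The statistic is twice the gap between the two maximized log-likelihoods, compared against a chi-square distribution with degrees of freedom equal to the number of extra variance parameters. The helper names (`chi2_sf`, `lr_test`) are hypothetical, and the log-likelihood values in the usage note are invented purely for illustration.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)                 # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))      # Q(1/2, h)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0 - 1e-9:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

def lr_test(ll_restricted, ll_unrestricted, df):
    """Return (LR statistic, p-value) for two nested models."""
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    return stat, chi2_sf(stat, df)
```

For example, `lr_test(-210.4, -207.1, 1)` compares a made-up probit log-likelihood of -210.4 with a made-up heteroskedastic probit log-likelihood of -207.1 when one variable enters the variance equation; a small p-value would favor the heteroskedastic model.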

8
ReneeBK, posted 2014-3-24 09:18:20
SHAZAM procedure for testing for heteroskedasticity in logit and probit models

SET NOECHO
PROC TESTHET
* Logit and Probit Models - Test for heteroskedasticity
* Reference: R. Davidson and J.G. MacKinnon, "Convenient Specification
*   Tests for Logit and Probit Models", Journal of Econometrics,
*   Vol 25, 1984, pp. 241-262.
SET NODOECHO NOOUTPUT
GEN1 TYPE_="[MODEL]"  
* Check that the model type is valid
FORMAT(' ERROR: Model must be either PROBIT or LOGIT')
IF ((TYPE_.NE." LOGIT").AND.(TYPE_.NE." PROBIT"))   
  PRINT / FORMAT
IF ((TYPE_.NE." LOGIT").AND.(TYPE_.NE." PROBIT"))
  STOP

* Model estimation
[MODEL] [DEPVAR] [X] / INDEX=XBETA_ PREDICT=CDF_

IF (TYPE_.EQ." LOGIT")
GENR PDF_=EXP(-XBETA_)/((1+EXP(-XBETA_))**2)
IF (TYPE_.EQ." PROBIT")  
DISTRIB XBETA_ / TYPE=NORMAL PDF=PDF_

COPY [Z] Z_
MATRIX Z_=Z_
GEN1 DF_=$COLS
* Equation (26), p. 247.  
GENR ONE_=1
COPY [X] ONE_ X_
DO #=1,DF_
MATRIX ZZ_=Z_(0,#)
GENR ZZ_=-XBETA_*ZZ_
MATRIX Z_(0,#)=ZZ_
ENDO
MATRIX X_ = X_ | Z_
* Equations (16) and (17) , p. 245.     
GENR YAUX_=[DEPVAR]*SQRT((1-CDF_)/CDF_) + ([DEPVAR]-1)*SQRT(CDF_/(1-CDF_))
MATRIX R_=(PDF_/SQRT(CDF_*(1-CDF_)))*X_
* Artificial regression - Equation (18), p. 246.
OLS YAUX_ R_ / NOCONSTANT
* LM test statistic - explained sum of squares
GEN1 LM2=$ZSSR
* p-value
DISTRIB LM2 / TYPE=CHI DF=DF_
GEN1 pvalue_=1-$CDF
* Print results
PRINT MODEL / NONAME
FORMAT(' Test statistic for heteroskedasticity  LM2 ='/F15.5)
PRINT LM2 / NONAME FORMAT  
FORMAT(' chi-square degrees of freedom'/5X,F5.0)
PRINT DF_ / NONAME FORMAT  
FORMAT(' p-value'/5X,F10.5)
PRINT pvalue_ / NONAME FORMAT  
DELETE / ALL_
SET DOECHO OUTPUT
PROCEND
SET ECHO
For details, see
http://shazam.econ.ubc.ca/intro/logit3.htm
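
For readers without SHAZAM, the same artificial-regression idea can be sketched in plain Python for the logit case. This is an illustrative translation under my own assumptions, not the SHAZAM authors' code: the function names (`solve`, `fit_logit`, `lm_het_test`) are made up, `X` must already contain a constant column, and `Z` holds the variables suspected of driving the variance.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logit(X, y, iters=25):
    """Newton-Raphson maximum likelihood estimates for a logit model."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta

def lm_het_test(X, y, Z):
    """LM statistic = explained sum of squares of the artificial regression."""
    beta = fit_logit(X, y)
    R, r = [], []
    for xi, yi, zi in zip(X, y, Z):
        xb = sum(b * v for b, v in zip(beta, xi))
        F = 1.0 / (1.0 + math.exp(-xb))          # logistic CDF
        f = F * (1.0 - F)                        # logistic density
        w = f / math.sqrt(F * (1.0 - F))
        # Regressors: weighted X, plus weighted (-x'beta) * z for the variance terms.
        R.append([w * v for v in xi] + [w * (-xb) * v for v in zi])
        r.append(yi * math.sqrt((1 - F) / F) + (yi - 1) * math.sqrt(F / (1 - F)))
    m = len(R[0])
    RtR = [[sum(row[a] * row[c] for row in R) for c in range(m)] for a in range(m)]
    Rtr = [sum(row[a] * ri for row, ri in zip(R, r)) for a in range(m)]
    coef = solve(RtR, Rtr)                       # OLS without a constant
    return sum(c * v for c, v in zip(coef, Rtr)) # uncentered explained sum of squares
```

Under the null of no heteroskedasticity, the returned statistic is compared with a chi-square distribution whose degrees of freedom equal the number of columns in `Z`; with a single variance variable, the 5% critical value is about 3.84.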
