[Statistical software] How should we view multicollinearity caused by control variables?

Original poster

cherry_wyj posted on 2014-9-2 10:36:17

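As the title says: in much of the literature I have read, including papers in top journals, the control variables added to the model look as if they must be linearly related to the independent variables to some degree. Yet the authors neither discuss this nor test for multicollinearity. Is that because multicollinearity has little effect in a basic regression model?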

Reply 1

yangyuzhou posted on 2014-9-3 22:53:08

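I don't think the authors skipped the test; more likely the journal simply cut that part, since the test is trivial to run. As for linear correlation, even variables that seem completely unrelated in practice can show some linear correlation in a sample...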
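To illustrate the last point, here is a minimal sketch with simulated data (the sample size and seed are arbitrary assumptions, not from any paper): two variables drawn independently still show a nonzero sample correlation, and across repeated samples that correlation scatters around zero with a spread of roughly 1/sqrt(n).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 50                          # hypothetical sample size

# Two variables drawn independently: their population correlation is exactly 0.
x = rng.normal(size=n)
z = rng.normal(size=n)

sample_r = np.corrcoef(x, z)[0, 1]
print(f"sample correlation with n={n}: {sample_r:.3f}")

# Repeat the experiment many times to see how much the sample
# correlation of unrelated variables varies by chance alone.
draws = np.array([
    np.corrcoef(rng.normal(size=n), rng.normal(size=n))[0, 1]
    for _ in range(2000)
])
print(f"mean of sample correlations: {draws.mean():.3f}")
print(f"std of sample correlations:  {draws.std():.3f} "
      f"(1/sqrt(n) = {1 / np.sqrt(n):.3f})")
```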

Reply 2

hyu9910 posted on 2014-9-2 11:16:53

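Collinearity can be tested. If several independent variables are severely collinear, consider not using them in the same regression equation.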

Reply 3

cherry_wyj posted on 2014-9-3 23:04:33

yangyuzhou posted on 2014-9-3 22:53
I don't think the authors skipped the test; more likely the journal simply cut that part, since the test is trivial to run. As for linear cor ...

Yes, after trying it in practice I found that this is indeed the case.

Reply 4

Alfred_G posted on 2014-9-4 15:36:16

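Testing for collinearity means looking at the tolerance. For each independent variable it is computed from that variable's correlation with the other independent variables in the equation: tolerance = 1 - R², where R² comes from regressing that variable on the others.
As long as the VIF (the reciprocal of the tolerance) is between 1 and 10, collinearity can be regarded as not a disruptive problem.
Also, some correlation among the independent variables is to be expected. Under the residual-independence assumption of linear regression, the correlation between the residual and the independent variables is zero. In other words, a variable that had no correlation at all with the other independent variables would behave, relative to them, like a residual.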
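As a rough sketch of how tolerance and VIF can be computed by hand, the snippet below regresses each independent variable on the others and applies tolerance = 1 - R² and VIF = 1 / tolerance. The helper name `vif_and_tolerance` and the simulated variables `x1`, `x2`, `x3` are made up for illustration; statistical packages report the same quantities directly.

```python
import numpy as np

def vif_and_tolerance(X):
    """Return (vif, tolerance) arrays, one entry per column of X.

    For column j, R_j^2 is the R-squared from regressing column j on all
    other columns plus an intercept; tolerance_j = 1 - R_j^2 and
    VIF_j = 1 / tolerance_j.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    tol = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        tol[j] = 1.0 - r2
    return 1.0 / tol, tol

# Simulated (hypothetical) data: x1 and x2 are strongly related, x3 is not.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)  # corr(x1, x2) near 0.9
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vif, tol = vif_and_tolerance(X)
for name, v, t in zip(["x1", "x2", "x3"], vif, tol):
    print(f"{name}: VIF = {v:.2f}, tolerance = {t:.2f}")
```

Tolerance always lies between 0 and 1, so the "1 to 10" rule of thumb applies to the VIF. The same VIFs can also be read off the diagonal of the inverse of the correlation matrix of the independent variables.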

Reply 5

lovewcq123 posted on 2016-4-3 09:39:12

http://statisticalhorizons.com/multicollinearity

Regardless of your criterion for what constitutes a high VIF, there are at least three situations in which a high VIF is not a problem and can be safely ignored:

1. The variables with high VIFs are control variables, and the variables of interest do not have high VIFs. Here’s the thing about multicollinearity: it’s only a problem for the variables that are collinear. It increases the standard errors of their coefficients, and it may make those coefficients unstable in several ways. But so long as the collinear variables are only used as control variables, and they are not collinear with your variables of interest, there’s no problem. The coefficients of the variables of interest are not affected, and the performance of the control variables as controls is not impaired.

Here’s an example from some of my own work: the sample consists of U.S. colleges, the dependent variable is graduation rate, and the variable of interest is an indicator (dummy) for public vs. private. Two control variables are average SAT scores and average ACT scores for entering freshmen. These two variables have a correlation above .9, which corresponds to VIFs of at least 5.26 for each of them. But the VIF for the public/private indicator is only 1.04. So there’s no problem to be concerned about, and no need to delete one or the other of the two controls.
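The pattern in this example can be reproduced with simulated data. The sketch below is not the author's college data: the variables `public`, `sat`, `act`, their coefficients, and the 0.95 correlation are all assumptions chosen for illustration. It shows two nearly collinear controls with high VIFs alongside a variable of interest whose VIF stays near 1 and whose standard error stays small.

```python
import numpy as np

def ols(X, y):
    """OLS with an intercept; returns (coefficients, standard errors)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    return beta, se

rng = np.random.default_rng(2)
n = 1000
public = rng.integers(0, 2, size=n).astype(float)  # variable of interest (dummy)
sat = rng.normal(size=n)                           # control 1 (standardized)
act = 0.95 * sat + np.sqrt(1 - 0.95**2) * rng.normal(size=n)  # control 2
y = 1.0 + 0.5 * public + 0.3 * sat + 0.3 * act + rng.normal(size=n)

X = np.column_stack([public, sat, act])
# VIFs are the diagonal of the inverse correlation matrix of the regressors.
vif = np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))
beta, se = ols(X, y)

for name, v, b, s in zip(["public", "sat", "act"], vif, beta[1:], se[1:]):
    print(f"{name:6s} VIF = {v:5.2f}  coef = {b:6.3f}  se = {s:.3f}")
```

In a run like this the standard errors of `sat` and `act` are inflated, while the estimate for `public` is about as precise as it would be with a single control.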

2. The high VIFs are caused by the inclusion of powers or products of other variables. If you specify a regression model with both x and x², there’s a good chance that those two variables will be highly correlated. Similarly, if your model has x, z, and xz, both x and z are likely to be highly correlated with their product. This is not something to be concerned about, however, because the p-value for xz is not affected by the multicollinearity. This is easily demonstrated: you can greatly reduce the correlations by “centering” the variables (i.e., subtracting their means) before creating the powers or the products. But the p-value for x² or for xz will be exactly the same, regardless of whether or not you center. And all the results for the other variables (including the R² but not including the lower-order terms) will be the same in either case. So the multicollinearity has no adverse consequences.
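The centering claim is easy to check numerically. The sketch below uses simulated data (the means, coefficients, and sample size are arbitrary assumptions) and compares the VIFs and the t-statistic of the product term with and without centering. Because the residual degrees of freedom are the same in both fits, equal t-statistics imply equal p-values.

```python
import numpy as np

def ols(X, y):
    """OLS with an intercept; returns (coefficients, standard errors)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    return beta, se

def vifs(X):
    """VIFs as the diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))

rng = np.random.default_rng(3)
n = 400
x = rng.normal(loc=10.0, scale=1.0, size=n)  # means far from zero
z = rng.normal(loc=5.0, scale=1.0, size=n)
y = 2.0 + 0.5 * x - 0.4 * z + 0.2 * x * z + rng.normal(size=n)

xc, zc = x - x.mean(), z - z.mean()          # centered versions
X_raw = np.column_stack([x, z, x * z])
X_cen = np.column_stack([xc, zc, xc * zc])

b_raw, se_raw = ols(X_raw, y)
b_cen, se_cen = ols(X_cen, y)
t_raw = b_raw[3] / se_raw[3]                 # t-statistic of the product term
t_cen = b_cen[3] / se_cen[3]

print("VIFs raw     (x, z, xz):", np.round(vifs(X_raw), 1))
print("VIFs centered (x, z, xz):", np.round(vifs(X_cen), 2))
print(f"t for xz, raw:      {t_raw:.6f}")
print(f"t for xz, centered: {t_cen:.6f}")
```

The coefficients on the lower-order terms x and z do change with centering, since they then measure effects at the means rather than at zero.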

3. The variables with high VIFs are indicator (dummy) variables that represent a categorical variable with three or more categories. If the proportion of cases in the reference category is small, the indicator variables will necessarily have high VIFs, even if the categorical variable is not associated with other variables in the regression model.

Suppose, for example, that a marital status variable has three categories: currently married, never married, and formerly married. You choose formerly married as the reference category, with indicator variables for the other two. What happens is that the correlation between those two indicators gets more negative as the fraction of people in the reference category gets smaller. For example, if 45 percent of people are never married, 45 percent are married, and 10 percent are formerly married, the VIFs for the married and never-married indicators will be at least 3.0.

Is this a problem? Well, it does mean that p-values for the indicator variables may be high. But the overall test that all indicators have coefficients of zero is unaffected by the high VIFs. And nothing else in the regression is affected. If you really want to avoid the high VIFs, just choose a reference category with a larger fraction of the cases. That may be desirable in order to avoid situations where none of the individual indicators is statistically significant even though the overall set of indicators is significant.
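The marital status figures can be worked out directly. The sketch below assumes the two indicators are the only regressors, so each VIF is 1 / (1 - r²) with r the correlation between the indicators; with further regressors in the model the VIFs can only be higher, which matches the "at least" in the text. The helper name `dummy_vif` is made up for illustration.

```python
from math import sqrt

def dummy_vif(p_a, p_b):
    """VIF of each of two indicators from a three-category variable.

    p_a and p_b are the shares of the two non-reference categories.
    With only these two indicators in the model, VIF = 1 / (1 - r^2),
    where r is the correlation between the indicators.
    """
    # Indicators of mutually exclusive categories: cov = -p_a * p_b,
    # and each indicator has variance p * (1 - p).
    r = -p_a * p_b / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return 1.0 / (1.0 - r * r)

# Reference category = formerly married (10 percent of cases).
small_ref = dummy_vif(0.45, 0.45)
# Reference category = married (45 percent of cases).
large_ref = dummy_vif(0.45, 0.10)

print(f"VIF with the 10 percent category as reference: {small_ref:.3f}")
print(f"VIF with a 45 percent category as reference:   {large_ref:.3f}")
```

Switching the reference to one of the larger categories brings the VIF down from about 3.0 to about 1.1 without changing the fitted model.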