OP: cherry_wyj

[Statistical Software] How should we view multicollinearity caused by control variables?


As the title says: in many papers, including ones in top journals, the control variables added to the model seem to have some linear relationship with the independent variable of interest, yet the authors neither discuss this nor run a multicollinearity test. Is that because collinearity has little impact in a basic regression model?



#2
yangyuzhou posted on 2014-9-3 22:53:08
I don't think it's that the authors never ran the test; more likely the journal simply cut that part, because the check is trivial to do. As for linear correlation: even variables that intuitively seem completely unrelated can show some degree of linear correlation in a sample...
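A minimal sketch of how trivial the check is, in Python on simulated (hypothetical) data, assuming pandas and statsmodels are installed: it prints a VIF for every regressor and also shows that even an unrelated variable picks up small nonzero sample correlations.

```python
# Simulated (hypothetical) data: x2 is built to be correlated with x1,
# while x3 is unrelated to both in the population.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x3 = rng.normal(size=n)
df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Even x3 shows small nonzero sample correlations with x1 and x2.
print(df.corr().round(3))

# VIF for each regressor; the constant must be in the design matrix
# for the auxiliary regressions, but is skipped in the report.
X = sm.add_constant(df)
for i, name in enumerate(X.columns):
    if name != "const":
        print(f"VIF({name}) = {variance_inflation_factor(X.values, i):.2f}")
```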


#3
hyu9910 posted on 2014-9-2 11:16:53
You can test for collinearity. If several of the independent variables are severely collinear, consider not using them all in the same regression equation.
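As a sketch of why one might drop a severely collinear regressor (simulated, hypothetical data; assumes numpy and statsmodels): with a near-duplicate of x1 in the model, the standard error of x1's coefficient is inflated, and dropping the duplicate restores precision.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

both = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
one = sm.OLS(y, sm.add_constant(x1)).fit()
print("SE(beta_x1), x2 included:", round(both.bse[1], 3))  # inflated
print("SE(beta_x1), x2 dropped: ", round(one.bse[1], 3))   # much smaller
```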


#4
cherry_wyj posted on 2014-9-3 23:04:33
Quoting yangyuzhou (2014-9-3 22:53): "I don't think it's that the authors never ran the test; more likely the journal simply cut that part ..."
Right: after actually running the check myself, I found that is indeed the case.


#5
Alfred_G posted on 2014-9-4 15:36:16
Collinearity diagnostics look at the tolerance. For each independent variable, tolerance is computed from an auxiliary regression of that variable on all of the other independent variables: tolerance = 1 − R² of that regression, and VIF = 1/tolerance.
A common rule of thumb is that collinearity is not a disturbing problem as long as each tolerance stays above 0.1 (equivalently, each VIF stays below 10).
Also, some correlation among the independent variables is normal and even necessary. What the linear-regression assumptions require is that the residual be uncorrelated with the independent variables; they do not require the independent variables to be uncorrelated with one another.
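For concreteness, a sketch of how tolerance is actually computed, using auxiliary regressions on simulated (hypothetical) data; assumes numpy and statsmodels.

```python
import numpy as np
import statsmodels.api as sm

def tolerance_and_vif(X):
    """tolerance_j = 1 - R^2 of regressing column j on the other columns."""
    results = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
        results.append((1.0 - r2, 1.0 / (1.0 - r2)))  # (tolerance, VIF)
    return results

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
X[:, 1] += 0.9 * X[:, 0]   # make columns 0 and 1 correlated
for j, (tol, vif) in enumerate(tolerance_and_vif(X)):
    print(f"x{j}: tolerance = {tol:.3f}, VIF = {vif:.2f}")
```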


#6
lovewcq123 posted on 2016-4-3 09:39:12
From http://statisticalhorizons.com/multicollinearity :

Regardless of your criterion for what constitutes a high VIF, there are at least three situations in which a high VIF is not a problem and can be safely ignored:

1. The variables with high VIFs are control variables, and the variables of interest do not have high VIFs. Here’s the thing about multicollinearity: it’s only a problem for the variables that are collinear. It increases the standard errors of their coefficients, and it may make those coefficients unstable in several ways. But so long as the collinear variables are only used as control variables, and they are not collinear with your variables of interest, there’s no problem. The coefficients of the variables of interest are not affected, and the performance of the control variables as controls is not impaired.

Here’s an example from some of my own work: the sample consists of U.S. colleges, the dependent variable is graduation rate, and the variable of interest is an indicator (dummy) for public vs. private. Two control variables are average SAT scores and average ACT scores for entering freshmen. These two variables have a correlation above .9, which corresponds to VIFs of at least 5.26 for each of them. But the VIF for the public/private indicator is only 1.04. So there’s no problem to be concerned about, and no need to delete one or the other of the two controls.

2. The high VIFs are caused by the inclusion of powers or products of other variables. If you specify a regression model with both x and x², there’s a good chance that those two variables will be highly correlated. Similarly, if your model has x, z, and xz, both x and z are likely to be highly correlated with their product. This is not something to be concerned about, however, because the p-value for xz is not affected by the multicollinearity. This is easily demonstrated: you can greatly reduce the correlations by “centering” the variables (i.e., subtracting their means) before creating the powers or the products. But the p-value for x² or for xz will be exactly the same, regardless of whether or not you center. And all the results for the other variables (including the R² but not including the lower-order terms) will be the same in either case. So the multicollinearity has no adverse consequences. [A runnable sketch of this centering check follows the quoted post.]

3. The variables with high VIFs are indicator (dummy) variables that represent a categorical variable with three or more categories. If the proportion of cases in the reference category is small, the indicator variables will necessarily have high VIFs, even if the categorical variable is not associated with other variables in the regression model.

Suppose, for example, that a marital status variable has three categories: currently married, never married, and formerly married. You choose formerly married as the reference category, with indicator variables for the other two. What happens is that the correlation between those two indicators gets more negative as the fraction of people in the reference category gets smaller. For example, if 45 percent of people are never married, 45 percent are married, and 10 percent are formerly married, the VIFs for the married and never-married indicators will be at least 3.0.

Is this a problem? Well, it does mean that p-values for the indicator variables may be high. But the overall test that all indicators have coefficients of zero is unaffected by the high VIFs. And nothing else in the regression is affected. If you really want to avoid the high VIFs, just choose a reference category with a larger fraction of the cases. That may be desirable in order to avoid situations where none of the individual indicators is statistically significant even though the overall set of indicators is significant.
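Allison's point 2 ("easily demonstrated") can indeed be checked numerically. A sketch on simulated (hypothetical) data, assuming numpy and statsmodels: centering x and z before forming xz collapses the correlation between x and xz, yet the p-value of the interaction term is unchanged up to floating-point precision.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
x = rng.normal(loc=5.0, size=n)   # nonzero means make x and x*z correlated
z = rng.normal(loc=3.0, size=n)
y = 1.0 + 0.5 * x + 0.5 * z + 0.25 * x * z + rng.normal(size=n)

def fit_interaction(a, b):
    """OLS of y on a, b, and their product; returns fitted results."""
    return sm.OLS(y, sm.add_constant(np.column_stack([a, b, a * b]))).fit()

raw = fit_interaction(x, z)
ctd = fit_interaction(x - x.mean(), z - z.mean())

xc, zc = x - x.mean(), z - z.mean()
print("corr(x, xz) raw:     ", round(np.corrcoef(x, x * z)[0, 1], 3))    # high
print("corr(x, xz) centered:", round(np.corrcoef(xc, xc * zc)[0, 1], 3)) # near 0
print("p(xz) raw:     ", raw.pvalues[3])
print("p(xz) centered:", ctd.pvalues[3])   # identical to the raw p-value
```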


#7
nju_dxc posted on 2021-10-19 20:10:23
In my view, a control variable being correlated with the explanatory variable is exactly what shows the control is worth including (it helps guard against endogeneity). A certain amount of collinearity is acceptable by comparison: much of the time what we care about is the causal effect, not the precise magnitude of the coefficient β on the explanatory variable.
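That trade-off is easy to see in a simulation. A sketch on hypothetical data (assumes numpy and statsmodels): the control c is correlated with x, so omitting it biases the coefficient of interest (omitted-variable bias), while keeping it only costs some precision.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000
c = rng.normal(size=n)                       # confounding control
x = 0.7 * c + rng.normal(size=n)             # x correlated with the control
y = 1.0 * x + 1.0 * c + rng.normal(size=n)   # true beta_x = 1.0

with_c = sm.OLS(y, sm.add_constant(np.column_stack([x, c]))).fit()
no_c = sm.OLS(y, sm.add_constant(x)).fit()
print("beta_x with control:   ", round(with_c.params[1], 3))  # near 1.0
print("beta_x without control:", round(no_c.params[1], 3))    # biased upward
```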

