- * Example generated by -dataex-. To install: ssc install dataex
- clear
- input double(y x1 x2)
- 4 2 3
- 5 3 4
- 4 4 7
- 7 5 6
- 8 3 5
- 9 7 7
- 8 8 3
- end
- gen x1_x2=x1*x2 // generate the interaction term
- collin x1 x2 x1_x2
- (obs=7)
- Collinearity Diagnostics
-                             SQRT                   R-
-     Variable      VIF       VIF    Tolerance    Squared
- -------------------------------------------------------
-           x1      8.37      2.89     0.1194     0.8806
-           x2      8.92      2.99     0.1121     0.8879
-        x1_x2     18.87      4.34     0.0530     0.9470
- -------------------------------------------------------
-     Mean VIF     12.05
-                         Cond
-           Eigenval     Index
- -----------------------------
-     1       3.7550    1.0000
-     2       0.1435    5.1162
-     3       0.0979    6.1938
-     4       0.0037   31.8179
- -----------------------------
-  Condition Number     31.8179
- Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
- Det(correlation matrix) 0.0514
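As a cross-check, the VIF column and the determinant above can be reproduced outside Stata: VIF_j is the j-th diagonal element of the inverse of the regressors' correlation matrix. A minimal Python/numpy sketch of that formula (an illustration, not Stata's own code; numpy assumed available):

```python
import numpy as np

# The seven observations from the dataex block above
x1 = np.array([2., 3., 4., 5., 3., 7., 8.])
x2 = np.array([3., 4., 7., 6., 5., 7., 3.])
x1_x2 = x1 * x2  # interaction term

# Correlation matrix of the three regressors (rows = variables)
R = np.corrcoef(np.vstack([x1, x2, x1_x2]))

# VIF_j = j-th diagonal element of R^-1; SQRT VIF and Tolerance follow directly
vif = np.diag(np.linalg.inv(R))
print(np.round(vif, 2))            # [ 8.37  8.92 18.87], as in the VIF column
print(round(np.linalg.det(R), 4))  # 0.0514, the reported determinant
```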
Here the condition number is 31.8179, above 15 and even above 30, so many people would conclude that multicollinearity is quite severe. In fact that conclusion is wrong. A collinearity check should be run on the centered variables; only then can we judge whether collinearity is actually present. So we rerun the diagnostic with the collin, corr command, which tests the centered variables:
- (obs=7)
- Collinearity Diagnostics
-                             SQRT                   R-
-     Variable      VIF       VIF    Tolerance    Squared
- -------------------------------------------------------
-           x1      8.37      2.89     0.1194     0.8806
-           x2      8.92      2.99     0.1121     0.8879
-        x1_x2     18.87      4.34     0.0530     0.9470
- -------------------------------------------------------
-     Mean VIF     12.05
-                         Cond
-           Eigenval     Index
- -----------------------------
-     1       2.1439    1.0000
-     2       0.8271    1.6100
-     3       0.0290    8.5989
- -----------------------------
-  Condition Number      8.5989
- Eigenvalues & Cond Index computed from deviation sscp (no intercept)
- Det(correlation matrix) 0.0514
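Both condition numbers above can be reproduced the same way. Per collin's own footnotes, the first is computed from the scaled raw SSCP with the intercept (each column of [1 x1 x2 x1_x2] rescaled to unit length), the second from the deviation SSCP, i.e. the correlation matrix. A Python/numpy sketch of those two formulas (an illustration, not Stata's code):

```python
import numpy as np

x1 = np.array([2., 3., 4., 5., 3., 7., 8.])
x2 = np.array([3., 4., 7., 6., 5., 7., 3.])
X = np.column_stack([np.ones(7), x1, x2, x1 * x2])

# Uncentered diagnostic: scale the raw columns (intercept included) to unit
# length, then take eigenvalues of X'X
Xs = X / np.linalg.norm(X, axis=0)
ev_raw = np.linalg.eigvalsh(Xs.T @ Xs)
cond_raw = np.sqrt(ev_raw.max() / ev_raw.min())   # ~31.82, the "alarming" number

# Centered diagnostic: eigenvalues of the correlation matrix (no intercept)
ev_cen = np.linalg.eigvalsh(np.corrcoef(np.vstack([x1, x2, x1 * x2])))
cond_cen = np.sqrt(ev_cen.max() / ev_cen.min())   # ~8.60, as with collin, corr
print(cond_raw, cond_cen)
```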
Now we find condition number = 8.5989 < 15, which says the model does not suffer from multicollinearity. This is the result that should be reported in a paper (if you want to test for collinearity at all; personally I rarely do). It is also why many people mistakenly believe that centering reduces collinearity: the centered and uncentered runs give different numbers, so they credit centering with the improvement. In my view the correct diagnostic is the second one above; that is, a collinearity check on x1 and x2 should be done with collin after centering.
In fact you will find that whether the model uses the raw variables or the centered ones makes no difference to the coefficient estimates. Professor Huang Hequan's slides cover this point in detail, so I will not presume to improve on them; I simply give the results for you to compare:
- center x1 x2 // from SSC (ssc install center); generates centered copies c_x1 and c_x2
- reg y x1 x2 x1_x2
- Source | SS df MS Number of obs = 7
- -------------+---------------------------------- F(3, 3) = 1.06
- Model | 13.2183951 3 4.40613169 Prob > F = 0.4821
- Residual | 12.4958906 3 4.16529688 R-squared = 0.5140
- -------------+---------------------------------- Adj R-squared = 0.0281
- Total | 25.7142857 6 4.28571429 Root MSE = 2.0409
- ------------------------------------------------------------------------------
- y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
- -------------+----------------------------------------------------------------
- x1 | 0.257 1.083 0.24 0.828 -3.191 3.705
- x2 | -0.428 1.437 -0.30 0.785 -5.001 4.146
- x1_x2 | 0.095 0.253 0.38 0.732 -0.711 0.902
- _cons | 5.160 6.156 0.84 0.463 -14.431 24.751
- ------------------------------------------------------------------------------
- . reg y c_x1 c_x2 c.x1#c.x2
- Source | SS df MS Number of obs = 7
- -------------+---------------------------------- F(3, 3) = 1.06
- Model | 13.2183951 3 4.40613169 Prob > F = 0.4821
- Residual | 12.4958906 3 4.16529688 R-squared = 0.5140
- -------------+---------------------------------- Adj R-squared = 0.0281
- Total | 25.7142857 6 4.28571429 Root MSE = 2.0409
- ------------------------------------------------------------------------------
- y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
- -------------+----------------------------------------------------------------
- c_x1 | 0.257 1.083 0.24 0.828 -3.191 3.705
- c_x2 | -0.428 1.437 -0.30 0.785 -5.001 4.146
- |
- c.x1#c.x2 | 0.095 0.253 0.38 0.732 -0.711 0.902
- |
- _cons | 4.197 5.987 0.70 0.534 -14.857 23.251
- ------------------------------------------------------------------------------
- . reg y c.c_x1##c.c_x2
- Source | SS df MS Number of obs = 7
- -------------+---------------------------------- F(3, 3) = 1.06
- Model | 13.2183951 3 4.40613169 Prob > F = 0.4821
- Residual | 12.4958906 3 4.16529688 R-squared = 0.5140
- -------------+---------------------------------- Adj R-squared = 0.0281
- Total | 25.7142857 6 4.28571429 Root MSE = 2.0409
- -------------------------------------------------------------------------------
- y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
- --------------+----------------------------------------------------------------
- c_x1 | 0.733 0.456 1.61 0.207 -0.719 2.186
- c_x2 | 0.008 0.525 0.02 0.989 -1.663 1.679
- |
- c.c_x1#c.c_x2 | 0.095 0.253 0.38 0.732 -0.711 0.902
- |
- _cons | 6.374 0.785 8.12 0.004 3.876 8.872
- -------------------------------------------------------------------------------
- . reg y x1 x2 c.c_x1#c.c_x2
- Source | SS df MS Number of obs = 7
- -------------+---------------------------------- F(3, 3) = 1.06
- Model | 13.2183951 3 4.40613169 Prob > F = 0.4821
- Residual | 12.4958906 3 4.16529688 R-squared = 0.5140
- -------------+---------------------------------- Adj R-squared = 0.0281
- Total | 25.7142857 6 4.28571429 Root MSE = 2.0409
- -------------------------------------------------------------------------------
- y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
- --------------+----------------------------------------------------------------
- x1 | 0.733 0.456 1.61 0.207 -0.719 2.186
- x2 | 0.008 0.525 0.02 0.989 -1.663 1.679
- |
- c.c_x1#c.c_x2 | 0.095 0.253 0.38 0.732 -0.711 0.902
- |
- _cons | 2.983 2.868 1.04 0.375 -6.143 12.109
- -------------------------------------------------------------------------------
The four sets of results above are in fact fully equivalent; with a little arithmetic on the sample means (guided by the economic interpretation), the first two can be converted into the last two:
- reg y x1 x2 x1_x2
- sum x2
- dis _b[x1]+`r(mean)'*_b[x1_x2]
- .73314654
- sum x1
- dis _b[x2]+`r(mean)'*_b[x1_x2]
- .0078985
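The same conversion can be checked with any OLS routine: the raw model and the fully centered model give identical fitted values, and the raw slope plus the other regressor's mean times the interaction coefficient recovers the centered slope. A Python/numpy sketch (an illustration, not the Stata code above):

```python
import numpy as np

y  = np.array([4., 5., 4., 7., 8., 9., 8.])
x1 = np.array([2., 3., 4., 5., 3., 7., 8.])
x2 = np.array([3., 4., 7., 6., 5., 7., 3.])

# Raw model: y on x1, x2, x1*x2 (plus constant)
Xr = np.column_stack([np.ones(7), x1, x2, x1 * x2])
br = np.linalg.lstsq(Xr, y, rcond=None)[0]

# Fully centered model: y on c_x1, c_x2, c_x1*c_x2 (plus constant)
c1, c2 = x1 - x1.mean(), x2 - x2.mean()
Xc = np.column_stack([np.ones(7), c1, c2, c1 * c2])
bc = np.linalg.lstsq(Xc, y, rcond=None)[0]

# Same fitted values, hence the same R-squared and Root MSE in every table
assert np.allclose(Xr @ br, Xc @ bc)

# _b[x1] + mean(x2)*_b[x1_x2] equals the centered slope on c_x1; likewise for x2
print(br[1] + x2.mean() * br[3], bc[1])   # both ~0.7331
print(br[2] + x1.mean() * br[3], bc[2])   # both ~0.0079
```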
So centering does not cure a multicollinearity problem. If you followed the argument at the beginning, the reason is simple: when we test for collinearity we should run the diagnostic directly on the centered variables, and the result we then obtain is the correct collinearity diagnostic for the original model.