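When I run a regression in R, some variables get NA coefficients, but when I take a problematic variable out and regress on it together with just a few other variables, there is no problem.
See below: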
> ols4 <- lm( log( PRICE ) ~ .+0 ,data = dat)
> summary( ols4 )
Call:
lm(formula = log(PRICE) ~ . + 0, data = dat)
Residuals:
Min 1Q Median 3Q Max
-0.145684 -0.045431 -0.003511 0.031214 0.281819
Coefficients: (3 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
IDWIN 3.584e-03 6.197e-05 57.840 < 2e-16 ***
GRADE -4.321e-03 4.816e-03 -0.897 0.37087
INTE 9.413e-03 7.862e-03 1.197 0.23290
FINE 8.561e-03 1.652e-02 0.518 0.60494
COMP 1.458e-02 1.464e-02 0.996 0.32080
FIRM -2.673e-02 1.435e-02 -1.863 0.06424 .
ACID 4.370e-04 1.949e-02 0.022 0.98214
SUPP -2.311e-02 1.402e-02 -1.648 0.10129
FLAT -2.300e-02 2.152e-02 -1.069 0.28672
FAT -7.810e-04 1.343e-02 -0.058 0.95371
WCON -1.049e-03 1.751e-02 -0.060 0.95230
HARM 5.187e-03 9.680e-03 0.536 0.59276
TANI 3.297e-03 1.539e-02 0.214 0.83063
FINI 1.222e-02 1.059e-02 1.154 0.25010
ALCO 2.937e-02 2.281e-02 1.288 0.19969
STAL 3.894e-03 2.607e-02 0.149 0.88146
REDU 8.673e-03 4.980e-02 0.174 0.86196
KEEP -1.582e-03 1.824e-02 -0.087 0.93097
RANK 7.667e-02 1.279e-02 5.995 1.23e-08 ***
RED 9.735e-01 5.352e-02 18.189 < 2e-16 ***
WHIT 1.009e+00 4.890e-02 20.632 < 2e-16 ***
AN89 -6.126e-02 3.451e-02 -1.775 0.07768 .
AN90 -8.108e-02 3.013e-02 -2.691 0.00786 **
AN91 NA NA NA NA
BORD 7.208e-02 2.741e-02 2.630 0.00935 **
COTE 4.426e-02 2.556e-02 1.732 0.08521 .
MEGR -5.028e-02 2.061e-02 -2.439 0.01578 *
SEPF NA NA NA NA
BLSE -1.734e-02 2.714e-02 -0.639 0.52376
BLDO NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.07517 on 166 degrees of freedom
Multiple R-squared: 0.9997, Adjusted R-squared: 0.9996
F-statistic: 1.896e+04 on 27 and 166 DF, p-value: < 2.2e-16
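The note "3 not defined because of singularities" means that three columns of the design matrix are exact linear combinations of other columns. A minimal sketch of how `alias()` can list such dependencies, using made-up data (the data frame `toy` and all of its column names are hypothetical and are not taken from `dat`):

```r
# Hypothetical toy data; none of these names come from the original `dat`.
set.seed(1)
n    <- 60
type <- rep(1:2, length.out = n)   # two mutually exclusive categories
year <- rep(1:3, length.out = n)   # three mutually exclusive categories

toy <- data.frame(
  x     = rnorm(n),
  type1 = as.numeric(type == 1),
  type2 = as.numeric(type == 2),   # type1 + type2 == 1 in every row
  yr1   = as.numeric(year == 1),
  yr2   = as.numeric(year == 2),
  yr3   = as.numeric(year == 3)    # yr1 + yr2 + yr3 == 1 in every row
)
toy$y <- 1 + 0.5 * toy$x + rnorm(n, sd = 0.1)

# There is no intercept, but type1 + type2 already spans a constant column,
# so the last dummy of the second group is linearly dependent on the rest.
fit <- lm(y ~ . + 0, data = toy)
coef(fit)                          # yr3 comes back as NA

# alias() expresses each dropped column as a combination of the kept ones.
dep <- alias(fit)$Complete
print(dep)
```

Running `alias(ols4)` on the real model would presumably show in the same way which columns AN91, SEPF and BLDO depend on.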
But with the following formula, AN91 is estimated without any problem:
> ols5 <- lm( log( PRICE ) ~ RANK+RED+AN89+AN90+AN91+0,data = dat)
> summary( ols5 )
Call:
lm(formula = log(PRICE) ~ RANK + RED + AN89 + AN90 + AN91 + 0,
data = dat)
Residuals:
Min 1Q Median 3Q Max
-1.08085 -0.24577 -0.04147 0.22025 1.13384
Coefficients:
Estimate Std. Error t value Pr(>|t|)
RANK 0.42816 0.05202 8.231 3.04e-14 ***
RED -0.46883 0.07672 -6.111 5.59e-09 ***
AN89 1.53508 0.07802 19.677 < 2e-16 ***
AN90 1.01565 0.05562 18.262 < 2e-16 ***
AN91 0.29380 0.08097 3.629 0.000367 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3928 on 188 degrees of freedom
Multiple R-squared: 0.99, Adjusted R-squared: 0.9897
F-statistic: 3714 on 5 and 188 DF, p-value: < 2.2e-16
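But if I change the formula so that it includes a constant term, the coefficient of AN91 is NA again. Why?
Is this related to AN91 taking only the two values 1 and 2? Several other variables, such as SEPF and BLDO, also take only these two values.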
> ols6 <- lm( log( PRICE ) ~ RANK+RED+AN89+AN90+AN91,data = dat)
> summary( ols6 )
Call:
lm(formula = log(PRICE) ~ RANK + RED + AN89 + AN90 + AN91, data = dat)
Residuals:
Min 1Q Median 3Q Max
-1.08085 -0.24577 -0.04147 0.22025 1.13384
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.17520 0.32387 3.629 0.000367 ***
RANK 0.42816 0.05202 8.231 3.04e-14 ***
RED -0.46883 0.07672 -6.111 5.59e-09 ***
AN89 1.24128 0.13320 9.319 < 2e-16 ***
AN90 0.72185 0.12205 5.915 1.54e-08 ***
AN91 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3928 on 188 degrees of freedom
Multiple R-squared: 0.5519, Adjusted R-squared: 0.5423
F-statistic: 57.88 on 4 and 188 DF, p-value: < 2.2e-16
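The printed numbers are consistent with one possible explanation, which is inferred from the output and not confirmed against the data: ols5 and ols6 have identical residuals, the intercept in ols6 equals 4 times the AN91 coefficient in ols5 (4 × 0.29380 = 1.17520), and the AN89 and AN90 coefficients in ols6 are the ols5 values minus 0.29380. That is what would happen if AN89 + AN90 + AN91 = 4 in every row, for example three year dummies coded 1/2 with exactly one of them equal to 2. A minimal sketch reproducing the pattern on made-up data (the names `toy`, `d1`, `d2`, `d3` are hypothetical):

```r
# Hypothetical toy data: three dummies coded 1/2, exactly one equals 2 per row.
set.seed(42)
n    <- 90
year <- rep(1:3, length.out = n)

toy <- data.frame(
  d1 = 1 + (year == 1),
  d2 = 1 + (year == 2),
  d3 = 1 + (year == 3)
)
toy$y <- 0.5 * toy$d1 + 0.3 * toy$d2 + 0.1 * toy$d3 + rnorm(n, sd = 0.1)

# The three dummies always sum to the same constant.
stopifnot(all(toy$d1 + toy$d2 + toy$d3 == 4))

# Without an intercept the three columns are linearly independent.
no_int <- lm(y ~ d1 + d2 + d3 + 0, data = toy)

# With an intercept, the constant column equals (d1 + d2 + d3) / 4,
# so lm() drops the last dummy and reports NA for it.
with_int <- lm(y ~ d1 + d2 + d3, data = toy)

coef(no_int)
coef(with_int)

# Both models span the same column space, so the fits coincide and
# the coefficients are related exactly as in the outputs above.
b0 <- coef(no_int)
b1 <- coef(with_int)
same_fit       <- isTRUE(all.equal(fitted(no_int), fitted(with_int)))
intercept_rule <- isTRUE(all.equal(b1[["(Intercept)"]], 4 * b0[["d3"]]))
slope_rule     <- isTRUE(all.equal(b1[["d1"]], b0[["d1"]] - b0[["d3"]]))
```

Under this reading, taking only the values 1 and 2 is not a problem in itself; the NA appears only when a constant column (an explicit intercept, or another group of dummies that sums to a constant) is in the model alongside the full set of dummies. The very different R-squared values arise because R computes R-squared for a model without an intercept from the uncentered total sum of squares.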