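I have just started learning R and have been reading Prof. Li Dongfeng's online tutorial at https://www.math.pku.edu.cn/teachers/lidf/docs/Rbook/html/_Rbook/stat-glm.html#stat-glm-pois-rate. I have a question about it.
Section 35.2.6, on incidence rate models, uses a Poisson model to study the factors (city, age) that affect lung cancer incidence (cases).
The dataset is eba1977: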
'data.frame': 24 obs. of 4 variables:
$ city : Factor w/ 4 levels "Fredericia","Horsens",..: 1 2 3 4 1 2 3 4 1 2 ...
$ age : Factor w/ 6 levels "40-54","55-59",..: 1 1 1 1 2 2 2 2 3 3 ...
$ pop : int 3059 2879 3142 2520 800 1083 1050 878 710 923 ...
$ cases: int 11 13 4 5 11 6 8 7 11 15 ...
The code in the tutorial is:
> glm.eba2 <- glm(
+   cases ~ -1 + age + city + offset(log(pop)),
+   family = poisson,
+   data = eba1977)
> summary(glm.eba2)
The output is:
Call:
glm(formula = cases ~ -1 + age + city + offset(log(pop)), family = poisson,
data = eba1977)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.63573 -0.67296 -0.03436 0.37258 1.85267
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
age40-54     -5.6321     0.2003 -28.125   <2e-16 ***
age55-59     -4.5311     0.2073 -21.861   <2e-16 ***
age60-64     -4.1135     0.1869 -22.005   <2e-16 ***
age65-69     -3.8644     0.1841 -20.993   <2e-16 ***
age70-74     -3.7752     0.1897 -19.897   <2e-16 ***
age75+       -4.2124     0.2082 -20.233   <2e-16 ***
cityHorsens  -0.3301     0.1815  -1.818   0.0690 .
cityKolding  -0.3715     0.1878  -1.978   0.0479 *
cityVejle    -0.2723     0.1879  -1.450   0.1472
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 50361.048 on 24 degrees of freedom
Residual deviance: 23.447 on 15 degrees of freedom
AIC: 137.84
Number of Fisher Scoring iterations: 5
I then tried swapping the positions of age and city in the formula, using this code:
> glm.eba3 <- glm(
+   cases ~ -1 + city + age + offset(log(pop)),
+   family = poisson,
+   data = eba1977)
> summary(glm.eba3)
This gave the following output:
Call:
glm(formula = cases ~ -1 + city + age + offset(log(pop)), family = poisson,
data = eba1977)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.63573 -0.67296 -0.03436 0.37258 1.85267
Coefficients:
               Estimate Std. Error z value Pr(>|z|)
cityFredericia  -5.6321     0.2003 -28.125  < 2e-16 ***
cityHorsens     -5.9621     0.2106 -28.312  < 2e-16 ***
cityKolding     -6.0036     0.2127 -28.231  < 2e-16 ***
cityVejle       -5.9044     0.2151 -27.446  < 2e-16 ***
age55-59         1.1010     0.2483   4.434 9.23e-06 ***
age60-64         1.5186     0.2316   6.556 5.53e-11 ***
age65-69         1.7677     0.2294   7.704 1.31e-14 ***
age70-74         1.8569     0.2353   7.891 3.00e-15 ***
age75+           1.4197     0.2503   5.672 1.41e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 50361.048 on 24 degrees of freedom
Residual deviance: 23.447 on 15 degrees of freedom
AIC: 137.84
Number of Fisher Scoring iterations: 5
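As you can see, the two runs list different numbers of age and city terms under Coefficients: the first run shows all 6 age levels but only 3 of the 4 cities, while the second shows all 4 cities but only 5 of the 6 age levels. Why is that? age has 6 groups and city has 4, so why is one level missing from the output? And the missing level is not even the same in the two runs. I would appreciate any explanation. Thanks!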
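One possible way to look into this is to inspect the design matrix that R builds from each formula. The sketch below uses only base R and a toy data frame `d`; it is not the real eba1977 data, and its level names are copied from the output above. The names `d`, `X_age_first`, `X_city_first` and `X_full` are made up for illustration. With the intercept removed by `-1`, R's default treatment contrasts give the first factor in the formula one dummy column per level, while every later factor drops its first level as the baseline. Adding the dropped level back would not add information, because the extra column is a linear combination of the existing ones.

```r
# Toy data: every city x age combination once (an assumption for illustration;
# level names are taken from the coefficient tables above).
city_levels <- c("Fredericia", "Horsens", "Kolding", "Vejle")
age_levels  <- c("40-54", "55-59", "60-64", "65-69", "70-74", "75+")
d <- expand.grid(city = factor(city_levels, levels = city_levels),
                 age  = factor(age_levels,  levels = age_levels))

# Design matrices for the two term orders, both without an intercept.
X_age_first  <- model.matrix(~ -1 + age + city, data = d)
X_city_first <- model.matrix(~ -1 + city + age, data = d)

# The first factor keeps all of its levels; the second loses its first level.
print(colnames(X_age_first))
print(colnames(X_city_first))

# Both matrices have 9 columns and full column rank.
print(c(ncol(X_age_first),  qr(X_age_first)$rank))
print(c(ncol(X_city_first), qr(X_city_first)$rank))

# Adding the dropped city dummy by hand gives 10 columns, but the rank does
# not increase: the 6 age dummies sum to 1, and so do the 4 city dummies.
X_full <- cbind(X_age_first,
                cityFredericia = as.numeric(d$city == "Fredericia"))
print(c(ncol(X_full), qr(X_full)$rank))
```

If this reading is right, the two fits are the same model written in two parameterizations, which would be consistent with the identical residual deviance (23.447) and AIC (137.84) in both outputs. The reported coefficients also line up: for example, cityHorsens in the second run (-5.9621) is, up to rounding, age40-54 plus cityHorsens from the first run (-5.6321 + -0.3301).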