- library(ff)
- library(ffbase)
- library(biglm)
- data(Affairs, package = "AER")
- Affairs$ynaffair[Affairs$affairs > 0] <- 1
- Affairs$ynaffair[Affairs$affairs == 0] <- 0
- gender <- as.ff(c(Affairs$gender),vmode="integer")
- age <- as.ff(c(Affairs$age),vmode="double")
- yearsmarried <- as.ff(c(Affairs$yearsmarried),vmode="double")
- children <- as.ff(c(Affairs$children),vmode="integer")
- religiousness <- as.ff(c(Affairs$religiousness),vmode="integer")
- education <- as.ff(c(Affairs$education),vmode="integer")
- occupation <- as.ff(c(Affairs$occupation),vmode="integer")
- rating <- as.ff(c(Affairs$rating),vmode="integer")
- ynaffair <- as.ff(c(Affairs$ynaffair),vmode="integer")
- ts <- ffdf(ynaffair,gender,age,yearsmarried,children,religiousness,education,occupation,rating)
- full <- bigglm.ffdf(ynaffair ~ gender + age + yearsmarried +
- children + religiousness + education + occupation + rating,
- data=ts,family=binomial(),chunksize=5,sandwich=FALSE)
- summary(full)
- Large data regression model: bigglm(ynaffair ~ gender + age + yearsmarried + children + religiousness +
- education + occupation + rating, data = ts, family = binomial(),
- chunksize = 5)
- Sample size = 601
- Coef (95% CI) SE p
- (Intercept) 0.6993 -1.2040 2.6026 0.9517 0.4624
- gender 0.2803 -0.1979 0.7585 0.2391 0.2411
- age -0.0443 -0.0808 -0.0078 0.0182 0.0153
- yearsmarried 0.0948 0.0303 0.1592 0.0322 0.0033
- children 0.3977 -0.1853 0.9807 0.2915 0.1725
- religiousness -0.3247 -0.5042 -0.1452 0.0898 0.0003
- education 0.0211 -0.0800 0.1221 0.0505 0.6769
- occupation 0.0309 -0.1126 0.1745 0.0718 0.6666
- rating -0.4685 -0.6503 -0.2866 0.0909 0.0000
- fit.full <- glm(ynaffair ~ gender + age + yearsmarried +
- children + religiousness + education + occupation + rating,
- data = Affairs, family = binomial())
- summary(fit.full)
- Call:
- glm(formula = ynaffair ~ gender + age + yearsmarried + children +
- religiousness + education + occupation + rating, family = binomial(),
- data = Affairs)
- Deviance Residuals:
- Min 1Q Median 3Q Max
- -1.5713 -0.7499 -0.5690 -0.2539 2.5191
- Coefficients:
- Estimate Std. Error z value Pr(>|z|)
- (Intercept) 1.37726 0.88776 1.551 0.120807
- gendermale 0.28029 0.23909 1.172 0.241083
- age -0.04426 0.01825 -2.425 0.015301 *
- yearsmarried 0.09477 0.03221 2.942 0.003262 **
- childrenyes 0.39767 0.29151 1.364 0.172508
- religiousness -0.32472 0.08975 -3.618 0.000297 ***
- education 0.02105 0.05051 0.417 0.676851
- occupation 0.03092 0.07178 0.431 0.666630
- rating -0.46845 0.09091 -5.153 2.56e-07 ***
- ---
- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
- (Dispersion parameter for binomial family taken to be 1)
- Null deviance: 675.38 on 600 degrees of freedom
- Residual deviance: 609.51 on 592 degrees of freedom
- AIC: 627.51
- Number of Fisher Scoring iterations: 4
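This code is adapted from Chapter 13 of 《R语言实战》 (R in Action). It fits the same logistic regression twice: once with bigglm on ff data and once with glm on the original data frame.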
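In the two summaries above, the slopes and their standard errors agree, but the intercepts differ (0.6993 from bigglm versus 1.37726 from glm). The most likely reason is that gender and children reach the ff model as integer codes 1/2, while glm turns the factors into 0/1 dummies; the arithmetic is consistent with this, since 1.37726 − 0.28029 − 0.39767 = 0.6993. The sketch below reproduces the effect on simulated data with one binary predictor. The data and all variable names (`g01`, `g12`, `fit01`, `fit12`) are hypothetical and are not part of the original post.

```r
# Simulated data (an assumption for illustration; this is not the Affairs data).
set.seed(1)
n   <- 500
g01 <- rbinom(n, 1, 0.5)                      # binary predictor coded 0/1
x   <- rnorm(n)                               # a continuous predictor
y   <- rbinom(n, 1, plogis(-0.5 + 0.8 * g01 + 0.3 * x))

g12 <- g01 + 1                                # the same predictor coded 1/2

fit01 <- glm(y ~ g01 + x, family = binomial())
fit12 <- glm(y ~ g12 + x, family = binomial())

# The slopes do not change when the binary predictor is recoded.
slope_diff <- unname(coef(fit12)[-1] - coef(fit01)[-1])

# The intercept absorbs the shift:
# intercept (1/2 coding) = intercept (0/1 coding) - slope of the recoded predictor.
intercept_gap <- unname(coef(fit01)[1] - coef(fit01)["g01"] - coef(fit12)[1])

print(slope_diff)
print(intercept_gap)
```

Both printed quantities should be zero up to numerical error, so the two fits describe the same model and differ only in how the intercept is parameterized.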
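The generalized linear model (GLM) extends ordinary least squares (OLS) regression. OLS assumes that the response is continuous and normally distributed, and that its expected value is a linear function of the predictors. A GLM relaxes both assumptions. First, the response may be a count or a categorical variable, with a distribution drawn from the exponential family. Second, it is a function of the expected response (the link function), rather than the expected response itself, that is linear in the predictors. Fitting a GLM therefore requires specifying both the distribution and the link function.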
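As a small illustration of the distribution and link pairing, the sketch below inspects the family objects that base R passes to glm. It only uses the `link`, `linkfun`, and `linkinv` components of those objects; the probabilities in `p` are made-up example values.

```r
fam <- binomial()            # binomial family; its default link is the logit
print(fam$link)              # name of the link function

p      <- c(0.1, 0.5, 0.9)   # example expected responses (probabilities)
eta    <- fam$linkfun(p)     # link scale: log(p / (1 - p))
p_back <- fam$linkinv(eta)   # inverse link maps back to probabilities

print(eta)
print(p_back)

# Other families carry their own default links.
print(poisson()$link)        # log link for counts
print(gaussian()$link)       # identity link, which recovers OLS
```

The linear predictor lives on the `eta` scale, which is why the coefficients in the summaries above are changes in log-odds rather than in probability.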
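In R, a GLM is usually fitted with the glm function. Its family argument accepts, among others, binomial (binomial distribution), gaussian (normal distribution), Gamma (gamma distribution), and poisson (Poisson distribution). As with lm, the fitted object can be post-processed with generic functions such as summary(), coef(), confint(), residuals(), anova(), plot(), and predict().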
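The generic functions listed above can be tried on any fitted glm object. The following is a minimal sketch that uses the built-in mtcars data instead of Affairs so that it runs without extra packages; the model `am ~ wt` and the new weights passed to predict() are arbitrary choices for illustration. It calls confint.default() to get Wald intervals, which is an assumption made here to keep the example quick; confint() on a glm gives profile-likelihood intervals instead.

```r
# mtcars ships with R; am (0 = automatic, 1 = manual) is a binary response.
fit <- glm(am ~ wt, data = mtcars, family = binomial())

est <- coef(fit)                             # estimates on the log-odds scale
ci  <- confint.default(fit)                  # Wald 95% confidence intervals
res <- residuals(fit, type = "deviance")     # deviance residuals
dev <- anova(fit, test = "Chisq")            # analysis-of-deviance table

# Predicted probability of a manual transmission at two example weights.
new_cars <- data.frame(wt = c(2.5, 3.5))
p_hat    <- predict(fit, newdata = new_cars, type = "response")

print(summary(fit))
print(exp(est))                              # odds ratios
print(ci)
print(dev)
print(p_hat)
```

Passing type = "response" to predict() returns probabilities; the default, type = "link", returns values on the log-odds scale.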










