[R] Modeling with bigglm.ffdf from the ffbase package

fan19889017 posted on 2015-3-3 12:33:30

```r
library(ff)
library(ffbase)   # provides bigglm.ffdf() for ffdf data frames
library(biglm)
data(Affairs, package = "AER")

# Binary response: 1 if the respondent reported any affair, 0 otherwise
Affairs$ynaffair[Affairs$affairs > 0] <- 1
Affairs$ynaffair[Affairs$affairs == 0] <- 0

# Store each column as an on-disk ff vector.
# gender and children are factors; as.integer() stores their codes (1/2).
# (In R < 4.1.0, c() on a factor also returned the codes; since 4.1.0 it keeps the factor.)
gender        <- as.ff(as.integer(Affairs$gender), vmode = "integer")
age           <- as.ff(Affairs$age, vmode = "double")
yearsmarried  <- as.ff(Affairs$yearsmarried, vmode = "double")
children      <- as.ff(as.integer(Affairs$children), vmode = "integer")
religiousness <- as.ff(Affairs$religiousness, vmode = "integer")
education     <- as.ff(Affairs$education, vmode = "integer")
occupation    <- as.ff(Affairs$occupation, vmode = "integer")
rating        <- as.ff(Affairs$rating, vmode = "integer")
ynaffair      <- as.ff(Affairs$ynaffair, vmode = "integer")
ts <- ffdf(ynaffair, gender, age, yearsmarried, children,
           religiousness, education, occupation, rating)

# Logistic regression fitted in chunks of 5 rows
full <- bigglm.ffdf(ynaffair ~ gender + age + yearsmarried +
                      children + religiousness + education + occupation + rating,
                    data = ts, family = binomial(), chunksize = 5)
summary(full)
```
```text
Large data regression model: bigglm(ynaffair ~ gender + age + yearsmarried + children + religiousness +
    education + occupation + rating, data = ts, family = binomial(),
    chunksize = 5)
Sample size = 601
                 Coef    (95%     CI)     SE      p
(Intercept)    0.6993 -1.2040  2.6026 0.9517 0.4624
gender         0.2803 -0.1979  0.7585 0.2391 0.2411
age           -0.0443 -0.0808 -0.0078 0.0182 0.0153
yearsmarried   0.0948  0.0303  0.1592 0.0322 0.0033
children       0.3977 -0.1853  0.9807 0.2915 0.1725
religiousness -0.3247 -0.5042 -0.1452 0.0898 0.0003
education      0.0211 -0.0800  0.1221 0.0505 0.6769
occupation     0.0309 -0.1126  0.1745 0.0718 0.6666
rating        -0.4685 -0.6503 -0.2866 0.0909 0.0000
```
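```r
# The same model fitted in memory with glm(); gender and children stay factors,
# so glm() codes them as 0/1 dummies (gendermale, childrenyes)
fit.full <- glm(ynaffair ~ gender + age + yearsmarried +
                  children + religiousness + education + occupation + rating,
                data = Affairs, family = binomial())
summary(fit.full)
```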
```text
Call:
glm(formula = ynaffair ~ gender + age + yearsmarried + children +
    religiousness + education + occupation + rating, family = binomial(),
    data = Affairs)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5713  -0.7499  -0.5690  -0.2539   2.5191

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    1.37726    0.88776   1.551 0.120807
gendermale     0.28029    0.23909   1.172 0.241083
age           -0.04426    0.01825  -2.425 0.015301 *
yearsmarried   0.09477    0.03221   2.942 0.003262 **
childrenyes    0.39767    0.29151   1.364 0.172508
religiousness -0.32472    0.08975  -3.618 0.000297 ***
education      0.02105    0.05051   0.417 0.676851
occupation     0.03092    0.07178   0.431 0.666630
rating        -0.46845    0.09091  -5.153 2.56e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 675.38  on 600  degrees of freedom
Residual deviance: 609.51  on 592  degrees of freedom
AIC: 627.51

Number of Fisher Scoring iterations: 4
```
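
Comparing the two outputs, all slopes agree, but the intercepts differ (0.6993 vs 1.37726). The gap is exactly the sum of the gender and children slopes: 1.37726 - 0.28029 - 0.39767 = 0.6993. A likely explanation is that the ff columns hold the factor codes 1/2, while glm() uses 0/1 dummies, so every gender and children value is shifted by 1 and the intercept absorbs the shift. The sketch below checks this on simulated data; the variables sex, x and y are hypothetical and not from the post, and only base R is needed.

```r
# Hypothetical simulated data: one two-level factor and one numeric predictor
set.seed(1)
n <- 500
sex <- factor(sample(c("female", "male"), n, replace = TRUE))
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * (sex == "male") + 0.5 * x))

# Factor left as is: glm() builds a 0/1 dummy column named "sexmale"
fit_dummy <- glm(y ~ sex + x, family = binomial())

# Factor replaced by its integer codes: female = 1, male = 2
sex_code <- as.integer(sex)
fit_codes <- glm(y ~ sex_code + x, family = binomial())

print(rbind(dummy = coef(fit_dummy), codes = coef(fit_codes)))

# With 1/2 coding the intercept equals the 0/1 intercept minus the factor's slope
shifted <- coef(fit_dummy)[["(Intercept)"]] - coef(fit_dummy)[["sexmale"]]
cat("intercept (codes):", coef(fit_codes)[["(Intercept)"]],
    " dummy intercept - slope:", shifted, "\n")
```

Both fits have the same slopes and differ only in the intercept, which mirrors the bigglm and glm outputs above.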

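This code is taken from Chapter 13 of R in Action (《R语言实战》). It fits the same logistic regression twice: once with bigglm on an ffdf, and once with glm on an ordinary data frame.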
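
To give an idea of what chunksize controls: roughly speaking, bigglm fits the model by iteratively reweighted least squares (IRLS) and makes one pass through the data per iteration, reading it a chunk at a time, so memory use depends on the chunk size rather than on the number of rows. biglm itself is built on an incremental QR decomposition (Alan Miller's algorithm AS274); the sketch below swaps in the simpler accumulation of X'WX and X'Wz, so it only illustrates the idea and is not biglm's code. The helper chunked_logit and the simulated data are hypothetical.

```r
# Chunked IRLS for logistic regression (normal-equations version, hypothetical helper).
# Each iteration streams over the rows in blocks of `chunksize`; only one block
# and the p x p accumulators are needed at a time.
chunked_logit <- function(X, y, chunksize = 5, tol = 1e-10, maxit = 25) {
  p <- ncol(X)
  n <- nrow(X)
  beta <- rep(0, p)
  for (iter in seq_len(maxit)) {
    XtWX <- matrix(0, p, p)
    XtWz <- numeric(p)
    for (start in seq(1, n, by = chunksize)) {
      idx <- start:min(start + chunksize - 1, n)
      Xc <- X[idx, , drop = FALSE]
      eta <- drop(Xc %*% beta)           # linear predictor for this chunk
      mu <- plogis(eta)                  # inverse logit
      w <- mu * (1 - mu)                 # IRLS weights for the logit link
      z <- eta + (y[idx] - mu) / w       # working response
      XtWX <- XtWX + crossprod(Xc, w * Xc)
      XtWz <- XtWz + drop(crossprod(Xc, w * z))
    }
    beta_new <- solve(XtWX, XtWz)
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  setNames(beta, colnames(X))
}

# Hypothetical simulated data with 601 rows, the sample size in the post
set.seed(42)
n <- 601
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, plogis(-1 + 0.7 * x1 - 0.5 * x2))
X <- cbind("(Intercept)" = 1, x1 = x1, x2 = x2)

b_chunk <- chunked_logit(X, y, chunksize = 5)
b_glm <- coef(glm(y ~ x1 + x2, family = binomial()))
print(rbind(chunked = b_chunk, glm = b_glm))
```

The chunked fit should reproduce the glm() coefficients up to numerical tolerance; changing chunksize changes how many rows are held at once, not the estimates.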

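The generalized linear model (GLM) extends ordinary least squares (OLS) regression. OLS assumes that the response variable is continuous and normally distributed, and that its expected value is a linear function of the predictors. A GLM relaxes these assumptions in two ways. First, the response may be a count (non-negative integer) or categorical, and its distribution can be any member of the exponential family. Second, it is a function of the expected response, called the link function, rather than the expected response itself, that is linear in the predictors. Fitting a GLM therefore requires specifying both the distribution family and the link function.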
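
Written out, the model described above has the following form (the symbols are introduced here for illustration): for a link function g and predictors x_1, ..., x_p,

$$
g\big(\mathbb{E}[Y \mid x_1,\dots,x_p]\big) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p
$$

OLS is the special case with a normal response and the identity link, g(\mu) = \mu. The logistic regression in the example uses a binomial response and the logit link:

$$
\log\frac{\pi}{1-\pi} = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p,
\qquad
\pi = P(Y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)}}
$$

Exponentiating a coefficient gives an odds ratio. For rating in the glm output, e^{-0.46845} \approx 0.626, so each one-point increase in marriage rating multiplies the odds of an affair by about 0.63, holding the other predictors fixed.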

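In R, GLMs are usually fitted with glm(). The family argument accepts, among others, binomial() (binomial distribution), gaussian() (normal distribution), Gamma() (gamma distribution) and poisson() (Poisson distribution); each family has a default link, e.g. logit for binomial() and log for poisson(). As with lm(), a fitted glm object can be processed further with generic functions such as summary(), coef(), confint(), residuals(), anova(), plot() and predict().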
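
A minimal sketch of those generic functions on a fitted logistic model. It assumes the built-in mtcars data (am: 0 = automatic, 1 = manual; wt: weight in 1000 lbs) instead of the Affairs data, so it runs without AER; plot() is left out because it opens a graphics device.

```r
# Built-in mtcars data, chosen for illustration; the post uses AER's Affairs
fit <- glm(am ~ wt, family = binomial(), data = mtcars)

print(summary(fit))                 # coefficient table, deviances, AIC
print(coef(fit))                    # estimates on the log-odds scale
print(exp(coef(fit)))               # odds ratios
print(confint(fit))                 # confidence intervals for the coefficients
print(head(residuals(fit, type = "deviance")))
print(anova(fit, test = "Chisq"))   # analysis of deviance

# Predicted probability of a manual transmission for a 2800 lb car
p_new <- predict(fit, newdata = data.frame(wt = 2.8), type = "response")
print(p_new)
```

type = "response" returns probabilities; the default type = "link" would return log-odds.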