OP: Lisrelchen

[Jared E. Knowles] Getting Started with Multilevel Modeling in R


#1 (OP)
Lisrelchen, posted 2016-7-2 22:49:08

Getting Started with Multilevel Modeling in R
Jared E. Knowles

Introduction

Analysts dealing with grouped data and complex hierarchical structures, ranging from measurements nested within participants to counties nested within states or students nested within classrooms, often find themselves in need of modeling tools that reflect this structure in their data. In R there are two predominant ways to fit multilevel models that account for such structure in the data. These tutorials will show the user how to use both the lme4 package in R to fit linear and nonlinear mixed effect models, and to use rstan to fit fully Bayesian multilevel models. The focus here will be on how to fit the models in R and not the theory behind the models. For background on multilevel modeling, see the references. [1]

This tutorial will cover getting set up and running a few basic models using lme4 in R. Future tutorials will cover:

- constructing varying intercept, varying slope, and varying slope and intercept models in R
- generating predictions and interpreting parameters from mixed-effect models
- generalized and non-linear multilevel models
- fully Bayesian multilevel models fit with rstan or other MCMC methods

Setting up your enviRonment

Getting started with multilevel modeling in R is simple. lme4 is the canonical package for implementing multilevel models in R, though there are a number of packages that depend on and enhance its feature set, including Bayesian extensions. lme4 has recently been rewritten to improve speed and to incorporate a C++ codebase, and as such the features of the package are somewhat in flux. Be sure to update the package frequently.

To install lme4, we just run:

# Main version
install.packages("lme4")

# Or to install the dev version
library(devtools)
install_github("lme4", user = "lme4")
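Since lme4 was changing quickly at the time of writing, it can help to confirm which build you have before proceeding. A quick check along these lines (not part of the original tutorial) should work:

```r
# Report the installed lme4 version, and update it if a newer release exists
packageVersion("lme4")
update.packages(oldPkgs = "lme4", ask = FALSE)
```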




#2
Lisrelchen, posted 2016-7-2 22:49:42
Read in the data

Multilevel models are appropriate for a particular kind of data structure where units are nested within groups (generally 5+ groups) and where we want to model the group structure of the data. For our introductory example we will start with a simple example from the lme4 documentation and explain what the model is doing. We will use data from Jon Starkweather at the University of North Texas. Visit the excellent tutorial available here for more.

library(lme4)  # load library
library(arm)  # convenience functions for regression in R
lmm.data <- read.table("http://www.unt.edu/rss/class/Jon/R_SC/Module9/lmm.data.txt",
    header = TRUE, sep = ",", na.strings = "NA", dec = ".", strip.white = TRUE)
# summary(lmm.data)
head(lmm.data)
##   id extro  open agree social class school
## 1  1 63.69 43.43 38.03  75.06     d     IV
## 2  2 69.48 46.87 31.49  98.13     a     VI
## 3  3 79.74 32.27 40.21 116.34     d     VI
## 4  4 62.97 44.41 30.51  90.47     c     IV
## 5  5 64.25 36.86 37.44  98.52     d     IV
## 6  6 50.97 46.26 38.83  75.22     d      I

Here we have data on the extroversion of subjects nested within classes and within schools.
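Before fitting anything, it is worth confirming how many groups each candidate grouping variable has, since multilevel models generally want 5+ groups. A quick check (not in the original tutorial):

```r
# Inspect the grouping structure of the data
str(lmm.data)
length(unique(lmm.data$school))  # number of schools
length(unique(lmm.data$class))   # number of classes
```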

#3
Lisrelchen, posted 2016-7-2 22:50:18
Fit the Non-Multilevel Models

Let's start by fitting a simple OLS regression of measures of openness, agreeableness, and sociability on extroversion. Here we use the display function in the excellent arm package for abbreviated output. Other options include stargazer for LaTeX typeset tables, xtable, or the ascii package for more flexible plain text output options.

OLSexamp <- lm(extro ~ open + agree + social, data = lmm.data)
display(OLSexamp)
## lm(formula = extro ~ open + agree + social, data = lmm.data)
##             coef.est coef.se
## (Intercept) 57.84     3.15
## open         0.02     0.05
## agree        0.03     0.05
## social       0.01     0.02
## ---
## n = 1200, k = 4
## residual sd = 9.34, R-Squared = 0.00

So far this model does not fit very well at all. The R model interface is quite a simple one, with the dependent variable specified first, followed by the ~ symbol. The right-hand side, the predictor variables, are each named, and addition signs indicate that they are modeled as additive effects. Finally, we specify the data frame on which to calculate the model. Here we use the lm function to perform OLS regression, but there are many other options in R.

If we want to extract measures such as the AIC, we may prefer to fit a generalized linear model with glm, which produces a model fit through maximum likelihood estimation. Note that the model formula specification is the same.

MLexamp <- glm(extro ~ open + agree + social, data = lmm.data)
display(MLexamp)
## glm(formula = extro ~ open + agree + social, data = lmm.data)
##             coef.est coef.se
## (Intercept) 57.84     3.15
## open         0.02     0.05
## agree        0.03     0.05
## social       0.01     0.02
## ---
##   n = 1200, k = 4
##   residual deviance = 104378.2, null deviance = 104432.7 (difference = 54.5)
##   overdispersion parameter = 87.3
##   residual sd is sqrt(overdispersion) = 9.34
AIC(MLexamp)
## [1] 8774

This results in a poor model fit. Let's look at a simple varying intercept model now.
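As an aside not in the original post, glm objects also expose the log-likelihood and BIC alongside the AIC, which can be handy when comparing fits:

```r
# Other common fit measures available for a glm object
logLik(MLexamp)  # maximized log-likelihood
BIC(MLexamp)     # penalizes parameters more heavily than AIC
```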

#4
Lisrelchen, posted 2016-7-2 22:51:02
Fit a varying intercept model

Depending on disciplinary norms, our next step might be to fit a varying intercept model using a grouping variable such as school or class. Using the glm function and the familiar formula interface, such a fit is easy:

MLexamp.2 <- glm(extro ~ open + agree + social + class, data = lmm.data)
display(MLexamp.2)
## glm(formula = extro ~ open + agree + social + class, data = lmm.data)
##             coef.est coef.se
## (Intercept) 56.05     3.09
## open         0.03     0.05
## agree       -0.01     0.05
## social       0.01     0.02
## classb       2.06     0.75
## classc       3.70     0.75
## classd       5.67     0.75
## ---
##   n = 1200, k = 7
##   residual deviance = 99187.7, null deviance = 104432.7 (difference = 5245.0)
##   overdispersion parameter = 83.1
##   residual sd is sqrt(overdispersion) = 9.12
AIC(MLexamp.2)
## [1] 8719
anova(MLexamp, MLexamp.2, test = "F")
## Analysis of Deviance Table
##
## Model 1: extro ~ open + agree + social
## Model 2: extro ~ open + agree + social + class
##   Resid. Df Resid. Dev Df Deviance    F  Pr(>F)
## 1      1196     104378
## 2      1193      99188  3     5190 20.8 3.8e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This is often called a fixed-effects specification: it simply fits a separate dummy variable as a predictor for each class. We can see this does not provide much additional model fit. Let's see if school performs any better.

MLexamp.3 <- glm(extro ~ open + agree + social + school, data = lmm.data)
display(MLexamp.3)
## glm(formula = extro ~ open + agree + social + school, data = lmm.data)
##             coef.est coef.se
## (Intercept) 45.02     0.92
## open         0.01     0.01
## agree        0.03     0.02
## social       0.00     0.00
## schoolII     7.91     0.27
## schoolIII   12.12     0.27
## schoolIV    16.06     0.27
## schoolV     20.43     0.27
## schoolVI    28.05     0.27
## ---
##   n = 1200, k = 9
##   residual deviance = 8496.2, null deviance = 104432.7 (difference = 95936.5)
##   overdispersion parameter = 7.1
##   residual sd is sqrt(overdispersion) = 2.67
AIC(MLexamp.3)
## [1] 5774
anova(MLexamp, MLexamp.3, test = "F")
## Analysis of Deviance Table
##
## Model 1: extro ~ open + agree + social
## Model 2: extro ~ open + agree + social + school
##   Resid. Df Resid. Dev Df Deviance    F Pr(>F)
## 1      1196     104378
## 2      1191       8496  5    95882 2688 <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The school effect greatly improves our model fit. However, how do we interpret these effects?

table(lmm.data$school, lmm.data$class)
##
##        a  b  c  d
##   I   50 50 50 50
##   II  50 50 50 50
##   III 50 50 50 50
##   IV  50 50 50 50
##   V   50 50 50 50
##   VI  50 50 50 50

Here we can see we have a perfectly balanced design with fifty observations in each combination of class and school (if only data were always so nice!).

Let's try to model each of these unique cells. To do this, we fit a model and use the : operator to specify the interaction between school and class.

MLexamp.4 <- glm(extro ~ open + agree + social + school:class, data = lmm.data)
display(MLexamp.4)
## glm(formula = extro ~ open + agree + social + school:class, data = lmm.data)
##                  coef.est coef.se
## (Intercept)       80.36     0.37
## open               0.01     0.00
## agree             -0.01     0.01
## social             0.00     0.00
## schoolI:classa   -40.39     0.20
## schoolII:classa  -28.15     0.20
## schoolIII:classa -23.58     0.20
## schoolIV:classa  -19.76     0.20
## schoolV:classa   -15.50     0.20
## schoolVI:classa  -10.46     0.20
## schoolI:classb   -34.60     0.20
## schoolII:classb  -26.76     0.20
## schoolIII:classb -22.59     0.20
## schoolIV:classb  -18.71     0.20
## schoolV:classb   -14.31     0.20
## schoolVI:classb   -8.54     0.20
## schoolI:classc   -31.86     0.20
## schoolII:classc  -25.64     0.20
## schoolIII:classc -21.58     0.20
## schoolIV:classc  -17.58     0.20
## schoolV:classc   -13.38     0.20
## schoolVI:classc   -5.58     0.20
## schoolI:classd   -30.00     0.20
## schoolII:classd  -24.57     0.20
## schoolIII:classd -20.64     0.20
## schoolIV:classd  -16.60     0.20
## schoolV:classd   -12.04     0.20
## ---
##   n = 1200, k = 27
##   residual deviance = 1135.9, null deviance = 104432.7 (difference = 103296.8)
##   overdispersion parameter = 1.0
##   residual sd is sqrt(overdispersion) = 0.98
AIC(MLexamp.4)
## [1] 3396

This is very useful, but what if we want to understand both the effect of the school and the effect of the class, as well as the effect of the school-class combinations? Unfortunately, this is not easily done with the standard glm.

MLexamp.5 <- glm(extro ~ open + agree + social + school * class - 1, data = lmm.data)
display(MLexamp.5)
## glm(formula = extro ~ open + agree + social + school * class -
##     1, data = lmm.data)
##                  coef.est coef.se
## open              0.01     0.00
## agree            -0.01     0.01
## social            0.00     0.00
## schoolI          39.96     0.36
## schoolII         52.21     0.36
## schoolIII        56.78     0.36
## schoolIV         60.60     0.36
## schoolV          64.86     0.36
## schoolVI         69.90     0.36
## classb            5.79     0.20
## classc            8.53     0.20
## classd           10.39     0.20
## schoolII:classb  -4.40     0.28
## schoolIII:classb -4.80     0.28
## schoolIV:classb  -4.74     0.28
## schoolV:classb   -4.60     0.28
## schoolVI:classb  -3.87     0.28
## schoolII:classc  -6.02     0.28
## schoolIII:classc -6.54     0.28
## schoolIV:classc  -6.36     0.28
## schoolV:classc   -6.41     0.28
## schoolVI:classc  -3.65     0.28
## schoolII:classd  -6.81     0.28
## schoolIII:classd -7.45     0.28
## schoolIV:classd  -7.24     0.28
## schoolV:classd   -6.93     0.28
## schoolVI:classd   0.06     0.28
## ---
##   n = 1200, k = 27
##   residual deviance = 1135.9, null deviance = 4463029.9 (difference = 4461894.0)
##   overdispersion parameter = 1.0
##   residual sd is sqrt(overdispersion) = 0.98
AIC(MLexamp.5)
## [1] 3396
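To line up the fixed-effect specifications fitted so far, one compact option (not in the original post) is to tabulate their AICs in a single call:

```r
# Compare the glm fits by AIC; lower is better after the parameter penalty
sapply(list(base = MLexamp, class = MLexamp.2, school = MLexamp.3,
            cell = MLexamp.4, crossed = MLexamp.5), AIC)
```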

#5
Lisrelchen, posted 2016-7-2 22:51:47
Exploring Random Slopes

Another alternative is to fit a separate model for each of the school and class combinations. If we believe the relationship between our variables may be highly dependent on the school and class combination, we can simply fit a series of models and explore the parameter variation among them:

require(plyr)

modellist <- dlply(lmm.data, .(school, class), function(x) glm(extro ~ open +
    agree + social, data = x))
display(modellist[[1]])
## glm(formula = extro ~ open + agree + social, data = x)
##             coef.est coef.se
## (Intercept) 35.87     5.90
## open         0.05     0.09
## agree        0.02     0.10
## social       0.01     0.03
## ---
##   n = 50, k = 4
##   residual deviance = 500.2, null deviance = 506.2 (difference = 5.9)
##   overdispersion parameter = 10.9
##   residual sd is sqrt(overdispersion) = 3.30
display(modellist[[2]])
## glm(formula = extro ~ open + agree + social, data = x)
##             coef.est coef.se
## (Intercept) 47.96     2.16
## open        -0.01     0.03
## agree       -0.03     0.03
## social      -0.01     0.01
## ---
##   n = 50, k = 4
##   residual deviance = 47.9, null deviance = 49.1 (difference = 1.2)
##   overdispersion parameter = 1.0
##   residual sd is sqrt(overdispersion) = 1.02

We will discuss this strategy in more depth in future tutorials, including how to perform inference on the list of models produced by this command.
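To see how much the coefficients actually vary across the 24 school-class cells, one option (not shown in the original post) is to stack them into a single data frame with plyr:

```r
library(plyr)
# Collect the coefficients of every per-cell model into one table,
# with school and class identifiers preserved as columns
coef.table <- ldply(modellist, function(mod) coef(mod))
head(coef.table)
```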

#6
Lisrelchen, posted 2016-7-2 22:54:15
Fit a varying intercept model with lmer

Enter lme4. While all of the above techniques are valid approaches to this problem, they are not necessarily the best approach when we are interested explicitly in variation among and by groups. This is where a mixed-effect modeling framework is useful. Now we use the lmer function with the familiar formula interface, but group-level variables are specified using a special syntax: (1 | school) tells lmer to fit a linear model with a varying-intercept group effect using the variable school.

MLexamp.6 <- lmer(extro ~ open + agree + social + (1 | school), data = lmm.data)
display(MLexamp.6)
## lmer(formula = extro ~ open + agree + social + (1 | school),
##     data = lmm.data)
##             coef.est coef.se
## (Intercept) 59.12     4.10
## open         0.01     0.01
## agree        0.03     0.02
## social       0.00     0.00
##
## Error terms:
##  Groups   Name        Std.Dev.
##  school   (Intercept) 9.79
##  Residual             2.67
## ---
## number of obs: 1200, groups: school, 6
## AIC = 5836.1, DIC = 5789
## deviance = 5806.5

We can fit multiple group effects with multiple group effect terms.

MLexamp.7 <- lmer(extro ~ open + agree + social + (1 | school) + (1 | class),
    data = lmm.data)
display(MLexamp.7)
## lmer(formula = extro ~ open + agree + social + (1 | school) +
##     (1 | class), data = lmm.data)
##             coef.est coef.se
## (Intercept) 60.20     4.21
## open         0.01     0.01
## agree       -0.01     0.01
## social       0.00     0.00
##
## Error terms:
##  Groups   Name        Std.Dev.
##  school   (Intercept) 9.79
##  class    (Intercept) 2.41
##  Residual             1.67
## ---
## number of obs: 1200, groups: school, 6; class, 4
## AIC = 4737.9, DIC = 4683
## deviance = 4703.6

And finally, we can fit nested group effect terms through the following syntax:

MLexamp.8 <- lmer(extro ~ open + agree + social + (1 | school/class), data = lmm.data)
display(MLexamp.8)
## lmer(formula = extro ~ open + agree + social + (1 | school/class),
##     data = lmm.data)
##             coef.est coef.se
## (Intercept) 60.24     4.01
## open         0.01     0.00
## agree       -0.01     0.01
## social       0.00     0.00
##
## Error terms:
##  Groups       Name        Std.Dev.
##  class:school (Intercept) 2.86
##  school       (Intercept) 9.69
##  Residual                 0.98
## ---
## number of obs: 1200, groups: class:school, 24; school, 6
## AIC = 3568.6, DIC = 3508
## deviance = 3531.1

Here (1 | school/class) says that we want to fit a varying-intercept term (the 1) grouped by school, and by class nested within school.
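If we want to inspect the estimated intercepts themselves, lme4 provides accessor functions. Building on MLexamp.8 above, a quick look might be:

```r
fixef(MLexamp.8)  # population-level (fixed) coefficients
ranef(MLexamp.8)  # estimated intercept deviations for school and class:school
```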

#7
Lisrelchen, posted 2016-7-2 22:54:41
Fit a varying slope model with lmer

But what if we want to explore the effect of different student-level indicators as they vary across classrooms? Instead of fitting unique models by school (or school/class), we can fit a varying slope model. Here we modify our random effect term to include variables before the grouping terms: (1 + open | school/class) tells R to fit a varying slope and varying intercept model for schools and classes nested within schools, and to allow the slope of the open variable to vary by school and by class within school.

MLexamp.9 <- lmer(extro ~ open + agree + social + (1 + open | school/class),
    data = lmm.data)
display(MLexamp.9)
## lmer(formula = extro ~ open + agree + social + (1 + open | school/class),
##     data = lmm.data)
##             coef.est coef.se
## (Intercept) 60.26     3.93
## open         0.01     0.01
## agree       -0.01     0.01
## social       0.00     0.00
##
## Error terms:
##  Groups       Name        Std.Dev. Corr
##  class:school (Intercept) 2.62
##               open        0.01     1.00
##  school       (Intercept) 9.51
##               open        0.00     1.00
##  Residual                 0.98
## ---
## number of obs: 1200, groups: class:school, 24; school, 6
## AIC = 3574.7, DIC = 3506
## deviance = 3529.3
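To dig into the estimated slope variation directly (a step not taken in the original post), we can pull out the variance components and group-level deviations:

```r
VarCorr(MLexamp.9)       # variance components for intercepts and open slopes
ranef(MLexamp.9)$school  # per-school deviations in intercept and open slope
```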

#8
Lisrelchen, posted 2016-7-2 22:55:28
Conclusion
Fitting mixed effect models and exploring group level variation is very easy within the R language and ecosystem. In future tutorials we will explore comparing across models, doing inference with mixed-effect models, and creating graphical representations of mixed effect models to understand their effects.

Appendix
print(sessionInfo(), locale = FALSE)
## R version 3.0.1 (2013-05-16)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
##
## other attached packages:
## [1] plyr_1.8        arm_1.6-10      MASS_7.3-29     lme4_1.0-5     
## [5] Matrix_1.1-0    lattice_0.20-24 knitr_1.5      
##
## loaded via a namespace (and not attached):
##  [1] abind_1.4-0    coda_0.16-1    evaluate_0.5.1 formatR_0.10  
##  [5] grid_3.0.1     minqa_1.2.1    nlme_3.1-113   splines_3.0.1
##  [9] stringr_0.6.2  tools_3.0.1
[1] Examples include Gelman and Hill (2007) and Gelman et al. (2013), among others.

#9
cheeko (employment verified), posted 2016-7-3 00:18:00
thank you
