Thread starter: Lisrelchen

Jared E. Knowles: Fun with merMod Objects


Original post
Lisrelchen posted on 2016-7-2 23:46:16

Fun with merMod Objects
Jared E. Knowles
Saturday, May 17, 2014

Introduction

First of all, be warned: the terminology surrounding multilevel models is vastly inconsistent. For example, multilevel models themselves may be referred to as hierarchical linear models, random effects models, multilevel models, random intercept models, random slope models, or pooling models. Depending on the discipline, the software used, and the academic literature, many of these terms may refer to the same general modeling strategy. In this tutorial I will attempt to provide a user guide to multilevel modeling by demonstrating how to fit multilevel models in R and by connecting the model fitting procedure to the terminology commonly used for these models.

We will cover the following topics:

  • The structure and methods of merMod objects
  • Extracting random effects from merMod objects
  • Plotting and interpreting merMod objects

If you haven't already, make sure you head over to the Getting Started With Multilevel Models tutorial to ensure you have set up your environment correctly and installed all the necessary packages. The tl;dr is that you will need (a quick setup sketch follows this list):

  • A current version of R (2.15 or greater)
  • The lme4 package (install.packages("lme4"))
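As a minimal setup sketch (not part of the original post; the version check and the extra packages are just conveniences for following along):

# Install the packages used in this tutorial if they are missing, then load them.
# Assumes an internet connection for install.packages().
pkgs <- c("lme4", "arm", "lattice")
missing <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(missing) > 0) install.packages(missing)
invisible(lapply(pkgs, library, character.only = TRUE))
getRversion()  # should be 2.15 or greater for lme4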



Reply #2
Lisrelchen posted on 2016-7-2 23:47:02
Read in the data

Multilevel models are appropriate for a particular kind of data structure where units are nested within groups (generally 5+ groups) and where we want to model the group structure of the data. We will use data from Jon Starkweather at the University of North Texas on student social attributes within schools and classes. Visit the excellent tutorial available here for more.

library(lme4) # load library
library(arm) # convenience functions for regression in R
lmm.data <- read.table("http://www.unt.edu/rss/class/Jon/R_SC/Module9/lmm.data.txt",
   header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
#summary(lmm.data)
head(lmm.data)
##   id extro  open agree social class school
## 1  1 63.69 43.43 38.03  75.06     d     IV
## 2  2 69.48 46.87 31.49  98.13     a     VI
## 3  3 79.74 32.27 40.21 116.34     d     VI
## 4  4 62.97 44.41 30.51  90.47     c     IV
## 5  5 64.25 36.86 37.44  98.52     d     IV
## 6  6 50.97 46.26 38.83  75.22     d      I

Here we have data on the extroversion of subjects nested within classes and within schools.

Let's understand the structure of the data a bit before we begin:

str(lmm.data)
## 'data.frame':    1200 obs. of  7 variables:
##  $ id    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ extro : num  63.7 69.5 79.7 63 64.2 ...
##  $ open  : num  43.4 46.9 32.3 44.4 36.9 ...
##  $ agree : num  38 31.5 40.2 30.5 37.4 ...
##  $ social: num  75.1 98.1 116.3 90.5 98.5 ...
##  $ class : Factor w/ 4 levels "a","b","c","d": 4 1 4 3 4 4 4 4 1 2 ...
##  $ school: Factor w/ 6 levels "I","II","III",..: 4 6 6 4 4 1 3 4 3 1 ...

Here we see we have two possible grouping variables – class and school. Let's explore them a bit further:

table(lmm.data$class)
##
##   a   b   c   d
## 300 300 300 300
table(lmm.data$school)
##
##   I  II III  IV   V  VI
## 200 200 200 200 200 200
table(lmm.data$class, lmm.data$school)
##
##      I II III IV  V VI
##   a 50 50  50 50 50 50
##   b 50 50  50 50 50 50
##   c 50 50  50 50 50 50
##   d 50 50  50 50 50 50

This is a perfectly balanced dataset. In all likelihood you aren't working with a perfectly balanced dataset, but we'll explore the implications of that in the future. For now, let's plot the data a bit. Using the excellent xyplot function in the lattice package, we can explore the relationship between schools and classes across our variables (a numeric summary sketch follows the code).

require(lattice)
xyplot(extro ~ open + social + agree | class, data = lmm.data,
                 auto.key = list(x = .85, y = .035, corner = c(0, 0)),
       layout = c(4,1), main = "Extroversion by Class")
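As a complementary, hedged sketch (not part of the original post), we can also summarize extroversion numerically by group using base R:

# Mean extroversion by school and by class, using base R's aggregate()
aggregate(extro ~ school, data = lmm.data, FUN = mean)
aggregate(extro ~ class, data = lmm.data, FUN = mean)
# A two-way table of cell means, class by school
with(lmm.data, tapply(extro, list(class, school), mean))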

Reply #3
Lisrelchen posted on 2016-7-2 23:51:24
By school we see that students are tightly grouped, but that school I and school VI show substantially more dispersion than the other schools. The same pattern among our predictors holds between schools as it did between classes. Let's put it all together:

xyplot(extro ~ open + social + agree | school + class, data = lmm.data,
                 auto.key = list(x = .85, y = .035, corner = c(0, 0)),
       main = "Extroversion by School and Class")

Reply #4
Lisrelchen posted on 2016-7-2 23:52:21
Exploring the Internals of a merMod Object

In the last tutorial we fit a series of random intercept models to our nested data. We will examine the lmerMod object produced when we fit this model in much more depth in order to understand how to work with mixed effect models in R. We start by fitting the basic example below, with observations grouped by school:

MLexamp1 <- lmer(extro ~ open + agree + social + (1|school), data=lmm.data)
class(MLexamp1)
## [1] "lmerMod"
## attr(,"package")
## [1] "lme4"

First, we see that MLexamp1 is now an R object of the class lmerMod. This lmerMod object is an S4 class, and to explore its structure, we use slotNames:

slotNames(MLexamp1)
##  [1] "resp"    "Gp"      "call"    "frame"   "flist"   "cnms"    "lower"  
##  [8] "theta"   "beta"    "u"       "devcomp" "pp"      "optinfo"
#showMethods(classes="lmerMod")

Within the lmerMod object we see a number of objects that we may wish to explore. To look at any of these, we can simply type MLexamp1@ and then the slot name itself. For example:

MLexamp1@call # returns the model call
## lmer(formula = extro ~ open + agree + social + (1 | school),
##     data = lmm.data)
MLexamp1@beta # returns the betas
## [1] 59.116514  0.009751  0.027788 -0.002151
class(MLexamp1@frame) # returns the class for the frame slot
## [1] "data.frame"
head(MLexamp1@frame) # returns the model frame
##   extro  open agree social school
## 1 63.69 43.43 38.03  75.06     IV
## 2 69.48 46.87 31.49  98.13     VI
## 3 79.74 32.27 40.21 116.34     VI
## 4 62.97 44.41 30.51  90.47     IV
## 5 64.25 36.86 37.44  98.52     IV
## 6 50.97 46.26 38.83  75.22      I

The merMod object has a number of methods available – too many to enumerate here. But we will go over some of the more common ones in the list below:

methods(class = "merMod")
##  [1] anova.merMod*        as.function.merMod*  coef.merMod*        
##  [4] confint.merMod       deviance.merMod*     df.residual.merMod*
##  [7] drop1.merMod*        extractAIC.merMod*   extractDIC.merMod*  
## [10] family.merMod*       fitted.merMod*       fixef.merMod*      
## [13] formula.merMod*      isGLMM.merMod*       isLMM.merMod*      
## [16] isNLMM.merMod*       isREML.merMod*       logLik.merMod*      
## [19] model.frame.merMod*  model.matrix.merMod* ngrps.merMod*      
## [22] nobs.merMod*         plot.merMod*         predict.merMod*     
## [25] print.merMod*        profile.merMod*      qqmath.merMod*      
## [28] ranef.merMod*        refit.merMod*        refitML.merMod*     
## [31] residuals.merMod*    sigma.merMod*        simulate.merMod*   
## [34] summary.merMod*      terms.merMod*        update.merMod*      
## [37] VarCorr.merMod*      vcov.merMod          weights.merMod*     
##
##    Non-visible functions are asterisked

A common need is to extract the fixed effects from a merMod object. fixef extracts a named numeric vector of the fixed effects, which is handy.

fixef(MLexamp1)
## (Intercept)        open       agree      social
##   59.116514    0.009751    0.027788   -0.002151

If you want to get a sense of the p-values or statistical significance of these parameters, first consult the lme4 help by running ?mcmcsamp for a rundown of various ways of doing this. One convenient way built into the package is:

confint(MLexamp1, level = 0.99)
##                0.5 %   99.5 %
## .sig01       4.91840 23.88758
## .sigma       2.53287  2.81456
## (Intercept) 46.27751 71.95610
## open        -0.02465  0.04415
## agree       -0.01164  0.06721
## social      -0.01493  0.01063

From here we can see first that the intervals for our fixed effect parameters overlap 0, indicating no evidence of an effect. We can also see that .sig01, which is our estimate of the variability in the random effect, is very large and very widely defined. This indicates we may have a lack of precision between our groups – either because the group effect is small between groups, we have too few groups to get a more precise estimate, we have too few units within each group, or a combination of all of the above. A sketch of how to pull out the estimated variance components directly follows.
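As a hedged aside (not part of the original post), the variance components themselves can be extracted with VarCorr, which is often easier to read than the .sig01 labels in the confint output:

# Extract the estimated variance components from the fitted model.
# .sig01 in the confint output corresponds to the school-level
# standard deviation reported here.
VarCorr(MLexamp1)                 # printed summary of random-effect SDs
as.data.frame(VarCorr(MLexamp1))  # grp, var1, var2, vcov, sdcor columns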

Another common need is to extract the residual standard error, which is necessary for calculating effect sizes. To get a named vector of the residual standard error:

sigma(MLexamp1)
## [1] 2.671

For example, it is common practice in education research to standardize fixed effects into "effect sizes" by dividing the fixed effect parameters by the residual standard error, which can be accomplished in lme4 easily:

fixef(MLexamp1) / sigma(MLexamp1)
## (Intercept)        open       agree      social
##  22.1336705   0.0036508   0.0104042  -0.0008055

From this, we can see that our predictors of openness, agreeableness, and social are virtually useless in predicting extroversion – as our plots showed. Let's turn our attention to the random effects next.

Explore Group Variation and Random Effects

In all likelihood you fit a mixed-effect model because you are directly interested in the group-level variation in your model. It is not immediately clear how to explore this group-level variation from the results of summary.merMod. What we get from this output is the variance and the standard deviation of the group effect, but we do not get effects for individual groups. This is where the ranef function comes in handy.

ranef(MLexamp1)
## $school
##     (Intercept)
## I       -14.091
## II       -6.183
## III      -1.971
## IV        1.966
## V         6.331
## VI       13.948

Running the ranef function gives us the intercepts for each school, but not much additional information – for example, the precision of these estimates. To do that, we need some additional commands:

re1 <- ranef(MLexamp1, condVar=TRUE) # save the ranef.mer object
class(re1)
## [1] "ranef.mer"
attr(re1[[1]], which = "postVar")
## , , 1
##
##         [,1]
## [1,] 0.03565
##
## , , 2
##
##         [,1]
## [1,] 0.03565
##
## , , 3
##
##         [,1]
## [1,] 0.03565
##
## , , 4
##
##         [,1]
## [1,] 0.03565
##
## , , 5
##
##         [,1]
## [1,] 0.03565
##
## , , 6
##
##         [,1]
## [1,] 0.03565

The ranef.mer object is a list which contains a data.frame for each group level. The data.frame contains the random effects for each group (here we only have an intercept for each school). When we ask lme4 for the conditional variance of the random effects, it is stored in an attribute of those data.frames as a list of variance-covariance matrices. A sketch of how to turn these into approximate standard errors follows.
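As a hedged sketch (not part of the original post; the data.frame built here is purely illustrative), the conditional variances can be converted into approximate standard errors and lined up with the random intercepts:

# Pair each school's random intercept with an approximate standard
# error taken from the square root of its conditional variance.
pv <- attr(re1$school, "postVar")          # 1 x 1 x 6 array of variances
school_re <- data.frame(
  school    = rownames(re1$school),
  intercept = re1$school[, "(Intercept)"],
  approx_se = sqrt(pv[1, 1, ])             # one SE per school
)
school_re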

This structure is indeed complicated, but it is powerful as it allows for nested, grouped, and cross-level random effects. Also, the creators of lme4 have provided users with some simple shortcuts to get what they are really interested in out of a ranef.mer object.

re1 <- ranef(MLexamp1, condVar=TRUE, whichel = "school")
print(re1)
## $school
##     (Intercept)
## I       -14.091
## II       -6.183
## III      -1.971
## IV        1.966
## V         6.331
## VI       13.948
##
## with conditional variances for "school"
dotplot(re1)
## $school

[Figure: caterpillar plot (dotplot) of the school random intercepts with interval estimates]

This graphic shows a dotplot of the random effect terms, also known as a caterpillar plot. Here you can clearly see the effect of each school on extroversion, as well as their standard errors, to help identify how distinct the random effects are from one another. Interpreting random effects is notably tricky, but for assistance I would recommend looking at a few of these resources:

  • Gelman and Hill 2006 – Data Analysis Using Regression and Multilevel/Hierarchical Models
  • John Fox – An R Companion to Applied Regression web appendix

Reply #5
Lisrelchen posted on 2016-7-2 23:52:45
Using Simulation and Plots to Explore Random Effects

A common econometric approach is to create what are known as empirical Bayes estimates of the group-level terms. Unfortunately, there is not much agreement about what constitutes a proper standard error for the random effect terms or even how to consistently define empirical Bayes estimates.[1] However, in R there are a few additional ways to get estimates of the random effects that provide the user with information about the relative sizes of the effects for each unit and the precision of those estimates. To do this, we use the sim function in the arm package.[2]

# A function to extract simulated estimates of random effect parameters from
# lme4 objects using the sim function in arm
# whichel = the character name of the grouping effect to extract estimates for
# nsims = the number of simulations to pass to sim
# x = model object
REsim <- function(x, whichel=NULL, nsims){
  require(plyr)
  mysim <- sim(x, n.sims = nsims)
  if(missing(whichel)){
    dat <- plyr::adply(mysim@ranef[[1]], c(2, 3), plyr::each(c(mean, median, sd)))
    warning("Only returning 1st random effect because whichel not specified")
  } else{
    dat <- plyr::adply(mysim@ranef[[whichel]], c(2, 3), plyr::each(c(mean, median, sd)))
  }
  return(dat)
}

REsim(MLexamp1, whichel = "school", nsims = 1000)
##    X1          X2    mean  median    sd
## 1   I (Intercept) -14.053 -14.093 3.990
## 2  II (Intercept)  -6.141  -6.122 3.988
## 3 III (Intercept)  -1.934  -1.987 3.981
## 4  IV (Intercept)   2.016   2.051 3.993
## 5   V (Intercept)   6.378   6.326 3.981
## 6  VI (Intercept)  13.992  14.011 3.987

For each school, the REsim function returns the level name (X1), the estimate name (X2), the mean of the estimated values, the median, and the standard deviation of the estimates. A sketch of how to turn these into rough interval estimates follows.
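As a hedged follow-up (not part of the original post; the object name re_sims is illustrative), the simulation means and standard deviations can be turned into rough normal-approximation intervals:

# Rough 95% intervals from the simulated means and standard deviations,
# assuming approximate normality of the simulated effects.
re_sims <- REsim(MLexamp1, whichel = "school", nsims = 1000)
re_sims$lower <- re_sims$mean - 1.96 * re_sims$sd
re_sims$upper <- re_sims$mean + 1.96 * re_sims$sd
re_sims[, c("X1", "mean", "lower", "upper")]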

Another convenience function can help us plot these results to see how they compare to the results of dotplot:

# dat = results of REsim
# scale = factor to multiply sd by
# var = character of "mean" or "median"
# sd = character of "sd"
plotREsim <- function(dat, scale, var, sd){
  require(eeptools)
  dat[, sd] <- dat[, sd] * scale
  dat[, "ymax"] <- dat[, var] + dat[, sd]
  dat[, "ymin"] <- dat[, var] - dat[, sd]
  dat[order(dat[, var]), "id"] <- c(1:nrow(dat))
  ggplot(dat, aes_string(x = "id", y = var, ymax = "ymax",
                         ymin = "ymin")) +
    geom_pointrange() + theme_dpi() +
    labs(x = "Group", y = "Effect Range", title = "Effect Ranges") +
    theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
          axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
    geom_hline(yintercept = 0, color = I("red"), size = I(1.1))
}

plotREsim(REsim(MLexamp1, whichel = "school", nsims = 1000), scale = 1.2,
          var = "mean", sd = "sd")

[Figure: point-range plot of the simulated school effects with a zero reference line]

This presents a more conservative view of the variation between random effect components. Depending on how your data were collected and your research question, alternative ways of estimating these effect sizes are possible. However, proceed with caution.[3] One quick alternative is sketched below.
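For instance, as a hedged sketch (not part of the original post), the arm package also exposes se.ranef, which reports standard errors derived from the conditional variances rather than from simulation; comparing the two is a useful sanity check:

# Conditional-variance-based standard errors of the random effects,
# to compare against the simulation-based sd column from REsim.
se.ranef(MLexamp1)$school
# And the corresponding point estimates:
ranef(MLexamp1)$school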

Another approach recommended by the authors of lme4 involves the RLRsim package. Using this package we can test whether or not inclusion of the random effects improves model fit, and we can evaluate the p-value of additional random effect terms using a likelihood ratio test based on simulation.[4]

library(RLRsim)
m0 <- lm(extro ~ agree + open + social, data =lmm.data) # fit the null model
exactLRT(m = MLexamp1, m0 = m0)
##
##  simulated finite sample distribution of LRT. (p-value based on
##  10000 simulated values)
##
## data:  
## LRT = 2958, p-value < 2.2e-16

Here exactLRT issues a warning because we originally fit the model with REML instead of full maximum likelihood. Fortunately, the refitML function in lme4 allows us to easily refit our model using full maximum likelihood and conduct an exact test.

mA <- refitML(MLexamp1)
exactLRT(m= mA, m0 = m0)
##
##  simulated finite sample distribution of LRT. (p-value based on
##  10000 simulated values)
##
## data:  
## LRT = 2958, p-value < 2.2e-16

Here we can see that the inclusion of our grouping variable is significant, even though the effect of each individual group may be substantively small and/or imprecisely measured. This is important in understanding the correct specification of the model. Our next tutorial will cover specification tests like this in more detail. A quick information-criterion comparison is sketched below.
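As a hedged aside (not part of the original post), an information-criterion comparison of the ML-fitted multilevel model against the pooled linear model tells a similar story:

# Compare the pooled OLS model with the ML-refitted multilevel model
# using AIC and BIC (both models were fit by maximum likelihood).
AIC(m0, mA)
BIC(m0, mA)
# A lower AIC/BIC for mA favors keeping the school random intercept.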

Reply #6
Lisrelchen posted on 2016-7-2 23:53:15
Why do Random Effects Matter?

How do we interpret the substantive impact of our random effects? This is often critical in observational work trying to use a multilevel structure to understand the impact that the grouping can have on the individual observation. To do this, we select 12 random cases and then simulate their predicted value of extro if they were placed in each of the 6 schools. Note that this is a very simple simulation, using just the mean of the fixed effects and the conditional mode of the random effect, and not replicating or sampling to get a sense of the variability. That will be left as an exercise for the reader and/or a future tutorial!

# Simulate
# Let's create 12 cases of students
#
# sample some rows
simX <- sample(lmm.data$id, 12)
simX <- lmm.data[lmm.data$id %in% simX, c(3:5)] # get their data
# add an intercept
simX[, "Intercept"] <- 1
simX <- simX[, c(4, 1:3)] # reorder
simRE <- REsim(MLexamp1, whichel = "school", nsims = 1000) # simulate random effects
simX$case <- row.names(simX) # create a case ID
# expand a grid of case IDs by schools
simDat <- expand.grid(case = row.names(simX), school = levels(lmm.data$school))
simDat <- merge(simX, simDat) # merge in the data
# Create the fixed effect predictor
simDat[, "fepred"] <- (simDat[, 2] * fixef(MLexamp1)[1]) + (simDat[, 3] * fixef(MLexamp1)[2]) +
          (simDat[, 4] * fixef(MLexamp1)[3]) + (simDat[, 5] * fixef(MLexamp1)[4])
# Add the school effects
simDat <- merge(simDat, simRE[, c(1, 3)], by.x = "school", by.y="X1")
simDat$yhat <- simDat$fepred + simDat$mean # add the school specific intercept

Now that we have set up a simulated data frame, let's plot it, first by case (a complementary by-school view is sketched after the code):

qplot(school, yhat, data = simDat) + facet_wrap(~case) + theme_dpi()
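As a hedged complement (not part of the original post), the same simulated predictions can be faceted by school instead of by case, which makes the school-level shift easier to see:

# Same simulated data, but faceting by school so each panel shows
# how the 12 cases line up within a given school.
qplot(case, yhat, data = simDat) + facet_wrap(~school) + theme_dpi() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))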

Reply #7
Lisrelchen posted on 2016-7-2 23:53:46
Here we can clearly see that within each school the cases are relatively the same, indicating that the group effect is larger than the individual effects.

These plots are useful in demonstrating the relative importance of group and individual effects in a substantive fashion. Even more can be done to make the graphs more informative, such as placing reference lines for the total variability of the outcome (a small sketch follows) and looking at the distance that moving groups shifts each observation away from its true value.
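As a hedged illustration of that first suggestion (not part of the original post), reference lines at the overall mean of extro plus or minus one standard deviation can be layered onto the case-level plot:

# Add horizontal reference lines at the grand mean of extro and at
# +/- one standard deviation, to show the simulated shifts in context.
qplot(school, yhat, data = simDat) + facet_wrap(~case) + theme_dpi() +
  geom_hline(yintercept = mean(lmm.data$extro), linetype = "dashed") +
  geom_hline(yintercept = mean(lmm.data$extro) + c(-1, 1) * sd(lmm.data$extro),
             linetype = "dotted")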

Conclusion

lme4 provides a very powerful object-oriented toolset for dealing with mixed effect models in R. Understanding model fit and confidence intervals of lme4 objects requires some diligent research and the use of a variety of functions and extensions of lme4 itself. In our next tutorial we will explore how to identify a proper specification of a random-effect model and Bayesian extensions of the lme4 framework for difficult-to-specify models. We will also explore the generalized linear model framework and the glmer function for multilevel generalized linear modeling.

Appendix

print(sessionInfo(), locale=FALSE)
## R version 3.1.0 (2014-04-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
##
## other attached packages:
##  [1] RLRsim_2.1-3    eeptools_0.3.1  ggplot2_0.9.3.1 plyr_1.8.1     
##  [5] lattice_0.20-29 arm_1.7-03      MASS_7.3-33     lme4_1.1-6     
##  [9] Rcpp_0.11.1     Matrix_1.1-3    knitr_1.5      
##
## loaded via a namespace (and not attached):
##  [1] abind_1.4-0         car_2.0-20          coda_0.16-1        
##  [4] colorspace_1.2-4    data.table_1.9.2    digest_0.6.4      
##  [7] evaluate_0.5.5      foreign_0.8-61      formatR_0.10      
## [10] grid_3.1.0          gtable_0.1.2        labeling_0.2      
## [13] maptools_0.8-29     memisc_0.96-9       minqa_1.2.3        
## [16] munsell_0.4.2       nlme_3.1-117        nnet_7.3-8         
## [19] proto_0.3-10        RcppEigen_0.3.2.1.2 reshape2_1.4      
## [22] rmarkdown_0.1.25    scales_0.2.4        sp_1.0-15         
## [25] splines_3.1.0       stringr_0.6.2       tools_3.1.0        
## [28] yaml_2.1.11

Footnotes

[1] See the message from lme4 co-author Doug Bates on this subject: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2009q4/002984.html

[2] Andrew Gelman and Yu-Sung Su (2014). arm: Data Analysis Using Regression and Multilevel/Hierarchical Models. R package version 1.7-03. http://CRAN.R-project.org/package=arm

[3] WikiDot FAQ from the R Mixed Models Mailing List.

[4] There is also an extensive series of references available in the References section of the help, obtained by running ?exactLRT and ?exactRLRT.
