Original poster: SASCHEN

Strategies for Model Building in Multilevel Data Analysis

Original post
SASCHEN posted on 2014-1-12 06:12:30

I'm wondering whether there are strategies for model building in multilevel analysis like the ones we have in regression analysis. For instance, in regression analysis we can use enter, stepwise, forward, backward and remove to choose how variables are entered into the model.

I tried to find similar strategies applicable to HLM but couldn't find any. For example, should we first build the level-1 model, determine the relevant variables, and then move on to level 2? Or should we model each level separately and then combine them to get the final model? Or something else?

I would really appreciate any help.


First reply
SASCHEN posted on 2014-1-12 06:13:07
Snijders & Bosker (2012) have a very nice treatment of their suggested model building principles. See the section "6.4 Model specification" starting on page 102.
  1. Considerations relating to the subject matter. These follow from field knowledge, existing theory, detailed problem formulation, and common sense.
  2. The distinction between effects that are indicated a priori as effects to be tested, that is, effects on which the research is focused, and effects that are necessary to obtain a good model fit. Often the effects tested are a subset of the fixed effects, and the random part is to be fitted adequately but of secondary interest. When there is no strong prior knowledge about which variables to include in the random part, one may follow a data-driven approach to select the variables for the random part.
  3. A preference for 'hierarchical' models in the general sense (not the 'hierarchical linear model' sense) that if a model contains an interaction effect, then the corresponding main effects should usually also be included (even if these are not significant); and if a variable has a random slope, its fixed effect should normally also be included in the model. The reason is that omitting such effects may lead to erroneous interpretations.
  4. Doing justice to the multilevel nature of the problem.
  5. Awareness of the necessity of including certain covariances of random effects. Including such covariances means that they are free parameters in the model, not constrained to 0 but estimated from the data. In Section 5.1.2, attention was given to the necessity to include in the model all covariances $\tau_{0h}$ between random slopes and random intercept. Another case in point arises when a categorical variable with $c \geq 3$ categories has a random effect. This is implemented by giving random slopes to the $c - 1$ dummy variables that are used to represent the categorical variable. The covariances between these random slopes should then also be included in the model. Formulated generally, suppose that two variables $X_h$ and $X_{h'}$ have random effects, and that the meaning of these variables is such that they could be replaced by two linear combinations, $aX_h + a'X_{h'}$ and $bX_h + b'X_{h'}$. (For the random intercept and random slope discussed in Section 5.1.2, the relevant type of linear combination would correspond to a change of origin of the variable with the random slope.) Then the covariance $\tau_{hh'}$ between the two random slopes should be included in the model.
  6. Reluctance to include nonsignificant effects in the model - one could also say, a reluctance to overfit. Each of points 1-5 above, however, could override this reluctance. An obvious example of this overriding is the case where one wishes to test for the effect of $X_2$, while controlling for the effect of $X_1$. The purpose of the analysis is a subject-matter consideration, and even if the effect of $X_1$ is nonsignificant, one still should include this effect in the model.
  7. The desire to obtain a good fit, and include all effects in the model that contribute to a good fit. In practice, this leads to the inclusion of all significant effects unless the data set is so large that certain effects, although significant, are deemed unimportant nevertheless.
  8. Awareness of the following two basic statistical facts: (a) Every test of a given statistical parameter controls for all other effects in the model used as a null hypothesis ($M_0$ in Section 6.2). Since the latter set of effects has an influence on the interpretation as well as on the statistical power, test results may depend on the set of other effects included in the model. (b) We are constantly making type I and type II errors. Especially the latter, since statistical power often is rather low. This implies that an effect being nonsignificant does not mean that the effect is absent in the population. It also implies that a significant effect may be so by chance (but the probability of this is no larger than the level of significance - most often set at 0.05). Multilevel research often is based on data with a limited number of groups. Since power for detecting effects of level-two variables depends strongly on the number of groups in the data, warnings about low power are especially important for level-two variables.
  9. Providing tested fixed effects with an appropriate error term in the model (whether or not it is significant). For level-two variables, this is the random intercept term (the residual term in (5.7)). For cross-level interactions, it is the random slope of the level-one variable involved in the interaction (the residual term in (5.8)). For level-one variables, it is the regular level-one residual that one would not dream of omitting. This guideline is supported by Berkhof and Kampen (2004).
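
Two of the principles above are mechanical enough to check in code: the 'hierarchical' preference (main effects for every interaction, a fixed effect for every random slope) and the count of variances and covariances that a full random part implies. Here is a minimal sketch in plain Python. The `ModelSpec` class, the function names and the variable names are all made up for illustration and are not part of any multilevel package.

```python
from dataclasses import dataclass, field


@dataclass
class ModelSpec:
    """Hypothetical container for a two-level model specification."""
    fixed: set = field(default_factory=set)          # fixed main effects
    interactions: set = field(default_factory=set)   # frozensets of interacting variables
    random_slopes: set = field(default_factory=set)  # level-one variables with random slopes


def hierarchy_violations(spec):
    """List breaches of the 'hierarchical' preference described above."""
    problems = []
    for term in sorted(spec.interactions, key=sorted):
        label = "*".join(sorted(term))
        for var in sorted(term):
            if var not in spec.fixed:
                problems.append(f"interaction {label} lacks main effect of {var}")
    for var in sorted(spec.random_slopes):
        if var not in spec.fixed:
            problems.append(f"random slope of {var} lacks its fixed effect")
    return problems


def n_random_part_parameters(n_random_slopes):
    """Variances plus all covariances for a random intercept and q random slopes."""
    q = n_random_slopes + 1  # the random intercept counts as one random effect
    return q * (q + 1) // 2


# Hypothetical specification: 'school_size' and 'iq' are missing from the fixed part.
spec = ModelSpec(
    fixed={"ses"},
    interactions={frozenset({"ses", "school_size"})},
    random_slopes={"iq"},
)
for problem in hierarchy_violations(spec):
    print(problem)

# A categorical variable with c = 3 categories gets c - 1 = 2 dummy slopes.
print(n_random_part_parameters(2))
```

The parameter count grows quickly with the number of random slopes, which is one practical reason to be selective about the random part.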

Second reply
SASCHEN posted on 2014-1-12 06:16:35
I would have to reach back to advice I got from the great and recently departed Carol H. Weiss: if there isn't literature on the exact topic, expand to other literature where similar problems have been tackled.  This proved useful advice.  I don't know your problem, but I think there must be theories or practical applications from other fields that let you bring some order to your approach, even if this leads to more than one way to tackle the problem.

As for level one first or level two first, this may depend on how you see the problem.  For example, you might have a case where the top level represents institutions and you consider the variation associated with them more or less a nuisance that you would like to explain away as much as you can.  Then you might add institutional characteristics as a group, or as a couple of related groups, to explain as much institution-related variance as possible.  When you then explain variance using person characteristics, it is net of the variation explained by the observed institutional characteristics.  (How level one and level two relate depends on centering of the level-one variables.)  But the opposite could be true.  Your study might be of institutional policies, so you might want to "explain away" the variance accounted for by individual characteristics.  The thing is, for all I know, you might be looking at crop yields or something.
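
To make the centering remark concrete, here is a small sketch of one common choice, group-mean centering, in plain Python. It splits a level-one variable into a group mean (which can enter the model as a level-two predictor) and a within-group deviation (a pure level-one predictor). The function name and the toy data are assumptions for illustration only; other centering choices, such as grand-mean centering, lead to different interpretations.

```python
from collections import defaultdict


def group_mean_center(values, groups):
    """Split a level-one variable into a between-group and a within-group part.

    Returns (group_means, deviations), both aligned with the input order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for x, g in zip(values, groups):
        totals[g] += x
        counts[g] += 1
    means = {g: totals[g] / counts[g] for g in totals}
    group_means = [means[g] for g in groups]
    deviations = [x - means[g] for x, g in zip(values, groups)]
    return group_means, deviations


# Toy data: four people in two institutions (hypothetical values).
scores = [1.0, 3.0, 10.0, 14.0]
institution = ["A", "A", "B", "B"]
between, within = group_mean_center(scores, institution)
print(between)  # group mean repeated for each member
print(within)   # deviation of each person from their own group mean
```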

When I am modeling things having to do with people, I tend to put variables into the model in a somewhat characteristic order, by groups, and test whether each entire group significantly improves model fit.  Generally, I put in the demographic characteristics of the subjects first, such as gender and age and whatever else I might have like that.  Then I might have some observed psychological or lifestyle characteristics that aren't a focus of my study and/or don't change much, such as type of employment or negative affectivity.  The final step is then focused on the hypothesis-related predictor(s), whatever it is I think is going to predict my outcome.  But there are many variations of this.  Although the covariates are generally in the model to remove explainable variance from the error, things get tricky depending on how the variables relate to one another.  So in some cases I may want to see a model with just the hypothesis variable(s) in it, to understand the relationship without other variables present and see how it changes as other variables are added to the model.
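
Testing whether an entire group of variables improves fit is usually done with a deviance (likelihood-ratio) test between two nested models. Below is a minimal sketch using only the standard library. The function names and the deviance values are invented for illustration, and it assumes both deviances come from maximum likelihood (not REML) fits of nested models that differ only in the added fixed effects.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc part plus a finite series in odd powers of sqrt(x)
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    tail = math.erfc(root / math.sqrt(2.0))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def block_deviance_test(deviance_without, deviance_with, n_added_params):
    """Deviance difference and p-value for adding one block of fixed effects."""
    diff = deviance_without - deviance_with
    return diff, chi2_sf(diff, n_added_params)


# Hypothetical deviances before and after adding a block of 3 demographic variables.
diff, p_value = block_deviance_test(4850.2, 4841.0, 3)
print(f"deviance difference = {diff:.1f}, p = {p_value:.3f}")
```

Testing the block as a whole gives one test per step instead of one per variable, which keeps the number of tests down.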

So back to my main point.  I think you should be as exhaustive as possible in trying to have a theoretical or at least pragmatic scheme to guide you in the process.  Anything that just cycles you through a lot of tests is, according to the theories that underlie what we do, going to generate findings by chance alone.  That's not a very comforting thought.
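
A rough back-of-the-envelope calculation shows how fast chance findings pile up. This sketch makes the simplifying assumptions that the tests are independent and that every null hypothesis is true, which will not hold exactly in a real stepwise search. The function name is made up.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance of at least one type I error across independent tests, all nulls true."""
    return 1.0 - (1.0 - alpha) ** n_tests


for k in (1, 5, 20):
    print(k, round(prob_any_false_positive(k), 3))
```

With 20 such tests at the 0.05 level, the chance of at least one spurious "significant" effect is roughly 64%.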

Third reply
xiexie1111 posted on 2014-8-20 22:19:20
Well, it's really time for me to learn more about this.  Thanks for sharing, xie xie.
