Original poster: ReneeBK

Cox proportional hazard models with a shared frailty term

Excuse this naive question: I have run Cox proportional hazard models with a shared frailty term for neighbourhood (c. 30,000 individuals are clustered within c. 500 neighbourhoods). My outcome is a birth event (c. 2,000 events). In a model with individual socioeconomic variables, the frailty variance for neighbourhood is close to zero and non-significant. The model with shared frailty is a worse fit than an identical model without it.

However, if I drop the neighbourhood shared frailty term and add neighbourhood-level variables, the individual-level variables remain unchanged but the neighbourhood variables have large effects (and models with neighbourhood variables have lower AIC values than the model with individual-level variables only). My understanding is that even though the variation in risk of birth between neighbourhoods is very low and not significant, neighbourhood factors can apparently have high predictive power for risk of birth. I’m slightly confused by this and wonder whether anyone could offer an explanation or point me towards literature that discusses the issue. I’m aware that in multilevel models the variance at level 2 can, rather counterintuitively, increase when level 2 variables are added, but this seems like a different scenario. (My three neighbourhood variables are continuous measures that have been split into quartiles, and they are only significant at the highest quartiles.)


Many thanks in advance,

Reply 1
ReneeBK, posted 2014-3-24 10:11:37
I'm not sure if I completely understand the problem, but it may involve the well-known sociological phenomenon of apparent, but in some ways misleading, ecological effects. In the past, I have seen studies in which the relationship between mental health and crime committed in an area varied between areas i.e. in some areas crime rates were higher among mentally ill people - but not in other areas. The overall result was that there appeared to be an effect and explanations either ignored the variability or attributed it to the characteristics of the areas involved. However, when researchers looked at a sample of individuals in such areas, those classified as mentally ill were no more likely to have committed crimes than anyone else (or vice versa). There was no individual effect but there was a large apparent area one.

If that historical research has been replicated recently, it might imply that there are some characteristics of an area which predispose individuals to commit crime or to become mentally ill - but not both, and it is the "but not both" which is relevant here. If both were affected, then one would simply have a confounding variable, for which we can control statistically. Some have tried to explain the phenomenon in terms of inequality. However, I would expect inequality to have affected both variables (mental illness and criminality). It might even be an artefact e.g. mentally ill people are no more likely to commit crime but are more likely to be caught and their offence recorded.

The ecological fallacy is usually lurking somewhere when we draw conclusions about group members without having studied them individually. That is nearly always a danger when we try to draw conclusions about an area, a social class, or an ethnic group.

If that guess is in any way relevant to your problem, I would start with a literature search, at least for the term "ecological fallacy". If not, I look forward to reading the solution that you eventually find. Good luck.

Reply 2
ReneeBK, posted 2014-3-24 10:13:30
And the atomistic fallacy is usually lurking when researchers impute individual-level relationships to population-level relationships. The causes of incidence and prevalence in populations are not the same thing as the causes of incidence and prevalence in individuals. It's good to be cautious about cross-level inference, but that caution applies in both directions (inferring population from individual, and inferring individual from population).

See for example,
Rose, G. (1985). Sick individuals and sick populations. International Journal of Epidemiology, 14(1):32–38.

Diez-Roux, A. V. (1998). Bringing context back into epidemiology: variables and fallacies in multilevel analysis. American Journal of Public Health, 88(2):216–222.

Reply 3
ReneeBK, posted 2014-3-24 10:14:22
You ask a relevant question. Perhaps I give you an excessively long answer, but I like this question very much.
When the frailty (neighborhood variance) is close to zero and you are studying neighborhood variables, you are not using 500 information units but almost 30,000. So the lower the influence of the neighborhood, the easier it is to get “significant” neighborhood-level associations (if there is some contrast of exposure). This is a very common paradox in neighborhood research, and it is reinforced by the fact that “significant” neighborhood-level associations are positive results that are easier to publish. In my opinion, the first step in any contextual analysis is to evaluate the relevance of the contextual boundaries by means of the frailty, the VPC (variance partition coefficient), the ICC (intraclass correlation coefficient) or other measures.
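One rough way to see the "500 versus almost 30,000 information units" point is the Kish design effect from cluster sampling, 1 + (m - 1) * ICC. The sketch below is not part of the original reply: the design-effect formula comes from the linear, cluster-sampling setting and is only an approximation for a frailty model, the function name is hypothetical, and the cluster size of 60 assumes the 30,000 individuals are spread evenly over the 500 neighbourhoods.

```python
def effective_sample_size(n_individuals: int, cluster_size: float, icc: float) -> float:
    """Approximate number of independent information units behind a
    cluster-level covariate, using the Kish design effect
    1 + (cluster_size - 1) * icc.  Illustrative helper (hypothetical name)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    design_effect = 1.0 + (cluster_size - 1.0) * icc
    return n_individuals / design_effect


# Figures from the thread: c. 30,000 individuals in c. 500 neighbourhoods.
n_individuals = 30_000
n_clusters = 500
cluster_size = n_individuals / n_clusters  # assumed equal-sized clusters

for icc in (0.0, 0.01, 0.1, 1.0):
    n_eff = effective_sample_size(n_individuals, cluster_size, icc)
    print(f"ICC = {icc:<4}: effective sample size is about {n_eff:,.0f}")
```

At an ICC of 0 the effective sample size equals the full 30,000 individuals, and at an ICC of 1 it collapses to the 500 neighbourhoods; small intermediate values sit in between.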

Mixed model/multilevel regression analyses were developed to handle structures with very high “frailty”, such as repeated measurements (level 1) within individuals (level 2). In this case the intra-individual correlation of the information is always very high and we are just interested in quantifying individual-level (i.e., level 2) associations. No one questions the human body’s “effects” on repeated measurements. The problem appears when we apply multilevel regression to systems that, like neighborhoods, are not as well defined as individuals. You can find a longer explanation of these ideas elsewhere [1] [2].

Another relevant aspect is that in epidemiology and similar disciplines we simultaneously use two conceptually contradictory approaches (probabilistic and mechanistic). This is the reason for your confusion.
In fact, many epidemiologists become confused when they observe a “significant” association between contextual variables and individual health alongside tiny general contextual influences (e.g., frailty or VPC close to 0%) [3]. While a statistically significant association is always relevant in the probabilistic approach, it may not be in multilevel analyses investigating individual heterogeneity. This apparent paradox can be resolved if we realize that quantifying general contextual influences by using, for instance, the VPC is completely analogous to the concept of discriminatory accuracy developed in other fields of epidemiology, such as the study of risk factors and biomarkers [4] [5-7]. It is well recognized that many risk factors and novel biomarkers are not so useful because they have a very low discriminatory accuracy even if they are “significantly” associated with diseases [4].
In your case the neighborhood variables are significantly associated but they have a very low (not high!) discriminatory accuracy.
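To make the gap between association and discriminatory accuracy concrete, here is a minimal sketch (not from the original reply) that converts an odds ratio for a binary marker into the area under the ROC curve. It assumes a dichotomous exposure, for example living in a top-quartile neighbourhood, and an assumed exposure prevalence of 25% among non-cases; the function name and all numbers are illustrative, not results from the poster's data.

```python
def auc_binary_marker(odds_ratio: float, exposed_among_noncases: float) -> float:
    """AUC of a single binary marker, given its odds ratio with the outcome
    and the proportion of non-cases that are exposed (false positive rate).
    Illustrative helper (hypothetical name)."""
    fpr = exposed_among_noncases
    # The odds ratio links the odds of exposure in cases to that in non-cases.
    tpr = odds_ratio * fpr / (1.0 - fpr + odds_ratio * fpr)
    # A binary marker has a one-point ROC curve; its area is (1 + TPR - FPR) / 2.
    return 0.5 * (1.0 + tpr - fpr)


for odds_ratio in (1.0, 1.5, 3.0):
    auc = auc_binary_marker(odds_ratio, exposed_among_noncases=0.25)
    print(f"odds ratio {odds_ratio}: AUC = {auc:.3f}")
```

Under these assumptions an odds ratio of 1.5 gives an AUC of about 0.54 and even an odds ratio of 3 gives only 0.625, so an association can be clearly "significant" in a large sample while discriminating cases from non-cases only slightly better than chance.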

I explain these ideas more extensively in a recent commentary that has been accepted for publication in the American Journal of Epidemiology [8]. It will be published in the near future, but I can send you a copy for personal use if you like.

Best wishes
Juan Merlo

References
1. Merlo J, Ohlsson H, Lynch KF, Chaix B, Subramanian SV (2009) Individual and collective bodies: using measures of variance and association in contextual epidemiology. J Epidemiol Community Health 63: 1043-1048.
2. Merlo J, Chaix B, Yang M, Lynch J, Rastam L (2005) A brief conceptual tutorial of multilevel analysis in social epidemiology: linking the statistical concept of clustering to the idea of contextual phenomenon. J Epidemiol Community Health 59: 443-449.
3. Merlo J, Viciana-Fernandez FJ, Ramiro-Farinas D, Research Group of Longitudinal Database of Andalusian P (2012) Bringing the individual back to small-area variation studies: a multilevel analysis of all-cause mortality in Andalusia, Spain. Soc Sci Med 75: 1477-1487.
4. Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P (2004) Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am J Epidemiol 159: 882-890.
5. Merlo J, Wagner P (2013) The tyranny of the averages and the indiscriminate use of risk factors in public health: a call for revolution. European Journal of Epidemiology 28: 148.
6. Wagner P, Merlo J (2013) Measures of discriminatory accuracy in multilevel analysis. European Journal of Epidemiology 28: 135.
7. Merlo J, Wagner P, Juarez S, Mulinari S, Hedblad B (2013) Low discriminatory accuracy questions the use of risk factors. The tyranny of the averages and the indiscriminate use of risk factors and population attributable fractions in Public Health: the case of coronary heart disease. Working paper version 2013-09-26. Unit for Social Epidemiology, Department of Clinical Sciences, Faculty of Medicine, Lund University http://www.med.lu.se/english/kli ... c_working_papers_c.
8. Merlo J (2014) Multilevel analysis of individual heterogeneity: a fundamental critique of the current probabilistic risk factor epidemiology (invited commentary). American Journal of Epidemiology In press.

Reply 4
ReneeBK, posted 2014-3-24 10:15:30
Unless I am missing something, this is not particularly surprising. You are considering two ways of explaining some modest amount of variation across neighborhoods above what would be predicted from individual-level sampling variation. One involves an unspecified alternative hypothesis about 500 parameters; since you have a lot of sampling variation (only about 4 events per neighborhood), this test does not have a lot of power. The other explains the excess variation, or some part of it, from a handful of variables (your neighborhood covariates) and has much more power if there is a relationship between the variables and the outcome. In ANOVA terms, the same incremental sum of squares explained is tested with either 500 or just a few degrees of freedom, the latter having much more power. (Here it is deviance rather than sum of squares.)
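The degrees-of-freedom argument can be illustrated with a small Monte Carlo sketch. It is an analogy only: it treats both tests as chi-square tests that share the same amount of explained deviance (the noncentrality, here an assumed value of 20) and differ only in degrees of freedom, 3 versus 500. A formal test of a frailty variance is a boundary problem and is not literally a 500-degree-of-freedom chi-square test; the function names and settings are made up for illustration.

```python
import math
import random


def chi2_draw(rng: random.Random, df: int, ncp: float = 0.0) -> float:
    """One draw from a (non)central chi-square with df degrees of freedom."""
    # One squared normal carries all of the noncentrality; the remaining
    # df - 1 central components are drawn together as a gamma variate.
    value = (rng.gauss(0.0, 1.0) + math.sqrt(ncp)) ** 2
    if df > 1:
        value += rng.gammavariate((df - 1) / 2.0, 2.0)
    return value


def simulated_power(df: int, ncp: float, alpha: float = 0.05,
                    n_sim: int = 20_000, seed: int = 1) -> float:
    """Share of simulated alternative statistics above the empirical
    (1 - alpha) quantile of the simulated null distribution."""
    rng = random.Random(seed)
    null_draws = sorted(chi2_draw(rng, df) for _ in range(n_sim))
    critical = null_draws[int((1.0 - alpha) * n_sim)]
    hits = sum(chi2_draw(rng, df, ncp) > critical for _ in range(n_sim))
    return hits / n_sim


ncp = 20.0  # assumed amount of "explained deviance", identical in both tests
power_few = simulated_power(df=3, ncp=ncp)
power_many = simulated_power(df=500, ncp=ncp)
print(f"power with   3 df: {power_few:.2f}")
print(f"power with 500 df: {power_many:.2f}")
```

With these assumed settings, the test on a few degrees of freedom should come out far more powerful than the one that spreads the same signal over 500.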
