[Q&A] Offering everything I own for help: Weibull regression and Cox PH regression

Original poster: 邓贵大, posted 2013-5-11 16:44:17

Bounty: 914 forum coins

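Attachment: cgdModProject2013.txt (4.6 KB)
I am analyzing a trial of gamma interferon.
For background, see
http://www.nejm.org/doi/full/10.1056/NEJM199102213240801
The data contain the variables ID, Z1-Z9, T1, and D,
where Z1 is the treatment code, T1 is the time, and D is the event/censoring indicator.
The other variables are explained in the code below.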

rm(list=ls())
setwd(dir="c:/bios622/final")
library(survival)
library(MASS)
#load data
cgd <- read.table(file='cgdModProject2013.csv', header=TRUE, sep=',')
names(cgd)
#variable labels of Z1-Z9, T1, and D
#won't apply due to 'if (FALSE)'
if (FALSE) {
names(cgd)[c(2:12)] <- c('Treatment Code',
'Pattern of inheritance',
'Age (year)',
'Height (cm)',
'Weight (kg)',
'Using corticosteroids at time of study entry',
'Using prophylactic antibiotics at time of study entry',
'Gender',
'Hospital Category',
'Elapsed time (in days) from randomization to the diagnosis of a serious infection or censoring',
'Censoring indicator'
)
}
#label values
cgd$Z1 <- factor(cgd$Z1, levels=c(2,1), labels=c('Placebo','Gamma Interferon'))
cgd$Z2 <- factor(cgd$Z2, levels=c(1,2), labels=c('X-linked','Autosomal recessive'))
cgd$Z6 <- factor(cgd$Z6, levels=c(2,1), labels=c('No','Yes'))
cgd$Z7 <- factor(cgd$Z7, levels=c(1,2), labels=c('Yes','No'))
cgd$Z8 <- factor(cgd$Z8, levels=c(1,2), labels=c('Male','Female'))
cgd$Z9 <- factor(cgd$Z9, levels=c(1,2,3,4), labels=c('US-Other','US-NIH','Europe-Amsterdam','Europe-Other'))
cgd$D <- factor(cgd$D, levels=c(1,2), labels=c('Occurred', 'Censored'))
#time-to-event setup
survData <- Surv(time=cgd$T1, event=(cgd$D=='Occurred'))
#overall
summary(survData)
km1 <- survfit(survData ~ 1, error="greenwood", conf.type="log-log", data=cgd)
#png(file='01.overall.png', width=1200, height=960, res=144, bg='transparent')
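#mark.time=TRUE is needed in current versions of survival for the censoring marks (mark=3) to be drawn
plot(km1, xlab="Survival time (days)", ylim=c(.25,1),
ylab="Estimated survival probability",
main="Figure 1. Overall Kaplan-Meier Estimator with pointwise CI", conf.int=TRUE, mark.time=TRUE, mark=3, cex=0.5)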
#dev.off()


I already know how to overlay the survival probabilities estimated by the Cox PH model on the Kaplan-Meier curves:

kmZ1 <- survfit(survData ~ Z1, data=cgd)
plot(kmZ1, xlab="Time (days)", ylab="Estimated probability of no infection", ylim=c(0,1),
main="Figure 2. Kaplan-Meier Estimator with pointwise CI by Treatment", lty=1:2, mark=3, cex=0.5, col=c("red","blue"))
cox1 <- coxph(survData ~ Z1+Z2, data=cgd)
lines(survexp(~Z1, ratetable=cox1, data=cgd), col=c('purple', 'orange'), lty=c(3,4))
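
The survexp() call above builds an expected curve for each arm from the subjects' individual Cox predictions. As a hedged alternative, the sketch below draws the Cox curve for one chosen covariate pattern by passing newdata to survfit(). The data frame sim, its effect sizes, and the choice of the 'X-linked' pattern are assumptions for illustration; they are simulated stand-ins, not the attached file.

```r
library(survival)

# Simulated stand-in data (hypothetical; not cgdModProject2013)
set.seed(3)
n <- 200
sim <- data.frame(
  Z1 = factor(sample(c("Placebo", "Gamma Interferon"), n, replace = TRUE),
              levels = c("Placebo", "Gamma Interferon")),
  Z2 = factor(sample(c("X-linked", "Autosomal recessive"), n, replace = TRUE),
              levels = c("X-linked", "Autosomal recessive"))
)
rate       <- (1 / 300) * ifelse(sim$Z1 == "Gamma Interferon", 0.4, 1)
event_time <- rexp(n, rate)
cens_time  <- runif(n, 100, 400)
sim$T1 <- pmin(event_time, cens_time)
sim$D  <- as.integer(event_time <= cens_time)   # 1 = event, 0 = censored

cox <- coxph(Surv(T1, D) ~ Z1 + Z2, data = sim)

# One row per curve: both treatment arms, Z2 held at 'X-linked'
nd <- data.frame(
  Z1 = factor(levels(sim$Z1), levels = levels(sim$Z1)),
  Z2 = factor("X-linked", levels = levels(sim$Z2))
)
cox_curves <- survfit(cox, newdata = nd)

# Kaplan-Meier curves by arm, with the Cox curves overlaid
plot(survfit(Surv(T1, D) ~ Z1, data = sim), col = c("red", "blue"), lty = 1:2,
     xlab = "Time (days)", ylab = "Estimated probability of no infection")
lines(cox_curves, col = c("purple", "orange"), lty = 3:4)
```

The two approaches answer slightly different questions: survexp() gives a curve for the arm as a whole, while survfit() with newdata gives the curve for a specific kind of patient.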


My questions:
(1) How can I also overlay the probabilities fitted by Weibull regression? The code I found online does not work.
SOLVED:

kmZ1 <- survfit(survData ~ Z1, data=cgd)
plot(kmZ1, xlab="Time (days)", ylab="Estimated probability of no infection", ylim=c(0,1),
main="Figure 2. Kaplan-Meier Estimator with pointwise CI by Treatment", lty=1:2, mark=3, cex=0.5, col=c("red","blue"))
#overlay predicted survival rate by COX PH model
cox1 <- coxph(survData ~ Z1+Z2, data=cgd)
lines(survexp(~Z1, ratetable=cox1, data=cgd), col=c('purple', 'orange'), lty=c(3,4))
#overlay predicted survival rate by Weibull regression
sWei <- survreg(survData ~ Z1,dist='weibull',data=cgd)
#p = cumulative probability of infection; the predicted quantiles are the matching times
p <- seq(.01,.99,by=.01)
tGI <- predict(sWei, newdata=list(Z1=as.factor('Gamma Interferon')),type="quantile",p=p)
lines(tGI, 1-p, lty=5)
tPlacebo <- predict(sWei, newdata=list(Z1=as.factor('Placebo')),type="quantile",p=p)
lines(tPlacebo, 1-p, lty=6)
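
The quantile trick above inverts the fitted distribution. An equivalent route, sketched here under assumptions, evaluates the Weibull survival function directly on a time grid: in survreg()'s parametrization, the Weibull shape is 1/scale and the Weibull scale is exp(linear predictor). The helper weibull_surv() and the simulated data frame sim are hypothetical names introduced for illustration, not part of the original analysis.

```r
library(survival)

# Simulated stand-in data (hypothetical; not cgdModProject2013)
set.seed(1)
n  <- 200
Z1 <- factor(rep(c("Placebo", "Gamma Interferon"), each = n / 2),
             levels = c("Placebo", "Gamma Interferon"))
event_time <- rweibull(n, shape = 1.2, scale = ifelse(Z1 == "Placebo", 300, 900))
cens_time  <- runif(n, 100, 400)
sim <- data.frame(Z1 = Z1,
                  T1 = pmin(event_time, cens_time),
                  D  = as.integer(event_time <= cens_time))  # 1 = event

fit <- survreg(Surv(T1, D) ~ Z1, dist = "weibull", data = sim)

# S(t | x) = exp(-(t / exp(lp))^(1 / scale)), written with pweibull()
weibull_surv <- function(fit, newdata, t) {
  lp <- predict(fit, newdata = newdata, type = "lp")  # linear predictor
  pweibull(t, shape = 1 / fit$scale, scale = exp(lp), lower.tail = FALSE)
}

# Kaplan-Meier curves with the fitted Weibull curves overlaid
plot(survfit(Surv(T1, D) ~ Z1, data = sim), col = c("red", "blue"),
     xlab = "Time (days)", ylab = "Estimated probability of no infection")
tt <- seq(0, max(sim$T1), length.out = 200)
for (i in seq_along(levels(sim$Z1))) {
  nd <- data.frame(Z1 = factor(levels(sim$Z1)[i], levels = levels(sim$Z1)))
  lines(tt, weibull_surv(fit, nd, tt), lty = 4 + i)
}
```

Working on a time grid keeps the curve inside the plotted range, whereas the quantile approach can return times far beyond the last observed follow-up for large p.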


(2) How do I determine whether Weibull or Cox PH fits these data better? (500 points)
(3) How do I check the assumptions and run diagnostics for a Weibull regression? (414 points)
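
For questions (2) and (3), some commonly used checks can be sketched as follows; this is one possible approach under assumptions, not a definitive answer. It uses simulated stand-in data (sim is hypothetical). Note that the Cox partial likelihood and the Weibull full likelihood are not on the same scale, so AIC is compared here only among parametric families. If the Weibull model holds, its implied log hazard ratio, -coef/scale, should be close to the Cox coefficient.

```r
library(survival)

# Simulated stand-in data (hypothetical; not cgdModProject2013)
set.seed(2)
n  <- 300
Z1 <- factor(rep(c("Placebo", "Gamma Interferon"), each = n / 2),
             levels = c("Placebo", "Gamma Interferon"))
event_time <- rweibull(n, shape = 1.2, scale = ifelse(Z1 == "Placebo", 300, 900))
cens_time  <- runif(n, 100, 400)
sim <- data.frame(Z1 = Z1,
                  T1 = pmin(event_time, cens_time),
                  D  = as.integer(event_time <= cens_time))  # 1 = event

# (a) Weibull check: log(-log(S_KM(t))) against log(t) should be roughly
#     linear in each arm, and roughly parallel across arms
km <- survfit(Surv(T1, D) ~ Z1, data = sim)
plot(km, fun = "cloglog", col = c("red", "blue"),
     xlab = "Time (days, log scale)", ylab = "log(-log(S(t)))")

# (b) Proportional hazards check for the Cox model (Schoenfeld residuals)
cox <- coxph(Surv(T1, D) ~ Z1, data = sim)
zph <- cox.zph(cox)
print(zph)

# (c) AIC among parametric families
dists <- c(weibull = "weibull", exponential = "exponential",
           lognormal = "lognormal")
fits <- lapply(dists, function(d)
  survreg(Surv(T1, D) ~ Z1, dist = d, data = sim))
aic <- sapply(fits, AIC)
print(aic)

# (d) Weibull-implied log hazard ratio next to the Cox estimate
wei <- fits$weibull
print(c(weibull = unname(-coef(wei)[2] / wei$scale),
        cox     = unname(coef(cox)[1])))

# (e) Cox-Snell residuals: their estimated cumulative hazard should
#     follow the 45-degree line if the Weibull model fits
lp    <- predict(wei, type = "lp")
cs    <- (sim$T1 / exp(lp))^(1 / wei$scale)
cs_km <- survfit(Surv(cs, sim$D) ~ 1)
H     <- -log(cs_km$surv)
ok    <- is.finite(H)
plot(cs_km$time[ok], H[ok], type = "s",
     xlab = "Cox-Snell residual", ylab = "Estimated cumulative hazard")
abline(0, 1, lty = 2)
```

Curved or crossing lines in (a), or a clear departure from the diagonal in (e), would argue against the Weibull form. A small p-value in (b) would question proportional hazards, which the Weibull model with a common shape also assumes.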


Be still, my soul: the hour is hastening on
When we shall be forever with the Lord.
When disappointment, grief and fear are gone,
Sorrow forgot, love's purest joys restored.


Reply 1: love_pig, posted 2016-3-23 17:26:12

This looks really advanced.

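Reply 2: 万人往LVR, posted 2016-3-23 17:56:51

I'd like to interview the original poster: after staking everything, the problem is still unsolved and the forum coins are gone for good. How does that feel? Quite a thrill, isn't it?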

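Reply 3: runman, posted 2016-6-16 10:57:13

To the original poster: I would like to ask a question. In the variable definitions and data-source notes of a paper, I found that some variables are time-series data while others are cross-sectional. The paper's aim is a survival analysis with a Weibull hazard model.

For example, variable1 is a time series covering 1970-2015,
and variable2 is the mean over 2000-2010.
I really cannot work out what the data structure looks like.

Is it something like the following? Thanks in advance.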

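year  variable1  variable2
1970  value      missing
1971  value      missing
1972  value      missing
1973  value      missing
1974  value      missing
1975  value      missing
1976  value      missing
1977  value      missing
1978  value      missing
…     …          …
2000  value      mean of variable2 over 2000-2015
2001  value      mean of variable2 over 2000-2015
2002  value      mean of variable2 over 2000-2015
2003  value      mean of variable2 over 2000-2015
2004  value      mean of variable2 over 2000-2015
2005  value      mean of variable2 over 2000-2015
2006  value      mean of variable2 over 2000-2015
2007  value      mean of variable2 over 2000-2015
2008  value      mean of variable2 over 2000-2015
2009  value      mean of variable2 over 2000-2015
2010  value      mean of variable2 over 2000-2015
2011  value      mean of variable2 over 2000-2015
2012  value      mean of variable2 over 2000-2015
2013  value      mean of variable2 over 2000-2015
2014  value      mean of variable2 over 2000-2015
2015  value      mean of variable2 over 2000-2015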
