[Q&A] Help: merging datasets in R (urgent!)

Post #11

jmpamao posted on 2014-3-20 22:33:03

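# Keep only the rows whose ID appears in both data1 and data2
c1 <- data1$ID %in% data2$ID   # rows of data1 whose ID also occurs in data2
c2 <- data2$ID %in% data1$ID   # rows of data2 whose ID also occurs in data1
rbind(data1[c1, ], data2[c2, ])   # stack them (both frames need the same columns)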

Is this what you had in mind? If so, neither of the two conditions above is needed.
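A small, self-contained sketch of the `%in%` idea above, using two hypothetical data frames `data1` and `data2` that share the same columns (an `ID` column plus one value column), since `rbind()` needs matching column names:

```r
# Hypothetical toy data: IDs 2 and 3 occur in both tables
data1 <- data.frame(ID = c(1, 2, 3), value = c("a", "b", "c"))
data2 <- data.frame(ID = c(2, 3, 4), value = c("d", "e", "f"))

c1 <- data1$ID %in% data2$ID   # TRUE for rows of data1 whose ID is also in data2
c2 <- data2$ID %in% data1$ID   # TRUE for rows of data2 whose ID is also in data1

common <- rbind(data1[c1, ], data2[c2, ])
common[order(common$ID), ]     # print with the rows of each shared ID next to each other
```

IDs 1 and 4 appear in only one table, so they are dropped; the result holds the four rows for IDs 2 and 3.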

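Post #12

toby3003 posted on 2014-3-21 10:52:09

jmpamao posted on 2014-3-20 22:33
c1

My situation is this: I have a set of follow-up data in which the number of visits differs from person to person; some people have 20 visits, others only 2 or 3. From these data I want to find the people who have 12 visits. First I used data<-subset(aa,time<=12) to get the overall set of people, then I applied the statement you gave me, data<-data[duplicated(data[,1])+duplicated(data[,1],fromLast=T)>0,], but it did not pick out the common cases, and now I don't know how to proceed. Many thanks.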
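A sketch of what the `duplicated()` expression actually selects, on a hypothetical toy `aa` with columns `id` and `time`: it flags every row whose ID occurs more than once, so after `subset(aa, time <= 12)` it keeps anyone with at least two visits rather than only the patients with 12.

```r
# Hypothetical toy follow-up data: id 1 has 3 visits, id 2 has 1, id 3 has 2
aa <- data.frame(id = c(1, 1, 1, 2, 3, 3), time = c(1, 2, 3, 1, 1, 2))

data <- subset(aa, time <= 12)
# duplicated() marks the 2nd, 3rd, ... occurrence of an id;
# fromLast = TRUE marks every occurrence except the last one.
# Their sum is > 0 for every row whose id appears at least twice.
dup <- duplicated(data[, 1]) + duplicated(data[, 1], fromLast = TRUE) > 0
kept <- data[dup, ]
unique(kept$id)   # ids 1 and 3: "more than one visit", not "12 visits"
```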

Post #13

jmpamao posted on 2014-3-21 17:38:58

I'm still not clear on what your data look like or what exactly you need.

Post #14

toby3003 posted on 2014-3-21 20:07:03

jmpamao posted on 2014-3-21 17:38
I'm still not clear on what your data look like or what exactly you need.

Data format:

healthid	visit	x
1	1	a
1	2	b
1	3	c
1	4	d
1	5	e
2	1	f
2	2	g
2	3	h
2	4	i
3	1	g
3	2	k
3	3	l

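What I want is to extract the first three records of each healthid, i.e., each person's data from their first three visits. I hope that makes sense...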
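A minimal sketch of "first three records per healthid", built from the table above with the visit column renamed `visit` (an assumption; the real column name may differ). It sorts by patient and visit, splits by `healthid`, and keeps `head(..., 3)` of each piece:

```r
# Toy data copied from the table above (visit = follow-up number)
df <- data.frame(
  healthid = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  visit    = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3),
  x        = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "g", "k", "l")
)

df <- df[order(df$healthid, df$visit), ]           # make sure visits are in order
pieces <- lapply(split(df, df$healthid), head, 3)   # first 3 rows of each patient
first3 <- do.call(rbind, pieces)                    # stack the pieces back together
rownames(first3) <- NULL
first3
```

A patient with fewer than three visits simply contributes fewer rows. If the visit column always runs 1, 2, 3, ... without gaps, `subset(df, visit <= 3)` gives the same result more cheaply.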

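Post #15

jmpamao posted on 2014-3-21 20:56:41

toby3003 posted on 2014-3-21 20:07
Data format:
What I want is to extract the first three records of each healthid, i.e., each person's data from their first three visits. I hope that makes sense ...

Is this what you mean?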

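# I think I roughly understand what you mean, so here is a new example.
# Goal: keep only IDs with at least 4 visits, then take visits 1 to 4 of each of those IDs.
# ID 1 has 5 visits, ID 2 has 4, ID 3 has 3, ID 4 has 5,
# so ID 3 is dropped and the first 4 rows of each remaining ID are kept.
# All ids and times are sorted in ascending order.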
data <- read.table(text="id time dose
1 1 a
1 2 b
1 3 c
1 4 d
1 5 e
2 1 f
2 2 g
2 3 h
2 4 i
3 1 g
3 2 k
3 3 l
4 1 d
4 2 f
4 3 g
4 4 r
4 5 u
", header = TRUE)
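data <- data[rep(table(data[, 1]) >= 4, table(data[, 1])), ]  # drop IDs with < 4 visits (ID 3); assumes rows sorted by id
data2 <- unique(data[, 1])
data3 <- match(data2, data[, 1])      # row position of each ID's first visit
data4 <- rep(data3, each = 4) + 0:3   # first position, first + 1, ..., first + 3 for each ID
data[data4, ]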

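The index arithmetic above relies on the rows being sorted by `id`, because `table()` returns counts in sorted-ID order and `rep()` lines them up with the rows. A sketch of an alternative that does not depend on ID order, using base R's `ave()` to attach to every row its patient's visit count and its position within that patient (it still assumes rows are ordered by `time` within each ID); the toy data are the same as above:

```r
data <- data.frame(
  id   = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4),
  time = c(1:5, 1:4, 1:3, 1:5),
  dose = c("a", "b", "c", "d", "e", "f", "g", "h", "i",
           "g", "k", "l", "d", "f", "g", "r", "u")
)

n_visits <- ave(data$time, data$id, FUN = length)     # visits per patient, repeated on each row
visit_no <- ave(data$time, data$id, FUN = seq_along)  # 1, 2, 3, ... within each patient

result <- data[n_visits >= 4 & visit_no <= 4, ]       # >= 4 visits, first 4 of them
result
```

Changing both 4s to 12 gives the "first 12 visits of patients with at least 12 visits" version.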

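Post #16

toby3003 posted on 2014-3-22 01:24:34

jmpamao posted on 2014-3-21 20:56
Is this what you mean?

Thank you for the careful reply. My actual situation is this: I have data on over 20,000 diabetes patients, with roughly 300,000 follow-up visits in total. I tallied how many patients reached each visit number (first row: visit number; second row: number of patients with that visit):

visit	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	....
patients	26287	25170	23824	22706	21633	20373	19135	17731	16264	14829	13287	11907	10150	8592	7558	6248	4875	3913	....

I plan to extract the data of the patients who have 12 follow-up visits, but the method you gave me doesn't seem workable here. Which approach would be more efficient?
What I did before was: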

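data12 <- subset(diabetes, time == 12)   # not shown in the post; presumably the 12th visits
data11 <- subset(diabetes, time == 11)
data11 <- data11[match(data12$healthid, data11$healthid), ]   # 11th visits of the same patients, in data12's order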


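Then I filtered the 11th, 10th, ..., 1st visits in the same way and combined everything with rbind, but that doesn't seem to give me the data I want, which is frustrating. Please help me think this through. Many thanks!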
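A vectorised sketch for the real problem, assuming a data frame `diabetes` with a `healthid` column and a `time` column holding the follow-up number 1, 2, 3, ... without gaps (as the `subset(diabetes, time == 11)` call suggests). Under that assumption a patient has at least 12 visits exactly when they have a row with `time == 12`, so no per-visit subsets or `rbind` are needed. The toy data are hypothetical:

```r
# Hypothetical toy data: patient 101 has 13 visits, 102 has 12, 103 has 5
diabetes <- data.frame(
  healthid = rep(c(101, 102, 103), times = c(13, 12, 5)),
  time     = c(1:13, 1:12, 1:5)
)

# Patients who reached a 12th visit (i.e. have at least 12 visits)
ids12 <- unique(diabetes$healthid[diabetes$time == 12])

# Their first 12 visits, selected in one vectorised pass over all rows
first12 <- diabetes[diabetes$healthid %in% ids12 & diabetes$time <= 12, ]

table(first12$healthid)   # every remaining patient contributes exactly 12 rows
```

If you want patients with exactly 12 visits instead, count rows per patient (for example with `table(diabetes$healthid)`) and keep the IDs whose count equals 12.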

Post #17

zhangyangsmith posted on 2014-3-22 04:14:02

toby3003 posted on 2014-3-22 01:24
Thank you for the careful reply. My actual situation is this: I have data on over 20,000 diabetes patients, with roughly 300,000 follow-up visits in total. I ...

Counting how many patients had a given number of visits was actually a good idea. What you probably need instead is the number of visits for each patient.

Following @jmpamao's example:

data <- read.table(text="id time dose
1 1 a
1 2 b
1 3 c
1 4 d
1 5 e
2 1 f
2 2 g
2 3 h
2 4 i
3 1 g
3 2 k
3 3 l
4 1 d
4 2 f
4 3 g
4 4 r
4 5 u
", header = TRUE)
# Count the number of visits for each patient
# (assuming each row represents one visit)
frq <- table(data$id)
# frq is a one-dimensional table, used like a named vector: names are ids, values are visit counts
# Keep the rows of patients with exactly 3 visits
slct <- data[data$id %in% as.numeric(names(frq[frq == 3])), ]


Keep in mind that the names of frq are character strings. If the id column in the original dataset is not character, convert the names explicitly (as done above with as.numeric) so that the comparison does not rely on implicit type coercion.
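A short check of the type point, on a hypothetical small data frame: `names()` of a `table` is always character, and converting it back to the type of `id` keeps the comparison explicit. For the 12-visit case the condition would become `frq >= 12` (at least 12) or `frq == 12` (exactly 12).

```r
# Hypothetical toy data: id 1 has 3 visits, id 2 has 2, id 3 has 1
data <- data.frame(id = c(1, 1, 1, 2, 2, 3), time = c(1, 2, 3, 1, 2, 1))

frq <- table(data$id)
class(names(frq))                          # "character"

ids3 <- as.numeric(names(frq)[frq == 3])   # convert the ids back to numeric
slct <- data[data$id %in% ids3, ]          # rows of patients with exactly 3 visits
```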

@jmpamao, this is the first time I've seen someone use read.table in this "datalines" fashion.
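For reference, the `text` argument of `read.table()` reads the data from a character string instead of a file, which is what makes the inline "datalines" style in this thread work. A minimal sketch with a hypothetical three-row string:

```r
# Inline data passed as a string; the first line is the header
df <- read.table(text = "id time dose
1 1 a
1 2 b
2 1 c", header = TRUE)

str(df)   # id and time are read as integer columns
```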