人大经济论坛 › Forum › Data Science & AI › Data Analysis & Data Science › R Language Forum › Random sampling in R


Original poster: wsnmgp

1436 views, 3 replies

[Q&A] Random sampling in R


Original post

wsnmgp posted on 2020-5-19 22:19:27

Bounty: 500 forum coins

For example, suppose my data frame looks like this:

x1 x2 x3 x4 x5
 1  0  3  0  4
 2  1  1  0  8
 2  2  1  1  2
 4  1  1  9 11

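All the values above sum to 54. I know that sample() draws a fixed number of items, but what if I need a random sample whose total is fixed instead?
For example, in a single random draw the number of items (size) doesn't matter, but the new data frame it produces must sum to 30.
It might end up like this, for example: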

x1 x2 x4 x5
 1  2  3  8
 2  2  1 11
Sum: 30
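A minimal sketch of one possible approach, assuming every cell of the data frame counts as one item and that the result only needs the drawn values and their positions (the helper name `sample_to_sum` and its `max_tries` argument are hypothetical): visit the cells in random order, keep a cell only if it does not push the running total past the target, and start over if a pass ends below the target.

```r
# Hypothetical helper: draw cells of a data frame in random order until their
# values sum exactly to `target`. A cell that would overshoot is skipped; if a
# pass ends below the target, the draw restarts (at most `max_tries` times).
sample_to_sum <- function(df, target, max_tries = 1000) {
  v <- unlist(df, use.names = FALSE)      # all cells, column by column
  for (attempt in seq_len(max_tries)) {
    total <- 0
    picked <- integer(0)
    for (k in sample(seq_along(v))) {     # random visiting order
      if (total + v[k] <= target) {
        total <- total + v[k]
        picked <- c(picked, k)
      }
      if (total == target) {
        # translate flat positions back to (row, column) of the data frame
        return(data.frame(row = (picked - 1) %% nrow(df) + 1,
                          col = names(df)[(picked - 1) %/% nrow(df) + 1],
                          value = v[picked]))
      }
    }
  }
  stop("no draw reached the target")
}

df <- data.frame(x1 = c(1, 2, 2, 4), x2 = c(0, 1, 2, 1),
                 x3 = c(3, 1, 1, 1), x4 = c(0, 0, 1, 9),
                 x5 = c(4, 8, 2, 11))
set.seed(1)
res <- sample_to_sum(df, 30)
res
sum(res$value)
```

Every returned draw sums to the target, but this is rejection-style sampling: the draws are not equally likely among all subsets of cells that reach 30.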


Reply #1

crystal8832

posted on 2020-5-20 09:08:43

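The only approach I can think of is to iterate over the candidates and pick one at random, but that is very inefficient. There must be a better way...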
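Brute-force iteration over every subset of the 20 cells means up to 2^20 candidates. A sketch of a more efficient alternative, assuming non-negative integer values and treating each cell as a distinct item: count the subsets that reach each partial sum with the standard subset-sum dynamic program, then walk the table backwards to draw one qualifying subset uniformly at random. The function names `count_subsets` and `sample_subset_sum` are hypothetical.

```r
# cnt[i + 1, s + 1] = number of subsets of the first i values summing to s
# (assumes non-negative integer values)
count_subsets <- function(v, target) {
  n <- length(v)
  cnt <- matrix(0, n + 1, target + 1)
  cnt[1, 1] <- 1                              # the empty set sums to 0
  for (i in seq_len(n)) {
    for (s in 0:target) {
      cnt[i + 1, s + 1] <- cnt[i, s + 1]      # subsets without item i
      if (s >= v[i]) {                        # plus subsets with item i
        cnt[i + 1, s + 1] <- cnt[i + 1, s + 1] + cnt[i, s - v[i] + 1]
      }
    }
  }
  cnt
}

# Draw one subset uniformly among all subsets of v that sum to target;
# returns the indices of the chosen items.
sample_subset_sum <- function(v, target) {
  cnt <- count_subsets(v, target)
  n <- length(v)
  if (cnt[n + 1, target + 1] == 0) stop("no subset reaches the target")
  chosen <- integer(0)
  s <- target
  for (i in n:1) {
    # of the cnt[i + 1, s + 1] subsets still possible, this many include item i
    with_i <- if (s >= v[i]) cnt[i, s - v[i] + 1] else 0
    if (runif(1) < with_i / cnt[i + 1, s + 1]) {
      chosen <- c(i, chosen)
      s <- s - v[i]
    }
  }
  chosen
}

x <- c(1, 0, 3, 0, 4, 2, 1, 1, 0, 8, 2, 2, 1, 1, 2, 4, 1, 1, 9, 11)
set.seed(42)
idx <- sample_subset_sum(x, 30)
x[idx]
sum(x[idx])
```

The table has (n + 1) × (target + 1) entries, so the cost grows with the target value rather than with 2^n.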

Reply #2

龙熏风 posted on 2020-5-20 18:40:07

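Personally, I'd say what you're describing can no longer really be called random sampling.
I wrote a small program. It could probably be optimized further, but here it is for reference: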

# Recursively draw random elements of `series` until they sum to `value`.
# Returns the drawn elements; a branch that cannot finish returns its partial
# result, and sampleBySum() then starts a new attempt.
sampleProcess <- function(series, value, items = c()) {
  if (value == sum(series)) return(c(items, series))
  selTemp <- series == value
  if (sum(selTemp) == 1 && length(items) > 0) {
    return(c(items, series[selTemp]))
  }
  # drop the items not smaller than value
  series <- series[series < value]
  # dead end: nothing left, or the rest cannot add up to value
  if (length(series) == 0 || sum(series) < value) return(items)
  # find the cup: the minimal number of items that can reach value
  series <- sort(series, decreasing = TRUE)
  cup <- which(cumsum(series) >= value)[1]
  # draw `cup` items at random
  index <- seq_along(series)
  selTemp <- index %in% sample(index, cup)
  itemsTemp <- series[selTemp]
  sumTemp <- sum(itemsTemp)
  if (sumTemp == value) {
    return(c(items, itemsTemp))
  } else if (sumTemp < value) {
    # keep this draw and fill the remainder from the unused items
    return(sampleProcess(series[!selTemp], value - sumTemp,
                         items = c(items, itemsTemp)))
  } else {
    # overshoot: abandon this attempt (recursing with unchanged arguments
    # can recurse forever, e.g. for series = c(5, 5) and value = 7)
    return(items)
  }
}

# Repeat sampleProcess() up to `iter` times until one draw sums exactly to `value`.
sampleBySum <- function(series, value, iter) {
  if (value > sum(series)) stop("the sum of series is below value")
  # drop the items larger than value
  series <- series[series <= value]
  if (length(series) == 0) stop("no item below value")
  items <- c()
  for (i in seq_len(iter)) {
    items <- sampleProcess(series, value)
    if (sum(items) == value) {
      cat("iter = ", i, "\n")
      break
    }
  }
  if (sum(items) != value) warning("no match found")
  return(items)
}
x <- c(1,0,3,0,4,2,1,1,0,8,2,2,1,1,2,4,1,1,9,11)
sampleBySum(x, 30, 100)


Reply #3

llb_321

posted on 2020-5-20 18:43:05

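# the original data frame as a 4 x 5 matrix (filled column by column)
d <-
  matrix(c(1, 2, 2, 4, 0, 1, 2, 1, 3, 1, 1, 1, 0, 0, 1, 9, 4, 8, 2, 11), 4)
k <- 1
repeat {
  # random number of values to keep from each column in this attempt
  i <- sample(1:nrow(d), 1)
  a <- matrix(0, nrow(d), ncol(d))
  for (j in 1:ncol(d)) {
    # draw i values from column j at random and pad the rest with zeros
    a[, j] <- c(sample(d[, j], i), rep(0, nrow(d) - i))
    # checked after every column, so a printed matrix may end in all-zero
    # columns; break only leaves the for loop, so the search keeps going
    if (sum(a) == 30) {
      print(a)
      break
    }
  }
  # stop after 10000 attempts, since duplicate results are not detected
  if (k > 10000)
    break
  k <- k + 1
}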
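
This produces a great many results. I didn't work out a way to detect duplicate results, so I used k as a stopping condition. Give it a try.
Also, I don't want your coins; I already have plenty. I'm just doing this for fun.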
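The reply above notes that duplicate results are not detected. One way to do that, sketched here with the same per-column sampling scheme, is to sort each column of a hit, turn the matrix into a string key, and store it only if that key has not been seen. The key format and the `found` list are illustrative choices, and unlike the reply this sketch checks the sum only after all columns are filled.

```r
# Same data and per-column sampling scheme as the reply above
d <- matrix(c(1, 2, 2, 4, 0, 1, 2, 1, 3, 1, 1, 1, 0, 0, 1, 9, 4, 8, 2, 11), 4)
target <- 30
found <- list()  # distinct hits, named by a string key
set.seed(1)
for (k in 1:10000) {
  i <- sample(nrow(d), 1)  # how many values to draw from each column
  a <- matrix(0, nrow(d), ncol(d))
  for (j in seq_len(ncol(d))) {
    a[, j] <- c(sample(d[, j], i), rep(0, nrow(d) - i))
  }
  if (sum(a) == target) {
    # sort each column so draws that differ only in order share one key
    a <- apply(a, 2, sort, decreasing = TRUE)
    key <- paste(a, collapse = ",")
    if (is.null(found[[key]])) found[[key]] <- a
  }
}
length(found)  # number of distinct results
```

Because the loop still runs a fixed number of attempts, length(found) is the number of distinct results seen so far, not necessarily all of them.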
