I vaguely recall from grad school that the following is a valid approach to weighted sampling without replacement:
1. Start with an initially empty "sampled set".
2. Draw a (single) weighted sample with replacement with whatever method you have.
3. Check whether you have already picked it.
4. If you did, ignore it and move to the next sample. If you didn't, add it to your sampled set.
5. Repeat steps 2-4 until the sampled set is of the desired size.
Will this give me a valid weighted sample?
(Context: for very large samples, R's sample(x,n,replace=FALSE,p) takes a really long time, much longer than implementing the strategy above.)
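The steps above can be sketched as follows. This is a minimal Python illustration (stdlib only, since the question mentions driving R from Python via RPy); the function name and the batched drawing are my own, not part of any library:

```python
import random

def weighted_sample_without_replacement(items, weights, n):
    """Rejection approach from the steps above: draw weighted samples
    WITH replacement (random.choices) and discard duplicates until n
    distinct items have been collected."""
    if n > len(items):
        raise ValueError("cannot draw more unique items than exist")
    chosen = set()
    while len(chosen) < n:
        # Draw a batch at a time; duplicates are silently absorbed by the set.
        for item in random.choices(items, weights=weights, k=n - len(chosen)):
            chosen.add(item)
            if len(chosen) == n:
                break
    return list(chosen)
```

Note that when n approaches the population size, or a few weights dominate, the rejection rate climbs and the loop can iterate many times; the approach shines when n is small relative to the population, as in the 500-out-of-10^7 benchmark below.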
UPDATE 2012-01-05
Thank you everyone for your comments and responses! Regarding code, here's an example that faithfully reproduces what I'm trying to do. (I'm actually calling R from Python with RPy, but the runtime characteristics seem to be unchanged.)
First, define the following function, which assumes a sufficiently long sequence of with-replacement samples as input:
    get_rejection_sample <- function(replace_sample, n) {
      top <- length(replace_sample)
      bottom <- 1
      mid <- (top + bottom) %/% 2
      u <- unique(replace_sample[1:mid])
      while (length(u) != n) {
        if (length(u) < n) {
          bottom <- mid
        } else {
          top <- mid
        }
        mid <- (top + bottom) %/% 2
        u <- unique(replace_sample[1:mid])
        if (mid == top || mid == bottom) {
          break
        }
      }
      u
    }

It simply does a binary search for the number of with-replacement samples needed to yield n unique samples. Now, in the interpreter:
    > x = 1:10**7
    > w = rexp(length(x))
    > system.time(y <- sample(x, 500, FALSE, w))
       user  system elapsed
     12.820   0.048  12.902
    > system.time(y <- get_rejection_sample(sample(x, 1000, TRUE, w), 500))
       user  system elapsed
      0.311   0.066   0.378
    Warning message:
    In sample(x, 1000, TRUE, w) :
      Walker's alias method used: results are different from R < 2.2.0

As you can see, even for just 500 elements the difference is dramatic. I'm actually trying to sample 10^6 indices from a list of about 1.2×10^7, which takes hours using replace=FALSE.
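For what it's worth, the rejection scheme should be distributionally equivalent to successive sampling: each accepted draw is a weighted draw conditioned on not having been picked yet, which is exactly what removing the chosen item and renormalising does. A quick Monte Carlo sanity check in Python (stdlib only; both function names are mine, and `sequential_draw` is a naive stand-in for what `sample(..., replace=FALSE)` computes):

```python
import random
from collections import Counter

def sequential_draw(items, weights, n):
    # Successive sampling: draw one item with prob proportional to its
    # weight, remove it, and repeat on the remaining pool.
    pool, w, out = list(items), list(weights), []
    for _ in range(n):
        i = random.choices(range(len(pool)), weights=w, k=1)[0]
        out.append(pool.pop(i))
        w.pop(i)
    return frozenset(out)

def rejection_draw(items, weights, n):
    # Rejection approach from the question: with-replacement draws,
    # duplicates discarded.
    chosen = set()
    while len(chosen) < n:
        chosen.add(random.choices(items, weights=weights, k=1)[0])
    return frozenset(chosen)

random.seed(0)
items, weights, n, trials = [1, 2, 3, 4], [1, 2, 3, 4], 2, 20000
seq = Counter(sequential_draw(items, weights, n) for _ in range(trials))
rej = Counter(rejection_draw(items, weights, n) for _ in range(trials))
for pair in sorted(seq, key=sorted):
    print(sorted(pair), seq[pair] / trials, rej[pair] / trials)
```

The per-pair frequencies from the two methods should agree up to Monte Carlo noise; this is only an empirical check on a toy population, not a proof.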