(1) I think the article's author (not the original poster) misunderstood the concept of the bootstrap and confused bootstrap with cross-validation. That is only my guess, though. In cross-validation, or a setup similar to this case, you already have 100 validation samples. The estimates (KS, Gini) have therefore already been produced 100 times, so you can find the corresponding confidence interval (C.I.) directly from those 100 values; there is no need to bootstrap on top of that.
If you insist on bootstrapping, it should only be applied to the validation sample of a single validation run. That way you would end up with 100 separate 95% C.I.'s, and I don't see the point of that.
Further, pooling the 100 validation samples together and then bootstrapping the pooled data would be an even worse idea, since I can't think of any statistical meaning for it.
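To make point (1) concrete, here is a minimal sketch of reading a 95% percentile range straight off the 100 validation estimates, with no bootstrap involved. The KS values are simulated placeholders (I don't have the real ones), and the names `quantile`, `percentile_interval`, and `ks_values` are my own, not from the article.

```python
import random


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)                                # index just below the position
    hi = min(lo + 1, len(sorted_values) - 1)     # index just above, clamped
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def percentile_interval(estimates, level=0.95):
    """Return the central `level` range of the estimates as (lower, upper)."""
    ordered = sorted(estimates)
    alpha = (1 - level) / 2
    return quantile(ordered, alpha), quantile(ordered, 1 - alpha)


# Hypothetical stand-in for the 100 KS values from the 100 validation samples.
rng = random.Random(0)
ks_values = [rng.gauss(0.40, 0.03) for _ in range(100)]

low, high = percentile_interval(ks_values)
print(f"95% range of KS over 100 validation runs: [{low:.3f}, {high:.3f}]")
```

The same function works for the Gini values; just pass the 100 Gini estimates instead.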
(2) To bootstrap a data set, you usually need to define the number of replications; more than 500 is often suggested. In each bootstrap sample, you randomly select observations with replacement, and the size of each bootstrap sample is the same as that of the original sample.
For example, say you have univariate data of size 1000 and want to do 500 bootstrap replications. In the first replication, you randomly sample the original data set 1000 times with replacement; that gives you the first bootstrap sample. The second time you repeat the same step, and so on, until you have 500 such bootstrap samples.
Finally, you can do whatever you want (parameter estimation) on each sample. That gives you 500 estimates, from which you can find the corresponding C.I.
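The procedure in (2) can be sketched roughly as follows. This is only an illustration under my own assumptions: the data are simulated, the statistic is the sample mean, the interval is the simple percentile interval, and the names `bootstrap_estimates` and `percentile_ci` are made up for this example.

```python
import random
import statistics


def bootstrap_estimates(data, statistic, n_boot=500, seed=0):
    """Compute `statistic` on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Each bootstrap sample has the same size as the original data.
        sample = rng.choices(data, k=n)
        estimates.append(statistic(sample))
    return estimates


def percentile_ci(estimates, level=0.95):
    """Percentile interval: central `level` range of the bootstrap estimates."""
    ordered = sorted(estimates)
    alpha = (1 - level) / 2

    def quantile(q):
        pos = q * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        frac = pos - lo
        return ordered[lo] * (1 - frac) + ordered[hi] * frac

    return quantile(alpha), quantile(1 - alpha)


# Hypothetical univariate data of size 1000, as in the example above.
data_rng = random.Random(42)
data = [data_rng.gauss(0.0, 1.0) for _ in range(1000)]

estimates = bootstrap_estimates(data, statistics.mean, n_boot=500)
ci_low, ci_high = percentile_ci(estimates)
print(f"bootstrap 95% C.I. for the mean: [{ci_low:.3f}, {ci_high:.3f}]")
```

Swap `statistics.mean` for whatever estimator you care about (KS, Gini, etc.); the resampling loop stays the same.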
Hope my explanation helps a bit :)









