OP: 何人来此

[Statistics] V-fold cross-validation improved: V-fold penalization


Posted by 何人来此 (OP) on 2022-3-26 14:15:00, from mobile

---
Title:
《V-fold cross-validation improved: V-fold penalization》
---
Author:
Sylvain Arlot (LM-Orsay, INRIA Futurs)
---
Year of latest submission:
2008
---
Classification:

Primary: Mathematics; Secondary: Statistics Theory
Description: Applied, computational and theoretical statistics: e.g. statistical inference, regression, time series, multivariate analysis, data analysis, Markov chain Monte Carlo, design of experiments, case studies
--
Primary: Statistics; Secondary: Machine Learning
Description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high-dimensional inference, etc.) with a statistical or theoretical grounding
--
Primary: Statistics; Secondary: Statistics Theory
Description: stat.TH is an alias for math.ST. Asymptotics, Bayesian inference, decision theory, estimation, foundations, inference, testing.
--

---
Abstract:
  We study the efficiency of V-fold cross-validation (VFCV) for model selection from the non-asymptotic viewpoint, and suggest an improvement on it, which we call ``V-fold penalization''. Considering a particular (though simple) regression problem, we prove that VFCV with a bounded V is suboptimal for model selection, because it ``overpenalizes'' all the more as V is large. Hence, asymptotic optimality requires V to go to infinity. However, when the signal-to-noise ratio is low, it appears that overpenalizing is necessary, so that the optimal V is not always the larger one, despite the variability issue. This is confirmed by some simulated data. In order to improve on the prediction performance of VFCV, we define a new model selection procedure, called ``V-fold penalization'' (penVF). It is a V-fold subsampling version of Efron's bootstrap penalties, so that it has the same computational cost as VFCV, while being more flexible. In a heteroscedastic regression framework, assuming the models to have a particular structure, we prove that penVF satisfies a non-asymptotic oracle inequality with a leading constant that tends to 1 when the sample size goes to infinity. In particular, this implies adaptivity to the smoothness of the regression function, even with a highly heteroscedastic noise. Moreover, it is easy to overpenalize with penVF, independently from the V parameter. A simulation study shows that this results in a significant improvement on VFCV in non-asymptotic situations.
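As a rough illustration of the VFCV procedure the abstract discusses (not the paper's own code), here is a minimal sketch: each candidate model is scored by its average held-out squared error over V folds, and the minimiser is selected. The toy data, the polynomial model family, and the function name `vfold_cv_select` are all illustrative assumptions, not from the paper.

```python
import numpy as np

def vfold_cv_select(x, y, degrees, V=5, rng=None):
    """Select a polynomial degree by V-fold cross-validation.

    For each candidate model (degree), average the squared prediction
    error over the V held-out folds, then pick the minimiser.
    """
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, V)          # V roughly equal folds
    cv_risk = {}
    for d in degrees:
        errs = []
        for j in range(V):
            test = folds[j]
            train = np.concatenate([folds[k] for k in range(V) if k != j])
            coef = np.polyfit(x[train], y[train], d)      # fit on V-1 folds
            pred = np.polyval(coef, x[test])              # predict held-out fold
            errs.append(np.mean((y[test] - pred) ** 2))
        cv_risk[d] = np.mean(errs)
    return min(cv_risk, key=cv_risk.get), cv_risk

# Toy regression: smooth signal plus homoscedastic noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 200))
y = np.sin(3 * x) + 0.3 * rng.normal(size=200)
best, risks = vfold_cv_select(x, y, degrees=range(0, 9), V=5, rng=1)
print("selected degree:", best)
```

Because each fold's estimator is trained on only (V-1)/V of the data, the CV risk estimates the risk of a slightly undersized sample, which is one way to see the "overpenalization" effect for bounded V mentioned in the abstract.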
---
PDF link:
https://arxiv.org/pdf/0802.0566
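For contrast with plain VFCV, the penVF idea can be sketched as empirical risk on the full sample plus a V-fold resampling penalty. This is a hedged sketch only: it assumes the penalty takes the form C times the average gap between each fold-estimator's risk on the full sample and on its own training folds, with C = (V-1) times an overpenalization factor; the paper's exact normalisation may differ, and `penvf_select` and the toy data are illustrative.

```python
import numpy as np

def penvf_select(x, y, degrees, V=5, overpen=1.0, rng=None):
    """Model selection by a V-fold resampling penalty (sketch).

    Criterion: empirical risk of the full-sample fit, plus
    pen(m) = C * mean_j [ risk of the fold-j estimator on the full
    sample minus its risk on its own training folds ],
    with C = (V - 1) * overpen; overpen > 1 overpenalizes.
    """
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, V)
    crit = {}
    for d in degrees:
        coef_full = np.polyfit(x, y, d)
        emp_risk = np.mean((y - np.polyval(coef_full, x)) ** 2)
        gaps = []
        for j in range(V):
            train = np.concatenate([folds[k] for k in range(V) if k != j])
            coef_j = np.polyfit(x[train], y[train], d)
            risk_full = np.mean((y - np.polyval(coef_j, x)) ** 2)
            risk_train = np.mean((y[train] - np.polyval(coef_j, x[train])) ** 2)
            gaps.append(risk_full - risk_train)   # optimism of the fold fit
        crit[d] = emp_risk + (V - 1) * overpen * np.mean(gaps)
    return min(crit, key=crit.get)

# Same toy data as before; overpen is tuned independently of V,
# which is the flexibility the abstract attributes to penVF.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 200))
y = np.sin(3 * x) + 0.3 * rng.normal(size=200)
best_pen = penvf_select(x, y, degrees=range(0, 9), V=5, overpen=1.25, rng=1)
print("selected degree:", best_pen)
```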
Keywords: cross-validation, multivariate, improvement, theoretical, hypothesis, confirm, prove, prediction, framework
