Title:
High-dimensional variable selection
---
Authors:
Larry Wasserman, Kathryn Roeder
---
Year of latest submission:
2009
---
Categories:
Primary category: Mathematics
Subcategory: Statistics Theory
Description: Applied, computational and theoretical statistics: e.g. statistical inference, regression, time series, multivariate analysis, data analysis, Markov chain Monte Carlo, design of experiments, case studies
--
Primary category: Statistics
Subcategory: Machine Learning
Description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
--
Primary category: Statistics
Subcategory: Statistics Theory
Description: stat.TH is an alias for math.ST. Asymptotics, Bayesian Inference, Decision Theory, Estimation, Foundations, Inference, Testing.
--
---
Abstract:
This paper explores the following question: what kind of statistical guarantees can be given when doing variable selection in high-dimensional models? In particular, we look at the error rates and power of some multi-stage regression methods. In the first stage we fit a set of candidate models. In the second stage we select one model by cross-validation. In the third stage we use hypothesis testing to eliminate some variables. We refer to the first two stages as "screening" and the last stage as "cleaning." We consider three screening methods: the lasso, marginal regression, and forward stepwise regression. Our method gives consistent variable selection under certain conditions.
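The three stages named in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation. It assumes marginal regression as the screening method, a three-way split of the data (one part per stage), a single held-out split in place of full cross-validation, centered data with no intercept, and a Bonferroni-style cutoff from the normal approximation. The names `solve_ols`, `columns`, and `screen_and_clean`, and the parameters `max_size` and `alpha`, are hypothetical.

```python
import random
from statistics import NormalDist


def solve_ols(X, y):
    """Least squares through the normal equations; returns (coefs, std_errors)."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | I], inverted by Gauss-Jordan elimination.
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [1.0 if a == c else 0.0 for c in range(p)] for a in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        d = A[col][col]
        A[col] = [v / d for v in A[col]]
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    inv = [row[p:] for row in A]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    rss = sum((y[i] - sum(X[i][a] * beta[a] for a in range(p))) ** 2
              for i in range(n))
    sigma2 = rss / (n - p)  # residual variance estimate
    se = [(sigma2 * inv[a][a]) ** 0.5 for a in range(p)]
    return beta, se


def columns(X, idx):
    """Keep only the columns listed in idx."""
    return [[row[j] for j in idx] for row in X]


def screen_and_clean(X, y, max_size=10, alpha=0.05):
    n, p = len(X), len(X[0])
    a, b = n // 3, 2 * n // 3
    X1, y1, X2, y2, X3, y3 = X[:a], y[:a], X[a:b], y[a:b], X[b:], y[b:]

    # Stage 1 (screening): rank variables by absolute marginal regression
    # coefficient on the first part; candidate models are the top-k sets.
    def marginal(j):
        sxy = sum(r[j] * t for r, t in zip(X1, y1))
        sxx = sum(r[j] ** 2 for r in X1)
        return abs(sxy / sxx)
    ranked = sorted(range(p), key=marginal, reverse=True)

    # Stage 2 (screening): pick the candidate with the smallest
    # prediction error on the second part.
    best, best_err = None, float("inf")
    for k in range(1, max_size + 1):
        idx = ranked[:k]
        beta, _ = solve_ols(columns(X1, idx), y1)
        err = sum((t - sum(r[j] * bj for j, bj in zip(idx, beta))) ** 2
                  for r, t in zip(X2, y2))
        if err < best_err:
            best, best_err = idx, err

    # Stage 3 (cleaning): refit on the third part and keep variables whose
    # test statistic exceeds a Bonferroni-style cutoff (normal approximation).
    beta, se = solve_ols(columns(X3, best), y3)
    cutoff = NormalDist().inv_cdf(1 - alpha / (2 * len(best)))
    return sorted(j for j, bj, sj in zip(best, beta, se) if abs(bj / sj) > cutoff)


# Simulated data (an assumption of this sketch): 50 variables, 3 with real effects.
rng = random.Random(0)
n, p = 300, 50
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * r[0] - 3 * r[1] + 3 * r[2] + rng.gauss(0, 1) for r in X]
selected = screen_and_clean(X, y)
print(selected)
```

Using a separate part of the data for each stage keeps the cleaning tests independent of the screening step, which is the reason this sketch splits the sample three ways. The lasso or forward stepwise regression could take the place of the marginal ranking in Stage 1.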
---
PDF link:
https://arxiv.org/pdf/0704.1139

