Title:
Model Selection Techniques -- An Overview
---
Authors:
Jie Ding, Vahid Tarokh, and Yuhong Yang
---
Latest submission year:
2018
---
Subject classifications:
Top-level category: Statistics
Subcategory: Machine Learning
Description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
--
Top-level category: Computer Science
Subcategory: Information Theory
Description: Covers theoretical and experimental aspects of information theory and coding. Includes material in ACM Subject Class E.4 and intersects with H.1.1.
--
Top-level category: Computer Science
Subcategory: Machine Learning
Description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Top-level category: Economics
Subcategory: Econometrics
Description: Econometric Theory, Micro-Econometrics, Macro-Econometrics, Empirical Content of Economic Relations discovered via New Methods, Methodological Aspects of the Application of Statistical Inference to Economic Data.
--
Top-level category: Mathematics
Subcategory: Information Theory
Description: math.IT is an alias for cs.IT. Covers theoretical and experimental aspects of information theory and coding.
--
Top-level category: Physics
Subcategory: Applied Physics
Description: Applications of physics to new technology, including electronic devices, optics, photonics, microwaves, spintronics, advanced materials, metamaterials, nanotechnology, and energy sciences.
--
---
Abstract:
In the era of big data, analysts usually explore various statistical models or machine learning methods for observed data in order to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and is thus central to scientific studies in fields such as ecology, economics, engineering, finance, political science, biology, and epidemiology. Model selection techniques have a long history, arising from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performance. The purpose of this article is to provide a comprehensive overview of them in terms of their motivation, large-sample performance, and applicability. We provide integrated and practically relevant discussions of the theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection.
---
PDF link:
https://arxiv.org/pdf/1810.09583