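Abstract (translation):
Data from spectrophotometers form vectors with a large number of exploitable variables. Building quantitative models from these variables usually requires a smaller set of variables than the initial one. Too many input variables give a model too many parameters, which leads to overfitting and poor generalization. In this paper, we propose using the mutual information measure to select variables from the initial set. Mutual information measures the information content of the input variables with respect to the model output without making any assumption about the model that will be used; it is therefore suitable for nonlinear modelling. In addition, it selects variables from the initial set itself rather than linear or nonlinear combinations of them. Compared with other variable projection methods, it therefore allows greater interpretability of the results without reducing model performance.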
---
English title:
Mutual information for the selection of relevant variables in spectrometric nonlinear modelling
---
Authors:
Fabrice Rossi (INRIA Rocquencourt / INRIA Sophia Antipolis), Amaury
Lendasse (CIS), Damien François (CESAME), Vincent Wertz (CESAME), Michel
Verleysen (DICE - MLG)
---
Latest submission year:
2007
---
Classification:
Top-level category: Computer Science
Subcategory: Machine Learning
Description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Top-level category: Computer Science
Subcategory: Neural and Evolutionary Computing
Description: Covers neural networks, connectionism, genetic algorithms, artificial life, adaptive behavior. Roughly includes some material in ACM Subject Class C.1.3, I.2.6, I.5.
--
Top-level category: Statistics
Subcategory: Applications
Description: Biology, Education, Epidemiology, Engineering, Environmental Sciences, Medical, Physical Sciences, Quality Control, Social Sciences
--
---
English abstract:
Data from spectrophotometers form vectors of a large number of exploitable variables. Building quantitative models using these variables most often requires using a smaller set of variables than the initial one. Indeed, a too large number of input variables to a model results in a too large number of parameters, leading to overfitting and poor generalization abilities. In this paper, we suggest the use of the mutual information measure to select variables from the initial set. The mutual information measures the information content in input variables with respect to the model output, without making any assumption on the model that will be used; it is thus suitable for nonlinear modelling. In addition, it leads to the selection of variables among the initial set, and not to linear or nonlinear combinations of them. Without decreasing the model performances compared to other variable projection methods, it allows therefore a greater interpretability of the results.
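
The selection idea described in the abstract can be illustrated with a minimal sketch: estimate the mutual information between each input variable and the output, rank the variables, and keep the top-scoring ones. Everything specific here is an assumption made for illustration and is not taken from the paper: the equal-width histogram (plug-in) estimator, the bin count, the synthetic data, and the helper names `mutual_information`, `rank_variables`, and `make_toy_data`. The toy output depends on a single variable through a square, so a linear correlation would be close to zero while the mutual information is not.

```python
import math
import random
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a value in [lo, hi] to an equal-width bin index in 0..n_bins-1."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value falls in the last bin


def mutual_information(x, y, n_bins=8):
    """Histogram (plug-in) estimate of I(X;Y) in nats for two 1-D samples."""
    n = len(x)
    x_lo, x_hi = min(x), max(x)
    y_lo, y_hi = min(y), max(y)
    xb = [bin_index(v, x_lo, x_hi, n_bins) for v in x]
    yb = [bin_index(v, y_lo, y_hi, n_bins) for v in y]
    joint = Counter(zip(xb, yb))
    px = Counter(xb)
    py = Counter(yb)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i,j) * log(p(i,j) / (p(i) * p(j))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


def rank_variables(X, y, n_bins=8):
    """Return (column indices sorted by decreasing MI with y, MI scores)."""
    n_vars = len(X[0])
    scores = [
        mutual_information([row[k] for row in X], y, n_bins)
        for k in range(n_vars)
    ]
    order = sorted(range(n_vars), key=lambda k: scores[k], reverse=True)
    return order, scores


def make_toy_data(n_samples=2000, n_vars=6, seed=0):
    """Hypothetical data: y depends nonlinearly on column 2 only."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(n_vars)]
         for _ in range(n_samples)]
    y = [row[2] ** 2 + 0.05 * rng.gauss(0.0, 1.0) for row in X]
    return X, y


X, y = make_toy_data()
order, scores = rank_variables(X, y)
selected = order[:1]  # keep the single best-ranked original variable
print("ranking:", order)
print("selected:", selected)
```

Because the ranking is over the original columns rather than over projections of them, the selected indices map directly back to measured variables, which is the interpretability argument the abstract makes. A histogram estimator is only one simple choice and becomes unreliable when several variables are scored jointly.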
---
PDF link:
https://arxiv.org/pdf/709.3427

