Title:
Pre-Selection of Independent Binary Features: An Application to Diagnosing Scrapie in Sheep
---
Authors:
Ludmila Kuncheva, C. Whitaker, P. Cockcroft, Z. S. Hoare
---
Year of latest submission:
2012
---
Classification:
Primary: Computer Science
Secondary: Artificial Intelligence
Description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
Primary: Computer Science
Secondary: Computational Engineering, Finance, and Science
Description: Covers applications of computer science to the mathematical modeling of complex systems in the fields of science, engineering, and finance. Papers here are interdisciplinary and applications-oriented, focusing on techniques and tools that enable challenging computational simulations to be performed, for which the use of supercomputers or distributed computing platforms is often required. Includes material in ACM Subject Classes J.2, J.3, and J.4 (economics).
---
Abstract:
Suppose that the only available information in a multi-class problem is a set of expert estimates of the conditional probabilities of occurrence for a set of binary features. The aim is to select a subset of features to be measured in subsequent data collection experiments. In the absence of any information about the dependencies between the features, we assume that all features are conditionally independent and hence choose the Naive Bayes classifier as the optimal classifier for the problem. Even in this (seemingly trivial) case of complete knowledge of the distributions, choosing an optimal feature subset is not straightforward. We discuss the properties and implementation details of Sequential Forward Selection (SFS) as a feature selection procedure for the current problem. A sensitivity analysis was carried out to investigate whether the same features are selected when the probabilities vary around the estimated values. The procedure is illustrated with a set of probability estimates for Scrapie in sheep.
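The selection procedure outlined in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes two classes with known priors and expert-style estimates p[c][j] = P(x_j = 1 | class c), and it uses the exact probability of correct classification of the Naive Bayes (Bayes-optimal under conditional independence) rule as the SFS criterion. All numbers are toy values, not the Scrapie estimates from the paper.

```python
from itertools import product

def bayes_accuracy(p, priors, subset):
    """Exact probability of correct classification when the Bayes rule
    uses only the features in `subset`.
    p[c][j] = P(feature j = 1 | class c); features are conditionally
    independent, so the class-conditional likelihoods factorise."""
    acc = 0.0
    for x in product([0, 1], repeat=len(subset)):
        # joint probability P(class c, x) for each class
        joint = []
        for c, prior in enumerate(priors):
            lik = prior
            for bit, j in zip(x, subset):
                lik *= p[c][j] if bit else (1.0 - p[c][j])
            joint.append(lik)
        acc += max(joint)  # the Bayes decision picks the largest posterior
    return acc

def sfs(p, priors, k):
    """Sequential Forward Selection: greedily add the feature that gives
    the largest exact Bayes accuracy when joined to the current subset."""
    d = len(p[0])
    selected = []
    while len(selected) < k:
        best = max((j for j in range(d) if j not in selected),
                   key=lambda j: bayes_accuracy(p, priors, selected + [j]))
        selected.append(best)
    return selected

# Toy expert estimates: two classes, three binary features
p = [[0.9, 0.5, 0.6],   # P(x_j = 1 | class 0)
     [0.1, 0.5, 0.4]]   # P(x_j = 1 | class 1)
priors = [0.5, 0.5]
print(sfs(p, priors, 2))  # feature 0 (the most discriminative) is picked first
```

Note that the criterion enumerates all 2^|subset| feature vectors, so this exact evaluation is only feasible for small subsets; this also hints at why, as the abstract notes, selecting an optimal subset is nontrivial even with full knowledge of the distributions.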
---
PDF link:
https://arxiv.org/pdf/1207.4141