Title:
Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules
---
Author:
Jose L. Balcazar
---
Latest submission year:
2010
---
Subject classification:
Primary category: Computer Science
Secondary category: Logic in Computer Science
分类描述:Covers all aspects of logic in computer science, including finite model theory, logics of programs, modal logic, and program verification. Programming language semantics should have Programming Languages as the primary subject area. Roughly includes material in ACM Subject Classes D.2.4, F.3.1, F.4.0, F.4.1, and F.4.2; some material in F.4.3 (formal languages) may also be appropriate here, although Computational Complexity is typically the more appropriate subject area.
--
Primary category: Computer Science
Secondary category: Artificial Intelligence
分类描述:Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
---
Abstract:
Association rules are among the most widely employed data analysis methods in the field of Data Mining. An association rule is a form of partial implication between two sets of binary variables. In the most common approach, association rules are parameterized by a lower bound on their confidence, which is the empirical conditional probability of their consequent given the antecedent, and/or by some other parameter bounds such as "support" or deviation from independence. Here we study notions of redundancy among association rules from a fundamental perspective. We see each transaction in a dataset as an interpretation (or model) in the propositional logic sense, and consider existing notions of redundancy, that is, of logical entailment, among association rules, of the form "any dataset in which this first rule holds must also obey that second rule; therefore, the second is redundant". We discuss several existing alternative definitions of redundancy between association rules and provide new characterizations and relationships among them. We show that the main alternatives we discuss actually correspond to just two variants, which differ in the treatment of full-confidence implications. For each of these two notions of redundancy, we provide a sound and complete deduction calculus, and we show how to construct complete bases (that is, axiomatizations) of absolutely minimum size in terms of the number of rules. Finally, we explore an approach to redundancy with respect to several association rules, and fully characterize its simplest case, that of two partial premises.
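The abstract's definition of confidence, and the entailment-style notion of redundancy it refers to, can be illustrated with a minimal Python sketch. It assumes each transaction is a frozenset of item names (the variables that are true in that propositional model); the names `support`, `confidence` and `entails` and the toy dataset are hypothetical. `entails` checks only a simple sufficient condition for one partial rule X0 -> Y0 to make another rule X1 -> Y1 redundant (either Y1 ⊆ X1, or X0 ⊆ X1 and X1 ∪ Y1 ⊆ X0 ∪ Y0); it is not presented as the paper's deduction calculus or its exact characterization.

```python
from typing import FrozenSet, Iterable, List, Optional, Tuple

# A transaction is the set of items (binary variables) that are true in it,
# i.e. a propositional model; a dataset is a list of such transactions.
Transaction = FrozenSet[str]
# A rule X -> Y is a pair (antecedent X, consequent Y) of item collections.
Rule = Tuple[Iterable[str], Iterable[str]]


def support(itemset: Iterable[str], dataset: List[Transaction]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    s = frozenset(itemset)
    return sum(1 for t in dataset if s <= t)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               dataset: List[Transaction]) -> Optional[float]:
    """Empirical conditional probability of the consequent given the antecedent.

    Returns None when no transaction contains the antecedent.
    """
    x = frozenset(antecedent)
    denom = support(x, dataset)
    if denom == 0:
        return None
    return support(x | frozenset(consequent), dataset) / denom


def entails(rule0: Rule, rule1: Rule) -> bool:
    """Hypothetical helper: a sufficient test that rule0 makes rule1 redundant.

    True if rule1 is trivial (Y1 ⊆ X1), or if X0 ⊆ X1 and X1 ∪ Y1 ⊆ X0 ∪ Y0.
    In the latter case supp(X1) <= supp(X0) and supp(X1 ∪ Y1) >= supp(X0 ∪ Y0),
    so conf(X1 -> Y1) >= conf(X0 -> Y0) in every dataset where X1 occurs.
    """
    x0, y0 = map(frozenset, rule0)
    x1, y1 = map(frozenset, rule1)
    if y1 <= x1:
        return True
    return x0 <= x1 and (x1 | y1) <= (x0 | y0)


# Toy dataset (invented for illustration); items are single letters.
data = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
        frozenset("bc"), frozenset("abcd")]

print(confidence("a", "b", data))          # 0.75
print(confidence("a", "bc", data))         # 0.5
print(entails(("a", "bc"), ("ab", "c")))   # True
print(entails(("a", "bc"), ("b", "c")))    # False
```

In this toy data, a -> bc has confidence 0.5 and the rule ab -> c it entails has confidence 2/3, consistent with the bound in the docstring. The check is purely syntactic on itemsets, which is what makes it a dataset-independent notion of redundancy rather than an observation about one particular dataset.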
---
PDF link:
https://arxiv.org/pdf/1002.4286