摘要翻译:
本文介绍了一个从大样本文本数据中获取知识的框架。我们利用从文本中提取的简单语句的基于张量的分布表示,并展示了如何使用该表示以无监督的方式从文本数据中推断出涌现的知识模式。我们在本文中研究的模式的例子是隐含的术语关系或合取的IF-THEN规则。为了评估我们的方法的实际相关性,我们将其应用于使用MeSH(一个受控的生物医学词汇和词库)中的术语注释生命科学文章。
---
英文标题:
《Distributional Framework for Emergent Knowledge Acquisition and its
Application to Automated Document Annotation》
---
作者:
Vit Novacek
---
最新提交年份:
2012
---
分类信息:
一级分类:Computer Science 计算机科学
二级分类:Artificial Intelligence 人工智能
分类描述:Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
涵盖了人工智能的所有领域,除了视觉、机器人、机器学习、多智能体系统以及计算和语言(自然语言处理),这些领域有独立的学科领域。特别地,包括专家系统,定理证明(尽管这可能与计算机科学中的逻辑重叠),知识表示,规划,和人工智能中的不确定性。大致包括ACM学科类I.2.0、I.2.1、I.2.3、I.2.4、I.2.8和I.2.11中的材料。
--
一级分类:Computer Science 计算机科学
二级分类:Information Retrieval 信息检索
分类描述:Covers indexing, dictionaries, retrieval, content and analysis. Roughly includes material in ACM Subject Classes H.3.0, H.3.1, H.3.2, H.3.3, and H.3.4.
涵盖索引,字典,检索,内容和分析。大致包括ACM主题课程H.3.0、H.3.1、H.3.2、H.3.3和H.3.4中的材料。
--
---
英文摘要:
The paper introduces a framework for representation and acquisition of knowledge emerging from large samples of textual data. We utilise a tensor-based, distributional representation of simple statements extracted from text, and show how one can use the representation to infer emergent knowledge patterns from the textual data in an unsupervised manner. Examples of the patterns we investigate in the paper are implicit term relationships or conjunctive IF-THEN rules. To evaluate the practical relevance of our approach, we apply it to annotation of life science articles with terms from MeSH (a controlled biomedical vocabulary and thesaurus).
---
PDF链接:
https://arxiv.org/pdf/1210.3241