
[Computer Science] Document Clustering Based on Topic Maps

Original poster: 可人4


Posted: 2022-3-19 14:55:00


Title:
Document Clustering based on Topic Maps
---
Authors:
Muhammad Rafi, M. Shahid Shaikh, Amir Farooq
---
Latest submission year:
2011
---
Classification:

Category: Computer Science
Subcategory: Information Retrieval
Description: Covers indexing, dictionaries, retrieval, content and analysis. Roughly includes material in ACM Subject Classes H.3.0, H.3.1, H.3.2, H.3.3, and H.3.4.
--
Category: Computer Science
Subcategory: Artificial Intelligence
Description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.

---
Abstract:
The importance of document clustering is now widely acknowledged by researchers for the better management, smart navigation, efficient filtering, and concise summarization of large document collections such as the World Wide Web (WWW). The next challenge lies in performing clustering based on the semantic content of the documents. The problem of document clustering has two main components: (1) representing each document in a form that inherently captures the semantics of the text, which may also help reduce the document's dimensionality; and (2) defining a similarity measure over this semantic representation that assigns higher numerical values to document pairs with a stronger semantic relationship. The feature space of documents can make clustering very challenging: a document may contain multiple topics, a large set of class-independent general words, and a handful of class-specific core words. Given these features, traditional agglomerative clustering algorithms based on either the Document Vector Model (DVM) or the Suffix Tree model (STC) are less efficient at producing results of high cluster quality. This paper introduces a new approach to document clustering based on the Topic Map representation of documents, in which each document is transformed into a compact form. A similarity measure is proposed based on the information inferred from topic map data and structures. The method is implemented using agglomerative hierarchical clustering and tested on standard Information Retrieval (IR) datasets. The comparative experiment shows that the proposed approach is effective in improving cluster quality.
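The abstract frames clustering as two pieces, a document representation and a similarity measure, fed into agglomerative hierarchical clustering. The sketch below illustrates that pipeline with the traditional Document Vector Model baseline, not the authors' topic-map method: documents become term-frequency vectors, cosine similarity scores each pair, and the most similar clusters are merged until k remain. The tokenizer, the group-average linkage, and the names `tf_vector`, `cosine`, `agglomerative`, and `EXAMPLE_DOCS` are illustrative assumptions; the abstract does not specify them.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tf_vector(text: str) -> Counter:
    """Bag-of-words term-frequency vector: a plain Document Vector Model (DVM).

    Assumption: lowercase alphabetic tokens only, no stop-word removal or stemming.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def agglomerative(docs: list[str], k: int) -> list[list[int]]:
    """Agglomerative hierarchical clustering of docs, stopping at k clusters.

    Assumption: group-average linkage (mean pairwise cosine similarity).
    Returns clusters as sorted lists of document indices.
    """
    vectors = [tf_vector(d) for d in docs]
    # Pairwise document similarities, computed once up front.
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    # Start with every document in its own singleton cluster.
    clusters = [[i] for i in range(len(docs))]

    while len(clusters) > max(k, 1):
        best_pair, best_score = None, -1.0
        # Score every pair of clusters by their average document-to-document similarity.
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            score = sum(sim[a][b] for a in ci for b in cj) / (len(ci) * len(cj))
            if score > best_score:
                best_pair, best_score = (i, j), score
        i, j = best_pair
        # Merge the most similar pair; i < j, so deleting index j leaves i valid.
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Hypothetical toy corpus: two documents on topic maps, two on finance.
EXAMPLE_DOCS = [
    "topic maps represent topics and associations",
    "topic maps and associations between topics",
    "stock markets and interest rates",
    "interest rates move stock markets",
]

if __name__ == "__main__":
    print(agglomerative(EXAMPLE_DOCS, 2))  # → [[0, 1], [2, 3]]
```

In a topic-map variant, one would swap out `tf_vector` and `cosine` for a topic-map representation and its inferred similarity; the merge loop itself does not depend on how similarity is computed. This naive loop rescans every cluster pair at each merge, roughly O(n³) overall, so it only suits small collections.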
---
PDF link:
https://arxiv.org/pdf/1112.6219
