Title:
Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems
---
Authors:
Zhe Chen, Dunwei Wen
---
Latest submission year:
2009
---
Subject classification:
Top-level category: Computer Science
Subcategory: Artificial Intelligence
Description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
Top-level category: Computer Science
Subcategory: Human-Computer Interaction
Description: Covers human factors, user interfaces, and collaborative computing. Roughly includes material in ACM Subject Classes H.1.2 and all of H.5, except for H.5.1, which is more likely to have Multimedia as the primary subject area.
--
---
Abstract:
With the development of Natural Language Processing (NLP), more and more systems want to adopt NLP in their user interface modules to process user input, so that they can communicate with users in a natural way. However, this raises a speed problem: if the NLP module cannot process sentences within an acceptable time delay, users will not use the system. As a result, systems with strict processing-time requirements, such as dialogue systems, web search systems, automatic customer service systems, and especially real-time systems, have had to abandon the NLP module in order to get a faster system response. This paper aims to solve the speed problem. It first introduces the construction of a syntactic parser based on corpus-based machine learning and a statistical model, and then analyzes the speed of the parser and its algorithms. Based on this analysis, two accelerating methods, Compressed POS Set and Syntactic Patterns Pruning, are proposed, which can effectively improve the time efficiency of parsing in the NLP module. To evaluate different parameters in the accelerating algorithms, two new factors, PT and RT, are introduced and explained in detail. Experiments are also carried out to test these methods, which are intended to support the practical application of NLP.
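The abstract names the two accelerating methods without describing them. As a rough illustration of the general intuition only, the Python sketch below assumes that compressing the POS set means mapping fine-grained tags onto fewer coarse classes, so that distinct tag-sequence patterns collapse together, and that pattern pruning means discarding patterns below a frequency threshold. The tag mapping `COARSE_TAG`, the toy `corpus`, the helpers `compress` and `prune`, and the `min_count` threshold are all hypothetical; this is not the authors' algorithm, and it does not model the PT and RT factors, which the abstract does not define.

```python
from collections import Counter

# Hypothetical mapping from fine-grained (Penn Treebank-style) tags to a
# smaller, coarser tag set. The paper's actual compressed set is not given
# in the abstract; this table is purely illustrative.
COARSE_TAG = {
    "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N",
    "VB": "V", "VBD": "V", "VBG": "V", "VBN": "V", "VBP": "V", "VBZ": "V",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "DT": "DET", "IN": "PREP", "WP": "WH", "WRB": "WH",
}


def compress(tags):
    """Map each fine tag to its coarse class; unknown tags are kept as-is."""
    return tuple(COARSE_TAG.get(t, t) for t in tags)


def prune(patterns, min_count):
    """Keep only patterns seen at least min_count times (hypothetical rule)."""
    counts = Counter(patterns)
    return {p: c for p, c in counts.items() if c >= min_count}


# Toy POS-tag sequences for question patterns (invented for illustration).
corpus = [
    ("WP", "VBZ", "DT", "NN"),
    ("WP", "VBD", "DT", "NNS"),
    ("WRB", "VBZ", "DT", "NNP"),
    ("WP", "VBZ", "DT", "JJ", "NN"),
]

fine = set(corpus)                                # distinct fine-grained patterns
coarse_patterns = [compress(s) for s in corpus]   # one coarse pattern per sentence
coarse = set(coarse_patterns)                     # distinct coarse patterns
kept = prune(coarse_patterns, min_count=2)        # frequent coarse patterns only

print(len(fine), len(coarse), kept)
# prints: 4 2 {('WH', 'V', 'DET', 'N'): 3}
```

On this toy input, four distinct fine-grained patterns collapse to two coarse ones, and pruning with `min_count=2` keeps the single pattern seen three times. A smaller pattern inventory is one plausible reason such methods could shorten matching time, but the abstract does not quantify the effect.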
---
PDF link:
https://arxiv.org/pdf/0903.0174