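Abstract (translated from Chinese):
A popular approach to solving large probabilistic systems relies on aggregating states according to a similarity measure. Many of the approaches in the literature are heuristic. Several recent methods rely instead on metrics based on the notion of bisimulation, that is, behavioral equivalence between states (Givan et al., 2001, 2003; Ferns et al., 2004). One component of such metrics is the Kantorovich metric between probability distributions. Although this metric offers many desirable theoretical properties, it is expensive to compute in practice. This paper uses network optimization and statistical sampling techniques to address the problem. In this way, we obtain a variety of distance functions for MDP state aggregation, which differ in the trade-off between time and space complexity and in the quality of the aggregation. We provide an empirical evaluation of these trade-offs.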
---
English title:
Methods for computing state similarity in Markov Decision Processes
---
Authors:
Norman Ferns, Pablo Samuel Castro, Doina Precup, Prakash Panangaden
---
Year of latest submission:
2012
---
Classification:
Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
---
English abstract:
A popular approach to solving large probabilistic systems relies on aggregating states based on a measure of similarity. Many approaches in the literature are heuristic. A number of recent methods rely instead on metrics based on the notion of bisimulation, or behavioral equivalence between states (Givan et al., 2001, 2003; Ferns et al., 2004). An integral component of such metrics is the Kantorovich metric between probability distributions. However, while this metric enables many satisfying theoretical properties, it is costly to compute in practice. In this paper, we use techniques from network optimization and statistical sampling to overcome this problem. We obtain in this manner a variety of distance functions for MDP state aggregation, which differ in the trade-off between time and space complexity, as well as the quality of the aggregation. We provide an empirical evaluation of these trade-offs.
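
To make the cost mentioned in the abstract concrete, the following is a minimal sketch of how the Kantorovich metric between two discrete distributions can be computed as a minimum-cost transport problem. It is not the authors' implementation: the function name `kantorovich`, its arguments, and the choice of a plain successive-shortest-path solver with Bellman-Ford are assumptions made for illustration only. Exact rational arithmetic is used so that the result is not affected by floating-point rounding.

```python
from fractions import Fraction


def kantorovich(p, q, d):
    """Kantorovich distance between distributions p and q over n states.

    Hypothetical helper for illustration: p and q are sequences of
    probabilities (ideally Fractions) and d[i][j] is the ground distance
    between states i and j. Solved as a min-cost flow problem.
    """
    n = len(p)
    source, sink = 2 * n, 2 * n + 1
    to, cap, cost = [], [], []
    adj = [[] for _ in range(2 * n + 2)]

    def add_edge(u, v, capacity, weight):
        # A forward edge and its residual twin sit at indices e and e ^ 1.
        for a, b, c, w in ((u, v, capacity, weight), (v, u, Fraction(0), -weight)):
            adj[a].append(len(to))
            to.append(b)
            cap.append(c)
            cost.append(w)

    total_mass = sum(Fraction(x) for x in p)
    for i in range(n):
        add_edge(source, i, Fraction(p[i]), Fraction(0))    # supply of state i
        add_edge(n + i, sink, Fraction(q[i]), Fraction(0))  # demand of state i
        for j in range(n):
            add_edge(i, n + j, total_mass, Fraction(d[i][j]))

    total_cost = Fraction(0)
    while True:
        # Bellman-Ford shortest path in the residual graph.
        dist = [None] * (2 * n + 2)
        prev_edge = [None] * (2 * n + 2)
        dist[source] = Fraction(0)
        for _ in range(2 * n + 1):
            changed = False
            for u in range(2 * n + 2):
                if dist[u] is None:
                    continue
                for e in adj[u]:
                    v = to[e]
                    if cap[e] > 0 and (dist[v] is None or dist[u] + cost[e] < dist[v]):
                        dist[v] = dist[u] + cost[e]
                        prev_edge[v] = e
                        changed = True
            if not changed:
                break
        if dist[sink] is None:
            break  # no augmenting path left: all mass has been moved

        # Find the bottleneck along the path, then push that much flow.
        push, v = None, sink
        while v != source:
            e = prev_edge[v]
            push = cap[e] if push is None else min(push, cap[e])
            v = to[e ^ 1]
        v = sink
        while v != source:
            e = prev_edge[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        total_cost += push * dist[sink]
    return total_cost


if __name__ == "__main__":
    half = Fraction(1, 2)
    line = [[abs(i - j) for j in range(3)] for i in range(3)]
    print(kantorovich([half, half, 0], [0, half, half], line))
```

Even in this small form, each distance evaluation builds a graph with a quadratic number of edges in the number of states, which gives some intuition for why the abstract describes the metric as costly to compute in practice.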
---
PDF link:
https://arxiv.org/pdf/1206.6836

