摘要翻译:
现有的值函数逼近方法在许多应用中得到了成功的应用,但它们往往缺乏有用的先验误差界。我们提出了一种新的值函数近似双线性规划公式,它采用了全局最优化。该公式通过最小化Bellman残差的特定范数,为鲁棒和预期保单损失提供了强有力的先验保证。最优解双线性规划是NP困难的,但这是不可避免的,因为Bellman残差极小化本身就是NP困难的。我们描述并分析了求解双线性规划的最优算法和近似算法。分析表明,该算法提供了近似策略迭代的收敛推广。我们还简要分析了不完全样本下双线性规划算法的行为。最后,我们证明了所提出的方法在简单基准问题上能够一致地最小化Bellman残差。
---
英文标题:
《Global Optimization for Value Function Approximation》
---
作者:
Marek Petrik and Shlomo Zilberstein
---
最新提交年份:
2010
---
分类信息:
一级分类:Computer Science 计算机科学
二级分类:Artificial Intelligence 人工智能
分类描述:Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
涵盖了人工智能的所有领域,除了视觉、机器人、机器学习、多智能体系统以及计算和语言(自然语言处理),这些领域有独立的学科领域。特别地,包括专家系统,定理证明(尽管这可能与计算机科学中的逻辑重叠),知识表示,规划,和人工智能中的不确定性。大致包括ACM学科类I.2.0、I.2.1、I.2.3、I.2.4、I.2.8和I.2.11中的材料。
--
---
英文摘要:
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of the Bellman residual. Solving a bilinear program optimally is NP-hard, but this is unavoidable because the Bellman-residual minimization itself is NP-hard. We describe and analyze both optimal and approximate algorithms for solving bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. We also briefly analyze the behavior of bilinear programming algorithms under incomplete samples. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on simple benchmark problems.
---
PDF链接:
https://arxiv.org/pdf/1006.2743