[Computer Science] Reinforcement Learning by Value Gradients

Posted by nandehutu2022 on 2022-3-8 21:04:20

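Abstract (translation):
The concept of value-gradients is introduced and developed in the context of reinforcement learning. It is shown that, by learning value-gradients, exploration or stochastic behaviour is no longer needed to find locally optimal trajectories. This is the main motivation for using value-gradients, and it is argued that learning value-gradients is the actual objective of any value-function learning algorithm for control problems. It is also argued that learning value-gradients is significantly more efficient than learning just the values, and this argument is supported experimentally, in several problem domains, by efficiency gains of several orders of magnitude. Once value-gradients are introduced into learning, several analyses become possible. For example, a surprising equivalence between a value-gradient learning algorithm and a policy-gradient learning algorithm is proven, which provides a robust convergence proof for control problems that use a value function with a general function approximator.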
---
English title:
《Reinforcement Learning by Value Gradients》
---
Author:
Michael Fairbank
---
Latest submission year:
2008
---
Classification:

Primary category: Computer Science
Secondary category: Neural and Evolutionary Computing
Category description: Covers neural networks, connectionism, genetic algorithms, artificial life, adaptive behavior. Roughly includes some material in ACM Subject Class C.1.3, I.2.6, I.5.
--
Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--

---
English abstract:
The concept of the value-gradient is introduced and developed in the context of reinforcement learning. It is shown that by learning the value-gradients, exploration or stochastic behaviour is no longer needed to find locally optimal trajectories. This is the main motivation for using value-gradients, and it is argued that learning value-gradients is the actual objective of any value-function learning algorithm for control problems. It is also argued that learning value-gradients is significantly more efficient than learning just the values, and this argument is supported in experiments by efficiency gains of several orders of magnitude, in several problem domains. Once value-gradients are introduced into learning, several analyses become possible. For example, a surprising equivalence between a value-gradient learning algorithm and a policy-gradient learning algorithm is proven, and this provides a robust convergence proof for control problems using a value function with a general function approximator.
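The central object in the abstract is the value-gradient, the derivative of the value with respect to the state (written here as G_t = dV/dx_t). As a rough illustration only, the sketch below assumes a hypothetical 1-D toy problem chosen for simplicity (dynamics x' = x + a, reward r = -(x^2 + a^2), fixed linear policy a = -k*x, horizon T) and computes G_t along one deterministic trajectory by a backward recursion through the known, differentiable model. It is not the paper's learning algorithm; it only shows the kind of gradient target a value-gradient learner would try to fit, and checks it against a finite difference.

```python
# Minimal sketch: value-gradients G_t = dV/dx_t along one deterministic
# trajectory, computed by backward recursion through a known model.
# The dynamics, reward, policy and parameter values are hypothetical toy
# choices for illustration; they are not taken from the paper.

def rollout(x0, k, T):
    """Roll out the policy a = -k*x for T steps; return states, actions, return."""
    xs, acts, ret = [x0], [], 0.0
    x = x0
    for _ in range(T):
        a = -k * x                  # deterministic policy, no exploration noise
        ret += -(x * x + a * a)     # reward r(x, a) = -(x^2 + a^2)
        x = x + a                   # dynamics f(x, a) = x + a
        acts.append(a)
        xs.append(x)
    return xs, acts, ret


def value_gradients(xs, acts, k):
    """Backward recursion G_t = Dr/Dx + (Df/Dx) * G_{t+1}, with G_T = 0.

    D/Dx is the total derivative: it includes the effect of the state on
    the action through the policy (da/dx = -k).
    """
    T = len(acts)
    G = [0.0] * (T + 1)             # no reward is collected after the last state
    for t in reversed(range(T)):
        x, a = xs[t], acts[t]
        dr_dx = -2.0 * x + (-2.0 * a) * (-k)   # dr/dx + dr/da * da/dx
        df_dx = 1.0 + 1.0 * (-k)               # df/dx + df/da * da/dx
        G[t] = dr_dx + df_dx * G[t + 1]
    return G


if __name__ == "__main__":
    x0, k, T = 1.0, 0.5, 10
    xs, acts, ret = rollout(x0, k, T)
    G = value_gradients(xs, acts, k)
    # Compare G_0 with a central finite difference of the return.
    eps = 1e-6
    fd = (rollout(x0 + eps, k, T)[2] - rollout(x0 - eps, k, T)[2]) / (2 * eps)
    print(G[0], fd)
```

Running the script should print two nearly identical numbers. The point of the toy is that a single deterministic rollout already yields the slope of the value at every visited state, whereas a learner that only fits scalar values would need nearby (exploratory) rollouts to recover that slope.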
---
PDF link:
https://arxiv.org/pdf/0803.3539

Keywords: Presentation, Intelligence, Evolutionary, Equivalence, Experiments, policy, problems, control, proof, local
