Title:
Robot Navigation using Reinforcement Learning and Slow Feature Analysis
---
Author:
Wendelin Böhmer
---
Latest submission year:
2012
---
Subject classification:
Top-level category: Computer Science
Subcategory: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
Top-level category: Computer Science
Subcategory: Neural and Evolutionary Computing
Category description: Covers neural networks, connectionism, genetic algorithms, artificial life, adaptive behavior. Roughly includes some material in ACM Subject Class C.1.3, I.2.6, I.5.
--
---
Abstract:
The application of reinforcement learning algorithms to real-life problems always bears the challenge of filtering the environmental state out of raw sensor readings. While most approaches use heuristics, biology suggests that there must exist an unsupervised method to construct such filters automatically. Besides extracting the environmental state, the filters have to represent it in a fashion that supports modern reinforcement learning algorithms. Many popular algorithms use a linear architecture, so one should aim for filters that have good approximation properties in combination with linear functions. This thesis proposes the unsupervised method slow feature analysis (SFA) for this task. Presented with a random sequence of sensor readings, SFA learns a set of filters. With growing model complexity and more training examples, the filters converge to trigonometric polynomial functions. These are known to possess excellent approximation capabilities and should therefore support the reinforcement learning algorithms well. We evaluate this claim on a robot. The task is to learn a navigational control in a simple environment using the least-squares policy iteration (LSPI) algorithm. The only accessible sensor is a head-mounted video camera, but without meaningful filtering, video images are not suited as LSPI input. We show that filters learned by SFA, based on a random-walk video of the robot, allow the learned control to navigate successfully in about 80% of the test trials.
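As a rough illustration of the filter-learning step, here is a minimal sketch of *linear* SFA in Python with NumPy, assuming the sensor stream is stored as a `(T, D)` array. It centres and whitens the readings, approximates the time derivative by finite differences, and keeps the whitened directions whose derivative has the smallest variance. This is a simplification, not the thesis's method: the abstract's setting (camera images, growing model complexity) calls for richer nonlinear function classes. The function `linear_sfa` and the synthetic sine-plus-noise demo are hypothetical.

```python
import numpy as np


def linear_sfa(x: np.ndarray, n_features: int) -> np.ndarray:
    """Hypothetical linear SFA sketch (not the thesis implementation).

    x has shape (T, D): one D-dimensional sensor reading per time step.
    Assumes the covariance of x has full rank. Returns a (D, n_features)
    projection matrix to be applied to mean-centred readings.
    """
    x = x - x.mean(axis=0)  # zero-mean constraint on the outputs
    # Whitening: W = E * Lambda^(-1/2) gives the data identity covariance,
    # which enforces the unit-variance and decorrelation constraints.
    eigval, eigvec = np.linalg.eigh(np.cov(x, rowvar=False))
    whiten = eigvec / np.sqrt(eigval)
    z = x @ whiten
    # Approximate the time derivative with finite differences.
    dz = np.diff(z, axis=0)
    # eigh sorts eigenvalues in ascending order, so the leading eigenvectors
    # are the whitened directions whose derivative has the smallest variance.
    _, slow_dirs = np.linalg.eigh(np.cov(dz, rowvar=False))
    return whiten @ slow_dirs[:, :n_features]


# Toy demo with made-up data: a slow sine and fast white noise, linearly mixed.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 2000)
slow_source = np.sin(t)
fast_source = rng.standard_normal(t.size)
mixing = np.array([[1.0, 0.5], [0.3, 1.0]])  # arbitrary invertible mixing
x = np.column_stack([slow_source, fast_source]) @ mixing.T

proj = linear_sfa(x, n_features=1)
y = (x - x.mean(axis=0)) @ proj  # slowest feature, shape (T, 1)
corr = np.corrcoef(y[:, 0], slow_source)[0, 1]
print(f"|corr(slowest feature, sine)| = {abs(corr):.4f}")
```

On this toy mixture the slowest feature should match the sine source up to sign and scale, because any admixture of the white-noise source sharply increases the derivative variance. In the pipeline the abstract describes, learned filters of this kind would then serve as basis functions for a linear method such as LSPI; that step is not shown here.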
---
PDF link:
https://arxiv.org/pdf/1205.0986