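Vision-based navigation requires processing complex information to make task-oriented decisions. Applications include autonomous robots, self-driving cars, and assistive vision for humans. One of the key elements in this process is the extraction and selection of relevant features in pixel space on which to base action choices, a task for which machine learning techniques are well suited. However, deep reinforcement learning agents trained in simulation often exhibit unsatisfactory results when deployed in the real world, owing to perceptual differences known as the $\textit{reality gap}$. One approach that has yet to be explored for bridging this gap is self-attention. In this paper, we (1) perform a systematic exploration of self-attention-based navigation of 3D environments and appraise the behaviour observed from different hyperparameter sets, including their ability to generalise; (2) present strategies to improve the agents' generalisation abilities and navigation behaviour; and (3) show how models trained in simulation are capable of processing real-world images in real time. To our knowledge, this is the first demonstration of a self-attention-based agent successfully navigating a 3D action space using fewer than 4000 parameters.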
A generative recurrent neural network is quickly trained in an unsupervised manner to model popular reinforcement learning environments through compressed spatiotemporal representations. The world model's extracted features are fed into compact and simple policies trained by evolution, achieving state-of-the-art results in various environments. We also train our agent entirely inside of an environment generated by its own internal world model, and transfer this policy back into the actual environment. Interactive version of paper: https://worldmodels.github.io (32nd Conference on Neural Information Processing Systems, NIPS 2018).
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
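Despite the many recent successes of deep reinforcement learning (RL), its methods remain data inefficient, which makes many problems prohibitively expensive to solve in terms of data. We aim to remedy this by exploiting the rich supervisory signal in unlabeled data to learn state representations. This work introduces three different representation learning algorithms, each with access to a different subset of the data sources that traditional RL algorithms use. (i) GRICA is inspired by independent component analysis (ICA) and trains a deep neural network to output statistically independent features of the input. GRICA does so by minimizing the mutual information between each feature and the other features. In addition, GRICA requires only an unsorted collection of environment states. (ii) Latent Representation Prediction (LARP) requires more context: in addition to a state as input, it also needs the previous state and the action that connects them. This method learns state representations by predicting the environment's next state given the current state and action. The predictor is used together with a graph search algorithm. (iii) The third method learns a state representation by training a deep neural network to learn a smoothed version of the reward function. The representation is used to preprocess the inputs to deep RL, while the reward predictor is used for reward shaping. This method needs only state-reward pairs from the environment to learn the representation. We find that each method has its strengths and weaknesses, and conclude from our experiments that including unsupervised representation learning in RL problem-solving pipelines can speed up learning.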
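Simulating the trajectories of virtual crowds is a commonly encountered task in computer graphics. Several recent works have applied reinforcement learning methods to animate virtual agents, but they often make different design choices when it comes to the fundamental simulation setup. Each of these choices comes with a reasonable justification for its use, so it is not obvious what their real impact is or how they affect the results. In this work, we analyze some of these arbitrary choices in terms of their impact on learning performance, as well as the quality of the resulting simulation measured in terms of energy efficiency. We perform a theoretical analysis of the properties of the reward function design, and empirically evaluate the impact of using certain observation and action spaces on a variety of scenarios, with the reward function and energy usage as metrics. We show that directly using the neighboring agents' information as the observation generally outperforms the more widely used raycasting. Similarly, using nonholonomic controls with egocentric observations tends to produce more efficient behaviors than holonomic controls with absolute observations. Each of these choices has a significant and potentially nontrivial impact on the results, so researchers should be mindful about choosing and reporting them in their work.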
Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
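The point that computation can be decoupled from image size can be illustrated with a minimal sketch of a fixed-size glimpse. This is not the model described in the abstract: the function name `extract_glimpse`, the zero padding, and the hand-chosen location are assumptions made for illustration, and the actual model learns where to look and processes each glimpse with a recurrent network.

```python
from typing import List, Sequence


def extract_glimpse(image: Sequence[Sequence[float]],
                    center_row: int,
                    center_col: int,
                    size: int) -> List[List[float]]:
    """Crop a size x size window around (center_row, center_col).

    Pixels that fall outside the image are filled with 0 (an assumed
    padding choice). The work done depends only on `size`, not on the
    dimensions of `image`.
    """
    half = size // 2
    height, width = len(image), len(image[0])
    glimpse = []
    for r in range(center_row - half, center_row - half + size):
        row = []
        for c in range(center_col - half, center_col - half + size):
            inside = 0 <= r < height and 0 <= c < width
            row.append(image[r][c] if inside else 0)
        glimpse.append(row)
    return glimpse


# Toy 4 x 4 "image" whose pixel value encodes its position.
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
center_glimpse = extract_glimpse(toy_image, 1, 1, 3)
corner_glimpse = extract_glimpse(toy_image, 0, 0, 3)
```

In a full system the location passed to `extract_glimpse` would come from a learned policy rather than being fixed by hand, which is the non-differentiable step the abstract says is trained with reinforcement learning.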
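We present an end-to-end, model-based deep reinforcement learning agent that dynamically attends to the relevant parts of its state during planning. The agent uses a bottleneck mechanism over a set-based representation to limit the number of entities it attends to at each planning step. In experiments, we investigate the bottleneck mechanism on several sets of customized environments featuring different challenges. We consistently observe that this design allows planning agents to generalize their learned task-solving abilities to compatible unseen environments by attending to the relevant objects, leading to better out-of-distribution generalization performance.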
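This work studies the problem of image-goal navigation, which requires guiding a robot with noisy sensors and controls through real crowded environments. Recent fruitful approaches rely on deep reinforcement learning and learn navigation policies in simulation environments that are much simpler than real environments. Directly transferring these trained policies to real environments can be extremely challenging or even dangerous. We tackle this problem with a hierarchical navigation method composed of four decoupled modules. The first module maintains an obstacle map during robot navigation. The second periodically predicts a long-term goal on the real-time map. The third plans collision-free command sets for navigating to the long-term goal, while the final module stops the robot appropriately near the goal image. The four modules are developed separately to suit image-goal navigation in real crowded scenarios. In addition, the hierarchical decomposition decouples the learning of navigation goal planning, collision avoidance, and navigation-ending prediction, which reduces the search space during navigation training and helps improve generalization to previously unseen real scenes. We evaluate the method in both a simulator and the real world with a mobile robot. The results show that our method outperforms several navigation baselines and can successfully accomplish navigation tasks in these scenarios.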
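There has been an emerging paradigm shift from the era of "internet AI" to "embodied AI", in which AI algorithms and agents no longer learn from datasets of images, videos, or text curated primarily from the internet. Instead, they learn through interactions with their environments from an egocentric perception similar to that of humans. Consequently, there has been substantial growth in the demand for embodied AI simulators to support various embodied AI research tasks. This growing interest in embodied AI is beneficial to the greater pursuit of artificial general intelligence (AGI), but there has not yet been a contemporary and comprehensive survey of the field. This paper aims to provide an encyclopedic survey of the field of embodied AI, from its simulators to its research. By evaluating nine current embodied AI simulators against our seven proposed features, this paper aims to understand the simulators in terms of their use in embodied AI research and their limitations. It then surveys the three main research tasks in embodied AI: visual exploration, visual navigation, and embodied question answering (QA), covering the state-of-the-art approaches, evaluation metrics, and datasets. Finally, drawing on the new insights revealed by surveying the field, the paper provides suggestions for simulator-for-task selection and recommendations for the future directions of the field.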
With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high-dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms and provides a taxonomy of automated driving tasks where (D)RL methods have been employed, while addressing key computational challenges in real-world deployment of autonomous driving agents. It also delineates adjacent domains such as behavior cloning, imitation learning, and inverse reinforcement learning, which are related but are not classical RL algorithms. The role of simulators in training agents is discussed, along with methods to validate, test, and robustify existing solutions in RL.
We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement, and (3) embodied vision-and-language. We discuss the dominant datasets within each theme, evaluation metrics for the challenges, and the performance of state-of-the-art models. We highlight commonalities between top approaches to the challenges and identify potential future directions for Embodied AI research.
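The combination of reinforcement learning (RL) with deep learning has led to a series of impressive feats, with many believing that (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems, and also limits its full potential. In many other areas of machine learning, AutoML has shown that it is possible to automate such design choices, and it has also yielded promising initial results when applied to RL. However, Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also additional challenges unique to RL, which naturally produce a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, showing promise in a variety of applications from RNA design to game playing. Given the diversity of methods and environments considered in RL, much of the research has been conducted in distinct subfields, ranging from meta-learning to evolution. In this survey, we seek to unify the field of AutoRL: we provide a common taxonomy, discuss each area in detail, and pose open problems of interest to researchers.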
In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore its environment and learn skills that might be useful later in its life. We formulate curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. Our formulation scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and, critically, ignores the aspects of the environment that cannot affect the agent. The proposed approach is evaluated in two environments: VizDoom and Super Mario Bros. Three broad settings are investigated: 1) sparse extrinsic reward, where curiosity allows for far fewer interactions with the environment to reach the goal; 2) exploration with no extrinsic reward, where curiosity pushes the agent to explore more efficiently; and 3) generalization to unseen scenarios (e.g. new levels of the same game) where the knowledge gained from earlier experience helps the agent explore new places much faster than starting from scratch.
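The curiosity signal described above, the error in predicting the consequence of an action in a learned feature space, can be sketched as follows. This is a simplified illustration under stated assumptions: the feature vectors are supplied by hand rather than produced by a trained inverse dynamics model, and the function name `intrinsic_reward` and the `scale` factor are hypothetical choices, not values taken from the abstract.

```python
from typing import Sequence


def intrinsic_reward(predicted_next_features: Sequence[float],
                     actual_next_features: Sequence[float],
                     scale: float = 0.5) -> float:
    """Curiosity bonus: scaled squared error between the forward model's
    predicted next-state features and the features actually observed.

    `scale` is an assumed constant that sets the size of the bonus.
    """
    if len(predicted_next_features) != len(actual_next_features):
        raise ValueError("feature vectors must have the same length")
    squared_error = sum(
        (p - a) ** 2
        for p, a in zip(predicted_next_features, actual_next_features)
    )
    return scale * squared_error


# Hand-written stand-ins for learned features of the next state.
observed = [0.2, -0.1, 0.7]
accurate_prediction = [0.2, -0.1, 0.7]   # familiar transition
poor_prediction = [1.2, -0.1, 0.7]       # surprising transition

familiar_bonus = intrinsic_reward(accurate_prediction, observed)
surprise_bonus = intrinsic_reward(poor_prediction, observed)
```

A transition the forward model already predicts well earns no bonus, while a poorly predicted one earns a positive bonus, which is what pushes the agent toward unfamiliar parts of the environment. Because the error is computed on features rather than pixels, parts of the observation that the features ignore contribute nothing to the reward.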
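We present a new four-pronged approach to building firefighters' situational awareness, for the first time in the literature. We construct a series of deep learning frameworks, built on top of one another, to enhance the safety, efficiency, and successful completion of rescue missions conducted by firefighters in emergency first-response settings. First, we use a deep convolutional neural network (CNN) system to classify and identify objects of interest from thermal imagery in real time. Next, we extend this CNN framework to object detection, tracking, and segmentation with a Mask RCNN framework, and to scene description with a multimodal natural language processing (NLP) framework. Third, we build a deep Q-learning-based agent, immune to stress-induced disorientation and anxiety, capable of making clear navigation decisions based on facts observed and stored in live-fire environments. Finally, we use a low-computation unsupervised learning technique called tensor decomposition to perform meaningful feature extraction for anomaly detection in real time. With these ad hoc deep learning structures, we build the backbone of an artificial intelligence system for firefighters' situational awareness. To bring the designed system into use by firefighters, we design a physical structure in which the processed results serve as inputs for creating an augmented reality capable of advising firefighters of their location and of key features around them that are vital to the rescue operation at hand, together with a path-planning feature that acts as a virtual guide to help disoriented first responders get back to safety. Combined, these four approaches present a novel approach to information understanding, transfer, and synthesis that could dramatically improve firefighter response and efficacy and reduce loss of life.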
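Deep reinforcement learning has achieved great success in laser-based collision avoidance because a laser can sense accurate depth information without much redundant data, which helps the algorithm remain robust when it is migrated from a simulation environment to the real world. However, high-cost laser devices are not only difficult to deploy on a large number of robots but also show poor robustness to complex obstacles, including irregular obstacles such as tables, chairs, and shelves, as well as complex ground and special materials. In this paper, we propose a novel monocular-camera-based framework for complex obstacle avoidance. In particular, we transform the captured RGB images into pseudo-laser measurements for efficient deep reinforcement learning. Compared with a traditional laser measurement captured at a fixed height, which contains only one-dimensional distance information to nearby obstacles, our proposed pseudo-laser measurement fuses the depth and semantic information of the captured RGB image, which makes our method effective for complex obstacles. We also design a feature extraction guidance module to weight the input pseudo-laser measurement, so that the agent attends more reasonably to the current state, which helps improve the accuracy and efficiency of the obstacle avoidance policy.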
We present Habitat, a platform for research in embodied artificial intelligence (AI). Habitat enables training embodied agents (virtual robots) in highly efficient photorealistic 3D simulation. Specifically, Habitat consists of: (i) Habitat-Sim: a flexible, high-performance 3D simulator with configurable agents, sensors, and generic 3D dataset handling. Habitat-Sim is fast: when rendering a scene from Matterport3D, it achieves several thousand frames per second (fps) running single-threaded, and can reach over 10,000 fps multi-process on a single GPU. (ii) Habitat-API: a modular high-level library for end-to-end development of embodied AI algorithms: defining tasks (e.g. navigation, instruction following, question answering), configuring, training, and benchmarking embodied agents. These large-scale engineering contributions enable us to answer scientific questions requiring experiments that were till now impracticable or 'merely' impractical. Specifically, in the context of point-goal navigation: (1) we revisit the comparison between learning and SLAM approaches from two recent works [20,16] and find evidence for the opposite conclusion, namely that learning outperforms SLAM if scaled to an order of magnitude more experience than previous investigations, and (2) we conduct the first cross-dataset generalization experiments {train, test} × {Matterport3D, Gibson} for multiple sensors {blind, RGB, RGBD, D} and find that only agents with depth (D) sensors generalize across datasets. We hope that our open-source platform and these findings will advance research in embodied AI.
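The growing trend of reinforcement learning systems making their way into real-world applications has been accompanied by growing concerns about their safety and robustness. In recent years, a variety of approaches have been proposed to address the challenges of safety-aware reinforcement learning; however, these methods often require a handcrafted model of the environment to be provided beforehand, or assume that the environment is relatively simple and low-dimensional. We present a novel approach to safety-aware deep reinforcement learning in high-dimensional environments, called latent shielding. Latent shielding leverages the internal representations of the environment learned by model-based agents to "imagine" future trajectories and avoid those deemed unsafe. We demonstrate experimentally that this approach leads to improved adherence to formally defined safety specifications.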
Despite some successful applications of goal-driven navigation, existing deep reinforcement learning-based approaches notoriously suffer from poor data efficiency. One of the reasons is that the goal information is decoupled from the perception module and directly introduced as a condition of decision-making, resulting in the goal-irrelevant features of the scene representation playing an adversarial role during the learning process. In light of this, we present a novel Goal-guided Transformer-enabled reinforcement learning (GTRL) approach that takes the physical goal states as an input to the scene encoder, guiding the scene representation to couple with the goal information and realizing efficient autonomous navigation. More specifically, we propose a novel variant of the Vision Transformer as the backbone of the perception system, namely Goal-guided Transformer (GoT), and pre-train it with expert priors to boost the data efficiency. Subsequently, a reinforcement learning algorithm is instantiated for the decision-making system, taking the goal-oriented scene representation from the GoT as the input and generating decision commands. As a result, our approach motivates the scene representation to concentrate mainly on goal-relevant features, which substantially enhances the data efficiency of the DRL learning process, leading to superior navigation performance. Both simulation and real-world experimental results demonstrate the superiority of our approach in terms of data efficiency, performance, robustness, and sim-to-real generalization, compared with other state-of-the-art baselines. Demonstration videos are available at https://youtu.be/93LGlGvaN0c
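Capturing and simulating intelligent adaptive behaviours within spatially explicit individual-based models remains an ongoing challenge for researchers. While an ever-increasing amount of real-world behavioural data is being collected, few approaches exist that can quantify and formalise key individual behaviours and how they change over space and time. Consequently, commonly used agent decision-making frameworks, such as event-condition-action rules, are often forced to focus on only a narrow range of behaviours. We argue that these behavioural frameworks often do not reflect real-world scenarios and fail to capture how behaviours develop in response to stimuli. In recent years, there has been increased interest in machine learning methods and their potential to simulate intelligent adaptive behaviours. One method that is beginning to gain traction in this area is reinforcement learning (RL). This paper explores how RL can be applied to create emergent agent behaviours using a simple predator-prey agent-based model (ABM). Running a series of simulations, we demonstrate that agents trained using the novel Proximal Policy Optimisation (PPO) algorithm behave in ways that exhibit properties of real-world intelligent adaptive behaviours, such as hiding, evading, and foraging.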
Development of navigation algorithms is essential for the successful deployment of robots in rapidly changing hazardous environments for which prior knowledge of configuration is often limited or unavailable. Use of traditional path-planning algorithms, which are based on localization and require detailed obstacle maps with goal locations, is not possible. In this regard, vision-based algorithms hold great promise, as visual information can be readily acquired by a robot's onboard sensors and provides a much richer source of information from which deep neural networks can extract complex patterns. Deep reinforcement learning has been used to achieve vision-based robot navigation. However, the efficacy of these algorithms in environments with dynamic obstacles and high variation in the configuration space has not been thoroughly investigated. In this paper, we employ a deep Dyna-Q learning algorithm for room evacuation and obstacle avoidance in partially observable environments based on low-resolution raw image data from an onboard camera. We explore the performance of a robotic agent in environments containing no obstacles, convex obstacles, and concave obstacles, both static and dynamic. Obstacles and the exit are initialized in random positions at the start of each episode of reinforcement learning. Overall, we show that our algorithm and training approach can generalize learning for collision-free evacuation of environments with complex obstacle configurations. It is evident that the agent can navigate to a goal location while avoiding multiple static and dynamic obstacles, and can escape from a concave obstacle while searching for and navigating to the exit.