We introduce an imitation learning-based physical human-robot interaction algorithm capable of predicting appropriate robot responses in complex interactions involving a superposition of multiple interactions. Our proposed algorithm, Blending Bayesian Interaction Primitives (B-BIP), allows us to achieve responsive interactions in complex hugging scenarios, and is capable of reciprocating and adapting to a hug's motion and timing. We show that this algorithm is a generalization of prior work, for which the original formulation reduces to the particular case of a single interaction, and evaluate our method through both an extensive user study and empirical experiments. Our algorithm yields significantly lower quantitative prediction error and more favorable participant responses with respect to accuracy, responsiveness, and timing, when compared to existing state-of-the-art methods.
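Humans can leverage physical interaction to teach robot arms. This physical interaction depends on the task, the user, and what the robot has learned so far. State-of-the-art approaches focus on learning from a single modality, or combine multiple interaction types by assuming that the robot has prior information about the human's intended task. By contrast, in this paper we introduce an algorithmic formalism that learns from demonstrations, corrections, and preferences. Our approach makes no assumptions about the tasks the human wants to teach the robot; instead, we learn a reward model from scratch by comparing the human's inputs to nearby alternatives. We first derive a loss function that trains an ensemble of reward models to match the human's demonstrations, corrections, and preferences. The type and order of feedback is up to the human teacher: we enable the robot to collect this feedback passively or actively. We then apply constrained optimization to convert our learned reward into a desired robot trajectory. Through simulations and a user study we demonstrate that our proposed approach learns manipulation tasks from physical human interaction more accurately than existing baselines, particularly when the robot is faced with new or unexpected objectives. Videos of our user study are available at: https://youtu.be/fsujstyveku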
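In this paper, we discuss a framework for teaching bimanual manipulation tasks by imitation. To this end, we present a system and algorithms for learning compliant and contact-rich robot behavior from human demonstrations. The presented system combines insights from admittance control and machine learning to extract control policies that can (a) recover from and adapt to a variety of disturbances in time and space, while also (b) effectively leveraging physical contact with the environment. We demonstrate the effectiveness of our approach using a real-world insertion task involving multiple simultaneous contacts between a manipulated object and insertion pegs. We also investigate efficient means of collecting training data for such bimanual settings. To this end, we conduct a human-subject study and analyze the effort and mental demand reported by the users. Our experiments show that, while harder to provide, the additional force/torque information available in teleoperated demonstrations is crucial for phase estimation and task success. Ultimately, force/torque data substantially improves manipulation robustness, resulting in a 90% success rate in a multipoint insertion task. Code and videos can be found at https://bimanualmanipulation.com/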
Large-scale data is an essential component of machine learning as demonstrated in recent advances in natural language processing and computer vision research. However, collecting large-scale robotic data is much more expensive and slower as each operator can control only a single robot at a time. To make this costly data collection process efficient and scalable, we propose Policy Assisted TeleOperation (PATO), a system which automates part of the demonstration collection process using a learned assistive policy. PATO autonomously executes repetitive behaviors in data collection and asks for human input only when it is uncertain about which subtask or behavior to execute. We conduct teleoperation user studies both with a real robot and a simulated robot fleet and demonstrate that our assisted teleoperation system reduces human operators' mental load while improving data collection efficiency. Further, it enables a single operator to control multiple robots in parallel, which is a first step towards scalable robotic data collection. For code and video results, see https://clvrai.com/pato
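Active inference is a mathematical framework that originated in computational neuroscience as a theory of how the brain implements action, perception, and learning. Recently, it has been shown to be a promising approach to the problems of state estimation and control under uncertainty, as well as a foundation for the construction of goal-driven behaviors in robotics and artificial agents in general. Here, we review the state-of-the-art theory and implementations of active inference for state estimation, control, planning, and learning, describing current achievements with a particular focus on robotics. We showcase relevant experiments that illustrate its potential in terms of adaptation, generalization, and robustness. Furthermore, we connect this approach with other frameworks and discuss its expected benefits and challenges: a unified framework with functional biological plausibility using variational Bayesian inference.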
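Humans can leverage physical interaction to teach robot arms. As the human kinesthetically guides the robot through demonstrations, the robot learns the desired task. While prior work focuses on how the robot learns, it is equally important for the human teacher to understand what their robot is learning. Visual displays can communicate this information; however, we hypothesize that visual feedback alone misses out on the physical connection between the human and the robot. In this paper we introduce a novel class of soft haptic displays that wrap around the robot arm, adding signals without affecting interaction. We first design a pneumatic actuation array that remains flexible in mounting. We then develop single- and multi-dimensional versions of this wrapped haptic display, and explore human perception of the rendered signals during psychophysical tests and robot learning. We ultimately find that people accurately distinguish single-dimensional feedback with a Weber fraction of 11.4%, and identify multi-dimensional feedback with 94.5% accuracy. When physically teaching robot arms, humans leverage the single-dimensional feedback to provide better demonstrations than with visual feedback: our wrapped haptic display decreases teaching time while increasing demonstration quality. This improvement depends on the location and distribution of the wrapped haptic display. You can see videos of our device and experiments here: https://youtu.be/ypcmgeqsjdm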
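We study real-time collaborative robot (cobot) handling, where the cobot maneuvers a workpiece under human commands. This is useful when it is risky for humans to handle the workpiece directly. However, it is hard to make the cobot both easy to command and flexible in its possible operations. In this work, we propose a Real-Time Collaborative Robot Handling (RTCoHand) framework that allows the cobot to be controlled via user-customized dynamic gestures. This is hard due to variations among users, human motion uncertainties, and noisy human input. We model the task as a probabilistic generative process, referred to as the Conditional Collaborative Handling Process (CCHP), and learn from human-human collaboration. We thoroughly evaluate the adaptability and robustness of CCHP and apply our approach to a real-time cobot handling task with a Kinova Gen3 robot arm. We achieve seamless human-robot collaboration with both experienced and new users. Compared to classical controllers, RTCoHand allows more complex maneuvers and a lower user cognitive burden. It also eliminates the need for trial and error, which is valuable in safety-critical tasks.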
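A challenge in using robots in human-inhabited environments is to design behavior that is engaging yet robust to the perturbations induced by human interaction. Our idea is to imbue the robot with intrinsic motivation (IM) so that it can handle new situations and appear as a genuine social other to humans, and thus be of more interest to a human interaction partner. Human-robot interaction (HRI) experiments mainly focus on scripted or teleoperated robots that mimic characteristics such as IM in order to control isolated behavior factors. This article presents a "robotologist" study design that allows autonomously generated behaviors to be compared with each other and, for the first time, evaluates the human perception of IM-based generated behavior in robots. We conducted a within-subjects user study (N = 24) in which participants interacted with a fully autonomous Sphero BB8 robot with different behavioral regimes: one realizing an adaptive, intrinsically motivated behavior, and the other being reactive but not adaptive. The robot and its behaviors are intentionally kept minimal to concentrate on the effect induced by IM. A quantitative analysis of post-interaction questionnaires showed a significantly higher perception of the dimension "Warmth" compared to the reactive baseline behavior. Warmth is considered a primary dimension of social attitude formation in human social cognition. A human perceived as warm (friendly, trustworthy) experiences more positive social interactions.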
Humans have internal models of robots (like their physical capabilities), the world (like what will happen next), and their tasks (like a preferred goal). However, human internal models are not always perfect: for example, it is easy to underestimate a robot's inertia. Nevertheless, these models change and improve over time as humans gather more experience. Interestingly, robot actions influence what this experience is, and therefore influence how people's internal models change. In this work we take a step towards enabling robots to understand the influence they have, leverage it to better assist people, and help human models more quickly align with reality. Our key idea is to model the human's learning as a nonlinear dynamical system which evolves the human's internal model given new observations. We formulate a novel optimization problem to infer the human's learning dynamics from demonstrations that naturally exhibit human learning. We then formalize how robots can influence human learning by embedding the human's learning dynamics model into the robot planning problem. Although our formulations provide concrete problem statements, they are intractable to solve in full generality. We contribute an approximation that sacrifices the complexity of the human internal models we can represent, but enables robots to learn the nonlinear dynamics of these internal models. We evaluate our inference and planning methods in a suite of simulated environments and an in-person user study, where a 7DOF robotic arm teaches participants to be better teleoperators. While influencing human learning remains an open problem, our results demonstrate that this influence is possible and can be helpful in real human-robot interaction.
Imitation learning techniques aim to mimic human behavior in a given task. An agent (a learning machine) is trained to perform a task from demonstrations by learning a mapping between observations and actions. The idea of teaching by imitation has been around for many years; however, the field has recently gained attention due to advances in computing and sensing as well as rising demand for intelligent applications. The paradigm of learning by imitation is gaining popularity because it facilitates teaching complex tasks with minimal expert knowledge of the tasks. Generic imitation learning methods could potentially reduce the problem of teaching a task to that of providing demonstrations, without the need for explicit programming or designing reward functions specific to the task. Modern sensors are able to collect and transmit high volumes of data rapidly, and processors with high computational power allow fast processing that maps the sensory data to actions in a timely manner. This opens the door for many potential AI applications that require real-time perception and reaction, such as humanoid robots, self-driving vehicles, human-computer interaction, and computer games, to name a few. However, specialized algorithms are needed to learn models effectively and robustly, as learning by imitation poses its own set of challenges. In this paper, we survey imitation learning methods and present design options in different steps of the learning process. We introduce a background and motivation for the field and highlight challenges specific to the imitation problem. Methods for designing and evaluating imitation learning tasks are categorized and reviewed. Special attention is given to learning methods in robotics and games, as these domains are the most popular in the literature and provide a wide array of problems and methodologies. We extensively discuss combining imitation learning approaches using different sources and methods, as well as incorporating other motion learning methods to enhance imitation. We also discuss the potential impact on industry, present major applications, and highlight current and future research directions.
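We describe a framework for changing-contact robot manipulation tasks that require the robot to make and break contacts with objects and surfaces. The discontinuous interaction dynamics of such tasks make it difficult to construct and use a single dynamics model or control strategy, and the highly nonlinear nature of the dynamics during contact changes can be damaging to the robot and the objects. We present an adaptive control framework that enables the robot to incrementally learn to predict contact changes in a changing-contact task, learn the interaction dynamics of the piecewise-continuous system, and provide smooth and accurate trajectory tracking using a task-space variable impedance controller. We experimentally compare the performance of our framework against that of representative control methods to establish that the adaptive control and incremental learning components of our framework are needed to achieve smooth control in the presence of discontinuous dynamics in changing-contact robot manipulation tasks.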
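When inferring reward functions from human behavior (be it demonstrations, comparisons, physical corrections, or e-stops), it has proven useful to model the human as making noisy-rational choices, with a "rationality coefficient" capturing how much noise or entropy we expect to see in the human behavior. Many existing works choose to fix this coefficient regardless of the type or quality of human feedback. However, in some settings, giving a demonstration may be much more difficult than answering a comparison query. In such cases, we should expect to see more noise or suboptimality in demonstrations than in comparisons, and should interpret the feedback accordingly. In this work, we advocate that grounding the rationality coefficient in real data for each feedback type, rather than assuming a default value, has a significant positive effect on reward learning. We test this in experiments with both simulated feedback and a user study. We find that, when learning from a single feedback type, overestimating human rationality can have dire effects on reward accuracy and regret. Further, we find that the rationality level affects the informativeness of each feedback type: surprisingly, demonstrations are not always the most informative. When the human acts very suboptimally, comparisons actually become more informative, even when the rationality level is the same for both. Moreover, when the robot gets to decide which feedback type to ask for, it gains a large advantage from accurately modeling the rationality level of each type. Ultimately, our results emphasize the importance of paying attention to the assumed rationality level, not only when learning from a single feedback type, but especially when agents learn from multiple feedback types.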
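As a rough illustration of the rationality coefficient mentioned in the abstract above, the sketch below uses the Boltzmann (softmax) choice model, which is one common way to write a noisy-rational human. The abstract itself gives no formula, so the model form, the function name `boltzmann_choice_probs`, and the reward values are all assumptions made for illustration.

```python
import math

def boltzmann_choice_probs(rewards, beta):
    """Probability of each option under an assumed Boltzmann-rational human.

    rewards: list of reward values, one per option (hypothetical numbers).
    beta:    rationality coefficient; 0 gives uniformly random choices,
             large values approach a perfectly rational (argmax) choice.
    """
    scaled = [beta * r for r in rewards]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Two options; the first has the higher (hypothetical) reward.
rewards = [1.0, 0.0]
noisy_human = boltzmann_choice_probs(rewards, beta=0.1)
careful_human = boltzmann_choice_probs(rewards, beta=10.0)
print(noisy_human, careful_human)
```

With a small coefficient the two options are chosen almost equally often, while a large coefficient concentrates probability on the higher-reward option. This is why assuming too large a coefficient can make noisy feedback look more informative than it really is.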
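One of the most important challenges in robotics is producing accurate trajectories and controlling their dynamic parameters so that robots can perform different tasks. The ability to provide such motion control is closely related to how such movements are encoded. Advances in deep learning have had a strong impact on the development of novel approaches to Dynamic Movement Primitives. In this work, we survey the scientific literature related to Neural Dynamic Movement Primitives, to complement existing surveys on Dynamic Movement Primitives.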
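Placing robots outside controlled conditions requires versatile movement representations that allow robots to learn new tasks and adapt them to environmental changes. The introduction of obstacles or the placement of additional robots in the workspace, and the modification of the joint range due to faults or range-of-motion constraints, are typical cases where adaptation capabilities play a key role in safely performing the robot's task. Probabilistic movement primitives (ProMPs) have been proposed for representing adaptable movement skills, which are modeled as Gaussian distributions over trajectories. These are analytically tractable and can be learned from a small number of demonstrations. However, both the original ProMP formulation and the subsequent approaches only provide solutions to specific movement adaptation problems, e.g., obstacle avoidance, and a generic, unifying, probabilistic approach to adaptation is missing. In this paper we develop a generic probabilistic framework for adapting ProMPs. We unify previous adaptation techniques, for example, various types of obstacle avoidance, via-points, and mutual avoidance, in one single framework and combine them to solve complex robotic problems. Additionally, we derive novel adaptation techniques such as temporally unbound via-points and mutual avoidance. We formulate adaptation as a constrained optimization problem in which we minimize the Kullback-Leibler divergence between the adapted distribution and the distribution of the original primitive, while constraining the probability mass associated with undesired trajectories to be low. We demonstrate our approach on several adaptation problems on simulated planar robot arms and 7-DOF Franka-Emika robots in a dual robot arm setting.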
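The constrained formulation described in the abstract above can be illustrated with a one-dimensional toy problem. The sketch below is not the paper's algorithm: it assumes a single scalar Gaussian instead of a distribution over trajectories, keeps the variance fixed, and replaces the optimizer with a simple step search. All names (`kl_gaussian`, `mass_above`, `adapt_mean`) and numbers are hypothetical.

```python
import math

def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for two univariate Gaussians p and q."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

def mass_above(mu, sigma, threshold):
    """Probability mass of N(mu, sigma^2) lying above the threshold."""
    return 0.5 * math.erfc((threshold - mu) / (sigma * math.sqrt(2.0)))

def adapt_mean(mu0, sigma, threshold, max_mass, step=0.01, max_steps=10000):
    """Shift the mean away from an 'undesired' region (values above threshold).

    With sigma fixed, the KL divergence grows with the size of the shift, so
    the first feasible mean found while stepping downward is the
    minimum-KL solution on this grid.
    """
    mu = mu0
    for _ in range(max_steps):
        if mass_above(mu, sigma, threshold) <= max_mass:
            return mu
        mu -= step
    raise ValueError("no feasible mean found within max_steps")

# Original primitive N(0, 1); at most 5% of the mass may lie above 1.0.
adapted_mu = adapt_mean(mu0=0.0, sigma=1.0, threshold=1.0, max_mass=0.05)
divergence = kl_gaussian(adapted_mu, 1.0, 0.0, 1.0)
print(adapted_mu, divergence)
```

The adapted distribution stays as close as possible to the original one, in the KL sense, while keeping the mass in the undesired region below the chosen bound.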
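As robots increasingly enter human-centered environments, they must not only be able to navigate safely around humans, but also adhere to complex social norms. Humans often rely on non-verbal communication through gestures and facial expressions when navigating around other people, especially in densely occupied spaces. Consequently, robots also need to be able to interpret gestures as part of solving social navigation tasks. To this end, we present a novel social navigation approach that combines image-based imitation learning with model-predictive control. Gestures are interpreted by a neural network that operates on streams of images, while we use a state-of-the-art model-predictive control algorithm to solve point-to-point navigation tasks. We deploy our method on real robots and showcase its effectiveness on four gesture-navigation scenarios: left/right, follow me, and make a circle. Our experiments indicate that our method is able to successfully interpret complex human gestures and to use them as a signal to generate socially compliant trajectories for navigation tasks. We validated our method based on in-situ ratings from participants interacting with the robots.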
Human and robot partners increasingly need to work together to perform tasks as a team. Robots designed for such collaboration must reason about how their task-completion strategies interplay with the behavior and skills of their human team members as they coordinate on achieving joint goals. Our goal in this work is to develop a computational framework for robot adaptation to human partners in human-robot team collaborations. We first present an algorithm for autonomously recognizing available task-completion strategies by observing human-human teams performing a collaborative task. By transforming team actions into low-dimensional representations using hidden Markov models, we can identify strategies without prior knowledge. Robot policies are learned on each of the identified strategies to construct a Mixture-of-Experts model that adapts to the task strategies of unseen human partners. We evaluate our model on a collaborative cooking task using an Overcooked simulator. Results of an online user study with 125 participants demonstrate that our framework improves the task performance and collaborative fluency of human-agent teams, as compared to state-of-the-art reinforcement learning methods.
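In this survey, we present the current status of robots performing manipulation tasks that require varying contact with the environment, such that the robot must either implicitly or explicitly control the contact force with the environment to complete the task. Robots can perform more and more manipulation tasks that are still done by humans, and there is a growing number of publications on the topics of 1) performing tasks that always require contact and 2) mitigating uncertainty by leveraging the environment in tasks that, under perfect information, could be performed without contact. Recent trends have seen robots perform tasks previously left to humans, such as massage, while in classical tasks, such as peg-in-hole, there is more efficient generalization to other similar tasks, better error tolerance, and faster planning or learning of the tasks. Thus, in this survey we cover the current stage of robots performing such tasks, starting by surveying all the different in-contact tasks robots can perform, then observing how these tasks are controlled and represented, and finally presenting the learning and planning of the skills required to complete these tasks.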
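Noisy sensing, imperfect control, and environment changes are defining characteristics of many real-world robot tasks. The partially observable Markov decision process (POMDP) provides a principled mathematical framework for modeling and solving robot decision and control tasks under uncertainty. Over the last decade, it has seen many successful applications, spanning localization and navigation, search and tracking, autonomous driving, multi-robot systems, manipulation, and human-robot interaction. This survey aims to bridge the gap between the development of POMDP models and algorithms at one end and applications to diverse robot decision tasks at the other. It analyzes the characteristics of these tasks and connects them with the mathematical and algorithmic properties of the POMDP framework for effective modeling and solution. For practitioners, the survey provides some of the key task characteristics for deciding when and how to apply POMDPs to robot tasks successfully. For POMDP algorithm designers, the survey provides new insights into the unique challenges of applying POMDPs to robot systems and points to promising new directions for further research.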
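To make the POMDP framework named in the abstract above more concrete, the sketch below shows the standard discrete belief update, in which the robot predicts the next state distribution from its transition model and then reweights it by the likelihood of the received observation. It is a minimal illustration and not taken from the survey: the function name `belief_update`, the two-state model, and the 0.85 sensor accuracy are all made-up assumptions.

```python
def belief_update(belief, transition, observation_likelihood):
    """One Bayes-filter step for a discrete POMDP, for a fixed action.

    belief[s]:                  current probability of state s
    transition[s][s2]:          P(s2 | s, action)
    observation_likelihood[s2]: P(observation | s2, action)
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correction step: weight each state by how well it explains the observation.
    unnormalized = [observation_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]

# Hypothetical two-state example: the state does not change, and the sensor
# reports the true state with probability 0.85.
prior = [0.5, 0.5]
static_transition = [[1.0, 0.0], [0.0, 1.0]]
likelihood_of_reading = [0.85, 0.15]  # sensor reading points at state 0
posterior = belief_update(prior, static_transition, likelihood_of_reading)
print(posterior)
```

Repeating the update with further consistent readings would keep sharpening the belief, which is how a POMDP agent accumulates information from noisy sensing.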
When robots interact with humans in homes, roads, or factories the human's behavior often changes in response to the robot. Non-stationary humans are challenging for robot learners: actions the robot has learned to coordinate with the original human may fail after the human adapts to the robot. In this paper we introduce an algorithmic formalism that enables robots (i.e., ego agents) to co-adapt alongside dynamic humans (i.e., other agents) using only the robot's low-level states, actions, and rewards. A core challenge is that humans not only react to the robot's behavior, but the way in which humans react inevitably changes both over time and between users. To deal with this challenge, our insight is that -- instead of building an exact model of the human -- robots can learn and reason over high-level representations of the human's policy and policy dynamics. Applying this insight we develop RILI: Robustly Influencing Latent Intent. RILI first embeds low-level robot observations into predictions of the human's latent strategy and strategy dynamics. Next, RILI harnesses these predictions to select actions that influence the adaptive human towards advantageous, high reward behaviors over repeated interactions. We demonstrate that -- given RILI's measured performance with users sampled from an underlying distribution -- we can probabilistically bound RILI's expected performance across new humans sampled from the same distribution. Our simulated experiments compare RILI to state-of-the-art representation and reinforcement learning baselines, and show that RILI better learns to coordinate with imperfect, noisy, and time-varying agents. Finally, we conduct two user studies where RILI co-adapts alongside actual humans in a game of tag and a tower-building task. See videos of our user studies here: https://youtu.be/WYGO5amDXbQ
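We introduce Language-Informed Latent Actions (LILA), a framework for learning natural language interfaces in the context of human-robot collaboration. LILA falls under the shared autonomy paradigm: in addition to providing discrete language inputs, humans are given a low-dimensional controller (e.g., a 2 degree-of-freedom (DoF) joystick that can move left/right and up/down) for operating the robot. LILA learns to use language to modulate this controller, providing users with a language-informed control space: given an instruction like "place the cereal bowl on the tray," LILA may learn a 2-DoF space where one dimension controls the distance from the robot's end-effector to the bowl, and the other dimension controls the robot's end-effector pose relative to the grasp point on the bowl. We evaluate LILA with real-world user studies, where users can provide a language instruction while operating a 7-DoF Franka Emika Panda arm to complete a series of complex manipulation tasks. We show that LILA models are not only more sample-efficient and performant than imitation learning and end-effector control baselines, but that they are also qualitatively preferred by users.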