Haptic feedback can improve the safety of teleoperated robots when situational awareness is limited or operators are inattentive. Standard potential field approaches increase haptic resistance as an obstacle is approached, which is desirable when the operator is unaware of the obstacle but undesirable when the movement is intentional, such as when the operator wishes to inspect or manipulate an object. This paper presents a novel haptic teleoperation framework that estimates the operator's attentiveness in order to dampen haptic feedback for intentional movement. A biologically inspired attention model, based on computational working memory theories, integrates visual saliency estimation with spatial mapping. This model generates an attentiveness map in real time, and the haptic rendering system generates lower haptic forces for obstacles that the operator is estimated to be aware of. Experimental results in simulation show that the proposed framework outperforms haptic teleoperation without attentiveness estimation in terms of task performance, robot safety, and user experience.
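The idea of lowering haptic forces for obstacles the operator is aware of can be sketched as follows. This is a minimal illustration, not the paper's implementation: the inverse-distance force law, the linear `(1 - attentiveness)` scaling, and the names `repulsive_force`, `attention_damped_force`, `influence_radius`, and `gain` are all assumptions made for the example.

```python
def repulsive_force(distance, influence_radius=1.0, gain=1.0):
    """Magnitude of a textbook repulsive potential-field force.

    Assumed form (not taken from the paper): the force grows as the
    obstacle gets closer and vanishes outside the influence radius.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    if distance >= influence_radius:
        return 0.0
    return gain * (1.0 / distance - 1.0 / influence_radius)


def attention_damped_force(distance, attentiveness,
                           influence_radius=1.0, gain=1.0):
    """Scale the repulsive force down as estimated attentiveness rises.

    `attentiveness` is a hypothetical value in [0, 1] read from an
    attentiveness map: 0 means the operator is unaware of the obstacle,
    1 means fully aware. The linear scaling is an assumption.
    """
    a = min(max(attentiveness, 0.0), 1.0)  # clamp to [0, 1]
    return (1.0 - a) * repulsive_force(distance, influence_radius, gain)
```

With these assumptions, an obstacle the operator has not noticed produces the full potential-field resistance, while the same obstacle at the same distance produces a weaker force once the operator is estimated to be attending to it.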
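Sensory-reactive systems, such as robotics and AR/VR, must take highly responsive real-time actions driven by complex decisions involving sensing, perception, planning, and reaction tasks. These tasks must be scheduled on resource-constrained devices so that the application's performance goals and requirements are met. This is a difficult scheduling problem that requires handling multiple scheduling dimensions as well as variations in resource usage and availability. In practice, system designers manually tune parameters for their specific hardware and application, which leads to poor generalization and increases the development burden. In this work, we highlight the emerging need for scheduling CPU resources at runtime in sensory-reactive systems. We study three canonical applications (face tracking, robot navigation, and VR) to first understand the key scheduling requirements of such systems. Armed with this understanding, we develop a scheduling framework, Catan, that dynamically schedules compute resources across the different components of an application so as to meet the specified application requirements. Through experiments with a prototype implemented on a widely used robotics framework (ROS) and an open-source AR/VR platform, we show the impact of system scheduling on meeting the performance goals of the three applications, how Catan achieves better application performance than hand-tuned configurations, and how it dynamically adapts to runtime variations.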
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of these approaches has led to methods for solving high-dimensional PDEs, opening the door to a variety of real-world problems ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. To tackle these shortcomings, we demonstrate that Tensor Neural Networks (TNN) can provide significant parameter savings while attaining the same accuracy as the classical Dense Neural Network (DNN). In addition, we show that TNN can be trained faster than DNN for the same accuracy. Besides TNN, we also introduce Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance for an equivalent parameter count as compared to a DNN. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
Learning from human demonstration enables robots to automate a wide variety of tasks. However, learning directly from human demonstration is challenging because the structure of the human hand can be very different from that of the desired robot gripper. In this work, we show that manipulation skills can be transferred from a human to a robot through micro-evolutionary reinforcement learning, in which a five-finger dexterous robot hand modeled on the human hand gradually evolves into a commercial robot while repeatedly interacting in a physics simulator to continuously update a policy that is first learned from human demonstration. To deal with the high dimensionality of the robot parameters, we propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy. Through experiments on human object manipulation datasets, we show that our framework can efficiently transfer the expert human agent policy trained from human demonstrations in diverse modalities to target commercial robots.
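Transformers have become a predominant machine learning workload: they are not only the de facto standard for natural language processing tasks, but are also being deployed in other domains such as vision and speech recognition. Many transformer-based applications are real-time systems, such as machine translation and web search. These real-time systems often come with strict end-to-end inference latency requirements. Unfortunately, while the majority of transformer computation comes from matrix multiplications, transformers also include several non-linear components that tend to become the bottleneck during inference. In this work, we accelerate the inference of BERT models on the tensor streaming processor. By carefully fusing all the non-linear components with the matrix multiplication components, we are able to efficiently utilize the on-chip matrix multiplication units, resulting in a deterministic tail latency of 130 $\mu$s for a batch-1 inference through BERT-base, which is 6 times faster than the current state of the art.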
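Automatic speech recognition (ASR) systems are known to have difficulty transcribing children's speech. This can mainly be attributed to the absence of large children's speech corpora for training robust ASR models, and to the resulting domain mismatch when children's speech is decoded by systems trained on adult data. In this paper, we propose multiple enhancements to alleviate these issues. First, we propose a data augmentation technique based on the source-filter model of speech to close the domain gap between adult and children's speech. This enables us to leverage the data availability of adult speech corpora by making these samples perceptually similar to children's speech. Second, using this augmentation strategy, we apply transfer learning to a Transformer model pre-trained on adult data. This model follows the recently introduced XLS-R architecture, a wav2vec 2.0 model pre-trained on several cross-lingual adult speech corpora to learn general and robust acoustic frame-level representations. Adapting this model to the ASR task using adult data augmented with the proposed source-filter warping strategy significantly outperforms the previous state-of-the-art results on the PF-STAR British English children's speech corpus, reaching 4.86% on the official test set.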
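Automatically designing virtual humans and humanoids holds great potential for aiding the character creation process in games, movies, and robotics. In some cases, a character creator may wish to design a humanoid body customized for certain motions, such as karate kicks and parkour jumps. In this work, we propose a humanoid design framework that automatically generates physically valid humanoid bodies conditioned on pre-specified human motions. First, we learn a generalized humanoid controller trained on a large-scale human motion dataset that features diverse human motions and body shapes. Second, we use a design-and-control framework to optimize a humanoid's physical attributes in order to find body designs that can better imitate the pre-specified human motion sequences. Leveraging the pre-trained humanoid controller and physics simulation as guidance, our method is able to discover new humanoid designs that are customized to perform pre-specified human motions.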
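We propose embodied scene-aware human pose estimation, in which we estimate 3D poses based on a simulated agent's proprioception and scene awareness, along with external third-person observations. Unlike prior methods, which often resort to multi-stage optimization, non-causal inference, and complex contact modeling to estimate human pose and human-scene interactions, our method is single-stage and causal, and recovers global 3D human poses in a simulated environment. Since 2D third-person observations are coupled with the camera pose, we propose to disentangle the camera pose and use a multi-step projection gradient defined in the global coordinate frame as the movement cue for our embodied agent. Leveraging physics simulation and pre-scanned scenes (e.g., 3D meshes), we simulate our agent in everyday environments (libraries, offices, bedrooms, etc.) and equip it with environmental sensors so that it can intelligently navigate and interact with the geometry of the scene. Our method also relies only on 2D keypoints and can be trained on synthetic datasets derived from popular human motion databases. For evaluation, we use the popular H36M and PROX datasets and, for the first time, achieve a success rate of 96.7% on the challenging PROX dataset without using PROX motion sequences for training.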
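The paper presents an image denoising scheme that combines a method based on directional quasi-analytic wavelet packets (QWP) with the state-of-the-art Weighted Nuclear Norm Minimization (WNNM) denoising algorithm. The QWP-based denoising method (QWPDN) consists of a multiscale QWP transform of the degraded image, adaptive localized soft thresholding of the transform coefficients using the bivariate shrinkage methodology, and restoration of the image from the thresholded coefficients of several decomposition levels. The combined method consists of several iterations of the QWPDN and WNNM algorithms, arranged so that at each iteration the output of one algorithm boosts the input to the other. The proposed methodology couples the ability of QWPDN to capture edges and fine texture patterns, even in severely corrupted images, with the exploitation of non-local self-similarity in real images that is inherent in the WNNM algorithm. Multiple experiments comparing the proposed methodology with six advanced denoising algorithms, including WNNM, confirm that the combined cross-boosting algorithm outperforms most of them in terms of both quantitative measures and visual perceptual quality.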
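As a rough illustration of the thresholding step, the sketch below applies one commonly cited form of the bivariate shrinkage rule to a single coefficient and its parent at the next coarser scale. The abstract does not specify the exact variant used in QWPDN, so the formula, the function name `bivariate_shrink`, and the parameters `sigma_noise` and `sigma_signal` are assumptions for the example.

```python
import math


def bivariate_shrink(child, parent, sigma_noise, sigma_signal):
    """Shrink one wavelet coefficient using its parent coefficient.

    Assumed rule (a widely used form of bivariate shrinkage):
        r      = sqrt(child^2 + parent^2)
        thresh = sqrt(3) * sigma_noise^2 / sigma_signal
        result = child * max(r - thresh, 0) / r
    `sigma_noise` is the noise standard deviation and `sigma_signal`
    is a local estimate of the signal standard deviation; both are
    hypothetical inputs here.
    """
    if sigma_signal <= 0:
        return 0.0  # no signal energy estimated: suppress the coefficient
    r = math.hypot(child, parent)
    if r == 0.0:
        return 0.0
    thresh = math.sqrt(3.0) * sigma_noise ** 2 / sigma_signal
    return child * max(r - thresh, 0.0) / r
```

Because the threshold depends on a local signal estimate, coefficients in textured or edge regions (large `sigma_signal`) are shrunk less than coefficients in flat regions, which is what makes the thresholding adaptive and localized.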
Semantic code search is the task of retrieving a code snippet given a textual description of its functionality. Recent work has been focused on using similarity metrics between neural embeddings of text and code. However, current language models are known to struggle with longer, compositional text, and multi-step reasoning. To overcome this limitation, we propose supplementing the query sentence with a layout of its semantic structure. The semantic layout is used to break down the final reasoning decision into a series of lower-level decisions. We use a Neural Module Network architecture to implement this idea. We compare our model - NS3 (Neuro-Symbolic Semantic Search) - to a number of baselines, including state-of-the-art semantic code retrieval methods, and evaluate on two datasets - CodeSearchNet and Code Search and Question Answering. We demonstrate that our approach results in more precise code retrieval, and we study the effectiveness of our modular design when handling compositional queries.