The ability to effectively reuse prior knowledge is a key requirement when building general and flexible Reinforcement Learning (RL) agents. Skill reuse is one of the most common approaches, but current methods have considerable limitations.For example, fine-tuning an existing policy frequently fails, as the policy can degrade rapidly early in training. In a similar vein, distillation of expert behavior can lead to poor results when given sub-optimal experts. We compare several common approaches for skill transfer on multiple domains including changes in task and system dynamics. We identify how existing methods can fail and introduce an alternative approach to mitigate these problems. Our approach learns to sequence existing temporally-extended skills for exploration but learns the final policy directly from the raw experience. This conceptual split enables rapid adaptation and thus efficient data collection but without constraining the final solution.It significantly outperforms many classical methods across a suite of evaluation tasks and we use a broad set of ablations to highlight the importance of differentc omponents of our method.
translated by 谷歌翻译
对于在现实世界中运营的机器人来说,期望学习可以有效地转移和适应许多任务和场景的可重复使用的行为。我们提出了一种使用分层混合潜变量模型来从数据中学习抽象运动技能的方法。与现有工作相比,我们的方法利用了离散和连续潜在变量的三级层次结构,以捕获一组高级行为,同时允许如何执行它们的差异。我们在操纵域中展示该方法可以有效地将离线数据脱落到不同的可执行行为,同时保留连续潜变量模型的灵活性。由此产生的技能可以在新的任务,看不见的对象和州内转移和微调到基于视觉的策略,与现有的技能和仿制的方法相比,产生更好的样本效率和渐近性能。我们进一步分析了技能最有益的方式以及何时:他们鼓励定向探索来涵盖与任务相关的国家空间的大区域,使其在挑战稀疏奖励环境中最有效。
translated by 谷歌翻译
Computational units in artificial neural networks follow a simplified model of biological neurons. In the biological model, the output signal of a neuron runs down the axon, splits following the many branches at its end, and passes identically to all the downward neurons of the network. Each of the downward neurons will use their copy of this signal as one of many inputs dendrites, integrate them all and fire an output, if above some threshold. In the artificial neural network, this translates to the fact that the nonlinear filtering of the signal is performed in the upward neuron, meaning that in practice the same activation is shared between all the downward neurons that use that signal as their input. Dendrites thus play a passive role. We propose a slightly more complex model for the biological neuron, where dendrites play an active role: the activation in the output of the upward neuron becomes optional, and instead the signals going through each dendrite undergo independent nonlinear filterings, before the linear combination. We implement this new model into a ReLU computational unit and discuss its biological plausibility. We compare this new computational unit with the standard one and describe it from a geometrical point of view. We provide a Keras implementation of this unit into fully connected and convolutional layers and estimate their FLOPs and weights change. We then use these layers in ResNet architectures on CIFAR-10, CIFAR-100, Imagenette, and Imagewoof, obtaining performance improvements over standard ResNets up to 1.73%. Finally, we prove a universal representation theorem for continuous functions on compact sets and show that this new unit has more representational power than its standard counterpart.
translated by 谷歌翻译
Estimating the 6D pose of objects is one of the major fields in 3D computer vision. Since the promising outcomes from instance-level pose estimation, the research trends are heading towards category-level pose estimation for more practical application scenarios. However, unlike well-established instance-level pose datasets, available category-level datasets lack annotation quality and provided pose quantity. We propose the new category level 6D pose dataset HouseCat6D featuring 1) Multi-modality of Polarimetric RGB+P and Depth, 2) Highly diverse 194 objects of 10 household object categories including 2 photometrically challenging categories, 3) High-quality pose annotation with an error range of only 1.35 mm to 1.74 mm, 4) 41 large scale scenes with extensive viewpoint coverage, 5) Checkerboard-free environment throughout the entire scene. We also provide benchmark results of state-of-the-art category-level pose estimation networks.
translated by 谷歌翻译
This volume contains revised versions of the papers selected for the third volume of the Online Handbook of Argumentation for AI (OHAAI). Previously, formal theories of argument and argument interaction have been proposed and studied, and this has led to the more recent study of computational models of argument. Argumentation, as a field within artificial intelligence (AI), is highly relevant for researchers interested in symbolic representations of knowledge and defeasible reasoning. The purpose of this handbook is to provide an open access and curated anthology for the argumentation research community. OHAAI is designed to serve as a research hub to keep track of the latest and upcoming PhD-driven research on the theory and application of argumentation in all areas related to AI.
translated by 谷歌翻译
We analyze the problem of detecting tree rings in microscopy images of shrub cross sections. This can be regarded as a special case of the instance segmentation task with several particularities such as the concentric circular ring shape of the objects and high precision requirements due to which existing methods don't perform sufficiently well. We propose a new iterative method which we term Iterative Next Boundary Detection (INBD). It intuitively models the natural growth direction, starting from the center of the shrub cross section and detecting the next ring boundary in each iteration step. In our experiments, INBD shows superior performance to generic instance segmentation methods and is the only one with a built-in notion of chronological order. Our dataset and source code are available at http://github.com/alexander-g/INBD.
translated by 谷歌翻译
People constantly use language to learn about the world. Computational linguists have capitalized on this fact to build large language models (LLMs) that acquire co-occurrence-based knowledge from language corpora. LLMs achieve impressive performance on many tasks, but the robustness of their world knowledge has been questioned. Here, we ask: do LLMs acquire generalized knowledge about real-world events? Using curated sets of minimal sentence pairs (n=1215), we tested whether LLMs are more likely to generate plausible event descriptions compared to their implausible counterparts. We found that LLMs systematically distinguish possible and impossible events (The teacher bought the laptop vs. The laptop bought the teacher) but fall short of human performance when distinguishing likely and unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM scores are driven by both plausibility and surface-level sentence features, (ii) LLMs generalize well across syntactic sentence variants (active vs passive) but less well across semantic sentence variants (synonymous sentences), (iii) some, but not all LLM deviations from ground-truth labels align with crowdsourced human judgments, and (iv) explicit event plausibility information emerges in middle LLM layers and remains high thereafter. Overall, our analyses reveal a gap in LLMs' event knowledge, highlighting their limitations as generalized knowledge bases. We conclude by speculating that the differential performance on impossible vs. unlikely events is not a temporary setback but an inherent property of LLMs, reflecting a fundamental difference between linguistic knowledge and world knowledge in intelligent systems.
translated by 谷歌翻译
Most approaches for semantic segmentation use only information from color cameras to parse the scenes, yet recent advancements show that using depth data allows to further improve performances. In this work, we focus on transformer-based deep learning architectures, that have achieved state-of-the-art performances on the segmentation task, and we propose to employ depth information by embedding it in the positional encoding. Effectively, we extend the network to multimodal data without adding any parameters and in a natural way that makes use of the strength of transformers' self-attention modules. We also investigate the idea of performing cross-modality operations inside the attention module, swapping the key inputs between the depth and color branches. Our approach consistently improves performances on the Cityscapes benchmark.
translated by 谷歌翻译
我们考虑了从一个示例轨迹中学习$ dx_t = f(x_t)dt+sigma(x_t)dw_t $的形式的随机微分方程的问题。这个问题比学习确定性动力学系统更具挑战性,因为一个示例轨迹仅提供有关未知功能$ f $,$ \ sigma $的间接信息,而随机过程$ dw_t $代表漂移,扩散和随机强迫术语,强迫术语,,分别。我们为此问题提出了一个简单的基于内核的解决方案,可以分解如下:(1)表示时间添加映射$ x_t \ rightarrow x_ {t+dt} $作为计算图,其中$ f $,$ \ \ Sigma $和$ DW_T $作为未知功能和随机变量出现。 (2)通过在未知函数上使用高斯过程(GP)先验的最大后验估计(给定数据)来完成图(近似未知的函数和随机变量)。 (3)从具有随机交叉验证的数据中学习GP先验的协方差函数(内核)。数值实验说明了我们方法的功效,鲁棒性和范围。
translated by 谷歌翻译
我们研究在线学习问题,决策者必须采取一系列决策,但要受到$ M $长期约束。决策者的目标是最大程度地提高其总奖励,同时达到小累积约束,在$ t $回合中违规。我们介绍了此一般类问题的第一个最佳世界类型算法,在根据未知随机模型选择奖励和约束的情况下,无需保证,在它们的情况下,在他们的情况下选择了奖励和约束。在每个回合中由对手选择。我们的算法是关于满足长期约束的最佳固定策略的第一个在对抗环境中提供保证的算法。特别是,它保证了$ \ rho/(1+ \ rho)$的最佳奖励和额定性遗憾,其中$ \ rho $是与严格可行的解决方案有关的可行性参数。我们的框架采用传统的遗憾最小化器作为黑盒组件。因此,通过使用适当的遗憾最小化器进行实例化,它可以处理全反馈以及强盗反馈设置。此外,它允许决策者通过非凸奖励和约束无缝处理场景。我们展示了如何在重复拍卖的预算管理机制的背景下应用我们的框架,以保证不包装的长期约束(例如,ROI约束)。
translated by 谷歌翻译