Sensor fusion can significantly improve the performance of many computer vision tasks. However, traditional fusion approaches are either not data-driven and can neither exploit prior knowledge nor find regularities in a given dataset, or they are restricted to a single application. We overcome this shortcoming by presenting a novel deep hierarchical variational autoencoder called FusionVAE that can serve as a basis for many fusion tasks. Our approach is able to generate diverse image samples that are conditioned on multiple noisy, occluded, or only partially visible input images. We derive and optimize a variational lower bound of the conditional log-likelihood for fusion. To assess the fusion capabilities of our model thoroughly, we created three novel image fusion datasets based on popular computer vision datasets. In our experiments, we show that FusionVAE learns a representation of aggregated information that is relevant to fusion tasks. The results demonstrate that our approach outperforms traditional methods significantly. Furthermore, we present the advantages and disadvantages of different design choices.
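To make the training objective concrete, the bound referred to above can be sketched in the standard conditional-VAE form; the hierarchical latent structure of FusionVAE is not reproduced here, and the notation ($X$ for the set of degraded input images, $y$ for the fused target image, $z$ for the latent variables) is illustrative only:

$$\log p_\theta(y \mid X) \;\geq\; \mathbb{E}_{q_\phi(z \mid y, X)}\!\left[\log p_\theta(y \mid z, X)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(z \mid y, X)\,\|\,p_\theta(z \mid X)\right)$$

That is, a reconstruction term plus a KL term toward a prior that itself conditions on the inputs, which is what allows sampling diverse fused images at test time.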
We propose a novel meta-learning approach for 6D pose estimation of unknown objects. In contrast to instance-level pose estimation methods, our algorithm learns object representations in a category-agnostic way, endowing it with strong generalization capabilities across object categories. Specifically, we employ a conditional neural process-based meta-learning approach to train an encoder that captures an object's texture and geometry in a latent representation, based on very few RGB-D images and ground-truth keypoints. A simultaneously meta-trained decoder then uses this latent representation to predict the object's 6D pose in new images. To evaluate our algorithm, experiments are conducted on a new fully annotated synthetic dataset generated from multiple categories in multiple scenes (MCMS). Experimental results demonstrate that our model performs well on unseen objects with various shapes and appearances.
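As a rough sketch of the conditional-neural-process idea described above (notation assumed for illustration, not taken from the paper): given a small context set of RGB-D images $I_1,\dots,I_N$ with ground-truth keypoints $k_1,\dots,k_N$, an encoder $h_\phi$ produces a permutation-invariant latent by averaging, and a decoder $g_\psi$ maps a new image together with this latent to a pose:

$$r = \frac{1}{N}\sum_{i=1}^{N} h_\phi(I_i, k_i), \qquad \hat{T} = g_\psi(I_{\mathrm{new}}, r), \qquad \hat{T} \in SE(3)$$

Because $r$ summarizes texture and geometry rather than a fixed object instance, the same meta-trained decoder can be reused across categories.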
Many possible application areas of robots in real-world environments hinge on the ability of robots to grasp objects. Robotic grasping has therefore been an active field of research for many years. With our publication, we contribute to enabling robots to grasp, with a particular focus on bin-picking applications. Bin picking is especially challenging due to the often cluttered and unstructured arrangement of objects and the fact that objects frequently cannot be grasped with simple top-down grasps. To tackle these challenges, we propose a fully self-supervised reinforcement learning approach based on a hybrid discrete-continuous adaptation of Soft Actor-Critic (SAC). We use parameterized motion primitives for pushing and grasping motions to enable flexibly adaptive behavior in the difficult setups we consider. Furthermore, we use data augmentation to increase sample efficiency. We demonstrate our proposed method on challenging picking scenarios in which planar grasp learning or action discretization methods would face great difficulties.
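One way to picture the hybrid discrete-continuous action space mentioned above (a sketch under assumed notation, not necessarily the authors' exact parameterization): each action selects a motion primitive together with its continuous parameters,

$$a = (d, \theta_d), \qquad d \in \{\mathrm{push}, \mathrm{grasp}\}, \qquad \theta_d \in \mathbb{R}^{k_d}, \qquad \pi(a \mid s) = \pi_{\mathrm{disc}}(d \mid s)\,\pi_{\mathrm{cont}}(\theta_d \mid s, d)$$

so that SAC's continuous machinery handles the primitive parameters while a discrete head chooses between pushing and grasping.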
The world currently offers an abundance of data in multiple domains, from which we can learn reinforcement learning (RL) policies without further interaction with the environment. RL agents can learn offline from such data, but deploying them while learning might be dangerous in domains where safety is critical. Therefore, it is essential to find a way to estimate how a newly-learned agent will perform if deployed in the target environment before actually deploying it, and without the risk of overestimating its true performance. To achieve this, we introduce a framework for safe evaluation of offline learning using approximate high-confidence off-policy evaluation (HCOPE) to estimate the performance of offline policies during learning. In our setting, we assume a source of data, which we split into a train-set, used to learn an offline policy, and a test-set, used to estimate a lower bound on the offline policy's performance via off-policy evaluation with bootstrapping. A lower-bound estimate tells us how well a newly-learned target policy would perform before it is deployed in the real environment, and therefore allows us to decide when to deploy our learned policy.
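A minimal sketch of the two ingredients this pipeline combines, per-trajectory importance sampling and a bootstrap lower bound on its mean over the test-set; the function names and the percentile-bootstrap choice are illustrative assumptions, not the paper's exact estimator:

```python
import numpy as np

def per_trajectory_is_return(trajectory, pi_e, pi_b, gamma=0.99):
    """Ordinary importance-sampling (IS) estimate of the return of the
    evaluation policy pi_e from one trajectory collected under pi_b.
    `trajectory` is assumed to be a list of (state, action, reward) tuples,
    and pi_e / pi_b return action probabilities."""
    weight, ret, discount = 1.0, 0.0, 1.0
    for s, a, r in trajectory:
        weight *= pi_e(a, s) / pi_b(a, s)   # cumulative importance weight
        ret += discount * r
        discount *= gamma
    return weight * ret

def bootstrap_lower_bound(is_returns, delta=0.05, n_boot=2000, seed=0):
    """Approximate (1 - delta)-confidence lower bound on the mean IS return
    over the test-set trajectories, via the percentile bootstrap."""
    rng = np.random.default_rng(seed)
    is_returns = np.asarray(is_returns, dtype=float)
    boot_means = [rng.choice(is_returns, size=is_returns.size, replace=True).mean()
                  for _ in range(n_boot)]
    return float(np.quantile(boot_means, delta))
```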
We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of an evaluation policy, $\pi_e$, using a fixed dataset, $\mathcal{D}$, collected by one or more policies that may be different from $\pi_e$. Current OPE algorithms may produce poor OPE estimates under policy distribution shift, i.e., when the probability of a particular state-action pair occurring under $\pi_e$ is very different from the probability of that same pair occurring in $\mathcal{D}$ (Voloshin et al. 2021, Fu et al. 2021). In this work, we propose to improve the accuracy of OPE estimators by projecting the high-dimensional state-space into a low-dimensional state-space using concepts from the state abstraction literature. Specifically, we consider marginalized importance sampling (MIS) OPE algorithms which compute state-action distribution correction ratios to produce their OPE estimate. In the original ground state-space, these ratios may have high variance, which may lead to high-variance OPE. However, we prove that in the lower-dimensional abstract state-space the ratios can have lower variance, resulting in lower-variance OPE. We then highlight the challenges that arise when estimating the abstract ratios from data, identify sufficient conditions to overcome these issues, and present a minimax optimization problem whose solution yields these abstract ratios. Finally, our empirical evaluation on difficult, high-dimensional state-space OPE tasks shows that the abstract ratios can make MIS OPE estimators achieve lower mean-squared error and be more robust to hyperparameter tuning than the ground ratios.
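For context, marginalized importance sampling estimators take (up to normalization) the following form, where $d^{\pi_e}$ and $d^{\mathcal{D}}$ denote the state-action occupancies of the evaluation policy and of the data; the abstraction step discussed above replaces $s$ with an abstract state $\phi(s)$ inside these ratios (notation assumed for illustration):

$$\hat{J}(\pi_e) = \frac{1}{n}\sum_{i=1}^{n} w(s_i, a_i)\, r_i, \qquad w(s, a) = \frac{d^{\pi_e}(s, a)}{d^{\mathcal{D}}(s, a)}$$

When the two occupancies barely overlap in the ground state-space, these ratios blow up; aggregating states via $\phi$ can bring the corresponding abstract occupancies closer together, which is the intuition behind the lower-variance result.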
Reinforcement Learning (RL) can solve complex tasks but does not intrinsically provide any guarantees on system behavior. For real-world systems that fulfill safety-critical tasks, such guarantees on safety specifications are necessary. To bridge this gap, we propose a verifiably safe RL procedure with probabilistic guarantees. First, our approach probabilistically verifies a candidate controller with respect to a temporal logic specification, while randomizing the controller's inputs within a bounded set. Then, we use RL to improve the performance of this probabilistically verified, i.e. safe, controller and explore in the same bounded set around the controller's input as was randomized over in the verification step. Finally, we calculate probabilistic safety guarantees with respect to temporal logic specifications for the learned agent. Our approach is efficient for continuous action and state spaces and separates safety verification and performance improvement into two independent steps. We evaluate our approach on a safe evasion task where a robot has to evade a dynamic obstacle in a specific manner while trying to reach a goal. The results show that our verifiably safe RL approach leads to efficient learning and performance improvements while maintaining safety specifications.
Artificial intelligence methods including deep neural networks (DNN) can provide rapid molecular classification of tumors from routine histology with accuracy that matches or exceeds human pathologists. Discerning how neural networks make their predictions remains a significant challenge, but explainability tools help provide insights into what models have learned when corresponding histologic features are poorly defined. Here, we present a method for improving explainability of DNN models using synthetic histology generated by a conditional generative adversarial network (cGAN). We show that cGANs generate high-quality synthetic histology images that can be leveraged for explaining DNN models trained to classify molecularly-subtyped tumors, exposing histologic features associated with molecular state. Fine-tuning synthetic histology through class and layer blending illustrates nuanced morphologic differences between tumor subtypes. Finally, we demonstrate the use of synthetic histology for augmenting pathologist-in-training education, showing that these intuitive visualizations can reinforce and improve understanding of histologic manifestations of tumor biology.
In this paper, we address the stochastic contextual linear bandit problem, where a decision maker is provided a context (a random set of actions drawn from a distribution). The expected reward of each action is specified by the inner product of the action and an unknown parameter. The goal is to design an algorithm that learns to play as close as possible to the unknown optimal policy after a number of action plays. This problem is considered more challenging than the linear bandit problem, which can be viewed as a contextual bandit problem with a \emph{fixed} context. Surprisingly, in this paper, we show that the stochastic contextual problem can be solved as if it is a linear bandit problem. In particular, we establish a novel reduction framework that converts every stochastic contextual linear bandit instance to a linear bandit instance, when the context distribution is known. When the context distribution is unknown, we establish an algorithm that reduces the stochastic contextual instance to a sequence of linear bandit instances with small misspecifications and achieves nearly the same worst-case regret bound as the algorithm that solves the misspecified linear bandit instances. As a consequence, our results imply an $O(d\sqrt{T\log T})$ high-probability regret bound for contextual linear bandits, making progress in resolving an open problem in (Li et al., 2019) and (Li et al., 2021). Our reduction framework opens up a new way to approach stochastic contextual linear bandit problems, and enables improved regret bounds in a number of instances including the batch setting, contextual bandits with misspecifications, contextual bandits with sparse unknown parameters, and contextual bandits with adversarial corruption.
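One natural reading of the known-distribution reduction described above (a sketch, not necessarily the authors' exact construction): when the context distribution is known, each policy $\pi$ collapses to the fixed feature vector

$$\bar{x}_\pi = \mathbb{E}_{C}\!\left[\pi(C)\right], \qquad \mathbb{E}\!\left[\text{reward of } \pi\right] = \langle \bar{x}_\pi, \theta^* \rangle,$$

so choosing among policies in the contextual problem amounts to choosing among the arms $\{\bar{x}_\pi\}$ of an ordinary linear bandit with the same unknown parameter $\theta^*$; the unknown-distribution case then replaces the exact expectations with estimates, at the cost of small misspecifications.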
In this paper, we present a new theoretical approach for enabling domain knowledge acquisition by intelligent systems. We introduce a hybrid model that starts with minimal input knowledge in the form of an upper ontology of concepts, stores and reasons over this knowledge through a knowledge graph database and learns new information through a Logic Neural Network. We study the behavior of this architecture when handling new data and show that the final system is capable of enriching its current knowledge as well as extending it to new domains.
In various control task domains, existing controllers provide a baseline level of performance that, though possibly suboptimal, should be maintained. Reinforcement learning (RL) algorithms that rely on extensive exploration of the state and action space can be used to optimize a control policy. However, fully exploratory RL algorithms may decrease performance below the baseline level during training. In this paper, we address the issue of online optimization of a control policy while minimizing regret w.r.t. the baseline policy performance. We present a joint imitation-reinforcement learning framework, denoted JIRL. The learning process in JIRL assumes the availability of a baseline policy and is designed with two objectives in mind: (a) leveraging the baseline's online demonstrations to minimize the regret w.r.t. the baseline policy during training, and (b) eventually surpassing the baseline performance. JIRL addresses these objectives by initially learning to imitate the baseline policy and gradually shifting control from the baseline to the RL agent. Experimental results show that JIRL effectively accomplishes these objectives in several continuous action-space domains. The results demonstrate that JIRL is comparable to a state-of-the-art algorithm in its final performance while incurring significantly lower baseline regret during training in all of the presented domains. Moreover, the results show a reduction factor of up to $21$ in baseline regret compared to a state-of-the-art baseline regret minimization approach.
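As a rough illustration of the gradual control transfer described above (the probabilistic hand-off and the function names are illustrative assumptions, not the paper's exact mechanism), blending baseline and learner actions might look like this:

```python
import numpy as np

def jirl_style_step(state, baseline_policy, rl_agent, beta, rng=None):
    """Sketch of a gradual control hand-off from a baseline controller to an
    RL agent: with probability beta the baseline acts (its action can also be
    logged as an imitation target), otherwise the RL agent acts.  Annealing
    beta from 1 toward 0 as the agent catches up keeps training-time regret
    w.r.t. the baseline small."""
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() < beta:
        return baseline_policy(state)  # safe, baseline-level action
    return rl_agent(state)             # learner's action, potentially better
```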