We introduce PRISM, a method for real-time filtering in a probabilistic generative model of agent motion and visual perception. Previous approaches either lack uncertainty estimates for the map and agent state, do not run in real time, lack a dense scene representation, or do not model agent dynamics. Our solution reconciles all of these aspects. We start from a predefined state-space model which combines differentiable rendering and 6-DoF dynamics. Probabilistic inference in this model amounts to simultaneous localisation and mapping (SLAM) and is intractable. We use a series of approximations to Bayesian inference to arrive at probabilistic map and state estimates. We take advantage of well-established methods and closed-form updates, preserving accuracy and enabling real-time capability. The proposed solution runs in real time at 10 Hz and is similarly accurate to state-of-the-art SLAM in small to medium-sized indoor environments, with high-speed UAV and handheld camera agents (Blackbird, EuRoC and TUM-RGBD).
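The closed-form Bayesian updates the abstract alludes to can be illustrated with a generic linear-Gaussian (Kalman) predict/update step. This is a minimal stand-in sketch of such a filter, not PRISM's actual rendering-based model:

```python
import numpy as np

def kalman_step(mu, P, z, F, H, Q, R):
    """One predict/update cycle of a linear-Gaussian Bayes filter."""
    # Predict: propagate mean and covariance through the dynamics F.
    mu_pred = F @ mu
    P_pred = F @ P @ F.T + Q
    # Update: fold in the measurement z in closed form.
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    mu_new = mu_pred + K @ (z - H @ mu_pred)
    P_new = (np.eye(len(mu)) - K @ H) @ P_pred
    return mu_new, P_new
```

Each update both moves the state estimate toward the measurement and shrinks its covariance, which is what makes the closed-form route cheap enough for real time.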
Multi-robot manipulation tasks involve various control entities that can be separated into dynamically independent parts. A typical example of such real-world tasks is dual-arm manipulation. Learning to naively solve such tasks with reinforcement learning is often unfeasible due to the sample complexity and exploration requirements growing with the dimensionality of the action and state spaces. Instead, we would like to handle such environments as multi-agent systems and have several agents control parts of the whole. However, decentralizing the generation of actions requires coordination across agents through a channel limited to information central to the task. This paper proposes an approach to coordinating multi-robot manipulation through learned latent action spaces that are shared across different agents. We validate our method in simulated multi-robot manipulation tasks and demonstrate improvement over previous baselines in terms of sample efficiency and learning performance.
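The idea of a shared latent action space can be sketched as a low-dimensional coordination signal that each agent decodes into its own high-dimensional action. The dimensions and the fixed random linear decoders below are hypothetical stand-ins for the learned networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 2-D shared latent commands two 7-DoF arms.
LATENT_DIM, ACTION_DIM = 2, 7

# Per-agent decoders (fixed random linear maps standing in for learned ones).
decoders = [rng.standard_normal((ACTION_DIM, LATENT_DIM)) for _ in range(2)]

def decode_actions(z):
    """Each agent decodes its own action from the shared latent command."""
    return [W @ z for W in decoders]

z = rng.standard_normal(LATENT_DIM)  # shared, low-dimensional coordination signal
actions = decode_actions(z)
```

The communication channel between agents is then only as wide as the latent, which is the coordination bottleneck the abstract describes.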
Supernova spectral time series can be used to reconstruct spatially resolved explosion models, a technique known as supernova tomography. In addition to the observed spectral time series, supernova tomography requires a radiative-transfer model to solve the inverse problem with uncertainty quantification. A minimal parametrization of a supernova tomography model has roughly a dozen parameters, while a realistic one requires more than 100. Since a realistic radiative-transfer model takes tens of CPU minutes for a single evaluation, and such problems require millions of MCMC samples, they are computationally intractable by traditional means. A new approach that uses machine learning to accelerate such computations, known as surrogate models or emulators, offers a solution to these problems and a way to understand the progenitor and explosion behind a spectral time series. Emulators exist for the TARDIS supernova radiative-transfer code, but they perform well only on simple low-dimensional models (roughly a dozen parameters) and have seen few applications in the growing supernova field. In this work, we present a new emulator for the radiative-transfer code TARDIS that not only outperforms existing emulators but also provides uncertainties on its predictions. It lays the foundation for future learning-based machinery that will be able to emulate very high-dimensional spaces of hundreds of parameters, which is crucial for answering pressing questions in supernova science and related fields.
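The emulator-with-uncertainty idea can be illustrated with a toy bootstrap ensemble over a cheap surrogate: the spread of the ensemble serves as a crude predictive uncertainty. The 1-D "simulator", polynomial fits, and ensemble size below are illustrative assumptions, not the paper's deep emulator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for expensive simulator output: y = f(theta) + noise.
theta = rng.uniform(-1, 1, size=200)
y = np.sin(3 * theta) + 0.05 * rng.standard_normal(200)

# Bootstrap ensemble of cheap polynomial fits.
ensemble = []
for _ in range(20):
    idx = rng.integers(0, len(theta), len(theta))
    ensemble.append(np.polyfit(theta[idx], y[idx], deg=5))

def emulate(t):
    """Predictive mean and (ensemble-spread) uncertainty at inputs t."""
    preds = np.array([np.polyval(c, t) for c in ensemble])
    return preds.mean(axis=0), preds.std(axis=0)

mean, std = emulate(np.array([0.0, 0.5]))
```

Once trained, each query costs microseconds instead of tens of CPU minutes, which is what makes MCMC over the emulator feasible.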
Autoencoder models that preserve similarities in the data are a popular tool in representation learning. In this paper, we introduce several autoencoder models that preserve local distances when mapping from data space to latent space. We use a local distance-preserving loss based on the continuous k-nearest-neighbour graph, which is known to capture topological features at all scales simultaneously. To improve training performance, we formulate learning as a constrained optimization problem, with local distance preservation as the main objective and reconstruction accuracy as a constraint. We generalize this approach to hierarchical variational autoencoders, thereby learning generative models with geometrically consistent latent and data spaces. Our method provides state-of-the-art performance on several standard datasets and evaluation metrics.
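A minimal sketch of a local distance-preserving loss, using a plain discrete k-nearest-neighbour graph rather than the paper's continuous-kNN construction; the choice of `k` and the squared-error form are assumptions:

```python
import numpy as np

def local_distance_loss(X, Z, k=3):
    """Penalize distortion of each point's distances to its k nearest
    neighbours when mapping data X to latent codes Z (simplified
    stand-in for a continuous-kNN-graph loss)."""
    n = len(X)
    dX = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    dZ = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
    loss = 0.0
    for i in range(n):
        nbrs = np.argsort(dX[i])[1:k + 1]  # skip the point itself
        loss += np.sum((dX[i, nbrs] - dZ[i, nbrs]) ** 2)
    return loss / n
```

An isometric embedding incurs zero loss; any local stretching or shrinking of neighbourhoods is penalized.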
The use of machine learning in the generation of artistic music raises controversial discussions about artistic quality, for which objective quantification is nonsensical. We therefore consider music-generation algorithms as counterparts to human musicians, in a setting where the mutual interplay creates a new experience for both the musicians and the audience. To obtain this behaviour, we resort to the framework of recurrent variational autoencoders (VAEs) and learn to generate music seeded by a human musician. In the learned model, we generate novel musical sequences by interpolation in latent space. However, standard VAEs do not guarantee any form of smoothness in their latent representation, which translates into abrupt changes in the generated musical sequences. To overcome these limitations, we regularize the decoder and endow the latent space with a flat Riemannian manifold, i.e., a manifold that is isometric to Euclidean space. As a result, linear interpolation in the latent space yields realistic and smooth musical changes, fitting our goal of machine-musician interaction. We provide empirical evidence for our approach through a set of experiments on music datasets and deploy the model for an interactive jam session with a professional drummer. The live performance provides qualitative evidence that the drummer can intuitively interpret and exploit the latent representation to drive the interplay. Beyond the musical application, our approach showcases an instance of machine-learning model design driven by interpretability and interaction with the end user.
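On a latent space that is isometric to Euclidean space, generating transitions reduces to linear interpolation between latent codes, which the decoder then maps to smoothly varying music. A minimal sketch (the latent dimension and step count are arbitrary):

```python
import numpy as np

def interpolate_latents(z_a, z_b, steps=5):
    """Straight-line path between two latent codes; on a flat latent
    manifold, decoding along this path yields a smooth transition."""
    ts = np.linspace(0.0, 1.0, steps)
    return np.array([(1 - t) * z_a + t * z_b for t in ts])

path = interpolate_latents(np.zeros(8), np.ones(8), steps=5)
```

Without the flatness regularization, the same straight line in latent space can decode to abrupt musical jumps, which is the failure mode the paper addresses.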
Placing robots outside controlled conditions requires versatile movement representations that allow robots to learn new tasks and adapt them to environmental changes. The introduction of obstacles or of additional robots into the workspace, or modifications of the joint ranges due to faults or range-of-motion limits, are typical cases in which the adaptation capability plays a key role in safely executing robot tasks. Probabilistic movement primitives (ProMPs), which model movement skills as Gaussian distributions over trajectories, have been proposed to represent adaptable movement skills. They are analytically tractable and can be learned from a small number of demonstrations. However, both the original ProMP formulation and subsequent approaches provide solutions only to specific adaptation problems, such as obstacle avoidance, and a generic, unified probabilistic approach to adaptation is missing. In this paper, we develop a generic probabilistic framework for adapting ProMPs. We unify previous adaptation techniques, for example various types of obstacle avoidance and via-points, within a single framework, and combine them to solve complex robotic problems. Additionally, we derive novel adaptation techniques such as temporally unbound via-points and mutual avoidance. We formulate adaptation as a constrained optimization problem in which we minimize the Kullback-Leibler divergence between the adapted distribution and the original primitive distribution, while constraining the probability mass associated with undesired trajectories to be low. We demonstrate our approach on several adaptation problems for simulated planar robot arms and a 7-DoF Franka Emika robot in a dual-robot-arm setting.
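One well-established closed-form adaptation for ProMPs, conditioning the Gaussian weight distribution on a desired via-point, can be sketched as follows. This illustrates the underlying Gaussian machinery only, not the paper's full KL-constrained optimization:

```python
import numpy as np

def condition_promp(mu_w, Sigma_w, psi_t, y_star, sigma_y=1e-4):
    """Closed-form conditioning of a ProMP weight distribution
    N(mu_w, Sigma_w) on a desired trajectory value y_star at a time
    step with basis-function activations psi_t."""
    psi_t = psi_t.reshape(1, -1)
    S = sigma_y + psi_t @ Sigma_w @ psi_t.T        # scalar innovation
    K = Sigma_w @ psi_t.T / S                      # gain
    mu_new = mu_w + (K * (y_star - psi_t @ mu_w)).ravel()
    Sigma_new = Sigma_w - K @ psi_t @ Sigma_w
    return mu_new, Sigma_new
```

Because everything stays Gaussian, the adapted primitive remains a ProMP and can be conditioned again on further constraints.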
Convolutional neural networks (CNNs) have recently been very successful in a variety of computer vision tasks, especially on those linked to recognition. Optical flow estimation has not been among the tasks where CNNs were successful. In this paper we construct appropriate CNNs which are capable of solving the optical flow estimation problem as a supervised learning task. We propose and compare two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations. Since existing ground truth datasets are not sufficiently large to train a CNN, we generate a synthetic Flying Chairs dataset. We show that networks trained on this unrealistic data still generalize very well to existing datasets such as Sintel and KITTI, achieving competitive accuracy at frame rates of 5 to 10 fps.
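The correlation layer can be sketched as computing, for each location in the first feature map, dot products with feature vectors at nearby displacements in the second map. This naive NumPy version (zero padding, no striding) is illustrative only:

```python
import numpy as np

def correlation_volume(f1, f2, max_disp=2):
    """Correlate each feature vector in f1 (C, H, W) with feature
    vectors in f2 at every displacement within max_disp."""
    C, H, W = f1.shape
    d = 2 * max_disp + 1
    out = np.zeros((d * d, H, W))
    f2p = np.pad(f2, ((0, 0), (max_disp, max_disp), (max_disp, max_disp)))
    k = 0
    for dy in range(d):
        for dx in range(d):
            shifted = f2p[:, dy:dy + H, dx:dx + W]
            out[k] = (f1 * shifted).sum(axis=0)  # dot product per pixel
            k += 1
    return out
```

Each output channel corresponds to one candidate displacement, which gives the network an explicit matching signal for estimating flow.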
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
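The defining operation of a decoder-only Transformer such as BLOOM is causally masked self-attention, where each token attends only to itself and earlier tokens. A single-head sketch without the learned query/key/value projections:

```python
import numpy as np

def causal_self_attention(X):
    """Single-head causal self-attention over token features X (T, d):
    position t attends only to positions <= t."""
    T, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf            # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X
```

The causal mask is what lets such a model be trained with next-token prediction and then prompted with few-shot demonstrations.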
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
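The most common workaround for oversized samples, patch-based training, amounts to tiling each image into fixed-size crops. A minimal non-overlapping sketch (patch size and stride are arbitrary choices here):

```python
import numpy as np

def extract_patches(img, patch=64, stride=64):
    """Tile a 2-D image into fixed-size patches; with stride == patch
    the patches are non-overlapping."""
    H, W = img.shape[:2]
    patches = []
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            patches.append(img[y:y + patch, x:x + patch])
    return np.stack(patches)
```

Setting the stride smaller than the patch size yields overlapping patches, a common variant that smooths seams when predictions are stitched back together.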
Studying animal movements is essential for effective wildlife conservation and conflict mitigation. For aerial movements, operational weather radars have become an indispensable data source in this respect. However, partial measurements, incomplete spatial coverage, and poor understanding of animal behaviours make it difficult to reconstruct complete spatio-temporal movement patterns from available radar data. We tackle this inverse problem by learning a mapping from high-dimensional radar measurements to low-dimensional latent representations using a convolutional encoder. Under the assumption that the latent system dynamics are well approximated by a locally linear Gaussian transition model, we perform efficient posterior estimation using the classical Kalman smoother. A convolutional decoder maps the inferred latent system states back to the physical space in which the known radar observation model can be applied, enabling fully unsupervised training. To encourage physical consistency, we additionally introduce a physics-informed loss term that leverages known mass conservation constraints. Our experiments on synthetic radar data show promising results in terms of reconstruction quality and data-efficiency.
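Posterior estimation in a linear-Gaussian latent model combines a forward Kalman filter with a backward Rauch-Tung-Striebel pass. A minimal sketch for a time-invariant model (the paper's transition model is only locally linear, so in practice F would vary per step):

```python
import numpy as np

def kalman_smoother(zs, F, H, Q, R, mu0, P0):
    """Forward Kalman filter followed by a backward RTS smoothing pass."""
    mus, Ps, mus_pred, Ps_pred = [], [], [], []
    mu, P = mu0, P0
    for z in zs:  # forward filtering
        mu_p, P_p = F @ mu, F @ P @ F.T + Q
        S = H @ P_p @ H.T + R
        K = P_p @ H.T @ np.linalg.inv(S)
        mu = mu_p + K @ (z - H @ mu_p)
        P = (np.eye(len(mu0)) - K @ H) @ P_p
        mus_pred.append(mu_p); Ps_pred.append(P_p)
        mus.append(mu); Ps.append(P)
    mu_s, P_s = [mus[-1]], [Ps[-1]]
    for t in range(len(zs) - 2, -1, -1):  # backward smoothing
        J = Ps[t] @ F.T @ np.linalg.inv(Ps_pred[t + 1])
        mu_s.insert(0, mus[t] + J @ (mu_s[0] - mus_pred[t + 1]))
        P_s.insert(0, Ps[t] + J @ (P_s[0] - Ps_pred[t + 1]) @ J.T)
    return np.array(mu_s), np.array(P_s)
```

Unlike the filter, the smoother conditions every state estimate on the full observation sequence, which is what makes it attractive for offline trajectory reconstruction.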