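Most (3D) multi-object tracking methods rely on appearance-based cues for data association. By contrast, we investigate how far we can get by encoding only the geometric relationships between objects in 3D space as cues for data-driven data association. We encode 3D detections as nodes in a graph, where spatial and temporal pairwise relations between objects are encoded via localized polar coordinates on the graph edges. This representation makes our geometric relations invariant to global transformations and smooth trajectory changes, especially under non-holonomic motion. This allows our graph neural network to learn to effectively encode temporal and spatial interactions and to fully leverage contextual and motion cues, obtaining the final scene interpretation by posing data association as edge classification. We establish a new state of the art on the nuScenes dataset and, more importantly, show that our method generalizes across different locations (Boston, Singapore, Karlsruhe) and datasets (nuScenes and KITTI).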
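The invariance claim above can be illustrated with a minimal sketch. It assumes each detection is reduced to a hypothetical `(x, y, yaw)` tuple and that an edge stores the range and bearing of one detection as seen from the other's local frame; the abstract does not list the exact edge features, so this feature set is an assumption and not the authors' implementation.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def polar_edge(src, dst):
    """Range and bearing of dst expressed in the local frame of src.

    src and dst are (x, y, yaw) tuples (an assumed, simplified detection format).
    """
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    r = math.hypot(dx, dy)
    # Subtracting the source heading makes the bearing local to the source object.
    phi = wrap_angle(math.atan2(dy, dx) - src[2])
    return r, phi


def transform(det, theta, tx, ty):
    """Apply a global rotation by theta followed by a translation (tx, ty)."""
    x, y, yaw = det
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty, wrap_angle(yaw + theta))


a = (1.0, 2.0, 0.3)
b = (4.0, 6.0, -1.0)
before = polar_edge(a, b)
after = polar_edge(transform(a, 1.1, -7.0, 3.0), transform(b, 1.1, -7.0, 3.0))
```

Because both the offset direction and the source heading rotate by the same angle, `before` and `after` agree up to floating-point error, which is the sense in which a local polar encoding is invariant to global transformations.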
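We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU), in which a streaming automatic speech recognition (ASR) model produces the first-pass hypothesis and a second-pass natural language understanding (NLU) component generates the semantic parse by conditioning on both the text and audio embeddings of the ASR model. By formulating E2E SLU as a generalized decoder, our system is able to support complex compositional semantic structures. Furthermore, parameter sharing between ASR and NLU makes the system especially suitable for resource-constrained (on-device) environments; our proposed approach consistently outperforms strong pipeline NLU baselines by 0.60% to 0.65% on the spoken version of the TOPv2 dataset (STOP). We demonstrate that the fusion of text and audio features, coupled with the system's ability to rewrite the first-pass hypothesis, makes our approach more robust to ASR errors. Finally, we show that our approach can significantly reduce the degradation observed when moving from natural-speech to synthetic-speech training, although further work is needed to make text-to-speech (TTS) a viable solution for scaling up E2E SLU.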
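Radiotherapy inverse planning often requires planners to modify parameters in the objective function of the treatment planning system in order to produce clinically acceptable plans. Because of the manual steps in this process, plan quality can vary with the available planning time and the planner's skill. This study investigates two hyperparameter-tuning methods for automated inverse planning. Because this framework does not train a model on previously optimized plans, it can be readily adapted to changes in practice patterns, and plan quality is not limited by that of a training cohort. We selected 10 patients who received lung SBRT with manually generated clinical plans. We used random sampling (RS) and Bayesian optimization (BO) to tune parameters using linear-quadratic utility functions based on 11 clinical goals. Normalizing all plans so that PTV D95 equals 48 Gy, we compared the plan quality of the automatically generated and manually generated plans. We also investigated the effect of iteration count on the automatically generated plans, comparing planning time and plan utility for RS and BO plans with and without stopping criteria. Without stopping criteria, the median planning time was 1.9 and 2.3 hours for RS and BO plans, respectively. The OAR doses in the RS and BO plans had a median percent difference (MPD) of 48.7% and 60.4% below clinical dose limits and an MPD of 2.8% and 3.3% below clinical plan doses. With stopping criteria, utility decreased by an MPD of 5.3% and 3.9% for RS and BO plans, but the median planning time fell to 0.5 and 0.7 hours, and the OAR doses still had an MPD of 42.9% and 49.7% below clinical dose limits and an MPD of 0.3% and 1.8% below clinical plan doses. This study demonstrates that hyperparameter-tuning approaches to automated inverse planning can reduce active planning time while achieving plan quality similar to or better than that of manually generated plans.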
Many challenging reinforcement learning (RL) problems require designing a distribution of tasks on which effective policies can be trained. This distribution can be specified by a curriculum, which is meant to improve and accelerate learning. We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning in which the task sequence is built from the success rate of each task. In this setting, each task is an algorithmically created environment instance with a unique configuration. The algorithm selects the order of tasks that provides the fastest learning for agents. The probability of selecting a task for the next stage of learning is determined by its performance score in previous stages. Experiments were carried out in the Partially Observable Grid Environment for Multiple Agents (POGEMA) and the Procgen benchmark. We demonstrate that SITP matches or surpasses the results of other curriculum design methods. Our method can be implemented with a handful of minor modifications to any standard RL framework and provides useful prioritization with minimal computational overhead.
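As a rough illustration of success-based task prioritization, the sketch below turns per-task success rates from two consecutive evaluation stages into sampling probabilities. The scoring rule (absolute change in success rate), the softmax normalization, and the `temperature` parameter are assumptions made for this example; the abstract does not specify how the performance score is computed.

```python
import math
import random


def task_probabilities(prev_success, curr_success, temperature=0.1):
    """Map per-task success rates from two stages to sampling probabilities.

    Assumed score: absolute change in success rate, so tasks on which the
    agent is currently making progress (or regressing) are sampled more often.
    """
    scores = [abs(c - p) for p, c in zip(prev_success, curr_success)]
    top = max(scores)
    # Softmax with the maximum subtracted for numerical stability.
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_task(probs, rng):
    """Draw the index of the next training task according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = task_probabilities([0.1, 0.5, 0.9], [0.4, 0.5, 0.9])
next_task = sample_task(probs, random.Random(0))
```

In this example only the first task changed its success rate, so it receives the largest probability, while the two unchanged tasks share the remainder equally.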
We present a novel dataset named HPointLoc, specially designed for exploring the capabilities of visual place recognition in indoor environments and loop detection in simultaneous localization and mapping. The loop detection sub-task is especially relevant when a robot with an on-board RGB-D camera can drive past the same place ("Point") at different angles. The dataset is based on the popular Habitat simulator, in which photorealistic indoor scenes can be generated from both one's own sensor data and open datasets such as Matterport3D. To study the main stages of solving the place recognition problem on the HPointLoc dataset, we propose a new modular approach named PNTR. It first performs image retrieval with the Patch-NetVLAD method, then extracts keypoints and matches them using R2D2, LoFTR, or SuperPoint with SuperGlue, and finally performs a camera pose optimization step with TEASER++. Such a solution to the place recognition problem has not been studied in existing publications. The PNTR approach shows the best quality metrics on the HPointLoc dataset and has high potential for real use in localization systems for unmanned vehicles. The proposed dataset and framework are publicly available: https://github.com/metra4ok/HPointLoc.
This paper addresses kinodynamic motion planning for non-holonomic robots in environments with both static and dynamic obstacles, a challenging problem that still lacks a universal solution. One promising approach is to decompose the problem into smaller sub-problems and combine the local solutions into a global one. The crux of any planning method for non-holonomic robots is the generation of motion primitives that solve the local planning sub-problems. In this work we introduce a novel learnable steering function (policy), which takes into account the kinodynamic constraints of the robot as well as both static and dynamic obstacles. This policy is efficiently trained via policy optimization. Empirically, we show that our steering function generalizes well to unseen problems. We then plug the trained policy into sampling-based and lattice-based planners and evaluate the resulting POLAMP algorithm (Policy Optimization that Learns Adaptive Motion Primitives) in a range of challenging setups that involve a car-like robot operating in obstacle-rich parking-lot environments. We show that POLAMP is able to plan collision-free kinodynamic trajectories with success rates higher than 92% when 50 simultaneously moving obstacles populate the environment, outperforming state-of-the-art competitors.
Heuristic search algorithms, e.g. A*, are commonly used tools for pathfinding on grids, i.e. graphs of regular structure that are widely employed to represent environments in robotics, video games, etc. Instance-independent heuristics for grid graphs, e.g. Manhattan distance, do not take obstacles into account, and thus the search guided by such heuristics performs poorly in obstacle-rich environments. To address this, we suggest learning instance-dependent heuristic proxies that are intended to notably increase the efficiency of the search. The first heuristic proxy we suggest learning is the correction factor, i.e. the ratio between the instance-independent cost-to-go estimate and the perfect one (computed offline during the training phase). Unlike learning the absolute values of the cost-to-go heuristic function, which has been done before, learning the correction factor makes use of the knowledge contained in the instance-independent heuristic. The second heuristic proxy is the path probability, which indicates how likely a grid cell is to lie on the shortest path. This heuristic can be used in the Focal Search framework as the secondary heuristic, allowing us to preserve the guarantees on the bounded sub-optimality of the solution. We learn both suggested heuristics in a supervised fashion with state-of-the-art neural networks containing attention blocks (transformers). We conduct a thorough empirical evaluation on a comprehensive dataset of planning tasks, showing that the suggested techniques i) reduce the computational effort of A* by up to a factor of $4$ while producing solutions whose costs exceed those of the optimal solutions by less than $0.3$% on average; ii) outperform the competitors, which include conventional techniques from heuristic search, i.e. weighted A*, as well as state-of-the-art learnable planners.
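The correction factor defined above can be computed offline as in the following minimal sketch. It assumes a 4-connected grid with unit move costs, `0` for free cells and `1` for obstacles, and Manhattan distance as the instance-independent heuristic; the connectivity and cost model are assumptions for this example, and the function names are hypothetical.

```python
from collections import deque


def true_costs(grid, goal):
    """Exact cost-to-go from every reachable free cell to goal (BFS, unit costs)."""
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def correction_factors(grid, goal):
    """Ratio of the Manhattan estimate to the exact cost-to-go for each cell."""
    factors = {}
    for cell, d in true_costs(grid, goal).items():
        manhattan = abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])
        # The goal itself has zero cost; its factor is set to 1.0 by convention here.
        factors[cell] = 1.0 if d == 0 else manhattan / d
    return factors


grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
cf = correction_factors(grid, (0, 0))
```

Under these assumptions the factor lies in (0, 1]: it equals 1 where obstacles do not lengthen the path and shrinks where the detour grows, which is why it is a bounded regression target compared with the raw cost-to-go.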
This paper presents a class of new, fast, non-trainable entropy-based confidence estimation methods for automatic speech recognition. We show how per-frame entropy values can be normalized and aggregated to obtain a confidence measure per unit and per word for Connectionist Temporal Classification (CTC) and Recurrent Neural Network Transducer (RNN-T) models. The proposed methods have computational complexity similar to that of the traditional method based on the maximum per-frame probability, but they are more adjustable, have a wider effective threshold range, and better separate the confidence distributions of correct and incorrect words. We evaluate the proposed confidence measures on LibriSpeech test sets and show that they are up to 2 and 4 times better than confidence estimation based on the maximum per-frame probability at detecting incorrect words for Conformer-CTC and Conformer-RNN-T models, respectively.
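The idea of normalizing and aggregating per-frame entropy can be sketched as follows. This example uses Shannon entropy with a linear normalization by the maximum entropy `log(V)` and mean or minimum aggregation; these particular choices are assumptions for illustration, since the abstract does not state which entropy or normalization the proposed methods use.

```python
import math


def entropy_confidence(probs):
    """Confidence of one frame: 1 minus Shannon entropy divided by log(V).

    probs is a probability distribution over V >= 2 output units.
    Returns 1.0 for a one-hot distribution and about 0.0 for a uniform one.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(len(probs))


def max_prob_confidence(probs):
    """Baseline from the text: the maximum per-frame probability."""
    return max(probs)


def aggregate(frame_confidences, how="mean"):
    """Combine the frame confidences belonging to one unit or word."""
    if how == "min":
        return min(frame_confidences)
    return sum(frame_confidences) / len(frame_confidences)


frames = [[0.7, 0.1, 0.1, 0.1], [0.97, 0.01, 0.01, 0.01]]
word_confidence = aggregate([entropy_confidence(f) for f in frames], how="mean")
```

Unlike the maximum-probability baseline, the entropy-based score depends on how the remaining probability mass is spread over the other units, not only on the top value.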
Independence testing is a fundamental and classical statistical problem that has been extensively studied in the batch setting, where the sample size is fixed before collecting data. However, practitioners often prefer procedures that adapt to the complexity of the problem at hand instead of setting the sample size in advance. Ideally, such procedures should (a) allow stopping earlier on easy tasks (and later on harder tasks), hence making better use of available resources, and (b) continuously monitor the data and efficiently incorporate statistical evidence after collecting new data, while controlling the false alarm rate. It is well known that classical batch tests are not tailored for streaming data settings, since valid inference after data peeking requires correcting for multiple testing, but such corrections generally result in low power. In this paper, we design sequential kernelized independence tests (SKITs) that overcome such shortcomings based on the principle of testing by betting. We exemplify our broad framework using bets inspired by kernelized dependence measures such as the Hilbert-Schmidt independence criterion (HSIC) and the constrained-covariance criterion (COCO). Importantly, we also generalize the framework to non-i.i.d. time-varying settings, for which there exist no batch tests. We demonstrate the power of our approaches on both simulated and real data.
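The principle of testing by betting can be sketched generically: a bettor starts with unit wealth, stakes a fraction of it on each new observation, and rejects the null once the wealth reaches 1/alpha. The sketch below takes the payoffs as given numbers in [-1, 1] and uses a fixed betting fraction; the kernel-based payoff functions (for example, HSIC-inspired ones) and any adaptive betting strategy are deliberately left out, so the function name and its parameters are hypothetical.

```python
def sequential_test(payoffs, alpha=0.05, bet=0.5):
    """Run a betting-style sequential test over a stream of payoffs.

    Each payoff must lie in [-1, 1] and is assumed to have non-positive
    conditional mean under the null, so the wealth is a nonnegative
    supermartingale and crossing 1/alpha has probability at most alpha
    under the null (Ville's inequality).

    Returns (stopping_time, wealth); stopping_time is None if never rejected.
    """
    if not 0.0 <= bet < 1.0:
        raise ValueError("bet must be in [0, 1) to keep the wealth positive")
    wealth = 1.0
    for t, g in enumerate(payoffs, start=1):
        if not -1.0 <= g <= 1.0:
            raise ValueError("payoffs must lie in [-1, 1]")
        wealth *= 1.0 + bet * g
        if wealth >= 1.0 / alpha:
            return t, wealth  # enough evidence: reject the null and stop
    return None, wealth


strong_signal = sequential_test([1.0] * 20)
no_signal = sequential_test([1.0, -1.0] * 50)
```

With consistently favorable payoffs the wealth grows geometrically and the test stops early, whereas payoffs that cancel out keep the wealth low and the test keeps monitoring, which mirrors the early-stopping behavior described in (a) and (b) above.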
The paper discusses improving the accuracy of an inertial navigation system built on MEMS sensors by using machine learning (ML) methods. As input data for the classifier, we used information obtained from a purpose-built laboratory setup with MEMS sensors on a sealed platform whose tilt angles can be adjusted. To assess the effectiveness of the models, test curves were constructed for different values of the model parameters for each kernel: linear, polynomial, and radial basis function. The inverse regularization parameter was used as the varied parameter. The proposed ML-based algorithm has demonstrated its ability to classify correctly in the presence of noise typical of MEMS sensors, with good classification results obtained when the optimal hyperparameter values are chosen.