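Tidying up a household environment with a mobile manipulator poses various challenges in robotics, such as adapting to large real-world environmental variations and deploying safely and robustly in the presence of humans. A global competition held in September 2021 benchmarked tidying tasks in real home environments and, importantly, tested full-system performance. For this challenge, we developed an entire household service robot system that leverages a data-driven approach to adapt to the numerous edge cases that occur during execution, instead of classical manually pre-programmed solutions. In this paper, we describe the core ingredients of the proposed robot system, including visual recognition, object manipulation, and motion planning. Our robot system won the second prize, verifying the effectiveness and potential of data-driven robot systems for mobile manipulation in home environments.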
translated by 谷歌翻译
Multi-Exit models (MEMs) use an early-exit strategy to improve the accuracy and efficiency of deep neural networks (DNNs) by allowing samples to exit the network before the last layer. However, the effectiveness of MEMs in the presence of distribution shifts remains largely unexplored. Our work examines how distribution shifts generated by common image corruptions affect the accuracy and efficiency of MEMs. We find that under common corruptions, early-exiting at the first correct exit reduces the inference cost and provides a significant boost in accuracy (10%) over exiting at the last layer. However, with realistic early-exit strategies, which do not assume knowledge of the correct exits, MEMs still reduce inference cost but provide only a marginal improvement in accuracy (1%) compared with exiting at the last layer. Moreover, the presence of distribution shift widens the gap between an MEM's maximum classification accuracy and that of realistic early-exit strategies by 5% on average compared with the gap on in-distribution data. Our empirical analysis shows that the lack of calibration caused by a distribution shift makes such early-exit strategies more likely to exit early and increases misclassification rates. Furthermore, the lack of calibration increases the inconsistency of the model's predictions across exits, leading to both inefficient inference and more misclassifications compared with evaluation on in-distribution data. Finally, we propose two metrics, underthinking and overthinking, that quantify the different behaviors of practical early-exit strategies under distribution shifts, and provide insights into improving the practical utility of MEMs.
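The contrast between exiting at the first correct exit and a realistic exit rule can be illustrated with a minimal Python sketch. The abstract does not say which realistic rule is used; a maximum-softmax-confidence threshold is assumed here purely for illustration, and the function names, the threshold value, and the probabilities are all hypothetical.

```python
from typing import List, Sequence, Tuple


def argmax(probs: Sequence[float]) -> int:
    """Index of the largest probability."""
    return max(range(len(probs)), key=probs.__getitem__)


def confidence_exit(exit_probs: List[Sequence[float]],
                    threshold: float = 0.9) -> Tuple[int, int]:
    """Assumed realistic rule: leave at the first exit whose top
    softmax probability reaches `threshold`; otherwise use the last exit.
    Returns (exit index, predicted class)."""
    for i, probs in enumerate(exit_probs):
        if max(probs) >= threshold:
            return i, argmax(probs)
    last = len(exit_probs) - 1
    return last, argmax(exit_probs[last])


def oracle_exit(exit_probs: List[Sequence[float]],
                label: int) -> Tuple[int, int]:
    """Oracle rule: leave at the first exit that predicts the true label
    (needs the label, so it is only an upper bound); else use the last exit."""
    for i, probs in enumerate(exit_probs):
        if argmax(probs) == label:
            return i, label
    last = len(exit_probs) - 1
    return last, argmax(exit_probs[last])


# Hypothetical per-exit class probabilities for one corrupted sample (true class 1).
# The first exit is confidently wrong, as a miscalibrated model might be.
sample = [[0.95, 0.05], [0.40, 0.60], [0.30, 0.70]]
print(confidence_exit(sample, threshold=0.9))  # exits at exit 0 with class 0
print(oracle_exit(sample, label=1))            # exits at exit 1 with class 1
```

In this toy case the overconfident first exit makes the threshold rule stop early with a wrong prediction, while the oracle stops one exit later with the correct one, which is the kind of gap the abstract attributes to poor calibration.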
We propose a Cascaded Buffered IoU (C-BIoU) tracker to track multiple objects that have irregular motions and indistinguishable appearances. When appearance features are unreliable and geometric features are confused by irregular motions, conventional Multiple Object Tracking (MOT) methods may produce unsatisfactory results. To address this issue, our C-BIoU tracker adds buffers to expand the matching space of detections and tracks, which mitigates the effect of irregular motions in two ways: it directly matches identical but non-overlapping detections and tracks in adjacent frames, and it compensates for motion estimation bias in the matching space. In addition, to reduce the risk of overexpanding the matching space, cascaded matching is employed: alive tracks and detections are first matched with a small buffer, and the remaining unmatched tracks and detections are then matched with a large buffer. Despite its simplicity, our C-BIoU tracker works surprisingly well and achieves state-of-the-art results on MOT datasets that focus on irregular motions and indistinguishable appearances. Moreover, the C-BIoU tracker is the dominant component of our second-place solutions in the CVPR'22 SoccerNet MOT and ECCV'22 MOTComplex DanceTrack challenges. Finally, we analyze the limitations of our C-BIoU tracker in ablation studies and discuss its application scope.
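The buffered matching and the two-stage cascade described above can be sketched in Python. This is not the authors' implementation: the abstract does not specify how buffers are sized or how assignment is solved, so the sketch assumes each box is expanded in proportion to its own width and height, uses a simple greedy assignment, and picks illustrative buffer ratios and an illustrative IoU threshold. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def buffer_box(box: Box, ratio: float) -> Box:
    """Expand a box on every side by `ratio` times its width/height (assumed scheme)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 - ratio * w, y1 - ratio * h, x2 + ratio * w, y2 + ratio * h)


def iou(a: Box, b: Box) -> float:
    """Plain intersection-over-union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def buffered_iou(a: Box, b: Box, ratio: float) -> float:
    """IoU computed after both boxes have been expanded by the same buffer ratio."""
    return iou(buffer_box(a, ratio), buffer_box(b, ratio))


def greedy_match(tracks: List[Box], dets: List[Box], track_ids: List[int],
                 det_ids: List[int], ratio: float, min_iou: float) -> Dict[int, int]:
    """Greedy one-to-one assignment by descending buffered IoU (a simplification)."""
    scored = sorted(((buffered_iou(tracks[t], dets[d], ratio), t, d)
                     for t in track_ids for d in det_ids), reverse=True)
    matches: Dict[int, int] = {}
    used_dets = set()
    for score, t, d in scored:
        if score < min_iou:
            break
        if t in matches or d in used_dets:
            continue
        matches[t] = d
        used_dets.add(d)
    return matches


def cascaded_match(tracks: List[Box], dets: List[Box], small: float = 0.3,
                   large: float = 0.5, min_iou: float = 0.1) -> Dict[int, int]:
    """Stage 1 uses the small buffer; stage 2 retries the leftovers with the large one."""
    t_ids, d_ids = list(range(len(tracks))), list(range(len(dets)))
    first = greedy_match(tracks, dets, t_ids, d_ids, small, min_iou)
    taken = set(first.values())
    rest_t = [t for t in t_ids if t not in first]
    rest_d = [d for d in d_ids if d not in taken]
    second = greedy_match(tracks, dets, rest_t, rest_d, large, min_iou)
    return {**first, **second}
```

With these illustrative settings, a track at `(0, 0, 10, 10)` and a detection at `(11, 0, 21, 10)` have zero plain IoU but are matched in the first stage, while a detection at `(14, 0, 24, 10)` is only picked up by the larger buffer in the second stage.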
This is our 2nd-place solution for the ECCV 2022 Multiple People Tracking in Group Dance Challenge. Our method consists of two main steps: online short-term tracking using our Cascaded Buffer-IoU (C-BIoU) tracker, and offline long-term tracking using appearance features and hierarchical clustering. Our C-BIoU tracker adds buffers to expand the matching space of detections and tracks, which mitigates the effect of irregular motions in two ways: it directly matches identical but non-overlapping detections and tracks in adjacent frames, and it compensates for motion estimation bias in the matching space. In addition, to reduce the risk of overexpanding the matching space, cascaded matching is employed: alive tracks and detections are first matched with a small buffer, and the remaining unmatched tracks and detections are then matched with a large buffer. After using our C-BIoU tracker for online tracking, we applied the offline refinement introduced by ReMOTS.
This is our second-place solution for the CVPR 2022 SoccerNet Tracking Challenge. Our method consists of two main steps: online short-term tracking using our Cascaded Buffer-IoU (C-BIoU) tracker, and offline long-term tracking using appearance features and hierarchical clustering. At each step, online tracking yielded HOTA scores near 90, and offline tracking further improved HOTA scores to around 93.2.
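Transient noise appearing in the data from gravitational-wave detectors frequently causes problems, such as instability of the detectors and overlapping with or mimicking gravitational-wave signals. Because transient noise is considered to be associated with the environment and the instruments, classifying it would help in understanding its origin and improving detector performance. A previous study proposed an architecture for classifying transient noise from time-frequency 2D images (spectrograms), which uses unsupervised deep learning combining a variational autoencoder with invariant information clustering. The proposed unsupervised-learning architecture is applied to the Gravity Spy dataset, which consists of transient noise from the Advanced Laser Interferometer Gravitational-Wave Observatory (Advanced LIGO) together with its associated metadata, to discuss its potential for online or offline data analysis. In this study, focusing on the Gravity Spy dataset, the training process of the unsupervised-learning architecture from the previous study is examined and reported.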
Generating realistic lip motion from audio to simulate speech production is critical for driving natural character animation. Previous research has shown that traditional metrics used to optimize and assess models for generating lip motion from speech are not a good indicator of subjective opinion of animation quality. Devising metrics that align with subjective opinion first requires understanding what impacts human perception of quality. In this work, we focus on the degree of articulation and run a series of experiments to study how articulation strength impacts human perception of lip motion accompanying speech. Specifically, we study how increasing under-articulated (dampened) and over-articulated (exaggerated) lip motion affects human perception of quality. We examine the impact of articulation strength on human perception when considering only lip motion, where viewers are presented with talking faces represented by landmarks, and in the context of embodied characters, where viewers are presented with photo-realistic videos. Our results show that viewers prefer over-articulated lip motion consistently more than under-articulated lip motion and that this preference generalizes across different speakers and embodiments.
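Machine learning models are trained to minimize the mean loss for a single metric, and thus typically do not consider fairness and robustness. Neglecting such metrics in training can make these models prone to fairness violations when the training data are imbalanced or the test distribution differs. This work introduces Fairness Optimized Reweighting via Meta-Learning (FORML), a training algorithm that balances fairness and robustness with accuracy by jointly learning training sample weights and neural network parameters. The approach increases model fairness by learning to balance the contributions of over- and under-represented subgroups through dynamic reweighting of the data, learned from a user-specified held-out set. FORML improves equality-of-opportunity fairness criteria on image classification tasks, reduces the bias from corrupted labels, and facilitates building fairer datasets via data condensation. These improvements are achieved without pre-processing the data or post-processing the model outputs, without learning an additional weighting function, and without changing the model architecture, while maintaining accuracy on the original predictive metric.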
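For readers unfamiliar with the equality-of-opportunity criterion mentioned above, it is commonly measured as the gap in true positive rate between subgroups. The Python sketch below shows that standard measurement only; it is not the FORML training procedure, and the function names and toy data are hypothetical.

```python
from typing import Dict, Hashable, Sequence


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual positives (label 1) that were predicted positive."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_positives:
        raise ValueError("group has no positive examples")
    return sum(preds_on_positives) / len(preds_on_positives)


def equal_opportunity_gap(y_true: Sequence[int], y_pred: Sequence[int],
                          groups: Sequence[Hashable]) -> float:
    """Largest difference in true positive rate between any two subgroups.
    A gap of 0 means equality of opportunity holds exactly on this data."""
    rates: Dict[Hashable, float] = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = true_positive_rate([y_true[i] for i in idx],
                                      [y_pred[i] for i in idx])
    return max(rates.values()) - min(rates.values())


# Toy data: group "a" has TPR 1.0, group "b" has TPR 0.5.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
groups = ["a", "a", "b", "b", "a", "b"]
print(equal_opportunity_gap(y_true, y_pred, groups))
```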
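Providing feedback on learners' argumentation is essential for developing critical thinking skills, but it requires a great deal of time and effort. To reduce the burden on teachers, we aim to automate the process of giving feedback, in particular giving diagnostic comments that point out the weaknesses inherent in an argument. It is advisable to give specific diagnostic comments so that learners can recognize the diagnosis without misunderstanding it. However, it is not obvious how the task of providing specific diagnostic comments should be formulated. We formulate the task as template selection and slot filling, which makes automatic evaluation easier and the model's behavior more tractable. The key to this formulation is whether a template set that is sufficient for practical use can be created. In this paper, we define three criteria that a template set should satisfy (expressiveness, informativeness, and uniqueness) and, as a first trial, verify the feasibility of creating a template set that satisfies them. Through an annotation study that converts diagnostic comments given as free text into the template format, we show that this is feasible. The corpus used in the annotation study is publicly available.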
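We trained deep neural networks (DNNs) as a function of the neutrino energy density, flux, and fluid velocity to reproduce the Eddington tensor for neutrinos obtained in our first-principles core-collapse supernova (CCSN) simulations. Although the moment method, one of the most popular approximations for neutrino transport, requires a closure relation, none of the analytical closure relations commonly employed in the literature captures all aspects of the neutrino angular distribution in momentum space. In this paper, we develop a closure relation using a DNN that takes the neutrino energy density, flux, and fluid velocity as input and returns the Eddington tensor as output. We consider two kinds of DNNs: a conventional DNN, named the component-wise neural network (CWNN), and a tensor-basis neural network (TBNN). We find that the diagonal components of the Eddington tensor are reproduced better by the DNNs than by the M1 closure relation, especially at low to intermediate energies. For the off-diagonal components, the DNNs agree with the Boltzmann solver better than the M1 closure does at large radii. In the comparison between the two DNNs, the TBNN performs slightly better than the CWNN. With these new DNN-based closure relations, which reproduce the Eddington tensor well at a much smaller cost, we open up a new possibility for the moment method.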
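For reference, one widely used analytic form of the M1 closure is the Levermore prescription, given below as an assumption: the abstract does not state which M1 variant is used or in which frame it is evaluated. Here $E$ is the neutrino energy density, $F^i$ the energy flux, $P^{ij}$ the pressure tensor, and $k^{ij}$ the Eddington tensor that the DNNs are trained to reproduce.

```latex
% Eddington tensor and an analytic M1 closure (Levermore form, assumed for illustration)
\begin{align}
  k^{ij} &= \frac{P^{ij}}{E}
          = \frac{1-\chi}{2}\,\delta^{ij} + \frac{3\chi-1}{2}\,n^{i}n^{j},
  \qquad n^{i} = \frac{F^{i}}{|F|}, \\
  \chi(f) &= \frac{3+4f^{2}}{5+2\sqrt{4-3f^{2}}},
  \qquad f = \frac{|F|}{cE}.
\end{align}
```

The two limits behave as expected: $f \to 0$ gives $\chi = 1/3$ (isotropic, optically thick), and $f \to 1$ gives $\chi = 1$ (free streaming). Because $k^{ij}$ depends only on $f$ and the flux direction in this form, it cannot represent every angular distribution, which is the limitation the abstract points to.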