The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
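The patch-based training most respondents relied on amounts to sampling fixed-size sub-volumes from each large image at training time. A minimal sketch of this idea in Python (array sizes, patch shape, and function names are illustrative, not taken from any surveyed solution):

```python
import numpy as np

def sample_patches(volume, patch_size=(64, 64, 64), n_patches=8, rng=None):
    """Draw random fixed-size patches from a 3D volume too large to process at once."""
    rng = rng or np.random.default_rng()
    patches = []
    for _ in range(n_patches):
        # Random corner that keeps the patch fully inside the volume.
        corner = [rng.integers(0, s - p + 1) for s, p in zip(volume.shape, patch_size)]
        window = tuple(slice(c, c + p) for c, p in zip(corner, patch_size))
        patches.append(volume[window])
    return np.stack(patches)

# Example: a CT-sized volume that would not fit on the GPU in one piece.
volume = np.random.rand(512, 512, 256).astype(np.float32)
batch = sample_patches(volume)  # shape: (8, 64, 64, 64)
```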
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in fields such as computer vision, natural language processing, and speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies in open-ended, task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of data size, model size, and data diversity, based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer.github.io
Understanding objects is a central building block of artificial intelligence, especially for embodied AI. Even though object recognition excels with deep learning, current machines still struggle to learn higher-level knowledge, e.g., what attributes an object has, and what we can do with an object. In this work, we propose a challenging Object Concept Learning (OCL) task to push the envelope of object understanding. It requires machines to reason about object affordances and simultaneously give the reason: what attributes make an object possess these affordances. To support OCL, we build a densely annotated knowledge base including extensive labels for three levels of object concepts (category, attribute, affordance), as well as the causal relations among the three levels. By analyzing the causal structure of OCL, we present a baseline, the Object Concept Reasoning Network (OCRN). It leverages causal intervention and concept instantiation to infer the three levels following their causal relations. In experiments, OCRN effectively infers object knowledge while following the causalities well. Our data and code are available at https://mvig-rhos.com/ocl.
End-to-end autonomous driving provides a feasible way to automatically maximize overall driving system performance by directly mapping the raw pixels from a front-facing camera to control signals. Recent advanced methods construct a latent world model to map the high-dimensional observations into a compact latent space. However, the latent states embedded by the world models proposed in previous works may contain a large amount of task-irrelevant information, resulting in low sampling efficiency and poor robustness to input perturbations. Meanwhile, the training data distribution is usually unbalanced, and the learned policy struggles to cope with corner cases during the driving process. To address these challenges, we present a semantic masked recurrent world model (SEM2), which introduces a latent filter to extract key task-relevant features and reconstruct a semantic mask via the filtered features, and which is trained with a multi-source data sampler that aggregates common data and multiple corner-case data in a single batch to balance the data distribution. Extensive experiments on CARLA show that our method outperforms the state-of-the-art approaches in terms of sample efficiency and robustness to input perturbations.
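The multi-source sampler can be pictured as filling each training batch with fixed quotas from a common-data pool and several corner-case pools. A hedged sketch, with pool names, batch size, and quota entirely assumed rather than taken from the paper:

```python
import random

def balanced_batch(common_pool, corner_pools, batch_size=64, corner_frac=0.5):
    """Aggregate common data and corner-case data in a single batch.

    corner_pools: one list of samples per corner-case category (e.g.,
    rare weather, dense traffic). The 50/50 quota is an assumption.
    """
    n_corner = int(batch_size * corner_frac)
    batch = random.sample(common_pool, batch_size - n_corner)
    # Spread the corner-case quota evenly across the corner-case pools.
    per_pool = max(1, n_corner // len(corner_pools))
    for pool in corner_pools:
        batch.extend(random.sample(pool, min(per_pool, len(pool))))
    random.shuffle(batch)
    return batch
```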
Since its inception in 2016, the Alexa Prize program has enabled hundreds of university students to explore and compete in developing conversational agents through the SocialBot Grand Challenge. The goal of the challenge is to build agents capable of conversing coherently and engagingly with humans on popular topics for 20 minutes, while achieving an average rating of at least 4.0/5.0. However, as conversational agents attempt to help users accomplish increasingly complex tasks, new conversational AI techniques and evaluation platforms are needed. The Alexa Prize TaskBot Challenge, established in 2021, builds on the success of the SocialBot Challenge by introducing the requirement of interactively assisting humans with real-world cooking and do-it-yourself tasks, while using both voice and visual modalities. The challenge requires TaskBots to identify and understand the user's needs, to identify and integrate task and domain knowledge, and to develop new ways of engaging users without distracting them from the task at hand, among other challenges. This paper provides an overview of the TaskBot Challenge, describes the infrastructure support provided to the teams through the CoBot Toolkit, and summarizes the approaches the participating teams took to overcome the research challenges. Finally, it analyzes the performance of the competing TaskBots during the first year of the competition.
Risk scoring systems have been widely deployed in many applications; they assign risk scores to users based on their behavior sequences. Although many deep learning methods with sophisticated designs have achieved promising results, their black-box nature hinders their application due to fairness, explainability, and compliance considerations. In these sensitive scenarios, rule-based systems are considered reliable. However, building rule systems is labor-intensive: experts need to find informative statistics from user behavior sequences, design rules based on the statistics, and assign weights to each rule. In this paper, we bridge the gap between effective-but-black-box models and transparent rule models. We propose a two-stage method, RuDi, that distills the knowledge of a black-box teacher model into a rule-based student model. In the first stage, a Monte Carlo tree search-based statistics generation method produces a set of informative statistics. The statistics are then composed into logical rules with our proposed neural logic networks by mimicking the outputs of the teacher model. We evaluate RuDi on three real-world public datasets and one industrial dataset to demonstrate its effectiveness.
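The second stage, mimicking the teacher's outputs with a transparent model, can be illustrated with a deliberately simplified linear stand-in for the paper's neural logic networks (the rule matrix, loss, and optimizer here are assumptions for illustration only):

```python
import numpy as np

def distill_rule_weights(stats, teacher_scores, lr=0.1, epochs=200):
    """Fit interpretable per-rule weights so that rule scores mimic a
    black-box teacher's risk scores.

    stats: (n_samples, n_rules) binary matrix; stats[i, j] = 1 if the
    j-th statistic/rule from stage one fires on sample i.
    teacher_scores: (n_samples,) scores produced by the teacher model.
    """
    w = np.zeros(stats.shape[1])
    for _ in range(epochs):
        residual = stats @ w - teacher_scores
        w -= lr * stats.T @ residual / len(teacher_scores)  # gradient of MSE
    return w  # one transparent weight per rule
```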
Demand estimation plays an important role in dynamic pricing, where the optimal price can be obtained by maximizing revenue based on the demand curve. In online hotel booking platforms, the demand, or occupancy, of rooms varies with room type and fluctuates over time, making accurate occupancy estimation challenging. In this paper, we propose a novel hotel demand function that explicitly models the price elasticity of demand for occupancy prediction, and design a price-elasticity prediction model to learn the dynamic price-elasticity coefficient from a variety of influencing factors. Our model consists of carefully designed elasticity learning modules to alleviate the endogeneity problem, and is trained in a multi-task framework to address data sparsity. We conduct comprehensive experiments on real-world datasets and validate that our method outperforms state-of-the-art baselines for both occupancy prediction and dynamic pricing.
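The abstract does not spell out the demand function, but the classic constant-elasticity form conveys what "explicitly modeling price elasticity" means here; the notation below is assumed for illustration, not taken from the paper:

```latex
% d_0: base occupancy at reference price p_0; \epsilon > 0: the (dynamic,
% learned) price-elasticity coefficient. Revenue R is maximized over p.
d(p) = d_0 \left( \frac{p}{p_0} \right)^{-\epsilon},
\qquad
R(p) = p \cdot d(p),
\qquad
p^\ast = \arg\max_p R(p)
```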
Recently, adversarial machine learning attacks have posed serious security threats to practical audio signal classification systems, including speech recognition, speaker recognition, and music copyright detection. Previous studies have mainly focused on ensuring the effectiveness of attacks on audio signal classifiers by generating small, noise-like perturbations of the original signal. It remains unclear whether an attacker can create audio signal perturbations that, beyond being effective attacks, are also perceived favorably by humans. This is particularly important for music signals, which are carefully crafted with pleasing audio characteristics. In this work, we formulate adversarial attacks on music signals as a new perception-aware attack framework that incorporates human studies into adversarial attack design. Specifically, we conduct a human study to quantify human perception of changes to music signals. We invite human participants to rate their perceived deviation for pairs of original and perturbed music signals, and reverse-engineer the human perception process via regression analysis to predict the human-perceived deviation of a given signal. The perception-aware attack is then formulated as an optimization problem that finds the optimal perturbation signal minimizing the perceived deviation predicted by the regressed human perception model. We use the perception-aware framework to design a realistic adversarial music attack against YouTube's copyright detector. Experiments show that the perception-aware attack produces adversarial music with significantly better perceptual quality than prior work.
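In symbols, the perception-aware attack described above can be read as the following optimization (the notation is assumed, not taken from the paper):

```latex
% x: original music signal, \delta: perturbation, \hat{g}: regressed model
% of human-perceived deviation, C: target classifier (e.g., the copyright
% detector). Minimize perceived deviation while forcing a misclassification.
\min_{\delta} \; \hat{g}(x, x + \delta)
\quad \text{s.t.} \quad C(x + \delta) \neq C(x)
```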
Recently, deep neural networks (DNNs) have been used to reduce bandwidth and improve the quality of Internet video delivery. Existing methods train a corresponding content-aware super-resolution (SR) model for each video chunk on the server, and stream the low-resolution (LR) video chunks together with the SR models to the client. Although they achieve promising results, the huge computational cost of network training limits their practical application. In this paper, we propose a method named Efficient Meta-Tuning (EMT) to reduce the computational cost. Instead of training from scratch, EMT adapts a meta-learned model to the first chunk of the input video. For the following chunks, it selects a subset of the parameters via gradient masking of the previously adapted model. To further accelerate EMT, we propose a novel sampling strategy to extract the most challenging patches from video frames. The proposed strategy is efficient and brings negligible additional cost. Our method significantly reduces the computational cost while achieving better performance, paving the way for applying neural video delivery techniques to practical applications. We conduct extensive experiments based on various efficient SR architectures, including ESPCN, SRCNN, FSRCNN, and EDSR-1, demonstrating the generalization ability of our work. The code is released at https://github.com/neural-video-delivery/emt-pytorch-eccv2022.
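The gradient-masking step can be pictured as freezing all but the most update-sensitive parameters before adapting to the next chunk. A hedged PyTorch-style sketch that simplifies the selection to whole tensors (the keep ratio and criterion are assumptions, not the paper's exact procedure):

```python
import torch

def freeze_low_gradient_params(model, keep_ratio=0.1):
    """Keep only the top fraction of parameters (by gradient magnitude)
    trainable; assumes backward() was called on the previous chunk's loss.
    """
    grads = torch.cat([p.grad.abs().flatten() for p in model.parameters()
                       if p.grad is not None])
    k = max(1, int(keep_ratio * grads.numel()))
    threshold = torch.topk(grads, k).values.min()
    for p in model.parameters():
        # Coarse per-tensor decision for brevity; a real mask could act
        # on individual entries instead.
        if p.grad is None or p.grad.abs().max() < threshold:
            p.requires_grad_(False)
```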
Typical text spotters follow a two-stage spotting strategy: they first detect the precise boundary of a text instance, and then perform text recognition within the located text region. Although this strategy has achieved substantial progress, there are two underlying limitations. 1) The performance of text recognition depends heavily on the precision of text detection, resulting in potential error propagation from detection to recognition. 2) The RoI cropping that bridges detection and recognition brings in background noise and leads to information loss when pooling or interpolating from feature maps. In this work, we propose the Single-shot Self-Reliant Scene Text Spotter (SRSTS), which circumvents these limitations by decoupling recognition from detection. Specifically, we conduct text detection and recognition in parallel and bridge them by shared positive anchor points. Consequently, our method is able to recognize text instances correctly even when the exact text boundaries are challenging to detect. Moreover, our method substantially lowers the annotation cost for text detection. Extensive experiments on both regular-shaped and arbitrarily-shaped text benchmarks demonstrate that our SRSTS compares favorably with previous state-of-the-art spotters in terms of both accuracy and efficiency.