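In this paper, we study how to improve the adversarial robustness obtained by adversarial training (AT) by reducing the difficulty of optimization. To better study this problem, we establish a novel Bregman divergence perspective for AT, in which AT can be viewed as a sliding process of the training data points on the negative entropy curve. Based on this perspective, we analyze the learning objectives of two typical AT methods, PGD-AT and TRADES, and find that the optimization process of TRADES is easier than that of PGD-AT because TRADES separates PGD-AT. In addition, we discuss the function of entropy in TRADES and find that models with high entropy can be better robustness learners. Inspired by these findings, we propose two methods, FAIT and MER, which not only reduce the difficulty of optimization under 10-step PGD adversaries but also provide better robustness. Our work suggests that reducing the difficulty of optimization under 10-step PGD adversaries is a promising approach for enhancing adversarial robustness in AT.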
translated by Google Translate
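To make the contrast between the two learning objectives named in the abstract above concrete, here is a minimal NumPy sketch of their commonly cited forms: PGD-AT minimizes cross-entropy on adversarial examples, while TRADES combines clean cross-entropy with a KL term between clean and adversarial predictions. The function names, the weight `beta=6.0`, and the assumption that adversarial logits are already available (the PGD attack itself is not implemented) are illustrative choices, not details taken from the paper.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-probability of the true class.
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def kl_divergence(p, q):
    # Mean KL(p || q) over the batch.
    return (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()

def entropy(probs):
    # Mean Shannon entropy of the predicted distributions.
    return -(probs * np.log(probs)).sum(axis=-1).mean()

def pgd_at_loss(adv_logits, labels):
    # PGD-AT: cross-entropy evaluated on adversarial examples only.
    return cross_entropy(softmax(adv_logits), labels)

def trades_loss(clean_logits, adv_logits, labels, beta=6.0):
    # TRADES: clean cross-entropy plus beta * KL(clean || adversarial).
    # beta=6.0 is an assumed, illustrative weight.
    p_clean = softmax(clean_logits)
    p_adv = softmax(adv_logits)
    return cross_entropy(p_clean, labels) + beta * kl_divergence(p_clean, p_adv)
```

When the adversarial logits equal the clean logits, the KL term vanishes and both losses coincide, which is one simple way to sanity-check the sketch.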
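Accelerated MRI shortens acquisition time by subsampling in the measurement $\kappa$-space. Recovering a high-fidelity anatomical image from subsampled measurements requires close cooperation between two components: (1) a sampler that chooses the subsampling pattern and (2) a reconstructor that recovers images from incomplete measurements. In this paper, we leverage the sequential nature of MRI measurements and propose a fully differentiable framework that jointly learns a sequential sampling policy together with a reconstruction strategy. This co-designed framework is able to adapt during acquisition in order to capture the most informative measurements for a particular target. Experimental results on the fastMRI knee dataset demonstrate that the proposed approach successfully utilizes intermediate information during the sampling process to boost reconstruction performance. In particular, our proposed method outperforms the current state-of-the-art $\kappa$-space sampling baseline on more than 96% of test samples. We also investigate the individual and collective benefits of the sequential sampling and co-design strategies.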
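The sampler/reconstructor split described in the abstract above can be illustrated with a fixed, non-learned baseline: a random Cartesian column mask as the "sampler" and a zero-filled inverse FFT as the "reconstructor". This is only a sketch under stated assumptions. The function names, the central-band heuristic, and the mask parameters are hypothetical, and nothing here reproduces the learned sequential policy proposed in the paper.

```python
import numpy as np

def cartesian_mask(num_cols, keep_fraction, num_center, rng):
    """Boolean mask over k-space columns: a central band plus random extra columns."""
    mask = np.zeros(num_cols, dtype=bool)
    start = (num_cols - num_center) // 2
    mask[start:start + num_center] = True  # always keep low frequencies
    num_extra = max(int(round(keep_fraction * num_cols)) - num_center, 0)
    candidates = np.flatnonzero(~mask)
    mask[rng.choice(candidates, size=num_extra, replace=False)] = True
    return mask

def zero_filled_reconstruction(image, mask):
    """Subsample the centred k-space of `image` column-wise and invert with zeros filled in."""
    kspace = np.fft.fftshift(np.fft.fft2(image))
    kspace_sub = kspace * mask[np.newaxis, :]  # unmeasured columns become zero
    return np.abs(np.fft.ifft2(np.fft.ifftshift(kspace_sub)))

# Usage: 4x acceleration on a random non-negative test image.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
mask = cartesian_mask(num_cols=32, keep_fraction=0.25, num_center=4, rng=rng)
recon = zero_filled_reconstruction(image, mask)
error = np.linalg.norm(recon - image) / np.linalg.norm(image)
```

A sequential policy would instead choose each new column after inspecting the reconstruction obtained from the columns acquired so far.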
Different video understanding tasks are typically treated in isolation, and even with distinct types of curated data (e.g., classifying sports in one dataset, tracking animals in another). However, in wearable cameras, the immersive egocentric perspective of a person engaging with the world around them presents an interconnected web of video understanding tasks -- hand-object manipulations, navigation in the space, or human-human interactions -- that unfold continuously, driven by the person's goals. We argue that this calls for a much more unified approach. We propose EgoTask Translation (EgoT2), which takes a collection of models optimized on separate tasks and learns to translate their outputs for improved performance on any or all of them at once. Unlike traditional transfer or multi-task learning, EgoT2's flipped design entails separate task-specific backbones and a task translator shared across all tasks, which captures synergies between even heterogeneous tasks and mitigates task competition. Demonstrating our model on a wide array of video tasks from Ego4D, we show its advantages over existing transfer paradigms and achieve top-ranked results on four of the Ego4D 2022 benchmark challenges.
Recently, massive architectures based on Convolutional Neural Networks (CNNs) and self-attention mechanisms have become necessary for audio classification. While these techniques are state-of-the-art, their effectiveness can only be guaranteed with huge computational costs and parameter counts, large amounts of data augmentation, transfer from large datasets, and other tricks. By exploiting the lightweight nature of audio, we propose an efficient network structure called Paired Inverse Pyramid Structure (PIP) and a network called Paired Inverse Pyramid Structure MLP Network (PIPMN). The PIPMN reaches 96% Environmental Sound Classification (ESC) accuracy on the UrbanSound8K dataset and 93.2% Music Genre Classification (MGC) accuracy on the GTZAN dataset, with only 1 million parameters. Both results are achieved without data augmentation or model transfer. Public code is available at: https://github.com/JNAIC/PIPMN
Fact verification has attracted a lot of research attention recently, e.g., in journalism, marketing, and policymaking, as misinformation and disinformation online can sway one's opinion and affect one's actions. While fact-checking is a hard task in general, in many cases, false statements can be easily debunked based on analytics over tables with reliable information. Hence, table-based fact verification has recently emerged as an important and growing research area. Yet, progress has been limited due to the lack of datasets that can be used to pre-train language models (LMs) to be aware of common table operations, such as aggregating a column or comparing tuples. To bridge this gap, in this paper we introduce PASTA, a novel state-of-the-art framework for table-based fact verification via pre-training with synthesized sentence-table cloze questions. In particular, we design six types of common sentence-table cloze tasks, including Filter, Aggregation, Superlative, Comparative, Ordinal, and Unique, based on which we synthesize a large corpus consisting of 1.2 million sentence-table pairs from WikiTables. PASTA uses a recent pre-trained LM, DeBERTaV3, and further pretrains it on our corpus. Our experimental results show that PASTA achieves new state-of-the-art performance on two table-based fact verification benchmarks: TabFact and SEM-TAB-FACTS. In particular, on the complex set of TabFact, which contains multiple operations, PASTA largely outperforms the previous state of the art by 4.7 points (85.6% vs. 80.9%), and the gap between PASTA and human performance on the small TabFact test set is narrowed to just 1.5 points (90.6% vs. 92.1%).
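As a rough illustration of what a synthesized sentence-table cloze question might look like, the sketch below builds an aggregation-style or superlative-style cloze from a small table. The sentence template, the `[MASK]` token, the `OPERATIONS` mapping, and the function name `make_cloze` are all hypothetical. The abstract above names the six task types but does not specify how PASTA generates its corpus.

```python
# Hypothetical mapping from an operation name to the function that answers it.
OPERATIONS = {"sum": sum, "maximum": max, "minimum": min}

def make_cloze(table, column, operation):
    """Return a (masked sentence, answer) pair for one numeric column of `table`.

    `table` is a list of row dictionaries; the answer is computed from the table,
    so a model must reason over the column to fill in the mask.
    """
    values = [row[column] for row in table]
    answer = OPERATIONS[operation](values)
    sentence = f"The {operation} of {column} is [MASK]."
    return sentence, answer

# Usage on a toy table.
table = [
    {"team": "A", "points": 3},
    {"team": "B", "points": 5},
    {"team": "C", "points": 1},
]
sentence, answer = make_cloze(table, "points", "sum")
```

Pairing each generated sentence with its source table would give one sentence-table training pair of the kind the abstract describes.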
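Analysis of human interaction is an important research topic in human motion analysis. It has been studied using either first-person vision (FPV) or third-person vision (TPV). However, the joint learning of both types of vision has so far attracted little attention. One of the reasons is the lack of suitable datasets that cover both FPV and TPV. In addition, existing benchmark datasets for either FPV or TPV have several limitations, including a limited number of samples, participants, interaction categories, and modalities. In this work, we contribute a large-scale human interaction dataset, namely the FT-HID dataset. FT-HID contains pair-aligned samples of first-person and third-person vision. The dataset was collected from 109 distinct subjects and has 90K samples across three modalities. The dataset has been validated using several existing action recognition methods. In addition, we introduce a novel multi-view interaction mechanism for skeleton sequences and a joint-learning multi-stream framework for first-person and third-person vision. Both methods yield promising results on the FT-HID dataset. It is expected that the introduction of this vision-aligned large-scale dataset will promote the development of both FPV and TPV, as well as their joint learning techniques for human action analysis. The dataset and code are available at https://github.com/endlichere/ft-hid.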
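Zero-shot action recognition (ZSAR) aims to recognize video actions that have never been seen during training. Most existing methods assume a shared semantic space between seen and unseen actions and attempt to directly learn a mapping from the visual space to the semantic space. This approach is challenged by the semantic gap between the visual space and the semantic space. This paper presents a novel method that uses object semantics as privileged information to narrow the semantic gap and thereby effectively assist learning. In particular, a simple hallucination network is proposed to implicitly extract object semantics without explicitly extracting objects, and a cross-attention module is developed to augment visual features with the object semantics. Experiments on the Olympic Sports, HMDB51, and UCF101 datasets show that the proposed method outperforms state-of-the-art methods.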
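Multimodal knowledge distillation (KD) extends traditional knowledge distillation to the area of multimodal learning. One common practice is to adopt a well-performing multimodal network as the teacher, in the hope that it can transfer its full knowledge to a unimodal student and improve the student's performance. In this paper, we investigate the efficacy of multimodal KD. We begin by providing two failure cases and demonstrate that KD is not a universal cure in multimodal knowledge transfer. We present the modality Venn diagram to understand modality relationships, and the modality focusing hypothesis, which reveals the decisive factor in the efficacy of multimodal KD. Experimental results on 6 multimodal datasets help justify our hypothesis, diagnose failure cases, and point to directions for improving distillation performance.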
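For readers unfamiliar with the teacher-student setup the abstract above builds on, the following NumPy sketch shows the standard temperature-scaled distillation loss (soft-target KL plus hard-label cross-entropy). It is the generic formulation, not the multimodal variant studied in the paper. The `temperature=4.0` and `alpha=0.5` defaults are assumed for illustration, and the teacher and student logits are assumed to be given.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled, numerically stable softmax over the last axis.
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, labels)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term: the student matches the teacher's softened distribution.
    kd = (p_teacher * (np.log(p_teacher) - np.log(p_student))).sum(axis=-1).mean()
    kd = kd * temperature ** 2  # keeps gradient magnitudes comparable across T
    # Hard-label term at temperature 1.
    p_hard = softmax(student_logits)
    ce = -np.log(p_hard[np.arange(len(labels)), labels]).mean()
    return alpha * kd + (1.0 - alpha) * ce
```

In the multimodal setting, the teacher logits would come from a network that sees several modalities while the student sees only one.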
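Point cloud denoising aims to restore clean point clouds from raw observations corrupted by noise and outliers while preserving fine-grained details. We present a novel deep learning-based denoising model that incorporates normalizing flows and noise disentanglement techniques to achieve high denoising accuracy. Unlike existing works that extract point cloud features for point-wise correction, we formulate the denoising process from the perspective of distribution learning and feature disentanglement. By treating noisy point clouds as a joint distribution of clean points and noise, the denoised results can be derived by disentangling the noise counterpart from the latent point representation, while the mapping between Euclidean and latent spaces is modeled by normalizing flows. We evaluate our method on synthesized 3D models and real-world datasets with various noise settings. Qualitative and quantitative results show that our method outperforms previous state-of-the-art deep learning-based approaches.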
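The property of normalizing flows that the abstract above relies on is an exactly invertible mapping between Euclidean and latent space. A minimal sketch of one common building block, an affine coupling layer applied to 3D points, is given below. The class name, the one-coordinate/two-coordinate split, and the random untrained weights are assumptions for illustration and are not the architecture used in the paper.

```python
import numpy as np

class AffineCoupling:
    """Toy affine coupling layer for (N, 3) point arrays.

    The first coordinate passes through unchanged and parameterizes a
    scale-and-shift of the other two, which makes the map invertible.
    """

    def __init__(self, rng):
        # Random, untrained weights standing in for a learned network.
        self.w_scale = rng.normal(size=(1, 2)) * 0.1
        self.w_shift = rng.normal(size=(1, 2))

    def _params(self, kept):
        log_scale = np.tanh(kept @ self.w_scale)  # bounded for stability
        shift = kept @ self.w_shift
        return log_scale, shift

    def forward(self, points):
        kept, moved = points[:, :1], points[:, 1:]
        log_scale, shift = self._params(kept)
        latent = np.concatenate([kept, moved * np.exp(log_scale) + shift], axis=1)
        log_det = log_scale.sum(axis=1)  # log |det Jacobian| per point
        return latent, log_det

    def inverse(self, latent):
        kept, moved = latent[:, :1], latent[:, 1:]
        log_scale, shift = self._params(kept)
        return np.concatenate([kept, (moved - shift) * np.exp(-log_scale)], axis=1)
```

Because `inverse` exactly undoes `forward`, any edit made in latent space, such as suppressing a noise component, can be mapped back to 3D coordinates without loss.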
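Point cloud upsampling aims to generate dense point clouds from given sparse ones, which is a challenging task due to the irregular and unordered nature of point sets. To address this issue, we present a novel deep learning-based model, called PU-Flow, which incorporates normalizing flows and weight prediction techniques to produce dense points uniformly distributed on the underlying surface. Specifically, we exploit the invertible characteristics of normalizing flows to transform points between Euclidean and latent spaces, and formulate the upsampling process as an ensemble of neighbouring points in the latent space, where the ensemble weights are adaptively learned from the local geometric context. Extensive experiments show that our method is competitive and, in most test cases, outperforms state-of-the-art methods in terms of reconstruction quality, proximity-to-surface accuracy, and computational efficiency. The source code will be publicly available at https://github.com/unknownue/pu-flow.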