Because manually labeling data can be costly, unsupervised domain adaptation (UDA), which transfers knowledge learned from a richly labeled source dataset to an unlabeled target dataset, is gaining increasing popularity. While extensive studies have been devoted to improving model accuracy on the target domain, the important issue of model robustness has been neglected. Worse still, conventional adversarial training (AT) methods for improving model robustness are inapplicable in the UDA scenario, because they train models on adversarial examples generated with a supervised loss function, which requires labels that the target domain lacks. In this paper, we present a new meta self-training pipeline, named SRoUDA, for improving the adversarial robustness of UDA models. Based on the self-training paradigm, SRoUDA starts by pre-training a source model, applying a UDA baseline to labeled source data and unlabeled target data with a newly developed random masked augmentation (RMA). It then alternates between adversarially training a target model on pseudo-labeled target data and fine-tuning the source model via a meta step. While self-training allows AT to be incorporated directly into UDA, the meta step in SRoUDA further helps mitigate error propagation from noisy pseudo labels. Extensive experiments on various benchmark datasets demonstrate the state-of-the-art performance of SRoUDA, which achieves significant robustness improvements without harming clean accuracy. Code is available at https://github.com/Vision.
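The abstract does not spell out how random masked augmentation (RMA) works. As a rough illustration only, the sketch below assumes RMA zeroes a random subset of square patches in an input image; the function name `random_masked_augmentation`, the patch size, and the mask ratio are hypothetical choices, not taken from SRoUDA.

```python
import random
from typing import List, Optional

Image = List[List[float]]  # single-channel H x W image as nested lists


def random_masked_augmentation(img: Image, patch: int = 4, mask_ratio: float = 0.5,
                               rng: Optional[random.Random] = None) -> Image:
    """Hypothetical RMA: tile the image into patch x patch squares and zero a random subset."""
    if rng is None:
        rng = random.Random()
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # copy so the caller's image is left untouched
    tiles = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    n_masked = int(round(mask_ratio * len(tiles)))
    for r0, c0 in rng.sample(tiles, n_masked):
        # clip the tile at the image border when h or w is not a multiple of patch
        for r in range(r0, min(r0 + patch, h)):
            for c in range(c0, min(c0 + patch, w)):
                out[r][c] = 0.0
    return out


img = [[1.0] * 8 for _ in range(8)]
aug = random_masked_augmentation(img, patch=4, mask_ratio=0.5, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the masks reproducible across runs; an 8x8 image with 4x4 patches has four tiles, so a ratio of 0.5 masks two of them.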
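This paper proposes a network, called MVSTR, for multi-view stereo (MVS). It is built on Transformers and is capable of extracting dense features with global context and 3D consistency, both of which are crucial for achieving reliable matching in MVS. Specifically, to address the limited receptive field of existing CNN-based MVS methods, a global-context Transformer module is first proposed to explore intra-view global context. In addition, to further make the dense features 3D-consistent, a 3D-geometry Transformer module is built with a carefully designed mechanism that facilitates inter-view information interaction. Experimental results show that the proposed MVSTR achieves the best overall performance on the DTU dataset and strong generalization on the Tanks and Temples benchmark dataset.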
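The difference between intra-view global context and inter-view interaction can be illustrated with plain scaled dot-product attention: self-attention lets every pixel of a view attend to every other pixel of that view, while cross-attention lets reference-view pixels attend to a source view. This is a generic sketch, not MVSTR's module; the single-head, projection-free attention and the toy feature values are assumptions made for brevity.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens (e.g. flattened pixel features)


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: every query row attends to *all* key rows."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        # each output row is a convex combination of the value rows
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))])
    return out


ref_view = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 pixel features of the reference view
src_view = [[0.5, 0.5], [2.0, 0.0]]              # 2 pixel features of a source view

intra = attention(ref_view, ref_view, ref_view)  # intra-view global context (self-attention)
inter = attention(ref_view, src_view, src_view)  # inter-view interaction (cross-attention)
```

Unlike a convolution, the receptive field here is the whole token set regardless of kernel size, which is the limitation of CNN-based MVS that the abstract targets.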
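To counter the threat of adversarial examples, adversarial training offers an attractive option: it improves model robustness by training models on online-augmented adversarial examples. However, most existing adversarial training methods focus on improving robust accuracy by strengthening the adversarial examples, but neglect the increasing shift between natural data and adversarial examples, which leads to a sharp drop in natural accuracy. To maintain the trade-off between natural and robust accuracy, we alleviate this shift from the perspective of feature adaptation and propose Feature Adaptive Adversarial Training (FAAT), which optimizes class-conditional feature adaptation across natural data and adversarial examples. Specifically, we propose incorporating a class-conditional discriminator to encourage features to be (1) class-discriminative and (2) invariant to the changes caused by adversarial attacks. The novel FAAT framework achieves the trade-off between natural and robust accuracy by generating features with similar distributions across natural and adversarial data, and achieves higher overall robustness by benefiting from class-discriminative feature characteristics. Experiments on various datasets demonstrate that FAAT produces more discriminative features and performs favorably against state-of-the-art methods. Code is available at https://github.com/visionflow/faat.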
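The analysis of human interaction is an important research topic in human motion analysis. It has been studied using either first-person vision (FPV) or third-person vision (TPV). However, joint learning of both views has so far attracted little attention. One reason is the lack of suitable datasets covering both FPV and TPV. In addition, existing benchmark datasets for either FPV or TPV have several limitations, including a limited number of samples, participants, interaction categories, and modalities. In this work, we contribute a large-scale human interaction dataset, the FT-HID dataset. FT-HID contains pair-aligned samples of first-person and third-person vision. The dataset was collected from 109 distinct subjects and contains 90K samples in three modalities. It has been validated using several existing action recognition methods. In addition, we introduce a novel multi-view interaction mechanism for skeleton sequences and a joint-learning multi-stream framework for first-person and third-person vision. Both methods yield promising results on the FT-HID dataset. We expect that the introduction of this vision-aligned, large-scale dataset will promote the development of both FPV and TPV, as well as of joint-learning techniques for human action analysis. The dataset and code are available at https://github.com/endlichere/ft-hid.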
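Contrastive 3D action representation learning has recently made great progress. However, the strict positive/negative constraint has yet to be relaxed, and the use of non-self positives has yet to be explored. In this paper, a Contrastive Positive Mining (CPM) framework is proposed for unsupervised skeleton-based 3D action representation learning. CPM identifies non-self positives in a contextual queue to boost learning. Specifically, Siamese encoders are adopted and trained to match the similarity distributions of augmented instances with reference to all instances in the contextual queue. By identifying non-self positive instances in the queue, a positive-enhanced learning strategy is proposed that leverages the knowledge of the mined positives to make the learned latent space more robust to intra-class and inter-class diversity. Experimental results show that the proposed CPM is effective and outperforms existing state-of-the-art unsupervised methods on the challenging NTU and PKU-MMD datasets.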
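One way to read the distribution-matching step is sketched below, under stated assumptions: each embedding's cosine similarities to the queue are turned into a softmax distribution, the two augmented views are aligned with a KL divergence, and the top-k queue entries are taken as candidate non-self positives. The temperature `tau`, the KL direction, and the top-k mining rule are illustrative assumptions, not CPM's published settings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def similarity_distribution(z: Vector, queue: List[Vector], tau: float = 0.2) -> List[float]:
    """Softmax over temperature-scaled cosine similarities between z and every queue entry."""
    logits = [cosine(z, q) / tau for q in queue]
    m = max(logits)
    e = [math.exp(l - m) for l in logits]
    s = sum(e)
    return [x / s for x in e]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q); used here as the loss that makes the two views' distributions agree."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mine_positives(p: List[float], k: int = 1) -> List[int]:
    """Indices of the k queue entries the anchor is most similar to (candidate non-self positives)."""
    return sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k]


queue = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy contextual queue of past embeddings
view_a, view_b = [0.9, 0.1], [0.8, 0.3]        # embeddings of two augmentations of one skeleton
p_a = similarity_distribution(view_a, queue)
p_b = similarity_distribution(view_b, queue)
loss = kl_divergence(p_b, p_a)  # align view b's distribution to view a's
positives = mine_positives(p_a, k=1)
```

Matching whole distributions over the queue, rather than a single positive/negative split, is what allows queue entries other than the instance itself to act as soft positives.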
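Because of tumor heterogeneity, predicting the clinical outcome of anticancer drugs on a personalized basis is challenging in cancer treatment. Traditional computational efforts have modeled drug response for individual samples described by their molecular profiles, but they suffer from overfitting due to the high dimensionality of omics data, which hinders their clinical application. Recent studies show that deep learning is a promising approach to building drug response models by learning alignment patterns between drugs and samples. However, existing studies adopt simple feature fusion strategies and consider only whole-drug features, ignoring substructure information that may play a crucial role in aligning drugs and genes. Hence, in this paper, we propose TCR (Transformer-based network for Cancer drug Response) to predict anticancer drug response. By leveraging an attention mechanism, TCR is able to efficiently learn the interactions between drug atoms/substructures and molecular signatures. Furthermore, a dual loss function and a cross-sampling strategy are designed to improve the prediction power of TCR. We show that TCR outperforms all other methods under various data-splitting strategies on all evaluation metrics, some by a significant margin. Extensive experiments demonstrate that TCR has significantly improved generalization ability on independent in vitro experiments and in vivo real patient data. Our study highlights the prediction power of TCR and its potential value for cancer drug repurposing and precision oncology treatment.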
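Zero-shot action recognition (ZSAR) aims to recognize video actions that have never been seen during training. Most existing methods assume a shared semantic space between seen and unseen actions and attempt to directly learn a mapping from the visual space to the semantic space. Such approaches are challenged by the semantic gap between the visual space and the semantic space. This paper proposes a novel method that uses object semantics as privileged information to narrow the semantic gap and thereby effectively assist learning. In particular, a simple hallucination network is proposed to implicitly extract object semantics without explicitly extracting objects, and a cross-attention module is developed to augment visual features with object semantics. Experiments on the Olympic Sports, HMDB51, and UCF101 datasets show that the proposed method outperforms state-of-the-art methods.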
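This paper proposes a new graph convolution operator, called central difference graph convolution (CDGC), for skeleton-based action recognition. It not only aggregates node information, like a vanilla graph convolution operation, but also introduces gradient information. Without introducing any additional parameters, CDGC can replace vanilla graph convolution in any existing graph convolutional network (GCN). In addition, an accelerated version of CDGC is developed, which greatly improves training speed. Experiments on two popular large-scale datasets, NTU RGB+D 60 and 120, demonstrate the efficacy of the proposed CDGC. Code is available at https://github.com/iesymiao/cd-gcn.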
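The abstract does not give the CDGC formula. Assuming it follows the central difference convolution (CDC) form θ·Σ_j A_ij W (x_j - x_i) + (1 - θ)·Σ_j A_ij W x_j, the operator simplifies to (Σ_j A_ij x_j - θ·x_i·Σ_j A_ij) W: a vanilla aggregation plus a gradient-like term that only needs node degrees. The sketch below implements that assumed form on nested lists; θ is treated as a fixed hyperparameter, so no learnable parameters are added beyond W.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def central_difference_graph_conv(x: Matrix, adj: Matrix, weight: Matrix,
                                  theta: float = 0.7) -> Matrix:
    """Assumed CDGC: out_i = (sum_j A_ij x_j - theta * x_i * sum_j A_ij) @ W.

    theta = 0 recovers vanilla A X W; theta = 1 gives the pure difference sum_j A_ij (x_j - x_i) W.
    """
    n, c = len(x), len(x[0])
    agg = []
    for i in range(n):
        deg = sum(adj[i])  # weighted degree of joint i
        agg.append([sum(adj[i][j] * x[j][f] for j in range(n)) - theta * deg * x[i][f]
                    for f in range(c)])
    return matmul(agg, weight)


adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # 3-joint chain
x = [[1.0], [2.0], [4.0]]  # one feature per joint
W = [[1.0]]
vanilla = central_difference_graph_conv(x, adj, W, theta=0.0)
difference = central_difference_graph_conv(x, adj, W, theta=1.0)
```

Setting `theta=0.0` reproduces a vanilla graph convolution, which makes the drop-in replacement property easy to check.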
Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, and inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes apart those from other clusters, by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
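A simplified loss in the spirit of this description is sketched below. Assumptions, not taken from CCGC's code: per-node confidence scores are given, positives are the same high-confidence node in the two views, negatives are the cross-view centers of different high-confidence clusters, and both terms use cosine similarity with equal weight. The name `cluster_guided_loss` and the `keep_ratio` parameter are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def mean(vectors: List[Vector]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_guided_loss(z1: List[Vector], z2: List[Vector], labels: List[int],
                        confidence: List[float], keep_ratio: float = 0.5) -> float:
    n = len(z1)
    # 1) keep only the most confident nodes (e.g. those closest to their cluster center)
    keep = sorted(range(n), key=lambda i: confidence[i], reverse=True)[:max(1, int(keep_ratio * n))]
    # 2) positives: the same high-confidence node seen in the two views -> pull together
    pos = sum(1.0 - cosine(z1[i], z2[i]) for i in keep) / len(keep)
    # 3) negatives: centers of *different* high-confidence clusters across views -> push apart
    clusters = sorted({labels[i] for i in keep})
    c1: Dict[int, List[float]] = {k: mean([z1[i] for i in keep if labels[i] == k]) for k in clusters}
    c2: Dict[int, List[float]] = {k: mean([z2[i] for i in keep if labels[i] == k]) for k in clusters}
    pairs = [(a, b) for a in clusters for b in clusters if a != b]
    neg = sum(cosine(c1[a], c2[b]) for a, b in pairs) / len(pairs) if pairs else 0.0
    return pos + neg


labels, conf = [0, 0, 1, 1], [1.0] * 4
separated = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
overlapping = [[1.0, 0.0], [1.0, 0.1], [1.0, 0.05], [1.0, 0.0]]
l_sep = cluster_guided_loss(separated, separated, labels, conf, keep_ratio=1.0)
l_ovl = cluster_guided_loss(overlapping, overlapping, labels, conf, keep_ratio=1.0)
```

Using cluster centers as negatives, rather than arbitrary other nodes, avoids treating two nodes of the same cluster as a negative pair; in the toy data, well-separated clusters give a small loss and overlapping clusters a large one.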
To generate high-quality rendered images for real-time applications, it is common to trace only a few samples per pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that pixels rendered at a low resolution are typically highly aliased, we present a novel neural supersampling method based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which turns supersampling into an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features, significantly improving their temporal stability. Then a reconstruction network based on a multi-scale U-Net with skip connections is adopted to reconstruct and generate the desired high-resolution image. Experimental results and comparisons show that the proposed method generates higher-quality supersampling results than current state-of-the-art methods without increasing the total number of ray-tracing samples.
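Tracing at 1/4 spp at the target resolution means one ray-traced pixel per 2x2 block, with the other three pixels left to be inferred. The sketch below builds such a sample mask and fills the gaps with a naive 3x3 neighbor average, which merely stands in for the paper's mask-reinforced network; the per-frame offset order and the function names are assumptions.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[bool]]


def quarter_spp_mask(h: int, w: int, frame: int = 0) -> Mask:
    """One traced pixel per 2x2 block (1/4 spp); the traced position cycles over frames."""
    offsets = [(0, 0), (0, 1), (1, 1), (1, 0)]  # assumed per-frame jitter order
    dy, dx = offsets[frame % 4]
    return [[r % 2 == dy and c % 2 == dx for c in range(w)] for r in range(h)]


def fill_by_neighbor_average(sparse: Image, mask: Mask) -> Image:
    """Placeholder for the network: average the traced pixels in each 3x3 neighborhood."""
    h, w = len(sparse), len(sparse[0])
    out = [row[:] for row in sparse]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                continue  # traced samples at the target resolution are kept as-is
            vals = [sparse[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))
                    if mask[rr][cc]]
            out[r][c] = sum(vals) / len(vals) if vals else 0.0
    return out


h, w = 4, 4
truth = [[5.0] * w for _ in range(h)]  # stand-in for a fully traced frame
mask = quarter_spp_mask(h, w, frame=0)
sparse = [[truth[r][c] if mask[r][c] else 0.0 for c in range(w)] for r in range(h)]
recon = fill_by_neighbor_average(sparse, mask)
```

Because the traced pixels already sit on the high-resolution grid, the reconstruction only has to fill known holes given by the mask, which is the sense in which supersampling becomes an interpolation problem.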