Mixup refers to interpolation-based data augmentation, originally motivated as a way to go beyond empirical risk minimization (ERM). Its extensions focus on the definition of interpolation and the space where it takes place, while the augmentation itself is less studied: for a mini-batch of size $m$, most methods interpolate between $m$ pairs with a single scalar interpolation factor $\lambda$ each. In this work, we make progress in this direction by introducing MultiMix, which interpolates an arbitrary number $n$ of tuples, each of length $m$, with one vector $\lambda$ per tuple. On sequence data, we further extend to dense interpolation and loss computation over all spatial positions. Overall, we increase the number of tuples per mini-batch by orders of magnitude at little additional cost. This is possible by interpolating at the very last layer before the classifier. Finally, to address inconsistencies due to linear target interpolation, we introduce a self-distillation approach to generate and interpolate synthetic targets. We empirically show that our contributions result in significant improvement over state-of-the-art mixup methods on four benchmarks. By analyzing the embedding space, we observe that the classes are more tightly clustered and uniformly spread over the embedding space, which explains the improved behavior.
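As a rough illustration of the interpolation step described above, here is a minimal PyTorch sketch; the Dirichlet sampling of the weight vectors and all names are our assumptions, not the authors' implementation:

```python
import torch

def multimix(embeddings, targets, n=1024, alpha=1.0):
    """Interpolate n tuples over a whole mini-batch of embeddings.

    embeddings: (m, d) features from the last layer before the classifier.
    targets:    (m, c) one-hot (or soft) labels.
    Returns n interpolated embeddings and targets, drawn with
    Dirichlet-distributed weights over the mini-batch (assumption).
    """
    m = embeddings.size(0)
    # One weight vector lambda per generated tuple: (n, m), rows sum to 1.
    lam = torch.distributions.Dirichlet(torch.full((m,), float(alpha))).sample((n,))
    mixed_x = lam @ embeddings          # (n, d) convex combinations
    mixed_y = lam @ targets.float()     # (n, c) interpolated targets
    return mixed_x, mixed_y

# Usage: mix in embedding space, then apply the classifier head.
feats = torch.randn(32, 512)                                       # backbone output
labels = torch.nn.functional.one_hot(torch.randint(0, 10, (32,)), 10)
mx, my = multimix(feats, labels, n=1024)
# logits = classifier(mx); loss = soft cross-entropy against my
```

Sampling one $m$-dimensional weight vector per tuple draws points from the convex hull of the whole mini-batch rather than from segments between pairs, which is what allows the number of generated examples to grow well beyond the batch size.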
Regional dropout strategies have been proposed to enhance the performance of convolutional neural network classifiers. They have proved to be effective for guiding the model to attend on less discriminative parts of objects (e.g. leg as opposed to head of a person), thereby letting the network generalize better and have better object localization capabilities. On the other hand, current methods for regional dropout remove informative pixels on training images by overlaying a patch of either black pixels or random noise. Such removal is not desirable because it leads to information loss and inefficiency during training. We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches. By making efficient use of training pixels and retaining the regularization effect of regional dropout, CutMix consistently outperforms the state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on the ImageNet weakly-supervised localization task. Moreover, unlike previous augmentation methods, our CutMix-trained ImageNet classifier, when used as a pretrained model, results in consistent performance gains in Pascal detection and MS-COCO image captioning benchmarks. We also show that CutMix improves the model robustness against input corruptions and its out-of-distribution detection performances. Source code and pretrained models are available at https://github.com/clovaai/CutMix-PyTorch.
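For reference, the core operation fits in a few lines; a minimal PyTorch sketch following the common public implementation (helper names and the Beta parameter are illustrative):

```python
import numpy as np
import torch

def rand_bbox(h, w, lam):
    """Sample a box whose area fraction is approximately (1 - lam)."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    return y1, y2, x1, x2

def cutmix(x, y, alpha=1.0):
    """Paste a random patch from a shuffled batch; mix labels by area."""
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))
    y1, y2, x1, x2 = rand_bbox(x.size(2), x.size(3), lam)
    x[:, :, y1:y2, x1:x2] = x[perm, :, y1:y2, x1:x2]
    # Recompute lambda from the actual pasted area (clipping can shrink it).
    lam = 1.0 - ((y2 - y1) * (x2 - x1)) / (x.size(2) * x.size(3))
    return x, y, y[perm], lam

# Training loss: lam * ce(logits, y_a) + (1 - lam) * ce(logits, y_b)
```

Mixing the labels by the pasted area (rather than discarding the patch) is what preserves training information while keeping the regularization effect of regional dropout.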
Deep neural networks are susceptible to adversarial attacks: small, imperceptible changes to natural inputs that affect model predictions. The most effective defense against these attacks is adversarial training, which constructs adversarial examples during training by iteratively maximizing the loss; the model is then trained to minimize the loss on these constructed examples. This min-max optimization requires more data, larger-capacity models, and additional computing resources. It also degrades the standard generalization performance of the model. Can we achieve robustness more efficiently? In this work, we explore this question from the perspective of knowledge transfer. First, we theoretically show the transferability of robustness from an adversarially trained teacher model to a student model with the help of mixup augmentation. Second, we propose a novel robustness transfer method called Mixup-Based Activated Channel Map (MixACM) transfer. MixACM transfers robustness from a robust teacher to a student by matching activated channel maps generated without expensive adversarial perturbations. Finally, extensive experiments on multiple datasets and different learning scenarios show that our method can transfer robustness while also improving generalization on natural images.
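A sketch of how such channel-map matching could look in PyTorch; the per-channel normalization, the L1 penalty, and all names are our assumptions about the method, not the reference implementation:

```python
import torch
import torch.nn.functional as F

def activated_channel_map(feat):
    """Collapse a (B, C, H, W) activation into per-channel spatial maps,
    normalized so the match focuses on which channels fire where."""
    maps = F.relu(feat)
    return F.normalize(maps.flatten(2), dim=-1)  # (B, C, H*W)

def acm_transfer_loss(student_feats, teacher_feats):
    """Match student channel maps to the robust teacher's, layer by layer.
    No adversarial perturbation is generated here, which is the point:
    robustness is transferred at the cost of a plain forward pass."""
    loss = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        loss = loss + F.l1_loss(activated_channel_map(fs),
                                activated_channel_map(ft).detach())
    return loss
```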
Mixup is a popular data augmentation technique based on creating new samples by linear interpolation between two given data samples, to improve both the generalization and robustness of the trained model. Knowledge distillation (KD), on the other hand, is widely used for model compression and transfer learning, which involves using a larger network's implicit knowledge to guide the learning of a smaller network. At first glance, these two techniques seem very different, however, we found that "smoothness" is the connecting link between the two and is also a crucial attribute in understanding KD's interplay with mixup. Although many mixup variants and distillation methods have been proposed, much remains to be understood regarding the role of a mixup in knowledge distillation. In this paper, we present a detailed empirical study on various important dimensions of compatibility between mixup and knowledge distillation. We also scrutinize the behavior of the networks trained with a mixup in the light of knowledge distillation through extensive analysis, visualizations, and comprehensive experiments on image classification. Finally, based on our findings, we suggest improved strategies to guide the student network to enhance its effectiveness. Additionally, the findings of this study provide insightful suggestions to researchers and practitioners that commonly use techniques from KD. Our code is available at https://github.com/hchoi71/MIX-KD.
Unlike conventional knowledge distillation (KD), self-KD allows a network to learn knowledge from itself without any guidance from an extra network. This paper proposes to perform self-KD from image mixture (MixSKD), which integrates these two techniques into a unified framework. MixSKD mutually distills feature maps and probability distributions between random pairs of original images and their mixup images in a meaningful way, thereby guiding the network to learn cross-image knowledge by modeling supervisory signals from the mixup images. Moreover, we construct a self-teacher network by aggregating multi-stage feature maps to provide soft labels that supervise the backbone classifier, further improving the efficacy of self-boosting. Experiments on image classification and on transfer learning to object detection and semantic segmentation demonstrate that MixSKD outperforms other state-of-the-art self-KD and data augmentation methods. The code is available at https://github.com/winycg/self-kd-lib.
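As an illustration of the probability-distribution branch, a condensed PyTorch sketch (one distillation direction shown; the temperature, weighting, and feature-map branch are omitted, and all names are assumptions):

```python
import torch
import torch.nn.functional as F

def mixskd_prob_loss(model, x1, x2, lam, T=4.0):
    """One direction of the mutual distillation on probability
    distributions: the mixed image's prediction should agree with the
    lam-weighted mixture of the original images' predictions."""
    x_mix = lam * x1 + (1 - lam) * x2
    with torch.no_grad():                     # targets are detached
        p1 = F.softmax(model(x1) / T, dim=1)
        p2 = F.softmax(model(x2) / T, dim=1)
    log_p_mix = F.log_softmax(model(x_mix) / T, dim=1)
    target = lam * p1 + (1 - lam) * p2
    return F.kl_div(log_p_mix, target, reduction="batchmean") * T * T
```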
Transformers and masked language modeling have quickly been adopted in computer vision as vision transformers and masked image modeling (MIM). In this work, we argue that masking image tokens differs from masking tokens in text, due to the number and correlation of tokens in an image. In particular, to generate a challenging pretext task for MIM, we advocate a shift from random masking to informed masking. We develop and exhibit this idea in the context of distillation-based MIM, where a teacher transformer encoder generates an attention map, which we use to guide masking for the student. We thus introduce a novel masking strategy, called attention-guided masking (AttMask), and we demonstrate its effectiveness over random masking for dense distillation-based MIM as well as for plain distillation-based self-supervised learning. We confirm that AttMask accelerates the learning process and improves performance on a variety of downstream tasks. We provide the implementation code at https://github.com/gkakogeorgiou/attmask.
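A minimal sketch of the masking step as described: rank patch tokens by the teacher's [CLS] attention and hide the most-attended ones from the student. The head pooling and mask ratio below are illustrative assumptions:

```python
import torch

def attmask(cls_attention, mask_ratio=0.4):
    """Build a boolean token mask from teacher attention.

    cls_attention: (B, heads, N) attention from the [CLS] token to the
    N patch tokens in the teacher's last layer.
    Returns (B, N), True where a token should be masked (hidden from
    the student), preferring the highly-attended, salient tokens.
    """
    scores = cls_attention.mean(dim=1)              # pool heads -> (B, N)
    n_mask = int(scores.size(1) * mask_ratio)
    idx = scores.topk(n_mask, dim=1).indices        # most-attended tokens
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return mask
```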
Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. These high-performing vision transformers are pre-trained with hundreds of millions of images using a large infrastructure, thereby limiting their adoption. In this work, we produce competitive convolution-free transformers by training on ImageNet only. We train them on a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop) on ImageNet with no external data. More importantly, we introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention. We show the interest of this token-based distillation, especially when using a convnet as a teacher. This leads us to report results competitive with convnets for both ImageNet (where we obtain up to 85.2% accuracy) and when transferring to other tasks. We share our code and models.
Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. These high-performing vision transformers are pre-trained with hundreds of millions of images using a large infrastructure, thereby limiting their adoption. In this work, we produce competitive convolution-free transformers trained on ImageNet only, using a single computer, in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop) on ImageNet with no external data. We also introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention, typically from a convnet teacher. The learned transformers are competitive (85.2% top-1 acc.) with the state of the art on ImageNet, and similarly when transferred to other tasks. We will share our code and models.
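A sketch of the distillation-token idea in PyTorch, following the hard-distillation variant: the class head is trained on the true label while the distillation head is trained on the teacher's decision. The module wiring (in particular the encoder interface) is an assumption:

```python
import torch
import torch.nn.functional as F

class DistillableViT(torch.nn.Module):
    """Wraps a ViT body that consumes [CLS] + distillation tokens and
    returns the final state of both (assumed interface)."""
    def __init__(self, body, dim, num_classes):
        super().__init__()
        self.body = body                                    # any ViT encoder
        self.dist_token = torch.nn.Parameter(torch.zeros(1, 1, dim))
        self.head = torch.nn.Linear(dim, num_classes)       # class-token head
        self.head_dist = torch.nn.Linear(dim, num_classes)  # distill-token head

    def forward(self, x):
        cls_out, dist_out = self.body(x, self.dist_token)   # assumed interface
        return self.head(cls_out), self.head_dist(dist_out)

def deit_hard_loss(logits_cls, logits_dist, labels, teacher_logits):
    # Class head learns the label; distill head learns the teacher's decision.
    return 0.5 * F.cross_entropy(logits_cls, labels) \
         + 0.5 * F.cross_entropy(logits_dist, teacher_logits.argmax(dim=1))
```

Because the distillation token attends to the same patch tokens as [CLS], the student learns from the teacher "through attention" rather than only through the output logits.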
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
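The principle is a few lines of code; a minimal PyTorch sketch (the helper name is ours; α is the Beta-distribution parameter from the paper):

```python
import numpy as np
import torch

def mixup(x, y, alpha=0.2):
    """Convex combination of a batch with a shuffled copy of itself."""
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))
    mixed_x = lam * x + (1 - lam) * x[perm]
    return mixed_x, y, y[perm], lam

# Training step: the loss is interpolated with the same lambda.
# loss = lam * ce(model(mixed_x), y_a) + (1 - lam) * ce(model(mixed_x), y_b)
```

Interpolating the loss with the same λ as the inputs is what regularizes the network toward linear behavior between training examples.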
Data mixing augmentation has proved effective in improving the generalization ability of deep neural networks. While early methods mix samples according to hand-crafted policies (e.g., linear interpolation), recent methods utilize saliency information to match the mixed samples and labels via complex offline optimization. However, there is a trade-off between precise mixing policies and optimization complexity. To address this challenge, we propose a novel automatic mixup (AutoMix) framework, in which the mixup policy is parameterized and serves the ultimate classification goal directly. Specifically, AutoMix reformulates mixup classification into two sub-tasks (i.e., mixed sample generation and mixup classification) with corresponding sub-networks, and solves them in a bi-level optimization framework. For the generation, a learnable lightweight mixup generator, Mix Block, is designed to generate mixed samples by modeling patch-wise relationships under the direct supervision of the corresponding mixed labels. To prevent the degradation and instability of bi-level optimization, we further introduce a momentum pipeline to train AutoMix in an end-to-end manner. Extensive experiments on nine image benchmarks demonstrate the superiority of AutoMix compared with state-of-the-art methods in various classification scenarios and downstream tasks.
In this paper, we ask whether vision transformers (ViTs) can serve as an underlying architecture for improving the adversarial robustness of machine learning models against evasion attacks. While earlier works have focused on improving convolutional neural networks, we show that ViTs are also highly suitable for adversarial training, achieving competitive performance. We achieve this objective with a custom adversarial training recipe, discovered through rigorous ablation studies on a subset of the ImageNet dataset. The canonical training recipe for ViTs recommends strong data augmentation, in part to compensate for the lack of the vision inductive bias of convolutions in attention modules. We show that this recipe achieves suboptimal performance when used for adversarial training. In contrast, we find that omitting all heavy data augmentation and adding some additional bag-of-tricks ($\varepsilon$-warmup and larger weight decay) significantly boosts the performance of robust ViTs. We show that our recipe generalizes to different classes of ViT architectures and to large-scale models on the full ImageNet-1k. Moreover, investigating the reasons for the robustness of our models, we show that it is easier to generate strong attacks during training when using our recipe, and that this leads to better robustness at test time. Finally, we further study one consequence of adversarial training by proposing a way to quantify the semantic nature of adversarial perturbations and highlight its correlation with the robustness of the model. Overall, we recommend that the community avoid translating the canonical training recipe of ViTs to robust training, and rethink common training choices in the context of adversarial training.
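A sketch of the $\varepsilon$-warmup ingredient inside a standard adversarial training loop; the concrete schedule and values below are illustrative, not the paper's exact recipe:

```python
def eps_schedule(epoch, warmup_epochs=10, eps_max=4 / 255):
    """Linear epsilon warmup: weak perturbations early in training,
    the full budget afterwards."""
    return eps_max * min(1.0, (epoch + 1) / warmup_epochs)

# Illustrative usage in an adversarial training loop:
# for epoch in range(epochs):
#     eps = eps_schedule(epoch)
#     ... generate PGD adversarial examples with budget eps ...
#     ... train on them with basic augmentation and large weight decay ...
```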
What can a neural network learn about the visual world from a single image? While a single image obviously cannot contain the multitudes of possible objects, scenes and lighting conditions that exist, within the space of all possible 256^(3x224x224) 224-sized square images it might still provide a strong prior for natural images. To analyze this hypothesis, we develop a framework for training neural networks from scratch using a single image, by means of knowledge distillation from a supervised pretrained teacher. With this, we find the answer to the above question to be: "surprisingly, a lot". In quantitative terms, we find top-1 accuracies of 94%/74% on CIFAR-10/100, strong accuracy on ImageNet, and, by extending this method to audio, 84% on Speech Commands. In extensive analyses, we disentangle the effects of augmentation, the choice of source image, and network architecture, and we discover "panda neurons" in networks that have never seen a panda. This work shows that one image can be used to extrapolate to thousands of object classes, and motivates a renewed research agenda on the fundamental interplay of augmentations and images.
Few-shot learning addresses the challenge of learning how to solve novel problems given not just limited supervision but limited data as well. An attractive solution is synthetic data generation. However, most such methods are overly sophisticated, focusing on high-quality, realistic data in the input space. It is unclear whether adapting them to the few-shot regime and using them for the downstream task of classification is the right approach. Previous works on synthetic data generation for few-shot classification focus on exploiting complex models, e.g. a Wasserstein GAN with multiple regularizers or a network that transfers latent diversity from known to novel classes. We follow a different approach and investigate how a simple and straightforward synthetic data generation method can be used effectively. We make two contributions, namely we show that: (1) using a simple loss function is sufficient for training a feature generator in the few-shot setting; and (2) learning to generate tensor features instead of vector features is superior. Extensive experiments on the miniImageNet, CUB and CIFAR-FS datasets show that our method sets a new state of the art, outperforming more sophisticated few-shot data augmentation methods. The source code can be found at https://github.com/michalislazarou/tfh_fewshot.
Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes in data augmentations and optimization methods. In the literature, however, most refinements are either briefly mentioned as implementation details or only visible in source code. In this paper, we will examine a collection of such refinements and empirically evaluate their impact on the final model accuracy through ablation study. We will show that, by combining these refinements together, we are able to improve various CNN models significantly. For example, we raise ResNet-50's top-1 validation accuracy from 75.3% to 79.29% on ImageNet. We will also demonstrate that improvement on image classification accuracy leads to better transfer learning performance in other application domains such as object detection and semantic segmentation.
Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly different test examples. This includes distribution shifts, outliers, and adversarial examples. To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations. Manifold Mixup leverages semantic interpolations as additional training signal, obtaining neural networks with smoother decision boundaries at multiple levels of representation. As a result, neural networks trained with Manifold Mixup learn class-representations with fewer directions of variance. We prove theory on why this flattening happens under ideal conditions, validate it on practical situations, and connect it to previous works on information theory and generalization. In spite of incurring no significant computation and being implemented in a few lines of code, Manifold Mixup improves strong baselines in supervised learning, robustness to single-step adversarial attacks, and test log-likelihood.
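A minimal sketch of the idea: pick a random depth per batch, mix the hidden states (and labels) there, then continue the forward pass. Splitting the network into stages is an assumption about the architecture:

```python
import numpy as np
import torch

def manifold_mixup_forward(stages, x, y, alpha=2.0):
    """stages: list of nn.Module chunks whose composition is the network.

    Mixing at k=0 recovers input mixup; k>0 interpolates hidden
    representations, giving the smoother decision boundaries described
    above.
    """
    k = np.random.randint(len(stages))      # depth at which to mix
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))
    h = x
    for i, stage in enumerate(stages):
        if i == k:
            h = lam * h + (1 - lam) * h[perm]
        h = stage(h)
    return h, y, y[perm], lam
```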
We propose self-adaptive training, a unified training algorithm that dynamically calibrates and enhances the training process using model predictions, without incurring extra computational cost, to advance both supervised and self-supervised learning of deep neural networks. We analyze the training dynamics of deep networks on training data that is corrupted by, e.g., random noise and adversarial examples. Our analysis shows that model predictions are able to magnify useful underlying information in the data, and that this phenomenon occurs even in the absence of any label information, highlighting that model predictions can substantially benefit the training process: self-adaptive training improves the generalization of deep networks under noise and enhances self-supervised representation learning. The analysis also sheds light on the understanding of deep learning, e.g., a potential explanation of the recently-discovered double-descent phenomenon in empirical risk minimization and the collapsing issue of state-of-the-art self-supervised learning algorithms. Experiments on the CIFAR, STL and ImageNet datasets verify the effectiveness of our approach in three applications: classification with label noise, selective classification, and linear evaluation. To facilitate future research, the code has been made publicly available at https://github.com/LayneH/self-adaptive-training.
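A sketch of the supervised variant as we read it: each example keeps a soft target that is an exponential moving average of the model's predictions, initialized at the (possibly noisy) label. The momentum value and the bookkeeping class are assumptions:

```python
import torch
import torch.nn.functional as F

class SelfAdaptiveTargets:
    """Per-example soft targets calibrated by model predictions (EMA)."""
    def __init__(self, labels, num_classes, momentum=0.9):
        self.targets = F.one_hot(labels, num_classes).float()
        self.momentum = momentum

    def step(self, indices, logits):
        """Update targets for this batch and return the training loss."""
        probs = logits.softmax(dim=1).detach()
        t = self.momentum * self.targets[indices] + (1 - self.momentum) * probs
        self.targets[indices] = t
        # Cross-entropy against the calibrated (soft) targets.
        return -(t * logits.log_softmax(dim=1)).sum(dim=1).mean()
```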
Data mixing (e.g., Mixup, CutMix, ResizeMix) is an essential component in advancing recognition models. In this paper, we focus on studying its effectiveness in the self-supervised setting. Noticing that mixed images that share the same source images are intrinsically related to each other, we hereby propose SDMP, short for Simple Data Mixing Prior, to capture this straightforward yet essential prior, and position such mixed images as additional positive pairs to facilitate self-supervised representation learning. Our experiments verify that the proposed SDMP enables data mixing to help a set of self-supervised learning frameworks (e.g., MoCo) achieve better accuracy and out-of-distribution robustness. More notably, our SDMP is the first method that successfully leverages data mixing to improve (rather than hurt) the performance of vision transformers in the self-supervised setting. Code is publicly available at https://github.com/oliverrensu/sdmp
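A sketch of the prior as we read it: two mixed views sharing a source image are pulled together as an extra positive pair, weighted by how much of the shared source each contains. The weighting scheme and names are our assumptions:

```python
import torch
import torch.nn.functional as F

def sdmp_positive_loss(z_mix1, z_mix2, lam1, lam2, temperature=0.2):
    """z_mix1/z_mix2: (B, d) embeddings of two mixed views whose first
    mixing source is the same image. They are attracted to each other,
    scaled by the proportion of the shared source in each view."""
    z1 = F.normalize(z_mix1, dim=1)
    z2 = F.normalize(z_mix2, dim=1)
    sim = (z1 * z2).sum(dim=1) / temperature   # per-pair similarity
    weight = lam1 * lam2                       # shared-source proportion
    return -(weight * sim).mean()              # attraction term only
```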
Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller. Previous approaches can be expressed as a form of training the student to mimic output activations of individual data examples represented by the teacher. We introduce a novel approach, dubbed relational knowledge distillation (RKD), that transfers mutual relations of data examples instead. For concrete realizations of RKD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations. Experiments conducted on different tasks show that the proposed method improves educated student models by a significant margin. In particular for metric learning, it allows students to outperform their teachers' performance, achieving the state of the art on standard benchmark datasets.
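The distance-wise loss is compact; a minimal PyTorch sketch (the angle-wise loss follows the same pattern over triplets; helper names are illustrative):

```python
import torch
import torch.nn.functional as F

def pairwise_dist(e):
    """Mean-normalized pairwise Euclidean distances within a batch."""
    d = torch.cdist(e, e, p=2)
    mu = d[d > 0].mean()            # normalize by the mean non-zero distance
    return d / mu

def rkd_distance_loss(student_emb, teacher_emb):
    """Penalize structural (relational) differences between batches,
    not absolute activation values of individual examples."""
    with torch.no_grad():
        t = pairwise_dist(teacher_emb)
    s = pairwise_dist(student_emb)
    return F.smooth_l1_loss(s, t)   # Huber loss over relation matrices
```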
Modern deep neural network models are known to erroneously classify out-of-distribution (OOD) test data as one of the in-distribution (ID) training classes with high confidence. This can have catastrophic consequences for safety-critical applications. A popular mitigation strategy is to train a separate classifier that can detect such OOD samples at test time. In most practical settings, OOD examples are not known at train time, so a key question is: how do we augment the ID data with synthetic OOD samples to train such an OOD detector? In this paper, we propose a novel compounded corruption technique for OOD data augmentation, termed CnC. One of the major advantages of CnC is that it does not require any hold-out data apart from the training set. Furthermore, unlike current state-of-the-art (SOTA) techniques, CnC does not require backpropagation or ensembling at test time, making our method much faster at inference. Our extensive comparison with 20 methods from major conferences over the last 4 years shows that a model trained using CnC-based data augmentation outperforms the SOTA both in OOD detection accuracy and in inference time. We include a detailed post-hoc analysis to investigate the reasons for the success of our method, and identify the higher relative entropy and diversity of CnC samples as probable causes. We also provide theoretical insights via a piecewise decomposition analysis on a two-dimensional dataset to reveal (visually and quantitatively) that our approach leads to tighter boundaries around the ID classes, and hence better detection of OOD samples. Source code link: https://github.com/cnc-ood