Adversarial training (AT) is one of the most effective ways for improving the robustness of deep convolution neural networks (CNNs). Just like common network training, the effectiveness of AT relies on the design of basic network components. In this paper, we conduct an in-depth study on the role of the basic ReLU activation component in AT for robust CNNs. We find that the spatially-shared and input-independent properties of ReLU activation make CNNs less robust to white-box adversarial attacks with either standard or adversarial training. To address this problem, we extend ReLU to a novel Sparta activation function (Spatially attentive and Adversarially Robust Activation), which enables CNNs to achieve both higher robustness, i.e., lower error rate on adversarial examples, and higher accuracy, i.e., lower error rate on clean examples, than the existing state-of-the-art (SOTA) activation functions. We further study the relationship between Sparta and the SOTA activation functions, providing more insights about the advantages of our method. With comprehensive experiments, we also find that the proposed method exhibits superior cross-CNN and cross-dataset transferability. For the former, the adversarially trained Sparta function for one CNN (e.g., ResNet-18) can be fixed and directly used to train another adversarially robust CNN (e.g., ResNet-34). For the latter, the Sparta function trained on one dataset (e.g., CIFAR-10) can be employed to train adversarially robust CNNs on another dataset (e.g., SVHN). In both cases, Sparta leads to CNNs with higher robustness than the vanilla ReLU, verifying the flexibility and versatility of the proposed method.
translated by 谷歌翻译
我们从频道明智激活的角度调查CNN的对抗性鲁棒性。通过比较\ Textit {非鲁棒}(通常训练)和\ exingit {REXITIT {REARUSTIFIED}(普及培训的)模型,我们观察到对抗性培训(AT)通过将频道明智的数据与自然的渠道和自然的对抗激活对齐来强调CNN同行。然而,在处理逆势数据时仍仍会过度激活以\ texit {excy-computive}(nr)的频道仍会过度激活。此外,我们还观察到,在所有课程上不会导致类似的稳健性。对于强大的类,具有较大激活大小的频道通常是更长的\ extedit {正相关}(pr)到预测,但这种对齐不适用于非鲁棒类。鉴于这些观察结果,我们假设抑制NR通道并对齐PR与其相关性进一步增强了在其下的CNN的鲁棒性。为了检查这个假设,我们介绍了一种新的机制,即\下划线{C} Hannel-Wise \ Underline {i} Mportance的\下划线{F} eature \ Underline {s}选举(CIFS)。 CIFS通过基于与预测的相关性产生非负乘法器来操纵某些层的激活。在包括CIFAR10和SVHN的基准数据集上的广泛实验明确验证了强制性CNN的假设和CIFS的有效性。 \ url {https://github.com/hanshuyan/cifs}
translated by 谷歌翻译
深度神经网络(DNNS)最近在许多分类任务中取得了巨大的成功。不幸的是,它们容易受到对抗性攻击的影响,这些攻击会产生对抗性示例,这些示例具有很小的扰动,以欺骗DNN模型,尤其是在模型共享方案中。事实证明,对抗性训练是最有效的策略,它将对抗性示例注入模型训练中,以提高DNN模型的稳健性,以对对抗性攻击。但是,基于现有的对抗性示例的对抗训练无法很好地推广到标准,不受干扰的测试数据。为了在标准准确性和对抗性鲁棒性之间取得更好的权衡,我们提出了一个新型的对抗训练框架,称为潜在边界引导的对抗训练(梯子),该训练(梯子)在潜在的边界引导的对抗性示例上对对手进行对手训练DNN模型。与大多数在输入空间中生成对抗示例的现有方法相反,梯子通过增加对潜在特征的扰动而产生了无数的高质量对抗示例。扰动是沿SVM构建的具有注意机制的决策边界的正常情况进行的。我们从边界场的角度和可视化视图分析了生成的边界引导的对抗示例的优点。与Vanilla DNN和竞争性底线相比,对MNIST,SVHN,CELEBA和CIFAR-10的广泛实验和详细分析验证了梯子在标准准确性和对抗性鲁棒性之间取得更好的权衡方面的有效性。
translated by 谷歌翻译
Neural networks are vulnerable to adversarial examples, which poses a threat to their application in security sensitive systems. We propose high-level representation guided denoiser (HGD) as a defense for image classification. Standard denoiser suffers from the error amplification effect, in which small residual adversarial noise is progressively amplified and leads to wrong classifications. HGD overcomes this problem by using a loss function defined as the difference between the target model's outputs activated by the clean image and denoised image. Compared with ensemble adversarial training which is the state-of-the-art defending method on large images, HGD has three advantages. First, with HGD as a defense, the target model is more robust to either white-box or black-box adversarial attacks. Second, HGD can be trained on a small subset of the images and generalizes well to other images and unseen classes. Third, HGD can be transferred to defend models other than the one guiding it. In NIPS competition on defense against adversarial attacks, our HGD solution won the first place and outperformed other models by a large margin. 1 * Equal contribution.
translated by 谷歌翻译
已知深度神经网络(DNN)容易受到用不可察觉的扰动制作的对抗性示例的影响,即,输入图像的微小变化会引起错误的分类,从而威胁着基于深度学习的部署系统的可靠性。经常采用对抗训练(AT)来通过训练损坏和干净的数据的混合物来提高DNN的鲁棒性。但是,大多数基于AT的方法在处理\ textit {转移的对抗示例}方面是无效的,这些方法是生成以欺骗各种防御模型的生成的,因此无法满足现实情况下提出的概括要求。此外,对抗性训练一般的国防模型不能对具有扰动的输入产生可解释的预测,而不同的领域专家则需要一个高度可解释的强大模型才能了解DNN的行为。在这项工作中,我们提出了一种基于Jacobian规范和选择性输入梯度正则化(J-SIGR)的方法,该方法通过Jacobian归一化提出了线性化的鲁棒性,还将基于扰动的显着性图正规化,以模仿模型的可解释预测。因此,我们既可以提高DNN的防御能力和高解释性。最后,我们评估了跨不同体系结构的方法,以针对强大的对抗性攻击。实验表明,提出的J-Sigr赋予了针对转移的对抗攻击的鲁棒性,我们还表明,来自神经网络的预测易于解释。
translated by 谷歌翻译
对逆势实例的研究,对抗性鲁棒性得到了越来越关注。到目前为止,现有的作品表明,强大的模型不仅可以获得针对各种对抗攻击的鲁棒性,而且还可以提高一些下游任务中的性能。然而,对抗性鲁棒性的潜在机制仍然不明确。在本文中,从线性组分的角度解释对抗性鲁棒性,并发现存在一些统计特性的全面鲁棒模型。具体地,在删除或更换所有非线性组件(例如,批量标准化,最大池或激活层)时,鲁棒模型对其线性化子网显示了明显的分层聚类效果。根据这些观察,我们提出了对对抗性稳健性的新颖理解,并将其应用于包括领域适应和稳健性提升的更多任务。实验评估展示了我们拟议的聚类策略的合理性和优越性。
translated by 谷歌翻译
已知深神经网络(DNN)容易受到对抗性攻击的影响,即对输入的不可察觉的扰动可以误导DNN在清洁图像上培训,以制造错误的预测。为了解决这一目标,对抗性训练是目前最有效的防御方法,通过增强速度设定的训练,在飞行中产生的对抗样本。有趣的是,我们首次发现,在随机初始化的网络中,在没有任何模型训练的随机初始化网络中,第一次发现具有天生稳健性,匹配或超越对抗训练网络的强大准确性的鲁棒准确性,表明对模型权重的对抗训练不是对抗性鲁棒性不可或缺。我们命名为强大的临时票故障票(RST),也是自然效率的那种。不同于流行的彩票假设,既不需要培训原始密集的网络也不需要训练。为了验证和理解这种迷人的发现,我们进一步开展了广泛的实验,以研究不同模型,数据集,稀疏模式和攻击下RST的存在性和性质,绘制关于DNNS鲁棒性与其初始化/过度分辨率之间的关系的洞察。此外,我们确定从同一随机初始化的密集网络绘制的不同稀疏比率的RST之间的差的对抗性转移性,并提出了一种随机切换不同RST之间的随机切换的随机性,作为基于顶部的新型防御方法第一次。我们相信我们对RST的调查结果已经开辟了一个新的视角,以研究模型稳健性并扩大彩票假设。
translated by 谷歌翻译
Deep neural networks (DNNs) are found to be vulnerable to adversarial attacks, and various methods have been proposed for the defense. Among these methods, adversarial training has been drawing increasing attention because of its simplicity and effectiveness. However, the performance of the adversarial training is greatly limited by the architectures of target DNNs, which often makes the resulting DNNs with poor accuracy and unsatisfactory robustness. To address this problem, we propose DSARA to automatically search for the neural architectures that are accurate and robust after adversarial training. In particular, we design a novel cell-based search space specially for adversarial training, which improves the accuracy and the robustness upper bound of the searched architectures by carefully designing the placement of the cells and the proportional relationship of the filter numbers. Then we propose a two-stage search strategy to search for both accurate and robust neural architectures. At the first stage, the architecture parameters are optimized to minimize the adversarial loss, which makes full use of the effectiveness of the adversarial training in enhancing the robustness. At the second stage, the architecture parameters are optimized to minimize both the natural loss and the adversarial loss utilizing the proposed multi-objective adversarial training method, so that the searched neural architectures are both accurate and robust. We evaluate the proposed algorithm under natural data and various adversarial attacks, which reveals the superiority of the proposed method in terms of both accurate and robust architectures. We also conclude that accurate and robust neural architectures tend to deploy very different structures near the input and the output, which has great practical significance on both hand-crafting and automatically designing of accurate and robust neural architectures.
translated by 谷歌翻译
已知深神经网络(DNN)容易受到对抗性攻击的影响。已经提出了一系列防御方法来培训普遍稳健的DNN,其中对抗性培训已经证明了有希望的结果。然而,尽管对对抗性培训开发的初步理解,但从架构角度来看,它仍然不明确,从架构角度来看,什么配置可以导致更强大的DNN。在本文中,我们通过全面调查网络宽度和深度对前对方培训的DNN的鲁棒性的全面调查来解决这一差距。具体地,我们进行以下关键观察:1)更多参数(更高的模型容量)不一定有助于对抗冒险; 2)网络的最后阶段(最后一组块)降低能力实际上可以改善对抗性的鲁棒性; 3)在相同的参数预算下,存在对抗性鲁棒性的最佳架构配置。我们还提供了一个理论分析,解释了为什么这种网络配置可以帮助鲁棒性。这些架构见解可以帮助设计对抗的强制性DNN。代码可用于\ url {https://github.com/hanxunh/robustwrn}。
translated by 谷歌翻译
积极调查深度神经网络的对抗鲁棒性。然而,大多数现有的防御方法限于特定类型的对抗扰动。具体而言,它们通常不能同时为多次攻击类型提供抵抗力,即,它们缺乏多扰动鲁棒性。此外,与图像识别问题相比,视频识别模型的对抗鲁棒性相对未开发。虽然有几项研究提出了如何产生对抗性视频,但在文献中只发表了关于防御策略的少数关于防御策略的方法。在本文中,我们提出了用于视频识别的多种抗逆视频的第一战略之一。所提出的方法称为Multibn,使用具有基于学习的BN选择模块的多个独立批量归一化(BN)层对多个对冲视频类型进行对抗性训练。利用多个BN结构,每个BN Brach负责学习单个扰动类型的分布,从而提供更精确的分布估计。这种机制有利于处理多种扰动类型。 BN选择模块检测输入视频的攻击类型,并将其发送到相应的BN分支,使MultiBN全自动并允许端接训练。与目前的对抗训练方法相比,所提出的Multibn对不同甚至不可预见的对抗性视频类型具有更强的多扰动稳健性,从LP界攻击和物理上可实现的攻击范围。在不同的数据集和目标模型上保持真实。此外,我们进行了广泛的分析,以研究多BN结构的性质。
translated by 谷歌翻译
由于计算成本和能耗有限,部署在移动设备中的大多数神经网络模型都很小。然而,微小的神经网络通常很容易攻击。目前的研究证明,较大的模型规模可以提高鲁棒性,但很少的研究侧重于如何增强微小神经网络的稳健性。我们的工作侧重于如何改善微小神经网络的稳健性,而不会严重恶化移动级资源下的清洁准确性。为此,我们提出了一种多目标oneShot网络架构搜索(NAS)算法,以便在对抗准确度,清洁精度和模型尺寸方面获得最佳权衡网络。具体而言,我们基于新的微小块和通道设计一种新的搜索空间,以平衡模型大小和对抗性能。此外,由于SUPERNET显着影响了我们NAS算法中子网的性能,因此我们揭示了对SuperNet如何有助于获得白盒对抗攻击下最好的子网的洞察力。具体地,我们通过分析对抗性可转移性,超空网的宽度以及从头划痕和微调训练子网之间的差异来探索新的对抗性培训范式。最后,我们对第一个非主导的前沿的某些块和通道的层面组合进行了统计分析,这可以作为设计微小神经网络架构以实现对抗性扰动的指导。
translated by 谷歌翻译
Adversarial attacks to image classification systems present challenges to convolutional networks and opportunities for understanding them. This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks. Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising. Specifically, our networks contain blocks that denoise the features using non-local means or other filters; the entire networks are trained end-to-end. When combined with adversarial training, our feature denoising networks substantially improve the state-of-the-art in adversarial robustness in both white-box and black-box attack settings. On ImageNet, under 10-iteration PGD white-box attacks where prior art has 27.9% accuracy, our method achieves 55.7%; even under extreme 2000-iteration PGD white-box attacks, our method secures 42.6% accuracy. Our method was ranked first in Competition on Adversarial Attacks and Defenses (CAAD) 2018 -it achieved 50.6% classification accuracy on a secret, ImageNet-like test dataset against 48 unknown attackers, surpassing the runner-up approach by ∼10%. Code is available at https://github.com/facebookresearch/ ImageNet-Adversarial-Training.
translated by 谷歌翻译
Deep learning methods have gained increased attention in various applications due to their outstanding performance. For exploring how this high performance relates to the proper use of data artifacts and the accurate problem formulation of a given task, interpretation models have become a crucial component in developing deep learning-based systems. Interpretation models enable the understanding of the inner workings of deep learning models and offer a sense of security in detecting the misuse of artifacts in the input data. Similar to prediction models, interpretation models are also susceptible to adversarial inputs. This work introduces two attacks, AdvEdge and AdvEdge$^{+}$, that deceive both the target deep learning model and the coupled interpretation model. We assess the effectiveness of proposed attacks against two deep learning model architectures coupled with four interpretation models that represent different categories of interpretation models. Our experiments include the attack implementation using various attack frameworks. We also explore the potential countermeasures against such attacks. Our analysis shows the effectiveness of our attacks in terms of deceiving the deep learning models and their interpreters, and highlights insights to improve and circumvent the attacks.
translated by 谷歌翻译
Efforts to improve the adversarial robustness of convolutional neural networks have primarily focused on developing more effective adversarial training methods. In contrast, little attention was devoted to analyzing the role of architectural elements (such as topology, depth, and width) on adversarial robustness. This paper seeks to bridge this gap and present a holistic study on the impact of architectural design on adversarial robustness. We focus on residual networks and consider architecture design at the block level, i.e., topology, kernel size, activation, and normalization, as well as at the network scaling level, i.e., depth and width of each block in the network. In both cases, we first derive insights through systematic ablative experiments. Then we design a robust residual block, dubbed RobustResBlock, and a compound scaling rule, dubbed RobustScaling, to distribute depth and width at the desired FLOP count. Finally, we combine RobustResBlock and RobustScaling and present a portfolio of adversarially robust residual networks, RobustResNets, spanning a broad spectrum of model capacities. Experimental validation across multiple datasets and adversarial attacks demonstrate that RobustResNets consistently outperform both the standard WRNs and other existing robust architectures, achieving state-of-the-art AutoAttack robust accuracy of 61.1% without additional data and 63.7% with 500K external data while being $2\times$ more compact in terms of parameters. Code is available at \url{ https://github.com/zhichao-lu/robust-residual-network}
translated by 谷歌翻译
有必要提高某些特殊班级的表现,或者特别保护它们免受对抗学习的攻击。本文提出了一个将成本敏感分类和对抗性学习结合在一起的框架,以训练可以区分受保护和未受保护的类的模型,以使受保护的类别不太容易受到对抗性示例的影响。在此框架中,我们发现在训练深神经网络(称为Min-Max属性)期间,一个有趣的现象,即卷积层中大多数参数的绝对值。基于这种最小的最大属性,该属性是在随机分布的角度制定和分析的,我们进一步建立了一个针对对抗性示例的新防御模型,以改善对抗性鲁棒性。构建模型的一个优点是,它的性能比标准模型更好,并且可以与对抗性训练相结合,以提高性能。在实验上证实,对于所有类别的平均准确性,我们的模型在没有发生攻击时几乎与现有模型一样,并且在发生攻击时比现有模型更好。具体而言,关于受保护类的准确性,提议的模型比发生攻击时的现有模型要好得多。
translated by 谷歌翻译
为了应对对抗性实例的威胁,对抗性培训提供了一种有吸引力的选择,可以通过在线增强的对抗示例中的培训模型提高模型稳健性。然而,大多数现有的对抗训练方法通过强化对抗性示例来侧重于提高鲁棒的准确性,但忽略了天然数据和对抗性实施例之间的增加,导致自然精度急剧下降。为了维持自然和强大的准确性之间的权衡,我们从特征适应的角度缓解了转变,并提出了一种特征自适应对抗训练(FAAT),这些培训(FAAT)跨越自然数据和对抗示例优化类条件特征适应。具体而言,我们建议纳入一类条件鉴别者,以鼓励特征成为(1)类鉴别的和(2)不变导致对抗性攻击的变化。新型的FAAT框架通过在天然和对抗数据中产生具有类似分布的特征来实现自然和强大的准确性之间的权衡,并实现从类鉴别特征特征中受益的更高的整体鲁棒性。在各种数据集上的实验表明,FAAT产生更多辨别特征,并对最先进的方法表现有利。代码在https://github.com/visionflow/faat中获得。
translated by 谷歌翻译
在本文中,我们询问视觉变形金刚(VIT)是否可以作为改善机器学习模型对抗逃避攻击的对抗性鲁棒性的基础结构。尽管较早的作品集中在改善卷积神经网络上,但我们表明VIT也非常适合对抗训练以实现竞争性能。我们使用自定义的对抗训练配方实现了这一目标,该配方是在Imagenet数据集的一部分上使用严格的消融研究发现的。与卷积相比,VIT的规范培训配方建议强大的数据增强,部分是为了补偿注意力模块的视力归纳偏置。我们表明,该食谱在用于对抗训练时可实现次优性能。相比之下,我们发现省略所有重型数据增强,并添加一些额外的零件($ \ varepsilon $ -Warmup和更大的重量衰减),从而大大提高了健壮的Vits的性能。我们表明,我们的配方在完整的Imagenet-1k上概括了不同类别的VIT体系结构和大规模模型。此外,调查了模型鲁棒性的原因,我们表明,在使用我们的食谱时,在训练过程中产生强烈的攻击更加容易,这会在测试时提高鲁棒性。最后,我们通过提出一种量化对抗性扰动的语义性质并强调其与模型的鲁棒性的相关性来进一步研究对抗训练的结果。总体而言,我们建议社区应避免将VIT的规范培训食谱转换为在对抗培训的背景下进行强大的培训和重新思考常见的培训选择。
translated by 谷歌翻译
基于深度神经网络(DNN)的智能信息(IOT)系统已被广泛部署在现实世界中。然而,发现DNNS易受对抗性示例的影响,这提高了人们对智能物联网系统的可靠性和安全性的担忧。测试和评估IOT系统的稳健性成为必要和必要。最近已经提出了各种攻击和策略,但效率问题仍未纠正。现有方法是计算地广泛或耗时,这在实践中不适用。在本文中,我们提出了一种称为攻击启发GaN(AI-GaN)的新框架,在有条件地产生对抗性实例。曾经接受过培训,可以有效地给予对抗扰动的输入图像和目标类。我们在白盒设置的不同数据集中应用AI-GaN,黑匣子设置和由最先进的防御保护的目标模型。通过广泛的实验,AI-GaN实现了高攻击成功率,优于现有方法,并显着降低了生成时间。此外,首次,AI-GaN成功地缩放到复杂的数据集。 Cifar-100和Imagenet,所有课程中的成功率约为90美元。
translated by 谷歌翻译
我们提出了一种新颖且有效的纯化基于纯化的普通防御方法,用于预处理盲目的白色和黑匣子攻击。我们的方法仅在一般图像上进行了自我监督学习,在计算上效率和培训,而不需要对分类模型的任何对抗训练或再培训。我们首先显示对原始图像与其对抗示例之间的残余的对抗噪声的实证分析,几乎均为对称分布。基于该观察,我们提出了一种非常简单的迭代高斯平滑(GS),其可以有效地平滑对抗性噪声并实现大大高的鲁棒精度。为了进一步改进它,我们提出了神经上下文迭代平滑(NCIS),其以自我监督的方式列举盲点网络(BSN)以重建GS也平滑的原始图像的辨别特征。从我们使用四种分类模型对大型想象成的广泛实验,我们表明我们的方法既竞争竞争标准精度和最先进的强大精度,则针对最强大的净化器 - 盲目的白色和黑匣子攻击。此外,我们提出了一种用于评估基于商业图像分类API的纯化方法的新基准,例如AWS,Azure,Clarifai和Google。我们通过基于集合转移的黑匣子攻击产生对抗性实例,这可以促进API的完全错误分类,并证明我们的方法可用于增加API的抗逆性鲁棒性。
translated by 谷歌翻译
作为反对攻击的最有效的防御方法之一,对抗性训练倾向于学习包容性的决策边界,以提高深度学习模型的鲁棒性。但是,由于沿对抗方向的边缘的大幅度和不必要的增加,对抗性训练会在自然实例和对抗性示例之间引起严重的交叉,这不利于平衡稳健性和自然准确性之间的权衡。在本文中,我们提出了一种新颖的对抗训练计划,以在稳健性和自然准确性之间进行更好的权衡。它旨在学习一个中度包容的决策边界,这意味着决策边界下的自然示例的边缘是中等的。我们称此方案为中等边缘的对抗训练(MMAT),该方案生成更细粒度的对抗示例以减轻交叉问题。我们还利用了经过良好培训的教师模型的逻辑来指导我们的模型学习。最后,MMAT在Black-Box和White-Box攻击下都可以实现高自然的精度和鲁棒性。例如,在SVHN上,实现了最新的鲁棒性和自然精度。
translated by 谷歌翻译