分类一直是对对抗攻击的研究的焦点,但是只有少数著作调查了适合于更密集的预测任务的方法,例如语义分割。这些作品中提出的方法不能准确地解决对抗性分割问题,因此,在愚弄模型所需的扰动的大小方面,它过于充满乐趣。在这里,我们基于近端分裂的近端分裂提出了对这些模型的白色盒子攻击,以产生较小的$ \ ell_1 $,$ \ ell_2 $或$ \ ell_ \ ell_ \ infty $ norms的对抗性扰动。我们的攻击可以通过增强的Lagrangian方法以及自适应约束缩放和掩盖策略来处理非covex最小化框架内的大量约束。我们证明,我们的攻击明显胜过先前提出的攻击,以及我们适应细分的分类攻击,为这项密集的任务提供了第一个全面的基准。我们的结果推动了有关分割任务中鲁棒性评估的当前限制。
translated by 谷歌翻译
深度神经网络的图像分类容易受到对抗性扰动的影响。图像分类可以通过在输入图像中添加人造小且不可察觉的扰动来轻松愚弄。作为最有效的防御策略之一,提出了对抗性训练,以解决分类模型的脆弱性,其中创建了对抗性示例并在培训期间注入培训数据中。在过去的几年中,对分类模型的攻击和防御进行了深入研究。语义细分作为分类的扩展,最近也受到了极大的关注。最近的工作表明,需要大量的攻击迭代来创建有效的对抗性示例来欺骗分割模型。该观察结果既可以使鲁棒性评估和对分割模型的对抗性培训具有挑战性。在这项工作中,我们提出了一种称为SEGPGD的有效有效的分割攻击方法。此外,我们提供了收敛分析,以表明在相同数量的攻击迭代下,提出的SEGPGD可以创建比PGD更有效的对抗示例。此外,我们建议将SEGPGD应用于分割对抗训练的基础攻击方法。由于SEGPGD可以创建更有效的对抗性示例,因此使用SEGPGD的对抗训练可以提高分割模型的鲁棒性。我们的建议还通过对流行分割模型体系结构和标准分段数据集进行了验证。
translated by 谷歌翻译
The evaluation of robustness against adversarial manipulation of neural networks-based classifiers is mainly tested with empirical attacks as methods for the exact computation, even when available, do not scale to large networks. We propose in this paper a new white-box adversarial attack wrt the l p -norms for p ∈ {1, 2, ∞} aiming at finding the minimal perturbation necessary to change the class of a given input. It has an intuitive geometric meaning, yields quickly high quality results, minimizes the size of the perturbation (so that it returns the robust accuracy at every threshold with a single run). It performs better or similar to stateof-the-art attacks which are partially specialized to one l p -norm, and is robust to the phenomenon of gradient masking.
translated by 谷歌翻译
许多最先进的ML模型在各种任务中具有优于图像分类的人类。具有如此出色的性能,ML模型今天被广泛使用。然而,存在对抗性攻击和数据中毒攻击的真正符合ML模型的稳健性。例如,Engstrom等人。证明了最先进的图像分类器可以容易地被任意图像上的小旋转欺骗。由于ML系统越来越纳入安全性和安全敏感的应用,对抗攻击和数据中毒攻击构成了相当大的威胁。本章侧重于ML安全的两个广泛和重要的领域:对抗攻击和数据中毒攻击。
translated by 谷歌翻译
Designing powerful adversarial attacks is of paramount importance for the evaluation of $\ell_p$-bounded adversarial defenses. Projected Gradient Descent (PGD) is one of the most effective and conceptually simple algorithms to generate such adversaries. The search space of PGD is dictated by the steepest ascent directions of an objective. Despite the plethora of objective function choices, there is no universally superior option and robustness overestimation may arise from ill-suited objective selection. Driven by this observation, we postulate that the combination of different objectives through a simple loss alternating scheme renders PGD more robust towards design choices. We experimentally verify this assertion on a synthetic-data example and by evaluating our proposed method across 25 different $\ell_{\infty}$-robust models and 3 datasets. The performance improvement is consistent, when compared to the single loss counterparts. In the CIFAR-10 dataset, our strongest adversarial attack outperforms all of the white-box components of AutoAttack (AA) ensemble, as well as the most powerful attacks existing on the literature, achieving state-of-the-art results in the computational budget of our study ($T=100$, no restarts).
translated by 谷歌翻译
评估对抗性鲁棒性的量,以找到有输入样品被错误分类所需的最小扰动。底层优化的固有复杂性需要仔细调整基于梯度的攻击,初始化,并且可能为许多计算苛刻的迭代而被执行,即使专门用于给定的扰动模型也是如此。在这项工作中,我们通过提出使用不同$ \ ell_p $ -norm扰动模型($ p = 0,1,2,\ idty $)的快速最小规范(FMN)攻击来克服这些限制(FMN)攻击选择,不需要对抗性起点,并在很少的轻量级步骤中收敛。它通过迭代地发现在$ \ ell_p $ -norm的最大信心被错误分类的样本进行了尺寸的尺寸$ \ epsilon $的限制,同时适应$ \ epsilon $,以最小化当前样本到决策边界的距离。广泛的实验表明,FMN在收敛速度和计算时间方面显着优于现有的攻击,同时报告可比或甚至更小的扰动尺寸。
translated by 谷歌翻译
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS-and which illustrate a diverse set of defense strategies-can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result-showing that a defense was ineffective-this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
translated by 谷歌翻译
The field of defense strategies against adversarial attacks has significantly grown over the last years, but progress is hampered as the evaluation of adversarial defenses is often insufficient and thus gives a wrong impression of robustness. Many promising defenses could be broken later on, making it difficult to identify the state-of-the-art. Frequent pitfalls in the evaluation are improper tuning of hyperparameters of the attacks, gradient obfuscation or masking. In this paper we first propose two extensions of the PGD-attack overcoming failures due to suboptimal step size and problems of the objective function. We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness. We apply our ensemble to over 50 models from papers published at recent top machine learning and computer vision venues. In all except one of the cases we achieve lower robust test accuracy than reported in these papers, often by more than 10%, identifying several broken defenses.
translated by 谷歌翻译
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives.This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
translated by 谷歌翻译
Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples-inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. 1
translated by 谷歌翻译
Adversarial attacks hamper the decision-making ability of neural networks by perturbing the input signal. The addition of calculated small distortion to images, for instance, can deceive a well-trained image classification network. In this work, we propose a novel attack technique called Sparse Adversarial and Interpretable Attack Framework (SAIF). Specifically, we design imperceptible attacks that contain low-magnitude perturbations at a small number of pixels and leverage these sparse attacks to reveal the vulnerability of classifiers. We use the Frank-Wolfe (conditional gradient) algorithm to simultaneously optimize the attack perturbations for bounded magnitude and sparsity with $O(1/\sqrt{T})$ convergence. Empirical results show that SAIF computes highly imperceptible and interpretable adversarial examples, and outperforms state-of-the-art sparse attack methods on the ImageNet dataset.
translated by 谷歌翻译
由于在量化网络上的按位操作产生的有效存储器消耗和更快的计算,神经网络量化已经变得越来越受欢迎。尽管它们表现出优异的泛化能力,但其鲁棒性属性并不是很好地理解。在这项工作中,我们系统地研究量化网络对基于梯度的对抗性攻击的鲁棒性,并证明这些量化模型遭受梯度消失问题并显示出虚假的鲁棒感。通过归因于培训的网络中的渐变消失到较差的前后信号传播,我们引入了一个简单的温度缩放方法来缓解此问题,同时保留决策边界。尽管对基于梯度的对抗攻击进行了简单的修改,但具有多个网络架构的多个图像分类数据集的实验表明,我们的温度缩放攻击在量化网络上获得了近乎完美的成功率,同时表现出对普遍培训的模型以及浮动的原始攻击以及浮动 - 点网络。代码可在https://github.com/kartikgupta-at-anu/Attack-bnn获得。
translated by 谷歌翻译
我们表明,当考虑到图像域$ [0,1] ^ D $时,已建立$ L_1 $ -Projected梯度下降(PGD)攻击是次优,因为它们不认为有效的威胁模型是交叉点$ l_1 $ -ball和$ [0,1] ^ d $。我们研究了这种有效威胁模型的最陡渐进步骤的预期稀疏性,并表明该组上的确切投影是计算可行的,并且产生更好的性能。此外,我们提出了一种自适应形式的PGD,即使具有小的迭代预算,这也是非常有效的。我们的结果$ l_1 $ -apgd是一个强大的白盒攻击,表明先前的作品高估了他们的$ l_1 $ -trobustness。使用$ l_1 $ -apgd for vercersarial培训,我们获得一个强大的分类器,具有sota $ l_1 $ -trobustness。最后,我们将$ l_1 $ -apgd和平方攻击的适应组合到$ l_1 $ to $ l_1 $ -autoattack,这是一个攻击的集合,可靠地评估$ l_1 $ -ball与$的威胁模型的对抗鲁棒性进行对抗[ 0,1] ^ d $。
translated by 谷歌翻译
Although deep learning has made remarkable progress in processing various types of data such as images, text and speech, they are known to be susceptible to adversarial perturbations: perturbations specifically designed and added to the input to make the target model produce erroneous output. Most of the existing studies on generating adversarial perturbations attempt to perturb the entire input indiscriminately. In this paper, we propose ExploreADV, a general and flexible adversarial attack system that is capable of modeling regional and imperceptible attacks, allowing users to explore various kinds of adversarial examples as needed. We adapt and combine two existing boundary attack methods, DeepFool and Brendel\&Bethge Attack, and propose a mask-constrained adversarial attack system, which generates minimal adversarial perturbations under the pixel-level constraints, namely ``mask-constraints''. We study different ways of generating such mask-constraints considering the variance and importance of the input features, and show that our adversarial attack system offers users good flexibility to focus on sub-regions of inputs, explore imperceptible perturbations and understand the vulnerability of pixels/regions to adversarial attacks. We demonstrate our system to be effective based on extensive experiments and user study.
translated by 谷歌翻译
深度神经网络已被证明容易受到对抗图像的影响。常规攻击努力争取严格限制扰动的不可分割的对抗图像。最近,研究人员已采取行动探索可区分但非奇异的对抗图像,并证明色彩转化攻击是有效的。在这项工作中,我们提出了对抗颜色过滤器(ADVCF),这是一种新颖的颜色转换攻击,在简单颜色滤波器的参数空间中通过梯度信息进行了优化。特别是,明确指定了我们的颜色滤波器空间,以便从攻击和防御角度来对对抗性色转换进行系统的鲁棒性分析。相反,由于缺乏这种明确的空间,现有的颜色转换攻击并不能为系统分析提供机会。我们通过用户研究进一步进行了对成功率和图像可接受性的不同颜色转化攻击之间的广泛比较。其他结果为在另外三个视觉任务中针对ADVCF的模型鲁棒性提供了有趣的新见解。我们还强调了ADVCF的人类解剖性,该advcf在实际使用方案中有希望,并显示出比对图像可接受性和效率的最新人解释的色彩转化攻击的优越性。
translated by 谷歌翻译
We propose the Square Attack, a score-based black-box l2and l∞-adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking. Square Attack is based on a randomized search scheme which selects localized squareshaped updates at random positions so that at each iteration the perturbation is situated approximately at the boundary of the feasible set. Our method is significantly more query efficient and achieves a higher success rate compared to the state-of-the-art methods, especially in the untargeted setting. In particular, on ImageNet we improve the average query efficiency in the untargeted setting for various deep networks by a factor of at least 1.8 and up to 3 compared to the recent state-ofthe-art l∞-attack of Al-Dujaili & OReilly (2020). Moreover, although our attack is black-box, it can also outperform gradient-based white-box attacks on the standard benchmarks achieving a new state-of-the-art in terms of the success rate. The code of our attack is available at https://github.com/max-andr/square-attack.
translated by 谷歌翻译
Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs.Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to * Pin-Yu Chen and Huan Zhang contribute equally to this work.
translated by 谷歌翻译
Adversarial training is an effective approach to make deep neural networks robust against adversarial attacks. Recently, different adversarial training defenses are proposed that not only maintain a high clean accuracy but also show significant robustness against popular and well studied adversarial attacks such as PGD. High adversarial robustness can also arise if an attack fails to find adversarial gradient directions, a phenomenon known as `gradient masking'. In this work, we analyse the effect of label smoothing on adversarial training as one of the potential causes of gradient masking. We then develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA). Our attack approach is based on a `match and deceive' loss that finds optimal adversarial directions through guidance from a surrogate model. Our modified attack does not require random restarts, large number of attack iterations or search for an optimal step-size. Furthermore, our proposed G-PGA is generic, thus it can be combined with an ensemble attack strategy as we demonstrate for the case of Auto-Attack, leading to efficiency and convergence speed improvements. More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.
translated by 谷歌翻译
深度学习的进步使得广泛的有希望的应用程序。然而,这些系统容易受到对抗机器学习(AML)攻击的影响;对他们的意见的离前事实制作的扰动可能导致他们错误分类。若干最先进的对抗性攻击已经证明他们可以可靠地欺骗分类器,使这些攻击成为一个重大威胁。对抗性攻击生成算法主要侧重于创建成功的例子,同时控制噪声幅度和分布,使检测更加困难。这些攻击的潜在假设是脱机产生的对抗噪声,使其执行时间是次要考虑因素。然而,最近,攻击者机会自由地产生对抗性示例的立即对抗攻击已经可能。本文介绍了一个新问题:我们如何在实时约束下产生对抗性噪音,以支持这种实时对抗攻击?了解这一问题提高了我们对这些攻击对实时系统构成的威胁的理解,并为未来防御提供安全评估基准。因此,我们首先进行对抗生成算法的运行时间分析。普遍攻击脱机产生一般攻击,没有在线开销,并且可以应用于任何输入;然而,由于其一般性,他们的成功率是有限的。相比之下,在特定输入上工作的在线算法是计算昂贵的,使它们不适合在时间约束下的操作。因此,我们提出房间,一种新型实时在线脱机攻击施工模型,其中离线组件用于预热在线算法,使得可以在时间限制下产生高度成功的攻击。
translated by 谷歌翻译
The authors thank Nicholas Carlini (UC Berkeley) and Dimitris Tsipras (MIT) for feedback to improve the survey quality. We also acknowledge X. Huang (Uni. Liverpool), K. R. Reddy (IISC), E. Valle (UNICAMP), Y. Yoo (CLAIR) and others for providing pointers to make the survey more comprehensive.
translated by 谷歌翻译