Deep learning has become a popular topic and has achieved success in many areas. It has drawn the attention of researchers and machine learning practitioners alike, with trained models deployed in a variety of settings. Alongside these achievements, research has shown that deep learning models are vulnerable to adversarial attacks, a finding that opened a new direction of research in which algorithms are developed to attack and defend vulnerable networks. Our interest is in understanding how these attacks alter the intermediate representations of deep learning models. We present a method for measuring and analyzing the deviations in representations induced by adversarial attacks, progressively across a selected set of layers. Experiments are conducted with an assortment of attack algorithms on the CIFAR-10 dataset, and plots are created to visualize the impact of adversarial attacks across different layers of a network.
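As a rough illustration of how such layer-wise deviations could be measured (a sketch under assumptions, not the paper's implementation; the model, layer choice, and distance metric are placeholders), one can register forward hooks on selected layers and compare clean and adversarial activations:

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights=None).eval()   # placeholder classifier (untrained here)

# capture the output of a few selected layers with forward hooks
layers = {"layer1": model.layer1, "layer2": model.layer2,
          "layer3": model.layer3, "layer4": model.layer4}
activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach().flatten(1)
    return hook

for name, module in layers.items():
    module.register_forward_hook(make_hook(name))

@torch.no_grad()
def layerwise_deviation(x_clean, x_adv):
    """Per-layer cosine distance between clean and adversarial representations."""
    model(x_clean)
    clean = dict(activations)
    model(x_adv)
    return {name: 1.0 - F.cosine_similarity(clean[name], activations[name], dim=1).mean().item()
            for name in layers}

x = torch.rand(8, 3, 32, 32)                           # placeholder "clean" CIFAR-10-sized batch
x_adv = (x + 0.03 * torch.randn_like(x)).clamp(0, 1)   # placeholder perturbed batch
print(layerwise_deviation(x, x_adv))
```

Plotting the returned per-layer distances for several attack strengths would give the kind of layer-by-layer view the abstract describes.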
Deep learning models have been used for a wide variety of tasks. They are prevalent in computer vision, natural language processing, speech recognition, and other areas. While these models have worked well in many settings, it has been shown that they are vulnerable to adversarial attacks. This has led to a proliferation of research into ways such attacks could be identified and/or defended against. Our goal is to explore the contribution that can be attributed to using multiple underlying models for adversarial example detection. Our paper describes two approaches that incorporate representations from multiple models for detecting adversarial examples. We design controlled experiments for measuring the detection impact of progressively utilizing additional models. For many of the scenarios we consider, the results show that performance increases with the number of underlying models used for extracting representations.
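A minimal sketch of one way to combine representations from multiple underlying models for detection (the backbones, feature layers, and detector here are assumptions for illustration, not the paper's two methods):

```python
import torch
import torch.nn as nn
import torchvision.models as models

backbones = [models.resnet18(weights=None), models.vgg11(weights=None)]  # underlying models
for m in backbones:
    m.eval()

@torch.no_grad()
def joint_representation(x):
    feats = []
    for m in backbones:
        # drop each model's final classification layer to obtain a feature vector
        trunk = nn.Sequential(*list(m.children())[:-1])
        feats.append(trunk(x).flatten(1))
    return torch.cat(feats, dim=1)

x = torch.rand(4, 3, 224, 224)                 # placeholder batch
feats = joint_representation(x)
detector = nn.Linear(feats.shape[1], 2)        # would be trained on clean vs. adversarial labels
print(detector(feats).argmax(dim=1))
```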
Previous work has shown that a neural network with the rectified linear unit (ReLU) activation function leads to a convex polyhedral decomposition of the input space. These decompositions can be represented by a dual graph with vertices corresponding to polyhedra and edges corresponding to polyhedra sharing a facet, which is a subgraph of a Hamming graph. This paper illustrates how one can utilize the dual graph to detect and analyze adversarial attacks in the context of digital images. When an image passes through a network containing ReLU nodes, the firing or non-firing at a node can be encoded as a bit (1 for ReLU activation, 0 for ReLU non-activation). The sequence of all bit activations identifies the image with a bit vector, which identifies it with a polyhedron in the decomposition and, in turn, identifies it with a vertex in the dual graph. We identify ReLU bits that are discriminators between non-adversarial and adversarial images and examine how well collections of these discriminators can ensemble vote to build an adversarial image detector. Specifically, we examine the similarities and differences of ReLU bit vectors for adversarial images, and their non-adversarial counterparts, using a pre-trained ResNet-50 architecture. While this paper focuses on adversarial digital images, ResNet-50 architecture, and the ReLU activation function, our methods extend to other network architectures, activation functions, and types of datasets.
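A minimal sketch, assuming a torchvision ResNet-50 and hooks on its ReLU modules, of how an image's ReLU firing pattern can be encoded as a bit vector and compared to its adversarial counterpart via Hamming distance (illustrative only, not the authors' code):

```python
import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet50(weights=None).eval()   # in practice, load pretrained weights

bits = []
def relu_hook(module, inputs, output):
    # 1 where the unit fires (positive output), 0 where it does not
    bits.append((output > 0).flatten(1).to(torch.uint8))

for module in model.modules():
    if isinstance(module, nn.ReLU):
        module.register_forward_hook(relu_hook)

@torch.no_grad()
def bit_vector(x):
    bits.clear()
    model(x)
    return torch.cat(bits, dim=1)   # shape: (batch, total number of ReLU activations)

x_clean = torch.rand(1, 3, 224, 224)
x_adv = (x_clean + 0.01 * torch.randn_like(x_clean)).clamp(0, 1)
b_clean, b_adv = bit_vector(x_clean), bit_vector(x_adv)
hamming = (b_clean != b_adv).sum().item()
print(f"{hamming} of {b_clean.numel()} ReLU bits flipped")
```

Bits that flip consistently for adversarial inputs are the kind of discriminators the abstract refers to.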
In order for machine learning to be trusted in many applications, it is critical to be able to reliably explain why the machine learning algorithm makes certain predictions. For this reason, a variety of methods have been developed recently to interpret neural network predictions by providing, for example, feature importance maps. For both scientific robustness and security reasons, it is important to know to what extent the interpretations can be altered by small systematic perturbations to the input data, which might be generated by adversaries or by measurement biases. In this paper, we demonstrate how to generate adversarial perturbations that produce perceptively indistinguishable inputs that are assigned the same predicted label, yet have very different interpretations. We systematically characterize the robustness of interpretations generated by several widely-used feature importance interpretation methods (feature importance maps, integrated gradients, and DeepLIFT) on ImageNet and CIFAR-10. In all cases, our experiments show that systematic perturbations can lead to dramatically different interpretations without changing the label. We extend these results to show that interpretations based on exemplars (e.g. influence functions) are similarly susceptible to adversarial attack. Our analysis of the geometry of the Hessian matrix gives insight on why robustness is a general challenge to current interpretation approaches.
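For intuition, here is a hedged sketch (not the authors' attack or metrics) that compares plain gradient saliency maps for an input and a slightly perturbed copy by measuring how much their top-k most important pixels overlap; the model and perturbation are placeholders:

```python
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()   # placeholder classifier

def saliency(x):
    x = x.clone().requires_grad_(True)
    logits = model(x)
    score = logits[0, logits[0].argmax()]      # score of the predicted class
    score.backward()
    return x.grad.abs().sum(dim=1).flatten()   # pixel-wise importance for a single image

def topk_overlap(s1, s2, k=100):
    top1 = set(torch.topk(s1, k).indices.tolist())
    top2 = set(torch.topk(s2, k).indices.tolist())
    return len(top1 & top2) / k

x = torch.rand(1, 3, 224, 224)
x_pert = (x + 0.01 * torch.randn_like(x)).clamp(0, 1)
print("top-100 pixel overlap:", topk_overlap(saliency(x), saliency(x_pert)))
```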
With the rapid advancement and increased use of deep learning models in image recognition, security becomes a major concern for their deployment in safety-critical systems. Since the accuracy and robustness of deep learning models are primarily attributed to the purity of the training samples, deep learning architectures are often susceptible to adversarial attacks. Adversarial attacks are often obtained by making subtle perturbations to normal images, which are mostly imperceptible to humans but can seriously confuse state-of-the-art machine learning models. We propose a framework, APuDAE, that leverages Denoising AutoEncoders (DAEs) to purify such samples in an adaptive manner, thereby improving the classification accuracy of the targeted classifier network under attack. We also show how using DAEs adaptively, rather than applying them directly, further improves classification accuracy and is more robust against adaptive attacks designed to fool them. We present our results on the MNIST, CIFAR-10, and ImageNet datasets and show how our framework (APuDAE) provides performance comparable to, and in most cases better than, baseline methods in purifying adversarial examples. We also design adaptive attacks specifically tailored to attack our purifying model and demonstrate how robust our defense remains against them.
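A hedged sketch of the basic purification idea (the adaptive scheme the abstract refers to is not shown; the architecture and classifier are illustrative stand-ins):

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    """Tiny convolutional DAE for 32x32 RGB images (architecture is illustrative)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                                 nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.dec(self.enc(x))

dae = DenoisingAE()          # in practice: trained to map (x + noise) back to x
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in model

@torch.no_grad()
def purified_predict(x):
    # pass a possibly-adversarial image through the DAE before classifying it
    return classifier(dae(x)).argmax(dim=1)

x_adv = torch.rand(4, 3, 32, 32)   # placeholder adversarial batch
print(purified_predict(x_adv))
```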
With the rapid progress and use of deep learning models in image recognition, security becomes a major concern for their deployment in safety-critical systems. Since the accuracy and robustness of deep learning models are mainly attributed to the purity of the training samples, deep learning architectures are often susceptible to adversarial attacks. Adversarial attacks are obtained by making subtle perturbations to normal images, which are mostly imperceptible to humans but can seriously confuse state-of-the-art machine learning models. What is so special about the intelligent perturbation or noise added to a normal image that it leads to catastrophic misclassification by deep neural networks? Using statistical hypothesis testing, we find that Conditional Variational AutoEncoders (CVAEs) are surprisingly good at detecting imperceptible image perturbations. In this paper, we show how CVAEs can be effectively used to detect adversarial attacks on image classification networks. We demonstrate our results on the CIFAR-10 dataset and show how our method gives performance comparable to prior methods in detecting adversaries, while not getting confused by noisy images, where most existing methods falter.
Deep neural networks (DNNs) are known to be vulnerable to adversarial examples crafted with imperceptible perturbations, i.e., small changes to the input image that can cause misclassification and thereby threaten the reliability of deployed deep-learning-based systems. Adversarial training (AT) is often adopted to improve the robustness of DNNs by training on a mixture of corrupted and clean data. However, most AT-based methods are ineffective in dealing with transferred adversarial examples, which are generated to fool a variety of defense models, and thus cannot satisfy the generalization requirements posed by real-world scenarios. Moreover, adversarially trained defense models in general cannot produce interpretable predictions for perturbed inputs, whereas domain experts require highly interpretable robust models in order to understand the behavior of DNNs. In this work, we propose an approach based on the Jacobian norm and Selective Input Gradient Regularization (J-SIGR), which promotes linearized robustness through Jacobian normalization and also regularizes perturbation-based saliency maps to imitate the model's interpretable predictions. As a result, we achieve both improved defense capability and high interpretability of DNNs. Finally, we evaluate our method across different architectures against powerful adversarial attacks. Experiments show that the proposed J-SIGR confers robustness against transferred adversarial attacks, and we also show that the predictions of the neural network are easy to interpret.
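The following is a hedged sketch of the flavor of objective described above, combining cross-entropy with a randomized estimate of the Jacobian norm and an input-gradient penalty; it is not the authors' exact J-SIGR formulation, and the "selective" part is omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(), nn.Linear(256, 10))

def regularized_loss(x, y, lam_jac=0.01, lam_grad=0.01):
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # (i) Hutchinson-style estimate of ||J||_F^2: E_v ||d<logits, v>/dx||^2
    v = torch.randn_like(logits)
    jac_vec = torch.autograd.grad((logits * v).sum(), x, create_graph=True)[0]
    jac_pen = jac_vec.pow(2).sum() / x.shape[0]

    # (ii) penalty on the input gradient of the classification loss (double backprop)
    in_grad = torch.autograd.grad(ce, x, create_graph=True)[0]
    grad_pen = in_grad.pow(2).sum() / x.shape[0]

    return ce + lam_jac * jac_pen + lam_grad * grad_pen

x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
loss = regularized_loss(x, y)
loss.backward()   # parameter gradients include both regularizers
```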
Recent research has revealed that the output of Deep Neural Networks (DNN) can be easily altered by adding relatively small perturbations to the input vector. In this paper, we analyze an attack in an extremely limited scenario where only one pixel can be modified. For that we propose a novel method for generating one-pixel adversarial perturbations based on differential evolution (DE). It requires less adversarial information (a black-box attack) and can fool more types of networks due to the inherent features of DE. The results show that 67.97% of the natural images in the Kaggle CIFAR-10 test dataset and 16.04% of the ImageNet (ILSVRC 2012) test images can be perturbed to at least one target class by modifying just one pixel with 74.03% and 22.91% confidence on average. We also show the same vulnerability on the original CIFAR-10 dataset. Thus, the proposed attack explores a different take on adversarial machine learning in an extremely limited scenario, showing that current DNNs are also vulnerable to such low-dimension attacks. Besides, we also illustrate an important application of DE (or, broadly speaking, evolutionary computation) in the domain of adversarial machine learning: creating tools that can effectively generate low-cost adversarial attacks against neural networks for evaluating robustness.
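A compact sketch of the core idea, using SciPy's differential evolution to search over a single pixel's position and color (the hyperparameters and the stand-in model are placeholders, not the paper's setup):

```python
import torch
from scipy.optimize import differential_evolution

def one_pixel_attack(model, image, true_label):
    """image: float tensor of shape (3, 32, 32) with values in [0, 1]."""
    def apply_pixel(z):
        x, y, r, g, b = z
        perturbed = image.clone()
        perturbed[:, int(y), int(x)] = torch.tensor([r, g, b], dtype=image.dtype)
        return perturbed

    def fitness(z):
        with torch.no_grad():
            probs = torch.softmax(model(apply_pixel(z).unsqueeze(0)), dim=1)
        return probs[0, true_label].item()   # minimize the true-class probability

    bounds = [(0, 31), (0, 31), (0, 1), (0, 1), (0, 1)]   # (x, y, r, g, b)
    result = differential_evolution(fitness, bounds, maxiter=30, popsize=20, seed=0)
    return apply_pixel(result.x), result.fun

# usage with a stand-in model:
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)).eval()
adv_image, true_prob = one_pixel_attack(model, torch.rand(3, 32, 32), true_label=3)
print("true-class probability after attack:", true_prob)
```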
This paper introduces the standalone neural ODE (sNODE), a continuous-depth neural model capable of describing a full deep neural network. It uses a novel nonlinear conjugate gradient (NCG) descent optimization scheme for training, in which the Sobolev gradient can be incorporated to improve the smoothness of the model weights. We also present a general formulation of the neural sensitivity problem and show how it is used in the NCG training. The sensitivity analysis provides a reliable measure of uncertainty propagation throughout a network and can be used to study model robustness and to generate adversarial attacks. Our evaluations demonstrate that our novel formulation leads to increased robustness and performance compared to ResNet models, and that it opens up new opportunities for designing and developing machine learning with improved explainability.
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
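A minimal sketch of the two squeezers named above and the comparison-based detection rule (the threshold, bit depth, and window size are illustrative, not the paper's tuned values):

```python
import torch
import torch.nn.functional as F
from scipy.ndimage import median_filter

def reduce_bit_depth(x, bits=4):
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def median_smooth(x, size=2):
    # smooth each image spatially, one channel at a time
    out = x.clone()
    for n in range(x.shape[0]):
        for c in range(x.shape[1]):
            out[n, c] = torch.from_numpy(median_filter(x[n, c].numpy(), size=size))
    return out

@torch.no_grad()
def is_adversarial(model, x, threshold=1.0):
    p = F.softmax(model(x), dim=1)
    scores = []
    for squeezed in (reduce_bit_depth(x), median_smooth(x)):
        p_sq = F.softmax(model(squeezed), dim=1)
        scores.append((p - p_sq).abs().sum(dim=1))   # L1 distance per sample
    return torch.stack(scores).max(dim=0).values > threshold

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)).eval()
x = torch.rand(4, 3, 32, 32)
print(is_adversarial(model, x))
```

The joint detection framework mentioned in the abstract amounts to taking the maximum score over all squeezers, as the sketch does.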
Many state-of-the-art ML models have surpassed humans in various tasks such as image classification. With such outstanding performance, ML models are widely used today. However, the existence of adversarial attacks and data poisoning attacks genuinely calls into question the robustness of ML models. For instance, Engstrom et al. demonstrated that state-of-the-art image classifiers can easily be fooled by a small rotation of an arbitrary image. As ML systems are increasingly incorporated into safety- and security-sensitive applications, adversarial attacks and data poisoning attacks pose a considerable threat. This chapter focuses on two broad and important areas of ML security: adversarial attacks and data poisoning attacks.
Adversarial examples are often cited by neuroscientists and machine learning researchers as an example of how computational models diverge from biological sensory systems. Recent work has proposed adding biologically-inspired components to visual neural networks as a way to improve their adversarial robustness. One surprisingly effective component for reducing adversarial vulnerability is response stochasticity, like that exhibited by biological neurons. Here, using recently developed geometric techniques from computational neuroscience, we investigate how adversarial perturbations influence the internal representations of standard, adversarially trained, and biologically-inspired stochastic networks. We find distinct geometric signatures for each type of network, revealing different mechanisms for achieving robust representations. Next, we generalize these results to the auditory domain, showing that neural stochasticity also makes auditory models more robust to adversarial perturbations. Geometric analysis of the stochastic networks reveals overlap between the representations of clean and adversarially perturbed stimuli, and quantitatively demonstrates that competing geometric effects of stochasticity mediate a trade-off between adversarial and clean performance. Our results shed light on the strategies for robust perception exploited by adversarially trained and stochastic networks, and help explain how stochasticity may benefit machine and biological computation.
The security of deep learning (DL) systems is an extremely important area of research, as they are being deployed in multiple applications while continually improving to solve challenging tasks. Despite overwhelming promise, deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye but can cause the model to misclassify. Protections against adversarial perturbations based on ensemble techniques have been shown either to be vulnerable to stronger adversaries or to lack an end-to-end evaluation. In this paper, we attempt to develop a new ensemble-based solution that constructs defender models with decision boundaries different from those of the original model. The ensemble of classifiers, constructed by (1) transforming the inputs through a method referred to as split-and-shave and (2) restricting the significant features through a method referred to as contrasting features, exhibits gradients that differ with respect to adversarial attacks, which reduces the chance of adversarial examples transferring from the original model to the defender models targeting the same class. We perform extensive experiments using standard image classification datasets, namely MNIST, CIFAR-10, and CIFAR-100, against state-of-the-art adversarial attacks to demonstrate the robustness of the proposed ensemble-based defense. We also evaluate the robustness in the presence of a stronger adversary that targets all the models in the ensemble simultaneously. Results on overall false positives and misclassifications are provided to estimate the overall performance of the proposed approach.
Neuroevolution automates the generation of artificial neural networks by applying techniques from evolutionary computation. The main goal of these approaches is to build models that maximize predictive performance, sometimes with the additional objective of minimizing computational complexity. Although the evolved models achieve competitive performance, their robustness to adversarial examples, which becomes a concern in critical scenarios, has received limited attention. In this paper, we evaluate the adversarial robustness of models found by two prominent neuroevolution approaches on the CIFAR-10 image classification task: DENSER and NSGA-Net. Since these models are publicly available, we consider white-box untargeted attacks, where the perturbations are bounded by either the L2 or the L-infinity norm. Similarly to manually designed networks, our results show that when the evolved models are attacked with iterative methods, their accuracy usually drops to, or close to, zero under both distance metrics. The DENSER model is an exception to this trend, showing some resistance under the L2 threat model, where its accuracy only drops from 93.70% to 18.10% even under iterative attacks. Additionally, we analyze the impact of the preprocessing applied to the data before the first layer of the network. Our observations suggest that some of these techniques can exacerbate the perturbations added to the original inputs, potentially harming robustness. Thus, this choice should not be overlooked when automatically designing networks for applications.
In this study, we introduce a measure for machine perception, inspired by the concept of Just Noticeable Difference (JND) in human perception. Based on this measure, we propose an adversarial image generation algorithm that iteratively distorts an image with additive noise until the model detects the change in the image by outputting a wrong label. The noise added to the original image is defined as the gradient of the cost function of the model. A novel cost function is defined to explicitly minimize the amount of perturbation applied to the input image while enforcing perceptual similarity between the adversarial and input images. For this purpose, the cost function is regularized with the well-known total variation and bounded-range terms to ensure a natural appearance of the adversarial image. We evaluate the adversarial images generated by our algorithm both qualitatively and quantitatively on the CIFAR10, ImageNet, and MS COCO datasets. Our experiments on image classification and object detection tasks show that adversarial images generated by our JND method are both more successful in deceiving the recognition/detection models and less perturbed compared to images generated by state-of-the-art methods, namely the FGV, FGSM, and DeepFool methods.
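A hedged sketch in the spirit of the described cost (not the authors' exact formulation): iteratively grow an additive perturbation along the loss gradient until the label flips, with total variation and bounded-range terms keeping the image natural-looking. The step size and weights are assumptions.

```python
import torch
import torch.nn.functional as F

def total_variation(img):
    return (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().sum() + \
           (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().sum()

def jnd_style_attack(model, x, y, step=0.01, lam_tv=0.1, lam_l2=0.1, max_steps=100):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(max_steps):
        adv = (x + delta).clamp(0, 1)                    # bounded-range constraint
        loss = F.cross_entropy(model(adv), y) \
               - lam_tv * total_variation(delta) \
               - lam_l2 * delta.pow(2).sum()             # keep the noise small and smooth
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta += step * grad.sign()
            if model((x + delta).clamp(0, 1)).argmax(dim=1) != y:
                break                                    # stop once the label flips
    return (x + delta).clamp(0, 1).detach()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)).eval()
x, y = torch.rand(1, 3, 32, 32), torch.tensor([3])
x_adv = jnd_style_attack(model, x, y)
```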
Although deep learning has made remarkable progress in processing various types of data such as images, text, and speech, deep models are known to be susceptible to adversarial perturbations: perturbations specifically designed and added to the input to make the target model produce erroneous output. Most of the existing studies on generating adversarial perturbations attempt to perturb the entire input indiscriminately. In this paper, we propose ExploreADV, a general and flexible adversarial attack system that is capable of modeling regional and imperceptible attacks, allowing users to explore various kinds of adversarial examples as needed. We adapt and combine two existing boundary attack methods, DeepFool and the Brendel & Bethge attack, and propose a mask-constrained adversarial attack system, which generates minimal adversarial perturbations under pixel-level constraints, namely "mask-constraints". We study different ways of generating such mask-constraints considering the variance and importance of the input features, and show that our adversarial attack system offers users good flexibility to focus on sub-regions of inputs, explore imperceptible perturbations and understand the vulnerability of pixels/regions to adversarial attacks. We demonstrate our system to be effective based on extensive experiments and a user study.
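As a simple illustration of the mask-constraint idea (not the ExploreADV implementation), an FGSM-style step can be restricted to a user-chosen sub-region by zeroing the perturbation outside a binary mask:

```python
import torch
import torch.nn.functional as F

def masked_fgsm(model, x, y, mask, eps=0.03):
    """mask: tensor of 0s/1s broadcastable to x; 1 marks pixels that may be perturbed."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign() * mask).clamp(0, 1).detach()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)).eval()
x, y = torch.rand(1, 3, 32, 32), torch.tensor([3])
mask = torch.zeros(1, 1, 32, 32)
mask[:, :, 8:24, 8:24] = 1.0        # only the central 16x16 region may change
x_adv = masked_fgsm(model, x, y, mask)
```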
The existence of adversarial attacks on convolutional neural networks (CNNs) questions the fitness of such models for serious applications. The attacks manipulate an input image such that misclassification is evoked while the image still appears normal to a human observer, so they are not easily detected. In a different context, backpropagated activations of a CNN's hidden layers (their "feature response" to a given input) help human "debuggers" visualize what the CNN "looks at" while computing its output. In this work, we propose a novel detection method to protect against such attacks. We do this by tracking adversarial perturbations in feature responses, allowing automatic detection using average local spatial entropy. The method does not alter the original network architecture and is fully interpretable. Experiments confirm the effectiveness of our approach against state-of-the-art attack methods on large-scale models trained on ImageNet.
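A hedged sketch of the detection statistic named above, computing the average local spatial entropy of a (channel-averaged) feature response by histogramming small patches; the patch size and bin count are assumptions:

```python
import numpy as np

def average_local_spatial_entropy(feature_map, patch=8, bins=16):
    """feature_map: 2D numpy array, e.g. a hidden-layer response averaged over channels."""
    fm = feature_map - feature_map.min()
    fm = fm / (fm.max() + 1e-8)                 # normalize to [0, 1]
    entropies = []
    for i in range(0, fm.shape[0] - patch + 1, patch):
        for j in range(0, fm.shape[1] - patch + 1, patch):
            hist, _ = np.histogram(fm[i:i + patch, j:j + patch], bins=bins, range=(0.0, 1.0))
            p = hist[hist > 0] / hist.sum()
            entropies.append(-(p * np.log2(p)).sum())   # Shannon entropy of the patch
    return float(np.mean(entropies))

# usage: compare the statistic for the feature responses of a clean and a perturbed image
clean_response = np.random.rand(56, 56)                       # placeholder feature response
adv_response = clean_response + 0.2 * np.random.rand(56, 56)  # placeholder perturbed response
print(average_local_spatial_entropy(clean_response),
      average_local_spatial_entropy(adv_response))
```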
Deep convolutional neural networks can accurately classify a wide variety of natural images, but they can easily be fooled when imperceptible perturbations are deliberately embedded in the images. In this paper, we design a multi-pronged training, input-transformation, and image-ensemble system that is attack-agnostic and not easily estimated. Our system incorporates two novel features. The first is a transformation layer that computes class-level polynomial kernels from training data examples and, at inference time, iteratively updates copies of the input image based on their feature-kernel differences to create an ensemble of transformed inputs. The second is a classification system that combines the prediction of the undefended network with hard voting over the ensemble of filtered images. Our evaluations on the CIFAR10 dataset show that our system improves the robustness of an undefended network against a variety of bounded and unbounded white-box attacks under different distance metrics, while sacrificing little accuracy on clean images. Against an adaptive, omniscient attacker crafting end-to-end attacks, our system successfully augments the existing robustness of adversarially trained networks, to which our method is most effectively applied.
Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples: inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models.
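The inner maximization of this robust-optimization view is commonly instantiated as projected gradient descent (PGD); a minimal sketch (standard formulation, with a placeholder model and hyperparameters) follows:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # random start inside the eps-ball, then iterated sign-gradient steps with projection
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
            x_adv = x_adv.clamp(0, 1)                  # stay a valid image
    return x_adv.detach()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)).eval()
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
x_adv = pgd_attack(model, x, y)
```

Inside a training loop, replacing the clean batch with `pgd_attack(model, x, y)` before the usual loss computation yields the adversarial training scheme the abstract alludes to.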
Recent optical flow methods are almost exclusively judged in terms of accuracy, while their robustness is often neglected. Although adversarial attacks offer a useful tool to perform such an analysis, current attacks on optical flow methods focus on real-world attack scenarios rather than worst-case robustness assessments. Hence, in this work we propose a novel adversarial attack, the Perturbation-Constrained Flow Attack (PCFA), which emphasizes destructivity over applicability as a real-world attack. PCFA is a global attack that optimizes adversarial perturbations to shift the predicted flow towards a specified target flow, while keeping the L2 norm of the perturbation below a chosen bound. Our experiments demonstrate PCFA's applicability in white- and black-box settings and show that it finds stronger adversarial samples than previous attacks. Based on these strong samples, we provide the first joint ranking of optical flow methods that considers both prediction quality and adversarial robustness, revealing that state-of-the-art methods are particularly vulnerable. Code is available at https://github.com/cv-stuttgart/pcfa.
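A hedged sketch of the constrained optimization described above, with a hypothetical `flow_model(img1, img2) -> flow` stand-in (the real PCFA implementation lives at the linked repository): push the predicted flow towards a target while projecting the perturbation onto an L2 ball after every step.

```python
import torch

def pcfa_style_attack(flow_model, img1, img2, target_flow, eps=10.0, lr=0.01, steps=50):
    delta = torch.zeros_like(img1, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        flow = flow_model((img1 + delta).clamp(0, 1), (img2 + delta).clamp(0, 1))
        loss = (flow - target_flow).pow(2).mean()      # move the prediction towards the target
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():                          # L2 projection of the perturbation
            norm = delta.norm(p=2)
            if norm > eps:
                delta.mul_(eps / norm)
    return delta.detach()

# toy stand-in, NOT a real optical flow network: maps two frames to a 2-channel "flow"
flow_model = lambda a, b: a[:, :2] - 0.5 * b[:, :2]
img1, img2 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
target = torch.zeros(1, 2, 64, 64)
delta = pcfa_style_attack(flow_model, img1, img2, target)
```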