The adversarial input generation problem has become central in establishing the robustness and trustworthiness of deep neural nets, especially when they are used in safety-critical application domains such as autonomous vehicles and precision medicine. This is also practically challenging for multiple reasons-scalability is a common issue owing to large-sized networks, and the generated adversarial inputs often lack important qualities such as naturalness and output-impartiality. We relate this problem to the task of patching neural nets, i.e. applying small changes in some of the network$'$s weights so that the modified net satisfies a given property. Intuitively, a patch can be used to produce an adversarial input because the effect of changing the weights can also be brought about by changing the inputs instead. This work presents a novel technique to patch neural networks and an innovative approach of using it to produce perturbations of inputs which are adversarial for the original net. We note that the proposed solution is significantly more effective than the prior state-of-the-art techniques.
translated by 谷歌翻译
机器学习算法和深度神经网络在几种感知和控制任务中的卓越性能正在推动该行业在安全关键应用中采用这种技术,作为自治机器人和自动驾驶车辆。然而,目前,需要解决几个问题,以使深入学习方法更可靠,可预测,安全,防止对抗性攻击。虽然已经提出了几种方法来提高深度神经网络的可信度,但大多数都是针对特定类的对抗示例量身定制的,因此未能检测到其他角落案件或不安全的输入,这些输入大量偏离训练样本。本文介绍了基于覆盖范式的轻量级监控架构,以增强针对不同不安全输入的模型鲁棒性。特别是,在用于评估多种检测逻辑的架构中提出并测试了四种覆盖分析方法。实验结果表明,该方法有效地检测强大的对抗性示例和分销外输入,引入有限的执行时间和内存要求。
translated by 谷歌翻译
Deep learning (DL) systems are increasingly deployed in safety-and security-critical domains including self-driving cars and malware detection, where the correctness and predictability of a system's behavior for corner case inputs are of great importance. Existing DL testing depends heavily on manually labeled data and therefore often fails to expose erroneous behaviors for rare inputs.We design, implement, and evaluate DeepXplore, the first whitebox framework for systematically testing real-world DL systems. First, we introduce neuron coverage for systematically measuring the parts of a DL system exercised by test inputs. Next, we leverage multiple DL systems with similar functionality as cross-referencing oracles to avoid manual checking. Finally, we demonstrate how finding inputs for DL systems that both trigger many differential behaviors and achieve high neuron coverage can be represented as a joint optimization problem and solved efficiently using gradientbased search techniques.DeepXplore efficiently finds thousands of incorrect corner case behaviors (e.g., self-driving cars crashing into guard rails and malware masquerading as benign software) in stateof-the-art DL models with thousands of neurons trained on five popular datasets including ImageNet and Udacity selfdriving challenge data. For all tested DL models, on average, DeepXplore generated one test input demonstrating incorrect behavior within one second while running only on a commodity laptop. We further show that the test inputs generated by DeepXplore can also be used to retrain the corresponding DL model to improve the model's accuracy by up to 3%.
translated by 谷歌翻译
作为一个新的编程范式,深度神经网络(DNN)在实践中越来越多地部署,但是缺乏鲁棒性阻碍了他们在安全至关重要的领域中的应用。尽管有用于正式保证的DNN验证DNN的技术,但它们的可伸缩性和准确性有限。在本文中,我们提出了一种新颖的抽象方法,用于可扩展和精确的DNN验证。具体而言,我们提出了一种新颖的抽象来通过过度透明度分解DNN的大小。如果未报告任何虚假反例,验证抽象DNN的结果始终是结论性的。为了消除抽象提出的虚假反例,我们提出了一种新颖的反例引导的改进,该精炼精炼了抽象的DNN,以排除给定的虚假反例,同时仍然过分欣赏原始示例。我们的方法是正交的,并且可以与许多现有的验证技术集成。为了进行演示,我们使用两个有前途和确切的工具Marabou和Planet作为基础验证引擎实施我们的方法,并对广泛使用的基准ACAS XU,MNIST和CIFAR-10进行评估。结果表明,我们的方法可以通过解决更多问题并分别减少86.3%和78.0%的验证时间来提高他们的绩效。与最相关的抽象方法相比,我们的方法是11.6-26.6倍。
translated by 谷歌翻译
深度神经网络(DNN)已成为实现各种复杂任务的首选技术。但是,正如许多最近的研究所强调的那样,即使是对正确分类的输入的不可察觉的扰动也可能导致DNN错误分类。这使DNNS容易受到攻击者的战略输入操作,并且对环境噪声过敏。为了减轻这种现象,从业人员通过DNNS的“合奏”进行联合分类。通过汇总不同单个DNN的分类输出对相同的输入,基于合奏的分类可以减少因任何单个DNN的随机训练过程的特定实现而导致错误分类的风险。但是,DNN集合的有效性高度依赖于其成员 *在许多不同的输入上没有同时错误 *。在本案例研究中,我们利用DNN验证的最新进展,设计一种方法来识别一种合奏组成,即使输入对对抗性进行了扰动,也不太容易出现同时误差 - 从而导致基于更坚固的集合分类。我们提出的框架使用DNN验证器作为后端,并包括启发式方法,有助于降低直接验证合奏的高复杂性。从更广泛的角度来看,我们的工作提出了一个新颖的普遍目标,以实现正式验证,该目标可能可以改善各种应用领域的现实世界中基于深度学习的系统的鲁棒性。
translated by 谷歌翻译
背景信息:在过去几年中,机器学习(ML)一直是许多创新的核心。然而,包括在所谓的“安全关键”系统中,例如汽车或航空的系统已经被证明是非常具有挑战性的,因为ML的范式转变为ML带来完全改变传统认证方法。目的:本文旨在阐明与ML为基础的安全关键系统认证有关的挑战,以及文献中提出的解决方案,以解决它们,回答问题的问题如何证明基于机器学习的安全关键系统?'方法:我们开展2015年至2020年至2020年之间发布的研究论文的系统文献综述(SLR),涵盖了与ML系统认证有关的主题。总共确定了217篇论文涵盖了主题,被认为是ML认证的主要支柱:鲁棒性,不确定性,解释性,验证,安全强化学习和直接认证。我们分析了每个子场的主要趋势和问题,并提取了提取的论文的总结。结果:单反结果突出了社区对该主题的热情,以及在数据集和模型类型方面缺乏多样性。它还强调需要进一步发展学术界和行业之间的联系,以加深域名研究。最后,它还说明了必须在上面提到的主要支柱之间建立连接的必要性,这些主要柱主要主要研究。结论:我们强调了目前部署的努力,以实现ML基于ML的软件系统,并讨论了一些未来的研究方向。
translated by 谷歌翻译
Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input x and any target classification t, it is possible to find a new input x that is similar to x but classified as t. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from 95% to 0.5%.In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.
translated by 谷歌翻译
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives.This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
translated by 谷歌翻译
神经网络已广泛应用于垃圾邮件和网络钓鱼检测,入侵预防和恶意软件检测等安全应用程序。但是,这种黑盒方法通常在应用中具有不确定性和不良的解释性。此外,神经网络本身通常容易受到对抗攻击的影响。由于这些原因,人们对可信赖和严格的方法有很高的需求来验证神经网络模型的鲁棒性。对抗性的鲁棒性在处理恶意操纵输入时涉及神经网络的可靠性,是安全和机器学习中最热门的主题之一。在这项工作中,我们在神经网络的对抗性鲁棒性验证中调查了现有文献,并在机器学习,安全和软件工程领域收集了39项多元化研究工作。我们系统地分析了它们的方法,包括如何制定鲁棒性,使用哪种验证技术以及每种技术的优势和局限性。我们从正式验证的角度提供分类学,以全面理解该主题。我们根据财产规范,减少问题和推理策略对现有技术进行分类。我们还展示了使用样本模型在现有研究中应用的代表性技术。最后,我们讨论了未来研究的开放问题。
translated by 谷歌翻译
Deep neural networks have achieved impressive experimental results in image classification, but can surprisingly be unstable with respect to adversarial perturbations, that is, minimal changes to the input image that cause the network to misclassify it. With potential applications including perception modules and end-to-end controllers for self-driving cars, this raises concerns about their safety. We develop a novel automated verification framework for feed-forward multi-layer neural networks based on Satisfiability Modulo Theory (SMT). We focus on safety of image classification decisions with respect to image manipulations, such as scratches or changes to camera angle or lighting conditions that would result in the same class being assigned by a human, and define safety for an individual decision in terms of invariance of the classification within a small neighbourhood of the original image. We enable exhaustive search of the region by employing discretisation, and propagate the analysis layer by layer. Our method works directly with the network code and, in contrast to existing methods, can guarantee that adversarial examples, if they exist, are found for the given region and family of manipulations. If found, adversarial examples can be shown to human testers and/or used to fine-tune the network. We implement the techniques using Z3 and evaluate them on state-of-the-art networks, including regularised and deep learning networks. We also compare against existing techniques to search for adversarial examples and estimate network robustness.
translated by 谷歌翻译
Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. However, imperfections in the training phase of deep neural networks make them vulnerable to adversarial samples: inputs crafted by adversaries with the intent of causing deep neural networks to misclassify. In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs. In an application to computer vision, we show that our algorithms can reliably produce samples correctly classified by human subjects but misclassified in specific targets by a DNN with a 97% adversarial success rate while only modifying on average 4.02% of the input features per sample. We then evaluate the vulnerability of different sample classes to adversarial perturbations by defining a hardness measure. Finally, we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.
translated by 谷歌翻译
Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs.Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to * Pin-Yu Chen and Huan Zhang contribute equally to this work.
translated by 谷歌翻译
在现实世界应用中的深度神经网络(DNN)的成功受益于丰富的预训练模型。然而,回溯预训练模型可以对下游DNN的部署构成显着的特洛伊木马威胁。现有的DNN测试方法主要旨在在对抗性设置中找到错误的角壳行为,但未能发现由强大的木马攻击所制作的后门。观察特洛伊木马网络行为表明,它们不仅由先前的工作所提出的单一受损神经元反射,而且归因于在多个神经元的激活强度和频率中的关键神经路径。这项工作制定了DNN后门测试,并提出了录音机框架。通过少量良性示例的关键神经元的差异模糊,我们识别特洛伊木马路径,特别是临界人,并通过模拟所识别的路径中的关键神经元来产生后门测试示例。广泛的实验表明了追索者的优越性,比现有方法更高的检测性能。通过隐秘的混合和自适应攻击来检测到后门的录音机更好,现有方法无法检测到。此外,我们的实验表明,录音所可能会揭示模型动物园中的模型的潜在潜在的背面。
translated by 谷歌翻译
随着深度学习在关键任务系统中的越来越多的应用,越来越需要对神经网络的行为进行正式保证。确实,最近提出了许多用于验证神经网络的方法,但是这些方法通常以有限的可伸缩性或不足的精度而挣扎。许多最先进的验证方案中的关键组成部分是在网络中可以为特定输入域获得的神经元获得的值计算下限和上限 - 并且这些界限更紧密,验证的可能性越大,验证的可能性就越大。成功。计算这些边界的许多常见算法是符号结合传播方法的变化。其中,利用一种称为后替代的过程的方法特别成功。在本文中,我们提出了一种使背部替代产生更严格的界限的方法。为了实现这一目标,我们制定并最大程度地减少背部固定过程中发生的不精确错误。我们的技术是一般的,从某种意义上说,它可以将其集成到许多现有的符号结合的传播技术中,并且只有较小的修改。我们将方法作为概念验证工具实施,并且与执行背部替代的最先进的验证者相比,取得了有利的结果。
translated by 谷歌翻译
在本文中,我们提出了一种防御策略,以通过合并隐藏的层表示来改善对抗性鲁棒性。这种防御策略的关键旨在压缩或过滤输入信息,包括对抗扰动。而且这种防御策略可以被视为一种激活函数,可以应用于任何类型的神经网络。从理论上讲,我们在某些条件下也证明了这种防御策略的有效性。此外,合并隐藏层表示,我们提出了三种类型的对抗攻击,分别生成三种类型的对抗示例。实验表明,我们的防御方法可以显着改善深神经网络的对抗性鲁棒性,即使我们不采用对抗性训练,也可以实现最新的表现。
translated by 谷歌翻译
由于它们在计算机视觉,图像处理和其他人领域的优异性能,卷积神经网络具有极大的普及。不幸的是,现在众所周知,卷积网络通常产生错误的结果 - 例如,这些网络的输入的小扰动可能导致严重的分类错误。近年来提出了许多验证方法,以证明没有此类错误,但这些通常用于完全连接的网络,并且在应用于卷积网络时遭受加剧的可扩展性问题。为了解决这一差距,我们在这里介绍了CNN-ABS框架,特别是旨在验证卷积网络。 CNN-ABS的核心是一种抽象细化技术,它通过拆除卷积连接,以便在这种方式创造原始问题的过度逼近来简化验证问题;如果产生的问题变得过于抽象,它会恢复这些连接。 CNN-ABS旨在使用现有的验证引擎作为后端,我们的评估表明它可以显着提高最先进的DNN验证引擎的性能,平均降低运行时间15.7%。
translated by 谷歌翻译
With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have been recently found vulnerable to well-designed input samples, called adversarial examples. Adversarial perturbations are imperceptible to human but can easily fool deep neural networks in the testing/deploying stage. The vulnerability to adversarial examples becomes one of the major risks for applying deep neural networks in safety-critical environments. Therefore, attacks and defenses on adversarial examples draw great attention. In this paper, we review recent findings on adversarial examples for deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications for adversarial examples are investigated. We further elaborate on countermeasures for adversarial examples. In addition, three major challenges in adversarial examples and the potential solutions are discussed.
translated by 谷歌翻译
Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly. In order to better understand the space of adversarial examples, we survey ten recent proposals that are designed for detection and compare their efficacy. We show that all can be defeated by constructing new loss functions. We conclude that adversarial examples are significantly harder to detect than previously appreciated, and the properties believed to be intrinsic to adversarial examples are in fact not. Finally, we propose several simple guidelines for evaluating future proposed defenses.
translated by 谷歌翻译
深度神经网络(DNN)模型,包括在安全 - 关键域中使用的模型,需要进行彻底测试,以确保它们在不同的情况下可以可靠地表现良好。在本文中,我们提供了用于测试DNN模型的结构覆盖量指标,包括神经元覆盖(NC),K-Multisection神经元覆盖范围(KMNC),TOP-K神经元覆盖范围(TKNC),神经元边界覆盖率(NBC),强元(NBC),强神经元激活覆盖范围(SNAC)和修改条件/决策覆盖范围(MC/DC)。我们评估用于感知任务的现实DNN模型(包括LENET-1,LENET-4,LENET-5和RESNET20)以及自治(TAXINET)中使用的网络的指标。我们还提供了一个工具DNNCOV,可以测量所有这些指标的测试覆盖范围。 DNNCOV向研究人员和从业人员提供了一份信息丰富的报道报告,以评估DNN测试的充分性,比较不同的覆盖范围,并在测试过程中更方便地检查模型的内部。
translated by 谷歌翻译
We present AI 2 , the first sound and scalable analyzer for deep neural networks. Based on overapproximation, AI 2 can automatically prove safety properties (e.g., robustness) of realistic neural networks (e.g., convolutional neural networks).The key insight behind AI 2 is to phrase reasoning about safety and robustness of neural networks in terms of classic abstract interpretation, enabling us to leverage decades of advances in that area. Concretely, we introduce abstract transformers that capture the behavior of fully connected and convolutional neural network layers with rectified linear unit activations (ReLU), as well as max pooling layers. This allows us to handle real-world neural networks, which are often built out of those types of layers.We present a complete implementation of AI 2 together with an extensive evaluation on 20 neural networks. Our results demonstrate that: (i) AI 2 is precise enough to prove useful specifications (e.g., robustness), (ii) AI 2 can be used to certify the effectiveness of state-of-the-art defenses for neural networks, (iii) AI 2 is significantly faster than existing analyzers based on symbolic analysis, which often take hours to verify simple fully connected networks, and (iv) AI 2 can handle deep convolutional networks, which are beyond the reach of existing methods.
translated by 谷歌翻译