在对抗机器学习中,防止对深度学习系统的攻击的新防御能力在释放更强大的攻击后不久就会破坏。在这种情况下,法医工具可以通过追溯成功的根本原因来为现有防御措施提供宝贵的补充,并为缓解措施提供前进的途径,以防止将来采取类似的攻击。在本文中,我们描述了我们为开发用于深度神经网络毒物攻击的法医追溯工具的努力。我们提出了一种新型的迭代聚类和修剪解决方案,该解决方案修剪了“无辜”训练样本,直到所有剩余的是一组造成攻击的中毒数据。我们的方法群群训练样本基于它们对模型参数的影响,然后使用有效的数据解读方法来修剪无辜簇。我们从经验上证明了系统对三种类型的肮脏标签(后门)毒物攻击和三种类型的清洁标签毒药攻击的功效,这些毒物跨越了计算机视觉和恶意软件分类。我们的系统在所有攻击中都达到了98.4%的精度和96.8%的召回。我们还表明,我们的系统与专门攻击它的四种抗纤维法措施相对强大。
translated by 谷歌翻译
与令人印象深刻的进步触动了我们社会的各个方面,基于深度神经网络(DNN)的AI技术正在带来越来越多的安全问题。虽然在考试时间运行的攻击垄断了研究人员的初始关注,但是通过干扰培训过程来利用破坏DNN模型的可能性,代表了破坏训练过程的可能性,这是破坏AI技术的可靠性的进一步严重威胁。在后门攻击中,攻击者损坏了培训数据,以便在测试时间诱导错误的行为。然而,测试时间误差仅在存在与正确制作的输入样本对应的触发事件的情况下被激活。通过这种方式,损坏的网络继续正常输入的预期工作,并且只有当攻击者决定激活网络内隐藏的后门时,才会发生恶意行为。在过去几年中,后门攻击一直是强烈的研究活动的主题,重点是新的攻击阶段的发展,以及可能对策的提议。此概述文件的目标是审查发表的作品,直到现在,分类到目前为止提出的不同类型的攻击和防御。指导分析的分类基于攻击者对培训过程的控制量,以及防御者验证用于培训的数据的完整性,并监控DNN在培训和测试中的操作时间。因此,拟议的分析特别适合于参考他们在运营的应用方案的攻击和防御的强度和弱点。
translated by 谷歌翻译
计算能力和大型培训数据集的可用性增加,机器学习的成功助长了。假设它充分代表了在测试时遇到的数据,则使用培训数据来学习新模型或更新现有模型。这种假设受到中毒威胁的挑战,这种攻击会操纵训练数据,以损害模型在测试时的表现。尽管中毒已被认为是行业应用中的相关威胁,到目前为止,已经提出了各种不同的攻击和防御措施,但对该领域的完整系统化和批判性审查仍然缺失。在这项调查中,我们在机器学习中提供了中毒攻击和防御措施的全面系统化,审查了过去15年中该领域发表的100多篇论文。我们首先对当前的威胁模型和攻击进行分类,然后相应地组织现有防御。虽然我们主要关注计算机视觉应用程序,但我们认为我们的系统化还包括其他数据模式的最新攻击和防御。最后,我们讨论了中毒研究的现有资源,并阐明了当前的局限性和该研究领域的开放研究问题。
translated by 谷歌翻译
A recent trojan attack on deep neural network (DNN) models is one insidious variant of data poisoning attacks. Trojan attacks exploit an effective backdoor created in a DNN model by leveraging the difficulty in interpretability of the learned model to misclassify any inputs signed with the attacker's chosen trojan trigger. Since the trojan trigger is a secret guarded and exploited by the attacker, detecting such trojan inputs is a challenge, especially at run-time when models are in active operation. This work builds STRong Intentional Perturbation (STRIP) based run-time trojan attack detection system and focuses on vision system. We intentionally perturb the incoming input, for instance by superimposing various image patterns, and observe the randomness of predicted classes for perturbed inputs from a given deployed model-malicious or benign. A low entropy in predicted classes violates the input-dependence property of a benign model and implies the presence of a malicious input-a characteristic of a trojaned input. The high efficacy of our method is validated through case studies on three popular and contrasting datasets: MNIST, CIFAR10 and GTSRB. We achieve an overall false acceptance rate (FAR) of less than 1%, given a preset false rejection rate (FRR) of 1%, for different types of triggers. Using CIFAR10 and GTSRB, we have empirically achieved result of 0% for both FRR and FAR. We have also evaluated STRIP robustness against a number of trojan attack variants and adaptive attacks.
translated by 谷歌翻译
已知深度学习系统容易受到对抗例子的影响。特别是,基于查询的黑框攻击不需要深入学习模型的知识,而可以通过提交查询和检查收益来计算网络上的对抗示例。最近的工作在很大程度上提高了这些攻击的效率,证明了它们在当今的ML-AS-A-Service平台上的实用性。我们提出了Blacklight,这是针对基于查询的黑盒对抗攻击的新防御。推动我们设计的基本见解是,为了计算对抗性示例,这些攻击在网络上进行了迭代优化,从而在输入空间中产生了非常相似的图像查询。 Blacklight使用在概率内容指纹上运行的有效相似性引擎来检测高度相似的查询来检测基于查询的黑盒攻击。我们根据各种模型和图像分类任务对八次最先进的攻击进行评估。 Blacklight通常只有几次查询后,都可以识别所有这些。通过拒绝所有检测到的查询,即使攻击者在帐户禁令或查询拒绝之后持续提交查询,Blacklight也可以防止任何攻击完成。 Blacklight在几个强大的对策中也很强大,包括最佳的黑盒攻击,该攻击近似于效率的白色框攻击。最后,我们说明了黑光如何推广到其他域,例如文本分类。
translated by 谷歌翻译
后门攻击已被证明是对深度学习模型的严重安全威胁,并且检测给定模型是否已成为后门成为至关重要的任务。现有的防御措施主要建立在观察到后门触发器通常尺寸很小或仅影响几个神经元激活的观察结果。但是,在许多情况下,尤其是对于高级后门攻击,违反了上述观察结果,阻碍了现有防御的性能和适用性。在本文中,我们提出了基于新观察的后门防御范围。也就是说,有效的后门攻击通常需要对中毒训练样本的高预测置信度,以确保训练有素的模型具有很高的可能性。基于此观察结果,Dtinspector首先学习一个可以改变最高信心数据的预测的补丁,然后通过检查在低信心数据上应用学习补丁后检查预测变化的比率来决定后门的存在。对五次后门攻击,四个数据集和三种高级攻击类型的广泛评估证明了拟议防御的有效性。
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
深度神经网络(DNNS)在训练过程中容易受到后门攻击的影响。该模型以这种方式损坏正常起作用,但是当输入中的某些模式触发时,会产生预定义的目标标签。现有防御通常依赖于通用后门设置的假设,其中有毒样品共享相同的均匀扳机。但是,最近的高级后门攻击表明,这种假设在动态后门中不再有效,在动态后门中,触发者因输入而异,从而击败了现有的防御。在这项工作中,我们提出了一种新颖的技术BEATRIX(通过革兰氏矩阵检测)。 BEATRIX利用革兰氏矩阵不仅捕获特征相关性,还可以捕获表示形式的适当高阶信息。通过从正常样本的激活模式中学习类条件统计,BEATRIX可以通过捕获激活模式中的异常来识别中毒样品。为了进一步提高识别目标标签的性能,BEATRIX利用基于内核的测试,而无需对表示分布进行任何先前的假设。我们通过与最先进的防御技术进行了广泛的评估和比较来证明我们的方法的有效性。实验结果表明,我们的方法在检测动态后门时达到了91.1%的F1得分,而最新技术只能达到36.9%。
translated by 谷歌翻译
最近的作品表明,深度学习模型容易受到后门中毒攻击的影响,在这些攻击中,这些攻击灌输了与外部触发模式或物体(例如贴纸,太阳镜等)的虚假相关性。我们发现这种外部触发信号是不必要的,因为可以使用基于旋转的图像转换轻松插入高效的后门。我们的方法通过旋转有限数量的对象并将其标记错误来构建中毒数据集;一旦接受过培训,受害者的模型将在运行时间推理期间做出不良的预测。它表现出明显的攻击成功率,同时通过有关图像分类和对象检测任务的全面实证研究来保持清洁绩效。此外,我们评估了标准数据增强技术和针对我们的攻击的四种不同的后门防御措施,发现它们都无法作为一致的缓解方法。正如我们在图像分类和对象检测应用程序中所示,我们的攻击只能在现实世界中轻松部署在现实世界中。总体而言,我们的工作突出了一个新的,简单的,物理上可实现的,高效的矢量,用于后门攻击。我们的视频演示可在https://youtu.be/6jif8wnx34m上找到。
translated by 谷歌翻译
后门攻击已成为深度神经网络(DNN)的主要安全威胁。虽然现有的防御方法在检测或擦除后以后展示了有希望的结果,但仍然尚不清楚是否可以设计强大的培训方法,以防止后门触发器首先注入训练的模型。在本文中,我们介绍了\ emph {反后门学习}的概念,旨在培训\ emph {Clean}模型给出了后门中毒数据。我们将整体学习过程框架作为学习\ emph {clean}和\ emph {backdoor}部分的双重任务。从这种观点来看,我们确定了两个后门攻击的固有特征,因为他们的弱点2)后门任务与特定类(后门目标类)相关联。根据这两个弱点,我们提出了一般学习计划,反后门学习(ABL),在培训期间自动防止后门攻击。 ABL引入了标准培训的两级\ EMPH {梯度上升}机制,帮助分离早期训练阶段的后台示例,2)在后续训练阶段中断后门示例和目标类之间的相关性。通过对多个基准数据集的广泛实验,针对10个最先进的攻击,我们经验证明,后卫中毒数据上的ABL培训模型实现了与纯净清洁数据训练的相同性能。代码可用于\ url {https:/github.com/boylyg/abl}。
translated by 谷歌翻译
Backdoor attacks have emerged as one of the major security threats to deep learning models as they can easily control the model's test-time predictions by pre-injecting a backdoor trigger into the model at training time. While backdoor attacks have been extensively studied on images, few works have investigated the threat of backdoor attacks on time series data. To fill this gap, in this paper we present a novel generative approach for time series backdoor attacks against deep learning based time series classifiers. Backdoor attacks have two main goals: high stealthiness and high attack success rate. We find that, compared to images, it can be more challenging to achieve the two goals on time series. This is because time series have fewer input dimensions and lower degrees of freedom, making it hard to achieve a high attack success rate without compromising stealthiness. Our generative approach addresses this challenge by generating trigger patterns that are as realistic as real-time series patterns while achieving a high attack success rate without causing a significant drop in clean accuracy. We also show that our proposed attack is resistant to potential backdoor defenses. Furthermore, we propose a novel universal generator that can poison any type of time series with a single generator that allows universal attacks without the need to fine-tune the generative model for new time series datasets.
translated by 谷歌翻译
后门是针对深神经网络(DNN)的强大攻击。通过中毒训练数据,攻击者可以将隐藏的规则(后门)注入DNN,该规则仅在包含攻击特异性触发器的输入上激活。尽管现有工作已经研究了各种DNN模型的后门攻击,但它们仅考虑静态模型,这些模型在初始部署后保持不变。在本文中,我们研究了后门攻击对时变DNN模型更现实的情况的影响,其中定期更新模型权重以处理数据分布的漂移。具体而言,我们从经验上量化了后门针对模型更新的“生存能力”,并检查攻击参数,数据漂移行为和模型更新策略如何影响后门生存能力。我们的结果表明,即使攻击者会积极增加触发器的大小和毒药比,即使在几个模型更新中,一次射击后门攻击(即一次仅中毒训练数据)也无法幸免。为了保持模型更新影响,攻击者必须不断将损坏的数据引入培训管道。这些结果共同表明,当模型更新以学习新数据时,它们也将后门“忘记”为隐藏的恶意功能。旧培训数据之间的分配变化越大,后门被遗忘了。利用这些见解,我们应用了智能学习率调度程序,以进一步加速模型更新期间的后门遗忘,这阻止了单发后门在单个模型更新中幸存下来。
translated by 谷歌翻译
随着机器学习数据的策展变得越来越自动化,数据集篡改是一种安装威胁。后门攻击者通过培训数据篡改,以嵌入在该数据上培训的模型中的漏洞。然后通过将“触发”放入模型的输入中的推理时间以推理时间激活此漏洞。典型的后门攻击将触发器直接插入训练数据,尽管在检查时可能会看到这种攻击。相比之下,隐藏的触发后托攻击攻击达到中毒,而无需将触发器放入训练数据即可。然而,这种隐藏的触发攻击在从头开始培训的中毒神经网络时无效。我们开发了一个新的隐藏触发攻击,睡眠代理,在制备过程中使用梯度匹配,数据选择和目标模型重新培训。睡眠者代理是第一个隐藏的触发后门攻击,以对从头开始培训的神经网络有效。我们展示了Imagenet和黑盒设置的有效性。我们的实现代码可以在https://github.com/hsouri/sleeper-agent找到。
translated by 谷歌翻译
本文提出了针对回顾性神经网络(Badnets)的新型两级防御(NNOCULICULE),该案例在响应该字段中遇到的回溯测试输入,修复了预部署和在线的BADNET。在预部署阶段,NNICULICULE与清洁验证输入的随机扰动进行检测,以部分减少后门的对抗影响。部署后,NNOCULICULE通过在原始和预先部署修补网络之间录制分歧来检测和隔离测试输入。然后培训Constcan以学习清洁验证和隔离输入之间的转换;即,它学会添加触发器来清洁验证图像。回顾验证图像以及其正确的标签用于进一步重新培训预修补程序,产生我们的最终防御。关于全面的后门攻击套件的实证评估表明,NNOCLICULE优于所有最先进的防御,以制定限制性假设,并且仅在特定的后门攻击上工作,或者在适应性攻击中失败。相比之下,NNICULICULE使得最小的假设并提供有效的防御,即使在现有防御因攻击者而导致其限制假设而导致的现有防御无效的情况下。
translated by 谷歌翻译
有针对性的训练集攻击将恶意实例注入训练集中,以导致训练有素的模型错误地标记一个或多个特定的测试实例。这项工作提出了目标识别的任务,该任务决定了特定的测试实例是否是训练集攻击的目标。目标识别可以与对抗性识别相结合,以查找(并删除)攻击实例,从而减轻对其他预测的影响,从而减轻攻击。我们没有专注于单个攻击方法或数据模式,而是基于影响力估计,这量化了每个培训实例对模型预测的贡献。我们表明,现有的影响估计量的不良实际表现通常来自于他们对训练实例和迭代次数的过度依赖。我们重新归一化的影响估计器解决了这一弱点。他们的表现远远超过了原始估计量,可以在对抗和非对抗环境中识别有影响力的训练示例群体,甚至发现多达100%的对抗训练实例,没有清洁数据误报。然后,目标识别简化以检测具有异常影响值的测试实例。我们证明了我们的方法对各种数据域的后门和中毒攻击的有效性,包括文本,视觉和语音,以及针对灰色盒子的自适应攻击者,该攻击者专门优化了逃避我们方法的对抗性实例。我们的源代码可在https://github.com/zaydh/target_indistification中找到。
translated by 谷歌翻译
With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real world applications has become an important research topic. Backdoor attacks are a form of adversarial attacks on deep networks where the attacker provides poisoned data to the victim to train the model with, and then activates the attack by showing a specific small trigger pattern at the test time. Most state-of-the-art backdoor attacks either provide mislabeled poisoning data that is possible to identify by visual inspection, reveal the trigger in the poisoned data, or use noise to hide the trigger. We propose a novel form of backdoor attack where poisoned data look natural with correct labels and also more importantly, the attacker hides the trigger in the poisoned data and keeps the trigger secret until the test time.We perform an extensive study on various image classification settings and show that our attack can fool the model by pasting the trigger at random locations on unseen images although the model performs well on clean data. We also show that our proposed attack cannot be easily defended using a state-of-the-art defense algorithm for backdoor attacks.
translated by 谷歌翻译
最近的研究表明,深神经网络(DNN)易受对抗性攻击的影响,包括逃避和后门(中毒)攻击。在防守方面,有密集的努力,改善了对逃避袭击的经验和可怜的稳健性;然而,对后门攻击的可稳健性仍然很大程度上是未开发的。在本文中,我们专注于认证机器学习模型稳健性,反对一般威胁模型,尤其是后门攻击。我们首先通过随机平滑技术提供统一的框架,并展示如何实例化以证明对逃避和后门攻击的鲁棒性。然后,我们提出了第一个强大的培训过程Rab,以平滑训练有素的模型,并证明其稳健性对抗后门攻击。我们派生机学习模型的稳健性突出了培训的机器学习模型,并证明我们的鲁棒性受到紧张。此外,我们表明,可以有效地训练强大的平滑模型,以适用于诸如k最近邻分类器的简单模型,并提出了一种精确的平滑训练算法,该算法消除了从这种模型的噪声分布采样采样的需要。经验上,我们对MNIST,CIFAR-10和Imagenet数据集等DNN,差异私有DNN和K-NN模型等不同机器学习(ML)型号进行了全面的实验,并为反卧系攻击提供认证稳健性的第一个基准。此外,我们在SPAMBase表格数据集上评估K-NN模型,以展示所提出的精确算法的优点。对多元化模型和数据集的综合评价既有关于普通训练时间攻击的进一步强劲学习策略的多样化模型和数据集的综合评价。
translated by 谷歌翻译
AI安全社区的一个主要目标是为现实世界应用安全可靠地生产和部署深入学习模型。为此,近年来,在生产阶段(或培训阶段)和相应的防御中,基于数据中毒基于深度神经网络(DNN)的后门攻击以及相应的防御。具有讽刺意味的是,部署阶段的后门攻击,这些攻击通常可以在不专业用户的设备中发生,因此可以说是在现实世界的情景中威胁要威胁,得以更少的关注社区。我们将这种警惕的不平衡归因于现有部署阶段后门攻击算法的弱实用性以及现实世界攻击示范的不足。为了填补空白,在这项工作中,我们研究了对DNN的部署阶段后门攻击的现实威胁。我们基于普通使用的部署阶段攻击范式 - 对抗对抗权重攻击的研究,主体选择性地修改模型权重,以将后台嵌入到部署的DNN中。为了实现现实的实用性,我们提出了第一款灰度盒和物理可实现的重量攻击算法,即替换注射,即子网替换攻击(SRA),只需要受害者模型的架构信息,并且可以支持现实世界中的物理触发器。进行了广泛的实验模拟和系统级真实的世界攻击示范。我们的结果不仅提出了所提出的攻击算法的有效性和实用性,还揭示了一种新型计算机病毒的实际风险,这些计算机病毒可能会广泛传播和悄悄地将后门注入用户设备中的DNN模型。通过我们的研究,我们要求更多地关注DNN在部署阶段的脆弱性。
translated by 谷歌翻译
Federated Learning (FL) is a scheme for collaboratively training Deep Neural Networks (DNNs) with multiple data sources from different clients. Instead of sharing the data, each client trains the model locally, resulting in improved privacy. However, recently so-called targeted poisoning attacks have been proposed that allow individual clients to inject a backdoor into the trained model. Existing defenses against these backdoor attacks either rely on techniques like Differential Privacy to mitigate the backdoor, or analyze the weights of the individual models and apply outlier detection methods that restricts these defenses to certain data distributions. However, adding noise to the models' parameters or excluding benign outliers might also reduce the accuracy of the collaboratively trained model. Additionally, allowing the server to inspect the clients' models creates a privacy risk due to existing knowledge extraction methods. We propose CrowdGuard, a model filtering defense, that mitigates backdoor attacks by leveraging the clients' data to analyze the individual models before the aggregation. To prevent data leaks, the server sends the individual models to secure enclaves, running in client-located Trusted Execution Environments. To effectively distinguish benign and poisoned models, even if the data of different clients are not independently and identically distributed (non-IID), we introduce a novel metric called HLBIM to analyze the outputs of the DNN's hidden layers. We show that the applied significance-based detection algorithm combined can effectively detect poisoned models, even in non-IID scenarios. We show in our extensive evaluation that CrowdGuard can effectively mitigate targeted poisoning attacks and achieve in various scenarios a True-Positive-Rate of 100% and a True-Negative-Rate of 100%.
translated by 谷歌翻译
Transforming off-the-shelf deep neural network (DNN) models into dynamic multi-exit architectures can achieve inference and transmission efficiency by fragmenting and distributing a large DNN model in edge computing scenarios (e.g., edge devices and cloud servers). In this paper, we propose a novel backdoor attack specifically on the dynamic multi-exit DNN models. Particularly, we inject a backdoor by poisoning one DNN model's shallow hidden layers targeting not this vanilla DNN model but only its dynamically deployed multi-exit architectures. Our backdoored vanilla model behaves normally on performance and cannot be activated even with the correct trigger. However, the backdoor will be activated when the victims acquire this model and transform it into a dynamic multi-exit architecture at their deployment. We conduct extensive experiments to prove the effectiveness of our attack on three structures (ResNet-56, VGG-16, and MobileNet) with four datasets (CIFAR-10, SVHN, GTSRB, and Tiny-ImageNet) and our backdoor is stealthy to evade multiple state-of-the-art backdoor detection or removal methods.
translated by 谷歌翻译