Deep ensemble learning has been shown to improve accuracy by training multiple neural networks and averaging their outputs. Ensemble learning has also been suggested to defend against membership inference attacks that undermine privacy. In this paper, we empirically demonstrate a trade-off between these two goals, namely accuracy and privacy (in terms of membership inference attacks), in deep ensembles. Using a wide range of datasets and model architectures, we show that the effectiveness of membership inference attacks increases when ensembling improves accuracy. We analyze the impact of various factors in deep ensembles and demonstrate the root cause of the trade-off. Then, we evaluate common defenses against membership inference attacks based on regularization and differential privacy. We show that while these defenses can mitigate the effectiveness of membership inference attacks, they simultaneously degrade ensemble accuracy. We illustrate similar trade-off in more advanced and state-of-the-art ensembling techniques, such as snapshot ensembles and diversified ensemble networks. Finally, we propose a simple yet effective defense for deep ensembles to break the trade-off and, consequently, improve the accuracy and privacy, simultaneously.
translated by 谷歌翻译
With the wide-spread application of machine learning models, it has become critical to study the potential data leakage of models trained on sensitive data. Recently, various membership inference (MI) attacks are proposed that determines if a sample was part of the training set or not. Although the first generation of MI attacks has been proven to be ineffective in practice, a few recent studies proposed practical MI attacks that achieve reasonable true positive rate at low false positive rate. The question is whether these attacks can be reliably used in practice. We showcase a practical application of membership inference attacks where it is used by an auditor (investigator) to prove to a judge/jury that an auditee unlawfully used sensitive data during training. Then, we show that the auditee can provide a dataset (with potentially unlimited number of samples) to a judge where MI attacks catastrophically fail. Hence, the auditee challenges the credibility of the auditor and can get the case dismissed. More importantly, we show that the auditee does not need to know anything about the MI attack neither a query access to it. In other words, all currently SOTA MI attacks in literature suffer from the same issue. Through comprehensive experimental evaluation, we show that our algorithms can increase the false positive rate from ten to thousands times larger than what auditor claim to the judge. Lastly, we argue that the implication of our algorithms is beyond discredibility: Current membership inference attacks can identify the memorized subpopulations, but they cannot reliably identify which exact sample in the subpopulation was used during training.
translated by 谷歌翻译
会员推理攻击是机器学习模型中最简单的隐私泄漏形式之一:给定数据点和模型,确定该点是否用于培训模型。当查询其培训数据时,现有会员推理攻击利用模型的异常置信度。如果对手访问模型的预测标签,则不会申请这些攻击,而不会置信度。在本文中,我们介绍了仅限标签的会员资格推理攻击。我们的攻击而不是依赖置信分数,而是评估模型预测标签在扰动下的稳健性,以获得细粒度的隶属信号。这些扰动包括常见的数据增强或对抗例。我们经验表明,我们的标签占会员推理攻击与先前攻击相符,以便需要访问模型信心。我们进一步证明,仅限标签攻击违反了(隐含或明确)依赖于我们呼叫信心屏蔽的现象的员工推论攻击的多种防御。这些防御修改了模型的置信度分数以挫败攻击,但留下模型的预测标签不变。我们的标签攻击展示了置信性掩蔽不是抵御会员推理的可行的防御策略。最后,我们调查唯一的案例标签攻击,该攻击推断为少量异常值数据点。我们显示仅标签攻击也匹配此设置中基于置信的攻击。我们发现具有差异隐私和(强)L2正则化的培训模型是唯一已知的防御策略,成功地防止所有攻击。即使差异隐私预算太高而无法提供有意义的可证明担保,这仍然存在。
translated by 谷歌翻译
机器学习模型容易记住敏感数据,使它们容易受到会员推理攻击的攻击,其中对手的目的是推断是否使用输入样本来训练模型。在过去的几年中,研究人员产生了许多会员推理攻击和防御。但是,这些攻击和防御采用各种策略,并在不同的模型和数据集中进行。但是,缺乏全面的基准意味着我们不了解现有攻击和防御的优势和劣势。我们通过对不同的会员推理攻击和防御措施进行大规模测量来填补这一空白。我们通过研究九项攻击和六项防御措施来系统化成员的推断,并在整体评估中衡量不同攻击和防御的性能。然后,我们量化威胁模型对这些攻击结果的影响。我们发现,威胁模型的某些假设,例如相同架构和阴影和目标模型之间的相同分布是不必要的。我们也是第一个对从Internet收集的现实世界数据而不是实验室数据集进行攻击的人。我们进一步研究是什么决定了会员推理攻击的表现,并揭示了通常认为过度拟合水平不足以成功攻击。取而代之的是,成员和非成员样本之间的熵/横向熵的詹森 - 香农距离与攻击性能的相关性更好。这为我们提供了一种新的方法,可以在不进行攻击的情况下准确预测会员推理风险。最后,我们发现数据增强在更大程度上降低了现有攻击的性能,我们提出了使用增强作用的自适应攻击来训练阴影和攻击模型,以改善攻击性能。
translated by 谷歌翻译
员额推理攻击允许对训练的机器学习模型进行对手以预测模型的训练数据集中包含特定示例。目前使用平均案例的“精度”度量来评估这些攻击,该攻击未能表征攻击是否可以自信地识别培训集的任何成员。我们认为,应该通过计算其低(例如<0.1%)假阳性率来计算攻击来评估攻击,并在以这种方式评估时发现大多数事先攻击差。为了解决这一问题,我们开发了一个仔细结合文献中多种想法的似然比攻击(Lira)。我们的攻击是低于虚假阳性率的10倍,并且在攻击现有度量的情况下也严格占主导地位。
translated by 谷歌翻译
机器学习(ML)模型已广泛应用于各种应用,包括图像分类,文本生成,音频识别和图形数据分析。然而,最近的研究表明,ML模型容易受到隶属推导攻击(MIS),其目的是推断数据记录是否用于训练目标模型。 ML模型上的MIA可以直接导致隐私违规行为。例如,通过确定已经用于训练与某种疾病相关的模型的临床记录,攻击者可以推断临床记录的所有者具有很大的机会。近年来,MIS已被证明对各种ML模型有效,例如,分类模型和生成模型。同时,已经提出了许多防御方法来减轻米西亚。虽然ML模型上的MIAS形成了一个新的新兴和快速增长的研究区,但还没有对这一主题进行系统的调查。在本文中,我们对会员推论和防御进行了第一个全面调查。我们根据其特征提供攻击和防御的分类管理,并讨论其优点和缺点。根据本次调查中确定的限制和差距,我们指出了几个未来的未来研究方向,以激发希望遵循该地区的研究人员。这项调查不仅是研究社区的参考,而且还为该研究领域之外的研究人员带来了清晰的照片。为了进一步促进研究人员,我们创建了一个在线资源存储库,并与未来的相关作品继续更新。感兴趣的读者可以在https://github.com/hongshenghu/membership-inference-machine-learning-literature找到存储库。
translated by 谷歌翻译
We quantitatively investigate how machine learning models leak information about the individual data records on which they were trained. We focus on the basic membership inference attack: given a data record and black-box access to a model, determine if the record was in the model's training dataset. To perform membership inference against a target model, we make adversarial use of machine learning and train our own inference model to recognize differences in the target model's predictions on the inputs that it trained on versus the inputs that it did not train on.We empirically evaluate our inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.
translated by 谷歌翻译
对机器学习模型的会员推理攻击(MIA)可能会导致模型培训中使用的培训数据集的严重隐私风险。在本文中,我们提出了一种针对成员推理攻击(MIAS)的新颖有效的神经元引导的防御方法。我们确定了针对MIA的现有防御机制的关键弱点,在该机制中,他们不能同时防御两个常用的基于神经网络的MIA,表明应分别评估这两次攻击以确保防御效果。我们提出了Neuguard,这是一种新的防御方法,可以通过对象共同控制输出和内部神经元的激活,以指导训练集的模型输出和测试集的模型输出以具有近距离分布。 Neuguard由类别的差异最小化靶向限制最终输出神经元和层平衡输出控制的目标,旨在限制每一层中的内部神经元。我们评估Neuguard,并将其与最新的防御能力与两个基于神经网络的MIA,五个最强的基于度量的MIA,包括三个基准数据集中的新提出的仅标签MIA。结果表明,Neuguard通过提供大大改善的公用事业权衡权衡,一般性和间接费用来优于最先进的防御能力。
translated by 谷歌翻译
近年来,机器学习模型对成员推论攻击的脆弱性受到了很大的关注。然而,由于具有高假阳性率,现有攻击大多是不切实际的,其中非成员样本通常被错误地预测为成员。这种类型的错误使得预测的隶属信号不可靠,特别是因为大多数样本都是现实世界应用中的非成员。在这项工作中,我们认为会员推理攻击可以从\ emph {难度校准}剧烈地利用,其中调整攻击的预测会员评分以正确分类目标样本的难度。我们表明,在没有准确性的情况下,难度校准可以显着降低各种现有攻击的假阳性率。
translated by 谷歌翻译
如今,深度学习模型的所有者和开发人员必须考虑其培训数据的严格隐私保护规则,通常是人群来源且保留敏感信息。如今,深入学习模型执行隐私保证的最广泛采用的方法依赖于实施差异隐私的优化技术。根据文献,这种方法已被证明是针对多种模型的隐私攻击的成功防御,但其缺点是对模型的性能的实质性降级。在这项工作中,我们比较了差异私有的随机梯度下降(DP-SGD)算法与使用正则化技术的标准优化实践的有效性。我们分析了生成模型的实用程序,培训性能以及成员推理和模型反转攻击对学习模型的有效性。最后,我们讨论了差异隐私的缺陷和限制,并从经验上证明了辍学和L2型规范的卓越保护特性。
translated by 谷歌翻译
半监督学习(SSL)利用标记和未标记的数据来训练机器学习(ML)模型。最先进的SSL方法可以通过利用更少的标记数据来实现与监督学习相当的性能。但是,大多数现有作品都集中在提高SSL的性能。在这项工作中,我们通过研究SSL的培训数据隐私来采取不同的角度。具体而言,我们建议针对由SSL训练的ML模型进行的第一个基于数据增强的成员推理攻击。给定数据样本和黑框访问模型,成员推理攻击的目标是确定数据样本是否属于模型的训练数据集。我们的评估表明,拟议的攻击可以始终超过现有的成员推理攻击,并针对由SSL训练的模型实现最佳性能。此外,我们发现,SSL中会员泄漏的原因与受到监督学习中普遍认为的原因不同,即过度拟合(培训和测试准确性之间的差距)。我们观察到,SSL模型已被概括为测试数据(几乎为0个过度拟合),但“记住”训练数据通过提供更自信的预测,无论其正确性如何。我们还探索了早期停止,作为防止成员推理攻击SSL的对策。结果表明,早期停止可以减轻会员推理攻击,但由于模型的实用性降解成本。
translated by 谷歌翻译
Machine learning (ML) has become a core component of many real-world applications and training data is a key factor that drives current progress. This huge success has led Internet companies to deploy machine learning as a service (MLaaS). Recently, the first membership inference attack has shown that extraction of information on the training set is possible in such MLaaS settings, which has severe security and privacy implications.However, the early demonstrations of the feasibility of such attacks have many assumptions on the adversary, such as using multiple so-called shadow models, knowledge of the target model structure, and having a dataset from the same distribution as the target model's training data. We relax all these key assumptions, thereby showing that such attacks are very broadly applicable at low cost and thereby pose a more severe risk than previously thought. We present the most comprehensive study so far on this emerging and developing threat using eight diverse datasets which show the viability of the proposed attacks across domains.In addition, we propose the first effective defense mechanisms against such broader class of membership inference attacks that maintain a high level of utility of the ML model.
translated by 谷歌翻译
Neural networks are susceptible to data inference attacks such as the membership inference attack, the adversarial model inversion attack and the attribute inference attack, where the attacker could infer useful information such as the membership, the reconstruction or the sensitive attributes of a data sample from the confidence scores predicted by the target classifier. In this paper, we propose a method, namely PURIFIER, to defend against membership inference attacks. It transforms the confidence score vectors predicted by the target classifier and makes purified confidence scores indistinguishable in individual shape, statistical distribution and prediction label between members and non-members. The experimental results show that PURIFIER helps defend membership inference attacks with high effectiveness and efficiency, outperforming previous defense methods, and also incurs negligible utility loss. Besides, our further experiments show that PURIFIER is also effective in defending adversarial model inversion attacks and attribute inference attacks. For example, the inversion error is raised about 4+ times on the Facescrub530 classifier, and the attribute inference accuracy drops significantly when PURIFIER is deployed in our experiment.
translated by 谷歌翻译
作为对培训数据隐私的长期威胁,会员推理攻击(MIA)在机器学习模型中无处不在。现有作品证明了培训的区分性与测试损失分布与模型对MIA的脆弱性之间的密切联系。在现有结果的激励下,我们提出了一个基于轻松损失的新型培训框架,并具有更可实现的学习目标,从而导致概括差距狭窄和隐私泄漏减少。 RelaseLoss适用于任何分类模型,具有易于实施和可忽略不计的开销的额外好处。通过对具有不同方式(图像,医疗数据,交易记录)的五个数据集进行广泛的评估,我们的方法始终优于针对MIA和模型效用的韧性,以最先进的防御机制优于最先进的防御机制。我们的防御是第一个可以承受广泛攻击的同时,同时保存(甚至改善)目标模型的效用。源代码可从https://github.com/dingfanchen/relaxloss获得
translated by 谷歌翻译
神经网络修剪一直是减少对资源受限设备的深度神经网络的计算和记忆要求的重要技术。大多数现有的研究主要侧重于平衡修剪神经网络的稀疏性和准确性,通过策略性地删除无关紧要的参数并重新修剪修剪模型。由于记忆的增加而造成了严重的隐私风险,因此尚未调查这种训练样品的这种努力。在本文中,我们对神经网络修剪中的隐私风险进行了首次分析。具体而言,我们研究了神经网络修剪对培训数据隐私的影响,即成员推理攻击。我们首先探讨了神经网络修剪对预测差异的影响,在该预测差异中,修剪过程不成比例地影响了修剪的模型对成员和非会员的行为。同时,差异的影响甚至以细粒度的方式在不同类别之间有所不同。通过这种分歧,我们提出了对修剪的神经网络的自我发起会员推断攻击。进行了广泛的实验,以严格评估不同修剪方法,稀疏水平和对手知识的隐私影响。拟议的攻击表明,与现有的八次成员推理攻击相比,对修剪模型的攻击性能更高。此外,我们提出了一种新的防御机制,通过基于KL-Divergence距离来缓解预测差异,以保护修剪过程,该距离的预测差异已通过实验证明,可以有效地降低隐私风险,同时维持较修剪模型的稀疏性和准确性。
translated by 谷歌翻译
Deep neural networks are susceptible to various inference attacks as they remember information about their training data. We design white-box inference attacks to perform a comprehensive privacy analysis of deep learning models. We measure the privacy leakage through parameters of fully trained models as well as the parameter updates of models during training. We design inference algorithms for both centralized and federated learning, with respect to passive and active inference attackers, and assuming different adversary prior knowledge.We evaluate our novel white-box membership inference attacks against deep learning algorithms to trace their training data records. We show that a straightforward extension of the known black-box attacks to the white-box setting (through analyzing the outputs of activation functions) is ineffective. We therefore design new algorithms tailored to the white-box setting by exploiting the privacy vulnerabilities of the stochastic gradient descent algorithm, which is the algorithm used to train deep neural networks. We investigate the reasons why deep learning models may leak information about their training data. We then show that even well-generalized models are significantly susceptible to white-box membership inference attacks, by analyzing stateof-the-art pre-trained and publicly available models for the CIFAR dataset. We also show how adversarial participants, in the federated learning setting, can successfully run active membership inference attacks against other participants, even when the global model achieves high prediction accuracies.
translated by 谷歌翻译
在模型提取攻击中,对手可以通过反复查询并根据获得的预测来窃取通过公共API暴露的机器学习模型。为了防止模型窃取,现有的防御措施专注于检测恶意查询,截断或扭曲输出,因此必然会为合法用户引入鲁棒性和模型实用程序之间的权衡。取而代之的是,我们建议通过要求用户在阅读模型的预测之前完成工作证明来阻碍模型提取。这可以通过大大增加(甚至高达100倍)来阻止攻击者,以利用查询访问模型提取所需的计算工作。由于我们校准完成每个查询的工作证明所需的努力,因此这仅为常规用户(最多2倍)引入一个轻微的开销。为了实现这一目标,我们的校准应用了来自差异隐私的工具来衡量查询揭示的信息。我们的方法不需要对受害者模型进行任何修改,可以通过机器学习从业人员来应用其公开暴露的模型免于轻易被盗。
translated by 谷歌翻译
In a membership inference attack, an attacker aims to infer whether a data sample is in a target classifier's training dataset or not. Specifically, given a black-box access to the target classifier, the attacker trains a binary classifier, which takes a data sample's confidence score vector predicted by the target classifier as an input and predicts the data sample to be a member or non-member of the target classifier's training dataset. Membership inference attacks pose severe privacy and security threats to the training dataset. Most existing defenses leverage differential privacy when training the target classifier or regularize the training process of the target classifier. These defenses suffer from two key limitations: 1) they do not have formal utility-loss guarantees of the confidence score vectors, and 2) they achieve suboptimal privacy-utility tradeoffs.In this work, we propose MemGuard, the first defense with formal utility-loss guarantees against black-box membership inference attacks. Instead of tampering the training process of the target classifier, MemGuard adds noise to each confidence score vector predicted by the target classifier. Our key observation is that attacker uses a classifier to predict member or non-member and classifier is vulnerable to adversarial examples. Based on the observation, we propose to add a carefully crafted noise vector to a confidence score vector to turn it into an adversarial example that misleads the attacker's classifier. Specifically, MemGuard works in two phases. In Phase I, MemGuard finds a carefully crafted noise vector that can turn a confidence score vector into an adversarial example, which is likely to mislead the attacker's classifier to make a random guessing at member or non-member. We find such carefully crafted noise vector via a new method that we design to incorporate the unique utility-loss constraints on the noise vector. In Phase II, Mem-Guard adds the noise vector to the confidence score vector with a certain probability, which is selected to satisfy a given utility-loss budget on the confidence score vector. Our experimental results on
translated by 谷歌翻译
Differential privacy is a strong notion for privacy that can be used to prove formal guarantees, in terms of a privacy budget, , about how much information is leaked by a mechanism. However, implementations of privacy-preserving machine learning often select large values of in order to get acceptable utility of the model, with little understanding of the impact of such choices on meaningful privacy. Moreover, in scenarios where iterative learning procedures are used, differential privacy variants that offer tighter analyses are used which appear to reduce the needed privacy budget but present poorly understood trade-offs between privacy and utility. In this paper, we quantify the impact of these choices on privacy in experiments with logistic regression and neural network models. Our main finding is that there is a huge gap between the upper bounds on privacy loss that can be guaranteed, even with advanced mechanisms, and the effective privacy loss that can be measured using current inference attacks. Current mechanisms for differentially private machine learning rarely offer acceptable utility-privacy trade-offs with guarantees for complex learning tasks: settings that provide limited accuracy loss provide meaningless privacy guarantees, and settings that provide strong privacy guarantees result in useless models.
translated by 谷歌翻译
会员推理(MI)攻击突出了当前神经网络随机培训方法中的隐私弱点。然而,它为什么出现。它们仅是不完美概括的自然结果吗?在培训期间,我们应该解决哪些根本原因以减轻这些攻击?为了回答此类问题,我们提出了第一种解释MI攻击及其基于原则性因果推理的概括的方法。我们提供因果图,以定量地解释以$ 6 $攻击变体获得的观察到的MI攻击性能。我们驳斥了几种先前的非量化假设,这些假设过于简化或过度估计潜在原因的影响,从而未能捕获几个因素之间的复杂相互作用。我们的因果模型还通过共同的因果因素显示了概括和MI攻击之间的新联系。我们的因果模型具有很高的预测能力($ 0.90 $),即它们的分析预测与经常看不见的实验中的观察结果相匹配,这使得通过它们的分析成为务实的替代方案。
translated by 谷歌翻译