A plethora of methods have been proposed to explain how deep neural networks reach their decisions, but comparatively little effort has been made to ensure that the explanations produced by these methods are objectively relevant. While several desirable properties of trustworthy explanations have been formulated, objective measures have been harder to derive. Here, we propose two new measures for evaluating explanations borrowed from the field of algorithmic stability: mean generalizability (MeGe) and relative consistency (ReCo). We conduct extensive experiments on different network architectures, common explainability methods, and several image datasets to demonstrate the benefits of the proposed measures. In comparison, popular fidelity measures are not sufficient to guarantee trustworthy explanations. Finally, we find that 1-Lipschitz networks produce explanations with higher MeGe and ReCo than common neural networks while reaching similar accuracy. This suggests that 1-Lipschitz networks are a relevant direction towards more interpretable and trustworthy predictors.
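As a minimal sketch of the algorithmic-stability idea behind such consistency measures (not the paper's exact definitions of MeGe or ReCo), one could compare attribution maps produced for the same input by models trained on different data splits; `models`, `explain`, and the distance below are assumed, generic interfaces:

```python
import numpy as np

def explanation_consistency(models, explain, xs,
                            dist=lambda a, b: float(np.linalg.norm(a - b))):
    """Crude consistency score: average pairwise distance between attribution maps
    produced for the same input by models trained on different data splits.
    A lower average distance is turned into a higher score in (0, 1]."""
    per_input = []
    for x in xs:
        maps = [explain(m, x) for m in models]            # one attribution map per model
        pairs = [dist(maps[i], maps[j])
                 for i in range(len(maps)) for j in range(i + 1, len(maps))]
        per_input.append(np.mean(pairs))
    return 1.0 / (1.0 + float(np.mean(per_input)))
```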
Explainability has been widely stated as a cornerstone of the responsible and trustworthy use of machine learning models. With the ubiquitous use of Deep Neural Network (DNN) models expanding to risk-sensitive and safety-critical domains, many methods have been proposed to explain the decisions of these models. Recent years have also seen concerted efforts that have shown how such explanations can be distorted (attacked) by minor input perturbations. While there have been many surveys that review explainability methods themselves, there has been no effort hitherto to assimilate the different methods and metrics proposed to study the robustness of explanations of DNN models. In this work, we present a comprehensive survey of methods that study, understand, attack, and defend explanations of DNN models. We also present a detailed review of different metrics used to evaluate explanation methods, as well as describe attributional attack and defense methods. We conclude with lessons and take-aways for the community towards ensuring robust explanations of DNN model predictions.
A multitude of explainability methods and associated theoretical evaluation scores have been proposed. However, it is not clear (1) how useful these methods are in real-world scenarios and (2) how well the theoretical measures predict the usefulness of the methods for humans in practice. To fill this gap, we conducted human psychophysics experiments at scale to evaluate the ability of human participants (n = 1,150) to leverage representative attribution methods to learn to predict the decisions of different image classifiers. Our results demonstrate that theoretical measures used to score explainability methods poorly reflect the practical usefulness of individual attribution methods in real-world scenarios. Furthermore, the degree to which individual attribution methods help human participants predict a classifier's decisions varies widely across classification tasks and datasets. Overall, our results highlight fundamental challenges for the field and suggest a critical need to develop better explainability methods and to deploy human-centered evaluation approaches. We make the code of our framework available to ease the systematic evaluation of novel explainability methods.
We describe a novel attribution method which is grounded in sensitivity analysis and uses Sobol indices. Beyond modeling the individual contributions of image regions, Sobol indices provide an efficient way to capture higher-order interactions between image regions and their contributions to a neural network's prediction through the lens of variance. We describe an approach that makes the computation of these indices tractable for high-dimensional problems by using perturbation masks coupled with efficient estimators to handle the high dimensionality of images. Importantly, we show that, compared to other black-box methods, the proposed approach yields favorable scores on standard benchmarks for vision (and language) models, even surpassing the accuracy of state-of-the-art white-box methods that require access to internal representations. Our code is freely available: https://github.com/fel-thomas/sobol-attribution-method
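A rough illustration of the variance-based idea, not the paper's estimator: perturb the image with random patch masks and estimate a first-order Sobol index per patch from the variance of the conditional mean score. The `model` callable (a batch of images in, a NumPy array of class scores out) and image sides divisible by the patch grid are assumptions of this sketch:

```python
import numpy as np

def sobol_first_order_map(model, x, n_masks=2048, grid=8, rng=None):
    """Crude first-order Sobol sensitivity per image patch.
    x: (H, W, C) float image; model(batch) -> (N,) scores for the class of interest."""
    rng = np.random.default_rng(rng)
    H, W, _ = x.shape
    masks = rng.integers(0, 2, size=(n_masks, grid, grid)).astype(np.float32)
    # Nearest-neighbour upsampling of patch masks to pixel resolution
    up = masks.repeat(H // grid, axis=1).repeat(W // grid, axis=2)[..., None]
    y = model(x[None] * up)                               # scores on perturbed images
    total_var = y.var() + 1e-12
    s = np.zeros((grid, grid))
    for i in range(grid):
        for j in range(grid):
            bit = masks[:, i, j]
            # Var(E[Y | patch kept or masked]) over the two mask states of this patch
            cond_means = [y[bit == b].mean() for b in (0.0, 1.0)]
            probs = [np.mean(bit == b) for b in (0.0, 1.0)]
            mean_y = y.mean()
            s[i, j] = sum(p * (m - mean_y) ** 2
                          for p, m in zip(probs, cond_means)) / total_var
    return s
```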
This paper presents a new efficient black-box attribution method based on the Hilbert-Schmidt Independence Criterion (HSIC), a dependence measure based on Reproducing Kernel Hilbert Spaces (RKHS). HSIC measures the dependence between regions of an input image and the output of a model using kernels on distributions. It thus provides explanations enriched by the representational power of RKHS. HSIC can be estimated very efficiently, significantly reducing the computational cost compared to other black-box attribution methods. Our experiments show that HSIC is up to 8 times faster than the previous best black-box attribution methods while being as faithful. Indeed, we improve or match the state of the art of both black-box and white-box attribution methods on several fidelity metrics on ImageNet with various recent model architectures. Importantly, we show that these advances can be transposed to efficiently and faithfully explain object detection models such as YOLOv4. Finally, we extend the traditional attribution methods by proposing a new kernel enabling an orthogonal decomposition of importance scores based on HSIC, allowing us to evaluate not only the importance of each image patch but also the importance of their pairwise interactions.
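A naive sketch of scoring patches by the HSIC between their mask bit and the model output; the RBF kernels, bandwidths, sampling scheme, and O(n^3) estimator below are placeholders rather than the kernels and efficient estimator proposed in the paper:

```python
import numpy as np

def hsic(a, b, sigma_a=1.0, sigma_b=1.0):
    """Biased empirical HSIC between two 1-D samples, using RBF kernels."""
    a, b = a.reshape(-1, 1), b.reshape(-1, 1)
    n = len(a)
    K = np.exp(-(a - a.T) ** 2 / (2 * sigma_a ** 2))
    L = np.exp(-(b - b.T) ** 2 / (2 * sigma_b ** 2))
    H = np.eye(n) - np.ones((n, n)) / n                   # centering matrix
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)

def hsic_attribution(model, x, n_masks=256, grid=7, rng=None):
    """Score each patch by the dependence (HSIC) between its mask bit and the
    model output over randomly masked copies of the image.
    n_masks is kept small: this naive estimator is cubic in the sample size."""
    rng = np.random.default_rng(rng)
    h, w, _ = x.shape
    masks = rng.integers(0, 2, size=(n_masks, grid, grid)).astype(np.float32)
    up = masks.repeat(h // grid, axis=1).repeat(w // grid, axis=2)[..., None]
    y = model(x[None] * up)                               # (n_masks,) scores
    return np.array([[hsic(masks[:, i, j], y) for j in range(grid)]
                     for i in range(grid)])
```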
With the rise of deep neural networks, the challenge of explaining their predictions has become increasingly recognized. While many methods exist for explaining the decisions of deep neural networks, there is currently no consensus on how to evaluate them. On the other hand, robustness is a popular topic in deep learning research; however, until very recently it has hardly been discussed in the context of explainability. In this tutorial, we first present gradient-based interpretability methods. These techniques use gradient signals to assign the burden of the decision to the input features. We then discuss how gradient-based methods can be evaluated for robustness and the role that adversarial robustness plays in obtaining meaningful explanations. We also discuss the limitations of gradient-based methods. Finally, we present best practices and properties that should be examined before choosing an explainability method, and conclude with directions for future research in the area where robustness and explainability converge.
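For reference, a minimal PyTorch sketch of the vanilla gradient (saliency) method that such tutorials typically start from; the `model` interface and input shape are assumptions:

```python
import torch

def vanilla_gradient_saliency(model, x, target_class):
    """Saliency map = |d score_target / d input|, the simplest gradient-based explanation.
    x: (1, C, H, W) tensor; model: torch.nn.Module returning class logits."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[0, target_class]
    score.backward()
    return x.grad.detach().abs().max(dim=1).values        # (1, H, W): max over channels
```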
We argue that, when learning a 1-Lipschitz neural network with the dual loss of an optimal transport problem, the gradient of the model is both the direction of the transportation plan and the direction of the closest adversarial attack. Traveling along the gradient towards the decision boundary is then no longer an adversarial attack but a counterfactual explanation that explicitly transports the input from one class to another. Through extensive experiments on XAI metrics, we find that the simple saliency map method applied to such networks becomes a reliable explanation and outperforms state-of-the-art explanation methods on unconstrained models. The proposed networks were already known to be certifiably robust; we show that they are also explainable with a fast and simple method.
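An illustrative sketch, not the authors' code, of the traversal described above: follow the gradient of the target-vs-source logit margin until the decision boundary is crossed, which for a 1-Lipschitz optimal-transport classifier is argued to yield a counterfactual rather than an adversarial example:

```python
import torch

def gradient_counterfactual(model, x, source_class, target_class, step=0.05, n_steps=200):
    """Walk along the gradient of the (target - source) logit margin until it turns
    positive, i.e. until the decision boundary is crossed (illustrative sketch)."""
    x = x.clone().detach()
    for _ in range(n_steps):
        x.requires_grad_(True)
        logits = model(x)
        margin = logits[0, target_class] - logits[0, source_class]
        if margin.item() > 0:                              # crossed the decision boundary
            break
        grad, = torch.autograd.grad(margin, x)
        x = (x + step * grad / (grad.norm() + 1e-12)).detach()
    return x
```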
Deep neural networks are being used increasingly to automate data analysis and decision making, yet their decision-making process is largely unclear and is difficult to explain to the end users. In this paper, we address the problem of Explainable AI for deep neural networks that take images as input and output a class probability. We propose an approach called RISE that generates an importance map indicating how salient each pixel is for the model's prediction. In contrast to white-box approaches that estimate pixel importance using gradients or other internal network state, RISE works on blackbox models. It estimates importance empirically by probing the model with randomly masked versions of the input image and obtaining the corresponding outputs. We compare our approach to state-of-the-art importance extraction methods using both an automatic deletion/insertion metric and a pointing metric based on human-annotated object segments. Extensive experiments on several benchmark datasets show that our approach matches or exceeds the performance of other methods, including white-box approaches.
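A compact sketch of the RISE idea under simplifying assumptions (nearest-neighbour mask upsampling and no random shifts, unlike the smoothed masks in the paper); `model` is assumed to return the probability of the class of interest for a batch of images as a NumPy array:

```python
import numpy as np

def rise_saliency(model, x, n_masks=4000, grid=7, p_keep=0.5, rng=None):
    """RISE-style saliency: weight random masks by the black-box scores they produce.
    x: (H, W, C) image; model(batch) -> (N,) probabilities of the class of interest."""
    rng = np.random.default_rng(rng)
    H, W, _ = x.shape
    masks = (rng.random((n_masks, grid, grid)) < p_keep).astype(np.float32)
    up = masks.repeat(int(np.ceil(H / grid)), axis=1)[:, :H]
    up = up.repeat(int(np.ceil(W / grid)), axis=2)[:, :, :W][..., None]
    scores = model(x[None] * up)                           # (n_masks,) masked-input scores
    sal = (scores[:, None, None, None] * up).sum(axis=0)[..., 0]
    return sal / (n_masks * p_keep)                        # Monte Carlo normalization
```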
Saliency methods compute heat maps that highlight portions of an input that were most {\em important} for the label assigned to it by a deep net. Evaluations of saliency methods convert this heat map into a new {\em masked input} by retaining the $k$ highest-ranked pixels of the original input and replacing the rest with \textquotedblleft uninformative\textquotedblright\ pixels, and checking if the net's output is mostly unchanged. This is usually seen as an {\em explanation} of the output, but the current paper highlights reasons why this inference of causality may be suspect. Inspired by logic concepts of {\em completeness \& soundness}, it observes that the above type of evaluation focuses on completeness of the explanation, but ignores soundness. New evaluation metrics are introduced to capture both notions, while staying in an {\em intrinsic} framework -- i.e., using the dataset and the net, but no separately trained nets, human evaluations, etc. A simple saliency method is described that matches or outperforms prior methods in the evaluations. Experiments also suggest new intrinsic justifications, based on soundness, for popular heuristic tricks such as TV regularization and upsampling.
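A minimal sketch of the masked-input evaluation described above; the choice of "uninformative" fill value (zeros, blur, dataset mean) is itself a design decision and is only a placeholder here:

```python
import numpy as np

def topk_masked_score(model, x, saliency, k_frac=0.1, fill_value=0.0):
    """Keep the k highest-ranked pixels of `saliency`, replace the rest with an
    'uninformative' value, and return the model's score on the masked input.
    Comparing this score with the score on the original input is the usual
    completeness-style check discussed above."""
    H, W = saliency.shape
    k = max(1, int(k_frac * H * W))
    thresh = np.sort(saliency.ravel())[-k]                 # value of the k-th largest pixel
    keep = (saliency >= thresh)[..., None]                 # (H, W, 1) boolean mask
    masked = np.where(keep, x, fill_value)
    return model(masked[None])[0]
```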
Interpretability provides a means for humans to verify aspects of machine learning (ML) models and empower human+ML teaming in situations where the task cannot be fully automated. Different contexts require explanations with different properties. For example, the kind of explanation required to determine if an early cardiac arrest warning system is ready to be integrated into a care setting is very different from the type of explanation required for a loan applicant to help determine the actions they might need to take to make their application successful. Unfortunately, there is a lack of standardization when it comes to properties of explanations: different papers may use the same term to mean different quantities, and different terms to mean the same quantity. This lack of a standardized terminology and categorization of the properties of ML explanations prevents us from both rigorously comparing interpretable machine learning methods and identifying what properties are needed in what contexts. In this work, we survey properties defined in interpretable machine learning papers, synthesize them based on what they actually measure, and describe the trade-offs between different formulations of these properties. In doing so, we enable more informed selection of task-appropriate formulations of explanation properties as well as standardization for future work in interpretable machine learning.
The most popular methods and algorithms for AI are, for the vast majority, black boxes. Black boxes can be an acceptable solution for unimportant problems (in terms of degree of impact) but have a fatal flaw for the rest, and explanation tools for them have therefore been developed rapidly. The evaluation of their quality remains an open research question. In this technical report, we revisit the recently proposed post-hoc explainers FEM and MLFEM, which were designed to explain CNNs in image and video classification tasks, and propose their evaluation with reference-based and no-reference metrics. The reference-based metrics are the Pearson correlation coefficient and Similarity, computed between the explanation maps and the ground truth, represented by Gaze Fixation Density Maps obtained from a psycho-visual experiment. As a no-reference metric we use the "stability" metric proposed by Alvarez-Melis and Jaakkola. We study its behaviour and its consensus with the reference-based metrics, and show that for several kinds of degradation of the input images this metric agrees with the reference-based ones. It can therefore be used to evaluate the quality of explainers when ground truth is not available.
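Hedged sketches of the three metrics mentioned above, using the usual definitions from the saliency-evaluation literature (Pearson CC and histogram-intersection Similarity against a gaze fixation density map, plus a perturbation-based stability estimate in the spirit of Alvarez-Melis and Jaakkola); the exact normalizations used in the report may differ:

```python
import numpy as np

def pearson_cc(expl, gfdm):
    """Pearson correlation between an explanation map and a gaze fixation density map."""
    e, g = expl.ravel(), gfdm.ravel()
    e = (e - e.mean()) / (e.std() + 1e-12)
    g = (g - g.mean()) / (g.std() + 1e-12)
    return float((e * g).mean())

def similarity(expl, gfdm):
    """Similarity (SIM): histogram intersection of the two maps normalized to sum to 1."""
    e = expl / (expl.sum() + 1e-12)
    g = gfdm / (gfdm.sum() + 1e-12)
    return float(np.minimum(e, g).sum())

def stability(explain, model, x, n=16, eps=0.05, rng=None):
    """No-reference stability estimate: worst-case ratio of explanation change
    to input change under small random perturbations of the input."""
    rng = np.random.default_rng(rng)
    base = explain(model, x)
    ratios = []
    for _ in range(n):
        delta = rng.normal(scale=eps, size=x.shape)
        ratios.append(np.linalg.norm(explain(model, x + delta) - base)
                      / (np.linalg.norm(delta) + 1e-12))
    return max(ratios)
```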
Context: Machine learning (ML) has been at the heart of many innovations over the past few years. However, including it in so-called "safety-critical" systems, such as automotive or aeronautical systems, has proven to be very challenging, since the paradigm shift introduced by ML completely changes traditional certification approaches. Objective: This paper aims to elucidate the challenges related to the certification of ML-based safety-critical systems, as well as the solutions proposed in the literature to address them, answering the question "How to certify machine learning based safety-critical systems?". Method: We conducted a Systematic Literature Review (SLR) of research papers published between 2015 and 2020, covering topics related to the certification of ML systems. In total, we identified 217 papers covering what are considered the main pillars of ML certification: robustness, uncertainty, explainability, verification, safe reinforcement learning, and direct certification. We analyzed the main trends and problems of each sub-field and provide summaries of the extracted papers. Results: The SLR results highlight the enthusiasm of the community for this topic, as well as the lack of diversity in terms of datasets and model types. They also emphasize the need to further develop connections between academia and industry to deepen research in this domain. Finally, they illustrate the necessity of building connections between the above-mentioned main pillars, which have mostly been studied separately. Conclusion: We highlight the efforts currently deployed to enable the certification of ML-based software systems, and discuss some future research directions.
The emerging field of eXplainable Artificial Intelligence (XAI) aims to bring transparency to today's powerful but opaque deep learning models. While local XAI methods explain individual predictions in the form of attribution maps, thereby identifying where important features occur (but not providing information about what they represent), global explanation techniques visualize what concepts a model has generally learned to encode. Both types of method thus only provide partial insights and leave the burden of interpreting the model's reasoning to the user. Only a few contemporary techniques aim to combine the principles behind both local and global XAI to obtain more informative explanations. Those methods, however, are often limited to specific model architectures or impose additional requirements on the training regime or on data and label availability, which renders post-hoc application to arbitrary pre-trained models practically impossible. In this work we introduce the Concept Relevance Propagation (CRP) approach, which combines the local and global perspectives of XAI and thus allows answering both the "where" and "what" questions without additional constraints. We further introduce the principle of Relevance Maximization for finding representative examples of encoded concepts based on their usefulness to the model, lifting the dependency on the common practice of Activation Maximization and its limitations. We demonstrate the capabilities of our methods in various settings, showcasing that Concept Relevance Propagation and Relevance Maximization lead to more human-interpretable explanations and provide deep insights into the model's representations and reasoning through concept atlases, concept composition analyses, and quantitative investigations of concept subspaces and their role in fine-grained decision making.
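As a loose illustration of the Relevance Maximization idea (selecting the examples for which a concept is most useful to the model), the sketch below ranks dataset samples by a gradient-times-activation score for one channel of a chosen layer; this is a stand-in proxy, not the LRP-based relevance that CRP actually propagates:

```python
import torch

def relevance_maximization_proxy(model, layer, channel, dataset, target_class, top_k=9):
    """Rank dataset examples by how much a chosen channel contributes to the target
    logit, using gradient x activation as a stand-in for LRP-style relevance.
    dataset is assumed to yield (1, C, H, W) leaf tensors.
    Returns the indices of the top_k most 'relevant' examples for that concept."""
    acts = {}
    handle = layer.register_forward_hook(lambda m, i, o: acts.update(out=o))
    scores = []
    for idx, x in enumerate(dataset):
        logit = model(x)[0, target_class]
        grad = torch.autograd.grad(logit, acts["out"])[0]
        rel = (grad[0, channel] * acts["out"][0, channel]).sum()   # channel relevance proxy
        scores.append((rel.item(), idx))
    handle.remove()
    return [idx for _, idx in sorted(scores, reverse=True)[:top_k]]
```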
End-to-end neural NLP architectures are notoriously difficult to understand, which has given rise to numerous efforts towards model explainability in recent years. An essential principle of model explanation is faithfulness: an explanation should accurately represent the reasoning process behind the model's prediction. This survey first discusses the definition and evaluation of faithfulness and its significance for explainability. We then introduce recent advances in faithful explanation by grouping approaches into five categories: similarity-based methods, analysis of model-internal structures, backpropagation-based methods, counterfactual intervention, and self-explanatory models. Each category is illustrated with its representative studies, advantages, and shortcomings. Finally, we discuss all of the above methods in terms of their common virtues and limitations, and reflect on future directions for work on faithful explainability. For researchers interested in studying interpretability, this survey offers an accessible and comprehensive overview of the field and a foundation for further exploration. For users hoping to better understand their own models, it serves as an introductory manual that helps with choosing the most suitable explanation method.
In recent years, explainable artificial intelligence (XAI) has become a well-suited framework for generating human-understandable explanations of "black box" models. In this paper, a novel XAI visual explanation algorithm, the Similarity Difference and Uniqueness (SIDU) method, which can effectively localize the entire object regions responsible for a prediction, is presented. The robustness and effectiveness of the SIDU algorithm are analyzed through various computational and human-subject experiments. In particular, the SIDU algorithm is assessed using three different types of evaluation (application-, human-, and functionally-grounded) to demonstrate its superior performance. The robustness of SIDU is further studied under adversarial attacks on the "black box" model to better understand its behaviour. Our code is available at: https://github.com/satyamahesh84/SIDU_XAI_CODE.
In the last years many accurate decision support systems have been constructed as black boxes, that is as systems that hide their internal logic to the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem, as a consequence delineating explicitly or implicitly its own definition of interpretability and explanation. The aim of this paper is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help researchers find the proposals most useful for their own work. The proposed classification of approaches to open black box models should also be useful for putting the many open research questions in perspective.
In the meantime, a wide variety of terminologies, motivations, approaches, and evaluation criteria have been developed within the research field of explainable artificial intelligence (XAI). With the number of XAI methods growing rapidly, researchers as well as practitioners need a means to grasp the breadth of the topic, to compare methods, and to select the right XAI method based on the traits required by a specific use-case context. Many taxonomies of XAI methods, of varying levels of detail and depth, can be found in the literature. While they often have a different focus, they also exhibit many points of overlap. This paper unifies these efforts and provides a complete taxonomy of XAI methods with respect to the notions present in current research. In a structured literature analysis and meta-study, we identified and reviewed more than 50 of the most cited and most recent surveys on XAI methods, metrics, and method traits. After summarizing them in a survey of surveys, we merge the terminologies and concepts of the articles into a unified structured taxonomy. The individual concepts therein are illustrated by more than 50 diverse example methods in total, which we categorize accordingly. The taxonomy may serve beginners, researchers, and practitioners alike as a reference and wide-ranging overview of XAI method traits and aspects. Hence, it provides a foundation for targeted, use-case-oriented, and context-sensitive future research.
Most recent work on interpretability of complex machine learning models has focused on estimating a posteriori explanations for previously trained models around specific predictions. Self-explaining models where interpretability plays a key role already during learning have received much less attention. We propose three desiderata for explanations in general -explicitness, faithfulness, and stability -and show that existing methods do not satisfy them. In response, we design self-explaining models in stages, progressively generalizing linear classifiers to complex yet architecturally explicit models. Faithfulness and stability are enforced via regularization specifically tailored to such models. Experimental results across various benchmark datasets show that our framework offers a promising direction for reconciling model complexity and interpretability.
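A toy sketch of the generalized-linear structure such self-explaining models build on, with input-dependent relevances over learned concepts; the faithfulness and stability regularizers from the paper are only hinted at in a comment, and the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

class SelfExplainingNet(nn.Module):
    """Prediction f(x) = theta(x) . h(x): a linear combination of interpretable
    concepts h(x) with input-dependent relevances theta(x) (illustrative sketch)."""
    def __init__(self, in_dim, n_concepts, n_classes):
        super().__init__()
        self.n_classes = n_classes
        self.concepts = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                      nn.Linear(64, n_concepts))                 # h(x)
        self.relevances = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                        nn.Linear(64, n_concepts * n_classes))   # theta(x)

    def forward(self, x):
        h = self.concepts(x)                                             # (B, k)
        theta = self.relevances(x).view(x.shape[0], self.n_classes, -1)  # (B, C, k)
        logits = torch.einsum("bck,bk->bc", theta, h)
        # Training would add regularizers keeping theta locally stable and faithful,
        # e.g. penalizing how quickly theta changes in a neighbourhood of x.
        return logits, theta, h          # the relevances and concepts are the explanation
```

At inference time, theta(x) for the predicted class directly provides per-concept relevance scores, which is what makes this family of models self-explaining.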
Explainability has become a central requirement for the development, deployment, and adoption of machine learning (ML) models and we are yet to understand what explanation methods can and cannot do. Several factors such as data, model prediction, hyperparameters used in training the model, and random initialization can all influence downstream explanations. While previous work empirically hinted that explanations (E) may have little relationship with the prediction (Y), there is a lack of conclusive study to quantify this relationship. Our work borrows tools from causal inference to systematically assay this relationship. More specifically, we measure the relationship between E and Y by measuring the treatment effect when intervening on their causal ancestors (hyperparameters) (inputs to generate saliency-based Es or Ys). We discover that Y's relative direct influence on E follows an odd pattern; the influence is higher in the lowest-performing models than in mid-performing models, and it then decreases in the top-performing models. We believe our work is a promising first step towards providing better guidance for practitioners who can make more informed decisions in utilizing these explanations by knowing what factors are at play and how they relate to their end task.
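A schematic sketch of the kind of intervention study described above: retrain under an intervened hyperparameter and compare how much predictions (Y) versus explanations (E) shift. `train_fn`, `predict`, and `explain` are assumed interfaces, and the simple averaging below is illustrative rather than the paper's causal estimator:

```python
import numpy as np

def intervention_effects(train_fn, explain, base_hparams, intervened_hparams, xs, seed=0):
    """Intervene on a causal ancestor (a hyperparameter), retrain with the same seed,
    and compare the average shift in predictions vs. explanations on the same inputs."""
    m0 = train_fn(base_hparams, seed)
    m1 = train_fn(intervened_hparams, seed)
    dy, de = [], []
    for x in xs:
        dy.append(float(np.abs(m0.predict(x) - m1.predict(x)).mean()))
        de.append(float(np.abs(explain(m0, x) - explain(m1, x)).mean()))
    return np.mean(dy), np.mean(de)      # shift in Y vs. shift in E under the intervention
```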
Unexplainable black-box models create scenarios where anomalies cause harmful responses, resulting in unacceptable risks. These risks have motivated the field of eXplainable Artificial Intelligence (XAI) to improve trust by evaluating the local interpretability of black-box neural networks. Unfortunately, no ground truth is available for a model's decision, so evaluation is limited to qualitative assessment. Further, interpretability may lead to inaccurate conclusions about the model or a false sense of trust. We propose to improve XAI from the vantage point of user trust by exploring the latent feature space of a black-box model. We present ProtoShotXAI, an approach that uses a prototypical few-shot network to explore the contrastive manifold between the nonlinear features of different classes. A user explores the manifold by perturbing the input features of a query example and recording the response for a subset of exemplars from any class. Our approach is the first locally interpretable XAI model that can be extended to few-shot networks. We compare ProtoShotXAI to state-of-the-art XAI approaches on MNIST, Omniglot, and ImageNet, both quantitatively and qualitatively, and ProtoShotXAI provides greater flexibility for model exploration. Finally, ProtoShotXAI also demonstrates novel explanation and detection of adversarial samples.