We introduce the initial release of our software Robustar, which aims to improve the robustness of vision classification machine learning models through a data-driven perspective. Building on the recent understanding that a machine learning model's lack of robustness stems from its tendency to learn spurious features, we aim to solve this problem at its root, from the data perspective, by removing spurious features from the data before training. In particular, we introduce software that helps users better prepare data for training image classification models by allowing them to annotate spurious features at the pixel level of images. To facilitate this process, our software also leverages recent advances to help identify potential images and pixels worth attention and to continue training with the newly annotated data. Our software is hosted at the GitHub repository https://github.com/haohanwang/robustar.
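To make the data-level fix concrete, below is a minimal sketch of neutralizing annotated spurious pixels before training. This is not Robustar's actual API; the mask format, function name, and fill strategy are all assumptions for illustration.

```python
import numpy as np

def remove_spurious_pixels(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Neutralize user-annotated spurious pixels before training.

    image: H x W x C array in [0, 255].
    mask:  H x W boolean array, True where the annotator marked
           a spurious feature (e.g., a background watermark).
    """
    cleaned = image.copy()
    # One simple choice: replace spurious pixels with the per-channel
    # image mean, so the model cannot exploit them as a shortcut.
    cleaned[mask] = image.reshape(-1, image.shape[-1]).mean(axis=0)
    return cleaned
```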
Machine learning has demonstrated remarkable prediction accuracy over i.i.d. data, but the accuracy often drops when the test data come from a different distribution. In this paper, we offer another view of this problem, assuming the reason behind the accuracy drop is the model's reliance on features that are not aligned with how a data annotator considers samples similar across the two datasets. We refer to these features as misaligned features. We extend the conventional generalization error bound to a new one for this setting, incorporating knowledge of how the misaligned features are associated with the label. Our analysis offers a set of techniques for this problem, and these techniques connect naturally to many previous methods in the robust machine learning literature. We also compare the empirical strength of these methods and demonstrate the performance gains when they are combined.
As NLP models achieve state-of-the-art performance on benchmarks and gain wide adoption, it has become increasingly important to ensure their safe deployment in the real world, e.g., making sure the models are robust against unseen or challenging scenarios. Despite robustness being an increasingly studied topic, it has been explored separately in applications such as vision and NLP, with diverse definitions, evaluation protocols, and mitigation strategies across multiple lines of research. In this paper, we aim to provide a unified survey of how robustness is defined, measured, and improved in NLP. We first connect multiple definitions of robustness, then unify various lines of work on identifying robustness failures and evaluating models' robustness. Correspondingly, we present data-driven, model-driven, and inductive-prior-based mitigation strategies, offering a more systematic view of how to effectively improve robustness in NLP models. Finally, we conclude by outlining open challenges and future directions to motivate further research in this area.
By integrating human knowledge and experience, human-in-the-loop learning aims to train accurate prediction models at minimum cost. Humans can provide training data for machine learning applications and directly accomplish tasks in the pipeline that are hard for computers, aided by machine-based approaches. In this paper, we survey existing work on human-in-the-loop learning from a data perspective and classify it into three categories with a progressive relationship: (1) work that improves model performance through data processing, (2) work that improves model performance through interventional model training, and (3) the design of the system itself, independent of the loop. Using this categorization, we summarize the major approaches in the field along with their technical strengths and weaknesses, and provide a simple classification and discussion of applications in natural language processing, computer vision, and beyond. In addition, we point out open challenges and opportunities. This survey is intended to provide a high-level summary of human-in-the-loop learning and to motivate interested readers to consider approaches for designing effective human-in-the-loop solutions.
Understanding and explaining the mistakes made by trained models is critical to many machine learning objectives, such as improving robustness, addressing concept drift, and mitigating biases. However, this is often an ad hoc process that involves manually inspecting the model's errors on many test samples and guessing at the underlying causes of these incorrect predictions. In this paper, we propose a systematic approach, Conceptual Counterfactual Explanations (CCE), that explains why a classifier makes a mistake on a particular test sample in terms of human-understandable concepts (e.g., this zebra is misclassified as a dog because of faint stripes). We build on two prior ideas, counterfactual explanations and concept activation vectors, and validate our approach on well-known pretrained models, showing that it explains the models' mistakes meaningfully. In addition, for new models trained on data with spurious correlations, CCE accurately identifies the spurious correlation as the cause of a model's mistake from a single misclassified test sample. On two challenging medical applications, CCE produced useful insights, confirmed by clinicians, into the biases and mistakes the model makes in real-world settings.
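A hedged sketch of the two ingredients named above, concept activation vectors and a counterfactual search over them. The function names, the linear probe, and the simple line search are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_acts, random_acts):
    """Learn a CAV: the normal of a linear boundary separating
    activations of concept examples from random examples."""
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_[0]
    return v / np.linalg.norm(v)

def conceptual_counterfactual(act, cav, head, target, steps=100, lr=0.1):
    """Nudge a misclassified sample's activation along the CAV until the
    classifier head predicts the correct target class; the required
    shift indicates how much the concept explains the mistake."""
    w = 0.0
    for _ in range(steps):
        if head(act + w * cav) == target:
            break
        w += lr
    return w  # concept weight needed to flip the prediction
```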
Large language models (LLMs) have achieved state-of-the-art performance on a range of natural language understanding tasks. However, these LLMs may rely on dataset biases and artifacts as shortcuts for prediction, which greatly hurts their out-of-distribution (OOD) generalization and adversarial robustness. In this paper, we review recent developments that address the robustness challenge of LLMs. We first introduce the concepts of LLMs and their robustness challenge. We then introduce methods to identify shortcut learning behavior in LLMs, characterize the causes of shortcut learning, and present mitigation solutions. Finally, we identify key challenges and connect this line of research to other directions.
As an important data selection schema, active learning emerges as an essential component when iterating an Artificial Intelligence (AI) model. It becomes even more critical given the dominance of deep neural network based models, which are composed of a large number of parameters and are data-hungry in applications. Despite its indispensable role in developing AI models, research on active learning is not as intensive as in other research directions. In this paper, we present a review of active learning through deep active learning approaches from the following perspectives: 1) technical advancements in active learning, 2) applications of active learning in computer vision, 3) industrial systems leveraging or with the potential to leverage active learning for data iteration, and 4) current limitations and future research directions. We expect this paper to clarify the significance of active learning in a modern AI model manufacturing process and to bring additional research attention to active learning. By addressing data automation challenges and coping with automated machine learning systems, active learning will facilitate the democratization of AI technologies by boosting model production at scale.
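As a concrete instance of the data selection schema discussed above, here is a minimal pool-based uncertainty-sampling round. It is a generic sketch, not tied to any particular system in the review, and the `model` and `oracle` interfaces are assumptions.

```python
import numpy as np

def uncertainty_sampling_round(model, labeled, unlabeled, oracle, budget=100):
    """One round of pool-based active learning with least-confidence sampling.

    model:     object with fit(X, y) and predict_proba(X)
    labeled:   (X_l, y_l) arrays of currently labeled data
    unlabeled: X_u array of the unlabeled pool
    oracle:    callable returning labels for the selected samples
    """
    X_l, y_l = labeled
    model.fit(X_l, y_l)
    proba = model.predict_proba(unlabeled)
    # Least-confidence score: low top-class probability = high uncertainty.
    uncertainty = 1.0 - proba.max(axis=1)
    picked = np.argsort(-uncertainty)[:budget]
    X_new, y_new = unlabeled[picked], oracle(unlabeled[picked])
    X_l = np.concatenate([X_l, X_new])
    y_l = np.concatenate([y_l, y_new])
    X_u = np.delete(unlabeled, picked, axis=0)
    return model, (X_l, y_l), X_u
```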
Despite the myriad peer-reviewed papers demonstrating novel Artificial Intelligence (AI)-based solutions to COVID-19 challenges during the pandemic, few have made a significant clinical impact. The impact of AI during the COVID-19 pandemic was greatly limited by a lack of model transparency. This systematic review examines the use of Explainable Artificial Intelligence (XAI) during the pandemic and how its use could have overcome barriers to real-world success. We find that the successful use of XAI can improve model performance, instill trust in end users, and provide the value needed to affect user decision-making. We introduce the reader to common XAI techniques, their utility, and specific examples of their application. Evaluation of XAI results is also discussed as an important step in maximizing the value of AI-based clinical decision support systems. We illustrate the classical, modern, and potential future trends of XAI to elucidate the evolution of novel XAI techniques. Finally, we provide a checklist of suggestions for the experimental design process, supported by recent publications. Common challenges encountered during the implementation of AI solutions are also addressed with specific examples of potential solutions. We hope this review may serve as a guide to improving the clinical impact of future AI-based solutions.
We investigate whether three types of post hoc model explanations--feature attribution, concept activation, and training point ranking--are effective for detecting a model's reliance on spurious signals in the training data. Specifically, we consider the scenario where the spurious signal to be detected is unknown, at test-time, to the user of the explanation method. We design an empirical methodology that uses semi-synthetic datasets along with pre-specified spurious artifacts to obtain models that verifiably rely on these spurious training signals. We then provide a suite of metrics that assess an explanation method's reliability for spurious signal detection under various conditions. We find that the post hoc explanation methods tested are ineffective when the spurious artifact is unknown at test-time, especially for non-visible artifacts like a background blur. Further, we find that feature attribution methods are susceptible to erroneously indicating dependence on spurious signals even when the model being explained does not rely on spurious artifacts. This finding casts doubt on the utility of these approaches, in the hands of a practitioner, for detecting a model's reliance on spurious signals.
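A sketch of how such a semi-synthetic setup might be constructed; the artifact (a bright corner square), injection rate, and function name are assumptions for illustration, not the paper's exact protocol.

```python
import numpy as np

def inject_spurious_artifact(images, labels, target_class, rate=0.95, patch=6):
    """Stamp a small bright square onto a fraction of one class's images,
    creating a verifiable spurious signal a model can latch onto.

    images: N x H x W x C uint8 array; labels: length-N integer array.
    """
    rng = np.random.default_rng(0)
    out = images.copy()
    idx = np.where(labels == target_class)[0]
    chosen = rng.choice(idx, size=int(rate * len(idx)), replace=False)
    out[chosen, :patch, :patch, :] = 255  # top-left square artifact
    return out
```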
We present MapReader, a free, open-source software library written in Python for analyzing large map collections (scanned or born-digital). This library transforms the way historians can use maps by turning extensive, homogeneous sets of maps into searchable primary sources. MapReader allows users with little or no computer vision expertise to i) retrieve maps via web servers; ii) preprocess and divide them into patches; iii) annotate patches; iv) train, fine-tune, and evaluate deep neural network models; and v) create structured data about map content. We demonstrate how MapReader enables historians to interpret a collection of $\approx$16K nineteenth-century Ordnance Survey map sheets ($\approx$30.5M patches), addressing the challenge of translating visual markers into machine-readable data. We present a case study focusing on British rail infrastructure and buildings as depicted on these maps. We also show how the outputs of the MapReader pipeline can be linked to other external datasets, which we use both to evaluate and to enrich and interpret the results. We release $\approx$62K manually annotated patches used for training and evaluating the models.
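Step ii) amounts to tiling each sheet into fixed-size patches. A generic sketch follows, assuming PIL and a square patch grid; it is not MapReader's actual API.

```python
from PIL import Image

def patchify(sheet_path, patch_size=100):
    """Divide a scanned map sheet into square patches for annotation
    and model training."""
    sheet = Image.open(sheet_path)
    w, h = sheet.size
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            box = (left, top, left + patch_size, top + patch_size)
            patches.append(sheet.crop(box))
    return patches
```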
Attention guidance is an approach to addressing dataset bias in deep learning, where the model relies on incorrect features to make decisions. Focusing on image classification tasks, we propose an efficient human-in-the-loop system to interactively direct the attention of classifiers to the regions specified by users, thereby reducing the influence of co-occurrence bias and improving the transferability and interpretability of a DNN. Previous approaches to attention guidance require pixel-level annotations and are not designed as interactive systems. We propose a new interactive method that allows users to annotate images with simple clicks, and we study a novel active learning strategy to significantly reduce the number of annotations required. We conducted both a numerical evaluation and a user study to evaluate the proposed system on multiple datasets. Compared with existing non-active-learning approaches, which usually rely on large numbers of polygon-based segmentation masks to fine-tune or train DNNs, our system saves substantial labor and cost and obtains a network that works better even when the dataset is biased. The experimental results indicate that the proposed system is efficient, reasonable, and reliable.
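A hedged sketch of one way click annotations could guide attention: clicks are dilated into a binary mask, and attention mass falling outside the mask is penalized alongside the usual classification loss. The loss form and names are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def attention_guidance_loss(logits, labels, attention, click_mask, lam=1.0):
    """Cross-entropy plus a penalty on attention outside user-clicked regions.

    attention:  B x H x W non-negative map from the model (e.g., a CAM).
    click_mask: B x H x W, 1 near user clicks (dilated), 0 elsewhere.
    """
    ce = F.cross_entropy(logits, labels)
    # Normalize each attention map so the penalty is scale-invariant.
    att = attention / (attention.sum(dim=(1, 2), keepdim=True) + 1e-8)
    outside = (att * (1.0 - click_mask)).sum(dim=(1, 2)).mean()
    return ce + lam * outside
```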
Computational pathology (CPath) is an emerging field concerned with the study of tissue pathology via computational algorithms that process and analyze digitized high-resolution images of tissue slides. Recent deep learning developments in CPath have successfully leveraged the sheer volume of raw pixel data in histology images to predict target parameters in the domains of diagnostics, prognostics, treatment sensitivity, and patient stratification, heralding the promise of a new data-driven AI era for both histopathology and oncology. With data as the fuel and AI as the engine, CPath algorithms are poised for takeoff and eventual launch into clinical and pharmaceutical orbits. In this paper, we discuss CPath limitations and associated challenges so that readers can distinguish hope from hype, and we provide directions for future research to overcome some of the major challenges faced by this budding field and enable its launch into the two orbits.
There are multiple scales of abstraction from which we can describe the same image, depending on whether we are focusing on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain region the image comes from) is a crucial and open challenge. However, most existing datasets and benchmarks for neuroanatomy consider only a single downstream task at a time. To bridge this gap, we introduce a new dataset, annotations, and multiple downstream tasks that provide diverse ways to readout information about brain structure and architecture from the same image. Our multi-task neuroimaging benchmark (MTNeuro) is built on volumetric, micrometer-resolution X-ray microtomography images spanning a large thalamocortical section of mouse brain, encompassing multiple cortical and subcortical regions. We generated a number of different prediction challenges and evaluated several supervised and self-supervised models for brain-region prediction and pixel-level semantic segmentation of microstructures. Our experiments not only highlight the rich heterogeneity of this dataset, but also provide insights into how self-supervised approaches can be used to learn representations that capture multiple attributes of a single image and perform well on a variety of downstream tasks. Datasets, code, and pre-trained baseline models are provided at: https://mtneuro.github.io/ .
Explanations have gained increasing interest in the AI and machine learning (ML) communities as a way to improve model transparency and allow users to form a mental model of a trained ML model. However, explanations can go beyond this one-way communication and act as a mechanism for eliciting user control, because once users understand a model, they can provide feedback. The goal of this paper is to present an overview of research in which explanations are combined with interactive capabilities as a means to learn new models from scratch and to edit and debug existing ones. To this end, we draw a conceptual map of the state of the art, grouping relevant approaches by their intended purpose and by how they structure the interaction, and highlighting their similarities and differences. We also discuss open research issues and outline possible directions forward, in the hope of spurring further research on this blossoming topic.
The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.
As the societal impact of Deep Neural Networks (DNNs) grows, the goals for advancing DNNs become more complex and diverse, ranging from improving a conventional model accuracy metric to infusing advanced human virtues such as fairness, accountability, transparency (FaccT), and unbiasedness. Recently, techniques in Explainable Artificial Intelligence (XAI) have been attracting considerable attention and have tremendously helped Machine Learning (ML) engineers understand AI models. At the same time, however, we have started to witness an emerging need beyond XAI among AI communities: based on the insights learned from XAI, how can we better empower ML engineers to steer their DNNs so that the model's reasonableness and performance improve as intended? This article provides a timely and extensive literature overview of the field of Explanation-Guided Learning (EGL), a domain of techniques that steer the DNNs' reasoning process by adding regularization, supervision, or intervention on model explanations. In doing so, we first provide a formal definition of EGL and its general learning paradigm. Second, we give an overview of the key factors for EGL evaluation and summarize and categorize existing evaluation procedures and metrics for EGL. Finally, we discuss the current and potential future application areas and directions of EGL, and present an extensive experimental study providing comprehensive comparisons among existing EGL models in popular application domains such as Computer Vision (CV) and Natural Language Processing (NLP).
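As one concrete instance of adding supervision on model explanations, here is a sketch of a "right for the right reasons"-style input-gradient penalty. EGL methods vary widely; this particular loss is only one illustrative member of the family, and the names and coefficient are assumptions.

```python
import torch
import torch.nn.functional as F

def rrr_loss(model, x, y, irrelevant_mask, lam=10.0):
    """Penalize input gradients on regions annotated as irrelevant,
    steering the model away from reasoning on them.

    irrelevant_mask: same shape as x, 1 where the explanation
    should carry no weight, 0 elsewhere.
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)
    grads = torch.autograd.grad(F.log_softmax(logits, dim=1).sum(), x,
                                create_graph=True)[0]
    penalty = (irrelevant_mask * grads ** 2).sum()
    return ce + lam * penalty
```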
We present Azimuth, an open-source and easy-to-use tool to perform error analysis for text classification. Compared to other stages of the ML development cycle, such as model training and hyper-parameter tuning, the process and tooling for the error analysis stage are less mature. However, this stage is critical for the development of reliable and trustworthy AI systems. To make error analysis more systematic, we propose an approach comprising dataset analysis and model quality assessment, which Azimuth facilitates. We aim to help AI practitioners discover and address areas where the model does not generalize by leveraging and integrating a range of ML techniques, such as saliency maps, similarity, uncertainty, and behavioral analyses, all in one tool. Our code and documentation are available at github.com/servicenow/azimuth.
Deep learning models often suffer from domain shift, where a model trained on one source domain does not generalize to other unseen domains. In this work, we investigate the single-source domain generalization problem: training a deep network that is robust to unseen domains when training data are available from only one source domain, a common situation in medical imaging applications. We tackle this problem in the context of cross-domain medical image segmentation, where domain shift is mainly caused by different acquisition processes. We propose a simple causality-inspired data augmentation approach that exposes the segmentation model to synthesized domain-shifted training examples. Specifically, 1) to make the deep model robust to discrepancies in image intensity and texture, we employ a family of randomly weighted shallow networks that augment the training images with diverse appearance transformations; 2) we further show that spurious correlations among objects in an image are detrimental to domain robustness: the network may take such correlations as domain-specific cues for prediction, and they can break down on unseen domains. We remove these spurious correlations via causal intervention, achieved by resampling the appearances of potentially correlated objects independently. The proposed approach is validated on three cross-domain segmentation tasks: cross-modality (CT-MRI) abdominal image segmentation, cross-sequence (bSSFP-LGE) cardiac MRI segmentation, and cross-center prostate MRI segmentation. It yields consistent performance gains compared with competitive methods when tested on unseen domains.
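A sketch of the first ingredient, appearance augmentation via a randomly weighted shallow network; the layer sizes, activation, and rescaling are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

def random_appearance_transform(image):
    """Pass an image through a freshly (randomly) initialized shallow conv
    net to perturb intensity and texture while preserving shapes.

    image: B x C x H x W float tensor in [0, 1].
    """
    c = image.shape[1]
    net = nn.Sequential(
        nn.Conv2d(c, 8, kernel_size=3, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(8, 8, kernel_size=3, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(8, c, kernel_size=1),
    )
    with torch.no_grad():
        out = net(image)
        # Rescale to [0, 1] so downstream normalization stays comparable.
        out = (out - out.min()) / (out.max() - out.min() + 1e-8)
    return out
```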
Establishing open and general benchmarks has been a critical driving force behind the success of modern machine learning techniques. As machine learning is being applied to broader domains and tasks, there is a need to establish richer and more diverse benchmarks to better reflect the reality of the application scenarios. Graph learning is an emerging field of machine learning that urgently needs more and better benchmarks. To accommodate the need, we introduce Graph Learning Indexer (GLI), a benchmark curation platform for graph learning. In comparison to existing graph learning benchmark libraries, GLI highlights two novel design objectives. First, GLI is designed to incentivize \emph{dataset contributors}. In particular, we incorporate various measures to minimize the effort of contributing and maintaining a dataset, increase the usability of the contributed dataset, as well as encourage attributions to different contributors of the dataset. Second, GLI is designed to curate a knowledge base, instead of a plain collection, of benchmark datasets. We use multiple sources of meta information to augment the benchmark datasets with \emph{rich characteristics}, so that they can be easily selected and used in downstream research or development. The source code of GLI is available at \url{https://github.com/Graph-Learning-Benchmarks/gli}.
Deep learning failure cases are abundant, particularly in the medical area. Recent studies of out-of-distribution generalization have advanced considerably on well-controlled synthetic datasets, but those do not represent medical imaging contexts. We propose a pipeline that relies on artifact annotations to enable generalization evaluation and debiasing in the challenging context of skin lesion analysis. First, we split the data into training and test sets with increasingly higher levels of bias for better generalization assessment. Then, we create environments based on skin lesion artifacts to enable domain generalization methods. Finally, after robust training, we perform a test-time debiasing procedure that reduces spurious features in the inference images. Our experiments show that our pipeline improves performance metrics in biased cases and avoids artifacts when explanation methods are used. Still, when evaluated on out-of-distribution data, such models did not favor clinically meaningful features. Instead, performance improved only on test sets presenting artifacts similar to those in training, suggesting that the models learned to ignore the known set of artifacts. Our results raise the concern that debiasing models toward a single aspect may not be enough for fair skin lesion analysis.
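A hedged sketch of what the test-time debiasing step could look like, assuming pixel-level artifact masks are available; the noise-fill strategy is an assumption and may differ from the paper's actual procedure.

```python
import numpy as np

def testtime_debias(image, artifact_mask, rng=None):
    """Replace pixels flagged as known artifacts (e.g., rulers, ink marks)
    with uniform noise so they cannot drive the prediction.

    image:         H x W x C uint8 array.
    artifact_mask: H x W boolean array, True over artifact pixels.
    """
    rng = rng or np.random.default_rng(0)
    debiased = image.copy()
    noise = rng.integers(0, 256, size=image.shape, dtype=image.dtype)
    debiased[artifact_mask] = noise[artifact_mask]
    return debiased
```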