测量黑匣子预测算法中变量重要性的最流行方法是利用合成输入,这些输入结合了来自多个受试者的预测变量。这些输入可能是不可能的,身体上不可能的,甚至在逻辑上是不可能的。结果,对这种情况的预测可以基于数据,这与对黑匣子的训练非常不同。我们认为,当解释使用此类值时,用户不能相信预测算法的决定的解释。取而代之的是,我们主张一种称为同类沙普利的方法,该方法基于经济游戏理论,与大多数其他游戏理论方法不同,它仅使用实际观察到的数据来量化可变重要性。莎普利队的同伙通过缩小判断的主题的缩小,被认为与一个或多个功能上的目标主题相似。如果使用它来缩小队列对队列平均值有很大的不同,则功能很重要。我们在算法公平问题上进行了说明,其中必须将重要性归因于未经训练模型的保护变量。对于每个主题和每个预测变量,我们可以计算该预测因子对受试者的预测响应或对其实际响应的重要性。这些值可以汇总,例如在所有黑色受试者上,我们提出了一个贝叶斯引导程序来量化个人和骨料莎普利值的不确定性。
translated by 谷歌翻译
可解释的AI(XAI)的基本任务是确定黑匣子功能$ f $做出的预测背后的最重要功能。 Petsiuk等人的插入和缺失测试。 (2018年)用于判断从最重要的对分类至最不重要的算法的质量。在回归问题的激励下,我们在曲线标准(AUC)标准下建立了一个公式,就$ f $的锚定分解中的某些主要效果和相互作用而言。我们找到了在输入到$ f $的随机排序下AUC的期望值的表达式,并提出了回归设置的直线上方的替代区域。我们使用此标准将集成梯度(IG)计算出的特征与内核Shap(KS)以及石灰,DeepLift,Vanilla梯度和输入$ \ times $ \ times $梯度方法进行比较。 KS在我们考虑的两个数据集中具有最好的总体性能,但是计算非常昂贵。我们发现IG几乎和KS一样好,同时更快。我们的比较问题包括一些对IG构成挑战的二进制输入,因为它必须使用可能的变量级别之间的值,因此我们考虑处理IG中二进制变量的方法。我们表明,通过其shapley值进行排序变量并不一定给出插入插入测试的最佳排序。但是,对于加性模型的单调函数(例如逻辑回归),它将做到这一点。
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Image classification with small datasets has been an active research area in the recent past. However, as research in this scope is still in its infancy, two key ingredients are missing for ensuring reliable and truthful progress: a systematic and extensive overview of the state of the art, and a common benchmark to allow for objective comparisons between published methods. This article addresses both issues. First, we systematically organize and connect past studies to consolidate a community that is currently fragmented and scattered. Second, we propose a common benchmark that allows for an objective comparison of approaches. It consists of five datasets spanning various domains (e.g., natural images, medical imagery, satellite data) and data types (RGB, grayscale, multispectral). We use this benchmark to re-evaluate the standard cross-entropy baseline and ten existing methods published between 2017 and 2021 at renowned venues. Surprisingly, we find that thorough hyper-parameter tuning on held-out validation data results in a highly competitive baseline and highlights a stunted growth of performance over the years. Indeed, only a single specialized method dating back to 2019 clearly wins our benchmark and outperforms the baseline classifier.
translated by 谷歌翻译
Recent times have witnessed an increasing number of applications of deep neural networks towards solving tasks that require superior cognitive abilities, e.g., playing Go, generating art, question answering (such as ChatGPT), etc. Such a dramatic progress raises the question: how generalizable are neural networks in solving problems that demand broad skills? To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset, for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children in the 6-8 age group. Our dataset consists of 101 unique puzzles; each puzzle comprises a picture and a question, and their solution needs a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning, among others. To scale our dataset towards training deep neural networks, we programmatically generate entirely new instances for each puzzle while retaining their solution algorithm. To benchmark the performance on the SMART-101 dataset, we propose a vision and language meta-learning model using varied state-of-the-art backbone neural networks. Our experiments reveal that while powerful deep models offer reasonable performances on puzzles that they are trained on, they are not better than random accuracy when analyzed for generalization. We also evaluate the recent ChatGPT large language model on a subset of our dataset and find that while ChatGPT produces convincing reasoning abilities, the answers are often incorrect.
translated by 谷歌翻译
Everting, soft growing vine robots benefit from reduced friction with their environment, which allows them to navigate challenging terrain. Vine robots can use air pouches attached to their sides for lateral steering. However, when all pouches are serially connected, the whole robot can only perform one constant curvature in free space. It must contact the environment to navigate through obstacles along paths with multiple turns. This work presents a multi-segment vine robot that can navigate complex paths without interacting with its environment. This is achieved by a new steering method that selectively actuates each single pouch at the tip, providing high degrees of freedom with few control inputs. A small magnetic valve connects each pouch to a pressure supply line. A motorized tip mount uses an interlocking mechanism and motorized rollers on the outer material of the vine robot. As each valve passes through the tip mount, a permanent magnet inside the tip mount opens the valve so the corresponding pouch is connected to the pressure supply line at the same moment. Novel cylindrical pneumatic artificial muscles (cPAMs) are integrated into the vine robot and inflate to a cylindrical shape for improved bending characteristics compared to other state-of-the art vine robots. The motorized tip mount controls a continuous eversion speed and enables controlled retraction. A final prototype was able to repeatably grow into different shapes and hold these shapes. We predict the path using a model that assumes a piecewise constant curvature along the outside of the multi-segment vine robot. The proposed multi-segment steering method can be extended to other soft continuum robot designs.
translated by 谷歌翻译
Signature-based malware detectors have proven to be insufficient as even a small change in malignant executable code can bypass these signature-based detectors. Many machine learning-based models have been proposed to efficiently detect a wide variety of malware. Many of these models are found to be susceptible to adversarial attacks - attacks that work by generating intentionally designed inputs that can force these models to misclassify. Our work aims to explore vulnerabilities in the current state of the art malware detectors to adversarial attacks. We train a Transformers-based malware detector, carry out adversarial attacks resulting in a misclassification rate of 23.9% and propose defenses that reduce this misclassification rate to half. An implementation of our work can be found at https://github.com/yashjakhotiya/Adversarial-Attacks-On-Transformers.
translated by 谷歌翻译
SKA脉冲星搜索管道将用于实时检测脉冲星。SKA等现代射电望远镜将在其全面运行中生成数据。因此,基于经验和数据驱动的算法对于诸如候选检测等应用是必不可少的。在这里,我们描述了我们的发现,从测试一种称为Mask R-CNN的最先进的对象检测算法来检测SKA PULSAR搜索管道中的候选标志。我们已经训练了蒙版R-CNN模型来检测候选图像。开发了一种自定义注释工具,以有效地标记大型数据集中感兴趣的区域。我们通过检测模拟数据集中的候选签名成功证明了该算法。本文介绍了这项工作的详细信息,并重点介绍了未来的前景。
translated by 谷歌翻译
人们现在将社交媒体网站视为其唯一信息来源,因为它们的受欢迎程度。大多数人通过社交媒体获取新闻。同时,近年来,假新闻在社交媒体平台上成倍增长。几种基于人工智能的解决方案用于检测假新闻,已显示出令人鼓舞的结果。另一方面,这些检测系统缺乏解释功能,即解释为什么他们做出预测的能力。本文在可解释的假新闻检测中突出了当前的艺术状态。我们讨论了当前可解释的假新闻检测模型中的陷阱,并介绍了我们正在进行的有关多模式可解释的假新闻检测模型的研究。
translated by 谷歌翻译
本文提出了一个算法框架,用于控制符合信号时间逻辑(STL)规范的连续动力系统的合成。我们提出了一种新型算法,以从STL规范中获得时间分配的有限自动机,并引入一个多层框架,该框架利用此自动机以空间和时间上指导基于采样的搜索树。我们的方法能够合成非线性动力学和多项式谓词功能的控制器。我们证明了算法的正确性和概率完整性,并说明了我们在几个案例研究中框架的效率和功效。我们的结果表明,在艺术状态下,速度的速度是一定的。
translated by 谷歌翻译