Deep neural networks (NNs) are powerful black box predictors that have recently achieved impressive performance on a wide spectrum of tasks. Quantifying predictive uncertainty in NNs is a challenging and yet unsolved problem. Bayesian NNs, which learn a distribution over weights, are currently the state-of-the-art for estimating predictive uncertainty; however these require significant modifications to the training procedure and are computationally expensive compared to standard (non-Bayesian) NNs. We propose an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high quality predictive uncertainty estimates. Through a series of experiments on classification and regression benchmarks, we demonstrate that our method produces well-calibrated uncertainty estimates which are as good or better than approximate Bayesian NNs. To assess robustness to dataset shift, we evaluate the predictive uncertainty on test examples from known and unknown distributions, and show that our method is able to express higher uncertainty on out-of-distribution examples. We demonstrate the scalability of our method by evaluating predictive uncertainty estimates on ImageNet.
translated by 谷歌翻译
Modern machine learning methods including deep learning have achieved great success in predictive accuracy for supervised learning tasks, but may still fall short in giving useful estimates of their predictive uncertainty. Quantifying uncertainty is especially critical in real-world settings, which often involve input distributions that are shifted from the training distribution due to a variety of factors including sample bias and non-stationarity. In such settings, well calibrated uncertainty estimates convey information about when a model's output should (or should not) be trusted. Many probabilistic deep learning methods, including Bayesian-and non-Bayesian methods, have been proposed in the literature for quantifying predictive uncertainty, but to our knowledge there has not previously been a rigorous largescale empirical comparison of these methods under dataset shift. We present a largescale benchmark of existing state-of-the-art methods on classification problems and investigate the effect of dataset shift on accuracy and calibration. We find that traditional post-hoc calibration does indeed fall short, as do several other previous methods. However, some methods that marginalize over models give surprisingly strong results across a broad spectrum of tasks.
translated by 谷歌翻译
我们研究了回归中神经网络(NNS)的模型不确定性的方法。为了隔离模型不确定性的效果,我们专注于稀缺训练数据的无噪声环境。我们介绍了关于任何方法都应满足的模型不确定性的五个重要的逃亡者。但是,我们发现,建立的基准通常无法可靠地捕获其中一些逃避者,即使是贝叶斯理论要求的基准。为了解决这个问题,我们介绍了一种新方法来捕获NNS的模型不确定性,我们称之为基于神经优化的模型不确定性(NOMU)。 NOMU的主要思想是设计一个由两个连接的子NN组成的网络体系结构,一个用于模型预测,一个用于模型不确定性,并使用精心设计的损耗函数进行训练。重要的是,我们的设计执行NOMU满足我们的五个Desiderata。由于其模块化体系结构,NOMU可以为任何给定(先前训练)NN提供模型不确定性,如果访问其培训数据。我们在各种回归任务和无嘈杂的贝叶斯优化(BO)中评估NOMU,并具有昂贵的评估。在回归中,NOMU至少和最先进的方法。在BO中,Nomu甚至胜过所有考虑的基准。
translated by 谷歌翻译
最近实现了更准确的短期预测的数据驱动的空气质量预测。尽管取得了成功,但大多数目前的数据驱动解决方案都缺乏适当的模型不确定性的量化,以传达信任预测的程度。最近,在概率深度学习中已经制定了几种估计不确定性的实用工具。但是,在空气质量预测领域的域中没有经验应用和广泛的比较这些工具。因此,这项工作在空气质量预测的真实环境中应用了最先进的不确定性量化。通过广泛的实验,我们描述了培训概率模型,并根据经验性能,信心可靠性,置信度估计和实际适用性评估其预测性不确定性。我们还使用空气质量数据中固有的“自由”对抗培训和利用时间和空间相关性提出改善这些模型。我们的实验表明,所提出的模型比以前的工作更好地在量化数据驱动空气质量预测中的不确定性方面表现出。总体而言,贝叶斯神经网络提供了更可靠的不确定性估计,但可能挑战实施和规模。其他可扩展方法,如深合奏,蒙特卡罗(MC)辍学和随机重量平均-Gaussian(SWAG)可以执行良好,如果正确应用,但具有不同的权衡和性能度量的轻微变化。最后,我们的结果表明了不确定性估计的实际影响,并证明了,实际上,概率模型更适合提出知情决策。代码和数据集可用于\ url {https:/github.com/abdulmajid-murad/deep_probabilistic_forecast}
translated by 谷歌翻译
Accurate uncertainty quantification is a major challenge in deep learning, as neural networks can make overconfident errors and assign high confidence predictions to out-of-distribution (OOD) inputs. The most popular approaches to estimate predictive uncertainty in deep learning are methods that combine predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ensembles. However their practicality in real-time, industrial-scale applications are limited due to the high memory and computational cost. Furthermore, ensembles and BNNs do not necessarily fix all the issues with the underlying member networks. In this work, we study principled approaches to improve uncertainty property of a single network, based on a single, deterministic representation. By formalizing the uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs with two simple changes: (1) applying spectral normalization to hidden weights to enforce bi-Lipschitz smoothness in representations and (2) replacing the last output layer with a Gaussian process layer. On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration and out-of-domain detection. Furthermore, SNGP provides complementary benefits to popular techniques such as deep ensembles and data augmentation, making it a simple and scalable building block for probabilistic deep learning. Code is open-sourced at https://github.com/google/uncertainty-baselines
translated by 谷歌翻译
贝叶斯范式有可能解决深度神经网络的核心问题,如校准和数据效率低差。唉,缩放贝叶斯推理到大量的空间通常需要限制近似。在这项工作中,我们表明它足以通过模型权重的小子集进行推动,以便获得准确的预测后断。另一个权重被保存为点估计。该子网推断框架使我们能够在这些子集上使用表现力,否则难以相容的后近近似。特别是,我们将子网线性化LAPLACE作为一种简单,可扩展的贝叶斯深度学习方法:我们首先使用线性化的拉普拉斯近似来获得所有重量的地图估计,然后在子网上推断出全协方差高斯后面。我们提出了一个子网选择策略,旨在最大限度地保护模型的预测性不确定性。经验上,我们的方法对整个网络的集合和较少的表达后近似进行了比较。
translated by 谷歌翻译
深度神经网络具有令人印象深刻的性能,但是他们无法可靠地估计其预测信心,从而限制了其在高风险领域中的适用性。我们表明,应用多标签的一VS损失揭示了分类的歧义并降低了模型的过度自信。引入的Slova(单标签One-Vs-All)模型重新定义了单个标签情况的典型单VS-ALL预测概率,其中只有一个类是正确的答案。仅当单个类具有很高的概率并且其他概率可忽略不计时,提议的分类器才有信心。与典型的SoftMax函数不同,如果所有其他类的概率都很小,Slova自然会检测到分布的样本。该模型还通过指数校准进行了微调,这使我们能够与模型精度准确地对齐置信分数。我们在三个任务上验证我们的方法。首先,我们证明了斯洛伐克与最先进的分布校准具有竞争力。其次,在数据集偏移下,斯洛伐克的性能很强。最后,我们的方法在检测到分布样品的检测方面表现出色。因此,斯洛伐克是一种工具,可以在需要不确定性建模的各种应用中使用。
translated by 谷歌翻译
在过去几十年中,已经提出了各种方法,用于估计回归设置中的预测间隔,包括贝叶斯方法,集合方法,直接间隔估计方法和保形预测方法。重要问题是这些方法的校准:生成的预测间隔应该具有预定义的覆盖水平,而不会过于保守。在这项工作中,我们从概念和实验的角度审查上述四类方法。结果来自各个域的基准数据集突出显示从一个数据集中的性能的大波动。这些观察可能归因于违反某些类别的某些方法所固有的某些假设。我们说明了如何将共形预测用作提供不具有校准步骤的方法的方法的一般校准程序。
translated by 谷歌翻译
Objective: Convolutional neural networks (CNNs) have demonstrated promise in automated cardiac magnetic resonance image segmentation. However, when using CNNs in a large real-world dataset, it is important to quantify segmentation uncertainty and identify segmentations which could be problematic. In this work, we performed a systematic study of Bayesian and non-Bayesian methods for estimating uncertainty in segmentation neural networks. Methods: We evaluated Bayes by Backprop, Monte Carlo Dropout, Deep Ensembles, and Stochastic Segmentation Networks in terms of segmentation accuracy, probability calibration, uncertainty on out-of-distribution images, and segmentation quality control. Results: We observed that Deep Ensembles outperformed the other methods except for images with heavy noise and blurring distortions. We showed that Bayes by Backprop is more robust to noise distortions while Stochastic Segmentation Networks are more resistant to blurring distortions. For segmentation quality control, we showed that segmentation uncertainty is correlated with segmentation accuracy for all the methods. With the incorporation of uncertainty estimates, we were able to reduce the percentage of poor segmentation to 5% by flagging 31--48% of the most uncertain segmentations for manual review, substantially lower than random review without using neural network uncertainty (reviewing 75--78% of all images). Conclusion: This work provides a comprehensive evaluation of uncertainty estimation methods and showed that Deep Ensembles outperformed other methods in most cases. Significance: Neural network uncertainty measures can help identify potentially inaccurate segmentations and alert users for manual review.
translated by 谷歌翻译
We introduce ensembles of stochastic neural networks to approximate the Bayesian posterior, combining stochastic methods such as dropout with deep ensembles. The stochastic ensembles are formulated as families of distributions and trained to approximate the Bayesian posterior with variational inference. We implement stochastic ensembles based on Monte Carlo dropout, DropConnect and a novel non-parametric version of dropout and evaluate them on a toy problem and CIFAR image classification. For CIFAR, the stochastic ensembles are quantitatively compared to published Hamiltonian Monte Carlo results for a ResNet-20 architecture. We also test the quality of the posteriors directly against Hamiltonian Monte Carlo simulations in a simplified toy model. Our results show that in a number of settings, stochastic ensembles provide more accurate posterior estimates than regular deep ensembles.
translated by 谷歌翻译
我们提出了一种用于预测性不确定性的框架,其神经网络取代了重量概率密度函数(PDF)的传统贝叶斯概念,其基于基于Gaussian再现内核Hilbert空间(RKHS)嵌入的模型权重的物理潜在场表示。这使我们能够使用量子物理学的扰动理论来制定模型权力关系关系的片刻分解问题。提取的时刻显示了模型输出的局部附近的重量PDF的连续正则化。这种局部时刻以极大的灵敏度确定重量PDF的局部异质性,从而提供比贝叶斯和集合方法特征的模型预测性不确定性的模型预测性的更大准确性。我们表明这导致更好地导致检测经历了经历了经过调节的测试数据的假模型预测,从而从模型中学到的培训PDF。我们在使用常见失真技术损坏的几个基准数据集中评估我们对基线不确定性定量方法的方法。我们的方法提供了快速模型预测性不确定性估计,具有更高的精度和校准。
translated by 谷歌翻译
尽管对安全机器学习的重要性,但神经网络的不确定性量化远未解决。估计神经不确定性的最先进方法通常是混合的,将参数模型与显式或隐式(基于辍学的)合并结合。我们采取另一种途径,提出一种新颖的回归任务的不确定量化方法,纯粹是非参数的。从技术上讲,它通过基于辍学的子网分布来捕获梯级不确定性。这是通过一个新目标来实现的,这使得标签分布与模型分布之间的Wasserstein距离最小化。广泛的经验分析表明,在生产更准确和稳定的不确定度估计方面,Wasserstein丢失在香草测试数据以及在分类转移的情况下表现出最先进的方法。
translated by 谷歌翻译
现代深度学习方法构成了令人难以置信的强大工具,以解决无数的挑战问题。然而,由于深度学习方法作为黑匣子运作,因此与其预测相关的不确定性往往是挑战量化。贝叶斯统计数据提供了一种形式主义来理解和量化与深度神经网络预测相关的不确定性。本教程概述了相关文献和完整的工具集,用于设计,实施,列车,使用和评估贝叶斯神经网络,即使用贝叶斯方法培训的随机人工神经网络。
translated by 谷歌翻译
超新星光谱时间序列可用于重建称为超新星断层扫描的空间分辨爆炸模型。除了观察到的光谱时间序列外,超新星断层扫描还需要一个辐射转移模型来执行重建不确定性定量的反问题。超新星断层扫描模型的最小参数化大约是十二个参数,其现实是需要超过100的参数。现实的辐射转移模型需要数十分钟的CPU分钟来进行一次评估,从而使问题在计算上具有传统手段,需要数百万的MCMC样本才能获得此类MCMC样本。问题。一种使用机器学习技术加速称为替代模型或模拟器的新方法为这些问题提供了一种解决方案,以及一种了解光谱时间序列中的祖/爆炸的方法。 Tardis Supernova辐射传输代码存在模拟器,但它们仅在简单的低维模型(大约十二个参数)上表现良好,并且在Supernova字段中具有少量的知识增长应用程序。在这项工作中,我们为辐射转移代码TARDIS提出了一个新的模拟器,该模拟器不仅胜过现有的模拟器,而且还提供了预测中的不确定性。它为未来的基于学习的机械提供了基础,该机械将能够模拟数百个参数的非常高的维度空间,这对于在超新星和相关领域中阐明紧急问题至关重要。
translated by 谷歌翻译
量化监督学习模型的不确定性在制定更可靠的预测方面发挥着重要作用。认知不确定性,通常是由于对模型的知识不足,可以通过收集更多数据或精炼学习模型来减少。在过去的几年里,学者提出了许多认识的不确定性处理技术,这些技术可以大致分为两类,即贝叶斯和集合。本文对过去五年来提供了对监督学习的认识性不确定性学习技术的全面综述。因此,我们首先,将认知不确定性分解为偏见和方差术语。然后,介绍了认知不确定性学习技术以及其代表模型的分层分类。此外,提出了几种应用,例如计算机视觉(CV)和自然语言处理(NLP),然后讨论研究差距和可能的未来研究方向。
translated by 谷歌翻译
深度神经网络易于对异常值过度自信的预测。贝叶斯神经网络和深度融合都已显示在某种程度上减轻了这个问题。在这项工作中,我们的目标是通过提议预测由高斯混合模型的后续的高斯混合模型来结合这两种方法的益处,该高斯混合模型包括独立培训的深神经网络的LAPPALL近似的加权和。该方法可以与任何一组预先训练的网络一起使用,并且与常规合并相比,只需要小的计算和内存开销。理论上我们验证了我们的方法从训练数据中的培训数据和虚拟化的基本线上的标准不确定量级基准测试中的“远离”的过度控制。
translated by 谷歌翻译
考虑到其协变量$ \ boldsymbol x $的连续或分类响应变量$ \ boldsymbol y $的分布是统计和机器学习中的基本问题。深度神经网络的监督学习算法在预测给定$ \ boldsymbol x $的$ \ boldsymbol y $的平均值方面取得了重大进展,但是他们经常因其准确捕捉预测的不确定性的能力而受到批评。在本文中,我们引入了分类和回归扩散(卡)模型,该模型结合了基于扩散的条件生成模型和预训练的条件估计器,以准确预测给定$ \ boldsymbol y $的分布,给定$ \ boldsymbol x $。我们证明了通过玩具示例和现实世界数据集的有条件分配预测的卡片的出色能力,实验结果表明,一般的卡在一般情况下都优于最先进的方法,包括基于贝叶斯的神经网络的方法专为不确定性估计而设计,尤其是当给定$ \ boldsymbol y $的条件分布给定的$ \ boldsymbol x $是多模式时。
translated by 谷歌翻译
We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including MC dropout, KFAC Laplace, SGLD, and temperature scaling.
translated by 谷歌翻译
We propose a new regularization method based on virtual adversarial loss: a new measure of local smoothness of the conditional label distribution given input. Virtual adversarial loss is defined as the robustness of the conditional label distribution around each input data point against local perturbation. Unlike adversarial training, our method defines the adversarial direction without label information and is hence applicable to semi-supervised learning. Because the directions in which we smooth the model are only "virtually" adversarial, we call our method virtual adversarial training (VAT). The computational cost of VAT is relatively low. For neural networks, the approximated gradient of virtual adversarial loss can be computed with no more than two pairs of forward-and back-propagations. In our experiments, we applied VAT to supervised and semi-supervised learning tasks on multiple benchmark datasets. With a simple enhancement of the algorithm based on the entropy minimization principle, our VAT achieves state-of-the-art performance for semi-supervised learning tasks on SVHN and CIFAR-10.
translated by 谷歌翻译
神经网络中的不确定性量化有望增加AI系统的安全性,但目前尚不清楚培训集大小如何变化。在本文中,我们评估了七种在时尚Mnist和CiFar10上的不确定性方法,因为我们子样本并产生各种训练套装尺寸。我们发现校准误差和分配检测性能强烈依赖于训练集大小,大多数方法在具有小型训练集的测试集上被错误化。基于梯度的方法似乎估计了估计的认识性不确定性,并且是受训练集规模受影响最大的。我们希望我们的结果可以指导未来的不确定性量化研究,并帮助从业者根据其特定的可用数据选择方法。
translated by 谷歌翻译