智能论文笔记

Robust Generalised Bayesian Inference for Intractable Likelihoods

Takuo Matsubara , Jeremias Knoblauch , François-Xavier Briol , Chris. J. Oates

分类： (统计)机器学习

2021-04-15

广义贝叶斯推理使用损失函数而不是可能性的先前信仰更新，因此可以用于赋予鲁棒性，以防止可能的错误规范的可能性。在这里，我们认为广泛化的贝叶斯推论斯坦坦差异作为损失函数的损失，由应用程序的可能性含有难治性归一化常数。在这种情况下，斯坦因差异来避免归一化恒定的评估，并产生封闭形式或使用标准马尔可夫链蒙特卡罗的通用后出版物。在理论层面上，我们显示了一致性，渐近的正常性和偏见 - 稳健性，突出了这些物业如何受到斯坦因差异的选择。然后，我们提供关于一系列棘手分布的数值实验，包括基于内核的指数家庭模型和非高斯图形模型的应用。

translated by 谷歌翻译

Generalised Bayesian Inference for Discrete Intractable Likelihood

Takuo Matsubara , Jeremias Knoblauch , François-Xavier Briol , Chris. J. Oates

分类： (统计)机器学习

2022-06-16

离散状态空间代表了对统计推断的主要计算挑战，因为归一化常数的计算需要在大型或可能的无限集中进行求和，这可能是不切实际的。本文通过开发适合离散可怜的可能性的新型贝叶斯推理程序来解决这一计算挑战。受到连续数据的最新方法学进步的启发，主要思想是使用离散的Fisher Divergence更新有关模型参数的信念，以代替有问题的棘手的可能性。结果是可以使用标准计算工具（例如Markov Chain Monte Carlo）进行采样的广义后部，从而规避了棘手的归一化常数。分析了广义后验的统计特性，并具有足够的后验一致性和渐近正态性的条件。此外，提出了一种新颖的通用后代校准方法。应用程序在离散空间数据的晶格模型和计数数据的多元模型上介绍，在每种情况下，方法论都以低计算成本促进通用的贝叶斯推断。

translated by 谷歌翻译

Optimal Thinning of MCMC Output

Marina Riabiz , Wilson Chen , Jon Cockayne , Pawel Swietach , Steven A. Niederer , Lester Mackey , Chris. J. Oates

分类： (统计)机器学习

2020-05-08

利用启发式来评估收敛性和压缩马尔可夫链蒙特卡罗的输出可以在生产的经验逼近时是次优。通常，许多初始状态归因于“燃烧”并移除，而链条的其余部分是“变薄”，如果还需要压缩。在本文中，我们考虑回顾性地从样本路径中选择固定基数的状态的问题，使得由其经验分布提供的近似接近最佳。提出了一种基于核心稳定性差异的贪婪最小化的新方法，这适用于需要重压力的问题。理论结果保障方法的一致性及其有效性在常微分方程的参数推理的具体背景下证明了该效果。软件可在Python，R和Matlab中的Stein细化包中提供。

translated by 谷歌翻译

Dimension-agnostic inference using cross U-statistics

Ilmun Kim , Aaditya Ramdas

分类： (统计)机器学习

2020-11-10

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.

translated by 谷歌翻译

The Projected Covariance Measure for assumption-lean variable significance testing

Anton Rask Lundborg , Ilmun Kim , Rajen D. Shah , Richard J. Samworth

分类： (统计)机器学习

2022-11-03

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.

translated by 谷歌翻译

Quasi-Bayesian Dual Instrumental Variable Regression

Ziyu Wang , Yuhao Zhou , Tongzheng Ren , Jun Zhu

分类： (统计)机器学习 | 机器学习

2021-06-16

近年来目睹了采用灵活的机械学习模型进行乐器变量（IV）回归的兴趣，但仍然缺乏不确定性量化方法的发展。在这项工作中，我们为IV次数回归提出了一种新的Quasi-Bayesian程序，建立了最近开发的核化IV模型和IV回归的双/极小配方。我们通过在$ l_2 $和sobolev规范中建立最低限度的最佳收缩率，并讨论可信球的常见有效性来分析所提出的方法的频繁行为。我们进一步推出了一种可扩展的推理算法，可以扩展到与宽神经网络模型一起工作。实证评价表明，我们的方法对复杂的高维问题产生了丰富的不确定性估计。

translated by 谷歌翻译

A Spectral Representation of Kernel Stein Discrepancy with Application to Goodness-of-Fit Tests for Measures on Infinite Dimensional Hilbert Spaces

George Wynne , Mikołaj Kasprzak , Andrew B. Duncan

分类： (统计)机器学习

2022-06-09

内核Stein差异（KSD）是一种基于内核的广泛使用概率指标之间差异的非参数量度。它通常在用户从候选概率度量中收集的样本集合的情况下使用，并希望将它们与指定的目标概率度量进行比较。 KSD的一个有用属性是，它可以仅从候选度量的样本中计算出来，并且不知道目标度量的正常化常数。 KSD已用于一系列设置，包括合适的测试，参数推断，MCMC输出评估和生成建模。当前KSD方法论的两个主要问题是（i）超出有限维度欧几里得环境之外的适用性以及（ii）缺乏影响KSD性能的清晰度。本文提供了KSD的新频谱表示，这两种补救措施都使KSD适用于希尔伯特（Hilbert）评估数据，并揭示了内核和Stein oterator Choice对KSD的影响。我们通过在许多合成数据实验中对各种高斯和非高斯功能模型进行拟合优度测试来证明所提出的方法的功效。

translated by 谷歌翻译

Sequential Estimation of Convex Functionals and Divergences

Tudor Manole , Aaditya Ramdas

分类： (统计)机器学习

2021-03-16

我们提出了一种统一的技术，用于顺序估计分布之间的凸面分歧，包括内核最大差异等积分概率度量，$ \ varphi $ - 像Kullback-Leibler发散，以及最佳运输成本，例如Wassersein距离的权力。这是通过观察到经验凸起分歧（部分有序）反向半角分离的实现来实现的，而可交换过滤耦合，其具有这些方法的最大不等式。这些技术似乎是对置信度序列和凸分流的现有文献的互补和强大的补充。我们构建一个离线到顺序设备，将各种现有的离线浓度不等式转换为可以连续监测的时间均匀置信序列，在任意停止时间提供有效的测试或置信区间。得到的顺序边界仅在相应的固定时间范围内支付迭代对数价格，保留对问题参数的相同依赖性（如适用的尺寸或字母大小）。这些结果也适用于更一般的凸起功能，如负差分熵，实证过程的高度和V型统计。

translated by 谷歌翻译

The Ridgelet Prior: A Covariance Function Approach to Prior Specification for Bayesian Neural Networks

Takuo Matsubara , Chris J. Oates , François-Xavier Briol

分类： (统计)机器学习 | 机器学习

2020-10-16

贝叶斯神经网络试图将神经网络的强大预测性能与与贝叶斯架构预测产出相关的不确定性的正式量化相结合。然而，它仍然不清楚如何在升入网络的输出空间时，如何赋予网络的参数。提出了一种可能的解决方案，使用户能够为手头的任务提供适当的高斯过程协方差函数。我们的方法构造了网络参数的先前分配，称为ridgelet，它近似于网络的输出空间中的Posited高斯过程。与神经网络和高斯过程之间的连接的现有工作相比，我们的分析是非渐近的，提供有限的样本大小的错误界限。这建立了贝叶斯神经网络可以近似任何高斯过程，其协方差函数是足够规律的任何高斯过程。我们的实验评估仅限于概念验证，在那里我们证明ridgele先前可以在可以提供合适的高斯过程的回归问题之前出现非结构化。

translated by 谷歌翻译

On lower bounds for the bias-variance trade-off

Alexis Derumigny , Johannes Schmidt-Hieber

分类： (统计)机器学习

2020-05-30

对于高维和非参数统计模型，速率最优估计器平衡平方偏差和方差是一种常见的现象。虽然这种平衡被广泛观察到，但很少知道是否存在可以避免偏差和方差之间的权衡的方法。我们提出了一般的策略，以获得对任何估计方差的下限，偏差小于预先限定的界限。这表明偏差差异折衷的程度是不可避免的，并且允许量化不服从其的方法的性能损失。该方法基于许多抽象的下限，用于涉及关于不同概率措施的预期变化以及诸如Kullback-Leibler或Chi-Sque-diversence的信息措施的变化。其中一些不平等依赖于信息矩阵的新概念。在该物品的第二部分中，将抽象的下限应用于几种统计模型，包括高斯白噪声模型，边界估计问题，高斯序列模型和高维线性回归模型。对于这些特定的统计应用，发生不同类型的偏差差异发生，其实力变化很大。对于高斯白噪声模型中集成平方偏置和集成方差之间的权衡，我们将较低界限的一般策略与减少技术相结合。这允许我们将原始问题与估计的估计器中的偏差折衷联动，以更简单的统计模型中具有额外的对称性属性。在高斯序列模型中，发生偏差差异的不同相位转换。虽然偏差和方差之间存在非平凡的相互作用，但是平方偏差的速率和方差不必平衡以实现最小估计速率。

translated by 谷歌翻译

Controlling Moments with Kernel Stein Discrepancies

Heishiro Kanagawa , Arthur Gretton , Lester Mackey

分类： (统计)机器学习 | 机器学习

2022-11-10

Quantifying the deviation of a probability distribution is challenging when the target distribution is defined by a density with an intractable normalizing constant. The kernel Stein discrepancy (KSD) was proposed to address this problem and has been applied to various tasks including diagnosing approximate MCMC samplers and goodness-of-fit testing for unnormalized statistical models. This article investigates a convergence control property of the diffusion kernel Stein discrepancy (DKSD), an instance of the KSD proposed by Barp et al. (2019). We extend the result of Gorham and Mackey (2017), which showed that the KSD controls the bounded-Lipschitz metric, to functions of polynomial growth. Specifically, we prove that the DKSD controls the integral probability metric defined by a class of pseudo-Lipschitz functions, a polynomial generalization of Lipschitz functions. We also provide practical sufficient conditions on the reproducing kernel for the stated property to hold. In particular, we show that the DKSD detects non-convergence in moments with an appropriate kernel.

translated by 谷歌翻译

Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

Wenlong Mou , Martin J. Wainwright , Peter L. Bartlett

分类： (统计)机器学习

2022-09-26

在因果推理和强盗文献中，基于观察数据的线性功能估算线性功能的问题是规范的。我们分析了首先估计治疗效果函数的广泛的两阶段程序，然后使用该数量来估计线性功能。我们证明了此类过程的均方误差上的非反应性上限：这些边界表明，为了获得非反应性最佳程序，应在特定加权$ l^2 $中最大程度地估算治疗效果的误差。 -规范。我们根据该加权规范的约束回归分析了两阶段的程序，并通过匹配非轴突局部局部最小值下限，在有限样品中建立了实例依赖性最优性。这些结果表明，除了取决于渐近效率方差之外，最佳的非质子风险除了取决于样本量支持的最富有函数类别的真实结果函数与其近似类别之间的加权规范距离。

translated by 谷歌翻译

Robust Bayesian Inference for Simulator-based Models via the MMD Posterior Bootstrap

Charita Dellaporta , Jeremias Knoblauch , Theodoros Damoulas , François-Xavier Briol

分类：机器学习 | (统计)机器学习

2022-02-09

Simulator-based models are models for which the likelihood is intractable but simulation of synthetic data is possible. They are often used to describe complex real-world phenomena, and as such can often be misspecified in practice. Unfortunately, existing Bayesian approaches for simulators are known to perform poorly in those cases. In this paper, we propose a novel algorithm based on the posterior bootstrap and maximum mean discrepancy estimators. This leads to a highly-parallelisable Bayesian inference algorithm with strong robustness properties. This is demonstrated through an in-depth theoretical study which includes generalisation bounds and proofs of frequentist consistency and robustness of our posterior. The approach is then assessed on a range of examples including a g-and-k distribution and a toggle-switch model.

translated by 谷歌翻译

On the Statistical Complexity of Sample Amplification

Brian Axelrod , Shivam Garg , Yanjun Han , Vatsal Sharan , Gregory Valiant

分类：机器学习

2022-01-12

鉴于$ n $ i.i.d.从未知的分发$ P $绘制的样本，何时可以生成更大的$ n + m $ samples，这些标题不能与$ n + m $ i.i.d区别区别。从$ p $绘制的样品？（AXELROD等人2019）将该问题正式化为样本放大问题，并为离散分布和高斯位置模型提供了最佳放大程序。然而，这些程序和相关的下限定制到特定分布类，对样本扩增的一般统计理解仍然很大程度上。在这项工作中，我们通过推出通常适用的放大程序，下限技术和与现有统计概念的联系来放置对公司统计基础的样本放大问题。我们的技术适用于一大类分布，包括指数家庭，并在样本放大和分配学习之间建立严格的联系。

translated by 谷歌翻译

A Permutation-Free Kernel Independence Test

Shubhanshu Shekhar , Ilmun Kim , Aaditya Ramdas

分类：机器学习 | (统计)机器学习

2022-12-18

In nonparametric independence testing, we observe i.i.d.\ data $\{(X_i,Y_i)\}_{i=1}^n$, where $X \in \mathcal{X}, Y \in \mathcal{Y}$ lie in any general spaces, and we wish to test the null that $X$ is independent of $Y$. Modern test statistics such as the kernel Hilbert-Schmidt Independence Criterion (HSIC) and Distance Covariance (dCov) have intractable null distributions due to the degeneracy of the underlying U-statistics. Thus, in practice, one often resorts to using permutation testing, which provides a nonasymptotic guarantee at the expense of recalculating the quadratic-time statistics (say) a few hundred times. This paper provides a simple but nontrivial modification of HSIC and dCov (called xHSIC and xdCov, pronounced ``cross'' HSIC/dCov) so that they have a limiting Gaussian distribution under the null, and thus do not require permutations. This requires building on the newly developed theory of cross U-statistics by Kim and Ramdas (2020), and in particular developing several nontrivial extensions of the theory in Shekhar et al. (2022), which developed an analogous permutation-free kernel two-sample test. We show that our new tests, like the originals, are consistent against fixed alternatives, and minimax rate optimal against smooth local alternatives. Numerical simulations demonstrate that compared to the full dCov or HSIC, our variants have the same power up to a $\sqrt 2$ factor, giving practitioners a new option for large problems or data-analysis pipelines where computation, not sample size, could be the bottleneck.

translated by 谷歌翻译

A Permutation-free Kernel Two-Sample Test

Shubhanshu Shekhar , Ilmun Kim , Aaditya Ramdas

分类：机器学习 | (统计)机器学习

2022-11-27

The kernel Maximum Mean Discrepancy~(MMD) is a popular multivariate distance metric between distributions that has found utility in two-sample testing. The usual kernel-MMD test statistic is a degenerate U-statistic under the null, and thus it has an intractable limiting distribution. Hence, to design a level-$\alpha$ test, one usually selects the rejection threshold as the $(1-\alpha)$-quantile of the permutation distribution. The resulting nonparametric test has finite-sample validity but suffers from large computational cost, since every permutation takes quadratic time. We propose the cross-MMD, a new quadratic-time MMD test statistic based on sample-splitting and studentization. We prove that under mild assumptions, the cross-MMD has a limiting standard Gaussian distribution under the null. Importantly, we also show that the resulting test is consistent against any fixed alternative, and when using the Gaussian kernel, it has minimax rate-optimal power against local alternatives. For large sample sizes, our new cross-MMD provides a significant speedup over the MMD, for only a slight loss in power.

translated by 谷歌翻译

Asymptotics of Network Embeddings Learned via Subsampling

Andrew Davison , Morgane Austern

分类： (统计)机器学习 | 机器学习

2021-07-06

Network data are ubiquitous in modern machine learning, with tasks of interest including node classification, node clustering and link prediction. A frequent approach begins by learning an Euclidean embedding of the network, to which algorithms developed for vector-valued data are applied. For large networks, embeddings are learned using stochastic gradient methods where the sub-sampling scheme can be freely chosen. Despite the strong empirical performance of such methods, they are not well understood theoretically. Our work encapsulates representation methods using a subsampling approach, such as node2vec, into a single unifying framework. We prove, under the assumption that the graph is exchangeable, that the distribution of the learned embedding vectors asymptotically decouples. Moreover, we characterize the asymptotic distribution and provided rates of convergence, in terms of the latent parameters, which includes the choice of loss function and the embedding dimension. This provides a theoretical foundation to understand what the embedding vectors represent and how well these methods perform on downstream tasks. Notably, we observe that typically used loss functions may lead to shortcomings, such as a lack of Fisher consistency.

translated by 谷歌翻译

The Voronoigram: Minimax Estimation of Bounded Variation Functions From Scattered Data

Addison J. Hu , Alden Green , Ryan J. Tibshirani

分类： (统计)机器学习 | 机器学习

2022-12-30

We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation (TV): the sum of weighted absolute differences of parameters $\theta_i,\theta_j$ (which estimate the function values $f_0(x_i),f_0(x_j)$) at all neighboring cells $i,j$ in the Voronoi diagram. This is seen to be equivalent to a variational optimization problem that regularizes according to the usual continuum (measure-theoretic) notion of TV, once we restrict the domain to functions that are piecewise constant over the Voronoi diagram. The regression estimator under consideration hence performs (shrunken) local averaging over adaptively formed unions of Voronoi cells, and we refer to it as the Voronoigram, following the ideas in Koenker (2005), and drawing inspiration from Tukey's regressogram (Tukey, 1961). Our contributions in this paper span both the conceptual and theoretical frontiers: we discuss some of the unique properties of the Voronoigram in comparison to TV-regularized estimators that use other graph-based discretizations; we derive the asymptotic limit of the Voronoi TV functional; and we prove that the Voronoigram is minimax rate optimal (up to log factors) for estimating BV functions that are essentially bounded.

translated by 谷歌翻译

Triangular Flows for Generative Modeling: Statistical Consistency, Smoothness Classes, and Fast Rates

Nicholas J. Irons , Meyer Scetbon , Soumik Pal , Zaid Harchaoui

分类： (统计)机器学习 | 机器学习

2021-12-31

三角形流量，也称为kn \“{o}的Rosenblatt测量耦合，包括用于生成建模和密度估计的归一化流模型的重要构建块，包括诸如实值的非体积保存变换模型的流行自回归流模型（真实的NVP）。我们提出了三角形流量统计模型的统计保证和样本复杂性界限。特别是，我们建立了KN的统计一致性和kullback-leibler估算器的rospblatt的kullback-leibler估计的有限样本会聚率使用实证过程理论的工具测量耦合。我们的结果突出了三角形流动下播放功能类的各向异性几何形状，优化坐标排序，并导致雅各比比流动的统计保证。我们对合成数据进行数值实验，以说明我们理论发现的实际意义。

translated by 谷歌翻译

Concentration analysis of multivariate elliptic diffusion processes

Cathrine Aeckerle-Willems , Claudia Strauch , Lukas Trottner

分类： (统计)机器学习

2022-06-07

我们证明了连续和离散时间添加功能的浓度不平等和相关的PAC界限，用于可能是多元，不可逆扩散过程的无界函数。我们的分析依赖于通过泊松方程的方法，使我们能够考虑一系列非常广泛的指数性千古过程。这些结果增加了现有的浓度不平等，用于扩散过程的加性功能，这些功能仅适用于有界函数或从明显较小的类别中的过程的无限函数。我们通过两个截然不同的区域的例子来证明这些指数不平等的力量。考虑到在稀疏性约束下可能具有高维参数非线性漂移模型，我们应用连续的时间浓度结果来验证套索估计的受限特征值条件，这对于甲骨文不平等的推导至关重要。离散添加功能的结果用于研究未经调整的Langevin MCMC算法，用于采样中等重尾密度$ \ pi $。特别是，我们为多项式增长功能$ f $的样品蒙特卡洛估计量$ \ pi（f）提供PAC边界，以量化足够的样本和阶梯尺寸，以在规定的边距内近似具有很高的可能性。

translated by 谷歌翻译