Smart Paper Notes

Convex regularization in statistical inverse learning problems

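Tatiana A. Bubba, Martin Burger, Tapio Helin, Luca Ratti

Categories: Machine Learning (Statistics) | Machine Learning

2021-02-18

We consider a statistical inverse learning problem, where the task is to estimate a function $f$ based on noisy point evaluations of $Af$, where $A$ is a linear operator. The function $Af$ is evaluated at i.i.d. random design points $u_n$, $n = 1, \ldots, N$, generated by an unknown general probability distribution. We consider Tikhonov regularization with general convex and $p$-homogeneous penalty functionals and derive concentration rates of the regularized solution to the ground truth, measured in the symmetric Bregman distance induced by the penalty functional. We derive concrete rates for Besov norm penalties and numerically demonstrate the correspondence with the observed rates in the context of X-ray tomography.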
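For orientation, the estimator and the error measure described above can be written as follows. The notation ($y_n$ for the noisy observations, $R$ for the penalty functional, $\alpha$ for the regularization parameter) and the squared data-fit term are assumptions made for illustration; the paper's own scaling and notation may differ.

```latex
\[
\hat f_\alpha \in \arg\min_{f} \; \frac{1}{N} \sum_{n=1}^{N} \bigl( (Af)(u_n) - y_n \bigr)^2 + \alpha R(f),
\]
\[
D_R^{\mathrm{sym}}(f, g) = \langle p_f - p_g,\; f - g \rangle,
\qquad p_f \in \partial R(f), \quad p_g \in \partial R(g).
\]
```

Here $\partial R$ denotes the subdifferential of the convex penalty, so the symmetric Bregman distance reduces to $2\|f - g\|^2$ when $R(f) = \|f\|^2$ on a Hilbert space.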

translated by Google Translate

Learning the optimal Tikhonov regularizer for inverse problems

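Giovanni S. Alberti, Ernesto De Vito, Matti Lassas, Luca Ratti, Matteo Santacesaria

Categories: Machine Learning (Statistics) | Machine Learning

2021-06-11

In this work, we consider the linear inverse problem $y = Ax + \epsilon$, where $A \colon X \to Y$ is a known linear operator between the separable Hilbert spaces $X$ and $Y$, $x$ is a random variable in $X$, and $\epsilon$ is a zero-mean random process in $Y$. This setting covers several inverse problems in imaging, including denoising, deblurring, and X-ray tomography. Within the classical framework of regularization, we focus on the case where the regularization functional is not given a priori but learned from data. Our first result is a characterization of the optimal generalized Tikhonov regularizer with respect to the mean squared error. We find that it is completely independent of the forward operator $A$ and depends only on the mean and covariance of $x$. Then, we consider the problem of learning the regularizer from a finite training set in two different frameworks: one supervised, based on samples of both $x$ and $y$, and one unsupervised, based only on samples of $x$. In both cases, we prove generalization bounds under some weak assumptions on the distribution of $x$ and $\epsilon$, including the case of sub-Gaussian variables. Our bounds hold in infinite-dimensional spaces, thereby showing that finer and finer discretizations do not make this learning problem harder. The results are validated through numerical simulations.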
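The characterization above can be illustrated in the simplest scalar setting. The sketch below is a toy under stated assumptions, not the authors' method: it takes $X = Y = \mathbb{R}$, white noise of known variance, and hypothetical function names. The regularizer is learned from samples of $x$ alone (the unsupervised setting) as the penalty $(x - \hat\mu)^2 / \hat s^2$, which does not involve the forward operator $a$.

```python
import random
import statistics


def learn_regularizer(x_samples):
    """Estimate the mean and variance of x from training samples (unsupervised)."""
    mu_hat = statistics.fmean(x_samples)
    var_hat = statistics.pvariance(x_samples, mu_hat)
    return mu_hat, var_hat


def tikhonov_estimate(y, a, noise_var, mu_hat, var_hat):
    """Minimize (a*x - y)^2 / noise_var + (x - mu_hat)^2 / var_hat in closed form."""
    lam = noise_var / var_hat
    return (a * y + lam * mu_hat) / (a * a + lam)


# Toy experiment; every number here is an illustrative assumption.
rng = random.Random(0)
a, noise_std = 0.3, 0.2
true_mean, true_std = 2.0, 0.5

train_x = [rng.gauss(true_mean, true_std) for _ in range(2000)]
mu_hat, var_hat = learn_regularizer(train_x)

sq_err_tikhonov, sq_err_naive = [], []
for _ in range(2000):
    x = rng.gauss(true_mean, true_std)
    y = a * x + rng.gauss(0.0, noise_std)
    x_tik = tikhonov_estimate(y, a, noise_std ** 2, mu_hat, var_hat)
    sq_err_tikhonov.append((x_tik - x) ** 2)
    sq_err_naive.append((y / a - x) ** 2)  # naive inversion amplifies the noise by 1/a

mse_tikhonov = statistics.fmean(sq_err_tikhonov)
mse_naive = statistics.fmean(sq_err_naive)
print(mse_tikhonov, mse_naive)
```

In this toy setting the learned regularizer should give a lower mean squared error than naive inversion $y/a$, because the penalty pulls the estimate toward the learned mean instead of passing the amplified noise through.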

Regularized ERM on random subspaces

Andrea Della Vecchia, Jaouad Mourtada, Ernesto De Vito, Lorenzo Rosasco

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-04

We study a natural extension of classical empirical risk minimization, where the hypothesis space is a random subspace of a given space. In particular, we consider possibly data dependent subspaces spanned by a random subset of the data, recovering as a special case Nystrom approaches for kernel methods. Considering random subspaces naturally leads to computational savings, but the question is whether the corresponding learning accuracy is degraded. These statistical-computational tradeoffs have been recently explored for the least squares loss and self-concordant loss functions, such as the logistic loss. Here, we work to extend these results to convex Lipschitz loss functions, that might not be smooth, such as the hinge loss used in support vector machines. This unified analysis requires developing new proofs, that use different technical tools, such as sub-gaussian inputs, to achieve fast rates. Our main results show the existence of different settings, depending on how hard the learning problem is, for which computational efficiency can be improved with no loss in performance.

Optimal Learning Rates for Regularized Least-Squares with a Fourier Capacity Condition

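Prem Talwai, David Simchi-Levi

Categories: Machine Learning (Statistics)

2022-04-16

We derive minimax adaptive rates for a new, broad class of Tikhonov-regularized learning problems in Hilbert scales under general source conditions. Our analysis does not require the regression function to be contained in the hypothesis class and, most notably, does not employ the conventional \textit{a priori} assumptions. Using interpolation theory, we demonstrate that the spectrum of the Mercer operator can be inferred in the presence of "tight" $L^{\infty}$ embeddings of suitable Hilbert scales. Our analysis utilizes a new Fourier capacity condition, which characterizes the optimal Lorentz range space of a modified Mercer operator in certain parameter regimes.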

Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

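Wenlong Mou, Martin J. Wainwright, Peter L. Bartlett

Categories: Machine Learning (Statistics)

2022-09-26

The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds reveal that, in order to obtain non-asymptotically optimal procedures, the error in estimating the treatment effect should be minimized in a certain weighted $L^2$-norm. We analyze a two-stage procedure based on constrained regression in this weighted norm, and establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds. These results show that the optimal non-asymptotic risk, in addition to depending on the asymptotically efficient variance, depends on the weighted norm distance between the true outcome function and its approximation by the richest function class supported by the sample size.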

On minimax density estimation via measure transport

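Sven Wang, Youssef Marzouk

Categories: Machine Learning (Statistics)

2022-07-20

We study the convergence properties, in Hellinger and related distances, of nonparametric density estimators based on measure transport. These estimators represent the measure of interest as the pushforward of a chosen reference distribution under a transport map, where the map is chosen via a maximum likelihood objective (equivalently, minimizing an empirical Kullback-Leibler loss) or a penalized version thereof. We establish concentration inequalities for a general class of penalized measure transport estimators by combining techniques from M-estimation with analytical properties of the transport-based density representation. We then demonstrate the implications of our theory for the case of triangular Knothe-Rosenblatt (KR) transports on the $d$-dimensional unit cube, and show that both penalized and unpenalized versions of such estimators achieve minimax optimal convergence rates over Hölder classes of densities. Specifically, we establish optimal rates for unpenalized nonparametric maximum likelihood estimation over bounded Hölder-type balls, and then for certain Sobolev-penalized estimators and sieved wavelet estimators.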

Coefficient-based Regularized Distribution Regression

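Yuan Mao, Lei Shi, Zheng-Chu Guo

Categories: Machine Learning (Statistics) | Machine Learning

2022-08-26

In this paper, we consider coefficient-based regularized distribution regression, which aims to regress from probability measures to real-valued responses over a reproducing kernel Hilbert space (RKHS), where the regularization is put on the coefficients and the kernels are assumed to be indefinite. The algorithm involves two stages of sampling: the first-stage sample consists of distributions, and the second-stage sample is obtained from these distributions. The asymptotic behavior of the algorithm in different regularity ranges of the regression function is comprehensively studied, and learning rates are derived via integral operator techniques. We obtain the optimal rates under some mild conditions, which match the one-stage sampled minimax optimal rate. Compared with the kernel methods for distribution regression in the literature, the algorithm under consideration does not require the kernel to be symmetric and positive semi-definite, and hence provides a simple paradigm for designing indefinite kernel methods, which enriches the theme of distribution regression. To the best of our knowledge, this is the first result for distribution regression with indefinite kernels, and our algorithm can improve the saturation effect.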

Optimal Rates for Spectral Algorithms with Least-Squares Regression over Hilbert Spaces

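Junhong Lin, Alessandro Rudi, Lorenzo Rosasco, Volkan Cevher

Categories: Machine Learning (Statistics) | Machine Learning

2018-01-20

In this paper, we study regression problems over a separable Hilbert space, covering non-parametric regression over a reproducing kernel Hilbert space. We investigate a class of spectral/regularized algorithms, including ridge regression, principal component regression, and gradient methods. We prove optimal, high-probability convergence results in terms of variants of norms for the studied algorithms, considering a capacity assumption on the hypothesis space and a general source condition on the target function. Consequently, we obtain almost sure convergence results with optimal rates. Our results improve and generalize previous results, filling a theoretical gap for the non-attainable cases.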
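The three algorithm families named above are commonly described through spectral filter functions $g(\sigma)$ that approximate $1/\sigma$ on the eigenvalues $\sigma$ of the covariance operator. The sketch below uses the standard textbook filters; the function names and the scalar setting are illustrative assumptions and are not taken from the paper.

```python
def ridge_filter(sigma, lam):
    """Ridge regression (Tikhonov): g(sigma) = 1 / (sigma + lam)."""
    return 1.0 / (sigma + lam)


def pcr_filter(sigma, lam):
    """Principal component regression (spectral cut-off): invert only eigenvalues >= lam."""
    return 1.0 / sigma if sigma >= lam else 0.0


def gradient_descent_filter(sigma, step, n_iter):
    """Gradient descent (Landweber): g(sigma) = step * sum_{k < n_iter} (1 - step * sigma)^k."""
    return step * sum((1.0 - step * sigma) ** k for k in range(n_iter))


# The geometric sum has the closed form (1 - (1 - step * sigma)^n_iter) / sigma,
# so more iterations play the role of a smaller regularization parameter.
sigma, step, n_iter = 0.5, 1.0, 10
gd_value = gradient_descent_filter(sigma, step, n_iter)
closed_form = (1.0 - (1.0 - step * sigma) ** n_iter) / sigma
print(gd_value, closed_form, ridge_filter(sigma, 0.1), pcr_filter(0.05, 0.1))
```

All three filters stay bounded near $\sigma = 0$, which is what makes the inversion stable; they differ in how quickly they approach $1/\sigma$ on the large eigenvalues.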

Variational Regularization in Inverse Problems and Machine Learning

Martin Burger

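Categories: Machine Learning

2021-12-08

This paper discusses basic results and recent developments on variational regularization methods, as developed for inverse problems. In a typical setup, we review the basic properties needed to obtain a convergent regularization scheme and further discuss the derivation of quantitative estimates and the ingredients they require, such as Bregman distances for convex functionals. In addition to the approach developed for inverse problems, we also discuss variational regularization in machine learning and work out some connections to classical regularization theory. In particular, we discuss a reinterpretation of machine learning problems in the framework of regularization theory and a reinterpretation of variational methods for inverse problems in the framework of risk minimization. Moreover, we establish some previously unknown connections between error estimates in Bregman distances and generalization errors.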

The Lasso with general Gaussian designs with applications to hypothesis testing

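Michael Celentano, Andrea Montanari, Yuting Wei

Categories: Machine Learning | Machine Learning (Statistics)

2020-07-27

The Lasso is a method for high-dimensional regression, which is now commonly used when the number of covariates $p$ is of the same order as or larger than the number of observations $n$. Classical asymptotic normality theory does not apply to this model for two fundamental reasons: $(1)$ the regularized risk is non-smooth; $(2)$ the distance between the estimator $\widehat{\boldsymbol{\theta}}$ and the true parameter vector $\boldsymbol{\theta}^*$ cannot be neglected. As a consequence, standard perturbative arguments that are the traditional basis for asymptotic normality fail. On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one. This characterization was first obtained in the case of Gaussian designs with i.i.d. covariates; here we generalize it to Gaussian correlated designs with non-singular covariance structure. This is expressed in terms of a simpler "fixed-design" model. We establish non-asymptotic bounds on the distance between the distributions of various quantities in the two models, which hold uniformly over signals $\boldsymbol{\theta}^*$ in a suitable sparsity class. As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.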

The Voronoigram: Minimax Estimation of Bounded Variation Functions From Scattered Data

Addison J. Hu, Alden Green, Ryan J. Tibshirani

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-30

We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation (TV): the sum of weighted absolute differences of parameters $\theta_i,\theta_j$ (which estimate the function values $f_0(x_i),f_0(x_j)$) at all neighboring cells $i,j$ in the Voronoi diagram. This is seen to be equivalent to a variational optimization problem that regularizes according to the usual continuum (measure-theoretic) notion of TV, once we restrict the domain to functions that are piecewise constant over the Voronoi diagram. The regression estimator under consideration hence performs (shrunken) local averaging over adaptively formed unions of Voronoi cells, and we refer to it as the Voronoigram, following the ideas in Koenker (2005), and drawing inspiration from Tukey's regressogram (Tukey, 1961). Our contributions in this paper span both the conceptual and theoretical frontiers: we discuss some of the unique properties of the Voronoigram in comparison to TV-regularized estimators that use other graph-based discretizations; we derive the asymptotic limit of the Voronoi TV functional; and we prove that the Voronoigram is minimax rate optimal (up to log factors) for estimating BV functions that are essentially bounded.

Concentration analysis of multivariate elliptic diffusion processes

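Cathrine Aeckerle-Willems, Claudia Strauch, Lukas Trottner

Categories: Machine Learning (Statistics)

2022-06-07

We prove concentration inequalities and associated PAC bounds for continuous- and discrete-time additive functionals for possibly unbounded functions of multivariate, nonreversible diffusion processes. Our analysis relies on an approach via the Poisson equation, allowing us to consider a very broad class of exponentially ergodic processes. These results add to existing concentration inequalities for additive functionals of diffusion processes, which have so far been available only for bounded functions or for unbounded functions of processes from a significantly smaller class. We demonstrate the power of these exponential inequalities with two examples from very different areas. Considering a possibly high-dimensional parametric nonlinear drift model under sparsity constraints, we apply the continuous-time concentration results to validate the restricted eigenvalue condition for Lasso estimation, which is fundamental for the derivation of oracle inequalities. The results for discrete additive functionals are used to investigate the unadjusted Langevin MCMC algorithm for sampling moderately heavy-tailed densities $\pi$. In particular, we provide PAC bounds for the sample Monte Carlo estimator of integrals $\pi(f)$ for polynomially growing functions $f$ that quantify sufficient sample and step sizes for approximation within a prescribed margin with high probability.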

Debiased Inference on Identified Linear Functionals of Underidentified Nuisances via Penalized Minimax Estimation

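Nathan Kallus, Xiaojie Mao

Categories: Machine Learning (Statistics)

2022-08-17

We study generic inference on identified linear functionals of nonunique nuisances defined as solutions to underidentified conditional moment restrictions. This problem appears in a variety of applications, including nonparametric instrumental variable models, proximal causal inference under unmeasured confounding, and missing-not-at-random data with shadow variables. Although the linear functionals of interest, such as the average treatment effect, are identifiable under suitable conditions, the nonuniqueness of nuisances poses serious challenges to statistical inference, since in this setting common nuisance estimators can be unstable and lack fixed limits. In this paper, we propose penalized minimax estimators for the nuisance functions and show that they enable valid inference in this challenging setting. The proposed nuisance estimators can accommodate flexible function classes and, importantly, they converge to fixed limits determined by the penalization, regardless of whether the nuisances are unique or not. We use the penalized nuisance estimators to form a debiased estimator for the linear functional of interest and prove its asymptotic normality under generic high-level conditions, which provides asymptotically valid confidence intervals.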

Dimension-agnostic inference using cross U-statistics

Ilmun Kim, Aaditya Ramdas

Categories: Machine Learning (Statistics)

2020-11-10

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
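A minimal sketch of the sample-splitting and self-normalization idea for one-sample mean testing ($H_0\colon \mathbb{E}[X] = 0$), written from the description above. The specific form used here (project the first half of the sample onto the mean of the second half, then studentize) is one reading of the construction and should be checked against the paper; the function name and the even split are assumptions.

```python
import math
import random
import statistics


def cross_mean_statistic(data):
    """Studentized cross statistic for H0: E[X] = 0, using an even sample split."""
    half = len(data) // 2
    first, second = data[:half], data[half:2 * half]
    dim = len(data[0])
    # Mean of the second half, used as a fixed projection direction.
    second_mean = [statistics.fmean([x[j] for x in second]) for j in range(dim)]
    # Inner products between first-half points and the second-half mean
    # (only cross terms between the two halves are used).
    proj = [sum(xj * mj for xj, mj in zip(x, second_mean)) for x in first]
    # Self-normalize: sqrt(m) * sample mean / sample standard deviation.
    return math.sqrt(half) * statistics.fmean(proj) / statistics.stdev(proj)


# Simulated data; the sizes and the shift of 0.5 are illustrative assumptions.
rng = random.Random(1)
n, d = 400, 20
null_data = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
alt_data = [[rng.gauss(0.5, 1.0) for _ in range(d)] for _ in range(n)]

stat_null = cross_mean_statistic(null_data)
stat_alt = cross_mean_statistic(alt_data)
print(stat_null, stat_alt)
```

Under this reading, the statistic is compared with a standard normal quantile (for example, rejecting at the 5% level when it exceeds about 1.645), and the same threshold is used whatever the relation between $d$ and $n$.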

Optimal transport map estimation in general function spaces

Vincent Divol, Jonathan Niles-Weed, Aram-Alexandre Pooladian

Categories: Machine Learning (Statistics)

2022-12-07

We consider the problem of estimating the optimal transport map between a (fixed) source distribution $P$ and an unknown target distribution $Q$, based on samples from $Q$. The estimation of such optimal transport maps has become increasingly relevant in modern statistical applications, such as generative modeling. At present, estimation rates are only known in a few settings (e.g. when $P$ and $Q$ have densities bounded above and below and when the transport map lies in a Hölder class), which are often not reflected in practice. We present a unified methodology for obtaining rates of estimation of optimal transport maps in general function spaces. Our assumptions are significantly weaker than those appearing in the literature: we require only that the source measure $P$ satisfies a Poincaré inequality and that the optimal map be the gradient of a smooth convex function that lies in a space whose metric entropy can be controlled. As a special case, we recover known estimation rates for bounded densities and Hölder transport maps, but also obtain nearly sharp results in many settings not covered by prior work. For example, we provide the first statistical rates of estimation when $P$ is the normal distribution and the transport map is given by an infinite-width shallow neural network.

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

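Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett

Categories: Machine Learning | Machine Learning (Statistics)

2021-12-23

We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed-point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher-order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise, covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$, and for linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).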
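To make the setting concrete, the sketch below runs tabular TD(0) (the $\lambda = 0$ member of the TD($\lambda$) family) along a single Markov trajectory and averages the later iterates. It is an illustration under assumptions, not the scheme analyzed in the paper: the two-state chain, the rewards, the step-size schedule $t^{-0.6}$, and the half-trajectory burn-in are all hypothetical choices.

```python
import random

# Hypothetical two-state Markov reward process (all numbers are illustrative).
P = [[0.9, 0.1], [0.2, 0.8]]   # transition probabilities
R = [1.0, 0.0]                 # reward received in each state
GAMMA = 0.9                    # discount factor


def exact_values():
    """Solve (I - GAMMA * P) v = R for the 2x2 case by Cramer's rule."""
    a, b = 1 - GAMMA * P[0][0], -GAMMA * P[0][1]
    c, d = -GAMMA * P[1][0], 1 - GAMMA * P[1][1]
    det = a * d - b * c
    return [(d * R[0] - b * R[1]) / det, (a * R[1] - c * R[0]) / det]


def averaged_td0(n_steps, seed=0):
    """Tabular TD(0) along one trajectory; returns the average of the later iterates."""
    rng = random.Random(seed)
    v = [0.0, 0.0]
    running_sum = [0.0, 0.0]
    burn_in = n_steps // 2
    state = 0
    for t in range(1, n_steps + 1):
        next_state = 0 if rng.random() < P[state][0] else 1
        step = t ** -0.6  # decaying step size
        td_error = R[state] + GAMMA * v[next_state] - v[state]
        v[state] += step * td_error
        if t > burn_in:
            # Accumulate iterates after burn-in for the averaged estimate.
            running_sum[0] += v[0]
            running_sum[1] += v[1]
        state = next_state
    count = n_steps - burn_in
    return [s / count for s in running_sum]


exact = exact_values()
estimate = averaged_td0(200_000)
print(exact, estimate)
```

The averaged estimate is the quantity for which the abstract states instance-dependent guarantees; the last iterate `v` alone is noisier, which is the contrast between the two bounds described above.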

Optimal variance-reduced stochastic approximation in Banach spaces

Wenlong Mou, Koulik Khamaru, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

Categories: Machine Learning | Machine Learning (Statistics)

2022-01-21

We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contrast to worst-case guarantees, our bounds are instance-dependent, and achieve the local asymptotic minimax risk non-asymptotically. For linear operators, contractivity can be relaxed to multi-step contractivity, so that the theory can be applied to problems like average reward policy evaluation problem in reinforcement learning. We illustrate the theory via applications to stochastic shortest path problems, two-player zero-sum Markov games, as well as policy evaluation and $Q$-learning for tabular Markov decision processes.

Tractability from overparametrization: The example of the negative perceptron

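Andrea Montanari, Yiqiao Zhong, Kangjie Zhou

Categories: Machine Learning

2021-10-28

In the negative perceptron problem we are given $n$ data points $({\boldsymbol x}_i, y_i)$, where ${\boldsymbol x}_i$ is a $d$-dimensional vector and $y_i \in \{+1, -1\}$ is a binary label. The data are not linearly separable, and hence we content ourselves with finding a linear classifier with the largest possible \emph{negative} margin. In other words, we want to find a unit-norm vector ${\boldsymbol \theta}$ that maximizes $\min_{i \le n} y_i \langle {\boldsymbol \theta}, {\boldsymbol x}_i \rangle$. This is a non-convex optimization problem (it is equivalent to finding a maximum-norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which $n, d \to \infty$ with $n/d \to \delta$, and prove upper and lower bounds on the maximum margin $\kappa_{\text{s}}(\delta)$ or, equivalently, on its inverse function $\delta_{\text{s}}(\kappa)$. In other words, $\delta_{\text{s}}(\kappa)$ is the overparametrization threshold: for $n/d \le \delta_{\text{s}}(\kappa) - \varepsilon$ a classifier achieving vanishing training error exists with high probability, while for $n/d \ge \delta_{\text{s}}(\kappa) + \varepsilon$ it does not. Our bounds on $\delta_{\text{s}}(\kappa)$ match to the leading order as $\kappa \to -\infty$. We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold $\delta_{\text{lin}}(\kappa)$. We observe a gap between the interpolation threshold $\delta_{\text{s}}(\kappa)$ and the linear programming threshold $\delta_{\text{lin}}(\kappa)$, raising the question of the behavior of other algorithms.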

Off-the-grid learning of sparse mixtures from a continuous dictionary

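Cristina Butucea, Jean-François Delmas, Anne Dutfoy, Clément Hardy

Categories: Machine Learning (Statistics) | Machine Learning

2022-06-29

We consider a general non-linear model where the signal is a finite mixture of an unknown, possibly increasing, number of features issued from a continuous dictionary parameterized by a real non-linear parameter. The signal is observed with Gaussian (possibly correlated) noise in either a continuous or a discrete setup. We propose an off-the-grid optimization method, that is, a method which does not use any discretization scheme on the parameter space, to estimate both the non-linear parameters of the features and the linear parameters of the mixture. We use recent results on the geometry of off-the-grid methods to give a minimal separation on the true underlying non-linear parameters such that interpolating certificate functions can be constructed. Using also tail bounds for suprema of Gaussian processes, we bound the prediction error with high probability. Assuming that the certificate functions can be constructed, our prediction error bound is, up to log factors, similar to the rates attained by the Lasso predictor in the linear regression model. We also establish convergence rates that quantify with high probability the quality of estimation for both the linear and the non-linear parameters.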

Tight bounds for minimum l1-norm interpolation of noisy data

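Guillaume Wang, Konstantin Donhauser, Fanny Yang

Categories: Machine Learning | Machine Learning (Statistics)

2021-11-10

We provide matching upper and lower bounds of order $\sigma^2 / \log(d/n)$ for the prediction error of the minimum $\ell_1$-norm interpolator, a.k.a. basis pursuit. Our result is tight up to negligible terms, and is the first to imply asymptotic consistency of noisy minimum-norm interpolation for isotropic features and sparse ground truths. Our work complements the literature on "benign overfitting" for minimum $\ell_2$-norm interpolation, where asymptotic consistency can be achieved only when the features are effectively low-dimensional.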
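The two interpolators contrasted above have standard definitions, written out here for reference. The symbols $X \in \mathbb{R}^{n \times d}$ (feature matrix), $y \in \mathbb{R}^n$ (noisy labels), and $\hat w$ are notational assumptions that the abstract does not fix; the closed form for the $\ell_2$ case additionally assumes $d \ge n$ and that $X$ has full row rank.

```latex
\[
\hat w_{\ell_1} \in \arg\min_{w \in \mathbb{R}^d} \|w\|_1
\quad \text{subject to} \quad Xw = y,
\]
\[
\hat w_{\ell_2} = \arg\min_{w \in \mathbb{R}^d} \|w\|_2
\quad \text{subject to} \quad Xw = y,
\qquad \hat w_{\ell_2} = X^\top (X X^\top)^{-1} y .
\]
```

Both estimators fit the noisy labels exactly; they differ only in which interpolating solution is selected.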
