Smart Paper Notes

Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling

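Boris Landa, Xiuyuan Cheng

Category: Machine Learning | Machine Learning (Statistics)

2022-09-16

The Gaussian kernel and its traditional normalizations (e.g., row-stochastic) are popular approaches for assessing similarities between data points, commonly used for manifold learning and clustering, as well as for supervised and semi-supervised learning on graphs. In many practical situations, the data can be corrupted by noise that prevents traditional affinity matrices from correctly assessing similarities, especially when the noise magnitudes vary considerably across the data, e.g., under heteroskedasticity or outliers. An alternative approach that provides more stable behavior under noise is the doubly stochastic normalization of the Gaussian kernel. In this work, we investigate this normalization in a setting where points are sampled from an unknown density on a low-dimensional manifold embedded in a high-dimensional space and corrupted by possibly strong, non-identically distributed, sub-Gaussian noise. We establish the pointwise concentration of the doubly stochastic affinity matrix and its scaling factors around certain population forms. We then use these results to develop several tools for robust inference. First, we derive a robust density estimator that can substantially outperform the standard kernel density estimator under high-dimensional noise. Second, we provide estimators for the pointwise noise magnitudes, the pointwise signal magnitudes, and the pairwise Euclidean distances between clean data points. Lastly, we derive robust graph Laplacian normalizations that accurately approximate popular manifold Laplacians, including the Laplace-Beltrami operator, showing that the local geometry of the manifold can be recovered under high-dimensional noise. We exemplify our results in simulations and on real single-cell RNA-sequencing data. For the latter, we show that our proposed normalizations are robust to technical variability associated with different cell types.

translated by Google Translate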
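A minimal NumPy sketch of doubly stochastic scaling follows. It finds a positive vector d so that W = diag(d) K diag(d) has unit row (and, by symmetry, column) sums, using the damped symmetric Sinkhorn update d <- sqrt(d / (K d)). The zeroed kernel diagonal, the bandwidth eps = 0.5, and the toy noisy circle are assumptions made for illustration. This is not the paper's exact pipeline.

```python
import numpy as np

def gaussian_kernel(X, eps):
    """Gaussian affinities exp(-|x_i - x_j|^2 / eps); diagonal zeroed (an assumption here)."""
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-D2 / eps)
    np.fill_diagonal(K, 0.0)
    return K

def doubly_stochastic_scaling(K, n_iter=1000, tol=1e-12):
    """Find d > 0 such that W = diag(d) K diag(d) has unit row and column sums."""
    d = np.ones(K.shape[0])
    for _ in range(n_iter):
        d_new = np.sqrt(d / (K @ d))      # damped symmetric Sinkhorn step; fixed point d * (K d) = 1
        converged = np.max(np.abs(d_new - d)) < tol * np.max(d_new)
        d = d_new
        if converged:
            break
    return d[:, None] * K * d[None, :], d

# Toy data: a circle in R^20 with heteroskedastic Gaussian noise (noise level differs per point).
rng = np.random.default_rng(0)
n, p = 300, 20
t = rng.uniform(0, 2 * np.pi, n)
X = np.zeros((n, p))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
X += rng.uniform(0.01, 0.1, n)[:, None] * rng.standard_normal((n, p))

W, d = doubly_stochastic_scaling(gaussian_kernel(X, eps=0.5))
print(np.abs(W.sum(axis=1) - 1.0).max())   # row-sum error after scaling
```

The square-root damping matters because the undamped update d <- 1 / (K d) has the same fixed point but can oscillate. Intuitively, the per-point factors d absorb how much noise inflates each point's distances, which is the source of the robustness discussed in the abstract.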

Bi-stochastically normalized graph Laplacian: convergence to manifold Laplacian and robustness to outlier noise

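Xiuyuan Cheng, Boris Landa

Category: Machine Learning | Machine Learning (Statistics)

2022-06-22

Bi-stochastic normalization of kernelized graph affinity matrices provides an alternative normalization scheme for graph Laplacian methods in graph-based data analysis, and can be computed efficiently in practice by Sinkhorn-Knopp (SK) iterations. This paper proves the convergence of the bi-stochastically normalized graph Laplacian to the manifold Laplacian when $n$ data points are i.i.d. sampled from a general $d$-dimensional manifold embedded in a possibly high-dimensional space. Under certain joint limits of $n \to \infty$ and kernel bandwidth $\epsilon \to 0$, the pointwise convergence rate of the graph Laplacian operator (in 2-norm) is proved to be $O(n^{-1/(d/2+3)})$ at finite large $n$, up to a log factor, achieved at the scaling $\epsilon \sim n^{-1/(d/2+3)}$. When the manifold data are corrupted by outlier noise, we theoretically prove pointwise consistency of the graph Laplacian, at a rate that matches the rate for clean manifold data up to an additional error term proportional to the bound on the mutual inner products of the noise vectors. Our analysis suggests that, under the setting considered in this paper, an approximate rather than exact bi-stochastic normalization achieves the same consistency rate. Motivated by the analysis, we propose an approximate and constrained matrix scaling problem that can be solved by SK iterations with early termination, and apply it to simulated manifold data, both clean and with outlier noise. Numerical experiments support our theoretical results and show the robustness of the bi-stochastically normalized graph Laplacian to outlier noise.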
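The following sketch shows classic alternating Sinkhorn-Knopp scaling that stops early, once column sums are within a tolerance of 1. It then builds a graph Laplacian from the approximately bi-stochastic matrix. The stopping rule, the tolerance, and the factor 4/eps are assumptions. For the kernel exp(-|x-y|^2/eps), 4/eps is the scaling under which (I - W) f approximates -Δf under uniform sampling. The paper's constrained scaling problem is not reproduced here.

```python
import numpy as np

def sinkhorn_knopp(K, tol=1e-3, max_iter=10_000):
    """Alternating SK scaling W = diag(r) K diag(c), terminated early once column sums are within tol of 1."""
    n = K.shape[0]
    r, c = np.ones(n), np.ones(n)
    for it in range(1, max_iter + 1):
        c = 1.0 / (K.T @ r)                 # after this step column sums are exactly 1
        r = 1.0 / (K @ c)                   # after this step row sums are exactly 1 (columns drift)
        W = r[:, None] * K * c[None, :]
        if np.max(np.abs(W.sum(axis=0) - 1.0)) < tol:
            return W, it
    return W, max_iter

# Uniform samples on the unit circle and a Gaussian kernel (diagonal kept here).
rng = np.random.default_rng(1)
n, eps = 500, 0.05
t = rng.uniform(0, 2 * np.pi, n)
X = np.column_stack([np.cos(t), np.sin(t)])
sq = np.sum(X**2, axis=1)
K = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0) / eps)

W, n_used = sinkhorn_knopp(K, tol=1e-3)
L = 4.0 * (np.eye(n) - W) / eps             # assumed normalization; rows of W sum to 1
print(n_used, np.abs(W.sum(axis=0) - 1.0).max())
```

Stopping at a loose tolerance is the practical counterpart of the abstract's observation that an approximate bi-stochastic normalization already attains the same consistency rate.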

Eigen-convergence of Gaussian kernelized graph Laplacian by manifold heat interpolation

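Xiuyuan Cheng, Nan Wu

Category: Machine Learning | Machine Learning (Statistics)

2021-01-25

This work studies the spectral convergence of the graph Laplacian to the Laplace-Beltrami operator when the graph affinity matrix is constructed from $n$ random samples on a $d$-dimensional manifold embedded in a possibly high-dimensional space. By analyzing Dirichlet form convergence and constructing candidate approximate eigenfunctions via convolution with the manifold heat kernel, we prove that, with a Gaussian kernel, one can set the kernel bandwidth parameter $\epsilon \sim (\log n/n)^{1/(d/2+2)}$ such that the eigenvalue convergence rate is $n^{-1/(d/2+2)}$ and the eigenvector convergence rate in 2-norm is $n^{-1/(d+4)}$; when $\epsilon \sim (\log n/n)^{1/(d/2+3)}$, both the eigenvalue and the eigenvector rates are $n^{-1/(d/2+3)}$. These rates hold up to a $\log n$ factor and are proved for finitely many low-lying eigenvalues. The results hold for unnormalized and random-walk graph Laplacians when data are uniformly sampled on the manifold, as well as for the density-corrected graph Laplacian (where the affinity matrix is normalized by the degree matrix on both sides) with non-uniformly sampled data. As an intermediate result, we prove new pointwise and Dirichlet form convergence rates for the density-corrected graph Laplacian. Numerical results are provided to verify the theory.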
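As a numerical illustration, the sketch below samples uniform points on the unit circle (d = 1). It uses the bandwidth scaling ε ~ (log n/n)^{1/(d/2+2)} from the abstract, with the proportionality constant set to 1 as an assumption, and computes the spectrum of the random-walk graph Laplacian. For the kernel exp(-|x-y|^2/ε), (I - P) f ≈ (ε/4)(-Δf). So 4λ/ε should roughly reproduce the circle's Laplace-Beltrami eigenvalues 0, 1, 1, 4, 4, ..., with finite-n and finite-ε bias.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 1                                      # sample size, intrinsic dimension (circle)
theta = rng.uniform(0, 2 * np.pi, n)
X = np.column_stack([np.cos(theta), np.sin(theta)])  # uniform samples on the unit circle

eps = (np.log(n) / n) ** (1 / (d / 2 + 2))          # bandwidth scaling, constant assumed to be 1
sq = np.sum(X**2, axis=1)
D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
K = np.exp(-D2 / eps)                               # Gaussian affinity
deg = K.sum(axis=1)

# The random-walk Laplacian I - D^{-1}K has the same spectrum as I - D^{-1/2} K D^{-1/2}.
S = K / np.sqrt(deg[:, None] * deg[None, :])
lam = np.linalg.eigvalsh(np.eye(n) - S)             # ascending order

scaled = 4 * lam[:5] / eps                          # compare with 0, 1, 1, 4, 4
print(np.round(scaled, 2))
```

The symmetric form lets `eigvalsh` be used, which is faster and numerically more stable than a general eigensolver on the non-symmetric random-walk matrix.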

Graph Based Gaussian Processes on Restricted Domains

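David B Dunson, Hau-Tieng Wu, Nan Wu

Category: Machine Learning (Statistics)

2020-10-14

In nonparametric regression, it is common for the inputs to fall in a restricted subset of Euclidean space. Typical kernel-based methods that do not take into account the intrinsic geometry of the domain across which observations are collected may produce sub-optimal results. In this article, we focus on solving this problem in the context of Gaussian process (GP) models, proposing a new class of Graph Laplacian based GPs (GL-GPs), which learn a covariance that respects the geometry of the input domain. As the heat kernel is computationally intractable, we approximate the covariance using finitely many eigenpairs of the Graph Laplacian (GL). The GL is constructed from a kernel that depends only on the Euclidean coordinates of the inputs. Hence, we can benefit from full knowledge of the kernel to extend the covariance structure to newly arriving samples by a Nyström-type extension. We provide substantial theoretical support for the GL-GP methodology, and illustrate performance gains in various applications.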
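The sketch below illustrates the two ingredients named in the abstract: a covariance built from finitely many graph Laplacian eigenpairs, and a Nyström extension of the eigenvectors to new points. It uses the random-walk GL and a heat-kernel-style weighting exp(-t λ_k). The normalization, the diffusion time t, eps, and the "square with a hole" domain are all assumptions. The function names are hypothetical and do not come from the paper.

```python
import numpy as np

def gaussian_affinity(A, B, eps):
    sqA, sqB = np.sum(A**2, axis=1), np.sum(B**2, axis=1)
    D2 = np.maximum(sqA[:, None] + sqB[None, :] - 2 * A @ B.T, 0.0)
    return np.exp(-D2 / eps)

def gl_eigenpairs(X, eps, n_eig):
    """Low-lying eigenpairs (lam_k, psi_k) of the random-walk graph Laplacian I - D^{-1}K."""
    K = gaussian_affinity(X, X, eps)
    deg = K.sum(axis=1)
    S = K / np.sqrt(deg[:, None] * deg[None, :])
    lam, V = np.linalg.eigh(np.eye(len(X)) - S)
    psi = V[:, :n_eig] / np.sqrt(deg)[:, None]       # right eigenvectors of I - D^{-1}K
    return lam[:n_eig], psi

def gl_covariance(lam, psi, t):
    """Truncated heat-kernel-style covariance: sum_k exp(-t * lam_k) psi_k psi_k^T."""
    return (psi * np.exp(-t * lam)) @ psi.T

def nystrom_extend(Xnew, X, eps, lam, psi):
    """psi_k(x) = sum_j k(x, x_j) psi_k(x_j) / (deg(x) * (1 - lam_k))."""
    Kn = gaussian_affinity(Xnew, X, eps)
    return (Kn @ psi) / Kn.sum(axis=1, keepdims=True) / (1.0 - lam)

rng = np.random.default_rng(2)
P = rng.uniform(0, 1, size=(1500, 2))
X = P[np.linalg.norm(P - 0.5, axis=1) > 0.3][:400]   # restricted domain: unit square with a hole
lam, psi = gl_eigenpairs(X, eps=0.01, n_eig=10)
C = gl_covariance(lam, psi, t=5.0)
x_new = np.array([[0.1, 0.1], [0.9, 0.5]])
print(nystrom_extend(x_new, X, 0.01, lam, psi).shape)  # (2, 10)
```

Because the kernel is known in closed form, extending to a training point reproduces the stored eigenvector values exactly. That identity is what makes the Nyström step consistent with the in-sample covariance.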

Impact of signal-to-noise ratio and bandwidth on graph Laplacian spectrum from high-dimensional noisy point cloud

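Xiucai Ding, Hau-Tieng Wu

Category: Machine Learning

2020-11-21

We systematically study the spectrum of the kernel-based graph Laplacian (GL) constructed from a high-dimensional and noisy random point cloud in the nonnull setup, where the point cloud is sampled from a low-dimensional geometric object, such as a manifold, and corrupted by high-dimensional noise. We quantify how the signal and the noise interact over different regimes of the signal-to-noise ratio (SNR), and report the resulting peculiar spectral behavior of the GL. In addition, we explore the impact of the chosen kernel bandwidth on the spectrum of the GL over different regimes of SNR, which leads to an adaptive choice of the bandwidth that coincides with common practice on real data. This result provides theoretical support for practitioners when the dataset is noisy.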
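A small calculation shows why SNR and bandwidth interact. With i.i.d. noise N(0, σ² I_p), the expected noisy squared distance is E|y_i - y_j|² = |x_i - x_j|² + 2pσ². The sketch below checks this offset empirically for a unit circle embedded in R^p. The values of n, p, and σ are arbitrary illustrative choices. At these values the offset 2pσ² = 10 dwarfs the clean squared distances, which average 2 on the unit circle. A Gaussian kernel whose bandwidth ignores this offset would therefore see almost no structure.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 200, 500, 0.1
theta = rng.uniform(0, 2 * np.pi, n)
clean = np.zeros((n, p))
clean[:, 0], clean[:, 1] = np.cos(theta), np.sin(theta)   # unit circle embedded in R^p
noisy = clean + sigma * rng.standard_normal((n, p))

def sq_dists(Z):
    s = np.sum(Z**2, axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2 * Z @ Z.T, 0.0)

iu = np.triu_indices(n, k=1)                               # each unordered pair once
clean_mean = np.mean(sq_dists(clean)[iu])
offset = np.mean(sq_dists(noisy)[iu] - sq_dists(clean)[iu])
print(clean_mean, offset, 2 * p * sigma**2)                # ~2, ~10, 10
```
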

The Voronoigram: Minimax Estimation of Bounded Variation Functions From Scattered Data

Addison J. Hu, Alden Green, Ryan J. Tibshirani

Category: Machine Learning (Statistics) | Machine Learning

2022-12-30

We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation (TV): the sum of weighted absolute differences of parameters $\theta_i,\theta_j$ (which estimate the function values $f_0(x_i),f_0(x_j)$) at all neighboring cells $i,j$ in the Voronoi diagram. This is seen to be equivalent to a variational optimization problem that regularizes according to the usual continuum (measure-theoretic) notion of TV, once we restrict the domain to functions that are piecewise constant over the Voronoi diagram. The regression estimator under consideration hence performs (shrunken) local averaging over adaptively formed unions of Voronoi cells, and we refer to it as the Voronoigram, following the ideas in Koenker (2005), and drawing inspiration from Tukey's regressogram (Tukey, 1961). Our contributions in this paper span both the conceptual and theoretical frontiers: we discuss some of the unique properties of the Voronoigram in comparison to TV-regularized estimators that use other graph-based discretizations; we derive the asymptotic limit of the Voronoi TV functional; and we prove that the Voronoigram is minimax rate optimal (up to log factors) for estimating BV functions that are essentially bounded.
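The discrete TV penalty described above can be sketched in 2-D with SciPy's `Voronoi`: sum w_ij |θ_i - θ_j| over pairs of neighboring cells. Here we assume the weight w_ij is the length of the shared Voronoi edge, i.e. the (d-1)-dimensional measure of the common face. Ridges that extend to infinity have no finite length and are simply skipped, which is a simplification. The optimization step of the Voronoigram is not shown.

```python
import numpy as np
from scipy.spatial import Voronoi

def voronoi_tv(points, theta):
    """Discrete TV: sum over neighbouring Voronoi cells of (shared edge length) * |theta_i - theta_j|.

    2-D sketch; unbounded ridges are skipped (an assumption).
    """
    vor = Voronoi(points)
    tv = 0.0
    for (i, j), ridge in zip(vor.ridge_points, vor.ridge_vertices):
        if -1 in ridge:                           # ridge extends to infinity
            continue
        a, b = vor.vertices[ridge[0]], vor.vertices[ridge[1]]
        tv += np.linalg.norm(a - b) * abs(theta[i] - theta[j])
    return tv

rng = np.random.default_rng(3)
pts = rng.uniform(0, 1, size=(200, 2))
theta = (pts[:, 0] > 0.5).astype(float)           # piecewise constant with a jump along x = 0.5
print(voronoi_tv(pts, theta))                     # roughly the length of the jump interface
```

The penalty is zero for constant parameters and scales linearly with θ, as a total-variation seminorm should.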

Minimax Optimal Regression over Sobolev Spaces via Laplacian Eigenmaps on Neighborhood Graphs

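Alden Green, Sivaraman Balakrishnan, Ryan J. Tibshirani

Category: Machine Learning (Statistics)

2021-11-14

This paper studies the statistical properties of principal components regression with Laplacian Eigenmaps (PCR-LE), a method for nonparametric regression based on Laplacian Eigenmaps (LE). PCR-LE works by projecting the vector of observed responses ${\bf y} = (y_1, \ldots, y_n)$ onto the subspace spanned by certain eigenvectors of a neighborhood graph Laplacian. We show that PCR-LE achieves minimax rates of convergence for random-design regression over Sobolev spaces. Under sufficient smoothness conditions on the design density $p$, PCR-LE achieves the optimal rates for both estimation (where the optimal rate in squared $L^2$ norm is known to be $n^{-2s/(2s+d)}$) and goodness-of-fit testing ($n^{-4s/(4s+d)}$). We also show that PCR-LE is manifold adaptive: that is, we consider the situation where the design is supported on a manifold of small intrinsic dimension $m$, and give upper bounds establishing that PCR-LE achieves the faster minimax estimation ($n^{-2s/(2s+m)}$) and testing ($n^{-4s/(4s+m)}$) rates of convergence. Interestingly, these rates are almost always faster than the known rates of convergence of graph Laplacian eigenvectors; in other words, for this problem, regression with estimated features appears to be easier, statistically speaking, than estimating the features themselves. We support these theoretical results with empirical evidence.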
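The projection step of PCR-LE is short enough to sketch directly. Build an ε-neighborhood graph, take the K eigenvectors of its (unnormalized) Laplacian with the smallest eigenvalues, and project y onto their span. The radius, K, the unnormalized Laplacian, and the toy regression function are assumptions for illustration. The paper's tuning of these choices is what yields the minimax rates.

```python
import numpy as np

def pcr_le(X, y, radius, K):
    """Project y onto the K lowest-eigenvalue eigenvectors of an epsilon-neighbourhood graph Laplacian."""
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    W = (D2 <= radius**2).astype(float)           # unweighted neighbourhood graph
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W                # unnormalised graph Laplacian
    _, V = np.linalg.eigh(L)                      # eigenvalues in ascending order
    VK = V[:, :K]                                 # orthonormal basis of the first K eigenvectors
    return VK @ (VK.T @ y)

rng = np.random.default_rng(4)
n = 500
X = rng.uniform(0, 1, size=(n, 2))
f0 = np.sin(2 * np.pi * X[:, 0])                  # true regression function
y = f0 + 0.5 * rng.standard_normal(n)
fhat = pcr_le(X, y, radius=0.1, K=20)
print(np.mean((y - f0) ** 2), np.mean((fhat - f0) ** 2))
```

Projecting onto a K-dimensional subspace reduces the noise variance by roughly a factor K/n. The smooth part of f0 survives because low-frequency Laplacian eigenvectors approximate smooth functions well.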

Boundary Estimation from Point Clouds: Algorithms, Guarantees and Applications

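Jeff Calder, Sangmin Park, Dejan Slepčev

Category: Machine Learning (Statistics)

2021-11-05

We investigate identifying the boundary of a domain from sample points in the domain. We introduce new estimators for the normal vector to the boundary and for the distance of a point to the boundary, as well as a test for whether a point lies within a boundary strip. The estimators can be computed efficiently and are more accurate than those in the existing literature. We provide rigorous error estimates for the estimators. Furthermore, we use the detected boundary points to solve boundary-value problems for PDEs on point clouds. We prove error estimates for the Laplace and eikonal equations on point clouds. Finally, we provide a range of numerical experiments illustrating the performance of our boundary estimators, applications to PDEs on point clouds, and tests on image data sets.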
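To convey the intuition behind normal-vector estimation, here is a simple mean-displacement heuristic. Near the boundary, the average displacement to neighbors within radius r points into the domain, so its negative direction estimates the outward normal. In the interior the displacement is close to zero. This is a swapped-in illustrative heuristic, not the paper's estimator. The radius and sample size are assumptions.

```python
import numpy as np

def mean_shift_normal(points, x, r):
    """Outward-normal estimate at x from the mean displacement of sample points within radius r.

    Returns (unit normal estimate, length of the mean displacement).
    """
    disp = points - x
    nbr = disp[np.sum(disp**2, axis=1) <= r**2]
    m = nbr.mean(axis=0)                  # points inward near the boundary, ~0 in the interior
    norm = np.linalg.norm(m)
    return -m / norm, norm

rng = np.random.default_rng(2)
pts = rng.uniform(0, 1, size=(20_000, 2))        # samples from the unit square
normal_b, shift_b = mean_shift_normal(pts, np.array([0.02, 0.5]), r=0.1)   # near the left edge
normal_i, shift_i = mean_shift_normal(pts, np.array([0.5, 0.5]), r=0.1)    # interior point
print(normal_b, shift_b, shift_i)                # normal_b close to (-1, 0)
```

The size of the mean displacement also gives a crude boundary test, since it is much larger near the edge than in the interior.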

Geometric Scattering on Measure Spaces

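Joyce Chew, Matthew Hirn, Smita Krishnaswamy, Deanna Needell, Michael Perlmutter, Holly Steach, Siddharth Viswanath, Hau-Tieng Wu

Category: Machine Learning (Statistics) | Machine Learning

2022-08-17

The scattering transform is a multilayered, wavelet-based transform initially introduced as a model of convolutional neural networks (CNNs), and it has played a foundational role in our understanding of these networks' stability and invariance properties. Subsequently, there has been widespread interest in extending the success of CNNs to data sets with non-Euclidean structure, such as graphs and manifolds, leading to the emerging field of geometric deep learning. To improve our understanding of the architectures used in this new field, several papers have proposed generalizations of the scattering transform to non-Euclidean data structures such as undirected graphs and compact Riemannian manifolds. In this paper, we introduce a general, unified model for geometric scattering on measure spaces. Our proposed framework includes previous work on geometric scattering as special cases, but also applies to more general settings such as directed graphs, signed graphs, and manifolds with boundary. We propose a new criterion that identifies the groups to which a useful representation should be invariant, and show that this criterion is sufficient to guarantee that the scattering transform has desirable stability and invariance properties. Additionally, we consider finite measure spaces obtained by randomly sampling an unknown manifold. We propose two methods for constructing a data-driven graph on which the associated graph scattering transform approximates the scattering transform on the underlying manifold. Moreover, we use a diffusion-maps-based approach to prove quantitative estimates of the rate of convergence of one of these approximations as the number of sample points tends to infinity. Finally, we showcase the utility of our method on spherical images, directed graphs, and high-dimensional single-cell data.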
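As a concrete special case, the following sketch computes first-order graph scattering moments with one common choice of diffusion wavelets on an undirected graph. The choice uses the lazy random walk P = (I + D^{-1}A)/2, with Ψ_0 = I - P and Ψ_j = P^{2^{j-1}} - P^{2^j}, and the moments are sums of |Ψ_j f|^q. The wavelet family, q, and J are assumptions. The paper's measure-space framework is more general, covering directed and signed graphs, for example.

```python
import numpy as np

def diffusion_wavelets(A, J):
    """Lazy random walk P = (I + D^{-1}A)/2 and wavelets Psi_0 = I - P, Psi_j = P^(2^(j-1)) - P^(2^j)."""
    n = A.shape[0]
    P = 0.5 * (np.eye(n) + A / A.sum(axis=1, keepdims=True))
    wavelets = [np.eye(n) - P]
    Pk = P                                   # holds P^(2^(j-1))
    for _ in range(1, J + 1):
        P2k = Pk @ Pk                        # P^(2^j)
        wavelets.append(Pk - P2k)
        Pk = P2k
    return wavelets

def scattering_moments(A, f, J=3, q=1):
    """Zeroth-order moment sum|f|^q followed by first-order moments sum|Psi_j f|^q, j = 0..J."""
    first = [np.sum(np.abs(Psi @ f) ** q) for Psi in diffusion_wavelets(A, J)]
    return np.array([np.sum(np.abs(f) ** q)] + first)

n = 8
A = np.zeros((n, n))
for i in range(n):                           # cycle graph on 8 nodes
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
rng = np.random.default_rng(5)
f = rng.standard_normal(n)
S = scattering_moments(A, f, J=3)
print(S)
```

Summing over nodes makes the coefficients invariant to relabeling the vertices. Because P is row-stochastic, every wavelet annihilates constant signals.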

Dimension-agnostic inference using cross U-statistics

Ilmun Kim, Aaditya Ramdas

Category: Machine Learning (Statistics)

2020-11-10

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference: developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
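For the one-sample mean problem H0: E[X] = 0, the sample-splitting and self-normalization idea can be sketched as follows. Take f_i = <x_i, mean of the second half> for i in the first half, then form a studentized mean of the f_i. Conditional on the second half, the f_i are i.i.d. with mean zero under H0, so the statistic is approximately N(0, 1) for any d. The function name is hypothetical, and this is a sketch in the spirit of the cross U-statistic rather than the paper's code.

```python
import numpy as np

def cross_mean_test(X):
    """Sample-split, self-normalised statistic for H0: E[X] = 0 (approximately N(0,1) under H0)."""
    m = X.shape[0] // 2
    X1, X2 = X[:m], X[m:2 * m]
    f = X1 @ X2.mean(axis=0)              # f_i = <x_i, mean of second half>, i in the first half
    return np.sqrt(m) * f.mean() / f.std(ddof=1)

rng = np.random.default_rng(3)
n, d = 400, 1000                          # d > n: no dimension-specific calibration needed
X0 = rng.standard_normal((n, d))          # null: mean zero
X1 = X0 + 0.1                             # alternative: small shift in every coordinate
T0, T1 = cross_mean_test(X0), cross_mean_test(X1)
print(T0, T1)                             # compare with N(0, 1) quantiles, e.g. 1.645 one-sided
```

Dropping the within-half products, i.e. the "diagonal blocks", is what turns the degenerate U-statistic into a sum of conditionally i.i.d. terms with a Gaussian limit.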

How do kernel-based sensor fusion algorithms behave under high dimensional noise?

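Xiucai Ding, Hau-Tieng Wu

Category: Machine Learning (Statistics) | Machine Learning

2021-11-22

We study the behavior of two kernel-based sensor fusion algorithms, nonparametric canonical correlation analysis (NCCA) and alternating diffusion (AD), under the nonnull setting in which the clean datasets collected by two sensors are modeled by a common low-dimensional manifold embedded in a high-dimensional Euclidean space, and the datasets are corrupted by high-dimensional noise. Assuming that the sample dimension and the sample size are comparably large, we establish the asymptotic limits and convergence rates of the eigenvalues of the associated kernel matrices, where NCCA and AD are conducted using the Gaussian kernel. It turns out that both the asymptotic limits and the convergence rates depend on the signal-to-noise ratio (SNR) of each sensor and on the selected bandwidths. On the one hand, we show that if NCCA and AD are applied directly to the noisy point clouds without any sanity check, they may generate artificial information that misleads scientists' interpretation. On the other hand, we prove that if the bandwidths are selected adequately, both NCCA and AD can be made robust to high-dimensional noise when the SNRs are relatively large.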
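Alternating diffusion fuses two views of the same n samples by composing their diffusion operators, P = P1 P2. Each Pk is a row-normalized Gaussian kernel built from one sensor. The sketch below builds that operator for two noisy toy sensors driven by a common latent angle. The bandwidths eps1 and eps2 are exactly the quantities the abstract says must be chosen with the SNR in mind. The values here are arbitrary assumptions.

```python
import numpy as np

def row_stochastic_kernel(X, eps):
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    K = np.exp(-D2 / eps)
    return K / K.sum(axis=1, keepdims=True)       # Markov (row-stochastic) normalisation

def alternating_diffusion(X1, X2, eps1, eps2):
    """AD operator P1 @ P2 from two sensors observing the same n samples (row i <-> sample i)."""
    return row_stochastic_kernel(X1, eps1) @ row_stochastic_kernel(X2, eps2)

rng = np.random.default_rng(7)
n = 200
t = rng.uniform(0, 2 * np.pi, n)                  # common latent variable seen by both sensors
X1 = np.column_stack([np.cos(t), np.sin(t)]) + 0.1 * rng.standard_normal((n, 2))
X2 = np.column_stack([np.cos(2 * t), np.sin(2 * t), t / (2 * np.pi)]) + 0.1 * rng.standard_normal((n, 3))
P = alternating_diffusion(X1, X2, eps1=0.2, eps2=0.2)
ev = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]  # eigenvalue magnitudes, largest first
print(np.round(ev[:6], 3))
```

The product of two Markov matrices is again Markov, so the leading eigenvalue is 1. The informative part of the spectrum is the decay of the following eigenvalues, which the paper shows depends on the SNR and the bandwidths.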

IAN: Iterated Adaptive Neighborhoods for manifold learning and dimensionality estimation

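Luciano Dyballa, Steven W. Zucker

Category: Machine Learning | Artificial Intelligence

2022-08-19

Invoking the manifold assumption in machine learning requires knowledge of the manifold's geometry and dimension, and theory dictates how many samples are required. However, in application data, sampling may not be uniform, and manifold properties are unknown and (possibly) non-pure; this implies that neighborhoods must adapt to the local structure. We introduce an algorithm for inferring adaptive neighborhoods for data given by a similarity kernel. Starting with a locally conservative neighborhood (Gabriel) graph, we sparsify it iteratively according to a weighted counterpart. In each step, a linear program yields minimal neighborhoods globally, and a volumetric statistic reveals neighbor outliers likely to violate the manifold geometry. We apply our adaptive neighborhoods to nonlinear dimensionality reduction, geodesic computation, and dimension estimation. A comparison against standard algorithms, such as those using k-nearest neighbors, demonstrates their usefulness.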
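IAN starts from the Gabriel graph. Points i and j are connected exactly when no other point lies strictly inside the ball whose diameter is the segment x_i x_j. The test for point k is |x_i - x_k|² + |x_j - x_k|² < |x_i - x_j|². Below is a brute-force O(n³) sketch of that starting graph only. The weighted, LP-based sparsification steps of IAN are not shown.

```python
import numpy as np

def gabriel_graph(X):
    """Edges (i, j), i < j, such that no other point lies strictly inside the ball with diameter x_i x_j."""
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    n = len(X)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            others = np.ones(n, dtype=bool)
            others[[i, j]] = False
            # k is inside the diametral ball iff |x_i - x_k|^2 + |x_j - x_k|^2 < |x_i - x_j|^2
            if not np.any(D2[i, others] + D2[j, others] < D2[i, j]):
                edges.add((i, j))
    return edges

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.1]])
print(gabriel_graph(X))   # the long edge (0, 1) is blocked by point 2
```

Unlike a k-nearest-neighbor graph, this construction needs no global k, which is why it serves as a locally conservative starting point.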

The Lasso with general Gaussian designs with applications to hypothesis testing

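Michael Celentano, Andrea Montanari, Yuting Wei

Category: Machine Learning | Machine Learning (Statistics)

2020-07-27

The Lasso is a method for high-dimensional regression that is commonly used when the number of covariates $p$ is of the same order as, or larger than, the number of observations $n$. Classical asymptotic normality theory does not apply to this model for two fundamental reasons: (1) the regularized risk is non-smooth; (2) the distance between the estimator $\widehat{\boldsymbol{\theta}}$ and the true parameter vector $\boldsymbol{\theta}^*$ cannot be neglected. As a consequence, standard perturbative arguments, the traditional basis for asymptotic normality, fail. On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one. This characterization was first obtained in the case of Gaussian designs with i.i.d. covariates; here we generalize it to Gaussian correlated designs with non-singular covariance structure. It is expressed in terms of a simpler "fixed-design" model. We establish non-asymptotic bounds on the distance between the distributions of various quantities in the two models, which hold uniformly over signals $\boldsymbol{\theta}^*$ in a suitable sparsity class. As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.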
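The sketch below fits the Lasso by coordinate descent on a correlated Gaussian design with an AR(1)-type covariance. It then forms a debiased estimate with a degrees-of-freedom adjustment: θ_d = θ̂ + Σ^{-1} Xᵀ(y - Xθ̂) / (n - ‖θ̂‖₀). We assume this adjusted form, with Σ known, following our reading of the degrees-of-freedom correction mentioned in the abstract. Treat it as an assumption and check the paper for the exact statement. The values of λ, n, p, and the sparsity are arbitrary.

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=500):
    """Coordinate descent for (1/(2n)) ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                                   # current residual y - X b
    col_sq = np.sum(X**2, axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]                    # remove coordinate j from the fit
            z = X[:, j] @ r / n
            b[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]   # soft-thresholding
            r -= X[:, j] * b[j]
    return b

def debiased_lasso(X, y, b, Sigma):
    """Assumed df-adjusted debiasing: b + Sigma^{-1} X^T (y - X b) / (n - ||b||_0)."""
    n = X.shape[0]
    df = np.count_nonzero(b)
    return b + np.linalg.solve(Sigma, X.T @ (y - X @ b)) / (n - df)

rng = np.random.default_rng(8)
n, p, s, lam = 200, 100, 10, 0.1
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))   # correlated design covariance
X = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma).T
theta_star = np.zeros(p)
theta_star[:s] = 1.0
y = X @ theta_star + rng.standard_normal(n)
b = lasso_cd(X, y, lam)
theta_d = debiased_lasso(X, y, b, Sigma)
print(np.count_nonzero(b), np.round(theta_d[:s], 2))
```

Replacing n - ‖θ̂‖₀ by n recovers the uncorrected debiased Lasso. The abstract's point is that, when n/p is of order one, the uncorrected version does not give valid confidence intervals.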

Perturbation Analysis of Randomized SVD and its Applications to High-dimensional Statistics

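Yichi Zhang, Minh Tang

Category: Machine Learning (Statistics)

2022-03-19

Randomized singular value decomposition (RSVD) is a class of computationally efficient algorithms for computing the truncated SVD of large data matrices. Given an $n \times n$ symmetric matrix $\mathbf{M}$, the prototypical RSVD algorithm outputs an approximation of the $k$ leading singular vectors of $\mathbf{M}$ by computing the SVD of $\mathbf{M}^{g}\mathbf{G}$; here $g \geq 1$ is an integer and $\mathbf{G} \in \mathbb{R}^{n \times k}$ is a random Gaussian sketching matrix. In this paper, we study the statistical properties of RSVD under a general "signal-plus-noise" framework, i.e., the observed matrix $\hat{\mathbf{M}}$ is assumed to be an additive perturbation of some true but unknown signal matrix $\mathbf{M}$. We first derive upper bounds for the $\ell_2$ (spectral norm) and $\ell_{2\to\infty}$ (maximum row-wise $\ell_2$ norm) distances between the approximate singular vectors of $\hat{\mathbf{M}}$ and the true singular vectors of the signal matrix $\mathbf{M}$. These upper bounds depend on the signal-to-noise ratio (SNR) and the number of power iterations $g$. A phase transition phenomenon is observed, in which a smaller SNR requires larger values of $g$ to guarantee convergence of the $\ell_2$ and $\ell_{2\to\infty}$ distances. We also show that the thresholds for $g$ at which these phase transitions occur are sharp whenever the noise matrices satisfy a certain trace growth condition. Finally, we derive normal approximations for the row-wise fluctuations of the approximate singular vectors and the entrywise fluctuations of the approximate matrix. We illustrate our theoretical results by deriving nearly optimal performance guarantees for RSVD when applied to three statistical inference problems, namely, community detection, matrix completion, and principal component analysis with missing data.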
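The prototypical RSVD described above fits in a few lines: sketch with a Gaussian G, apply g power iterations, and take the left singular vectors of M^g G. The sketch compares g = 1, 2, 4 on a toy signal-plus-noise matrix to show why weaker signals need more power iterations. The rank-3 signal, the eigenvalues (30, 25, 20), and the symmetric Gaussian noise are illustrative assumptions.

```python
import numpy as np

def rsvd_prototype(M, k, g, rng):
    """Approximate k leading singular vectors of symmetric M from the SVD of M^g G."""
    Y = rng.standard_normal((M.shape[0], k))      # Gaussian sketching matrix G
    for _ in range(g):
        Y = M @ Y                                 # after the loop, Y = M^g G
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U                                      # n x k approximate singular vectors

def subspace_dist(U, V):
    """Spectral-norm distance between projectors, i.e. the sin-theta distance."""
    return np.linalg.norm(U @ U.T - V @ V.T, 2)

rng = np.random.default_rng(9)
n, k = 300, 3
U0, _ = np.linalg.qr(rng.standard_normal((n, k)))
M = U0 @ np.diag([30.0, 25.0, 20.0]) @ U0.T       # rank-k signal matrix
E = rng.standard_normal((n, n)) / np.sqrt(n)
M_hat = M + (E + E.T) / np.sqrt(2)                # observed = signal + symmetric noise (norm ~ 2)

for g in (1, 2, 4):
    print(g, round(subspace_dist(rsvd_prototype(M_hat, k, g, rng), U0), 3))
```

With g = 1 the noise directions in M̂G compete with the signal. Each extra power iteration multiplies the signal-to-noise gap, which is the mechanism behind the phase transition in g.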

Asymptotics of Network Embeddings Learned via Subsampling

Andrew Davison, Morgane Austern

Category: Machine Learning (Statistics) | Machine Learning

2021-07-06

Network data are ubiquitous in modern machine learning, with tasks of interest including node classification, node clustering and link prediction. A frequent approach begins by learning a Euclidean embedding of the network, to which algorithms developed for vector-valued data are applied. For large networks, embeddings are learned using stochastic gradient methods where the sub-sampling scheme can be freely chosen. Despite the strong empirical performance of such methods, they are not well understood theoretically. Our work encapsulates representation methods using a subsampling approach, such as node2vec, into a single unifying framework. We prove, under the assumption that the graph is exchangeable, that the distribution of the learned embedding vectors asymptotically decouples. Moreover, we characterize the asymptotic distribution and provide rates of convergence, in terms of the latent parameters, which include the choice of loss function and the embedding dimension. This provides a theoretical foundation to understand what the embedding vectors represent and how well these methods perform on downstream tasks. Notably, we observe that typically used loss functions may lead to shortcomings, such as a lack of Fisher consistency.

Large sample spectral analysis of graph-based multi-manifold clustering

Nicolas Garcia Trillos, Pengfei He, Chenghui Li

Category: Machine Learning | Machine Learning (Statistics)

2021-07-28

In this work we study statistical properties of graph-based algorithms for multi-manifold clustering (MMC). In MMC the goal is to retrieve the multi-manifold structure underlying a given Euclidean data set when the data set is assumed to be obtained by sampling a distribution on a union of manifolds $\mathcal{M} = \mathcal{M}_1 \cup\dots \cup \mathcal{M}_N$ that may intersect with each other and that may have different dimensions. We investigate sufficient conditions that similarity graphs on data sets must satisfy in order for their corresponding graph Laplacians to capture the right geometric information to solve the MMC problem. Precisely, we provide high probability error bounds for the spectral approximation of a tensorized Laplacian on $\mathcal{M}$ with a suitable graph Laplacian built from the observations; the recovered tensorized Laplacian contains all geometric information of all the individual underlying manifolds. We provide an example of a family of similarity graphs, which we call annular proximity graphs with angle constraints, satisfying these sufficient conditions. We contrast our family of graphs with other constructions in the literature based on the alignment of tangent planes. Extensive numerical experiments expand the insights that our theory provides on the MMC problem.

On lower bounds for the bias-variance trade-off

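Alexis Derumigny, Johannes Schmidt-Hieber

Category: Machine Learning (Statistics)

2020-05-30

For many high-dimensional and nonparametric statistical models, it is a common phenomenon that rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known about whether methods exist that could avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator whose bias is smaller than a prespecified bound. This shows to what extent the bias-variance trade-off is unavoidable, and allows one to quantify the loss of performance for methods that do not obey it. The approach is based on a number of abstract lower bounds for the variance, involving the change of expectation with respect to different probability measures as well as information measures such as the Kullback-Leibler or chi-square divergence. Some of these inequalities rely on a new concept of information matrices. In the second part of the article, the abstract lower bounds are applied to several statistical models, including the Gaussian white noise model, a boundary estimation problem, the Gaussian sequence model, and the high-dimensional linear regression model. For these specific statistical applications, different types of bias-variance trade-offs occur that vary considerably in their strength. For the trade-off between integrated squared bias and integrated variance in the Gaussian white noise model, we combine the general strategy for lower bounds with a reduction technique. This allows us to link the original problem to the bias-variance trade-off for estimators with additional symmetry properties in a simpler statistical model. In the Gaussian sequence model, different phase transitions of the bias-variance trade-off occur. Although there is a non-trivial interplay between bias and variance, the rates of the squared bias and the variance do not have to be balanced in order to achieve the minimax estimation rate.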

How do noise tails impact on deep ReLU networks?

Jianqing Fan, Yihong Gu, Wen-Xin Zhou

Category: Machine Learning (Statistics)

2022-03-20

This paper investigates the stability of deep ReLU neural networks for nonparametric regression under the assumption that the noise has only a finite p-th moment. We unveil how the optimal rate of convergence depends on p, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions with hierarchical composition structure when both the adaptive Huber loss and deep ReLU neural networks are used. This optimal rate of convergence cannot be obtained by the ordinary least squares but can be achieved by the Huber loss with a properly chosen parameter that adapts to the sample size, smoothness, and moment parameters. A concentration inequality for the adaptive Huber ReLU neural network estimators with allowable optimization errors is also derived. To establish a matching lower bound within the class of neural network estimators using the Huber loss, we employ a different strategy from the traditional route: constructing a deep ReLU network estimator that has a better empirical loss than the true function and the difference between these two functions furnishes a lower bound. This step is related to the Huberization bias, yet more critically to the approximability of deep ReLU networks. As a result, we also contribute some new results on the approximation theory of deep ReLU neural networks.
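For reference, the Huber loss with robustification parameter τ is ℓ_τ(r) = r²/2 for |r| ≤ τ and τ|r| - τ²/2 otherwise. Its derivative clips residuals to [-τ, τ]. The sketch below shows the loss and a Huber M-estimate of location by fixed-point steps. The paper adapts τ to the sample size, smoothness, and moment parameters, and that rule is not given here. In this sketch τ is simply passed in by hand as an assumption, and no neural network is trained.

```python
import numpy as np

def huber_loss(r, tau):
    a = np.abs(r)
    return np.where(a <= tau, 0.5 * r**2, tau * a - 0.5 * tau**2)

def huber_psi(r, tau):
    return np.clip(r, -tau, tau)              # derivative of the Huber loss

def huber_location(x, tau, n_iter=200):
    """Huber M-estimate of location: iterate m <- m + mean(psi(x - m)) from the median."""
    m = np.median(x)
    for _ in range(n_iter):
        m = m + np.mean(huber_psi(x - m, tau))
    return m

x = np.array([0.0, 0.0, 0.0, 100.0])          # one gross outlier
print(huber_loss(np.array([0.5, 2.0]), 1.0))  # quadratic branch 0.125, linear branch 1.5
print(huber_location(x, tau=1.0))             # about 1/3, versus a sample mean of 25
```

The outlier contributes at most τ to the estimating equation, so here the estimate solves 3(-m) + 1 = 0, i.e. m = 1/3. A larger τ moves the estimator toward least squares, which is the bias/robustness trade-off that an adaptive τ balances.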

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

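Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett

Category: Machine Learning | Machine Learning (Statistics)

2021-12-23

We study stochastic approximation (SA) procedures for approximately solving a $d$-dimensional linear fixed-point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$, where $t_{\mathrm{mix}}$ is the mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher-order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise, covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$, and for linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).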
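A toy scalar instance of the setting above follows. We want θ* solving E_π[a(s)] θ = E_π[b(s)], but we only observe one trajectory s_0, s_1, ... of a two-state Markov chain. We run θ ← θ + η_t (b(s_t) - a(s_t) θ) and report both the last iterate and the Polyak-Ruppert average. The chain, a, b, and the step-size schedule η_t = 0.5/(t+1)^0.6 are assumptions for illustration, not the paper's setup. TD(λ) policy evaluation is a vector-valued instance of the same update.

```python
import numpy as np

def markov_linear_sa(T, a, b, n, rng, c0=0.5, power=0.6):
    """Scalar linear SA along one trajectory of a finite Markov chain (toy sketch).

    Update: theta <- theta + eta_t * (b[s_t] - a[s_t] * theta), eta_t = c0 / (t + 1)**power.
    Returns the last iterate and the Polyak-Ruppert average of all iterates.
    """
    cum = np.cumsum(T, axis=1)                     # row-wise CDFs for sampling transitions
    u = rng.random(n)
    s, theta, avg = 0, 0.0, 0.0
    for t in range(n):
        theta += c0 / (t + 1) ** power * (b[s] - a[s] * theta)
        avg += (theta - avg) / (t + 1)             # running mean of the iterates so far
        s = min(int(np.searchsorted(cum[s], u[t], side="right")), len(a) - 1)  # next state
    return theta, avg

T = np.array([[0.9, 0.1], [0.2, 0.8]])             # transition matrix, stationary law (2/3, 1/3)
a = np.array([1.0, 2.0])
b = np.array([1.0, 4.0])
theta_star = 2.0 / (4.0 / 3.0)                      # E_pi[b] / E_pi[a] = 1.5
last, avg = markov_linear_sa(T, a, b, 100_000, np.random.default_rng(10))
print(last, avg, theta_star)
```

Averaging removes most of the step-size-dependent fluctuation of the last iterate. The error of the averaged iterate is what the instance-dependent bounds in the abstract characterize.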

Tractability from overparametrization: The example of the negative perceptron

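Andrea Montanari, Yiqiao Zhong, Kangjie Zhou

Category: Machine Learning

2021-10-28

In the negative perceptron problem, we are given $n$ data points $({\boldsymbol x}_i, y_i)$, where ${\boldsymbol x}_i$ is a $d$-dimensional vector and $y_i \in \{+1, -1\}$ is a binary label. The data are not linearly separable, so we content ourselves with finding a linear classifier with the largest possible negative margin. In other words, we want to find a unit-norm vector ${\boldsymbol \theta}$ that maximizes $\min_{i \le n} y_i \langle {\boldsymbol \theta}, {\boldsymbol x}_i \rangle$. This is a non-convex optimization problem (it is equivalent to finding a maximum-norm vector in a polytope), and we study its typical properties under two random models. We consider the proportional asymptotics in which $n, d \to \infty$ with $n/d \to \delta$, and prove upper and lower bounds on the maximum margin $\kappa_{\text{s}}(\delta)$ or, equivalently, on its inverse function $\delta_{\text{s}}(\kappa)$. In other words, $\delta_{\text{s}}(\kappa)$ is the overparametrization threshold: for $n/d \le \delta_{\text{s}}(\kappa) - \varepsilon$, a classifier achieving vanishing training error exists with high probability, while for $n/d \ge \delta_{\text{s}}(\kappa) + \varepsilon$ it does not. Our bounds on $\delta_{\text{s}}(\kappa)$ match as $\kappa \to -\infty$. We then analyze a linear programming algorithm for finding a solution, and characterize the corresponding threshold $\delta_{\text{lin}}(\kappa)$. We observe a gap between the interpolation threshold $\delta_{\text{s}}(\kappa)$ and the linear programming threshold $\delta_{\text{lin}}(\kappa)$, raising the question of the behavior of other algorithms.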