Smart Paper Notes

Manifold Free Riemannian Optimization

Boris Shustin , Haim Avron , Barak Sober

Category: Machine Learning (Statistics)

2022-09-07

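Riemannian optimization is a principled framework for solving optimization problems where the desired optimum is constrained to a smooth manifold $\mathcal{M}$. Algorithms designed in this framework usually require some geometrical description of the manifold, which typically includes tangent spaces, retractions, and gradients of the cost function. However, in many cases, only a subset of these elements can be accessed (or none at all) due to lack of information or intractability. In this paper, we propose a novel approach that can perform approximate Riemannian optimization in such cases, where the constraining manifold is a submanifold of $\mathbb{R}^{D}$. At the bare minimum, our method requires only a noiseless sample set of the cost function $(x_{i}, y_{i}) \in \mathcal{M} \times \mathbb{R}$ and the intrinsic dimension of the manifold $\mathcal{M}$. Using the samples, and utilizing the Manifold-MLS framework (Sober and Levin 2020), we construct approximations of the missing components that come with provable guarantees, and we analyze their computational costs. If some of the components are given analytically (e.g., if the cost function and its gradient are given explicitly, or if the tangent spaces can be computed), the algorithm can easily be adapted to use the exact expressions instead of the approximations. We analyze the global convergence of Riemannian gradient-based methods using our approach, and we demonstrate empirically the strength of this method, together with a conjugate-gradient type method based on similar principles.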
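The three geometric ingredients named in the abstract above (tangent spaces, retractions, and gradients) can be sketched on the simplest manifold, the unit sphere. This is only an illustration under assumptions that are not from the paper: the geometry is known exactly rather than approximated from samples, the cost is the hypothetical test function $f(x) = -x^{\top} A x$, and the helper names are invented for this sketch.

```python
import math

def matvec(A, x):
    # Dense matrix-vector product for a list-of-rows matrix.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def riemannian_gd_sphere(A, x0, step=0.1, iters=200):
    """Minimize f(x) = -x^T A x over the unit sphere (A symmetric).

    Illustrative only: exact sphere geometry is assumed here.
    """
    x = normalize(x0)
    for _ in range(iters):
        # Euclidean gradient of f at x.
        egrad = [-2.0 * g for g in matvec(A, x)]
        # Riemannian gradient: project onto the tangent space {v : <v, x> = 0}.
        coef = dot(x, egrad)
        rgrad = [g - coef * xi for g, xi in zip(egrad, x)]
        # Step in the tangent direction, then retract back onto the sphere.
        y = [xi - step * g for xi, g in zip(x, rgrad)]
        x = normalize(y)
    return x

A = [[3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
x_star = riemannian_gd_sphere(A, [1.0, 1.0, 1.0])
```

For this diagonal `A` the iterate aligns with the first coordinate axis, the eigenvector of the largest eigenvalue. Because the tangent step is orthogonal to `x`, the vector `y` always has norm at least 1, so the retraction never divides by zero.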

translated by Google Translate

Nonlinear matrix recovery using optimization on the Grassmann manifold

Florentin Goyens , Coralia Cartis , Armin Eftekhari

Category: Machine Learning (Statistics) | Machine Learning

2021-09-13

We investigate the problem of recovering a partially observed high-rank matrix whose columns obey a nonlinear structure such as a union of subspaces, an algebraic variety or grouped in clusters. The recovery problem is formulated as the rank minimization of a nonlinear feature map applied to the original matrix, which is then further approximated by a constrained non-convex optimization problem involving the Grassmann manifold. We propose two sets of algorithms, one arising from Riemannian optimization and the other as an alternating minimization scheme, both of which include first- and second-order variants. Both sets of algorithms have theoretical guarantees. In particular, for the alternating minimization, we establish global convergence and worst-case complexity bounds. Additionally, using the Kurdyka-Lojasiewicz property, we show that the alternating minimization converges to a unique limit point. We provide extensive numerical results for the recovery of union of subspaces and clustering under entry sampling and dense Gaussian sampling. Our methods are competitive with existing approaches and, in particular, high accuracy is achieved in the recovery using Riemannian second-order methods.

Riemannian Optimization via Frank-Wolfe Methods

Melanie Weber , Suvrit Sra

Category: Machine Learning

2017-10-30

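We study projection-free methods for constrained Riemannian optimization. In particular, we propose the Riemannian Frank-Wolfe (RFW) method. We analyze non-asymptotic convergence rates of RFW to an optimum for (geodesically) convex problems, and to a critical point for nonconvex objectives. We also present a practical setting under which RFW can attain a linear convergence rate. As a concrete example, we specialize RFW to the manifold of positive definite matrices and apply it to two tasks: (i) computing the matrix geometric mean (Riemannian centroid); and (ii) computing the Bures-Wasserstein barycenter. Both tasks involve geodesically convex interval constraints, for which we show that the Riemannian "linear" oracle required by RFW admits a closed-form solution; this result may be of independent interest. We further specialize RFW to the special orthogonal group and show that here too the Riemannian "linear" oracle can be solved in closed form. Here, we describe an application to the synchronization of data matrices (the Procrustes problem). We complement our theoretical results with an empirical comparison of RFW against state-of-the-art Riemannian optimization methods, and observe that RFW performs competitively on the task of computing Riemannian centroids.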

Riemannian accelerated gradient methods via extrapolation

Andi Han , Bamdev Mishra , Pratik Jawanpuria , Junbin Gao

Category: Machine Learning | Machine Learning (Statistics)

2022-08-13

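In this paper, we propose a simple acceleration scheme for Riemannian gradient methods by extrapolating iterates on manifolds. We show that when the iterates are generated by the Riemannian gradient descent method, the accelerated scheme asymptotically achieves the optimal convergence rate and is computationally more favorable than the recently proposed Riemannian Nesterov accelerated gradient methods. Our experiments verify the practical benefit of the novel acceleration strategy.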

Riemannian Langevin Algorithm for Solving Semidefinite Programs

Mufan Bill Li , Murat A. Erdogdu

Category: Machine Learning (Statistics) | Machine Learning

2020-10-21

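We propose a Langevin diffusion-based algorithm for non-convex optimization and sampling on a product manifold of spheres. Under a logarithmic Sobolev inequality, we establish a guarantee for finite-iteration convergence to the Gibbs distribution in terms of Kullback-Leibler divergence. We show that, with an appropriate temperature choice, the suboptimality gap to the global minimum is guaranteed to be arbitrarily small with high probability. As an application, we consider the Burer-Monteiro approach for solving a semidefinite program (SDP) with diagonal constraints, and analyze the proposed Langevin algorithm for optimizing the non-convex objective. In particular, we establish a logarithmic Sobolev inequality for the Burer-Monteiro problem when there are no spurious local minima, but in the presence of saddle points. Combining the results, we then provide a global optimality guarantee for the SDP and the Max-Cut problem. More precisely, we show that the Langevin algorithm achieves $\epsilon$ accuracy with high probability in $\widetilde{\Omega}(\epsilon^{-5})$ iterations.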

Riemannian Natural Gradient Methods

Jiang Hu , Ruicheng Ao , Anthony Man-Cho So , Minghan Yang , Zaiwen Wen

Category: Machine Learning

2022-07-15

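This paper studies large-scale optimization problems on Riemannian manifolds whose objective function is a finite sum of negative log-probability losses. Such problems arise in various machine learning and signal processing applications. By introducing the notion of the Fisher information matrix in the manifold setting, we propose a novel Riemannian natural gradient method, which can be viewed as a natural extension of the natural gradient method from the Euclidean setting to the manifold setting. We establish the almost-sure global convergence of our proposed method under standard assumptions. Moreover, we show that if the loss function satisfies certain convexity and smoothness conditions and the input-output map satisfies a Riemannian Jacobian stability condition, then our proposed method enjoys a local linear (or, under Lipschitz continuity of the Riemannian Jacobian of the input-output map, even quadratic) rate of convergence. We then prove that the Riemannian Jacobian stability condition is satisfied by a two-layer fully connected neural network with batch normalization, provided that the width of the network is sufficiently large. This demonstrates the practical relevance of our convergence rate result. Numerical experiments on applications arising from machine learning demonstrate the advantages of the proposed method over state-of-the-art ones.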

The Proxy Step-size Technique for Regularized Optimization on the Sphere Manifold

Fang Bai , Adrien Bartoli

Category: Robotics

2022-09-05

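We give an effective solution to the regularized optimization problem $g(\boldsymbol{x}) + h(\boldsymbol{x})$, where $\boldsymbol{x}$ is constrained to the unit sphere $\Vert \boldsymbol{x} \Vert_2 = 1$. Here $g(\cdot)$ is a smooth cost with Lipschitz continuous gradient, whereas $h(\cdot)$ is typically non-smooth but convex and absolutely homogeneous, \textit{e.g.,}~norm regularizers and their combinations. Our solution is based on the Riemannian proximal gradient, using an idea we call \textit{proxy step-size}: a scalar variable which we prove is monotone with respect to the actual step-size within an interval. For convex and absolutely homogeneous $h(\cdot)$, the proxy step-size exists and determines the actual step-size and the tangent update in closed form, and thus the complete proximal gradient iteration. Based on these insights, we design a Riemannian proximal gradient method using the proxy step-size. We prove that our method converges to a critical point, guided by a line-search technique based on the $g(\cdot)$ cost only. The proposed method can be implemented in a few lines of code. We show its usefulness by applying nuclear norm, $\ell_1$ norm, and nuclear-spectral norm regularization. The improvements are consistent and backed by numerical experiments.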

Universal Approximation Theorems for Differentiable Geometric Deep Learning

Anastasis Kratsios , Leonie Papon

Category: Machine Learning

2021-01-13

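This paper addresses the need to process non-Euclidean data by introducing a geometric deep learning (GDL) framework for building universal feedforward-type models compatible with differentiable manifold geometries. We show that our GDL models can approximate any continuous target function uniformly on compact sets of a controlled maximum diameter. We obtain curvature-dependent lower bounds on this maximum diameter and upper bounds on the depth of our approximating GDL models. Conversely, we find that there is always a continuous function between any two non-degenerate compact manifolds that any "locally-defined" GDL model cannot uniformly approximate. Our last main result identifies data-dependent conditions guaranteeing that the GDL model implementing our approximation breaks "the curse of dimensionality." We find that any "real-world" (i.e., finite) dataset always satisfies our condition and, conversely, any dataset satisfies our requirement if the target function is smooth. As applications, we confirm the universal approximation capabilities of the following GDL models: the hyperbolic feedforward networks of Ganea et al. (2018), the architecture implementing the deep Kalman filter of Krishnan et al. (2015), and deep softmax classifiers. We build universal extensions/variants of the SPD-matrix regressor of Meyer et al. (2011) and the Procrustean regressor of Fletcher (2003). In the Euclidean setting, our results imply a quantitative version of the approximation theorem of Kidger and Lyons (2020) and a data-dependent version of the uncursed approximation rates of Yarotsky and Zhevnerchuk (2019).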

The Dynamics of Riemannian Robbins-Monro Algorithms

Mohammad Reza Karimi , Ya-Ping Hsieh , Panayotis Mertikopoulos , Andreas Krause

Category: Machine Learning

2022-06-14

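Many important learning algorithms, such as stochastic gradient methods, are often deployed to solve nonlinear problems on Riemannian manifolds. Motivated by these applications, we propose a family of Riemannian algorithms generalizing and extending the seminal stochastic approximation framework of Robbins and Monro. Compared to their Euclidean counterparts, Riemannian iterative algorithms are much less understood due to the lack of a global linear structure on the manifold. We overcome this difficulty by introducing an extended Fermi coordinate frame, which allows us to map the asymptotic behavior of the proposed Riemannian Robbins-Monro (RRM) class of algorithms to that of an associated deterministic dynamical system under very mild assumptions on the underlying manifold. In so doing, we provide a general template of almost sure convergence results that mirrors and extends the existing theory for Euclidean Robbins-Monro schemes, albeit with a significantly more involved analysis that requires a number of new geometric ingredients. We showcase the flexibility of the proposed RRM framework by using it to establish the convergence of a retraction-based analogue of the popular optimistic / extra-gradient methods for solving minimization problems and games, and we provide a unified treatment of their convergence.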

Asymptotic Escape of Spurious Critical Points on the Low-rank Matrix Manifold

Thomas Y. Hou , Zhenzhen Li , Ziyun Zhang

Category: Machine Learning | Machine Learning (Statistics)

2021-07-20

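We show that, on the manifold of fixed-rank symmetric positive semidefinite matrices, the Riemannian gradient descent algorithm almost surely escapes some spurious critical points on the boundary of the manifold. Our result is the first to partially overcome the incompleteness of the low-rank matrix manifold without changing the vanilla Riemannian gradient descent algorithm. The spurious critical points are rank-deficient matrices that capture only part of the eigen-components of the ground truth. Unlike classical strict saddle points, they exhibit very singular behavior. We show that, using the dynamical low-rank approximation and a rescaled gradient flow, some of the spurious critical points can be converted to classical strict saddle points in the parameterized domain, which leads to the desired result. Numerical experiments are provided to support our theoretical findings.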

On Asymptotic Linear Convergence of Projected Gradient Descent for Constrained Least Squares

Trung Vu , Raviv Raich

Category: Machine Learning

2021-12-22

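Many recent problems in signal processing and machine learning, such as compressed sensing, image restoration, matrix/tensor recovery, and non-negative matrix factorization, can be cast as constrained optimization. Projected gradient descent (PGD) is a simple yet efficient method for solving such constrained optimization problems. Local convergence analysis furthers our understanding of its asymptotic behavior near the solution, offering sharper bounds on the convergence rate than global convergence analysis. However, local guarantees often appear scattered across problem-specific areas of machine learning and signal processing. This manuscript presents a unified framework for the local convergence analysis of projected gradient descent in the context of constrained least squares. The proposed analysis offers insights into pivotal local convergence properties such as the conditions for linear convergence, the region of convergence, the exact asymptotic rate of convergence, and a bound on the number of iterations needed to reach a certain level of accuracy. To demonstrate the applicability of the proposed approach, we present a recipe for the convergence analysis of PGD and demonstrate it via a beginning-to-end application of the recipe to four fundamental problems, namely linearly constrained least squares, sparse recovery, least squares with a unit norm constraint, and matrix completion.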
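As a rough illustration of one of the four problems listed above, least squares with a unit norm constraint, projected gradient descent can be sketched as a gradient step followed by renormalization. This is a minimal sketch and not the paper's analysis or code: the data `A`, `b`, the helper names, and the conservative step size $1/\Vert A \Vert_F^2$ are all assumptions made for this example.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, r):
    # Computes A^T r for a list-of-rows matrix A.
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def objective(A, b, x):
    # f(x) = 0.5 * ||A x - b||^2
    return 0.5 * sum((ax - bi) ** 2 for ax, bi in zip(matvec(A, x), b))

def project_sphere(v):
    n = math.sqrt(sum(a * a for a in v))
    if n == 0.0:
        # Every unit vector is a nearest point to the origin; pick e_1.
        return [1.0] + [0.0] * (len(v) - 1)
    return [a / n for a in v]

def pgd_unit_norm(A, b, x0, step, iters=100):
    """Projected gradient descent for min 0.5*||Ax - b||^2 s.t. ||x||_2 = 1."""
    x = project_sphere(x0)
    history = [objective(A, b, x)]
    for _ in range(iters):
        residual = [ax - bi for ax, bi in zip(matvec(A, x), b)]
        grad = rmatvec(A, residual)                      # A^T (A x - b)
        x = project_sphere([xi - step * g for xi, g in zip(x, grad)])
        history.append(objective(A, b, x))
    return x, history

# Hypothetical small instance.
A = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 0.5]
# ||A||_F^2 upper-bounds the largest eigenvalue of A^T A, so 1/||A||_F^2 is a safe step.
lipschitz_bound = sum(a * a for row in A for a in row)
x_hat, history = pgd_unit_norm(A, b, [1.0, 0.0], step=1.0 / lipschitz_bound)
```

With a step size of at most $1/L$, where $L$ bounds the largest eigenvalue of $A^{\top} A$, each projected step minimizes a quadratic upper bound of the objective over the sphere, so the recorded objective values do not increase.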

Learning Transition Operators From Sparse Space-Time Samples

Christian Kümmerle , Mauro Maggioni , Sui Tang

Category: Machine Learning | Machine Learning (Statistics)

2022-12-01

We consider the nonlinear inverse problem of learning a transition operator $\mathbf{A}$ from partial observations at different times, in particular from sparse observations of entries of its powers $\mathbf{A},\mathbf{A}^2,\cdots,\mathbf{A}^{T}$. This Spatio-Temporal Transition Operator Recovery problem is motivated by the recent interest in learning time-varying graph signals that are driven by graph operators depending on the underlying graph topology. We address the nonlinearity of the problem by embedding it into a higher-dimensional space of suitable block-Hankel matrices, where it becomes a low-rank matrix completion problem, even if $\mathbf{A}$ is of full rank. For both a uniform and an adaptive random space-time sampling model, we quantify the recoverability of the transition operator via suitable measures of incoherence of these block-Hankel embedding matrices. For graph transition operators these measures of incoherence depend on the interplay between the dynamics and the graph topology. We develop a suitable non-convex iterative reweighted least squares (IRLS) algorithm, establish its quadratic local convergence, and show that, in optimal scenarios, no more than $\mathcal{O}(rn \log(nT))$ space-time samples are sufficient to ensure accurate recovery of a rank-$r$ operator $\mathbf{A}$ of size $n \times n$. This establishes that spatial samples can be substituted by a comparable number of space-time samples. We provide an efficient implementation of the proposed IRLS algorithm with space complexity of order $O(r n T)$ and per-iteration time complexity linear in $n$. Numerical experiments for transition operators based on several graph models confirm that the theoretical findings accurately track empirical phase transitions, and illustrate the applicability and scalability of the proposed algorithm.

First-Order Algorithms for Min-Max Optimization in Geodesic Metric Spaces

Michael I. Jordan , Tianyi Lin , Emmanouil-Vasileios Vlatakis-Gkaragkounis

Category: Machine Learning

2022-06-04

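From optimal transport to robust dimensionality reduction, a plethora of machine learning applications can be cast as min-max optimization problems over Riemannian manifolds. Though many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to translate these results to the Riemannian case. Zhang et al. [2022] have recently shown that geodesically convex-concave Riemannian problems always admit saddle-point solutions. Inspired by this result, we study whether a performance gap between Riemannian and optimal Euclidean space convex-concave algorithms is necessary. We answer this question in the negative: we prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically strongly-convex-concave case, matching the Euclidean result. Our results also extend to the stochastic or non-smooth case, where RCEG and Riemannian gradient ascent descent (RGDA) achieve near-optimal convergence rates up to factors depending on the curvature of the manifold.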

Provably efficient variational generative modeling of quantum many-body systems via quantum-probabilistic information geometry

Faris M. Sbahi , Antonio J. Martinez , Sahil Patel , Dmitri Saberi , Jae Hyeon Yoo , Geoffrey Roeder , Guillaume Verdon

Category: Machine Learning | Machine Learning (Statistics)

2022-06-09

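The dual tasks of quantum Hamiltonian learning and quantum Gibbs sampling are relevant to many important problems in physics and chemistry. In the low-temperature regime, algorithms for these tasks often suffer from intractabilities, for example from poor sample or time complexity. To address such intractabilities, we introduce a generalization of quantum natural gradient descent to parameterized mixed states, and provide a robust first-order approximating algorithm, Quantum-Probabilistic Mirror Descent. We prove data sample efficiency for the dual tasks using tools from information geometry and quantum metrology, thus generalizing the seminal result of classical Fisher efficiency to a variational quantum algorithm for the first time. Our approaches extend previously sample-efficient techniques to allow for flexibility in model choice, including Quantum Hamiltonian-Based Models, which may circumvent intractable time complexities. Our first-order algorithm is derived using a novel quantum generalization of the classical mirror descent duality. Both results require a special choice of metric, namely the Bogoliubov-Kubo-Mori metric. To test our proposed algorithms numerically, we compare their performance to existing baselines on the task of quantum Gibbs sampling for the transverse-field Ising model. Finally, we propose an initialization strategy leveraging geometric locality for modelling sequences of states, such as those arising from quantum-stochastic processes. We demonstrate its effectiveness empirically for both real and imaginary time evolution, while defining a broader class of potential applications.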

Nonconvex Matrix Factorization is Geodesically Convex: Global Landscape Analysis for Fixed-rank Matrix Optimization From a Riemannian Perspective

Yuetian Luo , Nicolas Garcia Trillos

Category: Machine Learning

2022-09-29

We study a general matrix optimization problem with a fixed-rank positive semidefinite (PSD) constraint. We perform the Burer-Monteiro factorization and consider a particular Riemannian quotient geometry in a search space that has a total space equipped with the Euclidean metric. When the original objective f satisfies standard restricted strong convexity and smoothness properties, we characterize the global landscape of the factorized objective under the Riemannian quotient geometry. We show the entire search space can be divided into three regions: (R1) the region near the target parameter of interest, where the factorized objective is geodesically strongly convex and smooth; (R2) the region containing neighborhoods of all strict saddle points; (R3) the remaining regions, where the factorized objective has a large gradient. To our best knowledge, this is the first global landscape analysis of the Burer-Monteiro factorized objective under the Riemannian quotient geometry. Our results provide a fully geometric explanation for the superior performance of vanilla gradient descent under the Burer-Monteiro factorization. When f satisfies a weaker restricted strict convexity property, we show there exists a neighborhood near local minimizers such that the factorized objective is geodesically convex. To prove our results we provide a comprehensive landscape analysis of a matrix factorization problem with a least squares objective, which serves as a critical bridge. Our conclusions are also based on a result of independent interest stating that the geodesic ball centered at Y with a radius 1/3 of the least singular value of Y is a geodesically convex set under the Riemannian quotient geometry, which as a corollary, also implies a quantitative bound of the convexity radius in the Bures-Wasserstein space. The convexity radius obtained is sharp up to constants.

Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems

Chris Junchi Li , Michael I. Jordan

Category: Machine Learning (Statistics) | Machine Learning

2021-12-29

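Motivated by the problem of online canonical correlation analysis, we propose the \emph{Stochastic Scaled-Gradient Descent} (SSGD) algorithm for minimizing the expectation of a stochastic function over a generic Riemannian manifold. SSGD generalizes the idea of projected stochastic gradient descent and allows the use of scaled stochastic gradients instead of stochastic gradients. In the special case of a spherical constraint, which arises in generalized eigenvector problems, we establish a nonasymptotic finite-sample bound of $\sqrt{1/T}$, and show that this rate is minimax optimal, up to a polylogarithmic factor of relevant parameters. On the asymptotic side, a novel trajectory-averaging argument allows us to achieve local asymptotic normality with a rate that matches that of Ruppert-Polyak-Juditsky averaging. We bring these ideas together in an application to online canonical correlation analysis, deriving, for the first time in the literature, an optimal one-time-scale algorithm with an explicit rate of local asymptotic convergence to normality. Numerical studies of canonical correlation analysis are also provided for synthetic data.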

Projection Robust Wasserstein Distance and Riemannian Optimization

Tianyi Lin , Chenyou Fan , Nhat Ho , Marco Cuturi , Michael I. Jordan

Category: Machine Learning | Machine Learning (Statistics)

2020-06-12

Projection robust Wasserstein (PRW) distance, or Wasserstein projection pursuit (WPP), is a robust variant of the Wasserstein distance. Recent work suggests that this quantity is more robust than the standard Wasserstein distance, in particular when comparing probability measures in high dimensions. However, it is ruled out for practical application because the optimization model is essentially non-convex and non-smooth, which makes the computation intractable. Our contribution in this paper is to revisit the original motivation behind WPP/PRW, but take the hard route of showing that, despite its non-convexity and lack of smoothness, and even despite some hardness results proved by~\citet{Niles-2019-Estimation} in a minimax sense, the original formulation for PRW/WPP \textit{can} be efficiently computed in practice using Riemannian optimization, yielding in relevant cases better behavior than its convex relaxation. More specifically, we provide three simple algorithms with solid theoretical guarantees on their complexity bounds (one in the appendix), and demonstrate their effectiveness and efficiency by conducting extensive experiments on synthetic and real data. This paper provides a first step into a computational theory of the PRW distance and provides the links between optimal transport and Riemannian optimization.

Deep learning of diffeomorphisms for optimal reparametrizations of shapes

Elena Celledoni , Helge Glöckner , Jørgen Riseth , Alexander Schmeding

Category: Machine Learning

2022-07-22

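In shape analysis, one of the fundamental problems is to align curves or surfaces before computing a (geodesic) distance between these shapes. Finding the optimal reparametrization that realizes this alignment is a computationally demanding task, which leads to an optimization problem on the diffeomorphism group. In this paper, we construct approximations of orientation-preserving diffeomorphisms by composing elementary diffeomorphisms in order to solve the approximation problem. We propose a practical algorithm, implemented in PyTorch, which is applicable to both unparametrized curves and surfaces. We derive universal approximation results and obtain bounds for the Lipschitz constant of the obtained compositions of diffeomorphisms.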

Multi-level Geometric Optimization for Regularised Constrained Linear Inverse Problems

Sebastian Müller , Stefania Petra , Matthias Zisler

Category: Computer Vision

2022-07-11

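We present a geometric multi-level optimization approach that smoothly incorporates box constraints. Given a box-constrained optimization problem, we consider a hierarchy of models with varying discretization levels. Finer models are accurate but expensive to compute, while coarser models are less accurate but cheaper to compute. When working at the fine level, multi-level optimization computes the search direction based on a coarser model, which speeds up updates at the fine level. Moreover, exploiting the geometry induced by the hierarchy preserves the feasibility of the updates. In particular, our approach extends classical components of multigrid methods, such as restriction and prolongation, to the Riemannian structure of our constraints.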

On Constraints in First-Order Optimization: A View from Non-Smooth Dynamical Systems

Michael Muehlebach , Michael I. Jordan

Category: Machine Learning

2021-07-17

We introduce a class of first-order methods for smooth constrained optimization that are based on an analogy to non-smooth dynamical systems. Two distinctive features of our approach are that (i) projections or optimizations over the entire feasible set are avoided, in stark contrast to projected gradient methods or the Frank-Wolfe method, and (ii) iterates are allowed to become infeasible, which differs from active set or feasible direction methods, where the descent motion stops as soon as a new constraint is encountered. The resulting algorithmic procedure is simple to implement even when constraints are nonlinear, and is suitable for large-scale constrained optimization problems in which the feasible set fails to have a simple structure. The key underlying idea is that constraints are expressed in terms of velocities instead of positions, which has the algorithmic consequence that optimizations over feasible sets at each iteration are replaced with optimizations over local, sparse convex approximations. In particular, this means that at each iteration only constraints that are violated are taken into account. The result is a simplified suite of algorithms and an expanded range of possible applications in machine learning.
