Smart Paper Notes

Variational Wasserstein Barycenters with c-Cyclical Monotonicity

Jinjin Chi, Zhiyao Yang, Jihong Ouyang, Ximing Li

Categories: Machine Learning | (Statistics) Machine Learning

2021-10-22

Wasserstein barycenter, built on the theory of optimal transport, provides a powerful framework to aggregate probability distributions, and it has increasingly attracted great attention within the machine learning community. However, it suffers from severe computational burden, especially for high dimensional and continuous settings. To this end, we develop a novel continuous approximation method for the Wasserstein barycenters problem given sample access to the input distributions. The basic idea is to introduce a variational distribution as the approximation of the true continuous barycenter, so as to frame the barycenters computation problem as an optimization problem, where parameters of the variational distribution adjust the proxy distribution to be similar to the barycenter. Leveraging the variational distribution, we construct a tractable dual formulation for the regularized Wasserstein barycenter problem with c-cyclical monotonicity, which can be efficiently solved by stochastic optimization. We provide theoretical analysis on convergence and demonstrate the practical effectiveness of our method on real applications of subset posterior aggregation and synthetic data.

translated by Google Translate

Scalable Computations of Wasserstein Barycenter via Input Convex Neural Networks

Jiaojiao Fan, Amirhossein Taghvaei, Yongxin Chen

Categories: Machine Learning | (Statistics) Machine Learning

2020-07-08

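The Wasserstein barycenter is a principled approach to representing the weighted mean of a given set of probability distributions, using the geometry induced by optimal transport. In this work, we present a novel scalable algorithm to approximate Wasserstein barycenters, aimed at high-dimensional applications in machine learning. Our proposed algorithm is based on the Kantorovich dual formulation of the Wasserstein-2 distance and on a recent neural network architecture, the input convex neural network, which is known to parametrize convex functions. The distinguishing features of our method are: i) it only requires samples from the marginal distributions; ii) unlike existing approaches, it represents the barycenter with a generative model and can thus generate infinitely many samples from the barycenter without querying the marginal distributions; iii) it works similarly to a generative adversarial model in the one-marginal case. We demonstrate the efficacy of our algorithm by comparing it with state-of-the-art methods in multiple experiments.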
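As a point of reference for what a Wasserstein barycenter is, here is a minimal sketch of the one-dimensional special case, where the barycenter can be computed exactly from samples. This is not the neural-network method of the paper above. It only relies on the standard fact that, in 1D, the quantile function of the Wasserstein-2 barycenter is the weighted average of the input quantile functions. The function name `barycenter_1d` and the Gaussian test data are illustrative assumptions.

```python
import numpy as np

def barycenter_1d(samples, weights):
    """Samples from the W2 barycenter of 1D empirical distributions.

    Assumes every input sample set has the same number of points.
    Sorting each set gives its empirical quantiles; averaging the
    order statistics with the given weights averages the quantile
    functions, which characterizes the 1D W2 barycenter.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    sorted_samples = np.stack([np.sort(np.asarray(s, dtype=float)) for s in samples])
    return weights @ sorted_samples

# Illustrative data (an assumption): two unit-variance Gaussians.
rng = np.random.default_rng(0)
a = rng.normal(-2.0, 1.0, size=5000)
b = rng.normal(4.0, 1.0, size=5000)
bary = barycenter_1d([a, b], weights=[0.5, 0.5])
print(bary.mean(), bary.std())
```

For two Gaussians with equal variance, the barycenter with equal weights is the Gaussian centered halfway between them, so the printed mean and standard deviation should be close to 1 and 1. In higher dimensions no such sorting trick exists, which is the setting the paper targets.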


Scalable Computation of Monge Maps with General Costs

Jiaojiao Fan, Shu Liu, Shaojun Ma, Yongxin Chen, Haomin Zhou

Categories: Machine Learning

2021-06-07

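A Monge map is the optimal transport map between two probability distributions and provides a principled way to transform one distribution into another. Despite the rapid development of numerical methods for optimal transport problems, computing Monge maps remains challenging, especially for high-dimensional problems. In this paper, we present a scalable algorithm for computing the Monge map between two probability distributions. Our algorithm is based on a weak form of the optimal transport problem, so it only requires samples from the marginals rather than their analytic expressions, and it can accommodate optimal transport between two distributions of different dimensions. Our algorithm is suitable for general cost functions, in contrast to other existing methods for estimating Monge maps from samples, which are usually designed for quadratic costs. The performance of our algorithm is demonstrated through a series of experiments with both synthetic and realistic data.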


Variational Wasserstein gradient flow

Jiaojiao Fan, Amirhossein Taghvaei, Yongxin Chen

Categories: Machine Learning

2021-12-04

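The gradient flow of a function over the space of probability densities with respect to the Wasserstein metric often exhibits nice properties and has been used in several machine learning applications. The standard approach to computing the Wasserstein gradient flow is the finite difference method, which discretizes the underlying space over a grid and is not scalable. In this work, we propose a scalable proximal-gradient-type algorithm for the Wasserstein gradient flow. The key to our method is a variational formulation of the objective function, which makes it possible to realize the JKO proximal map through a primal-dual optimization. This primal-dual problem can be solved efficiently by alternately updating the parameters in the inner and outer loops. Our framework covers all the classical Wasserstein gradient flows, including the heat equation and the porous medium equation. We demonstrate the performance and scalability of our algorithm with several numerical examples.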
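For readers unfamiliar with the JKO proximal map mentioned in this abstract, its standard textbook form is sketched below. The symbols are notation introduced here, not taken from the abstract: $\tau > 0$ is the step size, $F$ is the functional being minimized, and $\rho_k$ is the density at step $k$.

$$\rho_{k+1} = \operatorname*{arg\,min}_{\rho} \; \frac{1}{2\tau} W_2^2(\rho, \rho_k) + F(\rho)$$

As a classical example, taking $F(\rho) = \int \rho \log \rho \, dx$ and letting $\tau \to 0$ recovers the heat equation $\partial_t \rho = \Delta \rho$, one of the flows the abstract says the framework covers.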


Distributionally robust risk evaluation with causality constraint and structural information

Bingyan Han

Categories: (Statistics) Machine Learning

2022-03-20

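This work studies distributionally robust evaluation of expected function values over temporal data. The set of alternative measures is characterized by causal optimal transport. We prove strong duality and recast the causality constraint as a minimization over an infinite-dimensional space of test functions. We approximate the test functions with neural networks and prove sample complexity bounds using Rademacher complexity. Moreover, when structural information is available to further restrict the ambiguity set, we prove the dual formulation and provide efficient optimization methods. Empirical analysis on realized volatility and stock indices demonstrates that our framework offers an attractive alternative to the classic optimal transport formulation.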


Sinkhorn Distributionally Robust Optimization

Jie Wang, Rui Gao, Yao Xie

Categories: Machine Learning | (Statistics) Machine Learning

2021-09-24

We study distributionally robust optimization (DRO) with the Sinkhorn distance, a variant of the Wasserstein distance based on entropic regularization. We derive a convex programming dual reformulation for a general nominal distribution. Compared with Wasserstein DRO, it is computationally tractable for a larger class of loss functions, and its worst-case distribution is more reasonable. We propose an efficient first-order algorithm with bisection search to solve the dual reformulation. We demonstrate that our proposed algorithm finds a $\delta$-optimal solution of the new DRO formulation with computation cost $\tilde{O}(\delta^{-3})$ and memory cost $\tilde{O}(\delta^{-2})$, and the computation cost further improves to $\tilde{O}(\delta^{-2})$ when the loss function is smooth. Finally, we provide various numerical examples using both synthetic and real data to demonstrate its competitive performance and low computational cost.


Stochastic Saddle-Point Optimization for Wasserstein Barycenters

Daniil Tiapkin, Alexander Gasnikov, Pavel Dvurechensky

Categories: Machine Learning | (Statistics) Machine Learning

2020-06-11

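We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data. This leads to a complicated stochastic optimization problem in which the objective is the expectation of a function that is itself given as the solution to a random optimization problem. We exploit the structure of the problem and obtain a convex-concave stochastic saddle-point reformulation. In the setting where the distribution of the random probability measures is discrete, we propose a stochastic optimization algorithm and estimate its complexity. A second result, based on kernel methods, extends the first to arbitrary distributions of random probability measures. Moreover, in many cases this new algorithm has a better total complexity than the stochastic approximation approach combined with the Sinkhorn algorithm. We also illustrate our developments with a series of numerical experiments.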


Penalized Langevin and Hamiltonian Monte Carlo Algorithms for Constrained Sampling

Mert Gürbüzbalaban, Yuanhan Hu, Lingjiong Zhu

Categories: (Statistics) Machine Learning | Machine Learning

2022-11-29

We consider the constrained sampling problem where the goal is to sample from a distribution $\pi(x)\propto e^{-f(x)}$ and $x$ is constrained on a convex body $\mathcal{C}\subset \mathbb{R}^d$. Motivated by penalty methods from optimization, we propose penalized Langevin Dynamics (PLD) and penalized Hamiltonian Monte Carlo (PHMC) that convert the constrained sampling problem into an unconstrained one by introducing a penalty function for constraint violations. When $f$ is smooth and the gradient is available, we show $\tilde{\mathcal{O}}(d/\varepsilon^{10})$ iteration complexity for PLD to sample the target up to an $\varepsilon$-error where the error is measured in terms of the total variation distance and $\tilde{\mathcal{O}}(\cdot)$ hides some logarithmic factors. For PHMC, we improve this result to $\tilde{\mathcal{O}}(\sqrt{d}/\varepsilon^{7})$ when the Hessian of $f$ is Lipschitz and the boundary of $\mathcal{C}$ is sufficiently smooth. To our knowledge, these are the first convergence rate results for Hamiltonian Monte Carlo methods in the constrained sampling setting that can handle non-convex $f$ and can provide guarantees with the best dimension dependency among existing methods with deterministic gradients. We then consider the setting where unbiased stochastic gradients are available. We propose PSGLD and PSGHMC that can handle stochastic gradients without Metropolis-Hastings correction steps. When $f$ is strongly convex and smooth, we obtain an iteration complexity of $\tilde{\mathcal{O}}(d/\varepsilon^{18})$ and $\tilde{\mathcal{O}}(d\sqrt{d}/\varepsilon^{39})$ respectively in the 2-Wasserstein distance. For the more general case, when $f$ is smooth and non-convex, we also provide finite-time performance bounds and iteration complexity results. Finally, we test our algorithms on Bayesian LASSO regression and Bayesian constrained deep learning problems.


Rethinking Initialization of the Sinkhorn Algorithm

James Thornton, Marco Cuturi

Categories: (Statistics) Machine Learning | Machine Learning

2022-06-15

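Computing an optimal transport (OT) coupling between distributions plays an increasingly important role in machine learning. While OT problems can be solved as linear programs, adding an entropic smoothing term is known to yield solvers that are faster, more robust to outliers, differentiable, and easier to parallelize. The Sinkhorn fixed-point algorithm is the cornerstone of these approaches and, as a result, multiple attempts have been made to shorten its runtime using, for instance, annealing, momentum, or acceleration. The premise of this paper is that \textit{initialization} of the Sinkhorn algorithm has received comparatively little attention, possibly because of two preconceptions: first, since the regularized OT problem is convex, it may not be worth crafting a tailored initialization, as \textit{any} initialization is guaranteed to work; second, because the Sinkhorn algorithm is often differentiated in end-to-end pipelines, data-dependent initializations could bias gradient estimates obtained by unrolling iterations. We challenge this conventional wisdom and show that carefully chosen initializations can result in dramatic speed-ups and do not bias gradients that are computed with implicit differentiation. We detail how initializations can be recovered from closed-form or approximate OT solutions, using known results in the 1D or Gaussian settings. We show empirically that these initializations can be used off the shelf, with little to no tuning, and yield consistent speed-ups for a variety of OT problems.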
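To make the role of initialization concrete, the following is a minimal NumPy sketch of the Sinkhorn fixed-point iteration with an optional starting vector. It is not the initialization scheme proposed in the paper: the warm start here simply reuses the scaling vector from a nearby problem, and the names `sinkhorn`, `u_init`, and `bump`, as well as the grid, the histograms, and `eps=0.1`, are all illustrative assumptions.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, u_init=None, tol=1e-6, max_iter=10_000):
    """Entropic OT between histograms a and b with cost matrix C.

    u_init is an optional starting scaling vector (default: all ones).
    Returns the coupling P, the scalings (u, v), and the iteration count.
    """
    K = np.exp(-C / eps)                       # Gibbs kernel
    u = np.ones_like(a) if u_init is None else np.array(u_init, dtype=float)
    for n_iter in range(1, max_iter + 1):
        v = b / (K.T @ u)                      # rescale to match column marginals
        u = a / (K @ v)                        # rescale to match row marginals
        col_err = np.abs(v * (K.T @ u) - b).sum()
        if col_err < tol:                      # stop once columns also match
            break
    P = u[:, None] * K * v[None, :]
    return P, u, v, n_iter

# Illustrative problem (an assumption): histograms on a 1D grid, squared cost.
x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2

def bump(center):
    """Normalized Gaussian-shaped histogram with a small floor."""
    h = np.exp(-((x - center) ** 2) / (2 * 0.1 ** 2)) + 1e-3
    return h / h.sum()

a, b, b_near = bump(0.3), bump(0.7), bump(0.72)

_, u_ref, _, _ = sinkhorn(a, b, C)                       # solve a nearby problem
P_cold, _, _, iters_cold = sinkhorn(a, b_near, C)        # default all-ones start
P_warm, _, _, iters_warm = sinkhorn(a, b_near, C, u_init=u_ref)
print(iters_cold, iters_warm)
```

Both runs reach a coupling with the requested marginals, which reflects the convergence guarantee the abstract mentions; the printed iteration counts show how much the starting vector matters on this toy problem.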


Understanding Entropic Regularization in GANs

Daria Reshetova, Yikun Bai, Xiugang Wu, Ayfer Ozgur

Categories: Machine Learning | (Statistics) Machine Learning

2021-11-02

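Generative adversarial networks are a popular method for learning distributions from data by modeling the target distribution as a function of a known distribution. The function, often referred to as the generator, is optimized to minimize a chosen distance measure between the generated and target distributions. One commonly used measure for this purpose is the Wasserstein distance. However, the Wasserstein distance is hard to compute and optimize, and in practice entropic regularization techniques are used to improve numerical convergence. The influence of regularization on the learned solution, however, remains poorly understood. In this paper, we study how several popular entropic regularizations of the Wasserstein distance affect the solution in a simple benchmark setting where the generator is linear and the target distribution is a high-dimensional Gaussian. We show that entropic regularization promotes sparsification of the solution, while replacing the Wasserstein distance with the Sinkhorn divergence recovers the unregularized solution. Both regularization techniques remove the curse of dimensionality suffered by the Wasserstein distance. We show that the optimal generator can be learned to accuracy $\epsilon$ with $O(1/\epsilon^2)$ samples from the target distribution. We thus conclude that these regularization techniques can improve the quality of the generator learned from empirical data for a large class of distributions.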


Sliced Wasserstein Variational Inference

Mingxuan Yi, Song Liu

Categories: (Statistics) Machine Learning | Machine Learning

2022-07-26

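Variational inference approximates an unnormalized distribution by minimizing the Kullback-Leibler (KL) divergence. Although this divergence is efficient to compute and has been widely used in applications, it has some undesirable properties. For example, it is not a proper metric: it is non-symmetric and does not satisfy the triangle inequality. On the other hand, optimal transport distances have recently shown some advantages over the KL divergence. Building on these advantages, we propose a new variational inference method that minimizes the sliced Wasserstein distance, a valid metric arising from optimal transport. This sliced Wasserstein distance can be approximated simply by running MCMC, without solving any optimization problem. Our approximation also does not require a tractable density function for the variational distribution, so the approximating family can be amortized by generators such as neural networks. Furthermore, we provide an analysis of the theoretical properties of our method. Experiments on synthetic and real data illustrate the performance of the proposed method.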


Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections

Kimia Nadjahi, Alain Durmus, Pierre E. Jacob, Roland Badeau, Umut Şimşekli

Categories: (Statistics) Machine Learning | Machine Learning

2021-06-29

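The Sliced-Wasserstein distance (SW) is increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits. Since it is defined as an expectation over random projections, SW is commonly approximated by Monte Carlo. We adopt a new perspective to approximate SW by making use of the concentration of measure phenomenon: under mild assumptions, one-dimensional projections of a high-dimensional random vector are approximately Gaussian. Based on this observation, we develop a simple deterministic approximation for SW. Our method does not require sampling many random projections and is therefore both accurate and easy to use compared with the usual Monte Carlo approximation. We derive non-asymptotic guarantees for our approach and show that the approximation error goes to zero as the dimension increases, under a weak dependence condition on the data distribution. We validate our theoretical findings on synthetic datasets and illustrate the proposed approximation on a generative modeling problem.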
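For context, the usual Monte Carlo approximation that this abstract contrasts itself with can be sketched as follows. This is the baseline estimator, not the deterministic approximation proposed in the paper. The sketch assumes two sample sets of equal size, so that each one-dimensional Wasserstein-2 distance reduces to comparing sorted projections; the function name `sliced_wasserstein2` and the test data are illustrative assumptions.

```python
import numpy as np

def sliced_wasserstein2(x, y, n_projections=200, seed=0):
    """Monte Carlo estimate of SW_2 between samples x and y of shape (n, d).

    Assumes x and y contain the same number of points.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Random directions drawn uniformly on the unit sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project, then sort each projection: in 1D, the optimal coupling
    # between equal-size samples pairs the sorted points.
    x_proj = np.sort(x @ theta.T, axis=0)
    y_proj = np.sort(y @ theta.T, axis=0)
    # Average squared 1D distances over points and directions, then take the root.
    return np.sqrt(np.mean((x_proj - y_proj) ** 2))

# Illustrative data (an assumption): a Gaussian cloud and a shifted copy.
rng = np.random.default_rng(1)
x = rng.normal(size=(1000, 5))
shift = np.array([3.0, 0.0, 0.0, 0.0, 0.0])
y = x + shift
print(sliced_wasserstein2(x, x), sliced_wasserstein2(x, y))
```

For a pure translation by a vector $t$ in dimension $d$, the exact value is $\|t\|/\sqrt{d}$, so the second printed number should be near $3/\sqrt{5}$, with some Monte Carlo error from the finite number of projections.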


Projection Robust Wasserstein Distance and Riemannian Optimization

Tianyi Lin, Chenyou Fan, Nhat Ho, Marco Cuturi, Michael I. Jordan

Categories: Machine Learning | (Statistics) Machine Learning

2020-06-12

Projection robust Wasserstein (PRW) distance, or Wasserstein projection pursuit (WPP), is a robust variant of the Wasserstein distance. Recent work suggests that this quantity is more robust than the standard Wasserstein distance, in particular when comparing probability measures in high dimensions. However, it is ruled out for practical application because the optimization model is essentially non-convex and non-smooth, which makes the computation intractable. Our contribution in this paper is to revisit the original motivation behind WPP/PRW, but take the hard route of showing that, despite its non-convexity and non-smoothness, and even despite some hardness results proved by~\citet{Niles-2019-Estimation} in a minimax sense, the original formulation for PRW/WPP \textit{can} be efficiently computed in practice using Riemannian optimization, yielding in relevant cases better behavior than its convex relaxation. More specifically, we provide three simple algorithms with solid theoretical guarantees on their complexity bounds (one in the appendix), and demonstrate their effectiveness and efficiency by conducting extensive experiments on synthetic and real data. This paper provides a first step toward a computational theory of the PRW distance and links optimal transport with Riemannian optimization.


Near-optimal estimation of smooth transport maps with kernel sums-of-squares

Boris Muzellec, Adrien Vacher, Francis Bach, François-Xavier Vialard, Alessandro Rudi

Categories: (Statistics) Machine Learning | Machine Learning

2021-12-03

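It was recently shown that, under smoothness conditions, the squared Wasserstein distance between two distributions can be computed efficiently with appealing upper bounds on the statistical error. However, for applications such as generative modeling, the object of interest is the underlying optimal transport map rather than the distance itself. Hence, computational and statistical guarantees need to be obtained for the estimated maps themselves. In this paper, we propose the first tractable algorithm for which the statistical $L^2$ error on the maps nearly matches the existing minimax lower bounds for smooth map estimation. Our method is based on solving the semi-dual formulation of optimal transport with an infinite-dimensional sum-of-squares reformulation, and it leads to an algorithm with dimension-free polynomial rates in the number of samples, with constants that are potentially exponential in the dimension.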


Bayesian Learning with Wasserstein Barycenters

Julio Backhoff-Veraguas, Joaquin Fontbona, Gonzalo Rios, Felipe Tobar

Categories: (Statistics) Machine Learning | Machine Learning

2018-05-28

We introduce and study a novel model-selection strategy for Bayesian learning, based on optimal transport, along with its associated predictive posterior law: the Wasserstein population barycenter of the posterior law over models. We first show how this estimator, termed Bayesian Wasserstein barycenter (BWB), arises naturally in a general, parameter-free Bayesian model-selection framework, when the considered Bayesian risk is the Wasserstein distance. Examples are given, illustrating how the BWB extends some classic parametric and non-parametric selection strategies. Furthermore, we also provide explicit conditions granting the existence and statistical consistency of the BWB, and discuss some of its general and specific properties, providing insights into its advantages compared to usual choices, such as the model average estimator. Finally, we illustrate how this estimator can be computed using the stochastic gradient descent (SGD) algorithm in Wasserstein space introduced in a companion paper arXiv:2201.04232v2 [math.OC], and provide a numerical example for experimental validation of the proposed method.


A deep learning framework for geodesics under spherical Wasserstein-Fisher-Rao metric and its application for weighted sample generation

Yang Jing, Jiaheng Chen, Lei Li, Jianfeng Lu

Categories: Machine Learning

2022-08-25

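The Wasserstein-Fisher-Rao (WFR) distance is a family of metrics for gauging the discrepancy between two Radon measures that takes into account both transportation and weight change. The spherical WFR distance is a projected version of the WFR distance for probability measures, so that the space of Radon measures equipped with WFR can be viewed as a metric cone over the space of probability measures with spherical WFR. Compared with the Wasserstein distance, geodesics under the spherical WFR are less well understood and remain an ongoing research focus. In this paper, we develop a deep learning framework to compute geodesics under the spherical WFR metric, and the learned geodesics can be used to generate weighted samples. Our approach is based on a Benamou-Brenier type dynamic formulation for spherical WFR. To overcome the difficulty of enforcing the boundary constraint introduced by the weight change, a Kullback-Leibler (KL) divergence term based on the inverse map is added to the cost function. Moreover, a new regularization term using the particle velocity is introduced as a substitute for the Hamilton-Jacobi equation for the potential in the dynamic formulation. When used for sample generation, our framework can be beneficial for applications with given weighted samples, compared with sample generation using previous flow models.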


Efficient estimates of optimal transport via low-dimensional embeddings

Patric M. Fulop, Vincent Danos

Categories: Machine Learning | Artificial Intelligence | (Statistics) Machine Learning

2021-11-08

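Optimal transport (OT) distances have been widely used in recent machine learning work as a way to compare probability distributions. They are costly to compute when the data lives in high dimensions. Recent work by Paty et al., 2019, aims specifically at reducing this cost by computing OT using low-rank projections of the data (seen as discrete measures). We extend this approach and show that OT distances can be approximated by using more general families of maps. The best estimate is obtained by maximizing OT over the given family. Because the OT calculations are done after mapping the data to a lower-dimensional space, our method scales well with the original data dimension. We demonstrate the idea with neural networks.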


GeONet: a neural operator for learning the Wasserstein geodesic

Andrew Gracyk, Xiaohui Chen

Categories: Machine Learning | Artificial Intelligence | Computer Vision | (Statistics) Machine Learning

2022-09-28

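Optimal transport (OT) offers a versatile framework for comparing complex data distributions in a geometrically meaningful way. Traditional methods for computing the Wasserstein distance and geodesic between probability measures require mesh-dependent domain discretization and suffer from the curse of dimensionality. We present GeONet, a mesh-invariant deep neural operator network that learns the non-linear mapping from an input pair of initial and terminal distributions to the Wasserstein geodesic connecting the two endpoint distributions. In the offline training stage, GeONet learns the saddle-point optimality conditions for the dynamic formulation of the OT problem in the primal and dual spaces, which are characterized by a coupled PDE system. The subsequent inference stage is instantaneous and can be deployed for real-time predictions in the online learning setting. We demonstrate that GeONet achieves testing accuracy comparable to standard OT solvers on a simulation example and the CIFAR-10 dataset, with a greatly reduced inference-stage computational cost.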


Spherical Sliced-Wasserstein

Clément Bonet, Paul Berg, Nicolas Courty, François Septier, Lucas Drumetz, Minh-Tan Pham

Categories: (Statistics) Machine Learning | Machine Learning

2022-06-17

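Many variants of the Wasserstein distance have been introduced to reduce its original computational burden. One in particular is the Sliced-Wasserstein distance (SW), which leverages one-dimensional projections for which a closed-form solution of the Wasserstein distance is available. Yet it is restricted to data living in Euclidean spaces, while the Wasserstein distance has been studied and, more recently, used on manifolds. We focus specifically on the sphere, for which we define a novel SW discrepancy that we call spherical Sliced-Wasserstein, making a first step toward defining SW discrepancies on manifolds. Our construction is notably based on closed-form solutions of the Wasserstein distance on the circle, together with a new spherical Radon transform. Along with efficient algorithms and the corresponding implementations, we illustrate its properties in several machine learning use cases where spherical representations of data are at stake: density estimation on the sphere, variational inference, and hyperspherical auto-encoders.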


Generative Adversarial Learning of Sinkhorn Algorithm Initializations

Jonathan Geuter, Vaios Laschos

Categories: Machine Learning | (Statistics) Machine Learning

2022-11-30

The Sinkhorn algorithm (arXiv:1306.0895) is the state-of-the-art method for computing approximations of optimal transport distances between discrete probability distributions, making use of an entropically regularized formulation of the problem. The algorithm is guaranteed to converge, no matter its initialization. This has led to little attention being paid to initializing it, and simple starting vectors like the n-dimensional one-vector are common choices. We train a neural network to compute initializations for the algorithm, which significantly outperform standard initializations. The network predicts a potential of the optimal transport dual problem, where training is conducted in an adversarial fashion using a second, generating network. The network is universal in the sense that it is able to generalize to any pair of distributions of fixed dimension. Furthermore, we show that for certain applications the network can be used independently.
