A normalizing flow (NF) is a mapping that transforms a chosen probability distribution to a normal distribution. Such flows are a common technique used for data generation and density estimation in machine learning and data science. The density estimate obtained with a NF requires a change of variables formula that involves the computation of the Jacobian determinant of the NF transformation. In order to tractably compute this determinant, continuous normalizing flows (CNF) estimate the mapping and its Jacobian determinant using a neural ODE. Optimal transport (OT) theory has been successfully used to assist in finding CNFs by formulating them as OT problems with a soft penalty for enforcing the standard normal distribution as a target measure. A drawback of OT-based CNFs is the addition of a hyperparameter, $\alpha$, that controls the strength of the soft penalty and requires significant tuning. We present JKO-Flow, an algorithm to solve OT-based CNF without the need of tuning $\alpha$. This is achieved by integrating the OT CNF framework into a Wasserstein gradient flow framework, also known as the JKO scheme. Instead of tuning $\alpha$, we repeatedly solve the optimization problem for a fixed $\alpha$ effectively performing a JKO update with a time-step $\alpha$. Hence we obtain a "divide and conquer" algorithm by repeatedly solving simpler problems instead of solving a potentially harder problem with large $\alpha$.
translated by 谷歌翻译
传统的数据湖泊通过启用时间旅行,运行SQL查询,使用酸性交易摄入数据以及可视化PBABYTE尺度数据集在云存储中,为分析工作负载提供了关键的数据基础架构。它们使组织能够分解数据孤岛,解锁数据驱动的决策,提高运营效率并降低成本。但是,随着深度学习接管常见的分析工作流程,传统数据湖泊对诸如自然语言处理(NLP),音频处理,计算机视觉和涉及非尾巴数据集的应用程序的有用程度降低。本文介绍了Deep Lake,这是一个开源湖泊,用于在Activeloop开发的深度学习应用程序。 Deep Lake保持了一项关键区别的香草数据湖的好处:它以张量的形式存储复杂数据,例如图像,视频,注释以及表格数据,并将数据迅速流式传输到网络上(a )张量查询语言,(b)浏览器可视化引擎或(c)不牺牲GPU利用率的深度学习框架。可以从Pytorch,Tensorflow,Jax,与许多MLOPS工具集成在一起的数据集。
translated by 谷歌翻译