基于能量的模型(EBMS)允许极其灵活的概率分布规范。然而,它们不提供从这些分布中获得精确样本的机制。蒙特卡罗技术可以帮助我们获得样品,如果我们可以轻易采用可用的一些建议分布。例如,抑制采样可以提供精确的样本,但由于需要找到上限目标分布的提案分布,通常难以或不可能应用。大致马克洛夫链Monte Carlo采样技术通常更容易设计,利用在不断发展的样本上执行本地编辑的本地提案分布。然而,由于提案分布的本地性质,这些技术可能效率低下,并且不提供对样品质量的估计。在这项工作中,我们提出了一种新的近似采样技术,准拒绝采样(QRS),允许采样效率和采样质量之间进行权衡,同时提供显式收敛界限和诊断。 QRS大写从深度学习模型获得的高质量全球提案分布的可用性。我们展示了QRS采样对具有分布约束和解释生成的受控文本生成任务的分离EBMS对文本的有效性。我们表明,我们可以以采样效率的成本,从这些eBMS采样。
translated by 谷歌翻译
Generative AI has matured to a point where large-scale models can generate text that seems indistinguishable from human-written text and remarkably photorealistic images. Automatically measuring how close the distribution of generated data is to the target real data distribution is a key step in diagnosing existing models and developing better models. We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images. These scores are statistical summaries of divergence frontiers capturing two types of errors in generative modeling. We explore four approaches to statistically estimate these scores: vector quantization, non-parametric estimation, classifier-based estimation, and parametric Gaussian approximations. We provide statistical bounds for the vector quantization approach. Empirically, we find that the proposed scores paired with a range of $f$-divergences and statistical estimation methods can quantify the gaps between the distributions of human-written text and those of modern neural language models by correlating with human judgments and identifying known properties of the generated texts. We conclude the paper by demonstrating its applications to other AI domains and discussing practical recommendations.
translated by 谷歌翻译
我们提出了离散的Langevin提案(DLP),这是一种简单且可扩展的基于梯度的建议,用于对复杂的高维离散分布进行采样。与基于Gibbs采样的方法相反,DLP能够单个步骤并行更新所有坐标,并且更改的幅度由步骤尺寸控制。这允许在高维且密切相关的变量的空间中进行廉价,有效的探索。我们通过证明其固定分布的渐近偏置对于对数季度分布而言是零,并且对于接近对数季度的分布而言,我们证明了DLP的效率为零。使用DLP,我们开发了几种采样算法的变体,包括未经调整的,大都市调整后的,随机和预处理版本。DLP在各种任务上都优于许多受欢迎的替代方案,包括ISING模型,受限的Boltzmann机器,基于深层的基于能量的模型,二进制神经网络和语言生成。
translated by 谷歌翻译
我们介绍了本地自动平衡采样器(LSB),这是一种本地马尔可夫链蒙特卡洛(MCMC)方法,用于在纯离散域中采样,该方法能够自主适应目标分布并减少收敛所需的目标评估数量。LSB基于(i)局部平衡建议的参数化,(ii)基于相互信息的新提出的目标函数和(iii)自平衡学习过程,该过程最大程度地降低了提议的目标以更新提案参数。基于能量的模型和马尔可夫网络的实验表明,与最近的本地MCMC采样器相比,LSB使用较少数量的Oracle分布收敛。
translated by 谷歌翻译
Large pretrained language models generate fluent text but are notoriously hard to controllably sample from. In this work, we study constrained sampling from such language models: generating text that satisfies user-defined constraints, while maintaining fluency and the model's performance in a downstream task. We propose MuCoLa -- a sampling procedure that combines the log-likelihood of the language model with arbitrary (differentiable) constraints in a single energy function, and then generates samples in a non-autoregressive manner. Specifically, it initializes the entire output sequence with noise and follows a Markov chain defined by Langevin Dynamics using the gradients of the energy function. We evaluate MuCoLa on text generation with soft and hard constraints as well as their combinations obtaining significant improvements over competitive baselines for toxicity avoidance, sentiment control, and keyword-guided generation.
translated by 谷歌翻译
由于在开放式文本生成中取得了重大进展,衡量机器生成的文本是如何对人类语言的关键问题。我们介绍紫红色,一个开放式文本生成的比较措施,它直接将文本生成模型的学习分布与使用发散边界的分发进行了分布到人写的文本。淡紫色通过计算量化嵌入空间中的信息分流来缩放到现代文本生成模型。通过对三个开放式发电任务的广泛实证研究,我们发现紫红色标识了所生成文本的已知属性,天然存在模型大小,并与人类判断相关,而不是现有的分布评估度量的限制较少。
translated by 谷歌翻译
In reasoning about sequential events it is natural to pose probabilistic queries such as "when will event A occur next" or "what is the probability of A occurring before B", with applications in areas such as user modeling, medicine, and finance. However, with machine learning shifting towards neural autoregressive models such as RNNs and transformers, probabilistic querying has been largely restricted to simple cases such as next-event prediction. This is in part due to the fact that future querying involves marginalization over large path spaces, which is not straightforward to do efficiently in such models. In this paper we introduce a general typology for predictive queries in neural autoregressive sequence models and show that such queries can be systematically represented by sets of elementary building blocks. We leverage this typology to develop new query estimation methods based on beam search, importance sampling, and hybrids. Across four large-scale sequence datasets from different application domains, as well as for the GPT-2 language model, we demonstrate the ability to make query answering tractable for arbitrary queries in exponentially-large predictive path-spaces, and find clear differences in cost-accuracy tradeoffs between search and sampling methods.
translated by 谷歌翻译
重要的加权是调整蒙特卡洛集成以说明错误分布中抽取的一种一般方法,但是当重要性比的右尾巴较重时,最终的估计值可能是高度可变的。当目标分布的某些方面无法通过近似分布捕获,在这种情况下,可以通过修改极端重要性比率来获得更稳定的估计。我们提出了一种新的方法,该方法使用拟合模拟重要性比率的上尾的广义帕累托分布来稳定重要性权重。该方法在经验上的性能要比现有方法稳定重要性采样估计值更好,包括稳定的有效样本量估计,蒙特卡洛误差估计和收敛诊断。提出的帕累托$ \ hat {k} $有限样本收敛率诊断对任何蒙特卡洛估计器都有用。
translated by 谷歌翻译
这是模型选择和假设检测的边缘似然计算的最新介绍和概述。计算概率模型(或常量比率)的常规规定常数是许多统计数据,应用数学,信号处理和机器学习中的许多应用中的基本问题。本文提供了对主题的全面研究。我们突出了不同技术之间的局限性,优势,连接和差异。还描述了使用不正确的前沿的问题和可能的解决方案。通过理论比较和数值实验比较一些最相关的方法。
translated by 谷歌翻译
我们提出了连续重复的退火流传输蒙特卡洛(CRAFT),该方法结合了顺序的蒙特卡洛(SMC)采样器(本身是退火重要性采样的概括)与使用归一化流量的变异推断。直接训练了归一化的流量,可用于使用KL差异进行每个过渡,以在退火温度之间运输。使用归一化流/SMC近似值估算了此优化目标。我们从概念上展示并使用多个经验示例,这些示例可以改善退火流运输蒙特卡洛(Arbel等,2021),并在其上建造,也可以在基于马尔可夫链蒙特卡洛(MCMC)基于基于的随机归一化流(Wu等人。2020)。通过将工艺纳入粒子MCMC中,我们表明,这种学识渊博的采样器可以在具有挑战性的晶格场理论示例中获得令人印象深刻的准确结果。
translated by 谷歌翻译
利用启发式来评估收敛性和压缩马尔可夫链蒙特卡罗的输出可以在生产的经验逼近时是次优。通常,许多初始状态归因于“燃烧”并移除,而链条的其余部分是“变薄”,如果还需要压缩。在本文中,我们考虑回顾性地从样本路径中选择固定基数的状态的问题,使得由其经验分布提供的近似接近最佳。提出了一种基于核心稳定性差异的贪婪最小化的新方法,这适用于需要重压力的问题。理论结果保障方法的一致性及其有效性在常微分方程的参数推理的具体背景下证明了该效果。软件可在Python,R和Matlab中的Stein细化包中提供。
translated by 谷歌翻译
We present the GPry algorithm for fast Bayesian inference of general (non-Gaussian) posteriors with a moderate number of parameters. GPry does not need any pre-training, special hardware such as GPUs, and is intended as a drop-in replacement for traditional Monte Carlo methods for Bayesian inference. Our algorithm is based on generating a Gaussian Process surrogate model of the log-posterior, aided by a Support Vector Machine classifier that excludes extreme or non-finite values. An active learning scheme allows us to reduce the number of required posterior evaluations by two orders of magnitude compared to traditional Monte Carlo inference. Our algorithm allows for parallel evaluations of the posterior at optimal locations, further reducing wall-clock times. We significantly improve performance using properties of the posterior in our active learning scheme and for the definition of the GP prior. In particular we account for the expected dynamical range of the posterior in different dimensionalities. We test our model against a number of synthetic and cosmological examples. GPry outperforms traditional Monte Carlo methods when the evaluation time of the likelihood (or the calculation of theoretical observables) is of the order of seconds; for evaluation times of over a minute it can perform inference in days that would take months using traditional methods. GPry is distributed as an open source Python package (pip install gpry) and can also be found at https://github.com/jonaselgammal/GPry.
translated by 谷歌翻译
已经引入了生成流量网络(GFlowNETS)作为在主动学习背景下采样多样化候选的方法,具有培训目标,其使它们与给定奖励功能成比例地进行比例。在本文中,我们显示了许多额外的GFLOWN的理论特性。它们可用于估计联合概率分布和一些变量未指定的相应边际分布,并且特别感兴趣地,可以代表像集合和图形的复合对象的分布。 Gflownets摊销了通常通过计算昂贵的MCMC方法在单个但训练有素的生成通行证中进行的工作。它们还可用于估计分区功能和自由能量,给定子集(子图)的超标(超图)的条件概率,以及给定集合(图)的所有超标仪(超图)的边际分布。我们引入了熵和相互信息估计的变体,从帕累托前沿采样,与奖励最大化策略的连接,以及随机环境的扩展,连续动作和模块化能量功能。
translated by 谷歌翻译
Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metropolis or Gibbs sampling. However, HMC's performance is highly sensitive to two user-specified parameters: a step size and a desired number of steps L. In particular, if L is too small then the algorithm exhibits undesirable random walk behavior, while if L is too large the algorithm wastes computation. We introduce the No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. Empirically, NUTS perform at least as efficiently as and sometimes more efficiently than a well tuned standard HMC method, without requiring user intervention or costly tuning runs. We also derive a method for adapting the step size parameter on the fly based on primal-dual averaging. NUTS can thus be used with no hand-tuning at all. NUTS is also suitable for applications such as BUGS-style automatic inference engines that require efficient "turnkey" sampling algorithms.
translated by 谷歌翻译
我们研究Livingstone&Zanella(2021)中引入的一阶级本地平衡的大都市 - 黑斯廷斯算法(2021)。要在类中选择特定算法,用户必须选择平衡函数$ g:\ mathbb {r} \ to \ mathbb {r} $满足$ g(t)= tg(1 / t)$,以及噪声分布提案增量。课程中的流行选择是Metropolis调整的Langevin算法,最近推出的Barker提案。我们首先建立一个普遍限制的最佳验收率为57%,并为N $ N $的缩放,因为维度在$ G $的温和平滑假设下的所有成员之间的无限程度倾向于无限算法的目标分布是产品形式。特别地,我们通过预期的平方跳跃距离来获得类中任意算法的渐近效率的显式表达式。然后,我们考虑如何在各种约束下优化此表达式。我们为Barker提案提供了最佳的噪声分布选择,在高斯噪声分布​​下的平衡功能的最佳选择,以及整个类中的一阶本地平衡算法的最佳选择,结果取决于特定的目标分布。数值模拟确认了我们的理论发现,特别表明,Barker提案中的双模噪声分布选择产生了比原始高斯版本始终如一的效率的实用算法。
translated by 谷歌翻译
最近介绍基于梯度的MCMC用于离散空间具有巨大的希望,并带来了新离散的可能性的诱人可能性,即MALA和HMC等著名的连续方法。为了实现这一目标,我们介绍了几个在概念上受到MALA启发的分离大都会杂货样本,并在贝叶斯推理和基于能量的建模中表现出了一系列具有挑战性的采样问题。从方法上讲,我们确定了为什么对预处理的MALA的离散类似物通常是棘手的,激发了我们基于辅助变量和“高斯整体技巧”引入一种新型的预处理。
translated by 谷歌翻译
统计模型是机器学习的核心,具有广泛适用性,跨各种下游任务。模型通常由通过最大似然估计从数据估计的自由参数控制。但是,当面对现实世界数据集时,许多模型运行到一个关键问题:它们是在完全观察到的数据方面配制的,而在实践中,数据集会困扰缺失数据。来自不完整数据的统计模型估计理论在概念上类似于潜在变量模型的估计,其中存在强大的工具,例如变分推理(VI)。然而,与标准潜在变量模型相比,具有不完整数据的参数估计通常需要估计缺失变量的指数 - 许多条件分布,因此使标准的VI方法是棘手的。通过引入变分Gibbs推理(VGI),是一种新的通用方法来解决这个差距,以估计来自不完整数据的统计模型参数。我们在一组合成和实际估算任务上验证VGI,从不完整的数据中估算重要的机器学习模型,VAE和标准化流程。拟议的方法,同时通用,实现比现有的特定模型特定估计方法竞争或更好的性能。
translated by 谷歌翻译
Controllable Text Generation (CTG) is emerging area in the field of natural language generation (NLG). It is regarded as crucial for the development of advanced text generation technologies that are more natural and better meet the specific constraints in practical applications. In recent years, methods using large-scale pre-trained language models (PLMs), in particular the widely used transformer-based PLMs, have become a new paradigm of NLG, allowing generation of more diverse and fluent text. However, due to the lower level of interpretability of deep neural networks, the controllability of these methods need to be guaranteed. To this end, controllable text generation using transformer-based PLMs has become a rapidly growing yet challenging new research hotspot. A diverse range of approaches have emerged in the recent 3-4 years, targeting different CTG tasks which may require different types of controlled constraints. In this paper, we present a systematic critical review on the common tasks, main approaches and evaluation methods in this area. Finally, we discuss the challenges that the field is facing, and put forward various promising future directions. To the best of our knowledge, this is the first survey paper to summarize CTG techniques from the perspective of PLMs. We hope it can help researchers in related fields to quickly track the academic frontier, providing them with a landscape of the area and a roadmap for future research.
translated by 谷歌翻译
我们提供了对神经马尔可夫链蒙特卡罗模拟中的自相关的深度研究,该版本的传统大都会算法采用神经网络来提供独立的建议。我们使用二维ising模型说明了我们的想法。我们提出了几次自相关时间的估算,其中一些灵感来自于为大都市独立采样器导出的分析结果,我们将其与逆温度$ \ Beta $的函数进行比较和研究。基于我们提出替代损失功能,并研究其对自动系列的影响。此外,我们调查对自动相关时间的神经网络培训过程中强加系统对称($ Z_2 $和/或翻译)的影响。最终,我们提出了一种包含局部热浴更新的方案。讨论了上述增强功能的影响为16美元16美元旋转系统。我们的调查结果摘要可以作为实施更复杂模型的神经马尔可夫链蒙特卡罗模拟的指导。
translated by 谷歌翻译
Normalizing flows provide a general mechanism for defining expressive probability distributions, only requiring the specification of a (usually simple) base distribution and a series of bijective transformations. There has been much recent work on normalizing flows, ranging from improving their expressive power to expanding their application. We believe the field has now matured and is in need of a unified perspective. In this review, we attempt to provide such a perspective by describing flows through the lens of probabilistic modeling and inference. We place special emphasis on the fundamental principles of flow design, and discuss foundational topics such as expressive power and computational trade-offs. We also broaden the conceptual framing of flows by relating them to more general probability transformations. Lastly, we summarize the use of flows for tasks such as generative modeling, approximate inference, and supervised learning.
translated by 谷歌翻译