We show that the derivatives of the Sinkhorn-Knopp algorithm, or iterative proportional fitting procedure, converge to the derivatives of the entropic regularization of the optimal transport problem at a locally uniform linear convergence rate.
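For concreteness, the iteration in question can be sketched in a few lines of NumPy (the function name and defaults are mine); the result above concerns the derivatives of these iterates, e.g. with respect to the marginals or the cost, obtained by differentiating through the loop:

```python
import numpy as np

def sinkhorn_knopp(a, b, C, eps, n_iter=1000):
    """Sinkhorn-Knopp / iterative proportional fitting for entropic OT.

    a, b : source and target marginals (1-D histograms summing to 1)
    C    : cost matrix of shape (len(a), len(b))
    eps  : entropic regularization strength
    """
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # match column marginals
        u = a / (K @ v)                  # match row marginals
    return u[:, None] * K * v[None, :]   # transport plan P = diag(u) K diag(v)
```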
Numerical validation is at the core of machine learning research, as it allows assessing the actual impact of new methods and confirming the agreement between theory and practice. Yet the rapid development of the field poses several challenges: researchers are confronted with a profusion of methods to compare, limited transparency and consensus on best practices, and tedious re-implementation work. As a result, validation is often very partial, which can lead to wrong conclusions that slow down the progress of research. We propose Benchopt, a collaborative framework to automate, reproduce, and publish optimization benchmarks in machine learning across programming languages and hardware architectures. Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing, and extending experiments. To demonstrate its broad usability, we showcase benchmarks on three standard learning tasks: $\ell_2$-regularized logistic regression, the Lasso, and ResNet18 training for image classification. These benchmarks highlight key practical findings that give a more nuanced view of the state of the art for these problems, showing that for practical evaluation the devil is in the details. We hope that Benchopt will foster collaborative work in the community, thereby improving the reproducibility of research findings.
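To make one of the showcased tasks concrete, here is a minimal NumPy sketch of $\ell_2$-regularized logistic regression solved by plain gradient descent; this is the kind of solver a Benchopt benchmark would compare against others, not Benchopt's own API:

```python
import numpy as np

def logreg_l2_loss(beta, X, y, lmbd):
    """f(beta) = mean(log(1 + exp(-y * X beta))) + lmbd/2 * ||beta||^2, y in {-1, 1}."""
    z = y * (X @ beta)
    return np.mean(np.logaddexp(0.0, -z)) + 0.5 * lmbd * (beta @ beta)

def gradient_descent(X, y, lmbd, n_iter=500):
    n, p = X.shape
    beta = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2 / (4 * n) + lmbd   # smoothness constant of the loss
    for _ in range(n_iter):
        z = y * (X @ beta)
        grad = -X.T @ (y / (1 + np.exp(z))) / n + lmbd * beta
        beta -= grad / L                              # fixed step 1/L
    return beta
```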
Bilevel optimization, the problem of minimizing a value function that involves the arg-minimum of another function, appears in many areas of machine learning. In a large-scale empirical risk minimization setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to make progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem we introduce a novel framework in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as a sum, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance-reduction algorithms, where the dynamics of all variables is subject to variance reduction. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, achieves an $O(\frac{1}{T})$ convergence rate and linear convergence under the Polyak-Łojasiewicz assumption. This is the first stochastic algorithm for bilevel optimization that verifies either of these properties. Numerical experiments validate the usefulness of our method.
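A plain-SGD sketch of the joint-evolution idea, under my reading of the abstract (the callback names and generic form are assumptions); SABA replaces the one-sample directions below with SAGA-style variance-reduced estimates:

```python
import numpy as np

def stochastic_bilevel(grad_z_g, hvp_zz_g, cross_xz_g, grad_z_f, grad_x_f,
                       x0, z0, n_inner, n_outer,
                       rho=1e-2, gamma=1e-2, n_iter=10_000, seed=0):
    """The inner solution z, the linear-system solution v, and the outer
    variable x all evolve at once.  Each direction is a sum over samples,
    so one-sample estimates are unbiased.

    Inner problem:  z*(x) = argmin_z (1/m) sum_j g_j(z, x)
    Linear system:  v*(x) solves  hess_zz G v = -grad_z F  at (z*, x)
    Outer gradient: grad F(x) = grad_x f + cross_xz G @ v*
    """
    rng = np.random.default_rng(seed)
    x, z = np.asarray(x0, float).copy(), np.asarray(z0, float).copy()
    v = np.zeros_like(z)
    for _ in range(n_iter):
        j = rng.integers(n_inner)    # sample for the inner objective g
        i = rng.integers(n_outer)    # sample for the outer objective f
        z = z - rho * grad_z_g(z, x, j)                                # inner descent
        v = v - rho * (hvp_zz_g(z, x, v, j) + grad_z_f(z, x, i))       # linear-system step
        x = x - gamma * (cross_xz_g(z, x, v, j) + grad_x_f(z, x, i))   # outer step
    return x, z, v
```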
Sparsity priors are commonly used in denoising and image reconstruction. For analysis-type priors, a dictionary defines a representation of signals that is likely to be sparse. In most situations, this dictionary is not known and has to be recovered from data by minimizing the reconstruction error. This defines a hierarchical optimization problem, which can be cast as bilevel optimization. Yet, this problem cannot be solved directly, as the reconstructions and their derivatives with respect to the dictionary have no closed-form expression. Reconstructions can, however, be computed iteratively with the Forward-Backward (FB) splitting algorithm. In this paper, we approximate reconstructions by the output of this FB algorithm after a fixed number of iterations. We then leverage automatic differentiation to evaluate the gradient of this output with respect to the dictionary, which we learn with projected gradient descent. Experiments show that our algorithm successfully learns the 1D Total Variation (TV) dictionary from piecewise-constant signals. For the same case study, we propose to restrict our search to dictionaries with 0-centered columns, which removes undesired local minima and improves numerical stability.
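The differentiation mechanics can be sketched with an unrolled ISTA in PyTorch; for simplicity this sketch uses a synthesis-type $\ell_1$ prior rather than the paper's analysis prior, and shapes and step sizes are illustrative:

```python
import torch

def ista(y, D, lmbd, n_iter=100):
    """Unrolled forward-backward (ISTA) for min_z 0.5||y - D z||^2 + lmbd ||z||_1.
    Every step is differentiable, so autograd can compute the gradient of
    the output with respect to the dictionary D."""
    L = torch.linalg.matrix_norm(D, 2) ** 2            # Lipschitz constant of the smooth part
    z = torch.zeros(D.shape[1], dtype=D.dtype)
    for _ in range(n_iter):
        grad = D.T @ (D @ z - y)                        # forward (gradient) step
        z = torch.nn.functional.softshrink(z - grad / L, lambd=float(lmbd / L))  # backward (prox) step
    return z

D = torch.randn(64, 32, requires_grad=True)
y = torch.randn(64)
loss = 0.5 * torch.sum((y - D @ ista(y, D, 0.1)) ** 2)
loss.backward()                  # d loss / d D through the unrolled iterations
with torch.no_grad():            # one projected gradient step on the dictionary
    D -= 0.1 * D.grad
    D -= D.mean(dim=0)           # projection onto dictionaries with 0-centered columns
    D.grad = None
```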
Finding the optimal hyperparameters of a model can be cast as a bilevel optimization problem, typically solved using zero-order techniques. In this work we study first-order methods when the inner optimization problem is convex but non-smooth. We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians that converge to the exact Jacobian. Using implicit differentiation, we show that the non-smoothness of the inner problem can be exploited to speed up the computation. Finally, we provide a bound on the error made on the hypergradient when the inner optimization problem is solved only approximately. Results on regression and classification problems reveal computational benefits for hyperparameter optimization, especially when multiple hyperparameters are required.
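A NumPy sketch of the forward-mode idea for the Lasso with a single regularization parameter $\lambda$ (my reconstruction for this special case): the Jacobian of the iterate with respect to $\lambda$ is propagated alongside the iterate, and the active-set indicator is exactly where the non-smoothness of the prox pays off, since the recursion becomes sparse after support identification.

```python
import numpy as np

def soft_thresh(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def forward_mode_lasso(X, y, lam, n_iter=500):
    """Proximal gradient descent on (1/2n)||y - X b||^2 + lam ||b||_1,
    propagating J = d beta / d lam along with the iterates."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n                  # step size 1/L
    beta, J = np.zeros(p), np.zeros(p)
    for _ in range(n_iter):
        u = beta - X.T @ (X @ beta - y) / (n * L)       # gradient step
        Ju = J - X.T @ (X @ J) / (n * L)                # its derivative wrt lam
        beta = soft_thresh(u, lam / L)
        support = np.abs(u) > lam / L                   # prox is smooth off the kink
        J = support * Ju - support * np.sign(u) / L     # chain rule through the prox
    return beta, J
```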
Generalized Linear Models (GLM) form a wide class of regression and classification models, where prediction is a function of a linear combination of the input variables. For statistical inference in high dimension, sparsity-inducing regularizations have proven useful while offering statistical guarantees. However, solving the resulting optimization problems can be challenging: even popular iterative algorithms such as coordinate descent need to loop over a large number of variables. To mitigate this, techniques known as screening rules and working sets reduce the size of the optimization problem at hand, either by progressively removing variables or by solving a growing sequence of smaller problems. For both techniques, significant variables are identified thanks to convex duality arguments. In this paper, we show that the dual iterates of a GLM exhibit a Vector AutoRegressive (VAR) behavior after sign identification, when the primal problem is solved with proximal gradient descent or cyclic coordinate descent. Exploiting this regularity, one can construct dual points that offer tighter certificates of optimality, enhancing the performance of screening rules and helping to design competitive working-set algorithms.
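A sketch of how such a dual point can be built from the VAR structure (a generic Anderson-style extrapolation in the spirit of this line of work, with a small ridge term for numerical safety; the rescaling of the result to dual feasibility is omitted):

```python
import numpy as np

def dual_extrapolation(R):
    """Extrapolate the fixed point of a VAR sequence r_{t+1} = A r_t + b
    from the last K+1 dual iterates, stored as the columns of R."""
    U = np.diff(R, axis=1)                       # successive differences, K columns
    K = U.shape[1]
    # Combination c of the last K iterates whose image is (nearly) fixed:
    # minimize ||U c|| subject to sum(c) = 1, via the normal equations.
    z = np.linalg.solve(U.T @ U + 1e-12 * np.eye(K), np.ones(K))
    c = z / z.sum()
    return R[:, :-1] @ c                         # extrapolated dual point candidate
```

After rescaling to dual feasibility, such an extrapolated point typically yields a tighter duality gap than the standard residual rescaling, which is what sharpens the screening certificates.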
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of such approaches has led us to solving high-dimensional PDEs. This has opened doors to solving a variety of real-world problems, ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. Tackling these shortcomings, Tensor Neural Networks (TNN) demonstrate that they can provide significant parameter savings while attaining the same accuracy as a classical Dense Neural Network (DNN). In addition, we show that TNN can be trained faster than DNN for the same accuracy. Besides TNN, we also introduce the Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance for an equivalent parameter count compared to a DNN. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
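The parameter-savings argument can be illustrated with a generic factorized layer; this is a simplification of tensor-network layers in general, not the paper's exact TNN block:

```python
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    """A (d_out x d_in) dense weight is replaced by a Kronecker product of
    two small factors, applied without ever forming the full matrix."""
    def __init__(self, in_shape=(16, 16), out_shape=(16, 16)):
        super().__init__()
        self.A = nn.Parameter(torch.randn(out_shape[0], in_shape[0]) / in_shape[0] ** 0.5)
        self.B = nn.Parameter(torch.randn(out_shape[1], in_shape[1]) / in_shape[1] ** 0.5)
        self.in_shape = in_shape

    def forward(self, x):                  # x: (batch, in_shape[0] * in_shape[1])
        X = x.view(-1, *self.in_shape)     # reshape the input vector into a matrix
        Y = self.A @ X @ self.B.T          # the Kronecker-structured linear map
        return Y.flatten(1)

# A 256 -> 256 dense layer needs 65,536 weights; this factorized version needs 512.
```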
Managing novelty in perception-based human activity recognition (HAR) is critical in realistic settings to improve task performance over time and ensure solution generalization outside of previously seen samples. Novelty manifests in HAR as unseen samples, activities, objects, environments, and sensor changes, among other ways. Novelty may be task-relevant, such as a new class or new features, or task-irrelevant, resulting in nuisance novelty such as never-before-seen noise, blur, or distorted video recordings. To perform HAR optimally, algorithmic solutions must be tolerant to nuisance novelty and learn over time in the face of novelty. This paper 1) formalizes the definition of novelty in HAR, building upon the prior definition of novelty in classification tasks, 2) proposes an incremental open world learning (OWL) protocol and applies it to the Kinetics datasets to generate a new benchmark, KOWL-718, 3) analyzes the performance of current state-of-the-art HAR models when novelty is introduced over time, and 4) provides a containerized and packaged pipeline for reproducing the OWL protocol and for adapting it to future updates to Kinetics. The experimental analysis includes an ablation study of how the different models perform under various conditions as annotated by Kinetics-AVA. The protocol, as an algorithm for reproducing experiments using the KOWL-718 benchmark, will be publicly released with code and containers at https://github.com/prijatelj/human-activity-recognition-in-an-open-world. The code may be used to analyze different annotations and subsets of the Kinetics datasets in an incremental open world fashion, as well as extended as further updates to Kinetics are released.
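A schematic of the incremental OWL evaluation loop as described above; all names here are hypothetical sketches, not the released pipeline's API:

```python
def owl_protocol(model, increments, evaluate):
    """Each increment introduces new data (possibly novel classes, or nuisance
    novelty); the model is evaluated both before and after adapting, so the
    benchmark captures robustness at the moment novelty appears and the
    benefit of incremental learning."""
    history = []
    for step, (train_split, test_split) in enumerate(increments):
        pre = evaluate(model, test_split)      # performance when novelty first appears
        model.fit_incremental(train_split)     # hypothetical incremental-update hook
        post = evaluate(model, test_split)     # performance after adaptation
        history.append({"step": step, "pre": pre, "post": post})
    return history
```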
Quantum computing (QC) promises significant advantages on certain hard computational tasks over classical computers. However, current quantum hardware, also known as noisy intermediate-scale quantum (NISQ) devices, is still unable to carry out computations faithfully, mainly because of the lack of quantum error correction (QEC) capability. A significant body of theoretical work has provided various types of QEC codes; one notable topological code is the surface code, whose features, such as requiring only nearest-neighbor two-qubit control gates and having a large error threshold, make it a leading candidate for scalable quantum computation. Recent machine learning (ML)-based techniques, especially reinforcement learning (RL) methods, have been applied to the decoding problem and have already made certain progress. Nevertheless, the device noise pattern may change over time, making trained decoder models ineffective. In this paper, we propose a continual reinforcement learning method to address these decoding challenges. Specifically, we implement a double deep Q-learning with probabilistic policy reuse (DDQN-PPR) model to learn surface code decoding strategies for quantum environments with varying noise patterns. Through numerical simulations, we show that the proposed DDQN-PPR model can significantly reduce the computational complexity. Moreover, increasing the number of trained policies can further improve the agent's performance. Our results open a way to build more capable RL agents that can leverage previously gained knowledge to tackle QEC challenges.
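The policy-reuse half can be sketched generically; this is a common probabilistic policy reuse selection rule, and the paper's DDQN-PPR pairs a rule of this kind with double deep Q-learning, so the details may differ:

```python
import numpy as np

def select_policy(returns, temperature=1.0, rng=None):
    """Sample a policy index from a library of previously trained policies
    (plus the one being learned), with probability increasing in each
    policy's running average return on the current noise pattern."""
    rng = rng or np.random.default_rng()
    w = np.exp(np.asarray(returns, dtype=float) / temperature)
    return rng.choice(len(returns), p=w / w.sum())

# e.g. three stored decoders and a fresh one; the best past decoder is reused most often
policy_idx = select_policy([0.8, 0.2, 0.5, 0.0])
```

Annealing the temperature downward shifts the agent from reusing old decoders toward exploiting the policy trained on the current noise pattern.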
Naturally-occurring information-seeking questions often contain questionable assumptions -- assumptions that are false or unverifiable. Questions containing questionable assumptions are challenging because they require a distinct answer strategy that deviates from typical answers to information-seeking questions. For instance, the question "When did Marie Curie discover Uranium?" cannot be answered as a typical when question without addressing the false assumption "Marie Curie discovered Uranium". In this work, we propose (QA)$^2$ (Question Answering with Questionable Assumptions), an open-domain evaluation dataset consisting of naturally-occurring search engine queries that may or may not contain questionable assumptions. To be successful on (QA)$^2$, systems must be able to detect questionable assumptions and also be able to produce adequate responses for both typical information-seeking questions and ones with questionable assumptions. We find that current models do struggle with handling questionable assumptions -- the best performing model achieves 59% human rater acceptability on abstractive QA with (QA)$^2$ questions, leaving substantial headroom for progress.