Hamiltonian Monte Carlo (HMC) is a popular method in sampling. While there are quite a few works studying this method from various aspects, an interesting question is how to choose its integration time to achieve acceleration. In this work, we consider accelerating the process of sampling from a distribution $\pi(x) \propto \exp(-f(x))$ via HMC with time-varying integration time. When the potential $f$ is $L$-smooth and $m$-strongly convex, i.e., for sampling from a log-smooth and strongly log-concave target distribution $\pi$, it is known that under a constant integration time, the number of iterations that ideal HMC takes to get an $\epsilon$ Wasserstein-2 distance to the target $\pi$ is $O(\kappa \log \frac{1}{\epsilon})$, where $\kappa := \frac{L}{m}$ is the condition number. We propose a scheme of time-varying integration time based on the roots of Chebyshev polynomials. We show that in the case of a quadratic potential $f$, i.e., when the target $\pi$ is a Gaussian distribution, ideal HMC with this choice of integration time only takes $O(\sqrt{\kappa} \log \frac{1}{\epsilon})$ iterations to reach a Wasserstein-2 distance less than $\epsilon$; this improvement in the dependence on the condition number is akin to acceleration in optimization. The design and analysis of HMC with the proposed integration time build on tools from Chebyshev polynomials. Experiments find an advantage of adopting the proposed scheme even for sampling from distributions with smooth strongly convex potentials that are not quadratic.
translated by Google Translate
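As a hedged, self-contained illustration (not the paper's code), the sketch below compares the mean-contraction factor of ideal HMC on a Gaussian target under a Chebyshev-root-based schedule against a constant integration time. The mapping $t_i = \pi/(2\sqrt{r_i})$ from the Chebyshev roots $r_i$ on $[m, L]$ to integration times is an assumption made for this illustration.

```python
import numpy as np

# Quadratic potential f(x) = 0.5 x^T H x, i.e. a Gaussian target pi.
m, L = 1.0, 100.0                       # strong convexity / smoothness
lams = np.linspace(m, L, 400)           # sweep of Hessian eigenvalues

# Chebyshev roots of degree n on [m, L].
n = 16
k = np.arange(1, n + 1)
roots = 0.5 * (L + m) + 0.5 * (L - m) * np.cos((2 * k - 1) * np.pi / (2 * n))
# Assumed mapping from roots to integration times (illustrative): the step
# with time t_i = pi / (2 sqrt(r_i)) kills the mean in eigendirections with
# eigenvalue exactly r_i, since cos(sqrt(r_i) * t_i) = 0.
times = np.pi / (2.0 * np.sqrt(roots))

# Ideal HMC on a Gaussian contracts the mean along an eigendirection with
# eigenvalue lam by cos(sqrt(lam) * t) per step.  Compare n time-varying
# steps with n constant-time steps t = pi / (2 sqrt(L)).
cheb = np.prod(np.cos(np.sqrt(lams)[:, None] * times[None, :]), axis=1)
const = np.cos(np.sqrt(lams) * np.pi / (2.0 * np.sqrt(L))) ** n

print(np.abs(cheb).max(), np.abs(const).max())
```

The printed maxima over the spectrum show the time-varying schedule damping the initialization bias uniformly faster than the constant-time baseline, which is slow in the low-curvature directions.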
Heavy Ball (HB) is nowadays one of the most popular momentum methods in non-convex optimization. It has been widely observed that incorporating the Heavy Ball dynamic in gradient-based methods accelerates the training process of modern machine learning models. However, progress on establishing its theoretical foundation of acceleration is apparently far behind its empirical success. Existing provable acceleration results are for quadratic or close-to-quadratic functions, as the current techniques for showing HB's acceleration are limited to the case when the Hessian is fixed. In this work, we develop some new techniques that help show acceleration beyond quadratics, which is achieved by analyzing how the change of the Hessian at two consecutive points in time affects the convergence speed. Based on our technical results, provable acceleration via HB can be established for a class of Polyak-Łojasiewicz (PL) optimization problems. Moreover, our analysis demonstrates a benefit of adaptively setting the momentum parameter.
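The quadratic baseline can be reproduced in a few lines. The sketch below uses the classical Polyak tuning for a strongly convex quadratic (textbook constants, not the adaptive PL-case parameters analyzed in the work) to contrast HB's accelerated rate with plain gradient descent.

```python
import numpy as np

# Gradient descent (GD) vs. heavy ball (HB) on a quadratic f(x) = 0.5 x^T H x.
L, m = 100.0, 1.0                      # smoothness / strong convexity
H = np.diag([m, L])
grad = lambda x: H @ x

eta_gd = 1.0 / L                       # standard GD step size
# Classical Polyak tuning for quadratics; the rate improves from
# roughly (1 - 1/kappa) per step to (1 - 1/sqrt(kappa)).
eta_hb = 4.0 / (np.sqrt(L) + np.sqrt(m)) ** 2
beta = ((np.sqrt(L) - np.sqrt(m)) / (np.sqrt(L) + np.sqrt(m))) ** 2

x_gd = np.array([1.0, 1.0])
x_hb, x_prev = x_gd.copy(), x_gd.copy()
for _ in range(100):
    x_gd = x_gd - eta_gd * grad(x_gd)
    x_hb, x_prev = x_hb - eta_hb * grad(x_hb) + beta * (x_hb - x_prev), x_hb

print(np.linalg.norm(x_gd), np.linalg.norm(x_hb))  # HB is far closer to 0
```

After 100 iterations GD is still dominated by the slow eigendirection, while HB with momentum has driven the iterate several orders of magnitude closer to the minimizer.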
In this paper, we study two-player bilinear zero-sum games with constrained strategy spaces. An instance of natural occurrence of such constraints is the use of mixed strategies, which correspond to a probability simplex constraint. We propose and analyze the alternating mirror descent algorithm, in which each player takes turns acting according to the mirror descent algorithm for constrained optimization. We interpret alternating mirror descent as an alternating discretization of a skew-gradient flow in the dual space, and use tools from convex optimization and modified energy functions to establish an $O(K^{-2/3})$ bound on its average regret after $K$ iterations. This quantitatively verifies the algorithm's better behavior compared to the simultaneous version of the mirror descent algorithm, which can diverge and yields an $O(K^{-1/2})$ average regret bound. In the special case of an unconstrained setting, our results recover the behavior of the alternating gradient descent algorithm for zero-sum games studied in (Bailey et al., COLT 2020).
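A minimal sketch of alternating mirror descent with the entropic mirror map (multiplicative weights) on a simplex-constrained bilinear game; the game, step size, and initialization are illustrative choices, not the paper's.

```python
import numpy as np

# Alternating mirror descent with the entropic regularizer (multiplicative
# weights) on the simplex-constrained bilinear game min_x max_y x^T A y.
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])    # rock-paper-scissors payoffs
eta, K = 0.1, 2000
y = np.ones(3) / 3
gx = np.array([1.0, 0.0, 0.0])        # asymmetric start in the dual space
gy = np.zeros(3)
avg_x, avg_y = np.zeros(3), np.zeros(3)

for _ in range(K):
    # The x-player (minimizer) moves first ...
    gx -= eta * (A @ y)
    x = np.exp(gx - gx.max()); x /= x.sum()
    # ... then the y-player (maximizer) responds to the *updated* x.
    gy += eta * (A.T @ x)
    y = np.exp(gy - gy.max()); y /= y.sum()
    avg_x += x; avg_y += y

avg_x /= K; avg_y /= K
# Duality gap of the averaged strategies; it equals 0 at a Nash equilibrium.
gap = (A.T @ avg_x).max() - (A @ avg_y).min()
print(avg_x, avg_y, gap)
```

The alternation is visible in the loop body: each player's exponentiated-gradient update uses the opponent's most recent iterate rather than the iterates from the same round, and the time-averaged strategies approach the game's equilibrium.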
We present NusaCrowd, a collaborative initiative to collect and unite existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 117 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their effectiveness has been demonstrated in multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and its local languages. Furthermore, NusaCrowd enables the creation of the first multilingual automatic speech recognition benchmark in Indonesian and its local languages. Our work is intended to help advance natural language processing research in under-represented languages.
We propose JFP, a Joint Future Prediction model that can learn to generate accurate and consistent multi-agent future trajectories. For this task, many different methods have been proposed to capture social interactions in the encoding part of the model; however, considerably less focus has been placed on representing interactions in the decoder and output stages. As a result, the predicted trajectories are not necessarily consistent with each other, and often exhibit unrealistic trajectory overlaps. In contrast, we propose an end-to-end trainable model that directly learns the interactions between pairs of agents in a structured, graphical-model formulation in order to generate consistent future trajectories. It sets new state-of-the-art results on the Waymo Open Motion Dataset (WOMD) for the interactive setting. We also investigate a more complex multi-agent setting for both WOMD and a larger internal dataset, where our approach improves significantly on the trajectory overlap metrics while obtaining on-par or better performance on single-agent trajectory metrics.
Small to medium-scale data science experiments often rely on research software developed ad-hoc by individual scientists or small teams. Often there is no time to make the research software fast, reusable, and open access. The consequence is twofold. First, subsequent researchers must spend significant work hours building upon the proposed hypotheses or experimental framework. In the worst case, others cannot reproduce the experiment and reuse the findings for subsequent research. Second, suppose the ad-hoc research software fails during often long-running, computationally expensive experiments. In that case, the overall effort to iteratively improve the software and rerun the experiments creates significant time pressure on the researchers. We suggest making caching an integral part of the research software development process, even before the first line of code is written. This article outlines caching recommendations for developing research software in data science projects. Our recommendations provide a perspective to circumvent common problems such as proprietary dependencies, speed, etc. At the same time, caching contributes to the reproducibility of experiments in the open science workflow. Concerning the four guiding principles, i.e., Findability, Accessibility, Interoperability, and Reusability (FAIR), we foresee that including the proposed recommendations in research software development will make the data related to that software FAIRer for both machines and humans. We exhibit the usefulness of some of the proposed recommendations on our recently completed research software project in mathematical information retrieval.
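As one concrete, hypothetical instance of the caching recommendation, the sketch below memoizes an expensive experiment step to disk, keyed by the function's name and arguments. The decorator name, key scheme, and cache location are illustrative choices, not prescribed by the article.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path

# Fresh cache directory per run for this demo; a real project would use a
# stable, documented location so reruns after a crash can reuse results.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="experiment_cache_"))

def cached(fn):
    """Cache fn's return value on disk, keyed by its name and JSON-able args."""
    def wrapper(*args, **kwargs):
        key = hashlib.sha256(
            json.dumps([fn.__name__, args, kwargs], sort_keys=True).encode()
        ).hexdigest()
        path = CACHE_DIR / f"{key}.pkl"
        if path.exists():                     # cache hit: skip the rerun
            return pickle.loads(path.read_bytes())
        result = fn(*args, **kwargs)
        path.write_bytes(pickle.dumps(result))
        return result
    return wrapper

calls = []

@cached
def expensive_step(n):
    calls.append(n)                           # track real executions
    return sum(i * i for i in range(n))

a = expensive_step(1000)
b = expensive_step(1000)                      # served from the on-disk cache
print(a == b, len(calls))                     # identical result, one execution
```

If a long experiment pipeline crashes halfway, the completed steps are replayed from disk instead of recomputed, which is precisely the time-pressure scenario the recommendations target.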
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
A fundamental question concerning the use of ML models involves the explanation of their predictions to increase transparency in decision-making. Although several interpretability methods have emerged, some gaps regarding the reliability of their explanations have been identified. For instance, most methods are unstable (meaning that they provide drastically different explanations on similar data), and do not cope well with irrelevant features (i.e., features not related to the label). This article introduces two new interpretability methods, namely VarImp and SupClus, that overcome these issues by using local regression fits with a weighted distance that takes variable importance into account. VarImp generates explanations for each instance and can be applied to datasets with more complex relationships, whereas SupClus explains clusters of instances with similar explanations and can be applied to simpler datasets where clusters can be found. We compared our methods with state-of-the-art approaches and show that they yield better explanations according to several metrics, particularly in high-dimensional problems with irrelevant features, as well as when the relationship between features and target is non-linear.
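A minimal sketch of the underlying idea, assuming a local-surrogate setup: fit a weighted local linear model whose sample weights come from a distance re-scaled by variable importance. The kernel, the importance vector, and all names here are illustrative assumptions, not the actual VarImp/SupClus procedures.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):                       # model to explain (nonlinear in x0)
    return X[:, 0] ** 2 + 0.1 * X[:, 1]

X = rng.uniform(-1, 1, size=(500, 2))
y = black_box(X)
x0 = np.array([0.5, 0.0])               # instance to explain

importance = np.array([1.0, 0.1])       # e.g. from a global feature ranking
# Distance re-scaled by importance: relevant features count more, so the
# neighborhood is tight along x0 and loose along the less important x1.
d = np.sqrt(((X - x0) ** 2 * importance).sum(axis=1))
w = np.exp(-((d / 0.3) ** 2))           # locality kernel (bandwidth assumed)

# Weighted least squares: local linear surrogate around x0.
Xd = np.hstack([np.ones((len(X), 1)), X - x0])
Xw = Xd * w[:, None]
coef = np.linalg.solve(Xd.T @ Xw, Xd.T @ (w * y))
print(coef)  # coef[1] roughly 2 * 0.5 = 1.0, the local slope in x0
```

The recovered coefficients approximate the local gradient of the black box at the explained instance, which is the kind of per-instance explanation the abstract describes.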
The quantification of uncertainty is crucial for the adoption of machine learning, especially for rejecting out-of-distribution (OOD) data back to human experts for review. Yet progress has been slow, as a balance must be struck between computational efficiency and the quality of uncertainty estimates. For this reason, many use deep ensembles of neural networks or Monte Carlo dropout for reasonable uncertainty estimates at relatively minimal compute and memory. Surprisingly, when we focus on the real-world applicable constraint of a $\leq 1\%$ false positive rate (FPR), prior methods fail to reliably detect OOD samples as such. Notably, even Gaussian random noise fails to trigger these popular OOD techniques. We help to alleviate this problem by devising a simple adversarial training scheme that incorporates an attack on the epistemic uncertainty predicted by the dropout ensemble. We demonstrate that this method improves OOD detection performance on standard data (i.e., not adversarially crafted) and improves the standardized partial AUC from near-random guessing performance to $\geq 0.75$.
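A toy sketch of the quantity the adversarial scheme attacks: epistemic uncertainty estimated as the spread of Monte Carlo dropout forward passes. The tiny random network and dropout rate are purely illustrative, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny random two-layer network standing in for a trained model.
W1 = rng.standard_normal((8, 32))
W2 = rng.standard_normal((32, 1))

def mc_dropout_predict(x, n_samples=200, p=0.5):
    """Return mean prediction and an epistemic-uncertainty estimate for x."""
    outs = []
    for _ in range(n_samples):
        h = np.maximum(x @ W1, 0.0)             # ReLU hidden layer
        mask = rng.random(h.shape) > p          # dropout kept ON at test time
        outs.append(((h * mask) / (1.0 - p)) @ W2)
    outs = np.array(outs)
    # Disagreement across stochastic passes is the epistemic estimate that
    # an attack can try to suppress (or that OOD inputs should inflate).
    return outs.mean(), outs.std()

mu, sigma = mc_dropout_predict(rng.standard_normal(8))
print(mu, sigma)
```

An OOD detector thresholds `sigma`; the abstract's point is that this score alone is not reliable at strict FPR budgets unless training explicitly hardens it.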
Given a full fingerprint image (rolled or slap), we present CycleGAN models to generate multiple latent impressions of the same identity as the full print. Our models can control the degree of distortion, noise, blurriness and occlusion in the generated latent print images to obtain the good, bad and ugly latent image categories introduced in the NIST SD27 latent database. The contributions of our work are twofold: (i) demonstrating the similarity of synthetically generated latent fingerprint images to crime scene latents in the NIST SD27 and MSP databases, as evaluated by the NIST NFIQ 2 quality measure and by ROC curves obtained from a SOTA fingerprint matcher, and (ii) using synthetic latents to augment small-sized latent training databases in the public domain to improve the performance of DeepPrint, a SOTA fingerprint matcher designed for rolled-to-latent fingerprint matching, on three latent databases (NIST SD27, NIST SD302, and IIITD-SLF). For example, with augmentation by synthetic latent data, the rank-1 retrieval performance of DeepPrint improves from 15.50% to 29.07% on the challenging NIST SD27 latent database. Our approach for generating synthetic latent fingerprints can be used to improve the recognition performance of any latent matcher and its individual components (e.g., enhancement, segmentation and feature extraction).
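A toy sketch of the kinds of controllable degradations mentioned (noise, blur, occlusion), written as hand-coded array operations purely for illustration; the actual work learns such degradations with CycleGAN models rather than applying fixed transforms.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64))            # stand-in for a full rolled print

def degrade(img, noise=0.1, blur=1, occlusion=0.2):
    """Apply controllable noise, blur, and occlusion to an image in [0, 1]."""
    out = img + noise * rng.standard_normal(img.shape)       # sensor noise
    for _ in range(blur):                                    # crude cross blur
        out = (out + np.roll(out, 1, 0) + np.roll(out, -1, 0)
                   + np.roll(out, 1, 1) + np.roll(out, -1, 1)) / 5.0
    h, w = out.shape
    oh, ow = int(h * occlusion), int(w * occlusion)
    out[:oh, :ow] = 0.0                                      # occluded patch
    return np.clip(out, 0.0, 1.0)

latent = degrade(img)
print(latent.shape)
```

Dialing the three knobs up or down moves a sample along a "good / bad / ugly" quality axis, mirroring the categories the abstract borrows from NIST SD27.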