Smart Paper Notes

Decoding surface codes with deep reinforcement learning and probabilistic policy reuse

Elisha Siddiqui Matekole , Esther Ye , Ramya Iyer , Samuel Yen-Chi Chen

Categories: Artificial Intelligence | Machine Learning | Neural and Evolutionary Computing

2022-12-22

Quantum computing (QC) promises significant advantages over classical computers on certain hard computational tasks. However, current quantum hardware, also known as noisy intermediate-scale quantum (NISQ) computers, is still unable to carry out computations faithfully, mainly because it lacks quantum error correction (QEC) capability. A significant number of theoretical studies have provided various types of QEC codes; one of the notable topological codes is the surface code, whose features, such as requiring only nearest-neighbor two-qubit control gates and having a large error threshold, make it a leading candidate for scalable quantum computation. Recent machine learning (ML)-based techniques, especially reinforcement learning (RL) methods, have been applied to the decoding problem and have already made some progress. Nevertheless, the device noise pattern may change over time, making trained decoder models ineffective. In this paper, we propose a continual reinforcement learning method to address these decoding challenges. Specifically, we implement a double deep Q-learning with probabilistic policy reuse (DDQN-PPR) model to learn surface code decoding strategies for quantum environments with varying noise patterns. Through numerical simulations, we show that the proposed DDQN-PPR model can significantly reduce the computational complexity. Moreover, increasing the number of trained policies can further improve the agent's performance. Our results open a way to build more capable RL agents that can leverage previously gained knowledge to tackle QEC challenges.
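
The two ingredients named in the abstract, double deep Q-learning and probabilistic policy reuse, can be illustrated with a minimal sketch. This is a generic textbook-style formulation, not the authors' implementation: the function names, the reuse probability `psi`, the exploration rate `epsilon`, and the discount `gamma` are all assumptions introduced for illustration.

```python
import numpy as np


def ddqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network picks the next action,
    the target network supplies that action's value."""
    best_action = int(np.argmax(q_online_next))
    bootstrap = 0.0 if done else gamma * float(q_target_next[best_action])
    return reward + bootstrap


def reuse_action(q_values, past_action, psi, epsilon, rng):
    """Policy-reuse style action selection (hypothetical helper).

    With probability psi, follow the action suggested by a previously
    trained policy; otherwise act epsilon-greedily on the current Q-values.
    """
    if rng.random() < psi:
        return past_action
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```

In a reuse scheme of this kind, `psi` is typically decayed over an episode so that the agent relies on the old policy early on and on its own Q-values later; the decay schedule is left out here because the abstract does not specify one.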

translated by Google Translate

OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

Srinivasan Iyer , Xi Victoria Lin , Ramakanth Pasunuru , Todor Mihaylov , Daniel Simig , Ping Yu , Kurt Shuster , Tianlu Wang , Qing Liu , Punit Singh Koura

Categories: Natural Language Processing

2022-12-22

Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero- and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diversity of the instruction-tuning benchmark, different task sampling strategies, fine-tuning with and without demonstrations, training using specialized datasets for reasoning and dialogue, and finally, the fine-tuning objectives themselves. In this paper, we characterize the effect of instruction-tuning decisions on downstream task performance when scaling both model and benchmark sizes. To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalization: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks. Through the lens of this framework, we first present insights about instruction-tuning decisions as applied to OPT-30B and further exploit these insights to train OPT-IML 30B and 175B, which are instruction-tuned versions of OPT. OPT-IML demonstrates all three generalization abilities at both scales on four different evaluation benchmarks with diverse tasks and input formats: PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG. Not only does it significantly outperform OPT on all benchmarks, but it is also highly competitive with existing models fine-tuned on each specific benchmark. We release OPT-IML at both scales, together with the OPT-IML Bench evaluation framework.

Artistic Arbitrary Style Transfer

Weiting Li , Rahul Vyas , Ramya Sree Penta

Categories: Computer Vision

2022-12-21

Arbitrary Style Transfer is a technique for producing a new image from two images: a content image and a style image. The resulting image is novel and is generated by the algorithm itself. Balancing the structure and style components has been the major challenge that other state-of-the-art algorithms have tried to solve. Despite these efforts, it remains difficult to apply an artistic style on top of the structure of the content image while maintaining consistency. In this work, we address these problems with a deep learning approach based on convolutional neural networks. Our implementation first separates the foreground from the background of the content image using the pre-trained Detectron 2 model, and then applies the Arbitrary Style Transfer technique used in SANet. Once we have the two styled images, we stitch them together to form the complete final piece.

Calibrating Deep Neural Networks using Explicit Regularisation and Dynamic Data Pruning

Ramya Hebbalaguppe , Rishabh Patra , Tirtharaj Dash , Gautam Shroff , Lovekesh Vig

Categories: Machine Learning | Computer Vision

2022-12-20

Deep neural networks (DNNs) are prone to miscalibrated predictions, often exhibiting a mismatch between the predicted output and the associated confidence scores. Contemporary model calibration techniques mitigate the problem of overconfident predictions by pushing down the confidence of the winning class while increasing the confidence of the remaining classes across all test samples. However, from a deployment perspective, an ideal model should (i) generate well-calibrated predictions for high-confidence samples, say with predicted probability >0.95, and (ii) generate a higher proportion of legitimate high-confidence samples. To this end, we propose a novel regularization technique that can be used with classification losses, leading to state-of-the-art calibrated predictions at test time. From a deployment standpoint in safety-critical applications, only high-confidence samples from a well-calibrated model are of interest, as the remaining samples have to undergo manual inspection. Reducing the predictive confidence of these potentially "high-confidence samples" is a downside of existing calibration approaches. We mitigate this by proposing a dynamic train-time data pruning strategy that prunes low-confidence samples every few epochs, providing an increase in "confident yet calibrated samples". We demonstrate state-of-the-art calibration performance across image classification benchmarks, reducing training time without much compromise in accuracy. We provide insights into why our dynamic pruning strategy that prunes low-confidence training samples leads to an increase in high-confidence samples at test time.
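
The pruning step described above can be pictured with a small sketch. It assumes that "confidence" means the maximum softmax probability of a training sample and that a fixed fraction of the least confident samples is dropped at each pruning round; both choices, along with the name `prune_low_confidence` and the parameter `prune_fraction`, are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def prune_low_confidence(probs, prune_fraction=0.1):
    """Return the sorted indices of training samples to keep.

    probs: array of shape (n_samples, n_classes) holding predicted
    class probabilities. The prune_fraction of samples with the lowest
    maximum probability is dropped (assumed pruning rule).
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)
    n_drop = int(len(confidence) * prune_fraction)
    order = np.argsort(confidence, kind="stable")  # least confident first
    return np.sort(order[n_drop:])
```

In a training loop, a call like this would run every few epochs on the current model's predictions over the remaining training set, and the next epochs would iterate only over the returned indices.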

Maximal Initial Learning Rates in Deep ReLU Networks

Gaurav Iyer , Boris Hanin , David Rolnick

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-14

Training a neural network requires choosing a suitable learning rate, involving a trade-off between speed and effectiveness of convergence. While there has been considerable theoretical and empirical analysis of how large the learning rate can be, most prior work focuses only on late-stage training. In this work, we introduce the maximal initial learning rate $\eta^{\ast}$ - the largest learning rate at which a randomly initialized neural network can successfully begin training and achieve (at least) a given threshold accuracy. Using a simple approach to estimate $\eta^{\ast}$, we observe that in constant-width fully-connected ReLU networks, $\eta^{\ast}$ demonstrates different behavior to the maximum learning rate later in training. Specifically, we find that $\eta^{\ast}$ is well predicted as a power of $(\text{depth} \times \text{width})$, provided that (i) the width of the network is sufficiently large compared to the depth, and (ii) the input layer of the network is trained at a relatively small learning rate. We further analyze the relationship between $\eta^{\ast}$ and the sharpness $\lambda_{1}$ of the network at initialization, indicating that they are closely though not inversely related. We formally prove bounds for $\lambda_{1}$ in terms of $(\text{depth} \times \text{width})$ that align with our empirical results.
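
The claim that $\eta^{\ast}$ is well predicted as a power of $(\text{depth} \times \text{width})$ corresponds to a straight line in log-log space, $\log \eta^{\ast} = \log c + p \log(\text{depth} \times \text{width})$. A minimal sketch of how such an exponent could be fitted from measured pairs is given below; the function name and the use of ordinary least squares are assumptions, and no values from the paper are reproduced.

```python
import numpy as np


def fit_power_law(depth_times_width, eta_star):
    """Fit eta_star ~ c * (depth * width) ** p by least squares in log-log space.

    Returns (c, p). All inputs must be strictly positive.
    """
    x = np.log(np.asarray(depth_times_width, dtype=float))
    y = np.log(np.asarray(eta_star, dtype=float))
    p, log_c = np.polyfit(x, y, 1)  # slope first, then intercept
    return float(np.exp(log_c)), float(p)
```

Given estimates of $\eta^{\ast}$ for several network sizes, the returned `p` is the fitted exponent and `c` the prefactor.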

Demystifying Prompts in Language Models via Perplexity Estimation

Hila Gonen , Srini Iyer , Terra Blevins , Noah A. Smith , Luke Zettlemoyer

Categories: Natural Language Processing

2022-12-08

Language models can be prompted to perform a wide variety of zero- and few-shot learning problems. However, performance varies significantly with the choice of prompt, and we do not yet understand why this happens or how to pick the best prompts. In this work, we analyze the factors that contribute to this variance and establish a new empirical hypothesis: the performance of a prompt is coupled with the extent to which the model is familiar with the language it contains. Over a wide range of tasks, we show that the lower the perplexity of the prompt is, the better the prompt is able to perform the task. As a result, we devise a method for creating prompts: (1) automatically extend a small seed set of manually written prompts by paraphrasing using GPT3 and backtranslation and (2) choose the lowest perplexity prompts to get significant gains in performance.
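
Step (2) of the method, choosing the lowest-perplexity prompts, can be sketched as follows. The sketch assumes that per-token natural-log probabilities for each candidate prompt have already been obtained from some language model; how they are obtained is outside its scope, and the function names are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Perplexity as exp of the negative mean token log-probability (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def lowest_perplexity_prompts(prompt_logprobs, k=1):
    """Return the k prompts with the lowest perplexity.

    prompt_logprobs: dict mapping each candidate prompt string to the
    list of its per-token log-probabilities under the model.
    """
    ranked = sorted(prompt_logprobs, key=lambda p: perplexity(prompt_logprobs[p]))
    return ranked[:k]
```

The candidate set passed in would be the seed prompts plus their paraphrases and backtranslations from step (1).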

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao , Angela Fan , Christopher Akiki , Ellie Pavlick , Suzana Ilić , Daniel Hesslow , Roman Castagné , Alexandra Sasha Luccioni , François Yvon , Matthias Gallé

Categories: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Fully Bayesian inference for latent variable Gaussian process models

Suraj Yerramilli , Akshay Iyer , Wei Chen , Daniel W. Apley

Categories: Machine Learning (Statistics) | Machine Learning

2022-11-04

Real engineering and scientific applications often involve one or more qualitative inputs. Standard Gaussian processes (GPs), however, cannot directly accommodate qualitative inputs. The recently introduced latent variable Gaussian process (LVGP) overcomes this issue by first mapping each qualitative factor to underlying latent variables (LVs), and then using any standard GP covariance function over these LVs. The LVs are estimated similarly to the other GP hyperparameters through maximum likelihood estimation, and then plugged into the prediction expressions. However, this plug-in approach does not account for uncertainty in the estimation of the LVs, which can be significant, especially with limited training data. In this work, we develop a fully Bayesian approach for the LVGP model and for visualizing the effects of the qualitative inputs via their LVs. We also develop approximations for scaling up LVGPs and fully Bayesian inference for the LVGP hyperparameters. We conduct numerical studies comparing plug-in inference against fully Bayesian inference over a few engineering models and material design applications. In contrast to previous studies on standard GP modeling that have largely concluded that a fully Bayesian treatment offers limited improvements, our results show that for LVGP modeling it offers significant improvements in prediction accuracy and uncertainty quantification over the plug-in approach.

Low-Stabilizer-Complexity Quantum States Are Not Pseudorandom

Sabee Grewal , Vishnu Iyer , William Kretschmer , Daniel Liang

Categories: Machine Learning

2022-09-29

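We show that quantum states with "low stabilizer complexity" can be efficiently distinguished from Haar-random. Specifically, given an $n$-qubit pure state $|\psi\rangle$, we give an efficient algorithm that distinguishes whether $|\psi\rangle$ is (i) Haar-random or (ii) a state with stabilizer fidelity at least $\frac{1}{k}$ (i.e., has fidelity at least $\frac{1}{k}$ with some stabilizer state), promised that one of these is the case. With black-box access to $|\psi\rangle$, our algorithm uses $O\!\left(k^{12} \log(1/\delta)\right)$ copies of $|\psi\rangle$ and $O\!\left(n k^{12} \log(1/\delta)\right)$ time to succeed with probability at least $1-\delta$, and, with access to a state preparation unitary for $|\psi\rangle$ (and its inverse), $O\!\left(k^{3} \log(1/\delta)\right)$ queries and $O\!\left(n k^{3} \log(1/\delta)\right)$ time suffice. As a corollary, we prove that $\omega(\log(n))$ $T$-gates are necessary for any Clifford+$T$ circuit to prepare computationally pseudorandom quantum states, a first-of-its-kind lower bound.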

Streaming Encoding Algorithms for Scalable Hyperdimensional Computing

Anthony Thomas , Behnam Khaleghi , Gopi Krishna Jha , Nageen Himayat , Ravi Iyer , Nilesh Jain , Tajana Rosing

Categories: Machine Learning | Neural and Evolutionary Computing

2022-09-20

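Hyperdimensional computing (HDC) is a paradigm for data representation and learning that originated in computational neuroscience. HDC represents data as high-dimensional, low-precision vectors that can be used for a variety of information processing tasks such as learning or recall. The mapping to high-dimensional space is a fundamental problem in HDC, and existing methods encounter scalability issues when the input data itself is high-dimensional. In this work, we explore streaming encoding techniques based on hashing. We show formally that these methods enjoy comparable guarantees on performance for learning applications while being more efficient than existing alternatives. We validate these results experimentally on a popular high-dimensional classification problem and show that our approach scales easily to very large data sets.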
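
To make the idea of a hashing-based streaming encoder concrete, here is a minimal sketch. It is a generic feature-hashing construction, not the specific scheme analyzed in the paper: the helper `_hash`, the use of BLAKE2b, the dimension `dim`, and the final sign binarization are all assumptions chosen for illustration. The point it shows is that no per-feature codebook is stored; each feature's position and sign are recomputed from a hash as the stream is read.

```python
import hashlib

import numpy as np


def _hash(feature, salt):
    """Deterministic 64-bit hash of a feature identifier (hypothetical helper)."""
    digest = hashlib.blake2b(f"{salt}:{feature}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def stream_encode(features, dim=1024):
    """Encode a stream of (feature_id, value) pairs as a bipolar vector.

    Each feature is hashed to one coordinate and one sign, and its value
    is accumulated there. The accumulator is then binarized to {-1, +1};
    ties, including untouched coordinates, map to +1.
    """
    acc = np.zeros(dim)
    for feature, value in features:
        index = _hash(feature, "index") % dim
        sign = 1.0 if _hash(feature, "sign") % 2 == 0 else -1.0
        acc[index] += sign * value
    return np.where(acc >= 0, 1, -1).astype(np.int8)
```

Memory use depends only on `dim`, not on the number of distinct input features, which is what makes an encoder of this kind attractive when the input itself is very high-dimensional.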
