While risk-neutral reinforcement learning has shown experimental success in a number of applications, it is well known to be non-robust with respect to noise and perturbations in the parameters of the system. For this reason, risk-sensitive reinforcement learning algorithms have been studied to introduce robustness and sample efficiency, leading to better real-life performance. In this work, we introduce new model-free risk-sensitive reinforcement learning algorithms as variations of widely used Policy Gradient algorithms with similar implementation properties. In particular, we study the effect of exponential criteria on the risk-sensitivity of the policy of a reinforcement learning agent, and develop variants of the Monte Carlo Policy Gradient algorithm and of the online (temporal-difference) Actor-Critic algorithm. Analytical results show that the use of exponential criteria generalizes commonly used ad-hoc regularization approaches. The implementation, performance, and robustness properties of the proposed methods are evaluated in simulated experiments.
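To make the role of the exponential criterion concrete, here is a minimal sketch of how a Monte Carlo Policy Gradient (REINFORCE-style) update could reweight trajectories by exp(beta * return) rather than by the return itself; each trajectory's score-function term would then be multiplied by its weight. The function and parameter names (`beta`, the normalization) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def risk_sensitive_trajectory_weights(returns, beta=0.5):
    """Weights for a risk-sensitive Monte Carlo policy gradient.

    Each trajectory's sum of grad-log-policy terms would be scaled by its
    weight, so that exp(beta * return) replaces the raw return. Weights are
    normalized across the batch for numerical stability.
    """
    z = beta * np.asarray(returns, dtype=float)
    z -= z.max()                 # shift before exponentiating to avoid overflow
    w = np.exp(z)
    return w / w.sum()

# Positive beta emphasizes high-return trajectories (risk-seeking behavior),
# negative beta emphasizes low-return ones (risk-averse behavior).
print(risk_sensitive_trajectory_weights([1.0, 2.0, 3.0], beta=0.5))
print(risk_sensitive_trajectory_weights([1.0, 2.0, 3.0], beta=-0.5))
```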
Hierarchical learning algorithms that gradually approximate a solution to a data-driven optimization problem are essential to decision-making systems, especially under limitations on time and computational resources. In this study, we introduce a general-purpose hierarchical learning architecture that is based on the progressive partitioning of a possibly multi-resolution data space. The optimal partition is gradually approximated by solving a sequence of optimization sub-problems that yield a sequence of partitions with an increasing number of subsets. We show that the solution of each optimization problem can be estimated online using gradient-free stochastic approximation updates. As a consequence, a function approximation problem can be defined within each subset of the partition and solved using the theory of two-timescale stochastic approximation algorithms. This simulates an annealing process and defines a robust and interpretable heuristic method to gradually increase the complexity of the learning architecture in a task-agnostic manner, giving emphasis to regions of the data space that are considered more important according to a predefined criterion. Finally, by imposing a tree structure on the progression of the partitions, we provide a means to incorporate potential multi-resolution structure of the data space into this approach, significantly reducing its complexity while introducing hierarchical feature-extraction properties similar to those of certain classes of deep learning architectures. Asymptotic convergence analysis and experimental results are provided for clustering, classification, and regression problems.
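The two-timescale idea can be illustrated with a small sketch: prototypes that define the partition are adapted on a slow timescale, while a local affine regressor inside the winning cell is fit on a faster one. The names, step-size schedules, and the specific update rules below are illustrative assumptions, not the architecture proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 1                            # number of cells, input dimension
mu = rng.uniform(-1, 1, size=(K, d))   # prototypes defining the partition
w = np.zeros((K, d + 1))               # per-cell affine models [bias, slope]

def step(x, y, t, a0=0.5, b0=0.1):
    k = int(np.argmin(np.linalg.norm(mu - x, axis=1)))   # winning cell
    phi = np.concatenate(([1.0], x))                      # affine features
    a_t = a0 / (1 + t) ** 0.6                             # fast step size
    b_t = b0 / (1 + t)                                    # slow step size (b_t / a_t -> 0)
    w[k] += a_t * (y - w[k] @ phi) * phi                  # local function approximation
    mu[k] += b_t * (x - mu[k])                            # partition refinement

for t in range(5000):
    x = rng.uniform(-1, 1, size=d)
    step(x, float(np.sin(3 * x[0]) + 0.1 * rng.normal()), t)
```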
In this work, we introduce a learning model designed to meet the needs of applications in which computational resources are limited, and robustness and interpretability are prioritized. The learning problem can be formulated as a constrained stochastic optimization problem, with the constraints originating primarily from model assumptions that define a trade-off between complexity and performance. This trade-off is closely related to overfitting, generalization capacity, and robustness to noise and adversarial attacks, and depends on the structure and complexity of the model as well as on the properties of the optimization method used. We develop an online prototype-based learning algorithm based on annealing optimization, formulated as a gradient-free stochastic approximation algorithm. The learning model can be viewed as an interpretable competitive-learning neural network model for supervised, unsupervised, and reinforcement learning. The annealing nature of the algorithm contributes to minimal hyper-parameter tuning requirements, prevention of poor local minima, and robustness with respect to initial conditions. At the same time, it progressively increases the complexity of the learning model through an intuitive bifurcation phenomenon, providing online control over the performance-complexity trade-off. Finally, the use of stochastic approximation enables the study of the convergence of the learning algorithm with mathematical tools from dynamical systems and control, and allows its integration with reinforcement learning algorithms, constructing adaptive state-action aggregation schemes.
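A minimal sketch of one annealed, prototype-based update step is given below: each prototype moves toward the current sample with a Gibbs (soft-max) association weight at temperature T, and the temperature is lowered over time. The names, the fixed number of prototypes, and the cooling schedule are illustrative assumptions; in particular, the prototype-splitting (bifurcation) step is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = rng.normal(size=(2, 2))          # prototypes (codevectors)

def annealed_step(x, mu, T, lr):
    d = np.sum((mu - x) ** 2, axis=1)          # squared-Euclidean dissimilarities
    p = np.exp(-(d - d.min()) / T)              # Gibbs association probabilities
    p /= p.sum()
    return mu + lr * p[:, None] * (x - mu)      # gradient-free SA update toward the sample

T, lr = 2.0, 0.1
for t in range(5000):
    # Samples from a two-cluster mixture.
    x = rng.normal(size=2) + (3.0 if rng.random() < 0.5 else -3.0)
    mu = annealed_step(x, mu, T, lr / (1 + 0.001 * t))
    T = max(0.05, T * 0.999)                    # lower the temperature gradually
```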
We consider the problem of understanding the coordinated movement of biological or artificial swarms. To this end, we propose a learning scheme to estimate the coordination laws of the interacting agents from observations of the swarm's density over time. We describe the dynamics of the swarm based on pairwise interactions according to a flocking model, and express the evolution of the swarm's density as the solution to a system of mean-field hydrodynamic equations. We propose a new parametric family for modeling the pairwise interactions, which allows the mean-field macroscopic system of integro-differential equations to be solved efficiently as an augmented system of PDEs. Finally, we incorporate the augmented system into an iterative optimization scheme to learn the interaction dynamics from observations of the swarm's density evolution. The results of this work can offer an alternative approach to studying how animal swarms coordinate, to creating new control schemes for large networked systems, and can serve as a central part of defense mechanisms against adversarial attacks.
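The kind of pairwise-interaction flocking dynamics referred to here can be illustrated with a toy microscopic simulation in which agents align their velocities with neighbors through a distance-dependent kernel (a Cucker-Smale-type rule). The kernel `psi` and all parameter values are illustrative choices, not the parametric family proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
N, dt, steps = 50, 0.05, 200
x = rng.uniform(-1, 1, size=(N, 2))       # positions
v = rng.normal(scale=0.5, size=(N, 2))    # velocities

def psi(r, K=1.0, sigma=1.0, beta=0.5):
    """Pairwise influence as a decreasing function of inter-agent distance."""
    return K / (sigma ** 2 + r ** 2) ** beta

for _ in range(steps):
    diff_x = x[None, :, :] - x[:, None, :]             # x_j - x_i
    diff_v = v[None, :, :] - v[:, None, :]             # v_j - v_i
    r = np.linalg.norm(diff_x, axis=-1)                # pairwise distances
    a = (psi(r)[:, :, None] * diff_v).mean(axis=1)     # velocity-alignment acceleration
    v += dt * a
    x += dt * v
```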
The existence of a universal learning architecture in human cognition is a widely spread conjecture supported by experimental findings from neuroscience. While no low-level implementation can yet be specified, an abstract outline of human perception and learning is believed to entail three basic properties: (a) hierarchical attention and processing, (b) memory-based knowledge representation, and (c) progressive learning and knowledge compaction. We approach the design of such a learning architecture from a systems-theoretic viewpoint, developing a closed-loop system with three main components: (i) a multi-resolution analysis pre-processor, (ii) a group-invariant feature extractor, and (iii) a progressive knowledge-based learning module. Multi-resolution feedback loops are used for learning, i.e., for adapting the system parameters to online observations. To design (i) and (ii), we build upon the well-established theory of wavelet-based multi-resolution analysis and the properties of group convolution operators. Regarding (iii), we introduce a novel learning algorithm that constructs progressively growing knowledge representations at multiple resolutions. The algorithm is an extension of the Online Deterministic Annealing (ODA) algorithm, which is based on annealing optimization and is solved using gradient-free stochastic approximation. ODA has inherent robustness and regularization properties and provides a means to progressively increase the complexity of the learning model, i.e., the number of neurons, as needed, through an intuitive bifurcation phenomenon. The proposed multi-resolution approach is hierarchical, progressive, knowledge-based, and interpretable. We illustrate the properties of the proposed architecture in the context of state-of-the-art learning algorithms and deep learning methods.
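As a rough illustration of the multi-resolution pre-processing in component (i), the sketch below computes a standard Haar multi-resolution decomposition of a 1-D signal, splitting it at each level into a coarse approximation and detail coefficients. This is a generic Haar transform under assumed names, not the paper's specific pre-processor.

```python
import numpy as np

def haar_multiresolution(signal, levels=3):
    """Return the coarse approximation and per-level detail coefficients."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        approx = approx[: len(approx) - len(approx) % 2]   # ensure even length
        pairs = approx.reshape(-1, 2)
        details.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2))   # detail coefficients
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)          # coarse approximation
    return approx, details

coarse, detail_coeffs = haar_multiresolution(np.sin(np.linspace(0, 8 * np.pi, 64)))
```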
At the core of virtually every iterative machine learning algorithm lies the problem of hyper-parameter tuning, which includes three major design parameters: (a) the complexity of the model, e.g., the number of neurons in a neural network, (b) the initial conditions, which heavily affect the behavior of the algorithm, and (c) the dissimilarity measure used to quantify its performance. We introduce an online prototype-based learning algorithm that can be viewed as a progressively growing competitive-learning neural network architecture for classification and clustering. The learning rule of the proposed approach is formulated as an online gradient-free stochastic approximation algorithm that solves a sequence of appropriately defined optimization problems, simulating an annealing process. The annealing nature of the algorithm helps avoid numerous local minima, provides robustness with respect to initial conditions, and offers a means to progressively increase the complexity of the learning model through an intuitive bifurcation phenomenon. The proposed approach is interpretable, requires minimal hyper-parameter tuning, and allows online control over the performance-complexity trade-off. Finally, we show that Bregman divergences naturally appear as a family of dissimilarity measures that play a central role in both the performance and the computational complexity of the learning algorithm.
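Two standard members of the Bregman family referred to here are the squared Euclidean distance (generated by the squared norm) and the generalized KL divergence (generated by negative entropy). The short sketch below computes both; the function names are illustrative.

```python
import numpy as np

def bregman_squared_euclidean(x, y):
    """Bregman divergence generated by phi(x) = ||x||^2: the squared Euclidean distance."""
    return float(np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2))

def bregman_generalized_kl(x, y, eps=1e-12):
    """Bregman divergence generated by negative entropy: the generalized KL divergence."""
    x = np.asarray(x, float) + eps
    y = np.asarray(y, float) + eps
    return float(np.sum(x * np.log(x / y) - x + y))

print(bregman_squared_euclidean([1.0, 2.0], [0.5, 2.5]))
print(bregman_generalized_kl([0.2, 0.8], [0.5, 0.5]))
```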
We demonstrate a proof-of-concept of a large language model conducting corporate lobbying related activities. We use an autoregressive large language model (OpenAI's text-davinci-003) to determine if proposed U.S. Congressional bills are relevant to specific public companies and provide explanations and confidence levels. For the bills the model deems as relevant, the model drafts a letter to the sponsor of the bill in an attempt to persuade the congressperson to make changes to the proposed legislation. We use hundreds of ground-truth labels of the relevance of a bill to a company to benchmark the performance of the model, which outperforms the baseline of predicting the most common outcome of irrelevance. We also test the ability to determine the relevance of a bill with the previous OpenAI GPT-3 model (text-davinci-002), which was state-of-the-art on many language tasks until text-davinci-003 was released on November 28, 2022. The performance of text-davinci-002 is worse than simply always predicting that a bill is irrelevant to a company. These results suggest that, as large language models continue to improve core natural language understanding capabilities, performance on corporate lobbying related tasks will continue to improve. We then discuss why this could be problematic for societal-AI alignment.
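A minimal sketch of the kind of prompt-based relevance query described here is shown below, using the legacy (pre-1.0) OpenAI Python SDK's completion interface. The prompt wording, parsing, and function name are illustrative assumptions, not the authors' pipeline.

```python
import openai  # legacy (<1.0) OpenAI Python SDK

def bill_relevance(bill_summary: str, company_description: str) -> str:
    """Ask the model whether a bill is relevant to a company, with explanation and confidence."""
    prompt = (
        "Company description:\n" + company_description + "\n\n"
        "Bill summary:\n" + bill_summary + "\n\n"
        "Is this bill relevant to the company? Answer YES or NO, then give a "
        "one-sentence explanation and a confidence level from 0 to 100."
    )
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=200,
        temperature=0.0,
    )
    return response["choices"][0]["text"].strip()
```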
Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
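The key structural ingredient mentioned here, an input convex neural network (ICNN), can be sketched as follows: weights acting on the previous hidden layer are kept nonnegative and the activation is convex and nondecreasing, so the scalar output is convex in the input; the (Brenier) map itself would then be the gradient of this convex potential. Sizes and initialization below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(3)
d, h = 2, 16
Wx0, b0 = rng.normal(size=(h, d)), np.zeros(h)
Wz1 = np.abs(rng.normal(size=(1, h)))        # nonnegative hidden-to-output weights
Wx1, b1 = rng.normal(size=(1, d)), np.zeros(1)

def icnn_potential(x):
    """Scalar convex potential: ReLU is convex nondecreasing, and a nonnegative
    combination of convex functions plus an affine term stays convex in x."""
    z1 = np.maximum(Wx0 @ x + b0, 0.0)
    out = Wz1 @ z1 + Wx1 @ x + b1
    return float(out[0])

print(icnn_potential(np.array([0.3, -0.7])))
```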
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras and two stereo cameras, in addition to lidar point clouds and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
In this paper we derive a PAC-Bayesian-Like error bound for a class of stochastic dynamical systems with inputs, namely, for linear time-invariant stochastic state-space models (stochastic LTI systems for short). This class of systems is widely used in control engineering and econometrics, in particular, they represent a special case of recurrent neural networks. In this paper we 1) formalize the learning problem for stochastic LTI systems with inputs, 2) derive a PAC-Bayesian-Like error bound for such systems, 3) discuss various consequences of this error bound.
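For concreteness, the model class the bound applies to can be simulated as a discrete-time stochastic LTI state-space system with inputs, x_{t+1} = A x_t + B u_t + w_t and y_t = C x_t + v_t, with i.i.d. noise. The matrices and noise levels below are arbitrary stable examples, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[0.8, 0.1], [0.0, 0.7]])   # stable state matrix (spectral radius < 1)
B = np.array([[1.0], [0.5]])             # input matrix
C = np.array([[1.0, 0.0]])               # output matrix

x = np.zeros(2)
ys = []
for t in range(100):
    u = np.array([np.sin(0.1 * t)])                  # deterministic input signal
    x = A @ x + B @ u + 0.05 * rng.normal(size=2)    # state update with process noise w_t
    ys.append(float((C @ x)[0] + 0.01 * rng.normal()))  # output with measurement noise v_t
```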