Poor performance in continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human-realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization-based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments, demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller.
translated by Google Translate
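The transfer-interference view in the MER abstract above can be illustrated with a minimal sketch. It assumes a linear model with squared loss and treats a positive dot product between two per-example gradients as transfer and a negative one as interference. The function names `grad_squared_loss` and `gradient_alignment` are hypothetical and are not taken from the paper.

```python
def grad_squared_loss(w, x, y):
    """Gradient of 0.5 * (w . x - y)**2 with respect to w (linear model)."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def gradient_alignment(w, example_a, example_b):
    """Dot product of the two per-example gradients at parameters w."""
    ga = grad_squared_loss(w, *example_a)
    gb = grad_squared_loss(w, *example_b)
    return sum(a * b for a, b in zip(ga, gb))


w = [0.0, 0.0]
# Similar inputs, same target: the gradients point the same way (transfer).
aligned = gradient_alignment(w, ([1.0, 0.0], 1.0), ([1.0, 0.5], 1.0))
# Same input, opposite targets: the gradients oppose each other (interference).
conflicting = gradient_alignment(w, ([1.0, 0.0], 1.0), ([1.0, 0.0], -1.0))
print(aligned, conflicting)
```

A step that reduces the loss on one example also reduces it on the other when the alignment is positive, and increases it when the alignment is negative, which is why encouraging alignment favors transfer over interference.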
Continual Learning (CL) is a field dedicated to devising algorithms able to achieve lifelong learning. Overcoming the disruption of previously acquired knowledge, a drawback of deep learning models known as catastrophic forgetting, is a hard challenge. Currently, deep learning methods can attain impressive results when the data modeled does not undergo a considerable distributional shift across subsequent learning sessions, but whenever we expose such systems to this incremental setting, performance drops very quickly. Overcoming this limitation is fundamental: first, it would allow us to build truly intelligent systems showing both stability and plasticity; second, it would remove the onerous need to retrain these architectures from scratch on the updated data. In this thesis, we tackle the problem from multiple directions. In a first study, we show that in rehearsal-based techniques (systems that use a memory buffer), the quantity of data stored in the rehearsal buffer is a more important factor than the quality of the data. Secondly, we present one of the early works on incremental learning with ViT architectures, comparing functional, weight, and attention regularization approaches, and propose an effective novel asymmetric loss. Finally, we present a study on pretraining and how it affects performance in Continual Learning, raising some questions about the effective progression of the field. We conclude with some future directions and closing remarks.
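The powerful learning ability of deep neural networks enables reinforcement learning agents to learn effective control policies directly from continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs, which unfortunately does not hold in the general reinforcement learning paradigm, where the training data is temporally correlated and non-stationary. This issue may lead to the phenomenon of "catastrophic interference" and a collapse in performance. In this paper, we present IQ, i.e., interference-aware deep Q-learning, to mitigate catastrophic interference in single-task deep reinforcement learning. Specifically, we resort to online clustering to achieve on-the-fly context division, together with a multi-head network and a knowledge distillation regularization term for preserving the policies of learned contexts. Built upon deep Q-networks, IQ consistently improves stability and performance compared to existing methods, as verified by extensive experiments on classic control and Atari tasks. The code is publicly available at: https://github.com/sweety-dm/interference-aware-ware-deep-q-learning.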
The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Neural networks are not, in general, capable of this, and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks which they have not experienced for a long time. Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks. We demonstrate that our approach is scalable and effective by solving a set of classification tasks based on the MNIST handwritten digit dataset and by learning several Atari 2600 games sequentially.
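The idea of selectively slowing down learning on important weights can be sketched with a quadratic penalty that pulls each weight back toward its old value in proportion to an importance score. This is an illustrative sketch, not the authors' implementation: the importance values, the constant new-task gradient, and the names `penalty_grad` and `train_on_new_task` are all assumptions made for the example.

```python
def penalty_grad(theta, theta_old, importance, lam):
    """Gradient of 0.5 * lam * sum_i importance[i] * (theta[i] - theta_old[i])**2."""
    return [lam * f * (t - t0) for t, t0, f in zip(theta, theta_old, importance)]


def train_on_new_task(theta_old, new_task_grad, importance, lam=1.0, lr=0.1, steps=50):
    """Gradient descent on the new-task loss plus the importance-weighted penalty."""
    theta = list(theta_old)
    for _ in range(steps):
        pen = penalty_grad(theta, theta_old, importance, lam)
        theta = [t - lr * (g + p) for t, g, p in zip(theta, new_task_grad, pen)]
    return theta


theta_old = [1.0, 1.0]
importance = [5.0, 0.0]      # hypothetical: weight 0 matters for the old task, weight 1 does not
new_task_grad = [1.0, 1.0]   # hypothetical constant gradient pushing both weights down

theta = train_on_new_task(theta_old, new_task_grad, importance)
drift = [abs(t - t0) for t, t0 in zip(theta, theta_old)]
print(drift)
```

Under these assumptions the important weight settles close to its old value while the unimportant weight moves freely, which is the qualitative behavior the abstract describes.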
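We develop a new continual meta-learning method to address challenges in sequential multi-task learning. In this setting, the agent's goal is to quickly achieve high reward over any sequence of tasks. Prior meta-reinforcement learning algorithms have demonstrated promising results in accelerating the acquisition of new tasks. However, they require access to all tasks during training. Beyond simply transferring past experience to new tasks, our goal is to devise continual reinforcement learning algorithms that learn to learn, using their experience on previous tasks to learn new tasks more quickly. We introduce a new method, continual meta-policy search (CoMPS), that removes this limitation by meta-training in an incremental fashion over each task in a sequence, without revisiting prior tasks. CoMPS continuously repeats two subroutines: learning a new task using RL, and using the experience from RL to perform completely offline meta-learning that prepares for subsequent task learning. We find that CoMPS outperforms prior continual learning and off-policy meta-reinforcement learning methods on several sequences of challenging continuous control tasks.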
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, and endeavours to extend this knowledge without targeting the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern (1) a taxonomy and extensive overview of the state-of-the-art; (2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner; (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks: Tiny Imagenet, the large-scale unbalanced iNaturalist, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time and storage.
Interacting with a complex world involves continual learning, in which tasks and data distributions change over time. A continual learning system should demonstrate both plasticity (acquisition of new knowledge) and stability (preservation of old knowledge). Catastrophic forgetting is the failure of stability, in which new experience overwrites previous experience. In the brain, replay of past experience is widely believed to reduce forgetting, yet it has been largely overlooked as a solution to forgetting in deep reinforcement learning. Here, we introduce CLEAR, a replay-based method that greatly reduces catastrophic forgetting in multi-task reinforcement learning. CLEAR leverages off-policy learning and behavioral cloning from replay to enhance stability, as well as on-policy learning to preserve plasticity. We show that CLEAR performs better than state-of-the-art deep learning techniques for mitigating forgetting, despite being significantly less complicated and not requiring any knowledge of the individual tasks being learned.
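A primary focus area in continual learning research is alleviating the "catastrophic forgetting" problem in neural networks by designing new algorithms that are more robust to distribution shifts. While recent progress in the continual learning literature is encouraging, our understanding of which properties of neural networks contribute to catastrophic forgetting is still limited. To address this, instead of focusing on continual learning algorithms, in this work we focus on the model itself and study the impact of the "width" of the neural network architecture on catastrophic forgetting, and show that width has a surprisingly significant effect on forgetting. To explain this effect, we study the learning dynamics of the network from various perspectives such as gradient orthogonality, sparsity, and the lazy training regime. We provide potential explanations that are consistent with the empirical results across different architectures and continual learning benchmarks.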
We introduce a conceptually simple and scalable framework for continual learning domains where tasks are learned sequentially. Our method is constant in the number of parameters and is designed to preserve performance on previously encountered tasks while accelerating learning progress on subsequent problems. This is achieved by training a network with two components: A knowledge base, capable of solving previously encountered problems, which is connected to an active column that is employed to efficiently learn the current task. After learning a new task, the active column is distilled into the knowledge base, taking care to protect any previously acquired skills. This cycle of active learning (progression) followed by consolidation (compression) requires no architecture growth, no access to or storing of previous data or tasks, and no task-specific parameters. We demonstrate the progress & compress approach on sequential classification of handwritten alphabets as well as two reinforcement learning domains: Atari games and 3D maze navigation.
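According to the Complementary Learning Systems (CLS) theory~\cite{mcclelland1995there} in neuroscience, humans achieve effective \emph{continual learning} through two complementary systems: a fast learning system centered on the hippocampus for rapid learning of the specifics of individual experiences; and a slow learning system located in the neocortex for the gradual acquisition of structured knowledge about the environment. Motivated by this theory, we propose \emph{DualNets} (for Dual Networks), a general continual learning framework comprising a fast learning system for supervised learning of pattern-separated representations from specific tasks and a slow learning system for learning task-agnostic general representations via Self-Supervised Learning (SSL). DualNets can seamlessly incorporate both representation types into a holistic framework to facilitate better continual learning in deep neural networks. Via extensive experiments, we demonstrate the promising results of DualNets on a wide range of continual learning protocols, ranging from the standard offline, task-aware setting to the challenging online, task-free scenario. Notably, this holds on the CTrL benchmarks~\cite{veniat2020…,ostapenko2021-continual}. Furthermore, we conduct comprehensive ablation studies to validate the efficacy, robustness, and scalability of DualNets. Code is publicly available at \url{https://github.com/phquang/dualnet}.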
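Online continual learning (OCL) aims to train neural networks incrementally from a non-stationary data stream with a single pass through the data. Rehearsal-based methods attempt to approximate the observed input distributions over time with a small memory and revisit them later to avoid forgetting. Despite their strong empirical performance, rehearsal methods still suffer from a poor approximation of the loss landscape of past data with memory samples. This paper revisits the rehearsal dynamics in online settings. We provide theoretical insights on the inherent memory overfitting risk from the viewpoint of biased and dynamic empirical risk minimization, and examine the merits and limits of repeated rehearsal. Inspired by our analysis, a simple and intuitive baseline, Repeated Augmented Rehearsal (RAR), is designed to address the underfitting-overfitting dilemma of online rehearsal. Surprisingly, across four rather different OCL benchmarks, this simple baseline outperforms vanilla rehearsal by 9%-17% and also significantly improves the state-of-the-art rehearsal-based methods MIR, ASER, and SCR. We also demonstrate that RAR successfully achieves an accurate approximation of the loss landscape of past data and high-loss ridge aversion in its learning trajectory. Extensive ablation studies are conducted to study the interplay between repeated and augmented rehearsal, and reinforcement learning (RL) is applied to dynamically adjust the hyperparameters of RAR to balance the stability-plasticity trade-off online.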
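The rehearsal loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the memory is filled by reservoir sampling, augmentation is applied to both incoming and replayed samples, and `update` and `augment` are hypothetical callbacks standing in for a model update and a data augmentation.

```python
import random


class ReservoirBuffer:
    """Fixed-size memory that keeps a uniform random sample of the stream seen so far."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)  # uniform index in [0, seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        return self.rng.sample(self.items, min(k, len(self.items)))


def repeated_augmented_rehearsal(stream, buffer, update, augment, repeats=3, replay_size=2):
    """For each incoming batch, run several updates on augmented incoming plus replayed data."""
    for batch in stream:
        for _ in range(repeats):
            replay = buffer.sample(replay_size)
            update([augment(x) for x in batch + replay])
        for x in batch:
            buffer.add(x)  # store raw samples only after training on the batch


calls = []
buf = ReservoirBuffer(capacity=4)
stream = [[1, 2], [3, 4], [5, 6]]
repeated_augmented_rehearsal(stream, buf, update=lambda b: calls.append(len(b)), augment=lambda x: x)
print(len(calls), buf.seen, len(buf.items))
```

Setting `repeats=1` and using an identity `augment` reduces the loop to plain single-pass rehearsal, which makes the two added ingredients easy to ablate separately.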
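The human continual learning (CL) ability is closely related to the stability-versus-plasticity dilemma, which describes how humans achieve ongoing learning capacity while preserving learned information. The notion of CL has been present in artificial intelligence (AI) since its birth. This paper presents a comprehensive review of CL. Unlike previous reviews, which mainly focus on the catastrophic forgetting phenomenon in CL, this paper surveys CL from a more macroscopic perspective based on the stability-versus-plasticity mechanism. Analogous to their biological counterparts, "smart" AI agents are supposed to i) remember previously learned information (information retrospection); ii) continuously infer new information (information prospection); and iii) transfer useful information (information transfer), in order to achieve high-level CL. Following this taxonomy, evaluation metrics, algorithms, applications, and some open issues are then introduced. Our main contributions concern i) re-examining CL from the level of artificial general intelligence; ii) providing a detailed and extensive overview of CL topics; iii) presenting some novel ideas on the potential development of CL.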
Progress in continual reinforcement learning has been limited due to several barriers to entry: missing code, high compute requirements, and a lack of suitable benchmarks. In this work, we present CORA, a platform for Continual Reinforcement Learning Agents that provides benchmarks, baselines, and metrics in a single code package. The benchmarks we provide are designed to evaluate different aspects of the continual RL challenge, such as catastrophic forgetting, plasticity, ability to generalize, and sample-efficient learning. Three of the benchmarks utilize video game environments (Atari, Procgen, NetHack). The fourth benchmark, CHORES, consists of four different task sequences in a visually realistic home simulator, drawn from a diverse set of task and scene parameters. To compare continual RL methods on these benchmarks, we prepare three metrics in CORA: Continual Evaluation, Isolated Forgetting, and Zero-Shot Forward Transfer. Finally, CORA includes a set of performant, open-source baselines of existing algorithms for researchers to use and expand on. We release CORA and hope that the continual RL community can benefit from our contributions, to accelerate the development of new continual RL algorithms.
Humans and animals have the ability to continually acquire, fine-tune, and transfer knowledge and skills throughout their lifespan. This ability, referred to as lifelong learning, is mediated by a rich set of neurocognitive mechanisms that together contribute to the development and specialization of our sensorimotor skills as well as to long-term memory consolidation and retrieval. Consequently, lifelong learning capabilities are crucial for computational systems and autonomous agents interacting in the real world and processing continuous streams of information. However, lifelong learning remains a long-standing challenge for machine learning and neural network models since the continual acquisition of incrementally available information from non-stationary data distributions generally leads to catastrophic forgetting or interference. This limitation represents a major drawback for state-of-the-art deep neural network models that typically learn representations from stationary batches of training data, thus without accounting for situations in which information becomes incrementally available over time. In this review, we critically summarize the main challenges linked to lifelong learning for artificial learning systems and compare existing neural network approaches that alleviate, to different extents, catastrophic forgetting. Although significant advances have been made in domain-specific learning with neural networks, extensive research efforts are required for the development of robust lifelong learning on autonomous agents and robots. We discuss well-established and emerging research motivated by lifelong learning factors in biological systems such as structural plasticity, memory replay, curriculum and transfer learning, intrinsic motivation, and multisensory integration.
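Modularity is a compelling solution to continual learning (CL), the problem of modeling sequences of related tasks. Learning and then composing modules to solve different tasks provides an abstraction to address the principal challenges of CL, including catastrophic forgetting, backward and forward transfer across tasks, and sub-linear model growth. We introduce local module composition (LMC), an approach to modular CL in which each module is provided with a local structural component that estimates the module's relevance to the input. Dynamic module composition is performed based on local relevance scores. We demonstrate that agnosticity to task identities (IDs) arises from (local) structural learning that is module-specific, as opposed to task- and/or model-specific as in previous works, making LMC applicable to more CL settings than previous works. In addition, LMC tracks statistics about the input distribution and adds new modules when outlier samples are detected. In the first set of experiments, LMC performs favorably compared to existing methods on the recent Continual Transfer-learning Benchmark without requiring task identities. In another study, we show that the locality of structural learning allows LMC to interpolate to related but unseen tasks (OOD), as well as to compose modular networks trained independently on different task sequences without any fine-tuning. Finally, in search of the limitations of LMC, we study it on more challenging sequences of 30 and 100 tasks, demonstrating that local module selection becomes more challenging in the presence of a large number of candidate modules. In this setting, the best-performing LMC spawns fewer modules than an oracle-based baseline, but it reaches a lower overall accuracy. The codebase is available at https://github.com/oleksost/lmc.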
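A key challenge for AI is to build embodied systems that operate in dynamically changing environments. Such systems must adapt to changing task contexts and learn continuously. Although standard deep learning systems achieve state-of-the-art results on static benchmarks, they often struggle in dynamic scenarios. In these settings, error signals from multiple contexts can interfere with one another, ultimately leading to a phenomenon known as catastrophic forgetting. In this article we investigate biologically inspired architectures as solutions to these problems. Specifically, we show that the biophysical properties of dendrites and local inhibitory systems enable networks to dynamically restrict and route information in a context-specific manner. Our key contributions are as follows. First, we propose a novel artificial neural network architecture that incorporates active dendrites and sparse representations into the standard deep learning framework. Next, we study the performance of this architecture on two separate benchmarks requiring task-based adaptation: Meta-World, a multi-task reinforcement learning environment where a robotic agent must learn to solve a variety of manipulation tasks simultaneously; and a continual learning benchmark in which the model's prediction task changes throughout training. Analysis on both benchmarks demonstrates the emergence of overlapping but distinct and sparse subnetworks, allowing the system to learn fluidly with minimal forgetting. Our neural implementation marks the first time a single architecture has achieved competitive results in both multi-task and continual learning settings. Our research sheds light on how the biological properties of neurons can inform deep learning systems to address dynamic scenarios that are typically impossible for traditional ANNs to solve.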
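In lifelong learning systems based on artificial neural networks, one of the biggest obstacles is the inability to retain old knowledge as new information is encountered. This phenomenon is known as catastrophic forgetting. In this paper, we propose a new kind of connectionist architecture, the Sequential Neural Coding Network, that is robust to forgetting when learning from streams of data points and, unlike the networks of today, does not learn via the popular back-propagation of errors. Grounded in the neurocognitive theory of predictive processing, our model adapts synapses in a biologically plausible fashion, while another neural system learns to direct and control this cortex-like structure, mimicking some of the task-executive control functionality of the basal ganglia. In our experiments, we demonstrate that our self-organizing system experiences significantly less forgetting than standard neural models, outperforming previously proposed methods, including rehearsal/data-buffer-based methods, on both standard (SplitMNIST, etc.) and custom benchmarks, even though it is trained in a stream-like fashion. Our work offers evidence that emulating mechanisms of real neuronal systems, e.g., local learning and lateral competition, can yield new directions and possibilities for tackling the grand challenge of lifelong machine learning.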
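Continual learning aims to rapidly and continually learn the current task from a sequence of tasks. Compared to other kinds of methods, those based on experience replay have shown great advantages in overcoming catastrophic forgetting. One common limitation of these methods is the data imbalance between previous and current tasks, which further aggravates forgetting. Moreover, how to effectively address the stability-plasticity dilemma in this setting is also an urgent problem. In this paper, we overcome these challenges by proposing a novel framework called Meta-learning update via Multi-scale Knowledge Distillation and Data Augmentation (MMKDDA). Specifically, we apply multi-scale knowledge distillation to capture the evolution of long-range and short-range spatial relationships at different feature levels, alleviating the data imbalance problem. In addition, our method mixes samples from the episodic memory and the current task during the online continual training procedure, thereby alleviating the side effects of changes in the probability distribution. Moreover, we optimize our model via a meta-learning update that takes into account the number of tasks seen previously, which helps to keep a better balance between stability and plasticity. Finally, our experimental evaluation on four benchmark datasets shows the effectiveness of the proposed MMKDDA framework against other popular baselines, and ablation studies are conducted to further analyze the role of each component in our framework.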
Many real-world learning scenarios face the challenge of slow concept drift, where data distributions change gradually over time. In this setting, we pose the problem of learning temporally sensitive importance weights for training data, in order to optimize predictive accuracy. We propose a class of temporal reweighting functions that can capture multiple timescales of change in the data, as well as instance-specific characteristics. We formulate a bi-level optimization criterion, and an associated meta-learning algorithm, by which these weights can be learned. In particular, our formulation trains an auxiliary network to output weights as a function of training instances, thereby compactly representing the instance weights. We validate our temporal reweighting scheme on a large real-world dataset of 39M images spread over a 9-year period. Our extensive experiments demonstrate the necessity of instance-based temporal reweighting in the dataset, and achieve significant improvements over classical batch-learning approaches. Further, our proposal easily generalizes to a streaming setting and shows significant gains compared to recent continual learning methods.
A growing body of research in continual learning focuses on the catastrophic forgetting problem. While many attempts have been made to alleviate this problem, the majority of the methods assume a single model in the continual learning setup. In this work, we question this assumption and show that employing ensemble models can be a simple yet effective method to improve continual performance. However, ensembles' training and inference costs can increase significantly as the number of models grows. Motivated by this limitation, we study different ensemble models to understand their benefits and drawbacks in continual learning scenarios. Finally, to overcome the high compute cost of ensembles, we leverage recent advances in neural network subspace to propose a computationally cheap algorithm with similar runtime to a single model yet enjoying the performance benefits of ensembles.