Lifelong learning aims to create AI systems that continuously and incrementally learn during a lifetime, similar to biological learning. Attempts so far have met problems, including catastrophic forgetting, interference among tasks, and the inability to exploit previous knowledge. While considerable research has focused on learning multiple input distributions, typically in classification, lifelong reinforcement learning (LRL) must also deal with variations in the state and transition distributions, and in the reward functions. Modulating masks, recently developed for classification, are particularly suitable to deal with such a large spectrum of task variations. In this paper, we adapted modulating masks to work with deep LRL, specifically PPO and IMPALA agents. The comparison with LRL baselines in both discrete and continuous RL tasks shows competitive performance. We further investigated the use of a linear combination of previously learned masks to exploit previous knowledge when learning new tasks: not only is learning faster, the algorithm solves tasks that we could not otherwise solve from scratch due to extremely sparse rewards. The results suggest that RL with modulating masks is a promising approach to lifelong learning, to the composition of knowledge to learn increasingly complex tasks, and to knowledge reuse for efficient and faster learning.
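The idea of reusing earlier masks through a linear combination can be sketched as follows. This is a minimal illustration and not the paper's implementation: the abstract does not specify how masks are parameterised, so the real-valued mask scores, the softmax normalisation of the combination coefficients, the threshold binarisation, and all function names below are assumptions made for the example.

```python
import math

def softmax(values):
    """Normalise combination coefficients so they sum to 1 (assumed choice)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def combine_mask_scores(score_sets, coeffs):
    """Weighted sum of per-task mask scores; every score set has the same shape."""
    weights = softmax(coeffs)
    rows, cols = len(score_sets[0]), len(score_sets[0][0])
    return [
        [sum(w * s[i][j] for w, s in zip(weights, score_sets)) for j in range(cols)]
        for i in range(rows)
    ]

def binarize(scores, threshold=0.0):
    """Turn real-valued scores into a 0/1 mask (assumed thresholding rule)."""
    return [[1.0 if v > threshold else 0.0 for v in row] for row in scores]

def masked_forward(weights, mask, x):
    """Compute y = (W * mask) x, where the backbone weights W stay fixed."""
    return [
        sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        for w_row, m_row in zip(weights, mask)
    ]

# Hypothetical fixed backbone and two previously learned sets of mask scores.
backbone = [[1.0, 2.0], [3.0, 4.0]]
scores_task_a = [[1.0, -1.0], [1.0, -1.0]]
scores_task_b = [[-1.0, 1.0], [-1.0, 1.0]]

# Coefficients favouring task A; in a learning setting these would be trained.
combined = combine_mask_scores([scores_task_a, scores_task_b], [2.0, 0.0])
new_mask = binarize(combined)
print(masked_forward(backbone, new_mask, [1.0, 1.0]))
```

In this sketch only the coefficients (and, in a fuller version, a new task-specific score set) would be learned for a new task, while the backbone and the earlier masks stay untouched, which is what keeps earlier tasks from being overwritten.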
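The robustness of any machine learning solution is fundamentally bounded by the data it was trained on. One way to generalize beyond the original training data is through human-informed augmentation of the original dataset. However, it is impossible to specify every failure case that may occur during deployment. To address this limitation, we combine model-based reinforcement learning with model-interpretability methods and propose a solution that self-generates simulated scenarios constrained by environmental concepts and dynamics learned in an unsupervised manner. In particular, the internal model of the agent's environment is conditioned on low-dimensional concept representations of the input space that are sensitive to the agent's actions. We demonstrate this method on a simple point-to-point navigation task in a standard realistic driving simulator, where we show large improvements over model-based and model-free approaches in one-shot generalization to different instances of specified failure cases and in zero-shot generalization to similar variations.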
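Meta-reinforcement learning (meta-RL) algorithms enable agents to adapt quickly to tasks from few samples in dynamic environments. Such a feat is achieved through dynamic representations in the agent's policy network, obtained by reasoning about task context, by updating model parameters, or both. However, obtaining rich dynamic representations for fast adaptation beyond simple benchmark problems is challenging because of the burden placed on the policy network to accommodate different policies. This paper addresses the challenge by introducing neuromodulation as a modular component that augments a standard policy network and regulates neuronal activity in order to produce efficient dynamic representations for task adaptation. The proposed extension to the policy network is evaluated across multiple discrete and continuous control environments of increasing complexity. To demonstrate the generality and benefits of the extension in meta-RL, the neuromodulated network is applied to two state-of-the-art meta-RL algorithms (CAVIA and PEARL). The results show that meta-RL augmented with neuromodulation produces significantly better results and richer dynamic representations than the baselines.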
In this paper, a complete framework for autonomous self-driving is implemented. LIDAR, camera, and IMU sensors are used together. All data communication is managed using the Robot Operating System (ROS), which provides a robust platform for implementing robotics projects. A Jetson Nano provides powerful on-board processing capabilities. Sensor fusion is performed on the data received from the different sensors to improve the accuracy of the decisions and inferences derived from the data. The fused data is then used to create a localized map of the environment. In this step, the position of the vehicle is obtained with respect to the map built from the sensor data. The SLAM techniques used for this purpose are Hector Mapping and GMapping, both widely used mapping techniques in ROS. Apart from SLAM, which primarily uses LIDAR data, visual odometry is implemented using a monocular camera. The sensor-fused data is then used by Adaptive Monte Carlo Localization to localize the car. Using the localized map, path planning techniques such as the "TEB planner" and the "Dynamic Window Approach" are implemented for autonomous navigation of the vehicle. The last step in the project is the implementation of control, the final decision-making block in the pipeline, which outputs speed and steering commands for navigation that are compatible with Ackermann kinematics. The implementation of such a control block under a ROS framework using the three sensors, viz. LIDAR, camera, and IMU, is a novel approach undertaken in this project.
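The final control step, turning a planner's velocity command into speed and steering values compatible with Ackermann kinematics, can be sketched with the standard bicycle-model relation, steering angle = atan(wheelbase × angular velocity / linear velocity). This is an illustration under stated assumptions rather than the project's control block: the function name, the 30-degree steering limit, and the choice to stop the vehicle when the commanded speed is near zero are all hypothetical.

```python
import math

def twist_to_ackermann(linear_v, angular_w, wheelbase, max_steer=math.radians(30)):
    """Convert (linear velocity m/s, angular velocity rad/s) to (speed, steering angle).

    Uses the bicycle-model relation steer = atan(wheelbase * w / v).
    """
    if abs(linear_v) < 1e-6:
        # An Ackermann vehicle cannot rotate in place; this sketch simply stops.
        return 0.0, 0.0
    steer = math.atan(wheelbase * angular_w / linear_v)
    # Clamp to the mechanical steering limit (assumed value).
    steer = max(-max_steer, min(max_steer, steer))
    return linear_v, steer

# Example: 2 m/s forward, 0.5 rad/s yaw rate, 0.4 m wheelbase.
speed, steering = twist_to_ackermann(2.0, 0.5, 0.4)
print(speed, steering)
```

Planners that assume a differential-drive robot can request in-place rotation, which is why the near-zero-speed case needs an explicit policy on a car-like platform.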
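Reinforcement learning agents perform well when presented with inputs within the distribution of those encountered during training. However, when faced with novel, out-of-distribution events, they cannot respond effectively until they have undergone additional training. This paper presents an online, data-driven, emergency-response method that aims to give autonomous agents the ability to react to unexpected situations that are very different from those they were trained or designed to handle. In such situations, learned policies cannot be expected to perform appropriately, because the observations obtained in these novel situations fall outside the distribution of inputs the agent has been optimized to handle. The proposed approach devises a customized response to the unforeseen situation sequentially, by selecting actions that minimize the rate of increase of the reconstruction error from a variational autoencoder. This optimization is carried out online in a data-efficient manner (on the order of 30 data points) using a modified Bayesian optimization procedure. We demonstrate the potential of this approach in a simulated 3D car-driving scenario, in which the agent devises a response in under 2 seconds to avoid collisions with objects it has not seen during training.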
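CMOS sensors employ a row-wise acquisition mechanism while imaging a scene, which can result in undesired motion artifacts in the captured image known as rolling shutter (RS) distortions. Existing single-image RS rectification methods attempt to account for these distortions either with algorithms tailored to a specific class of scenes, which require knowledge of the intrinsic camera parameters, or with a learning-based framework that relies on known ground-truth motion parameters. In this paper, we propose an end-to-end deep neural network for the challenging task of single-image RS rectification. Our network consists of a motion block, a trajectory module, a row block, an RS rectification module, and an RS regeneration module (used only during training). The motion block predicts the camera pose for every row of the input RS-distorted image, while the trajectory module fits the estimated motion parameters to a third-order polynomial. The row block predicts the camera motion that must be associated with every pixel in the target, i.e., the RS-rectified image. Finally, the RS rectification module uses the motion trajectory and the output of the row block to warp the input RS image into a distortion-free image. For faster convergence during training, we additionally use an RS regeneration module, which compares the input RS image with the ground-truth image distorted by the estimated motion parameters. The end-to-end formulation of our model does not constrain the estimated motion to ground-truth motion parameters, and thereby successfully rectifies RS images with complex real-life camera motion. Experiments on synthetic and real datasets show that our network outperforms prior art both qualitatively and quantitatively.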
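The role of a third-order polynomial as a smooth trajectory over image rows can be illustrated with an ordinary least-squares fit. This sketch only shows the parameterisation idea: in the paper the trajectory module is part of a trained network, whereas the closed-form fit, the normalised row coordinate, and the function names here are assumptions made for the example.

```python
def fit_cubic(ts, ys):
    """Least-squares fit of y ~ c0 + c1*t + c2*t^2 + c3*t^3 via normal equations."""
    n = 4
    # Normal equations (A^T A) c = A^T y for the design matrix with columns t^0..t^3.
    ata = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    aty = [sum((t ** i) * y for t, y in zip(ts, ys)) for i in range(n)]
    aug = [row + [b] for row, b in zip(ata, aty)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - rest) / aug[i][i]
    return coeffs

def eval_cubic(coeffs, t):
    """Evaluate the fitted polynomial at normalised row coordinate t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))

# Hypothetical per-row estimates of one motion parameter (e.g. a rotation angle),
# sampled at normalised row positions t in [0, 1].
rows = [i / 7 for i in range(8)]
estimates = [0.5 - 1.0 * t + 2.0 * t ** 2 + 0.25 * t ** 3 for t in rows]
coeffs = fit_cubic(rows, estimates)
print([round(c, 6) for c in coeffs])
```

Constraining per-row motion to a low-order polynomial keeps the trajectory smooth across rows, so noisy estimates for individual rows cannot produce jagged warps.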
Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computational efficiency goals, and it has recently been used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially distributed rewards. We report its performance in simulated 2-armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI-modified design shows operating characteristics that are comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential of designs that use a GI approach to allocate participants to improve participant benefits, increase efficiency, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
Modelling and forecasting real-life human behaviour using online social media is an active endeavour of interest in politics, government, academia, and industry. Since its creation in 2006, Twitter has been proposed as a potential laboratory that could be used to gauge and predict social behaviour. During the last decade, the user base of Twitter has been growing and becoming more representative of the general population. Here we analyse this user base in the context of the 2021 Mexican Legislative Election. To do so, we use a dataset of 15 million election-related tweets in the six months preceding election day. We explore different election models that assign political preference to either the ruling parties or the opposition. We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods. These results demonstrate that analysis of public online data can outperform conventional polling methods, and that political analysis and general forecasting would likely benefit from incorporating such data in the immediate future. Moreover, the same Twitter dataset with geographical attributes is positively correlated with results from official census data on population and internet usage in Mexico. These findings suggest that we have reached a period in time when online activity, appropriately curated, can provide an accurate representation of offline behaviour.
Existing federated classification algorithms typically assume the local annotations at every client cover the same set of classes. In this paper, we aim to lift such an assumption and focus on a more general yet practical non-IID setting where every client can work on non-identical and even disjoint sets of classes (i.e., client-exclusive classes), and the clients have a common goal which is to build a global classification model to identify the union of these classes. Such heterogeneity in client class sets poses a new challenge: how to ensure different clients are operating in the same latent space so as to avoid the drift after aggregation? We observe that the classes can be described in natural languages (i.e., class names) and these names are typically safe to share with all parties. Thus, we formulate the classification problem as a matching process between data representations and class representations and break the classification model into a data encoder and a label encoder. We leverage the natural-language class names as the common ground to anchor the class representations in the label encoder. In each iteration, the label encoder updates the class representations and regulates the data representations through matching. We further use the updated class representations at each round to annotate data samples for locally-unaware classes according to similarity and distill knowledge to local models. Extensive experiments on four real-world datasets show that the proposed method can outperform various classical and state-of-the-art federated learning methods designed for learning with non-IID data.
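Classification as a matching process between data representations and class representations can be sketched as a nearest-class lookup by similarity. The example below is a simplified illustration: the abstract does not state the similarity measure, so cosine similarity is an assumption, and the hard-coded vectors merely stand in for the outputs of the data encoder and the label encoder.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(data_repr, class_reprs):
    """Return the class name whose representation best matches the data representation."""
    return max(class_reprs, key=lambda name: cosine(data_repr, class_reprs[name]))

# Hypothetical label-encoder outputs, keyed by natural-language class name.
class_reprs = {
    "cat": [1.0, 0.0, 0.2],
    "dog": [0.0, 1.0, 0.2],
    "car": [0.1, 0.1, 1.0],
}

# Hypothetical data-encoder output for one sample.
sample = [0.9, 0.1, 0.3]
print(classify(sample, class_reprs))
```

Because every client scores samples against the same shared set of class-name representations, a client can also assign pseudo-labels for classes that are absent from its own local annotations.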
This paper addresses smooth function approximation by neural networks (NNs). Mathematical or physical functions can be replaced by NN models through regression. In this study, we obtain NNs that generate highly accurate and highly smooth functions while comprising only a few weight parameters, by discussing several topics in regression. First, we reinterpret the inner workings of NNs for regression; consequently, we propose a new activation function, the integrated sigmoid linear unit (ISLU). Then, the special characteristics of metadata for regression, which differ from those of other data such as images or sound, are discussed with a view to improving the performance of neural networks. Finally, a simple hierarchical NN that generates models substituting for mathematical functions is presented, and a new batch concept, the "meta-batch", which improves the performance of NNs several times over, is introduced. The new activation function, the meta-batch method, the features of numerical data, meta-augmentation with metaparameters, and an NN structure that generates a compact multi-layer perceptron (MLP) are the essential elements of this study.