Reinforcement learning-based (RL-based) energy management strategy (EMS) is considered a promising solution for the energy management of electric vehicles with multiple power sources. It has been shown to outperform conventional methods in energy management problems regarding energy-saving and real-time performance. However, previous studies have not systematically examined the essential elements of RL-based EMS. This paper presents an empirical analysis of RL-based EMS in a Plug-in Hybrid Electric Vehicle (PHEV) and Fuel Cell Electric Vehicle (FCEV). The empirical analysis is developed in four aspects: algorithm, perception and decision granularity, hyperparameters, and reward function. The results show that the Off-policy algorithm effectively develops a more fuel-efficient solution within the complete driving cycle compared with other algorithms. Improving the perception and decision granularity does not produce a more desirable energy-saving solution but better balances battery power and fuel consumption. The equivalent energy optimization objective based on the instantaneous state of charge (SOC) variation is parameter sensitive and can help RL-EMSs to achieve more efficient energy-cost strategies.
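As a rough illustration of the equivalent energy objective based on instantaneous SOC variation mentioned above, the sketch below combines fuel energy and battery energy into one per-step reward. The function name, the equivalence factor, the battery capacity, and the fuel heating value are all hypothetical placeholders, not values from the paper; the equivalence factor is the parameter the abstract describes as sensitive.

```python
def equivalent_energy_reward(fuel_rate_g_per_s, soc_prev, soc_next,
                             dt_s=1.0,
                             battery_capacity_kwh=10.0,   # assumed pack size
                             fuel_lhv_kj_per_g=42.5,      # assumed heating value
                             equivalence_factor=2.5):     # assumed tuning weight
    """Negative equivalent energy (kJ) consumed over one control step."""
    # Chemical energy of the fuel burned during this step.
    fuel_energy_kj = fuel_rate_g_per_s * dt_s * fuel_lhv_kj_per_g
    # Instantaneous SOC variation: positive when the battery is discharged.
    delta_soc = soc_prev - soc_next
    # Convert the SOC change to battery energy (1 kWh = 3600 kJ).
    battery_energy_kj = delta_soc * battery_capacity_kwh * 3600.0
    # Weight the electrical energy so it is comparable with fuel energy.
    return -(fuel_energy_kj + equivalence_factor * battery_energy_kj)
```

With this form, charging the battery (SOC rising) reduces the penalty, so a larger equivalence factor pushes the agent to preserve charge at the expense of fuel.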
translated by 谷歌翻译
The high emissions and low energy efficiency of internal combustion engines (ICEs) have become unacceptable under environmental regulations and the energy crisis. As a promising alternative, multi-power source electric vehicles (MPS-EVs) introduce different clean energy systems to improve powertrain efficiency. The energy management strategy (EMS) is a critical technology for MPS-EVs to maximize efficiency, fuel economy, and range. Reinforcement learning (RL) has become an effective methodology for developing EMSs. RL has received continuous attention and research, but a systematic analysis of the design elements of RL-based EMS is still lacking. To this end, this paper presents an in-depth analysis of current research on RL-based EMS (RL-EMS) and summarizes its design elements. The paper first summarizes previous applications of RL in EMS from five aspects: algorithm, perception scheme, decision scheme, reward function, and innovative training method. The contribution of advanced algorithms to the training effect is shown, the perception and control schemes in the literature are analyzed in detail, different reward function settings are classified, and innovative training methods and their roles are elaborated. Then, by comparing the development routes of RL and RL-EMS, the paper identifies the gap between advanced RL solutions and existing RL-EMS. Finally, it suggests potential development directions for implementing advanced artificial intelligence (AI) solutions in EMS.
Real-time application of energy management strategies (EMSs) in hybrid electric vehicles (HEVs) is among the most demanding requirements facing researchers and engineers. Inspired by the excellent problem-solving capabilities of deep reinforcement learning (DRL), this paper proposes a real-time EMS that combines DRL with transfer learning (TL). The EMSs are derived from and evaluated on real-world driving cycle data collected by the Transportation Secure Data Center (TSDC). The DRL algorithm used is proximal policy optimization (PPO), which belongs to the family of policy gradient (PG) techniques. Specifically, many source driving cycles are used to train the parameters of a deep network with PPO. The learned parameters are then transferred to the target driving cycles under the TL framework. The EMSs for the target driving cycles are evaluated and compared under different training conditions. Simulation results indicate that the presented transfer DRL-based EMS can effectively reduce time consumption while guaranteeing control performance.
With the growing need to reduce energy consumption and greenhouse gas emissions, eco-driving strategies provide a significant opportunity for additional fuel savings on top of other technological solutions being pursued in the transportation sector. In this paper, a model-free deep reinforcement learning (RL) control agent is proposed for active eco-driving assistance that trades off fuel consumption against other driver-accommodation objectives and learns optimal traction torque and transmission shifting policies from experience. The training scheme for the proposed RL agent uses an off-policy actor-critic architecture that iteratively performs policy evaluation with a multi-step return and policy improvement with the maximum a posteriori policy optimization algorithm for hybrid action spaces. The proposed eco-driving RL agent is implemented on a commercial vehicle in car-following traffic. It shows superior performance in minimizing fuel consumption compared to a baseline controller that has full knowledge of fuel-efficiency tables.
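The policy evaluation step above relies on a multi-step return. The sketch below shows the plain (uncorrected) n-step return only: discounted rewards over n steps plus a bootstrapped critic value. The function name and default discount are illustrative assumptions, and any off-policy correction the authors may apply is deliberately omitted.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Return sum_k gamma^k * r_k + gamma^n * V(s_n) for n = len(rewards)."""
    # Start from the critic's estimate at the state reached after n steps.
    g = bootstrap_value
    # Fold rewards in from the last step back to the first.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

Longer reward sequences reduce reliance on the critic's estimate at the cost of higher variance, which is the usual trade-off when choosing n.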
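The electrification of transportation requires an increasing number of electric components on vehicles (e.g., electric motors and electric energy storage systems), and control of electric powertrains usually involves multiple inputs and multiple outputs (MIMO). This paper focuses on the online optimization of the energy management strategy for a multi-mode hybrid electric vehicle based on multi-agent reinforcement learning (MARL) algorithms, which aim to address MIMO control optimization, whereas most existing methods deal only with single-output control. Based on an analysis of the energy efficiency of a multi-mode hybrid electric vehicle (HEV) optimized by a deep deterministic policy gradient (DDPG)-based MARL algorithm, a new collaborative cyber-physical learning scheme with multiple agents is proposed. A learning driving cycle is then set by a novel random method to speed up the training process. Finally, network design, learning rate, and policy noise are included in a sensitivity analysis, the DDPG-based algorithm parameters are determined, and the learning performance under different relationships between the agents is studied, showing that a not completely independent relationship with a ratio of 0.2 is the best. A comparison study between the single-agent and multi-agent schemes indicates that the multi-agent scheme achieves a 4% improvement in total energy over the single-agent scheme. The multi-objective control by MARL can therefore achieve good optimization effects and application efficiency.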
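Multi-source electromechanical coupling makes the energy management of fuel cell electric vehicles (FCEVs) relatively nonlinear and complex, especially for four-wheel-drive (4WD) FCEVs. Accurate state observation of the complex nonlinear system is the basis for excellent energy management in FCEVs. To release the energy-saving potential of FCEVs, a novel learning-based robust model predictive control (LRMPC) strategy is proposed for 4WD FCEVs, contributing to suitable power distribution among multiple energy sources. The well-designed strategy, based on machine learning (ML), translates knowledge of the nonlinear system into an explicit control scheme with excellent robustness. First, an ML method with high regression accuracy and strong generalization ability is trained offline to build a precise state observer for the SOC. Then, explicit data tables of the SOC generated by the state observer are used to capture accurate state changes, with input features that include vehicle states and vehicle component states. Specifically, a vehicle speed estimator that provides the future speed reference is constructed with deep forest. Next, the components, including the explicit data tables and the vehicle speed estimator, are combined with model predictive control (MPC) to release the state-of-the-art energy-saving capability of the multi-energy-source system in FCEVs; the resulting strategy is named LRMPC. Finally, a detailed evaluation is carried out in simulation tests to validate the performance of LRMPC. The results highlight the optimal control effect and strong real-time applicability of LRMPC.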
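This paper proposes a reinforcement learning (RL)-based eco-driving framework for electric connected vehicles (CVs) to improve vehicle energy efficiency at signalized intersections. Safe operation of the vehicle agent is ensured by integrating a model-based car-following policy, a lane-changing policy, and the RL policy. A Markov decision process (MDP) is then formulated that enables the vehicle to perform longitudinal control and lateral decisions, jointly optimizing the car-following and lane-changing behaviors of CVs near intersections. Next, the hybrid action space is parameterized as a hierarchical structure, so that agents are trained with two-dimensional motion patterns in a dynamic traffic environment. Finally, the proposed method is evaluated in the SUMO software from both a single-vehicle perspective and a flow-based perspective. The results show that the strategy can significantly reduce energy consumption by learning proper action schemes without interrupting other human-driven vehicles (HDVs).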
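Autonomous driving technology is expected not only to improve mobility and road safety but also to bring energy-efficiency benefits. In the foreseeable future, autonomous vehicles (AVs) will operate on roads shared with human-driven vehicles. To maintain safety and liveness while minimizing energy consumption, the AV planning and decision-making process should account for the interactions between the autonomous ego vehicle and the surrounding human-driven vehicles. In this chapter, we describe a framework for developing energy-efficient autonomous driving policies on shared roads by modeling human driver behavior based on cognitive hierarchy theory and reinforcement learning.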
Eco-driving strategies have been shown to provide significant reductions in fuel consumption. This paper outlines an active driver assistance approach that uses a residual policy learning (RPL) agent trained to provide residual actions to default power train controllers while balancing fuel consumption against other driver-accommodation objectives. Using previous experiences, our RPL agent learns improved traction torque and gear shifting residual policies to adapt the operation of the powertrain to variations and uncertainties in the environment. For comparison, we consider a traditional reinforcement learning (RL) agent trained from scratch. Both agents employ the off-policy Maximum A Posteriori Policy Optimization algorithm with an actor-critic architecture. By implementing on a simulated commercial vehicle in various car-following scenarios, we find that the RPL agent quickly learns significantly improved policies compared to a baseline source policy but in some measures not as good as those eventually possible with the RL agent trained from scratch.
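The residual idea described above can be sketched in a few lines: the learned agent outputs only a correction that is added to the default controller's command, and the sum is clipped to the actuator limits. The function name, the scalar torque interface, and the limits are hypothetical and are not taken from the paper.

```python
def apply_residual(base_action, residual, low, high):
    """Combine a default controller command with a learned residual.

    base_action: command from the default powertrain controller
    residual:    correction proposed by the learned residual policy
    low, high:   actuator limits (assumed known)
    """
    combined = base_action + residual
    # Clip so that the corrected command stays physically admissible.
    return max(low, min(high, combined))
```

A zero residual reproduces the default controller exactly, which is one reason a residual agent can start from reasonable behavior rather than learning from scratch.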
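The upgrading and updating of vehicles have accelerated over the past decades. Driven by the demand for environmental friendliness and intelligence, electric vehicles (EVs) and connected and automated vehicles (CAVs) have become new components of transportation systems. This paper develops a reinforcement learning framework to implement adaptive control for an electric platoon composed of CAVs and human-driven vehicles (HDVs) at a signalized intersection. First, a Markov decision process (MDP) model is proposed to describe the decision process of the mixed platoon. A novel state representation and reward function are designed for the model to account for the behavior of the whole platoon. Second, to deal with the delayed reward, an augmented random search (ARS) algorithm is proposed. The control policy learned by the agent guides the longitudinal motion of the CAV, which serves as the leader of the platoon. Finally, a series of simulations is carried out in the simulation suite SUMO. Compared with several state-of-the-art (SOTA) reinforcement learning approaches, the proposed method obtains a higher reward. The simulation results also demonstrate the effectiveness of the delayed reward, which outperforms the distributed reward mechanism. Compared with normal car-following behavior, the sensitivity analysis shows that energy can be saved to different extents (39.27%-82.51%) by adjusting the relative importance of the optimization objectives. Without sacrificing travel delay, the proposed control method can save up to 53.64% of electric energy.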
With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms and provides a taxonomy of automated driving tasks where (D)RL methods have been employed, while addressing key computational challenges in real world deployment of autonomous driving agents. It also delineates adjacent domains such as behavior cloning, imitation learning, inverse reinforcement learning that are related but are not classical RL algorithms. The role of simulators in training agents, methods to validate, test and robustify existing solutions in RL are discussed.
As an efficient way to integrate multiple distributed energy resources (DERs) and the user side, a microgrid mainly faces the problems of small-scale volatility, uncertainty, and intermittency of DERs, along with demand-side uncertainty. The traditional microgrid has a single form and cannot support flexible energy dispatch between a complex demand side and the microgrid. In response to this problem, an overall environment comprising wind power, thermostatically controlled loads (TCLs), energy storage systems (ESSs), price-responsive loads, and the main grid is proposed. Secondly, centralized control of microgrid operation is convenient for controlling the reactive power and voltage of the distributed power supply and for adjusting the grid frequency. However, flexible loads can aggregate and generate peaks during electricity price valleys. Existing research takes into account the power constraints of the microgrid but fails to ensure a sufficient supply of electric energy for a single flexible load. This paper considers the response priority of each unit component of the TCLs and ESSs on the basis of the overall environment operation of the microgrid, so as to ensure the power supply of the microgrid's flexible loads and save power input cost to the greatest extent. Finally, the simulation optimization of the environment can be expressed as a Markov decision process. The training process combines two stages of offline and online operation. The addition of multiple threads combined with the lack of historical data learning leads to low learning efficiency. An asynchronous advantage actor-critic with an experience replay memory is added to address the data correlation and non-stationary distribution problems during training.
This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments had a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe the way we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13% respectively at the two live experiment sites.
Technology advancements in wireless communications and high-performance Extended Reality (XR) have empowered the developments of the Metaverse. The demand for Metaverse applications and hence, real-time digital twinning of real-world scenes is increasing. Nevertheless, the replication of 2D physical world images into 3D virtual world scenes is computationally intensive and requires computation offloading. The disparity in transmitted scene dimension (2D as opposed to 3D) leads to asymmetric data sizes in uplink (UL) and downlink (DL). To ensure the reliability and low latency of the system, we consider an asynchronous joint UL-DL scenario where in the UL stage, the smaller data size of the physical world scenes captured by multiple extended reality users (XUs) will be uploaded to the Metaverse Console (MC) to be construed and rendered. In the DL stage, the larger-size 3D virtual world scenes need to be transmitted back to the XUs. The decisions pertaining to computation offloading and channel assignment are optimized in the UL stage, and the MC will optimize power allocation for users assigned with a channel in the UL transmission stage. Some problems arise therefrom: (i) interactive multi-process chain, specifically Asynchronous Markov Decision Process (AMDP), (ii) joint optimization in multiple processes, and (iii) high-dimensional objective functions, or hybrid reward scenarios. To ensure the reliability and low latency of the system, we design a novel multi-agent reinforcement learning algorithm structure, namely Asynchronous Actors Hybrid Critic (AAHC). Extensive experiments demonstrate that compared to proposed baselines, AAHC obtains better solutions with preferable training time.
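The rapid changes in the finance industry caused by the increasing amount of data have revolutionized the techniques for data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches to financial decision-making problems, which rely heavily on model assumptions, new developments in reinforcement learning (RL) can make full use of large amounts of financial data with fewer model assumptions and improve decisions in complex financial environments. This survey paper aims to review recent developments in, and uses of, RL approaches in finance. We introduce Markov decision processes, which are the setting for many commonly used RL approaches. Various algorithms are then introduced, with a focus on value-based and policy-based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. The survey concludes by discussing the application of these RL algorithms to a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.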
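The ability to recover from an unexpected external perturbation is a fundamental motor skill in bipedal locomotion. An effective response includes the ability not only to recover balance and maintain stability but also to fall in a safe manner when balance recovery is physically infeasible. For robots associated with bipedal locomotion, such as humanoid robots and assistive robotic devices that help humans walk, designing controllers that can provide this stability and safety can prevent damage to the robot or avoid injury-related medical costs. This is a challenging task because it involves generating highly dynamic motion for a high-dimensional, nonlinear, and under-actuated system with contacts. Despite prior progress with model-based and optimization methods, challenges such as the requirement for extensive domain knowledge, relatively large computation time, and limited robustness to changes in dynamics still make this an open problem. In this work, to address these issues, we develop learning-based algorithms capable of synthesizing push recovery control policies for two different kinds of robots: humanoid robots and assistive robotic devices that aid bipedal locomotion. Our work can be divided into two closely related directions: 1) learning safe falling and fall prevention strategies for humanoid robots, and 2) learning fall prevention strategies for humans using a robotic assistive device. To achieve this, we introduce a set of deep reinforcement learning (DRL) algorithms to learn control policies that improve safety when using these robots.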
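Smart energy networks provide an effective means of accommodating high penetrations of variable renewable energy sources such as solar and wind, which are key to the deep decarbonization of energy production. However, given the variability of renewables and of energy demand, effective control and energy storage schemes must be developed to manage variable energy generation and achieve the desired system economics and environmental goals. In this paper, we introduce a hybrid energy storage system composed of battery and hydrogen energy storage to handle the uncertainties related to electricity prices, renewable energy production, and consumption. We aim to improve renewable energy utilization and minimize energy costs and carbon emissions while ensuring energy reliability and stability within the network. To achieve this, we propose a multi-agent deep deterministic policy gradient approach, a deep reinforcement learning-based control strategy that optimizes the scheduling of the hybrid energy storage system and energy demand in real time. The proposed approach is model-free and does not require explicit knowledge or rigorous mathematical models of the smart energy network environment. Simulation results based on real-world data show that (i) the integrated and optimized operation of the hybrid energy storage system and energy demand reduces carbon emissions by 78.69%, improves cost savings by 23.5%, and improves renewable energy utilization by more than 13.2% compared with other baseline models, and (ii) the proposed algorithm outperforms state-of-the-art self-learning algorithms such as the deep Q-network.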
Proper functioning of connected and automated vehicles (CAVs) is crucial for the safety and efficiency of future intelligent transport systems. Meanwhile, transitioning to fully autonomous driving requires a long period of mixed autonomy traffic, including both CAVs and human-driven vehicles. Thus, collaboration decision-making for CAVs is essential to generate appropriate driving behaviors to enhance the safety and efficiency of mixed autonomy traffic. In recent years, deep reinforcement learning (DRL) has been widely used in solving decision-making problems. However, the existing DRL-based methods have been mainly focused on solving the decision-making of a single CAV. Using the existing DRL-based methods in mixed autonomy traffic cannot accurately represent the mutual effects of vehicles and model dynamic traffic environments. To address these shortcomings, this article proposes a graph reinforcement learning (GRL) approach for multi-agent decision-making of CAVs in mixed autonomy traffic. First, a generic and modular GRL framework is designed. Then, a systematic review of DRL and GRL methods is presented, focusing on the problems addressed in recent research. Moreover, a comparative study on different GRL methods is further proposed based on the designed framework to verify the effectiveness of GRL methods. Results show that the GRL methods can well optimize the performance of multi-agent decision-making for CAVs in mixed autonomy traffic compared to the DRL methods. Finally, challenges and future research directions are summarized. This study can provide a valuable research reference for solving the multi-agent decision-making problems of CAVs in mixed autonomy traffic and can promote the implementation of GRL-based methods into intelligent transportation systems. The source code of our work can be found at https://github.com/Jacklinkk/Graph_CAVs.
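Deep reinforcement learning (DRL) is used to develop autonomously optimized and custom-designed heat-treatment processes that are both microstructure-sensitive and energy-efficient. Unlike conventional supervised machine learning, DRL does not rely solely on static neural network training from data; instead, a learning agent autonomously develops optimal solutions based on reward and penalty elements, with reduced or no supervision. In our approach, a temperature-dependent Allen-Cahn model for phase transformation serves as the environment for the DRL agent, that is, the model world in which it gains experience and makes autonomous decisions. The agent of the DRL algorithm controls the temperature of the system, which acts as a model furnace for the heat treatment of alloys. Microstructure goals are defined for the agent based on the desired microstructure of the phases. After training, the agent can generate temperature-time profiles for a variety of initial microstructure states to reach the final desired microstructure state. The agent's performance and the physical meaning of the generated heat-treatment profiles are investigated in detail. In particular, the agent is able to control the temperature to reach the desired microstructure starting from a variety of initial conditions. This ability to handle a variety of conditions paves the way for using such an approach for recycling-oriented heat-treatment process design, where the initial composition can vary from batch to batch because of impurity intrusion, as well as for designing energy-efficient heat treatments. To test this hypothesis, an agent without a penalty on energy is compared with one that considers energy costs. The energy cost penalty is imposed on the agent as an additional criterion for finding the optimal temperature-time profile.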
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.