Heating in private households is a major contributor to today's emissions. Heat pumps are a promising alternative for heat generation and a key technology for achieving the goals of the German energy transition and becoming less dependent on fossil fuels. Today, most heat pumps in the field are controlled by a simple heating curve, a naive mapping from the current outdoor temperature to a control action. A more advanced approach is model predictive control (MPC), which has been applied to heat pump control in several research works. However, MPC depends heavily on the building model, which has several disadvantages. Motivated by this and by recent breakthroughs in the field, this work applies deep reinforcement learning (DRL) to heat pump control in a simulated environment. A comparison with MPC shows that DRL can be applied in a model-free manner to achieve MPC-like performance. This work extends previous studies that have applied DRL to building heating by analyzing the learned control strategies in depth and by giving a detailed comparison of the two state-of-the-art control methods.
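To make the baseline concrete, a heating curve can be sketched as a clipped linear map from outdoor temperature to a supply-water temperature setpoint. The design points below (-12 °C outdoor gives 55 °C supply; a 15 °C heating limit gives 25 °C supply) are illustrative assumptions, not values from the work summarized above.

```python
def heating_curve(t_outdoor: float,
                  t_out_design: float = -12.0,    # assumed design outdoor temperature (°C)
                  t_supply_design: float = 55.0,  # assumed supply temperature at the design point (°C)
                  t_out_limit: float = 15.0,      # assumed heating-limit outdoor temperature (°C)
                  t_supply_min: float = 25.0) -> float:  # assumed minimum supply temperature (°C)
    """Map the current outdoor temperature to a supply-water setpoint.

    Linear interpolation between the two assumed design points, clipped so
    that colder weather never exceeds t_supply_design and milder weather
    never drops below t_supply_min.
    """
    # Fraction of the way from the heating limit down to the design temperature.
    frac = (t_out_limit - t_outdoor) / (t_out_limit - t_out_design)
    frac = min(max(frac, 0.0), 1.0)  # clip to [0, 1]
    return t_supply_min + frac * (t_supply_design - t_supply_min)


if __name__ == "__main__":
    for t in (-20.0, -12.0, 1.5, 15.0, 20.0):
        print(f"{t:6.1f} °C outdoor -> {heating_curve(t):5.1f} °C supply")
```

The action depends only on the current outdoor temperature. Forecasts, prices, and the building's thermal state never enter, and this is the gap that MPC and DRL try to close.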
The decarbonization of buildings presents new challenges for the reliability of the electrical grid because of the intermittency of renewable energy sources and the increase in grid load brought about by end-use electrification. To restore reliability, grid-interactive efficient buildings can provide flexibility services to the grid through demand response. Residential demand response programs are hindered by the need for manual intervention by customers. To maximize the energy flexibility potential of residential buildings, an advanced control architecture is needed. Reinforcement learning (RL) is well suited to controlling flexible resources because, unlike expert systems, it can adapt to unique building characteristics. Yet factors hindering the adoption of RL in real-world applications include its large training-data requirements, control security, and generalizability. Here we address these challenges by proposing the MERLIN framework and using a digital twin of a real-world 17-building grid-interactive residential community in CityLearn. We show that 1) independent RL controllers for batteries improve building- and district-level KPIs compared to a reference rule-based controller (RBC) by tailoring their policies to individual buildings, 2) despite unique occupant behaviours, transferring the RL policy of any one building to the other buildings provides comparable performance while reducing the cost of training, and 3) training RL controllers on limited temporal data that does not capture the full seasonality of occupant behaviour has little effect on performance. Although the zero-net-energy (ZNE) condition of the buildings may be maintained or worsened as a result of controlling the batteries, KPIs that are typically improved by the ZNE condition (electricity price and carbon emissions) are further improved when the batteries are managed by an advanced controller.
Energy consumption in buildings, both residential and commercial, accounts for approximately 40% of all energy usage in the U.S., and similar numbers are being reported from countries around the world. This significant amount of energy is used to maintain a comfortable, secure, and productive environment for the occupants. It is therefore crucial to optimize energy consumption in buildings while maintaining satisfactory levels of occupant comfort, health, and safety. Machine learning has recently proven to be an invaluable tool for deriving important insights from data and optimizing various systems. In this work, we review the ways in which machine learning has been leveraged to make buildings smart and energy-efficient. For the convenience of readers, we provide a brief introduction to several machine learning paradigms and to the components and functioning of each smart building system we cover. Finally, we discuss challenges faced while implementing machine learning algorithms in smart buildings and suggest future avenues for research at the intersection of smart buildings and machine learning.
As an efficient way to integrate multiple distributed energy resources (DERs) with the user side, a microgrid mainly faces the small-scale volatility, uncertainty, and intermittency of DERs as well as demand-side uncertainty. The traditional microgrid has a single form and cannot support flexible energy dispatch between a complex demand side and the microgrid. To address this problem, an overall environment comprising wind power, thermostatically controlled loads (TCLs), energy storage systems (ESSs), price-responsive loads, and the main grid is proposed. Second, centralized control of microgrid operation facilitates the control of the reactive power and voltage of the distributed power supplies and the adjustment of the grid frequency. However, flexible loads tend to aggregate and create peaks during electricity-price valleys. Existing research takes the power constraints of the microgrid into account but fails to ensure a sufficient supply of electric energy to each individual flexible load. Building on the operation of the overall microgrid environment, this paper considers the response priority of each TCL and ESS unit so as to guarantee the power supply of the microgrid's flexible loads and save as much power input cost as possible. Finally, the simulation optimization of the environment is expressed as a Markov decision process (MDP). Training combines an offline stage and an online stage. Because multi-threaded learning with insufficient historical data leads to low learning efficiency, an asynchronous advantage actor-critic (A3C) algorithm with an experience replay memory pool is added to solve the problems of data correlation and non-stationary distributions during training.
Multiple uncertainties from power sources and loads pose significant challenges to the stable supply of various resources on islands. To address these challenges, a comprehensive scheduling framework is proposed that introduces a model-free deep reinforcement learning (DRL) approach based on a model of an island integrated energy system (IES). In response to the shortage of freshwater on islands, a seawater desalination system is introduced, and a transmission structure called "hydrothermal simultaneous transmission" (HST) is proposed. The essence of the IES scheduling problem is the optimal combination of each unit's output, which is a typical sequential control problem and fits the Markov decision process framework of deep reinforcement learning. Through the interaction between agent and environment, DRL adapts to various changes and adjusts its strategy in a timely manner, avoiding complicated modeling and prediction of the multiple uncertainties. The simulation results show that the proposed scheduling framework properly handles the uncertainties from power sources and loads, achieves a stable supply of the various resources, and performs better than other real-time scheduling methods, especially in terms of computational efficiency. In addition, the HST model represents an active exploration of ways to improve the utilization efficiency of island freshwater.
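Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori, which reduces the upfront and ongoing project-specific engineering effort, and it is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees, resulting in various unsafe interactions within its safety-critical environment. In this paper, we present two novel safe RL methods, namely SafeFallback and GiveSafe, in which the safety constraint formulation is decoupled from the RL formulation and which provide hard-constraint satisfaction guarantees both during training (exploration) and during exploitation of the (close-to) optimal policy. In a simulated multi-energy systems case study, we have shown that both methods start with a significantly higher utility (i.e., a useful policy) than a vanilla RL benchmark (94.6% and 82.8% compared to 35.5%) and that the proposed SafeFallback method can even outperform the vanilla RL benchmark (102.9% to 100%). We conclude that both methods are viable safety-constraint handling techniques that extend beyond RL, as demonstrated with random agents, while still providing hard guarantees. Finally, we propose fundamental future work to, among other things, improve the constraint functions themselves as more data becomes available.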
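The fallback idea can be sketched generically: check each proposed action against a known constraint function and, if the action would violate it, replace it with the action of a fallback policy that is known to be safe. The battery model, SOC bounds, time step, and idle fallback below are hypothetical illustrations. The paper's actual SafeFallback formulation may differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BatteryState:
    soc: float           # state of charge as a fraction of capacity
    capacity_kwh: float  # usable capacity (hypothetical)


SOC_MIN, SOC_MAX = 0.1, 0.9  # assumed hard constraint bounds
DT_H = 1.0                   # assumed control interval in hours


def is_safe(state: BatteryState, power_kw: float) -> bool:
    """Known constraint function: the next SOC must stay within [SOC_MIN, SOC_MAX]."""
    next_soc = state.soc + power_kw * DT_H / state.capacity_kwh
    return SOC_MIN <= next_soc <= SOC_MAX


def fallback_policy(state: BatteryState) -> float:
    """Idle: leaves the SOC unchanged, so it is safe whenever the current SOC is."""
    return 0.0


def safe_fallback(state: BatteryState, proposed_kw: float) -> Tuple[float, bool]:
    """Return (executed action, whether the fallback was triggered)."""
    if is_safe(state, proposed_kw):
        return proposed_kw, False
    return fallback_policy(state), True
```

Because the agent's proposal is only filtered, this kind of safety layer does not depend on how the proposal was produced; an RL policy or a random agent can sit in front of it.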
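This paper presents a data-driven modeling approach for developing control-oriented thermal models of buildings. These models are developed with the objective of reducing energy consumption costs while keeping the indoor temperature of the building within the required comfort limits. To combine the interpretability of white/gray-box physics models with the expressive power of neural networks, we propose a physics-informed neural network approach for this modeling task. Along with the measured data and building parameters, we encode into the neural networks the underlying physics that governs the thermal behavior of these buildings. This yields a physics-guided model that helps capture the temporal evolution of room temperature and power consumption as well as the hidden state, i.e., the temperature of the building's thermal mass. The main research contributions of this work are: (1) we propose two variants of physics-informed neural network architectures for the task of control-oriented thermal modeling of buildings, (2) we show that training these architectures is data-efficient, requiring less training data than conventional, non-physics-informed neural networks, and (3) we show that these architectures achieve more accurate predictions than conventional neural networks over longer prediction horizons. We test the prediction performance of the proposed architectures using both simulated and real-world data to demonstrate (2) and (3), and we show that the proposed physics-informed neural network architectures can be used for this control-oriented modeling problem.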
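One common way to inject physics into neural-network training is to add a physics-residual penalty to the usual data-fitting loss. The sketch below assumes a hypothetical first-order RC model, C dT/dt = (T_out - T)/R + P, discretized with forward Euler. It illustrates this generic soft-constraint idea only; it is not either of the two architectures proposed in the paper.

```python
from typing import Sequence


def physics_informed_loss(pred: Sequence[float], meas: Sequence[float],
                          t_out: Sequence[float], power: Sequence[float],
                          dt: float, r: float, c: float,
                          weight: float = 1.0) -> float:
    """Data MSE plus weighted MSE of the residual of an assumed RC model.

    pred, meas: predicted and measured indoor temperatures (length N)
    t_out, power: outdoor temperature and heating power (length N)
    r, c: assumed thermal resistance and capacitance of the building
    """
    n = len(pred)
    data_mse = sum((p - m) ** 2 for p, m in zip(pred, meas)) / n
    # Residual of dT/dt = (T_out - T) / (R * C) + P / C with a forward-Euler derivative.
    residuals = [
        (pred[k + 1] - pred[k]) / dt - ((t_out[k] - pred[k]) / (r * c) + power[k] / c)
        for k in range(n - 1)
    ]
    physics_mse = sum(e ** 2 for e in residuals) / len(residuals)
    return data_mse + weight * physics_mse
```

The weight trades off fitting the measurements against obeying the assumed physics. A trajectory that matches the data but violates the RC dynamics is still penalized.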
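Reinforcement learning (RL) techniques have been developed to optimize industrial cooling systems, offering substantial energy savings compared to traditional heuristic policies. A major challenge in industrial control is learning behaviors that are feasible in the real world given machinery constraints. For example, certain actions can only be executed every few hours, while others can be taken more frequently. Without extensive reward engineering and experimentation, an RL agent may not learn realistic operation of the machinery. To address this, we use hierarchical reinforcement learning with multiple agents that control subsets of actions according to their operating time scales. Our hierarchical approach achieves energy savings over existing baselines while keeping constraints within safe bounds (for example, when operating chillers) in a simulated HVAC control environment.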
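The multi-timescale idea can be sketched as a control loop in which a "slow" agent may change its action only every `slow_period` steps, while a "fast" agent acts at every step, conditioned on the slow agent's current decision. The policies and the period are hypothetical placeholders. The paper's actual agents and action subsets are not reproduced here.

```python
from typing import Any, Callable, List, Tuple


def run_hierarchy(slow_policy: Callable[[int], Any],
                  fast_policy: Callable[[int, Any], Any],
                  n_steps: int,
                  slow_period: int) -> List[Tuple[Any, Any]]:
    """Roll out two agents that act on different time scales.

    The slow agent (e.g. one that switches chillers on or off) is queried only
    every slow_period steps and its action is held in between. The fast agent
    (e.g. one adjusting setpoints) is queried at every step.
    """
    actions: List[Tuple[Any, Any]] = []
    slow_action: Any = None
    for t in range(n_steps):
        if t % slow_period == 0:
            slow_action = slow_policy(t)  # slow decision point
        actions.append((slow_action, fast_policy(t, slow_action)))
    return actions
```

Holding the slow action fixed between decision points enforces the "every few hours" restriction through the structure of the loop itself, not through reward shaping.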
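With the extensive rise of renewable energy, the intraday electricity market has grown increasingly popular among traders and electric utilities as a way to cope with the resulting volatility of energy supply. Because of its short trading horizon and continuous nature, the intraday market offers the ability to adjust trading decisions made in the day-ahead market or to reduce trading risk at short notice. Producers of renewable energy use the intraday market to lower their forecast risk by modifying their offered capacities based on current forecasts. However, the market dynamics are complex because the power grid has to remain stable and electricity is only partly storable. Consequently, robust and intelligent trading strategies capable of operating in the intraday market are required. In this work, we propose a novel autonomous trading approach based on deep reinforcement learning (DRL) algorithms as a possible solution. For this purpose, we model intraday trading as a Markov decision problem (MDP) and employ the proximal policy optimization (PPO) algorithm as our DRL approach. A simulation framework is introduced that enables trading of the continuous intraday price at a resolution of one-minute steps. We test our framework in a case study from the perspective of a wind park operator. We include price and wind forecasts alongside general trading information. In a test scenario based on German intraday trading results from 2018, we outperform multiple baselines by at least 45.24%, showing the advantage of the DRL algorithm. However, we also discuss limitations and enhancements of the DRL agent that could increase performance in future work.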
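PPO trains its policy by maximizing a clipped surrogate objective, which keeps the updated policy from moving too far from the policy that collected the data. A minimal per-batch sketch in plain Python follows. The clip range of 0.2 is a common default and not a value reported in the paper. In practice the probability ratios would come from the trading agent's policy network.

```python
from typing import Sequence


def ppo_clip_objective(ratios: Sequence[float], advantages: Sequence[float],
                       clip_eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate objective (to be maximized).

    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i), and advantages[i] is
    the advantage estimate for the same sample.
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)  # clip ratio to [1-eps, 1+eps]
        total += min(r * adv, clipped * adv)  # pessimistic (lower) of the two terms
    return total / len(ratios)
```

Taking the minimum means the objective never rewards pushing the ratio beyond the clip range in the direction the advantage favors, which keeps each update conservative.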
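We carefully compare two model-free control algorithms, evolution strategies (ES) and proximal policy optimization (PPO), with receding-horizon model predictive control (MPC) for operating a simulated, price-responsive water heater. Four MPC variants are considered: a one-shot controller with perfect forecasting that yields optimal control; a limited-horizon controller with perfect forecasting; a mean-forecast-based controller; and a two-stage stochastic programming controller using historical scenarios. In all cases, the MPC model of water temperature and electricity price is exact; only water demand is uncertain. For comparison, ES and PPO learn neural-network-based policies by interacting directly with the simulated environment under the same scenarios used by MPC. All methods are then evaluated on a separate one-week continuation of the demand time series. We demonstrate that optimal control for this problem is challenging, requiring more than 8 hours of MPC lookahead with perfect forecasting to attain the minimum cost. Despite this challenge, both ES and PPO learn good general-purpose policies that outperform the mean-forecast and two-stage stochastic MPC controllers in terms of average cost and are orders of magnitude faster at computing actions. We show that ES in particular can leverage parallelism, learning a policy in 90 seconds using 1150 CPU cores.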
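For readers unfamiliar with ES, its core update fits in a few lines: perturb the parameters with Gaussian noise, evaluate the objective at each perturbation, and step along the noise directions weighted by the resulting returns. The sketch below uses antithetic (mirrored) sampling on a hypothetical one-dimensional quadratic objective that stands in for the water-heater simulator. It shows the basic ES estimator, not necessarily the exact variant used in the paper. Each objective evaluation is independent of the others, which is what allows ES to spread its work across many CPU cores.

```python
import random
from typing import Callable


def es_optimize(objective: Callable[[float], float], theta: float,
                sigma: float = 0.1, lr: float = 0.05,
                n_pairs: int = 20, iterations: int = 200,
                seed: int = 0) -> float:
    """Maximize objective(theta) with a basic antithetic evolution strategy."""
    rng = random.Random(seed)
    for _ in range(iterations):
        grad = 0.0
        for _ in range(n_pairs):  # independent evaluations: trivially parallelizable
            eps = rng.gauss(0.0, 1.0)
            # Mirrored pair: the difference cancels the baseline objective value.
            grad += (objective(theta + sigma * eps) - objective(theta - sigma * eps)) * eps
        grad /= 2 * n_pairs * sigma  # Monte Carlo estimate of d objective / d theta
        theta += lr * grad           # gradient-ascent step
    return theta


# Toy objective with its maximum at theta = 3.
best = es_optimize(lambda x: -(x - 3.0) ** 2, theta=0.0)
```

ES needs only objective values, not gradients of the simulator, so the same loop applies unchanged when the objective is the simulated operating cost of a policy.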
This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments posed a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe how we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13%, respectively, at the two live experiment sites.
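We consider the problem of demand-side energy management, where each household is equipped with a smart meter that can schedule home appliances online. The goal is to minimize the overall cost under a real-time pricing scheme. While previous works have introduced centralized approaches in which the scheduling algorithm has full observability, we propose formulating the smart grid environment as a Markov game. Each household is a decentralized agent with partial observability, which allows scalability and privacy preservation in a realistic setting. The grid operator produces a price signal that varies with the energy demand. We propose an extension that addresses the partial observability of the environment from each agent's perspective. The algorithm learns a centralized critic that coordinates the training of the decentralized agents. Our approach therefore uses centralized learning but decentralized execution. Simulation results show that our online deep reinforcement learning method can reduce both the peak-to-average ratio of the total energy consumed and the cost of electricity for all households, based purely on instantaneous observations and a price signal.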
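The peak-to-average ratio (PAR) named in this abstract is the maximum of the aggregate load profile divided by its mean. A minimal sketch, assuming each household's consumption is given as an equal-length list of per-interval energy values (the example uses hypothetical toy data):

```python
from typing import Sequence


def peak_to_average_ratio(household_loads: Sequence[Sequence[float]]) -> float:
    """PAR of the aggregate load: max(total) / mean(total).

    household_loads[h][t] is the energy used by household h in interval t.
    All profiles must have the same length.
    """
    total = [sum(step) for step in zip(*household_loads)]  # aggregate per interval
    return max(total) / (sum(total) / len(total))


# Two households peaking at the same time vs. one household shifting its load.
print(peak_to_average_ratio([[1, 2, 3, 2], [1, 2, 3, 2]]))  # 1.5
print(peak_to_average_ratio([[1, 2, 3, 2], [3, 2, 1, 2]]))  # 1.0
```

Shifting the same total energy away from a shared peak lowers the PAR without reducing consumption. Decentralized agents have to learn this kind of coordination from the price signal alone.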
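We present a hybrid industrial cooling system model that embeds analytical solutions within a multi-physics simulation. The model is designed for reinforcement learning (RL) applications and balances simplicity against simulation fidelity and interpretability. The model's fidelity is evaluated against real-world data from a large-scale cooling system. This is followed by a case study illustrating how the model can be used for RL research. For this purpose, we develop an industrial task suite that allows different problem settings and levels of complexity to be specified, and we use it to evaluate the performance of different RL algorithms.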
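Heating and cooling systems in buildings account for 31% of global energy use, much of which is regulated by rule-based controllers (RBCs) that neither maximize energy efficiency nor minimize emissions by interacting optimally with the grid. Control via reinforcement learning (RL) has been shown to significantly improve building energy efficiency, but existing solutions require access to building-specific simulators or data that cannot be expected for every building in the world. In response, we show that it is possible to obtain emission-reducing policies without such knowledge a priori, a paradigm we call zero-shot building control. We combine ideas from system identification and model-based RL to create PEARL (Probabilistic Emission-Abating Reinforcement Learning) and show that a short period of active exploration is all that is required to build a performant model. In experiments across three varied building energy simulations, we show that PEARL outperforms an existing RBC and popular RL baselines in all cases, reducing building emissions by 31% while maintaining thermal comfort. Our source code is available online via https://enjeener.io/projects/pearl.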
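Heating and cooling systems in buildings account for 31% of global energy use, much of which is regulated by rule-based controllers (RBCs) that neither maximize energy efficiency nor minimize emissions by interacting optimally with the grid. Control via reinforcement learning (RL) has been shown to significantly improve building energy efficiency, but existing solutions require pre-training in simulators that are prohibitively expensive to obtain for every building in the world. In response, we show that safe, zero-shot control of buildings is possible by combining ideas from system identification and model-based RL. We call this combination PEARL (Probabilistic Emission-Abating Reinforcement Learning) and show that it reduces emissions without pre-training, needing only a three-hour commissioning period. In experiments across three varied building energy simulations, we show that PEARL outperforms an existing RBC and popular RL baselines in all cases, reducing building emissions by 31% while maintaining thermal comfort.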
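The system-identification ingredient can be illustrated with the simplest possible case: fitting a hypothetical first-order thermal model, T[k+1] - T[k] = α (T_out[k] - T[k]) + β u[k], to logged data by ordinary least squares. PEARL itself learns a probabilistic model, so this is only a sketch of the general idea under the stated linear-model assumption.

```python
from typing import Sequence, Tuple


def identify_thermal_model(t_room: Sequence[float], t_out: Sequence[float],
                           u: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of T[k+1] - T[k] = alpha * (T_out[k] - T[k]) + beta * u[k].

    t_room: indoor temperatures, t_out: outdoor temperatures, u: control input
    (e.g. heating power). Returns (alpha, beta) from the 2x2 normal equations.
    """
    n = len(t_room) - 1
    x1 = [t_out[k] - t_room[k] for k in range(n)]  # heat-loss regressor
    x2 = list(u[:n])                               # control-input regressor
    y = [t_room[k + 1] - t_room[k] for k in range(n)]
    a11 = sum(a * a for a in x1)
    a12 = sum(a * b for a, b in zip(x1, x2))
    a22 = sum(b * b for b in x2)
    b1 = sum(a * c for a, c in zip(x1, y))
    b2 = sum(b * c for b, c in zip(x2, y))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("data not informative enough: singular normal equations")
    alpha = (b1 * a22 - a12 * b2) / det  # Cramer's rule
    beta = (a11 * b2 - a12 * b1) / det
    return alpha, beta
```

This also shows why a period of active exploration matters. If the input is never varied (for example, u held at zero), a12 and a22 vanish, the determinant is zero, and the effect of the control input cannot be identified.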
This paper presents a multi-agent Deep Reinforcement Learning (DRL) framework for autonomous control and integration of renewable energy resources into smart power grid systems. In particular, the proposed framework jointly considers demand response (DR) and distributed energy management (DEM) for residential end-users. DR has widely recognized potential for improving power grid stability and reliability while also reducing end-users' energy bills. However, conventional DR techniques have several shortcomings, such as the inability to handle operational uncertainties while incurring end-user disutility, which prevent their widespread adoption in real-world applications. The proposed framework addresses these shortcomings by implementing DR and DEM based on a real-time pricing strategy that is achieved using deep reinforcement learning. Furthermore, this framework enables the power grid service provider to leverage distributed energy resources (i.e., rooftop PV panels and battery storage) as dispatchable assets to support the smart grid during peak hours, thus achieving management of distributed energy resources. Simulation results based on the Deep Q-Network (DQN) demonstrate significant improvements in the 24-hour cumulative profit for both prosumers and the power grid service provider, as well as major reductions in the use of the power grid's reserve generators.
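Since the simulations above are based on DQN, it may help to recall the target toward which DQN regresses its Q-network: y = r + γ max_a' Q_target(s', a'), with the bootstrap term dropped at the end of an episode. A minimal sketch on plain Python lists follows, where `next_q_values` stands in for the outputs of a hypothetical target network.

```python
from typing import List, Sequence


def dqn_targets(rewards: Sequence[float],
                next_q_values: Sequence[Sequence[float]],
                dones: Sequence[bool],
                gamma: float = 0.99) -> List[float]:
    """Compute DQN regression targets for a batch of transitions.

    next_q_values[i] holds the target network's Q-values for every action in
    the next state s'_i. Terminal transitions do not bootstrap.
    """
    return [
        r + (0.0 if done else gamma * max(q_next))
        for r, q_next, done in zip(rewards, next_q_values, dones)
    ]


targets = dqn_targets([1.0, 2.0], [[0.5, 1.5], [3.0, 1.0]], [False, True], gamma=0.9)
```

The online Q-network is then trained to minimize the squared error between its Q-value for the taken action and these targets.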
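Deep reinforcement learning (DRL) is employed to develop autonomously optimized, custom-designed heat-treatment processes that are both microstructure-sensitive and energy-efficient. Unlike conventional supervised machine learning, DRL does not rely solely on static neural network training from data; instead, a learning agent autonomously develops optimal solutions based on reward and penalty elements, with reduced or no supervision. In our approach, a temperature-dependent Allen-Cahn model for phase transformation is used as the environment for the DRL agent, serving as the model world in which it gains experience and makes autonomous decisions. The agent of the DRL algorithm controls the temperature of the system, which acts as a model furnace for the heat treatment of alloys. Microstructure goals are defined for the agent based on the desired phase microstructure. After training, the agent can generate temperature-time profiles for a variety of initial microstructure states to reach the final desired microstructure state. The agent's performance and the physical meaning of the heat-treatment profiles are investigated in detail. In particular, the agent is able to control the temperature to reach the desired microstructure starting from a variety of initial conditions. This ability to handle a variety of conditions paves the way for using such an approach also in recycling-oriented heat-treatment process design, where the initial composition can vary from batch to batch due to the intrusion of impurities, and in the design of energy-efficient heat treatments. To test this hypothesis, an agent without an energy penalty is compared with one that considers energy costs. The energy-cost penalty is an additional criterion for the agent in finding the optimal temperature-time profile.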
Energy management systems (EMS) are becoming increasingly important for utilizing the continuously growing amount of curtailed renewable energy. Promising energy storage systems (ESS), such as batteries and green hydrogen, should be employed to maximize the efficiency of energy stakeholders. However, optimal decision-making, i.e., planning how to leverage different strategies, is confronted with the complexity and uncertainties of large-scale problems. Here, we propose a sophisticated deep reinforcement learning (DRL) methodology with a policy-based algorithm to realize real-time optimal ESS planning under curtailed renewable energy uncertainty. A quantitative performance comparison showed that the DRL agent outperforms the scenario-based stochastic optimization (SO) algorithm, even with wide action and observation spaces. Owing to the uncertainty-rejection capability of DRL, we confirmed robust performance under large uncertainty in the curtailed renewable energy, with maximized net profit and a stable system. Action mapping was performed to visually assess the actions taken by the DRL agent according to the state. The corresponding results confirmed that the DRL agent learns to act much as a human expert would, suggesting that the proposed methodology can be applied reliably.
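This paper addresses the problem of optimizing the charging/discharging schedules of electric vehicles (EVs) participating in demand response (DR). Because of uncertainties in the EVs' remaining energy, their arrival and departure times, and future electricity prices, it is difficult to make charging decisions that minimize charging cost while guaranteeing that the EV battery's state of charge (SOC) stays within a certain range. To handle this dilemma, this paper formulates the EV charging scheduling problem as a constrained Markov decision process (CMDP). By synergistically combining the augmented Lagrangian method with the soft actor-critic algorithm, this paper proposes a novel safe off-policy reinforcement learning (RL) approach to solve the CMDP. The actor network is updated in a policy-gradient manner using the Lagrangian value function. A double-critic network is adopted to estimate the action-value function synchronously and avoid overestimation bias. The proposed algorithm does not require a strong convexity guarantee for the examined problems and is sample-efficient. Comprehensive numerical experiments with real-world electricity prices demonstrate that the proposed algorithm achieves high solution optimality and constraint compliance.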
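The augmented Lagrangian method named above can be illustrated with its textbook treatment of a single inequality constraint g ≤ 0, for example g = expected SOC violation minus an allowed limit. The method adds a penalty term to the objective being minimized and updates the multiplier with a projected step. The values used below are hypothetical. How the paper couples this with soft actor-critic is not reproduced here.

```python
def augmented_lagrangian_penalty(g: float, lam: float, rho: float) -> float:
    """Term added to a minimization objective for the constraint g <= 0.

    Standard form: (max(0, lam + rho * g) ** 2 - lam ** 2) / (2 * rho),
    with multiplier lam >= 0 and penalty parameter rho > 0.
    """
    return (max(0.0, lam + rho * g) ** 2 - lam ** 2) / (2.0 * rho)


def update_multiplier(g: float, lam: float, rho: float) -> float:
    """Projected multiplier update: lam <- max(0, lam + rho * g)."""
    return max(0.0, lam + rho * g)


# Constraint violated by 0.5: the multiplier grows from 1.0 to 2.0.
lam_next = update_multiplier(g=0.5, lam=1.0, rho=2.0)
```

In a constrained RL loop, g would be estimated from rollouts. The actor would minimize charging cost plus this penalty, and the multiplier would be updated between policy updates, so persistent violations make the constraint progressively more expensive.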
Ongoing risks from climate change have affected the livelihoods of nomadic communities worldwide and are likely to lead to increased migratory movements in coming years. As a result, mobility considerations are becoming increasingly important in energy systems planning, particularly for achieving energy access in developing countries. Advanced plug-and-play control strategies have recently been developed with such a decentralized framework in mind, making it easier to interconnect nomadic communities with each other and with the main grid. In light of the above, this work investigates the design and planning strategy of a mobile multi-energy supply system for a nomadic community. Motivated by the scale and dimensionality of the associated uncertainties, which affect all major design and decision variables over the 30-year planning horizon, Deep Reinforcement Learning (DRL) is applied to this design and planning problem. DRL-based solutions are benchmarked against several rigid baseline design options to compare expected performance under uncertainty. The results of a case study for ger communities in Mongolia suggest that mobile nomadic energy systems can be both technically and economically feasible, particularly when flexibility is considered, although the degree of spatial dispersion among households is an important limiting factor. Key economic, sustainability, and resilience indicators, such as Cost, Equivalent Emissions, and Total Unmet Load, are measured, suggesting potential improvements over the available baselines of up to 25%, 67%, and 76%, respectively. Finally, the value of flexibility and of plug-and-play operation is decomposed using a variation of real options theory, with important implications for both nomadic communities and policymakers focused on enabling their energy access.