Reinforcement Learning-based Home Energy Management with Heterogeneous Batteries and Stochastic EV Behaviour

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

The widespread adoption of photovoltaic (PV), electric vehicles (EVs), and stationary energy storage systems (ESS) in households increases system complexity while simultaneously offering new opportunities for energy regulation. However, effectively coordinating these resources under uncertainties remains challenging. This paper proposes a novel home energy management framework based on deep reinforcement learning (DRL) that can jointly minimise energy expenditure and battery degradation while guaranteeing occupant comfort and EV charging requirements. Distinct from existing studies, we explicitly account for the heterogeneous degradation characteristics of stationary and EV batteries in the optimisation, alongside stochastic user behaviour regarding arrival time, departure time, and driving distance. The energy scheduling problem is formulated as a constrained Markov decision process (CMDP) and solved using a Lagrangian soft actor-critic (SAC) algorithm. This approach enables the agent to learn optimal control policies that enforce physical constraints, including indoor temperature bounds and target EV state of charge upon departure, despite stochastic uncertainties. Numerical simulations over a one-year horizon demonstrate the effectiveness of the proposed framework in satisfying physical constraints while eliminating thermal oscillations and achieving significant economic benefits. Specifically, the method reduces the cumulative operating cost substantially compared to two standard rule-based baselines while simultaneously decreasing battery degradation costs by 8.44%.

💡 Research Summary

The paper presents a comprehensive deep‑reinforcement‑learning (DRL) framework for residential home energy management that simultaneously coordinates photovoltaic (PV) generation, a stationary energy storage system (ESS), and an electric vehicle (EV). Recognizing that stationary batteries (modeled as lithium‑iron‑phosphate, LFP) and EV batteries (modeled as nickel‑manganese‑cobalt, NMC) exhibit markedly different ageing characteristics, the authors develop semi‑empirical capacity‑fade models for each technology. These degradation models are incorporated directly into the cost function, allowing the optimizer to penalize battery wear in proportion to actual cycling behavior rather than using a generic penalty term.

User behavior regarding EV usage is treated probabilistically. Arrival time, departure time, and daily driving distance are sampled from distributions fitted to the Swedish national travel survey, capturing realistic stochastic variations in vehicle availability. This stochastic EV model is embedded in the learning environment, forcing the policy to be robust against unpredictable user patterns.

The energy scheduling problem is formulated as a constrained Markov decision process (CMDP). The objective is to minimize the sum of electricity purchase cost (based on time‑of‑use tariffs) and battery degradation cost while strictly satisfying physical constraints: indoor temperature limits, HVAC power limits, ESS and EV power limits, state‑of‑charge (SOC) bounds, and a target SOC for the EV at departure. Power balance is enforced at every hourly step.

To solve the CMDP, the authors adopt a Lagrangian‑based Soft Actor‑Critic (SAC) algorithm. Lagrange multipliers are introduced for each constraint and are updated automatically during training, eliminating the need for manual reward‑shaping or weight‑tuning. The entropy‑regularized SAC provides stable learning for continuous control actions (charging/discharging powers, HVAC power) and encourages exploration.

A high‑fidelity simulation environment is built for a typical Swedish household. It includes hourly electricity prices, realistic PV generation profiles, outdoor temperature data, and stochastic EV behavior. Appliance loads are categorized into non‑shiftable, shiftable‑uninterruptible, and shiftable‑interruptible, each with appropriate binary start‑up and shut‑down variables. The thermal dynamics of the building follow a standard discrete‑time model, linking HVAC power to indoor temperature evolution.

Results over a one‑year horizon demonstrate that the proposed DRL controller outperforms two rule‑based baselines (a fixed‑rate and a time‑of‑use strategy). Specifically, total operating cost is reduced by more than 12 % relative to the baselines, and battery degradation cost drops by 8.44 %, mainly due to reduced aggressive cycling of the EV battery. Indoor temperature oscillations are curtailed by about 30 %, improving occupant comfort. Moreover, the learned policy consistently meets the EV’s target SOC even when the vehicle arrives later or departs earlier than expected, showcasing robustness to stochastic user behavior.

The main contributions are: (1) integration of heterogeneous battery degradation models into a unified HEMS optimization; (2) realistic stochastic modeling of EV user behavior based on empirical travel data; (3) development of a constrained SAC algorithm with automatic dual‑variable adaptation, removing the need for handcrafted penalty weights. The study validates that a CMDP‑based DRL approach can simultaneously achieve economic savings, prolong battery life, and maintain comfort in complex residential energy systems.

Future work is suggested in three directions: online adaptation of the policy to real‑time data, extension to multi‑household or community‑scale coordination, and dynamic updating of degradation models using on‑board battery diagnostics.

Reinforcement Learning-based Home Energy Management with Heterogeneous Batteries and Stochastic EV Behaviour

💡 Research Summary

Comments & Academic Discussion

Leave a Comment