Deadline-Aware, Energy-Efficient Control of Domestic Immersion Hot Water Heater


Domestic immersion water heaters are often operated continuously during winter, heating quickly rather than efficiently and ignoring predictable demand windows and ambient losses. We study deadline-aware control, where the aim is to reach a target temperature at a specified time while minimising energy consumption. We introduce an efficient Gymnasium environment that models an immersion hot water heater with first-order thermal losses and discrete on/off actions of 0 W and 6000 W applied every 120 seconds. Methods include a time-optimal bang-bang baseline, a zero-shot Monte Carlo Tree Search planner, and a Proximal Policy Optimisation policy; we report total energy consumption in watt-hours under identical physical dynamics. Across sweeps of initial temperature from 10 to 30 degrees Celsius, deadline from 30 to 90 steps, and target temperature from 40 to 80 degrees Celsius, PPO achieves the most energy-efficient performance: at a 60-step (2-hour) horizon it uses 3.23 kilowatt-hours, compared with 4.37 to 10.45 kilowatt-hours for bang-bang control and 4.18 to 6.46 kilowatt-hours for MCTS. This corresponds to energy savings of 26 percent at 30 steps and 69 percent at 90 steps. In a representative trajectory with a 50 kg water mass, 20 degrees Celsius ambient temperature, and a 60 degrees Celsius target, PPO consumes 54 percent less energy than bang-bang control and 33 percent less than MCTS. These results show that learned deadline-aware control reduces energy consumption under identical physical assumptions: planners provide partial savings without training, and learned policies offer near-zero inference cost once trained.


💡 Research Summary

The paper addresses the problem of controlling a domestic electric immersion water heater so that a desired water temperature is reached exactly at a pre‑specified deadline while consuming as little electricity as possible. To enable fair and reproducible comparisons, the authors construct a lightweight Gymnasium environment that implements a first‑order thermal model with convective losses and a binary on/off actuator (0 W or 6000 W) applied every 120 seconds. The physical parameters are realistic: 50 kg of water, specific heat 4184 J kg⁻¹ K⁻¹, heat‑loss coefficient 50 W m⁻² °C⁻¹, tank surface area 1.5 m², ambient temperature 20 °C, and heater efficiency 0.95 (effective heating power ≈5700 W). The continuous energy balance m cₚ dT/dt = ηP − hA(T − Tₐ) is discretized with forward‑Euler integration, yielding a stable update rule because the dimensionless cooling factor hA Δt / (m cₚ) ≈ 0.043 is far below 1.
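To make the discretization concrete, here is a minimal Python sketch of the forward‑Euler update under the parameters quoted above; the names and interface are illustrative, not taken from the paper's code.

```python
# One forward-Euler step of m*cp*dT/dt = eta*P - h*A*(T - T_amb),
# with the physical parameters quoted above.
M, CP = 50.0, 4184.0      # water mass (kg), specific heat (J kg^-1 K^-1)
H, A = 50.0, 1.5          # heat-loss coefficient (W m^-2 degC^-1), surface area (m^2)
T_AMB, ETA = 20.0, 0.95   # ambient temperature (degC), heater efficiency
P_ON, DT = 6000.0, 120.0  # electrical power when on (W), control interval (s)

def euler_step(T: float, heater_on: bool) -> float:
    """Advance the water temperature by one 120 s control step."""
    p_in = ETA * P_ON if heater_on else 0.0              # effective heating power
    return T + (p_in - H * A * (T - T_AMB)) * DT / (M * CP)

# Stability: the dimensionless cooling factor h*A*DT/(m*cp)
# = 75 * 120 / 209200 ~= 0.043 << 1, so the explicit update is well-behaved.
```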

The control task is formalized as a finite‑horizon Markov decision process (MDP). The state consists of the current water temperature, the target temperature, the ambient temperature, and the remaining steps until the deadline. The action space is binary (off or full power). Episodes terminate at the deadline D; success is defined as the final temperature lying within ±1 °C of the target. The reward function combines a small per‑step penalty proportional to the energy used (α = 1.86 × 10⁻⁸ J⁻¹) and a larger terminal penalty proportional to the absolute temperature error (β = 0.03 °C⁻¹). This design ensures that reducing the final error by 1 °C is more valuable than an additional heating step, encouraging the agent to postpone heating when possible.
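The coefficients can be checked with a quick calculation: one full‑power step consumes 6000 W × 120 s = 7.2 × 10⁵ J, so the per‑step energy penalty is α × 7.2 × 10⁵ ≈ 0.013, less than half the β = 0.03 charged per degree of terminal error. A hedged sketch of such a reward function follows; the signature is our own choice, not the paper's.

```python
ALPHA = 1.86e-8   # per-joule energy penalty (J^-1)
BETA = 0.03       # terminal temperature-error penalty (degC^-1)

def step_reward(energy_j: float, terminal: bool, T: float, T_target: float) -> float:
    """Small energy cost every step; larger error penalty at the deadline."""
    r = -ALPHA * energy_j                  # full-power step: -1.86e-8 * 7.2e5 ~ -0.013
    if terminal:
        r -= BETA * abs(T - T_target)      # -0.03 per degC of final error
    return r
```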

Three control strategies are evaluated under identical physics and timing; minimal code sketches of each appear after the list:

  1. Bang‑bang baseline – a time‑optimal policy that keeps the heater on until the temperature enters the target band, then turns it off. This strategy reaches the set‑point quickly but ignores heat losses, leading to high energy use.

  2. Monte Carlo Tree Search (MCTS) – a zero‑shot planner that performs 25 000 deterministic rollouts per episode. The tree is expanded using the UCB1 selection rule with exploration constant c = √2, and the best root action (most visits / highest value) is executed. Because the dynamics are deterministic and the action space is binary, the search quickly concentrates on promising branches, but it requires substantial compute at every control step.

  3. Proximal Policy Optimization (PPO) – a model‑free reinforcement‑learning agent trained with Stable‑Baselines3. The policy network is a multilayer perceptron with a discrete action head. Training runs for 2.5 million timesteps (converging after ≈2.1 M) on the same environment, using a curriculum of varied initial temperatures to promote generalisation. After training, inference is essentially instantaneous, offering near‑zero runtime cost.
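The bang‑bang rule in item 1 reduces to a threshold test. A minimal sketch, reusing the ±1 °C success band from the MDP definition; the action encoding is our assumption:

```python
def bang_bang_action(T: float, T_target: float) -> int:
    """Full power until the temperature enters the target band, then off."""
    return 0 if T >= T_target - 1.0 else 1   # 0 = off (0 W), 1 = on (6000 W)
```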
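For the MCTS planner in item 2, the UCB1 rule scores each child by its mean rollout value plus an exploration bonus. A sketch of just the selection score, assuming per‑node value totals and visit counts are accumulated during search (the surrounding tree machinery is omitted):

```python
import math

C = math.sqrt(2)   # exploration constant reported above

def ucb1(total_value: float, visits: int, parent_visits: int) -> float:
    """UCB1 score: exploitation term plus a visit-count exploration bonus."""
    if visits == 0:
        return float("inf")   # unvisited children are expanded first
    return total_value / visits + C * math.sqrt(math.log(parent_visits) / visits)
```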
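For item 3, training with Stable‑Baselines3 takes only a few lines. This sketch uses the library's standard API with default hyperparameters, since the summary specifies only the timestep budget; `WaterHeaterEnv` is a hypothetical stand‑in for the paper's Gymnasium environment.

```python
from stable_baselines3 import PPO

env = WaterHeaterEnv()                     # hypothetical constructor for the env above
model = PPO("MlpPolicy", env, verbose=1)   # MLP policy with a discrete action head
model.learn(total_timesteps=2_500_000)     # summary reports convergence near 2.1 M
model.save("ppo_water_heater")             # inference is then a single forward pass
```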

The authors conduct systematic one‑dimensional sweeps over initial temperature (10–30 °C), target temperature (40–80 °C), and deadline length (30–90 steps, each step = 120 s). Energy consumption is reported in kilowatt‑hours. Results show that PPO consistently achieves the lowest energy use. For a 60‑step horizon (2 h), PPO consumes an average of 3.23 kWh, whereas the bang‑bang baseline consumes between 4.37 and 10.45 kWh and MCTS consumes between 4.18 and 6.46 kWh. Relative savings grow with the horizon, from 26 % at the shortest 30‑step deadline to 69 % at the longest 90‑step deadline, because reaching the set‑point early leaves more time over which thermal losses must be replaced. In a representative scenario (initial 20 °C, target 60 °C, deadline 60 steps), PPO uses 54 % less energy than bang‑bang and 33 % less than MCTS.

The study highlights two key insights. First, deadline‑aware control dramatically reduces energy waste compared with naïve on‑off thermostatic rules, because it deliberately delays heating to avoid unnecessary losses. Second, there is a clear trade‑off between online planning and offline learning: MCTS needs no training but incurs heavy per‑step computation, while PPO requires an upfront training phase but then delivers near‑optimal performance with negligible runtime overhead. The paper positions its Gymnasium environment as a reproducible benchmark for future work, suggesting extensions such as time‑of‑use electricity tariffs, carbon‑intensity signals, and multi‑tank stratification. Overall, the work demonstrates that reinforcement‑learning policies can outperform both simple heuristics and deterministic planners in a realistic domestic hot‑water heating scenario, offering a practical pathway toward greener household energy use.

