Replicating the behaviour of electric vehicle drivers using an agent-based reinforcement learning model
Despite the rapid expansion of electric vehicle (EV) charging networks, questions remain about their efficiency in meeting the growing needs of EV drivers. Previous simulation-based approaches, which rely on static behavioural rules, have struggled to capture the adaptive behaviours of human drivers. Although reinforcement learning has been introduced in EV simulation studies, its application has primarily focused on optimising fleet operations rather than modelling private drivers who make independent charging decisions. To address this gap, we propose a multi-stage reinforcement learning framework that simulates the charging demand of private EV drivers across a national-scale road network. We validate the model against real-world data and identify the training stage that most closely reflects actual driver behaviour, capturing both the adaptive behaviours and the bounded rationality of private drivers. Based on the simulation results, we also identify critical ‘charging deserts’ where EV drivers consistently have low state of charge. Our findings further support recent policy shifts toward expanding rapid-charging hubs along motorway corridors and at city boundaries to meet demand from long-distance trips.
💡 Research Summary
This paper addresses the growing challenge of matching electric‑vehicle (EV) charging infrastructure to the needs of private drivers in Great Britain. Existing approaches either rely on data‑driven methods that suffer from limited or biased datasets, or on agent‑based models (ABMs) that use static behavioural rules and therefore cannot capture the adaptive learning of human drivers. While reinforcement learning (RL) has been applied to optimise fleet routing and charging, it has not been used to model the bounded‑rational, independent decisions of private EV owners.
To fill this gap, the authors develop a multi‑stage, large‑scale RL framework that combines clustering, deep Q‑network (DQN) training, and ABM simulation. First, drivers from the National Travel Survey are clustered using four quantitative attributes: trip distance, initial battery state‑of‑charge (SOC), trip‑co‑occurrence density (TCD), and charger density (CD). After clustering, each group is further split by trip purpose (work vs. leisure) to capture behavioural differences. For each cluster, the two agents closest to the centroid are selected as “representative agents.” These representatives are placed into two separate training sets, while ten random subsets of the full driver population serve as simulation agents. All combinations of the two training sets and ten simulation sets (20 runs total) are executed to reduce over‑fitting and increase statistical robustness.
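As a rough illustration of the pipeline's first step, the clustering and representative-agent selection might be sketched as follows. The feature values, cluster count, and minimal k-means implementation below are illustrative stand-ins, not the paper's actual data or code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical driver features: trip distance, initial SOC, TCD, CD
# (values and scaling are invented for illustration).
drivers = rng.random((200, 4))

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal k-means stand-in; returns (centroids, labels)."""
    local_rng = np.random.default_rng(seed)
    centroids = X[local_rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Distance of every driver to every centroid, then hard assignment.
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

centroids, labels = kmeans(drivers, k=5)

# For each cluster, pick the two members nearest the centroid as the
# "representative agents" that seed the two training sets.
representatives = {}
for j in range(len(centroids)):
    members = np.flatnonzero(labels == j)
    order = np.linalg.norm(drivers[members] - centroids[j], axis=1).argsort()
    representatives[j] = members[order[:2]].tolist()

print({j: len(v) for j, v in representatives.items()})
```

In practice the paper additionally splits each cluster by trip purpose before selecting representatives; the sketch above only covers the quantitative clustering step.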
The RL component uses off‑policy Q‑learning with a deep neural network to approximate the action‑value function. The state space includes current SOC, geographic location, real‑time charger availability, and an estimated waiting time (derived from personal experience, because queue length is not displayed in charging apps). The action space is discrete: choose a charger, travel to it, wait, charge, or skip charging. Hyper‑parameters (learning rate, discount factor, exploration rate) are tuned on an AWS cluster (64 vCPU, 128 GB RAM) via Bayesian optimisation.
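A minimal, numpy-only sketch of the off-policy Q-learning update described above, substituting a linear Q-function for the paper's deep network. The state layout, action labels, and hyper-parameter values are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Discrete action space mirroring the description above (labels illustrative).
ACTIONS = ["choose_charger", "travel", "wait", "charge", "skip"]

# State vector: [SOC, x, y, charger_availability, estimated_wait] -- a
# simplified stand-in for the state variables the paper uses.
STATE_DIM, N_ACTIONS = 5, len(ACTIONS)

# Linear Q-function in place of the paper's deep Q-network.
W = rng.normal(scale=0.1, size=(STATE_DIM, N_ACTIONS))

GAMMA, ALPHA, EPSILON = 0.95, 0.01, 0.1  # assumed hyper-parameter values

def q_values(state):
    return state @ W

def epsilon_greedy(state):
    if rng.random() < EPSILON:              # explore
        return int(rng.integers(N_ACTIONS))
    return int(q_values(state).argmax())    # exploit

def td_update(state, action, reward, next_state, done):
    """Off-policy Q-learning target: r + gamma * max_a' Q(s', a')."""
    target = reward + (0.0 if done else GAMMA * q_values(next_state).max())
    td_error = target - q_values(state)[action]
    W[:, action] += ALPHA * td_error * state  # gradient step for linear Q

# One illustrative transition on random data.
s, s2 = rng.random(STATE_DIM), rng.random(STATE_DIM)
a = epsilon_greedy(s)
td_update(s, a, reward=-1.0, next_state=s2, done=False)
print(ACTIONS[a])
```

The real framework replaces the linear weights with a neural network, samples transitions from a replay buffer, and tunes `ALPHA`, `GAMMA`, and `EPSILON` via Bayesian optimisation rather than fixing them as above.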
After each training episode, the learned policies are injected into the ABM simulation, where all agents share a common environment. Agents receive real‑time charger status updates, but they may still travel to occupied stations and form queues, reflecting realistic driver behaviour. The resulting charging‑session patterns (spatial distribution and temporal windows) are validated against a real‑world dataset from ChargePoint. Validation uses spatial correlation and temporal cross‑correlation metrics.
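The two validation metrics can be illustrated with synthetic data; the region counts, time series, and metric implementations below are assumptions, not the paper's actual procedure:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical simulated vs. observed charging-session counts per region.
sim_spatial = rng.poisson(20, size=50).astype(float)
obs_spatial = sim_spatial + rng.normal(0, 3, size=50)

# Spatial validation: Pearson correlation across regions.
spatial_r = np.corrcoef(sim_spatial, obs_spatial)[0, 1]

# Temporal validation: cross-correlation of hourly session counts,
# taking the peak over lags to tolerate small time shifts.
sim_t = rng.poisson(5, size=24).astype(float)
obs_t = np.roll(sim_t, 1) + rng.normal(0, 1, size=24)

def norm_xcorr(a, b):
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return np.correlate(a, b, mode="full") / len(a)

temporal_peak = norm_xcorr(sim_t, obs_t).max()
print(round(spatial_r, 3), round(temporal_peak, 3))
```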
Crucially, the episode that yields the highest correlation with observed data is not the final converged episode but an intermediate one, situated between early exploration and full convergence. This intermediate stage best reflects a heterogeneous population that includes both novice and experienced drivers, thereby capturing bounded rationality rather than idealised optimality.
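The episode-selection logic amounts to an argmax over per-episode validation scores; the correlation values below are invented purely to illustrate the shape of the result:

```python
# Hypothetical episode-wise validation correlations, early -> converged.
corrs = [0.31, 0.52, 0.71, 0.78, 0.74, 0.69]

# The best-matching policy is an intermediate episode, not the final one.
best_episode = max(range(len(corrs)), key=corrs.__getitem__)
print(best_episode)  # → 3
```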
The validated model uncovers “charging deserts” – regions where drivers frequently arrive with low SOC and have limited access to public chargers. These deserts are concentrated in north‑west England, inland Scotland, and certain suburban fringes. Policy simulations show that adding rapid‑charging hubs along motorways and medium‑capacity chargers at city boundaries reduces overall charging‑failure rates by 23 % and cuts average waiting times by 15 %. These findings align with recent UK policy shifts that prioritize motorway corridors and city‑edge infrastructure to support long‑distance EV travel.
Limitations acknowledged by the authors include: (1) the absence of qualitative driver attributes such as price sensitivity, environmental concern, or satisfaction; (2) no modelling of charger outages, power‑grid constraints, or other non‑normal operating conditions; and (3) reliance on the selected clustering variables, which are constrained by data availability.
Future work is proposed to enrich the reward function with survey‑derived preferences, to incorporate multi‑agent cooperative learning that captures interactions between drivers and charger operators, and to integrate power‑system constraints for joint optimisation of charging demand and electricity supply.
In summary, the study delivers a novel, scalable RL‑based ABM that realistically reproduces private EV driver behaviour at a national scale, validates it against real charging data, identifies critical infrastructure gaps, and provides evidence‑based guidance for more equitable and efficient charging‑network planning.