Fine-tuning universal machine learning potentials for transition state search in surface catalysis
Determining transition states (TSs) of surface reactions is central to understanding and designing heterogeneous catalysts but remains computationally prohibitive with density functional theory (DFT). While machine learning potentials (MLPs) offer significant speedups, task-specific models have limited transferability across catalytic systems, and universal MLPs (uMLPs) lack the accuracy needed for reactive configurations. Here, we present a workflow based on active learning to iteratively fine-tune uMLPs for DFT-quality TS search. Using 250 TSs from the CO2 hydrogenation reaction network on metal and single-atom alloy surfaces, we first benchmark TS search algorithms, identifying the Sella algorithm as most robust, and propose a modification (Bond-Aware Sella) that substantially improves its success rate. We then explore sequential and batch active-learning strategies for fine-tuning and show that DFT-quality TS structures can be found using only 8 DFT single-point calculations on average per structure. This demonstrates the viability of fine-tuned uMLPs for high-throughput catalyst screening.
💡 Research Summary
This paper addresses the long‑standing bottleneck of locating transition states (TSs) for surface reactions, which is essential for understanding and designing heterogeneous catalysts but remains computationally prohibitive when using density functional theory (DFT). The authors propose a comprehensive workflow that combines universal machine‑learning potentials (uMLPs) with an active‑learning loop to iteratively fine‑tune the potentials until DFT‑quality forces are achieved.
The study focuses on 250 TSs from the CO₂ hydrogenation reaction network on a variety of metal and single-atom alloy surfaces, providing a diverse test set that spans multiple elements and reaction types. First, the authors benchmark four TS search algorithms (NEB, Dimer, ARPESS, and Sella) using the pre-trained eSEN-OAM uMLP for force evaluations. Sella emerges as the most robust, but its raw success rate plateaus around 80%. To improve robustness, the authors introduce Bond-Aware Sella (BA-Sella), which injects chemically informed bond-formation/breaking vectors into Sella's Hessian approximation. By repeatedly applying rank-one updates that align the lowest-curvature eigenmode with the expected bond direction, BA-Sella raises the success rate to 88% across all tested uMLPs (including CHGNet-0.3, MACE-MPA, eSCN-OC20, and UMA-M).
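The idea of steering the lowest-curvature mode toward a bond direction can be illustrated with a small NumPy sketch. This is not the authors' BA-Sella implementation and does not use Sella's API: the function names, the fixed `step` size, and the overlap threshold are all assumptions chosen for illustration. It repeatedly subtracts a rank-one term along a bond-stretch vector from an approximate Hessian until the lowest eigenmode has negative curvature and points along that vector.

```python
import numpy as np

def bond_vector(positions, i, j):
    """Unit vector in 3N-dimensional space that stretches the i-j distance.

    Hypothetical helper: a stand-in for a 'chemically informed' direction.
    """
    positions = np.asarray(positions, dtype=float)
    d = positions[j] - positions[i]
    d = d / np.linalg.norm(d)
    v = np.zeros_like(positions)
    v[i] = -d
    v[j] = d
    v = v.ravel()
    return v / np.linalg.norm(v)

def align_lowest_mode(hessian, v, step=0.5, min_overlap=0.9, max_updates=200):
    """Apply rank-one updates H <- H - step * v v^T until the lowest
    eigenmode has negative curvature and overlaps with v.

    The update rule and thresholds are illustrative assumptions.
    """
    h = np.array(hessian, dtype=float)
    for _ in range(max_updates):
        eigvals, eigvecs = np.linalg.eigh(h)   # eigenvalues in ascending order
        overlap = abs(eigvecs[:, 0] @ v)
        if eigvals[0] < 0.0 and overlap >= min_overlap:
            return h, eigvals[0], overlap
        h -= step * np.outer(v, v)             # lower the curvature along v
    raise RuntimeError("lowest mode did not align with the bond direction")
```

For example, with three atoms and a positive-definite starting Hessian, `align_lowest_mode(H, bond_vector(pos, 0, 1))` returns a modified Hessian whose lowest mode is dominated by the stretch of the 0-1 bond, which is the kind of initial curvature information a saddle-point optimizer needs.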
Having identified a reliable TS search method, the authors then develop an active-learning scheme to refine the uMLP. The loop proceeds as follows: (1) use the current uMLP to optimize a TS until the predicted forces fall below a user-defined threshold; (2) perform a single-point DFT calculation on the resulting geometry; (3) use the DFT forces as labels to fine-tune the uMLP; (4) repeat until the DFT forces at the uMLP-optimized geometry also fall below the threshold. Two strategies are compared. In the sequential approach, each TS is treated independently; the uMLP is updated after each DFT point and then reused for the same TS, yielding a highly specialized potential. In the batch approach, data from many TS trajectories are pooled to update a single, more general potential that is reused for all calculations.
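The four steps of the sequential loop can be sketched with a deliberately simple toy problem. Everything below is an assumption made for illustration: the "uMLP" is a one-dimensional linear force model with a fitted constant shift, the "DFT single point" is an analytic reference force with its stationary point at x = 1, and "fine-tuning" just refits the shift to the newest labelled point. Only the control flow mirrors the workflow described above.

```python
from dataclasses import dataclass

@dataclass
class ToyModel:
    """Stand-in for a uMLP: linear force field plus a learned constant shift."""
    k: float = 0.8       # model curvature (deliberately inaccurate)
    x_ts: float = 0.3    # model TS location (deliberately inaccurate)
    shift: float = 0.0   # correction fitted to reference forces

    def force(self, x):
        return self.k * (x - self.x_ts) + self.shift

def reference_force(x):
    """Stand-in for a DFT single point: stationary point at x = 1."""
    return 1.0 * (x - 1.0)

def optimize_ts(model, x, fmax):
    """Stand-in for a TS search: Newton steps to the zero of the model force."""
    while abs(model.force(x)) >= fmax:
        x -= model.force(x) / model.k
    return x

def fine_tune(model, dataset):
    """Stand-in for fine-tuning: refit the shift to the newest labelled point."""
    x, f_ref = dataset[-1]
    unshifted = model.k * (x - model.x_ts)
    return ToyModel(model.k, model.x_ts, f_ref - unshifted)

def sequential_active_learning(x0, model, fmax=0.01, max_dft_calls=50):
    """Run the optimize / label / fine-tune loop for one TS."""
    dataset = []
    x = x0
    for n_dft in range(1, max_dft_calls + 1):
        x = optimize_ts(model, x, fmax)        # (1) optimize with current model
        f_ref = reference_force(x)             # (2) one reference single point
        if abs(f_ref) < fmax:                  # (4) stop when reference forces converge
            return x, model, n_dft
        dataset.append((x, f_ref))
        model = fine_tune(model, dataset)      # (3) update the model on the new label
    raise RuntimeError("reference forces did not converge")
```

Calling `sequential_active_learning(0.0, ToyModel())` returns the converged position, the specialized model, and the number of reference evaluations used. A batch variant would differ mainly in pooling `dataset` across many TS searches before each call to `fine_tune`.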
Performance metrics show that the sequential active-learning strategy dramatically reduces the number of expensive DFT evaluations: on average only ~8 DFT single-point calculations are required per TS, compared with ~102 for a full DFT optimization and ~70 when a DFT refinement follows a single uMLP pre-optimization. The batch strategy also cuts the cost (average ~38 DFT points) but exhibits larger variability. Additional experiments demonstrate that augmenting BA-Sella with a modest number of NEB steps does not significantly improve success, whereas stochastic restarts (random perturbations of the TS guess and optimizer settings) can push the overall success rate to ~97% when up to 20 restarts are allowed.
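A stochastic-restart wrapper of the kind described here might look like the sketch below. It is an assumption-laden illustration, not the authors' code: `run_search` is a hypothetical callable that returns a converged structure or `None` on failure, only the initial guess is perturbed (the summary also mentions varying optimizer settings, which is omitted), and the Gaussian noise scale is an arbitrary choice.

```python
import numpy as np

def search_with_restarts(guess, run_search, max_restarts=20, noise=0.05, seed=0):
    """Retry a TS search from randomly perturbed guesses.

    run_search(positions) is assumed to return a result on success and
    None on failure. Returns (result, attempts), where attempts counts
    the initial try plus any restarts.
    """
    rng = np.random.default_rng(seed)
    guess = np.asarray(guess, dtype=float)
    result = run_search(guess)                  # first try: unperturbed guess
    attempts = 1
    while result is None and attempts <= max_restarts:
        trial = guess + rng.normal(scale=noise, size=guess.shape)
        result = run_search(trial)              # restart from a perturbed guess
        attempts += 1
    return result, attempts
```

With `max_restarts=20`, a search that never succeeds is attempted 21 times in total before the wrapper gives up and returns `None`.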
In summary, the authors deliver a practical, high‑throughput workflow that (i) leverages chemically informed curvature control to make Sella more reliable on surfaces, (ii) employs active learning to fine‑tune universal potentials with minimal DFT data, and (iii) achieves DFT‑level TS geometries with an order‑of‑magnitude reduction in computational cost. This approach is poised to accelerate mechanistic studies of heterogeneous catalysis and enable large‑scale screening of catalyst materials across the periodic table. Future work may extend the methodology to multi‑step reaction networks, incorporate uncertainty quantification, and integrate automated pipeline deployment for broader community use.