All-in-one foundational models learning across quantum chemical levels

Notice: This research summary and analysis were generated automatically using AI. For authoritative details, please refer to the original arXiv source.

Machine learning (ML) potentials typically target a single quantum chemical (QC) level, while the ML models developed for multi-fidelity learning have not been shown to provide scalable solutions for foundational models. Here we introduce the all-in-one (AIO) ANI model architecture based on multimodal learning, which can learn an arbitrary number of QC levels. Our all-in-one learning approach offers a more general and easier-to-use alternative to transfer learning. We use it to train the AIO-ANI-UIP foundational model with generalization capability comparable to semi-empirical GFN2-xTB and to DFT with a double-zeta basis set for organic molecules. We show that the AIO-ANI model can learn across different QC levels ranging from semi-empirical to density functional theory to coupled cluster. We also use AIO models to design the foundational model Δ-AIO-ANI based on Δ-learning, with increased accuracy and robustness compared to AIO-ANI-UIP. The code and the foundational models are available at https://github.com/dralgroup/aio-ani; they will be integrated into the universal and updatable AI-enhanced QM (UAIQM) library and made available in the MLatom package so that they can be used online at the XACS cloud computing platform (see https://github.com/dralgroup/mlatom for updates).


💡 Research Summary

The paper introduces an “All‑in‑One” (AIO) neural‑network architecture for quantum‑chemical (QC) multi‑fidelity learning, built on the well‑established ANI framework. By appending a one‑hot encoded descriptor of the reference level of theory to the atomic environment vectors (AEVs) that encode geometry, the model can simultaneously predict energies and forces at any number of QC levels it has seen during training. This multimodal approach eliminates the need for separate models, pre‑training/fine‑tuning pipelines, or level‑specific feature engineering that are typical of transfer learning (TL) and Δ‑learning methods.
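The level-conditioning idea can be sketched as follows. This is a minimal illustration, not the paper's code: the AEV dimension, the number of levels, and the function names are assumptions made for the example.

```python
import numpy as np

def one_hot(level_index: int, n_levels: int) -> np.ndarray:
    """One-hot descriptor identifying the target QC level."""
    v = np.zeros(n_levels)
    v[level_index] = 1.0
    return v

def aio_atomic_input(aev: np.ndarray, level_index: int, n_levels: int) -> np.ndarray:
    """Concatenate the geometric AEV with the level one-hot, forming the
    per-atom input of the network (hypothetical helper, illustrative only)."""
    return np.concatenate([aev, one_hot(level_index, n_levels)])

# Toy example: a 384-dimensional AEV (size assumed) and 3 trained QC levels.
aev = np.random.rand(384)
x_dft = aio_atomic_input(aev, level_index=1, n_levels=3)
assert x_dft.shape == (384 + 3,)
```

Because only the appended one-hot changes between fidelity levels, the same geometry can be evaluated at every trained level without retraining or fine-tuning.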

Training data were assembled from the ANI‑1ccx dataset, comprising ~4.5 M off‑equilibrium conformations with ωB97X/def2‑TZVPP DFT energies and forces, and ~0.5 M high‑level CCSD(T)*/CBS energies. To broaden the fidelity spectrum, the authors added semi‑empirical GFN2‑xTB and ODM2 energies (and forces) for the same geometries, while keeping dispersion corrections (D4) separate and re‑adding them at inference time. Each level’s self‑atomic energies (SAE) were subtracted during training and restored during prediction, ensuring consistent energy baselines across levels.
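The SAE bookkeeping described above amounts to a simple round trip; the sketch below uses placeholder energy values (not the paper's fitted SAEs) keyed by element and level.

```python
# Per-level self-atomic energies in hartree; values are placeholders
# for illustration, not the paper's fitted SAEs.
SAE = {
    ("H", "DFT"): -0.500, ("O", "DFT"): -75.060,
    ("H", "CC"):  -0.499, ("O", "CC"):  -75.090,
}

def remove_sae(total_energy: float, elements: list, level: str) -> float:
    """Training target: total energy minus the sum of self-atomic energies."""
    return total_energy - sum(SAE[(el, level)] for el in elements)

def add_sae(ml_energy: float, elements: list, level: str) -> float:
    """Prediction: restore the level-specific atomic baseline."""
    return ml_energy + sum(SAE[(el, level)] for el in elements)

# Round trip for a water molecule at the DFT level:
e_total = -76.4
target = remove_sae(e_total, ["O", "H", "H"], "DFT")
restored = add_sae(target, ["O", "H", "H"], "DFT")
assert abs(restored - e_total) < 1e-9
```

Subtracting level-specific SAEs leaves the network to learn only the (much smaller) interaction energy, which keeps the targets on a comparable scale across fidelity levels.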

The resulting AIO‑ANI‑UIP model can output DFT or CCSD(T) energies from a single forward pass. Internal validation shows mean absolute errors of ~1.2 kcal mol⁻¹ for both levels. On the external GMTKN55 benchmark (CHNO closed‑shell neutral subset), the weighted mean absolute deviation (WTMAD‑2) is 10.5 kcal mol⁻¹ for DFT predictions and 9.87 kcal mol⁻¹ for CC predictions—comparable to semi‑empirical GFN2‑xTB and DFT B3LYP‑D4/6‑31G* while being orders of magnitude faster.
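For readers unfamiliar with the benchmark metric, WTMAD-2 weights each GMTKN55 subset by its size and by the inverse of its mean absolute reference energy. A minimal sketch, assuming the published normalization constant of 56.84 kcal/mol from the GMTKN55 definition (Goerigk et al., 2017):

```python
def wtmad2(subsets):
    """WTMAD-2 over GMTKN55-style subsets: each subset i contributes
    N_i * (56.84 / mean|dE|_i) * MAD_i, normalized by the total number
    of reactions. `subsets` is a list of (N_i, mean_abs_dE_i, MAD_i)
    tuples, all energies in kcal/mol."""
    total_n = sum(n for n, _, _ in subsets)
    weighted = sum(n * (56.84 / mean_de) * mad for n, mean_de, mad in subsets)
    return weighted / total_n

# Toy check: a single subset whose mean |dE| equals the constant
# contributes its MAD unchanged.
assert abs(wtmad2([(10, 56.84, 2.0)]) - 2.0) < 1e-12
```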

A direct comparison with a conventional TL workflow (pre‑train on DFT, fine‑tune on CC) demonstrates that AIO learning converges in ~1,000 epochs (versus ~3,750 total epochs for TL) and yields a modestly lower WTMAD‑2 (9.87 vs. 10.54 kcal mol⁻¹). The authors argue that AIO’s single‑step training, freedom from layer‑freezing decisions, and ability to handle arbitrary numbers of levels make it a more scalable alternative.

To further boost accuracy, the authors exploit Δ‑learning: the same AIO‑ANI network is used to predict the energy difference between CC and DFT for a given geometry, which is then added to a baseline DFT calculation. The resulting Δ‑AIO‑ANI model halves the WTMAD‑2 to 4.69 kcal mol⁻¹, outperforming both the pure AIO‑ANI‑UIP and traditional DFT methods.
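The Δ-learning arithmetic is straightforward; the sketch below uses toy numbers and hypothetical function names, not values from the paper.

```python
def delta_target(e_cc: float, e_dft: float) -> float:
    """Training label for the Delta model: the CC-DFT energy difference
    for one geometry (hypothetical helper, illustrative only)."""
    return e_cc - e_dft

def delta_prediction(e_baseline_dft: float, delta_ml: float) -> float:
    """Inference: add the ML-predicted correction to an explicitly
    computed DFT baseline energy."""
    return e_baseline_dft + delta_ml

# Round trip with toy energies in hartree (not from the paper):
label = delta_target(-76.412, -76.400)
assert abs(delta_prediction(-76.400, label) - (-76.412)) < 1e-9
```

The trade-off is that Δ-AIO-ANI requires an actual DFT calculation at inference time, whereas AIO-ANI-UIP is a standalone potential.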

When extending the approach to three, four, or five QC levels (including semi‑empirical methods), validation errors continue to drop with training, but the generalization error exhibits erratic dependence on epoch count, indicating a high risk of over‑fitting. The authors mitigate this by introducing an external validation set (S30L) focused on non‑covalent interactions; monitoring this set stabilizes the training and yields robust models.
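The mitigation reduces to checkpoint selection against the external set rather than the internal one; a minimal sketch with made-up error values:

```python
def select_checkpoint(external_mae_by_epoch: list) -> int:
    """Model selection on an external validation set (S30L in the paper):
    return the epoch index with the lowest external error, guarding
    against erratic generalization despite a falling internal loss."""
    return min(range(len(external_mae_by_epoch)),
               key=external_mae_by_epoch.__getitem__)

# Internal loss keeps falling, but the external error is erratic
# (values are illustrative):
errors = [8.1, 6.3, 7.9, 5.2, 9.4]
best = select_checkpoint(errors)
assert best == 3 and errors[best] == 5.2
```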

In summary, the AIO‑ANI architecture provides a simple, scalable, and data‑efficient route to “foundational” quantum‑chemical models that can predict at multiple fidelity levels from a single network. The code and pretrained models are released on GitHub and will be integrated into the MLatom package and the XACS cloud platform, enabling users to run fast, multi‑level ML potentials without bespoke model training. The work suggests that multimodal, all‑in‑one learning may become a standard paradigm for future quantum‑chemical machine learning.

