A Cross-Domain Graph Learning Protocol for Single-Step Molecular Geometry Refinement
Accurate molecular geometries are a prerequisite for reliable quantum-chemical predictions, yet density functional theory (DFT) optimization remains a major bottleneck for high-throughput molecular screening. Here we present GeoOpt-Net, a multi-branch SE(3)-equivariant geometry refinement network that predicts DFT-quality structures at the B3LYP/TZVP level of theory in a single forward pass starting from inexpensive initial conformers generated at a low-cost force-field level. GeoOpt-Net is trained using a two-stage strategy in which a broadly pretrained geometric representation is subsequently fine-tuned to approach B3LYP/TZVP-level accuracy, with theory- and basis-set-aware calibration enabled by a fidelity-aware feature modulation (FAFM) mechanism. Benchmarking against representative approaches spanning classical conformer generation (RDKit), semiempirical quantum methods (xTB), data-driven geometry refinement pipelines (Auto3D), and machine-learning interatomic potentials (UMA) on external drug-like molecules demonstrates that GeoOpt-Net achieves sub-milli-Å all-atom RMSD with near-zero B3LYP/TZVP single-point energy deviations, indicating DFT-ready geometries that closely reproduce both structural and energetic references. Beyond geometric metrics, GeoOpt-Net generates initial guesses intrinsically compatible with DFT convergence criteria, yielding nonzero "All-YES" convergence rates (65.0% under loose and 33.4% under default thresholds), and substantially reducing re-optimization steps and wall-clock time. GeoOpt-Net further exhibits smooth and predictable energy scaling with molecular complexity while preserving key electronic observables such as dipole moments. Collectively, these results establish GeoOpt-Net as a scalable, physically consistent geometry refinement framework that enables efficient acceleration of DFT-based quantum-chemical workflows.
💡 Research Summary
The paper introduces GeoOpt‑Net, a single‑step geometry refinement framework that delivers B3LYP/TZVP‑quality molecular structures directly from inexpensive force‑field conformers. Traditional DFT optimizations scale as O(N³) to O(N⁴) with system size and require iterative self‑consistent field cycles, and machine‑learning potential energy surface (PES) approaches still need multiple force evaluations. GeoOpt‑Net instead predicts refined Cartesian coordinates in one forward pass without constructing an explicit PES.
The architecture is built on SE(3)‑equivariant graph neural networks that decompose molecular geometry into three independent streams: bond lengths, bond angles, and dihedral torsions. Each stream encodes pairwise distances with radial basis functions and angular information with spherical harmonics, producing scalar (ℓ = 0) and directional (ℓ ≥ 1) features. These features are coupled via Clebsch‑Gordan tensor products, preserving exact equivariance under rotations and translations. A lightweight transformer decoder then fuses the three streams and outputs coordinate updates, applying non‑linearities only to scalar channels to maintain equivariance.
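As a minimal sketch of the distance-encoding step described above, the following NumPy snippet expands all pairwise distances in a Gaussian radial basis. The Gaussian form, the function name gaussian_rbf, and the parameters num_basis, cutoff, and gamma are illustrative assumptions; the summary does not state which radial basis GeoOpt‑Net actually uses.

```python
import numpy as np

def gaussian_rbf(coords, num_basis=16, cutoff=5.0, gamma=10.0):
    """Expand pairwise distances of an (N, 3) array in Gaussian radial basis functions.

    Assumed form (not taken from the paper): exp(-gamma * (r_ij - mu_k)^2),
    with centers mu_k evenly spaced on [0, cutoff] in angstrom.
    Returns an array of shape (N, N, num_basis).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]   # (N, N, 3) displacement vectors
    dist = np.linalg.norm(diff, axis=-1)             # (N, N) interatomic distances
    centers = np.linspace(0.0, cutoff, num_basis)    # (K,) basis centers
    return np.exp(-gamma * (dist[..., None] - centers) ** 2)
```

Because the features depend only on interatomic distances, they are unchanged by any rotation or translation of the input, which is the property expected of the scalar (ℓ = 0) channels. Directional (ℓ ≥ 1) features would additionally require spherical harmonics of the displacement vectors, which this sketch omits.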
Training proceeds in two stages. Stage 1 pre‑trains the network on ~290 k molecules from QM9 and QM40, computed at the lower‑cost B3LYP/6‑31G(2df,p) level, to learn generic bonding patterns, angular preferences, and torsional statistics. Stage 2 fine‑tunes the pretrained weights on the QMe14S dataset (180 k molecules, 14 elements) calculated at the target B3LYP/TZVP level. To adapt the model across theory and basis‑set domains without retraining the whole network, the authors introduce Fidelity‑aware Feature Modulation (FAFM). FAFM generates scale (g_d) and shift (b_d) parameters from a domain embedding d and modulates intermediate features via h ← h ⊙ (1 + g_d) + b_d, enabling theory‑specific calibration.
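The FAFM update h ← h ⊙ (1 + g_d) + b_d can be sketched in a few lines of NumPy. Only the update rule itself comes from the text. The class name FAFM, the embedding table embed, the linear maps w_g and w_b, and their zero initialization are hypothetical choices made for illustration, and the sketch treats h as invariant scalar features.

```python
import numpy as np

class FAFM:
    """Sketch of fidelity-aware feature modulation: h <- h * (1 + g_d) + b_d."""

    def __init__(self, num_domains, embed_dim, feat_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One embedding vector per theory/basis-set domain (assumed layout).
        self.embed = rng.normal(scale=0.1, size=(num_domains, embed_dim))
        # Linear maps from the domain embedding to scale and shift.
        # Zero initialization (an assumption) makes the layer start as the identity.
        self.w_g = np.zeros((embed_dim, feat_dim))
        self.w_b = np.zeros((embed_dim, feat_dim))

    def __call__(self, h, domain):
        d = self.embed[domain]      # (embed_dim,) domain embedding
        g_d = d @ self.w_g          # (feat_dim,) scale offset
        b_d = d @ self.w_b          # (feat_dim,) shift
        # Broadcasts over the leading (atom) axis of h.
        return h * (1.0 + g_d) + b_d
```

A call such as FAFM(num_domains=2, embed_dim=4, feat_dim=3)(h, 0) returns h unchanged until w_g and w_b are trained. Adding a constant shift to directional (ℓ ≥ 1) features would break rotational equivariance, so a real implementation would presumably restrict b_d to scalar channels; the summary does not say how the authors handle this.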
Benchmarking against classical conformer generators (RDKit/ETKDG), semi‑empirical methods (xTB), data‑driven pipelines (Auto3D), and ML interatomic potentials (UMA) shows that GeoOpt‑Net achieves sub‑milli‑angstrom all‑atom RMSD (median ≈ 0.0008 Å) and near‑zero single‑point energy deviations (≈ 0.02 kcal mol⁻¹) relative to reference B3LYP/TZVP geometries. Error decomposition reveals bond‑length, angle, and dihedral errors well below 0.01 Å, 0.5°, and 2°, respectively. Importantly, the refined structures satisfy all DFT convergence criteria ("All‑YES") in 65.0 % of cases under loose thresholds and 33.4 % under default thresholds, whereas all baseline methods achieve 0 %. Consequently, the number of required re‑optimization steps drops from an average of ~2 steps to < 0.5 steps, cutting wall‑clock time by more than 70 %.
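For readers who want to reproduce an all‑atom RMSD of this kind, one common recipe is to superimpose the predicted and reference geometries with the Kabsch algorithm before measuring deviations. The sketch below assumes that recipe and identical atom ordering in both structures; the summary does not specify the alignment procedure the authors used, and the function name kabsch_rmsd is illustrative.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate arrays after optimal rigid superposition.

    Assumes both arrays list the same atoms in the same order.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Remove translation by centering both structures on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    cov = p_c.T @ q_c
    u, _, vt = np.linalg.svd(cov)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    # Rotate p onto q (row-vector convention) and measure the residual.
    p_rot = p_c @ rot
    return float(np.sqrt(np.mean(np.sum((p_rot - q_c) ** 2, axis=1))))
```

Applied to a structure and a rigidly rotated and translated copy of itself, the function returns a value at numerical zero, so any nonzero result reflects genuine internal-coordinate differences rather than overall orientation.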
Additional analyses demonstrate smooth scaling of predicted energies with molecular size and rotatable‑bond count, and strong correlation (R > 0.96) of electronic observables such as dipole moments and HOMO‑LUMO gaps with the DFT reference. These findings indicate that GeoOpt‑Net preserves not only geometric fidelity but also electronic consistency.
The authors discuss broader implications: GeoOpt‑Net can serve as a drop‑in pre‑processor for high‑throughput virtual screening, drug‑candidate geometry preparation, and catalyst design, substantially shortening DFT geometry optimization, which they identify as a major bottleneck in such workflows. Future work is outlined to extend the approach to transition‑metal complexes, incorporate multi‑theory transfer (e.g., MP2, CCSD(T)), and integrate with reaction‑pathway generators for simultaneous transition‑state prediction.
In summary, GeoOpt‑Net combines (1) exact SE(3) equivariance, (2) multi‑branch geometric decomposition, and (3) theory‑aware FAFM fine‑tuning to deliver DFT‑level accuracy in a deterministic, single‑inference step. This represents a substantial advance in accelerating quantum‑chemical workflows while maintaining physical rigor and reproducibility.