Mimyria: Machine learned vibrational spectroscopy for aqueous systems made simple
Vibrational spectroscopy provides a powerful connection between molecular dynamics (MD) simulations and experiment, but its routine use in condensed-phase systems remains limited. We introduce mimyria, a modular and automated framework that orchestrates electronic-structure reference calculations, trains atom-resolved machine-learning response models, and generates IR and Raman spectra from MD trajectories within a unified workflow. We further introduce the polarizability gradient tensor (PGT) as a novel atom-resolved machine-learning target property for Raman spectroscopy, complementing the established atomic polar tensor (APT) for IR spectroscopy. As a necessary prerequisite, we demonstrate how both PGTs and APTs can be computed accurately from electronic-structure theory, validate them across formally equivalent derivative formulations, and thereby benchmark their numerical consistency. We then employ machine learning as an efficient surrogate to represent the validated APT and PGT response functions on aqueous benchmark systems. We validate the trained models directly at the level of the spectrum against explicit ab initio reference calculations and find that IR and Raman spectra converge with surprisingly small training sets. Moreover, spectral agreement improves more rapidly than the root-mean-square error (RMSE) does. While the RMSE is straightforward to compute, statistically converged reference spectra are generally impractical to obtain, motivating the need to relate model-level errors to observable-level accuracy. By connecting these complementary error measures, we provide practical guidelines and early-stopping criteria for achieving sufficient spectral fidelity. By integrating response-tensor learning, automated training, and spectral-domain validation into a unified workflow, mimyria enables data-efficient and quantitatively reliable vibrational spectroscopy.
Research Summary
The paper introduces “Mimyria,” an integrated, automated framework that connects molecular dynamics (MD) simulations and vibrational spectroscopy (IR and Raman) through machine-learning (ML) surrogate models of electronic response tensors. The authors first define two atom-resolved response properties: the Atomic Polar Tensor (APT), already established for IR spectroscopy, and a novel Polarizability Gradient Tensor (PGT) for Raman spectroscopy. Both tensors are computed from electronic-structure theory. The APT is the derivative of the dipole moment with respect to atomic displacements, and the PGT is the derivative of the polarizability tensor with respect to atomic displacements. The paper checks the numerical consistency of these tensors by comparing several formally equivalent derivative formulations, confirming that they can be computed accurately and consistently.
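Both tensors are mixed derivatives of the energy E(R, F) with respect to the nuclear positions R and a uniform electric field F, and this is why several derivative routes are formally equivalent. Let the dipole be μ = -∂E/∂F, the polarizability α = ∂μ/∂F, and the forces f_i = -∂E/∂R_i. Then the APT can be obtained either as ∂μ_a/∂R_ib or as ∂f_ib/∂F_a. The PGT can be obtained either as ∂α_ab/∂R_ic or as ∂²f_ic/∂F_a∂F_b.

The sketch below checks these identities by central finite differences on a hypothetical analytic toy model. The functions `dipole`, `polarizability`, and `forces`, and all parameters, are invented for illustration. In practice these quantities would come from an electronic-structure code. This is not the paper's validation protocol.

```python
import numpy as np

# Hypothetical toy model (not a real electronic-structure method): 3 atoms in a
# uniform field F with energy
#   E(R, F) = 1/2 K sum_i |R_i|^2 - F . mu(R) - 1/2 F . alpha(R) . F
K, GAMMA, ALPHA0, BETA = 1.0, 0.3, 2.0, 0.5
Q = np.array([0.8, -0.4, -0.4])  # hypothetical partial charges, one per atom


def dipole(R):
    """Zero-field dipole mu(R) = sum_i (q_i + gamma |R_i|^2) R_i."""
    r2 = np.sum(R**2, axis=1, keepdims=True)
    return np.sum((Q[:, None] + GAMMA * r2) * R, axis=0)


def polarizability(R):
    """alpha(R) = alpha0 * I + beta * sum_i R_i R_i^T, a symmetric 3x3 tensor."""
    return ALPHA0 * np.eye(3) + BETA * np.einsum("ia,ib->ab", R, R)


def forces(R, F):
    """Analytic forces f_i = -dE/dR_i of the toy model in the field F."""
    FR = (R @ F)[:, None]                      # F . R_i, shape (N, 1)
    r2 = np.sum(R**2, axis=1, keepdims=True)   # |R_i|^2, shape (N, 1)
    return -K * R + (Q[:, None] + GAMMA * r2) * F + 2.0 * GAMMA * FR * R + BETA * FR * F


def apt_from_dipole(R, h=1e-4):
    """Route 1: Z[i, a, b] = d mu_a / d R_ib by central differences."""
    Z = np.zeros((len(R), 3, 3))
    for i in range(len(R)):
        for b in range(3):
            dR = np.zeros_like(R)
            dR[i, b] = h
            Z[i, :, b] = (dipole(R + dR) - dipole(R - dR)) / (2 * h)
    return Z


def apt_from_forces(R, h=1e-4):
    """Route 2: Z[i, a, b] = d f_ib / d F_a at F = 0 by central differences."""
    Z = np.zeros((len(R), 3, 3))
    for a in range(3):
        dF = np.zeros(3)
        dF[a] = h
        Z[:, a, :] = (forces(R, dF) - forces(R, -dF)) / (2 * h)
    return Z


def pgt_from_polarizability(R, h=1e-4):
    """Route 1: P[i, a, b, c] = d alpha_ab / d R_ic by central differences."""
    P = np.zeros((len(R), 3, 3, 3))
    for i in range(len(R)):
        for c in range(3):
            dR = np.zeros_like(R)
            dR[i, c] = h
            P[i, :, :, c] = (polarizability(R + dR) - polarizability(R - dR)) / (2 * h)
    return P


def pgt_from_forces(R, h=1e-3):
    """Route 2: P[i, a, b, c] = d^2 f_ic / dF_a dF_b at F = 0 (mixed central differences)."""
    P = np.zeros((len(R), 3, 3, 3))
    e = np.eye(3) * h
    for a in range(3):
        for b in range(3):
            P[:, a, b, :] = (forces(R, e[a] + e[b]) - forces(R, e[a] - e[b])
                             - forces(R, -e[a] + e[b]) + forces(R, -e[a] - e[b])) / (4 * h * h)
    return P


rng = np.random.default_rng(0)
R = rng.normal(size=(3, 3))  # positions of the 3 toy atoms
print(np.allclose(apt_from_dipole(R), apt_from_forces(R), atol=1e-6))          # True
print(np.allclose(pgt_from_polarizability(R), pgt_from_forces(R), atol=1e-6))  # True
```

On real electronic-structure output, the two routes differ in which quantities must be converged and differentiated. Agreement between them is therefore a useful consistency check on step sizes and convergence settings.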
Having established reliable reference data, the authors train separate neural-network models that predict the APT and PGT of each atom from its local atomic environment (atomic positions and species). This “direct-derivative learning” approach differs from most existing ML response models, which learn global dipole moments or polarizabilities and then differentiate them. Here, the models learn the atom-wise tensors directly, which avoids the ambiguities of partitioning global quantities into atomic contributions. Remarkably, the training data are generated from small water clusters, yet the learned models transfer to periodic bulk water and to an aqueous sulfate solution. This indicates that the target tensors are governed mainly by local environments and are largely insensitive to system size.
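The APT and PGT are derivatives with respect to atomic positions. The chain rule therefore turns per-atom predictions into the time derivatives that enter the spectra:

- dμ_a/dt = Σ_i Σ_b Z_i,ab v_ib
- dα_ab/dt = Σ_i Σ_c P_i,abc v_ic

Here v_i are the atomic velocities from the MD trajectory. Below is a minimal NumPy sketch. It assumes the predicted tensors are stored as arrays `Z[t, i, a, b]` and `P[t, i, a, b, c]`, which is a hypothetical layout and not the framework's API. Random arrays stand in for model output. The final split of dα/dt into its trace (isotropic) and traceless (anisotropic) parts is the usual separation used for Raman line shapes.

```python
import numpy as np


def dipole_derivative(Z, v):
    """mu_dot[t, a] = sum_i sum_b Z[t, i, a, b] * v[t, i, b] (chain rule)."""
    return np.einsum("tiab,tib->ta", Z, v)


def polarizability_derivative(P, v):
    """alpha_dot[t, a, b] = sum_i sum_c P[t, i, a, b, c] * v[t, i, c] (chain rule)."""
    return np.einsum("tiabc,tic->tab", P, v)


def isotropic_anisotropic(alpha_dot):
    """Split alpha_dot into its isotropic part (trace / 3) and traceless anisotropic part."""
    iso = np.trace(alpha_dot, axis1=1, axis2=2) / 3.0    # shape (T,)
    aniso = alpha_dot - iso[:, None, None] * np.eye(3)    # shape (T, 3, 3)
    return iso, aniso


# Random stand-ins for ML-predicted tensors and MD velocities (hypothetical data).
rng = np.random.default_rng(0)
T, N = 4, 5                                   # time steps, atoms
Z = rng.normal(size=(T, N, 3, 3))             # APTs
P = rng.normal(size=(T, N, 3, 3, 3))          # PGTs
P = 0.5 * (P + P.transpose(0, 1, 3, 2, 4))    # alpha is symmetric in (a, b)
v = rng.normal(size=(T, N, 3))                # velocities

mu_dot = dipole_derivative(Z, v)              # shape (T, 3), drives the IR signal
alpha_dot = polarizability_derivative(P, v)   # shape (T, 3, 3), drives the Raman signal
iso, aniso = isotropic_anisotropic(alpha_dot)
```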
The ML models are evaluated not only by the conventional root-mean-square error (RMSE) on tensor components but also by the fidelity of the resulting spectra. The authors find that spectral agreement (peak positions and relative intensities) converges much faster than the RMSE, so a model whose RMSE is still modest can already yield spectroscopically accurate results. By systematically correlating the RMSE with spectral deviation, they propose practical early-stopping criteria: once the RMSE falls below a system-specific threshold, the generated IR or Raman spectrum can be expected to lie within an acceptable error margin. This links model-level error metrics to observable-level accuracy. That link matters because statistically converged ab-initio reference spectra are often infeasible to obtain for large systems.
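The summary does not state the paper's exact spectral metric or thresholds. The sketch below therefore assumes a simple, hypothetical choice that needs no reference spectrum. It measures the change between successive model spectra as one minus their cosine similarity. It then stops adding training data once the last few updates each changed the spectrum by less than a tolerance. The names `spectral_similarity`, `should_stop`, `tol`, and `patience` are illustrative, and this rule is not the authors' RMSE-based criterion.

```python
import numpy as np


def spectral_similarity(s1, s2):
    """Cosine similarity of two spectra on the same grid; 1.0 means identical shape."""
    return float(np.dot(s1, s2) / (np.linalg.norm(s1) * np.linalg.norm(s2)))


def should_stop(spectra, tol=1e-3, patience=2):
    """Hypothetical rule: stop once each of the last `patience` updates changed the
    spectrum by less than `tol`, measured as 1 - cosine similarity."""
    if len(spectra) < patience + 1:
        return False
    recent = spectra[-(patience + 1):]
    changes = [1.0 - spectral_similarity(a, b) for a, b in zip(recent[:-1], recent[1:])]
    return all(c < tol for c in changes)


# Synthetic example: spectra on a wavenumber grid whose bands drift less as the
# (hypothetical) training set grows.
omega = np.linspace(0.0, 4000.0, 2001)  # cm^-1


def band(center, width):
    return np.exp(-0.5 * ((omega - center) / width) ** 2)


history = []
for n_train, shift in [(10, 80.0), (20, 30.0), (40, 8.0), (80, 2.0), (160, 0.5)]:
    history.append(band(1650.0 + shift, 40.0) + 2.0 * band(3400.0 - shift, 150.0))
    if should_stop(history):
        print(f"spectrum stable; stop at n_train = {n_train}")
        break
```

Cosine similarity ignores overall scale, so it tracks band shape rather than absolute intensity. A spectral criterion like this could be monitored alongside the RMSE to see where the two measures diverge.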
Mimyria’s workflow consists of four automated stages: (1) electronic‑structure calculations to obtain reference APT and PGT values, (2) generation of a training set from snapshots along short MD trajectories, (3) training of modular neural‑network models for each tensor, and (4) application of the trained models to long MD trajectories to compute time‑correlation functions and, via Fourier transform, the IR absorption coefficient and Raman scattering cross‑section. The framework is modular: response‑tensor models are independent of the underlying ML potential used for the MD, allowing users to reuse existing potentials (e.g., for water or electrolyte solutions) without retraining them. Consequently, vibrational spectroscopy can be added retrospectively to any MD study with minimal additional effort.
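Stage (4) can be illustrated with the standard classical route. Up to prefactors and quantum-correction factors, the IR line shape is proportional to the Fourier transform of the autocorrelation ⟨dμ/dt(0) · dμ/dt(t)⟩. The sketch below is one common way to compute this with NumPy:

- a zero-padded FFT for the correlation,
- a Hann taper,
- a cosine transform.

The function names and the windowing choice are assumptions, not mimyria's implementation. A synthetic two-mode signal stands in for dμ/dt from a trajectory.

```python
import numpy as np

FS_TO_CM1 = 1e15 / 2.99792458e10  # converts frequency in 1/fs to wavenumber in cm^-1


def autocorrelation(x):
    """<x(0) . x(tau)> for a (T, d) series, averaged over time origins, via zero-padded FFT."""
    T = x.shape[0]
    X = np.fft.rfft(x, n=2 * T, axis=0)             # zero padding avoids circular wrap-around
    acf = np.fft.irfft(np.sum(np.abs(X) ** 2, axis=1), n=2 * T)[:T]
    return acf / (T - np.arange(T))                 # number of time origins per lag


def ir_lineshape(mu_dot, dt):
    """Spectrum proportional to the Fourier transform of <mu_dot(0) . mu_dot(t)>.

    Prefactors and quantum-correction factors are omitted; dt is the time step.
    """
    acf = autocorrelation(mu_dot)
    T = len(acf)
    window = np.hanning(2 * T - 1)[T - 1:]          # one-sided Hann taper, 1 at lag 0
    c = acf * window
    even = np.concatenate([c, c[-2:0:-1]])          # even extension -> real cosine transform
    spectrum = np.fft.rfft(even).real * dt
    freqs = np.fft.rfftfreq(len(even), d=dt)        # in 1/(units of dt)
    return freqs, spectrum


# Synthetic stand-in for mu_dot: two modes at 0.10 and 0.05 fs^-1 (dt in fs).
dt, T = 0.5, 4000
t = dt * np.arange(T)
mu_dot = np.stack([np.cos(2 * np.pi * 0.10 * t),
                   0.5 * np.cos(2 * np.pi * 0.05 * t),
                   np.zeros(T)], axis=1)
freqs, spec = ir_lineshape(mu_dot, dt)
peak_cm1 = freqs[np.argmax(spec)] * FS_TO_CM1      # strongest band, near 3336 cm^-1
```

For Raman, the same routine can be applied to the isotropic and anisotropic parts of dα/dt. Use the trace part as a (T, 1) series and the traceless part reshaped to (T, 9). The reshape makes the summed product equal the usual double-dot product Σ_ab β_ab(0) β_ab(t).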
The authors demonstrate the utility of Mimyria on bulk liquid water and on an aqueous sulfate solution. With training sets as small as a few dozen configurations, the ML‑generated spectra match explicit ab‑initio spectra within experimental resolution, even capturing subtle features such as low‑intensity bands and solvent‑induced shifts. They also show how the framework can diagnose rare atomic environments (e.g., the sulfate ion) that are otherwise masked by the bulk water background, highlighting the atom‑resolved diagnostic power of the approach.
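Because dμ/dt is a sum of per-atom terms, it can be split exactly into contributions from chosen atom groups, for example solute (sulfate) and solvent (water) atoms. The total correlation function, and hence the spectrum, then decomposes into group self terms plus cross terms. This is a generic technique, sketched under assumed array layouts (`Z[t, i, a, b]`, `v[t, i, b]`) and a hypothetical two-group partition. The paper's own diagnostic procedure may differ.

```python
import numpy as np


def group_dipole_derivatives(Z, v, groups):
    """Per-group contributions to mu_dot.

    Z: (T, N, 3, 3) atomic polar tensors; v: (T, N, 3) velocities;
    groups: dict mapping a (hypothetical) label to an array of atom indices.
    """
    return {name: np.einsum("tiab,tib->ta", Z[:, idx], v[:, idx])
            for name, idx in groups.items()}


def cross_correlation(x, y):
    """<x(0) . y(tau)> for (T, d) series, averaged over time origins, via zero-padded FFT."""
    T = x.shape[0]
    X = np.fft.rfft(x, n=2 * T, axis=0)
    Y = np.fft.rfft(y, n=2 * T, axis=0)
    c = np.fft.irfft(np.sum(np.conj(X) * Y, axis=1), n=2 * T)[:T]
    return c / (T - np.arange(T))


# Random stand-ins for predicted APTs and MD velocities (hypothetical data).
rng = np.random.default_rng(0)
T, N = 256, 8
Z = rng.normal(size=(T, N, 3, 3))
v = rng.normal(size=(T, N, 3))
groups = {"solute": np.arange(0, 3), "solvent": np.arange(3, N)}

parts = group_dipole_derivatives(Z, v, groups)
total = sum(parts.values())
c_total = cross_correlation(total, total)
# Self terms (g == h) plus solute-solvent cross terms (g != h) recover the total exactly.
c_sum = sum(cross_correlation(parts[g], parts[h]) for g in groups for h in groups)
print(np.allclose(c_total, c_sum))  # True
```

Fourier-transforming the solute self term together with its solute-solvent cross terms isolates the part of the spectrum carried by the solute. Otherwise that contribution would appear only as a small feature on top of the solvent background.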
In summary, Mimyria delivers a data‑efficient, quantitatively reliable pipeline for vibrational spectroscopy of condensed‑phase systems. By introducing the PGT for Raman, validating both tensors, employing direct‑derivative ML models, and linking model errors to spectral fidelity, the work removes a major bottleneck that has limited routine spectroscopic analysis of ab‑initio MD. The framework’s automation, modularity, and early‑stopping guidelines make it readily adoptable for a wide range of aqueous and solvated systems, opening the door to routine, atom‑specific interpretation of IR and Raman spectra directly from large‑scale molecular simulations.