Physics-Informed Gaussian Process Inference of Liquid Structure from Scattering Data
We present a nonparametric Bayesian framework to infer radial distribution functions from experimental scattering measurements with uncertainty quantification using non-stationary Gaussian processes. The Gaussian process prior mean and kernel functions are designed to mitigate well-known numerical challenges with the Fourier transform, including discrete measurement binning and detector windowing, while encoding fundamental yet minimal physical knowledge of liquid structure. We demonstrate uncertainty propagation of the Gaussian process posterior to unmeasured quantities of interest. Experimental radial distribution functions of liquid argon and water with uncertainty quantification are provided as both a proof of principle for the method and a benchmark for molecular models. The full implementation is available on GitHub at: https://github.com/hoepfnergroup/LiquidStructureGP-Sullivan.
💡 Research Summary
The authors introduce a non‑parametric Bayesian framework that infers the radial distribution function (RDF) of liquids directly from experimental scattering data while providing rigorous uncertainty quantification (UQ). The core idea is to treat the static structure factor S(q), the quantity measured in X‑ray or neutron scattering, as a random function drawn from a physics‑informed, non‑stationary Gaussian process (GP). By embedding essential physical constraints (e.g., the large‑q limit S(q→∞) → 1 and the RDF limits g(r→0) = 0 and g(r→∞) = 1) into the GP mean and kernel, the model respects known liquid‑state behavior while remaining flexible enough to capture experimental nuances.
Bayes’ theorem combines this GP prior with a likelihood derived from the measured S(q) values and their experimental uncertainties (e.g., counting statistics and time‑of‑flight errors). The resulting posterior distribution over S(q) is analytically tractable because a GP prior is conjugate to a Gaussian likelihood. Crucially, the radial Fourier transform that maps S(q) to g(r) is linear, and a linear transform of a GP is again a GP; the posterior over S(q) can therefore be propagated through the transform to obtain a full posterior distribution for the RDF. This yields pointwise credible intervals for g(r) that incorporate both measurement noise and model‑based uncertainty, and it removes the need for the ad‑hoc “modification functions” traditionally used to taper S(q) near the experimental q‑max.
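The propagation step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a toy analytic pair h(r) = g(r) - 1 = -exp(-r^2 / 2) with a known structure factor, a stationary RBF kernel in place of the paper's non‑stationary kernel, and hypothetical values for the density, noise level, and kernel hyperparameters. It conditions a GP on noisy S(q) samples in closed form and then pushes the posterior mean and covariance through a quadrature matrix A that discretizes the standard relation g(r) = 1 + 1/(2π²ρr) ∫ q [S(q) - 1] sin(qr) dq.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ground truth (hypothetical; chosen because the transform pair is analytic)
rho = 0.02     # number density, arbitrary units (assumed value)
width = 1.0    # width of the toy "correlation hole"

def s_true(q):
    """Structure factor for h(r) = g(r) - 1 = -exp(-r^2 / (2 width^2))."""
    return 1.0 - rho * (2.0 * np.pi * width**2) ** 1.5 * np.exp(-0.5 * (width * q) ** 2)

def g_true(r):
    return 1.0 - np.exp(-0.5 * (r / width) ** 2)

# Synthetic "measurement" of S(q) with Gaussian noise (assumed noise level)
noise_sd = 1e-3
q_obs = np.linspace(0.3, 10.0, 60)
y_obs = s_true(q_obs) + noise_sd * rng.standard_normal(q_obs.size)

# GP prior: constant mean 1 (the large-q limit) and a stationary RBF kernel.
# The paper uses a non-stationary kernel; RBF is a stand-in for illustration.
def rbf(a, b, amp=0.3, ell=1.0):
    d = a[:, None] - b[None, :]
    return amp**2 * np.exp(-0.5 * (d / ell) ** 2)

q_grid = np.linspace(0.0, 10.0, 400)
K_oo = rbf(q_obs, q_obs) + noise_sd**2 * np.eye(q_obs.size)
K_go = rbf(q_grid, q_obs)

# Closed-form Gaussian conditioning: posterior over S(q) on q_grid
mean_S = 1.0 + K_go @ np.linalg.solve(K_oo, y_obs - 1.0)
cov_S = rbf(q_grid, q_grid) - K_go @ np.linalg.solve(K_oo, K_go.T)

# Linear operator A: (S - 1) on q_grid -> (g - 1) on r_grid, trapezoid rule
r_grid = np.linspace(0.5, 8.0, 100)
w = np.full(q_grid.size, q_grid[1] - q_grid[0])
w[[0, -1]] *= 0.5
A = (w * q_grid * np.sin(np.outer(r_grid, q_grid))) / (
    2.0 * np.pi**2 * rho * r_grid[:, None]
)

# Push the Gaussian posterior through the linear map
mean_g = 1.0 + A @ (mean_S - 1.0)
cov_g = A @ cov_S @ A.T
sd_g = np.sqrt(np.maximum(np.diag(cov_g), 0.0))  # clip round-off negatives

print("max |mean_g - g_true| =", np.max(np.abs(mean_g - g_true(r_grid))))
print("largest pointwise sd of g(r) =", sd_g.max())
```

Because A is a fixed matrix, the posterior for g(r) stays Gaussian, so pointwise credible intervals follow directly from mean_g and sd_g. The 1/r factor and the q weighting in A also show why small errors in S(q) at large q are amplified at short distances.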
Computationally, the authors employ inducing‑point approximations and variational inference to keep the method scalable to dense q‑grids typical of modern scattering experiments. The implementation is built on PyTorch/GPyTorch and released openly on GitHub, ensuring reproducibility.
The framework is validated on three case studies: (1) neutron‑scattering data for liquid argon, where the GP‑derived RDF matches the gold‑standard analysis of Yarnell et al. with negligible artifacts; (2) synthetic structure factors generated from a flexible‑bond water model, demonstrating that the GP can recover the ground‑truth oxygen‑oxygen (OO) RDF even when substantial synthetic noise is added; (3) real X‑ray scattering data for liquid water, producing a benchmark OO RDF with quantified uncertainties that can be directly compared to popular water models (e.g., TIP4P/2005, MB‑pol). In all cases, the GP approach eliminates spurious ripples caused by discrete q‑sampling and windowing, and it yields realistic uncertainty bands that shrink in well‑constrained regions and expand where data are sparse.
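The windowing artifact mentioned above can be reproduced with a short sketch. It is only an illustration under stated assumptions: it uses a toy Gaussian correlation hole with an analytic structure factor (not argon or water data), a hypothetical density, and a direct trapezoid-rule transform. The helper name rdf_from_sq is made up for this example. The optional taper is the Lorch function sin(πq/q_max)/(πq/q_max), one common example of a modification function.

```python
import numpy as np

rho = 0.02     # number density, arbitrary units (assumed value)
width = 1.0    # width of the toy "correlation hole"

def s_toy(q):
    """Analytic structure factor for h(r) = -exp(-r^2 / (2 width^2))."""
    return 1.0 - rho * (2.0 * np.pi * width**2) ** 1.5 * np.exp(-0.5 * (width * q) ** 2)

def g_toy(r):
    return 1.0 - np.exp(-0.5 * (r / width) ** 2)

def rdf_from_sq(r, q_max, n_q=2000, lorch=False):
    """Direct transform g(r) = 1 + 1/(2 pi^2 rho r) * int_0^q_max q [S(q) - 1] sin(qr) dq."""
    q = np.linspace(0.0, q_max, n_q)
    # np.sinc(x) = sin(pi x) / (pi x), so this is the Lorch taper when enabled
    window = np.sinc(q / q_max) if lorch else np.ones_like(q)
    integrand = q * (s_toy(q) - 1.0) * window * np.sin(np.outer(r, q))
    dq = q[1] - q[0]
    # Trapezoid rule along the q axis
    integral = dq * (integrand.sum(axis=1) - 0.5 * (integrand[:, 0] + integrand[:, -1]))
    return 1.0 + integral / (2.0 * np.pi**2 * rho * r)

r = np.linspace(0.5, 8.0, 200)
err_wide = np.max(np.abs(rdf_from_sq(r, q_max=10.0) - g_toy(r)))   # S(q) - 1 has decayed
err_cut = np.max(np.abs(rdf_from_sq(r, q_max=2.5) - g_toy(r)))     # hard cutoff too early
err_lorch = np.max(np.abs(rdf_from_sq(r, q_max=2.5, lorch=True) - g_toy(r)))

print("max error, q_max = 10:        ", err_wide)
print("max error, q_max = 2.5:       ", err_cut)
print("max error, q_max = 2.5, Lorch:", err_lorch)
```

Cutting the integral off before S(q) - 1 has decayed leaves a truncation error in g(r) that the wide-range transform does not have. A taper changes the form of that error rather than adding information, which is the motivation for replacing it with a probabilistic model of S(q).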
Beyond RDF estimation, the authors discuss broader implications. The GP posterior over S(q) can serve as a bridge between experimental data and machine‑learning potentials such as the Gaussian Approximation Potential (GAP). It also enables structure‑optimized potential refinement (SOPR) and Bayesian force‑field optimization to propagate experimental uncertainty directly into interatomic potential parameters. By providing a rigorous probabilistic link between scattering experiments and molecular simulations, the method addresses a long‑standing gap in the validation workflow of liquid‑state models.
In summary, the paper delivers a mathematically sound, physically transparent, and computationally practical solution to the ill‑posed inverse problem of extracting RDFs from scattering data. The open‑source code and the benchmark RDFs for argon and water constitute valuable resources for the community, paving the way for more reliable force‑field development and for experimental design guided by quantified uncertainty.