Bayesian Inference for Missing Physics
Model-based approaches for (bio)process systems often suffer from incomplete knowledge of the underlying physical, chemical, or biological laws. Universal differential equations, which embed neural networks within differential equations, have emerged as powerful tools to learn this missing physics from experimental data. However, neural networks are inherently opaque, motivating their post-processing via symbolic regression to obtain interpretable mathematical expressions. Genetic algorithm-based symbolic regression is a popular approach for this post-processing step, but provides only point estimates and cannot quantify the confidence we should place in a discovered equation. We address this limitation by applying Bayesian symbolic regression, which uses Reversible Jump Markov Chain Monte Carlo to sample from the posterior distribution over symbolic expression trees. This approach naturally quantifies uncertainty in the recovered model structure. We demonstrate the methodology on a Lotka-Volterra predator-prey system and then show how a well-designed experiment leads to lower uncertainty in a fed-batch bioreactor case study.
💡 Research Summary
The manuscript tackles the pervasive problem of “missing physics” in (bio)process modeling, where the governing physical, chemical, or biological laws are only partially known. The authors propose a two‑stage framework that first employs Universal Differential Equations (UDEs) to embed a neural network within a system of ordinary differential equations, thereby learning unknown dynamical terms directly from experimental data. While UDEs excel at capturing complex, nonlinear relationships, the neural network component remains a black box, limiting interpretability and scientific insight.
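The UDE construction described above can be sketched as follows: known linear growth/decay terms plus a small neural network standing in for the unknown interaction physics. This is a minimal illustration, not the authors' implementation; the network weights are untrained placeholders and the linear coefficients are assumed values.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Minimal UDE sketch: known linear physics + a neural network for the
# missing terms. Weights W1, W2 are illustrative placeholders (untrained).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(8, 2)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(2, 8)), np.zeros(2)

def nn(x):
    """Tiny feed-forward network approximating the missing physics."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

def ude_rhs(t, x):
    known = np.array([1.3 * x[0], -1.8 * x[1]])  # assumed known linear terms
    return known + nn(x)                         # plus learned missing terms

sol = solve_ivp(ude_rhs, (0.0, 5.0), [1.0, 1.0], rtol=1e-8)
print(sol.success, sol.y.shape)
```

In practice the network weights would be fitted by minimizing the mismatch between the simulated trajectory and measured data; only then is the trained network handed off to symbolic regression.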
To overcome this opacity, the paper introduces Bayesian symbolic regression based on Reversible Jump Markov Chain Monte Carlo (RJMCMC). Unlike conventional genetic‑algorithm symbolic regression, which yields a single point estimate, the Bayesian approach defines a prior over expression trees (including depth penalties, operator probabilities, feature selection, and a special linear transformation node lt(x,a,b)=ax+b for constants) and a likelihood that measures the discrepancy between the neural network output and the candidate symbolic expression, assuming additive Gaussian noise. Continuous parameters (the noise variance σ² and the constants a, b) are sampled using gradient‑based methods such as the No‑U‑Turn Sampler (NUTS), while discrete tree structures are updated via RJMCMC moves that allow dimensionality changes. This hybrid Gibbs sampler produces a posterior distribution over both tree topology and continuous coefficients, providing a natural quantification of structural uncertainty.
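The two ingredients of the Bayesian formulation, a depth-penalised prior over expression trees and a Gaussian likelihood against the neural network's output, can be scored as below. The nested-tuple tree encoding and the penalty constant are illustrative assumptions, not the paper's exact prior.

```python
import numpy as np

def tree_depth(node):
    """Depth of an expression tree encoded as ('op', left, right) tuples."""
    if isinstance(node, tuple):
        return 1 + max(tree_depth(c) for c in node[1:])
    return 1  # leaf: variable name or numeric constant

def log_prior(tree, depth_penalty=0.5):
    # Deeper trees are penalised, favouring parsimonious expressions.
    return -depth_penalty * tree_depth(tree)

def evaluate(tree, x1, x2):
    if tree == 'x1': return x1
    if tree == 'x2': return x2
    if isinstance(tree, (int, float)): return np.full_like(x1, float(tree))
    op, l, r = tree
    a, b = evaluate(l, x1, x2), evaluate(r, x1, x2)
    return a * b if op == '*' else a + b

def log_likelihood(tree, x1, x2, y, sigma2):
    # Additive Gaussian noise between candidate expression and NN output.
    resid = y - evaluate(tree, x1, x2)
    n = len(y)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * resid @ resid / sigma2

x1 = np.linspace(0.5, 2.0, 20)
x2 = np.linspace(1.0, 3.0, 20)
y = -0.9 * x1 * x2                       # stand-in for the NN's output
tree = ('*', -0.9, ('*', 'x1', 'x2'))    # candidate symbolic expression
print(log_prior(tree), log_likelihood(tree, x1, x2, y, 0.01))
```

An RJMCMC sampler would propose structural moves (grow, prune, swap) on the tree and accept or reject them using these two terms, while a gradient-based sampler such as NUTS updates the continuous constants and σ² between structural moves.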
The methodology is demonstrated on two case studies. The first uses the classic Lotka‑Volterra predator‑prey system. The authors assume knowledge of the linear growth/decay terms but treat the interaction terms (−0.9 x₁x₂ and 0.8 x₁x₂) as unknown. After training a UDE, Bayesian symbolic regression generates ten posterior samples. Most samples closely approximate the true interaction terms; some recover the correct coefficients, while others represent equivalent forms without explicit constants, illustrating that the posterior captures a family of plausible models rather than a single “correct” expression.
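The ground-truth system of the first case study can be reproduced directly; the interaction coefficients −0.9 and 0.8 are stated in the summary above, while the linear growth/decay rates (1.3, −1.8) and the initial condition are illustrative assumptions for this sketch.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lotka_volterra(t, x):
    """True predator-prey dynamics for the first case study."""
    x1, x2 = x
    return [1.3 * x1 - 0.9 * x1 * x2,    # prey: growth minus predation
            -1.8 * x2 + 0.8 * x1 * x2]   # predator: decay plus predation

sol = solve_ivp(lotka_volterra, (0.0, 10.0), [1.0, 1.0], rtol=1e-8)
print(sol.success, sol.y[:, -1])
```

In the paper's setup, the linear terms are treated as known while the two interaction terms are replaced by the neural network; the symbolic regression step then attempts to recover expressions like −0.9 x₁x₂ from the trained network's input-output behaviour.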
The second case study involves a fed‑batch bioreactor. Here the authors explore how experimental design influences posterior uncertainty. An initial design with sparse sampling and high measurement noise yields a wide posterior distribution, reflecting high uncertainty in the recovered missing physics. By strategically selecting sampling times and reducing noise, the posterior contracts dramatically, leading to more precise identification of both the functional form and parameter values. This result underscores the practical importance of optimal experiment design when employing Bayesian symbolic regression for model discovery.
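The effect of experiment design on posterior width can be illustrated with a deliberately simplified model: fitting a single interaction coefficient a in y = a·x₁x₂ + noise under a flat prior, where the posterior standard deviation is σ/‖φ‖ for regressor vector φ. All numbers (noise levels, sample counts, input ranges) are illustrative assumptions, not the paper's bioreactor design.

```python
import numpy as np

rng = np.random.default_rng(1)

def posterior_std(n_samples, noise_sd):
    """Posterior std of a in y = a*x1*x2 + eps, flat prior: sigma / ||phi||."""
    x1 = rng.uniform(0.5, 2.0, n_samples)
    x2 = rng.uniform(0.5, 2.0, n_samples)
    phi = x1 * x2                      # regressor for the interaction term
    return noise_sd / np.sqrt(phi @ phi)

sparse_noisy = posterior_std(n_samples=5, noise_sd=0.5)    # poor design
dense_clean = posterior_std(n_samples=50, noise_sd=0.05)   # improved design
print(sparse_noisy, dense_clean)
```

The denser, lower-noise design yields a much tighter posterior, mirroring the posterior contraction the authors observe in the fed-batch bioreactor when sampling times are chosen strategically and measurement noise is reduced.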
In summary, the paper makes three key contributions: (1) a novel integration of UDEs and Bayesian symbolic regression that transforms a black‑box neural network into interpretable, uncertainty‑aware symbolic expressions; (2) a rigorous Bayesian formulation that quantifies both parametric and structural uncertainty via posterior distributions; and (3) an empirical demonstration that thoughtful experimental design can substantially reduce model uncertainty. The approach is broadly applicable to complex systems in biotechnology, chemistry, and environmental engineering where partial mechanistic knowledge coexists with rich data streams, offering a pathway to scientifically grounded, data‑driven modeling.