pADAM: A Plug-and-Play All-in-One Diffusion Architecture for Multi-Physics Learning
Amirhossein Mollaali 1, Bongseok Kim 1, Christian Moya 2, Guang Lin 1,2*

1 School of Mechanical Engineering, Purdue University, West Lafayette, IN 47906, USA.
2 Department of Mathematics, Purdue University, West Lafayette, IN 47906, USA.
*Corresponding author. E-mail: guanglin@purdue.edu
Contributing authors: amollaal@purdue.edu; kim4853@purdue.edu; cmoyacal@purdue.edu

Abstract

Generalizing across disparate physical laws remains a fundamental challenge for artificial intelligence in science. Existing deep-learning solvers are largely confined to single-equation settings, limiting transfer across physical regimes and inference tasks. Here we introduce pADAM, a unified generative framework that learns a shared probabilistic prior across heterogeneous partial differential equation families. Through a learned joint distribution of system states and, where applicable, physical parameters, pADAM supports forward prediction and inverse inference within a single architecture without retraining. Across benchmarks ranging from scalar diffusion to nonlinear Navier–Stokes equations, pADAM achieves accurate inference even under sparse observations. Combined with conformal prediction, it also provides reliable uncertainty quantification with coverage guarantees. In addition, pADAM performs probabilistic model selection from only two sparse snapshots, identifying governing laws through its learned generative representation. These results highlight the potential of generative multi-physics modeling for unified and uncertainty-aware scientific inference.
The mathematical description of physical phenomena through partial differential equations (PDEs) forms a cornerstone of modern science [1, 2], enabling the characterization of systems ranging from large-scale weather and climate dynamics [3] to the complex turbulence of fluid flows [4, 5]. While classical numerical methods remain the bedrock of accuracy through systematic discretization [6, 7], their extreme computational cost in high-dimensional parameter spaces—specifically in uncertainty quantification and inverse design—has motivated the development of accelerated surrogate models [8].

Deep learning methods have emerged as a promising alternative to traditional PDE solvers, with architectures such as Fourier Neural Operators (FNO) [9–13], Deep Operator Networks (DeepONet) [14–18], and Physics-Informed Neural Networks (PINNs) [19, 20] demonstrating the potential to bypass traditional solver constraints. Despite these advances, the field remains hindered by two critical limitations. First, most models produce deterministic outputs, limiting their utility in uncertainty quantification unless augmented with auxiliary techniques such as Bayesian training or ensemble methods [21–23]. Second, and more fundamentally, current architectures operate within a "one-model-one-equation" paradigm. This rigidity prevents cross-physics knowledge transfer and necessitates exhaustive retraining for every new physical regime and task. Furthermore, while emerging foundation models for PDEs [24–28] have begun to address multi-operator training, they remain limited in performing reliable inference under the highly sparse observations characteristic of real-world sensing.
This limitation arises because these frameworks learn fixed forward mappings between prescribed input/output fields, rather than probabilistic models capable of being conditioned on arbitrary measurement operators at test time. Diffusion models have recently emerged as a powerful generative framework for scientific computing, demonstrating particular promise in solving PDEs by learning the full probability distribution of solution states rather than deterministic point estimates [29–33]. Unlike deterministic operators, their iterative denoising formulation supports conditional sampling (inpainting), positioning them as inherently suited for inference under sparse observations. Despite this flexibility, existing diffusion-based PDE solvers remain constrained by the same specialization observed in earlier neural architectures: models are typically restricted to a single PDE class. Consequently, current approaches cannot generalize across disparate physics, perpetuating a fragmented landscape that limits their practical utility in settings involving heterogeneous physical regimes.

To address this fragmentation, we introduce pADAM (plug-and-play all-in-one diffusion architecture for multi-physics learning), a unified generative framework for learning across heterogeneous PDE families within a single model. pADAM learns a shared class-conditional probabilistic prior over system states and, when applicable, physical parameters, allowing forward prediction, parameter inference, and initial-condition reconstruction to be formulated within one posterior-sampling framework without task-specific retraining. In this way, a single architecture can operate across multiple physical regimes that are typically treated as separate learning problems.
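The class-conditional joint prior described above can be sketched in a few lines. The toy per-class "denoiser" and the denoising loss below are illustrative stand-ins under our assumption of a standard denoising score-matching setup; they are not the paper's U-Net or its exact training objective:

```python
import numpy as np

rng = np.random.default_rng(0)
N_CLASSES = 3  # diffusion, advection, advection-diffusion

def denoiser(x_noisy, sigma, class_id, weights):
    # Placeholder per-class linear "denoiser"; pADAM instead uses one
    # shared U-Net conditioned on a class embedding.
    return weights[class_id] * x_noisy / (1.0 + sigma**2)

def class_conditional_loss(u0, uT, class_id, weights):
    x = np.stack([u0, uT], axis=-1)          # joint state pair (H, W, 2)
    sigma = rng.uniform(0.02, 2.0)           # sampled noise level
    x_noisy = x + sigma * rng.standard_normal(x.shape)
    x_hat = denoiser(x_noisy, sigma, class_id, weights)
    return float(np.mean((x_hat - x) ** 2))  # denoising reconstruction loss

weights = np.ones(N_CLASSES)
u0 = rng.standard_normal((16, 16))
uT = rng.standard_normal((16, 16))
loss = class_conditional_loss(u0, uT, class_id=1, weights=weights)
```

Stacking (u0, uT) as channels is what makes the learned object a joint distribution p(u0, uT | c) rather than a fixed forward map: any subset of the channels can later be constrained by observations.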
Through observation-guided conditional sampling, the model can incorporate sparse measurements at inference time, supporting accurate inference from only 10–30% observations.

As a generative framework, pADAM also quantifies predictive uncertainty through sampling. To ensure the reliability of these uncertainty estimates, we integrate conformal prediction into the inference pipeline. This provides distribution-free finite-sample coverage guarantees for predictive intervals [22, 34, 35], which is particularly important under sparse observations, where intervals can otherwise exhibit undercoverage. By calibrating predictive uncertainty, this framework supports more reliable scientific inference.

Beyond forward and inverse inference, pADAM also supports probabilistic PDE model selection from as few as two sparse temporal snapshots by leveraging its learned shared generative prior. This enables identification of governing physical laws and quantification of associated uncertainties from minimal measurements. Fig. 1 provides a schematic overview of the pADAM architecture and its ability to transition across forward, inverse, and discovery tasks within a single probabilistic framework. This versatility positions pADAM as a promising framework for unified and uncertainty-aware scientific inference across heterogeneous physical systems.

Results

We evaluate pADAM as a unified foundation for multi-physics learning across a diverse library of heterogeneous PDEs. Our assessment consists of five investigations, each designed to evaluate a distinct capability of the framework. Across these investigations, we examine task-agnostic inference in forward and inverse problems under full and sparse observations, uncertainty quantification with conformal calibration, out-of-distribution extrapolation, and probabilistic model selection from minimal observations.
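The observation-guided conditioning that runs through these investigations can be sketched as a Langevin-style update combining a prior score with a likelihood score pulling observed pixels toward the measurements. The Gaussian placeholder prior score and the stability cap on the likelihood weight are our simplifying assumptions; the real prior score comes from the trained class-conditional model:

```python
import numpy as np

def prior_score(x, sigma):
    # Placeholder: score of a unit Gaussian prior at noise level sigma;
    # pADAM would evaluate its trained diffusion model here.
    return -x / (1.0 + sigma**2)

def guided_step(x, y, mask, sigma, step=0.01, rng=None):
    # Likelihood score enforces data consistency on observed pixels only;
    # the weight is capped for numerical stability at small sigma.
    w = min(1.0 / sigma**2, 50.0)
    score = prior_score(x, sigma) - w * mask * (x - y)
    noise = sigma * np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x + step * score + noise

rng = np.random.default_rng(1)
x_true = rng.standard_normal((8, 8))
mask = (rng.random((8, 8)) < 0.3).astype(float)  # 30% sparse observations
y = mask * x_true                                # sparse measurements
x = rng.standard_normal((8, 8))                  # start from pure noise
for sigma in np.linspace(1.0, 0.05, 300):        # annealed noise schedule
    x = guided_step(x, y, mask, sigma, rng=rng)
err_obs = np.linalg.norm(mask * (x - x_true)) / np.linalg.norm(mask * x_true)
```

As the noise level anneals, the sample is driven onto the measurements at observed locations while the prior fills in the unobserved regions, which is the "plug-and-play" behavior described above.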
Governing equations are detailed in the Methods, and the ranges of initial conditions and physical parameters across all evaluated PDE families are summarized in Extended Data Table 1.

Unified multi-physics learning and trustworthy inference

We first evaluate the capacity of pADAM to learn three distinct physical operators within a single architecture: purely dissipative (diffusion), purely convective (advection), and their coupled form (advection–diffusion). By unifying these heterogeneous PDE families—representing the transition from isotropic smoothing to directional transport—within a single set of weights, we assess whether a shared generative prior can capture disparate physical dynamics without equation-specific training.

Operator compression and parameter efficiency

To test this unification, we benchmark pADAM against PDE-specific diffusion-model baselines [29] trained independently for each PDE family. In this configuration, physical parameters are held fixed to evaluate the model's ability to learn the bidirectional mapping between initial and final states. While the baselines require separate model instances for each regime, pADAM is trained to learn the conditional joint distribution p(u0, uT | c), where c denotes the PDE class.
Crucially, pADAM uses approximately the same number of trainable parameters as a single specialized baseline, while supporting all three PDE families within a single model, for both forward prediction uT ∼ p(uT | u0, c) and initial-state reconstruction u0 ∼ p(u0 | uT, c) under full and sparse observations.

Fig. 1: Schematic of the pADAM framework for unified multi-physics learning. a–c, The pADAM framework learns across disparate physical laws, illustrated here for scalar-field PDEs, by projecting heterogeneous equation families into a shared generative prior. A class-conditional diffusion model learns the joint distribution of system states and physical parameters (a, b), enabling the generation of diverse physical regimes from Gaussian noise (c, orange trajectories). d, Task-agnostic inference via Bayesian conditioning. By incorporating full or sparse observations through plug-and-play guidance (green trajectory), the shared pADAM prior supports forward prediction, initial-condition reconstruction, parameter inference, and probabilistic model selection within a single framework. This unified manifold allows pADAM to navigate a range of inference tasks without task-specific retraining.

As reported in Table 1, pADAM achieves high-fidelity predictions across all regimes, even under sparse observations. A notable result is the model's data efficiency: despite being trained on 33% fewer samples per PDE family than the specialized baselines (333 vs. 500 samples), pADAM achieves comparable accuracy overall across the evaluated tasks.

Table 1: Comparative performance against PDE-specific baselines. Relative L2 error (%) for forward prediction (uT) and inverse reconstruction (u0) across fixed-parameter PDE systems. We compare pADAM (trained on a diverse diffusion, advection, and advection–diffusion library, N = 1,000) against PDE-specific diffusion baselines (N = 500 per family) following DiffusionPDE [29]. Results are averaged over 50 test instances under full (100%) and sparse (30%) spatial observations.

PDE system           Observation    Model        Forward (uT)   Inverse (u0)
Diffusion            Full (100%)    pADAM        0.82           0.45
                                    Single-PDE   0.55           0.68
                     Sparse (30%)   pADAM        0.88           0.86
                                    Single-PDE   1.00           2.31
Advection            Full (100%)    pADAM        1.03           1.25
                                    Single-PDE   1.02           1.26
                     Sparse (30%)   pADAM        1.36           2.29
                                    Single-PDE   1.52           1.55
Advection–diffusion  Full (100%)    pADAM        1.54           1.43
                                    Single-PDE   1.28           1.29
                     Sparse (30%)   pADAM        2.26           1.92
                                    Single-PDE   1.64           2.05
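The accuracy figures in Table 1 are relative L2 errors evaluated under random spatial observation masks. A minimal sketch of this evaluation protocol, assuming the standard definition of the metric (the exact implementation is not spelled out in the text):

```python
import numpy as np

def relative_l2(pred, true):
    # Relative L2 error in percent, the metric reported in the tables.
    return 100.0 * np.linalg.norm(pred - true) / np.linalg.norm(true)

def sparse_mask(shape, frac, seed=0):
    # Bernoulli mask keeping roughly `frac` of spatial locations,
    # e.g. frac=0.3 for the "Sparse (30%)" setting.
    rng = np.random.default_rng(seed)
    return (rng.random(shape) < frac).astype(float)

rng = np.random.default_rng(0)
u_true = rng.standard_normal((64, 64))
u_pred = u_true + 0.01 * rng.standard_normal((64, 64))  # toy prediction
err = relative_l2(u_pred, u_true)     # roughly 1% for 1% additive noise
mask = sparse_mask((64, 64), 0.3)     # observations fed to guided sampling
```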
This result is particularly important because pADAM integrates multiple dynamics within a single network, whereas standard approaches require independent model instances for each operator. By unifying these PDE families within a single model, pADAM demonstrates parameter efficiency, effectively amortizing the representational cost of the entire multi-physics library across a single set of weights. This operator compression suggests that parameter sharing induces a "shared physical vocabulary," enabling a single model to span multiple physical manifolds without increasing global computational overhead.

Representation sharing across PDE operators

To gain insight into how pADAM shares representations across heterogeneous operators, we examine its internal attention patterns. Analysis of a decoder block, shown in Extended Data Fig. 1, suggests that the model reuses and composes operator-specific features when transitioning from pure to mixed dynamics. The attention patterns for pure diffusion and pure advection are visibly distinct; in regions where attention weights are prominent for advection but absent for diffusion, the advection–diffusion maps exhibit intermediate values. This graded behavior is consistent with the mixed nature of the advection–diffusion equation, where transport and dissipation coexist. These patterns suggest that pADAM composes operator-specific features rather than maintaining isolated mechanisms for each PDE family.

To assess this behavior quantitatively, we compute pairwise cosine similarities between attention maps extracted from one encoder block and one decoder block across two denoising steps (Extended Data Fig. 2). Across all settings, similarities involving the advection–diffusion case are consistently higher than those between the pure diffusion and pure advection extremes.
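The similarity analysis just described reduces to flattening each attention map and computing pairwise cosine similarities. The synthetic maps below, with the mixed operator modeled as an average of the two pure ones, are an illustrative assumption standing in for maps extracted from the trained U-Net:

```python
import numpy as np

def cosine_similarity(a, b):
    # Flatten attention maps and compare their directions.
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
attn_diffusion = rng.random((16, 16))   # toy map for the diffusion class
attn_advection = rng.random((16, 16))   # toy map for the advection class
# Mixed operator modeled as an intermediate pattern of the two pure regimes.
attn_adv_diff = 0.5 * (attn_diffusion + attn_advection)

sim_pure = cosine_similarity(attn_diffusion, attn_advection)
sim_mixed = min(cosine_similarity(attn_adv_diff, attn_diffusion),
                cosine_similarity(attn_adv_diff, attn_advection))
# The mixed map is closer to each pure map than the pure maps are to
# each other, mirroring the hierarchy reported for pADAM's attention.
```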
This hierarchy mirrors the mathematical structure of the PDEs, where the mixed operator links the two pure regimes. Such alignment suggests that pADAM organizes its latent space in accordance with these relationships, reflecting the compositional structure of the underlying dynamics.

Zero-shot extrapolation to unseen physical laws

A defining capability of a generalist prior is extrapolation beyond its training library. We evaluate pADAM—trained on diffusion, advection, and advection–diffusion—on the unseen advection–diffusion–reaction (ADR) equation, where the ADR system uses the same viscosity and velocity parameters as the advection–diffusion case. The reaction term (k) induces a structured operator shift absent during training, rendering the learned prior misspecified with respect to the true ADR dynamics. We quantify this shift using Δop, defined as the relative L2 deviation of the terminal state induced by the reaction term (see Methods, Eq. (30)). As shown in Extended Data Fig. 3a, the shift exceeds 20% at k = 5.0 and exceeds 50% at k = 15.0, confirming a genuinely out-of-distribution regime.

Despite this mismatch, pADAM remains robust when sparse observations of (u0, uT) are used to jointly reconstruct the full-field states, as demonstrated in Extended Data Fig. 3b. The model leverages observation-guided sampling conditioned on the closest known operator class (the advection–diffusion class) to steer the misspecified prior toward trajectories consistent with the unseen ADR dynamics. While reconstruction error in uT increases with the reaction rate k, reflecting the accumulated influence of the unseen dynamics, errors for the initial state u0 remain stable. As observation sparsity increases, errors rise in a controlled manner for both endpoints because less information is available to constrain the conditional distribution.
These results suggest that pADAM can adapt to unseen dynamics through informative observations, supporting inference on previously unseen regimes without additional training.

Reliable uncertainty quantification via conformal calibration

Unlike deterministic surrogates, the generative nature of pADAM provides a probabilistic basis for uncertainty quantification through posterior ensembles that capture distributions of physically consistent solutions. Although these ensembles reflect predictive uncertainty, empirical coverage (PICP) can fall well below the nominal 95% target due to data limitations or the inherent ill-posedness that intensifies under observational sparsity (Extended Data Fig. 4). To address this, we integrate conformal prediction into pADAM to provide formal coverage guarantees and improve interval reliability.

As illustrated in Extended Data Fig. 5, conformal calibration compensates for unresolved uncertainty by adaptively expanding prediction intervals in regions where observations provide limited information. Quantitative analysis in Extended Data Table 2 shows that, under 30% spatial observations for the advection–diffusion system, mean empirical coverage increases from 58.33% to 98.42% for forward prediction and, most notably, from 36.31% to 99.83% for inverse reconstruction. The slight over-coverage relative to the 95% target is consistent with finite-sample effects, as calibration set sizes are limited by sampling cost. Together, these results show that pADAM supports uncertainty-aware scientific inference with calibrated coverage even under severe data scarcity.

Navigating the continuous physics manifold

To evaluate pADAM's capacity to model the continuous spectrum of physical dynamics, we extended the framework to systems with variable physical coefficients ϕ.
We trained a unified prior on three canonical PDE families, in which one physical coefficient was treated as a variable parameter for each system: diffusion (variable viscosity ν), advection (variable velocity a_x in the x-direction), and advection–diffusion (variable ν). By learning the joint distribution p(ϕ, u0, uT | c), pADAM moves beyond discrete operator sets to represent a continuous physical manifold. This formulation enables task-agnostic inference, in which a single set of weights supports forward prediction uT ∼ p(uT | u0, ϕ, c), initial-state reconstruction u0 ∼ p(u0 | uT, ϕ, c), and parameter discovery ϕ ∼ p(ϕ | u0, uT, c) across disparate PDEs.

As summarized in Extended Data Table 3, pADAM maintained high-fidelity performance across this full task spectrum; relative L2 errors for both state and parameter estimation remained below 2.81% under full observation. Notably, the model also remained stable under severe observational sparsity. While parameter inference is naturally more sensitive to sparsity-induced ill-posedness, state reconstruction errors remained below 4.11% even with only 10% spatial observations. These results suggest that pADAM can steer the generative prior toward physically consistent trajectories, supporting probabilistic state and parameter inference across continuous physical regimes. A qualitative illustration of parameter discovery on the continuous manifold for the advection system under sparse (10%) observations is provided in Extended Data Fig. 6a, further demonstrating pADAM's inference capability under limited data.

Scalability and generalization across the physical spectrum

To evaluate the robustness of the pADAM framework, we investigated its scalability across two distinct dimensions: structural breadth and parametric depth.
This dual-pronged assessment evaluates the model's ability to maintain high-fidelity representations as both the diversity of governing laws and the dimensionality of their parameter spaces increase, providing a systematic evaluation of generalization across the physical spectrum.

Breadth: structural scaling to a 6-PDE library

We next investigated the framework's capacity to scale to a broader training library by significantly expanding its structural breadth. To evaluate the representational efficiency of the learned manifold, we utilized a model with the same capacity and architecture as in the continuous physics manifold investigation and trained it on an expanded set of six PDE families, spanning both scalar and vector-valued systems. This structural scaling challenges the task-agnostic inference capabilities of pADAM by requiring it to learn a broader range of physical dynamics within a single set of weights.

For the scalar regimes—including diffusion, advection, advection–diffusion, and Allen–Cahn—we maintained the single-variable coefficient settings established in the continuous physics manifold investigation to evaluate whether the model could preserve its precision across the full task spectrum: forward prediction uT ∼ p(uT | u0, ϕ, c), initial-state reconstruction u0 ∼ p(u0 | uT, ϕ, c), and parameter discovery ϕ ∼ p(ϕ | u0, uT, c). For the vector-valued Burgers' and Navier–Stokes systems, the physical coefficients were held fixed to isolate the challenge of modeling coupled velocity fields via the joint distributions p(u0, v0, uT | c) and p(u0, v0, vT | c), which capture the relationship between the velocity components across initial and terminal states through shared conditioning.
Here, pADAM models the coupled state transition through component-wise sampling; specifically, for forward prediction, we sample terminal velocities uT ∼ p(uT | u0, v0, c) and vT ∼ p(vT | u0, v0, c). For inverse state estimation, where joint reconstruction is inherently ill-posed, we leverage an auxiliary conditioning scheme—sampling u0 ∼ p(u0 | v0, uT, c) and v0 ∼ p(v0 | u0, vT, c)—to better constrain the solution space and mitigate the sensitive dependence on initial conditions characteristic of nonlinear convective systems.

As reported in Table 2, pADAM maintained strong performance across the expanded library without architectural modification. While we observe a marginal increase in relative error compared to the smaller operator sets used in that earlier investigation, this limited degradation—despite the substantial increase in structural breadth—suggests that the model can maintain performance as the training library expands. Notably, the unified prior consistently captured disparate dynamic regimes and structural patterns, from the sharp interfaces of Allen–Cahn (retaining < 1% parameter error) to the dissipative evolution of the Burgers' system. While the advection–diffusion system exhibited higher sensitivity in parameter discovery under sparsity (14.36%), this reflects the compounded difficulty of resolving competing transport mechanisms within a shared latent space. In the Navier–Stokes regime, although the v-velocity component showed a characteristic error increase (8.67% at 30% observations), this behavior is expected under sparse observations, where recovery of coupled nonlinear flow fields becomes more underdetermined. Nevertheless, the model remained stable and generated physically consistent samples. Qualitative examples of forward and inverse reconstructions for the Navier–Stokes and Burgers' equations are illustrated in Fig. 2 and Extended Data Fig. 6b,c, respectively, highlighting the framework's robust performance under both full and sparse observation regimes.

Table 2: Structural scaling across the multi-physics spectrum. Relative L2 errors (%) for pADAM evaluated on a library of six distinct physical regimes. a, Performance on scalar-field PDEs (diffusion, advection, advection–diffusion, and Allen–Cahn) for forward prediction (uT), initial-state reconstruction (u0), and parameter discovery (ϕ). b, Performance on vector-field PDEs (Burgers' and Navier–Stokes) where the model learns component-wise joint distributions to provide forward (uT, vT) and inverse (u0, v0) state estimation. All tasks are performed under both full (100%) and sparse (30%) spatial observations, with results reported as the mean over 50 test instances.

a — Scalar-field PDEs

PDE system        Observation    Forward (uT)   Inverse (u0)   Inverse (ϕ)
Diffusion         Full (100%)    1.37           1.11           4.13
                  Sparse (30%)   2.03           2.40           5.60
Advection         Full (100%)    1.93           1.96           1.51
                  Sparse (30%)   1.93           1.97           2.24
Advection–diff.   Full (100%)    1.96           2.28           9.26
                  Sparse (30%)   2.59           3.28           14.36
Allen–Cahn        Full (100%)    1.21           2.42           0.48
                  Sparse (30%)   2.32           2.75           0.72

b — Vector-field PDEs

PDE system      Observation    Forward (uT)   Forward (vT)   Inverse (u0)   Inverse (v0)
Burgers'        Full (100%)    1.36           1.16           1.29           0.90
                Sparse (30%)   2.08           1.63           2.32           1.06
Navier–Stokes   Full (100%)    1.45           4.96           1.25           3.85
                Sparse (30%)   2.72           8.67           1.38           4.33

Depth: parametric scaling and partial-parameter inference

We further challenged the framework by moving from settings with a single variable parameter to regimes with multi-variable coefficient sets of different dimensionalities. This setting enables the evaluation of state and parameter inference in higher-dimensional parameter spaces.
We trained pADAM on three canonical families—diffusion, advection, and advection–diffusion—in which all physical coefficients were treated as variable: ϕ = [ν] for diffusion, ϕ = [a_x, a_y] for advection, and ϕ = [ν, a_x, a_y] for advection–diffusion. To manage this heterogeneity, we employed a parameter lifting scheme (see Methods) to project variable-length vectors and disparate physical parameters into a unified representation compatible with the conditional prior. We employed a model with the same capacity and architecture as in the continuous physics manifold and structural scaling investigations.

Fig. 2: Forward and inverse inference for Navier–Stokes dynamics using pADAM. a, Forward prediction of the velocity component vT conditioned on 30% spatial observations of the initial states (u0, v0). b, Inverse reconstruction of the initial velocity component v0 conditioned on full observation of the terminal component vT and the initial component u0. c, Forward prediction of the velocity component uT conditioned on 30% spatial observations of the initial states (u0, v0).

Under this increased parametric depth, pADAM performed state and multi-parameter inference across all families, as reported in Table 3a. Results indicate that pADAM maintained high accuracy for both forward prediction and inverse state reconstruction even as the parameter space expanded, while parameter inference exhibited higher error because of the increased ill-posedness associated with jointly inferring multiple coefficients. Qualitative examples of pADAM predictions across heterogeneous parameter dimensionalities are shown in Extended Data Fig. 7.

To demonstrate pADAM's utility for partial-parameter inference, we leveraged the Bayesian guidance formulation to incorporate a priori physical knowledge at inference time. For the advection system, we compared the joint inference of the velocity vector ϕ = [a_x, a_y] with conditional settings in which one component was treated as an observed constraint. As summarized in Table 3b, incorporating this a priori knowledge—for example, a_x | a_y—substantially improved the estimation of the remaining component across both full and sparse observational regimes. By explicitly restricting the parameter manifold to configurations consistent with known physical constraints, pADAM mitigates ill-posedness by reducing the effective dimensionality of the inverse problem.
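One simple way to realize this kind of partial-parameter conditioning is to clamp the observed coefficient after every denoising update, so sampling explores only the unconstrained directions of the lifted parameter vector. The toy update rule below is our stand-in for the actual reverse-diffusion step, and the posterior mean it drifts toward is hypothetical:

```python
import numpy as np

def denoise_update(phi, sigma, rng):
    # Placeholder update drifting toward a (hypothetical) posterior mean;
    # pADAM would apply its trained denoiser to the lifted parameters.
    target = np.array([2.5, 3.0])
    return phi + 0.1 * (target - phi) + 0.05 * sigma * rng.standard_normal(2)

def sample_parameters(known_mask, known_vals, n_steps=100, seed=0):
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal(2)                 # start from noise
    for sigma in np.linspace(1.0, 0.01, n_steps):
        phi = denoise_update(phi, sigma, rng)
        phi = np.where(known_mask, known_vals, phi)  # clamp observed entries
    return phi

# Infer a_x with a_y = 3.0 supplied as an observed constraint.
phi = sample_parameters(known_mask=np.array([False, True]),
                        known_vals=np.array([0.0, 3.0]))
```

Because the known entry never deviates from its observed value, the sampler effectively draws from a conditional such as a_x ∼ p(a_x | a_y, u0, uT), which is the mechanism behind the improved accuracy in Table 3b.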
This capability enables the integration of physical priors into real-time inference without model retraining.

Identifying governing laws through probabilistic model selection

A central challenge in scientific machine intelligence is the autonomous identification of governing laws from sparse, temporal observations. We next evaluate pADAM's capacity for probabilistic model selection, challenging the framework to infer the underlying physical law and its associated coefficients from as few as two sparse snapshots (c, ϕ ∼ p(c, ϕ | u0, uT)). pADAM performs identification by leveraging its learned class-conditional prior to explore a candidate library of PDE classes. For this evaluation, we utilize the generalist prior trained on the heterogeneous parameter datasets used in the parametric scaling investigation.

As illustrated in Fig. 3, the framework reliably distinguishes between competing physical hypotheses—such as advective transport versus pure diffusion—even under significant observational sparsity. This selection follows an infer-and-validate logic (see Methods): for each candidate class, pADAM leverages its shared generative prior to infer the corresponding coefficient posterior and assess marginal consistency by cross-referencing parameter inference with generative state reconstructions. This process allows the framework to compare candidate PDE classes and identify the governing law that best explains the observed state transition.

Despite severe observational scarcity (30% observations), pADAM consistently selects the ground-truth operator across the evaluated scenarios while maintaining parameter estimates that closely align with the true coefficients.
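The infer-and-validate logic can be sketched as a loop over candidate classes that scores data consistency at the observed locations. The one-parameter toy "solvers" and the grid search below are illustrative stand-ins for class-conditional posterior sampling over coefficients:

```python
import numpy as np

def solve(u0, phi, pde_class):
    # Hypothetical one-parameter surrogates for three operators.
    if pde_class == "diffusion":
        return u0 * np.exp(-phi)                  # uniform decay
    if pde_class == "advection":
        return np.roll(u0, int(phi))              # pure shift
    return np.roll(u0, int(phi)) * np.exp(-phi)   # advection-diffusion

def select_model(u0_obs, uT_obs, mask, candidates, phi_grid):
    best = None
    for c in candidates:
        # "Infer" phi by grid search (stand-in for coefficient posterior
        # sampling), then validate against the sparse observations.
        residuals = [np.linalg.norm(mask * (solve(u0_obs, p, c) - uT_obs))
                     for p in phi_grid]
        r = min(residuals)
        if best is None or r < best[2]:
            best = (c, phi_grid[int(np.argmin(residuals))], r)
    return best

rng = np.random.default_rng(0)
u0 = rng.standard_normal(64)
uT = solve(u0, 2.0, "advection")              # ground truth: advection, shift 2
mask = (rng.random(64) < 0.3).astype(float)   # 30% observed
cls, phi, _ = select_model(u0, uT, mask,
                           ["diffusion", "advection", "advection-diffusion"],
                           phi_grid=[0.5, 1.0, 2.0, 3.0])
```

Repeating this with fresh posterior samples, as the text describes, turns the single selected pair (cls, phi) into a distribution over candidate laws and coefficients.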
Because the framework is fully generative, repeated sampling naturally induces a distribution over candidate laws and associated parameters; the resulting ensemble-based predictive intervals provide a measure of the epistemic uncertainty inherent in physical model selection. These results suggest that pADAM can serve as a probabilistic framework for characterizing systems whose governing dynamics are ambiguous or only partially observed from limited data.

Table 3: Parametric scaling and partial-parameter inference performance. a, Task-agnostic inference across a heterogeneous PDE library with different parametric dimensionality. Relative L2 errors (%) for state estimation (u_0 and u_T) and parameter recovery (φ) across three physical regimes with varying parametric depth: Diffusion (φ = [ν]), Advection (φ = [a_x, a_y]), and Advection–diffusion (φ = [ν, a_x, a_y]). b, Impact of partial parameter constraints on parameter inference in the advection system. Comparison of joint inference (a_x, a_y) against conditional settings where one velocity component (a_x or a_y) is provided at inference time. All tasks are performed under both full (100%) and sparse (30%) spatial observations, with results representing the mean over 50 test instances.

a — Task-agnostic inference across a heterogeneous PDE library with different parametric dimensionality

PDE system        Observation    Forward (u_T)   Inverse (u_0)   avg. Inverse (φ)
Diffusion         Full (100%)    0.98            1.18            2.89
                  Sparse (30%)   1.82            2.07            4.61
Advection         Full (100%)    0.85            1.03            6.57
                  Sparse (30%)   1.64            1.32            6.61
Advection–diff.   Full (100%)    1.49            1.72            5.99
                  Sparse (30%)   2.31            2.71            5.95

b — Partial-parameter inference under conditional physical constraints in the advection system

Observation    Inference setting                     Inverse (a_x)   Inverse (a_y)
Full (100%)    a_x, a_y ∼ p(a_x, a_y | u_0, u_T)     6.31            6.83
               a_x ∼ p(a_x | a_y, u_0, u_T)          2.00            –
               a_y ∼ p(a_y | a_x, u_0, u_T)          –               2.30
Sparse (30%)   a_x, a_y ∼ p(a_x, a_y | u_0, u_T)     6.32            6.91
               a_x ∼ p(a_x | a_y, u_0, u_T)          2.44            –
               a_y ∼ p(a_y | a_x, u_0, u_T)          –               2.92

Fig. 3: Probabilistic model selection across three PDE classes. Identification of governing laws from two snapshots (u_0, u_T) with only 30% of the spatial field observed. a–c, Left panels show sparse field snapshots; right panels compare the ground-truth PDEs with representative sampled PDEs and the associated 95% predictive intervals derived from the ensemble.
(a) Advection regime. True PDE: ∂_t u + [2.56, 2.97]·∇u = 0; sampled PDE: ∂_t u + [2.60, 2.87]·∇u = 0; 95% interval: ∂_t u + [2.70 ± 0.10, 2.79 ± 0.09]·∇u = 0.
(b) Diffusion regime. True PDE: ∂_t u = 0.29Δu; sampled PDE: ∂_t u = 0.30Δu; 95% interval: ∂_t u = 0.30 (± 0.0010) Δu.
(c) Advection–diffusion regime. True PDE: ∂_t u + [2.40, 2.30]·∇u = 0.33Δu; sampled PDE: ∂_t u + [2.37, 2.19]·∇u = 0.34Δu; 95% interval: ∂_t u + [2.29 ± 0.06, 2.35 ± 0.18]·∇u = 0.34 (± 0.016) Δu.

Discussion

The results show that a single class-conditional generative prior can support forward prediction, inverse inference, and probabilistic model selection across multiple PDE families, including both scalar and vector-valued systems. Rather than relying on separate task-specific solvers, pADAM learns a shared probabilistic representation that can be conditioned on different forms of partial information. This suggests that heterogeneous physical systems can be addressed within a unified generative framework while maintaining accurate inference under sparse observations.

A central result is that pADAM can identify governing laws from only two sparse snapshots by comparing how well candidate PDE classes explain the observed transition under the learned prior. Unlike classical discovery methods that rely on dense trajectories, this approach formulates PDE identification as probabilistic inference over a candidate library of physical models. Because the comparison is generative rather than deterministic, it also yields uncertainty over candidate laws and associated parameters.

Crucially, conformal calibration strengthens pADAM as a framework for scientific inference by providing distribution-free, finite-sample coverage guarantees for predictive intervals. This is particularly important in sparse and ill-posed settings, where raw predictive intervals may exhibit substantial undercoverage.
In addition, the zero-shot extrapolation results suggest that the learned prior is not limited to a fixed training library and can remain effective under operator shift when guided by observations.

The pADAM framework is designed to remain compatible with a broad class of generative backbones, making it adaptable to emerging paradigms such as flow matching [36] and recent mean-flow approaches [37], which may improve sampling efficiency and reduce computational cost. Future extensions could further enhance performance, particularly in extreme low-data regimes, by incorporating physics-informed priors directly into the generative process [33]. In addition, extending the framework to function-space diffusion [30] may enable discretization-invariant inference in infinite-dimensional settings and support inference across varying grid resolutions. Beyond these technical extensions, an important next direction is to move beyond fixed candidate libraries toward more flexible forms of equation discovery for systems with partially unknown physics. Such developments could improve the ability of generative frameworks to relate high-dimensional observations to interpretable mathematical structure.

Taken together, these results suggest that shared generative physical priors can support prediction, inference, and model discovery within a single probabilistic framework. More broadly, this work can be viewed as a pilot step toward foundation models for science, in which large generative models are trained across diverse physical domains and adapted to downstream scientific and engineering tasks. In that sense, pADAM points toward a class of generalist scientific machine learning models that remain effective even when observations are sparse, irregular, or incomplete.

Methods

Problem formulation and objective

We consider a library of C distinct PDE families. Each class c ∈ {1, ..., C} is associated with a governing operator F^(c) representing a specific physical dynamical regime:

    F^(c)[u(s, t); φ] = 0,  (s, t) ∈ Ω × (0, T],
    u(s, 0) = u_0(s),                                                    (1)

subject to appropriate boundary conditions on ∂Ω. Here, u(s, t) denotes the solution field at spatial coordinates s ∈ Ω, and φ ∈ R^{d_c} represents the vector of physical coefficients parameterizing the dynamics within class c. This formulation accommodates both scalar-valued and vector-valued states (for example, coupled velocity components u = (u, v) in Navier–Stokes). In this work, all experiments are conducted on two-dimensional spatial domains; thus, s ∈ Ω ⊂ R².

The objective is to learn a unified, class-conditional generative prior p(x | c), where x is a multi-channel variable encapsulating the joint distribution of temporal system states and, where applicable, governing physical parameters within a shared probabilistic representation. By parameterizing this prior with a diffusion model [38, 39], we treat forward prediction, inverse state estimation, and parameter recovery as posterior sampling tasks of the form p(x | y_obs, c), where y_obs represents an arbitrary set of observations ranging from full-field data to sparse measurements. This formulation enables task-agnostic inference, in which diverse downstream physical tasks are performed by conditioning a shared manifold of physical dynamics on available data, without requiring task-specific architectures or retraining.

The pADAM framework: a unified foundation for multi-physics learning

The pADAM framework brings heterogeneous PDE families into a shared generative formulation (Fig. 1). As illustrated in Fig. 1a–c, a class-conditional diffusion model learns the joint distribution of the unified representation, comprising system states and, where applicable, physical parameters, and thereby supports generation across multiple physical regimes from Gaussian noise (Fig. 1c, orange trajectories). As shown in Fig. 1d, task-agnostic inference is performed through Bayesian conditioning: by incorporating full or sparse observations through plug-and-play guidance (green observation-guided trajectory), the shared pADAM prior supports forward prediction, inverse state and parameter inference, uncertainty quantification, and probabilistic model selection within a single framework. We detail each component of pADAM in the following sections.

Unified joint representation and parameter lifting

To enable a single architecture to process heterogeneous physics, we transform each system into a unified 3-channel generative variable x ∈ R^{3 × N_x × N_y}. This representation ensures architectural invariance across different physical regimes:

    x := [Φ, u_0, u_T]          for scalar-field PDEs,
    x := [u_0, v_0, u_T or v_T] for vector-field PDEs,                   (2)

where Φ is a spatially lifted representation of the coefficients φ, and u, v represent coupled velocity-field components. By maintaining a fixed channel dimension, a single class-conditional diffusion model can be deployed across diverse dynamical systems without structural modification. While this configuration is optimized for the forward and inverse tasks presented in this work, the framework is inherently modular, allowing alternative channel assignments depending on the requirements of the physical regime.

Generative inclusion of physical parameters. Treating φ as a component of the generative variable x, rather than as a static conditioning input, is a key modeling choice for probabilistic inverse inference in parametric systems.
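As a concrete illustration, the 3-channel assembly of Eq. (2) can be sketched as follows. This is a minimal sketch: the helper names and array shapes are assumptions, and the banded layout is only one possible choice of disjoint partition for multi-coefficient lifting.

```python
import numpy as np

def lift_coefficients(phi, nx, ny):
    """Broadcast coefficients phi into a spatial field Phi: a constant field for a
    single coefficient, piecewise-constant over a disjoint grid partition otherwise
    (here, horizontal bands; an illustrative assumption)."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    field = np.empty((nx, ny))
    bands = np.array_split(np.arange(nx), len(phi))  # one band per coefficient
    for k, rows in enumerate(bands):
        field[rows, :] = phi[k]
    return field

def build_generative_variable(u0, uT=None, phi=None, v0=None, vT=None):
    """Assemble the unified 3-channel variable x of Eq. (2)."""
    if phi is not None:                        # scalar-field, parametric PDE
        Phi = lift_coefficients(phi, *u0.shape)
        return np.stack([Phi, u0, uT])
    terminal = uT if uT is not None else vT    # vector-field PDE: one terminal channel
    return np.stack([u0, v0, terminal])
```

Either branch yields an array of shape (3, N_x, N_y), so the same convolutional backbone can consume every system in the library.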
This design allows the model to learn the intrinsic joint prior p(φ, u_0, u_T | c), enabling the recovery of unknown physical parameters through posterior sampling φ ∼ p(φ | u_0, u_T, c) at inference time.

Mathematical lifting scheme. To bridge the dimensionality gap between the coefficients d_c and the spatial states N_x × N_y, we define a lifting operator L that broadcasts φ into a spatial field Φ. This ensures that physical parameters are represented as spatially compatible features:

• Scalar case (d_c = 1): Φ_{i,j} = φ for all (i, j) ∈ Ω.
• Vector case (d_c > 1): Φ_{i,j} := Σ_{k=1}^{d_c} φ_k 1_{Ω_k}(i, j), where {Ω_k}_{k=1}^{d_c} is a disjoint partition of the spatial grid.

This spatial lifting provides a consistent inductive bias, allowing the model's hierarchical convolutional architecture to capture the functional dependencies between the governing physical parameters (Φ) and the system states (u_0, u_T).

Learning unified generative priors across heterogeneous physical regimes

To enable a unified modeling framework for multi-physics systems, we learn a single generative prior p(x | c) using a class-conditional diffusion model [40] that captures the joint distribution of system states and, where applicable, governing physical parameters across the library of PDE families C. This approach leverages shared latent structure across heterogeneous operators, establishing a unified generative representation that captures dynamical structure across disparate PDE families. The representation is adapted to the structural dependencies of each physical family:

• Parametric scalar-field PDEs: We learn the joint prior p(Φ, u_0, u_T | c). By including the lifted parameter field Φ in the generative variable x, the model captures the relationship between physical coefficients and system states.
• Vector-valued PDEs: For systems with fixed physical parameters, we learn component-wise joint priors, p(u_0, v_0, u_T | c) and p(u_0, v_0, v_T | c). This strategy uses the available channel capacity to resolve coupled velocity components, enabling the model to capture dependencies between initial and terminal states.

We adopt the EDM framework [41] to parameterize this prior. In this implementation, the model is trained with a conditional denoising score-matching objective:

    L_train = E_{x_0, c, σ, n} [ λ(σ) ‖D_θ(x_0 + n; σ, c) − x_0‖²₂ ],     (3)

where n ∼ N(0, σ(t)² I) denotes additive Gaussian noise at diffusion level σ(t), and λ(σ) is a preconditioning weight. The diffusion process defines a time-dependent generative variable x = x(t) for t ∈ [0, T], where x_0 = x(0) denotes a clean sample from the class-conditional data distribution and x(t) becomes progressively noisier as t increases under the noise schedule σ(t). At the terminal time t = T, the diffusion distribution approaches a Gaussian prior, so sampling begins from Gaussian noise and is then transported back toward the data manifold. Here, D_θ denotes the denoiser that maps noisy inputs toward the clean manifold. Through the relation s_θ(x; σ(t), c) = (D_θ(x; σ(t), c) − x) / σ(t)², the network learns to approximate the conditional score function, yielding an estimate of the log-density gradient ∇_x log p_t(x | c), where p_t(x | c) denotes the class-conditional marginal distribution of the diffusing sample at time t.
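The training objective of Eq. (3) and the denoiser-to-score relation can be sketched as follows. This is a minimal sketch under stated assumptions: the toy shrinkage denoiser and the uniform weighting λ(σ) = 1 are illustrative stand-ins, whereas EDM uses a trained conditional network and a specific preconditioning.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise(x_noisy, sigma, c):
    """Stand-in for the denoiser D_theta (a real model would be a class-conditional
    U-Net). Shrinking toward zero is the optimal denoiser when the clean data are
    zero-mean unit-variance Gaussian (illustrative assumption)."""
    return x_noisy / (1.0 + sigma**2)

def dsm_loss(x0_batch, c, sigma):
    """Monte Carlo estimate of Eq. (3) with lambda(sigma) = 1."""
    n = sigma * rng.standard_normal(x0_batch.shape)      # n ~ N(0, sigma^2 I)
    residual = denoise(x0_batch + n, sigma, c) - x0_batch
    return np.mean(np.sum(residual**2, axis=tuple(range(1, residual.ndim))))

def score_from_denoiser(x, sigma, c):
    """s_theta(x; sigma, c) = (D_theta(x; sigma, c) - x) / sigma^2, the estimate
    of the log-density gradient grad_x log p_t(x | c)."""
    return (denoise(x, sigma, c) - x) / sigma**2
```

For the Gaussian toy denoiser, the recovered score is −x / (1 + σ²), the exact score of the noised marginal, which illustrates why minimizing Eq. (3) yields a usable score estimator.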
Once this score estimator is obtained, samples are evolved through a deterministic mapping governed by the probability-flow ordinary differential equation (ODE) [38, 41]:

    dx/dt = −σ̇(t) σ(t) s_θ(x; σ(t), c).                                  (4)

By evolving this ODE at inference time, pADAM transforms stochastic noise into physically consistent realizations, steering sample trajectories along the learned class-conditional manifold so that generated samples remain consistent with the characteristic dynamics of the specified physical regime.

Task-agnostic inference via plug-and-play observational guidance

A key feature of pADAM is its ability to perform task-agnostic inference in a plug-and-play manner, in which forward prediction, inverse state estimation, and coefficient recovery are unified as conditional sampling problems without requiring model retraining. Given full or sparse observations y_obs, we sample from the posterior distribution p(x | y_obs, c) using the learned class-conditional generative prior. Following Bayes' rule, the posterior score decomposes as [38]

    ∇_x log p_t(x | y_obs, c) ≈ s_θ(x; σ(t), c) + ∇_x log p_t(y_obs | x, c),     (5)

where the first (prior) term is provided by the pre-trained class-conditional model, and the second (likelihood) term enforces consistency with available measurements. Under additive Gaussian measurement noise, and following the formulation for score-based guidance [29, 42], the likelihood score is approximated at inference time as

    ∇_x log p_t(y_obs | x, c) ≈ −λ_obs ∇_x ‖y_obs − A(x̂_0(x))‖²₂,       (6)

where A denotes a task-specific measurement operator, and x̂_0 is the denoiser's estimate of the clean system representation.
Substituting this approximation into the probability-flow ODE (4) yields the guided probability-flow dynamics:

    dx/dt = −σ̇(t) σ(t) s_θ(x; σ(t), c) − λ_obs ∇_x ‖y_obs − A(x̂_0(x))‖²₂.     (7)

The guidance scale λ_obs acts as a weighting factor that balances the learned prior against observational constraints. Under this formulation, the guided ODE maps an initial Gaussian noise sample to realizations that remain consistent with both the specified physical class and the available measurements. The prior score steers the trajectory toward high-probability regions of the class-conditional distribution, while the observation-based correction enforces consistency with y_obs.

Forward and inverse operator synthesis

The flexibility of the observation operator A enables pADAM to support inference across a diverse set of physical operators within a single architecture. This versatility is demonstrated across three fundamental classes of physics problems:

• Forward prediction: For parametric scalar PDEs, given the coefficients Φ and the initial state u_0, pADAM samples the terminal state u_T ∼ p(u_T | u_0, Φ, c). For vector-valued systems (e.g., Navier–Stokes), the model performs component-wise prediction of terminal velocities u_T ∼ p(u_T | u_0, v_0, c) and v_T ∼ p(v_T | u_0, v_0, c).
• Inverse state estimation: For scalar fields, pADAM samples u_0 ∼ p(u_0 | u_T, Φ, c). For multi-component systems, where joint reconstruction is highly ill-posed, we sample u_0 ∼ p(u_0 | v_0, u_T, c) and v_0 ∼ p(v_0 | u_0, v_T, c); this auxiliary conditioning constrains the solution space and mitigates the ill-posedness of the inverse recovery.
• Inverse parameter identification: For scalar parametric systems, given paired observations of the initial and terminal states (u_0, u_T), pADAM recovers the governing coefficients Φ ∼ p(Φ | u_0, u_T, c).
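The observation-guided sampling of Eqs. (4)–(7) can be sketched with a simple Euler discretization over a decreasing noise schedule. This is a minimal sketch under stated assumptions: the EDM ODE is written in σ-time as dx/dσ = (x − x̂_0)/σ, the guidance step treats the denoiser's Jacobian as the identity, and A is taken to be a linear observation mask (so its adjoint is the mask itself); these simplifications are illustrative, not the paper's exact scheme.

```python
import numpy as np

def guided_sampler(x_init, denoise_fn, measure_op, y_obs, sigmas, lam_obs=1.0):
    """Sketch of observation-guided probability-flow sampling.

    denoise_fn(x, sigma) -> x_hat0, the denoiser's clean-data estimate
    measure_op(x)        -> A(x), e.g. a sparse observation mask
    sigmas               -> decreasing noise levels, ending near 0
    All three are stand-ins for pADAM's trained components (assumed API).
    """
    x = x_init.copy()
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x_hat0 = denoise_fn(x, sigma)
        x = x + (sigma_next - sigma) * (x - x_hat0) / sigma   # prior ODE step
        # gradient-style correction on ||y_obs - A(x_hat0)||^2 (mask adjoint = mask)
        resid = measure_op(denoise_fn(x, max(sigma_next, 1e-8))) - y_obs
        x = x - lam_obs * measure_op(resid)                   # observation guidance
    return x
```

The prior step pulls the trajectory toward the learned class-conditional manifold, while the correction step pulls the clean estimate toward agreement with the (possibly sparse) observations, mirroring the two terms of Eq. (7).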
Crucially, A can represent both full-field and sparse observations, allowing pADAM to operate across different data-density regimes. By unifying these tasks across heterogeneous physical domains, pADAM functions as a probabilistic framework for multi-operator, multi-physics inference. This shared formulation enables forward and inverse problems to be addressed within a single architecture without task-specific retraining.

Reliable uncertainty quantification via conformal calibration

As a generative framework, pADAM provides uncertainty quantification (UQ) through posterior sampling. By drawing multiple conditional samples from the observation-guided reverse process, we obtain an empirical distribution consistent with both the physical constraints of the PDE class and the available measurements. The uncertainty captured here is primarily epistemic, reflecting ambiguity arising from factors including sparse observations, inverse ill-posedness, and limited data. This is particularly important in scientific settings where reliable uncertainty estimates are needed to support inference.

Ensemble-based uncertainty estimation

For a given conditioning specification, encompassing the PDE class, partial state observations, and known coefficients, we generate an ensemble of M independent samples {z_j}_{j=1}^M from the guided reverse diffusion process (7), where z ∈ {Φ, u_0, u_T, v_0, v_T}. We characterize this predictive distribution through the ensemble mean μ_M and standard deviation σ_M:

    μ_M = (1/M) Σ_{j=1}^M z_j,    σ_M = sqrt( (1/(M−1)) Σ_{j=1}^M (z_j − μ_M)² ).

While a standard Gaussian-based 95% interval (μ_M ± 1.96 σ_M) is a common baseline, conditional distributions in chaotic or under-determined PDE regimes frequently deviate from Gaussianity, exhibiting significant skewness, heavy tails, or multimodality.
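The ensemble statistics above, together with the conformal rescaling described in the next subsection, can be sketched as follows. This is a minimal sketch under stated assumptions: calibration and test cases are exchangeable, and each array entry is treated as a separate calibration score, a simplification of the per-instance scores of the paper.

```python
import numpy as np

def ensemble_stats(samples):
    """Ensemble mean and unbiased standard deviation over M guided samples.
    samples: array of shape (M, ...) from the guided reverse process."""
    return samples.mean(axis=0), samples.std(axis=0, ddof=1)

def conformal_quantile(cal_truths, cal_mus, cal_sds, alpha=0.05):
    """Calibration threshold q_hat from nonconformity scores |z - mu| / sd."""
    scores = (np.abs(cal_truths - cal_mus) / cal_sds).ravel()
    n = scores.size
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)   # finite-sample correction
    return np.quantile(scores, level)

def conformal_interval(mu, sd, q_hat):
    """Conformal prediction interval mu +/- q_hat * sd."""
    return mu - q_hat * sd, mu + q_hat * sd
```

Replacing the fixed 1.96 multiplier with the data-driven q̂ is what converts the nominal Gaussian interval into one with a distribution-free coverage guarantee.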
Consequently, these nominal intervals provide no formal guarantees and can suffer from substantial miscalibration. This issue is especially relevant in ill-posed inverse inference and in prediction under sparse observations, where multiple physical solutions may be consistent with the available data.

Conformal calibration with distribution-free coverage

To provide the reliability required for scientific inference, we integrate conformal prediction [34, 35]. This procedure post-processes pADAM's ensemble estimates to produce prediction intervals with finite-sample, distribution-free coverage guarantees. Because the scale of uncertainty varies across heterogeneous physical regimes, calibration is performed independently for each class–task pair (c, τ). Under the exchangeability assumption for the calibration and test samples, we compute nonconformity scores s_i^{(c,τ)} on a calibration dataset D_cal^{(c,τ)} of size n_{c,τ} to quantify the normalized discrepancy between the ground truth and the predictive distribution [22, 43]:

    s_i^{(c,τ)} = |z_i − μ_M(y_obs,i)| / σ_M(y_obs,i).                   (8)

Given a target miscoverage rate α ∈ (0, 1), the calibration threshold q̂_α^{(c,τ)} is defined as the ⌈(n_{c,τ} + 1)(1 − α)⌉ / n_{c,τ} empirical quantile of the scores in D_cal^{(c,τ)}. We then construct the final conformal interval:

    C_α^{(c,τ)}(y_obs,test) = [ μ_M(y_obs,test) ± q̂_α^{(c,τ)} σ_M(y_obs,test) ].     (9)

Under exchangeability, this construction guarantees marginal coverage of at least 1 − α, providing a principled measure of predictive reliability.

Probabilistic PDE model selection from two snapshots

A core strength of pADAM is its ability to accommodate heterogeneous parameter representations across multiple PDE families through parameter lifting.
This architectural design maintains structural consistency even when the dimensionality of the physical parameters varies across PDE classes. For example, a PDE library may include diffusion equations defined by a scalar diffusivity ν, advection equations governed by velocity components (a_x, a_y), and advection–diffusion systems parameterized by (ν, a_x, a_y). We leverage this capability to perform probabilistic model selection from only two snapshots for scalar-field PDEs, casting the identification of governing laws from sparse temporal observations as a probabilistic inference task.

Given only a pair of state snapshots, an initial state u_0 and a terminal state u_T, which may be available only through sparse measurements, the goal is to identify the governing PDE class from a candidate library C = {c_1, ..., c_K} and infer the associated physical parameters. This setting is highly ill-posed and represents a form of probabilistic PDE discovery. In contrast to classical identification approaches requiring dense spatiotemporal trajectories, pADAM leverages its learned multi-operator prior to evaluate the consistency of each candidate law with the observed transition. By sampling class-conditional parameter posteriors, the framework effectively reconstructs the joint posterior p(c, φ | u_0, u_T), naturally quantifying the epistemic uncertainty inherent in identifying physics from limited temporal snapshots.

The selection procedure follows an infer-and-validate logic:

1. Conditional parameter inference: For each candidate class c ∈ C, we sample from the class-conditional parameter posterior φ̂^(c) ∼ p(φ | u_0, u_T, c). This identifies parameter configurations that are maximally consistent with the observed state transition under class c.
2. Generative validation: We then use forward prediction to sample a synthetic terminal state û_T^(c) ∼ p(u_T | φ̂^(c), u_0, c).
Each candidate is evaluated by its generative reconstruction discrepancy:

    E(c) = ‖û_T^(c) − u_T‖₂.                                             (10)

The identified operator c* = argmin_{c ∈ C} E(c), together with the corresponding inferred coefficients φ* = φ̂^(c*), defines the explicit governing model that best explains the observed dynamics. Because pADAM is fully generative, repeated ensemble sampling allows us to quantify the uncertainty associated with model selection. This enables a principled approach to uncertainty-aware model selection, in which the relative support for competing physical laws is reflected in the distribution of reconstruction discrepancies.

Experimental design

Benchmark PDE problems for experiments

To assess the generality of the pADAM framework, we consider seven representative PDE families spanning dissipative, advective, mixed, and nonlinear dynamics. This multi-physics library provides a stringent testbed for unified operator learning across heterogeneous physical regimes. Datasets for the diffusion, advection, advection–diffusion, advection–diffusion–reaction, Burgers', and Allen–Cahn equations are generated using finite-difference discretizations, while Navier–Stokes solutions are computed using a Fourier spectral solver.

Diffusion equation. We first consider the diffusion equation, which models a purely dissipative process in which spatial gradients of a scalar field are smoothed by diffusion. Let u(x, y, t) denote a scalar quantity defined on the spatial domain Ω = (0, 1) × (0, 1) for t ≥ 0. The governing equation reads

    ∂_t u = ν Δu    in Ω × (0, T],                                       (11)

where ν > 0 denotes the diffusion coefficient and Δ = ∂_xx + ∂_yy is the Laplacian operator. We assume that u is sufficiently smooth, for example u ∈ C^{2,1}(Ω × (0, T]), so that the derivatives in Eq. (11) are well defined. The system is equipped with the initial condition

    u(x, y, 0) = exp( −[(x − x_c)² + (y − y_c)²] / w_0 ) sin(πx) sin(πy),  (x, y) ∈ Ω,     (12)

where the centroid and width parameters (x_c, y_c, w_0) are randomly sampled from the uniform distributions specified in Extended Data Table 1. The sinusoidal taper ensures smooth decay toward the boundary. On the boundary ∂Ω, we impose the homogeneous Neumann boundary condition

    ∇u · n = 0    on ∂Ω × (0, T],                                        (13)

where n denotes the outward unit normal vector.

Advection equation. We next consider the advection equation, which describes the transport of a scalar field by a prescribed uniform velocity field. The governing equation is

    ∂_t u + a · ∇u = 0    in Ω × (0, T],                                 (14)

where a = (a_x, a_y) ∈ R² is a constant advection velocity vector. We assume sufficient regularity of the solution, e.g., u ∈ C¹(Ω × (0, T]), so that the derivatives in Eq. (14) exist. The advection problem is posed with the same initial condition (12) and the homogeneous Neumann boundary condition

    ∇u · n = 0    on ∂Ω × (0, T].                                        (15)

The solution corresponds to a translation of the initial profile along the flow direction prescribed by a, while preserving the overall shape and amplitude.

Advection–diffusion equation. We consider the advection–diffusion equation, which describes the combined effects of advective transport and diffusive spreading. The governing equation takes the form

    ∂_t u + a · ∇u = ν Δu    in Ω × (0, T],                              (16)

where a = (a_x, a_y) ∈ R² is a constant advection velocity and ν > 0 denotes the diffusion coefficient. A sufficiently smooth solution is assumed, for instance u ∈ C^{2,1}(Ω × (0, T]), ensuring that Eq. (16) is properly defined. The system is supplemented by the initial condition (12) and the homogeneous Neumann boundary condition

    ∇u · n = 0    on ∂Ω × (0, T].                                        (17)

The solution exhibits advective transport together with diffusive smoothing, resulting in spreading and amplitude decay over time.

Advection–diffusion–reaction equation. We consider the advection–diffusion–reaction equation, which describes the combined effects of advective transport, diffusive spreading, and local reaction. The governing equation takes the form

    ∂_t u + a · ∇u = ν Δu + R(u)    in Ω × (0, T],                       (18)

where a ∈ R² is a constant advection velocity, ν > 0 denotes the diffusion coefficient, and R(u) is a reaction term. A sufficiently smooth solution is assumed, for instance u ∈ C^{2,1}(Ω × (0, T]), ensuring that Eq. (18) is properly defined. The system is supplemented by the initial condition (12) and the homogeneous Neumann boundary condition

    ∇u · n = 0    on ∂Ω × (0, T].                                        (19)

In this study, we consider the linear reaction term R(u) = k u, with k ≥ 0, representing local growth dynamics. When R(u) = 0, the equation reduces to the advection–diffusion equation.

Allen–Cahn equation. We consider the Allen–Cahn equation, a prototypical nonlinear reaction–diffusion model that describes phase separation and interface dynamics in bistable systems. Let u(x, y, t) denote an order parameter defined on the spatial domain Ω = (0, 1) × (0, 1) for t ≥ 0. The governing equation is

    ∂_t u = ε² Δu − (1/ε²) (u³ − u)    in Ω × (0, T],                    (20)

where ε > 0 is a small parameter controlling the interfacial thickness and Δ = ∂_xx + ∂_yy denotes the Laplacian operator. We restrict attention to sufficiently smooth solutions u, e.g., u ∈ C^{2,1}(Ω × (0, T]). The system is initialized with the same initial condition defined in Eq. (12). On the boundary of the domain, we impose the homogeneous Dirichlet boundary condition

    u = 0    on ∂Ω × (0, T].                                             (21)

This condition fixes the phase variable at the boundary and prevents interface motion across the domain boundary.
Over time, the solution reflects the combined effects of diffusion and nonlinear reaction, leading to smoothing and phase-separation behavior.

Burgers' equation. The two-dimensional Burgers' equation is a nonlinear model combining convective transport and viscous diffusion. The equation describes the evolution of a velocity field u(x, y, t) = (u_1(x, y, t), u_2(x, y, t)) defined on the square domain Ω = (−1, 1) × (−1, 1), and takes the form

    ∂_t u + (u · ∇)u = ν Δu,                                             (22)

where ν > 0 denotes the kinematic viscosity. We consider sufficiently smooth velocity fields, for example u ∈ C^{2,1}(Ω × (0, T]), so that all derivatives in Eq. (22) are well defined. Written in component form, the system becomes

    ∂u_1/∂t + u_1 ∂u_1/∂x + u_2 ∂u_1/∂y = ν (∂²u_1/∂x² + ∂²u_1/∂y²),
    ∂u_2/∂t + u_1 ∂u_2/∂x + u_2 ∂u_2/∂y = ν (∂²u_2/∂x² + ∂²u_2/∂y²).     (23)

The initial condition consists of spatially localized velocity fields with Gaussian profiles, modulated by a sine taper to satisfy homogeneous Dirichlet boundary conditions. Specifically,

    u_1(x, y, 0) = exp( −[(x − c_{x,1})² + (y − c_{y,1})²] / w_1 ) sin(πx) sin(πy),
    u_2(x, y, 0) = exp( −[(x − c_{x,2})² + (y − c_{y,2})²] / w_2 ) sin(πx) sin(πy),     (24)

where the component centroids (c_{x,i}, c_{y,i}) and width parameters w_i (for i = 1, 2) are randomly sampled from the uniform distributions defined in Extended Data Table 1. The velocity field satisfies homogeneous Dirichlet boundary conditions,

    u(x, y, t) = 0,  (x, y) ∈ ∂Ω.                                        (25)

The nonlinear term (u · ∇)u describes advective transport of the velocity field, while the viscous term ν Δu introduces diffusive smoothing. The solution behavior is determined by the relative magnitude of convection and viscosity.

Incompressible Navier–Stokes equations.
Incompressible Navier–Stokes equations. We next consider the two-dimensional incompressible Navier–Stokes equations, which describe the evolution of a divergence-free velocity field driven by nonlinear advection and balanced by pressure and viscous diffusion. Let u(x, y, t) = (u₁(x, y, t), u₂(x, y, t)) denote the velocity field and p(x, y, t) the kinematic pressure on the spatial domain Ω = (0, L) × (0, L) for t ≥ 0. The governing equations without external forcing read

∂_t u + (u · ∇)u + ∇p = ν∆u   in Ω × (0, T],
∇ · u = 0                     in Ω × [0, T],   (26)

where ν > 0 is the kinematic viscosity. We assume sufficiently smooth velocity and pressure fields (u, p) so that Eq. (26) is well defined. The system is initialized by a solenoidal velocity field parameterized by an amplitude factor a, sampled from the uniform distribution specified in Extended Data Table 1. For all (x, y) ∈ Ω, the initial condition is defined as

u₁(x, y, 0) = −a φ(2πy/L),   u₂(x, y, 0) = a ψ(4πx/L),   (27)

where each of φ and ψ is independently chosen from {sin(·), cos(·)}, resulting in four possible sine–cosine combinations. This construction satisfies the incompressibility constraint at t = 0, since u₁ depends only on y and u₂ only on x, which directly implies ∇ · u(·, ·, 0) = 0. We impose periodic boundary conditions for both velocity and pressure,

u(x + L, y, t) = u(x, y, t),   u(x, y + L, t) = u(x, y, t),
p(x + L, y, t) = p(x, y, t),   p(x, y + L, t) = p(x, y, t),   (28)

for all (x, y) ∈ Ω and t ∈ [0, T].

Evaluation metrics

We employ a set of evaluation metrics to assess accuracy, robustness to physical misspecification, and statistical reliability.

Pointwise accuracy. Pointwise accuracy is quantified using the relative L2 percentage error:

Rel-L2 (%) = 100 × ∥u_pred − u_true∥₂ / ∥u_true∥₂.   (29)

As pADAM is a generative model, point predictions are obtained by drawing a single sample from the learned solution distribution for each test case. Reported errors are averaged over 50 independent test instances for each problem setting.

Quantifying operator shift. To assess the generative prior's robustness to out-of-distribution (OOD) physical dynamics, we define the operator shift ∆_op as a measure of the discrepancy between the trained physical library and an unseen target dynamic. For a target operator P_unseen and a reference training operator P_train, the shift is quantified by the relative L2 deviation of their respective terminal states u_T evolved from an identical initial condition u_0:

∆_op (%) = 100 × ∥u_T^(P_unseen) − u_T^(P_train)∥₂ / ∥u_T^(P_train)∥₂.   (30)

In the context of the zero-shot extrapolation experiments presented in this study, this metric captures the physical departure induced by the reaction term k in the advection–diffusion–reaction (ADR) system relative to the base advection–diffusion (AD) trajectory. It serves as a formal proxy for the degree of physical misspecification the pADAM prior must reconcile during observation-guided sampling.

Uncertainty quantification. The reliability of uncertainty quantification is evaluated using the prediction interval coverage probability (PICP):

PICP (%) = (100 / |S|) Σ_{s∈S} 1{ u_true(s) ∈ [û_low(s), û_high(s)] },   (31)

where S denotes the set of spatial coordinates and 1{·} is the indicator function.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Code availability

The code used to generate the results of this study will be made publicly available upon publication at the following repository: https://github.com/Mollaali/pADAM.
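For reference, the three metrics in Eqs. (29)–(31) translate directly into a few lines of NumPy. The function names below are ours, not part of the pADAM codebase; the formulas follow the definitions above.

```python
import numpy as np

def rel_l2_percent(u_pred, u_true):
    """Relative L2 percentage error (Eq. 29)."""
    return 100.0 * np.linalg.norm(u_pred - u_true) / np.linalg.norm(u_true)

def operator_shift_percent(uT_unseen, uT_train):
    """Operator shift Delta_op (Eq. 30): relative L2 deviation between
    terminal states evolved from the same initial condition under the
    unseen and training operators."""
    return 100.0 * np.linalg.norm(uT_unseen - uT_train) / np.linalg.norm(uT_train)

def picp_percent(u_true, u_low, u_high):
    """Prediction interval coverage probability (Eq. 31): percentage of
    spatial points where the truth falls inside [u_low, u_high]."""
    inside = (u_true >= u_low) & (u_true <= u_high)
    return 100.0 * inside.mean()

# Sanity check: intervals that cover everywhere yield 100% PICP.
u = np.linspace(0.0, 1.0, 11)
assert picp_percent(u, u - 0.1, u + 0.1) == 100.0
```

Note that Eq. (30) is the same relative-L2 functional as Eq. (29), applied to terminal states of two different operators rather than to a prediction and its ground truth.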
Acknowledgements

We would like to thank the National Science Foundation for its support (DMS-2533878, DMS-2053746, DMS-2134209, ECCS-2328241, CBET-2347401, and OAC-2311848), as well as the U.S. Department of Energy (DOE) Office of Science Advanced Scientific Computing Research program (DE-SC0023161), the SciDAC LEADS Institute, and the DOE Fusion Energy Sciences program, under grant number DE-SC0024583.
Extended Data

Extended Data Fig. 1: Mechanistic analysis of internal attention patterns across physical operators.
Representative attention maps are extracted from a decoder block at a fixed denoising step (t = 1000) for diffusion, advection, and advection–diffusion regimes under identical initial conditions. Comparison of local attention weights reveals distinct structural signatures: the advection–diffusion maps (right) exhibit intermediate activations in spatial regions where features are prominent in pure advection (middle) but absent in pure diffusion (left). This graded attention pattern across the three operator classes suggests that the shared pADAM prior reuses and composes operator-specific internal representations to capture complex mixed dynamics.

Extended Data Table 1: Experimental configurations and parameter manifolds. a, Initial-condition (IC) protocol standardized across all experimental regimes. ICs for the scalar and Burgers' systems follow localized Gaussian forms adapted to their respective domains (Eqs. 12, 24), while Navier–Stokes follows the solenoidal trigonometric form (Eq. 27). b, Systematic variability of governing coefficients φ across thematic investigations. These manifolds define the continuous physical regimes analyzed in the corresponding Results sections.

Panel a: IC sampling rules

PDE group        IC parameter                             Sampling rule
Scalar PDEs      Centroid (x_c, y_c)                      U[0.2, 0.8]^2
                 Gaussian width w_0                       U[0.025, 0.075]
Burgers'         Component centroids (c_{x,i}, c_{y,i})   U[0.2, 0.8]^2
                 Component widths w_1, w_2                U[0.025, 0.075]
Navier–Stokes    Amplitude factor a                       a ~ U(0.5, 1.5]
                 Trigonometric basis φ, ψ                 (φ, ψ) ∈ {sin, cos}^2
                 Mode frequency k                         k ∈ {2π, 4π}
                 Domain length L                          L = 1 (fixed)

Panel b: Parameter manifolds and physical regimes across thematic investigations

Thematic investigation                   PDE library      Variable parameters (φ)   Sampling rule
Unified multi-physics                    Diffusion        –                         ν = 0.25
                                         Advection        –                         (a_x, a_y) = (4, 2)
                                         Adv.–Diff.       –                         (ν, a_x, a_y) = (0.25, 4, 2)
Continuous physics manifold              Diffusion        ν                         U[0.1, 0.4]
                                         Advection        a_x                       U[2.0, 5.0], a_y = 2
                                         Adv.–Diff.       ν                         U[0.1, 0.4], (a_x, a_y) = (4, 2)
Structural scaling                       Diffusion        ν                         U[0.1, 0.4]
                                         Advection        a_x                       U[2.0, 5.0], a_y = 2
                                         Adv.–Diff.       ν                         U[0.1, 0.4], (a_x, a_y) = (4, 2)
                                         Allen–Cahn       ε²                        U[2.5 × 10⁻³, 0.0121]
                                         Burgers'         ν                         ν = 0.05
                                         Navier–Stokes    ν                         ν = 0.02
Parametric scaling and model selection   Diffusion        ν                         U[0.2, 0.4]
                                         Advection        (a_x, a_y)                U[2.0, 3.0]²
                                         Adv.–Diff.       (ν, a_x, a_y)             U[0.2, 0.4] × U[2.0, 3.0]²

Extended Data Fig. 2: Quantitative assessment of representational similarity across PDE families. Pairwise cosine similarities of attention maps calculated for one encoder and one decoder block at two representative denoising steps (t = 500 and t = 1000) across five test samples. Across all configurations, similarities between the advection–diffusion (mixed) regime and the pure extremes (diffusion or advection) are consistently higher than the similarity between the two pure extremes. This hierarchy indicates that the model's latent space is organized according to the underlying mathematical composition of the physical laws, with the mixed operator acting as a representational bridge.

Extended Data Table 2: Conformal calibration resolves the systematic under-coverage of Bayesian ensembles.
Mean empirical coverage (%) of nominal 95% prediction intervals for the advection–diffusion system under sparse (30%) spatial observations (ensemble size M = 6; 50 calibration instances). While raw Bayesian ensembles fail to meet the nominal target, integrating conformal calibration into the pADAM framework effectively recovers statistical validity, ensuring reliable uncertainty quantification for physical inference. Results are averaged over 50 test instances.

Method                  Forward (u_T)   Inverse (u_0)
Ensemble only           58.33           36.31
Ensemble + Conformal    98.42           99.83

(a) Operator discrepancy (∆_op)

Reaction rate k   ∆_op (%)
5.0               22.14
15.0              52.83

(b) ADR extrapolation performance: relative L2 error (%) of u_T and u_0 for k = 5.0 and k = 15.0 as a function of observed fraction (%).

Extended Data Fig. 3: Zero-shot extrapolation performance on an unseen PDE (advection–diffusion–reaction dynamics). a, Quantification of the operator shift (∆_op) between the advection–diffusion (AD) training prior and the unseen advection–diffusion–reaction (ADR) dynamics for two reaction rates k. This shift is defined as the relative L2 discrepancy between the terminal states (u_T) of the two systems under identical initial conditions; as k increases, the physical divergence between the two PDE systems grows. b, Relative L2 error for the joint reconstruction of full-field initial (u_0) and terminal (u_T) states conditioned on sparse spatial observations of endpoints. The model is conditioned on the closest known operator class (AD) and steered via observation-guided sampling. Error in u_T increases with k due to the accumulated influence of the unseen reaction term, while u_0 remains stable. As observation sparsity increases, reconstruction accuracy degrades gracefully, demonstrating the robustness of the pADAM prior under physical misspecification.
All results are averaged over 20 independent test instances.

Extended Data Fig. 4: Empirical coverage of Bayesian posterior ensembles across different observation regimes. Mean prediction-interval coverage probability (PICP) of nominal 95% intervals (dashed grey line) as a function of ensemble size (2–8 samples) for forward and inverse tasks, under full (100%) and sparse (30%) observations. Particularly under ill-posed conditions, such as inverse problems or prediction under sparse observations, where only 30% of the field is observed, the coverage of raw Bayesian intervals saturates well below the 95% target across all operator families. This systematic under-coverage underscores the necessity of conformal calibration for providing rigorous reliability guarantees in data-sparse or ill-posed regimes. Results are averaged over 20 test instances.

Extended Data Table 3: Task-agnostic performance across the continuous physics manifold. Relative L2 errors (%) for forward prediction (u_T), initial-state reconstruction (u_0), and physical parameter discovery (φ). pADAM was trained across three PDE families, each with a single variable coefficient: diffusion (φ = ν), advection (φ = a_x), and advection–diffusion (φ = ν). Performance is evaluated under full (100%) and sparse spatial observations (30% and 10%), with results averaged over 50 test instances.
PDE system            Observation     Forward (u_T)   Inverse (u_0)   Inverse (φ)
Diffusion             Full (100%)     0.69            0.89            1.38
                      Sparse (30%)    1.64            1.96            3.48
                      Sparse (10%)    2.09            3.05            8.26
Advection             Full (100%)     1.91            1.13            0.72
                      Sparse (30%)    2.11            1.47            1.48
                      Sparse (10%)    2.13            2.55            2.69
Advection–diffusion   Full (100%)     1.12            1.20            2.81
                      Sparse (30%)    1.70            2.26            4.44
                      Sparse (10%)    2.65            4.11            7.73

Extended Data Fig. 5: Effect of conformal calibration on prediction intervals. a, Standard ensemble-based prediction intervals show under-coverage (59.20%) for forward prediction of u_T in the advection–diffusion system under sparse (30%) observations. b, Conformally calibrated intervals achieve 100.00% empirical coverage. c–j, One-dimensional slices at representative y-locations (y = 0.00, 0.33, 0.67, 1.00); shaded bands denote prediction intervals and blue curves indicate the true solution. Conformal calibration adaptively expands the intervals in regions of high epistemic uncertainty to recover the nominal 95% coverage. Critically, this post-hoc procedure preserves the shape of the underlying generative predictions while improving coverage reliability without model retraining.
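The post-hoc calibration step can be illustrated with a generic split-conformal correction in the conformalized-quantile-regression style: each held-out calibration point is scored by how far the truth escapes the ensemble band, and every band is then widened by the finite-sample (1 − α) quantile of these scores. This is a sketch under our own assumptions (function name, toy data, and the specific score are ours), not the authors' implementation.

```python
import numpy as np

def conformal_correction(cal_true, cal_low, cal_high, alpha=0.05):
    """Split-conformal calibration of prediction bands. The score is
    positive when the truth lies outside [low, high] and negative when
    inside; shifting every band to [low - q, high + q] with q the
    finite-sample (1 - alpha) quantile of the scores restores coverage."""
    scores = np.maximum(cal_low - cal_true, cal_true - cal_high)
    n = scores.size
    level = min(1.0, np.ceil((n + 1) * (1.0 - alpha)) / n)  # finite-sample adjustment
    return np.quantile(scores, level)

# Toy demonstration: a deliberately narrow ensemble band is repaired.
rng = np.random.default_rng(0)
truth = rng.normal(size=500)
pred = truth + rng.normal(scale=1.0, size=500)  # imperfect point predictions
low, high = pred - 0.5, pred + 0.5              # raw band under-covers
q = conformal_correction(truth, low, high)
coverage = np.mean((truth >= low - q) & (truth <= high + q))
assert coverage >= 0.95  # nominal 95% coverage recovered
```

In practice the calibration and test sets are disjoint; the coverage guarantee then holds in expectation over exchangeable test data, which is consistent with the behavior reported in Extended Data Table 2.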
Extended Data Fig. 6: Qualitative inference performance of pADAM under the continuous physics manifold and structural scaling. a, Parameter discovery within the scalar physics manifold (diffusion, advection, and advection–diffusion). For the advection system, the physical coefficient φ = a_x is inferred from paired states (u_0, u_T) under 10% spatial observations. b, c, Inference for the Burgers' system under structural scaling to the 6-PDE library. b, Forward prediction of the Burgers' velocity component u_T conditioned on full spatial observations of the initial states (u_0, v_0). c, Inverse reconstruction of the Burgers' initial velocity component v_0 conditioned on 30% spatial observations of the terminal component v_T and the initial component u_0.
Extended Data Fig. 7: Qualitative inference performance of pADAM under parametric scaling and heterogeneous parametric dimensionality. a, Initial-state reconstruction in the advection system with a variable parameter vector φ = [a_x, a_y]. The reconstructed initial state u_0 is inferred by conditioning on the known parameter vector and full observation of the terminal state u_T. b, Parameter discovery in the advection–diffusion system with a three-dimensional parameter vector φ = [ν, a_x, a_y]. The physical coefficients are inferred from paired states (u_0, u_T) under 30% spatial observations.