pADAM: A Plug-and-Play All-in-One Diffusion Architecture for Multi-Physics Learning
Amirhossein Mollaali 1, Bongseok Kim 1, Christian Moya 2, Guang Lin 1,2*

1 School of Mechanical Engineering, Purdue University, West Lafayette, IN 47906, USA.
2 Department of Mathematics, Purdue University, West Lafayette, IN 47906, USA.
*Corresponding author. E-mail: guanglin@purdue.edu
Contributing authors: amollaal@purdue.edu; kim4853@purdue.edu; cmoyacal@purdue.edu

Abstract

Generalizing across disparate physical laws remains a fundamental challenge for artificial intelligence in science. Existing deep-learning solvers are largely confined to single-equation settings, limiting transfer across physical regimes and inference tasks. Here we introduce pADAM, a unified generative framework that learns a shared probabilistic prior across heterogeneous partial differential equation families. Through a learned joint distribution of system states and, where applicable, physical parameters, pADAM supports forward prediction and inverse inference within a single architecture without retraining. Across benchmarks ranging from scalar diffusion to nonlinear Navier–Stokes equations, pADAM achieves accurate inference even under sparse observations. Combined with conformal prediction, it also provides reliable uncertainty quantification with coverage guarantees. In addition, pADAM performs probabilistic model selection from only two sparse snapshots, identifying governing laws through its learned generative representation. These results highlight the potential of generative multi-physics modeling for unified and uncertainty-aware scientific inference.
The mathematical description of physical phenomena through partial differential equations (PDEs) forms a cornerstone of modern science [1, 2], enabling the characterization of systems ranging from large-scale weather and climate dynamics [3] to the complex turbulence of fluid flows [4, 5]. While classical numerical methods remain the bedrock of accuracy through systematic discretization [6, 7], their extreme computational cost in high-dimensional parameter spaces—specifically in uncertainty quantification and inverse design—has motivated the development of accelerated surrogate models [8].

Deep learning methods have emerged as a promising alternative to traditional PDE solvers, with architectures such as Fourier Neural Operators (FNO) [9–13], Deep Operator Networks (DeepONet) [14–18], and Physics-Informed Neural Networks (PINNs) [19, 20] demonstrating the potential to bypass traditional solver constraints. Despite these advances, the field remains hindered by two critical limitations. First, most models produce deterministic outputs, limiting their utility in uncertainty quantification unless augmented with auxiliary techniques such as Bayesian training or ensemble methods [21–23]. Second, and more fundamentally, current architectures operate within a "one-model-one-equation" paradigm. This rigidity prevents cross-physics knowledge transfer and necessitates exhaustive retraining for every new physical regime and task. Furthermore, while emerging foundation models for PDEs [24–28] have begun to address multi-operator training, they remain limited in performing reliable inference under the highly sparse observations characteristic of real-world sensing.
This limitation arises because these frameworks learn fixed forward mappings between prescribed input/output fields, rather than probabilistic models capable of being conditioned on arbitrary measurement operators at test time. Diffusion models have recently emerged as a powerful generative framework for scientific computing, demonstrating particular promise in solving PDEs by learning the full probability distribution of solution states rather than deterministic point estimates [29–33]. Unlike deterministic operators, their iterative denoising formulation supports conditional sampling (inpainting), positioning them as inherently suited for inference under sparse observations. Despite this flexibility, existing diffusion-based PDE solvers remain constrained by the same specialization observed in earlier neural architectures: models are typically restricted to a single PDE class. Consequently, current approaches cannot generalize across disparate physics, perpetuating a fragmented landscape that limits their practical utility in settings involving heterogeneous physical regimes.

To address this fragmentation, we introduce pADAM (plug-and-play all-in-one diffusion architecture for multi-physics learning), a unified generative framework for learning across heterogeneous PDE families within a single model. pADAM learns a shared class-conditional probabilistic prior over system states and, when applicable, physical parameters, allowing forward prediction, parameter inference, and initial-condition reconstruction to be formulated within one posterior-sampling framework without task-specific retraining. In this way, a single architecture can operate across multiple physical regimes that are typically treated as separate learning problems.
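The class-conditional joint prior described above can be sketched in a few lines. The toy per-class "denoiser" and the denoising loss below are illustrative stand-ins under our assumption of a standard denoising score-matching setup; they are not the paper's U-Net or its exact training objective:

```python
import numpy as np

rng = np.random.default_rng(0)
N_CLASSES = 3  # diffusion, advection, advection-diffusion

def denoiser(x_noisy, sigma, class_id, weights):
    # Placeholder per-class linear "denoiser"; pADAM instead uses one
    # shared U-Net conditioned on a class embedding.
    return weights[class_id] * x_noisy / (1.0 + sigma**2)

def class_conditional_loss(u0, uT, class_id, weights):
    x = np.stack([u0, uT], axis=-1)          # joint state pair (H, W, 2)
    sigma = rng.uniform(0.02, 2.0)           # sampled noise level
    x_noisy = x + sigma * rng.standard_normal(x.shape)
    x_hat = denoiser(x_noisy, sigma, class_id, weights)
    return float(np.mean((x_hat - x) ** 2))  # denoising reconstruction loss

weights = np.ones(N_CLASSES)
u0 = rng.standard_normal((16, 16))
uT = rng.standard_normal((16, 16))
loss = class_conditional_loss(u0, uT, class_id=1, weights=weights)
```

Stacking (u0, uT) as channels is what makes the learned object a joint distribution p(u0, uT | c) rather than a fixed forward map: any subset of the channels can later be constrained by observations.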
Through observation-guided conditional sampling, the model can incorporate sparse measurements at inference time, supporting accurate inference from only 10–30% observations.

As a generative framework, pADAM also quantifies predictive uncertainty through sampling. To ensure the reliability of these uncertainty estimates, we integrate conformal prediction into the inference pipeline. This provides distribution-free finite-sample coverage guarantees for predictive intervals [22, 34, 35], which is particularly important under sparse observations, where intervals can otherwise exhibit undercoverage. By calibrating predictive uncertainty, this framework supports more reliable scientific inference.

Beyond forward and inverse inference, pADAM also supports probabilistic PDE model selection from as few as two sparse temporal snapshots by leveraging its learned shared generative prior. This enables identification of governing physical laws and quantification of associated uncertainties from minimal measurements. Fig. 1 provides a schematic overview of the pADAM architecture and its ability to transition across forward, inverse, and discovery tasks within a single probabilistic framework. This versatility positions pADAM as a promising framework for unified and uncertainty-aware scientific inference across heterogeneous physical systems.

Results

We evaluate pADAM as a unified foundation for multi-physics learning across a diverse library of heterogeneous PDEs. Our assessment consists of five investigations, each designed to evaluate a distinct capability of the framework. Across these investigations, we examine task-agnostic inference in forward and inverse problems under full and sparse observations, uncertainty quantification with conformal calibration, out-of-distribution extrapolation, and probabilistic model selection from minimal observations.
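The observation-guided conditioning that runs through these investigations can be sketched as a Langevin-style update combining a prior score with a likelihood score pulling observed pixels toward the measurements. The Gaussian placeholder prior score and the stability cap on the likelihood weight are our simplifying assumptions; the real prior score comes from the trained class-conditional model:

```python
import numpy as np

def prior_score(x, sigma):
    # Placeholder: score of a unit Gaussian prior at noise level sigma;
    # pADAM would evaluate its trained diffusion model here.
    return -x / (1.0 + sigma**2)

def guided_step(x, y, mask, sigma, step=0.01, rng=None):
    # Likelihood score enforces data consistency on observed pixels only;
    # the weight is capped for numerical stability at small sigma.
    w = min(1.0 / sigma**2, 50.0)
    score = prior_score(x, sigma) - w * mask * (x - y)
    noise = sigma * np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x + step * score + noise

rng = np.random.default_rng(1)
x_true = rng.standard_normal((8, 8))
mask = (rng.random((8, 8)) < 0.3).astype(float)  # 30% sparse observations
y = mask * x_true                                # sparse measurements
x = rng.standard_normal((8, 8))                  # start from pure noise
for sigma in np.linspace(1.0, 0.05, 300):        # annealed noise schedule
    x = guided_step(x, y, mask, sigma, rng=rng)
err_obs = np.linalg.norm(mask * (x - x_true)) / np.linalg.norm(mask * x_true)
```

As the noise level anneals, the sample is driven onto the measurements at observed locations while the prior fills in the unobserved regions, which is the "plug-and-play" behavior described above.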
Governing equations are detailed in the Methods, and the ranges of initial conditions and physical parameters across all evaluated PDE families are summarized in Extended Data Table 1.

Unified multi-physics learning and trustworthy inference

We first evaluate the capacity of pADAM to learn three distinct physical operators within a single architecture: purely dissipative (diffusion), purely convective (advection), and their coupled form (advection–diffusion). By unifying these heterogeneous PDE families—representing the transition from isotropic smoothing to directional transport—within a single set of weights, we assess whether a shared generative prior can capture disparate physical dynamics without equation-specific training.

Operator compression and parameter efficiency

To test this unification, we benchmark pADAM against PDE-specific diffusion-model baselines [29] trained independently for each PDE family. In this configuration, physical parameters are held fixed to evaluate the model's ability to learn the bidirectional mapping between initial and final states. While the baselines require separate model instances for each regime, pADAM is trained to learn the conditional joint distribution p(u0, uT | c), where c denotes the PDE class.
Crucially, pADAM uses approximately the same number of trainable parameters as a single specialized baseline, while supporting all three PDE families within a single model, for both forward prediction uT ∼ p(uT | u0, c) and initial-state reconstruction u0 ∼ p(u0 | uT, c) under full and sparse observations.

Fig. 1: Schematic of the pADAM framework for unified multi-physics learning. a–c, The pADAM framework learns across disparate physical laws, illustrated here for scalar-field PDEs, by projecting heterogeneous equation families into a shared generative prior. A class-conditional diffusion model learns the joint distribution of system states and physical parameters (a, b), enabling the generation of diverse physical regimes from Gaussian noise (c, orange trajectories). d, Task-agnostic inference via Bayesian conditioning. By incorporating full or sparse observations through plug-and-play guidance (green trajectory), the shared pADAM prior supports forward prediction, initial-condition reconstruction, parameter inference, and probabilistic model selection within a single framework. This unified manifold allows pADAM to navigate a range of inference tasks without task-specific retraining.

As reported in Table 1, pADAM achieves high-fidelity predictions across all regimes, even under sparse observations. A notable result is the model's data efficiency: despite being trained on 33% fewer samples per PDE family than the specialized baselines (333 vs. 500 samples), pADAM achieves comparable accuracy overall across the evaluated tasks.

Table 1: Comparative performance against PDE-specific baselines. Relative L2 error (%) for forward prediction (uT) and inverse reconstruction (u0) across fixed-parameter PDE systems. We compare pADAM (trained on a diverse diffusion, advection, and advection–diffusion library, N = 1,000) against PDE-specific diffusion baselines (N = 500 per family) following DiffusionPDE [29]. Results are averaged over 50 test instances under full (100%) and sparse (30%) spatial observations.

PDE system           Observation    Model        Forward (uT)   Inverse (u0)
Diffusion            Full (100%)    pADAM        0.82           0.45
                                    Single-PDE   0.55           0.68
                     Sparse (30%)   pADAM        0.88           0.86
                                    Single-PDE   1.00           2.31
Advection            Full (100%)    pADAM        1.03           1.25
                                    Single-PDE   1.02           1.26
                     Sparse (30%)   pADAM        1.36           2.29
                                    Single-PDE   1.52           1.55
Advection–diffusion  Full (100%)    pADAM        1.54           1.43
                                    Single-PDE   1.28           1.29
                     Sparse (30%)   pADAM        2.26           1.92
                                    Single-PDE   1.64           2.05
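The accuracy figures in Table 1 are relative L2 errors evaluated under random spatial observation masks. A minimal sketch of this evaluation protocol, assuming the standard definition of the metric (the exact implementation is not spelled out in the text):

```python
import numpy as np

def relative_l2(pred, true):
    # Relative L2 error in percent, the metric reported in the tables.
    return 100.0 * np.linalg.norm(pred - true) / np.linalg.norm(true)

def sparse_mask(shape, frac, seed=0):
    # Bernoulli mask keeping roughly `frac` of spatial locations,
    # e.g. frac=0.3 for the "Sparse (30%)" setting.
    rng = np.random.default_rng(seed)
    return (rng.random(shape) < frac).astype(float)

rng = np.random.default_rng(0)
u_true = rng.standard_normal((64, 64))
u_pred = u_true + 0.01 * rng.standard_normal((64, 64))  # toy prediction
err = relative_l2(u_pred, u_true)     # roughly 1% for 1% additive noise
mask = sparse_mask((64, 64), 0.3)     # observations fed to guided sampling
```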
This result is particularly important because pADAM integrates multiple dynamics within a single network, whereas standard approaches require independent model instances for each operator. By unifying these PDE families within a single model, pADAM demonstrates parameter efficiency, effectively amortizing the representational cost of the entire multi-physics library across a single set of weights. This operator compression suggests that parameter sharing induces a "shared physical vocabulary," enabling a single model to span multiple physical manifolds without increasing global computational overhead.

Representation sharing across PDE operators

To gain insight into how pADAM shares representations across heterogeneous operators, we examine its internal attention patterns. Analysis of a decoder block, shown in Extended Data Fig. 1, suggests that the model reuses and composes operator-specific features when transitioning from pure to mixed dynamics. The attention patterns for pure diffusion and pure advection are visibly distinct; in regions where attention weights are prominent for advection but absent for diffusion, the advection–diffusion maps exhibit intermediate values. This graded behavior is consistent with the mixed nature of the advection–diffusion equation, where transport and dissipation coexist. These patterns suggest that pADAM composes operator-specific features rather than maintaining isolated mechanisms for each PDE family.

To assess this behavior quantitatively, we compute pairwise cosine similarities between attention maps extracted from one encoder block and one decoder block across two denoising steps (Extended Data Fig. 2). Across all settings, similarities involving the advection–diffusion case are consistently higher than those between the pure diffusion and pure advection extremes.
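The similarity analysis just described reduces to flattening each attention map and computing pairwise cosine similarities. The synthetic maps below, with the mixed operator modeled as an average of the two pure ones, are an illustrative assumption standing in for maps extracted from the trained U-Net:

```python
import numpy as np

def cosine_similarity(a, b):
    # Flatten attention maps and compare their directions.
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
attn_diffusion = rng.random((16, 16))   # toy map for the diffusion class
attn_advection = rng.random((16, 16))   # toy map for the advection class
# Mixed operator modeled as an intermediate pattern of the two pure regimes.
attn_adv_diff = 0.5 * (attn_diffusion + attn_advection)

sim_pure = cosine_similarity(attn_diffusion, attn_advection)
sim_mixed = min(cosine_similarity(attn_adv_diff, attn_diffusion),
                cosine_similarity(attn_adv_diff, attn_advection))
# The mixed map is closer to each pure map than the pure maps are to
# each other, mirroring the hierarchy reported for pADAM's attention.
```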
This hierarchy mirrors the mathematical structure of the PDEs, where the mixed operator links the two pure regimes. Such alignment suggests that pADAM organizes its latent space in accordance with these relationships, reflecting the compositional structure of the underlying dynamics.

Zero-shot extrapolation to unseen physical laws

A defining capability of a generalist prior is extrapolation beyond its training library. We evaluate pADAM—trained on diffusion, advection, and advection–diffusion—on the unseen advection–diffusion–reaction (ADR) equation, where the ADR system uses the same viscosity and velocity parameters as the advection–diffusion case. The reaction term (k) induces a structured operator shift absent during training, rendering the learned prior misspecified with respect to the true ADR dynamics. We quantify this shift using Δop, defined as the relative L2 deviation of the terminal state induced by the reaction term (see Methods, Eq. (30)). As shown in Extended Data Fig. 3a, the shift exceeds 20% at k = 5.0 and exceeds 50% at k = 15.0, confirming a genuinely out-of-distribution regime.

Despite this mismatch, pADAM remains robust when sparse observations of (u0, uT) are used to jointly reconstruct the full-field states, as demonstrated in Extended Data Fig. 3b. The model leverages observation-guided sampling conditioned on the closest known operator class (the advection–diffusion class) to steer the misspecified prior toward trajectories consistent with the unseen ADR dynamics. While reconstruction error in uT increases with the reaction rate k, reflecting the accumulated influence of the unseen dynamics, errors for the initial state u0 remain stable. As observation sparsity increases, errors rise in a controlled manner for both endpoints because less information is available to constrain the conditional distribution.
These results suggest that pADAM can adapt to unseen dynamics through informative observations, supporting inference on previously unseen regimes without additional training.

Reliable uncertainty quantification via conformal calibration

Unlike deterministic surrogates, the generative nature of pADAM provides a probabilistic basis for uncertainty quantification through posterior ensembles that capture distributions of physically consistent solutions. Although these ensembles reflect predictive uncertainty, empirical coverage (PICP) can fall well below the nominal 95% target due to data limitations or the inherent ill-posedness that intensifies under observational sparsity (Extended Data Fig. 4). To address this, we integrate conformal prediction into pADAM to provide formal coverage guarantees and improve interval reliability.

As illustrated in Extended Data Fig. 5, conformal calibration compensates for unresolved uncertainty by adaptively expanding prediction intervals in regions where observations provide limited information. Quantitative analysis in Extended Data Table 2 shows that, under 30% spatial observations for the advection–diffusion system, mean empirical coverage increases from 58.33% to 98.42% for forward prediction and, most notably, from 36.31% to 99.83% for inverse reconstruction. The slight over-coverage relative to the 95% target is consistent with finite-sample effects, as calibration set sizes are limited by sampling cost. Together, these results show that pADAM supports uncertainty-aware scientific inference with calibrated coverage even under severe data scarcity.

Navigating the continuous physics manifold

To evaluate pADAM's capacity to model the continuous spectrum of physical dynamics, we extended the framework to systems with variable physical coefficients ϕ.
We trained a unified prior on three canonical PDE families, in which one physical coefficient was treated as a variable parameter for each system: diffusion (variable viscosity ν), advection (variable velocity a_x in the x-direction), and advection–diffusion (variable ν). By learning the joint distribution p(ϕ, u0, uT | c), pADAM moves beyond discrete operator sets to represent a continuous physical manifold. This formulation enables task-agnostic inference, in which a single set of weights supports forward prediction uT ∼ p(uT | u0, ϕ, c), initial-state reconstruction u0 ∼ p(u0 | uT, ϕ, c), and parameter discovery ϕ ∼ p(ϕ | u0, uT, c) across disparate PDEs.

As summarized in Extended Data Table 3, pADAM maintained high-fidelity performance across this full task spectrum; relative L2 errors for both state and parameter estimation remained below 2.81% under full observation. Notably, the model also remained stable under severe observational sparsity. While parameter inference is naturally more sensitive to sparsity-induced ill-posedness, state reconstruction errors remained below 4.11% even with only 10% spatial observations. These results suggest that pADAM can steer the generative prior toward physically consistent trajectories, supporting probabilistic state and parameter inference across continuous physical regimes. A qualitative illustration of parameter discovery on the continuous manifold for the advection system under sparse (10%) observations is provided in Extended Data Fig. 6a, further demonstrating pADAM's inference capability under limited data.

Scalability and generalization across the physical spectrum

To evaluate the robustness of the pADAM framework, we investigated its scalability across two distinct dimensions: structural breadth and parametric depth.
This dual-pronged assessment evaluates the model's ability to maintain high-fidelity representations as both the diversity of governing laws and the dimensionality of their parameter spaces increase, providing a systematic evaluation of generalization across the physical spectrum.

Breadth: structural scaling to a 6-PDE library

We next investigated the framework's capacity to scale to a broader training library by significantly expanding its structural breadth. To evaluate the representational efficiency of the learned manifold, we utilized a model with the same capacity and architecture as in the continuous physics manifold investigation and trained it on an expanded set of six PDE families, spanning both scalar and vector-valued systems. This structural scaling challenges the task-agnostic inference capabilities of pADAM by requiring it to learn a broader range of physical dynamics within a single set of weights.

For the scalar regimes—including diffusion, advection, advection–diffusion, and Allen–Cahn—we maintained the single-variable coefficient settings established in the continuous physics manifold investigation to evaluate whether the model could preserve its precision across the full task spectrum: forward prediction uT ∼ p(uT | u0, ϕ, c), initial-state reconstruction u0 ∼ p(u0 | uT, ϕ, c), and parameter discovery ϕ ∼ p(ϕ | u0, uT, c). For the vector-valued Burgers' and Navier–Stokes systems, the physical coefficients were held fixed to isolate the challenge of modeling coupled velocity fields via the joint distributions p(u0, v0, uT | c) and p(u0, v0, vT | c), which capture the relationship between the velocity components across initial and terminal states through shared conditioning.
Here, pADAM models the coupled state transition through component-wise sampling; specifically, for forward prediction, we sample terminal velocities uT ∼ p(uT | u0, v0, c) and vT ∼ p(vT | u0, v0, c). For inverse state estimation, where joint reconstruction is inherently ill-posed, we leverage an auxiliary conditioning scheme—sampling u0 ∼ p(u0 | v0, uT, c) and v0 ∼ p(v0 | u0, vT, c)—to better constrain the solution space and mitigate the sensitive dependence on initial conditions characteristic of nonlinear convective systems.

As reported in Table 2, pADAM maintained strong performance across the expanded library without architectural modification. While we observe a marginal increase in relative error compared to the smaller operator sets used in that earlier investigation, this limited degradation—despite the substantial increase in structural breadth—suggests that the model can maintain performance as the training library expands. Notably, the unified prior consistently captured disparate dynamic regimes and structural patterns, from the sharp interfaces of Allen–Cahn (retaining < 1% parameter error) to the dissipative evolution of the Burgers' system. While the advection–diffusion system exhibited higher sensitivity in parameter discovery under sparsity (14.36%), this reflects the compounded difficulty of resolving competing transport mechanisms within a shared latent space. In the Navier–Stokes regime, although the v-velocity component showed a characteristic error increase (8.67% at 30% observations), this behavior is expected under sparse observations, where recovery of coupled nonlinear flow fields becomes more underdetermined. Nevertheless, the model remained stable and generated physically consistent samples. Qualitative examples of forward and inverse reconstructions for the Navier–Stokes and Burgers' equations are illustrated in Fig. 2 and Extended Data Fig. 6b,c, respectively, highlighting the framework's robust performance under both full and sparse observation regimes.

Table 2: Structural scaling across the multi-physics spectrum. Relative L2 errors (%) for pADAM evaluated on a library of six distinct physical regimes. a, Performance on scalar-field PDEs (diffusion, advection, advection–diffusion, and Allen–Cahn) for forward prediction (uT), initial-state reconstruction (u0), and parameter discovery (ϕ). b, Performance on vector-field PDEs (Burgers' and Navier–Stokes) where the model learns component-wise joint distributions to provide forward (uT, vT) and inverse (u0, v0) state estimation. All tasks are performed under both full (100%) and sparse (30%) spatial observations, with results reported as the mean over 50 test instances.

a — Scalar-field PDEs

PDE system        Observation    Forward (uT)   Inverse (u0)   Inverse (ϕ)
Diffusion         Full (100%)    1.37           1.11           4.13
                  Sparse (30%)   2.03           2.40           5.60
Advection         Full (100%)    1.93           1.96           1.51
                  Sparse (30%)   1.93           1.97           2.24
Advection–diff.   Full (100%)    1.96           2.28           9.26
                  Sparse (30%)   2.59           3.28           14.36
Allen–Cahn        Full (100%)    1.21           2.42           0.48
                  Sparse (30%)   2.32           2.75           0.72

b — Vector-field PDEs

PDE system      Observation    Forward (uT)   Forward (vT)   Inverse (u0)   Inverse (v0)
Burgers'        Full (100%)    1.36           1.16           1.29           0.90
                Sparse (30%)   2.08           1.63           2.32           1.06
Navier–Stokes   Full (100%)    1.45           4.96           1.25           3.85
                Sparse (30%)   2.72           8.67           1.38           4.33

Depth: parametric scaling and partial-parameter inference

We further challenged the framework by moving from settings with a single variable parameter to regimes with multi-variable coefficient sets of different dimensionalities. This setting enables the evaluation of state and parameter inference in higher-dimensional parameter spaces.
We trained pADAM on three canonical families—diffusion, advection, and advection–diffusion—in which all physical coefficients were treated as variable: ϕ = [ν] for diffusion, ϕ = [a_x, a_y] for advection, and ϕ = [ν, a_x, a_y] for advection–diffusion. To manage this heterogeneity, we employed a parameter lifting scheme (see Methods) to project variable-length vectors and disparate physical parameters into a unified representation compatible with the conditional prior. We employed a model with the same capacity and architecture as in the continuous physics manifold and structural scaling investigations.

Fig. 2: Forward and inverse inference for Navier–Stokes dynamics using pADAM. a, Forward prediction of the velocity component vT conditioned on 30% spatial observations of the initial states (u0, v0). b, Inverse reconstruction of the initial velocity component v0 conditioned on full observation of the terminal component vT and the initial component u0. c, Forward prediction of the velocity component uT conditioned on 30% spatial observations of the initial states (u0, v0).

Under this increased parametric depth, pADAM performed state and multi-parameter inference across all families, as reported in Table 3a. Results indicate that pADAM maintained high accuracy for both forward prediction and inverse state reconstruction even as the parameter space expanded, while parameter inference exhibited higher error because of the increased ill-posedness associated with jointly inferring multiple coefficients. Qualitative examples of pADAM predictions across heterogeneous parameter dimensionalities are shown in Extended Data Fig. 7.

To demonstrate pADAM's utility for partial-parameter inference, we leveraged the Bayesian guidance formulation to incorporate a priori physical knowledge at inference time. For the advection system, we compared the joint inference of the velocity vector ϕ = [a_x, a_y] with conditional settings in which one component was treated as an observed constraint. As summarized in Table 3b, incorporating this a priori knowledge—for example, a_x | a_y—substantially improved the estimation of the remaining component across both full and sparse observational regimes. By explicitly restricting the parameter manifold to configurations consistent with known physical constraints, pADAM mitigates ill-posedness by reducing the effective dimensionality of the inverse problem.
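One simple way to realize this kind of partial-parameter conditioning is to clamp the observed coefficient after every denoising update, so sampling explores only the unconstrained directions of the lifted parameter vector. The toy update rule below is our stand-in for the actual reverse-diffusion step, and the posterior mean it drifts toward is hypothetical:

```python
import numpy as np

def denoise_update(phi, sigma, rng):
    # Placeholder update drifting toward a (hypothetical) posterior mean;
    # pADAM would apply its trained denoiser to the lifted parameters.
    target = np.array([2.5, 3.0])
    return phi + 0.1 * (target - phi) + 0.05 * sigma * rng.standard_normal(2)

def sample_parameters(known_mask, known_vals, n_steps=100, seed=0):
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal(2)                 # start from noise
    for sigma in np.linspace(1.0, 0.01, n_steps):
        phi = denoise_update(phi, sigma, rng)
        phi = np.where(known_mask, known_vals, phi)  # clamp observed entries
    return phi

# Infer a_x with a_y = 3.0 supplied as an observed constraint.
phi = sample_parameters(known_mask=np.array([False, True]),
                        known_vals=np.array([0.0, 3.0]))
```

Because the known entry never deviates from its observed value, the sampler effectively draws from a conditional such as a_x ∼ p(a_x | a_y, u0, uT), which is the mechanism behind the improved accuracy in Table 3b.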
This capability enables the integration of physical priors into real-time inference without model retraining.

Identifying governing laws through probabilistic model selection

A central challenge in scientific machine intelligence is the autonomous identification of governing laws from sparse, temporal observations. We next evaluate pADAM's capacity for probabilistic model selection, challenging the framework to infer the underlying physical law and its associated coefficients from as few as two sparse snapshots (c, ϕ ∼ p(c, ϕ | u0, uT)). pADAM performs identification by leveraging its learned class-conditional prior to explore a candidate library of PDE classes. For this evaluation, we utilize the generalist prior trained on the heterogeneous parameter datasets used in the parametric scaling investigation.

As illustrated in Fig. 3, the framework reliably distinguishes between competing physical hypotheses—such as advective transport versus pure diffusion—even under significant observational sparsity. This selection follows an infer-and-validate logic (see Methods): for each candidate class, pADAM leverages its shared generative prior to infer the corresponding coefficient posterior and assess marginal consistency by cross-referencing parameter inference with generative state reconstructions. This process allows the framework to compare candidate PDE classes and identify the governing law that best explains the observed state transition.

Despite severe observational scarcity (30% observations), pADAM consistently selects the ground-truth operator across the evaluated scenarios while maintaining parameter estimates that closely align with the true coefficients.
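The infer-and-validate logic can be sketched as a loop over candidate classes that scores data consistency at the observed locations. The one-parameter toy "solvers" and the grid search below are illustrative stand-ins for class-conditional posterior sampling over coefficients:

```python
import numpy as np

def solve(u0, phi, pde_class):
    # Hypothetical one-parameter surrogates for three operators.
    if pde_class == "diffusion":
        return u0 * np.exp(-phi)                  # uniform decay
    if pde_class == "advection":
        return np.roll(u0, int(phi))              # pure shift
    return np.roll(u0, int(phi)) * np.exp(-phi)   # advection-diffusion

def select_model(u0_obs, uT_obs, mask, candidates, phi_grid):
    best = None
    for c in candidates:
        # "Infer" phi by grid search (stand-in for coefficient posterior
        # sampling), then validate against the sparse observations.
        residuals = [np.linalg.norm(mask * (solve(u0_obs, p, c) - uT_obs))
                     for p in phi_grid]
        r = min(residuals)
        if best is None or r < best[2]:
            best = (c, phi_grid[int(np.argmin(residuals))], r)
    return best

rng = np.random.default_rng(0)
u0 = rng.standard_normal(64)
uT = solve(u0, 2.0, "advection")              # ground truth: advection, shift 2
mask = (rng.random(64) < 0.3).astype(float)   # 30% observed
cls, phi, _ = select_model(u0, uT, mask,
                           ["diffusion", "advection", "advection-diffusion"],
                           phi_grid=[0.5, 1.0, 2.0, 3.0])
```

Repeating this with fresh posterior samples, as the text describes, turns the single selected pair (cls, phi) into a distribution over candidate laws and coefficients.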
Because the framework is fully generative, repeated sampling naturally induces a distribution over candidate laws and associated parameters; the resulting ensemble-based predictive intervals provide a measure of the epistemic uncertainty inherent in physical model selection. These results suggest that pADAM can serve as a probabilistic framework for characterizing systems whose governing dynamics are ambiguous or only partially observed from limited data.

Table 3: Parametric scaling and partial-parameter inference performance. a, Task-agnostic inference across a heterogeneous PDE library with different parametric dimensionality. Relative L2 errors (%) for state estimation (u_0 and u_T) and parameter recovery (φ) across three physical regimes with varying parametric depth: Diffusion (φ = [ν]), Advection (φ = [a_x, a_y]), and Advection–diffusion (φ = [ν, a_x, a_y]). b, Impact of partial parameter constraints on parameter inference in the advection system. Comparison of joint inference (a_x, a_y) against conditional settings where one velocity component (a_x or a_y) is provided at inference time. All tasks are performed under both full (100%) and sparse (30%) spatial observations, with results representing the mean over 50 test instances.

a — Task-agnostic inference across a heterogeneous PDE library with different parametric dimensionality

PDE system        Observation    Forward (u_T)   Inverse (u_0)   avg. Inverse (φ)
Diffusion         Full (100%)    0.98            1.18            2.89
                  Sparse (30%)   1.82            2.07            4.61
Advection         Full (100%)    0.85            1.03            6.57
                  Sparse (30%)   1.64            1.32            6.61
Advection–diff.   Full (100%)    1.49            1.72            5.99
                  Sparse (30%)   2.31            2.71            5.95

b — Partial-parameter inference under conditional physical constraints in the advection system

Observation    Inference setting                     Inverse (a_x)   Inverse (a_y)
Full (100%)    a_x, a_y ∼ p(a_x, a_y | u_0, u_T)     6.31            6.83
               a_x ∼ p(a_x | a_y, u_0, u_T)          2.00            –
               a_y ∼ p(a_y | a_x, u_0, u_T)          –               2.30
Sparse (30%)   a_x, a_y ∼ p(a_x, a_y | u_0, u_T)     6.32            6.91
               a_x ∼ p(a_x | a_y, u_0, u_T)          2.44            –
               a_y ∼ p(a_y | a_x, u_0, u_T)          –               2.92

Fig. 3: Probabilistic model selection across three PDE classes. Identification of governing laws from two snapshots (u_0, u_T) with only 30% of the spatial field observed. a–c, Left panels show sparse field snapshots; right panels compare the ground-truth PDEs with representative sampled PDEs and the associated 95% predictive intervals derived from the ensemble.
(a) Advection regime. True PDE: ∂_t u + [2.56, 2.97]·∇u = 0; sampled PDE: ∂_t u + [2.60, 2.87]·∇u = 0; 95% interval: ∂_t u + [2.70 ± 0.10, 2.79 ± 0.09]·∇u = 0.
(b) Diffusion regime. True PDE: ∂_t u = 0.29Δu; sampled PDE: ∂_t u = 0.30Δu; 95% interval: ∂_t u = 0.30 (± 0.0010) Δu.
(c) Advection–diffusion regime. True PDE: ∂_t u + [2.40, 2.30]·∇u = 0.33Δu; sampled PDE: ∂_t u + [2.37, 2.19]·∇u = 0.34Δu; 95% interval: ∂_t u + [2.29 ± 0.06, 2.35 ± 0.18]·∇u = 0.34 (± 0.016) Δu.

Discussion

The results show that a single class-conditional generative prior can support forward prediction, inverse inference, and probabilistic model selection across multiple PDE families, including both scalar and vector-valued systems. Rather than relying on separate task-specific solvers, pADAM learns a shared probabilistic representation that can be conditioned on different forms of partial information. This suggests that heterogeneous physical systems can be addressed within a unified generative framework while maintaining accurate inference under sparse observations.

A central result is that pADAM can identify governing laws from only two sparse snapshots by comparing how well candidate PDE classes explain the observed transition under the learned prior. Unlike classical discovery methods that rely on dense trajectories, this approach formulates PDE identification as probabilistic inference over a candidate library of physical models. Because the comparison is generative rather than deterministic, it also yields uncertainty over candidate laws and associated parameters.

Crucially, conformal calibration strengthens pADAM as a framework for scientific inference by providing distribution-free, finite-sample coverage guarantees for predictive intervals. This is particularly important in sparse and ill-posed settings, where raw predictive intervals may exhibit substantial undercoverage.
In addition, the zero-shot extrapolation results suggest that the learned prior is not limited to a fixed training library and can remain effective under operator shift when guided by observations.

The pADAM framework is designed to remain compatible with a broad class of generative backbones, making it adaptable to emerging paradigms such as flow matching [36] and recent mean-flow approaches [37], which may improve sampling efficiency and reduce computational cost. Future extensions could further enhance performance, particularly in extreme low-data regimes, by incorporating physics-informed priors directly into the generative process [33]. In addition, extending the framework to function-space diffusion [30] may enable discretization-invariant inference in infinite-dimensional settings and support inference across varying grid resolutions. Beyond these technical extensions, an important next direction is to move beyond fixed candidate libraries toward more flexible forms of equation discovery for systems with partially unknown physics. Such developments could improve the ability of generative frameworks to relate high-dimensional observations to interpretable mathematical structure.

Taken together, these results suggest that shared generative physical priors can support prediction, inference, and model discovery within a single probabilistic framework. More broadly, this work can be viewed as a pilot step toward foundation models for science, in which large generative models are trained across diverse physical domains and adapted to downstream scientific and engineering tasks. In that sense, pADAM points toward a class of generalist scientific machine learning models that remain effective even when observations are sparse, irregular, or incomplete.

Methods

Problem formulation and objective

We consider a library of C distinct PDE families. Each class c ∈ {1, ..., C} is associated with a governing operator F^(c) representing a specific physical dynamical regime:

    F^(c)[u(s, t); φ] = 0,  (s, t) ∈ Ω × (0, T],
    u(s, 0) = u_0(s),                                                    (1)

subject to appropriate boundary conditions on ∂Ω. Here, u(s, t) denotes the solution field at spatial coordinates s ∈ Ω, and φ ∈ R^{d_c} represents the vector of physical coefficients parameterizing the dynamics within class c. This formulation accommodates both scalar-valued and vector-valued states (for example, coupled velocity components u = (u, v) in Navier–Stokes). In this work, all experiments are conducted on two-dimensional spatial domains; thus, s ∈ Ω ⊂ R².

The objective is to learn a unified, class-conditional generative prior p(x | c), where x is a multi-channel variable encapsulating the joint distribution of temporal system states and, where applicable, governing physical parameters within a shared probabilistic representation. By parameterizing this prior with a diffusion model [38, 39], we treat forward prediction, inverse state estimation, and parameter recovery as posterior sampling tasks of the form p(x | y_obs, c), where y_obs represents an arbitrary set of observations ranging from full-field data to sparse measurements. This formulation enables task-agnostic inference, in which diverse downstream physical tasks are performed by conditioning a shared manifold of physical dynamics on available data, without requiring task-specific architectures or retraining.

The pADAM framework: a unified foundation for multi-physics learning

The pADAM framework brings heterogeneous PDE families into a shared generative formulation (Fig. 1). As illustrated in Fig. 1a–c, a class-conditional diffusion model learns the joint distribution of the unified representation, comprising system states and, where applicable, physical parameters, and thereby supports generation across multiple physical regimes from Gaussian noise (Fig. 1c, orange trajectories). As shown in Fig. 1d, task-agnostic inference is performed through Bayesian conditioning: by incorporating full or sparse observations through plug-and-play guidance (green observation-guided trajectory), the shared pADAM prior supports forward prediction, inverse state and parameter inference, uncertainty quantification, and probabilistic model selection within a single framework. We detail each component of pADAM in the following sections.

Unified joint representation and parameter lifting

To enable a single architecture to process heterogeneous physics, we transform each system into a unified 3-channel generative variable x ∈ R^{3 × N_x × N_y}. This representation ensures architectural invariance across different physical regimes:

    x := [Φ, u_0, u_T]          for scalar-field PDEs,
    x := [u_0, v_0, u_T or v_T] for vector-field PDEs,                   (2)

where Φ is a spatially lifted representation of the coefficients φ, and u, v represent coupled velocity-field components. By maintaining a fixed channel dimension, a single class-conditional diffusion model can be deployed across diverse dynamical systems without structural modification. While this configuration is optimized for the forward and inverse tasks presented in this work, the framework is inherently modular, allowing alternative channel assignments depending on the requirements of the physical regime.

Generative inclusion of physical parameters. Treating φ as a component of the generative variable x, rather than as a static conditioning input, is a key modeling choice for probabilistic inverse inference in parametric systems.
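As a concrete illustration, the 3-channel assembly of Eq. (2) can be sketched as follows. This is a minimal sketch: the helper names and array shapes are assumptions, and the banded layout is only one possible choice of disjoint partition for multi-coefficient lifting.

```python
import numpy as np

def lift_coefficients(phi, nx, ny):
    """Broadcast coefficients phi into a spatial field Phi: a constant field for a
    single coefficient, piecewise-constant over a disjoint grid partition otherwise
    (here, horizontal bands; an illustrative assumption)."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    field = np.empty((nx, ny))
    bands = np.array_split(np.arange(nx), len(phi))  # one band per coefficient
    for k, rows in enumerate(bands):
        field[rows, :] = phi[k]
    return field

def build_generative_variable(u0, uT=None, phi=None, v0=None, vT=None):
    """Assemble the unified 3-channel variable x of Eq. (2)."""
    if phi is not None:                        # scalar-field, parametric PDE
        Phi = lift_coefficients(phi, *u0.shape)
        return np.stack([Phi, u0, uT])
    terminal = uT if uT is not None else vT    # vector-field PDE: one terminal channel
    return np.stack([u0, v0, terminal])
```

Either branch yields an array of shape (3, N_x, N_y), so the same convolutional backbone can consume every system in the library.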
This design allows the model to learn the intrinsic joint prior p(φ, u_0, u_T | c), enabling the recovery of unknown physical parameters through posterior sampling φ ∼ p(φ | u_0, u_T, c) at inference time.

Mathematical lifting scheme. To bridge the dimensionality gap between the coefficients d_c and the spatial states N_x × N_y, we define a lifting operator L that broadcasts φ into a spatial field Φ. This ensures that physical parameters are represented as spatially compatible features:

• Scalar case (d_c = 1): Φ_{i,j} = φ for all (i, j) ∈ Ω.
• Vector case (d_c > 1): Φ_{i,j} := Σ_{k=1}^{d_c} φ_k 1_{Ω_k}(i, j), where {Ω_k}_{k=1}^{d_c} is a disjoint partition of the spatial grid.

This spatial lifting provides a consistent inductive bias, allowing the model's hierarchical convolutional architecture to capture the functional dependencies between the governing physical parameters (Φ) and the system states (u_0, u_T).

Learning unified generative priors across heterogeneous physical regimes

To enable a unified modeling framework for multi-physics systems, we learn a single generative prior p(x | c) using a class-conditional diffusion model [40] that captures the joint distribution of system states and, where applicable, governing physical parameters across the library of PDE families C. This approach leverages shared latent structure across heterogeneous operators, establishing a unified generative representation that captures dynamical structure across disparate PDE families. The representation is adapted to the structural dependencies of each physical family:

• Parametric scalar-field PDEs: We learn the joint prior p(Φ, u_0, u_T | c). By including the lifted parameter field Φ in the generative variable x, the model captures the relationship between physical coefficients and system states.
• Vector-valued PDEs: For systems with fixed physical parameters, we learn component-wise joint priors, p(u_0, v_0, u_T | c) and p(u_0, v_0, v_T | c). This strategy uses the available channel capacity to resolve coupled velocity components, enabling the model to capture dependencies between initial and terminal states.

We adopt the EDM framework [41] to parameterize this prior. In this implementation, the model is trained with a conditional denoising score-matching objective:

    L_train = E_{x_0, c, σ, n} [ λ(σ) ‖D_θ(x_0 + n; σ, c) − x_0‖²₂ ],     (3)

where n ∼ N(0, σ(t)² I) denotes additive Gaussian noise at diffusion level σ(t), and λ(σ) is a preconditioning weight. The diffusion process defines a time-dependent generative variable x = x(t) for t ∈ [0, T], where x_0 = x(0) denotes a clean sample from the class-conditional data distribution and x(t) becomes progressively noisier as t increases under the noise schedule σ(t). At the terminal time t = T, the diffusion distribution approaches a Gaussian prior, so sampling begins from Gaussian noise and is then transported back toward the data manifold. Here, D_θ denotes the denoiser that maps noisy inputs toward the clean manifold. Through the relation s_θ(x; σ(t), c) = (D_θ(x; σ(t), c) − x) / σ(t)², the network learns to approximate the conditional score function, yielding an estimate of the log-density gradient ∇_x log p_t(x | c), where p_t(x | c) denotes the class-conditional marginal distribution of the diffusing sample at time t.
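The training objective of Eq. (3) and the denoiser-to-score relation can be sketched as follows. This is a minimal sketch under stated assumptions: the toy shrinkage denoiser and the uniform weighting λ(σ) = 1 are illustrative stand-ins, whereas EDM uses a trained conditional network and a specific preconditioning.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise(x_noisy, sigma, c):
    """Stand-in for the denoiser D_theta (a real model would be a class-conditional
    U-Net). Shrinking toward zero is the optimal denoiser when the clean data are
    zero-mean unit-variance Gaussian (illustrative assumption)."""
    return x_noisy / (1.0 + sigma**2)

def dsm_loss(x0_batch, c, sigma):
    """Monte Carlo estimate of Eq. (3) with lambda(sigma) = 1."""
    n = sigma * rng.standard_normal(x0_batch.shape)      # n ~ N(0, sigma^2 I)
    residual = denoise(x0_batch + n, sigma, c) - x0_batch
    return np.mean(np.sum(residual**2, axis=tuple(range(1, residual.ndim))))

def score_from_denoiser(x, sigma, c):
    """s_theta(x; sigma, c) = (D_theta(x; sigma, c) - x) / sigma^2, the estimate
    of the log-density gradient grad_x log p_t(x | c)."""
    return (denoise(x, sigma, c) - x) / sigma**2
```

For the Gaussian toy denoiser, the recovered score is −x / (1 + σ²), the exact score of the noised marginal, which illustrates why minimizing Eq. (3) yields a usable score estimator.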
Once this score estimator is obtained, samples are evolved through a deterministic mapping governed by the probability-flow ordinary differential equation (ODE) [38, 41]:

    dx/dt = −σ̇(t) σ(t) s_θ(x; σ(t), c).                                  (4)

By evolving this ODE at inference time, pADAM transforms stochastic noise into physically consistent realizations, steering sample trajectories along the learned class-conditional manifold so that generated samples remain consistent with the characteristic dynamics of the specified physical regime.

Task-agnostic inference via plug-and-play observational guidance

A key feature of pADAM is its ability to perform task-agnostic inference in a plug-and-play manner, in which forward prediction, inverse state estimation, and coefficient recovery are unified as conditional sampling problems without requiring model retraining. Given full or sparse observations y_obs, we sample from the posterior distribution p(x | y_obs, c) using the learned class-conditional generative prior. Following Bayes' rule, the posterior score decomposes as [38]

    ∇_x log p_t(x | y_obs, c) ≈ s_θ(x; σ(t), c) + ∇_x log p_t(y_obs | x, c),     (5)

where the first (prior) term is provided by the pre-trained class-conditional model, and the second (likelihood) term enforces consistency with available measurements. Under additive Gaussian measurement noise, and following the formulation for score-based guidance [29, 42], the likelihood score is approximated at inference time as

    ∇_x log p_t(y_obs | x, c) ≈ −λ_obs ∇_x ‖y_obs − A(x̂_0(x))‖²₂,       (6)

where A denotes a task-specific measurement operator, and x̂_0 is the denoiser's estimate of the clean system representation.
Substituting this approximation into the probability-flow ODE (4) yields the guided probability-flow dynamics:

    dx/dt = −σ̇(t) σ(t) s_θ(x; σ(t), c) − λ_obs ∇_x ‖y_obs − A(x̂_0(x))‖²₂.     (7)

The guidance scale λ_obs acts as a weighting factor that balances the learned prior against observational constraints. Under this formulation, the guided ODE maps an initial Gaussian noise sample to realizations that remain consistent with both the specified physical class and the available measurements. The prior score steers the trajectory toward high-probability regions of the class-conditional distribution, while the observation-based correction enforces consistency with y_obs.

Forward and inverse operator synthesis

The flexibility of the observation operator A enables pADAM to support inference across a diverse set of physical operators within a single architecture. This versatility is demonstrated across three fundamental classes of physics problems:

• Forward prediction: For parametric scalar PDEs, given the coefficients Φ and the initial state u_0, pADAM samples the terminal state u_T ∼ p(u_T | u_0, Φ, c). For vector-valued systems (e.g., Navier–Stokes), the model performs component-wise prediction of terminal velocities u_T ∼ p(u_T | u_0, v_0, c) and v_T ∼ p(v_T | u_0, v_0, c).
• Inverse state estimation: For scalar fields, pADAM samples u_0 ∼ p(u_0 | u_T, Φ, c). For multi-component systems, where joint reconstruction is highly ill-posed, we sample u_0 ∼ p(u_0 | v_0, u_T, c) and v_0 ∼ p(v_0 | u_0, v_T, c); this auxiliary conditioning constrains the solution space and mitigates the ill-posedness of the inverse recovery.
• Inverse parameter identification: For scalar parametric systems, given paired observations of the initial and terminal states (u_0, u_T), pADAM recovers the governing coefficients Φ ∼ p(Φ | u_0, u_T, c).
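The observation-guided sampling of Eqs. (4)–(7) can be sketched with a simple Euler discretization over a decreasing noise schedule. This is a minimal sketch under stated assumptions: the EDM ODE is written in σ-time as dx/dσ = (x − x̂_0)/σ, the guidance step treats the denoiser's Jacobian as the identity, and A is taken to be a linear observation mask (so its adjoint is the mask itself); these simplifications are illustrative, not the paper's exact scheme.

```python
import numpy as np

def guided_sampler(x_init, denoise_fn, measure_op, y_obs, sigmas, lam_obs=1.0):
    """Sketch of observation-guided probability-flow sampling.

    denoise_fn(x, sigma) -> x_hat0, the denoiser's clean-data estimate
    measure_op(x)        -> A(x), e.g. a sparse observation mask
    sigmas               -> decreasing noise levels, ending near 0
    All three are stand-ins for pADAM's trained components (assumed API).
    """
    x = x_init.copy()
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x_hat0 = denoise_fn(x, sigma)
        x = x + (sigma_next - sigma) * (x - x_hat0) / sigma   # prior ODE step
        # gradient-style correction on ||y_obs - A(x_hat0)||^2 (mask adjoint = mask)
        resid = measure_op(denoise_fn(x, max(sigma_next, 1e-8))) - y_obs
        x = x - lam_obs * measure_op(resid)                   # observation guidance
    return x
```

The prior step pulls the trajectory toward the learned class-conditional manifold, while the correction step pulls the clean estimate toward agreement with the (possibly sparse) observations, mirroring the two terms of Eq. (7).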
Crucially, A can represent both full-field and sparse observations, allowing pADAM to operate across different data-density regimes. By unifying these tasks across heterogeneous physical domains, pADAM functions as a probabilistic framework for multi-operator, multi-physics inference. This shared formulation enables forward and inverse problems to be addressed within a single architecture without task-specific retraining.

Reliable uncertainty quantification via conformal calibration

As a generative framework, pADAM provides uncertainty quantification (UQ) through posterior sampling. By drawing multiple conditional samples from the observation-guided reverse process, we obtain an empirical distribution consistent with both the physical constraints of the PDE class and the available measurements. The uncertainty captured here is primarily epistemic, reflecting ambiguity arising from factors including sparse observations, inverse ill-posedness, and limited data. This is particularly important in scientific settings where reliable uncertainty estimates are needed to support inference.

Ensemble-based uncertainty estimation

For a given conditioning specification, encompassing the PDE class, partial state observations, and known coefficients, we generate an ensemble of M independent samples {z_j}_{j=1}^M from the guided reverse diffusion process (7), where z ∈ {Φ, u_0, u_T, v_0, v_T}. We characterize this predictive distribution through the ensemble mean μ_M and standard deviation σ_M:

    μ_M = (1/M) Σ_{j=1}^M z_j,    σ_M = sqrt( (1/(M−1)) Σ_{j=1}^M (z_j − μ_M)² ).

While a standard Gaussian-based 95% interval (μ_M ± 1.96 σ_M) is a common baseline, conditional distributions in chaotic or under-determined PDE regimes frequently deviate from Gaussianity, exhibiting significant skewness, heavy tails, or multimodality.
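The ensemble statistics above, together with the conformal rescaling described in the next subsection, can be sketched as follows. This is a minimal sketch under stated assumptions: calibration and test cases are exchangeable, and each array entry is treated as a separate calibration score, a simplification of the per-instance scores of the paper.

```python
import numpy as np

def ensemble_stats(samples):
    """Ensemble mean and unbiased standard deviation over M guided samples.
    samples: array of shape (M, ...) from the guided reverse process."""
    return samples.mean(axis=0), samples.std(axis=0, ddof=1)

def conformal_quantile(cal_truths, cal_mus, cal_sds, alpha=0.05):
    """Calibration threshold q_hat from nonconformity scores |z - mu| / sd."""
    scores = (np.abs(cal_truths - cal_mus) / cal_sds).ravel()
    n = scores.size
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)   # finite-sample correction
    return np.quantile(scores, level)

def conformal_interval(mu, sd, q_hat):
    """Conformal prediction interval mu +/- q_hat * sd."""
    return mu - q_hat * sd, mu + q_hat * sd
```

Replacing the fixed 1.96 multiplier with the data-driven q̂ is what converts the nominal Gaussian interval into one with a distribution-free coverage guarantee.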
Consequently, these nominal intervals provide no formal guarantees and can suffer from substantial miscalibration. This issue is especially relevant in ill-posed inverse inference and in prediction under sparse observations, where multiple physical solutions may be consistent with the available data.

Conformal calibration with distribution-free coverage

To provide the reliability required for scientific inference, we integrate conformal prediction [34, 35]. This procedure post-processes pADAM's ensemble estimates to produce prediction intervals with finite-sample, distribution-free coverage guarantees. Because the scale of uncertainty varies across heterogeneous physical regimes, calibration is performed independently for each class–task pair (c, τ). Under the exchangeability assumption for the calibration and test samples, we compute nonconformity scores s_i^{(c,τ)} on a calibration dataset D_cal^{(c,τ)} of size n_{c,τ} to quantify the normalized discrepancy between the ground truth and the predictive distribution [22, 43]:

    s_i^{(c,τ)} = |z_i − μ_M(y_obs,i)| / σ_M(y_obs,i).                   (8)

Given a target miscoverage rate α ∈ (0, 1), the calibration threshold q̂_α^{(c,τ)} is defined as the ⌈(n_{c,τ} + 1)(1 − α)⌉ / n_{c,τ} empirical quantile of the scores in D_cal^{(c,τ)}. We then construct the final conformal interval:

    C_α^{(c,τ)}(y_obs,test) = [ μ_M(y_obs,test) ± q̂_α^{(c,τ)} σ_M(y_obs,test) ].     (9)

Under exchangeability, this construction guarantees marginal coverage of at least 1 − α, providing a principled measure of predictive reliability.

Probabilistic PDE model selection from two snapshots

A core strength of pADAM is its ability to accommodate heterogeneous parameter representations across multiple PDE families through parameter lifting.
This architectural design maintains structural consistency even when the dimensionality of the physical parameters varies across PDE classes. For example, a PDE library may include diffusion equations defined by a scalar diffusivity ν, advection equations governed by velocity components (a_x, a_y), and advection–diffusion systems parameterized by (ν, a_x, a_y). We leverage this capability to perform probabilistic model selection from only two snapshots for scalar-field PDEs, casting the identification of governing laws from sparse temporal observations as a probabilistic inference task.

Given only a pair of state snapshots, an initial state u_0 and a terminal state u_T, which may be available only through sparse measurements, the goal is to identify the governing PDE class from a candidate library C = {c_1, ..., c_K} and infer the associated physical parameters. This setting is highly ill-posed and represents a form of probabilistic PDE discovery. In contrast to classical identification approaches requiring dense spatiotemporal trajectories, pADAM leverages its learned multi-operator prior to evaluate the consistency of each candidate law with the observed transition. By sampling class-conditional parameter posteriors, the framework effectively reconstructs the joint posterior p(c, φ | u_0, u_T), naturally quantifying the epistemic uncertainty inherent in identifying physics from limited temporal snapshots.

The selection procedure follows an infer-and-validate logic:

1. Conditional parameter inference: For each candidate class c ∈ C, we sample from the class-conditional parameter posterior φ̂^(c) ∼ p(φ | u_0, u_T, c). This identifies parameter configurations that are maximally consistent with the observed state transition under class c.
2. Generative validation: We then use forward prediction to sample a synthetic terminal state û_T^(c) ∼ p(u_T | φ̂^(c), u_0, c).
Each candidate is evaluated by its generative reconstruction discrepancy:

    E(c) = ‖û_T^(c) − u_T‖₂.                                             (10)

The identified operator c* = argmin_{c ∈ C} E(c), together with the corresponding inferred coefficients φ* = φ̂^(c*), defines the explicit governing model that best explains the observed dynamics. Because pADAM is fully generative, repeated ensemble sampling allows us to quantify the uncertainty associated with model selection. This enables a principled approach to uncertainty-aware model selection, in which the relative support for competing physical laws is reflected in the distribution of reconstruction discrepancies.

Experimental design

Benchmark PDE problems for experiments

To assess the generality of the pADAM framework, we consider seven representative PDE families spanning dissipative, advective, mixed, and nonlinear dynamics. This multi-physics library provides a stringent testbed for unified operator learning across heterogeneous physical regimes. Datasets for the diffusion, advection, advection–diffusion, advection–diffusion–reaction, Burgers', and Allen–Cahn equations are generated using finite-difference discretizations, while Navier–Stokes solutions are computed using a Fourier spectral solver.

Diffusion equation. We first consider the diffusion equation, which models a purely dissipative process in which spatial gradients of a scalar field are smoothed by diffusion. Let u(x, y, t) denote a scalar quantity defined on the spatial domain Ω = (0, 1) × (0, 1) for t ≥ 0. The governing equation reads

    ∂_t u = ν Δu    in Ω × (0, T],                                       (11)

where ν > 0 denotes the diffusion coefficient and Δ = ∂_xx + ∂_yy is the Laplacian operator. We assume that u is sufficiently smooth, for example u ∈ C^{2,1}(Ω × (0, T]), so that the derivatives in Eq. (11) are well defined. The system is equipped with the initial condition

    u(x, y, 0) = exp( −[(x − x_c)² + (y − y_c)²] / w_0 ) sin(πx) sin(πy),  (x, y) ∈ Ω,     (12)

where the centroid and width parameters (x_c, y_c, w_0) are randomly sampled from the uniform distributions specified in Extended Data Table 1. The sinusoidal taper ensures smooth decay toward the boundary. On the boundary ∂Ω, we impose the homogeneous Neumann boundary condition

    ∇u · n = 0    on ∂Ω × (0, T],                                        (13)

where n denotes the outward unit normal vector.

Advection equation. We next consider the advection equation, which describes the transport of a scalar field by a prescribed uniform velocity field. The governing equation is

    ∂_t u + a · ∇u = 0    in Ω × (0, T],                                 (14)

where a = (a_x, a_y) ∈ R² is a constant advection velocity vector. We assume sufficient regularity of the solution, e.g., u ∈ C¹(Ω × (0, T]), so that the derivatives in Eq. (14) exist. The advection problem is posed with the same initial condition (12) and the homogeneous Neumann boundary condition

    ∇u · n = 0    on ∂Ω × (0, T].                                        (15)

The solution corresponds to a translation of the initial profile along the flow direction prescribed by a, while preserving the overall shape and amplitude.

Advection–diffusion equation. We consider the advection–diffusion equation, which describes the combined effects of advective transport and diffusive spreading. The governing equation takes the form

    ∂_t u + a · ∇u = ν Δu    in Ω × (0, T],                              (16)

where a = (a_x, a_y) ∈ R² is a constant advection velocity and ν > 0 denotes the diffusion coefficient. A sufficiently smooth solution is assumed, for instance u ∈ C^{2,1}(Ω × (0, T]), ensuring that Eq. (16) is properly defined. The system is supplemented by the initial condition (12) and the homogeneous Neumann boundary condition

    ∇u · n = 0    on ∂Ω × (0, T].                                        (17)

The solution exhibits advective transport together with diffusive smoothing, resulting in spreading and amplitude decay over time.

Advection–diffusion–reaction equation. We consider the advection–diffusion–reaction equation, which describes the combined effects of advective transport, diffusive spreading, and local reaction. The governing equation takes the form

    ∂_t u + a · ∇u = ν Δu + R(u)    in Ω × (0, T],                       (18)

where a ∈ R² is a constant advection velocity, ν > 0 denotes the diffusion coefficient, and R(u) is a reaction term. A sufficiently smooth solution is assumed, for instance u ∈ C^{2,1}(Ω × (0, T]), ensuring that Eq. (18) is properly defined. The system is supplemented by the initial condition (12) and the homogeneous Neumann boundary condition

    ∇u · n = 0    on ∂Ω × (0, T].                                        (19)

In this study, we consider the linear reaction term R(u) = k u, with k ≥ 0, representing local growth dynamics. When R(u) = 0, the equation reduces to the advection–diffusion equation.

Allen–Cahn equation. We consider the Allen–Cahn equation, a prototypical nonlinear reaction–diffusion model that describes phase separation and interface dynamics in bistable systems. Let u(x, y, t) denote an order parameter defined on the spatial domain Ω = (0, 1) × (0, 1) for t ≥ 0. The governing equation is

    ∂_t u = ε² Δu − (1/ε²) (u³ − u)    in Ω × (0, T],                    (20)

where ε > 0 is a small parameter controlling the interfacial thickness and Δ = ∂_xx + ∂_yy denotes the Laplacian operator. We restrict attention to sufficiently smooth solutions u, e.g., u ∈ C^{2,1}(Ω × (0, T]). The system is initialized with the same initial condition defined in Eq. (12). On the boundary of the domain, we impose the homogeneous Dirichlet boundary condition

    u = 0    on ∂Ω × (0, T].                                             (21)

This condition fixes the phase variable at the boundary and prevents interface motion across the domain boundary.
Over time, the solution reflects the combined effects of diffusion and nonlinear reaction, leading to smoothing and phase-separation behavior.

Burgers' equation. The two-dimensional Burgers' equation is a nonlinear model combining convective transport and viscous diffusion. The equation describes the evolution of a velocity field u(x, y, t) = (u_1(x, y, t), u_2(x, y, t)) defined on the square domain Ω = (−1, 1) × (−1, 1), and takes the form

    ∂_t u + (u · ∇)u = ν Δu,                                             (22)

where ν > 0 denotes the kinematic viscosity. We consider sufficiently smooth velocity fields, for example u ∈ C^{2,1}(Ω × (0, T]), so that all derivatives in Eq. (22) are well defined. Written in component form, the system becomes

    ∂u_1/∂t + u_1 ∂u_1/∂x + u_2 ∂u_1/∂y = ν (∂²u_1/∂x² + ∂²u_1/∂y²),
    ∂u_2/∂t + u_1 ∂u_2/∂x + u_2 ∂u_2/∂y = ν (∂²u_2/∂x² + ∂²u_2/∂y²).     (23)

The initial condition consists of spatially localized velocity fields with Gaussian profiles, modulated by a sine taper to satisfy homogeneous Dirichlet boundary conditions. Specifically,

    u_1(x, y, 0) = exp( −[(x − c_{x,1})² + (y − c_{y,1})²] / w_1 ) sin(πx) sin(πy),
    u_2(x, y, 0) = exp( −[(x − c_{x,2})² + (y − c_{y,2})²] / w_2 ) sin(πx) sin(πy),     (24)

where the component centroids (c_{x,i}, c_{y,i}) and width parameters w_i (for i = 1, 2) are randomly sampled from the uniform distributions defined in Extended Data Table 1. The velocity field satisfies homogeneous Dirichlet boundary conditions,

    u(x, y, t) = 0,  (x, y) ∈ ∂Ω.                                        (25)

The nonlinear term (u · ∇)u describes advective transport of the velocity field, while the viscous term ν Δu introduces diffusive smoothing. The solution behavior is determined by the relative magnitude of convection and viscosity.

Incompressible Navier–Stokes equations.
Incompressible Navier–Stokes equations. We next consider the two-dimensional incompressible Navier–Stokes equations, which describe the evolution of a divergence-free velocity field driven by nonlinear advection and balanced by pressure and viscous diffusion. Let u(x, y, t) = (u₁(x, y, t), u₂(x, y, t)) denote the velocity field and p(x, y, t) the kinematic pressure on the spatial domain Ω = (0, L) × (0, L) for t ≥ 0. The governing equations without external forcing read

∂_t u + (u · ∇)u + ∇p = ν∆u   in Ω × (0, T],
∇ · u = 0                     in Ω × [0, T],   (26)

where ν > 0 is the kinematic viscosity. We assume sufficiently smooth velocity and pressure fields (u, p) so that Eq. (26) is well defined. The system is initialized by a solenoidal velocity field parameterized by an amplitude factor a, sampled from the uniform distribution specified in Extended Data Table 1. For all (x, y) ∈ Ω, the initial condition is defined as

u₁(x, y, 0) = −a φ(2πy/L),   u₂(x, y, 0) = a ψ(4πx/L),   (27)

where each of φ and ψ is independently chosen from {sin(·), cos(·)}, resulting in four possible sine–cosine combinations. This construction satisfies the incompressibility constraint at t = 0, since u₁ depends only on y and u₂ only on x, which directly implies ∇ · u(·, ·, 0) = 0. We impose periodic boundary conditions for both velocity and pressure,

u(x + L, y, t) = u(x, y, t),   u(x, y + L, t) = u(x, y, t),
p(x + L, y, t) = p(x, y, t),   p(x, y + L, t) = p(x, y, t),   (28)

for all (x, y) ∈ Ω and t ∈ [0, T].

Evaluation metrics

We employ a set of evaluation metrics to assess accuracy, robustness to physical misspecification, and statistical reliability.

Pointwise accuracy. Pointwise accuracy is quantified using the relative L2 percentage error:

Rel-L2 (%) = 100 × ∥u_pred − u_true∥₂ / ∥u_true∥₂.   (29)

As pADAM is a generative model, point predictions are obtained by drawing a single sample from the learned solution distribution for each test case. Reported errors are averaged over 50 independent test instances for each problem setting.

Quantifying operator shift. To assess the generative prior's robustness to out-of-distribution (OOD) physical dynamics, we define the operator shift ∆_op as a measure of the discrepancy between the trained physical library and an unseen target dynamic. For a target operator P_unseen and a reference training operator P_train, the shift is quantified by the relative L2 deviation of their respective terminal states u_T evolved from an identical initial condition u_0:

∆_op (%) = 100 × ∥u_T^(P_unseen) − u_T^(P_train)∥₂ / ∥u_T^(P_train)∥₂.   (30)

In the context of the zero-shot extrapolation experiments presented in this study, this metric captures the physical departure induced by the reaction term k in the advection–diffusion–reaction (ADR) system relative to the base advection–diffusion (AD) trajectory. It serves as a formal proxy for the degree of physical misspecification the pADAM prior must reconcile during observation-guided sampling.

Uncertainty quantification. The reliability of uncertainty quantification is evaluated using the prediction interval coverage probability (PICP):

PICP (%) = (100 / |S|) Σ_{s∈S} 1{ u_true(s) ∈ [û_low(s), û_high(s)] },   (31)

where S denotes the set of spatial coordinates and 1{·} is the indicator function.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Code availability

The code used to generate the results of this study will be made publicly available upon publication at the following repository: https://github.com/Mollaali/pADAM.
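For reference, the three metrics in Eqs. (29)–(31) translate directly into a few lines of NumPy. The function names below are ours, not part of the pADAM codebase; the formulas follow the definitions above.

```python
import numpy as np

def rel_l2_percent(u_pred, u_true):
    """Relative L2 percentage error (Eq. 29)."""
    return 100.0 * np.linalg.norm(u_pred - u_true) / np.linalg.norm(u_true)

def operator_shift_percent(uT_unseen, uT_train):
    """Operator shift Delta_op (Eq. 30): relative L2 deviation between
    terminal states evolved from the same initial condition under the
    unseen and training operators."""
    return 100.0 * np.linalg.norm(uT_unseen - uT_train) / np.linalg.norm(uT_train)

def picp_percent(u_true, u_low, u_high):
    """Prediction interval coverage probability (Eq. 31): percentage of
    spatial points where the truth falls inside [u_low, u_high]."""
    inside = (u_true >= u_low) & (u_true <= u_high)
    return 100.0 * inside.mean()

# Sanity check: intervals that cover everywhere yield 100% PICP.
u = np.linspace(0.0, 1.0, 11)
assert picp_percent(u, u - 0.1, u + 0.1) == 100.0
```

Note that Eq. (30) is the same relative-L2 functional as Eq. (29), applied to terminal states of two different operators rather than to a prediction and its ground truth.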
Acknowledgements

We would like to thank the National Science Foundation for its support (DMS-2533878, DMS-2053746, DMS-2134209, ECCS-2328241, CBET-2347401, and OAC-2311848), as well as the U.S. Department of Energy (DOE) Office of Science Advanced Scientific Computing Research program (DE-SC0023161), the SciDAC LEADS Institute, and the DOE Fusion Energy Sciences program, under grant number DE-SC0024583.
Extended Data

Extended Data Fig. 1: Mechanistic analysis of internal attention patterns across physical operators.
Representative attention maps are extracted from a decoder block at a fixed denoising step (t = 1000) for diffusion, advection, and advection–diffusion regimes under identical initial conditions. Comparison of local attention weights reveals distinct structural signatures: the advection–diffusion maps (right) exhibit intermediate activations in spatial regions where features are prominent in pure advection (middle) but absent in pure diffusion (left). This graded attention pattern across the three operator classes suggests that the shared pADAM prior reuses and composes operator-specific internal representations to capture complex mixed dynamics.

Extended Data Table 1: Experimental configurations and parameter manifolds. a, Initial-condition (IC) protocol standardized across all experimental regimes. ICs for the scalar and Burgers' systems follow localized Gaussian forms adapted to their respective domains (Eqs. 12, 24), while Navier–Stokes follows the solenoidal trigonometric form (Eq. 27). b, Systematic variability of governing coefficients φ across thematic investigations. These manifolds define the continuous physical regimes analyzed in the corresponding Results sections.

Panel a: IC sampling rules

PDE group        IC parameter                             Sampling rule
Scalar PDEs      Centroid (x_c, y_c)                      U[0.2, 0.8]^2
                 Gaussian width w_0                       U[0.025, 0.075]
Burgers'         Component centroids (c_{x,i}, c_{y,i})   U[0.2, 0.8]^2
                 Component widths w_1, w_2                U[0.025, 0.075]
Navier–Stokes    Amplitude factor a                       a ~ U(0.5, 1.5]
                 Trigonometric basis φ, ψ                 (φ, ψ) ∈ {sin, cos}^2
                 Mode frequency k                         k ∈ {2π, 4π}
                 Domain length L                          L = 1 (fixed)

Panel b: Parameter manifolds and physical regimes across thematic investigations

Thematic investigation                   PDE library      Variable parameters (φ)   Sampling rule
Unified multi-physics                    Diffusion        –                         ν = 0.25
                                         Advection        –                         (a_x, a_y) = (4, 2)
                                         Adv.–Diff.       –                         (ν, a_x, a_y) = (0.25, 4, 2)
Continuous physics manifold              Diffusion        ν                         U[0.1, 0.4]
                                         Advection        a_x                       U[2.0, 5.0], a_y = 2
                                         Adv.–Diff.       ν                         U[0.1, 0.4], (a_x, a_y) = (4, 2)
Structural scaling                       Diffusion        ν                         U[0.1, 0.4]
                                         Advection        a_x                       U[2.0, 5.0], a_y = 2
                                         Adv.–Diff.       ν                         U[0.1, 0.4], (a_x, a_y) = (4, 2)
                                         Allen–Cahn       ε²                        U[2.5 × 10⁻³, 0.0121]
                                         Burgers'         ν                         ν = 0.05
                                         Navier–Stokes    ν                         ν = 0.02
Parametric scaling and model selection   Diffusion        ν                         U[0.2, 0.4]
                                         Advection        (a_x, a_y)                U[2.0, 3.0]²
                                         Adv.–Diff.       (ν, a_x, a_y)             U[0.2, 0.4] × U[2.0, 3.0]²

Extended Data Fig. 2: Quantitative assessment of representational similarity across PDE families. Pairwise cosine similarities of attention maps calculated for one encoder and one decoder block at two representative denoising steps (t = 500 and t = 1000) across five test samples. Across all configurations, similarities between the advection–diffusion (mixed) regime and the pure extremes (diffusion or advection) are consistently higher than the similarity between the two pure extremes. This hierarchy indicates that the model's latent space is organized according to the underlying mathematical composition of the physical laws, with the mixed operator acting as a representational bridge.

Extended Data Table 2: Conformal calibration resolves the systematic under-coverage of Bayesian ensembles.
Mean empirical coverage (%) of nominal 95% prediction intervals for the advection–diffusion system under sparse (30%) spatial observations (ensemble size M = 6; 50 calibration instances). While raw Bayesian ensembles fail to meet the nominal target, integrating conformal calibration into the pADAM framework effectively recovers statistical validity, ensuring reliable uncertainty quantification for physical inference. Results are averaged over 50 test instances.

Method                  Forward (u_T)   Inverse (u_0)
Ensemble only           58.33           36.31
Ensemble + Conformal    98.42           99.83

(a) Operator discrepancy (∆_op)

Reaction rate k   ∆_op (%)
5.0               22.14
15.0              52.83

(b) ADR extrapolation performance: relative L2 error (%) of u_T and u_0 for k = 5.0 and k = 15.0 as a function of observed fraction (%).

Extended Data Fig. 3: Zero-shot extrapolation performance on an unseen PDE (advection–diffusion–reaction dynamics). a, Quantification of the operator shift (∆_op) between the advection–diffusion (AD) training prior and the unseen advection–diffusion–reaction (ADR) dynamics for two reaction rates k. This shift is defined as the relative L2 discrepancy between the terminal states (u_T) of the two systems under identical initial conditions; as k increases, the physical divergence between the two PDE systems grows. b, Relative L2 error for the joint reconstruction of full-field initial (u_0) and terminal (u_T) states conditioned on sparse spatial observations of endpoints. The model is conditioned on the closest known operator class (AD) and steered via observation-guided sampling. Error in u_T increases with k due to the accumulated influence of the unseen reaction term, while u_0 remains stable. As observation sparsity increases, reconstruction accuracy degrades gracefully, demonstrating the robustness of the pADAM prior under physical misspecification.
All results are averaged over 20 independent test instances.

Extended Data Fig. 4: Empirical coverage of Bayesian posterior ensembles across different observation regimes. Mean prediction-interval coverage probability (PICP) of nominal 95% intervals (dashed grey line) as a function of ensemble size (2–8 samples) for forward and inverse tasks, under full (100%) and sparse (30%) observations. Particularly under ill-posed conditions, such as inverse problems or prediction under sparse observations, where only 30% of the field is observed, the coverage of raw Bayesian intervals saturates well below the 95% target across all operator families. This systematic under-coverage underscores the necessity of conformal calibration for providing rigorous reliability guarantees in data-sparse or ill-posed regimes. Results are averaged over 20 test instances.

Extended Data Table 3: Task-agnostic performance across the continuous physics manifold. Relative L2 errors (%) for forward prediction (u_T), initial-state reconstruction (u_0), and physical parameter discovery (φ). pADAM was trained across three PDE families, each with a single variable coefficient: diffusion (φ = ν), advection (φ = a_x), and advection–diffusion (φ = ν). Performance is evaluated under full (100%) and sparse spatial observations (30% and 10%), with results averaged over 50 test instances.
PDE system            Observation     Forward (u_T)   Inverse (u_0)   Inverse (φ)
Diffusion             Full (100%)     0.69            0.89            1.38
                      Sparse (30%)    1.64            1.96            3.48
                      Sparse (10%)    2.09            3.05            8.26
Advection             Full (100%)     1.91            1.13            0.72
                      Sparse (30%)    2.11            1.47            1.48
                      Sparse (10%)    2.13            2.55            2.69
Advection–diffusion   Full (100%)     1.12            1.20            2.81
                      Sparse (30%)    1.70            2.26            4.44
                      Sparse (10%)    2.65            4.11            7.73

Extended Data Fig. 5: Effect of conformal calibration on prediction intervals. a, Standard ensemble-based prediction intervals show under-coverage (59.20%) for forward prediction of u_T in the advection–diffusion system under sparse (30%) observations. b, Conformally calibrated intervals achieve 100.00% empirical coverage. c–j, One-dimensional slices at representative y-locations (y = 0.00, 0.33, 0.67, 1.00); shaded bands denote prediction intervals and blue curves indicate the true solution. Conformal calibration adaptively expands the intervals in regions of high epistemic uncertainty to recover the nominal 95% coverage. Critically, this post-hoc procedure preserves the shape of the underlying generative predictions while improving coverage reliability without model retraining.
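The post-hoc calibration step can be illustrated with a generic split-conformal correction in the conformalized-quantile-regression style: each held-out calibration point is scored by how far the truth escapes the ensemble band, and every band is then widened by the finite-sample (1 − α) quantile of these scores. This is a sketch under our own assumptions (function name, toy data, and the specific score are ours), not the authors' implementation.

```python
import numpy as np

def conformal_correction(cal_true, cal_low, cal_high, alpha=0.05):
    """Split-conformal calibration of prediction bands. The score is
    positive when the truth lies outside [low, high] and negative when
    inside; shifting every band to [low - q, high + q] with q the
    finite-sample (1 - alpha) quantile of the scores restores coverage."""
    scores = np.maximum(cal_low - cal_true, cal_true - cal_high)
    n = scores.size
    level = min(1.0, np.ceil((n + 1) * (1.0 - alpha)) / n)  # finite-sample adjustment
    return np.quantile(scores, level)

# Toy demonstration: a deliberately narrow ensemble band is repaired.
rng = np.random.default_rng(0)
truth = rng.normal(size=500)
pred = truth + rng.normal(scale=1.0, size=500)  # imperfect point predictions
low, high = pred - 0.5, pred + 0.5              # raw band under-covers
q = conformal_correction(truth, low, high)
coverage = np.mean((truth >= low - q) & (truth <= high + q))
assert coverage >= 0.95  # nominal 95% coverage recovered
```

In practice the calibration and test sets are disjoint; the coverage guarantee then holds in expectation over exchangeable test data, which is consistent with the behavior reported in Extended Data Table 2.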
Extended Data Fig. 6: Qualitative inference performance of pADAM under the continuous physics manifold and structural scaling. a, Parameter discovery within the scalar physics manifold (diffusion, advection, and advection–diffusion). For the advection system, the physical coefficient φ = a_x is inferred from paired states (u_0, u_T) under 10% spatial observations. b, c, Inference for the Burgers' system under structural scaling to the 6-PDE library. b, Forward prediction of the Burgers' velocity component u_T conditioned on full spatial observations of the initial states (u_0, v_0). c, Inverse reconstruction of the Burgers' initial velocity component v_0 conditioned on 30% spatial observations of the terminal component v_T and the initial component u_0.
Extended Data Fig. 7: Qualitative inference performance of pADAM under parametric scaling and heterogeneous parametric dimensionality. a, Initial-state reconstruction in the advection system with a variable parameter vector φ = [a_x, a_y]. The reconstructed initial state u_0 is inferred by conditioning on the known parameter vector and full observation of the terminal state u_T. b, Parameter discovery in the advection–diffusion system with a three-dimensional parameter vector φ = [ν, a_x, a_y]. The physical coefficients are inferred from paired states (u_0, u_T) under 30% spatial observations.