Posterior contraction under misspecification and heteroscedasticity in non-linear inverse problems
Authors: Fanny Seizilles, Maximilian Siebel
Posterior contraction under misspecification and heteroscedasticity in non-linear inverse problems

Fanny Seizilles¹ (University of Cambridge) and Maximilian Siebel² (Heidelberg University)

March 31, 2026

Abstract. In many practical and numerical inverse problems, the exact data log-likelihood is not fully accessible, motivating the use of surrogate models. We study heteroscedastic nonparametric nonlinear regression problems with Gaussian errors and establish contraction results for posterior distributions arising from a surrogate log-likelihood constructed from proxy error variances, an approximate forward map, and an appropriate Gaussian process prior. Under general assumptions on the approximation quality, we show that the resulting surrogate posterior is statistically reliable and contracts about the true parameter at rates comparable to those of the exact posterior. The analysis leverages consistency properties of the (penalised) MLE to effectively handle heteroscedastic noise and to control the impact of likelihood approximation errors. We apply the framework to PDE-constrained inverse problems for a reaction–diffusion equation and the two-dimensional Navier–Stokes equation. In the latter case, we consider misspecified viscosity and forcing terms as well as Oseen-type linearization models, highlighting the relevance of our results for numerical analysis applications.

Contents

1 Introduction
  1.1 Statistical inference for nonlinear inverse problems
  1.2 Prior works and contributions
2 Setting
  2.1 Notation and preliminaries
  2.2 Function spaces
  2.3 Observation model, Bayesian approach and mild misspecification
  2.4 Regularity conditions on the forward map
  2.5 Conditions on the prior
3 Posterior contraction for misspecified models
  3.1 Preliminary results
  3.2 Basic contraction theorem
  3.3 Contraction result for the inverse problem
4 Examples
  4.1 Example 1: Noise misspecification in the reaction–diffusion equation
  4.2 Example 2: Model misspecification in the Navier–Stokes equation
  4.3 Example 3: Model misspecification from numerical approximation
5 Proofs for Section 3
  5.1 Small ball computations
  5.2 Proofs for the change of measure
  5.3 Proof of Proposition 3.11
  5.4 Proof of Theorem 3.12
  5.5 Proof of Theorem 3.17
A Choice of priors
B Mild misspecification in M-estimation
  B.1 Tikhonov-regularized estimator
  B.2 Consistency and tests
  B.3 Proof of Theorem B.6
C Analysis and PDE theory
  C.1 Reaction–diffusion equation
  C.2 2D Navier–Stokes equation

¹ DPMMS, University of Cambridge; e-mail: fps25@cam.ac.uk
² Institute for Mathematics, Heidelberg University; e-mail: siebel@math.uni-heidelberg.de
  C.3 Oseen approximation
D Miscellaneous
  D.1 An inequality for Sobolev spaces
  D.2 A chaining lemma for non-i.i.d. data
  D.3 An inequality for infinite series

1 Introduction

1.1 Statistical inference for nonlinear inverse problems

A wide range of statistical inference problems arising in the natural sciences can be formulated as nonlinear inverse problems. In such settings, an unknown parameter $\theta$, typically infinite-dimensional, is linked to observable data through a nonlinear forward operator
\[
\mathcal G : \Theta \ni \theta \mapsto \mathcal G(\theta) \in Y,
\]
which models the response of a complex system to the parameter of interest. A practically important class of nonlinear inverse problems arises when the forward operator $\mathcal G$ is defined implicitly as the solution map of a nonlinear dynamical system governed by ordinary or partial differential equations; see, e.g., Temam (1997); Strogatz (2018). In this case, $\theta$ may represent an unknown initial condition, forcing term, or constitutive parameter. Physical observations are typically indirect, noisy, and available only at finitely many 'design' points $(t_i, x_i)$, leading to regression-type models of the form
\[
Y_i = \mathcal G(\theta)(t_i, x_i) + \varepsilon_i, \qquad i = 1, \dots, N, \tag{1}
\]
where $\varepsilon_1, \dots, \varepsilon_N$ are independent Gaussian measurement errors with noise variances $\sigma_1^2, \dots, \sigma_N^2$. The statistical task then consists in recovering $\theta$ from the noisy partial observations described by Eq. (1). From both a deterministic and a statistical perspective, inverse problems of this type have been studied extensively; see, for instance, Engl et al. (2000); Kaipio and Somersalo (2005); Kaltenbacher et al. (2008); Stuart (2010); Arridge et al. (2019) and the references therein.
A popular viewpoint is provided by the Bayesian approach. From this perspective, uncertainty about $\theta$ is encoded by placing a prior distribution $\Pi$ on the parameter space $\Theta$, leading, via Bayes' theorem, to the posterior distribution
\[
d\Pi(\theta \mid D_N) \propto e^{\ell_N(\theta)}\, d\Pi(\theta), \tag{2}
\]
where $\ell_N(\theta)$ denotes the log-likelihood function associated to Eq. (1) with data $D_N = \{(Y_i, t_i, x_i)\}_{i=1}^N$. In infinite-dimensional inverse problems, Gaussian process priors are commonly employed, and posterior contraction rates provide a first-order notion of frequentist validity. For models where the log-likelihood $\ell_N$ is fully accessible, posterior contraction for nonlinear inverse problems is by now well understood in a number of settings, including problems governed by partial differential equations. For a comprehensive overview of the (infinite-dimensional) Bayesian methodology, as well as its treatment in (non-)linear inverse problems, we refer to Ghosal and van der Vaart (2017) and Nickl (2023), together with the references therein.

In practice, the exact log-likelihood function $\ell_N(\theta)$ may not be available. Instead, statistical inference is often based on approximate or surrogate likelihoods, arising from numerical discretization of the forward operator, incomplete knowledge of the noise distribution, or the use of pre-estimated noise levels. Even in the idealized case of additive Gaussian errors, the exact evaluation of $\ell_N(\theta)$ requires repeated solutions of the forward problem $\theta \mapsto \mathcal G(\theta)$, which is often computationally prohibitive in nonlinear or high-dimensional settings. As a result, Bayesian inference is commonly carried out using a surrogate posterior $\tilde\Pi(\theta \mid D_N)$ computed as in Eq.
(2) but with a misspecified log-likelihood of the form
\[
\tilde\ell_N(\theta) = -\frac{1}{2} \sum_{i=1}^N \frac{1}{s_i^2} \big| Y_i - \tilde{\mathcal G}(\theta)(t_i, x_i) \big|^2, \tag{3}
\]
where $\tilde{\mathcal G}$ denotes a numerical or otherwise approximate forward map replacing $\mathcal G$, and $s_1^2, \dots, s_N^2$ are surrogate noise variances.

Beyond modelling considerations, likelihood misspecification is also closely tied to computational feasibility. Posterior inference for Bayesian inverse problems typically relies on sampling-based algorithms such as Markov chain or sequential Monte Carlo methods, all of which require repeated evaluation of the (log-)likelihood; see, e.g., Stuart (2010); Cotter et al. (2013); Hairer et al. (2014); Nickl and Wang (2024); Giordano and Wang (2025); Castre and Nickl (2026) and the references therein. This observation has motivated a substantial literature on approximate Bayesian methods, including noisy and pseudo-marginal MCMC algorithms, delayed acceptance schemes, and surrogate-based approaches; see, for instance, Christen and Fox (2005); Andrieu and Roberts (2009); Andrieu et al. (2010).

1.2 Prior works and contributions

Studying properties of posteriors in misspecified models is notoriously challenging, and obtaining quantitative contraction results often requires strong assumptions. In the case of noise misspecification, for instance, previous approaches model an unknown noise distribution by a Gaussian: this is the case in Norets (2015) for heteroscedastic misspecified noise in nonparametric linear regression, and in Kleijn and van der Vaart (2006); Ghosal and van der Vaart (2017) for nonparametric (nonlinear) regression. Such results, however, require stronger hypotheses, such as the parameter-to-observation map being uniformly bounded over the parameter space.
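For concreteness, the surrogate log-likelihood (3) is straightforward to evaluate numerically once $\tilde{\mathcal G}$ and the proxy variances are fixed. The sketch below uses a scalar toy forward map and randomly chosen proxy variances, all illustrative placeholders rather than any of the paper's PDE examples.

```python
import numpy as np

def surrogate_loglik(theta, Y, X, G_tilde, s2):
    """Surrogate log-likelihood (3): proxy variances s2 replace the unknown
    sigma_i^2, and G_tilde replaces the exact forward map G."""
    resid = Y - G_tilde(theta, X)
    return -0.5 * np.sum(resid**2 / s2)

# Toy illustration: scalar parameter, G_tilde(theta)(x) = theta * sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2 * np.pi, size=200)
s2 = rng.uniform(0.5, 2.0, size=200)          # heteroscedastic proxy variances
G_tilde = lambda theta, x: theta * np.sin(x)
Y = G_tilde(1.3, X) + rng.normal(0.0, np.sqrt(s2))

# The surrogate log-likelihood peaks near the data-generating parameter.
grid = np.linspace(0.0, 3.0, 301)
vals = [surrogate_loglik(t, Y, X, G_tilde, s2) for t in grid]
print(grid[int(np.argmax(vals))])             # close to 1.3
```

Maximising this criterion over a grid is of course only feasible for scalar toy problems; in the paper's setting the same quantity is what an MCMC sampler would evaluate repeatedly.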
The use of fractional posteriors, where the surrogate posterior is obtained by raising the likelihood to some power $\alpha \in (0,1)$, has also been investigated in this context: by reducing the weight of the data, this makes the posterior more robust to misspecification, and leads to posterior contraction results formulated in Rényi divergences (see, e.g., Grünwald (2012); Bhattacharya et al. (2019); Miller and Dunson (2019); L'Huillier et al. (2023)). These divergences are, however, weaker metrics, so that in turn one requires stronger stability conditions if one aims to recover contraction at the level of the target parameter.

In a significant proportion of the Bayesian inference literature, misspecification arises when the data-generating distribution $P_{\theta_0}$ does not belong to the class $\{Q_\theta : \theta \in \Theta\}$ of models under consideration. In such situations, Berk (1966); Kleijn and van der Vaart (2012) show that the posterior distribution typically concentrates around a pseudo-true parameter $\theta^*$ minimising the Kullback–Leibler divergence between the true distribution and the model class, i.e. $\theta^* = \arg\min_{\theta \in \Theta} D_{KL}(P_{\theta_0}, Q_\theta)$. Contraction is established in a 'Hellinger transform' testing divergence, which cannot be immediately related to less abstract metrics relevant in the case of nonlinear inverse problems, as the class of models associated to the log-likelihood Eq. (3) is not convex.

The main contribution of this paper is to give sufficient, workable conditions for reliable Bayesian inference in PDE-based inverse problems under likelihood misspecification.
Precisely, we show posterior contraction of the resulting misspecified posterior distribution
\[
d\tilde\Pi(\theta \mid D_N) \propto e^{\tilde\ell_N(\theta)}\, d\Pi(\theta)
\]
directly around the true parameter $\theta_0$, at the usual nonparametric rate for correctly specified problems, under a regime of mild misspecification in which the 'error' arising from either unknown Gaussian noise variances or an approximate PDE model decays sufficiently fast relative to the sample size $N$ and the ill-posedness of the inverse problem. Adapting ideas in Nickl (2023) to this situation, and employing suitable stability estimates, we then show posterior contraction for the nonlinear inverse problem, and prove convergence rates for the surrogate posterior mean $E^{\tilde\Pi}[\theta \mid D_N]$ towards the ground truth $\theta_0$.

Our approach leverages the inherent robustness of the penalised MLE, which has been shown to converge to the pseudo-true parameter even under model error (White (1982); Kleijn and van der Vaart (2012)), to build a sequence of test functions $\Psi_N = \Psi_N(D_N)$ whose type-I and type-II errors decay sufficiently fast (following the work of Nickl et al. (2020); Siebel (2025)). Robustness of these point estimators is, however, not sufficient, and to obtain posterior contraction we further establish 'change of measure' conditions, which ensure that the growth of certain likelihood terms is offset by the decay of the prior. Importantly, our proof methods for the change of measure enable us to consider PDE maps that are not uniformly bounded over the parameter space, through the use of a slicing argument. These conditions are also essential to prove convergence results for the mean of the surrogate posterior distribution.

Outline of the paper. In Section 2 we introduce the general setting and notation. In Section 3 we establish the main surrogate posterior contraction theorem, with proofs given in Section 5.
In Section 4 we give three prototypical examples of misspecification for PDE-based inverse problems: noise misspecification in the reaction–diffusion equation, and model misspecification (via wrong parameter specification, and via numerical approximation) in the Navier–Stokes equation. Further results in Section B revisit the known robustness of M-estimation techniques under misspecification, which is needed to establish the correct convergence for the hypothesis tests.

2 Setting

2.1 Notation and preliminaries

Notation 2.1 (Preliminaries). We set $\mathbb N_0 := \mathbb N \cup \{0\}$. If $(S, \mathcal T)$ is a topological vector space, we denote by $\mathcal B_S$ the Borel $\sigma$-field on $S$ generated by the topology $\mathcal T$. The topological dual of $S$ is denoted by $S^*$ and consists of all linear and bounded functionals $L : S \to \mathbb R$. Given two normed spaces $(S_1, \|\cdot\|_{S_1})$ and $(S_2, \|\cdot\|_{S_2})$, we write $S_1 \hookrightarrow S_2$ if $S_1$ is continuously embedded into $S_2$. Further, for $M > 0$ we write $S_1(M) := \{s \in S_1 : \|s\|_{S_1} \le M\}$. Throughout, random variables are defined on a probability space $(\Omega, \mathcal A, \mathbb P)$ if not mentioned otherwise. The expectation w.r.t. $\mathbb P$ is denoted by $\mathbb E$. Lastly, universal constants are denoted by $c$. If a constant depends on a family of objects $A$, we write $c(A)$. Constants arising from assumptions are denoted by $C$ with further specifications. If not mentioned otherwise, the value of a constant may change from line to line.

2.2 Function spaces

In the following, let $d \in \mathbb N$ be the fixed dimension. Let $(\mathcal Z, \mathscr Z, \zeta)$ be any measurable space. For $p \in [1, \infty)$ we denote by $L^p_\zeta(\mathcal Z, \mathbb C) := L^p_\zeta(\mathcal Z, \mathscr Z, \mathbb C)$ the space of $p$-integrable functions $f : (\mathcal Z, \mathscr Z, \zeta) \to (\mathbb C, \mathcal B_{\mathbb C})$. In particular, the Hilbert space $L^2_\zeta(\mathcal Z, \mathbb C)$ is equipped with the inner product
\[
\langle f, g \rangle_{L^2_\zeta(\mathcal Z, \mathbb C)} := \int_{\mathcal Z} f(z)\, \overline{g(z)}\, d\zeta(z),
\]
where $\bar w$ denotes the complex conjugate of $w \in \mathbb C$.
For simplicity, we write $L^p_\zeta(\mathcal Z)$ for the corresponding space of real-valued functions. Throughout, $\mathcal M$ denotes either

- a bounded open set $\mathcal O \subseteq \mathbb R^d$ with smooth boundary $\partial\mathcal O$, or
- the $d$-dimensional torus $\mathbb T^d := [0,1]^d / \sim_{\mathbb T}$, where $\sim_{\mathbb T}$ is the equivalence relation identifying opposite points.

Both spaces, equipped with their Borel $\sigma$-fields $\mathcal B_{\mathcal O}$ and $\mathcal B_{\mathbb T^d}$ and the Lebesgue measure $\mathcal L^d$ on $\mathbb R^d$, form measure spaces. We define for $m \in \mathbb N_0$ the Banach space $C^m(\mathcal M)$ of $m$-times differentiable functions $f : \mathcal M \to \mathbb R$ with bounded derivatives up to order $m$, equipped with the norm
\[
\|f\|_{C^m(\mathcal M)} := \sum_{\beta \in \mathbb N_0^d :\, |\beta| \le m} \|D^\beta f\|_\infty,
\]
where $\|\cdot\|_\infty$ denotes the uniform norm. For $s \in \mathbb R_{\ge 0}$ this definition is extended by saying $f \in C^s(\mathcal M)$ if $f \in C^{\lfloor s\rfloor}(\mathcal M)$ and, further, $D^\beta f$ is $(s - \lfloor s\rfloor)$-Hölder continuous for $|\beta| = \lfloor s\rfloor$. A norm on $C^s(\mathcal M)$ is given by
\[
\|f\|_{C^s(\mathcal M)} := \sum_{\beta \in \mathbb N_0^d :\, |\beta| \le \lfloor s\rfloor} \|D^\beta f\|_\infty + \sum_{\beta \in \mathbb N_0^d :\, |\beta| = \lfloor s\rfloor} \sup_{x, y \in \mathcal M :\, x \neq y} \frac{|D^\beta f(x) - D^\beta f(y)|}{|x - y|_{\mathbb R^d}^{\,s - \lfloor s\rfloor}}.
\]
We call a function smooth if it belongs to $C^\infty(\mathcal M) := \bigcap_{s > 0} C^s(\mathcal M)$. We denote by $C_c^\infty(\mathcal M)$ the subspace of smooth functions with compact support, noting that $C_c^\infty(\mathbb T^d) = C^\infty(\mathbb T^d)$.

For $m \in \mathbb N_0$, we define the usual Sobolev spaces $H^m(\mathcal M)$ of real-valued functions $u : \mathcal M \to \mathbb R$ with square-integrable weak derivatives up to order $m$. Note, $H^0(\mathcal M) = L^2(\mathcal M) := L^2_{\mathcal L^d}(\mathcal M)$. For non-integer $s \ge 0$, $H^s(\mathcal M)$ is defined via interpolation, see Triebel (1983). For $s < 0$, we define $H^s(\mathcal M) := (H^{-s}(\mathcal M))^*$ as the topological dual. For $\mathcal M = \mathcal O$, we further define for $s \ge 0$
\[
H^s_c(\mathcal O) := \overline{C_c^\infty(\mathcal O)}^{\,\|\cdot\|_{H^s(\mathcal O)}}.
\]
Note, for $s \le \tfrac12$ we have $H^s_c(\mathcal O) = H^s(\mathcal O)$, and otherwise, if $s \in \mathbb N$, $H^s_c(\mathcal O)$ equals the subspace of $H^s(\mathcal O)$ of functions with vanishing trace on $\partial\mathcal O$, see Lions and Magenes (1972).
For $\mathcal M = \mathbb T^d$ the $d$-dimensional torus, the periodic Laplacian $-\Delta$ is diagonalised by the Fourier basis: after enumerating $\mathbb Z^d = \{k_j : j \in \mathbb N\}$ such that $j \mapsto |k_j|$ is non-decreasing, we have an orthonormal system
\[
e_j := e_{k_j} := \exp\big(-2\pi i \langle x, k_j\rangle_{\mathbb R^d}\big), \qquad \lambda_0 = 0, \quad \lambda_j = 4\pi^2 |k_j|^2_{\mathbb R^d}, \tag{4}
\]
with imaginary unit $i = \sqrt{-1} \in \mathbb C$. For $s \in \mathbb R$, the Sobolev space $H^s(\mathbb T^d)$ has an equivalent spectral norm given by
\[
\|u\|^2_{h^s(\mathbb T^d)} := \sum_{j \in \mathbb N} (1 + \lambda_j)^s \big|\langle u, e_j\rangle_{L^2(\mathbb T^d, \mathbb C)}\big|^2_{\mathbb C}.
\]
In the sequel, we will need homogeneous Sobolev spaces $\dot H^s(\mathbb T^d)$, which are defined as the subspace of $H^s(\mathbb T^d)$ obtained by removing the zero mode $\lambda_0 = 0$. To that end, define the corresponding inner product
\[
\langle f, g\rangle_{\dot H^s(\mathbb T^d)} := \sum_{j \in \mathbb N} \lambda_j^s\, \langle f, e_j\rangle_{L^2(\mathbb T^d, \mathbb C)} \cdot \overline{\langle g, e_j\rangle_{L^2(\mathbb T^d, \mathbb C)}},
\]
noting that the induced norm is given by
\[
\|u\|^2_{\dot H^s(\mathbb T^d)} := \sum_{j \in \mathbb N} \lambda_j^s \big|\langle u, e_j\rangle_{L^2(\mathbb T^d, \mathbb C)}\big|^2_{\mathbb C} = \big\|(-\Delta)^{\frac s2} u\big\|^2_{L^2(\mathbb T^d)}.
\]
We generalize the previous definitions for real- or complex-valued functions to functions with values in $W$, where $(W, |\cdot|_W)$ is a finite-dimensional $\mathbb C$-vector space with dimension $d_W := \dim_{\mathbb C}(W)$. To that end, let $F(\mathcal M)$ be one of the function spaces defined before. We then define the space of functions $f = (f_1, \dots, f_{d_W}) : \mathcal M \to W$ as
\[
F(\mathcal M, W) := \bigtimes_{i=1}^{d_W} F(\mathcal M), \qquad \|f\|^2_{F(\mathcal M, W)} := \sum_{i \le d_W} \|f_i\|^2_{F(\mathcal M)},
\]
where we identify $W$ canonically with $\mathbb C^{d_W}$. In particular, if $F(\mathcal M)$ is a Hilbert space with inner product $\langle\cdot,\cdot\rangle_{F(\mathcal M)}$, then $F(\mathcal M, W)$ is a Hilbert space equipped with inner product
\[
\langle f, g\rangle_{F(\mathcal M, W)} := \sum_{i \le d_W} \langle f_i, g_i\rangle_{F(\mathcal M)}.
\]
Accordingly, these objects are also defined for $W$ a finite-dimensional $\mathbb R$-vector space. For the analysis of the 2D Navier–Stokes equation, we require Sobolev spaces with vanishing mean and divergence.
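The spectral norms just defined translate directly into computation. A small sketch on the one-dimensional torus ($d = 1$) evaluates the homogeneous norm via the FFT and checks it against $\|(-\Delta)^{s/2}u\|_{L^2}$ for $u(x) = \cos(2\pi x)$; the grid size is an arbitrary choice of this illustration.

```python
import numpy as np

def homogeneous_sobolev_norm(u, s):
    """Homogeneous Sobolev norm on the 1-D torus [0,1): the zero mode lambda_0 = 0
    drops out automatically since lambda^s vanishes there for s > 0."""
    n = len(u)
    coeff = np.fft.fft(u) / n               # Fourier coefficients <u, e_k>
    k = np.fft.fftfreq(n, d=1.0 / n)        # integer frequencies k_j
    lam = 4 * np.pi**2 * k**2               # Laplacian eigenvalues as in (4)
    return np.sqrt(np.sum(lam**s * np.abs(coeff) ** 2))

x = np.linspace(0.0, 1.0, 256, endpoint=False)
u = np.cos(2 * np.pi * x)
# For u = cos(2*pi*x): ||u||_{H^1}^2 = ||u'||_{L^2}^2 = 2*pi^2.
print(homogeneous_sobolev_norm(u, s=1))     # pi * sqrt(2) ≈ 4.4429
```

Since $u$ is a pure Fourier mode, the discrete computation reproduces the analytic value up to machine precision.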
To that end, we define
\[
\dot H_\diamond := \Big\{ u \in L^2(\mathbb T^2, \mathbb R^2) : \operatorname{div}(u) = 0, \ \int_{\mathbb T^2} u_i(x)\, d\mathcal L^2(x) = 0 \text{ for } i = 1, 2 \Big\},
\]
where $\operatorname{div}(u) := \frac{\partial}{\partial x_1} u_1 + \frac{\partial}{\partial x_2} u_2$ denotes the divergence. As $(\dot H_\diamond, \langle\cdot,\cdot\rangle_{L^2(\mathbb T^2, \mathbb R^2)})$ is a closed linear subspace of $L^2(\mathbb T^2, \mathbb R^2)$, we can define the $L^2$-projection operator, also called the Leray operator,
\[
P : L^2(\mathbb T^2, \mathbb R^2) \to \dot H_\diamond. \tag{5}
\]
For any $s \ge 0$, we then define $\dot H^s_\diamond := \dot H_\diamond \cap H^s(\mathbb T^2, \mathbb R^2)$.

Let $T > 0$. If $X$ is a normed linear space, we further define the Bochner space $L^2([0,T], X)$ of measurable maps $h$ from $[0,T]$ to $X$ such that $\|h(\cdot)\|_X$ is a map in $L^2([0,T])$. Analogously, we define $C^0([0,T], X)$ as the space of continuous maps from $[0,T]$ to $X$ such that $\sup_{t \in (0,T)} \|h(t)\|_X$ is finite.

2.3 Observation model, Bayesian approach and mild misspecification

Throughout, let $(V, |\cdot|_V) \simeq (\mathbb R^{d_V}, \langle\cdot,\cdot\rangle_{\mathbb R^{d_V}})$ and $(W, |\cdot|_W) \simeq (\mathbb R^{d_W}, \langle\cdot,\cdot\rangle_{\mathbb R^{d_W}})$ be two finite-dimensional $\mathbb R$-vector spaces with dimensions $d_V \in \mathbb N$ and $d_W \in \mathbb N$, respectively. Further, let $(\mathcal Z, \mathscr Z, \zeta)$ be a probability space. In this work, we are interested in a parameter space $\Theta \subseteq L^2(\mathcal M, W)$ and in a measurable and possibly non-linear forward map $\mathcal G : \Theta \to L^2_\zeta(\mathcal Z, V)$, such that the pointwise evaluations $\mathcal G(\theta)(z)$, $z \in \mathcal Z$, are well-defined for each $\theta \in \Theta$. We assume to have access to data from a random-design regression model with heteroscedastic errors, i.e., for a fixed sample size $N \in \mathbb N$, we observe $D_N := (Y_i, Z_i)_{i=1}^N \in (V \times \mathcal Z)^N$ arising from
\[
Y_i = \mathcal G(\theta)(Z_i) + \varepsilon_i, \qquad \theta \in \Theta, \quad i = 1, \dots, N. \tag{6}
\]
The covariates $(Z_i)_{i=1}^N$ are drawn independently and identically distributed (i.i.d.) from the law $\zeta$ and assumed to be independent of the independent Gaussian errors $(\varepsilon_i)_{i=1}^N$, which satisfy $\varepsilon_i \sim \mathcal N(0, \sigma_i^2\, \mathrm{Id}_V)$ for $i = 1, \dots, N$, with heteroscedastic variances $\sigma_1^2, \dots, \sigma_N^2 > 0$.
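A draw from the observation model (6) is easy to simulate. In the sketch below the covariate law $\zeta$, the variance profile, and the forward map are illustrative stand-ins (a direct point-evaluation map rather than a PDE solution operator).

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_data(theta0, G, N, sample_covariate, sigma2):
    """Simulate D_N = (Y_i, Z_i)_{i<=N} from model (6): Z_i i.i.d. from zeta,
    independent Gaussian errors eps_i ~ N(0, sigma_i^2)."""
    Z = sample_covariate(N)
    eps = rng.normal(0.0, np.sqrt(sigma2))
    return G(theta0, Z) + eps, Z

N = 10_000
sigma2 = np.linspace(0.5, 2.0, N)             # heteroscedastic noise variances
theta0 = lambda z: np.sin(2 * np.pi * z)      # ground-truth parameter
G = lambda theta, z: theta(z)                 # toy 'forward map': point evaluation
Y, Z = draw_data(theta0, G, N, lambda n: rng.uniform(0.0, 1.0, n), sigma2)
# Empirical residual variance matches the average of the sigma_i^2 (here 1.25).
print(np.var(Y - theta0(Z)))
```

The residual-variance check illustrates why heteroscedasticity is invisible "on average": any constant proxy variance near the mean level fits the pooled residuals, which is precisely the regime studied under Condition 3.2.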
The law of the data vector $D_N$ is denoted by $P^N_\theta$, and its corresponding expectation operator by $E^N_\theta$. The law and expectation of a single datum $(Y_i, Z_i)$ are denoted by $P^{(i)}_\theta$ and $E^{(i)}_\theta$, respectively. $P^N_\theta$ has a probability density function with respect to the measure $\bigotimes_{i=1}^N \mathcal L^{d_V} \otimes \zeta$, which is given for all $\theta \in \Theta$, $(y, z) \in (V \times \mathcal Z)^N$, by
\[
p^N_\theta(y, z) := \prod_{i \le N} p^{(i)}_\theta(y_i, z_i) := \prod_{i \le N} \Big\{ (2\pi\sigma_i^2)^{-\frac{d_V}{2}} \exp\Big( -\frac{1}{2\sigma_i^2} \big| y_i - \mathcal G(\theta)(z_i) \big|^2_V \Big) \Big\}.
\]
We define a corresponding (re-scaled) log-likelihood $\ell_N(\theta)$ by
\[
\forall \theta \in \Theta : \quad \ell_N(\theta) := \sum_{i \le N} \ell^{(i)}(\theta) := -\frac{1}{2} \sum_{i \le N} \sigma_i^{-2} \big| Y_i - \mathcal G(\theta)(Z_i) \big|^2_V. \tag{7}
\]
The results obtained in this work hold under the frequentist assumption that the data $D_N$ are generated from the law $P^N_{\theta_0}$ described by a fixed and unknown ground truth $\theta_0 \in \Theta$, which we aim to recover. For the Bayesian framework, we assume that $\Theta$ is equipped with a Borel $\sigma$-field $\mathcal B_\Theta$. Let $\Pi'$ be a probability measure (called base prior) on the measurable space $(\Theta, \mathcal B_\Theta)$. We use so-called re-scaled priors, defined by
\[
\Pi_N = \mathrm{Law}(\theta), \qquad \theta = \frac{1}{\sqrt{N\delta_N^2}}\, \theta', \qquad \theta' \sim \Pi', \tag{8}
\]
where $\delta_N > 0$ is a sequence such that $N\delta_N^2 \to \infty$ as $N \to \infty$. Assuming that the map $\Theta \times \mathcal Z \ni (\theta, z) \mapsto \mathcal G(\theta)(z) \in V$ is $\mathcal B_\Theta \otimes \mathscr Z$–$\mathcal B_V$ measurable, we introduce the associated posterior measure $\Pi_N(\cdot \mid D_N)$ given by
\[
\forall B \in \mathcal B_\Theta : \quad \Pi_N(B \mid D_N) = \frac{\int_B e^{\ell_N(\theta)}\, d\Pi_N(\theta)}{\int_\Theta e^{\ell_N(\theta)}\, d\Pi_N(\theta)}.
\]
As discussed in Section 1, we investigate situations in which the error variances $\sigma_1^2, \dots, \sigma_N^2$ are not known and the forward map $\mathcal G$ is only approximately known. To that end, let $s_1^2, \dots, s_N^2 > 0$ be surrogate (proxy) variances and $\tilde{\mathcal G} : \Theta \to L^2_\zeta(\mathcal Z, V)$ a surrogate (proxy) forward map, which is jointly $\mathcal B_\Theta \otimes \mathscr Z$–$\mathcal B_V$ measurable as a map $\Theta \times \mathcal Z \ni (\theta, z) \mapsto \tilde{\mathcal G}(\theta)(z) \in V$.
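Sampling from the re-scaled prior (8) only requires a sampler for the base prior $\Pi'$. Below, the base prior is a truncated random-series (Karhunen–Loève-type) Gaussian process on the one-dimensional torus; the coefficient weights and the truncation level are assumptions of this sketch, in the spirit of the series priors discussed in Section A.

```python
import numpy as np

rng = np.random.default_rng(2)

def base_prior_sample(alpha, x, J=100):
    """Truncated random-series draw of a centered Gaussian base prior Pi' with
    coefficients decaying like (1 + lambda_j)^{-(alpha/2 + 1/4)} (illustrative
    H^alpha-type construction on the 1-D torus)."""
    theta = np.zeros_like(x)
    for j in range(1, J + 1):
        w = (1.0 + 4.0 * np.pi**2 * j**2) ** (-(alpha / 2.0 + 0.25))
        theta += w * (rng.normal() * np.cos(2 * np.pi * j * x)
                      + rng.normal() * np.sin(2 * np.pi * j * x))
    return theta

def rescaled_prior_sample(alpha, kappa, d, N, x):
    """Draw theta = theta' / sqrt(N * delta_N^2) as in (8), with delta_N
    taken from (12)."""
    delta_N = N ** (-(alpha + kappa) / (2 * alpha + 2 * kappa + d))
    return base_prior_sample(alpha, x) / np.sqrt(N * delta_N**2)

x = np.linspace(0.0, 1.0, 512, endpoint=False)
theta = rescaled_prior_sample(alpha=2.0, kappa=1.0, d=1, N=10_000, x=x)
print(theta.shape)                            # (512,)
```

Since $N\delta_N^2 \to \infty$, the rescaling shrinks the prior draws as $N$ grows, which is exactly the additional regularisation the contraction analysis exploits.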
Given the data $D_N \sim P^N_{\theta_0}$, we then have a surrogate log-likelihood defined by
\[
\forall \theta \in \Theta : \quad \tilde\ell_N(\theta) := -\frac{1}{2} \sum_{i \le N} s_i^{-2} \big| Y_i - \tilde{\mathcal G}(\theta)(Z_i) \big|^2_V, \tag{9}
\]
which we can evaluate numerically. Note, $\tilde\ell_N$ is the (re-scaled) log-likelihood associated to the misspecified regression model described by
\[
Y_i = \tilde{\mathcal G}(\theta)(Z_i) + \tilde\varepsilon_i, \qquad \theta \in \Theta, \quad i = 1, \dots, N, \tag{10}
\]
with independent $\tilde\varepsilon_i \sim \mathcal N(0, s_i^2\, \mathrm{Id}_V)$. The law generating Eq. (10) is analogously denoted by $Q^N_\theta$. Its probability density function w.r.t. $\bigotimes_{i=1}^N \mathcal L^{d_V} \otimes \zeta$ is given for $\theta \in \Theta$, $(y, z) \in (V \times \mathcal Z)^N$, by
\[
q^N_\theta(y, z) := \prod_{i \le N} q^{(i)}_\theta(y_i, z_i) := \prod_{i \le N} \Big\{ (2\pi s_i^2)^{-\frac{d_V}{2}} \exp\Big( -\frac{1}{2 s_i^2} \big| y_i - \tilde{\mathcal G}(\theta)(z_i) \big|^2_V \Big) \Big\}.
\]
In other words, given the data $D_N \sim P^N_{\theta_0}$ with not fully accessible log-likelihood $\ell_N$, we replace the true log-likelihood $\ell_N$ by its surrogate $\tilde\ell_N$. In the Bayesian approach, this means we consider the corresponding surrogate posterior distribution on $(\Theta, \mathcal B_\Theta)$, which we define as
\[
\forall B \in \mathcal B_\Theta : \quad \tilde\Pi_N(B \mid D_N) = \frac{\int_B e^{\tilde\ell_N(\theta)}\, d\Pi_N(\theta)}{\int_\Theta e^{\tilde\ell_N(\theta)}\, d\Pi_N(\theta)}. \tag{11}
\]
The goal of this work is to show that $\tilde\Pi_N(\cdot \mid D_N)$ is statistically reliable, under $P^N_{\theta_0}$-probability, for inferring the unknown parameter of interest $\theta_0$. To that end, we provide contraction results in Section 3.

2.4 Regularity conditions on the forward map

We now impose analytical assumptions on the forward map $\mathcal G$.

Condition 2.2 (Forward Regularity). Let $\Theta \subseteq L^2(\mathcal M, W)$ be the parameter space. Let $(\mathcal R, \|\cdot\|_{\mathcal R})$ be a separable normed subspace of $\Theta$ such that $(\mathcal R, \|\cdot\|_{\mathcal R}) \hookrightarrow (B^\eta, \|\cdot\|_{B^\eta})$, where $B^\eta$ is either $C^\eta(\mathcal M, W)$ or $H^\eta(\mathcal M, W)$ for some $\eta \ge 0$.

[FR1] For all $M > 0$ there exist constants $C_{\mathrm{Lip},2}(M) > 0$ and $\kappa \ge 0$ such that for all $\theta_1, \theta_2 \in \mathcal R(M)$,
\[
\|\mathcal G(\theta_1) - \mathcal G(\theta_2)\|_{L^2_\zeta(\mathcal Z, V)} \le C_{\mathrm{Lip},2}(M) \times \|\theta_1 - \theta_2\|_{(H^\kappa(\mathcal M, W))^*}.
\]
[FR2] There exist constants $C_{\mathcal G, B} > 0$ and $\gamma_B \ge 0$ such that for all $\theta \in \mathcal R$,
\[
\|\mathcal G(\theta)\|_\infty \le C_{\mathcal G, B} \times \big( 1 + \|\theta\|_{\mathcal R}^{\gamma_B} \big).
\]

[FR3] For all $M > 0$ there exists a constant $C_{\mathrm{Lip},\infty}(M) > 0$ such that for all $\theta_1, \theta_2 \in \mathcal R(M)$,
\[
\|\mathcal G(\theta_1) - \mathcal G(\theta_2)\|_\infty \le C_{\mathrm{Lip},\infty}(M) \times \|\theta_1 - \theta_2\|_{B^\eta}.
\]

Remark 2.3 (Forward Regularity).

i) In the following, we refer to $(\mathcal R, \|\cdot\|_{\mathcal R})$ as the regularization space, which is typically the largest possible space on which the conditions in Condition 2.2 are still satisfied.

ii) Note, the general theory developed in Nickl (2023) utilizes [FR1], which implies that the induced prior is well specified on the set of admissible regression functions, and a weaker version of [FR2], namely that $\mathcal G$ is uniformly bounded on bounded balls of $\mathcal R$. In this work, we need to trace the dependence on $\theta$ more carefully, since later in Condition 3.2 and Condition 3.3 we require at most linear or quadratic (polynomial) growth in order to prove posterior contraction in misspecified models. Furthermore, we want to highlight that [FR2] includes uniformly bounded forward maps ($\gamma_B = 0$), such as the solution maps associated to PDE-constrained regression models driven by the Darcy problem and the time-(in)dependent Schrödinger equation (see Nickl et al. (2020); Kekkonen (2022)). In Section 4, we will apply the present general theory to PDE-constrained regression models driven by the non-linear reaction–diffusion equation and the 2D Navier–Stokes equation. For the latter, it is shown in Nickl and Titi (2024) and Konen and Nickl (2025) that the corresponding solution map, mapping the initial condition $\theta$ to the solution $u_\theta$ of the dynamical system, satisfies the assumptions imposed in Condition 2.2, in particular [FR2] with some $\gamma_B > 0$. Following the theory provided in Nickl (2024), where [FR1] and [FR3] are shown for the solution map of the non-linear reaction–diffusion equation, we derive [FR2] in Lemma C.1.
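As noted in Section 1, posterior computation based on the surrogate (9) typically proceeds by MCMC, which only requires evaluations of $\tilde\ell_N$. A minimal sketch of a preconditioned Crank–Nicolson (pCN) sampler in the spirit of Cotter et al. (2013), targeting the surrogate posterior (11); the finite-dimensional standard Gaussian prior and linear toy forward map are both simplifying assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def pcn(surrogate_loglik, prior_draw, n_iter=5000, beta=0.1):
    """pCN MCMC: the proposal sqrt(1-beta^2)*theta + beta*xi, xi ~ Pi', preserves
    the Gaussian prior, so the accept step involves only the log-likelihood."""
    theta = prior_draw()
    ll = surrogate_loglik(theta)
    chain = []
    for _ in range(n_iter):
        prop = np.sqrt(1.0 - beta**2) * theta + beta * prior_draw()
        ll_prop = surrogate_loglik(prop)
        if np.log(rng.uniform()) < ll_prop - ll:   # accept w.p. min(1, e^{ll_prop - ll})
            theta, ll = prop, ll_prop
        chain.append(theta.copy())
    return np.array(chain)

# Toy surrogate target: linear 'forward map' A and proxy variances s2.
n_obs, p = 100, 2
theta0 = np.array([1.0, -0.5])
A = rng.normal(size=(n_obs, p))
s2 = rng.uniform(0.8, 1.2, size=n_obs)             # proxy noise variances
Y = A @ theta0 + rng.normal(0.0, np.sqrt(s2))
loglik = lambda th: -0.5 * np.sum((Y - A @ th) ** 2 / s2)
chain = pcn(loglik, lambda: rng.normal(size=p))
print(chain[1000:].mean(axis=0))                   # near theta0 = [1.0, -0.5]
```

The design choice behind pCN is that the proposal is reversible with respect to the Gaussian prior, so the acceptance ratio is prior-free; this is what makes it well suited to the rescaled Gaussian priors of Condition 2.4.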
2.5 Conditions on the Prior

Condition 2.4 (Base Prior). Under the conditions imposed in Condition 2.2, let $\Pi'$ be a centered Gaussian measure on the linear subspace $\Theta \subseteq L^2(\mathcal M, W)$ with reproducing kernel Hilbert space (RKHS) $\mathcal H$, such that $\mathcal H \hookrightarrow \mathcal R$. For some $\alpha > 0$, assume that either $\mathcal H \hookrightarrow H^\alpha_c(\mathcal M, W)$ if $\kappa \ge \tfrac12$, or $\mathcal H \hookrightarrow H^\alpha(\mathcal M, W)$ if $\kappa < \tfrac12$. Further, assume that $\Pi'(\theta \in \Theta : \|\theta\|_{\mathcal R} < \infty) = 1$.

Remark 2.5.

i) Typical choices of $\mathcal H$ and $\mathcal R$ from Condition 2.4 we have in mind are $\mathcal H = H^\alpha(\mathcal M, W)$ and $\mathcal R = H^\beta(\mathcal M, W)$ for appropriate $\alpha > \beta$, which will be particularly important in Section 4.

ii) Several constructions of Gaussian priors that satisfy Condition 2.4 have been discussed in Nickl (2023). In Section A, we summarize these discussions, including explicit constructions that are suitable for the reaction–diffusion equation and the 2D Navier–Stokes equation; see particularly Example A.1 and Example A.3. The results in this work are presented for Gaussian process priors, while finite-dimensional (sieve) priors could also be used with some minor changes in the proofs, see also Remark A.2 iii).

3 Posterior Contraction for misspecified models

In the rest of the section, let $\mathcal G$ be a forward map satisfying the forward regularity conditions formulated in Condition 2.2 for some $\kappa \ge 0$, and let $\Pi'$ be a base Gaussian prior satisfying Condition 2.4 with $\alpha > 0$. Let $\Pi_N$ be the corresponding sequence of rescaled priors from Eq. (8) with
\[
\delta_N = N^{-\frac{\alpha + \kappa}{2\alpha + 2\kappa + d}}. \tag{12}
\]
The main result of this section, Theorem 3.12, is a posterior contraction theorem at rate $\delta_N$ around the fixed ground truth $\theta_0 \in \Theta$ for the misspecified posterior distribution $\tilde\Pi_N(\cdot \mid D_N)$ under $P^N_{\theta_0}$-probability, provided the misspecification of the noise variances ($s_i^2$ for $\sigma_i^2$) or of the approximate map ($\tilde{\mathcal G}$ for $\mathcal G$) is sufficiently small. To that end, we need some assumptions on the noise variances $\sigma_1^2,$
$\dots, \sigma_N^2$, as well as on the level of misspecification.

Condition 3.1 (Noise variances).

[NV] The error variances $\sigma_1^2, \dots, \sigma_N^2 > 0$ satisfy
\[
0 < \sigma_0^2 := \min_{i \le N} \sigma_i^2 \le \max_{i \le N} \sigma_i^2 =: \sigma_\infty^2.
\]
We will consider [NV] to hold implicitly in the rest of the paper.

Condition 3.2 (Noise misspecification). For $N \in \mathbb N$, let $s_1^2, \dots, s_N^2 > 0$ be the sequence of proxy variances used in place of $\sigma_1^2, \dots, \sigma_N^2$.

[NM1] Let $0 < s_0^2 := \min_{i \le N} s_i^2 \le \max_{i \le N} s_i^2 =: s_\infty^2$, such that $\bar s_N^{-2} := \frac1N \sum_{i=1}^N s_i^{-2} \le s_0^{-2}$.

[NM2] We have $\max_{i \le N} \big| 1 - \frac{\sigma_i^2}{s_i^2} \big| = \tilde\delta_{\mathrm{noise},N}$ for some sequence $\tilde\delta_{\mathrm{noise},N} > 0$, and either

  [NM2.1] the variance is consistently overestimated, that is, $s_i^2 > \sigma_i^2$ for all $i \le N$, and $\tilde\delta_{\mathrm{noise},N} \to 0$ as $N \to \infty$;

  [NM2.2] or $\tilde\delta_{\mathrm{noise},N} \le C_{\mathrm{noise}} \times \delta_N^2$ for a sufficiently small constant $C_{\mathrm{noise}} > 0$, and the proxy forward map $\tilde{\mathcal G}$ satisfies [FR2] with $\gamma_B \in [0, 1]$.

Condition 3.3 (Model misspecification). Let $\tilde{\mathcal G}$ be as in Section 2.3. Further:

[MM1] The proxy operator $\tilde{\mathcal G}$ satisfies [FR2] with $\gamma_B \in [0, 2]$.

[MM2] Let $M > 0$. There exist a constant $c(M) > 0$ and a sequence $\tilde\delta_{\mathrm{model},N} > 0$ such that
\[
\|\mathcal G(\theta) - \tilde{\mathcal G}(\theta)\|_\infty \le c(M) \times \tilde\delta_{\mathrm{model},N} \quad \text{for all } \theta \in \mathcal R(M),
\]
with $\tilde\delta_{\mathrm{model},N} \le C_{\mathrm{model}} \times \delta_N^2$ for some sufficiently small constant $C_{\mathrm{model}} > 0$.

Remark 3.4 (Interpretation of Condition 3.2 and Condition 3.3).

i) [NM1] means that the surrogate variances obey bounds similar to those of the true variances in [NV], and prevents the use of proxies that would vastly underestimate the correct ones. Concerning [NM2.1], overestimating the variance can be related to the approach of fractional posteriors mentioned in Section 1, as it indeed amounts to using $s_i^2 = \sigma_i^2/\alpha_N$ as proxy variance for some $0 < \alpha_N < 1$ with $\alpha_N \to 1$, effectively raising the original likelihood Eq. (7) to the power $\alpha_N$.
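The rate $\delta_N$ from (12), which also calibrates the admissible misspecification levels $\tilde\delta_{\mathrm{noise},N}, \tilde\delta_{\mathrm{model},N} \lesssim \delta_N^2$ above, is fully explicit and can be evaluated directly:

```python
def delta_N(N, alpha, kappa, d):
    """Contraction/rescaling rate (12):
    delta_N = N^{-(alpha + kappa)/(2 alpha + 2 kappa + d)}."""
    return N ** (-(alpha + kappa) / (2 * alpha + 2 * kappa + d))

# alpha = 2, kappa = 0, d = 1 recovers the classical nonparametric rate N^{-2/5};
# the misspecification thresholds in [NM2.2] and [MM2] then scale like
# delta_N^2 = N^{-4/5}.
print(delta_N(10**4, alpha=2, kappa=0, d=1))       # 10^(-1.6) ≈ 0.0251
print(delta_N(10**4, alpha=2, kappa=0, d=1) ** 2)
```

The exponent makes the trade-off quantitative: the smoother the truth (larger $\alpha$) and the milder the required misspecification decay, relative to the parameter $\kappa$ entering through [FR1].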
ii) The 'smallness' condition on $C_{\mathrm{noise}}$ and $C_{\mathrm{model}}$ is relative to the small ball exponent of the prior (as defined in Eq. (35)), and will play a part in the proofs of Proposition 3.9 and Theorem 3.17, Eq. (17). For the examples in Section 4 we ignore this technicality, replacing these constants with $1/(\log N)$.

3.1 Preliminary results

We start with some preliminary results. Following the proof strategies in Ghosal and van der Vaart (2017) and Nickl (2023), a standard posterior contraction proof relies on two main conditions: a small ball condition and the existence of tests. In the presence of misspecification, we exhibit a third, so-called change of measure condition, which is crucial in nonlinear problems to control the behaviour of the posterior distribution. We show that under Condition 3.2 and Condition 3.3, these requirements are satisfied. For conciseness, we state here the key lemmas and propositions; the remaining proofs can be found in detail in Section 5.

We define for any $\theta_1, \theta_2 \in \Theta$ the shorthand notation
\[
d_{\mathcal G}(\theta_1, \theta_2) := \|\mathcal G(\theta_1) - \mathcal G(\theta_2)\|_{L^2_\zeta(\mathcal Z, V)},
\]
noting that this defines a semi-metric on the parameter space $\Theta$. We analogously define $d_{\tilde{\mathcal G}}$. Given a fixed constant $U > 0$, we define the sets
\[
\tilde B_N := \big\{ \theta \in \Theta : d_{\tilde{\mathcal G}}(\theta, \theta_0) \le \delta_N, \ \|\tilde{\mathcal G}(\theta)\|_\infty \le U \big\}.
\]

Small ball computations

Proposition 3.5 (Information Inequality). Under the misspecification assumptions Condition 3.2 and Condition 3.3, we have the following properties.

i) There exists a constant $c_1 = c_1(\theta_0, s_0^2) > 0$ such that for all $\theta \in \tilde B_N$,
\[
-E^N_{\theta_0}\Big[ \log\Big( \frac{q^N_\theta}{q^N_{\theta_0}} \Big) \Big] \le \frac{1}{2} N s_0^{-2} \times d_{\tilde{\mathcal G}}(\theta, \theta_0)^2 + c_1 N \tilde\delta_{\mathrm{model},N} \times d_{\tilde{\mathcal G}}(\theta, \theta_0).
\]

ii) There exists a constant $c_2 = c_2(U, s_0^2, \sigma_\infty^2) > 0$ such that for all $\theta \in \tilde B_N$ and all $i \le N$,
\[
E^{(i)}_{\theta_0}\bigg[ \Big( \log\frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}} - E^{(i)}_{\theta_0}\Big[ \log\frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}} \Big] \Big)^2 \bigg] \le c_2 \times d_{\tilde{\mathcal G}}(\theta, \theta_0)^2.
\]
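To see where the two terms in i) originate, one can carry out the single-observation computation. The following sketch uses only the Gaussian form of $q^{(i)}_\theta$, the decomposition $Y_i = \mathcal G(\theta_0)(Z_i) + \varepsilon_i$, and [MM2] (with $\theta_0 \in \mathcal R(M)$ assumed). Abbreviating $h := \tilde{\mathcal G}(\theta_0) - \tilde{\mathcal G}(\theta)$ and expanding the squares,

```latex
-E^{(i)}_{\theta_0}\Big[\log\frac{q^{(i)}_{\theta}}{q^{(i)}_{\theta_0}}\Big]
  = \frac{1}{2s_i^2}\, E^{(i)}_{\theta_0}\Big[\big|Y_i-\tilde{\mathcal G}(\theta)(Z_i)\big|_V^2
      - \big|Y_i-\tilde{\mathcal G}(\theta_0)(Z_i)\big|_V^2\Big]
  = \frac{1}{2s_i^2}\, E\Big[\,|h(Z_i)|_V^2
      + 2\big\langle (\mathcal G-\tilde{\mathcal G})(\theta_0)(Z_i),\, h(Z_i)\big\rangle_V\Big],
% the noise term vanishes since eps_i is centered and independent of Z_i;
% by Cauchy-Schwarz and [MM2]:
\le \frac{1}{2s_0^2}\, d_{\tilde{\mathcal G}}(\theta,\theta_0)^2
  + s_0^{-2}\,\big\|(\mathcal G-\tilde{\mathcal G})(\theta_0)\big\|_{L^2_\zeta(\mathcal Z, V)}\, d_{\tilde{\mathcal G}}(\theta,\theta_0)
\le \frac{1}{2s_0^2}\, d_{\tilde{\mathcal G}}(\theta,\theta_0)^2
  + c(M)\, s_0^{-2}\,\tilde\delta_{\mathrm{model},N}\, d_{\tilde{\mathcal G}}(\theta,\theta_0).
```

Summing over $i \le N$ yields the bound in i), with $c_1$ absorbing $c(M)\, s_0^{-2}$; in the well-specified case $\mathcal G = \tilde{\mathcal G}$ the cross term disappears and the classical quadratic Kullback–Leibler bound is recovered.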
Note that the noise variance misspecification does not play a part here, beyond affecting the multiplicative constants in front of $d_{\tilde{\mathcal G}}(\theta, \theta_0)$. From the proposition one can then easily derive the following auxiliary lemma, which shows that the denominator in the formula of the posterior measure is bounded away from 0 on events of high $P^N_{\theta_0}$-probability.

Lemma 3.6 (Auxiliary contraction). In the setting of Proposition 3.5, let $\nu$ be a probability measure on some (measurable) subset $B_N \subseteq \tilde B_N$. For the surrogate log-likelihood $\tilde\ell_N$ from Eq. (9), we have for all $K > s_0^{-2}$
\[ P^N_{\theta_0}\Big( \int_{B_N} e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)}\, d\nu(\theta) \le e^{-K N \delta_N^2} \Big) \xrightarrow[N \to \infty]{} 0. \]

Existence of tests

As already discussed above, in homoscedastic models the existence of tests $\Psi_N$ is a standard result, which follows for instance from Theorem 7.1.4 in Giné and Nickl (2021). Due to the heteroscedasticity of the observation scheme Eq. (1), we need to construct tests explicitly, with sufficiently controlled type-I and type-II errors. We introduce the following regularization sets. Given $m > 0$, we define
\[ \Theta_N(m) := \big\{ \theta \in \mathcal R : \theta = \theta_1 + \theta_2,\ \|\theta_1\|_{(H^\kappa(\mathcal M, W))^*} \le m\delta_N,\ \|\theta_2\|_{\mathcal H} \le m,\ \|\theta\|_{\mathcal R} \le m \big\}. \quad (13) \]

Proposition 3.7 (Existence of tests). Let $\mathcal G$ satisfy [FR1]-[FR3]. Let $\mathcal H$ be the RKHS from Condition 2.4 with $\alpha > \eta + d$. Let $N \in \mathbb N$ and assume [NV]. Let $D^N \sim P^N_{\theta_0}$ with fixed $\theta_0 \in \mathcal H$. Let $\delta_N$ be as in Eq. (12). Given $\bar c > 0$, there exists a sequence of tests (indicator functions) $\Psi_N = \Psi_N(D^N)$ such that
\[ \lim_{N \to \infty} E^N_{\theta_0}[\Psi_N] = 0 \quad \text{and} \quad \sup_{\theta \in \Theta_N(m):\ \|\mathcal G(\theta) - \mathcal G(\theta_0)\|_{L^2_\zeta(\mathcal Z, V)} \ge \rho\delta_N} E^N_\theta[1 - \Psi_N] \lesssim \exp\big( -\bar c N \delta_N^2 \big) \]
for all $\rho = \rho(\mathcal S)$, $m = m(\theta_0) > 0$ and $N$ sufficiently large, where $\mathcal S := \{\bar c, \alpha, \gamma_B, \kappa, \eta, d, d_W, m, C_{\mathcal G, B}, C_{\mathrm{Lip},2}, C_{\mathrm{Lip},\infty}, C_{\mathrm{var}}, \sigma_0, \sigma_\infty\}$.
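One concrete way to build such tests is to threshold the fit of a penalized least-squares estimator. A minimal one-dimensional sketch, with a hypothetical forward map, grid search in place of true optimization, and illustrative constants throughout:

```python
import numpy as np

# Toy sketch of a test: fit a Tikhonov-penalized least-squares estimator and
# reject when its fitted forward map is far from that of theta_0. The map G,
# the grid, and all thresholds are illustrative assumptions, not the paper's.
rng = np.random.default_rng(0)

def G(theta, z):
    return np.sin(theta * z)                  # hypothetical scalar forward map

def theta_hat(Y, Z, delta, grid):
    # maximizer of the Tikhonov-type functional
    #   -(1/2N) sum_i |Y_i - G(theta)(Z_i)|^2 - (delta^2/2) * theta^2
    obj = [-np.mean((Y - G(t, Z)) ** 2) / 2 - 0.5 * delta ** 2 * t ** 2
           for t in grid]
    return grid[int(np.argmax(obj))]

theta0, N = 1.3, 4000
delta = N ** (-1 / 3)                         # stand-in contraction rate
Z = rng.uniform(0, 1, N)
sigma_i = 0.2 + 0.3 * rng.uniform(size=N)     # heteroscedastic noise levels
Y = G(theta0, Z) + sigma_i * rng.standard_normal(N)
grid = np.linspace(0.0, 3.0, 601)

t_hat = theta_hat(Y, Z, delta, grid)
d_fit = np.sqrt(np.mean((G(t_hat, Z) - G(theta0, Z)) ** 2))
Psi = 1 if d_fit >= 5 * delta else 0          # accept (Psi = 0) under theta_0
print(t_hat, Psi)
```

In the actual construction below, $d_{\mathcal G}(\hat\theta_N, \theta_0)$ is of course not observable; the point of the sketch is only the shape of the penalized objective and of the acceptance event.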
For the proof of Proposition 3.7, we use concentration properties of estimators as proposed in Giné and Nickl (2011). We derive in Corollary B.7 that the maximizer $\hat\theta_N$ of the following Tikhonov-type functional
\[ \mathcal H(m) \ni \theta \mapsto -\frac{1}{2N} \sum_{i \le N} |Y_i - \mathcal G(\theta)(Z_i)|_V^2 - \frac{\delta_N^2}{2} \|\theta\|_{\mathcal H}^2, \]
defined on balls $\mathcal H(m)$ of the RKHS $\mathcal H$ for $m > 0$ sufficiently large, exists and is consistent in the sense that
\[ P^N_{\theta_0}\big( d^2_{\delta_N}(\hat\theta_N, \theta_0) \ge c\delta_N^2 \big) \xrightarrow[N \to \infty]{} 0, \qquad d^2_{\delta_N}(\hat\theta_N, \theta_0) := d_{\mathcal G}(\hat\theta_N, \theta_0) + \delta_N^2 \|\hat\theta_N\|_{\mathcal H}^2, \]
for some $c > 0$ sufficiently large. Defining the events $A_N := \{ d^2_{\delta_N}(\hat\theta_N, \theta_0) \ge c\delta_N^2 \}$, we show in Corollary B.9 that the resulting sequence of tests $\Psi_N := 1_{A_N}$ has the desired properties. In fact, in Corollary B.9 we can abandon the Gaussian assumption on the measurement errors $\varepsilon_1, \ldots, \varepsilon_N$ and require only a Bernstein condition (see Condition B.1), due to the well-known robustness of M-estimation techniques, which we again demonstrate in Theorem B.6 and Corollary B.7, respectively.

Change of measure

In the proof of the main theorem, Theorem 3.12, it will become apparent that conditions are required to control the effect of misspecification on the (log-)likelihood: this is the purpose of the following two propositions.

Proposition 3.8 (Change of measure I: inside the regularization set). Suppose Condition 3.2 and Condition 3.3 are satisfied. Then, for all $M > 0$ and $b > 0$, there exists $c_5 = c_5(b, s_0^{-2}, \sigma_\infty^2, M, C_{\tilde{\mathcal G}}, C_{\mathrm{noise}}, C_{\mathrm{model}}) > 0$ such that
\[ \forall \theta \in \mathcal R(M): \quad E^N_\theta\Big[ \Big( \frac{q^N_\theta}{q^N_{\theta_0}} \frac{p^N_{\theta_0}}{p^N_\theta} \Big)^b \Big] \le \exp\big( c_5 \times N\delta_N^2 \big). \]

Proposition 3.9 (Change of measure II: outside the regularization set). Grant Condition 3.2 and Condition 3.3, with constants $C_{\mathrm{noise}}, C_{\mathrm{model}}$ small enough compared to the small ball exponent of the prior (see Eq. (35)). Let $M > 0$.
Then there exists $c_6 = c_6(M, C_{\tilde{\mathcal G}, B}, \theta_0, s_0^{-2}, C_{\mathrm{noise}}, C_{\mathrm{model}}) > 0$ such that
\[ \int_{\Theta_N(M)^c} E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} \big]\, d\Pi_N(\theta) \le \exp\big( -c_6 \times N\delta_N^2 \big), \]
where the constant $c_6$ can be made as large as desired by increasing $M$.

Remark 3.10. Proposition 3.9 is reminiscent of Equation (2.13) in Kleijn and van der Vaart (2006). Here, working with the slicing technique enables us to cover a wider range of PDE problems, not necessarily uniformly bounded over the parameter space.

3.2 Basic contraction theorem

A final requirement is a mass condition on the prior:

Proposition 3.11. Let $\Pi_N$ be the sequence of rescaled Gaussian priors as above with $\delta_N$ as in Eq. (12), such that $N\delta_N^2 \to \infty$ as $N \to \infty$. Under Condition 3.3, there exists some $A = A(d_W, \tilde{\mathcal G}, \mathcal G, \theta_0) > 0$ such that for all $N$ large enough, we have
\[ \Pi_N(\tilde B_N) \ge e^{-A N \delta_N^2}. \quad (14) \]

We are now able to state the main theorem:

Theorem 3.12 (Posterior contraction). Let $\mathcal H$ and $\mathcal R$ be as in Condition 2.4 with $\alpha > \eta + d$. Let $\mathcal G$ satisfy Condition 2.2. Let $D^N \sim P^N_{\theta_0}$ be data arising as in Eq. (6), for fixed $\theta_0 \in \mathcal H$. Let $\Pi_N$ be the sequence of rescaled Gaussian priors as above with $\delta_N$ as in Eq. (12), such that $N\delta_N^2 \to \infty$ as $N \to \infty$. Let $\tilde\Pi_N(\cdot \mid D^N)$ be the surrogate posterior distribution arising as in Section 2. Grant Condition 3.2 and Condition 3.3, with constants $C_{\mathrm{noise}}, C_{\mathrm{model}}$ small enough compared to the small ball exponent of the prior (see Eq. (35)). Let $A$ be as in the setting of Proposition 3.11, and let $M$ be large enough such that $c_6$ from Proposition 3.9 satisfies $c_6 > A + s_0^{-2}$. Then for all $0 < b < c_6 - A - s_0^{-2}$, we can find $\rho > 0$ large enough such that
\[ P^N_{\theta_0}\Big( \tilde\Pi_N\big( \theta \in \Theta_N(M) : d_{\mathcal G}(\theta, \theta_0) \le \rho\delta_N \mid D^N \big) \le 1 - e^{-bN\delta_N^2} \Big) \xrightarrow[N \to \infty]{} 0. \quad (15) \]

Remark 3.13. Our approach covers the case of strongly consistent plug-in estimators as proxy variances and proxy forward map.
In practice, it could also happen, e.g. in the case of noise misspecification, that the variance estimator converges in a weaker sense (in probability), or that in a fully Bayesian approach one prefers adopting a hierarchical approach and putting a prior on $\sigma^2$. Our contraction guarantees remain valid, holding for the conditional posterior of $\theta$ given $\sigma^2$.

Remark 3.14 (About the contraction rate). The resulting rate Eq. (12) matches standard nonparametric forward convergence rates, whose minimax optimality has been established in special cases, such as the Darcy problem (see Nickl et al. (2020)). In our approach, we fix the contraction rate $\delta_N$ and set the desired decay of $\tilde\delta_{\mathrm{model},N}, \tilde\delta_{\mathrm{noise},N}$ accordingly. In practice one might be limited by the decay of these misspecification rates: note then that the posterior contraction results still hold, with statements on $\delta_N$ replaced by a slower contraction rate, and the prior renormalisation also modified suitably, as long as Conditions 3.2 and 3.3 connecting contraction and misspecification rates still hold. This yields the following corollary.

Corollary 3.15. Consider the setting of Theorem 3.12, replacing the rate $\delta_N$ from Eq. (12) everywhere by $\delta'_N = \log N\, \big( \tilde\delta_{\mathrm{noise},N} \vee \tilde\delta_{\mathrm{model},N} \big)^{1/2}$. Then for all $0 < b < c_6 - A - s_0^{-2}$, we can find $\rho > 0$ large enough such that
\[ P^N_{\theta_0}\Big( \tilde\Pi_N\big( \theta \in \Theta_N(M) : d_{\mathcal G}(\theta, \theta_0) \le \rho\delta'_N \mid D^N \big) \le 1 - e^{-bN\delta'^2_N} \Big) \xrightarrow[N \to \infty]{} 0. \]

3.3 Contraction result for the inverse problem

While the last theorem proves contraction for the forward problem, we need the following inverse modulus of continuity to provide a corresponding contraction theorem on the parameter level.

Condition 3.16 (Inverse modulus of continuity). For any $\delta, M > 0$ define
\[ \Lambda_\delta := \big\{ (\theta_1, \theta_2) \in (\Theta \cap \mathcal R(M))^2 : d_{\mathcal G}(\theta_1, \theta_2) \le \delta \big\}. \]
There exist constants $\tau > 0$ and $C_{\mathcal G, \mathrm{inv}}(M) > 0$ such that for $\delta$ small enough
\[ \sup_{(\theta_1, \theta_2) \in \Lambda_\delta} \|\theta_1 - \theta_2\|_{L^2(\mathcal M, W)} \le C_{\mathcal G, \mathrm{inv}}(M) \times \delta^\tau. \quad (IR1) \]

Theorem 3.17 (Posterior contraction, inverse problem). Grant the assumptions of Theorem 3.12. Assume additionally that Condition 3.16 holds true for some $\tau > 0$. We then have
\[ P^N_{\theta_0}\Big( \tilde\Pi_N\big( \theta \in \Theta_N(M) : \|\theta - \theta_0\|_{L^2(\mathcal M, W)} \le C_{\mathcal G, \mathrm{inv}}(M)(\rho\delta_N)^\tau \mid D^N \big) \le 1 - e^{-bN\delta_N^2} \Big) \xrightarrow[N \to \infty]{} 0. \quad (16) \]
Moreover, denoting by $E^{\tilde\Pi}[\cdot \mid D^N]$ the surrogate posterior mean, we have
\[ \big\| E^{\tilde\Pi}[\theta \mid D^N] - \theta_0 \big\|_{L^2(\mathcal M, W)} = O_{P^N_{\theta_0}}(\delta_N^\tau). \quad (17) \]
The proof of Eq. (16) follows easily from Theorem 3.12 and Condition 3.16. The proof of Eq. (17) requires more care, in particular the use of the change of measure conditions detailed in Section 3.1: details can be found in Section 5.

4 Examples

In this section, we apply the theoretical results established in Section 3 to three illustrative examples of nonlinear and time-dependent PDE-based inverse problems, where the goal is to infer the initial condition $\theta$ of the system at time $t = 0$.

1. Noise misspecification in reaction-diffusion: we first consider a time-evolution problem where the primary challenge lies not in the PDE model but in the observation noise. We focus on a general heteroscedastic setting where sensor noise variances are unknown and must be estimated from auxiliary data.

2. Model misspecification via parameter uncertainty in Navier-Stokes: we study the case where other physical parameters governing the PDE are only known approximately.

3. Model misspecification via numerical approximation: the PDE solution is computed approximately, in our case via an Oseen iterative scheme.

4.1 Example 1: Noise misspecification in the reaction-diffusion equation

We begin by addressing the problem of noise misspecification in a dynamical setting.
In many practical data assimilation scenarios, observations are gathered from a network of sensors where the precision (noise variance) may vary from one sensor to another and is not known a priori. In this example, we ignore misspecification arising from the PDE map itself and isolate the problem of heteroscedastic misspecified noise.

In this setting, it makes sense to consider observations arising from a fixed design for the spatial covariate. The random design employed in Eq. (20) facilitates a simpler presentation and is essentially a technical choice: it can be shown to be asymptotically equivalent to other commonly used nonparametric regression models, see for instance Reiß (2008). In Vollmer (2013), a condition on the empirical distribution of design points is used.

Construction of the variance proxy. Let us then consider a fixed design setting with $L_X \in \mathbb N$ sensors densely distributed across the spatial domain at locations $x_1, \ldots, x_{L_X}$, measuring the solution $u_{\theta_0}$ of Eq. (19) over time. Each sensor $j$ is associated with a measurement error $N(0, \sigma_j^2)$ with unknown variance $\sigma_j^2$. To estimate these variances, we collect observations at each spatial location over a small time window, at $t_1, \ldots, t_{L_T} \in [0, \Delta T]$, $L_T \in \mathbb N$:
\[ \Upsilon_{ij} = u_{\theta_0}(t_i, x_j) + \xi_{ij}, \quad j = 1, \ldots, L_X,\ i = 1, \ldots, L_T, \quad \xi_{ij} \sim N(0, \sigma_j^2). \]
We denote the law of $\Upsilon_{\cdot j} := (\Upsilon_{ij})_{i \le L_T}$ by $\tilde P^{L_T}_{\theta_0, j}$. Natural estimators for the variances $\sigma_j^2$ are then the sample variance estimators
\[ s_j^2 = \frac{1}{L_T - 1} \sum_{i=1}^{L_T} (\Upsilon_{ij} - \bar\Upsilon_j)^2, \quad \text{with } \bar\Upsilon_j = \frac{1}{L_T} \sum_{i=1}^{L_T} \Upsilon_{ij}. \quad (18) \]
These plug-in estimators $s_1^2, \ldots, s_{L_X}^2$ are subsequently used to compute the surrogate log-likelihood $\tilde\ell_N(\theta)$ and the resulting surrogate posterior distribution.

Parameter reconstruction. The underlying physical process governing the data generation is the reaction-diffusion equation. Let $\mathcal M = \mathbb T^d$ with $d \le 3$.
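The proxy construction in Eq. (18) can be sketched numerically: over a short window the solution barely moves, so the per-sensor sample variances recover $\sigma_j^2$ up to a small drift bias. The sensor model below (number of sensors, drift, noise levels) is an illustrative assumption, not the solution of Eq. (19).

```python
import numpy as np

# Sample-variance proxies as in Eq. (18): L_X sensors, each observed L_T
# times over a short window [0, dT] during which the signal barely moves.
rng = np.random.default_rng(1)
L_X, L_T, dT = 5, 20000, 1e-3
sigma = rng.uniform(0.3, 0.8, size=L_X)          # true per-sensor noise sd
t = np.linspace(0.0, dT, L_T)

def u(t, j):
    # toy stand-in for the PDE solution at sensor j (assumption, not Eq. (19))
    return np.sin(j + 1) + 0.5 * t               # drift bounded by 0.5 * dT

Upsilon = np.array([u(t, j) + sigma[j] * rng.standard_normal(L_T)
                    for j in range(L_X)])        # Upsilon[j, i]

# Eq. (18): unbiased sample variance per sensor
s2 = Upsilon.var(axis=1, ddof=1)
print(np.max(np.abs(s2 - sigma ** 2)))
```

Shrinking the window $\Delta T$ shrinks the deterministic drift contribution to $s_j^2$, which is exactly the bias term $b_T^2$ appearing in the proof of Theorem 4.1 below.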
This equation models the time evolution, over a fixed time horizon $T > 0$, of the concentration $u : [0, T] \times \mathbb T^d \to \mathbb R =: V$ of a substance from its initial condition $\theta$ via
\[ \frac{\partial}{\partial t} u - \Delta u = f(u) \ \text{on } (0, T] \times \mathbb T^d, \qquad u(\cdot, 0) = \theta \ \text{on } \mathbb T^d, \quad (19) \]
where $f : \mathbb R \to \mathbb R$ is a nonlinear reaction term modelling potential creation or destruction of the substance (Temam (1997)). If $f \in C_c^\infty(\mathbb R)$ and $\theta \in H^1(\mathbb T^d)$, it can be shown (see e.g. Evans (2010)) that there exists a solution of Eq. (19) that is unique in $C^0([0, T], L^2(\mathbb T^d))$.

We now turn to the inverse problem of recovering $\theta_0$. To that end, we observe another sample of data, drawn independently of $(\Upsilon_{ij})_{i \le L_T, j \le L_X}$. Precisely, we have $D^N := (Y_i, t_i, X_i)_{i=1}^N \sim P^N_{\theta_0}$ generated by
\[ Y_i = u_{\theta_0}(t_i, X_i) + \varepsilon_i, \quad (t_i, X_i) \overset{i.i.d.}{\sim} \mathrm{Unif}\big( [0, T] \times \{x_1, \ldots, x_{L_X}\} \big), \quad \varepsilon_i \overset{ind.}{\sim} N(0, \sigma_i^2), \quad i = 1, \ldots, N, \quad (20) \]
with $\theta_0 \in H^1(\mathbb T^d)$ and unknown variances $\sigma_1^2, \ldots, \sigma_N^2$ satisfying [NV]. With the noise estimators and forward model defined, we can now state the contraction result for this setup. The following theorem ensures that, despite using estimated variances, the surrogate posterior contracts around the ground truth.

Theorem 4.1. Let $\Pi'$ be a Gaussian process base prior satisfying Condition 2.4 with $\Theta = \mathcal R = H^\beta(\mathbb T^d)$, $\beta > 2 + d$, and RKHS $\mathcal H \hookrightarrow H^\alpha(\mathbb T^d)$ with $\alpha > \beta + \frac d2$. Let $\Pi_N$ be the corresponding rescaled prior from Eq. (8) with $\delta_N = N^{-\frac{\alpha}{2\alpha + d}}$. Let $D^N \sim P^N_{\theta_0}$ as in Eq. (20) with $\theta_0 \in \mathcal H$. Consider the surrogate posterior distribution $\tilde\Pi(\cdot \mid D^N)$ arising from that choice of prior and the surrogate log-likelihood $\tilde\ell_N(\theta)$ computed with the noise variance estimators $s_1^2, \ldots, s_{L_X}^2$ from Eq. (18), over a small enough time window $\Delta T$ such that the bias term $b_T$ satisfies $b_T^2 + L_T^{-1/2} \le \frac{1}{\log N}\delta_N^2$.
Then, for $N \to \infty$ and thus sufficiently many $L_T = L_T(N)$ past observations, the surrogate posterior $\tilde\Pi_N(\cdot \mid D^N)$ contracts around the ground truth $\theta_0$ at rate $\delta_N$; i.e., there exist $m, m' > 0$ sufficiently large such that
\[ \tilde\Pi_N\big( \theta \in H^\beta(\mathbb T^d) : \|u_\theta - u_{\theta_0}\|_{L^2([0,T] \times \mathbb T^d)} \le m\delta_N \mid D^N \big) = 1 - o_{P^N_{\theta_0}}(1). \]
Moreover, we have
\[ \tilde\Pi_N\big( \theta \in H^\beta(\mathbb T^d) : \|\theta - \theta_0\|_{L^2(\mathbb T^d)} \le m'\delta_N^{\frac{\beta}{\beta + 1}} \mid D^N \big) = 1 - o_{P^N_{\theta_0}}(1), \]
as well as $\big\| E^{\tilde\Pi}[\theta \mid D^N] - \theta_0 \big\|_{L^2(\mathbb T^d)} = O_{P^N_{\theta_0}}\big( \delta_N^{\frac{\beta}{\beta + 1}} \big)$.

Proof of Theorem 4.1. We choose the time window $\Delta T$ used to build the estimators $s_1^2, \ldots, s_{L_X}^2$ sufficiently small so that
\[ \max_{1 \le i \le L_T} |u_{\theta_0}(t_i, x_j) - u_{\theta_0}(0, x_j)| \le b_T, \quad b_T \to 0, \]
uniformly in $1 \le j \le L_X$ as $L_T \to \infty$. Then the deterministic bias term in $s_j^2$, arising from the variations in time of $u_{\theta_0}(t, x_j)$, is bounded by $b_T^2$. By the strong law of large numbers,
\[ s_j^2 = \sigma_j^2 + b_T^2 + O_p(L_T^{-1/2}) \quad (21) \]
under $\tilde P^{L_T}_{\theta_0, j}$ for every $1 \le j \le L_X$. Hence, Condition 3.2 [NM1], bounding the $s_j^2$'s away from 0 and $\infty$, follows from [NV] with $\tilde P^{L_T}_{\theta_0, j}$-probability arbitrarily close to one for $L_T$ large enough. [NM2] is also satisfied with $\tilde\delta_{\mathrm{noise},N} = \max_{1 \le i \le L_X} |1 - \sigma_i^2/s_i^2| \le \delta_N^2/\log N$, by Eq. (21) and by the theorem assumption. In particular, [NM2.2] follows from Lemma C.1 with $a = 2$ and the embedding $H^2(\mathbb T^d) \hookrightarrow C^0(\mathbb T^d)$, yielding [FR2] with $\gamma_B = 1$. Note that $\mathcal G : \theta \mapsto u_\theta$ satisfies [FR1] and [FR3] with $\kappa = 0$ and $\eta = 2$, which is shown in Section 3.1.3 in Nickl (2024). Further, Condition 3.16 is shown in Nickl (2024) (see equation (45)) with $\tau = \frac{\beta}{\beta + 1}$. The claim thus follows immediately from the main theorems, Theorem 3.12 and Theorem 3.17.

4.2 Example 2: Model misspecification in the Navier-Stokes equation

We now turn to the case of misspecification of the PDE forward map, ignoring possible noise misspecification.
In practical fluid dynamics and data assimilation, the governing physical laws are often well understood, but the specific physical parameters defining the system may only be known approximately: this is the example we address here.

Let $\mathcal M = \mathbb T^2$. The 2D Navier-Stokes equation describes the evolution of the velocity $u : [0, T] \times \mathbb T^2 \to V := \mathbb R^2$ of an incompressible fluid from an initial velocity $\theta$ at time $t = 0$, for a fixed time horizon $T > 0$. Given a viscosity $\nu > 0$, a scalar pressure $p : [0, T] \times \mathbb T^2 \to \mathbb R$, and some time-independent external forcing $f : \mathbb T^2 \to \mathbb R^2$, $u = u^{\nu, f}_\theta$ then solves
\[ \frac{\partial u}{\partial t} - \nu\Delta u + (u \cdot \nabla)u = f - \nabla p \ \text{on } (0, T) \times \mathbb T^2, \qquad u(0) = \theta \ \text{on } \mathbb T^2, \qquad \nabla \cdot u = 0 \ \text{on } [0, T] \times \mathbb T^2. \quad (22) \]
Among the physical parameters appearing in Eq. (22), the initial condition $\theta$ is the one that we want to infer; however, the other parameters $\nu$ and $f$ might only be known approximately via some estimates $\tilde\nu$ and $\tilde f$. As is common in the literature, we consider the projected equation obtained by applying the Leray projector Eq. (5) to Eq. (22). This leads to an equivalent formulation in functional form where the velocity field, as a map $u^{\nu, f}_\theta : [0, T] \to \dot H_\diamond$, is the solution to
\[ \frac{d}{dt} u + \nu A u + B[u, u] = f, \qquad u(0) = \theta, \quad (23) \]
with $A := -P\Delta$ and $B[u, v] := P[(u \cdot \nabla)v]$. It is well known that for any $f \in \dot H_\diamond$, $\nu, T > 0$ and any $\theta \in \dot H^1_\diamond$, there exists a solution of Eq. (23) that is unique in $C^0\big( [0, T], \dot H^1_\diamond \big) \cap L^2\big( [0, T], \dot H^2_\diamond \big)$, see e.g. Robinson (2001). Our observation scheme hence consists of data $D^N := (Y_i, t_i, X_i)_{i=1}^N \sim P^N_{\theta_0}$ generated by
\[ Y_i = u^{\nu, f}_{\theta_0}(t_i, X_i) + \varepsilon_i, \quad (t_i, X_i) \overset{i.i.d.}{\sim} \mathrm{Unif}\big( [0, T] \times \mathbb T^2 \big), \quad \varepsilon_i \overset{ind.}{\sim} N\big( 0, \sigma_i^2 \mathrm{Id}_{\mathbb R^2} \big), \quad i = 1, \ldots, N, \quad (24) \]
with $\theta_0 \in \dot H^1_\diamond$ and variances $\sigma_1^2, \ldots, \sigma_N^2$ satisfying [NV].

Theorem 4.2.
Let $\Pi'$ be a Gaussian process base prior satisfying Condition 2.4 with $\Theta = \mathcal R = \dot H^\beta_\diamond$, $\beta > 4$, and RKHS $\mathcal H \hookrightarrow \dot H^\alpha_\diamond$ with $\alpha > \beta + 1$. Let $\Pi_N$ be the corresponding rescaled prior from Eq. (8) with $\delta_N = N^{-\frac{\alpha}{2\alpha + 2}}$. Let $D^N \sim P^N_{\theta_0}$ as in Eq. (24) with $\theta_0 \in \mathcal H$, $f \in \dot H^1_\diamond$ and $\nu > 0$. Consider the surrogate posterior distribution $\tilde\Pi(\cdot \mid D^N)$ arising from that choice of prior and the surrogate log-likelihood $\tilde\ell_N(\theta)$ computed with the surrogate forward operator $\tilde{\mathcal G}(\theta) = u^{\tilde\nu, \tilde f}_\theta$, where the parameter approximations $\tilde\nu > 0$ and $\tilde f \in \dot H^1_\diamond$ satisfy
\[ |\tilde\nu - \nu| \lesssim \frac{1}{\log N}\delta_N^2 \quad \text{and} \quad \|f - \tilde f\|_{L^2([0,T], \dot H^1)} \lesssim \frac{1}{\log N}\delta_N^2. \quad (25) \]
Then the surrogate posterior $\tilde\Pi_N(\cdot \mid D^N)$ contracts around the ground truth $\theta_0$ at rate $\delta_N$; i.e., there exist $m, m' > 0$ sufficiently large such that
\[ \tilde\Pi_N\big( \theta \in \dot H^\beta_\diamond : \|u_\theta - u_{\theta_0}\|_{L^2([0,T] \times \mathbb T^2, \mathbb R^2)} \le m\delta_N \mid D^N \big) = 1 - o_{P^N_{\theta_0}}(1). \]
Moreover, we have
\[ \tilde\Pi_N\big( \theta \in \dot H^\beta_\diamond : \|\theta - \theta_0\|_{L^2(\mathbb T^2, \mathbb R^2)} \le m'\delta_N^{\frac{\beta}{\beta + 1}} \mid D^N \big) = 1 - o_{P^N_{\theta_0}}(1), \]
as well as $\big\| E^{\tilde\Pi}[\theta \mid D^N] - \theta_0 \big\|_{L^2(\mathbb T^2, \mathbb R^2)} = O_{P^N_{\theta_0}}\big( \delta_N^{\frac{\beta}{\beta + 1}} \big)$.

Proof of Theorem 4.2. Firstly, note that the associated forward map $\mathcal G : \theta \mapsto u^{\nu, f}_\theta$ of Eq. (23) satisfies the conditions imposed in Condition 2.2 and Condition B.3 with $\kappa = 0$, $\eta = 2$, $\gamma_B = 2$ and $\tau = \frac{\beta}{\beta + 1}$, as derived in Nickl and Titi (2024) and Konen and Nickl (2025). Analogously, $\tilde{\mathcal G} : \theta \mapsto u^{\tilde\nu, \tilde f}_\theta$ satisfies [MM1] (with a change of constants). [MM2] then follows from the stability of $u^{\nu, f}_\theta$ with respect to the parameters $\nu$ and $f$, as derived in Lemma C.4, as well as the Sobolev embedding $\dot H^2(\mathbb T^2) \hookrightarrow C^0(\mathbb T^2)$. Thus the claim follows immediately from an application of Theorem 3.12 and Theorem 3.17.

4.3 Example 3: Model misspecification from numerical approximation

We now address another source of model misspecification: numerical approximation. Looking at the 2D Navier-Stokes equation Eq.
(22), computing a corresponding solution $\mathcal G(\theta) = u^{\nu, f}_\theta$ is generally challenging, with the main difficulty coming from the nonlinear convection term $(u \cdot \nabla)u$. In practice, linearization via an iterative process is often used. Starting with an appropriate initializer $u^0$, the $l$-th (projected) iteration for $l \in \mathbb N_0$ is defined via
\[ \frac{d}{dt} u^l + \nu A u^l + B[u^{l-1}, u^l] = f, \qquad u^l(0) = \theta. \quad (26) \]
These are called Oseen equations and constitute a good approximation of Navier-Stokes in viscous flow settings under a smallness condition on the Reynolds number ($\nu \gg 1$, $\mathrm{Re} \ll 1$); see Girault and Raviart (1986) and Batchelor (1999) for details. These fixed point (Picard) iterations are known to converge linearly, so that by taking the number of iterations $L \in \mathbb N$ sufficiently high, the surrogate operator $\tilde{\mathcal G}(\theta) = u^L_\theta$ is good enough for the contraction of the surrogate posterior, as the following results show.

Proposition 4.3. Let $\nu > 0$, $f \in L^2([0, T], \dot H^1_\diamond)$, $\theta \in \dot H^1_\diamond$, and choose an initializer $u^0 \in L^2\big( [0, T], \dot H^2_\diamond \big)$. Beginning with $u^0$, for all $l \in \mathbb N$, the $l$-th iteration step of Eq. (26) has a solution
\[ u^l \in C^0\big( [0, T], \dot H^1_\diamond \big) \cap L^2\big( [0, T], \dot H^2_\diamond \big), \qquad \frac{d u^l}{dt} \in L^2\big( [0, T], \dot H^0_\diamond \big). \]
Now let $L = L_N \in \mathbb N$ be chosen sufficiently large such that
\[ \forall r > 0: \quad \sup_{\theta \in \dot H^2_\diamond(r)} \sup_{t \in [0, T]} \|u^L_\theta(t) - u^{L-1}_\theta(t)\|_{\dot H^2(\mathbb T^2)} \le C_{\mathrm{model}}(r) \times \frac{1}{\log N}\delta_N^2. \]
Assume further that this last iterate $u^L_\theta$ satisfies
\[ \forall \theta \in \dot H^2_\diamond: \quad \sup_{t \in [0, T]} \|u^L_\theta(t)\|_{\dot H^2(\mathbb T^2)} \le C_{\mathrm{Oseen}, B} \times \big( 1 + \|\theta\|^2_{\dot H^2} \big). \]
Then the surrogate forward map $\tilde{\mathcal G} : \dot H^2_\diamond \ni \theta \mapsto u^L_\theta \in \dot H^2_\diamond$ satisfies Condition 3.3.

The proof of Proposition 4.3 follows standard regularity estimates as presented in Konen and Nickl (2025) and can be found in Section C.

Theorem 4.4.
Let $\Pi'$ be a Gaussian process base prior satisfying Condition 2.4 with $\Theta = \mathcal R = \dot H^\beta_\diamond$, $\beta > 4$, and RKHS $\mathcal H \hookrightarrow \dot H^\alpha_\diamond$ with $\alpha > \beta + 1$. Let $\Pi_N$ be the corresponding rescaled prior from Eq. (8) with $\delta_N = N^{-\frac{\alpha}{2\alpha + 2}}$. Let $D^N \sim P^N_{\theta_0}$ as in Eq. (24) with $\theta_0 \in \mathcal H$, $f \in \dot H^1_\diamond$ and $\nu > 0$. Consider the surrogate posterior distribution $\tilde\Pi(\cdot \mid D^N)$ arising from that choice of prior and the surrogate log-likelihood $\tilde\ell_N(\theta)$ computed with the surrogate forward operator $\tilde{\mathcal G}(\theta) = u^L_\theta$ as described in Proposition 4.3. Then the surrogate posterior $\tilde\Pi_N(\cdot \mid D^N)$ contracts around the ground truth $\theta_0$ at rate $\delta_N$; i.e., there exist $m, m' > 0$ sufficiently large such that
\[ \tilde\Pi_N\big( \theta \in \dot H^\beta_\diamond : \|u_\theta - u_{\theta_0}\|_{L^2([0,T] \times \mathbb T^2, \mathbb R^2)} \le m\delta_N \mid D^N \big) = 1 - o_{P^N_{\theta_0}}(1). \]
Moreover, we have
\[ \tilde\Pi_N\big( \theta \in \dot H^\beta_\diamond : \|\theta - \theta_0\|_{L^2(\mathbb T^2, \mathbb R^2)} \le m'\delta_N^{\frac{\beta}{\beta + 1}} \mid D^N \big) = 1 - o_{P^N_{\theta_0}}(1), \]
as well as $\big\| E^{\tilde\Pi}[\theta \mid D^N] - \theta_0 \big\|_{L^2(\mathbb T^2, \mathbb R^2)} = O_{P^N_{\theta_0}}\big( \delta_N^{\frac{\beta}{\beta + 1}} \big)$.

The proof of Theorem 4.4 follows the arguments of Theorem 4.2, using Proposition 4.3 and applying Theorem 3.12 as well as Theorem 3.17, and is thus omitted.

Remark 4.5. Proposition 4.3 provides a guideline for when the iteration can be considered sufficiently converged: when two successive iterates are sufficiently close in the $\dot H^2$-norm, and when $u^l$ exhibits the qualitative analytic properties of the exact solution $u_\theta$ (relative to [FR2]). It should be noted that in this setting, recovering the initial condition from a linearized Navier-Stokes equation becomes an approximately simpler linear inverse problem, but our analysis still applies.

5 Proofs for Section 3

In this section we prove the results of Section 3. We will repeatedly use the following inequality, which can easily be derived from Cauchy-Schwarz:
\[ \forall J \le N,\ (a_j)_{j=1}^J \subseteq \mathbb R: \quad \sum_{j \le J} a_j \le \sqrt J \times \sqrt{\textstyle\sum_{j \le J} a_j^2}. \quad (27) \]
Further, in what follows we will write the shorthand $\tilde w_\theta := \mathcal G(\theta) - \tilde{\mathcal G}(\theta)$.
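A quick numerical check of the inequality Eq. (27), with arbitrary illustrative values:

```python
import math

# Check of Eq. (27): sum_j a_j <= sqrt(J) * sqrt(sum_j a_j^2),
# a direct consequence of Cauchy-Schwarz with the all-ones vector.
a = [0.3, -1.2, 2.5, 0.0, 4.1]
J = len(a)
lhs = sum(a)
rhs = math.sqrt(J) * math.sqrt(sum(x * x for x in a))
print(lhs <= rhs)  # prints True
```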
5.1 Small ball computations

Proof of Proposition 3.5. First, under the ground truth probability $P^N_{\theta_0}$,
\[ \forall i = 1, \ldots, N: \quad Y_i = \mathcal G(\theta_0)(Z_i) + \varepsilon_i = \tilde{\mathcal G}(\theta_0)(Z_i) + \tilde\varepsilon_i, \quad \text{where } \tilde\varepsilon_i = \tilde w_{\theta_0}(Z_i) + \varepsilon_i, \]
so
\[ \tilde\ell_N(\theta) - \tilde\ell_N(\theta_0) = -\frac12 \sum_{i=1}^N \frac{1}{s_i^2} \big| \tilde{\mathcal G}(\theta_0)(Z_i) - \tilde{\mathcal G}(\theta)(Z_i) + \tilde\varepsilon_i \big|_V^2 + \frac12 \sum_{i=1}^N \frac{1}{s_i^2} |\tilde\varepsilon_i|_V^2 \]
\[ = -\frac12 \sum_{i=1}^N \frac{1}{s_i^2} \big| \tilde{\mathcal G}(\theta)(Z_i) - \tilde{\mathcal G}(\theta_0)(Z_i) \big|_V^2 - \sum_{i=1}^N \frac{1}{s_i^2} \big\langle \tilde w_{\theta_0}(Z_i),\ \tilde{\mathcal G}(\theta_0)(Z_i) - \tilde{\mathcal G}(\theta)(Z_i) \big\rangle_V - \sum_{i=1}^N \frac{1}{s_i^2} \big\langle \varepsilon_i,\ \tilde{\mathcal G}(\theta_0)(Z_i) - \tilde{\mathcal G}(\theta)(Z_i) \big\rangle_V. \quad (28) \]

• Proof of i): Taking the expectation under $P^N_{\theta_0}$, the last term vanishes and we obtain
\[ -E^N_{\theta_0}\big[ \tilde\ell_N(\theta) - \tilde\ell_N(\theta_0) \big] = \frac N2 \bar s_N^{-2} \|\tilde{\mathcal G}(\theta) - \tilde{\mathcal G}(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)} + \sum_{i=1}^N \frac{1}{s_i^2} E\big[ \big\langle \tilde w_{\theta_0}(Z_i),\ \tilde{\mathcal G}(\theta_0)(Z_i) - \tilde{\mathcal G}(\theta)(Z_i) \big\rangle_V \big]. \]
Applying [MM2] to $\tilde w_{\theta_0}$ and Cauchy-Schwarz, we further obtain
\[ E\big[ \big\langle \tilde w_{\theta_0}(Z_i),\ \tilde{\mathcal G}(\theta_0)(Z_i) - \tilde{\mathcal G}(\theta)(Z_i) \big\rangle_V \big] \le c(\theta_0)\, \tilde\delta_{\mathrm{model},N} \times \|\tilde{\mathcal G}(\theta_0) - \tilde{\mathcal G}(\theta)\|_{L^2_\zeta(\mathcal Z, V)}, \]
so that in the end
\[ -E^N_{\theta_0}\big[ \tilde\ell_N(\theta) - \tilde\ell_N(\theta_0) \big] \le \frac12 N s_0^{-2} \times d_{\tilde{\mathcal G}}(\theta, \theta_0)^2 + C(\theta_0, s_0^2, s_\infty^2)\, N \tilde\delta_{\mathrm{model},N} \times d_{\tilde{\mathcal G}}(\theta, \theta_0). \]

• Proof of ii): We now compute
\[ T_i := E^{(i)}_{\theta_0}\Big[ \Big( \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}} - E^{(i)}_{\theta_0} \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}} \Big)^2 \Big] \quad \text{for } i = 1, \ldots, N. \]
By similar computations as above, under $P^N_{\theta_0}$ we have
\[ \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}} - E^{(i)}_{\theta_0} \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}} = -\frac{1}{2s_i^2} \big| \tilde{\mathcal G}(\theta_0)(Z_i) - \tilde{\mathcal G}(\theta)(Z_i) \big|_V^2 - \frac{1}{s_i^2} \big\langle \tilde\varepsilon_i,\ \tilde{\mathcal G}(\theta_0)(Z_i) - \tilde{\mathcal G}(\theta)(Z_i) \big\rangle_V + \frac{1}{2s_i^2} \|\tilde{\mathcal G}(\theta_0) - \tilde{\mathcal G}(\theta)\|^2_{L^2_\zeta} + \frac{1}{s_i^2} E^{(i)}_{\theta_0}\big[ \big\langle \tilde w_{\theta_0}(Z_i),\ \tilde{\mathcal G}(\theta_0)(Z_i) - \tilde{\mathcal G}(\theta)(Z_i) \big\rangle_V \big]. \]
Using Eq.
(27), we upper bound $T_i$ by the sum of squares; then, by Cauchy-Schwarz,
\[ T_i \le \frac{4}{s_i^4} E^{(i)}_{\theta_0}\Big[ \frac14 \big| \tilde{\mathcal G}(\theta_0)(Z_i) - \tilde{\mathcal G}(\theta)(Z_i) \big|_V^4 + |\tilde\varepsilon_i|_V^2 \big| \tilde{\mathcal G}(\theta_0)(Z_i) - \tilde{\mathcal G}(\theta)(Z_i) \big|_V^2 + \frac14 \|\tilde{\mathcal G}(\theta_0) - \tilde{\mathcal G}(\theta)\|^4_{L^2_\zeta} + E^{(i)}_{\theta_0}\big[ \big\langle \tilde w_{\theta_0}(Z_i),\ \tilde{\mathcal G}(\theta_0)(Z_i) - \tilde{\mathcal G}(\theta)(Z_i) \big\rangle_V \big]^2 \Big] \]
\[ = \frac{1}{s_i^4} E^{(i)}_{\theta_0} \big| \tilde{\mathcal G}(\theta_0)(Z_i) - \tilde{\mathcal G}(\theta)(Z_i) \big|_V^4 + \frac{4}{s_i^4} E^{(i)}_{\theta_0}\Big[ |\tilde w_{\theta_0}(Z_i)|_V^2 \big| \tilde{\mathcal G}(\theta_0)(Z_i) - \tilde{\mathcal G}(\theta)(Z_i) \big|_V^2 \Big] + \frac{4\sigma_i^2}{s_i^4} \|\tilde{\mathcal G}(\theta_0) - \tilde{\mathcal G}(\theta)\|^2_{L^2_\zeta} + \frac{1}{s_i^4} \|\tilde{\mathcal G}(\theta_0) - \tilde{\mathcal G}(\theta)\|^4_{L^2_\zeta} + \frac{4}{s_i^4} E^{(i)}_{\theta_0}\big[ \big\langle \tilde w_{\theta_0}(Z_i),\ \tilde{\mathcal G}(\theta_0)(Z_i) - \tilde{\mathcal G}(\theta)(Z_i) \big\rangle_V \big]^2, \]
where we used that $\tilde\varepsilon_i = \varepsilon_i + \tilde w_{\theta_0}(Z_i)$ with $\varepsilon_i \perp Z_i$, $E[\varepsilon_i] = 0$ and $E[\varepsilon_i^2] = \sigma_i^2$ to go from the first to the second line. Then, by Jensen, Cauchy-Schwarz and the uniform bound on $\tilde w_{\theta_0}$ from Condition 3.3,
\[ E^{(i)}_{\theta_0}\big[ \big\langle \tilde w_{\theta_0}(Z_i),\ \tilde{\mathcal G}(\theta_0)(Z_i) - \tilde{\mathcal G}(\theta)(Z_i) \big\rangle_V \big]^2 \lesssim \tilde\delta^2_{\mathrm{model},N} \times \|\tilde{\mathcal G}(\theta_0) - \tilde{\mathcal G}(\theta)\|^2_{L^2_\zeta}, \]
and finally, using the uniform bounds on $\tilde{\mathcal G}$ on the set $\tilde B_N$,
\[ \forall i = 1, \ldots, N: \quad T_i \le \frac{1}{s_i^4} \|\tilde{\mathcal G}(\theta_0) - \tilde{\mathcal G}(\theta)\|^2_{L^2_\zeta} \Big[ 4U^2 + 4\tilde\delta^2_{\mathrm{model},N} + 4\sigma_i^2 + 4U^2 + 4\tilde\delta^2_{\mathrm{model},N} \Big] \le 4 \|\tilde{\mathcal G}(\theta_0) - \tilde{\mathcal G}(\theta)\|^2_{L^2_\zeta} \Big( \frac{4U^2 + \sigma_i^2}{s_i^4} + \frac{2\tilde\delta^2_{\mathrm{model},N}}{s_i^4} \Big). \]
As $\tilde\delta_{\mathrm{model},N} \to 0$ as $N \to \infty$, for $N$ large enough there exists a constant $c_2 = c_2(U, s_0^2, \sigma_\infty^2) > 0$ such that
\[ E^{(i)}_{\theta_0}\Big[ \Big( \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}} - E^{(i)}_{\theta_0} \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}} \Big)^2 \Big] \le c_2 \|\tilde{\mathcal G}(\theta_0) - \tilde{\mathcal G}(\theta)\|^2_{L^2_\zeta}, \]
which shows the claim.

Following the standard literature, if the small ball condition is satisfied, then Lemma 3.6 holds.

Proof of Lemma 3.6.
By Jensen's inequality,
\[ \log \int_{B_N} \frac{q^N_\theta}{q^N_{\theta_0}}(D^N)\, d\nu(\theta) \ge \int_{B_N} \log \frac{q^N_\theta}{q^N_{\theta_0}}(D^N)\, d\nu(\theta). \]
So, using Proposition 3.5 and the fact that on the support $B_N$ of $\nu$ we have $\|\tilde{\mathcal G}(\theta) - \tilde{\mathcal G}(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)} \le \delta_N^2$, the probability in question is bounded by
\[ P^N_{\theta_0}\Big( \int_{B_N} \log \frac{q^N_\theta}{q^N_{\theta_0}}(D^N)\, d\nu(\theta) \le -K N\delta_N^2 \Big) \]
\[ = P^N_{\theta_0}\Big( \int_{B_N} \Big[ \log \frac{q^N_\theta}{q^N_{\theta_0}}(D^N) - E^N_{\theta_0} \log \frac{q^N_\theta}{q^N_{\theta_0}}(D^N) \Big] d\nu(\theta) \le -K N\delta_N^2 - \int_{B_N} E^N_{\theta_0} \log \frac{q^N_\theta}{q^N_{\theta_0}}(D^N)\, d\nu(\theta) \Big) \]
\[ \le P^N_{\theta_0}\Big( \int_{B_N} \Big[ \log \frac{q^N_\theta}{q^N_{\theta_0}}(D^N) - E^N_{\theta_0} \log \frac{q^N_\theta}{q^N_{\theta_0}}(D^N) \Big] d\nu(\theta) \le -\big( K - s_0^{-2}/2 \big) N\delta_N^2 + c_1 N \tilde\delta_{\mathrm{model},N}\delta_N \Big) \]
\[ \overset{\text{Fubini}}{=} P^N_{\theta_0}\Big( \sum_{i=1}^N \Big[ E^\nu \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}}(Y_i, Z_i) - E^N_{\theta_0} E^\nu \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}}(Y_i, Z_i) \Big] \le -\big( K - s_0^{-2}/2 \big) N\delta_N^2 + c_1 N \tilde\delta_{\mathrm{model},N}\delta_N \Big) \]
\[ \overset{\text{Chebyshev}}{\le} P^N_{\theta_0}\Big( \Big| \sum_{i=1}^N W_i \Big| \ge K' N\delta_N^2 \Big) \le \frac{\mathrm{Var}\big( \sum_{i=1}^N W_i \big)}{K'^2 N^2 \delta_N^4} = \frac{\sum_{i=1}^N \mathrm{Var}(W_i)}{K'^2 N^2 \delta_N^4} \quad (29) \]
by independence of the $W_i$, for some $K' > 0$, since $N \tilde\delta_{\mathrm{model},N}\delta_N = o(N\delta_N^2)$ by Condition 3.3, having defined for $i \le N$ the centred variables
\[ W_i = E^\nu\Big[ \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}}(Y_i, Z_i) \Big] - E^N_{\theta_0} E^\nu\Big[ \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}}(Y_i, Z_i) \Big]. \]
Now, for every $i = 1, \ldots, N$,
\[ \mathrm{Var}(W_i) = E^N_{\theta_0}[W_i^2] = E^N_{\theta_0}\Big[ \Big( E^\nu\Big[ \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}}(Y_i, Z_i) - E^N_{\theta_0} \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}}(Y_i, Z_i) \Big] \Big)^2 \Big] \quad \text{by Fubini} \]
\[ \le E^N_{\theta_0} E^\nu\Big[ \Big( \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}}(Y_i, Z_i) - E^N_{\theta_0} \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}}(Y_i, Z_i) \Big)^2 \Big] \quad \text{by Jensen applied to } E^\nu \]
\[ = E^\nu E^N_{\theta_0}\Big[ \Big( \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}}(Y_i, Z_i) - E^N_{\theta_0} \log \frac{q^{(i)}_\theta}{q^{(i)}_{\theta_0}}(Y_i, Z_i) \Big)^2 \Big] \quad \text{by Fubini.} \]
Proposition 3.5 ii) and the assumption on the support of $\nu$ yield $\mathrm{Var}(W_i) \le c_2 E^\nu[d_{\tilde{\mathcal G}}(\theta, \theta_0)^2] \le c_2 \delta_N^2$, and coming back to Eq. (29), this leads to
\[ P^N_{\theta_0}\Big( \int_{B_N} e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)}\, d\nu(\theta) \le e^{-K N\delta_N^2} \Big) \le \frac{c_2 N\delta_N^2}{K'^2 N^2 \delta_N^4} = O\Big( \frac{1}{N\delta_N^2} \Big) \xrightarrow[N \to \infty]{} 0. \]

5.2 Proofs for the change of measure

Proof of Proposition 3.8. Recall the notation $\tilde w_\theta = \mathcal G(\theta) - \tilde{\mathcal G}(\theta)$ for $\theta \in \Theta$. Under $P^N_\theta$, the generated data satisfy, for all $1 \le i \le N$, $Y_i = \mathcal G(\theta)(Z_i) + \varepsilon_i = \tilde{\mathcal G}(\theta)(Z_i) + \tilde\varepsilon_i$ in terms of the surrogate model, where $\tilde\varepsilon_i \mid Z_i = \tilde w_\theta(Z_i) + \varepsilon_i \mid Z_i \sim N(\tilde w_\theta(Z_i), \sigma_i^2)$ are mutually independent. Following similar computations as in the proof of Proposition 3.5, this time under $P^N_\theta$, we find:
\[ \ell_N(\theta) - \ell_N(\theta_0) = \frac12 \sum_{i=1}^N \frac{1}{\sigma_i^2} |\mathcal G(\theta)(Z_i) - \mathcal G(\theta_0)(Z_i)|_V^2 + \sum_{i=1}^N \frac{1}{\sigma_i^2} \big\langle \mathcal G(\theta)(Z_i) - \mathcal G(\theta_0)(Z_i),\ \varepsilon_i \big\rangle_V \]
and
\[ \tilde\ell_N(\theta) - \tilde\ell_N(\theta_0) = \frac12 \sum_{i=1}^N \frac{1}{s_i^2} \big| \tilde{\mathcal G}(\theta)(Z_i) - \tilde{\mathcal G}(\theta_0)(Z_i) \big|_V^2 + \sum_{i=1}^N \frac{1}{s_i^2} \big\langle \tilde{\mathcal G}(\theta)(Z_i) - \tilde{\mathcal G}(\theta_0)(Z_i),\ \tilde\varepsilon_i \big\rangle_V \]
\[ = \frac12 \sum_{i=1}^N \frac{1}{s_i^2} \big| \tilde{\mathcal G}(\theta)(Z_i) - \tilde{\mathcal G}(\theta_0)(Z_i) \big|_V^2 + \sum_{i=1}^N \frac{1}{s_i^2} \big\langle \tilde{\mathcal G}(\theta)(Z_i) - \tilde{\mathcal G}(\theta_0)(Z_i),\ \varepsilon_i \big\rangle_V + \sum_{i=1}^N \frac{1}{s_i^2} \big\langle \tilde{\mathcal G}(\theta)(Z_i) - \tilde{\mathcal G}(\theta_0)(Z_i),\ \tilde w_\theta(Z_i) \big\rangle_V. \]
This yields, by independence of the observations,
\[ E^N_\theta\Big[ \Big( \frac{q^N_\theta}{q^N_{\theta_0}} \frac{p^N_{\theta_0}}{p^N_\theta} \Big)^b \Big] = E^N_\theta\Big[ \exp\big( b(\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)) - b(\ell_N(\theta) - \ell_N(\theta_0)) \big) \Big] = \prod_{i=1}^N E^N_\theta[\exp(bW_i)], \quad (30) \]
with
\[ W_i = \underbrace{\frac12 \Big[ \frac{1}{s_i^2} \big| \Delta\tilde{\mathcal G}_i \big|_V^2 - \frac{1}{\sigma_i^2} |\Delta\mathcal G_i|_V^2 \Big] + \frac{1}{s_i^2} \big\langle \Delta\tilde{\mathcal G}_i,\ \tilde w_\theta(Z_i) \big\rangle_V}_{A_i} + \underbrace{\Big\langle \frac{1}{s_i^2}\Delta\tilde{\mathcal G}_i - \frac{1}{\sigma_i^2}\Delta\mathcal G_i,\ \varepsilon_i \Big\rangle_V}_{B_i}, \]
where we write $\Delta\tilde{\mathcal G}_i := \tilde{\mathcal G}(\theta)(Z_i) - \tilde{\mathcal G}(\theta_0)(Z_i)$ and $\Delta\mathcal G_i := \mathcal G(\theta)(Z_i) - \mathcal G(\theta_0)(Z_i)$ for ease of notation.
Now, knowing that $Z_i \perp \varepsilon_i$, by the tower property and applying the Gaussian MGF to $\varepsilon_i$:
\[ \log E^N_\theta[\exp(bW_i)] = \log E^N_\zeta\big[ E^N_\varepsilon[\exp(bW_i) \mid Z_i] \big] = \log E^N_\zeta\big[ \exp(bA_i)\, E^N_\varepsilon[\exp(bB_i) \mid Z_i] \big] \]
\[ = \log E^N_\zeta\Big[ \exp\Big( \frac b2 \Big[ \frac{1}{s_i^2} \big| \Delta\tilde{\mathcal G}_i \big|_V^2 - \frac{1}{\sigma_i^2} |\Delta\mathcal G_i|_V^2 \Big] + \frac{b}{s_i^2} \big\langle \Delta\tilde{\mathcal G}_i,\ \tilde w_\theta(Z_i) \big\rangle_V \Big) \times \exp\Big( \frac{b^2\sigma_i^2}{2} \Big| \frac{1}{s_i^2}\Delta\tilde{\mathcal G}_i - \frac{1}{\sigma_i^2}\Delta\mathcal G_i \Big|_V^2 \Big) \Big] \]
\[ = \log E^N_\zeta\Big[ \exp\Big( \frac b2 \Big( \frac{1}{s_i^2} - \frac{1}{\sigma_i^2} \Big) \big| \Delta\tilde{\mathcal G}_i \big|_V^2 - \frac{b}{2\sigma_i^2} |\tilde w_\theta(Z_i) - \tilde w_{\theta_0}(Z_i)|_V^2 - \frac{b}{\sigma_i^2} \big\langle \Delta\tilde{\mathcal G}_i,\ \tilde w_\theta(Z_i) - \tilde w_{\theta_0}(Z_i) \big\rangle_V + \frac{b}{s_i^2} \big\langle \Delta\tilde{\mathcal G}_i,\ \tilde w_\theta(Z_i) \big\rangle_V + \frac{b^2\sigma_i^2}{2} \Big| \frac{1}{s_i^2}\Delta\tilde{\mathcal G}_i - \frac{1}{\sigma_i^2}\Delta\mathcal G_i \Big|_V^2 \Big) \Big] \]
\[ \le \log E^N_\zeta\Big[ \exp\Big( \underbrace{\Big( -\frac{b}{2\sigma_i^2}\Big( 1 - \frac{\sigma_i^2}{s_i^2} \Big) + \frac{b^2}{\sigma_i^2}\Big( 1 - \frac{\sigma_i^2}{s_i^2} \Big)^2 \Big) \big| \Delta\tilde{\mathcal G}_i \big|_V^2}_{(I)} + \underbrace{\frac{b^2}{\sigma_i^2} |\tilde w_{\theta_0}(Z_i) - \tilde w_\theta(Z_i)|_V^2}_{(II)} + \underbrace{\big| \Delta\tilde{\mathcal G}_i \big|_V \Big( \frac{b}{\sigma_i^2} |\tilde w_\theta(Z_i) - \tilde w_{\theta_0}(Z_i)|_V + \frac{b}{s_i^2} |\tilde w_\theta(Z_i)|_V \Big)}_{(III)} \Big) \Big] \]
by Cauchy-Schwarz, where we used the decomposition
\[ \Delta\tilde{\mathcal G}_i = \big( \tilde{\mathcal G}(\theta) - \mathcal G(\theta) + \mathcal G(\theta) - \mathcal G(\theta_0) + \mathcal G(\theta_0) - \tilde{\mathcal G}(\theta_0) \big)(Z_i) = \Delta\mathcal G_i + \tilde w_{\theta_0}(Z_i) - \tilde w_\theta(Z_i) \]
to go from the first to the second line, and
\[ \frac{b^2\sigma_i^2}{2} \Big| \frac{1}{s_i^2}\Delta\tilde{\mathcal G}_i - \frac{1}{\sigma_i^2}\Delta\mathcal G_i \Big|_V^2 = \frac{b^2\sigma_i^2}{2} \Big| \Big( \frac{1}{s_i^2} - \frac{1}{\sigma_i^2} \Big)\Delta\tilde{\mathcal G}_i + \frac{1}{\sigma_i^2}\tilde w_{\theta_0}(Z_i) - \frac{1}{\sigma_i^2}\tilde w_\theta(Z_i) \Big|_V^2 \le \frac{b^2}{\sigma_i^2}\Big( 1 - \frac{\sigma_i^2}{s_i^2} \Big)^2 \big| \Delta\tilde{\mathcal G}_i \big|_V^2 + \frac{b^2}{\sigma_i^2} |\tilde w_{\theta_0}(Z_i) - \tilde w_\theta(Z_i)|_V^2 \]
for the last line. Let us now control each term (I), (II), (III), using Conditions 3.2 and 3.3. We first remark that by Condition [FR2], for $\theta$ with $\|\theta\|_{\mathcal R} \le M$:
\[ \big| \Delta\tilde{\mathcal G}_i \big|_V \le \|\tilde{\mathcal G}(\theta) - \tilde{\mathcal G}(\theta_0)\|_\infty \le \|\tilde{\mathcal G}(\theta)\|_\infty + \|\tilde{\mathcal G}(\theta_0)\|_\infty \le C_{\tilde{\mathcal G}, B}\big( 1 + \|\theta\|_{\mathcal R}^{\gamma_B} \big) + C_{\tilde{\mathcal G}, B}\big( 1 + \|\theta_0\|_{\mathcal R}^{\gamma_B} \big) \le C_{M, \tilde{\mathcal G}, \gamma_B}. \]
Controlling (I).
(I) is the standalone contribution of the noise misspecification.

• Under Condition 3.2 [NM2.1], when the variance is overestimated (i.e. $s_i^2 > \sigma_i^2$) with $1 - \sigma_i^2/s_i^2 = \tilde\delta_{\mathrm{noise},N} = o(1)$, the leading term in the parenthesis is $-\frac{b}{2\sigma_i^2}\big( 1 - \frac{\sigma_i^2}{s_i^2} \big) < 0$, so that for $N$ large the contribution of (I) inside the exponential is negative: we can therefore ignore (I), as it yields an upper bound by $1$.

• Under Condition 3.2 [NM2.2], the term in the parenthesis is of order
\[
\frac{b}{\sigma_i^2}\, \Big| 1 - \frac{\sigma_i^2}{s_i^2} \Big| \le \frac{b}{\sigma_0^2}\, \tilde\delta_{\mathrm{noise},N} \le \frac{b}{\sigma_0^2}\, C_{\mathrm{noise}}\, \delta_N^2,
\]
yielding $(\mathrm I) \le c\, \delta_N^2$ for some constant $c = c(M, \widetilde{\mathcal G}, \gamma_B, b, \sigma_0^2, \sigma_\infty^2)$.

Controlling (II) and (III). By Condition 3.3, $\|\tilde w_\theta\|_\infty \le \|\mathcal G(\theta) - \widetilde{\mathcal G}(\theta)\|_\infty \le c(M)\, \tilde\delta_{\mathrm{model},N}$ with $\tilde\delta_{\mathrm{model},N} \le C_{\mathrm{model}}\, \delta_N^2$, yielding
\[
(\mathrm{II}) \lesssim \|\tilde w_\theta\|_\infty^2 + \|\tilde w_{\theta_0}\|_\infty^2 \lesssim \tilde\delta_{\mathrm{model},N}^2 \lesssim \delta_N^4 \qquad \text{and} \qquad (\mathrm{III}) \lesssim \|\Delta\widetilde{\mathcal G}_i\|_\infty \big( \|\tilde w_\theta\|_\infty + \|\tilde w_{\theta_0}\|_\infty \big) \lesssim \tilde\delta_{\mathrm{model},N} \lesssim \delta_N^2
\]
for constants depending on $\{M, \widetilde{\mathcal G}, b, s_0^2, \sigma_0^2, \sigma_\infty^2, C_{\mathrm{model}}\}$. Finally, going back to Eq. (30), we obtain
\[
\mathbb E^N_\theta\!\left[ \left( \frac{q^N_\theta}{q^N_{\theta_0}}\, \frac{p^N_{\theta_0}}{p^N_\theta} \right)^{\!b}\, \right] \le \exp(c_5 \times N\delta_N^2)
\]
for some $c_5 = c(b, s_0^2, \sigma_\infty^2, M, C_{\widetilde{\mathcal G}}, C_{\mathrm{noise}}, C_{\mathrm{model}})$.

Proof of Proposition 3.9. We want to show that there exists a constant $c_6 > 0$ such that
\[
\int_{\Theta_N(M)^c} \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} \big]\, d\Pi_N(\theta) \le e^{-c_6 N\delta_N^2}. \tag{31}
\]
Using the computation in Eq. (28), and recalling $\tilde w_{\theta_0} = \mathcal G(\theta_0) - \widetilde{\mathcal G}(\theta_0)$, under $P^N_{\theta_0}$,
\begin{align*}
\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0) = &- \frac12 \sum_{i=1}^N \frac{1}{s_i^2}\, \big|\widetilde{\mathcal G}(\theta)(Z_i) - \widetilde{\mathcal G}(\theta_0)(Z_i)\big|_V^2 - \sum_{i=1}^N \frac{1}{s_i^2}\, \big\langle \tilde w_{\theta_0}(Z_i),\ \widetilde{\mathcal G}(\theta_0)(Z_i) - \widetilde{\mathcal G}(\theta)(Z_i) \big\rangle_V \\
&- \sum_{i=1}^N \frac{1}{s_i^2}\, \big\langle \varepsilon_i,\ \widetilde{\mathcal G}(\theta_0)(Z_i) - \widetilde{\mathcal G}(\theta)(Z_i) \big\rangle_V ,
\end{align*}
and we find that, conditionally on the covariates $Z_1, \dots, Z_N$ and using the MGF of $\varepsilon_i$, which under $P_{\theta_0}$ is a $N(0, \sigma_i^2)$ random variable:
\begin{align}
\mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} \mid Z_1, \dots, Z_N \big] &= \exp\!\left( -\frac12 \sum_{i=1}^N \frac{1}{s_i^2}\Big( 1 - \frac{\sigma_i^2}{s_i^2} \Big) \big|\widetilde{\mathcal G}(\theta)(Z_i) - \widetilde{\mathcal G}(\theta_0)(Z_i)\big|_V^2 \right) \tag{32} \\
&\quad \times \exp\!\left( -\sum_{i=1}^N \frac{1}{s_i^2}\, \big\langle \mathcal G(\theta_0)(Z_i) - \widetilde{\mathcal G}(\theta_0)(Z_i),\ \widetilde{\mathcal G}(\theta_0)(Z_i) - \widetilde{\mathcal G}(\theta)(Z_i) \big\rangle_V \right) \notag \\
&\le \exp\!\left( \frac12\, s_0^{-2}\, N\, \tilde\delta_{\mathrm{noise},N}\, \|\widetilde{\mathcal G}(\theta) - \widetilde{\mathcal G}(\theta_0)\|_\infty^2 \right) \exp\!\left( N\, C\, \tilde\delta_{\mathrm{model},N}\, \|\widetilde{\mathcal G}(\theta_0) - \widetilde{\mathcal G}(\theta)\|_\infty \right) \tag{33}
\end{align}
for some $C = C(s_0^{-2}, \|\theta_0\|_{\mathcal R})$. The quantity $\|\widetilde{\mathcal G}(\theta) - \widetilde{\mathcal G}(\theta_0)\|_\infty$ cannot be controlled globally on $\Theta_N(M)^c$; however, we can control it on "slices" where the parameter has bounded $\mathcal R$-norm, and use tail decay of the prior via a slicing argument. Note that the integral in Eq. (31) is $0$ outside of the support $\mathcal R$ of the prior; hence we can restrict computations to the integral over $\Theta_N(M)^c \cap \mathcal R$. Write
\[
\Theta_N(M)^c \cap \mathcal R = \bigcup_{\ell = \ell_0}^\infty \Theta_N(M)^c \cap \big\{ \theta \in \mathcal R : 2^\ell S \le \|\theta\|_{\mathcal R} < 2^{\ell+1} S \big\} := \bigcup_{\ell = \ell_0}^\infty P_\ell \tag{34}
\]
for $S > \|\theta_0\|_{\mathcal R} \vee 1$ and $\ell_0 = \lfloor \log_2(M/S) \rfloor$. On every slice, for every $\theta \in P_\ell$ ($\ell \ge \ell_0$):
\[
\|\widetilde{\mathcal G}(\theta) - \widetilde{\mathcal G}(\theta_0)\|_\infty \le \|\widetilde{\mathcal G}(\theta)\|_\infty + \|\widetilde{\mathcal G}(\theta_0)\|_\infty \le C_{\widetilde{\mathcal G},B}\big( 1 + \|\theta\|_{\mathcal R}^{\gamma_B} \big) + C_{\widetilde{\mathcal G},B}\big( 1 + \|\theta_0\|_{\mathcal R}^{\gamma_B} \big) \le 4\, C_{\widetilde{\mathcal G},B}\, (2^{\ell+1} S)^{\gamma_B}.
\]
Coming back to Eq. (33) and using that (see e.g. (2.21) in Nickl (2023))
\[
\Pi_N(P_\ell) \le \Pi_N\big( \|\theta\|_{\mathcal R} > 2^\ell S \big) \le e^{-c_{\mathrm{prior}}\, 2^{2\ell} S^2 N \delta_N^2}, \tag{35}
\]
we see
\begin{align}
\sum_{\ell=\ell_0}^\infty \int_{P_\ell} \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} \big]\, d\Pi_N(\theta) &\le \sum_{\ell=\ell_0}^\infty \exp\!\Big( 8\, s_0^{-2}\, N\, \tilde\delta_{\mathrm{noise},N}\, C_{\widetilde{\mathcal G},B}^2\, (2^{\ell+1}S)^{2\gamma_B} + 4\, N\, C\, C_{\widetilde{\mathcal G},B}\, \tilde\delta_{\mathrm{model},N}\, (2^{\ell+1}S)^{\gamma_B} \Big)\, \exp\!\big( -c_{\mathrm{prior}}\, 2^{2\ell} S^2 N \delta_N^2 \big) \notag \\
&\le \sum_{\ell=\ell_0}^\infty \exp\!\left[ -N\delta_N^2\, 2^{2\ell} S^2 \left( c_{\mathrm{prior}} - c_a\, \frac{\tilde\delta_{\mathrm{noise},N}}{\delta_N^2}\, 2^{(2\gamma_B - 2)\ell}\, S^{2\gamma_B - 2} - c_b\, \frac{\tilde\delta_{\mathrm{model},N}}{\delta_N^2}\, 2^{(\gamma_B - 2)\ell}\, S^{\gamma_B - 2} \right) \right] \notag \\
&\le \sum_{\ell=\ell_0}^\infty \exp\!\Big( -N\delta_N^2\, 2^{2\ell} S^2\, \underbrace{\big( c_{\mathrm{prior}} - c_a\, C_{\mathrm{noise}}\, 2^{(2\gamma_B - 2)\ell} - c_b\, C_{\mathrm{model}}\, 2^{(\gamma_B - 2)\ell} \big)}_{=:K(\ell)} \Big) \tag{36}
\end{align}
under Conditions 3.2 and 3.3, for some constants $c_a, c_b$ depending on $\{S, C_{\widetilde{\mathcal G},B}, s_0^2, \|\theta_0\|_{\mathcal R}\}$.
We show that under our conditions, $K(\ell)$ can be lower bounded by some strictly positive constant $\tau > 0$ for every $\ell \ge \ell_0$. Let us look at the contributions from each source of misspecification in $K(\ell)$:

• Under [NM2.1], where the variance gets overestimated, the second term in $K(\ell)$, accounting for noise misspecification, actually disappears (since the corresponding term inside the exponential in Eq. (32) is negative when $s_i^2 > \sigma_i^2$). Only the contribution from model misspecification remains, and by the condition $\gamma_B \le 2$ in Condition 3.3 and for $C_{\mathrm{model}}$ small enough we obtain the desired result.

• Otherwise, under [NM2.2], $\gamma_B \le 1$. The contribution from model misspecification decays faster with $\ell$, and the dominating contribution is the one from noise misspecification. For $C_{\mathrm{noise}}$ small enough, the term $K(\ell)$ remains positive.

Therefore, for $C_{\mathrm{noise}}$ and $C_{\mathrm{model}}$ sufficiently small, the term $K(\ell)$ can be lower bounded by some constant $\tau > 0$ for every $\ell \ge \ell_0$. The summands decay superexponentially with $\ell$, and we can upper bound the sum by the order of the first term, $O\big( \exp(-c\tau\, 2^{2\ell_0} N\delta_N^2) \big)$. In the end, we obtain the desired Eq. (31), for some constant $c_6 = c_6(M, C_{\widetilde{\mathcal G},B}, \theta_0, s_0^2, C_{\mathrm{noise}}, C_{\mathrm{model}})$. In particular, $c_6$ can be made as large as desired by increasing the radius $M$ in the definition of the regularisation set $\Theta_N(M)$.

5.3 Proof of Proposition 3.11

Proof of Proposition 3.11. The proof copies that of Theorem 2.2.2 (Step 2) in Nickl (2023), replacing $\mathcal G$ with $\widetilde{\mathcal G}$, with a simple perturbation argument. Precisely: let $M \in (0, \infty)$, and assume that $\|\theta - \theta_0\|_{\mathcal R} \le M$. As $\theta_0 \in \mathcal R$, we thus have $\|\theta\|_{\mathcal R} \le M + \|\theta_0\|_{\mathcal R} =: \bar M \in (0, \infty)$. Then, by [MM2] and [FR2]:
\[
\|\widetilde{\mathcal G}(\theta)\|_\infty \le C_{\widetilde{\mathcal G},B} \times \big( 1 + \bar M^{\gamma_B} \big) \le c_7(\bar M, C_{\widetilde{\mathcal G},B}) =: U.
\]
Thus,
\[
\Pi_N(\widetilde B_N) = \Pi_N\big( \theta \in \Theta : d_{\widetilde{\mathcal G}}(\theta, \theta_0) \le \delta_N,\ \|\widetilde{\mathcal G}(\theta)\|_\infty \le U \big) \ge \Pi_N\big( \theta \in \Theta : d_{\widetilde{\mathcal G}}(\theta, \theta_0) \le \delta_N,\ \|\theta - \theta_0\|_{\mathcal R} \le M \big).
\]
Now, with the Lipschitz condition [FR1], the triangle inequality and [MM2] (noting that $(\mathcal Z, \mathscr Z, \zeta)$ is a probability space), we obtain
\begin{align*}
d_{\widetilde{\mathcal G}}(\theta, \theta_0) &\le d_{\mathcal G}(\theta, \theta_0) + \|\widetilde{\mathcal G}(\theta) - \mathcal G(\theta)\|_{L^2_\zeta(\mathcal Z, V)} + \|\widetilde{\mathcal G}(\theta_0) - \mathcal G(\theta_0)\|_{L^2_\zeta(\mathcal Z, V)} \\
&\le d_{\mathcal G}(\theta, \theta_0) + c(\bar M) \times \tilde\delta_{\mathrm{model},N} + c(\theta_0)\, \tilde\delta_{\mathrm{model},N} \\
&\le d_{\mathcal G}(\theta, \theta_0) + c\big( \bar M, \theta_0, C_{\mathrm{model}} \big)\, \delta_N^2 \\
&\le C_{\mathrm{Lip},2} \times \big( 1 + \bar M^{\gamma_2} \big) \times \|\theta - \theta_0\|_{(H^\kappa(\mathcal M, W))^*} + c\big( \bar M, \theta_0, C_{\mathrm{model}} \big)\, \delta_N^2 .
\end{align*}
The last line is upper bounded by $\delta_N$ if $\|\theta - \theta_0\|_{(H^\kappa(\mathcal M, W))^*} \le \frac{\delta_N}{c_8(\bar M, C_{\mathrm{Lip},2}, \gamma_2)}$ with $c_8(\bar M, C_{\mathrm{Lip},2}, \gamma_2) > C_{\mathrm{Lip},2} \times \big( 1 + \bar M^{\gamma_2} \big)$ and $N = N(\bar M, \theta_0, C_{\mathrm{model}})$ sufficiently large. Then, the above probability can be further lower bounded by
\[
\Pi_N(\widetilde B_N) \ge \Pi_N\left( \theta \in \Theta : \|\theta - \theta_0\|_{(H^\kappa(\mathcal M, W))^*} \le \frac{\delta_N}{c_8(\bar M, C_{\mathrm{Lip},2}, \gamma_2)},\ \|\theta - \theta_0\|_{\mathcal R} \le M \right).
\]
Applying Corollary 2.6.18 in Gine and Nickl (2021) as well as the Gaussian correlation inequality, Theorem B.1.2 in Nickl (2023), we obtain
\[
\Pi_N(\widetilde B_N) \ge e^{-\frac12 N\delta_N^2 \|\theta_0\|_{\mathcal H}^2} \times \Pi_N\left( \theta \in \Theta : \|\theta\|_{(H^\kappa(\mathcal M, W))^*} \le \frac{\delta_N}{c_8(\bar M, C_{\mathrm{Lip},2}, \gamma_2)} \right) \times \Pi_N\big( \|\theta\|_{\mathcal R} \le M \big).
\]
The proof now follows the same steps as in Step 2 of Theorem 2.2.2 in Nickl (2023), by noting that any ball in $H^\alpha(\mathcal M, W)$ (and $H^\alpha_c(\mathcal M, W)$) can be covered in terms of the $\|\cdot\|_{(H^\kappa(\mathcal M, W))^*}$-norm, taking care of the additional dimension $d_W$; see also Lemma 4.9 in Siebel (2025).

5.4 Proof of Theorem 3.12

Proof of Theorem 3.12. The beginning of the proof follows the steps of Theorem 1.3.2 in Nickl (2023). From Proposition 3.11 and Lemma 3.6, applied to $\nu = \Pi_N(\cdot)/\Pi_N(B_N)$ with $B_N = \widetilde B_N$ and $K = A + s_0^{-2}$, it follows that the events
\[
A_N = \left\{ \int_\Theta e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)}\, d\Pi_N(\theta) \ge \Pi_N(B_N)\, e^{-s_0^{-2} N\delta_N^2} \ge e^{-(A + s_0^{-2}) N\delta_N^2} \right\} \tag{37}
\]
satisfy $\lim_{N\to\infty} P^N_{\theta_0}(A_N) = 1$.
Given $M, \rho \in (0, \infty)$, we now introduce the sets
\[
\bar\Theta_N(M) := \Theta_N(M) \cap \{ \theta \in \Theta : d_{\mathcal G}(\theta, \theta_0) \le \rho\delta_N \}
\]
and denote by $\bar\Theta_N(M)^c$ their complements in $\Theta$. Now, controlling the events $A_N$ as above and using the tests $\Psi_N$ from Proposition 3.7, the target probability Eq. (15) can be written, for $N \to \infty$:
\begin{align*}
P^N_{\theta_0}\Big( \widetilde\Pi_N\big( \bar\Theta_N(M)^c \mid \mathcal D_N \big) \ge e^{-bN\delta_N^2} \Big) &= P^N_{\theta_0}\Big( \widetilde\Pi_N\big( \bar\Theta_N(M)^c \mid \mathcal D_N \big) \ge e^{-bN\delta_N^2},\ A_N \Big) + P^N_{\theta_0}\Big( \widetilde\Pi_N\big( \bar\Theta_N(M)^c \mid \mathcal D_N \big) \ge e^{-bN\delta_N^2},\ A_N^c \Big) \\
&= P^N_{\theta_0}\left( \frac{\int_{\bar\Theta_N(M)^c} e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)}\, d\Pi_N(\theta)}{\int_\Theta e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)}\, d\Pi_N(\theta)} \ge e^{-bN\delta_N^2},\ \Psi_N = 0,\ A_N \right) + o(1) \\
&\le P^N_{\theta_0}\left( \int_{\bar\Theta_N(M)^c} e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)}\, (1 - \Psi_N)\, d\Pi_N(\theta) \ge e^{-(b + A + s_0^{-2})N\delta_N^2} \right) + o(1).
\end{align*}
It remains to upper bound the last probability. Markov's inequality and Fubini's theorem yield
\begin{align*}
&e^{(b + A + s_0^{-2})N\delta_N^2} \times \mathbb E^N_{\theta_0}\left[ \int_{\bar\Theta_N(M)^c} e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)}\, (1 - \Psi_N)\, d\Pi_N(\theta) \right] \\
&\quad \le e^{(b + A + s_0^{-2})N\delta_N^2} \left[ \int_{\Theta_N(M)^c} \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} (1 - \Psi_N) \big]\, d\Pi_N(\theta) + \int_{\{ d_{\mathcal G}(\theta, \theta_0) \ge \rho\delta_N \}} \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} (1 - \Psi_N) \big]\, d\Pi_N(\theta) \right] \\
&\quad \le e^{(b + A + s_0^{-2})N\delta_N^2} \left[ 2 \int_{\Theta_N(M)^c} \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} (1 - \Psi_N) \big]\, d\Pi_N(\theta) + \int_{\{ d_{\mathcal G}(\theta, \theta_0) \ge \rho\delta_N \} \cap \Theta_N(M)} \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} (1 - \Psi_N) \big]\, d\Pi_N(\theta) \right] \quad (\dagger)
\end{align*}
where we used that $\bar\Theta_N(M)^c = \Theta_N(M)^c \cup \{ d_{\mathcal G}(\theta, \theta_0) \ge \rho\delta_N \}$. Let us now control each integral.

Step 1: We first control
\[
\int_{\{ \theta \in \Theta_N(M) :\ d_{\mathcal G}(\theta, \theta_0) > \rho\delta_N \}} \mathbb E^N_{\theta_0}\left[ \frac{q^N_\theta}{q^N_{\theta_0}}\, (1 - \Psi_N) \right] d\Pi_N(\theta).
\]
Note that
\begin{align*}
\mathbb E^N_{\theta_0}\left[ \frac{q^N_\theta}{q^N_{\theta_0}}\, (1 - \Psi_N) \right] &= \mathbb E^N_{\theta_0}\left[ (1 - \Psi_N)\, \frac{q^N_\theta}{q^N_{\theta_0}}\, \frac{p^N_{\theta_0}}{p^N_\theta}\, \frac{p^N_\theta}{p^N_{\theta_0}} \right] = \mathbb E^N_\theta\left[ (1 - \Psi_N)\, \frac{q^N_\theta}{q^N_{\theta_0}}\, \frac{p^N_{\theta_0}}{p^N_\theta} \right] \\
&\le \mathbb E^N_\theta\big[ (1 - \Psi_N)^2 \big]^{1/2}\, \mathbb E^N_\theta\!\left[ \left( \frac{q^N_\theta}{q^N_{\theta_0}}\, \frac{p^N_{\theta_0}}{p^N_\theta} \right)^{\!2}\, \right]^{1/2} \quad \text{by Cauchy–Schwarz} \\
&\le \mathbb E^N_\theta\big[ 1 - \Psi_N \big]^{1/2}\, \mathbb E^N_\theta\!\left[ \left( \frac{q^N_\theta}{q^N_{\theta_0}}\, \frac{p^N_{\theta_0}}{p^N_\theta} \right)^{\!2}\, \right]^{1/2} \quad \text{as } |1 - \Psi_N| \le 1 \\
&\le \exp(-\bar c/2 \times N\delta_N^2)\, \exp(c_5\, N\delta_N^2),
\end{align*}
where we used the assumptions on the tests (Proposition 3.7) to control the factor on the left, and Proposition 3.8 to control the factor on the right, since $\Theta_N(M) \subset \{ \theta : \|\theta\|_{\mathcal R} \le M \}$.

Step 2: On $\Theta_N(M)^c$, notice that $\mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} (1 - \Psi_N) \big] \le \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} \big]$, so that
\[
\int_{\Theta_N(M)^c} \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} (1 - \Psi_N) \big]\, d\Pi(\theta) \le \exp(-c_6 \times N\delta_N^2)
\]
by virtue of Proposition 3.9. In the end, the target probability can be bounded from above by
\[
e^{(b + A + s_0^{-2})N\delta_N^2} \left( e^{-(\bar c/2 - c_5)N\delta_N^2} + e^{-c_6 N\delta_N^2} \right) \longrightarrow 0, \quad \text{as } N \to \infty,
\]
for $\rho$ large enough such that $\bar c > 2(b + A + s_0^{-2} + c_5)$, and also by assumption $c_6 > b + A + s_0^{-2}$.

5.5 Proof of Theorem 3.17

Proof of Theorem 3.17. As in Theorem 2.3.1 of Nickl (2023), the proof of Eq. (16) follows from Theorem 3.12 and Condition 3.16, yielding the set inclusion
\[
\{ \theta \in \Theta_N(M) : d_{\mathcal G}(\theta, \theta_0) \le \rho\delta_N \} \subset \{ \theta \in \Theta_N(M) : \|\theta - \theta_0\|_{L^2} \le C_{\mathcal G, \mathrm{inv}}(M)\, (\rho\delta_N)^\tau \}.
\]
For the convergence of the posterior mean Eq. (17), following the proof of Theorem 2.3.2 in Nickl (2023), we set $\eta_N = C_{\mathcal G, \mathrm{inv}}(M)(\rho\delta_N)^\tau$. Then, by Jensen and Cauchy–Schwarz,
\begin{align*}
\big\| \mathbb E^{\widetilde\Pi}[\theta \mid \mathcal D_N] - \theta_0 \big\|_{L^2} &\le \mathbb E^{\widetilde\Pi}\big[ \|\theta - \theta_0\|_{L^2} \mid \mathcal D_N \big] \le \eta_N + \mathbb E^{\widetilde\Pi}\big[ \|\theta - \theta_0\|_{L^2}\, \mathbf 1\{ \|\theta - \theta_0\|_{L^2} > \eta_N \} \mid \mathcal D_N \big] \\
&\le \eta_N + \mathbb E^{\widetilde\Pi}\big[ \|\theta - \theta_0\|_{L^2}^2 \mid \mathcal D_N \big]^{1/2}\, \widetilde\Pi\big( \|\theta - \theta_0\|_{L^2} > \eta_N \mid \mathcal D_N \big)^{1/2},
\end{align*}
and we now show that the last term is $O_{P^N_{\theta_0}}(\eta_N)$ to prove the theorem. We recall the sets $A_N$ from Eq. (37), which satisfy $P^N_{\theta_0}(A_N) \to 1$ as $N \to \infty$. Then, using Eq. (16), Markov's inequality and Fubini's theorem,
\begin{align*}
&P^N_{\theta_0}\Big( \mathbb E^{\widetilde\Pi}\big[ \|\theta - \theta_0\|_{L^2}^2 \mid \mathcal D_N \big]\, \widetilde\Pi\big( \|\theta - \theta_0\|_{L^2} > \eta_N \mid \mathcal D_N \big) > \eta_N^2 \Big) \\
&\quad \le P^N_{\theta_0}\Big( \mathbb E^{\widetilde\Pi}\big[ \|\theta - \theta_0\|_{L^2}^2 \mid \mathcal D_N \big]\, e^{-bN\delta_N^2} > \eta_N^2 \Big) + o(1) \\
&\quad \le P^N_{\theta_0}\left( e^{-bN\delta_N^2}\, \frac{\int \|\theta - \theta_0\|_{L^2}^2\, e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)}\, d\Pi(\theta)}{\int e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)}\, d\Pi(\theta)} > \eta_N^2,\ A_N \right) + o(1) \\
&\quad \le e^{(A + s_0^{-2} - b)N\delta_N^2}\, \eta_N^{-2} \int \|\theta - \theta_0\|_{L^2}^2\, \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} \big]\, d\Pi(\theta) + o(1).
\end{align*}
As in the proof of Theorem 3.12, we cannot directly upper bound $\mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} \big]$ by $1$ because of the misspecification. We again make use of the change-of-measure propositions. Note that we can decompose the integral:
\[
\int \|\theta - \theta_0\|_{L^2}^2\, \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} \big]\, d\Pi(\theta) = \int_{\Theta_N(M)} \|\theta - \theta_0\|_{L^2}^2\, \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} \big]\, d\Pi(\theta) + \int_{\mathcal R \cap \Theta_N(M)^c} \|\theta - \theta_0\|_{L^2}^2\, \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} \big]\, d\Pi(\theta).
\]
For the first term, notice that $\Theta_N(M) \subset \mathcal R(M)$, and that $\mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} \big] = \mathbb E^N_\theta\big[ \frac{q^N_\theta}{q^N_{\theta_0}}\, \frac{p^N_{\theta_0}}{p^N_\theta} \big]$; therefore Proposition 3.8 applies with $b = 1$, yielding
\[
\int_{\Theta_N(M)} \|\theta - \theta_0\|_{L^2}^2\, \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} \big]\, d\Pi(\theta) \le e^{c_5 N\delta_N^2} \int_{\Theta_N(M)} \|\theta - \theta_0\|_{L^2}^2\, d\Pi(\theta) \le C\, e^{c_5 N\delta_N^2},
\]
where we used that the Gaussian measure $\Pi'$ used as the base prior is supported in $L^2$ and integrates $\|\cdot\|_{L^2}^2$ to a finite constant. To deal with the second term, we adapt the slicing argument from Proposition 3.9, noting that on every slice $P_\ell$, because $\mathcal R$ embeds continuously into $L^2$,
\[
\|\theta - \theta_0\|_{L^2} \lesssim \|\theta - \theta_0\|_{\mathcal R} \le \|\theta\|_{\mathcal R} + \|\theta_0\|_{\mathcal R} \le 2\, (2^{\ell+1} S),
\]
and plugging into Eq. (36) we get
\[
\int_{\mathcal R \cap \Theta_N(M)^c} \|\theta - \theta_0\|_{L^2}^2\, \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} \big]\, d\Pi(\theta) \lesssim \sum_{\ell = \ell_0}^\infty \int_{P_\ell} \|\theta - \theta_0\|_{\mathcal R}^2\, \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} \big]\, d\Pi(\theta) \le 4 \sum_{\ell = \ell_0}^\infty (2^{\ell+1} S)^2\, \exp\big( -N\delta_N^2\, 2^{2\ell}\, K(\ell) \big),
\]
which converges to a constant $C'$ by Lemma D.3. Therefore, eventually,
\[
e^{(A + s_0^{-2} - b)N\delta_N^2}\, \eta_N^{-2} \int \|\theta - \theta_0\|_{L^2}^2\, \mathbb E^N_{\theta_0}\big[ e^{\tilde\ell_N(\theta) - \tilde\ell_N(\theta_0)} \big]\, d\Pi(\theta) \le e^{(A + s_0^{-2} - b)N\delta_N^2}\, \eta_N^{-2}\, \big( C\, e^{c_5 N\delta_N^2} + C' \big).
\]
We conclude for $b$ large enough such that $c_6 - A - s_0^{-2} > b > c_5 + A + s_0^{-2}$. To ensure this is possible, we look at the growth of the constants $c_6$ and $c_5$ with $M$: $c_6$ is of order $2^{2\ell_0} = 2^{2\log_2(M/S)} = O(M^2)$ by Eq. (36), while $c_5$ grows at the rate of $O\big( s_0^{-2}\, C_{\mathrm{noise}}\, (1 + M^{\gamma_B})^2 \vee C_{\mathrm{model}}\, (1 + M^{\gamma_B}) \big) \lesssim (C_{\mathrm{noise}} \vee C_{\mathrm{model}})\, M^2$ by Eq. (33), under Conditions 3.2 and 3.3. Therefore, for constants $C_{\mathrm{noise}}$ and $C_{\mathrm{model}}$ sufficiently small, by choosing $M$ and then $\rho$ large enough, convergence of the posterior mean to the ground truth holds at the desired rate.

Funding

FS is funded by an ERC Advanced Grant (UKRI G116786) and MS is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy EXC-2181/1-39090098 (the Heidelberg STRUCTURES Cluster of Excellence).

Acknowledgements

The authors are grateful to Richard Nickl for suggesting this project and for many insightful discussions. They further thank Richard Nickl and Jan Johannes for facilitating reciprocal research visits to Cambridge and Heidelberg, respectively, and Robert Scheichl for proposing the inclusion of Oseen-type iterations as an illustrative example.

References

Berk, Robert H. (1966). "Limiting behavior of posterior distributions when the model is incorrect". In: Ann. Math. Statist. 37, 51–58; correction, ibid. 745–746. doi: 10.1214/aoms/1177699477.
Lions, J. L. and E. Magenes (1972). Non-Homogeneous Boundary Value Problems and Applications. Berlin, Heidelberg: Springer. doi: 10.1007/978-3-642-65161-8.
Aubin, Thierry (1982). Nonlinear Analysis on Manifolds. Monge–Ampère Equations. Vol. 252. Grundlehren der Mathematischen Wissenschaften. New York, NY: Springer. doi: 10.1007/978-1-4612-5734-9.
White, Halbert (1982). "Maximum likelihood estimation of misspecified models". In: Econometrica 50.1, pp. 1–25. doi: 10.2307/1912526.
Triebel, Hans (1983). Theory of Function Spaces. Basel: Springer. doi: 10.1007/978-3-0346-0416-1.
Girault, Vivette and Pierre-Arnaud Raviart (1986). Finite Element Methods for Navier–Stokes Equations: Theory and Algorithms. Vol. 5. Springer Series in Computational Mathematics. Berlin: Springer-Verlag. doi: 10.1007/978-3-642-61623-5.
Constantin, Peter and Ciprian Foias (1989). Navier–Stokes Equations. Chicago Lectures in Mathematics. Chicago, IL: University of Chicago Press.
Alzer, Horst (1997). "On some inequalities for the incomplete gamma function". In: Mathematics of Computation 66.218, pp. 771–778.
Temam, Roger (1997). Infinite-Dimensional Dynamical Systems in Mechanics and Physics. Vol. 68. Applied Mathematical Sciences. New York, NY: Springer. doi: 10.1007/978-1-4612-0645-3.
van der Vaart, Aad (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511802256.
Batchelor, G. K. (1999). An Introduction to Fluid Dynamics. Cambridge Mathematical Library. Cambridge: Cambridge University Press.
Engl, Heinz Werner, Martin Hanke, and A. Neubauer (2000). Regularization of Inverse Problems. Springer Science & Business Media.
van de Geer, Sara (2000a). Applications of Empirical Process Theory. Vol. 6. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.
van de Geer, Sara A. (2000b). Empirical Processes in M-Estimation. Cambridge University Press.
Robinson, James C. (2001). Infinite-Dimensional Dynamical Systems: An Introduction to Dissipative Parabolic PDEs and the Theory of Global Attractors. Cambridge University Press.
van de Geer, Sara (2001). "Least squares estimation with complexity penalties". In: Mathematical Methods of Statistics 10.3, pp. 355–374.
Christen, J. Andrés and Colin Fox (2005). "Markov chain Monte Carlo using an approximation". In: Journal of Computational and Graphical Statistics 14.4, pp. 795–810.
Kaipio, Jari and Erkki Somersalo (2005). Statistical and Computational Inverse Problems. Vol. 160. Applied Mathematical Sciences. New York: Springer-Verlag.
Kleijn, B. J. K. and Aad van der Vaart (2006). "Misspecification in infinite-dimensional Bayesian statistics". In: Ann. Statist. 34.2, pp. 837–877. doi: 10.1214/009053606000000029.
Kaltenbacher, Barbara, Andreas Neubauer, and Otmar Scherzer (2008). Iterative Regularization Methods for Nonlinear Ill-Posed Problems. De Gruyter. doi: 10.1515/9783110208276.
Reiß, Markus (2008). "Asymptotic equivalence for nonparametric regression with multivariate and random design". In: Ann. Statist. 36.4, pp. 1957–1982. doi: 10.1214/07-AOS525.
Andrieu, Christophe and Gareth O. Roberts (2009). "The pseudo-marginal approach for efficient Monte Carlo computations". In: The Annals of Statistics 37.2, pp. 697–725.
Andrieu, Christophe, Arnaud Doucet, and Roman Holenstein (2010). "Particle Markov chain Monte Carlo methods". In: Journal of the Royal Statistical Society Series B: Statistical Methodology 72.3, pp. 269–342. doi: 10.1111/j.1467-9868.2009.00736.x.
Evans, Lawrence C. (2010). Partial Differential Equations. American Mathematical Soc.
Stuart, A. M. (2010). "Inverse problems: a Bayesian perspective". In: Acta Numer. 19, pp. 451–559. doi: 10.1017/S0962492910000061.
Giné, Evarist and Richard Nickl (2011). "Rates of contraction for posterior distributions in $L^r$-metrics, $1 \le r \le \infty$". In: Ann. Statist. 39.6, pp. 2883–2911. doi: 10.1214/11-AOS924.
Grünwald, Peter (2012). "The safe Bayesian: learning the learning rate via the mixability gap". In: Algorithmic Learning Theory. Vol. 7568. Lecture Notes in Comput. Sci. Heidelberg: Springer, pp. 169–183. doi: 10.1007/978-3-642-34106-9_16.
Kleijn, B. J. K. and Aad van der Vaart (2012). "The Bernstein–von Mises theorem under misspecification". In: Electron. J. Stat. 6, pp. 354–381. doi: 10.1214/12-EJS675.
Cotter, S. L., G. O. Roberts, A. M. Stuart, and D. White (2013). "MCMC methods for functions: modifying old algorithms to make them faster". In: Statistical Science 28.3, pp. 424–446. doi: 10.1214/13-STS421.
Vollmer, Sebastian J. (2013). "Posterior consistency for Bayesian inverse problems through stability and regression results". In: Inverse Problems 29.12, 125011. doi: 10.1088/0266-5611/29/12/125011.
Hairer, Martin, Andrew M. Stuart, and Sebastian J. Vollmer (2014). "Spectral gaps for a Metropolis–Hastings algorithm in infinite dimensions". In: The Annals of Applied Probability 24.6, pp. 2455–2490.
Dirksen, Sjoerd (2015). "Tail bounds via generic chaining". In: Electronic Journal of Probability 20, pp. 1–29. doi: 10.1214/EJP.v20-3760.
Norets, Andriy (2015). "Bayesian regression with nonparametric heteroskedasticity". In: J. Econometrics 185.2, pp. 409–419. doi: 10.1016/j.jeconom.2014.12.006.
Ghosal, Subhashis and Aad van der Vaart (2017). Fundamentals of Nonparametric Bayesian Inference. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press. doi: 10.1017/9781139029834.
Strogatz, Steven H. (2018). Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. 2nd ed. Boca Raton: CRC Press. doi: 10.1201/9780429492563.
Arridge, Simon, Peter Maass, Ozan Öktem, and Carola-Bibiane Schönlieb (2019). "Solving inverse problems using data-driven models". In: Acta Numerica 28, pp. 1–174. doi: 10.1017/S0962492919000059.
Bhattacharya, Anirban, Debdeep Pati, and Yun Yang (2019). "Bayesian fractional posteriors". In: Ann. Statist. 47.1, pp. 39–66. doi: 10.1214/18-AOS1712.
Miller, Jeffrey W. and David B. Dunson (2019). "Robust Bayesian inference via coarsening". In: J. Amer. Statist. Assoc. 114.527, pp. 1113–1125. doi: 10.1080/01621459.2018.1469995.
Giordano, Matteo and Richard Nickl (2020). "Consistency of Bayesian inference with Gaussian process priors in an elliptic inverse problem". In: Inverse Problems 36.8, 085001. doi: 10.1088/1361-6420/ab7d2a.
Nickl, Richard, Sara van de Geer, and Sven Wang (2020). "Convergence rates for penalized least squares estimators in PDE constrained regression problems". In: SIAM/ASA Journal on Uncertainty Quantification 8.1, pp. 374–413. doi: 10.1137/18M1236137.
Gine, Evarist and Richard Nickl (2021). Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.
Kekkonen, Hanne (2022). "Consistency of Bayesian inference with Gaussian process priors for a parabolic inverse problem". In: Inverse Problems 38.3, 035002. doi: 10.1088/1361-6420/ac4839.
L'Huillier, Alice, Luke Travis, Ismaël Castillo, and Kolyan Ray (2023). "Semiparametric inference using fractional posteriors". In: J. Mach. Learn. Res. 24, Paper No. 389.
Nickl, Richard (2023). Bayesian Non-linear Statistical Inverse Problems. Zurich Lectures in Advanced Mathematics. Berlin: EMS Press. doi: 10.4171/zlam/30.
Vaart, Aad W. van der and Jon A. Wellner (2023). Weak Convergence and Empirical Processes. With Applications to Statistics. 2nd ed. Springer Series in Statistics. Cham: Springer. doi: 10.1007/978-3-031-29040-4.
Bohr, Jan and Richard Nickl (2024). "On log-concave approximations of high-dimensional posterior measures and stability properties in non-linear inverse problems". In: Annales de l'Institut Henri Poincaré Probabilités et Statistiques 60.4, pp. 2619–2667. doi: 10.1214/23-aihp1397.
Kutri, Robert and Robert Scheichl (2024). Dirichlet–Neumann Averaging: The DNA of Efficient Gaussian Process Simulation. arXiv: 2412.07929.
Nickl, Richard (2024). Bernstein–von Mises Theorems for Time Evolution Equations. arXiv: 2407.14781.
Nickl, Richard and Edriss S. Titi (2024). "On posterior consistency of data assimilation with Gaussian process priors: the 2D Navier–Stokes equations". In: Annals of Statistics 52.4, pp. 1825–1844. doi: 10.1214/24-AOS2427.
Nickl, Richard and Sven Wang (2024). "On polynomial-time computation of high-dimensional posterior measures by Langevin-type algorithms". In: Journal of the European Mathematical Society 26.3, pp. 1031–1112. doi: 10.4171/jems/1304.
Giordano, Matteo and Sven Wang (2025).
"Statistical algorithms for low-frequency diffusion data: a PDE approach". In: The Annals of Statistics 53.3, pp. 1150–1175. doi: 10.1214/25-AOS2496.
Konen, Dimitri and Richard Nickl (2025). Data Assimilation with the 2D Navier–Stokes Equations: Optimal Gaussian Asymptotics for the Posterior Measure. arXiv: 2507.18279.
Siebel, Maximilian (2025). "Convergence rates for the maximum a posteriori estimator in PDE-regression models with random design". In: SIAM/ASA Journal on Uncertainty Quantification, pp. 1862–1903. doi: 10.1137/25M1744526.
Castre, Aurélien and Richard Nickl (2026). On Gradient Stability in Nonlinear PDE Models and Inference in Interacting Particle Systems. arXiv: 2601.10326.

Appendix A: Choice of Priors

In this section we review the construction of Gaussian base prior distributions $\Pi'$ as they are needed in Condition 2.4. The following examples and remarks summarize the constructions considered in Nickl (2023), Nickl and Titi (2024), Konen and Nickl (2025), and Nickl (2024).

Example A.1 (General prior). Let $(\lambda_j, e_j)_{j \in \mathbb N} \subseteq (0, \infty) \times L^2(\mathcal M, W)$ be an orthonormal basis of the linear subspace $\Theta \subseteq L^2(\mathcal M, W)$ such that there exist constants $C_1, C_2 > 0$ with
\[
\forall j \in \mathbb N: \quad C_1\, j^{2/d} \le \lambda_j \le C_2\, j^{2/d}.
\]
Let $\alpha > 0$ be a fixed smoothness parameter. Given an i.i.d. sequence $(g_j)_{j \in \mathbb N} \sim N(0, 1)$, we define a centred Gaussian process $\{\mathcal W(z) : z \in \mathcal M\}$ by
\[
\forall z \in \mathcal M: \quad \mathcal W(z) := \sum_{j \in \mathbb N} \lambda_j^{-\alpha/2}\, g_j\, e_j(z). \tag{38}
\]
If $(\lambda_j^{-\alpha/2})_{j \in \mathbb N} \in \ell^2(\mathbb N)$, it follows from Gine and Nickl (2021), Example 2.6.15, that its RKHS $\mathcal H$ is given by
\[
\mathcal H := \left\{ \sum_{j \in \mathbb N} \lambda_j^{-\alpha/2}\, h_j\, e_j : (h_j)_{j \in \mathbb N} \in \ell^2(\mathbb N) \right\} = \dot H^\alpha(\mathcal M, W).
\]
Moreover, the process $\{\mathcal W(z) : z \in \mathcal M\}$ converges in $\dot H^\beta(\mathcal M, W)$, since
\[
\sum_{j \in \mathbb N} \lambda_j^\beta\, \mathbb E\big[ \langle \mathcal W, e_j \rangle_{L^2(\mathcal M)}^2 \big] = \sum_{j \in \mathbb N} \lambda_j^{\beta - \alpha} \lesssim \sum_{j \in \mathbb N} j^{\frac{2(\beta - \alpha)}{d}} < \infty,
\]
provided that $\beta < \alpha - \frac d2$. We thus define the prior $\Pi'$ on $\Theta$ as the resulting Borel law $\Pi' = \mathrm{Law}(\mathcal W)$, whose support is given by $\mathcal R := \dot H^\beta(\mathcal M, W)$. If $\beta > \frac d2$, the Sobolev embedding $\dot H^\beta(\mathcal M, W) \hookrightarrow C^0(\mathcal M, W)$ implies that $\Pi'$ also defines a law on $C^0(\mathcal M, W)$.

Remark A.2. i) As an alternative to the Gaussian random series approach, Whittle–Matérn prior measures are commonly used for the construction of infinite-dimensional priors; see Nickl (2023), Theorem B.1.3, for further details.

ii) If $\mathcal M = \mathcal O \subseteq \mathbb R^d$ is a bounded domain with smooth boundary $\partial\mathcal O$, one may wish to encode boundary behaviour into the prior $\Pi'$. This is particularly relevant for the Darcy problem; see, for instance, Giordano and Nickl (2020). To this end, consider the centred Gaussian process $\{\mathcal W(z) : z \in \mathcal O\}$ and a smooth cut-off function $\phi \in C_c^\infty(\mathcal O)$. Defining $\{\mathcal W_\phi(z) := \mathcal W(z)\phi(z) : z \in \mathcal O\}$ yields a centred Gaussian process with RKHS $\mathcal H_\phi := \{h\phi : h \in \mathcal H\} \subseteq H_c^\alpha(\mathcal O, W)$, whose support is continuously embedded into $H_c^\beta(\mathcal O, W)$ for $\beta < \alpha - \frac d2$.

iii) High-dimensional sieved priors: For computational reasons, it can be attractive to consider finite- (but high-) dimensional priors. Fix a dimension $D \in \mathbb N$ and consider the truncated Gaussian random series
\[
\forall z \in \mathcal M: \quad \mathcal W_D(z) := \sum_{j \le D} \lambda_j^{-\alpha/2}\, g_j\, e_j(z),
\]
whose law defines a Gaussian prior distribution $\Pi' = \mathrm{Law}(\mathcal W_D)$ on the finite-dimensional space $E_D := \mathrm{span}\{e_j : j \le D\} \simeq \mathbb R^D$. More precisely, $\Pi' = N\big( 0, \mathrm{diag}(\lambda_j^{-\alpha} : j \le D) \big)$.

In this work, we restrict ourselves to priors constructed as in Example A.1, noting that analogous results can be obtained with only minor modifications of the proofs.
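The truncated series of Remark A.2 iii) is straightforward to simulate. The following is a minimal numerical sketch, assuming the one-dimensional torus $\mathcal M = \mathbb T^1$ with the standard Fourier sine/cosine system and surrogate eigenvalues $\lambda_j \asymp j^{2/d} = j^2$; all names and the discretisation are illustrative and not part of the paper's setup.

```python
import numpy as np

def sample_sieved_prior(D, alpha, grid, rng):
    """Draw one sample of the truncated Gaussian series
    W_D(z) = sum_{j <= D} lambda_j^{-alpha/2} g_j e_j(z)
    on T^1, with e_j the Fourier basis and lambda_j ~ j^2."""
    g = rng.standard_normal(D)                       # i.i.d. N(0,1) coefficients g_j
    W = np.zeros_like(grid)
    for j in range(1, D + 1):
        lam = float(j) ** 2                          # surrogate Laplacian eigenvalue
        k = (j + 1) // 2                             # frequency of the j-th basis function
        # alternate cosines and sines as an L^2(T^1) orthonormal system
        e_j = (np.sqrt(2.0) * np.cos(2 * np.pi * k * grid) if j % 2
               else np.sqrt(2.0) * np.sin(2 * np.pi * k * grid))
        W += lam ** (-alpha / 2) * g[j - 1] * e_j
    return W

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 256, endpoint=False)
W = sample_sieved_prior(D=100, alpha=2.0, grid=grid, rng=rng)
# Larger alpha damps high frequencies, i.e. smoother draws:
W_smooth = sample_sieved_prior(D=100, alpha=4.0, grid=grid, rng=rng)
```

The damping $\lambda_j^{-\alpha/2}$ is what encodes the prior smoothness $\alpha$: increasing it shrinks the high-frequency coefficients and produces draws in $\dot H^\beta$ for every $\beta < \alpha - d/2$.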
We refer to Nickl (2023) (in particular Exercise 2.4.3), as well as to Bohr and Nickl (2024) and Giordano and Nickl (2020). The following example summarizes explicit choices of priors $\Pi'$ for the study of the reaction–diffusion equation and the 2D Navier–Stokes equation in Section 4.

Example A.3 (Gaussian process priors for RDE and NSE). i) Reaction–diffusion equation: Based on Eq. (38), choose the orthonormal system from Eq. (4), which yields a prior supported on $\mathcal R = \dot H^\beta(\mathbb T^d)$ for all $\beta < \alpha - \frac d2$. See also Nickl (2024).

ii) 2D Navier–Stokes equation: Based on Eq. (38), let $k = (k_1, k_2) \in \mathbb Z^2 \setminus \{(0, 0)\}$ and define
\[
c_k(x) \propto (k_2, -k_1)^T \cos(2\pi k \cdot x), \qquad s_k(x) \propto (k_2, -k_1)^T \sin(2\pi k \cdot x),
\]
which are eigenfunctions of the Stokes operator $A = -P\Delta$. After enumerating $k = k_j$, $j \in \mathbb N$, we define the eigenpairs $(\lambda_j, e_j)_{j \in \mathbb N}$ from Example A.1 by
\[
e_{2j-1} := c_{k_j}, \qquad e_{2j} := s_{k_j}, \qquad A e_j = \lambda_j e_j, \qquad 0 < \lambda_j \simeq |k_j|^2.
\]
The final identity follows from Weyl's law for the eigenvalues $\lambda_j$; see Proposition 4.14 in Constantin and Foias (1989). Choosing $\Pi' = \mathrm{Law}(\mathcal W)$ then induces a law on $\dot H^\beta_\diamond$ for all $\beta < \alpha - \frac d2$. See also Konen and Nickl (2025).

Appendix B: Mild Misspecification in M-Estimation

While posterior contraction appears to be delicate with respect to (mild) misspecification, it is well known that frequentist M-estimation techniques are quite robust; see van der Vaart (1998); van de Geer (2000b). In this section, we revisit the results of Siebel (2025) while generalizing the techniques of Nickl et al. (2020) to allow for heteroscedasticity, (mild) misspecification, and unbounded forward maps, such as the solution maps of the reaction–diffusion equation and the 2D Navier–Stokes equations. Besides allowing a direct comparison between Bayesian and frequentist approaches, the favourable asymptotic behaviour of the MAP estimator is used to prove Proposition 3.7.
Recall the observation scheme presented in Section 2.3. Instead of assuming heteroscedastic Gaussian errors, M-estimation techniques allow us to consider more general distributions.

Condition B.1 (Bernstein Condition). The error terms $\varepsilon_1, \dots, \varepsilon_N$ are independent, heteroscedastic, and centred $V$-valued random variables that satisfy a Bernstein-type condition. That is, there exist a family $(\sigma_i^2)_{i=1}^N \subseteq (0, \infty)$ and a finite constant $B \in (0, \infty)$ such that for all $i = 1, \dots, N$,
\[
\forall v \in V : \quad \mathbb E[\langle \varepsilon_i, v\rangle_V] = 0 \quad \text{and} \quad \mathbb E\big[|\langle \varepsilon_i, v\rangle_V|^2\big] \le \sigma_i^2 |v|_V^2,
\]
as well as
\[
\forall v \in V,\ k \in \mathbb N_{\ge 2} : \quad \mathbb E\big[|\langle \varepsilon_i, v\rangle_V|^k\big] \le \frac{k!}{2}\, \sigma_i^2 B^{k-2} |v|_V^k.
\]
In addition, there exist $\sigma_0, \sigma_\infty \in (0, \infty)$ such that $\sigma_0^2 \le \min_{i \le N} \sigma_i^2 \le \max_{i \le N} \sigma_i^2 \le \sigma_\infty^2$.

Example B.2. In the following, let $N \in \mathbb N$ be fixed.

i) Let $V = \mathbb R^d$ with $d \in \mathbb N$. Assume that $\varepsilon_i \sim N(0, \Sigma_i)$ independently, with positive semi-definite covariance matrices $\Sigma_i \in \mathbb R^{d \times d}$, $i = 1, \dots, N$, whose largest eigenvalues are uniformly bounded, i.e., $\max_{i \le N} \lambda_{\max}(\Sigma_i) \le \Sigma_\infty \in (0, \infty)$. Then, for all $v \in \mathbb R^d$, we have
\[
\mathbb E[\langle \varepsilon_i, v\rangle_{\mathbb R^d}] = 0 \quad \text{and} \quad \mathbb E\big[|\langle \varepsilon_i, v\rangle_{\mathbb R^d}|^2\big] \le \lambda_{\max}(\Sigma_i)\, |v|_{\mathbb R^d}^2.
\]
Moreover,
\[
\forall v \in \mathbb R^d,\ k \in \mathbb N_{\ge 2} : \quad \mathbb E\big[|\langle \varepsilon_i, v\rangle_{\mathbb R^d}|^k\big] \le \frac{k!}{2} \big( v^\top \Sigma_i v \big)^{k/2} \le \frac{k!}{2}\, \lambda_{\max}(\Sigma_i)\, \Sigma_\infty^{\frac{k-2}{2}} |v|_{\mathbb R^d}^k.
\]
Thus, $\varepsilon_1, \dots, \varepsilon_N$ satisfy Condition B.1 with $\sigma_i^2 = \lambda_{\max}(\Sigma_i)$, $i \le N$, and $B = \sqrt{\Sigma_\infty}$. In particular, if $\Sigma_i = v_i^2\, \mathrm{diag}(1, \dots, 1) \in \mathbb R^{d \times d}$ with $(v_i^2)_{i \le N} \subseteq (0, v_\infty^2)$ for some $v_\infty^2 > 0$, we have $(\sigma_i^2)_{i \le N} = (v_i^2)_{i \le N}$ and $B = v_\infty$.

ii) Let $\varepsilon_1, \dots, \varepsilon_N$ be independent, centred, $V$-valued random variables such that $|\varepsilon_i|_V \le B_i \in (0, \infty)$ for all $i = 1, \dots, N$. Further assume that $\max_{i \le N} B_i \le B_\infty \in (0, \infty)$.
Then, for all $v \in V$,
\[
\mathbb E[\langle \varepsilon_i, v\rangle_V] = 0 \quad \text{and} \quad \mathbb E\big[|\langle \varepsilon_i, v\rangle_V|^2\big] \le B_i^2 |v|_V^2,
\]
as well as
\[
\forall v \in V,\ k \in \mathbb N_{\ge 2} : \quad \mathbb E\big[|\langle \varepsilon_i, v\rangle_V|^k\big] \le B_i^k |v|_V^k \le \frac{k!}{2} B_i^2 B_\infty^{k-2} |v|_V^k.
\]
Thus, $\varepsilon_1, \dots, \varepsilon_N$ satisfy Condition B.1 with $\sigma_i^2 = B_i^2$ and $B = B_\infty$.

In the Bayesian approach, we have imposed in Condition 2.2 forward regularity assumptions that hold locally on bounded balls of $\mathcal R$. For M-estimation techniques, however, to obtain statistical guarantees for global optimizers we generally need assumptions that quantify the regularity of the forward map more globally.

Condition B.3 (Forward Regularity – Global). Let $\Theta \subseteq L^2(\mathcal M, \mathcal W)$ be the parameter space. Let $(\mathcal R, \|\cdot\|_{\mathcal R})$ be a separable normed subspace of $\Theta$ such that $(\mathcal R, \|\cdot\|_{\mathcal R}) \hookrightarrow (B^\eta, \|\cdot\|_{B^\eta})$, where $B^\eta$ is either $C^\eta(\mathcal M, \mathcal W)$ or $H^\eta(\mathcal M, \mathcal W)$ for some $\eta \ge 0$. Furthermore, let $\Theta_\diamond \subseteq \Theta$.

[gFR1] There exist constants $C'_{\mathrm{Lip},2} > 0$ and $\gamma_2, \kappa \ge 0$ such that for all $\theta_1, \theta_2 \in \Theta_\diamond \cap \mathcal R$,
\[
\|\mathscr G(\theta_1) - \mathscr G(\theta_2)\|_{L^2_\zeta(\mathcal Z, V)} \le C'_{\mathrm{Lip},2}\big( 1 + \|\theta_1\|_{\mathcal R}^{\gamma_2} \vee \|\theta_2\|_{\mathcal R}^{\gamma_2} \big) \|\theta_1 - \theta_2\|_{(H^\kappa(\mathcal M, \mathcal W))^*}.
\]

[gFR2] There exist constants $C'_{\mathscr G, B} > 0$ and $\gamma_B \ge 0$ such that for all $\theta \in \Theta_\diamond \cap \mathcal R$,
\[
\|\mathscr G(\theta)\|_\infty \le C'_{\mathscr G, B}\big( 1 + \|\theta\|_{\mathcal R}^{\gamma_B} \big).
\]

[gFR3] There exist constants $C'_{\mathrm{Lip},\infty} > 0$ and $\gamma_\infty \ge 0$ such that for all $\theta_1, \theta_2 \in \Theta_\diamond \cap \mathcal R$,
\[
\|\mathscr G(\theta_1) - \mathscr G(\theta_2)\|_\infty \le C'_{\mathrm{Lip},\infty}\big( 1 + \|\theta_1\|_{\mathcal R}^{\gamma_\infty} \vee \|\theta_2\|_{\mathcal R}^{\gamma_\infty} \big) \|\theta_1 - \theta_2\|_{B^\eta}.
\]

We see that the conditions imposed in Condition B.3 are stronger than, and hence imply, the conditions imposed in Condition 2.2.

B.1 Tikhonov-regularized estimator

Let $N \in \mathbb N$ and $\theta_0 \in \Theta$.
Given data $D_N \sim P^N_{\theta_0}$, any $\delta > 0$, and $\alpha > 0$, we use the proxy variances $(s_i^2)_{i \le N}$ and the proxy forward map $\widetilde{\mathscr G}$ to define a Tikhonov-regularized functional $\tilde J_{\delta, N, s} : \Theta \to [-\infty, 0]$,
\[
\tilde J_{\delta, N, s}[\theta] := -\frac{1}{2N} \sum_{i \le N} s_i^{-2} \big| Y_i - \widetilde{\mathscr G}_\theta(Z_i) \big|_V^2 - \frac{\delta^2}{2} \|\theta\|_{\mathcal H}^2,
\]
where $\mathcal H \subseteq L^2(\mathcal M, \mathcal W)$ is a separable Hilbert space. We call an element $\hat\theta_N \in \Theta_\diamond$ a maximizer of $\tilde J_{\delta, N, s}$ over $\Theta_\diamond \subseteq \Theta$ if it satisfies
\[
\tilde J_{\delta, N, s}[\hat\theta_N] = \sup_{\theta \in \Theta_\diamond} \tilde J_{\delta, N, s}[\theta]. \tag{39}
\]

Proposition B.4 (Existence of $\hat\theta_N$). Let Condition B.1 be satisfied. Let $\theta_0 \in \Theta$ be fixed. Assume $D_N \sim P^N_{\theta_0}$ for a fixed sample size $N \in \mathbb N$. Assume that the surrogate operator $\widetilde{\mathscr G}$ satisfies [gFR2] and [gFR3] of Condition B.3 as well as [MM2] of Condition 3.3. Let $\mathcal H \subseteq L^2(\mathcal M, \mathcal W)$ satisfy Condition 2.4 with $\alpha > \eta + \frac d2$. Let either
\[
\Theta_\diamond \subseteq \mathcal H, \ \text{if } \gamma_B = 0, \qquad \text{or} \qquad \Theta_\diamond \subseteq \mathcal R(M) \cap \mathcal H \ \text{for some } M \in (0, \infty), \ \text{if } \gamma_B > 0, \tag{40}
\]
be weakly closed in $\mathcal H$. Then, for all $\delta \in (0, \infty)$ and proxy variances $(s_i^2)_{i=1}^N \subseteq (0, \infty)$ with $\min_{i \le N} s_i^2 \ge s_0^2 \in (0, \infty)$, there exists, almost surely under $P^N_{\theta_0}$, a maximizer $\hat\theta_N$ of $\tilde J_{\delta, N, s}$ over $\Theta_\diamond$, that is,
\[
\tilde J_{\delta, N, s}[\hat\theta_N] = \sup_{\theta \in \Theta_\diamond} \tilde J_{\delta, N, s}[\theta] \quad P^N_{\theta_0}\text{-a.s.}
\]

Proof of Proposition B.4. The proof follows the lines of Siebel (2025), where existence is shown for the homoscedastic and correctly specified model, i.e. when $s_i^2 = \sigma_i^2 = \sigma^2 \in (0, \infty)$ for all $i \le N$. As the proof does not change substantially, the details are left to the interested reader.

Example B.5. In Proposition B.4, $\Theta_\diamond$ must be chosen such that it is weakly closed in $\mathcal H$. Standard arguments show that typical choices such as $\Theta_\diamond := \mathcal H$, if $\gamma_B = 0$, or $\Theta_\diamond := \mathcal H \cap \mathcal R(M)$ and $\Theta_\diamond := \mathcal H(M)$ for some $M \in (0, \infty)$, if $\gamma_B > 0$, satisfy the requirements, provided $\mathcal H \hookrightarrow \mathcal R$.

Theorem B.6. Grant Condition B.1 and assume that the surrogate operator $\widetilde{\mathscr G}$ satisfies Condition B.3 as well as [MM2] of Condition 3.3.
Let $N \in \mathbb N$ and $\theta_0 \in \Theta$ be fixed, and let $D_N \sim P^N_{\theta_0}$. Let $\mathcal H \subseteq L^2(\mathcal M, \mathcal W)$ satisfy Condition 2.4 with
\[
\alpha > \max\Big\{ \tfrac d2 \gamma_2 - \kappa,\ \eta + d \vee \tfrac d2 (1 + \gamma_\infty) \Big\}.
\]
Let either $\Theta_\diamond \subseteq \mathcal H$, if $\gamma_B = 0$, or $\Theta_\diamond \subseteq \mathcal R(M) \cap \mathcal H$ for some $M \in (0, \infty)$, if $\gamma_B > 0$, be weakly closed in $\mathcal H$. Assume that the proxy variances $(s_i^2)_{i=1}^N \subseteq (0, \infty)$ satisfy
\[
0 < s_0^2 := \min_{i \le N} s_i^2 \le \max_{i \le N} s_i^2 =: s_\infty^2.
\]
Given any $\bar c \in (0, \infty)$, we define the family of symbols
\[
\mathcal S := \big\{ \bar c, \alpha, \gamma_2, \gamma_\infty, \gamma_B, \kappa, \eta, d, d_{\mathcal W}, M, C_{\widetilde{\mathscr G}, B}, C'_{\mathrm{Lip},2}, C'_{\mathrm{Lip},\infty}, \sigma_0, \sigma_\infty, s_0, s_\infty, B \big\}.
\]
Then, we can choose $C_{\mathrm{Rate}} = C_{\mathrm{Rate}}(\mathcal S) \in (0, \infty)$ sufficiently large such that for all $\delta \in (0, \infty)$ and $R \in (0, \infty)$ with $R \ge \delta \ge N^{-\frac12}$ that fulfill
\[
\delta^{-1 - \frac{d}{2(\alpha + \kappa)}} \lesssim N^{\frac12}, \tag{41}
\]
any maximizer $\hat\theta_N$ of $\tilde J_{\delta, N, s}$ over $\Theta_\diamond$ satisfies
\[
P^N_{\theta_0}\Big( \tilde d^2_{\delta, s}(\hat\theta_N, \theta_0) \ge C_{\mathrm{Rate}} \big( \tilde d^2_{\delta, s}(\theta_\diamond, \theta_0) + R^2 \big) \Big) \le 3 \ln(2) \exp\big( -\bar c N R^2 \big)
\]
for any $\theta_\diamond \in \Theta_\diamond$. Here, $\tilde d^2_{\delta, s}$ is defined via
\[
\forall \theta_1 \in \Theta \cap \mathcal H,\ \theta_2 \in \Theta : \quad \tilde d^2_{\delta, s}(\theta_1, \theta_2) := \bar s_N^{-2} \|\widetilde{\mathscr G}(\theta_1) - \mathscr G(\theta_2)\|^2_{L^2_\zeta(\mathcal Z, V)} + \delta^2 \|\theta_1\|^2_{\mathcal H}.
\]

B.2 Consistency and Tests

Corollary B.7. Grant Condition B.1 and assume that the surrogate operator $\widetilde{\mathscr G}$ satisfies Condition B.3 as well as [MM2] of Condition 3.3. Let $\mathcal H \subseteq L^2(\mathcal M, \mathcal W)$ satisfy Condition 2.4 with $\alpha > \max\{ \tfrac d2 \gamma_2 - \kappa,\ \eta + d \vee \tfrac d2 (1 + \gamma_\infty) \}$. Let either $\Theta_\diamond = \mathcal H$, if $\gamma_B = 0$, or $\Theta_\diamond = \mathcal R(M) \cap \mathcal H$ for some $M \in (0, \infty)$, if $\gamma_B > 0$. Let $\delta \in (0, \infty)$ be such that
\[
\bar\delta := \delta \vee \tilde\delta_{\mathrm{model}, N} \ge N^{-\frac12} \quad \text{and} \quad \bar\delta^{-1 - \frac{d}{2(\alpha + \kappa)}} \lesssim N^{\frac12}.
\]
Given any $\bar c, M \in (0, \infty)$, we can choose $C_{\mathrm{Rate}} = C_{\mathrm{Rate}}(\mathcal S) \in (0, \infty)$ and $m = m(C'_{\mathrm{Lip},2}, M, \gamma_2)$ sufficiently large such that any maximizer $\hat\theta_N$ of $\tilde J_{\delta, N, 1}$ over $\Theta_\diamond$ satisfies
\[
\sup_{\theta_0 \in \Theta_{\bar\delta}(M)} P^N_{\theta_0}\Big( \|\widetilde{\mathscr G}(\hat\theta_N) - \mathscr G(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)} + \delta^2 \|\hat\theta_N\|^2_{\mathcal H} \ge C_{\mathrm{Rate}}\, m^2 \bar\delta^2 \Big) \lesssim \exp\big( -\bar c m^2 N \bar\delta^2 \big).
\]
Here, $\Theta_{\bar\delta}(M)$ denotes
\[
\Theta_{\bar\delta}(M) := \Big\{ \theta \in \mathcal R : \theta = \theta_1 + \theta_2,\ \|\theta_1\|_{(H^\kappa(\mathcal M, \mathcal W))^*} \le M \bar\delta,\ \|\theta_2\|_{\mathcal H} \le M,\ \|\theta\|_{\mathcal R} \le M \Big\}.
\]

Remark B.8. Note that if $\delta$ is chosen in an admissible way such that $N \bar\delta^2 \to \infty$, we obtain consistency. This is in particular the case in the situation where $\delta = \delta_N = N^{-\frac{\alpha + \kappa}{2(\alpha + \kappa) + d}}$ and $\tilde\delta_{\mathrm{model}, N} \le \delta_N$, so that $\bar\delta = \delta_N$.

Proof of Corollary B.7. Fix $\theta_0 \in \Theta_{\bar\delta}(M)$ and set $1 := (1)_{i=1}^N =: (s_i^2)_{i=1}^N$. Then, following Proposition B.4 and Example B.5, there exists a maximizer $\hat\theta_N$ of $\tilde J_{\delta, N, 1}$ over $\Theta_\diamond$ at least $P^N_{\theta_0}$-almost surely. As $\widetilde{\mathscr G}$ satisfies [MM2] of Condition 3.3, we have for all $\theta_\diamond \in \Theta_\diamond$
\[
\|\widetilde{\mathscr G}(\theta_\diamond) - \mathscr G(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)} \le 2 \|\widetilde{\mathscr G}(\theta_\diamond) - \widetilde{\mathscr G}(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)} + 2 \|\widetilde{\mathscr G}(\theta_0) - \mathscr G(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)}.
\]
As $\theta_0 \in \Theta_{\bar\delta}(M) \subseteq \mathcal R(M)$, we have
\[
\|\widetilde{\mathscr G}(\theta_0) - \mathscr G(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)} \le c_1(M) \times \tilde\delta^2_{\mathrm{model}, N} \le c_1(M) \times \bar\delta^2.
\]
As $\widetilde{\mathscr G}$ satisfies Condition B.3, we further have
\[
\|\widetilde{\mathscr G}(\theta_\diamond) - \widetilde{\mathscr G}(\theta_0)\|_{L^2_\zeta(\mathcal Z, V)} \le C'_{\mathrm{Lip},2}\big( 1 + \|\theta_\diamond\|^{\gamma_2}_{\mathcal R} \vee \|\theta_0\|^{\gamma_2}_{\mathcal R} \big) \|\theta_\diamond - \theta_0\|_{(H^\kappa(\mathcal M, \mathcal W))^*}.
\]
Now, as $\theta_0 \in \Theta_{\bar\delta}(M)$, we can choose $\theta_{0,1}, \theta_{0,2}$ such that $\theta_0 = \theta_{0,1} + \theta_{0,2}$ and
\[
\|\theta_{0,1}\|_{(H^\kappa(\mathcal M, \mathcal W))^*} \le M \bar\delta \quad \text{and} \quad \|\theta_{0,2}\|_{\mathcal H} \le M.
\]
We can then choose $\theta_\diamond = \theta_{0,2} \in \mathcal H(M) \subseteq \mathcal H \cap \mathcal R(M)$, such that the last display reads
\[
\|\widetilde{\mathscr G}(\theta_\diamond) - \widetilde{\mathscr G}(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)} \le c_2(C'_{\mathrm{Lip},2}) \times \big( 1 + M^{2\gamma_2} \big) \bar\delta^2 M^2.
\]
Overall, we conclude
\[
\|\widetilde{\mathscr G}(\theta_\diamond) - \mathscr G(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)} + \delta^2 \|\theta_\diamond\|^2_{\mathcal H} \le c_3(C'_{\mathrm{Lip},2}, M, \gamma_2) \times \bar\delta^2,
\]
where the right-hand side does not depend on the choice of $\theta_0$. Thus, choosing $R = \frac{m}{\sqrt 2} \bar\delta$ with $m^2 \ge 2 \vee 2 c_3(C'_{\mathrm{Lip},2}, M, \gamma_2)$, the claim follows as a direct consequence of Theorem B.6.

Corollary B.9 (Tests). Let $N \in \mathbb N$. Assume Condition B.1 and that the true operator $\mathscr G$ satisfies Condition 2.2. Let $\theta_0 \in \mathcal H$ be fixed, with $\alpha > \eta + d$. Let $D_N \sim P^N_{\theta_0}$.
Given $\bar c \in (0, \infty)$, there exists a test (indicator function) $\Psi_N = \Psi_N(D_N)$ such that
\[
\lim_{N \to \infty} \mathbb E^N_{\theta_0}[\Psi_N] = 0 \quad \text{and} \quad \sup_{\theta \in \Theta_N(M) :\ \|\mathscr G(\theta) - \mathscr G(\theta_0)\|_{L^2_\zeta(\mathcal Z, V)} \ge L \delta_N} \mathbb E^N_\theta[1 - \Psi_N] \lesssim \exp\big( -\bar c N \delta_N^2 \big)
\]
for all $L = L(\mathcal S)$, $M = M(\theta_0) > 0$ and $N$ sufficiently large.

Proof of Corollary B.9. The proof follows by arguments similar to those presented in Corollary B.7, with the simplification $\widetilde{\mathscr G} = \mathscr G$ and hence $\tilde\delta_{\mathrm{model}, N} = 0$. Given $M \in (0, \infty)$, consider $\Theta_\diamond := \mathcal H(M)$. From the proof of Theorem B.6 it becomes evident that we can weaken the global assumptions on $\mathscr G$ formulated in Condition B.3 to the local ones stated in Condition 2.2 if $\Theta_\diamond$ is a bounded ball in $\mathcal H$. Choosing $\delta = \delta_N$ as in Remark B.8 and utilizing Proposition B.4 and Example B.5, there exists a maximizer $\hat\theta_N$ of $\tilde J_{\delta, N, s}$ over $\mathcal H(M)$ at least $P^N_{\theta_0}$-almost surely. We then define the test statistic
\[
\hat S_N := \|\mathscr G(\hat\theta_N) - \mathscr G(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)} + \delta_N^2 \|\hat\theta_N\|^2_{\mathcal H}
\]
and a corresponding test $\Psi_N = \Psi_N(D_N)$ by $\Psi_N := \mathbf 1\{\hat S_N \ge c_1 \delta_N^2\}$ for some constant $c_1 \in (0, \infty)$ chosen later. Applying Theorem B.6 with $R = \delta_N$, we get
\[
P^N_{\theta_0}\Big( d^2_{\delta_N}(\hat\theta_N, \theta_0) \ge C_{\mathrm{Rate}} \big( d^2_{\delta_N}(\theta_\diamond, \theta_0) + \delta_N^2 \big) \Big) \lesssim \exp\big( -\bar c N \delta_N^2 \big)
\]
for any $\theta_\diamond \in \mathcal H(M)$ and $C_{\mathrm{Rate}} = C_{\mathrm{Rate}}(\mathcal S) \in (0, \infty)$ sufficiently large. Here, $d^2_{\delta_N}$ is defined via
\[
\forall \theta_1 \in \Theta \cap \mathcal H,\ \theta_2 \in \Theta : \quad d^2_{\delta_N}(\theta_1, \theta_2) := \|\mathscr G(\theta_1) - \mathscr G(\theta_2)\|^2_{L^2_\zeta(\mathcal Z, V)} + \delta_N^2 \|\theta_1\|^2_{\mathcal H}.
\]
With $M > \|\theta_0\|_{\mathcal H}$, such that $\theta_0 \in \Theta_\diamond$, we can choose $\theta_\diamond = \theta_0$ and obtain
\[
C_{\mathrm{Rate}} \big( d^2_{\delta_N}(\theta_\diamond, \theta_0) + \delta_N^2 \big) \le C_{\mathrm{Rate}} \big( \delta_N^2 \|\theta_0\|^2_{\mathcal H} + \delta_N^2 \big) \le C'_{\mathrm{Rate}}(\mathcal S, \|\theta_0\|^2_{\mathcal H}) \times \delta_N^2.
\]
Thus,
\[
\mathbb E^N_{\theta_0}[\Psi_N] = P^N_{\theta_0}\Big( \hat S_N \ge C'_{\mathrm{Rate}}(\mathcal S, \|\theta_0\|^2_{\mathcal H}) \times \delta_N^2 \Big) \lesssim \exp\big( -\bar c N \delta_N^2 \big),
\]
which controls the type-I error. For the type-II error, let $\theta \in \Theta_N(M)$ with $\|\mathscr G(\theta) - \mathscr G(\theta_0)\|_{L^2_\zeta(\mathcal Z, V)} \ge L \delta_N$ be arbitrary but fixed.
We then have
\[
\|\mathscr G(\theta) - \mathscr G(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)} \le 2 \|\mathscr G(\hat\theta_N) - \mathscr G(\theta)\|^2_{L^2_\zeta(\mathcal Z, V)} + 2 \|\mathscr G(\hat\theta_N) - \mathscr G(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)},
\]
where we have applied the triangle inequality and $(a + b)^2 \le 2a^2 + 2b^2$ for all reals $a, b$. Thus, we have
\begin{align*}
d^2_{\delta_N}(\hat\theta_N, \theta_0) &= \|\mathscr G(\hat\theta_N) - \mathscr G(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)} + \delta_N^2 \|\hat\theta_N\|^2_{\mathcal H} \\
&\ge \frac12 \|\mathscr G(\theta) - \mathscr G(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)} - \|\mathscr G(\hat\theta_N) - \mathscr G(\theta)\|^2_{L^2_\zeta(\mathcal Z, V)} + \delta_N^2 \|\hat\theta_N\|^2_{\mathcal H} \\
&\ge \frac12 \|\mathscr G(\theta) - \mathscr G(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)} - d^2_{\delta_N}(\hat\theta_N, \theta).
\end{align*}
Thus, we have
\begin{align*}
\mathbb E^N_\theta[1 - \Psi_N] &= P^N_\theta\big( d^2_{\delta_N}(\hat\theta_N, \theta_0) \le c_1 \delta_N^2 \big) \\
&\le P^N_\theta\Big( d^2_{\delta_N}(\hat\theta_N, \theta) \ge \tfrac12 \|\mathscr G(\theta) - \mathscr G(\theta_0)\|^2_{L^2_\zeta(\mathcal Z, V)} - c_1 \delta_N^2 \Big) \\
&\le P^N_\theta\Big( d^2_{\delta_N}(\hat\theta_N, \theta) \ge \big( \tfrac12 L^2 - c_1 \big) \delta_N^2 \Big) \le P^N_\theta\Big( d^2_{\delta_N}(\hat\theta_N, \theta) \ge \tfrac14 L^2 \delta_N^2 \Big),
\end{align*}
where we have used that $L \ge 2\sqrt{c_1}$. To upper bound the last probability, let $\hat\theta_N$ be any maximizer of $\tilde J_{\delta, N, s}$ over $\Theta_\diamond = \mathcal H(M)$ under $P^N_\theta$-probability with $\theta \in \Theta_N(M)$. By Theorem B.6, for every $\bar c \in (0, \infty)$ we can find $C_{\mathrm{Rate}} = C_{\mathrm{Rate}}(\mathcal S)$ sufficiently large such that
\[
P^N_\theta\Big( d^2_{\delta_N}(\hat\theta_N, \theta) \ge C_{\mathrm{Rate}} \big( d^2_{\delta_N}(\theta_\diamond, \theta) + \delta_N^2 \big) \Big) \le 6 \exp\big( -\bar c N \delta_N^2 \big).
\]
As $\theta \in \Theta_N(M)$, there exists a decomposition $\theta = \theta_1 + \theta_2$ such that $\|\theta_1\|_{(H^\kappa(\mathcal M, \mathcal W))^*} \le M \delta_N$ and $\|\theta_2\|_{\mathcal H} \le M$, as well as $\|\theta\|_{\mathcal R} \le M$. Thus, choosing $\theta_\diamond = \theta_2$, we obtain with [FR1]
\[
d^2_{\delta_N}(\theta_\diamond, \theta) \lesssim_M \|\theta_1\|^2_{(H^\kappa(\mathcal M, \mathcal W))^*} \lesssim_M \delta_N^2,
\]
where $\lesssim_M$ means that the multiplicative constant depends on $M$. Thus, choosing $L$ sufficiently large, depending on $M$, we obtain
\[
\mathbb E^N_\theta[1 - \Psi_N] \lesssim \exp\big( -\bar c N \delta_N^2 \big)
\]
uniformly over $\theta \in \Theta_N(M)$, which shows the claim.

B.3 Proof of Theorem B.6

In this section we prove the general concentration inequality, Theorem B.6. To that end, we need some notation and key arguments from M-estimation theory, adapted to the present setting.

Notation B.10. 1.
For $N \in \mathbb N$, let $D_N = (Y_i, Z_i)_{i \le N} \subseteq (V \times \mathcal Z)^N$ be a finite family of random variables with law $P^N_{\theta_0}$, where $\theta_0 \in \Theta$. We denote by $P^{(i)}_{\theta_0}$ the law of the datum $(Y_i, Z_i)$, $i \le N$. For any $(y, z) \in V \times \mathcal Z$, let $\delta_{(y,z)}(\cdot)$ denote the Dirac measure on $(V \times \mathcal Z, \mathcal B_{V \times \mathcal Z})$. For each $i \le N$, we define
\[
\forall B \in \mathcal B_{V \times \mathcal Z} : \quad \hat P^{(i)}(B) := \delta_{(Y_i, Z_i)}(B).
\]
Thus, for any measurable function $h : V \times \mathcal Z \to \mathbb R$ and each $i \le N$,
\[
\int_{V \times \mathcal Z} h(y, z) \, \mathrm d \hat P^{(i)}(y, z) = h(Y_i, Z_i).
\]

2. For any $\theta \in \Theta$, we define
\[
\forall (y, z) \in V \times \mathcal Z : \quad \tilde L_\theta(y, z) := \exp\Big( \big| y - \widetilde{\mathscr G}_\theta(z) \big|_V^2 \Big).
\]

Lemma B.11 (Key Argument). Let $\Theta_\diamond \subseteq \Theta \cap \mathcal H$. Given $\theta_0 \in \Theta$ and $D_N \sim P^N_{\theta_0}$, we define for all $\theta \in \Theta_\diamond$ the empirical processes
\[
\widetilde W_N(\theta) := \frac{1}{2N} \sum_{i \le N} s_i^{-2} \int_{V \times \mathcal Z} \log\Bigg( \frac{\tilde L_{\theta_\diamond}(y_i, z_i)}{\tilde L_\theta(y_i, z_i)} \Bigg) \mathrm d\big( \hat P^{(i)} - P^{(i)}_{\theta_0} \big)(y_i, z_i)
\]
with fixed $\theta_\diamond \in \Theta_\diamond$. Given any fixed $\delta \in (0, \infty)$ and $(s_i^2)_{i=1}^N \subseteq (0, \infty)$, let $\hat\theta_N$ be any maximizer of $\tilde J_{\delta, N, s}$ over $\Theta_\diamond$, i.e. one that satisfies $\tilde J_{\delta, N, s}[\hat\theta_N] = \sup_{\theta \in \Theta_\diamond} \tilde J_{\delta, N, s}[\theta]$. We then have
\[
2 \widetilde W_N(\hat\theta_N) + \tilde d^2_{\delta, s}(\theta_\diamond, \theta_0) \ge \tilde d^2_{\delta, s}(\hat\theta_N, \theta_0),
\]
where the inequality is understood $P^N_{\theta_0}$-almost surely.

Proof of Lemma B.11. In the following, we fix an arbitrary maximizer $\hat\theta_N$ of $\tilde J_{\delta, N, s}$ under $P^N_{\theta_0}$-probability; thus, all (in-)equalities hold $P^N_{\theta_0}$-almost surely. By definition of $\hat\theta_N$, we have $\tilde J_{\delta, N, s}[\hat\theta_N] \ge \tilde J_{\delta, N, s}[\theta_\diamond]$, and hence
\[
\frac{1}{2N} \sum_{i \le N} s_i^{-2} \Big( \big| Y_i - \widetilde{\mathscr G}(\theta_\diamond)(Z_i) \big|_V^2 - \big| Y_i - \widetilde{\mathscr G}(\hat\theta_N)(Z_i) \big|_V^2 \Big) + \frac{\delta^2}{2} \|\theta_\diamond\|^2_{\mathcal H} \ge \frac{\delta^2}{2} \|\hat\theta_N\|^2_{\mathcal H}.
\]
This is equivalent to
\[
\frac{1}{2N} \sum_{i \le N} s_i^{-2} \log\Bigg( \frac{\tilde L_{\theta_\diamond}(Y_i, Z_i)}{\tilde L_{\hat\theta_N}(Y_i, Z_i)} \Bigg) + \frac{\delta^2}{2} \|\theta_\diamond\|^2_{\mathcal H} \ge \frac{\delta^2}{2} \|\hat\theta_N\|^2_{\mathcal H}.
\]
Subtracting on both sides
\[
\frac{1}{2N} \sum_{i \le N} s_i^{-2} \int_{V \times \mathcal Z} \log\Bigg( \frac{\tilde L_{\theta_\diamond}(y_i, z_i)}{\tilde L_{\hat\theta_N}(y_i, z_i)} \Bigg) \mathrm d P^{(i)}_{\theta_0}(y_i, z_i),
\]
the previous inequality becomes
\[
\widetilde W_N(\hat\theta_N) + \frac{\delta^2}{2} \|\theta_\diamond\|^2_{\mathcal H} \ge \frac{\delta^2}{2} \|\hat\theta_N\|^2_{\mathcal H} - \frac{1}{2N} \sum_{i \le N} s_i^{-2} \int_{V \times \mathcal Z} \log\Bigg( \frac{\tilde L_{\theta_\diamond}(y_i, z_i)}{\tilde L_{\hat\theta_N}(y_i, z_i)} \Bigg) \mathrm d P^{(i)}_{\theta_0}(y_i, z_i).
\]
Since $Z_i \sim \zeta$ is stochastically independent of $\varepsilon_i$, we observe that
\[
P^{(i)}_{\theta_0} = (\zeta \otimes P_{\varepsilon_i}) \circ T^{-1}_{\theta_0}, \qquad T_{\theta_0} : \mathcal Z \times V \ni (z, e) \mapsto \big( z, \mathscr G(\theta_0)(z) + e \big) \in \mathcal Z \times V,
\]
such that
\begin{align*}
\int_{V \times \mathcal Z} \log\Bigg( \frac{\tilde L_{\theta_\diamond}(y_i, z_i)}{\tilde L_{\hat\theta_N}(y_i, z_i)} \Bigg) \mathrm d P^{(i)}_{\theta_0}(y_i, z_i) = {}& \int_{\mathcal Z \times V} \Big( \big| \mathscr G(\theta_0)(z_i) - \widetilde{\mathscr G}(\theta_\diamond)(z_i) \big|_V^2 - \big| \mathscr G(\theta_0)(z_i) - \widetilde{\mathscr G}(\hat\theta_N)(z_i) \big|_V^2 \Big) \mathrm d(\zeta \otimes P_{\varepsilon_i})(z_i, e_i) \\
&+ 2 \int_{\mathcal Z \times V} \Big\langle e_i, \widetilde{\mathscr G}(\hat\theta_N)(z_i) - \widetilde{\mathscr G}(\theta_\diamond)(z_i) \Big\rangle_V \mathrm d(\zeta \otimes P_{\varepsilon_i})(z_i, e_i).
\end{align*}
By independence of $Z_i$ and $\varepsilon_i$, and since $\varepsilon_i$ is a centred $V$-valued random variable, we obtain
\[
\int_{V \times \mathcal Z} \log\Bigg( \frac{\tilde L_{\theta_\diamond}(y_i, z_i)}{\tilde L_{\hat\theta_N}(y_i, z_i)} \Bigg) \mathrm d P^{(i)}_{\theta_0}(y_i, z_i) = \|\mathscr G(\theta_0) - \widetilde{\mathscr G}(\theta_\diamond)\|^2_{L^2_\zeta(\mathcal Z, V)} - \|\mathscr G(\theta_0) - \widetilde{\mathscr G}(\hat\theta_N)\|^2_{L^2_\zeta(\mathcal Z, V)}.
\]
Combining these computations yields
\[
\widetilde W_N(\hat\theta_N) + \frac12 \bar s_N^{-2} \|\mathscr G(\theta_0) - \widetilde{\mathscr G}(\theta_\diamond)\|^2_{L^2_\zeta(\mathcal Z, V)} + \frac{\delta^2}{2} \|\theta_\diamond\|^2_{\mathcal H} \ge \frac12 \bar s_N^{-2} \|\mathscr G(\theta_0) - \widetilde{\mathscr G}(\hat\theta_N)\|^2_{L^2_\zeta(\mathcal Z, V)} + \frac{\delta^2}{2} \|\hat\theta_N\|^2_{\mathcal H}.
\]
This proves the claim.

Lemma B.12 (M-Estimation). Define for each $\theta_0 \in \Theta$, $\theta_\diamond \in \Theta_\diamond$ and real numbers $\delta, R \in (0, \infty)$ the event
\[
\Xi_{\delta, R} := \Big\{ \tilde d^2_{\delta, s}(\hat\theta_N, \theta_0) \ge c_1 \big( \tilde d^2_{\delta, s}(\theta_\diamond, \theta_0) + R^2 \big) \Big\},
\]
where $c_1 \ge 2$ is a fixed constant. On $\Xi_{\delta, R}$, we have

i) $\bar s_N^{-2} \|\widetilde{\mathscr G}(\hat\theta_N) - \widetilde{\mathscr G}(\theta_\diamond)\|^2_{L^2_\zeta(\mathcal Z, V)} + \delta^2 \|\hat\theta_N\|^2_{\mathcal H} \ge c_2 R^2$ for some constant $c_2 \ge 1$;

ii) $\tilde d^2_{\delta, s}(\hat\theta_N, \theta_0) - \tilde d^2_{\delta, s}(\theta_\diamond, \theta_0) \ge \frac16 \Big( \bar s_N^{-2} \|\widetilde{\mathscr G}(\hat\theta_N) - \widetilde{\mathscr G}(\theta_\diamond)\|^2_{L^2_\zeta(\mathcal Z, V)} + \delta^2 \|\hat\theta_N\|^2_{\mathcal H} \Big)$.

Proof of Lemma B.12. The proof follows from standard computations in M-estimation theory as they can be found in van de Geer (2001). Hence, the derivation is omitted.
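The basic inequality of Lemma B.11 can be checked numerically in a toy scalar model. In the sketch below (our own construction, not from the paper; the linear forward map $\mathscr G(\theta)(z) = \widetilde{\mathscr G}(\theta)(z) = \theta z$, the design $Z \sim U(-1,1)$, and all constants are illustrative assumptions), the inequality holds by the algebra of the proof whenever $\theta_\diamond$ lies in the optimisation grid.

```python
import numpy as np

# Toy check of Lemma B.11:
#   2 W~_N(theta_hat) + d~^2(theta_dia, theta0) >= d~^2(theta_hat, theta0).
# Here ||(a-b) z||^2_{L^2_zeta} = (a-b)^2 / 3 for Z ~ U(-1,1), s_i^2 = 1,
# and the H-norm is |theta|.
rng = np.random.default_rng(2)
N, theta0, theta_dia, delta = 200, 1.5, 1.0, 0.1

Z = rng.uniform(-1.0, 1.0, N)
Y = theta0 * Z + 0.3 * rng.standard_normal(N)

def J(th):      # Tikhonov-regularised objective J~_{delta,N,1}
    return -0.5 / N * np.sum((Y - th * Z) ** 2) - 0.5 * delta ** 2 * th ** 2

def d2(th):     # d~^2_{delta,s}(th, theta0) with s_i = 1
    return (th - theta0) ** 2 / 3.0 + delta ** 2 * th ** 2

grid = np.linspace(-2.0, 4.0, 6001)          # contains theta_dia = 1.0
theta_hat = grid[np.argmax([J(t) for t in grid])]

# Centred empirical process W~_N at theta_hat: empirical minus population
# mean of the log-likelihood ratio, as computed in the proof of Lemma B.11.
log_ratio = (Y - theta_dia * Z) ** 2 - (Y - theta_hat * Z) ** 2
mean_lr = (theta0 - theta_dia) ** 2 / 3.0 - (theta0 - theta_hat) ** 2 / 3.0
W_N = 0.5 * np.mean(log_ratio) - 0.5 * mean_lr

print(2 * W_N + d2(theta_dia) >= d2(theta_hat))   # True: basic inequality
```

The margin in the inequality is exactly $2(\tilde J[\hat\theta_N] - \tilde J[\theta_\diamond]) \ge 0$, so the check succeeds for any maximizer over a grid containing $\theta_\diamond$.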
We are now able to prove the general concentration inequality presented in Theorem B.6.

Proof of Theorem B.6. Noting that the event in question is the event $\Xi_{\delta, R}$ from Lemma B.12, Lemma B.11 implies that it suffices to establish the correct exponential decay of the probability
\[
P^N_{\theta_0}\Big( \Big\{ 2 \widetilde W_N(\hat\theta_N) + \tilde d^2_{\delta, s}(\theta_\diamond, \theta_0) \ge \tilde d^2_{\delta, s}(\hat\theta_N, \theta_0) \Big\} \cap \Xi_{\delta, R} \Big).
\]
Defining the shorthand notation
\[
\widetilde D^2(\hat\theta_N, \theta_\diamond) := \bar s_N^{-2} \|\widetilde{\mathscr G}(\hat\theta_N) - \widetilde{\mathscr G}(\theta_\diamond)\|^2_{L^2_\zeta(\mathcal Z, V)} + \delta^2 \|\hat\theta_N\|^2_{\mathcal H},
\]
the last probability can, according to Lemma B.12, be upper bounded by
\[
P^N_{\theta_0}\Big( |\widetilde W_N(\hat\theta_N)| \ge \tfrac{1}{12} \widetilde D^2(\hat\theta_N, \theta_\diamond),\ \widetilde D^2(\hat\theta_N, \theta_\diamond) \ge \tfrac12 c_1 R^2 \Big). \tag{42}
\]
To further upper bound this probability, we apply a peeling device (see van de Geer (2000a)). For this, we define slices of the set $\Theta_\diamond$: for each $l \in \mathbb N$ we set
\[
\Theta_\diamond^l := \Big\{ \theta \in \Theta_\diamond : \widetilde D^2(\theta, \theta_\diamond) \in \big[ \tfrac12 c_1 R^2 \cdot 2^{2l-2},\ \tfrac12 c_1 R^2 \cdot 2^{2l} \big) \Big\}
\]
and observe that
\[
\Big\{ \theta \in \Theta : \widetilde D^2(\theta, \theta_\diamond) \ge \tfrac12 c_1 R^2 \Big\} = \bigcup_{l \in \mathbb N} \Theta_\diamond^l.
\]
Moreover, by definition we have for each $l \in \mathbb N$
\[
\Theta_\diamond^l \subseteq \mathcal H\Big( \mathcal M, \mathcal W, \frac{2^l \sqrt{c_1}\, R}{\sqrt 2\, \delta} \Big), \qquad \text{and for each } \theta \in \Theta_\diamond^l : \quad \|\widetilde{\mathscr G}(\theta) - \widetilde{\mathscr G}(\theta_\diamond)\|^2_{L^2_\zeta(\mathcal Z, V)} \le \frac{2^{2l} c_1 R^2}{2}\, \bar s_N^2,
\]
where we have used Condition 2.4. Then, the peeling device yields that Eq. (42) can be upper bounded by
\[
\sum_{l \in \mathbb N} P^N_{\theta_0}\Bigg( \sup_{\theta \in \Theta_\diamond^l} |\widetilde W_N(\theta)| \ge \frac{2^{2l} c_1 R^2}{192} \Bigg). \tag{43}
\]
Now, analogously to the proof of Lemma B.11, we decompose $\widetilde W_N(\theta)$ into two centred empirical processes. Indeed, under $P^N_{\theta_0}$, we have for each $i \le N$
\[
\log\Bigg( \frac{\tilde L_{\theta_\diamond}(Y_i, Z_i)}{\tilde L_\theta(Y_i, Z_i)} \Bigg) = \big| \mathscr G(\theta_0)(Z_i) - \widetilde{\mathscr G}(\theta_\diamond)(Z_i) \big|_V^2 - \big| \mathscr G(\theta_0)(Z_i) - \widetilde{\mathscr G}_\theta(Z_i) \big|_V^2 + 2 \Big\langle \widetilde{\mathscr G}_\theta(Z_i) - \widetilde{\mathscr G}(\theta_\diamond)(Z_i), \varepsilon_i \Big\rangle_V,
\]
such that under $P^N_{\theta_0}$ we have
\[
\widetilde W_N(\theta) = \frac{1}{\sqrt N} T_{N,1}(\theta) + \frac{1}{\sqrt N} \big( T_{N,2}(\theta) - \mathbb E[T_{N,2}(\theta)] \big),
\]
where we have defined the empirical processes
\[
T_{N,1}(\theta) := \frac{1}{\sqrt N} \sum_{i \le N} s_i^{-2} \Big\langle \varepsilon_i, \widetilde{\mathscr G}_\theta(Z_i) - \widetilde{\mathscr G}(\theta_\diamond)(Z_i) \Big\rangle_V
\]
and
\[
T_{N,2}(\theta) := \frac{1}{2\sqrt N} \sum_{i \le N} s_i^{-2} \Big( \big| \mathscr G(\theta_0)(Z_i) - \widetilde{\mathscr G}(\theta_\diamond)(Z_i) \big|_V^2 - \big| \mathscr G(\theta_0)(Z_i) - \widetilde{\mathscr G}(\theta)(Z_i) \big|_V^2 \Big).
\]
Thus, Eq. (43) can be upper bounded by
\[
\sum_{l \in \mathbb N} P\Bigg( \sup_{\theta \in \Theta_\diamond^l} |T_{N,1}(\theta)| \ge \frac{\sqrt N\, 2^{2l} c_1 R^2}{384} \Bigg) + \sum_{l \in \mathbb N} P\Bigg( \sup_{\theta \in \Theta_\diamond^l} |T_{N,2}(\theta) - \mathbb E[T_{N,2}(\theta)]| \ge \frac{\sqrt N\, 2^{2l} c_1 R^2}{384} \Bigg).
\]
We aim to control both remaining probabilities using Lemma D.2. Starting with $(T_{N,1}(\theta))_{\theta \in \Theta}$, we define for each $l \in \mathbb N$ the class of measurable functions
\[
\mathcal H_l := \Big\{ h_\theta := \widetilde{\mathscr G}_\theta - \widetilde{\mathscr G}(\theta_\diamond) : \mathcal Z \to V \ :\ \theta \in \Theta_\diamond^l \Big\}.
\]
Now, with [gFR2], we obtain
\[
\forall \theta \in \Theta_\diamond : \quad \|h_\theta\|_\infty \le \|\widetilde{\mathscr G}_\theta - \widetilde{\mathscr G}(\theta_\diamond)\|_\infty \le 2 C_{\widetilde{\mathscr G}, B} \big( 1 + \|\theta_\diamond\|^{\gamma_B}_{\mathcal R} \vee \|\theta\|^{\gamma_B}_{\mathcal R} \big) \le 2 C_{\widetilde{\mathscr G}, B} \times \max\{2,\, 1 + M^{\gamma_B}\}.
\]
Thus, $\sup_{\theta \in \Theta_\diamond^l} \|h_\theta\|_\infty \le c_2\big( C_{\widetilde{\mathscr G}, B}, M, \gamma_B \big) \in (0, \infty)$. Further, for any $Z \sim \zeta$, we have
\[
\sup_{\theta \in \Theta_\diamond^l} \mathbb E\big[ h_\theta(Z)^2 \big] = \sup_{\theta \in \Theta_\diamond^l} \|\widetilde{\mathscr G}_\theta - \widetilde{\mathscr G}(\theta_\diamond)\|^2_{L^2_\zeta(\mathcal Z, V)} \le \frac{2^{2l} c_1 R^2}{2}\, \bar s_N^2 =: v_l^2.
\]
Thus, all conditions of Lemma D.2 are satisfied, and we can apply the result to conclude that there exists a universal constant $L \in (0, \infty)$ such that for all $x \ge 1$,
\[
P\Bigg( \sup_{\theta \in \Theta_\diamond^l} |T_{N,1}(\theta)| \ge L \Bigg[ \sqrt{\frac1N \sum_{i \le N} \frac{\sigma_i^2}{s_i^4}} \Big( J_2(\mathcal H_l) + v_l \sqrt x \Big) + \frac{B}{s_0^2 \sqrt N} \Big( J_\infty(\mathcal H_l) + 2 C_{\widetilde{\mathscr G}, B}\, x \Big) \Bigg] \Bigg) \le 3 \exp(-x).
\]
We now show that for suitable $x \ge 1$ and $c_1 \in (0, \infty)$ sufficiently large,
\[
L \Bigg[ \sqrt{\frac1N \sum_{i \le N} \frac{\sigma_i^2}{s_i^4}} \Big( J_2(\mathcal H_l) + v_l \sqrt x \Big) + \frac{B}{s_0^2 \sqrt N} \Big( J_\infty(\mathcal H_l) + 2 C_{\widetilde{\mathscr G}, B}\, x \Big) \Bigg] \le \frac{\sqrt N\, 2^{2l} c_1 R^2}{384}.
\]
Therefore, we need to upper bound the entropy integrals $J_2(\mathcal H_l)$ and $J_\infty(\mathcal H_l)$, respectively.
In Siebel (2025) it is shown that there exist finite constants $c_3, c_4 \in (0, \infty)$, both depending only on $\alpha, \kappa, d, \gamma_2, \gamma_\infty, \eta$ and $d_{\mathcal W}$, such that for any $\rho \in (0, \infty)$
\[
\log N(\mathcal H_l, d_2, \rho) \le c_3 \times \Bigg( \frac{2^l \sqrt{c_1}\, R\, \bar m_2}{\delta \rho} \Bigg)^{\frac{d}{\alpha + \kappa}} \quad \text{and} \quad \log N(\mathcal H_l, d_\infty, \rho) \le c_4 \times \Bigg( \frac{2^l \sqrt{c_1}\, R\, \bar m_\infty}{\delta \rho} \Bigg)^{\frac{d}{\alpha - \eta}}
\]
with
\[
\bar m_i := \max\{ C_{\mathrm{Lip},2}, C_{\mathrm{Lip},\infty} \} \times \Bigg( 1 + \Bigg( \frac{2^l \sqrt{c_1}\, R}{\delta} \Bigg)^{\gamma_i} \Bigg), \qquad i \in \{2, \infty\}.
\]
With similar arguments as in Siebel (2025), we can choose $\alpha > \frac d2 - \kappa$ and upper bound $J_2(\mathcal H_l)$ by
\begin{align*}
J_2(\mathcal H_l) = \int_0^{2 v_l} \sqrt{\log N(\mathcal H_l, d_2, \rho)}\, \mathrm d\rho &\le c_3 \times \sqrt N\, 2^{2l} c_1 R^2 \times 2^{-l\big(1 - \frac{\gamma_2 d}{2(\alpha + \kappa)}\big)}\, c_1^{-\frac12\big(1 + \frac{d \gamma_2}{2(\alpha + \kappa)}\big)} \times R^{-1 + \frac{d \gamma_2}{2(\alpha + \kappa)}}\, \delta^{-\frac{d(\gamma_2 + 1)}{2(\alpha + \kappa)}} \times \big( \bar s_N^{-2} \big)^{-1 + \frac{d}{2(\alpha + \kappa)}} \times \sqrt N^{\,-1} \\
&\le c_3 \times \sqrt N\, 2^{2l} c_1 R^2 \times c_1^{-\frac12\big(1 - \frac{d \gamma_2}{2(\alpha + \kappa)}\big)}.
\end{align*}
In particular, the entropy integral can be upper bounded by multiples of $\sqrt N\, 2^{2l} c_1 R^2$ as small as desired by choosing $c_1$ sufficiently large. With similar (tedious) computations, we obtain overall
\[
J_2(\mathcal H_l) \le c\big( c_1, \alpha, \gamma_2, \kappa, d, d_{\mathcal W}, C_{\mathrm{Lip},2}, C_{\mathrm{Lip},\infty} \big) \times \sqrt N\, 2^{2l} c_1 R^2
\]
as well as
\[
J_\infty(\mathcal H_l) \le c\big( c_1, \alpha, \gamma_\infty, \gamma_B, \eta, C_{\mathrm{Lip},2}, C_{\mathrm{Lip},\infty}, C_{\widetilde{\mathscr G}, B}, d, d_{\mathcal W}, M \big) \times N\, 2^{2l} c_1 R^2,
\]
where both constants can be made as small as desired by choosing $c_1 \in (0, \infty)$ sufficiently large. Plugging in these computations and choosing $x = \bar c N 2^{2l} R^2$, we obtain
\[
\sqrt{\frac1N \sum_{i \le N} \frac{\sigma_i^2}{s_i^4}} \Big( J_2(\mathcal H_l) + v_l \sqrt x \Big) \le c(c_1, \mathcal S) \times 2^{2l} c_1 R^2 \sqrt N,
\]
as well as
\[
\frac{B}{s_0^2 \sqrt N} \Big( J_\infty(\mathcal H_l) + 2 C_{\widetilde{\mathscr G}, B}\, x \Big) \le c(c_1, \mathcal S) \times 2^{2l} c_1 R^2 \sqrt N,
\]
where again both constants can be made as small as desired by choosing $c_1 \in (0, \infty)$ sufficiently large. Altogether, we obtain
\[
L \Bigg[ \sqrt{\frac1N \sum_{i \le N} \frac{\sigma_i^2}{s_i^4}} \Big( J_2(\mathcal H_l) + v_l \sqrt x \Big) + \frac{B}{s_0^2 \sqrt N} \Big( J_\infty(\mathcal H_l) + 2 C_{\widetilde{\mathscr G}, B}\, x \Big) \Bigg] \le \frac{\sqrt N\, 2^{2l} c_1 R^2}{384}
\]
for $c_1$ sufficiently large, depending on $\bar c$ and $\mathcal S$. All together, we have shown
\[
P\Bigg( \sup_{\theta \in \Theta_\diamond^l} |T_{N,1}(\theta)| \ge \frac{\sqrt N\, 2^{2l} c_1 R^2}{384} \Bigg) \le 3 \exp\big( -\bar c N 2^{2l} R^2 \big).
\]
By the same arguments, with a change of constants, we can similarly show that
\[
P\Bigg( \sup_{\theta \in \Theta_\diamond^l} |T_{N,2}(\theta) - \mathbb E^N_{\theta_0}[T_{N,2}(\theta)]| \ge \frac{\sqrt N\, 2^{2l} c_1 R^2}{384} \Bigg) \le 3 \exp\big( -\bar c N 2^{2l} R^2 \big).
\]
Thus, overall, we obtain
\[
\sum_{l \in \mathbb N} P\Bigg( \sup_{\theta \in \Theta_\diamond^l} |\widetilde W_N(\theta)| \ge \frac{2^{2l} c_1 R^2}{192} \Bigg) \le 6 \sum_{l \in \mathbb N} \exp\big( -\bar c N 2^{2l} R^2 \big) \le 3 \ln(2) \exp\big( -\bar c N R^2 \big),
\]
where we have used Lemma D.3 with $a = 0$, $b = \bar c N R^2$, $B = 2$ and $c = 2$. This shows the claim.

Appendix C: Analysis and PDE Theory

In the following, we will make regular use of Young's inequality for products, that is,
\[
\forall \epsilon \in (0, \infty),\ \forall a, b \in [0, \infty) : \quad ab \le \frac{\epsilon a^2}{2} + \frac{b^2}{2\epsilon}. \tag{44}
\]
We note that for any $s \in [0, \infty)$, $(\dot H^s(\mathbb T^d))^* \simeq \dot H^{-s}(\mathbb T^d)$ is the topological dual w.r.t. the $L^2$-pairing. In particular, we have for $u, v \in C^\infty(\mathbb T^d)$
\[
\langle u, v \rangle_{L^2(\mathbb T^d)} \le \|u\|_{\dot H^s(\mathbb T^d)} \cdot \|v\|_{\dot H^{-s}(\mathbb T^d)}. \tag{45}
\]
Further, for $s_1, s_2, s_3 \in \mathbb R$, we have
\[
\big\langle (-\Delta_{\mathbb T})^{s_1} u, (-\Delta_{\mathbb T})^{s_2} v \big\rangle_{\dot H^{s_3}(\mathbb T^d)} = \big\langle u, (-\Delta_{\mathbb T})^{s_1 + s_2 + s_3} v \big\rangle_{L^2(\mathbb T^d)} \tag{46}
\]
as well as
\[
\|(-\Delta_{\mathbb T})^{s_1} u\|_{\dot H^{s_2}(\mathbb T^d)} = \|u\|_{\dot H^{2 s_1 + s_2}(\mathbb T^d)}. \tag{47}
\]
For further details we refer to Section A in Konen and Nickl (2025).

C.1 Reaction Diffusion Equation

In this section, we derive the analytical properties of the solution of the non-linear reaction–diffusion equation needed for the misspecification setting in Section 4. To that end, we recall the framework presented in Nickl (2024). Let $d \in \mathbb N$ and let $\mathcal M = \mathbb T^d$ be the $d$-dimensional torus. Fix some time horizon $T \in (0, \infty)$. We are interested in periodic solutions $u$ to the parabolic PDE
\[
\begin{cases}
\frac{\partial}{\partial t} u_\theta(t, x) - \Delta u_\theta(t, x) = f(u_\theta(t, x)) & \text{on } \mathbb T^d \times (0, T], \\
u_\theta(0, x) = \theta(x) & \text{on } \mathbb T^d,
\end{cases} \tag{48}
\]
where $\Delta = \sum_{i \le d} \frac{\partial^2}{\partial x_i^2}$ denotes the spatial Laplacian, $f \in C_c^\infty(\mathbb R)$ is a reaction term, and $\theta \in H^1(\mathbb T^d)$ is an initial condition. A weak solution to Eq. (48) is a map $u \in L^2([0, T], H^1(\mathbb T^d)) \cap C^0([0, T], L^2(\mathbb T^d))$ with $\frac{\partial}{\partial t} u \in L^2([0, T], H^{-1}(\mathbb T^d))$ that satisfies
\[
\Big\langle \frac{\partial}{\partial t} u_\theta(t, \cdot), v \Big\rangle_{L^2(\mathbb T^d)} + \langle \nabla u_\theta(t, \cdot), \nabla v \rangle_{L^2(\mathbb T^d)} = \langle f(u_\theta(t, \cdot)), v \rangle_{L^2(\mathbb T^d)}, \qquad u_\theta(\cdot, 0) = \theta,
\]
for all $v \in H^1(\mathbb T^d)$ and a.e. $t \in (0, T]$. $u$ is called a strong solution to Eq. (48) if, in addition,
\[
u \in L^2((0, T], H^2(\mathbb T^d)) \cap C^0((0, T], H^1(\mathbb T^d)) \quad \text{and} \quad \frac{\partial}{\partial t} u \in L^2([0, T], L^2(\mathbb T^d)),
\]
in which case $u$ solves Eq. (48) as an equation in $L^2([0, T], L^2(\mathbb T^d))$. In Theorem 6 in Section 3.1.1 of Nickl (2024) it is shown that under these conditions there always exists a strong solution $u = u^f_\theta$ to the reaction–diffusion equation Eq. (48) that is unique in $C^0([0, T], L^2(\mathbb T^d))$. For the proofs of Section 4, where we misspecify the reaction term $f$, we need some analytical properties of the map $f \mapsto u^f_\theta$ to verify the assumptions formulated in Condition 3.3. To that end, we need to refine Proposition 5 in Nickl (2024).

Lemma C.1. For $a \in \mathbb N_0$, let $\theta \in H^a(\mathbb T^d) \cap H^1(\mathbb T^d)$ and $f \in C_c^\infty(\mathbb R)$ be fixed. Let $u = u_\theta$ be a solution to Eq. (48). Then, there exists a constant $c = c(f, T, d, a)$ such that
\[
\sup_{t \in [0, T]} \|u_\theta(t, \cdot)\|^2_{H^a(\mathbb T^d)} + \int_0^T \|u_\theta(t, \cdot)\|^2_{H^{a+1}(\mathbb T^d)} \, \mathrm dt \le c \cdot \Big( 1 + \|\theta\|^{2 (a-1)!}_{H^a(\mathbb T^d)} \Big),
\]
where we define $(-1)! := 1$.

Remark C.2. In Nickl (2024) the claim was proven under the additional assumption that $\|\theta\|_{H^a(\mathbb T^d)} \le B$ for some fixed and known constant $B \in (0, \infty)$, in which case the right-hand side of the inequality above is bounded by a constant depending additionally on $B$.

Proof of Lemma C.1. From the proof of Proposition 4 in Nickl (2024), it suffices to assume that the solution $u_\theta$ is sufficiently regular, as the general statement then follows from a Galerkin approximation argument.

• Step I – Preliminary identities and estimates: Let $a \in \mathbb N_0$.
Using the fact that
\[
\forall t \in [0, T] : \quad \|\nabla u_\theta(t, \cdot)\|^2_{h^a} = -\langle \Delta u_\theta(t, \cdot), u_\theta(t, \cdot) \rangle_{h^a}
\]
and differentiating the squared $h^a$-norm yields
\[
\forall t \in [0, T] : \quad \frac12 \frac{\mathrm d}{\mathrm dt} \|u_\theta(t, \cdot)\|^2_{h^a} + \|\nabla u_\theta(t, \cdot)\|^2_{h^a} = \langle f(u_\theta(t, \cdot)), u_\theta(t, \cdot) \rangle_{h^a}. \tag{49}
\]
For $a = 0$, we observe
\[
\forall t \in [0, T] : \quad \langle f(u_\theta(t, \cdot)), u_\theta(t, \cdot) \rangle_{L^2(\mathbb T^d)} = \int_{\mathbb T^d} f(u_\theta(t, x))\, u_\theta(t, x)\, \mathbf 1\{|u_\theta(t, x)| \le K\} \, \mathrm dx \le C_f < \infty
\]
for some $K \in (0, \infty)$, as $f$ is compactly supported. Thus, Eq. (49) implies
\[
\forall t \in [0, T] : \quad \|u_\theta(t, \cdot)\|^2_{L^2(\mathbb T^d)} + 2 \int_0^t \|\nabla u_\theta(s, \cdot)\|^2_{L^2(\mathbb T^d)} \, \mathrm ds \le 2 C_f t + \|\theta\|^2_{L^2(\mathbb T^d)}.
\]
From this, elementary computations lead to
\[
\sup_{t \in [0, T]} \|u_\theta(t, \cdot)\|^2_{L^2(\mathbb T^d)} + 2 \int_0^T \|u_\theta(t, \cdot)\|^2_{H^1(\mathbb T^d)} \, \mathrm dt \le c(f, T) \times \big( 1 + \|\theta\|^2_{L^2(\mathbb T^d)} \big),
\]
which already shows the claim for $a = 0$. For $a \in \mathbb N$, going back to Eq. (49), an application of the Cauchy–Schwarz inequality further yields
\[
\forall t \in [0, T] : \quad \langle f(u_\theta(t, \cdot)), u_\theta(t, \cdot) \rangle_{h^a} \le \|f(u_\theta(t, \cdot))\|_{h^{a-1}} \cdot \|u_\theta(t, \cdot)\|_{h^{a+1}}.
\]
Using Young's inequality with $\epsilon = 2$ yields
\[
\|f(u_\theta(t, \cdot))\|_{h^{a-1}} \cdot \|u_\theta(t, \cdot)\|_{h^{a+1}} \le \|f(u_\theta(t, \cdot))\|^2_{h^{a-1}} + \frac14 \|u_\theta(t, \cdot)\|^2_{h^{a+1}} \lesssim \|f(u_\theta(t, \cdot))\|^2_{H^{a-1}} + \frac14 \Big( \|\nabla u_\theta(t, \cdot)\|^2_{h^a} + \|u_\theta(t, \cdot)\|^2_{L^2(\mathbb T^d)} \Big),
\]
where we have used $\| \cdot \|^2_{H^{a+1}(\mathbb T^d)} \simeq \| \cdot \|^2_{L^2(\mathbb T^d)} + \|\nabla \cdot\|^2_{h^a}$. Thus, we have
\[
\forall t \in [0, T] : \quad \frac12 \frac{\mathrm d}{\mathrm dt} \|u_\theta(t, \cdot)\|^2_{h^a} + \frac34 \|\nabla u_\theta(t, \cdot)\|^2_{h^a} \lesssim \|f(u_\theta(t, \cdot))\|^2_{H^{a-1}} + \frac14 \|u_\theta(t, \cdot)\|^2_{L^2(\mathbb T^d)}. \tag{50}
\]

• Step II – Induction over $a$: We prove the general claim by an induction argument, beginning the induction by proving the claim for $a = 1$ and $a = 2$ explicitly.

– (IB) $a = 1$: We start with Eq. (50) and obtain
\[
\forall t \in [0, T] : \quad \frac12 \frac{\mathrm d}{\mathrm dt} \|u_\theta(t, \cdot)\|^2_{h^1} + \frac34 \|\nabla u_\theta(t, \cdot)\|^2_{h^1} \lesssim \|f(u_\theta(t, \cdot))\|^2_{L^2(\mathbb T^d)} + \frac14 \|u_\theta(t, \cdot)\|^2_{L^2(\mathbb T^d)}.
\]
Observing that $\|f(u_\theta(t, \cdot))\|^2_{L^2(\mathbb T^d)} \le \|f\|^2_\infty$ for all $t \in [0, T]$, as well as using the bound derived for $a = 0$, leads to
\[
\forall t \in [0, T] : \quad \frac12 \frac{\mathrm d}{\mathrm dt} \|u_\theta(t, \cdot)\|^2_{h^1} + \frac34 \|\nabla u_\theta(t, \cdot)\|^2_{h^1} \lesssim c(f, T) \times \big( 1 + \|\theta\|^2_{L^2(\mathbb T^d)} \big).
\]
Integration yields
\[
\sup_{t \in [0, T]} \|u_\theta(t, \cdot)\|^2_{h^1} + \frac32 \int_0^T \|\nabla u_\theta(t, \cdot)\|^2_{h^1} \, \mathrm dt \le c(f, T) \big( 1 + \|\theta\|^2_{H^1(\mathbb T^d)} \big),
\]
which shows the claim.

– (IB) $a = 2$: We start with Eq. (50) and obtain
\[
\forall t \in [0, T] : \quad \frac12 \frac{\mathrm d}{\mathrm dt} \|u_\theta(t, \cdot)\|^2_{h^2} + \frac34 \|\nabla u_\theta(t, \cdot)\|^2_{h^2} \lesssim \|f(u_\theta(t, \cdot))\|^2_{h^1} + \frac14 \|u_\theta(t, \cdot)\|^2_{L^2(\mathbb T^d)}.
\]
Note that
\begin{align*}
\|f(u_\theta(t, \cdot))\|^2_{h^1} &\simeq \|f(u_\theta(t, \cdot))\|^2_{L^2(\mathbb T^d)} + \max_{|\beta| = 1} \|D^\beta f(u_\theta(t, \cdot))\|^2_{L^2(\mathbb T^d)} \\
&\le \|f\|^2_\infty + \|f'\|^2_\infty \cdot \|u_\theta(t, \cdot)\|^2_{H^1(\mathbb T^d)} \le c(f, f', T) \times \big( 1 + \|\theta\|^2_{H^1(\mathbb T^d)} \big),
\end{align*}
where we have used the bound derived for $a = 1$ in the last step. Thus, we obtain for all $t \in [0, T]$
\[
\frac12 \frac{\mathrm d}{\mathrm dt} \|u_\theta(t, \cdot)\|^2_{h^2} + \frac34 \|\nabla u_\theta(t, \cdot)\|^2_{h^2} \lesssim c(f, f', T) \times \big( 1 + \|\theta\|^2_{H^1(\mathbb T^d)} \big) + \frac14 \|u_\theta(t, \cdot)\|^2_{L^2(\mathbb T^d)},
\]
which, after integration, shows the claim for $a = 2$.

– (IH) For $a \in \mathbb N$ fixed, there exists a finite constant $c = c(f, T) \in (0, \infty)$ such that
\[
\sup_{t \in [0, T]} \|u_\theta(t, \cdot)\|^2_{H^a(\mathbb T^d)} + \frac32 \int_0^T \|u_\theta(t, \cdot)\|^2_{H^{a+1}(\mathbb T^d)} \, \mathrm dt \le c(f, T) \times \Big( 1 + \|\theta\|^{2 (a-1)!}_{H^a(\mathbb T^d)} \Big).
\]

– (IS) $a \Rightarrow a + 1$: Under (IH), we consider the case $a + 1$. We start with Eq. (50) and obtain
\[
\forall t \in [0, T] : \quad \frac12 \frac{\mathrm d}{\mathrm dt} \|u_\theta(t, \cdot)\|^2_{h^{a+1}} + \frac34 \|\nabla u_\theta(t, \cdot)\|^2_{h^{a+1}} \lesssim \|f(u_\theta(t, \cdot))\|^2_{h^a} + \frac14 \|u_\theta(t, \cdot)\|^2_{L^2(\mathbb T^d)}.
\]
With Lemma D.1 we obtain
\[
\forall t \in [0, T] : \quad \|f(u(t, \cdot))\|^2_{h^a} \lesssim 1 + \|u(t, \cdot)\|^{2a}_{H^a(\mathbb T^d)}.
\]
Then (IH) yields
\[
\forall t \in [0, T] : \quad \|u(t, \cdot)\|^{2a}_{H^a(\mathbb T^d)} \lesssim \Big( 1 + \|\theta\|^{2 (a-1)!}_{H^a(\mathbb T^d)} \Big)^a \lesssim 1 + \|\theta\|^{2 a (a-1)!}_{H^a(\mathbb T^d)} = 1 + \|\theta\|^{2 \cdot a!}_{H^a(\mathbb T^d)},
\]
and thus $\|f(u)\|^2_{h^a} \lesssim 1 + \|\theta\|^{2 \cdot a!}_{H^a(\mathbb T^d)}$. Integrating the resulting inequality as before then yields the desired claim.

Lemma C.3. Let $d \le 3$ and $\theta \in H^2(\mathbb T^d)$. Denote by $u^{f_1}_\theta$ and $u^{f_2}_\theta$ the unique solutions to Eq. (48) with different reaction terms $f_1, f_2 \in C_c^\infty(\mathbb R)$. Then there exists a finite constant $c = c(f_1, f_2, T, d) \in (0, \infty)$ such that
\[
\sup_{t \in [0, T]} \|u^{f_1}_\theta(t, \cdot) - u^{f_2}_\theta(t, \cdot)\|^2_{H^2(\mathbb T^d)} \le c(f_1, f_2, T, d) \cdot \big( 1 + \|\theta\|^2_{H^2(\mathbb T^d)} \big) \cdot \|f_1 - f_2\|^2_{C^1(\mathbb R)}.
\]

Proof of Lemma C.3. Let $\theta \in H^2(\mathbb T^d)$ be arbitrary but fixed. We show the claim successively, by deriving an $L^2$-bound first, an $H^1$-bound second, and finally the desired claim. We define the difference $w := u^{f_1}_\theta - u^{f_2}_\theta$, which solves the PDE
\[
\begin{cases}
\partial_t w(t, x) - \Delta w(t, x) = f_1(u^{f_1}_\theta(t, x)) - f_2(u^{f_2}_\theta(t, x)) & \text{on } (0, T] \times \mathbb T^d, \\
w(0, x) = 0 & \text{on } \mathbb T^d.
\end{cases}
\]
Analogously to the proof of Lemma C.1, for all $a \in \mathbb N_0$ we then have the basic equality
\[
\frac12 \frac{\mathrm d}{\mathrm dt} \|w(t, \cdot)\|^2_{h^a} + \|\nabla w(t, \cdot)\|^2_{h^a} = \Big\langle f_1(u^{f_1}_\theta(t, \cdot)) - f_2(u^{f_2}_\theta(t, \cdot)), w(t, \cdot) \Big\rangle_{h^a}. \tag{51}
\]
In the following, we upper bound the right-hand side of this equation accordingly.

• $a = 0$: For all $t \in [0, T]$, we can write
\[
f_1(u^{f_1}_\theta(t, \cdot)) - f_2(u^{f_2}_\theta(t, \cdot)) = f_1(u^{f_1}_\theta(t, \cdot)) - f_2(u^{f_1}_\theta(t, \cdot)) + f_2(u^{f_1}_\theta(t, \cdot)) - f_2(u^{f_2}_\theta(t, \cdot)),
\]
such that
\[
\forall t \in [0, T] : \quad \big| f_1(u^{f_1}_\theta(t, \cdot)) - f_2(u^{f_2}_\theta(t, \cdot)) \big| \le \|f_1 - f_2\|_\infty + \|f_2'\|_\infty \cdot |w(t, \cdot)|.
\]
Thus, by the Cauchy–Schwarz inequality,
\[
\bigl\langle f_1(u_\theta^{f_1}(t,\cdot)) - f_2(u_\theta^{f_2}(t,\cdot)),\, w(t,\cdot)\bigr\rangle_{L^2(\mathbb{T}^d)} \le \|f_1(u_\theta^{f_1}(t,\cdot)) - f_2(u_\theta^{f_2}(t,\cdot))\|_{L^2(\mathbb{T}^d)} \cdot \|w(t,\cdot)\|_{L^2(\mathbb{T}^d)} \le \|f_1 - f_2\|_\infty \cdot \|w(t,\cdot)\|_{L^2(\mathbb{T}^d)} + \|f_2'\|_\infty \,\|w(t,\cdot)\|_{L^2(\mathbb{T}^d)}^2 \le \frac{1}{2}\|f_1 - f_2\|_\infty^2 + \Bigl(\frac{1}{2} + \|f_2'\|_\infty\Bigr)\|w(t,\cdot)\|_{L^2(\mathbb{T}^d)}^2 .
\]
Thus, the basic equality yields
\[
\frac{d}{dt}\|w(t,\cdot)\|_{L^2(\mathbb{T}^d)}^2 + 2\|\nabla w(t,\cdot)\|_{L^2(\mathbb{T}^d)}^2 \le \|f_1 - f_2\|_\infty^2 + \bigl(1 + 2\|f_2'\|_\infty\bigr)\|w(t,\cdot)\|_{L^2(\mathbb{T}^d)}^2 .
\]
Integrating the last inequality and applying Grönwall's inequality yields
\[
\sup_{t \in [0,T]}\|w(t,\cdot)\|_{L^2(\mathbb{T}^d)}^2 \le c(T,f_1,f_2) \times \|f_1 - f_2\|_\infty^2 .
\]

• $a = 1$: Applying the Cauchy–Schwarz inequality, the right-hand side of Eq. (51) can, for all $t \in [0,T]$, be upper bounded by
\[
\bigl\langle f_1(u_\theta^{f_1}(t,\cdot)) - f_2(u_\theta^{f_2}(t,\cdot)),\, w(t,\cdot)\bigr\rangle_{h^1} \le \|f_1(u_\theta^{f_1}(t,\cdot)) - f_2(u_\theta^{f_2}(t,\cdot))\|_{L^2(\mathbb{T}^d)} \cdot \|w(t,\cdot)\|_{h^2} \le \|f_1(u_\theta^{f_1}(t,\cdot)) - f_2(u_\theta^{f_2}(t,\cdot))\|_{L^2(\mathbb{T}^d)}^2 + \frac{1}{4}\|w(t,\cdot)\|_{L^2(\mathbb{T}^d)}^2 + \frac{1}{4}\|\nabla w(t,\cdot)\|_{h^1}^2 \lesssim \|f_1 - f_2\|_\infty^2 + \Bigl(\|f_2'\|_\infty^2 + \frac{1}{4}\Bigr)\|w(t,\cdot)\|_{L^2(\mathbb{T}^d)}^2 + \frac{1}{4}\|\nabla w(t,\cdot)\|_{h^1}^2 .
\]
Integrating Eq. (51) after absorbing the last term into the left-hand side, we have, for all $t \in [0,T]$,
\[
\|w(t,\cdot)\|_{h^1}^2 + \frac{3}{2}\int_0^t \|\nabla w(s,\cdot)\|_{h^1}^2\,ds \lesssim \|f_1 - f_2\|_\infty^2 + \Bigl(\|f_2'\|_\infty^2 + \frac{1}{4}\Bigr)\int_0^t \|w(s,\cdot)\|_{h^1}^2\,ds .
\]
Using the bound derived for $a = 0$ yields
\[
\sup_{t \in [0,T]}\|w(t,\cdot)\|_{h^1}^2 \le c(T,f_1,f_2) \times \|f_1 - f_2\|_\infty^2 .
\]

• $a = 2$: Again, applying the Cauchy–Schwarz inequality to the right-hand side of Eq. (51) yields, for all $t \in [0,T]$,
\[
\bigl\langle f_1(u_\theta^{f_1}(t,\cdot)) - f_2(u_\theta^{f_2}(t,\cdot)),\, w(t,\cdot)\bigr\rangle_{h^2} \le \|f_1(u_\theta^{f_1}(t,\cdot)) - f_2(u_\theta^{f_2}(t,\cdot))\|_{h^1}^2 + \frac{1}{4}\|w(t,\cdot)\|_{L^2(\mathbb{T}^d)}^2 + \frac{1}{4}\|\nabla w(t,\cdot)\|_{h^2}^2 .
\]
As before, we can use the bound derived for $a = 0$ to get an $L^2$-estimate for $w$ in the second term, and the third term is absorbed into the left-hand side of Eq. (51). It remains to upper bound the first term. First observe that, for all $t \in [0,T]$,
\[
\|f_1(u_\theta^{f_1}(t,\cdot)) - f_2(u_\theta^{f_2}(t,\cdot))\|_{h^1}^2 \lesssim \|f_1(u_\theta^{f_1}(t,\cdot)) - f_1(u_\theta^{f_2}(t,\cdot))\|_{h^1}^2 + \|f_1(u_\theta^{f_2}(t,\cdot)) - f_2(u_\theta^{f_2}(t,\cdot))\|_{h^1}^2 =: E_1(t) + E_2(t) .
\]
For $E_1$ we have, for all $t \in [0,T]$,
\[
E_1(t) \lesssim \|f_1(u_\theta^{f_1}(t,\cdot)) - f_1(u_\theta^{f_2}(t,\cdot))\|_{L^2(\mathbb{T}^d)}^2 + \max_{|\beta| = 1}\|D^\beta f_1(u_\theta^{f_1}(t,\cdot)) - D^\beta f_1(u_\theta^{f_2}(t,\cdot))\|_{L^2(\mathbb{T}^d)}^2 \le \|f_1''\|_\infty^2 \cdot \|u_\theta^{f_1}\|_{C^1(\mathbb{T}^d)}^2 \cdot \|w(t,\cdot)\|_{L^2(\mathbb{T}^d)}^2 + \|f_1'\|_\infty^2 \cdot \|w(t,\cdot)\|_{h^1}^2 .
\]
As $d \le 3$, we have the embedding $H^3(\mathbb{T}^d) \hookrightarrow C^1(\mathbb{T}^d)$ and hence, for all $t \in [0,T]$,
\[
E_1(t) \lesssim \bigl(1 + \|u_\theta^{f_1}(t,\cdot)\|_{H^3(\mathbb{T}^d)}^2\bigr) \cdot \|f_1 - f_2\|_\infty^2 ,
\]
where in the last step we also used the bounds derived for $a \in \{0,1\}$. Similarly, for $E_2$ we derive, for all $t \in [0,T]$,
\[
E_2(t) \lesssim \|f_1(u_\theta^{f_2}(t,\cdot)) - f_2(u_\theta^{f_2}(t,\cdot))\|_{L^2(\mathbb{T}^d)}^2 + \max_{|\beta| = 1}\|D^\beta f_1(u_\theta^{f_2}(t,\cdot)) - D^\beta f_2(u_\theta^{f_2}(t,\cdot))\|_{L^2(\mathbb{T}^d)}^2 \le \|f_1 - f_2\|_\infty^2 + \max_{|\beta| = 1}\bigl\|D^\beta u_\theta^{f_2}(t,\cdot) \cdot \bigl(f_1'(u_\theta^{f_2}(t,\cdot)) - f_2'(u_\theta^{f_2}(t,\cdot))\bigr)\bigr\|_{L^2(\mathbb{T}^d)}^2 \lesssim \|f_1 - f_2\|_\infty^2 + \|u_\theta^{f_2}(t,\cdot)\|_{C^1(\mathbb{T}^d)}^2 \cdot \|f_1' - f_2'\|_\infty^2 .
\]
As $d \le 3$, we again use the embedding $H^3(\mathbb{T}^d) \hookrightarrow C^1(\mathbb{T}^d)$ and hence, for all $t \in [0,T]$,
\[
E_2(t) \lesssim \bigl(1 + \|u_\theta^{f_2}(t,\cdot)\|_{H^3(\mathbb{T}^d)}^2\bigr) \cdot \|f_1 - f_2\|_{C^1(\mathbb{R})}^2 .
\]
Overall, the right-hand side of Eq. (51) is then, for all $t \in [0,T]$, upper bounded such that
\[
\frac{1}{2}\frac{d}{dt}\|w(t,\cdot)\|_{h^2}^2 + \frac{3}{4}\|\nabla w(t,\cdot)\|_{h^2}^2 \le c(f_1,f_2,T)\bigl(1 + \|u_\theta^{f_1}(t,\cdot)\|_{H^3(\mathbb{T}^d)}^2 + \|u_\theta^{f_2}(t,\cdot)\|_{H^3(\mathbb{T}^d)}^2\bigr) \cdot \|f_1 - f_2\|_{C^1(\mathbb{R})}^2 .
\]
Integrating everything and using Lemma C.1 with $a = 2$, we have, for all $t \in [0,T]$,
\[
\|w(t,\cdot)\|_{h^2}^2 + \frac{3}{2}\int_0^t \|\nabla w(s,\cdot)\|_{h^2}^2\,ds \le c(f_1,f_2,T,d) \cdot \bigl(1 + \|\theta\|_{H^2(\mathbb{T}^d)}^2\bigr) \cdot \|f_1 - f_2\|_{C^1(\mathbb{R})}^2 ,
\]
which shows the claim.

C.2 2D Navier–Stokes equation

In the following, we derive regularity estimates for the solution of the 2D Navier–Stokes equation, see Eq. (22). To that end, recall the setting introduced in Section 4. As is common in the literature on the Navier–Stokes equation, we study the projected equation: given the Leray projector $P$ (see Eq. (5)), we are interested in the solution $u : [0,T] \to \dot H_\diamond$ solving
\[
\frac{d}{dt}u + \nu A u + B[u,u] = f , \qquad u_\theta(0) = \theta , \tag{52}
\]
where $A := -P\Delta$ denotes the Stokes operator and $B[u,v] := P((u \cdot \nabla)v)$. We call $u$ a strong solution of Eq. (52) if the equation holds in $L^2([0,T], \dot H_\diamond)$. Based on the theory provided in Nickl and Titi (2024), it is shown in Konen and Nickl (2025) that for every $a \in \mathbb{N}$, $\nu \in (0,\infty)$, $\theta \in \dot H^a_\diamond$, and $f \in L^2([0,T], \dot H^{a-1}_\diamond)$ there exists a strong solution $u = u_\theta$ of Eq. (52) that satisfies
\[
u_\theta \in C^0\bigl([0,T], \dot H^a_\diamond\bigr) \cap L^2\bigl([0,T], \dot H^{a+1}_\diamond\bigr).
\]
In the following, we derive a regularity estimate for differences of solutions of Eq. (52) with different viscosity and forcing terms, which is used in Theorem 4.2.

Lemma C.4. Let $\theta \in \dot H^2_\diamond$ be such that $\|\theta\|_{\dot H^2(\mathbb{T}^2)} \le B$ for some $B \in (0,\infty)$, let $\nu_1, \nu_2 \in (0,\infty)$, and let $f_1, f_2 \in L^2\bigl([0,T], \dot H^1_\diamond\bigr)$. Denote by $u_\theta^{\nu_i,f_i}$, $i \le 2$, the solutions to Eq. (52) with viscosity $\nu_i$ and external forcing $f_i$, respectively. We then have
\[
\sup_{t \in [0,T]}\|u_\theta^{\nu_1,f_1}(t) - u_\theta^{\nu_2,f_2}(t)\|_{\dot H^2(\mathbb{T}^2)}^2 \le c(\nu_1,\nu_2,f_1,f_2,T,B) \times \Bigl(|\nu_1 - \nu_2|^2 + \|f_1 - f_2\|_{L^2([0,T],\dot H^1(\mathbb{T}^2))}^2\Bigr).
\]

Proof of Lemma C.4.
In this proof, we shorten notation by writing $u_i := u_\theta^{\nu_i,f_i}$ for $i \le 2$ and, further, $\dot H^2 := \dot H^2(\mathbb{T}^2)$. First observe that $w := u_1 - u_2$ satisfies the equation
\[
\frac{d}{dt}w + \nu_1 A w + (\nu_1 - \nu_2) A u_2 + B[w,u_1] + B[u_2,w] = f_1 - f_2 \quad \text{on } [0,T] \times \mathbb{T}^2
\]
with initial condition $w(0,\cdot) = 0$ on $\mathbb{T}^2$. In the following, the proof is understood in terms of a Galerkin approximation, such that $w$ can be viewed as being smooth in the sense that $w \in C^\infty \cap \dot H_\diamond$; see Konen and Nickl (2025), Propositions A.6 and B.1, for example. Thus, taking the $\dot H^2$-inner product with $w$ yields
\[
\frac{1}{2}\frac{d}{dt}\|w(t)\|_{\dot H^2}^2 + \nu_1 \|w(t)\|_{\dot H^3}^2 = -(\nu_1 - \nu_2)\langle A u_2(t), w(t)\rangle_{\dot H^2} - \langle B[w(t),u_1(t)], w(t)\rangle_{\dot H^2} - \langle B[u_2(t),w(t)], w(t)\rangle_{\dot H^2} + \langle f_1(t) - f_2(t), w(t)\rangle_{\dot H^2} =: E_1(t) + E_2(t) + E_3(t) + E_4(t).
\]
In what follows, we upper bound the terms $E_1, \dots, E_4$.

1. $E_1(t)$: Utilising Eq. (46), Eq. (45) as well as Eq. (47), we obtain, for all $t \in [0,T]$,
\[
|E_1(t)| = |\nu_1 - \nu_2| \cdot \bigl|\bigl\langle u_2(t), (-\Delta)^3 w(t)\bigr\rangle_{\dot H^0}\bigr| \le |\nu_1 - \nu_2| \cdot \|u_2(t)\|_{\dot H^3} \cdot \|(-\Delta)^3 w(t)\|_{\dot H^{-3}} \le |\nu_1 - \nu_2| \cdot \|u_2(t)\|_{\dot H^3} \cdot \|w(t)\|_{\dot H^3} \le c(\nu_1) \cdot |\nu_1 - \nu_2|^2 \cdot \|u_2(t)\|_{\dot H^3}^2 + \frac{\nu_1}{6}\|w(t)\|_{\dot H^3}^2 .
\]

2. $E_2(t) + E_3(t)$: Applying Konen and Nickl (2025), Proposition A.3, with $a = 2$ twice, we obtain, for all $t \in [0,T]$,
\[
|E_2(t) + E_3(t)| \le \bigl|\langle B[w(t),u_1(t)], w(t)\rangle_{\dot H^2}\bigr| + \bigl|\langle B[u_2(t),w(t)], w(t)\rangle_{\dot H^2}\bigr| \lesssim \bigl(\|u_1(t)\|_{\dot H^2} + \|u_2(t)\|_{\dot H^2}\bigr) \cdot \|w(t)\|_{\dot H^2} \cdot \|w(t)\|_{\dot H^3} \le c(\nu_1) \cdot \bigl(\|u_1(t)\|_{\dot H^2}^2 + \|u_2(t)\|_{\dot H^2}^2\bigr) \cdot \|w(t)\|_{\dot H^2}^2 + \frac{\nu_1}{6}\|w(t)\|_{\dot H^3}^2 .
\]

3. $E_4(t)$: Applying Eq.
(45) as well as Young's inequality, we obtain, for all $t \in [0,T]$,
\[
|E_4(t)| \le \|f_1(t) - f_2(t)\|_{\dot H^1} \cdot \|w(t)\|_{\dot H^3} \le c(\nu_1) \cdot \|f_1(t) - f_2(t)\|_{\dot H^1}^2 + \frac{\nu_1}{6}\|w(t)\|_{\dot H^3}^2 .
\]
Absorbing the three $\dot H^3$-terms into the left-hand side, we obtain
\[
\frac{1}{2}\frac{d}{dt}\|w(t)\|_{\dot H^2}^2 + \frac{\nu_1}{2}\|w(t)\|_{\dot H^3}^2 \le c(\nu_1) \cdot |\nu_1 - \nu_2|^2 \cdot \|u_2(t)\|_{\dot H^3}^2 + c(\nu_1) \cdot \bigl(\|u_1(t)\|_{\dot H^2}^2 + \|u_2(t)\|_{\dot H^2}^2\bigr) \cdot \|w(t)\|_{\dot H^2}^2 + c(\nu_1) \cdot \|f_1(t) - f_2(t)\|_{\dot H^1}^2 .
\]
Integrating over $[0,t]$ for any $t \in [0,T]$ and applying Konen and Nickl (2025), Proposition A.6, with $a = 2$ yields
\[
\|w(t)\|_{\dot H^2}^2 \le c(\nu_1,\nu_2,T,f_2,B) \cdot \Bigl(|\nu_1 - \nu_2|^2 + \|f_1 - f_2\|_{L^2([0,T],\dot H^1)}^2\Bigr) + c(\nu_1)\int_0^t \bigl(\|u_1(s)\|_{\dot H^2}^2 + \|u_2(s)\|_{\dot H^2}^2\bigr) \cdot \|w(s)\|_{\dot H^2}^2\,ds .
\]
Thus, an application of Grönwall's inequality as well as Konen and Nickl (2025), Proposition A.6, with $a = 2$ yields
\[
\sup_{t \in [0,T]}\|w(t)\|_{\dot H^2}^2 \le c(\nu_1,\nu_2,T,f_1,f_2,B) \times \Bigl(|\nu_1 - \nu_2|^2 + \|f_1 - f_2\|_{L^2([0,T],\dot H^1)}^2\Bigr),
\]
which shows the claim.

C.3 Oseen approximation

Proof of Proposition 4.3. The proof follows arguments similar to the theory developed in Konen and Nickl (2025), Appendices A and B. Firstly, we look at a general functional equation of the form
\[
\frac{d}{dt}U + \nu A U + B[U,v_1] + B[v_2,U] = f , \qquad U(0) = \xi . \tag{53}
\]
Denote, for $a \in \mathbb{Z}$,
\[
a^* := \begin{cases} |a| + 1, & \text{if } |a| \le 1, \\ |a|, & \text{if } |a| \ge 2. \end{cases}
\]
Following the proof of Konen and Nickl (2025), Proposition B.1, if $\xi \in \dot H^a_\diamond$, $f \in L^2\bigl([0,T], \dot H^{a-1}_\diamond\bigr)$, $\nu > 0$ and $v_1, v_2 \in L^2\bigl([0,T], \dot H^{a^*}_\diamond\bigr)$, there exists a unique solution $U : [0,T] \times \mathbb{T}^2 \to \mathbb{R}^2$ of Eq. (53), and we have
\[
U \in C^0\bigl([0,T], \dot H^a_\diamond\bigr) \cap L^2\bigl([0,T], \dot H^{a+1}_\diamond\bigr), \qquad \frac{dU}{dt} \in L^2\bigl([0,T], \dot H^{a-1}_\diamond\bigr).
\]
(54)

In particular,
\[
\sup_{t \in [0,T]}\|U(t)\|_{\dot H^a}^2 + \nu \int_0^T \|U(t)\|_{\dot H^{a+1}}^2\,dt \le c \times \Bigl(\|\xi\|_{\dot H^a}^2 + \|f\|_{L^2([0,T],\dot H^{a-1}_\diamond)}^2\Bigr) \tag{55}
\]
with some constant $c = c\bigl(\nu, T, f, a, \|v_1\|_{L^2([0,T],\dot H^{a^*})}, \|v_2\|_{L^2([0,T],\dot H^{a^*})}\bigr) > 0$. Thus, looking at the Oseen-type iteration Eq. (26), we deduce that if $u^0 \in L^2\bigl([0,T], \dot H^{a^*}_\diamond\bigr)$, then for each $l \in \mathbb{N}$ there exists an iterated solution $u^l$ that satisfies Eq. (54). Now let $L = L_N \in \mathbb{N}$ be chosen as by hypothesis, such that
\[
\forall\, \theta \in \dot H^2_\diamond : \quad \sup_{t \in [0,T]}\|u^L_\theta(t)\|_{\dot H^2(\mathbb{T}^2)} \le C_{\mathrm{Oseen},B} \times \bigl(1 + \|\theta\|_{\dot H^2}^2\bigr), \tag{56}
\]
\[
\forall\, r > 0 : \quad \sup_{\theta \in \dot H^2_\diamond(r)} \sup_{t \in [0,T]}\|u^L_\theta(t) - u^{L-1}_\theta(t)\|_{\dot H^2(\mathbb{T}^2)} \le C_{\mathrm{model}}(r) \times \delta_N^2 .
\]
We now verify that the associated forward map $\widetilde G(\theta) := u^L_\theta$ satisfies the conditions of Condition 3.3.

• Proof of [MM1]: Follows by hypothesis, Eq. (56).

• Proof of [MM2]: Let us write $w^L := u^L - u$. Then $w^L$ solves
\[
\frac{d}{dt}w^L + \nu A w^L + B[u^{L-1},u^L] - B[u,u] = 0 , \qquad w^L(0) = 0 . \tag{57}
\]
By bilinearity,
\[
B[u^{L-1},u^L] - B[u,u] = B[u^{L-1} - u^L, u^L] + B[u^L,u^L] - B[u,u] = B[u^{L-1} - u^L, u^L] + B[w^L,u^L] + B[u,u^L] - B[u,u] = B[u^{L-1} - u^L, u^L] + B[w^L,u^L] + B[u,w^L] .
\]
Thus, applying Eq. (55) and Eq. (56) with $U = w^L$, $v_1 = u^L$, $v_2 = u$, $f = -B[u^{L-1} - u^L, u^L]$, $\xi = 0$ and $a = 2$, we obtain, for all $r > 0$,
\[
\sup_{\theta \in \dot H^2(r)} \sup_{t \in [0,T]}\|w^L(t)\|_{\dot H^2}^2 \le c \times \sup_{\theta \in \dot H^2(r)} \|B[u^{L-1} - u^L, u^L]\|_{L^2([0,T],\dot H^1_\diamond)}^2 = c \times \sup_{\theta \in \dot H^2(r)} \int_0^T \|B[u^L - u^{L-1}, u^L](t)\|_{\dot H^1}^2\,dt \le c \times \sup_{\theta \in \dot H^2(r)} \int_0^T \|u^L(t) - u^{L-1}(t)\|_{\dot H^2}^2 \cdot \|u^L(t)\|_{\dot H^2}^2\,dt \le c' \times \delta_N^4
\]
with $c' = c'(\nu, T, f, r) > 0$, which shows the claims.

Appendix D: Miscellaneous

D.1 An inequality for Sobolev spaces

Lemma D.1 (Composition with smooth and compactly supported functions).
Let $f \in C_c^\infty(\mathbb{R})$ and assume that $u \in H^m(\mathbb{T}^d)$ for some integer $m \ge d/2$. We then have
\[
\|f(u)\|_{H^m(\mathbb{T}^d)} \le c \cdot \bigl(1 + \|u\|_{H^m(\mathbb{T}^d)}^m\bigr),
\]
where $c = c(m,d) \in (0,\infty)$.

The proof of Lemma D.1 follows exactly the lines of Lemma 29 in Nickl et al. (2020) by utilizing an analogous version of Nirenberg's inequality on the torus, which can be found, for instance, in Theorem 3.70 in Aubin (1982).

D.2 A chaining lemma for non-i.i.d. data

Lemma D.2. Let $\Theta$ be a countable set. Consider the family $\mathcal{H} := \{h_\theta : Z \to V \mid \theta \in \Theta\}$ of $V$-valued functions defined on a probability space $(Z, \mathcal{Z}, P_Z)$. Assume that there exist finite constants $v, U \in (0,\infty)$ such that
\[
\sup_{\theta \in \Theta} \mathbb{E}\bigl[|h_\theta(Z)|_V^2\bigr] \le v^2 , \qquad \sup_{\theta \in \Theta} \|h_\theta\|_\infty \le U , \tag{C1}
\]
where $Z \sim P_Z$. Define the entropy integrals $J_2(\mathcal{H})$ and $J_\infty(\mathcal{H})$ by
\[
J_2(\mathcal{H}) := \int_0^{2v} \sqrt{\log N(\mathcal{H}, d_2, \rho)}\,d\rho , \qquad J_\infty(\mathcal{H}) := \int_0^{2U} \log N(\mathcal{H}, d_\infty, \rho)\,d\rho ,
\]
with respect to the (pseudo-)metrics
\[
d_2(\theta_1,\theta_2) := \sqrt{\mathbb{E}\bigl[|h_{\theta_1}(Z) - h_{\theta_2}(Z)|_V^2\bigr]} , \qquad d_\infty(\theta_1,\theta_2) := \|h_{\theta_1} - h_{\theta_2}\|_\infty .
\]

1. Let $N \in \mathbb{N}$. Let $\varepsilon_1, \dots, \varepsilon_N$ be independent $V$-valued random variables satisfying Condition B.1. Let $Z_1, \dots, Z_N$ be i.i.d. copies of $Z \sim P_Z$, independent of $\varepsilon_1, \dots, \varepsilon_N$, and let $a_1, \dots, a_N$ be real numbers such that $\max_{i \le N} |a_i| \le a_\infty \in (0,\infty)$. Define the empirical process
\[
T_{N,1}(\theta) := \frac{1}{\sqrt N} \sum_{i \le N} a_i \langle \varepsilon_i, h_\theta(Z_i)\rangle_V , \qquad \theta \in \Theta . \tag{EP1}
\]
Then there exists a universal constant $M \in (0,\infty)$ such that, for all $x \ge 1$,
\[
P\left(\sup_{\theta \in \Theta} |T_{N,1}(\theta)| \ge M\left[\sqrt{(a\sigma)_N^2}\,\bigl(J_2(\mathcal{H}) + v\sqrt{x}\bigr) + \frac{a_\infty \mathrm{B}}{\sqrt N}\bigl(J_\infty(\mathcal{H}) + U x\bigr)\right]\right) \le 3\exp(-x) ,
\]
where $(a\sigma)_N^2 := \frac{1}{N}\sum_{i \le N} a_i^2 \sigma_i^2$.

2. Let $V = \mathbb{R}$. Let $a_1, \dots, a_N$ be real numbers such that $\max_{i \le N} |a_i| \le a_\infty \in (0,\infty)$. Let $Z_1, \dots, Z_N$ be i.i.d. copies of $Z \sim P_Z$ and define
\[
T_{N,2}(\theta) := \frac{1}{\sqrt N} \sum_{i \le N} a_i \bigl(h_\theta(Z_i) - \mathbb{E}[h_\theta(Z_i)]\bigr) , \qquad \theta \in \Theta .
\]
(EP2)

Then there exists a universal constant $M \in (0,\infty)$ such that, for all $x \ge 1$,
\[
P\left(\sup_{\theta \in \Theta} |T_{N,2}(\theta)| \ge M\left[\bar a_N \bigl(J_2(\mathcal{H}) + v\sqrt{x}\bigr) + \frac{a_\infty}{\sqrt N}\bigl(J_\infty(\mathcal{H}) + U x\bigr)\right]\right) \le 3\exp(-x) .
\]

Proof of Lemma D.2. For both empirical processes we will apply Theorem 3.5 in Dirksen (2015). To this end, we verify that the processes $(T_{N,1}(\theta))_{\theta \in \Theta}$ and $(T_{N,2}(\theta))_{\theta \in \Theta}$ satisfy Condition 3.8 therein, that is, they exhibit mixed sub-Gaussian–exponential tails. We begin with the multiplier process $(T_{N,1}(\theta))_{\theta \in \Theta}$. Fix arbitrary $\theta_1, \theta_2 \in \Theta$, $\lambda \in \mathbb{R}$, and $i \le N$. By Fubini's theorem,
\[
\mathbb{E}\exp\bigl(\lambda a_i \langle \varepsilon_i, h_{\theta_1}(Z_i) - h_{\theta_2}(Z_i)\rangle_V\bigr) = \mathbb{E}\sum_{k \in \mathbb{N}_0} \frac{\lambda^k}{k!} a_i^k \langle \varepsilon_i, h_{\theta_1}(Z_i) - h_{\theta_2}(Z_i)\rangle_V^k \le 1 + \lambda a_i\,\mathbb{E}\langle \varepsilon_i, h_{\theta_1}(Z_i) - h_{\theta_2}(Z_i)\rangle_V + \sum_{k \in \mathbb{N}_{\ge 2}} \frac{|\lambda|^k}{k!} |a_i|^k\,\mathbb{E}\Bigl[\mathbb{E}\bigl[|\langle \varepsilon_i, h_{\theta_1}(Z_i) - h_{\theta_2}(Z_i)\rangle_V|^k \,\big|\, Z_i\bigr]\Bigr] .
\]
Using independence of $Z_i$ and $\varepsilon_i$ and applying Condition B.1, the preceding display is bounded by
\[
1 + \frac{\sigma_i^2}{2} \sum_{k \in \mathbb{N}_{\ge 2}} |\lambda|^k |a_i|^k\,\mathrm{B}^{k-2}\,\mathbb{E}\bigl[|h_{\theta_1}(Z_i) - h_{\theta_2}(Z_i)|_V^k\bigr] .
\]
Invoking Eq. (C1), we further bound this by
\[
1 + \frac{\sigma_i^2}{2} \lambda^2 |a_i|^2\,\mathbb{E}\bigl[|h_{\theta_1}(Z_i) - h_{\theta_2}(Z_i)|_V^2\bigr] \sum_{k \in \mathbb{N}_{\ge 2}} |\lambda|^{k-2} |a_i|^{k-2}\,\mathrm{B}^{k-2}\,\|h_{\theta_1} - h_{\theta_2}\|_\infty^{k-2} .
\]
The remaining sum is a geometric series. Hence, for $|\lambda| < \bigl(a_\infty \mathrm{B}\,d_\infty(\theta_1,\theta_2)\bigr)^{-1}$,
\[
\mathbb{E}\exp\bigl(\lambda a_i \langle \varepsilon_i, h_{\theta_1}(Z_i) - h_{\theta_2}(Z_i)\rangle_V\bigr) \le \exp\left(\frac{\lambda^2 a_i^2 \sigma_i^2\,d_2^2(\theta_1,\theta_2)}{2 - 2|\lambda|\,a_\infty \mathrm{B}\,d_\infty(\theta_1,\theta_2)}\right),
\]
where we used that $1 + x \le \exp(x)$ for all $x \in \mathbb{R}$. Similarly, for $|\lambda| < \frac{\sqrt N}{a_\infty \mathrm{B}\,d_\infty(\theta_1,\theta_2)}$, independence yields
\[
\mathbb{E}\bigl[\exp\bigl(\lambda(T_{N,1}(\theta_1) - T_{N,1}(\theta_2))\bigr)\bigr] = \prod_{i \le N} \mathbb{E}\exp\Bigl(\frac{\lambda}{\sqrt N} a_i \langle \varepsilon_i, h_{\theta_1}(Z_i) - h_{\theta_2}(Z_i)\rangle_V\Bigr) \le \exp\left(\frac{\lambda^2 (a\sigma)_N^2\,d_2^2(\theta_1,\theta_2)}{2 - 2 N^{-1/2}|\lambda|\,a_\infty \mathrm{B}\,d_\infty(\theta_1,\theta_2)}\right).
\]
Applying the exponential Chebyshev inequality, for any $x \in \mathbb{R}_{\ge 0}$,
\[
P\bigl(|T_{N,1}(\theta_1) - T_{N,1}(\theta_2)| \ge x\bigr) \le 2\exp\left(\frac{\lambda^2 (a\sigma)_N^2\,d_2^2(\theta_1,\theta_2)}{2 - 2 N^{-1/2}|\lambda|\,a_\infty \mathrm{B}\,d_\infty(\theta_1,\theta_2)} - \lambda x\right).
\]
Minimizing the right-hand side with respect to $\lambda$, as in the proof of Theorem 3.1.8 in Giné and Nickl (2021), yields
\[
P\bigl(|T_{N,1}(\theta_1) - T_{N,1}(\theta_2)| \ge x\bigr) \le 2\exp\left(-\frac{x^2}{2 (a\sigma)_N^2\,d_2^2(\theta_1,\theta_2) + \frac{a_\infty \mathrm{B}}{\sqrt N} d_\infty(\theta_1,\theta_2)\,x}\right).
\]
Consequently, for any $\theta_1, \theta_2 \in \Theta$,
\[
P\left(|T_{N,1}(\theta_1) - T_{N,1}(\theta_2)| \ge 2\sqrt{(a\sigma)_N^2}\,d_2(\theta_1,\theta_2)\sqrt{x} + \frac{2 a_\infty \mathrm{B}}{\sqrt N} d_\infty(\theta_1,\theta_2)\,x\right) \le 2 e^{-x} .
\]
Hence, $(T_{N,1}(\theta))_{\theta \in \Theta}$ satisfies Condition 3.8 of Dirksen (2015) with metrics $\bar d_1 := \frac{2 a_\infty \mathrm{B}}{\sqrt N} d_\infty$ and $\bar d_2 := 2\sqrt{(a\sigma)_N^2}\,d_2$. By Theorem 3.5 in Dirksen (2015), there exist universal constants $c, C \in (0,\infty)$ such that, for any $\theta^\dagger \in \Theta$ and $x \ge 1$,
\[
P\left(\sup_{\theta \in \Theta} |T_{N,1}(\theta) - T_{N,1}(\theta^\dagger)| > C\bigl(\gamma_2(\mathcal{H}, \bar d_2) + \gamma_1(\mathcal{H}, \bar d_1)\bigr) + c\bigl(\sqrt{x}\,\Delta_{\bar d_2}(\mathcal{H}) + x\,\Delta_{\bar d_1}(\mathcal{H})\bigr)\right) \le e^{-x} .
\]
Moreover, the diameters satisfy
\[
\Delta_{\bar d_1}(\mathcal{H}) = \frac{2 a_\infty \mathrm{B}}{\sqrt N} \sup_{\theta_1,\theta_2 \in \Theta} d_\infty(\theta_1,\theta_2) \le \frac{4 a_\infty \mathrm{B} U}{\sqrt N} , \qquad \Delta_{\bar d_2}(\mathcal{H}) = 2\sqrt{(a\sigma)_N^2} \sup_{\theta_1,\theta_2 \in \Theta} d_2(\theta_1,\theta_2) \le 4\sqrt{(a\sigma)_N^2}\,v .
\]
Furthermore, up to universal constants,
\[
\gamma_1(\mathcal{H}, \bar d_1) \lesssim \int_{(0,\infty)} \log N(\mathcal{H}, \bar d_1, \rho)\,d\rho = \frac{2 a_\infty \mathrm{B}}{\sqrt N} J_\infty(\mathcal{H}) ,
\]
and, analogously,
\[
\gamma_2(\mathcal{H}, \bar d_2) \lesssim \int_{(0,\infty)} \bigl(\log N(\mathcal{H}, \bar d_2, \rho)\bigr)^{1/2}\,d\rho = 2\sqrt{(a\sigma)_N^2}\,J_2(\mathcal{H}) .
\]
Thus, for some universal $M \in (0,\infty)$,
\[
P\left(\sup_{\theta \in \Theta} |T_{N,1}(\theta) - T_{N,1}(\theta^\dagger)| > M\left[\sqrt{(a\sigma)_N^2}\bigl(J_2(\mathcal{H}) + v\sqrt{x}\bigr) + \frac{a_\infty \mathrm{B}}{\sqrt N}\bigl(J_\infty(\mathcal{H}) + U x\bigr)\right]\right) \le e^{-x} .
\]
Now, for $x \ge 1$ and $\tau(x)$ specified below,
\[
P\Bigl(\sup_{\theta \in \Theta} |T_{N,1}(\theta)| > 2\tau(x)\Bigr) \le P\Bigl(\sup_{\theta \in \Theta} |T_{N,1}(\theta) - T_{N,1}(\theta^\dagger)| > \tau(x)\Bigr) + P\bigl(|T_{N,1}(\theta^\dagger)| > \tau(x)\bigr).
\]
Moreover, for all $k \in \mathbb{N}$,
\[
\frac{1}{N} \sum_{i \le N} \mathbb{E}\bigl[|a_i \langle h_{\theta^\dagger}(Z_i), \varepsilon_i\rangle_V|^k\bigr] \le \frac{1}{2} k!\,(a\sigma)_N^2\,v^2\,(a_\infty \mathrm{B} U)^{k-2} .
\]
Therefore, Bernstein's inequality (Lemma 2.2.10 in van der Vaart and Wellner (2023)) implies that, for all $x \in (0,\infty)$,
\[
P\left(\sum_{i \le N} a_i \langle h_{\theta^\dagger}(Z_i), \varepsilon_i\rangle_V > x\right) \le \exp\left(-\frac{x^2}{2 N (a\sigma)_N^2 v^2 + 2 a_\infty \mathrm{B} U x}\right),
\]
and hence
\[
P\left(|T_{N,1}(\theta^\dagger)| > 2\sqrt{(a\sigma)_N^2}\,v\sqrt{x} + \frac{4}{\sqrt N} a_\infty \mathrm{B} U x\right) \le 2 e^{-x} .
\]
The claim follows by taking $M \ge 2$ and defining, for $x \ge 1$,
\[
\tau(x) := M\left[\sqrt{(a\sigma)_N^2}\bigl(J_2(\mathcal{H}) + v\sqrt{x}\bigr) + \frac{a_\infty \mathrm{B}}{\sqrt N}\bigl(J_\infty(\mathcal{H}) + U x\bigr)\right],
\]
which proves (i). Part (ii) follows by analogous computations and is omitted.

D.3 An inequality for infinite series

Lemma D.3. Let $\mathrm{B} > 1$, let $a \in [0,\infty)$, and let $b, c \in (0,\infty)$ be such that $a < bc$. It holds that
\[
\sum_{l \in \mathbb{N}} \mathrm{B}^{a l} \exp\bigl(-b \cdot \mathrm{B}^{c l}\bigr) \le \frac{b^{-a/c}}{c \ln(\mathrm{B})}\,\Gamma\Bigl(2 + \frac{a}{c}\Bigr) \cdot \exp(-\tau_\star b) , \tag{58}
\]
where $\tau_\star := \Gamma\bigl(2 + \frac{a}{c}\bigr)^{-\frac{c}{c+a}}$ and $\Gamma$ denotes the Gamma function.

Proof of Lemma D.3. We follow the proof strategy of Lemma A.2 in Kutri and Scheichl (2024). We start by defining, for $x \in (0,\infty)$, the function $g(x) := \mathrm{B}^{a x} \exp(-b \cdot \mathrm{B}^{c x})$. Observing that
\[
g'(x) = -\ln(\mathrm{B}) \cdot \bigl(\mathrm{B}^{c x} b c - a\bigr) \cdot g(x) ,
\]
we see directly that $g$ is monotonically decreasing if $a = 0$. If $a > 0$, we have $g'(x_o) = 0$ if and only if $x_o = \frac{\ln\bigl(\frac{a}{cb}\bigr)}{c \ln(\mathrm{B})}$. As $a < bc$ by assumption, we have $x_o < 0$ as well as $\mathrm{B}^{c x} b c - a > 0$ for all $x > x_o$, such that $g$ is monotonically decreasing on $[0,\infty)$. Thus, independent of the choice of $a \in [0,\infty)$, we have
\[
\sum_{l \in \mathbb{N}} g(l) \le \int_0^\infty g(x)\,dx = \frac{1}{c \ln(\mathrm{B})} \int_b^\infty \Bigl(\frac{x}{b}\Bigr)^{a/c} \frac{b}{x} \exp(-x)\,dx = \frac{b^{1 - a/c}}{c \ln(\mathrm{B})} \int_b^\infty x^{a/c - 1} \exp(-x)\,dx ,
\]
where we used the integral transformation $\phi : x \mapsto \frac{\ln(x/b)}{c \ln(\mathrm{B})}$. In the case $a = 0$, the above display is upper bounded by
\[
\frac{1}{c \ln(\mathrm{B})} \int_b^\infty \exp(-x)\,dx \le \frac{1}{c \ln(\mathrm{B})} \exp(-b) ,
\]
which shows the claim.
If $a > 0$, applying the integral transformation $\phi_p : (0,\infty) \ni x \mapsto x^p \in (0,\infty)$, $p \in (0,\infty)$, the last display is upper bounded by
\[
\frac{b^{-a/c}}{c \ln(\mathrm{B})} \int_b^\infty x^{a/c} \exp(-x)\,dx = \frac{p\,b^{-a/c}}{c \ln(\mathrm{B})} \int_{b^{1/p}}^\infty x^{p\frac{a}{c} + p - 1} \exp(-x^p)\,dx .
\]
Choosing $p = \frac{c}{c+a} < 1$, the last display reads
\[
\frac{p\,b^{-a/c}}{c \ln(\mathrm{B})} \int_{b^{1/p}}^\infty \exp(-x^p)\,dx .
\]
For any $0 \le \tau \le \min\{1, \Gamma(1 + 1/p)^{-p}\}$, we can apply the results of Alzer (1997) and upper bound the last display by
\[
\frac{p\,b^{-a/c}}{c \ln(\mathrm{B})}\,\Gamma(1 + 1/p) \cdot \Bigl(1 - \bigl(1 - \exp(-\tau b)\bigr)^{1/p}\Bigr).
\]
By an application of Bernoulli's inequality, the last term is upper bounded by
\[
\frac{b^{-a/c}}{c \ln(\mathrm{B})}\,\Gamma\Bigl(2 + \frac{a}{c}\Bigr) \cdot \exp(-\tau_\star b)
\]
with $\tau_\star := \Gamma\bigl(2 + \frac{a}{c}\bigr)^{-\frac{c}{c+a}} \le 1$.
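Since the terms of the series in Lemma D.3 decay doubly exponentially, the bound (58) is easy to probe numerically. The following is a minimal sketch of such a check; the function names, the truncation level, and the particular parameter sets are ours, chosen purely for illustration, and each set satisfies the constraint $a < bc$.

```python
import math

def series_lhs(B, a, b, c, terms=60):
    # Partial sum of sum_{l >= 1} B^(a l) * exp(-b * B^(c l)); the terms
    # decay doubly exponentially, so a short truncation already agrees
    # with the full series to floating-point precision.
    return sum(B ** (a * l) * math.exp(-b * B ** (c * l))
               for l in range(1, terms + 1))

def rhs_bound(B, a, b, c):
    # Right-hand side of Eq. (58):
    #   b^(-a/c) / (c ln B) * Gamma(2 + a/c) * exp(-tau_star * b),
    # with tau_star = Gamma(2 + a/c)^(-c/(c+a)).
    g = math.gamma(2.0 + a / c)
    tau_star = g ** (-c / (c + a))
    return b ** (-a / c) / (c * math.log(B)) * g * math.exp(-tau_star * b)

# Representative admissible parameter sets (B > 1, a >= 0, b, c > 0, a < b c):
for B, a, b, c in [(2.0, 0.0, 1.0, 1.0), (2.0, 1.0, 1.0, 2.0), (2.0, 1.0, 2.0, 1.0)]:
    assert a < b * c
    assert series_lhs(B, a, b, c) <= rhs_bound(B, a, b, c)
```

In each of the three cases the truncated series stays below the closed-form bound; e.g. for $(\mathrm{B},a,b,c) = (2,0,1,1)$ the series is about $0.154$ against a bound of about $0.531$.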