Maximum Likelihood Drift Estimation for Multiscale Diffusions



A. Papavasiliou (Statistics Department, Warwick University, Coventry CV4 7AL, UK; a.papavasiliou@warwick.ac.uk), G.A. Pavliotis (Department of Mathematics, Imperial College London, London SW7 2AZ, UK; g.pavliotis@imperial.ac.uk), and A.M. Stuart (Mathematics Institute, Warwick University, Coventry CV4 7AL, UK; a.m.stuart@warwick.ac.uk)

March 11, 2022

Abstract. We study the problem of parameter estimation using maximum likelihood for fast/slow systems of stochastic differential equations. Our aim is to shed light on the problem of model/data mismatch at small scales. We consider two classes of fast/slow problems for which a closed coarse-grained equation for the slow variables can be rigorously derived, which we refer to as averaging and homogenization problems. We ask whether, given data from the slow variable in the fast/slow system, we can correctly estimate parameters in the drift of the coarse-grained equation for the slow variable, using maximum likelihood. We show that, whereas the maximum likelihood estimator is asymptotically unbiased for the averaging problem, for the homogenization problem maximum likelihood fails unless we subsample the data at an appropriate rate. An explicit formula for the asymptotic error in the log likelihood function is presented. Our theory is applied to two simple examples from molecular dynamics.

Keywords: parameter estimation, multiscale diffusions, averaging, homogenization, maximum likelihood, subsampling

1 Introduction

Fitting stochastic differential equations (SDEs) to time-series data is often a useful way of extracting simple model fits which capture important aspects of the dynamics [9]. However, whilst the data may well be compatible with an SDE model in many respects, it is often incompatible with the desired model at small scales.
Since many commonly applied statistical techniques see the data at small scales, this can lead to inconsistencies between the data and the desired model fit. This phenomenon appears quite often in econometrics [1, 2, 13], where the term market microstructure noise is used to describe the high frequency/small scale part of the data, as well as in molecular dynamics [19]. In essence, the problem that we are facing is that there is an inconsistency between the coarse-grained model that we are using and the microscopic dynamics from which the data is generated, at small scales. Similar problems appear quite often in statistical inference, in the context of parameter estimation for misspecified or incorrect models [11, Sec. 2.6].

The aim of this paper is to create a theoretical framework in which it is possible to study this issue, in order to gain better insight into how it is manifest in practice, and how to overcome it. In particular, our goal is to investigate the following problem: how can we fit data obtained from the high-dimensional, multiscale full dynamics to a low-dimensional, coarse-grained model which governs the evolution of the resolved ("slow") degrees of freedom? We will study this question for a class of stochastic systems for which we can rigorously derive a coarse-grained description for the dynamics of the resolved variables. More specifically, we will work in the framework of coupled systems of multiscale SDEs for a pair of unknown functions (x(t), y(t)). We assume that y(t) is fast, relative to x(t), and that the equations average or homogenize to give a closed equation for X(t) to which x(t) converges in the limit of infinite scale separation. The function X(t) then approximates x(t), typically in the sense of weak convergence of probability measures [7, 20].
We then ask the following question: given data for x(t) from the coupled system, can we correctly identify parameters in the averaged or homogenized model for X(t)?

Fast/slow systems of SDEs of this form have been studied extensively over the last four decades; see [4, 14, 20] and the references therein. Recently, various methods have been proposed for solving these SDEs numerically [6, 8, 23]. In these works, the coefficients of the limiting SDE are calculated "on the fly" from simulations of the fast/slow system. There is a direct link between these numerical methods and our approach in that our goal is also to infer information about the coefficients in the coarse-grained equation using data from the multiscale system. However, our interest is mainly in situations where the "microscopic" multiscale system is not known explicitly. From this point of view, we merely use the multiscale stochastic system as our "data generating process"; our goal is to fit this data to the coarse-grained equation for X(t), the limit of the slow variable x(t).

A first step towards the understanding of this problem was taken in [19]. There, the data generating process x(t) was taken to be the path of a particle moving in a multiscale potential under the influence of thermal noise. The goal was to identify parameters in the drift as well as the diffusion coefficient in the homogenized model for X(t), the weak limit of x(t). It was shown that the maximum likelihood estimator is asymptotically biased and that subsampling is necessary in order to estimate the parameters of the homogenized limit correctly, based on a time series (i.e. a single observation) of x(t).

In this paper we extend the analysis to more general classes of fast/slow systems of SDEs for which either an averaging or a homogenization principle holds [20].
We consider cases where the drift in the averaged or homogenized equation contains parameters which we want to estimate using observations of the slow variable in the fast/slow system. We show that in the case of averaging the maximum likelihood estimator is asymptotically unbiased, and that we can correctly estimate the parameters of the drift in the averaged model from a single path of the slow variable x(t). On the other hand, we show rigorously that the maximum likelihood estimator is asymptotically biased for homogenization problems. In particular, an additional term appears in the likelihood function in the limit of infinite scale separation. We then show that this term vanishes, and hence that the maximum likelihood estimator becomes asymptotically unbiased, provided that we subsample at an appropriate rate.

To be more specific, in this paper we will consider fast/slow systems of SDEs of the form

    dx/dt = f_1(x, y) + α_0(x, y) dU/dt + α_1(x, y) dV/dt,                    (1.1a)
    dy/dt = (1/ε) g_0(x, y) + (1/√ε) β(x, y) dV/dt;                           (1.1b)

or the SDEs

    dx/dt = (1/ε) f_0(x, y) + f_1(x, y) + α_0(x, y) dU/dt + α_1(x, y) dV/dt,  (1.2a)
    dy/dt = (1/ε²) g_0(x, y) + (1/ε) g_1(x, y) + (1/ε) β(x, y) dV/dt.         (1.2b)

We will refer to equations (1.1) as the averaging problem and to equations (1.2) as the homogenization problem. In both cases our assumptions on the coefficients in the SDEs are such that a coarse-grained (averaged or homogenized) equation exists, which is of the form

    dX/dt = F(X; θ) + K(X) dW/dt.                                             (1.3)

The slow variable x(t) converges weakly, in the limit as ε → 0, to X(t), the solution of (1.3). We assume that the vector field F(X; θ) depends on a set of parameters θ that we want to estimate based on data from either the averaging or the homogenization problem. We suppose that the actual drift compatible with the data is given by F(X) = F(X; θ_0).
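The relation between a fast/slow pair (1.1) and its coarse-grained limit (1.3) can be made concrete with a small simulation. The following sketch (all coefficient choices are illustrative assumptions, not taken from the paper) integrates an averaging-type system by the Euler–Maruyama method, with f_1(x, y) = −y, α_0 = 1, α_1 = 0, g_0(x, y) = −(y − x) and β = √2; for frozen x the fast process is then Ornstein–Uhlenbeck with invariant density ρ(y; x) = N(x, 1), the averaged drift is F(x) = −x, and the coarse-grained equation (1.3) reads dX/dt = −X + dW/dt.

```python
import numpy as np

# Euler-Maruyama for an averaging-type system (1.1) with the
# illustrative choices f1(x, y) = -y, alpha0 = 1, alpha1 = 0,
# g0(x, y) = -(y - x), beta = sqrt(2).  These coefficients are an
# assumption made for this sketch, not an example from the paper.

def simulate_fast_slow(eps, T, dt, seed=0):
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    dU = np.sqrt(dt) * rng.standard_normal(n)   # noise in the slow equation
    dV = np.sqrt(dt) * rng.standard_normal(n)   # noise in the fast equation
    x = np.empty(n + 1)
    y = np.empty(n + 1)
    x[0] = y[0] = 1.0
    for k in range(n):
        x[k + 1] = x[k] - y[k] * dt + dU[k]
        y[k + 1] = y[k] - (y[k] - x[k]) * dt / eps + np.sqrt(2.0 / eps) * dV[k]
    return x, y

# dt must resolve the fast time scale: dt << eps here.
x, y = simulate_fast_slow(eps=1e-3, T=10.0, dt=1e-4)
```

Along the path, y fluctuates rapidly around x, consistent with the fast process traversing its invariant measure N(x, 1) while x is approximately frozen; this is exactly the intuition formalized in Section 2.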
We ask whether it is possible to correctly identify θ = θ_0 by finding the maximum likelihood estimator (MLE) when using a statistical model of the form (1.3), but given data from (1.1) or (1.2). Our main results can be stated, informally, as follows.

Theorem 1.1. Assume that we are given continuous time data. The MLE for the averaging problem (i.e. fitting data from (1.1a) to (1.3)) is asymptotically unbiased. On the other hand, the MLE for the homogenization problem (i.e. fitting data from (1.2a) to (1.3)) is asymptotically biased, and an explicit formula for the asymptotic error in the likelihood, E_∞, can be obtained.

Precise statements of the above results can be found in Theorems 3.10, 3.12 and 3.13. The failure of the MLE when applied to the homogenization problem is due to the presence of high frequency data. Consequently, in order to correctly identify the parameter θ = θ_0 in (1.3) using data from (1.2a), subsampling at an appropriate rate is necessary.

Theorem 1.2. The MLE for the homogenization problem becomes asymptotically unbiased if we subsample at an appropriate rate.

Roughly speaking, the sampling rate should be between the two characteristic time scales of the fast/slow SDEs (1.2), namely 1 and ε². The precise statement of this result can be found in Theorems 4.1 and 4.5.

In practice, real data will not come explicitly from a scale-separated model like (1.1a) or (1.2a). However, real data is often multiscale in character. Thus the results in this paper shed light on the pitfalls that may arise when fitting simplified statistical models to multiscale data. Furthermore, the results indicate the central, and subtle, role played by subsampling data in order to overcome mismatch between model and data at small scales.

The rest of the paper is organized as follows.
In Section 2 we study the fast/slow stochastic systems introduced above, and prove appropriate averaging and homogenization theorems. In Section 3 we introduce the maximum likelihood function for (1.3) and study its limiting behavior, given data from the averaging and homogenization problems (1.1a) and (1.2a). In Section 4 we show that, when subsampling at an appropriate rate, the maximum likelihood estimator for the homogenization problem becomes asymptotically unbiased. In Section 5 we present examples of fast/slow stochastic systems that fit into the general framework of this paper. Section 6 is reserved for conclusions. Various technical results are proved in the appendices.

2 Set-Up

We will consider fast/slow systems of SDEs for the variables (x, y) ∈ X × Y. We can take, for example, X × Y = R^l × R^{d−l} or X × Y = T^l × T^{d−l}. In the second case, where the state space is compact, all of the assumptions that we need for the proofs of our results can be justified using elliptic PDE theory.

Let φ_ξ^t(y) denote the Markov process which solves the SDE

    (d/dt) φ_ξ^t(y) = g_0(ξ, φ_ξ^t(y)) + β(ξ, φ_ξ^t(y)) dV/dt,   φ_ξ^0(y) = y.   (2.1)

Here ξ ∈ X is a fixed parameter and, for each t ≥ 0, φ_ξ^t(y) ∈ Y, g_0 : X × Y → R^{d−l}, β : X × Y → R^{(d−l)×m}, and V is a standard Brownian motion in m dimensions.¹ The generator of the process is

    L_0(ξ) = g_0(ξ, y) · ∇_y + (1/2) B(ξ, y) : ∇_y ∇_y                           (2.2)

with B(ξ, y) := β(ξ, y) β(ξ, y)^T. Notice that L_0(ξ) is a differential operator in y alone, with ξ a parameter. Our interest is in data generated by the projection onto the x coordinate of systems of SDEs for (x, y) in X × Y.
In particular, for U a standard Brownian motion in R^n, we will consider either of the following coupled systems of SDEs:

    dx/dt = f_1(x, y) + α_0(x, y) dU/dt + α_1(x, y) dV/dt,                    (2.3a)
    dy/dt = (1/ε) g_0(x, y) + (1/√ε) β(x, y) dV/dt;                           (2.3b)

or the SDEs

    dx/dt = (1/ε) f_0(x, y) + f_1(x, y) + α_0(x, y) dU/dt + α_1(x, y) dV/dt,  (2.4a)
    dy/dt = (1/ε²) g_0(x, y) + (1/ε) g_1(x, y) + (1/ε) β(x, y) dV/dt.         (2.4b)

Here f_i : X × Y → R^l, α_0 : X × Y → R^{l×n}, α_1 : X × Y → R^{l×m}, g_1 : X × Y → R^{d−l}, and g_0, β and V are as above.

¹ Throughout this paper we write stochastic differential equations as identities in fully differentiated form, even though Brownian motion is not differentiable. In all cases the identity should be interpreted as holding in integrated form, with the Itô interpretation of the stochastic integral.

Assumptions 2.1.

• The equation

      −L_0^*(ξ) ρ(y; ξ) = 0,   ∫_Y ρ(y; ξ) dy = 1

  has a unique non-negative solution ρ(y; ξ) ∈ L¹(Y) for every ξ ∈ X; furthermore, ρ(y; ξ) is C^∞ in y and ξ.

• For each ξ ∈ X define the weighted Hilbert space L²_ρ(Y; ξ) with inner product

      ⟨a, b⟩_ρ := ∫_Y ρ(y; ξ) a(y) b(y) dy.

  For all ξ ∈ X the Poisson equation

      −L_0(ξ) Θ(y; ξ) = h(y; ξ),   ∫_Y ρ(y; ξ) Θ(y; ξ) dy = 0

  has a unique solution Θ(y; ξ) ∈ L²_ρ(Y; ξ), provided that ∫_Y ρ(y; ξ) h(y; ξ) dy = 0.

• The functions f_i, g_i, α_i, β and all their derivatives are uniformly bounded in X × Y.

• If h(y; ξ) and all its derivatives with respect to y, ξ are uniformly bounded in X × Y, then the same is true of Θ solving the Poisson equation above.

Remark 2.2. In the case where the state space of the fast process is compact, Y = T^{d−ℓ}, and the diffusion matrix B(ξ, y) is positive definite, the above assumptions can be easily proved using elliptic PDE theory [20, Ch. 6].
Similar results can also be proved without the compactness and uniform ellipticity assumptions [15, 16, 17]. The first assumption essentially states that the process (2.1) is ergodic, for each ξ ∈ X.

Let L_0 = L_0(x) and define

    L_1 = f_0 · ∇_x + g_1 · ∇_y + C : ∇_y ∇_x,
    L_2 = f_1 · ∇_x + (1/2) A : ∇_x ∇_x,

where

    A(x, y) = α_0(x, y) α_0(x, y)^T + α_1(x, y) α_1(x, y)^T,
    C(x, y) = α_1(x, y) β(x, y)^T.

The generators for the Markov processes defined by equations (2.3) and (2.4) respectively are

    L_av  = (1/ε) L_0 + (1/√ε) L_1 + L_2,                                     (2.5)
    L_hom = (1/ε²) L_0 + (1/ε) L_1 + L_2,                                     (2.6)

with the understanding that f_0 ≡ 0 and g_1 ≡ 0 in the case of L_av. We let Ω denote the probability space for the pair of Brownian motions U, V.

In (2.3) (resp. (2.4)) the dynamics for y with x viewed as frozen has solution φ_x^{t/ε}(y(0)) (resp. φ_x^{t/ε²}(y(0))). Of course x is not frozen, but since it evolves much more slowly than y, intuition based on freezing x and considering the process (2.1) is useful in understanding how averaging and homogenization arise for equations (2.3) and (2.4) respectively. Specifically, for (2.3), on timescales long compared with ε and short compared to 1, x will be approximately frozen and y will traverse its invariant measure with density ρ(y; x). We may thus average over this measure and eliminate y. Similar ideas hold for equation (2.4), but are complicated by the presence of the term ε^{−1} f_0. These ideas underlie the averaging and homogenization results contained in the next two subsections.

2.1 Averaging

Define F : X → R^l and K : X → R^{l×l} by

    F(x) := ∫_Y f_1(x, y) ρ(y; x) dy

and

    K(x) K(x)^T := ∫_Y ( α_0(x, y) α_0(x, y)^T + α_1(x, y) α_1(x, y)^T ) ρ(y; x) dy.

Note that K(x) K(x)^T is positive semidefinite and hence K(x) is well defined via, for example, the Cholesky decomposition.

Theorem 2.3.
Let Assumptions 2.1 hold and let x(0) = X(0). Then x ⇒ X in C([0, T], X) and X solves the SDE

    dX/dt = F(X) + K(X) dW/dt,                                                (2.7)

where W is a standard l-dimensional Brownian motion.

We use the notation Ω_0 to denote the probability space for the Brownian motion W.

Proof. Consider the Poisson equation

    −L_0 Ξ(y; x) = f_1(x, y) − F(x),   ∫_Y ρ(y; x) Ξ(y; x) dy = 0,

with unique solution Ξ(y; x) ∈ L²_ρ(Y; x). Applying Itô's formula to Ξ we obtain

    dΞ/dt = (1/ε) L_0 Ξ + (1/√ε) L_1 Ξ + L_2 Ξ + (1/√ε) ∇_y Ξ β dV/dt + ∇_x Ξ α_0 dU/dt + ∇_x Ξ α_1 dV/dt.

From this we obtain

    ∫_0^t ( f_1(x(s), y(s)) − F(x(s)) ) ds = e_0(t),

where

    e_0(t) = √ε ∫_0^t ( L_1 Ξ ds + ∇_y Ξ β dV )
           + ε ∫_0^t ( L_2 Ξ ds + ∇_x Ξ α_0 dU + ∇_x Ξ α_1 dV )
           + ε ( Ξ(y(0); x(0)) − Ξ(y(t); x(t)) ).

Thus, by Assumptions 2.1 and the Burkholder–Davis–Gundy inequality, e_0 → 0 in L^p(C([0, T], X); Ω). Hence

    x(t) = x(0) + ∫_0^t F(x(s)) ds + M(t) + e_0(t)

with

    M(t) := ∫_0^t α_0(x(s), y(s)) dU(s) + ∫_0^t α_1(x(s), y(s)) dV(s).

The quadratic variation process for M(t) is

    ⟨M⟩_t = ∫_0^t A(x(s), y(s)) ds,

where A(x, y) = α_0(x, y) α_0(x, y)^T + α_1(x, y) α_1(x, y)^T. By use of the Poisson equation technique applied above to show that f_1(x, y) can be approximated by F(x) (its average against the fast y process), we can show similarly that

    ∫_0^t A(x(s), y(s)) ds = ∫_0^t K(x(s)) K(x(s))^T ds + e_1(t),

where, as above, e_1 → 0 in L^p(C([0, T], X); Ω). Let

    B(t) = x(0) + ∫_0^t F(x(s)) ds + e_0(t),
    q(t) = ∫_0^t K(x(s)) K(x(s))^T ds + e_1(t).

Then x(t) = B(t) + M(t), where M(t) and M(t) M(t)^T − q(t) are F_t martingales, where F_t is the filtration generated by σ((U(s), V(s)), s ≤ t). Let C^∞_c(X) denote the space of compactly supported C^∞ functions.
The martingale problem for

    A = { ( f, F · ∇_x f + (1/2) K K^T : ∇_x ∇_x f ) : f ∈ C^∞_c(X) }

is well posed, and x(s), y(s) and X(s) are continuous. By L² convergence of the e_i to 0 in C([0, T], X) we deduce convergence to 0 in probability, in the same space. Hence, by a slight generalization of Theorem 4.1 in Chapter 7 of [7], we deduce the desired result.

2.2 Homogenization

In order for the equations (2.4) to produce a sensible limit as ε → 0 it is necessary to impose a condition on f_0. Specifically, we assume the following which, roughly, says that f_0(x, y) averages to zero against the invariant measure of the fast y process, with x fixed.

Assumptions 2.4. The function f_0 satisfies the centering condition

    ∫_Y ρ(y; x) f_0(x, y) dy = 0.

Let Φ(y; x) ∈ L²_ρ(Y; x) be the solution of the equation

    −L_0 Φ(y; x) = f_0(x, y),   ∫_Y ρ(y; x) Φ(y; x) dy = 0,                   (2.8)

which is unique by Assumptions 2.4. Define

    F_0(x) := ∫_Y (L_1 Φ)(x, y) ρ(y; x) dy
            = ∫_Y ( (∇_x Φ f_0)(x, y) + (∇_y Φ g_1)(x, y) + (α_1 β^T : ∇_y ∇_x Φ)(x, y) ) ρ(y; x) dy,

    F_1(x) := ∫_Y f_1(x, y) ρ(y; x) dy,

and F(x) = F_0(x) + F_1(x). Also define

    A_1(x) A_1(x)^T := ∫_Y ( (∇_y Φ β + α_1)(∇_y Φ β + α_1)^T )(x, y) ρ(y; x) dy,
    A_0(x) A_0(x)^T := ∫_Y α_0(x, y) α_0(x, y)^T ρ(y; x) dy,

and K(x) K(x)^T = A_0(x) A_0(x)^T + A_1(x) A_1(x)^T. Note that K(x) K(x)^T is positive semidefinite by construction, so that K(x) is well defined by, for example, the Cholesky decomposition.

Theorem 2.5. Let Assumptions 2.1 and 2.4 hold. Then x ⇒ X in C([0, T], X) and X solves the SDE

    dX/dt = F(X) + K(X) dW/dt,                                                (2.9)

where W is a standard l-dimensional Brownian motion.

Proof.
We consider three Poisson equations: that for Φ given above, and

    −L_0 χ(y; x) = f_1(x, y) − F_1(x),   ∫_Y ρ(y; x) χ(y; x) dy = 0,          (2.10a)
    −L_0 Ψ(y; x) = (L_1 Φ)(x, y) − F_0(x),   ∫_Y ρ(y; x) Ψ(y; x) dy = 0.      (2.10b)

All of these equations have a unique solution, since the right hand sides average to zero against the density ρ(y; x) by assumption (Φ) or by construction (χ, Ψ). By the Itô formula we obtain

    dΦ/dt = (1/ε²) L_0 Φ + (1/ε) L_1 Φ + L_2 Φ + (1/ε) ∇_y Φ β dV/dt + ∇_x Φ α_0 dU/dt + ∇_x Φ α_1 dV/dt.

From this we obtain, using arguments similar to those in the proof of Theorem 2.3,

    (1/ε) ∫_0^t f_0(x, y) ds = ∫_0^t (L_1 Φ)(x(s), y(s)) ds + ∫_0^t (∇_y Φ β)(x(s), y(s)) dV(s) + e_0(t),

where e_0(t) → 0 in L^p(C([0, T], X); Ω) and where, recall, Ω is the probability space for (U, V). Applying Itô's formula to χ, the solution of (2.10a), we may show that

    ∫_0^t ( f_1(x(s), y(s)) − F_1(x(s)) ) ds = e_1(t),

where e_1(t) → 0 in L^p(C([0, T], R^d); Ω). Thus

    x(t) = x(0) + ∫_0^t (L_1 Φ)(x(s), y(s)) ds + ∫_0^t F_1(x(s)) ds
         + ∫_0^t (∇_y Φ β)(x(s), y(s)) dV(s) + ∫_0^t α_0(x(s), y(s)) dU(s) + ∫_0^t α_1(x(s), y(s)) dV(s) + e_2(t),

and e_2(t) → 0 in L^p(C([0, T], X); Ω). By applying Itô's formula to Ψ, the solution of (2.10b), we obtain

    dΨ/dt = (1/ε²) L_0 Ψ + (1/ε) L_1 Ψ + L_2 Ψ + (1/ε) ∇_y Ψ β dV/dt + ∇_x Ψ α_0 dU/dt + ∇_x Ψ α_1 dV/dt.

From this we obtain

    ∫_0^t ( L_1 Φ − F_0 )(x, y) ds = e_3(t),

where e_3(t) → 0 in L^p(C([0, T], X); Ω). Thus

    x(t) = x(0) + ∫_0^t F(x(s)) ds + M(t) + e_4(t)

with

    M(t) := ∫_0^t α_0(x(s), y(s)) dU(s) + ∫_0^t (∇_y Φ β + α_1)(x(s), y(s)) dV(s).

Here e_4 → 0 in L^p(C([0, T], X); Ω). Define

    A_2(x, y) = ( (∇_y Φ β + α_1)(∇_y Φ β + α_1)^T )(x, y) + α_0(x, y) α_0(x, y)^T.
The quadratic variation of M(t) is

    ⟨M⟩_t = ∫_0^t A_2(x(s), y(s)) ds.

By use of the Poisson equation technique we can show that

    ∫_0^t A_2(x(s), y(s)) ds = ∫_0^t K(x(s)) K(x(s))^T ds + e_5(t),

where, as above, e_5 → 0 in L^p(C([0, T], X); Ω). The remainder of the proof proceeds as in Theorem 2.3.

3 Parameter Estimation

Recall that Ω_0 is the probability space for W. Imagine that we try to fit data {x(t)}_{t∈[0,T]} from (2.3) or (2.4) to a homogenized or averaged equation of the form (2.7) or (2.9), but with unknown parameter θ ∈ Θ, where Θ is an open subset of R^k, in the drift:

    dX/dt = F(X; θ) + K(X) dW/dt.                                             (3.1)

Suppose that the actual drift compatible with the data is given by F(X) = F(X; θ_0). We ask whether it is possible to correctly identify θ = θ_0 by finding the maximum likelihood estimator (MLE) when using a statistical model of the form (3.1), but given data from (2.3) or (2.4). Recall that the averaging and homogenization techniques from the previous section show that x(t) from (2.3) and (2.4) converges weakly to the solution of an equation of the form (3.1).

We make the following assumptions concerning the model equations (3.1) which will be used to fit the data.

Assumptions 3.1. We assume that K is uniformly positive definite on X. We also assume that (3.1) is ergodic with invariant measure ν(dx) = π(x) dx at θ = θ_0 and that

    A_∞ := ∫_X ( K(x)^{−1} F(x) ⊗ K(x)^{−1} F(x) ) π(x) dx                     (3.2)

is invertible.

Given data {z(t)}_{t∈[0,T]}, the log likelihood function for θ satisfying (3.1) is given by

    L(θ; z) = ∫_0^T ⟨F(z; θ), dz⟩_{a(z)} − (1/2) ∫_0^T |F(z; θ)|²_{a(z)} dt,   (3.3)

where ⟨p, q⟩_{a(z)} = ⟨K(z)^{−1} p, K(z)^{−1} q⟩. To be precise,

    dP/dP_0 = exp( L(θ; X) ),

where P is the path space measure for (3.1) and P_0 is the path space measure for (3.1) with F ≡ 0 [21].
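For a scalar model with K = 1 and drift linear in the parameter, the log likelihood (3.3) can be discretized and maximized in closed form. The sketch below (the function names and the choice F(X) = −X are illustrative assumptions, not taken from the paper) fits data generated from the statistical model itself, i.e. the matched setting of Theorem 3.2 below.

```python
import numpy as np

# Discretization of the log likelihood (3.3) for the scalar model
#   dX/dt = theta * F(X) + dW/dt,  K = 1,
# with the illustrative choice F(X) = -X.  L is quadratic in theta,
# so the MLE is available in closed form.

def log_lik(theta, z, dt):
    F = -z[:-1]                       # drift factor at left endpoints
    return theta * np.sum(F * np.diff(z)) - 0.5 * theta**2 * np.sum(F**2) * dt

def mle_theta(z, dt):
    # Exact maximizer of the quadratic log_lik in theta
    F = -z[:-1]
    return np.sum(F * np.diff(z)) / (np.sum(F**2) * dt)

# Synthetic data from the model itself with theta_0 = 1 (an OU process).
rng = np.random.default_rng(1)
dt, n = 1e-3, 200_000                 # total time T = 200
noise = np.sqrt(dt) * rng.standard_normal(n)
z = np.empty(n + 1)
z[0] = 1.0
for k in range(n):
    z[k + 1] = z[k] - z[k] * dt + noise[k]

theta_hat = mle_theta(z, dt)          # close to theta_0 = 1 for large T
```

Because the likelihood is quadratic in θ here, `mle_theta` returns its exact maximizer; Theorem 3.2 below then says that this estimator converges to θ_0 as T → ∞ when model and data match.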
The MLE is

    θ̂ = argmax_θ L(θ; z).                                                    (3.4)

As a preliminary to understanding the effect of using multiscale data, we start by exhibiting an underlying property of the log-likelihood when confronted with data from the model (3.1) itself. The following theorem shows that, in this case: (i) in the limit T → ∞ the log-likelihood is asymptotically independent of the particular sample path of (3.1) chosen – it depends only on the invariant measure π; (ii) as a consequence, asymptotically, time-ordering of the data is irrelevant to parameter estimation; (iii) under some additional assumptions, the large-T expression also shows that choosing data from the model (3.1) leads to the correct estimation of drift parameters, in the limit T → ∞.

Theorem 3.2. Let Assumptions 3.1 hold and let {X(t)}_{t∈[0,T]} be a sample path of (3.1) with θ = θ_0. Then, in L²(Ω_0) and almost surely with respect to X(0),

    lim_{T→∞} (2/T) L(θ; X) = ∫_X |F(X; θ_0)|²_{a(X)} π(X) dX − ∫_X |F(X; θ) − F(X; θ_0)|²_{a(X)} π(X) dX.

This expression is maximized by choosing θ̂ = θ_0, in the limit T → ∞.

Proof. By Lemmas A.2 and A.3 in the appendix we deduce that, with all limits in L²(Ω),

    lim_{T→∞} (1/T) L(θ; X)
      = lim_{T→∞} ( (1/T) ∫_0^T ⟨F(X; θ), F(X; θ_0)⟩_{a(X)} dt + (1/T) ∫_0^T ⟨F(X; θ), K(X) dW⟩_{a(X)} − (1/(2T)) ∫_0^T |F(X; θ)|²_{a(X)} dt )
      = ∫_X ⟨F(X; θ), F(X; θ_0)⟩_{a(X)} π(X) dX − (1/2) ∫_X |F(X; θ)|²_{a(X)} π(X) dX.

Completing the square provides the proof.

In the particular case where the parameter θ appears linearly in the drift, it can be viewed as an R^{l×l} matrix Θ and

    F(X; θ) = Θ F(X).                                                         (3.5)

The correct value for Θ is thus the R^{l×l} identity matrix I.
The maximum likelihood estimator is

    Θ̂(z; T) = A(z; T)^{−1} B(z; T),                                          (3.6)

where

    A(z; T) = (1/T) ∫_0^T K(z)^{−1} F(z) ⊗ K(z)^{−1} F(z) dt,
    B(z; T) = (1/T) ∫_0^T K(z)^{−1} dz ⊗ K(z)^{−1} F(z);

if A(z; T) is not invertible then we set Θ̂(z; T) = 0. A result closely related to Theorem 3.2 is the following.²

Theorem 3.3. Let Assumptions 3.1 hold and let {X(t)}_{t∈[0,T]} be a sample path of (3.1) with θ = θ_0, so that F(X; θ) = F(X). Then

    lim_{T→∞} Θ̂(X; T) = I   in probability.

Proof. We observe that B(X; T) = A(X; T) + J_1, where

    J_1 = (1/T) ∫_0^T dW ⊗ K(X)^{−1} F(X)

and where E|J_1|² = O(1/T) by Lemma A.2. By ergodicity, and Lemma A.3, we have that A(X; T) = A_∞ + J_2, where E|J_2|² = O(1/T) and A_∞ is given by (3.2). By Assumption 3.1, and for T sufficiently large, A(X; T) is invertible and we have

    Θ̂(X; T) = I + (A_∞ + J_2)^{−1} J_1,

and the result follows.

Remark 3.4. The invertibility of A_∞ is necessary in order to be able to successfully estimate the drift of the linear system.

In order to prove an analogue of Theorem 3.3 when the drift depends nonlinearly on the parameter θ we need to make additional assumptions.

² The proof is standard and we outline it only for comparison with the situation in the next subsection, where data from a multiscale model is employed.

Assumptions 3.5.

• We assume that

      inf_{|u|>δ} ∫_X |F(X; θ_0 + u) − F(X; θ_0)|²_{a(X)} π(X) dX > κ(δ) > 0,   ∀ δ > 0.   (3.7)

  When (3.7) holds we will say that the system is identifiable.

• There exist α > 0 and F̂ : X → R, square integrable with respect to the invariant measure, i.e. ∫_X F̂(X)² π(X) dX < ∞, such that

      |F(X; θ) − F(X; θ′)|_{a(X)} ≤ |θ − θ′|^α F̂(X).                           (3.8)

Under the above assumptions we can prove convergence of the MLE to the correct value θ_0.

Theorem 3.6.
Suppose that Assumptions 3.1 and 3.5 hold. If, in addition, the parameter space Θ is compact, then

    lim_{T→∞} θ̂(X; T) = θ_0   in probability.

Proof. This is a straightforward application of the results in [22].

We now ask whether the likelihood behaves similarly when confronted with data {x(t)} from the underlying multiscale systems (2.3) or (2.4). To address this issue we make the following natural assumptions regarding the invariant measure for these underlying multiscale systems.

Assumptions 3.7.

• The fast/slow SDE (2.3) (resp. (2.4)) is ergodic with invariant measure μ_ε(dx dy) which is absolutely continuous with respect to the Lebesgue measure on X × Y with smooth density ρ_ε(x, y).

• The limiting SDE (2.7) or (2.9) is ergodic with invariant measure ν(dx) which is absolutely continuous with respect to the Lebesgue measure on X with smooth density π(x).

• The measure μ_ε(dx dy) = ρ_ε(x, y) dx dy converges weakly to the measure μ(dx dy) = π(x) ρ(y; x) dx dy, where ρ(y; x) is the invariant density of the fast process (2.1) given in Assumptions 2.1 and π(x) is the invariant density for (2.7) (resp. (2.9)).

• The invariant measure μ_ε(dx dy) = ρ_ε(x, y) dx dy satisfies a Poincaré inequality with a constant independent of ε: there exists a constant C_p, independent of ε, such that for every mean-zero H¹(X × Y; μ_ε(dx dy)) function f we have

      ‖f‖ ≤ C_p ‖∇f‖,                                                         (3.9)

  where ∇ represents the gradient with respect to (x^T, y^T)^T and ‖·‖ denotes the L²(X × Y; μ_ε(dx dy)) norm.

We also need to assume that the fast/slow SDEs (2.3) and (2.4) are uniformly elliptic.

Assumption 3.8. Define the matrix field Σ = γ γ^T, where

    γ = [ α_0     α_1   ]
        [  0    (1/ε) β ].

Then there is C_γ > 0, independent of ε, such that

    ⟨ξ, Σ(x, y) ξ⟩ ≥ C_γ |ξ|²   ∀ (x, y) ∈ X × Y, ξ ∈ R^d.

Remark 3.9.
It is straightforward to show that, when X = T^ℓ, Y = T^{d−ℓ}, Assumptions 3.7 follow from Assumption 3.8, using properties of periodic functions [19], together with the compactness of the state space. When X = R^ℓ, Y = R^{d−ℓ}, more work is needed in order to prove that the invariant measure satisfies Poincaré's inequality with an ε-independent constant, since this essentially requires proving that the generator of the fast/slow system has an ε-independent spectral gap. In the case where the fast/slow system has a gradient structure with a smooth potential V(x, y), simple criteria on the potential have been derived that facilitate determination of whether or not the invariant measure satisfies the Poincaré inequality. We refer to [24, 3] and the references therein for more details.

3.1 Averaging

We now ask what happens when the MLE for the averaged equation (3.1) is confronted with data from the original multiscale equation (2.3). The following result shows that, in this case, the estimator will behave well, for large time and small ε. Large time is always required for convergence of drift parameter estimation, even when model and data match. In the limit ε → 0, X(t) from (3.1) approximates x(t) from (2.3).

Theorem 3.10. Let Assumptions 2.1, 3.1, 3.7 and 3.8 hold. Let {x(t)}_{t∈[0,T]} be a sample path of (2.3) and {X(t)}_{t∈[0,T]} a sample path of (3.1) at θ = θ_0. Then the following limits, to be interpreted in L²(Ω) and L²(Ω_0) respectively, and almost surely with respect to x(0), y(0), X(0), are identical:

    lim_{ε→0} lim_{T→∞} (1/T) L(θ; x) = lim_{T→∞} (1/T) L(θ; X).

Proof.
We start by observing that, by Lemma A.3 and Assumptions 3.7,

    lim_{ε→0} lim_{T→∞} (1/T) ∫_0^T |F(x; θ)|²_{a(x)} dt
      = lim_{ε→0} ∫_{X×Y} |F(x; θ)|²_{a(x)} ρ_ε(x, y) dx dy
      = ∫_{X×Y} |F(x; θ)|²_{a(x)} π(x) ρ(y; x) dx dy
      = ∫_X |F(x; θ)|²_{a(x)} π(x) dx,

where the limits are in L²(Ω). Now, from equation (2.3) it follows that

    (1/T) ∫_0^T ⟨F(x; θ), dx⟩_{a(x)} = (1/T) ∫_0^T ⟨F(x; θ), f_1(x, y)⟩_{a(x)} dt
      + (1/T) ∫_0^T ⟨F(x; θ), α_0(x, y) dU⟩_{a(x)} + (1/T) ∫_0^T ⟨F(x; θ), α_1(x, y) dV⟩_{a(x)}.

The last two integrals tend to zero in L²(Ω) as T → ∞ by Lemma A.2. In order to analyze the first integral on the right hand side we consider the solution of the Poisson equation

    −L_0 Λ = ⟨F(x; θ), f_1(x, y) − F(x; θ_0)⟩_{a(x)},   ∫_Y ρ(y; x) Λ(y; x) dy = 0.

This has a unique solution Λ(y; x) ∈ L²_ρ(Y; x) by construction of F. Applying Itô's formula to Λ gives

    dΛ/dt = (1/ε) L_0 Λ + (1/√ε) L_1 Λ + L_2 Λ + (1/√ε) ∇_y Λ β dV/dt + ∇_x Λ α_0 dU/dt + ∇_x Λ α_1 dV/dt,

which shows that

    (1/T) ∫_0^T ⟨F(x; θ), f_1(x, y)⟩_{a(x)} dt = (1/T) ∫_0^T ⟨F(x; θ), F(x; θ_0)⟩_{a(x)} dt
      + (ε/T) ∫_0^T (L_2 Λ)(x(t), y(t)) dt − (ε/T) ( Λ(x(T), y(T)) − Λ(x(0), y(0)) )
      + (1/T) ∫_0^T √ε ( (∇_y Λ β)(x(t), y(t)) dV(t) + (L_1 Λ)(x(t), y(t)) dt )
      + (1/T) ∫_0^T ε ( (∇_x Λ α_0)(x(t), y(t)) dU(t) + (∇_x Λ α_1)(x(t), y(t)) dV(t) ).

The stochastic integrals tend to zero in L²(Ω) as T → ∞. By assumption, Λ is bounded. Furthermore, in L²(Ω),

    (1/T) ∫_0^T (L_i Λ)(x(t), y(t)) dt → ∫_{X×Y} (L_i Λ)(x, y) ρ_ε(x, y) dx dy,   i = 1, 2.
Hence we deduce that
$$\lim_{\epsilon\to0}\lim_{T\to\infty}\frac{1}{T}\int_0^T \langle F(x;\theta), f_1(x,y)\rangle_{a(x)}\,dt = \lim_{\epsilon\to0}\lim_{T\to\infty}\frac{1}{T}\int_0^T \langle F(x;\theta), F(x;\theta_0)\rangle_{a(x)}\,dt = \lim_{\epsilon\to0}\int_{\mathcal{X}\times\mathcal{Y}} \langle F(x;\theta), F(x;\theta_0)\rangle_{a(x)}\,\rho^\epsilon(x,y)\,dx\,dy = \int_{\mathcal{X}} \langle F(x;\theta), F(x;\theta_0)\rangle_{a(x)}\,\pi(x)\,dx.$$
The result follows.

In the particular case of linear parameter dependence, when the MLE is given by (3.6), we have the following result, showing that the MLE recovers the correct answer from high-frequency data compatible with the statistical model in an appropriate asymptotic limit.

Theorem 3.11. Let Assumptions 2.1, 3.1, 3.7 and 3.8 hold. Assume that $F(X;\theta)$ is given by (3.5). Let $\{x(t)\}_{t\in[0,T]}$ be a sample path of (2.3). Then $\hat\Theta$ given by (3.6) satisfies
$$\lim_{\epsilon\to0}\lim_{T\to\infty}\hat\Theta(x;T) = I, \quad\text{in probability}.$$

Proof. Using equation (2.3) we find that
$$B(x;T) = A(x;T) + J_3 + J_4,$$
where
$$J_3 = \frac{1}{T}\int_0^T K(x)^{-1}\big(f_1(x,y) - F(x)\big)\otimes K(x)^{-1}F(x)\,dt, \qquad J_4 = \frac{1}{T}\int_0^T K(x)^{-1}\big(\alpha_0(x,y)\,dU + \alpha_1(x,y)\,dV\big)\otimes K(x)^{-1}F(x).$$
Here, for fixed $\epsilon>0$, $\mathbb{E}|J_4|^2 = O(1/T)$ by Lemma A.2, and $\lim_{\epsilon\to0}\lim_{T\to\infty}\mathbb{E}|J_3|^2 = 0$ by use of the Poisson equation technique. By ergodicity and Lemma A.3 we have that $A(x;T) = A^{\infty,\epsilon} + J_5$, where
$$A^{\infty,\epsilon} := \int_{\mathcal{X}\times\mathcal{Y}} \Big(K(x)^{-1}F(x)\otimes K(x)^{-1}F(x)\Big)\,\rho^\epsilon(x,y)\,dx\,dy,$$
with $\lim_{\epsilon\to0}A^{\infty,\epsilon} = A^\infty$ and, for fixed $\epsilon>0$, $\mathbb{E}|J_5|^2 = O(1/T)$. Thus, by Assumption 3.1, $A(x;T)$ is invertible for $T$ sufficiently large and $\epsilon$ sufficiently small, so that
$$\hat\Theta(x;T) = I + (A^{\infty,\epsilon} + J_5)^{-1}(J_3 + J_4).$$
The result follows.

We would like to show that this also holds in the general case, i.e. that if $\hat\theta(x;T) := \arg\max_\theta L(\theta;x)$, then
$$\lim_{\epsilon\to0}\lim_{T\to\infty}\hat\theta(x;T) = \theta_0, \quad\text{in probability}.$$
In fact, the following theorem is true for every $\epsilon>0$.

Theorem 3.12. Let Assumptions 2.1, 3.1, 3.5, 3.7 and 3.8 hold and assume that $\theta\in\Theta$, a compact set. Let $\{x(t)\}_{t\in[0,T]}$ be a sample path of (2.3) at $\theta=\theta_0$. Assume furthermore that the marginal on $\mathcal{X}$ of the invariant measure of (2.3),
$$\pi^\epsilon(x)\,dx = \Big(\int_{\mathcal{Y}} \rho^\epsilon(x,y)\,dy\Big)\,dx,$$
is absolutely continuous with respect to the invariant measure $\pi(x)\,dx$ of the limiting SDE. Then, for every $\epsilon>0$,
$$\lim_{T\to\infty}\hat\theta(x;T) = \theta_0, \quad\text{in probability}.$$

Proof. Let $g_T(\omega,\theta) := \frac{1}{T}L(\theta;x)$ and
$$g_\infty(\theta) := \int_{\mathcal{X}\times\mathcal{Y}} \Big(\langle F(x;\theta), F(x;\theta_0)\rangle_{a(x)} - \tfrac{1}{2}|F(x;\theta)|^2_{a(x)}\Big)\,\rho^\epsilon(x,y)\,dx\,dy.$$
It is straightforward to see, by completing the square, that $\arg\max_\theta g_\infty(\theta) = \theta_0$. We apply Lemma A.4, replacing $\epsilon$ by $\frac{1}{T}$, $g_\epsilon$ by $g_T$ and $g_0$ by $g_\infty$. The result follows, provided that conditions (A.2), (A.3) and (A.4) are satisfied. Condition (A.2) follows from Theorem 3.10. The identifiability condition (A.4) follows from Assumption 3.5 and the absolute continuity of $\pi^\epsilon(x)\,dx$ with respect to $\pi(x)\,dx$. Finally, we can verify that (A.3) holds, following the proof in [22] and using the fact that the functions $f_1$, $\alpha_0$ and $\alpha_1$ are uniformly bounded.

3.2 Homogenization

We now ask what happens when the MLE for the homogenized equation (3.1) is confronted with data from the multiscale equation (2.4), which homogenizes to give (3.1). The situation differs substantially from the case where data is taken from the multiscale equation (2.3), which averages to give (3.1): the two likelihoods are not identical in the large-$T$ limit. In order to state the main result of this subsection we need to introduce the Poisson equation
$$-\mathcal{L}_0\Gamma = \langle F(x;\theta), f_0(x,y)\rangle_{a(x)}, \qquad \int_{\mathcal{Y}} \rho(y;\xi)\Gamma(y;x)\,dy = 0, \qquad (3.10)$$
which has a unique solution $\Gamma(y;x)\in L^2_\rho(\mathcal{Y};x)$. Note that $\Gamma = \langle F(x;\theta), \Phi(x,y)\rangle_{a(x)}$, where $\Phi$ solves (2.8).
Define
$$E^\infty(\theta) = \int_{\mathcal{X}\times\mathcal{Y}} \Big(\mathcal{L}_1\Gamma(x,y) - \langle F(x;\theta), (\mathcal{L}_1\Phi)(x,y)\rangle_{a(x)}\Big)\,\pi(x)\rho(y;x)\,dx\,dy. \qquad (3.11)$$
The following theorem shows that the correct limit of the log likelihood is not obtained unless $E^\infty = 0$, something which will not be true in general. However, in the case where $f_0, g_1 \equiv 0$ we do obtain $E^\infty = 0$, and in this case we recover the averaging situation covered in Theorems 2.3 and 3.10 (with $\epsilon$ replaced by $\epsilon^2$).

Theorem 3.13. Let Assumptions 2.1, 2.4, 3.1, 3.7 and 3.8 hold. Let $\{x(t)\}_{t\in[0,T]}$ be a sample path of (2.4) and $\{X(t)\}_{t\in[0,T]}$ a sample path of (3.1) at $\theta=\theta_0$. Then the following limits, to be interpreted in $L^2(\Omega)$ and $L^2(\Omega_0)$ respectively, and almost surely with respect to $x(0), y(0), X(0)$, are identical:
$$\lim_{\epsilon\to0}\lim_{T\to\infty}\frac{1}{T}L(\theta;x) = \lim_{T\to\infty}\frac{1}{T}L(\theta;X) + E^\infty(\theta).$$

Proof. As in the averaging case of Theorem 3.10 we have
$$\lim_{\epsilon\to0}\lim_{T\to\infty}\frac{1}{T}\int_0^T |F(x;\theta)|^2_{a(x)}\,dt = \int_{\mathcal{X}} |F(x;\theta)|^2_{a(x)}\,\pi(x)\,dx.$$
Now
$$\frac{1}{T}\int_0^T \langle F(x;\theta), dx\rangle_{a(x)} = I_1 + I_2 + I_3,$$
where
$$I_1 = \frac{1}{\epsilon T}\int_0^T \langle F(x;\theta), f_0(x,y)\rangle_{a(x)}\,dt, \qquad I_2 = \frac{1}{T}\int_0^T \langle F(x;\theta), f_1(x,y)\rangle_{a(x)}\,dt, \qquad I_3 = \frac{1}{T}\int_0^T \langle F(x;\theta), \alpha_0(x,y)\,dU + \alpha_1(x,y)\,dV\rangle_{a(x)}.$$
Now $I_3$ is $O(1/\sqrt{T})$ in $L^2(\Omega)$ by Lemma A.2. Techniques similar to those used in the proof of Theorem 3.10 show that
$$\lim_{\epsilon\to0}\lim_{T\to\infty} I_2 = \int_{\mathcal{X}} \langle F(x;\theta), F_1(x;\theta_0)\rangle_{a(x)}\,\pi(x)\,dx.$$
Now consider $I_1$. Applying Itô's formula to the solution $\Gamma$ of the Poisson equation (3.10), we obtain
$$\frac{d\Gamma}{dt} = \frac{1}{\epsilon^2}\mathcal{L}_0\Gamma + \frac{1}{\epsilon}\mathcal{L}_1\Gamma + \mathcal{L}_2\Gamma + \frac{1}{\epsilon}\nabla_y\Gamma\,\beta\,\frac{dV}{dt} + \nabla_x\Gamma\,\alpha_0\,\frac{dU}{dt} + \nabla_x\Gamma\,\alpha_1\,\frac{dV}{dt}.$$
From this we deduce that
$$\frac{1}{\epsilon T}\int_0^T \langle F(x;\theta), f_0(x,y)\rangle_{a(x)}\,dt = \frac{1}{T}\int_0^T (\mathcal{L}_1\Gamma)\,dt + I_4, \qquad\text{where } \lim_{\epsilon\to0}\lim_{T\to\infty} I_4 = 0.$$
Thus
$$I_1 = \frac{1}{\epsilon T}\int_0^T \langle F(x;\theta), f_0(x,y)\rangle_{a(x)}\,dt = I_4 + I_5 + I_6,$$
where, in $L^2(\Omega)$,
$$I_5 = \frac{1}{T}\int_0^T \langle F(x;\theta), (\mathcal{L}_1\Phi)(x,y)\rangle_{a(x)}\,dt, \qquad I_6 = \frac{1}{T}\int_0^T \Big(\mathcal{L}_1\Gamma(x,y) - \langle F(x;\theta), (\mathcal{L}_1\Phi)(x,y)\rangle_{a(x)}\Big)\,dt.$$
By the methods used in the proof of Theorem 3.10 we deduce that
$$\lim_{\epsilon\to0}\lim_{T\to\infty} I_5 = \int_{\mathcal{X}} \langle F(x;\theta), F_0(x;\theta_0)\rangle_{a(x)}\,\pi(x)\,dx.$$
Putting together all the estimates we deduce that, in $L^2$,
$$\lim_{\epsilon\to0}\lim_{T\to\infty}\frac{1}{T}L(\theta;x) = \lim_{T\to\infty}\frac{1}{T}L(\theta;X) + \lim_{\epsilon\to0}\lim_{T\to\infty} I_6 = \lim_{T\to\infty}\frac{1}{T}L(\theta;X) + E^\infty(\theta).$$

4 Subsampling

In the previous section we studied the behavior of estimators when confronted with multiscale data. The data is such that, in an appropriate asymptotic limit $\epsilon\to0$, it behaves weakly as if it comes from a single-scale equation in the form of the statistical model. By considering the behavior of continuous-time estimators in the limit of large time, followed by taking $\epsilon\to0$, we studied the behavior of estimators which do not subsample the data. We showed that in the averaging set-up this did not cause a problem: the likelihood behaves as if confronted with data from the statistical model itself. In the homogenization set-up, by contrast, the likelihood function was asymptotically biased for large time. In this section we show that subsampling the data can overcome this issue, provided the subsampling rate is chosen appropriately.

In the following we use $\mathbb{E}^\pi$ to denote expectation on $\mathcal{X}$ with respect to the measure with density $\pi$, and $\mathbb{E}^{\rho^\epsilon}$ to denote expectation on $\mathcal{X}\times\mathcal{Y}$ with respect to the measure with density $\rho^\epsilon$. Recall that, by Assumption 3.7, the latter measure has weak limit with density $\pi(x)\rho(y;x)$. Let $\Omega' = \Omega\times\mathcal{X}\times\mathcal{Y}$ and consider the probability measure induced on paths $x, y$ solving (2.4) by choosing initial conditions distributed according to the measure $\pi(x)\rho(y;x)\,dx\,dy$.
With expectation $\mathbb{E}$ under this measure we will also use the notation $\|\cdot\|_p := (\mathbb{E}|\cdot|^p)^{1/p}$. We define the discrete log likelihood function found from applying the likelihood principle to the Euler-Maruyama approximation of the statistical model (3.1). Let $z = \{z_n\}_{n=0}^{N}$ denote a time series in $\mathcal{X}$. We obtain the likelihood
$$L_{\delta,N}(\theta;z) = \sum_{n=0}^{N-1} \langle F(z_n;\theta), z_{n+1}-z_n\rangle_{a(z_n)} - \frac{1}{2}\sum_{n=0}^{N-1} |F(z_n;\theta)|^2_{a(z_n)}\,\delta.$$
Let $x_n = x(n\delta)$, noting that $x(t)$ depends on $\epsilon$, and set $x = \{x_n\}_{n=0}^{N}$. The basic theorem in this section proves convergence of the log likelihood function, provided that we subsample (i.e. choose $\delta$) at an appropriate $\epsilon$-dependent rate. We state and prove the theorem, relying on a pair of intuitively reasonable propositions which we then prove at the end of the section.

Theorem 4.1. Let Assumptions 2.1, 2.4, 3.1, 3.7 and 3.8 hold. Let $\{x(t)\}_{t\in[0,T]}$ be a sample path of (2.4) and $X(t)$ a sample path of (3.1) at $\theta=\theta_0$. Let $\delta = \epsilon^\alpha$ with $\alpha\in(0,1)$ and let $N = [\epsilon^{-\gamma}]$ with $\gamma > \alpha$. Then the following limits, to be interpreted in $L^2(\Omega')$ and $L^2(\Omega_0)$ respectively, and almost surely with respect to $X(0)$, are identical:
$$\lim_{\epsilon\to0}\frac{1}{N\delta}L_{\delta,N}(\theta;x) = \lim_{T\to\infty}\frac{1}{T}L(\theta;X). \qquad (4.1)$$
The proof of this theorem is based on the following two technical results, whose proofs are presented in the appendix.

Proposition 4.2. Let $(x(t), y(t))$ be the solution of (2.4) and assume that Assumptions 2.1 and 2.4 hold. Then, for $\epsilon, \delta$ sufficiently small, the increment of the process $x(t)$ can be written in the form
$$x_{n+1} - x_n = F(x_n;\theta_0)\,\delta + M_n + R(\epsilon,\delta), \qquad (4.2)$$
where $M_n$ denotes the martingale term
$$M_n = \int_{n\delta}^{(n+1)\delta} (\nabla_y\Phi\,\beta + \alpha_0)(x(s),y(s))\,dV + \int_{n\delta}^{(n+1)\delta} \alpha_1(x(s),y(s))\,dU,$$
with $\|M_n\|_p \le C\sqrt{\delta}$ and $\|R(\epsilon,\delta)\|_p \le C(\delta^{3/2} + \epsilon\delta^{1/2} + \epsilon)$.

Proposition 4.3.
Let $g\in C^1(\mathcal{X})$ and let Assumptions 3.7 hold. Assume that $\epsilon$ and $N$ are related as in Theorem 4.1. Then
$$\lim_{\epsilon\to0}\frac{1}{N}\sum_{n=0}^{N-1} g(x_n) = \mathbb{E}^\pi g, \qquad (4.3)$$
where the convergence is in $L^2$ with respect to the measure on initial conditions with density $\pi(x)\rho(y;x)$.

Proof of Theorem 4.1. We define
$$I_1(x,\theta) = \sum_{n=0}^{N-1} \langle F(x_n;\theta), x_{n+1}-x_n\rangle_{a(x_n)} \qquad\text{and}\qquad I_2(x) = \frac{1}{2}\sum_{n=0}^{N-1} |F(x_n;\theta)|^2_{a(x_n)}\,\delta.$$
By Proposition 4.3 we have that
$$\frac{1}{N\delta}I_2(x) \to \frac{1}{2}\int_{\mathcal{X}} |F(x;\theta)|^2_{a(x)}\,\pi(x)\,dx.$$
We use Proposition 4.2 to deduce that
$$\frac{1}{N\delta}I_1(x;\theta) = \frac{1}{N\delta}\sum_{n=0}^{N-1}\langle F(x_n;\theta), F(x_n;\theta_0)\,\delta + M_n + R(\epsilon,\delta)\rangle_{a(x_n)} = \frac{1}{N}\sum_{n=0}^{N-1}\langle F(x_n;\theta), F(x_n;\theta_0)\rangle_{a(x_n)} + \frac{1}{N\delta}\sum_{n=0}^{N-1}\langle F(x_n;\theta), M_n\rangle_{a(x_n)} + \frac{1}{N\delta}\sum_{n=0}^{N-1}\langle F(x_n;\theta), R(\epsilon,\delta)\rangle_{a(x_n)} =: J_1 + J_2 + J_3.$$
Again using Proposition 4.3 we have that
$$J_1 \to \int_{\mathcal{X}} \langle F(x;\theta), F(x;\theta_0)\rangle_{a(x)}\,\pi(x)\,dx.$$
Furthermore, since the $M_n$ are martingale increments (so that the cross terms in the sum have zero expectation) with quadratic variation of order $\delta$, it follows that
$$\|J_2\|_2^2 \le \frac{1}{N^2\delta^2}\sum_{n=0}^{N-1}\mathbb{E}\big|\langle F(x_n;\theta), M_n\rangle_{a(x_n)}\big|^2 \le \frac{C}{N\delta}.$$
Consequently, since $\gamma > \alpha$, $\|J_2\|_2 = o(1)$ as $\epsilon\to0$. Similarly, using martingale moment inequalities [10, Eq. (3.25), p. 163], we obtain $\|J_2\|_p = o(1)$. Finally, again using Proposition 4.2, we have, for $q^{-1}+p^{-1}=1$,
$$\|J_3\|_p \le \frac{1}{N\delta}\sum_{n=0}^{N-1} \|F(x_n)\|_q\,\|R(\epsilon,\delta)\|_p \le C\,\frac{1}{N\delta}\,N\big(\delta^{3/2} + \epsilon + \epsilon\delta^{1/2}\big) = o(1),$$
as $\epsilon\to0$, since we have assumed that $\alpha\in(0,1)$. We thus have
$$\lim_{\epsilon\to0}\frac{1}{N\delta}L_{\delta,N}(\theta;x) = \int_{\mathcal{X}}\langle F(x;\theta), F(x;\theta_0)\rangle_{a(x)}\,\pi(x)\,dx - \frac{1}{2}\int_{\mathcal{X}} |F(x;\theta)|^2_{a(x)}\,\pi(x)\,dx.$$
By completing the square we obtain (4.1).
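Theorem 4.1 can be illustrated with a small simulation. The sketch below is our own illustration, not taken from the paper: it generates data from the rescaled Langevin system (5.2) of Section 5.1 with the quadratic potential $V(q;\theta) = \theta q^2/2$, $\beta = 1$ and $\theta_0 = 1$ (all numerical values here, including $\epsilon$, the integration step and $\alpha = 1/2$, are illustrative choices), whose homogenized limit is the OU equation $dX = -\theta_0 X\,dt + \sqrt{2}\,dW$. The drift MLE obtained by maximizing the discrete likelihood $L_{\delta,N}$ is evaluated both at the finest resolution and after subsampling at $\delta = \epsilon^\alpha$.

```python
import math
import random

# Illustration (ours, not from the paper) of Theorem 4.1.  Data comes from
# the rescaled Langevin system (5.2) with V(q; theta) = theta*q^2/2, beta = 1
# and theta_0 = 1; the homogenized limit is dX = -theta_0*X dt + sqrt(2) dW.
random.seed(0)
eps, dt, T, theta0 = 0.2, 2e-3, 400.0, 1.0
n_steps = int(T / dt)
sigma = math.sqrt(2.0 * dt) / eps          # noise increment std for p
q, p = [0.0], 0.0
for _ in range(n_steps):
    qn = q[-1]
    q.append(qn + (p / eps) * dt)          # dq/dt = p/eps
    p += (-theta0 * qn / eps - p / eps**2) * dt + random.gauss(0.0, sigma)

def drift_mle(z, delta):
    """MLE of theta for the model dX = -theta*X dt + sqrt(2) dW,
    i.e. the maximizer of the discrete log likelihood L_{delta,N}."""
    num = sum(a * (b - a) for a, b in zip(z, z[1:]))
    den = sum(a * a for a in z[:-1])
    return -num / (den * delta)

stride = round(eps**0.5 / dt)              # subsample at delta = eps^alpha, alpha = 1/2
fine = drift_mle(q, dt)                    # finest resolution: biased towards 0
sub = drift_mle(q[::stride], stride * dt)  # subsampled: close to theta_0, up to O(delta)
print(fine, sub)
```

At the finest sampling rate the estimate collapses towards zero, because the coordinate $q$ has vanishing quadratic variation at small scales, while the subsampled estimate recovers $\theta_0$ up to the $O(\delta)$ discretization bias of the Euler-Maruyama likelihood.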
As before, we would like to use this theorem in order to prove the consistency of our estimator. The theory developed in [22] no longer applies, because it is based on the assumption that the function we are maximizing (i.e. the log likelihood function) is a continuous semimartingale, which is not true for the discrete semimartingale $L_{\delta,N}(\theta;x)$. The most difficult part in proving consistency is to prove that the martingale converges uniformly to zero (Assumption (A.3) in Lemma A.4). To avoid this difficulty, we make some extra assumptions that allow us to remove the martingale part.

Assumptions 4.4. 1. There exists a function $V: \mathcal{X}\times\Theta\to\mathbb{R}$ such that, for each $\theta\in\Theta$, $V(\cdot,\theta)\in C^3(\mathcal{X})$ and
$$\nabla V(z;\theta) = \big(K(z)K(z)^T\big)^{-1}F(z;\theta), \qquad \forall z\in\mathcal{X},\ \theta\in\Theta. \qquad (4.4)$$
2. Define $G: \mathcal{X}\times\Theta\to\mathbb{R}$ by
$$G(z;\theta) := D^2V(z;\theta) : \big(K(z)K(z)^T\big),$$
where $D^2V$ denotes the Hessian matrix of $V$. Then there exist a $\beta>0$ and a function $\hat G: \mathcal{X}\to\mathbb{R}$, square integrable with respect to the invariant measure, such that
$$|G(z;\theta) - G(z;\theta')| \le |\theta-\theta'|^\beta\,\hat G(z).$$

Suppose that the above assumption holds and that $\{X(t)\}_{t\in[0,T]}$ is a sample path of (3.1). Then, applying Itô's formula to the function $V$, we get that for every $\theta\in\Theta$:
$$dV(X(t);\theta) = \langle\nabla V(X(t);\theta), dX(t)\rangle + \frac{1}{2}G(X(t);\theta)\,dt.$$
But from (4.4) we have that
$$\langle\nabla V(X(t);\theta), dX(t)\rangle = \Big\langle\big(K(X(t))K(X(t))^T\big)^{-1}F(X(t);\theta), dX(t)\Big\rangle = \langle F(X(t);\theta), dX(t)\rangle_{a(X(t))},$$
and thus
$$\langle F(X(t);\theta), dX(t)\rangle_{a(X(t))} = dV(X(t);\theta) - \frac{1}{2}G(X(t);\theta)\,dt.$$
Using this identity, we can write the log-likelihood function (3.3) in the form
$$L(\theta;X) = \big(V(X(T);\theta) - V(X(0);\theta)\big) - \frac{1}{2}\int_0^T\Big(|F(X(t);\theta)|^2_{a(X(t))} + G(X(t);\theta)\Big)\,dt.$$
Using this version of the log-likelihood function, we define
$$\tilde L_{\delta,N}(\theta;z) = -\frac{1}{2}\sum_{n=0}^{N-1}\Big(|F(z_n;\theta)|^2_{a(z_n)} + G(z_n;\theta)\Big)\,\delta. \qquad (4.5)$$
Now we can prove asymptotic consistency of the MLE, provided that we subsample at the appropriate rate.

Theorem 4.5. Let Assumptions 2.1, 2.4, 3.1, 3.5, 3.7, 3.8 and 4.4 hold and assume that $\theta\in\Theta$, a compact set. Let $\{x(t)\}_{t\in[0,T]}$ be a sample path of (2.4) at $\theta=\theta_0$. Define
$$\hat\theta(x;\epsilon) := \arg\max_\theta \tilde L_{\delta,N}(\theta;x),$$
with $N$ and $\delta$ as in Theorem 4.1 above and $\tilde L_{\delta,N}(\theta;x)$ defined in (4.5). Then
$$\lim_{\epsilon\to0}\hat\theta(x;\epsilon) = \theta_0, \quad\text{in probability}.$$

Proof. We apply Lemma A.4 with $g_\epsilon(x,\theta) = \frac{1}{N\delta}\tilde L_{\delta,N}(\theta;x)$ and $g_0(\theta)$ its limit. Note that
$$\lim_{\epsilon\to0}\frac{1}{N\delta}\tilde L_{\delta,N}(\theta;x) = \lim_{T\to\infty}\frac{1}{T}L(\theta;X),$$
by Proposition 4.3 and the fact that
$$\lim_{T\to\infty}\frac{1}{T}\big(V(X(T);\theta) - V(X(0);\theta)\big) = 0,$$
which follows from the ergodicity of $X$. As in Theorem 4.1, the limits are interpreted in $L^2(\Omega')$ and $L^2(\Omega_0)$ respectively, and almost surely with respect to $X(0)$. As we have already seen, the maximizer of $g_0(\theta)$ is $\theta_0$, so Assumption (A.2) is satisfied. Also, Assumption 3.5 is equivalent to (A.4). To prove consistency, we need to prove (A.3), which can be viewed as uniform ergodicity. The proof is again similar to that in [22]. First, we note that by Assumptions 3.5 and 4.4 both $g_\epsilon(\cdot,\theta)$ and $g_0(\theta)$ are continuous with respect to $\theta$, so it is sufficient to prove (A.3) on a countable dense subset $\Theta^\star$ of $\Theta$. Then uniform ergodicity follows from [5, Thm. 6.1.5], provided that
$$N_{[\,]}\big(\epsilon, \mathcal{F}, \|\cdot\|_{L^1(\pi)}\big) < \infty,$$
i.e. the number of balls of radius $\epsilon$ with respect to $\|\cdot\|_{L^1(\pi)}$ needed to cover
$$\mathcal{F} := \big\{|F(z;\theta)|^2_{a(z)} + G(z;\theta);\ \theta\in\Theta^\star\big\}$$
is finite. As demonstrated in [22], this follows from the Hölder continuity of $|F(z;\theta)|^2_{a(z)}$ and $G(z;\theta)$ in $\theta$.
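A concrete instance of Assumptions 4.4 is the scalar model $dX = -\theta X\,dt + \sqrt{2}\,dW$, so that $F(z;\theta) = -\theta z$ and $KK^T = 2$. Then $V(z;\theta) = -\theta z^2/4$ satisfies (4.4), $G(z;\theta) = D^2V : (KK^T) = -\theta$, and maximizing (4.5) over $\theta$ gives the closed form $\hat\theta = N/\sum_n z_n^2$, with no martingale term involved. The sketch below is our own illustration; for simplicity the data is generated from the statistical model itself rather than from a multiscale system, so it only checks consistency of this closed-form estimator (the numerical values are arbitrary choices).

```python
import math
import random

# A concrete instance (ours) of Assumptions 4.4 for the scalar model
#   dX = -theta*X dt + sqrt(2) dW:  F(z; theta) = -theta*z,  K*K^T = 2.
# V(z; theta) = -theta*z^2/4 satisfies (4.4) and G(z; theta) = -theta, so
# maximizing (4.5) over theta gives theta_hat = N / sum(z_n^2).
random.seed(1)
theta0, dt, n_steps, stride = 1.0, 2e-3, 500_000, 50
x, z = 1.0, []
for n in range(n_steps):
    if n % stride == 0:
        z.append(x)                         # record the series at delta = stride*dt
    x += -theta0 * x * dt + random.gauss(0.0, math.sqrt(2.0 * dt))

theta_hat = len(z) / sum(v * v for v in z)  # argmax of the likelihood (4.5)
print(theta_hat)                            # should be close to theta0 = 1
```

Consistency here rests on the stationary identity $\mathbb{E}X^2 = 1/\theta_0$ for this model, so that $1/\langle z^2\rangle \to \theta_0$ as the observation window grows.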
5 Examples

Numerical experiments illustrating the phenomena studied in this paper can be found in [19]. The experiments therein are concerned with a particular case of the general homogenization framework considered in this paper; they illustrate the failure of the MLE when the data is sampled too frequently, and the role of subsampling in ameliorating this problem. In this section we construct two examples which identify the term $E^\infty$ responsible for the failure of the MLE.

5.1 Langevin Equation in the High Friction Limit

We consider the Langevin equation in the high friction limit:³
$$\epsilon^2\frac{d^2q}{dt^2} = -\nabla_q V(q;\theta) - \frac{dq}{dt} + \sqrt{2\beta^{-1}}\,\frac{dW}{dt}, \qquad (5.1)$$
where $V(q;\theta)$ is a smooth confining potential depending on a parameter $\theta\in\Theta\subset\mathbb{R}^\ell$,⁴ $\beta$ stands for the inverse temperature, and $W(t)$ is a standard Brownian motion on $\mathbb{R}^d$. We write this equation as a first-order system:
$$\frac{dq}{dt} = \frac{1}{\epsilon}p, \qquad \frac{dp}{dt} = -\frac{1}{\epsilon}\nabla_q V(q;\theta) - \frac{1}{\epsilon^2}p + \sqrt{\frac{2\beta^{-1}}{\epsilon^2}}\,\frac{dW}{dt}. \qquad (5.2)$$
In the notation of the general homogenization set-up we have $(x,y) = (q,p)$ and
$$f_0 = p,\quad f_1 = 0,\quad \alpha_0 = 0,\quad \alpha_1 = 0 \qquad\text{and}\qquad g_0 = -p,\quad g_1 = -\nabla_q V(q),\quad \beta\mapsto\sqrt{2\beta^{-1}}\,I.$$
The fast process is simply an Ornstein-Uhlenbeck process with generator
$$\mathcal{L}_0 = -p\cdot\nabla_p + \beta^{-1}\Delta_p.$$
The unique square integrable (with respect to the invariant measure of the OU process) solution of the Poisson equation (2.8) is $\Phi = p$. Therefore,
$$F_0 = -\nabla_q V(q;\theta), \qquad F_1 = 0, \qquad A_1 = \sqrt{2\beta^{-1}}\,I.$$

³ We have rescaled the equation in such a way that we actually consider the small mass, rather than the high friction, limit. In the case where the mass and the friction are scalar quantities the two scaling limits are equivalent.
⁴ A standard example is that of a quadratic potential $V(q;\theta) = \frac{1}{2}q\,\theta\,q^T$, where the parameters to be estimated from time series are the elements of the stiffness matrix $\theta$.
Hence the homogenized equation is⁵
$$\frac{dX}{dt} = -\nabla V(X;\theta) + \sqrt{2\beta^{-1}}\,\frac{dW}{dt}. \qquad (5.3)$$
Consider now the parameter estimation problem for the "full dynamics" (5.1) and the "coarse-grained" model (5.3): we are given data from (5.1) and we want to fit it to equation (5.3). Theorem 3.13 implies that for this problem the maximum likelihood estimator is asymptotically biased.⁶ In fact, in this case we can compute the term $E^\infty$, responsible for the bias and given in equation (3.11). We have the following result.

Proposition 5.1. Assume that the potential $V(q;\theta)\in C^\infty(\mathbb{R}^d)$ is such that $e^{-\beta V(q;\theta)}\in L^1(\mathbb{R}^d)$ for every $\beta>0$ and all $\theta\in\Theta$. Then the error term $E^\infty$, eqn. (3.11), for the SDE (5.1) is given by the formula
$$E^\infty(\theta) = -\frac{Z_V^{-1}\beta}{2}\int_{\mathbb{R}^d} |\nabla_q V(q;\theta)|^2\, e^{-\beta V(q;\theta)}\,dq, \qquad (5.4)$$
where $Z_V = \int_{\mathbb{R}^d} e^{-\beta V(q;\theta)}\,dq$. In particular, $E^\infty < 0$.

Proof. We have that $\mathcal{L}_1 = p\cdot\nabla_q - \nabla_q V\cdot\nabla_p$. The invariant measure of the process is $\epsilon$-independent and we write it as
$$\rho(q,p;\theta)\,dq\,dp = Z^{-1}e^{-\beta H(p,q;\theta)}\,dq\,dp.$$
Furthermore, since the homogenized diffusion matrix is $\sqrt{2\beta^{-1}}\,I$,
$$\langle\cdot,\cdot\rangle_{a(z)} = \frac{\beta}{2}\langle\cdot,\cdot\rangle,$$
where $\langle\cdot,\cdot\rangle$ stands for the standard Euclidean inner product. We readily check that
$$\frac{2}{\beta}\mathcal{L}_1\Gamma = \mathcal{L}_1\langle -\nabla_q V, p\rangle = -p\otimes p : D^2_q V(q;\theta) + |\nabla_q V(q;\theta)|^2$$
and
$$\frac{2}{\beta}\langle F, \mathcal{L}_1\Phi\rangle_a = \langle -\nabla_q V, \mathcal{L}_1 p\rangle = |\nabla_q V(q;\theta)|^2.$$
Thus,
$$E^\infty(\theta) = -\frac{\beta}{2}\int_{\mathbb{R}^{2d}} p\otimes p : D^2_q V(q;\theta)\, Z^{-1}e^{-\beta H(p,q;\theta)}\,dq\,dp = -\frac{1}{2}\int_{\mathbb{R}^d} \Delta_q V(q;\theta)\, Z_V^{-1}e^{-\beta V(q;\theta)}\,dq = -\frac{\beta}{2}\int_{\mathbb{R}^d} |\nabla_q V(q;\theta)|^2\, Z_V^{-1}e^{-\beta V(q;\theta)}\,dq,$$
which is precisely (5.4).

⁵ In this case we can actually prove strong convergence of $q(t)$ to $X(t)$ [12, 18].
⁶ Subsampling, at the rate given in Theorem 4.1, is necessary for the correct estimation of the parameters in the drift of the homogenized equation (5.3).
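Formula (5.4) can be checked numerically. For the quadratic potential of footnote 4 in one dimension, $V(q;\theta) = \theta q^2/2$, the Gaussian integrals are elementary and (5.4) reduces to $E^\infty = -\theta/2$, independent of $\beta$. The quadrature sketch below is our own check (the grid parameters are arbitrary choices); it evaluates both (5.4) and the intermediate expression $-\frac{1}{2}Z_V^{-1}\int \Delta_q V\, e^{-\beta V}\,dq$ from the proof.

```python
import math

# Quadrature check (ours) of formula (5.4) in one dimension for the
# quadratic potential V(q; theta) = theta*q^2/2.  Closed form: -theta/2.
beta, theta = 2.0, 1.5
m, L = 200_000, 20.0
dq = 2.0 * L / m
qs = [-L + (i + 0.5) * dq for i in range(m)]          # midpoint grid
w = [math.exp(-beta * theta * q * q / 2.0) for q in qs]  # e^{-beta V(q)}
Z_V = sum(w) * dq
# final expression (5.4): -(beta/2) Z_V^{-1} \int |V'(q)|^2 e^{-beta V} dq
E_grad = -(beta / 2.0) * sum((theta * q) ** 2 * wi for q, wi in zip(qs, w)) * dq / Z_V
# intermediate expression: -(1/2) Z_V^{-1} \int Lap(V) e^{-beta V} dq, Lap(V) = theta
E_lap = -0.5 * theta * sum(w) * dq / Z_V
print(E_grad, E_lap)   # both approximately -theta/2 = -0.75
```

The agreement of the two expressions reflects the integration by parts $\int \Delta_q V\, e^{-\beta V}\,dq = \beta\int |\nabla_q V|^2 e^{-\beta V}\,dq$ used in the proof.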
5.2 Motion in a Multiscale Potential

Consider the equation [19]
$$\frac{dx}{dt} = -\nabla V^\epsilon(x) + \sqrt{2\beta^{-1}}\,\frac{dW}{dt}, \qquad (5.5)$$
where
$$V^\epsilon(x) = V(x) + p(x/\epsilon),$$
and the fluctuating part of the potential $p(\cdot)$ is taken to be a smooth $1$-periodic function. Setting $y = x/\epsilon$ we obtain
$$\frac{dx}{dt} = -\Big(\nabla V(x) + \frac{1}{\epsilon}\nabla p(y)\Big) + \sqrt{2\beta^{-1}}\,\frac{dW}{dt}, \qquad (5.6a)$$
$$\frac{dy}{dt} = -\frac{1}{\epsilon}\Big(\nabla V(x) + \frac{1}{\epsilon}\nabla p(y)\Big) + \frac{1}{\epsilon}\sqrt{2\beta^{-1}}\,\frac{dW}{dt}. \qquad (5.6b)$$
In the notation of the general homogenization set-up we have
$$f_0 = g_0 = -\nabla_y p(y), \qquad f_1 = g_1 = -\nabla V(x), \qquad \alpha_0 = 0, \qquad \alpha_1 = \beta = \sqrt{2\beta^{-1}}.$$
The fast process has generator
$$\mathcal{L}_0 = -\nabla_y p(y)\cdot\nabla_y + \beta^{-1}\Delta_y.$$
The invariant density is $\rho(y) = Z_p^{-1}\exp(-\beta p(y))$ with $Z_p = \int_{\mathbb{T}^d}\exp(-\beta p(y))\,dy$. The Poisson equation for $\Phi$ is
$$\mathcal{L}_0\Phi(y) = \nabla_y p(y).$$
Notice that $\Phi$ is a function of $y$ only. The homogenized equation is
$$\frac{dX}{dt} = -K\nabla V(X) + \sqrt{2\beta^{-1}K}\,\frac{dW}{dt}, \qquad (5.7)$$
where
$$K = \int_{\mathbb{T}^d} (I+\nabla_y\Phi(y))(I+\nabla_y\Phi(y))^T\,\rho(y)\,dy.$$
Suppose now that the potential contains parameters, $V = V(x;\theta)$, $\theta\in\Theta\subset\mathbb{R}^\ell$. We want to estimate the parameter $\theta$, given data from (5.5) and using the homogenized equation
$$\frac{dX}{dt} = -K\nabla V(X;\theta) + \sqrt{2\beta^{-1}K}\,\frac{dW}{dt}.$$
Theorem 3.13 implies that, for this problem, the maximum likelihood estimator is asymptotically biased and that subsampling at the appropriate rate is necessary for the accurate estimation of the parameter $\theta$. As in the example presented in the previous section, we can calculate explicitly the error term $E^\infty$. For simplicity we will consider the problem in one dimension.

Proposition 5.2. Assume that the potential $V(x;\theta)\in C^\infty(\mathbb{R})$ is such that $e^{-\beta V(x;\theta)}\in L^1(\mathbb{R})$ for every $\beta>0$ and all $\theta\in\Theta$. Then the error term $E^\infty$, eqn. (3.11), for the SDE (5.5) is given by the formula
$$E^\infty(\theta) = \big(1 - Z_p^{-1}\hat Z_p^{-1}\big)\,\frac{\beta Z_V^{-1}}{2}\int_{\mathbb{R}} |\partial_x V|^2\, e^{-\beta V(x;\theta)}\,dx, \qquad (5.8)$$
where $Z_V = \int_{\mathbb{R}} e^{-\beta V(x;\theta)}\,dx$, $Z_p = \int_0^1 e^{-\beta p(y)}\,dy$ and $\hat Z_p = \int_0^1 e^{\beta p(y)}\,dy$. In particular, $E^\infty > 0$. Note that the error term has the opposite sign to that of Proposition 5.1: here the data has quadratic variation $2\beta^{-1}\,dt$, larger than the value $2\beta^{-1}K\,dt$ assumed by the statistical model, whereas the Langevin data $q(t)$ of Section 5.1 has vanishing quadratic variation.

Proof. Equations (5.6) in one dimension become
$$\dot x = -\partial_x V(x;\theta) - \frac{1}{\epsilon}\partial_y p(y) + \sqrt{2\beta^{-1}}\,\dot W, \qquad (5.9a)$$
$$\dot y = -\frac{1}{\epsilon}\partial_x V(x;\theta) - \frac{1}{\epsilon^2}\partial_y p(y) + \sqrt{\frac{2\beta^{-1}}{\epsilon^2}}\,\dot W. \qquad (5.9b)$$
The invariant measure of this system (notice that it is independent of $\epsilon$) is
$$\rho(y,x;\theta)\,dx\,dy = Z_V^{-1}(\theta)\,Z_p^{-1}\,e^{-\beta V(x;\theta)-\beta p(y)}\,dx\,dy.$$
The homogenized equation is
$$\dot X = -K\partial_x V(X;\theta) + \sqrt{2\beta^{-1}K}\,\dot W.$$
The cell problem is $\mathcal{L}_0\phi = \partial_y p$, and the homogenized coefficient is
$$K = Z_p^{-1}\int_0^1 (1+\partial_y\phi)^2\, e^{-\beta p(y)}\,dy.$$
We have that $\langle p,q\rangle_{a(x)} = \frac{\beta}{2K}\,pq$. The error in the likelihood is
$$E^\infty(\theta) = \int_{-\infty}^\infty\int_0^1\Big(\mathcal{L}_1\Gamma(x,y) - \langle F, \mathcal{L}_1\phi\rangle_{a(x)}\Big)\,\rho(x,y)\,dy\,dx,$$
where $\Gamma = \langle F,\phi\rangle_{a(x)}$, $F = -K\partial_x V$. We have that
$$\Gamma(x,y) = \frac{\beta}{2K}\big(-K\partial_x V\,\phi\big) = -\frac{\beta}{2}\,\partial_x V\,\phi.$$
Furthermore,
$$\mathcal{L}_1 = -\partial_x V\,\partial_y - \partial_y p\,\partial_x + 2\beta^{-1}\partial_x\partial_y.$$
Consequently,
$$\mathcal{L}_1\Gamma(x,y) = \frac{\beta}{2}\Big(|\partial_x V|^2\,\partial_y\phi + \partial_y p\,\partial_x^2 V\,\phi - 2\beta^{-1}\partial_x^2 V\,\partial_y\phi\Big).$$
In addition,
$$\langle F, \mathcal{L}_1\phi\rangle_{a(x)} = \frac{\beta}{2}\,|\partial_x V|^2\,\partial_y\phi.$$
The error in the likelihood is therefore
$$E^\infty(\theta) = \frac{\beta}{2}\int_{\mathbb{R}}\int_0^1\Big(\partial_y p\,\partial_x^2 V\,\phi - 2\beta^{-1}\partial_x^2 V\,\partial_y\phi\Big)\,Z_V^{-1}Z_p^{-1}e^{-\beta V(x;\theta)-\beta p(y)}\,dx\,dy = \frac{Z_V^{-1}Z_p^{-1}}{2}\int_{\mathbb{R}}\partial_x^2 V\,e^{-\beta V(x;\theta)}\,dx\int_0^1\partial_y\phi\,e^{-\beta p(y)}\,dy - Z_V^{-1}Z_p^{-1}\int_{\mathbb{R}}\partial_x^2 V\,e^{-\beta V(x;\theta)}\,dx\int_0^1\partial_y\phi\,e^{-\beta p(y)}\,dy = -\frac{Z_V^{-1}Z_p^{-1}}{2}\int_{\mathbb{R}}\partial_x^2 V\,e^{-\beta V(x;\theta)}\,dx\int_0^1\partial_y\phi\,e^{-\beta p(y)}\,dy = \frac{\beta Z_V^{-1}}{2}\int_{\mathbb{R}}|\partial_x V|^2\,e^{-\beta V(x;\theta)}\,dx\,\Big(1 - Z_p^{-1}\hat Z_p^{-1}\Big).$$
In the above derivation we used various integrations by parts (in $x$, $\int\partial_x^2 V\,e^{-\beta V}\,dx = \beta\int|\partial_x V|^2 e^{-\beta V}\,dx$; in $y$, $\int_0^1\partial_y p\,\phi\,e^{-\beta p}\,dy = \beta^{-1}\int_0^1\partial_y\phi\,e^{-\beta p}\,dy$), together with the formula for the derivative of the solution of the Poisson equation, $\partial_y\phi = -1 + \hat Z_p^{-1}e^{\beta p(y)}$ [20, p. 213], which gives $\int_0^1\partial_y\phi\,e^{-\beta p}\,dy = \hat Z_p^{-1} - Z_p$.
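The factor $Z_p^{-1}\hat Z_p^{-1}$ in (5.8) equals the one-dimensional homogenized coefficient, $K = (Z_p\hat Z_p)^{-1}$, and lies in $(0,1)$ by Cauchy-Schwarz whenever $p$ is not constant. The midpoint-quadrature sketch below is our own illustration (the fluctuation $p(y) = \cos(2\pi y)$ and the $\beta$ values are arbitrary choices, not taken from the paper); it also exhibits the rapid decay in $\beta$ described in Remark 5.3.

```python
import math

# The factor Z_p^{-1} * Zhat_p^{-1} in (5.8) equals the 1-d homogenized
# coefficient K = (Z_p * Zhat_p)^{-1}, and lies in (0, 1) by Cauchy-Schwarz
# when p is not constant.  Illustrative choice (ours): p(y) = cos(2*pi*y).
def hom_coeff(beta, m=20_000):
    dy = 1.0 / m
    ys = [(i + 0.5) * dy for i in range(m)]         # midpoint rule on [0, 1]
    Zp = sum(math.exp(-beta * math.cos(2.0 * math.pi * y)) for y in ys) * dy
    Zhat = sum(math.exp(beta * math.cos(2.0 * math.pi * y)) for y in ys) * dy
    return 1.0 / (Zp * Zhat)

for b in (0.5, 1.0, 2.0, 4.0):
    print(b, hom_coeff(b))   # decreasing in beta, roughly like exp(-2*beta)
```

For this choice of $p$ both integrals equal the modified Bessel function $I_0(\beta)$, so $Z_p^{-1}\hat Z_p^{-1} = I_0(\beta)^{-2}$, which decays like $e^{-2\beta}$ up to an algebraic prefactor, consistent with Remark 5.3.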
The fact that $E^\infty$ is positive (for $p(y)$ not identically constant) follows from the inequality $Z_p^{-1}\hat Z_p^{-1} < 1$, which in turn follows from the Cauchy-Schwarz inequality.

Remark 5.3. An application of Laplace's method shows that, for $\beta\gg1$, $Z_p^{-1}\hat Z_p^{-1}\sim e^{-2\beta}$.

6 Conclusions

The problem of parameter estimation for fast/slow systems of SDEs which admit a coarse-grained description in terms of an SDE for the slow variable was studied in this paper. It was shown that, when applied to the averaging problem, the maximum likelihood estimator (MLE) is asymptotically unbiased, and we can use it to estimate accurately the parameters in the drift coefficient of the coarse-grained model using data from the slow variable in the fast/slow system. On the contrary, the MLE is asymptotically biased when applied to the homogenization problem, and a systematic asymptotic error appears in the log-likelihood function in the long-time/infinite-scale-separation limit. The MLE can lead to the correct estimation of the parameters in the drift coefficient of the homogenized equation provided that we subsample the data from the fast/slow system at the appropriate sampling rate.

The averaging/homogenization systems of SDEs that we consider in this paper are of quite general form and have been studied extensively in the last several decades, since they appear in various applications, e.g. molecular dynamics, chemical kinetics, mathematical finance, and atmosphere/ocean science; see the references in [20]. Thus, we believe that our results show that great care has to be taken when using maximum likelihood in order to infer information about parameters in stochastic systems with multiple characteristic time scales.

There are various problems, both of theoretical and of applied interest, that remain open and that we plan to address in future work. We list some of them below.
• Bayesian techniques for parameter estimation of multiscale diffusion processes.
• The development of efficient algorithms for estimating the parameters in the coarse-grained model of a fast/slow stochastic system. Based on the work that has been done on similar models in the context of econometrics [13, 2], one expects that such an algorithm would involve the estimation of an appropriate measure of scale separation $\epsilon$ and of the optimal sampling rate, averaging over all the available data, and a bias reduction step.
• Investigate whether there is any advantage in using random sampling rates.
• Investigate similar issues for deterministic fast/slow systems of differential equations.

Acknowledgements

AP has been partially supported by a Marie Curie International Reintegration Grant, MIRG-CT-2005-029160. AMS is partially supported by EPSRC.

A Appendix

A.1 An Ergodic Theorem with Convergence Rates

Consider the SDE
$$\frac{dz}{dt} = h(z) + \gamma(z)\,\frac{dW}{dt}, \qquad (A.1)$$
with $z\in\mathcal{Z}$, where $\mathcal{Z}$ is either $\mathbb{R}^k$ or $\mathbb{T}^k$, $h:\mathcal{Z}\to\mathbb{R}^k$, $\gamma:\mathcal{Z}\to\mathbb{R}^{k\times p}$, and $W\in\mathbb{R}^p$ is a standard Brownian motion. Assume that $h, \gamma$ are $C^\infty$ with bounded derivatives. Let $\psi:\mathcal{Z}\to\mathbb{R}$ and $\phi:\mathcal{Z}\to\mathbb{R}$ be bounded. We denote the generator of the Markov process (A.1) by $\mathcal{A}$.

Assumptions A.1. The equation (A.1) is ergodic with invariant measure $\nu(z)\,dz$. Let
$$\bar\phi = \int_{\mathcal{Z}}\phi(z)\,\nu(z)\,dz.$$
Then the equation
$$-\mathcal{A}\Phi = \phi - \bar\phi, \qquad \int_{\mathcal{Z}}\Phi(z)\,\nu(z)\,dz = 0,$$
has a unique solution $\Phi:\mathcal{Z}\to\mathbb{R}$, with $\Phi$ and $\nabla\Phi$ bounded.

Lemma A.2. Let
$$I = \frac{1}{\sqrt{T}}\int_0^T\psi(z(t))\,dW(t).$$
Then there exists a constant $C>0$ such that $\mathbb{E}|I|^2 \le C$ for all $T>0$.

Proof. Use the Itô isometry and invoke the boundedness of $\psi$.

Lemma A.3. Time averages converge to their mean value almost surely. Furthermore, there is a constant $C>0$ such that
$$\mathbb{E}\,\Big|\frac{1}{T}\int_0^T\phi(z(t))\,dt - \bar\phi\Big|^2 \le \frac{C}{T}.$$

Proof.
By applying the Itô formula to $\Phi$ we obtain
$$-\int_0^T\mathcal{A}\Phi(z(t))\,dt = \Phi(z(0)) - \Phi(z(T)) + \int_0^T(\nabla\Phi\,\gamma)(z(t))\,dW(t).$$
Thus
$$\frac{1}{T}\int_0^T\phi(z(t))\,dt = \bar\phi + \frac{1}{T}\big(\Phi(z(0)) - \Phi(z(T))\big) + \frac{1}{\sqrt{T}}\,I, \qquad I = \frac{1}{\sqrt{T}}\int_0^T(\nabla\Phi\,\gamma)(z(t))\,dW(t).$$
The result concerning $L^2(\Omega)$ convergence follows from the boundedness of $\Phi$, $\nabla\Phi$ and $\gamma$, together with Lemma A.2. Almost sure convergence follows from the ergodic theorem.

A.2 Consistency of the Estimators

Lemma A.4. Let $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde{\mathbb{P}})$ be a probability space and $g_\epsilon: \tilde\Omega\times\Theta\to\mathbb{R}$, $g_0: \Theta\to\mathbb{R}$ be such that
$$\forall\theta\in\Theta:\quad g_\epsilon\to g_0 \text{ in probability, as } \epsilon\to0, \qquad (A.2)$$
and
$$\forall\delta,\kappa>0:\quad \tilde{\mathbb{P}}\Big(\omega: \sup_{|u|>\delta}\big(g_\epsilon(\omega,\hat\theta_0+u) - g_0(\hat\theta_0+u)\big) > \kappa\Big)\to0, \text{ as } \epsilon\to0, \qquad (A.3)$$
where $\hat\theta_0 = \arg\sup_{\theta\in\Theta}g_0(\theta)$. Moreover, we assume that
$$\forall\delta>0:\quad \sup_{|u|>\delta}\big(g_0(\hat\theta_0+u) - g_0(\hat\theta_0)\big) \le -\kappa(\delta) < 0. \qquad (A.4)$$
If $\hat\theta_\epsilon(\omega) = \arg\sup_{\theta\in\Theta}g_\epsilon(\omega,\theta)$, then $\hat\theta_\epsilon\to\hat\theta_0$ in probability.

Proof. First note that, for all $\delta>0$,
$$\tilde{\mathbb{P}}\big\{|\hat\theta_\epsilon - \hat\theta_0| > \delta\big\} \le \tilde{\mathbb{P}}\Big\{\sup_{|u|>\delta}\big(g_\epsilon(\omega,\hat\theta_0+u) - g_\epsilon(\omega,\hat\theta_0)\big) \ge 0\Big\}. \qquad (A.5)$$
We define $G_\epsilon(\omega;\theta,u) := g_\epsilon(\omega,\theta+u) - g_\epsilon(\omega,\theta)$ and $G_0(\theta,u) := g_0(\theta+u) - g_0(\theta)$. Clearly,
$$\sup_{|u|>\delta}G_\epsilon(\omega;\hat\theta_0,u) \le \sup_{|u|>\delta}\big(G_\epsilon(\omega;\hat\theta_0,u) - G_0(\hat\theta_0,u)\big) + \sup_{|u|>\delta}G_0(\hat\theta_0,u),$$
and thus
$$\tilde{\mathbb{P}}\Big\{\sup_{|u|>\delta}G_\epsilon(\omega;\hat\theta_0,u) \ge 0\Big\} \le \tilde{\mathbb{P}}\Big\{\sup_{|u|>\delta}\big(G_\epsilon(\omega;\hat\theta_0,u) - G_0(\hat\theta_0,u)\big) \ge -\sup_{|u|>\delta}G_0(\hat\theta_0,u)\Big\} \le \tilde{\mathbb{P}}\Big\{\sup_{|u|>\delta}\big(G_\epsilon(\omega;\hat\theta_0,u) - G_0(\hat\theta_0,u)\big) \ge \kappa(\delta) > 0\Big\}, \qquad (A.6)$$
by Assumption (A.4). Note that
$$G_\epsilon(\omega;\hat\theta_0,u) - G_0(\hat\theta_0,u) = \big(g_\epsilon(\omega,\hat\theta_0+u) - g_0(\hat\theta_0+u)\big) - \big(g_\epsilon(\omega,\hat\theta_0) - g_0(\hat\theta_0)\big).$$
So, by conditioning on the event $\big\{\omega: |g_\epsilon(\omega,\hat\theta_0) - g_0(\hat\theta_0)| \ge \frac{1}{2}\kappa(\delta)\big\}$ and using (A.5) and (A.6), we get that
$$\tilde{\mathbb{P}}\big\{|\hat\theta_\epsilon - \hat\theta_0| > \delta\big\} \le \tilde{\mathbb{P}}\Big\{\sup_{|u|>\delta}\big(g_\epsilon(\omega,\hat\theta_0+u) - g_0(\hat\theta_0+u)\big) \ge \tfrac{1}{2}\kappa(\delta) > 0\Big\} + \tilde{\mathbb{P}}\Big\{|g_\epsilon(\omega,\hat\theta_0) - g_0(\hat\theta_0)| \ge \tfrac{1}{2}\kappa(\delta) > 0\Big\}.$$
Both probabilities on the right-hand side go to zero as $\epsilon\to0$, by Assumptions (A.3) and (A.2) respectively. We conclude that $\hat\theta_\epsilon\to\hat\theta_0$ in probability.

A.3 Proofs of Propositions 4.2 and 4.3

In this section we present the proofs of Propositions 4.2 and 4.3, which we repeat here for the reader's convenience.

Proposition A.5. Let $(x(t),y(t))$ be the solution of (2.4) and assume that Assumptions 2.1 and 2.4 hold. Then, for $\epsilon,\delta$ sufficiently small, the increment of the process $x(t)$ can be written in the form
$$x_{n+1} - x_n = F(x_n;\theta_0)\,\delta + M_n + R(\epsilon,\delta),$$
where $M_n$ denotes the martingale term
$$M_n = \int_{n\delta}^{(n+1)\delta}(\nabla_y\Phi\,\beta + \alpha_0)(x(s),y(s))\,dV(s) + \int_{n\delta}^{(n+1)\delta}\alpha_1(x(s),y(s))\,dU(s),$$
with $\|M_n\|_p \le C\sqrt{\delta}$ and $\|R(\epsilon,\delta)\|_p \le C(\delta^{3/2} + \epsilon\delta^{1/2} + \epsilon)$.

Proposition A.6. Let $g\in C^1(\mathcal{X})$ and let Assumptions 3.7 hold. Assume that $\epsilon$ and $N$ are related as in Theorem 4.1. Then
$$\lim_{\epsilon\to0}\frac{1}{N}\sum_{n=0}^{N-1}g(x_n) = \mathbb{E}^\pi g,$$
where the convergence is in $L^2$ with respect to the measure on initial conditions with density $\pi(x)\rho(y;x)$.

For the proofs of Propositions A.5 and A.6, both used in the proof of Theorem 4.1, we will need the following two technical lemmas. We start with a rough estimate on the increments of the process $x(t)$.

Lemma A.7. Let $(x(t),y(t))$ be the solution of (2.4) and assume that Assumptions 2.1 and 2.4 hold. Let $s\in[n\delta,(n+1)\delta]$. Then, for $\epsilon,\delta$ sufficiently small, the following estimate holds:
$$\|x(s) - x_n\|_p \le C\big(\epsilon + \delta^{1/2}\big). \qquad (A.7)$$

Proof. We apply Itô's formula to $\Phi$, the solution of the Poisson equation (2.8), to obtain
$$x(s) - x_n = -\epsilon\big(\Phi(x(s),y(s)) - \Phi(x_n,y_n)\big) + \int_{n\delta}^s(\mathcal{L}_1\Phi + f_1)(x(\tau),y(\tau))\,d\tau + \int_{n\delta}^s(\nabla_y\Phi\,\beta + \alpha_0)(x(\tau),y(\tau))\,dV(\tau) + \int_{n\delta}^s\alpha_1(x(\tau),y(\tau))\,dU(\tau) + \epsilon\int_{n\delta}^s(\mathcal{L}_2\Phi)(x(\tau),y(\tau))\,d\tau + \epsilon\int_{n\delta}^s(\nabla_x\Phi\,\alpha_0)(x(\tau),y(\tau))\,dU(\tau) + \epsilon\int_{n\delta}^s(\nabla_x\Phi\,\alpha_1)(x(\tau),y(\tau))\,dV(\tau) =: J_1+J_2+J_3+J_4+J_5+J_6+J_7.$$
Our assumptions on $\Phi(x,y)$, together with standard inequalities, imply that
$$\|J_1\|_p \le C\epsilon,\quad \|J_2\|_p \le C\delta,\quad \|J_3\|_p \le C\delta^{1/2},\quad \|J_4\|_p \le C\delta^{1/2},\quad \|J_5\|_p \le C\epsilon\delta,\quad \|J_6\|_p \le C\epsilon\delta^{1/2},\quad \|J_7\|_p \le C\epsilon\delta^{1/2}.$$
Estimate (A.7) follows from these estimates.

Using this lemma we can prove the following estimate.

Lemma A.8. Let $h(x,y)$ be a smooth, bounded function, let $(x(t),y(t))$ be the solution of (2.4), and assume that Assumption 2.1 holds. Define
$$H(x) := \int_{\mathcal{Y}} h(x,y)\,\rho(y;x)\,dy.$$
Then, for $\epsilon,\delta$ sufficiently small, the following estimate holds:
$$\int_{n\delta}^{(n+1)\delta} h(x(s),y(s))\,ds = H(x_n)\,\delta + R(\epsilon,\delta), \qquad (A.8)$$
where $\|R(\epsilon,\delta)\|_p \le C(\epsilon^2 + \delta^{3/2} + \epsilon\delta^{1/2})$.

Proof. Let $\phi$ be the mean-zero solution of the equation
$$-\mathcal{L}_0\phi = h(x,y) - H(x). \qquad (A.9)$$
By Assumption 2.1 this solution is smooth in both $x$ and $y$, and it is unique and bounded. We apply Itô's formula to obtain
$$\int_{n\delta}^{(n+1)\delta}\big(h(x(s),y(s)) - H(x(s))\big)\,ds = -\epsilon^2\big(\phi(x_{n+1},y_{n+1}) - \phi(x_n,y_n)\big) + \epsilon\int_{n\delta}^{(n+1)\delta}\mathcal{L}_1\phi(x(s),y(s))\,ds + \epsilon^2\int_{n\delta}^{(n+1)\delta}\mathcal{L}_2\phi(x(s),y(s))\,ds + \epsilon^2\int_{n\delta}^{(n+1)\delta}(\nabla_x\phi\,\alpha_0)(x(s),y(s))\,dU(s) + \epsilon\int_{n\delta}^{(n+1)\delta}(\nabla_y\phi\,\beta + \epsilon\nabla_x\phi\,\alpha_1)(x(s),y(s))\,dV(s) =: J_1+J_2+J_3+J_4+J_5.$$
Our assumptions on the solution $\phi$ of the Poisson equation (A.9), together with standard estimates for the moments of stochastic integrals and Hölder's inequality, give the estimates
$$
\|J_1\|_p \leq C\epsilon^2, \quad \|J_2\|_p \leq C\epsilon\delta, \quad \|J_3\|_p \leq C\epsilon^2\delta, \quad \|J_4\|_p \leq C\epsilon^2\delta^{1/2}, \quad \|J_5\|_p \leq C\epsilon\delta^{1/2}.
$$
The above estimates imply that
$$
\int_{n\delta}^{(n+1)\delta} h(x(s), y(s))\, ds = \int_{n\delta}^{(n+1)\delta} H(x(s))\, ds + R_1(\epsilon, \delta)
$$
with $\|R_1(\epsilon, \delta)\|_p \leq C\left( \epsilon\delta^{1/2} + \epsilon^2 \right)$. We use Hölder's inequality and the Lipschitz continuity of $H(x)$ to estimate
$$
\begin{aligned}
\left\| \int_{n\delta}^{(n+1)\delta} H(x(s))\, ds - H(x_n)\,\delta \right\|_p^p
&= \left\| \int_{n\delta}^{(n+1)\delta} \left( H(x(s)) - H(x_n) \right) ds \right\|_p^p \\
&\leq \delta^{p-1} \int_{n\delta}^{(n+1)\delta} \| H(x(s)) - H(x_n) \|_p^p\, ds \\
&\leq C\delta^{p-1} \int_{n\delta}^{(n+1)\delta} \| x(s) - x_n \|_p^p\, ds
\leq C\delta^p \left( \delta^{1/2} + \epsilon \right)^p =: R_2(\epsilon, \delta)^p,
\end{aligned}
$$
where Lemma A.7 was used and $R_2(\epsilon, \delta) = C\left( \epsilon\delta + \delta^{3/2} \right)$. We combine the above estimates to obtain
$$
\int_{n\delta}^{(n+1)\delta} h(x(s), y(s))\, ds = \int_{n\delta}^{(n+1)\delta} H(x(s))\, ds + R_1(\epsilon, \delta) = H(x_n)\,\delta + R_1(\epsilon, \delta) + R_2(\epsilon, \delta),
$$
from which (A.8) follows.

Proof of Proposition 4.2 (Proposition A.5). This follows from the first line of the proof of Lemma A.7, the estimates therein concerning all the $J_i$ with the exception of $J_2$, and the use of Lemma A.8 to estimate $J_2$ in terms of $\delta F(x_n; \theta_0)$.

Proof of Proposition 4.3 (Proposition A.6). We have
$$
\begin{aligned}
\frac{1}{N} \sum_{n=0}^{N-1} g(x_n)
&= \frac{1}{N\delta} \sum_{n=0}^{N-1} \int_{n\delta}^{(n+1)\delta} g(x_n)\, ds \\
&= \frac{1}{N\delta} \sum_{n=0}^{N-1} \int_{n\delta}^{(n+1)\delta} g(x(s))\, ds
+ \frac{1}{N\delta} \sum_{n=0}^{N-1} \int_{n\delta}^{(n+1)\delta} \left( g(x_n) - g(x(s)) \right) ds \\
&= \frac{1}{N\delta} \int_0^{N\delta} g(x(s))\, ds
+ \frac{1}{N\delta} \sum_{n=0}^{N-1} \int_{n\delta}^{(n+1)\delta} \left( g(x_n) - g(x(s)) \right) ds
=: I_1 + R_1.
\end{aligned}
$$
We introduce the notation
$$
f_n := \int_{n\delta}^{(n+1)\delta} \left( g(x_n) - g(x(s)) \right) ds.
$$
By Lemma A.7 we have that $x(s) - x_n = \mathcal{O}(\epsilon + \delta^{1/2})$ in $L^p(\Omega')$.
We use this, together with the Lipschitz continuity of $g$ and Hölder's inequality, to estimate
$$
\|f_n\|_p^p \leq \delta^{p/q} \int_{n\delta}^{(n+1)\delta} \mathbb{E}\left| g(x_n) - g(x(s)) \right|^p ds
\leq C\delta^{1 + p/q} \left( \epsilon^p + \delta^{p/2} \right),
$$
where $p^{-1} + q^{-1} = 1$. Using this we can estimate $R_1$:
$$
\|R_1\|_p \leq \frac{1}{N\delta} \sum_{n=0}^{N-1} \|f_n\|_p
\leq C\, \frac{1}{N\delta}\, N\, \delta^{1/p + 1/q} \left( \epsilon + \delta^{1/2} \right)
= C\left( \epsilon + \delta^{1/2} \right) \to 0,
$$
as $\epsilon \to 0$. Thus it remains to estimate $I_1$. Let $T = N\delta$ and let $\psi_\epsilon$ solve
$$
-\mathcal{L}_{\mathrm{hom}} \psi_\epsilon(x, y) = \hat{g}(x) := g(x) - \mathbb{E}^{\rho_\epsilon} g. \tag{A.10}
$$
Applying Itô's formula gives
$$
\begin{aligned}
\frac{1}{T} \int_0^T g(x(s))\, ds - \mathbb{E}^{\rho_\epsilon} g
= {} & -\frac{1}{T} \left( \psi_\epsilon(x(T), y(T)) - \psi_\epsilon(x(0), y(0)) \right) \\
& + \frac{1}{\epsilon T} \int_0^T \left( \nabla_y \psi_\epsilon\,\beta \right)(x(s), y(s))\, dV(s)
+ \frac{1}{T} \int_0^T \left( \nabla_x \psi_\epsilon\,\alpha \right)(x(s), y(s))\, dU'(s)
=: J_1 + J_2,
\end{aligned}
$$
where $J_2$ denotes the two stochastic integrals and we write $\alpha\, dU' = \alpha_0\, dU + \alpha_1\, dV$, in law. Note that $\mathbb{E}^{\rho_\epsilon} g \to \mathbb{E}^{\pi} g$ as $\epsilon \to 0$ by Assumptions 3.7. Thus the theorem will be proved if we can show that $J_1 + J_2$ tends to zero in the required topology on the initial conditions. Note that
$$
\mathbb{E}^{\rho_\epsilon} |J_1|^2 \leq \frac{4}{T^2}\, \mathbb{E}^{\rho_\epsilon} |\psi_\epsilon|^2, \qquad
\mathbb{E}^{\rho_\epsilon} |J_2|^2 \leq \frac{1}{T}\, \mathbb{E}^{\rho_\epsilon} \langle \nabla \psi_\epsilon, \Sigma \nabla \psi_\epsilon \rangle.
$$
Here $\Sigma$ is defined in Assumptions 3.8 and $\nabla$ is the gradient with respect to $(x^T, y^T)^T$. We note that, by stationarity, we have
$$
\mathbb{E}^{\rho_\epsilon} |\psi_\epsilon|^2 = \|\psi_\epsilon\|^2, \qquad
\mathbb{E}^{\rho_\epsilon} \langle \nabla \psi_\epsilon, \Sigma \nabla \psi_\epsilon \rangle = \left( \nabla \psi_\epsilon, \Sigma \nabla \psi_\epsilon \right), \tag{A.11}
$$
where $\|\cdot\|$ and $(\cdot, \cdot)$ denote the $L^2(\mathcal{X} \times \mathcal{Y}; \mu_\epsilon(dx\,dy))$ norm and inner product, respectively. Use of the Dirichlet form (see Theorem 6.12 in [20]) shows that
$$
\left( \nabla \psi_\epsilon, \Sigma \nabla \psi_\epsilon \right)
\leq 2 \int \hat{g}(x)\, \psi_\epsilon(x, y)\, \rho_\epsilon(x, y)\, dx\, dy
\leq a \|\hat{g}\|^2 + a^{-1} \|\psi_\epsilon\|^2,
$$
for any $a > 0$. Using the Poincaré inequality (3.9), together with Assumptions 3.7 and 3.8, gives
$$
\|\psi_\epsilon\|^2 \leq C_p^2 \|\nabla \psi_\epsilon\|^2 \leq a\, C_\gamma^{-1} C_p^2 \|\hat{g}\|^2 + a^{-1} C_\gamma^{-1} C_p^2 \|\psi_\epsilon\|^2.
$$
Choosing $a$ so that $a^{-1} C_\gamma^{-1} C_p^2 = \frac{1}{2}$ gives $\|\psi_\epsilon\|^2 \leq C\, \mathbb{E}^{\rho_\epsilon} |\hat{g}|^2$.
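The two-step bound in the Dirichlet form estimate above compresses an application of Cauchy–Schwarz followed by the weighted Young inequality. Written out (a reconstruction in the paper's notation, assuming, as (A.11) suggests, that $\mu_\epsilon(dx\,dy) = \rho_\epsilon(x, y)\,dx\,dy$):

```latex
% Cauchy--Schwarz in L^2(\mathcal{X}\times\mathcal{Y};\mu_\epsilon), then the
% weighted Young inequality 2uv \le a u^2 + a^{-1} v^2, valid for every a > 0
% since (a^{1/2}u - a^{-1/2}v)^2 \ge 0:
2\int \hat{g}(x)\,\psi_\epsilon(x,y)\,\rho_\epsilon(x,y)\,dx\,dy
  \;\le\; 2\,\|\hat{g}\|\,\|\psi_\epsilon\|
  \;\le\; a\,\|\hat{g}\|^2 + a^{-1}\,\|\psi_\epsilon\|^2 .
```

The choice $a = 2\,C_\gamma\, C_p^{-2}$ made above (equivalently, $a^{-1} C_\gamma^{-1} C_p^2 = \frac{1}{2}$) turns the coefficient of $\|\psi_\epsilon\|^2$ into $\frac{1}{2}$, so that term can be absorbed into the left-hand side, leaving $\|\psi_\epsilon\|^2 \leq C\,\mathbb{E}^{\rho_\epsilon}|\hat{g}|^2$ with $C$ independent of $\epsilon$.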
Hence
$$
\left( \nabla \psi_\epsilon, \Sigma \nabla \psi_\epsilon \right) \leq C\, \mathbb{E}^{\rho_\epsilon} |\hat{g}|^2,
$$
where the notation introduced in (A.11) was used. The constant $C$ in the above inequalities is independent of $\epsilon$. Thus
$$
\mathbb{E}^{\rho_\epsilon} |J_1|^2 + \mathbb{E}^{\rho_\epsilon} |J_2|^2 \leq \frac{C}{T}\, \mathbb{E}^{\rho_\epsilon} |\hat{g}|^2. \tag{A.12}
$$
Since the measure with density $\rho_\epsilon$ converges to the measure with density $\pi(x)\rho(y; x)$, the desired result follows.

References

[1] Y. Aït-Sahalia, P. A. Mykland, and L. Zhang. How often to sample a continuous-time process in the presence of market microstructure noise. Rev. Financ. Studies, 18:351–416, 2005.
[2] Y. Aït-Sahalia, P. A. Mykland, and L. Zhang. A tale of two time scales: Determining integrated volatility with noisy high-frequency data. J. Amer. Stat. Assoc., 100:1394–1411, 2005.
[3] D. Bakry, P. Cattiaux, and A. Guillin. Rate of convergence for ergodic continuous Markov processes: Lyapunov versus Poincaré. J. Funct. Anal., 254(3):727–759, 2008.
[4] A. Bensoussan, J.-L. Lions, and G. Papanicolaou. Asymptotic Analysis for Periodic Structures, volume 5 of Studies in Mathematics and its Applications. North-Holland Publishing Co., Amsterdam, 1978.
[5] R. M. Dudley. A course on empirical processes. In École d'été de probabilités de Saint-Flour, XII—1982, volume 1097 of Lecture Notes in Math., pages 1–142. Springer, Berlin, 1984.
[6] W. E, D. Liu, and E. Vanden-Eijnden. Analysis of multiscale methods for stochastic differential equations. Comm. Pure Appl. Math., 58(11):1544–1585, 2005.
[7] S. N. Ethier and T. G. Kurtz. Markov Processes. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, 1986.
[8] D. Givon, I. G. Kevrekidis, and R. Kupferman. Strong convergence of projective integration schemes for singularly perturbed stochastic differential equations. Comm. Math. Sci., 4(4):707–729, 2006.
[9] D. Givon, R. Kupferman, and A. M. Stuart. Extracting macroscopic dynamics: model problems and algorithms. Nonlinearity, 17(6):R55–R127, 2004.
[10] I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus, volume 113 of Graduate Texts in Mathematics. Springer-Verlag, New York, second edition, 1991.
[11] Y. A. Kutoyants. Statistical Inference for Ergodic Diffusion Processes. Springer Series in Statistics. Springer-Verlag London Ltd., London, 2004.
[12] E. Nelson. Dynamical Theories of Brownian Motion. Princeton University Press, Princeton, N.J., 1967.
[13] S. Olhede, G. A. Pavliotis, and A. Sykulski. Multiscale inference for high-frequency data. Preprint, 2008.
[14] G. C. Papanicolaou, D. W. Stroock, and S. R. S. Varadhan. Martingale approach to some limit theorems. In Papers from the Duke Turbulence Conference (Duke Univ., Durham, N.C., 1976), Paper No. 6, Duke Univ. Math. Ser., Vol. III. Duke Univ., Durham, N.C., 1977.
[15] É. Pardoux and A. Yu. Veretennikov. On the Poisson equation and diffusion approximation. I. Ann. Probab., 29(3):1061–1085, 2001.
[16] É. Pardoux and A. Yu. Veretennikov. On Poisson equation and diffusion approximation. II. Ann. Probab., 31(3):1166–1192, 2003.
[17] É. Pardoux and A. Yu. Veretennikov. On the Poisson equation and diffusion approximation. III. Ann. Probab., 33(3):1111–1133, 2005.
[18] G. A. Pavliotis and A. M. Stuart. White noise limits for inertial particles in a random field. Multiscale Model. Simul., 1(4):527–533, 2003.
[19] G. A. Pavliotis and A. M. Stuart. Parameter estimation for multiscale diffusions. J. Stat. Phys., 127(4):741–781, 2007.
[20] G. A. Pavliotis and A. M. Stuart. Multiscale Methods: Averaging and Homogenization, volume 53 of Texts in Applied Mathematics. Springer, New York, 2008.
[21] B. L. S. Prakasa Rao. Statistical Inference for Diffusion Type Processes, volume 8 of Kendall's Library of Statistics. Edward Arnold, London, 1999.
[22] J. H. van Zanten. A note on consistent estimation of multivariate parameters in ergodic diffusion models. Scand. J. Statist., 28(4):617–623, 2001.
[23] E. Vanden-Eijnden. Numerical techniques for multi-scale dynamical systems with stochastic effects. Commun. Math. Sci., 1(2):385–391, 2003.
[24] C. Villani. Hypocoercivity. AMS, 2008.
