LAD estimation of locally stable SDE

Oleksii M. Kulyk∗   Hiroki Masuda†

March 31, 2026

Abstract

We prove the asymptotic mixed normality of the least absolute deviation (LAD) estimator for a locally $\alpha$-stable stochastic differential equation (SDE) observed at high frequency, where $\alpha \in (0,2)$. We investigate both the ergodic and the non-ergodic case, where the terminal sampling time diverges or is fixed, respectively, under different sets of assumptions. The objective function of the LAD estimator is expressed in a fully explicit form that requires no numerical integration, offering a significant computational advantage over the existing non-Gaussian stable quasi-likelihood approach.

1 Introduction

The objective of this paper is the drift estimation of the stochastic differential equation (SDE)
\[ dX_t = a(\theta; X_t)\,dt + \sigma(X_{t-})\,dZ_t \]
on the basis of a discrete-time sample $(X_{t_{k,n}})_{k=0}^{n}$, where $t_{k,n} = kh_n$ with the sampling step size $h_n \to 0$ as $n \to \infty$; that is, we consider high-frequency sampling. We assume that the parametric drift coefficient $a(\theta; x)$ is known up to a finite-dimensional parameter $\theta \in \Theta \subset \mathbb{R}^m$, while the scale coefficient $\sigma(x)$ may be unknown, and that the driving process $Z$ is a pure jump locally $\alpha$-stable Lévy process with $\alpha < 2$, to be specified in Section 2. To estimate the true value $\theta_0$ of $\theta$ from $(X_{t_{k,n}})_{k=0}^{n}$, we consider the least absolute deviation (LAD) type estimator, defined to be any element $\hat\theta_n$ minimizing the random function
\[ L_n(\theta) := \sum_{k=1}^{n} V(X_{t_{k-1,n}}) \left| X_{t_{k,n}} - F_{h_n}(\theta; X_{t_{k-1,n}}) \right|, \tag{1.1} \]
for a suitable regressor function $F_h(\theta; x)$ and a non-negative weight function $V(x)$. Apart from the weighting through $V(x)$, the choice of the loss function (1.1) corresponds to adopting the Laplace (double exponential) quasi-likelihood with constant scale.
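As a concrete illustration of (1.1), the following minimal sketch simulates an Euler path of a toy model $dX_t = -\theta_0 X_t\,dt + dZ_t$ with $Z$ a standard Cauchy process (so $\alpha = 1$ and $h^{1/\alpha} = h$), and minimizes $L_n$ over a parameter grid. The model, the self-weighting $V$, and the grid are illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (illustrative, not from the paper): dX_t = -theta0 * X_t dt + dZ_t,
# Z a standard Cauchy process (alpha = 1), whose increment over h is h * Cauchy.
theta0, n, h = 1.0, 4000, 0.01
X = np.empty(n + 1)
X[0] = 0.0
for k in range(n):
    X[k + 1] = X[k] - theta0 * X[k] * h + h * rng.standard_cauchy()

def V(x):
    # Illustrative self-weighting that damps heavy-tailed states of X.
    return 1.0 / (1.0 + x * x)

def L_n(theta):
    # LAD objective (1.1) with the Euler regressor F_h(theta; x) = x + h * a(theta; x).
    F = X[:-1] + h * (-theta * X[:-1])
    return np.sum(V(X[:-1]) * np.abs(X[1:] - F))

# theta -> L_n(theta) is convex (a sum of absolute values of affine functions),
# so a grid minimization is a crude but safe stand-in for a proper optimizer.
grid = np.linspace(0.0, 2.0, 401)
theta_hat = grid[np.argmin([L_n(t) for t in grid])]
print(theta_hat)
```

With $\alpha = 1$ the convergence rate is $\sqrt{n}\,h_n^{1-1/\alpha} = \sqrt{n} \approx 63$ here, so the grid minimizer should land near $\theta_0 = 1$.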
The goal is to prove that the scaled LAD estimator $\sqrt{n}\,h_n^{1-1/\alpha}(\hat\theta_n - \theta_0)$ is asymptotically (mixed-)normally distributed; we have $\sqrt{n}\,h_n^{1-1/\alpha} \to \infty$ under the condition (2.2) given below. We will reveal that, despite its rather simple form and its computational advantages, the LAD estimation strategy enjoys the following nice features:

- Unlike the conventional Gaussian quasi-likelihood estimation (see [21]), and as in the non-Gaussian stable quasi-likelihood estimation (see [23], [6], and the references therein), LAD estimation allows us to deal with the case of a fixed terminal sampling time (i.e. the terminal sampling time $t_{n,n} \equiv T$ is fixed);

- Further, we can obtain a rate-optimal, namely $\sqrt{n}\,h_n^{1-1/\alpha}$-consistent, estimator without specific information on the index $\alpha$ (see [2], [11], [19], and also the references in [6] for the rate-optimality).

These features are novel, and neither the Gaussian quasi-likelihood estimator nor the non-Gaussian stable quasi-likelihood estimator in the literature can enjoy them simultaneously.

∗ Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego Str. 27, 50-370 Wrocław, Poland. kulik.alex.m@gmail.com
† Graduate School of Mathematical Sciences, University of Tokyo, 3-8-1 Komaba Meguro-ku Tokyo 153-8914, Japan. hmasuda@ms.u-tokyo.ac.jp

The rate of convergence $\sqrt{n}\,h_n^{1-1/\alpha}$ reflects the non-Gaussian nature of the driving noise through the index $\alpha \in (0,2)$. This reveals that the $L_1$-loss can handle the small-time character of the driving noise more appropriately than the $L_2$-loss (Gaussian-loss) case. The theory and asymptotics of LAD-type estimators and the related Laplace quasi-likelihood-based inference have a long history: among others, we refer to [1], [3], [25], [26], [31], [29], [30] and the references therein.
However, our results seem to be the first to establish rigorous asymptotics for a self-weighted LAD-type estimator of a weak solution to a nonlinear locally stable Lévy-driven SDE observed at high frequency. To our knowledge, [20] is the only paper that has studied LAD-type estimation based on high-frequency sampling. It considered self-weighted LAD estimation of a class of Lévy-driven ergodic Ornstein-Uhlenbeck processes under ergodicity (i.e. $a(\theta; x) = \theta_1 - \theta_2 x$ with $\theta_1 \in \mathbb{R}$ and $\theta_2 > 0$, with $\sigma(x)$ constant) and the large-time asymptotics where $t_{n,n} \to \infty$ as $n \to \infty$. The domain of applicability of the results given in this paper is much broader, making it possible to treat, in a unified manner, a nonlinear drift $a(\theta; x)$, a skewed noise distribution $\mathcal{L}(Z_1)$, and non-ergodicity.

This paper is organized as follows. Section 2 describes the underlying model setup and the basic assumptions. Section 3 presents the main results, the asymptotic (mixed) normality of the LAD estimator. The proofs are given in Sections 4 and 5. We construct a consistent estimator of the asymptotic (possibly random) covariance matrix of the LAD estimator in Section 6. The appendix Sections A, B, and C present technical material used in the proofs of the main results.

Conventions. We denote by $C, c$ generic positive constants whose values may vary at each appearance. When these constants depend on additional parameters (e.g. the truncation level $R$), we may write $c_R, C_R$. For any sequences $(\xi_n)$ and $(\zeta_n)$ of non-negative random variables, we write $\xi_n \lesssim \zeta_n$ if $\xi_n/\zeta_n \le C$ a.s. for every $n$ large enough. We denote by $\partial_x$ the partial derivative operator with respect to $x$, and by $\nabla_\theta$ the gradient operator with respect to $\theta$. For a matrix $A$, we write $A^{\otimes 2} = AA^\top$, with $A^\top$ denoting the transpose of $A$.
2 Setup and assumptions

Throughout the paper we are given a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \mathbb{R}_+}, P)$ carrying a solution $X$ to the partly parametrized family of SDEs
\[ dX_t = a(\theta; X_t)\,dt + \sigma(X_{t-})\,dZ_t, \tag{2.1} \]
where the driving Lévy process $Z$ is $(\mathcal{F}_t)$-adapted and independent of the initial variable $X_0$. We assume that:

- the parametrized drift coefficient $a(\theta; x)$ is known up to a finite-dimensional parameter $\theta \in \Theta$, where $\Theta$ is a bounded domain in $\mathbb{R}^m$;
- the scale coefficient $\sigma(x)$ may be unknown;
- the driving process $Z$ is a pure jump locally $\alpha$-stable Lévy process with $\alpha \in (0,2)$, which we will specify later.

Our objective is to estimate the true value $\theta_0 \in \Theta$ based on a high-frequency sample $(X_{t_{k,n}})_{k=0}^{n}$ with $t_{k,n} = kh_n$, where the positive sequence $h_n \to 0$ ($n \to \infty$) denotes the sampling step size and satisfies, for $T_n := t_{n,n} = nh_n$,
\[ \liminf_{n \to \infty} T_n > 0. \tag{2.2} \]
The terminal sampling time $T_n$ may or may not diverge. Although for brevity we treat only a single value $\theta_0 \in \Theta$ of $\theta$, with a trivial modification of the regularity conditions many of our forthcoming results hold uniformly in $\theta \in \Theta$. We will sometimes denote by $P_{\theta_0}$ the true distribution of the process $X$.

Recall that, by the Lévy-Khinchin representation, the characteristic function of a Lévy process without a diffusion component has the form
\[ E[e^{i\xi Z_t}] = e^{-t\psi(\xi)}, \]
where the Lévy exponent $\psi(\xi)$ has the representation
\[ \psi(\xi) = -ib\xi + \int_{\mathbb{R}} \big(1 - e^{i\xi u} + i\xi u \mathbb{1}_{|u| \le 1}\big)\,\mu(du); \tag{2.3} \]
here and below we denote by $\mu(du)$ the Lévy measure of the process. In this paper, we call the Lévy process pure jump if $\psi(\xi)$ equals, in the principal value (P.V.) sense,
\[ \int_{\mathbb{R}} (1 - e^{i\xi u})\,\mu(du) = \lim_{\varepsilon \to 0+} \int_{|u| > \varepsilon} (1 - e^{i\xi u})\,\mu(du), \tag{2.4} \]
that is, the 'external drift' part $-ib\xi$ in the representation (2.3) cancels, in the principal value sense, the 'internal drift' coming from the compensator part of the integral. If (2.4) holds true, the Lévy process $Z$ can be obtained as a weak limit, as $\varepsilon \to 0+$, of the compound Poisson processes with Lévy exponents
\[ \psi^{(\varepsilon)}(\xi) = \int_{|u| > \varepsilon} (1 - e^{i\xi u})\,\mu(du), \]
and thus can be intuitively understood to be free both of the drift and of the compensator term; see also Remark 2.1 below. It is substantial for our estimation procedure that the noise is 'free of drift' in the above sense, because we need to prevent the (unknown) stochastic term $\sigma(X_{t-})\,dZ_t$ in (2.1) from interfering with the parametrized drift coefficient $a(\theta; X_t)$, which will be used in the construction of the estimator. A sufficient, but not necessary, condition for this property to hold is that the Lévy process $Z$ be symmetric.

Furthermore, we assume the driving process to be locally $\alpha$-stable for $\alpha \in (0,2)$, meaning that its Lévy measure has the form
\[ \mu(du) = \mu_\alpha(du) + \nu(du), \tag{2.5} \]
where the 'principal' part
\[ \mu_\alpha(du) = \frac{c_\alpha}{|u|^{\alpha+1}}\,du, \qquad c_\alpha = \begin{cases} \dfrac{\alpha}{2\Gamma(1-\alpha)\cos\frac{\pi\alpha}{2}}, & \alpha \ne 1, \\[1ex] \dfrac{1}{\pi}, & \alpha = 1, \end{cases} \]
is the Lévy measure of the standard symmetric $\alpha$-stable process with the Lévy exponent $\psi_\alpha(\xi) = |\xi|^\alpha$. The 'nuisance' part $\nu(du)$ is allowed to be a signed measure and is assumed to have a small-jump activity strictly lower than that of the 'principal' part. The latter means that its total variation has Blumenthal-Getoor index $\beta < \alpha$:
\[ |\nu|(\{u : |u| \ge \varepsilon\}) \le C \varepsilon^{-\beta}, \qquad \varepsilon \in (0,1]. \tag{2.6} \]
The principal assumptions on the model (2.1) are given below.

Assumption 2.1 (Properties of coefficients and noise).
1. Drift coefficient $a(\theta; x)$ satisfies the finite range Hölder condition in $x$ with exponent $\eta \in (0,1]$:
\[ |a(\theta; x) - a(\theta; y)| \le C|x - y|^\eta, \qquad |x - y| \le 1, \quad \theta \in \Theta. \]
Moreover, $a(\theta; x)$ is differentiable in $\theta$ for each $x$, and the function $x \mapsto a(\theta_0; x)$ is locally bounded.

2. Jump coefficient $\sigma(x)$ is bounded, separated from 0, and satisfies the finite range Hölder condition in $x$ with exponent $\zeta \in (0,1]$:
\[ |\sigma(x) - \sigma(y)| \le C|x - y|^\zeta, \qquad |x - y| \le 1. \]

3. The Lévy exponent of the Lévy process $Z$ is given by (2.4), where the Lévy measure $\mu(du)$ satisfies (2.5) and (2.6).

4. The balance condition holds:
\[ \alpha + \eta > 1. \tag{2.7} \]

In particular, Assumption 2.1 ensures the existence of a solution to (2.1); see Section A for related details.

Remark 2.1. It will follow from the (statistic stability) Assumption 2.3 below that $\beta < \alpha/2 < 1$. This yields that the assumption (2.4) for the Lévy process to be 'pure jump' is equivalent to the following: in (2.3), we have
\[ b = \int_{|u| \le 1} u\,\nu(du). \tag{2.8} \]

Next, let us proceed with the assumptions on the objects involved in the LAD type estimator $\hat\theta_n$; recall (1.1). Consider the Cauchy problem
\[ df_t = a(\theta; f_t)\,dt, \qquad f_0 = x. \tag{2.9} \]
Assumption 2.1 yields that this problem has a solution $f_t(\theta; x)$. This solution may fail to be unique; however, it is easy to show that any two solutions $f^1_t(\theta; x)$ and $f^2_t(\theta; x)$ satisfy
\[ |f^1_t(\theta; x) - f^2_t(\theta; x)| \le C\,t^{\frac{1}{1-\eta}}, \qquad t \in [0,1]. \tag{2.10} \]
We call any locally bounded function $W : \mathbb{R} \to [0, \infty)$ a weight function; a typical example here is $W(x) = C(1 + |x|^p)$, $p \ge 0$, for some $C > 0$ whose specific value does not matter.

Assumption 2.2 (Properties of the regressor). The following conditions hold for a certain weight function $W(x)$ and a constant $\gamma > 0$.
1. Regressor function $F_t(\theta; x)$ satisfies
\[ |F_t(\theta; x) - x - t\,a(\theta; x)| \le t^{1+\gamma} W(x). \tag{2.11} \]

2. There exists a constant $\delta_{\mathrm{regr}} > 0$ such that, for some solution $f_t(\theta; x)$ to (2.9),
\[ |F_t(\theta; x) - f_t(\theta; x)| \le t^{1/\alpha + \delta_{\mathrm{regr}}} W(x). \tag{2.12} \]

3. Regressor function $F_t(\theta; x)$ is differentiable in $\theta$, and the following bounds hold:
\[ |\nabla_\theta F_t(\theta; x)| \le t\,W(x), \tag{2.13} \]
\[ |\nabla_\theta F_t(\theta; x) - t\,\nabla_\theta a(\theta; x)| \le t^{1+\gamma} W(x), \tag{2.14} \]
\[ |F_t(\theta; x) - F_t(\theta_0; x) - t\,\nabla_\theta a(\theta_0; x) \cdot (\theta - \theta_0)| \le t\,|\theta - \theta_0|\,\big( |\theta - \theta_0|^\gamma + t^\gamma \big)\,W(x). \tag{2.15} \]

Remark 2.2. Without loss of generality, we can and will assume that $\delta_{\mathrm{regr}} \le \delta_{\mathrm{drift}}$; see Assumption 2.3 below for the definition of $\delta_{\mathrm{drift}}$. Then, by (2.10), the assumption (2.12) holds for any solution to (2.9). Moreover, it still holds with the solution $f_t(\theta; x)$ replaced by the approximate solution $\bar f_t(\theta; x)$ involved in the semi-explicit representation of the transition density $p_t(\theta; x, y)$ of the process; see Appendix A and the discussions therein.

We illustrate the above assumption with several natural examples of regressors.

Example 2.1 (Euler scheme). One natural choice of the regressor is the one used in the classical Euler scheme: $F_t(\theta; x) = x + t\,a(\theta; x)$. For this choice, the conditions (2.11), (2.14) hold trivially, and the conditions (2.13), (2.15) hold whenever
\[ |\nabla_\theta a(\theta, x)| \le W(x), \qquad |\nabla_\theta a(\theta, x) - \nabla_\theta a(\theta_0, x)| \le |\theta - \theta_0|^\gamma W(x). \tag{2.16} \]
To verify (2.12), we assume that Assumption 2.1 holds true. By the first condition in this assumption, $a(\theta; x)$ has at most linear growth in $x$, uniformly in $\theta$, and thus
\[ |f_t(\theta; x) - x| \le \int_0^t |a(\theta; f_s(\theta; x))|\,ds \le C\,t\,(1 + |x|), \]
\[ |F_t(\theta; x) - f_t(\theta; x)| \le \int_0^t |a(\theta; x) - a(\theta; f_s(\theta; x))|\,ds \le C\,t^{1+\eta}\,(1 + |x|). \]
Therefore, (2.12) holds true with $\delta_{\mathrm{regr}} = 1 + \eta - \frac{1}{\alpha}$ and $W(x) = C(1 + |x|^p)$, whenever $p \ge 2$ is such that (2.16) holds true.

Remark 2.3. We need $\delta_{\mathrm{regr}} > 0$ and, moreover, by Assumption 2.3 and Remark 2.4 below, $\delta_{\mathrm{regr}} > \frac{1}{2}$. This gives the following lower bound on $\alpha$ for the Euler scheme-based regressor to be applicable in our setting:
\[ \alpha > \frac{2}{1 + 2\eta}, \tag{2.17} \]
which equals $\alpha > 2/3$ in the regular case $\eta = 1$.

Example 2.2 (Improved Euler schemes). Continuing from Example 2.1, we can show that additional regularity of $a(\theta; x)$ allows one to weaken the bound (2.17) on $\alpha$ by using improved Euler schemes based on the Taylor expansion. For instance, if $\partial_x a(\theta; \cdot)$ is bounded and Hölder continuous with index $\eta$ uniformly in $\theta$, then we can put
\[ F_t(\theta; x) = x + t\,a(\theta; x) + \tfrac{1}{2}\,t^2\,a(\theta; x)\,\partial_x a(\theta; x). \]
Indeed, a direct computation then gives (we may set $t \le 1$)
\[ |F_t(\theta; x) - f_t(\theta; x)| \le C\,(1 + |x|^{1+\eta})\,t^{2+\eta}, \]
and therefore (2.12) holds true with $\delta_{\mathrm{regr}} = 2 + \eta - \frac{1}{\alpha}$ and the same $W(x)$ as in Example 2.1. This gives the following lower bound on $\alpha$:
\[ \alpha > \frac{2}{3 + 2\eta}. \tag{2.18} \]
In addition, (2.11) holds trivially, and the conditions (2.13), (2.14), (2.15) hold under (2.16) combined with
\[ \big| \nabla_\theta \partial_x a(\theta, x) \big| \le W(x). \]
The condition (2.18) can be relaxed further by using higher-order improved Euler schemes.

In some particularly important cases, the solution to (2.9) can be given explicitly and used as the regressor.

Example 2.3 (Linear ODE). Let $a(\theta; x) = \theta_1 + \theta_2 x$ for $\theta = (\theta_1, \theta_2)$. Then $f_t(\theta; x) = e^{\theta_2 t} x + \theta_1 t\,\psi(\theta_2 t)$ with the smooth function $\psi(x) = (e^x - 1)/x$. Taking $F_t(\theta; x) = f_t(\theta; x)$, we get (2.12) with arbitrarily large $\delta_{\mathrm{regr}}$. It is also easy to show that the conditions (2.11), (2.13), (2.14), (2.15) hold with $W(x) = C(1 + |x|)$.
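The $O(t^{1+\eta})$ gap between the Euler regressor of Example 2.1 and the exact flow can be checked numerically in the linear case of Example 2.3, where $\eta = 1$ and halving $t$ should divide the gap by about four. The drift parameters and the starting point below are arbitrary illustrative values.

```python
import math

theta1, theta2, x = 0.5, -1.3, 2.0

def euler(t):
    # Euler regressor F_t(theta; x) = x + t * a(theta; x), a(theta; x) = theta1 + theta2*x
    return x + t * (theta1 + theta2 * x)

def flow(t):
    # Exact flow from Example 2.3: f_t(theta; x) = e^{theta2 t} x + theta1 t psi(theta2 t),
    # with psi(u) = (e^u - 1)/u.
    psi = (math.exp(theta2 * t) - 1.0) / (theta2 * t)
    return math.exp(theta2 * t) * x + theta1 * t * psi

# Order-t^2 error: halving t should reduce |F_t - f_t| by a factor of about 4.
e1 = abs(euler(0.02) - flow(0.02))
e2 = abs(euler(0.01) - flow(0.01))
print(e1 / e2)  # close to 4
```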
Example 2.4 (Bernoulli's ODE). Let $a(\theta; x) = \theta_1 x^{\langle\kappa\rangle} + \theta_2 x$ for $\theta = (\theta_1, \theta_2)$, where $x^{\langle\kappa\rangle} := |x|^\kappa \operatorname{sgn}(x)$. Then the function
\[ f_t(\theta; x) = \Big( e^{(1-\kappa)\theta_2 t}\,|x|^{1-\kappa} + (1-\kappa)\theta_1 t\,\psi\big((1-\kappa)\theta_2 t\big) \Big)_+^{\frac{1}{1-\kappa}} \operatorname{sgn}(x) \]
solves (2.9); note that for $\kappa < 1$ and $\theta_1 > 0$ this solution is not unique. Taking $F_t(\theta; x) = f_t(\theta; x)$, we get (2.12) with arbitrarily large $\delta_{\mathrm{regr}}$. Assuming $\kappa < 1$, which is required for Assumption 2.1 to be satisfied, it is easy to show that the conditions (2.11), (2.13), (2.14), (2.15) hold with $W(x) = C(1 + |x|)$.

Next, we impose the following assumption on the discretization step $h_n$.

Assumption 2.3 (Statistic stability assumption). There exists a positive constant $\delta$ such that
\[ \delta < \delta_{\mathrm{drift}} := \frac{\alpha + \eta - 1}{\alpha}, \qquad \delta < \delta_\sigma := \frac{\zeta}{\alpha}, \qquad \delta < \delta_\nu := \frac{\alpha - \beta}{\alpha}, \qquad \delta < \delta_{\mathrm{regr}}, \]
and
\[ n h_n^{2\delta} \to 0. \tag{2.19} \]

Remark 2.4. Under (2.2), the condition (2.19) yields that $\delta > \frac{1}{2}$. This actually puts stronger assumptions on $\eta$, $\zeta$ and $\beta$ than those imposed primarily:
\[ \frac{\alpha}{2} + \eta > 1, \qquad \zeta > \frac{\alpha}{2}, \qquad \beta < \frac{\alpha}{2}. \]
In the regular case $\eta = \zeta = 1$, these conditions reduce to $\beta < \frac{\alpha}{2}$ only, which in particular requires the nuisance part $|\nu|(du)$ (recall (2.6)) to be of finite variation.

Finally, we will require a certain balance condition between the regularity of the 'nuisance' part $\nu(du)$ of the Lévy measure and the stability index $\alpha$. Denote $B_r(x) = \{y : |y - x| \le r\}$, $x \in \mathbb{R}$, $r \ge 0$. We will use Assumption 2.4 below to prove the crucial estimate for the residual term in the decomposition of the transition density; see Theorem A.2.

Assumption 2.4. There exist constants $\kappa \ge 0$, $\beta' > 0$, $C > 0$ such that $1 - \kappa < \beta' < \alpha$ and
\[ |\nu|(B_r(z)) \le C\,r^\kappa\,|z|^{-\beta' - \kappa}, \qquad r \le \tfrac{1}{2}|z|, \quad z \in \mathbb{R}. \tag{2.20} \]

Remark 2.5.
Condition (2.20) has the same spirit as the notion of a '$\kappa$-measure', which requires the measure of a ball of radius $r$ to be bounded by $Cr^\kappa$; see e.g. [14]. The additional factor $|z|^{-\beta'-\kappa}$ appears here because we extend this notion to Lévy measures, which may 'explode' near the origin. It is easy to verify that, for a measure satisfying (2.20), the index condition (2.6) holds with $\beta = \beta'$. Therefore, such an extension is well adjusted to the basic setup.

To illustrate Assumption 2.4, let us consider two natural examples.

Example 2.5. Let $\kappa = 0$, which means that there is no actual regularity limitation on the measure $\nu(du)$. The bound (2.20) for $\kappa = 0$ is equivalent to the following: (2.6) holds for $\beta = \beta'$ and for all $\varepsilon > 0$. With a minor re-arrangement, we can formulate the following equivalent condition for Assumption 2.4 to hold with $\kappa = 0$: $\alpha > 1$, the "small-jump intensity" condition (2.6) holds with some $\beta < \alpha$, and the following "tail condition" holds with some $\beta'' > 1$:
\[ |\nu|(\{u : |u| \ge r\}) \le C\,r^{-\beta''}, \qquad r \ge 1. \tag{2.21} \]
Indeed, we can assume without loss of generality that $\beta'' < \alpha$, and then, combining (2.6) and (2.21), we see that Assumption 2.4 holds true for $\kappa = 0$ with $\beta' = \max\{\beta, \beta''\}$. Note that $\beta'$, unlike $\beta$, is not involved in any other assumption such as (the statistic stability) Assumption 2.3; hence we are free to choose it close to $\alpha$.

Example 2.6. Let $\kappa = 1$, so that there is no limitation on $\alpha > 0$. Considerations similar to those in Example 2.5 lead to the following equivalent condition for Assumption 2.4 to hold true with $\kappa = 1$: $\alpha \in (0,2)$ is arbitrary and $\nu(du)$ admits a density $n(u)$ with respect to the Lebesgue measure such that
\[ |n(u)| \le C\,|u|^{-\beta - 1}\,\mathbb{1}_{|u| \le 1} + C\,|u|^{-\beta'' - 1}\,\mathbb{1}_{|u| > 1}, \qquad \beta < \alpha, \quad \beta'' > 0. \tag{2.22} \]
This is a very mild and easy-to-verify condition when the Lévy measure $\mu(du)$ admits an explicit density; recall also the condition $\beta < \alpha/2$ mentioned in Remark 2.4. See also Example C.1.

3 Main results

We study the asymptotic distribution of the LAD estimator $\hat\theta_n$, defined as any minimizer of the random function (1.1) for a weight function $V(x)$:
\[ L_n(\theta) := \sum_{k=1}^{n} V(X_{t_{k-1,n}}) \left| X_{t_{k,n}} - F_{h_n}(\theta; X_{t_{k-1,n}}) \right|. \]
We formulate our main results separately for the two principal cases: a finite observation horizon $T_n = n h_n \to T \in (0, \infty)$ and an infinite observation horizon $T_n \to \infty$.

3.1 Finite observation horizon

In this case, the observed trajectory of $X$ is a.s. bounded, namely $\sup_{t \le T} |X_t(\omega)| \le C(\omega)$ for an a.s. finite random variable $C(\omega)$; hence the weight $V(x)$ is not important for the entire consideration. Thus, for simplicity, we take $V(x) \equiv 1$ in this case. Denote
\[ \Gamma_0 = \Gamma_0(\theta_0) = \frac{1}{T} \int_0^T \nabla_\theta a(\theta_0; X_t)^{\otimes 2}\,dt, \qquad \Sigma_0 = \Sigma_0(\theta_0) = \frac{1}{T} \int_0^T \frac{1}{\sigma(X_t)}\,\nabla_\theta a(\theta_0; X_t)^{\otimes 2}\,dt. \]
For an $\mathcal{F}$-measurable non-negative definite $m \times m$ random matrix $A$, we will use the symbol $MN(0, A(\omega))$ to denote the distribution of a random vector of the form $A^{1/2}\xi$, with $\xi$ a standard $m$-dimensional Gaussian random vector defined on an extended probability space and independent of $\mathcal{F}$. Further, we denote by
\[ \phi_\alpha(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ix\xi - |\xi|^\alpha}\,d\xi \tag{3.1} \]
the $\alpha$-stable density with the characteristic function $\xi \mapsto e^{-|\xi|^\alpha}$, and by $\Rightarrow$ weak convergence.

Theorem 3.1. Let Assumptions 2.1 to 2.3 hold. Assume also the following identifiability conditions:

- (Global identifiability): for any $\theta \ne \theta_0$,
\[ \Lambda_0(\theta) = \int_0^T \big( a(\theta; X_t) - a(\theta_0; X_t) \big)^2\,dt > 0 \quad \text{a.s.} \]

- (Local identifiability): $P(\Gamma_0 \text{ is positive definite}) = 1$.

Then, the LAD estimator satisfies
\[ \sqrt{n}\,h_n^{1-1/\alpha}\,(\hat\theta_n - \theta_0) \Rightarrow MN\big(0,\ (2\phi_\alpha(0))^{-2}\,\Sigma_0^{-1}\Gamma_0\Sigma_0^{-1}\big). \]
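The factor $(2\phi_\alpha(0))^{-2}$ in the limit covariance is explicit: from (3.1), $\phi_\alpha(0) = \frac{1}{\pi}\int_0^\infty e^{-\xi^\alpha}\,d\xi = \frac{\Gamma(1/\alpha)}{\alpha\pi}$. A quick numerical cross-check of this closed form (the truncation point and grid size are arbitrary choices):

```python
import math
import numpy as np

def phi_alpha_at_zero(alpha, cutoff=60.0, npts=600001):
    # phi_alpha(0) = (1/(2*pi)) int_R exp(-|xi|^alpha) dxi
    #             = (1/pi) int_0^inf exp(-xi^alpha) dxi  (trapezoid rule below)
    xi = np.linspace(0.0, cutoff, npts)
    y = np.exp(-xi ** alpha)
    dx = xi[1] - xi[0]
    return dx * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1]) / math.pi

for alpha in (1.0, 1.5, 1.9):
    closed_form = math.gamma(1.0 / alpha) / (alpha * math.pi)
    print(alpha, round(phi_alpha_at_zero(alpha), 6), round(closed_form, 6))
```

For $\alpha = 1$ (the Cauchy case) both expressions give $1/\pi$, the Cauchy density at the origin.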
3.2 Infinite observation horizon

When $T_n \to \infty$, we further assume the following conditions.

Assumption 3.1 (Drift dissipation and tail moments). There exists a constant $\kappa > -1$ such that
\[ \limsup_{|x| \to \infty} \frac{a(\theta_0, x)\operatorname{sgn}(x)}{|x|^\kappa} < 0. \]
In addition,
\[ \int_{|u| > 1} |u|^q\,\mu(du) < \infty \]
for some $q > 0$ with $\kappa + q > 1$.

This assumption ensures that the process $X$ is ergodic; see Section B.2. Denote by $\pi(\theta_0, dx)$ the corresponding unique invariant probability measure, and put
\[ \Gamma_0 = \Gamma_0(\theta_0) = \int_{\mathbb{R}} V(x)^2\,\nabla_\theta a(\theta_0; x)^{\otimes 2}\,\pi(\theta_0, dx), \qquad \Sigma_0 = \Sigma_0(\theta_0) = \int_{\mathbb{R}} \frac{V(x)}{\sigma(x)}\,\nabla_\theta a(\theta_0; x)^{\otimes 2}\,\pi(\theta_0, dx). \]
Unlike in the case of a finite observation horizon, we must now address integrability.

Assumption 3.2 (Integrability condition for weights). There exists $p > 1$ such that
\[ \sup_{t \ge 0} E\big[ V(X_t)^p\,W(X_t)^{2p} \big] < \infty, \]
where $W(x)$ is the weight function given in Assumption 2.2.

Remark 3.1 (On the choice of $V(x)$). Assumption 3.2 exhibits the role of the function $V$: while the weight $W$ from Assumption 2.2 on the drift coefficient and the regressor may grow, the function $V$ should decay at $\infty$ fast enough to ensure that $V(X_t)W(X_t)^2$ belongs to $L^{1+\varepsilon}$ uniformly in $t > 0$. Using Assumption 3.1, one can choose $V$ explicitly in the natural case $W(x) = (1 + |x|)^{p_W}$, $p_W \ge 0$. Indeed, by [17], Assumption 3.1 provides that, for any $p_X < q + \kappa - 1$,
\[ \sup_{t \ge 0} E[|X_t|^{p_X}] \le C + |X_0|^{p_X}. \]
Thus $V(x) = (1 + |x|)^{-p_V}$ satisfies Assumption 3.2 whenever
\[ 2p_W - p_V < q + \kappa - 1 \iff p_V > 2p_W + 1 - q - \kappa. \]
Since $V$ is meant to damp the growth of the weight function $W$, we will furthermore assume that $V$ is bounded. We could set $V(x) \equiv 1$ from the beginning if $Z$ can be supposed to be light-tailed, e.g. if $\int_{|u| > 1} |u|^K\,\mu(du) < \infty$ for every $K > 0$.

Theorem 3.2. Let Assumptions 2.1 to 2.3, 3.1, and 3.2 hold.
Assume also the following identifiability conditions:

- (Global identifiability): for any $\theta \ne \theta_0$,
\[ \Lambda_0(\theta) = \int_{\mathbb{R}} V(x)^2 \big( a(\theta; x) - a(\theta_0; x) \big)^2\,\pi(\theta_0, dx) > 0; \]

- (Local identifiability): $\Gamma_0$ is positive definite.

Then, the LAD estimator satisfies
\[ \sqrt{n}\,h_n^{1-1/\alpha}\,(\hat\theta_n - \theta_0) \Rightarrow N\big(0,\ (2\phi_\alpha(0))^{-2}\,\Sigma_0^{-1}\Gamma_0\Sigma_0^{-1}\big). \]

Remark 3.2. The convergence rate $\sqrt{n}\,h_n^{1-1/\alpha}$ is known to be optimal in some situations. Indeed, local asymptotic normality (LAN) has been proved for several locally stable Lévy processes. Concerning the symmetric stable Lévy process, we refer to [19] for the degenerate LAN with diagonal norming and [2] for the non-degenerate LAN with asymmetric non-diagonal norming. For a general locally stable Lévy process with known index, [11] derived the LAN property.¹ Some case studies, including other specific locally stable Lévy processes, can be found in [22]. As for related (locally) stable SDE models, [4] and [7] studied local asymptotic mixed normality (LAMN) when the index is known.

4 Proofs, I: Consistency at a sub-optimal rate

In what follows, we denote
\[ r_n = r_n(\alpha) = \sqrt{n}\,h_n^{1-1/\alpha}. \]
In this preparatory section, we prove that the LAD type estimator is consistent with rate $r_n^{-\varsigma}$ for any $\varsigma \in (0,1)$:
\[ \hat\theta_n - \theta_0 = O_P(r_n^{-\varsigma}). \tag{4.1} \]
This statement is weaker than the main statement, which yields (4.1) with $\varsigma = 1$. However, we will need this weaker statement in order to achieve the optimal rate $r_n^{-1}$, and its proof is already quite technically involved. Therefore, for the benefit of the reader, we discuss this proof separately, hoping that this will make the entire argument easier to follow. We separate the proof of (4.1) into several steps, gradually improving the rate of consistency. In each of the steps, we will use essentially the same set of tools, which we will introduce now.
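To see how the sampling scheme interacts with the statistic stability condition (2.19), take the standard parametrization $h_n = n^{-\rho}$ (an illustrative choice, not imposed by the paper). Then $T_n = n^{1-\rho}$, so (2.2) requires $\rho \le 1$, while $n h_n^{2\delta} = n^{1-2\delta\rho} \to 0$ requires $\rho > 1/(2\delta)$; both can hold only if $\delta > 1/2$, exactly as noted in Remark 2.4. A small sketch:

```python
# Illustrative check of (2.19) under h_n = n^{-rho}:
#   n * h_n^{2*delta} = n^{1 - 2*delta*rho} -> 0   iff   rho > 1/(2*delta),
# while (2.2) (liminf T_n = n^{1-rho} > 0) needs rho <= 1.

def admissible_rho(delta):
    """Open interval of exponents rho for which both (2.2) and (2.19)
    hold with h_n = n^{-rho}; None if the interval is empty."""
    lo = 1.0 / (2.0 * delta)
    return (lo, 1.0) if lo < 1.0 else None

print(admissible_rho(0.75))  # non-empty interval, since delta > 1/2
print(admissible_rho(0.4))   # None: delta <= 1/2 is impossible
```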
¹There was a minor mistake about the asymptotic covariance matrix in [11, Theorem 2.1]: the off-diagonal element should be non-null in the case of asymmetric jumps.

4.1 Contrast function: definition and representation

We define the contrast function as
\[ H_n(\theta) = \frac{1}{r_n \sqrt{n}\,h_n} \big( L_n(\theta) - L_n(\theta_0) \big) = \frac{1}{r_n^2\,h_n^{1/\alpha}} \big( L_n(\theta) - L_n(\theta_0) \big), \]
which is a.s. continuous in $\theta \in \Theta$. Then $H_n(\hat\theta_n) \le H_n(\theta_0) = 0$ a.s., and in order to prove consistency with some rate $\rho_n > 0$, in the sense that $P(|\hat\theta_n - \theta_0| < \rho_n) \to 1$, it is sufficient to show that
\[ P\Big( \inf_{\theta : |\theta - \theta_0| \ge \rho_n} H_n(\theta) > 0 \Big) \to 1. \tag{4.2} \]
To derive (4.2) with various rates $\rho_n$, we will systematically use the following representation of the contrast function. Denote
\[ \zeta_{k,n} = h_n^{-1/\alpha} \big( X_{t_{k,n}} - F_{h_n}(\theta_0; X_{t_{k-1,n}}) \big) V(X_{t_{k-1,n}}), \qquad \kappa_{k,n}(\theta) = h_n^{-1/\alpha} \big( F_{h_n}(\theta; X_{t_{k-1,n}}) - F_{h_n}(\theta_0; X_{t_{k-1,n}}) \big) V(X_{t_{k-1,n}}); \]
then we have
\[ H_n(\theta) = \frac{1}{r_n^2} \sum_{k=1}^{n} \big( |\zeta_{k,n} - \kappa_{k,n}(\theta)| - |\zeta_{k,n}| \big). \]
Define a non-negative function $q(x, v)$ by
\[ q(x, v) = \begin{cases} (2v - 2x)\,\mathbb{1}_{x \in [0, v)}, & v \ge 0, \\ (2x - 2v)\,\mathbb{1}_{x \in (v, 0]}, & v < 0. \end{cases} \tag{4.3} \]
A direct computation shows that $|q(x, v) - q(x, v')| \le 2|v - v'|$ and
\[ |x - v| - |x| = -v \operatorname{sgn}(x) + q(x, v), \qquad x \ne 0, \]
which leads to the decomposition
\[ H_n(\theta) = \frac{1}{r_n^2} \sum_{k=1}^{n} u_{k,n}(\theta) + \frac{1}{r_n^2} \sum_{k=1}^{n} y_{k,n}(\theta) =: U_n(\theta) + Y_n(\theta) \tag{4.4} \]
with
\[ u_{k,n}(\theta) := -\kappa_{k,n}(\theta) \operatorname{sgn}(\zeta_{k,n}), \qquad y_{k,n}(\theta) := q(\zeta_{k,n}, \kappa_{k,n}(\theta)). \]
The heuristics behind this decomposition can be explained as follows: formally, we have
\[ v \operatorname{sgn}(x) = (|x|)'\,v, \qquad v^{-2} q(x, v) \to \delta_0(x) = \tfrac{1}{2}(|x|)'', \quad v \to 0, \]
which means that $U_n(\theta)$ and $Y_n(\theta)$ essentially correspond to the linear and the quadratic terms in the Taylor expansion of the random function $H_n(\theta)$.
This seemingly vague explanation is actually quite informative: we will see later that, on a proper restriction of the set $\Theta$ in which the value of the LAD estimator $\hat\theta_n$ is contained, the functions $U_n(\theta)$ and $Y_n(\theta)$ are linear and quadratic, respectively, up to negligible summands. With this perspective in mind, we will call $U_n(\theta)$ and $Y_n(\theta)$ the linear term and the quadratic term, respectively.

We further decompose the linear and quadratic terms into their martingale and predictable parts as follows:
\[ U_n(\theta) = \frac{1}{r_n^2} \sum_{k=1}^{n} u^M_{k,n}(\theta) + \frac{1}{r_n^2} \sum_{k=1}^{n} u^P_{k,n}(\theta) =: U^M_n(\theta) + U^P_n(\theta), \tag{4.5} \]
\[ Y_n(\theta) = \frac{1}{r_n^2} \sum_{k=1}^{n} y^M_{k,n}(\theta) + \frac{1}{r_n^2} \sum_{k=1}^{n} y^P_{k,n}(\theta) =: Y^M_n(\theta) + Y^P_n(\theta), \tag{4.6} \]
with
\[ u^P_{k,n}(\theta) := E[u_{k,n}(\theta) \mid \mathcal{F}_{k-1,n}], \quad u^M_{k,n}(\theta) := u_{k,n}(\theta) - u^P_{k,n}(\theta), \qquad y^P_{k,n}(\theta) := E[y_{k,n}(\theta) \mid \mathcal{F}_{k-1,n}], \quad y^M_{k,n}(\theta) := y_{k,n}(\theta) - y^P_{k,n}(\theta). \]
Later, we will show that:

- for the linear term, the martingale part $U^M_n(\theta)$ is the principal one, and the predictable part is, in a sense, negligible;
- for the quadratic term, the predictable part $Y^P_n(\theta)$ is the principal one, and the martingale part is, in a sense, negligible.

4.2 Estimates for the martingale and predictable parts

We will repeatedly use, in slightly different settings, essentially the same argument to estimate the martingale and the 'negligible predictable' parts in the decompositions (4.5), (4.6) and their analogues. Therefore, for the convenience of the reader, we give this argument separately.

Lemma 4.1. We have
\[ \sup_{\theta \ne \theta_0} \frac{|U^P_n(\theta)|}{|\theta - \theta_0|} = o_P(r_n^{-1}), \qquad n \to \infty. \tag{4.7} \]

Proof. Since $\kappa_{k,n}(\theta)$ is $\mathcal{F}_{k-1,n}$-measurable, we have
\[ E[u_{k,n}(\theta) \mid \mathcal{F}_{k-1,n}] = -\kappa_{k,n}(\theta)\,E[\operatorname{sgn}(\zeta_{k,n}) \mid \mathcal{F}_{k-1,n}]. \]
Let us estimate the terms of this product separately.
For the first term, we have, simply by (2.13),
\[ |\kappa_{k,n}(\theta)| = h_n^{-1/\alpha} \big| F_{h_n}(\theta; X_{t_{k-1,n}}) - F_{h_n}(\theta_0; X_{t_{k-1,n}}) \big|\,V(X_{t_{k-1,n}}) \le C\,h_n^{-1/\alpha + 1}\,|\theta - \theta_0|\,V(X_{t_{k-1,n}})\,W(X_{t_{k-1,n}}). \tag{4.8} \]
To estimate the second term, we use the decomposition (A.1) for the transition density of the process $X$:
\[ E[\operatorname{sgn}(\zeta_{k,n}) \mid \mathcal{F}_{k-1,n}] = \int_{\mathbb{R}} \operatorname{sgn}\!\left( \frac{y - F_{h_n}(\theta_0; X_{t_{k-1,n}})}{h_n^{1/\alpha}} \right) \widetilde{p}_{h_n}(X_{t_{k-1,n}}, y)\,dy + \int_{\mathbb{R}} \operatorname{sgn}\!\left( \frac{y - F_{h_n}(\theta_0; X_{t_{k-1,n}})}{h_n^{1/\alpha}} \right) r_{h_n}(X_{t_{k-1,n}}, y)\,dy, \tag{4.9} \]
and since $|\operatorname{sgn}(\cdot)| \le 1$, (A.3) guarantees that the second summand is bounded by $C h_n^\delta$; here and below we assume $\delta$ to be fixed and to satisfy all the relations in Assumption 2.3. The first summand equals
\[ \int_{\mathbb{R}} \operatorname{sgn}\!\left( \frac{y - F_{h_n}(\theta_0; X_{t_{k-1,n}})}{h_n^{1/\alpha}} \right) \frac{1}{\sigma_{h_n}(X_{t_{k-1,n}})\,h_n^{1/\alpha}}\, \phi_\alpha\!\left( \frac{y - \bar f_{h_n}(\theta_0; X_{t_{k-1,n}})}{\sigma_{h_n}(X_{t_{k-1,n}})\,h_n^{1/\alpha}} \right) dy = \int_{\mathbb{R}} \operatorname{sgn}\!\left( z - \frac{F_{h_n}(\theta_0; X_{t_{k-1,n}}) - \bar f_{h_n}(\theta_0; X_{t_{k-1,n}})}{\sigma_{h_n}(X_{t_{k-1,n}})\,h_n^{1/\alpha}} \right) \phi_\alpha(z)\,dz, \]
whose absolute value is bounded by
\[ C\,\frac{\big| F_{h_n}(\theta_0; X_{t_{k-1,n}}) - \bar f_{h_n}(\theta_0; X_{t_{k-1,n}}) \big|}{\sigma_{h_n}(X_{t_{k-1,n}})\,h_n^{1/\alpha}}, \]
where, in the last inequality, we have used that $\phi_\alpha$ is symmetric and bounded. Recall that (2.12) holds true with $f_t$ replaced by $\bar f_t$; see Remark 2.2 and the discussion in Appendix A. Since $\sigma_t(x)$ is bounded away from 0, the first summand in (4.9) is bounded by $C\,h_n^{\delta_{\mathrm{regr}}}\,W(X_{t_{k-1,n}}) \le C\,h_n^{\delta}\,W(X_{t_{k-1,n}})$.

Combining the above bounds, we obtain
\[ \sup_{\theta \ne \theta_0} \frac{|U^P_n(\theta)|}{|\theta - \theta_0|} \le \frac{C}{r_n^2}\,h_n^{\delta}\,h_n^{-1/\alpha + 1} \sum_{k=1}^{n} V(X_{t_{k-1,n}})\,W(X_{t_{k-1,n}})^2 \le \frac{C\,n^{1/2} h_n^{\delta}}{r_n} \left( \frac{1}{n} \sum_{k=1}^{n} V(X_{t_{k-1,n}})\,W(X_{t_{k-1,n}})^2 \right). \]
Observe that
\[ \frac{1}{n} \sum_{k=1}^{n} V(X_{t_{k-1,n}})\,W(X_{t_{k-1,n}})^2 = O_P(1), \qquad n \to \infty. \]
In the finite observation horizon case, this is obvious because the weight functions are locally bounded and $X$ has a.s.
bounded trajectories; in the infinite observation horizon case, this follows from Assumption 3.2. Thus,
\[ \sup_{\theta \ne \theta_0} \frac{|U^P_n(\theta)|}{|\theta - \theta_0|} = O_P\big( n^{1/2} h_n^{\delta}\,r_n^{-1} \big), \qquad n \to \infty, \]
and applying the statistic stability Assumption 2.3 completes the proof.

Lemma 4.2. 1. For any $\theta \ne \theta_0$, we have
\[ \frac{|U^M_n(\theta)| + |Y^M_n(\theta)|}{|\theta - \theta_0|} = O_P(r_n^{-1}), \qquad n \to \infty. \tag{4.10} \]
2. For any $\upsilon \in (0,1)$, we have
\[ \sup_{\theta \ne \theta_0} \frac{|U^M_n(\theta)| + |Y^M_n(\theta)|}{|\theta - \theta_0|^\upsilon} = O_P(r_n^{-1}), \qquad n \to \infty. \tag{4.11} \]

Remark 4.1. The denominator $|\theta - \theta_0|^\upsilon$ in the uniform bound is worse than the term $|\theta - \theta_0|$ which we have in the individual one, and which is the same as in the previous lemma. This is one of the main sources of the technical complications we will experience later, and it seems inevitable because the method of proof is based on the Kolmogorov-Chentsov theorem. See also the first paragraph of Section 5.

Proof of Lemma 4.2. 1. We have
\[ |u^M_{k,n}(\theta)| \le 2|\kappa_{k,n}(\theta)|, \qquad |y^M_{k,n}(\theta)| \le 2|\kappa_{k,n}(\theta)|, \]
because $|q(x, v)| \le 2|v|$, $|v \operatorname{sgn}(x)| \le |v|$, and $\kappa_{k,n}(\theta)$ is $\mathcal{F}_{k-1,n}$-measurable. From now on, we will prove the required bounds for $U^M_n(\theta)$ only; the proof for $Y^M_n(\theta)$ is literally the same.

In the finite observation horizon case, fix $R > 0$ and denote
\[ \tau_R = \inf\{t \le T : |X_t| > R\}, \]
with the usual convention $\inf \emptyset = T$. Then,
\[ E\big[ U^M_n(\theta)^2\,\mathbb{1}_{\tau_R = T} \big] = \frac{1}{r_n^4}\,E \sum_{k : t_{k-1,n} < \tau_R} E\big[ u^M_{k,n}(\theta)^2 \mid \mathcal{F}_{k-1,n} \big] \le \frac{4 n h_n^{2 - 2/\alpha}}{r_n^4}\,|\theta - \theta_0|^2 \sup_{|x| \le R} W^2(x) \le C_R\,\frac{n h_n^{2 - 2/\alpha}}{r_n^4}\,|\theta - \theta_0|^2 = \frac{C_R}{r_n^2}\,|\theta - \theta_0|^2. \]
This gives an analogue of (4.10) for $U^M_n(\theta)$ on the set $\{\tau_R = T\}$, and because $\{\tau_R = T\} \uparrow \Omega$ as $R \to \infty$, this actually proves (4.10) for $U^M_n(\theta)$.
In the infinite observation horizon case, the above estimate should be combined with a localization argument based on Assumption 3.2 on the weights; we omit the details, referring to the second part of the proof, where the same issue is treated in a more complicated setting.

2. In order to get the uniform estimate, we apply the Kolmogorov-Chentsov criterion for the existence of a Hölder continuous modification. For that purpose, we need to estimate the higher-order moments of the (properly localized) differences of $U_n^M(\theta)$. Namely, for any $\theta, \theta'$, we consider the martingale
\[
U^M_{r,n}(\theta, \theta') := \sum_{k=1}^{r} \big(u^M_{k,n}(\theta) - u^M_{k,n}(\theta')\big), \quad r\ge0.
\]
Recall that $u^M_{k,n}(\theta_0) = 0$, so that $U_n^M(\theta) = r_n^{-2}\, U^M_{n,n}(\theta, \theta_0)$. We have
\[
\big(u^M_{k,n}(\theta) - u^M_{k,n}(\theta')\big)^2 \le 4\big(\kappa_{k,n}(\theta) - \kappa_{k,n}(\theta')\big)^2
\le C h_n^{2-2/\alpha}\, |\theta-\theta'|^2\, V(X_{t_{k-1,n}})^2\, W(X_{t_{k-1,n}})^2.
\]
Let $\tau$ be a stopping time; then, by the Burkholder-Davis-Gundy inequality applied to the martingale $U^M_{\cdot,n}(\theta, \theta')$ stopped at $\tau$, we have for any $p>1$,
\[
E\Big[\big|U^M_{\tau\wedge n, n}(\theta, \theta')\big|^p\Big]
\le C_p\, h_n^{p - p/\alpha}\, |\theta-\theta'|^p\, E\bigg[\bigg(\sum_{k=1}^{\tau} V(X_{t_{k-1,n}})^2\, W(X_{t_{k-1,n}})^2\bigg)^{p/2}\bigg].
\]
Fix $R>0$ and take, in the above inequality,
\[
\tau = \tau_{n,R} = \min\bigg\{l : \sum_{k=0}^{l} V(X_{t_{k,n}})^2\, W(X_{t_{k,n}})^2 > R^2 n\bigg\}.
\]
Then,
\[
E\big[|U_n^M(\theta) - U_n^M(\theta')|^p\, \mathbf{1}_{\{\tau_{n,R} > n\}}\big]
\le r_n^{-2p}\, E\Big[\big|U^M_{\tau\wedge n, n}(\theta, \theta')\big|^p\Big]
\le C_p\, \frac{h_n^{p - p/\alpha}}{r_n^{2p}}\, |\theta-\theta'|^p\, E\bigg[\bigg(\sum_{k=1}^{\tau_{n,R}} V(X_{t_{k-1,n}})^2\, W(X_{t_{k-1,n}})^2\bigg)^{p/2}\bigg]
\le C_p R^p\, \frac{h_n^{p - p/\alpha}}{r_n^{2p}}\, n^{p/2}\, |\theta-\theta'|^p
= C_p R^p\, r_n^{-p}\, |\theta-\theta'|^p.
\]
Recall that $m$ denotes the dimension of $\theta$. For any fixed $\upsilon\in(0,1)$ we can choose $p$ large enough so that $p - m > \upsilon p$, and for such $p$ the Kolmogorov-Chentsov theorem yields
\[
E\bigg[\bigg(\sup_{\theta\ne\theta'} \frac{|U_n^M(\theta) - U_n^M(\theta')|}{|\theta-\theta'|^{\upsilon}}\bigg)^p\, \mathbf{1}_{\{\tau_{n,R} > n\}}\bigg]
\le R^p\, C_{p,\upsilon,m}\, r_n^{-p}.
\]
We complete the proof by observing that
\[
\liminf_{n\to\infty} P(\tau_{n,R} > n) \to 1, \quad R\to\infty,
\]
because
\[
\frac{1}{n}\sum_{k=1}^n V(X_{t_{k-1,n}})^2\, W(X_{t_{k-1,n}})^2 = O_P(1), \quad n\to\infty,
\]
by Assumption 3.2 and our convention that $V$ is bounded; see Remark 3.1.

4.3 Basic consistency

Proposition 4.1. We have
\[
\hat\theta_n - \theta_0 = o_P(1), \quad n\to\infty. \tag{4.12}
\]
Proof. Let us estimate from below the predictable part $Y_n^P(\theta)$. For that, we perform the following 'double truncation' procedure. First, note that $q(x,v)\ge0$, and thus, for arbitrary $R>0$,
\[
Y_n^P(\theta) \ge Y_{n,R}^P(\theta) = \frac{1}{r_n^2}\sum_{k=1}^n y^P_{n,k}(\theta)\, \chi(R^{-1} X_{t_{k-1,n}}),
\]
where $\chi:\mathbb{R}\to[0,1]$ is a continuous function such that
\[
\chi(r) = \begin{cases} 1, & |r|\le1, \\ 0, & |r|\ge2. \end{cases}
\]
Second, we truncate the domain of integration in the representation
\[
y^P_{n,k}(\theta) = E\big[q(\zeta_{k,n}, \kappa_{k,n}(\theta)) \mid \mathcal{F}_{k-1,n}\big]
= \int_{\mathbb{R}} q\Big(\frac{y - F_{h_n}(\theta_0; X_{t_{k-1,n}})}{h_n^{1/\alpha}},\, \kappa_{k,n}(\theta)\Big)\, p_{h_n}(X_{t_{k-1,n}}, y)\,dy.
\]
Namely, instead of integrating with respect to $y\in\mathbb{R}$, we integrate over the smaller domain
\[
\big\{y : |y - F_{h_n}(\theta_0; X_{t_{k-1,n}})| \le h_n^{1/\alpha}\big\}.
\]
We use the decomposition (A.1) for the transition density $p_{h_n}(X_{t_{k-1,n}}, y)$ and observe that, on this domain, we have the following:
• since $\sigma_t(x)$ is bounded and separated from zero, there exists $c>0$ such that $\widetilde p_{h_n}(X_{t_{k-1,n}}, y) \ge 2c\, h_n^{-1/\alpha}$;
• by Theorem A.2, the bound (A.4) holds true, and thus $|r_{h_n}(X_{t_{k-1,n}}, y)| \le C h_n^{-1/\alpha+\delta'}$.
Then, for $n$ large enough that $C h_n^{\delta'} < c$, we have on this domain
\[
p_{h_n}(X_{t_{k-1,n}}, y) \ge \widetilde p_{h_n}(X_{t_{k-1,n}}, y) - |r_{h_n}(X_{t_{k-1,n}}, y)| \ge c\, h_n^{-1/\alpha},
\]
which gives the bound
\[
y^P_{n,k}(\theta) \ge c\, h_n^{-1/\alpha} \int_{|y - F_{h_n}(\theta_0; X_{t_{k-1,n}})| \le h_n^{1/\alpha}} q\Big(\frac{y - F_{h_n}(\theta_0; X_{t_{k-1,n}})}{h_n^{1/\alpha}},\, \kappa_{k,n}(\theta)\Big)\,dy
= c \int_{[-1,1]} q(z, \kappa_{k,n}(\theta))\,dz.
\]
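The truncated integral $\int_{[-1,1]} q(z,v)\,dz$ appearing here, and the bound (4.13) invoked just below, can be checked numerically. The following sketch assumes the explicit form $q(x,v) = |x-v| - |x| + v\operatorname{sgn}(x)$, which is our reading of formula (4.3) (not reproduced in this excerpt); the constant $c = 1$ is therefore illustrative only.

```python
import numpy as np

def q(x, v):
    # hypothetical reconstruction of formula (4.3): q(x, v) = |x - v| - |x| + v*sgn(x)
    return np.abs(x - v) - np.abs(x) + v * np.sign(x)

def integral_q(v, npts=200_001):
    # trapezoidal approximation of \int_{-1}^{1} q(z, v) dz
    z = np.linspace(-1.0, 1.0, npts)
    y = q(z, v)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(z)))

for v in [-3.0, -0.7, -0.1, 0.05, 0.5, 1.0, 2.5]:
    z = np.linspace(-1.0, 1.0, 200_001)
    assert np.all(q(z, v) >= -1e-12)                      # q >= 0
    assert np.all(np.abs(q(z, v)) <= 2 * abs(v) + 1e-12)  # |q| <= 2|v|
    assert integral_q(v) >= min(abs(v), v * v) - 1e-6     # (4.13) with c = 1
```

For $|v|\le1$ the integral equals $v^2$ exactly, and for $|v|>1$ it equals $2|v|-1 \ge |v|$, which is where the "quadratic near zero, linear at infinity" shape of (4.13) comes from.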
By a direct calculation, one can verify that
\[
\int_{[-1,1]} q(z, v)\,dz \ge c\, \big(|v| \wedge v^2\big), \quad v\in\mathbb{R}. \tag{4.13}
\]
Summarizing the above, we obtain
\[
Y_n^P(\theta) \ge \frac{c}{r_n^2} \sum_{k=1}^n \big(|\kappa_{k,n}(\theta)| \wedge |\kappa_{k,n}(\theta)|^2\big)\, \chi(R^{-1} X_{t_{k-1,n}}), \quad R\ge0, \tag{4.14}
\]
with the constant $c$ not depending on $R$. Using (2.11) and the fact that $W(\cdot)$ is locally bounded, we get, on the set $\{|X_{t_{k-1,n}}| \le 2R\}$,
\[
|\kappa_{k,n}(\theta)| \ge h_n^{-1/\alpha+1}\, \big|a(\theta; X_{t_{k-1,n}}) - a(\theta_0; X_{t_{k-1,n}})\big|\, V(X_{t_{k-1,n}}) - C_R\, h_n^{-1/\alpha+1+\gamma}. \tag{4.15}
\]
In addition, $a(\theta; x) - a(\theta_0; x)$ is bounded on $\Theta\times[-R,R]$ and $V(x)$ is bounded on $[-R,R]$; hence
\[
\big|a(\theta; x) - a(\theta_0; x)\big|\, V(x) \ge c_R\, \big(a(\theta; x) - a(\theta_0; x)\big)^2\, V(x)^2.
\]
Then, for $n$ large enough, (4.14) and (4.15), together with the elementary inequality $(a-b)^2 \ge a^2/2 - b^2$, give
\[
Y_n^P(\theta) \ge c\, \frac{h_n^{-1/\alpha+1} \wedge h_n^{-2/\alpha+2}}{r_n^2} \sum_{k=1}^n \Big(c_R\, \big(a(\theta; X_{t_{k-1,n}}) - a(\theta_0; X_{t_{k-1,n}})\big)^2\, V(X_{t_{k-1,n}})^2 - C_R\, h_n^{\gamma}\Big)\, \chi(R^{-1} X_{t_{k-1,n}})
\ge c\, \frac{h_n^{-1/\alpha+1} \wedge h_n^{-2/\alpha+2}}{r_n^2}\, \big(n c_R\, \Lambda_{n,R}(\theta) - n C_R\, h_n^{\gamma}\big),
\]
where we denoted
\[
\Lambda_{n,R}(\theta) = \frac1n \sum_{k=1}^n \big(a(\theta; X_{t_{k-1,n}}) - a(\theta_0; X_{t_{k-1,n}})\big)^2\, V(X_{t_{k-1,n}})^2\, \chi(R^{-1} X_{t_{k-1,n}}).
\]
Note that
\[
n\, \frac{h_n^{-1/\alpha+1} \wedge h_n^{-2/\alpha+2}}{r_n^2} = 1 \wedge h_n^{1/\alpha-1};
\]
hence we have proved the following family of lower bounds: for any $R>0$ there exist $c_R, C_R > 0$ and $N_R\in\mathbb{N}$ such that, for $n\ge N_R$,
\[
Y_n^P(\theta) \ge c_R\, \big(1 \wedge h_n^{1/\alpha-1}\big)\, \big(\Lambda_{n,R}(\theta) - C_R\, h_n^{\gamma}\big), \quad \theta\in\Theta. \tag{4.16}
\]
We are now ready to prove (4.12). We have
\[
\big(1 \wedge h_n^{1/\alpha-1}\big)\, r_n = \begin{cases} \sqrt n, & \alpha\in(0,1], \\ r_n, & \alpha\in[1,2), \end{cases} \quad \to\infty, \quad n\to\infty.
\]
Since $\Theta$ is bounded, this yields, together with (4.7) and (4.11), that
\[
\sup_{\theta\in\Theta} |H_n(\theta) - Y_n^P(\theta)| = o_P\big(1 \wedge h_n^{1/\alpha-1}\big), \quad n\to\infty. \tag{4.17}
\]
On the other hand, we have the family of lower bounds (4.16). Observe the following.
• In the finite observation horizon case, denote
\[
\Lambda_{0,R}(\theta) = \frac1T \int_0^T \big(a(\theta; X_t) - a(\theta_0; X_t)\big)^2\, \chi(R^{-1} X_t)\,dt
\]
and observe that, with probability 1, $\Lambda_{n,R}(\theta) \to \Lambda_{0,R}(\theta)$ as $n\to\infty$, uniformly in $\theta\in\Theta$; recall that $V(x)\equiv1$ in this case.
• In the infinite observation horizon case, we have the uniform convergence in probability
\[
\sup_\theta |\Lambda_{n,R}(\theta) - \Lambda_{0,R}(\theta)| \to 0, \quad n\to\infty, \tag{4.18}
\]
with the (non-random) limit
\[
\Lambda_{0,R}(\theta) = \int_{\mathbb{R}} V(x)^2\, \big(a(\theta; x) - a(\theta_0; x)\big)^2\, \chi(R^{-1} x)\, \pi(\theta_0, dx);
\]
see Appendix B.2.
We now complete the proof using the following standard argument. We have $H_n(\hat\theta_n) \le 0$; hence, for every $\varepsilon>0$, $\nu>0$, and $R>0$, we have by (4.16) and (4.17)
\[
\liminf_n P\big(|\hat\theta_n - \theta_0| < \varepsilon\big)
\ge \liminf_n P\bigg(\inf_{\theta:\, |\theta-\theta_0|\ge\varepsilon} H_n(\theta) > 0\bigg)
\ge \liminf_n P\bigg(\inf_{\theta:\, |\theta-\theta_0|\ge\varepsilon} H_n(\theta) > \big(1 \wedge h_n^{1/\alpha-1}\big)\, c\, c_R\, \nu/2\bigg)
\ge \liminf_n P\bigg(\inf_{\theta:\, |\theta-\theta_0|\ge\varepsilon} \Lambda_{n,R}(\theta) > \nu\bigg)
\ge P\bigg(\inf_{\theta:\, |\theta-\theta_0|\ge\varepsilon} \Lambda_{0,R}(\theta) > \nu\bigg),
\]
where the last inequality follows from the portmanteau theorem. We have, with probability 1, $\sup_\theta |\Lambda_{0,R}(\theta) - \Lambda_0(\theta)| \to 0$ as $R\to\infty$; hence, taking $R\to\infty$ and using the global identifiability condition, we get
\[
\liminf_n P\big(|\hat\theta_n - \theta_0| < \varepsilon\big) \ge P\bigg(\inf_{\theta:\, |\theta-\theta_0|\ge\varepsilon} \Lambda_0(\theta) > \frac{\nu}{2}\bigg).
\]
The right-hand side converges to 1 as $\nu\to0+$, so that we can conclude
\[
\liminf_n P\big(|\hat\theta_n - \theta_0| < \varepsilon\big) = 1, \quad \varepsilon>0,
\]
completing the proof of (4.12).

4.4 Improved consistency for $\alpha < 1$

Proposition 4.2. In the case $\alpha < 1$, the following improved consistency rate holds:
\[
\hat\theta_n - \theta_0 = o_P\big(h_n^{1/\alpha-1}\big), \quad n\to\infty.
\]
Proof. We mainly repeat the arguments from the proof of Proposition 4.1, but now, to obtain the lower bound for $Y_n^P(\theta)$, we use a 'local' bound for $|\kappa_{k,n}(\theta)|$ instead of the 'global' bound (4.15).
By assumption (2.15), we have the following bound for $|X_{t_{k-1,n}}| \le 2R$:
\[
|\kappa_{k,n}(\theta)| \ge h_n^{-1/\alpha+1}\, \big|\nabla_\theta a(\theta_0; X_{t_{k-1,n}}) \cdot (\theta-\theta_0)\big|\, V(X_{t_{k-1,n}})
- C_R\, h_n^{-1/\alpha+1}\, |\theta-\theta_0|\, \big(|\theta-\theta_0|^{\gamma} + h_n^{\gamma}\big).
\]
Since $|\nabla_\theta a(\theta_0; x)|$ is bounded for $|x|\le 2R$, we have, for $|X_{t_{k-1,n}}| \le 2R$,
\[
\big|\nabla_\theta a(\theta_0; X_{t_{k-1,n}}) \cdot (\theta-\theta_0)\big|
\ge c_R\, |\theta-\theta_0|\, \Big(\nabla_\theta a(\theta_0; X_{t_{k-1,n}}) \cdot \frac{\theta-\theta_0}{|\theta-\theta_0|}\Big)^2,
\]
which gives
\[
|\kappa_{k,n}(\theta)| \ge h_n^{-1/\alpha+1}\, |\theta-\theta_0| \bigg(c_R\, \Big(\nabla_\theta a(\theta_0; X_{t_{k-1,n}}) \cdot \frac{\theta-\theta_0}{|\theta-\theta_0|}\Big)^2 V(X_{t_{k-1,n}})^2 - C_R\, \big(|\theta-\theta_0|^{\gamma} + h_n^{\gamma}\big)\bigg).
\]
Again applying the elementary inequality $(a-b)^2 \ge a^2/2 - b^2$, we have for $|X_{t_{k-1,n}}| \le 2R$,
\[
|\kappa_{k,n}(\theta)|^2
\ge \frac{h_n^{-2/\alpha+2}}{2}\, \big|\nabla_\theta a(\theta_0; X_{t_{k-1,n}}) \cdot (\theta-\theta_0)\big|^2\, V(X_{t_{k-1,n}})^2
- C_R\, h_n^{-2/\alpha+2}\, |\theta-\theta_0|^2\, \big(|\theta-\theta_0|^{\gamma} + h_n^{\gamma}\big)^2
\ge h_n^{-2/\alpha+2}\, |\theta-\theta_0|^2 \bigg(\frac12 \Big(\nabla_\theta a(\theta_0; X_{t_{k-1,n}}) \cdot \frac{\theta-\theta_0}{|\theta-\theta_0|}\Big)^2 V(X_{t_{k-1,n}})^2 - C_R\, \big(|\theta-\theta_0|^{\gamma} + h_n^{\gamma}\big)^2\bigg). \tag{4.19}
\]
Since $|\kappa_{k,n}(\theta)|^2 \ge 0$, we deduce from the latter bound that, for $|\theta-\theta_0| \ge \varepsilon h_n^{1/\alpha-1}$,
\[
|\kappa_{k,n}(\theta)|^2 \ge \varepsilon\, h_n^{-1/\alpha+1}\, |\theta-\theta_0| \bigg(\frac12 \Big(\nabla_\theta a(\theta_0; X_{t_{k-1,n}}) \cdot \frac{\theta-\theta_0}{|\theta-\theta_0|}\Big)^2 V(X_{t_{k-1,n}})^2 - C_R\, \big(|\theta-\theta_0|^{\gamma} + h_n^{\gamma}\big)^2\bigg).
\]
Then, by (4.14), we get, for arbitrary $\varepsilon>0$, $\widetilde\varepsilon>0$ and $\varepsilon h_n^{1/\alpha-1} \le |\theta-\theta_0| \le \widetilde\varepsilon$,
\[
\frac{h_n^{-1/\alpha+1}\, Y_n^P(\theta)}{|\theta-\theta_0|}
\ge \frac{h_n^{-2/\alpha+2}}{r_n^2} \sum_{k=1}^n \Big(\frac{\varepsilon}{2} \wedge c_R\Big)\, \Big(\nabla_\theta a(\theta_0; X_{t_{k-1,n}}) \cdot \frac{\theta-\theta_0}{|\theta-\theta_0|}\Big)^2 V(X_{t_{k-1,n}})^2\, \chi(R^{-1} X_{t_{k-1,n}})
- C_R\, \big(\widetilde\varepsilon^{\gamma} + h_n^{\gamma}\big).
\]
Recall that $h_n^{-2/\alpha+2}\, r_n^{-2} = \frac1n$, and denote
\[
\Gamma_{n,R} = \frac1n \sum_{k=1}^n \nabla_\theta a(\theta_0; X_{t_{k-1,n}})^{\otimes2}\, \chi(R^{-1} X_{t_{k-1,n}})\, V(X_{t_{k-1,n}})^2,
\qquad
\gamma_{n,R} = \inf_{|\ell|=1} \Gamma_{n,R}\,\ell\cdot\ell.
\]
Summarizing the above arguments, we get the following lower bound:
\[
h_n^{-1/\alpha+1} \inf_{\varepsilon h_n^{1/\alpha-1} \le |\theta-\theta_0| \le \widetilde\varepsilon} \frac{Y_n^P(\theta)}{|\theta-\theta_0|}
\ge \Big(\frac{\varepsilon}{2} \wedge c_R\Big)\, \gamma_{n,R} - C_R\, \big(\widetilde\varepsilon^{\gamma} + h_n^{\gamma}\big). \tag{4.20}
\]
Next, by Lemma 4.1, we have
\[
h_n^{-1/\alpha+1} \sup_{\theta} \frac{|U_n^P(\theta)|}{|\theta-\theta_0|} = o_P\big(h_n^{-1/\alpha+1}\, r_n^{-1}\big) = o_P\big(n^{-1/2}\big).
\]
Similarly, by Lemma 4.2, we have, for any $\upsilon\in(0,1)$,
\[
h_n^{-1/\alpha+1} \sup_{|\theta-\theta_0| \ge \varepsilon h_n^{1/\alpha-1}} \frac{|U_n^M(\theta) + Y_n^M(\theta)|}{|\theta-\theta_0|}
\le C\, h_n^{-1/\alpha+1}\, h_n^{(-1/\alpha+1)(1-\upsilon)} \sup_{|\theta-\theta_0| \ge \varepsilon h_n^{1/\alpha-1}} \frac{|U_n^M(\theta)| + |Y_n^M(\theta)|}{|\theta-\theta_0|^{\upsilon}}
= o_P\big(h_n^{(-1/\alpha+1)(1-\upsilon)}\, n^{-1/2}\big).
\]
We are assuming that $\liminf_{n\to\infty} T_n > 0$; hence, taking $\upsilon\in(0,1)$ sufficiently large so that
\[
(-1/\alpha+1)(1-\upsilon) + \frac12 \ge 0 \iff \upsilon \ge 1 - \frac{\alpha}{2(1-\alpha)},
\]
we get
\[
\psi_{n,\varepsilon} := h_n^{-1/\alpha+1} \sup_{|\theta-\theta_0| \ge \varepsilon h_n^{1/\alpha-1}} \frac{|H_n(\theta) - Y_n^P(\theta)|}{|\theta-\theta_0|} = o_P(1). \tag{4.21}
\]
Now we can finalize the entire argument. By Proposition 4.1, we have, for arbitrary $\varepsilon>0$, $\widetilde\varepsilon>0$,
\[
\liminf_n P\big(|\hat\theta_n - \theta_0| < \varepsilon h_n^{1/\alpha-1}\big)
= \liminf_n P\bigg(\inf_{\varepsilon h_n^{1/\alpha-1} \le |\theta-\theta_0| \le \widetilde\varepsilon} \frac{H_n(\theta)}{|\theta-\theta_0|} > 0\bigg).
\]
Recall the decompositions (4.4), (4.5), and (4.6). By (4.21) and (4.20),
\[
\liminf_n P\bigg(\inf_{\varepsilon h_n^{1/\alpha-1} \le |\theta-\theta_0| \le \widetilde\varepsilon} \frac{H_n(\theta)}{|\theta-\theta_0|} > 0\bigg)
= \liminf_n P\bigg(h_n^{-1/\alpha+1} \inf_{\varepsilon h_n^{1/\alpha-1} \le |\theta-\theta_0| \le \widetilde\varepsilon} \frac{H_n(\theta)}{|\theta-\theta_0|} > 0\bigg)
\ge \liminf_n P\Big(\Big(\frac{\varepsilon}{2} \wedge c_R\Big)\, \gamma_{n,R} - C_R\, \big(\widetilde\varepsilon^{\gamma} + h_n^{\gamma}\big) - \psi_{n,\varepsilon} > 0\Big).
\]
We have $\gamma_{n,R} \to \gamma_{0,R} := \inf_{|\ell|=1} \Gamma_{0,R}\,\ell\cdot\ell$ in probability as $n\to\infty$, where
\[
\Gamma_{0,R} = \int_0^T \nabla_\theta a(\theta_0; X_t)^{\otimes2}\, \chi(R^{-1} X_t)\,dt
\]
in the finite observation horizon case (where $V(x)\equiv1$), and
\[
\Gamma_{0,R} = \int_{\mathbb{R}} V(x)^2\, \nabla_\theta a(\theta_0; x)^{\otimes2}\, \chi(R^{-1} x)\, \pi(\theta_0, dx)
\]
in the infinite observation horizon case. Then,
\[
\liminf_n P\Big(\Big(\frac{\varepsilon}{2} \wedge c_R\Big)\, \gamma_{n,R} - C_R\, \big(\widetilde\varepsilon^{\gamma} + h_n^{\gamma}\big) - \psi_{n,\varepsilon} > 0\Big)
\ge P\Big(\Big(\frac{\varepsilon}{2} \wedge c_R\Big)\, \gamma_{0,R} - C_R\, \widetilde\varepsilon^{\gamma} > 0\Big),
\]
and we finally get
\[
\liminf_n P\big(|\hat\theta_n - \theta_0| < \varepsilon h_n^{1/\alpha-1}\big)
\ge \lim_{R\to\infty} \lim_{\widetilde\varepsilon\to0} P\Big(\Big(\frac{\varepsilon}{2} \wedge c_R\Big)\, \gamma_{0,R} - C_R\, \widetilde\varepsilon^{\gamma} > 0\Big)
= \lim_{R\to\infty} P(\gamma_{0,R} > 0) = 1,
\]
where we have used the local identifiability condition in the last identity.
4.5 Consistency at a sub-optimal rate

We are now ready to prove (4.1). The proof repeats those of Propositions 4.1 and 4.2 with just one (but substantial) modification. We have just proved that
\[
P(\hat\theta_n \in \Theta_n) \to 1 \tag{4.22}
\]
with
\[
\Theta_n := \big\{\theta : |\theta-\theta_0| \le 1 \wedge h_n^{1/\alpha-1}\big\}.
\]
This additional knowledge makes it possible to improve the lower bound for $Y_n^P(\theta)$. The key observation here is that, for $\theta\in\Theta_n$ and $|X_{t_{k-1,n}}| \le 2R$, the term $\kappa_{k,n}(\theta)$ is bounded by $C_R$. Indeed, in this case, by (2.13) one has
\[
|\kappa_{k,n}(\theta)| \le C\, h_n^{1-1/\alpha}\, |\theta-\theta_0|\, V(X_{t_{k-1,n}})\, W(X_{t_{k-1,n}}) \le C \sup_{|x|\le2R} V(x)\, W(x).
\]
This means that, instead of the lower estimate (4.13), which combines a local quadratic bound with a global linear one, we can use just the quadratic bound:
\[
\int_{[-1,1]} q(z, v)\,dz \ge c_Q\, v^2, \quad |v| \le Q.
\]
This gives, instead of (4.14), the following improved lower bound:
\[
Y_{n,R}^P(\theta) \ge \frac{c_R}{r_n^2} \sum_{k=1}^n \kappa_{k,n}(\theta)^2\, \chi(R^{-1} X_{t_{k-1,n}}), \quad \theta\in\Theta_n. \tag{4.23}
\]
For $\kappa_{k,n}(\theta)^2$ we have the lower bound (4.19). Since $h_n^{-2/\alpha+2}\, r_n^{-2} = \frac1n$, we finally get from (4.23),
\[
Y_{n,R}^P(\theta) \ge |\theta-\theta_0|^2\, \big(c_R\, \gamma_{n,R} - C_R\, (|\theta-\theta_0|^{\gamma} + h_n^{\gamma})^2\big), \quad \theta\in\Theta_n.
\]
Combined with (4.7) and (4.11), the previous estimate gives, for any $R>0$,
\[
H_n(\theta) \ge |\theta-\theta_0|^2\, \big(c_R\, \gamma_{n,R} - C_R\, (|\theta-\theta_0|^{\gamma} + h_n^{\gamma})^2\big) - r_n^{-1}\, \eta_n\, |\theta-\theta_0|^{\upsilon}, \quad \theta\in\Theta_n, \tag{4.24}
\]
with $\eta_n$ bounded in probability. Since $H_n(\hat\theta_n) \le 0$, (4.24) yields that, for each $\nu>0$, on the set
\[
\Omega_n^{R,\nu} = \big\{\gamma_{n,R} \ge \nu,\ \hat\theta_n \in \Theta_n\big\},
\]
the following bound holds:
\[
|\hat\theta_n - \theta_0| \le \eta_n^{R,\nu,\upsilon}\, r_n^{-1/(2-\upsilon)} \tag{4.25}
\]
with $\eta_n^{R,\nu,\upsilon}$ bounded in probability. By (4.22),
\[
\liminf_{n\to\infty} P\big(\Omega_n^{R,\nu}\big) \ge \liminf_{n\to\infty} P(\gamma_{n,R} > \nu) \ge P(\gamma_{0,R} > \nu).
\]
Under the local identifiability assumption, for any $\varepsilon>0$ we can fix $R$ large enough and $\nu>0$ small enough so that
\[
P(\gamma_{0,R} > \nu) \ge 1 - \varepsilon.
\]
Therefore, the inequality (4.25) holds with probability at least $1 - 2\varepsilon$ for $n$ large enough. Since $\varepsilon>0$ here is arbitrary, this shows that $|\hat\theta_n - \theta_0| = O_P\big(r_n^{-1/(2-\upsilon)}\big)$ for arbitrary $\upsilon<1$. Taking $\upsilon > 2 - 1/\varsigma$, we complete the proof of (4.1).

5 Proofs, II: the main statements

In the previous section, we proved that, for any $\varsigma<1$,
\[
P(\hat\theta_n \in \Theta_{n,\varsigma}) \to 1, \quad \text{where} \quad \Theta_{n,\varsigma} = \big\{\theta : |\theta-\theta_0| \le r_n^{-\varsigma}\big\}.
\]
In what follows, we would like to attain the 'true' rate $r_n^{-1}$; hence, it is instructive to identify the key difficulty which does not allow us to reach the 'ideal' value $\varsigma = 1$ using the previously developed technique. The limitation $\varsigma < 1$ comes from the estimate (4.24), where the last (sub-linear) term contains a power function with $\upsilon < 1$. We cannot get a similar estimate with $\upsilon = 1$ (which would lead to consistency at the rate $r_n^{-1}$), because the uniform bound for the martingale terms $U_n^M(\theta)$, $Y_n^M(\theta)$, given by Lemma 4.2, only provides the power $\upsilon < 1$. Note that the point-wise bound from the same lemma has the required power $\upsilon = 1$. Thus, the technique based on the Kolmogorov-Chentsov theorem appears to be not precise enough to provide the exact consistency rate. This observation gives an insight into the subsequent constructions: we will avoid the loss of accuracy caused by the Kolmogorov-Chentsov theorem. Put simply, we will use a new contrast function, convex in $\theta$; in this framework, point-wise bounds (convergence) imply uniform-in-$\theta$ bounds (convergence).
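The convexity mechanism just described rests on the following elementary bound (a standard fact, recorded here in our notation for completeness): for a convex $G:\mathbb{R}^m\to\mathbb{R}$ with $G(0)=0$,

```latex
% Write u' = (R/|u|) u, a convex combination of 0 and u with weight
% lambda = R/|u| <= 1 on u.  Convexity and G(0) = 0 give
%     G(u') <= lambda G(u),   i.e.   G(u) >= (|u|/R) G(u'),
% and, provided G(u') >= 0, also G(u) >= G(u').  Thus a lower bound for
% G on the sphere {|u| = R} propagates to all |u| >= R, which is why
% point-wise control of a convex contrast suffices for uniform control.
\[
G(u) \;\ge\; \frac{|u|}{R}\, G\!\Big(\frac{R}{|u|}\,u\Big),
\qquad |u| \ge R .
\]
```

This is exactly the implication used in Section 5.4 to localize the argmin of $G_n$.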
5.1 Linearization

Denote
\[
\bar F_h(\theta; x) = F_h(\theta_0; x) + h\, \nabla_\theta a(\theta_0; x)\cdot(\theta-\theta_0),
\]
\[
\bar\kappa_{k,n}(\theta) = h_n^{-1/\alpha}\, \big(\bar F_{h_n}(\theta; X_{t_{k-1,n}}) - \bar F_{h_n}(\theta_0; X_{t_{k-1,n}})\big)\, V(X_{t_{k-1,n}})
= h_n^{1-1/\alpha}\, V(X_{t_{k-1,n}})\, \nabla_\theta a(\theta_0; X_{t_{k-1,n}})\cdot(\theta-\theta_0),
\]
and put
\[
\bar H_n(\theta) = \frac{1}{r_n^2}\, h_n^{-1/\alpha} \sum_{k=1}^n \Big(\big|X_{t_{k,n}} - \bar F_{h_n}(\theta; X_{t_{k-1,n}})\big| - \big|X_{t_{k,n}} - \bar F_{h_n}(\theta_0; X_{t_{k-1,n}})\big|\Big)\, V(X_{t_{k-1,n}}).
\]
Because $F_h(\theta_0; x) = \bar F_h(\theta_0; x)$, we have
\[
\bar H_n(\theta) = \frac{1}{r_n^2} \sum_{k=1}^n \big(|\zeta_{k,n} - \bar\kappa_{k,n}(\theta)| - |\zeta_{k,n}|\big).
\]
The functions $\bar\kappa_{k,n}(\theta)$ are affine in $\theta$, and thus the new contrast function $\bar H_n(\theta)$ is convex in $\theta$. We will show that the difference $H_n(\theta) - \bar H_n(\theta)$ is negligible in a suitable sense. For this purpose, we compare the parts of the decomposition (4.4) of $H_n(\theta)$ with the analogous parts for $\bar H_n(\theta)$, which we denote $\bar U_n(\theta)$ and $\bar Y_n(\theta)$, respectively. A similar notational convention will be used for the decompositions of $\bar U_n(\theta)$ and $\bar Y_n(\theta)$, analogous to (4.5) and (4.6). Recall that $\gamma>0$ is the constant given by Assumption 2.2.

Lemma 5.1. For any $\varsigma\in(0,1)$, we have
\[
\sup_{\theta\ne\theta_0,\ \theta\in\Theta_{n,\varsigma}} \frac{|U_n^P(\theta) - \bar U_n^P(\theta)|}{|\theta-\theta_0|} = o_P\big(r_n^{-1}\, h_n^{\gamma\varsigma}\big), \quad n\to\infty.
\]
Proof. We have
\[
\Delta_n(\theta) - \bar\Delta_n(\theta) = \frac{1}{r_n^2}\sum_{k=1}^n \big(u^P_{k,n}(\theta) - \bar u^P_{k,n}(\theta)\big)
\]
with $\bar u^P_{k,n}(\theta) = -\bar\kappa_{k,n}(\theta)\, E[\operatorname{sgn}(\zeta_{k,n}) \mid \mathcal{F}_{k-1,n}]$. By (2.15),
\[
|\kappa_{k,n}(\theta) - \bar\kappa_{k,n}(\theta)| \le C\, h_n^{-1/\alpha+1}\, |\theta-\theta_0|\, \big(|\theta-\theta_0|^{\gamma} + h_n^{\gamma}\big)\, V(X_{t_{k-1,n}})\, W(X_{t_{k-1,n}}).
\]
Since $\delta < \delta_{\mathrm{regr}} = 1 + \eta - 1/\alpha$ (Assumption 2.3), we have $(0\le)\ 1-\eta < 2 - 1/\alpha - \delta$, and hence, for $\theta\in\Theta_{n,\varsigma}$,
\[
|\theta-\theta_0|^{\gamma} \le \sqrt{n h_n^{2\delta}}\; h_n^{2-1/\alpha-\delta} \le h_n^{\gamma\varsigma}
\]
for $n$ large enough. Since $h_n^{\gamma\varsigma} \gg h_n^{\gamma}$, this gives the bound
\[
|\kappa_{k,n}(\theta) - \bar\kappa_{k,n}(\theta)| \le C\, h_n^{-1/\alpha+1+\gamma\varsigma}\, |\theta-\theta_0|\, V(X_{t_{k-1,n}})\, W(X_{t_{k-1,n}}). \tag{5.1}
\]
Repeating the proof of Lemma 4.1 literally, with the estimate (4.8) replaced by (5.1), we get the required statement.

Similarly, using (5.1) instead of (4.8), we get the following analogue of the second statement of Lemma 4.2.

Lemma 5.2. For any $\upsilon\in(0,1)$ and $\varsigma\in(0,1)$, we have
\[
\sup_{\theta\ne\theta_0,\ \theta\in\Theta_{n,\varsigma}} \frac{|U_n^M(\theta) - \bar U_n^M(\theta)| + |Y_n^M(\theta) - \bar Y_n^M(\theta)|}{|\theta-\theta_0|^{\upsilon}} = o_P\big(r_n^{-1}\, h_n^{\gamma\varsigma}\big), \quad n\to\infty.
\]
Finally, we have the following.

Lemma 5.3. For any $\varsigma\in(0,1)$,
\[
\sup_{\theta\ne\theta_0,\ \theta\in\Theta_{n,\varsigma}} \frac{|Y_n^P(\theta) - \bar Y_n^P(\theta)|}{|\theta-\theta_0|^2} = O_P\big(h_n^{\gamma\varsigma}\big), \quad n\to\infty.
\]
Proof. Denote
\[
\psi_t(x, z) = t^{1/\alpha}\, p_t\big(x,\ F_t(\theta_0; x) + z\, t^{1/\alpha}\big).
\]
Then, we can write
\[
y^P_{n,k}(\theta) = \int_{\mathbb{R}} q\Big(\frac{y - F_{h_n}(\theta_0; X_{t_{k-1,n}})}{h_n^{1/\alpha}},\, \kappa_{k,n}(\theta)\Big)\, p_{h_n}(X_{t_{k-1,n}}, y)\,dy
= \int_{\mathbb{R}} q(z, \kappa_{k,n}(\theta))\, \psi_{h_n}(X_{t_{k-1,n}}, z)\,dz,
\]
and similarly
\[
\bar y^P_{n,k}(\theta) = \int_{\mathbb{R}} q(z, \bar\kappa_{k,n}(\theta))\, \psi_{h_n}(X_{t_{k-1,n}}, z)\,dz.
\]
By Theorem A.2, $\psi_t(x, z)$ is bounded uniformly in $x, z\in\mathbb{R}$ and $t\in(0,1]$; hence
\[
|y^P_{n,k}(\theta) - \bar y^P_{n,k}(\theta)| \le C \int_{\mathbb{R}} \big|q(z, \kappa_{k,n}(\theta)) - q(z, \bar\kappa_{k,n}(\theta))\big|\,dz.
\]
On the other hand, it is easy to verify, using the formula (4.3), that
\[
\int_{\mathbb{R}} |q(z, v) - q(z, \bar v)|\,dz \le C\, (|v| + |\bar v|)\, |v - \bar v|.
\]
We have
\[
|\kappa_{k,n}(\theta)| + |\bar\kappa_{k,n}(\theta)| \le C\, h_n^{-1/\alpha+1}\, |\theta-\theta_0|\, V(X_{t_{k-1,n}})\, W(X_{t_{k-1,n}}),
\qquad
|\kappa_{k,n}(\theta) - \bar\kappa_{k,n}(\theta)| \le C\, h_n^{-1/\alpha+1+\gamma\varsigma}\, |\theta-\theta_0|\, V(X_{t_{k-1,n}})\, W(X_{t_{k-1,n}}),
\]
by (4.8) and (5.1), respectively. Therefore,
\[
|y^P_{n,k}(\theta) - \bar y^P_{n,k}(\theta)| \le C\, h_n^{-2/\alpha+2+\gamma\varsigma}\, |\theta-\theta_0|^2\, V(X_{t_{k-1,n}})^2\, W(X_{t_{k-1,n}})^2.
\]
Recall that $h_n^{-2/\alpha+2}\, r_n^{-2} = \frac1n$; hence
\[
|Y_n^P(\theta) - \bar Y_n^P(\theta)| \le C\, |\theta-\theta_0|^2\, h_n^{\gamma\varsigma} \bigg(\frac1n \sum_{k=1}^n V(X_{t_{k-1,n}})^2\, W(X_{t_{k-1,n}})^2\bigg).
\]
Since
\[
\frac1n \sum_{k=1}^n V(X_{t_{k-1,n}})^2\, W(X_{t_{k-1,n}})^2 = O_P(1), \quad n\to\infty,
\]
the last estimate completes the proof.
5.2 Localization

We introduce a local coordinate $u$ in the neighborhood of $\theta_0$ at the rate $r_n$:
\[
\theta = \theta_0 + r_n^{-1} u,
\]
and define the associated re-scaled objective function
\[
G_n(u) := r_n^2\, H_n(\theta_0 + r_n^{-1} u)
= h_n^{-1/\alpha} \sum_{k=1}^n \Big(\big|X_{t_{k,n}} - F_{h_n}(\theta_0 + r_n^{-1} u; X_{t_{k-1,n}})\big| - \big|X_{t_{k,n}} - F_{h_n}(\theta_0; X_{t_{k-1,n}})\big|\Big)\, V(X_{t_{k-1,n}}).
\]
We also introduce the linearized version
\[
\bar G_n(u) := r_n^2\, \bar H_n(\theta_0 + r_n^{-1} u)
= h_n^{-1/\alpha} \sum_{k=1}^n \Big(\big|X_{t_{k,n}} - \bar F_{h_n}(\theta_0 + r_n^{-1} u; X_{t_{k-1,n}})\big| - \big|X_{t_{k,n}} - \bar F_{h_n}(\theta_0; X_{t_{k-1,n}})\big|\Big)\, V(X_{t_{k-1,n}}).
\]
Denote
\[
U_{n,\varsigma} = \big\{u : \theta_0 + r_n^{-1} u \in \Theta_{n,\varsigma}\big\} = \big\{u : |u| \le r_n^{1-\varsigma}\big\}.
\]
Proposition 5.1. There exists $\varsigma\in(0,1)$ such that
\[
\sup_{u\in U_{n,\varsigma}} |G_n(u) - \bar G_n(u)| = o_P(1), \quad n\to\infty.
\]
Proof. Denote by $U_n^{P,G}(u)$, $U_n^{M,G}(u)$, $Y_n^{P,G}(u)$, $Y_n^{M,G}(u)$ the corresponding parts of the decomposition of $H_n(\theta)$ after the same localization and re-scaling, e.g.
\[
U_n^{P,G}(u) := r_n^2\, U_n^P(\theta_0 + r_n^{-1} u).
\]
Denote also by $\bar U_n^{P,G}(u)$, $\bar U_n^{M,G}(u)$, $\bar Y_n^{P,G}(u)$, $\bar Y_n^{M,G}(u)$ the similar parts for $\bar H_n(\theta)$. Then, we have the following.
• By Lemma 5.1,
\[
|U_n^{P,G}(u) - \bar U_n^{P,G}(u)| \le r_n^2\, k_n^{U,P,\varsigma}\, r_n^{-1}\, h_n^{\gamma\varsigma}\, |r_n^{-1} u| \le k_n^{U,P,\varsigma}\, h_n^{\gamma\varsigma}\, r_n^{1-\varsigma}, \quad u\in U_{n,\varsigma},
\]
with the family $\{k_n^{U,P,\varsigma}\}$ bounded in probability;
• By Lemma 5.2,
\[
|U_n^{M,G}(u) - \bar U_n^{M,G}(u)| + |Y_n^{M,G}(u) - \bar Y_n^{M,G}(u)| \le r_n^2\, k_n^{M,\varsigma}\, r_n^{-1}\, h_n^{\gamma\varsigma}\, |r_n^{-1} u|^{\upsilon} \le k_n^{M,\varsigma}\, h_n^{\gamma\varsigma}\, r_n^{1-\upsilon\varsigma}, \quad u\in U_{n,\varsigma},
\]
with the family $\{k_n^{M,\varsigma}\}$ bounded in probability;
• By Lemma 5.3,
\[
|Y_n^{P,G}(u) - \bar Y_n^{P,G}(u)| \le r_n^2\, k_n^{Y,P,\varsigma}\, h_n^{\gamma\varsigma}\, |r_n^{-1} u|^2 \le k_n^{Y,P,\varsigma}\, h_n^{\gamma\varsigma}\, r_n^{2-2\varsigma}, \quad u\in U_{n,\varsigma},
\]
with the family $\{k_n^{Y,P,\varsigma}\}$ bounded in probability.
By Assumption 2.3, we have $h_n^{\gamma\varsigma} = o\big(n^{-(\gamma\varsigma)/(2\delta)}\big)$. Taking $\upsilon, \varsigma$ close enough to 1, we can guarantee
\[
1 - \upsilon\varsigma < \frac{\gamma\varsigma}{2\delta}, \qquad 2 - 2\varsigma < \frac{\gamma\varsigma}{2\delta},
\]
thus proving the required statement.
5.3 Limit theorem for the linearized and localized objective function

In this section, we establish the following point-wise limit theorem for the (linearized and localized) random field $\bar G_n(\cdot)$.

Proposition 5.2. The random field $\bar G_n(u)$, $u\in\mathbb{R}^m$, converges weakly, in the sense of finite-dimensional distributions, to
\[
G_0(u) = \Gamma_0^{1/2}\xi\cdot u + \phi_\alpha(0)\, \Sigma_0 u\cdot u, \quad u\in\mathbb{R}^m, \tag{5.2}
\]
where $\xi$ is a standard $m$-dimensional Gaussian random vector independent of $\Gamma_0$ and $\Sigma_0$.

Proof. Denote
\[
\Xi_n = -\frac{1}{\sqrt n} \sum_{k=1}^n \big(\operatorname{sgn}(\zeta_{k,n}) - E[\operatorname{sgn}(\zeta_{k,n}) \mid \mathcal{F}_{k-1,n}]\big)\, V(X_{t_{k-1,n}})\, \nabla_\theta a(\theta_0; X_{t_{k-1,n}}),
\]
\[
\Gamma_n = \frac1n \sum_{k=1}^n V(X_{t_{k-1,n}})^2\, \nabla_\theta a(\theta_0; X_{t_{k-1,n}})^{\otimes2},
\qquad
\Sigma_n = \frac{\phi_\alpha(0)}{n} \sum_{k=1}^n \frac{V(X_{t_{k-1,n}})^2}{\sigma(X_{t_{k-1,n}})}\, \nabla_\theta a(\theta_0; X_{t_{k-1,n}})^{\otimes2}.
\]
Let us show that, for any $u\in\mathbb{R}^m$,
\[
\bar G_n(u) = \Xi_n\cdot u + \Sigma_n u\cdot u + o_P(1), \quad n\to\infty. \tag{5.3}
\]
We will show in Appendix B.2 below that
\[
(\Xi_n, \Gamma_n, \Sigma_n) \Rightarrow \big(\Gamma_0^{1/2}\xi,\ \Gamma_0,\ \phi_\alpha(0)\Sigma_0\big), \tag{5.4}
\]
which, together with (5.3), yields the required statement. We have $\bar G_n(0) = 0$, hence (5.3) is trivial for $u=0$; therefore, we consider below only the case $u\ne0$. We analyse separately the terms of the decomposition of $\bar G_n(u)$. First, we have simply
\[
\bar U_n^{M,G}(u) = r_n^2\, \bar U_n^M(\theta_0 + r_n^{-1} u) = \sum_{k=1}^n \bar u^M_{k,n}(\theta_0 + r_n^{-1} u)
= -\sum_{k=1}^n \big(\operatorname{sgn}(\zeta_{k,n}) - E[\operatorname{sgn}(\zeta_{k,n}) \mid \mathcal{F}_{k-1,n}]\big)\, \bar\kappa_{k,n}(\theta_0 + r_n^{-1} u).
\]
Since
\[
\bar\kappa_{k,n}(\theta_0 + r_n^{-1} u) = h_n^{1-1/\alpha}\, r_n^{-1}\, V(X_{t_{k-1,n}})\, \nabla_\theta a(\theta_0; X_{t_{k-1,n}})\cdot u
= \frac{1}{\sqrt n}\, V(X_{t_{k-1,n}})\, \nabla_\theta a(\theta_0; X_{t_{k-1,n}})\cdot u,
\]
we obtain $\bar U_n^{M,G}(u) = \Xi_n\cdot u$. That is, the linear part on the right-hand side of (5.3) equals the martingale part of the 'linear' term $\bar U_n^G(u)$. Also, it is easy to show that the corresponding predictable part is negligible:
\[
\bar U_n^{P,G}(u) = o_P(1), \quad n\to\infty.
\]
Indeed, for $u\ne0$, we have (4.7) valid for $\bar U_n^P(\theta)$: the proof remains literally the same, with $\kappa_{k,n}(\theta)$ in (4.8) replaced by $\bar\kappa_{k,n}(\theta)$. Then,
\[
\frac{|\bar U_n^{P,G}(u)|}{|u|} = \frac{r_n\, |\bar U_n^P(\theta_0 + r_n^{-1} u)|}{|r_n^{-1} u|}
\le r_n \sup_{\theta\ne\theta_0} \frac{|\bar U_n^P(\theta)|}{|\theta-\theta_0|} = o_P(1), \quad n\to\infty,
\]
which proves the required convergence. Next, we analyze the predictable and martingale parts of the 'quadratic' term $\bar Y_n^G(u)$. Denote
\[
Q(x, v) = v^{-2}\, q(x, v), \quad v\ne0.
\]
Then, $Q(\cdot, v)$ is a family of probability densities in $x$ such that
\[
Q(x, v)\,dx \Rightarrow \delta_0(dx), \quad v\to0. \tag{5.5}
\]
We have
\[
\bar Y_n^{P,G}(u) = \sum_{k=1}^n \bar\kappa_{k,n}(\theta_0 + r_n^{-1} u)^2 \int_{\mathbb{R}} Q\big(z,\, \bar\kappa_{k,n}(\theta_0 + r_n^{-1} u)\big)\, \psi_{h_n}(X_{t_{k-1,n}}, z)\,dz;
\]
see the proof of Lemma 5.3 for the notation $\psi_t(x, z)$. For any $R>0$, by (5.5) and Theorem A.2 we have
\[
\int_{\mathbb{R}} Q(z, v)\, \psi_t(x, z)\,dz \to \frac{\phi_\alpha(0)}{\sigma(x)}, \quad v\to0,\ t\to0+,
\]
uniformly in $|x|\le R$. We have $|\bar\kappa_{k,n}(\theta_0 + r_n^{-1} u)| \le C_R\, n^{-1/2}$ on $\{|X_{t_{k-1,n}}|\le R\}$; hence, for any $R$,
\[
\sum_{k=1}^n \bar\kappa_{k,n}(\theta_0 + r_n^{-1} u)^2 \bigg(\int_{\mathbb{R}} Q\big(z,\, \bar\kappa_{k,n}(\theta_0 + r_n^{-1} u)\big)\, \psi_{h_n}(X_{t_{k-1,n}}, z)\,dz - \frac{\phi_\alpha(0)}{\sigma(X_{t_{k-1,n}})}\bigg)\, \mathbf{1}_{|X_{t_{k-1,n}}|\le R} = o_P(1).
\]
Since $\psi_t(x, z)$ is bounded and $\sigma(x)$ is separated from $0$, we get
\[
\bigg|\int_{\mathbb{R}} Q\big(z,\, \bar\kappa_{k,n}(\theta_0 + r_n^{-1} u)\big)\, \psi_{h_n}(X_{t_{k-1,n}}, z)\,dz - \frac{\phi_\alpha(0)}{\sigma(X_{t_{k-1,n}})}\bigg| \le C.
\]
We also have (recall Assumption 2.2; we may and do assume $h_n\le1$)
\[
\bar\kappa_{k,n}(\theta_0 + r_n^{-1} u)^2 = \frac1n\, \big(\nabla_\theta a(\theta_0; X_{t_{k-1,n}})\cdot u\big)^2\, V(X_{t_{k-1,n}})^2 \le \frac{|u|^2}{n}\, V(X_{t_{k-1,n}})^2\, W(X_{t_{k-1,n}})^2.
\]
Hence, by the Hölder inequality, for any $p>2$,
\[
\Upsilon_{n,R}(u) := \sum_{k=1}^n \bar\kappa_{k,n}(\theta_0 + r_n^{-1} u)^2\, \bigg|\int_{\mathbb{R}} Q\big(z,\, \bar\kappa_{k,n}(\theta_0 + r_n^{-1} u)\big)\, \psi_{h_n}(X_{t_{k-1,n}}, z)\,dz - \frac{\phi_\alpha(0)}{\sigma(X_{t_{k-1,n}})}\bigg|\, \mathbf{1}_{|X_{t_{k-1,n}}|>R}
\le \frac{C|u|^2}{n} \sum_{k=1}^n V(X_{t_{k-1,n}})^2\, W(X_{t_{k-1,n}})^2\, \mathbf{1}_{|X_{t_{k-1,n}}|>R}
\le C|u|^2 \bigg(\frac1n \sum_{k=1}^n V(X_{t_{k-1,n}})^p\, W(X_{t_{k-1,n}})^p\bigg)^{2/p} \bigg(\frac1n \sum_{k=1}^n \mathbf{1}_{|X_{t_{k-1,n}}|>R}\bigg)^{1-2/p}.
\]
We have
\[
\limsup_{n\to\infty} P\bigg(\frac1n \sum_{k=1}^n \mathbf{1}_{|X_{t_{k-1,n}}|>R} > \varepsilon\bigg) \to 0, \quad R\to\infty,
\]
for any $\varepsilon>0$: in the finite observation horizon case this is trivial, and in the infinite observation horizon case it follows from the ergodicity of the process $X$; see Appendix B.2. Thus, using Assumption 3.2, we get
\[
\limsup_{n\to\infty} P(\Upsilon_{n,R}(u) > \varepsilon) \to 0, \quad R\to\infty,
\]
for any $u\in\mathbb{R}^m$, $\varepsilon>0$. Summarizing the above yields that, for any $u\in\mathbb{R}^m$,
\[
\bar Y_n^{P,G}(u) = \Sigma_n u\cdot u + o_P(1), \quad n\to\infty.
\]
Finally, we show that the martingale part $\bar Y_n^{M,G}(u)$ of the 'quadratic' term is negligible:
\[
\bar Y_n^{M,G}(u) = \sum_{k=1}^n \bar y^M_{k,n}(\theta_0 + r_n^{-1} u) = o_P(1), \quad n\to\infty. \tag{5.6}
\]
Indeed, we have
\[
E\big[q(\zeta_{k,n}, \bar\kappa_{k,n}(\theta_0 + r_n^{-1} u))^2 \mid \mathcal{F}_{k-1,n}\big]
= \int_{\mathbb{R}} q\big(z,\, \bar\kappa_{k,n}(\theta_0 + r_n^{-1} u)\big)^2\, \psi_{h_n}(X_{t_{k-1,n}}, z)\,dz.
\]
Each of the functions $\psi_{h_n}(x, \cdot)$ is a probability density with respect to $z$, and these functions are uniformly bounded. In addition, $q(\cdot, v)$ is bounded by $2|v|$ and supported in $[-|v|, |v|]$. Combining these properties, we get
\[
\int_{\mathbb{R}} q(z, v)^2\, \psi_{h_n}(x, z)\,dz \le C\big(|v|^2 \wedge |v|^3\big).
\]
Without loss of generality, we can assume $2p\in(2,3]$ in the moment Assumption 3.2; for such $p$, the above estimate yields
\[
\int_{\mathbb{R}} q(z, v)^2\, \psi_{h_n}(x, z)\,dz \le C|v|^{2p}.
\]
Then, in the infinite observation horizon case, using Assumption 3.2, we get
\[
E\bigg(\sum_{k=1}^n \bar y^M_{k,n}(\theta_0 + r_n^{-1} u)\bigg)^2
\le C\, E\bigg[\sum_{k=1}^n |\bar\kappa_{k,n}(\theta_0 + r_n^{-1} u)|^{2p}\bigg]
\le \frac{C|u|^{2p}}{n^{p}}\, E\sum_{k=1}^n V(X_{t_{k-1,n}})^{2p}\, W(X_{t_{k-1,n}})^{2p} \to 0, \quad n\to\infty,
\]
which proves (5.6). As for the finite observation horizon case, (5.6) readily follows from the Lenglart domination inequality [13, Section 2.1.7]. Summarizing all the above, we derive the representation (5.2), which completes the proof.

Combining Proposition 5.1 and Proposition 5.2, we get the following.

Corollary 5.1.
For any compact set $K\subset\mathbb{R}^m$, the random fields $\{G_n(u): u\in K\}$ and $\{\bar G_n(u): u\in K\}$ converge weakly to $\{G_0(u): u\in K\}$ in $C(K)$.

Proof. By Proposition 5.1, it is enough to prove the required weak convergence for $\bar G_n(u)$, $u\in K$. These random fields are convex, and the limiting random field $G_0(u)$, $u\in\mathbb{R}^m$, is bounded in probability on every ball. Therefore, it is easy to show, using Proposition 5.2, that the (sub-)gradients of $\bar G_n(u)$, $u\in\mathbb{R}^m$, are bounded in probability on every ball. This yields that the family of the distributions of $\{\bar G_n(u): u\in K\}$ in $C(K)$ is tight. Combined with the convergence of finite-dimensional distributions, this gives the required weak convergence in $C(K)$.

5.4 Completion of the proof

Now we are ready to complete the proofs of Theorems 3.1 and 3.2. First, we prove that
\[
\hat u_n := r_n(\hat\theta_n - \theta_0) = \operatorname*{argmin}_u G_n(u) = O_P(1), \tag{5.7}
\]
which actually shows that (4.1) holds true for $\varsigma = 1$. For that purpose, we make the simple observation that, for a convex function $G:\mathbb{R}^m\to\mathbb{R}$ with $G(0) = 0$ and any $R>0$,
\[
G(u) > 1 \ \text{for all } |u| = R \implies G(u) > 1 \ \text{for all } |u| \ge R.
\]
Fix $R>0$ and let $n$ be large enough that $R < r_n^{1-\varsigma}$. By the convexity of $\bar G_n$ and Corollary 5.1,
\[
\liminf_{n\to\infty} P\bigg(\inf_{u\in U_{n,\varsigma}:\, |u|\ge R} \bar G_n(u) > 1\bigg)
\ge \liminf_{n\to\infty} P\Big(\inf_{|u|=R} \bar G_n(u) > 1\Big)
\ge P\Big(\inf_{|u|=R} G_0(u) > 1\Big).
\]
Choose $\varsigma < 1$ close enough to 1 that Proposition 5.1 holds. Then, by (4.1) and Proposition 5.1, we have
\[
P(\hat u_n \in U_{n,\varsigma}) \to 1,
\qquad
\limsup_{n\to\infty} P\bigg(\sup_{u\in U_{n,\varsigma}} |G_n(u) - \bar G_n(u)| > \frac12\bigg) = 0.
\]
Recall that $G_n(\hat u_n) \le G_n(0) = 0$. Building on the above observations, we have, for any $R>0$,
\[
\limsup_{n\to\infty} P(|\hat u_n| \ge R)
\le \limsup_{n\to\infty} P\big(R \le |\hat u_n| \le r_n^{1-\varsigma}\big)
\le \limsup_{n\to\infty} P\bigg(\inf_{u:\, R\le|u|\le r_n^{1-\varsigma}} G_n(u) \le \frac12\bigg)
\le 1 - \liminf_{n\to\infty} P\bigg(\inf_{u:\, R\le|u|\le r_n^{1-\varsigma}} G_n(u) > \frac12\bigg)
\]
\[
\le 1 - \liminf_{n\to\infty} P\bigg(\inf_{u:\, R\le|u|\le r_n^{1-\varsigma}} \bar G_n(u) > 1\bigg) + \limsup_{n\to\infty} P\bigg(\sup_{u\in U_{n,\varsigma}} |G_n(u) - \bar G_n(u)| > \frac12\bigg)
\le 1 - P\Big(\inf_{|u|=R} G_0(u) > 1\Big).
\]
The limiting random field $G_0(u)$ is a quadratic function whose quadratic part has the matrix $\phi_\alpha(0)\Sigma_0$; see (5.2). This matrix is non-degenerate by the local identifiability assumption; hence
\[
P\Big(\inf_{|u|=R} G_0(u) > 1\Big) \to 1, \quad R\to\infty,
\]
which completes the proof of (5.7). Combining (5.7) with Corollary 5.1, we can apply the standard argmin argument to deduce that $\hat u_n \Rightarrow \hat u_0$, where
\[
\hat u_0 := -\frac{1}{2\phi_\alpha(0)}\, \Sigma_0^{-1}\, \Gamma_0^{1/2}\, \xi
\]
is the unique minimum point of the random field $G_0(u)$. This random point has the mixed normal distribution with parameters $0$ and $(2\phi_\alpha(0))^{-2}\, \Sigma_0^{-1}\Gamma_0\Sigma_0^{-1}$, thus completing the proofs of Theorems 3.1 and 3.2.

6 Consistent estimator of the asymptotic covariance matrix

The objective of this section is to construct a consistent estimator of the (possibly random) asymptotic covariance matrix
\[
A_0 := (2\phi_\alpha(0))^{-2}\, \Sigma_0^{-1}\Gamma_0\Sigma_0^{-1} \tag{6.1}
\]
of the LAD estimator. To that end, we will separately consider $\phi_\alpha(0)$, $\Sigma_0$, and $\Gamma_0$. Throughout this section, we keep the assumptions given in Sections 2 and 3 valid. Just for brevity, we suppose here that $T_n \equiv T\in(0,\infty)$ in the finite observation horizon case. Recall that, in the case $T_n\to\infty$, we denote by $q>0$ the tail index of the Lévy process $Z$ (see Assumption 3.1); this implies that $E[|Z_1|^q] < \infty$. Denote by $\|\cdot\|_{TV}$ the total variation norm, by $P_t(\theta_0; x, dy) := P[X_t \in dy \mid X_0 = x]$ the transition probability function of $X$, and by $P_{X_0}$ the marginal distribution of $X_0$. Throughout this section, we will work under the following additional conditions.²

Assumption 6.1. 1. The index $\alpha$ and the finite-range drift-Hölder exponent $\eta\in(0,1]$ satisfy
\[
\alpha > \frac{1}{\eta+1}.
\]
2.
When $T_n\to\infty$, we additionally have $\sup_{s\ge0} E[|X_s|^q] < \infty$ and also
\[
\int \big\|P_t(\theta_0; x, dy) - \pi(\theta_0; dy)\big\|_{TV}\, P_{X_0}(dx) \lesssim (1+t)^{-a} \tag{6.2}
\]
for some $a>0$. Further, $T_n \gtrsim h_n^{-c'}$ for some $c'>0$.

² There may be some redundancy with the main assumptions given in Section 3, but no confusion will occur.

Recall the balance condition (2.7). Assumption 6.1.1 entails that
\[
\alpha > \frac12 \tag{6.3}
\]
and that
\[
\alpha + \min\{\alpha, 1\}\,\eta > 1. \tag{6.4}
\]
Let $f(x,\theta) := (\nabla_\theta a(\theta; x))^{\otimes2}$.

Assumption 6.2. The function $f(x,\theta)$ is continuously differentiable in $\theta$ for each $x$, and we have the following.
1. When $T_n\equiv T$,
\[
\sup_\theta |\partial_\theta^l f(x,\theta)| \lesssim 1 + |x|^C, \quad l\in\{0,1\},
\qquad
\sup_\theta |f(x,\theta) - f(y,\theta)| \lesssim (1 + |x| + |y|)^C\, |x-y|.
\]
2. When $T_n\to\infty$, the (non-negative and bounded) weight function $V(x)$ is globally Lipschitz, and the function $(x,\theta)\mapsto \big(V(x)f(x,\theta),\, V(x)\partial_\theta f(x,\theta)\big)$ is essentially bounded. Moreover,
\[
\lim_{\varepsilon\to0}\, \sup_\theta \sup_{|x-y|\le\varepsilon} \big|V(x)f(x,\theta) - V(y)f(y,\theta)\big| = 0. \tag{6.5}
\]
Then, the property (6.5) also holds with $V(\cdot)$ replaced by $V(\cdot)^2/\sigma(\cdot)$.

6.1 Activity index

Recall that the symmetric $\alpha$-stable density $\phi_\alpha$ is associated with the characteristic function
\[
\xi \mapsto \exp(-|\xi|^\alpha) = \exp\Big(\int \big(\cos(\xi u) - 1\big)\, \mu_\alpha(du)\Big).
\]
The density satisfies
\[
\phi_\alpha(0) = \frac1\pi \int_0^\infty \exp(-\xi^\alpha)\,d\xi = \frac1\pi\, \Gamma\big(1 + \alpha^{-1}\big).
\]
In this section, we construct an estimator $\hat\alpha_n$ of the index $\alpha$ such that
\[
\exists\,\kappa>0: \quad h_n^{-\kappa}(\hat\alpha_n - \alpha) = O_P(1), \tag{6.6}
\]
while leaving the scale coefficient $\sigma$ unknown, in both cases $T_n\to\infty$ and $T_n\equiv T\in(0,\infty)$. Under (6.6), we have $\hat\alpha_n \xrightarrow{P} \alpha$ and
\[
\phi_{\hat\alpha_n}(0) = \frac1\pi\, \Gamma\big(1 + \hat\alpha_n^{-1}\big) \xrightarrow{P} \frac1\pi\, \Gamma\big(1 + \alpha^{-1}\big) = \phi_\alpha(0), \tag{6.7}
\]
\[
h_n^{-1/\hat\alpha_n} = h_n^{-1/\alpha}\big(1 + o_P(h_n^{\kappa-\varepsilon})\big) \tag{6.8}
\]
for any $\varepsilon>0$.
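The identity $\phi_\alpha(0) = \pi^{-1}\Gamma(1+\alpha^{-1})$ is easy to confirm numerically; the following sketch evaluates the inversion integral $(1/\pi)\int_0^\infty e^{-\xi^\alpha}\,d\xi$ after the elementary change of variables $\xi = u/(1-u)$ (the grid size and tolerance are arbitrary choices).

```python
import math
import numpy as np

def phi_alpha_0(alpha):
    # closed form used in the text: phi_alpha(0) = Gamma(1 + 1/alpha) / pi
    return math.gamma(1.0 + 1.0 / alpha) / math.pi

def phi_alpha_0_numeric(alpha, npts=2_000_001):
    # (1/pi) * int_0^inf exp(-xi^alpha) d(xi), mapped to (0,1) via xi = u/(1-u)
    u = np.linspace(0.0, 1.0, npts)[1:-1]       # drop endpoints (integrand vanishes at 1)
    xi = u / (1.0 - u)
    y = np.exp(-xi ** alpha) / (1.0 - u) ** 2   # Jacobian d(xi)/du = (1-u)^{-2}
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u))) / math.pi

for alpha in [0.6, 1.0, 1.5, 1.9]:
    assert abs(phi_alpha_0_numeric(alpha) - phi_alpha_0(alpha)) < 1e-5
```

For $\alpha = 1$ (Cauchy case) this recovers the familiar value $\phi_1(0) = 1/\pi$.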
The latter ensures that $\sqrt{n}\, h_n^{1 - 1/\alpha}(\hat\theta_n - \theta_0)$ and $\sqrt{n}\, h_n^{1 - 1/\hat\alpha_n}(\hat\theta_n - \theta_0)$ are asymptotically equivalent, hence have the same asymptotic mixed-normal distribution.

Let us denote the $k$-th increment of $X$ by $\Delta_k X = X_{t_{k,n}} - X_{t_{k-1,n}}$. To estimate $(\phi_\alpha(0), \Sigma_0, \Gamma_0)$, we will make use of the second-order increments
$$\Delta_k^2 X := \Delta_k X - \Delta_{k-1} X = X_{t_{k,n}} + X_{t_{k-2,n}} - 2 X_{t_{k-1,n}}$$
for $2 \le k \le n$, which quantitatively weakens the effect of the drift term. We also define the third-order differences with one lag by $\Delta_k^2 X - \Delta_{k-2}^2 X$ for $4 \le k \le n$. Analogous symbols are used for the increments of $Z = Z^{(\alpha)} + Z^{\triangle}$ (see (C.2)). Recall that $h_n^{-1/\alpha} Z_{h_n} = h_n^{-1/\alpha} Z^{(\alpha)}_{h_n} + o_p(1) \Rightarrow S_\alpha$ ($h_n \to 0$) under the present assumptions, where $S_\alpha$ denotes the $\alpha$-stable distribution corresponding to the density $\phi_\alpha$. It follows that
$$(2h_n)^{-1/\alpha} \Delta_k^2 Z \Rightarrow S_\alpha, \qquad (4h_n)^{-1/\alpha}(\Delta_k^2 Z - \Delta_{k-2}^2 Z) \Rightarrow S_\alpha$$
for each $k \ge 4$.

For notational convenience, we will often write $\int_k$ for $\int_{t_{k-1,n}}^{t_{k,n}}$ and denote $a_s = a(\theta_0; X_s)$ and $\sigma_s = \sigma(X_s)$. With slight abuse of notation, we further abbreviate $a_k = a_{t_{k,n}}$, $\sigma_k = \sigma_{t_{k,n}}$, and so on. Then, we can write
$$(2h_n)^{-1/\alpha} \Delta_k^2 X = \sigma_{k-2} (2h_n)^{-1/\alpha} \Delta_k^2 Z + 2^{-1/\alpha} r_k, \quad (6.9)$$
where $r_k = r_{a,k} + r_{\sigma,k}$ with
$$r_{a,k} := h_n^{-1/\alpha} \Big( \int_k (a_s - a_{k-2})\, ds - \int_{k-1} (a_s - a_{k-2})\, ds \Big), \qquad r_{\sigma,k} := h_n^{-1/\alpha} \Big( \int_k (\sigma_{s-} - \sigma_{k-2})\, dZ_s - \int_{k-1} (\sigma_{s-} - \sigma_{k-2})\, dZ_s \Big).$$
Further, by (6.9),
$$(4h_n)^{-1/\alpha}(\Delta_k^2 X - \Delta_{k-2}^2 X) = \sigma_{k-4} (4h_n)^{-1/\alpha}(\Delta_k^2 Z - \Delta_{k-2}^2 Z) + (\sigma_{k-2} - \sigma_{k-4})(4h_n)^{-1/\alpha} \Delta_k^2 Z + 4^{-1/\alpha}(r_k - r_{k-2}).$$
To construct an estimator of $\alpha$, we introduce
$$H_{1,n}(\rho) := \frac{1}{n-1} \sum_{k=2}^n \big| (2h_n)^{-1/\alpha} \Delta_k^2 X \big|^\rho, \quad (6.10)$$
where $\rho \in (0, \min\{1, \alpha\})$, and where $\rho \in (0, \min\{1, \alpha, q\})$ when $T_n \to \infty$; recall that $q > 0$ is the tail index given in Assumption 3.1.
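The second- and third-order differences above are simple to form from a discretely observed path; the following array-based sketch (names and toy inputs are ours) computes $\Delta_k^2 X$, the lagged differences $\Delta_k^2 X - \Delta_{k-2}^2 X$, and the power variation $H_{1,n}(\rho)$ of (6.10) for a given (here assumed known) $\alpha$:

```python
import numpy as np

def second_order_increments(X: np.ndarray) -> np.ndarray:
    """Delta^2_k X = X_{t_k} + X_{t_{k-2}} - 2 X_{t_{k-1}}, for k = 2..n."""
    return X[2:] + X[:-2] - 2.0 * X[1:-1]

def lagged_third_order(X: np.ndarray) -> np.ndarray:
    """Delta^2_k X - Delta^2_{k-2} X, for k = 4..n."""
    d2 = second_order_increments(X)
    return d2[2:] - d2[:-2]

def H1n(X: np.ndarray, h: float, alpha: float, rho: float) -> float:
    """Normalized power variation (6.10): mean of |(2h)^{-1/alpha} Delta^2_k X|^rho."""
    d2 = second_order_increments(X)
    return float(np.mean(np.abs((2.0 * h) ** (-1.0 / alpha) * d2) ** rho))

# toy check on X = (0, 1, 3, 6): first increments (1, 2, 3), second-order (1, 1)
X = np.array([0.0, 1.0, 3.0, 6.0])
print(second_order_increments(X))  # [1. 1.]
```

Note that the drift contributes to $\Delta_k^2 X$ only through the difference of two adjacent time integrals, which is why the second-order increments dampen it.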
W e b egin with handlin g the remaining terms. Lemma 6.1. Pick a c onstant ρ > 0 such that • ρ ∈ (0 , min { 1 , α } ) for T n ≡ T ; • ρ ∈ (0 , min { 1 , α, q } ) for T n → ∞ . Then, ther e exists a c onstant κ 1 > 0 such that max 1 n − 1 n X k =2 | r a,k | ρ , 1 n − 1 n X k =2 | r σ,k | ρ = O P ( h κ 1 n ) . Pr o of. Recalling the definitions of r a,k and r σ,k , we will only sho w that ¯ U a,n ( ρ ) := 1 n − 1 n X k =2 h − 1 /α n Z k ( a s − a k − 2 ) ds ρ = O P ( h κ 1 n ) , (6.11) ¯ U σ,n ( ρ ) := 1 n − 1 n X k =2 h − 1 /α n Z k ( σ s − − σ k − 2 ) d Z s ρ = O P ( h κ 1 n ) (6.12) hold for some κ 1 > 0 ; the other tw o terms (asso ciated with R k − 1 ) can b e handled similarly . W e note that sup θ | a ( θ ; x ) − a ( θ ; y ) | ≤ C | x − y | η 1 | x − y |≤ 1 + | x − y | 1 | x − y | > 1 , (6.13) | σ ( x ) − σ ( y ) | ≤ C | x − y | ζ 1 | x − y |≤ 1 + | x − y | 1 | x − y | > 1 , under the finite range Hölder conditions for the co efficien ts ( η , ζ ∈ (0 , 1] ) in Assumption 2.1 . In particular, sup θ | a ( θ ; x ) | ≤ C (1 + | x | ) . 27 Let us lo ok at ( 6.11 ). W e hav e | a s − a k − 2 | ≲ | X s − X t k − 2 ,n | + | X s − X t k − 2 ,n | η b y ( 6.13 ), hence it suffices to show that 1 n − 1 n X k =2 h − 1 /α n Z k | X s − X t k − 2 ,n | η ′ ds ρ = O P ( h κ 1 n ) (6.14) for η ′ ∈ { η , 1 } . Inserting the expression X s − X t k − 2 ,n = Z s t k − 2 ,n σ u − d Z u + Z s t k − 2 ,n a u du, (6.15) w e can b ound the left-hand side of ( 6.14 ) by a constan t multiple of ¯ U ′ a,n ( ρ ) h (1 − 1 /α + η ′ /α ) ρ n + ¯ U ′′ a,n ( ρ ) h (1 − 1 /α + η ′ ) ρ n , (6.16) where ¯ U ′ a,n ( ρ ) := 1 n − 1 n X k =2 1 h n Z k h − 1 /α n Z s t k − 2 ,n σ u − d Z u η ′ ds ρ , ¯ U ′′ a,n ( ρ ) := 1 n − 1 n X k =2 1 h n Z k 1 h n Z s t k − 2 ,n (1 + | X u | ) du ! η ′ ds ρ . Since 1 − 1 /α + η ′ /α > 0 , to complete the pro of it suffices to show that ¯ U ′ a,n ( ρ ) h (1 − 1 /α + η ′ /α ) ρ n = O P ( h κ 1 n ) for some κ 1 > 0 and that ¯ U ′′ a,n ( ρ ) = O P (1) . 
Then, the quantit y ( 6.16 ) b ecomes O P ( h κ 1 n ) for a smaller κ 1 > 0 if necessary , hence follo w ed by ( 6.14 ). F or ¯ U ′ a,n ( ρ ) , we will use the decomp osition Z = Z ( α ) + Z △ (see ( C.2 ) in Section C ). First, since Z △ t = P 0 1 z N α ( ds, dz ) =: e Z ( α ) ,s t + Z ( α ) ,l t . (6.18) F or the large-jump part, w e hav e a similar estimate of ( 6.17 ) with Z △ therein replaced by Z ( α ) ,l . As for the comp ensated small-jump (lo cally α -stable) part, we can b ound the moment as follo ws: E 1 n − 1 n X k =2 1 h n Z k h − 1 /α n Z s t k − 2 ,n σ u − d e Z ( α ) ,s u η ′ ds ρ ≲ 1 n − 1 n X k =2 1 h n Z k E h − 1 /α n Z s t k − 2 ,n σ u − d e Z ( α ) ,s u η ′ ds ρ ≲ 1 n − 1 n X k =2 E E sup v ∈ [ t k − 2 ,n ,t j ] h − 1 /α n Z v t k − 2 ,n σ u − d e Z ( α ) ,s u η ′ F t k − 2 ,n ρ 28 ≲ h (1 − η ′ /α ) ρ n I ( α < η ′ ) + (log (1 /h n )) ρ I ( α = η ′ ) + I ( α > η ′ ) , (6.19) where, in the first and last steps, w e used the conca vit y of y 7→ y ρ ( y ≥ 0 ) and Lemma 6.2 b elow, resp ectiv ely . By ( 6.17 ) and ( 6.19 ), we conclude that ¯ U ′ a,n ( ρ ) ≲ O P ( h 1 − η ′ ρ/α ) + O P ( h (1 − η ′ /α ) ρ n ) I ( α < η ′ ) + O P (log(1 /h n ) ρ ) I ( α ≥ η ′ ) , follo wed by ¯ U ′ a,n ( ρ ) h (1 − 1 /α + η ′ /α ) ρ n ≲ O P ( h (1 − 1 /α ) ρ +1 n ) + O P ( h (2 − 1 /α ) ρ n ) + O P h (1 − 1 /α + η ′ /α ) ρ n log(1 /h n ) ρ . It is easy to see that the right-hand side equals O P ( h κ 1 n ) under ( 6.3 ) and ( 6.4 ). Note that, under the essen tial b oundedness of σ , the ab o ve argument is v alid for b oth T n → ∞ and T n ≡ T . T urning to ¯ U ′′ a,n ( ρ ) , we first note that ¯ U ′′ a,n ( ρ ) ≲ 1 n − 1 n X k =2 1 + sup s ∈ [ t k − 2 ,n ,t k,n ] | X s | η ′ ρ ! . Again by substituting the expression ( 6.15 ) into X s , sup s ∈ [ t k − 2 ,n ,t k,n ] | X s | η ′ ρ ≲ | X t k − 2 ,n | η ′ ρ + h η ′ ρ n 1 + sup s ∈ [ t k − 2 ,n ,t k,n ] | X s | η ′ ρ ! + sup s ∈ [ t k − 2 ,n ,t k,n ] Z s t k − 2 ,n σ u − d Z u η ′ ρ . 
This implies that for h n small enough, sup s ∈ [ t k − 2 ,n ,t k,n ] | X s | η ′ ρ ≲ | X t k − 2 ,n | η ′ ρ + h η ′ ρ n + sup s ∈ [ t k − 2 ,n ,t k,n ] Z s t k − 2 ,n σ u − d Z u η ′ ρ . T aking the conditional exp ectation and applying Lemma 6.2 b elo w together with recalling the decom- p osition Z = Z ( α ) + Z △ and the argument ( 6.17 ), we ha ve, with localization through τ n = inf { t ≥ 0 : | X t | ≥ n } if necessary , E " sup s ∈ [ t k − 2 ,n ,t k,n ] | X s | η ′ ρ F t k − 2 ,n # ≲ | X t k − 2 ,n | η ′ ρ + h η ′ ρ n + h η ′ ρ/α n ≲ | X t k − 2 ,n | η ′ ρ + 1 . (6.20) W e conclude that ¯ U ′′ a,n ( ρ ) = O P (1) for b oth T n → ∞ and T n ≡ T , by observing the following. • F or T n ≡ T , without loss of generality , we may and do supp ose that sup t ≤ T E [ | X t | K ] < ∞ for every K > 0 . 3 Hence by ( 6.20 ), we get E [ ¯ U ′′ a,n ( ρ )] = O (1) . • F or T n → ∞ , by ( 6.20 ) we directly get E [ ¯ U ′′ a,n ( ρ )] ≲ 1 + sup s ≥ 0 E | X s | η ′ ρ ≲ 1 + sup s ≥ 0 E | X s | q < ∞ . The pro of of ( 6.11 ) is thus complete. W e can deduce ( 6.12 ) with the same technicalities as in handling ¯ U ′ a,n ( ρ ) , hence we omit the details. F or e Z ( α ) ,s giv en in ( 6.18 ), w e hav e the follo wing estimate. 3 Since ( 6.6 ) is a weak prop ert y , to deduce it we may restrict our attention to any conv enien t event whose probability we can control to b e as large as p ossible, known as the lo calization pro cedure; see [ 13 , Section 4.4.1] for a detailed account. In our case, this essentially amounts to removing large jumps of Z . A concise explanation can b e found in [ 23 , Section 6.1]. 29 Lemma 6.2. Supp ose that ξ = ( ξ t ) t ≥ 0 is ( F t ) -adaptive and essential ly b ounde d. F or r ∈ (0 , 1) and t ≥ 0 , we have (a.s.) E " sup v ∈ [ t,t + h ] Z v t ξ u − d e Z ( α ) ,s u r F t # ≤ C h ( α < r ) h log(1 /h ) ( α = r ) h r/α ( α > r ) , wher e the c onstant C do es not dep end on h > 0 . Pr o of. 
We can adapt the proofs of [18] without essential change: that paper dealt with Lévy processes, but its proofs are based on the Burkholder inequality, which remains valid for general stochastic integrals with respect to a Lévy process and a predictable integrand. Specifically, to get the claims, we can modify the proofs of Theorem 1, Theorem 3, and Theorem 2 of [18] for $\alpha < r$, $\alpha = r$, and $\alpha > r$, respectively.

We proceed with further estimates to deduce the asymptotic behavior of $H_{1,n}(\rho)$ given by (6.10), together with an estimate of the convergence rate. By Lemma 6.1 and (6.9), we have
$$\Big| H_{1,n}(\rho) - \frac{1}{n-1} \sum_{k=2}^n \sigma_{k-2}^\rho \big| (2h_n)^{-1/\alpha} \Delta_k^2 Z \big|^\rho \Big| \le \frac{1}{n-1} \sum_{k=2}^n |r_k|^\rho = O_P(h_n^{\kappa_1}). \quad (6.21)$$
Denote by $m_\alpha(\rho)$, $\rho \in (0, \alpha)$, the $\rho$-th absolute moment of $S_\alpha$ (see [27, Example 25.10]):
$$m_\alpha(\rho) := E\big[ |Z_1^{(\alpha)}|^\rho \big] = 2^\rho \, \frac{\Gamma((\rho+1)/2)}{\sqrt{\pi}} \, \frac{\Gamma(1 - \rho/\alpha)}{\Gamma(1 - \rho/2)}. \quad (6.22)$$
By the independence of the increments and the triangle inequality,
$$\Big| \frac{1}{n-1} \sum_{k=2}^n \sigma_{k-2}^\rho \big| (2h_n)^{-1/\alpha} \Delta_k^2 Z \big|^\rho - m_\alpha(\rho) \frac{1}{n-1} \sum_{k=2}^n \sigma_{k-2}^\rho \Big| \le \frac{1}{\sqrt{n-1}} \Big| \frac{1}{\sqrt{n-1}} \sum_{k=2}^n \sigma_{k-2}^\rho \Big( \big| (2h_n)^{-1/\alpha} \Delta_k^2 Z \big|^\rho - E\big[ |(2h_n)^{-1/\alpha} \Delta_k^2 Z|^\rho \big] \Big) \Big| + \Big| E\big[ |(2h_n)^{-1/\alpha} \Delta_2^2 Z|^\rho \big] - m_\alpha(\rho) \Big| \, \frac{1}{n-1} \sum_{k=2}^n \sigma_{k-2}^\rho =: \delta_n' + \delta_n''. \quad (6.23)$$
In what follows, we will assume that
$$\rho \in \begin{cases} \big( 0,\ 1_{\{\beta = 0\}} + 1_{\{\beta > 0\}} \beta \big), & T_n \equiv T, \\ \big( 0,\ (1_{\{\beta = 0\}} + 1_{\{\beta > 0\}} \beta) \wedge \frac{q}{2} \big), & T_n \to \infty. \end{cases} \quad (6.24)$$

Lemma 6.3. Under (6.24): 1. $\delta_n' = O_P(n^{-1/2})$; and 2. $\delta_n'' = O_P(h_n^{\kappa_2})$ for some $\kappa_2 > 0$. These are valid for both $T_n \equiv T$ and $T_n \to \infty$.

Proof. The first claim easily follows from the Burkholder inequality together with the boundedness of $\sigma$.
For the second one, we apply the moment estimate of Lemma C.1 in Section C together with the representation $Z = Z^{(\alpha)} + Z^{\triangle}$, the fact that $(2h_n)^{-1/\alpha} \Delta_2^2 Z^{(\alpha)}$ has the same distribution as $Z_1^{(\alpha)}$, and the inequality $\big| |x + \varepsilon|^r - |x|^r \big| \le |\varepsilon|^r$, valid for any $r \in (0,1]$:
$$\Big| E\big[ |(2h_n)^{-1/\alpha} \Delta_2^2 Z|^\rho \big] - m_\alpha(\rho) \Big| \lesssim E\big[ |(2h_n)^{-1/\alpha} \Delta_2^2 Z^{\triangle}|^\rho \big] = O(h_n^{\kappa_2})$$
for some $\kappa_2 > 0$.

Under the sampling-design condition (2.19), the estimates (6.21) and (6.23) yield
$$\Big| H_{1,n}(\rho) - m_\alpha(\rho) \frac{1}{n-1} \sum_{k=2}^n \sigma_{k-2}^\rho \Big| = O_P(h_n^{\kappa_3})$$
for some $\kappa_3 > 0$. Now, we introduce the a.s. finite quantity
$$H_0(\rho) := m_\alpha(\rho) \times \begin{cases} \dfrac{1}{T} \displaystyle\int_0^T \sigma_s^\rho \, ds & (T_n \equiv T), \\[2mm] \displaystyle\int \sigma(x)^\rho \, \pi(\theta_0, dx) & (T_n \to \infty). \end{cases}$$

Lemma 6.4. Under (6.24), we have, for some $\kappa > 0$, $H_{1,n}(\rho) = H_0(\rho) + O_P(h_n^\kappa)$.

Proof. First, consider the case $T_n \equiv T$. Thanks to (6.12) and the boundedness of the scale coefficient $\sigma$, we have
$$\Big| \frac{1}{n-1} \sum_{k=2}^n \sigma_{k-2}^\rho - \frac{1}{T} \int_0^T \sigma_s^\rho \, ds \Big| = \Big| \frac{1}{n-1} \sum_{k=2}^n \sigma_{k-2}^\rho - \frac{1}{n} \sum_{k=1}^n \frac{1}{h_n} \int_k \sigma_s^\rho \, ds \Big| \lesssim \frac{1}{n} \sum_{k=1}^n \frac{1}{h_n} \int_k |\sigma_s^\rho - \sigma_{k-1}^\rho| \, ds + O_P\Big(\frac 1n\Big) \lesssim \frac{1}{n} \sum_{k=1}^n \frac{1}{h_n} \int_k \max\big\{ |X_s - X_{t_{k-1,n}}|^\rho,\ |X_s - X_{t_{k-1,n}}|^{\zeta\rho} \big\} \, ds + O_P\Big(\frac 1n\Big). \quad (6.25)$$
In a quite similar way to the proof of (6.14) (assuming that $\sup_{s \ge 0} E[|X_s|^\rho] < \infty$ without loss of generality), we can deduce that the first term in (6.25) is $O_P(h_n^\kappa)$ for some $\kappa > 0$. This completes the proof for $T_n \equiv T$.

Next, we consider the case $T_n \to \infty$. Noting that
$$E[\sigma_{k-2}^\rho] = \iint \sigma(y)^\rho \, P_{t_{k-2,n}}(\theta_0; x, dy) \, P_{X_0}(dx),$$
we will make use of the following estimate:
$$\Big| \frac{1}{n-1} \sum_{k=2}^n \sigma_{k-2}^\rho - \int \sigma(x)^\rho \, \pi(\theta_0, dx) \Big| \le \delta_n'(\rho) + \delta_n''(\rho),$$
where
$$\delta_n'(\rho) := \frac{1}{n-1} \sum_{k=2}^n \Big| \iint \sigma(y)^\rho \, \big( P_{t_{k-2,n}}(\theta_0; x, dy) - \pi(\theta_0; dy) \big) \, P_{X_0}(dx) \Big|, \qquad \delta_n''(\rho) := \Big| \frac{1}{n-1} \sum_{k=2}^n \big( \sigma_{k-2}^\rho - E[\sigma_{k-2}^\rho] \big) \Big|.$$
For $\delta_n'(\rho)$, by Assumption 6.1.2 we get
$$\delta_n'(\rho) \lesssim \frac{1}{n-1} \sum_{k=2}^n \int \| P_{t_{k-2,n}}(\theta_0; x, \cdot) - \pi(\theta_0; \cdot) \|_{TV} \, P_{X_0}(dx) \lesssim \frac{1}{n-1} \sum_{k=2}^n (1 + t_{k-2,n})^{-a} = \frac{1}{n-1} \sum_{k=0}^{n-2} \Big( 1 + \frac{k}{n} T_n \Big)^{-a} \lesssim \frac{1}{n} + \sum_{k=1}^n \int_{(k-1)/n}^{k/n} (1 + u T_n)^{-a} \, du \lesssim \frac{1}{n} + \frac{1}{T_n} \int_0^{T_n} (1 + y)^{-a} \, dy \lesssim \frac{1}{n} + \begin{cases} T_n^{-\min\{a, 1\}} & (a \ne 1), \\ T_n^{-1} \log T_n & (a = 1), \end{cases} \lesssim h_n^{\kappa_4}$$
for a sufficiently small $\kappa_4 > 0$, where we used (2.19) in the last step.

We turn to $\delta_n''(\rho)$. The bound (6.2) implies that $X$ is polynomially $\beta$-mixing, hence polynomially $\alpha$-mixing (strong mixing) as well, in the following sense:
$$\alpha_X(t) := \sup_{s \ge 0} \sup_{A \in \mathcal F^X_{[0,s]},\, B \in \mathcal F^X_{[s+t,\infty)}} |P[A \cap B] - P[A] P[B]| \lesssim (1 + t)^{-a}.$$
Then, Ibragimov's inequality (see, for example, [9]) yields
$$\sup_{s \ge 0} |\mathrm{Cov}[\sigma_s, \sigma_{s+\tau}]| \lesssim (1 + \tau)^{-a}. \quad (6.26)$$
By the boundedness of $\sigma$,
$$\mathrm{Var}[\delta_n''(\rho)] \lesssim \frac{1}{n^2} \sum_{k=1}^n \sum_{l=1}^n |\mathrm{Cov}[\sigma_{k-1}, \sigma_{l-1}]| \lesssim \frac{1}{n} + \frac{1}{n^2} \sum_{k \ne l} |\mathrm{Cov}[\sigma_{k-1}, \sigma_{l-1}]|.$$
By (6.26), the second term on the rightmost side can be bounded by a constant multiple of
$$\frac{1}{n^2} \sum_{k=0}^{n-1} \sum_{l=k+1}^{n} (1 + (l-k) h_n)^{-a} = \frac{1}{n^2} \sum_{k=0}^{n-1} \sum_{m=1}^{n-k} (1 + m h_n)^{-a} \lesssim \frac{1}{n^2} \sum_{k=0}^{n-1} \int_0^{(n-k) h_n} (1 + y)^{-a} \, dy \lesssim \begin{cases} n^{-1} & (a > 1), \\ n^{-1} \log T_n & (a = 1), \\ n^{-1} T_n^{1-a} & (a < 1), \end{cases} = O(h_n^{2\kappa})$$
for some $\kappa \in (0, \kappa_4 \wedge (1/2))$. Thus, we get $\mathrm{Var}[\delta_n''(\rho)] \lesssim h_n^{2\kappa}(1 + (n h_n^{2\kappa})^{-1}) \lesssim h_n^{2\kappa}$. Since the summands of $\delta_n''(\rho)$ are zero-mean, it follows by Chebyshev's inequality that $\delta_n''(\rho) = O_P(h_n^\kappa)$.

We now construct our estimator of $\alpha$ under (6.24), in a similar manner to that of [28, Section 4]. To this end, combining (6.24) with the conditions on $\rho$ given in Lemma 6.1, we assume that
$$\rho \in \begin{cases} \big( 0,\ \{1_{\{\beta = 0\}} + 1_{\{\beta > 0\}} \beta\} \wedge \alpha \wedge 1 \big), & T_n \equiv T, \\ \big( 0,\ \{1_{\{\beta = 0\}} + 1_{\{\beta > 0\}} \beta\} \wedge \alpha \wedge \frac{q}{2} \wedge 1 \big), & T_n \to \infty. \end{cases} \quad (6.27)$$
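The moment formula (6.22), which calibrates the power-variation statistics of this section, is easy to evaluate; here is a sketch (`stable_abs_moment` is our naming) with two closed-form sanity checks: formally taking $\alpha = 2$ gives $S_2 = N(0,2)$ with $E|Z| = 2/\sqrt{\pi}$, while $\alpha = 1$ (Cauchy) gives $E|Z|^{1/2} = \sqrt{2}$:

```python
import math

def stable_abs_moment(alpha: float, rho: float) -> float:
    """m_alpha(rho) = E|Z_1^{(alpha)}|^rho for the symmetric alpha-stable law with
    characteristic function exp(-|xi|^alpha), valid for 0 < rho < alpha:
    m_alpha(rho) = 2^rho * Gamma((rho+1)/2)/sqrt(pi) * Gamma(1 - rho/alpha)/Gamma(1 - rho/2)."""
    if not 0.0 < rho < alpha:
        raise ValueError("need 0 < rho < alpha")
    return (2.0 ** rho * math.gamma((rho + 1.0) / 2.0) / math.sqrt(math.pi)
            * math.gamma(1.0 - rho / alpha) / math.gamma(1.0 - rho / 2.0))

print(stable_abs_moment(2.0, 1.0))  # 2/sqrt(pi), the mean absolute value of N(0, 2)
print(stable_abs_moment(1.0, 0.5))  # sqrt(2), the half-moment of the standard Cauchy law
```

The pole of $\Gamma(1 - \rho/\alpha)$ at $\rho = \alpha$ reflects the fact that absolute moments of order $\rho \ge \alpha$ are infinite, which is why every statistic in this section restricts $\rho$ strictly below $\alpha$.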
In the same way as in the above lemmas, we can derive the following for the normalized power variation based on the third-order differences:
$$H_{2,n}(\rho) := \frac{1}{n-3} \sum_{k=4}^n \big| (4h_n)^{-1/\alpha} (\Delta_k^2 X - \Delta_{k-2}^2 X) \big|^\rho = \frac{1}{n-3} \sum_{k=4}^n \sigma_{k-4}^\rho \big| (4h_n)^{-1/\alpha} (\Delta_k^2 Z - \Delta_{k-2}^2 Z) \big|^\rho + O_P(h_n^\kappa) = (m_\alpha(\rho) + o(1)) \frac{1}{n-3} \sum_{k=4}^n \sigma_{k-4}^\rho + O_P(h_n^\kappa) = H_0(\rho) + O_P(h_n^\kappa), \quad (6.28)$$
where the exponent $\kappa > 0$ can be taken the same as before, by making it smaller if necessary. On the one hand, since $H_0(\rho) > 0$, from Lemma 6.4 and (6.28),
$$\frac{H_{2,n}(\rho)}{H_{1,n}(\rho)} = 1 + O_P(h_n^\kappa). \quad (6.29)$$
On the other hand, in the identity
$$\frac{H_{2,n}(\rho)}{H_{1,n}(\rho)} = 2^{-\rho/\alpha} \, \frac{(n-1) \sum_{k=4}^n |\Delta_k^2 X - \Delta_{k-2}^2 X|^\rho}{(n-3) \sum_{k=2}^n |\Delta_k^2 X|^\rho},$$
the ratio on the right-hand side, apart from the factor $2^{-\rho/\alpha}$, is a statistic. By ignoring the $O_P(h_n^\kappa)$ term in (6.29), namely by setting the left-hand side of the previous display equal to $1$ and solving for $\alpha$, we obtain the following explicit estimator of $\alpha$:
$$\hat\alpha_n(\rho) := \rho \log 2 \Big/ \log\!\bigg( \frac{(n-1) \sum_{k=4}^n |\Delta_k^2 X - \Delta_{k-2}^2 X|^\rho}{(n-3) \sum_{k=2}^n |\Delta_k^2 X|^\rho} \bigg). \quad (6.30)$$
This $\hat\alpha_n(\rho)$ differs slightly from the estimator in [28, Section 4], where an aggregated version (specifically, the statistic $\widetilde V_n^2(p, X)$ therein) was used, in contrast to our third-order-difference-based statistic $V_{2,n}(\rho)$. Thus, by the delta method, we end up with the following lemma:

Lemma 6.5. For any $\rho$ satisfying (6.27), there exists a constant $\kappa = \kappa(\rho) > 0$ for which $\hat\alpha_n(\rho)$ defined by (6.30) satisfies $h_n^{-\kappa}(\hat\alpha_n(\rho) - \alpha) = O_P(1)$, regardless of whether $T_n \equiv T$ or $T_n \to \infty$.

6.2 Riemann integrals

Having obtained an estimator $\hat\alpha_n$ that satisfies (6.6), we proceed to estimate $\Sigma_0 = \Sigma_0(\theta_0)$ and $\Gamma_0 = \Gamma_0(\theta_0)$ in (6.1). Let us recall their specific forms:
• For $T_n \equiv T$ (with $V(x) \equiv 1$),
$$\Gamma_0 = \frac{1}{T} \int_0^T \nabla_\theta a(\theta_0; X_t)^{\otimes 2} \, dt, \qquad \Sigma_0 = \frac{1}{T} \int_0^T \frac{1}{\sigma(X_t)} \nabla_\theta a(\theta_0; X_t)^{\otimes 2} \, dt.$$
• F or T n → ∞ , Γ 0 = Z R V ( x ) 2 ∇ θ a ( θ 0 ; x ) ⊗ 2 π ( θ 0 , dx ) , Σ 0 = Z R V ( x ) σ ( x ) ∇ θ a ( θ 0 ; x ) ⊗ 2 π ( θ 0 , dx ) . 6.2.1 Basic conv ergence in probability Again, we use the shorthand f ( x, θ ) = ( ∇ θ a ( θ ; x )) ⊗ 2 . W e denote by ψ : R × Θ → R either: f ( x, θ ) or 1 σ ( x ) f ( x, θ ) for T n ≡ T ; V ( x ) f ( x, θ ) or V ( x ) 2 σ ( x ) f ( x, θ ) for T n → ∞ . 4 Y et another way would be to use the bip o w er version instead of V 2 ,n ( ρ ) : B n ( ρ/ 2 , ρ/ 2) := 1 n − 2 n X j =3 (2 h n ) − 1 /α ∆ 2 k X ρ/ 2 (2 h n ) − 1 /α ∆ 2 k − 2 X ρ/ 2 . 33 By Assumption 6.2 , sup θ | ∂ l θ ψ ( · , θ 0 ) | ≲ 1 + | x | C for l ∈ { 0 , 1 } when T n ≡ T , and ψ ( · , θ 0 ) ∈ L 1 ( π ( θ 0 , dx )) when T n → ∞ . Let Ψ n ( θ ) := 1 n n X k =1 ψ ( X t k − 1 ,n , θ ) , Ψ 0 ( θ ) := 1 T Z T 0 ψ ( X t , θ ) dt ( T n ≡ T ) Z ψ ( x, θ ) π ( θ 0 , dx ) ( T n → ∞ ) . Lemma 6.6. W e have sup θ | Ψ n ( θ ) − Ψ 0 ( θ ) | P − → 0 . Pr o of. Suppose for a moment that sup θ Ψ n ( θ ) − 1 T n Z T n 0 ψ ( X t , θ ) dt P − → 0 . (6.31) F or T n ≡ T , ( 6.31 ) is exactly what we wan t to show. F or T n → ∞ , the assumed ergo dicity ensures that T − 1 n R T n 0 ψ ( X t , θ ) dt P − → Ψ 0 ( θ ) = R ψ ( x, θ ) π ( θ 0 , dx ) for each θ , and its uniformity in θ follo ws from the b oundedness sup ( x,θ ) | ∂ θ ψ ( x, θ ) | < ∞ , whic h holds by Assumption 6.2 . Th us, it remains to show ( 6.31 ). Note that sup θ Ψ n ( θ ) − 1 T n Z T n 0 ψ ( X t , θ ) dt ≤ 1 n n X k =1 1 h n Z k sup θ | ψ ( X t , θ ) − ψ ( X t k − 1 ,n , θ ) | dt. (6.32) First, we consider T n ≡ T ; as b efore, we may and do suppose that sup t ≤ T E [ | X t | K ] < ∞ for any K > 0 . 
By the global Lipschitz prop ert y of σ ( x ) , the right-hand side of ( 6.32 ) can b e b ounded by a constant m ultiple of the follo wing quantit y: 1 n n X k =1 1 h n Z k sup θ | f ( X t , θ ) − f ( X t k − 1 ,n , θ ) | dt + 1 n n X k =1 sup θ | f ( X t k − 1 ,n , θ ) | 1 h n Z k | σ ( X t ) − σ ( X t k − 1 ,n ) | dt ≲ 1 n n X k =1 (1 + | X t k − 1 ,n | C ) 1 h n Z k | X t − X t k − 1 ,n | max 1 , | X t − X t k − 1 ,n | C dt + 1 n n X k =1 (1 + | X t k − 1 ,n | C ) 1 h n Z k | X t − X t k − 1 ,n | dt. W e take the exp ectation in ( 6.32 ) with the conditioning with resp ect to F t k − 1 ,n : through a similar manner to the pro of of ( 6.14 ) via the expression ( 6.15 ) with X t k − 2 ,n replaced by X t k − 1 ,n , we get E " sup θ Ψ n ( θ ) − 1 T Z T 0 ψ ( X t , θ ) dt # ≲ 1 n n X k =1 1 h n Z k E (1 + | X t k − 1 ,n | C ) E | X t − X t k − 1 ,n | C | F t k − 1 ,n dt = o (1) . This establishes ( 6.31 ), completing the pro of for T n ≡ T . W e no w consider T n → ∞ . Under the present assumptions, the function ψ ( x, θ ) is essen tially b ounded, sa y sup x,θ | ψ ( x, θ ) | ≤ c ψ . Denote by N Z ( ds, du ) the Poisson p oin t measure asso ciated with Z , and let A t ( h ) := { N Z (( t, t + h ] , {| u | ≥ 1 } ) = 0 } for t ≥ 0 and h > 0 . Then, sup t ≥ 0 E sup θ | ψ ( X t + h , θ ) − ψ ( X t , θ ) | ; A t ( h ) c ≤ 2 c ψ P A 0 ( h ) c = O ( h ) . 34 Fix any ε ′ > 0 . Then, there exists a constant δ ′ > 0 for which sup | x − y |≤ δ ′ sup θ | ψ ( x, θ ) − ψ ( y, θ ) | < ε ′ / 2 . W e ha ve sup t ≥ 0 E sup θ | ψ ( X t + h , θ ) − ψ ( X t , θ ) | ; A t ( h ) ≤ ε ′ 2 + 2 c ψ sup t ≥ 0 P [ A t ( h ) ∩ {| X t + h − X t | ≥ δ ′ } ] . (6.33) Since the pro cess v 7→ Z v − Z t for v ∈ ( t, t + h ] has only b ounded jumps on A t ( h ) , under Assumption 6.1 .2 and the linear growth prop ert y of the co efficien ts of X , it is standard to show that the second term in the upp er b ound of ( 6.33 ) go es to 0 for h → 0 . 
Piecing together the ab o v e observ ations and ( 6.32 ) concludes ( 6.31 ). The pro of is complete. By Lemma 6.6 and the contin uit y of θ 7→ ψ ( x, θ ) for each x , we ha ve ˆ Ψ n : = Ψ n ( ˆ θ n ) P − → Ψ 0 ( θ 0 ) for the LADE ˆ θ n . Let ˆ f k − 1 : = f ( X t k − 1 ,n , ˆ θ n ) . F or b oth T n → ∞ and T n ≡ T (with V ( x ) ≡ 1 ), it holds that ˆ Γ n := 1 n n X k =1 V 2 k − 1 ˆ f k − 1 P − → Γ 0 , Σ ∗ n : = 1 n n X k =1 V k − 1 σ k − 1 ˆ f k − 1 P − → Σ 0 . (6.34) Note that Σ ∗ n is not a statistic, as it dep ends on the unknown function σ ( · ) . In the next subsection, we address this by plugging in a suitable estimator. Before pro ceeding, we remark on the case of a constant scale, namely σ ( x ) ≡ σ 0 > 0 ), with focusing on the weigh t function V ≡ 1 even in the case of T n → ∞ . Then, the asymptotic cov ariance matrix takes the following simpler form: A 0 = σ 2 0 (2 ϕ α (0)) − 2 × 1 T Z T 0 f ( θ 0 ; X t ) dt ! − 1 ( T n ≡ T ) Z f ( θ 0 , x ) π ( θ 0 , dx ) − 1 ( T n → ∞ ) . By Lemma 6.6 and recalling ( 6.7 ), we hav e ˆ σ 2 n (2 ϕ ˆ α n (0)) − 2 1 n n X k =1 ˆ f k − 1 ! − 1 P − → A 0 for any ˆ σ n P − → σ 0 . Hence, it suffices to ha v e a consisten t estimator of σ 0 . By Lemma 6.5 and ( 6.8 ), we ha ve for some κ ′ > 0 , ˆ H 1 ,n ( ρ ) : = 1 n − 1 n X k =2 (2 h n ) − 1 / ˆ α n ∆ 2 k X ρ = H 1 ,n ( ρ )(1 + o P ( h κ − ε n )) ρ = ( H 0 ( ρ ) + O P ( h κ n )) (1 + o P ( h κ − ε n )) ρ = H 0 ( ρ ) + O P ( h κ ′ n ) . By the contin uit y of α 7→ m α ( ρ ) (recall the expression ( 6.22 )), we get ˆ σ n = ˆ σ n ( ρ ) : = ˆ H 1 ,n ( ρ ) m ˆ α n ( ρ ) ! 1 /ρ P − → σ 0 . 35 6.2.2 Dealing with unkno wn scale co efficient Our goal here is to construct statistics ˆ Σ n ( ρ ) such that ˆ Σ n ( ρ ) P − → Σ 0 , (6.35) with fo cusing on an unkno wn non-constant σ ( x ) . 
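Before treating the non-constant scale, it may help to see the constant-scale pipeline of the previous subsection in one place: the closed-form index estimator (6.30) followed by the scale estimator $\hat\sigma_n(\rho) = (\hat H_{1,n}(\rho)/m_{\hat\alpha_n}(\rho))^{1/\rho}$. The sketch below is our own rough illustration, not the paper's code; the simulation assumes an exactly Cauchy driving noise ($\alpha = 1$, $\sigma_0 = 1$, no drift), so that the increments are $h$-scaled standard Cauchy variables:

```python
import math
import numpy as np

def estimate_alpha(X: np.ndarray, rho: float) -> float:
    """Closed-form activity-index estimator (6.30), built from second-order
    increments and their one-lag (third-order) differences."""
    d2 = X[2:] + X[:-2] - 2.0 * X[1:-1]   # Delta^2_k X, k = 2..n
    d3 = d2[2:] - d2[:-2]                 # Delta^2_k X - Delta^2_{k-2} X, k = 4..n
    n = len(X) - 1
    ratio = ((n - 1) * np.sum(np.abs(d3) ** rho)) / ((n - 3) * np.sum(np.abs(d2) ** rho))
    return rho * math.log(2.0) / math.log(ratio)

def stable_abs_moment(alpha: float, rho: float) -> float:
    """m_alpha(rho) of (6.22)."""
    return (2.0 ** rho * math.gamma((rho + 1.0) / 2.0) / math.sqrt(math.pi)
            * math.gamma(1.0 - rho / alpha) / math.gamma(1.0 - rho / 2.0))

def estimate_sigma_const(X: np.ndarray, h: float, rho: float, alpha_hat: float) -> float:
    """Constant-scale estimator sigma_hat = (H1n_hat / m_{alpha_hat}(rho))^(1/rho)."""
    d2 = X[2:] + X[:-2] - 2.0 * X[1:-1]
    H1 = np.mean(np.abs((2.0 * h) ** (-1.0 / alpha_hat) * d2) ** rho)
    return (H1 / stable_abs_moment(alpha_hat, rho)) ** (1.0 / rho)

# toy calibration run: X is a Cauchy (alpha = 1) random walk with scale sigma_0 = 1
rng = np.random.default_rng(1)
n, rho = 100_000, 0.5
h = 1.0 / n
X = np.concatenate([[0.0], np.cumsum(h * rng.standard_cauchy(n))])
print(estimate_alpha(X, rho))                # close to 1
print(estimate_sigma_const(X, h, rho, 1.0))  # close to 1
```

Plugging $\hat\alpha_n$ into $h_n^{-1/\hat\alpha_n}$ amplifies the index-estimation error through the factor $h_n^{1/\alpha - 1/\hat\alpha_n}$, which is precisely why the polynomial rate (6.6) is needed for (6.8).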
W e will consider approximating eac h instantaneous scales σ k − 1 = σ ( X t k − 1 ) ov er rolling windo ws with shrinking width h n l n , where ( l n ) n ≥ 1 is a given divergen t p ositiv e-in teger sequence such that l n ≳ h − c n , l n h n → 0 (6.36) for some c > 0 ; in particular, l n = o ( n ) . This amoun ts to normalizing the incremen ts ∆ 2 k X in a non- parametric wa y . F or k = 2 , . . . , n − l n , we introduce the statistics e σ ♭ k − 1 ,n ( ρ ) = e σ ♭ k − 1 ( ρ ) : = m ˆ α n ( ρ ) − 1 ˆ S k ( ρ ) 1 /ρ , where ˆ S k ( ρ ) = ˆ S k,n ( ρ ) : = 1 l n k + l n X j = k +1 (2 h n ) − 1 / ˆ α n ∆ 2 j X ρ , F or c σ := inf x σ ( x ) > 0 , w e define ˆ σ k − 1 ,n ( ρ ) = ˆ σ k − 1 ( ρ ) : = max n e σ ♭ k − 1 ( ρ ) , c σ 2 o . This ˆ σ k − 1 ( ρ ) will serve as a sp ot-scale estimator of σ k − 1 = σ ( X t k − 1 ,n ) ; w e ha ve e σ ♭ k − 1 ( ρ ) > 0 a.s., and we do not need to sp ecify the v alue c σ in practice. Lemma 6.7. Under the ab ove-mentione d setup, we have ( 6.35 ) for ( V k − 1 : = V ( X t k − 1 ,n ) ) ˆ Σ n ( ρ ) : = 1 n n X k =1 1 ˆ σ k − 1 ( ρ ) ˆ f k − 1 ( T n ≡ T ) 1 n n X k =1 V k − 1 ˆ σ k − 1 ( ρ ) ˆ f k − 1 ( T n → ∞ ) . Pr o of. (i) W e first consider T n ≡ T . Let B k − 1 : = n e σ ♭ k − 1 ,n ( ρ ) ≥ c σ 2 o . Since ˆ σ k − 1 ,n ( ρ ) ≥ c σ / 2 , we hav e ˆ Σ n ( ρ ) − Σ ∗ n ≤ 1 n n X k =1 1 σ k − 1 − 1 ˆ σ k − 1 ( ρ ) | ˆ f k − 1 | ≲ 1 n n X k =1 | ˆ f k − 1 | e σ ♭ k − 1 ( ρ ) − σ k − 1 I B k − 1 + 1 n n X k =1 | ˆ f k − 1 | I B c k − 1 = : D 1 ,n + D 2 ,n . (6.37) F rom this, com bined with ( 6.34 ) with V ( x ) ≡ 1 , ( 6.35 ) follows once w e sho w b oth D 1 ,n P − → 0 and D 2 ,n P − → 0 . Again, without loss of generality , we may and do supp ose that J has only b ounded jumps. First, we make some preliminary observ ations. By ( 6.8 ), ˆ S k ( ρ ) = 1 + o P ( h κ − ε n ) ρ S k ( ρ ) , (6.38) where S k ( ρ ) : = 1 l n k + l n X j = k +1 (2 h n ) − 1 /α ∆ 2 j X ρ . 
36 Let us generically denote b y R k ( ρ ) an y non-negative random v ariable such that n − 1 P k E [ R k ( ρ ) 2 ] = O (1) . By the expression ( 6.9 ) of (2 h n ) − 1 /α ∆ 2 k X and mimicking the argumen t in the proof of Lemma 6.1 , we can deduce the following estimates: S k ( ρ ) − 1 l n k + l n X j = k +1 σ ρ j − 2 (2 h n ) − 1 /α ∆ 2 j Z ρ ≲ 1 l n k + l n X j = k +1 | r j | ρ ≲ h κ 1 n R k ( ρ ) . (6.39) F urther, b y the Burkholder inequalit y and the momen t estimates in Lemma C.1 , 1 l n k + l n X j = k +1 σ ρ j − 2 (2 h n ) − 1 /α ∆ 2 j Z ρ − E h (2 h n ) − 1 /α ∆ 2 2 Z ρ i − 1 l n k + l n X j = k +1 σ ρ j − 2 E h (2 h n ) − 1 /α ∆ 2 2 Z ρ i − m α ( ρ ) ≤ 1 √ l n R k ( ρ ) + C h c . (6.40) Here again, c denotes a generic p ositiv e constan t which may v ary at each app earance. Combining ( 6.38 ), ( 6.39 ), and ( 6.40 ) gives ˆ S k ( ρ ) = m α ( ρ ) 1 l n k + l n X j = k +1 σ ρ j − 2 + h c n R k ( ρ ) . W e can further pro ceed as follows, with c > 0 b eing small enough: ˆ S k ( ρ ) = m ˆ α n ( ρ ) σ ρ k − 1 + m α ( ρ ) 1 l n k + l n X j = k +1 ( σ ρ j − 2 − σ ρ k − 1 ) + h c n R k ( ρ ) = m ˆ α n ( ρ ) σ ρ k − 1 + h c n R k ( ρ ) , where the second step is due to ( 6.36 ). Since m α ( ρ ) > 0 for any α in a compact subset of the op en interv al (1 / 2 , 2) , we arrive at e σ ♭ k − 1 ( ρ ) ρ = m ˆ α n ( ρ ) − 1 ˆ S k ( ρ ) = σ ρ k − 1 + h c n R k ( ρ ) . Note that for ρ ∈ (0 , 1] and x, y ≥ 0 , | x − y | = ( x ρ ) 1 /ρ − ( y ρ ) 1 /ρ = Z 1 0 1 ρ ( y ρ + s ( x ρ − y ρ )) 1 /ρ − 1 ds ( x ρ − y ρ ) ≲ (1 + | x | + | y | ) 1 /ρ − 1 | x ρ − y ρ | . The previous tw o disp la ys yield e σ ♭ k − 1 ( ρ ) − σ k − 1 ≤ h c n R k ( ρ ) (6.41) for c > 0 small enough. No w, for D 1 ,n , w e note | ˆ f k − 1 | ≲ 1 + | X t k − 1 ,n | , so that n − 1 P n k =1 | ˆ f k − 1 | K = O P (1) for any K > 0 . Then, the estimate ( 6.41 ) and the Hölder inequality giv e D 1 ,n = O P ( h c n ) P − → 0 . Next, D 2 ,n P − → 0 follo ws from n − 1 P n k =1 P [ B c k − 1 ] → 0 . 
Observe that, by ( 6.41 ), for a sufficiently small c > 0 and any M > 0 , 1 n n X k =1 P [ B c k − 1 ] ≤ 1 n n X k =1 P h σ k − 1 ≤ c σ 2 + e σ ♭ k − 1 ( ρ ) − σ k − 1 i ≤ 1 n n X k =1 P h σ k − 1 ≤ c σ 2 + h c/ 2 n M i | {z } =0 for every n large enough. + P h h c/ 2 n R k ( ρ ) ≥ M i ≲ h c n → 0 . This concludes the pro of of ( 6.35 ) for T n ≡ T . (ii) The case of T n → ∞ is no w almost done; in this case, we are assuming the boundedness of ( x, θ ) 7→ V ( x ) f ( x, θ ) . T o derive ( 6.35 ), w e can follow the same argumen t as in (i), starting from the estimate ( 6.37 ) with T n ≡ T , by replacing ˆ f k − 1 with the b ounded term V 2 k − 1 ˆ f k − 1 . 37 A Prop erties of the transition densit y In this section, we discuss the prop erties of the transition density of the solution X for the SDE ( 2.1 ) considered, for each θ , as a time-homogeneous Marko v pro cess. This discussion will b e based mainly on the pap er [ 16 ], where lo cally stable L évy-typ e pr o c esses were studied. It is easy to re-formulate our SDE setting to the one adopted in [ 16 ]; in particular, the Lévy kernel µ ( x ; du ) in the formula [ 16 , Eq.(2.1)] for the generator of the pro cess is just the image of our Lévy measure ν ( du ) under the mapping u 7→ σ ( x ) u , and the principal and n uisance parts in the decomp osition [ 16 , Eq.(2.2)] of this kernel are the images under the same mapping of the similar parts of the decomp osition ( 2.5 ) of ν in our setting. The principal part is the stable k ernel with λ ( x ) = c α σ ( x ) α , ρ ( x ) = 0 , and it is easy to v erify that Assumption 2.1 guarantees all the assumptions of [ 16 , Theorem 3.3]. Therefore the latter theorem guarantees that the solution X for the SDE ( 2.1 ) with a fixed θ is a (well-defined) time- homogeneous Mark ov pro cess, which admits a transition density p t ( θ ; x, y ) with respect to the Lebesgue measure: P ( X t ∈ dy | X 0 = x ) = p t ( x, y ) dy . 
Moreover, this density has a semi-explicit representation, which we explain below. Note that the compensated drift coefficient introduced in [16, Section 3.1] coincides, in our setting, with the drift coefficient $a(\theta; x)$ because the driving Lévy process is pure jump; recall (2.4) and also (2.8). Define a dynamically mollified drift coefficient
$$A_t(\theta; x) = \int_{\mathbb R} a(\theta; x - z) \, \frac{1}{2 \sqrt{\pi} \, t^{1/\alpha}} \, e^{-z^2 t^{-2/\alpha}} \, dz.$$
This coefficient satisfies $|\partial_x A_t(\theta; x)| \le C t^{-1 + \delta_{\mathrm{drift}}}$, hence the usual Picard iteration scheme shows that the Cauchy problem
$$d \tilde f_t(\theta; x) = A_t(\theta; \tilde f_t(\theta; x)) \, dt, \qquad \tilde f_0(\theta; x) = x$$
has a unique solution $\tilde f_t(\theta; x)$. Moreover, it is easy to check that $|A_t(\theta; x) - a(\theta; x)| \le C t^{\eta/\alpha}$, which yields that, for any solution $f_t(\theta; x)$ to the ODE (2.9),
$$|\tilde f_t(\theta; x) - f_t(\theta; x)| \le C t^{\frac{1+\eta}{\alpha}} = C t^{1/\alpha + \delta_{\mathrm{drift}}}.$$
Recall that we assume $\delta_{\mathrm{regr}} \le \delta_{\mathrm{drift}}$, which means that assumption (2.12) remains true with the true solution $f_t(\theta; x)$ replaced by the approximate solution $\tilde f_t(\theta; x)$; see Remark 2.2. Also, recall that we are given the constant (see Assumption 2.3)
$$\delta < \min\{\delta_{\mathrm{regr}}, \delta_\sigma, \delta_\nu\} \le \min\{\delta_{\mathrm{drift}}, \delta_\sigma, \delta_\nu\}.$$
Furthermore, we denote
$$\sigma_t(x) = \Big( \frac{1}{t} \int_0^t \sigma(f_s(\theta_0; x))^\alpha \, ds \Big)^{1/\alpha}.$$
The following statement directly follows from part I of [16, Theorem 3.4].

Theorem A.1 (Properties of $p_t(x, y)$). Let Assumptions 2.1 and 2.3 hold true. Then
$$p_t(x, y) = \tilde p_t(x, y) + r_t(x, y), \quad (A.1)$$
with the leading term given explicitly by
$$\tilde p_t(x, y) = \frac{1}{\sigma_t(x) \, t^{1/\alpha}} \, \phi_\alpha\Big( \frac{y - f_t(\theta_0; x)}{\sigma_t(x) \, t^{1/\alpha}} \Big), \quad (A.2)$$
where $\phi_\alpha$ is defined by (3.1), and with the residual term satisfying
$$\sup_{x \in \mathbb R} \int_{\mathbb R} |r_t(x, y)| \, dy \le C t^\delta. \quad (A.3)$$
The estimate (A.3) tells us that the term we called residual is indeed negligible in the integral sense.
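To make the leading term (A.2) concrete, here is a toy numerical sketch. All modeling choices are ours: a linear drift $a(\theta; x) = -\theta x$, so the flow $f_t(\theta; x) = x e^{-\theta t}$ is explicit, and a constant scale $\sigma$, so the time-averaged scale reduces to $\sigma_t(x) \equiv \sigma$; $\phi_\alpha$ is evaluated by direct cosine-transform inversion of $\exp(-|\xi|^\alpha)$:

```python
import math

def phi_alpha(v: float, alpha: float, n: int = 100_000, upper: float = 60.0) -> float:
    """Symmetric alpha-stable density at v:
    (1/pi) * int_0^inf exp(-xi^alpha) * cos(xi * v) dxi, by the midpoint rule."""
    h = upper / n
    s = 0.0
    for i in range(n):
        xi = (i + 0.5) * h
        s += math.exp(-xi ** alpha) * math.cos(xi * v)
    return s * h / math.pi

def leading_term_density(t, x, y, alpha, theta, sigma):
    """Leading term (A.2) for the toy drift a(theta; x) = -theta * x with constant
    scale sigma: here f_t(theta; x) = x * exp(-theta * t) and sigma_t(x) = sigma."""
    f_t = x * math.exp(-theta * t)
    u = (y - f_t) / (sigma * t ** (1.0 / alpha))
    return phi_alpha(u, alpha) / (sigma * t ** (1.0 / alpha))

# alpha = 1 recovers the Cauchy kernel: phi_1(v) = 1 / (pi * (1 + v^2))
print(phi_alpha(0.0, 1.0))  # about 1/pi
print(phi_alpha(1.0, 1.0))  # about 1/(2*pi)
```

For small $t$ the leading term concentrates around the deterministic flow $f_t(\theta_0; x)$ at the stable spatial scale $t^{1/\alpha}$, which is the heuristic behind the high-frequency contrast function.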
We have used this bound, combined with the stability assumption (2.19), in order to show that the "linear part" $U_n(\theta)$ of the contrast function is "essentially a martingale" in the sense that its predictable part is negligible; see Lemma 4.1. In order to analyze the "quadratic part" $Y(\theta)$, we require the residual term $r_t(x, y)$ to be small in another, pointwise sense. Namely, we have used in the proof of Proposition 4.1 that
$$\sup_{x, y} |r_t(x, y)| \le C t^{-1/\alpha + \delta'} \quad (A.4)$$
with some $\delta' > 0$. There is a substantial difference between the integral and pointwise bounds for the residual term $r_t(x, y)$ in (A.1); see [16, Example 3.7], which shows that the basic Assumption 2.1, while providing the integral bound (A.3), is not sufficient for the pointwise bound (A.4). This was the actual reason for us to introduce the additional Assumption 2.4.

Theorem A.2. Let Assumption 2.1 and Assumption 2.4 hold. Then the residual term in the decomposition (A.1) satisfies (A.4) with $\delta' = \frac{1 - \beta'}{\alpha} > 0$.

Proof. By part II of [16, Theorem 3.4], the following additional condition is sufficient for (A.4) to hold true for some $\delta' \le \delta$:
$$\sup_{w \in \mathbb R} t^{-1/\alpha} \int_{\mathbb R} \nu\big( |\sigma(x) u| > t^{1/\alpha},\ |x + \sigma(x) u - w| < t^{1/\alpha} \big) \, dx \le C t^{-1 + \delta'}.$$
Using the additivity of $\nu(du)$, it is easy to deduce the above condition from a similar one with arbitrary (but fixed) $\varepsilon > 0$:
$$\sup_{w \in \mathbb R} t^{-1/\alpha} \int_{\mathbb R} \nu\big( |\sigma(x) u| > t^{1/\alpha},\ |x + \sigma(x) u - w| < \varepsilon t^{1/\alpha} \big) \, dx \le C t^{-1 + \delta'}.$$
Taking $m = \inf_x \frac{1}{\sigma(x)}$ and $M = \sup_x \frac{1}{\sigma(x)}$, we obtain a further sufficient condition for the previous one:
$$\sup_{w \in \mathbb R} t^{-1/\alpha} \int_{\mathbb R} |\nu|\Big( |u| > m t^{1/\alpha},\ \Big| u - \frac{w - x}{\sigma(x)} \Big| < \varepsilon M t^{1/\alpha} \Big) \, dx \le C t^{-1 + \delta'}. \quad (A.5)$$
Now we are ready to verify this condition using Assumption 2.4. Fix $x, w$ and denote $z = \frac{w - x}{\sigma(x)}$. We have
$$|u| > m t^{1/\alpha},\ |u - z| \le \varepsilon M t^{1/\alpha} \implies |u - z| \le \varepsilon M t^{1/\alpha},\ |z| > m t^{1/\alpha} - \varepsilon M t^{1/\alpha}.$$
T ak e ε = m 3 M , then r := εM t 1 /α = 2( mt 1 /α − εM t 1 /α ) < | z | , and applying Assumption 2.4 w e get | ν | | u | > mt 1 /α , u − w − x σ ( x ) < εM t 1 /α ≤ | ν | ( | u − z | < r ) ≤ C r κ | z | − β ′ − κ for | z | > mt 1 /α − εM t 1 /α , and | ν | | u | > mt 1 /α , u − w − x σ ( x ) < εM t 1 /α = 0 otherwise. Note that m | x − w | ≤ | z | ≤ M | x − w | , 39 hence we can verify ( A.5 ) as follows: t − 1 /α Z R | ν | | u | > mt 1 /α , u − w − x σ ( x ) < εM t 1 /α dx ≤ t − 1 /α ( εM t 1 /α ) κ Z | x − w |≥ M − 1 ( m − εM ) t 1 /α | x − w | − κ − β ′ dw = C t − 1 /α + κ/α ( t 1 /α ) 1 − κ − β ′ = C t − β ′ /α = C t − 1+ δ ′ , where we used that κ + β ′ > 1 . B Limit theorems In this section w e prov e the weak conv ergence ( 5.4 ) for the triple (Ξ n , Γ n , Σ n ) . This will require substan- tially differen t to ols in the finite and infinite observ ation horizon cases, which we thus consider separately . B.1 Finite observ ation horizon Recall that, for the finite observ ation horizon case T n → T , we take V ( x ) ≡ 1 ; that is, we do not need additional weigh ts in the construction. Since the functions f ( x ) := ( ∇ θ a ( θ 0 ; x )) ⊗ 2 , g ( x ) = 1 σ ( x ) ( ∇ θ a ( θ 0 , x )) ⊗ 2 are contin uous and tra jectories of X are càdlàg, we hav e Γ n = 1 n n X k =1 f ( X t k − 1 ,n ) P → 1 T Z T 0 f ( X t ) dt = Γ 0 , Σ n = 1 n n X k =1 g ( X t k − 1 ,n ) P → 1 T Z T 0 g ( X t ) dt = Σ 0 simply by the conv ergence of Riemannian sums to the Riemann integral. Therefore, in order to prov e ( 5.4 ), it is enough to show that Ξ n s −L → Ξ 0 , (B.1) where s −L → denotes stable c onver genc e in law . W e refer to [ 12 , Section 2] for the definition and basic prop erties of this mo de of conv ergence. T o pro v e ( B.1 ) it is enough apply [ 12 , Theorem 3.2] to the sequence of pro cesses Ξ n t = [ nt ] X k =1 χ n k , t ∈ [0 , 1] , with χ n k = µ k,n ξ k,n , µ k,n = − 1 √ n sgn( ζ k,n ) − E [sgn( ζ k,n ) |F k − 1 ,n ] , ξ k,n = ∇ θ a ( θ 0 ; X t k − 1 ,n ) . 
There are five assumptions in [12, Theorem 3.2], two of which are satisfied trivially in our setting. Namely, both $\{\mu_{k,n}\}$ and $\{\chi^n_k\}$ are martingale difference sequences, hence the first assumption [12, Eq. (3.10)] holds trivially with $B_t \equiv 0$. Next, we will take the reference martingale $M$ identically equal to $0$, hence [12, Eq. (3.12)] holds trivially with $G_t \equiv 0$. It is left to verify the following three assumptions: for any $t \in [0,1]$,

• [12, Eq. (3.11)]:
\[
\sum_{k=1}^{[nt]} E[(\chi^n_k)^{\otimes 2} \mid \mathcal{F}_{k-1,n}] \xrightarrow{P} F_t := \frac{1}{T} \int_0^t f(X_s)\, ds; \tag{B.2}
\]

• [12, Eq. (3.13)]: for some $\varepsilon > 0$,
\[
\sum_{k=1}^{[nt]} E\bigl[ |\chi^n_k|^2 \mathbf{1}_{\{|\chi^n_k| > \varepsilon\}} \mid \mathcal{F}_{k-1,n} \bigr] \xrightarrow{P} 0; \tag{B.3}
\]

• [12, Eq. (3.14)]: for any bounded martingale $N$ on the filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in [0,\infty)}, P)$,
\[
\sum_{k=1}^{[nt]} E[(N_{t_{k,n}} - N_{t_{k-1,n}}) \chi^n_k \mid \mathcal{F}_{k-1,n}] \xrightarrow{P} 0. \tag{B.4}
\]

The first two properties are easy to check. Indeed,
\[
|\chi^n_k|^2 \le \frac{1}{n} |\nabla_\theta a(\theta_0; X_{t_{k-1,n}})|^2,
\]
and since the trajectory $X_t$, $t \in [0,T]$, is bounded and $\nabla_\theta a(\theta_0; \cdot)$ is continuous, (B.3) holds true. Next, we have
\[
E[(\chi^n_k)^{\otimes 2} \mid \mathcal{F}_{k-1,n}] = E[\mu_{k,n}^2 \mid \mathcal{F}_{k-1,n}]\, f(X_{t_{k-1,n}}) = \frac{1}{n} \bigl( 1 - (E[\operatorname{sgn}(\zeta_{k,n}) \mid \mathcal{F}_{k-1,n}])^2 \bigr) f(X_{t_{k-1,n}}).
\]
We have
\[
\frac{1}{n} \sum_{k=1}^{[nt]} f(X_{t_{k-1,n}}) \xrightarrow{P} F_t := \frac{1}{T} \int_0^t f(X_s)\, ds
\]
by the convergence of Riemann sums. Moreover,
\[
\bigl| E[\operatorname{sgn}(\zeta_{k,n}) \mid \mathcal{F}_{k-1,n}] \bigr| \le C h_n^{\delta} W(X_{t_{k-1,n}})
\]
(see (4.9) and the subsequent estimates), and the function $W(x)$ is locally bounded. Hence,
\[
\frac{1}{n} \sum_{k=1}^{[nt]} \bigl( E[\operatorname{sgn}(\zeta_{k,n}) \mid \mathcal{F}_{k-1,n}] \bigr)^2 f(X_{t_{k-1,n}}) \xrightarrow{P} 0,
\]
which gives (B.2). The main difficulty is provided by the 'orthogonality' condition (B.4), which is required to hold for any (bounded) martingale $N$. To describe the space $\mathcal{H}^1$ of all martingales on our given filtered space, we will use a version of the Jacod–Yor theorem, which we outline below, following the exposition in [8, Section 18.3].
We can assume without loss of generality that our filtered probability space is canonical: $\Omega = \mathbb{D}([0,\infty))$, the filtration $(\mathcal{F}_t)_{t \in [0,\infty)}$ is the corresponding natural filtration, and $P$ is the distribution in $\mathbb{D}([0,\infty))$ of the solution to (2.1). We note that, on this specific probability space, there exists a Lévy process with the characteristic exponent (2.3) such that (2.1) holds. In general, we understand the solution to (2.1) in the weak sense, that is, as a pair $(X, Z)$ on some probability space. Note, however, that the process $Z$ in this pair can be obtained from $X$ in a measurable way because $\sigma(x)$ is separated from $0$. Indeed, denote by $\Delta X_t = X_t - X_{t-}$ the jump size at time $t$ and consider the random point measure
\[
N^Z(A) = \#\Bigl\{ s : \Bigl( s, \frac{1}{\sigma(X_{s-})} \Delta X_s \Bigr) \in A \Bigr\}. \tag{B.5}
\]
This is the random point measure corresponding to the jumps of the process $Z$. Then $N^Z(ds, du)$ is a Poisson point measure with the intensity measure $ds\, \mu(du)$, and by the Itô–Lévy decomposition we have the representation
\[
Z_t = bt + \int_0^t \int_{|u| < 1} u\, \widetilde{N}^Z(ds, du) + \int_0^t \int_{|u| \ge 1} u\, N^Z(ds, du), \qquad \widetilde{N}^Z(ds, du) = N^Z(ds, du) - ds\, \mu(du). \tag{B.6}
\]
The same procedure can be repeated on the canonical probability space. Namely, we can define $N^{Z,\mathrm{can}}$ and $Z^{\mathrm{can}}$ by (B.5) and (B.6) with $X^{\mathrm{can}}$ instead of $X$. Since the laws of $X^{\mathrm{can}}$ and $X$ are the same, their images under the same measurable construction are also the same. That is, $N^{Z,\mathrm{can}}(ds, du)$ is a Poisson point measure with the intensity measure $ds\, \mu(du)$ and $Z^{\mathrm{can}}$ is a Lévy process with the characteristic exponent (2.3). Finally, $(X^{\mathrm{can}}, Z^{\mathrm{can}})$ satisfies (2.1) by construction. From now on, we operate on the canonical probability space only, and for brevity of notation omit the superscript; e.g. we write $X$ instead of $X^{\mathrm{can}}$. Denote by
\[
\mathcal{L} f(x) = a(\theta_0; x) f'(x) + \mathrm{P.V.} \int_{\mathbb{R}} \bigl( f(x + \sigma(x) u) - f(x) \bigr)\, \nu(du), \quad f \in C^2_b(\mathbb{R}),
\]
the generator of the solution to (2.1). Denote by $\mathcal{N}$ the set of all processes of the form
\[
f(X_t) - \int_0^t \mathcal{L} f(X_s)\, ds, \quad t \in [0, \infty).
\]
Denote by $\mathcal{P}(\mathcal{N})$ the set of all probability measures on $\Omega = \mathbb{D}([0,\infty))$ such that every process from $\mathcal{N}$ is a martingale; that is, the set of all solutions to the martingale problem $(\mathcal{L}, C^2_b(\mathbb{R}))$. Denote by $\mathcal{H}^p(P)$, for $p \in [1, \infty]$, the space of all martingales $M$ such that $E[\sup_t |M_t|^p] < \infty$. The Jacod–Yor theorem (e.g. [8, Theorem 18.3.6]) states, in particular, that the following two statements are equivalent for a given probability $P \in \mathcal{P}(\mathcal{N})$:

• the set of all stochastic integrals with respect to elements of $\mathcal{N}$ with bounded predictable integrands is dense in $\mathcal{H}^1(P)$;

• $P$ is an extremal point of $\mathcal{P}(\mathcal{N})$, considered as a convex subset of the space of all probability measures on $(\Omega, \mathcal{F})$.

By [16, Theorem 3.3], the martingale problem $(\mathcal{L}, C^2_b(\mathbb{R}))$ is well posed, i.e. has exactly one solution, and the law $P$ of the (unique) weak solution to (2.1) is this unique solution to the martingale problem. That is, $\mathcal{P}(\mathcal{N}) = \{P\}$ and thus $P$ is the extremal point of the one-point set $\mathcal{P}(\mathcal{N})$. This means that the stochastic integrals with respect to elements of $\mathcal{N}$ with bounded predictable integrands form a dense subset of $\mathcal{H}^1(P)$. We have already shown that, on the canonical probability space, the process $X$ satisfies (2.1) with the Lévy process which has the Itô–Lévy decomposition (B.6). Applying the Itô formula, we see that an arbitrary element of $\mathcal{N}$ can be written as
\[
f(X_t) - \int_0^t \mathcal{L} f(X_s)\, ds = \int_0^t \int_{\mathbb{R}} \bigl[ f(X_{s-} + \sigma(X_{s-}) u) - f(X_{s-}) \bigr]\, \widetilde{N}^Z(ds, du).
\]
It follows that any $P$-martingale $N$ can be obtained as a limit in $\mathcal{H}^1(P)$ of linear combinations of the martingales of the form
\[
I^{f,G}_t = \int_0^t G_s \int_{\mathbb{R}} \bigl[ f(X_{s-} + \sigma(X_{s-}) u) - f(X_{s-}) \bigr]\, \widetilde{N}^Z(ds, du)
\]
with $f \in C^2_b(\mathbb{R})$ and bounded predictable processes $G$. Each such linear combination is an element of the linear space $\mathcal{J}^1$ of processes of the form
\[
J_t = \int_0^t \int_{\mathbb{R}} H(s, u)\, \widetilde{N}^Z(ds, du) \tag{B.7}
\]
with predictable processes $H(s, u)$ such that $|H(s, u)| \le C(|u|^2 \wedge 1)$ a.s. for some constant $C$. Let us show that any bounded martingale $N$ can be approximated by elements of $\mathcal{J}^1$ in $\mathcal{H}^2(P)$. By the Burkholder–Davis–Gundy inequality (e.g. [8, Theorem 11.5.5]), for any $J \in \mathcal{J}^1$,
\[
c_1 E\bigl[ [J - N]_1^{1/2} \bigr] \le \| J - N \|_{\mathcal{H}^1} = E\Bigl[ \sup_{t \in [0,1]} |J_t - N_t| \Bigr] \le C_1 E\bigl[ [J - N]_1^{1/2} \bigr]
\]
for positive universal constants $c_1$ and $C_1$. We have
\[
[J - N] = [J] - 2[J, N] + [N] = \sum_{s \le t} (\Delta J_s)^2 - 2 \sum_{s \le t} \Delta J_s \Delta N_s + \langle N^c \rangle + \sum_{s \le t} (\Delta N_s)^2 = \sum_{s \le t} (\Delta J_s - \Delta N_s)^2 + \langle N^c \rangle,
\]
where we have used that $J$ does not have a continuous martingale part: $J^c = 0$. Let $\sup_t |N_t| \le R$ a.s. and define a new process $J^R \in \mathcal{J}^1$ by changing the integrand $H(s,u)$ to $H^R(s,u) = F_R(H(s,u))$,
\[
F_R(x) = \begin{cases} x, & |x| \le 2R; \\ 2R \operatorname{sgn}(x), & \text{otherwise}. \end{cases}
\]
The jump times of $J$ and $J^R$ coincide and, because $|\Delta N_s| \le 2R$, we have
\[
(\Delta J^R_s - \Delta N_s)^2 \le (\Delta J_s - \Delta N_s)^2
\]
for each jump. This yields $[J^R - N] \le [J - N]$; that is, a martingale $N$ with $\sup_t |N_t| \le R$ can be approximated in $\mathcal{H}^1(P)$ by processes from $\mathcal{J}^1$ whose jumps are bounded by $2R$. We can also stop these processes at the moment of their exit from $[-2R, 2R]$; this operation corresponds to the multiplication of $H(s,u)$ by $\mathbf{1}_{[0,\tau]}(s)$ with the corresponding exit time $\tau$, and the stopped process remains in the class $\mathcal{J}^1$.
This means that, in $\mathcal{H}^1(P)$, we can approximate a martingale $N$ bounded by $R$ by a sequence $\{J^n\} \subset \mathcal{J}^1$ bounded by $4R$. By the dominated convergence theorem, this approximation then holds in $\mathcal{H}^2(P)$ as well. Denote by $\mathcal{J}^0$ the class of processes from $\mathcal{J}^1$ such that, in the representation (B.7), the function $H(s,u)$ is bounded and equals $0$ for $|u| \le c$, where $c > 0$ may depend on the process. Clearly, by the Itô isometry, $\mathcal{J}^0$ is dense in $\mathcal{J}^1$ in $\mathcal{H}^2$. Summarizing the above considerations, we have proved that, for any bounded martingale $N$ and any $\varepsilon > 0$, there exists $J^\varepsilon \in \mathcal{J}^0$ such that
\[
E\Bigl[ \sup_{t \in [0,1]} (J^\varepsilon_t - N_t)^2 \Bigr] \le \varepsilon. \tag{B.8}
\]
Now we can prove the required assertion (B.4). First, we observe that (B.4) holds true with $N$ replaced by any $J \in \mathcal{J}^0$. Indeed, $\chi^n_k = \mu_{k,n} \xi_{k,n}$, where $|\mu_{k,n}| \le n^{-1/2}$. Since (B.4) states convergence in probability, we can use the standard localization technique (e.g. such as in the proof of Lemma 4.2) to restrict ourselves to the case $\sup_{s \in [0,T]} |X_s| \le K$. In this case $|\xi_{k,n}| \le C$ and thus
\[
|\chi^n_k| \le C n^{-\frac{1}{2}}. \tag{B.9}
\]
On the other hand, $J$ is a compensated compound Poisson process with bounded jumps, hence
\[
\bigl| E[(J_{t_{k,n}} - J_{t_{k-1,n}}) \chi^n_k \mid \mathcal{F}_{k-1,n}] \bigr| \le C h_n n^{-\frac{1}{2}} + C n^{-\frac{1}{2}} E\bigl[ |J_{t_{k,n}} - J_{t_{k-1,n}}| \mathbf{1}_{\{N^Z((t_{k-1,n}, t_{k,n}] \times (\mathbb{R} \setminus \{0\})) \ge 1\}} \,\big|\, \mathcal{F}_{k-1,n} \bigr] \le C h_n n^{-\frac{1}{2}}.
\]
Since
\[
\sum_{k=1}^n C h_n n^{-\frac{1}{2}} = C n^{-\frac{1}{2}} \to 0,
\]
this proves (B.4) for $J \in \mathcal{J}^0$. For an arbitrary bounded martingale $N$ and arbitrary $\varepsilon > 0$, take $J^\varepsilon \in \mathcal{J}^0$ such that (B.8) holds. Again, using localization we can assume (B.9). Then,
\[
\sum_{k=1}^{[nt]} E[(N_{t_{k,n}} - N_{t_{k-1,n}}) \chi^n_k \mid \mathcal{F}_{k-1,n}] = \sum_{k=1}^{[nt]} E[(J^\varepsilon_{t_{k,n}} - J^\varepsilon_{t_{k-1,n}}) \chi^n_k \mid \mathcal{F}_{k-1,n}] + \sum_{k=1}^{[nt]} E\bigl[ \bigl( (N_{t_{k,n}} - N_{t_{k-1,n}}) - (J^\varepsilon_{t_{k,n}} - J^\varepsilon_{t_{k-1,n}}) \bigr) \chi^n_k \,\big|\, \mathcal{F}_{k-1,n} \bigr].
\]
The first sum converges to $0$ in probability.
For the second sum, we have
\[
\begin{aligned}
E\Bigl| \sum_{k=1}^{[nt]} E\bigl[ \bigl( (N_{t_{k,n}} - N_{t_{k-1,n}}) - (J^\varepsilon_{t_{k,n}} - J^\varepsilon_{t_{k-1,n}}) \bigr) \chi^n_k \,\big|\, \mathcal{F}_{k-1,n} \bigr] \Bigr|
&\le E\Bigl[ \sum_{k=1}^{[nt]} \bigl| (N_{t_{k,n}} - N_{t_{k-1,n}}) - (J^\varepsilon_{t_{k,n}} - J^\varepsilon_{t_{k-1,n}}) \bigr|\, |\chi^n_k| \Bigr] \\
&\le \Bigl( E\Bigl[ \sum_{k=1}^{[nt]} \bigl( (N_{t_{k,n}} - N_{t_{k-1,n}}) - (J^\varepsilon_{t_{k,n}} - J^\varepsilon_{t_{k-1,n}}) \bigr)^2 \Bigr] \Bigr)^{1/2} \Bigl( E\Bigl[ \sum_{k=1}^{[nt]} (\chi^n_k)^2 \Bigr] \Bigr)^{1/2} \\
&\le C \Bigl( E\bigl[ \bigl( (N_1 - N_0) - (J^\varepsilon_1 - J^\varepsilon_0) \bigr)^2 \bigr] \Bigr)^{1/2} \le C \sqrt{\varepsilon}.
\end{aligned}
\]
Thus, for any $\gamma > 0$,
\[
\limsup_{n \to \infty} P\Bigl( \Bigl| \sum_{k=1}^{[nt]} E[(N_{t_{k,n}} - N_{t_{k-1,n}}) \chi^n_k \mid \mathcal{F}_{k-1,n}] \Bigr| > \gamma \Bigr) \le \frac{C \sqrt{\varepsilon}}{\gamma}.
\]
Since $\varepsilon > 0$ is arbitrary, this actually gives
\[
\lim_{n \to \infty} P\Bigl( \Bigl| \sum_{k=1}^{[nt]} E[(N_{t_{k,n}} - N_{t_{k-1,n}}) \chi^n_k \mid \mathcal{F}_{k-1,n}] \Bigr| > \gamma \Bigr) = 0
\]
and completes the proof of (B.4). We have checked all the assumptions required in [12, Theorem 3.2]. Applying this theorem, we get (B.1), which completes the proof.

B.2 Infinite observation horizon

We have the following.

Theorem B.1. Let Assumption 3.1 hold true. Then the process $X$ is ergodic with respect to $P_{\theta_0}$, and its transition probabilities admit the following convergence rates to the invariant measure:
\[
\| P_t(\theta_0; x, dy) - \pi(\theta_0; dy) \|_{TV} \le U(x)\, r(t), \quad t \ge 0, \tag{B.10}
\]
where

• for $\kappa \ge 1$: $U(x) = 1 + |x|^q$, $r(t) = C e^{-ct}$;

• for $\kappa < 1$: $U(x) = 1 + |x|^{q + \kappa - 1}$, $r(t) = C (1 + ct)^{-(q + \kappa - 1)/(1 - \kappa)}$,

with some $C, c > 0$.

This result follows by applying [15, Theorem 3.4.11] and [15, Theorem 3.4.12] for the cases $\kappa \ge 1$ and $\kappa < 1$, respectively. The Dobrushin condition requested in the preamble to these theorems can be verified easily using the properties of the transition density from Section A and the argument from [15, Section 3.2.2]. Given the ergodicity of the underlying process, we have the following.

Proposition B.1. For any $f \in L^1(\pi(\theta_0; \cdot))$,
\[
\frac{1}{n} \sum_{k=1}^n f(X_{t_{k-1,n}}) \xrightarrow{P} \int_{\mathbb{R}} f(y)\, \pi(\theta_0; dy). \tag{B.11}
\]

Proof.
This statement looks very much like the Birkhoff ergodic theorem, but we cannot use the latter here because of the $n$-dependent discretization step $h_n$. Instead, we use, with minimal changes, the arguments from [15, Section 5]. Namely, consider first the stationary version of $X$, i.e. the solution to (2.1) with random $X_0$ whose distribution equals $\pi(\theta_0; \cdot)$. It follows from (B.10) that, for $s \le t$,
\[
\bigl| \mathrm{Cov}\bigl( f(X_t), f(X_s) \bigr) \bigr| \le 2\, E[U(X_s)]\, r(t - s)\, \| f \|_\infty^2;
\]
see [15, Corollary 5.1.8], where one should take $\gamma = 1$, $W(x) \equiv 1$, $V(x) = U(x)$. It is then an easy calculation to show that, for bounded $f$, the convergence (B.11) holds true in the $L^2(P)$ sense. Using the $L^1(P)$ isometry for stationary $P$, we extend this convergence to the whole of $L^1(\pi(\theta_0; \cdot))$, though in this general case (B.11) holds true in the $L^1(P)$ sense. Finally, by (B.10), a non-stationary process can, with a prescribed probability arbitrarily close to 1, be coupled in finite time with the stationary version of $X$, which yields (B.11) in the general non-stationary setting. We omit the details of this quite common 'coupling' trick, referring the reader to [15] for details and references.

Assumption 3.2 yields that $G := V W^2 \in L^1(\pi(\theta_0; \cdot))$. Indeed, let $G_N = G \mathbf{1}_{\{G \le N\}}$; then, for any $N > 1$, by Proposition B.1,
\[
\int_{\mathbb{R}} G_N(y)\, \pi(\theta_0; dy) = \lim_{n \to \infty} E\Bigl[ \frac{1}{n} \sum_{k=1}^n G_N(X_{t_{k,n}}) \Bigr] \le C := \sup_n E\Bigl[ \frac{1}{n} \sum_{k=1}^n V(X_{t_{k,n}}) W(X_{t_{k,n}})^2 \Bigr].
\]
Taking the limit as $N \to \infty$ and using the monotone convergence theorem, we obtain the required statement. Since $V$ is bounded and $\nabla_\theta a$ is dominated by $W$, this means that the functions
\[
f(x) := V(x)^2 (\nabla_\theta a(\theta_0; x))^{\otimes 2}, \qquad g(x) := \frac{V(x)^2}{\sigma(x)} (\nabla_\theta a(\theta_0; x))^{\otimes 2}
\]
belong to $L^1(\pi(\theta_0; \cdot))$, and applying Proposition B.1 we get that
\[
\Gamma_n \xrightarrow{P} \Gamma_0, \qquad \Sigma_n \xrightarrow{P} \Sigma_0.
\]
The same argument shows that, for every $\theta$ and any $R$,
\[
\Lambda_{n,R}(\theta) \xrightarrow{P} \Lambda_{0,R}(\theta);
\]
see the notation prior to (4.18).
Because, for a given $R$, the random fields $\Lambda_{n,R}$ are uniformly Lipschitz in $\theta$, this yields (4.18). Recall the notation:
\[
\Xi^n_t = \sum_{k=1}^{[nt]} \chi^n_k, \quad \chi^n_k = \mu_{k,n} \xi_{k,n}, \quad \mu_{k,n} = -\frac{1}{\sqrt{n}} \bigl( \operatorname{sgn}(\zeta_{k,n}) - E[\operatorname{sgn}(\zeta_{k,n}) \mid \mathcal{F}_{k-1,n}] \bigr), \quad \xi_{k,n} = V(X_{t_{k-1,n}}) \nabla_\theta a(\theta_0; X_{t_{k-1,n}}).
\]
Recall that
\[
\sup_k \bigl| E[n\, \mu_{k,n}^2 \mid \mathcal{F}_{k-1,n}] - 1 \bigr| \to 0.
\]
Then, by Proposition B.1,
\[
\sum_{k=1}^{[nt]} E[(\chi^n_k)^{\otimes 2} \mid \mathcal{F}_{k-1,n}] \xrightarrow{P} t\, \Gamma_0.
\]
Using that $|\xi_{k,n}|$ is dominated by $V(X_{t_{k-1,n}}) W(X_{t_{k-1,n}})$ and the moment Assumption 3.2, we get that, for any $\varepsilon > 0$,
\[
\sum_{k=1}^{[nt]} E\bigl[ |\chi^n_k|^2 \mathbf{1}_{\{|\chi^n_k| > \varepsilon\}} \mid \mathcal{F}_{k-1,n} \bigr] \xrightarrow{P} 0.
\]
Then, by the central limit theorem for martingale difference arrays (e.g. [10]), we obtain
\[
\Xi_n = \Xi^n_1 \Rightarrow N(0, \Gamma_0),
\]
which completes the proof of (5.4).

C Some moment estimates

In this section, we prove the moment estimate
\[
\Bigl| E\bigl[ |h^{-1/\alpha} Z_h|^\rho \bigr] - E\bigl[ |h^{-1/\alpha} Z^{(\alpha)}_h|^\rho \bigr] \Bigr| \le C h^\delta \tag{C.1}
\]
for some $\rho > 0$, $\delta > 0$, used in Section 6. For that purpose, we first construct a pair of processes $Z$, $Z^{(\alpha)}$ on the same probability space in such a way that they are close, in a sense. Recall the decomposition (2.5) of the Lévy measure of $Z$:
\[
\mu(dz) = \mu_\alpha(dz) + \nu(dz).
\]
The Hahn–Jordan decomposition of the finite-variation signed measure $\nu(dz)$ gives the representation in terms of the positive and negative parts:
\[
\nu(dz) = \nu_+(dz) - \nu_-(dz).
\]
Denote $\mu_{\min}(dz) = \mu_\alpha(dz) - \nu_-(dz)$. Then,
\[
\mu_\alpha(dz) = \mu_{\min}(dz) + \nu_-(dz), \qquad \mu(dz) = \mu_{\min}(dz) + \nu_+(dz).
\]
Define on a certain probability space three independent Poisson point measures $N_{\min}(dz, dt)$, $N_+(dz, dt)$, $N_-(dz, dt)$, with the intensity measures $\mu_{\min}(dz)\, dt$, $\nu_+(dz)\, dt$, and $\nu_-(dz)\, dt$, respectively.
Denote
\[
N_\alpha(dz, dt) = N_{\min}(dz, dt) + N_-(dz, dt), \qquad N(dz, dt) = N_{\min}(dz, dt) + N_+(dz, dt),
\]
which are now dependent Poisson point measures with the intensity measures $\mu_\alpha(dz)\, dt$ and $\mu(dz)\, dt$, respectively. Define the process
\[
Z^{(\alpha)}_t = \int_0^t \int_{|z| \le 1} z\, \widetilde{N}_\alpha(dz, ds) + \int_0^t \int_{|z| > 1} z\, N_\alpha(dz, ds),
\]
where $\widetilde{N}_\alpha$ denotes the compensated $N_\alpha$. Write
\[
Z_t = bt + \int_0^t \int_{|z| \le 1} z\, \widetilde{N}(dz, ds) + \int_0^t \int_{|z| > 1} z\, N(dz, ds)
\]
for our driving Lévy process. Moreover, let
\[
N_\triangle(dz, dt) := N_+(dz, dt) + N_-(d(-z), dt),
\]
which defines a Poisson point measure with the intensity measure $\nu_\triangle(dz)\, dt$, where $\nu_\triangle(dz) := \nu_+(dz) + \nu_-(d(-z))$. Then,
\[
Z^\triangle_t := Z_t - Z^{(\alpha)}_t = bt + \int_0^t \int_{|z| \le 1} z\, \widetilde{N}_\triangle(dz, ds) + \int_0^t \int_{|z| > 1} z\, N_\triangle(dz, ds).
\]
We have
\[
\int_{|z| \le 1} z\, \nu_\triangle(dz) = \int_{|z| \le 1} \bigl( z\, \nu_+(dz) - z\, \nu_-(dz) \bigr) = \int_{|z| \le 1} z\, \nu(dz) = b;
\]
see Remark 2.1 for the last identity. This finally gives the representation
\[
Z = Z^{(\alpha)} + Z^\triangle \tag{C.2}
\]
with
\[
Z^\triangle_t = \int_0^t \int_{\mathbb{R}} z\, N_\triangle(dz, ds).
\]
Write $Z^\triangle = Z^{\triangle,\mathrm{small}} + Z^{\triangle,\mathrm{large}}$, where
\[
Z^{\triangle,\mathrm{small}}_t := \int_0^t \int_{|z| \le 1} z\, N_\triangle(dz, ds), \qquad Z^{\triangle,\mathrm{large}}_t := \int_0^t \int_{|z| > 1} z\, N_\triangle(dz, ds).
\]
Recall that $\beta \in [0, \alpha/2) \subset [0,1)$ denotes the Blumenthal–Getoor index of the nuisance part of $Z$, hence of $Z^{\triangle,\mathrm{small}}$; see Remark 2.4 and also Examples 2.5 and 2.6. Building on the above descriptions, in the following lemma we prove small-time moment estimates for the two components of $h^{-\frac{1}{\alpha}} Z^\triangle_h$. Note that Assumption 3.1 entails that $E[|Z^{\triangle,\mathrm{large}}_1|^q] < \infty$ when $T_n \to \infty$, whereas no moment conditions are required when $T_n \equiv T$. In the sequel, we suppress the explicit dependence on $q > 0$ to present our claims more concisely.

Lemma C.1. We have the following for $h \in (0, 1]$.

1. For $l \in (0, 1]$ for which $E[|Z^{\triangle,\mathrm{large}}_1|^l] < \infty$,
\[
E\bigl[ |h^{-\frac{1}{\alpha}} Z^{\triangle,\mathrm{large}}_h|^l \bigr] \le C h^{1 - \frac{l}{\alpha}}. \tag{C.3}
\]

2.
If $\beta = 0$, then for $s \in (0, 1]$,
\[
E\bigl[ |h^{-\frac{1}{\alpha}} Z^{\triangle,\mathrm{small}}_h|^s \bigr] \le C h^{1 - \frac{s}{\alpha}}. \tag{C.4}
\]
If $\beta > 0$, then for $s \in (0, \beta)$,
\[
E\bigl[ |h^{-\frac{1}{\alpha}} Z^{\triangle,\mathrm{small}}_h|^s \bigr] \le C h^{\frac{s}{\gamma} - \frac{s}{\alpha}} \tag{C.5}
\]
for any $\gamma \in (\beta, \alpha \wedge 1)$.

Proof. Both $Z^{\triangle,\mathrm{large}}_t$ and $Z^{\triangle,\mathrm{small}}_t$ are just sums of some jumps of the process $Z^\triangle$: the first is a finite sum, the second is the sum of an a.s. absolutely convergent series. This observation, together with the elementary inequality $|\sum_i a_i|^p \le \sum_i |a_i|^p$ for $p \in (0, 1]$ and $\{a_i\} \subset \mathbb{R}$, gives the estimate
\[
E\Bigl[ \Bigl| \int_0^t \int_U z\, N_\triangle(dz, ds) \Bigr|^p \Bigr] \le E\Bigl[ \int_0^t \int_U |z|^p\, N_\triangle(dz, ds) \Bigr] = t \int_U |z|^p\, \nu_\triangle(dz)
\]
for any $p \in (0, 1]$ and $U \subset \mathbb{R}$ such that $\int_U |z|^p\, \nu_\triangle(dz) < \infty$. Taking in this estimate $p = l$ and $U = \{|u| > 1\}$, we get (C.3). Also, taking $p = s$ and $U = \{|u| \le 1\}$, we get (C.4). Finally, applying the same inequality with $p = \gamma$ and $U = \{|u| \le 1\}$, we get
\[
E\bigl[ |h^{-\frac{1}{\alpha}} Z^{\triangle,\mathrm{small}}_h|^\gamma \bigr] \le C h^{1 - \frac{\gamma}{\alpha}},
\]
where we used that $|\nu_\triangle|(dz) \le 2 |\nu|(dz)$ has Blumenthal–Getoor index $\beta < \gamma$, and therefore
\[
\int_{|z| \le 1} |z|^\gamma\, \nu_\triangle(dz) < \infty.
\]
Then (C.5) follows by the Lyapunov inequality.

The estimate (C.1) is immediate from the previous three bounds. Combining (C.3), (C.4), and (C.5), we get the following.

Corollary C.1. For any $\rho \in (0, \mathbf{1}_{\{\beta = 0\}} + \mathbf{1}_{\{\beta > 0\}} \beta)$ and $\gamma \in (\beta, \alpha \wedge 1)$, (C.1) holds true with
\[
\delta = \mathbf{1}_{\{\beta = 0\}} \Bigl( 1 - \frac{\rho}{\alpha} \Bigr) + \mathbf{1}_{\{\beta > 0\}} \Bigl( \frac{\rho}{\gamma} - \frac{\rho}{\alpha} \Bigr) > 0.
\]

It is straightforward to extend Lemma C.1 and the corresponding moment bounds to stochastic integrals: for $t \ge 0$, $h \in (0, 1]$, $s \in (0, \beta)$, and for $l \in (0, 1]$ for which $E[|Z^{\triangle,\mathrm{large}}_1|^l] < \infty$, we can find $\lambda > 1/2$ such that
\[
E\Bigl[ \Bigl| h^{-\frac{1}{\alpha}} \int_t^{t+h} \xi_{s-}\, dZ^{\triangle,\mathrm{large}}_s \Bigr|^l \Bigr] \le C h^{1 - \frac{l}{\alpha}} \sup_{u \in [t, t+h]} E\bigl[ |\xi_u|^l \bigr], \qquad
E\Bigl[ \Bigl| h^{-\frac{1}{\alpha}} \int_t^{t+h} \xi_{s-}\, dZ^{\triangle,\mathrm{small}}_s \Bigr|^s \Bigr] \le C h^\lambda \sup_{u \in [t, t+h]} E\bigl[ |\xi_u|^s \bigr].
\]
See also Lemma 6.2 for the related moment estimate associated with the locally $\alpha$-stable small-jump part.
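For the reader's convenience, the Lyapunov-inequality step invoked at the end of the proof of Lemma C.1 can be spelled out; it is Jensen's inequality for the concave map $x \mapsto x^{s/\gamma}$ with $s < \gamma$:
\[
E\bigl[ |h^{-\frac{1}{\alpha}} Z^{\triangle,\mathrm{small}}_h|^s \bigr]
\le \Bigl( E\bigl[ |h^{-\frac{1}{\alpha}} Z^{\triangle,\mathrm{small}}_h|^\gamma \bigr] \Bigr)^{s/\gamma}
\le \bigl( C h^{1 - \frac{\gamma}{\alpha}} \bigr)^{s/\gamma}
= C' h^{\frac{s}{\gamma} - \frac{s}{\alpha}},
\]
which is exactly the bound (C.5).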
The following Example C.1 covers more than is necessary for our use in proving the second item of Lemma 6.3, but it is important in statistical applications with the stable quasi-likelihood function. See [5], [6], [23], and also [24].

Example C.1. Suppose that the Lévy measure $\mu(dz)$ admits a Lebesgue density on $\mathbb{R} \setminus \{0\}$:
\[
\mu(dz) = \frac{c_\alpha}{|z|^{\alpha+1}} m(z)\, dz
\]
for some function $m(z) = 1 + m_0(z)$ with $m_0(0) = 0$ satisfying $m_0(z) = O(|z|)$ as $|z| \to 0$. This is related to (2.22) in Example 2.6 (namely, $n(z) = c_\alpha |z|^{-(\alpha+1)} m_0(z)$) and is a rather particular case compared with the general setting. Still, it includes many popular specific examples, such as the (exponentially) tempered stable, the normal tempered stable, the generalized hyperbolic (including the Student-$t$), the Meixner, and so on. Suppose that $\beta = 0 \vee (\alpha - 1)$, which is satisfied in many contexts, including the aforementioned examples. Under the present assumptions, Lemma C.1 gives (C.3), (C.4), and (C.5), with $\beta < \gamma < (\alpha \wedge 1)$. We further suppose that $m := (l \vee s) < \frac{\alpha}{2}$. Then we have $1 - m/\alpha > 1/2$, so that the upper bounds in (C.3) and (C.4) become $o(h^{1/2})$ as $h \to 0$. Let us observe that, in the estimate (C.5), stated for $\beta > 0$ and $\gamma \in (\beta, \alpha \wedge 1)$, we can take
\[
\frac{s}{\gamma} - \frac{s}{\alpha} > \frac{1}{2}. \tag{C.6}
\]
To see this, it suffices to consider $\alpha > 1$, since $\beta = 0$ otherwise; then $\beta = \alpha - 1 > 0$. As was seen in the proof of Lemma C.1, we can take $\gamma\,(> \beta)$ as close to $\beta$ as possible, say $\gamma = \beta + \varepsilon$ with $\varepsilon > 0$ small enough. Then, (C.6) is equivalent to
\[
s > \frac{1}{2} \Bigl( \frac{1}{\alpha - 1 + \varepsilon} - \frac{1}{\alpha} \Bigr)^{-1}.
\]
This, together with the condition $s < \beta = \alpha - 1$, leads to
\[
\alpha > 1 + \frac{1}{2} \Bigl( \frac{1}{\alpha - 1 + \varepsilon} - \frac{1}{\alpha} \Bigr)^{-1}.
\]
The lower bound monotonically decreases to $1 + \frac{1}{2} \alpha (\alpha - 1)$ as $\varepsilon \downarrow 0$, and the inequality
\[
\alpha > 1 + \frac{1}{2} \alpha (\alpha - 1) \iff \alpha \in (1, 2)
\]
always holds, thus concluding (C.6).
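For completeness, we record the elementary algebra behind the final equivalence in Example C.1:
\[
\alpha > 1 + \tfrac{1}{2} \alpha (\alpha - 1)
\iff 2\alpha > 2 + \alpha^2 - \alpha
\iff \alpha^2 - 3\alpha + 2 < 0
\iff (\alpha - 1)(\alpha - 2) < 0
\iff \alpha \in (1, 2).
\]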
Hence, the upper bounds in (C.3), (C.4), and (C.5) are all $o(h^{1/2})$ as $h \to 0$. It follows from (C.1) that
\[
\frac{1}{\sqrt{h}} \Bigl| E\bigl[ |h^{-1/\alpha} Z_h|^\rho \bigr] - E\bigl[ |h^{-1/\alpha} Z^{(\alpha)}_h|^\rho \bigr] \Bigr| \to 0 \tag{C.7}
\]
for any $\rho \in (0, \mathbf{1}_{\{\beta = 0\}} + \mathbf{1}_{\{\beta > 0\}} \beta)$ for which $E[|Z_1|^\rho] < \infty$. The estimate (C.7) is an $L^1$-local limit theorem with a convergence rate. It serves as a variant of [6, Proposition 4.1].

Acknowledgements. This work was partially supported by JSPS KAKENHI Grant Number 23K22410 and JST CREST Grant Number JPMJCR2115, Japan (HM).

References

[1] J.-M. Bardet, Y. Boularouk, and K. Djaballah. Asymptotic behavior of the Laplacian quasi-maximum likelihood estimator of affine causal processes. Electron. J. Stat., 11(1):452–479, 2017.
[2] A. Brouste and H. Masuda. Efficient estimation of stable Lévy process with symmetric jumps. Stat. Inference Stoch. Process., 21(2):289–307, 2018.
[3] K. Chen, Z. Ying, H. Zhang, and L. Zhao. Analysis of least absolute deviation. Biometrika, 95(1):107–122, 2008.
[4] E. Clément and A. Gloter. Local asymptotic mixed normality property for discretely observed stochastic differential equations driven by stable Lévy processes. Stochastic Process. Appl., 125(6):2316–2352, 2015.
[5] E. Clément and A. Gloter. Estimating functions for SDE driven by stable Lévy processes. Ann. Inst. Henri Poincaré Probab. Stat., 55(3):1316–1348, 2019.
[6] E. Clément and A. Gloter. Joint estimation for SDE driven by locally stable Lévy processes. Electron. J. Stat., 14(2):2922–2956, 2020.
[7] E. Clément, A. Gloter, and H. Nguyen. LAMN property for the drift and volatility parameters of a SDE driven by a stable Lévy process. ESAIM Probab. Stat., 23:136–175, 2019.
[8] S. N. Cohen and R. J. Elliott. Stochastic Calculus and Applications. Probability and its Applications. Springer, Cham, second edition, 2015.
[9] P. Doukhan. Mixing: Properties and Examples, volume 85 of Lecture Notes in Statistics. Springer-Verlag, New York, 1994.
[10] A. Dvoretzky. Asymptotic normality of sums of dependent random vectors. In Multivariate Analysis, IV (Proc. Fourth Internat. Sympos., Dayton, Ohio, 1975), pages 23–34. North-Holland, Amsterdam, 1977.
[11] D. Ivanenko, A. M. Kulik, and H. Masuda. Uniform LAN property of locally stable Lévy process observed at high frequency. ALEA Lat. Am. J. Probab. Math. Stat., 12(2):835–862, 2015.
[12] J. Jacod. On continuous conditional Gaussian martingales and stable convergence in law. In Séminaire de Probabilités, XXXI, volume 1655 of Lecture Notes in Math., pages 232–246. Springer, Berlin, 1997.
[13] J. Jacod and P. Protter. Discretization of Processes, volume 67 of Stochastic Modelling and Applied Probability. Springer, Heidelberg, 2012.
[14] K. Kaleta and P. Sztonyk. Estimates of transition densities and their derivatives for jump Lévy processes. Journal of Mathematical Analysis and Applications, 431(1):260–282, 2015.
[15] A. Kulik. Ergodic Behavior of Markov Processes, volume 67 of De Gruyter Studies in Mathematics. De Gruyter, Berlin, 2018. With applications to limit theorems.
[16] A. Kulik. Approximation in law of locally $\alpha$-stable Lévy-type processes by non-linear regressions. Electron. J. Probab., 24: Paper No. 83, 45 pp., 2019.
[17] A. Kulik and I. Pavlyukevich. Moment bounds for dissipative semimartingales with heavy jumps. Stochastic Process. Appl., 141:274–308, 2021.
[18] H. Luschgy and G. Pagès. Moment estimates for Lévy processes. Electron. Commun. Probab., 13:422–434, 2008.
[19] H. Masuda. Joint estimation of discretely observed stable Lévy processes with symmetric Lévy density. J. Japan Statist. Soc., 39(1):49–75, 2009.
[20] H. Masuda. Approximate self-weighted LAD estimation of discretely observed ergodic Ornstein-Uhlenbeck processes. Electron. J. Stat., 4:525–565, 2010.
[21] H. Masuda. Convergence of Gaussian quasi-likelihood random fields for ergodic Lévy driven SDE observed at high frequency. Ann. Statist., 41(3):1593–1641, 2013.
[22] H. Masuda. Parametric estimation of Lévy processes. In Lévy Matters IV, volume 2128 of Lecture Notes in Math., pages 179–286. Springer, Cham, 2015. Edited by Ole E. Barndorff-Nielsen, Jean Bertoin, Jean Jacod, and Claudia Küppelberg.
[23] H. Masuda. Non-Gaussian quasi-likelihood estimation of SDE driven by locally stable Lévy process. Stochastic Process. Appl., 129(3):1013–1059, 2019.
[24] H. Masuda. On model selection in locally stable regression. Cooperative Research Report 446 (pp. 5–11), The Institute of Statistical Mathematics, 2021.
[25] W. Oberhofer. The consistency of nonlinear regression minimizing the $L_1$-norm. Ann. Statist., 10(1):316–319, 1982.
[26] D. Pollard. Asymptotics for least absolute deviation regression estimators. Econometric Theory, 7(2):186–199, 1991.
[27] K.-i. Sato. Lévy Processes and Infinitely Divisible Distributions, volume 68 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1999. Translated from the 1990 Japanese original, revised by the author.
[28] V. Todorov. Power variation from second order differences for pure jump semimartingales. Stochastic Process. Appl., 123(7):2829–2850, 2013.
[29] K. Zhu and S. Ling. Global self-weighted and local quasi-maximum exponential likelihood estimators for ARMA-GARCH/IGARCH models. Ann. Statist., 39(4):2131–2163, 2011.
[30] K. Zhu and S. Ling. LADE-based inference for ARMA models with unspecified and heavy-tailed heteroscedastic noises. J. Amer. Statist. Assoc., 110(510):784–794, 2015.
[31] S. Zwanzig. On $L_1$-norm estimators in nonlinear regression and in nonlinear error-in-variables models. In $L_1$-Statistical Procedures and Related Topics (Neuchâtel, 1997), volume 31 of IMS Lecture Notes Monogr. Ser., pages 101–118. Inst. Math. Statist., Hayward, CA, 1997.