Efficient Inference after Directionally Stable Adaptive Experiments

Zikai Shen∗1, Houssam Zenati∗2, Nathan Kallus3,4, Arthur Gretton2,5, Koulik Khamaru6, and Aurélien Bibaut4

1 University College London   2 Gatsby Computational Neuroscience Unit, University College London   3 Cornell University   4 Netflix   5 Google DeepMind   6 Rutgers University

∗ Equal contribution.

February 26, 2026

Abstract

We study inference on scalar-valued pathwise differentiable targets after adaptive data collection, such as by a bandit algorithm. We introduce a novel target-specific condition, directional stability, which is strictly weaker than previously imposed target-agnostic stability conditions. Under directional stability, we show that estimators that would have been efficient under i.i.d. data remain asymptotically normal and semiparametrically efficient when computed from adaptively collected trajectories. The canonical gradient has a martingale form, and directional stability guarantees stabilization of its predictable quadratic variation, enabling high-dimensional asymptotic normality. We characterize efficiency using a convolution theorem for the adaptive-data setting, and give a condition under which the one-step estimator attains the efficiency bound. We verify directional stability for LinUCB, yielding the first semiparametric efficiency guarantee for a regular scalar target under LinUCB sampling.

1 Introduction

Statistical inference under adaptive data collection is now routine in modern learning systems. In contextual bandits and related online decision problems, actions are chosen using past observations, inducing dependence between observations collected at different rounds [Auer et al., 2002, Li et al., 2010, Lattimore and Szepesvári, 2020, Sutton and Barto, 2018]. This dependence can invalidate classical i.i.d. asymptotics: even when estimators remain consistent, their limiting distributions may be non-normal, complicating inference [van der Vaart, 1998, Hall and Heyde, 1980, Hadad et al., 2021, Bibaut and Kallus, 2025].

A key mechanism by which classical inference can be recovered is stability of the adaptive design. Specifically, in the linear regression setting, if the realized random design matrix becomes deterministic as the horizon grows large, then the predictable-quadratic-variation convergence condition of martingale central limit theorems is satisfied in the analysis of the OLS estimator, which guarantees asymptotic normality [Lai and Wei, 1982]. Most existing stability formulations are full-matrix conditions: they require the entire empirical covariance (or information) matrix to stabilize after deterministic rescaling. While powerful, such global requirements can be misaligned with regret-minimizing objectives that limit exploration. In particular, modern bandit algorithms deliberately concentrate sampling in "good" directions, so information typically accumulates anisotropically across directions. Full-matrix stability can therefore be too regret-expensive to enforce. Fortunately, even when it fails, inference for a specific scalar target of interest may still be possible. We introduce a novel notion of stability which is sufficient for inference on scalar pathwise differentiable targets. Our starting point is that for scalar targets, inference depends on the design only through the direction in which the target of interest depends on the coefficient vector.
We introduce a target-specific notion we call directional stability, which requires stabilization of the empirical design only along a relevant direction (at an explicit rate), while allowing instability in directions that do not affect the target. Directional stability can be substantially weaker than classical full-matrix stability [Lai and Wei, 1982] and is compatible with the anisotropic exploration patterns induced by regret minimization, as we show by example with LinUCB. Finally, anisotropic stabilization is not unique to optimistic exploration. In the multi-armed setting, recent work by Han [2026] shows that Thompson sampling exhibits a sharp stability dichotomy: the pull counts are asymptotically deterministic for each suboptimal arm (and for the unique optimal arm), whereas when there are multiple optimal arms, the vector of optimal-arm pull proportions converges to a non-degenerate random limit characterized as the invariant law of an SDE. This provides a complementary example in which stabilization holds in suboptimal-arm directions but fails along directions associated with the optimal arms.

Main message: in directionally stable designs, i.i.d.-efficient estimators remain efficient. Our main consequence is simple: under directional stability, the estimators that would have been asymptotically normal and efficient under i.i.d. sampling are also the ones that are asymptotically normal and efficient under adaptive sampling. No alterations or square-root-propensity weights are needed. We work at the trajectory level, viewing the observed bandit history as a single draw from a horizon-indexed longitudinal experiment. We derive the trajectory-level canonical gradient for the target and show it has a martingale form. Directional stability then guarantees stabilization of the martingale's (conditional) quadratic variation, yielding asymptotic normality in high-dimensional regimes. We further develop an efficiency theory for a sequence of horizon-indexed experiments and show that the resulting one-step estimator achieves the semiparametric efficiency bound associated with the stable limit experiment. In particular, once directional stability holds, there is no intrinsic need for propensity weighting or ad hoc variance stabilization: the classical one-step construction from the i.i.d. setting is already optimal, while propensity weighting would actually degrade efficiency.

Relation to prior work. Several lines of work obtain valid inference under adaptivity by modifying estimating equations to enforce normality under broad sampling rules. Prominent examples include propensity-weighted or variance-stabilized doubly robust scores, which often require either known assignment probabilities or consistent propensity estimation and conditional variance corrections [e.g., Hadad et al., 2021, Bibaut et al., 2021, Zhang et al., 2021]. A complementary literature develops always-valid procedures (e.g., confidence sequences) that remain valid under arbitrary stopping, sometimes at the cost of conservatism at fixed horizons [Howard et al., 2021, Waudby-Smith and Ramdas, 2021, Bibaut et al., 2022]. Another strand characterizes non-Gaussian limits induced by adaptive designs and performs inference by inverting the corresponding limit experiments [Rosenberger and Hu, 1999, Hirano and Porter, 2023, Adusumilli, 2023, Cho et al., 2025].
A review of these different lines is given in Bibaut and Kallus [2025]. A distinct and increasingly active line of work studies when classical Wald-type inference is restored under adaptive sampling via stability of the empirical design, in the sense of Lai and Wei [1982]. This perspective has motivated stability-based analyses for a range of bandit algorithms, including UCB-type methods and refinements thereof [Kalvit and Zeevi, 2021, Khamaru and Zhang, 2024, Fan and Glynn, 2022, Han et al., 2024], more precise and quantitative characterizations of (in)stability and coverage behavior [Fan et al., 2024, Han, 2026], stable variants of Thompson sampling under variance inflation [Halder et al., 2025], and linear contextual bandits such as LinUCB [Fan et al., 2025]. At the same time, these works highlight that many widely used adaptive policies fail to satisfy full-matrix stability, leading to systematic under-coverage of naive Wald intervals when applied without adjustment [Fan et al., 2024]. Our contribution is a weakening of the notion of stability: given a target, we identify a minimal statistical functional of the design whose convergence along a deterministic sequence suffices for asymptotic normality, together with a characterization of the efficiency bound.

Application to LinUCB. Finally, we instantiate our conditions for LinUCB. Full-matrix stability is not currently known to hold in linear contextual bandits due to direction-dependent information growth. Directional stability, by construction, matches this anisotropy: it only asks for stabilization in the target direction. Plugging in the characterization of the eigendecomposition of the empirical design matrix under LinUCB from Fan et al. [2025], we verify directional stability and obtain, to our knowledge, the first semiparametric efficiency guarantee for inference on a regular scalar target under LinUCB sampling.

Our contributions are as follows:

• Trajectory canonical gradient and one-step estimator. We derive the canonical gradient for a functional of the repeated factor (that is, the environment factor, as opposed to the design factors) in the distribution of the full bandit trajectory and construct a one-step estimator that is algebraically identical to the i.i.d. one-step estimator (Sections 2–4).

• Directional stability and high-dimensional asymptotics. We introduce directional stability and show it yields asymptotic normality via stabilization of the predictable quadratic variation, including high-dimensional regimes where plug-in OLS asymptotics are inadequate. We then verify directional stability for LinUCB (Section 5).

• Efficiency theory for horizon-indexed experiments. We use the notion of regularity along a sequence of submodels from van der Laan et al. [2026] and efficiency with respect to such regular estimators. We characterize the efficiency bound under directional stability and show that the one-step estimator attains it (Section 6).

1.1 Data and Statistical Models

Data. We observe an adaptively collected sequence $\{O_t\}_{t=1}^{T}$ with $O_t := (X_t, A_t, Y_t)$, where $X_t \in \mathcal X$ is a (potentially continuous) context, $A_t \in \mathcal A = \{1, \dots, K\}$ is a categorical action, and $Y_t \in \mathcal Y \subset \mathbb R$ is an outcome. Let $\bar O_t := (O_1, \dots, O_t)$ denote the history and $\mathcal F_t := \sigma(\bar O_t)$ the associated filtration.
The data are generated by a contextual-bandit-type experiment: at each round $t$, the agent selects a (random) logging policy $g_t(\cdot \mid x, \bar O_{t-1})$ based on past observations, then observes a fresh, i.i.d.-sampled context $X_t$, draws an action $A_t \sim g_t(\cdot \mid X_t, \bar O_{t-1})$, and finally observes an outcome $Y_t$ drawn from an unknown model conditional on $(X_t, A_t)$.

We assume that $X_t$ is independent of $\mathcal F_{t-1}$ with stationary marginal distribution $Q_{0,X}$. Conditional on $(\mathcal F_{t-1}, X_t)$, the action satisfies $P(A_t = a \mid \mathcal F_{t-1}, X_t) = g_t(a \mid X_t, \bar O_{t-1})$ for all $a \in \mathcal A$, where $g_t$ is $\mathcal F_{t-1}$-measurable and maps $\mathcal X$ to the $K$-simplex. Conditional on $(X_t, A_t)$, the outcome is independent of the past and has stationary conditional distribution $Y_t \mid (X_t = x, A_t = a) \sim Q_{0,Y}(\cdot \mid a, x)$.

Statistical models. We suppose that the full trajectory $\bar O_T$ is a draw from $P^{(T)}$, living in the set $\mathcal M^{\mathrm{np}}_T$ of distributions that are absolutely continuous w.r.t. an appropriate product measure $\mu^{(T)} = \mu^{\otimes T}$, whose density w.r.t. $\mu^{(T)}$ factors as
\[
\frac{dP^{(T)}}{d\mu^{(T)}}(\bar o_T) = \prod_{t=1}^{T} q_X(x_t)\, g_t(a_t \mid x_t, \bar o_{t-1})\, q_Y(y_t \mid a_t, x_t), \tag{1}
\]
where the likelihood factors $q_X$, $q_Y$, $g_t$, $1 \le t \le T$, are unknown to the statistician and allowed to vary arbitrarily. We assume that $\mu(\mathcal X \times \mathcal A) = 1$. We then specialize to a restricted setting in the next subsection. For each horizon $T$, the data-generating process induces a statistical experiment $\mathcal M^{\mathrm{np}}_T$ consisting of all distributions $P^{(T)}$ on $\bar O_T$ admitting the factorization (1). We emphasize that $(\mathcal M^{\mathrm{np}}_T)_{T \ge 1}$ is a sequence of statistical models.

1.2 Sequence of Target Estimands

We now define a sequence of target estimands indexed by the experimental horizon $T$. Each target $\Psi_T$ is defined on the corresponding statistical model $\mathcal M_T$ and depends on a feature representation whose dimension may grow with $T$. All asymptotic statements in the sequel are understood along this sequence as $T \to \infty$. Throughout, let $\mathcal Z = \mathcal X \times \mathcal A$. We write $z = (x, a)$ and $Z_t = (X_t, A_t)$. For each horizon $T$, define a feature map $\phi_T : \mathcal Z \to \mathbb R^{d_T}$.

Notation. For any measurable $f : \mathcal Z \times \mathcal Y \to \mathbb R$, define the time-averaged empirical process notation
\[
\bar P^{(T)} f := \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[f(Z_t, Y_t)\big],
\]
which integrates over the arguments of $f$ only, even when $f$ is random. This induces the Hilbert space
\[
L^2(\bar P^{(T)}) := \big\{ f : \mathcal Z \times \mathcal Y \to \mathbb R \;\big|\; \bar P^{(T)}(f^2) < \infty \big\}, \qquad \|f\|_{L^2(\bar P^{(T)})} := \big\{\bar P^{(T)}(f^2)\big\}^{1/2}. \tag{2}
\]
Similarly, for any measurable $f : \mathcal Z \times \mathcal Y \to \mathbb R$, define the empirical average
\[
\bar P_T f := \frac{1}{T}\sum_{t=1}^{T} f(X_t, A_t, Y_t),
\]
with associated (random) seminorm $\|f\|_{L^2(\bar P_T)} := \{\bar P_T(f^2)\}^{1/2}$ and space $L^2(\bar P_T)$. Finally, define the pooled second-moment matrix $\bar\Sigma_T := \mathbb{E}_{P^{(T)}}\big[\frac{1}{T}\sum_{t=1}^{T}\phi_T(Z_t)\phi_T(Z_t)^\top\big]$. In general, we use the bar notation to denote averaging over time and over the distribution of trajectories.

Assumption 1. The set $\{\phi_T(z) \mid z \in \mathcal Z\}$ is linearly independent.

For $T \ge 1$, we define the sequence of statistical models $\mathcal M_T \subset \mathcal M^{\mathrm{np}}_T$ as
\[
\mathcal M_T = \Big\{ P^{(T)} \in \mathcal M^{\mathrm{np}}_T \;\Big|\; \exists\, \beta_0 \in \mathbb R^{d_T} \text{ s.t. } \forall z \in \mathcal Z,\; \mathbb{E}_{P^{(T)}}[Y_t \mid Z_t = z] = \phi_T(z)^\top \beta_0 \Big\}. \tag{3}
\]
In Eq. (3), $\mathbb{E}_{P^{(T)}}[Y_t \mid Z_t = z]$ denotes the conditional expectation under the invariant factor $q_Y$ in the factorization of $P^{(T)}$ in Eq. (1).
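To fix ideas, here is a minimal Python sketch that draws one trajectory from a law of the form (1). The epsilon-greedy logging policy, the uniform context distribution, and the scaled one-hot feature map are illustrative stand-ins chosen for brevity; nothing in the paper restricts $g_t$ or $\phi_T$ to these forms.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 500, 3                               # horizon and number of actions
d = K                                       # feature dimension of the toy feature map
beta0 = np.array([1.0, 0.5, -0.2])          # true coefficient beta_0 in Eq. (3)
sigma = 1.0                                 # homoskedastic noise scale (cf. Assumption 3)

def phi(x, a):
    """Toy feature map phi_T(z) for z = (x, a): context-scaled one-hot in the action."""
    f = np.zeros(d)
    f[a] = x
    return f

Sigma, Sxy = np.eye(d), np.zeros(d)         # ridge statistics driving the adaptive policy
data = []
for t in range(T):
    x = rng.uniform(0.5, 1.5)               # fresh i.i.d. context X_t ~ Q_{0,X}
    betahat = np.linalg.solve(Sigma, Sxy)   # F_{t-1}-measurable estimate used by g_t
    scores = np.array([phi(x, a) @ betahat for a in range(K)])
    probs = np.full(K, 0.1 / K)             # epsilon-greedy logging policy g_t(.|x, history)
    probs[np.argmax(scores)] += 0.9
    a = int(rng.choice(K, p=probs))
    y = phi(x, a) @ beta0 + sigma * rng.normal()   # Y_t from the invariant factor Q_{0,Y}
    f = phi(x, a)
    Sigma += np.outer(f, f)
    Sxy += f * y
    data.append((x, a, y))
```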
By Assumption 1, for $P^{(T)} \in \mathcal M_T$, there can only be one $\beta_0$ satisfying the condition in Eq. (3). When there is no ambiguity as to the underlying distribution $P^{(T)}$, we denote the corresponding unique $\beta_0$ by $\beta_T$; we similarly define $h_T := \mathbb{E}_{P^{(T)}}[Y_t \mid Z_t = \cdot\,]$.

Remark 1 (Why do we consider the high-dimensional setting?). We consider asymptotic normality in the setting of correctly specified linear sieves with growing dimension, as it is precisely the setting that reveals a gap between the plug-in OLS estimator and the one-step estimator in the i.i.d. setting. The plug-in OLS estimator exhibits a naive dependence on the ambient dimension $d$ and requires $d_T = o(T)$ for consistency. In contrast, the one-step estimator with regularized nuisances incurs a second-order remainder term of order $d_{\mathrm{eff}}(\lambda)/T$, where $d_{\mathrm{eff}}$ is the effective dimension associated with regularization strength $\lambda$, allowing for potentially much more aggressive scaling of dimensions.

Sequence of Target Parameters. Define the sequence of functionals $\Psi_T : \mathcal M_T \to \mathbb R$, for any $T \ge 1$,
\[
\Psi_T(P^{(T)}) = \nu_T^\top \beta_T. \tag{4}
\]

Riesz representation of the trajectory-level target. Although $\Psi_T$ is defined as a functional of the full trajectory law $P^{(T)}$, it depends on $P^{(T)}$ only through the invariant conditional mean $h_T(z) := \mathbb{E}_{P^{(T)}}[Y_t \mid Z_t = z]$. To formalize this dependence, we leverage the time-averaged $L^2$ geometry $L^2(\bar P^{(T)})$ associated with $P^{(T)}$, defined in Eq. (2).

Assumption 2 (Identification condition). $\nu_T \in \ker(\bar\Sigma_T)^\perp$.

We define the function $\bar\alpha_T : \mathcal Z \to \mathbb R$ by $\bar\alpha_T(z) := \nu_T^\top \bar\Sigma_T^\dagger \phi_T(z)$, where $\bar\Sigma_T^\dagger$ denotes the Moore–Penrose pseudoinverse. We then define the linear functional $\tilde\Psi_T : L^2(\bar P^{(T)}) \to \mathbb R$ by
\[
\tilde\Psi_T(h) := \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\bar\alpha_T(Z_t)\, h(Z_t)\big]. \tag{5}
\]
We then have that for any $P^{(T)} \in \mathcal M_T$, $\Psi_T(P^{(T)}) = \tilde\Psi_T(h_T)$, and that $\bar\alpha_T$ is the Riesz representer of $\tilde\Psi_T$.

2 Canonical Gradient

In this section, we derive the canonical gradient of $\Psi_T$ for a given $T$. The following homoskedasticity assumption simplifies the analysis.

Assumption 3 (Homoskedastic noise). There exists $\sigma > 0$, independent of $T$ and $z$, such that $\mathbb{E}_{P^{(T)}}[\varepsilon_t^2 \mid Z_t = z] = \sigma^2$, where $\varepsilon_t := Y_t - \mathbb{E}[Y_t \mid Z_t, \mathcal F_{t-1}]$.

Theorem 1 (Canonical gradient). Under Assumptions 1–3, $\Psi_T$ is pathwise differentiable at $P^{(T)}$ if and only if $D^*_T = D^*_T(\bar\alpha_T, h_T)$, where
\[
D^*_T(\alpha, h) : \bar o_T \mapsto \frac{1}{T}\sum_{t=1}^{T} \alpha(z_t)\big(y_t - h(z_t)\big), \tag{6}
\]
has bounded $L^2(P^{(T)})$ norm (i.e., $\|D^*_T\|^2_{L^2(P^{(T)})} = \frac{\sigma^2}{T}\,\nu_T^\top \bar\Sigma_T^\dagger \nu_T < \infty$), in which case $D^*_T$ is its canonical gradient.

We refer the reader to Section 8 for the full proof. The above expression makes explicit the martingale structure of the canonical gradient under adaptive data collection and is the basis of our asymptotic analysis. Note that the canonical gradient involves the marginal average design matrix $\bar\Sigma_T$ through the definition of the Riesz representer $\bar\alpha_T$; in other words, $\bar\Sigma_T$ arises from averaging over $T$ longitudinally and, at the population level across trajectories, cross-sectionally.
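As a concrete illustration of the objects in Theorem 1, the sketch below evaluates the Riesz representer $\bar\alpha_T$ and the canonical gradient $D^*_T$ on synthetic data. The i.i.d. Gaussian features are a stand-in used only to exercise the algebra; under genuine adaptive sampling, $\bar\Sigma_T$ is the pooled trajectory-averaged matrix, which is approximated here by Monte Carlo over fresh trajectories.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 400, 3
nu = np.array([1.0, 0.0, 0.0])            # target direction nu_T in Psi_T = nu_T' beta_T
beta_T = np.array([1.0, 0.5, -0.2])
Phi = rng.normal(size=(T, d))             # stand-in for the logged features phi_T(Z_t)
Y = Phi @ beta_T + rng.normal(size=T)     # outcomes with h_T(z) = phi_T(z)' beta_T, sigma = 1

# Pooled second-moment matrix Sigma_bar_T = E[(1/T) sum_t phi phi'], approximated by
# Monte Carlo over B fresh trajectories drawn from the same (stand-in) design.
B = 200
Sigma_bar = np.zeros((d, d))
for _ in range(B):
    Phib = rng.normal(size=(T, d))
    Sigma_bar += Phib.T @ Phib / (T * B)

alpha_bar = Phi @ np.linalg.pinv(Sigma_bar) @ nu    # alpha_bar_T(Z_t) = nu' Sigma_bar^dagger phi_T(Z_t)
D_star = np.mean(alpha_bar * (Y - Phi @ beta_T))    # canonical gradient D*_T of Eq. (6)
norm_sq = nu @ np.linalg.pinv(Sigma_bar) @ nu / T   # ||D*_T||^2 = (sigma^2/T) nu' Sigma_bar^dagger nu, sigma = 1
```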
This is to be contrasted with the variance-stabilized doubly robust scores used in Bibaut et al. [2021], for instance, which do not coincide with the above canonical gradient; this suggests that their corresponding stabilized AIPW estimator is likely not efficient.

3 Directional stability

In this section we introduce the notion of directional stability, which is the cornerstone of our asymptotic analysis. Let
\[
\hat\alpha_{T,\lambda_\alpha} : z \mapsto \nu_T^\top \big(\hat\Sigma_T + \lambda_\alpha I_{d_T}\big)^{-1} \phi_T(z), \tag{7}
\]
where $\hat\Sigma_T := \frac{1}{T}\sum_t \phi_T(Z_t)\phi_T(Z_t)^\top$.

Definition 1 (Directional stability). We say that the design is directionally stable w.r.t. $(\Psi_T)_{T \ge 1}$ if there exists a deterministic sequence of functions $(\tilde\alpha_T)_{T \ge 1}$ of the form $\tilde\alpha_T : z \mapsto \nu_T^\top \tilde\Sigma_T^{-1}\phi_T(z)$, for a sequence of positive definite matrices $(\tilde\Sigma_T)_{T \ge 1}$, such that
\[
\|\hat\alpha_{T,\lambda_\alpha} - \tilde\alpha_T\|_{L^2(P_T)} = o_{P^{(T)}}(1). \tag{8}
\]

We note that when $\lambda_\alpha = o(\lambda_{\min}(\tilde\Sigma_T))$, where $\lambda_{\min}(\tilde\Sigma_T)$ is the smallest eigenvalue of $\tilde\Sigma_T$, directional stability is always implied by the classical full-matrix notion of stability [Lai and Wei, 1982], which requires that $\|\tilde\Sigma_T^{-1}\hat\Sigma_T - I_{d_T}\|_{\mathrm{op}} = o_{P^{(T)}}(1)$.

Remark 2. Note that in the above definition $\tilde\Sigma_T$ need not relate to $\bar\Sigma_T$ from the canonical gradient. Should $\bar\Sigma_T$ be a valid choice for $\tilde\Sigma_T$, then $(\hat\alpha_{T,\lambda_\alpha})$ is a consistent estimator of the Riesz representer sequence $(\bar\alpha_T)$. We will see further down how consistent estimation of the Riesz representer sequence is a necessary condition for efficiency of the one-step estimator.

4 One-Step Estimator

We construct a one-step estimator by plugging in the ridge-regularized Riesz-representer-like estimator $\hat\alpha_{T,\lambda_\alpha}$ from (7), as in Chernozhukov et al. [2022], and the following ridge-regularized estimator of the outcome regression function:
\[
\hat h_{T,\lambda_h}(z) := \phi_T(z)^\top \hat\beta_{T,\lambda_h}, \qquad \hat\beta_{T,\lambda_h} := \hat\Sigma_{T,\lambda_h}^{-1}\,\hat\Sigma_{T,ZY}, \qquad \hat\Sigma_{T,ZY} := \frac{1}{T}\sum_{t=1}^{T}\phi_T(Z_t)\,Y_t.
\]
We then construct a one-step estimator of $\Psi(P^{(T)})$ as follows:
\[
\hat\Psi := \Psi(\hat h_{T,\lambda_h}) + \bar P_T\Big[\hat\alpha_{T,\lambda_\alpha}(Z)\,\big\{Y - \hat h_{T,\lambda_h}(Z)\big\}\Big].
\]

Remark 3 (Equivalence to an undersmoothed plug-in estimator). In the i.i.d. case, Bruns-Smith et al. [2025] show that when both the outcome model and the Riesz representer are estimated by (kernel) ridge regression ("double ridge"), the resulting augmented estimator is numerically identical in finite samples to a single (kernel) ridge regression plug-in estimator with a smaller effective penalty parameter. Due to this numerical equivalence, an undersmoothed plug-in ridge estimator inherits the theoretical properties we establish for the one-step estimator.
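The construction above is short enough to state in code. The following sketch implements the double-ridge one-step estimator for a single logged trajectory; the penalty values are placeholders, and in practice they would follow the schedules discussed in Section 5.

```python
import numpy as np

def one_step(Phi, Y, nu, lam_h=1e-2, lam_a=1e-2):
    """Double-ridge one-step estimate of Psi_T = nu' beta_T from one trajectory.

    Phi: (T, d) array of features phi_T(Z_t); Y: (T,) outcomes; nu: (d,) target direction.
    lam_h, lam_a: ridge penalties for the outcome model h-hat and the Riesz estimate
    alpha-hat of Eq. (7). A sketch only; the penalties here are illustrative.
    """
    T, d = Phi.shape
    Sigma_hat = Phi.T @ Phi / T
    beta_hat = np.linalg.solve(Sigma_hat + lam_h * np.eye(d), Phi.T @ Y / T)  # ridge outcome model
    alpha_hat = Phi @ np.linalg.solve(Sigma_hat + lam_a * np.eye(d), nu)      # alpha_hat_{T,lam}(Z_t)
    plug_in = nu @ beta_hat                                                   # Psi(h_hat)
    correction = np.mean(alpha_hat * (Y - Phi @ beta_hat))                    # P_T[alpha_hat (Y - h_hat)]
    return plug_in + correction
```

Consistent with Remark 3, in this double-ridge form the augmented estimator coincides numerically with a single ridge plug-in at a smaller effective penalty.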
5 Asymptotic Analysis

In this section we provide an asymptotic analysis of our one-step estimator using our definition of directional stability. Let $\tilde\sigma_T := \sigma\,(\nu_T^\top \tilde\Sigma_T^{-1}\nu_T)^{1/2}$, which is the quantity appearing in the source condition of exponent 1 of the Riesz representer w.r.t. $\tilde\Sigma_T$. An alternative representation is $\tilde\sigma_T = (\tilde P^{(T)}\{\tilde\alpha_T\,(y - h_T)\}^2)^{1/2}$, where $\tilde P^{(T)}$ is the operator defined, for any fixed function $f : (z, \varepsilon) \mapsto a_1\,\phi(z)\phi(z)^\top + a_2\,\phi(z)\varepsilon$, by $\tilde P^{(T)} f := a_1 \tilde\Sigma_T$. This alternative representation highlights the interpretation of $\tilde\sigma_T^2$ as a variance-like object of the generic term $\tilde\alpha_T(y - h_T)$ of $D^*(\tilde\alpha_T, h_T)$.

In general $\tilde\sigma_T$ need not coincide with $\bar\sigma_T := \sigma\sqrt{\nu_T^\top \bar\Sigma_T^\dagger \nu_T} = \|\bar\alpha_T(y - h_T)\|_{L^2(\bar P^{(T)})} = \sqrt T\,\|D^*_T\|_{L^2(P^{(T)})}$ (see Eq. (29) for a derivation of this equality). It instead plays an analogous role for adaptive data collection under a stable design. In general $\tilde\sigma_T$ does not converge as a function of $T$, as can be checked, for example, in the case of LinUCB from the proof of Proposition 2.

Assumption 4 (Directional stability). It holds that
\[
\frac{\sigma^2}{\tilde\sigma_T^2}\,\nu_T^\top \tilde\Sigma_T^{-1}\big(\hat\Sigma_T - \tilde\Sigma_T\big)\tilde\Sigma_T^{-1}\nu_T \overset{p}{\to} 0 .
\]

Assumption 5 (Lindeberg condition). For all $\epsilon > 0$, we have
\[
\sum_{t=1}^{T}\mathbb{E}\bigg[\frac{1}{T\tilde\sigma_T^2}\,\tilde\alpha_T(Z_t)^2\varepsilon_t^2\,\mathbf{1}\Big\{\frac{\tilde\alpha_T(Z_t)^2\varepsilon_t^2}{T\tilde\sigma_T^2} > \epsilon\Big\} \,\Big|\, \mathcal F_{t-1}\bigg] \overset{p}{\to} 0 .
\]

Assumption 6 (Gaussian noise). There exists $\sigma \in (0, \infty)$ such that, conditional on $Z_{1:T} := (Z_1, \dots, Z_T)$, the noise variables $(\varepsilon_t)_{t=1}^{T}$ are independent and satisfy $(\varepsilon_1, \dots, \varepsilon_T) \mid Z_{1:T} \sim \mathcal N(0, \sigma^2 I_T)$. Equivalently, $\varepsilon_t \mid Z_{1:T} \overset{\mathrm{i.i.d.}}{\sim} \mathcal N(0, \sigma^2)$.

Theorem 2 (von Mises expansion and asymptotic normality). Under Assumption 6, the following von Mises expansion holds, provided the quantities in it are well defined and bounded:
\[
\hat\Psi_T - \Psi_T = \big(P_T - \tilde P^{(T)}\big)\big\{\tilde\alpha_T(Y - h_T)\big\} + O(R_T)
\]
with
\[
R_T := \|\hat\alpha_{T,\lambda_\alpha} - \tilde\alpha_T\|_{L^2(P_T)}\Big(\big\|\hat h_{T,\lambda_h} - h_T\big\|_{L^2(P_T)} + \frac{1}{\sqrt T}\Big) + \frac{1}{T}\,\nu_T^\top \hat\Sigma_{T,\lambda_\alpha}^{-1}\big(\hat\beta_{T,\lambda_h} - \beta_T\big). \tag{9}
\]
If $R_T = o_P\big(\tilde\sigma_T/\sqrt T\big)$ and Assumptions 4–5 hold, then it further holds that
\[
\frac{\sqrt T}{\tilde\sigma_T}\big(\hat\Psi_T - \Psi_T\big) \overset{d}{\to} \mathcal N(0, 1).
\]
We refer the reader to Section 9 for a full proof.

Proof sketch of Theorem 2. Write the one-step error as the explicit von Mises expansion
\[
\hat\Psi - \Psi_T
= \underbrace{\big(P_T - \tilde P^{(T)}\big)\,\tilde\alpha_T(Y - h_T)}_{(A)}
+ \underbrace{\big(P_T - \tilde P^{(T)}\big)\,\tilde\alpha_T(h_T - \hat h_{T,\lambda})}_{(B)}
+ \underbrace{\big(P_T - \tilde P^{(T)}\big)\big(\hat\alpha_T - \tilde\alpha_T\big)\varepsilon}_{(C)}
- \underbrace{P_T\big(\hat\alpha_T - \tilde\alpha_T\big)\big(\hat h_{T,\lambda} - h_T\big)}_{(D)} .
\]
For (A), set $X_{T,t} := \tilde\alpha_T(Z_t)\varepsilon_t/(\tilde\sigma_T\sqrt T)$; then $(X_{T,t})_{t \le T}$ is a martingale difference array. Assumption 4 gives convergence of the conditional quadratic variation $\sum_{t \le T}\mathbb{E}[X_{T,t}^2 \mid \mathcal F_{t-1}] \overset{p}{\to} 1$, and Assumption 5 gives the conditional Lindeberg condition, so the martingale CLT (Lemma 2) yields $\sqrt T\,(A)/\tilde\sigma_T \Rightarrow \mathcal N(0, 1)$. For the remainder terms, (C) is conditionally a centered Gaussian with variance $\sigma^2\|\hat\alpha_T - \tilde\alpha_T\|^2_{L^2(P_T)}/T$, hence $(C) = O_p(\|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)}/\sqrt T)$. Moreover, by Cauchy–Schwarz, $|(D)| \le \|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)}\,\|\hat h_{T,\lambda} - h_T\|_{L^2(P_T)}$, and (B) is bounded by the second-order bound plus an additional bias term $\frac{1}{T}\nu^\top \hat\Sigma_{T,\lambda_\alpha}^{-1}(\hat\beta_{T,\lambda_h} - \beta_T)$.
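For inference, Theorem 2 suggests the following Wald construction. The sketch below assumes directional stability holds with the plug-in $\hat\Sigma_{T,\lambda}$ standing in for $\tilde\Sigma_T$ when forming $\tilde\sigma_T$, and estimates $\sigma^2$ from residuals; both are heuristic choices rather than part of the theorem.

```python
import numpy as np

def wald_interval(Phi, Y, nu, lam=1e-2):
    """95% Wald interval for nu' beta_T based on the normalization of Theorem 2.

    Heuristic sketch: the plug-in nu' (Sigma_hat + lam I)^{-1} nu stands in for
    nu' Sigma_tilde^{-1} nu, which is only justified under directional stability,
    and sigma^2 is estimated from residuals under Assumption 3 (homoskedasticity).
    """
    T, d = Phi.shape
    Sigma_lam = Phi.T @ Phi / T + lam * np.eye(d)
    beta_hat = np.linalg.solve(Sigma_lam, Phi.T @ Y / T)
    resid = Y - Phi @ beta_hat
    sigma2_hat = resid @ resid / T                         # homoskedastic variance estimate
    alpha_hat = Phi @ np.linalg.solve(Sigma_lam, nu)       # ridge Riesz estimate of Eq. (7)
    psi_hat = nu @ beta_hat + np.mean(alpha_hat * resid)   # one-step estimate from Section 4
    se = np.sqrt(sigma2_hat * (nu @ np.linalg.solve(Sigma_lam, nu)) / T)  # ~ sigma_tilde_T / sqrt(T)
    zq = 1.959963984540054                                 # standard normal 97.5% quantile
    return psi_hat - zq * se, psi_hat + zq * se
```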
Instantiation under LinUCB. We consider LinUCB as presented by Lattimore and Szepesvári [2020, Chapter 19], with a $d_T$-dimensional feature map. We use $\gamma$ for their $\beta$ (the "exploration bonus" factor). We define the deterministic matrix
\[
\tilde\Sigma_T := P_\star + (T d_T)^{1/4}\gamma^{-1/2} P_\perp,
\]
where $P_\star = \beta_T\beta_T^\top/\|\beta_T\|_2^2$ and $P_\perp = I_{d_T} - P_\star$.

Assumption 7. There exists $L > 0$ such that for all $T \ge 1$ and all $z \in \mathcal Z$, we have $\|\phi_T(z)\| \le L$.

We require a tail assumption with the same conditional variance scale as Assumption 3.

Assumption 8. The sequence $(\varepsilon_t)_{t=1}^{T}$ is an $(\mathcal F_t)$-adapted MDS with conditional variance $\sigma^2$, and satisfies, for all $t \ge 1$ and $\lambda \in \mathbb R$, $\mathbb{E}[\exp(\lambda\varepsilon_t) \mid \mathcal F_{t-1}] \le \exp\big(\tfrac{1}{2}\sigma^2\lambda^2\big)$.

Assumption 8 plays a key role in the self-normalized inequalities of Abbasi-Yadkori et al. [2011] and the non-asymptotic analysis of Fan et al. [2025]. The following proposition is a direct consequence of Abbasi-Yadkori et al. [2011, Theorem 2] together with Jézéquel et al. [2019, Proposition 2]. Such bounds cannot be improved for a general bandit algorithm [Lattimore, 2023].

Proposition 1. Under Assumptions 7–8, it holds with probability $\ge 1 - \delta$, for all $T \ge 0$, that
\[
\big\|\hat h_{T,\lambda_h} - h_T\big\|_{L^2(P_T)} \lesssim \sqrt{\frac{d_{\mathrm{eff}}(\lambda_h, T)}{T}} + \sqrt{\lambda_h}\,\|\beta_0\|_2,
\]
where $d_{\mathrm{eff}}(\lambda_h, T) := \mathrm{Tr}\big((\lambda_h I_{d_T} + \hat\Sigma_T)^{-1}\hat\Sigma_T\big)$ is the effective dimension.

The notion $d_{\mathrm{eff}}(\lambda, T)$ is related to the Bayesian information gain [Srinivas et al., 2010]. A reader familiar with the regression literature [Caponnetto and De Vito, 2007, Fischer and Steinwart, 2020] may be more accustomed to the effective dimension defined in terms of the population covariance matrix. The term $\sqrt\lambda\,\|\beta_0\|_2$ is a crude upper bound on the bias term and can be improved under source conditions on $\beta_0$ with respect to $\hat\Sigma_T$.

The following directional stability proposition is a consequence of Fan et al. [2025].

Assumption 9 (Large exploration). The exploration bonus $\gamma_T$ for horizon $T$ satisfies $\gamma_T \gtrsim d_T^2\big(\sigma\sqrt{d_T + \log\log T} + 1\big)$.

Proposition 2. Under Assumptions 7–9, suppose that $\|\phi_T(Z_t)\| = 1$ for every $t$, and that
\[
\varepsilon_{\mathrm{bulk}} := d_T\Big(\frac{\gamma_T^8}{T}\Big)^{\frac{d_T+1}{d_T-1}} + \frac{d_T^{1/4}}{\sqrt{\gamma_T}} = o(1), \qquad d_T = o(T), \qquad \gamma_T\sqrt{d_T/T} = o(1), \qquad \gamma_T^{-1} = o(1).
\]
For $\lambda_\alpha = 1/T$, omitting polylogs, we have the following bound in probability, with proof given in Section 10:
\[
\frac{\|\hat\alpha_{T,\lambda_\alpha} - \tilde\alpha_T\|_{L^2(P_T)}}{\tilde\sigma_T}
= O_{P^{(T)}}\Bigg(
\Big(\frac{d_T}{\gamma_T} + \gamma_T\sqrt{\frac{d_T}{T}}\Big)\,
\frac{\|P_\star\nu_T\|}{\|P_\star\nu_T\| + \|P_\perp\nu_T\|\,(T d_T)^{1/4}/\sqrt{\gamma_T}} \tag{$\parallel$}
\]
\[
\qquad\qquad
+ \Big(\sqrt{\frac{d_T}{T}} + \frac{d_T}{\gamma_T} + \varepsilon_{\mathrm{bulk}}\Big)\,
\frac{\|P_\perp\nu_T\|\,(T d_T)^{1/4}/\sqrt{\gamma_T}}{\|P_\star\nu_T\| + \|P_\perp\nu_T\|\,(T d_T)^{1/4}/\sqrt{\gamma_T}}
\Bigg). \tag{$\perp$}
\]
Here Eq. ($\parallel$) reflects the contribution from the mass of $\nu$ aligned with the true signal direction, and Eq. ($\perp$) reflects the contribution from the mass of $\nu$ orthogonal to the true signal direction. This is the key distinction between adaptive data collection and data collected under an i.i.d. design, as the bandit is regret-minimizing and learns to collect less data in the directions corresponding to $P_\perp$. Proposition 2 characterizes when asymptotic normality and regret minimization can simultaneously be achieved, by optimally tuning the parameters $\gamma$, $d_T$ as a function of $T$ in the high-dimensional regime such that both Eq. ($\parallel$) and Eq. ($\perp$) are $o(1)$ with respect to $T$, while the expected regret remains sublinear and as small as possible.

Remark 4. The bound in Proposition 2 involves the term $\sqrt{d_T/T}$, as we regularize $\hat\alpha_T$ with ridge penalty $1/T$ or faster. We conjecture that it may be possible to derive a generalization of the above proposition to an arbitrary regularization schedule, with the term $\sqrt{d_{\mathrm{eff}}(\lambda, T)/T}$ in its place, allowing for growth of $d$ that is superlinear in $T$. In other words, we expect that regularization improves directional stability, as it does in the i.i.d. setting (see, e.g., Bruns-Smith et al., 2025). Under this conjecture, the second-order remainder term in the von Mises expansion can be $o_P(\tilde\sigma_T/\sqrt T)$ under much milder restrictions on the growth of $d_T$ relative to $T$, yielding asymptotic normality under directional stability in high-dimensional regimes where OLS fails to be normal.
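To make the LinUCB instantiation of this section concrete, the following sketch builds the deterministic stabilizer $\tilde\Sigma_T$ defined above and reports the empirical directional-stability gap of Definition 1 for a logged trajectory. It takes the true $\beta_T$ as an input, so it is a simulation diagnostic rather than a data-driven procedure.

```python
import numpy as np

def directional_stability_gap(Phi, beta_T, nu, gamma):
    """Empirical gap || alpha_hat_{T,1/T} - alpha_tilde_T ||_{L2(P_T)} for LinUCB,
    with Sigma_tilde_T = P_star + (T d_T)^{1/4} gamma^{-1/2} P_perp as in Section 5.

    Phi: (T, d) logged features; beta_T: true coefficient (known in simulation only);
    nu: target direction; gamma: LinUCB exploration bonus.
    """
    T, d = Phi.shape
    e1 = beta_T / np.linalg.norm(beta_T)
    P_star = np.outer(e1, e1)                               # projector onto the signal direction
    Sigma_tilde = P_star + (T * d) ** 0.25 / np.sqrt(gamma) * (np.eye(d) - P_star)
    Sigma_hat = Phi.T @ Phi / T
    alpha_hat = Phi @ np.linalg.solve(Sigma_hat + np.eye(d) / T, nu)  # lambda_alpha = 1/T (Prop. 2)
    alpha_tilde = Phi @ np.linalg.solve(Sigma_tilde, nu)
    return np.sqrt(np.mean((alpha_hat - alpha_tilde) ** 2))
```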
6 Efficiency Theory

We consider efficiency for the sequence of statistical experiments $\{\mathcal M_T\}_{T \ge 1}$, where both the target parameter $\Psi_T$ and its canonical gradient $D^*_T$ depend on the experimental horizon $T$.

Assumption 10 (Pathwise differentiability). For each $T \in \mathbb N$, the parameter $\Psi_T : \mathcal M_T \to \mathbb R$ is pathwise differentiable at $P^{(T)} \in \mathcal M_T$ with canonical gradient $D^*_T \in L^2(P^{(T)})$.

To formulate local asymptotic normality (LAN) and efficiency when $\sqrt T$-rates are not attainable, we state a notion of least favorable submodels with $T$-specific Fisher information [van der Vaart, 1998].

Definition 2 (Sequence of least favorable submodels). Let $\Psi_T : \mathcal M_T \to \mathbb R$ be pathwise differentiable at $P^{(T)}$ with canonical gradient $D^*_T \in L^2(P^{(T)})$, and let $\bar\sigma_T$ be as defined in Section 5. A sequence of one-dimensional submodels $\{P^{(T)}_\eta : |\eta| \le \delta\} \subset \mathcal M_T$ with $P^{(T)}_0 = P^{(T)}$ is said to be least favorable for the sequence of tuples $(\Psi_T, \mathcal M_T, P^{(T)})_{T \ge 1}$ if:

1. For each fixed $T$, the submodel $\{p^{(T)}_\eta : |\eta| \le \delta\}$ is QMD at $\eta = 0$ with score function
\[
\partial_\eta \log p^{(T)}_\eta(\bar O_T)\Big|_{\eta=0} = D^*_T(\bar O_T). \tag{10}
\]

2. The QMD remainder vanishes: for each fixed $T$,
\[
\int \Big(\sqrt{dP^{(T)}_\eta} - \sqrt{dP^{(T)}} - \frac{\eta}{2}\,D^*_T\,\sqrt{dP^{(T)}}\Big)^2 = o(\eta^2), \qquad \eta \to 0. \tag{11}
\]

Note that the Fisher information in the parameterization defined by (10) is $P^{(T)}\{(D^*_T)^2\} = \bar\sigma_T^2/T$, which diverges as $T \to \infty$: this is because it is a whole-trajectory Fisher information, while typical analyses in the i.i.d. setting work with a per-sample notion of Fisher information. In Appendix 11.1 we show the existence of such a submodel. Theorem 3 below identifies the local asymptotic structure of the adaptive experiment along least favorable submodels.

Assumption 11 (Directional stability). Let $\sigma$ be as in Assumption 3. We assume that
\[
\frac{\sigma^2}{\bar\sigma_T^2}\,\nu_T^\top\bar\Sigma_T^\dagger\hat\Sigma_T\bar\Sigma_T^\dagger\nu_T \overset{p}{\to} 1
\iff
\frac{\sigma^2}{\bar\sigma_T^2}\,\nu_T^\top\bar\Sigma_T^\dagger\big(\hat\Sigma_T - \bar\Sigma_T\big)\bar\Sigma_T^\dagger\nu_T \overset{p}{\to} 0 .
\]
Notice that this is the same notion of directional stability as introduced earlier, except that we now require the deterministic stabilizing sequence to be $\bar\Sigma_T$.

Assumption 12 (Lindeberg condition). We assume that, for all $\epsilon > 0$,
\[
\sum_{t=1}^{T}\mathbb{E}\big[\eta^2 s_T(Z_t, Y_t)^2\,\mathbf{1}\{|\eta\, s_T(Z_t, Y_t)| > \epsilon\} \mid \mathcal F_{T,t-1}\big] \overset{p}{\to} 0,
\]
where $s_T(z, y) := \frac{1}{T}\bar\alpha_T(z)\big(y - h_T(z)\big)$ and $\eta = \epsilon\,\bar\sigma_T^{-1} T^{1/2}$.

Theorem 3 (Local asymptotic normality). Suppose Assumptions 11–12 hold and that there exists a sequence of least favorable submodels $\{P^{(T)}_\eta : |\eta| \le \delta\}$ in the sense of Definition 2, and define $\Delta_T := \frac{\sqrt T}{\bar\sigma_T} D^*_T(\bar O_T)$. Suppose that $I_T = \bar\sigma_T^2/T \to \infty$ as $T \to \infty$. Then, for each fixed $\epsilon \in \mathbb R$,
\[
\log\frac{dP^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}}{dP^{(T)}}(\bar O_T) = \epsilon\,\Delta_T - \frac{\epsilon^2}{2} + o_{P^{(T)}}(1), \tag{12}
\]
and we have $\epsilon\Delta_T \overset{d}{\to} \mathcal N(0, \epsilon^2)$.
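Before turning to the proof sketch, here is a small Monte Carlo illustration of the expansion (12) using the explicit path constructed in Lemma 1 (Appendix 11.1). The i.i.d. standard-normal features and unit noise are stand-ins chosen so that $\bar\Sigma_T = I_d$ and $\bar\sigma_T = \|\nu\|$ hold exactly; the printed mean and variance should be close to $-\epsilon^2/2$ and $\epsilon^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
T, d, eps = 500, 3, 1.0
nu = np.array([1.0, 0.0, 0.0])
Sigma_bar = np.eye(d)                                      # pooled design for this stand-in model
sigma_bar = np.sqrt(nu @ np.linalg.pinv(Sigma_bar) @ nu)   # bar-sigma_T with sigma = 1
eta = eps * np.sqrt(T) / sigma_bar                         # local parameter eta = eps sqrt(T)/sigma_bar

llrs = []
for _ in range(2000):
    Phi = rng.normal(size=(T, d))                          # i.i.d. stand-in features
    resid = rng.normal(size=T)                             # residuals Y_t - h_T(Z_t), sigma = 1
    abar = Phi @ np.linalg.pinv(Sigma_bar) @ nu            # Riesz representer values alpha_bar_T(Z_t)
    w = (eta / (2 * T)) * abar * resid
    # log-likelihood ratio under the Lemma 1 path q_{Y,eta} (Eq. (30)):
    llr = 2 * np.sum(np.log1p(w)) - T * np.log1p(eta**2 * sigma_bar**2 / (4 * T**2))
    llrs.append(llr)
print(np.mean(llrs), np.var(llrs))                         # approx -eps^2/2 and eps^2, per Theorem 3
```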
Proof sketch of Theorem 3. Fix $\epsilon > 0$ and set $I_T := \bar\sigma_T^2/T$, $\eta := \epsilon I_T^{-1/2}$. Define
\[
W_t := 2\Bigg(\sqrt{\frac{q_{Y,\eta}(Y_t \mid Z_t)}{q_Y(Y_t \mid Z_t)}} - 1\Bigg), \qquad
\log\frac{p^{(T)}_\eta(\bar O_T)}{p^{(T)}(\bar O_T)} = 2\sum_{t=1}^{T}\log\Big(1 + \frac{1}{2}W_t\Big).
\]
In Lemma 1 in the appendix, we explicitly construct a sequence of least favorable submodels by perturbing the outcome likelihood $q_{Y,\eta}$, such that, for a remainder term $R_{t,\eta}$ satisfying $T\,\mathbb{E}[R_{t,\eta}^2] = o(1)$, we have
\[
W_t = \frac{\eta}{T}\,\bar\alpha_T(Z_t)\big\{Y_t - h_T(Z_t)\big\} + R_{t,\eta}(Y_t, Z_t),
\]
which implies
\[
\sum_{t=1}^{T} W_t = \frac{\eta}{T}\sum_{t=1}^{T}\bar\alpha_T(Z_t)\big\{Y_t - h_T(Z_t)\big\} - \frac{\epsilon^2}{4} + o_p(1).
\]
Using $\log(1 + x) = x - \frac{x^2}{2} + x^2 R(x)$ with $R(x) \to 0$ as $x \to 0$, we have
\[
\log\frac{p^{(T)}_\eta(\bar O_T)}{p^{(T)}(\bar O_T)} = \sum_{t=1}^{T} W_t - \frac{1}{4}\sum_{t=1}^{T} W_t^2 + \frac{1}{2}\sum_{t=1}^{T} W_t^2 R(W_t).
\]
The term $\frac{\eta}{T}\sum_t\bar\alpha_T(Z_t)\{Y_t - h_T(Z_t)\} \Rightarrow \mathcal N(0, \epsilon^2)$ by Assumptions 11 and 12, which verify the conditions of the martingale central limit theorem [Hall and Heyde, 1980, Theorem 3.2]. We show the quadratic term satisfies $\sum_{t=1}^{T} W_t^2 \overset{p}{\to} \epsilon^2$, and $\sum_{t=1}^{T} W_t^2 R(W_t) = o_p(1)$ since $\max_t |W_t| = o_p(1)$. Combining the above yields
\[
\log\frac{p^{(T)}_\eta(\bar O_T)}{p^{(T)}(\bar O_T)} \Rightarrow \mathcal N\Big(-\frac{\epsilon^2}{2}, \epsilon^2\Big).
\]
A full proof is deferred to Appendix 11.2.

As a consequence, for any sequence of statistics $\phi_T(\bar O_T)$, the joint limit behavior of $(\phi_T, \Delta_T)$ under $P^{(T)}$ determines the entire family of asymptotic laws of $\phi_T$ under the local experiments $\{P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T} : \epsilon \in \mathbb R\}$. By the matching theorem [van der Vaart, 1998, Theorem 7.10], these limit laws can be realized by a single statistic acting on the Gaussian shift experiment $X_\epsilon \sim \mathcal N(\epsilon, 1)$.

We now introduce a notion of regularity for estimator sequences $\{\hat\Psi_T\}_{T \ge 1}$, requiring that their asymptotic distribution, when centered at the local target $\Psi_T(P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T})$, be invariant with respect to $\epsilon$.

Definition 3 (Regularity along least favorable submodels). Let $\{P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T} : \epsilon \in \mathbb R\}$ be a least favorable submodel for $\Psi_T$ in the sense of Definition 2. A sequence of estimators $\{\hat\Psi_T\}_{T \ge 1}$ is said to be regular along this sequence if there exists a probability law $L$ such that, for every fixed $\epsilon \in \mathbb R$,
\[
\frac{\sqrt T}{\bar\sigma_T}\Big(\hat\Psi_T - \Psi_T\big(P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}\big)\Big) \Rightarrow L \quad \text{under } P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}, \tag{13}
\]
with the same limit law $L$ for all $\epsilon$.

Remark 5. The efficiency notion developed here is relative to the class of regular estimators defined in Definition 3. Regularity requires that the asymptotic distribution of the estimator, when centered at the local target along least favorable submodels, be invariant under $\sqrt T/\bar\sigma_T$-scale perturbations. Unlike the i.i.d. setting, the natural local scale of the experiment depends on $T$ through the factor $\bar\sigma_T$. Consequently, local perturbations are not comparable across different horizons $T$ without rescaling. The normalization by $\bar\sigma_T/\sqrt T$ identifies a common local parameterization under which the localized experiments admit a homogeneous Gaussian shift limit.

This regularity condition translates, in the Gaussian shift experiment, into equivariance-in-law and forms the basis for the convolution and efficiency results that follow.

Theorem 4 (Convolution theorem). Suppose Assumptions 10–12 hold and that $\{\hat\Psi_T\}_{T \ge 1}$ is regular along a least favorable submodel in the sense of Definition 3. Then, under $P^{(T)}$,
\[
\frac{\sqrt T}{\bar\sigma_T}\Big(\hat\Psi_T - \Psi_T(P^{(T)})\Big) \Rightarrow L, \tag{14}
\]
and there exists a probability measure $M$ on $\mathbb R$ such that $L = \mathcal N(0, 1) \ast M$. In particular, if $L$ has variance $\sigma^2$, then $\sigma^2 \ge 1$, with equality if and only if $M = \delta_0$ (equivalently, $L = \mathcal N(0, 1)$). Consequently, $\hat\Psi_T$ is asymptotically efficient among regular estimators if and only if
\[
\frac{\sqrt T}{\bar\sigma_T}\Big(\hat\Psi_T - \Psi_T(P^{(T)})\Big) \Rightarrow \mathcal N(0, 1). \tag{15}
\]
In particular, any asymptotically linear estimator with influence function $D^*_T$ attains this limit; hence the one-step estimator constructed in the previous section is asymptotically efficient.
Remark 6. An asymptotically efficient sequence has $\sqrt T$-scaled rate $\sqrt{\nu^\top\bar\Sigma_T^\dagger\nu}$ whenever $D^*_{T,Y}$ dominates $D^*_{T,X}$, which is typically the case. Meanwhile, we have proved asymptotic normality at $\sqrt T$-scaled rate $\sqrt{\nu^\top\tilde\Sigma_T^{-1}\nu}$. For our one-step estimator to be efficient, these need to be asymptotically equivalent.

Remark 7. While the notion of efficiency might seem ad hoc, instantiating it in the i.i.d. setting recovers the "usual" semiparametric efficiency bound. In other words, in the i.i.d. case, competing against estimators that are only regular along the sequence of least favorable submodels is as hard as competing against all regular estimators.

7 Discussion

This work shows that directional stability is sufficient for valid and efficient inference in adaptive experiments. Under this mild condition, adaptively collected data can be treated with estimators that are algebraically equivalent to i.i.d. estimators and asymptotically efficient. Several directions for future work remain open. First, the analysis may be extended to settings where the working model for the reward function is misspecified, and our target parameter may be defined as a nonparametric M-estimand to further anchor the robustness of the proposed framework. Second, our results currently focus on $\sqrt{d_T/T}$ rates induced by specific ridge regularization schedules; extending the theory to more general regularization regimes, in which complexity is governed by the effective dimension $d_{\mathrm{eff}}(\lambda, T)$, is an important next step. A final direction is to study debiased inference after model selection in adaptive settings, building on recent advances such as van der Laan et al. [2023].

References

Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., 2011.

Karun Adusumilli. Optimal tests following sequential experiments. arXiv preprint arXiv:2305.00403, 2023.

Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2–3):235–256, 2002.

Aurélien Bibaut and Nathan Kallus. Demystifying inference after adaptive experiments. Annual Review of Statistics and Its Application, 12(1):407–423, 2025.

Aurélien Bibaut, Maria Dimakopoulou, Nathan Kallus, Antoine Chambaz, and Mark van der Laan. Post-contextual-bandit inference. Advances in Neural Information Processing Systems, 34:28548–28559, 2021.

Aurélien Bibaut, Nathan Kallus, and Michael Lindon. Near-optimal non-parametric sequential tests and confidence sequences with possibly dependent observations. arXiv preprint arXiv:2212.14411, 2022.

Peter J. Bickel, Chris A. J. Klaassen, Ya'acov Ritov, and Jon A. Wellner. Efficient and Adaptive Estimation for Semiparametric Models, volume 4. Johns Hopkins University Press, Baltimore, 1993.

David Bruns-Smith, Oliver Dukes, Avi Feller, and Elizabeth L. Ogburn. Augmented balancing weights as linear regression. Journal of the Royal Statistical Society Series B: Statistical Methodology, page qkaf019, 2025.

Andrea Caponnetto and Ernesto De Vito. Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, 7:331–368, 2007.
Victor Chernozhukov, Whitney K. Newey, and Rahul Singh. Automatic debiased machine learning of causal and structural effects. Econometrica, 90(3):967–1027, 2022.

Brian Cho, Aurélien Bibaut, and Nathan Kallus. Simulation-based inference for adaptive experiments. In Neural Information Processing Systems, 2025.

Lin Fan and Peter W. Glynn. The typical behavior of bandit algorithms. arXiv preprint arXiv:2210.05660, 2022.

Wei Fan, Kevin Tan, and Yuting Wei. Statistical inference under adaptive sampling with LinUCB, 2025.

Yingying Fan, Yuxuan Han, Jinchi Lv, Xiaocong Xu, and Zhengyuan Zhou. Precise asymptotics and refined regret of variance-aware UCB. arXiv preprint arXiv:2412.08843, 2024.

Simon Fischer and Ingo Steinwart. Sobolev norm learning rates for regularized least-squares algorithms. Journal of Machine Learning Research, 21(205):1–38, 2020.

Vitor Hadad, David A. Hirshberg, Ruohan Zhan, Stefan Wager, and Susan Athey. Confidence intervals for policy evaluation in adaptive experiments. Proceedings of the National Academy of Sciences, 118(15):e2014602118, 2021.

Budhaditya Halder, Shubhayan Pan, and Koulik Khamaru. Stable Thompson sampling: Valid inference via variance inflation, 2025.

Peter Hall and Christopher C. Heyde. Martingale Limit Theory and Its Application. Academic Press, 1980.

Qiyang Han. Thompson sampling: Precise arm-pull dynamics and adaptive inference. arXiv preprint arXiv:2601.21131, 2026.

Qiyang Han, Koulik Khamaru, and Cun-Hui Zhang. UCB algorithms for multi-armed bandits: Precise regret and adaptive inference, 2024.

Keisuke Hirano and Jack R. Porter. Asymptotic representations for sequential decisions, adaptive experiments, and batched bandits. arXiv preprint arXiv:2302.03117, 2023.

Steven R. Howard, Aaditya Ramdas, Jon McAuliffe, and Jasjeet Sekhon. Time-uniform Chernoff bounds via nonnegative supermartingales. Annals of Statistics, 49(2):1055–1080, 2021.

Rémi Jézéquel, Pierre Gaillard, and Alessandro Rudi. Efficient online learning with kernels for adversarial large scale problems. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.

Anand Kalvit and Assaf Zeevi. A closer look at the worst-case behavior of multi-armed bandit algorithms. Advances in Neural Information Processing Systems, 34:8807–8819, 2021.

Koulik Khamaru and Cun-Hui Zhang. Inference with the upper confidence bound algorithm. arXiv preprint arXiv:2408.04595, 2024.

Tze Leung Lai and Ching Zong Wei. Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. The Annals of Statistics, 10(1):154–166, 1982. URL http://www.jstor.org/stable/2240506.

Tor Lattimore. A lower bound for linear and kernel regression with adaptive covariates. In Gergely Neu and Lorenzo Rosasco, editors, Proceedings of Thirty Sixth Conference on Learning Theory, volume 195 of Proceedings of Machine Learning Research, pages 2095–2113. PMLR, 2023.

Tor Lattimore and Csaba Szepesvári. Bandit Algorithms. Cambridge University Press, 2020.

Lucien Le Cam. Asymptotic Methods in Statistical Decision Theory. Springer, New York, 1986.

Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International World Wide Web Conference (WWW), 2010.
William F. Rosenberger and Feifang Hu. Bootstrap methods for adaptive designs. Statistics in Medicine, 18(14):1757–1767, 1999.

Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In International Conference on Machine Learning, pages 1015–1022, 2010.

Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 2nd edition, 2018.

Lars van der Laan, Marco Carone, Alex Luedtke, and Mark van der Laan. Adaptive debiased machine learning using data-driven model selection techniques. arXiv preprint arXiv:2307.12544, 2023.

Lars van der Laan, Nathan Kallus, and Aurélien Bibaut. Nonparametric instrumental variable inference with many weak instruments, 2026.

Aad W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998. doi: 10.1017/CBO9780511802256.

Ian Waudby-Smith and Aaditya Ramdas. Time-uniform central limit theorems and confidence sequences. In International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 10663–10672, 2021.

Kelly Zhang, Lucas Janson, and Susan Murphy. Statistical inference with M-estimators on adaptively collected data. Advances in Neural Information Processing Systems, 34:7460–7471, 2021.

Appendix

This appendix is organized as follows:
– Appendix 8: proofs of pathwise differentiability and the canonical gradient.
– Appendix 9: asymptotic analysis of the one-step estimator in the high-dimensional regime.
– Appendix 10: upper bound on the stability rate under LinUCB sampling.
– Appendix 11: proofs of local asymptotic normality and the convolution theorem.

8 Canonical gradient

Parameter and notation. Let $Z_t := (X_t, A_t)$ and write the conditional mean and residual as
\[
h_T(z) := \mathbb{E}_{P^{(T)}}[Y_t \mid Z_t = z] = \phi_T(z)^\top\beta_T, \qquad \varepsilon_t := Y_t - h_T(Z_t).
\]
We define the pooled $(X, A)$ marginal induced by the (possibly adaptive) logging sequence:
\[
\bar g_t(a \mid x) := \int g_t(a \mid x, \bar o_{t-1})\, dP^{(T)}(\bar o_{t-1}), \qquad \bar g(a \mid x) := \frac{1}{T}\sum_{t=1}^{T}\bar g_t(a \mid x).
\]
Equivalently, the pooled marginal density/mass of $Z$ with respect to $\mu_X \otimes \nu_A$ is $\bar h(a, x) := q_X(x)\,\bar g(a \mid x)$. The pooled covariance matrix $\bar\Sigma_T$ can be written as
\[
\bar\Sigma_T = \sum_{a \in \mathcal A}\int_{\mathcal X}\phi(x, a)\phi(x, a)^\top\,\bar h(a, x)\, dx.
\]
The target functional evaluated at $P^{(T)}$ can be written as $\Psi_T(P^{(T)}) = \nu_T^\top\beta_T$. We now present the proof of Theorem 1.

Proof. Fix $P^{(T)} \in \mathcal M_T$, where
\[
\frac{dP^{(T)}}{d\mu^{(T)}}(\bar o_T) = \prod_{t=1}^{T} q_X(x_t)\, g_t(a_t \mid x_t, \bar o_{t-1})\, q_Y(y_t \mid a_t, x_t).
\]
We define
\[
\mathcal M^{\mathrm{res}}_T = \mathcal M_T \cap \left\{\frac{dP^{(T)}}{d\mu^{(T)}}(\bar o_T) = \prod_{t=1}^{T} q_X(x_t)\, g_t(a_t \mid x_t, \bar o_{t-1})\, q_{Y'}(y_t \mid a_t, x_t) \;\middle|\; q_{Y'}\right\} \tag{16}
\]
as the restricted model of $\mathcal M_T$ in which $q_Y$ is allowed to vary while $q_X$ and $g_t$, $1 \le t \le T$, are fixed at the corresponding factors of the fixed $P^{(T)}$. We prove the theorem by (i) identifying the tangent space of $\mathcal M^{\mathrm{res}}_T$, (ii) computing the pathwise derivative of $\Psi_T$ with respect to $\mathcal M^{\mathrm{res}}_T$, and (iii) showing that the pathwise derivative lies in the tangent space, hence is the canonical gradient. Since $\Psi_T$ only depends on $q_Y$, the canonical gradient is agnostic to whether $g_t$ or $q_X$ is known.
Therefore, the canonical gradient of $\Psi_T$ with respect to $\mathcal M^{\mathrm{res}}_T$ coincides with that with respect to $\mathcal M_T$.

Step 1: Tangent space of $\mathcal M^{\mathrm{res}}_T$. Consider a one-dimensional submodel $\{P^{(T)}_\epsilon : \epsilon \in (-\epsilon_0, \epsilon_0)\} \subseteq \mathcal M^{\mathrm{res}}_T$ through $P^{(T)}$. We assume $\epsilon \mapsto q_{Y,\epsilon}$ is differentiable in quadratic mean (DQM) at $\epsilon = 0$ with (conditional) score
\[
s(y, a, x) := \frac{d}{d\epsilon}\log q_{Y,\epsilon}(y \mid a, x)\Big|_{\epsilon=0}, \qquad \int s(y, a, x)\, q_Y(y \mid a, x)\, dy = 0 \quad \forall (a, x).
\]
Then $\epsilon \mapsto P^{(T)}_\epsilon$ is DQM at $\epsilon = 0$ with score
\[
S_T(\bar O_T) = \frac{d}{d\epsilon}\log\frac{dP^{(T)}_\epsilon}{dP^{(T)}}(\bar O_T)\Big|_{\epsilon=0} = \sum_{t=1}^{T} s(Y_t, A_t, X_t).
\]
Let $m_\epsilon(a, x) := \mathbb{E}_{P^{(T)}_\epsilon}[Y_t \mid A_t = a, X_t = x]$. A standard conditional-mean differentiation identity gives
\[
\dot m(a, x) := \frac{d}{d\epsilon} m_\epsilon(a, x)\Big|_{\epsilon=0}
= \mathbb{E}_{P^{(T)}}\big[Y_t\, s(Y_t, A_t, X_t) \mid A_t = a, X_t = x\big]
= \mathbb{E}_{P^{(T)}}\big[\varepsilon_t\, s(Y_t, A_t, X_t) \mid A_t = a, X_t = x\big]. \tag{17}
\]
Because $\{P^{(T)}_\epsilon : \epsilon \in (-\epsilon_0, \epsilon_0)\} \subset \mathcal M_T$, by Assumption 1, for each $\epsilon \in (-\epsilon_0, \epsilon_0)$ there exists a unique $\beta_\epsilon$ such that $m_\epsilon(a, x) \equiv \phi(a, x)^\top\beta_\epsilon$. We claim that $\epsilon \mapsto \beta_\epsilon$ is differentiable at $\epsilon = 0$ with derivative $\dot\beta \in \mathbb R^{d_T}$. Indeed, by Assumption 1, we can choose $z_i$, $1 \le i \le d_T$, such that $\{\phi(z_i)\}_{1 \le i \le d_T}$ form a linearly independent set. Let $\Phi$ denote the matrix whose $i$th row is $\phi(z_i)^\top$. Then we have
\[
\beta_\epsilon - \beta_T = \Phi^{-1}\big((m_\epsilon - m_0)(z_1), \dots, (m_\epsilon - m_0)(z_{d_T})\big)^\top.
\]
The claim then follows from differentiability of $(m_\epsilon - m_0)(z_i)$ and continuity of $\Phi^{-1}$. In this case,
\[
\dot m(a, x) = \phi(x, a)^\top\dot\beta \quad \forall (a, x). \tag{18}
\]

Step 2: Pathwise derivative of $\Psi_T$ along $\epsilon \mapsto q_{Y,\epsilon}$. Since $\Psi_T(P^{(T)}) = \nu_T^\top\beta_T$, we have
\[
\frac{d}{d\epsilon}\Psi_T(P^{(T)}_\epsilon)\Big|_{\epsilon=0} = \nu_T^\top\dot\beta. \tag{19}
\]
To express $\dot\beta$ in terms of the score $s$, multiply Eq. (18) on the left by $\phi(x, a)$ and integrate with respect to the pooled marginal $\bar h(a, x)\, dx$ (equivalently, take $T^{-1}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}[\cdot\,]$):
\[
\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\phi(Z_t)\dot m(Z_t)\big] = \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\phi(Z_t)\phi(Z_t)^\top\big]\dot\beta = \bar\Sigma_T\dot\beta.
\]
Using Eq. (17) and the law of iterated expectations,
\[
\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\phi(Z_t)\dot m(Z_t)\big] = \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\phi(Z_t)\varepsilon_t\, s(Y_t, Z_t)\big].
\]
By Assumption 2, $\nu_T = \bar\Sigma_T\bar\Sigma_T^\dagger\nu_T$. Thus from Eq. (19), we have
\[
\frac{d}{d\epsilon}\Psi_T(P^{(T)}_\epsilon)\Big|_{\epsilon=0}
= \nu_T^\top\bar\Sigma_T^\dagger\,\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\phi(Z_t)\phi(Z_t)^\top\big]\dot\beta
= \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\big(\nu_T^\top\bar\Sigma_T^\dagger\phi(Z_t)\varepsilon_t\big)\, s(Y_t, Z_t)\big]
= \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\bar\alpha_T(Z_t)\varepsilon_t\, s(Y_t, Z_t)\big]. \tag{20}
\]

Step 3: Expressing Eq. (20) as an $L^2(P^{(T)})$ inner product. In $L^2(P^{(T)})$, the inner product of $\frac{1}{T}\sum_{t=1}^{T}\bar\alpha_T(Z_t)\varepsilon_t$ with the full score $S_T = \sum_{l=1}^{T} s(Y_l, Z_l)$ expands as
\[
\mathbb{E}_{P^{(T)}}\Bigg[\Bigg(\frac{1}{T}\sum_{t=1}^{T}\bar\alpha_T(Z_t)\varepsilon_t\Bigg)\Bigg(\sum_{l=1}^{T} s(Y_l, Z_l)\Bigg)\Bigg]
= \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\bar\alpha_T(Z_t)\varepsilon_t\, s(Y_t, Z_t)\big] + \frac{1}{T}\sum_{t \ne l}\mathbb{E}_{P^{(T)}}\big[\bar\alpha_T(Z_t)\varepsilon_t\, s(Y_l, Z_l)\big].
\]
For $t \ne l$, the cross terms vanish by the law of iterated expectations and the fact that $\mathbb{E}_{P^{(T)}}[\varepsilon_t \mid Z_t, \mathcal F_{t-1}] = \mathbb{E}_{P^{(T)}}[\varepsilon_t \mid Z_t] = 0$: if $l < t$, then $s(Y_l, Z_l)$ is $\mathcal F_{t-1}$-measurable and
\[
\mathbb{E}_{P^{(T)}}\big[\bar\alpha_T(Z_t)\varepsilon_t\, s(Y_l, Z_l)\big]
= \mathbb{E}_{P^{(T)}}\big[s(Y_l, Z_l)\,\mathbb{E}_{P^{(T)}}[\bar\alpha_T(Z_t)\varepsilon_t \mid \mathcal F_{t-1}]\big]
= \mathbb{E}_{P^{(T)}}\big[s(Y_l, Z_l)\,\mathbb{E}_{P^{(T)}}[\bar\alpha_T(Z_t)\,\mathbb{E}_{P^{(T)}}[\varepsilon_t \mid Z_t] \mid \mathcal F_{t-1}]\big] = 0.
\]
The case $t < l$ is analogous (condition on $\mathcal F_{l-1}$). Therefore
\[
\mathbb{E}_{P^{(T)}}\Bigg[\Bigg(\frac{1}{T}\sum_{t=1}^{T}\bar\alpha_T(Z_t)\varepsilon_t\Bigg)\, S_T\Bigg] = \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\bar\alpha_T(Z_t)\varepsilon_t\, s(Y_t, Z_t)\big]. \tag{21}
\]
Step 4: $D^*_T$ is the canonical gradient in $\mathcal M^{\mathrm{res}}_T$. Combining Eq. (20) and Eq. (21), we obtain
\[
\frac{d}{d\epsilon}\Psi_T(P^{(T)}_\epsilon)\Big|_{\epsilon=0} = \mathbb{E}_{P^{(T)}}\Bigg[\Bigg(\frac{1}{T}\sum_{t=1}^{T}\bar\alpha_T(Z_t)\varepsilon_t\Bigg)\, S_T\Bigg] = \mathbb{E}_{P^{(T)}}\big[D^*_T\, S_T\big],
\]
for $D^*_T$ given in Eq. (6). Hence $D^*_T$ is a gradient for $\Psi_T$ along all smooth one-parameter parametric submodels of $\mathcal M^{\mathrm{res}}_T$. Moreover, since
\[
D^*_T = \frac{1}{T}\sum_{t=1}^{T}\bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big), \qquad \mathbb{E}\big[\bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big) \mid Z_t\big] = \bar\alpha_T(Z_t)\,\mathbb{E}[\varepsilon_t \mid Z_t] = 0,
\]
we see that $D^*_{T,Y}$ lies in the tangent space of $\mathcal M^{\mathrm{res}}_T$ at $P^{(T)}$. Therefore it is the canonical gradient with respect to $\mathcal M^{\mathrm{res}}_T$.

Step 5: Pathwise differentiability criterion. If $D^*_T \notin L^2(P^{(T)})$, then the linear functional $s \mapsto \frac{d}{d\epsilon}\Psi_T(P^{(T)}_\epsilon)\big|_{\epsilon=0}$ cannot be continuous on the tangent space of $\mathcal M^{\mathrm{res}}_T$ equipped with the $L^2(P^{(T)})$ norm, so $\Psi_T$ is not pathwise differentiable at $P^{(T)}$. Conversely, if $D^*_T \in L^2(P^{(T)})$, then (20)–(21) show that the derivative is represented by the $L^2$ inner product with $D^*_T$; hence $\Psi_T$ is pathwise differentiable. □

9 Asymptotic analysis of the one-step estimator

Notation. We omit the subscript $T$ when it is clear from context; for instance, we sometimes write $\mathbb{E}_0$ for $\mathbb{E}_{P^{(T)}_0}$. For a positive definite, symmetric matrix $A \in \mathbb R^{d_T \times d_T}$ and $a \in \mathbb R^{d_T}$, we let $\|a\|_A := \langle a, Aa\rangle^{1/2}$. For a symmetric $A \in \mathbb R^{d_T \times d_T}$ and $\lambda \ge 0$, we also write $A_\lambda = A + \lambda I_{d_T}$.

Proof of Theorem 2. We have
\[
R = \psi_0(\hat h) - \psi_0(h_T) + \tilde P^{(T)}\hat\alpha_T(Y - \hat h)
\overset{(\ast)}{=} \tilde P^{(T)}\tilde\alpha_T(\hat h_{T,\lambda} - h_T) + \tilde P^{(T)}\hat\alpha_T(h_T - \hat h_{T,\lambda})
= -\tilde P^{(T)}(\hat\alpha_T - \tilde\alpha_T)(\hat h_{T,\lambda} - h_T),
\]
where in $(\ast)$ we use $\tilde P^{(T)}\phi(z)\varepsilon = 0$. We have
\[
\begin{aligned}
\hat\Psi - \psi(h_T)
&= \big(P_T - \tilde P^{(T)}\big)\hat\alpha_T(Y - \hat h_{T,\lambda}) - \tilde P^{(T)}(\hat\alpha_T - \tilde\alpha_T)(\hat h_{T,\lambda} - h_T) \\
&= \big(P_T - \tilde P^{(T)}\big)\tilde\alpha_T(Y - h_T) + \big(P_T - \tilde P^{(T)}\big)\big\{\hat\alpha_T(Y - \hat h_{T,\lambda}) - \tilde\alpha_T(Y - h_T)\big\} - \tilde P^{(T)}(\hat\alpha_T - \tilde\alpha_T)(\hat h_{T,\lambda} - h_T).
\end{aligned}
\]
Then, using the identity $\hat a\hat b - ab = (\hat a - a)b + a(\hat b - b) + (\hat b - b)(\hat a - a)$, we have
\[
\hat\Psi - \psi(h_T)
= \underbrace{\big(P_T - \tilde P^{(T)}\big)\tilde\alpha_T(Y - h_T)}_{\mathrm{(I)}}
+ \underbrace{\big(P_T - \tilde P^{(T)}\big)(\hat\alpha_T - \tilde\alpha_T)(Y - h_T)}_{\mathrm{(II)}}
+ \underbrace{\big(P_T - \tilde P^{(T)}\big)\tilde\alpha_T(h_T - \hat h_{T,\lambda})}_{\mathrm{(III)}}
+ \underbrace{\big(P_T - \tilde P^{(T)}\big)(\hat\alpha_T - \tilde\alpha_T)(h_T - \hat h_{T,\lambda})}_{\mathrm{(IV)}}
- \underbrace{\tilde P^{(T)}(\hat\alpha_T - \tilde\alpha_T)(\hat h_{T,\lambda} - h_T)}_{\mathrm{(V)}} .
\]
We have
\[
\mathrm{(IV)} + \mathrm{(V)} = -P_T(\hat\alpha_T - \tilde\alpha_T)(\hat h_{T,\lambda} - h_T), \qquad
\mathrm{(II)} = \big(P_T - \tilde P^{(T)}\big)(\hat\alpha_T - \tilde\alpha_T)\varepsilon.
\]
Therefore the von Mises expansion can be written explicitly as
\[
\hat\Psi - \Psi_T
= \underbrace{\big(P_T - \tilde P^{(T)}\big)\tilde\alpha_T(Y - h_T)}_{(A)}
+ \underbrace{\big(P_T - \tilde P^{(T)}\big)\tilde\alpha_T(h_T - \hat h_{T,\lambda})}_{(B)}
+ \underbrace{\big(P_T - \tilde P^{(T)}\big)(\hat\alpha_T - \tilde\alpha_T)\varepsilon}_{(C)}
- \underbrace{P_T(\hat\alpha_T - \tilde\alpha_T)(\hat h_{T,\lambda} - h_T)}_{(D)} .
\]
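As a numerical companion to the decomposition above, the sketch below evaluates (A)–(D) on simulated data where $\beta_T$ (hence $h_T$ and the residuals) is known; their sum reproduces $\hat\Psi - \Psi_T$ exactly up to floating point, which is a useful check when experimenting with penalty schedules. The ridge penalty $1/T$ mirrors the choice in Proposition 2 but is otherwise arbitrary.

```python
import numpy as np

def von_mises_terms(Phi, Y, beta_T, nu, Sigma_tilde, lam=None):
    """Empirical evaluation of the terms (A)-(D) above, for simulation use only
    (the true beta_T and a candidate stabilizer Sigma_tilde must be supplied).
    Returns (A, B, C, D); A + B + C + D equals Psi_hat - nu'beta_T exactly."""
    T, d = Phi.shape
    lam = 1.0 / T if lam is None else lam
    Sigma_hat = Phi.T @ Phi / T
    beta_hat = np.linalg.solve(Sigma_hat + lam * np.eye(d), Phi.T @ Y / T)
    eps = Y - Phi @ beta_T                                   # residuals Y_t - h_T(Z_t)
    alpha_tilde = Phi @ np.linalg.solve(Sigma_tilde, nu)
    alpha_hat = Phi @ np.linalg.solve(Sigma_hat + lam * np.eye(d), nu)
    A = np.mean(alpha_tilde * eps)                           # leading martingale term
    B = nu @ np.linalg.solve(Sigma_tilde, (Sigma_hat - Sigma_tilde) @ (beta_T - beta_hat))
    C = np.mean((alpha_hat - alpha_tilde) * eps)             # Gaussian-in-eps remainder
    D = -np.mean((alpha_hat - alpha_tilde) * (Phi @ (beta_hat - beta_T)))
    return A, B, C, D
```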
Term (A). Claim: $\frac{\sqrt T}{\tilde\sigma_T}(A) \overset{d}{\to} \mathcal N(0, 1)$.

Proof of claim. We verify the assumptions of Lemma 2 for the natural filtration $\mathcal F_{T,t} = \mathcal F_t$. First, we verify (QV). We have
\[
\frac{T}{\tilde\sigma_T^2}\cdot\frac{1}{T^2}\sum_{t=1}^{T}\tilde\alpha_T(Z_t)^2\sigma^2
= \frac{\sigma^2}{T\tilde\sigma_T^2}\sum_{t=1}^{T}\nu_T^\top\tilde\Sigma_T^{-1}\phi_T(Z_t)\phi_T(Z_t)^\top\tilde\Sigma_T^{-1}\nu_T
= \frac{\sigma^2}{\tilde\sigma_T^2}\,\nu_T^\top\tilde\Sigma_T^{-1}\hat\Sigma_T\tilde\Sigma_T^{-1}\nu_T
= 1 + \frac{\sigma^2}{\tilde\sigma_T^2}\,\nu_T^\top\tilde\Sigma_T^{-1}\big(\hat\Sigma_T - \tilde\Sigma_T\big)\tilde\Sigma_T^{-1}\nu_T
\overset{p}{\to} 1,
\]
where the last step follows from Assumption 4 and the fact that $\tilde\sigma_T^2 = \sigma^2\nu_T^\top\tilde\Sigma_T^{-1}\nu_T$. Next, (Lindeberg) is verified by Assumption 5. Hence the claim follows from Lemma 2.

Term (B). We have
\[
\begin{aligned}
(B) &= \nu_T^\top\tilde\Sigma_T^{-1}\big(\hat\Sigma_T - \tilde\Sigma_T\big)\big(\hat\beta_{T,\lambda} - \beta_T\big) \\
&= \nu_T^\top\big(\tilde\Sigma_T^{-1} - \hat\Sigma_{T,1/T}^{-1}\big)\hat\Sigma_T\big(\hat\beta_{T,\lambda} - \beta_T\big) - \frac{1}{T}\,\nu_T^\top\hat\Sigma_{T,1/T}^{-1}\big(\hat\beta_{T,\lambda} - \beta_T\big) \\
&= \nu_T^\top\big(\tilde\Sigma_T^{-1} - \hat\Sigma_{T,1/T}^{-1}\big)\hat\Sigma_T^{1/2}\,\hat\Sigma_T^{1/2}\big(\hat\beta_{T,\lambda} - \beta_T\big) - \frac{1}{T}\,\nu_T^\top\hat\Sigma_{T,1/T}^{-1}\big(\hat\beta_{T,\lambda} - \beta_T\big) \\
&\le \big\|\big(\tilde\Sigma_T^{-1} - \hat\Sigma_{T,1/T}^{-1}\big)\nu_T\big\|_{\hat\Sigma_T}\,\big\|\hat\beta_{T,\lambda} - \beta_T\big\|_{\hat\Sigma_T} - \frac{1}{T}\,\nu_T^\top\hat\Sigma_{T,1/T}^{-1}\big(\hat\beta_{T,\lambda} - \beta_T\big) \\
&= \|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)}\,\big\|\hat h_{T,\lambda} - h_T\big\|_{L^2(P_T)} - \frac{1}{T}\,\nu_T^\top\hat\Sigma_{T,1/T}^{-1}\big(\hat\beta_{T,\lambda} - \beta_T\big),
\end{aligned}
\]
where the first line follows from the definitions of $P_T$ and $\tilde P^{(T)}$, the second line follows from
\[
\tilde\Sigma_T^{-1}\big(\hat\Sigma_T - \tilde\Sigma_T\big) = \big(\tilde\Sigma_T^{-1} - \hat\Sigma_{T,1/T}^{-1}\big)\hat\Sigma_T + \hat\Sigma_{T,1/T}^{-1}\hat\Sigma_T - I_{d_T} = \big(\tilde\Sigma_T^{-1} - \hat\Sigma_{T,1/T}^{-1}\big)\hat\Sigma_T - \frac{1}{T}\hat\Sigma_{T,1/T}^{-1},
\]
we apply the Cauchy–Schwarz inequality in the second-to-last line, and in the last line we use
\[
\big\|\hat h_{T,\lambda} - h_T\big\|^2_{L^2(P_T)} = \frac{1}{T}\sum_{t=1}^{T}\big(\hat\beta_{T,\lambda} - \beta_T\big)^\top\phi(Z_t)\phi(Z_t)^\top\big(\hat\beta_{T,\lambda} - \beta_T\big) = \big\|\hat\beta_{T,\lambda} - \beta_T\big\|^2_{\hat\Sigma_T},
\]
and similarly $\|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)} = \big\|\big(\tilde\Sigma_T^{-1} - \hat\Sigma_{T,1/T}^{-1}\big)\nu_T\big\|_{\hat\Sigma_T}$.

Term (C). Recall that
\[
(C) = \big(P_T - \tilde P^{(T)}\big)(\hat\alpha_T - \tilde\alpha_T)\varepsilon = P_T\big((\hat\alpha_T - \tilde\alpha_T)\varepsilon\big) = \frac{1}{T}\sum_{t=1}^{T}\big(\hat\alpha_T(Z_t) - \tilde\alpha_T(Z_t)\big)\varepsilon_t,
\]
where the second equality uses $\tilde P^{(T)}(a_2\phi(z)\varepsilon) = 0$, by definition of $\tilde P^{(T)}$, for any $z \in \mathcal Z$ and $a_2 \in \mathbb R$. Let $w_t := \hat\alpha_T(Z_t) - \tilde\alpha_T(Z_t)$, $t = 1, \dots, T$. Then $(w_1, \dots, w_T)$ is measurable with respect to $Z_{1:T}$. Under Assumption 6, conditional on $Z_{1:T}$ we have
\[
\sum_{t=1}^{T} w_t\varepsilon_t \,\Big|\, Z_{1:T} \sim \mathcal N\Big(0, \sigma^2\sum_{t=1}^{T} w_t^2\Big).
\]
Therefore,
\[
(C) \,\Big|\, Z_{1:T} \sim \mathcal N\Big(0, \frac{\sigma^2}{T^2}\sum_{t=1}^{T} w_t^2\Big) = \mathcal N\Big(0, \frac{\sigma^2}{T}\cdot\frac{1}{T}\sum_{t=1}^{T} w_t^2\Big).
\]
Moreover, $\frac{1}{T}\sum_{t=1}^{T} w_t^2 = \|\hat\alpha_T - \tilde\alpha_T\|^2_{L^2(P_T)}$, since $P_T$ is the empirical measure on $(Z_t)_{t \le T}$. Hence, for any $\delta \in (0, 1)$, using the standard Gaussian tail bound $P(|\mathcal N(0, 1)| \ge x) \le 2e^{-x^2/2}$, we obtain
\[
P\Bigg(|(C)| \le \sigma\,\|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)}\sqrt{\frac{2\log(2/\delta)}{T}} \,\Bigg|\, Z_{1:T}\Bigg) \ge 1 - \delta.
\]
Since the right-hand side is measurable in $Z_{1:T}$, the same inequality holds unconditionally. Consequently,
\[
(C) = O_{P^{(T)}}\Big(\frac{\sigma}{\sqrt T}\,\|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)}\Big) = O_{P^{(T)}}\Big(\frac{1}{\sqrt T}\,\|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)}\Big),
\]
where the second equality uses that $\sigma$ is a fixed constant.

Term (D). From Cauchy–Schwarz,
\[
(D) \le \|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)}\,\big\|\hat h_{T,\lambda} - h_T\big\|_{L^2(P_T)}. \qquad \square
\]

10 Upper Bound on the Stability Rate under LinUCB Sampling

Proof of Proposition 2. We omit polylogs throughout the proof, and we drop the dependence of $d$ on $T$ notationally, writing $d = d_T$. Let $\hat\Lambda_T := T\hat\Sigma_T$, let $v = v_1$ be the top eigenvector of $\hat\Lambda_T$, let $v_i$, $i = 2, \dots, d$, be its eigenvectors corresponding to the non-leading eigenvalues in descending order, and denote by $\lambda_i$, $i = 1, \dots, d$, its eigenvalues in descending order, i.e.,
\[
\hat\Lambda_T = \sum_{i=1}^{d}\lambda_i v_i v_i^\top.
\]
Define $Q_\star := vv^\top$ and $Q_\perp := I_d - Q_\star$. Let $e_1 := \beta_T/\|\beta_T\|_2$. We use $\gamma$ for the $\beta$ in Fan et al. [2025]. Define $P_\star = e_1 e_1^\top$ and $P_\perp = I_d - P_\star$ as the projections onto the true signal direction and its orthogonal complement, and let
\[
\tilde\Lambda_T := \omega_1 P_\star + \bar\omega P_\perp, \tag{22}
\]
\[
\check\Lambda_T := \lambda_1 Q_\star + \bar\lambda Q_\perp, \tag{23}
\]
with $\bar\lambda := \bar\omega := \gamma\sqrt{T/d}$ and $\omega_1 := T$. We assume $\sqrt{d/T} = o(1)$.
We establish Proposition 2 with $\tilde\Sigma_T = \frac{1}{T}\tilde\Lambda_T$.

Preliminary facts.

• From Fan et al. [2025, Eqs. (27) and (28)], under Assumptions 7–9, with probability $\ge 1 - \frac{1}{\log T}$,
\[
\|v - e_1\| \lesssim \sqrt{\frac{d}{\bar\lambda}} = \frac{d^{3/4}}{\gamma^{1/2} T^{1/4}}, \tag{24}
\]
\[
\lambda_i = (1 + \Delta_{T,i})\sqrt{\frac{2\gamma^2 T}{d}} + 1, \tag{25}
\]
where $|\Delta_{T,i}| \lesssim \varepsilon_{\mathrm{bulk}} = o_T(1)$ by assumption.

• From $\|\phi_T(Z_t)\| = 1$ for every $t$, $\mathrm{Tr}(\check\Lambda_T) = T + d = \lambda_1 + (d-1)\bar\lambda$. Therefore, under $d = o(T)$, the eigenvalue gap $\delta := \omega_1 - \lambda_1$ satisfies $\delta = (d-1)\gamma\sqrt{T/d} - d = O(\gamma\sqrt{Td})$.

• It holds that $\|Q_\perp P_\star\| = \|e_1 e_1^\top - Q_\star e_1 e_1^\top\| = \|(e_1 - Q_\star e_1)e_1^\top\| \le \|e_1 - Q_\star e_1\| \le \|e_1 - v\|$. Similarly, $\|Q_\star P_\perp\| \le \|e_1 - v\|$.

• For any $\sigma_i$, $i = 1, \dots, d$, with $\sigma_1 = 0$, we have
\[
\Big\|\sum_i \sigma_i v_i v_i^\top e_1 e_1^\top\Big\|_{\mathrm{op}} \le \Big(\sum_{j \ge 2}\sigma_j^2 (v_j^\top e_1)^2\Big)^{1/2} \le \Big(\max_{j \ge 2}|\sigma_j|\Big)\Big(\sum_{j \ge 2}(v_j^\top e_1)^2\Big)^{1/2}.
\]
We have
\[
\sum_{j \ge 2}(v_j^\top e_1)^2 = \big\|e_1 - (e_1^\top v_1)v_1\big\|^2 = \big\|e_1 - \mathrm{Proj}\big(e_1 \mid \mathrm{Span}(v_1)\big)\big\|^2 \le \|e_1 - v\|^2.
\]
Thus we have
\[
\Big\|\sum_i \sigma_i v_i v_i^\top e_1 e_1^\top\Big\|_{\mathrm{op}} \le \max_{j \ge 2}|\sigma_j|\,\|v - e_1\|. \tag{26}
\]

We consider the following decomposition of the target quantity. We have
\[
\big\|\hat\Lambda_T^{1/2}\big(\hat\Lambda_T^{-1} - \tilde\Lambda_T^{-1}\big)a\big\| \le \big\|\check\Lambda_T^{1/2}\big(\check\Lambda_T^{-1} - \tilde\Lambda_T^{-1}\big)a\big\| \tag{27}
\]
\[
\qquad + \big\|\big(\hat\Lambda_T^{-1/2}\tilde\Lambda_T^{1/2} - \hat\Lambda_T^{1/2}\tilde\Lambda_T^{-1/2}\big)\tilde\Lambda_T^{-1/2}a - \big(\check\Lambda_T^{-1/2}\tilde\Lambda_T^{1/2} - \check\Lambda_T^{1/2}\tilde\Lambda_T^{-1/2}\big)\tilde\Lambda_T^{-1/2}a\big\|. \tag{28}
\]
We bound the two terms above in Steps 1 and 2 below, respectively.

Step 1. We have $\check\Lambda_T^{1/2}\big(\check\Lambda_T^{-1} - \tilde\Lambda_T^{-1}\big)a = \mathrm{(I)} + \mathrm{(II)} + \mathrm{(III)} + \mathrm{(IV)}$, with
\[
\begin{aligned}
\mathrm{(I)} &:= \big(\lambda_1^{-1/2}\omega_1^{1/2} - \lambda_1^{1/2}\omega_1^{-1/2}\big)\,\omega_1^{-1/2}\,Q_\star P_\star a, \\
\mathrm{(II)} &:= \big(\bar\lambda^{-1/2}\bar\omega^{1/2} - \bar\lambda^{1/2}\bar\omega^{-1/2}\big)\,\bar\omega^{-1/2}\,Q_\perp P_\perp a, \\
\mathrm{(III)} &:= \big(\bar\lambda^{-1/2}\omega_1^{1/2} - \bar\lambda^{1/2}\omega_1^{-1/2}\big)\,\omega_1^{-1/2}\,Q_\perp P_\star a, \\
\mathrm{(IV)} &:= \big(\lambda_1^{-1/2} - \lambda_1^{1/2}\bar\omega^{-1}\big)\,Q_\star P_\perp a.
\end{aligned}
\]

• First term. Using $\sqrt{d/T} = o(1)$, we have
\[
\|\mathrm{(I)}\| \lesssim \Bigg|\sqrt{\frac{1}{1 - \delta/T}} - \sqrt{1 - \delta/T}\Bigg|\,\frac{\|P_\star a\|}{\sqrt T} \lesssim \frac{\delta}{T}\,\frac{\|P_\star a\|}{\sqrt T} \lesssim \frac{\gamma\sqrt d}{T}\,\|P_\star a\|.
\]

• Second term: since $\bar\omega = \bar\lambda$, $\mathrm{(II)} = 0$.

• Third term. Using $\sqrt{d/T} = o(1)$, we have
\[
\|\mathrm{(III)}\| \lesssim \Bigg(\Big(\gamma\sqrt{\frac{T}{d}}\Big)^{-1/2}\sqrt T + \Big(\gamma\sqrt{\frac{T}{d}}\Big)^{1/2}\frac{1}{\sqrt T}\Bigg)\frac{1}{\sqrt T}\cdot\frac{d^{3/4}}{\sqrt\gamma\,T^{1/4}}\,\|P_\star a\| = \Big(\frac{d}{\gamma\sqrt T} + \frac{\sqrt d}{T}\Big)\|P_\star a\|.
\]

• Fourth term:
\[
\|\mathrm{(IV)}\| \lesssim \Bigg(\frac{1}{\sqrt{T - \delta}} + \frac{\sqrt{T - \delta}\,\sqrt d}{\gamma\sqrt T}\Bigg)\frac{d^{3/4}}{\sqrt\gamma\,T^{1/4}}\,\|P_\perp a\| \lesssim \Big(\frac{1}{\sqrt T} + \frac{\sqrt d}{\gamma}\Big)\frac{d^{3/4}}{\sqrt\gamma\,T^{1/4}}\,\|P_\perp a\|.
\]

Collecting the bounds above and using $\gamma^{-1} = o(1)$, we then have
\[
\big\|\big(\check\Lambda_T^{-1} - \tilde\Lambda_T^{-1}\big)a\big\|_{\check\Lambda_T} \lesssim \Big(\frac{\gamma\sqrt d}{T} + \frac{d}{\gamma\sqrt T}\Big)\|P_\star a\| + \Big(\frac{1}{\sqrt T} + \frac{\sqrt d}{\gamma}\Big)\frac{d^{3/4}}{\sqrt\gamma\,T^{1/4}}\,\|P_\perp a\|.
\]

Step 2. We have
\[
\big(\hat\Lambda_T^{-1/2}\tilde\Lambda_T^{1/2} - \hat\Lambda_T^{1/2}\tilde\Lambda_T^{-1/2}\big)\tilde\Lambda_T^{-1/2}a - \big(\check\Lambda_T^{-1/2}\tilde\Lambda_T^{1/2} - \check\Lambda_T^{1/2}\tilde\Lambda_T^{-1/2}\big)\tilde\Lambda_T^{-1/2}a
= \big(\hat\Lambda_T^{-1/2} - \check\Lambda_T^{-1/2}\big)a - \big(\hat\Lambda_T^{1/2}\tilde\Lambda_T^{-1} - \check\Lambda_T^{1/2}\tilde\Lambda_T^{-1}\big)a = \mathrm{(V)} + \mathrm{(VI)}
\]
with
\[
\mathrm{(V)} := \sum_{i=2}^{d}\big(\lambda_i^{-1/2} - \bar\lambda^{-1/2}\big)v_i v_i^\top(P_\star + P_\perp)a, \qquad
\mathrm{(VI)} := \sum_{i=2}^{d}\big(\lambda_i^{1/2} - \bar\lambda^{1/2}\big)v_i v_i^\top\Big(\frac{1}{T}P_\star + \frac{1}{\bar\lambda}P_\perp\Big)a.
\]
From Eq. (26),
\[
\begin{aligned}
\|\mathrm{(V)}\| &\le \max_j\big|\lambda_j^{-1/2} - \bar\lambda^{-1/2}\big|\,\|v - e_1\|\,\|P_\star a\| + \Big\|\sum_{i=2}^{d}\big(\lambda_i^{-1/2} - \bar\lambda^{-1/2}\big)v_i v_i^\top\Big\|\,\|P_\perp a\| \\
&\lesssim \bar\lambda^{-1/2}\varepsilon_{\mathrm{bulk}}\,\|v - e_1\|\,\|P_\star a\| + \bar\lambda^{-1/2}\varepsilon_{\mathrm{bulk}}\,\|P_\perp a\| \\
&\lesssim \varepsilon_{\mathrm{bulk}}\Big(\frac{d}{\gamma\sqrt T}\,\|P_\star a\| + \frac{d^{1/4}}{\sqrt\gamma\,T^{1/4}}\,\|P_\perp a\|\Big)
\end{aligned}
\]
Step 2. We have that
\[
\big(\hat\Lambda_T^{-1/2}\tilde\Lambda_T^{1/2} - \hat\Lambda_T^{1/2}\tilde\Lambda_T^{-1/2}\big)\tilde\Lambda_T^{-1/2}a - \big(\check\Lambda_T^{-1/2}\tilde\Lambda_T^{1/2} - \check\Lambda_T^{1/2}\tilde\Lambda_T^{-1/2}\big)\tilde\Lambda_T^{-1/2}a
= \big(\hat\Lambda_T^{-1/2} - \check\Lambda_T^{-1/2}\big)a - \big(\hat\Lambda_T^{1/2}\tilde\Lambda_T^{-1} - \check\Lambda_T^{1/2}\tilde\Lambda_T^{-1}\big)a
= \text{(V)} - \text{(VI)},
\]
with
\[
\text{(V)} := \sum_{i=2}^d \big(\lambda_i^{-1/2} - \bar\lambda^{-1/2}\big)\,v_i v_i^\top\,(P_\star + P_\perp)\,a, \qquad
\text{(VI)} := \sum_{i=2}^d \big(\lambda_i^{1/2} - \bar\lambda^{1/2}\big)\,v_i v_i^\top\Big(\frac{1}{T}P_\star + \frac{1}{\bar\lambda}P_\perp\Big)a.
\]
From Eq. (26),
\[
\begin{aligned}
\|\text{(V)}\| &\le \max_{j\ge 2}\big|\lambda_j^{-1/2} - \bar\lambda^{-1/2}\big|\,\|v - e_1\|\,\|P_\star a\| + \Big\|\sum_{i=2}^d\big(\lambda_i^{-1/2} - \bar\lambda^{-1/2}\big)v_i v_i^\top\Big\|\,\|P_\perp a\| \\
&\lesssim \frac{\bar\lambda}{\bar\lambda^{3/2}}\,\varepsilon_{\mathrm{bulk}}\,\|v - e_1\|\,\|P_\star a\| + \frac{\bar\lambda}{\bar\lambda^{3/2}}\,\varepsilon_{\mathrm{bulk}}\,\|P_\perp a\|
\lesssim \varepsilon_{\mathrm{bulk}}\bigg(\frac{d}{\gamma\sqrt T}\,\|P_\star a\| + \frac{d^{1/4}}{\sqrt\gamma\,T^{1/4}}\,\|P_\perp a\|\bigg),
\end{aligned}
\]
and
\[
\|\text{(VI)}\| \lesssim \frac{\bar\lambda}{\bar\lambda^{1/2}}\,\varepsilon_{\mathrm{bulk}}\,\frac{1}{T}\,\|v - e_1\|\,\|P_\star a\| + \frac{\bar\lambda}{\bar\lambda^{1/2}\,\bar\lambda}\,\varepsilon_{\mathrm{bulk}}\,\|P_\perp a\|
\lesssim \varepsilon_{\mathrm{bulk}}\bigg(\frac{\sqrt d}{T}\,\|P_\star a\| + \frac{d^{1/4}}{\sqrt\gamma\,T^{1/4}}\,\|P_\perp a\|\bigg).
\]

Collecting the bounds. We have that
\[
\big\|\big(\hat\Sigma_T^{-1} - \tilde\Sigma_T^{-1}\big)a\big\|_{\hat\Sigma_T}
= \sqrt T\,\big\|\big(\hat\Lambda_T^{-1} - \tilde\Lambda_T^{-1}\big)a\big\|_{\hat\Lambda_T}
\lesssim \bigg(\gamma\sqrt{\frac{d}{T}} + \frac{d}{\gamma} + \sqrt{\frac{d}{T}}\,\varepsilon_{\mathrm{bulk}}\bigg)\|P_\star a\|
+ \bigg(\sqrt{\frac{d}{T}} + \frac{d}{\gamma} + \varepsilon_{\mathrm{bulk}}\bigg)\frac{(Td)^{1/4}}{\sqrt\gamma}\,\|P_\perp a\|.
\]
Now observing that $\tilde\sigma_T^2 = \|P_\star a\|^2 + \frac{\sqrt{Td}}{\gamma}\,\|P_\perp a\|^2$ yields the desired result.

11 Efficiency Theory

In this section, we prove Theorem 3 and Theorem 4. We adapt the efficiency theory along a sequence of experiments introduced in van der Laan et al. [2026, Section I]. Both our setting and theirs consider a sequence of statistical models with diverging Fisher information, indexed by the horizon $T$ in our case and by the number of instruments $K$ in theirs. The key distinction, however, is that they consider a factorized model [Bickel et al., 1993], owing to independence between units, whereas we consider a longitudinal model in which the $g_t(a_t \mid x_t, \bar o_{t-1})$ factors induce intertemporal dependencies. This necessitates certain adaptations in our construction.

11.1 Construction of a least favorable submodel

Recall $z = (a, x)$ and that we defined
\[
\bar P_T f(z, y) = \int \bar h_T(z)\,q(y \mid z)\,f(z, y)\,\mathrm dz\,\mathrm dy, \qquad
\bar h_T(z) = \frac{1}{T}\sum_{t=1}^T \bar g_t(a \mid x)\,q_X(x),
\]
\[
D^*_T = \frac{1}{T}\sum_{t=1}^T \bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big) \in L^2_0\big(P^{(T)}\big),
\]
and we employ the empirical-process notation $P^{(T)}[f]$ for $\mathbb E_{P^{(T)}}[f]$, $f : \bar{\mathcal O}_T \to \mathbb R$, where we integrate only over the randomness of the arguments of $f$, and not over $f$ itself. Then we have
\[
P^{(T)}\big[(D^*_T)^2\big]
= P^{(T)}\Bigg(\frac{1}{T}\sum_{t=1}^T \bar\alpha_T(A_t, X_t)\,\varepsilon_t\Bigg)^2
\overset{(*)}{=} \frac{1}{T^2}\sum_{t=1}^T \mathbb E_{P^{(T)}}\big[(\bar\alpha_T(A_t, X_t)\,\varepsilon_t)^2\big]
\overset{(**)}{=} \frac{1}{T}\,\bar\sigma_T^2,
\]
where in $(*)$ we use the conditional mean zero of $\varepsilon_t$ and the law of total expectation, and in $(**)$, $\bar\sigma_T = \sigma\sqrt{\nu_T^\top \bar\Sigma_T^\dagger \nu_T}$ is defined in Section 5. To see why $(**)$ holds, note that
\[
\bar\Sigma_T = \mathbb E_{P^{(T)}}\Bigg[\frac{1}{T}\sum_{t=1}^T \varphi(Z_t)\varphi(Z_t)^\top\Bigg]
= \int \varphi(z)\varphi(z)^\top\Bigg(\frac{1}{T}\sum_{t=1}^T \bar g_t(a \mid x)\,q_X(x)\Bigg)\mathrm da\,\mathrm dx,
\]
hence
\[
\begin{aligned}
\frac{1}{T}\sum_{t=1}^T \mathbb E_{P^{(T)}}\big[(\bar\alpha_T(Z_t)\,\varepsilon_t)^2\big]
&= \frac{1}{T}\sum_{t=1}^T \mathbb E_{P^{(T)}}\Big[\varepsilon_t^2\,\nu_T^\top \bar\Sigma_T^\dagger \varphi(Z_t)\varphi(Z_t)^\top \bar\Sigma_T^\dagger \nu_T\Big] \\
&= \frac{1}{T}\sum_{t=1}^T \mathbb E_{P^{(T)}}\Big[\mathbb E\big[\varepsilon_t^2 \mid \mathcal F_{t-1}, Z_t\big]\,\nu_T^\top \bar\Sigma_T^\dagger \varphi(Z_t)\varphi(Z_t)^\top \bar\Sigma_T^\dagger \nu_T\Big] \\
&= \sigma^2\,\nu_T^\top \bar\Sigma_T^\dagger \bar\Sigma_T \bar\Sigma_T^\dagger \nu_T
= \sigma^2\,\nu_T^\top \bar\Sigma_T^\dagger \nu_T, \tag{29}
\end{aligned}
\]
where we use the homoscedastic-noise assumption $\mathbb E[\varepsilon_t^2 \mid \mathcal F_{t-1}, Z_t] = \sigma^2$ (Assumption 3).

We now construct an explicit least favorable submodel satisfying Definition 2. The construction follows the classical Le Cam–Hájek quadratic-mean-differentiability path [Le Cam, 1986], adapted to the present setting by taking the score to be the canonical gradient $D^*_T$. This provides a concrete example of a submodel along which the local asymptotic normality expansion of Theorem 3 holds. We adapt the factorizable model construction of van der Laan et al. [2026, Lemma 17], noting that while the $g_t$ factors induce dependence between time points, it suffices to perturb the repeated $q_Y(y_t \mid z_t)$ factors in order to obtain a one-parameter submodel with score $D^*_T = D^*_{T,Y}$ at the origin.
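As an aside, the identity $P^{(T)}[(D^*_T)^2] = \bar\sigma_T^2/T$ is easy to check by simulation in the special case of a non-adaptive design, where $\bar\Sigma_T$ is available in closed form. The sketch below is ours; the design covariance, target $\nu$, and all constants are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
T, d, sigma, reps = 500, 3, 1.0, 2000
nu = np.array([1.0, 0.0, 0.0])                  # scalar target nu^T beta

Sigma_bar = np.diag([1.0, 0.5, 0.25])           # E[phi(Z) phi(Z)^T], fixed design
L = np.linalg.cholesky(Sigma_bar)
alpha = np.linalg.solve(Sigma_bar, nu)          # representer: bar_alpha(z) = nu^T Sigma^{-1} phi(z)

D_star = np.empty(reps)
for r in range(reps):
    Phi = rng.standard_normal((T, d)) @ L.T     # rows phi(Z_t) with covariance Sigma_bar
    eps = sigma * rng.standard_normal(T)        # homoscedastic noise (Assumption 3)
    D_star[r] = np.mean((Phi @ alpha) * eps)    # D*_T = (1/T) sum_t bar_alpha(Z_t) eps_t

bar_sigma_sq = sigma**2 * nu @ np.linalg.solve(Sigma_bar, nu)
print("T * Var(D*_T) =", T * D_star.var(), " vs  bar_sigma_T^2 =", bar_sigma_sq)
```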
In Subsection 11.2, we leverage this factorizability to express the log-likelihood ratio process in terms of a martingale, and then obtain local asymptotic normality via a careful application of martingale limit theory [Hall and Heyde, 1980].

Lemma 1 (Le Cam–Hájek QMD path). Fix a horizon $T \ge 1$ and assume Assumption 3. We define a one-parameter family $\eta \mapsto q_{Y,\eta}$, for $|\eta| \le \delta$, with score $s_T(z, y) = \frac{1}{T}\bar\alpha_T(z)(y - h_T(z))$, by
\[
q_{Y,\eta}(y \mid z) := \frac{\Big(1 + \frac{\eta}{2T}\bar\alpha_T(z)\big(y - h_T(z)\big)\Big)^2}{1 + \frac{\eta^2}{4}\frac{\bar\sigma_T^2}{T^2}}\;q_Y(y \mid z). \tag{30}
\]
We then define a one-dimensional parametric submodel $\{P^{(T)}_\eta : |\eta| \le \delta\}$ via
\[
\frac{\mathrm dP^{(T)}_\eta}{\mathrm d\mu^{(T)}}(\bar o_T) = \prod_{t=1}^T q_X(x_t)\,g_t(a_t \mid x_t, \bar o_{t-1})\,q_{Y,\eta}(y_t \mid a_t, x_t).
\]
Recall $\mu$ is the base measure on $\mathcal Z \times \mathcal Y$. Then (i) $q_{Y,\eta}(y \mid z)\,\bar h_T(z)$ is a valid probability density with respect to $\mu$; (ii) $\{q_{Y,\eta}\bar h_T : |\eta| \le \delta\}$ is QMD at $\eta = 0$ with score $\frac{1}{T}\bar\alpha_T(z)(y - h_T(z))$; and (iii) the remainder term
\[
R_{t,\eta}(y, z) := \sqrt{q_{Y,\eta}(y \mid z)\,\bar h_T(z)} - \sqrt{q_Y(y \mid z)\,\bar h_T(z)} - \frac{\eta}{2T}\bar\alpha_T(z)\big(y - h_T(z)\big)\sqrt{q_Y(y \mid z)\,\bar h_T(z)}
\]
satisfies
\[
\|R_{t,\eta}\|^2_{L^2(\mu)} \le \Bigg(1 - \sqrt{1 + \frac{\eta^2}{4}\frac{\bar\sigma_T^2}{T^2}}\Bigg)^2 \le \frac{\eta^4\,\bar\sigma_T^4}{T^4}.
\]

Proof. Throughout this proof, $t$ is fixed. Let $\xi(y, z)^2 = q_Y(y \mid z)\,\bar h_T(z)$. This has the property that, for any $f : \mathcal Y \times \mathcal Z \to \mathbb R$,
\[
\int \xi(y, z)^2 f(y, z)\,\mathrm d\mu(y, z) = \bar P^{(T)}[f] = \mathbb E_{P^{(T)}}\Bigg[\frac{1}{T}\sum_{t=1}^T f(Y_t, Z_t)\Bigg]. \tag{31}
\]
We define, for $\eta \in \mathbb R$,
\[
\xi_\eta(y, z) := \xi(y, z)\Big(1 + \frac{\eta}{2T}\bar\alpha_T(z)\big(y - h_T(z)\big)\Big).
\]
We readily verify from Eq. (31) that
\[
\begin{aligned}
\int \xi(y, z)^2\,\mathrm d\mu(y, z) &= 1, \\
\int \xi(y, z)^2\,\frac{1}{T}\bar\alpha_T(z)\big(y - h_T(z)\big)\,\mathrm d\mu(y, z) &= 0, \\
\int \xi(y, z)^2\Big(\frac{1}{T}\bar\alpha_T(z)\big(y - h_T(z)\big)\Big)^2\,\mathrm d\mu(y, z) &= \mathbb E_{P^{(T)}}\Bigg[\frac{1}{T}\sum_{t=1}^T\Big(\frac{1}{T}\bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big)\Big)^2\Bigg] = \frac{\bar\sigma_T^2}{T^2}, \\
\int \xi_\eta(y, z)^2\,\mathrm d\mu(y, z) &= 1 + \frac{\eta^2}{4}\frac{\bar\sigma_T^2}{T^2}.
\end{aligned}
\]
Therefore Eq. (30) defines a valid conditional probability, which shows (i). We have
\[
\log q_{Y,\eta}(y \mid z) = 2\log\Big(1 + \frac{\eta}{2T}\bar\alpha_T(z)\big(y - h_T(z)\big)\Big) - \log\Big(1 + \frac{\eta^2}{4}\frac{\bar\sigma_T^2}{T^2}\Big) + \log q_Y(y \mid z),
\]
and computing its derivative at $\eta = 0$ gives
\[
\frac{\partial}{\partial\eta}\bigg|_{\eta=0}\log q_{Y,\eta}(y \mid z) = \frac{1}{T}\bar\alpha_T(z)\big(y - h_T(z)\big) = s_T(z, y),
\]
which shows (ii). We now control the remainder $\|R_{t,\eta}\|^2_{L^2(\mu)}$ to establish quadratic mean differentiability at $\eta = 0$. Writing $c_\eta := \frac{\eta^2}{4}\frac{\bar\sigma_T^2}{T^2}$, direct algebraic manipulation yields
\[
R_{t,\eta}(y, z) = \bigg(\frac{1}{\sqrt{1 + c_\eta}} - 1\bigg)\,\xi(y, z)\Big(1 + \frac{\eta}{2}s_T(z, y)\Big) = \bigg(\frac{1}{\sqrt{1 + c_\eta}} - 1\bigg)\,\xi_\eta(y, z).
\]
Therefore, using $\int \xi_\eta^2\,\mathrm d\mu = 1 + c_\eta$ from the display above,
\[
\|R_{t,\eta}\|^2_{L^2(\mu)} = \bigg(\frac{1}{\sqrt{1 + c_\eta}} - 1\bigg)^2(1 + c_\eta) = \big(1 - \sqrt{1 + c_\eta}\big)^2 \le c_\eta^2 \le \frac{\eta^4\,\bar\sigma_T^4}{T^4},
\]
where the first inequality follows from $|1 - \sqrt{1 + x}| \le x$ for $x \ge 0$. This proves (iii).

This construction supplies an example of a QMD one-dimensional submodel with $\partial_\eta\big|_{\eta=0}\log p^{(T)}_\eta(\bar o_T) = D^*_T(\bar o_T)$. The QMD of $P^{(T)}_\eta$ is implied by the QMD of $q_{Y,\eta}$.

11.2 Local asymptotic normality along least favorable submodels

In this subsection, we adapt van der Vaart [1998, Theorem 7.2] and van der Laan et al. [2026, Theorem 17] in order to prove Theorem 3, as presented below.

Proof. Fix $\epsilon > 0$. We define $I_T = \bar\sigma_T^2/T$ and $\eta = \epsilon I_T^{-1/2}$. We define the random variable
\[
W_t := 2\Bigg[\sqrt{\frac{q_{Y,\eta}(Y_t \mid Z_t)}{q_Y(Y_t \mid Z_t)}} - 1\Bigg].
\]
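Before expanding the log-likelihood ratio, here is a small numerical sanity check of the Lemma 1 construction (entirely illustrative: two covariate values, Gaussian $q_Y$, and arbitrary constants of our choosing), confirming that the perturbed joint density in Eq. (30) has total mass one and score $s_T$ at the origin:

```python
import numpy as np

# Toy instance of Lemma 1: z takes two values with weights h_bar, q_Y Gaussian.
T, sigma = 50, 1.0
y = np.linspace(-12.0, 12.0, 20001); dy = y[1] - y[0]
h_bar = np.array([0.7, 0.3])                     # \bar h_T(z)
h_z = np.array([1.0, -2.0])                      # h_T(z)
alpha_z = np.array([2.0, -1.5])                  # \bar alpha_T(z)

q_Y = np.exp(-(y[None, :] - h_z[:, None])**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
score = alpha_z[:, None] * (y[None, :] - h_z[:, None]) / T          # s_T(z, y)
s2 = np.sum(h_bar[:, None] * score**2 * q_Y) * dy                   # = bar_sigma_T^2 / T^2

def q_eta(eta):
    """Perturbed conditional density of Eq. (30)."""
    return (1 + 0.5 * eta * score)**2 / (1 + 0.25 * eta**2 * s2) * q_Y

print("total mass:", np.sum(h_bar[:, None] * q_eta(0.3)) * dy)      # = 1, Lemma 1 (i)

eps_fd = 1e-5                                                       # finite-difference step
num_score = (np.log(q_eta(eps_fd)) - np.log(q_Y)) / eps_fd
print("max score error:", np.abs(num_score - score).max())          # ~ 0, Lemma 1 (ii)
```

Note that only the mixture $q_{Y,\eta}\,\bar h_T$ integrates to one; the individual conditionals need not, which is why the check weights by $\bar h_T$.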
The log-likelihood ratio admits the expansion
\[
\log\frac{p^{(T)}_\eta(\bar O_T)}{p^{(T)}(\bar O_T)} = 2\sum_{t=1}^T \log\Big(1 + \frac{1}{2}W_t\Big).
\]
We aim to show that, as $T \to \infty$,
\[
\sum_{t=1}^T W_t = \eta\,\frac{1}{T}\sum_{t=1}^T \bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big) - \frac{\epsilon^2}{4} + o_p(1). \tag{32}
\]
Then, using the Taylor expansion $\log(1 + x) = x - \frac{x^2}{2} + x^2 R(2x)$, where $R(x) \to 0$ as $x \to 0$, we obtain
\[
\log\frac{p^{(T)}_\eta(\bar O_T)}{p^{(T)}(\bar O_T)}
= 2\sum_{t=1}^T \log\Big(1 + \frac{1}{2}W_t\Big)
= \sum_{t=1}^T \Big(W_t - \frac{1}{4}W_t^2 + \frac{1}{2}W_t^2 R(W_t)\Big)
= \eta\,\frac{1}{T}\sum_{t=1}^T \bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big) - \frac{1}{4}\sum_{t=1}^T W_t^2 + \frac{1}{2}\sum_{t=1}^T W_t^2 R(W_t) - \frac{\epsilon^2}{4} + o_p(1).
\]
We will establish the following statements:
\[
\sum_{t=1}^T W_t^2 \overset{p}{\to} \epsilon^2, \tag{QV}
\]
\[
\frac{\eta}{T}\sum_{t=1}^T \bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big) \overset{d}{\to} \mathcal N(0, \epsilon^2), \tag{MCLT}
\]
\[
\sum_{t=1}^T W_t^2 R(W_t) = o_p(1), \tag{REM}
\]
where Eq. (QV) is a quadratic variation term, Eq. (MCLT) will follow from a martingale central limit theorem with stabilized quadratic variation [Hall and Heyde, 1980, Theorem 3.2], and Eq. (REM) will follow from a negligibility condition. Putting the above statements together, we obtain that
\[
\log\frac{p^{(T)}_\eta(\bar O_T)}{p^{(T)}(\bar O_T)} \overset{d}{\to} \mathcal N\Big(-\frac{\epsilon^2}{2},\ \epsilon^2\Big).
\]

Proof of Eq. (32). It suffices to show that the mean and variance of
\[
\sum_{t=1}^T W_t - \frac{\eta}{T}\sum_{t=1}^T \bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big) + \frac{\epsilon^2}{4}
\]
converge to zero. By the martingale structure, we can write
\[
\begin{aligned}
\operatorname{Var}_{P^{(T)}}&\Bigg(\sum_{t=1}^T W_t - \frac{\eta}{T}\sum_{t=1}^T \bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big)\Bigg)
= \sum_{t=1}^T \operatorname{Var}_{P^{(T)}}\Big[W_t - \frac{\eta}{T}\bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big)\Big] \\
&\le T\,\mathbb E_{P^{(T)}}\Bigg[\frac{1}{T}\sum_{t=1}^T\Big(W_t - \frac{\eta}{T}\bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big)\Big)^2\Bigg] \\
&= T\int\Bigg(2\sqrt{\frac{q_{Y,\eta}(y \mid z)}{q_Y(y \mid z)}} - 2 - \frac{\eta}{T}\bar\alpha_T(z)\big(y - h_T(z)\big)\Bigg)^2 q_Y(y \mid z)\,\bar h_T(z)\,\mathrm d\mu(a, x, y) \\
&= 4T\int\Bigg(\sqrt{q_{Y,\eta}(y \mid z)} - \sqrt{q_Y(y \mid z)}\Big(1 + \frac{\eta}{2T}\bar\alpha_T(z)\big(y - h_T(z)\big)\Big)\Bigg)^2 \bar h_T(z)\,\mathrm d\mu(a, x, y) \\
&\overset{(*)}{\le} 4T\,\frac{\eta^4\,\bar\sigma_T^4}{T^4} = \frac{4\epsilon^4}{T} = o(1),
\end{aligned}
\]
where $(*)$ follows from the remainder bound in Lemma 1.

We now show that the mean vanishes. Using the fact that the score has mean zero, we have
\[
\begin{aligned}
\mathbb E_{P^{(T)}}\Bigg[\sum_{t=1}^T W_t\Bigg]
&= T\,\mathbb E_{P^{(T)}}\Bigg[\frac{1}{T}\sum_{t=1}^T\Bigg(2\sqrt{\frac{q_{Y,\eta}(Y_t \mid Z_t)}{q_Y(Y_t \mid Z_t)}} - 2\Bigg)\Bigg]
= T\int\Bigg(2\sqrt{\frac{q_{Y,\eta}(y \mid z)}{q_Y(y \mid z)}} - 2\Bigg)\bar h_T(z)\,q_Y(y \mid z)\,\mathrm d\mu(y, z) \\
&= 2T\int\sqrt{q_{Y,\eta}(y \mid z)\,q_Y(y \mid z)}\,\bar h_T(z)\,\mathrm d\mu(y, z) - 2T
= -T\int\Big(\sqrt{q_{Y,\eta}(y \mid z)\,\bar h_T(z)} - \sqrt{q_Y(y \mid z)\,\bar h_T(z)}\Big)^2\,\mathrm d\mu(y, z).
\end{aligned}
\]
Define
\[
A := \sqrt{q_{Y,\eta}(y \mid z)\,\bar h_T(z)} - \sqrt{q_Y(y \mid z)\,\bar h_T(z)}, \qquad
B := R_{t,\eta} - A = -\frac{\eta}{2T}\bar\alpha_T(z)\big(y - h_T(z)\big)\sqrt{q_Y(y \mid z)\,\bar h_T(z)}.
\]
We now use $\|R_{t,\eta}\|_{L^2(\mu)} \le \frac{\eta^2\bar\sigma_T^2}{T^2}$ and $\eta = \epsilon I_T^{-1/2} = \epsilon\,\bar\sigma_T^{-1}T^{1/2}$ to obtain
\[
\|R_{t,\eta}\|_{L^2(\mu)} \le \epsilon^2\,\bar\sigma_T^{-2}\,T\cdot\frac{\bar\sigma_T^2}{T^2} = \frac{\epsilon^2}{T}.
\]
We compute
\[
\|B\|^2_{L^2(\mu)} = \frac{\eta^2}{4T^2}\int\big(\bar\alpha_T(z)(y - h_T(z))\big)^2\,q_Y(y \mid z)\,\bar h_T(z)\,\mathrm d\mu(y, z)
= \frac{\eta^2}{4T^2}\,\mathbb E\Bigg[\frac{1}{T}\sum_{t=1}^T\big(\bar\alpha_T(Z_t)(Y_t - h_T(Z_t))\big)^2\Bigg]
= \frac{\eta^2}{4T^2}\,\bar\sigma_T^2 = \frac{\epsilon^2\,\bar\sigma_T^{-2}\,T\,\bar\sigma_T^2}{4T^2} = \frac{\epsilon^2}{4T},
\]
where in the last step we substitute the definition of $\eta = \epsilon I_T^{-1/2}$. Then we have
\[
\Big|-T\|A\|^2_{L^2(\mu)} + T\|B\|^2_{L^2(\mu)}\Big|
= \Big|-T\|R_{t,\eta} - B\|^2_{L^2(\mu)} + T\|B\|^2_{L^2(\mu)}\Big|
= \Big|-T\|R_{t,\eta}\|^2_{L^2(\mu)} + 2T\langle B, R_{t,\eta}\rangle_{L^2(\mu)}\Big|
\le T\|R_{t,\eta}\|^2_{L^2(\mu)} + 2T\|B\|_{L^2(\mu)}\|R_{t,\eta}\|_{L^2(\mu)}.
\]
Hence we have
\[
\Big|-T\|A\|^2_{L^2(\mu)} + T\|B\|^2_{L^2(\mu)}\Big| \le \frac{\epsilon^4}{T} + 2T\cdot\frac{\epsilon}{2\sqrt T}\cdot\frac{\epsilon^2}{T} = \frac{\epsilon^4}{T} + \frac{\epsilon^3}{\sqrt T} = o(1).
\]
Hence, we have
\[
\Bigg|\,\mathbb E_{P^{(T)}}\Bigg[\sum_{t=1}^T W_t\Bigg] + \frac{\epsilon^2}{4}\,\Bigg| = \Big|-T\|A\|^2_{L^2(\mu)} + T\|B\|^2_{L^2(\mu)}\Big| = o(1),
\]
as desired.
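The limit $\mathcal N(-\epsilon^2/2, \epsilon^2)$ can be seen numerically in a toy special case. The simulation below (ours; an i.i.d. design with a scalar feature, so a degenerate instance of the setting, with all constants chosen for illustration) draws the log-likelihood ratio along the Lemma 1 path and checks its mean and variance:

```python
import numpy as np

rng = np.random.default_rng(3)
T, reps, eps_loc, beta, sigma = 10_000, 2000, 1.0, 0.7, 1.0

# Toy i.i.d. design: phi(z) = z in {1, 2} with equal probability, nu = 1.
Sigma_bar = 0.5 * (1**2 + 2**2)                  # E[phi^2] = 2.5
sigma_bar = sigma / np.sqrt(Sigma_bar)           # bar sigma_T
eta = eps_loc * np.sqrt(T) / sigma_bar           # local parameter of Lemma 1

llr = np.empty(reps)
for r in range(reps):
    z = rng.integers(1, 3, size=T).astype(float)
    y = beta * z + sigma * rng.standard_normal(T)
    s = (z / Sigma_bar) * (y - beta * z) / T     # score s_T(z, y)
    llr[r] = 2 * np.log1p(0.5 * eta * s).sum() \
             - T * np.log1p(0.25 * eta**2 * sigma_bar**2 / T**2)

print("mean:", llr.mean(), " target -eps^2/2 =", -eps_loc**2 / 2)
print("var :", llr.var(), "  target  eps^2   =", eps_loc**2)
```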
Proof of Eq. (MCLT). We verify the assumptions of Lemma 2. First, we verify the quadratic variation condition (QV). We have
\[
\sum_{t=1}^T \frac{\eta^2}{T^2}\,\mathbb E\big[\bar\alpha_T(Z_t)^2\big(Y_t - h_T(Z_t)\big)^2 \mid \mathcal F_{T,t-1}\big]
= \frac{\epsilon^2\sigma^2}{\bar\sigma_T^2}\,\frac{1}{T}\sum_{t=1}^T \bar\alpha_T(Z_t)^2
= \frac{\epsilon^2\sigma^2}{\bar\sigma_T^2}\,\nu_T^\top \bar\Sigma_T^\dagger\Bigg(\frac{1}{T}\sum_{t=1}^T \varphi(Z_t)\varphi(Z_t)^\top\Bigg)\bar\Sigma_T^\dagger \nu_T
= \frac{\epsilon^2\sigma^2}{\bar\sigma_T^2}\,\nu_T^\top \bar\Sigma_T^\dagger \hat\Sigma_T \bar\Sigma_T^\dagger \nu_T.
\]
By Assumption 11, we have
\[
\sum_{t=1}^T \frac{\eta^2}{T^2}\,\bar\alpha_T(Z_t)^2\big(Y_t - h_T(Z_t)\big)^2 \overset{p}{\to} \epsilon^2. \tag{33}
\]
Secondly, we require the Lindeberg condition
\[
\sum_{t=1}^T \mathbb E\big[\eta^2 s_T(Z_t, Y_t)^2\,\mathbf 1\big[|\eta\, s_T(Z_t, Y_t)| > \epsilon\big] \mid \mathcal F_{T,t-1}\big] \overset{p}{\to} 0,
\]
which is verified by Assumption 12.

Proof of Eq. (QV). We have
\[
\begin{aligned}
\sum_{t=1}^T W_t^2
&= \sum_{t=1}^T 4\Bigg[\sqrt{\frac{q_{Y,\eta}(Y_t \mid Z_t)}{q_Y(Y_t \mid Z_t)}} - 1\Bigg]^2
= \sum_{t=1}^T 4\Bigg[\frac{R_{t,\eta}(Y_t, Z_t)}{\sqrt{q_Y(Y_t \mid Z_t)\,\bar h_T(Z_t)}} + \frac{\eta}{2T}\bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big)\Bigg]^2 \\
&= \sum_{t=1}^T \frac{4R_{t,\eta}(Y_t, Z_t)^2}{q_Y(Y_t \mid Z_t)\,\bar h_T(Z_t)}
+ \sum_{t=1}^T \frac{\eta^2}{T^2}\,\bar\alpha_T(Z_t)^2\big(Y_t - h_T(Z_t)\big)^2
+ \sum_{t=1}^T \frac{4R_{t,\eta}(Y_t, Z_t)}{\sqrt{q_Y(Y_t \mid Z_t)\,\bar h_T(Z_t)}}\,\frac{\eta}{T}\bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big).
\end{aligned}
\]
We have
\[
\mathbb E_{P^{(T)}}\Bigg[\sum_{t=1}^T \frac{R_{t,\eta}(Y_t, Z_t)^2}{q_Y(Y_t \mid Z_t)\,\bar h_T(Z_t)}\Bigg]
= \sum_{t=1}^T \iint \frac{R_{t,\eta}(y, z)^2}{q_Y(y \mid z)\,\bar h_T(z)}\,q_Y(y \mid z)\,q_X(x)\,g_t(a \mid x)\,\mathrm dz\,\mathrm dy
= T\,\|R_{t,\eta}\|^2_{L^2(\mu)} \le \frac{\eta^4\,\bar\sigma_T^4}{T^3} = \frac{\epsilon^4}{T} = o(1).
\]
Since the summands are nonnegative and their expectation vanishes, Markov's inequality yields
\[
\sum_{t=1}^T \frac{4R_{t,\eta}(Y_t, Z_t)^2}{q_Y(Y_t \mid Z_t)\,\bar h_T(Z_t)} \overset{p}{\to} 0.
\]
We now conclude, via Eq. (33) and the Cauchy–Schwarz inequality applied to the cross term, that $\sum_{t=1}^T W_t^2 \overset{p}{\to} \epsilon^2$.

Proof of Eq. (REM). This is implied by $\max_t |W_t| = o_T(1)$, as shown in the proof of van der Vaart [1998, Theorem 7.2]. In our setting, this holds because $I_T = \bar\sigma_T^2/T \to \infty$ as $T \to \infty$, hence $\eta \to 0$ as $T \to \infty$.

11.3 Proof of the convolution theorem

We prove Theorem 4 following a standard route (QMD, LAN, the third Le Cam lemma, matching and equivariance, Anderson's lemma), with the adaptive localization $\eta_T = \epsilon\sqrt T/\bar\sigma_T$. Throughout, we implicitly assume that the information index $I_T := \bar\sigma_T^2/T \to \infty$, so that $\eta_T \to 0$ for each fixed $\epsilon$.

Proof of Theorem 4. Fix $\epsilon \in \mathbb R$ and define the log-likelihood ratio
\[
\Lambda_T(\epsilon) := \log\frac{\mathrm dP^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}}{\mathrm dP^{(T)}}(\bar O_T).
\]
By Theorem 3,
\[
\Lambda_T(\epsilon) = \epsilon\Delta_T - \frac{\epsilon^2}{2} + o_{P^{(T)}}(1), \qquad
\Delta_T := \frac{\sqrt T}{\bar\sigma_T}\,D^*_T(\bar O_T) \Rightarrow \mathcal N(0, 1) \ \text{under } P^{(T)}. \tag{34}
\]
In particular, the LAN expansion (34) implies that, for each fixed $\epsilon \in \mathbb R$, the sequence of local laws $\{P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}\}_{T\ge 1}$ is contiguous with respect to $\{P^{(T)}\}_{T\ge 1}$.

Define the normalized estimation error under the baseline law $P^{(T)}$,
\[
\phi_T(\bar O_T) := \frac{\sqrt T}{\bar\sigma_T}\Big(\hat\Psi_T - \Psi_T\big(P^{(T)}\big)\Big). \tag{35}
\]
Regularity (Definition 3) implies tightness of $\{\phi_T\}$ under each local law $P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}$, hence in particular under $P^{(T)}$. Along a subsequence (not relabeled),
\[
(\phi_T, \Delta_T) \Rightarrow (S, \Delta) \ \text{under } P^{(T)}, \qquad \Delta \sim \mathcal N(0, 1). \tag{36}
\]

Step 1: Third lemma and the family of limit laws. From (34) and (36), $(\phi_T, \Lambda_T(\epsilon)) \Rightarrow (S, \epsilon\Delta - \epsilon^2/2)$ under $P^{(T)}$. Le Cam's third lemma implies that, under $P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}$,
\[
\phi_T \Rightarrow L_\epsilon, \qquad
L_\epsilon(B) = \mathbb E\Big[\mathbf 1\{S \in B\}\exp\Big(\epsilon\Delta - \frac{\epsilon^2}{2}\Big)\Big], \quad B \in \mathcal B(\mathbb R). \tag{37}
\]
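The tilting formula (37) can be checked by simulation when $S$ has the convolution form $S = \Delta + \xi$ with $\xi$ independent of $\Delta$, which is exactly the structure that Steps 2–4 below establish in general: tilting by $e^{\epsilon\Delta - \epsilon^2/2}$ shifts the law of $S$ by $\epsilon$. The sketch below is ours; the Laplace noise and all constants are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, eps_loc = 400_000, 0.8

delta = rng.standard_normal(n)                   # Delta ~ N(0,1), limit of the score
xi = rng.laplace(scale=0.5, size=n)              # extra noise, independent of Delta
S = delta + xi                                   # candidate limit of convolution form

w = np.exp(eps_loc * delta - eps_loc**2 / 2)     # third-lemma tilt, E[w] = 1
grid = np.linspace(-4.0, 4.0, 17)
cdf_tilted = np.array([(w * (S <= b)).mean() for b in grid])         # L_eps via Eq. (37)
cdf_shifted = np.array([((S + eps_loc) <= b).mean() for b in grid])  # law of S + eps

print("max CDF discrepancy:", np.abs(cdf_tilted - cdf_shifted).max())  # ~ 0
```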
Step 2: Matching in the Gaussian shift experiment. By the matching theorem [van der Vaart, 1998, Theorem 7.10], there exist a measurable map $g : \mathbb R \times [0, 1] \to \mathbb R$ and an auxiliary random variable $U \sim \mathrm{Unif}(0, 1)$, independent of everything else, such that, for $X_\epsilon \sim \mathcal N(\epsilon, 1)$,
\[
g(X_\epsilon, U) \sim L_\epsilon, \qquad \forall\,\epsilon \in \mathbb R. \tag{38}
\]

Step 3: Regularity implies equivariance-in-law. Define the (normalized) local centering term
\[
B_T(\epsilon) := \frac{\sqrt T}{\bar\sigma_T}\Big(\Psi_T\big(P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}\big) - \Psi_T\big(P^{(T)}\big)\Big). \tag{39}
\]
By pathwise differentiability of $\Psi_T$ at $P^{(T)}$ (Assumption 10), for any one-dimensional submodel $\{P^{(T)}_\eta\}$ through $P^{(T)}$ with score $s_T = \partial_\eta \log p^{(T)}_\eta\big|_{\eta=0}$,
\[
\Psi_T\big(P^{(T)}_\eta\big) - \Psi_T\big(P^{(T)}\big) = \eta\,P^{(T)}\big[D^*_T\,s_T\big] + o(\eta), \qquad \eta \to 0. \tag{40}
\]
For a least favorable submodel in the sense of Definition 2, we have $s_T = D^*_T$. Substituting into (40) gives
\[
\Psi_T\big(P^{(T)}_\eta\big) - \Psi_T\big(P^{(T)}\big) = \eta\,P^{(T)}\big[(D^*_T)^2\big] + o(\eta). \tag{41}
\]
We now take the local parameter $\eta_T := \epsilon\sqrt T/\bar\sigma_T = \epsilon/\sqrt{I_T}$. Since $I_T \to \infty$, we have $\eta_T \to 0$, and the pathwise differentiability expansion (41) applies along this sequence. Plugging $\eta = \eta_T$ into (41) and then into (39) yields
\[
B_T(\epsilon) = \epsilon\,\frac{T}{\bar\sigma_T^2}\,P^{(T)}\big[(D^*_T)^2\big] + o(1). \tag{42}
\]
Moreover, since $\eta_T \to 0$ and $\sqrt T/\bar\sigma_T = 1/\sqrt{I_T} \to 0$, the remainder term satisfies $\frac{\sqrt T}{\bar\sigma_T}\,o(\eta_T) = o(1)$. By the construction of $\bar\sigma_T$, $P^{(T)}[(D^*_T)^2] = \bar\sigma_T^2/T$, so that
\[
B_T(\epsilon) = \epsilon + o(1) \quad \text{for each fixed } \epsilon. \tag{43}
\]
Regularity (Definition 3) asserts that, for every fixed $\epsilon \in \mathbb R$,
\[
\frac{\sqrt T}{\bar\sigma_T}\Big(\hat\Psi_T - \Psi_T\big(P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}\big)\Big) \Rightarrow L \quad \text{under } P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}, \tag{44}
\]
for a law $L$ not depending on $\epsilon$. Since $\phi_T - B_T(\epsilon)$ is exactly the left-hand side of (44), we obtain $\phi_T - B_T(\epsilon) \Rightarrow L$ under $P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}$. Combining this with (37) and (43) yields
\[
L_\epsilon(\cdot - \epsilon) = L_0(\cdot), \qquad \forall\,\epsilon \in \mathbb R.
\]
Using (38), this is equivalent to
\[
g(X_\epsilon, U) - \epsilon \overset{d}{=} g(X_0, U), \qquad \forall\,\epsilon \in \mathbb R, \tag{45}
\]
i.e., $g$ is equivariant-in-law in the Gaussian shift experiment.

Step 4: Anderson decomposition and convolution. By [van der Vaart, 1998, Proposition 8.4], equivariance-in-law implies that there exists a random variable $\xi$, independent of $X_0 \sim \mathcal N(0, 1)$, such that $g(X_0, U) \overset{d}{=} X_0 + \xi$. Let $M := \mathcal L(\xi)$. Then $L_0 = \mathcal L(g(X_0, U)) = \mathcal N(0, 1) * M$. Since regularity identifies the (subsequential) limit law under $P^{(T)}$ as $L = L_0$, we obtain the convolution representation $L = \mathcal N(0, 1) * M$.

12 Auxiliary results

We use the following version of the martingale central limit theorem from Hall and Heyde [1980, Corollary 3.1].

Lemma 2. Suppose $\{(X_{n,i}, \mathcal F_{n,i}) : i \in [k_n],\ n \in \mathbb N\}$ is a martingale difference array with the nested property $\mathcal F_{n,i} \subset \mathcal F_{n+1,i}$ for all $i \in [k_n]$ and $n \in \mathbb N$. We further assume that
\[
(\forall\,\epsilon > 0)\qquad \sum_{i \in [k_n]} \mathbb E\big[X_{n,i}^2\,\mathbf 1[|X_{n,i}| > \epsilon] \mid \mathcal F_{n,i-1}\big] \overset{p}{\to} 0, \tag{Lindeberg}
\]
\[
\sum_{i \in [k_n]} \mathbb E\big[X_{n,i}^2 \mid \mathcal F_{n,i-1}\big] \overset{p}{\to} 1. \tag{QV}
\]
Then $\sum_{i \in [k_n]} X_{n,i} \overset{d}{\to} \mathcal N(0, 1)$.
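To illustrate Lemma 2, the following simulation (ours; the weight process is an arbitrary bounded predictable choice, not one arising from the paper's algorithms) builds a martingale difference array whose predictable quadratic variation stabilizes, and checks the normal limit:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 5000, 4000

eps = rng.standard_normal((reps, n))
w = np.ones((reps, n))
w[:, 1:] += 0.5 * (eps[:, :-1] > 0)       # predictable: w_t depends on eps_{t-1} only
qv_limit = 0.5 * 1.0**2 + 0.5 * 1.5**2    # E[w_t^2] = 1.625, the stabilized QV per step
z = (w * eps).sum(axis=1) / np.sqrt(qv_limit * n)

print("sample variance:", z.var())                      # ~ 1
print("P(|Z| <= 1.96) :", (np.abs(z) <= 1.96).mean())   # ~ 0.95
```

Here $X_{n,i} = w_i\varepsilon_i/\sqrt{1.625\,n}$ satisfies (Lindeberg) since the weights are bounded, and (QV) by the law of large numbers, so the standardized sum is approximately standard normal.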
