Efficient Inference after Directionally Stable Adaptive Experiments

Zikai Shen∗1, Houssam Zenati∗2, Nathan Kallus3,4, Arthur Gretton2,5, Koulik Khamaru6, and Aurélien Bibaut4

1 University College London   2 Gatsby Computational Neuroscience Unit, University College London   3 Cornell University   4 Netflix   5 Google DeepMind   6 Rutgers University

∗ Equal contribution.

February 26, 2026

Abstract

We study inference on scalar-valued pathwise differentiable targets after adaptive data collection, such as by a bandit algorithm. We introduce a novel target-specific condition, directional stability, which is strictly weaker than previously imposed target-agnostic stability conditions. Under directional stability, we show that estimators that would have been efficient under i.i.d. data remain asymptotically normal and semiparametrically efficient when computed from adaptively collected trajectories. The canonical gradient has a martingale form, and directional stability guarantees stabilization of its predictable quadratic variation, enabling high-dimensional asymptotic normality. We characterize efficiency using a convolution theorem for the adaptive-data setting, and give a condition under which the one-step estimator attains the efficiency bound. We verify directional stability for LinUCB, yielding the first semiparametric efficiency guarantee for a regular scalar target under LinUCB sampling.

1 Introduction

Statistical inference under adaptive data collection is now routine in modern learning systems. In contextual bandits and related online decision problems, actions are chosen using past observations, inducing dependence between observations collected at different rounds [Auer et al., 2002, Li et al., 2010, Lattimore and Szepesvári, 2020, Sutton and Barto, 2018]. This dependence can invalidate classical i.i.d. asymptotics: even when estimators remain consistent, their limiting distributions may be non-normal, complicating inference [van der Vaart, 1998, Hall and Heyde, 1980, Hadad et al., 2021, Bibaut and Kallus, 2025].

A key mechanism by which classical inference can be recovered is stability of the adaptive design. Specifically, in the linear regression setting, if the realized random design matrix becomes deterministic as the horizon grows large, then the predictable-quadratic-variation convergence condition of martingale central limit theorems is satisfied in the analysis of the OLS estimator, which guarantees asymptotic normality [Lai and Wei, 1982]. Most existing stability formulations are full-matrix conditions: they require the entire empirical covariance (or information) matrix to stabilize after deterministic rescaling. While powerful, such global requirements can be misaligned with regret-minimizing objectives that limit exploration. In particular, modern bandit algorithms deliberately concentrate sampling in "good" directions, so information typically accumulates anisotropically across directions. Full-matrix stability can therefore be too regret-expensive to enforce. Fortunately, even when it fails, inference for a specific scalar target of interest may still be possible. We introduce a novel notion of stability which is sufficient for inference on scalar pathwise differentiable targets. Our starting point is that for scalar targets, inference depends on the design only through the direction in which the target of interest depends on the coefficient vector.
We introduce a target-specific notion we call directional stability, which requires stabilization of the empirical design only along a relevant direction (at an explicit rate), while allowing instability in directions that do not affect the target. Directional stability can be substantially weaker than classical full-matrix stability [Lai and Wei, 1982] and is compatible with the anisotropic exploration patterns induced by regret minimization, as we show by example with LinUCB. Finally, anisotropic stabilization is not unique to optimistic exploration. In the multi-armed setting, recent work by Han [2026] shows that Thompson sampling exhibits a sharp stability dichotomy: the pull counts are asymptotically deterministic for each suboptimal arm (and for the unique optimal arm), whereas when there are multiple optimal arms, the vector of optimal-arm pull proportions converges to a non-degenerate random limit characterized as the invariant law of an SDE. This provides a complementary example in which stabilization holds in suboptimal-arm directions but fails along directions associated with the optimal arms.

Main message: in directionally stable designs, i.i.d.-efficient estimators remain efficient. Our main consequence is simple: under directional stability, the estimators that would have been asymptotically normal and efficient under i.i.d. sampling are also the ones that are asymptotically normal and efficient under adaptive sampling. No alterations or square-root-propensity weights are needed. We work at the trajectory level, viewing the observed bandit history as a single draw from a horizon-indexed longitudinal experiment. We derive the trajectory-level canonical gradient for the target and show it has a martingale form. Directional stability then guarantees stabilization of the martingale's (conditional) quadratic variation, yielding asymptotic normality in high-dimensional regimes. We further develop an efficiency theory for a sequence of horizon-indexed experiments and show that the resulting one-step estimator achieves the semiparametric efficiency bound associated with the stable limit experiment. In particular, once directional stability holds, there is no intrinsic need for propensity weighting or ad hoc variance stabilization: the classical one-step construction from the i.i.d. setting is already optimal, while propensity weighting would actually degrade efficiency.

Relation to prior work. Several lines of work obtain valid inference under adaptivity by modifying estimating equations to enforce normality under broad sampling rules. Prominent examples include propensity-weighted or variance-stabilized doubly robust scores, which often require either known assignment probabilities or consistent propensity estimation and conditional variance corrections [e.g., Hadad et al., 2021, Bibaut et al., 2021, Zhang et al., 2021]. A complementary literature develops always-valid procedures (e.g., confidence sequences) that remain valid under arbitrary stopping, sometimes at the cost of conservatism at fixed horizons [Howard et al., 2021, Waudby-Smith and Ramdas, 2021, Bibaut et al., 2022]. Another strand characterizes non-Gaussian limits induced by adaptive designs and performs inference by inverting the corresponding limit experiments [Rosenberger and Hu, 1999, Hirano and Porter, 2023, Adusumilli, 2023, Cho et al., 2025].
A review of these different lines is given in Bibaut and Kallus [2025]. A distinct and increasingly active line of work studies when classical Wald-type inference is restored under adaptive sampling via stability of the empirical design, in the sense of Lai and Wei [1982]. This perspective has motivated stability-based analyses for a range of bandit algorithms, including UCB-type methods and refinements thereof [Kalvit and Zeevi, 2021, Khamaru and Zhang, 2024, Fan and Glynn, 2022, Han et al., 2024], more precise and quantitative characterizations of (in)stability and coverage behavior [Fan et al., 2024, Han, 2026], stable variants of Thompson sampling under variance inflation [Halder et al., 2025], and linear contextual bandits such as LinUCB [Fan et al., 2025]. At the same time, these works highlight that many widely used adaptive policies fail to satisfy full-matrix stability, leading to systematic under-coverage of naive Wald intervals when applied without adjustment [Fan et al., 2024]. Our contribution is a weakening of the notion of stability: given a target, we identify a minimal statistical functional of the design whose convergence along a deterministic sequence suffices for asymptotic normality, together with a characterization of the efficiency bound.

Application to LinUCB. Finally, we instantiate our conditions for LinUCB. Full-matrix stability is not currently known to hold in linear contextual bandits due to direction-dependent information growth. Directional stability, by construction, matches this anisotropy: it only asks for stabilization in the target direction. Plugging in the characterization of the eigendecomposition of the empirical design matrix under LinUCB from Fan et al. [2025], we verify directional stability and obtain, to our knowledge, the first semiparametric efficiency guarantee for inference on a regular scalar target under LinUCB sampling.

Our contributions are as follows:

• Trajectory canonical gradient and one-step estimator. We derive the canonical gradient for a functional of the repeated factor (that is, the environment factor, as opposed to the design factors) in the distribution of the full bandit trajectory and construct a one-step estimator that is algebraically identical to the i.i.d. one-step estimator (Sections 2–4).

• Directional stability and high-dimensional asymptotics. We introduce directional stability and show it yields asymptotic normality via stabilization of the predictable quadratic variation, including high-dimensional regimes where plug-in OLS asymptotics are inadequate. We then verify directional stability for LinUCB (Section 5).

• Efficiency theory for horizon-indexed experiments. We use the notion of regularity along a sequence of submodels from van der Laan et al. [2026] and efficiency with respect to such regular estimators. We characterize the efficiency bound under directional stability and show that the one-step estimator attains it (Section 6).

1.1 Data and Statistical Models

Data. We observe an adaptively collected sequence $\{O_t\}_{t=1}^{T}$ with $O_t := (X_t, A_t, Y_t)$, where $X_t \in \mathcal X$ is a (potentially continuous) context, $A_t \in \mathcal A = \{1, \dots, K\}$ is a categorical action, and $Y_t \in \mathcal Y \subset \mathbb R$ is an outcome. Let $\bar O_t := (O_1, \dots, O_t)$ denote the history and $\mathcal F_t := \sigma(\bar O_t)$ the associated filtration.
The data are generated by a contextual-bandit-type experiment: at each round $t$, the agent selects a (random) logging policy $g_t(\cdot \mid x, \bar O_{t-1})$ based on past observations, then observes a fresh, i.i.d.-sampled context $X_t$, draws an action $A_t \sim g_t(\cdot \mid X_t, \bar O_{t-1})$, and finally observes an outcome $Y_t$ drawn from an unknown model conditional on $(X_t, A_t)$.

We assume that $X_t$ is independent of $\mathcal F_{t-1}$ with stationary marginal distribution $Q_{0,X}$. Conditional on $(\mathcal F_{t-1}, X_t)$, the action satisfies $P(A_t = a \mid \mathcal F_{t-1}, X_t) = g_t(a \mid X_t, \bar O_{t-1})$ for all $a \in \mathcal A$, where $g_t$ is $\mathcal F_{t-1}$-measurable and maps $\mathcal X$ to the $K$-simplex. Conditional on $(X_t, A_t)$, the outcome is independent of the past and has stationary conditional distribution $Y_t \mid (X_t = x, A_t = a) \sim Q_{0,Y}(\cdot \mid a, x)$.

Statistical models. We suppose that the full trajectory $\bar O_T$ is a draw from $P^{(T)}$, living in the set $\mathcal M^{\mathrm{np}}_T$ of distributions that are absolutely continuous w.r.t. an appropriate product measure $\mu^{(T)} = \mu^{\otimes T}$, whose density w.r.t. $\mu^{(T)}$ factors as
\[
\frac{dP^{(T)}}{d\mu^{(T)}}(\bar o_T) = \prod_{t=1}^{T} q_X(x_t)\, g_t(a_t \mid x_t, \bar o_{t-1})\, q_Y(y_t \mid a_t, x_t), \tag{1}
\]
where the likelihood factors $q_X$, $q_Y$, $g_t$, $1 \le t \le T$, are unknown to the statistician and allowed to vary arbitrarily. We assume that $\mu(\mathcal X \times \mathcal A) = 1$. We then specialize to a restricted setting in the next subsection. For each horizon $T$, the data-generating process induces a statistical experiment $\mathcal M^{\mathrm{np}}_T$ consisting of all distributions $P^{(T)}$ on $\bar O_T$ admitting the factorization (1). We emphasize that $(\mathcal M^{\mathrm{np}}_T)_{T \ge 1}$ is a sequence of statistical models.

1.2 Sequence of Target Estimands

We now define a sequence of target estimands indexed by the experimental horizon $T$. Each target $\Psi_T$ is defined on the corresponding statistical model $\mathcal M_T$ and depends on a feature representation whose dimension may grow with $T$. All asymptotic statements in the sequel are understood along this sequence as $T \to \infty$. Throughout, let $\mathcal Z = \mathcal X \times \mathcal A$. We write $z = (x, a)$ and $Z_t = (X_t, A_t)$. For each horizon $T$, define a feature map $\phi_T : \mathcal Z \to \mathbb R^{d_T}$.

Notation. For any measurable $f : \mathcal Z \times \mathcal Y \to \mathbb R$, define the time-averaged empirical process notation
\[
\bar P^{(T)} f := \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[f(Z_t, Y_t)\big],
\]
which integrates over the arguments of $f$ only, even when $f$ is random. This induces the Hilbert space
\[
L^2(\bar P^{(T)}) := \big\{ f : \mathcal Z \times \mathcal Y \to \mathbb R \;\big|\; \bar P^{(T)}(f^2) < \infty \big\}, \qquad \|f\|_{L^2(\bar P^{(T)})} := \big\{\bar P^{(T)}(f^2)\big\}^{1/2}. \tag{2}
\]
Similarly, for any measurable $f : \mathcal Z \times \mathcal Y \to \mathbb R$, define the empirical average
\[
\bar P_T f := \frac{1}{T}\sum_{t=1}^{T} f(X_t, A_t, Y_t),
\]
with associated (random) seminorm $\|f\|_{L^2(\bar P_T)} := \{\bar P_T(f^2)\}^{1/2}$ and space $L^2(\bar P_T)$. Finally, define the pooled second-moment matrix $\bar\Sigma_T := \mathbb{E}_{P^{(T)}}\big[\frac{1}{T}\sum_{t=1}^{T}\phi_T(Z_t)\phi_T(Z_t)^\top\big]$. In general, we use the bar notation to denote averaging over time and over the distribution of trajectories.

Assumption 1. The set $\{\phi_T(z) \mid z \in \mathcal Z\}$ is linearly independent.

For $T \ge 1$, we define the sequence of statistical models $\mathcal M_T \subset \mathcal M^{\mathrm{np}}_T$ as
\[
\mathcal M_T = \Big\{ P^{(T)} \in \mathcal M^{\mathrm{np}}_T \;\Big|\; \exists\, \beta_0 \in \mathbb R^{d_T} \text{ s.t. } \forall z \in \mathcal Z,\; \mathbb{E}_{P^{(T)}}[Y_t \mid Z_t = z] = \phi_T(z)^\top \beta_0 \Big\}. \tag{3}
\]
In Eq. (3), $\mathbb{E}_{P^{(T)}}[Y_t \mid Z_t = z]$ denotes the conditional expectation under the invariant factor $q_Y$ in the factorization of $P^{(T)}$ in Eq. (1).
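To fix ideas, here is a minimal Python sketch that draws one trajectory from a law of the form (1). The epsilon-greedy logging policy, the uniform context distribution, and the scaled one-hot feature map are illustrative stand-ins chosen for brevity; nothing in the paper restricts $g_t$ or $\phi_T$ to these forms.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 500, 3                               # horizon and number of actions
d = K                                       # feature dimension of the toy feature map
beta0 = np.array([1.0, 0.5, -0.2])          # true coefficient beta_0 in Eq. (3)
sigma = 1.0                                 # homoskedastic noise scale (cf. Assumption 3)

def phi(x, a):
    """Toy feature map phi_T(z) for z = (x, a): context-scaled one-hot in the action."""
    f = np.zeros(d)
    f[a] = x
    return f

Sigma, Sxy = np.eye(d), np.zeros(d)         # ridge statistics driving the adaptive policy
data = []
for t in range(T):
    x = rng.uniform(0.5, 1.5)               # fresh i.i.d. context X_t ~ Q_{0,X}
    betahat = np.linalg.solve(Sigma, Sxy)   # F_{t-1}-measurable estimate used by g_t
    scores = np.array([phi(x, a) @ betahat for a in range(K)])
    probs = np.full(K, 0.1 / K)             # epsilon-greedy logging policy g_t(.|x, history)
    probs[np.argmax(scores)] += 0.9
    a = int(rng.choice(K, p=probs))
    y = phi(x, a) @ beta0 + sigma * rng.normal()   # Y_t from the invariant factor Q_{0,Y}
    f = phi(x, a)
    Sigma += np.outer(f, f)
    Sxy += f * y
    data.append((x, a, y))
```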
By Assumption 1, for $P^{(T)} \in \mathcal M_T$, there can only be one $\beta_0$ satisfying the condition in Eq. (3). When there is no ambiguity as to the underlying distribution $P^{(T)}$, we denote the corresponding unique $\beta_0$ by $\beta_T$; we similarly define $h_T := \mathbb{E}_{P^{(T)}}[Y_t \mid Z_t = \cdot\,]$.

Remark 1 (Why do we consider the high-dimensional setting?). We consider asymptotic normality in the setting of correctly specified linear sieves with growing dimension, as it is precisely the setting that reveals a gap between the plug-in OLS estimator and the one-step estimator in the i.i.d. setting. The plug-in OLS estimator exhibits a naive dependence on the ambient dimension $d$ and requires $d_T = o(T)$ for consistency. In contrast, the one-step estimator with regularized nuisances incurs a second-order remainder term of order $d_{\mathrm{eff}}(\lambda)/T$, where $d_{\mathrm{eff}}$ is the effective dimension associated with regularization strength $\lambda$, allowing for potentially much more aggressive scaling of dimensions.

Sequence of Target Parameters. Define the sequence of functionals $\Psi_T : \mathcal M_T \to \mathbb R$, for any $T \ge 1$,
\[
\Psi_T(P^{(T)}) = \nu_T^\top \beta_T. \tag{4}
\]

Riesz representation of the trajectory-level target. Although $\Psi_T$ is defined as a functional of the full trajectory law $P^{(T)}$, it depends on $P^{(T)}$ only through the invariant conditional mean $h_T(z) := \mathbb{E}_{P^{(T)}}[Y_t \mid Z_t = z]$. To formalize this dependence, we leverage the time-averaged $L^2$ geometry $L^2(\bar P^{(T)})$ associated with $P^{(T)}$, defined in Eq. (2).

Assumption 2 (Identification condition). $\nu_T \in \ker(\bar\Sigma_T)^\perp$.

We define the function $\bar\alpha_T : \mathcal Z \to \mathbb R$ by $\bar\alpha_T(z) := \nu_T^\top \bar\Sigma_T^\dagger \phi_T(z)$, where $\bar\Sigma_T^\dagger$ denotes the Moore–Penrose pseudoinverse. We then define the linear functional $\tilde\Psi_T : L^2(\bar P^{(T)}) \to \mathbb R$ by
\[
\tilde\Psi_T(h) := \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\bar\alpha_T(Z_t)\, h(Z_t)\big]. \tag{5}
\]
We then have that for any $P^{(T)} \in \mathcal M_T$, $\Psi_T(P^{(T)}) = \tilde\Psi_T(h_T)$, and that $\bar\alpha_T$ is the Riesz representer of $\tilde\Psi_T$.

2 Canonical Gradient

In this section, we derive the canonical gradient of $\Psi_T$ for a given $T$. The following homoskedasticity assumption simplifies the analysis.

Assumption 3 (Homoskedastic noise). There exists $\sigma > 0$, independent of $T$ and $z$, such that $\mathbb{E}_{P^{(T)}}[\varepsilon_t^2 \mid Z_t = z] = \sigma^2$, where $\varepsilon_t := Y_t - \mathbb{E}[Y_t \mid Z_t, \mathcal F_{t-1}]$.

Theorem 1 (Canonical gradient). Under Assumptions 1–3, $\Psi_T$ is pathwise differentiable at $P^{(T)}$ if and only if $D^*_T = D^*_T(\bar\alpha_T, h_T)$, where
\[
D^*_T(\alpha, h) : \bar o_T \mapsto \frac{1}{T}\sum_{t=1}^{T} \alpha(z_t)\big(y_t - h(z_t)\big), \tag{6}
\]
has bounded $L^2(P^{(T)})$ norm (i.e., $\|D^*_T\|^2_{L^2(P^{(T)})} = \frac{\sigma^2}{T}\,\nu_T^\top \bar\Sigma_T^\dagger \nu_T < \infty$), in which case $D^*_T$ is its canonical gradient.

We refer the reader to Section 8 for the full proof. The above expression makes explicit the martingale structure of the canonical gradient under adaptive data collection and is the basis of our asymptotic analysis. Note that the canonical gradient involves the marginal average design matrix $\bar\Sigma_T$ through the definition of the Riesz representer $\bar\alpha_T$; in other words, $\bar\Sigma_T$ arises from averaging over $T$ longitudinally and, at the population level across trajectories, cross-sectionally.
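As a concrete illustration of the objects in Theorem 1, the sketch below evaluates the Riesz representer $\bar\alpha_T$ and the canonical gradient $D^*_T$ on synthetic data. The i.i.d. Gaussian features are a stand-in used only to exercise the algebra; under genuine adaptive sampling, $\bar\Sigma_T$ is the pooled trajectory-averaged matrix, which is approximated here by Monte Carlo over fresh trajectories.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 400, 3
nu = np.array([1.0, 0.0, 0.0])            # target direction nu_T in Psi_T = nu_T' beta_T
beta_T = np.array([1.0, 0.5, -0.2])
Phi = rng.normal(size=(T, d))             # stand-in for the logged features phi_T(Z_t)
Y = Phi @ beta_T + rng.normal(size=T)     # outcomes with h_T(z) = phi_T(z)' beta_T, sigma = 1

# Pooled second-moment matrix Sigma_bar_T = E[(1/T) sum_t phi phi'], approximated by
# Monte Carlo over B fresh trajectories drawn from the same (stand-in) design.
B = 200
Sigma_bar = np.zeros((d, d))
for _ in range(B):
    Phib = rng.normal(size=(T, d))
    Sigma_bar += Phib.T @ Phib / (T * B)

alpha_bar = Phi @ np.linalg.pinv(Sigma_bar) @ nu    # alpha_bar_T(Z_t) = nu' Sigma_bar^dagger phi_T(Z_t)
D_star = np.mean(alpha_bar * (Y - Phi @ beta_T))    # canonical gradient D*_T of Eq. (6)
norm_sq = nu @ np.linalg.pinv(Sigma_bar) @ nu / T   # ||D*_T||^2 = (sigma^2/T) nu' Sigma_bar^dagger nu, sigma = 1
```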
This is to be contrasted with the variance-stabilized doubly robust scores used in Bibaut et al. [2021], for instance, which do not coincide with the above canonical gradient; this suggests that their corresponding stabilized AIPW estimator is likely not efficient.

3 Directional stability

In this section we introduce the notion of directional stability, which is the cornerstone of our asymptotic analysis. Let
\[
\hat\alpha_{T,\lambda_\alpha} : z \mapsto \nu_T^\top \big(\hat\Sigma_T + \lambda_\alpha I_{d_T}\big)^{-1} \phi_T(z), \tag{7}
\]
where $\hat\Sigma_T := \frac{1}{T}\sum_t \phi_T(Z_t)\phi_T(Z_t)^\top$.

Definition 1 (Directional stability). We say that the design is directionally stable w.r.t. $(\Psi_T)_{T \ge 1}$ if there exists a deterministic sequence of functions $(\tilde\alpha_T)_{T \ge 1}$ of the form $\tilde\alpha_T : z \mapsto \nu_T^\top \tilde\Sigma_T^{-1}\phi_T(z)$, for a sequence of positive definite matrices $(\tilde\Sigma_T)_{T \ge 1}$, such that
\[
\|\hat\alpha_{T,\lambda_\alpha} - \tilde\alpha_T\|_{L^2(P_T)} = o_{P^{(T)}}(1). \tag{8}
\]

We note that when $\lambda_\alpha = o(\lambda_{\min}(\tilde\Sigma_T))$, where $\lambda_{\min}(\tilde\Sigma_T)$ is the smallest eigenvalue of $\tilde\Sigma_T$, directional stability is always implied by the classical full-matrix notion of stability [Lai and Wei, 1982], which requires that $\|\tilde\Sigma_T^{-1}\hat\Sigma_T - I_{d_T}\|_{\mathrm{op}} = o_{P^{(T)}}(1)$.

Remark 2. Note that in the above definition $\tilde\Sigma_T$ need not relate to $\bar\Sigma_T$ from the canonical gradient. Should $\bar\Sigma_T$ be a valid choice for $\tilde\Sigma_T$, then $(\hat\alpha_{T,\lambda_\alpha})$ is a consistent estimator of the Riesz representer sequence $(\bar\alpha_T)$. We will see further down how consistent estimation of the Riesz representer sequence is a necessary condition for efficiency of the one-step estimator.

4 One-Step Estimator

We construct a one-step estimator by plugging in the ridge-regularized Riesz-representer-like estimator $\hat\alpha_{T,\lambda_\alpha}$ from (7), as in Chernozhukov et al. [2022], and the following ridge-regularized estimator of the outcome regression function:
\[
\hat h_{T,\lambda_h}(z) := \phi_T(z)^\top \hat\beta_{T,\lambda_h}, \qquad \hat\beta_{T,\lambda_h} := \hat\Sigma_{T,\lambda_h}^{-1}\,\hat\Sigma_{T,ZY}, \qquad \hat\Sigma_{T,ZY} := \frac{1}{T}\sum_{t=1}^{T}\phi_T(Z_t)\,Y_t.
\]
We then construct a one-step estimator of $\Psi(P^{(T)})$ as follows:
\[
\hat\Psi := \Psi(\hat h_{T,\lambda_h}) + \bar P_T\Big[\hat\alpha_{T,\lambda_\alpha}(Z)\,\big\{Y - \hat h_{T,\lambda_h}(Z)\big\}\Big].
\]

Remark 3 (Equivalence to an undersmoothed plug-in estimator). In the i.i.d. case, Bruns-Smith et al. [2025] show that when both the outcome model and the Riesz representer are estimated by (kernel) ridge regression ("double ridge"), the resulting augmented estimator is numerically identical in finite samples to a single (kernel) ridge regression plug-in estimator with a smaller effective penalty parameter. Due to this numerical equivalence, an undersmoothed plug-in ridge estimator inherits the theoretical properties we establish for the one-step estimator.
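The construction above is short enough to state in code. The following sketch implements the double-ridge one-step estimator for a single logged trajectory; the penalty values are placeholders, and in practice they would follow the schedules discussed in Section 5.

```python
import numpy as np

def one_step(Phi, Y, nu, lam_h=1e-2, lam_a=1e-2):
    """Double-ridge one-step estimate of Psi_T = nu' beta_T from one trajectory.

    Phi: (T, d) array of features phi_T(Z_t); Y: (T,) outcomes; nu: (d,) target direction.
    lam_h, lam_a: ridge penalties for the outcome model h-hat and the Riesz estimate
    alpha-hat of Eq. (7). A sketch only; the penalties here are illustrative.
    """
    T, d = Phi.shape
    Sigma_hat = Phi.T @ Phi / T
    beta_hat = np.linalg.solve(Sigma_hat + lam_h * np.eye(d), Phi.T @ Y / T)  # ridge outcome model
    alpha_hat = Phi @ np.linalg.solve(Sigma_hat + lam_a * np.eye(d), nu)      # alpha_hat_{T,lam}(Z_t)
    plug_in = nu @ beta_hat                                                   # Psi(h_hat)
    correction = np.mean(alpha_hat * (Y - Phi @ beta_hat))                    # P_T[alpha_hat (Y - h_hat)]
    return plug_in + correction
```

Consistent with Remark 3, in this double-ridge form the augmented estimator coincides numerically with a single ridge plug-in at a smaller effective penalty.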
5 Asymptotic Analysis

In this section we provide an asymptotic analysis of our one-step estimator using our definition of directional stability. Let $\tilde\sigma_T := \sigma\,(\nu_T^\top \tilde\Sigma_T^{-1}\nu_T)^{1/2}$, which is the quantity appearing in the source condition of exponent 1 of the Riesz representer w.r.t. $\tilde\Sigma_T$. An alternative representation is $\tilde\sigma_T = (\tilde P^{(T)}\{\tilde\alpha_T\,(y - h_T)\}^2)^{1/2}$, where $\tilde P^{(T)}$ is the operator defined, for any fixed function $f : (z, \varepsilon) \mapsto a_1\,\phi(z)\phi(z)^\top + a_2\,\phi(z)\varepsilon$, by $\tilde P^{(T)} f := a_1 \tilde\Sigma_T$. This alternative representation highlights the interpretation of $\tilde\sigma_T^2$ as a variance-like object of the generic term $\tilde\alpha_T(y - h_T)$ of $D^*(\tilde\alpha_T, h_T)$.

In general $\tilde\sigma_T$ need not coincide with $\bar\sigma_T := \sigma\sqrt{\nu_T^\top \bar\Sigma_T^\dagger \nu_T} = \|\bar\alpha_T(y - h_T)\|_{L^2(\bar P^{(T)})} = \sqrt T\,\|D^*_T\|_{L^2(P^{(T)})}$ (see Eq. (29) for a derivation of this equality). It instead plays an analogous role for adaptive data collection under a stable design. In general $\tilde\sigma_T$ does not converge as a function of $T$, as can be checked, for example, in the case of LinUCB from the proof of Proposition 2.

Assumption 4 (Directional stability). It holds that
\[
\frac{\sigma^2}{\tilde\sigma_T^2}\,\nu_T^\top \tilde\Sigma_T^{-1}\big(\hat\Sigma_T - \tilde\Sigma_T\big)\tilde\Sigma_T^{-1}\nu_T \overset{p}{\to} 0 .
\]

Assumption 5 (Lindeberg condition). For all $\epsilon > 0$, we have
\[
\sum_{t=1}^{T}\mathbb{E}\bigg[\frac{1}{T\tilde\sigma_T^2}\,\tilde\alpha_T(Z_t)^2\varepsilon_t^2\,\mathbf{1}\Big\{\frac{\tilde\alpha_T(Z_t)^2\varepsilon_t^2}{T\tilde\sigma_T^2} > \epsilon\Big\} \,\Big|\, \mathcal F_{t-1}\bigg] \overset{p}{\to} 0 .
\]

Assumption 6 (Gaussian noise). There exists $\sigma \in (0, \infty)$ such that, conditional on $Z_{1:T} := (Z_1, \dots, Z_T)$, the noise variables $(\varepsilon_t)_{t=1}^{T}$ are independent and satisfy $(\varepsilon_1, \dots, \varepsilon_T) \mid Z_{1:T} \sim \mathcal N(0, \sigma^2 I_T)$. Equivalently, $\varepsilon_t \mid Z_{1:T} \overset{\mathrm{i.i.d.}}{\sim} \mathcal N(0, \sigma^2)$.

Theorem 2 (von Mises expansion and asymptotic normality). Under Assumption 6, the following von Mises expansion holds, provided the quantities in it are well defined and bounded:
\[
\hat\Psi_T - \Psi_T = \big(P_T - \tilde P^{(T)}\big)\big\{\tilde\alpha_T(Y - h_T)\big\} + O(R_T)
\]
with
\[
R_T := \|\hat\alpha_{T,\lambda_\alpha} - \tilde\alpha_T\|_{L^2(P_T)}\Big(\big\|\hat h_{T,\lambda_h} - h_T\big\|_{L^2(P_T)} + \frac{1}{\sqrt T}\Big) + \frac{1}{T}\,\nu_T^\top \hat\Sigma_{T,\lambda_\alpha}^{-1}\big(\hat\beta_{T,\lambda_h} - \beta_T\big). \tag{9}
\]
If $R_T = o_P\big(\tilde\sigma_T/\sqrt T\big)$ and Assumptions 4–5 hold, then it further holds that
\[
\frac{\sqrt T}{\tilde\sigma_T}\big(\hat\Psi_T - \Psi_T\big) \overset{d}{\to} \mathcal N(0, 1).
\]
We refer the reader to Section 9 for a full proof.

Proof sketch of Theorem 2. Write the one-step error as the explicit von Mises expansion
\[
\hat\Psi - \Psi_T
= \underbrace{\big(P_T - \tilde P^{(T)}\big)\,\tilde\alpha_T(Y - h_T)}_{(A)}
+ \underbrace{\big(P_T - \tilde P^{(T)}\big)\,\tilde\alpha_T(h_T - \hat h_{T,\lambda})}_{(B)}
+ \underbrace{\big(P_T - \tilde P^{(T)}\big)\big(\hat\alpha_T - \tilde\alpha_T\big)\varepsilon}_{(C)}
- \underbrace{P_T\big(\hat\alpha_T - \tilde\alpha_T\big)\big(\hat h_{T,\lambda} - h_T\big)}_{(D)} .
\]
For (A), set $X_{T,t} := \tilde\alpha_T(Z_t)\varepsilon_t/(\tilde\sigma_T\sqrt T)$; then $(X_{T,t})_{t \le T}$ is a martingale difference array. Assumption 4 gives convergence of the conditional quadratic variation $\sum_{t \le T}\mathbb{E}[X_{T,t}^2 \mid \mathcal F_{t-1}] \overset{p}{\to} 1$, and Assumption 5 gives the conditional Lindeberg condition, so the martingale CLT (Lemma 2) yields $\sqrt T\,(A)/\tilde\sigma_T \Rightarrow \mathcal N(0, 1)$. For the remainder terms, (C) is conditionally a centered Gaussian with variance $\sigma^2\|\hat\alpha_T - \tilde\alpha_T\|^2_{L^2(P_T)}/T$, hence $(C) = O_p(\|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)}/\sqrt T)$. Moreover, by Cauchy–Schwarz, $|(D)| \le \|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)}\,\|\hat h_{T,\lambda} - h_T\|_{L^2(P_T)}$, and (B) is bounded by the second-order bound plus an additional bias term $\frac{1}{T}\nu^\top \hat\Sigma_{T,\lambda_\alpha}^{-1}(\hat\beta_{T,\lambda_h} - \beta_T)$.
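For inference, Theorem 2 suggests the following Wald construction. The sketch below assumes directional stability holds with the plug-in $\hat\Sigma_{T,\lambda}$ standing in for $\tilde\Sigma_T$ when forming $\tilde\sigma_T$, and estimates $\sigma^2$ from residuals; both are heuristic choices rather than part of the theorem.

```python
import numpy as np

def wald_interval(Phi, Y, nu, lam=1e-2):
    """95% Wald interval for nu' beta_T based on the normalization of Theorem 2.

    Heuristic sketch: the plug-in nu' (Sigma_hat + lam I)^{-1} nu stands in for
    nu' Sigma_tilde^{-1} nu, which is only justified under directional stability,
    and sigma^2 is estimated from residuals under Assumption 3 (homoskedasticity).
    """
    T, d = Phi.shape
    Sigma_lam = Phi.T @ Phi / T + lam * np.eye(d)
    beta_hat = np.linalg.solve(Sigma_lam, Phi.T @ Y / T)
    resid = Y - Phi @ beta_hat
    sigma2_hat = resid @ resid / T                         # homoskedastic variance estimate
    alpha_hat = Phi @ np.linalg.solve(Sigma_lam, nu)       # ridge Riesz estimate of Eq. (7)
    psi_hat = nu @ beta_hat + np.mean(alpha_hat * resid)   # one-step estimate from Section 4
    se = np.sqrt(sigma2_hat * (nu @ np.linalg.solve(Sigma_lam, nu)) / T)  # ~ sigma_tilde_T / sqrt(T)
    zq = 1.959963984540054                                 # standard normal 97.5% quantile
    return psi_hat - zq * se, psi_hat + zq * se
```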
Instantiation under LinUCB. We consider LinUCB as presented by Lattimore and Szepesvári [2020, Chapter 19], with a $d_T$-dimensional feature map. We use $\gamma$ for their $\beta$ (the "exploration bonus" factor). We define the deterministic matrix
\[
\tilde\Sigma_T := P_\star + (T d_T)^{1/4}\gamma^{-1/2} P_\perp,
\]
where $P_\star = \beta_T\beta_T^\top/\|\beta_T\|_2^2$ and $P_\perp = I_{d_T} - P_\star$.

Assumption 7. There exists $L > 0$ such that for all $T \ge 1$ and all $z \in \mathcal Z$, we have $\|\phi_T(z)\| \le L$.

We require a tail assumption with the same conditional variance scale as Assumption 3.

Assumption 8. The sequence $(\varepsilon_t)_{t=1}^{T}$ is an $(\mathcal F_t)$-adapted MDS with conditional variance $\sigma^2$, and satisfies, for all $t \ge 1$ and $\lambda \in \mathbb R$, $\mathbb{E}[\exp(\lambda\varepsilon_t) \mid \mathcal F_{t-1}] \le \exp\big(\tfrac{1}{2}\sigma^2\lambda^2\big)$.

Assumption 8 plays a key role in the self-normalized inequalities of Abbasi-Yadkori et al. [2011] and the non-asymptotic analysis of Fan et al. [2025]. The following proposition is a direct consequence of Abbasi-Yadkori et al. [2011, Theorem 2] together with Jézéquel et al. [2019, Proposition 2]. Such bounds cannot be improved for a general bandit algorithm [Lattimore, 2023].

Proposition 1. Under Assumptions 7–8, it holds with probability $\ge 1 - \delta$, for all $T \ge 0$, that
\[
\big\|\hat h_{T,\lambda_h} - h_T\big\|_{L^2(P_T)} \lesssim \sqrt{\frac{d_{\mathrm{eff}}(\lambda_h, T)}{T}} + \sqrt{\lambda_h}\,\|\beta_0\|_2,
\]
where $d_{\mathrm{eff}}(\lambda_h, T) := \mathrm{Tr}\big((\lambda_h I_{d_T} + \hat\Sigma_T)^{-1}\hat\Sigma_T\big)$ is the effective dimension.

The notion $d_{\mathrm{eff}}(\lambda, T)$ is related to the Bayesian information gain [Srinivas et al., 2010]. A reader familiar with the regression literature [Caponnetto and De Vito, 2007, Fischer and Steinwart, 2020] may be more accustomed to the effective dimension defined in terms of the population covariance matrix. The term $\sqrt\lambda\,\|\beta_0\|_2$ is a crude upper bound on the bias term and can be improved under source conditions on $\beta_0$ with respect to $\hat\Sigma_T$.

The following directional stability proposition is a consequence of Fan et al. [2025].

Assumption 9 (Large exploration). The exploration bonus $\gamma_T$ for horizon $T$ satisfies $\gamma_T \gtrsim d_T^2\big(\sigma\sqrt{d_T + \log\log T} + 1\big)$.

Proposition 2. Under Assumptions 7–9, suppose that $\|\phi_T(Z_t)\| = 1$ for every $t$, and that
\[
\varepsilon_{\mathrm{bulk}} := d_T\Big(\frac{\gamma_T^8}{T}\Big)^{\frac{d_T+1}{d_T-1}} + \frac{d_T^{1/4}}{\sqrt{\gamma_T}} = o(1), \qquad d_T = o(T), \qquad \gamma_T\sqrt{d_T/T} = o(1), \qquad \gamma_T^{-1} = o(1).
\]
For $\lambda_\alpha = 1/T$, omitting polylogs, we have the following bound in probability, with proof given in Section 10:
\[
\frac{\|\hat\alpha_{T,\lambda_\alpha} - \tilde\alpha_T\|_{L^2(P_T)}}{\tilde\sigma_T}
= O_{P^{(T)}}\Bigg(
\Big(\frac{d_T}{\gamma_T} + \gamma_T\sqrt{\frac{d_T}{T}}\Big)\,
\frac{\|P_\star\nu_T\|}{\|P_\star\nu_T\| + \|P_\perp\nu_T\|\,(T d_T)^{1/4}/\sqrt{\gamma_T}} \tag{$\parallel$}
\]
\[
\qquad\qquad
+ \Big(\sqrt{\frac{d_T}{T}} + \frac{d_T}{\gamma_T} + \varepsilon_{\mathrm{bulk}}\Big)\,
\frac{\|P_\perp\nu_T\|\,(T d_T)^{1/4}/\sqrt{\gamma_T}}{\|P_\star\nu_T\| + \|P_\perp\nu_T\|\,(T d_T)^{1/4}/\sqrt{\gamma_T}}
\Bigg). \tag{$\perp$}
\]
Here Eq. ($\parallel$) reflects the contribution from the mass of $\nu$ aligned with the true signal direction, and Eq. ($\perp$) reflects the contribution from the mass of $\nu$ orthogonal to the true signal direction. This is the key distinction between adaptive data collection and data collected under an i.i.d. design, as the bandit is regret-minimizing and learns to collect less data in the directions corresponding to $P_\perp$. Proposition 2 characterizes when asymptotic normality and regret minimization can simultaneously be achieved, by optimally tuning the parameters $\gamma$, $d_T$ as a function of $T$ in the high-dimensional regime such that both Eq. ($\parallel$) and Eq. ($\perp$) are $o(1)$ with respect to $T$, while the expected regret remains sublinear and as small as possible.

Remark 4. The bound in Proposition 2 involves the term $\sqrt{d_T/T}$, as we regularize $\hat\alpha_T$ with ridge penalty $1/T$ or faster. We conjecture that it may be possible to derive a generalization of the above proposition to an arbitrary regularization schedule, with the term $\sqrt{d_{\mathrm{eff}}(\lambda, T)/T}$ in its place, allowing for growth of $d$ that is superlinear in $T$. In other words, we expect that regularization improves directional stability, as it does in the i.i.d. setting (see, e.g., Bruns-Smith et al., 2025). Under this conjecture, the second-order remainder term in the von Mises expansion can be $o_P(\tilde\sigma_T/\sqrt T)$ under much milder restrictions on the growth of $d_T$ relative to $T$, yielding asymptotic normality under directional stability in high-dimensional regimes where OLS fails to be normal.
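To make the LinUCB instantiation of this section concrete, the following sketch builds the deterministic stabilizer $\tilde\Sigma_T$ defined above and reports the empirical directional-stability gap of Definition 1 for a logged trajectory. It takes the true $\beta_T$ as an input, so it is a simulation diagnostic rather than a data-driven procedure.

```python
import numpy as np

def directional_stability_gap(Phi, beta_T, nu, gamma):
    """Empirical gap || alpha_hat_{T,1/T} - alpha_tilde_T ||_{L2(P_T)} for LinUCB,
    with Sigma_tilde_T = P_star + (T d_T)^{1/4} gamma^{-1/2} P_perp as in Section 5.

    Phi: (T, d) logged features; beta_T: true coefficient (known in simulation only);
    nu: target direction; gamma: LinUCB exploration bonus.
    """
    T, d = Phi.shape
    e1 = beta_T / np.linalg.norm(beta_T)
    P_star = np.outer(e1, e1)                               # projector onto the signal direction
    Sigma_tilde = P_star + (T * d) ** 0.25 / np.sqrt(gamma) * (np.eye(d) - P_star)
    Sigma_hat = Phi.T @ Phi / T
    alpha_hat = Phi @ np.linalg.solve(Sigma_hat + np.eye(d) / T, nu)  # lambda_alpha = 1/T (Prop. 2)
    alpha_tilde = Phi @ np.linalg.solve(Sigma_tilde, nu)
    return np.sqrt(np.mean((alpha_hat - alpha_tilde) ** 2))
```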
6 Efficiency Theory

We consider efficiency for the sequence of statistical experiments $\{\mathcal M_T\}_{T \ge 1}$, where both the target parameter $\Psi_T$ and its canonical gradient $D^*_T$ depend on the experimental horizon $T$.

Assumption 10 (Pathwise differentiability). For each $T \in \mathbb N$, the parameter $\Psi_T : \mathcal M_T \to \mathbb R$ is pathwise differentiable at $P^{(T)} \in \mathcal M_T$ with canonical gradient $D^*_T \in L^2(P^{(T)})$.

To formulate local asymptotic normality (LAN) and efficiency when $\sqrt T$-rates are not attainable, we state a notion of least favorable submodels with $T$-specific Fisher information [van der Vaart, 1998].

Definition 2 (Sequence of least favorable submodels). Let $\Psi_T : \mathcal M_T \to \mathbb R$ be pathwise differentiable at $P^{(T)}$ with canonical gradient $D^*_T \in L^2(P^{(T)})$, and let $\bar\sigma_T$ be as defined in Section 5. A sequence of one-dimensional submodels $\{P^{(T)}_\eta : |\eta| \le \delta\} \subset \mathcal M_T$ with $P^{(T)}_0 = P^{(T)}$ is said to be least favorable for the sequence of tuples $(\Psi_T, \mathcal M_T, P^{(T)})_{T \ge 1}$ if:

1. For each fixed $T$, the submodel $\{p^{(T)}_\eta : |\eta| \le \delta\}$ is QMD at $\eta = 0$ with score function
\[
\partial_\eta \log p^{(T)}_\eta(\bar O_T)\Big|_{\eta=0} = D^*_T(\bar O_T). \tag{10}
\]

2. The QMD remainder vanishes: for each fixed $T$,
\[
\int \Big(\sqrt{dP^{(T)}_\eta} - \sqrt{dP^{(T)}} - \frac{\eta}{2}\,D^*_T\,\sqrt{dP^{(T)}}\Big)^2 = o(\eta^2), \qquad \eta \to 0. \tag{11}
\]

Note that the Fisher information in the parameterization defined by (10) is $P^{(T)}\{(D^*_T)^2\} = \bar\sigma_T^2/T$, which diverges as $T \to \infty$: this is because it is a whole-trajectory Fisher information, while typical analyses in the i.i.d. setting work with a per-sample notion of Fisher information. In Appendix 11.1 we show the existence of such a submodel. Theorem 3 below identifies the local asymptotic structure of the adaptive experiment along least favorable submodels.

Assumption 11 (Directional stability). Let $\sigma$ be as in Assumption 3. We assume that
\[
\frac{\sigma^2}{\bar\sigma_T^2}\,\nu_T^\top\bar\Sigma_T^\dagger\hat\Sigma_T\bar\Sigma_T^\dagger\nu_T \overset{p}{\to} 1
\iff
\frac{\sigma^2}{\bar\sigma_T^2}\,\nu_T^\top\bar\Sigma_T^\dagger\big(\hat\Sigma_T - \bar\Sigma_T\big)\bar\Sigma_T^\dagger\nu_T \overset{p}{\to} 0 .
\]
Notice that this is the same notion of directional stability as introduced earlier, except that we now require the deterministic stabilizing sequence to be $\bar\Sigma_T$.

Assumption 12 (Lindeberg condition). We assume that, for all $\epsilon > 0$,
\[
\sum_{t=1}^{T}\mathbb{E}\big[\eta^2 s_T(Z_t, Y_t)^2\,\mathbf{1}\{|\eta\, s_T(Z_t, Y_t)| > \epsilon\} \mid \mathcal F_{T,t-1}\big] \overset{p}{\to} 0,
\]
where $s_T(z, y) := \frac{1}{T}\bar\alpha_T(z)\big(y - h_T(z)\big)$ and $\eta = \epsilon\,\bar\sigma_T^{-1} T^{1/2}$.

Theorem 3 (Local asymptotic normality). Suppose Assumptions 11–12 hold and that there exists a sequence of least favorable submodels $\{P^{(T)}_\eta : |\eta| \le \delta\}$ in the sense of Definition 2, and define $\Delta_T := \frac{\sqrt T}{\bar\sigma_T} D^*_T(\bar O_T)$. Suppose that $I_T = \bar\sigma_T^2/T \to \infty$ as $T \to \infty$. Then, for each fixed $\epsilon \in \mathbb R$,
\[
\log\frac{dP^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}}{dP^{(T)}}(\bar O_T) = \epsilon\,\Delta_T - \frac{\epsilon^2}{2} + o_{P^{(T)}}(1), \tag{12}
\]
and we have $\epsilon\Delta_T \overset{d}{\to} \mathcal N(0, \epsilon^2)$.
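Before turning to the proof sketch, here is a small Monte Carlo illustration of the expansion (12) using the explicit path constructed in Lemma 1 (Appendix 11.1). The i.i.d. standard-normal features and unit noise are stand-ins chosen so that $\bar\Sigma_T = I_d$ and $\bar\sigma_T = \|\nu\|$ hold exactly; the printed mean and variance should be close to $-\epsilon^2/2$ and $\epsilon^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
T, d, eps = 500, 3, 1.0
nu = np.array([1.0, 0.0, 0.0])
Sigma_bar = np.eye(d)                                      # pooled design for this stand-in model
sigma_bar = np.sqrt(nu @ np.linalg.pinv(Sigma_bar) @ nu)   # bar-sigma_T with sigma = 1
eta = eps * np.sqrt(T) / sigma_bar                         # local parameter eta = eps sqrt(T)/sigma_bar

llrs = []
for _ in range(2000):
    Phi = rng.normal(size=(T, d))                          # i.i.d. stand-in features
    resid = rng.normal(size=T)                             # residuals Y_t - h_T(Z_t), sigma = 1
    abar = Phi @ np.linalg.pinv(Sigma_bar) @ nu            # Riesz representer values alpha_bar_T(Z_t)
    w = (eta / (2 * T)) * abar * resid
    # log-likelihood ratio under the Lemma 1 path q_{Y,eta} (Eq. (30)):
    llr = 2 * np.sum(np.log1p(w)) - T * np.log1p(eta**2 * sigma_bar**2 / (4 * T**2))
    llrs.append(llr)
print(np.mean(llrs), np.var(llrs))                         # approx -eps^2/2 and eps^2, per Theorem 3
```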
Proof sketch of Theorem 3. Fix $\epsilon > 0$ and set $I_T := \bar\sigma_T^2/T$, $\eta := \epsilon I_T^{-1/2}$. Define
\[
W_t := 2\Bigg(\sqrt{\frac{q_{Y,\eta}(Y_t \mid Z_t)}{q_Y(Y_t \mid Z_t)}} - 1\Bigg), \qquad
\log\frac{p^{(T)}_\eta(\bar O_T)}{p^{(T)}(\bar O_T)} = 2\sum_{t=1}^{T}\log\Big(1 + \frac{1}{2}W_t\Big).
\]
In Lemma 1 in the appendix, we explicitly construct a sequence of least favorable submodels by perturbing the outcome likelihood $q_{Y,\eta}$, such that, for a remainder term $R_{t,\eta}$ satisfying $T\,\mathbb{E}[R_{t,\eta}^2] = o(1)$, we have
\[
W_t = \frac{\eta}{T}\,\bar\alpha_T(Z_t)\big\{Y_t - h_T(Z_t)\big\} + R_{t,\eta}(Y_t, Z_t),
\]
which implies
\[
\sum_{t=1}^{T} W_t = \frac{\eta}{T}\sum_{t=1}^{T}\bar\alpha_T(Z_t)\big\{Y_t - h_T(Z_t)\big\} - \frac{\epsilon^2}{4} + o_p(1).
\]
Using $\log(1 + x) = x - \frac{x^2}{2} + x^2 R(x)$ with $R(x) \to 0$ as $x \to 0$, we have
\[
\log\frac{p^{(T)}_\eta(\bar O_T)}{p^{(T)}(\bar O_T)} = \sum_{t=1}^{T} W_t - \frac{1}{4}\sum_{t=1}^{T} W_t^2 + \frac{1}{2}\sum_{t=1}^{T} W_t^2 R(W_t).
\]
The term $\frac{\eta}{T}\sum_t\bar\alpha_T(Z_t)\{Y_t - h_T(Z_t)\} \Rightarrow \mathcal N(0, \epsilon^2)$ by Assumptions 11 and 12, which verify the conditions of the martingale central limit theorem [Hall and Heyde, 1980, Theorem 3.2]. We show the quadratic term satisfies $\sum_{t=1}^{T} W_t^2 \overset{p}{\to} \epsilon^2$, and $\sum_{t=1}^{T} W_t^2 R(W_t) = o_p(1)$ since $\max_t |W_t| = o_p(1)$. Combining the above yields
\[
\log\frac{p^{(T)}_\eta(\bar O_T)}{p^{(T)}(\bar O_T)} \Rightarrow \mathcal N\Big(-\frac{\epsilon^2}{2}, \epsilon^2\Big).
\]
A full proof is deferred to Appendix 11.2.

As a consequence, for any sequence of statistics $\phi_T(\bar O_T)$, the joint limit behavior of $(\phi_T, \Delta_T)$ under $P^{(T)}$ determines the entire family of asymptotic laws of $\phi_T$ under the local experiments $\{P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T} : \epsilon \in \mathbb R\}$. By the matching theorem [van der Vaart, 1998, Theorem 7.10], these limit laws can be realized by a single statistic acting on the Gaussian shift experiment $X_\epsilon \sim \mathcal N(\epsilon, 1)$.

We now introduce a notion of regularity for estimator sequences $\{\hat\Psi_T\}_{T \ge 1}$, requiring that their asymptotic distribution, when centered at the local target $\Psi_T(P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T})$, be invariant with respect to $\epsilon$.

Definition 3 (Regularity along least favorable submodels). Let $\{P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T} : \epsilon \in \mathbb R\}$ be a least favorable submodel for $\Psi_T$ in the sense of Definition 2. A sequence of estimators $\{\hat\Psi_T\}_{T \ge 1}$ is said to be regular along this sequence if there exists a probability law $L$ such that, for every fixed $\epsilon \in \mathbb R$,
\[
\frac{\sqrt T}{\bar\sigma_T}\Big(\hat\Psi_T - \Psi_T\big(P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}\big)\Big) \Rightarrow L \quad \text{under } P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}, \tag{13}
\]
with the same limit law $L$ for all $\epsilon$.

Remark 5. The efficiency notion developed here is relative to the class of regular estimators defined in Definition 3. Regularity requires that the asymptotic distribution of the estimator, when centered at the local target along least favorable submodels, be invariant under $\sqrt T/\bar\sigma_T$-scale perturbations. Unlike the i.i.d. setting, the natural local scale of the experiment depends on $T$ through the factor $\bar\sigma_T$. Consequently, local perturbations are not comparable across different horizons $T$ without rescaling. The normalization by $\bar\sigma_T/\sqrt T$ identifies a common local parameterization under which the localized experiments admit a homogeneous Gaussian shift limit.

This regularity condition translates, in the Gaussian shift experiment, into equivariance-in-law and forms the basis for the convolution and efficiency results that follow.

Theorem 4 (Convolution theorem). Suppose Assumptions 10–12 hold and that $\{\hat\Psi_T\}_{T \ge 1}$ is regular along a least favorable submodel in the sense of Definition 3. Then, under $P^{(T)}$,
\[
\frac{\sqrt T}{\bar\sigma_T}\Big(\hat\Psi_T - \Psi_T(P^{(T)})\Big) \Rightarrow L, \tag{14}
\]
and there exists a probability measure $M$ on $\mathbb R$ such that $L = \mathcal N(0, 1) \ast M$. In particular, if $L$ has variance $\sigma^2$, then $\sigma^2 \ge 1$, with equality if and only if $M = \delta_0$ (equivalently, $L = \mathcal N(0, 1)$). Consequently, $\hat\Psi_T$ is asymptotically efficient among regular estimators if and only if
\[
\frac{\sqrt T}{\bar\sigma_T}\Big(\hat\Psi_T - \Psi_T(P^{(T)})\Big) \Rightarrow \mathcal N(0, 1). \tag{15}
\]
In particular, any asymptotically linear estimator with influence function $D^*_T$ attains this limit; hence the one-step estimator constructed in the previous section is asymptotically efficient.
Remark 6. An asymptotically efficient sequence has $\sqrt T$-scaled rate $\sqrt{\nu^\top\bar\Sigma_T^\dagger\nu}$ whenever $D^*_{T,Y}$ dominates $D^*_{T,X}$, which is typically the case. Meanwhile, we have proved asymptotic normality at $\sqrt T$-scaled rate $\sqrt{\nu^\top\tilde\Sigma_T^{-1}\nu}$. For our one-step estimator to be efficient, these need to be asymptotically equivalent.

Remark 7. While the notion of efficiency might seem ad hoc, instantiating it in the i.i.d. setting recovers the "usual" semiparametric efficiency bound. In other words, in the i.i.d. case, competing against estimators that are only regular along the sequence of least favorable submodels is as hard as competing against all regular estimators.

7 Discussion

This work shows that directional stability is sufficient for valid and efficient inference in adaptive experiments. Under this mild condition, adaptively collected data can be treated with estimators that are algebraically equivalent to i.i.d. estimators and asymptotically efficient. Several directions for future work remain open. First, the analysis may be extended to settings where the working model for the reward function is misspecified, and our target parameter may be defined as a nonparametric M-estimand to further anchor the robustness of the proposed framework. Second, our results currently focus on $\sqrt{d_T/T}$ rates induced by specific ridge regularization schedules; extending the theory to more general regularization regimes, in which complexity is governed by the effective dimension $d_{\mathrm{eff}}(\lambda, T)$, is an important next step. A final direction is to study debiased inference after model selection in adaptive settings, building on recent advances such as van der Laan et al. [2023].

References

Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., 2011.

Karun Adusumilli. Optimal tests following sequential experiments. arXiv preprint arXiv:2305.00403, 2023.

Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2–3):235–256, 2002.

Aurélien Bibaut and Nathan Kallus. Demystifying inference after adaptive experiments. Annual Review of Statistics and Its Application, 12(1):407–423, 2025.

Aurélien Bibaut, Maria Dimakopoulou, Nathan Kallus, Antoine Chambaz, and Mark van der Laan. Post-contextual-bandit inference. Advances in Neural Information Processing Systems, 34:28548–28559, 2021.

Aurélien Bibaut, Nathan Kallus, and Michael Lindon. Near-optimal non-parametric sequential tests and confidence sequences with possibly dependent observations. arXiv preprint arXiv:2212.14411, 2022.

Peter J. Bickel, Chris A. J. Klaassen, Ya'acov Ritov, and Jon A. Wellner. Efficient and Adaptive Estimation for Semiparametric Models, volume 4. Johns Hopkins University Press, Baltimore, 1993.

David Bruns-Smith, Oliver Dukes, Avi Feller, and Elizabeth L. Ogburn. Augmented balancing weights as linear regression. Journal of the Royal Statistical Society Series B: Statistical Methodology, page qkaf019, 2025.

Andrea Caponnetto and Ernesto De Vito. Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, 7:331–368, 2007.
Victor Chernozhukov, Whitney K. Newey, and Rahul Singh. Automatic debiased machine learning of causal and structural effects. Econometrica, 90(3):967–1027, 2022.

Brian Cho, Aurélien Bibaut, and Nathan Kallus. Simulation-based inference for adaptive experiments. In Neural Information Processing Systems, 2025.

Lin Fan and Peter W. Glynn. The typical behavior of bandit algorithms. arXiv preprint arXiv:2210.05660, 2022.

Wei Fan, Kevin Tan, and Yuting Wei. Statistical inference under adaptive sampling with LinUCB, 2025.

Yingying Fan, Yuxuan Han, Jinchi Lv, Xiaocong Xu, and Zhengyuan Zhou. Precise asymptotics and refined regret of variance-aware UCB. arXiv preprint arXiv:2412.08843, 2024.

Simon Fischer and Ingo Steinwart. Sobolev norm learning rates for regularized least-squares algorithms. Journal of Machine Learning Research, 21(205):1–38, 2020.

Vitor Hadad, David A. Hirshberg, Ruohan Zhan, Stefan Wager, and Susan Athey. Confidence intervals for policy evaluation in adaptive experiments. Proceedings of the National Academy of Sciences, 118(15):e2014602118, 2021.

Budhaditya Halder, Shubhayan Pan, and Koulik Khamaru. Stable Thompson sampling: Valid inference via variance inflation, 2025.

Peter Hall and Christopher C. Heyde. Martingale Limit Theory and Its Application. Academic Press, 1980.

Qiyang Han. Thompson sampling: Precise arm-pull dynamics and adaptive inference. arXiv preprint arXiv:2601.21131, 2026.

Qiyang Han, Koulik Khamaru, and Cun-Hui Zhang. UCB algorithms for multi-armed bandits: Precise regret and adaptive inference, 2024.

Keisuke Hirano and Jack R. Porter. Asymptotic representations for sequential decisions, adaptive experiments, and batched bandits. arXiv preprint arXiv:2302.03117, 2023.

Steven R. Howard, Aaditya Ramdas, Jon McAuliffe, and Jasjeet Sekhon. Time-uniform Chernoff bounds via nonnegative supermartingales. Annals of Statistics, 49(2):1055–1080, 2021.

Rémi Jézéquel, Pierre Gaillard, and Alessandro Rudi. Efficient online learning with kernels for adversarial large scale problems. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.

Anand Kalvit and Assaf Zeevi. A closer look at the worst-case behavior of multi-armed bandit algorithms. Advances in Neural Information Processing Systems, 34:8807–8819, 2021.

Koulik Khamaru and Cun-Hui Zhang. Inference with the upper confidence bound algorithm. arXiv preprint arXiv:2408.04595, 2024.

Tze Leung Lai and Ching Zong Wei. Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. The Annals of Statistics, 10(1):154–166, 1982. URL http://www.jstor.org/stable/2240506.

Tor Lattimore. A lower bound for linear and kernel regression with adaptive covariates. In Gergely Neu and Lorenzo Rosasco, editors, Proceedings of Thirty Sixth Conference on Learning Theory, volume 195 of Proceedings of Machine Learning Research, pages 2095–2113. PMLR, 2023.

Tor Lattimore and Csaba Szepesvári. Bandit Algorithms. Cambridge University Press, 2020.

Lucien Le Cam. Asymptotic Methods in Statistical Decision Theory. Springer, New York, 1986.

Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International World Wide Web Conference (WWW), 2010.
William F. Rosenberger and Feifang Hu. Bootstrap methods for adaptive designs. Statistics in Medicine, 18(14):1757–1767, 1999.

Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In International Conference on Machine Learning, pages 1015–1022, 2010.

Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 2nd edition, 2018.

Lars van der Laan, Marco Carone, Alex Luedtke, and Mark van der Laan. Adaptive debiased machine learning using data-driven model selection techniques. arXiv preprint arXiv:2307.12544, 2023.

Lars van der Laan, Nathan Kallus, and Aurélien Bibaut. Nonparametric instrumental variable inference with many weak instruments, 2026.

Aad W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998. doi: 10.1017/CBO9780511802256.

Ian Waudby-Smith and Aaditya Ramdas. Time-uniform central limit theorems and confidence sequences. In International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 10663–10672, 2021.

Kelly Zhang, Lucas Janson, and Susan Murphy. Statistical inference with M-estimators on adaptively collected data. Advances in Neural Information Processing Systems, 34:7460–7471, 2021.

Appendix

This appendix is organized as follows:
– Appendix 8: proofs of pathwise differentiability and the canonical gradient.
– Appendix 9: asymptotic analysis of the one-step estimator in the high-dimensional regime.
– Appendix 10: upper bound on the stability rate under LinUCB sampling.
– Appendix 11: proofs of local asymptotic normality and the convolution theorem.

8 Canonical gradient

Parameter and notation. Let $Z_t := (X_t, A_t)$ and write the conditional mean and residual as
\[
h_T(z) := \mathbb{E}_{P^{(T)}}[Y_t \mid Z_t = z] = \phi_T(z)^\top\beta_T, \qquad \varepsilon_t := Y_t - h_T(Z_t).
\]
We define the pooled $(X, A)$ marginal induced by the (possibly adaptive) logging sequence:
\[
\bar g_t(a \mid x) := \int g_t(a \mid x, \bar o_{t-1})\, dP^{(T)}(\bar o_{t-1}), \qquad \bar g(a \mid x) := \frac{1}{T}\sum_{t=1}^{T}\bar g_t(a \mid x).
\]
Equivalently, the pooled marginal density/mass of $Z$ with respect to $\mu_X \otimes \nu_A$ is $\bar h(a, x) := q_X(x)\,\bar g(a \mid x)$. The pooled covariance matrix $\bar\Sigma_T$ can be written as
\[
\bar\Sigma_T = \sum_{a \in \mathcal A}\int_{\mathcal X}\phi(x, a)\phi(x, a)^\top\,\bar h(a, x)\, dx.
\]
The target functional evaluated at $P^{(T)}$ can be written as $\Psi_T(P^{(T)}) = \nu_T^\top\beta_T$. We now present the proof of Theorem 1.

Proof. Fix $P^{(T)} \in \mathcal M_T$, where
\[
\frac{dP^{(T)}}{d\mu^{(T)}}(\bar o_T) = \prod_{t=1}^{T} q_X(x_t)\, g_t(a_t \mid x_t, \bar o_{t-1})\, q_Y(y_t \mid a_t, x_t).
\]
We define
\[
\mathcal M^{\mathrm{res}}_T = \mathcal M_T \cap \left\{\frac{dP^{(T)}}{d\mu^{(T)}}(\bar o_T) = \prod_{t=1}^{T} q_X(x_t)\, g_t(a_t \mid x_t, \bar o_{t-1})\, q_{Y'}(y_t \mid a_t, x_t) \;\middle|\; q_{Y'}\right\} \tag{16}
\]
as the restricted model of $\mathcal M_T$ in which $q_Y$ is allowed to vary while $q_X$ and $g_t$, $1 \le t \le T$, are fixed at the corresponding factors of the fixed $P^{(T)}$. We prove the theorem by (i) identifying the tangent space of $\mathcal M^{\mathrm{res}}_T$, (ii) computing the pathwise derivative of $\Psi_T$ with respect to $\mathcal M^{\mathrm{res}}_T$, and (iii) showing that the pathwise derivative lies in the tangent space, hence is the canonical gradient. Since $\Psi_T$ only depends on $q_Y$, the canonical gradient is agnostic to whether $g_t$ or $q_X$ is known.
Therefore, the canonical gradient of $\Psi_T$ with respect to $\mathcal M^{\mathrm{res}}_T$ coincides with that with respect to $\mathcal M_T$.

Step 1: Tangent space of $\mathcal M^{\mathrm{res}}_T$. Consider a one-dimensional submodel $\{P^{(T)}_\epsilon : \epsilon \in (-\epsilon_0, \epsilon_0)\} \subseteq \mathcal M^{\mathrm{res}}_T$ through $P^{(T)}$. We assume $\epsilon \mapsto q_{Y,\epsilon}$ is differentiable in quadratic mean (DQM) at $\epsilon = 0$ with (conditional) score
\[
s(y, a, x) := \frac{d}{d\epsilon}\log q_{Y,\epsilon}(y \mid a, x)\Big|_{\epsilon=0}, \qquad \int s(y, a, x)\, q_Y(y \mid a, x)\, dy = 0 \quad \forall (a, x).
\]
Then $\epsilon \mapsto P^{(T)}_\epsilon$ is DQM at $\epsilon = 0$ with score
\[
S_T(\bar O_T) = \frac{d}{d\epsilon}\log\frac{dP^{(T)}_\epsilon}{dP^{(T)}}(\bar O_T)\Big|_{\epsilon=0} = \sum_{t=1}^{T} s(Y_t, A_t, X_t).
\]
Let $m_\epsilon(a, x) := \mathbb{E}_{P^{(T)}_\epsilon}[Y_t \mid A_t = a, X_t = x]$. A standard conditional-mean differentiation identity gives
\[
\dot m(a, x) := \frac{d}{d\epsilon} m_\epsilon(a, x)\Big|_{\epsilon=0}
= \mathbb{E}_{P^{(T)}}\big[Y_t\, s(Y_t, A_t, X_t) \mid A_t = a, X_t = x\big]
= \mathbb{E}_{P^{(T)}}\big[\varepsilon_t\, s(Y_t, A_t, X_t) \mid A_t = a, X_t = x\big]. \tag{17}
\]
Because $\{P^{(T)}_\epsilon : \epsilon \in (-\epsilon_0, \epsilon_0)\} \subset \mathcal M_T$, by Assumption 1, for each $\epsilon \in (-\epsilon_0, \epsilon_0)$ there exists a unique $\beta_\epsilon$ such that $m_\epsilon(a, x) \equiv \phi(a, x)^\top\beta_\epsilon$. We claim that $\epsilon \mapsto \beta_\epsilon$ is differentiable at $\epsilon = 0$ with derivative $\dot\beta \in \mathbb R^{d_T}$. Indeed, by Assumption 1, we can choose $z_i$, $1 \le i \le d_T$, such that $\{\phi(z_i)\}_{1 \le i \le d_T}$ form a linearly independent set. Let $\Phi$ denote the matrix whose $i$th row is $\phi(z_i)^\top$. Then we have
\[
\beta_\epsilon - \beta_T = \Phi^{-1}\big((m_\epsilon - m_0)(z_1), \dots, (m_\epsilon - m_0)(z_{d_T})\big)^\top.
\]
The claim then follows from differentiability of $(m_\epsilon - m_0)(z_i)$ and continuity of $\Phi^{-1}$. In this case,
\[
\dot m(a, x) = \phi(x, a)^\top\dot\beta \quad \forall (a, x). \tag{18}
\]

Step 2: Pathwise derivative of $\Psi_T$ along $\epsilon \mapsto q_{Y,\epsilon}$. Since $\Psi_T(P^{(T)}) = \nu_T^\top\beta_T$, we have
\[
\frac{d}{d\epsilon}\Psi_T(P^{(T)}_\epsilon)\Big|_{\epsilon=0} = \nu_T^\top\dot\beta. \tag{19}
\]
To express $\dot\beta$ in terms of the score $s$, multiply Eq. (18) on the left by $\phi(x, a)$ and integrate with respect to the pooled marginal $\bar h(a, x)\, dx$ (equivalently, take $T^{-1}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}[\cdot\,]$):
\[
\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\phi(Z_t)\dot m(Z_t)\big] = \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\phi(Z_t)\phi(Z_t)^\top\big]\dot\beta = \bar\Sigma_T\dot\beta.
\]
Using Eq. (17) and the law of iterated expectations,
\[
\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\phi(Z_t)\dot m(Z_t)\big] = \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\phi(Z_t)\varepsilon_t\, s(Y_t, Z_t)\big].
\]
By Assumption 2, $\nu_T = \bar\Sigma_T\bar\Sigma_T^\dagger\nu_T$. Thus from Eq. (19), we have
\[
\frac{d}{d\epsilon}\Psi_T(P^{(T)}_\epsilon)\Big|_{\epsilon=0}
= \nu_T^\top\bar\Sigma_T^\dagger\,\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\phi(Z_t)\phi(Z_t)^\top\big]\dot\beta
= \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\big(\nu_T^\top\bar\Sigma_T^\dagger\phi(Z_t)\varepsilon_t\big)\, s(Y_t, Z_t)\big]
= \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\bar\alpha_T(Z_t)\varepsilon_t\, s(Y_t, Z_t)\big]. \tag{20}
\]

Step 3: Expressing Eq. (20) as an $L^2(P^{(T)})$ inner product. In $L^2(P^{(T)})$, the inner product of $\frac{1}{T}\sum_{t=1}^{T}\bar\alpha_T(Z_t)\varepsilon_t$ with the full score $S_T = \sum_{l=1}^{T} s(Y_l, Z_l)$ expands as
\[
\mathbb{E}_{P^{(T)}}\Bigg[\Bigg(\frac{1}{T}\sum_{t=1}^{T}\bar\alpha_T(Z_t)\varepsilon_t\Bigg)\Bigg(\sum_{l=1}^{T} s(Y_l, Z_l)\Bigg)\Bigg]
= \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\bar\alpha_T(Z_t)\varepsilon_t\, s(Y_t, Z_t)\big] + \frac{1}{T}\sum_{t \ne l}\mathbb{E}_{P^{(T)}}\big[\bar\alpha_T(Z_t)\varepsilon_t\, s(Y_l, Z_l)\big].
\]
For $t \ne l$, the cross terms vanish by the law of iterated expectations and the fact that $\mathbb{E}_{P^{(T)}}[\varepsilon_t \mid Z_t, \mathcal F_{t-1}] = \mathbb{E}_{P^{(T)}}[\varepsilon_t \mid Z_t] = 0$: if $l < t$, then $s(Y_l, Z_l)$ is $\mathcal F_{t-1}$-measurable and
\[
\mathbb{E}_{P^{(T)}}\big[\bar\alpha_T(Z_t)\varepsilon_t\, s(Y_l, Z_l)\big]
= \mathbb{E}_{P^{(T)}}\big[s(Y_l, Z_l)\,\mathbb{E}_{P^{(T)}}[\bar\alpha_T(Z_t)\varepsilon_t \mid \mathcal F_{t-1}]\big]
= \mathbb{E}_{P^{(T)}}\big[s(Y_l, Z_l)\,\mathbb{E}_{P^{(T)}}[\bar\alpha_T(Z_t)\,\mathbb{E}_{P^{(T)}}[\varepsilon_t \mid Z_t] \mid \mathcal F_{t-1}]\big] = 0.
\]
The case $t < l$ is analogous (condition on $\mathcal F_{l-1}$). Therefore
\[
\mathbb{E}_{P^{(T)}}\Bigg[\Bigg(\frac{1}{T}\sum_{t=1}^{T}\bar\alpha_T(Z_t)\varepsilon_t\Bigg)\, S_T\Bigg] = \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{P^{(T)}}\big[\bar\alpha_T(Z_t)\varepsilon_t\, s(Y_t, Z_t)\big]. \tag{21}
\]
Step 4: $D^*_T$ is the canonical gradient in $\mathcal M^{\mathrm{res}}_T$. Combining Eq. (20) and Eq. (21), we obtain
\[
\frac{d}{d\epsilon}\Psi_T(P^{(T)}_\epsilon)\Big|_{\epsilon=0} = \mathbb{E}_{P^{(T)}}\Bigg[\Bigg(\frac{1}{T}\sum_{t=1}^{T}\bar\alpha_T(Z_t)\varepsilon_t\Bigg)\, S_T\Bigg] = \mathbb{E}_{P^{(T)}}\big[D^*_T\, S_T\big],
\]
for $D^*_T$ given in Eq. (6). Hence $D^*_T$ is a gradient for $\Psi_T$ along all smooth one-parameter parametric submodels of $\mathcal M^{\mathrm{res}}_T$. Moreover, since
\[
D^*_T = \frac{1}{T}\sum_{t=1}^{T}\bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big), \qquad \mathbb{E}\big[\bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big) \mid Z_t\big] = \bar\alpha_T(Z_t)\,\mathbb{E}[\varepsilon_t \mid Z_t] = 0,
\]
we see that $D^*_{T,Y}$ lies in the tangent space of $\mathcal M^{\mathrm{res}}_T$ at $P^{(T)}$. Therefore it is the canonical gradient with respect to $\mathcal M^{\mathrm{res}}_T$.

Step 5: Pathwise differentiability criterion. If $D^*_T \notin L^2(P^{(T)})$, then the linear functional $s \mapsto \frac{d}{d\epsilon}\Psi_T(P^{(T)}_\epsilon)\big|_{\epsilon=0}$ cannot be continuous on the tangent space of $\mathcal M^{\mathrm{res}}_T$ equipped with the $L^2(P^{(T)})$ norm, so $\Psi_T$ is not pathwise differentiable at $P^{(T)}$. Conversely, if $D^*_T \in L^2(P^{(T)})$, then (20)–(21) show that the derivative is represented by the $L^2$ inner product with $D^*_T$; hence $\Psi_T$ is pathwise differentiable. □

9 Asymptotic analysis of the one-step estimator

Notation. We omit the subscript $T$ when it is clear from context; for instance, we sometimes write $\mathbb{E}_0$ for $\mathbb{E}_{P^{(T)}_0}$. For a positive definite, symmetric matrix $A \in \mathbb R^{d_T \times d_T}$ and $a \in \mathbb R^{d_T}$, we let $\|a\|_A := \langle a, Aa\rangle^{1/2}$. For a symmetric $A \in \mathbb R^{d_T \times d_T}$ and $\lambda \ge 0$, we also write $A_\lambda = A + \lambda I_{d_T}$.

Proof of Theorem 2. We have
\[
R = \psi_0(\hat h) - \psi_0(h_T) + \tilde P^{(T)}\hat\alpha_T(Y - \hat h)
\overset{(\ast)}{=} \tilde P^{(T)}\tilde\alpha_T(\hat h_{T,\lambda} - h_T) + \tilde P^{(T)}\hat\alpha_T(h_T - \hat h_{T,\lambda})
= -\tilde P^{(T)}(\hat\alpha_T - \tilde\alpha_T)(\hat h_{T,\lambda} - h_T),
\]
where in $(\ast)$ we use $\tilde P^{(T)}\phi(z)\varepsilon = 0$. We have
\[
\begin{aligned}
\hat\Psi - \psi(h_T)
&= \big(P_T - \tilde P^{(T)}\big)\hat\alpha_T(Y - \hat h_{T,\lambda}) - \tilde P^{(T)}(\hat\alpha_T - \tilde\alpha_T)(\hat h_{T,\lambda} - h_T) \\
&= \big(P_T - \tilde P^{(T)}\big)\tilde\alpha_T(Y - h_T) + \big(P_T - \tilde P^{(T)}\big)\big\{\hat\alpha_T(Y - \hat h_{T,\lambda}) - \tilde\alpha_T(Y - h_T)\big\} - \tilde P^{(T)}(\hat\alpha_T - \tilde\alpha_T)(\hat h_{T,\lambda} - h_T).
\end{aligned}
\]
Then, using the identity $\hat a\hat b - ab = (\hat a - a)b + a(\hat b - b) + (\hat b - b)(\hat a - a)$, we have
\[
\hat\Psi - \psi(h_T)
= \underbrace{\big(P_T - \tilde P^{(T)}\big)\tilde\alpha_T(Y - h_T)}_{\mathrm{(I)}}
+ \underbrace{\big(P_T - \tilde P^{(T)}\big)(\hat\alpha_T - \tilde\alpha_T)(Y - h_T)}_{\mathrm{(II)}}
+ \underbrace{\big(P_T - \tilde P^{(T)}\big)\tilde\alpha_T(h_T - \hat h_{T,\lambda})}_{\mathrm{(III)}}
+ \underbrace{\big(P_T - \tilde P^{(T)}\big)(\hat\alpha_T - \tilde\alpha_T)(h_T - \hat h_{T,\lambda})}_{\mathrm{(IV)}}
- \underbrace{\tilde P^{(T)}(\hat\alpha_T - \tilde\alpha_T)(\hat h_{T,\lambda} - h_T)}_{\mathrm{(V)}} .
\]
We have
\[
\mathrm{(IV)} + \mathrm{(V)} = -P_T(\hat\alpha_T - \tilde\alpha_T)(\hat h_{T,\lambda} - h_T), \qquad
\mathrm{(II)} = \big(P_T - \tilde P^{(T)}\big)(\hat\alpha_T - \tilde\alpha_T)\varepsilon.
\]
Therefore the von Mises expansion can be written explicitly as
\[
\hat\Psi - \Psi_T
= \underbrace{\big(P_T - \tilde P^{(T)}\big)\tilde\alpha_T(Y - h_T)}_{(A)}
+ \underbrace{\big(P_T - \tilde P^{(T)}\big)\tilde\alpha_T(h_T - \hat h_{T,\lambda})}_{(B)}
+ \underbrace{\big(P_T - \tilde P^{(T)}\big)(\hat\alpha_T - \tilde\alpha_T)\varepsilon}_{(C)}
- \underbrace{P_T(\hat\alpha_T - \tilde\alpha_T)(\hat h_{T,\lambda} - h_T)}_{(D)} .
\]
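As a numerical companion to the decomposition above, the sketch below evaluates (A)–(D) on simulated data where $\beta_T$ (hence $h_T$ and the residuals) is known; their sum reproduces $\hat\Psi - \Psi_T$ exactly up to floating point, which is a useful check when experimenting with penalty schedules. The ridge penalty $1/T$ mirrors the choice in Proposition 2 but is otherwise arbitrary.

```python
import numpy as np

def von_mises_terms(Phi, Y, beta_T, nu, Sigma_tilde, lam=None):
    """Empirical evaluation of the terms (A)-(D) above, for simulation use only
    (the true beta_T and a candidate stabilizer Sigma_tilde must be supplied).
    Returns (A, B, C, D); A + B + C + D equals Psi_hat - nu'beta_T exactly."""
    T, d = Phi.shape
    lam = 1.0 / T if lam is None else lam
    Sigma_hat = Phi.T @ Phi / T
    beta_hat = np.linalg.solve(Sigma_hat + lam * np.eye(d), Phi.T @ Y / T)
    eps = Y - Phi @ beta_T                                   # residuals Y_t - h_T(Z_t)
    alpha_tilde = Phi @ np.linalg.solve(Sigma_tilde, nu)
    alpha_hat = Phi @ np.linalg.solve(Sigma_hat + lam * np.eye(d), nu)
    A = np.mean(alpha_tilde * eps)                           # leading martingale term
    B = nu @ np.linalg.solve(Sigma_tilde, (Sigma_hat - Sigma_tilde) @ (beta_T - beta_hat))
    C = np.mean((alpha_hat - alpha_tilde) * eps)             # Gaussian-in-eps remainder
    D = -np.mean((alpha_hat - alpha_tilde) * (Phi @ (beta_hat - beta_T)))
    return A, B, C, D
```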
Term (A). Claim: $\frac{\sqrt T}{\tilde\sigma_T}(A) \overset{d}{\to} \mathcal N(0, 1)$.

Proof of claim. We verify the assumptions of Lemma 2 for the natural filtration $\mathcal F_{T,t} = \mathcal F_t$. First, we verify (QV). We have
\[
\frac{T}{\tilde\sigma_T^2}\cdot\frac{1}{T^2}\sum_{t=1}^{T}\tilde\alpha_T(Z_t)^2\sigma^2
= \frac{\sigma^2}{T\tilde\sigma_T^2}\sum_{t=1}^{T}\nu_T^\top\tilde\Sigma_T^{-1}\phi_T(Z_t)\phi_T(Z_t)^\top\tilde\Sigma_T^{-1}\nu_T
= \frac{\sigma^2}{\tilde\sigma_T^2}\,\nu_T^\top\tilde\Sigma_T^{-1}\hat\Sigma_T\tilde\Sigma_T^{-1}\nu_T
= 1 + \frac{\sigma^2}{\tilde\sigma_T^2}\,\nu_T^\top\tilde\Sigma_T^{-1}\big(\hat\Sigma_T - \tilde\Sigma_T\big)\tilde\Sigma_T^{-1}\nu_T
\overset{p}{\to} 1,
\]
where the last step follows from Assumption 4 and the fact that $\tilde\sigma_T^2 = \sigma^2\nu_T^\top\tilde\Sigma_T^{-1}\nu_T$. Next, (Lindeberg) is verified by Assumption 5. Hence the claim follows from Lemma 2.

Term (B). We have
\[
\begin{aligned}
(B) &= \nu_T^\top\tilde\Sigma_T^{-1}\big(\hat\Sigma_T - \tilde\Sigma_T\big)\big(\hat\beta_{T,\lambda} - \beta_T\big) \\
&= \nu_T^\top\big(\tilde\Sigma_T^{-1} - \hat\Sigma_{T,1/T}^{-1}\big)\hat\Sigma_T\big(\hat\beta_{T,\lambda} - \beta_T\big) - \frac{1}{T}\,\nu_T^\top\hat\Sigma_{T,1/T}^{-1}\big(\hat\beta_{T,\lambda} - \beta_T\big) \\
&= \nu_T^\top\big(\tilde\Sigma_T^{-1} - \hat\Sigma_{T,1/T}^{-1}\big)\hat\Sigma_T^{1/2}\,\hat\Sigma_T^{1/2}\big(\hat\beta_{T,\lambda} - \beta_T\big) - \frac{1}{T}\,\nu_T^\top\hat\Sigma_{T,1/T}^{-1}\big(\hat\beta_{T,\lambda} - \beta_T\big) \\
&\le \big\|\big(\tilde\Sigma_T^{-1} - \hat\Sigma_{T,1/T}^{-1}\big)\nu_T\big\|_{\hat\Sigma_T}\,\big\|\hat\beta_{T,\lambda} - \beta_T\big\|_{\hat\Sigma_T} - \frac{1}{T}\,\nu_T^\top\hat\Sigma_{T,1/T}^{-1}\big(\hat\beta_{T,\lambda} - \beta_T\big) \\
&= \|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)}\,\big\|\hat h_{T,\lambda} - h_T\big\|_{L^2(P_T)} - \frac{1}{T}\,\nu_T^\top\hat\Sigma_{T,1/T}^{-1}\big(\hat\beta_{T,\lambda} - \beta_T\big),
\end{aligned}
\]
where the first line follows from the definitions of $P_T$ and $\tilde P^{(T)}$, the second line follows from
\[
\tilde\Sigma_T^{-1}\big(\hat\Sigma_T - \tilde\Sigma_T\big) = \big(\tilde\Sigma_T^{-1} - \hat\Sigma_{T,1/T}^{-1}\big)\hat\Sigma_T + \hat\Sigma_{T,1/T}^{-1}\hat\Sigma_T - I_{d_T} = \big(\tilde\Sigma_T^{-1} - \hat\Sigma_{T,1/T}^{-1}\big)\hat\Sigma_T - \frac{1}{T}\hat\Sigma_{T,1/T}^{-1},
\]
we apply the Cauchy–Schwarz inequality in the second-to-last line, and in the last line we use
\[
\big\|\hat h_{T,\lambda} - h_T\big\|^2_{L^2(P_T)} = \frac{1}{T}\sum_{t=1}^{T}\big(\hat\beta_{T,\lambda} - \beta_T\big)^\top\phi(Z_t)\phi(Z_t)^\top\big(\hat\beta_{T,\lambda} - \beta_T\big) = \big\|\hat\beta_{T,\lambda} - \beta_T\big\|^2_{\hat\Sigma_T},
\]
and similarly $\|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)} = \big\|\big(\tilde\Sigma_T^{-1} - \hat\Sigma_{T,1/T}^{-1}\big)\nu_T\big\|_{\hat\Sigma_T}$.

Term (C). Recall that
\[
(C) = \big(P_T - \tilde P^{(T)}\big)(\hat\alpha_T - \tilde\alpha_T)\varepsilon = P_T\big((\hat\alpha_T - \tilde\alpha_T)\varepsilon\big) = \frac{1}{T}\sum_{t=1}^{T}\big(\hat\alpha_T(Z_t) - \tilde\alpha_T(Z_t)\big)\varepsilon_t,
\]
where the second equality uses $\tilde P^{(T)}(a_2\phi(z)\varepsilon) = 0$, by definition of $\tilde P^{(T)}$, for any $z \in \mathcal Z$ and $a_2 \in \mathbb R$. Let $w_t := \hat\alpha_T(Z_t) - \tilde\alpha_T(Z_t)$, $t = 1, \dots, T$. Then $(w_1, \dots, w_T)$ is measurable with respect to $Z_{1:T}$. Under Assumption 6, conditional on $Z_{1:T}$ we have
\[
\sum_{t=1}^{T} w_t\varepsilon_t \,\Big|\, Z_{1:T} \sim \mathcal N\Big(0, \sigma^2\sum_{t=1}^{T} w_t^2\Big).
\]
Therefore,
\[
(C) \,\Big|\, Z_{1:T} \sim \mathcal N\Big(0, \frac{\sigma^2}{T^2}\sum_{t=1}^{T} w_t^2\Big) = \mathcal N\Big(0, \frac{\sigma^2}{T}\cdot\frac{1}{T}\sum_{t=1}^{T} w_t^2\Big).
\]
Moreover, $\frac{1}{T}\sum_{t=1}^{T} w_t^2 = \|\hat\alpha_T - \tilde\alpha_T\|^2_{L^2(P_T)}$, since $P_T$ is the empirical measure on $(Z_t)_{t \le T}$. Hence, for any $\delta \in (0, 1)$, using the standard Gaussian tail bound $P(|\mathcal N(0, 1)| \ge x) \le 2e^{-x^2/2}$, we obtain
\[
P\Bigg(|(C)| \le \sigma\,\|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)}\sqrt{\frac{2\log(2/\delta)}{T}} \,\Bigg|\, Z_{1:T}\Bigg) \ge 1 - \delta.
\]
Since the right-hand side is measurable in $Z_{1:T}$, the same inequality holds unconditionally. Consequently,
\[
(C) = O_{P^{(T)}}\Big(\frac{\sigma}{\sqrt T}\,\|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)}\Big) = O_{P^{(T)}}\Big(\frac{1}{\sqrt T}\,\|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)}\Big),
\]
where the second equality uses that $\sigma$ is a fixed constant.

Term (D). From Cauchy–Schwarz,
\[
(D) \le \|\hat\alpha_T - \tilde\alpha_T\|_{L^2(P_T)}\,\big\|\hat h_{T,\lambda} - h_T\big\|_{L^2(P_T)}. \qquad \square
\]

10 Upper Bound on the Stability Rate under LinUCB Sampling

Proof of Proposition 2. We omit polylogs throughout the proof, and we drop the dependence of $d$ on $T$ notationally, writing $d = d_T$. Let $\hat\Lambda_T := T\hat\Sigma_T$, let $v = v_1$ be the top eigenvector of $\hat\Lambda_T$, let $v_i$, $i = 2, \dots, d$, be its eigenvectors corresponding to the non-leading eigenvalues in descending order, and denote by $\lambda_i$, $i = 1, \dots, d$, its eigenvalues in descending order, i.e.,
\[
\hat\Lambda_T = \sum_{i=1}^{d}\lambda_i v_i v_i^\top.
\]
Define $Q_\star := vv^\top$ and $Q_\perp := I_d - Q_\star$. Let $e_1 := \beta_T/\|\beta_T\|_2$. We use $\gamma$ for the $\beta$ in Fan et al. [2025]. Define $P_\star = e_1 e_1^\top$ and $P_\perp = I_d - P_\star$ as the projections onto the true signal direction and its orthogonal complement, and let
\[
\tilde\Lambda_T := \omega_1 P_\star + \bar\omega P_\perp, \tag{22}
\]
\[
\check\Lambda_T := \lambda_1 Q_\star + \bar\lambda Q_\perp, \tag{23}
\]
with $\bar\lambda := \bar\omega := \gamma\sqrt{T/d}$ and $\omega_1 := T$. We assume $\sqrt{d/T} = o(1)$.
We establish Proposition 2 with $\tilde\Sigma_T = \frac{1}{T}\tilde\Lambda_T$.

Preliminary facts.

• From Fan et al. [2025, Eqs. (27) and (28)], under Assumptions 7–9, with probability $\ge 1 - \frac{1}{\log T}$,
\[
\|v - e_1\| \lesssim \sqrt{\frac{d}{\bar\lambda}} = \frac{d^{3/4}}{\gamma^{1/2} T^{1/4}}, \tag{24}
\]
\[
\lambda_i = (1 + \Delta_{T,i})\sqrt{\frac{2\gamma^2 T}{d}} + 1, \tag{25}
\]
where $|\Delta_{T,i}| \lesssim \varepsilon_{\mathrm{bulk}} = o_T(1)$ by assumption.

• From $\|\phi_T(Z_t)\| = 1$ for every $t$, $\mathrm{Tr}(\check\Lambda_T) = T + d = \lambda_1 + (d-1)\bar\lambda$. Therefore, under $d = o(T)$, the eigenvalue gap $\delta := \omega_1 - \lambda_1$ satisfies $\delta = (d-1)\gamma\sqrt{T/d} - d = O(\gamma\sqrt{Td})$.

• It holds that $\|Q_\perp P_\star\| = \|e_1 e_1^\top - Q_\star e_1 e_1^\top\| = \|(e_1 - Q_\star e_1)e_1^\top\| \le \|e_1 - Q_\star e_1\| \le \|e_1 - v\|$. Similarly, $\|Q_\star P_\perp\| \le \|e_1 - v\|$.

• For any $\sigma_i$, $i = 1, \dots, d$, with $\sigma_1 = 0$, we have
\[
\Big\|\sum_i \sigma_i v_i v_i^\top e_1 e_1^\top\Big\|_{\mathrm{op}} \le \Big(\sum_{j \ge 2}\sigma_j^2 (v_j^\top e_1)^2\Big)^{1/2} \le \Big(\max_{j \ge 2}|\sigma_j|\Big)\Big(\sum_{j \ge 2}(v_j^\top e_1)^2\Big)^{1/2}.
\]
We have
\[
\sum_{j \ge 2}(v_j^\top e_1)^2 = \big\|e_1 - (e_1^\top v_1)v_1\big\|^2 = \big\|e_1 - \mathrm{Proj}\big(e_1 \mid \mathrm{Span}(v_1)\big)\big\|^2 \le \|e_1 - v\|^2.
\]
Thus we have
\[
\Big\|\sum_i \sigma_i v_i v_i^\top e_1 e_1^\top\Big\|_{\mathrm{op}} \le \max_{j \ge 2}|\sigma_j|\,\|v - e_1\|. \tag{26}
\]

We consider the following decomposition of the target quantity. We have
\[
\big\|\hat\Lambda_T^{1/2}\big(\hat\Lambda_T^{-1} - \tilde\Lambda_T^{-1}\big)a\big\| \le \big\|\check\Lambda_T^{1/2}\big(\check\Lambda_T^{-1} - \tilde\Lambda_T^{-1}\big)a\big\| \tag{27}
\]
\[
\qquad + \big\|\big(\hat\Lambda_T^{-1/2}\tilde\Lambda_T^{1/2} - \hat\Lambda_T^{1/2}\tilde\Lambda_T^{-1/2}\big)\tilde\Lambda_T^{-1/2}a - \big(\check\Lambda_T^{-1/2}\tilde\Lambda_T^{1/2} - \check\Lambda_T^{1/2}\tilde\Lambda_T^{-1/2}\big)\tilde\Lambda_T^{-1/2}a\big\|. \tag{28}
\]
We bound the two terms above in Steps 1 and 2 below, respectively.

Step 1. We have $\check\Lambda_T^{1/2}\big(\check\Lambda_T^{-1} - \tilde\Lambda_T^{-1}\big)a = \mathrm{(I)} + \mathrm{(II)} + \mathrm{(III)} + \mathrm{(IV)}$, with
\[
\begin{aligned}
\mathrm{(I)} &:= \big(\lambda_1^{-1/2}\omega_1^{1/2} - \lambda_1^{1/2}\omega_1^{-1/2}\big)\,\omega_1^{-1/2}\,Q_\star P_\star a, \\
\mathrm{(II)} &:= \big(\bar\lambda^{-1/2}\bar\omega^{1/2} - \bar\lambda^{1/2}\bar\omega^{-1/2}\big)\,\bar\omega^{-1/2}\,Q_\perp P_\perp a, \\
\mathrm{(III)} &:= \big(\bar\lambda^{-1/2}\omega_1^{1/2} - \bar\lambda^{1/2}\omega_1^{-1/2}\big)\,\omega_1^{-1/2}\,Q_\perp P_\star a, \\
\mathrm{(IV)} &:= \big(\lambda_1^{-1/2} - \lambda_1^{1/2}\bar\omega^{-1}\big)\,Q_\star P_\perp a.
\end{aligned}
\]

• First term. Using $\sqrt{d/T} = o(1)$, we have
\[
\|\mathrm{(I)}\| \lesssim \Bigg|\sqrt{\frac{1}{1 - \delta/T}} - \sqrt{1 - \delta/T}\Bigg|\,\frac{\|P_\star a\|}{\sqrt T} \lesssim \frac{\delta}{T}\,\frac{\|P_\star a\|}{\sqrt T} \lesssim \frac{\gamma\sqrt d}{T}\,\|P_\star a\|.
\]

• Second term: since $\bar\omega = \bar\lambda$, $\mathrm{(II)} = 0$.

• Third term. Using $\sqrt{d/T} = o(1)$, we have
\[
\|\mathrm{(III)}\| \lesssim \Bigg(\Big(\gamma\sqrt{\frac{T}{d}}\Big)^{-1/2}\sqrt T + \Big(\gamma\sqrt{\frac{T}{d}}\Big)^{1/2}\frac{1}{\sqrt T}\Bigg)\frac{1}{\sqrt T}\cdot\frac{d^{3/4}}{\sqrt\gamma\,T^{1/4}}\,\|P_\star a\| = \Big(\frac{d}{\gamma\sqrt T} + \frac{\sqrt d}{T}\Big)\|P_\star a\|.
\]

• Fourth term:
\[
\|\mathrm{(IV)}\| \lesssim \Bigg(\frac{1}{\sqrt{T - \delta}} + \frac{\sqrt{T - \delta}\,\sqrt d}{\gamma\sqrt T}\Bigg)\frac{d^{3/4}}{\sqrt\gamma\,T^{1/4}}\,\|P_\perp a\| \lesssim \Big(\frac{1}{\sqrt T} + \frac{\sqrt d}{\gamma}\Big)\frac{d^{3/4}}{\sqrt\gamma\,T^{1/4}}\,\|P_\perp a\|.
\]

Collecting the bounds above and using $\gamma^{-1} = o(1)$, we then have
\[
\big\|\big(\check\Lambda_T^{-1} - \tilde\Lambda_T^{-1}\big)a\big\|_{\check\Lambda_T} \lesssim \Big(\frac{\gamma\sqrt d}{T} + \frac{d}{\gamma\sqrt T}\Big)\|P_\star a\| + \Big(\frac{1}{\sqrt T} + \frac{\sqrt d}{\gamma}\Big)\frac{d^{3/4}}{\sqrt\gamma\,T^{1/4}}\,\|P_\perp a\|.
\]

Step 2. We have
\[
\big(\hat\Lambda_T^{-1/2}\tilde\Lambda_T^{1/2} - \hat\Lambda_T^{1/2}\tilde\Lambda_T^{-1/2}\big)\tilde\Lambda_T^{-1/2}a - \big(\check\Lambda_T^{-1/2}\tilde\Lambda_T^{1/2} - \check\Lambda_T^{1/2}\tilde\Lambda_T^{-1/2}\big)\tilde\Lambda_T^{-1/2}a
= \big(\hat\Lambda_T^{-1/2} - \check\Lambda_T^{-1/2}\big)a - \big(\hat\Lambda_T^{1/2}\tilde\Lambda_T^{-1} - \check\Lambda_T^{1/2}\tilde\Lambda_T^{-1}\big)a = \mathrm{(V)} + \mathrm{(VI)}
\]
with
\[
\mathrm{(V)} := \sum_{i=2}^{d}\big(\lambda_i^{-1/2} - \bar\lambda^{-1/2}\big)v_i v_i^\top(P_\star + P_\perp)a, \qquad
\mathrm{(VI)} := \sum_{i=2}^{d}\big(\lambda_i^{1/2} - \bar\lambda^{1/2}\big)v_i v_i^\top\Big(\frac{1}{T}P_\star + \frac{1}{\bar\lambda}P_\perp\Big)a.
\]
From Eq. (26),
\[
\begin{aligned}
\|\mathrm{(V)}\| &\le \max_j\big|\lambda_j^{-1/2} - \bar\lambda^{-1/2}\big|\,\|v - e_1\|\,\|P_\star a\| + \Big\|\sum_{i=2}^{d}\big(\lambda_i^{-1/2} - \bar\lambda^{-1/2}\big)v_i v_i^\top\Big\|\,\|P_\perp a\| \\
&\lesssim \bar\lambda^{-1/2}\varepsilon_{\mathrm{bulk}}\,\|v - e_1\|\,\|P_\star a\| + \bar\lambda^{-1/2}\varepsilon_{\mathrm{bulk}}\,\|P_\perp a\| \\
&\lesssim \varepsilon_{\mathrm{bulk}}\Big(\frac{d}{\gamma\sqrt T}\,\|P_\star a\| + \frac{d^{1/4}}{\sqrt\gamma\,T^{1/4}}\,\|P_\perp a\|\Big)
\end{aligned}
\]
Step 2. We have that
\[
\big(\hat\Lambda_T^{-1/2}\tilde\Lambda_T^{1/2} - \hat\Lambda_T^{1/2}\tilde\Lambda_T^{-1/2}\big)\tilde\Lambda_T^{-1/2}a - \big(\check\Lambda_T^{-1/2}\tilde\Lambda_T^{1/2} - \check\Lambda_T^{1/2}\tilde\Lambda_T^{-1/2}\big)\tilde\Lambda_T^{-1/2}a
= \big(\hat\Lambda_T^{-1/2} - \check\Lambda_T^{-1/2}\big)a - \big(\hat\Lambda_T^{1/2}\tilde\Lambda_T^{-1} - \check\Lambda_T^{1/2}\tilde\Lambda_T^{-1}\big)a
= \text{(V)} - \text{(VI)},
\]
with
\[
\text{(V)} := \sum_{i=2}^d \big(\lambda_i^{-1/2} - \bar\lambda^{-1/2}\big)\,v_i v_i^\top\,(P_\star + P_\perp)\,a, \qquad
\text{(VI)} := \sum_{i=2}^d \big(\lambda_i^{1/2} - \bar\lambda^{1/2}\big)\,v_i v_i^\top\Big(\frac{1}{T}P_\star + \frac{1}{\bar\lambda}P_\perp\Big)a.
\]
From Eq. (26),
\[
\begin{aligned}
\|\text{(V)}\| &\le \max_{j\ge 2}\big|\lambda_j^{-1/2} - \bar\lambda^{-1/2}\big|\,\|v - e_1\|\,\|P_\star a\| + \Big\|\sum_{i=2}^d\big(\lambda_i^{-1/2} - \bar\lambda^{-1/2}\big)v_i v_i^\top\Big\|\,\|P_\perp a\| \\
&\lesssim \frac{\bar\lambda}{\bar\lambda^{3/2}}\,\varepsilon_{\mathrm{bulk}}\,\|v - e_1\|\,\|P_\star a\| + \frac{\bar\lambda}{\bar\lambda^{3/2}}\,\varepsilon_{\mathrm{bulk}}\,\|P_\perp a\|
\lesssim \varepsilon_{\mathrm{bulk}}\bigg(\frac{d}{\gamma\sqrt T}\,\|P_\star a\| + \frac{d^{1/4}}{\sqrt\gamma\,T^{1/4}}\,\|P_\perp a\|\bigg),
\end{aligned}
\]
and
\[
\|\text{(VI)}\| \lesssim \frac{\bar\lambda}{\bar\lambda^{1/2}}\,\varepsilon_{\mathrm{bulk}}\,\frac{1}{T}\,\|v - e_1\|\,\|P_\star a\| + \frac{\bar\lambda}{\bar\lambda^{1/2}\,\bar\lambda}\,\varepsilon_{\mathrm{bulk}}\,\|P_\perp a\|
\lesssim \varepsilon_{\mathrm{bulk}}\bigg(\frac{\sqrt d}{T}\,\|P_\star a\| + \frac{d^{1/4}}{\sqrt\gamma\,T^{1/4}}\,\|P_\perp a\|\bigg).
\]

Collecting the bounds. We have that
\[
\big\|\big(\hat\Sigma_T^{-1} - \tilde\Sigma_T^{-1}\big)a\big\|_{\hat\Sigma_T}
= \sqrt T\,\big\|\big(\hat\Lambda_T^{-1} - \tilde\Lambda_T^{-1}\big)a\big\|_{\hat\Lambda_T}
\lesssim \bigg(\gamma\sqrt{\frac{d}{T}} + \frac{d}{\gamma} + \sqrt{\frac{d}{T}}\,\varepsilon_{\mathrm{bulk}}\bigg)\|P_\star a\|
+ \bigg(\sqrt{\frac{d}{T}} + \frac{d}{\gamma} + \varepsilon_{\mathrm{bulk}}\bigg)\frac{(Td)^{1/4}}{\sqrt\gamma}\,\|P_\perp a\|.
\]
Now observing that $\tilde\sigma_T^2 = \|P_\star a\|^2 + \frac{\sqrt{Td}}{\gamma}\,\|P_\perp a\|^2$ yields the desired result.

11 Efficiency Theory

In this section, we prove Theorem 3 and Theorem 4. We adapt the efficiency theory along a sequence of experiments introduced in van der Laan et al. [2026, Section I]. Both our setting and theirs consider a sequence of statistical models with diverging Fisher information, indexed by the horizon $T$ in our case and by the number of instruments $K$ in theirs. The key distinction, however, is that they consider a factorized model [Bickel et al., 1993], owing to independence between units, whereas we consider a longitudinal model in which the $g_t(a_t \mid x_t, \bar o_{t-1})$ factors induce intertemporal dependencies. This necessitates certain adaptations in our construction.

11.1 Construction of a least favorable submodel

Recall $z = (a, x)$ and that we defined
\[
\bar P_T f(z, y) = \int \bar h_T(z)\,q(y \mid z)\,f(z, y)\,\mathrm dz\,\mathrm dy, \qquad
\bar h_T(z) = \frac{1}{T}\sum_{t=1}^T \bar g_t(a \mid x)\,q_X(x),
\]
\[
D^*_T = \frac{1}{T}\sum_{t=1}^T \bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big) \in L^2_0\big(P^{(T)}\big),
\]
and we employ the empirical-process notation $P^{(T)}[f]$ for $\mathbb E_{P^{(T)}}[f]$, $f : \bar{\mathcal O}_T \to \mathbb R$, where we integrate only over the randomness of the arguments of $f$, and not over $f$ itself. Then we have
\[
P^{(T)}\big[(D^*_T)^2\big]
= P^{(T)}\Bigg(\frac{1}{T}\sum_{t=1}^T \bar\alpha_T(A_t, X_t)\,\varepsilon_t\Bigg)^2
\overset{(*)}{=} \frac{1}{T^2}\sum_{t=1}^T \mathbb E_{P^{(T)}}\big[(\bar\alpha_T(A_t, X_t)\,\varepsilon_t)^2\big]
\overset{(**)}{=} \frac{1}{T}\,\bar\sigma_T^2,
\]
where in $(*)$ we use the conditional mean zero of $\varepsilon_t$ and the law of total expectation, and in $(**)$, $\bar\sigma_T = \sigma\sqrt{\nu_T^\top \bar\Sigma_T^\dagger \nu_T}$ is defined in Section 5. To see why $(**)$ holds, note that
\[
\bar\Sigma_T = \mathbb E_{P^{(T)}}\Bigg[\frac{1}{T}\sum_{t=1}^T \varphi(Z_t)\varphi(Z_t)^\top\Bigg]
= \int \varphi(z)\varphi(z)^\top\Bigg(\frac{1}{T}\sum_{t=1}^T \bar g_t(a \mid x)\,q_X(x)\Bigg)\mathrm da\,\mathrm dx,
\]
hence
\[
\begin{aligned}
\frac{1}{T}\sum_{t=1}^T \mathbb E_{P^{(T)}}\big[(\bar\alpha_T(Z_t)\,\varepsilon_t)^2\big]
&= \frac{1}{T}\sum_{t=1}^T \mathbb E_{P^{(T)}}\Big[\varepsilon_t^2\,\nu_T^\top \bar\Sigma_T^\dagger \varphi(Z_t)\varphi(Z_t)^\top \bar\Sigma_T^\dagger \nu_T\Big] \\
&= \frac{1}{T}\sum_{t=1}^T \mathbb E_{P^{(T)}}\Big[\mathbb E\big[\varepsilon_t^2 \mid \mathcal F_{t-1}, Z_t\big]\,\nu_T^\top \bar\Sigma_T^\dagger \varphi(Z_t)\varphi(Z_t)^\top \bar\Sigma_T^\dagger \nu_T\Big] \\
&= \sigma^2\,\nu_T^\top \bar\Sigma_T^\dagger \bar\Sigma_T \bar\Sigma_T^\dagger \nu_T
= \sigma^2\,\nu_T^\top \bar\Sigma_T^\dagger \nu_T, \tag{29}
\end{aligned}
\]
where we use the homoscedastic-noise assumption $\mathbb E[\varepsilon_t^2 \mid \mathcal F_{t-1}, Z_t] = \sigma^2$ (Assumption 3).

We now construct an explicit least favorable submodel satisfying Definition 2. The construction follows the classical Le Cam–Hájek quadratic-mean-differentiability path [Le Cam, 1986], adapted to the present setting by taking the score to be the canonical gradient $D^*_T$. This provides a concrete example of a submodel along which the local asymptotic normality expansion of Theorem 3 holds. We adapt the factorizable model construction of van der Laan et al. [2026, Lemma 17], noting that while the $g_t$ factors induce dependence between time points, it suffices to perturb the repeated $q_Y(y_t \mid z_t)$ factors in order to obtain a one-parameter submodel with score $D^*_T = D^*_{T,Y}$ at the origin.
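As an aside, the identity $P^{(T)}[(D^*_T)^2] = \bar\sigma_T^2/T$ is easy to check by simulation in the special case of a non-adaptive design, where $\bar\Sigma_T$ is available in closed form. The sketch below is ours; the design covariance, target $\nu$, and all constants are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
T, d, sigma, reps = 500, 3, 1.0, 2000
nu = np.array([1.0, 0.0, 0.0])                  # scalar target nu^T beta

Sigma_bar = np.diag([1.0, 0.5, 0.25])           # E[phi(Z) phi(Z)^T], fixed design
L = np.linalg.cholesky(Sigma_bar)
alpha = np.linalg.solve(Sigma_bar, nu)          # representer: bar_alpha(z) = nu^T Sigma^{-1} phi(z)

D_star = np.empty(reps)
for r in range(reps):
    Phi = rng.standard_normal((T, d)) @ L.T     # rows phi(Z_t) with covariance Sigma_bar
    eps = sigma * rng.standard_normal(T)        # homoscedastic noise (Assumption 3)
    D_star[r] = np.mean((Phi @ alpha) * eps)    # D*_T = (1/T) sum_t bar_alpha(Z_t) eps_t

bar_sigma_sq = sigma**2 * nu @ np.linalg.solve(Sigma_bar, nu)
print("T * Var(D*_T) =", T * D_star.var(), " vs  bar_sigma_T^2 =", bar_sigma_sq)
```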
In Subsection 11.2, we leverage this factorizability to express the log-likelihood ratio process in terms of a martingale, and then obtain local asymptotic normality via a careful application of martingale limit theory [Hall and Heyde, 1980].

Lemma 1 (Le Cam–Hájek QMD path). Fix a horizon $T \ge 1$ and assume Assumption 3. We define a one-parameter family $\eta \mapsto q_{Y,\eta}$, for $|\eta| \le \delta$, with score $s_T(z, y) = \frac{1}{T}\bar\alpha_T(z)(y - h_T(z))$, by
\[
q_{Y,\eta}(y \mid z) := \frac{\Big(1 + \frac{\eta}{2T}\bar\alpha_T(z)\big(y - h_T(z)\big)\Big)^2}{1 + \frac{\eta^2}{4}\frac{\bar\sigma_T^2}{T^2}}\;q_Y(y \mid z). \tag{30}
\]
We then define a one-dimensional parametric submodel $\{P^{(T)}_\eta : |\eta| \le \delta\}$ via
\[
\frac{\mathrm dP^{(T)}_\eta}{\mathrm d\mu^{(T)}}(\bar o_T) = \prod_{t=1}^T q_X(x_t)\,g_t(a_t \mid x_t, \bar o_{t-1})\,q_{Y,\eta}(y_t \mid a_t, x_t).
\]
Recall $\mu$ is the base measure on $\mathcal Z \times \mathcal Y$. Then (i) $q_{Y,\eta}(y \mid z)\,\bar h_T(z)$ is a valid probability density with respect to $\mu$; (ii) $\{q_{Y,\eta}\bar h_T : |\eta| \le \delta\}$ is QMD at $\eta = 0$ with score $\frac{1}{T}\bar\alpha_T(z)(y - h_T(z))$; and (iii) the remainder term
\[
R_{t,\eta}(y, z) := \sqrt{q_{Y,\eta}(y \mid z)\,\bar h_T(z)} - \sqrt{q_Y(y \mid z)\,\bar h_T(z)} - \frac{\eta}{2T}\bar\alpha_T(z)\big(y - h_T(z)\big)\sqrt{q_Y(y \mid z)\,\bar h_T(z)}
\]
satisfies
\[
\|R_{t,\eta}\|^2_{L^2(\mu)} \le \Bigg(1 - \sqrt{1 + \frac{\eta^2}{4}\frac{\bar\sigma_T^2}{T^2}}\Bigg)^2 \le \frac{\eta^4\,\bar\sigma_T^4}{T^4}.
\]

Proof. Throughout this proof, $t$ is fixed. Let $\xi(y, z)^2 = q_Y(y \mid z)\,\bar h_T(z)$. This has the property that, for any $f : \mathcal Y \times \mathcal Z \to \mathbb R$,
\[
\int \xi(y, z)^2 f(y, z)\,\mathrm d\mu(y, z) = \bar P^{(T)}[f] = \mathbb E_{P^{(T)}}\Bigg[\frac{1}{T}\sum_{t=1}^T f(Y_t, Z_t)\Bigg]. \tag{31}
\]
We define, for $\eta \in \mathbb R$,
\[
\xi_\eta(y, z) := \xi(y, z)\Big(1 + \frac{\eta}{2T}\bar\alpha_T(z)\big(y - h_T(z)\big)\Big).
\]
We readily verify from Eq. (31) that
\[
\begin{aligned}
\int \xi(y, z)^2\,\mathrm d\mu(y, z) &= 1, \\
\int \xi(y, z)^2\,\frac{1}{T}\bar\alpha_T(z)\big(y - h_T(z)\big)\,\mathrm d\mu(y, z) &= 0, \\
\int \xi(y, z)^2\Big(\frac{1}{T}\bar\alpha_T(z)\big(y - h_T(z)\big)\Big)^2\,\mathrm d\mu(y, z) &= \mathbb E_{P^{(T)}}\Bigg[\frac{1}{T}\sum_{t=1}^T\Big(\frac{1}{T}\bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big)\Big)^2\Bigg] = \frac{\bar\sigma_T^2}{T^2}, \\
\int \xi_\eta(y, z)^2\,\mathrm d\mu(y, z) &= 1 + \frac{\eta^2}{4}\frac{\bar\sigma_T^2}{T^2}.
\end{aligned}
\]
Therefore Eq. (30) defines a valid conditional probability, which shows (i). We have
\[
\log q_{Y,\eta}(y \mid z) = 2\log\Big(1 + \frac{\eta}{2T}\bar\alpha_T(z)\big(y - h_T(z)\big)\Big) - \log\Big(1 + \frac{\eta^2}{4}\frac{\bar\sigma_T^2}{T^2}\Big) + \log q_Y(y \mid z),
\]
and computing its derivative at $\eta = 0$ gives
\[
\frac{\partial}{\partial\eta}\bigg|_{\eta=0}\log q_{Y,\eta}(y \mid z) = \frac{1}{T}\bar\alpha_T(z)\big(y - h_T(z)\big) = s_T(z, y),
\]
which shows (ii). We now control the remainder $\|R_{t,\eta}\|^2_{L^2(\mu)}$ to establish quadratic mean differentiability at $\eta = 0$. Writing $c_\eta := \frac{\eta^2}{4}\frac{\bar\sigma_T^2}{T^2}$, direct algebraic manipulation yields
\[
R_{t,\eta}(y, z) = \bigg(\frac{1}{\sqrt{1 + c_\eta}} - 1\bigg)\,\xi(y, z)\Big(1 + \frac{\eta}{2}s_T(z, y)\Big) = \bigg(\frac{1}{\sqrt{1 + c_\eta}} - 1\bigg)\,\xi_\eta(y, z).
\]
Therefore, using $\int \xi_\eta^2\,\mathrm d\mu = 1 + c_\eta$ from the display above,
\[
\|R_{t,\eta}\|^2_{L^2(\mu)} = \bigg(\frac{1}{\sqrt{1 + c_\eta}} - 1\bigg)^2(1 + c_\eta) = \big(1 - \sqrt{1 + c_\eta}\big)^2 \le c_\eta^2 \le \frac{\eta^4\,\bar\sigma_T^4}{T^4},
\]
where the first inequality follows from $|1 - \sqrt{1 + x}| \le x$ for $x \ge 0$. This proves (iii).

This construction supplies an example of a QMD one-dimensional submodel with $\partial_\eta\big|_{\eta=0}\log p^{(T)}_\eta(\bar o_T) = D^*_T(\bar o_T)$. The QMD of $P^{(T)}_\eta$ is implied by the QMD of $q_{Y,\eta}$.

11.2 Local asymptotic normality along least favorable submodels

In this subsection, we adapt van der Vaart [1998, Theorem 7.2] and van der Laan et al. [2026, Theorem 17] in order to prove Theorem 3, as presented below.

Proof. Fix $\epsilon > 0$. We define $I_T = \bar\sigma_T^2/T$ and $\eta = \epsilon I_T^{-1/2}$. We define the random variable
\[
W_t := 2\Bigg[\sqrt{\frac{q_{Y,\eta}(Y_t \mid Z_t)}{q_Y(Y_t \mid Z_t)}} - 1\Bigg].
\]
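Before expanding the log-likelihood ratio, here is a small numerical sanity check of the Lemma 1 construction (entirely illustrative: two covariate values, Gaussian $q_Y$, and arbitrary constants of our choosing), confirming that the perturbed joint density in Eq. (30) has total mass one and score $s_T$ at the origin:

```python
import numpy as np

# Toy instance of Lemma 1: z takes two values with weights h_bar, q_Y Gaussian.
T, sigma = 50, 1.0
y = np.linspace(-12.0, 12.0, 20001); dy = y[1] - y[0]
h_bar = np.array([0.7, 0.3])                     # \bar h_T(z)
h_z = np.array([1.0, -2.0])                      # h_T(z)
alpha_z = np.array([2.0, -1.5])                  # \bar alpha_T(z)

q_Y = np.exp(-(y[None, :] - h_z[:, None])**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
score = alpha_z[:, None] * (y[None, :] - h_z[:, None]) / T          # s_T(z, y)
s2 = np.sum(h_bar[:, None] * score**2 * q_Y) * dy                   # = bar_sigma_T^2 / T^2

def q_eta(eta):
    """Perturbed conditional density of Eq. (30)."""
    return (1 + 0.5 * eta * score)**2 / (1 + 0.25 * eta**2 * s2) * q_Y

print("total mass:", np.sum(h_bar[:, None] * q_eta(0.3)) * dy)      # = 1, Lemma 1 (i)

eps_fd = 1e-5                                                       # finite-difference step
num_score = (np.log(q_eta(eps_fd)) - np.log(q_Y)) / eps_fd
print("max score error:", np.abs(num_score - score).max())          # ~ 0, Lemma 1 (ii)
```

Note that only the mixture $q_{Y,\eta}\,\bar h_T$ integrates to one; the individual conditionals need not, which is why the check weights by $\bar h_T$.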
The log-likelihood ratio admits the expansion
\[
\log\frac{p^{(T)}_\eta(\bar O_T)}{p^{(T)}(\bar O_T)} = 2\sum_{t=1}^T \log\Big(1 + \frac{1}{2}W_t\Big).
\]
We aim to show that, as $T \to \infty$,
\[
\sum_{t=1}^T W_t = \eta\,\frac{1}{T}\sum_{t=1}^T \bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big) - \frac{\epsilon^2}{4} + o_p(1). \tag{32}
\]
Then, using the Taylor expansion $\log(1 + x) = x - \frac{x^2}{2} + x^2 R(2x)$, where $R(x) \to 0$ as $x \to 0$, we obtain
\[
\log\frac{p^{(T)}_\eta(\bar O_T)}{p^{(T)}(\bar O_T)}
= 2\sum_{t=1}^T \log\Big(1 + \frac{1}{2}W_t\Big)
= \sum_{t=1}^T \Big(W_t - \frac{1}{4}W_t^2 + \frac{1}{2}W_t^2 R(W_t)\Big)
= \eta\,\frac{1}{T}\sum_{t=1}^T \bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big) - \frac{1}{4}\sum_{t=1}^T W_t^2 + \frac{1}{2}\sum_{t=1}^T W_t^2 R(W_t) - \frac{\epsilon^2}{4} + o_p(1).
\]
We will establish the following statements:
\[
\sum_{t=1}^T W_t^2 \overset{p}{\to} \epsilon^2, \tag{QV}
\]
\[
\frac{\eta}{T}\sum_{t=1}^T \bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big) \overset{d}{\to} \mathcal N(0, \epsilon^2), \tag{MCLT}
\]
\[
\sum_{t=1}^T W_t^2 R(W_t) = o_p(1), \tag{REM}
\]
where Eq. (QV) is a quadratic variation term, Eq. (MCLT) will follow from a martingale central limit theorem with stabilized quadratic variation [Hall and Heyde, 1980, Theorem 3.2], and Eq. (REM) will follow from a negligibility condition. Putting the above statements together, we obtain that
\[
\log\frac{p^{(T)}_\eta(\bar O_T)}{p^{(T)}(\bar O_T)} \overset{d}{\to} \mathcal N\Big(-\frac{\epsilon^2}{2},\ \epsilon^2\Big).
\]

Proof of Eq. (32). It suffices to show that the mean and variance of
\[
\sum_{t=1}^T W_t - \frac{\eta}{T}\sum_{t=1}^T \bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big) + \frac{\epsilon^2}{4}
\]
converge to zero. By the martingale structure, we can write
\[
\begin{aligned}
\operatorname{Var}_{P^{(T)}}&\Bigg(\sum_{t=1}^T W_t - \frac{\eta}{T}\sum_{t=1}^T \bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big)\Bigg)
= \sum_{t=1}^T \operatorname{Var}_{P^{(T)}}\Big[W_t - \frac{\eta}{T}\bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big)\Big] \\
&\le T\,\mathbb E_{P^{(T)}}\Bigg[\frac{1}{T}\sum_{t=1}^T\Big(W_t - \frac{\eta}{T}\bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big)\Big)^2\Bigg] \\
&= T\int\Bigg(2\sqrt{\frac{q_{Y,\eta}(y \mid z)}{q_Y(y \mid z)}} - 2 - \frac{\eta}{T}\bar\alpha_T(z)\big(y - h_T(z)\big)\Bigg)^2 q_Y(y \mid z)\,\bar h_T(z)\,\mathrm d\mu(a, x, y) \\
&= 4T\int\Bigg(\sqrt{q_{Y,\eta}(y \mid z)} - \sqrt{q_Y(y \mid z)}\Big(1 + \frac{\eta}{2T}\bar\alpha_T(z)\big(y - h_T(z)\big)\Big)\Bigg)^2 \bar h_T(z)\,\mathrm d\mu(a, x, y) \\
&\overset{(*)}{\le} 4T\,\frac{\eta^4\,\bar\sigma_T^4}{T^4} = \frac{4\epsilon^4}{T} = o(1),
\end{aligned}
\]
where $(*)$ follows from the remainder bound in Lemma 1.

We now show that the mean vanishes. Using the fact that the score has mean zero, we have
\[
\begin{aligned}
\mathbb E_{P^{(T)}}\Bigg[\sum_{t=1}^T W_t\Bigg]
&= T\,\mathbb E_{P^{(T)}}\Bigg[\frac{1}{T}\sum_{t=1}^T\Bigg(2\sqrt{\frac{q_{Y,\eta}(Y_t \mid Z_t)}{q_Y(Y_t \mid Z_t)}} - 2\Bigg)\Bigg]
= T\int\Bigg(2\sqrt{\frac{q_{Y,\eta}(y \mid z)}{q_Y(y \mid z)}} - 2\Bigg)\bar h_T(z)\,q_Y(y \mid z)\,\mathrm d\mu(y, z) \\
&= 2T\int\sqrt{q_{Y,\eta}(y \mid z)\,q_Y(y \mid z)}\,\bar h_T(z)\,\mathrm d\mu(y, z) - 2T
= -T\int\Big(\sqrt{q_{Y,\eta}(y \mid z)\,\bar h_T(z)} - \sqrt{q_Y(y \mid z)\,\bar h_T(z)}\Big)^2\,\mathrm d\mu(y, z).
\end{aligned}
\]
Define
\[
A := \sqrt{q_{Y,\eta}(y \mid z)\,\bar h_T(z)} - \sqrt{q_Y(y \mid z)\,\bar h_T(z)}, \qquad
B := R_{t,\eta} - A = -\frac{\eta}{2T}\bar\alpha_T(z)\big(y - h_T(z)\big)\sqrt{q_Y(y \mid z)\,\bar h_T(z)}.
\]
We now use $\|R_{t,\eta}\|_{L^2(\mu)} \le \frac{\eta^2\bar\sigma_T^2}{T^2}$ and $\eta = \epsilon I_T^{-1/2} = \epsilon\,\bar\sigma_T^{-1}T^{1/2}$ to obtain
\[
\|R_{t,\eta}\|_{L^2(\mu)} \le \epsilon^2\,\bar\sigma_T^{-2}\,T\cdot\frac{\bar\sigma_T^2}{T^2} = \frac{\epsilon^2}{T}.
\]
We compute
\[
\|B\|^2_{L^2(\mu)} = \frac{\eta^2}{4T^2}\int\big(\bar\alpha_T(z)(y - h_T(z))\big)^2\,q_Y(y \mid z)\,\bar h_T(z)\,\mathrm d\mu(y, z)
= \frac{\eta^2}{4T^2}\,\mathbb E\Bigg[\frac{1}{T}\sum_{t=1}^T\big(\bar\alpha_T(Z_t)(Y_t - h_T(Z_t))\big)^2\Bigg]
= \frac{\eta^2}{4T^2}\,\bar\sigma_T^2 = \frac{\epsilon^2\,\bar\sigma_T^{-2}\,T\,\bar\sigma_T^2}{4T^2} = \frac{\epsilon^2}{4T},
\]
where in the last step we substitute the definition of $\eta = \epsilon I_T^{-1/2}$. Then we have
\[
\Big|-T\|A\|^2_{L^2(\mu)} + T\|B\|^2_{L^2(\mu)}\Big|
= \Big|-T\|R_{t,\eta} - B\|^2_{L^2(\mu)} + T\|B\|^2_{L^2(\mu)}\Big|
= \Big|-T\|R_{t,\eta}\|^2_{L^2(\mu)} + 2T\langle B, R_{t,\eta}\rangle_{L^2(\mu)}\Big|
\le T\|R_{t,\eta}\|^2_{L^2(\mu)} + 2T\|B\|_{L^2(\mu)}\|R_{t,\eta}\|_{L^2(\mu)}.
\]
Hence we have
\[
\Big|-T\|A\|^2_{L^2(\mu)} + T\|B\|^2_{L^2(\mu)}\Big| \le \frac{\epsilon^4}{T} + 2T\cdot\frac{\epsilon}{2\sqrt T}\cdot\frac{\epsilon^2}{T} = \frac{\epsilon^4}{T} + \frac{\epsilon^3}{\sqrt T} = o(1).
\]
Hence, we have
\[
\Bigg|\,\mathbb E_{P^{(T)}}\Bigg[\sum_{t=1}^T W_t\Bigg] + \frac{\epsilon^2}{4}\,\Bigg| = \Big|-T\|A\|^2_{L^2(\mu)} + T\|B\|^2_{L^2(\mu)}\Big| = o(1),
\]
as desired.
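The limit $\mathcal N(-\epsilon^2/2, \epsilon^2)$ can be seen numerically in a toy special case. The simulation below (ours; an i.i.d. design with a scalar feature, so a degenerate instance of the setting, with all constants chosen for illustration) draws the log-likelihood ratio along the Lemma 1 path and checks its mean and variance:

```python
import numpy as np

rng = np.random.default_rng(3)
T, reps, eps_loc, beta, sigma = 10_000, 2000, 1.0, 0.7, 1.0

# Toy i.i.d. design: phi(z) = z in {1, 2} with equal probability, nu = 1.
Sigma_bar = 0.5 * (1**2 + 2**2)                  # E[phi^2] = 2.5
sigma_bar = sigma / np.sqrt(Sigma_bar)           # bar sigma_T
eta = eps_loc * np.sqrt(T) / sigma_bar           # local parameter of Lemma 1

llr = np.empty(reps)
for r in range(reps):
    z = rng.integers(1, 3, size=T).astype(float)
    y = beta * z + sigma * rng.standard_normal(T)
    s = (z / Sigma_bar) * (y - beta * z) / T     # score s_T(z, y)
    llr[r] = 2 * np.log1p(0.5 * eta * s).sum() \
             - T * np.log1p(0.25 * eta**2 * sigma_bar**2 / T**2)

print("mean:", llr.mean(), " target -eps^2/2 =", -eps_loc**2 / 2)
print("var :", llr.var(), "  target  eps^2   =", eps_loc**2)
```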
Proof of Eq. (MCLT). We verify the assumptions of Lemma 2. First, we verify the quadratic variation condition (QV). We have
\[
\sum_{t=1}^T \frac{\eta^2}{T^2}\,\mathbb E\big[\bar\alpha_T(Z_t)^2\big(Y_t - h_T(Z_t)\big)^2 \mid \mathcal F_{T,t-1}\big]
= \frac{\epsilon^2\sigma^2}{\bar\sigma_T^2}\,\frac{1}{T}\sum_{t=1}^T \bar\alpha_T(Z_t)^2
= \frac{\epsilon^2\sigma^2}{\bar\sigma_T^2}\,\nu_T^\top \bar\Sigma_T^\dagger\Bigg(\frac{1}{T}\sum_{t=1}^T \varphi(Z_t)\varphi(Z_t)^\top\Bigg)\bar\Sigma_T^\dagger \nu_T
= \frac{\epsilon^2\sigma^2}{\bar\sigma_T^2}\,\nu_T^\top \bar\Sigma_T^\dagger \hat\Sigma_T \bar\Sigma_T^\dagger \nu_T.
\]
By Assumption 11, we have
\[
\sum_{t=1}^T \frac{\eta^2}{T^2}\,\bar\alpha_T(Z_t)^2\big(Y_t - h_T(Z_t)\big)^2 \overset{p}{\to} \epsilon^2. \tag{33}
\]
Secondly, we require the Lindeberg condition
\[
\sum_{t=1}^T \mathbb E\big[\eta^2 s_T(Z_t, Y_t)^2\,\mathbf 1\big[|\eta\, s_T(Z_t, Y_t)| > \epsilon\big] \mid \mathcal F_{T,t-1}\big] \overset{p}{\to} 0,
\]
which is verified by Assumption 12.

Proof of Eq. (QV). We have
\[
\begin{aligned}
\sum_{t=1}^T W_t^2
&= \sum_{t=1}^T 4\Bigg[\sqrt{\frac{q_{Y,\eta}(Y_t \mid Z_t)}{q_Y(Y_t \mid Z_t)}} - 1\Bigg]^2
= \sum_{t=1}^T 4\Bigg[\frac{R_{t,\eta}(Y_t, Z_t)}{\sqrt{q_Y(Y_t \mid Z_t)\,\bar h_T(Z_t)}} + \frac{\eta}{2T}\bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big)\Bigg]^2 \\
&= \sum_{t=1}^T \frac{4R_{t,\eta}(Y_t, Z_t)^2}{q_Y(Y_t \mid Z_t)\,\bar h_T(Z_t)}
+ \sum_{t=1}^T \frac{\eta^2}{T^2}\,\bar\alpha_T(Z_t)^2\big(Y_t - h_T(Z_t)\big)^2
+ \sum_{t=1}^T \frac{4R_{t,\eta}(Y_t, Z_t)}{\sqrt{q_Y(Y_t \mid Z_t)\,\bar h_T(Z_t)}}\,\frac{\eta}{T}\bar\alpha_T(Z_t)\big(Y_t - h_T(Z_t)\big).
\end{aligned}
\]
We have
\[
\mathbb E_{P^{(T)}}\Bigg[\sum_{t=1}^T \frac{R_{t,\eta}(Y_t, Z_t)^2}{q_Y(Y_t \mid Z_t)\,\bar h_T(Z_t)}\Bigg]
= \sum_{t=1}^T \iint \frac{R_{t,\eta}(y, z)^2}{q_Y(y \mid z)\,\bar h_T(z)}\,q_Y(y \mid z)\,q_X(x)\,g_t(a \mid x)\,\mathrm dz\,\mathrm dy
= T\,\|R_{t,\eta}\|^2_{L^2(\mu)} \le \frac{\eta^4\,\bar\sigma_T^4}{T^3} = \frac{\epsilon^4}{T} = o(1).
\]
Since the summands are nonnegative and their expectation vanishes, Markov's inequality yields
\[
\sum_{t=1}^T \frac{4R_{t,\eta}(Y_t, Z_t)^2}{q_Y(Y_t \mid Z_t)\,\bar h_T(Z_t)} \overset{p}{\to} 0.
\]
We now conclude, via Eq. (33) and the Cauchy–Schwarz inequality applied to the cross term, that $\sum_{t=1}^T W_t^2 \overset{p}{\to} \epsilon^2$.

Proof of Eq. (REM). This is implied by $\max_t |W_t| = o_T(1)$, as shown in the proof of van der Vaart [1998, Theorem 7.2]. In our setting, this holds because $I_T = \bar\sigma_T^2/T \to \infty$ as $T \to \infty$, hence $\eta \to 0$ as $T \to \infty$.

11.3 Proof of the convolution theorem

We prove Theorem 4 following a standard route (QMD, LAN, the third Le Cam lemma, matching and equivariance, Anderson's lemma), with the adaptive localization $\eta_T = \epsilon\sqrt T/\bar\sigma_T$. Throughout, we implicitly assume that the information index $I_T := \bar\sigma_T^2/T \to \infty$, so that $\eta_T \to 0$ for each fixed $\epsilon$.

Proof of Theorem 4. Fix $\epsilon \in \mathbb R$ and define the log-likelihood ratio
\[
\Lambda_T(\epsilon) := \log\frac{\mathrm dP^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}}{\mathrm dP^{(T)}}(\bar O_T).
\]
By Theorem 3,
\[
\Lambda_T(\epsilon) = \epsilon\Delta_T - \frac{\epsilon^2}{2} + o_{P^{(T)}}(1), \qquad
\Delta_T := \frac{\sqrt T}{\bar\sigma_T}\,D^*_T(\bar O_T) \Rightarrow \mathcal N(0, 1) \ \text{under } P^{(T)}. \tag{34}
\]
In particular, the LAN expansion (34) implies that, for each fixed $\epsilon \in \mathbb R$, the sequence of local laws $\{P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}\}_{T\ge 1}$ is contiguous with respect to $\{P^{(T)}\}_{T\ge 1}$.

Define the normalized estimation error under the baseline law $P^{(T)}$,
\[
\phi_T(\bar O_T) := \frac{\sqrt T}{\bar\sigma_T}\Big(\hat\Psi_T - \Psi_T\big(P^{(T)}\big)\Big). \tag{35}
\]
Regularity (Definition 3) implies tightness of $\{\phi_T\}$ under each local law $P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}$, hence in particular under $P^{(T)}$. Along a subsequence (not relabeled),
\[
(\phi_T, \Delta_T) \Rightarrow (S, \Delta) \ \text{under } P^{(T)}, \qquad \Delta \sim \mathcal N(0, 1). \tag{36}
\]

Step 1: Third lemma and the family of limit laws. From (34) and (36), $(\phi_T, \Lambda_T(\epsilon)) \Rightarrow (S, \epsilon\Delta - \epsilon^2/2)$ under $P^{(T)}$. Le Cam's third lemma implies that, under $P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}$,
\[
\phi_T \Rightarrow L_\epsilon, \qquad
L_\epsilon(B) = \mathbb E\Big[\mathbf 1\{S \in B\}\exp\Big(\epsilon\Delta - \frac{\epsilon^2}{2}\Big)\Big], \quad B \in \mathcal B(\mathbb R). \tag{37}
\]
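The tilting formula (37) can be checked by simulation when $S$ has the convolution form $S = \Delta + \xi$ with $\xi$ independent of $\Delta$, which is exactly the structure that Steps 2–4 below establish in general: tilting by $e^{\epsilon\Delta - \epsilon^2/2}$ shifts the law of $S$ by $\epsilon$. The sketch below is ours; the Laplace noise and all constants are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, eps_loc = 400_000, 0.8

delta = rng.standard_normal(n)                   # Delta ~ N(0,1), limit of the score
xi = rng.laplace(scale=0.5, size=n)              # extra noise, independent of Delta
S = delta + xi                                   # candidate limit of convolution form

w = np.exp(eps_loc * delta - eps_loc**2 / 2)     # third-lemma tilt, E[w] = 1
grid = np.linspace(-4.0, 4.0, 17)
cdf_tilted = np.array([(w * (S <= b)).mean() for b in grid])         # L_eps via Eq. (37)
cdf_shifted = np.array([((S + eps_loc) <= b).mean() for b in grid])  # law of S + eps

print("max CDF discrepancy:", np.abs(cdf_tilted - cdf_shifted).max())  # ~ 0
```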
Step 2: Matching in the Gaussian shift experiment. By the matching theorem [van der Vaart, 1998, Theorem 7.10], there exist a measurable map $g : \mathbb R \times [0, 1] \to \mathbb R$ and an auxiliary random variable $U \sim \mathrm{Unif}(0, 1)$, independent of everything else, such that, for $X_\epsilon \sim \mathcal N(\epsilon, 1)$,
\[
g(X_\epsilon, U) \sim L_\epsilon, \qquad \forall\,\epsilon \in \mathbb R. \tag{38}
\]

Step 3: Regularity implies equivariance-in-law. Define the (normalized) local centering term
\[
B_T(\epsilon) := \frac{\sqrt T}{\bar\sigma_T}\Big(\Psi_T\big(P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}\big) - \Psi_T\big(P^{(T)}\big)\Big). \tag{39}
\]
By pathwise differentiability of $\Psi_T$ at $P^{(T)}$ (Assumption 10), for any one-dimensional submodel $\{P^{(T)}_\eta\}$ through $P^{(T)}$ with score $s_T = \partial_\eta \log p^{(T)}_\eta\big|_{\eta=0}$,
\[
\Psi_T\big(P^{(T)}_\eta\big) - \Psi_T\big(P^{(T)}\big) = \eta\,P^{(T)}\big[D^*_T\,s_T\big] + o(\eta), \qquad \eta \to 0. \tag{40}
\]
For a least favorable submodel in the sense of Definition 2, we have $s_T = D^*_T$. Substituting into (40) gives
\[
\Psi_T\big(P^{(T)}_\eta\big) - \Psi_T\big(P^{(T)}\big) = \eta\,P^{(T)}\big[(D^*_T)^2\big] + o(\eta). \tag{41}
\]
We now take the local parameter $\eta_T := \epsilon\sqrt T/\bar\sigma_T = \epsilon/\sqrt{I_T}$. Since $I_T \to \infty$, we have $\eta_T \to 0$, and the pathwise differentiability expansion (41) applies along this sequence. Plugging $\eta = \eta_T$ into (41) and then into (39) yields
\[
B_T(\epsilon) = \epsilon\,\frac{T}{\bar\sigma_T^2}\,P^{(T)}\big[(D^*_T)^2\big] + o(1). \tag{42}
\]
Moreover, since $\eta_T \to 0$ and $\sqrt T/\bar\sigma_T = 1/\sqrt{I_T} \to 0$, the remainder term satisfies $\frac{\sqrt T}{\bar\sigma_T}\,o(\eta_T) = o(1)$. By the construction of $\bar\sigma_T$, $P^{(T)}[(D^*_T)^2] = \bar\sigma_T^2/T$, so that
\[
B_T(\epsilon) = \epsilon + o(1) \quad \text{for each fixed } \epsilon. \tag{43}
\]
Regularity (Definition 3) asserts that, for every fixed $\epsilon \in \mathbb R$,
\[
\frac{\sqrt T}{\bar\sigma_T}\Big(\hat\Psi_T - \Psi_T\big(P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}\big)\Big) \Rightarrow L \quad \text{under } P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}, \tag{44}
\]
for a law $L$ not depending on $\epsilon$. Since $\phi_T - B_T(\epsilon)$ is exactly the left-hand side of (44), we obtain $\phi_T - B_T(\epsilon) \Rightarrow L$ under $P^{(T)}_{\epsilon\sqrt T/\bar\sigma_T}$. Combining this with (37) and (43) yields
\[
L_\epsilon(\cdot - \epsilon) = L_0(\cdot), \qquad \forall\,\epsilon \in \mathbb R.
\]
Using (38), this is equivalent to
\[
g(X_\epsilon, U) - \epsilon \overset{d}{=} g(X_0, U), \qquad \forall\,\epsilon \in \mathbb R, \tag{45}
\]
i.e., $g$ is equivariant-in-law in the Gaussian shift experiment.

Step 4: Anderson decomposition and convolution. By [van der Vaart, 1998, Proposition 8.4], equivariance-in-law implies that there exists a random variable $\xi$, independent of $X_0 \sim \mathcal N(0, 1)$, such that $g(X_0, U) \overset{d}{=} X_0 + \xi$. Let $M := \mathcal L(\xi)$. Then $L_0 = \mathcal L(g(X_0, U)) = \mathcal N(0, 1) * M$. Since regularity identifies the (subsequential) limit law under $P^{(T)}$ as $L = L_0$, we obtain the convolution representation $L = \mathcal N(0, 1) * M$.

12 Auxiliary results

We use the following version of the martingale central limit theorem from Hall and Heyde [1980, Corollary 3.1].

Lemma 2. Suppose $\{(X_{n,i}, \mathcal F_{n,i}) : i \in [k_n],\ n \in \mathbb N\}$ is a martingale difference array with the nested property $\mathcal F_{n,i} \subset \mathcal F_{n+1,i}$ for all $i \in [k_n]$ and $n \in \mathbb N$. We further assume that
\[
(\forall\,\epsilon > 0)\qquad \sum_{i \in [k_n]} \mathbb E\big[X_{n,i}^2\,\mathbf 1[|X_{n,i}| > \epsilon] \mid \mathcal F_{n,i-1}\big] \overset{p}{\to} 0, \tag{Lindeberg}
\]
\[
\sum_{i \in [k_n]} \mathbb E\big[X_{n,i}^2 \mid \mathcal F_{n,i-1}\big] \overset{p}{\to} 1. \tag{QV}
\]
Then $\sum_{i \in [k_n]} X_{n,i} \overset{d}{\to} \mathcal N(0, 1)$.
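To illustrate Lemma 2, the following simulation (ours; the weight process is an arbitrary bounded predictable choice, not one arising from the paper's algorithms) builds a martingale difference array whose predictable quadratic variation stabilizes, and checks the normal limit:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 5000, 4000

eps = rng.standard_normal((reps, n))
w = np.ones((reps, n))
w[:, 1:] += 0.5 * (eps[:, :-1] > 0)       # predictable: w_t depends on eps_{t-1} only
qv_limit = 0.5 * 1.0**2 + 0.5 * 1.5**2    # E[w_t^2] = 1.625, the stabilized QV per step
z = (w * eps).sum(axis=1) / np.sqrt(qv_limit * n)

print("sample variance:", z.var())                      # ~ 1
print("P(|Z| <= 1.96) :", (np.abs(z) <= 1.96).mean())   # ~ 0.95
```

Here $X_{n,i} = w_i\varepsilon_i/\sqrt{1.625\,n}$ satisfies (Lindeberg) since the weights are bounded, and (QV) by the law of large numbers, so the standardized sum is approximately standard normal.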
