On the Equivalence between Neyman Orthogonality and Pathwise Differentiability


Authors: Yuxi Chen, Edward H. Kennedy, Sivaraman Balakrishnan

Department of Statistics & Data Science and Machine Learning Department, Carnegie Mellon University. Email: {eric, edward, siva}@stat.cmu.edu

Abstract

It has been frequently observed that Neyman orthogonality, the central device underlying double/debiased machine learning [Chernozhukov et al., 2018], and pathwise differentiability, a cornerstone concept from semiparametric theory, often lead to the same debiased estimators in practice. Despite the widespread adoption of both ideas, the precise nature of this equivalence has remained elusive, with the two concepts having been developed in largely separate traditions. In this work, we revisit the semiparametric framework of van der Laan and Robins [2003] and identify an implicit regularity assumption on the relationship between target and nuisance parameters, a local product structure, that allows us to establish a formal equivalence between Neyman orthogonality and pathwise differentiability. We demonstrate that the two directions of this equivalence impose fundamentally different structural requirements, and illustrate the theory through a concrete example of estimating the average treatment effect. This helps clarify the relationship between these two foundational frameworks and provides a useful reference for practitioners working at their intersection.

1 Introduction

In recent years, the double/debiased machine learning (DML) framework of Chernozhukov et al. [2018] has become a standard tool in modern causal inference for estimating low-dimensional parameters in the presence of high-dimensional nuisance functions.
The central feature of DML is that the estimating function satisfies Neyman orthogonality: an estimating function $m(Z;\beta,\eta)$ is Neyman orthogonal if the Gâteaux derivative of the expected estimating function with respect to the nuisance parameter $\eta$, evaluated at the true parameter values $(\beta_0,\eta_0)$, vanishes in all admissible perturbation directions. This first-order insensitivity to the nuisance ensures that bias from estimating $\eta_0$ enters only at second order, enabling the use of flexible machine learning estimators for nuisance functions while preserving desirable properties of the target estimator. We refer the reader to Chernozhukov et al. [2018] for a thorough treatment of these statistical consequences.

It has long been observed that Neyman orthogonal estimating functions coincide, in essentially every example of interest, with influence functions of pathwise differentiable functionals from classical semiparametric theory, which underpin the construction of efficient estimators [Bickel et al., 1998, van der Vaart, 1998, van der Laan and Robins, 2003, Tsiatis, 2006]. For example, the augmented inverse probability weighted estimator for the average treatment effect arises naturally both as a one-step correction built from the efficient influence function and as the solution to a Neyman orthogonal moment condition. Yet these two concepts are routinely developed and invoked as distinct notions. A precise characterization of when and why they agree, and what structural condition each direction requires, does not appear to have been made explicit in the literature. We hope to close the gap between these two traditions by formalizing the equivalence, and by clarifying the structural and regularity conditions that underpin each direction of the implication.
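The augmented inverse probability weighted (AIPW) example above can be checked numerically. The following sketch is our own toy illustration (the discrete distribution, the function names, and the perturbation direction $h$ are all choices we make for exposition, not from the paper): on a finite support where expectations are exact sums, it verifies that the AIPW estimating function for the average treatment effect is correctly specified, is Neyman orthogonal with respect to perturbations of both the outcome regression and the propensity score, and has Jacobian $G = E_0[\partial_\beta m] = -1$.

```python
import itertools

# Toy discrete data-generating distribution for Z = (X, A, Y):
# X in {0,1}, A | X ~ Bernoulli(pi(X)), Y | X, A ~ Bernoulli(mu_A(X)).
pX = {0: 0.6, 1: 0.4}                      # distribution of the covariate
pi  = lambda x: 0.3 + 0.4 * x              # true propensity score
mu1 = lambda x: 0.2 + 0.5 * x              # true outcome mean under A = 1
mu0 = lambda x: 0.1 + 0.2 * x              # true outcome mean under A = 0

def prob(x, a, y):
    """P(X = x, A = a, Y = y) under the true distribution P_0."""
    pa = pi(x) if a == 1 else 1 - pi(x)
    mu = mu1(x) if a == 1 else mu0(x)
    return pX[x] * pa * (mu if y == 1 else 1 - mu)

def E0(f):
    """Exact expectation under P_0 over the finite support."""
    return sum(prob(x, a, y) * f(x, a, y)
               for x, a, y in itertools.product([0, 1], repeat=3))

def m(x, a, y, beta, mu1f, mu0f, pif):
    """AIPW (Neyman orthogonal) estimating function for the ATE."""
    return (mu1f(x) - mu0f(x)
            + a * (y - mu1f(x)) / pif(x)
            - (1 - a) * (y - mu0f(x)) / (1 - pif(x))
            - beta)

beta0 = sum(pX[x] * (mu1(x) - mu0(x)) for x in [0, 1])   # true ATE

# Correct specification: E_0[m(Z; beta_0, eta_0)] = 0.
assert abs(E0(lambda x, a, y: m(x, a, y, beta0, mu1, mu0, pi))) < 1e-12

# Gateaux derivatives of E_0[m] at the truth, by central differences,
# in the nuisance direction h(x) = x - 0.5 for mu_1 and for pi.
h, t = (lambda x: x - 0.5), 1e-5
d_mu1 = (E0(lambda x, a, y: m(x, a, y, beta0, lambda u: mu1(u) + t * h(u), mu0, pi))
         - E0(lambda x, a, y: m(x, a, y, beta0, lambda u: mu1(u) - t * h(u), mu0, pi))) / (2 * t)
d_pi  = (E0(lambda x, a, y: m(x, a, y, beta0, mu1, mu0, lambda u: pi(u) + t * h(u)))
         - E0(lambda x, a, y: m(x, a, y, beta0, mu1, mu0, lambda u: pi(u) - t * h(u)))) / (2 * t)
G     = (E0(lambda x, a, y: m(x, a, y, beta0 + t, mu1, mu0, pi))
         - E0(lambda x, a, y: m(x, a, y, beta0 - t, mu1, mu0, pi))) / (2 * t)
print(d_mu1, d_pi, G)   # nuisance derivatives ~ 0; G = -1
```

The vanishing nuisance derivatives are exactly the Neyman orthogonality property discussed above, and $G = -1$ previews the normalization derived in Section 3.2.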
For simplicity, we restrict attention to scalar-valued functionals, although the results extend generally to vector-valued scenarios.

Establishing the equivalence requires bridging two seemingly distinct viewpoints. Pathwise differentiability is formulated geometrically, characterizing the first-order behavior of a functional along smooth perturbations of the data-generating distribution without reference to any explicit nuisance parameterization. On the other hand, Neyman orthogonality is defined analytically through derivatives of an expected estimating function with respect to an explicitly parameterized nuisance. Relating the two turns out to require constructing smooth perturbations of the distribution that move one parameter while holding the other fixed. The classical notion of local variation independence means the attainable parameter set contains a product neighborhood of $(\beta_0, \eta_0)$, guaranteeing that independently varied parameter values exist. However, this definition is purely set-theoretic and does not ensure that they are connected by submodels regular enough to differentiate along. We formalize the missing condition as a local product structure (Assumption 1), which requires that coordinate perturbations not merely exist as points in the model but form regular submodels through $P_0$. This condition underlies the classical framework of van der Laan and Robins [2003], where it is implicitly invoked but not separately identified. We make this explicit and discuss its role in their proofs in Appendix B.

Equipped with this assumption, we establish the equivalence between Neyman orthogonality and pathwise differentiability through two results.
The forward direction (Theorem 1) shows that a Neyman orthogonal estimating function with a nondegenerate Jacobian induces an influence function, and hence pathwise differentiability, without requiring any variation independence or product structure. The reverse direction (Theorem 2) shows that a mean-zero estimating function whose value at the truth is an influence function must be Neyman orthogonal. This direction does require local product structure to identify coordinate submodels that perturb $\beta$ and $\eta$ independently.

2 Background

We work on a measurable space $(\mathcal{Z}, \mathcal{A})$ and fix a $\sigma$-finite measure $\nu$ such that every $P \in \mathcal{P}$ is dominated by $\nu$. We denote the density of $P \in \mathcal{P}$ by $p = dP/d\nu$. Fix $P_0 \in \mathcal{P}$ with density $p_0$, and write $E_0[\cdot] \equiv E_{P_0}[\cdot]$. Let
\[
L_2(P_0) = \{ f : E_0[f^2] < \infty \}, \qquad L_2^0(P_0) = \{ f \in L_2(P_0) : E_0[f] = 0 \}, \qquad L_\infty(P_0) = \{ f : \|f\|_\infty := \operatorname{ess\,sup}_{P_0} |f| < \infty \}.
\]
All equalities between random variables are interpreted $P_0$-a.s. unless stated otherwise. To define local perturbations at $P_0$, we consider paths through $P_0$ inside the model $\mathcal{P}$. The appropriate regularity condition on such paths is quadratic-mean differentiability [van der Vaart, 1998].

2.1 Regular Submodels and Scores

Definition 1 (Regular (QMD) submodel and score). A regular parametric submodel through $P_0$ is an indexed family $\{P_t : t \in (-\epsilon, \epsilon)\} \subset \mathcal{P}$ with $P_{t=0} \equiv P_0$ such that

1. $P_t \ll \nu$ with density $p_t = dP_t/d\nu$.
2. The map $t \mapsto P_t$ is differentiable in quadratic mean (QMD) at $0$: there exists $s \in L_2^0(P_0)$ such that
\[
\int \left( \frac{\sqrt{p_t} - \sqrt{p_0}}{t} - \frac{1}{2}\, s \sqrt{p_0} \right)^2 d\nu \to 0 \quad \text{as } t \to 0.
\]
The function $s$ is the score of the submodel at $0$.

When it is helpful to indicate the score of a submodel, we write $P_{t,s}$ for a regular submodel through $P_0$ with score $s$. We also use $t \mapsto P_{t,s}$ and $\{P_{t,s}\}$ interchangeably to refer to the submodel.
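Definition 1 can be checked directly in a simple case. The sketch below is our own illustration under an assumed family not discussed in the text, the Gaussian location model $P_t = N(t, 1)$ through $P_0 = N(0, 1)$: it evaluates the QMD remainder on a quadrature grid and confirms that it vanishes as $t \to 0$, with candidate score $s(z) = z$.

```python
import numpy as np

# Direct check of the QMD definition for the Gaussian location family
# P_t = N(t, 1), a standard regular submodel through P_0 = N(0, 1)
# with score s(z) = z.
z = np.linspace(-12.0, 12.0, 200001)       # quadrature grid, nu = Lebesgue
dz = z[1] - z[0]
p = lambda t: np.exp(-(z - t) ** 2 / 2) / np.sqrt(2 * np.pi)
s = z                                       # candidate score at t = 0

def qmd_gap(t):
    """Integral in the QMD definition; should vanish as t -> 0."""
    diff = (np.sqrt(p(t)) - np.sqrt(p(0))) / t - 0.5 * s * np.sqrt(p(0))
    return np.sum(diff ** 2) * dz

for t in [0.1, 0.01, 0.001]:
    print(t, qmd_gap(t))                    # decreases like t**2 here

# The score is mean zero under P_0, as Definition 1 requires.
assert abs(np.sum(s * p(0)) * dz) < 1e-8
```

The remainder shrinking to zero (here at rate $t^2$) is what makes $\{P_t\}$ a regular submodel in the sense of Definition 1.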
One may observe that different submodels can share the same score, and for our purposes these submodels should be considered interchangeable, because the score alone determines the first-order behavior along any such submodel. It is therefore natural to work not with individual submodels but with their scores, which we collect into a single space. Let $\mathcal{S} \subset L_2^0(P_0)$ be the set of scores of all regular submodels through $P_0$.

Definition 2 (Tangent space). The (full) tangent space is $T := \overline{\operatorname{span}(\mathcal{S})} \subset L_2^0(P_0)$, where the closure is taken in $L_2(P_0)$.

The tangent space is defined as the closed span of the scores, but it remains to show that scores can be constructed in a controlled way. One simple and standard construction is the linear tilt, where one perturbs $p_0$ by a multiplicative factor $1 + tg$ for a bounded, mean-zero function $g$, producing a regular submodel with score exactly $g$.

Lemma 1 (Linear tilt submodel is QMD with score $g$). Let $g \in L_\infty(P_0)$ with $E_0[g] = 0$, and let $M := \|g\|_\infty$. For $|t| < 1/M$, define $p_t(z) := p_0(z)\{1 + t g(z)\}$. Then $p_t \ge 0$ $\nu$-a.e., $\int p_t \, d\nu = 1$, and the resulting submodel $\{P_t : |t| < 1/M\}$ is regular (QMD) at $0$ with score $s \equiv g$. (Proof in Appendix A.1.)

It should be noted that these submodels are not necessarily intended as realistic data-generating mechanisms but rather as analytical tools for assessing the local geometry of the model. Indeed, in the nonparametric model, linear tilts alone suffice to saturate the tangent space.

Corollary 1 (Saturation in the nonparametric model). Suppose $\mathcal{P}$ is the full nonparametric model (all densities $p$ w.r.t. $\nu$). Then $T = L_2^0(P_0)$.

Proof. By Lemma 1, every bounded mean-zero $g$ is a score, so $L_\infty(P_0) \cap L_2^0(P_0) \subset \mathcal{S}$. Since bounded functions are dense in $L_2(P_0)$, it follows that $L_\infty(P_0) \cap L_2^0(P_0)$ is dense in $L_2^0(P_0)$.
Taking the closed linear span of these scores yields $T = L_2^0(P_0)$.

2.1.1 Differentiating Expectations along Regular Submodels

Deriving the central results of this note requires differentiating expectations of the form $E_{P_{t,s}}[f(Z)]$ along regular submodels, where the integrand $f$ itself may also depend on $t$. The first result below handles the case of a fixed integrand, and the second extends to integrands that vary along the submodel, which arises naturally when the integrand depends on parameters that move with $P_{t,s}$.

Lemma 2 (Differentiation of expectations for a fixed $f$). Let $t \mapsto P_{t,s}$ be a regular (QMD) submodel through $P_0$ with bounded score $s$. If $f \in L_\infty(P_0)$, then
\[
\frac{d}{dt} E_{P_{t,s}}[f(Z)] \Big|_{t=0} = E_0[f(Z)\, s(Z)].
\]
(Proof in Appendix A.2.)

Lemma 3 (Differentiation of expectations for varying $f_t$). Let $t \mapsto P_{t,s}$ be a regular (QMD) submodel through $P_0$ with bounded score $s$, and let $f_t : \mathcal{Z} \to \mathbb{R}$ be measurable. Assume:

1. $f_0 \in L_\infty(P_0)$.
2. There exists $\dot f_0 \in L_2(P_0)$ such that $E_0\!\left[ \left( \frac{f_t - f_0}{t} - \dot f_0 \right)^2 \right] \to 0$ as $t \to 0$.
3. There exist $\delta > 0$ and $C < \infty$ such that $\sup_{|t| < \delta} E_{P_{t,s}}\!\left[ \left( \frac{f_t - f_0}{t} \right)^2 \right] \le C$.

Then $t \mapsto E_{P_{t,s}}[f_t]$ is differentiable at $0$ and
\[
\frac{d}{dt} E_{P_{t,s}}[f_t] \Big|_{t=0} = E_0[f_0 s] + E_0[\dot f_0].
\]
(Proof in Appendix A.3.)

2.1.2 Nuisance Scores, Nuisance Tangent Space, and Pathwise Derivatives

The tools developed in Section 2.1.1 allow us to differentiate expectations along regular submodels, but do not yet distinguish between perturbations that change the parameter of interest and those that do not. To clarify this distinction, we define nuisance scores, the nuisance tangent space, and influence functions following van der Laan and Robins [2003], and show that influence functions are orthogonal to the nuisance tangent space. Let $\beta : \mathcal{P} \to \mathbb{R}$ be the target parameter of interest with $\beta_0 := \beta(P_0)$.
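Lemmas 1 and 2 above can be illustrated together on a finite sample space, where all expectations are exact sums. The sketch below is our own toy example (the pmf $p_0$, the tilt direction $g$, and the integrand $f$ are choices we make for exposition): it constructs a linear tilt submodel and confirms the derivative identity of Lemma 2 by a finite difference.

```python
import numpy as np

# Linear tilt submodel of Lemma 1 on a finite sample space, plus a
# check of the derivative identity of Lemma 2.
z = np.array([0.0, 1.0, 2.0, 3.0])
p0 = np.array([0.1, 0.2, 0.3, 0.4])        # base pmf p_0
g_raw = np.array([1.0, -1.0, 1.0, -1.0])
g = g_raw - np.sum(p0 * g_raw)             # bounded, with E_0[g] = 0
f = np.cos(z)                               # bounded integrand

def p(t):
    """Linear tilt p_t = p_0 (1 + t g); a valid pmf for |t| < 1/max|g|."""
    return p0 * (1 + t * g)

# Lemma 1: p_t is a probability mass function for small t.
assert np.all(p(0.1) >= 0) and abs(p(0.1).sum() - 1) < 1e-12

# Lemma 2: d/dt E_{P_t}[f] |_{t=0} = E_0[f s], with score s = g.
t = 1e-6
lhs = (np.sum(f * p(t)) - np.sum(f * p(-t))) / (2 * t)
rhs = np.sum(p0 * f * g)                    # E_0[f g]
print(lhs, rhs)                             # agree
```

Taking $f(z) = z$ in the same check gives the derivative of the mean functional $\beta(P) = E_P[Z]$ along tilts, namely $E_0[(Z - \beta_0) g]$, previewing the influence function $z - \beta_0$ in the sense of Definition 4 below.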
Definition 3 (Nuisance scores and nuisance tangent space). Assume that for every regular submodel $t \mapsto P_{t,s}$ through $P_0$, the derivative $\frac{d}{dt}\beta(P_{t,s})\big|_{t=0}$ exists. Define the nuisance score set
\[
\mathcal{S}_{\mathrm{nuis}} := \left\{ s \in \mathcal{S} : \exists \text{ a regular submodel } t \mapsto P_{t,s} \text{ with score } s \text{ such that } \frac{d}{dt}\beta(P_{t,s})\Big|_{t=0} = 0 \right\}.
\]
Define the nuisance tangent space $\Lambda := \overline{\operatorname{span}(\mathcal{S}_{\mathrm{nuis}})} \subset T$, with closure in $L_2(P_0)$.

It is worth noting that nuisance scores are defined without reference to any explicit nuisance parameterization. Concretely, a score $s$ is a nuisance score if there exists a regular submodel with score $s$ along which $\beta$ is locally constant to first order. In general, $\frac{d}{dt}\beta(P_{t,s})\big|_{t=0}$ need not be determined uniquely by the score alone; that is, two regular submodels can share the same score while yielding different derivatives of $\beta$. Moreover, $\Lambda$ is a closed linear subspace of $T$ generated by score directions that admit regular submodels along which $\beta$ is locally constant to first order.

Definition 4 (Pathwise differentiability and influence functions). We say $\beta$ is pathwise differentiable at $P_0$ if there exists $\varphi \in L_2^0(P_0)$ such that for every regular submodel with score $s$,
\[
\frac{d}{dt}\beta(P_{t,s})\Big|_{t=0} = E_0[\varphi(Z; P_0)\, s(Z)].
\]
Any such $\varphi$ is called an influence function of $\beta$ at $P_0$, or a gradient of the pathwise derivative.

Here, $\varphi(Z; P_0)$ indicates that $\varphi$ is a functional of $P_0$ evaluated at the data point $Z$; since $P_0$ is fixed throughout, we write simply $\varphi(Z)$ hereafter. Note that if $\beta$ is pathwise differentiable at $P_0$, then the derivative depends only on the score, in which case the condition in Definition 3 is equivalent to requiring that $\beta$ does not change to first order along any regular submodel with score $s$.

Remark 1 (Uniqueness of the influence function). In general, the influence function need not be unique.
The pathwise derivative condition only probes $\varphi$ through inner products with scores $s \in T$, so adding any $h \in T^\perp$ to $\varphi$ produces another valid influence function. Only the projection onto $T$ is identified by the pathwise derivative. This projection is called the efficient influence function (EIF) and is the unique influence function lying in $T$. In the nonparametric model, $T = L_2^0(P_0)$ by Corollary 1, so $T^\perp = \{0\}$ and the influence function is unique. Since all examples in this note are nonparametric, the influence function and the EIF coincide throughout.

Lemma 4 (Influence functions are orthogonal to $\Lambda$). If $\beta$ is pathwise differentiable with influence function $\varphi$ (Definition 4), then
\[
E_0[\varphi(Z)\, s(Z)] = 0 \quad \forall\, s \in \Lambda.
\]

Proof. Let $s \in \mathcal{S}_{\mathrm{nuis}}$. By Definition 3, there exists a regular submodel with score $s$ along which $\frac{d}{dt}\beta(P_{t,s})\big|_{t=0} = 0$. By pathwise differentiability,
\[
0 = \frac{d}{dt}\beta(P_{t,s})\Big|_{t=0} = E_0[\varphi(Z)\, s(Z)].
\]
Since the map $s \mapsto E_0[\varphi s]$ is a continuous linear functional on $L_2(P_0)$, the equality extends from $\mathcal{S}_{\mathrm{nuis}}$ to its closed linear span $\Lambda$.

Lemma 4 says that the influence function is orthogonal to every direction in the nuisance tangent space. This can be considered an analogue of Neyman orthogonality, which requires that the expected estimating function be insensitive to perturbations of the nuisance parameter, but formulated without reference to any explicit parameterization. Establishing a formal equivalence between these two formulations, as we do in Section 3, will rely on the product structure developed in the next section to identify nuisance perturbations with nuisance scores in $\Lambda$.

2.2 Estimating Functions and Neyman Orthogonality

To formulate Neyman orthogonality, we will need to work with estimating functions of the form $m(Z; \beta, \eta)$ that depend explicitly on both a target parameter $\beta$ and a nuisance parameter $\eta$.
This requires us to move beyond the framework of Section 2.1.2, where nuisance scores were defined without reference to any explicit parameterization, and specify concrete functionals on the model that assume the roles of the target and nuisance. Once such a parameterization is in place, it is natural to ask what structure the relationship between $\beta$ and $\eta$ must possess for the two viewpoints to agree. Pathwise differentiability is defined through scores alone and makes no reference to how the nuisance is parameterized, while Neyman orthogonality depends explicitly on the functional form of $\beta$ and $\eta$. As we show below, connecting these two viewpoints requires the ability to construct submodels that move one coordinate while holding the other fixed.

As before, let $\beta : \mathcal{P} \to \mathbb{R}$ and $\eta : \mathcal{P} \to \mathcal{H}$ be functionals on the model, where $\mathcal{H} \subset V$ is a subset of a normed vector space with norm $\|\cdot\|_V$. We let $\beta_0 := \beta(P_0)$ and $\eta_0 := \eta(P_0)$. For any pair $(\beta, \eta)$ in the attainable set $\Theta := \{(\beta(P), \eta(P)) : P \in \mathcal{P}\}$, we write $P_{\beta,\eta}$ for a distribution in $\mathcal{P}$ with $\beta(P_{\beta,\eta}) = \beta$ and $\eta(P_{\beta,\eta}) = \eta$, so that for any $P \in \mathcal{P}$, the expectation $E_P[f(Z; \beta(P), \eta(P))]$ can be written $E_{P_{\beta,\eta}}[f(Z; \beta, \eta)]$ with $(\beta, \eta) = (\beta(P), \eta(P))$. Finally, let
\[
\dot{\mathcal{H}} := \{ h \in V : \exists\, \epsilon > 0 \text{ such that } \eta_0 + t h \in \mathcal{H} \text{ for all } |t| < \epsilon \}
\]
denote the set of admissible perturbation directions at $\eta_0$.

2.2.1 Local Product Structure

To apply the differentiation results of Section 2.1.1, we require an additional local product structure assumption, which ensures the existence of regular (QMD) submodels along each coordinate, that is, submodels that perturb one of $\beta$ or $\eta$ while holding the other fixed. Note that any regular submodel $t \mapsto P_{t,s}$ through $P_0$ induces a coordinate path $t \mapsto (\beta_{t,s}, \eta_{t,s}) := (\beta(P_{t,s}), \eta(P_{t,s}))$.
The following assumption requires that this coordinate path can be controlled independently in each component.

Assumption 1 (Local product structure). The following first-order coordinate conditions hold:

1. $\beta$-coordinate submodel. There exists a regular (QMD) submodel $t \mapsto P_t \in \mathcal{P}$ through $P_0$ along which the induced coordinate path is differentiable at $t = 0$ with
\[
\frac{d}{dt}\beta(P_t)\Big|_{t=0} = 1 \quad \text{and} \quad \frac{d}{dt}\eta(P_t)\Big|_{t=0} = 0.
\]
2. $\eta$-coordinate submodel. For every admissible nuisance perturbation direction $h \in \dot{\mathcal{H}}$, there exists a regular (QMD) submodel $t \mapsto P_t \in \mathcal{P}$ through $P_0$ along which the induced coordinate path is differentiable at $t = 0$ with
\[
\frac{d}{dt}\beta(P_t)\Big|_{t=0} = 0 \quad \text{and} \quad \frac{d}{dt}\eta(P_t)\Big|_{t=0} = h.
\]

This formalizes a condition implicit in the framework of van der Laan and Robins [2003, p. 56], where the model is written as $\{F_{\mu,\eta}\}$ with $\mu$ and $\eta$ "independently varying," and submodels varying only the nuisance parameter are used to generate the nuisance tangent space. Note that Assumption 1 requires only first-order control: the derivatives of $\beta(P_t)$ and $\eta(P_t)$ at $t = 0$ are prescribed, but the paths need not satisfy $\beta(P_t) = \beta_0 + t$ or $\eta(P_t) = \eta_0 + t h$ exactly for $t \neq 0$. We discuss the relationship between Assumption 1 and the notion of local variation independence, as well as the role of product structure in the proof of Lemma 1.3 of van der Laan and Robins [2003], in Appendix B.

2.2.2 Neyman Orthogonality

Next, let $m : \mathcal{Z} \times \mathbb{R} \times \mathcal{H} \to \mathbb{R}$ be such that $z \mapsto m(z; \beta, \eta)$ is $\mathcal{A}$-measurable for each $(\beta, \eta)$. The function $m$ plays the role of an estimating function, encoding a moment condition whose solution at the true nuisance value identifies $\beta_0$, while the explicit dependence on $\eta$ reflects the presence of nuisance quantities that need to be estimated.

Definition 5 (Correct local specification).
We say that $m$ is correctly specified in a neighborhood of $(\beta_0, \eta_0)$ if for all $P \in \mathcal{P}$ with $(\beta(P), \eta(P))$ in a neighborhood of $(\beta_0, \eta_0)$,
\[
E_P[m(Z; \beta(P), \eta(P))] = 0.
\]

Correct specification ensures that $\beta_0$ solves the moment condition at the true nuisance, but does not constrain how the expected estimating function varies with $\eta$ near $\eta_0$. Neyman orthogonality strengthens this by requiring that this variation vanish to first order, so that small errors in $\eta$ do not propagate to estimation of $\beta$.

Definition 6 (Neyman orthogonality). Assume the map $\eta \mapsto E_0[m(Z; \beta_0, \eta)]$ is Gâteaux differentiable at $\eta_0$ along directions $h \in \dot{\mathcal{H}}$. We say $m$ is Neyman orthogonal at $(\beta_0, \eta_0)$ if
\[
\frac{\partial}{\partial \eta} E_0[m(Z; \beta_0, \eta)]\Big|_{\eta = \eta_0}[h] = 0 \quad \forall\, h \in \dot{\mathcal{H}}.
\]

Remark 2. The Gâteaux derivative in Definition 6 is computed under the fixed measure $P_0$ with $\beta_0$ held fixed. The map $\eta \mapsto E_0[m(Z; \beta_0, \eta)]$ is defined for any $\eta \in \mathcal{H}$ for which the integral exists, without requiring that $(\beta_0, \eta)$ correspond to a distribution in the model $\mathcal{P}$. In particular, no variation independence is needed to formulate Neyman orthogonality. The role of Assumption 1 is instead to establish that Neyman orthogonality holds for influence functions.

2.3 The $L_2$ Chain Rule along Coordinate Paths

To connect estimating functions with pathwise differentiability, we also need to differentiate the estimating function $m(Z; \beta, \eta)$ along the coordinate path induced by a regular submodel. The following two assumptions regulate the behavior of this coordinate path and of the estimating function along it.

Assumption 2 (Coordinate smoothness along a submodel). For a given regular (QMD) submodel $t \mapsto P_{t,s}$ through $P_0$ with score $s$, the induced coordinate path satisfies:

1. $t \mapsto \beta_{t,s} := \beta(P_{t,s})$ is differentiable at $0$: $(\beta_{t,s} - \beta_0)/t \to \dot\beta_{0,s} \in \mathbb{R}$.
2.
$t \mapsto \eta_{t,s} := \eta(P_{t,s})$ is differentiable at $0$ in $V$: $\|(\eta_{t,s} - \eta_0)/t - \dot\eta_{0,s}\|_V \to 0$ for some $\dot\eta_{0,s} \in \dot{\mathcal{H}}$.

In particular, $\dot\beta_{0,s} = \frac{d}{dt}\beta(P_{t,s})\big|_{t=0}$.

Assumption 3 (Fréchet differentiability of $m$ in $L_2(P_0)$). The map $(\beta, \eta) \mapsto m(\cdot; \beta, \eta) \in L_2(P_0)$ is Fréchet differentiable at $(\beta_0, \eta_0)$. That is, there exist bounded linear maps $D_\beta m_0 : \mathbb{R} \to L_2(P_0)$ and $D_\eta m_0 : V \to L_2(P_0)$ such that
\[
\| m(\cdot; \beta, \eta) - m(\cdot; \beta_0, \eta_0) - D_\beta m_0(\beta - \beta_0) - D_\eta m_0(\eta - \eta_0) \|_{L_2(P_0)} = o(|\beta - \beta_0| + \|\eta - \eta_0\|_V).
\]
We write $\partial_\beta m(Z; \beta_0, \eta_0) := D_\beta m_0(1)(Z)$ and $\partial_\eta m(Z; \beta_0, \eta_0)[h] := D_\eta m_0(h)(Z)$.

Lemma 5 ($L_2$ chain rule). Under Assumptions 2 and 3, define $f_{t,s}(Z) := m(Z; \beta_{t,s}, \eta_{t,s})$ and
\[
\dot f_{0,s}(Z) := \partial_\beta m(Z; \beta_0, \eta_0)\, \dot\beta_{0,s} + \partial_\eta m(Z; \beta_0, \eta_0)[\dot\eta_{0,s}].
\]
Then $(f_{t,s} - f_0)/t \to \dot f_{0,s}$ in $L_2(P_0)$. (Proof in Appendix A.4.)

3 Equivalence Between Neyman Orthogonality and Pathwise Differentiability

We now establish the relationship between Neyman orthogonality and pathwise differentiability. The forward direction (Section 3.1) demonstrates that a Neyman orthogonal estimating function with nondegenerate Jacobian induces an influence function, and hence pathwise differentiability. The reverse direction (Section 3.2) shows that if a correctly specified estimating function evaluates to an influence function at the truth, then it must be Neyman orthogonal and its sensitivity to the target parameter is fully calibrated by the influence function representation; here, we require local product structure in order to specialize to coordinate submodels that perturb $\beta$ and $\eta$ independently. The proofs of the two directions differ regarding their structural requirements.
The forward direction requires that the induced coordinate paths be smooth along a dense class of regular submodels and that the target functional be locally Lipschitz in Hellinger distance, whereas the reverse direction requires the local product structure of Assumption 1 in order to construct submodels that perturb $\beta$ and $\eta$ independently.

3.1 Neyman Orthogonality Implies Pathwise Differentiability

Fix an estimating function $m : \mathcal{Z} \times \mathbb{R} \times \mathcal{H} \to \mathbb{R}$. Correct specification ensures $E_{P_{t,s}}[m(Z; \beta_{t,s}, \eta_{t,s})] = 0$ identically along any regular submodel, so the derivative of this constant function vanishes. Expanding the derivative via Lemma 3 and the $L_2$ chain rule (Lemma 5), and then invoking Neyman orthogonality to eliminate the nuisance contribution, yields a representation of $\dot\beta_{0,s}$ as an inner product with the score, which is exactly pathwise differentiability.

Assumption 4 (Correct specification). The estimating function $m$ is correctly specified at $(\beta_0, \eta_0)$ in the sense of Definition 5.

Assumption 5 (Coordinate smoothness along a dense class of submodels). There exists a set of scores $S \subset L_\infty(P_0) \cap L_2^0(P_0)$ whose $L_2(P_0)$-closure equals $T$ such that for each $s \in S$, there exists a regular submodel $t \mapsto P_{t,s}$ through $P_0$ with score $s$ along which the induced coordinate path $t \mapsto (\beta_{t,s}, \eta_{t,s}) = (\beta(P_{t,s}), \eta(P_{t,s}))$ satisfies Assumption 2.

Assumption 6 (Fréchet differentiability of $m$). The map $(\beta, \eta) \mapsto m(\cdot; \beta, \eta) \in L_2(P_0)$ satisfies Assumption 3.

Assumption 7 (Regularity along submodels). For each $s \in S$ and the corresponding submodel $t \mapsto P_{t,s}$ from Assumption 5, the function $f_{t,s}(Z) := m(Z; \beta_{t,s}, \eta_{t,s})$ satisfies the conditions of Lemma 3.

Assumption 8 (Nondegenerate Jacobian). $G := E_0[\partial_\beta m(Z; \beta_0, \eta_0)] \neq 0$.

Assumption 9 (Neyman orthogonality).
For all $h \in \dot{\mathcal{H}}$,
\[
\frac{\partial}{\partial \eta} E_0[m(Z; \beta_0, \eta)]\Big|_{\eta = \eta_0}[h] = 0.
\]

Assumption 10 (Hellinger Lipschitz). There exist $c, \delta > 0$ such that
\[
|\beta(P_1) - \beta(P_2)| \le c\, H(P_1, P_2) \quad \forall\, P_1, P_2 \in \mathcal{P} \text{ with } H(P_i, P_0) \le \delta.
\]

Assumptions 4 and 9 are the two standard requirements on the estimating function introduced in Section 2.2.2. Assumptions 5 through 7 ensure that the differentiation machinery of Section 2 applies along a dense class of regular submodels; these amount to differentiability of the estimating function in its parameters and of the functionals $\beta$ and $\eta$ along these submodels, together with boundedness and integrability conditions near the truth. Assumption 8 ensures that the rescaling $\varphi = -G^{-1} m(Z; \beta_0, \eta_0)$ in the conclusion of Theorem 1 is well defined. Assumption 10 provides the quantitative control needed to extend the pathwise derivative from the dense class of bounded scores to all scores: it bounds how fast $\beta$ can vary relative to the Hellinger distance between distributions, ensuring that replacing an arbitrary regular submodel by one from the dense class with a nearby score incurs a controlled error in the derivative of $\beta$.

Theorem 1 (Neyman orthogonality implies pathwise differentiability). Under Assumptions 4 through 10, $\beta$ is pathwise differentiable (Definition 4) at $P_0$ with influence function
\[
\varphi(Z) := -G^{-1} m(Z; \beta_0, \eta_0).
\]

Proof. Let $s \in S$ and let $t \mapsto P_{t,s}$ be a regular submodel through $P_0$ with score $s$ as furnished by Assumption 5. By Assumption 5, the induced coordinate path $(\beta_{t,s}, \eta_{t,s})$ lies in a neighborhood of $(\beta_0, \eta_0)$ for small $t$. Assumption 4 then gives
\[
E_{P_{t,s}}[m(Z; \beta_{t,s}, \eta_{t,s})] = 0 \quad \text{for all sufficiently small } t.
\]
Define $f_{t,s}(Z) := m(Z; \beta_{t,s}, \eta_{t,s})$ and $f_0(Z) := m(Z; \beta_0, \eta_0)$. Since $t \mapsto E_{P_{t,s}}[f_{t,s}]$ is identically zero,
\[
\frac{d}{dt} E_{P_{t,s}}[f_{t,s}]\Big|_{t=0} = 0.
\]
We apply Lemma 3 to the function $f_{t,s}$, which is valid by Assumption 7. By the $L_2$ chain rule (Lemma 5), which applies under Assumptions 5 and 6, the quotient $(f_{t,s} - f_0)/t$ converges in $L_2(P_0)$ to
\[
\dot f_{0,s}(Z) = \partial_\beta m(Z; \beta_0, \eta_0)\, \dot\beta_{0,s} + \partial_\eta m(Z; \beta_0, \eta_0)[\dot\eta_{0,s}].
\]
Lemma 3 thus gives
\[
0 = E_0[f_0(Z) s(Z)] + E_0[\dot f_{0,s}(Z)].
\]
Substituting the expression for $\dot f_{0,s}$ and using linearity of expectation,
\[
0 = E_0[m(Z; \beta_0, \eta_0) s(Z)] + E_0[\partial_\beta m(Z; \beta_0, \eta_0)]\, \dot\beta_{0,s} + E_0\big[\partial_\eta m(Z; \beta_0, \eta_0)[\dot\eta_{0,s}]\big]. \tag{1}
\]
Now, $\dot\eta_{0,s} \in \dot{\mathcal{H}}$ by Assumption 5, and Fréchet differentiability (Assumption 6) permits the interchange of derivative and expectation, so that
\[
E_0\big[\partial_\eta m(Z; \beta_0, \eta_0)[h]\big] = \frac{\partial}{\partial \eta} E_0[m(Z; \beta_0, \eta)]\Big|_{\eta = \eta_0}[h],
\]
which vanishes by Neyman orthogonality (Assumption 9). Recalling $G = E_0[\partial_\beta m(Z; \beta_0, \eta_0)]$, we are left with
\[
0 = E_0[m(Z; \beta_0, \eta_0) s(Z)] + G \dot\beta_{0,s}.
\]
Since $G \neq 0$ by Assumption 8,
\[
\dot\beta_{0,s} = -G^{-1} E_0[m(Z; \beta_0, \eta_0) s(Z)] = E_0[\varphi(Z) s(Z)], \quad \text{where } \varphi(Z) = -G^{-1} m(Z; \beta_0, \eta_0).
\]
We note that $\varphi \in L_2^0(P_0)$. Condition 1 of Lemma 3, invoked via Assumption 7, requires $f_0 = m(Z; \beta_0, \eta_0) \in L_\infty(P_0)$; since this is a property of $f_0$ alone and does not depend on the choice of submodel, $\varphi = -G^{-1} f_0 \in L_\infty(P_0) \subset L_2(P_0)$. That $\varphi$ has mean zero follows from correct specification at the truth (Assumption 4).

It remains to extend the conclusion to all regular submodels. To start, let $t \mapsto P_{t,s'}$ be an arbitrary regular submodel through $P_0$ with score $s' \in \mathcal{S}$, and fix $\epsilon > 0$. Since $S$ is dense in $T$ by Assumption 5, there exists $g \in S$ with $\|s' - g\|_{L_2(P_0)} \le \epsilon$. Let $t \mapsto P_{t,g}$ be the regular submodel with score $g$ furnished by Assumption 5. By QMD, we know that $H(P_{t,s'}, P_0) \to 0$ and $H(P_{t,g}, P_0) \to 0$ as $t \to 0$.
Let $\delta > 0$ be as in Assumption 10. There exists $t^* > 0$ such that for all $|t| < t^*$,
\[
H(P_{t,s'}, P_0) \le \delta \quad \text{and} \quad H(P_{t,g}, P_0) \le \delta.
\]
By Lemma 6 (Appendix A.5), which bounds the Hellinger distance between two regular submodels in terms of the $L_2(P_0)$ distance between their scores,
\[
\limsup_{t \to 0} \frac{H(P_{t,s'}, P_{t,g})}{|t|} \le \frac{1}{2\sqrt{2}} \|s' - g\|_{L_2(P_0)} \le \frac{\epsilon}{2\sqrt{2}}.
\]
It then follows that
\[
\limsup_{t \to 0} \frac{|\beta(P_{t,s'}) - \beta(P_{t,g})|}{|t|} \le c \cdot \limsup_{t \to 0} \frac{H(P_{t,s'}, P_{t,g})}{|t|} \le \frac{c\epsilon}{2\sqrt{2}},
\]
where the first inequality holds by Assumption 10, since both $P_{t,s'}$ and $P_{t,g}$ lie within Hellinger distance $\delta$ of $P_0$ for $|t| < t^*$. Next, for any $t \neq 0$ with $|t| < t^*$, we write
\[
\left| \frac{\beta(P_{t,s'}) - \beta_0}{t} - E_0[\varphi s'] \right| \le \underbrace{\frac{|\beta(P_{t,s'}) - \beta(P_{t,g})|}{|t|}}_{\text{(I)}} + \underbrace{\left| \frac{\beta(P_{t,g}) - \beta_0}{t} - E_0[\varphi g] \right|}_{\text{(II)}} + \underbrace{\big| E_0[\varphi (s' - g)] \big|}_{\text{(III)}}.
\]
For term (I), we know $\limsup_{t \to 0} \text{(I)} \le \frac{c\epsilon}{2\sqrt{2}}$. For term (II), the score $g$ lies in $S$, so we know from above that $\lim_{t \to 0} \text{(II)} = 0$. For term (III), by Cauchy-Schwarz,
\[
\big| E_0[\varphi (s' - g)] \big| \le \|\varphi\|_{L_2(P_0)} \cdot \|s' - g\|_{L_2(P_0)} \le \|\varphi\|_{L_2(P_0)} \cdot \epsilon.
\]
Combining the three terms, we arrive at
\[
\limsup_{t \to 0} \left| \frac{\beta(P_{t,s'}) - \beta_0}{t} - E_0[\varphi s'] \right| \le \epsilon \left( \frac{c}{2\sqrt{2}} + \|\varphi\|_{L_2(P_0)} \right).
\]
Since $\epsilon$ was arbitrary, the left-hand side equals zero. Therefore,
\[
\frac{d}{dt} \beta(P_{t,s'})\Big|_{t=0} = E_0[\varphi(Z) s'(Z)].
\]
Since $s' \in \mathcal{S}$ was arbitrary, Definition 4 is satisfied and $\beta$ is pathwise differentiable at $P_0$ with influence function $\varphi$.

Remark 3 (Hellinger Lipschitz). The extension from bounded scores to all scores in the proof of Theorem 1 adapts an argument from Luedtke and Chung [2024], who use a Hellinger Lipschitz condition to establish pathwise differentiability of Hilbert-valued parameters from a score-dense class of submodels (their Lemma 2).
The first part of the proof, which establishes the derivative representation on the dense class from Neyman orthogonality, is specific to the present setting.

3.2 Pathwise Differentiability Implies Neyman Orthogonality

We now prove the converse: if $m$ is a correctly specified estimating function whose value at the truth is an influence function, then $m$ is Neyman orthogonal and its sensitivity to perturbations of $\beta$ is pinned at unit rate by the pathwise derivative. Unlike the forward direction, this requires the local product structure of Assumption 1 in order to specialize Equation (1) from the proof of Theorem 1 to each coordinate axis independently.

Assumption 11 (Pathwise differentiability and influence function representation). The functional $\beta$ is pathwise differentiable at $P_0$ (Definition 4) with influence function $\varphi(Z) \equiv m(Z; \beta_0, \eta_0)$.

Assumption 12 (Local product structure). Assumption 1 holds. We denote the score of the $\beta$-coordinate submodel by $s_\beta$ and the score of the $\eta$-coordinate submodel in direction $h$ by $s_h$.

Assumption 13 (Regularity along coordinate submodels). For each coordinate submodel $t \mapsto P_{t,s}$ from Assumption 12, the score $s$ is bounded and the function $f_{t,s}(Z) := m(Z; \beta_{t,s}, \eta_{t,s})$ satisfies the conditions of Lemma 3.

Theorem 2 (Pathwise differentiability implies Neyman orthogonality). Under Assumptions 4, 6, and 11 through 13, the estimating function $m$ satisfies:

1. Neyman orthogonality. For all $h \in \dot{\mathcal{H}}$,
\[
\frac{\partial}{\partial \eta} E_0[m(Z; \beta_0, \eta)]\Big|_{\eta = \eta_0}[h] = 0.
\]
2. $-1$ normalization. $G := E_0[\partial_\beta m(Z; \beta_0, \eta_0)] = -1$.

Proof. The coordinate submodels furnished by Assumption 12 are regular submodels through $P_0$, and their induced coordinate paths are differentiable at $t = 0$ by construction, so they satisfy Assumption 2.
Together with correct specification (Assumption 4), Fréchet differentiability (Assumption 6), and the regularity conditions of Assumption 13, the derivation in the proof of Theorem 1 leading to Equation (1) applies to each coordinate submodel $t \mapsto P_{t,s}$ with score $s$:
\[ 0 = E_0[m(Z; \beta_0, \eta_0) s(Z)] + G \dot\beta_{0,s} + \frac{\partial}{\partial \eta} E_0[m(Z; \beta_0, \eta)] \Big|_{\eta = \eta_0}[\dot\eta_{0,s}]. \tag{2} \]
By the influence function representation (Assumption 11), the first term equals $\dot\beta_{0,s} = \frac{d}{dt}\beta(P_{t,s})\big|_{t=0}$, so (2) becomes
\[ (1 + G)\, \dot\beta_{0,s} + \frac{\partial}{\partial \eta} E_0[m(Z; \beta_0, \eta)] \Big|_{\eta = \eta_0}[\dot\eta_{0,s}] = 0. \tag{3} \]
We now specialize (3) to each coordinate submodel.

Part 1. Fix $h \in \dot H$ and take $t \mapsto P_{t,s_h}$ to be the $\eta$-coordinate submodel from Assumption 12. By Assumption 12, $\dot\beta_{0,s_h} = 0$ and $\dot\eta_{0,s_h} = h$. Substituting into (3),
\[ \frac{\partial}{\partial \eta} E_0[m(Z; \beta_0, \eta)] \Big|_{\eta = \eta_0}[h] = 0. \]
Since $h \in \dot H$ was arbitrary, Neyman orthogonality holds.

Part 2. Take $t \mapsto P_{t,s_\beta}$ to be the $\beta$-coordinate submodel from Assumption 12. By Assumption 12, $\dot\beta_{0,s_\beta} = 1$ and $\dot\eta_{0,s_\beta} = 0$. Since the Gâteaux derivative in (3) is evaluated at direction $\dot\eta_{0,s_\beta} = 0$, the numerator of the defining difference quotient vanishes identically, which leaves $(1 + G) \cdot 1 = 0$, hence $G = -1$.

Remark 4 (Structural comparison with the forward direction). The forward and reverse directions share the same intermediate identity (1), but differ in what is known and what is derived. In the forward direction, Neyman orthogonality eliminates the nuisance term, and the resulting inner-product representation $\dot\beta_{0,s} = E_0[\varphi s]$ for every score $s \in S$ yields pathwise differentiability.
In the reverse direction, the influence function representation converts the first term into $\dot\beta_{0,s}$, and product structure allows one to specialize the resulting identity (3) to each coordinate axis independently, yielding Neyman orthogonality and $G = -1$. The two directions also place different requirements on the submodels. In Theorem 1, the coordinate path $(\beta_{t,s}, \eta_{t,s})$ arises from evaluating the functionals $\beta$ and $\eta$ along regular submodels from the dense class in Assumption 5. In Theorem 2, we must construct submodels with prescribed first-order coordinate behavior: one along which $\dot\beta_{0,s} = 1$, $\dot\eta_{0,s} = 0$, and one with $\dot\beta_{0,s} = 0$, $\dot\eta_{0,s} = h$.

Remark 5 (Sharpness of the product structure assumption). When $\beta$ factors through $\eta$, as when $\beta(P) = \int p^2 \, d\nu$ with $\eta = p$, the target parameter carries no degrees of freedom beyond those already encoded in the nuisance. Holding $\eta$ fixed necessarily holds $\beta$ fixed, and no $\beta$-coordinate submodel of the kind required by Assumption 1 can exist. Pathwise differentiability still holds in this example, but the estimating function defined by the influence function is not Neyman orthogonal. The reverse direction of the equivalence thus requires the product structure of Assumption 1 and does not follow from pathwise differentiability alone.

Remark 6 (The $-1$ normalization). The $-1$ normalization follows naturally as a structural consequence of pathwise differentiability and the coordinate geometry of the model. Along the $\beta$-coordinate submodel $t \mapsto P_{t,s_\beta}$, the parameter $\beta$ increases at unit rate by construction, the influence function representation gives $E_0[m(Z; \beta_0, \eta_0) s_\beta(Z)] = 1$, and (3) forces $1 + G = 0$. A first-order Taylor expansion gives
\[ E_0[m(Z; \beta, \eta_0)] \approx E_0[m(Z; \beta_0, \eta_0)] + (-1) \cdot (\beta - \beta_0) = -(\beta - \beta_0), \]
so that $E_0[m(Z; \beta, \eta_0)] = 0$ has the unique local solution $\beta = \beta_0$, as desired.
This also sheds light on a familiar pattern in semiparametric inference, where many influence functions take the form $\varphi(Z) = (\text{data-dependent term}) - \beta_0$. The normalization requires $\beta$ to enter the expected estimating function with first-order sensitivity exactly $-1$, which is realized by subtracting off $\beta$.

To illustrate that the conditions of Theorems 1 and 2 can be verified in a standard setting, we work through an example of the average treatment effect in detail in Appendix C, constructing the required coordinate submodels explicitly and checking each assumption.

4 Discussion

In this paper, we have established a precise equivalence between Neyman orthogonality and pathwise differentiability in nonparametric models, building on the foundational semiparametric theory of Bickel et al. [1998], van der Laan and Robins [2003], and Tsiatis [2006], and connecting it to the modern debiased machine learning framework of Chernozhukov et al. [2018]. Our forward theorem shows that under mild conditions, Neyman orthogonality implies pathwise differentiability, and our converse shows that the reverse implication also holds, but requires the additional geometric condition of local product structure.

Several directions remain open. Foremost, the regularity conditions we impose, notably the existence of coordinate submodels witnessing local product structure, can be nontrivial to verify in complex semiparametric problems, such as those involving constrained nuisance spaces or functionals defined through implicit equations. That said, the conditions we require are mild, amounting to smoothness of the estimating function and the ability to perturb the target and nuisance parameters independently, and we expect the equivalence to hold broadly in the semiparametric settings most commonly encountered in practice.
Relaxing these conditions, extending the equivalence to settings with non-smooth functionals, and developing systematic tools for constructing coordinate submodels in applied problems would be natural next steps.

References

Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1–C68, 2018.

Mark J. van der Laan and James M. Robins. Unified Methods for Censored Longitudinal Data and Causality. Springer New York, 2003.

Peter J. Bickel, Chris A.J. Klaassen, Ya'acov Ritov, and Jon A. Wellner. Efficient and Adaptive Estimation for Semiparametric Models. Springer New York, 1998.

A. W. van der Vaart. Asymptotic Statistics. Cambridge University Press, 1998.

Anastasios Tsiatis. Semiparametric Theory and Missing Data. Springer New York, 2006.

Alex Luedtke and Incheoul Chung. One-step estimation of differentiable Hilbert-valued parameters. The Annals of Statistics, 52(4), 2024.

A Proofs of Lemmas

A.1 Proof of Lemma 1

We first verify that $p_t$ is a density. For $|t| < 1/M$, we have $1 + tg(z) > 1 - |t|M > 0$ $\nu$-a.e., so $p_t \ge 0$. Also
\[ \int p_t \, d\nu = \int p_0 (1 + tg) \, d\nu = 1 + t \int g \, dP_0 = 1 + t E_0[g] = 1. \]
We next show that it satisfies the QMD expansion. Write $\sqrt{p_t} = \sqrt{p_0} \sqrt{1 + tg}$. Define
\[ r(u) := \sqrt{1 + u} - 1 - \tfrac{1}{2} u, \qquad u \in (-1, 1). \]
Then $r(0) = r'(0) = 0$. Since $\sqrt{1 + u}$ has bounded second derivative on $[-1/2, 1/2]$, there exists $C < \infty$ such that $|r(u)| \le C u^2$ for $|u| \le 1/2$. For $|t| \le 1/(2M)$ we have $|tg| \le 1/2$, hence
\[ \sqrt{1 + tg} = 1 + \tfrac{t}{2} g + r(tg). \]
Therefore
\[ \frac{\sqrt{p_t} - \sqrt{p_0}}{t} - \tfrac{1}{2} g \sqrt{p_0} = \sqrt{p_0} \cdot \frac{r(tg)}{t}. \]
Using $|r(tg)| \le C t^2 g^2$, we have $|r(tg)/t| \le C |t| g^2 \le C |t| M^2$.
Hence
\[ \int \left( \frac{\sqrt{p_t} - \sqrt{p_0}}{t} - \tfrac{1}{2} g \sqrt{p_0} \right)^2 d\nu \le \int p_0 \left( C |t| M^2 \right)^2 d\nu = C^2 t^2 M^4 \to 0 \]
as $t \to 0$, and the path is QMD with score $s \equiv g$.

A.2 Proof of Lemma 2

Write $P_t \equiv P_{t,s}$ and $p_t = dP_t/d\nu$ throughout. We have $E_{P_t}[f] = \int f p_t \, d\nu$. Then
\[ E_{P_t}[f] - E_0[f] = \int f (p_t - p_0) \, d\nu = \int f (\sqrt{p_t} - \sqrt{p_0})(\sqrt{p_t} + \sqrt{p_0}) \, d\nu. \]
Dividing by $t$,
\[ \frac{E_{P_t}[f] - E_0[f]}{t} = \int f \left( \frac{\sqrt{p_t} - \sqrt{p_0}}{t} \right) (\sqrt{p_t} + \sqrt{p_0}) \, d\nu. \]
Let $\Delta_t := \frac{\sqrt{p_t} - \sqrt{p_0}}{t} - \tfrac{1}{2} s \sqrt{p_0}$. By QMD (Definition 1), $\|\Delta_t\|_{L_2(\nu)} \to 0$. Decompose
\[ \frac{E_{P_t}[f] - E_0[f]}{t} = I_{t,1} + I_{t,2}, \]
where
\[ I_{t,1} := \int f \left( \tfrac{1}{2} s \sqrt{p_0} \right) (\sqrt{p_t} + \sqrt{p_0}) \, d\nu, \qquad I_{t,2} := \int f \Delta_t (\sqrt{p_t} + \sqrt{p_0}) \, d\nu. \]
Term $I_{t,1}$. By QMD (Definition 1) and the triangle inequality, $\sqrt{p_t} \to \sqrt{p_0}$ in $L_2(\nu)$. To see this, there exists a function $g = \tfrac{1}{2} s \sqrt{p_0}$ such that
\[ \left\| \frac{\sqrt{p_t} - \sqrt{p_0}}{t} - g \right\|_{L_2(\nu)} \to 0, \]
which is equivalent to saying that for any $\epsilon > 0$ there exists $\delta > 0$ such that for $|t| < \delta$,
\[ \left\| \frac{\sqrt{p_t} - \sqrt{p_0}}{t} - g \right\|_{L_2(\nu)} < \epsilon. \]
By the triangle inequality, for $|t| < \delta$,
\[ \left\| \frac{\sqrt{p_t} - \sqrt{p_0}}{t} \right\|_{L_2(\nu)} \le \left\| \frac{\sqrt{p_t} - \sqrt{p_0}}{t} - g \right\|_{L_2(\nu)} + \|g\|_{L_2(\nu)} < \epsilon + \|g\|_{L_2(\nu)}, \]
where the right-hand side does not depend on $t$. Multiplying both sides by $|t|$ and taking the limit as $t \to 0$ yields the result. Thus $\sqrt{p_t} + \sqrt{p_0} \to 2\sqrt{p_0}$ in $L_2(\nu)$. Since $f s \sqrt{p_0} \in L_2(\nu)$, as
\[ \int f^2 s^2 p_0 \, d\nu = E_0[f^2 s^2] \le M^2 E_0[f^2] < \infty \]
where $M := \|s\|_\infty$, we have
\[ I_{t,1} \to \int f \left( \tfrac{1}{2} s \sqrt{p_0} \right) 2 \sqrt{p_0} \, d\nu = \int f s p_0 \, d\nu = E_0[f s]. \]
Term $I_{t,2}$. By Cauchy–Schwarz,
\[ |I_{t,2}| \le \| f (\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)} \cdot \|\Delta_t\|_{L_2(\nu)}. \]
We already have $\|\Delta_t\|_{L_2(\nu)} \to 0$. It remains to show that $\| f (\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)}$ is bounded for small $t$.
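The QMD remainder bound just established can be observed numerically for a linear tilt on a finite sample space. This is an illustrative sketch with arbitrarily chosen `p0` and `g`: the $L_2(\nu)$ norm of the remainder should shrink linearly in $t$, matching the $C^2 t^2 M^4$ bound on its square.

```python
import numpy as np

# Three-point sample space; all norms below are exact finite sums.
p0 = np.array([0.5, 0.3, 0.2])
g = np.array([1.0, -0.5, 2.0])
g = g - np.dot(p0, g)          # center so that E0[g] = 0

def qmd_remainder(t):
    """L2(nu) norm of (sqrt(p_t) - sqrt(p0))/t - (1/2) g sqrt(p0) for the linear tilt."""
    pt = p0 * (1.0 + t * g)    # a density for small |t| since g is bounded
    diff = (np.sqrt(pt) - np.sqrt(p0)) / t - 0.5 * g * np.sqrt(p0)
    return np.sqrt(np.sum(diff ** 2))

errs = [qmd_remainder(t) for t in (1e-1, 1e-2, 1e-3)]
print(errs)   # shrinks roughly by a factor of 10 per step, i.e. O(t)
```

Each factor-of-ten reduction in $t$ reduces the remainder by roughly the same factor, confirming the $O(|t|)$ rate in the proof of Lemma 1.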
Write
\[ \| f (\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)}^2 = \int f^2 (\sqrt{p_t} + \sqrt{p_0})^2 \, d\nu \le 2 \int f^2 (p_t + p_0) \, d\nu = 2 \left\{ E_{P_t}[f^2] + E_0[f^2] \right\}. \]
Under QMD, we know $P_t \to P_0$ in Hellinger distance, and
\[ \big| E_{P_t}[f^2] - E_0[f^2] \big| = \left| \int f^2 (p_t - p_0) \, d\nu \right| \le \|f^2\|_\infty \int |p_t - p_0| \, d\nu \le \|f^2\|_\infty \cdot 2 \|\sqrt{p_t} - \sqrt{p_0}\|_{L_2(\nu)} \to 0, \]
where the first inequality follows from the boundedness assumption on $f$ and the second from $\mathrm{TV}(P_t, P_0) \le 2\sqrt{2}\, H(P_t, P_0)$. Hence $E_{P_t}[f^2] \to E_0[f^2]$ as $t \to 0$ and $I_{t,2} \to 0$. Combining the results above yields the desired derivative.

A.3 Proof of Lemma 3

Write $P_t \equiv P_{t,s}$ and $p_t = dP_t/d\nu$ throughout. We have
\[ E_{P_t}[f_t] - E_0[f_0] = \left\{ E_{P_t}[f_0] - E_0[f_0] \right\} + E_{P_t}[f_t - f_0]. \]
Dividing by $t$,
\[ \frac{E_{P_t}[f_t] - E_0[f_0]}{t} = \frac{E_{P_t}[f_0] - E_0[f_0]}{t} + E_{P_t}\left[ \frac{f_t - f_0}{t} \right]. \]
By Lemma 2, the first term converges to $E_0[f_0 s]$. For the second term, let $g_t := (f_t - f_0)/t$. By Assumption (2), $g_t \to \dot f_0$ in $L_2(P_0)$, so $E_0[g_t] \to E_0[\dot f_0]$. It remains to show $E_{P_t}[g_t] - E_0[g_t] \to 0$. Write
\[ E_{P_t}[g_t] - E_0[g_t] = \int g_t (p_t - p_0) \, d\nu = \int g_t (\sqrt{p_t} - \sqrt{p_0})(\sqrt{p_t} + \sqrt{p_0}) \, d\nu. \]
By Cauchy–Schwarz,
\[ | E_{P_t}[g_t] - E_0[g_t] | \le \| g_t (\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)} \cdot \| \sqrt{p_t} - \sqrt{p_0} \|_{L_2(\nu)}. \]
By QMD (Definition 1), $\| \sqrt{p_t} - \sqrt{p_0} \|_{L_2(\nu)} \to 0$. It suffices to show that $\| g_t (\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)}$ is bounded for small $t$. By $(a + b)^2 \le 2(a^2 + b^2)$,
\[ \| g_t (\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)}^2 \le 2 \left\{ E_{P_t}[g_t^2] + E_0[g_t^2] \right\}. \]
By Assumption (3), $E_{P_t}[g_t^2]$ is uniformly bounded for small $t$; $E_0[g_t^2]$ is also bounded since $g_t \to \dot f_0$ in $L_2(P_0)$. Hence the right-hand side is bounded and $E_{P_t}[g_t] - E_0[g_t] \to 0$. Therefore $E_{P_t}[g_t] \to E_0[\dot f_0]$. Collecting both terms yields the desired identity.
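The derivative identity of Lemma 2 can be seen in its simplest form on a finite sample space with a linear tilt, where the difference quotient is not merely convergent but exact: $E_{P_t}[f] = E_0[f] + t E_0[fs]$ for every $t$. This is a sketch with arbitrarily chosen `p0`, `s`, and `f`.

```python
import numpy as np

p0 = np.array([0.4, 0.35, 0.25])
s = np.array([1.0, -2.0, 0.5])
s = s - np.dot(p0, s)            # mean-zero score under P0
f = np.array([3.0, -1.0, 2.0])   # bounded test function

t = 0.05
pt = p0 * (1.0 + t * s)          # linear tilt submodel at parameter t

# Difference quotient (E_{P_t}[f] - E_0[f]) / t versus the Lemma 2 limit E_0[f s].
diff_quot = (np.dot(pt, f) - np.dot(p0, f)) / t
deriv = np.dot(p0, f * s)
print(diff_quot, deriv)          # identical up to floating-point rounding
```

For general QMD submodels the agreement holds only in the limit $t \to 0$; linearity of the tilt makes the remainder vanish identically.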
A.4 Proof of Lemma 5

By Fréchet differentiability (Assumption 3),
\[ m(\cdot\,; \beta_{t,s}, \eta_{t,s}) - m(\cdot\,; \beta_0, \eta_0) = D_\beta m_0 (\beta_{t,s} - \beta_0) + D_\eta m_0 (\eta_{t,s} - \eta_0) + r_{t,s}, \]
where $\| r_{t,s} \|_{L_2(P_0)} = o(|\beta_{t,s} - \beta_0| + \|\eta_{t,s} - \eta_0\|_V)$. Dividing by $t$ and subtracting $\dot f_{0,s}$,
\[ \frac{f_{t,s} - f_0}{t} - \dot f_{0,s} = D_\beta m_0 \left( \frac{\beta_{t,s} - \beta_0}{t} - \dot\beta_{0,s} \right) + D_\eta m_0 \left( \frac{\eta_{t,s} - \eta_0}{t} - \dot\eta_{0,s} \right) + \frac{r_{t,s}}{t}. \]
Take $L_2(P_0)$ norms. By boundedness of $D_\beta m_0$ and $D_\eta m_0$, there exist constants $C_\beta, C_\eta$ such that
\[ \left\| \frac{f_{t,s} - f_0}{t} - \dot f_{0,s} \right\|_{L_2(P_0)} \le C_\beta \left| \frac{\beta_{t,s} - \beta_0}{t} - \dot\beta_{0,s} \right| + C_\eta \left\| \frac{\eta_{t,s} - \eta_0}{t} - \dot\eta_{0,s} \right\|_V + \left\| \frac{r_{t,s}}{t} \right\|_{L_2(P_0)}. \]
The first two terms tend to zero by Assumption 2. For the remainder, Assumption 2 implies $|\beta_{t,s} - \beta_0| + \|\eta_{t,s} - \eta_0\|_V = O(|t|)$, so $\| r_{t,s}/t \|_{L_2(P_0)} = o(1)$, which proves the claim.

A.5 Hellinger Gap between Regular Submodels

Lemma 6. Let $t \mapsto P_{t,s}$ be a regular submodel with score $s$ and $t \mapsto P_{t,g}$ be a regular submodel with score $g$. Then
\[ \limsup_{t \to 0} \frac{H(P_{t,s}, P_{t,g})}{|t|} \le \frac{1}{2\sqrt{2}} \| s - g \|_{L_2(P_0)}. \]
Proof. By QMD, we have
\[ \sqrt{p_{t,s}} = \sqrt{p_0} \left( 1 + \tfrac{t}{2} s \right) + r_t, \qquad \| r_t \|_{L_2(\nu)} = o(|t|), \]
\[ \sqrt{p_{t,g}} = \sqrt{p_0} \left( 1 + \tfrac{t}{2} g \right) + \tilde r_t, \qquad \| \tilde r_t \|_{L_2(\nu)} = o(|t|). \]
Subtracting,
\[ \sqrt{p_{t,s}} - \sqrt{p_{t,g}} = \tfrac{t}{2} (s - g) \sqrt{p_0} + (r_t - \tilde r_t). \]
Taking $L_2(\nu)$ norms and using the triangle inequality,
\[ \| \sqrt{p_{t,s}} - \sqrt{p_{t,g}} \|_{L_2(\nu)} \le \frac{|t|}{2} \| (s - g) \sqrt{p_0} \|_{L_2(\nu)} + \| r_t \|_{L_2(\nu)} + \| \tilde r_t \|_{L_2(\nu)}. \]
Now $\| (s - g) \sqrt{p_0} \|_{L_2(\nu)}^2 = \int (s - g)^2 p_0 \, d\nu = E_0[(s - g)^2] = \| s - g \|_{L_2(P_0)}^2$. Dividing by $|t|$,
\[ \frac{\| \sqrt{p_{t,s}} - \sqrt{p_{t,g}} \|_{L_2(\nu)}}{|t|} \le \frac{1}{2} \| s - g \|_{L_2(P_0)} + \frac{\| r_t \|_{L_2(\nu)} + \| \tilde r_t \|_{L_2(\nu)}}{|t|}. \]
Since $\| r_t \|_{L_2(\nu)} = o(|t|)$ and $\| \tilde r_t \|_{L_2(\nu)} = o(|t|)$, the second term vanishes as $t \to 0$.
Using the definition $H \equiv \frac{1}{\sqrt{2}} \| \cdot \|_{L_2(\nu)}$,
\[ \limsup_{t \to 0} \frac{H(P_{t,s}, P_{t,g})}{|t|} \le \frac{1}{2\sqrt{2}} \| s - g \|_{L_2(P_0)}. \]

B Variation Independence and Product Structure

Assumption 1 requires that, for each coordinate direction, there exists a regular submodel through $P_0$ along which the induced coordinate path moves one of $\beta$ or $\eta$ to first order while holding the other fixed. A natural question is how this relates to the classical notion of local variation independence, which asks that the attainable parameter set contain a product neighborhood of $(\beta_0, \eta_0)$. Local variation independence guarantees that independently varied parameter values exist, but it is purely set-theoretic and does not ensure that they are connected by submodels regular enough to differentiate along. In this appendix, we formalize the distinction between these two conditions and examine the role of product structure in the classical results of van der Laan and Robins [2003].

B.1 Local Variation Independence

As mentioned in Definition 3, we are primarily concerned with regular submodels along which $\beta$ has derivative zero at the truth, while $\eta$ is free to vary. The obvious question is whether such paths can always be constructed, i.e., whether one can perturb $\eta$ while holding $\beta$ fixed. If the chosen nuisance functional $\eta$ already determines $\beta$, for instance if $\beta = g(\eta)$ for some known map $g$, then varying $\eta$ necessarily changes $\beta$, and the two functionals cannot be perturbed independently.

Definition 7 (Local Variation Independence). We say that $\beta$ and $\eta$ are locally variation independent at $P_0$ if there exist neighborhoods $U \ni \beta_0$ and $V \ni \eta_0$ such that
\[ U \times V \subset \Theta := \{ (\beta(P), \eta(P)) : P \in \mathcal{P} \}, \]
that is, the attainable parameter set $\Theta$ contains a product neighborhood of $(\beta_0, \eta_0)$.
In words, near $(\beta_0, \eta_0)$ there is a full interval of $\beta$-values and a full neighborhood of $\eta$-values such that every combination of the two is realized by some $P \in \mathcal{P}$. The consequence is that one can vary $\beta$ while holding $\eta$ fixed, and vice versa. That is, for sufficiently small $t$, the pairs $(\beta_0 + t, \eta_0)$ and $(\beta_0, \eta_0 + th)$ are both attainable, meaning there exist distributions in $\mathcal{P}$ realizing those functional values. Without such a product neighborhood, the attainable pairs near $(\beta_0, \eta_0)$ could lie along a lower-dimensional surface, so that changing $\beta$ might force $\eta$ to change as well.

Crucially, however, local variation independence is purely a set-theoretic statement about the attainable set $\Theta$. The condition guarantees that for each small $t$, there exists at least one distribution $P \in \mathcal{P}$ with $(\beta(P), \eta(P)) = (\beta_0 + t, \eta_0)$. However, this is only a pointwise existence guarantee and imposes no regularity on how such choices may depend on $t$. In particular, local variation independence does not imply that there exists a map $t \mapsto P_t \in \mathcal{P}$ satisfying $(\beta(P_t), \eta(P_t)) = (\beta_0 + t, \eta_0)$ that is quadratic-mean differentiable at $t = 0$.

Assumption 14 (Regular coordinate submodels). For every admissible direction $h \in \dot H$, the paths $t \mapsto P_{\beta_0 + t, \eta_0}$ and $t \mapsto P_{\beta_0, \eta_0 + th}$ exist in $\mathcal{P}$ for sufficiently small $|t|$ and are regular (QMD) submodels through $P_0$ at $t = 0$.

Proposition 1. If $\beta$ and $\eta$ are locally variation independent at $P_0$ (Definition 7) and satisfy coordinate QMD smoothness (Assumption 14), then local product structure (Assumption 1) holds.

Proof. Local variation independence provides neighborhoods $U \ni \beta_0$ and $V \ni \eta_0$ with $U \times V \subseteq \Theta$. For small $|t|$, the parameter values $(\beta_0 + t, \eta_0)$ and $(\beta_0, \eta_0 + th)$ lie in $U \times V$ and hence correspond to distributions in $\mathcal{P}$.
Assumption 14 asserts that the resulting paths are differentiable in quadratic mean at $t = 0$, giving exactly the conditions of Assumption 1.

We note that Assumption 1 is strictly weaker than this combination in two respects: it requires neither a full product neighborhood in the parameter space nor exact coordinate paths, only regular submodels with the correct first-order coordinate derivatives at $P_0$.

B.2 Revisiting the Gradient Characterization

As discussed in Section 2.2.1, the distinction between the set-theoretic content of local variation independence and the analytic content of Assumption 1 is subtle, and it is natural to ask whether this distinction matters in practice. We show that the answer is affirmative by revisiting the classical results of van der Laan and Robins [2003, Section 1.4], which connect influence functions to estimating functions. Their framework contains the essential insight that underpins the equivalence we formalize in Section 3. However, the regularity of submodels that perturb $\beta$ and $\eta$ independently, which we have isolated as Assumption 1, plays an important role in their argument that was not separately identified. Making this explicit is the purpose of the present subsection.

We focus on two results from van der Laan and Robins [2003]: their Lemma 1.2, which characterizes gradients through the derivative of an expected estimating function along arbitrary submodels, and their Lemma 1.3, which establishes that the derivative of the expected estimating function with respect to $\beta$ at fixed $\eta_0$ equals $-1$. This latter result is the key step that links influence functions to estimating functions and underpins the construction of efficient estimators via solving moment conditions.
We will show that the proof of Lemma 1.3 contains an implicit step, replacing the varying nuisance $\eta(P_{t,s})$ by the fixed value $\eta_0$ inside a derivative, that requires the nuisance tangent space to capture all nuisance directions, which in turn requires the local product structure of Assumption 1.

For the reader's convenience, we state the relevant results in our notation. The correspondence with van der Laan and Robins [2003] is:
\[ P_0 \leftrightarrow F_X, \quad \beta \leftrightarrow \mu, \quad \eta \leftrightarrow \rho, \quad E_0 \leftrightarrow E_{F_X}, \quad \{P_{t,s}\} \leftrightarrow \{F_{\epsilon,s}\}, \quad \varphi^* \leftrightarrow S^{*}_{F,\mathrm{eff}}, \quad \Lambda \leftrightarrow T_{F,\mathrm{nuis}}, \quad \mathcal{P} \leftrightarrow \mathcal{M}_F. \]

Setup. The framework of van der Laan and Robins [2003] posits a class of estimating functions indexed by an abstract label $k$, mapping each distribution in the model to a mean-zero function of the data. The key structural requirement is that these estimating functions, evaluated at the true parameter values, span the orthogonal complement $\Lambda^\perp$ of the nuisance tangent space. Since influence functions are orthogonal to $\Lambda$ by Lemma 4, this ensures that every candidate influence function is representable as an estimating function and, combined with unbiasedness along submodels, allows one to recover the inner-product characterization linking estimating functions to gradients (Lemma 7). We collect the precise conditions as follows.

Assumption 15 (Estimating function representation). Suppose there exists an abstract index set $\mathcal{K}$ and a mapping $(k, \beta, \eta) \mapsto D_k(\cdot \mid \beta, \eta)$ from $\mathcal{K} \times \Theta$ into functions of $Z$ such that:

1. Unbiased estimating function. $E_P[D_k(Z \mid \beta(P), \eta(P))] = 0$ for all $P \in \mathcal{P}$ and all $k \in \mathcal{K}$.

2. Richness. The index set $\mathcal{K}$ is rich enough that, at $P_0$,
\[ \Lambda^\perp = \{ D_k(\cdot \mid \beta_0, \eta_0) : k \in \mathcal{K}(P_0) \}, \]
where $\mathcal{K}(P_0) \subseteq \mathcal{K}$ is the index set at $P_0$ and $\Lambda^\perp$ denotes the orthogonal complement of the nuisance tangent space $\Lambda$ (Definition 3) inside $L_2^0(P_0)$.

3. Continuity along submodels.
For all $k \in \mathcal{K}(P_0)$ and each regular submodel $\{P_{t,s}\}$ with score $s \in S$,
\[ \| D_k(\cdot \mid \beta(P_{t,s}), \eta(P_{t,s})) - D_k(\cdot \mid \beta_0, \eta_0) \|_{L_2(P_0)} \to 0 \quad \text{as } t \to 0. \]

4. Pathwise differentiability. $\beta$ is pathwise differentiable at $P_0$ with efficient influence function $\varphi^*$, and $\langle \varphi^* \rangle \subset S$, where $\langle \varphi^* \rangle$ denotes the one-dimensional span of $\varphi^*$.

5. Uniform boundedness. For all $k \in \mathcal{K}(P_0)$, there exist $C < \infty$ and a neighborhood $N$ of $(\beta_0, \eta_0)$ such that $\sup_{z \in \mathcal{Z},\, (\beta, \eta) \in N} | D_k(z \mid \beta, \eta) | \le C$.

Gradient characterization (Lemma 1.2 of van der Laan and Robins [2003]). The first result characterizes which estimating functions are gradients. The idea is as follows: by the unbiasedness condition (i) of Assumption 15, the expectation of $D_k$ under $P_{t,s}$ vanishes identically along any regular submodel. Differentiating this identity at $t = 0$ recovers an inner-product representation that determines when an estimating function is an influence function.

Lemma 7 (Gradient characterization; Lemma 1.2 of van der Laan and Robins [2003]). Under Assumption 15, define
\[ f_k(s) := \frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] \Big|_{t=0}. \]
Then an element $D = D_k(\cdot \mid \beta_0, \eta_0) \in \Lambda^\perp$ for $k \in \mathcal{K}(P_0)$ is a gradient if and only if
\[ f_k(s) = \begin{cases} 0 & \text{if } s \in S_{\mathrm{nuis}}, \\ -\frac{d}{dt} \beta(P_{t,s}) \big|_{t=0} & \text{if } s \in \langle \varphi^* \rangle. \end{cases} \]
Proof. By Assumption 15(i), $E_{P_{t,s}}[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] = 0$ for all sufficiently small $t$. Combined with $E_0[D_k(Z \mid \beta_0, \eta_0)] = 0$, we can write
\[ \frac{1}{t} E_0[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] = \frac{1}{t} \left\{ E_0[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] - E_{P_{t,s}}[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] \right\} = \int D_k(z \mid \beta(P_{t,s}), \eta(P_{t,s})) \, \frac{dP_0 - dP_{t,s}}{t}(z). \]
Define $g_t := D_k(\cdot \mid \beta(P_{t,s}), \eta(P_{t,s}))$ and $g_0 := D_k(\cdot \mid \beta_0, \eta_0)$.
Writing $dP_0 = p_0 \, d\nu$ and $dP_{t,s} = p_t \, d\nu$,
\[ \int g_t \cdot \frac{p_0 - p_t}{t} \, d\nu = -\int g_t \cdot \frac{\sqrt{p_t} - \sqrt{p_0}}{t} (\sqrt{p_t} + \sqrt{p_0}) \, d\nu = -\int g_0 \cdot \frac{\sqrt{p_t} - \sqrt{p_0}}{t} (\sqrt{p_t} + \sqrt{p_0}) \, d\nu - \int (g_t - g_0) \cdot \frac{\sqrt{p_t} - \sqrt{p_0}}{t} (\sqrt{p_t} + \sqrt{p_0}) \, d\nu. \]
The first integral converges to $E_0[g_0 s]$ by the same argument as in the proof of Lemma 2. For the second integral, Cauchy–Schwarz gives
\[ \left| \int (g_t - g_0) \cdot \frac{\sqrt{p_t} - \sqrt{p_0}}{t} (\sqrt{p_t} + \sqrt{p_0}) \, d\nu \right| \le \| (g_t - g_0)(\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)} \cdot \left\| \frac{\sqrt{p_t} - \sqrt{p_0}}{t} \right\|_{L_2(\nu)}, \]
where the second factor is bounded by QMD. For the first factor,
\[ \| (g_t - g_0)(\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)}^2 = \int (g_t - g_0)^2 (\sqrt{p_t} + \sqrt{p_0})^2 \, d\nu \le 2 \int (g_t - g_0)^2 (p_t + p_0) \, d\nu = 2 \left\{ E_{P_t}[(g_t - g_0)^2] + E_0[(g_t - g_0)^2] \right\}. \]
The term $E_0[(g_t - g_0)^2] \to 0$ by Assumption 15(iii). For $E_{P_t}[(g_t - g_0)^2]$, write
\[ E_{P_t}[(g_t - g_0)^2] = E_0[(g_t - g_0)^2] + \int (g_t - g_0)^2 (p_t - p_0) \, d\nu \le E_0[(g_t - g_0)^2] + 4C^2 \int |p_t - p_0| \, d\nu, \]
where the inequality uses Assumption 15(v), and $\int |p_t - p_0| \, d\nu \to 0$ again by QMD. Thus
\[ f_k(s) = -E_0[D_k(Z \mid \beta_0, \eta_0) \cdot s(Z)] = -\langle D_k(\cdot \mid \beta_0, \eta_0), s \rangle_{P_0}. \]
By definition, $D_k$ is a gradient if and only if the inner product equals zero for all $s \in S_{\mathrm{nuis}}$ and equals $\frac{d}{dt} \beta(P_{t,s}) |_{t=0}$ for $s \in \langle \varphi^* \rangle$. This is equivalent to the stated conditions on $f_k$.

The negative identity (Lemma 1.3 of van der Laan and Robins [2003]). The second result builds on Lemma 7 to establish that if $D_k(\cdot \mid \beta_0, \eta_0)$ is an influence function, then the partial derivative of $E_0[D_k(Z \mid \beta, \eta_0)]$ with respect to $\beta$ at $\beta_0$ equals $-1$. It is in the proof of this result that the regularity of coordinate submodels (Assumption 1) plays an important but implicit role.

Lemma 8 (Negative identity; Lemma 1.3 of van der Laan and Robins [2003]).
In addition to Assumption 15, assume that $\beta$ and $\eta$ are locally variation independent at $P_0$ (Definition 7), that $\beta \mapsto E_0[D_k(Z \mid \beta, \eta_0)]$ is differentiable at $\beta_0$ with nonzero derivative for all $k \in \mathcal{K}(P_0)$, and that $E_0[\varphi^*(Z)^2] > 0$. If $D_k(\cdot \mid \beta_0, \eta_0)$ is a gradient, then
\[ \frac{d}{d\beta} E_0[D_k(Z \mid \beta, \eta_0)] \Big|_{\beta = \beta_0} = -1. \]
Proof (as given by van der Laan and Robins [2003]). Let $s \in S$ be a scalar multiple of $\varphi^*$, say $s = c\varphi^*$ for some $c \ne 0$. Since $\varphi^* \in S$ by Assumption 15(iv), $s$ is the score of some regular submodel $\{P_{t,s}\}$ through $P_0$. Define
\[ h_{2,s}(t) := \beta(P_{t,s}), \qquad h_1(\beta) := E_0[D_k(Z \mid \beta, \eta_0)]. \]
The map $t \mapsto E_0[D_k(Z \mid \beta(P_{t,s}), \eta_0)]$ is the composition $h_1(h_{2,s}(t))$. As in the proof of Lemma 7,
\[ \frac{d}{dt} h_1(h_{2,s}(t)) \Big|_{t=0} = -\frac{d}{dt} \beta(P_{t,s}) \Big|_{t=0}. \tag{4} \]
By the chain rule, the left-hand side equals $h_1'(\beta_0) \cdot h_{2,s}'(0)$. Pathwise differentiability gives
\[ h_{2,s}'(0) = \frac{d}{dt} \beta(P_{t,s}) \Big|_{t=0} = E_0[\varphi^* s] = c\, E_0[(\varphi^*)^2] \ne 0. \]
So $h_1'(\beta_0) \cdot h_{2,s}'(0) = -h_{2,s}'(0)$. Since $h_{2,s}'(0) \ne 0$, it follows that $h_1'(\beta_0) = -1$.

The role of regularity. The proof invokes "as in the proof of Lemma 7" to claim (4), i.e.,
\[ \frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s}), \eta_0)] \Big|_{t=0} = -\frac{d}{dt} \beta(P_{t,s}) \Big|_{t=0}. \tag{5} \]
However, Lemma 7 actually established
\[ \frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] \Big|_{t=0} = -\frac{d}{dt} \beta(P_{t,s}) \Big|_{t=0}, \tag{6} \]
where $\eta(P_{t,s})$ varies with $t$. For (5) to follow from (6), one must show that replacing $\eta(P_{t,s})$ by the fixed value $\eta_0$ does not affect the derivative, i.e., that
\[ \frac{\partial}{\partial \eta} E_0[D_k(Z \mid \beta_0, \eta)] \Big|_{\eta = \eta_0}[h] = 0 \quad \forall h \in \dot H. \tag{7} \]
To see why (7) is needed, suppose that the map $(\beta, \eta) \mapsto E_0[D_k(Z \mid \beta, \eta)]$ is Fréchet differentiable at $(\beta_0, \eta_0)$.
The chain rule decomposes (6) as
\[ \frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] \Big|_{t=0} = \frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s}), \eta_0)] \Big|_{t=0} + \frac{\partial}{\partial \eta} E_0[D_k(Z \mid \beta_0, \eta)] \Big|_{\eta = \eta_0} \left[ \frac{d}{dt} \eta(P_{t,s}) \Big|_{t=0} \right], \]
so (5) follows from (6) if and only if the second term vanishes. Condition (7) guarantees this by requiring that the nuisance derivative of the expected estimating function vanish in every direction $h \in \dot H$.

We now show that establishing (7) requires Assumption 1. Apply Lemma 7 to a nuisance score $s_{\mathrm{nuis}} \in S_{\mathrm{nuis}}$. Since $D_k(\cdot \mid \beta_0, \eta_0)$ is an influence function, $f_k(s_{\mathrm{nuis}}) = 0$, i.e.,
\[ \frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s_{\mathrm{nuis}}}), \eta(P_{t,s_{\mathrm{nuis}}}))] \Big|_{t=0} = 0. \]
Assuming again Fréchet differentiability, the chain rule gives
\[ \frac{\partial}{\partial \beta} E_0[D_k(Z \mid \beta, \eta_0)] \Big|_{\beta = \beta_0} \cdot \frac{d}{dt} \beta(P_{t,s_{\mathrm{nuis}}}) \Big|_{t=0} + \frac{\partial}{\partial \eta} E_0[D_k(Z \mid \beta_0, \eta)] \Big|_{\eta = \eta_0} \left[ \frac{d}{dt} \eta(P_{t,s_{\mathrm{nuis}}}) \Big|_{t=0} \right] = 0. \]
Since $s_{\mathrm{nuis}}$ is a nuisance score, $\frac{d}{dt} \beta(P_{t,s_{\mathrm{nuis}}}) |_{t=0} = 0$, so the first term vanishes and we obtain
\[ \frac{\partial}{\partial \eta} E_0[D_k(Z \mid \beta_0, \eta)] \Big|_{\eta = \eta_0} \left[ \frac{d}{dt} \eta(P_{t,s_{\mathrm{nuis}}}) \Big|_{t=0} \right] = 0. \]
This establishes (7) only for those directions $h \in \dot H$ that arise as nuisance derivatives of submodels in $S_{\mathrm{nuis}}$. A priori, these nuisance derivatives populate some subset of $\dot H$, but there is no reason this subset should exhaust $\dot H$. Assumption 1 closes the remaining gap by furnishing, for each $h \in \dot H$, a regular submodel with $\frac{d}{dt} \beta(P_t) |_{t=0} = 0$ and $\frac{d}{dt} \eta(P_t) |_{t=0} = h$. Since $\frac{d}{dt} \beta(P_t) |_{t=0} = 0$, the score of this submodel is a nuisance score, and its nuisance derivative at $t = 0$ is exactly $h$. The argument above then yields (7) for this $h$. Since $h \in \dot H$ was arbitrary, (7) holds in full generality. With (7) in hand, the passage from (6) to (5) immediately follows.
For any score $s = c\varphi^*$, the same chain-rule decomposition used above gives
\begin{align*}
\frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] \Big|_{t=0} &= \frac{\partial}{\partial \beta} E_0[D_k(Z \mid \beta, \eta_0)] \Big|_{\beta = \beta_0} \cdot \frac{d}{dt} \beta(P_{t,s}) \Big|_{t=0} + \frac{\partial}{\partial \eta} E_0[D_k(Z \mid \beta_0, \eta)] \Big|_{\eta = \eta_0} \left[ \frac{d}{dt} \eta(P_{t,s}) \Big|_{t=0} \right] \\
&= \frac{\partial}{\partial \beta} E_0[D_k(Z \mid \beta, \eta_0)] \Big|_{\beta = \beta_0} \cdot \frac{d}{dt} \beta(P_{t,s}) \Big|_{t=0} \\
&= \frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s}), \eta_0)] \Big|_{t=0},
\end{align*}
where the second equality follows from (7), and the proof of the negative identity then proceeds as written.

Remark 7 (Fréchet differentiability). The chain-rule decompositions above require Fréchet differentiability of the map $(\beta, \eta) \mapsto E_0[D_k(Z \mid \beta, \eta)]$ at $(\beta_0, \eta_0)$, which is not explicitly stated in Lemma 1.3 of van der Laan and Robins [2003]. The paragraph immediately preceding Lemma 1.3 in their exposition, however, suggests that smoothness conditions should be jointly imposed on $\beta$ and $\eta$.

Remark 8 (Boundedness and the score definition). The reader may notice that Assumption 15(v) imposes uniform boundedness on the estimating functions, a condition not present in the corresponding result of van der Laan and Robins [2003]. The difference traces to the definition of scores. van der Laan and Robins [2003] define the score as the $L_2(P_0)$ limit of the density ratio $(p_t/p_0 - 1)/t$, which is strictly stronger than the quadratic mean differentiability (QMD) formulation of van der Vaart [1998] adopted here. Under their definition, the convergence in the proof of Lemma 7 follows from Cauchy–Schwarz in $L_2(P_0)$ alone. Under QMD, the same step requires decomposing through $(\sqrt{p_t} - \sqrt{p_0})(\sqrt{p_t} + \sqrt{p_0})$, and bounding the resulting cross term requires introducing the uniform boundedness condition. It should be noted that the uniform boundedness condition is an artifact of the QMD formulation and not a structural requirement of the argument.
We adopt QMD throughout to maintain a single consistent convention; the distinction between local product structure and variation independence arises independently of which score formulation is adopted.

C Example with the Average Treatment Effect

We now illustrate the equivalence results of Section 3 through a detailed worked example on the average treatment effect. For each direction of the equivalence, we verify every assumption and construct the required objects explicitly.

Setup. Let $Z = (Y, X, A)$ with confounders $X$, binary treatment $A \in \{0, 1\}$, and outcome $Y$. We assume that the standard causal assumptions of consistency, positivity, and no unmeasured confounding hold. We work in the nonparametric model $\mathcal{P}$ consisting of all densities $p$ with respect to a $\sigma$-finite dominating measure $\nu$ that satisfy the regularity conditions (R1)–(R2) below. We fix $P_0 \in \mathcal{P}$ and define the nuisance quantities
\[ \mu_a(x) := E_0[Y \mid X = x, A = a], \qquad \pi(x) := P_0(A = 1 \mid X = x), \]
the treatment effect function $\tau(x) := \mu_1(x) - \mu_0(x)$, and the conditional outcome variance $\sigma_a^2(x) := \mathrm{Var}_0(Y \mid X = x, A = a)$. The target and nuisance functionals are
\[ \beta(P) := E_P[\mu_1^P(X) - \mu_0^P(X)], \qquad \eta(P) := (\mu_1^P, \mu_0^P, \pi^P), \]
with $\beta_0 := \beta(P_0)$ and $\eta_0 := (\mu_1, \mu_0, \pi)$.

Regularity conditions. We further impose the following conditions, which ensure that the constructed submodels are well-behaved. Note that positivity already appeared as an identification assumption.

(R1) Positivity. There exists $\varepsilon > 0$ such that for all $P \in \mathcal{P}$, $\pi^P(x) \in [\varepsilon, 1 - \varepsilon]$ for $P_X$-a.s. $x$.

(R2) Bounded outcomes. There exists $C_Y < \infty$ such that for all $P \in \mathcal{P}$, $|Y| \le C_Y$ $P$-a.s.

(R3) Positive conditional variance. $\sigma_a^2(x) \ge \sigma^2 > 0$ for $a = 0, 1$, $P_0$-a.s.

(R4) Treatment effect heterogeneity. $\mathrm{Var}_0(\tau(X)) > 0$.
We also assume an interior positivity margin at $P_0$: there exists $\varepsilon' > \varepsilon$ such that $\pi(x) \in [\varepsilon', 1 - \varepsilon']$ $P_{0,X}$-a.s. This ensures that for any bounded mean-zero $g$, the linear tilt $P_t$ with density $p_0(1 + tg)$ remains in $\mathcal{P}$ for sufficiently small $|t|$, so the tangent space at $P_0$ is $T = L_2^0(P_0)$ by the same argument as Corollary 1.

Finally, we take the ambient normed space to be $V := L_\infty(P_{0,X})^3$ with the product supremum norm, and the nuisance parameter set to be
$$H := \Big\{(\mu_1, \mu_0, \pi) \in V : \operatorname*{ess\,inf}_{P_{0,X}} \pi > 0 \ \text{and}\ \operatorname*{ess\,inf}_{P_{0,X}} (1 - \pi) > 0\Big\}.$$
Since $|\mu_a(x)| \le C_Y$ $P_{0,X}$-a.s. by (R2) and $\pi \in [\varepsilon, 1 - \varepsilon]$ $P_{0,X}$-a.s. by (R1), we have $\eta_0 \in H$. Moreover, since $H$ is open in $V$, the admissible perturbation space is $\dot{H} = V$.

Estimating function and influence function. Define the estimating function
$$m(Z; \beta, \eta) := \frac{A}{\pi(X)}\{Y - \mu_1(X)\} - \frac{1 - A}{1 - \pi(X)}\{Y - \mu_0(X)\} + \mu_1(X) - \mu_0(X) - \beta, \tag{8}$$
and the influence function at the truth,
$$\varphi(Z) := m(Z; \beta_0, \eta_0) = \frac{A}{\pi(X)}\{Y - \mu_1(X)\} - \frac{1 - A}{1 - \pi(X)}\{Y - \mu_0(X)\} + \tau(X) - \beta_0. \tag{9}$$

C.1 Forward direction

We verify Assumptions 4–10 and apply Theorem 1 to conclude that $\beta$ is pathwise differentiable with influence function $\varphi(Z) = -G^{-1} m(Z; \beta_0, \eta_0)$.

Assumption 4. Let $P$ be any distribution with $\beta(P) = \beta$ and $\eta(P) = (\mu_1, \mu_0, \pi)$. We show $E_P[m(Z; \beta, \eta)] = 0$. By the tower property, conditioning first on $X$ and then on $(X, A)$, and using the definition $\mu_1(x) = E_P[Y \mid X = x, A = 1]$,
$$E_P\left[\frac{A}{\pi(X)}\{Y - \mu_1(X)\}\right] = E_P\left[E_P\left[\frac{A}{\pi(X)}\{Y - \mu_1(X)\} \,\Big|\, X\right]\right] = E_P\left[\frac{\pi(X)}{\pi(X)} \cdot E_P[Y - \mu_1(X) \mid X, A = 1]\right] = 0,$$
where the second equality uses $E_P[A \cdot f(Z) \mid X] = \pi(X) \cdot E_P[f(Z) \mid X, A = 1]$. The second IPW term vanishes identically by the same argument with $a = 0$.
The remaining terms contribute $E_P[\mu_1(X) - \mu_0(X)] - \beta = \beta - \beta = 0$.

Assumption 8. Since $m$ is linear in $\beta$ with coefficient $-1$, we have $\partial_\beta m(Z; \beta_0, \eta_0) = -1$ identically, so $G := E_0[\partial_\beta m(Z; \beta_0, \eta_0)] = -1 \neq 0$.

Assumption 9. We verify that the Gâteaux derivative of $\eta \mapsto E_0[m(Z; \beta_0, \eta)]$ vanishes at $\eta_0$ in each coordinate direction of $\dot{H}$. Since $\eta = (\mu_1, \mu_0, \pi)$ and the admissible perturbation space $\dot{H}$ is a product, linearity allows us to check each component separately. Recall that throughout, the expectation $E_0$ is taken under the fixed measure $P_0$ and only the function arguments inside $m$ are being varied.

Perturbation $\mu_1 \to \mu_1 + t h_1$. Substituting $\mu_1 + t h_1$ into (8) with $\beta = \beta_0$ and $(\mu_0, \pi)$ held at their true values, the only terms affected are the first IPW term $\frac{A}{\pi(X)}\{Y - \mu_1(X) - t h_1(X)\}$ and the outcome regression term $\mu_1(X) + t h_1(X) - \mu_0(X)$. Taking the expectation under $P_0$ and differentiating at $t = 0$:
$$\frac{d}{dt} E_0[m(Z; \beta_0, (\mu_1 + t h_1, \mu_0, \pi))]\Big|_{t=0} = E_0\left[-\frac{A h_1(X)}{\pi(X)} + h_1(X)\right] = E_0\left[h_1(X)\left(1 - \frac{A}{\pi(X)}\right)\right].$$
Conditioning on $X$ and using $E_0[A \mid X] = \pi(X)$:
$$E_0\left[1 - \frac{A}{\pi(X)} \,\Big|\, X\right] = 1 - \frac{\pi(X)}{\pi(X)} = 0.$$
By the tower property, the derivative vanishes for all $h_1 \in L_\infty(P_{0,X})$. The perturbation $\mu_0 \to \mu_0 + t h_0$ follows by an identical argument.

Perturbation $\pi \to \pi + t h_\pi$. Substituting $\pi + t h_\pi$ affects only the denominators of the two IPW terms. Since the outcome regression term $\mu_1(X) - \mu_0(X) - \beta_0$ does not involve $\pi$, we differentiate only the IPW terms. Using
$$\frac{d}{dt} \frac{1}{\pi + t h_\pi}\Big|_{t=0} = -\frac{h_\pi}{\pi^2} \qquad \text{and} \qquad \frac{d}{dt} \frac{1}{1 - \pi - t h_\pi}\Big|_{t=0} = \frac{h_\pi}{(1 - \pi)^2},$$
we can write
$$\frac{d}{dt} E_0[m(Z; \beta_0, (\mu_1, \mu_0, \pi + t h_\pi))]\Big|_{t=0} = E_0\left[-\frac{A h_\pi(X)}{\pi(X)^2}\{Y - \mu_1(X)\} - \frac{(1 - A) h_\pi(X)}{(1 - \pi(X))^2}\{Y - \mu_0(X)\}\right].$$
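As a numerical aside, the vanishing of all three Gâteaux derivatives in Assumption 9 can be checked by central finite differences on a small discrete model. This is only a sketch; the model values and the perturbation direction `h` are illustrative assumptions, not objects from the paper:

```python
# Discrete model P0 (illustrative assumptions): X, A binary, Y in {-1, +1}.
pX = {0: 0.5, 1: 0.5}
pi = {0: 0.3, 1: 0.7}                                    # pi(x) = P0(A=1 | X=x)
q  = {(0, 0): 0.4, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.8}  # P0(Y=+1 | x, a)

def p0(y, x, a):
    pa = pi[x] if a == 1 else 1 - pi[x]
    return pX[x] * pa * (q[(x, a)] if y == 1 else 1 - q[(x, a)])

mu = {(a, x): 2 * q[(x, a)] - 1 for x in (0, 1) for a in (0, 1)}
beta0 = sum(pX[x] * (mu[(1, x)] - mu[(0, x)]) for x in (0, 1))

def E0_m(mu1, mu0, pr):
    """E0[m(Z; beta0, (mu1, mu0, pr))] under fixed P0; nuisances are dicts over x."""
    total = 0.0
    for y in (-1, 1):
        for x in (0, 1):
            for a in (0, 1):
                ipw = (a / pr[x]) * (y - mu1[x]) - ((1 - a) / (1 - pr[x])) * (y - mu0[x])
                total += p0(y, x, a) * (ipw + mu1[x] - mu0[x] - beta0)
    return total

mu1_0 = {x: mu[(1, x)] for x in (0, 1)}
mu0_0 = {x: mu[(0, x)] for x in (0, 1)}
h = {0: 0.9, 1: -1.3}            # an arbitrary bounded direction (assumption)
t = 1e-5

def gateaux(perturb):
    """Central finite difference of t -> E0[m] along the given nuisance perturbation."""
    return (E0_m(*perturb(t)) - E0_m(*perturb(-t))) / (2 * t)

d_mu1 = gateaux(lambda s: ({x: mu1_0[x] + s * h[x] for x in h}, mu0_0, pi))
d_mu0 = gateaux(lambda s: (mu1_0, {x: mu0_0[x] + s * h[x] for x in h}, pi))
d_pi  = gateaux(lambda s: (mu1_0, mu0_0, {x: pi[x] + s * h[x] for x in h}))

assert abs(E0_m(mu1_0, mu0_0, pi)) < 1e-12      # unbiasedness at the truth (Assumption 4)
assert max(abs(d_mu1), abs(d_mu0), abs(d_pi)) < 1e-6   # Neyman orthogonality (Assumption 9)
```

All expectations are exact finite sums, so the only error in the derivative checks is the $O(t^2)$ finite-difference bias.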
For the first term, we condition on $X$:
$$E_0\left[\frac{A(Y - \mu_1(X))}{\pi(X)^2} \,\Big|\, X\right] = \frac{1}{\pi(X)^2} E_0[A(Y - \mu_1(X)) \mid X] = \frac{\pi(X)}{\pi(X)^2} E_0[Y - \mu_1(X) \mid X, A = 1] = 0,$$
where we used $E_0[A \cdot f(Z) \mid X] = \pi(X) E_0[f(Z) \mid X, A = 1]$ and the definition of $\mu_1$. The second term vanishes identically by the same argument with $a = 0$.

Assumptions 5–7. Under (R1)–(R2), the map $(\beta, \eta) \mapsto m(\cdot\,; \beta, \eta) \in L_2(P_0)$ is Fréchet differentiable at $(\beta_0, \eta_0)$. The partial derivatives computed above are bounded linear maps into $L_2(P_0)$, with boundedness following from $\pi \ge \varepsilon$ and $|Y| \le C_Y$, which ensure all IPW-weighted terms lie in $L_\infty(P_0)$. We take $S = L_\infty(P_0) \cap L_2^0(P_0)$, which is dense in $T = L_2^0(P_0)$ by the same argument as Corollary 1, and for each $s \in S$ we use the linear tilt submodel from Lemma 1. The induced coordinate paths $t \mapsto (\beta_{t,s}, \eta_{t,s})$ are differentiable at $t = 0$, which follows from the explicit derivative formulas
$$\frac{\partial}{\partial t} \mu_{a,t}(x)\Big|_{t=0} = E_0[(Y - \mu_a(X)) s(Z) \mid X = x, A = a], \qquad \frac{\partial}{\partial t} \pi_t(x)\Big|_{t=0} = E_0[(A - \pi(X)) s(Z) \mid X = x],$$
derived in the pathwise differentiability verification below via the quotient rule. The uniform second moment bound of Lemma 3 holds since for linear tilt submodels with bounded scores, the nuisance difference quotients $(\mu_{a,t} - \mu_a)/t$ and $(\pi_t - \pi)/t$ admit closed-form expressions via the change-of-measure identity
$$\mu_{a,t}(x) = \frac{E_0[Y(1 + ts) \mid X = x, A = a]}{E_0[(1 + ts) \mid X = x, A = a]},$$
and similarly for $\pi_t$, which are uniformly bounded in $x$ for small $t$ under (R1)–(R2). Combined with the boundedness of $(\beta_t - \beta_0)/t$ from coordinate smoothness, this yields a uniform $L_\infty$ bound on the full difference quotient $(f_{t,s} - f_0)/t$, which dominates the $L_2(P_{t,s})$ norm for any $t$.

Assumption 10.
We show that $|\beta(P_1) - \beta(P_2)| \le c\, H(P_1, P_2)$ for all $P_1, P_2 \in \mathcal{P}$, where $c = 4\sqrt{2}\, C_Y(1 + 1/\varepsilon)$. To start, write $p_j = dP_j/d\nu$ and recall $\tau^{P_j}(x) = \mu_1^{P_j}(x) - \mu_0^{P_j}(x)$ and $\beta(P_j) = \int \tau^{P_j}(x)\, dP_{j,X}(x)$. We decompose
$$\beta(P_1) - \beta(P_2) = \underbrace{\int \big(\tau^{P_1}(x) - \tau^{P_2}(x)\big)\, dP_{1,X}(x)}_{(\mathrm{I})} + \underbrace{\int \tau^{P_2}(x)\, d(P_{1,X} - P_{2,X})(x)}_{(\mathrm{II})}.$$
By (R2), $|\tau^{P_2}(x)| \le 2 C_Y$, so
$$|\mathrm{II}| \le 2 C_Y \int |p_{1,X}(x) - p_{2,X}(x)|\, d\nu_X = 2 C_Y\, \mathrm{TV}(P_{1,X}, P_{2,X}).$$
Since the marginal density is obtained by integrating out $(y, a)$,
$$|p_{1,X}(x) - p_{2,X}(x)| = \bigg|\sum_{a \in \{0,1\}} \int (p_1 - p_2)(y, x, a)\, d\nu_Y\bigg| \le \sum_{a \in \{0,1\}} \int |p_1 - p_2|(y, x, a)\, d\nu_Y,$$
where the inequality holds by the triangle inequality. Integrating over $\nu_X$:
$$\mathrm{TV}(P_{1,X}, P_{2,X}) \le \sum_{a \in \{0,1\}} \int\!\!\int |p_1 - p_2|(y, x, a)\, d\nu_Y\, d\nu_X = \int |p_1 - p_2|\, d\nu = \mathrm{TV}(P_1, P_2),$$
hence $|\mathrm{II}| \le 2 C_Y\, \mathrm{TV}(P_1, P_2)$.

Next, by the triangle inequality, $|\tau^{P_1}(x) - \tau^{P_2}(x)| \le |\mu_1^{P_1}(x) - \mu_1^{P_2}(x)| + |\mu_0^{P_1}(x) - \mu_0^{P_2}(x)|$, so it suffices to bound each $\int |\mu_a^{P_1}(x) - \mu_a^{P_2}(x)|\, dP_{1,X}(x)$ separately. Fix $a \in \{0, 1\}$. By definition, $\mu_a^{P_j}(x) = \int y\, dP_j(y \mid x, a)$, so $\int (y - \mu_a^{P_2}(x))\, dP_2(y \mid x, a) = 0$. It follows that for $P_{1,X}$-a.e. $x$,
$$\mu_a^{P_1}(x) - \mu_a^{P_2}(x) = \int y\, dP_1(y \mid x, a) - \mu_a^{P_2}(x) = \int (y - \mu_a^{P_2}(x))\, dP_1(y \mid x, a) = \int (y - \mu_a^{P_2}(x))\, dP_1(y \mid x, a) - \int (y - \mu_a^{P_2}(x))\, dP_2(y \mid x, a) = \frac{\int (y - \mu_a^{P_2}(x))(p_1 - p_2)(y, x, a)\, d\nu_Y}{p_1(x, a)},$$
where the last equality writes $dP_j(y \mid x, a) = p_j(y, x, a)\, d\nu_Y / p_j(x, a)$. By (R2), $|y - \mu_a^{P_2}(x)| \le 2 C_Y$, so
$$|\mu_a^{P_1}(x) - \mu_a^{P_2}(x)| \le \frac{2 C_Y}{p_1(x, a)} \int |p_1 - p_2|(y, x, a)\, d\nu_Y.$$
By (R1), $p_1(x, a) \ge \varepsilon\, p_{1,X}(x)$, since $p_1(x, a) = \pi_a^{P_1}(x)\, p_{1,X}(x)$ and $\pi_a^{P_1}(x) \ge \varepsilon$. Multiplying both sides by $p_{1,X}(x)$:
$$|\mu_a^{P_1}(x) - \mu_a^{P_2}(x)|\, p_{1,X}(x) \le \frac{2 C_Y}{\varepsilon} \int |p_1 - p_2|(y, x, a)\, d\nu_Y.$$
Integrating over $\nu_X$ and summing over $a \in \{0, 1\}$,
$$|\mathrm{I}| \le \sum_{a=0}^{1} \int |\mu_a^{P_1}(x) - \mu_a^{P_2}(x)|\, dP_{1,X}(x) \le \frac{2 C_Y}{\varepsilon} \sum_{a=0}^{1} \int\!\!\int |p_1 - p_2|(y, x, a)\, d\nu_Y\, d\nu_X = \frac{2 C_Y}{\varepsilon}\, \mathrm{TV}(P_1, P_2).$$
Combining the above, we arrive at
$$|\beta(P_1) - \beta(P_2)| \le \left(2 C_Y + \frac{2 C_Y}{\varepsilon}\right) \mathrm{TV}(P_1, P_2) = 2 C_Y\left(1 + \frac{1}{\varepsilon}\right) \mathrm{TV}(P_1, P_2) \le 4\sqrt{2}\, C_Y\left(1 + \frac{1}{\varepsilon}\right) H(P_1, P_2),$$
where the last step uses $\mathrm{TV}(P_1, P_2) \le 2\sqrt{2}\, H(P_1, P_2)$. Therefore, Assumption 10 holds with $c = 4\sqrt{2}\, C_Y(1 + 1/\varepsilon)$ and any $\delta > 0$.

Since the assumptions of Theorem 1 hold, we conclude that $\beta$ is pathwise differentiable at $P_0$ with influence function
$$\varphi(Z) = -G^{-1} m(Z; \beta_0, \eta_0) = -(-1)^{-1} m(Z; \beta_0, \eta_0) = m(Z; \beta_0, \eta_0).$$

C.2 Reverse direction

Assumption 11. We show directly that for every linear tilt submodel $p_t(z) = p_0(z)(1 + t g(z))$ with $E_0[g] = 0$ and $\|g\|_\infty \le M$, whose score is $s \equiv g$ by Lemma 1,
$$\frac{d}{dt} \beta(P_t)\Big|_{t=0} = E_0[\varphi(Z) g(Z)].$$
We first establish this identity for all bounded mean-zero scores below, and then extend the conclusion to all regular submodels via the approximation step used in the argument of Theorem 1. The derivative of $\beta(P_t) = E_{P_t}[\mu_{1,t}(X) - \mu_{0,t}(X)]$ decomposes by the product rule into three terms:
$$\frac{d}{dt} \beta(P_t)\Big|_{t=0} = \underbrace{\int_{\mathcal{X}} \frac{\partial}{\partial t} \mu_{1,t}(x)\Big|_{t=0}\, dP_0(x)}_{(\mathrm{I})} - \underbrace{\int_{\mathcal{X}} \frac{\partial}{\partial t} \mu_{0,t}(x)\Big|_{t=0}\, dP_0(x)}_{(\mathrm{II})} + \underbrace{\int_{\mathcal{X}} \tau(x)\, \frac{\partial}{\partial t}\, dP_{t,X}(x)\Big|_{t=0}}_{(\mathrm{III})}.$$
For the first two terms,
$$\mu_{a,t}(x) = \int_{\mathcal{Y}} y\, dP_t(y, x, a) \Big/ \int_{\mathcal{Y}} dP_t(y, x, a), \qquad \text{where } dP_t = (1 + tg)\, dP_0.$$
Define
$$N_{\mu_a}(t) := \int_{\mathcal{Y}} y\, dP_0(y, x, a)(1 + t g(y, x, a)), \qquad D_{\mu_a}(t) := \int_{\mathcal{Y}} dP_0(y, x, a)(1 + t g(y, x, a)).$$
Then it follows that
$$N_{\mu_a}(0) = \mu_a(x)\, dP_0(x, a), \qquad N'_{\mu_a}(0) = E_0[Y g(Z) \mid X = x, A = a]\, dP_0(x, a),$$
$$D_{\mu_a}(0) = dP_0(x, a), \qquad D'_{\mu_a}(0) = E_0[g(Z) \mid X = x, A = a]\, dP_0(x, a).$$
By the quotient rule, we obtain
$$\frac{\partial}{\partial t} \mu_{a,t}(x)\Big|_{t=0} = \frac{N'_{\mu_a}(0) D_{\mu_a}(0) - N_{\mu_a}(0) D'_{\mu_a}(0)}{D_{\mu_a}(0)^2} = E_0[Y g(Z) \mid X = x, A = a] - \mu_a(x) E_0[g(Z) \mid X = x, A = a] = E_0[(Y - \mu_a(X)) g(Z) \mid X = x, A = a]. \tag{10}$$
For $\pi_t(x) = p_t(x, A = 1)/p_t(x)$, set
$$N_\pi(t) := \int_{\mathcal{Y}} dP_0(y, x, 1)(1 + t g(y, x, 1)), \qquad D_\pi(t) := \int_{\mathcal{Y}} \sum_a dP_0(y, x, a)(1 + t g(y, x, a)).$$
Then it follows that
$$N_\pi(0) = dP_0(x, A = 1), \qquad N'_\pi(0) = E_0[g(Z) \mid X = x, A = 1]\, dP_0(x, A = 1),$$
$$D_\pi(0) = dP_0(x), \qquad D'_\pi(0) = E_0[g(Z) \mid X = x]\, dP_0(x).$$
Again by the quotient rule, we obtain
$$\frac{\partial}{\partial t} \pi_t(x)\Big|_{t=0} = \frac{N'_\pi(0) D_\pi(0) - N_\pi(0) D'_\pi(0)}{D_\pi(0)^2} = \pi(x)\big(E_0[g(Z) \mid X = x, A = 1] - E_0[g(Z) \mid X = x]\big) = E_0[(A - \pi(X)) g(Z) \mid X = x], \tag{11}$$
where the second equality follows from $E_0[A g(Z) \mid X = x] = \pi(x) E_0[g(Z) \mid X = x, A = 1]$.

We now assemble the terms. For term (I), recall the identity $E_0[W \cdot \mathbb{1}\{A = 1\} \mid X] = \pi(X) \cdot E_0[W \mid X, A = 1]$. By the law of total expectation,
$$\int_{\mathcal{X}} E_0[(Y - \mu_1(X)) g(Z) \mid X = x, A = 1]\, dP_0(x) = E_0\left[\pi(X) \cdot E_0\left[\frac{(Y - \mu_1(X)) g(Z)}{\pi(X)} \,\Big|\, X, A = 1\right]\right] = E_0\left[\frac{A (Y - \mu_1(X)) g(Z)}{\pi(X)}\right].$$
By the same reasoning, term (II) gives
$$E_0\left[\frac{(1 - A)(Y - \mu_0(X)) g(Z)}{1 - \pi(X)}\right].$$
For term (III), by the law of total expectation,
$$\int_{\mathcal{X}} \tau(x)\, E_0[g(Z) \mid X = x]\, dP_0(x) = E_0[\tau(X) g(Z)].$$
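The quotient-rule formulas (10)–(11) and the pathwise-derivative identity they yield can be checked numerically on a small discrete model, computing the tilted nuisances under $P_t = (1 + tg)P_0$ exactly and comparing central finite differences against the closed-form expressions. This is a sketch under illustrative assumptions (the model probabilities and the direction `g_raw` are not from the paper):

```python
# Discrete model P0 (illustrative assumptions): X, A binary, Y in {-1, +1}.
pX = {0: 0.5, 1: 0.5}
pi = {0: 0.3, 1: 0.7}
q  = {(0, 0): 0.4, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.8}   # P0(Y=+1 | x, a)
Z  = [(y, x, a) for y in (-1, 1) for x in (0, 1) for a in (0, 1)]

def p0(y, x, a):
    pa = pi[x] if a == 1 else 1 - pi[x]
    return pX[x] * pa * (q[(x, a)] if y == 1 else 1 - q[(x, a)])

mu = {(a, x): 2 * q[(x, a)] - 1 for x in (0, 1) for a in (0, 1)}
beta0 = sum(pX[x] * (mu[(1, x)] - mu[(0, x)]) for x in (0, 1))

# A bounded direction, centered so E0[g] = 0 (the score of the linear tilt).
g_raw = {(y, x, a): 0.3 * y + 0.5 * x - 0.7 * a + 0.2 * y * a for (y, x, a) in Z}
Eg = sum(p0(*z) * g_raw[z] for z in Z)
g = {z: g_raw[z] - Eg for z in Z}

def nuisances(t):
    """(mu_{a,t}(x), pi_t(x), beta(P_t)) computed exactly under dP_t = (1+tg) dP_0."""
    pt = {z: p0(*z) * (1 + t * g[z]) for z in Z}
    mu_t = {(a, x): sum(y * pt[(y, x, a)] for y in (-1, 1)) /
                    sum(pt[(y, x, a)] for y in (-1, 1))
            for x in (0, 1) for a in (0, 1)}
    ptX = {x: sum(pt[(y, x, a)] for y in (-1, 1) for a in (0, 1)) for x in (0, 1)}
    pi_t = {x: sum(pt[(y, x, 1)] for y in (-1, 1)) / ptX[x] for x in (0, 1)}
    beta_t = sum(ptX[x] * (mu_t[(1, x)] - mu_t[(0, x)]) for x in (0, 1))
    return mu_t, pi_t, beta_t

t = 1e-5
mu_p, pi_p, beta_p = nuisances(t)
mu_m, pi_m, beta_m = nuisances(-t)

# Formula (10): d/dt mu_{a,t}(x)|_0 = E0[(Y - mu_a(X)) g(Z) | X=x, A=a].
for x in (0, 1):
    for a in (0, 1):
        fd = (mu_p[(a, x)] - mu_m[(a, x)]) / (2 * t)
        cond = sum(p0(y, x, a) * (y - mu[(a, x)]) * g[(y, x, a)] for y in (-1, 1)) \
               / sum(p0(y, x, a) for y in (-1, 1))
        assert abs(fd - cond) < 1e-6

# Formula (11): d/dt pi_t(x)|_0 = E0[(A - pi(X)) g(Z) | X=x].
for x in (0, 1):
    fd = (pi_p[x] - pi_m[x]) / (2 * t)
    cond = sum(p0(y, x, a) * (a - pi[x]) * g[(y, x, a)]
               for y in (-1, 1) for a in (0, 1)) / pX[x]
    assert abs(fd - cond) < 1e-6

# Pathwise derivative: d/dt beta(P_t)|_0 = E0[phi(Z) g(Z)] with phi from (9).
def phi(y, x, a):
    ipw = (a / pi[x]) * (y - mu[(1, x)]) - ((1 - a) / (1 - pi[x])) * (y - mu[(0, x)])
    return ipw + (mu[(1, x)] - mu[(0, x)]) - beta0

fd_beta = (beta_p - beta_m) / (2 * t)
Ephig = sum(p0(*z) * phi(*z) * g[z] for z in Z)
assert abs(fd_beta - Ephig) < 1e-6
```

Because every expectation is an exact finite sum, agreement up to the $O(t^2)$ finite-difference bias confirms the algebra above on this model.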
Collecting all three terms shows that $\frac{d}{dt} \beta(P_{t,g})\big|_{t=0} = E_0[\varphi(Z) g(Z)]$ for every linear tilt submodel with bounded mean-zero score $g$. Bounded mean-zero functions are dense in $T = L_2^0(P_0)$ by the same argument as Corollary 1, and the ATE is Hellinger Lipschitz as verified in Section C.1 for Assumption 10. Therefore, the same three-term approximation argument used in the proof of Theorem 1 extends this identity to all regular submodels with score $s' \in S$, which establishes pathwise differentiability at $P_0$ with influence function $\varphi$.

Assumption 12. To verify this assumption, we construct explicit QMD submodels along each coordinate of the parameter space.

$\beta$-coordinate submodel. We construct a QMD path $t \mapsto P_t$ with $\beta(P_t) = \beta_0 + t$ and $\eta(P_t) = \eta_0$. Define the function
$$g_\beta(x) := \frac{\tau(x) - \beta_0}{\mathrm{Var}_0(\tau(X))},$$
which depends on $z$ only through $x$. By (R4), $\mathrm{Var}_0(\tau(X)) > 0$, and (R2) gives $\|g_\beta\|_\infty < \infty$. Clearly, $E_0[g_\beta] = 0$. By Lemma 1, the linear tilt $dP_t(z) = (1 + t g_\beta(x))\, dP_0(z)$ defines a regular QMD submodel through $P_0$ with score $g_\beta$ for $|t| < 1/\|g_\beta\|_\infty$. Since $g_\beta$ depends only on $x$, the conditional densities are undisturbed by the tilt:
$$dP_t(y \mid x, a) = \frac{dP_t(y, x, a)}{dP_t(x, a)} = \frac{(1 + t g_\beta(x))\, dP_0(y, x, a)}{(1 + t g_\beta(x))\, dP_0(x, a)} = dP_0(y \mid x, a),$$
so $\mu_{a,t}(x) = \mu_a(x)$ for all small $t$. Similarly, $\pi_t(x) = dP_t(x, 1)/dP_t(x) = \pi(x)$ since the $(1 + t g_\beta(x))$ factors cancel in the ratio. Hence $\eta(P_t) = \eta_0$. Furthermore, $\beta$ increases at unit rate:
$$\beta(P_t) = E_{P_t}[\tau(X)] = E_0[\tau(X)(1 + t g_\beta(X))] = \beta_0 + t \cdot \frac{E_0[\tau(X)(\tau(X) - \beta_0)]}{\mathrm{Var}_0(\tau(X))} = \beta_0 + t,$$
since $E_0[\tau(X)(\tau(X) - \beta_0)] = \mathrm{Var}_0(\tau(X))$.

$\eta$-coordinate submodels.
For each admissible direction $h = (h_1, h_0, h_\pi) \in \dot{H}$, we construct a regular (QMD) submodel through $P_0$ satisfying $\dot{\beta}_{0,s_h} = 0$ and $\dot{\eta}_{0,s_h} = h$. Since Assumption 1 requires only first-order coordinate control, a linear tilt submodel suffices. Define the perturbation functions
$$g_a(y, x) := \frac{h_a(x)(y - \mu_a(x))}{\sigma_a^2(x)}, \qquad g_{h_\pi}(x, a) := \frac{h_\pi(x)(a - \pi(x))}{\pi(x)(1 - \pi(x))}, \qquad g_\beta(x) := \frac{\tau(x) - \beta_0}{\mathrm{Var}_0(\tau(X))},$$
and the score
$$s_h(z) := g_a(y, x) + g_{h_\pi}(x, a) + \alpha_0\, g_\beta(x), \qquad \alpha_0 := -E_0[h_1(X) - h_0(X)]. \tag{12}$$
Under (R1)–(R3), each summand is bounded: indeed, $|g_a| \le \|h_a\|_\infty \cdot 2 C_Y / \sigma^2$, $|g_{h_\pi}| \le \|h_\pi\|_\infty / (\varepsilon(1 - \varepsilon))$, and $|g_\beta| \le 4 C_Y / \mathrm{Var}_0(\tau(X))$. Each summand also has mean zero. For the outcome perturbation, the law of iterated expectation and $E_0[Y - \mu_a(X) \mid X, A] = 0$ give $E_0[g_a] = 0$. For the propensity perturbation, $E_0[A - \pi(X) \mid X] = 0$ gives $E_0[g_{h_\pi}] = 0$. Finally, $E_0[g_\beta] = 0$ by construction. Hence $s_h$ is bounded and mean-zero, and by Lemma 1, the linear tilt $p_t(z) := p_0(z)(1 + t s_h(z))$ defines a regular QMD submodel through $P_0$ with score $s_h$ for $|t| < 1/\|s_h\|_\infty$.

We now verify the first-order coordinate derivatives using the quotient-rule formulas (10) and (11), applied to the linear tilt with score $s_h$.

Derivative of $\mu_a$. By (10), $\dot{\mu}_{a,0}(x) = E_0[(Y - \mu_a(X)) s_h(Z) \mid X = x, A = a]$. We expand $s_h = g_a + g_{h_\pi} + \alpha_0 g_\beta$ and compute each contribution separately. For the outcome perturbation,
$$E_0[(Y - \mu_a(X)) g_a(Y, X) \mid X = x, A = a] = \frac{h_a(x)}{\sigma_a^2(x)} E_0[(Y - \mu_a(X))^2 \mid X = x, A = a] = \frac{h_a(x)}{\sigma_a^2(x)} \cdot \sigma_a^2(x) = h_a(x).$$
For the propensity perturbation, since $g_{h_\pi}(x, a)$ does not depend on $y$, it factors out of the conditional expectation, and the remaining factor $E_0[Y - \mu_a(X) \mid X = x, A = a]$ vanishes by definition of $\mu_a$. The same reasoning applies to $\alpha_0 g_\beta(x)$, which also does not depend on $y$. That is,
$$E_0[(Y - \mu_a(X)) g_{h_\pi}(X, A) \mid X = x, A = a] = g_{h_\pi}(x, a) \cdot E_0[Y - \mu_a(X) \mid X = x, A = a] = 0,$$
$$E_0[(Y - \mu_a(X))\, \alpha_0 g_\beta(X) \mid X = x, A = a] = \alpha_0 g_\beta(x) \cdot E_0[Y - \mu_a(X) \mid X = x, A = a] = 0.$$
Combining the three contributions gives $\dot{\mu}_{a,0}(x) = h_a(x)$.

Derivative of $\pi$. By (11), $\dot{\pi}_0(x) = E_0[(A - \pi(X)) s_h(Z) \mid X = x]$. For the outcome perturbation, we condition on $A$ and use the conditional mean-zero property of $g_a$ to obtain
$$E_0[(A - \pi(X)) g_a(Y, X) \mid X = x] = \sum_{a' \in \{0,1\}} P_0(A = a' \mid x)\, (a' - \pi(x))\, E_0[g_{a'}(Y, X) \mid X = x, A = a'].$$
Each inner expectation evaluates to
$$E_0[g_{a'}(Y, X) \mid X = x, A = a'] = \frac{h_{a'}(x)}{\sigma_{a'}^2(x)} E_0[Y - \mu_{a'}(X) \mid X = x, A = a'] = 0,$$
so the entire sum vanishes. For the propensity perturbation, recalling that $g_{h_\pi}(x, a) = h_\pi(x)(a - \pi(x)) / [\pi(x)(1 - \pi(x))]$,
$$E_0[(A - \pi(X)) g_{h_\pi}(X, A) \mid X = x] = \frac{h_\pi(x)}{\pi(x)(1 - \pi(x))} E_0[(A - \pi(X))^2 \mid X = x] = \frac{h_\pi(x)}{\pi(x)(1 - \pi(x))} \cdot \pi(x)(1 - \pi(x)) = h_\pi(x).$$
For the marginal correction, since $\alpha_0 g_\beta(x)$ does not depend on $a$, it factors out and the remaining expectation vanishes:
$$E_0[(A - \pi(X))\, \alpha_0 g_\beta(X) \mid X = x] = \alpha_0 g_\beta(x) \cdot E_0[A - \pi(X) \mid X = x] = 0.$$
Combining the three contributions gives $\dot{\pi}_0(x) = h_\pi(x)$, and therefore $\dot{\eta}_{0,s_h} = (h_1, h_0, h_\pi) = h$.

Derivative of $\beta$. By Assumption 11, $\dot{\beta}_{0,s_h} = E_0[\varphi(Z) s_h(Z)]$.
Expanding $\varphi$ from (9) and using linearity of expectation, this becomes
$$E_0[\varphi(Z) s_h(Z)] = E_0\left[\frac{A (Y - \mu_1(X)) s_h(Z)}{\pi(X)}\right] - E_0\left[\frac{(1 - A)(Y - \mu_0(X)) s_h(Z)}{1 - \pi(X)}\right] + E_0[\tau(X) s_h(Z)] - \beta_0\, E_0[s_h(Z)]. \tag{13}$$
The last term vanishes since $s_h \in L_2^0(P_0)$. We evaluate the remaining three terms in order. For the first term, the identity $E_0[A \cdot f(Z) \mid X] = \pi(X) E_0[f(Z) \mid X, A = 1]$ allows us to write
$$E_0\left[\frac{A (Y - \mu_1(X)) s_h(Z)}{\pi(X)}\right] = E_0\left[E_0\left[\frac{A (Y - \mu_1(X)) s_h(Z)}{\pi(X)} \,\Big|\, X\right]\right] = E_0\big[E_0[(Y - \mu_1(X)) s_h(Z) \mid X, A = 1]\big] = E_0[\dot{\mu}_{1,0}(X)] = E_0[h_1(X)],$$
where the penultimate equality uses (10) and the final equality uses $\dot{\mu}_{1,0}(x) = h_1(x)$ as established above. The second term in (13) follows by the same argument with $a = 0$, giving $E_0[h_0(X)]$.

For the third term, we apply the tower property to condition on $X$:
$$E_0[\tau(X) s_h(Z)] = E_0\big[\tau(X) \cdot E_0[s_h(Z) \mid X]\big].$$
To evaluate the inner conditional expectation, we treat each component of $s_h$ separately. For $g_a$, identically to the above, conditioning further on $A$ gives
$$E_0[g_a(Y, X) \mid X = x] = \sum_{a' \in \{0,1\}} P_0(A = a' \mid x)\, E_0[g_{a'}(Y, X) \mid X = x, A = a'] = \sum_{a' \in \{0,1\}} P_0(A = a' \mid x) \cdot \frac{h_{a'}(x)}{\sigma_{a'}^2(x)} E_0[Y - \mu_{a'}(X) \mid X = x, A = a'] = 0,$$
since the conditional mean of $Y - \mu_{a'}(X)$ vanishes by definition. For $g_{h_\pi}$,
$$E_0[g_{h_\pi}(X, A) \mid X = x] = \frac{h_\pi(x)}{\pi(x)(1 - \pi(x))} E_0[A - \pi(X) \mid X = x] = 0.$$
Since $\alpha_0 g_\beta(x)$ is already a function of $x$ alone, it passes through the conditional expectation unchanged. Combining these three observations,
$$E_0[s_h(Z) \mid X = x] = 0 + 0 + \alpha_0 g_\beta(x) = \alpha_0 g_\beta(x).$$
Substituting back and using the identity $E_0[\tau(X) g_\beta(X)] = 1$ (established for the $\beta$-coordinate submodel),
$$E_0[\tau(X) s_h(Z)] = E_0[\tau(X) \cdot \alpha_0 g_\beta(X)] = \alpha_0 \cdot E_0[\tau(X) g_\beta(X)] = \alpha_0.$$
Collecting the three terms of (13),
$$\dot{\beta}_{0,s_h} = E_0[h_1(X)] - E_0[h_0(X)] + \alpha_0 = E_0[h_1(X) - h_0(X)] + \big(-E_0[h_1(X) - h_0(X)]\big) = 0.$$
Combined with the $\beta$-coordinate submodel above, we see that Assumption 1 holds for the average treatment effect.

Remark 9 (When no marginal correction is needed). When $E_0[h_1(X) - h_0(X)] = 0$, which holds for instance when $h_1 = h_0$ pointwise or whenever the outcome perturbations are mean-balanced across treatment arms, we have $\alpha_0 = 0$ and the score simplifies to $s_h = g_a + g_{h_\pi}$. The marginal correction $\alpha_0 g_\beta$ is driven entirely by the imbalance $E_0[h_1 - h_0]$ of the outcome perturbations.

Assumption 13. Under (R1)–(R2), $\|\varphi\|_\infty < \infty$ since $\mu_a$ and $\pi$ are bounded and $\pi$ is bounded away from $0$ and $1$. The scores $s_\beta$ and $s_h$ are bounded by construction, so each coordinate submodel is a linear tilt with bounded score, and the verification of Lemma 3 proceeds identically to the forward direction.

Since all assumptions of Theorem 2 hold, with local product structure (Assumption 12) established by the explicit coordinate submodel constructions above, Theorem 2 gives that $m$ is Neyman orthogonal with $G = E_0[\partial_\beta m(Z; \beta_0, \eta_0)] = -1$.
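The coordinate submodels of Assumption 12 can likewise be checked numerically: on a small discrete model, the $\beta$-coordinate tilt should move $\beta$ at unit rate while leaving $(\mu_1, \mu_0, \pi)$ fixed, and the $\eta$-coordinate tilt with score (12) should produce $\dot{\mu}_a = h_a$, $\dot{\pi} = h_\pi$, and $\dot{\beta} = 0$. A sketch under illustrative assumptions (the model probabilities and the direction $(h_1, h_0, h_\pi)$ are made up for the check):

```python
# Discrete model P0 (illustrative assumptions): X, A binary, Y in {-1, +1}.
pX = {0: 0.5, 1: 0.5}
pi = {0: 0.3, 1: 0.7}
q  = {(0, 0): 0.4, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.8}   # P0(Y=+1 | x, a)
Z  = [(y, x, a) for y in (-1, 1) for x in (0, 1) for a in (0, 1)]

def p0(y, x, a):
    pa = pi[x] if a == 1 else 1 - pi[x]
    return pX[x] * pa * (q[(x, a)] if y == 1 else 1 - q[(x, a)])

mu   = {(a, x): 2 * q[(x, a)] - 1 for x in (0, 1) for a in (0, 1)}
sig2 = {(a, x): 1 - mu[(a, x)] ** 2 for x in (0, 1) for a in (0, 1)}  # Var of a +/-1 outcome
tau  = {x: mu[(1, x)] - mu[(0, x)] for x in (0, 1)}
beta0 = sum(pX[x] * tau[x] for x in (0, 1))
var_tau = sum(pX[x] * (tau[x] - beta0) ** 2 for x in (0, 1))

def coords(score, t):
    """(beta, mu, pi) of the linear tilt dP_t = (1 + t*score) dP_0."""
    pt = {z: p0(*z) * (1 + t * score(*z)) for z in Z}
    mu_t = {(a, x): sum(y * pt[(y, x, a)] for y in (-1, 1)) /
                    sum(pt[(y, x, a)] for y in (-1, 1))
            for x in (0, 1) for a in (0, 1)}
    ptX = {x: sum(pt[(y, x, a)] for y in (-1, 1) for a in (0, 1)) for x in (0, 1)}
    pi_t = {x: sum(pt[(y, x, 1)] for y in (-1, 1)) / ptX[x] for x in (0, 1)}
    beta_t = sum(ptX[x] * (mu_t[(1, x)] - mu_t[(0, x)]) for x in (0, 1))
    return beta_t, mu_t, pi_t

def g_beta(y, x, a):
    """beta-coordinate score; depends on z only through x."""
    return (tau[x] - beta0) / var_tau

t = 1e-4

# beta-coordinate submodel: beta moves at unit rate, nuisances stay fixed.
b_p, mu_p, pi_p = coords(g_beta, t)
b_m, mu_m, pi_m = coords(g_beta, -t)
assert abs((b_p - b_m) / (2 * t) - 1) < 1e-9
assert all(abs(mu_p[k] - mu[k]) < 1e-12 for k in mu)
assert all(abs(pi_p[x] - pi[x]) < 1e-12 for x in pi)

# eta-coordinate submodel with an arbitrary direction h = (h1, h0, h_pi).
h1 = {0: 0.4, 1: -0.6}; h0 = {0: -0.2, 1: 0.5}; hpi = {0: 0.1, 1: -0.3}
alpha0 = -sum(pX[x] * (h1[x] - h0[x]) for x in (0, 1))

def s_h(y, x, a):
    """Score (12): outcome + propensity perturbations + marginal correction."""
    ha = h1[x] if a == 1 else h0[x]
    g_out = ha * (y - mu[(a, x)]) / sig2[(a, x)]
    g_pi  = hpi[x] * (a - pi[x]) / (pi[x] * (1 - pi[x]))
    return g_out + g_pi + alpha0 * g_beta(y, x, a)

b_p, mu_p, pi_p = coords(s_h, t)
b_m, mu_m, pi_m = coords(s_h, -t)
assert abs((b_p - b_m) / (2 * t)) < 1e-5                                # beta-dot = 0
for x in (0, 1):
    assert abs((mu_p[(1, x)] - mu_m[(1, x)]) / (2 * t) - h1[x]) < 1e-5  # mu1-dot = h1
    assert abs((mu_p[(0, x)] - mu_m[(0, x)]) / (2 * t) - h0[x]) < 1e-5  # mu0-dot = h0
    assert abs((pi_p[x] - pi_m[x]) / (2 * t) - hpi[x]) < 1e-5           # pi-dot = h_pi
```

The invariance checks for the $\beta$-coordinate tilt hold to machine precision because the $(1 + t g_\beta(x))$ factor cancels exactly in the conditional ratios, mirroring the cancellation argument in the text.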
