On the Equivalence between Neyman Orthogonality and Pathwise Differentiability


Authors: Yuxi Chen, Edward H. Kennedy, Sivaraman Balakrishnan

Department of Statistics & Data Science and Machine Learning Department, Carnegie Mellon University. Email: {eric, edward, siva}@stat.cmu.edu

Abstract

It has been frequently observed that Neyman orthogonality, the central device underlying double/debiased machine learning [Chernozhukov et al., 2018], and pathwise differentiability, a cornerstone concept from semiparametric theory, often lead to the same debiased estimators in practice. Despite the widespread adoption of both ideas, the precise nature of this equivalence has remained elusive, with the two concepts having been developed in largely separate traditions. In this work, we revisit the semiparametric framework of van der Laan and Robins [2003] and identify an implicit regularity assumption on the relationship between target and nuisance parameters, a local product structure, that allows us to establish a formal equivalence between Neyman orthogonality and pathwise differentiability. We demonstrate that the two directions of this equivalence impose fundamentally different structural requirements, and illustrate the theory through a concrete example of estimating the average treatment effect. This helps clarify the relationship between these two foundational frameworks and provides a useful reference for practitioners working at their intersection.

1 Introduction

In recent years, the double/debiased machine learning (DML) framework of Chernozhukov et al. [2018] has become a standard tool in modern causal inference for estimating low-dimensional parameters in the presence of high-dimensional nuisance functions.
The central feature of DML is that the estimating function satisfies Neyman orthogonality: an estimating function $m(Z;\beta,\eta)$ is Neyman orthogonal if the Gâteaux derivative of the expected estimating function with respect to the nuisance parameter $\eta$, evaluated at the true parameter values $(\beta_0,\eta_0)$, vanishes in all admissible perturbation directions. This first-order insensitivity to the nuisance ensures that bias from estimating $\eta_0$ enters only at second order, enabling the use of flexible machine learning estimators for nuisance functions while preserving desirable properties of the target estimator. We refer the reader to Chernozhukov et al. [2018] for a thorough treatment of these statistical consequences.

It has long been observed that Neyman orthogonal estimating functions coincide, in essentially every example of interest, with influence functions of pathwise differentiable functionals from classical semiparametric theory, which underpin the construction of efficient estimators [Bickel et al., 1998, van der Vaart, 1998, van der Laan and Robins, 2003, Tsiatis, 2006]. For example, the augmented inverse probability weighted estimator for the average treatment effect arises naturally both as a one-step correction built from the efficient influence function and as the solution to a Neyman orthogonal moment condition. Yet these two concepts are routinely developed and invoked as distinct notions. A precise characterization of when and why they agree, and what structural condition each direction requires, does not appear to have been made explicit in the literature. We hope to close the gap between these two traditions by formalizing the equivalence, and by clarifying the structural and regularity conditions that underpin each direction of the implication.
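The augmented inverse probability weighted (AIPW) example above can be checked numerically. The following sketch is our own toy illustration (the discrete distribution, the function names, and the perturbation direction $h$ are all choices we make for exposition, not from the paper): on a finite support where expectations are exact sums, it verifies that the AIPW estimating function for the average treatment effect is correctly specified, is Neyman orthogonal with respect to perturbations of both the outcome regression and the propensity score, and has Jacobian $G = E_0[\partial_\beta m] = -1$.

```python
import itertools

# Toy discrete data-generating distribution for Z = (X, A, Y):
# X in {0,1}, A | X ~ Bernoulli(pi(X)), Y | X, A ~ Bernoulli(mu_A(X)).
pX = {0: 0.6, 1: 0.4}                      # distribution of the covariate
pi  = lambda x: 0.3 + 0.4 * x              # true propensity score
mu1 = lambda x: 0.2 + 0.5 * x              # true outcome mean under A = 1
mu0 = lambda x: 0.1 + 0.2 * x              # true outcome mean under A = 0

def prob(x, a, y):
    """P(X = x, A = a, Y = y) under the true distribution P_0."""
    pa = pi(x) if a == 1 else 1 - pi(x)
    mu = mu1(x) if a == 1 else mu0(x)
    return pX[x] * pa * (mu if y == 1 else 1 - mu)

def E0(f):
    """Exact expectation under P_0 over the finite support."""
    return sum(prob(x, a, y) * f(x, a, y)
               for x, a, y in itertools.product([0, 1], repeat=3))

def m(x, a, y, beta, mu1f, mu0f, pif):
    """AIPW (Neyman orthogonal) estimating function for the ATE."""
    return (mu1f(x) - mu0f(x)
            + a * (y - mu1f(x)) / pif(x)
            - (1 - a) * (y - mu0f(x)) / (1 - pif(x))
            - beta)

beta0 = sum(pX[x] * (mu1(x) - mu0(x)) for x in [0, 1])   # true ATE

# Correct specification: E_0[m(Z; beta_0, eta_0)] = 0.
assert abs(E0(lambda x, a, y: m(x, a, y, beta0, mu1, mu0, pi))) < 1e-12

# Gateaux derivatives of E_0[m] at the truth, by central differences,
# in the nuisance direction h(x) = x - 0.5 for mu_1 and for pi.
h, t = (lambda x: x - 0.5), 1e-5
d_mu1 = (E0(lambda x, a, y: m(x, a, y, beta0, lambda u: mu1(u) + t * h(u), mu0, pi))
         - E0(lambda x, a, y: m(x, a, y, beta0, lambda u: mu1(u) - t * h(u), mu0, pi))) / (2 * t)
d_pi  = (E0(lambda x, a, y: m(x, a, y, beta0, mu1, mu0, lambda u: pi(u) + t * h(u)))
         - E0(lambda x, a, y: m(x, a, y, beta0, mu1, mu0, lambda u: pi(u) - t * h(u)))) / (2 * t)
G     = (E0(lambda x, a, y: m(x, a, y, beta0 + t, mu1, mu0, pi))
         - E0(lambda x, a, y: m(x, a, y, beta0 - t, mu1, mu0, pi))) / (2 * t)
print(d_mu1, d_pi, G)   # nuisance derivatives ~ 0; G = -1
```

The vanishing nuisance derivatives are exactly the Neyman orthogonality property discussed above, and $G = -1$ previews the normalization derived in Section 3.2.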
For simplicity, we restrict attention to scalar-valued functionals, although the results extend generally to vector-valued scenarios.

Establishing the equivalence requires bridging two seemingly distinct viewpoints. Pathwise differentiability is formulated geometrically, characterizing the first-order behavior of a functional along smooth perturbations of the data-generating distribution without reference to any explicit nuisance parameterization. On the other hand, Neyman orthogonality is defined analytically through derivatives of an expected estimating function with respect to an explicitly parameterized nuisance. Relating the two turns out to require constructing smooth perturbations of the distribution that move one parameter while holding the other fixed. The classical notion of local variation independence means the attainable parameter set contains a product neighborhood of $(\beta_0, \eta_0)$, guaranteeing that independently varied parameter values exist. However, this definition is purely set-theoretic and does not ensure that they are connected by submodels regular enough to differentiate along. We formalize the missing condition as a local product structure (Assumption 1), which requires that coordinate perturbations not merely exist as points in the model but form regular submodels through $P_0$. This condition underlies the classical framework of van der Laan and Robins [2003], where it is implicitly invoked but not separately identified. We make this explicit and discuss its role in their proofs in Appendix B.

Equipped with this assumption, we establish the equivalence between Neyman orthogonality and pathwise differentiability through two results.
The forward direction (Theorem 1) shows that a Neyman orthogonal estimating function with a nondegenerate Jacobian induces an influence function, and hence pathwise differentiability, without requiring any variation independence or product structure. The reverse direction (Theorem 2) shows that a mean-zero estimating function whose value at the truth is an influence function must be Neyman orthogonal. This direction does require local product structure to identify coordinate submodels that perturb $\beta$ and $\eta$ independently.

2 Background

We work on a measurable space $(\mathcal{Z}, \mathcal{A})$ and fix a $\sigma$-finite measure $\nu$ such that every $P \in \mathcal{P}$ is dominated by $\nu$. We denote the density of $P \in \mathcal{P}$ by $p = dP/d\nu$. Fix $P_0 \in \mathcal{P}$ with density $p_0$, and write $E_0[\cdot] \equiv E_{P_0}[\cdot]$. Let
\[
L_2(P_0) = \{ f : E_0[f^2] < \infty \}, \qquad L_2^0(P_0) = \{ f \in L_2(P_0) : E_0[f] = 0 \}, \qquad L_\infty(P_0) = \{ f : \|f\|_\infty := \operatorname{ess\,sup}_{P_0} |f| < \infty \}.
\]
All equalities between random variables are interpreted $P_0$-a.s. unless stated otherwise. To define local perturbations at $P_0$, we consider paths through $P_0$ inside the model $\mathcal{P}$. The appropriate regularity condition on such paths is quadratic-mean differentiability [van der Vaart, 1998].

2.1 Regular Submodels and Scores

Definition 1 (Regular (QMD) submodel and score). A regular parametric submodel through $P_0$ is an indexed family $\{P_t : t \in (-\epsilon, \epsilon)\} \subset \mathcal{P}$ with $P_{t=0} \equiv P_0$ such that

1. $P_t \ll \nu$ with density $p_t = dP_t/d\nu$.
2. The map $t \mapsto P_t$ is differentiable in quadratic mean (QMD) at $0$: there exists $s \in L_2^0(P_0)$ such that
\[
\int \left( \frac{\sqrt{p_t} - \sqrt{p_0}}{t} - \frac{1}{2}\, s \sqrt{p_0} \right)^2 d\nu \to 0 \quad \text{as } t \to 0.
\]
The function $s$ is the score of the submodel at $0$.

When it is helpful to indicate the score of a submodel, we write $P_{t,s}$ for a regular submodel through $P_0$ with score $s$. We also use $t \mapsto P_{t,s}$ and $\{P_{t,s}\}$ interchangeably to refer to the submodel.
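Definition 1 can be checked directly in a simple case. The sketch below is our own illustration under an assumed family not discussed in the text, the Gaussian location model $P_t = N(t, 1)$ through $P_0 = N(0, 1)$: it evaluates the QMD remainder on a quadrature grid and confirms that it vanishes as $t \to 0$, with candidate score $s(z) = z$.

```python
import numpy as np

# Direct check of the QMD definition for the Gaussian location family
# P_t = N(t, 1), a standard regular submodel through P_0 = N(0, 1)
# with score s(z) = z.
z = np.linspace(-12.0, 12.0, 200001)       # quadrature grid, nu = Lebesgue
dz = z[1] - z[0]
p = lambda t: np.exp(-(z - t) ** 2 / 2) / np.sqrt(2 * np.pi)
s = z                                       # candidate score at t = 0

def qmd_gap(t):
    """Integral in the QMD definition; should vanish as t -> 0."""
    diff = (np.sqrt(p(t)) - np.sqrt(p(0))) / t - 0.5 * s * np.sqrt(p(0))
    return np.sum(diff ** 2) * dz

for t in [0.1, 0.01, 0.001]:
    print(t, qmd_gap(t))                    # decreases like t**2 here

# The score is mean zero under P_0, as Definition 1 requires.
assert abs(np.sum(s * p(0)) * dz) < 1e-8
```

The remainder shrinking to zero (here at rate $t^2$) is what makes $\{P_t\}$ a regular submodel in the sense of Definition 1.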
One may observe that different submodels can share the same score, and for our purposes these submodels should be considered interchangeable, because the score alone determines the first-order behavior along any such submodel. It is therefore natural to work not with individual submodels but with their scores, which we collect into a single space. Let $\mathcal{S} \subset L_2^0(P_0)$ be the set of scores of all regular submodels through $P_0$.

Definition 2 (Tangent space). The (full) tangent space is $T := \overline{\operatorname{span}(\mathcal{S})} \subset L_2^0(P_0)$, where the closure is taken in $L_2(P_0)$.

The tangent space is defined as the closed span of the scores, but it remains to show that scores can be constructed in a controlled way. One simple and standard construction is the linear tilt, where one perturbs $p_0$ by a multiplicative factor $1 + tg$ for a bounded, mean-zero function $g$, producing a regular submodel with score exactly $g$.

Lemma 1 (Linear tilt submodel is QMD with score $g$). Let $g \in L_\infty(P_0)$ with $E_0[g] = 0$, and let $M := \|g\|_\infty$. For $|t| < 1/M$, define $p_t(z) := p_0(z)\{1 + t g(z)\}$. Then $p_t \ge 0$ $\nu$-a.e., $\int p_t \, d\nu = 1$, and the resulting submodel $\{P_t : |t| < 1/M\}$ is regular (QMD) at $0$ with score $s \equiv g$. (Proof in Appendix A.1.)

It should be noted that these submodels are not necessarily intended as realistic data-generating mechanisms but rather as analytical tools for assessing the local geometry of the model. Indeed, in the nonparametric model, linear tilts alone suffice to saturate the tangent space.

Corollary 1 (Saturation in the nonparametric model). Suppose $\mathcal{P}$ is the full nonparametric model (all densities $p$ w.r.t. $\nu$). Then $T = L_2^0(P_0)$.

Proof. By Lemma 1, every bounded mean-zero $g$ is a score, so $L_\infty(P_0) \cap L_2^0(P_0) \subset \mathcal{S}$. Since bounded functions are dense in $L_2(P_0)$, it follows that $L_\infty(P_0) \cap L_2^0(P_0)$ is dense in $L_2^0(P_0)$.
Taking the closed linear span of these scores yields $T = L_2^0(P_0)$.

2.1.1 Differentiating Expectations along Regular Submodels

Deriving the central results of this note requires differentiating expectations of the form $E_{P_{t,s}}[f(Z)]$ along regular submodels, where the integrand $f$ itself may also depend on $t$. The first result below handles the case of a fixed integrand, and the second extends to integrands that vary along the submodel, which arises naturally when the integrand depends on parameters that move with $P_{t,s}$.

Lemma 2 (Differentiation of expectations for a fixed $f$). Let $t \mapsto P_{t,s}$ be a regular (QMD) submodel through $P_0$ with bounded score $s$. If $f \in L_\infty(P_0)$, then
\[
\frac{d}{dt} E_{P_{t,s}}[f(Z)] \Big|_{t=0} = E_0[f(Z)\, s(Z)].
\]
(Proof in Appendix A.2.)

Lemma 3 (Differentiation of expectations for varying $f_t$). Let $t \mapsto P_{t,s}$ be a regular (QMD) submodel through $P_0$ with bounded score $s$, and let $f_t : \mathcal{Z} \to \mathbb{R}$ be measurable. Assume:

1. $f_0 \in L_\infty(P_0)$.
2. There exists $\dot f_0 \in L_2(P_0)$ such that $E_0\!\left[ \left( \frac{f_t - f_0}{t} - \dot f_0 \right)^2 \right] \to 0$ as $t \to 0$.
3. There exist $\delta > 0$ and $C < \infty$ such that $\sup_{|t| < \delta} E_{P_{t,s}}\!\left[ \left( \frac{f_t - f_0}{t} \right)^2 \right] \le C$.

Then $t \mapsto E_{P_{t,s}}[f_t]$ is differentiable at $0$ and
\[
\frac{d}{dt} E_{P_{t,s}}[f_t] \Big|_{t=0} = E_0[f_0 s] + E_0[\dot f_0].
\]
(Proof in Appendix A.3.)

2.1.2 Nuisance Scores, Nuisance Tangent Space, and Pathwise Derivatives

The tools developed in Section 2.1.1 allow us to differentiate expectations along regular submodels, but do not yet distinguish between perturbations that change the parameter of interest and those that do not. To clarify this distinction, we define nuisance scores, the nuisance tangent space, and influence functions following van der Laan and Robins [2003], and show that influence functions are orthogonal to the nuisance tangent space. Let $\beta : \mathcal{P} \to \mathbb{R}$ be the target parameter of interest with $\beta_0 := \beta(P_0)$.
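Lemmas 1 and 2 above can be illustrated together on a finite sample space, where all expectations are exact sums. The sketch below is our own toy example (the pmf $p_0$, the tilt direction $g$, and the integrand $f$ are choices we make for exposition): it constructs a linear tilt submodel and confirms the derivative identity of Lemma 2 by a finite difference.

```python
import numpy as np

# Linear tilt submodel of Lemma 1 on a finite sample space, plus a
# check of the derivative identity of Lemma 2.
z = np.array([0.0, 1.0, 2.0, 3.0])
p0 = np.array([0.1, 0.2, 0.3, 0.4])        # base pmf p_0
g_raw = np.array([1.0, -1.0, 1.0, -1.0])
g = g_raw - np.sum(p0 * g_raw)             # bounded, with E_0[g] = 0
f = np.cos(z)                               # bounded integrand

def p(t):
    """Linear tilt p_t = p_0 (1 + t g); a valid pmf for |t| < 1/max|g|."""
    return p0 * (1 + t * g)

# Lemma 1: p_t is a probability mass function for small t.
assert np.all(p(0.1) >= 0) and abs(p(0.1).sum() - 1) < 1e-12

# Lemma 2: d/dt E_{P_t}[f] |_{t=0} = E_0[f s], with score s = g.
t = 1e-6
lhs = (np.sum(f * p(t)) - np.sum(f * p(-t))) / (2 * t)
rhs = np.sum(p0 * f * g)                    # E_0[f g]
print(lhs, rhs)                             # agree
```

Taking $f(z) = z$ in the same check gives the derivative of the mean functional $\beta(P) = E_P[Z]$ along tilts, namely $E_0[(Z - \beta_0) g]$, previewing the influence function $z - \beta_0$ in the sense of Definition 4 below.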
Definition 3 (Nuisance scores and nuisance tangent space). Assume that for every regular submodel $t \mapsto P_{t,s}$ through $P_0$, the derivative $\frac{d}{dt}\beta(P_{t,s})\big|_{t=0}$ exists. Define the nuisance score set
\[
\mathcal{S}_{\mathrm{nuis}} := \left\{ s \in \mathcal{S} : \exists \text{ a regular submodel } t \mapsto P_{t,s} \text{ with score } s \text{ such that } \frac{d}{dt}\beta(P_{t,s})\Big|_{t=0} = 0 \right\}.
\]
Define the nuisance tangent space $\Lambda := \overline{\operatorname{span}(\mathcal{S}_{\mathrm{nuis}})} \subset T$, with closure in $L_2(P_0)$.

It is worth noting that nuisance scores are defined without reference to any explicit nuisance parameterization. Concretely, a score $s$ is a nuisance score if there exists a regular submodel with score $s$ along which $\beta$ is locally constant to first order. In general, $\frac{d}{dt}\beta(P_{t,s})\big|_{t=0}$ need not be determined uniquely by the score alone; that is, two regular submodels can share the same score while yielding different derivatives of $\beta$. Moreover, $\Lambda$ is a closed linear subspace of $T$ generated by score directions that admit regular submodels along which $\beta$ is locally constant to first order.

Definition 4 (Pathwise differentiability and influence functions). We say $\beta$ is pathwise differentiable at $P_0$ if there exists $\varphi \in L_2^0(P_0)$ such that for every regular submodel with score $s$,
\[
\frac{d}{dt}\beta(P_{t,s})\Big|_{t=0} = E_0[\varphi(Z; P_0)\, s(Z)].
\]
Any such $\varphi$ is called an influence function of $\beta$ at $P_0$, or a gradient of the pathwise derivative.

Here, $\varphi(Z; P_0)$ indicates that $\varphi$ is a functional of $P_0$ evaluated at the data point $Z$; since $P_0$ is fixed throughout, we write simply $\varphi(Z)$ hereafter. Note that if $\beta$ is pathwise differentiable at $P_0$, then the derivative depends only on the score, in which case the condition in Definition 3 is equivalent to requiring that $\beta$ does not change to first order along any regular submodel with score $s$.

Remark 1 (Uniqueness of the influence function). In general, the influence function need not be unique.
The pathwise derivative condition only probes $\varphi$ through inner products with scores $s \in T$, so adding any $h \in T^\perp$ to $\varphi$ produces another valid influence function. Only the projection onto $T$ is identified by the pathwise derivative. This projection is called the efficient influence function (EIF) and is the unique influence function lying in $T$. In the nonparametric model, $T = L_2^0(P_0)$ by Corollary 1, so $T^\perp = \{0\}$ and the influence function is unique. Since all examples in this note are nonparametric, the influence function and the EIF coincide throughout.

Lemma 4 (Influence functions are orthogonal to $\Lambda$). If $\beta$ is pathwise differentiable with influence function $\varphi$ (Definition 4), then
\[
E_0[\varphi(Z)\, s(Z)] = 0 \quad \forall\, s \in \Lambda.
\]

Proof. Let $s \in \mathcal{S}_{\mathrm{nuis}}$. By Definition 3, there exists a regular submodel with score $s$ along which $\frac{d}{dt}\beta(P_{t,s})\big|_{t=0} = 0$. By pathwise differentiability,
\[
0 = \frac{d}{dt}\beta(P_{t,s})\Big|_{t=0} = E_0[\varphi(Z)\, s(Z)].
\]
Since the map $s \mapsto E_0[\varphi s]$ is a continuous linear functional on $L_2(P_0)$, the equality extends from $\mathcal{S}_{\mathrm{nuis}}$ to its closed linear span $\Lambda$.

Lemma 4 says that the influence function is orthogonal to every direction in the nuisance tangent space. This can be considered an analogue of Neyman orthogonality, which requires that the expected estimating function be insensitive to perturbations of the nuisance parameter, but formulated without reference to any explicit parameterization. Establishing a formal equivalence between these two formulations, as we do in Section 3, will rely on the product structure developed in the next section to identify nuisance perturbations with nuisance scores in $\Lambda$.

2.2 Estimating Functions and Neyman Orthogonality

To formulate Neyman orthogonality, we will need to work with estimating functions of the form $m(Z; \beta, \eta)$ that depend explicitly on both a target parameter $\beta$ and a nuisance parameter $\eta$.
This requires us to move beyond the framework of Section 2.1.2, where nuisance scores were defined without reference to any explicit parameterization, and specify concrete functionals on the model that assume the roles of the target and nuisance. Once such a parameterization is in place, it is natural to ask what structure the relationship between $\beta$ and $\eta$ must possess for the two viewpoints to agree. Pathwise differentiability is defined through scores alone and makes no reference to how the nuisance is parameterized, while Neyman orthogonality depends explicitly on the functional form of $\beta$ and $\eta$. As we show below, connecting these two viewpoints requires the ability to construct submodels that move one coordinate while holding the other fixed.

As before, let $\beta : \mathcal{P} \to \mathbb{R}$ and $\eta : \mathcal{P} \to \mathcal{H}$ be functionals on the model, where $\mathcal{H} \subset V$ is a subset of a normed vector space with norm $\|\cdot\|_V$. We let $\beta_0 := \beta(P_0)$ and $\eta_0 := \eta(P_0)$. For any pair $(\beta, \eta)$ in the attainable set $\Theta := \{(\beta(P), \eta(P)) : P \in \mathcal{P}\}$, we write $P_{\beta,\eta}$ for a distribution in $\mathcal{P}$ with $\beta(P_{\beta,\eta}) = \beta$ and $\eta(P_{\beta,\eta}) = \eta$, so that for any $P \in \mathcal{P}$, the expectation $E_P[f(Z; \beta(P), \eta(P))]$ can be written $E_{P_{\beta,\eta}}[f(Z; \beta, \eta)]$ with $(\beta, \eta) = (\beta(P), \eta(P))$. Finally, let
\[
\dot{\mathcal{H}} := \{ h \in V : \exists\, \epsilon > 0 \text{ such that } \eta_0 + t h \in \mathcal{H} \text{ for all } |t| < \epsilon \}
\]
denote the set of admissible perturbation directions at $\eta_0$.

2.2.1 Local Product Structure

To apply the differentiation results of Section 2.1.1, we require an additional local product structure assumption, which ensures the existence of regular (QMD) submodels along each coordinate, that is, submodels that perturb one of $\beta$ or $\eta$ while holding the other fixed. Note that any regular submodel $t \mapsto P_{t,s}$ through $P_0$ induces a coordinate path $t \mapsto (\beta_{t,s}, \eta_{t,s}) := (\beta(P_{t,s}), \eta(P_{t,s}))$.
The following assumption requires that this coordinate path can be controlled independently in each component.

Assumption 1 (Local product structure). The following first-order coordinate conditions hold:

1. $\beta$-coordinate submodel. There exists a regular (QMD) submodel $t \mapsto P_t \in \mathcal{P}$ through $P_0$ along which the induced coordinate path is differentiable at $t = 0$ with
\[
\frac{d}{dt}\beta(P_t)\Big|_{t=0} = 1 \quad \text{and} \quad \frac{d}{dt}\eta(P_t)\Big|_{t=0} = 0.
\]
2. $\eta$-coordinate submodel. For every admissible nuisance perturbation direction $h \in \dot{\mathcal{H}}$, there exists a regular (QMD) submodel $t \mapsto P_t \in \mathcal{P}$ through $P_0$ along which the induced coordinate path is differentiable at $t = 0$ with
\[
\frac{d}{dt}\beta(P_t)\Big|_{t=0} = 0 \quad \text{and} \quad \frac{d}{dt}\eta(P_t)\Big|_{t=0} = h.
\]

This formalizes a condition implicit in the framework of van der Laan and Robins [2003, p. 56], where the model is written as $\{F_{\mu,\eta}\}$ with $\mu$ and $\eta$ "independently varying," and submodels varying only the nuisance parameter are used to generate the nuisance tangent space. Note that Assumption 1 requires only first-order control: the derivatives of $\beta(P_t)$ and $\eta(P_t)$ at $t = 0$ are prescribed, but the paths need not satisfy $\beta(P_t) = \beta_0 + t$ or $\eta(P_t) = \eta_0 + t h$ exactly for $t \neq 0$. We discuss the relationship between Assumption 1 and the notion of local variation independence, as well as the role of product structure in the proof of Lemma 1.3 of van der Laan and Robins [2003], in Appendix B.

2.2.2 Neyman Orthogonality

Next, let $m : \mathcal{Z} \times \mathbb{R} \times \mathcal{H} \to \mathbb{R}$ be such that $z \mapsto m(z; \beta, \eta)$ is $\mathcal{A}$-measurable for each $(\beta, \eta)$. The function $m$ plays the role of an estimating function, encoding a moment condition whose solution at the true nuisance value identifies $\beta_0$, while the explicit dependence on $\eta$ reflects the presence of nuisance quantities that need to be estimated.

Definition 5 (Correct local specification).
We say that $m$ is correctly specified in a neighborhood of $(\beta_0, \eta_0)$ if for all $P \in \mathcal{P}$ with $(\beta(P), \eta(P))$ in a neighborhood of $(\beta_0, \eta_0)$,
\[
E_P[m(Z; \beta(P), \eta(P))] = 0.
\]

Correct specification ensures that $\beta_0$ solves the moment condition at the true nuisance, but does not constrain how the expected estimating function varies with $\eta$ near $\eta_0$. Neyman orthogonality strengthens this by requiring that this variation vanish to first order, so that small errors in $\eta$ do not propagate to estimation of $\beta$.

Definition 6 (Neyman orthogonality). Assume the map $\eta \mapsto E_0[m(Z; \beta_0, \eta)]$ is Gâteaux differentiable at $\eta_0$ along directions $h \in \dot{\mathcal{H}}$. We say $m$ is Neyman orthogonal at $(\beta_0, \eta_0)$ if
\[
\frac{\partial}{\partial \eta} E_0[m(Z; \beta_0, \eta)]\Big|_{\eta = \eta_0}[h] = 0 \quad \forall\, h \in \dot{\mathcal{H}}.
\]

Remark 2. The Gâteaux derivative in Definition 6 is computed under the fixed measure $P_0$ with $\beta_0$ held fixed. The map $\eta \mapsto E_0[m(Z; \beta_0, \eta)]$ is defined for any $\eta \in \mathcal{H}$ for which the integral exists, without requiring that $(\beta_0, \eta)$ correspond to a distribution in the model $\mathcal{P}$. In particular, no variation independence is needed to formulate Neyman orthogonality. The role of Assumption 1 is instead to establish that Neyman orthogonality holds for influence functions.

2.3 The $L_2$ Chain Rule along Coordinate Paths

To connect estimating functions with pathwise differentiability, we also need to differentiate the estimating function $m(Z; \beta, \eta)$ along the coordinate path induced by a regular submodel. The following two assumptions regulate the behavior of this coordinate path and of the estimating function along it.

Assumption 2 (Coordinate smoothness along a submodel). For a given regular (QMD) submodel $t \mapsto P_{t,s}$ through $P_0$ with score $s$, the induced coordinate path satisfies:

1. $t \mapsto \beta_{t,s} := \beta(P_{t,s})$ is differentiable at $0$: $(\beta_{t,s} - \beta_0)/t \to \dot\beta_{0,s} \in \mathbb{R}$.
2.
$t \mapsto \eta_{t,s} := \eta(P_{t,s})$ is differentiable at $0$ in $V$: $\|(\eta_{t,s} - \eta_0)/t - \dot\eta_{0,s}\|_V \to 0$ for some $\dot\eta_{0,s} \in \dot{\mathcal{H}}$.

In particular, $\dot\beta_{0,s} = \frac{d}{dt}\beta(P_{t,s})\big|_{t=0}$.

Assumption 3 (Fréchet differentiability of $m$ in $L_2(P_0)$). The map $(\beta, \eta) \mapsto m(\cdot; \beta, \eta) \in L_2(P_0)$ is Fréchet differentiable at $(\beta_0, \eta_0)$. That is, there exist bounded linear maps $D_\beta m_0 : \mathbb{R} \to L_2(P_0)$ and $D_\eta m_0 : V \to L_2(P_0)$ such that
\[
\| m(\cdot; \beta, \eta) - m(\cdot; \beta_0, \eta_0) - D_\beta m_0(\beta - \beta_0) - D_\eta m_0(\eta - \eta_0) \|_{L_2(P_0)} = o(|\beta - \beta_0| + \|\eta - \eta_0\|_V).
\]
We write $\partial_\beta m(Z; \beta_0, \eta_0) := D_\beta m_0(1)(Z)$ and $\partial_\eta m(Z; \beta_0, \eta_0)[h] := D_\eta m_0(h)(Z)$.

Lemma 5 ($L_2$ chain rule). Under Assumptions 2 and 3, define $f_{t,s}(Z) := m(Z; \beta_{t,s}, \eta_{t,s})$ and
\[
\dot f_{0,s}(Z) := \partial_\beta m(Z; \beta_0, \eta_0)\, \dot\beta_{0,s} + \partial_\eta m(Z; \beta_0, \eta_0)[\dot\eta_{0,s}].
\]
Then $(f_{t,s} - f_0)/t \to \dot f_{0,s}$ in $L_2(P_0)$. (Proof in Appendix A.4.)

3 Equivalence Between Neyman Orthogonality and Pathwise Differentiability

We now establish the relationship between Neyman orthogonality and pathwise differentiability. The forward direction (Section 3.1) demonstrates that a Neyman orthogonal estimating function with nondegenerate Jacobian induces an influence function, and hence pathwise differentiability. The reverse direction (Section 3.2) shows that if a correctly specified estimating function evaluates to an influence function at the truth, then it must be Neyman orthogonal and its sensitivity to the target parameter is fully calibrated by the influence function representation; here, we require local product structure in order to specialize to coordinate submodels that perturb $\beta$ and $\eta$ independently. The proofs of the two directions differ regarding their structural requirements.
The forward direction requires that the induced coordinate paths be smooth along a dense class of regular submodels and that the target functional be locally Lipschitz in Hellinger distance, whereas the reverse direction requires the local product structure of Assumption 1 in order to construct submodels that perturb $\beta$ and $\eta$ independently.

3.1 Neyman Orthogonality Implies Pathwise Differentiability

Fix an estimating function $m : \mathcal{Z} \times \mathbb{R} \times \mathcal{H} \to \mathbb{R}$. Correct specification ensures $E_{P_{t,s}}[m(Z; \beta_{t,s}, \eta_{t,s})] = 0$ identically along any regular submodel, so the derivative of this constant function vanishes. Expanding the derivative via Lemma 3 and the $L_2$ chain rule (Lemma 5), and then invoking Neyman orthogonality to eliminate the nuisance contribution, yields a representation of $\dot\beta_{0,s}$ as an inner product with the score, which is exactly pathwise differentiability.

Assumption 4 (Correct specification). The estimating function $m$ is correctly specified at $(\beta_0, \eta_0)$ in the sense of Definition 5.

Assumption 5 (Coordinate smoothness along a dense class of submodels). There exists a set of scores $S \subset L_\infty(P_0) \cap L_2^0(P_0)$ whose $L_2(P_0)$-closure equals $T$ such that for each $s \in S$, there exists a regular submodel $t \mapsto P_{t,s}$ through $P_0$ with score $s$ along which the induced coordinate path $t \mapsto (\beta_{t,s}, \eta_{t,s}) = (\beta(P_{t,s}), \eta(P_{t,s}))$ satisfies Assumption 2.

Assumption 6 (Fréchet differentiability of $m$). The map $(\beta, \eta) \mapsto m(\cdot; \beta, \eta) \in L_2(P_0)$ satisfies Assumption 3.

Assumption 7 (Regularity along submodels). For each $s \in S$ and the corresponding submodel $t \mapsto P_{t,s}$ from Assumption 5, the function $f_{t,s}(Z) := m(Z; \beta_{t,s}, \eta_{t,s})$ satisfies the conditions of Lemma 3.

Assumption 8 (Nondegenerate Jacobian). $G := E_0[\partial_\beta m(Z; \beta_0, \eta_0)] \neq 0$.

Assumption 9 (Neyman orthogonality).
For all $h \in \dot{\mathcal{H}}$,
\[
\frac{\partial}{\partial \eta} E_0[m(Z; \beta_0, \eta)]\Big|_{\eta = \eta_0}[h] = 0.
\]

Assumption 10 (Hellinger Lipschitz). There exist $c, \delta > 0$ such that
\[
|\beta(P_1) - \beta(P_2)| \le c\, H(P_1, P_2) \quad \forall\, P_1, P_2 \in \mathcal{P} \text{ with } H(P_i, P_0) \le \delta.
\]

Assumptions 4 and 9 are the two standard requirements on the estimating function introduced in Section 2.2.2. Assumptions 5 through 7 ensure that the differentiation machinery of Section 2 applies along a dense class of regular submodels; these amount to differentiability of the estimating function in its parameters and of the functionals $\beta$ and $\eta$ along these submodels, together with boundedness and integrability conditions near the truth. Assumption 8 ensures that the rescaling $\varphi = -G^{-1} m(Z; \beta_0, \eta_0)$ in the conclusion of Theorem 1 is well defined. Assumption 10 provides the quantitative control needed to extend the pathwise derivative from the dense class of bounded scores to all scores: it bounds how fast $\beta$ can vary relative to the Hellinger distance between distributions, ensuring that replacing an arbitrary regular submodel by one from the dense class with a nearby score incurs a controlled error in the derivative of $\beta$.

Theorem 1 (Neyman orthogonality implies pathwise differentiability). Under Assumptions 4 through 10, $\beta$ is pathwise differentiable (Definition 4) at $P_0$ with influence function
\[
\varphi(Z) := -G^{-1} m(Z; \beta_0, \eta_0).
\]

Proof. Let $s \in S$ and let $t \mapsto P_{t,s}$ be a regular submodel through $P_0$ with score $s$ as furnished by Assumption 5. By Assumption 5, the induced coordinate path $(\beta_{t,s}, \eta_{t,s})$ lies in a neighborhood of $(\beta_0, \eta_0)$ for small $t$. Assumption 4 then gives
\[
E_{P_{t,s}}[m(Z; \beta_{t,s}, \eta_{t,s})] = 0 \quad \text{for all sufficiently small } t.
\]
Define $f_{t,s}(Z) := m(Z; \beta_{t,s}, \eta_{t,s})$ and $f_0(Z) := m(Z; \beta_0, \eta_0)$. Since $t \mapsto E_{P_{t,s}}[f_{t,s}]$ is identically zero,
\[
\frac{d}{dt} E_{P_{t,s}}[f_{t,s}]\Big|_{t=0} = 0.
\]
We apply Lemma 3 to the function $f_{t,s}$, which is valid by Assumption 7. By the $L_2$ chain rule (Lemma 5), which applies under Assumptions 5 and 6, the quotient $(f_{t,s} - f_0)/t$ converges in $L_2(P_0)$ to
\[
\dot f_{0,s}(Z) = \partial_\beta m(Z; \beta_0, \eta_0)\, \dot\beta_{0,s} + \partial_\eta m(Z; \beta_0, \eta_0)[\dot\eta_{0,s}].
\]
Lemma 3 thus gives
\[
0 = E_0[f_0(Z) s(Z)] + E_0[\dot f_{0,s}(Z)].
\]
Substituting the expression for $\dot f_{0,s}$ and using linearity of expectation,
\[
0 = E_0[m(Z; \beta_0, \eta_0) s(Z)] + E_0[\partial_\beta m(Z; \beta_0, \eta_0)]\, \dot\beta_{0,s} + E_0\big[\partial_\eta m(Z; \beta_0, \eta_0)[\dot\eta_{0,s}]\big]. \tag{1}
\]
Now, $\dot\eta_{0,s} \in \dot{\mathcal{H}}$ by Assumption 5, and Fréchet differentiability (Assumption 6) permits the interchange of derivative and expectation, so that
\[
E_0\big[\partial_\eta m(Z; \beta_0, \eta_0)[h]\big] = \frac{\partial}{\partial \eta} E_0[m(Z; \beta_0, \eta)]\Big|_{\eta = \eta_0}[h],
\]
which vanishes by Neyman orthogonality (Assumption 9). Recalling $G = E_0[\partial_\beta m(Z; \beta_0, \eta_0)]$, we are left with
\[
0 = E_0[m(Z; \beta_0, \eta_0) s(Z)] + G \dot\beta_{0,s}.
\]
Since $G \neq 0$ by Assumption 8,
\[
\dot\beta_{0,s} = -G^{-1} E_0[m(Z; \beta_0, \eta_0) s(Z)] = E_0[\varphi(Z) s(Z)], \quad \text{where } \varphi(Z) = -G^{-1} m(Z; \beta_0, \eta_0).
\]
We note that $\varphi \in L_2^0(P_0)$. Condition 1 of Lemma 3, invoked via Assumption 7, requires $f_0 = m(Z; \beta_0, \eta_0) \in L_\infty(P_0)$; since this is a property of $f_0$ alone and does not depend on the choice of submodel, $\varphi = -G^{-1} f_0 \in L_\infty(P_0) \subset L_2(P_0)$. That $\varphi$ has mean zero follows from correct specification at the truth (Assumption 4).

It remains to extend the conclusion to all regular submodels. To start, let $t \mapsto P_{t,s'}$ be an arbitrary regular submodel through $P_0$ with score $s' \in \mathcal{S}$, and fix $\epsilon > 0$. Since $S$ is dense in $T$ by Assumption 5, there exists $g \in S$ with $\|s' - g\|_{L_2(P_0)} \le \epsilon$. Let $t \mapsto P_{t,g}$ be the regular submodel with score $g$ furnished by Assumption 5. By QMD, we know that $H(P_{t,s'}, P_0) \to 0$ and $H(P_{t,g}, P_0) \to 0$ as $t \to 0$.
Let $\delta > 0$ be as in Assumption 10. There exists $t^* > 0$ such that for all $|t| < t^*$,
\[
H(P_{t,s'}, P_0) \le \delta \quad \text{and} \quad H(P_{t,g}, P_0) \le \delta.
\]
By Lemma 6 (Appendix A.5), which bounds the Hellinger distance between two regular submodels in terms of the $L_2(P_0)$ distance between their scores,
\[
\limsup_{t \to 0} \frac{H(P_{t,s'}, P_{t,g})}{|t|} \le \frac{1}{2\sqrt{2}} \|s' - g\|_{L_2(P_0)} \le \frac{\epsilon}{2\sqrt{2}}.
\]
It then follows that
\[
\limsup_{t \to 0} \frac{|\beta(P_{t,s'}) - \beta(P_{t,g})|}{|t|} \le c \cdot \limsup_{t \to 0} \frac{H(P_{t,s'}, P_{t,g})}{|t|} \le \frac{c\epsilon}{2\sqrt{2}},
\]
where the first inequality holds by Assumption 10, since both $P_{t,s'}$ and $P_{t,g}$ lie within Hellinger distance $\delta$ of $P_0$ for $|t| < t^*$. Next, for any $t \neq 0$ with $|t| < t^*$, we write
\[
\left| \frac{\beta(P_{t,s'}) - \beta_0}{t} - E_0[\varphi s'] \right| \le \underbrace{\frac{|\beta(P_{t,s'}) - \beta(P_{t,g})|}{|t|}}_{\text{(I)}} + \underbrace{\left| \frac{\beta(P_{t,g}) - \beta_0}{t} - E_0[\varphi g] \right|}_{\text{(II)}} + \underbrace{\big| E_0[\varphi (s' - g)] \big|}_{\text{(III)}}.
\]
For term (I), we know $\limsup_{t \to 0} \text{(I)} \le \frac{c\epsilon}{2\sqrt{2}}$. For term (II), the score $g$ lies in $S$, so we know from above that $\lim_{t \to 0} \text{(II)} = 0$. For term (III), by Cauchy-Schwarz,
\[
\big| E_0[\varphi (s' - g)] \big| \le \|\varphi\|_{L_2(P_0)} \cdot \|s' - g\|_{L_2(P_0)} \le \|\varphi\|_{L_2(P_0)} \cdot \epsilon.
\]
Combining the three terms, we arrive at
\[
\limsup_{t \to 0} \left| \frac{\beta(P_{t,s'}) - \beta_0}{t} - E_0[\varphi s'] \right| \le \epsilon \left( \frac{c}{2\sqrt{2}} + \|\varphi\|_{L_2(P_0)} \right).
\]
Since $\epsilon$ was arbitrary, the left-hand side equals zero. Therefore,
\[
\frac{d}{dt} \beta(P_{t,s'})\Big|_{t=0} = E_0[\varphi(Z) s'(Z)].
\]
Since $s' \in \mathcal{S}$ was arbitrary, Definition 4 is satisfied and $\beta$ is pathwise differentiable at $P_0$ with influence function $\varphi$.

Remark 3 (Hellinger Lipschitz). The extension from bounded scores to all scores in the proof of Theorem 1 adapts an argument from Luedtke and Chung [2024], who use a Hellinger Lipschitz condition to establish pathwise differentiability of Hilbert-valued parameters from a score-dense class of submodels (their Lemma 2).
The first part of the proof, which establishes the derivative representation on the dense class from Neyman orthogonality, is specific to the present setting.

3.2 Pathwise Differentiability Implies Neyman Orthogonality

We now prove the converse: if $m$ is a correctly specified estimating function whose value at the truth is an influence function, then $m$ is Neyman orthogonal and its sensitivity to perturbations of $\beta$ is pinned at unit rate by the pathwise derivative. Unlike the forward direction, this requires the local product structure of Assumption 1 in order to specialize Equation (1) from the proof of Theorem 1 to each coordinate axis independently.

Assumption 11 (Pathwise differentiability and influence function representation). The functional $\beta$ is pathwise differentiable at $P_0$ (Definition 4) with influence function $\varphi(Z) \equiv m(Z; \beta_0, \eta_0)$.

Assumption 12 (Local product structure). Assumption 1 holds. We denote the score of the $\beta$-coordinate submodel by $s_\beta$ and the score of the $\eta$-coordinate submodel in direction $h$ by $s_h$.

Assumption 13 (Regularity along coordinate submodels). For each coordinate submodel $t \mapsto P_{t,s}$ from Assumption 12, the score $s$ is bounded and the function $f_{t,s}(Z) := m(Z; \beta_{t,s}, \eta_{t,s})$ satisfies the conditions of Lemma 3.

Theorem 2 (Pathwise differentiability implies Neyman orthogonality). Under Assumptions 4, 6, and 11 through 13, the estimating function $m$ satisfies:

1. Neyman orthogonality. For all $h \in \dot{\mathcal{H}}$,
\[
\frac{\partial}{\partial \eta} E_0[m(Z; \beta_0, \eta)]\Big|_{\eta = \eta_0}[h] = 0.
\]
2. $-1$ normalization. $G := E_0[\partial_\beta m(Z; \beta_0, \eta_0)] = -1$.

Proof. The coordinate submodels furnished by Assumption 12 are regular submodels through $P_0$, and their induced coordinate paths are differentiable at $t = 0$ by construction, so they satisfy Assumption 2.
Together with correct specification (Assumption 4), Fréchet differentiability (Assumption 6), and the regularity conditions of Assumption 13, the derivation in the proof of Theorem 1 leading to Equation (1) applies to each coordinate submodel $t \mapsto P_{t,s}$ with score $s$:
\[ 0 = E_0[m(Z; \beta_0, \eta_0) s(Z)] + G \dot\beta_{0,s} + \frac{\partial}{\partial \eta} E_0[m(Z; \beta_0, \eta)] \Big|_{\eta = \eta_0}[\dot\eta_{0,s}]. \tag{2} \]
By the influence function representation (Assumption 11), the first term equals $\dot\beta_{0,s} = \frac{d}{dt}\beta(P_{t,s})\big|_{t=0}$, so (2) becomes
\[ (1 + G)\, \dot\beta_{0,s} + \frac{\partial}{\partial \eta} E_0[m(Z; \beta_0, \eta)] \Big|_{\eta = \eta_0}[\dot\eta_{0,s}] = 0. \tag{3} \]
We now specialize (3) to each coordinate submodel.

Part 1. Fix $h \in \dot H$ and take $t \mapsto P_{t,s_h}$ to be the $\eta$-coordinate submodel from Assumption 12. By Assumption 12, $\dot\beta_{0,s_h} = 0$ and $\dot\eta_{0,s_h} = h$. Substituting into (3),
\[ \frac{\partial}{\partial \eta} E_0[m(Z; \beta_0, \eta)] \Big|_{\eta = \eta_0}[h] = 0. \]
Since $h \in \dot H$ was arbitrary, Neyman orthogonality holds.

Part 2. Take $t \mapsto P_{t,s_\beta}$ to be the $\beta$-coordinate submodel from Assumption 12. By Assumption 12, $\dot\beta_{0,s_\beta} = 1$ and $\dot\eta_{0,s_\beta} = 0$. Since the Gâteaux derivative in (3) is evaluated at direction $\dot\eta_{0,s_\beta} = 0$, the numerator of the defining difference quotient vanishes identically, which leaves $(1 + G) \cdot 1 = 0$, hence $G = -1$.

Remark 4 (Structural comparison with the forward direction). The forward and reverse directions share the same intermediate identity (1), but differ in what is known and what is derived. In the forward direction, Neyman orthogonality eliminates the nuisance term, and the resulting inner-product representation $\dot\beta_{0,s} = E_0[\varphi s]$ for every score $s \in S$ yields pathwise differentiability.
In the reverse direction, the influence function representation converts the first term into $\dot\beta_{0,s}$, and product structure allows one to specialize the resulting identity (3) to each coordinate axis independently, yielding Neyman orthogonality and $G = -1$. The two directions also place different requirements on the submodels. In Theorem 1, the coordinate path $(\beta_{t,s}, \eta_{t,s})$ arises from evaluating the functionals $\beta$ and $\eta$ along regular submodels from the dense class in Assumption 5. In Theorem 2, we must construct submodels with prescribed first-order coordinate behavior: one along which $\dot\beta_{0,s} = 1$, $\dot\eta_{0,s} = 0$, and one with $\dot\beta_{0,s} = 0$, $\dot\eta_{0,s} = h$.

Remark 5 (Sharpness of the product structure assumption). When $\beta$ factors through $\eta$, as when $\beta(P) = \int p^2 \, d\nu$ with $\eta = p$, the target parameter carries no degrees of freedom beyond those already encoded in the nuisance. Holding $\eta$ fixed necessarily holds $\beta$ fixed, and no $\beta$-coordinate submodel of the kind required by Assumption 1 can exist. Pathwise differentiability still holds in this example, but the estimating function defined by the influence function is not Neyman orthogonal. The reverse direction of the equivalence thus requires the product structure of Assumption 1 and does not follow from pathwise differentiability alone.

Remark 6 (The $-1$ normalization). The $-1$ normalization follows naturally as a structural consequence of pathwise differentiability and the coordinate geometry of the model. Along the $\beta$-coordinate submodel $t \mapsto P_{t,s_\beta}$, the parameter $\beta$ increases at unit rate by construction, the influence function representation gives $E_0[m(Z; \beta_0, \eta_0) s_\beta(Z)] = 1$, and (3) forces $1 + G = 0$. A first-order Taylor expansion gives
\[ E_0[m(Z; \beta, \eta_0)] \approx E_0[m(Z; \beta_0, \eta_0)] + (-1) \cdot (\beta - \beta_0) = -(\beta - \beta_0), \]
so that $E_0[m(Z; \beta, \eta_0)] = 0$ has the unique local solution $\beta = \beta_0$, as desired.
This also sheds light on a familiar pattern in semiparametric inference, where many influence functions take the form $\varphi(Z) = (\text{data-dependent term}) - \beta_0$. The normalization requires $\beta$ to enter the expected estimating function with first-order sensitivity exactly $-1$, which is realized by subtracting off $\beta$.

To illustrate that the conditions of Theorems 1 and 2 can be verified in a standard setting, we work through an example of the average treatment effect in detail in Appendix C, constructing the required coordinate submodels explicitly and checking each assumption.

4 Discussion

In this paper, we have established a precise equivalence between Neyman orthogonality and pathwise differentiability in nonparametric models, building on the foundational semiparametric theory of Bickel et al. [1998], van der Laan and Robins [2003], and Tsiatis [2006], and connecting it to the modern debiased machine learning framework of Chernozhukov et al. [2018]. Our forward theorem shows that under mild conditions, Neyman orthogonality implies pathwise differentiability, and our converse shows that the reverse implication also holds, but requires the additional geometric condition of local product structure.

Several directions remain open. Foremost, the regularity conditions we impose, notably the existence of coordinate submodels witnessing local product structure, can be nontrivial to verify in complex semiparametric problems, such as those involving constrained nuisance spaces or functionals defined through implicit equations. That said, the conditions we require are mild, amounting to smoothness of the estimating function and the ability to perturb the target and nuisance parameters independently, and we expect the equivalence to hold broadly in the semiparametric settings most commonly encountered in practice.
Relaxing these conditions, extending the equivalence to settings with non-smooth functionals, and developing systematic tools for constructing coordinate submodels in applied problems would be natural next steps.

References

Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1–C68, 2018.

Mark J. van der Laan and James M. Robins. Unified Methods for Censored Longitudinal Data and Causality. Springer New York, 2003.

Peter J. Bickel, Chris A.J. Klaassen, Ya'acov Ritov, and Jon A. Wellner. Efficient and Adaptive Estimation for Semiparametric Models. Springer New York, 1998.

A. W. van der Vaart. Asymptotic Statistics. Cambridge University Press, 1998.

Anastasios Tsiatis. Semiparametric Theory and Missing Data. Springer New York, 2006.

Alex Luedtke and Incheoul Chung. One-step estimation of differentiable Hilbert-valued parameters. The Annals of Statistics, 52(4), 2024.

A Proofs of Lemmas

A.1 Proof of Lemma 1

We first verify that $p_t$ is a density. For $|t| < 1/M$, we have $1 + tg(z) > 1 - |t|M > 0$ $\nu$-a.e., so $p_t \ge 0$. Also
\[ \int p_t \, d\nu = \int p_0 (1 + tg) \, d\nu = 1 + t \int g \, dP_0 = 1 + t E_0[g] = 1. \]
We next show that it satisfies the QMD expansion. Write $\sqrt{p_t} = \sqrt{p_0} \sqrt{1 + tg}$. Define
\[ r(u) := \sqrt{1 + u} - 1 - \tfrac{1}{2} u, \qquad u \in (-1, 1). \]
Then $r(0) = r'(0) = 0$. Since $\sqrt{1 + u}$ has bounded second derivative on $[-1/2, 1/2]$, there exists $C < \infty$ such that $|r(u)| \le C u^2$ for $|u| \le 1/2$. For $|t| \le 1/(2M)$ we have $|tg| \le 1/2$, hence
\[ \sqrt{1 + tg} = 1 + \tfrac{t}{2} g + r(tg). \]
Therefore
\[ \frac{\sqrt{p_t} - \sqrt{p_0}}{t} - \tfrac{1}{2} g \sqrt{p_0} = \sqrt{p_0} \cdot \frac{r(tg)}{t}. \]
Using $|r(tg)| \le C t^2 g^2$, we have $|r(tg)/t| \le C |t| g^2 \le C |t| M^2$.
Hence
\[ \int \left( \frac{\sqrt{p_t} - \sqrt{p_0}}{t} - \tfrac{1}{2} g \sqrt{p_0} \right)^2 d\nu \le \int p_0 \left( C |t| M^2 \right)^2 d\nu = C^2 t^2 M^4 \to 0 \]
as $t \to 0$, and the path is QMD with score $s \equiv g$.

A.2 Proof of Lemma 2

Write $P_t \equiv P_{t,s}$ and $p_t = dP_t/d\nu$ throughout. We have $E_{P_t}[f] = \int f p_t \, d\nu$. Then
\[ E_{P_t}[f] - E_0[f] = \int f (p_t - p_0) \, d\nu = \int f (\sqrt{p_t} - \sqrt{p_0})(\sqrt{p_t} + \sqrt{p_0}) \, d\nu. \]
Dividing by $t$,
\[ \frac{E_{P_t}[f] - E_0[f]}{t} = \int f \left( \frac{\sqrt{p_t} - \sqrt{p_0}}{t} \right) (\sqrt{p_t} + \sqrt{p_0}) \, d\nu. \]
Let $\Delta_t := \frac{\sqrt{p_t} - \sqrt{p_0}}{t} - \tfrac{1}{2} s \sqrt{p_0}$. By QMD (Definition 1), $\|\Delta_t\|_{L_2(\nu)} \to 0$. Decompose
\[ \frac{E_{P_t}[f] - E_0[f]}{t} = I_{t,1} + I_{t,2}, \]
where
\[ I_{t,1} := \int f \left( \tfrac{1}{2} s \sqrt{p_0} \right) (\sqrt{p_t} + \sqrt{p_0}) \, d\nu, \qquad I_{t,2} := \int f \Delta_t (\sqrt{p_t} + \sqrt{p_0}) \, d\nu. \]
Term $I_{t,1}$. By QMD (Definition 1) and the triangle inequality, $\sqrt{p_t} \to \sqrt{p_0}$ in $L_2(\nu)$. To see this, there exists a function $g = \tfrac{1}{2} s \sqrt{p_0}$ such that
\[ \left\| \frac{\sqrt{p_t} - \sqrt{p_0}}{t} - g \right\|_{L_2(\nu)} \to 0, \]
which is equivalent to saying that for any $\epsilon > 0$ there exists $\delta > 0$ such that for $|t| < \delta$,
\[ \left\| \frac{\sqrt{p_t} - \sqrt{p_0}}{t} - g \right\|_{L_2(\nu)} < \epsilon. \]
By the triangle inequality, for $|t| < \delta$,
\[ \left\| \frac{\sqrt{p_t} - \sqrt{p_0}}{t} \right\|_{L_2(\nu)} \le \left\| \frac{\sqrt{p_t} - \sqrt{p_0}}{t} - g \right\|_{L_2(\nu)} + \|g\|_{L_2(\nu)} < \epsilon + \|g\|_{L_2(\nu)}, \]
where the right-hand side does not depend on $t$. Multiplying both sides by $|t|$ and taking the limit as $t \to 0$ yields the result. Thus $\sqrt{p_t} + \sqrt{p_0} \to 2\sqrt{p_0}$ in $L_2(\nu)$. Since $f s \sqrt{p_0} \in L_2(\nu)$, as
\[ \int f^2 s^2 p_0 \, d\nu = E_0[f^2 s^2] \le M^2 E_0[f^2] < \infty \]
where $M := \|s\|_\infty$, we have
\[ I_{t,1} \to \int f \left( \tfrac{1}{2} s \sqrt{p_0} \right) 2 \sqrt{p_0} \, d\nu = \int f s p_0 \, d\nu = E_0[f s]. \]
Term $I_{t,2}$. By Cauchy–Schwarz,
\[ |I_{t,2}| \le \| f (\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)} \cdot \|\Delta_t\|_{L_2(\nu)}. \]
We already have $\|\Delta_t\|_{L_2(\nu)} \to 0$. It remains to show that $\| f (\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)}$ is bounded for small $t$.
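The QMD remainder bound just established can be observed numerically for a linear tilt on a finite sample space. This is an illustrative sketch with arbitrarily chosen `p0` and `g`: the $L_2(\nu)$ norm of the remainder should shrink linearly in $t$, matching the $C^2 t^2 M^4$ bound on its square.

```python
import numpy as np

# Three-point sample space; all norms below are exact finite sums.
p0 = np.array([0.5, 0.3, 0.2])
g = np.array([1.0, -0.5, 2.0])
g = g - np.dot(p0, g)          # center so that E0[g] = 0

def qmd_remainder(t):
    """L2(nu) norm of (sqrt(p_t) - sqrt(p0))/t - (1/2) g sqrt(p0) for the linear tilt."""
    pt = p0 * (1.0 + t * g)    # a density for small |t| since g is bounded
    diff = (np.sqrt(pt) - np.sqrt(p0)) / t - 0.5 * g * np.sqrt(p0)
    return np.sqrt(np.sum(diff ** 2))

errs = [qmd_remainder(t) for t in (1e-1, 1e-2, 1e-3)]
print(errs)   # shrinks roughly by a factor of 10 per step, i.e. O(t)
```

Each factor-of-ten reduction in $t$ reduces the remainder by roughly the same factor, confirming the $O(|t|)$ rate in the proof of Lemma 1.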
Write
\[ \| f (\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)}^2 = \int f^2 (\sqrt{p_t} + \sqrt{p_0})^2 \, d\nu \le 2 \int f^2 (p_t + p_0) \, d\nu = 2 \left\{ E_{P_t}[f^2] + E_0[f^2] \right\}. \]
Under QMD, we know $P_t \to P_0$ in Hellinger distance, and
\[ \big| E_{P_t}[f^2] - E_0[f^2] \big| = \left| \int f^2 (p_t - p_0) \, d\nu \right| \le \|f^2\|_\infty \int |p_t - p_0| \, d\nu \le \|f^2\|_\infty \cdot 2 \|\sqrt{p_t} - \sqrt{p_0}\|_{L_2(\nu)} \to 0, \]
where the first inequality follows from the boundedness assumption on $f$ and the second from $\mathrm{TV}(P_t, P_0) \le 2\sqrt{2}\, H(P_t, P_0)$. Hence $E_{P_t}[f^2] \to E_0[f^2]$ as $t \to 0$ and $I_{t,2} \to 0$. Combining the results above yields the desired derivative.

A.3 Proof of Lemma 3

Write $P_t \equiv P_{t,s}$ and $p_t = dP_t/d\nu$ throughout. We have
\[ E_{P_t}[f_t] - E_0[f_0] = \left\{ E_{P_t}[f_0] - E_0[f_0] \right\} + E_{P_t}[f_t - f_0]. \]
Dividing by $t$,
\[ \frac{E_{P_t}[f_t] - E_0[f_0]}{t} = \frac{E_{P_t}[f_0] - E_0[f_0]}{t} + E_{P_t}\left[ \frac{f_t - f_0}{t} \right]. \]
By Lemma 2, the first term converges to $E_0[f_0 s]$. For the second term, let $g_t := (f_t - f_0)/t$. By Assumption (2), $g_t \to \dot f_0$ in $L_2(P_0)$, so $E_0[g_t] \to E_0[\dot f_0]$. It remains to show $E_{P_t}[g_t] - E_0[g_t] \to 0$. Write
\[ E_{P_t}[g_t] - E_0[g_t] = \int g_t (p_t - p_0) \, d\nu = \int g_t (\sqrt{p_t} - \sqrt{p_0})(\sqrt{p_t} + \sqrt{p_0}) \, d\nu. \]
By Cauchy–Schwarz,
\[ | E_{P_t}[g_t] - E_0[g_t] | \le \| g_t (\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)} \cdot \| \sqrt{p_t} - \sqrt{p_0} \|_{L_2(\nu)}. \]
By QMD (Definition 1), $\| \sqrt{p_t} - \sqrt{p_0} \|_{L_2(\nu)} \to 0$. It suffices to show that $\| g_t (\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)}$ is bounded for small $t$. By $(a + b)^2 \le 2(a^2 + b^2)$,
\[ \| g_t (\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)}^2 \le 2 \left\{ E_{P_t}[g_t^2] + E_0[g_t^2] \right\}. \]
By Assumption (3), $E_{P_t}[g_t^2]$ is uniformly bounded for small $t$; $E_0[g_t^2]$ is also bounded since $g_t \to \dot f_0$ in $L_2(P_0)$. Hence the right-hand side is bounded and $E_{P_t}[g_t] - E_0[g_t] \to 0$. Therefore $E_{P_t}[g_t] \to E_0[\dot f_0]$. Collecting both terms yields the desired identity.
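The derivative identity of Lemma 2 can be seen in its simplest form on a finite sample space with a linear tilt, where the difference quotient is not merely convergent but exact: $E_{P_t}[f] = E_0[f] + t E_0[fs]$ for every $t$. This is a sketch with arbitrarily chosen `p0`, `s`, and `f`.

```python
import numpy as np

p0 = np.array([0.4, 0.35, 0.25])
s = np.array([1.0, -2.0, 0.5])
s = s - np.dot(p0, s)            # mean-zero score under P0
f = np.array([3.0, -1.0, 2.0])   # bounded test function

t = 0.05
pt = p0 * (1.0 + t * s)          # linear tilt submodel at parameter t

# Difference quotient (E_{P_t}[f] - E_0[f]) / t versus the Lemma 2 limit E_0[f s].
diff_quot = (np.dot(pt, f) - np.dot(p0, f)) / t
deriv = np.dot(p0, f * s)
print(diff_quot, deriv)          # identical up to floating-point rounding
```

For general QMD submodels the agreement holds only in the limit $t \to 0$; linearity of the tilt makes the remainder vanish identically.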
A.4 Proof of Lemma 5

By Fréchet differentiability (Assumption 3),
\[ m(\cdot\,; \beta_{t,s}, \eta_{t,s}) - m(\cdot\,; \beta_0, \eta_0) = D_\beta m_0 (\beta_{t,s} - \beta_0) + D_\eta m_0 (\eta_{t,s} - \eta_0) + r_{t,s}, \]
where $\| r_{t,s} \|_{L_2(P_0)} = o(|\beta_{t,s} - \beta_0| + \|\eta_{t,s} - \eta_0\|_V)$. Dividing by $t$ and subtracting $\dot f_{0,s}$,
\[ \frac{f_{t,s} - f_0}{t} - \dot f_{0,s} = D_\beta m_0 \left( \frac{\beta_{t,s} - \beta_0}{t} - \dot\beta_{0,s} \right) + D_\eta m_0 \left( \frac{\eta_{t,s} - \eta_0}{t} - \dot\eta_{0,s} \right) + \frac{r_{t,s}}{t}. \]
Take $L_2(P_0)$ norms. By boundedness of $D_\beta m_0$ and $D_\eta m_0$, there exist constants $C_\beta, C_\eta$ such that
\[ \left\| \frac{f_{t,s} - f_0}{t} - \dot f_{0,s} \right\|_{L_2(P_0)} \le C_\beta \left| \frac{\beta_{t,s} - \beta_0}{t} - \dot\beta_{0,s} \right| + C_\eta \left\| \frac{\eta_{t,s} - \eta_0}{t} - \dot\eta_{0,s} \right\|_V + \left\| \frac{r_{t,s}}{t} \right\|_{L_2(P_0)}. \]
The first two terms tend to zero by Assumption 2. For the remainder, Assumption 2 implies $|\beta_{t,s} - \beta_0| + \|\eta_{t,s} - \eta_0\|_V = O(|t|)$, so $\| r_{t,s}/t \|_{L_2(P_0)} = o(1)$, which proves the claim.

A.5 Hellinger Gap between Regular Submodels

Lemma 6. Let $t \mapsto P_{t,s}$ be a regular submodel with score $s$ and $t \mapsto P_{t,g}$ be a regular submodel with score $g$. Then
\[ \limsup_{t \to 0} \frac{H(P_{t,s}, P_{t,g})}{|t|} \le \frac{1}{2\sqrt{2}} \| s - g \|_{L_2(P_0)}. \]
Proof. By QMD, we have
\[ \sqrt{p_{t,s}} = \sqrt{p_0} \left( 1 + \tfrac{t}{2} s \right) + r_t, \qquad \| r_t \|_{L_2(\nu)} = o(|t|), \]
\[ \sqrt{p_{t,g}} = \sqrt{p_0} \left( 1 + \tfrac{t}{2} g \right) + \tilde r_t, \qquad \| \tilde r_t \|_{L_2(\nu)} = o(|t|). \]
Subtracting,
\[ \sqrt{p_{t,s}} - \sqrt{p_{t,g}} = \tfrac{t}{2} (s - g) \sqrt{p_0} + (r_t - \tilde r_t). \]
Taking $L_2(\nu)$ norms and using the triangle inequality,
\[ \| \sqrt{p_{t,s}} - \sqrt{p_{t,g}} \|_{L_2(\nu)} \le \frac{|t|}{2} \| (s - g) \sqrt{p_0} \|_{L_2(\nu)} + \| r_t \|_{L_2(\nu)} + \| \tilde r_t \|_{L_2(\nu)}. \]
Now $\| (s - g) \sqrt{p_0} \|_{L_2(\nu)}^2 = \int (s - g)^2 p_0 \, d\nu = E_0[(s - g)^2] = \| s - g \|_{L_2(P_0)}^2$. Dividing by $|t|$,
\[ \frac{\| \sqrt{p_{t,s}} - \sqrt{p_{t,g}} \|_{L_2(\nu)}}{|t|} \le \frac{1}{2} \| s - g \|_{L_2(P_0)} + \frac{\| r_t \|_{L_2(\nu)} + \| \tilde r_t \|_{L_2(\nu)}}{|t|}. \]
Since $\| r_t \|_{L_2(\nu)} = o(|t|)$ and $\| \tilde r_t \|_{L_2(\nu)} = o(|t|)$, the second term vanishes as $t \to 0$.
Using the definition $H \equiv \frac{1}{\sqrt{2}} \| \cdot \|_{L_2(\nu)}$,
\[ \limsup_{t \to 0} \frac{H(P_{t,s}, P_{t,g})}{|t|} \le \frac{1}{2\sqrt{2}} \| s - g \|_{L_2(P_0)}. \]

B Variation Independence and Product Structure

Assumption 1 requires that, for each coordinate direction, there exists a regular submodel through $P_0$ along which the induced coordinate path moves one of $\beta$ or $\eta$ to first order while holding the other fixed. A natural question is how this relates to the classical notion of local variation independence, which asks that the attainable parameter set contain a product neighborhood of $(\beta_0, \eta_0)$. Local variation independence guarantees that independently varied parameter values exist, but it is purely set-theoretic and does not ensure that they are connected by submodels regular enough to differentiate along. In this appendix, we formalize the distinction between these two conditions and examine the role of product structure in the classical results of van der Laan and Robins [2003].

B.1 Local Variation Independence

As mentioned in Definition 3, we are primarily concerned with regular submodels along which $\beta$ has derivative zero at the truth, while $\eta$ is free to vary. The obvious question is whether such paths can always be constructed, i.e., whether one can perturb $\eta$ while holding $\beta$ fixed. If the chosen nuisance functional $\eta$ already determines $\beta$, for instance if $\beta = g(\eta)$ for some known map $g$, then varying $\eta$ necessarily changes $\beta$, and the two functionals cannot be perturbed independently.

Definition 7 (Local Variation Independence). We say that $\beta$ and $\eta$ are locally variation independent at $P_0$ if there exist neighborhoods $U \ni \beta_0$ and $V \ni \eta_0$ such that
\[ U \times V \subset \Theta := \{ (\beta(P), \eta(P)) : P \in \mathcal{P} \}, \]
that is, the attainable parameter set $\Theta$ contains a product neighborhood of $(\beta_0, \eta_0)$.
In words, near $(\beta_0, \eta_0)$ there is a full interval of $\beta$-values and a full neighborhood of $\eta$-values such that every combination of the two is realized by some $P \in \mathcal{P}$. The consequence is that one can vary $\beta$ while holding $\eta$ fixed, and vice versa. That is, for sufficiently small $t$, the pairs $(\beta_0 + t, \eta_0)$ and $(\beta_0, \eta_0 + th)$ are both attainable, meaning there exist distributions in $\mathcal{P}$ realizing those functional values. Without such a product neighborhood, the attainable pairs near $(\beta_0, \eta_0)$ could lie along a lower-dimensional surface, so that changing $\beta$ might force $\eta$ to change as well.

Crucially, however, local variation independence is purely a set-theoretic statement about the attainable set $\Theta$. The condition guarantees that for each small $t$, there exists at least one distribution $P \in \mathcal{P}$ with $(\beta(P), \eta(P)) = (\beta_0 + t, \eta_0)$. However, this is only a pointwise existence guarantee and imposes no regularity on how such choices may depend on $t$. In particular, local variation independence does not imply that there exists a map $t \mapsto P_t \in \mathcal{P}$ satisfying $(\beta(P_t), \eta(P_t)) = (\beta_0 + t, \eta_0)$ that is quadratic-mean differentiable at $t = 0$.

Assumption 14 (Regular coordinate submodels). For every admissible direction $h \in \dot H$, the paths $t \mapsto P_{\beta_0 + t, \eta_0}$ and $t \mapsto P_{\beta_0, \eta_0 + th}$ exist in $\mathcal{P}$ for sufficiently small $|t|$ and are regular (QMD) submodels through $P_0$ at $t = 0$.

Proposition 1. If $\beta$ and $\eta$ are locally variation independent at $P_0$ (Definition 7) and satisfy coordinate QMD smoothness (Assumption 14), then local product structure (Assumption 1) holds.

Proof. Local variation independence provides neighborhoods $U \ni \beta_0$ and $V \ni \eta_0$ with $U \times V \subseteq \Theta$. For small $|t|$, the parameter values $(\beta_0 + t, \eta_0)$ and $(\beta_0, \eta_0 + th)$ lie in $U \times V$ and hence correspond to distributions in $\mathcal{P}$.
Assumption 14 asserts that the resulting paths are differentiable in quadratic mean at $t = 0$, giving exactly the conditions of Assumption 1.

We note that Assumption 1 is strictly weaker than this combination in two respects: it requires neither a full product neighborhood in the parameter space nor exact coordinate paths, only regular submodels with the correct first-order coordinate derivatives at $P_0$.

B.2 Revisiting the Gradient Characterization

As discussed in Section 2.2.1, the distinction between the set-theoretic content of local variation independence and the analytic content of Assumption 1 is subtle, and it is natural to ask whether this distinction matters in practice. We show that the answer is affirmative by revisiting the classical results of van der Laan and Robins [2003, Section 1.4], which connect influence functions to estimating functions. Their framework contains the essential insight that underpins the equivalence we formalize in Section 3. However, the regularity of submodels that perturb $\beta$ and $\eta$ independently, which we have isolated as Assumption 1, plays an important role in their argument that was not separately identified. Making this explicit is the purpose of the present subsection.

We focus on two results from van der Laan and Robins [2003]: their Lemma 1.2, which characterizes gradients through the derivative of an expected estimating function along arbitrary submodels, and their Lemma 1.3, which establishes that the derivative of the expected estimating function with respect to $\beta$ at fixed $\eta_0$ equals $-1$. This latter result is the key step that links influence functions to estimating functions and underpins the construction of efficient estimators via solving moment conditions.
We will show that the proof of Lemma 1.3 contains an implicit step, replacing the varying nuisance $\eta(P_{t,s})$ by the fixed value $\eta_0$ inside a derivative, that requires the nuisance tangent space to capture all nuisance directions, which in turn requires the local product structure of Assumption 1.

For the reader's convenience, we state the relevant results in our notation. The correspondence with van der Laan and Robins [2003] is:
\[ P_0 \leftrightarrow F_X, \quad \beta \leftrightarrow \mu, \quad \eta \leftrightarrow \rho, \quad E_0 \leftrightarrow E_{F_X}, \quad \{P_{t,s}\} \leftrightarrow \{F_{\epsilon,s}\}, \quad \varphi^* \leftrightarrow S^{*}_{F,\mathrm{eff}}, \quad \Lambda \leftrightarrow T_{F,\mathrm{nuis}}, \quad \mathcal{P} \leftrightarrow \mathcal{M}_F. \]

Setup. The framework of van der Laan and Robins [2003] posits a class of estimating functions indexed by an abstract label $k$, mapping each distribution in the model to a mean-zero function of the data. The key structural requirement is that these estimating functions, evaluated at the true parameter values, span the orthogonal complement $\Lambda^\perp$ of the nuisance tangent space. Since influence functions are orthogonal to $\Lambda$ by Lemma 4, this ensures that every candidate influence function is representable as an estimating function and, combined with unbiasedness along submodels, allows one to recover the inner-product characterization linking estimating functions to gradients (Lemma 7). We collect the precise conditions as follows.

Assumption 15 (Estimating function representation). Suppose there exists an abstract index set $\mathcal{K}$ and a mapping $(k, \beta, \eta) \mapsto D_k(\cdot \mid \beta, \eta)$ from $\mathcal{K} \times \Theta$ into functions of $Z$ such that:

1. Unbiased estimating function. $E_P[D_k(Z \mid \beta(P), \eta(P))] = 0$ for all $P \in \mathcal{P}$ and all $k \in \mathcal{K}$.

2. Richness. The index set $\mathcal{K}$ is rich enough that, at $P_0$,
\[ \Lambda^\perp = \{ D_k(\cdot \mid \beta_0, \eta_0) : k \in \mathcal{K}(P_0) \}, \]
where $\mathcal{K}(P_0) \subseteq \mathcal{K}$ is the index set at $P_0$ and $\Lambda^\perp$ denotes the orthogonal complement of the nuisance tangent space $\Lambda$ (Definition 3) inside $L_2^0(P_0)$.

3. Continuity along submodels.
For all $k \in \mathcal{K}(P_0)$ and each regular submodel $\{P_{t,s}\}$ with score $s \in S$,
\[ \| D_k(\cdot \mid \beta(P_{t,s}), \eta(P_{t,s})) - D_k(\cdot \mid \beta_0, \eta_0) \|_{L_2(P_0)} \to 0 \quad \text{as } t \to 0. \]

4. Pathwise differentiability. $\beta$ is pathwise differentiable at $P_0$ with efficient influence function $\varphi^*$, and $\langle \varphi^* \rangle \subset S$, where $\langle \varphi^* \rangle$ denotes the one-dimensional span of $\varphi^*$.

5. Uniform boundedness. For all $k \in \mathcal{K}(P_0)$, there exist $C < \infty$ and a neighborhood $N$ of $(\beta_0, \eta_0)$ such that $\sup_{z \in \mathcal{Z},\, (\beta, \eta) \in N} | D_k(z \mid \beta, \eta) | \le C$.

Gradient characterization (Lemma 1.2 of van der Laan and Robins [2003]). The first result characterizes which estimating functions are gradients. The idea is as follows: by the unbiasedness condition (i) of Assumption 15, the expectation of $D_k$ under $P_{t,s}$ vanishes identically along any regular submodel. Differentiating this identity at $t = 0$ recovers an inner-product representation that determines when an estimating function is an influence function.

Lemma 7 (Gradient characterization; Lemma 1.2 of van der Laan and Robins [2003]). Under Assumption 15, define
\[ f_k(s) := \frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] \Big|_{t=0}. \]
Then an element $D = D_k(\cdot \mid \beta_0, \eta_0) \in \Lambda^\perp$ for $k \in \mathcal{K}(P_0)$ is a gradient if and only if
\[ f_k(s) = \begin{cases} 0 & \text{if } s \in S_{\mathrm{nuis}}, \\ -\frac{d}{dt} \beta(P_{t,s}) \big|_{t=0} & \text{if } s \in \langle \varphi^* \rangle. \end{cases} \]
Proof. By Assumption 15(i), $E_{P_{t,s}}[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] = 0$ for all sufficiently small $t$. Combined with $E_0[D_k(Z \mid \beta_0, \eta_0)] = 0$, we can write
\[ \frac{1}{t} E_0[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] = \frac{1}{t} \left\{ E_0[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] - E_{P_{t,s}}[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] \right\} = \int D_k(z \mid \beta(P_{t,s}), \eta(P_{t,s})) \, \frac{dP_0 - dP_{t,s}}{t}(z). \]
Define $g_t := D_k(\cdot \mid \beta(P_{t,s}), \eta(P_{t,s}))$ and $g_0 := D_k(\cdot \mid \beta_0, \eta_0)$.
Writing $dP_0 = p_0 \, d\nu$ and $dP_{t,s} = p_t \, d\nu$,
\[ \int g_t \cdot \frac{p_0 - p_t}{t} \, d\nu = -\int g_t \cdot \frac{\sqrt{p_t} - \sqrt{p_0}}{t} (\sqrt{p_t} + \sqrt{p_0}) \, d\nu = -\int g_0 \cdot \frac{\sqrt{p_t} - \sqrt{p_0}}{t} (\sqrt{p_t} + \sqrt{p_0}) \, d\nu - \int (g_t - g_0) \cdot \frac{\sqrt{p_t} - \sqrt{p_0}}{t} (\sqrt{p_t} + \sqrt{p_0}) \, d\nu. \]
The first integral converges to $E_0[g_0 s]$ by the same argument as in the proof of Lemma 2. For the second integral, Cauchy–Schwarz gives
\[ \left| \int (g_t - g_0) \cdot \frac{\sqrt{p_t} - \sqrt{p_0}}{t} (\sqrt{p_t} + \sqrt{p_0}) \, d\nu \right| \le \| (g_t - g_0)(\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)} \cdot \left\| \frac{\sqrt{p_t} - \sqrt{p_0}}{t} \right\|_{L_2(\nu)}, \]
where the second factor is bounded by QMD. For the first factor,
\[ \| (g_t - g_0)(\sqrt{p_t} + \sqrt{p_0}) \|_{L_2(\nu)}^2 = \int (g_t - g_0)^2 (\sqrt{p_t} + \sqrt{p_0})^2 \, d\nu \le 2 \int (g_t - g_0)^2 (p_t + p_0) \, d\nu = 2 \left\{ E_{P_t}[(g_t - g_0)^2] + E_0[(g_t - g_0)^2] \right\}. \]
The term $E_0[(g_t - g_0)^2] \to 0$ by Assumption 15(iii). For $E_{P_t}[(g_t - g_0)^2]$, write
\[ E_{P_t}[(g_t - g_0)^2] = E_0[(g_t - g_0)^2] + \int (g_t - g_0)^2 (p_t - p_0) \, d\nu \le E_0[(g_t - g_0)^2] + 4C^2 \int |p_t - p_0| \, d\nu, \]
where the inequality uses Assumption 15(v), and $\int |p_t - p_0| \, d\nu \to 0$ again by QMD. Thus
\[ f_k(s) = -E_0[D_k(Z \mid \beta_0, \eta_0) \cdot s(Z)] = -\langle D_k(\cdot \mid \beta_0, \eta_0), s \rangle_{P_0}. \]
By definition, $D_k$ is a gradient if and only if the inner product equals zero for all $s \in S_{\mathrm{nuis}}$ and equals $\frac{d}{dt} \beta(P_{t,s}) |_{t=0}$ for $s \in \langle \varphi^* \rangle$. This is equivalent to the stated conditions on $f_k$.

The negative identity (Lemma 1.3 of van der Laan and Robins [2003]). The second result builds on Lemma 7 to establish that if $D_k(\cdot \mid \beta_0, \eta_0)$ is an influence function, then the partial derivative of $E_0[D_k(Z \mid \beta, \eta_0)]$ with respect to $\beta$ at $\beta_0$ equals $-1$. It is in the proof of this result that the regularity of coordinate submodels (Assumption 1) plays an important but implicit role.

Lemma 8 (Negative identity; Lemma 1.3 of van der Laan and Robins [2003]).
In addition to Assumption 15, assume that $\beta$ and $\eta$ are locally variation independent at $P_0$ (Definition 7), that $\beta \mapsto E_0[D_k(Z \mid \beta, \eta_0)]$ is differentiable at $\beta_0$ with nonzero derivative for all $k \in \mathcal{K}(P_0)$, and that $E_0[\varphi^*(Z)^2] > 0$. If $D_k(\cdot \mid \beta_0, \eta_0)$ is a gradient, then
\[ \frac{d}{d\beta} E_0[D_k(Z \mid \beta, \eta_0)] \Big|_{\beta = \beta_0} = -1. \]
Proof (as given by van der Laan and Robins [2003]). Let $s \in S$ be a scalar multiple of $\varphi^*$, say $s = c\varphi^*$ for some $c \ne 0$. Since $\varphi^* \in S$ by Assumption 15(iv), $s$ is the score of some regular submodel $\{P_{t,s}\}$ through $P_0$. Define
\[ h_{2,s}(t) := \beta(P_{t,s}), \qquad h_1(\beta) := E_0[D_k(Z \mid \beta, \eta_0)]. \]
The map $t \mapsto E_0[D_k(Z \mid \beta(P_{t,s}), \eta_0)]$ is the composition $h_1(h_{2,s}(t))$. As in the proof of Lemma 7,
\[ \frac{d}{dt} h_1(h_{2,s}(t)) \Big|_{t=0} = -\frac{d}{dt} \beta(P_{t,s}) \Big|_{t=0}. \tag{4} \]
By the chain rule, the left-hand side equals $h_1'(\beta_0) \cdot h_{2,s}'(0)$. Pathwise differentiability gives
\[ h_{2,s}'(0) = \frac{d}{dt} \beta(P_{t,s}) \Big|_{t=0} = E_0[\varphi^* s] = c\, E_0[(\varphi^*)^2] \ne 0. \]
So $h_1'(\beta_0) \cdot h_{2,s}'(0) = -h_{2,s}'(0)$. Since $h_{2,s}'(0) \ne 0$, it follows that $h_1'(\beta_0) = -1$.

The role of regularity. The proof invokes "as in the proof of Lemma 7" to claim (4), i.e.,
\[ \frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s}), \eta_0)] \Big|_{t=0} = -\frac{d}{dt} \beta(P_{t,s}) \Big|_{t=0}. \tag{5} \]
However, Lemma 7 actually established
\[ \frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] \Big|_{t=0} = -\frac{d}{dt} \beta(P_{t,s}) \Big|_{t=0}, \tag{6} \]
where $\eta(P_{t,s})$ varies with $t$. For (5) to follow from (6), one must show that replacing $\eta(P_{t,s})$ by the fixed value $\eta_0$ does not affect the derivative, i.e., that
\[ \frac{\partial}{\partial \eta} E_0[D_k(Z \mid \beta_0, \eta)] \Big|_{\eta = \eta_0}[h] = 0 \quad \forall h \in \dot H. \tag{7} \]
To see why (7) is needed, suppose that the map $(\beta, \eta) \mapsto E_0[D_k(Z \mid \beta, \eta)]$ is Fréchet differentiable at $(\beta_0, \eta_0)$.
The chain rule decomposes (6) as
\[ \frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] \Big|_{t=0} = \frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s}), \eta_0)] \Big|_{t=0} + \frac{\partial}{\partial \eta} E_0[D_k(Z \mid \beta_0, \eta)] \Big|_{\eta = \eta_0} \left[ \frac{d}{dt} \eta(P_{t,s}) \Big|_{t=0} \right], \]
so (5) follows from (6) if and only if the second term vanishes. Condition (7) guarantees this by requiring that the nuisance derivative of the expected estimating function vanish in every direction $h \in \dot H$.

We now show that establishing (7) requires Assumption 1. Apply Lemma 7 to a nuisance score $s_{\mathrm{nuis}} \in S_{\mathrm{nuis}}$. Since $D_k(\cdot \mid \beta_0, \eta_0)$ is an influence function, $f_k(s_{\mathrm{nuis}}) = 0$, i.e.,
\[ \frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s_{\mathrm{nuis}}}), \eta(P_{t,s_{\mathrm{nuis}}}))] \Big|_{t=0} = 0. \]
Assuming again Fréchet differentiability, the chain rule gives
\[ \frac{\partial}{\partial \beta} E_0[D_k(Z \mid \beta, \eta_0)] \Big|_{\beta = \beta_0} \cdot \frac{d}{dt} \beta(P_{t,s_{\mathrm{nuis}}}) \Big|_{t=0} + \frac{\partial}{\partial \eta} E_0[D_k(Z \mid \beta_0, \eta)] \Big|_{\eta = \eta_0} \left[ \frac{d}{dt} \eta(P_{t,s_{\mathrm{nuis}}}) \Big|_{t=0} \right] = 0. \]
Since $s_{\mathrm{nuis}}$ is a nuisance score, $\frac{d}{dt} \beta(P_{t,s_{\mathrm{nuis}}}) |_{t=0} = 0$, so the first term vanishes and we obtain
\[ \frac{\partial}{\partial \eta} E_0[D_k(Z \mid \beta_0, \eta)] \Big|_{\eta = \eta_0} \left[ \frac{d}{dt} \eta(P_{t,s_{\mathrm{nuis}}}) \Big|_{t=0} \right] = 0. \]
This establishes (7) only for those directions $h \in \dot H$ that arise as nuisance derivatives of submodels in $S_{\mathrm{nuis}}$. A priori, these nuisance derivatives populate some subset of $\dot H$, but there is no reason this subset should exhaust $\dot H$. Assumption 1 closes the remaining gap by furnishing, for each $h \in \dot H$, a regular submodel with $\frac{d}{dt} \beta(P_t) |_{t=0} = 0$ and $\frac{d}{dt} \eta(P_t) |_{t=0} = h$. Since $\frac{d}{dt} \beta(P_t) |_{t=0} = 0$, the score of this submodel is a nuisance score, and its nuisance derivative at $t = 0$ is exactly $h$. The argument above then yields (7) for this $h$. Since $h \in \dot H$ was arbitrary, (7) holds in full generality. With (7) in hand, the passage from (6) to (5) immediately follows.
For any score $s = c\varphi^*$, the same chain-rule decomposition used above gives
\begin{align*}
\frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s}), \eta(P_{t,s}))] \Big|_{t=0} &= \frac{\partial}{\partial \beta} E_0[D_k(Z \mid \beta, \eta_0)] \Big|_{\beta = \beta_0} \cdot \frac{d}{dt} \beta(P_{t,s}) \Big|_{t=0} + \frac{\partial}{\partial \eta} E_0[D_k(Z \mid \beta_0, \eta)] \Big|_{\eta = \eta_0} \left[ \frac{d}{dt} \eta(P_{t,s}) \Big|_{t=0} \right] \\
&= \frac{\partial}{\partial \beta} E_0[D_k(Z \mid \beta, \eta_0)] \Big|_{\beta = \beta_0} \cdot \frac{d}{dt} \beta(P_{t,s}) \Big|_{t=0} \\
&= \frac{d}{dt} E_0[D_k(Z \mid \beta(P_{t,s}), \eta_0)] \Big|_{t=0},
\end{align*}
where the second equality follows from (7), and the proof of the negative identity then proceeds as written.

Remark 7 (Fréchet differentiability). The chain-rule decompositions above require Fréchet differentiability of the map $(\beta, \eta) \mapsto E_0[D_k(Z \mid \beta, \eta)]$ at $(\beta_0, \eta_0)$, which is not explicitly stated in Lemma 1.3 of van der Laan and Robins [2003]. The paragraph immediately preceding Lemma 1.3 in their exposition, however, suggests that smoothness conditions should be jointly imposed on $\beta$ and $\eta$.

Remark 8 (Boundedness and the score definition). The reader may notice that Assumption 15(v) imposes uniform boundedness on the estimating functions, a condition not present in the corresponding result of van der Laan and Robins [2003]. The difference traces to the definition of scores. van der Laan and Robins [2003] define the score as the $L_2(P_0)$ limit of the density ratio $(p_t/p_0 - 1)/t$, which is strictly stronger than the quadratic mean differentiability (QMD) formulation of van der Vaart [1998] adopted here. Under their definition, the convergence in the proof of Lemma 7 follows from Cauchy–Schwarz in $L_2(P_0)$ alone. Under QMD, the same step requires decomposing through $(\sqrt{p_t} - \sqrt{p_0})(\sqrt{p_t} + \sqrt{p_0})$, and bounding the resulting cross term requires introducing the uniform boundedness condition. It should be noted that the uniform boundedness condition is an artifact of the QMD formulation and not a structural requirement of the argument.
We adopt QMD throughout to maintain a single consistent convention; the distinction between local product structure and variation independence arises independently of which score formulation is adopted.

C Example with the Average Treatment Effect

We now illustrate the equivalence results of Section 3 through a detailed worked example on the average treatment effect. For each direction of the equivalence, we verify every assumption and construct the required objects explicitly.

Setup. Let $Z = (Y, X, A)$ with confounders $X$, binary treatment $A \in \{0, 1\}$, and outcome $Y$. We assume that the standard causal assumptions of consistency, positivity, and no unmeasured confounding hold. We work in the nonparametric model $\mathcal{P}$ consisting of all densities $p$ with respect to a $\sigma$-finite dominating measure $\nu$ that satisfy the regularity conditions (R1)–(R2) below. We fix $P_0 \in \mathcal{P}$ and define the nuisance quantities
\[ \mu_a(x) := E_0[Y \mid X = x, A = a], \qquad \pi(x) := P_0(A = 1 \mid X = x), \]
the treatment effect function $\tau(x) := \mu_1(x) - \mu_0(x)$, and the conditional outcome variance $\sigma_a^2(x) := \mathrm{Var}_0(Y \mid X = x, A = a)$. The target and nuisance functionals are
\[ \beta(P) := E_P[\mu_1^P(X) - \mu_0^P(X)], \qquad \eta(P) := (\mu_1^P, \mu_0^P, \pi^P), \]
with $\beta_0 := \beta(P_0)$ and $\eta_0 := (\mu_1, \mu_0, \pi)$.

Regularity conditions. We further impose the following conditions, which ensure that the constructed submodels are well-behaved. Note that positivity already appeared as an identification assumption.

(R1) Positivity. There exists $\varepsilon > 0$ such that for all $P \in \mathcal{P}$, $\pi^P(x) \in [\varepsilon, 1 - \varepsilon]$ for $P_X$-a.s. $x$.

(R2) Bounded outcomes. There exists $C_Y < \infty$ such that for all $P \in \mathcal{P}$, $|Y| \le C_Y$ $P$-a.s.

(R3) Positive conditional variance. $\sigma_a^2(x) \ge \sigma^2 > 0$ for $a = 0, 1$, $P_0$-a.s.

(R4) Treatment effect heterogeneity. $\mathrm{Var}_0(\tau(X)) > 0$.
We also assume an interior positivity margin at $P_0$: there exists $\varepsilon' > \varepsilon$ such that $\pi(x) \in [\varepsilon', 1 - \varepsilon']$ $P_{0,X}$-a.s. This ensures that for any bounded mean-zero $g$, the linear tilt $P_t$ with density $p_0(1 + tg)$ remains in $\mathcal{P}$ for sufficiently small $|t|$, so the tangent space at $P_0$ is $T = L_2^0(P_0)$ by the same argument as Corollary 1.

Finally, we take the ambient normed space to be $V := L_\infty(P_{0,X})^3$ with the product supremum norm, and the nuisance parameter set to be
$$H := \Big\{(\mu_1, \mu_0, \pi) \in V : \operatorname*{ess\,inf}_{P_{0,X}} \pi > 0 \ \text{and}\ \operatorname*{ess\,inf}_{P_{0,X}} (1 - \pi) > 0\Big\}.$$
Since $|\mu_a(x)| \le C_Y$ $P_{0,X}$-a.s. by (R2) and $\pi \in [\varepsilon, 1 - \varepsilon]$ $P_{0,X}$-a.s. by (R1), we have $\eta_0 \in H$. Moreover, since $H$ is open in $V$, the admissible perturbation space is $\dot{H} = V$.

Estimating function and influence function. Define the estimating function
$$m(Z; \beta, \eta) := \frac{A}{\pi(X)}\{Y - \mu_1(X)\} - \frac{1 - A}{1 - \pi(X)}\{Y - \mu_0(X)\} + \mu_1(X) - \mu_0(X) - \beta, \tag{8}$$
and the influence function at the truth,
$$\varphi(Z) := m(Z; \beta_0, \eta_0) = \frac{A}{\pi(X)}\{Y - \mu_1(X)\} - \frac{1 - A}{1 - \pi(X)}\{Y - \mu_0(X)\} + \tau(X) - \beta_0. \tag{9}$$

C.1 Forward direction

We verify Assumptions 4–10 and apply Theorem 1 to conclude that $\beta$ is pathwise differentiable with influence function $\varphi(Z) = -G^{-1} m(Z; \beta_0, \eta_0)$.

Assumption 4. Let $P$ be any distribution with $\beta(P) = \beta$ and $\eta(P) = (\mu_1, \mu_0, \pi)$. We show $E_P[m(Z; \beta, \eta)] = 0$. By the tower property, conditioning first on $X$ and then on $(X, A)$, and using the definition $\mu_1(x) = E_P[Y \mid X = x, A = 1]$,
$$E_P\left[\frac{A}{\pi(X)}\{Y - \mu_1(X)\}\right] = E_P\left[E_P\left[\frac{A}{\pi(X)}\{Y - \mu_1(X)\} \,\Big|\, X\right]\right] = E_P\left[\frac{\pi(X)}{\pi(X)} \cdot E_P[Y - \mu_1(X) \mid X, A = 1]\right] = 0,$$
where the second equality uses $E_P[A \cdot f(Z) \mid X] = \pi(X) \cdot E_P[f(Z) \mid X, A = 1]$. The second IPW term vanishes identically by the same argument with $a = 0$.
The remaining terms contribute $E_P[\mu_1(X) - \mu_0(X)] - \beta = \beta - \beta = 0$.

Assumption 8. Since $m$ is linear in $\beta$ with coefficient $-1$, we have $\partial_\beta m(Z; \beta_0, \eta_0) = -1$ identically, so $G := E_0[\partial_\beta m(Z; \beta_0, \eta_0)] = -1 \neq 0$.

Assumption 9. We verify that the Gâteaux derivative of $\eta \mapsto E_0[m(Z; \beta_0, \eta)]$ vanishes at $\eta_0$ in each coordinate direction of $\dot{H}$. Since $\eta = (\mu_1, \mu_0, \pi)$ and the admissible perturbation space $\dot{H}$ is a product, linearity allows us to check each component separately. Recall that throughout, the expectation $E_0$ is taken under the fixed measure $P_0$ and only the function arguments inside $m$ are being varied.

Perturbation $\mu_1 \to \mu_1 + t h_1$. Substituting $\mu_1 + t h_1$ into (8) with $\beta = \beta_0$ and $(\mu_0, \pi)$ held at their true values, the only terms affected are the first IPW term $\frac{A}{\pi(X)}\{Y - \mu_1(X) - t h_1(X)\}$ and the outcome regression term $\mu_1(X) + t h_1(X) - \mu_0(X)$. Taking the expectation under $P_0$ and differentiating at $t = 0$:
$$\frac{d}{dt} E_0[m(Z; \beta_0, (\mu_1 + t h_1, \mu_0, \pi))]\Big|_{t=0} = E_0\left[-\frac{A h_1(X)}{\pi(X)} + h_1(X)\right] = E_0\left[h_1(X)\left(1 - \frac{A}{\pi(X)}\right)\right].$$
Conditioning on $X$ and using $E_0[A \mid X] = \pi(X)$:
$$E_0\left[1 - \frac{A}{\pi(X)} \,\Big|\, X\right] = 1 - \frac{\pi(X)}{\pi(X)} = 0.$$
By the tower property, the derivative vanishes for all $h_1 \in L_\infty(P_{0,X})$. The perturbation $\mu_0 \to \mu_0 + t h_0$ follows by an identical argument.

Perturbation $\pi \to \pi + t h_\pi$. Substituting $\pi + t h_\pi$ affects only the denominators of the two IPW terms. Since the outcome regression term $\mu_1(X) - \mu_0(X) - \beta_0$ does not involve $\pi$, we differentiate only the IPW terms. Using
$$\frac{d}{dt} \frac{1}{\pi + t h_\pi}\Big|_{t=0} = -\frac{h_\pi}{\pi^2} \qquad \text{and} \qquad \frac{d}{dt} \frac{1}{1 - \pi - t h_\pi}\Big|_{t=0} = \frac{h_\pi}{(1 - \pi)^2},$$
we can write
$$\frac{d}{dt} E_0[m(Z; \beta_0, (\mu_1, \mu_0, \pi + t h_\pi))]\Big|_{t=0} = E_0\left[-\frac{A h_\pi(X)}{\pi(X)^2}\{Y - \mu_1(X)\} - \frac{(1 - A) h_\pi(X)}{(1 - \pi(X))^2}\{Y - \mu_0(X)\}\right].$$
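As a numerical aside, the vanishing of all three Gâteaux derivatives in Assumption 9 can be checked by central finite differences on a small discrete model. This is only a sketch; the model values and the perturbation direction `h` are illustrative assumptions, not objects from the paper:

```python
# Discrete model P0 (illustrative assumptions): X, A binary, Y in {-1, +1}.
pX = {0: 0.5, 1: 0.5}
pi = {0: 0.3, 1: 0.7}                                    # pi(x) = P0(A=1 | X=x)
q  = {(0, 0): 0.4, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.8}  # P0(Y=+1 | x, a)

def p0(y, x, a):
    pa = pi[x] if a == 1 else 1 - pi[x]
    return pX[x] * pa * (q[(x, a)] if y == 1 else 1 - q[(x, a)])

mu = {(a, x): 2 * q[(x, a)] - 1 for x in (0, 1) for a in (0, 1)}
beta0 = sum(pX[x] * (mu[(1, x)] - mu[(0, x)]) for x in (0, 1))

def E0_m(mu1, mu0, pr):
    """E0[m(Z; beta0, (mu1, mu0, pr))] under fixed P0; nuisances are dicts over x."""
    total = 0.0
    for y in (-1, 1):
        for x in (0, 1):
            for a in (0, 1):
                ipw = (a / pr[x]) * (y - mu1[x]) - ((1 - a) / (1 - pr[x])) * (y - mu0[x])
                total += p0(y, x, a) * (ipw + mu1[x] - mu0[x] - beta0)
    return total

mu1_0 = {x: mu[(1, x)] for x in (0, 1)}
mu0_0 = {x: mu[(0, x)] for x in (0, 1)}
h = {0: 0.9, 1: -1.3}            # an arbitrary bounded direction (assumption)
t = 1e-5

def gateaux(perturb):
    """Central finite difference of t -> E0[m] along the given nuisance perturbation."""
    return (E0_m(*perturb(t)) - E0_m(*perturb(-t))) / (2 * t)

d_mu1 = gateaux(lambda s: ({x: mu1_0[x] + s * h[x] for x in h}, mu0_0, pi))
d_mu0 = gateaux(lambda s: (mu1_0, {x: mu0_0[x] + s * h[x] for x in h}, pi))
d_pi  = gateaux(lambda s: (mu1_0, mu0_0, {x: pi[x] + s * h[x] for x in h}))

assert abs(E0_m(mu1_0, mu0_0, pi)) < 1e-12      # unbiasedness at the truth (Assumption 4)
assert max(abs(d_mu1), abs(d_mu0), abs(d_pi)) < 1e-6   # Neyman orthogonality (Assumption 9)
```

All expectations are exact finite sums, so the only error in the derivative checks is the $O(t^2)$ finite-difference bias.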
For the first term, we condition on $X$:
$$E_0\left[\frac{A(Y - \mu_1(X))}{\pi(X)^2} \,\Big|\, X\right] = \frac{1}{\pi(X)^2} E_0[A(Y - \mu_1(X)) \mid X] = \frac{\pi(X)}{\pi(X)^2} E_0[Y - \mu_1(X) \mid X, A = 1] = 0,$$
where we used $E_0[A \cdot f(Z) \mid X] = \pi(X) E_0[f(Z) \mid X, A = 1]$ and the definition of $\mu_1$. The second term vanishes identically by the same argument with $a = 0$.

Assumptions 5–7. Under (R1)–(R2), the map $(\beta, \eta) \mapsto m(\cdot\,; \beta, \eta) \in L_2(P_0)$ is Fréchet differentiable at $(\beta_0, \eta_0)$. The partial derivatives computed above are bounded linear maps into $L_2(P_0)$, with boundedness following from $\pi \ge \varepsilon$ and $|Y| \le C_Y$, which ensure all IPW-weighted terms lie in $L_\infty(P_0)$. We take $S = L_\infty(P_0) \cap L_2^0(P_0)$, which is dense in $T = L_2^0(P_0)$ by the same argument as Corollary 1, and for each $s \in S$ we use the linear tilt submodel from Lemma 1. The induced coordinate paths $t \mapsto (\beta_{t,s}, \eta_{t,s})$ are differentiable at $t = 0$, which follows from the explicit derivative formulas
$$\frac{\partial}{\partial t} \mu_{a,t}(x)\Big|_{t=0} = E_0[(Y - \mu_a(X)) s(Z) \mid X = x, A = a], \qquad \frac{\partial}{\partial t} \pi_t(x)\Big|_{t=0} = E_0[(A - \pi(X)) s(Z) \mid X = x],$$
derived in the pathwise differentiability verification below via the quotient rule. The uniform second moment bound of Lemma 3 holds since for linear tilt submodels with bounded scores, the nuisance difference quotients $(\mu_{a,t} - \mu_a)/t$ and $(\pi_t - \pi)/t$ admit closed-form expressions via the change-of-measure identity
$$\mu_{a,t}(x) = \frac{E_0[Y(1 + ts) \mid X = x, A = a]}{E_0[(1 + ts) \mid X = x, A = a]},$$
and similarly for $\pi_t$, which are uniformly bounded in $x$ for small $t$ under (R1)–(R2). Combined with the boundedness of $(\beta_t - \beta_0)/t$ from coordinate smoothness, this yields a uniform $L_\infty$ bound on the full difference quotient $(f_{t,s} - f_0)/t$, which dominates the $L_2(P_{t,s})$ norm for any $t$.

Assumption 10.
We show that $|\beta(P_1) - \beta(P_2)| \le c\, H(P_1, P_2)$ for all $P_1, P_2 \in \mathcal{P}$, where $c = 4\sqrt{2}\, C_Y(1 + 1/\varepsilon)$. To start, write $p_j = dP_j/d\nu$ and recall $\tau^{P_j}(x) = \mu_1^{P_j}(x) - \mu_0^{P_j}(x)$ and $\beta(P_j) = \int \tau^{P_j}(x)\, dP_{j,X}(x)$. We decompose
$$\beta(P_1) - \beta(P_2) = \underbrace{\int \big(\tau^{P_1}(x) - \tau^{P_2}(x)\big)\, dP_{1,X}(x)}_{(\mathrm{I})} + \underbrace{\int \tau^{P_2}(x)\, d(P_{1,X} - P_{2,X})(x)}_{(\mathrm{II})}.$$
By (R2), $|\tau^{P_2}(x)| \le 2 C_Y$, so
$$|\mathrm{II}| \le 2 C_Y \int |p_{1,X}(x) - p_{2,X}(x)|\, d\nu_X = 2 C_Y\, \mathrm{TV}(P_{1,X}, P_{2,X}).$$
Since the marginal density is obtained by integrating out $(y, a)$,
$$|p_{1,X}(x) - p_{2,X}(x)| = \bigg|\sum_{a \in \{0,1\}} \int (p_1 - p_2)(y, x, a)\, d\nu_Y\bigg| \le \sum_{a \in \{0,1\}} \int |p_1 - p_2|(y, x, a)\, d\nu_Y,$$
where the inequality holds by the triangle inequality. Integrating over $\nu_X$:
$$\mathrm{TV}(P_{1,X}, P_{2,X}) \le \sum_{a \in \{0,1\}} \int\!\!\int |p_1 - p_2|(y, x, a)\, d\nu_Y\, d\nu_X = \int |p_1 - p_2|\, d\nu = \mathrm{TV}(P_1, P_2),$$
hence $|\mathrm{II}| \le 2 C_Y\, \mathrm{TV}(P_1, P_2)$.

Next, by the triangle inequality, $|\tau^{P_1}(x) - \tau^{P_2}(x)| \le |\mu_1^{P_1}(x) - \mu_1^{P_2}(x)| + |\mu_0^{P_1}(x) - \mu_0^{P_2}(x)|$, so it suffices to bound each $\int |\mu_a^{P_1}(x) - \mu_a^{P_2}(x)|\, dP_{1,X}(x)$ separately. Fix $a \in \{0, 1\}$. By definition, $\mu_a^{P_j}(x) = \int y\, dP_j(y \mid x, a)$, so $\int (y - \mu_a^{P_2}(x))\, dP_2(y \mid x, a) = 0$. It follows that for $P_{1,X}$-a.e. $x$,
$$\mu_a^{P_1}(x) - \mu_a^{P_2}(x) = \int y\, dP_1(y \mid x, a) - \mu_a^{P_2}(x) = \int (y - \mu_a^{P_2}(x))\, dP_1(y \mid x, a) = \int (y - \mu_a^{P_2}(x))\, dP_1(y \mid x, a) - \int (y - \mu_a^{P_2}(x))\, dP_2(y \mid x, a) = \frac{\int (y - \mu_a^{P_2}(x))(p_1 - p_2)(y, x, a)\, d\nu_Y}{p_1(x, a)},$$
where the last equality writes $dP_j(y \mid x, a) = p_j(y, x, a)\, d\nu_Y / p_j(x, a)$. By (R2), $|y - \mu_a^{P_2}(x)| \le 2 C_Y$, so
$$|\mu_a^{P_1}(x) - \mu_a^{P_2}(x)| \le \frac{2 C_Y}{p_1(x, a)} \int |p_1 - p_2|(y, x, a)\, d\nu_Y.$$
By (R1), $p_1(x, a) \ge \varepsilon\, p_{1,X}(x)$, since $p_1(x, a) = \pi_a^{P_1}(x)\, p_{1,X}(x)$ and $\pi_a^{P_1}(x) \ge \varepsilon$. Multiplying both sides by $p_{1,X}(x)$:
$$|\mu_a^{P_1}(x) - \mu_a^{P_2}(x)|\, p_{1,X}(x) \le \frac{2 C_Y}{\varepsilon} \int |p_1 - p_2|(y, x, a)\, d\nu_Y.$$
Integrating over $\nu_X$ and summing over $a \in \{0, 1\}$,
$$|\mathrm{I}| \le \sum_{a=0}^{1} \int |\mu_a^{P_1}(x) - \mu_a^{P_2}(x)|\, dP_{1,X}(x) \le \frac{2 C_Y}{\varepsilon} \sum_{a=0}^{1} \int\!\!\int |p_1 - p_2|(y, x, a)\, d\nu_Y\, d\nu_X = \frac{2 C_Y}{\varepsilon}\, \mathrm{TV}(P_1, P_2).$$
Combining the above, we arrive at
$$|\beta(P_1) - \beta(P_2)| \le \left(2 C_Y + \frac{2 C_Y}{\varepsilon}\right) \mathrm{TV}(P_1, P_2) = 2 C_Y\left(1 + \frac{1}{\varepsilon}\right) \mathrm{TV}(P_1, P_2) \le 4\sqrt{2}\, C_Y\left(1 + \frac{1}{\varepsilon}\right) H(P_1, P_2),$$
where the last step uses $\mathrm{TV}(P_1, P_2) \le 2\sqrt{2}\, H(P_1, P_2)$. Therefore, Assumption 10 holds with $c = 4\sqrt{2}\, C_Y(1 + 1/\varepsilon)$ and any $\delta > 0$.

Since the assumptions of Theorem 1 hold, we conclude that $\beta$ is pathwise differentiable at $P_0$ with influence function
$$\varphi(Z) = -G^{-1} m(Z; \beta_0, \eta_0) = -(-1)^{-1} m(Z; \beta_0, \eta_0) = m(Z; \beta_0, \eta_0).$$

C.2 Reverse direction

Assumption 11. We show directly that for every linear tilt submodel $p_t(z) = p_0(z)(1 + t g(z))$ with $E_0[g] = 0$ and $\|g\|_\infty \le M$, whose score is $s \equiv g$ by Lemma 1,
$$\frac{d}{dt} \beta(P_t)\Big|_{t=0} = E_0[\varphi(Z) g(Z)].$$
We first establish this identity for all bounded mean-zero scores below, and then extend the conclusion to all regular submodels via the approximation step used in the argument of Theorem 1. The derivative of $\beta(P_t) = E_{P_t}[\mu_{1,t}(X) - \mu_{0,t}(X)]$ decomposes by the product rule into three terms:
$$\frac{d}{dt} \beta(P_t)\Big|_{t=0} = \underbrace{\int_{\mathcal{X}} \frac{\partial}{\partial t} \mu_{1,t}(x)\Big|_{t=0}\, dP_0(x)}_{(\mathrm{I})} - \underbrace{\int_{\mathcal{X}} \frac{\partial}{\partial t} \mu_{0,t}(x)\Big|_{t=0}\, dP_0(x)}_{(\mathrm{II})} + \underbrace{\int_{\mathcal{X}} \tau(x)\, \frac{\partial}{\partial t}\, dP_{t,X}(x)\Big|_{t=0}}_{(\mathrm{III})}.$$
For the first two terms,
$$\mu_{a,t}(x) = \int_{\mathcal{Y}} y\, dP_t(y, x, a) \Big/ \int_{\mathcal{Y}} dP_t(y, x, a), \qquad \text{where } dP_t = (1 + tg)\, dP_0.$$
Define
$$N_{\mu_a}(t) := \int_{\mathcal{Y}} y\, dP_0(y, x, a)(1 + t g(y, x, a)), \qquad D_{\mu_a}(t) := \int_{\mathcal{Y}} dP_0(y, x, a)(1 + t g(y, x, a)).$$
Then it follows that
$$N_{\mu_a}(0) = \mu_a(x)\, dP_0(x, a), \qquad N'_{\mu_a}(0) = E_0[Y g(Z) \mid X = x, A = a]\, dP_0(x, a),$$
$$D_{\mu_a}(0) = dP_0(x, a), \qquad D'_{\mu_a}(0) = E_0[g(Z) \mid X = x, A = a]\, dP_0(x, a).$$
By the quotient rule, we obtain
$$\frac{\partial}{\partial t} \mu_{a,t}(x)\Big|_{t=0} = \frac{N'_{\mu_a}(0) D_{\mu_a}(0) - N_{\mu_a}(0) D'_{\mu_a}(0)}{D_{\mu_a}(0)^2} = E_0[Y g(Z) \mid X = x, A = a] - \mu_a(x) E_0[g(Z) \mid X = x, A = a] = E_0[(Y - \mu_a(X)) g(Z) \mid X = x, A = a]. \tag{10}$$
For $\pi_t(x) = p_t(x, A = 1)/p_t(x)$, set
$$N_\pi(t) := \int_{\mathcal{Y}} dP_0(y, x, 1)(1 + t g(y, x, 1)), \qquad D_\pi(t) := \int_{\mathcal{Y}} \sum_a dP_0(y, x, a)(1 + t g(y, x, a)).$$
Then it follows that
$$N_\pi(0) = dP_0(x, A = 1), \qquad N'_\pi(0) = E_0[g(Z) \mid X = x, A = 1]\, dP_0(x, A = 1),$$
$$D_\pi(0) = dP_0(x), \qquad D'_\pi(0) = E_0[g(Z) \mid X = x]\, dP_0(x).$$
Again by the quotient rule, we obtain
$$\frac{\partial}{\partial t} \pi_t(x)\Big|_{t=0} = \frac{N'_\pi(0) D_\pi(0) - N_\pi(0) D'_\pi(0)}{D_\pi(0)^2} = \pi(x)\big(E_0[g(Z) \mid X = x, A = 1] - E_0[g(Z) \mid X = x]\big) = E_0[(A - \pi(X)) g(Z) \mid X = x], \tag{11}$$
where the second equality follows from $E_0[A g(Z) \mid X = x] = \pi(x) E_0[g(Z) \mid X = x, A = 1]$.

We now assemble the terms. For term (I), recall the identity $E_0[W \cdot \mathbb{1}\{A = 1\} \mid X] = \pi(X) \cdot E_0[W \mid X, A = 1]$. By the law of total expectation,
$$\int_{\mathcal{X}} E_0[(Y - \mu_1(X)) g(Z) \mid X = x, A = 1]\, dP_0(x) = E_0\left[\pi(X) \cdot E_0\left[\frac{(Y - \mu_1(X)) g(Z)}{\pi(X)} \,\Big|\, X, A = 1\right]\right] = E_0\left[\frac{A (Y - \mu_1(X)) g(Z)}{\pi(X)}\right].$$
By the same reasoning, term (II) gives
$$E_0\left[\frac{(1 - A)(Y - \mu_0(X)) g(Z)}{1 - \pi(X)}\right].$$
For term (III), by the law of total expectation,
$$\int_{\mathcal{X}} \tau(x)\, E_0[g(Z) \mid X = x]\, dP_0(x) = E_0[\tau(X) g(Z)].$$
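The quotient-rule formulas (10)–(11) and the pathwise-derivative identity they yield can be checked numerically on a small discrete model, computing the tilted nuisances under $P_t = (1 + tg)P_0$ exactly and comparing central finite differences against the closed-form expressions. This is a sketch under illustrative assumptions (the model probabilities and the direction `g_raw` are not from the paper):

```python
# Discrete model P0 (illustrative assumptions): X, A binary, Y in {-1, +1}.
pX = {0: 0.5, 1: 0.5}
pi = {0: 0.3, 1: 0.7}
q  = {(0, 0): 0.4, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.8}   # P0(Y=+1 | x, a)
Z  = [(y, x, a) for y in (-1, 1) for x in (0, 1) for a in (0, 1)]

def p0(y, x, a):
    pa = pi[x] if a == 1 else 1 - pi[x]
    return pX[x] * pa * (q[(x, a)] if y == 1 else 1 - q[(x, a)])

mu = {(a, x): 2 * q[(x, a)] - 1 for x in (0, 1) for a in (0, 1)}
beta0 = sum(pX[x] * (mu[(1, x)] - mu[(0, x)]) for x in (0, 1))

# A bounded direction, centered so E0[g] = 0 (the score of the linear tilt).
g_raw = {(y, x, a): 0.3 * y + 0.5 * x - 0.7 * a + 0.2 * y * a for (y, x, a) in Z}
Eg = sum(p0(*z) * g_raw[z] for z in Z)
g = {z: g_raw[z] - Eg for z in Z}

def nuisances(t):
    """(mu_{a,t}(x), pi_t(x), beta(P_t)) computed exactly under dP_t = (1+tg) dP_0."""
    pt = {z: p0(*z) * (1 + t * g[z]) for z in Z}
    mu_t = {(a, x): sum(y * pt[(y, x, a)] for y in (-1, 1)) /
                    sum(pt[(y, x, a)] for y in (-1, 1))
            for x in (0, 1) for a in (0, 1)}
    ptX = {x: sum(pt[(y, x, a)] for y in (-1, 1) for a in (0, 1)) for x in (0, 1)}
    pi_t = {x: sum(pt[(y, x, 1)] for y in (-1, 1)) / ptX[x] for x in (0, 1)}
    beta_t = sum(ptX[x] * (mu_t[(1, x)] - mu_t[(0, x)]) for x in (0, 1))
    return mu_t, pi_t, beta_t

t = 1e-5
mu_p, pi_p, beta_p = nuisances(t)
mu_m, pi_m, beta_m = nuisances(-t)

# Formula (10): d/dt mu_{a,t}(x)|_0 = E0[(Y - mu_a(X)) g(Z) | X=x, A=a].
for x in (0, 1):
    for a in (0, 1):
        fd = (mu_p[(a, x)] - mu_m[(a, x)]) / (2 * t)
        cond = sum(p0(y, x, a) * (y - mu[(a, x)]) * g[(y, x, a)] for y in (-1, 1)) \
               / sum(p0(y, x, a) for y in (-1, 1))
        assert abs(fd - cond) < 1e-6

# Formula (11): d/dt pi_t(x)|_0 = E0[(A - pi(X)) g(Z) | X=x].
for x in (0, 1):
    fd = (pi_p[x] - pi_m[x]) / (2 * t)
    cond = sum(p0(y, x, a) * (a - pi[x]) * g[(y, x, a)]
               for y in (-1, 1) for a in (0, 1)) / pX[x]
    assert abs(fd - cond) < 1e-6

# Pathwise derivative: d/dt beta(P_t)|_0 = E0[phi(Z) g(Z)] with phi from (9).
def phi(y, x, a):
    ipw = (a / pi[x]) * (y - mu[(1, x)]) - ((1 - a) / (1 - pi[x])) * (y - mu[(0, x)])
    return ipw + (mu[(1, x)] - mu[(0, x)]) - beta0

fd_beta = (beta_p - beta_m) / (2 * t)
Ephig = sum(p0(*z) * phi(*z) * g[z] for z in Z)
assert abs(fd_beta - Ephig) < 1e-6
```

Because every expectation is an exact finite sum, agreement up to the $O(t^2)$ finite-difference bias confirms the algebra above on this model.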
Collecting all three terms shows that $\frac{d}{dt} \beta(P_{t,g})\big|_{t=0} = E_0[\varphi(Z) g(Z)]$ for every linear tilt submodel with bounded mean-zero score $g$. Bounded mean-zero functions are dense in $T = L_2^0(P_0)$ by the same argument as Corollary 1, and the ATE is Hellinger Lipschitz as verified in Section C.1 for Assumption 10. Therefore, the same three-term approximation argument used in the proof of Theorem 1 extends this identity to all regular submodels with score $s' \in S$, which establishes pathwise differentiability at $P_0$ with influence function $\varphi$.

Assumption 12. To verify this assumption, we construct explicit QMD submodels along each coordinate of the parameter space.

$\beta$-coordinate submodel. We construct a QMD path $t \mapsto P_t$ with $\beta(P_t) = \beta_0 + t$ and $\eta(P_t) = \eta_0$. Define the function
$$g_\beta(x) := \frac{\tau(x) - \beta_0}{\mathrm{Var}_0(\tau(X))},$$
which depends on $z$ only through $x$. By (R4), $\mathrm{Var}_0(\tau(X)) > 0$, and (R2) gives $\|g_\beta\|_\infty < \infty$. Clearly, $E_0[g_\beta] = 0$. By Lemma 1, the linear tilt $dP_t(z) = (1 + t g_\beta(x))\, dP_0(z)$ defines a regular QMD submodel through $P_0$ with score $g_\beta$ for $|t| < 1/\|g_\beta\|_\infty$. Since $g_\beta$ depends only on $x$, the conditional densities are undisturbed by the tilt:
$$dP_t(y \mid x, a) = \frac{dP_t(y, x, a)}{dP_t(x, a)} = \frac{(1 + t g_\beta(x))\, dP_0(y, x, a)}{(1 + t g_\beta(x))\, dP_0(x, a)} = dP_0(y \mid x, a),$$
so $\mu_{a,t}(x) = \mu_a(x)$ for all small $t$. Similarly, $\pi_t(x) = dP_t(x, 1)/dP_t(x) = \pi(x)$ since the $(1 + t g_\beta(x))$ factors cancel in the ratio. Hence $\eta(P_t) = \eta_0$. Furthermore, $\beta$ increases at unit rate:
$$\beta(P_t) = E_{P_t}[\tau(X)] = E_0[\tau(X)(1 + t g_\beta(X))] = \beta_0 + t \cdot \frac{E_0[\tau(X)(\tau(X) - \beta_0)]}{\mathrm{Var}_0(\tau(X))} = \beta_0 + t,$$
since $E_0[\tau(X)(\tau(X) - \beta_0)] = \mathrm{Var}_0(\tau(X))$.

$\eta$-coordinate submodels.
For each admissible direction $h = (h_1, h_0, h_\pi) \in \dot{H}$, we construct a regular (QMD) submodel through $P_0$ satisfying $\dot{\beta}_{0,s_h} = 0$ and $\dot{\eta}_{0,s_h} = h$. Since Assumption 1 requires only first-order coordinate control, a linear tilt submodel suffices. Define the perturbation functions
$$g_a(y, x) := \frac{h_a(x)(y - \mu_a(x))}{\sigma_a^2(x)}, \qquad g_{h_\pi}(x, a) := \frac{h_\pi(x)(a - \pi(x))}{\pi(x)(1 - \pi(x))}, \qquad g_\beta(x) := \frac{\tau(x) - \beta_0}{\mathrm{Var}_0(\tau(X))},$$
and the score
$$s_h(z) := g_a(y, x) + g_{h_\pi}(x, a) + \alpha_0\, g_\beta(x), \qquad \alpha_0 := -E_0[h_1(X) - h_0(X)]. \tag{12}$$
Under (R1)–(R3), each summand is bounded: indeed, $|g_a| \le \|h_a\|_\infty \cdot 2 C_Y / \sigma^2$, $|g_{h_\pi}| \le \|h_\pi\|_\infty / (\varepsilon(1 - \varepsilon))$, and $|g_\beta| \le 4 C_Y / \mathrm{Var}_0(\tau(X))$. Each summand also has mean zero. For the outcome perturbation, the law of iterated expectation and $E_0[Y - \mu_a(X) \mid X, A] = 0$ give $E_0[g_a] = 0$. For the propensity perturbation, $E_0[A - \pi(X) \mid X] = 0$ gives $E_0[g_{h_\pi}] = 0$. Finally, $E_0[g_\beta] = 0$ by construction. Hence $s_h$ is bounded and mean-zero, and by Lemma 1, the linear tilt $p_t(z) := p_0(z)(1 + t s_h(z))$ defines a regular QMD submodel through $P_0$ with score $s_h$ for $|t| < 1/\|s_h\|_\infty$.

We now verify the first-order coordinate derivatives using the quotient-rule formulas (10) and (11), applied to the linear tilt with score $s_h$.

Derivative of $\mu_a$. By (10), $\dot{\mu}_{a,0}(x) = E_0[(Y - \mu_a(X)) s_h(Z) \mid X = x, A = a]$. We expand $s_h = g_a + g_{h_\pi} + \alpha_0 g_\beta$ and compute each contribution separately. For the outcome perturbation,
$$E_0[(Y - \mu_a(X)) g_a(Y, X) \mid X = x, A = a] = \frac{h_a(x)}{\sigma_a^2(x)} E_0[(Y - \mu_a(X))^2 \mid X = x, A = a] = \frac{h_a(x)}{\sigma_a^2(x)} \cdot \sigma_a^2(x) = h_a(x).$$
For the propensity perturbation, since $g_{h_\pi}(x, a)$ does not depend on $y$, it factors out of the conditional expectation, and the remaining factor $E_0[Y - \mu_a(X) \mid X = x, A = a]$ vanishes by definition of $\mu_a$. The same reasoning applies to $\alpha_0 g_\beta(x)$, which also does not depend on $y$. That is,
$$E_0[(Y - \mu_a(X)) g_{h_\pi}(X, A) \mid X = x, A = a] = g_{h_\pi}(x, a) \cdot E_0[Y - \mu_a(X) \mid X = x, A = a] = 0,$$
$$E_0[(Y - \mu_a(X))\, \alpha_0 g_\beta(X) \mid X = x, A = a] = \alpha_0 g_\beta(x) \cdot E_0[Y - \mu_a(X) \mid X = x, A = a] = 0.$$
Combining the three contributions gives $\dot{\mu}_{a,0}(x) = h_a(x)$.

Derivative of $\pi$. By (11), $\dot{\pi}_0(x) = E_0[(A - \pi(X)) s_h(Z) \mid X = x]$. For the outcome perturbation, we condition on $A$ and use the conditional mean-zero property of $g_a$ to obtain
$$E_0[(A - \pi(X)) g_a(Y, X) \mid X = x] = \sum_{a' \in \{0,1\}} P_0(A = a' \mid x)\, (a' - \pi(x))\, E_0[g_{a'}(Y, X) \mid X = x, A = a'].$$
Each inner expectation evaluates to
$$E_0[g_{a'}(Y, X) \mid X = x, A = a'] = \frac{h_{a'}(x)}{\sigma_{a'}^2(x)} E_0[Y - \mu_{a'}(X) \mid X = x, A = a'] = 0,$$
so the entire sum vanishes. For the propensity perturbation, recalling that $g_{h_\pi}(x, a) = h_\pi(x)(a - \pi(x)) / [\pi(x)(1 - \pi(x))]$,
$$E_0[(A - \pi(X)) g_{h_\pi}(X, A) \mid X = x] = \frac{h_\pi(x)}{\pi(x)(1 - \pi(x))} E_0[(A - \pi(X))^2 \mid X = x] = \frac{h_\pi(x)}{\pi(x)(1 - \pi(x))} \cdot \pi(x)(1 - \pi(x)) = h_\pi(x).$$
For the marginal correction, since $\alpha_0 g_\beta(x)$ does not depend on $a$, it factors out and the remaining expectation vanishes:
$$E_0[(A - \pi(X))\, \alpha_0 g_\beta(X) \mid X = x] = \alpha_0 g_\beta(x) \cdot E_0[A - \pi(X) \mid X = x] = 0.$$
Combining the three contributions gives $\dot{\pi}_0(x) = h_\pi(x)$, and therefore $\dot{\eta}_{0,s_h} = (h_1, h_0, h_\pi) = h$.

Derivative of $\beta$. By Assumption 11, $\dot{\beta}_{0,s_h} = E_0[\varphi(Z) s_h(Z)]$.
Expanding $\varphi$ from (9) and using linearity of expectation, this becomes
$$E_0[\varphi(Z) s_h(Z)] = E_0\left[\frac{A (Y - \mu_1(X)) s_h(Z)}{\pi(X)}\right] - E_0\left[\frac{(1 - A)(Y - \mu_0(X)) s_h(Z)}{1 - \pi(X)}\right] + E_0[\tau(X) s_h(Z)] - \beta_0\, E_0[s_h(Z)]. \tag{13}$$
The last term vanishes since $s_h \in L_2^0(P_0)$. We evaluate the remaining three terms in order. For the first term, the identity $E_0[A \cdot f(Z) \mid X] = \pi(X) E_0[f(Z) \mid X, A = 1]$ allows us to write
$$E_0\left[\frac{A (Y - \mu_1(X)) s_h(Z)}{\pi(X)}\right] = E_0\left[E_0\left[\frac{A (Y - \mu_1(X)) s_h(Z)}{\pi(X)} \,\Big|\, X\right]\right] = E_0\big[E_0[(Y - \mu_1(X)) s_h(Z) \mid X, A = 1]\big] = E_0[\dot{\mu}_{1,0}(X)] = E_0[h_1(X)],$$
where the penultimate equality uses (10) and the final equality uses $\dot{\mu}_{1,0}(x) = h_1(x)$ as established above. The second term in (13) follows by the same argument with $a = 0$, giving $E_0[h_0(X)]$.

For the third term, we apply the tower property to condition on $X$:
$$E_0[\tau(X) s_h(Z)] = E_0\big[\tau(X) \cdot E_0[s_h(Z) \mid X]\big].$$
To evaluate the inner conditional expectation, we treat each component of $s_h$ separately. For $g_a$, identically to the above, conditioning further on $A$ gives
$$E_0[g_a(Y, X) \mid X = x] = \sum_{a' \in \{0,1\}} P_0(A = a' \mid x)\, E_0[g_{a'}(Y, X) \mid X = x, A = a'] = \sum_{a' \in \{0,1\}} P_0(A = a' \mid x) \cdot \frac{h_{a'}(x)}{\sigma_{a'}^2(x)} E_0[Y - \mu_{a'}(X) \mid X = x, A = a'] = 0,$$
since the conditional mean of $Y - \mu_{a'}(X)$ vanishes by definition. For $g_{h_\pi}$,
$$E_0[g_{h_\pi}(X, A) \mid X = x] = \frac{h_\pi(x)}{\pi(x)(1 - \pi(x))} E_0[A - \pi(X) \mid X = x] = 0.$$
Since $\alpha_0 g_\beta(x)$ is already a function of $x$ alone, it passes through the conditional expectation unchanged. Combining these three observations,
$$E_0[s_h(Z) \mid X = x] = 0 + 0 + \alpha_0 g_\beta(x) = \alpha_0 g_\beta(x).$$
Substituting back and using the identity $E_0[\tau(X) g_\beta(X)] = 1$ (established for the $\beta$-coordinate submodel),
$$E_0[\tau(X) s_h(Z)] = E_0[\tau(X) \cdot \alpha_0 g_\beta(X)] = \alpha_0 \cdot E_0[\tau(X) g_\beta(X)] = \alpha_0.$$
Collecting the three terms of (13),
$$\dot{\beta}_{0,s_h} = E_0[h_1(X)] - E_0[h_0(X)] + \alpha_0 = E_0[h_1(X) - h_0(X)] + \big(-E_0[h_1(X) - h_0(X)]\big) = 0.$$
Combined with the $\beta$-coordinate submodel above, we see that Assumption 1 holds for the average treatment effect.

Remark 9 (When no marginal correction is needed). When $E_0[h_1(X) - h_0(X)] = 0$, which holds for instance when $h_1 = h_0$ pointwise or whenever the outcome perturbations are mean-balanced across treatment arms, we have $\alpha_0 = 0$ and the score simplifies to $s_h = g_a + g_{h_\pi}$. The marginal correction $\alpha_0 g_\beta$ is driven entirely by the imbalance $E_0[h_1 - h_0]$ of the outcome perturbations.

Assumption 13. Under (R1)–(R2), $\|\varphi\|_\infty < \infty$ since $\mu_a$ and $\pi$ are bounded and $\pi$ is bounded away from $0$ and $1$. The scores $s_\beta$ and $s_h$ are bounded by construction, so each coordinate submodel is a linear tilt with bounded score, and the verification of Lemma 3 proceeds identically to the forward direction.

Since all assumptions of Theorem 2 hold, with local product structure (Assumption 12) established by the explicit coordinate submodel constructions above, Theorem 2 gives that $m$ is Neyman orthogonal with $G = E_0[\partial_\beta m(Z; \beta_0, \eta_0)] = -1$.
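The coordinate submodels of Assumption 12 can likewise be checked numerically: on a small discrete model, the $\beta$-coordinate tilt should move $\beta$ at unit rate while leaving $(\mu_1, \mu_0, \pi)$ fixed, and the $\eta$-coordinate tilt with score (12) should produce $\dot{\mu}_a = h_a$, $\dot{\pi} = h_\pi$, and $\dot{\beta} = 0$. A sketch under illustrative assumptions (the model probabilities and the direction $(h_1, h_0, h_\pi)$ are made up for the check):

```python
# Discrete model P0 (illustrative assumptions): X, A binary, Y in {-1, +1}.
pX = {0: 0.5, 1: 0.5}
pi = {0: 0.3, 1: 0.7}
q  = {(0, 0): 0.4, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.8}   # P0(Y=+1 | x, a)
Z  = [(y, x, a) for y in (-1, 1) for x in (0, 1) for a in (0, 1)]

def p0(y, x, a):
    pa = pi[x] if a == 1 else 1 - pi[x]
    return pX[x] * pa * (q[(x, a)] if y == 1 else 1 - q[(x, a)])

mu   = {(a, x): 2 * q[(x, a)] - 1 for x in (0, 1) for a in (0, 1)}
sig2 = {(a, x): 1 - mu[(a, x)] ** 2 for x in (0, 1) for a in (0, 1)}  # Var of a +/-1 outcome
tau  = {x: mu[(1, x)] - mu[(0, x)] for x in (0, 1)}
beta0 = sum(pX[x] * tau[x] for x in (0, 1))
var_tau = sum(pX[x] * (tau[x] - beta0) ** 2 for x in (0, 1))

def coords(score, t):
    """(beta, mu, pi) of the linear tilt dP_t = (1 + t*score) dP_0."""
    pt = {z: p0(*z) * (1 + t * score(*z)) for z in Z}
    mu_t = {(a, x): sum(y * pt[(y, x, a)] for y in (-1, 1)) /
                    sum(pt[(y, x, a)] for y in (-1, 1))
            for x in (0, 1) for a in (0, 1)}
    ptX = {x: sum(pt[(y, x, a)] for y in (-1, 1) for a in (0, 1)) for x in (0, 1)}
    pi_t = {x: sum(pt[(y, x, 1)] for y in (-1, 1)) / ptX[x] for x in (0, 1)}
    beta_t = sum(ptX[x] * (mu_t[(1, x)] - mu_t[(0, x)]) for x in (0, 1))
    return beta_t, mu_t, pi_t

def g_beta(y, x, a):
    """beta-coordinate score; depends on z only through x."""
    return (tau[x] - beta0) / var_tau

t = 1e-4

# beta-coordinate submodel: beta moves at unit rate, nuisances stay fixed.
b_p, mu_p, pi_p = coords(g_beta, t)
b_m, mu_m, pi_m = coords(g_beta, -t)
assert abs((b_p - b_m) / (2 * t) - 1) < 1e-9
assert all(abs(mu_p[k] - mu[k]) < 1e-12 for k in mu)
assert all(abs(pi_p[x] - pi[x]) < 1e-12 for x in pi)

# eta-coordinate submodel with an arbitrary direction h = (h1, h0, h_pi).
h1 = {0: 0.4, 1: -0.6}; h0 = {0: -0.2, 1: 0.5}; hpi = {0: 0.1, 1: -0.3}
alpha0 = -sum(pX[x] * (h1[x] - h0[x]) for x in (0, 1))

def s_h(y, x, a):
    """Score (12): outcome + propensity perturbations + marginal correction."""
    ha = h1[x] if a == 1 else h0[x]
    g_out = ha * (y - mu[(a, x)]) / sig2[(a, x)]
    g_pi  = hpi[x] * (a - pi[x]) / (pi[x] * (1 - pi[x]))
    return g_out + g_pi + alpha0 * g_beta(y, x, a)

b_p, mu_p, pi_p = coords(s_h, t)
b_m, mu_m, pi_m = coords(s_h, -t)
assert abs((b_p - b_m) / (2 * t)) < 1e-5                                # beta-dot = 0
for x in (0, 1):
    assert abs((mu_p[(1, x)] - mu_m[(1, x)]) / (2 * t) - h1[x]) < 1e-5  # mu1-dot = h1
    assert abs((mu_p[(0, x)] - mu_m[(0, x)]) / (2 * t) - h0[x]) < 1e-5  # mu0-dot = h0
    assert abs((pi_p[x] - pi_m[x]) / (2 * t) - hpi[x]) < 1e-5           # pi-dot = h_pi
```

The invariance checks for the $\beta$-coordinate tilt hold to machine precision because the $(1 + t g_\beta(x))$ factor cancels exactly in the conditional ratios, mirroring the cancellation argument in the text.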
