Coarsening Bias from Variable Discretization in Causal Functionals

Coarsening Bias from V ariable Discretization in Causal F unctionals Xiaxian Ou and Razieh Nabi Departmen t of Biostatistics and Bioinformatics, Emory Univ ersit y , A tlan ta, GA, U.S.A. Abstract A class of causal eﬀect functionals requires in tegration o v er conditional densities of con tin uous v ariables, as in mediation eﬀects and nonparametric identiﬁcation in causal graphical mo dels. Estimating such densities and ev aluating the resulting integrals can be statistically and computationally demanding. A common workaround is to discretize the v ariable and replace in tegrals with ﬁnite sums. Although conv enien t, discretization alters the p opulation-lev el functional and can induce non-negligible approximation bias, even under correct identiﬁcation. Under smo othness conditions, we sho w that this coarsening bias is ﬁrst order in the bin width and arises at the level of the target functional, distinct from statistical estimation error. W e prop ose a simple bias-reduced functional that ev aluates the outcome regression at within-bin conditional means, eliminating the leading term and yielding a second-order approximation error. W e derive plug-in and one-step estimators for the bias-reduced functional. Simulations demonstrate substantial bias reduction and near- nominal conﬁdence in terv al cov erage, even under coarse binning. Our results provide a simple framew ork for con trolling the impact of v ariable discretization on parameter approximation and estimation. K eywor ds: Causal inference, Mediator discretization, Coarsening bias, F unctional appro ximation, Estimation bias. 1 In tro duction Sev eral causal parameters admit identiﬁcation formulas that require av eraging an outcome regression o v er the distribution of mediating v ariables. This structure arises in mediation 1 analysis [ 1 , 2 ], path-speciﬁc eﬀects [ 3 , 4 ], and nonparametric identiﬁcation of causal eﬀects in hidden-v ariable DA Gs [ 5 , 6 , 7 , 8 , 9 ]. When the mediator is contin uous or high-dimensional, as is common in biomedical and so cial science applications [ 10 , 11 , 12 ], ev aluating these identifying form ulas requires estimating conditional mediator densities and computing integrals with respect to them. Both tasks can b e statistically c hallenging, and Monte Carlo approximations may b e computationally in tensiv e and unstable in mo derate samples. A range of strategies hav e b een prop osed to address these challenges. One class of approaches directly mo dels the conditional mediator density and ev aluates the identifying integral nu- merically , using (semi/non-)parametric estimators [ 13 , 14 , 15 , 16 , 17 ]. These metho ds require correct sp eciﬁcation of the conditional densities to ac hiev e estimator consistency and can b e computationally demanding. Sequential regression approac hes a v oid explicit density estimation b y rewriting the iden tifying functional as iterated conditional exp ectations, thereby reducing the problem to estimation of a collection of regression functions [ 18 , 19 ]. How ev er, consistency of suc h estimators require correct sp eciﬁcation of all the nuisance regression comp onen ts. Alternativ ely , the identifying functional can often b e expressed using inv erse probability weigh ts inv olving treatmen t probabilities and mediator density ratios, which may b e rewritten via Bay es’ rule. Suc h weigh ting approac hes can suﬀer from instabilit y when treatment probabilities are small or p o orly estimated [ 20 , 21 , 22 ]. Inﬂuence-function based estimators, including one-step and targeted minimum loss estimators [ 23 , 24 ], provide a principled approach to estimation bias reduction and inference with ﬂexible nuisance estimation, and can oﬀer robustness to some forms of mo del missp eciﬁcation. How ever, such estimators are often av oided in practice due to metho dological complexit y . In practice, a simpler workaround is often adopted b y discretizing the mediator and replacing in tegrals with ﬁnite sums [ 25 , 26 , 27 ]. This a voids estimation of a contin uous conditional density and simpliﬁes computation. Ho wev er, discretization changes the p opulation-level parameter. Ev en when the causal identiﬁcation assumptions hold, the discretized functional is, in general, a diﬀerent parameter from the original one. In many applied analyses, the bias induced by mediator discretization is either ov erlo oked or treated as secondary relative to computational considerations, and its impact on the target parameter is rarely quantiﬁed. Here, w e study the diﬀerence b etw een the original causal functional and its discretized analogue, 2 whic h we refer to as the coarsening bias. Our primary fo cus is not on ﬁrst-order estimation bias arising from nuisance estimation, but rather on the p opulation-level approximation error induced b y mediator discretization itself. W e sho w that, under smo othness conditions, the naive discretized functional incurs a ﬁrst-order bias prop ortional to the maximum bin width, driven by within-bin shifts in mediator means across treatmen t levels. W e then construct a bias-reduced coarsened functional that ev aluates the ou tcome regression at within-bin conditional means. A T a ylor expansion argument sho ws that this mo diﬁcation eliminates the leading ﬁrst-order term, yielding a second-order remainder in the bin width. As for statistical estimation bias, we derive plug-in and one-step estimators and establish their asymptotic prop erties. Sim ulation studies demonstrate substan tial bias reduction and impro ved cov erage relativ e to naive discretization, ev en under coarse binning. More broadly , our results pro vide a formal foundation for con trolling the impact of discretization on parameter appro ximation and its estimate. 2 T arget estimands Let A denote a binary exp osure taking v alues in { a 0 , a 1 } , C observ ed co v ariates with supp ort C and marginal distribution P C , M a mediator, and Y the outcome. Deﬁne the outcome regression µ ( m, a, c ) = E ( Y | M = m, A = a, C = c ) , the conditional mediator density f M | A,C ( m | a, c ) = p ( M = m | A = a, C = c ) , and the prop ensity score π ( a | c ) = p ( A = a | C = c ) . Let Q = { µ, f M | A,C , π } collect the n uisance functions. The sp eciﬁc causal functionals considered b elow may dep end only on subsets of Q . F or each c ∈ C , deﬁne θ ( Q )( c ) = Z µ ( m, a 1 , c ) f M | A,C ( m | a 0 , c ) dm . (1) The conditional functional θ ( Q )( c ) serv es as a building blo ck for several causal parameters. In particular, deﬁne ψ ( Q ) = Z θ ( Q )( c ) p ( c ) dc , γ ( Q ) = p ( a 0 ) E ( Y | a 0 ) + Z θ ( Q )( c ) π ( a 1 | c ) p ( c ) dc . (2) Under standard iden tiﬁcation assumptions (see App endix B.1 ), these estimands corresp ond to the mediation functional E ( Y ( a 1 , M ( a 0 ))) and the fron t-do or functional E ( Y ( a 0 )) , resp ectiv ely , where Y ( a, m ) denotes the p otential outcome under A = a and M = m [ 1 , 28 , 8 , 29 ]. 3 Giv en n i.i.d. observ ations ( C i , A i , M i , Y i ) n i =1 and n uisance estimates collected in b Q , the plug-in estimators are ψ ( b Q ) = 1 n n X i =1 θ ( b Q )( C i ) , γ ( b Q ) = 1 n n X i =1 n I ( A i = a 0 ) Y i + θ ( b Q )( C i ) b π ( a 1 | C i ) o . (3) When M is contin uous, ev aluation of θ ( b Q )( c ) requires computing R b µ ( m, a 1 , c ) b f M | A,C ( m | a 0 , c ) dm , whic h ma y inv olv e n umerical in tegration or Monte Carlo appro ximation under a working mo del for the conditional densit y f M | A,C . In man y practical settings, this can b e statistically and com- putationally c hallenging. A common simpliﬁcation is to coarsen the mediator by discretization. Next, w e formalize such coarsened functionals and analyze the resulting approximation errors. 3 Coarsened estimands F or clarity we present the dev elopmen t for a univ ariate mediator M ∈ R ; extensions to multi- v ariate mediators are discussed in App endix A . Let h : R → { 1 , . . . , K } b e a measurable discretization map that partitions the supp ort of M in to disjoin t bins B k = { m : h ( m ) = k } , k ∈ { 1 , . . . , K } . Let w k = sup m 1 ,m 2 ∈B k | m 1 − m 2 | denote the bin width and deﬁne f M = h ( M ) . The rule h ma y corresp ond, for instance, to thresholding M at a ﬁxed or data-dep enden t cutoﬀ, but more generally can enco de any measurable partition. F or each bin k , deﬁne µ k ( a, c ) = E ( Y | f M = k , A = a, C = c ) , g k ( a, c ) = p ( f M = k | A = a, C = c ) , and µ k,a 1 ( a, c ) = E ( µ ( M , a 1 , c ) | f M = k , A = a, C = c ) . The coarsened analogue of ( 1 ) is θ h ( Q )( c ) = K X k =1 µ k ( a 1 , c ) g k ( a 0 , c ) . (4) The coarsened v ersions of estimands in ( 2 ) , denoted by ψ h ( Q ) and γ h ( Q ) , are obtained b y replacing θ with θ h . W e primarily fo cus on θ since coarsening behavior propagates to ψ and γ through θ b y a veraging ov er C . Lemma 3.1. F or e ach c ∈ C , the c o arsening err or ∆ h ( Q )( c ) = θ h ( Q )( c ) − θ ( Q )( c ) satisﬁes ∆ h ( Q )( c ) = K X k =1 { µ k ( a 1 , c ) − µ k,a 1 ( a 0 , c ) } g k ( a 0 , c ) . (5) 4 If m 7→ µ ( m, a 1 , c ) is c ontinuously diﬀer entiable on e ach B k and for e ach ﬁxe d c ther e exists a ﬁnite c onstant L ( c ) < ∞ such that sup m ∈B k | µ ′ m ( m, a 1 , c ) | ≤ L ( c ) for al l k ∈ { 1 , . . . , K } and L ( C ) is squar e-inte gr able under P C , then | ∆ h ( Q )( c ) | = O  w max ,K  , wher e w max ,K : = max k ∈{ 1 ,...,K } w k is the maximum bin width. In p articular, if M has b ounde d supp ort and the K bins ar e chosen to have e qual width, then ∆ h ( Q )( c ) = O (1 /K ) . When one bin is muc h wider than the others, e.g., { 0 } v ersus (0 , M max ] , the w orst-case b ound based on w max ,K can be conserv ative; a sharper b ound uses the w eighted av erage P K k =1 w k g k ( a 0 , c ) . See a pro of in App endix B.2 . T o understand the magnitude of coarsening error in ( 5 ) and design coarsening bias corrections, w e take a closer lo ok at the diﬀerence b etw een µ k,a 1 ( a 0 , c ) and µ k ( a 1 , c ) . Assume µ is t wice contin uously diﬀerentiable on each B k , and deﬁne the within-bin mean m k ( a, c ) = E ( M | A = a, C = c, f M = k ) . (6) Under the conditions of Lemma 3.1 , a T a ylor expansion of µ ( M , a 1 , c ) around m k yields µ k ( a 1 , c ) − µ k,a 1 ( a 0 , c ) = µ ′ m ( m k ( a 1 , c ) , a 1 , c ) { m k ( a 1 , c ) − m k ( a 0 , c ) } + O ( w 2 k ) , (7) where µ ′ m is the deriv ative of µ w.r.t m ; see App endix B.2 . Equation ( 7 ) rev eals that the coarsening error in ( 5 ) scales linearly with the shift in within-bin mediator means b etw een treatment levels and the lo cal slop e of the outcome regression in M . The higher-order terms are prop ortional to the curv ature of µ and the within-bin spread of M ; under mild smo othness conditions, these remainder terms are of second order in the bin width. Therefore ∆ h ( Q )( c ) admits a ﬁrst-order approximation with remainder O ( w 2 max ,K ) . In the follo wing remark, w e re-express the coarsening error via an exact cov ariance represen tation. Remark 3.2. ∆ h ( Q )( c ) admits an exact r epr esentation in terms of the within-bin c ovarianc e b etwe en (i) the c onditional outc ome surfac e µ ( M , a 1 , c ) and (ii) the tr e atment-induc e d density r atio r k ( M | c ) = p ( M | f M = k , A = a 0 , C = c ) /p ( M | f M = k , A = a 1 , C = c ) . The bias vanishes if within every bin, these two quantities ar e unc orr elate d under the distribution of M given ( f M = k , A = a 1 , C = c ) . This o c curs, for instanc e, if µ is lo c al ly c onstant in M within bins, or 5 if the c onditional distribution of M do es not diﬀer b etwe en A = a 0 and A = a 1 inside e ach bin; se e App endix B.3 . The plug-in estimate of θ h ( Q )( c ) is θ h ( b Q )( c ) = K X k =1 b µ k ( a 1 , c ) b g k ( a 0 , c ) , (8) where b µ k ( a, c ) and b g k ( a, c ) estimate µ k ( a, c ) and g k ( a, c ) , e.g., via regressions of Y on ( f M , A, C ) and f M on ( A, C ) . The plug-in estimators ψ h ( b Q ) and γ h ( b Q ) are obtained by replacing θ ( b Q ) with θ h ( b Q ) in ( 3 ). W e decomp ose θ h ( b Q )( c ) − θ ( Q )( c ) as: n θ h ( b Q )( c ) − θ h ( Q )( c ) o | {z } estimation error + n θ h ( Q )( c ) − θ ( Q )( c ) o | {z } coarsening error . (9) Assume that for ﬁxed h , the estimator θ h ( b Q ) satisﬁes ∥ θ h ( b Q ) − θ h ( Q ) ∥ L 2 ( P C ) = O p ( n − 1 / 2 ) , where ∥ g ∥ 2 L 2 ( P C ) = E ( g ( C ) 2 ) . Then, b y Lemma 3.1 , ∥ θ h ( b Q ) − θ ( Q ) ∥ L 2 ( P C ) = O p ( n − 1 / 2 ) + O ( w max ,K ) . Hence the asymptotic bias is driv en by the coarsening bias ∆ h ( Q ) , which is ﬁrst order in the bin width. Let K = K n increase with n . If M has bounded supp ort and equal-width bins are used, then w max ,K n = O (1 /K n ) . If K n → ∞ and w max ,K n = o ( n − 1 / 2 ) (e.g., K n = o ( n 1 / 2 ) ), the coarsening bias is negligible relative to sampling error and θ h ( b Q ) is asymptotically equiv alent to θ in L 2 ( P C ) . 4 Debiased coarsened estimands W e introduce a simple correction that remov es the leading discretization bias by ev aluating the outcome regression µ ( ., a 1 , c ) at within-bin representativ es of the mediator under A = a 0 . Given 6 m k ( a, c ) in ( 6 ), deﬁne the debiased coarsened functional e θ h ( Q )( c ) = K X k =1 µ  m k ( a 0 , c ) , a 1 , c  g k ( a 0 , c ) . (10) The expansion is taken around m k ( a 0 , c ) because θ ( Q )( c ) a verages µ ( M , a 1 , c ) with resp ect to the distribution of M giv en A = a 0 . Th us ( 10 ) replaces the in-bin conditional exp ectation of µ ( M , a 1 , c ) under A = a 0 b y its ev aluation at the within-bin mean under the same distribution. The debiased coarsened causal parameters e ψ h ( Q ) and e γ h ( Q ) are obtained from ( 2 ) b y replacing θ with e θ h . Lemma 4.1. F or e ach c ∈ C , the c o arsening err or e ∆ h ( Q )( c ) = e θ h ( Q )( c ) − θ ( Q )( c ) satisﬁes e ∆ h ( Q )( c ) = K X k =1 { µ ( m k ( a 0 , c ) , a 1 , c ) − µ k,a 1 ( a 0 , c ) } g k ( a 0 , c ) . (11) If m 7→ µ ( m, a 1 , c ) is twic e c ontinuously diﬀer entiable and R k ( c ) = µ  m k ( a 0 , c ) , a 1 , c  − µ k,a 1 ( a 0 , c ) , then | R k ( c ) | ≤ 1 2 sup m ∈B k | µ ′′ m ( m, a 1 , c ) | E  ( M − m k ( a 0 , c )) 2 | a 0 , c, k  . (12) If sup m ∈B k | µ ′′ m ( m, a 1 , c ) | ≤ L ( c ) , wher e L ( C ) is squar e-inte gr able under P C , then e ∆ h ( Q )( c ) = O ( w 2 max ,K ) . If M has b ounde d supp ort and the K bins ar e chosen to have e qual width, then ∆ h ( Q )( c ) = O (1 /K 2 ) . See App endix B.4 for a pro of. Giv en Lemma 4.1 , the ﬁrst-order term in Lemma 3.1 is eliminated. The remaining error is prop ortional to the curv ature of µ ( m, a 1 , c ) and the within-bin v ariance of M , and is therefore second order in the bin width. The plug-in estimator of e θ h ( Q )( c ) is e θ h ( b Q )( c ) = K X k =1 b µ  b m k ( a 0 , c ) , a 1 , c  b g k ( a 0 , c ) , (13) where b m k ( a 0 , c ) = b E ( M | A = a 0 , C = c, f M = k ) . The estimators e ψ h ( b Q ) and e γ h ( b Q ) are obtained 7 b y replacing θ ( b Q ) with e θ h ( b Q ) in ( 3 ). W e decomp ose e θ h ( b Q )( c ) − θ ( Q )( c ) as n e θ h ( b Q )( c ) − e θ h ( Q )( c ) o | {z } estimation error + n e θ h ( Q )( c ) − θ ( Q )( c ) o | {z } coarsening error . (14) Assume that for ﬁxed h , ∥ e θ h ( b Q ) − e θ h ( Q ) ∥ L 2 ( P C ) = O p ( n − 1 / 2 ) . Then, b y Lemma 4.1 , ∥ e θ h ( b Q ) − θ ( Q ) ∥ L 2 ( P C ) = O p ( n − 1 / 2 ) + O ( w 2 max ,K ) . F or ﬁxed K , the asymptotic bias is driv en b y the coarsening error e ∆ h ( Q ) , whic h is second order in the bin width. Let K = K n increase with n . If M has bounded supp ort and equal-width bins are used, then w max ,K n = O (1 /K n ) , and hence e ∆ h ( Q )( c ) = O (1 /K 2 n ) . If K n → ∞ and w 2 max ,K n = o ( n − 1 / 2 ) (e.g., K n = o ( n 1 / 4 ) ), the coarsening bias is negligible relative to sampling error and e θ h ( b Q ) is asymptotically equiv alent to θ ( Q ) in L 2 ( P C ) . Compared to the naive coarsened functional θ h ( Q )( c ) , which incurs ﬁrst-order bias of order O ( w max ,K ) , the debiased functional e θ h ( Q )( c ) achiev es a second-order approximation of order O ( w 2 max ,K ) under the stated smo othness conditions. Under equal-width binning, this corresp onds to O (1 /K ) versus O (1 /K 2 ) . 5 Smo othed debiased coarsened estimands Sections 3 and 4 analyzed the approximation error induced b y coarsening. W e now turn to statistical estimation. Throughout the previous sections w e assumed that, for ﬁxed h , the estimation error of the plug-in estimator is O p ( n − 1 / 2 ) in L 2 ( P C ) . Such b ehavior typically requires suﬃcien tly fast conv ergence 8 of all nuisance estimators. When ﬂexible machine learning metho ds are used to estimate µ , π , and g k , plug-in estimators can exhibit non-negligible ﬁrst-order estimation bias. A standard v on Mises expansion sho ws that for a smo oth target parameter ψ ( Q ) , ψ ( b Q ) − ψ ( Q ) = − P { ϕ ( b Q ) } + R 2 ( b Q, Q ) , where ϕ ( Q ) is the canonical gradien t, eﬃcien t inﬂuence function (EIF), and R 2 is a second-order remainder [ 23 ]. The leading term − P { ϕ ( b Q ) } represen ts ﬁrst-order bias and dep ends directly on the qualit y of nuisance estimation. A chieving asymptotic linearity therefore requires suﬃcien tly fast nuisance conv ergence. Inﬂuence-function–based estimators, such as one-step and targeted minim um loss estimators, remo ve the ﬁrst-order bias term and attain asymptotic linearit y under w eaker conditions. Ho wev er, to construct such estimators, the target parameter m ust b e pathwise diﬀerentiable in the nonparametric mo del. The debiased coarsened functional e θ h ( Q )( c ) fails this requirement when M is con tinuous, b ecause it inv olves p oin t ev aluation of µ ( m, a, c ) at m k ( a 0 , c ) . Poin t ev aluation is not a contin uous linear functional on L 2 ( P ) unless M is discrete, and therefore e ψ h and e γ h are not path wise diﬀerentiable in the nonparametric mo del. 5.1 Smo othed estimands and plug-ins T o restore pathwise diﬀerentiabilit y and enable inﬂuence-function–based estimation, we introduce a smo othed v ersion of the debiased functional. W e regularize the point ev aluation by replacing it with a lo calized a verage. Let K b e a symmetric k ernel and let b > 0 b e a bandwidth. Deﬁne K b ( u ) = b − 1 K ( u/b ) and the normalized weigh t ω b,k ( m | a 1 , c ) = K b ( m − m k ( a 0 , c )) E ( K b ( M − m k ( a 0 , c )) | A = a 1 , C = c ) . Deﬁne the lo calized mean µ b,k ( a 1 , c ) = E ( µ ( M , a 1 , c ) ω b,k ( M | a 1 , c ) | A = a 1 , C = c ) , and the smo othed debiased coarsened estimand e θ h,b ( Q )( c ) = K X k =1 µ b,k ( a 1 , c ) g k ( a 0 , c ) . (15) 9 As b → 0 , the k ernel K b concen trates at m k ( a 0 , c ) and e θ h,b ( Q )( c ) → e θ h ( Q )( c ) . The smo othed debiased coarsened parameters e ψ h,b ( Q ) and e γ h,b ( Q ) are obtained from ( 2 ) b y replacing θ with e θ h,b . Lemma 5.1. F or e ach c ∈ C , the smo othing err or e ∆ s h,b ( Q )( c ) = e θ h,b ( Q )( c ) − e θ h ( Q )( c ) satisﬁes e ∆ s h,b ( Q )( c ) = K X k =1 n µ b,k ( a 1 , c ) − µ ( m k ( a 0 , c ) , a 1 , c ) o g k ( a 0 , c ) . (16) Supp ose: (i) m 7→ µ ( m, a 1 , c ) is twic e c ontinuously diﬀer entiable ne ar m k ( a 0 , c ) with sup m,c | µ ′′ m ( m, a 1 , c ) | < ∞ ; (ii) K is symmetric with R u K ( u ) du = 0 and R u 2 K ( u ) du < ∞ ; and (iii) f M | A,C ( · | a 1 , c ) is c ontinuous and b ounde d away fr om zer o ne ar m k ( a 0 , c ) . Then, as b → 0 , e ∆ s h,b ( Q )( c ) = O ( b 2 ) . Combining with L emma 4.1 , the total err or e ∆ h,b ( Q )( c ) = e θ h,b ( Q )( c ) − θ ( Q )( c ) satisﬁes e ∆ h,b ( Q )( c ) = O  w 2 max ,K + b 2  . See a pro of in App endix B.5 . The plug-in estimator for e θ h,b ( Q )( c ) is e θ h,b ( b Q )( c ) = K X k =1 b µ b,k ( a 1 , c ) b g k ( a 0 , c ) , (17) where b µ b,k ( a 1 , c ) = b E ( b µ ( M , a 1 , c ) b ω b,k ( m | a 1 , c ) | a 1 , c ) . W e can decomp ose e θ h,b ( b Q )( c ) − θ ( Q )( c ) as n e θ h,b ( b Q )( c ) − e θ h,b ( Q )( c ) o | {z } estimation error + n e θ h,b ( Q )( c ) − θ ( Q )( c ) o | {z } total error . (18) Assume that for ﬁxed h and b , ∥ e θ h,b ( b Q ) − e θ h,b ( Q ) ∥ L 2 ( P C ) = O p ( n − 1 / 2 ) . Then, b y Lemma 5.1 , ∥ e θ h,b ( b Q ) − θ ( Q ) ∥ L 2 ( P C ) = O p ( n − 1 / 2 ) + O ( w 2 max ,K + b 2 ) . Th us, for ﬁxed K and b , the asymptotic bias is driven by e ∆ h,b ( Q )( c ) , whic h is second order in 10 the bin width and quadratic in the smoothing bandwidth. Let K = K n and b = b n dep end on n . Under equal-width binning, w max ,K n = O (1 /K n ) , so e ∆ h,b n ( Q )( c ) = e θ h,b n ( Q )( c ) − θ ( Q )( c ) = O (1 /K 2 n + b 2 n ) . If K n → ∞ , b n → 0 , and 1 /K 2 n + b 2 n = o ( n − 1 / 2 ) , for example K n = o ( n 1 / 4 ) and b n = o ( n 1 / 4 ) , then e θ h,b n ( b Q ) is asymptotically equiv alent to θ ( Q ) in L 2 ( P C ) . 5.2 Inﬂuence function based estimation As an example, consider the smo othed mediation functional e ψ h,b ( Q ) = Z e θ h,b ( Q )( c ) p ( c ) dc . T o present the inﬂuence function in a transparent form, we ﬁrst treat the bin cen ters m k ( a 0 , c ) as ﬁxed quan tities m ⋆ k ( a 0 , c ) (e.g., midp oin ts or pre-sp eciﬁed represen tatives). Theorem 5.2. The nonp ar ametric eﬃcient inﬂuenc e function for e ψ h,b ( Q ) when m k ( a 0 , k ) is ﬁxe d, is e ϕ ﬁxe d h,b ( Q )( O ) = K X k =1 I ( A = a 1 ) π ( a 1 | C ) g k ( a 0 , C ) n Y ω b,k ( M | a 1 , C ) − µ b,k ( a 1 , C ) o + K X k =1 I ( A = a 0 ) π ( a 0 | C ) µ b,k ( a 1 , C ) n I ( f M = k ) − g k ( a 0 , C ) o + n e θ h,b ( Q )( C ) − e ψ h,b ( Q ) o . See a pro of in App endix B.6 . Let b Q collect all nuisance estimates. The one-step estimator of e ψ h,b ( Q ) , when m k ( a 0 , k ) is ﬁxed, is e ψ one-step h,b ( b Q ) = e ψ ﬁxed h,b ( b Q ) + 1 n n X i =1 e ϕ ﬁxed h,b ( b Q )( O i ) , 11 where e ϕ ﬁxed h,b ( b Q )( O i ) is the ev aluation of the inﬂuence function at b Q and O i , and e ψ ﬁxed h,b ( b Q )= 1 n n X i =1 K X k =1 b µ ﬁxed b,k ( a 1 , C i ) b g k ( a 0 , C i ) , b µ ﬁxed b,k ( a 1 , c )= b E h b µ ( M , a 1 , c ) ω b,k ( M | a 1 , c ) | A = a 1 , C = c i . Under suﬃcien t regularity conditions, n − 1 / 4 rates of the n uisance estimators, and cross-ﬁtting, the one-step estimator is asymptotically linear with asymptotic v ariance of E ( e ϕ ﬁxed h,b ( Q )( O ) 2 ) . When m k ( a 0 , c ) is estimated from the data, additional inﬂuence function contributions arise. The con tribution of the nuisance m k to the EIF, denoted b y e ϕ ω h,b ( Q )( O ) , is e ϕ ω h,b ( Q )( O ) = K X k =1 α k ( C ) I ( A = a 0 ) I ( f M = k ) π ( a 0 | C )  M − m k ( a 0 , C )  , (19) where α k ( c ) : = ∂ ∂ m µ b,k ( a 1 , c ) . The full EIF for e ψ h,b ( Q ) is th us giv en b y e ϕ h,b ( Q )( O ) = e ϕ ﬁxed h,b ( Q )( O )+ e ϕ ω h,b ( Q )( O ) . See a pro of in App endix B.6 . 6 Sim ulation studies W e study estimation of the mediation functional ψ ( Q ) = E ( Y ( a 1 , M ( a 0 ))) , corresp onding to ( 2 ) , with a 1 = 1 and a 0 = 0 throughout. Our goals are tw ofold: (i) T o quantify the p opulation-lev el coarsening bias of the discretized estimand θ h ( Q )( c ) and verify the ﬁrst-order scaling derived in Lemma 3.1 ; (ii) T o ev aluate the ﬁnite-sample p erformance of the coarsened estimator ψ h ( b Q ) and the prop osed debiased estimator e ψ h ( b Q ) , including plug-in and one-step v ariants. All sim ulations are conducted under the following data-generating pro cess (DGP). The baseline co v ariate C tak es v alues in {− 2 , − 1 , 0 , 1 , 2 } with probability mass (0 . 15 , 0 . 20 , 0 . 18 , 0 . 30 , 0 . 17) resp ectiv ely; A | C ∼ Bernoulli  expit(0 . 5 C )  , M | A, C ∼ N  − 0 . 6 C + 2 A + 0 . 5 AC , 1  , Y | M , A, C ∼ N  µ ( M , A, C ) , 1  , where µ ( M , A, C ) = 0 . 8 C + 1 . 5 A + 0 . 75 AM + 0 . 20 M C 2 + 0 . 1 M 3 + 0 . 55 AC , nonlinear in M , 12 T able 1: Coarsening error ∆ h ( Q )( c ) for equal-frequency discretizations with K = 2 and K = 6 . C = -2 C = -1 C = 0 C = 1 C = 2 K = 2 1.103 0.871 0.876 1.522 3.390 K = 6 0.195 0.149 0.216 0.550 1.562 ensuring nonzero curv ature and hence non negligible second order terms in the T a ylor expansions b ehind Lemmas 3.1 and 4.1 . Sim ulation #1: Population coarsening error. W e ﬁrst isolate the population-level discretization bias ∆ h ( Q )( c ) = θ h ( Q )( c ) − θ ( Q )( c ) as c haracterized in Lemma 3.1 . T o approximate the theoretical quantities app earing in ∆ h ( Q )( c ) , w e generate a large Mon te Carlo sample of size n = 1 , 000 , 000 from the DGP . No nuisance estimation is in volv ed in this exp eriment. Instead, we exploit the known parametric forms to compute: g k ( a, c ) , m k ( a, c ) , µ k ( a, c ) . W e consider equal-frequency discretizations of M with K = 2 and K = 6 bins. T able 1 rep orts ∆ h ( Q )( c ) across lev els of C . F or all cov ariate v alues, the magnitude of the coarsening error decreases substantially as K increases. This pattern is consistent with the theoretical scaling ∆ h ( Q )( c ) = O (1 /K ) under equal-width binning and b ounded supp ort, as established in Lemma 3.1 . Imp ortantly , even at K = 6 , the bias remains non-negligible for some v alues of C , illustrating that coarse discretization can alter the target parameter. Equation ( 7 ) sho ws that the leading term of ∆ h ( Q )( c ) is prop ortional to µ ′ m ( m k ( a 1 , c ) , a 1 , c ) { m k ( a 1 , c ) − m k ( a 0 , c ) } . Figure 1 plots the within-bin shifts m k ( a 1 , c ) − m k ( a 0 , c ) for discretizations with K = 2 and K = 6 . Increasing K reduces these shifts across bins and cov ariate strata, thereby atten uating the dominant ﬁrst-order comp onent of the discretization bias. The empirical b ehavior aligns closely with the T a ylor expansion derived in Lemma 3.1 . Remark 3.2 expresses the coarsening bias as a within-bin cov ariance b etw een µ ( M , a 1 , c ) and the ratio r k ( M | c ) . Figure 2 illustrates this at C = 0 . With ﬁner discretization, the co v ariance magnitude drops substan tially , giving a complementary view of the reduced coarsening error. T ogether, these results empirically v alidate the O (1 /K ) ﬁrst-order b ehavior predicted b y Lemma 3.1 . 13 Figure 1: Within-bin diﬀerences in conditional mediator means, m k ( a 1 , c ) − m k ( a 0 , c ) , under discretizations with K = 2 (red) and K = 6 (blue). Each horizontal segmen t represen ts one mediator bin, with bin indices shown on the upp er axis for K = 2 and on the lo wer axis for K = 6 . Panels corresp ond to diﬀeren t v alues of the cov ariate C . Figure 2: Within-bin cov ariance b et ween µ ( M , a 1 , C ) and r k ( M | C ) at C = 0 , as formalized in Remark 3.2 . Finer discretization reduces the magnitude of this co v ariance. 14 Sim ulation #2: Finite-sample p erformance. W e now turn to statistical estimation. W e generate datasets with sample sizes n ∈ { 500 , 5 , 000 , 50 , 000 } , using 1,000 Mon te Carlo replications for eac h n . W e consider equal-frequency discretizations of the mediator using a broad range of bin coun ts K . F or each discretization lev el K , we compare: the coarsened plug-in estimator ψ h ( b Q ) , the debiased coarsened plug-in estimator e ψ h ( b Q ) , and their corresp onding one-step estimators. All n uisance functions are estimated using correctly sp eciﬁed generalized linear mo dels. The within-bin means m k ( a 0 , c ) and bin probabilities g k ( a, c ) are computed under the truncated normal distribution implied b y the DGP . Figure 3 displa ys the bias of ψ h ( b Q ) and e ψ h ( b Q ) as functions of n for sev eral v alues of K . T w o distinct phenomena are evident: (i) Increasing n reduces Mon te Carlo v ariability , and (ii) for ﬁxed K , the bias of ψ h ( b Q ) does not v anish as n increases. The p ersistence of bias reﬂects p opulation coarsening error, not estimation error. In con trast, e ψ h ( b Q ) exhibits dramatically reduced bias across all sample sizes, consisten t with the second-order scaling e ∆ h ( Q )( c ) = O (1 /K 2 ) from Lemma 4.1 . Because bias patterns are qualitativ ely similar across sample sizes, w e fo cus on n = 5 , 000 . Figure 4 summarizes absolute bias, v ariance, mean squared error (MSE), and cov erage of 95% conﬁdence in terv als (CIs) as functions of K . The coarsened plug-in estimator ψ h ( b Q ) exhibits substantial bias and p o or cov erage for small K . P erformance improv es gradually as K increases, reﬂecting the O (1 /K ) deca y of discretization bias. In contrast, the debiased estimator e ψ h ( b Q ) ac hiev es near-zero bias even for relativ ely small K , smaller MSE, and co v erage close to the nominal lev el across a broad range of discretization lev els. V ariance is only mo destly aﬀected by the debiasing step. These ﬁndings demonstrate that ev aluating µ at the within-bin mean m k ( a 0 , c ) eﬀectiv ely remov es the leading discretization bias without inducing meaningful v ariance inﬂation. One-step estimators are often used to reduce estimation bias b y incorp orating inﬂuence-function corrections. Deﬁne the generalized prop ensity score g ( a | m, c ) = p ( A = a | M = m, C = c ) . The 15 one-step estimators for ψ ( Q ) and ψ h ( Q ) , denoted ψ + ( b Q ) and ψ + h ( b Q ) are ψ + ( b Q ) = 1 n n X i =1 " I ( A i = a ) b π ( a 0 | C i ) b g ( a 0 | M i , C i ) b g ( a 1 | M i , C i )  Y − b µ ( M i , a 1 , C i )  (20) + I ( A i = a 0 ) ˆ π ( a 0 | C i )  b µ ( M i , a 1 , C i ) − θ ( b Q )( C i )  + θ ( b Q )( C i ) # , ψ + h ( b Q ) = 1 n n X i =1 " I ( A i = a ) b π ( a | C i ) b g k ( a 0 , C i ) b g k ( a 1 , C i )  Y − b µ k ( a 1 , C i )  (21) + I ( A i = a 0 ) b π ( a 0 | C i )  b µ k ( a 1 , C i ) − θ h ( b Q )( C i )  + θ h ( b Q )( C i ) # . The estimator in ( 21 ) is consisten t for ψ h ( Q ) if either (i) b π and b g k , (ii) b π and b µ k , or (iii) b g k and b µ k are consisten t. Note that ψ + h ( b Q ) still targets the coarsened functional ψ h ( Q ) rather than the original parameter ψ ( Q ) . T o remedy this limitation, we compare: (i) ψ + h ( b Q ) , the one-step estimator for ψ h ( Q ) , (ii) e ψ + h 1 ( b Q ) , a naiv e deriv ative-based correction that replaces b µ k b y b µ ( m k ( a 0 , c ) , a 1 , c ) and θ h ( b Q )( c ) with e θ h ( b Q )( c ) in ( 21 ) , (iii) e ψ + h 2 ( b Q ) , the one-step estimator that replaces θ ( b Q )( c ) with e θ h ( b Q )( c ) in ( 20 ), (iv) the coarsened plug-in estimator ψ h ( b Q ) , (v) the debiased plug-in estimator e ψ h ( b Q ) . Figure 5 presents results under correct sp eciﬁcation and a range of missp eciﬁcation for nuisance mo dels. Details of the missp eciﬁcation conﬁgurations is pro vided in App endix C . The one-step estimator ψ + h ( b Q ) yields mo dest improv emen ts in bias and cov erage relative to ψ h ( b Q ) , but do es not eliminate discretization bias, since the target remains ψ h ( Q ) . The ﬁrst deriv ative estimator e ψ + h 1 ( b Q ) b eha v es similarly and do es not meaningfully reduce discretization bias. In contrast, e ψ + h 2 ( b Q ) achiev es uniformly lo w bias and near-nominal cov erage across scenarios. Notably , the debiased plug-in estimator e ψ h ( b Q ) p erforms comparably to e ψ + h 2 ( b Q ) , with only minor diﬀerences in bias and co verage. Ov erall, these ﬁndings demonstrate that discretization bias arises at the level of the target functional and therefore cannot b e eliminated by inﬂuence-function correction alone. Correcting the functional through the prop osed debiased coarsened construction is essential for v alid inference. The resulting debiased plug-in estimator achiev es substantial bias reduction while retaining computational simplicit y and strong empirical p erformance. 16 Figure 3: Bias of the coarsened plug-in estimator ψ h ( b Q ) and the debiased plug-in estimator e ψ h ( b Q ) as functions of sample size for sev eral discretization lev els K . F or ﬁxed K , the bias of ψ h ( b Q ) p ersists as n increases, reﬂecting p opulation coarsening error, whereas e ψ h ( b Q ) exhibits substan tially reduced bias across sample sizes. Figure 4: The p erformance of the coarsened plug-in estimator ψ h ( b Q ) and the debiased estimator e ψ h ( b Q ) as functions of the n umber of mediator bins K for sample size n = 5 , 000 . 17 Figure 5: Comparison of plug-in and one-step estimators under nuisance-model missp eciﬁcation for n = 5 , 000 . “Correct” indicates all nuisance mo dels are correctly sp eciﬁed, whereas “F alse” indicates all nuisance mo dels are missp eciﬁed. Conditions 1–3 corresp ond to missp eciﬁcation of { µ, µ k } , { g , g k } , an d π , resp ectiv ely . Performance is summarized in terms of bias, v ariance, MSE, and 95% conﬁdence in terv al cov erage. 7 A real data application W e apply our proposed estimation framew ork to data from the B_PR OUD study , an observ ational in vestigation assessing the impact of Mobile Stroke Unit (MSU) dispatch on p ost-stroke outcomes in Berlin [ 30 ] conducted b et ween F ebruary 2017 and May 2019. It can b e used to ev aluate the eﬀect of additional MSU care on 3-mon th functional outcomes among patients for whom an MSU w as received. W e analyzed 768 patients eligible for rep erfusion therap y , of whom 588 (76.6%) receiv ed MSU care ( A = 1 ) and 180 (23.4%) received conv entional emergency services ( A = 0 ). The outcome Y is the 3-month mo diﬁed Rankin Scale (mRS), an ordinal measure ranging from 0 (no symptoms) to 6 (death). In our study , we treat mRS as contin uous. The full mediator M is deﬁned as the time from am bulance dispatc h to thrombolysis and is set to zero for patien ts who did not receive 18 Figure 6: Comparison of plug-in estimators γ h ( b Q ) , e γ h ( b Q ) and γ s ( b Q ) in B_PROUD study for estimating γ ( Q ) = E [ Y ( a 0 )] throm b olytic therap y . W e adjust for tw o baseline contin uous cov ariates: systolic blo o d pressure and strok e severit y . Missing data are handled using m ultiple imputation. This dataset was previously analyzed by [ 27 ] emplo ying a front-door approach to estimate the causal eﬀect of MSU dispatc h on 3-mon th mRS using a three-category mediator. Subsequently , [ 8 ] considered both contin uous and binary v ersions of the mediator, exploring diﬀerent cutoﬀs and implemen ting one-step and TMLE estimators of the fron t-do or functional. In this analysis, w e compare three estimators of the fron t-do or functional γ ( Q ) for E ( Y ( a 0 )) in ( 2 ) with a 0 = 0 : (i) the coarsened plug-in estimator γ h ( b Q ) , (ii) the debiased plug-in estimator e γ h ( b Q ) , and (iii) the plug-in estimator γ s ( b Q ) based on sequen tial regression, which av oids explicit estimation of the mediator densit y by rewriting θ ( Q )( c ) as θ ( Q )( c ) = E ( ˜ Y | a 0 , c ) , where e Y = E ( Y | a 1 , m, c ) . (22) All nuisance functions w ere estimated using super learner with candidates including glm.interaction , glmnet , random forests (1000 trees), xgboost , and dbarts . Figure 6 presen ts p oint estimates along with 95% CIs obtained via b o otstrapping. In terms of p oin t estimates, the debiased estimator e γ h ( b Q ) and the sequen tial estimator γ s ( b Q ) are closer to eac h other than either is to the coarsened estimator γ h ( b Q ) . When K = 2 , the CIs of the three estimators exhibit limited o verlap, and γ h ( b Q ) (2.074; 95% CI: 1.931–2.217) app ears somewhat 19 Figure 7: Estimated within-bin diﬀerences in conditional mediator means, b m k ( a 1 , C i ) − b m k ( a 0 , C i ) , under discretizations with K = 2 . The red line denotes zero and the blue dashed lines indicate the median of within-bin diﬀerences. farther from the debiased estimator e γ h ( b Q ) (2.182; 95% CI: 2.015–2.382) and the sequential estimator γ s ( b Q ) (2.157; 95% CI: 1.984–2.340). F or K = 3 , 4 , and 5 , the CIs largely ov erlap. In particular, the estimate of e γ h ( b Q ) with K = 5 (2.144; 95% CI: 2.010–2.340) is nearly identical to the estimate of γ s ( b Q ) , indicating substan tial agreement under ﬁner discretization. The coarsened estimator e γ h ( b Q ) do es not p erform as p o orly as migh t b e anticipated. This pattern is in terpretable up on examining the within-bin mediator mean shifts across treatmen t lev els, as illustrated in Figure 7 . F or K = 2 , the estimated shifts in the tw o bins cluster around opp osite signs-one predominan tly p ositive and the other predominantly negativ e. As indicated in ( 5 ) , the coarsening bias dep ends on b oth the mediator shifts within bins and the lo cal deriv ative structure of µ ( m, a 1 , c ) . When mediator shifts diﬀer in sign across bins, their weigh ted contributions ma y partially oﬀset.This cancellation eﬀect pro vides a plausible explanation for the relativ ely small empirical deviation of the coarsened estimator compared with the other estimators in this dataset. Ho wev er, such oﬀsets dep ends on sp eciﬁc features of the data structure and cannot b e alw ays guaran teed in practice. Therefore, although the coarsened plug-in estimator ma y p erform well in certain empirical settings, relying on it without bias correction remains p otentially risky . In con trast, the prop osed debiased estimator oﬀers a more reliable alternative, mitigating ﬁrst-order coarsening bias without imp osing restrictive assumptions on the mediator–outcome relationship. 20 8 Discussion W e study the bias induced by discretizing a contin uous mediator in causal functionals. Even under correct identiﬁcation and consistent nuisance estimation, naive discretization alters the target parameter and introduces a ﬁrst-order approximation error that scales with the bin width. Ev aluating the outcome regression at within-bin means under the appropriate treatment- reference distribution remov es this leading term and reduces the error to second order under standard smo othness conditions. A key distinction is b etw een functional appro ximation error and statistical estimation error. Inﬂuence-function–based estimators address the latter but do not correct discretization bias when the estimand itself has b een mo diﬁed. Correcting the functional is therefore essential; once this is done, standard semiparametric to ols can b e applied to con trol estimation error and conduct v alid inference. F rom a practical p ersp ective, discretization is frequen tly used to simplify computation or av oid direct conditional d ensit y estimation. Our results sho w that the resulting approximation error can be formally characterized and reduced through a simple within-bin correction that preserv es computational tractabilit y . Although we fo cus on mediation and fron t-do or functionals, the underlying argument applies more broadly to causal parameters that integrate a regression surface against a conditional distribution, provided suitable smo othness conditions hold. 21 References [1] Judea P earl. Direct and indirect eﬀects. In Pr o c e e dings of the Sevente enth Confer enc e on Unc ertainty in A rtiﬁcial Intel ligenc e (UAI-01) , pages 411–420. Morgan Kaufmann, San F rancisco, 2001. [2] T yler V anderW eele and Stijn V ansteelandt. Mediation analysis with m ultiple mediators. Epidemiolo gic metho ds , 2(1):95–115, 2014. [3] Chen A vin, Ily a Shpitser, and Judea Pearl. Iden tiﬁabilit y of path-sp eciﬁc eﬀects. In Pr o c e e dings of the IJCAI , pages 357–363, 12 2005. [4] Ily a Shpitser. Counterfactual graphical mo dels for longitudinal mediation analysis with unobserv ed confounding. Co gnitive scienc e , 37(6):1011–1035, 2013. [5] Jin Tian and Judea P earl. A general identiﬁcation condition for causal eﬀects. In Eighte enth National Confer enc e on A rtiﬁcial Intel ligenc e , pages 567–573, 2002. [6] Thomas S Richardson, Robin J Ev ans, James M Robins, and Ilya Shpitser. Nested mark o v prop erties for acyclic directed mixed graphs. The A nnals of Statistics , 51(1):334–361, 2023. [7] Rohit Bhattac harya, Razieh Nabi, and Ily a Shpitser. Semiparametric inference for causal eﬀects in graphical mo dels with hidden v ariables. Journal of Machine L e arning R ese ar ch , 23:1–76, 2022. [8] Anna Guo, Da vid Benk eser, and Razieh Nabi. Flexible nonparametric inference for causal eﬀects under the fron t-do or mo del. arXiv pr eprint arXiv:2312.10234 , 2023. [9] Anna Guo, David Benkeser, and Razieh Nabi. Causal inference with the" napkin graph". arXiv pr eprint arXiv:2512.19861 , 2025. [10] Y asmmyn D Salinas, Zuoheng W ang, and Andrew T DeW an. Discov ery and mediation analysis of cross-phenotype asso ciations b etw een asthma and b o dy mass index in 12q13. 2. A meric an Journal of Epidemiolo gy , 190(1):85–94, 2021. [11] Jiasheng Huang, Y ehua Li, Angelique G Brellenthin, Duc k-ch ul Lee, Xuemei Sui, and Stev en N Blair. Causal mediation analysis b et ween resistance exercise and reduced risk of 22 cardio v ascular disease based on the aerobics center longitudinal study . Journal of A pplie d Statistics , 49(14):3750–3767, 2022. [12] Ali Hemade and P ascale Salameh. Revisiting the ob esity–anaemia paradox: Inﬂammation and iron homeostasis in the bmi–haemoglobin relationship. Endo crinolo gy, Diab etes & Metab olism , 8(5):e70110, 2025. [13] Ze-w ei Ma and W ei-nan Zeng. A multiple mediator mo del: Po wer analysis based on monte carlo sim ulation. A meric an Journal of Applie d Psycholo gy , 3(3):72–79, 2014. [14] Dustin Tingley , T epp ei Y amamoto, Kentaro Hirose, Luk e Keele, and Kosuk e Imai. Mediation: R pac kage for causal mediation analysis. Journal of statistic al softwar e , 59:1–38, 2014. [15] Katrina L Devic k, Linda V aleri, Jarvis Chen, Alejandro Jara, Marie-Abèle Bind, and Bren t A Coull. The role of b o dy mass index at diagnosis of colorectal cancer on black–white disparities in surviv al: a density regression mediation approac h. Biostatistics , 23(2):449–466, 2022. [16] An tonio R Linero. Sim ulation-based estimators of analytically intractable causal eﬀects. Biometrics , 78(3):1001–1017, 2022. [17] Yi Li, May a B Math ur, Daniel H Solomon, Paul M Ridker, Rob ert J Glynn, and Kazuki Y oshida. Eﬀect measure mo diﬁcation by co v ariates in mediation: extending regression-based causal mediation analysis. Epidemiolo gy , 34(5):661–672, 2023. [18] R uyi Liu, Liangyuan Hu, F rancis Perry Wilson, Josh ua L W arren, and F an Li. A bay esian approac h to the g-form ula via iterative conditional regression. Statistics in Me dicine , 44(13-14):e70123, 2025. [19] Zeyi W ang, Lars v an der Laan, May a Petersen, Thomas Gerds, Ka jsa K vist, and Mark v an der Laan. T argeted maximum likelihoo d based estimation for longitudinal mediation analysis. Journal of Causal Infer enc e , 13(1):20230013, 2025. [20] Guanglei Hong. Ratio of mediator probabilit y w eighting for estimating natural direct and indirect eﬀects. arXiv pr eprint arXiv:2506.03284 , 2025. [21] Xiang Zhou. Semiparametric estimation for causal mediation analysis with m ultiple causally 23 ordered mediators. Journal of the R oyal Statistic al So ciety Series B: Statistic al Metho dolo gy , 84(3):794–821, 2022. [22] Chang Liu and AmirEmad Ghassami. T wo-stage nuisance function estimation for causal mediation analysis. arXiv pr eprint arXiv:2404.00735 , 2024. [23] Aad W V an der V aart. A symptotic statistics , volume 3. Cambridge universit y press, 2000. [24] Mark J v an der Laan and Daniel Rubin. T argeted maximum likelihoo d learning. The International Journal of Biostatistics , 2(1), 2006. [25] Mu-Chi Chung, Hui-T sung Hsu, Y an-Chiao Mao, Chin-Ching W u, Chih-T e Ho, Chiu-Shong Liu, and Chi-Jung Ch ung. Asso ciation and mediation analyses among multiple metals exp osure, plasma folate, and communit y-based impaired estimated glomerular ﬁltration rate in cen tral taiwan. Envir onmental He alth , 21(1):44, 2022. [26] Osv aldo F Morera, Mosi I Dane’el, Brandt A Smith, Alisha H Redelfs, Sarah L Ruiz, Kristopher J Preac her, and Leah D Whigham. Discretizing contin uous v ariables in n utrition and ob esity research: a practice that needs to b e cut short. Nutrition & Diab etes , 13(1):20, 2023. [27] Marco Piccininni, T obias Kurth, Heinrich J A udeb ert, and Jessica L Rohmann. The eﬀect of mobile stroke unit care on functional outcomes: an application of the front-door formula. Epidemiolo gy , 34(5):712–720, 2023. [28] Judea P earl. Causal diagrams for empirical research. Biometrika , 82(4):669–688, 1995. [29] Anna Guo and Razieh Nabi. A v erage causal eﬀect estimation in dags with hidden v ariables: Bey ond back-door and front-door criteria. arXiv pr eprint arXiv:2409.03962 , 2024. [30] Martin Ebinger, Peter Harmel, Christian H Nolte, Ulrike Grittner, Bob Siegerink, and Heinric h J Audebert. Berlin prehospital or usual deliv ery of acute strok e care–study proto col. International Journal of Str oke , 12(6):653–658, 2017. 24 Coarsening Bias from V ariable Discretization in Causal F unctionals (Supplementary Material) This Supplemen tary Material is organized as follows. App endix A presents extensions to settings with m ultiple mediators. App endix B contains all technical pro ofs. App endix C pro vides additional details for the sim ulation studies. A Extensions to m ultiple mediators W e discuss the extensions with t w o (sets of ) mediators. Let M 1 and M 2 denote tw o ordered con tinuous or m ulti-dimensional mediators. Deﬁne the outcome regression µ ( m 1 , m 2 , a, c ) = E ( Y | M 1 = m 1 , M 2 = m 2 , A = a, C = c ) , the conditional mediator density f M 1 | A,C ( m 1 | a, c ) = p ( M 1 = m 1 | A = a, C = c ) and f M 2 | M 1 ,A,C ( m 2 | m 1 , a, c ) = p ( M 2 = m 2 | M 1 = m 1 , A = a, C = c ) . Let Q = { µ, f M 1 | A,C , f M 2 | M 1 ,A,C } collects the n uisance functions. Let ( a y , a 1 , a 2 ) ∈ { 0 , 1 } 3 , and for eac h c ∈ C , θ ( Q )( c ) = Z Z µ ( m 2 , m 1 , a y , c ) f M 2 | M 1 ,A,C ( m 2 | m 1 , a 2 , c ) f M 1 | A,C ( m 1 | a 1 , c ) dm 2 dm 1 . (23) Similarly , for i = 1 , 2 , let h i : R 7→ { 1 , . . . , K i } denote the measu rable discretization maps that partitions the supp ort of M i in to K i disjoin t bins: B i,k i = { m i : h i ( m i ) = k i } , k i ∈ { 1 , . . . , K i } . Let w i,k i = sup m i, 1 ,m i, 2 ∈B i,k i | m i, 1 − m i, 2 | denote the bin width and deﬁne the coarsened mediator f M i = h ( M i ) . F or each bin k i ∈ { 1 , . . . , K i } , deﬁne g k 1 ( a 1 , c ) = p ( f M 1 = k 1 | A = a 1 , C = c ) , g k 1 ,k 2 ( a 2 , c ) = p ( f M 2 = k 2 | f M 1 = k 1 , A = a 2 , C = c ) , and µ k 1 ,k 2 ( a y , c ) = E ( Y | f M 2 = k 2 , f M 1 = k 1 , A = a y , C = c ) . The coarsened analogue of ( 23 ) is: θ h ( Q )( c ) = K 1 X k 1 =1 K 2 X k 2 =1 µ k 1 ,k 2 ( a y , c ) g k 1 ,k 2 ( a 2 , c ) g k 1 ( a 1 , c ) . (24) T o introduce the similar correction as one mediator in main draft, we ﬁrst deﬁne g k 2 ( m 1 , a 2 , c ) = p ( f M 2 = k 2 | M 1 = m 1 , A = a 2 , C = c ) , µ k 2 ,a y ( m 1 , a 2 , c ) = E ( µ ( M 1 , M 2 , a y , C ) | f M 2 = k 2 , M 1 = m 1 , A = a 2 , C = c ) , η k 2 ,a y ( m 1 , a 2 , c ) = µ k 2 ,a y ( m 1 , a 2 , c ) p ( f M 2 = k 2 | M 1 = m 1 , A = a 2 , C = c ) , 25 µ η ,k 1 ,k 2 ,a 2 ,a y ( a 1 , c ) = E  η k 2 ,a y ( M 1 , a 2 , C ) | f M 1 = k 1 , A = a 1 , C = c  . Then, the conditional functional θ ( Q )( c ) could b e written as θ ( Q )( c ) = K 1 X k 1 =1 K 2 X k 2 =1  Z n Z µ ( m 1 , m 2 , a y , c ) p ( m 2 | f M 2 = k 2 , m 1 , a 2 , c ) dm 2 o × p ( f M 2 = k 2 | m 1 , a 2 , c ) p ( m 1 | f M 1 = k 1 , a 1 , c ) dm 1  × p ( f M 1 = k 1 | a 1 , c ) = K 1 X k 1 =1 K 2 X k 2 =1 µ η ,k 1 ,k 2 ,a 2 ,a y ( a 1 , c ) g k 1 ( a 1 , c ) . (25) Deﬁne the within-bin mean m 1 ,k 1 ( a 1 , c ) = E [ M 1 | f M 1 = k 1 , A = a 1 , C = c ] , m 2 ,k 2 ( m 1 , a 2 , c ) = E [ M 2 | f M 2 = k 2 , M 1 = m 1 , A = a 2 , C = c ] . Th us, we deﬁne the debiased coarsened functional as e θ h ( Q )( c ) = K 1 X k 1 =1 K 2 X k 2 =1 µ  m 1 ,k 1 ( a 1 , c ) , m 2 ,k 2  m 1 ,k 1 ( a 1 , c ) , a 2 , c  , a y , c  g k 2  m 1 ,k 1 ( a 1 , c ) , a 2 , c  g k 1 ( a 1 , c ) . (26) F or each c ∈ C , the coarsening error e ∆ h ( Q )( c ) = e θ h ( Q )( c ) − θ ( Q )( c ) satisﬁes e ∆ h ( Q )( c ) = K 1 X k 1 =1 K 2 X k 2 =1  µ  m 1 ,k 1 ( a 1 , c ) , m 2 ,k 2  m 1 ,k 1 ( a 1 , c ) , a 2 , c  , a y , c  g k 2  m 1 ,k 1 ( a 1 , c ) , a 2 , c  − µ η ,k 1 ,k 2 ,a 2 ,a y ( a 1 , c )  g k 1 ( a 1 , c ) . (27) Assume µ ( m 1 , m 2 , a 1 , c ) is t wice con tin uously diﬀerentiable on eac h B i,k i . Inside each bin B 2 ,k 2 , apply a second-order T aylor expansion of µ ( m 1 , m 2 , a 1 , c ) around m 2 ,k 2 ( m 1 , a 2 , c ) : µ ( m 1 , M 2 , a y , c ) = µ ( m 1 , m 2 ,k 2 ( m 1 , a 2 , c ) , a y , c ) + µ ′ m 2 ( m 1 , m 2 ,k 2 ( m 1 , a 2 , c ) , a y , c )( M 2 − m 2 ,k 2 ( m 1 , a 2 , c ) + 1 2 µ ′′ m 2 ( m 1 , M ∗ 2 , a y , c )( M 2 − m 2 ,k 2 ( m 1 , a 2 , c )) 2 , (28) for some M ∗ 2 ∈ B 2 ,k 2 . 26 Assume η k 2 ,a y ( m 1 , a 2 , c ) is t wice contin uously diﬀerentiable on each B 1 ,k 1 . Inside each bin B 1 ,k 1 for M 1 , apply a second-order T aylor expansion of η k 2 ,a y ( M 1 , a 2 , c ) around m 1 ,k 1 ( a 1 , c ) : η k 2 ,a y ( M 1 , a 2 , c ) = η k 2 ,a y ( m 1 ,k 1 ( a 1 , c ) , a 2 , c ) + ( η k 2 ,a y ) ′ m 1 ( m 1 ,k 1 ( a 1 , c ) , a 2 , c )( M 1 − m 1 ,k 1 ( a 1 , c )) + 1 2 ( η k 2 ,a y ) ′′ m 1 ( M ∗ 1 , a 2 , c )( M 1 − m 1 ,k 1 ( a 1 , c )) 2 . (29) for some M ∗ 1 ∈ B 1 ,k 1 . T ake conditional exp ectation sequentially : µ η ,k 1 ,k 2 ,a 2 ,a y ( a 1 , c ) = E  η k 2 ,a y ( M 1 , a 2 , c ) | f M 1 = k 1 , A = a 1 , C = c  29 = η k 2 ,a y ( m 1 ,k 1 ( a 1 , c ) , a 2 , c ) + 1 2 E  ( η k 2 ,a y ) ′′ m 1 ( M ∗ 1 , a 2 , c )( M 1 − m 1 ,k 1 ( a 1 , c )) 2   f M 1 = k 1 , A = a 1 , C = c  = µ k 2 ,a y ( m 1 ,k 1 ( a 1 , c ) , a 2 , c ) g k 2  m 1 ,k 1 ( a 1 , c ) , a 2 , c  + 1 2 E  ( η k 2 ,a y ) ′′ m 1 ( M ∗ 1 , a 2 , c )( M 1 − m 1 ,k 1 ( a 1 , c )) 2   f M 1 = k 1 , A = a 1 , C = c  = E  µ ( m 1 ,k 1 ( a 1 , c ) , M 2 , a y , c ) | f M 2 = k 2 , m 1 ,k 1 ( a 1 , c ) , A = a 2 , C = c  g k 2  m 1 ,k 1 ( a 1 , c ) , a 2 , c  + 1 2 E  ( η k 2 ,a y ) ′′ m 1 ( M ∗ 1 , a 2 , c )( M 1 − m 1 ,k 1 ( a 1 , c )) 2   f M 1 = k 1 , A = a 1 , C = c  28 = n µ  m 1 ,k 1 ( a 1 , c ) , m 2 ,k 2 ( m 1 ,k 1 ( a 1 , c ) , a 2 , c ) , a y , c  + 1 2 E  µ ′′ m 2 ( m 1 ,k 1 ( a 1 , c ) , M ∗ 2 , a y , c )( M 2 − m 2 ,k 2 ( m 1 ,k 1 ( a 1 , c ) , a 2 , c )) 2 | f M 2 = k 2 , m 1 ,k 1 ( a 1 , c ) , A = a 2 , C = c  o × g k 2  m 1 ,k 1 ( a 1 , c ) , a 2 , c  + 1 2 E  ( η k 2 ,a y ) ′′ m 1 ( M ∗ 1 , a 2 , c )( M 1 − m 1 ,k 1 ( a 1 , c )) 2   f M 1 = k 1 , A = a 1 , C = c  . Th us R k 1 ,k 2 ( c ) =  m 1 ,k 1 ( a 1 , c ) , m 2 ,k 2 ( m 1 ,k 1 ( a 1 , c ) , a 2 , c ) , a y , c  g k 2  m 1 ,k 1 ( a 1 , c ) , a 2 , c  − µ η ,k 1 ,k 2 ,a 2 ,a y ( a 1 , c ) = − 1 2 E  µ ′′ m 2 ( m 1 ,k 1 ( a 1 , c ) , M ∗ 2 , a y , c )( M 2 − m 2 ,k 2 ( m 1 ,k 1 ( a 1 , c ) , a 2 , c )) 2 | f M 2 = k 2 , m 1 ,k 1 ( a 1 , c ) , A = a 2 , C = c  × g k 2  m 1 ,k 1 ( a 1 , c ) , a 2 , c  − 1 2 E  ( η k 2 ,a y ) ′′ m 1 ( M ∗ 1 , a 2 , c )( M 1 − m 1 ,k 1 ( a 1 , c )) 2   f M 1 = k 1 , A = a 1 , C = c  . Giv en | g k 2  m 1 ,k 1 ( a 1 , c ) , a 2 , c  | < 1 , the monotonicity and linearity of expectations, taking 27 absolute v alues, we obtain the following upp er b ound on | R k 1 ,k 2 ( c ) | : | R k 1 ,k 2 ( c ) | ≤ 1 2 E  | µ ′′ m 2 ( m 1 ,k 1 ( a 1 , c ) , M ∗ 2 , a y , c ) | ( M 2 − m 2 ,k 2 ( m 1 ,k 1 ( a 1 , c ) , a 2 , c )) 2 | f M 2 = k 2 , m 1 ,k 1 ( a 1 , c ) , A = a 2 , C = c  + 1 2 E  | ( η k 2 ,a y ) ′′ m 1 ( M ∗ 1 , a 2 , c ) | ( M 1 − m 1 ,k 1 ( a 1 , c )) 2   f M 1 = k 1 , A = a 1 , C = c  ≤ 1 2 sup m 2 ∈B 2 ,k 2 ( c ) | µ ′′ m 2 ( m 1 ,k 1 ( a 1 , c ) , m 2 , a y , c ) | E  ( M 2 − m 2 ,k 2 ( m 1 ,k 1 ( a 1 , c ) , a 2 , c )) 2 | f M 2 = k 2 , m 1 ,k 1 ( a 1 , c ) , A = a 2 , C = c  + 1 2 sup m 1 ∈B 1 ,k 1 ( c ) | ( η k 2 ,a y ) ′′ m 1 ( M ∗ 1 , a 2 , c ) | E  ( M 1 − m 1 ,k 1 ( a 1 , c )) 2 | f M 1 = k 1 , A = a 1 , C = c  . F or any ( a 1 , a 2 , c ) , the conditional la w p ( m 1 | f M 1 = k 1 , A = a 1 , C = c ) is supp orted on B 1 ,k 1 ( c ) and p ( m 2 | f M 2 = k 2 , M 1 = m 1 ,k 1 ( a 1 , c ) , A = a 2 , C = c ) is supp orted on B 2 ,k 2 ( c ) , hence V ar ( M 1 | k 1 , a 1 , c ) ≤ (sup m 1 , 1 ,m 1 , 2 ∈B 1 ,k 1 ( c ) | m 1 , 1 − m 1 , 2 | ) 2 4 = w 1 ,k 1 ( c ) 2 4 V ar ( M 2 | k 2 , m 1 ,k 1 ( a 1 , c ) , a 2 , c ) ≤ (sup m 2 , 1 ,m 2 , 2 ∈B 2 ,k 2 ( c ) | m 2 , 1 − m 2 , 2 | ) 2 4 = w 2 ,k 2 ( c ) 2 4 . Therefore, e ∆ h ( Q )( c ) = O ( w max , 1 ,K 1 ( c ) 2 + w max , 2 ,K 2 ( c ) 2 ) . B Pro ofs B.1 Iden tiﬁcation pro ofs F or the identiﬁcation of mediation functional E ( Y ( a 1 , M ( a 0 ))) and the front-do or functional E ( Y ( a 0 )) , w e ﬁrst introduce general assumptions that apply throughout. F or any a and m , we assume (A) Consistency , whic h indicates that observed outcome and mediators match their coun terfac- tuals when treatmen t and mediator v alues are set at observed v alues; i.e. Y ( a, m ) = Y if A = a and M = m , and M ( a ) = m if A = a . (B) Positivity , which states that P ( A = 1 | C = c ) > 0 when P ( C = c ) > 0 , and P ( A = 1 | M = m, C = c ) > 0 when P ( M = m, C = c ) > 0 . T o identify the mediation functional E ( Y ( a 1 , M ( a 0 ))) , w e additionally need: (C1) Conditional ignor ability , which assumes the absence of unmeasured confounders b et w een the treatment-mediator, treatment-outcome and mediator-outcome pairs; i.e. for an y 28 a, a 0 , a 1 and m : (i) Y ( a 1 , m ) , M ( a 0 ) ⊥ ⊥ A | C ; (ii) Y ( a 1 , m ) ⊥ ⊥ M ( a ) | A, C . Under (A)–(C1), the mediation functional is iden tiﬁed as E ( Y ( a 1 , M ( a 0 ))) = Z Z E ( Y ( a 1 , m ) | M ( a 0 ) = m, C = c ) p ( M ( a 0 ) = m | C = c ) p ( C = c ) dm dc = Z Z E ( Y ( a 1 , m ) | M ( a 0 ) = m, A = a 1 , C = c ) p ( M = m | A = a 0 , C = c ) p ( C = c ) dm dc = Z Z E ( Y ( a 1 , m ) | M ( a 1 ) = m, A = a 1 , C = c ) p ( M = m | A = a 0 , C = c ) p ( C = c ) dm dc = Z Z E ( Y | M = m, A = a 1 , C = c ) p ( M = m | A = a 0 , C = c ) p ( C = c ) dm dc (30) T o identify the front-door functional E ( Y ( a 0 )) , in addition to (A) and (B), w e require: (C2) Conditional ignor ability , which assumes the absence of unmeasured confounders b et w een the treatment-mediator, and mediator-outcome pairs; i.e. (i) M ( a 0 ) ⊥ ⊥ A | C ; (ii) Y ( m ) ⊥ ⊥ M | A, C ; (iii) Y ( m ) ⊥ ⊥ M ( a 0 ) | C . (D) No dir e ct eﬀe ct , whic h assumes that M blo c ks all directed paths from A to Y , i.e., Y ( a, m )= Y ( m ) for any a and m . Under (A), (B), (C2) and (D), the fron t-do or functional is identiﬁed as E ( Y ( a 0 )) = Z Z E ( Y ( a 0 ) | M ( a 0 ) = m, C = c ) p ( M ( a 0 ) = m | C = c ) p ( C = c ) dm dc = Z Z E ( Y ( a 0 , m ) | M ( a 0 ) = m, C = c ) p ( M ( a 0 ) = m | A = a 0 , C = c ) p ( C = c ) dm dc = Z Z E ( Y ( m ) | M ( a 0 ) = m, C = c ) p ( M = m | A = a 0 , C = c ) p ( C = c ) dm dc = Z Z E ( Y ( m ) | C = c ) p ( M = m | A = a 0 , C = c ) p ( C = c ) dm dc = Z Z  X a ∈{ 0 , 1 } E ( Y | M = m, A = a, C = c ) p ( A = a | C = c )  p ( M = m | A = a 0 , C = c ) p ( C = c ) dm dc = p ( A = a 0 ) E ( Y | A = a 0 ) + Z Z E ( Y | M = m, A = a 1 , C = c ) p ( M = m | A = a 0 , C = c ) p ( A = a 1 , C = c ) dm dc . (31) 29 B.2 Lemma 3.1 Giv en our notations, we can rewrite θ ( Q )( C ) in ( 1 ) as: θ ( Q )( C )= K X k =1  Z µ ( m, a 1 , c ) p ( m | f M = k , a 0 , c ) dm  g k ( a 0 , c )= K X k =1 µ k,a 1 ( a 0 , c ) g k ( a 0 , c ) , (32) By deﬁnition of the coarsened estimand, θ h ( Q )( c ) = K X k =1 µ k ( a 1 , c ) g k ( a 0 , c ) . Therefore, ∆ h ( Q )( c ) = θ h ( Q )( c ) − θ ( Q )( c ) (33) = K X k =1 µ k ( a 1 , c ) g k ( a 0 , c ) − K X k =1 µ k,a 1 ( a 0 , c ) g k ( a 0 , c ) (34) = K X k =1 n µ k ( a 1 , c ) − µ k,a 1 ( a 0 , c ) o g k ( a 0 , c ) , (35) whic h prov es ( 5 ). F urther, supp ose that sup m ∈B k ( c ) | µ ′ m ( m, a 1 , c ) | ≤ L ( c ) for some square-integrable function under P C . Then from ( 5 ),    ∆ h ( Q )( c )    ≤ K X k =1    µ k ( a 1 , c ) − µ k,a 1 ( a 0 , c )    g k ( a 0 , c ) (36) ≤ L ( c ) K X k =1    m k ( a 1 , c ) − m k ( a 0 , c )    g k ( a 0 , c ) . (37) Since b oth conditional means m k ( a 1 , c ) and m k ( a 0 , c ) lie in the same bin B k ( c ) , their diﬀerence is b ounded b y the bin width,    m k ( a 1 , c ) − m k ( a 0 , c )    ≤ w k ( c ) , so    ∆ h ( Q )( c )    ≤ L ( c ) K X k =1 w k ( c ) g k ( a 0 , c ) . (38) 30 The w eights g k ( a 0 , c ) form a probability distribution ov er k , hence P K k =1 w k ( c ) g k ( a 0 , c ) is a w eighted a verage of the bin widths and is b ounded by the maximum bin width, K X k =1 w k ( c ) g k ( a 0 , c ) ≤ max k w k ( c ) = : w max ,K ( c ) . Therefore ∆ h ( Q )( c ) = O  w max ,K ( c )  . When the mediator has b ounded support and the K bins are chosen to ha v e equal width, w max ,K ( c ) = O (1 /K ) , so ∆ h ( Q )( c ) = O (1 /K ) . T o prov e ( 7 ), we assume µ is twice contin uously diﬀeren tiable and deﬁne the within-bin mean m k ( a, c ) = E ( M | A = a, C = c, f M = k ) . (39) Applying a ﬁrst-order T aylor expansion of µ ( M , a 1 , c ) around m k ( a, c ) yields µ ( M , a 1 , c ) ≈ µ ( m k ( a, c ) , a 1 , c ) + µ ′ m ( m k ( a, c ) , a 1 , c ) ( M − m k ( a, c )) , where µ ′ m denotes the partial deriv ative of µ with resp ect to m . T aking exp ectations w.r.t. p ( m | f M = k , a, c ) sho ws that E ( µ ( M , a 1 , c ) | f M = k , a, c ) ≈ µ ( m k ( a, c ) , a 1 , c ) up to second-order remainder terms. Hence, µ k ( a 1 , c ) − µ k,a 1 ( a 0 , c ) can b e appro ximated by µ k ( a 1 , c ) − µ k,a 1 ( a 0 , c ) ≈ µ ( m k ( a 1 , c ) , a 1 , c ) − µ ( m k ( a 0 , c ) , a 1 , c ) ≈ µ ′ m ( m k ( a 1 , c ) , a 1 , c )  m k ( a 1 , c ) − m k ( a 0 , c )  , (40) where the second line follo ws another ﬁrst-order T aylor expansion of µ around m k ( a 1 , c ) . B.3 Remark 3.2 F or each bin index k and co v ariate v alue c , deﬁne p k,a ( m | c ) = p ( m | f M = k , A = a, C = c ) , r k ( m | c ) = p k,a 0 ( m | c ) p k,a 1 ( m | c ) . (41) 31 W e can write µ k,a 1 ( a 0 , c ) − µ k ( a 1 , c ) = Z µ ( m, a 1 , c ) p k,a 0 ( m | c ) dm − Z µ ( m, a 1 , c ) p k,a 1 ( m | c ) dm = Z µ ( m, a 1 , c ) n r k ( m | c ) − 1 o p k,a 1 ( m | c ) dm = E p k,a 1  µ ( M , a 1 , c )  r k ( M | c ) − 1   = E p k,a 1   µ ( M , a 1 , c ) − E p k,a 1  µ ( M , a 1 , c )   r k ( M | c ) − 1   = Co v p k,a 1  µ ( M , a 1 , c ) , r k ( M | c )  , since E p k,a 1  r k ( M | c )  = 1 . B.4 Lemma 4.1 In the pro of of Lemma 3.1 , we show ed θ ( Q )( c ) = P K k =1 µ k,a 1 ( a 0 , c ) g k ( a 0 , c ) . By deﬁnition of the debiased coarsened functional, e θ h ( Q )( c ) = P K k =1 µ  m k ( a 0 , c ) , a 1 , c  g k ( a 0 , c ) . Subtracting the t wo displa ys yields ( 11 ). W e now b ound R k ( c ) for a ﬁxed bin index k . Inside each bin B k , apply a second-order T a ylor expansion of µ ( M , a 1 , c ) around m k ( a 0 , c ) : µ ( M , a 1 , c ) = µ ( m k ( a 0 , c ) , a 1 , c ) + µ ′ m ( m k ( a 0 , c ) , a 1 , c )( M − m k ( a 0 , c )) + 1 2 µ ′′ m ( M ∗ , a 1 , c )( M − m k ( a 0 , c )) 2 , for some M ∗ ∈ B k (b y the mean-v alue form of the remainder). T aking the conditional exp ectation under ( f M = k , A = a 0 , C = c ) gives E ( µ ( M , a 1 , c ) | k , a 0 , c ) = µ ( m k ( a 0 , c ) , a 1 , c ) + 1 2 E  µ ′′ m ( M ∗ , a 1 , c )( M − m k ( a 0 , c )) 2 | k , a 0 , c  . Th us R k ( c ) = µ ( m k ( a 0 , c ) , a 1 , c ) − µ k,a 1 ( a 0 , c ) = − 1 2 E  µ ′′ m ( M ∗ , a 1 , c )( M − m k ( a 0 , c )) 2 | k , a 0 , c  . T aking absolute v alues and given the monotonicit y and linearity of exp ectations, we obtain the follo wing upp er b ound on | R k ( c ) | : | R k ( c ) | ≤ 1 2 E  | µ ′′ m ( M ∗ , a 1 , c ) | ( M − m k ( a 0 , c )) 2 | k , a 0 , c  ≤ 1 2 sup m ∈B k ( c ) | µ ′′ m ( m, a 1 , c ) | E  ( M − m k ( a 0 , c )) 2 | k , a 0 , c  . 32 Finally , since the conditional la w of M giv en ( A = a 0 , C = c, f M = k ) is supp orted on B k , its range is at most w k , and hence E  ( M − m k ( a 0 , c )) 2 | A = a 0 , C = c, f M = k  = V ar( M | A = a 0 , C = c, f M = k ) ≤ (sup m 1 ,m 2 ∈B k | m 1 − m 2 | ) 2 4 = w 2 k / 4 Com bining with sup m ∈B k | µ ′′ m ( m, a 1 , c ) | ≤ L ( c ) yields | R k ( c ) | ≤ L ( c ) w 2 k / 8 . F urthermore, using the fact that P K k =1 g k ( a 0 , c ) = 1 , w e obtain | e ∆ h ( Q )( c ) | ≤ K X k =1 | R k ( c ) | g k ( a 0 , c ) ≤ L ( c ) 8 K X k =1 w 2 k g k ( a 0 , c ) ≤ L ( c ) 8 w 2 max ,K , whic h implies e ∆ h ( Q )( c ) = O ( w 2 max ,K ) . B.5 Lemma 5.1 By deﬁnition: e θ h,b ( Q )( c ) = K X k =1 µ b,k ( a 1 , c ) g k ( a 0 , c ) , e θ h ( Q )( c ) = K X k =1 µ  m k ( a 0 , c ) , a 1 , c  g k ( a 0 , c ) , so e ∆ s h,b ( Q )( c ) = e θ h,b ( Q )( c ) − e θ h ( Q )( c ) = K X k =1 n µ b,k ( a 1 , c ) − µ  m k ( a 0 , c ) , a 1 , c o g k ( a 0 , c ) , whic h establishes the ﬁrst display . It remains to show that eac h term in braces is of order b 2 uniformly in k , c under the stated conditions. Fix k and c , µ b,k ( a 1 , c ) = E n µ ( m, a 1 , c ) ω b,k ( M | a 1 , c ) | A = a 1 , C = c o = E n µ ( m, a 1 , c ) K b ( M − m k ( a 0 , c )) | A = a 1 , C = c o E n K b ( M − m k ( a 0 , c )) | A = a 1 , C = c o = N b D b , 33 where N b = Z µ ( m, a 1 , c ) K b ( m − m k ( a 0 , c )) f M | A,C ( m | a 1 , c ) dm, D b = Z K b ( m − m k ( a 0 , c )) f M | A,C ( m | a 1 , c ) dm. Mak e the change of v ariables m = m k ( a 0 , c ) + bu , so that dm = b du and K b ( m − m k ( a 0 , c )) = 1 b K  m − m k ( a 0 ,c ) b  = 1 b K ( u ) . Then N b = Z µ ( m k ( a 0 , c ) + bu, a 1 , c ) K b  m k ( a 0 , c ) + bu − m k ( a 0 , c )  f M | A,C ( m k ( a 0 , c ) + bu | a 1 , c ) b du = Z µ ( m k ( a 0 , c ) + bu, a 1 , c ) f M | A,C ( m k ( a 0 , c ) + bu | a 1 , c ) K ( u ) du, D b = Z f M | A,C ( m k ( a 0 , c ) + bu | a 1 , c ) K ( u ) du. By assumption (i), µ ( · ) is t wice contin uously diﬀeren tiable in a neighborho o d of m k ( a 0 , c ) and its second deriv ative is uniformly b ounded; similarly , assumption (iii) ensures that f M | A,C ( · ) is con tinuous and b ounded aw a y from zero in a neighborho o d of m k ( a 0 , c ) . Applying a second order T aylor expansion around m k ( a 0 , c ) , µ ( m k ( a 0 , c ) + bu, a 1 , c ) = µ ( m k ( a 0 , c ) , a 1 , c ) + µ ′ m ( m k ( a 0 , c ) , a 1 , c ) bu + 1 2 µ ′′ m ( f M 1 , a 1 , c ) b 2 u 2 , f M | A,C ( m k ( a 0 , c ) + bu | a 1 , c ) = f M | A,C ( m k ( a 0 , c ) | a 1 , c ) + f ′ ( m k ( a 0 , c ) | a 1 , c ) bu + 1 2 f ′′ ( f M 2 | a 1 , c ) b 2 u 2 , for some f M 1 , f M 2 b et w een m k ( a 0 , c ) and m k ( a 0 , c ) + bu . Multiplying these expansions and using that the second deriv atives are uniformly b ounded yields µ ( m k ( a 0 , c ) + bu, a 1 , c ) f M | A,C ( m k ( a 0 , c ) + bu | a 1 , c ) = µ ( m k ( a 0 , c ) , a 1 , c ) f M | A,C ( m k ( a 0 , c ) | a 1 , c ) + n µ ′ m ( m k ( a 0 , c ) , a 1 , c ) f M | A,C ( m k ( a 0 , c ) | a 1 , c ) + µ ( m k ( a 0 , c ) , a 1 , c ) f ′ ( m k ( a 0 , c ) | a 1 , c ) o bu + O ( b 2 u 2 ) , where the O ( b 2 u 2 ) term is uniform in u on the support of K . 34 Substituting in to the expression for N b and using assumption (ii) on the k ernel, Z K ( u ) du = 1 , Z u K ( u ) du = 0 , Z u 2 K ( u ) du < ∞ , w e obtain N b = Z h µ ( m k ( a 0 , c ) , a 1 , c ) f M | A,C ( m k ( a 0 , c ) | a 1 , c ) + n µ ′ m ( m k ( a 0 , c ) , a 1 , c ) f M | A,C ( m k ( a 0 , c ) | a 1 , c ) + µ ( m k ( a 0 , c ) , a 1 , c ) f ′ ( m k ( a 0 , c ) | a 1 , c ) o bu + O ( b 2 u 2 ) i K ( u ) du = µ ( m k ( a 0 , c ) , a 1 , c ) f M | A,C ( m k ( a 0 , c ) | a 1 , c ) + O ( b 2 ) , since the term prop ortional to bu in tegrates to zero by symmetry of K , and the O ( b 2 u 2 ) term in tegrates to O ( b 2 ) b ecause R u 2 K ( u ) du < ∞ . A similar argumen t for D b giv es D b = Z h f M | A,C ( m k ( a 0 , c ) | a 1 , c ) + f ′ ( m k ( a 0 , c ) | a 1 , c ) bu + O ( b 2 u 2 ) i K ( u ) du = f M | A,C ( m k ( a 0 , c ) | a 1 , c ) + O ( b 2 ) . Assumption (iii) ensures that f M | A,C ( m k ( a 0 , c ) | a 1 , c ) is b ounded a wa y from zero, so for b small enough D b is also b ounded a wa y from zero and we can write µ b,k ( a 1 , c ) = N b D b = µ ( m k ( a 0 , c ) , a 1 , c ) f M | A,C ( m k ( a 0 , c ) | a 1 , c ) + O ( b 2 ) f M | A,C ( m k ( a 0 , c ) | a 1 , c ) + O ( b 2 ) = µ ( m k ( a 0 , c ) , a 1 , c ) + O ( b 2 ) , where the O ( b 2 ) term is uniform in k and c under the stated uniform b oundedness conditions. Therefore, for eac h k , µ b,k ( a 1 , c ) − µ  m k ( a 0 , c ) , a 1 , c  = O ( b 2 ) . 35 Plugging this in to the expression for e ∆ s h,b ( Q )( c ) and using that P K k =1 g k ( a 0 , c ) = 1 yields    e ∆ s h,b ( Q )( c )    ≤ K X k =1    µ b,k ( a 1 , c ) − µ  m k ( a 0 , c ) , a 1 , c     g k ( a 0 , c ) ≤ C ∗ b 2 K X k =1 g k ( a 0 , c ) = C ∗ b 2 , for some constant C ∗ < ∞ not dep ending on b . Hence e ∆ s h,b ( Q )( c ) = O ( b 2 ) as b → 0 , as claimed. F urthermore, we can write e ∆ h,b ( Q )( c ) : = e θ h,b ( Q )( c ) − θ ( Q )( c ) = e θ h ( Q )( c ) − θ ( Q )( c ) | {z } coarsening error + e θ h,b ( Q )( c ) − e θ h ( Q )( c ) | {z } smoothing error = O  w 2 max ,K + b 2  . B.6 Theorem 5.2 W e derive the EIF for the smo othed mediation functional e ψ h,b ( Q ) = E   K X k =1 µ b,k ( a 1 , C ) g k ( a 0 , C )   , (42) where µ b,k ( a 1 , c ) = E h Y ω b,k ( M | a 1 , c )    A = a 1 , C = c i , g k ( a 0 , c ) = p ( f M = k | A = a 0 , C = c ) . (43) Recall that ω b,k ( m | a 1 , c ) = K b  m − m k ( a 0 , c )  E  K b  M − m k ( a 0 , c )  | A = a 1 , C = c  , K b ( u ) = b − 1 K ( u/b ) . (44) The functional e ψ h,b ( Q ) dep ends on Q in tw o distinct w a ys: (i) through the conditional laws deﬁning g k ( a 0 , c ) and µ b,k ( a 1 , c ) , (ii) through the nuisance center m k ( a 0 , c ) that app ears inside the w eight ω b,k . Along a regular parametric submo del Q ε with score S , the pathwise deriv ative decomp oses in to 36 t wo con tributions, d dε e ψ h,b ( Q ε )      ε =0 = d dε e ψ h,b  Q ε ; { m k }       ε =0 | {z } v ary Q holding { m k } ﬁxed + K X k =1 ∂ e ψ h,b ∂ m k       Q d dε m k ( Q ε )      ε =0 | {z } chain rule through m k . W e compute these tw o contributions separately . P art (I): Deriv ative with resp ect to the conditional laws (holding m k ﬁxed). In this ﬁrst step we treat the function m 7→ ω b,k ( m | a 1 , c ) as ﬁxed, so that µ b,k ( a 1 , c ) = E { Y ω b,k ( M | a 1 , c ) | A = a 1 , C = c } is a standard regression functional; the dep endence of the normalization in ( 44 ) on the conditional law is accounted for when w e later add the c hain-rule correction through m k . Let Z k ≡ Z k ( O ) : = Y ω b,k ( M | a 1 , C ) . Then µ b,k ( a 1 , c ) = E ( Z k | A = a 1 , C = c ) . F or ﬁxed ω b,k , the EIF for the regression functional c 7→ E ( Z k | A = a 1 , C = c ) is D ﬁx µ b,k ( O ) = I ( A = a 1 ) π ( a 1 | C )  Z k − µ b,k ( a 1 , C )  . = I ( A = a 1 ) π ( a 1 | C )  Y ω b,k ( M | a 1 , C ) − µ b,k ( a 1 , C )  . (45) This corresponds to the deriv ative of e ψ h,b ( Q ) with respect to the conditional distribution argument in the m ultiv ariate chain rule. F or g k ( a 0 , c ) = E ( I ( f M = k ) | A = a 0 , C = c ) , the EIF is D g k ( O ) = I ( A = a 0 ) π ( a 0 | C )  I ( f M = k ) − g k ( a 0 , C )  . (46) Since e θ h,b ( C ) = P k g k ( a 0 , C ) µ b,k ( a 1 , C ) is a pro duct of tw o regression functionals, its deriv ative with resp ect to the conditional laws follo ws from the usual pro duct rule applied p oint wise in C . The EIF for e ψ h,b ( Q ) = E ( e θ h,b ( Q )( C )) under ﬁxed ω b,k is e ϕ ﬁxed h,b ( Q ) = K X k =1 g k ( a 0 , C ) D ﬁx µ b,k ( O ) + K X k =1 µ b,k ( a 1 , C ) D g k ( O ) +  e θ h,b ( Q )( C ) − e ψ h,b ( Q )  . (47) 37 Substituting ( 45 ) and ( 46 ) yields the explicit display ed form ula. P art (I I): Deriv ative with resp ect to the nuisance cen ters m k ( a 0 , c ) . No w consider the actual w eight in ( 44 ), whic h dep ends on the unkno wn cen ter m k ( a 0 , c ) = E ( M | A = a 0 , C = c, f M = k ) and on the normalizing denominator E n K b  M − m k ( a 0 , c )  | A = a 1 , C = c o . This induces an additional EIF comp onen t through the path wise deriv ative of µ b,k ( a 1 , c ) with resp ect to the cen ter m k ( a 0 , c ) . The EIF for the conditional mean m k ( a 0 , C ) = E ( M | A = a 0 , C, f M = k ) is D m k ( O ) = I ( A = a 0 ) I ( f M = k ) π ( a 0 | C ) g k ( a 0 , C )  M − m k ( a 0 , C )  . (48) T o compute the second term in the multiv ariate chain rule, we diﬀeren tiate µ b,k ( a 1 , c ) with respect to its second argument m k ( a 0 , c ) while holding the conditional law of ( Y , M ) | ( A = a 1 , C = c ) ﬁxed. W riting µ b,k ( a 1 , c ) = N k ( c ) /D k ( c ) with N k ( c ) = E n Y K b  M − m k ( a 0 , c )  | A = a 1 , C = c o , D k ( c ) = E n K b  M − m k ( a 0 , c )  | A = a 1 , C = c o , and applying the quotient rule yields (assuming the usual regularity conditions that justify diﬀeren tiation under the conditional exp ectation) α k ( c ) : = ∂ ∂ m µ b,k ( a 1 , c ) = − E h { Y − µ b,k ( a 1 , c ) } K ′ b  M − m k ( a 0 , c )  | A = a 1 , C = c i E h K b  M − m k ( a 0 , c )  | A = a 1 , C = c i . (49) 38 By the m ultiv ariate chain rule, the contribution of the nuisance m k to the EIF is e ϕ ω h,b ( Q )( O ) = K X k =1 g k ( a 0 , C ) α k ( C ) D m k ( O ) . (50) Substituting ( 48 ) into ( 50 ) simpliﬁes this term to e ϕ ω h,b ( Q )( O ) = K X k =1 α k ( C ) I ( A = a 0 ) I ( f M = k ) π ( a 0 | C )  M − m k ( a 0 , C )  . (51) Com bining the deriv ative with resp ect to the conditional laws and the deriv ativ e with resp ect to the n uisance centers yields the full eﬃcient inﬂuence function e ϕ h,b ( Q )( O ) = e ϕ ﬁxed h,b ( Q )( O ) + e ϕ ω h,b ( Q )( O ) , (52) where e ϕ ﬁxed h,b ( Q )( O ) is giv en in ( 47 ) and e ϕ ω h,b ( Q )( O ) is giv en in ( 51 ). C Details in sim ulations In this section, w e presen t a detailed explanation of the missp eciﬁcation scenarios implemented in Simulation #2. F or eac h of the estimators, the corresp onding collection of nuisance functions Q is giv en as follows: (i) ψ h ( b Q ) : { µ k , g k } ; (ii) e ψ h ( b Q ) : { µ, m k , g k } ; (iii) ψ + h ( b Q ) : { µ k , g k , π } ; (iv) e ψ + h 1 ( b Q ) : { µ, m k , g k , π } ; (v) e ψ + h 2 ( b Q ) : { µ, m k , g k , g , π } . T able 2 presents the sp eciﬁcation and missp eciﬁcation conﬁgurations of the n uisance functions. The plug-in estimator ψ h ( b Q ) coincides with the sp eciﬁcation under Condition 3 as well as the fully correctly sp eciﬁed setting; the same holds for e ψ h ( b Q ) . Under the DGP describ ed in Section 6 , correctly sp eciﬁed n uisance functions can b e consistently estimated using GLMs. Missp eciﬁcation is in tro duced b y constructing mo dels with nonlinear and irrelev ant functional forms, including exp onen tial, cosine, in verse, and higher-order p olynomial terms. 39 T able 2: Missp eciﬁcation scenarios for n uisance functions in Simulation #2. A chec k mark ( ✓ ) indicates correct sp eciﬁcation, while a cross mark ( ✗ ) denotes model missp eciﬁcation. F unction Condition 1 Condition 2 Condition 3 Correct F alse m k ✓ ✓ ✓ ✓ ✗ µ ✗ ✓ ✓ ✓ ✗ µ k ✗ ✓ ✓ ✓ ✗ g ✓ ✗ ✓ ✓ ✗ g k ✓ ✗ ✓ ✓ ✗ π ✓ ✓ ✗ ✓ ✗ 40

Coarsening Bias from Variable Discretization in Causal Functionals

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment