Tail-Calibrated Estimation of Extreme Quantile Treatment Effects
Extreme quantile treatment effects (eQTEs) measure the causal impact of a treatment on the tails of an outcome distribution and are central for studying rare, high-impact events. Standard QTE methods often fail in extreme regimes due to data sparsity…
Authors: Mengran Li, Daniela Castro-Camilo
T ail-Calibrated Estimation of Extreme Quan tile T reatmen t Effects Mengran Li Daniela Castro-Camilo Scho ol of Mathematics and Statistics, University of Glasgow, UK. Abstract Extreme quan tile treatmen t effects (eQTEs) measure the causal impact of a treatmen t on the tails of an outcome distribution and are central for studying rare, high-impact even ts. Standard QTE metho ds often fail in extreme regimes due to data sparsity , while existing eQTE metho ds rely on restrictiv e tail assumptions or on interior-quan tile theory . W e propose the T ail-Calibrated Inv erse Estimating Equation (TIEE) framework, whic h com bines information across quantile levels and anc hors the tail using extreme v alue mo dels within a unified estimating equation approac h. W e establish asymptotic properties of the resulting estimator and ev aluate its p erformance through sim ulation under differen t tail b ehaviours and mo del missp ecifications. An application to extreme precipitation in the A ustrian Alps illustrates ho w TIEE enables observ ational causal attribution for v ery rare ev en ts under anthropogenic warming. More broadly , the proposed framew ork estab- lishes a new foundation for causal inference on rare, high-impact outcomes, with relev ance across en vironmental risk, economics, and public health. Keyw ords : Causal inference; Extreme quan tile treatment effect; Extreme v alue theory; Inv erse estimating equations. 1 In tro duction Estimating causal effects at extreme quantiles is cen tral in applications where rare, high-impact outcomes drive risk and decision-making. Examples arise in climate attribution, financial losses, en vironmental hazards and health extremes, where the in terest lies in c hanges in the severit y of the most extreme even ts rather than in av erage effects. A natural framew ork for studying distributional causal effects is that of quan tile treatmen t effects (QTEs), widely explored in classical statistics and econometrics ( Doksum , 1974 ; Lehmann and D’Abrera , 1975 ). Imp ortant examples include that of Firp o ( 2007 ), who developed semiparametric estimation of QTEs under unconfoundedness, and ( Ma et al. , 2023 ), who extended Firp o’s work to more complex data stru ctures and missingness. Zhang ( 2018a ) formalised the conditions under whic h these t yp es of estimators retain their desired ∗ m.li.3@researc h.gla.ac.uk 1 asymptotic prop erties and concluded that asymptotic normalit y can b e reco v ered for in termediate quan tiles, but inference b ecomes ill-posed as the target quantile mo ves further into the tail. Recently , Cheng and Li ( 2024 ) prop osed a metho d for causal quan tile estimation via a general inv erse estimating equation framework. Their approach pro vides a unifying p ersp ective on quan tile identification, but their theory primarily applies to in terior quantiles. F ormal causal analysis of tail quantiles remains, therefore, metho dologically challenging. Recen t approaches to causal inference at extreme quantile lev els hav e turned to extreme v alue theory (EVT) to facilitate estimation and inference for extreme quantiles b eyond the range of observ ed v alues. F or heavy-tailed distributions, the tail quan tile function follo ws a p o wer-la w represen tation gov erned by the extreme v alue index (EVI), whic h is typically estimated using classical pro cedures suc h as the Hill estimator ( Hill , 1975 ). Within this framew ork, W ang et al. ( 2012 ) and Xu et al. ( 2022 ) extended EVT-based metho ds to conditional and regression settings, while Deuber et al. ( 2024 ) w ere among the first to explicitly link EVT with causal inference for quan tile treatmen t effects (QTEs) in observ ational studies. Although this w ork constitutes a significan t adv ancement, the approac h relies on heavy-tailed (F réc het-domain) assumptions and emplo ys a t w o-step estimation pro cedure follow ed b y extrap olation. T aken together, the literature rev eals that causal iden tification and extreme v alue asymptotics ha ve been developed largely in isolation, with existing methods treating confounding adjustment and tail extrap olation sequentially . Consequently , no unified approac h currently deliv ers formally iden tified and statistically v alid inference for causal effects at extreme quantiles. T o address this, w e in tro duce the T ail-Calibrated In v erse Estimating Equation (TIEE) framew ork, a new inferential paradigm for causal analysis in the tails. TIEE in tegrates causal identification and extreme v alue mo delling within a single estimating equation, aggregating information across tail quan tile lev els to yield a well-defined and stable target ev en in regimes where conv en tional metho ds fail. Our contributions are fourfold. First, w e prop ose a general class of tail-calibrated estimating equations that enables iden tification of extreme quantile treatmen t effects without restrictive tail assumptions. Second, we dev elop a computationally stable estimator that lev erages EVT to inform extrap olation b ey ond observ ed extremes. Third, we establish identification, consistency , and asymptotic normality , with v ariance expressions accounting for b oth causal and tail uncertain ty . Finally , through extensiv e simulations and an application to extreme precipitation in the A ustrian Alps, we demonstrate substan tial gains in efficiency and robustness ov er existing metho ds. The remainder of the pap er is organised as follo ws. Section 2 reviews causal inference, extreme quan tiles, and the IEE framew ork. Section 3 in tro duces TIEE and its in tegration with causal signals and EVT. Section 4 presen ts theoretical results and inference pro cedures. Section 5 reports sim ulations, and Section 6 provides an application to Austrian precipitation data. Section 7 concludes. A note on notation. Throughout the pap er, D ∈ { 0 , 1 } denotes a binary treatmen t indicator, with d ∈ { 0 , 1 } its generic v alue, and Y d the corresp onding p otential outcome (subscript omitted when clear). The v ector X X X denotes cov ariates, and the data consist of indep enden t copies W ( i ) = 2 ( Y ( i ) , D ( i ) , X X X ( i )) , i = 1 , . . . , n , of W = ( Y , D , X X X ) . W e write F Y ,d ( · ) for the marginal distribution of Y under treatment d . Quantile lev els are denoted b y τ or p ∈ (0 , 1) , and treated as fixed. The corresp onding quan tile is Q Y d ( τ ) = inf { y : F Y ,d ( y ) ≥ τ } , with shorthand θ d,τ = Q Y d ( τ ) . When con venien t, w e also write q ( p ) for a generic quantile function. 2 Bac kground Let Y 1 ( i ) − Y 0 ( i ) denote the causal effect of the treatment for observ ation i . T o iden tify and estimate treatmen t effects from observ ational data, w e require the follo wing standard assumptions. Assumption 1 (Iden tifiabilit y conditions) . W e assume: (i) Consistency, so the observe d outc ome e quals the c orr esp onding p otential outc ome; (ii) SUTV A, ensuring no interfer enc e b etwe en units and wel l-define d tr e atment levels; (iii) Unc onfounde dness, ( Y 0 , Y 1 ) ⊥ D | X X X , me aning that the outc omes ar e indep endent of the tr e atment given a set of c onfounders; and (iv) Overlap, 0 < P( D = 1 | X X X ) < 1 for al l X X X in the supp ort, me aning b oth tr e atment states must b e p ossible for every c ovariate p attern. Whilst the a v erage treatmen t effect (A TE), defined as E { Y 1 − Y 0 } , remains the most commonly rep orted causal estimand, it only c haracterises the lo cation shift b et ween treatmen t distributions. The quantile treatmen t effect (QTE) pro vides a more comprehensiv e c haracterisation of distributional differences and is defined as δ ( τ ) = Q Y 1 ( τ ) − Q Y 0 ( τ ) = θ 1 ,τ − θ 0 ,τ τ ∈ [0 , 1] . (1) 2.1 Existing approaches to quan tile treatmen t effect estimation Without imp osing distributional assumptions, Firp o ( 2007 ) proposed a QTE estimator for τ ∈ (0 , 1) obtained by rew eigh ting the quan tile loss function as b Q Y d ( τ ) = arg min q ∈ R n X i =1 w d,i ρ τ ( Y d ( i ) − q ) , where ρ τ ( u ) = u ( τ − 1 ( u < 0)) is the chec k function and w d,i are inv erse probability w eights (IPW s) constructed from the prop ensity score, defined as the conditional probability of receiving treatmen t giv en the observ ed cov ariates. An alternativ e approach estimates QTE via conditional distributions, as in Zhang et al. ( 2012 ). Sp ecifically , the conditional distribution of the p oten tial outcome giv en co v ariates is mo delled parametrically , and the marginal distribution under treatment is obtained b y av eraging the fitted conditional distribution ov er the empirical distribution of the cov ariates. The desired quantile is then obtained by in v erting this estimated marginal distribution. In Zhang et al. ( 2012 ), this approach is implemen ted b y assuming a Gaussian mo del for the outcome after a Bo x–Cox transformation. Ma et al. ( 2023 ) prop ose estimating QTEs b y linking marginal quantiles to conditional quantile regression. The approach assumes a linear quantile regression model for the conditional outcome 3 distribution given co v ariates. The marginal quan tile is then c haracterised through a moment condition that connects the marginal distribution with the conditional distribution via the law of iterated exp ectations. In practice, the conditional exp ectation is appro ximated by in tegrating ov er conditional quantile lev els obtained from the quan tile regression mo del. The resulting estimating equation is solv ed after a veraging ov er the empirical distribution of the cov ariates, with the integration t ypically restricted to a trimmed quantile range to av oid instability near the extremes. This approac h pro duces estimators with smaller standard errors than those of Firpo ( 2007 ). 2.2 Extreme quantile treatmen t effects Whilst QTE estimators pro vide a comprehensive ov erview of treatment effect distributions, standard asymptotic prop erties no longer hold when τ approac hes 0 or 1. Considering the lo wer tail an d follo wing Zhang ( 2018a ), we distinguish three asymptotic regimes for extreme quan tile lev els τ n → 0 , defined by ho w fast the effective sample size nτ n shrinks: in termediate ( τ n → 0 and nτ n → ∞ ), mo derately extreme ( τ n → 0 and nτ n → d > 0 ), and extreme ( τ n → 0 and nτ n → 0 ). The first regime still admits Gaussian limits for suitably normalised estimators, whereas in the latter t wo regimes, empirical quan tiles b ecome unstable and classical ro ot- n theory fails. Zhang ( 2018a ) established that Firp o’s ( 2007 ) estimator is asymptotically normal for intermediate quan tiles and dev elop ed b -out-of- n b o otstrap procedures for moderately extreme cases. How ever, empirical quan tile-based metho ds b ecome unreliable when nτ n → 0 , as few or no observ ations fall in the tail region. In the supplemen tary materials ( Zhang , 2018b ), the author also prop oses a Pickands-t yp e estimator for the extreme v alue indices (EVIs) of the p otential outcome distributions, adapted to the causal inference setting in whic h p otential outcomes are only partially observed. The EVI quan tifies the hea viness or ligh tness of the tail of a distribution. Specifically , in EVT, one typically assumes an i.i.d. sample Y (1) . . . , Y ( n ) from distribution G that is in the max-domain of attraction of a generalised extreme v alue distribution, meaning there exist sequences of constants { a n ≥ 0 } and { b n } such that lim n →∞ G n ( a n y + b n ) = exp n − (1 + γ y ) − 1 /γ o , (2) where γ is the EVI. Heavy-tailed distributions correspond to γ > 0 , light-tailed to γ = 0 , and b ounded tails to γ < 0 . In Zhang ( 2018a ), if ˆ q d ( τ ) denote the estimator of the τ -th quantile of Y d , the EVI γ d is estimated as ˆ γ d = − R X r =1 w r log( ˆ q d ( ml r τ n ) − ˆ q d ( l r τ n )) − log ˆ q d ( ml r − 1 τ n ) − ˆ q d ( l r − 1 τ n ) log l , (3) where l > 0 is a spacing parameter, m > 1 , { w r } R r =1 are weigh ts satisf ying P R r =1 w r = 1 , and τ n is an intermediate quan tile index. T o enable estimation b eyond the range of observ ed data, Bh uyan et al. ( 2023 ) reconstruct the counterfactual distribution via a bulk–tail mo del for conditional quantiles with a data-driven threshold τ u , follo w ed b y in tegration o v er co v ariates and inv ersion. This requires the correct 4 sp ecification of the conditional quan tile pro cess and rep eated tail mo delling across cov ariates. By con trast, Deub er et al. ( 2024 ) integrate extreme v alue theory with causal inference via causal Hill estimators for the EVI. Sp ecifically , they prop osed to use b γ H 0 := 1 nτ n n X i =1 (log Y ( i ) − log b q 0 (1 − τ n )) 1 − D ( i ) 1 − b π ( X i ) 1 Y ( i ) > b q 0 (1 − τ n ) , b γ H 1 := 1 nτ n n X i =1 (log Y ( i ) − log b q 1 (1 − τ n )) D ( i ) b π ( X i ) 1 Y ( i ) > b q 1 (1 − τ n ) , where b π ( X i ) denotes the estimated prop ensit y score. Extreme quantiles are then obtained b y extrap olating intermediate quan tiles, b q d (1 − p n ) = b q d (1 − τ n ) τ n p n b γ H d , (4) where τ n → 0 , nτ n → ∞ and p n → 0 , np n → d ≥ 0 . This extrap olation is v alid only for hea vy-tailed distributions ( γ > 0 ), limiting the approach to that setting. 2.3 The inv erse estimating equation framew ork The in v erse estimating equation (IEE) framew ork in tro duced b y Cheng and Li ( 2024 ) pro vides a general strategy for estimating quantiles of p oten tial outcome distributions b y exploiting the defining relationship betw een quantiles and threshold-transformed means. F or an y threshold θ ∈ R , τ d ( θ ) = E [ 1 { Y d ≤ θ } ] = Pr( Y d ≤ θ ) , (5) whic h is the distribution function of Y d ev aluated at θ . Quantiles are defined as the in verse of τ d ( θ ) . Under Assumption 1 and the additional requirement of p oin t iden tification in Cheng and Li ( 2024 ), τ d ( θ ) is iden tified from observ ed data through a signal function s ( W , θ , γ ) satisfying E [ s d ( W , θ , ζ )] = τ d ( θ ) , for all θ ∈ R , where W = ( Y , D , X X X ) and ζ denotes nuisance parameters. Defining the estimating function g d ( W , τ , θ , ζ ) = s d ( W , θ , ζ ) − τ , the identit y E [ g d ( W , τ , θ , ζ )] = 0 holds if and only if τ = τ d ( θ ) . Quantiles therefore follow by in v ersion. Let θ ⋆ d,q = Q Y d ( q ) denote the q -quantile of Y d . By definition, E [ g d ( W , q , θ ⋆ d,q , ζ )] = 0 , (6) so the quantile is c haracterised as the ro ot of the estimating equation, whic h in practice is solved using its empirical analogue. F or in terior quan tiles, this in v ersion is stable and standard asymptotics apply . Near the extremes, how ev er, data b ecome sparse, and the density may v anish, so small p erturbations in the estimating equation can cause large c hanges in the ro ot, making the inv ersion ill-conditioned and inv alidating classical asymptotics. 5 3 T ail-calibrated in v erse estimating equation (TIEE) framew ork This section introduces our framew ork for estimating extreme quan tile treatment effects b y reformu- lating the in verse estimating equation as an in tegrated moment condition. W e imp ose the following regularit y conditions. Assumption 2 (Regularit y conditions on potential outcomes) . F or e ach d ∈ { 0 , 1 } : (i) Y d is c ontinuously distribute d with density f Y ,d ; (ii) f Y ,d ( y ) is monotone in the upp er tail b eyond some thr eshold y 0 ; (iii) f Y ,d is b ounde d away fr om zer o on any c omp act subset of the supp ort; (iv) F Y ,d lies in the maximum domain of attr action of an extr eme value distribution with index ξ d (se e ( 2 ) ). Assumption 2 adapts standard conditions ( Firp o , 2007 ; Cheng and Li , 2024 ) to extreme-tail settings, relaxing the requirement that the densit y b e bounded a wa y from zero and providing a semi- parametric c haracterisation of tail b ehaviour that supp orts consisten t extreme v alue appro ximations. 3.1 F rom in v erse estimating equations to tail calibration The classical in v erse estimating equation (IEE) framework identifies quantiles via the mapping θ 7→ τ d ( θ ) = E [ 1 { Y d ≤ θ } ] , which is strictly increasing, with θ d,τ defined by τ d ( θ ) = τ . Estimation pro ceeds b y constructing a zero-mean estimating equation at lev el τ and in verting it p oint wise. This approac h relies on local regularit y . Indeed, the distribution function must v ary sufficien tly near θ d,τ and enough data m ust b e av ailable in its neighbourho o d. These conditions fail in the tail, where the effectiv e sample size collapses and τ d ( θ ) b ecomes nearly flat, rendering the inv ersion ill-conditioned and unstable. The TIEE framework addresses this limitation b y replacing p oin twise in v ersion with a distribution-level represen tation that aggregates information across quantile lev els. Sp ecifically , note that we can write ( 5 ) as τ d ( θ ) = Z 1 0 1 { Q Y d ( p ) ≤ θ } dp, θ ∈ R , whic h expresses the distribution function as the measure of quantile lev els whose v alues lie b elow θ . This iden tit y holds indep endently of lo cal densit y behaviour and do es not rely on point wise in version. Motiv ated b y this represen tation, we define the random integrated signal S d ( W , θ ) and its p opulation counterpart S d ( θ ) as S d ( W , θ ) := Z 1 0 s TIEE ,d ( W , θ , ζ , p ) dp, S d ( θ ) := E [ S d ( W , θ )] = Z 1 0 E [ s TIEE ,d ( W , θ , ζ , p )] dp, (7) where ζ denotes n uisance parameters and the last equalit y in ( 7 ) is true under mild integrabilit y conditions (e.g., E [ R 1 0 | s TIEE ,d ( W , θ , ζ , p ) | dp ] < ∞ ). The iden tifying requiremen t of the TIEE 6 framew ork is the inte gr ate d unbiase dness c ondition S d ( θ ) = Z 1 0 E [ s TIEE ,d ( W , θ , ζ , p )] dp = τ d ( θ ) for all θ ∈ R . (8) Under the condition in ( 8 ) , identification is now ac hiev ed through a global probability balance rather than a local condition at a single quan tile lev el. The extreme quantile θ d,τ is therefore c haracterised as the unique solution to the integrated estimating equation S d ( θ ) = τ or equiv alently E [ g TIEE ,d ( W , θ , τ , ζ )] = 0 , (9) with g TIEE ,d ( W , θ , τ , ζ ) = S d ( W , θ ) − τ . This reformulation preserv es the core logic of the IEE framew ork (quan tiles are still obtained as ro ots of estimating equations) while stabilising the in version b y aggregating information across quan tile levels. As a result, the estimating equation remains w ell-defined ev en when empirical information near the target quan tile is sparse. The in tegrated formulation naturally induces a decomp osition into a data-ric h b o dy region and a tail region that requires extrap olation. Indeed, S d ( W , θ ) admits a natural decomp osition into a b o dy comp onen t (integrating ov er p ≤ u ) and a tail component (in tegrating o ver p > u ), for some threshold u ∈ (0 , 1) . W e can then express the condition in ( 9 ) as E [ g TIEE ,d ( W , θ , τ , ζ )] = E Z 1 0 { s TIEE ,d ( W , θ d,τ , ζ , p ) − τ eff } d p = 0 , where τ eff = τ − p u 1 − p u = Pr ( Y d ≤ θ d,τ | Y d > u ) denotes the quan tile lev el relativ e to the tail distribution. 3.2 Causal signal functions The integrated estimating equation in Section 3.1 is agnostic to how the distribution of Y d is identified. Causal iden tification en ters only through the c hoice of signal function, leaving the estimator’s structure unc hanged and allo wing flexibility across causal designs. When the un biasedness condition in ( 8 ) holds, the signal s TIEE reco vers the marginal distribution of Y d up on integration o ver p , and the in tegrated estimating equation in ( 9 ) yields the target quantile. This condition can b e ac hieved through different signal constructions; w e describ e t wo canonical examples relev an t to our applications. Naïv e signal under randomisation. In a randomised experiment, treatmen t assignment is indep enden t of the potential outcomes. In this setting, a v alid signal function is giv en b y the indicator itself, s TIEE ( W , θ , p ) = 1 { Q Y d ( p ) ≤ θ } . (10) This signal trivially satisfies the un biasedness condition in ( 8 ) , but it is not v alid in observ ational studies where treatment assignmen t dep ends on cov ariates. In v erse probabilit y weigh ted signal. In observ ational settings, causal identification can b e ac hieved under unconfoundedness and ov erlap assumptions through in verse probabilit y weigh ting. 7 Let π d ( X X X ; ζ ) = Pr( D = d | X X X ) denote the prop ensit y score. A v alid TIEE signal is then given b y s TIEE ( W , θ , ζ , p ) = 1 { D = d } π d ( X X X ; ζ ) 1 { Q Y d ( p ) ≤ θ } . Under standard causal assumptions, this signal satisfies ( 8 ) and recov ers the marginal distribution of Y d up on integration o v er p . 3.3 T ail mo delling and extrap olation The in tegrated estimating equation requires a mo del-assisted represen tation of the signal in the extreme tail, where data are scarce. When the target quantile exceeds a high threshold u , extrap o- lation b ecomes unav oidable. W e address this by em b edding an extreme v alue mo del directly in to the estimating equation. Specifically , under Assumption 2 , exceedances ab ov e u follo w a generalised P areto distribution, yielding the quantile represen tation Q Y d ( τ ) = u + σ d ξ d " 1 − p u 1 − τ ξ d − 1 # , (11) for Q Y d ( τ ) > u , where p u = Pr ( Y d ≤ u ) . This motiv ates the effectiv e tail probabilit y τ eff = ( τ − p u ) / (1 − p u ) introduced in Section 3.1 . Within the TIEE framew ork, the GPD en ters through the signal rather than as a plug-in estimator. The signal uses empirical indicators in the b o dy and mo del-assisted counterparts in the tail. F or the IPW signal in Section 3.2 , this yields s d ( W , θ , ζ ) = 1 { D = d } π d ( X ; ζ ) 1 { ˜ Y d ≤ θ } , where ˜ Y d is generated from the fitted GPD (e.g. ˜ Y d = Q Y d ( U ) with U ∼ Uniform (0 , 1) or via transformed residuals). This preserves the probabilistic in terpretation while enabling ev aluation b ey ond the observ ed range. In practice, ˜ Y d is obtained via regression mo dels on cov ariates (Sections 5 and 6 ). 3.4 Numerical solution for the TIEE estimator ˆ θ d,τ The TIEE estimator ˆ θ d,τ is defined as the solution to the empirical analogue of the in tegrated estimating equation introduced in Section 3.1 . Specifically , letting Φ d,n ( θ ) = 1 n n X i =1 g TIEE ,d ( W i , θ , ˆ ζ , p ) = 1 n n X i =1 Z 1 0 s TIEE ,d ( W , θ , ζ , p ) dp, (12) 8 the estimator satisfies Φ n ( ˆ θ d,τ ) = 0 . In practice, the in tegral o v er p ∈ [0 , 1] is appro ximated using a grid { p k } K k =0 with K sufficien tly large. The empirical estimating equation is then approximated b y 1 n n X i =1 K X k =1 s TIEE ,d ( W , θ , ζ , p ) ( p k − p k − 1 ) = 0 . (13) W e use K = 800 for n = 1000 and K = 2000 for n = 5000 ; sensitivity to K is assessed in App endix B.2 . Direct ro ot-finding is often unstable due to the non-smo othness induced b y indicator functions, particularly in the tail. F ollo wing Ma et al. ( 2023 ), we instead solve an equiv alen t conv ex optimisation problem, defining an ob jectiv e L d,n ( θ ) whose subgradient is prop ortional to Φ d,n ( θ ) . A con v enient c hoice is L d,n ( θ ) = − n X i =1 K X k =1 ( p k − p k − 1 ) ˜ S d ( W i , θ , ˆ ζ , p k ) + | R ∗ − θ | n X i =1 K X k =1 τ eff ( p k − p k − 1 ) , where ˜ S d ( W , θ , ζ , p k ) is an an tideriv ativ e of the signal function with resp ect to θ at p k , i.e., it satisfies ∂ ∂ θ ˜ S d ( W , θ , ζ , p k ) = s TIEE ,d ( W , θ , ζ , p k ) , and R ∗ is a sufficiently large constant ensuring b oundedness of the optimisation domain. The TIEE estimator is then obtained as b θ d,τ = b Q Y d ( τ ) = arg min θ ∈ Θ L d,n ( θ ) . This con vex L 1 -t yp e form ulation ensures n umerical stability and can b e solved efficien tly , ev en for extreme quan tiles. 4 Theoretical prop erties In this section, w e establish the theoretical foundations of the TIEE framework. Our study centres on the TIEE estimating function g TIEE ,d ( W , θ , τ , ζ ) defined in ( 9 ) , and its asso ciated p opulation in tegrated moment Φ d ( θ ) = E [ g TIEE ,d ( W , θ , τ , ζ )] . Note that by construction, Φ d ( θ ) = S d ( θ ) − τ , where S d ( θ ) = E [ S d ( W , θ )] is the in tegrated signal defined in the second line of ( 7 ) . Thus, Φ d ( θ ) is simply the centred v ersion of S d ( θ ) . W e assume that θ ∈ Θ , where Θ ⊂ R is a compact parameter space. 4.1 Iden tification, uniqueness, consistency and asymptotic normality Iden tification hinges on t w o crucial prop erties of the integrated moment function g TIEE ,d ( W , θ , τ , ζ ) . First, that it correctly reco vers the quantile condition when ev aluated at θ d,τ , and second, that it is strictly monotonic in θ . Theorem 1 (Iden tification and Uniqueness) . L et Y b e a r e al-value d r andom variable with c ontinuous mar ginal distribution function. Supp ose the tar get quantile θ d,τ is uniquely define d by F Y d ( θ d,τ ) = τ for τ ∈ (0 , 1) . A ssume that for every ( W , τ , ζ ) the function g TIEE ,d ( W , θ , τ , ζ ) is me asur able and strictly incr e asing in θ , and that the mapping Φ d ( θ ) = E [ g TIEE ,d ( W , θ , τ , ζ )] is wel l define d on Θ 9 and differ entiable in θ . Then Φ d ( θ d,τ ) = 0 and Φ d ( θ ) is strictly incr e asing in θ . Conse quently, the e quation Φ d ( θ ) = 0 admits the unique solution θ = θ d,τ . Pr o of. By the unbiasedness property of the signal function (see ( 8 )), w e hav e Φ d ( θ d,τ ) = E [ g TIEE ,d ( W , θ d,τ , τ , ζ )] = E Z 1 0 { s ( W , θ d,τ , ζ , p ) − τ eff } d p = 0 . Moreo ver, Φ d ( θ ) is strictly increasing in θ b ecause g TIEE ,d ( W , θ , τ , ζ ) is strictly increasing in θ . Therefore, the equation Φ d ( θ ) = 0 admits at most one ro ot. Since Φ d ( θ d,τ ) = 0 , this ro ot must coincide with θ d,τ . Ha ving established that the true extreme quan tile θ d,τ is uniquely identified as the ro ot of the p opulation integrated moment Φ d ( θ ) = 0 , w e now turn to the con vergence of its empirical coun terpart. The following Lemma is required to prov e consistency , and its pro of is deferred to App endix A . Lemma 1 (Glivenk o–Cantelli prop ert y of the integrated signal) . F or θ ∈ Θ , let S d ( W , θ ) b e define d as in ( 7 ) , with s TIEE ,d define d as in ( 10 ) . Then the class of functions F = S d ( · , θ ) : θ ∈ Θ is Glivenko–Cantel li. Theorem 2 (Consistency) . A ssume the c onditions of The or em 1 . In addition, supp ose that the c ovariate X has c omp act supp ort, that for fixe d p the mapping θ 7→ 1 { Q Y d ( p ) ≤ θ } is monotonic and pie c ewise c onstant, and that the nuisanc e p ar ameter estimates ˆ ζ c onver ge in pr ob ability to ζ . Then, the TIEE estimator ˆ θ d,τ that solves the empiric al estimating e quation Φ d,n ( θ ) = 0 is c onsistent for the true extr eme quantile θ d,τ , i.e., ˆ θ d,τ P − → θ d,τ . Pr o of. Consistency follo ws from the general theory of Z-estimators ( V an der V aart , 2000 , Theo- rem 5.9). Unique identification holds b y Theorem 1 . F or uniform conv ergence, Lemma 1 implies that the class F = { S d ( · , θ ) : θ ∈ Θ } is Gliv enko–Can telli. Since g TIEE ,d ( W , θ , τ ) = S d ( W , θ ) − τ , the class { g TIEE ,d ( · ; θ , τ ) : θ ∈ Θ } is also Glivenk o–Can telli, yielding sup θ ∈ Θ 1 n n X i =1 g TIEE ,d ( W i , θ , τ ) − E [ g TIEE ,d ( W , θ , τ )] P − → 0 . Replacing ζ b y ˆ ζ in tro duces a uniformly o P (1) error, so sup θ ∈ Θ | Φ d,n ( θ ) − Φ d ( θ ) | P − → 0 . Since Φ d has a unique zero at θ d,τ , it follows that ˆ θ d,τ P − → θ d,τ . Theorem 2 establishes that the TIEE estimator is consistent ev en when the target quantile τ is extreme and lies b eyond the range of observed data. This result is non-trivial b ecause it holds despite the v anishing density at θ d,τ , a consequence of the EVT-based regularisation em b edded in the integrated momen t condition. Building on this result, w e derive the asymptotic distribution of the TIEE under suitable regularit y conditions. The complete pro of, including the treatment of n uisance parameter estimation via the functional delta metho d ( V an der V aart , 2000 ), is giv en in App endix A . 10 Theorem 3 (Asymptotic Normality) . A ssume the c onditions of The or ems 1 and 2 . Supp ose further that Φ d ( θ ) is c ontinuously differ entiable in a neighb ourho o d of θ d,τ with Φ ′ d ( θ d,τ ) = 0 , that √ n ( ˆ ζ − ζ ) = O p (1) and Φ d,n ( θ ) is p athwise differ entiable in ζ , and that σ 2 d,τ := V ar [ g TIEE ,d ( W , τ , θ d,τ , ζ )] is finite. Then √ n ( ˆ θ d,τ − θ d,τ ) d − → N (0 , V d,τ ) , V d,τ = σ 2 d,τ [Φ ′ d ( θ d,τ )] 2 . (14) The asymptotic v ariance in ( 14 ) highligh ts several k ey features of the TIEE estimator. First, unlik e empirical quan tiles, it remains finite ev en in extreme regimes where f Y ( θ d,τ ) → 0 as τ → 1 . This reflects the EVT-based regularisation induced b y integrating the estimating equation o v er quan tile levels, whic h stabilises tail inference b y b orro wing strength from neigh b ouring quantiles. Second, the construction is also compatible with classical extreme v alue limits, yielding asymptotic normalit y under in termediate sequence regimes where empirical quan tiles typically fail. Third, the v ariance V d,τ dep ends on the deriv ative of the p opulation moment function, whic h, for the signal functions in Section 3.2 , coincides with the densit y of the p otential outcome at the target quantile, Φ ′ d ( θ ) = f Y d ( θ ) . Finally , σ 2 d,τ also captures uncertaint y from estimating nuisance parameters ζ . When the prop ensity score is correctly sp ecified, the IPW-based TIEE estimator is asymptotically efficien t. 4.2 Inference for the extreme quan tile treatment effect δ ( τ ) Based on the results from Section 4.1 , w e derive an expression for the asymptotic distribution of the extreme quantile treatmen t effect δ ( τ ) = θ 1 ,τ − θ 0 ,τ (as defined in ( 1 )). Standard Z-estimation argumen ts yield the joint asymptotic normalit y √ n ˆ θ 1 ,τ − θ 1 ,τ ˆ θ 0 ,τ − θ 0 ,τ ! d − → N 0 , D − 1 Σ Φ D − 1 , where D = diag Φ ′ 1 ( θ 1 ,τ ) , Φ ′ 0 ( θ 0 ,τ ) and Σ Φ = V ar ( g 1 ( W , θ 1 ,τ ) , g 0 ( W , θ 0 ,τ )) ⊤ with comp onentes Σ dd ′ . It follows that √ n ˆ δ ( τ ) − δ ( τ ) d − → N 0 , σ 2 δ , where σ 2 δ = Σ 11 [Φ ′ 1 ( θ 1 ,τ )] 2 + Σ 00 [Φ ′ 0 ( θ 0 ,τ )] 2 − 2Σ 10 Φ ′ 1 ( θ 1 ,τ )Φ ′ 0 ( θ 0 ,τ ) . (15) A consistent estimator ˆ σ 2 δ is obtained via sample analogues (see Appendix A.3 ). The terms Φ ′ d ( θ d,τ ) can b e estimated via numerical differen tiation or density estimation. 5 Sim ulation Study T o ev aluate the finite-sample p erformance of the prop osed TIEE framew ork, w e conduct a simulation study with three ob jectiv es: (i) to compare the in verse probability weigh ted TIEE estimator (TIEE-IPW) with existing metho ds, including the Zhang–Firp o estimator ( Zhang , 2018a ), the Hill causal estimator ( Deub er et al. , 2024 ), and the Pickands estimator ( Zhang , 2018b ); (ii) to examine 11 p erformance across different tail regimes (hea vy- v ersus light-tailed distributions); and (iii) to assess robustness of TIEE-IPW under prop ensity score missp ecification. W e consider sample sizes n ∈ { 1000 , 5000 } and rep ort bias and mean squared error (MSE) based on 1000 Monte Carlo replications. The extreme quantile lev el is defined through the tail probabilit y 1 − τ n , allowing us to study three regimes commonly encoun tered in extreme quantile estimation. Specifically , we set (1 − τ n ) ∈ { 5 /n, 1 /n, 5 / ( n log n ) } , corresp onding to intermediate ( n (1 − τ n ) → ∞ ), mo derately extreme ( n (1 − τ n ) → d > 0 ), and v ery extreme ( n (1 − τ n ) → 0 ) regimes. 5.1 Data-Generating Pro cess The data-generating process is identical across scenarios. W e generate n i.i.d. observ ations W ( i ) = ( Y ( i ) , D ( i ) , X X X ( i )) , where X X X ( i ) = (1 , X ( i )) ⊤ and X ( i ) ∼ Uniform ( − 1 , 1) . The prop ensity score for the binary treatment D = 1 is π ( x x x ) = P ( D = 1 | X X X = x x x ) = 0 . 5 x 2 + 0 . 25 . (16) T reatment assignmen ts are generated as D ( i ) | X X X ( i ) ∼ Ber ( π ( X X X ( i ))) , and the p otential outcomes Y 1 and Y 0 follo w the tail-regime sp ecifications describ ed below. F ollowing Deub er et al. ( 2024 ), the intermediate quantile lev el asso ciated with the GPD threshold is set to (1 − p u ) = k /n with k = n 0 . 65 . Although not optimal in all settings, this choice pro vides a common b enchmark. The tail of the reconstructed potential outcome distribution is obtained via GPD regression on exceedances ab ov e the threshold, with GPD parameters mo delled as functions of cov ariates. A dditional sensitivity analyses for the threshold u and grid size K are provided in App endix B . 5.2 Hea vy-tailed scenarios This set of scenarios fo cuses on cases where the underlying p otential outcome distributions F Y d b elong to the F réchet maxim um domain of attraction, c haracterised b y heavy tails. The three scenarios v ary the heterogeneity of the EVIs b et ween the treatment and control groups. F ollowing Deub er et al. ( 2024 ), the p otential outcomes are generated from the following models: M (H) 1 : Y 1 = 5 S (1 + X ) , Y 0 = S (1 + X ) , M (H) 2 : Y 1 = C 2 exp( X ) , Y 0 = C 3 exp( X ) , M (H) 3 : Y 1 = P 1 . 75+ X, 2 , Y 0 = P 1 . 75+ X, 1 , where X is defined as ab ov e, S follo ws a Studen t- t distribution with 3 degrees of freedom, C s is F réchet distributed with shap e parameter s , lo cation 0, and scale 1, and P a,b is Pareto distributed with shap e parameter a and scale b . The asso ciated EVIs for eac h scenario are γ 1 = γ 0 = 1 / 3 for mo del M (H) 1 , γ 1 = 1 / 2 and γ 0 = 1 / 3 for mo del M (H) 2 , and γ 1 = γ 0 = 4 / 7 for mo del M (H) 3 . 12 5.3 Ligh t-tailed scenarios Here we fo cus on scenarios where the underlying distribution F Y d b elong to the Gumbel or W eibull maxim um domain of attraction, corresp onding to light-tailed distributions. Here we expect EVT- based estimators that primarily target heavy-tailed data (such as the Hill estimator method) to not p erform very w ell. The potential outcomes are generated from the following three mo dels: M (L) 1 : Y (1) = 5 S (1 + X ) , Y (0) = S (1 + X ) , M (L) 2 : Y (1) = C 1 exp( X ) , Y (0) = C 2 exp( X ) , M (L) 3 : Y (1) = W 2+ X, 2 , Y (0) = W 3+2 X, 1 . where X is defined as ab ov e, S follo ws a standard normal distribution, C 1 ∼ Exp (1) , C 2 ∼ Exp (2) , and W a,b denotes a W eibull distribution with shap e parameter a and scale b . The asso ciated EVIs for each scenario are γ 1 = γ 0 = 0 for mo dels M (L) 1 and M (L) 2 (corresp onding to the Gumbel domain of attraction), and γ 1 = − 1 / (2 + X ) and γ 0 = − 1 / (3 + 2 X ) for mo del M (L) 3 . 5.4 Results T able 1 summarises estimation performance under b oth heavy- and light-tailed scenarios for n = 1000 . In the heavy-tailed case, TIEE-IPW consistently ac hieves lo w er bias and v ariance than the Zhang- Firp o and Pickands estimators across all tail regimes. While the Causal Hill estimator p erforms reasonably w ell in the intermediate regime ( p n = 5 /n ), TIEE-IPW remains stable across all regimes, including the very extreme tail ( p n = 5 / ( n log n ) ), where its adv an tage is most pronounced. The Pic kands estimator exhibits substan tially larger bias and v ariance throughout, particularly in the most extreme regime, reflecting its known finite-sample v ariabilit y . The Zhang-Firp o estimator p erforms comp etitively in the in termediate regime but deteriorates in Scenario M (H) 2 , where EVI heterogeneit y is strongest. In the ligh t-tailed case, all estimators p erform markedly b etter, as reflected b y lo w er MSE v alues. The relative adv antage of TIEE-IPW p ersists but is less pronounced, consisten t with the easier estimation of light-tailed extremes. The Pickands estimator again performs w orst, with inflated MSE, esp ecially in the very extreme tail regime. Similar results for n = 5000 are shown in App endix B.1 . Figure 1 rep orts the empirical co verage of the 90% interv als across both tail designs. In the hea vy-tailed case, b oth the Causal Hill and TIEE estimators exhibit stable co verage, consistent with their compatibility with P areto-type tails, while Zhang systematically under-cov ers, with deviations increasing in the extreme tail. Overall, TIEE ac hieves the most accurate and robust finite-sample co verage. Results for the Pickands estimator are omitted due to its highly unstable in terv al estimates. In the light-tailed case, TIEE main tains cov erage close to the nominal lev el across all designs and regimes. Zhang con tin ues to under-cov er, particularly at more extreme quan tiles. The Causal Hill estimator sho ws design-dep endent behaviour, with substan tial under-cov erage in M (L) 1 and M (L) 3 , but near-nominal p erformance in M (L) 2 . Overall, TIEE provides the most uniform and reliable uncertaint y quantification, remaining robust in settings with mild tails and limited extreme observ ations. 13 T able 1: Estimation performance (Bias and MSE) under heavy- and ligh t-tailed scenarios for sample size n = 1000 . Heavy-tailed scenarios Scenario M (H) 1 Scenario M (H) 2 Scenario M (H) 3 1 − τ n Method Bias MSE Bias MSE Bias MSE 5 / ( n log n ) Causal Hill 45.982 4828.099 6.370 1355.930 26.759 3652.382 Pickands -52.322 5694.769 42.410 56223.057 63.741 86214.119 Zhang-Firpo -3.243 2840.382 0.847 5395.063 -9.793 6191.232 TIEE-IPW -6.792 1513.337 -16.460 581.743 -5.879 544.757 1 /n Causal Hill 28.305 2329.556 4.388 844.775 15.660 1872.560 Pickands -36.222 6194.962 59.070 110234.161 58.364 101126.968 Zhang-Firpo 4.117 2846.812 8.653 5469.215 -2.782 6103.073 TIEE-IPW -4.997 1025.129 -10.776 390.789 -4.072 390.293 5 /n Causal Hill 4.496 135.273 0.539 55.963 2.142 79.982 Pickands -19.168 899.346 6.520 1714.555 4.988 1268.937 Zhang-Firpo 1.194 174.980 1.183 131.197 3.330 1128.588 TIEE-IPW -1.193 101.772 -2.066 52.606 -1.680 68.289 Light-tailed scenarios Scenario M (L) 1 Scenario M (L) 2 Scenario M (L) 3 1 − τ n Method Bias MSE Bias MSE Bias MSE 5 / ( n log n ) Causal Hill 13.488 244.970 4.372 49.431 0.535 0.601 Pickands -15.687 482.247 5.818 3791.332 -1.827 15.487 Zhang-Firpo -0.337 14.195 0.475 12.403 -0.347 0.307 TIEE-IPW -1.286 10.593 -0.153 9.121 -0.097 0.252 1 /n Causal Hill 10.916 163.065 3.459 32.668 0.441 0.447 Pickands -15.195 408.884 3.743 2018.179 -1.788 12.386 Zhang-Firpo 0.406 14.245 0.847 12.895 -0.266 0.257 TIEE-IPW -1.144 8.332 -0.109 6.573 -0.089 0.196 5 /n Causal Hill 2.563 12.662 0.734 3.189 0.105 0.083 Pickands -11.699 183.880 -0.564 106.240 -1.438 4.721 Zhang-Firpo -0.090 4.283 0.058 2.774 -0.015 0.089 TIEE-IPW -0.488 2.484 0.173 1.334 -0.063 0.050 5.4.1 Robustness to mo del missp ecification In the extreme quantile regime, the outcome model is inheren tly missp ecified, as tail b eha viour cannot b e learned directly from the data, so classical double robustness do es not strictly apply . Nev ertheless, simulations in Sections 5.2 and 5.3 show that TIEE remains robust to tail-mo del missp ecification, despite imp osing a GPD on non-GPD data. W e further inv estigate robustness to prop ensity score misspecification by fitting incorrect mo dels. Sp ecifically , w e consider three missp ecification scenarios: (1) omitting the quadratic term in ( 16 ) , (2) using a linear mo del when the true mo del is quadratic, and (3) introducing spurious interaction terms. T able 2 shows that the EQTE estimator δ ( τ n ) maintains small bias and v alid inference under these p erturbations. T ak en together, these results sho w that TIEE remains n umerically robust u nder missp ecification of b oth the tail and prop ensity score models. While not formally doubly robust, this practical robustness is v aluable in applications. 14 M 1 (H) M 2 (H) M 3 (H) 5 n 1 n 5 ( n log n ) 5 n 1 n 5 ( n log n ) 5 n 1 n 5 ( n log n ) 0.4 0.6 0.8 1.0 Quantile lev el 1 − τ n Cov erage Probability Method Hill TIEE Zhang M 1 (L) M 2 (L) M 3 (L) 5 n 1 n 5 ( n log n ) 5 n 1 n 5 ( n log n ) 5 n 1 n 5 ( n log n ) 0.4 0.6 0.8 1.0 Quantile lev el 1 − τ n Cov erage Probability Method Hill TIEE Zhang Figure 1: Co v erage rates of 90% confidence interv als for the extreme quantile treatment effect. The comparison includes Zhang, Hill, and TIEE estimators with v arying sample fractions ( 5 /n , 1 /n , and 5 / ( n log n ) ). The dashed horizon tal line represen ts the nominal confidence lev el. The top ro w sho ws results for the hea vy-tailed scenarios, while the b ottom ro w shows results for the light-tailed scenarios. 15 T able 2: Bias and MSE of the TIEE estimator of the EQTE δ ( τ n ) under differen t prop ensity score misspecifications. The “T rue V alue” refers to the EQTE computed from the underlying data-generating distributions via large-sample Monte Carlo appro ximation. 1 − τ n T rue V alue Propensity Score Mo del Bias MSE 5 / ( n log n ) 45.39 T rue Mo del (Quadratic) -5.438 374.19 Polynomial Basis -5.601 397.00 Misspecified F orm (Linear) -3.903 386.64 Misspecified Link (Logit) -3.269 551.88 Spurious Cov ariates -3.472 581.16 1 /n 38.38 T rue Mo del (Quadratic) -3.837 275.01 Polynomial Basis -3.838 283.39 Misspecified F orm (Linear) -2.516 291.51 Misspecified Link (Logit) -2.199 361.22 Spurious Cov ariates -2.103 370.79 5 /n 16.53 T rue Mo del (Quadratic) -1.338 29.53 Polynomial Basis -1.354 29.33 Misspecified F orm (Linear) -1.009 31.11 Misspecified Link (Logit) -1.287 29.74 Spurious Cov ariates -1.285 29.89 6 Data application Extreme ev ent attribution (EEA) typically compares factual and counterfactual climates using mo del ensembles, either via c hanges in exceedance probabilities ( V an Olden b orgh et al. , 2021 ) or storyline approaches conditioning on atmospheric circulation ( Shepherd , 2016 ). While effectiv e, these metho ds do not directly target c hanges in extreme quan tiles. W e adopt a causal observ ational framew ork that complements EEA by focusing on extreme quan tile treatment effects. Conditioning on large-scale circulation aligns with storyline attribution while enabling estimation directly from historical data. W e analyse the EEAR-Clim dataset ( Bongio v anni et al. , 2025 ), using seasonal 7-day maxim um precipitation in the Austrian Alps as the outcome. The treatmen t con trasts mo dern (1995–2020) and historical (1950–1980) p erio ds, targeting the τ = 0 . 99 quantile ( ≈ 100-y ear return level under stationarit y). T wo main c hallenges arise: data sparsit y in the tail and confounding by atmospheric circu- lation. V alid attribution therefore requires separating thermodynamic effects from dynamical v ariability ( Shepherd , 2016 ). In our framework, circulation acts as a confounder, affecting b oth the climate regime and extreme precipitation. W e address this by constructing a confounder set X X X from ERA5 reanalysis data (ECMWF; Hersbach et al. 2020 ∗ ), targeting the dynamical state of the atmosphere. Cov ariates include synoptic-scale circulation v ariables such as sea-lev el pressure (SLP) whcih characterises surface pressure systems, geop otential height at 500 hPa ( Z 500 ) describing mid-trop ospheric circulation, and wind intensit y at 850 hPa ( Wind 850 ), capturing lo w er-trop ospheric flo w. Additionally , w e include large-scale teleconnection indices (NAO and AO). W e explicitly exclude thermo dynamic v ariables suc h as local temperature, whic h lie on the causal path wa y and ∗ Data : Cop ernicus Climate Change Service (C3S) (2023). ERA5 hourly data on single levels [DOI: 10.24381/cds.adbb2d47] and pressure levels [DOI: 10.24381/cds.bd0915c6]. Accessed Sept. 2025. 16 T able 3: Extreme quantile treatmen t effect estimates (99% quan tile) for 23 Austrian stations. Bold v alues in the TIEE column indicate statistical significance at the 10% level. Station Empirical Causal Hill Zhang-Firpo TIEE Aspang -4.92 -1.81 [-6.53, 2.90] -3.70 [-10.05, 4.73] -1.25 [-3.60, 1.11] Bruckm ur 5.65 1.95 [-1.98, 5.89] 4.20 [-5.07, 12.70] 3.01 [ 0.92, 5.09] Deutschlandsberg -2.67 -4.07 [-8.31, 0.17] -4.90 [-14.91, 3.26] -3.69 [-6.34, -1.04] Doellach 5.22 2.85 [-0.07, 5.78] 4.30 [-0.86, 11.16] 3.60 [ 1.90, 5.30] F euerkogel 0.82 6.13 [-0.15, 12.42] 7.90 [ 2.64, 14.76] 1.30 [-1.89, 4.49] F reistadt 3.21 5.04 [ 0.51, 9.57] 4.30 [-1.51, 10.27] 5.41 [ 2.84, 7.98] Graz Universitaet -1.62 -2.34 [-6.33, 1.66] -3.10 [-10.90, 8.90] -1.53 [-3.89, 0.82] Hohenau -0.25 2.32 [-1.21, 5.85] 1.90 [-4.82, 6.78] 0.47 [-1.41, 2.36] Kremsmuenster T a wes -3.84 -2.05 [-6.58, 2.48] -0.50 [-4.95, 3.42] -3.61 [-6.05, -1.17] Landeck 1.45 0.94 [-1.45, 3.33] -0.20 [-2.72, 2.93] 2.97 [ 1.52, 4.43] Mariazellst. Sebastian Flugfeld -0.49 2.39 [-2.58, 7.36] 0.10 [-3.42, 2.40] 1.24 [-1.15, 3.63] Mayrhofen 1.34 0.44 [-2.60, 3.48] 1.20 [-6.11, 8.63] 0.68 [-0.94, 2.30] Muerzzuschlag 9.24 8.55 [ 4.86, 12.24] 6.70 [ 0.51, 11.46] 8.99 [ 7.00, 10.99] Patsc herkofel 3.11 1.61 [-1.32, 4.54] 2.50 [-0.02, 5.86] 1.96 [ 0.33, 3.60] Puch b ergschneeberg 6.27 8.18 [ 3.66, 12.70] 8.30 [-1.06, 16.81] 7.25 [ 4.89, 9.62] Retzwindmuehle 6.88 6.17 [ 3.12, 9.22] 7.00 [ 4.14, 9.23] 5.43 [ 3.80, 7.07] Reutte 4.51 8.94 [ 3.49, 14.40] 9.70 [-7.58, 22.47] 5.46 [ 2.64, 8.29] Schoppernau 7.25 8.42 [ 3.04, 13.81] 11.00 [ 0.28, 21.58] 5.41 [ 2.86, 7.96] Schroeck en 3.05 4.00 [-0.91, 8.90] 4.00 [-4.87, 20.37] 3.08 [ 0.47, 5.68] St. Jakobdef. 0.41 -2.32 [-5.74, 1.11] -0.40 [-6.15, 3.68] -0.24 [-2.15, 1.67] St. Poeltenlandhaus 4.23 3.87 [-1.33, 9.07] 5.20 [-3.28, 12.88] 1.65 [-0.92, 4.21] Stift Zwettl 4.45 3.44 [-0.53, 7.41] 5.30 [ 1.74, 9.44] 2.24 [ 0.09, 4.39] W aizenkirchen -4.84 -3.10 [-6.83, 0.63] -1.30 [-7.78, 4.26] -4.88 [-6.90, -2.87] w ould induce o v er-adjustment, as w ell as slo wly v arying o ceanic indices (e.g., AMO, PDO), wh ose p ersistence ma y violate the positivity assumption. The prop ensity score π ( X X X ) = P ( D = 1 | X X X ) is estimated via the follw oing logistic regression mo del log π ( X X X ) 1 − π ( X X X ) = β 0 + 2 X r =1 α r Z r 500 + 2 X r =1 γ r SLP r + δ Wind 850 + η NAO + 4 X r =1 λ r A O r . This flexible sp ecification ensures effectiv e balancing of circulation patterns across p erio ds, allo wing us to isolate the thermo dynamic contribution of anthropogenic forcing. 6.1 Results W e applied the TIEE, Causal Hill, and Zhang-Firp o estimators to the 23 stations listed in T a- ble 3 . T able 3 addresses the attribution question by quan tifying changes in the 99% precipitation quan tile under recen t w arming, conditional on comparable circulation states. While unadjusted and alternative estimators yield mixed and often inconclusive results, TIEE consistently iden tifies p ositiv e shifts in the upp er tail across stations, indicating that anthropogenic warming primarily in tensifies extreme precipitation rather than altering its frequency . The results highligh t substantial discrepancies b etw een naiv e and causally adjusted estimates. Unadjusted empirical comparisons frequen tly suggest negativ e effects (e.g., Aspang − 4 . 92 mm, W aizenkirc hen − 4 . 84 mm), which w e attribute to circulation-driven v ariability rather than true thermo dynamic resp onses. In con trast, 17 TIEE corrects these distortions and reveals coheren t p ositive signals. TIEE also demonstrates greater p ow er and precision. At Landec k, it detects a significant increase ( ˆ δ TIEE (0 . 99) = 2 . 97 mm), whereas Zhang-Firp o suggests a decrease ( − 0 . 20 mm), illustrating how inefficien t estimators can obscure the signal. More broadly , Zhang-Firpo pro duces very wide interv als (e.g., ≈ 18 mm at Bruckm ur), while TIEE yields substantially narro w er in terv als (t ypically ≈ 4 – 5 mm) and identifies significan t effects at stations such as Bruc km ur, F reistadt, and Schoppernau. The Causal Hill estimator generally reco vers the correct direction but suffers from large v ariance, often failing to detect significance. Aggregating across stations, TIEE rev eals a coheren t regional signal. While empirical estimates v ary widely due to local dynamical noise, TIEE pro duces consisten tly positive effects, with an a verage in tensification of ˆ δ TIEE (0 . 99) ≈ 2 . 18 mm, exceeding and stabilising estimates from Zhang ( ≈ 1 . 75 mm) and Causal Hill ( ≈ 1 . 78 mm). It also reduces apparen t spatial heterogeneit y , suggesting that muc h of the v ariability in comp eting estimators reflects noise rather than gen uine differences. Ov erall, once dynamical confounding is prop erly addressed, a robust thermo dynamic signal of in tensification emerges across the region. 7 Conclusion Estimating causal effects at extreme quantiles is cen tral in applications suc h as climate science and econometrics, but standard metho ds fail in the tails. W e in tro duce the T ail-Calibrated In v erse Estimating Equation (TIEE) framew ork, which integrates causal inference and extreme v alue theory via an integrated momen t condition, yielding stable estimation without plug-in in version. While implemen ted with a GPD, the framework is not mo del-sp ecific and accommo dates an y uniformly consisten t tail approximation. W e establish iden tification, consistency , and asymptotic normality , and sim ulations sho w substan tial efficiency gains and robustness to prop ensit y score missp ecification. The metho d is compatible with flexible n uisance estimation, including mac hine learning approac hes. An application to Austrian precipitation data reveals a clear in tensification of extreme even ts after adjusting for atmospheric circulation. F uture work includes extensions to high-dimensional confounding, lev eraging flexible machine learning estimators, and to m ultiv ariate or spatial extremes b eyond the curren t univ ariate setting. The framework also extends naturally to con tinuous treatmen ts by replacing the binary signal with a contin uous treatmen t analogue based on the generalised prop ensit y score or related doubly robust constructions ( Hirano and Imbens , 2004 ; Kennedy et al. , 2017 ), yielding an unbiased estimating function for the distribution function of the p otential outcome at a given treatmen t level. More broadly , the TIEE framework addresses causal inference in sparse tails and has applications b ey ond climate science. In quan titative finance, it can be used to isolate the impact of p olicy in terven tions on systemic risk measures such as V alue-at-Risk, and in epidemiology , to quan tify in terven tion effects on p eak infection rates, where a v erage-based metrics can obscure tail b eha viour. These extensions highligh t the broad utility of tail-calibrated inference b eyond environmen tal 18 sciences. 8 A c kno wledgmen ts Mengran Li gratefully ackno wledges the supp ort provided by the China Sc holarship Council (CSC). The authors thank members of the Glasgow-Edin burgh Extremes Net work (GLE 2 N) for useful discussions related to this work. 9 Data av ailabilit y The data underlying this article are deriv ed from publicly av ailable sources. The ERA5 reanalysis data were obtained from the Cop ernicus Climate Change Service (C3S) Climate Data Store (h ttps://cds.climate.cop ernicus.eu), including ERA5 hourly data on single levels (DOI:10.24381/cds.adbb2d47) and pressure lev els (DOI:10.24381/cds.b d0915c6). The EEAR-Clim dataset is av ailable from Bongio v anni et al. (2025). Derived data generated during the analysis are a v ailable from the corresp onding author up on reasonable request. The co de to reproduce the sim ulation studies in Section 5 , and the data and co de to repro duce the data application in Section 6 are freely av ailable at h ttps://github.com /MengranLi- git/TIEE . A A dditional results and pro ofs from Section 4 A.1 Pro of of Lemma 1 F or each fixed p ∈ (0 , 1) , the mapping W 7→ 1 { Q Y d ( p ) ≤ θ } is an indicator function indexed b y the scalar threshold parameter θ . Such threshold indicator functions form a VC-subgraph class of finite V C dimension. VC-subgraph classes ha ve polynomial cov ering num b ers and therefore satisfy the conditions of the Glivenk o–Can telli theorem ( V an Der V aart and W ellner , 2023 , Sections 2.4 and 2.6). The in tegrated signal S d is obtained b y in tegrating these indicator functions with resp ect to p o ver a finite measure on (0 , 1) . Since the integrand is uniformly b ounded and in tegration is a linear and bounded op erator, the resulting class F inherits the Gliv enko–Can telli property ( V an Der V aart and W ellner , 2023 , Section 2.4). Therefore, sup θ ∈ Θ 1 n n X i =1 S d ( W i , θ ) − E S d ( W , θ ) P − → 0 , and F is Glivenk o–Can telli. A.2 Pro of of Theorem 3 (Asymptotic Normality) The pro of of asymptotic normality for the TIEE estimator ˆ θ d,τ follo ws the standard approac h for semi-parametric Z-estimators. W e assume that the true parameters θ d,τ and ζ are fixed, and we 19 emphasise the dep endence of the empirical functional Φ d,n of the estimator of ζ b y writing Φ d,n ( θ , ˆ ζ ) = 1 n n X i =1 g TIEE ,d ( W ( i ) , τ , θ , ˆ ζ ) . As we know, the estimator ˆ θ d,τ is defined as the ro ot of the empirical estimating equation Φ d,n ( ˆ θ d,τ , ˆ ζ ) = 0 . Applying the mean v alue theorem to Φ d,n ( θ , ˆ ζ ) in θ around the true param- eter θ d,τ yields 0 = Φ d,n ( ˆ θ d,τ , ˆ ζ ) = Φ d,n ( θ d,τ , ˆ ζ ) + ( ˆ θ d,τ − θ d,τ ) · ∂ ∂ θ Φ d,n ( ˜ θ , ˆ ζ ) , where ˜ θ is an in termediate v alue betw een ˆ θ d,τ and θ d,τ . Rearranging terms, w e get √ n ( ˆ θ d,τ − θ d,τ ) = − √ n Φ d,n ( θ d,τ , ˆ ζ ) ∂ ∂ θ Φ d,n ( ˜ θ , ˆ ζ ) . (17) By the consistency of ˆ θ d,τ established in Theorem 2, w e hav e ˜ θ P − → θ d,τ and by the assumptions of Theorem 2, ˆ ζ P − → ζ . F urthermore, the assumptions also ensure that the partial deriv ativ e ∂ ∂ θ Φ d ( θ , ζ ) is contin uous in a neigh b ourho o d of θ d,τ . The uniform conv ergence of the empirical deriv ativ e to its p opulation counterpart, ∂ ∂ θ Φ d,n ( θ , ζ ) P − → ∂ ∂ θ Φ d ( θ , ζ ) , com bined with the consistency of the estimated parameters, ensures that ∂ ∂ θ Φ d,n ( ˜ θ , ˆ ζ ) P − → ∂ ∂ θ Φ d ( θ d,τ , ζ ) . (18) Since ∂ ∂ θ Φ d ( θ d,τ ) = 0 , the denominator in ( 17 ) con verges to a non-zero constant, ensuring the asymptotic normality result is well-defined. No w we turn our attention to the n umerator in ( 17 ) . Since this is a function of the estimated n uisance parameter ˆ ζ , we exploit the path wise differen tiability of the moment functional with respect to ζ to obtain the following asymptotic linear expansion around the true parameter ζ : √ n Φ d,n ( θ d,τ , ˆ ζ ) = √ n Φ d,n ( θ d,τ , ζ ) + √ n ( ˆ ζ − ζ ) ⊤ · ∇ ζ Φ d ( θ d,τ , ζ ) + o p (1) , (19) where ∇ ζ Φ d ( θ d,τ , ζ ) is the gradient of the p opulation functional with respect to ζ , ev aluated at the true parameters. Since Φ d ( θ d,τ , ζ ) = 0 is the momen t condition identifying the true quan tile θ d,τ , the momen t condition is lo cally un biased with resp ect to ζ at the root (this is a standard prop erty for efficien t semi-parametric estimators). This condition implies that the term inv olving the estimation error of ˆ ζ v anishes asymptotically . Therefore, the numerator in ( 19 ) simplifies to √ n Φ d,n ( θ d,τ , ˆ ζ ) = √ n Φ d,n ( θ d,τ , ζ ) + o p (1) . 20 Substituting the definition of Φ d,n ( θ d,τ , ζ ) and using the iden tification condition E [ g TIEE ,d ( W ( i ) , τ , θ d,τ , ζ )] = Φ d ( θ d,τ , ζ ) = 0 , w e obtain √ n Φ d,n ( θ d,τ , ˆ ζ ) = 1 √ n n X i =1 g TIEE ,d ( W i , τ , θ d,τ , ζ ) + o p (1) , whic h, b y the central limit theorem, conv erges to a mean-zero Gaussian distribution with finite v ariance σ 2 d,τ = V ar ( g TIEE ,d ( W i , τ , θ d,τ , ζ )) . Com bining (via Slutsky’s theorem) this conv ergence result with that of ( 18 ), w e hav e √ n ( ˆ θ d,τ − θ d,τ ) = − √ n Φ d,n ( θ d,τ , ˆ ζ ) ∂ ∂ θ Φ d,n ( ˜ θ , ˆ ζ ) d − → − N (0 , σ 2 q ) ∂ ∂ θ Φ d ( θ d,τ , ζ ) ≡ N 0 , σ 2 q [ ∂ ∂ θ Φ d ( θ d,τ , ζ )] 2 ! . A.3 V ariance of ˆ θ d,τ when ∇ ζ Φ d ( θ d,τ , ζ ) = 0 A.3.1 Deriv ation of V d,τ The standard deriv ation in Theorem 3 assumes that the estimation of the nuisance parameter ζ do es not affect the asymptotic distribution of θ d,τ through the orthogonalit y condition ∇ ζ Φ d ( θ d,τ , ζ ) = 0 . When this condition fails to hold, we m ust incorp orate the estimation uncertain t y of ˆ ζ in to the calculation of the asymptotic v ariance V d,τ . This section deriv es the general v ariance formula that accoun ts for the influence of n uisance parameter estimation. Let ζ b e a k × 1 vector of nuisance parameters. The estimators ( ˆ θ d,τ , ˆ ζ ) are obtained by solving the empirical moment conditions Φ d,n ( ˆ θ d,τ , ˆ ζ ) = 1 n n X i =1 g TIEE ,d ( W i , ˆ θ d,τ , ˆ ζ ) = 0 , and Ψ d,n ( ˆ ζ ) = 1 n n X i =1 ψ ( W i , ˆ ζ ) = 0 , where ψ ( W i , ˆ ζ ) is an estimating function defining the nuisance parameter estimator ζ through the momen t condition E [ ψ ( W i , ˆ ζ )] = 0 . Let the stack ed momen t function m and its a v erage b e defined as m ( W , θ , ζ ) = g TIEE ,d ( W , θ , ζ ) ψ ( W, ζ ) ! , ¯ m n ( θ , ζ ) = 1 n n X i =1 m ( W i , θ , ζ ) . Under standard regularity conditions, the join t asymptotic distribution of ( ˆ θ d,τ , ˆ ζ ) is √ n ˆ θ d,τ − θ d,τ ˆ ζ − ζ ! d − → N (0 , V full ) , where V full is the sandwic h cov ariance matrix giv en by V full = A − 1 B ( A − 1 ) ⊤ , where A is the Jacobian matrix and B is the cov ariance matrix. Sp ecifically , A is the exp ected gradient of the momen t 21 function m with resp ect to the parameters ( θ , ζ ) , i.e., A = E ∂ m ( W, θ d,τ , ζ ) ∂ ( θ , ζ ) = E [ ∂ θ g TIEE ,d ] E [ ∂ ζ g TIEE ,d ] E [ ∂ θ ψ ] E [ ∂ ζ ψ ] ! . Under the natural assumption that the estimating equation for ζ do es not dep end on θ , w e hav e E [ ∂ θ ψ ] = 0 . Therefore, the Jacobian matrix tak es the blo c k upp er-triangular form A = Φ ′ ( θ d,τ ) A ζ 0 M ! , (20) where Φ ′ ( θ d,τ ) = E [ ∂ θ g TIEE ,d ( W , θ d,τ , ζ )] is a scalar representing the sensitivit y of the moment condition to changes in θ ; A ζ = E [ ∂ ζ g TIEE ,d ( W , θ d,τ , ζ )] = ∇ ζ Φ d ( θ d,τ , ζ ) is a 1 × k ro w vector capturing the sensitivit y to n uisance parameters; and M = E [ ∂ ζ ψ ( W , ζ )] is a k × k matrix go v erning the estimation of ζ . The matrix B is the cov ariance matrix of the stac k ed moment functions ev aluated at the true parameters, i.e., B = V ar( m ( W, θ d,τ , ζ )) = V ar( g TIEE ,d ) Cov( g TIEE ,d , ψ ) Co v( ψ , g TIEE ,d ) V ar( ψ ) ! = σ 2 d,τ Σ g ,ψ Σ ⊤ g ,ψ Σ ψ ! , where σ 2 d,τ = V ar ( g TIEE ,d ( W , θ d,τ , ζ )) is the scalar v ariance of the moment function for θ d,τ ; Σ ψ = V ar ( ψ ( W , ζ )) is the k × k co v ariance matrix of the moment functions for ζ ; and Σ g ,ψ = Co v ( g TIEE ,d ( W , θ d,τ , ζ ) , ψ ( W , ζ )) is the 1 × k ro w v ector of cov ariances betw een the t w o sets of momen t functions. V d,τ is defined b y the (1 , 1) blo ck of V full , i.e., V d,τ = V ar ( √ n ( ˆ θ d,τ − θ d,τ )) . F or a blo ck upp er- triangular matrix of the form given in ( 20 ), the in verse is A − 1 = (Φ ′ ) − 1 − (Φ ′ ) − 1 A ζ M − 1 0 M − 1 ! = U 11 U 12 0 U 22 ! . (21) V d,τ can then b e computed as V d,τ = U 11 B 11 U ⊤ 11 + U 12 B 21 U ⊤ 11 + U 11 B 12 U ⊤ 12 + U 12 B 22 U ⊤ 12 , (22) where B 11 = σ 2 d,τ , B 12 = Σ g ,ψ , B 21 = Σ ⊤ g ,ψ , and B 22 = Σ ψ . Since V d,τ is a scalar quan tit y , the second and third terms in ( 22 ) are transp oses of each other and therefore equal. Substituting the expressions for U 11 and U 12 from ( 21 ), w e obtain the general v ariance form ula V d,τ = σ 2 d,τ + A ζ V ζ A ⊤ ζ − 2 A ζ C [Φ ′ ( θ d,τ )] 2 , (23) where V ζ = M − 1 Σ ψ ( M − 1 ) ⊤ and C = M − 1 Σ ⊤ g ,ψ . Here, V ζ represen ts the asymptotic v ariance of ˆ ζ , 22 and C captures the scaled co v ariance b et ween the moment functions g and ψ . The form ula in ( 23 ) con tains three comp onents: σ 2 d,τ , the in trinsic v ariabilit y of the momen t condition for θ d,τ ; A ζ V ζ A ⊤ ζ , the v ariance inflation due to the estimation of nuisance parameters; and − 2 A ζ C , a correction term accoun ting for the correlation b etw een the estimation procedures for θ d,τ and ζ . A.3.2 Consisten t plug-in estimation A consistent plug-in estimator ˆ V d,τ can b e constructed by replacing all p opulation quan tities in ( 23 ) with their empirical coun terparts, i.e., ˆ V d,τ = ˆ σ 2 d,τ + ˆ A ζ ˆ V ζ ˆ A ⊤ ζ − 2 ˆ A ζ ˆ C ( ˆ Φ ′ ) 2 , where the empirical quan tities are defined as: ˆ Φ ′ = 1 n n X i =1 ∂ θ g TIEE ,d ( W i , ˆ θ d,τ , ˆ ζ ) , ˆ A ζ = 1 n n X i =1 ∂ ζ g TIEE ,d ( W i , ˆ θ d,τ , ˆ ζ ) , ˆ M = 1 n n X i =1 ∂ ζ ψ ( W i ; ˆ ζ ) , ˆ V ζ = ˆ M − 1 ˆ Σ ψ ( ˆ M − 1 ) ⊤ , ˆ C = ˆ M − 1 ˆ Σ ⊤ g ,ψ . The empirical second momen ts ˆ σ 2 d,τ , ˆ Σ ψ , and ˆ Σ g ,ψ are computed from the centred momen t functions as ˆ σ 2 d,τ = 1 n n X i =1 [ g TIEE ,d ( W i , ˆ θ d,τ , ˆ ζ )] 2 , ˆ Σ ψ = 1 n n X i =1 ψ ( W i , ˆ ζ ) ψ ( W i , ˆ ζ ) ⊤ , ˆ Σ g ,ψ = 1 n n X i =1 g TIEE ,d ( W i , ˆ θ d,τ , ˆ ζ ) ψ ( W i , ˆ ζ ) ⊤ . Under standard regularity conditions, ˆ V d,τ P − → V d,τ , ensuring consistent v ariance estimation for constructing confidence interv als and hypothesis tests. A.3.3 Special Cases The general v ariance formula in ( 23 ) encompasses tw o imp ortan t sp ecial cases that arise in practice. Case 1: Orthogonality . If the orthogonality condition ∇ ζ Φ d ( θ d,τ , ζ ) = 0 holds, as assumed in Theorem 3, then A ζ = 0 . Consequently , b oth the v ariance inflation term A ζ V ζ A ⊤ ζ and the cross-term 23 2 A ζ C v anish, and the v ariance simplifies to V d,τ = σ 2 d,τ [Φ ′ ( θ d,τ )] 2 , reco vering the simpler result stated in Theorem 3. This is the most efficient case, as the nuisance parameter estimation introduces no additional v ariabilit y . Case 2: Uncorrelatedness. If Σ g ,ψ = 0 then C = 0 and the cross-term 2 A ζ C disapp ears. This case can arise, for example, through sample-splitting or cross-fitting pro cedures that empirically decouple the estimation of g TIEE ,d and ψ . In this case, the v ariance b ecomes V d,τ = σ 2 d,τ + A ζ V ζ A ⊤ ζ [Φ ′ ( θ d,τ )] 2 . This case exhibits only the v ariance inflation due to n uisance parameter estimation, without the p oten tially b eneficial correction from correlated estimation errors. A.4 Asymptotic deriv ations for the extreme quantile treatmen t effect δ ( τ ) In this section, w e provide detailed deriv ations supporting the asymptotic distribution of δ ( τ ) = θ 1 ,τ − θ 0 ,τ pro vided in Section 4.2 . W e start b y deriving the join t asymptotic distribution of ( θ 1 ,τ , θ 0 ,τ ) ⊤ . A first order T aylor expansion of the empirical estimating equation Φ d,n ( b θ d,τ ) = 0 around θ d,τ giv es 0 = Φ d,n ( b θ d,τ ) ≈ Φ d,n ( θ d,τ ) + Φ ′ d,n ( θ d,τ )( ˆ θ d,τ − θ d,τ ) , d = 0 , 1 , whic h means that √ n ( ˆ θ d − θ d,τ ) ≈ − √ n Φ d,n ( θ d,τ ) Φ ′ d,n ( θ d,τ ) . By consistency and uniform conv ergence, Φ ′ d,n ( θ d,τ ) P − → Φ ′ d ( θ d,τ ) , so asymptotically , √ n ( ˆ θ d − θ d,τ ) = − √ n Φ d,n ( θ d,τ ) Φ ′ d ( θ d,τ ) + o p (1) . No w, stacking the tw o treatmen t groups together, w e get √ n ˆ θ 1 ,τ − θ 1 ,τ ˆ θ 0 ,τ − θ 0 ,τ ! = − D − 1 √ n Φ n, 1 ( θ 1 ,τ ) Φ n, 0 ( θ 0 ,τ ) ! + o p (1) , D = diag Φ ′ 1 ( θ 1 ,τ ) , Φ ′ 0 ( θ 0 ,τ ) . By the multiv ariate central limit theorem, √ n Φ n, 1 ( θ 1 ,τ ) Φ n, 0 ( θ 0 ,τ ) ! d − → N (0 , Σ Φ ) , 24 where Σ Φ = Σ 11 Σ 10 Σ 10 Σ 00 ! = V ar ( g 1 ( W , θ 1 ,τ )) Co v ( g 1 ( W , θ 1 ,τ ) , g 0 ( W , θ 0 ,τ )) Co v ( g 1 ( W , θ 1 ,τ ) , g 0 ( W , θ 0 ,τ )) V ar ( g 0 ( W , θ 0 ,τ )) ! . (24) In ( 24 ) , we hav e reduced notation clutter by denoting g d ( W , θ d,τ )) = g TIEE ,d ( W , θ , τ , ζ ) , so V ar ( g d ( W , θ d,τ )) is the v ariance of the in tegrated estimating function for treatment d = 0 , 1 , which can b e estimated using the sample v ariance: ˆ Σ dd = 1 n n X i =1 g d ( W ( i ) , ˆ θ d,τ ) 2 − 1 n n X i =1 g d ( W ( i ) , ˆ θ d,τ ) ! 2 ≈ 1 n n X i =1 g d ( W ( i ) , ˆ θ d,τ ) 2 . (25) The approximation in ( 25 ) comes from the fact that b θ d,τ solv es Φ d,n ( b θ d,τ ) = 0 , therefore the sample mean is close to 0. The cov ariance term Σ 10 captures the dependence betw een the estimating equations for the tw o treatmen t groups. When treatmen t and control samples are independent (e.g., in randomised experiments with non-ov erlapping units), Σ 10 = 0 . How ev er, in observ ational studies with IPW adjustment, Σ 10 = 0 in general b ecause b oth estimators use the same prop ensity score mo del and the same cov ariate distribution. In that case, the cov ariance can b e estimated b y ˆ Σ 10 = 1 n n X i =1 g 1 ( W ( i ) , ˆ θ 1 ,τ ) g 0 ( W ( i ) , ˆ θ 0 ,τ ) − 1 n n X i =1 g 1 ( W ( i ) , ˆ θ 1 ,τ ) ! 1 n n X i =1 g 0 ( W ( i ) , ˆ θ 0 ,τ ) ! . Therefore, the joint re-scaled asymptotic distribution of ( b θ 1 ,τ , b θ 0 ,τ ) ⊤ is given b y √ n ˆ θ 1 ,τ − θ 1 ,τ ˆ θ 0 ,τ − θ 0 ,τ ! d − → N 0 , D − 1 Σ Φ D − 1 . The asymptotic distribution of ˆ δ ( τ ) is therefore √ n ˆ δ ( τ ) − δ ( τ ) d − → N 0 , σ 2 δ , where the asymptotic v ariance σ 2 δ = (1 , − 1) D − 1 Σ Φ D − 1 (1 , − 1) ⊤ simplifies to σ 2 δ = Σ 11 [Φ ′ 1 ( θ 1 ,τ )] 2 + Σ 00 [Φ ′ 0 ( θ 0 ,τ )] 2 − 2Σ 10 Φ ′ 1 ( θ 1 ,τ )Φ ′ 0 ( θ 0 ,τ ) . (26) A consistent estimator ˆ σ 2 δ of ( 26 ) is obtained by substituting the terms on the right-hand side by sample analogues. The terms in the denominators can either b e estimated via numerical differen tia- tion or kernel densit y estimation of f Y d ( ˆ θ d,τ ) (since Φ ′ d ( θ d,τ ) = f Y d ( ˆ θ d,τ ) ). An asymptotically v alid (1 − α ) confidence in terv al for b δ ( τ ) is therefore given b y b δ ( τ ) ± z 1 − α/ 2 ˆ σ δ / √ n, where z 1 − α/ 2 denotes the (1 − α/ 2) -quantile of the standard normal distribution. 25 B A dditional sim ulation results and studies B.1 P erformance under hea vy- and ligh t-tailed scenarios with n = 5000 The results for sample size n = 5000 presented in T able 4 exhibit the same patterns as those rep orted for n = 1000 . In particular, the TIEE-IPW estimator contin ues to outp erform comp eting metho ds across all tail regimes, with the p erformance gains most pronounced in the extreme tail. As exp ected, increasing the sample size leads to o v erall reductions in bias and MSE for all estimators, but do es not alter the relative ranking of the metho ds. These findings further supp ort the robustness and scalabilit y of the prop osed approach. T able 4: Estimation performance (Bias and MSE) under heavy- and ligh t-tailed scenarios for sample size n = 5000 . Hea vy-tailed scenarios Scenario M (H) 1 Scenario M (H) 2 Scenario M (H) 3 1 − τ n Metho d Bias MSE Bias MSE Bias MSE 5 / ( n log n ) Causal Hill 40.165 3652.733 22.858 1537.911 35.070 3914.050 Pic kands 70.235 29395.493 130.364 202295.849 153.724 354024.912 Zhang-Firp o 33.818 4655.698 42.605 14786.807 39.325 9158.521 TIEE-IPW 22.842 876.524 22.348 996.323 21.852 878.432 1 /n Causal Hill 31.616 2279.542 18.035 942.085 26.499 2181.550 Pic kands 59.719 15831.284 95.367 86841.475 107.704 139943.738 Zhang-Firp o 31.558 4507.900 40.327 14597.895 35.425 8866.979 TIEE-IPW 19.310 695.274 18.341 761.169 17.668 632.377 5 /n Causal Hill 8.553 182.638 5.164 72.501 6.128 107.085 Pic kands 26.375 1308.698 20.979 1807.624 19.685 1832.421 Zhang-Firp o 9.558 272.607 7.824 208.126 8.530 447.514 TIEE-IPW 7.061 131.785 5.947 114.472 5.163 75.775 Ligh t-tailed scenarios Scenario M (L) 1 Scenario M (L) 2 Scenario M (L) 3 1 − τ n Metho d Bias MSE Bias MSE Bias MSE 5 / ( n log n ) Causal Hill 17.536 339.969 6.735 71.926 0.797 0.826 Pic kands -20.353 492.972 -1.767 492.544 -2.236 10.139 Zhang-Firp o -0.770 11.042 0.196 13.396 -0.386 0.322 TIEE-IPW -1.688 7.863 -0.683 5.600 -0.173 0.138 1 /n Causal Hill 13.458 201.658 4.997 40.885 0.628 0.535 Pic kands -19.186 430.554 -2.139 293.848 -2.118 8.627 Zhang-Firp o 0.316 10.549 0.847 14.074 -0.262 0.242 TIEE-IPW -1.456 5.861 -0.486 3.930 -0.150 0.104 5 /n Causal Hill 4.903 28.551 1.604 5.565 0.250 0.113 Pic kands -15.105 259.551 -2.340 64.948 -1.685 4.994 Zhang-Firp o -0.169 2.942 -0.116 2.348 -0.044 0.070 TIEE-IPW -0.876 2.081 -0.076 1.095 -0.105 0.039 26 B.2 Sensitivit y to the threshold u The TIEE estimator’s p erformance dep ends on the appropriate selection of the in termediate quan tile lev el p u asso ciated with the threshold u . T o assess this sensitivit y , w e fix the sample size at n = 1000 and consider the target quan tile lev els τ n with (1 − τ n ) ∈ { 5 /n, 1 /n, 5 / ( n log n ) } . W e consider p u ∈ [0 . 85 , 0 . 95] with incremen ts of 1% within each treatmen t group (i.e., u v aries ov er the 85th and 95th p ercentiles of the control distribution). Figure 2 rep orts the mean squared error of the estimated EQTE δ ( τ n ) (see its definition in Section 2.1 ) as a function of p u . for different tail probability regimes. As exp ected, the absolute MSE increases as the target tail probabilit y decreases, reflecting the increasing extremeness of the target quan tile and the reduced effective sample size a v ailable for tail calibration. How ever, the primary goal of this analysis is to assess the sensitivit y of the estimator to the c hoice of threshold. F rom this persp ective, w e can see that across the range p u ∈ [0 . 85 , 0 . 95] the MSE v aries smo othly without exhibiting sharp increases or pronounced minima, indicating that the estimator do es not dep end critically on fine tuning of the threshold. While v ery high thresholds increase v ariability due to the limited n um b er of exceedances and very lo w thresholds may in tro duce bias from tail-model missp ecification, the estimator remains reasonably stable across a broad in termediate range of thresholds. 500 1000 0.85 0.86 0.87 0.88 0.89 0.9 0.91 0.92 0.93 0.94 0.95 Inter mediate quantile le vel p u MSE 1 − τ n 5 log ( n ) 1 n 5 n Figure 2: Mean squared error of the TIEE-IPW estimator of the EQTE δ ( τ n ) as a function of the in termediate quantile lev el p u for (1 − τ n ) ∈ { 5 / log( n ) , 1 /n, 5 /n } . B.3 Robustness to grid size K Under the same exp erimen tal setting as Section B.2 , w e inv estigate robustness with resp ect to the grid size used in the numerical integration detailed in Section 3.4 . W e v ary the grid resolution o ver { 100 , 200 , . . . , 2000 } while fixing the threshold u at n 0 . 65 and k eeping all other comp onents unc hanged. 27 Figure 3 examines the robustness of the TIEE-IPW estimator of δ ( τ n ) to the grid size K under three tail probability regimes. Across the range of grid resolutions considered, the MSE remains stable, with no systematic deterioration as K increases. This indicates that the p erformance of TIEE is not driven b y fine-tuning of the grid resolution and that relativ ely coarse grids already pro vide sufficien t accuracy . The result suggests that the in tegrated estimating equation is numerically w ell b eha ved and that the prop osed implemen tation is robust to the discretisation c hoices required for practical computation. 300 600 900 1200 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 Grid size K MSE 1 − τ n 5 log ( n ) 1 n 5 n Figure 3: Mean squared error (MSE) of the TIEE-IPW estimator of the EQTE δ ( τ n ) as a function of the grid size K for (1 − τ n ) = 5 / log( n ) , 1 /n, 5 /n . References Bh uyan, P ., Jana, K. and McCo y , E. J. (2023) Causal analysis at extreme quantiles with application to london traffic flow data. Journal of the R oyal Statistic al So ciety Series C: Applie d Statistics 72 (5), 1452–1474. Bongio v anni, G., Matiu, M., Crespi, A., Nap oli, A., Ma jone, B. and Zardi, D. (2025) EEAR-Clim: a high-density observ ational dataset of daily precipitation and air temp erature for the Extended Europ ean Alpine Region. Earth System Scienc e Data 17 (4), 1367–1391. Cheng, C. and Li, F. (2024) In verting estimating equations for causal inference on quan tiles. Biometrika 112 (1), asae058. Deub er, D., Li, J., Engelke, S. and Maathuis, M. H. (2024) Estimation and inference of extremal quan tile treatment effects for hea vy-tailed distributions. Journal of the A meric an Statistic al A sso ciation 119 (547), 2206–2216. Doksum, K. (1974) Empirical probability plots and statistical inference for nonlinear mo dels in the t wo-sample case. The A nnals of Statistics pp. 267–277. Firp o, S. (2007) Efficient semiparametric estimation of quantile treatmen t effects. Ec onometric a 75 (1), 259–276. 28 Hersbac h, H., Bell, B., Berrisford, P ., Hirahara, S., Horán yi, A., Muñoz-Sabater, J., Nicolas, J., P eub ey , C., Radu, R., Sc hep ers, D. et al. (2020) The ERA5 global reanalysis. Quarterly journal of the r oyal mete or olo gic al so ciety 146 (730), 1999–2049. Hill, B. M. (1975) A simple general approac h to inference ab out the tail of a distribution. The A nnals of Statistics pp. 1163–1174. Hirano, K. and Imbens, G. W. (2004) The Pr op ensity Sc or e with Continuous T r e atments. In A pplie d Bayesian Mo deling and Causal Infer enc e fr om Inc omplet e-Data Persp e ctives (e ds W.A. Shewhart, S.S. Wilks, A. Gelman and X.-L. Meng) . John Wiley & Sons, Ltd. Kennedy , E. H., Ma, Z., McHugh, M. D. and Small, D. S. (2017) Non-parametric metho ds for doubly robust estimation of con tinuous treatmen t effects. Journal of the R oyal Statistic al So ciety Series B: Statistic al Metho dolo gy 79 (4), 1229–1245. Lehmann, E. L. and D’Abrera, H. J. (1975) Nonp ar ametrics: statistic al metho ds b ase d on r anks . T oronto: Holden-Da y . Ma, H., Qin, J. and Zhou, Y. (2023) F rom conditional quantile regression to marginal quantile estimation with applications to missing data and causal inference. Journal of Business & Ec onomic Statistics 41 (4), 1377–1390. Shepherd, T. G. (2016) A common framew ork for approac hes to extreme ev ent attribution. Curr ent Climate Change R ep orts 2 (1), 28–38. V an der V aart, A. W. (2000) A symptotic statistics . V olume 3. Cambridge Univ ersit y Press. V an Der V aart, A. W. and W ellner, J. A. (2023) Empirical processes. In W e ak Conver genc e and Empiric al Pr o c esses: With Applic ations to Statistics , pp. 127–384. Springer. V an Olden b orgh, G. J., v an Der Wiel, K., Kew, S., Philip, S., Otto, F., V autard, R., King, A., Lott, F., Arrighi, J., Singh, R. et al. (2021) Path wa ys and pitfalls in extreme ev ent attribution. Climatic Change 166 (1), 13. W ang, H. J., Li, D. and He, X. (2012) Estimation of high conditional quantiles for heavy-tailed distributions. Journal of the A meric an Statistic al A sso ciation 107 (500), 1453–1464. Xu, W., W ang, H. J. and Li, D. (2022) Extreme quantile estimation based on the tail single-index mo del. Statistic a Sinic a 32 (2), 893–914. Zhang, Y. (2018a) Extremal quan tile treatment effects. The A nnals of Statistics 46 (6B), 3707–3740. Zhang, Y. (2018b) Supplemen t to “extremal quan tile treatment effects. ” . Zhang, Z., Chen, Z., T ro endle, J. F. and Zhang, J. (2012) Causal inference on quan tiles with an obstetric application. Biometrics 68 (3), 697–706. 29
Original Paper
Loading high-quality paper...
Comments & Academic Discussion
Loading comments...
Leave a Comment