Panel Quantile Regression with Common Shocks

This paper develops an asymptotic and inferential theory for fixed-effects panel quantile regression (FEQR) that delivers inference robust to pervasive common shocks. Such shocks induce cross-sectional dependence that is central in many economic and …

Authors: Harold D. Chiang, Antonio F. Galvao, Chia-Min Wei

Panel Quantile Regression with Common Shocks∗

Harold D. Chiang†, Antonio F. Galvao‡, Chia-Min Wei§

February 24, 2026

Abstract

This paper develops an asymptotic and inferential theory for fixed-effects panel quantile regression (FEQR) that delivers inference robust to pervasive common shocks. Such shocks induce cross-sectional dependence that is central in many economic and financial panels but largely ignored in existing FEQR theory, which typically assumes cross-sectional independence and requires $T \gg N$. We show that the standard FEQR estimator remains asymptotically normal under the mild condition $(\log N)^2/T \to 0$, thereby accommodating empirically relevant regimes, including those with $T \ll N$. We further show that common shocks fundamentally alter the asymptotic covariance structure, rendering conventional covariance estimators inconsistent, and we propose a simple covariance estimator that remains consistent both in the presence and absence of common shocks. The proposed procedure therefore provides valid robust inference without requiring prior knowledge of the dependence structure, substantially expanding the applicability of FEQR methods in realistic panel data settings.

∗ First arXiv version: 20 February 2026. We are grateful to Bruce E. Hansen for helpful comments and suggestions. All the remaining errors are ours.
† Harold D. Chiang: hdchiang@wisc.edu. Department of Economics, University of Wisconsin-Madison, William H. Sewell Social Science Building, 1180 Observatory Drive, Madison, WI 53706, USA.
‡ Antonio F. Galvao: agalvao@msu.edu. Department of Economics, Michigan State University, 110 Marshall-Adams Hall, 486 W. Circle Drive, East Lansing, MI 48824, USA.
§ Chia-Min Wei: cwei69@wisc.edu. Department of Economics, University of Wisconsin-Madison, William H. Sewell Social Science Building, 1180 Observatory Drive, Madison, WI 53706, USA.

1 Introduction

Quantile regression (Koenker and Bassett, 1978) offers a flexible framework for analysing heterogeneity across the conditional distribution of an outcome and is particularly valuable for studying tail behaviour and outcomes with heavy tails or outliers. In many economic and financial applications, data are naturally organised as panels, making panel quantile regression with individual fixed effects (FEQR) an important tool for modelling distributional heterogeneity while controlling for unobserved unit-specific heterogeneity (Koenker, 2004). A salient empirical characteristic of such panels, however, is the prevalence of strong common shocks.

Common shocks, that is, aggregate disturbances or latent factors that affect many cross-sectional units within a given period, are a central and recurring feature of economic and financial data. Much empirical work is built around this structure; for example, asset-pricing frameworks such as the two-pass procedure emphasise time variation in risk premia driven by economy-wide conditions rather than purely idiosyncratic firm shocks (Fama and MacBeth, 1973). More broadly, macroeconomic announcements, monetary policy changes, business-cycle fluctuations, regulatory reforms, and shifts in global risk sentiment generate substantial comovement across units, so regression disturbances typically display cross-sectional dependence even after conditioning on observables.
As emphasised in Andrews (2003)¹, this pervasiveness reflects the fact that key determinants of behaviour (interest rates, inflation, financial conditions, policy actions, institutional and legal changes, political events, environmental and health shocks, and technological change) operate at an aggregate level, simultaneously influencing wages, consumption, investment, costs, wealth, and credit conditions for large sets of agents. Although exposure varies with sector, location, or balance-sheet characteristics, these forces link units through shared systemic influences, and with deepening economic integration their transmission has become more widespread. Hence, cross-sectional independence is rarely a credible assumption in practice; common shocks are a structural feature of economic environments rather than an exceptional complication.

¹ A working paper version of Andrews (2005) that contains extra justifications on common shocks.

These features have first-order econometric consequences. With common shocks, the error process violates cross-sectional independence and often lies outside the weak-dependence frameworks underlying classical panel asymptotics. Driscoll and Kraay (1998) show that conventional covariance estimators are generally inconsistent under such dependence and propose standard errors that are robust to broad forms of cross-sectional and temporal correlation. In empirical finance settings, Petersen (2008) demonstrates that ignoring common time effects leads to substantial size distortions, with severely overstated t-statistics. This concern is especially acute in panels with fairly large $N$ but moderately large $T$, where limited time variation restricts information about aggregate components.

Against this background, the theoretical literature on FEQR has progressed considerably.
Koenker (2004) first studies the estimation problem of a linear quantile regression model with individual fixed effects, proposing an $\ell_1$-regularised estimator to mitigate the incidental parameter problem and analysing its asymptotic properties. Canay (2011) proposes a two-step estimator that removes fixed effects under a location-shift restriction, yielding a computationally simple procedure with standard large-$N$, large-$T$ behaviour. Without imposing a location shift, Kato, Galvao, and Montes-Rojas (2012) develop a general asymptotic theory for Koenker-type FEQR with individual specific effects, establishing consistency and asymptotic normality under joint growth of $N$ and $T$ subject to a long-panel rate condition such as $N^2(\log N)^3/T \to 0$, and highlighting the challenges posed by incidental parameters and the non-smooth objective. Galvao and Kato (2016) introduce a kernel-smoothed version of the quantile objective to enable higher-order expansions, characterise the incidental-parameter bias, and propose analytic bias correction. More recently, Galvao, Gu, and Volgushev (2020) obtain asymptotically unbiased normal approximations under a weaker long-panel requirement, $N(\log T)^2/T \to 0$, which is standard in the nonlinear panel data models literature (see, e.g., Arellano and Hahn (2007), Arellano and Bonhomme (2011), and Fernández-Val and Weidner (2018)). Nevertheless, these results still rely on regimes in which the time dimension dominates the cross section, effectively requiring $T \gg N$, which limits applicability in many economic and financial panels.
Other recent approaches include Machado and Santos Silva (2019), who consider a location-scale shift effect for the individual effects while adopting a location-scale shift specification also for the slope parameter, and Gu and Volgushev (2019) and Zhang, Wang, Lian, and Li (2023), who investigate the estimation of panel quantile regression models with group effects.²

² The panel quantile literature offers different identification and estimation strategies, including penalized fixed effects, random effects, correlated random effects, group effects, instrumental variables, and factor models (see, e.g., Koenker (2004); Abrevaya and Dahl (2008); Lamarche (2010); Kim and Yang (2011); Chetverikov, Larsen, and Palmer (2016); Arellano and Bonhomme (2016); Graham, Hahn, Poirier, and Powell (2018); Demetrescu, Hosseinkouchack, and Rodrigues (2023)).

Despite its increasing empirical relevance, the theory of fixed-effects panel quantile regression still exhibits two notable gaps. First, most existing results abstract from common shocks and therefore do not account for the cross-sectional dependence that is inherent in many economic and financial panels. In the linear regression context, theoretical analysis of common time effects in cross-sectional regressions was developed by Andrews (2005). Comparable results, however, appear to be unavailable for panel quantile regression models. Second, even the most advanced analyses rely on stringent long-panel conditions such as $N(\log T)^2/T \to 0$ (Galvao et al., 2020), effectively requiring the time dimension to dominate, a requirement that is rarely satisfied in empirical applications.

In this paper, we develop a novel framework for FEQR models that accommodates common time effects while preserving asymptotic normality as both $N$ and $T$ diverge, thereby covering a wide range of asymptotic regimes, including $N \asymp T$ and $T \ll N$.
The inclusion of time effects alters both the convergence rate and the form of the asymptotic covariance, rendering existing inference procedures for conventional FEQR inconsistent. This relaxation of conventional rate restrictions is driven by a phenomenon we term data-generating process (DGP)-induced smoothing: conditional averaging over latent shocks inherently smooths the empirical criterion and concentrates its stochastic variation around a smooth component, thereby enabling refined stochastic expansions. Exploiting this structure, we establish uniform convergence and asymptotic normality of the standard unregularised FEQR estimator. In addition, we propose a new simple robust covariance estimator that is consistent under both common-shock and classical FEQR settings. Consequently, valid inference does not require prior knowledge of whether common shocks are present. These results contribute to the broader literature on nonlinear panel models with large panels ($N, T \to \infty$).

We note that another branch of the literature considers factor models for panel quantile regression. Chen, Dolado, and Gonzalo (2021) develop identification and estimation of a quantile factor model, allowing for interaction between individual and time effects. The method requires panels with large $N$ and $T$. The broader literature on factor-augmented panel quantile regression models includes different approaches. Ando and Bai (2020) use PCA for the estimation of a model with heterogeneous parameters. Ma, Linton, and Gao (2021) study the estimation and inference of a semiparametric quantile factor model. Harding, Lamarche, and Pesaran (2020) consider the estimation of a dynamic quantile regression model with factors. Belloni, Chen, Padilla, and Wang (2023) introduce a high-dimensional model of latent heterogeneity and establish a series of asymptotic results under relatively mild conditions.
Chen (2024) proposes a two-step estimator for models with interactive fixed effects. Our paper differs from this line of work in that we do not impose an explicit factor structure. Instead, within a classical linear quantile regression specification with individual fixed effects first proposed by Koenker (2004), we allow time-specific common shocks to enter the regressors and the error term in a highly flexible and unspecified manner. This modelling choice preserves the familiar and interpretable fixed-effects quantile regression structure while accommodating rich forms of cross-sectional dependence. The trade-off is that we do not attempt to identify or recover an underlying factor or interactive fixed-effects structure. In our view, this approach strikes a reasonable balance between interpretability and flexibility. At the same time, it remains closely aligned with specifications that are widely utilised by empirical researchers and is simple and straightforward to implement computationally.

The remainder of the paper is organised as follows. Section 2 presents the model and estimation procedure. In Section 3 we formalise the main statistical properties of the estimator. Section 4 provides practical inference procedures and establishes their validity. Section 5 provides Monte Carlo simulations. Finally, Section 6 concludes.

Notations

Let $\mathbb{N}$ denote the set of positive integers and $\mathbb{R}$ the real line. For a finite-dimensional vector $a$, let $\|a\|$ denote its Euclidean norm, and for a matrix $A$, let $A'$ denote its transpose. Throughout the remainder of the paper, all asymptotic statements refer to regimes in which both $N$ and $T$ diverge to infinity. Throughout the paper, the probability spaces we consider are assumed to be Borel-measurable.

2 Model and Estimation

We begin by introducing the modelling framework; further motivation is provided in Remark 1 below.
Consider the data-generating process

$$(Y_{it}, X_{it}') = g(A_i, B_t, U_{it}), \tag{1}$$

where the regressor vector $X_{it}$ excludes an intercept. The latent variables $\{A_i\}_{i \in \mathbb{N}}$, $\{B_t\}_{t \in \mathbb{N}}$, and $\{U_{it}\}_{(i,t) \in \mathbb{N}^2}$ are mutually independent and unobserved by the econometrician, and $g : \mathcal{A} \times \mathcal{B} \times \mathcal{U} \to \mathbb{R}^{p+1}$ is an unknown Borel-measurable function. For simplicity, the common time shocks $B_t \in \mathcal{B}$ are assumed i.i.d. across $t$, and the idiosyncratic shocks $U_{it} \in \mathcal{U}$ are i.i.d. across $(i, t)$. No identical sampling or particular dependence structure is imposed on the unit-specific effects $A_i \in \mathcal{A}$. We impose no smoothness-type restrictions on the function $g$ with respect to these latent shocks. Moreover, the spaces $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{U}$ are allowed to be high-dimensional and structurally complex.

In the spirit of Koenker (2004, Equation (2.2)), we specify a quantile regression model with individual fixed effects:

$$Q_\tau(Y_{it} \mid X_{it}, A_i) = \alpha_0(\tau; A_i) + X_{it}'\beta_0(\tau). \tag{2}$$

Here, similar to Kato et al. (2012), the fixed effects are allowed to vary across quantile levels. For the purpose of asymptotic analysis, we condition on a realisation of the individual effects, $\{A_i\}_{i=1}^N = \{a_i\}_{i=1}^N$. Under this conditioning, the model can be written as

$$Y_{it} = \alpha_{i0}(\tau) + X_{it}'\beta_0(\tau) + \epsilon_{it}, \qquad Q_\tau(\epsilon_{it} \mid X_{it}, a_i) = 0,$$

where $\alpha_{i0}(\tau) = \alpha_0(\tau; a_i)$. As a consequence, the regressor-error pair admits the representation

$$(X_{it}, \epsilon_{it}) = g_i(B_t, U_{it}), \tag{3}$$

where $g_i(b, u) = g(a_i, b, u)$. We impose no parametric restrictions on the relationship between the individual fixed effects $\alpha_{i0}(\tau)$ and the regressors $X_{it}$; in particular, the fixed effects are allowed to vary flexibly with $\tau$.
Moreover, under this framework, common shocks are permitted to exert heterogeneous effects across population units, with the magnitude and direction of these effects depending on unit-specific characteristics, which may be either observed or unobserved.

Remark 1 (The rationale behind the structures of Equations (1) and (3)). The nonparametric representation in (3) is closely related to Andrews (2005), with similar modelling frameworks considered in Chernozhukov, Fernández-Val, Hahn, and Newey (2013) and Chernozhukov, Deaner, Gao, Hausman, and Newey (2025) in panel data contexts. Fundamentally, it is highly general and can be viewed as an application of de Finetti's theorem under exchangeability. In particular, for each fixed $i$, de Finetti's theorem implies that exchangeable observations are conditionally independent given a latent variable, yielding a representation of the form in (3); see Lemma 2 in Section 7 of Andrews (2005). Exchangeability arises naturally in a wide range of economic models. For example, in dynamic oligopoly models with investment, Athey and Schmutzler (2001) show that exchangeable profit functions are consistent with standard environments such as Cournot and differentiated-product competition, reflecting a symmetry restriction whereby firms' payoffs depend on rivals' actions and states but not their identities. This symmetry is closely related to the notion of anonymity in cooperative game theory and social choice theory (e.g. Moulin 1991). Similarly, in differentiated product markets, Berry, Levinsohn, and Pakes (1995) argue that demand and cost functions are exchangeable in competitors' characteristics and show that equilibrium uniqueness implies exchangeability in both observed and unobserved components.
Likewise, the factor-type representation in (1) is motivated by the literature on exchangeable arrays and network-type dependence (Bickel, Chen, and Levina, 2011; Graham, 2024), as well as by work on two-way clustering and dependence structures in panel data (Menzel, 2021; Davezies, D'Haultfœuille, and Guyonvarch, 2021; Chiang, Hansen, and Sasaki, 2024; Athey and Imbens, 2025). Under a suitable exchangeability condition, the existence of such data-generating processes is guaranteed by the Aldous–Hoover representation theorem (see Chapter 7 of Kallenberg, 2005). ♢

Given data $\{(Y_{it}, X_{it}') : i = 1, \dots, N,\ t = 1, \dots, T\}$, we jointly estimate the common parameter $\beta_0(\tau)$ and the fixed effects $\{\alpha_{i0}(\tau)\}_{i=1}^N$ using the standard Koenker-type FEQR estimator (Koenker, 2004) without regularisation:

$$(\hat{\boldsymbol{\alpha}}(\tau), \hat{\beta}(\tau)) = \arg\min_{\{\alpha_i\}_{i=1}^N,\ \beta}\ \frac{1}{NT} \sum_{i=1}^N \sum_{t=1}^T \rho_\tau\bigl(Y_{it} - \alpha_i - X_{it}'\beta\bigr), \tag{4}$$

where $\rho_\tau(u) = u\,(\tau - 1\{u < 0\})$ is the check loss function. Unless stated otherwise, we fix $\tau \in (0, 1)$ throughout and suppress its dependence in the notation.

From a computational perspective, the FEQR estimator is straightforward to implement using existing optimisation routines. The objective function is convex in $(\boldsymbol{\alpha}, \beta)$, so estimation reduces to a standard linear programming problem. Consequently, the estimator can be readily computed using widely available quantile regression packages that accommodate fixed effects or high-dimensional controls, such as quantreg in R, statsmodels and cvxpy in Python, as well as built-in procedures in standard econometric software. Modern algorithms for large-scale convex optimisation make it feasible to handle panels with a large number of cross-sectional units, ensuring practical scalability in empirically relevant applications.
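The linear programming formulation of (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it swaps in `scipy.optimize.linprog` with the HiGHS solver rather than any of the packages named above, the function names `check_loss`, `feqr_objective` and `feqr_fit` are hypothetical, and the toy panel at the bottom is an invented design. Each residual is split into non-negative parts $u^+_{it}$ and $u^-_{it}$, which are penalised by $\tau$ and $1-\tau$ respectively.

```python
import numpy as np
from scipy import sparse
from scipy.optimize import linprog


def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0}), applied elementwise."""
    return u * (tau - (u < 0))


def feqr_objective(y, x, alpha, beta, tau):
    """Sample objective in (4) for y of shape (N, T) and x of shape (N, T, p)."""
    resid = y - alpha[:, None] - x @ beta
    return check_loss(resid, tau).mean()


def feqr_fit(y, x, tau):
    """Solve (4) as a linear programme; returns (alpha_hat, beta_hat)."""
    n, t, p = x.shape
    m = n * t
    y_vec = y.reshape(m)          # unit-major stacking: row index is i * T + t
    x_mat = x.reshape(m, p)
    # Unit dummies: row i * T + t has a one in column i.
    dummies = sparse.kron(sparse.identity(n), np.ones((t, 1)), format="csr")
    eye = sparse.identity(m, format="csr")
    # Equality constraints: y = alpha_i + x'beta + u_plus - u_minus.
    a_eq = sparse.hstack(
        [dummies, sparse.csr_matrix(x_mat), eye, -eye], format="csr"
    )
    # Cost: tau on the positive residual parts, (1 - tau) on the negative parts.
    cost = np.concatenate(
        [np.zeros(n + p), np.full(m, tau), np.full(m, 1.0 - tau)]
    )
    # alpha and beta are free; the residual parts are non-negative.
    bounds = [(None, None)] * (n + p) + [(0, None)] * (2 * m)
    res = linprog(cost, A_eq=a_eq, b_eq=y_vec, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.x[n:n + p]


if __name__ == "__main__":
    # Hypothetical toy panel with a common time shock in the errors.
    rng = np.random.default_rng(0)
    n, t = 20, 40
    alpha0 = rng.uniform(0.0, 1.0, size=n)
    x = rng.chisquare(3, size=(n, t, 1)) + 0.3 * alpha0[:, None, None]
    common = rng.normal(size=t)                       # one draw per period
    errors = (rng.normal(size=(n, t)) + common) / np.sqrt(2.0)
    y = alpha0[:, None] + 1.0 * x[:, :, 0] + errors
    alpha_hat, beta_hat = feqr_fit(y, x, tau=0.5)
    print("beta_hat:", beta_hat)
```

The sparse constraint matrix keeps memory roughly proportional to $NT$, which matters when $N$ is large; for very large panels a dedicated quantile regression routine is likely to be faster than a generic LP solver.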
3 Asymptotic Theory

In this section, we study the theoretical properties of the FEQR estimator under the common shock framework. We denote by $F_i$ and $f_i$ the distribution function and density of $\epsilon_{it} = g_i^{\epsilon}(B_t, U_{it})$. Further, let $F_i(\cdot \mid x)$ (resp. $F_i(\cdot \mid b)$) and $f_i(\cdot \mid x)$ (resp. $f_i(\cdot \mid b)$) be the conditional distribution function and density of $\epsilon_{i1}$ given $X_{i1} = x$ (resp. $B_1 = b$), and let $F_i(\cdot \mid x, b)$ and $f_i(\cdot \mid x, b)$ denote the corresponding conditional distribution function and density given $X_{i1} = x$ and $B_1 = b$. Let $\mathcal{X}_i$ and $\mathcal{E}_i$ denote the supports of $X_{i1}$ and $\epsilon_{i1}$, and write $\mathcal{X} = \bigcup_{i \ge 1} \mathcal{X}_i \subset \mathbb{R}^p$ and $\mathcal{E} = \bigcup_{i \ge 1} \mathcal{E}_i \subset \mathbb{R}$.

3.1 Uniform Consistency

We first present the assumptions needed for uniform consistency.

Assumption 1 (Bounded regressors). $\mathcal{X}$ is a bounded set in $\mathbb{R}^p$ for a fixed $p$.

Assumption 2 (Identification). For each $\delta > 0$,

$$\epsilon_\delta := \inf_{i \ge 1}\ \inf_{|\alpha| + \|\beta\|_1 = \delta} E\left[ \int_0^{\alpha + X_{i1}'\beta} \{F_i(s \mid X_{i1}) - \tau\}\, ds \right] > 0,$$

where $\|\cdot\|_1$ is the $\ell_1$ norm.

Assumption 1 is imposed for technical convenience and could in principle be relaxed to sub-Gaussian tail bounds or suitable polynomial moment conditions, albeit at the expense of substantially more involved arguments. This assumption is recurrent in the quantile regression literature; see, for example, Koenker (2004) and Chernozhukov and Hansen (2006). Assumption 2 is a standard identification condition that can be found in, e.g., Fernández-Val (2005) and Kato et al. (2012). Under Assumptions 1 and 2, the following proposition shows that both the individual effects estimators $\hat{\alpha}_i$ and the common parameter estimator $\hat{\beta}$ are uniformly consistent for their true values.

Proposition 1 (Uniform consistency). Under Assumptions 1 and 2, and supposing that $(\log N)^2/T \to 0$, we have

$$\max_{1 \le i \le N} |\hat{\alpha}_i - \alpha_{i0}| \vee \|\hat{\beta} - \beta_0\| \xrightarrow{p} 0.$$

A proof is provided in Section A.1 of the Appendix.
The argument closely parallels that of Theorem 3.1 in Kato et al. (2012), with only minor modifications. In particular, the presence of common shocks does not materially complicate the consistency proof. By contrast, establishing asymptotic normality requires a substantially different line of argument.

3.2 Asymptotic Distribution

To establish asymptotic distributional theory, we impose the following additional assumptions.

Assumption 3 (Joint densities). For each $i$, $(X_{i1}, \epsilon_{i1}, B_1)$ admits a joint density on $\mathcal{X} \times \mathcal{E} \times \mathcal{B}$.

Assumption 4 (Smoothness of densities). For each $i \in \mathbb{N}$,
(a) $f_i(\epsilon \mid x, b)$ is continuously differentiable with respect to $\epsilon$ on $\mathcal{E}_i$ for each $x \in \mathcal{X}_i$;
(b) there exist finite constants $C_f$ and $L_f$ such that $f_i(\epsilon \mid x, b) \le C_f$ and $|\partial f_i(\epsilon \mid x, b)/\partial \epsilon| \le L_f$ uniformly over $(\epsilon, x, b) \in \mathcal{E}_i \times \mathcal{X}_i \times \mathcal{B}$ and $i \in \mathbb{N}$;
(c) $f_i(0 \mid b)$ is bounded from below by some positive constant independent of $i$ and $b$.

Assumptions 3 and 4 are mild, and primarily ensure the existence and sufficient smoothness of the joint density of $(X_{i1}, \epsilon_{i1}, B_1)$. Let us define $\gamma_i = E[f_i(0 \mid X_{i1}) X_{i1}]/f_i(0)$, which plays a key role in the expression of the asymptotic covariance introduced below.

Assumption 5 (Variance with common shocks). Suppose that $\sup_{i \in \mathbb{N}} \|\gamma_i\| < \infty$. Further assume the following matrix exists and is invertible:

$$\Sigma = \lim_{N \to \infty} \mathrm{Var}\left( \frac{1}{N} \sum_{i=1}^N E\bigl[(\tau - 1\{\epsilon_{i1} \le 0\})(X_{i1} - \gamma_i) \mid B_1\bigr] \right). \tag{5}$$

Assumption 5 implicitly requires that the common time effect has a non-negligible impact on the distribution of the data. This condition rules out the classical fixed-effects quantile regression framework analysed in Kato et al. (2012) and subsequent studies, where, after controlling for individual fixed effects, the disturbances are independent across $(i, t)$ pairs, corresponding to a degenerate special case of our setting.
The assumption is conceptually related to the non-degeneracy conditions imposed in the multiway clustering literature; see, for example, Chiang et al. (2024).

Remark 2 (Structure of asymptotic covariance). The term $\Sigma$ constitutes the middle component of the sandwich-form asymptotic covariance of $\hat{\beta}$. In contrast to the corresponding middle component in Kato et al. (2012) and Galvao et al. (2020), which, in our notation, is

$$\Omega = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N \mathrm{Var}\bigl((\tau - 1\{\epsilon_{i1} \le 0\})(X_{i1} - \gamma_i)\bigr), \tag{6}$$

it admits a fundamentally different structural representation. To see this, consider the variance term appearing inside the limit in the definition of $\Sigma$:

$$
\begin{aligned}
&\mathrm{Var}\left( \frac{1}{N} \sum_{i=1}^N E\bigl[(\tau - 1\{\epsilon_{i1} \le 0\})(X_{i1} - \gamma_i) \mid B_1\bigr] \right) \\
&\quad = \frac{1}{N^2} \sum_{i=1}^N \mathrm{Var}\bigl( E[(\tau - 1\{\epsilon_{i1} \le 0\})(X_{i1} - \gamma_i) \mid B_1] \bigr) \\
&\qquad + \frac{1}{N^2} \sum_{i=1}^N \sum_{j \ne i} \mathrm{Cov}\bigl( E[(\tau - 1\{\epsilon_{i1} \le 0\})(X_{i1} - \gamma_i) \mid B_1],\ E[(\tau - 1\{\epsilon_{j1} \le 0\})(X_{j1} - \gamma_j) \mid B_1] \bigr).
\end{aligned}
$$

The distinction is not merely the presence of the cross-sectional covariance terms. Even the first sum differs in structure. By the law of total variance,

$$
\begin{aligned}
\mathrm{Var}\bigl( E[(\tau - 1\{\epsilon_{i1} \le 0\})(X_{i1} - \gamma_i) \mid B_1] \bigr)
&= \mathrm{Var}\bigl((\tau - 1\{\epsilon_{i1} \le 0\})(X_{i1} - \gamma_i)\bigr) \\
&\quad - E\bigl[\mathrm{Var}\bigl((\tau - 1\{\epsilon_{i1} \le 0\})(X_{i1} - \gamma_i) \mid B_1\bigr)\bigr].
\end{aligned}
$$

Unless the conditional variance in the second term on the right-hand side vanishes, the aggregate variance contribution differs from the i.i.d.-type structure in Kato et al. (2012). The common time shocks therefore alter the structure of the covariance term even before accounting for cross-unit covariance. ♢

Assumption 6 (Jacobian matrix). The following matrix exists and is invertible:

$$\Gamma = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N E\bigl[f_i(0 \mid X_{i1})\, X_{i1} (X_{i1} - \gamma_i)'\bigr].$$

Assumption 6 requires existence of the Jacobian matrix and the usual full-rank condition on the bread matrix in the sandwich-form asymptotic covariance.
In contrast to the middle component $\Sigma$ in Assumption 5, $\Gamma$ retains the same structure as its counterpart in classical FEQR theory, as in Kato et al. (2012). Note also that $\Gamma$ is a symmetric matrix: observe that $E[f_i(0 \mid X_{i1})\, \gamma_i (X_{i1} - \gamma_i)'] = 0$ for all $i$, which implies

$$E\bigl[f_i(0 \mid X_{i1})\, X_{i1} (X_{i1} - \gamma_i)'\bigr] = E\bigl[f_i(0 \mid X_{i1})\, (X_{i1} - \gamma_i)(X_{i1} - \gamma_i)'\bigr].$$

Denote the subgradients of the check function evaluated at $(\boldsymbol{\alpha}', \beta')'$ by

$$H^{(1)}_{Ni}(\alpha_i, \beta) = \frac{1}{T} \sum_{t=1}^T \bigl(\tau - 1\{Y_{it} \le \alpha_i + X_{it}'\beta\}\bigr), \qquad H^{(2)}_{N}(\boldsymbol{\alpha}, \beta) = \frac{1}{NT} \sum_{i=1}^N \sum_{t=1}^T \bigl(\tau - 1\{Y_{it} \le \alpha_i + X_{it}'\beta\}\bigr) X_{it}.$$

In classical FEQR environments, such as in Kato et al. (2012), the non-differentiability of the check loss precludes standard Taylor expansions. Combined with the faster $\sqrt{NT}$-rate implied by cross-sectional independence, the empirical process terms arising from differences between subgradients evaluated at the estimator and at the true parameter become difficult to control, thereby typically requiring asymptotic regimes in which $T$ grows substantially faster than $N$. Under the common time-effect structure in (3), however, these non-smooth subgradients admit approximations via suitably defined differentiable projections. This smoothness restores local differentiability, thereby permitting Taylor expansions, substantially simplifying the arguments, and weakening the conditions. Explicitly, let $B^T = \{B_t\}_{t=1}^T$ and define the conditional (on common shocks) projections of the subgradients as

$$\tilde{H}^{(1)}_{Ni}(\alpha_i, \beta) = E\bigl[H^{(1)}_{Ni}(\alpha_i, \beta) \mid B^T\bigr] = \frac{1}{T} \sum_{t=1}^T E\bigl[\tau - 1\{Y_{it} \le \alpha_i + X_{it}'\beta\} \mid B_t\bigr],$$

$$\tilde{H}^{(2)}_{N}(\boldsymbol{\alpha}, \beta) = E\left[ \frac{1}{NT} \sum_{i=1}^N \sum_{t=1}^T \bigl(\tau - 1\{Y_{it} \le \alpha_i + X_{it}'\beta\}\bigr) X_{it} \,\Big|\, B^T \right] = \frac{1}{NT} \sum_{i=1}^N \sum_{t=1}^T E\bigl[(\tau - 1\{Y_{it} \le \alpha_i + X_{it}'\beta\}) X_{it} \mid B_t\bigr].$$
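To make the smoothness of these projections concrete, the following short derivation may help. It is a sketch that uses only the model in (2) and (3) and the conditional distribution notation of this section; the shorthand $\Delta_{it}$ is introduced here purely for illustration and is not notation from the paper, and the interchange of differentiation and expectation is assumed rather than proved (it is the kind of step that the bounds in Assumption 4 are meant to justify).

```latex
% Sketch: the projected score for unit i written through conditional CDFs.
% \Delta_{it} is illustrative shorthand, not notation from the paper.
\[
\Delta_{it}(\alpha_i,\beta) := (\alpha_i-\alpha_{i0}) + X_{it}'(\beta-\beta_0),
\qquad
\{Y_{it}\le \alpha_i + X_{it}'\beta\} = \{\epsilon_{it}\le \Delta_{it}(\alpha_i,\beta)\}.
\]
% Iterated expectations, conditioning first on (X_{it}, B_t):
\[
E\bigl[\tau - 1\{Y_{it}\le \alpha_i + X_{it}'\beta\}\mid B_t\bigr]
= \tau - E\bigl[F_i\bigl(\Delta_{it}(\alpha_i,\beta)\mid X_{it},B_t\bigr)\mid B_t\bigr].
\]
% Differentiating under the expectation at the true parameters, and using
% E[f_i(0 | X_{it}, B_t) | B_t] = f_i(0 | B_t):
\[
\frac{\partial \tilde H^{(1)}_{Ni}}{\partial\alpha_i}(\alpha_{i0},\beta_0)
= -\frac{1}{T}\sum_{t=1}^T f_i(0\mid B_t),
\qquad
\frac{\partial \tilde H^{(1)}_{Ni}}{\partial\beta'}(\alpha_{i0},\beta_0)
= -\frac{1}{T}\sum_{t=1}^T E\bigl[f_i(0\mid X_{it},B_t)\,X_{it}'\mid B_t\bigr].
\]
```

The indicator has been replaced by a conditional distribution function, which is differentiable in $(\alpha_i, \beta)$ whenever the conditional density exists. The two derivatives are the coefficients that appear in the linear expansion of $\hat{\alpha}_i - \alpha_{i0}$ discussed in Remark 4.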
Conditioning on $B^T$ removes the non-smoothness stemming from the indicator function at the level of the stochastic fluctuations. The projected objects $\tilde{H}^{(1)}_{Ni}$ and $\tilde{H}^{(2)}_{N}$ are expectations of indicator functions and therefore can be expressed in terms of conditional distribution functions of $Y_{it}$ given $B_t$. Under mild regularity conditions, such as conditional smoothness of these distributions and bounded densities, the projections approximate the original subgradients uniformly as $N$ diverges while, at the same time, being smooth functions of $(\alpha_i, \beta)$. This structure permits standard Taylor-type expansions around the true parameters despite the non-smoothness of the original objective function. While it creates a limiting distribution driven by the common shocks, it yields a simple and general asymptotic theory. We refer to this phenomenon as DGP-induced smoothing: conditional averaging over the common time factors smooths the empirical criterion through the randomness of the idiosyncratic components, resulting in stochastic fluctuations of order $\sqrt{T}$ around its mean. This mechanism is central to obtaining refined stochastic expansions and, ultimately, to establishing asymptotic normality under panel asymptotic regimes that allow $N$ to be large relative to $T$.

We now present the main asymptotic distributional result.

Theorem 1 (Asymptotic distribution). Suppose Assumptions 1–6 hold and suppose that $(\log N)^2/T \to 0$. Then

$$\sqrt{T}\,(\hat{\beta} - \beta_0) \xrightarrow{d} N(0, V),$$

where $V = \Gamma^{-1} \Sigma \Gamma^{-1}$.

A proof can be found in Section A.2 in the appendix. Some remarks are in order.

Remark 3 (Comparison with classical FEQR results). Compared with the asymptotic normality results for FEQR obtained in the absence of time effects, such as Kato et al. (2012) and Galvao et al. (2020), several important differences arise.
First, the presence of pervasive common shocks changes the effective convergence rate from $\sqrt{NT}$ to $\sqrt{T}$. Despite the slower rate, this is not necessarily detrimental, as the bias induced by the incidental parameter problem is attenuated in our setting and has a less pronounced impact on inference. Furthermore, standard inference procedures for FEQR therefore do not deliver a consistent estimate of the asymptotic covariance. Second, because the projected objective is smooth, Taylor expansion arguments become available. As a consequence, the resulting linear representation, and hence the asymptotic covariance, explicitly incorporates the effect of estimating the fixed effects. Finally, under the classical setup, even the refined analysis in Galvao et al. (2020) imposes the restriction $N(\log T)^2/T \to 0$. By comparison, our requirement $(\log N)^2/T \to 0$ is markedly weaker, thereby allowing for a wide class of asymptotic sequences with $T \ll N$. ♢

Remark 4 (Intuition behind Theorem 1). Here we provide intuition for the proof. The key step is to work with the projected score, obtained by conditioning on the common time effects:

$$\tilde{H}^{(1)}_{Ni}(\alpha_i, \beta) = \frac{1}{T} \sum_{t=1}^T E\bigl[\tau - 1\{Y_{it} \le \alpha_i + X_{it}'\beta\} \mid B_t\bigr].$$

We show that this projected score is uniformly close to the original score $H^{(1)}_{Ni}(\alpha_i, \beta)$. Furthermore, under mild conditions, $\tilde{H}^{(1)}_{Ni}(\alpha_i, \beta)$ is a differentiable function of $(\alpha_i, \beta)$, even though the original score is non-differentiable. This DGP-induced smoothing permits a Taylor expansion around the true parameter, in contrast to Equation (A.6) of Kato et al. (2012), where the absence of smoothness precludes such a linearisation:

$$
\begin{aligned}
\hat{\alpha}_i - \alpha_{i0}
&= \left( \frac{1}{T} \sum_{t=1}^T f_i(0 \mid B_t) \right)^{-1} \tilde{H}^{(1)}_{Ni}(\alpha_{i0}, \beta_0) \\
&\quad - \left( \frac{1}{T} \sum_{t=1}^T f_i(0 \mid B_t) \right)^{-1} \left( \frac{1}{T} \sum_{t=1}^T E\bigl[f_i(0 \mid X_{it}, B_t)\, X_{it}' \mid B_t\bigr] \right) (\hat{\beta} - \beta_0) + \text{higher-order terms}.
\end{aligned}
$$

Such an expansion is not available in earlier FEQR analyses, where the lack of smoothness of the objective necessitated a coarser rate based on concentration inequalities. The projected-score approach therefore replaces non-smooth empirical process arguments with a smooth stochastic expansion. Together with the $\sqrt{T}$-asymptotic normality induced by the common shocks, this permits substantially weaker restrictions on the relative growth rates of $N$ and $T$. ♢

Remark 5 (Weakly correlated common shocks). Serial correlation in the common time effects under stationary, weak dependence (e.g. $\beta$-mixing) can also be accommodated, albeit at the cost of additional technical work. The differences are largely technical, requiring standard decoupling and blocking arguments to handle temporal dependence. We leave a full treatment to future research. ♢

4 Statistical Inference

This section suggests simple practical methods for inference for the quantile model with common shocks. To utilise Theorem 1 for statistical inference, one needs a consistent estimator of the covariance matrix $V$. We now introduce the new covariance estimator and establish its asymptotic guarantees. For a kernel function (a probability density function) $K : \mathbb{R} \to \mathbb{R}$, denote $K_h(u) = h^{-1} K(u/h)$ and $\hat{\epsilon}_{it} = Y_{it} - \hat{\alpha}_i - X_{it}'\hat{\beta}$. We define

$$\hat{m}_{Nt} = \frac{1}{N} \sum_{i=1}^N \bigl(\tau - 1\{\hat{\epsilon}_{it} \le 0\}\bigr)(X_{it} - \hat{\gamma}_i), \qquad \bar{m}_{NT} = \frac{1}{T} \sum_{t=1}^T \hat{m}_{Nt}. \tag{7}$$

We estimate $\Sigma$ and $\Gamma$ by

$$\hat{\Sigma}_N = \frac{1}{T} \sum_{t=1}^T (\hat{m}_{Nt} - \bar{m}_{NT})(\hat{m}_{Nt} - \bar{m}_{NT})', \qquad \hat{\Gamma}_N = \frac{1}{NT} \sum_{i=1}^N \sum_{t=1}^T K_h(\hat{\epsilon}_{it})\, X_{it} (X_{it} - \hat{\gamma}_i)', \tag{8}$$

where

$$\hat{\gamma}_i = \frac{1}{\hat{f}_i T} \sum_{t=1}^T K_h(\hat{\epsilon}_{it})\, X_{it}, \qquad \hat{f}_i = \frac{1}{T} \sum_{t=1}^T K_h(\hat{\epsilon}_{it}).$$
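The estimators in (7) and (8) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names `gaussian_kernel` and `robust_covariance` are hypothetical, the Gaussian kernel is one convenient choice of a continuous, bounded density, the bandwidth `h` is left to the user, and the product in $\hat{\Gamma}_N$ is read as the outer product $X_{it}(X_{it} - \hat{\gamma}_i)'$, in line with Assumption 6. The sandwich form follows $V = \Gamma^{-1}\Sigma\Gamma^{-1}$ from Theorem 1.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def robust_covariance(resid, x, tau, h):
    """Sandwich covariance built from the estimators in (7) and (8).

    resid: (N, T) array of FEQR residuals; x: (N, T, p) array of regressors.
    Returns (v_hat, sigma_hat, gamma_hat), each of shape (p, p).
    """
    n, t, p = x.shape
    k = gaussian_kernel(resid / h) / h                  # K_h(resid), shape (N, T)
    f_hat = k.mean(axis=1)                              # unit-level density at zero
    gamma_i = (k[:, :, None] * x).mean(axis=1) / f_hat[:, None]   # (N, p)
    x_c = x - gamma_i[:, None, :]                       # X_it minus gamma_hat_i
    score = (tau - (resid <= 0))[:, :, None] * x_c      # (N, T, p)
    m_t = score.mean(axis=0)                            # cross-sectional average per period
    m_c = m_t - m_t.mean(axis=0)                        # centre over time
    sigma_hat = m_c.T @ m_c / t                         # middle matrix
    gamma_hat = np.einsum("it,itj,itk->jk", k, x, x_c) / (n * t)  # bread matrix
    gamma_inv = np.linalg.inv(gamma_hat)
    v_hat = gamma_inv @ sigma_hat @ gamma_inv
    return v_hat, sigma_hat, gamma_hat


if __name__ == "__main__":
    # Hypothetical inputs standing in for residuals and regressors of a fitted model.
    rng = np.random.default_rng(1)
    resid = rng.normal(size=(50, 30))
    x = rng.normal(size=(50, 30, 2))
    v_hat, _, _ = robust_covariance(resid, x, tau=0.5, h=0.5)
    # Standard errors implied by the sqrt(T) rate in Theorem 1.
    print(np.sqrt(np.diag(v_hat) / x.shape[1]))
```

Because Theorem 1 is stated for $\sqrt{T}(\hat{\beta} - \beta_0)$, standard errors for the components of $\hat{\beta}$ are obtained as the square roots of the diagonal of $\hat{V}/T$, as in the last line of the sketch. Averaging the score across units within each period before taking the variance over time is what lets the estimator pick up cross-sectional dependence induced by $B_t$.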
Hence, the asymptotic covariance in Theorem 1 can be estimated by the robust covariance estimator:
\[
\hat V = \hat\Gamma_N^{-1}\,\hat\Sigma_N\,\hat\Gamma_N^{-1}. \tag{9}
\]
We emphasise that the estimator in (9) differs from the standard covariance estimators commonly used in the literature; see, for example, Kato et al. (2012). As discussed in Remark 2, the presence of common shocks fundamentally alters the structure of the asymptotic covariance relative to the classical setting with cross-sectional independence. Consequently, conventional covariance estimators are generally inconsistent in this environment and lead to invalid inference. We illustrate this discrepancy in the simulation study reported in Section 5.

The following result establishes the asymptotic properties of the proposed covariance estimator.

Theorem 2 (Robust covariance estimation). Suppose the kernel $K$ is continuous, bounded, and of bounded variation on the real line, and the bandwidth sequence $h$ satisfies $h \to 0$ and $(\log N)/(Th) \to 0$ as $N, T \to \infty$. Then:
(i) Under the assumptions of Theorem 1 and $\log T / N \to 0$, it holds that $\hat V \xrightarrow{p} V$.
(ii) In the absence of common shocks $\{B_t\}_t$, under the assumptions of Theorem 1 in Galvao et al. (2020), it holds that $N\hat V \xrightarrow{p} \tau(1-\tau)\Gamma^{-1}\Omega\Gamma^{-1}$, where $\Omega$ is as defined in (6).

A proof can be found in Section A.3. A couple of remarks are in order.

Remark 6 (Estimating $\Sigma$). Recall that $\Sigma$ defined in (5) contains the conditional expectation $E[(\tau - \mathbf{1}\{\epsilon_{i1} \le 0\})(X_{i1} - \gamma_i) \mid B_1]$. To estimate it, in addition to replacing $\epsilon_{it}$ and $\gamma_i$ by their estimators, we also need to approximate the conditional expectation given the common shock $B_1$. Direct estimation of this object is challenging because $B_t$ is unobserved and may be high-dimensional, which makes standard nonparametric estimation impractical.
Fortunately, for each fixed $t$, all cross-sectional units share the same realisation of $B_t$. This allows the conditional expectation given $B_t$ to be consistently approximated by its cross-sectional average over $i = 1, \ldots, N$, thereby avoiding the need to estimate it nonparametrically. ♢

Remark 7 (Robustness of the covariance estimator). Theorem 2 implies that, even in the absence of common shocks, inference based on the proposed variance estimator $\hat V$ remains valid under the classical fixed effects quantile regression (FEQR) framework of Galvao et al. (2020). In particular, valid inference does not require the practitioner to determine ex ante whether common shocks are present in the data-generating process. For this reason, we recommend the use of our robust covariance estimator of (9) in empirical applications in place of existing alternatives. ♢

5 Monte Carlo Simulations

We conduct Monte Carlo simulations to examine the finite-sample performance of the fixed effects quantile regression estimator and to evaluate the accuracy of the proposed variance estimator in the presence of common time shocks. The simulation design reflects empirically relevant panel dimensions with a large cross-sectional dimension and a moderate time dimension.

We consider the following location-scale shift model, similar to Kato et al. (2012):
\[
Y_{it} = \alpha_i + \beta X_{it} + (1 + \gamma X_{it})U_{it}, \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T, \tag{10}
\]
where $\{\alpha_i\}_{i=1}^{N}$ are individual fixed effects and $X_{it}$ is a scalar regressor. The regressors are generated according to
\[
X_{it} = \chi^2_{it}(3) + 0.3\,\alpha_i,
\]
where $\chi^2_{it}(3)$ are independent chi-square random variables with three degrees of freedom, and $\alpha_i \sim \mathrm{Uniform}(0,1)$. The error term is generated as
\[
U_{it} = \frac{\varepsilon_{it} + \eta_t}{\sqrt{2}},
\]
where the idiosyncratic component $\varepsilon_{it} \sim N(0,1)$ and common shock $\eta_t \sim N(0,1)$ are mutually independent and i.i.d. across $i$ and $t$. The common shock component $\eta_t$ induces cross-sectional dependence while preserving the conditional quantile structure.

Table 1: Monte Carlo performance of the FEQR estimator: bias and RMSE

                 τ = 0.25             τ = 0.50             τ = 0.75
(N, T)         Bias     RMSE        Bias     RMSE        Bias     RMSE
(250, 25)     0.0034   0.0342     -0.0004   0.0323     -0.0043   0.0344
(250, 50)     0.0027   0.0248      0.0007   0.0232     -0.0023   0.0244
(500, 25)     0.0035   0.0325     -0.0009   0.0311     -0.0033   0.0325
(500, 50)     0.0004   0.0235     -0.0006   0.0230     -0.0053   0.0271
(1000, 25)    0.0027   0.0319     -0.0009   0.0315     -0.0048   0.0334
(1000, 50)    0.0019   0.0227      0.0005   0.0216     -0.0015   0.0227

Notes: Entries report the Monte Carlo bias $E[\hat\beta(\tau)] - \beta(\tau)$ and RMSE $\sqrt{E[(\hat\beta(\tau) - \beta(\tau))^2]}$ of the Koenker-type FEQR estimator at quantile index $\tau$. The true coefficient is $\beta(\tau) = \beta + \gamma q_\tau$. Results are based on 2,000 Monte Carlo replications under the DGP in Section 5.

We focus on parameters $\beta = 1$, $\gamma = 0.2$, and estimate the model for three quantile levels $\tau \in \{0.25, 0.50, 0.75\}$. Under this specification, the true quantile slope coefficient equals $\beta(\tau) = \beta + \gamma q_\tau$, where $q_\tau$ denotes the $\tau$-quantile of the standard normal distribution.

The panel dimensions are set to $N \in \{250, 500, 1{,}000\}$, $T \in \{25, 50\}$, and each experiment is repeated over 2,000 Monte Carlo replications. For each simulated sample, we estimate the FEQR model using the standard Koenker-type estimator as defined in (4). We consider two covariance estimators:
(i) The conventional sandwich variance estimator from Kato et al. (2012).
(ii) The proposed robust covariance estimator of (9).
In both covariance estimators, a Gaussian kernel is used with the bandwidth chosen according to Silverman's rule-of-thumb, $h = 1.06 \cdot \mathrm{sd}(\hat\epsilon) \cdot N^{-1/5}$, while imposing a lower bound of 0.05 to prevent undersmoothing and ensure numerical stability in finite samples.
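The design above can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions rather than the authors' simulation code: the function names `simulate_panel`, `true_slope` and `silverman_bandwidth` are hypothetical, the chi-square(3) draw is produced as a Gamma variate with shape 1.5 and scale 2, `sd` is taken to be the sample standard deviation of the pooled residuals, and the argument `n` stands for the $N$ in the bandwidth formula.

```python
import math
import random
from statistics import NormalDist, stdev


def simulate_panel(n, t_len, beta=1.0, gamma=0.2, seed=0):
    """Draw one panel (y, x, alpha) from the location-scale model (10)."""
    rng = random.Random(seed)
    alpha = [rng.random() for _ in range(n)]            # alpha_i ~ Uniform(0, 1)
    eta = [rng.gauss(0.0, 1.0) for _ in range(t_len)]   # common shock, shared by all i
    x, y = [], []
    for i in range(n):
        # chi-square(3) equals Gamma(shape=1.5, scale=2).
        xi = [rng.gammavariate(1.5, 2.0) + 0.3 * alpha[i] for _ in range(t_len)]
        # U_it = (eps_it + eta_t) / sqrt(2), so U_it is marginally N(0, 1).
        ui = [(rng.gauss(0.0, 1.0) + eta[t]) / math.sqrt(2.0) for t in range(t_len)]
        yi = [alpha[i] + beta * xv + (1.0 + gamma * xv) * uv for xv, uv in zip(xi, ui)]
        x.append(xi)
        y.append(yi)
    return y, x, alpha


def true_slope(tau, beta=1.0, gamma=0.2):
    """beta(tau) = beta + gamma * q_tau, with q_tau the standard normal quantile."""
    return beta + gamma * NormalDist().inv_cdf(tau)


def silverman_bandwidth(resid, n, floor=0.05):
    """h = 1.06 * sd(resid) * n^(-1/5), bounded below by `floor`."""
    flat = [e for row in resid for e in row]
    return max(1.06 * stdev(flat) * n ** (-0.2), floor)
```

Because `eta[t]` enters every unit's error at time `t`, the simulated errors are correlated across units within a period, which is the cross-sectional dependence the design is meant to generate.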
For both variance estimators, we construct nominal 95% confidence intervals using asymptotic normal approximations. We evaluate the performance of the FEQR estimator using bias and root mean squared error (RMSE), and the two covariance estimators using their coverage probabilities. The results are displayed in Tables 1 and 2.

Table 2: Monte Carlo inference accuracy: coverage of robust and standard covariance estimators

                  τ = 0.25              τ = 0.50              τ = 0.75
(N, T)        Robust  Standard     Robust  Standard     Robust  Standard
(250, 25)     0.903    0.646       0.922    0.633       0.908    0.623
(250, 50)     0.911    0.617       0.927    0.612       0.930    0.627
(500, 25)     0.900    0.518       0.916    0.489       0.909    0.488
(500, 50)     0.905    0.521       0.911    0.525       0.923    0.566
(1000, 25)    0.910    0.379       0.911    0.335       0.898    0.338
(1000, 50)    0.920    0.365       0.932    0.357       0.924    0.375

Notes: Entries report empirical coverage probabilities of nominal 95% confidence intervals for $\beta(\tau)$ constructed using (i) the proposed robust covariance estimator and (ii) the conventional sandwich covariance estimator that ignores cross-sectional dependence. Results are based on 2,000 Monte Carlo replications under the DGP in Section 5.

The performance of the FEQR estimator under the common shock model is evaluated first by its bias and RMSE. Table 1 reports the results for the three quantiles. The results show that the FEQR estimator has small bias for every sample size under consideration. In addition, the RMSE decreases as $T$ increases from 25 to 50 and is largely insensitive to $N$, in line with the $\sqrt{T}$ convergence rate. These results suggest that the FEQR estimator performs well in small samples for the common shock model, even when $N$ is large relative to $T$.

Results evaluating the finite sample performance of the proposed inference procedure are collected in Table 2.
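The summary statistics reported in Tables 1 and 2 can be computed from the replication output as in the following sketch. It is illustrative only: the function name `summarize` is hypothetical, and the interval $\hat\beta \pm z\sqrt{\hat V/T}$ assumes that $\hat V$ estimates the variance of $\sqrt{T}(\hat\beta - \beta_0)$, which is our reading of the $\sqrt{T}$-normalisation in Theorem 1.

```python
import math
from statistics import NormalDist


def summarize(estimates, variances, truth, t_len, level=0.95):
    """Return (bias, rmse, coverage) over Monte Carlo replications.

    estimates[r] is beta_hat in replication r and variances[r] the matching
    estimate of the asymptotic variance of sqrt(T) * (beta_hat - beta_0).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for level = 0.95
    reps = len(estimates)
    bias = sum(estimates) / reps - truth
    rmse = math.sqrt(sum((b - truth) ** 2 for b in estimates) / reps)
    # A replication counts as covered when the truth lies inside the interval.
    covered = sum(
        abs(b - truth) <= z * math.sqrt(v / t_len)
        for b, v in zip(estimates, variances)
    )
    return bias, rmse, covered / reps
```

Passing the conventional and the robust variance estimates in turn as `variances`, with the same `estimates`, gives the two coverage columns for a given design.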
If the asymptotic inference procedure correctly approximates the finite sample distribution of $\hat\beta(\tau) - \beta_0(\tau)$, the coverage rate should be close to the nominal level (95%). The results show that, for a given $N$, empirical coverage under the proposed variance estimator generally improves as $T$ increases. Overall, intervals based on the proposed variance estimator attain empirical coverage between roughly 0.90 and 0.93, modestly below the nominal 95% level, whereas intervals based on the conventional estimator undercover severely (roughly 0.34 to 0.65), and increasingly so as $N$ grows.

6 Conclusion

This paper develops an asymptotic theory for fixed-effects panel quantile regression with pervasive common shocks, a feature central to many economic and financial panels but largely absent from existing FEQR theory. By modelling outcomes through unit, time, and idiosyncratic latent components, we allow for general cross-sectional dependence while retaining the standard FEQR estimator. The key insight is that conditioning on common time shocks generates DGP-induced smoothing, which restores local differentiability of projected scores and enables Taylor expansions despite the non-smooth objective. This yields consistency and asymptotic normality under joint asymptotics with $N, T \to \infty$, including regimes with $T \ll N$. Common shocks fundamentally alter the stochastic structure of FEQR: the estimator concentrates around a $\sqrt{T}$-order asymptotic component, with an asymptotic covariance matrix that differs from that in classical settings, rendering conventional FEQR variance estimators inconsistent. We therefore develop inference procedures, including a novel robust variance estimator, that deliver valid confidence intervals and hypothesis tests in the presence of systemic time shocks while remaining applicable under the classical FEQR framework.

Several directions for future research appear promising.
First, the framework could be extended to other panel models with non-smooth objective functions, such as censored or threshold-type estimators, where analogous forms of inherent smoothing may emerge in the presence of common shocks. Second, an important extension is to permit serially correlated time factors and to develop inference procedures that simultaneously accommodate both cross-sectional and temporal dependence in non-smooth panel settings. Finally, incorporating two-way fixed effects into panel quantile regression under full generality remains theoretically demanding and represents a particularly interesting open problem.

Appendix

Notations

Throughout the appendix, for $f : \mathcal{B} \to \mathbb{R}$, define $P_T f = T^{-1}\sum_{t=1}^{T} f(B_t)$. For each $i \in \mathbb{N}$, let us write
\[
h_i(b) = f_i(0 \mid B_t = b), \qquad k_i(b) = E[f_i(0 \mid X_{it}, B_t)X_{it} \mid B_t = b]
\]
for the conditional densities and conditional density-weighted moments.

A Proof of the Main Results

A.1 Proof of Proposition 1

The proof follows the same arguments as in the proof of Theorem 3.1 in Kato et al. (2012) and is not repeated here. Note that the dependence structure was only used in bounding their Equation (A.4), where the Marcinkiewicz-Zygmund inequality is invoked with independence across $t = 1, \ldots, T$, which is satisfied under our setup following (1). Here, under Assumption 1, one can apply Hoeffding's inequality in place of Marcinkiewicz-Zygmund, and hence the desired result requires only the weaker condition $(\log N)^2/T = o(1)$.

A.2 Proof of Theorem 1

First, observe that Lemma 2 shows that the subgradients satisfy
\[
\sup_{1 \le i \le N} \big|H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta)\big| = O_p(T^{-1}), \qquad \big\|H^{(2)}_{N}(\hat{\boldsymbol\alpha}, \hat\beta)\big\| = O_p(T^{-1}).
\]
Further, Lemma 3 implies the differentiability of $\tilde H^{(1)}_{Ni}$ and $\tilde H^{(2)}_{N}$, and it also shows the uniform boundedness of the second-order derivatives.
Hence, Taylor expansions on $\tilde H^{(2)}_{N}(\hat{\boldsymbol\alpha}, \hat\beta)$ yield
\begin{align*}
O_p(T^{-1}) &= H^{(2)}_{N}(\hat{\boldsymbol\alpha}, \hat\beta) = \tilde H^{(2)}_{N}(\hat{\boldsymbol\alpha}, \hat\beta) + \big(H^{(2)}_{N}(\hat{\boldsymbol\alpha}, \hat\beta) - \tilde H^{(2)}_{N}(\hat{\boldsymbol\alpha}, \hat\beta)\big) \\
&= \tilde H^{(2)}_{N}(\boldsymbol\alpha_0, \beta_0) - \Big(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} E[f_i(0 \mid X_{it}, B_t)X_{it}X_{it}' \mid B_t]\Big)(\hat\beta - \beta_0) \\
&\quad - \frac{1}{NT}\sum_{i=1}^{N}(\hat\alpha_i - \alpha_{i0})\sum_{t=1}^{T} E[f_i(0 \mid X_{it}, B_t)X_{it} \mid B_t] \\
&\quad + O_p\Big(\max_{1 \le i \le N}|\hat\alpha_i - \alpha_{i0}|^2 \vee \|\hat\beta - \beta_0\|^2\Big) + \big(H^{(2)}_{N}(\hat{\boldsymbol\alpha}, \hat\beta) - \tilde H^{(2)}_{N}(\hat{\boldsymbol\alpha}, \hat\beta)\big). \tag{11}
\end{align*}
Also, by Taylor expansions on each $\tilde H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta)$, it holds uniformly over $1 \le i \le N$ that
\begin{align*}
O_p(T^{-1}) &= H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta) = \tilde H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta) + \big(H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta) - \tilde H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta)\big) \\
&= \tilde H^{(1)}_{Ni}(\alpha_{i0}, \beta_0) - \Big(\frac{1}{T}\sum_{t=1}^{T} f_i(0 \mid B_t)\Big)(\hat\alpha_i - \alpha_{i0}) - \Big(\frac{1}{T}\sum_{t=1}^{T} E[f_i(0 \mid X_{it}, B_t)X_{it}' \mid B_t]\Big)(\hat\beta - \beta_0) \\
&\quad + O_p\Big(\max_{1 \le i \le N}|\hat\alpha_i - \alpha_{i0}|^2 \vee \|\hat\beta - \beta_0\|^2\Big) + \big(H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta) - \tilde H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta)\big).
\end{align*}
By rearranging the above, we have uniformly over $1 \le i \le N$,
\begin{align*}
\hat\alpha_i - \alpha_{i0} &= \Big(\frac{1}{T}\sum_{t=1}^{T} f_i(0 \mid B_t)\Big)^{-1}\tilde H^{(1)}_{Ni}(\alpha_{i0}, \beta_0) - \Big(\frac{1}{T}\sum_{t=1}^{T} f_i(0 \mid B_t)\Big)^{-1}\Big(\frac{1}{T}\sum_{t=1}^{T} E[f_i(0 \mid X_{it}, B_t)X_{it}' \mid B_t]\Big)(\hat\beta - \beta_0) \\
&\quad + O_p\Big(\max_{1 \le i \le N}|\hat\alpha_i - \alpha_{i0}|^2 \vee \|\hat\beta - \beta_0\|^2\Big) + \Big(\frac{1}{T}\sum_{t=1}^{T} f_i(0 \mid B_t)\Big)^{-1}\big(H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta) - \tilde H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta)\big) + O_p(T^{-1}). \tag{12}
\end{align*}
Recall that $h_i(b) = f_i(0 \mid B_t = b)$ and $k_i(b) = E[f_i(0 \mid X_{it}, B_t)X_{it} \mid B_t = b]$. Plugging (12) into (11), we obtain
\begin{align*}
O_p(T^{-1}) &= \tilde H^{(2)}_{N}(\boldsymbol\alpha_0, \beta_0) - \frac{1}{N}\sum_{i=1}^{N}(P_T h_i)^{-1}(P_T k_i)\,\tilde H^{(1)}_{Ni}(\alpha_{i0}, \beta_0) - \Gamma_N(\hat\beta - \beta_0) \\
&\quad + O_p\Big(\max_{1 \le i \le N}|\hat\alpha_i - \alpha_{i0}|^2 \vee \|\hat\beta - \beta_0\|^2\Big)\Big(\frac{1}{N}\sum_{i=1}^{N} P_T k_i + 1\Big) \\
&\quad + \frac{1}{N}\sum_{i=1}^{N}(P_T h_i)^{-1}(P_T k_i)\big(H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta) - \tilde H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta)\big) + H^{(2)}_{N}(\hat{\boldsymbol\alpha}, \hat\beta) - \tilde H^{(2)}_{N}(\hat{\boldsymbol\alpha}, \hat\beta),
\end{align*}
where
\[
\Gamma_N = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} E[f_i(0 \mid X_{it}, B_t)X_{it}X_{it}' \mid B_t] - \frac{1}{N}\sum_{i=1}^{N}\Big(\frac{1}{T}\sum_{t=1}^{T} E[f_i(0 \mid X_{it}, B_t)X_{it} \mid B_t]\Big)(P_T h_i)^{-1}(P_T k_i)'.
\]
Rearranging the above gives
\begin{align*}
\hat\beta - \beta_0 + o_p\big(\|\hat\beta - \beta_0\|\big) &= \Gamma_N^{-1}\Big(\tilde H^{(2)}_{N}(\boldsymbol\alpha_0, \beta_0) - \frac{1}{N}\sum_{i=1}^{N}(P_T h_i)^{-1}(P_T k_i)\,\tilde H^{(1)}_{Ni}(\alpha_{i0}, \beta_0)\Big) \\
&\quad + \Gamma_N^{-1}\Big(\frac{1}{N}\sum_{i=1}^{N}(P_T h_i)^{-1}(P_T k_i)\big(H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta) - \tilde H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta)\big)\Big) \\
&\quad + \Gamma_N^{-1}\big(H^{(2)}_{N}(\hat{\boldsymbol\alpha}, \hat\beta) - \tilde H^{(2)}_{N}(\hat{\boldsymbol\alpha}, \hat\beta)\big) + O_p\Big(\max_{1 \le i \le N}|\hat\alpha_i - \alpha_{i0}|^2\Big) + O_p(T^{-1}). \tag{13}
\end{align*}
By applying Lemmas 4, 5, 8, and 11 in Section B in the appendix, the second to the fourth terms on the RHS of (13) are together of the order $o_p(T^{-1/2})$. Hence (13) becomes
\begin{align*}
\hat\beta - \beta_0 &= \Gamma_N^{-1}\Big(\tilde H^{(2)}_{N}(\boldsymbol\alpha_0, \beta_0) - \frac{1}{N}\sum_{i=1}^{N}(P_T h_i)^{-1}(P_T k_i)\,\tilde H^{(1)}_{Ni}(\alpha_{i0}, \beta_0)\Big) + o_p(T^{-1/2}) \\
&= \Gamma_N^{-1}\Big(\tilde H^{(2)}_{N}(\boldsymbol\alpha_0, \beta_0) - \frac{1}{N}\sum_{i=1}^{N}\gamma_i\,\tilde H^{(1)}_{Ni}(\alpha_{i0}, \beta_0)\Big) + o_p(T^{-1/2}), \tag{14}
\end{align*}
where $\gamma_i = E[f_i(0 \mid X_{i1})X_{i1}]/f_i(0)$. This yields an asymptotic linear representation of $\hat\beta - \beta_0$. The desired result follows directly from the representation (14), Lemmas 9 and 11, and an application of Slutsky's lemma.

A.3 Proof of Theorem 2

(i). First, note that following the same steps as Proposition 3.1 in Kato et al. (2012), it holds uniformly over $i = 1, \ldots, N$ that
\begin{align*}
\frac{1}{T}\sum_{t=1}^{T} K_h(\hat\epsilon_{it}) &= f_i(0) + o_p(1), \\
\frac{1}{T}\sum_{t=1}^{T} K_h(\hat\epsilon_{it})X_{it} &= E[f_i(0 \mid X_{i1})X_{i1}] + o_p(1), \\
\frac{1}{T}\sum_{t=1}^{T} K_h(\hat\epsilon_{it})X_{it}X_{it}' &= E[f_i(0 \mid X_{i1})X_{i1}X_{i1}'] + o_p(1). \tag{15}
\end{align*}
Observe that under the maintained assumptions, Proposition 1 implies weak consistency of $(\hat\alpha_1, \ldots, \hat\alpha_N, \hat\beta')$, which allows one to replace $\hat\epsilon_{it}$ with its population counterpart up to an asymptotically negligible error. Moreover, for each fixed $i$, the summands in the three averages on the left-hand side are independent across $t = 1, \ldots, T$, as a consequence of the independence of $\{(B_t, U_{it})\}_{t=1}^{T}$ over $t$, so these uniform consistency statements follow from the same Bousquet's inequality arguments used in Kato et al. (2012). These results imply that $\sup_{i \in \mathbb{N}}\|\hat\gamma_i - \gamma_i\| = o_p(1)$.

Now, let us define the oracle quantities
\[
m^0_{Nt} = \frac{1}{N}\sum_{i=1}^{N}\big(\tau - \mathbf{1}\{\epsilon_{it} \le 0\}\big)(X_{it} - \gamma_i), \qquad \bar m^0_{NT} = \frac{1}{T}\sum_{t=1}^{T} m^0_{Nt},
\]
and write $m^0_{Ntj}$, $\bar m^0_{NTj}$ for their respective $j$-th elements, $j = 1, \ldots, p$. The difference between $m^0_{Nt}$ and $\hat m_{Nt}$ in (7) is that $\hat\gamma_i$ and $\hat\epsilon_{it}$ are replaced by the true $\gamma_i$ and $\epsilon_{it}$, respectively. Notice that conditioning on $\{B_t\}_{t=1}^{T}$, for each $t = 1, \ldots, T$, the summands in $m^0_{Nt}$ and $\bar m^0_{NT}$ are independent over $i$. By Hoeffding's inequality for conditional probabilities and Assumption 1, we have for all $1 \le t \le T$, $1 \le j \le p$ and $s > 0$,
\[
P\big\{\big|m^0_{Ntj} - E[m^0_{Ntj} \mid B_t]\big| \ge s \mid B_t\big\} \le 2C\exp(-2Ns^2),
\]
where $C$ is a constant that does not depend on $t$, $N$, $T$ and $B_t$. This implies that
\[
P\big\{\big|m^0_{Ntj} - E[m^0_{Ntj} \mid B_t]\big| \ge s\big\} \le 2C\exp(-2Ns^2).
\]
By the union bound, we have
\[
\max_{1 \le t \le T}\big\|m^0_{Nt} - E[m^0_{Nt} \mid B_t]\big\| = O_p\Big(\sqrt{\frac{\log T}{N}}\Big).
\]
Define the following oracle estimator for $\Sigma$: $\tilde\Sigma_N = \frac{1}{T}\sum_{t=1}^{T} m^0_{Nt}(m^0_{Nt})' - (\bar m^0_{NT})(\bar m^0_{NT})'$. By the assumption that $(\log T)/N \to 0$, we see that
\[
\tilde\Sigma_N = \frac{1}{T}\sum_{t=1}^{T} E[m^0_{Nt} \mid B_t]E[m^0_{Nt} \mid B_t]' - \Big(\frac{1}{T}\sum_{t=1}^{T} E[m^0_{Nt} \mid B_t]\Big)\Big(\frac{1}{T}\sum_{t=1}^{T} E[m^0_{Nt} \mid B_t]\Big)' + o_p(1).
\]
An application of Markov's inequality yields that for any $N$ and $\varepsilon > 0$, we have
\[
P\Big(\Big\|\frac{1}{T}\sum_{t=1}^{T} E[m^0_{Nt} \mid B_t]E[m^0_{Nt} \mid B_t]' - E\big[E[m^0_{Nt} \mid B_t]E[m^0_{Nt} \mid B_t]'\big]\Big\| > \varepsilon\Big) \le \frac{D}{T\varepsilon^2},
\]
where $D$ is a constant independent of $N$ and $T$. This shows that, as $N, T \to \infty$,
\[
\Big\|\frac{1}{T}\sum_{t=1}^{T} E[m^0_{Nt} \mid B_t]E[m^0_{Nt} \mid B_t]' - E\big[E[m^0_{Nt} \mid B_t]E[m^0_{Nt} \mid B_t]'\big]\Big\| = o_p(1).
\]
Similarly, as $N, T \to \infty$,
\[
\Big\|\frac{1}{T}\sum_{t=1}^{T} E[m^0_{Nt} \mid B_t] - E[m^0_{Nt}]\Big\| = o_p(1).
\]
This shows that
\begin{align*}
\tilde\Sigma_N &= E\big[E[m^0_{Nt} \mid B_t]E[m^0_{Nt} \mid B_t]'\big] - E[m^0_{N1}]E[m^0_{N1}]' + o_p(1) \\
&= \mathrm{Var}\big(E[m^0_{N1} \mid B_1]\big) + o_p(1).
\end{align*}
Recall from Assumption (5) that $\Sigma = \lim_{N \to \infty}\mathrm{Var}\big(E[m^0_{N1} \mid B_1]\big)$. Therefore, we have the consistency of the oracle estimator, $\|\tilde\Sigma_N - \Sigma\| = o_p(1)$.

We now claim $\|\hat\Sigma_N - \tilde\Sigma_N\| = o_p(1)$. Observe from (7) and (8) that
\[
\hat\Sigma_N = \frac{1}{T}\sum_{t=1}^{T}\hat m_{Nt}(\hat m_{Nt})' - \bar m_{NT}(\bar m_{NT})'.
\]
Without loss of generality, suppose $\alpha_{i0} = 0$ for all $i \in \mathbb{N}$ and $\beta_0 = 0$. Write
\[
\hat z_{it} = \big(\tau - \mathbf{1}\{\epsilon_{it} - \hat\alpha_i - X_{it}'\hat\beta \le 0\}\big)(X_{it} - \hat\gamma_i), \qquad z^0_{it} = \big(\tau - \mathbf{1}\{\epsilon_{it} \le 0\}\big)(X_{it} - \gamma_i).
\]
Observe that
\begin{align*}
\hat\Sigma_N &= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\Big(\frac{1}{T}\sum_{t=1}^{T}\hat z_{it}(\hat z_{jt})'\Big) - \Big(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat z_{it}\Big)\Big(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat z_{it}\Big)', \\
\tilde\Sigma_N &= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\Big(\frac{1}{T}\sum_{t=1}^{T} z^0_{it}(z^0_{jt})'\Big) - \Big(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} z^0_{it}\Big)\Big(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} z^0_{it}\Big)'.
\end{align*}
It follows that
\begin{align*}
\big\|\hat\Sigma_N - \tilde\Sigma_N\big\| &\le \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\Big\|\frac{1}{T}\sum_{t=1}^{T}\hat z_{it}(\hat z_{jt})' - \frac{1}{T}\sum_{t=1}^{T} z^0_{it}(z^0_{jt})'\Big\| \\
&\quad + 2\Big(\max_{1 \le i \le N}\Big\|\frac{1}{T}\sum_{t=1}^{T} z^0_{it}\Big\| + \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\Big\|\frac{1}{T}\sum_{t=1}^{T}\hat z_{it} - \frac{1}{T}\sum_{t=1}^{T} z^0_{it}\Big\|\Big) \cdot \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\Big\|\frac{1}{T}\sum_{t=1}^{T}\hat z_{it} - \frac{1}{T}\sum_{t=1}^{T} z^0_{it}\Big\|.
\end{align*}
Therefore, to show that $\|\hat\Sigma_N - \Sigma\| = o_p(1)$, by Markov's inequality, it suffices to show that
\[
\max_{1 \le i \le N} E\Big\|\frac{1}{T}\sum_{t=1}^{T}(\hat z_{it} - z^0_{it})\Big\| = o_p(1), \qquad \max_{1 \le i \le j \le N} E\Big\|\frac{1}{T}\sum_{t=1}^{T}\big(\hat z_{it}(\hat z_{jt})' - z^0_{it}(z^0_{jt})'\big)\Big\| = o_p(1).
\]
Let us show the first statement, as the second can be shown analogously. Since $\max_{1 \le i \le N}\|\hat\gamma_i - \gamma_i\| = o_p(1)$ from (15), and $\sup_{(i,t) \in \mathbb{N}^2}\|X_{it}\| < \infty$ and $\sup_{i \in \mathbb{N}}\|\gamma_i\| < \infty$ from Assumptions 1 and 5, it suffices to show
\[
\max_{1 \le i \le N} E\Big\|\frac{1}{T}\sum_{t=1}^{T}\big(\mathbf{1}\{\hat\epsilon_{it} \le 0\} - \mathbf{1}\{\epsilon_{it} \le 0\}\big)X_{it}\Big\| = o_p(1), \qquad \max_{1 \le i \le N} E\Big\|\frac{1}{T}\sum_{t=1}^{T}\big(\mathbf{1}\{\hat\epsilon_{it} \le 0\} - \mathbf{1}\{\epsilon_{it} \le 0\}\big)\gamma_i\Big\| = o_p(1).
\]
We focus on the first term, since the argument for the second is analogous. Observe that, as $\hat\epsilon_{it} = \epsilon_{it} - \hat\alpha_i - X_{it}'\hat\beta$, for each $i = 1, \ldots, N$ it holds that
\begin{align*}
\Big\|\frac{1}{T}\sum_{t=1}^{T}\big(\mathbf{1}\{\hat\epsilon_{it} \le 0\} - \mathbf{1}\{\epsilon_{it} \le 0\}\big)X_{it}\Big\|
&\le \Big\|\frac{1}{T}\sum_{t=1}^{T}\big(\mathbf{1}\{\hat\epsilon_{it} \le 0\} - E[\mathbf{1}\{\hat\epsilon_{it} \le 0\} \mid X_{it}]\big)X_{it}\Big\| \\
&\quad + \Big\|\frac{1}{T}\sum_{t=1}^{T}\big(E[\mathbf{1}\{\hat\epsilon_{it} \le 0\} \mid X_{it}] - E[\mathbf{1}\{\epsilon_{it} \le 0\} \mid X_{it}]\big)X_{it}\Big\| \\
&\quad + \Big\|\frac{1}{T}\sum_{t=1}^{T}\big(E[\mathbf{1}\{\epsilon_{it} \le 0\} \mid X_{it}] - \mathbf{1}\{\epsilon_{it} \le 0\}\big)X_{it}\Big\| \\
&\le 2\sup_{(a,b') \in \mathbb{R}^{1+p}}\Big\|\frac{1}{T}\sum_{t=1}^{T}\big(\mathbf{1}\{\epsilon_{it} - X_{it}'b \le a\} - E[\mathbf{1}\{\epsilon_{it} - X_{it}'b \le a\} \mid X_{it}]\big)X_{it}\Big\| + o_p(1), \tag{16}
\end{align*}
where the second term in (16) is $o_p(1)$ following Proposition 1 and properties of $f_i(\epsilon \mid x, b)$ from Assumption 4. To control the first term in (16), note that, as they all share the same $i$, the $\epsilon_{it}$ are independent over $t$. Further, note that the class
\[
\big\{(x_1, \ldots, x_p, \epsilon) \mapsto \mathbf{1}\{\epsilon - (x_1, \ldots, x_p)b \le a\} \cdot x_j : (a, b') \in \mathbb{R}^{1+p},\ j = 1, \ldots, p\big\}
\]
is a VC-subgraph$^{3}$ class following Lemmas 2.6.15 and 2.6.18 in van der Vaart and Wellner (1996).
By Lemma 1 and Theorem 2.14.1 in van der Vaart and Wellner (1996), there exists a $C > 0$ independent of $N$, $T$ and $i$ such that for all $i = 1, \ldots, N$,
\[
E\Big[\sup_{(a,b') \in \mathbb{R}^{1+p}}\Big\|\frac{1}{T}\sum_{t=1}^{T}\big(\mathbf{1}\{\epsilon_{it} - X_{it}'b \le a\} - E[\mathbf{1}\{\epsilon_{it} - X_{it}'b \le a\} \mid X_{it}]\big)X_{it}\Big\|\Big] \le C\sqrt{\frac{1}{T}}.
\]
This implies that the first term in (16) is $o_p(1)$. Collecting these results, we now have $\|\hat\Sigma_N - \tilde\Sigma_N\| = o_p(1)$ and therefore $\|\hat\Sigma_N - \Sigma\| = o_p(1)$.

On the other hand, write $\tilde\Gamma_N = \frac{1}{N}\sum_{i=1}^{N} E[f_i(0 \mid X_{i1})X_{i1}(X_{i1} - \gamma_i)'] = \Gamma + o(1)$. By some algebra, we have
\begin{align*}
\big\|\hat\Gamma_N - \tilde\Gamma_N\big\| &\le \sup_{1 \le i \le N}\Big\|\frac{1}{T}\sum_{t=1}^{T} K_h(\hat\epsilon_{it})X_{it}X_{it}' - E[f_i(0 \mid X_{i1})X_{i1}X_{i1}']\Big\| \\
&\quad + \Big(2C_f\sup_{x \in \mathcal{X}}\|x\| + \sup_{1 \le i \le N}\Big\|\frac{1}{T}\sum_{t=1}^{T} K_h(\hat\epsilon_{it})X_{it} - E[f_i(0 \mid X_{i1})X_{i1}]\Big\|\Big)\sup_{1 \le i \le N}\|\hat\gamma_i - \gamma_i\| \\
&\quad + \Big(2\sup_{1 \le i \le N}\|\gamma_i\| + \sup_{1 \le i \le N}\|\hat\gamma_i - \gamma_i\|\Big)\sup_{1 \le i \le N}\Big\|\frac{1}{T}\sum_{t=1}^{T} K_h(\hat\epsilon_{it})X_{it} - E[f_i(0 \mid X_{i1})X_{i1}]\Big\|.
\end{align*}
By collecting the results from (15), the right hand side is $o_p(1)$ and hence the proof of this case is concluded.

(ii). First, observe that the results in (15) remain valid in this scenario. Further, notice that by independence across $(i,t)$ pairs, following a variance calculation, the average over cross-product terms satisfies
\[
\frac{1}{N^2T}\sum_{t=1}^{T}\sum_{i=1}^{N}\sum_{i' \ne i} z^0_{it}(z^0_{i't})' = O_p\Big(\frac{1}{\sqrt{N^2T}}\Big),
\]
where $z^0_{it} = (\tau - \mathbf{1}\{\epsilon_{it} \le 0\})(X_{it} - \gamma_i)$. Under the maintained asymptotic regime, these components remain $o_p(1)$ even after multiplication by $N$. The remainder of the proof follows from standard arguments using the weak law of large numbers and is therefore omitted.

$^{3}$ The formal definition can be found in Section 2.6.2 of van der Vaart and Wellner (1996).

B Auxiliary Lemmas

Throughout this section, for $q \in [1, \infty)$, define $\|f\|_{Q,q} = (Q|f|^q)^{1/q}$.
For a non-empty set $T$ and a function $f : T \to \mathbb{R}$, let $\|f\|_T = \sup_{t \in T}|f(t)|$. For a pseudometric space $(T, d)$, let $N(T, d, \varepsilon)$ denote the $\varepsilon$-covering number of $(T, d)$. For a class of functions $\mathcal{F} \ni f : \mathcal{X} \to \mathbb{R}$, we say $F : \mathcal{X} \to \mathbb{R}_+$ is an envelope for $\mathcal{F}$ if it is measurable and $\sup_{f \in \mathcal{F}}|f(x)| \le F(x)$ for all $x \in \mathcal{X}$.

Lemma 1 (Uniform covering for conditional expectations). Let $\mathcal{F}$ be a class of functions $f : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ with envelope $F$ and $R$ a fixed probability measure on $\mathcal{Y}$. For a given $f \in \mathcal{F}$, let $\bar f : \mathcal{X} \to \mathbb{R}$ be $\bar f(x) = \int f(x, y)\,dR(y)$. Set $\bar{\mathcal{F}} = \{\bar f : f \in \mathcal{F}\}$. Note that $\bar F$ is an envelope of $\bar{\mathcal{F}}$. Then, for any $r, s \ge 1$, $\varepsilon \in (0, 1]$,
\[
\sup_{Q} N\big(\bar{\mathcal{F}}, \|\cdot\|_{Q,r}, 2\varepsilon\|\bar F\|_{Q,r}\big) \le \sup_{Q'} N\big(\mathcal{F}, \|\cdot\|_{Q' \times R, s}, \varepsilon^{r}\|F\|_{Q' \times R, s}\big),
\]
where $\sup_Q$ and $\sup_{Q'}$ are taken over all finite discrete distributions on $\mathcal{X}$ and $\mathcal{X} \times \mathcal{Y}$, respectively.

Proof. This is a direct consequence of Lemma A.2 in Ghosal, Sen, and van der Vaart (2000).

Lemma 2 (Computational property of quantile regression). Suppose Assumptions 1 and 3 hold. Then we have
\[
\sup_{1 \le i \le N}\big|H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta)\big| = O_p(T^{-1}), \qquad \big\|H^{(2)}_{N}(\hat{\boldsymbol\alpha}, \hat\beta)\big\| = O_p(T^{-1}).
\]
Proof. This result is a consequence of the well-known computational property of quantile regression. We include a proof for completeness. Write
\[
Q_N(\boldsymbol\alpha, \beta) = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\rho_\tau(Y_{it} - \alpha_i - X_{it}'\beta).
\]
The directional derivative of $Q_N$ with respect to $\beta$ at $(\hat{\boldsymbol\alpha}, \hat\beta)$ in direction $w$ is given by
\[
D(Q_N; w) = -\frac{1}{NT}\sum_{(i,t)}(X_{it}'w)\,\psi(Y_{it} - \hat\alpha_i - X_{it}'\hat\beta,\ -X_{it}'w),
\]
where
\[
\psi(u, v) = \begin{cases}\tau - \mathbf{1}\{u < 0\} & \text{if } u \ne 0, \\ \tau - \mathbf{1}\{v < 0\} & \text{if } u = 0.\end{cases}
\]
Let $h$ be the subset of $\{(i, t) : 1 \le i \le N,\ 1 \le t \le T\}$ such that $Y_{it} = \hat\alpha_i + X_{it}'\hat\beta$. Define $h_i = h \cap \{(j, t) : j = i,\ 1 \le t \le T\}$.
Since $(\hat{\boldsymbol\alpha}, \hat\beta)$ minimizes $Q_N$, we know that, for any $\|w\| = 1$,
\[
D(Q_N; w) \ge 0, \qquad D(Q_N; -w) \ge 0,
\]
which implies
\[
\frac{1}{NT}\sum_{h}(-X_{it}'w)\big(\tau - \mathbf{1}\{X_{it}'w < 0\}\big) \le \frac{w'}{NT}\sum_{h^c} X_{it}\big(\tau - \mathbf{1}\{Y_{it} < \hat\alpha_i + X_{it}'\hat\beta\}\big) \le \frac{1}{NT}\sum_{h}(-X_{it}'w)\big(\tau - \mathbf{1}\{-X_{it}'w < 0\}\big).
\]
Notice that, for each $i$, with probability 1, $|h_i|$ is at most $p + 1$. We conclude that, uniformly over $\|w\| = 1$, with probability 1,
\begin{align*}
\Big|\frac{w'}{NT}\sum_{h^c} X_{it}\big(\tau - \mathbf{1}\{Y_{it} < \hat\alpha_i + X_{it}'\hat\beta\}\big)\Big| &\le \frac{1}{NT}\Big(\sup_{x \in \mathcal{X}}\|x\|_\infty\Big)N(p + 1) = O\Big(\frac{1}{T}\Big), \\
\Big|\frac{w'}{NT}\sum_{(i,t)} X_{it}\big(\tau - \mathbf{1}\{Y_{it} \le \hat\alpha_i + X_{it}'\hat\beta\}\big)\Big| &\le \frac{2}{NT}\Big(\sup_{x \in \mathcal{X}}\|x\|_\infty\Big)N(p + 1) = O\Big(\frac{1}{T}\Big).
\end{align*}
Hence we have
\[
\big\|H^{(2)}_{N}(\hat{\boldsymbol\alpha}, \hat\beta)\big\| = O_p(T^{-1}).
\]
The proof for $\sup_{1 \le i \le N}|H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta)|$ is analogous. Using this argument, we have for all $1 \le i \le N$, with probability 1,
\[
\Big|\frac{1}{T}\sum_{h_i^c}\big(\tau - \mathbf{1}\{Y_{it} < \hat\alpha_i + X_{it}'\hat\beta\}\big)\Big| \le \frac{1}{T}(p + 1), \qquad \Big|\frac{1}{T}\sum_{t=1}^{T}\big(\tau - \mathbf{1}\{Y_{it} \le \hat\alpha_i + X_{it}'\hat\beta\}\big)\Big| \le \frac{2}{T}(p + 1),
\]
which implies
\[
\sup_{1 \le i \le N}\big|H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta)\big| = O_p(T^{-1}).
\]

Lemma 3 (Projection differentiability). Under the setup of Section 2, suppose Assumption 4 holds. Then $\tilde H^{(2)}_{N}(\boldsymbol\alpha, \beta)$ is twice continuously differentiable in $(\boldsymbol\alpha, \beta)$ almost surely. In addition, the first-order partial derivatives are given by
\begin{align*}
\frac{\partial \tilde H^{(2)}_{N}(\boldsymbol\alpha, \beta)}{\partial \alpha_i} &= -\frac{1}{T}\sum_{t=1}^{T} E\big[f_i(\alpha_i - \alpha_{i0} + X_{it}'(\beta - \beta_0) \mid X_{it}, B_t)X_{it} \mid B_t\big], \\
\frac{\partial \tilde H^{(2)}_{N}(\boldsymbol\alpha, \beta)}{\partial \beta} &= -\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} E\big[f_i(\alpha_i - \alpha_{i0} + X_{it}'(\beta - \beta_0) \mid X_{it}, B_t)X_{it}X_{it}' \mid B_t\big].
\end{align*}
Further, the first- and second-order derivatives are uniformly bounded across $(\boldsymbol\alpha, \beta)$. The bounds of the derivatives are independent of $N$. Similarly, $\tilde H^{(1)}_{Ni}(\alpha_i, \beta)$ are also twice continuously differentiable in $(\alpha_i, \beta)$ almost surely.
The first-order partial derivatives are given by
\begin{align*}
\frac{\partial \tilde H^{(1)}_{Ni}(\alpha_i, \beta)}{\partial \alpha_i} &= -\frac{1}{T}\sum_{t=1}^{T} f_i(\alpha_i - \alpha_{i0} + X_{it}'(\beta - \beta_0) \mid B_t), \\
\frac{\partial \tilde H^{(1)}_{Ni}(\alpha_i, \beta)}{\partial \beta} &= -\frac{1}{T}\sum_{t=1}^{T} E\big[f_i(\alpha_i - \alpha_{i0} + X_{it}'(\beta - \beta_0) \mid X_{it}, B_t)X_{it} \mid B_t\big].
\end{align*}
Moreover, the first- and second-order derivatives are uniformly bounded across $(\alpha_i, \beta)$ and $i \in \mathbb{N}$.

Proof. It suffices to prove that
\[
E\big[\mathbf{1}\{Y_{it} \le \alpha_i + X_{it}'\beta\}X_{it} \mid B_t\big] \tag{17}
\]
is twice continuously differentiable with respect to $(\alpha_i, \beta)$ almost surely for all $i \in \mathbb{N}$. Observe that
\[
E\big[\mathbf{1}\{Y_{it} \le \alpha_i + X_{it}'\beta\}X_{it} \mid B_t\big] = E\big[F_i(\alpha_i - \alpha_{i0} + X_{it}'(\beta - \beta_0) \mid X_{it}, B_t)X_{it} \mid B_t\big].
\]
By Assumptions 1 and 4, with the dominated convergence theorem for conditional expectations, the above display is differentiable in $\alpha_i$ almost surely, with derivative
\[
\frac{\partial E\big[F_i(\alpha_i - \alpha_{i0} + X_{it}'(\beta - \beta_0) \mid X_{it}, B_t)X_{it} \mid B_t\big]}{\partial \alpha_i} = E\big[f_i(\alpha_i - \alpha_{i0} + X_{it}'(\beta - \beta_0) \mid X_{it}, B_t)X_{it} \mid B_t\big].
\]
The above is continuous in $(\alpha_i, \beta)$ by the dominated convergence theorem. The same holds for $\beta$: the partial derivative exists and is continuous in $(\alpha_i, \beta)$ with the form
\[
\frac{\partial E\big[F_i(\alpha_i - \alpha_{i0} + X_{it}'(\beta - \beta_0) \mid X_{it}, B_t)X_{it} \mid B_t\big]}{\partial \beta} = E\big[f_i(\alpha_i - \alpha_{i0} + X_{it}'(\beta - \beta_0) \mid X_{it}, B_t)X_{it}X_{it}' \mid B_t\big].
\]
This proves the continuous differentiability of (17). By applying the same arguments to the first-order partial derivatives, one reaches the conclusion that (17) is twice continuously differentiable in $(\boldsymbol\alpha, \beta)$. The boundedness of the derivatives comes from Assumptions 1 and 4. The proof for $\tilde H^{(1)}_{Ni}(\alpha_i, \beta)$ is analogous and thus omitted.

Lemma 4 (Projection error bound (i)). Under the setup of Section 2, suppose Assumptions 1–6 hold. Take $\delta_N \to 0$ such that
\[
\max_{1 \le i \le N}|\hat\alpha_i - \alpha_{i0}| \vee \|\hat\beta - \beta_0\| = O_p(\delta_N).
\]
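We have
\[
\Big\|\frac{1}{N}\sum_{i=1}^{N}(P_T h_i)^{-1}(P_T k_i)\big(H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta) - \tilde H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta)\big)\Big\| = o_p(T^{-1/2}) + O_p(d_N),
\]
where $d_N := T^{-1}|\log\delta_N| \vee T^{-1/2}\delta_N|\log\delta_N|^{1/2}$. Hence, in conjunction with Proposition 1, the preceding display is of order $o_p(T^{-1/2})$.

Proof. First let us define
\begin{align*}
\psi(x, \epsilon; \delta_\alpha, \delta_\beta) &:= \mathbf{1}\{\epsilon - x'\delta_\beta \le \delta_\alpha\} - \mathbf{1}\{\epsilon \le 0\}, \\
\tilde\psi_i(b; \delta_\alpha, \delta_\beta) &:= E_i\big[\mathbf{1}\{\epsilon_{i1} - X_{i1}'\delta_\beta \le \delta_\alpha\} - \mathbf{1}\{\epsilon_{i1} \le 0\} \mid B_1 = b\big] \\
&\phantom{:}= \int_{\mathcal{X} \times \mathcal{E}}\big(\mathbf{1}\{\epsilon - x'\delta_\beta \le \delta_\alpha\} - \mathbf{1}\{\epsilon \le 0\}\big)f_i(x, \epsilon \mid b)\,d(x, \epsilon), \\
\phi_i(x, \epsilon, b; \delta_\alpha, \delta_\beta) &:= \psi(x, \epsilon; \delta_\alpha, \delta_\beta) - \tilde\psi_i(b; \delta_\alpha, \delta_\beta). \tag{18}
\end{align*}
Also denote $\hat\delta_{\alpha i} = \hat\alpha_i - \alpha_{i0}$ and $\hat\delta_\beta = \hat\beta - \beta_0$. Observe that
\begin{align*}
\frac{\sqrt{T}}{N}\sum_{i=1}^{N}(P_T h_i)^{-1}(P_T k_i)\big(H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta) - \tilde H^{(1)}_{Ni}(\hat\alpha_i, \hat\beta)\big)
&= \frac{1}{N}\sum_{i=1}^{N}(P_T h_i)^{-1}(P_T k_i)\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\big(-\mathbf{1}\{\epsilon_{it} \le 0\} + E_i[\mathbf{1}\{\epsilon_{it} \le 0\} \mid B_t]\big) \\
&\quad + \frac{1}{N}\sum_{i=1}^{N}(P_T h_i)^{-1}(P_T k_i)\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\phi_i(X_{it}, \epsilon_{it}, B_t; \hat\delta_{\alpha i}, \hat\delta_\beta). \tag{19}
\end{align*}
Note that, conditional on $\{B_t\}_{t=1}^{T}$, the terms are independent across $i$. By the law of total variance, the variance of the first term can be written as
\[
\frac{1}{N^2}\sum_{i=1}^{N} E\Big[(P_T h_i)^{-2}(P_T k_i)'\Big(\frac{1}{T}\sum_{t=1}^{T}\mathrm{Var}(\mathbf{1}\{\epsilon_{it} \le 0\} \mid B_t)\Big)(P_T k_i)\Big].
\]
Notice that, as $N \to \infty$, the norm of this variance expression is bounded by
\[
\frac{1}{N^2}\sum_{i=1}^{N} E\big\|(P_T h_i)^{-1}P_T k_i\big\|^2 \le \frac{1}{N}\Big(\inf_{i \in \mathbb{N},\, b \in \mathcal{B}} f_i(0 \mid b)\Big)^{-1} \cdot \sup_{i \in \mathbb{N},\, x \in \mathcal{X},\, b \in \mathcal{B}} f_i(0 \mid x, b) \cdot \sup_{x \in \mathcal{X}}|x|.
\]
By Assumptions 1 and 4, the above is $O(1/N)$. Therefore, by Markov's inequality, the first term in (19) is $o_p(1)$. The norm of the second term in (19) is bounded above by
\[
\Big(\sup_{1 \le i \le N}\|\gamma_i\| + \sup_{1 \le i \le N}\big\|(P_T h_i)^{-1}P_T k_i - \gamma_i\big\|\Big) \times \frac{1}{N}\sum_{i=1}^{N}\Big\|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\phi_i(X_{it}, \epsilon_{it}, B_t; \hat\delta_{\alpha i}, \hat\delta_\beta)\Big\|. \tag{20}
\]
By Lemma 10 and Assumptions 4 and 5, the sum of the two terms in the round brackets in (20) is $O_p(1)$. Note that we used the fact that $E[f_i(0 \mid X_{i1}, B_1)X_{i1}] = E[f_i(0 \mid X_{i1})X_{i1}]$, as $E[f_i(0 \mid X_{i1}, B_1) \mid X_{i1}] = f_i(0 \mid X_{i1})$ following the law of iterated expectations. Next, we show that the second factor in (20) is $O_p(\sqrt{T}d_N)$. By Markov's inequality, it suffices to show
\[
\max_{1 \le i \le N} E\Big\|\sum_{t=1}^{T}\phi_i(X_{it}, \epsilon_{it}, B_t; \hat\delta_{\alpha i}, \hat\delta_\beta)\Big\| = O_p(d_N T). \tag{21}
\]
We follow the proof strategy in Step 2 in the proof of Theorem 3.2 in Kato et al. (2012) with appropriate modifications. Define the following function classes
\begin{align*}
\Psi &:= \{\psi(\cdot, \cdot; \delta_\alpha, \delta_\beta) : \mathcal{X} \times \mathcal{E} \to \mathbb{R} : |\delta_\alpha| \vee |\delta_\beta| \in \mathbb{R}\}, \\
\Psi_\delta &:= \{\psi(\cdot, \cdot; \delta_\alpha, \delta_\beta) : \mathcal{X} \times \mathcal{E} \to \mathbb{R} : |\delta_\alpha| \vee |\delta_\beta| < \delta\},
\end{align*}
where $\psi$ is defined in (18). Note that all functions in $\Psi$ are bounded by 2. By Lemmas 2.6.15, 2.6.16, and 2.6.18 of van der Vaart and Wellner (1996), there exist constants $A \ge 3\sqrt{e}$ and $v$ depending only on $p = \dim(\mathcal{X})$ such that
\[
N(\Psi, \|\cdot\|_{Q,2}, 2\varepsilon) \le \Big(\frac{A}{\varepsilon}\Big)^{v}
\]
for every probability measure $Q$ on $\mathcal{X} \times \mathcal{E}$. For each $i \in \mathbb{N}$, define
\begin{align*}
\tilde\Psi_i &:= \{\tilde\psi_i(\cdot; \delta_\alpha, \delta_\beta) : \mathcal{B} \to \mathbb{R} : |\delta_\alpha| \vee |\delta_\beta| \in \mathbb{R}\}, \\
\tilde\Psi_{i,\delta} &:= \{\tilde\psi_i(\cdot; \delta_\alpha, \delta_\beta) : \mathcal{B} \to \mathbb{R} : |\delta_\alpha| \vee |\delta_\beta| < \delta\}, \\
\Phi_i &:= \{\phi_i(\cdot, \cdot, \cdot; \delta_\alpha, \delta_\beta) : \mathcal{X} \times \mathcal{E} \times \mathcal{B} \to \mathbb{R} : |\delta_\alpha| \vee |\delta_\beta| \in \mathbb{R}\}, \\
\Phi_{i,\delta} &:= \{\phi_i(\cdot, \cdot, \cdot; \delta_\alpha, \delta_\beta) : \mathcal{X} \times \mathcal{E} \times \mathcal{B} \to \mathbb{R} : |\delta_\alpha| \vee |\delta_\beta| < \delta\}, \tag{22}
\end{align*}
where $\tilde\psi_i$ and $\phi_i$ are defined in (18). By Lemma 1, we have for all $i \in \mathbb{N}$,
\[
N(\tilde\Psi_i, \|\cdot\|_{Q_B,2}, 2\varepsilon) \le \Big(\frac{A}{\varepsilon}\Big)^{v}
\]
for any finite discrete probability measure $Q_B$ on $\mathcal{B}$. We therefore have, for any finite discrete probability measure $Q$ on $\mathcal{X} \times \mathcal{E} \times \mathcal{B}$ and $i \in \mathbb{N}$,
\[
N(\Phi_i, \|\cdot\|_{Q,2}, 2\varepsilon) \le \Big(\frac{A}{\varepsilon}\Big)^{2v}.
\]
Now we focus on a class $\Phi_{i,\delta}$ with $0 < \delta < \infty$. The class is pointwise measurable.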
Notice that each function in $\Phi_{i,\delta}$ has zero mean (with respect to the probability distribution of $(X_{i1},\epsilon_{i1},B_1)$). They also have zero conditional mean given $B_1$. Therefore, by the law of total variance, it holds that
$$
\begin{aligned}
\operatorname{Var}\left(\phi_i(X_{i1},\epsilon_{i1},B_1;\delta_\alpha,\delta_\beta)\right)
&= \mathbb{E}\left[\operatorname{Var}\left(\phi_i(X_{i1},\epsilon_{i1},B_1;\delta_\alpha,\delta_\beta)\mid B_1\right)\right] \\
&= \mathbb{E}\left[\operatorname{Var}\left(\mathbf{1}\{\epsilon_{i1}\le \delta_\beta' X_{i1}+\delta_\alpha\} - \mathbf{1}\{\epsilon_{i1}\le 0\} \mid B_1\right)\right] \\
&\le \mathbb{E}\left[\mathbb{E}\left[\mathbf{1}\{\epsilon_{i1}\le \delta_\beta' X_{i1}+\delta_\alpha\} - \mathbf{1}\{\epsilon_{i1}\le 0\} \mid B_1\right]^2\right] \\
&\le \mathbb{E}\left[\left(F_i(X_{i1}'\delta_\beta+\delta_\alpha \mid X_{i1},B_1) - F_i(0\mid X_{i1},B_1)\right)^2\right] \\
&\le C_f^2\left(|\delta_\alpha| + \sup_{x\in\mathcal{X}}\|x\|\,|\delta_\beta|\right)^2.
\end{aligned}
$$
Therefore, we can apply Corollary 5.1 in Chernozhukov, Chetverikov, and Kato (2014) to $\Phi_{i,\delta}$ with envelope $F = 2$ and $\sigma^2 = C_f^2(1+\sup_{x\in\mathcal{X}}\|x\|)^2\delta^2$. This yields
$$
\mathbb{E}\left\| \sum_{t=1}^{T}\phi(X_{it},\epsilon_{it},B_t) \right\|_{\Phi_{i,\delta}} \le C\left( |\log\delta| + \sqrt{T}\,\delta\,|\log\delta|^{1/2} \right),
\tag{23}
$$
where $C$ is a constant independent of $i$, $N$ and $T$. It follows that, for any $i\in\mathbb{N}$ and $|\hat\delta_{\alpha i}|\vee|\hat\delta_\beta| < \delta_N$, we have
$$
\mathbb{E}\left\| \sum_{t=1}^{T}\phi_i(X_{it},\epsilon_{it},B_t;\hat\delta_{\alpha i},\hat\delta_\beta) \right\| \le C\left( |\log\delta_N| + \sqrt{T}\,\delta_N\,|\log\delta_N|^{1/2} \right).
$$
This proves (21).

Lemma 5 (Projection error bound (ii)). Under the setup of Section 2, suppose Assumptions 1-4 hold. Then
$$
\left\| H^{(2)}_N(\hat\alpha,\hat\beta) - \tilde H^{(2)}_N(\hat\alpha,\hat\beta) \right\| = o_p(T^{-1/2}).
$$
Proof. The proof proceeds along the same lines as the second part of the proof of Lemma 4 (from (21) onward) and is therefore omitted.

Lemma 6 (Projection error bound (iii)). Under the setup of Section 2, suppose Assumptions 1-4 hold. Further assume that $(\log N)^2/T \to 0$. Then
$$
\max_{1\le i\le N}\left| H^{(1)}_{Ni}(\hat\alpha_i,\hat\beta) - \tilde H^{(1)}_{Ni}(\hat\alpha_i,\hat\beta) \right| = O_p\left(\sqrt{\frac{\log N}{T}}\right).
$$
Proof. Let $\phi_i(x,\epsilon,b;\delta_\alpha,\delta_\beta)$ be defined as in (18) in the proof of Lemma 4.
Observe that
$$
H^{(1)}_{Ni}(\hat\alpha_i,\hat\beta) - \tilde H^{(1)}_{Ni}(\hat\alpha_i,\hat\beta) = \left[ H^{(1)}_{Ni}(\alpha_{i0},\beta_0) - \tilde H^{(1)}_{Ni}(\alpha_{i0},\beta_0) \right] + \frac{1}{T}\sum_{t=1}^{T}\phi_i(X_{it},\epsilon_{it},B_t;\hat\delta_{\alpha i},\hat\delta_\beta).
$$
It then suffices to show that, uniformly in $i$, the two terms on the right-hand side are of order $O_p(\sqrt{T^{-1}\log N})$. We first show that
$$
\max_{1\le i\le N}\left| H^{(1)}_{Ni}(\alpha_{i0},\beta_0) - \tilde H^{(1)}_{Ni}(\alpha_{i0},\beta_0) \right| = O_p\left(\sqrt{\frac{\log N}{T}}\right).
\tag{24}
$$
By Hoeffding's inequality, for any $s\in\mathbb{R}$ and $i\in\mathbb{N}$, we have
$$
P\left\{ \left| H^{(1)}_{Ni}(\alpha_{i0},\beta_0) - \tilde H^{(1)}_{Ni}(\alpha_{i0},\beta_0) \right| > s \right\} \le \exp(-Ts^2/2).
$$
By the union bound, we have
$$
P\left\{ \max_{1\le i\le N}\left| H^{(1)}_{Ni}(\alpha_{i0},\beta_0) - \tilde H^{(1)}_{Ni}(\alpha_{i0},\beta_0) \right| > s \right\} \le N\exp(-Ts^2/2),
$$
which in turn implies
$$
P\left\{ \max_{1\le i\le N}\left| H^{(1)}_{Ni}(\alpha_{i0},\beta_0) - \tilde H^{(1)}_{Ni}(\alpha_{i0},\beta_0) \right| > \sqrt{\frac{\log N}{T}}\, s \right\} \le \exp(-2s^2).
$$
Next, we show that
$$
\max_{1\le i\le N}\left| \frac{1}{T}\sum_{t=1}^{T}\phi_i(X_{it},\epsilon_{it},B_t;\hat\delta_{\alpha i},\hat\delta_\beta) \right| = o_p\left(\sqrt{\frac{\log N}{T}}\right).
\tag{25}
$$
The argument builds on the approach used in Step 3 of the proof of Theorem 3.2 in Kato et al. (2012) with appropriate modifications. For each $i\in\mathbb{N}$, let $\Phi_i$ and $\Phi_{i,\delta}$ be defined as in (22). By the union bound and $\max_{1\le i\le N}|\hat\alpha_i - \alpha_{i0}| \vee \|\hat\beta - \beta_0\| \xrightarrow{p} 0$, it suffices to prove that for any $\epsilon > 0$, there exists $\delta > 0$ such that
$$
\max_{1\le i\le N} P\left\{ \left\| \sum_{t=1}^{T}\phi(X_{it},\epsilon_{it},B_t) \right\|_{\Phi_{i,\delta}} > (T\log N)^{1/2}\epsilon \right\} = o(1/N).
\tag{26}
$$
Fix $\epsilon > 0$. For any $\delta > 0$ and $1\le i\le N$, write $Z_{Ni}(\delta) = \left\| \sum_{t=1}^{T}\phi(X_{it},\epsilon_{it},B_t) \right\|_{\Phi_{i,\delta}}$ and let $\sigma(\delta) = C_f(1+\sup_{x\in\mathcal{X}}\|x\|)\delta$. Recall from the proof of Lemma 4 that $\sigma(\delta)^2 \ge \sup_{\phi\in\Phi_{i,\delta}} \mathbb{E}\left[\phi(X_{i1},\epsilon_{i1},B_1)^2\right]$. Also, note that functions in $\Phi_{i,\delta}$ are centred. Applying Proposition B.2 in Kato et al.
(2012) (Bousquet's version of Talagrand's inequality) with $U = 2$, for any sequence $s_N > 0$,
$$
P\left\{ Z_{Ni}(\delta) \ge \mathbb{E}[Z_{Ni}(\delta)] + s_N\sqrt{2\left(T\sigma(\delta)^2 + 2U\,\mathbb{E}[Z_{Ni}(\delta)]\right)} + \frac{s_N^2 U}{3} \right\} \le \exp(-s_N^2).
$$
Take $s_N = \sqrt{2\log N}$. Then the right-hand side of the inequality becomes $(1/N)^2 = o(1/N)$. It then suffices to prove that there exists $\delta > 0$ independent of $i$, $N$ and $T$ such that
$$
\begin{aligned}
&\mathbb{E}[Z_{Ni}(\delta)] + s_N\sqrt{2\left(T\sigma(\delta)^2 + 2U\,\mathbb{E}[Z_{Ni}(\delta)]\right)} + \frac{s_N^2 U}{3} \le (T\log N)^{1/2}\epsilon \\
\iff\ &(T\log N)^{-1/2}\left( \mathbb{E}[Z_{Ni}(\delta)] + s_N\sqrt{2\left(T\sigma(\delta)^2 + 2U\,\mathbb{E}[Z_{Ni}(\delta)]\right)} + \frac{s_N^2 U}{3} \right) \le \epsilon
\end{aligned}
$$
for large enough $N$. With the assumption that $(\log N)^2/T \to 0$, we have
$$
(T\log N)^{-1/2}\,\frac{s_N^2 U}{3} = \frac{2U}{3}\sqrt{\frac{\log N}{T}} \longrightarrow 0.
$$
By (23), $\mathbb{E}[Z_{Ni}(\delta)] \le C\left(|\log\delta| + \sqrt{T}\,\delta\,|\log\delta|^{1/2}\right)$. It follows that there exists a choice of $\delta$ that depends only on $C$ such that
$$
(T\log N)^{-1/2}\,\mathbb{E}[Z_{Ni}(\delta)] < \frac{\epsilon}{3}, \qquad (T\log N)^{-1/2}\, s_N\sqrt{2\left(T\sigma(\delta)^2 + 2U\,\mathbb{E}[Z_{Ni}(\delta)]\right)} < \frac{\epsilon}{3}
$$
for large enough $N$. Recall that $C$ is a constant independent of $i$, $N$ and $T$, so the choice of $\delta$ is also independent of $i$, $N$ and $T$. We have thus proved (26).

Lemma 7 (Magnitude of projection). Under the setup of Section 2, it holds that
$$
\max_{1\le i\le N}\left| \tilde H^{(1)}_{Ni}(\alpha_{i0},\beta_0) \right| = O_p\left(\sqrt{\frac{\log N}{T}}\right).
$$
Proof. By Hoeffding's inequality, for any $i\in\mathbb{N}$ and $s\in\mathbb{R}$,
$$
P\left\{ \left| \tilde H^{(1)}_{Ni}(\alpha_{i0},\beta_0) \right| > s \right\} \le \exp\left( -\frac{2Ts^2}{(\tau - (\tau-1))^2} \right) = \exp(-2Ts^2).
$$
By the union bound, we have
$$
P\left\{ \max_{1\le i\le N}\left| \tilde H^{(1)}_{Ni}(\alpha_{i0},\beta_0) \right| > s \right\} \le N\exp(-2Ts^2),
$$
which in turn implies
$$
P\left\{ \max_{1\le i\le N}\left| \tilde H^{(1)}_{Ni}(\alpha_{i0},\beta_0) \right| > \sqrt{\frac{\log N}{T}}\, s \right\} \le \exp(-2s^2),
$$
which concludes the proof.

Lemma 8 (Uniform convergence rate for $\hat\alpha_i$). Under the setup of Section 2, suppose Assumptions 1-6 hold. Then
$$
\max_{1\le i\le N}|\hat\alpha_i - \alpha_{i0}| = O_p\left(\sqrt{\frac{\log N}{T}}\right).
$$
Proof.
By Lemma 10 and Assumptions 1, 4, 5, and 6, the first term on the right-hand side of (13) is $O_p(1/\sqrt{T})$ following a variance calculation. Together with Lemmas 4, 5, and 11, we can further infer from (13) that
$$
\left\| \hat\beta - \beta_0 \right\| = O_p\left( \max_{1\le i\le N}|\hat\alpha_i - \alpha_{i0}|^2 \right) + O_p\left( \frac{1}{\sqrt{T}} \right).
\tag{27}
$$
Hence, by plugging (27) into (12) and applying Lemmas 6, 7, and 11, we obtain the desired result.

Lemma 9 (CLT for the score). Under the setup of Section 2, suppose Assumption 5 holds. Then we have
$$
\sqrt{T}\left( \tilde H^{(2)}_N(\alpha_0,\beta_0) - \frac{1}{N}\sum_{i=1}^{N}\gamma_i \tilde H^{(1)}_{Ni}(\alpha_{i0},\beta_0) \right) \xrightarrow{d} N(0,\Sigma).
$$
Proof. Observe that for any fixed $N$ and $T$, it holds that
$$
\begin{aligned}
\sqrt{T}\left( \tilde H^{(2)}_N(\alpha_0,\beta_0) - \frac{1}{N}\sum_{i=1}^{N}\gamma_i \tilde H^{(1)}_{Ni}(\alpha_{i0},\beta_0) \right)
&= \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\frac{1}{N}\sum_{i=1}^{N}\mathbb{E}\left[ (\tau - \mathbf{1}\{\epsilon_{it}\le 0\})(X_{it}-\gamma_i) \mid B_t \right] \\
&= \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\underbrace{\mathbb{E}\left[ \frac{1}{N}\sum_{i=1}^{N}(\tau - \mathbf{1}\{\epsilon_{it}\le 0\})(X_{it}-\gamma_i) \,\middle|\, B_t \right]}_{=:\ \psi_{Nt}} \\
&= \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\psi_{Nt}.
\end{aligned}
$$
Since $\psi_{Nt}$ is uniformly bounded across $N$, and $\operatorname{Var}(\psi_{Nt})$ is positive-definite for large $N$, the conclusion follows by the Lindeberg-Feller CLT.

Lemma 10 (ULLN for conditional densities). Under the setup of Section 2, suppose that Assumption 4 holds. Define the sequences of function classes
$$
\begin{aligned}
\mathcal{F}_{N1} &= \left\{ b \mapsto f_i(0\mid B_t = b) : i = 1,\dots,N \right\}, \\
\mathcal{F}_{N2} &= \left\{ b \mapsto \mathbb{E}\left[ f_i(0\mid X_{it},B_t)X_{it} \mid B_t = b \right] : i = 1,\dots,N \right\}.
\end{aligned}
$$
Then, for each $j\in\{1,2\}$, it holds that
$$
\max_{f\in\mathcal{F}_{Nj}}\left| \frac{1}{T}\sum_{t=1}^{T}\left( f(B_t) - \mathbb{E}[f(B_1)] \right) \right| = O_p\left(\sqrt{\frac{\log N}{T}}\right)
$$
as $N, T\to\infty$.

Proof. Notice that for each fixed $i$, the summands are i.i.d. over $t$. The desired result is then a direct consequence of Theorem 2.14.1 in van der Vaart and Wellner (1996) and Markov's inequality.

Lemma 11 (Consistency of the Jacobian). Under the setup of Section 2, suppose Assumptions 1, 4 and 6 hold. Then we have $\Gamma_N \xrightarrow{p} \Gamma$.

Proof. This follows directly from Lemma 10 and a standard weak law of large numbers.

References

Abrevaya, J.
and C. M. Dahl (2008): “The Effects of Birth Inputs on Birthweight: Evidence From Quantile Estimation on Panel Data,” Journal of Business and Economic Statistics, 26, 379–397.

Ando, T. and J. Bai (2020): “Quantile Co-Movement in Financial Markets: A Panel Quantile Model With Unobserved Heterogeneity,” Journal of the American Statistical Association, 115, 266–279.

Andrews, D. (2003): “Cross-section Regression with Common Shocks,” Tech. rep., Cowles Foundation for Research in Economics, Yale University.

Andrews, D. W. (2005): “Cross-section regression with common shocks,” Econometrica, 73, 1551–1585.

Arellano, M. and S. Bonhomme (2011): “Nonlinear Panel Data Analysis,” Annual Review of Economics, 3, 395–424.

Arellano, M. and S. Bonhomme (2016): “Nonlinear Panel Data Estimation Via Quantile Regressions,” Econometrics Journal, 19, C61–C94.

Arellano, M. and J. Hahn (2007): “Understanding Bias in Nonlinear Panel Models: Some Recent Developments,” in Economics and Econometrics, ed. by R. Blundell, W. K. Newey, and T. Persson, Cambridge University Press, 381–409.

Athey, S. and G. Imbens (2025): “Identification of average treatment effects in nonparametric panel models,” arXiv preprint arXiv:2503.19873.

Athey, S. and A. Schmutzler (2001): “Investment and market dominance,” RAND Journal of Economics, 1–26.

Belloni, A., M. Chen, O. H. M. Padilla, and Z. K. Wang (2023): “High-dimensional latent panel quantile regression with an application to asset pricing,” The Annals of Statistics, 51, 96–121.

Berry, S., J. Levinsohn, and A. Pakes (1995): “Automobile prices in market equilibrium,” Econometrica, 63, 841–890.

Bickel, P. J., A. Chen, and E. Levina (2011): “The method of moments and degree distributions for network models,” Annals of Statistics, 39, 2280–2301.

Canay, I. A. (2011): “A simple approach to quantile regression for panel data,” Econometrics Journal, 14, 368–386.

Chen, L.
(2024): “Two-Step Estimation of Quantile Panel Data Models with Interactive Fixed Effects,” Econometric Theory, 40, 419–446.

Chen, L., J. J. Dolado, and J. Gonzalo (2021): “Quantile factor models,” Econometrica, 89, 875–910.

Chernozhukov, V., D. Chetverikov, and K. Kato (2014): “Gaussian approximation of suprema of empirical processes,” The Annals of Statistics, 42, 1564.

Chernozhukov, V., B. Deaner, Y. Gao, J. A. Hausman, and W. Newey (2025): “Linear Estimation of Structural and Causal Effects for Nonseparable Panel Data.”

Chernozhukov, V., I. Fernández-Val, J. Hahn, and W. Newey (2013): “Average and quantile effects in nonseparable panel models,” Econometrica, 81, 535–580.

Chernozhukov, V. and C. Hansen (2006): “Instrumental Quantile Regression Inference for Structural and Treatment Effects Models,” Journal of Econometrics, 132, 491–525.

Chetverikov, D., B. Larsen, and C. Palmer (2016): “IV Quantile Regression for Group-Level Treatments, with an Application to the Effects of Trade on the Distribution of Wages,” Econometrica, 84, 809–833.

Chiang, H. D., B. E. Hansen, and Y. Sasaki (2024): “Standard errors for two-way clustering with serially correlated time effects,” Review of Economics and Statistics, 1–40.

Davezies, L., X. D’Haultfœuille, and Y. Guyonvarch (2021): “Empirical process results for exchangeable arrays,” The Annals of Statistics, 49, 845–862.

Demetrescu, M., M. Hosseinkouchack, and P. M. Rodrigues (2023): Tests of no cross-sectional error dependence in panel quantile regressions, 1041, Ruhr Economic Papers.

Driscoll, J. C. and A. C. Kraay (1998): “Consistent covariance matrix estimation with spatially dependent panel data,” Review of Economics and Statistics, 80, 549–560.

Fama, E. F. and J. D. MacBeth (1973): “Risk, return, and equilibrium: Empirical tests,” Journal of Political Economy, 81, 607–636.

Fernández-Val, I.
(2005): “Bias correction in panel data models with individual specific parameters,” Available at SSRN 869104.

Fernández-Val, I. and M. Weidner (2018): “Fixed Effect Estimation of Large-T Panel Data Models,” Annual Review of Economics, 10, 109–138.

Galvao, A. F., J. Gu, and S. Volgushev (2020): “On the unbiased asymptotic normality of quantile regression with fixed effects,” Journal of Econometrics, 218, 178–215.

Galvao, A. F. and K. Kato (2016): “Smoothed quantile regression for panel data,” Journal of Econometrics, 193, 92–112.

Ghosal, S., A. Sen, and A. W. van der Vaart (2000): “Testing monotonicity of regression,” Annals of Statistics, 1054–1082.

Graham, B. S. (2024): “Sparse network asymptotics for logistic regression under possible misspecification,” Econometrica, 92, 1837–1868.

Graham, B. S., J. Hahn, A. Poirier, and J. L. Powell (2018): “A Quantile Correlated Random Coefficients Panel Data Model,” Journal of Econometrics, 206, 305–335.

Gu, J. and S. Volgushev (2019): “Panel Data Quantile Regression with Grouped Fixed Effects,” Journal of Econometrics, 213, 68–91.

Harding, M., C. Lamarche, and M. H. Pesaran (2020): “Common Correlated Effects Estimation of Heterogeneous Dynamic Panel Quantile Regression Models,” Journal of Applied Econometrics, 35, 294–314.

Kallenberg, O. (2005): Probabilistic symmetries and invariance principles, Springer.

Kato, K., A. F. Galvao, and G. V. Montes-Rojas (2012): “Asymptotics for panel quantile regression models with individual effects,” Journal of Econometrics, 170, 76–91.

Kim, M.-O. and Y. Yang (2011): “Semiparametric Approach to a Random Effects Quantile Regression Model,” Journal of the American Statistical Association, 106, 1405–1417.

Koenker, R. (2004): “Quantile Regression for Longitudinal Data,” Journal of Multivariate Analysis, 91, 74–89.

Koenker, R. and G.
Bassett (1978): “Regression quantiles,” Econometrica: Journal of the Econometric Society, 33–50.

Lamarche, C. (2010): “Robust Penalized Quantile Regression Estimation for Panel Data,” Journal of Econometrics, 157, 396–408.

Ma, S., O. Linton, and J. Gao (2021): “Estimation and Inference in Semiparametric Quantile Factor Models,” Journal of Econometrics, 222, 295–323, Annals Issue: Financial Econometrics in the Age of the Digital Economy.

Machado, J. A. F. and J. M. C. Santos Silva (2019): “Quantiles via Moments,” Journal of Econometrics, 213, 145–173.

Menzel, K. (2021): “Bootstrap with cluster-dependence in two or more dimensions,” Econometrica, 89, 2143–2188.

Moulin, H. (1991): Axioms of cooperative decision making, 15, Cambridge University Press.

Petersen, M. A. (2008): “Estimating standard errors in finance panel data sets: Comparing approaches,” The Review of Financial Studies, 22, 435–480.

van der Vaart, A. W. and J. A. Wellner (1996): Weak Convergence and Empirical Processes, Springer Series in Statistics, Springer New York, NY.

Zhang, X., D. Wang, H. Lian, and G. Li (2023): “Nonparametric Quantile Regression for Homogeneity Pursuit in Panel Data Models,” Journal of Business & Economic Statistics, 41, 1238–1250.
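
As a numerical aside, the $\sqrt{\log N / T}$ rate that Hoeffding's inequality and the union bound deliver for (24) and in Lemma 7 can be checked in a small simulation. The sketch below is not taken from the paper. The Gaussian common-shock design, the loadings `lam`, and the function names `max_abs_score` and `rate_ratio` are illustrative assumptions. It computes the raw score average $\frac{1}{T}\sum_t (\tau - \mathbf{1}\{\epsilon_{it} \le 0\})$ for each unit and reports the maximum over units divided by $\sqrt{\log N / T}$, which should stay bounded as $N$ and $T$ vary.

```python
from statistics import NormalDist

import numpy as np


def max_abs_score(n_units, n_periods, tau=0.5, seed=0):
    """Return max_i |(1/T) sum_t (tau - 1{eps_it <= 0})| in a toy design."""
    rng = np.random.default_rng(seed)
    # Hypothetical design (an assumption, not the paper's):
    # eps_it = lam_i * B_t + u_it - q_i, with B_t and u_it i.i.d. N(0, 1).
    lam = rng.uniform(0.0, 1.0, size=n_units)  # loadings on the common shock
    # lam_i * B_t + u_it ~ N(0, 1 + lam_i^2), so this q_i gives P(eps_it <= 0) = tau.
    q = np.sqrt(1.0 + lam**2) * NormalDist().inv_cdf(tau)
    shocks = rng.standard_normal(n_periods)  # B_t, shared by all units
    noise = rng.standard_normal((n_units, n_periods))  # idiosyncratic u_it
    eps = lam[:, None] * shocks[None, :] + noise - q[:, None]
    scores = tau - (eps <= 0.0)  # tau - 1{eps_it <= 0}, mean zero for each unit
    return float(np.abs(scores.mean(axis=1)).max())


def rate_ratio(n_units, n_periods, tau=0.5, seed=0):
    """Maximal score average divided by the benchmark rate sqrt(log N / T)."""
    benchmark = np.sqrt(np.log(n_units) / n_periods)
    return max_abs_score(n_units, n_periods, tau, seed) / benchmark


if __name__ == "__main__":
    for n_units, n_periods in [(100, 200), (1000, 200), (1000, 2000)]:
        ratio = rate_ratio(n_units, n_periods)
        print(f"N={n_units:5d}  T={n_periods:5d}  ratio={ratio:.3f}")
```

For each unit the summands are i.i.d. over $t$ and take values in $[\tau - 1, \tau]$, so the two-sided Hoeffding bound together with the union bound makes a ratio above 1.5 extremely unlikely in this design, even though the units are dependent through the common shock.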
