Refined Cluster Robust Inference

Bulat Gafarov† Takuya Ura‡

March 27, 2026

Abstract

It has become standard for empirical studies to conduct inference robust to cluster dependence and heterogeneity. With a small number of clusters, the normal approximation for the t-statistics of regression coefficients may be poor. This paper tackles this problem using a critical value based on the conditional Cramér-Edgeworth expansion for the t-statistics. Our approach guarantees third-order refinement regardless of whether a regressor is discrete and, unlike the cluster pairs bootstrap, avoids resampling data. Simulations show that our proposal can make a difference in size control with as few as 10 clusters.

Keywords: Cluster robust inference, Cramér-Edgeworth expansion, Asymptotic refinement

JEL Classification: C12, C21

∗ We would like to thank A. Colin Cameron for very helpful comments. Ura acknowledges funding from the A. Colin Cameron Associate Professor Research Award.
† Department of Agricultural and Resource Economics, University of California, Davis. Email: bgafarov@ucdavis.edu
‡ Department of Economics, University of California, Davis. Email: takura@ucdavis.edu

1 Introduction

Cluster robust inference has become standard practice in applied microeconometrics. Robustness to arbitrary dependence within a cluster often comes at the cost of a reduction in the effective sample size. As a result, despite a large overall sample size, researchers have to account for the non-Gaussian distribution of the t-statistic in significance tests, as in classical statistics (Student, 1908). In particular, it is well known that the standard normality-based inference method may over-reject when the number of clusters is small (Cameron and Miller, 2015, 2025; MacKinnon, Nielsen, and Webb, 2023).
Instead of assuming Gaussian regression errors, one can impose only mild moment restrictions and derive corrected critical values using the higher-order asymptotic expansions of Edgeworth (1905) and Cramér (1928). In this paper, we propose a new analytical correction to critical values based on inverting the Cramér-Edgeworth expansion of the null distribution of the t-statistic. The resulting inference method is robust against heterogeneous cluster dependence. We first derive the Cramér-Edgeworth expansion up to the second-order term and then adjust the critical value based on the estimated Cramér-Edgeworth term. We develop the approach of Hall (1983) in the setup of linear regression with heterogeneous clusters.¹ As special cases, our approach also nests inference on the mean of non-identically distributed data and on regression coefficients in cross-sectional regression. The resulting inference is third-order asymptotically accurate in the sense that, with $G$ clusters, the actual test size differs from the prespecified significance level $\alpha$ only by $o(G^{-1})$. Our simulation studies support this theoretical size control.

The cluster pairs bootstrap might be a popular choice for asymptotic refinement, but our proposed method has several advantages over it.² First and most importantly, existing simulation studies show that the cluster pairs bootstrap does not perform well with a small number of clusters. For example, Cameron, Gelbach, and Miller (2008) use a simulation design based on Bertrand, Duflo, and Mullainathan (2004) and explain that the poor performance of the cluster pairs bootstrap arises because the resampled Gram matrices $X'X$ can be nearly singular. Our approach of analytically inverting the Cramér-Edgeworth expansion avoids this problem by not resampling $X'X$, while still achieving asymptotic refinement.
Second, the cluster pairs bootstrap relies on independence across $X_1, \ldots, X_G$, while our expansion does not require it. We allow $X_1, \ldots, X_G$ to be correlated, e.g., through adaptive randomization. Third, the standard proofs of the asymptotic refinement of the cluster pairs bootstrap (e.g., Liu, 1988; Hall, 2013) exclude discrete regressors. To apply the results of Hall (2013, Ch. 5) to the regression framework, we need to impose Cramér's condition on the regressors, but Cramér's condition fails for discrete random variables (Bhattacharya and Rao, 1976, p. 207).³ Last, our proposed critical value has a closed-form expression, so it is much faster to compute than resampling-based critical values.

¹ In this paper, we consider a two-sided alternative hypothesis and the second-order Cramér-Edgeworth expansion. As a result, we do not need to apply the Cramér-Edgeworth expansion of the estimated coefficient recursively, as in Hall (1983).
² The residual bootstrap cannot be applied for cluster robust inference when the sample sizes vary across clusters.
³ We could not find a sufficient condition for the asymptotic refinement of the cluster pairs bootstrap that allows for discrete regressors and non-identical distributions. A weaker version of Cramér's condition has been proposed, e.g., in Bai and Rao (1991), but the asymptotic refinement of the cluster pairs bootstrap without the classical Cramér's condition is beyond the scope of this paper.

A number of other methods have been proposed for the construction of standard errors and confidence intervals. Our proposal is unique in the following sense: it demonstrates good finite-sample performance in the simulation designs based on Bertrand et al. (2004) while simultaneously achieving third-order asymptotic refinement. For example, Cameron et al. (2008) propose the wild cluster bootstrap and demonstrate its good finite-sample properties in simulations. Djogbenou, MacKinnon, and Nielsen (2019) establish the asymptotic size control of the wild cluster bootstrap, and Canay, Santos, and Shaikh (2021) show that the wild cluster bootstrap controls size even with a fixed number of clusters, as long as each cluster has a large number of observations. However, Theorem 5.2 of Djogbenou et al. (2019) shows that the wild cluster bootstrap does not achieve asymptotic refinement when the score has non-zero skewness. Our simulation results in Section 3.2 confirm this result of Djogbenou et al. (2019), and in such cases our proposed method shows better size control than the wild cluster bootstrap.

This paper is also related to the use of the t-distribution for the critical value and to degrees-of-freedom adjustments of the t-distribution (McCaffrey and Bell, 2003; Bester, Conley, and Hansen, 2011; Imbens and Kolesar, 2016; Young, 2016; Hansen, 2021, 2025). These papers assume normal and homoscedastic errors. In contrast, our approach relies neither on normal errors nor on a specific covariance structure for the error terms within a cluster.

The paper proceeds as follows. Section 2 formally introduces the regression model with clustered errors and the new critical value. Section 3 presents Monte Carlo simulations for the proposed critical value and existing ones. Section 4 concludes. The appendix collects all the proofs and additional results.

2 Inverting a Cramér-Edgeworth Expansion with Clustered Errors

We have the dataset $\{(Y_{ig}, X_{ig}) : i = 1, \ldots, N_g,\ g = 1, \ldots, G\}$ and consider the regression model $Y_{ig} = X_{ig}'\beta + u_{ig}$ with $E[u_{ig} \mid X_{1g}, \ldots, X_{N_g g}] = 0$ and $\dim(X_{ig}) = k$. We assume the observations $\{(Y_{ig}, X_{ig}) : i = 1, \ldots, N_g\}$ are independent across $g = 1, \ldots, G$. To emphasize that we use the independence across $g$, we use the matrix notation $Y_g = (Y_{1g}, \ldots, Y_{N_g g})'$ and $X_g = (X_{1g}', \ldots, X_{N_g g}')'$, and write the regression model succinctly as $Y_g = X_g\beta + u_g$ with $E[u_g \mid X_g] = 0$. For a fixed vector $\lambda$ and a hypothesized value $c_0$ for $\lambda'\beta$, we consider the hypothesis testing problem
$$H_0 : \lambda'\beta = c_0 \quad \text{vs} \quad H_1 : \lambda'\beta \neq c_0$$
with significance level $\alpha \in (0, 1)$. The OLS estimator of $\beta$ is
$$\hat\beta = \Big(\frac{1}{G}\sum_{g=1}^G X_g'X_g\Big)^{-1}\Big(\frac{1}{G}\sum_{g=1}^G X_g'Y_g\Big).$$
Consider the asymptotic variance estimator for $\lambda'\hat\beta$ defined by
$$\hat\sigma^2 = \frac{1}{G}\sum_{g=1}^G(\lambda'\Pi X_g'\hat u_g)^2 \quad \text{with} \quad \Pi = \Big(\frac{1}{G}\sum_{g=1}^G X_g'X_g\Big)^{-1} \quad \text{and} \quad \hat u_g = Y_g - X_g\hat\beta.$$
Define the t-statistic by
$$t = \sqrt{G}\,\frac{\lambda'\hat\beta - c_0}{\hat\sigma}.$$
From now on, we estimate the null distribution of the above t-statistic $t$ and construct a critical value for it. We treat the covariates $X = \{X_g\}_{g=1}^\infty$ as fixed, so we investigate $\Pr(|t| \le z \mid X = x)$ for a given sequence of constants $x = \{x_g\}_{g=1}^\infty$. We consider the numerator and denominator of
$$t = \frac{\sqrt{G}(\lambda'\hat\beta - c_0)/\sigma}{\hat\sigma/\sigma}$$
under the null hypothesis $H_0$, where $\sigma^2$ is the asymptotic variance for $\lambda'\hat\beta$ defined by
$$\sigma^2 = \frac{1}{G}\sum_{g=1}^G\sigma_g^2 \quad \text{with} \quad \sigma_g^2 = E\big[(\lambda'\Pi X_g'u_g)^2 \mid X = x\big].$$
The numerator has the linear representation
$$(\lambda'\hat\beta - c_0)/\sigma = \frac{1}{G}\sum_{g=1}^G\omega_{1g} \quad \text{with} \quad \omega_{1g} = \sigma^{-1}\lambda'\Pi X_g'u_g.$$
The square of the denominator, $\hat\sigma^2/\sigma^2$, has the following quadratic representation. The proof is given in Section A.1.

Lemma 2.1.
$$\frac{\hat\sigma^2}{\sigma^2} = 1 - \Big(\frac{1}{G}\sum_{g=1}^G\omega_{2g}\Big)'\,\Gamma\,\Big(\frac{1}{G}\sum_{g=1}^G\omega_{2g}\Big) + \frac{1}{G}\sum_{g=1}^G\omega_{3g},$$
where
$$\Gamma = \begin{pmatrix} -\frac{1}{G}\sum_{g=1}^G X_g'X_g\Pi'\lambda\lambda'\Pi X_g'X_g & I_k \\ I_k & O \end{pmatrix}$$
and
$$\omega_{2g} = \sigma^{-1}\begin{pmatrix} I_k \\ X_g'X_g\Pi'\lambda\lambda' \end{pmatrix}\Pi X_g'u_g, \qquad \omega_{3g} = \sigma^{-2}\big((\lambda'\Pi X_g'u_g)^2 - \sigma_g^2\big).$$

To approximate the null distribution of the t-statistic, we approximate the distribution of $\frac{1}{\sqrt{G}}\sum_{g=1}^G(\omega_{1g}, \omega_{2g}', \omega_{3g})'$ up to $o(G^{-1})$. For this purpose, we use the following moments of $\omega_{1g}$ and $\omega_{2g}$:
$$\mu_{1,2} = \frac{1}{G}\sum_g E[\omega_{1g}\omega_{2g} \mid X = x], \qquad \mu_{2,2} = \frac{1}{G}\sum_g E[\omega_{2g}'\Gamma\omega_{2g} \mid X = x],$$
$$\mu_{1,1,1} = \frac{1}{G}\sum_g E[\omega_{1g}^3 \mid X = x], \qquad \mu_{1,1,1,1} = \frac{1}{G}\sum_g E[\omega_{1g}^4 \mid X = x].$$
We assume these moments are bounded.

Assumption 1. $\mu_{1,2}$, $\mu_{2,2}$, $\mu_{1,1,1}$, and $\mu_{1,1,1,1}$ are bounded uniformly in $G$.⁴

The (second-order) Cramér-Edgeworth expansion of the null distribution of the t-statistic is expressed under the following assumption, for which we provide a sufficient condition in Section 2.1.

Assumption 2. Under $H_0$, there is a neighborhood $\mathcal{N}$ of $\Phi^{-1}(1-\alpha/2)$ such that
$$\sup_{z \in \mathcal{N}}\Big|\Pr(|t| \le z \mid X = x) - \big(2\Phi(z) - 1 + 2G^{-1}q_2(z)\phi(z)\big)\Big| = o(G^{-1}),$$
where $\Phi$ and $\phi$ are the standard normal distribution and density functions, $He_r$ is the $r$-th order (probabilists') Hermite polynomial, and
$$q_2(z) = -\Big(\frac{1}{2}(k_2 + k_1^2)He_1(z) + \frac{1}{24}(k_4 + 4k_1k_3)He_3(z) + \frac{1}{72}k_3^2He_5(z)\Big).$$
The parameters $k_1, \ldots, k_4$ are defined as
$$k_1 = \nu_1, \quad k_2 = \nu_2 - \nu_1^2, \quad k_3 = \nu_3 - 3\nu_1, \quad k_4 = \nu_4 - 4\nu_1\nu_3 - 6\nu_2 + 12\nu_1^2,$$
where
$$\nu_1 = -\frac{\mu_{1,1,1}}{2}, \quad \nu_2 = 2\mu_{1,1,1}^2 + \mu_{2,2} + 2\mu_{1,2}'\Gamma\mu_{1,2}, \quad \nu_3 = -\frac{7}{2}\mu_{1,1,1}, \quad \nu_4 = -2\mu_{1,1,1,1} + 28\mu_{1,1,1}^2 + 6\mu_{2,2} + 24\mu_{1,2}'\Gamma\mu_{1,2}.$$

⁴ All the results in this paper only require the assumptions to hold for sufficiently large $G$. To simplify the exposition, we impose the assumptions for every value of $G$.
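The chain from the four moment functionals to $q_2(z)$, and from $q_2$ to a corrected critical value, can be transcribed directly. The sketch below is our own plain-Python illustration, not the authors' code; the function names `q2` and `critical_value` are hypothetical. It assumes the last term of $\nu_4$ is the same $\Gamma$-weighted form $\mu_{1,2}'\Gamma\mu_{1,2}$ that enters $\nu_2$, and it uses $He_1(z) = z$, $He_3(z) = z^3 - 3z$, $He_5(z) = z^5 - 10z^3 + 15z$. The critical value comes from setting $2\Phi(z) - 1 + 2G^{-1}q_2(z)\phi(z) = 1 - \alpha$ and solving to first order in $G^{-1}$ around $z_\alpha = \Phi^{-1}(1-\alpha/2)$, which gives $z = z_\alpha - G^{-1}q_2(z_\alpha)$.

```python
from statistics import NormalDist


def q2(z, m12g12, m22, m111, m1111):
    """Second-order term q_2(z) of Assumption 2.

    m12g12 = mu_{1,2}' Gamma mu_{1,2}, m22 = mu_{2,2},
    m111 = mu_{1,1,1}, m1111 = mu_{1,1,1,1}.
    """
    # nu_1, ..., nu_4 (assumption: nu_4 uses the Gamma-weighted form m12g12)
    nu1 = -m111 / 2
    nu2 = 2 * m111**2 + m22 + 2 * m12g12
    nu3 = -3.5 * m111
    nu4 = -2 * m1111 + 28 * m111**2 + 6 * m22 + 24 * m12g12
    # k_1, ..., k_4 as defined in Assumption 2
    k1 = nu1
    k2 = nu2 - nu1**2
    k3 = nu3 - 3 * nu1
    k4 = nu4 - 4 * nu1 * nu3 - 6 * nu2 + 12 * nu1**2
    # probabilists' Hermite polynomials He_1, He_3, He_5
    he1, he3, he5 = z, z**3 - 3 * z, z**5 - 10 * z**3 + 15 * z
    return -(0.5 * (k2 + k1**2) * he1
             + (k4 + 4 * k1 * k3) / 24 * he3
             + k3**2 / 72 * he5)


def critical_value(G, alpha, m12g12, m22, m111, m1111):
    """First-order inversion: cv = z_alpha - q_2(z_alpha) / G."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z - q2(z, m12g12, m22, m111, m1111) / G


# Sample-mean case with exponential data (skewness 2, excess kurtosis 6):
# m12g12 = m22 = 1, m111 = 2, m1111 = 9, with G = 10 clusters.
print(critical_value(10, 0.05, 1.0, 1.0, 2.0, 9.0))
```

As a consistency check, in the sample-mean case ($N_g = 1$, $X_g = 1$) one has $\mu_{1,2}'\Gamma\mu_{1,2} = \mu_{2,2} = 1$, $\mu_{1,1,1} = \gamma$ (skewness) and $\mu_{1,1,1,1} = \kappa + 3$ ($\kappa$ the excess kurtosis), and a short calculation reduces `q2` to $z[\kappa(z^2 - 3)/12 - \gamma^2(z^4 + 2z^2 - 3)/18 - (z^2 + 3)/4]$. For the exponential example above with $G = 10$, the corrected 5% critical value is roughly 3.06 instead of 1.96.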
The function $q_2(z)$ in the (second-order) Cramér-Edgeworth expansion is unknown since we do not know the population quantities $\mu_{1,2}'\Gamma\mu_{1,2}$, $\mu_{2,2}$, $\mu_{1,1,1}$, and $\mu_{1,1,1,1}$. We can estimate them by their sample analogs:
$$\hat\mu_{1,2}'\Gamma\hat\mu_{1,2} = \Big(\frac{1}{G}\sum_g\hat\omega_{1g}\hat\omega_{2g}\Big)'\,\Gamma\,\Big(\frac{1}{G}\sum_g\hat\omega_{1g}\hat\omega_{2g}\Big), \qquad \hat\mu_{2,2} = \frac{1}{G}\sum_g\hat\omega_{2g}'\Gamma\hat\omega_{2g},$$
$$\hat\mu_{1,1,1} = \frac{1}{G}\sum_g\hat\omega_{1g}^3, \qquad \hat\mu_{1,1,1,1} = \frac{1}{G}\sum_g\hat\omega_{1g}^4,$$
where
$$\hat\omega_{1g} = \hat\sigma^{-1}\lambda'\Pi X_g'\hat u_g, \qquad \hat\omega_{2g} = \hat\sigma^{-1}\begin{pmatrix} I_k \\ X_g'X_g\Pi'\lambda\lambda' \end{pmatrix}\Pi X_g'\hat u_g.$$
Plugging these sample analogs into the definition of $q_2(z)$ gives the estimator $\hat q_2(z)$. Our proposed critical value is
$$\widehat{cv} = \Phi^{-1}(1-\alpha/2) - G^{-1}\hat q_2\big(\Phi^{-1}(1-\alpha/2)\big),$$
and the resulting confidence interval for $\lambda'\beta$ is $\lambda'\hat\beta \pm \widehat{cv}\,\hat\sigma/\sqrt{G}$.

We impose the following conditions on moments.

Assumption 3. (i) $\sigma^2$ and the minimum eigenvalue of $\frac{1}{G}\sum_{g=1}^G x_g'x_g$ are bounded away from zero uniformly in $G$. (ii) $\frac{1}{G}\sum_{g=1}^G\|x_g'x_g\|^4$ and $\frac{1}{G}\sum_{g=1}^G E[\|X_g'u_g\|^j \mid X = x]$ are bounded uniformly in $G$ for every positive integer $j$.

The second condition makes the characteristic function of $\frac{1}{G}\sum_{g=1}^G\omega_g$, with $\omega_g = (\omega_{1g}, \omega_{2g}', \omega_{3g})'$, infinitely differentiable and simplifies the proofs. We may weaken it to a bounded moment condition up to a certain order by truncating $\omega_g$ (cf. Hall, 2013, p. 256; Bhattacharya and Ghosh, 1978, p. 446). The bounded higher-order moments are crucial for our inference because we estimate the population quantities $\mu_{1,2}$, $\mu_{2,2}$, $\mu_{1,1,1}$, and $\mu_{1,1,1,1}$. For example, $\mu_{1,1,1,1}$ is the average fourth moment of $\omega_{1g}$. In the proof, we use the fourth moment of its estimator and thus require bounded 16th moments. In Theorem 1 below, we show size control for the proposed critical value.
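For a single regressor ($k = 1$, $\lambda = 1$) the estimation steps above fit in a few lines of plain Python. This is a hedged sketch, not the authors' implementation: the helper `analytical_cluster_ci`, its input format (a list of `(ys, xs)` pairs, one per cluster) and the restriction to $k = 1$ are our assumptions. With $k = 1$, $\hat\omega_{2g}$ has two entries and we read $\Gamma$ as the $2 \times 2$ matrix with top-left block $-A$, $A = G^{-1}\sum_g(X_g'X_g\Pi)^2$, off-diagonal entries 1 and a zero bottom-right entry, which is the reading under which the algebra of Section A.1 reproduces Lemma 2.1. The interval half-width $\widehat{cv}\,\hat\sigma/\sqrt{G}$ is what inverting $|t| \le \widehat{cv}$ gives.

```python
from statistics import NormalDist


def q2(z, m12g12, m22, m111, m1111):
    # q_2(z) of Assumption 2 (m12g12 stands for mu_{1,2}' Gamma mu_{1,2})
    nu1, nu3 = -m111 / 2, -3.5 * m111
    nu2 = 2 * m111**2 + m22 + 2 * m12g12
    nu4 = -2 * m1111 + 28 * m111**2 + 6 * m22 + 24 * m12g12
    k1, k2, k3 = nu1, nu2 - nu1**2, nu3 - 3 * nu1
    k4 = nu4 - 4 * nu1 * nu3 - 6 * nu2 + 12 * nu1**2
    he1, he3, he5 = z, z**3 - 3 * z, z**5 - 10 * z**3 + 15 * z
    return -(0.5 * (k2 + k1**2) * he1 + (k4 + 4 * k1 * k3) / 24 * he3
             + k3**2 / 72 * he5)


def analytical_cluster_ci(clusters, alpha=0.05, c0=0.0):
    """Hypothetical helper for one regressor (k = 1, lambda = 1).

    clusters: list of (ys, xs) pairs, one pair per cluster g.
    Returns beta_hat, sigma_hat, t, cv, the interval and the sample analogs.
    """
    G = len(clusters)
    xx = [sum(x * x for x in xs) for _, xs in clusters]               # X_g'X_g
    xy = [sum(x * y for y, x in zip(ys, xs)) for ys, xs in clusters]  # X_g'Y_g
    pi = G / sum(xx)                                                  # Pi
    beta = pi * sum(xy) / G                                           # OLS
    # s_g = lambda' Pi X_g' u_hat_g for every cluster
    s = [pi * sum(x * (y - x * beta) for y, x in zip(ys, xs))
         for ys, xs in clusters]
    sigma = (sum(v * v for v in s) / G) ** 0.5                        # sigma_hat
    a = sum((c * pi) ** 2 for c in xx) / G  # Gamma = [[-a, 1], [1, 0]]

    def quad(v, w):  # v' Gamma w for 2-vectors
        return -a * v[0] * w[0] + v[0] * w[1] + v[1] * w[0]

    w1 = [v / sigma for v in s]                                       # omega_hat_1g
    w2 = [(v / sigma, c * pi * v / sigma) for v, c in zip(s, xx)]     # omega_hat_2g
    m12 = (sum(p * q[0] for p, q in zip(w1, w2)) / G,
           sum(p * q[1] for p, q in zip(w1, w2)) / G)
    m12g12 = quad(m12, m12)
    m22 = sum(quad(q, q) for q in w2) / G
    m111 = sum(p**3 for p in w1) / G
    m1111 = sum(p**4 for p in w1) / G
    z = NormalDist().inv_cdf(1 - alpha / 2)
    cv = z - q2(z, m12g12, m22, m111, m1111) / G
    half = cv * sigma / G**0.5
    return {"beta": beta, "sigma": sigma, "t": G**0.5 * (beta - c0) / sigma,
            "cv": cv, "ci": (beta - half, beta + half),
            "moments": (m12g12, m22, m111, m1111)}
```

For instance, `analytical_cluster_ci([([y], [1.0]) for y in data])` treats each observation as its own cluster with only an intercept, the design of Section 3.2; there the sample analogs collapse to 1, 1, the residual skewness and the residual fourth moment divided by $\hat\sigma^4$. Pure Python keeps the sketch dependency-free; with $k > 1$ the same steps would need matrix arithmetic.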
Note that even if Assumption 2 fails, we can still achieve size control with an asymptotic approximation error of the standard rate $O(G^{-1})$, as long as asymptotic normality holds. This point resembles the efficiency gain of feasible generalized least squares estimation (Cameron and Miller, 2025, Section 3.5).

Theorem 1. Under $H_0$ and Assumptions 1–3, $\Pr(|t| \le \widehat{cv} \mid X = x) = 1 - \alpha + o(G^{-1})$.

The proof is provided in Section A.2.

Remark 1. Our proposed confidence interval does not rely on a particular choice of the asymptotic variance estimator in the following sense. Suppose we use another asymptotic variance estimator for $\lambda'\hat\beta$, denoted by $\tilde\sigma^2$. Then the t-statistic becomes
$$\sqrt{G}\,\frac{\lambda'\hat\beta - c_0}{\tilde\sigma} = \frac{\hat\sigma}{\tilde\sigma}\,t.$$
We can accordingly change the critical value to $(\hat\sigma/\tilde\sigma)\widehat{cv}$, but the resulting confidence interval is still $\lambda'\hat\beta \pm \widehat{cv}\,\hat\sigma/\sqrt{G}$.

Remark 2. In this paper, we treat $X$ as fixed, so we do not apply the Cramér-Edgeworth expansion to $\frac{1}{G}\sum_{g=1}^G X_g'X_g$. This differs from Djogbenou et al. (2019), who treat the regressor $X$ as random when they derive the Cramér-Edgeworth expansion for the t-statistic $t$.⁵ First, our expansion does not require the independence of $X_g$ across $g$. As a consequence, we allow $X_1, \ldots, X_G$ to be generated by adaptive randomization. Second, the assumption in Djogbenou et al. (2019) for the Cramér-Edgeworth expansion excludes empirically relevant cases such as binary regressors. With random regressors, they apply the Cramér-Edgeworth expansion to the term $\frac{1}{G}\sum_{g=1}^G X_g'X_g$ and assume Cramér's condition on $X_g'X_g$. This condition fails if $X_g$ includes discrete variables (Bhattacharya and Rao, 1976, p. 207).⁶ In this paper, we do not require Cramér's condition on $X_g'X_g$.
Third, the resulting Cramér-Edgeworth expansion has fewer unknown parameters to estimate. Namely, we do not need to estimate the moments of $X_g'X_g$.

Remark 3. The above Cramér-Edgeworth expansion for the t-statistic in Assumption 2 differs from the one for the (standardized) sample mean $\sigma^{-1}\frac{1}{\sqrt{G}}\sum_{g=1}^G\lambda'\Pi X_g'u_g$ because we estimate the average variance $\sigma^2$. One qualitative difference is that we can estimate the above $q_2(z)$ using the sample analogs of its coefficients, while we cannot estimate the Cramér-Edgeworth expansion for $\sigma^{-1}\frac{1}{\sqrt{G}}\sum_{g=1}^G\lambda'\Pi X_g'u_g$ without additional assumptions. In this sense, even if we know the value of $\sigma^2$, the inference strategy proposed in this paper requires its estimated value.⁷

⁵ They do not use the Cramér-Edgeworth expansion to construct asymptotically refined inference, but instead use it to show that the wild bootstrap cannot achieve the asymptotic refinement of $o(G^{-1})$. As a result, this remark is not relevant for their analysis of the wild bootstrap, because their conclusion relies on the fact that the wild bootstrap cannot replicate the first four cumulants of the Cramér-Edgeworth expansion.
⁶ Consider the case where $X_{ig}$ includes a binary variable $D_g$. The matrix $X_g'X_g$ includes $N_gD_g^2$, which takes the two values $N_g$ and $0$. By choosing the vector $t$ such that $t'\operatorname{vech}(X_g'X_g) = \|t\|N_gD_g^2$, we have $E[\exp(it'\operatorname{vech}(X_g'X_g))] = \Pr(D_g = 1)\exp(i\|t\|N_g) + (1 - \Pr(D_g = 1)) = 1$ whenever $\|t\|N_g$ is a multiple of $2\pi$. This implies that Cramér's condition fails.

2.1 Cramér-Edgeworth Expansion in Assumption 2

In this subsection, we provide a sufficient condition for the Cramér-Edgeworth expansion in Assumption 2.

Assumption 4. There are random vectors $\eta_1, \ldots$
, $\eta_G$, and matrices $M_1$ and $M_2$ such that $\omega_g = M_1\eta_g$ and $\eta_g = M_2\omega_g$, and the minimum eigenvalue of $V_G = \operatorname{Var}\big(\frac{1}{\sqrt{G}}\sum_{g=1}^G\eta_g \mid X = x\big)$ is bounded away from zero uniformly in $G$.

This assumption removes the redundant or duplicated elements from $\omega_g$. This removal is necessary to normalize the random variable $\frac{1}{\sqrt{G}}\sum_{g=1}^G\eta_g$ by the matrix square root of the variance matrix $V_G$.

Assumption 5. There are positive numbers $C$, $R$, $b$ with $b < 2$ such that $\frac{1}{G}\sum_{g=1}^G|E[\exp(it'\eta_g) \mid X = x]| \le 1 - C\|t\|^{-b}$ for every $t$ with $\|t\| > R$ and for sufficiently large $G$.

This assumption is the mean weak Cramér condition proposed in Angst and Poly (2017). We provide a sufficient condition for Assumption 5.

Theorem 2. Suppose $u_g$ has an absolutely continuous distribution given $X = x$ (with respect to the $N_g$-dimensional Lebesgue measure), and there are positive numbers $R$, $c_1$, $c_2$ such that $|B(t)| \ge c_2G$ for every $t$ with $\|t\| = 1$ and sufficiently large $G$, where
$$B(t) = \Big\{g : \frac{\pi^2}{16R^2} \le \operatorname{Var}(t'\eta_g \mid X = x) \le c_1\operatorname{med}\big(f_{t'\eta_g \mid X = x}(t'\eta_g) \mid X = x\big)^{-2}\Big\}.⁸$$
Then Assumption 5 holds.

⁷ The Cramér-Edgeworth expansion for $\sigma^{-1}\frac{1}{\sqrt{G}}\sum_{g=1}^G\lambda'\Pi X_g'u_g$ involves the average squared variance
$$\theta = \frac{1}{G}\sum_{g=1}^G(\sigma_g^2)^2 = \frac{1}{G}\sum_{g=1}^G\big(\lambda'\Pi X_g'E[u_gu_g' \mid X = x]X_g\Pi\lambda\big)^2,$$
and the error variance matrix $E[u_gu_g' \mid X = x]$ is not specified in this paper. To illustrate how the average squared variance appears in the Cramér-Edgeworth expansion, we assume $\sigma = 1$. To approximate the distribution of $\frac{1}{\sqrt{G}}\sum_{g=1}^G\lambda'\Pi X_g'u_g$ up to $o(G^{-1})$, we characterize $\log(E[\exp(iz\lambda'\Pi X_g'u_g/\sqrt{G})])$ up to $o(G^{-2})$. We have the Taylor expansion
$$E[\exp(iz\lambda'\Pi X_g'u_g/\sqrt{G}) \mid X = x] = 1 + \sigma_g^2\frac{(iz/\sqrt{G})^2}{2!} + \cdots.$$
Since $\log(1 + z) = z - \frac{1}{2}z^2 + O(|z|^3)$ as $z \to 0$, we have
$$\log\big(E[\exp(iz\lambda'\Pi X_g'u_g/\sqrt{G}) \mid X = x]\big) = \sigma_g^2\frac{(iz/\sqrt{G})^2}{2!} - \frac{1}{2}\Big(\sigma_g^2\frac{(iz/\sqrt{G})^2}{2!}\Big)^2 + \cdots.$$
By the independence across $g$, the log characteristic function of $\frac{1}{\sqrt{G}}\sum_{g=1}^G\lambda'\Pi X_g'u_g$ is the sum of the log characteristic functions of $\lambda'\Pi X_g'u_g/\sqrt{G}$, and therefore it involves the average squared variance $\frac{1}{G}\sum_{g=1}^G(\sigma_g^2)^2$. The squared variance $(\sigma_g^2)^2$ appears in the above equation because the logarithm is nonlinear and we use the quadratic approximation to achieve a remainder term of $o(G^{-1})$.

⁸ $f_{t'\eta_g \mid X = x}$ is the probability density function of $t'\eta_g$ given $X = x$ for every $g$ with $\operatorname{Var}(t'\eta_g \mid X = x) > 0$. For every $t$, the distribution of $t'\eta_g$ given $X = x$ is either a point mass at zero or absolutely continuous (with respect to the one-dimensional Lebesgue measure). Therefore, if $\operatorname{Var}(t'\eta_g \mid X = x) > 0$, then the distribution of $t'\eta_g$ given $X = x$ has a probability density function.

The proof is given in Section A.3. The above three assumptions constitute a sufficient condition for Assumption 2.

Theorem 3. Assumptions 3–5 imply Assumption 2.

The proof is given in Section A.4.

3 Monte Carlo Simulations

In this section, we investigate the finite-sample performance of the critical value proposed in Section 2 using simulated data. We compare our method (denoted "Analytical" in the figures) with several existing methods: (i) the $t_{G-1}$ critical value ("Student"), (ii) the restricted wild cluster bootstrap with Rademacher weights of Cameron et al. (2008) ("CWB"), and (iii) the pairs percentile-t cluster bootstrap ("Pairs"). We use 10,000 simulation replications and 1,000 draws for the bootstrap procedures. We consider two designs that are challenging for existing methods but are handled well by our analytical corrections. The first design features binary regressors, which are challenging for the pairs cluster bootstrap, while the second design features skewed errors, which are challenging for the wild cluster bootstrap.

3.1 Design with Binary Regressors

In this section, we follow the simulation design of Bertrand et al. (2004) and Cameron et al. (2008, Section 5.A). It uses a state-year panel of excess earnings from 1979 to 1999 based on the Current Population Survey.⁹ For each simulation draw, we randomly select $G$ out of 50 states with replacement. We draw the policy change time uniformly from $\{1984, \ldots, 1993\}$ and assume that half of the $G$ states experience the policy change after the selected time period. We construct the policy dummy variable accordingly. By construction, this policy dummy variable has a zero coefficient in the population. We regress the excess earnings on the policy dummy, the year dummies, and the state dummies, and conduct the significance test for the coefficient of the policy dummy variable.

⁹ We use the data from the replication package of Cameron and Miller (2015): https://cameron.econ.ucdavis.edu/research/papers.html. In particular, this simulation exercise uses the variable lnwage from CPS_panel.dta from 1979 to 1999.

In the data generating process of this section, the skewness of the score is close to zero ($\hat\mu_{1,1,1} = 0.02$ for $G = 10^4$). Although Cramér's condition does not hold for this design because all the variables are discrete (cf. Bhattacharya and Rao, 1976, Ch. 5), we can examine whether a critical value accounts for the skewness and kurtosis of the t-statistic, which are the key components of the second-order Cramér-Edgeworth expansion. Our proposed method matches these moments by estimating them explicitly. At the same time, the wild cluster bootstrap approximates the
skewness and kurtosis well because it uses zero skewness and estimates the kurtosis consistently (Djogbenou et al., 2019, Section 5.2).

[Figure 1: Rejection probabilities for two-sided tests in the Bertrand et al. (2004) design; panel N = 21, α = 0.05; test size plotted against G for Student, Analytical, Pairs, and CWB. N is the number of observations per cluster, and α is the nominal test size.]

Figure 1 shows the rejection probabilities for the different methods. As documented in Cameron et al. (2008), the pairs cluster bootstrap under-rejects for small values of $G$ (e.g., $G = 6, 10$), while inference based on the $t_{G-1}$ critical value over-rejects. All methods approximately control size when $G$ is sufficiently large (e.g., $G = 50$). Our proposed method performs comparably to the wild cluster bootstrap, even for $G$ as small as 10.

3.2 Design with a Skewed Error Distribution

To compare the methods when the error has large skewness, we consider the case with $N_g = 1$, $X_{ig} = 1$, and $Y_{ig}$ following the exponential distribution with unit mean. This design is used in Section 3 of Hall (1983) for one-sided tests. Since the error has a skewness of 2, Theorem 5.2 of Djogbenou et al. (2019) implies that the wild cluster bootstrap does not have asymptotic refinement in this case.

Figure 2 shows the rejection probabilities for the different methods with the skewed error distribution. Again, the rejection probabilities of all the methods approach the prespecified significance level (5%) as $G$ increases, which confirms their asymptotic validity. However, their finite-sample performance differs. The rejection probabilities of our proposed method approach 5% faster than those of the wild cluster bootstrap. This is consistent with the theoretical fact that the wild cluster bootstrap does not have asymptotic refinement under this data distribution. In contrast, our analytical approach and the pairs cluster bootstrap both achieve asymptotic refinement, which explains why these two methods are similar to each other and much closer to the nominal size than the other two methods in Figure 2.

[Figure 2: Rejection probabilities for two-sided tests with a skewed error distribution; panel N = 1, α = 0.05; test size plotted against G for Student, Analytical, Pairs, and CWB. N is the number of observations per cluster, and α is the nominal test size.]

4 Conclusion

In this paper, we propose an inference method for linear regression with clustered errors that achieves third-order asymptotic refinement. Unlike the cluster pairs bootstrap, it does not resample the Gram matrix $\frac{1}{G}\sum_{g=1}^G X_g'X_g$, thus avoiding the small-sample issues of the cluster pairs bootstrap (cf. Cameron et al., 2008). Our simulation results show favorable finite-sample performance of the proposed method. Notably, it performs comparably to the wild bootstrap in the simulation design based on Bertrand et al. (2004), and in some designs (with skewed distributions) it has better size control than the wild bootstrap.

References

Angst, J. and G. Poly (2017): "A weak Cramér condition and application to Edgeworth expansions," Electronic Journal of Probability, 22, 1–24.
Bai, Z. and C. R. Rao (1991): "Edgeworth expansion of a function of sample means," The Annals of Statistics, 1295–1315.
Bertrand, M., E. Duflo, and S. Mullainathan (2004): "How much should we trust differences-in-differences estimates?" The Quarterly Journal of Economics, 119, 249–275.
Bester, C. A., T. G. Conley, and C. B. Hansen (2011): "Inference with dependent data using cluster covariance estimators," Journal of Econometrics, 165, 137–151.
Bhattacharya, R. N. and J. K.
Ghosh (1978): "On the validity of the formal Edgeworth expansion," The Annals of Statistics, 6, 434–451.
Bhattacharya, R. N. and R. R. Rao (1976): Normal Approximation and Asymptotic Expansions, SIAM.
Bobkov, S., G. Chistyakov, and F. Götze (2012): "Bounds for characteristic functions in terms of quantiles and entropy," Electronic Communications in Probability, 17, 1–9.
Cameron, A. C., J. B. Gelbach, and D. L. Miller (2008): "Bootstrap-based improvements for inference with clustered errors," The Review of Economics and Statistics, 90, 414–427.
Cameron, A. C. and D. L. Miller (2015): "A practitioner's guide to cluster-robust inference," Journal of Human Resources, 50, 317–372.
Cameron, A. C. and D. L. Miller (2025): "Inference for Regression with Clustered or Spatially Correlated Data," Working Paper.
Canay, I. A., A. Santos, and A. M. Shaikh (2021): "The wild bootstrap with a 'small' number of 'large' clusters," Review of Economics and Statistics, 103, 346–363.
Cramér, H. (1928): "On the Composition of Elementary Errors," Skandinavisk Aktuarietidskrift, 11, 13–74, 141–180.
Djogbenou, A. A., J. G. MacKinnon, and M. Ø. Nielsen (2019): "Asymptotic Theory and Wild Bootstrap Inference with Clustered Errors," Journal of Econometrics, 212, 393–412.
Edgeworth, F. Y. (1905): "The Law of Error," Proceedings of the Cambridge Philosophical Society, 20, 36–65.
Hall, P. (1983): "Inverting an Edgeworth expansion," The Annals of Statistics, 569–576.
Hall, P. (2013): The Bootstrap and Edgeworth Expansion, Springer Science & Business Media.
Hansen, B. E. (2021): "The exact distribution of the White t-ratio," Manuscript, University of Wisconsin.
Hansen, B. E. (2025): "Jackknife standard errors for clustered regression," Review of Economic Studies.
Horn, R. A. and C. R. Johnson (2012): Matrix Analysis, Cambridge University Press.
Imbens, G. W. and M.
Kolesar (2016): "Robust standard errors in small samples: Some practical advice," Review of Economics and Statistics, 98, 701–712.
Liu, R. Y. (1988): "Bootstrap procedures under some non-i.i.d. models," The Annals of Statistics, 1696–1708.
MacKinnon, J. G., M. Ø. Nielsen, and M. D. Webb (2023): "Cluster-robust inference: A guide to empirical practice," Journal of Econometrics, 232, 272–299.
McCaffrey, D. F. and R. M. Bell (2003): "Bias reduction in standard errors for linear regression with multi-stage samples," Quality Control and Applied Statistics, 48, 677–682.
Student (1908): "The probable error of a mean," Biometrika, 1–25.
Young, A. (2016): "Improved, nearly exact, statistical inference with robust and clustered covariance matrices using effective degrees of freedom corrections," Manuscript, London School of Economics.

Appendix A Proofs

A.1 Proof of Lemma 2.1

Proof of Lemma 2.1. Note that $\lambda'\Pi X_g'\hat u_g = \lambda'\Pi X_g'u_g - \lambda'\Pi X_g'X_g(\hat\beta - \beta)$ and that
$$\frac{1}{G}\sum_{g=1}^G(\lambda'\Pi X_g'\hat u_g)^2 = \frac{1}{G}\sum_{g=1}^G(\lambda'\Pi X_g'u_g)^2 - 2\,\frac{1}{G}\sum_{g=1}^G(\lambda'\Pi X_g'u_g)\,\lambda'\Pi X_g'X_g(\hat\beta - \beta) + \frac{1}{G}\sum_{g=1}^G(\hat\beta - \beta)'X_g'X_g\Pi\lambda\lambda'\Pi X_g'X_g(\hat\beta - \beta)$$
$$= \frac{1}{G}\sum_{g=1}^G(\lambda'\Pi X_g'u_g)^2 - 2\Big(\frac{1}{G}\sum_{g=1}^G X_g'X_g\Pi\lambda\lambda'\Pi X_g'u_g\Big)'(\hat\beta - \beta) + (\hat\beta - \beta)'\Big(\frac{1}{G}\sum_{g=1}^G X_g'X_g\Pi\lambda\lambda'\Pi X_g'X_g\Big)(\hat\beta - \beta).$$
Since $\hat\beta - \beta = \frac{1}{G}\sum_{g=1}^G\Pi X_g'u_g$ and $\frac{1}{G}\sum_{g=1}^G\sigma_g^2 = \sigma^2$, dividing by $\sigma^2$ gives
$$\frac{\hat\sigma^2}{\sigma^2} = 1 + \frac{1}{G}\sum_{g=1}^G\omega_{3g} - 2\sigma^{-2}\Big(\frac{1}{G}\sum_{g=1}^G X_g'X_g\Pi\lambda\lambda'\Pi X_g'u_g\Big)'\Big(\frac{1}{G}\sum_{g=1}^G\Pi X_g'u_g\Big) + \sigma^{-2}\Big(\frac{1}{G}\sum_{g=1}^G\Pi X_g'u_g\Big)'\Big(\frac{1}{G}\sum_{g=1}^G X_g'X_g\Pi\lambda\lambda'\Pi X_g'X_g\Big)\Big(\frac{1}{G}\sum_{g=1}^G\Pi X_g'u_g\Big).$$
The last two terms equal $-\big(\frac{1}{G}\sum_g\omega_{2g}\big)'\Gamma\big(\frac{1}{G}\sum_g\omega_{2g}\big)$, because the first $k$ entries of $\frac{1}{G}\sum_g\omega_{2g}$ are $\sigma^{-1}\frac{1}{G}\sum_g\Pi X_g'u_g$ and the last $k$ entries are $\sigma^{-1}\frac{1}{G}\sum_g X_g'X_g\Pi'\lambda\lambda'\Pi X_g'u_g$.

A.2 Proof of Theorem 1

Lemma A.1. Let $\zeta_1, \ldots, \zeta_G$ be any independent random variables with a bounded average fourth central moment.
For any sequence $\{\varepsilon_G\}$ with $\varepsilon_G > 0$ and $G\varepsilon_G^4 \to \infty$, we have
$$\Pr\Big(\Big|\frac{1}{G}\sum_{g=1}^G(\zeta_g - E[\zeta_g \mid X = x])\Big| > \varepsilon_G \;\Big|\; X = x\Big) = o(G^{-1}).$$

Proof. Write $\tilde\zeta_g = \zeta_g - E[\zeta_g \mid X = x]$. By Markov's inequality (applied to the fourth moment), we have
$$\Pr\Big(\Big|\frac{1}{G}\sum_{g=1}^G\tilde\zeta_g\Big| > \varepsilon_G \;\Big|\; X = x\Big) \le \frac{E\big[(\frac{1}{G}\sum_{g=1}^G\tilde\zeta_g)^4 \mid X = x\big]}{\varepsilon_G^4} \le \frac{\frac{1}{G^4}\sum_{g=1}^G E[\tilde\zeta_g^4 \mid X = x]}{\varepsilon_G^4} + 6\,\frac{\frac{1}{G^4}\sum_{g_1=1}^G\sum_{g_2=1}^G E[\tilde\zeta_{g_1}^2\tilde\zeta_{g_2}^2 \mid X = x]}{\varepsilon_G^4},$$
where the second inequality follows from the independence across $g$: every cross term containing some $\tilde\zeta_g$ to the first power has zero conditional mean. Note that
$$\frac{1}{G^4}\sum_{g_1=1}^G\sum_{g_2=1}^G E[\tilde\zeta_{g_1}^2\tilde\zeta_{g_2}^2 \mid X = x] \le \frac{1}{G^4}\sum_{g_1=1}^G\sum_{g_2=1}^G E[\tilde\zeta_{g_1}^4 \mid X = x]^{1/2}E[\tilde\zeta_{g_2}^4 \mid X = x]^{1/2} = \frac{1}{G^2}\Big(\frac{1}{G}\sum_{g=1}^G E[\tilde\zeta_g^4 \mid X = x]^{1/2}\Big)^2 \le \frac{1}{G^2}\,\frac{1}{G}\sum_{g=1}^G E[\tilde\zeta_g^4 \mid X = x],$$
where the first inequality follows from the Cauchy-Schwarz inequality and the last inequality follows from Jensen's inequality. Hence, if $M$ bounds the average fourth central moment, the probability is at most $(G^{-3} + 6G^{-2})M\varepsilon_G^{-4}$, which is $o(G^{-1})$ because $G\varepsilon_G^4 \to \infty$. This proves the lemma.

Lemma A.2. Let $\hat\vartheta_{1,G}, \ldots, \hat\vartheta_{L,G}$ be $L$ sequences of random variables. Suppose $\hat\rho_G$ is the composition of a finite number of matrix additions and/or matrix multiplications of $\hat\vartheta_{1,G}, \ldots, \hat\vartheta_{L,G}$, and let $\rho_G$ be the same composition of $\vartheta_{1,G}, \ldots, \vartheta_{L,G}$. Suppose that the sequence $\sum_{\ell=1}^L\|\vartheta_{\ell,G}\|$ is bounded and $\sum_{\ell=1}^L\Pr(\|\hat\vartheta_{\ell,G} - \vartheta_{\ell,G}\| > \varepsilon_G \mid X = x) = o(G^{-1})$ for some diminishing sequence $\{\varepsilon_G\}$. Then there is some constant $C$ such that $\Pr(\|\hat\rho_G - \rho_G\| > C\varepsilon_G \mid X = x) = o(G^{-1})$. If, in addition, $\rho_G$ is an invertible matrix whose singular values are bounded away from zero, then there is some constant $C$ such that $\Pr(\|\hat\rho_G^{-1} - \rho_G^{-1}\| > C\varepsilon_G \mid X = x) = o(G^{-1})$.

Proof.
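For the first result, it suffices to show that $\Pr(\|(\hat\vartheta_{1,G}\hat\vartheta_{2,G}+\hat\vartheta_{3,G})-(\vartheta_{1,G}\vartheta_{2,G}+\vartheta_{3,G})\|>C\varepsilon_G\mid X=x)=o(G^{-1})$ for some constant $C$. Since
$$\|(\hat\vartheta_{1,G}\hat\vartheta_{2,G}+\hat\vartheta_{3,G})-(\vartheta_{1,G}\vartheta_{2,G}+\vartheta_{3,G})\|\le\|\hat\vartheta_{1,G}-\vartheta_{1,G}\|\|\hat\vartheta_{2,G}-\vartheta_{2,G}\|+\|\vartheta_{1,G}\|\|\hat\vartheta_{2,G}-\vartheta_{2,G}\|+\|\hat\vartheta_{1,G}-\vartheta_{1,G}\|\|\vartheta_{2,G}\|+\|\hat\vartheta_{3,G}-\vartheta_{3,G}\|,$$
we have
$$\begin{aligned}
&\Pr\big(\|(\hat\vartheta_{1,G}\hat\vartheta_{2,G}+\hat\vartheta_{3,G})-(\vartheta_{1,G}\vartheta_{2,G}+\vartheta_{3,G})\|>\varepsilon_G^2+(\|\vartheta_{1,G}\|+\|\vartheta_{2,G}\|)\varepsilon_G+\varepsilon_G\mid X=x\big)\\
&\quad\le\Pr(\|\hat\vartheta_{1,G}-\vartheta_{1,G}\|\|\hat\vartheta_{2,G}-\vartheta_{2,G}\|>\varepsilon_G^2\mid X=x)+\Pr(\|\vartheta_{1,G}\|\|\hat\vartheta_{2,G}-\vartheta_{2,G}\|>\|\vartheta_{1,G}\|\varepsilon_G\mid X=x)\\
&\qquad+\Pr(\|\hat\vartheta_{1,G}-\vartheta_{1,G}\|\|\vartheta_{2,G}\|>\|\vartheta_{2,G}\|\varepsilon_G\mid X=x)+\Pr(\|\hat\vartheta_{3,G}-\vartheta_{3,G}\|>\varepsilon_G\mid X=x)\\
&\quad\le\Pr(\|\hat\vartheta_{1,G}-\vartheta_{1,G}\|>\varepsilon_G\mid X=x)+\Pr(\|\hat\vartheta_{2,G}-\vartheta_{2,G}\|>\varepsilon_G\mid X=x)+\Pr(\|\hat\vartheta_{2,G}-\vartheta_{2,G}\|>\varepsilon_G\mid X=x)\\
&\qquad+\Pr(\|\hat\vartheta_{1,G}-\vartheta_{1,G}\|>\varepsilon_G\mid X=x)+\Pr(\|\hat\vartheta_{3,G}-\vartheta_{3,G}\|>\varepsilon_G\mid X=x)=o(G^{-1}).
\end{aligned}$$
By taking $C$ such that $C>\sup_G\{\varepsilon_G+\|\vartheta_{1,G}\|+\|\vartheta_{2,G}\|\}+1$, the first result holds.
Now we show the second result, about the inverse. By Horn and Johnson (2012, Eq. (5.8.6)), we have
$$\|\hat\rho_G^{-1}-\rho_G^{-1}\|\le\|\rho_G^{-1}\|\,\frac{\|\rho_G^{-1}\|\|\hat\rho_G-\rho_G\|}{1-\|\rho_G^{-1}\|\|\hat\rho_G-\rho_G\|}$$
as long as $\|\rho_G^{-1}\|\|\hat\rho_G-\rho_G\|<1$. When $\varepsilon_G\le0.5\|\rho_G^{-1}\|^{-1}$, we have
$$\|\hat\rho_G-\rho_G\|<\varepsilon_G\implies\|\hat\rho_G^{-1}-\rho_G^{-1}\|\le\|\rho_G^{-1}\|\,\frac{\|\rho_G^{-1}\|\varepsilon_G}{1-\|\rho_G^{-1}\|\varepsilon_G}\le2\|\rho_G^{-1}\|^2\varepsilon_G.$$
Since the singular values of $\rho_G$ are bounded away from zero, $\|\rho_G^{-1}\|$ is bounded, so the condition on $\varepsilon_G$ holds for all sufficiently large $G$ (applied with $\varepsilon_G$ replaced by the $C\varepsilon_G$ of the first result). Therefore, the second result holds.

Lemma A.3. Suppose the assumptions in Theorem 1 as well as the null hypothesis $H_0$. For every $z>0$, there is some sequence $\{\varepsilon_G\}$ such that $\varepsilon_G=o(1)$ and that $\Pr(|\hat q_2(z)-q_2(z)|>\varepsilon_G\mid X=x)=o(G^{-1})$.
Proof.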
By Lemma A.2 and Assumption 1, it suffices to investigate $\hat\mu_{1,2}-\mu_{1,2}$, $\hat\mu_{2,2}-\mu_{2,2}$, $\hat\mu_{1,1,1}-\mu_{1,1,1}$, and $\hat\mu_{1,1,1,1}-\mu_{1,1,1,1}$. Note that
$$\begin{aligned}
\hat\mu_{1,2}-\mu_{1,2}&=\frac{1}{G}\sum_g\big((\hat\omega_{1g}-\omega_{1g})\omega_{2g}+\omega_{1g}(\hat\omega_{2g}-\omega_{2g})+(\hat\omega_{1g}-\omega_{1g})(\hat\omega_{2g}-\omega_{2g})\big)+\frac{1}{G}\sum_g(\omega_{1g}\omega_{2g}-E[\omega_{1g}\omega_{2g}\mid X=x]),\\
\hat\mu_{2,2}-\mu_{2,2}&=\frac{1}{G}\sum_g\big((\hat\omega_{2g}-\omega_{2g})'\Gamma\omega_{2g}+\omega_{2g}'\Gamma(\hat\omega_{2g}-\omega_{2g})+(\hat\omega_{2g}-\omega_{2g})'\Gamma(\hat\omega_{2g}-\omega_{2g})\big)+\frac{1}{G}\sum_g(\omega_{2g}'\Gamma\omega_{2g}-E[\omega_{2g}'\Gamma\omega_{2g}\mid X=x]),\\
\hat\mu_{1,1,1}-\mu_{1,1,1}&=\frac{1}{G}\sum_g(\hat\omega_{1g}-\omega_{1g})\big((\hat\omega_{1g}-\omega_{1g})^2+3(\hat\omega_{1g}-\omega_{1g})\omega_{1g}+3\omega_{1g}^2\big)+\frac{1}{G}\sum_g(\omega_{1g}^3-E[\omega_{1g}^3\mid X=x]),\\
\hat\mu_{1,1,1,1}-\mu_{1,1,1,1}&=\frac{1}{G}\sum_g(\hat\omega_{1g}-\omega_{1g})\big(4\omega_{1g}^3+6\omega_{1g}^2(\hat\omega_{1g}-\omega_{1g})+4\omega_{1g}(\hat\omega_{1g}-\omega_{1g})^2+(\hat\omega_{1g}-\omega_{1g})^3\big)+\frac{1}{G}\sum_g(\omega_{1g}^4-E[\omega_{1g}^4\mid X=x]),\\
\hat\omega_{1g}-\omega_{1g}&=((\hat\sigma/\sigma)^{-1}-1)\omega_{1g}-(\hat\sigma/\sigma)^{-1}\sigma^{-1}\lambda'\Pi X_g'X_g(\hat\beta-\beta),\\
\hat\omega_{2g}-\omega_{2g}&=((\hat\sigma/\sigma)^{-1}-1)\omega_{2g}-(\hat\sigma/\sigma)^{-1}\sigma^{-1}\begin{pmatrix}I_k\\ X_g'X_g\Pi'\lambda\lambda'\end{pmatrix}\Pi X_g'X_g(\hat\beta-\beta).
\end{aligned}$$
Each term in the above expressions is bounded as follows:
$$\begin{aligned}
\|\hat\beta-\beta\|&\le\|\Pi\|\left\|\frac{1}{G}\sum_{g=1}^G X_g'u_g\right\|,\\
|\hat\omega_{1g}-\omega_{1g}|&\le|(\hat\sigma/\sigma)^{-1}-1|\,|\omega_{1g}|+\|\hat\beta-\beta\|\,|\hat\sigma/\sigma|^{-1}\sigma^{-1}\|\lambda\|\|\Pi\|\|X_g'X_g\|,\\
\|\hat\omega_{2g}-\omega_{2g}\|&\le|(\hat\sigma/\sigma)^{-1}-1|\,\|\omega_{2g}\|+\|\hat\beta-\beta\|\,|\hat\sigma/\sigma|^{-1}\sigma^{-1}\|\Pi\|\|X_g'X_g\|+\|\hat\beta-\beta\|\,|\hat\sigma/\sigma|^{-1}\sigma^{-1}\|\lambda\|^2\|\Pi\|^2\|X_g'X_g\|^2,\\
|\omega_{1g}|&\le\sigma^{-1}\|\lambda\|\|\Pi\|\|X_g'u_g\|,\\
\|\omega_{2g}\|&\le\sigma^{-1}\|\Pi\|\|X_g'u_g\|+\sigma^{-1}\|\lambda\|^2\|\Pi\|^2\|X_g'X_g\|\|X_g'u_g\|.
\end{aligned}$$
By Lemmas A.1 and A.2 and Assumption 3, we have the statement of this lemma.

Proof of Theorem 1. This proof essentially follows the proof of Hall (1983, Theorem 1), while not using his i.i.d. assumption. Let $z=\Phi^{-1}(1-\alpha/2)$.
Using the sequence $\varepsilon_G$ in Lemma A.3, we have
$$\begin{aligned}
\Pr(|t|\le z-G^{-1}\hat q_2(z)\mid X=x)&=\Pr(|t|\le z-G^{-1}\hat q_2(z),\ |\hat q_2(z)-q_2(z)|>\varepsilon_G\mid X=x)+\Pr(|t|\le z-G^{-1}\hat q_2(z),\ |\hat q_2(z)-q_2(z)|\le\varepsilon_G\mid X=x)\\
&\le\Pr(|\hat q_2(z)-q_2(z)|>\varepsilon_G\mid X=x)+\Pr(|t|\le z-G^{-1}q_2(z)+G^{-1}\varepsilon_G\mid X=x)\\
&=o(G^{-1})+\Pr(|t|\le z-G^{-1}q_2(z)+G^{-1}\varepsilon_G\mid X=x).\qquad(1)
\end{aligned}$$
Similarly, we have
$$\Pr(|t|\le z-G^{-1}\hat q_2(z)\mid X=x)\ge o(G^{-1})+\Pr(|t|\le z-G^{-1}q_2(z)-G^{-1}\varepsilon_G\mid X=x).\qquad(2)$$
By Assumption 2, we have
$$\begin{aligned}
\Pr(|t|\le z-G^{-1}q_2(z)+G^{-1}\varepsilon_G\mid X=x)&=2\Phi(z-G^{-1}q_2(z)+G^{-1}\varepsilon_G)-1+2G^{-1}q_2(z-G^{-1}q_2(z)+G^{-1}\varepsilon_G)\,\phi(z-G^{-1}q_2(z)+G^{-1}\varepsilon_G)+o(G^{-1}),\\
\Pr(|t|\le z-G^{-1}q_2(z)-G^{-1}\varepsilon_G\mid X=x)&=2\Phi(z-G^{-1}q_2(z)-G^{-1}\varepsilon_G)-1+2G^{-1}q_2(z-G^{-1}q_2(z)-G^{-1}\varepsilon_G)\,\phi(z-G^{-1}q_2(z)-G^{-1}\varepsilon_G)+o(G^{-1}).
\end{aligned}$$
Since $\Phi$, $\phi$, and $q_2$ are continuously differentiable and $\varepsilon_G=o(1)$, expanding around $z$ shows that the term $-2G^{-1}q_2(z)\phi(z)$ from $2\Phi(\cdot)$ cancels the term $2G^{-1}q_2(z)\phi(z)$ from the correction, while all remaining terms are $o(G^{-1})$. Hence
$$\begin{aligned}
\Pr(|t|\le z-G^{-1}q_2(z)+G^{-1}\varepsilon_G\mid X=x)&=2\Phi(z)-1+o(G^{-1}),\\
\Pr(|t|\le z-G^{-1}q_2(z)-G^{-1}\varepsilon_G\mid X=x)&=2\Phi(z)-1+o(G^{-1}).
\end{aligned}$$
Since $2\Phi(z)-1=1-\alpha$, together with Eqs. (1)-(2) we have $\Pr(|t|\le z-G^{-1}\hat q_2(z)\mid X=x)-(1-\alpha)=o(G^{-1})$, which implies the statement of this theorem.

A.3 Proof of Theorem 2

Proof of Theorem 2. By Bobkov, Chistyakov, and Götze (2012, Theorem 2), there is a positive constant $c_3$ such that, if $g\in B(t/\|t\|)$ and $\|t\|>R$, then
$$\left|E[\exp(it'\eta_g)\mid X=x]\right|=\left|E\left[\exp\left(i\,\mathrm{Var}(t'\eta_g\mid X=x)^{1/2}\big(\mathrm{Var}(t'\eta_g\mid X=x)^{-1/2}t'\eta_g\big)\right)\,\middle|\,X=x\right]\right|\le1-c_3c_1.$$
Then, for every $t$ with $\|t\|>R$, we have
$$\frac{1}{G}\sum_{g=1}^G\left|E[\exp(it'\eta_g)\mid X=x]\right|\le1-\frac{1}{G}\sum_{g\in B(t/\|t\|)}c_3c_1\le1-c_2c_3c_1$$
for sufficiently large $G$, which implies Assumption 5.

A.4 Proof of Theorem 3

Lemma A.4. Define
$$\mu_{1,3}=\frac{1}{G}\sum_g E[\omega_{1g}\omega_{3g}\mid X=x],\qquad\mu_{3,3}=\frac{1}{G}\sum_g E[\omega_{3g}^2\mid X=x],\qquad\mu_{1,1,3}=\frac{1}{G}\sum_g E[\omega_{1g}^2\omega_{3g}\mid X=x].$$
Then $\mu_{1,3}=\mu_{1,1,1}$, $\mu_{3,3}=\mu_{1,1,1,1}-\theta$, and $\mu_{1,1,3}=\mu_{1,1,1,1}-\theta$.
Proof. Since
$$\omega_{1g}=\sigma^{-1}\lambda'\Pi X_g'u_g,\qquad\omega_{3g}=\omega_{1g}^2-E[\omega_{1g}^2\mid X=x],$$
we have
$$\begin{aligned}
\mu_{1,3}&=\frac{1}{G}\sum_g E[\omega_{1g}(\omega_{1g}^2-E[\omega_{1g}^2\mid X=x])\mid X=x]=\frac{1}{G}\sum_g E[\omega_{1g}^3\mid X=x]-\frac{1}{G}\sum_g E[\omega_{1g}\mid X=x]E[\omega_{1g}^2\mid X=x]=\frac{1}{G}\sum_g E[\omega_{1g}^3\mid X=x]=\mu_{1,1,1},\\
\mu_{1,1,3}&=\frac{1}{G}\sum_g E[\omega_{1g}^2(\omega_{1g}^2-E[\omega_{1g}^2\mid X=x])\mid X=x]=\frac{1}{G}\sum_g E[\omega_{1g}^4\mid X=x]-\frac{1}{G}\sum_g E[\omega_{1g}^2\mid X=x]^2=\mu_{1,1,1,1}-\theta,\\
\mu_{3,3}&=\frac{1}{G}\sum_g E[(\omega_{1g}^2-E[\omega_{1g}^2\mid X=x])^2\mid X=x]=\frac{1}{G}\sum_g E[\omega_{1g}^4\mid X=x]-\frac{1}{G}\sum_g E[\omega_{1g}^2\mid X=x]^2=\mu_{1,1,1,1}-\theta,
\end{aligned}$$
where the third equality in the first line uses $E[\omega_{1g}\mid X=x]=0$.

Lemma A.5. Define
$$\tilde t=\sqrt{G}\,W_1\left(1+\tfrac12W_2'\Gamma W_2-\tfrac12W_3+\tfrac38W_3^2\right),\qquad\text{where}\qquad\begin{pmatrix}W_1\\W_2\\W_3\end{pmatrix}=\frac{1}{G}\sum_{g=1}^G\begin{pmatrix}\omega_{1g}\\\omega_{2g}\\\omega_{3g}\end{pmatrix}.$$
Then
$$\begin{aligned}
E[\tilde t\mid X=x]&=G^{-1/2}\nu_1+o(G^{-1}),\\
E[\tilde t^2\mid X=x]&=1+G^{-1}\nu_2+o(G^{-1}),\\
E[\tilde t^3\mid X=x]&=G^{-1/2}\nu_3+o(G^{-1}),\\
E[\tilde t^4\mid X=x]&=3+G^{-1}\nu_4+o(G^{-1}),
\end{aligned}$$
and $E[\tilde t^j\mid X=x]=o(G^{-1})$ for every $j\ge5$.
Proof. First, we show the first moment. Note that $G^{1/2}E[W_1\mid X=x]=0$ and that
$$G^{1/2}E[W_1W_3\mid X=x]=G^{-3/2}\sum_{g_1,g_2}E[\omega_{1g_1}\omega_{3g_2}\mid X=x]=G^{-1/2}\,\frac{1}{G}\sum_g E[\omega_{1g}\omega_{3g}\mid X=x]=G^{-1/2}\mu_{1,3},$$
where the second equality holds because $\omega_{1g}$ and $\omega_{3g}$ are mean zero and independent across $g$. By Djogbenou et al. (2019, Lemma A.3), $G^{1/2}E[W_1W_{j_1}'W_{j_2}\mid X=x]=o(G^{-1})$.
Since $\tilde t=\sqrt{G}\,W_1(1+\frac12W_2'\Gamma W_2-\frac12W_3+\frac38W_3^2)$, we have $E[\tilde t\mid X=x]=-G^{-1/2}\mu_{1,3}/2+o(G^{-1})$. By Lemma A.4, we have the first equation of this lemma.
Second, we show the second moment. Note that $GE[W_1^2\mid X=x]=1$, that
$$GE[W_1^2W_3\mid X=x]=G^{-2}\sum_{g_1,g_2,g_3}E[\omega_{1g_1}\omega_{1g_2}\omega_{3g_3}\mid X=x]=G^{-1}\,\frac{1}{G}\sum_g E[\omega_{1g}^2\omega_{3g}\mid X=x]=G^{-1}\mu_{1,1,3},$$
and that
$$\begin{aligned}
GE[W_1^2W_{j_1}'W_{j_2}\mid X=x]&=G^{-3}\sum_{g_1,g_2,g_3,g_4}E[\omega_{1g_1}\omega_{1g_2}\omega_{j_1g_3}'\omega_{j_2g_4}\mid X=x]\\
&=G^{-3}\sum_{g_1}E[\omega_{1g_1}^2\omega_{j_1g_1}'\omega_{j_2g_1}\mid X=x]+G^{-3}\sum_{g_1\neq g_2}\Big(E[\omega_{1g_2}^2\mid X=x]E[\omega_{j_1g_1}'\omega_{j_2g_1}\mid X=x]+2E[\omega_{1g_1}\omega_{j_1g_1}'\mid X=x]E[\omega_{1g_2}\omega_{j_2g_2}\mid X=x]\Big)\\
&=G^{-1}\left(\frac{1}{G}\sum_{g_1}E[\omega_{j_1g_1}'\omega_{j_2g_1}\mid X=x]+2\left(\frac{1}{G}\sum_{g_1}E[\omega_{1g_1}\omega_{j_1g_1}'\mid X=x]\right)\left(\frac{1}{G}\sum_{g_2}E[\omega_{1g_2}\omega_{j_2g_2}\mid X=x]\right)\right)+o(G^{-1})\\
&=G^{-1}(\mu_{j_1,j_2}+2\mu_{1,j_1}'\mu_{1,j_2})+o(G^{-1}).
\end{aligned}$$
By Djogbenou et al. (2019, Lemma A.3), $GE[W_1^2W_{j_1}\cdots W_{j_k}\mid X=x]=o(G^{-1})$ for $k\ge3$. Since
$$\tilde t^2=GW_1^2\left(1+\tfrac12W_2'\Gamma W_2-\tfrac12W_3+\tfrac38W_3^2\right)^2=GW_1^2\left(1-W_3+W_3^2+W_2'\Gamma W_2+(\text{products of three or more terms of }W_2,W_3)\right),$$
we have
$$E[\tilde t^2\mid X=x]=1+G^{-1}\big(-\mu_{1,1,3}+(\mu_{3,3}+2\mu_{1,3}^2)+(\mu_{2,2}+2\mu_{1,2}'\mu_{1,2})\big)+o(G^{-1}).$$
By Lemma A.4, we have the second equation of this lemma.
Third, we show the third moment.
Note that
$$G^{3/2}E[W_1^3\mid X=x]=G^{-3/2}\sum_{g_1,g_2,g_3}E[\omega_{1g_1}\omega_{1g_2}\omega_{1g_3}\mid X=x]=G^{-3/2}\sum_{g_1}E[\omega_{1g_1}^3\mid X=x]=G^{-1/2}\mu_{1,1,1},$$
and that
$$\begin{aligned}
G^{3/2}E[W_1^3W_3\mid X=x]&=G^{-5/2}\sum_{g_1,g_2,g_3,g_4}E[\omega_{1g_1}\omega_{1g_2}\omega_{1g_3}\omega_{3g_4}\mid X=x]\\
&=G^{-5/2}\sum_{g_1}E[\omega_{1g_1}^3\omega_{3g_1}\mid X=x]+G^{-5/2}\sum_{g_1\neq g_2}\Big(E[\omega_{1g_2}^2\mid X=x]E[\omega_{1g_1}\omega_{3g_1}\mid X=x]+2E[\omega_{1g_1}^2\mid X=x]E[\omega_{1g_2}\omega_{3g_2}\mid X=x]\Big)\\
&=3G^{-1/2}\,\frac{1}{G}\sum_{g_1}E[\omega_{1g_1}\omega_{3g_1}\mid X=x]+o(G^{-1})=3G^{-1/2}\mu_{1,3}+o(G^{-1}).
\end{aligned}$$
By Djogbenou et al. (2019, Lemma A.3), $G^{3/2}E[W_1^3W_{j_1}\cdots W_{j_k}\mid X=x]=o(G^{-1})$ for $k\ge2$. Since
$$\tilde t^3=G^{3/2}W_1^3\left(1+\tfrac12W_2'\Gamma W_2-\tfrac12W_3+\tfrac38W_3^2\right)^3=G^{3/2}W_1^3\left(1-\tfrac32W_3+(\text{products of two or more terms of }W_2,W_3)\right),$$
we have $E[\tilde t^3\mid X=x]=G^{-1/2}(\mu_{1,1,1}-\frac92\mu_{1,3})+o(G^{-1})$. By Lemma A.4, we have the third equation of this lemma.
Fourth, we show the fourth moment. Note that
$$\begin{aligned}
G^2E[W_1^4\mid X=x]&=G^{-2}\sum_{g_1,g_2,g_3,g_4}E[\omega_{1g_1}\omega_{1g_2}\omega_{1g_3}\omega_{1g_4}\mid X=x]\\
&=G^{-2}\sum_{g_1}E[\omega_{1g_1}^4\mid X=x]+3G^{-2}\sum_{g_1\neq g_2}E[\omega_{1g_1}^2\mid X=x]E[\omega_{1g_2}^2\mid X=x]\\
&=3+G^{-1}\,\frac{1}{G}\sum_{g_1}\Big(E[\omega_{1g_1}^4\mid X=x]-3E[\omega_{1g_1}^2\mid X=x]^2\Big)=3+G^{-1}(\mu_{1,1,1,1}-3\theta),
\end{aligned}$$
that
$$\begin{aligned}
G^2E[W_1^4W_3\mid X=x]&=G^{-3}\sum_{g_1,\dots,g_5}E[\omega_{3g_1}\omega_{1g_2}\omega_{1g_3}\omega_{1g_4}\omega_{1g_5}\mid X=x]\\
&=G^{-3}\sum_{g_1,g_2}\Big(6E[\omega_{3g_1}\omega_{1g_1}^2\mid X=x]E[\omega_{1g_2}^2\mid X=x]+4E[\omega_{3g_1}\omega_{1g_1}\mid X=x]E[\omega_{1g_2}^3\mid X=x]\Big)+G^{-3}\sum_{g_1}E[\omega_{3g_1}\omega_{1g_1}^4\mid X=x]+o(G^{-1})\\
&=G^{-1}(6\mu_{1,1,3}+4\mu_{1,3}\mu_{1,1,1})+o(G^{-1}),
\end{aligned}$$
and that
$$\begin{aligned}
G^2E[W_1^4W_{j_1}'W_{j_2}\mid X=x]&=G^{-4}\sum_{g_1,\dots,g_6}E[\omega_{j_1g_1}'\omega_{j_2g_2}\omega_{1g_3}\omega_{1g_4}\omega_{1g_5}\omega_{1g_6}\mid X=x]\\
&=G^{-4}\sum_{g_1,g_2,g_3}\Big(3E[\omega_{j_1g_1}'\omega_{j_2g_1}\omega_{1g_2}^2\omega_{1g_3}^2\mid X=x]+12E[\omega_{j_1g_1}'\omega_{j_2g_2}\omega_{1g_1}\omega_{1g_2}\omega_{1g_3}^2\mid X=x]\Big)\\
&\quad+G^{-4}\sum_{g_1,g_2}\Big(4E[\omega_{j_1g_1}'\omega_{j_2g_1}\omega_{1g_1}\omega_{1g_2}^3\mid X=x]+6E[\omega_{j_1g_1}'\omega_{j_2g_2}\omega_{1g_1}^2\omega_{1g_2}^2\mid X=x]\Big)\\
&\quad+G^{-4}\sum_{g_1}E[\omega_{j_1g_1}'\omega_{j_2g_1}\omega_{1g_1}^4\mid X=x]+o(G^{-1})\\
&=G^{-4}\sum_{g_1,g_2,g_3}\Big(3E[\omega_{j_1g_1}'\omega_{j_2g_1}\omega_{1g_2}^2\omega_{1g_3}^2\mid X=x]+12E[\omega_{j_1g_1}'\omega_{j_2g_2}\omega_{1g_1}\omega_{1g_2}\omega_{1g_3}^2\mid X=x]\Big)+o(G^{-1})\\
&=G^{-1}(3\mu_{j_1,j_2}+12\mu_{1,j_1}'\mu_{1,j_2})+o(G^{-1}).
\end{aligned}$$
By Djogbenou et al. (2019, Lemma A.3), $G^2E[W_1^4W_{j_1}\cdots W_{j_k}\mid X=x]=o(G^{-1})$ for $k\ge3$. Since
$$\tilde t^4=G^2W_1^4\left(1+\tfrac12W_2'\Gamma W_2-\tfrac12W_3+\tfrac38W_3^2\right)^4=G^2W_1^4\left(1-2W_3+2W_2'\Gamma W_2+3W_3^2+(\text{products of three or more terms of }W_2,W_3)\right),$$
we have
$$E[\tilde t^4\mid X=x]-3=G^{-1}\Big((\mu_{1,1,1,1}-3\theta)-2(6\mu_{1,1,3}+4\mu_{1,3}\mu_{1,1,1})+2(3\mu_{2,2}+12\mu_{1,2}'\mu_{1,2})+3(3\mu_{3,3}+12\mu_{1,3}'\mu_{1,3})\Big)+o(G^{-1}).$$
By Lemma A.4, we have the fourth equation of this lemma.
Last, the statement with $j\ge5$ follows from Djogbenou et al. (2019, Lemma A.3).

Lemma A.6. $\Pr(|t-\tilde t|>(\log G)^{-1}G^{-1}\mid X=x)=o(G^{-1})$ under $H_0$.
Proof. Throughout this proof, we impose $H_0$. We have
$$t-\tilde t=-\sqrt{G}\,W_1\left(1+\tfrac12W_2'\Gamma W_2-\tfrac12W_3+\tfrac38W_3^2-(1-W_2'\Gamma W_2+W_3)^{-1/2}\right).$$
By the second-order Taylor expansion of $u\mapsto u^{-1/2}$ around $u=1$, we have
$$\left|(1-W_2'\Gamma W_2+W_3)^{-1/2}-\left(1-\tfrac12(-W_2'\Gamma W_2+W_3)+\tfrac38(-W_2'\Gamma W_2+W_3)^2\right)\right|\le|-W_2'\Gamma W_2+W_3|^3$$
when $|-W_2'\Gamma W_2+W_3|<1-(5/16)^{2/7}\approx0.28$. Indeed, writing $x=-W_2'\Gamma W_2+W_3$, the Lagrange remainder is bounded by $\frac{5}{16}(1-|x|)^{-7/2}|x|^3$, and $\frac{5}{16}(1-|x|)^{-7/2}\le1$ exactly when $|x|\le1-(5/16)^{2/7}$. Therefore,
$$|t-\tilde t|\le\sqrt{G}|W_1|\left(\tfrac38\left|(W_2'\Gamma W_2)^2-2(W_2'\Gamma W_2)W_3\right|+|-W_2'\Gamma W_2+W_3|^3\right)$$
when $|-W_2'\Gamma W_2+W_3|<1-(5/16)^{2/7}$. By Markov's inequality and Djogbenou et al. (2019, Lemma A.3), we have
$$\begin{aligned}
\Pr(|-W_2'\Gamma W_2+W_3|>1-(5/16)^{2/7}\mid X=x)&\le\left(1-(5/16)^{2/7}\right)^{-4}E[|-W_2'\Gamma W_2+W_3|^4\mid X=x]=O(G^{-2}),\\
\Pr(\sqrt{G}|W_1|(W_2'\Gamma W_2)^2>(\log G)^{-1}G^{-1}\mid X=x)&\le((\log G)^{-1}G^{-1})^{-2}E[GW_1^2(W_2'\Gamma W_2)^4\mid X=x]=o(G^{-1}),\\
\Pr(\sqrt{G}|W_1|\,W_2'\Gamma W_2\,|W_3|>(\log G)^{-1}G^{-1}\mid X=x)&\le((\log G)^{-1}G^{-1})^{-4}E[G^2W_1^4(W_2'\Gamma W_2)^4W_3^4\mid X=x]=o(G^{-1}),\\
\Pr(\sqrt{G}|W_1|(W_2'\Gamma W_2)^3>(\log G)^{-1}G^{-1}\mid X=x)&\le((\log G)^{-1}G^{-1})^{-2}E[GW_1^2(W_2'\Gamma W_2)^6\mid X=x]=o(G^{-1}),\\
\Pr(\sqrt{G}|W_1|(W_2'\Gamma W_2)^2|W_3|>(\log G)^{-1}G^{-1}\mid X=x)&\le((\log G)^{-1}G^{-1})^{-2}E[GW_1^2(W_2'\Gamma W_2)^4W_3^2\mid X=x]=o(G^{-1}),\\
\Pr(\sqrt{G}|W_1|(W_2'\Gamma W_2)W_3^2>(\log G)^{-1}G^{-1}\mid X=x)&\le((\log G)^{-1}G^{-1})^{-2}E[GW_1^2(W_2'\Gamma W_2)^2W_3^4\mid X=x]=o(G^{-1}),\\
\Pr(\sqrt{G}|W_1||W_3|^3>(\log G)^{-1}G^{-1}\mid X=x)&\le((\log G)^{-1}G^{-1})^{-4}E[G^2W_1^4W_3^{12}\mid X=x]=o(G^{-1}).
\end{aligned}$$
They imply the statement of this lemma.

Lemma A.7.
Define $F(z)=\Phi(z)+G^{-1/2}q_1(z)\phi(z)+G^{-1}q_2(z)\phi(z)$, where $q_1(z)=-(k_1+\frac16k_3He_2(z))$. Then $\int z^j\,dF(z)=E[\tilde t^j\mid X=x]+o(G^{-1})$ for every $j\ge1$.
Proof. By Lemma A.5 and the definitions of $q_1$ and $q_2$, the $j$-th cumulant of $F$ differs from that of $\tilde t$ by only $o(G^{-1})$ for $j=1,2,3,4$, and it is $o(G^{-1})$ for $j\ge5$. By Hall (2013, Theorem 2.1), the $j$-th cumulant of $\tilde t$ is $o(G^{-1})$ for $j\ge5$. Therefore, the cumulants of $F$ differ from those of $\tilde t$ by only $o(G^{-1})$, and, since each moment is a polynomial in the cumulants, so do the moments.

Lemma A.8. $\sup_{z\in\mathbb{R}}|\Pr(\tilde t\le z\mid X=x)-F(z)|=o(G^{-1})$.
Proof. The proof structure is similar to Bhattacharya and Ghosh (1978, Theorem 2), but we extend it to the triangular array $\{\eta_g\}$ with potentially non-identical distributions. There is a function $H$ such that
$$\tilde t=H\left(\frac{1}{\sqrt{G}}\sum_{g=1}^G\eta_g;\,G^{-1/2}\right).$$
By Angst and Poly (2017, Theorem 4.3),
$$\sup_{z\in\mathbb{R}}\left|\Pr(\tilde t\le z\mid X=x)-\int_{\{H(V_G^{1/2}u;\,G^{-1/2})\le z\}}\left(\sum_{r=0}^2G^{-r/2}\big(\tilde P_r(-D)\phi_I\big)\right)(u)\,du\right|=o(G^{-1}),\qquad(3)$$
where $\phi_I$ is the multidimensional standard normal pdf and $\tilde P_r$ is the polynomial defined in Bhattacharya and Rao (1976, Section 7). By Bhattacharya and Ghosh (1978, Lemma 2.1), there are polynomials $\tilde q_1(z)$ and $\tilde q_2(z)$ such that
$$\sup_{z\in\mathbb{R}}\left|\int_{\{H(V_G^{1/2}u;\,G^{-1/2})\le z\}}\left(\sum_{r=0}^2G^{-r/2}\big(\tilde P_r(-D)\phi_I\big)\right)(u)\,du-\tilde F(z)\right|=o(G^{-1})\qquad(4)$$
and
$$\int H(V_G^{1/2}u;\,G^{-1/2})^j\left(\sum_{r=0}^2G^{-r/2}\big(\tilde P_r(-D)\phi_I\big)\right)(u)\,du=\int z^j\,d\tilde F(z)+o(G^{-1})\qquad(5)$$
for every integer $j\ge0$, where $\tilde F(z)=\Phi(z)+G^{-1/2}\tilde q_1(z)\phi(z)+G^{-1}\tilde q_2(z)\phi(z)$.¹⁰ Since $H(V_G^{1/2}u;G^{-1/2})^j$ is a polynomial in $u$, its moments can be written as derivatives of the characteristic function at the origin, and therefore Bhattacharya and Ghosh (1978, Theorem 9.9) implies
$$E[\tilde t^j\mid X=x]=\int H(V_G^{1/2}u;\,G^{-1/2})^j\left(\sum_{r=0}^2G^{-r/2}\big(\tilde P_r(-D)\phi_I\big)\right)(u)\,du+o(G^{-1}).$$
By Lemma A.7 and Eq. (5), we have
$$\int z^j\,dF(z)=E[\tilde t^j\mid X=x]+o(G^{-1})=\int z^j\,d\tilde F(z)+o(G^{-1}).$$
This implies that all the coefficients of $G^{-1/2}q_1(z)+G^{-1}q_2(z)$ and $G^{-1/2}\tilde q_1(z)+G^{-1}\tilde q_2(z)$ differ by only $o(G^{-1})$. Therefore, $(G^{-1/2}q_1(z)+G^{-1}q_2(z))\phi(z)$ and $(G^{-1/2}\tilde q_1(z)+G^{-1}\tilde q_2(z))\phi(z)$ differ by only $o(G^{-1})$ uniformly in $z$, and we have $\sup_{z\in\mathbb{R}}|F(z)-\tilde F(z)|=o(G^{-1})$. Together with Eqs. (3) and (4), we have the statement of this lemma.

¹⁰ Although Bhattacharya and Ghosh (1978) assume identical distributions, their Lemma 2.1 is applicable to the triangular array with non-identical distributions.

Proof of Theorem 3. Under $H_0$, we have
$$\begin{aligned}
\sup_{z\in\mathbb{R}}|\Pr(t\le z\mid X=x)-F(z)|&\le\sup_{z\in\mathbb{R}}|\Pr(t\le z\mid X=x)-\Pr(\tilde t\le z\mid X=x)|+o(G^{-1})\\
&\le\sup_{z\in\mathbb{R}}\big(\Pr(t\le z<\tilde t\mid X=x)+\Pr(\tilde t\le z<t\mid X=x)\big)+o(G^{-1})\\
&\le\sup_{z\in\mathbb{R}}\Pr(z-(\log G)^{-1}G^{-1}<\tilde t<z+(\log G)^{-1}G^{-1}\mid X=x)+o(G^{-1})\\
&=\sup_{z\in\mathbb{R}}\big(F(z+(\log G)^{-1}G^{-1})-F(z-(\log G)^{-1}G^{-1})\big)+o(G^{-1})\\
&=o(G^{-1}),
\end{aligned}$$
where the first inequality follows from Lemma A.8, the third inequality follows from Lemma A.6, the equality in the fourth line follows from Lemma A.8, and the last equality holds because $F$ has a bounded derivative, so the difference in the fourth line is $O((\log G)^{-1}G^{-1})$.
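Two elementary numerical steps in this appendix can be sanity-checked with a short script. The sketch below, in Python using only the standard library, (i) evaluates the threshold $1-(5/16)^{2/7}$ (about 0.28) from the proof of Lemma A.6 and checks on a grid that the cubic remainder bound holds inside it, and (ii) illustrates the mechanism behind the proof of Theorem 1: if the two-sided expansion $\Pr(|t|\le c\mid X=x)=2\Phi(c)-1+2G^{-1}q_2(c)\phi(c)$ held with the $o(G^{-1})$ term dropped, the corrected critical value $z-G^{-1}q_2(z)$ would leave a coverage error of order $G^{-2}$, whereas the normal critical value $z$ leaves one of order $G^{-1}$. The polynomial `q2_example` is a hypothetical stand-in chosen only for illustration; it is not the $q_2$ defined in the main text, and the script neither simulates clustered data nor estimates $\hat q_2$.

```python
from statistics import NormalDist
from typing import Callable, Tuple

STD_NORMAL = NormalDist()  # standard normal: provides cdf, pdf, inv_cdf


# --- Check 1 (Lemma A.6): cubic bound on the Taylor remainder of (1 + u)^(-1/2) ---

def remainder_threshold() -> float:
    """Largest |u| with (5/16) * (1 - |u|)^(-7/2) <= 1, i.e. 1 - (5/16)^(2/7)."""
    return 1.0 - (5.0 / 16.0) ** (2.0 / 7.0)


def taylor_remainder(u: float) -> float:
    """Absolute error of the expansion 1 - u/2 + 3u^2/8 of (1 + u)^(-1/2)."""
    return abs((1.0 + u) ** -0.5 - (1.0 - 0.5 * u + 0.375 * u * u))


def check_remainder_bound(n: int = 2001) -> bool:
    """Check remainder <= |u|^3 on interior grid points of (-threshold, threshold)."""
    thr = remainder_threshold()
    for i in range(1, n):
        u = -thr + 2.0 * thr * i / n
        if taylor_remainder(u) > abs(u) ** 3:
            return False
    return True


# --- Check 2 (Theorem 1): effect of the corrected critical value z - q2(z)/G ---

def q2_example(z: float) -> float:
    """HYPOTHETICAL polynomial standing in for q_2 (illustration only)."""
    return 0.25 * (z ** 3 + 3.0 * z)


def expansion_coverage(c: float, G: int, q2: Callable[[float], float]) -> float:
    """Pr(|t| <= c) implied by the truncated expansion 2*Phi(c) - 1 + (2/G) q2(c) phi(c)."""
    return 2.0 * STD_NORMAL.cdf(c) - 1.0 + (2.0 / G) * q2(c) * STD_NORMAL.pdf(c)


def coverage_errors(alpha: float, G: int,
                    q2: Callable[[float], float]) -> Tuple[float, float]:
    """Coverage error relative to 1 - alpha for the normal and corrected critical values."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    naive = expansion_coverage(z, G, q2) - (1.0 - alpha)
    refined = expansion_coverage(z - q2(z) / G, G, q2) - (1.0 - alpha)
    return naive, refined


if __name__ == "__main__":
    print(f"threshold = {remainder_threshold():.4f}")
    print("remainder bound holds on grid:", check_remainder_bound())
    for G in (10, 20, 40, 80):
        naive, refined = coverage_errors(0.05, G, q2_example)
        print(f"G={G:3d}  normal: {naive:+.2e}  corrected: {refined:+.2e}")
```

The printed errors describe the assumed truncated expansion only; they are not finite-sample rejection rates of the test, which the simulations in the main text address.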
