Bias analysis of a linear order-statistic inequality index estimator: Unbiasedness under gamma populations

Roberto Vila 1*† and Helton Saulo 1,2†

1* Department of Statistics, University of Brasilia, Brasilia, Brazil.
2 Department of Economics, Federal University of Pelotas, Pelotas, Brazil.

*Corresponding author(s). E-mail(s): rovig161@gmail.com; Contributing authors: heltonsaulo@gmail.com;
† These authors contributed equally to this work.

Abstract

This paper studies a class of rank-based inequality measures built from linear combinations of expected order statistics. The proposed framework unifies several well-known indices, including the classical Gini coefficient, the $m$th Gini index, the extended $m$th Gini index and the $S$-Gini index, and also connects to spectral inequality measures through an integral representation. We investigate the finite-sample behavior of a natural U-statistic-type estimator that averages weighted order-statistic contrasts over all subsamples of fixed size and normalizes by the sample mean. A general bias decomposition is derived in terms of components that isolate the effect of random normalization on each rank level, yielding analytical expressions that can be evaluated under broad non-negative distributions via Laplace-transform methods. Under mild moment conditions, the estimator is shown to be asymptotically unbiased. Moreover, we prove exact unbiasedness under gamma populations for any sample size, extending earlier unbiasedness results for Gini-type estimators. A Monte Carlo study is performed to check numerically the theoretical unbiasedness under gamma populations.

Keywords: Linear order-statistic inequality index, linear order-statistic inequality index estimator, extended $m$th Gini index, $m$th Gini index, unbiased estimator.

1 Introduction

Quantifying economic inequality from sample data is a central problem in applied economics and statistics.
The classical Gini coefficient is arguably the most widely used rank-based inequality measure, but it is well known to exhibit non-negligible finite-sample bias, especially for small and moderate sample sizes and for highly skewed populations; see, e.g., [2, 5] and references therein. This motivates the development and analysis of alternative indices and estimators with improved small-sample properties, while retaining the desirable invariance and rank-based interpretation that make Gini-type measures attractive in practice.

A prominent extension of the Gini coefficient is the class of generalized and extended Gini indices, including the $m$th Gini index, the $S$-Gini index and its extensions, which can be expressed as normalized contrasts of expected order statistics [5–7, 12–14]. Such measures are particularly appealing because they admit transparent interpretations in terms of dispersion between extreme or intermediate ranks and connect naturally to the broader class of spectral (rank-dependent) inequality measures [3, 4]. At the same time, many of these indices are typically estimated through U-statistic-type constructions that average rank contrasts over all subsamples of a given size $m$, often combined with normalization by the sample mean. Despite their conceptual appeal and practical relevance, the finite-sample bias behavior of these estimators has remained largely unexplored beyond specific cases.

The present paper contributes to this literature by developing a unified bias analysis for a broad family of order-statistic-based inequality indices and their canonical estimators.
Specifically, for a non-negative random variable $X$ with finite, strictly positive mean $\mu = \mathbb{E}[X]$, and for $m \geqslant 2$, we consider the linear order-statistic inequality index
\[
I_m = \frac{1}{m\mu} \sum_{k=1}^{m} a_k \, \mathbb{E}[X_{k:m}], \qquad \sum_{k=1}^{m} a_k = 0,
\]
which includes as special cases the classical Gini coefficient ($m = 2$), the $m$th Gini index, the extended $m$th Gini index, and the $S$-Gini index (Table 1). This class also admits an integral representation $I_m = \mu^{-1} \int_0^1 w_m(u) \, Q_X(u) \, \mathrm{d}u$, so that it can be viewed as a finite-dimensional approximation to continuous spectral measures of inequality.

Given an independent and identically distributed (i.i.d.) sample $X_1, \ldots, X_n$ with $n \geqslant m$, we study the estimator $\widehat{I}_m$ defined in (6), which averages the weighted order-statistic contrasts over all $m$-subsamples and normalizes by the sample mean $\overline{X}$. Our first main result derives a general expression for the finite-sample bias $\mathrm{Bias}(\widehat{I}_m, I_m)$ (Corollary 3). The bias can be decomposed into a linear combination of factors
\[
\Delta_{n,r} = \mathbb{E}\!\left[\frac{X_{r:r}}{\overline{X}}\right] - \frac{\mathbb{E}[X_{r:r}]}{\mu}, \qquad r = 1, \ldots, m,
\]
which separates the contribution of each rank level in a manner that is amenable to analytic and numerical investigation. We further provide a characterization of $\Delta_{n,r}$ via Laplace-transform methods (Proposition 4), yielding a route to compute (or approximate) the bias under general non-negative distributions.

Our second main result establishes asymptotic unbiasedness: under mild moment conditions, $\mathrm{Bias}(\widehat{I}_m, I_m) \to 0$ as $n \to \infty$ (Proposition 5). While asymptotic unbiasedness is reassuring, it does not address the small-sample bias that often drives empirical discrepancies. The key contribution of the paper is therefore our third main result, which identifies a distributional setting where the estimator is exactly unbiased for all sample sizes.
Specifically, we prove that if $X \sim \mathrm{Gamma}(\alpha, \lambda)$, then $\Delta_{n,r} = 0$ for all $r \leqslant n$, implying (Corollary 3)
\[
\mathrm{Bias}(\widehat{I}_m, I_m) = 0 \quad \text{for all } n \geqslant m,
\]
thereby extending earlier unbiasedness results for the Gini coefficient and its $m$th and extended variants [2, 5, 12, 13]. The proof relies on the Dirichlet property of normalized gamma samples and the homogeneity of the maximum functional, which together yield the identity $\mathbb{E}[X_{r:r}] = \mu \, \mathbb{E}[X_{r:r} / \overline{X}]$.

Finally, we complement the theoretical analysis with a Monte Carlo study (Section 7) under gamma and non-gamma populations. The simulations confirm that the bias is essentially zero for $\mathrm{Gamma}(\alpha, 1)$ distributions, even for $n = 10$, while non-gamma heavy-tailed alternatives such as $\mathrm{Lognormal}(0, 1)$ and $\mathrm{Lomax}(3, 1)$ display noticeable negative bias in small samples.

The rest of the paper unfolds as follows. Section 2 defines the index $I_m$ and establishes key properties and representations. Section 4 derives the general bias formula for $\widehat{I}_m$ and provides a Laplace-transform characterization. Section 5 proves asymptotic unbiasedness under moment conditions. Section 6 establishes exact unbiasedness under gamma populations. Section 7 reports Monte Carlo evidence, and Section 8 concludes.

2 A linear order-statistic inequality index

Let $X$ be a non-negative random variable with finite mean $\mu = \mathbb{E}[X] > 0$. For an i.i.d. random sample of size $m \geqslant 2$, denote the order statistics by $X_{1:m} \leqslant \cdots \leqslant X_{m:m}$. Define the population index
\[
I_m \equiv I_m(X) = \frac{1}{m\mu} \sum_{k=1}^{m} a_k \, \mathbb{E}[X_{k:m}], \tag{1}
\]
where $(a_1, \ldots, a_m)$ are fixed real coefficients such that
\[
\sum_{k=1}^{m} a_k = 0. \tag{2}
\]
The index $I_m$ measures a weighted contrast of expected rank positions relative to the population mean. If the distribution of $X$ is close to equality (i.e., $X_{k:m} \approx \mu$ for all $k$), then the index reflects low inequality (depending on the structure of the weights $a_k$).
Conversely, large gaps between lower and upper order statistics increase the magnitude of the index, capturing stronger dispersion across ranks. Therefore, $I_m$ can be viewed as a finite-dimensional, rank-based summary of inequality determined by the weighting scheme $(a_1, \ldots, a_m)$.

Table 1 summarizes some main special cases of the linear order-statistic inequality index (1).

| Index | Weights $a_k$ | Formula | Interpretation |
|---|---|---|---|
| Classical Gini index $G$ [5, 7] ($m = 2$) | $a_1 = -1$, $a_2 = 1$ | $\dfrac{\mathbb{E}\lvert X_1 - X_2 \rvert}{2\mu}$ | Mean absolute pairwise difference. |
| $m$th Gini index $IG_m$ [6, 12] ($m \geqslant 2$) | $a_1 = -1$, $a_m = 1$, $a_k = 0$ for $2 \leqslant k \leqslant m-1$ | $\dfrac{\mathbb{E}[X_{m:m} - X_{1:m}]}{m\mu}$ | Dispersion between sample minimum and maximum. |
| Extended $m$th Gini $IG_m(j, k)$ [13] ($m \geqslant 2$) | $a_j = -1$, $a_k = 1$, $a_i = 0$ for $i \neq j, k$, $1 \leqslant j < k \leqslant m$ | $\dfrac{\mathbb{E}[X_{k:m} - X_{j:m}]}{m\mu}$ | Difference between two arbitrary order statistics. |
| General linear index $I_m$ ($m \geqslant 2$) | Weights with $\sum_{k=1}^{m} a_k = 0$ | $\dfrac{1}{m\mu} \sum_{k=1}^{m} a_k \, \mathbb{E}[X_{k:m}]$ | Most general linear order-statistic inequality index. |
| S-Gini $R_\nu$ ($\nu > 1$) [14] | $a_k = \nu \left[ 1 - \dfrac{\binom{m-k+\nu-1}{m-k}}{\binom{m+\nu-1}{m}} \right]$ | $1 - \dfrac{\nu}{\mu} \, \mathbb{E}\!\left[ X (1 - F(X))^{\nu-1} \right]$ | Gini with tunable tail sensitivity. |

Table 1: Special cases of the linear order-statistic index $I_m$.

Remark 1. Condition (2) is not required for the validity of the main results of this paper. It is used only in Section 3 to derive lower and upper bounds for the index $I_m$. When this restriction is relaxed, other inequality measures arise as particular cases of the general index $I_m$. In particular, by choosing
\[
a_k = \frac{1}{m} - b_k, \quad b_1 = 1, \quad b_k = 0 \ \text{for } 2 \leqslant k \leqslant m,
\]
and
\[
a_k = b_k - \frac{1}{m}, \quad b_m = 1, \quad b_k = 0 \ \text{for } 1 \leqslant k \leqslant m-1,
\]
we recover, respectively, the extended lower and upper Gini indices introduced by [11]:
\[
\text{(lower)} \quad iIG_m = \frac{\mathbb{E}[X_i - X_{1:m}]}{m\mu}, \qquad \text{(upper)} \quad iIG_m = \frac{\mathbb{E}[X_{m:m} - X_i]}{m\mu}, \qquad 1 \leqslant i \leqslant m.
\]
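As an illustration of definition (1), the following is a minimal Monte Carlo sketch, not the authors' code. The helper names `linear_os_index` and `exp_sampler`, the seed, and the number of replications are assumptions introduced here. It approximates $\mathbb{E}[X_{k:m}]$ by averaging sorted samples of size $m$ and then forms the weighted contrast. For the standard exponential distribution, $\mathbb{E}[X_{k:m}] = \sum_{j=m-k+1}^{m} 1/j$, so both the classical Gini index ($m = 2$) and the third Gini index ($m = 3$, weights $(-1, 0, 1)$) equal $1/2$, which gives a simple sanity check.

```python
import numpy as np


def linear_os_index(weights, sampler, n_rep=200_000, seed=0):
    """Monte Carlo approximation of I_m = (1/(m*mu)) * sum_k a_k E[X_{k:m}].

    `weights` holds (a_1, ..., a_m); `sampler(rng, size)` is a hypothetical
    callback returning i.i.d. non-negative draws with the requested shape.
    """
    a = np.asarray(weights, dtype=float)
    m = a.size
    rng = np.random.default_rng(seed)
    x = sampler(rng, (n_rep, m))        # n_rep independent samples of size m
    x_sorted = np.sort(x, axis=1)       # row-wise order statistics X_{1:m} <= ... <= X_{m:m}
    e_os = x_sorted.mean(axis=0)        # Monte Carlo estimates of E[X_{k:m}]
    mu = x.mean()                       # Monte Carlo estimate of mu = E[X]
    return float(a @ e_os / (m * mu))


def exp_sampler(rng, size):
    """Standard exponential draws (mean 1)."""
    return rng.exponential(scale=1.0, size=size)


# Classical Gini index (m = 2) and third Gini index (m = 3) for Exp(1).
gini_exp = linear_os_index([-1.0, 1.0], exp_sampler)
third_gini_exp = linear_os_index([-1.0, 0.0, 1.0], exp_sampler)

# Rescaling X by c = 5 should leave the index unchanged.
gini_scaled = linear_os_index(
    [-1.0, 1.0], lambda rng, size: 5.0 * rng.exponential(scale=1.0, size=size)
)

print(gini_exp, third_gini_exp, gini_scaled)
```

The function approximates the population index only; it is not the subsample-averaging estimator studied later in the paper. The last call reuses the same seed, so it checks numerically that multiplying $X$ by a positive constant does not change the index.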
3 Basic properties

In this section, we present several properties of the linear order-statistic inequality index defined in (1).

• Ratio-scale invariance. For any $c > 0$, $I_m(cX) = I_m(X)$.

• Lack of translation invariance. For any $c > 0$,
\[
I_m(X + c) = \frac{\mu}{\mu + c} \, I_m(X).
\]

• Vanishing under equality. If $X = c$ almost surely, then $I_m(X) = 0$.

• Non-negativity under increasing weights. Assume that $a_1 \leqslant \cdots \leqslant a_m$. Since $\mathbb{E}[X_{1:m}] \leqslant \cdots \leqslant \mathbb{E}[X_{m:m}]$, Chebyshev's rearrangement inequality yields
\[
\sum_{k=1}^{m} \frac{a_k \, \mathbb{E}[X_{k:m}]}{m} \geqslant \left( \sum_{k=1}^{m} \frac{a_k}{m} \right) \left( \sum_{k=1}^{m} \frac{\mathbb{E}[X_{k:m}]}{m} \right) \overset{(2)}{=} 0,
\]
where the second factor equals $\mu$ because $\sum_{k=1}^{m} \mathbb{E}[X_{k:m}] = m\mu$. Hence, under the condition $a_1 \leqslant \cdots \leqslant a_m$, the index satisfies $I_m \geqslant 0$.

• Upper bound under increasing weights. Assume that $a_1 \leqslant \cdots \leqslant a_m$. Since
\[
\sum_{k=1}^{m} \frac{a_k \, \mathbb{E}[X_{k:m}]}{m} \overset{(2)}{\leqslant} a_m \sum_{k=1}^{m} \frac{\mathbb{E}[X_{k:m}]}{m} = a_m \mu,
\]
we have $I_m \leqslant a_m$.

• Integral representation. Using the representation
\[
\mathbb{E}[X_{k:m}] = \int_0^1 Q_X(u) \, f_{U_{k:m}}(u) \, \mathrm{d}u,
\]
where $Q_X$ is the quantile function of $X$ and $U_{k:m} \sim \mathrm{Beta}(k, m-k+1)$ is the $k$-th order statistic of a sample of size $m$ from $U(0,1)$, we obtain
\[
I_m = \frac{1}{\mu} \int_0^1 w_m(u) \, Q_X(u) \, \mathrm{d}u,
\]
with weight function $w_m(u) = (1/m) \sum_{k=1}^{m} a_k f_{U_{k:m}}(u)$. Thus $I_m$ belongs to the class of linear (spectral) inequality measures [3, 4].

• Covariance representation. Let $F$ denote the cumulative distribution function (CDF) of $X$. Since
\[
\binom{m-1}{k-1} \mathbb{E}\!\left[ F^{k-1}(X) [1 - F(X)]^{m-k} \right] = \frac{1}{m} \quad \text{and} \quad \binom{m-1}{k-1} = \frac{k}{m} \binom{m}{k}, \tag{3}
\]
we have
\[
\frac{1}{\mu} \sum_{k=1}^{m} a_k \, \mathrm{Cov}\!\left( X, \binom{m-1}{k-1} F^{k-1}(X) [1 - F(X)]^{m-k} \right) = \frac{1}{m\mu} \sum_{k=1}^{m} a_k \, \mathbb{E}\!\left[ Q_X(U) \left\{ k \binom{m}{k} U^{k-1} (1 - U)^{m-k} \right\} \right], \tag{4}
\]
where $U \sim U(0,1)$ and $Q_X$ is the quantile function of $X$.
By combining the well-known identity (see Item (1) in [8])
\[
\mathbb{E}[X_{k:m}] = k \binom{m}{k} \mathbb{E}\!\left[ Q_X(U) \, U^{k-1} (1 - U)^{m-k} \right]
\]
with the definition of $I_m$ in (1), from (4) we have
\[
I_m = \frac{1}{\mu} \sum_{k=1}^{m} a_k \, \mathrm{Cov}\!\left( X, \binom{m-1}{k-1} F^{k-1}(X) [1 - F(X)]^{m-k} \right). \tag{5}
\]

• Lorenz-type inequality representation. The Lorenz curve $L$ for $X$ is defined by $L(p) = (1/\mu) \int_0^p Q_X(t) \, \mathrm{d}t$, for any $0 \leqslant p \leqslant 1$. Since $L'(p) = Q_X(p)/\mu$, by (5) and the identity in (4), we have
\[
I_m = \frac{1}{m} \sum_{k=1}^{m} a_k \, \mathbb{E}\!\left[ \{ L'(U) - 1 \} \left\{ k \binom{m}{k} U^{k-1} (1 - U)^{m-k} \right\} \right],
\]
where $U \sim U(0,1)$. By integration by parts, the above identity takes the form
\[
I_m = \frac{1}{m} \sum_{k=1}^{m} a_k \, \mathbb{E}\!\left[ \{ U - L(U) \} \left\{ k \binom{m}{k} U^{k-2} (1 - U)^{m-k-1} [k - 1 - (m-1)U] \right\} \right].
\]
Using the identity (3) and the binomial theorem applied to $(1 - U)^{m-k-1}$, we derive
\[
I_m = \sum_{k=1}^{m} a_k \binom{m-1}{k-1} \sum_{r=0}^{m-k-1} \binom{m-k-1}{r} (-1)^{m-k-1-r} \left[ \frac{k-1}{m-1-r} D_{m-2-r} - \frac{m-1}{m-r} D_{m-1-r} \right],
\]
where
\[
D_n \equiv D_n(X) = (n+1) \, \mathbb{E}\!\left[ \{ U - L(U) \} U^{n-1} \right], \qquad U \sim U(0,1), \quad n \geqslant 1,
\]
is the Lorenz measure of inequality for $X$ introduced in [1].

• Structural relationship.
\[
\text{Gini } (m = 2) \subset m\text{th Gini} \subset \text{Extended } m\text{th Gini} \subset I_m \subset \text{Spectral inequality measures}.
\]

Proposition 1. For any $m \geqslant 2$, the linear order-statistic inequality index (1) can be computed as
\[
I_m = \frac{1}{m} \sum_{k=1}^{m} a_k \sum_{r=k}^{m} \binom{m}{r} (-1)^{r-k} \binom{r-1}{k-1} \frac{\mathbb{E}[X_{r:r}]}{\mu}.
\]

Proof. From Proposition 10 of [10], it follows that
\[
X_{k:m} = \sum_{r=k}^{m} (-1)^{r-k} \binom{r-1}{k-1} \sum_{1 \leqslant t_1 < \cdots} \; […]
\]
[…] 0 a.s., and 0 if $\sum_{i=1}^{n} X_i = 0$ a.s., (6)
where $\overline{X} = (1/n) \sum_{i=1}^{n} X_i$ is the sample mean, $X_{1:\mathbf{i}} = \min\{X_{i_1}, \ldots, X_{i_m}\} \leqslant \cdots \leqslant X_{m:\mathbf{i}} = \max\{X_{i_1}, \ldots, X_{i_m}\}$ are the order statistics of the i.i.d. sample $X_{i_1}, \ldots, X_{i_m}$, and $\widehat{I}_m$ is based on $n \geqslant m$ i.i.d. observations.

Theorem 2. Let $X_1, \ldots, X_m$ be i.i.d.
copies of a non-negative, non-degenerate random variable $X$ with finite, strictly positive mean $\mu$. For any $m \geqslant 2$, the following holds:
\[
\mathbb{E}\big[ \widehat{I}_m \big] = \frac{1}{m} \sum_{k=1}^{m} a_k \sum_{r=k}^{m} \binom{m}{r} (-1)^{r-k} \binom{r-1}{k-1} \mathbb{E}\!\left[ \frac{X_{r:r}}{\overline{X}} \right].
\]

Proof. Making use of the identities provided in [10, Proposition 10]:
\[
X_{k:\mathbf{i}} = \sum_{r=k}^{m} (-1)^{r-k} \binom{r-1}{k-1} \sum_{\mathbf{t} = (t_1, \ldots, t_r) \in \mathbb{N}^r : \, 1 \leqslant t_1 < \cdots} \; […]
\]
[…] 0, $z > 0$.

Proof. The result is an immediate consequence of the identities $\int_0^\infty \exp(-wz) \, \mathrm{d}z = 1/w$, $w > 0$, applied with $w = \sum_{i=1}^{n} X_i$, and $X_{r:r} = \int_0^\infty \mathbb{1}_{\{X_{r:r} > t\}} \, \mathrm{d}t$, combined with the i.i.d. assumption on $X_1, \ldots, X_n$. □

5 Asymptotic unbiasedness

In this section, we show that the bias of the linear order-statistic inequality index estimator (6), as characterized in Corollary 3, vanishes as the sample size $n$ increases.

Proposition 5. Suppose that for some $p > 1$, $\mathbb{E}\big[ X_{r:r}^{p} \big] \in (0, \infty)$, $r = 1, \ldots, m$. For any $m \geqslant 2$, we have
\[
\lim_{n \to \infty} \mathrm{Bias}(\widehat{I}_m, I_m) = 0.
\]

Proof. Using the weak law of large numbers,
\[
\overline{X} \xrightarrow{P} \mu, \quad \text{as } n \to \infty, \tag{9}
\]
where "$\xrightarrow{P}$" denotes convergence in probability. Hence, for any $\varepsilon > 0$, $\lim_{n \to \infty} P\big( \overline{X} < \mu - \varepsilon \big) = 0$. Taking $\varepsilon = \mu/2$ yields
\[
\lim_{n \to \infty} P(A_n) = 0, \qquad A_n \equiv \left\{ \overline{X} < \frac{\mu}{2} \right\}. \tag{10}
\]
Next, fix $p > 1$. By (10), for every $\delta > 0$ there exists $N_0 \in \mathbb{N}$ such that
\[
\sup_{n \geqslant N_0} \mathbb{E}\!\left[ \left( \frac{X_{r:r}}{\overline{X}} \right)^{p} \mathbb{1}_{A_n} \right] < \delta. \tag{11}
\]
Hence, by using (11) we have
\[
0 \leqslant \sup_{n \geqslant 1} \mathbb{E}\!\left[ \left( \frac{X_{r:r}}{\overline{X}} \right)^{p} \right] \leqslant \sup_{n \geqslant 1} \mathbb{E}\!\left[ \left( \frac{X_{r:r}}{\overline{X}} \right)^{p} \mathbb{1}_{A_n^{c}} \right] + \max\Big( \sup_{1 \leqslant n} \; […]
\]
