Effective sample size approximations as entropy measures
Authors: L. Martino, V. Elvira
L. Martino (Università degli Studi di Catania, Italy), V. Elvira (University of Edinburgh, UK).

Abstract. In this work, we analyze alternative effective sample size (ESS) metrics for importance sampling algorithms, and discuss a possible extended range of applications. We show the relationship between the ESS expressions used in the literature and two entropy families, the Rényi and Tsallis entropies. The Rényi entropy is connected to the Huggins-Roy's ESS family introduced in [22]. We prove that all the ESS functions included in the Huggins-Roy's family fulfill all the desirable theoretical conditions. We also analyze and remark on the connections with several other fields, such as the Hill numbers introduced in ecology, the Gini inequality coefficient employed in economics, and the Gini impurity index used mainly in machine learning, to name a few. Finally, by numerical simulations, we study the performance of different ESS expressions contained in the previous ESS families in terms of approximation of the theoretical ESS definition, and show the application of ESS formulas in a variable selection problem.

Keywords: Effective Sample Size; Importance Sampling; Entropy; Diversity measure; Gini impurity; Gini inequality coefficient; inverse Simpson concentration; Berger-Parker index.

The effective sample size (ESS) measure is an important concept in order to quantify the efficiency of different Monte Carlo methods, such as Markov Chain Monte Carlo (MCMC) [16, 30] and Importance Sampling (IS) techniques [4, 6]. In an IS context, the ESS is a heuristic to approximate how many independent identically distributed (i.i.d.) samples, drawn directly from the target distribution $\bar{\pi}(\mathbf{x}) = \frac{1}{Z}\pi(\mathbf{x})$, where $Z$ is the normalizing constant, are equivalent in some sense to the $N$ weighted samples $\mathbf{x}_1, \ldots, \mathbf{x}_N$, drawn from a proposal distribution $q(\mathbf{x})$ and weighted according to the ratio $w_n = \frac{\pi(\mathbf{x}_n)}{q(\mathbf{x}_n)}$ [40]. This consideration is represented in the first box of Figure 1, referred to as the abstract ESS concept. The theoretical definition of the ESS for IS is given by the ratio between two variances [16, 26]: the variance of the ideal Monte Carlo estimator (drawing samples directly from the target), and the variance of the estimator obtained by an IS scheme, using the same number of samples in both estimators (see Eq. (5) for more details). This definition presents some drawbacks (see [34, 14] for an exhaustive discussion) and is useless for practical purposes, since it cannot be computed in general. Hence, approximations of this theoretical formula are required. In Figure 1, this theoretical definition is represented by the second box. Within an IS context, the most common choice in the literature to approximate this theoretical ESS definition is $\text{ESS} = \frac{1}{\sum_{n=1}^N \bar{w}_n^2}$, which involves (only) the normalized importance weights $\bar{w}_n = \frac{w_n}{\sum_{j=1}^N w_j}$, $n = 1, \ldots, N$ [10, 11, 27, 40]. This expression has been widely used in particle filtering in order to apply the resampling steps adaptively [11, 10, 19]. However, it presents different weaknesses, since it has been obtained after several approximations of the theoretical definition. For instance, it depends only on the normalized weights, not on the particle locations nor on the particular integral to approximate (see [14, 34] for further details). Several other alternatives have been studied in the literature and applied in order to perform adaptive resampling within sequential Monte Carlo (SMC) methods [22, 34]. For instance, another measure called perplexity, involving the discrete entropy [9] of the normalized weights, has also been proposed in [5] (see also [40, Chapter 4], [13, Section 3.5]).
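As a minimal sketch of how this rule of thumb is used in practice (the toy weight vector and the resampling threshold $N/2$ are our hypothetical choices, not values prescribed by the paper):

```python
import numpy as np

def ess_classical(w):
    """Classical ESS approximation, ESS = 1 / sum(w_bar_n^2),
    computed from unnormalized importance weights."""
    w = np.asarray(w, dtype=float)
    w_bar = w / w.sum()            # normalized importance weights
    return 1.0 / np.sum(w_bar**2)

# Hypothetical adaptive-resampling rule as used in particle filtering:
# trigger a resampling step when the ESS drops below N/2.
w = np.array([0.1, 2.3, 0.05, 1.7, 0.02])   # toy unnormalized weights
N = len(w)
print(ess_classical(w), ess_classical(w) < N / 2)
```

The formula always returns a value between 1 and $N$, which makes the threshold comparison meaningful for any weight vector.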
Another expression is defined as the inverse of the maximum of the normalized weights $\bar{w}_n$ [34]. In this work, we recall the definition of the generalized ESS (G-ESS) functions given in [34]. We stress and show that the G-ESS functions can be considered diversity indices [24] (see the third box in Figure 1). Indeed, we show that the G-ESS functions can be associated to different entropy families [9]. Given an entropy measure of the probability mass function (pmf) defined by the normalized weights $\bar{w}_n$, $n = 1, \ldots, N$, we can obtain a G-ESS formula by taking the exponential transformation of the entropy expression (in some cases, an additional translation and scaling are needed). More specifically, we analyze the Rényi and Tsallis entropy families, converting them into G-ESS functions. The ESS family corresponding to the Rényi entropy coincides with the Huggins-Roy's ESS family, introduced and studied independently in [22],
$$\text{ESS} = \left( \sum_{n=1}^N \bar{w}_n^{\beta} \right)^{\frac{1}{1-\beta}}, \quad \beta \geq 0.$$
We show that all the G-ESS expressions belonging to this family satisfy all the desired requirements, being all defined as proper and stable (see Section 3 for further details). Moreover, almost all the main formulas previously proposed in the literature are contained in the Huggins-Roy's family. Using the Tsallis entropy, we obtain another ESS family which contains as a special case the Gini impurity index, widely employed in machine learning within decision tree algorithms [3, 28]. We also discuss the connection to another ESS family provided in [34]. However, the Tsallis ESS formulas are generally not proper and stable. Other stable expressions that do not belong to the Huggins-Roy's family are also given (see, e.g., Sections 6 and 7.3). The connections with the entropy families show the relationships with multiple studies in different fields (e.g., ecology and machine learning, to name a few).
The benefit of creating these bridges between fields is twofold and bidirectional: different ideas used in other fields can be applied as ESS expressions in an IS context (such as the formula (44), introduced in political science) and, vice versa, ESS formulas proposed for IS could be employed in other fields. Showing these bridges is the main goal of this work. We remark on the links with other fields in Section 7, where we discuss the applications of ESS expressions in ecology, economics, political science, physics, and also in feature selection problems. Other connections with machine learning, economics, and ecology are also discussed in the previous Sections 5 and 6. Figure 2 provides a summary of the main nomenclature employed in different fields. Furthermore, by numerical simulations, we obtain the G-ESS function within the Huggins-Roy's family which provides the best approximation of the theoretical ESS definition, in two specific scenarios. We also study linear combinations of G-ESS functions in order to enhance the approximation of the theoretical definition. The results of our numerical simulations suggest the use of formulas of the type
$$\text{ESS} = \left( \frac{1}{\sum_{n=1}^N \bar{w}_n^4} \right)^{1/3} \quad \text{and} \quad \text{ESS} = \left( \frac{1}{\sum_{n=1}^N \bar{w}_n^8} \right)^{1/7}.$$
Both expressions differ from the classical formula $\text{ESS} = \frac{1}{\sum_{n=1}^N \bar{w}_n^2}$, which is contained in the Huggins-Roy's family with $\beta = 2$. Our study suggests the use of $\beta > 2$. Moreover, we have applied the most relevant ESS formulas in a variable selection framework. Some of them provide good results, in line with the experts' opinions. These considerations can also be relevant clues for future applications and studies.

[Figure 1 shows three boxes, from left to right: "Abstract ESS concept: comparing weighted samples with i.i.d. samples from $\bar{\pi}$"; "Theoretical definition: for instance, $\text{ESS} = N \frac{\text{var}_\pi[\widehat{I}]}{\text{var}_q[\widetilde{I}]}$"; "Approximation, diversity measures: for instance, $\widehat{\text{ESS}} = \frac{1}{\sum_{n=1}^N \bar{w}_n^2}$".]

Figure 1: Graphical representation of the development of the approximated ESS formulas for importance sampling. The abstract concept of effective sample size has been translated into a mathematical formulation, providing a first attempt at a theoretical definition. Since this definition cannot be computed, several approximations have been proposed (based only on the information provided by the normalized IS weights). The expression $\text{ESS} = \frac{1}{\sum_{n=1}^N \bar{w}_n^2}$ is the most applied so far in the literature.

1 Effective sample size (ESS) for importance sampling

Let us denote the target probability density function (pdf) as $\bar{\pi}(\mathbf{x}) \propto \pi(\mathbf{x})$ (known up to a normalizing constant), with $\mathbf{x} \in \mathcal{X}$. Moreover, we consider the following integral involving $\bar{\pi}(\mathbf{x})$ and a square-integrable function $h(\mathbf{x})$,
$$I = \int_{\mathcal{X}} h(\mathbf{x}) \bar{\pi}(\mathbf{x}) d\mathbf{x}, \qquad (1)$$
which we aim to approximate using a Monte Carlo approach. If we are able to draw $N$ independent samples $\mathbf{x}_1, \ldots, \mathbf{x}_N$ from $\bar{\pi}(\mathbf{x})$, then the Monte Carlo estimator of $I$ is
$$\widehat{I} = \frac{1}{N} \sum_{n=1}^N h(\mathbf{x}_n) \xrightarrow{N \to \infty} I, \quad \text{where } \mathbf{x}_n \sim \bar{\pi}(\mathbf{x}). \qquad (2)$$
However, generating samples directly from the target, $\bar{\pi}(\mathbf{x})$, is often impossible. Alternatively, we can draw $N$ samples $\mathbf{x}_1, \ldots, \mathbf{x}_N$ from a (simpler) proposal pdf $q(\mathbf{x})$ (we assume that $q(\mathbf{x}) > 0$ for all $\mathbf{x}$ where $\bar{\pi}(\mathbf{x}) \neq 0$), and then assign a weight to each sample, $w_n = \frac{\pi(\mathbf{x}_n)}{q(\mathbf{x}_n)}$, with $n = 1, \ldots, N$, according to the importance sampling (IS) approach. Defining the normalized weights,
$$\bar{w}_n = \frac{w_n}{\sum_{i=1}^N w_i}, \quad \text{where } w_n = \frac{\pi(\mathbf{x}_n)}{q(\mathbf{x}_n)}, \quad n = 1, \ldots, N, \qquad (3)$$
then the self-normalized IS estimator is
$$\widetilde{I} = \sum_{n=1}^N \bar{w}_n h(\mathbf{x}_n) \xrightarrow{N \to \infty} I, \quad \text{where } \mathbf{x}_n \sim q(\mathbf{x}). \qquad (4)$$
Generally, the estimator $\widetilde{I}$ has greater variance than $\widehat{I}$, since the samples are not directly drawn from $\bar{\pi}(\mathbf{x})$ (for some exceptions, which occur with a suitable choice of the proposal, see [32]). Moreover, $\widetilde{I}$ is biased, whereas $\widehat{I}$ is unbiased. In several applications [10, 11], it is necessary to measure the loss of efficiency when we apply the IS estimator $\widetilde{I}$ instead of the ideal Monte Carlo estimator $\widehat{I}$, i.e., to measure in some way the increase of variance due to the use of $\widetilde{I}$ instead of $\widehat{I}$. Hence, the idea is to define the effective sample size (ESS) as the ratio of the variances of the estimators [26],
$$\text{ESS}_{\text{teo}}(h) = N \frac{\text{var}_\pi[\widehat{I}]}{\text{var}_q[\widetilde{I}]}. \qquad (5)$$
Note the dependence on the function $h(\mathbf{x})$ corresponding to a specific integral.

2 Practical ESS formulas

2.1 ESS expressions in the literature

Finding a useful expression for the ESS, derived analytically from the theoretical definition in Eq. (5) above, is not straightforward. Hence, different derivations [26, 27], [11, Chapter 11], [40, Chapter 4] proceed using several approximations and assumptions in order to yield an expression useful from a practical point of view. A well-known rule of thumb, widely used in the literature [11, 31, 40], is
$$\text{ESS}_N(\bar{\mathbf{w}}) = \frac{1}{\sum_{n=1}^N \bar{w}_n^2}, \qquad (6)$$
where we have used the normalized weights $\bar{\mathbf{w}} = [\bar{w}_1, \ldots, \bar{w}_N]$, defined in Eq. (3). The formula above also has an intuitive probabilistic interpretation (from a resampling point of view): if we draw random pairs of samples with replacement according to the probability mass function (pmf) defined by $\bar{w}_n$, with $n = 1, \ldots, N$, the value $\frac{1}{\sum_{n=1}^N \bar{w}_n^2}$ is the expected number of trials needed to obtain a first pair containing the same sample twice (see Appendix A for details). Furthermore, another interesting form of Eq. (6), as a function of the variance of the weights, is given in Appendix B.
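The pair-drawing interpretation of Eq. (6) can be checked by a short simulation; the toy weight vector and the number of repetitions below are our arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
w_bar = np.array([0.5, 0.3, 0.1, 0.1])   # toy normalized weights
p_match = np.sum(w_bar**2)               # probability that a drawn pair repeats a sample
expected_trials = 1.0 / p_match          # the value given by Eq. (6)

def trials_until_match(rng, w_bar):
    """Draw pairs with replacement from the pmf w_bar until both entries coincide."""
    t = 0
    while True:
        t += 1
        a, b = rng.choice(len(w_bar), size=2, p=w_bar)
        if a == b:
            return t

runs = [trials_until_match(rng, w_bar) for _ in range(20000)]
print(np.mean(runs), expected_trials)    # the empirical mean should be close to 1/sum(w_bar^2)
```

Each trial succeeds with probability $\sum_n \bar{w}_n^2$, so the number of trials is geometric and its mean is exactly the ESS value of Eq. (6).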
Another similar measure, called perplexity, has been proposed independently in the literature, based only on the normalized importance weights [5, 40],
$$\text{ESS}_N(\bar{\mathbf{w}}) = \exp\{H(\bar{\mathbf{w}})\}, \qquad (7)$$
where
$$H(\bar{\mathbf{w}}) = -\sum_{n=1}^N \bar{w}_n \log \bar{w}_n$$
is the discrete entropy of the vector $\bar{\mathbf{w}}$ [9]. An additional example is the following formula [34],
$$\text{ESS}_N(\bar{\mathbf{w}}) = \frac{1}{\max_n \bar{w}_n}. \qquad (8)$$
Let us assume that $\max_n \bar{w}_n$ is reached by only one sample (only for one index $n$). In this case, the expression above also has a probabilistic interpretation: if we draw one sample at a time with replacement according to the pmf defined by $\bar{w}_n$, with $n = 1, \ldots, N$, the value $\frac{1}{\max_n \bar{w}_n}$ is the expected number of trials needed to obtain for the first time the sample corresponding to the maximum weight. The proof is very similar to the derivation in Appendix A. This interpretation is interesting from a resampling point of view. An interesting property of all three expressions above, in Eqs. (6)-(7)-(8), is
$$1 \leq \text{ESS}_N(\bar{\mathbf{w}}) \leq N. \qquad (9)$$

2.2 Relationship with the theoretical definition

All these measures $\text{ESS}_N(\bar{\mathbf{w}})$ are based only on the normalized weights $\bar{\mathbf{w}}$, and there is a loss of information regarding the locations of the samples $\mathbf{x}_n$, which is clearly a drawback [14, 34], even if the computation of the weights involves the use of the samples, i.e., $w_n = \pi(\mathbf{x}_n)/q(\mathbf{x}_n)$. To clarify this point, we give the following example. Two different samples $\mathbf{x}'$ and $\mathbf{x}''$ could have very similar weights, $w' = \frac{\pi(\mathbf{x}')}{q(\mathbf{x}')} \approx w'' = \frac{\pi(\mathbf{x}'')}{q(\mathbf{x}'')}$, and the ESS formulas use only this information. Hence, the ESS formulas lose all the information about the positions of the samples $\mathbf{x}'$ and $\mathbf{x}''$. The two particles can be very close to each other or far away; the latter scenario is often preferred in terms of statistical information.
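Since the expressions in Eqs. (6)-(8) depend only on $\bar{\mathbf{w}}$, any two weighted sample sets with the same normalized weights receive the same ESS value, regardless of where the particles sit. A minimal sketch of the three classical formulas (the toy weight vector and function names are ours):

```python
import numpy as np

def ess_inverse_sum_squares(w_bar):
    return 1.0 / np.sum(w_bar**2)            # Eq. (6)

def ess_perplexity(w_bar):
    p = w_bar[w_bar > 0]                     # 0 * log 0 is treated as 0
    return np.exp(-np.sum(p * np.log(p)))    # Eq. (7), exponential of the discrete entropy

def ess_inverse_max(w_bar):
    return 1.0 / np.max(w_bar)               # Eq. (8)

w_bar = np.array([0.5, 0.3, 0.1, 0.1])
for f in (ess_inverse_sum_squares, ess_perplexity, ess_inverse_max):
    val = f(w_bar)
    assert 1.0 <= val <= len(w_bar)          # the property in Eq. (9)
    print(f.__name__, val)
```

All three return values in $[1, N]$, as stated in Eq. (9), while being computable from the weights alone.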
Moreover, the theoretical value $\text{ESS}_{\text{teo}}(h)$ in (5) is always positive, can be smaller than 1 and, in some situations, bigger than $N$ as well [14, Section 3.3], [32]. This last scenario can occur when an optimal proposal pdf (or a density close to the optimal one) is used in an IS scheme [32]. In this case, the IS scheme can beat the baseline Monte Carlo estimator, and $\text{ESS}_{\text{teo}}(h) > N$. Furthermore, $\text{ESS}_{\text{teo}}(h)$ depends on the function $h$, which does not appear in the expressions $\text{ESS}_N(\bar{\mathbf{w}})$. Therefore, the formulas $\text{ESS}_N(\bar{\mathbf{w}})$, which all satisfy the constraints in Eq. (9) (i.e., $1 \leq \text{ESS}_N(\bar{\mathbf{w}}) \leq N$), are quite rough approximations of $\text{ESS}_{\text{teo}}(h)$. However, they are often used in practice. The reason for this success is connected to their interpretation as discrepancy/diversity measures, as explained below.

2.3 Discrepancy w.r.t. the uniform pmf

All the formulas above can be considered diversity indices or discrepancy measures [24, 34]. We give more details in the rest of the work. Here, let us start by considering the discrepancy between two pmfs: the pmf defined by the weights $\bar{\mathbf{w}} = [\bar{w}_1, \ldots, \bar{w}_N]$ and the discrete uniform pmf defined by $\bar{\mathbf{w}}^* = \left[\frac{1}{N}, \ldots, \frac{1}{N}\right]$. Indeed, the ESS formula in Eq. (6) can be directly related to the Euclidean distance between these two pmfs, i.e.,
$$\|\bar{\mathbf{w}} - \bar{\mathbf{w}}^*\|_2 = \sqrt{\sum_{n=1}^N \left(\bar{w}_n - \frac{1}{N}\right)^2} = \sqrt{\left(\sum_{n=1}^N \bar{w}_n^2\right) + N\frac{1}{N^2} - \frac{2}{N}\sum_{n=1}^N \bar{w}_n} = \sqrt{\left(\sum_{n=1}^N \bar{w}_n^2\right) - \frac{1}{N}} = \sqrt{\frac{1}{\text{ESS}_N(\bar{\mathbf{w}})} - \frac{1}{N}},$$
where we have used $\text{ESS}_N(\bar{\mathbf{w}}) = \frac{1}{\sum_{n=1}^N \bar{w}_n^2}$ from Eq. (6). Hence, maximizing the expression in Eq. (6) is equivalent to minimizing the Euclidean distance $\|\bar{\mathbf{w}} - \bar{\mathbf{w}}^*\|_2$. Note that this behavior is also typical of discrete entropy measures, as we stress in the next sections. Indeed, if the weights are more "diverse" from each other, the distance w.r.t. the discrete uniform pmf $\bar{\mathbf{w}}^*$ is higher, and both the ESS and the entropy of $\bar{\mathbf{w}}$ are smaller.
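The identity $\|\bar{\mathbf{w}} - \bar{\mathbf{w}}^*\|_2 = \sqrt{1/\text{ESS}_N(\bar{\mathbf{w}}) - 1/N}$ derived above is deterministic and can be verified directly for any weight vector (the random vector below is only an example):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10
w = rng.random(N)
w_bar = w / w.sum()                         # a generic point of the unit simplex
w_star = np.full(N, 1.0 / N)                # discrete uniform pmf

lhs = np.linalg.norm(w_bar - w_star)        # Euclidean distance to the uniform pmf
ess = 1.0 / np.sum(w_bar**2)                # Eq. (6)
rhs = np.sqrt(1.0 / ess - 1.0 / N)          # the closed form derived in Section 2.3
print(lhs, rhs)                             # the two values coincide
```

Maximizing the ESS of Eq. (6) is therefore literally the same optimization as minimizing the Euclidean distance to $\bar{\mathbf{w}}^*$.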
On the other hand, if the normalized weights are more similar to each other, they are all closer to the value $1/N$, so that the distance w.r.t. the discrete uniform pmf $\bar{\mathbf{w}}^*$ is smaller. As a consequence, the corresponding ESS and the entropy of $\bar{\mathbf{w}}$ are greater. Hence, it appears natural to consider the possibility of using other discrepancy and/or entropy measures to design alternative ESS expressions. Highlighting these types of connections is relevant since (a) we can extend the range of applications of the ESS formulas (applying those expressions in other fields) and (b) derivations employed in other fields can be used to design novel ESS formulas.

Why discrepancy measures. The maximum ESS value is obtained when $\bar{\mathbf{w}} = \bar{\mathbf{w}}^*$, i.e., all the normalized weights are equal to $1/N$, $\bar{w}_1 = \ldots = \bar{w}_N = \frac{1}{N}$. This can be considered a good scenario (and confused with the optimal one), since the ideal Monte Carlo estimator in Eq. (2) can be interpreted as an estimator with "equally weighted samples" (each $h(\mathbf{x}_n)$ is multiplied by a factor $1/N$). The problem is that, within an IS scheme, the case $\bar{w}_1 = \ldots = \bar{w}_N = \frac{1}{N}$ (or $\bar{w}_1 \approx \ldots \approx \bar{w}_N \approx \frac{1}{N}$) can occur also in catastrophic scenarios, for instance, when all the samples are located in a tail of the target distribution (which is often a quite flat region), or when the samples are very close to each other. However, the ESS formulas above, based on the discrepancy approach, are able to detect other critical scenarios. For instance, the minimum ESS value is reached when just one weight concentrates all the probability mass ($\bar{w}_i = 1$ and $\bar{w}_j = 0$ for $j \neq i$), which is a situation to be avoided within particle filtering and sequential Monte Carlo schemes [1, 8, 10, 12]. Therefore, this discrepancy approach has gained strength in the literature, and the ESS approximations above have been widely applied.
In the following, we describe five conditions that a generic ESS approximation, based only on the information of the normalized weights, must satisfy. Then we show that the family of functions proposed in [22] fulfills these five conditions. Furthermore, we link this G-ESS family with the Rényi entropy, providing also some theoretical results.

3 Generalized ESS functions

Considering the practical approach employed above for defining ESS formulas as discrepancy-diversity measures, here we describe the five properties that a generalized ESS measure (G-ESS) should satisfy, based only on the information of the normalized weights. We list below five conditions. The formulas which satisfy all of them can be applied as suitable ESS measures in practical applications (within IS or sequential IS schemes). Otherwise, if they satisfy at least the first three conditions, they can be considered discrepancy measures with respect to the uniform pmf, but they lack the ability to be "particle/sample counters", as we clarify below with practical examples. First of all, note that any possible G-ESS is a function of the vector of normalized weights $\bar{\mathbf{w}} = [\bar{w}_1, \ldots, \bar{w}_N]$,
$$\text{ESS}_N(\bar{\mathbf{w}}) = \text{ESS}_N(\bar{w}_1, \ldots, \bar{w}_N): \mathcal{S}_N \to [1, N], \qquad (10)$$
where $\mathcal{S}_N \subset \mathbb{R}^N$ represents the unit simplex in $\mathbb{R}^N$. Namely, the variables $\bar{w}_1, \ldots, \bar{w}_N$ are subject to the following constraint:
$$\bar{w}_1 + \bar{w}_2 + \ldots + \bar{w}_N = 1. \qquad (11)$$
Moreover, we denote
$$\bar{\mathbf{w}}^* = \left[\frac{1}{N}, \ldots, \frac{1}{N}\right], \qquad (12)$$
and the vertices of the simplex $\mathcal{S}_N$ are denoted as
$$\bar{\mathbf{w}}^{(j)} = [\bar{w}_1 = 0, \ldots, \bar{w}_j = 1, \ldots, \bar{w}_N = 0], \qquad (13)$$
i.e., $\bar{w}_j = 1$ and $\bar{w}_n = 0$ (which can occur only if $\pi(\mathbf{x}_n) = 0$) for $n \neq j$, with $j \in \{1, \ldots, N\}$. Below we list the five conditions that $\text{ESS}_N(\bar{\mathbf{w}})$ should fulfill:

C1. Symmetry: $\text{ESS}_N$ must be invariant under any permutation of the weights, i.e.,
$$\text{ESS}_N(\bar{w}_1, \bar{w}_2, \ldots, \bar{w}_N) = \text{ESS}_N(\bar{w}_{j_1}, \bar{w}_{j_2}, \ldots, \bar{w}_{j_N}), \qquad (14)$$
for any possible set of indices $\{j_1, \ldots, j_N\} = \{1, \ldots, N\}$.

C2. Maximum condition: a maximum value is $N$ and it is reached at $\bar{\mathbf{w}}^*$ (see Eq. (12)), i.e.,
$$\text{ESS}_N(\bar{\mathbf{w}}^*) = N \geq \text{ESS}_N(\bar{\mathbf{w}}). \qquad (15)$$

C3. Minimum condition: the minimum value is 1 and it is reached (at least) at the vertices $\bar{\mathbf{w}}^{(j)}$ of the unit simplex in Eq. (13),
$$\text{ESS}_N(\bar{\mathbf{w}}^{(j)}) = 1 \leq \text{ESS}_N(\bar{\mathbf{w}}), \qquad (16)$$
for all $j \in \{1, \ldots, N\}$.

C4. Unicity of the extreme values: the maximum at $\bar{\mathbf{w}}^*$ is unique, and the minimum value 1 is reached only at the vertices $\bar{\mathbf{w}}^{(j)}$, for all $j \in \{1, \ldots, N\}$.

C5. Stability of the rate $\text{ESS}_N/N$: consider the vector of weights $\bar{\mathbf{w}} \in \mathbb{R}^N$ and the vector $\bar{\mathbf{v}} = [\bar{v}_1, \ldots, \bar{v}_{MN}] \in \mathbb{R}^{MN}$, $M \geq 1$, obtained by repeating $M$ times and scaling by $\frac{1}{M}$ the entries of $\bar{\mathbf{w}}$, i.e.,
$$\bar{\mathbf{v}} = \frac{1}{M}[\underbrace{\bar{\mathbf{w}}, \bar{\mathbf{w}}, \ldots, \bar{\mathbf{w}}}_{M \text{ times}}]. \qquad (17)$$
The invariance condition is expressed as
$$\text{ESS}_N(\bar{\mathbf{w}}) = \frac{1}{M}\text{ESS}_{MN}(\bar{\mathbf{v}}), \qquad (18)$$
for all $M \in \mathbb{N}^+$. This last requirement can be interpreted as an adjustment of the well-known homogeneity (scale-invariance) condition for real functions (a function $f(\mathbf{x})$ is said to be homogeneous of degree $k$ if $f(c\mathbf{x}) = c^k f(\mathbf{x})$, where $c$ is a non-zero constant value).

Note that, given conditions C2 and C3, we always have
$$1 \leq \text{ESS}_N(\bar{\mathbf{w}}) \leq N. \qquad (19)$$
If at least C1, C2 and C3 are fulfilled, the G-ESS can be considered a discrepancy measure with respect to the uniform pmf. If C4 is also satisfied, then it is a proper discrepancy measure, since it reaches the maximum value (that is, $N$) only at $\bar{\mathbf{w}}^*$ and the minimum value (that is, 1) only at the vertices $\bar{\mathbf{w}}^{(j)}$. However, if C5 is not ensured, the formula cannot be considered a useful ESS function from a practical point of view.

On the condition C5. To clarify this condition, consider the vector $\bar{\mathbf{v}} = [0, 1, 0]$ with $N = 3$, and the two additional vectors obtained by repeating $\bar{\mathbf{v}}$ two or three times,
$$\bar{\mathbf{v}}' = \left[0, \tfrac{1}{2}, 0, 0, \tfrac{1}{2}, 0\right] = \tfrac{1}{2}[\bar{\mathbf{v}}, \bar{\mathbf{v}}], \qquad \bar{\mathbf{v}}'' = \left[0, \tfrac{1}{3}, 0, 0, \tfrac{1}{3}, 0, 0, \tfrac{1}{3}, 0\right] = \tfrac{1}{3}[\bar{\mathbf{v}}, \bar{\mathbf{v}}, \bar{\mathbf{v}}].$$
We would like to obtain $\text{ESS}_3(\bar{\mathbf{v}}) = 1$, $\text{ESS}_6(\bar{\mathbf{v}}') = 2$ and $\text{ESS}_9(\bar{\mathbf{v}}'') = 3$, i.e., the ratio $\frac{\text{ESS}_N}{N}$ should be constant:
$$\frac{\text{ESS}_3(\bar{\mathbf{v}})}{3} = \frac{\text{ESS}_6(\bar{\mathbf{v}}')}{6} = \frac{\text{ESS}_9(\bar{\mathbf{v}}'')}{9} = \frac{1}{3}.$$
A more intuitive explanation is as follows. If we have a vector of normalized weights $\bar{\mathbf{v}}' = [0, \tfrac{1}{2}, 0, 0, \tfrac{1}{2}, 0]$, we would like to get $\text{ESS}_6(\bar{\mathbf{v}}') = 2$, since we have 2 effective samples instead of 6 (at most we have 2 effective samples; this is all we can say by looking at the vector $\bar{\mathbf{v}}'$). Now, if we have a vector $\bar{\mathbf{v}}'' = [0, \tfrac{1}{3}, 0, 0, \tfrac{1}{3}, 0, 0, \tfrac{1}{3}, 0]$, we would like to obtain $\text{ESS}_9(\bar{\mathbf{v}}'') = 3$. From another point of view, since $\bar{\mathbf{v}}''$ can be seen as $\bar{\mathbf{v}}'' = \tfrac{1}{3}[\bar{\mathbf{v}}, \bar{\mathbf{v}}, \bar{\mathbf{v}}]$, where $\bar{\mathbf{v}} = [0, 1, 0]$, we would like the $\text{ESS}_N$ formula to be able to count effective samples in the same way in different pieces of a vector. Namely, if $\text{ESS}_3(\bar{\mathbf{v}}) = 1$ and $\bar{\mathbf{v}}''$ is formed by three repetitions of $\bar{\mathbf{v}}$, then we expect to obtain $\text{ESS}_9(\bar{\mathbf{v}}'') = 3$. Any result that differs from these does not make sense from a practical point of view, e.g., within a particle filter or sequential Monte Carlo scheme.

Classification of G-ESS. Given the previous observations, we can provide a classification of the possible G-ESS functions. Table 1 classifies the G-ESS functions into different families, depending on the conditions fulfilled, showing the cases found in different families of ESS measures [34]. Recall that the first three conditions are strictly required in order to be considered a discrepancy measure with respect to the uniform pmf.
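The stability condition C5 can be checked numerically for the classical formula of Eq. (6); the helper name ess_standard and the test vectors are ours:

```python
import numpy as np

def ess_standard(w_bar):
    return 1.0 / np.sum(np.asarray(w_bar)**2)   # Eq. (6)

# Generic check of Eq. (18): repeating M times and rescaling by 1/M (Eq. (17))
# must multiply the ESS by M.
w_bar = np.array([0.7, 0.2, 0.1])
for M in (1, 2, 3, 5):
    v = np.tile(w_bar, M) / M
    assert np.isclose(ess_standard(v), M * ess_standard(w_bar))

# The vectors of the example above: ESS([0,1,0]) = 1, and its two- and
# three-fold repetitions give 2 and 3, so ESS_N / N stays constant at 1/3.
v1 = np.array([0.0, 1.0, 0.0])
print(ess_standard(v1),
      ess_standard(np.tile(v1, 2) / 2),
      ess_standard(np.tile(v1, 3) / 3))
```

For Eq. (6) the check holds for every $M$, since repeating and rescaling divides $\sum_n \bar{w}_n^2$ exactly by $M$.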
All the G-ESS functions which satisfy at least the first four conditions, i.e., from C1 to C4, are called proper functions. If all the conditions are fulfilled, they are called proper and stable. We are interested in this last type of G-ESS expressions, proper and stable.

Remark 1. Only the proper and stable G-ESS functions are useful from a practical point of view, in order to be employed as ESS measures.

Table 1: Classification of G-ESS formulas.

Class of G-ESS           C1   C2   C3   C4   C5
Degenerate               ✓    ✓    ✓    ✗    ✗
Proper                   ✓    ✓    ✓    ✓    ✗
Degenerate and stable    ✓    ✓    ✓    ✗    ✓
Proper and stable        ✓    ✓    ✓    ✓    ✓

In order to clarify the previous remark, and the importance of the five conditions, below we show the relevance of the condition C5 by introducing a family that fulfills the first four conditions but does not satisfy the last one.

Example of a proper but non-stable G-ESS family. Here, as an example, we introduce a G-ESS family such that the formulas in the family are all proper but not stable. This means that all the contained G-ESS expressions can be used as discrepancy measures with respect to the uniform pmf, but are not suitable to be employed as ESS measures (within particle filters or sequential Monte Carlo schemes). We can design a G-ESS family based on the $L_p$ distance between $\bar{\mathbf{w}}$ and $\bar{\mathbf{w}}^*$ which satisfies the first four conditions above. This could be an intuitive idea when we are interested in discrepancy measures. We can in fact define the family
$$\text{ESS-D}_N^{(p)}(\bar{\mathbf{w}}) = \frac{1}{\alpha_p \|\bar{\mathbf{w}} - \bar{\mathbf{w}}^*\|_p + \frac{1}{N}}, \qquad \alpha_p = \frac{N-1}{N\left[\frac{N-1+(N-1)^p}{N^p}\right]^{1/p}}, \qquad (20)$$
where
$$\|\bar{\mathbf{w}} - \bar{\mathbf{w}}^*\|_p = \left(\sum_{n=1}^N \left|\bar{w}_n - \frac{1}{N}\right|^p\right)^{1/p}, \quad p > 0.$$
It is possible to show that $\text{ESS-D}_N^{(p)}(\bar{\mathbf{w}}^*) = N$ and $\text{ESS-D}_N^{(p)}(\bar{\mathbf{w}}^{(j)}) = 1$. Hence, $\text{ESS-D}^{(p)}$ fulfills C1, C2, and C3, and it is also easy to show that it satisfies C4. Hence, this family can be employed as a discrepancy measure with respect to the uniform pmf $\bar{\mathbf{w}}^*$.
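The family in Eq. (20) can be sketched numerically as follows (the function name ess_d is ours); the last line shows the failure of C5 that distinguishes it from the stable formulas:

```python
import numpy as np

def ess_d(w_bar, p=2.0):
    """ESS-D^(p), Eq. (20): a proper but non-stable G-ESS built on the L_p distance."""
    w_bar = np.asarray(w_bar, dtype=float)
    N = len(w_bar)
    alpha_p = (N - 1) / (N * ((N - 1 + (N - 1)**p) / N**p)**(1.0 / p))
    dist = np.sum(np.abs(w_bar - 1.0 / N)**p)**(1.0 / p)
    return 1.0 / (alpha_p * dist + 1.0 / N)

v = np.array([0.0, 1.0, 0.0])
assert np.isclose(ess_d(v), 1.0)                 # minimum 1 at a vertex (C3)
assert np.isclose(ess_d(np.full(5, 0.2)), 5.0)   # maximum N at the uniform pmf (C2)

# C5 fails: repeating v twice (rescaled by 1/2) should give ESS = 2, but does not.
v_rep = np.tile(v, 2) / 2.0
print(ess_d(v_rep))   # about 1.44 instead of the desired value 2
```

The normalization constant $\alpha_p$ is exactly what forces the value 1 at the vertices, but it does not rescale correctly when the vector is repeated, which is the source of the instability.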
However, it is not a good ESS measure, since it does not satisfy C5 (it is not stable). To clarify this point, let us consider some examples comparing $\text{ESS-D}_N^{(p)}$ with $p = 2$ in Eq. (20) with other (stable) ESS formulas in Eqs. (6) and (8). The results are given in Table 2.

Table 2: Examples of ESS measures for different vectors $\bar{\mathbf{w}}$ of dimension $N = 5$. Note that $\text{ESS-D}_N^{(p)}(\bar{\mathbf{w}})$ with $p = 2$ in Eq. (20) is proper but non-stable, whereas the other two ESS formulas, in Eqs. (6)-(8), are both proper and stable.

                              (a)           (b)                (c)                  (d)                      (e)
w̄                      [1,0,0,0,0]  [1/2,1/2,0,0,0]  [1/3,1/3,1/3,0,0]  [1/4,1/4,1/4,1/4,0]  [1/5,1/5,1/5,1/5,1/5] = w̄*
ESS-D(2)_5 — Eq. (20)        1            1.45              1.90                 2.5                      5
1/Σ w̄_n²   — Eq. (6)         1            2                 3                    4                        5
1/max w̄_n  — Eq. (8)         1            2                 3                    4                        5

The formula $\text{ESS-D}_N^{(p)}$ does not provide the desired results, with the exception of cases (a) and (e) (the first and the last scenarios), which are related to conditions C3 and C2. Namely, $\text{ESS-D}_N^{(p)}$ is not a good particle counter, unlike the other two ESS formulas. For instance, in the case $\bar{\mathbf{w}} = [\bar{w}_1 = \tfrac{1}{3}, \bar{w}_2 = \tfrac{1}{3}, \bar{w}_3 = \tfrac{1}{3}, \bar{w}_4 = 0, \bar{w}_5 = 0]$, using just the information of these normalized weights $\bar{w}_n$, we can only assert that we have three effective samples, whereas $\text{ESS-D}_5^{(2)}(\bar{\mathbf{w}})$ returns $\approx 1.90$.

4 Huggins-Roy's ESS family

The Huggins-Roy's ESS family, introduced in [22], is defined as
$$\text{ESS-H}_N^{(\beta)}(\bar{\mathbf{w}}) = \left(\frac{1}{\sum_{n=1}^N \bar{w}_n^\beta}\right)^{\frac{1}{\beta-1}} \qquad (21)$$
$$= \left(\sum_{n=1}^N \bar{w}_n^\beta\right)^{\frac{1}{1-\beta}}, \qquad \beta \geq 0. \qquad (22)$$
Table 3 below shows that the Huggins-Roy's family contains all the most important proper and stable G-ESS functions introduced in the literature. The special cases $\beta = 0$ and $\beta = 1$ lead to two indeterminate expressions that will be resolved and clarified below (when the relationship with the Rényi entropy is shown). We can easily note that $1 \leq \text{ESS-H}_N^{(\beta)}(\bar{\mathbf{w}}) \leq N$ for all $\beta \geq 0$.
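A direct transcription of Eq. (22), with the limit cases $\beta \to 0$, $\beta \to 1$ and $\beta \to \infty$ handled explicitly (the function name is ours):

```python
import numpy as np

def ess_huggins_roy(w_bar, beta):
    """ESS-H^(beta), Eq. (22); the limits beta -> 0, 1, inf are the special cases of Table 3."""
    w = np.asarray(w_bar, dtype=float)
    if beta == 0:
        return np.count_nonzero(w)               # N - N_Z, number of non-zero weights
    if beta == 1:
        p = w[w > 0]
        return np.exp(-np.sum(p * np.log(p)))    # perplexity, Eq. (7)
    if np.isinf(beta):
        return 1.0 / w.max()                     # Eq. (8)
    return np.sum(w**beta)**(1.0 / (1.0 - beta))

# The vectors of Table 2 (k equal weights out of N = 5): a good "particle counter"
# should return k, and ESS-H does so for beta = 2 and beta -> inf.
for k in range(1, 6):
    w = np.zeros(5)
    w[:k] = 1.0 / k
    assert np.isclose(ess_huggins_roy(w, 2), k)
    assert np.isclose(ess_huggins_roy(w, np.inf), k)
```

On vectors with $k$ equal non-zero weights, every member of the family returns exactly $k$, which is the particle-counting behavior required by conditions C2, C3 and C5.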
More generally, it is possible to observe that the conditions C1, C2, C3 and C4 are fulfilled for all $\beta$ (with the exception of $\beta = 0$, which does not satisfy C4). Furthermore, the condition C5 is also satisfied, for all $\beta$, as we show next.

Proof. In order to prove that C5 is satisfied, for simplicity let us consider a vector $\bar{\mathbf{v}} = \frac{1}{2}[\bar{\mathbf{w}}, \bar{\mathbf{w}}]$, defined by repeating twice the vector $\bar{\mathbf{w}}$ (i.e., $M = 2$). In this case, we have
$$\text{ESS-H}_{2N}^{(\beta)}(\bar{\mathbf{v}}) = \left(\frac{1}{2^\beta}\sum_{n=1}^N \bar{w}_n^\beta + \frac{1}{2^\beta}\sum_{n=1}^N \bar{w}_n^\beta\right)^{\frac{1}{1-\beta}} = \left(\frac{1}{2^{\beta-1}}\sum_{n=1}^N \bar{w}_n^\beta\right)^{\frac{1}{1-\beta}} = 2\left(\sum_{n=1}^N \bar{w}_n^\beta\right)^{\frac{1}{1-\beta}} = 2\,\text{ESS-H}_N^{(\beta)}(\bar{\mathbf{w}}), \quad \forall \beta, \qquad (23)$$
which is exactly the condition in Eq. (18). The proof can easily be repeated for any value $M > 2$.

Remark 2. Hence, all G-ESS functions (except for $\beta \to 0$) belonging to the Huggins-Roy's ESS family are proper and stable. For $\beta \to 0$, the corresponding ESS is degenerate and stable. Moreover, some specific cases provided in Table 3 coincide with other proper and stable G-ESS formulas proposed in [34].

Table 3: Relevant special cases contained in the Huggins-Roy's family. They are all proper and stable, except for $N - N_Z$, which is degenerate and stable; here $N_Z$ is the number of zeros in $\bar{\mathbf{w}}$.

β → 0:    $N - N_Z$
β = 1/2:  $\left(\sum_{n=1}^N \sqrt{\bar{w}_n}\right)^2$
β → 1:    $\exp\left(-\sum_{n=1}^N \bar{w}_n \log \bar{w}_n\right)$  — perplexity, Eq. (7) [5, 40]
β = 2:    $\frac{1}{\sum_{n=1}^N \bar{w}_n^2}$  — standard formula, Eq. (6) [26]
β → ∞:    $\frac{1}{\max[\bar{w}_1, \ldots, \bar{w}_N]}$  — Eq. (8) [34]

5 Relationship with the entropy measures

5.1 Relationship with the Rényi entropy

In this section, we show the connection between the Rényi entropy and the Huggins-Roy's family.
The Rényi entropy [9] is defined as
$$R_N^{(\beta)}(\bar{\mathbf{w}}) = \frac{1}{1-\beta}\log\left[\sum_{n=1}^N \bar{w}_n^\beta\right], \quad \beta > 0. \qquad (24)$$
Then, first noting that $\frac{1}{1-\beta}\log\left[\sum_{n=1}^N \bar{w}_n^\beta\right] = \log\left[\sum_{n=1}^N \bar{w}_n^\beta\right]^{\frac{1}{1-\beta}}$, and taking the exponential of both sides of the equation above, we obtain
$$\text{ESS-H}_N^{(\beta)}(\bar{\mathbf{w}}) = \exp\left(R_N^{(\beta)}(\bar{\mathbf{w}})\right) = \left(\sum_{n=1}^N \bar{w}_n^\beta\right)^{\frac{1}{1-\beta}}, \quad \beta > 0. \qquad (25)$$
In ecology, the exponential of the Rényi entropy defines the so-called diversity indices [24]. This means that the Huggins-Roy's family contains and coincides with all the diversity indices derived from the Rényi entropy [9, 24]. See Section 7.1 for further details. Note that, for $\beta = 0$, we have $R_N^{(0)}(\bar{\mathbf{w}}) = \log(N - N_Z)$, where $N_Z = \#\{\bar{w}_n : \bar{w}_n = 0,\ n = 1, \ldots, N\}$ (see [9] for further details), so that $\text{ESS-H}_N^{(0)}(\bar{\mathbf{w}}) = N - N_Z$, as also shown in Table 3. For $\beta = 1$, we have $R_N^{(1)}(\bar{\mathbf{w}}) = -\sum_{n=1}^N \bar{w}_n \log \bar{w}_n$ [9], and then
$$\text{ESS-H}_N^{(1)}(\bar{\mathbf{w}}) = \exp\left(-\sum_{n=1}^N \bar{w}_n \log \bar{w}_n\right), \qquad (26)$$
which is the perplexity in Eq. (7) [5, 40].

5.1.1 Inequalities for the G-ESS within the Huggins-Roy family

One of the advantages of the connection with the Rényi entropy is that we can easily obtain some theoretical results about $\text{ESS-H}_N^{(\beta)}$. For instance, it is well known that [9]
$$R_N^{(0)}(\bar{\mathbf{w}}) \geq R_N^{(1)}(\bar{\mathbf{w}}) \geq R_N^{(2)}(\bar{\mathbf{w}}) \geq \ldots \geq R_N^{(\beta')}(\bar{\mathbf{w}}) \geq \ldots \geq R_N^{(\infty)}(\bar{\mathbf{w}}), \quad \beta' \geq 2.$$
Then, since $\text{ESS-H}_N^{(\beta)}$ is a monotonically increasing function of $R_N^{(\beta)}$, we can also assert
$$\text{ESS-H}_N^{(0)}(\bar{\mathbf{w}}) \geq \text{ESS-H}_N^{(1)}(\bar{\mathbf{w}}) \geq \text{ESS-H}_N^{(2)}(\bar{\mathbf{w}}) \geq \ldots \geq \text{ESS-H}_N^{(\beta')}(\bar{\mathbf{w}}) \geq \ldots \geq \text{ESS-H}_N^{(\infty)}(\bar{\mathbf{w}}). \qquad (27)$$
Namely, we can rewrite
$$\text{ESS-H}_N^{(\infty)}(\bar{\mathbf{w}}) \leq \text{ESS-H}_N^{(\beta)}(\bar{\mathbf{w}}) \leq \text{ESS-H}_N^{(0)}(\bar{\mathbf{w}}), \quad \text{i.e.,} \quad \frac{1}{\max_n \bar{w}_n} \leq \text{ESS-H}_N^{(\beta)}(\bar{\mathbf{w}}) \leq N - N_Z, \quad \beta \geq 0. \qquad (28)$$
Moreover, since from [9] we have $R_N^{(2)}(\bar{\mathbf{w}}) \leq 2R_N^{(\infty)}(\bar{\mathbf{w}})$, by exponentiation we can also write
$$\text{ESS-H}_N^{(2)}(\bar{\mathbf{w}}) \leq \left(\text{ESS-H}_N^{(\infty)}(\bar{\mathbf{w}})\right)^2. \qquad (29)$$

5.2 Relationship with the Tsallis entropy

Another famous entropy family is the so-called Tsallis entropy [46] (also known as the $q$-logarithmic entropy [24]), defined as
$$T_N^{(\alpha)}(\bar{\mathbf{w}}) = \frac{1}{\alpha-1}\left[1 - \sum_{n=1}^N \bar{w}_n^\alpha\right], \quad \alpha > 0. \qquad (30)$$
We can obtain a corresponding G-ESS family based on the Tsallis entropy, after some additional simple operations of translation and scaling, i.e.,
$$\text{ESS-T}_N^{(\alpha)}(\bar{\mathbf{w}}) = \frac{(\alpha-1)(N-1)}{1 - N^{1-\alpha}}\,T_N^{(\alpha)}(\bar{\mathbf{w}}) + 1 \qquad (31)$$
$$= \frac{N-1}{1 - N^{1-\alpha}}\left[1 - \sum_{n=1}^N \bar{w}_n^\alpha\right] + 1, \quad \alpha > 0. \qquad (32)$$
Note that $1 \leq \text{ESS-T}_N^{(\alpha)}(\bar{\mathbf{w}}) \leq N$.

Special cases. For $\alpha \to 0$, we again obtain the degenerate and stable formula $\text{ESS-T}_N^{(0)}(\bar{\mathbf{w}}) = N - N_Z$, where $N_Z = \#\{\bar{w}_n : \bar{w}_n = 0,\ n = 1, \ldots, N\}$. For $\alpha \to \infty$, we have the degenerate expression $\text{ESS-T}_N^{(\infty)}(\bar{\mathbf{w}}) = N$ if $\bar{\mathbf{w}} \neq \bar{\mathbf{w}}^{(j)}$ for all $j \in \{1, \ldots, N\}$, and $\text{ESS-T}_N^{(\infty)}(\bar{\mathbf{w}}) = 1$ if $\bar{\mathbf{w}} = \bar{\mathbf{w}}^{(j)}$ for some $j \in \{1, \ldots, N\}$. Setting $\alpha = 2$, we have
$$\text{ESS-T}_N^{(2)}(\bar{\mathbf{w}}) = N\left(1 - \sum_{n=1}^N \bar{w}_n^2\right) + 1 = N\,\text{Gini-impurity}(\bar{\mathbf{w}}) + 1, \qquad (33)$$
where we have used the definition
$$\text{Gini-impurity}(\bar{\mathbf{w}}) = 1 - \sum_{n=1}^N \bar{w}_n^2, \qquad (34)$$
which is the so-called Gini impurity or Gini's diversity index, also known as the Gini-Simpson index in biodiversity studies, and widely used in machine learning within decision tree algorithms [3, 28]. Moreover, from an ecology point of view, $\text{Gini-impurity}(\bar{\mathbf{w}})$ represents the probability that two individuals chosen at random are of different species. The Gini impurity is associated with the name of Edward H. Simpson, who introduced it as an index of diversity in 1949 [44]. Corrado Gini also used the formula above (hence the name "Gini impurity") in economics, statistics, and demography [7]. It is such a natural quantity that it has been used in many different fields, and it admits an unbiased estimator.
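The ordering in Eq. (27) and the Gini-impurity identity in Eq. (33) can be verified numerically; ess_h and ess_t are our shorthand for $\text{ESS-H}_N^{(\beta)}$ and $\text{ESS-T}_N^{(\alpha)}$, the latter using the scaling that makes $\text{ESS-T}_N^{(2)} = N \cdot \text{Gini-impurity} + 1$:

```python
import numpy as np

def ess_h(w_bar, beta):
    """Huggins-Roy family, Eq. (22), for beta not in {0, 1}."""
    return np.sum(w_bar**beta)**(1.0 / (1.0 - beta))

def ess_t(w_bar, alpha):
    """Tsallis-based G-ESS, Eq. (32), for alpha not in {0, 1}."""
    N = len(w_bar)
    return (N - 1) / (1 - N**(1.0 - alpha)) * (1 - np.sum(w_bar**alpha)) + 1

rng = np.random.default_rng(3)
w = rng.random(20)
w_bar = w / w.sum()

# ESS-H is non-increasing in beta, inherited from the Renyi entropy (Eq. (27)).
vals = [ess_h(w_bar, b) for b in (0.5, 2, 4, 8)]
assert all(a >= b for a, b in zip(vals, vals[1:]))

# alpha = 2 recovers N * Gini-impurity + 1, Eq. (33).
gini = 1 - np.sum(w_bar**2)
assert np.isclose(ess_t(w_bar, 2), 20 * gini + 1)
print(vals, ess_t(w_bar, 2))
```

The monotonicity in $\beta$ also illustrates why the simulation study in the introduction can prefer $\beta > 2$: larger $\beta$ yields systematically smaller, i.e. more conservative, ESS values.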
Despite all these benefits, Gini-impurity($\bar{w}$) is not directly an effective number: it needs an additional translation and scaling, becoming $\text{ESS-T}_N^{(2)}(\bar{w})$. Moreover, the final expression is not stable. It is also interesting to remark that the final form of $\text{ESS-T}_N^{(\alpha)}(\bar{w})$ resembles the G-ESS family $\text{ESS-V}_N^{(r)}(\bar{w})$ introduced in [34],
$$\text{ESS-V}_N^{(r)}(\bar{w}) = \frac{N^{r-1}(N-1)}{1 - N^{r-1}} \sum_{n=1}^N \bar{w}_n^r + \frac{N^r - 1}{N^{r-1} - 1}, \quad r > 0.$$
However, in general the ESS expressions contained in $\text{ESS-V}_N^{(r)}(\bar{w})$ and $\text{ESS-T}_N^{(\alpha)}(\bar{w})$ are not stable. For this reason, in this work we focus mainly on the Huggins-Roy ESS family.

Furthermore, it is also possible to find another transformation, instead of the standard exponential function $\exp(\cdot)$ (as for the Rényi entropy), that converts the Tsallis entropy into the Huggins-Roy ESS family. That is the so-called $q$-exponential function [9]:
$$\exp_{\alpha}(t) = \begin{cases} \left(1 + (1-\alpha)\,t\right)^{1/(1-\alpha)} & \text{if } \alpha \ne 1, \\ \exp(t) & \text{if } \alpha = 1. \end{cases} \qquad (35)$$
After some manipulations, we arrive at
$$\exp_{\alpha}\left(T_N^{(\alpha)}(\bar{w})\right) = \text{ESS-H}_N^{(\alpha)}(\bar{w}). \qquad (36)$$
Hence, this confirms, in a generalized sense, the definition of a diversity index as the "exponential of an entropy" given in Eq. (25) and used in ecology.

6 Other stable G-ESS expressions

All the ESS formulas contained in the Huggins-Roy family are proper and stable, as we have shown in Section 4. The converse statement is not true, i.e., there are other proper and stable G-ESS formulas that are not contained in the Huggins-Roy family. We provide some examples below.

Another degenerate and stable formula. We start with an additional example of a degenerate and stable expression:
$$\text{ESS-Plus}_N(\bar{w}) = N_+ = \#\{\bar{w}_n \ge 1/N,\ n = 1,\ldots,N\}. \qquad (37)$$
It represents the number of normalized weights greater than or equal to $1/N$. This ESS expression is stable but degenerate.
The issue is that $\text{ESS-Plus}_N(\bar{w})$ reaches the minimum value 1 even at points that are not the vertices $\bar{w}^{(j)}$ of the simplex (see Eq. (13)). For instance, with $\bar{w} = [0.8, 0, 0.2]$ we get $\text{ESS-Plus}_3(\bar{w}) = 1$, but we would like to reach the minimum value 1 only at the vertices $\bar{w}^{(1)} = [1, 0, 0]$, $\bar{w}^{(2)} = [0, 1, 0]$ and $\bar{w}^{(3)} = [0, 0, 1]$. However, $\text{ESS-Plus}_N(\bar{w})$ is much more useful than another degenerate and stable formula that we already found in Table 3, i.e., $N - N_Z$. Indeed, $N - N_Z$ is degenerate since it reaches the maximum value $N$ at any $\bar{w}$ that does not contain any zero (instead of only at $\bar{w}^*$). This makes $N - N_Z$ much less useful from a practical point of view, for instance, within a particle filter. By contrast, $\text{ESS-Plus}_N(\bar{w})$ could be perfectly employed within a particle filter, considering it as a more conservative ESS formula with respect to other ESS expressions.

Other proper and stable formulas. Let us define
$$\{\bar{w}_1^+, \ldots, \bar{w}_{N_+}^+\} = \{\text{all } \bar{w}_n \text{ such that } \bar{w}_n \ge 1/N,\ n = 1,\ldots,N\}, \qquad (38)$$
where $N_+$ is given in Eq. (37), i.e., $N_+ = \#\{\bar{w}_1^+, \ldots, \bar{w}_{N_+}^+\}$. Now, it is possible to define a corrected, proper version of ESS-Plus [34], i.e.,
$$\text{ESS-Q}_N(\bar{w}) = -N \sum_{i=1}^{N_+} \bar{w}_i^+ + N_+ + N = N_+ + N\left(1 - \sum_{i=1}^{N_+} \bar{w}_i^+\right) = N_+ + N \sum_{i=1}^{N - N_+} \bar{w}_i^- = N_+ + N\gamma, \qquad (39)$$
where the $\bar{w}_i^-$ are all the normalized weights such that $\bar{w}_i^- < 1/N$, and $\gamma = \sum_{i=1}^{N - N_+} \bar{w}_i^- \le 1$. Note that $\gamma = 0$ in the two extreme cases $\bar{w} = \bar{w}^{(j)}$ and $\bar{w} = \bar{w}^*$, where $\text{ESS-Q}_N(\bar{w}) = N_+ + 0 = N_+$, i.e., we have $\text{ESS-Q}_N(\bar{w}^{(j)}) = 1$ and $\text{ESS-Q}_N(\bar{w}^*) = N$ as expected. In all other scenarios, a portion of the total number of samples (that is, $\gamma N$ with $\gamma \le 1$) is added to $N_+$. The resulting ESS formula is proper and stable. This measure is also related to the $L_1$ distance between two pmfs [34].
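The contrast between ESS-Plus and its corrected version ESS-Q can be reproduced with a few lines of NumPy (a sketch with function names of our choosing), using the example $\bar{w} = [0.8, 0, 0.2]$ from the text:

```python
import numpy as np

def ess_plus(w):
    """Eq. (37): number of normalized weights >= 1/N (stable but degenerate)."""
    w = np.asarray(w, dtype=float)
    return int(np.sum(w >= 1.0 / w.size))

def ess_q(w):
    """Eq. (39): ESS-Q = N_+ + N * gamma, a proper and stable correction."""
    w = np.asarray(w, dtype=float)
    N = w.size
    n_plus = np.sum(w >= 1.0 / N)
    gamma = np.sum(w[w < 1.0 / N])        # mass of the "small" weights
    return n_plus + N * gamma

# the example from the text: ESS-Plus collapses to 1 away from a vertex...
assert ess_plus([0.8, 0.0, 0.2]) == 1
# ...while ESS-Q still credits part of the small-weight mass
assert np.isclose(ess_q([0.8, 0.0, 0.2]), 1 + 3 * 0.2)
# extreme points: vertex -> 1, uniform -> N
assert np.isclose(ess_q([1.0, 0.0, 0.0]), 1.0)
assert np.isclose(ess_q([1/3, 1/3, 1/3]), 3.0)
```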
Another proper and stable ESS expression introduced in the literature is based on the Gini inequality coefficient, widely applied in economics [23, 34]. First of all, we define the non-decreasing sequence of normalized weights as
$$\bar{w}_{(1)} \le \bar{w}_{(2)} \le \ldots \le \bar{w}_{(N)}, \qquad (40)$$
obtained by sorting in ascending order the entries of the vector $\bar{w}$. The Gini inequality coefficient $G(\bar{w})$, introduced in economics for measuring wealth inequality, can be defined as follows [17, 7, 23]:
$$G(\bar{w}) = \frac{2\,s(\bar{w})}{N} - \frac{N+1}{N}, \quad \text{where} \quad s(\bar{w}) = \sum_{n=1}^N n\,\bar{w}_{(n)}. \qquad (41)$$
This is not the unique formulation: there are various equivalent formulations of the Gini coefficient [47, 33]. The corresponding G-ESS function is then given by
$$\text{ESS-Gini}_N(\bar{w}) = -N\,G(\bar{w}) + N = -N\left(\frac{2\,s(\bar{w})}{N} - \frac{N+1}{N}\right) + N = -2\,s(\bar{w}) + 1 + 2N = -2\sum_{n=1}^N n\,\bar{w}_{(n)} + 1 + 2N, \qquad (42)$$
which is proper and stable. It can also be easily shown that $\text{ESS-Gini}_N(\bar{w}^{(j)}) = -2N + 1 + 2N = 1$ for all $j$, and $\text{ESS-Gini}_N(\bar{w}^*) = -2\,\frac{1}{N}\,\frac{N(N+1)}{2} + 1 + 2N = N$.

The fact that some proper and stable ESS expressions do not belong to the Huggins-Roy family shows that there is still room, and need, for further research on this topic. For instance, new proper and stable formulas could be discovered, as we show in the next section.

7 Connections with other research fields: extended range of applications

In the previous section, we have already seen that the connections with the Rényi and Tsallis entropy families reveal relationships with many studies in different fields (e.g., ecology and machine learning). The benefit of creating these bridges between fields is bidirectional: ideas used in other fields can be applied as ESS measures in an IS context and, vice versa, ESS formulas proposed for IS could be employed in other fields.
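A minimal sketch of Eqs. (40)-(42) follows (the function name is ours). It checks the two extreme values derived above and the permutation invariance that any G-ESS should satisfy:

```python
import numpy as np

def ess_gini(w):
    """Eq. (42): ESS-Gini = -2 * sum_n n * w_(n) + 1 + 2N, with the weights
    sorted in ascending order as in Eq. (40)."""
    w = np.sort(np.asarray(w, dtype=float))      # Eq. (40)
    N = w.size
    s = np.sum(np.arange(1, N + 1) * w)          # s(w) of Eq. (41)
    return -2.0 * s + 1.0 + 2.0 * N

# vertex -> 1 and uniform -> N, as derived in the text
assert np.isclose(ess_gini([1.0, 0.0, 0.0, 0.0]), 1.0)
assert np.isclose(ess_gini([0.25, 0.25, 0.25, 0.25]), 4.0)
# invariant to permutations of the weights, thanks to the sorting step
assert np.isclose(ess_gini([0.5, 0.1, 0.4]), ess_gini([0.1, 0.4, 0.5]))
```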
A clear example of this benefit is given in Section 7.3, where we report a proper and stable ESS formula that was originally introduced in political science.

7.1 ESS in ecology

The connection with the Rényi entropy shows that the G-ESS functions of the Huggins-Roy family are also diversity indices [24]. More specifically, the exponential of the Rényi entropy is known in ecology as the Hill number of order $\beta$ [24]. The Hill numbers are among the most important measures of biological diversity. For instance, the Hill number of order 0 corresponds to $\text{ESS-H}_N^{(0)}(\bar{w}) = N - N_Z$, and represents the number of species. This is also called the species richness in ecology, and is often used as a measure of diversity in the popular media and the ecology literature. However, it does not make any distinction between a rare species and a common one. Moreover, $\text{ESS-H}_N^{(0)}$ does not provide any information about the balance between the species involved.

In Section 5.2, we have seen that the formula $1 - \sum_{n=1}^N \bar{w}_n^2$ is called the Gini impurity in machine learning, whereas in ecology it is called the Gini-Simpson index, since Simpson introduced it as an index of diversity [44]. Moreover, since the sum of squares $\sum_{n=1}^N \bar{w}_n^2$ can be interpreted as a measure of concentration (see Section 7.2), the Hill number (diversity) of order 2, $\text{ESS-H}_N^{(2)}(\bar{w})$, is also called the inverse Simpson concentration in ecology [44]. Furthermore, the diversity of order $\infty$, i.e., $\text{ESS-H}_N^{(\infty)}(\bar{w}) = 1/\max_n \bar{w}_n$, is known as the Berger-Parker index in ecology [2]. While the Hill number of order 0 gives rare species the same importance as any other, the diversity of order $\infty$ ignores them and takes into account only the dominant species.
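To make the ecological reading concrete, the sketch below computes the Hill numbers of orders 0, 1, 2 and $\infty$ from a vector of species abundances (the community counts are hypothetical, chosen only to illustrate the profile):

```python
import numpy as np

def hill_number(counts, order):
    """Hill number of a given order: species richness (0), exponential of the
    Shannon entropy (1), inverse Simpson concentration (2), inverse of the
    dominant share (order -> infinity)."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()                      # abundances -> proportions
    if order == 0:
        return np.count_nonzero(p)
    if order == 1:
        nz = p[p > 0]
        return np.exp(-np.sum(nz * np.log(nz)))
    if order == np.inf:
        return 1.0 / np.max(p)
    return np.sum(p ** order) ** (1.0 / (1.0 - order))

counts = [50, 30, 15, 4, 1]              # hypothetical community, 5 species
profile = [hill_number(counts, q) for q in (0, 1, 2, np.inf)]
# diversity decreases with the order: rare species weigh less and less
assert all(a >= b for a, b in zip(profile, profile[1:]))
assert profile[0] == 5                   # species richness counts everyone
assert np.isclose(profile[-1], 2.0)      # only the dominant share (0.5) matters
```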
More generally, the parameter $\beta$ controls the sensitivity of the diversity measure $\text{ESS-H}_N^{(\beta)}$ to rare species, with higher values of $\beta$ corresponding to measures less sensitive to rare species. In other words, $\beta$ reflects the inverse of the importance given to rare species.

7.2 ESS in economics

The ESS indices have also been widely employed (under other names) as metrics of portfolio dispersion and/or concentration. The effective number of positions held in a portfolio is usually measured as
$$\text{ESS}_N(\bar{w}) = \frac{1}{\sum_{n=1}^N \bar{w}_n^2},$$
where the normalized weights $\bar{w}_n$ represent the proportion of market value invested in each security. A high value of $\text{ESS}_N(\bar{w})$ implies a very diversified portfolio (at most, $N$ different equally weighted positions). The formula $1/\sum_{n=1}^N \bar{w}_n^2$ has been shown to be one of the most efficient measures of portfolio diversification. It has also been used as a constraint to force a portfolio to hold a minimum number of effective assets, denoted for instance as $N_{\text{eff}}$ (e.g., $\|\bar{w}\|_2^2 \le N_{\text{eff}}^{-1}$).

Concentration measures. In Section 7.1, we have seen that the Hill numbers coincide with the Huggins-Roy ESS formulas. More generally, the reciprocals of the Hill numbers (hence, the reciprocals of the G-ESS formulas as well) have been used in economics as concentration measures, i.e.,
$$\text{Conc}_N^{(\beta)}(\bar{w}) = \frac{1}{\text{ESS-H}_N^{(\beta)}(\bar{w})} = \left(\sum_{n=1}^N \bar{w}_n^{\beta}\right)^{\frac{1}{\beta - 1}}, \quad \beta > 0. \qquad (43)$$
As an example, we could investigate whether an industry or a market is concentrated in the hands of a small number of large players. Let us assume there are $N$ competing companies in a given industry, each one occupying a portion of the market represented by the normalized weights $\bar{w}_1, \ldots, \bar{w}_N$; then the concentration $1/\text{ESS-H}_N^{(\beta)}(\bar{w})$ is maximized when one company has a monopoly, i.e., when $\bar{w} = \bar{w}^{(j)}$ (the $j$-th company has conquered the whole market, $\bar{w}_j = 1$).
Namely, a concentration index ranges from $1/N$ (in the case of perfect competition) to 1 (in the case of monopoly), where $N$ represents the number of companies in the market. The concentration measure for $\beta = 2$, i.e., $\text{Conc}_N^{(2)}(\bar{w}) = 1/\text{ESS-H}_N^{(2)}(\bar{w}) = \sum_{n=1}^N \bar{w}_n^2$, is known as the Herfindahl-Hirschman index in economics. Finally, in the previous section, we have also seen the application of similar indices for measuring wealth inequality, e.g., using the Gini coefficient [7, 17, 47].

7.3 ESS in political science

In political science, ESS formulas have been used to define the effective number of parties in a political system. More precisely, the authors in [29] proposed, as the effective number of parties, the formula $1/\sum_{n=1}^N \bar{w}_n^2$ (the Hill number of order 2 in ecology), where $N$ is the total number of parties and $\bar{w}_n$ is the proportion of votes of the $n$-th party. An alternative formula was introduced in political science by Grigorii Golosov [18],
$$\text{ESS-Gol}_N(\bar{w}) = \sum_{n=1}^N \frac{1}{1 + \frac{(\max_k \bar{w}_k)^2}{\bar{w}_n} - \bar{w}_n} = \sum_{n=1}^N \frac{\bar{w}_n}{\bar{w}_n + (\max_k \bar{w}_k)^2 - \bar{w}_n^2}, \qquad (44)$$
which is also proper and stable. The value $\max_k \bar{w}_k$ denotes the share of votes of the party that obtained the greatest number of votes. Other alternatives can be found in the literature [37].

7.4 ESS in quantum physics

In quantum physics, there exists a quantity related to the formula $\text{ESS-H}_N^{(2)}(\bar{w}) = 1/\sum_{n=1}^N \bar{w}_n^2$, called the participation ratio (PR), while the corresponding concentration $\text{Conc}_N^{(2)}(\bar{w}) = \sum_{n=1}^N \bar{w}_n^2$ is known as the inverse participation ratio (IPR). For a fully delocalized or spread state, we have the lowest value of the IPR, i.e., $\min(\text{IPR}) = 1/N$. On the other hand, for a fully localized state, we have the highest value of the IPR, i.e., $\max(\text{IPR}) = 1$.
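Golosov's formula (44) is easy to evaluate, using the second form on the right-hand side; the sketch below (function name ours) checks that it is proper at the two extreme points of the simplex:

```python
import numpy as np

def ess_golosov(w):
    """Golosov's effective number of parties, Eq. (44)."""
    w = np.asarray(w, dtype=float)
    w_max = np.max(w)
    nz = w[w > 0]                 # zero-share "parties" contribute nothing
    return np.sum(nz / (nz + w_max ** 2 - nz ** 2))

# proper: vertex (monopoly of votes) -> 1, uniform shares -> N
assert np.isclose(ess_golosov([1.0, 0.0, 0.0]), 1.0)
assert np.isclose(ess_golosov([0.25, 0.25, 0.25, 0.25]), 4.0)
```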
The IPR is also close to the concept of purity, whereas the PR is close to the concept of separability, both employed in quantum mechanics [49]. Moreover, since the purity $P$ of a quantum state is a quantity such that $P \le 1$, another concept naturally arises, namely the state mixedness, defined as the complement of purity, $M = 1 - P$. The quantum state is pure if $P = 1$. Figure 2 summarizes the main nomenclature described so far.

7.5 Application to model selection as effective number of components

Model selection is a fundamental task in statistics and machine learning. An interesting scenario arises when we have a family of nested models, where the model complexity can change since the number of parameters can vary (i.e., the dimension of the vector of parameters grows, building more complex models). The dimension of the vector of parameters is itself an object of inference. This is the case of order selection in polynomial regression problems or autoregressive schemes, variable selection, clustering, and dimension reduction, just to name a few. In the literature, cross-validation (CV) techniques [3] and information criteria [21, 35, 43, 45] are generally the procedures used to handle this problem. More recently, other approaches based on geometric considerations have also been proposed in the literature, such as the automatic detection of an "elbow" or "knee-point" in a non-increasing curve describing a metric of performance of the model versus its complexity [38, 39, 48, 25].

Figure 2: Graphical summary of the main nomenclature in different fields.

An effective number of variables/features (ENV) has also been proposed [36]. The ENV index is inspired by the concept of maximum area-under-the-curve (AUC) in receiver operating characteristic (ROC) curves [20] and by the Gini inequality index, described in Section 6 and mentioned in Section 7.2 [23].
In the variable selection scenario, the ENV index is given by
$$I_{\text{ENV}} = 1 + \frac{2}{V(0)} \sum_{k=1}^{N-1} V(k), \quad \text{for } V(0) \ne 0 \text{ and } V(N) = 0, \qquad (45)$$
where $V(k)$ is a non-increasing error curve, e.g., the mean square error (MSE), for a model that uses only $k \le N$ input variables (instead of all the $N$ possible variables). By construction, it is always possible to have $V(N) = 0$ (by a simple translation). It is possible to show that $1 \le I_{\text{ENV}} \le N$. The ENV index can also be defined for a non-decreasing curve by the alternative definition
$$I_{\text{ENV}} = 1 + \frac{2}{V(N)} \sum_{k=1}^{N-1} V(k), \quad \text{for } V(N) \ne 0 \text{ and } V(0) = 0. \qquad (46)$$
Thus, we can convert the ENV index into an ESS formula by building the curve $V(k)$ as follows:

• Sort the normalized weights in ascending order, $\bar{w}_{(1)} \le \bar{w}_{(2)} \le \ldots \le \bar{w}_{(N)}$.

• Build a non-decreasing curve $V(k)$, as in (46), following the recursion
$$V(k) = \sum_{i=1}^k \bar{w}_{(i)} = V(k-1) + \bar{w}_{(k)}, \qquad (47)$$
starting with $V(0) = 0$. Note that we always have $V(N) = 1$.

The corresponding ESS formula is
$$\text{ESS-ENV}_N(\bar{w}) = 1 + 2 \sum_{k=1}^{N-1} \sum_{i=1}^k \bar{w}_{(i)}. \qquad (48)$$
Note that $\text{ESS-ENV}_N(\bar{w}^{(j)}) = 1 + 0 = 1$ for all $j$, and
$$\text{ESS-ENV}_N(\bar{w}^*) = 1 + \frac{2}{N} \sum_{k=1}^{N-1} k = 1 + \frac{2}{N}\,\frac{(N-1)N}{2} = 1 + N - 1 = N. \qquad (49)$$
Remark 3. It is possible to show that $\text{ESS-ENV}_N(\bar{w})$ is proper and stable. Furthermore, it coincides with $\text{ESS-Gini}_N(\bar{w})$ in Eq. (42), i.e., $\text{ESS-ENV}_N(\bar{w}) = \text{ESS-Gini}_N(\bar{w})$. Recall that there exist different formulations of the Gini coefficient [47]; the closest one in this framework is related to the Lorenz curve [33].

Remark 4. This section opens the possibility of applying the ESS formulas as effective numbers of components in model selection problems.
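Remarks 3 and 4 can both be checked numerically. The sketch below builds ESS-ENV from the cumulative curve of Eq. (47), verifies that it coincides with ESS-Gini of Eq. (42) on random weight vectors, and then applies the same idea to a hypothetical error curve (the curve values are ours, purely illustrative):

```python
import numpy as np

def ess_env(w):
    """Eq. (48): 1 + 2 * sum_{k=1}^{N-1} V(k), with V(k) the cumulative sum
    of the ascending-sorted weights (Eq. (47))."""
    w = np.sort(np.asarray(w, dtype=float))
    V = np.cumsum(w)                       # V(1), ..., V(N); here V(N) = 1
    return 1.0 + 2.0 * np.sum(V[:-1])      # sum over k = 1, ..., N-1

def ess_gini(w):
    """Eq. (42), for comparison."""
    w = np.sort(np.asarray(w, dtype=float))
    N = w.size
    return -2.0 * np.sum(np.arange(1, N + 1) * w) + 1.0 + 2.0 * N

rng = np.random.default_rng(0)
for _ in range(5):
    w = rng.random(10)
    w /= w.sum()
    assert np.isclose(ess_env(w), ess_gini(w))   # Remark 3
# the extreme points, as in Eq. (49)
assert np.isclose(ess_env([1.0, 0.0, 0.0, 0.0]), 1.0)
assert np.isclose(ess_env([0.25, 0.25, 0.25, 0.25]), 4.0)

# Remark 4 (sketch): turn a non-increasing error curve into weights via the
# differences d_k = V(k-1) - V(k) of Eq. (50), then reuse any G-ESS formula
V = np.array([100.0, 40.0, 15.0, 5.0, 4.0, 3.5, 3.2, 3.0])  # hypothetical
d = V[:-1] - V[1:]
w_mod = d / d.sum()
n_eff = ess_gini(w_mod)
assert 1.0 <= n_eff <= w_mod.size
```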
Indeed, given a non-increasing error curve $V(k)$, i.e., $V(k-1) \ge V(k)$, we can build the normalized weights as
$$d_k = V(k-1) - V(k), \qquad \bar{w}_k = \frac{d_k}{\sum_{i=1}^N d_i}, \qquad (50)$$
for all $k = 1, \ldots, N$. Then an ESS formula can be applied to the vector $\bar{w} = [\bar{w}_1, \ldots, \bar{w}_N]$.

8 Numerical experiments

8.1 Analyzing the Huggins-Roy family

Since all the ESS functions in the Huggins-Roy family are proper and stable, and since this family contains the main relevant formulas, we focus the numerical experiments on this family. First of all, we recall the theoretical definition of the ESS in Eq. (5),
$$\text{ESS}_{\text{teo}}(h) = N\,\frac{\text{var}_{\pi}[\widehat{I}]}{\text{var}_q[\widetilde{I}]}, \qquad (51)$$
where, for simplicity, we consider a scalar $x \in \mathbb{R}$ and the integrand $h(x) = x$ (in the definition above, we have made explicit the dependence on the function $h$). Namely, $\widehat{I}$ and $\widetilde{I}$ are estimators of the expected value of a random variable $X$ with target pdf $\bar{\pi}(x)$ (defined below). In this numerical example, we compute the theoretical definition $\text{ESS}_{\text{teo}}$ approximately via Monte Carlo, and compare it with the G-ESS functions $\text{ESS-H}_N^{(\beta)}$. More specifically, we consider a univariate standard Gaussian density as target pdf,
$$\bar{\pi}(x) = \mathcal{N}(x; 0, 1), \qquad (52)$$
and a Gaussian proposal pdf,
$$q(x) = \mathcal{N}(x; \mu_p, \sigma_p^2), \qquad (53)$$
with mean $\mu_p$ and variance $\sigma_p^2$. In all the experiments, we consider $N = 1000$.

8.1.1 Varying the proposal mean $\mu_p$

In a first analysis, we keep $\sigma_p = 1$ fixed and vary $\mu_p \in [0, 2]$. Figures 3(a)-3(b) depict two scenarios in this experimental setup, corresponding to the two specific values $\mu_p = 0.5$ and $\mu_p = 1.5$. Clearly, for $\mu_p = 0$ we have the ideal Monte Carlo case, $q(x) \equiv \bar{\pi}(x)$. As $\mu_p$ increases, the proposal becomes more different from $\bar{\pi}$. We recall that $N = 1000$.
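A single run of this setup can be sketched as follows: draw from the proposal of Eq. (53), form the normalized importance weights for the target of Eq. (52), and evaluate ESS-H for a few values of $\beta$ (function names are ours; the log-weight computation drops constants shared by target and proposal):

```python
import numpy as np

def gaussian_is_weights(N, mu_p, sig_p, rng):
    """Normalized IS weights for target N(0,1) and proposal N(mu_p, sig_p^2),
    as in Eqs. (52)-(53)."""
    x = rng.normal(mu_p, sig_p, size=N)
    log_target = -0.5 * x ** 2
    log_prop = -0.5 * ((x - mu_p) / sig_p) ** 2 - np.log(sig_p)
    log_w = log_target - log_prop
    w = np.exp(log_w - np.max(log_w))      # stabilized exponentiation
    return w / w.sum()

def ess_h(w, beta):
    """Eq. (25) for finite beta > 0, beta != 1."""
    return np.sum(w ** beta) ** (1.0 / (1.0 - beta))

rng = np.random.default_rng(1)
N = 1000
for mu_p in (0.0, 0.5, 1.5):
    w = gaussian_is_weights(N, mu_p, 1.0, rng)
    e2, e4, einf = ess_h(w, 2), ess_h(w, 4), 1.0 / np.max(w)
    # ordering from Eq. (27): the higher beta, the lower ESS-H
    assert einf - 1e-9 <= e4 <= e2 + 1e-9 <= N + 1e-6
```

For $\mu_p = 0$ the weights are all equal and every ESS-H value collapses to $N$, the ideal Monte Carlo case mentioned above.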
Figure 4(a) shows the theoretical $\text{ESS}_{\text{teo}}/N$ curve (solid line), $\text{ESS-H}_N^{(2)}/N$ (circles) and $\text{ESS-H}_N^{(\infty)}/N$ (squares), averaged over $10^5$ independent runs. Note that $1/N \le \text{ESS}/N \le 1$.

Optimal linear combination of $\text{ESS-H}_N^{(2)}$ and $\text{ESS-H}_N^{(\infty)}$. The functions $\text{ESS-H}_N^{(2)}$ and $\text{ESS-H}_N^{(\infty)}$ are the most used and suggested formulas in different studies [22, 34]. Moreover, at least in this simulation scenario, they seem to play the roles of upper bound and lower bound of the true value, as shown by Figure 4(a). For this reason, we also consider the linear combination of the G-ESS formulas $\text{ESS-H}_N^{(2)}$ and $\text{ESS-H}_N^{(\infty)}$,
$$\text{Comb-ESS}_N(\bar{w}) = a_1\,\text{ESS-H}_N^{(2)}(\bar{w}) + a_2\,\text{ESS-H}_N^{(\infty)}(\bar{w}). \qquad (54)$$
This example suggests the use of
$$a_1 = 0.6245, \qquad a_2 = 0.4289, \qquad (55)$$
obtained by a least squares (LS) regression, in order to obtain an expression $\text{Comb-ESS}_N(\bar{w})$ as close as possible to the theoretical ESS curve.

Optimal $\beta$ for $\text{ESS-H}_N^{(\beta)}(\bar{w})$. Furthermore, we have computed the curves (as functions of $\mu_p$) of $\text{ESS-H}_N^{(\beta)}(\bar{w})$ for different values of $\beta$, considering a thin grid $\mathcal{G}$ of $\beta$ values from 0.2 to 50 with a step of 0.01. We consider the $L_1$ distance between each $\text{ESS-H}_N^{(\beta)}(\bar{w})$ curve and the theoretical ESS curve, i.e., $|\text{ESS-H}_N^{(\beta)} - \text{ESS}_{\text{teo}}|$, and compute
$$\beta^* = \arg\min_{\beta \in \mathcal{G}} |\text{ESS-H}_N^{(\beta)} - \text{ESS}_{\text{teo}}|. \qquad (56)$$
With this procedure, we obtain $\beta^* \approx 4$ (recall that these curves are functions of $\mu_p$ and are averaged over $10^5$ independent runs).

Discussion of the results. Figure 4(b) shows the curves of the ESS rates corresponding to the theoretical ESS curve (solid line), the best linear combination corresponding to Eqs. (54)-(55) (squares), and the curve corresponding to $\text{ESS-H}_N^{(\beta^*)}$ (dashed line). First of all, we can note that the linear combination can return values greater than 1 (recall that we are considering $\text{ESS}/N$).
Moreover, we can see that the curve corresponding to $\text{ESS-H}_N^{(4)}(\bar{w})$ fits particularly well in this numerical setup, providing a very close approximation to the theoretical ESS curve. Observe that the approximation provided by $\text{ESS-H}_N^{(4)}$ is virtually perfect for $\mu_p \le 1$. Hence, in this kind of scenario, we would suggest the use of the expression
$$\text{ESS-H}_N^{(4)}(\bar{w}) = \left(\frac{1}{\sum_{n=1}^N \bar{w}_n^4}\right)^{\frac{1}{3}}. \qquad (57)$$

Figure 3: Target and proposal pdfs: (a)-(b) with $\mu_p \in \{0.5, 1.5\}$ and both variances set to 1; (c) with $\mu_p = 0$ and $\sigma_p \in \{0.5, 0.8\}$.

8.1.2 Varying the proposal standard deviation $\sigma_p$

Now, we keep $\mu_p = 0$ fixed and vary the standard deviation of the proposal, $\sigma_p \in [0.5, 1]$. Figure 3(c) depicts the target density and the proposal density for the two specific values $\sigma_p = 0.5$ and $\sigma_p = 0.8$ used in this experimental setup. We recall that $N = 1000$ and that the results have been averaged over $10^5$ independent runs. In Figure 5(a), we can observe the results of $\text{ESS}_{\text{teo}}/N$ versus $\sigma_p$ (solid line), jointly with the curves $\text{ESS-H}_N^{(2)}/N$ (circles) and $\text{ESS-H}_N^{(\infty)}/N$ (squares).

Optimal linear combination of $\text{ESS-H}_N^{(2)}$ and $\text{ESS-H}_N^{(\infty)}$. Since the formulas $\text{ESS-H}_N^{(2)}$ and $\text{ESS-H}_N^{(\infty)}$ are the most used in practice, we again consider their linear combination,
$$\text{Comb-ESS}_N(\bar{w}) = a_1\,\text{ESS-H}_N^{(2)}(\bar{w}) + a_2\,\text{ESS-H}_N^{(\infty)}(\bar{w}). \qquad (58)$$

Figure 4: Ratio of ESS values over $N$ (with $N = 1000$) versus $\mu_p$.
The curve corresponding to the theoretical ESS value, i.e., $\text{ESS}_{\text{teo}}/N$, is shown as a black solid line in both panels. Panel (a) also depicts the curves $\text{ESS-H}_N^{(2)}/N$ (circles) and $\text{ESS-H}_N^{(\infty)}/N$ (squares). Panel (b) shows the curve $\text{ESS-H}_N^{(4)}/N$ (dashed line) and the linear combination in Eqs. (54)-(55) (squares). The approximation provided by $\text{ESS-H}_N^{(4)}$ is virtually perfect for $\mu_p \le 1$.

In this scenario, the LS solution gives
$$a_1 = 0.2715, \qquad a_2 = 0.8483, \qquad (59)$$
hence $\text{ESS-H}_N^{(\infty)}$ takes more importance in this setup. Figure 5(b) shows the curve corresponding to $\text{Comb-ESS}_N(\bar{w})/N$ with a dashed line and green squares.

Optimal $\beta$ for $\text{ESS-H}_N^{(\beta)}(\bar{w})$. Furthermore, we have computed the curves (as functions of $\sigma_p$) of $\text{ESS-H}_N^{(\beta)}(\bar{w})$ for different values of $\beta$, considering a grid of values of $\beta$ denoted as $\mathcal{G}$. We consider the $L_1$ distance between each $\text{ESS-H}_N^{(\beta)}(\bar{w})$ curve and the theoretical ESS curve, and compute
$$\beta^* = \arg\min_{\beta \in \mathcal{G}} |\text{ESS-H}_N^{(\beta)} - \text{ESS}_{\text{teo}}|. \qquad (60)$$
In this scenario, we obtain $\beta^* \approx 7.6$. The corresponding curve is depicted in Figure 5(b) with a dashed line and red triangles. We can see that we obtain a very good approximation of $\text{ESS}_{\text{teo}}/N$, although slightly worse than in the case described in the previous section. Moreover, here the optimal value is $\beta^* \approx 7.6$ whereas, in the previous section, it was $\beta^* \approx 4$.

Figure 5: Ratio of ESS values over $N$ (with $N = 1000$) versus $\sigma_p$. The curve corresponding to the theoretical ESS value, i.e., $\text{ESS}_{\text{teo}}/N$, is shown as a black solid line in both panels. Panel (a) also depicts the curves $\text{ESS-H}_N^{(2)}/N$ (circles) and $\text{ESS-H}_N^{(\infty)}/N$ (squares). Panel (b) shows the curves $\text{ESS-H}_N^{(7.6)}/N$ (dashed line) and the linear combination in Eq. (59) (squares).
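The LS fit behind the coefficients in Eqs. (55) and (59) can be reproduced schematically. The curves below are synthetic stand-ins, not the paper's Monte Carlo estimates; the point is only the mechanics of fitting $a_1, a_2$ over a grid of proposal parameters:

```python
import numpy as np

def fit_comb_ess(ess2_curve, essinf_curve, ess_ref_curve):
    """Least-squares fit of the coefficients a1, a2 of Eqs. (54)/(58),
    matching a reference curve (e.g., the Monte Carlo estimate of ESS_teo)."""
    A = np.column_stack([ess2_curve, essinf_curve])
    coeffs, *_ = np.linalg.lstsq(A, ess_ref_curve, rcond=None)
    return coeffs

# toy curves standing in for the averaged ESS-H curves over the parameter grid
ess2 = np.array([1.0, 0.8, 0.5, 0.3])
essinf = np.array([0.9, 0.6, 0.4, 0.2])
reference = 0.6 * ess2 + 0.4 * essinf        # known ground-truth combination
a1, a2 = fit_comb_ess(ess2, essinf, reference)
assert np.isclose(a1, 0.6) and np.isclose(a2, 0.4)
```

The same least-squares machinery, with the averaged curves of the experiments, yields the coefficients reported in the text.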
Discussion of the results. Figure 5(b) shows the curves of the ESS rates corresponding to the theoretical ESS curve (solid line), the best linear combination corresponding to Eqs. (58)-(59) (green squares), and the curve corresponding to $\text{ESS-H}_N^{(\beta^*)}$ (red triangles). Again, the linear combination can return values greater than 1 (recall that we are considering $\text{ESS}/N$). This behavior could be exploited in future works, since $\text{ESS}_{\text{teo}}/N$ can actually exceed 1 (see [14, Section 3.3]). Moreover, we can see that $\text{ESS-H}_N^{(7.6)}(\bar{w})$ performs particularly well in this scenario, providing a close approximation to the theoretical ESS curve. Hence, in this setup, we would suggest the use of $\text{ESS-H}_N^{(7.6)}(\bar{w})$. For simplicity in computation and comparison, one could consider the closest integer and use $\beta = 8$,
$$\text{ESS-H}_N^{(8)}(\bar{w}) = \left(\frac{1}{\sum_{n=1}^N \bar{w}_n^8}\right)^{\frac{1}{7}}. \qquad (61)$$
Finally, it is important to remark that, even if the optimal $\beta^* \approx 7.6$ (or 8) differs from the value $\beta^* \approx 4$ suggested in the previous section, both values differ from 2 (which corresponds to the typical formula employed in the literature) and both values are bigger than 2. The expression with $\beta \to \infty$, i.e., $\text{ESS-H}_N^{(\infty)} = 1/\max_n \bar{w}_n$, seems suitable as a lower bound for the theoretical value $\text{ESS}_{\text{teo}}$ in both setups. These considerations can be relevant clues for future applications and studies.

8.2 Application to variable selection in a regression problem with real data

Finding connections with other fields creates opportunities for new applications of the ESS formulas. As described in Section 7.5, the ESS can be applied in a feature selection problem to find the effective number of components. In this section, we provide an example of this application with a real dataset. Let us consider a regression problem, where we observe a dataset of $N$ pairs $\{x_n, y_n\}_{n=1}^N$, with each input vector $x_n = [x_{n,1}, \ldots$
$, x_{n,K}]$ formed by $K$ variables, and the outputs $y_n$ are scalar values [42]. We consider the case $K \le N$ and assume a linear observation model,
$$y_n = \theta_0 + \theta_1 x_{n,1} + \theta_2 x_{n,2} + \ldots + \theta_K x_{n,K} + \epsilon_n, \qquad (62)$$
where $\epsilon_n$ is Gaussian noise with zero mean and variance $\sigma_\epsilon^2$, i.e., $\epsilon_n \sim \mathcal{N}(\epsilon; 0, \sigma_\epsilon^2)$. More specifically, in this real dataset [42, 41, 15], we have $K = 122$ features and $N = 1214$ data points $x_i$. We focus on the first of the two outputs in the dataset (called "arousal"). We set $V(k) = -2\log(\ell_{\max})$ with $\ell_{\max} = \max_{\theta} p(y|\theta_k)$ and $k \le K$, after ranking the 122 variables (see [42]), where the likelihood function $p(y|\theta_k)$ is induced by Eq. (62).

In order to find the effective number of variables $N_{\text{eff}} \le K = 122$, we compare with different well-known information criteria (AIC, BIC and HQIC; considering the cost function $C(k) = V(k) + \lambda k$, each information criterion suggests the use of a different parameter $\lambda$) and other methods provided in the literature. For the spectral information criterion (SIC), we test two confidence interval parameters, 95% and 99%. We also test different stable ESS formulas, obtaining the weights as in Eq. (50). We test the expressions in the Huggins-Roy family, $\text{ESS-H}_N^{(\beta)}$, with $\beta \to 1$, $\beta = 2$, $\beta \to \infty$, and the other stable formulas given in Eqs. (37), (39), (42), and (44). All the results are rounded to the closest integer. The results provided by each method are given in Table 4.

Table 4: Results in the variable selection example with a real dataset.

Scheme      | AIC  | BIC  | HQIC | UAED | SIC-95 | SIC-99 | ENV
N_eff       | 44   | 17   | 41   | 11   | 7      | 17     | 13
Ref.        | [45] | [43] | [21] | [38] | [35]   | [35]   | [36]

ESS formula | β→1  | β=2  | β→∞  | Plus | Q      | Gini   | Gol
N_eff       | 10   | 5    | 3    | 11   | 24     | 11     | 4
Eq.         | (7)  | (6)  | (8)  | (37) | (39)   | (42)   | (44)

After an exhaustive analysis, the authors in [42, Section 4-C] suggest that there are 7 very relevant variables (level 1 of [42, Section 4-C]), another 7 relevant variables (level 2), and another 2 variables at a third level of importance; hence, overall, 16 variables among the very relevant, relevant, and important ones (16 out of 122 possible features). The minimum value in Table 4 is 3, provided by $\text{ESS-H}_N^{(\infty)}$, whereas the maximum value is 44, given by AIC. These values and the rest of the results in Table 4 are in line with the conclusions in [42]. More specifically, the results given by SIC-99, BIC, UAED, ENV, the perplexity $\text{ESS-H}_N^{(1)}$, ESS-Plus and ESS-Gini satisfy $10 \le N_{\text{eff}} \le 17$, and are close to the results of the analysis in [42]. Hence, in this experiment, some ESS formulas, such as the perplexity $\text{ESS-H}_N^{(1)}$, ESS-Plus and ESS-Gini, seem to provide good performance as effective numbers of components in model selection.

9 Conclusions

In this work, we have analyzed alternative effective sample size (ESS) measures for Monte Carlo algorithms based on importance sampling techniques. We have highlighted the connection between the practical ESS formulas used in the literature and entropy families [9]. We have shown that all the ESS functions included in the Huggins-Roy ESS family fulfill all the required theoretical conditions described in [34], and we have also highlighted the relationship of this family with the Rényi entropy [9]. We have also shown the application of the Gini impurity index as an ESS formula and its connection to the Tsallis entropy.
Furthermore, we have studied the performance of different Huggins-Roy ESS formulas by numerical simulations, also introducing an optimal linear combination of the most promising ESS indices. In two numerical examples, we have obtained the best ESS approximations within the Huggins-Roy family in two different setups,
$$\text{ESS} = \left(\frac{1}{\sum_{n=1}^N \bar{w}_n^4}\right)^{1/3} \quad \text{and} \quad \text{ESS} = \left(\frac{1}{\sum_{n=1}^N \bar{w}_n^8}\right)^{1/7}.$$
These formulas provide a good approximation (and, in the first case, almost a perfect match) of the theoretical ESS values in the two considered experimental scenarios. Moreover, the expression $\text{ESS} = 1/\max_n \bar{w}_n$, which corresponds to $\beta \to \infty$, also provides good performance in some specific cases (and plays the role of a lower bound of the ESS measures in other cases). All these considerations suggest that the use of $\beta > 2$ can be more adequate in practical applications, e.g., in order to fight sample degeneracy and impoverishment within a particle filtering algorithm.

The relationship with the entropy families has also clarified the connections with other fields: possible applications in ecology, economics, political science, and machine learning have been discussed. The application of the ESS expressions as the effective number of components in model selection seems to be promising, but should be further investigated and tested. Moreover, the construction of these connections with other fields can also yield novel contributions in the IS context. As a final consideration, finding a novel and broader family that contains all the stable ESS formulas (including those that do not belong to the Huggins-Roy family) could be an object of future research.
Acknowledgement

The work was partially supported by the project Starting Grant for Rttb, BA-GRAPH "Efficient Bayesian inference for graph-supported data", of the University of Catania (UPB-28722052144), and by the project LikeFree-BA-GRAPH funded by "PIAno di inCEntivi per la RIcerca di Ateneo 2024/2026" of the University of Catania (UPB-28722052159), Italy.

References

[1] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2):174-188, February 2002.
[2] W. H. Berger and F. L. Parker. Diversity of planktonic foraminifera in deep-sea sediments. Science (New York, N.Y.), 168(3937):1345-1347, 1970.
[3] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 1st edition, 2007.
[4] M. F. Bugallo, L. Martino, and J. Corander. Adaptive importance sampling in signal processing. Digital Signal Processing, 47:36-49, 2015.
[5] O. Cappé, R. Douc, A. Guillin, J. M. Marin, and C. P. Robert. Adaptive importance sampling in general mixture classes. Statistics and Computing, 18:447-459, 2008.
[6] O. Cappé, A. Guillin, J. M. Marin, and C. P. Robert. Population Monte Carlo. Journal of Computational and Graphical Statistics, 13(4):907-929, 2004.
[7] L. Ceriani and P. Verme. The origins of the Gini index: extracts from Variabilità e Mutabilità (1912) by Corrado Gini. The Journal of Economic Inequality, 10(3):421-443, 2012.
[8] N. Chopin. A sequential particle filter for static models. Biometrika, 89:539-552, 2002.
[9] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, New York (USA), 1991.
[10] P. M. Djurić, J. H. Kotecha, J. Zhang, Y. Huang, T. Ghirmai, M. F. Bugallo, and J. Míguez. Particle filtering. IEEE Signal Processing Magazine, 20(5):19-38, September 2003.
[11] A. Doucet, N.
de F reitas, and N. Gordon, editors. Se quential Monte Carlo Metho ds in Pr actic e . Springer, New Y ork, 2001. [12] A. Doucet, S. Go dsill, and C. Andrieu. On sequential Mon te Carlo Sampling metho ds for Ba yesian filtering. Statistics and Computing , 10(3):197–208, 2000. [13] A. Doucet and A. M. Johansen. A tutorial on particle filtering and smo othing: fifteen years later. te chnic al r ep ort , 2008. [14] V. Elvira, L. Martino, and C. P . Rob ert. Rethinking the Effective Sample Size. International Statistic al R eview , 90(3):525–550, 2022. [15] J. F an, M. Thorogo o d, and P . P asquier. Emo-soundscap es: A dataset for soundscap e emotion recognition. In 2017 Seventh international c onfer enc e on affe ctive c omputing and intel ligent inter action (ACII) , pages 196–201, 2017. 27 [16] D. Gamerman and H. F. Lop es. Markov Chain Monte Carlo: Sto chastic Simulation for Bayesian Infer enc e, . Chapman & Hall/CR C T exts in Statistical Science, 2006. [17] C. Gini. Measuremen t of inequality and incomes. The Ec onomic Journal , 31:124–126, 1921. [18] G. V. Goloso v. The effective num b er of parties: A new approach. Party Politics , 16(2):171– 192, 2010. [19] N. Gordon, D. Salmond, and A. F. M. Smith. No vel approac h to nonlinear and non-Gaussian Ba yesian state estimation. IEE Pr o c e e dings-F R adar and Signal Pr o c essing , 140:107–113, 1993. [20] J A Hanley and B J McNeil. The meaning and use of the area under a receiver op erating c haracteristic (ROC) curve. R adiolo gy , 143(1):29–36, 1982. [21] E. J. Hannan and B. G. Quinn. The determination of the order of an autoregression. Journal of the R oyal Statistic al So ciety. Series B (Metho dolo gic al) , 41(2):190–195, 1979. [22] J. H Huggins and D. M Roy . Con v ergence of sequen tial Mon te Carlo based sampling metho ds. arXiv:1503.00966 , 2015. [23] S. Inoua. Beware the Gini index! a new inequalit y measure. pr eprint arXiv:2110.01741 , pages 1–26, 2021. [24] L. Jost. En tropy and div ersit y . 
Oikos , 113(2):363–375, 2006. [25] D. Kaplan. Knee p oint, 2024. MA TLAB Cen tral File Exchange. [26] A. Kong. A note on imp ortance sampling using standardized weigh ts. T e chnic al R ep ort 348, Dep artment of Statistics, University of Chic ago , 1992. [27] A. Kong, J. S. Liu, and W. H. W ong. Sequen tial imputations and Bay esian missing data problems. Journal of the Americ an Statistic al Asso ciation , 89(425):278–288, 1994. [28] M. Krzywinski and N. Altman. Classification and regression trees. Natur e Metho ds , 14(8):757– 758, 2017. [29] M. Laakso and T aagep era R. ”effectiv e” n um b er of parties: A measure with application to w est europ e. Comp ar ative Politic al Studies , 12:3–27, 1979. [30] F. Liang, C. Liu, and R. Caroll. A dvanc e d Markov Chain Monte Carlo Metho ds: L e arning fr om Past Samples . Wiley Series in Computational Statistics, England, 2010. [31] J. S. Liu. Monte Carlo Str ate gies in Scientific Computing . Springer, 2004. [32] F. Llorente and L. Martino. Optimalit y in imp ortance sampling: a gentle survey . arXiv:2502.07396 , pages 1–40, 2025. 28 [33] M. O. Lorenz. Metho ds of measuring the concen tration of wealth. Public ations of the A meric an Statistic al Asso ciation , 9(70):209–219, 1905. [34] L. Martino, V. Elvira, and F. Louzada. Effective sample size for imp ortance sampling based on discrepancy measures. Signal Pr o c essing , 131:386–401, 2017. [35] L. Martino, R. San Millan-Castillo, and E. Morgado. Sp ectral information criterion for automatic elb o w detection. Exp ert Systems with Applic ations , 231:120705, 2023. [36] L. Martino, E. Morgado, and R. San Millan Castillo. An index of effectiv e n um b er of v ariables for uncertaint y and reliability analysis in mo del selection problems. Signal Pr o c essing , 227:109735, 2025. [37] J. Molinar. Counting the n um b er of parties: An alternativ e index. The A m eric an Politic al Scienc e R eview , 85(4):1383–1391, 1991. [38] E. Morgado, L. Martino, and R. 
San Millan-Castillo. Universal and automatic elb o w detection for learning the effectiv e num b er of comp onents in mo del selection problems. Digital Signal Pr o c essing , 140:104103, 2023. [39] A. J. On umanyi, D. N. Molokomme, S. J. Isaac, and A. M. Abu-Mahfouz. Auto elb ow: An automatic elbow detection metho d for estimating the num b er of clusters in a dataset. Applie d Scienc es , 12(15), 2022. [40] C. P . Rob ert and G. Casella. Intr o ducing Monte Carlo Metho ds with R . Springer, 2010. [41] R. San Mill´ an-Castillo, L. Martino, and E. Morgado. A v ariable selection analysis for soundscap e emotion mo delling using decision tree regression and mo dern information criteria. IEEE A c c ess , 2024. [42] R. San Mill´ an-Castillo, L. Martino, E. Morgado, and F. Llorente. An exhaustive v ariable selection study for linear mo dels of soundscape emotions: Rankings and Gibbs analysis. IEEE/A CM T r ansactions on A udio, Sp e e ch, and L anguage Pr o c essing , 30:2460–2474, 2022. [43] G. Sc h w arz et al. Estimating the dimension of a model. The annals of statistics , 6(2):461–464, 1978. [44] E. H. Simp oson. Measuremen t of div ersity . Natur e , 163(4148):688–688, Apr 1949. [45] D.J. Spiegelhalter, N. G. Best, B. P . Carlin, and A. V an der Linde. Bay esian measures of mo del complexity and fit. J. R. Stat. So c. B , 64:583–616, 2002. [46] C. Tsallis. P ossible generalization of Boltzmann-Gibbs statistics. Journal of Statistic al Physics , 52(1):479–487, Jul 1988. [47] S. Yitzhaki and E. Sc hec htman. Mor e Than a Dozen A lternative Ways of Sp el ling Gini , pages 11–31. Springer New Y ork, 2013. 29 [48] J. Zhang, P . F u, F. Meng, X. Y ang, J. Xu, and Y. Cui. Estimation algorithm for c hloroph yll-a concen trations in w ater from h yp ersp ectral images based on feature deriv ation and ensemble learning. Ec olo gic al Informatics , 71:101783, 2022. [49] K. Zyczko wski, P . Horo dec ki, A. Sanp era, and M. Lew enstein. V olume of the set of separable states. 
Phys. Rev. A, 58:883–892, 1998.

A Probabilistic interpretation

Let us define a pair of random variables $\{X_t, Z_t\}$ corresponding to a pair of samples that are independently drawn with replacement according to the pmf defined by $\bar{w}_n$, with $n = 1, \ldots, N$. We denote with $t \in \mathbb{N}$ a sub-index corresponding to the trial/experiment. We perform different independent trials. Let us also define the random variable $T = \min\{t \in \mathbb{N} \text{ such that } X_t = Z_t\}$. We aim to compute the expected number of trials needed to obtain a first pair containing the same sample twice, i.e.,
$$
E[T] = \sum_{t=1}^{\infty} t \cdot \text{Prob}(T = t). \qquad (63)
$$
Note now that
$$
\text{Prob}(T = 1) = \sum_{n=1}^{N} \bar{w}_n^2, \qquad \text{Prob}(T = 2) = \left(1 - \sum_{n=1}^{N} \bar{w}_n^2\right) \sum_{n=1}^{N} \bar{w}_n^2,
$$
and, in general,
$$
\text{Prob}(T = t) = \left(1 - \sum_{n=1}^{N} \bar{w}_n^2\right)^{t-1} \sum_{n=1}^{N} \bar{w}_n^2.
$$
Thus, replacing into Eq. (63), we have
$$
E[T] = \sum_{t=1}^{\infty} t \left(1 - \sum_{n=1}^{N} \bar{w}_n^2\right)^{t-1} \sum_{n=1}^{N} \bar{w}_n^2, \qquad (64)
$$
$$
= \left(\sum_{n=1}^{N} \bar{w}_n^2\right) \sum_{t=1}^{\infty} t \left(1 - \sum_{n=1}^{N} \bar{w}_n^2\right)^{t-1}, \qquad (65)
$$
$$
= \frac{\sum_{n=1}^{N} \bar{w}_n^2}{1 - \sum_{n=1}^{N} \bar{w}_n^2} \sum_{t=1}^{\infty} t \left(1 - \sum_{n=1}^{N} \bar{w}_n^2\right)^{t}. \qquad (66)
$$
To simplify the expression above, we can set $r = 1 - \sum_{n=1}^{N} \bar{w}_n^2$, so that we can rewrite it as
$$
E[T] = \frac{1-r}{r} \left[ \sum_{t=1}^{\infty} t r^t \right], \qquad (67)
$$
$$
= \frac{1-r}{r} \cdot \frac{r}{(1-r)^2} = \frac{1}{1-r} = \frac{1}{\sum_{n=1}^{N} \bar{w}_n^2}, \qquad (68)
$$
where we have used the equality $\sum_{t=1}^{\infty} t r^t = \frac{r}{(1-r)^2}$, valid for $0 \le r < 1$, which is a well-known result on power series.

B Other form for the ESS formula in Eq. (6)

Let us recall that $\sum_{n=1}^{N} \bar{w}_n = 1$, so that the arithmetic mean of the normalized weights is always $\mu = \frac{1}{N} \sum_{n=1}^{N} \bar{w}_n = \frac{1}{N}$. Note that the ESS formula in Eq. (6) can be written as
$$
\text{ESS}_N(\bar{\mathbf{w}}) = \frac{1}{\sum_{n=1}^{N} \bar{w}_n^2} = \frac{1}{\frac{1}{N} + N \widehat{\sigma}^2}, \qquad (69)
$$
where $\widehat{\sigma}^2 = \frac{1}{N} \sum_{n=1}^{N} (\bar{w}_n - \mu)^2$ is the variance of the normalized weights. If $\widehat{\sigma}^2 = 0$, then $\text{ESS}_N(\bar{\mathbf{w}})$ reaches the maximum value $N$.
We can write:
$$
\frac{1}{\frac{1}{N} + N \widehat{\sigma}^2} = \frac{1}{\frac{1}{N} + N \left[ \frac{1}{N} \sum_{n=1}^{N} (\bar{w}_n - \mu)^2 \right]},
$$
$$
= \frac{1}{\frac{1}{N} + N \left[ \frac{1}{N} \sum_{n=1}^{N} \bar{w}_n^2 + \frac{1}{N} \sum_{n=1}^{N} \mu^2 - \frac{2}{N} \mu \sum_{n=1}^{N} \bar{w}_n \right]},
$$
$$
= \frac{1}{\frac{1}{N} + N \left[ \frac{1}{N} \sum_{n=1}^{N} \bar{w}_n^2 + \frac{1}{N} N \mu^2 - 2 \mu^2 \right]},
$$
$$
= \frac{1}{\frac{1}{N} + N \left[ \frac{1}{N} \sum_{n=1}^{N} \bar{w}_n^2 - \mu^2 \right]},
$$
$$
= \frac{1}{\frac{1}{N} + \sum_{n=1}^{N} \bar{w}_n^2 - N \mu^2},
$$
$$
= \frac{1}{\frac{1}{N} + \sum_{n=1}^{N} \bar{w}_n^2 - \frac{1}{N}} = \frac{1}{\sum_{n=1}^{N} \bar{w}_n^2},
$$
where we have used $\sum_{n=1}^{N} \bar{w}_n = 1$ and $\mu = \frac{1}{N}$, so that $\frac{2}{N}\mu \sum_{n=1}^{N} \bar{w}_n = 2\mu^2$. The chain of equalities above proves the identity (69).
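Both appendix results are easy to check numerically. The sketch below (an illustration, not part of the paper) verifies the variance identity of Eq. (69) exactly, and estimates $E[T]$ from Appendix A by direct simulation of the pairs $(X_t, Z_t)$; the weight vector and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Normalized weights defining the pmf (any example summing to 1).
w = np.array([0.4, 0.3, 0.2, 0.1])
N = len(w)

# --- Appendix B: 1 / sum(w^2) == 1 / (1/N + N * sigma2), exactly.
ess = 1.0 / np.sum(w ** 2)
mu = 1.0 / N                       # mean of normalized weights
sigma2 = np.mean((w - mu) ** 2)    # their variance
assert np.isclose(ess, 1.0 / (1.0 / N + N * sigma2))

# --- Appendix A: E[T] == 1 / sum(w^2), by Monte Carlo.
# Each trial draws a pair (X_t, Z_t) i.i.d. from w; T is the first
# trial index with X_t == Z_t (a geometric variable with p = sum(w^2)).
runs, horizon = 20_000, 100
X = rng.choice(N, size=(runs, horizon), p=w)
Z = rng.choice(N, size=(runs, horizon), p=w)
match = (X == Z)
T = match.argmax(axis=1) + 1       # first matching trial, 1-indexed
# T.mean() should be close to ess = 1 / sum(w^2).
```

With these weights $\sum_n \bar{w}_n^2 = 0.3$, so the Monte Carlo mean of $T$ should concentrate around $1/0.3 \approx 3.33$, matching Eq. (68). (The truncation at `horizon = 100` trials is harmless here, since the probability of no match within 100 pairs is $0.7^{100}$, which is negligible.)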