Effective sample size approximations as entropy measures
Authors: L. Martino, V. Elvira
L. Martino (Università degli Studi di Catania, Italy), V. Elvira (University of Edinburgh, UK).

Abstract. In this work, we analyze alternative effective sample size (ESS) metrics for importance sampling algorithms, and discuss a possible extended range of applications. We show the relationship between the ESS expressions used in the literature and two entropy families, the Rényi and Tsallis entropies. The Rényi entropy is connected to the Huggins-Roy's ESS family introduced in [22]. We prove that all the ESS functions included in the Huggins-Roy's family fulfill all the desirable theoretical conditions. We also analyze and remark on the connections with several other fields, such as the Hill numbers introduced in ecology, the Gini inequality coefficient employed in economics, and the Gini impurity index used mainly in machine learning, to name a few. Finally, by numerical simulations, we study the performance of different ESS expressions contained in the previous ESS families in terms of approximation of the theoretical ESS definition, and show the application of ESS formulas in a variable selection problem.

Keywords: Effective Sample Size; Importance Sampling; Entropy; Diversity measure; Gini impurity; Gini inequality coefficient; inverse Simpson concentration; Berger-Parker index.

The effective sample size (ESS) measure is an important concept in order to quantify the efficiency of different Monte Carlo methods, such as Markov Chain Monte Carlo (MCMC) [16, 30] and Importance Sampling (IS) techniques [4, 6]. In an IS context, the ESS is a heuristic to approximate how many independent identically distributed (i.i.d.) samples, drawn directly from the target distribution $\bar{\pi}(\mathbf{x}) = \frac{1}{Z}\pi(\mathbf{x})$, where $Z$ is the normalizing constant, are equivalent in some sense to the $N$ weighted samples $\mathbf{x}_1, \ldots, \mathbf{x}_N$, drawn from a proposal distribution $q(\mathbf{x})$ and weighted according to the ratio $w_n = \frac{\pi(\mathbf{x}_n)}{q(\mathbf{x}_n)}$ [40]. This consideration is represented in the first box of Figure 1, referred to as the abstract ESS concept. The theoretical definition of the ESS for IS is given by the ratio between two variances [16, 26]: the variance of the ideal Monte Carlo estimator (drawing samples directly from the target), and the variance of the estimator obtained by an IS scheme, using the same number of samples in both estimators (see Eq. (5) for more details). This definition presents some drawbacks (see [34, 14] for an exhaustive discussion) and is useless for practical purposes, since it cannot be computed in general. Hence, approximations of this theoretical formula are required. In Figure 1, this theoretical definition is represented by the second box. Within an IS context, the most common choice in the literature to approximate this theoretical ESS definition is $\text{ESS} = \frac{1}{\sum_{n=1}^N \bar{w}_n^2}$, which involves (only) the normalized importance weights $\bar{w}_n = \frac{w_n}{\sum_{j=1}^N w_j}$, $n = 1, \ldots, N$ [10, 11, 27, 40]. This expression has been widely used in particle filtering in order to apply the resampling steps adaptively [11, 10, 19]. However, it presents different weaknesses, since it has been obtained after several approximations of the theoretical definition. For instance, it depends only on the normalized weights, not on the particle locations nor on the particular integral to approximate (see [14, 34] for further details). Several other alternatives have been studied in the literature and applied in order to perform adaptive resampling within sequential Monte Carlo (SMC) methods [22, 34]. For instance, another measure called perplexity, involving the discrete entropy [9] of the normalized weights, has also been proposed in [5] (see also [40, Chapter 4], [13, Section 3.5]).
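As a minimal sketch of how this rule of thumb is used in practice (the toy weight vector and the resampling threshold $N/2$ are our hypothetical choices, not values prescribed by the paper):

```python
import numpy as np

def ess_classical(w):
    """Classical ESS approximation, ESS = 1 / sum(w_bar_n^2),
    computed from unnormalized importance weights."""
    w = np.asarray(w, dtype=float)
    w_bar = w / w.sum()            # normalized importance weights
    return 1.0 / np.sum(w_bar**2)

# Hypothetical adaptive-resampling rule as used in particle filtering:
# trigger a resampling step when the ESS drops below N/2.
w = np.array([0.1, 2.3, 0.05, 1.7, 0.02])   # toy unnormalized weights
N = len(w)
print(ess_classical(w), ess_classical(w) < N / 2)
```

The formula always returns a value between 1 and $N$, which makes the threshold comparison meaningful for any weight vector.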
Another expression is defined as the inverse of the maximum of the normalized weights $\bar{w}_n$ [34]. In this work, we recall the definition of the generalized ESS (G-ESS) functions given in [34]. We stress and show that the G-ESS functions can be considered diversity indices [24] (see the third box in Figure 1). Indeed, we show that the G-ESS functions can be associated to different entropy families [9]. Given an entropy measure of the probability mass function (pmf) defined by the normalized weights $\bar{w}_n$, $n = 1, \ldots, N$, we can obtain a G-ESS formula by taking the exponential transformation of the entropy expression (in some cases, an additional translation and scaling are needed). More specifically, we analyze the Rényi and Tsallis entropy families, converting them into G-ESS functions. The ESS family corresponding to the Rényi entropy coincides with the Huggins-Roy's ESS family, introduced and studied independently in [22],
$$\text{ESS} = \left( \sum_{n=1}^N \bar{w}_n^{\beta} \right)^{\frac{1}{1-\beta}}, \quad \beta \geq 0.$$
We show that all the G-ESS expressions belonging to this family satisfy all the desired requirements, being all defined as proper and stable (see Section 3 for further details). Moreover, almost all the main formulas previously proposed in the literature are contained in the Huggins-Roy's family. Using the Tsallis entropy, we obtain another ESS family which contains as a special case the Gini impurity index, widely employed in machine learning within decision tree algorithms [3, 28]. We also discuss the connection to another ESS family provided in [34]. However, the Tsallis ESS formulas are generally not proper and stable. Other stable expressions that do not belong to the Huggins-Roy's family are also given (see, e.g., Sections 6 and 7.3). The connections with the entropy families show the relationships with multiple studies in different fields (e.g., ecology and machine learning, to name a few).
The benefit of creating these bridges between fields is twofold and bidirectional: different ideas used in other fields can be applied as ESS expressions in an IS context (such as the formula (44), introduced in political science) and, vice versa, ESS formulas proposed for IS could be employed in other fields. Showing these bridges is the main goal of this work. We remark on the links with other fields in Section 7, where we discuss the applications of ESS expressions in ecology, economics, political science, physics, and also in feature selection problems. Other connections with machine learning, economics, and ecology are also discussed in the previous Sections 5 and 6. Figure 2 provides a summary of the main nomenclature employed in different fields. Furthermore, by numerical simulations, we obtain the G-ESS function within the Huggins-Roy's family which provides the best approximation of the theoretical ESS definition, in two specific scenarios. We also study linear combinations of G-ESS functions in order to enhance the approximation of the theoretical definition. The results of our numerical simulations suggest the use of formulas of the type
$$\text{ESS} = \left( \frac{1}{\sum_{n=1}^N \bar{w}_n^4} \right)^{1/3} \quad \text{and} \quad \text{ESS} = \left( \frac{1}{\sum_{n=1}^N \bar{w}_n^8} \right)^{1/7}.$$
Both expressions differ from the classical formula $\text{ESS} = \frac{1}{\sum_{n=1}^N \bar{w}_n^2}$, which is contained in the Huggins-Roy's family with $\beta = 2$. Our study suggests the use of $\beta > 2$. Moreover, we have applied the most relevant ESS formulas in a variable selection framework. Some of them provide good results, in line with the experts' opinions. These considerations can also be relevant clues for future applications and studies.

[Figure 1 shows three boxes, from left to right: "Abstract ESS concept: comparing weighted samples with i.i.d. samples from $\bar{\pi}$"; "Theoretical definition: for instance, $\text{ESS} = N \frac{\text{var}_\pi[\widehat{I}]}{\text{var}_q[\widetilde{I}]}$"; "Approximation, diversity measures: for instance, $\widehat{\text{ESS}} = \frac{1}{\sum_{n=1}^N \bar{w}_n^2}$".]

Figure 1: Graphical representation of the development of the approximated ESS formulas for importance sampling. The abstract concept of effective sample size has been translated into a mathematical formulation, providing a first attempt at a theoretical definition. Since this definition cannot be computed, several approximations have been proposed (based only on the information provided by the normalized IS weights). The expression $\text{ESS} = \frac{1}{\sum_{n=1}^N \bar{w}_n^2}$ is the most applied so far in the literature.

1 Effective sample size (ESS) for importance sampling

Let us denote the target probability density function (pdf) as $\bar{\pi}(\mathbf{x}) \propto \pi(\mathbf{x})$ (known up to a normalizing constant), with $\mathbf{x} \in \mathcal{X}$. Moreover, we consider the following integral involving $\bar{\pi}(\mathbf{x})$ and a square-integrable function $h(\mathbf{x})$,
$$I = \int_{\mathcal{X}} h(\mathbf{x}) \bar{\pi}(\mathbf{x}) d\mathbf{x}, \qquad (1)$$
which we aim to approximate using a Monte Carlo approach. If we are able to draw $N$ independent samples $\mathbf{x}_1, \ldots, \mathbf{x}_N$ from $\bar{\pi}(\mathbf{x})$, then the Monte Carlo estimator of $I$ is
$$\widehat{I} = \frac{1}{N} \sum_{n=1}^N h(\mathbf{x}_n) \xrightarrow{N \to \infty} I, \quad \text{where } \mathbf{x}_n \sim \bar{\pi}(\mathbf{x}). \qquad (2)$$
However, generating samples directly from the target, $\bar{\pi}(\mathbf{x})$, is often impossible. Alternatively, we can draw $N$ samples $\mathbf{x}_1, \ldots, \mathbf{x}_N$ from a (simpler) proposal pdf $q(\mathbf{x})$ (we assume that $q(\mathbf{x}) > 0$ for all $\mathbf{x}$ where $\bar{\pi}(\mathbf{x}) \neq 0$), and then assign a weight to each sample, $w_n = \frac{\pi(\mathbf{x}_n)}{q(\mathbf{x}_n)}$, with $n = 1, \ldots, N$, according to the importance sampling (IS) approach. Defining the normalized weights,
$$\bar{w}_n = \frac{w_n}{\sum_{i=1}^N w_i}, \quad \text{where } w_n = \frac{\pi(\mathbf{x}_n)}{q(\mathbf{x}_n)}, \quad n = 1, \ldots, N, \qquad (3)$$
then the self-normalized IS estimator is
$$\widetilde{I} = \sum_{n=1}^N \bar{w}_n h(\mathbf{x}_n) \xrightarrow{N \to \infty} I, \quad \text{where } \mathbf{x}_n \sim q(\mathbf{x}). \qquad (4)$$
Generally, the estimator $\widetilde{I}$ has greater variance than $\widehat{I}$, since the samples are not directly drawn from $\bar{\pi}(\mathbf{x})$ (for some exceptions, which occur with a suitable choice of the proposal, see [32]). Moreover, $\widetilde{I}$ is biased, whereas $\widehat{I}$ is unbiased. In several applications [10, 11], it is necessary to measure the loss of efficiency when we apply the IS estimator $\widetilde{I}$ instead of the ideal Monte Carlo estimator $\widehat{I}$, i.e., to measure in some way the increase of variance due to the use of $\widetilde{I}$ instead of $\widehat{I}$. Hence, the idea is to define the effective sample size (ESS) as the ratio of the variances of the estimators [26],
$$\text{ESS}_{\text{teo}}(h) = N \frac{\text{var}_\pi[\widehat{I}]}{\text{var}_q[\widetilde{I}]}. \qquad (5)$$
Note the dependence on the function $h(\mathbf{x})$ corresponding to a specific integral.

2 Practical ESS formulas

2.1 ESS expressions in the literature

Finding a useful expression for the ESS, derived analytically from the theoretical definition in Eq. (5) above, is not straightforward. Hence, different derivations [26, 27], [11, Chapter 11], [40, Chapter 4] proceed using several approximations and assumptions in order to yield an expression useful from a practical point of view. A well-known rule of thumb, widely used in the literature [11, 31, 40], is
$$\text{ESS}_N(\bar{\mathbf{w}}) = \frac{1}{\sum_{n=1}^N \bar{w}_n^2}, \qquad (6)$$
where we have used the normalized weights $\bar{\mathbf{w}} = [\bar{w}_1, \ldots, \bar{w}_N]$, defined in Eq. (3). The formula above also has an intuitive probabilistic interpretation (from a resampling point of view): if we draw random pairs of samples with replacement according to the probability mass function (pmf) defined by $\bar{w}_n$, with $n = 1, \ldots, N$, the value $\frac{1}{\sum_{n=1}^N \bar{w}_n^2}$ is the expected number of trials needed to obtain a first pair containing the same sample twice (see Appendix A for details). Furthermore, another interesting form of Eq. (6), as a function of the variance of the weights, is given in Appendix B.
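The pair-drawing interpretation of Eq. (6) can be checked by a short simulation; the toy weight vector and the number of repetitions below are our arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
w_bar = np.array([0.5, 0.3, 0.1, 0.1])   # toy normalized weights
p_match = np.sum(w_bar**2)               # probability that a drawn pair repeats a sample
expected_trials = 1.0 / p_match          # the value given by Eq. (6)

def trials_until_match(rng, w_bar):
    """Draw pairs with replacement from the pmf w_bar until both entries coincide."""
    t = 0
    while True:
        t += 1
        a, b = rng.choice(len(w_bar), size=2, p=w_bar)
        if a == b:
            return t

runs = [trials_until_match(rng, w_bar) for _ in range(20000)]
print(np.mean(runs), expected_trials)    # the empirical mean should be close to 1/sum(w_bar^2)
```

Each trial succeeds with probability $\sum_n \bar{w}_n^2$, so the number of trials is geometric and its mean is exactly the ESS value of Eq. (6).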
Another similar measure, called perplexity, has been proposed independently in the literature, based only on the normalized importance weights [5, 40],
$$\text{ESS}_N(\bar{\mathbf{w}}) = \exp\{H(\bar{\mathbf{w}})\}, \qquad (7)$$
where
$$H(\bar{\mathbf{w}}) = -\sum_{n=1}^N \bar{w}_n \log \bar{w}_n$$
is the discrete entropy of the vector $\bar{\mathbf{w}}$ [9]. An additional example is the following formula [34],
$$\text{ESS}_N(\bar{\mathbf{w}}) = \frac{1}{\max_n \bar{w}_n}. \qquad (8)$$
Let us assume that $\max_n \bar{w}_n$ is reached by only one sample (only for one index $n$). In this case, the expression above also has a probabilistic interpretation: if we draw one sample at a time with replacement according to the pmf defined by $\bar{w}_n$, with $n = 1, \ldots, N$, the value $\frac{1}{\max_n \bar{w}_n}$ is the expected number of trials needed to obtain for the first time the sample corresponding to the maximum weight. The proof is very similar to the derivation in Appendix A. This interpretation is interesting from a resampling point of view. An interesting property of all three expressions above, in Eqs. (6)-(7)-(8), is
$$1 \leq \text{ESS}_N(\bar{\mathbf{w}}) \leq N. \qquad (9)$$

2.2 Relationship with the theoretical definition

All these measures $\text{ESS}_N(\bar{\mathbf{w}})$ are based only on the normalized weights $\bar{\mathbf{w}}$, and there is a loss of information regarding the locations of the samples $\mathbf{x}_n$, which is clearly a drawback [14, 34], even if the computation of the weights involves the use of the samples, i.e., $w_n = \pi(\mathbf{x}_n)/q(\mathbf{x}_n)$. To clarify this point, we give the following example. Two different samples $\mathbf{x}'$ and $\mathbf{x}''$ could have very similar weights, $w' = \frac{\pi(\mathbf{x}')}{q(\mathbf{x}')} \approx w'' = \frac{\pi(\mathbf{x}'')}{q(\mathbf{x}'')}$, and the ESS formulas use only this information. Hence, the ESS formulas lose all the information about the positions of the samples $\mathbf{x}'$ and $\mathbf{x}''$. The two particles can be very close to each other or far away; the latter scenario is often preferred in terms of statistical information.
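Since the expressions in Eqs. (6)-(8) depend only on $\bar{\mathbf{w}}$, any two weighted sample sets with the same normalized weights receive the same ESS value, regardless of where the particles sit. A minimal sketch of the three classical formulas (the toy weight vector and function names are ours):

```python
import numpy as np

def ess_inverse_sum_squares(w_bar):
    return 1.0 / np.sum(w_bar**2)            # Eq. (6)

def ess_perplexity(w_bar):
    p = w_bar[w_bar > 0]                     # 0 * log 0 is treated as 0
    return np.exp(-np.sum(p * np.log(p)))    # Eq. (7), exponential of the discrete entropy

def ess_inverse_max(w_bar):
    return 1.0 / np.max(w_bar)               # Eq. (8)

w_bar = np.array([0.5, 0.3, 0.1, 0.1])
for f in (ess_inverse_sum_squares, ess_perplexity, ess_inverse_max):
    val = f(w_bar)
    assert 1.0 <= val <= len(w_bar)          # the property in Eq. (9)
    print(f.__name__, val)
```

All three return values in $[1, N]$, as stated in Eq. (9), while being computable from the weights alone.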
Moreover, the theoretical value $\text{ESS}_{\text{teo}}(h)$ in (5) is always positive, can be smaller than 1 and, in some situations, bigger than $N$ as well [14, Section 3.3], [32]. This last scenario can occur when an optimal proposal pdf (or a density close to the optimal one) is used in an IS scheme [32]. In this case, the IS scheme can beat the baseline Monte Carlo estimator, and $\text{ESS}_{\text{teo}}(h) > N$. Furthermore, $\text{ESS}_{\text{teo}}(h)$ depends on the function $h$, which does not appear in the expressions $\text{ESS}_N(\bar{\mathbf{w}})$. Therefore, the formulas $\text{ESS}_N(\bar{\mathbf{w}})$, which all satisfy the constraints in Eq. (9) (i.e., $1 \leq \text{ESS}_N(\bar{\mathbf{w}}) \leq N$), are quite rough approximations of $\text{ESS}_{\text{teo}}(h)$. However, they are often used in practice. The reason for this success is connected to their interpretation as discrepancy/diversity measures, as explained below.

2.3 Discrepancy w.r.t. the uniform pmf

All the formulas above can be considered diversity indices or discrepancy measures [24, 34]. We give more details in the rest of the work. Here, let us start by considering the discrepancy between two pmfs: the pmf defined by the weights $\bar{\mathbf{w}} = [\bar{w}_1, \ldots, \bar{w}_N]$ and the discrete uniform pmf defined by $\bar{\mathbf{w}}^* = \left[\frac{1}{N}, \ldots, \frac{1}{N}\right]$. Indeed, the ESS formula in Eq. (6) can be directly related to the Euclidean distance between these two pmfs, i.e.,
$$\|\bar{\mathbf{w}} - \bar{\mathbf{w}}^*\|_2 = \sqrt{\sum_{n=1}^N \left(\bar{w}_n - \frac{1}{N}\right)^2} = \sqrt{\left(\sum_{n=1}^N \bar{w}_n^2\right) + N\frac{1}{N^2} - \frac{2}{N}\sum_{n=1}^N \bar{w}_n} = \sqrt{\left(\sum_{n=1}^N \bar{w}_n^2\right) - \frac{1}{N}} = \sqrt{\frac{1}{\text{ESS}_N(\bar{\mathbf{w}})} - \frac{1}{N}},$$
where we have used $\text{ESS}_N(\bar{\mathbf{w}}) = \frac{1}{\sum_{n=1}^N \bar{w}_n^2}$ from Eq. (6). Hence, maximizing the expression in Eq. (6) is equivalent to minimizing the Euclidean distance $\|\bar{\mathbf{w}} - \bar{\mathbf{w}}^*\|_2$. Note that this behavior is also typical of discrete entropy measures, as we stress in the next sections. Indeed, if the weights are more "diverse" from each other, the distance w.r.t. the discrete uniform pmf $\bar{\mathbf{w}}^*$ is higher, and both the ESS and the entropy of $\bar{\mathbf{w}}$ are smaller.
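The identity $\|\bar{\mathbf{w}} - \bar{\mathbf{w}}^*\|_2 = \sqrt{1/\text{ESS}_N(\bar{\mathbf{w}}) - 1/N}$ derived above is deterministic and can be verified directly for any weight vector (the random vector below is only an example):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10
w = rng.random(N)
w_bar = w / w.sum()                         # a generic point of the unit simplex
w_star = np.full(N, 1.0 / N)                # discrete uniform pmf

lhs = np.linalg.norm(w_bar - w_star)        # Euclidean distance to the uniform pmf
ess = 1.0 / np.sum(w_bar**2)                # Eq. (6)
rhs = np.sqrt(1.0 / ess - 1.0 / N)          # the closed form derived in Section 2.3
print(lhs, rhs)                             # the two values coincide
```

Maximizing the ESS of Eq. (6) is therefore literally the same optimization as minimizing the Euclidean distance to $\bar{\mathbf{w}}^*$.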
On the other hand, if the normalized weights are more similar to each other, they are all closer to the value $1/N$, so that the distance w.r.t. the discrete uniform pmf $\bar{\mathbf{w}}^*$ is smaller. As a consequence, the corresponding ESS and the entropy of $\bar{\mathbf{w}}$ are greater. Hence, it appears natural to consider the possibility of using other discrepancy and/or entropy measures to design alternative ESS expressions. Highlighting these types of connections is relevant since (a) we can extend the range of applications of the ESS formulas (applying those expressions in other fields) and (b) derivations employed in other fields can be used to design novel ESS formulas.

Why discrepancy measures. The maximum ESS value is obtained when $\bar{\mathbf{w}} = \bar{\mathbf{w}}^*$, i.e., all the normalized weights are equal to $1/N$, $\bar{w}_1 = \ldots = \bar{w}_N = \frac{1}{N}$. This can be considered a good scenario (and confused with the optimal one), since the ideal Monte Carlo estimator in Eq. (2) can be interpreted as an estimator with "equally weighted samples" (each $h(\mathbf{x}_n)$ is multiplied by a factor $1/N$). The problem is that, within an IS scheme, the case $\bar{w}_1 = \ldots = \bar{w}_N = \frac{1}{N}$ (or $\bar{w}_1 \approx \ldots \approx \bar{w}_N \approx \frac{1}{N}$) can occur also in catastrophic scenarios, for instance, when all the samples are located in a tail of the target distribution (which is often a quite flat region), or when the samples are very close to each other. However, the ESS formulas above, based on the discrepancy approach, are able to detect other critical scenarios. For instance, the minimum ESS value is reached when just one weight concentrates all the probability mass ($\bar{w}_i = 1$ and $\bar{w}_j = 0$ for $j \neq i$), which is a situation to be avoided within particle filtering and sequential Monte Carlo schemes [1, 8, 10, 12]. Therefore, this discrepancy approach has gained strength in the literature, and the ESS approximations above have been widely applied.
In the following, we describe five conditions that a generic ESS approximation, based only on the information of the normalized weights, must satisfy. Then we show that the family of functions proposed in [22] fulfills these five conditions. Furthermore, we link this G-ESS family with the Rényi entropy, providing also some theoretical results.

3 Generalized ESS functions

Considering the practical approach employed above for defining ESS formulas as discrepancy-diversity measures, here we describe the five properties that a generalized ESS measure (G-ESS) should satisfy, based only on the information of the normalized weights. We list below five conditions. The formulas which satisfy all of them can be applied as suitable ESS measures in practical applications (within IS or sequential IS schemes). Otherwise, if they satisfy at least the first three conditions, they can be considered discrepancy measures with respect to the uniform pmf, but they lack the ability to be "particle/sample counters", as we clarify below with practical examples. First of all, note that any possible G-ESS is a function of the vector of normalized weights $\bar{\mathbf{w}} = [\bar{w}_1, \ldots, \bar{w}_N]$,
$$\text{ESS}_N(\bar{\mathbf{w}}) = \text{ESS}_N(\bar{w}_1, \ldots, \bar{w}_N): \mathcal{S}_N \to [1, N], \qquad (10)$$
where $\mathcal{S}_N \subset \mathbb{R}^N$ represents the unit simplex in $\mathbb{R}^N$. Namely, the variables $\bar{w}_1, \ldots, \bar{w}_N$ are subject to the following constraint:
$$\bar{w}_1 + \bar{w}_2 + \ldots + \bar{w}_N = 1. \qquad (11)$$
Moreover, we denote
$$\bar{\mathbf{w}}^* = \left[\frac{1}{N}, \ldots, \frac{1}{N}\right], \qquad (12)$$
and the vertices of the simplex $\mathcal{S}_N$ are denoted as
$$\bar{\mathbf{w}}^{(j)} = [\bar{w}_1 = 0, \ldots, \bar{w}_j = 1, \ldots, \bar{w}_N = 0], \qquad (13)$$
i.e., $\bar{w}_j = 1$ and $\bar{w}_n = 0$ (which can occur only if $\pi(\mathbf{x}_n) = 0$) for $n \neq j$, with $j \in \{1, \ldots, N\}$. Below we list the five conditions that $\text{ESS}_N(\bar{\mathbf{w}})$ should fulfill:

C1. Symmetry: $\text{ESS}_N$ must be invariant under any permutation of the weights, i.e.,
$$\text{ESS}_N(\bar{w}_1, \bar{w}_2, \ldots, \bar{w}_N) = \text{ESS}_N(\bar{w}_{j_1}, \bar{w}_{j_2}, \ldots, \bar{w}_{j_N}), \qquad (14)$$
for any possible set of indices $\{j_1, \ldots, j_N\} = \{1, \ldots, N\}$.

C2. Maximum condition: a maximum value is $N$ and it is reached at $\bar{\mathbf{w}}^*$ (see Eq. (12)), i.e.,
$$\text{ESS}_N(\bar{\mathbf{w}}^*) = N \geq \text{ESS}_N(\bar{\mathbf{w}}). \qquad (15)$$

C3. Minimum condition: the minimum value is 1 and it is reached (at least) at the vertices $\bar{\mathbf{w}}^{(j)}$ of the unit simplex in Eq. (13),
$$\text{ESS}_N(\bar{\mathbf{w}}^{(j)}) = 1 \leq \text{ESS}_N(\bar{\mathbf{w}}), \qquad (16)$$
for all $j \in \{1, \ldots, N\}$.

C4. Unicity of the extreme values: the maximum at $\bar{\mathbf{w}}^*$ is unique, and the minimum value 1 is reached only at the vertices $\bar{\mathbf{w}}^{(j)}$, for all $j \in \{1, \ldots, N\}$.

C5. Stability of the rate $\text{ESS}_N/N$: consider the vector of weights $\bar{\mathbf{w}} \in \mathbb{R}^N$ and the vector $\bar{\mathbf{v}} = [\bar{v}_1, \ldots, \bar{v}_{MN}] \in \mathbb{R}^{MN}$, $M \geq 1$, obtained by repeating $M$ times and scaling by $\frac{1}{M}$ the entries of $\bar{\mathbf{w}}$, i.e.,
$$\bar{\mathbf{v}} = \frac{1}{M}[\underbrace{\bar{\mathbf{w}}, \bar{\mathbf{w}}, \ldots, \bar{\mathbf{w}}}_{M \text{ times}}]. \qquad (17)$$
The invariance condition is expressed as
$$\text{ESS}_N(\bar{\mathbf{w}}) = \frac{1}{M}\text{ESS}_{MN}(\bar{\mathbf{v}}), \qquad (18)$$
for all $M \in \mathbb{N}^+$. This last requirement can be interpreted as an adjustment of the well-known homogeneity (scale-invariance) condition for real functions (a function $f(\mathbf{x})$ is said to be homogeneous of degree $k$ if $f(c\mathbf{x}) = c^k f(\mathbf{x})$, where $c$ is a non-zero constant value).

Note that, given conditions C2 and C3, we always have
$$1 \leq \text{ESS}_N(\bar{\mathbf{w}}) \leq N. \qquad (19)$$
If at least C1, C2 and C3 are fulfilled, the G-ESS can be considered a discrepancy measure with respect to the uniform pmf. If C4 is also satisfied, then it is a proper discrepancy measure, since it reaches the maximum value (that is, $N$) only at $\bar{\mathbf{w}}^*$ and the minimum value (that is, 1) only at the vertices $\bar{\mathbf{w}}^{(j)}$. However, if C5 is not ensured, the formula cannot be considered a useful ESS function from a practical point of view.

On the condition C5. To clarify this condition, consider the vector $\bar{\mathbf{v}} = [0, 1, 0]$ with $N = 3$, and the two additional vectors obtained by repeating $\bar{\mathbf{v}}$ two or three times,
$$\bar{\mathbf{v}}' = \left[0, \tfrac{1}{2}, 0, 0, \tfrac{1}{2}, 0\right] = \tfrac{1}{2}[\bar{\mathbf{v}}, \bar{\mathbf{v}}], \qquad \bar{\mathbf{v}}'' = \left[0, \tfrac{1}{3}, 0, 0, \tfrac{1}{3}, 0, 0, \tfrac{1}{3}, 0\right] = \tfrac{1}{3}[\bar{\mathbf{v}}, \bar{\mathbf{v}}, \bar{\mathbf{v}}].$$
We would like to obtain $\text{ESS}_3(\bar{\mathbf{v}}) = 1$, $\text{ESS}_6(\bar{\mathbf{v}}') = 2$ and $\text{ESS}_9(\bar{\mathbf{v}}'') = 3$, i.e., the ratio $\frac{\text{ESS}_N}{N}$ should be constant:
$$\frac{\text{ESS}_3(\bar{\mathbf{v}})}{3} = \frac{\text{ESS}_6(\bar{\mathbf{v}}')}{6} = \frac{\text{ESS}_9(\bar{\mathbf{v}}'')}{9} = \frac{1}{3}.$$
A more intuitive explanation is as follows. If we have a vector of normalized weights $\bar{\mathbf{v}}' = [0, \tfrac{1}{2}, 0, 0, \tfrac{1}{2}, 0]$, we would like to get $\text{ESS}_6(\bar{\mathbf{v}}') = 2$, since we have 2 effective samples instead of 6 (at most we have 2 effective samples; this is all we can say by looking at the vector $\bar{\mathbf{v}}'$). Now, if we have a vector $\bar{\mathbf{v}}'' = [0, \tfrac{1}{3}, 0, 0, \tfrac{1}{3}, 0, 0, \tfrac{1}{3}, 0]$, we would like to obtain $\text{ESS}_9(\bar{\mathbf{v}}'') = 3$. From another point of view, since $\bar{\mathbf{v}}''$ can be seen as $\bar{\mathbf{v}}'' = \tfrac{1}{3}[\bar{\mathbf{v}}, \bar{\mathbf{v}}, \bar{\mathbf{v}}]$, where $\bar{\mathbf{v}} = [0, 1, 0]$, we would like the $\text{ESS}_N$ formula to be able to count effective samples in the same way in different pieces of a vector. Namely, if $\text{ESS}_3(\bar{\mathbf{v}}) = 1$ and $\bar{\mathbf{v}}''$ is formed by three repetitions of $\bar{\mathbf{v}}$, then we expect to obtain $\text{ESS}_9(\bar{\mathbf{v}}'') = 3$. Any result that differs from these does not make sense from a practical point of view, e.g., within a particle filter or sequential Monte Carlo scheme.

Classification of G-ESS. Given the previous observations, we can provide a classification of the possible G-ESS functions. Table 1 classifies the G-ESS functions into different families, depending on the conditions fulfilled, showing the cases found in different families of ESS measures [34]. Recall that the first three conditions are strictly required in order to be considered a discrepancy measure with respect to the uniform pmf.
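The stability condition C5 can be checked numerically for the classical formula of Eq. (6); the helper name ess_standard and the test vectors are ours:

```python
import numpy as np

def ess_standard(w_bar):
    return 1.0 / np.sum(np.asarray(w_bar)**2)   # Eq. (6)

# Generic check of Eq. (18): repeating M times and rescaling by 1/M (Eq. (17))
# must multiply the ESS by M.
w_bar = np.array([0.7, 0.2, 0.1])
for M in (1, 2, 3, 5):
    v = np.tile(w_bar, M) / M
    assert np.isclose(ess_standard(v), M * ess_standard(w_bar))

# The vectors of the example above: ESS([0,1,0]) = 1, and its two- and
# three-fold repetitions give 2 and 3, so ESS_N / N stays constant at 1/3.
v1 = np.array([0.0, 1.0, 0.0])
print(ess_standard(v1),
      ess_standard(np.tile(v1, 2) / 2),
      ess_standard(np.tile(v1, 3) / 3))
```

For Eq. (6) the check holds for every $M$, since repeating and rescaling divides $\sum_n \bar{w}_n^2$ exactly by $M$.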
All the G-ESS functions which satisfy at least the first four conditions, i.e., from C1 to C4, are called proper functions. If all the conditions are fulfilled, they are called proper and stable. We are interested in this last type of G-ESS expressions, proper and stable.

Remark 1. Only the proper and stable G-ESS functions are useful from a practical point of view, in order to be employed as ESS measures.

Table 1: Classification of G-ESS formulas.

Class of G-ESS           C1   C2   C3   C4   C5
Degenerate               ✓    ✓    ✓    ✗    ✗
Proper                   ✓    ✓    ✓    ✓    ✗
Degenerate and stable    ✓    ✓    ✓    ✗    ✓
Proper and stable        ✓    ✓    ✓    ✓    ✓

In order to clarify the previous remark, and the importance of the five conditions, below we show the relevance of the condition C5 by introducing a family that fulfills the first four conditions but does not satisfy the last one.

Example of a proper but non-stable G-ESS family. Here, as an example, we introduce a G-ESS family such that the formulas in the family are all proper but not stable. This means that all the contained G-ESS expressions can be used as discrepancy measures with respect to the uniform pmf, but are not suitable to be employed as ESS measures (within particle filters or sequential Monte Carlo schemes). We can design a G-ESS family based on the $L_p$ distance between $\bar{\mathbf{w}}$ and $\bar{\mathbf{w}}^*$ which satisfies the first four conditions above. This could be an intuitive idea when we are interested in discrepancy measures. We can in fact define the family
$$\text{ESS-D}_N^{(p)}(\bar{\mathbf{w}}) = \frac{1}{\alpha_p \|\bar{\mathbf{w}} - \bar{\mathbf{w}}^*\|_p + \frac{1}{N}}, \qquad \alpha_p = \frac{N-1}{N\left[\frac{N-1+(N-1)^p}{N^p}\right]^{1/p}}, \qquad (20)$$
where
$$\|\bar{\mathbf{w}} - \bar{\mathbf{w}}^*\|_p = \left(\sum_{n=1}^N \left|\bar{w}_n - \frac{1}{N}\right|^p\right)^{1/p}, \quad p > 0.$$
It is possible to show that $\text{ESS-D}_N^{(p)}(\bar{\mathbf{w}}^*) = N$ and $\text{ESS-D}_N^{(p)}(\bar{\mathbf{w}}^{(j)}) = 1$. Hence, $\text{ESS-D}^{(p)}$ fulfills C1, C2, and C3, and it is also easy to show that it satisfies C4. Hence, this family can be employed as a discrepancy measure with respect to the uniform pmf $\bar{\mathbf{w}}^*$.
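The family in Eq. (20) can be sketched numerically as follows (the function name ess_d is ours); the last line shows the failure of C5 that distinguishes it from the stable formulas:

```python
import numpy as np

def ess_d(w_bar, p=2.0):
    """ESS-D^(p), Eq. (20): a proper but non-stable G-ESS built on the L_p distance."""
    w_bar = np.asarray(w_bar, dtype=float)
    N = len(w_bar)
    alpha_p = (N - 1) / (N * ((N - 1 + (N - 1)**p) / N**p)**(1.0 / p))
    dist = np.sum(np.abs(w_bar - 1.0 / N)**p)**(1.0 / p)
    return 1.0 / (alpha_p * dist + 1.0 / N)

v = np.array([0.0, 1.0, 0.0])
assert np.isclose(ess_d(v), 1.0)                 # minimum 1 at a vertex (C3)
assert np.isclose(ess_d(np.full(5, 0.2)), 5.0)   # maximum N at the uniform pmf (C2)

# C5 fails: repeating v twice (rescaled by 1/2) should give ESS = 2, but does not.
v_rep = np.tile(v, 2) / 2.0
print(ess_d(v_rep))   # about 1.44 instead of the desired value 2
```

The normalization constant $\alpha_p$ is exactly what forces the value 1 at the vertices, but it does not rescale correctly when the vector is repeated, which is the source of the instability.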
However, it is not a good ESS measure, since it does not satisfy C5 (it is not stable). To clarify this point, let us consider some examples comparing $\text{ESS-D}_N^{(p)}$ with $p = 2$ in Eq. (20) with other (stable) ESS formulas in Eqs. (6) and (8). The results are given in Table 2.

Table 2: Examples of ESS measures for different vectors $\bar{\mathbf{w}}$ of dimension $N = 5$. Note that $\text{ESS-D}_N^{(p)}(\bar{\mathbf{w}})$ with $p = 2$ in Eq. (20) is proper but non-stable, whereas the other two ESS formulas, in Eqs. (6)-(8), are both proper and stable.

                              (a)           (b)                (c)                  (d)                      (e)
w̄                      [1,0,0,0,0]  [1/2,1/2,0,0,0]  [1/3,1/3,1/3,0,0]  [1/4,1/4,1/4,1/4,0]  [1/5,1/5,1/5,1/5,1/5] = w̄*
ESS-D(2)_5 — Eq. (20)        1            1.45              1.90                 2.5                      5
1/Σ w̄_n²   — Eq. (6)         1            2                 3                    4                        5
1/max w̄_n  — Eq. (8)         1            2                 3                    4                        5

The formula $\text{ESS-D}_N^{(p)}$ does not provide the desired results, with the exception of cases (a) and (e) (the first and the last scenarios), which are related to conditions C3 and C2. Namely, $\text{ESS-D}_N^{(p)}$ is not a good particle counter, unlike the other two ESS formulas. For instance, in the case $\bar{\mathbf{w}} = [\bar{w}_1 = \tfrac{1}{3}, \bar{w}_2 = \tfrac{1}{3}, \bar{w}_3 = \tfrac{1}{3}, \bar{w}_4 = 0, \bar{w}_5 = 0]$, using just the information of these normalized weights $\bar{w}_n$, we can only assert that we have three effective samples, whereas $\text{ESS-D}_5^{(2)}(\bar{\mathbf{w}})$ returns $\approx 1.90$.

4 Huggins-Roy's ESS family

The Huggins-Roy's ESS family, introduced in [22], is defined as
$$\text{ESS-H}_N^{(\beta)}(\bar{\mathbf{w}}) = \left(\frac{1}{\sum_{n=1}^N \bar{w}_n^\beta}\right)^{\frac{1}{\beta-1}} \qquad (21)$$
$$= \left(\sum_{n=1}^N \bar{w}_n^\beta\right)^{\frac{1}{1-\beta}}, \qquad \beta \geq 0. \qquad (22)$$
Table 3 below shows that the Huggins-Roy's family contains all the most important proper and stable G-ESS functions introduced in the literature. The special cases $\beta = 0$ and $\beta = 1$ lead to two indeterminate expressions that will be resolved and clarified below (when the relationship with the Rényi entropy is shown). We can easily note that $1 \leq \text{ESS-H}_N^{(\beta)}(\bar{\mathbf{w}}) \leq N$ for all $\beta \geq 0$.
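A direct transcription of Eq. (22), with the limit cases $\beta \to 0$, $\beta \to 1$ and $\beta \to \infty$ handled explicitly (the function name is ours):

```python
import numpy as np

def ess_huggins_roy(w_bar, beta):
    """ESS-H^(beta), Eq. (22); the limits beta -> 0, 1, inf are the special cases of Table 3."""
    w = np.asarray(w_bar, dtype=float)
    if beta == 0:
        return np.count_nonzero(w)               # N - N_Z, number of non-zero weights
    if beta == 1:
        p = w[w > 0]
        return np.exp(-np.sum(p * np.log(p)))    # perplexity, Eq. (7)
    if np.isinf(beta):
        return 1.0 / w.max()                     # Eq. (8)
    return np.sum(w**beta)**(1.0 / (1.0 - beta))

# The vectors of Table 2 (k equal weights out of N = 5): a good "particle counter"
# should return k, and ESS-H does so for beta = 2 and beta -> inf.
for k in range(1, 6):
    w = np.zeros(5)
    w[:k] = 1.0 / k
    assert np.isclose(ess_huggins_roy(w, 2), k)
    assert np.isclose(ess_huggins_roy(w, np.inf), k)
```

On vectors with $k$ equal non-zero weights, every member of the family returns exactly $k$, which is the particle-counting behavior required by conditions C2, C3 and C5.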
More generally, it is possible to observe that the conditions C1, C2, C3 and C4 are fulfilled for all $\beta$ (with the exception of $\beta = 0$, which does not satisfy C4). Furthermore, the condition C5 is also satisfied, for all $\beta$, as we show next.

Proof. In order to prove that C5 is satisfied, for simplicity let us consider a vector $\bar{\mathbf{v}} = \frac{1}{2}[\bar{\mathbf{w}}, \bar{\mathbf{w}}]$, defined by repeating twice the vector $\bar{\mathbf{w}}$ (i.e., $M = 2$). In this case, we have
$$\text{ESS-H}_{2N}^{(\beta)}(\bar{\mathbf{v}}) = \left(\frac{1}{2^\beta}\sum_{n=1}^N \bar{w}_n^\beta + \frac{1}{2^\beta}\sum_{n=1}^N \bar{w}_n^\beta\right)^{\frac{1}{1-\beta}} = \left(\frac{1}{2^{\beta-1}}\sum_{n=1}^N \bar{w}_n^\beta\right)^{\frac{1}{1-\beta}} = 2\left(\sum_{n=1}^N \bar{w}_n^\beta\right)^{\frac{1}{1-\beta}} = 2\,\text{ESS-H}_N^{(\beta)}(\bar{\mathbf{w}}), \quad \forall \beta, \qquad (23)$$
which is exactly the condition in Eq. (18). The proof can easily be repeated for any value $M > 2$.

Remark 2. Hence, all G-ESS functions (except for $\beta \to 0$) belonging to the Huggins-Roy's ESS family are proper and stable. For $\beta \to 0$, the corresponding ESS is degenerate and stable. Moreover, some specific cases provided in Table 3 coincide with other proper and stable G-ESS formulas proposed in [34].

Table 3: Relevant special cases contained in the Huggins-Roy's family. They are all proper and stable, except for $N - N_Z$, which is degenerate and stable; here $N_Z$ is the number of zeros in $\bar{\mathbf{w}}$.

β → 0:    $N - N_Z$
β = 1/2:  $\left(\sum_{n=1}^N \sqrt{\bar{w}_n}\right)^2$
β → 1:    $\exp\left(-\sum_{n=1}^N \bar{w}_n \log \bar{w}_n\right)$  — perplexity, Eq. (7) [5, 40]
β = 2:    $\frac{1}{\sum_{n=1}^N \bar{w}_n^2}$  — standard formula, Eq. (6) [26]
β → ∞:    $\frac{1}{\max[\bar{w}_1, \ldots, \bar{w}_N]}$  — Eq. (8) [34]

5 Relationship with the entropy measures

5.1 Relationship with the Rényi entropy

In this section, we show the connection between the Rényi entropy and the Huggins-Roy's family.
The Rényi entropy [9] is defined as
$$R_N^{(\beta)}(\bar{\mathbf{w}}) = \frac{1}{1-\beta}\log\left[\sum_{n=1}^N \bar{w}_n^\beta\right], \quad \beta > 0. \qquad (24)$$
Then, first noting that $\frac{1}{1-\beta}\log\left[\sum_{n=1}^N \bar{w}_n^\beta\right] = \log\left[\sum_{n=1}^N \bar{w}_n^\beta\right]^{\frac{1}{1-\beta}}$, and taking the exponential of both sides of the equation above, we obtain
$$\text{ESS-H}_N^{(\beta)}(\bar{\mathbf{w}}) = \exp\left(R_N^{(\beta)}(\bar{\mathbf{w}})\right) = \left(\sum_{n=1}^N \bar{w}_n^\beta\right)^{\frac{1}{1-\beta}}, \quad \beta > 0. \qquad (25)$$
In ecology, the exponential of the Rényi entropy defines the so-called diversity indices [24]. This means that the Huggins-Roy's family contains and coincides with all the diversity indices derived from the Rényi entropy [9, 24]. See Section 7.1 for further details. Note that, for $\beta = 0$, we have $R_N^{(0)}(\bar{\mathbf{w}}) = \log(N - N_Z)$, where $N_Z = \#\{\bar{w}_n : \bar{w}_n = 0,\ n = 1, \ldots, N\}$ (see [9] for further details), so that $\text{ESS-H}_N^{(0)}(\bar{\mathbf{w}}) = N - N_Z$, as also shown in Table 3. For $\beta = 1$, we have $R_N^{(1)}(\bar{\mathbf{w}}) = -\sum_{n=1}^N \bar{w}_n \log \bar{w}_n$ [9], and then
$$\text{ESS-H}_N^{(1)}(\bar{\mathbf{w}}) = \exp\left(-\sum_{n=1}^N \bar{w}_n \log \bar{w}_n\right), \qquad (26)$$
which is the perplexity in Eq. (7) [5, 40].

5.1.1 Inequalities for the G-ESS within the Huggins-Roy family

One of the advantages of the connection with the Rényi entropy is that we can easily obtain some theoretical results about $\text{ESS-H}_N^{(\beta)}$. For instance, it is well known that [9]
$$R_N^{(0)}(\bar{\mathbf{w}}) \geq R_N^{(1)}(\bar{\mathbf{w}}) \geq R_N^{(2)}(\bar{\mathbf{w}}) \geq \ldots \geq R_N^{(\beta')}(\bar{\mathbf{w}}) \geq \ldots \geq R_N^{(\infty)}(\bar{\mathbf{w}}), \quad \beta' \geq 2.$$
Then, since $\text{ESS-H}_N^{(\beta)}$ is a monotonically increasing function of $R_N^{(\beta)}$, we can also assert
$$\text{ESS-H}_N^{(0)}(\bar{\mathbf{w}}) \geq \text{ESS-H}_N^{(1)}(\bar{\mathbf{w}}) \geq \text{ESS-H}_N^{(2)}(\bar{\mathbf{w}}) \geq \ldots \geq \text{ESS-H}_N^{(\beta')}(\bar{\mathbf{w}}) \geq \ldots \geq \text{ESS-H}_N^{(\infty)}(\bar{\mathbf{w}}). \qquad (27)$$
Namely, we can rewrite
$$\text{ESS-H}_N^{(\infty)}(\bar{\mathbf{w}}) \leq \text{ESS-H}_N^{(\beta)}(\bar{\mathbf{w}}) \leq \text{ESS-H}_N^{(0)}(\bar{\mathbf{w}}), \quad \text{i.e.,} \quad \frac{1}{\max_n \bar{w}_n} \leq \text{ESS-H}_N^{(\beta)}(\bar{\mathbf{w}}) \leq N - N_Z, \quad \beta \geq 0. \qquad (28)$$
Moreover, since from [9] we have $R_N^{(2)}(\bar{\mathbf{w}}) \leq 2R_N^{(\infty)}(\bar{\mathbf{w}})$, by exponentiation we can also write
$$\text{ESS-H}_N^{(2)}(\bar{\mathbf{w}}) \leq \left(\text{ESS-H}_N^{(\infty)}(\bar{\mathbf{w}})\right)^2. \qquad (29)$$

5.2 Relationship with the Tsallis entropy

Another famous entropy family is the so-called Tsallis entropy [46] (also known as the $q$-logarithmic entropy [24]), defined as
$$T_N^{(\alpha)}(\bar{\mathbf{w}}) = \frac{1}{\alpha-1}\left[1 - \sum_{n=1}^N \bar{w}_n^\alpha\right], \quad \alpha > 0. \qquad (30)$$
We can obtain a corresponding G-ESS family based on the Tsallis entropy, after some additional simple operations of translation and scaling, i.e.,
$$\text{ESS-T}_N^{(\alpha)}(\bar{\mathbf{w}}) = \frac{(\alpha-1)(N-1)}{1 - N^{1-\alpha}}\,T_N^{(\alpha)}(\bar{\mathbf{w}}) + 1 \qquad (31)$$
$$= \frac{N-1}{1 - N^{1-\alpha}}\left[1 - \sum_{n=1}^N \bar{w}_n^\alpha\right] + 1, \quad \alpha > 0. \qquad (32)$$
Note that $1 \leq \text{ESS-T}_N^{(\alpha)}(\bar{\mathbf{w}}) \leq N$.

Special cases. For $\alpha \to 0$, we again obtain the degenerate and stable formula $\text{ESS-T}_N^{(0)}(\bar{\mathbf{w}}) = N - N_Z$, where $N_Z = \#\{\bar{w}_n : \bar{w}_n = 0,\ n = 1, \ldots, N\}$. For $\alpha \to \infty$, we have the degenerate expression $\text{ESS-T}_N^{(\infty)}(\bar{\mathbf{w}}) = N$ if $\bar{\mathbf{w}} \neq \bar{\mathbf{w}}^{(j)}$ for all $j \in \{1, \ldots, N\}$, and $\text{ESS-T}_N^{(\infty)}(\bar{\mathbf{w}}) = 1$ if $\bar{\mathbf{w}} = \bar{\mathbf{w}}^{(j)}$ for some $j \in \{1, \ldots, N\}$. Setting $\alpha = 2$, we have
$$\text{ESS-T}_N^{(2)}(\bar{\mathbf{w}}) = N\left(1 - \sum_{n=1}^N \bar{w}_n^2\right) + 1 = N\,\text{Gini-impurity}(\bar{\mathbf{w}}) + 1, \qquad (33)$$
where we have used the definition
$$\text{Gini-impurity}(\bar{\mathbf{w}}) = 1 - \sum_{n=1}^N \bar{w}_n^2, \qquad (34)$$
which is the so-called Gini impurity or Gini's diversity index, also known as the Gini-Simpson index in biodiversity studies, and widely used in machine learning within decision tree algorithms [3, 28]. Moreover, from an ecology point of view, $\text{Gini-impurity}(\bar{\mathbf{w}})$ represents the probability that two individuals chosen at random are of different species. The Gini impurity is associated with the name of Edward H. Simpson, who introduced it as an index of diversity in 1949 [44]. Corrado Gini also used the formula above (hence the name "Gini impurity") in economics, statistics, and demography [7]. It is such a natural quantity that it has been used in many different fields, and it admits an unbiased estimator.
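The ordering in Eq. (27) and the Gini-impurity identity in Eq. (33) can be verified numerically; ess_h and ess_t are our shorthand for $\text{ESS-H}_N^{(\beta)}$ and $\text{ESS-T}_N^{(\alpha)}$, the latter using the scaling that makes $\text{ESS-T}_N^{(2)} = N \cdot \text{Gini-impurity} + 1$:

```python
import numpy as np

def ess_h(w_bar, beta):
    """Huggins-Roy family, Eq. (22), for beta not in {0, 1}."""
    return np.sum(w_bar**beta)**(1.0 / (1.0 - beta))

def ess_t(w_bar, alpha):
    """Tsallis-based G-ESS, Eq. (32), for alpha not in {0, 1}."""
    N = len(w_bar)
    return (N - 1) / (1 - N**(1.0 - alpha)) * (1 - np.sum(w_bar**alpha)) + 1

rng = np.random.default_rng(3)
w = rng.random(20)
w_bar = w / w.sum()

# ESS-H is non-increasing in beta, inherited from the Renyi entropy (Eq. (27)).
vals = [ess_h(w_bar, b) for b in (0.5, 2, 4, 8)]
assert all(a >= b for a, b in zip(vals, vals[1:]))

# alpha = 2 recovers N * Gini-impurity + 1, Eq. (33).
gini = 1 - np.sum(w_bar**2)
assert np.isclose(ess_t(w_bar, 2), 20 * gini + 1)
print(vals, ess_t(w_bar, 2))
```

The monotonicity in $\beta$ also illustrates why the simulation study in the introduction can prefer $\beta > 2$: larger $\beta$ yields systematically smaller, i.e. more conservative, ESS values.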
Despite all these benefits, Gini-impurity($\bar{w}$) is not directly an effective number: it needs an additional translation and scaling, becoming $\text{ESS-T}_N^{(2)}(\bar{w})$. Moreover, the final expression is not stable. It is also interesting to remark that the final form of $\text{ESS-T}_N^{(\alpha)}(\bar{w})$ resembles the G-ESS family $\text{ESS-V}_N^{(r)}(\bar{w})$ introduced in [34],
$$\text{ESS-V}_N^{(r)}(\bar{w}) = \frac{N^{r-1}(N-1)}{1 - N^{r-1}} \sum_{n=1}^N \bar{w}_n^r + \frac{N^r - 1}{N^{r-1} - 1}, \quad r > 0.$$
However, in general the ESS expressions contained in $\text{ESS-V}_N^{(r)}(\bar{w})$ and $\text{ESS-T}_N^{(\alpha)}(\bar{w})$ are not stable. For this reason, in this work we focus mainly on the Huggins-Roy ESS family.

Furthermore, it is also possible to find another transformation, instead of the standard exponential function $\exp(\cdot)$ (as for the Rényi entropy), that converts the Tsallis entropy into the Huggins-Roy ESS family. That is the so-called $q$-exponential function [9]:
$$\exp_{\alpha}(t) = \begin{cases} \left(1 + (1-\alpha)\,t\right)^{1/(1-\alpha)} & \text{if } \alpha \ne 1, \\ \exp(t) & \text{if } \alpha = 1. \end{cases} \qquad (35)$$
After some manipulations, we arrive at
$$\exp_{\alpha}\left(T_N^{(\alpha)}(\bar{w})\right) = \text{ESS-H}_N^{(\alpha)}(\bar{w}). \qquad (36)$$
Hence, this confirms, in a generalized sense, the definition of a diversity index as the "exponential of an entropy" given in Eq. (25) and used in ecology.

6 Other stable G-ESS expressions

All the ESS formulas contained in the Huggins-Roy family are proper and stable, as we have shown in Section 4. The converse statement is not true, i.e., there are other proper and stable G-ESS formulas that are not contained in the Huggins-Roy family. We provide some examples below.

Another degenerate and stable formula. We start with an additional example of a degenerate and stable expression:
$$\text{ESS-Plus}_N(\bar{w}) = N_+ = \#\{\bar{w}_n \ge 1/N,\ n = 1,\ldots,N\}. \qquad (37)$$
It represents the number of normalized weights greater than or equal to $1/N$. This ESS expression is stable but degenerate.
The issue is that $\text{ESS-Plus}_N(\bar{w})$ reaches the minimum value 1 even at points that are not the vertices $\bar{w}^{(j)}$ of the simplex (see Eq. (13)). For instance, with $\bar{w} = [0.8, 0, 0.2]$ we get $\text{ESS-Plus}_3(\bar{w}) = 1$, but we would like to reach the minimum value 1 only at the vertices $\bar{w}^{(1)} = [1, 0, 0]$, $\bar{w}^{(2)} = [0, 1, 0]$ and $\bar{w}^{(3)} = [0, 0, 1]$. However, $\text{ESS-Plus}_N(\bar{w})$ is much more useful than another degenerate and stable formula that we already found in Table 3, i.e., $N - N_Z$. Indeed, $N - N_Z$ is degenerate since it reaches the maximum value $N$ at any $\bar{w}$ that does not contain any zero (instead of only at $\bar{w}^*$). This makes $N - N_Z$ much less useful from a practical point of view, for instance, within a particle filter. By contrast, $\text{ESS-Plus}_N(\bar{w})$ could be perfectly employed within a particle filter, considering it as a more conservative ESS formula with respect to other ESS expressions.

Other proper and stable formulas. Let us define
$$\{\bar{w}_1^+, \ldots, \bar{w}_{N_+}^+\} = \{\text{all } \bar{w}_n \text{ such that } \bar{w}_n \ge 1/N,\ n = 1,\ldots,N\}, \qquad (38)$$
where $N_+$ is given in Eq. (37), i.e., $N_+ = \#\{\bar{w}_1^+, \ldots, \bar{w}_{N_+}^+\}$. Now, it is possible to define a corrected, proper version of ESS-Plus [34], i.e.,
$$\text{ESS-Q}_N(\bar{w}) = -N \sum_{i=1}^{N_+} \bar{w}_i^+ + N_+ + N = N_+ + N\left(1 - \sum_{i=1}^{N_+} \bar{w}_i^+\right) = N_+ + N \sum_{i=1}^{N - N_+} \bar{w}_i^- = N_+ + N\gamma, \qquad (39)$$
where the $\bar{w}_i^-$ are all the normalized weights such that $\bar{w}_i^- < 1/N$, and $\gamma = \sum_{i=1}^{N - N_+} \bar{w}_i^- \le 1$. Note that $\gamma = 0$ in the two extreme cases $\bar{w} = \bar{w}^{(j)}$ and $\bar{w} = \bar{w}^*$, where $\text{ESS-Q}_N(\bar{w}) = N_+ + 0 = N_+$, i.e., we have $\text{ESS-Q}_N(\bar{w}^{(j)}) = 1$ and $\text{ESS-Q}_N(\bar{w}^*) = N$ as expected. In all other scenarios, a portion of the total number of samples (that is, $\gamma N$ with $\gamma \le 1$) is added to $N_+$. The resulting ESS formula is proper and stable. This measure is also related to the $L_1$ distance between two pmfs [34].
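The contrast between ESS-Plus and its corrected version ESS-Q can be reproduced with a few lines of NumPy (a sketch with function names of our choosing), using the example $\bar{w} = [0.8, 0, 0.2]$ from the text:

```python
import numpy as np

def ess_plus(w):
    """Eq. (37): number of normalized weights >= 1/N (stable but degenerate)."""
    w = np.asarray(w, dtype=float)
    return int(np.sum(w >= 1.0 / w.size))

def ess_q(w):
    """Eq. (39): ESS-Q = N_+ + N * gamma, a proper and stable correction."""
    w = np.asarray(w, dtype=float)
    N = w.size
    n_plus = np.sum(w >= 1.0 / N)
    gamma = np.sum(w[w < 1.0 / N])        # mass of the "small" weights
    return n_plus + N * gamma

# the example from the text: ESS-Plus collapses to 1 away from a vertex...
assert ess_plus([0.8, 0.0, 0.2]) == 1
# ...while ESS-Q still credits part of the small-weight mass
assert np.isclose(ess_q([0.8, 0.0, 0.2]), 1 + 3 * 0.2)
# extreme points: vertex -> 1, uniform -> N
assert np.isclose(ess_q([1.0, 0.0, 0.0]), 1.0)
assert np.isclose(ess_q([1/3, 1/3, 1/3]), 3.0)
```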
Another proper and stable ESS expression introduced in the literature is based on the Gini inequality coefficient, widely applied in economics [23, 34]. First of all, we define the non-decreasing sequence of normalized weights as
$$\bar{w}_{(1)} \le \bar{w}_{(2)} \le \ldots \le \bar{w}_{(N)}, \qquad (40)$$
obtained by sorting in ascending order the entries of the vector $\bar{w}$. The Gini inequality coefficient $G(\bar{w})$, introduced in economics for measuring wealth inequality, can be defined as follows [17, 7, 23]:
$$G(\bar{w}) = \frac{2\,s(\bar{w})}{N} - \frac{N+1}{N}, \quad \text{where} \quad s(\bar{w}) = \sum_{n=1}^N n\,\bar{w}_{(n)}. \qquad (41)$$
This is not the unique formulation: there are various equivalent formulations of the Gini coefficient [47, 33]. The corresponding G-ESS function is then given by
$$\text{ESS-Gini}_N(\bar{w}) = -N\,G(\bar{w}) + N = -N\left(\frac{2\,s(\bar{w})}{N} - \frac{N+1}{N}\right) + N = -2\,s(\bar{w}) + 1 + 2N = -2\sum_{n=1}^N n\,\bar{w}_{(n)} + 1 + 2N, \qquad (42)$$
which is proper and stable. It can also be easily shown that $\text{ESS-Gini}_N(\bar{w}^{(j)}) = -2N + 1 + 2N = 1$ for all $j$, and $\text{ESS-Gini}_N(\bar{w}^*) = -2\,\frac{1}{N}\,\frac{N(N+1)}{2} + 1 + 2N = N$.

The fact that some proper and stable ESS expressions do not belong to the Huggins-Roy family shows that there is still room, and need, for further research on this topic. For instance, new proper and stable formulas could be discovered, as we show in the next section.

7 Connections with other research fields: extended range of applications

In the previous section, we have already seen that the connections with the Rényi and Tsallis entropy families reveal relationships with many studies in different fields (e.g., ecology and machine learning). The benefit of creating these bridges between fields is bidirectional: ideas used in other fields can be applied as ESS measures in an IS context and, vice versa, ESS formulas proposed for IS could be employed in other fields.
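A minimal sketch of Eqs. (40)-(42) follows (the function name is ours). It checks the two extreme values derived above and the permutation invariance that any G-ESS should satisfy:

```python
import numpy as np

def ess_gini(w):
    """Eq. (42): ESS-Gini = -2 * sum_n n * w_(n) + 1 + 2N, with the weights
    sorted in ascending order as in Eq. (40)."""
    w = np.sort(np.asarray(w, dtype=float))      # Eq. (40)
    N = w.size
    s = np.sum(np.arange(1, N + 1) * w)          # s(w) of Eq. (41)
    return -2.0 * s + 1.0 + 2.0 * N

# vertex -> 1 and uniform -> N, as derived in the text
assert np.isclose(ess_gini([1.0, 0.0, 0.0, 0.0]), 1.0)
assert np.isclose(ess_gini([0.25, 0.25, 0.25, 0.25]), 4.0)
# invariant to permutations of the weights, thanks to the sorting step
assert np.isclose(ess_gini([0.5, 0.1, 0.4]), ess_gini([0.1, 0.4, 0.5]))
```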
A clear example of this benefit is given in Section 7.3, where we report a proper and stable ESS formula that was originally introduced in political science.

7.1 ESS in ecology

The connection with the Rényi entropy shows that the G-ESS functions of the Huggins-Roy family are also diversity indices [24]. More specifically, the exponential of the Rényi entropy is known in ecology as the Hill number of order $\beta$ [24]. The Hill numbers are among the most important measures of biological diversity. For instance, the Hill number of order 0 corresponds to $\text{ESS-H}_N^{(0)}(\bar{w}) = N - N_Z$, and represents the number of species. This is also called the species richness in ecology, and is often used as a measure of diversity in the popular media and the ecology literature. However, it does not make any distinction between a rare species and a common one. Moreover, $\text{ESS-H}_N^{(0)}$ does not provide any information about the balance between the species involved.

In Section 5.2, we have seen that the formula $1 - \sum_{n=1}^N \bar{w}_n^2$ is called the Gini impurity in machine learning, whereas in ecology it is called the Gini-Simpson index, since Simpson introduced it as an index of diversity [44]. Moreover, since the sum of squares $\sum_{n=1}^N \bar{w}_n^2$ can be interpreted as a measure of concentration (see Section 7.2), the Hill number (diversity) of order 2, $\text{ESS-H}_N^{(2)}(\bar{w})$, is also called the inverse Simpson concentration in ecology [44]. Furthermore, the diversity of order $\infty$, i.e., $\text{ESS-H}_N^{(\infty)}(\bar{w}) = 1/\max_n \bar{w}_n$, is known as the Berger-Parker index in ecology [2]. While the Hill number of order 0 gives rare species the same importance as any other, the diversity of order $\infty$ ignores them and takes into account only the dominant species.
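To make the ecological reading concrete, the sketch below computes the Hill numbers of orders 0, 1, 2 and $\infty$ from a vector of species abundances (the community counts are hypothetical, chosen only to illustrate the profile):

```python
import numpy as np

def hill_number(counts, order):
    """Hill number of a given order: species richness (0), exponential of the
    Shannon entropy (1), inverse Simpson concentration (2), inverse of the
    dominant share (order -> infinity)."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()                      # abundances -> proportions
    if order == 0:
        return np.count_nonzero(p)
    if order == 1:
        nz = p[p > 0]
        return np.exp(-np.sum(nz * np.log(nz)))
    if order == np.inf:
        return 1.0 / np.max(p)
    return np.sum(p ** order) ** (1.0 / (1.0 - order))

counts = [50, 30, 15, 4, 1]              # hypothetical community, 5 species
profile = [hill_number(counts, q) for q in (0, 1, 2, np.inf)]
# diversity decreases with the order: rare species weigh less and less
assert all(a >= b for a, b in zip(profile, profile[1:]))
assert profile[0] == 5                   # species richness counts everyone
assert np.isclose(profile[-1], 2.0)      # only the dominant share (0.5) matters
```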
More generally, the parameter $\beta$ controls the sensitivity of the diversity measure $\text{ESS-H}_N^{(\beta)}$ to rare species, with higher values of $\beta$ corresponding to measures less sensitive to rare species. In other words, $\beta$ reflects the inverse of the importance given to rare species.

7.2 ESS in economics

The ESS indices have also been widely employed (under other names) as metrics of portfolio dispersion and/or concentration. The effective number of positions held in a portfolio is usually measured as
$$\text{ESS}_N(\bar{w}) = \frac{1}{\sum_{n=1}^N \bar{w}_n^2},$$
where the normalized weights $\bar{w}_n$ represent the proportion of market value invested in each security. A high value of $\text{ESS}_N(\bar{w})$ implies a very diversified portfolio (at most, $N$ different equally weighted positions). The formula $1/\sum_{n=1}^N \bar{w}_n^2$ has been shown to be one of the most efficient measures of portfolio diversification. It has also been used as a constraint to force a portfolio to hold a minimum number of effective assets, denoted for instance as $N_{\text{eff}}$ (e.g., $\|\bar{w}\|_2^2 \le N_{\text{eff}}^{-1}$).

Concentration measures. In Section 7.1, we have seen that the Hill numbers coincide with the Huggins-Roy ESS formulas. More generally, the reciprocals of the Hill numbers (hence, the reciprocals of the G-ESS formulas as well) have been used in economics as concentration measures, i.e.,
$$\text{Conc}_N^{(\beta)}(\bar{w}) = \frac{1}{\text{ESS-H}_N^{(\beta)}(\bar{w})} = \left(\sum_{n=1}^N \bar{w}_n^{\beta}\right)^{\frac{1}{\beta - 1}}, \quad \beta > 0. \qquad (43)$$
As an example, we could investigate whether an industry or a market is concentrated in the hands of a small number of large players. Let us assume there are $N$ competing companies in a given industry, each one occupying a portion of the market represented by the normalized weights $\bar{w}_1, \ldots, \bar{w}_N$; then the concentration $1/\text{ESS-H}_N^{(\beta)}(\bar{w})$ is maximized when one company has a monopoly, i.e., when $\bar{w} = \bar{w}^{(j)}$ (the $j$-th company has conquered the whole market, $\bar{w}_j = 1$).
Namely, a concentration index ranges from $1/N$ (in the case of perfect competition) to 1 (in the case of monopoly), where $N$ represents the number of companies in the market. The concentration measure for $\beta = 2$, i.e., $\text{Conc}_N^{(2)}(\bar{w}) = 1/\text{ESS-H}_N^{(2)}(\bar{w}) = \sum_{n=1}^N \bar{w}_n^2$, is known as the Herfindahl-Hirschman index in economics. Finally, in the previous section, we have also seen the application of similar indices for measuring wealth inequality, e.g., using the Gini coefficient [7, 17, 47].

7.3 ESS in political science

In political science, ESS formulas have been used to define the effective number of parties in a political system. More precisely, the authors in [29] proposed, as the effective number of parties, the formula $1/\sum_{n=1}^N \bar{w}_n^2$ (the Hill number of order 2 in ecology), where $N$ is the total number of parties and $\bar{w}_n$ is the proportion of votes of the $n$-th party. An alternative formula was introduced in political science by Grigorii Golosov [18],
$$\text{ESS-Gol}_N(\bar{w}) = \sum_{n=1}^N \frac{1}{1 + \frac{(\max_k \bar{w}_k)^2}{\bar{w}_n} - \bar{w}_n} = \sum_{n=1}^N \frac{\bar{w}_n}{\bar{w}_n + (\max_k \bar{w}_k)^2 - \bar{w}_n^2}, \qquad (44)$$
which is also proper and stable. The value $\max_k \bar{w}_k$ denotes the share of votes of the party that obtained the greatest number of votes. Other alternatives can be found in the literature [37].

7.4 ESS in quantum physics

In quantum physics, there exists a quantity related to the formula $\text{ESS-H}_N^{(2)}(\bar{w}) = 1/\sum_{n=1}^N \bar{w}_n^2$, called the participation ratio (PR), while the corresponding concentration $\text{Conc}_N^{(2)}(\bar{w}) = \sum_{n=1}^N \bar{w}_n^2$ is known as the inverse participation ratio (IPR). For a fully delocalized or spread state, we have the lowest value of the IPR, i.e., $\min(\text{IPR}) = 1/N$. On the other hand, for a fully localized state, we have the highest value of the IPR, i.e., $\max(\text{IPR}) = 1$.
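Golosov's formula (44) is easy to evaluate, using the second form on the right-hand side; the sketch below (function name ours) checks that it is proper at the two extreme points of the simplex:

```python
import numpy as np

def ess_golosov(w):
    """Golosov's effective number of parties, Eq. (44)."""
    w = np.asarray(w, dtype=float)
    w_max = np.max(w)
    nz = w[w > 0]                 # zero-share "parties" contribute nothing
    return np.sum(nz / (nz + w_max ** 2 - nz ** 2))

# proper: vertex (monopoly of votes) -> 1, uniform shares -> N
assert np.isclose(ess_golosov([1.0, 0.0, 0.0]), 1.0)
assert np.isclose(ess_golosov([0.25, 0.25, 0.25, 0.25]), 4.0)
```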
The IPR is also close to the concept of purity, whereas the PR is close to the concept of separability, both employed in quantum mechanics [49]. Moreover, since the purity $P$ of a quantum state is a quantity such that $P \le 1$, another concept naturally arises, namely the state mixedness, defined as the complement of purity, $M = 1 - P$. The quantum state is pure if $P = 1$. Figure 2 summarizes the main nomenclature described so far.

7.5 Application to model selection as effective number of components

Model selection is a fundamental task in statistics and machine learning. An interesting scenario arises when we have a family of nested models, where the model complexity can change since the number of parameters can vary (i.e., the dimension of the vector of parameters grows, building more complex models). The dimension of the vector of parameters is itself an object of inference. This is the case of order selection in polynomial regression problems or autoregressive schemes, variable selection, clustering, and dimension reduction, just to name a few. In the literature, cross-validation (CV) techniques [3] and information criteria [21, 35, 43, 45] are generally the procedures used to handle this problem. More recently, other approaches based on geometric considerations have also been proposed in the literature, such as the automatic detection of an "elbow" or "knee-point" in a non-increasing curve describing a metric of performance of the model versus its complexity [38, 39, 48, 25].

Figure 2: Graphical summary of the main nomenclature in different fields.

An effective number of variables/features (ENV) has also been proposed [36]. The ENV index is inspired by the concept of maximum area-under-the-curve (AUC) in receiver operating characteristic (ROC) curves [20] and by the Gini inequality index, described in Section 6 and mentioned in Section 7.2 [23].
In the variable selection scenario, the ENV index is given by
$$I_{\text{ENV}} = 1 + \frac{2}{V(0)} \sum_{k=1}^{N-1} V(k), \quad \text{for } V(0) \ne 0 \text{ and } V(N) = 0, \qquad (45)$$
where $V(k)$ is a non-increasing error curve, e.g., the mean square error (MSE), for a model that uses only $k \le N$ input variables (instead of all the $N$ possible variables). By construction, it is always possible to have $V(N) = 0$ (by a simple translation). It is possible to show that $1 \le I_{\text{ENV}} \le N$. The ENV index can also be defined for a non-decreasing curve by the alternative definition
$$I_{\text{ENV}} = 1 + \frac{2}{V(N)} \sum_{k=1}^{N-1} V(k), \quad \text{for } V(N) \ne 0 \text{ and } V(0) = 0. \qquad (46)$$
Thus, we can convert the ENV index into an ESS formula by building the curve $V(k)$ as follows:

• Sort the normalized weights in ascending order, $\bar{w}_{(1)} \le \bar{w}_{(2)} \le \ldots \le \bar{w}_{(N)}$.

• Build a non-decreasing curve $V(k)$, as in (46), following the recursion
$$V(k) = \sum_{i=1}^k \bar{w}_{(i)} = V(k-1) + \bar{w}_{(k)}, \qquad (47)$$
starting with $V(0) = 0$. Note that we always have $V(N) = 1$.

The corresponding ESS formula is
$$\text{ESS-ENV}_N(\bar{w}) = 1 + 2 \sum_{k=1}^{N-1} \sum_{i=1}^k \bar{w}_{(i)}. \qquad (48)$$
Note that $\text{ESS-ENV}_N(\bar{w}^{(j)}) = 1 + 0 = 1$ for all $j$, and
$$\text{ESS-ENV}_N(\bar{w}^*) = 1 + \frac{2}{N} \sum_{k=1}^{N-1} k = 1 + \frac{2}{N}\,\frac{(N-1)N}{2} = 1 + N - 1 = N. \qquad (49)$$
Remark 3. It is possible to show that $\text{ESS-ENV}_N(\bar{w})$ is proper and stable. Furthermore, it coincides with $\text{ESS-Gini}_N(\bar{w})$ in Eq. (42), i.e., $\text{ESS-ENV}_N(\bar{w}) = \text{ESS-Gini}_N(\bar{w})$. Recall that there exist different formulations of the Gini coefficient [47]; the closest one in this framework is related to the Lorenz curve [33].

Remark 4. This section opens the possibility of applying the ESS formulas as effective numbers of components in model selection problems.
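Remarks 3 and 4 can both be checked numerically. The sketch below builds ESS-ENV from the cumulative curve of Eq. (47), verifies that it coincides with ESS-Gini of Eq. (42) on random weight vectors, and then applies the same idea to a hypothetical error curve (the curve values are ours, purely illustrative):

```python
import numpy as np

def ess_env(w):
    """Eq. (48): 1 + 2 * sum_{k=1}^{N-1} V(k), with V(k) the cumulative sum
    of the ascending-sorted weights (Eq. (47))."""
    w = np.sort(np.asarray(w, dtype=float))
    V = np.cumsum(w)                       # V(1), ..., V(N); here V(N) = 1
    return 1.0 + 2.0 * np.sum(V[:-1])      # sum over k = 1, ..., N-1

def ess_gini(w):
    """Eq. (42), for comparison."""
    w = np.sort(np.asarray(w, dtype=float))
    N = w.size
    return -2.0 * np.sum(np.arange(1, N + 1) * w) + 1.0 + 2.0 * N

rng = np.random.default_rng(0)
for _ in range(5):
    w = rng.random(10)
    w /= w.sum()
    assert np.isclose(ess_env(w), ess_gini(w))   # Remark 3
# the extreme points, as in Eq. (49)
assert np.isclose(ess_env([1.0, 0.0, 0.0, 0.0]), 1.0)
assert np.isclose(ess_env([0.25, 0.25, 0.25, 0.25]), 4.0)

# Remark 4 (sketch): turn a non-increasing error curve into weights via the
# differences d_k = V(k-1) - V(k) of Eq. (50), then reuse any G-ESS formula
V = np.array([100.0, 40.0, 15.0, 5.0, 4.0, 3.5, 3.2, 3.0])  # hypothetical
d = V[:-1] - V[1:]
w_mod = d / d.sum()
n_eff = ess_gini(w_mod)
assert 1.0 <= n_eff <= w_mod.size
```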
Indeed, given a non-increasing error curve $V(k)$, i.e., $V(k-1) \ge V(k)$, we can build the normalized weights as
$$d_k = V(k-1) - V(k), \qquad \bar{w}_k = \frac{d_k}{\sum_{i=1}^N d_i}, \qquad (50)$$
for all $k = 1, \ldots, N$. Then an ESS formula can be applied to the vector $\bar{w} = [\bar{w}_1, \ldots, \bar{w}_N]$.

8 Numerical experiments

8.1 Analyzing the Huggins-Roy family

Since all the ESS functions in the Huggins-Roy family are proper and stable, and since this family contains the main relevant formulas, we focus the numerical experiments on this family. First of all, we recall the theoretical definition of the ESS in Eq. (5),
$$\text{ESS}_{\text{teo}}(h) = N\,\frac{\text{var}_{\pi}[\widehat{I}]}{\text{var}_q[\widetilde{I}]}, \qquad (51)$$
where, for simplicity, we consider a scalar $x \in \mathbb{R}$ and the integrand $h(x) = x$ (in the definition above, we have made explicit the dependence on the function $h$). Namely, $\widehat{I}$ and $\widetilde{I}$ are estimators of the expected value of a random variable $X$ with target pdf $\bar{\pi}(x)$ (defined below). In this numerical example, we compute the theoretical definition $\text{ESS}_{\text{teo}}$ approximately via Monte Carlo, and compare it with the G-ESS functions $\text{ESS-H}_N^{(\beta)}$. More specifically, we consider a univariate standard Gaussian density as target pdf,
$$\bar{\pi}(x) = \mathcal{N}(x; 0, 1), \qquad (52)$$
and a Gaussian proposal pdf,
$$q(x) = \mathcal{N}(x; \mu_p, \sigma_p^2), \qquad (53)$$
with mean $\mu_p$ and variance $\sigma_p^2$. In all the experiments, we consider $N = 1000$.

8.1.1 Varying the proposal mean $\mu_p$

In a first analysis, we keep $\sigma_p = 1$ fixed and vary $\mu_p \in [0, 2]$. Figures 3(a)-3(b) depict two scenarios in this experimental setup, corresponding to the two specific values $\mu_p = 0.5$ and $\mu_p = 1.5$. Clearly, for $\mu_p = 0$ we have the ideal Monte Carlo case, $q(x) \equiv \bar{\pi}(x)$. As $\mu_p$ increases, the proposal becomes more different from $\bar{\pi}$. We recall that $N = 1000$.
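A single run of this setup can be sketched as follows: draw from the proposal of Eq. (53), form the normalized importance weights for the target of Eq. (52), and evaluate ESS-H for a few values of $\beta$ (function names are ours; the log-weight computation drops constants shared by target and proposal):

```python
import numpy as np

def gaussian_is_weights(N, mu_p, sig_p, rng):
    """Normalized IS weights for target N(0,1) and proposal N(mu_p, sig_p^2),
    as in Eqs. (52)-(53)."""
    x = rng.normal(mu_p, sig_p, size=N)
    log_target = -0.5 * x ** 2
    log_prop = -0.5 * ((x - mu_p) / sig_p) ** 2 - np.log(sig_p)
    log_w = log_target - log_prop
    w = np.exp(log_w - np.max(log_w))      # stabilized exponentiation
    return w / w.sum()

def ess_h(w, beta):
    """Eq. (25) for finite beta > 0, beta != 1."""
    return np.sum(w ** beta) ** (1.0 / (1.0 - beta))

rng = np.random.default_rng(1)
N = 1000
for mu_p in (0.0, 0.5, 1.5):
    w = gaussian_is_weights(N, mu_p, 1.0, rng)
    e2, e4, einf = ess_h(w, 2), ess_h(w, 4), 1.0 / np.max(w)
    # ordering from Eq. (27): the higher beta, the lower ESS-H
    assert einf - 1e-9 <= e4 <= e2 + 1e-9 <= N + 1e-6
```

For $\mu_p = 0$ the weights are all equal and every ESS-H value collapses to $N$, the ideal Monte Carlo case mentioned above.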
Figure 4(a) shows the theoretical $\text{ESS}_{\text{teo}}/N$ curve (solid line), $\text{ESS-H}_N^{(2)}/N$ (circles) and $\text{ESS-H}_N^{(\infty)}/N$ (squares), averaged over $10^5$ independent runs. Note that $1/N \le \text{ESS}/N \le 1$.

Optimal linear combination of $\text{ESS-H}_N^{(2)}$ and $\text{ESS-H}_N^{(\infty)}$. The functions $\text{ESS-H}_N^{(2)}$ and $\text{ESS-H}_N^{(\infty)}$ are the most used and suggested formulas in different studies [22, 34]. Moreover, at least in this simulation scenario, they seem to play the roles of upper bound and lower bound of the true value, as shown by Figure 4(a). For this reason, we also consider the linear combination of the G-ESS formulas $\text{ESS-H}_N^{(2)}$ and $\text{ESS-H}_N^{(\infty)}$,
$$\text{Comb-ESS}_N(\bar{w}) = a_1\,\text{ESS-H}_N^{(2)}(\bar{w}) + a_2\,\text{ESS-H}_N^{(\infty)}(\bar{w}). \qquad (54)$$
This example suggests the use of
$$a_1 = 0.6245, \qquad a_2 = 0.4289, \qquad (55)$$
obtained by a least squares (LS) regression, in order to obtain an expression $\text{Comb-ESS}_N(\bar{w})$ as close as possible to the theoretical ESS curve.

Optimal $\beta$ for $\text{ESS-H}_N^{(\beta)}(\bar{w})$. Furthermore, we have computed the curves (as functions of $\mu_p$) of $\text{ESS-H}_N^{(\beta)}(\bar{w})$ for different values of $\beta$, considering a thin grid $\mathcal{G}$ of $\beta$ values from 0.2 to 50 with a step of 0.01. We consider the $L_1$ distance between each $\text{ESS-H}_N^{(\beta)}(\bar{w})$ curve and the theoretical ESS curve, i.e., $|\text{ESS-H}_N^{(\beta)} - \text{ESS}_{\text{teo}}|$, and compute
$$\beta^* = \arg\min_{\beta \in \mathcal{G}} |\text{ESS-H}_N^{(\beta)} - \text{ESS}_{\text{teo}}|. \qquad (56)$$
With this procedure, we obtain $\beta^* \approx 4$ (recall that these curves are functions of $\mu_p$ and are averaged over $10^5$ independent runs).

Discussion of the results. Figure 4(b) shows the curves of the ESS rates corresponding to the theoretical ESS curve (solid line), the best linear combination corresponding to Eqs. (54)-(55) (squares), and the curve corresponding to $\text{ESS-H}_N^{(\beta^*)}$ (dashed line). First of all, we can note that the linear combination can return values greater than 1 (recall that we are considering $\text{ESS}/N$).
Moreover, we can see that the curve corresponding to $\text{ESS-H}_N^{(4)}(\bar{w})$ fits particularly well in this numerical setup, providing a very close approximation to the theoretical ESS curve. Observe that the approximation provided by $\text{ESS-H}_N^{(4)}$ is virtually perfect for $\mu_p \le 1$. Hence, in this kind of scenario, we would suggest the use of the expression
$$\text{ESS-H}_N^{(4)}(\bar{w}) = \left(\frac{1}{\sum_{n=1}^N \bar{w}_n^4}\right)^{\frac{1}{3}}. \qquad (57)$$

Figure 3: Target and proposal pdfs: (a)-(b) with $\mu_p \in \{0.5, 1.5\}$ and both variances set to 1; (c) with $\mu_p = 0$ and $\sigma_p \in \{0.5, 0.8\}$.

8.1.2 Varying the proposal standard deviation $\sigma_p$

Now, we keep $\mu_p = 0$ fixed and vary the standard deviation of the proposal, $\sigma_p \in [0.5, 1]$. Figure 3(c) depicts the target density and the proposal density for the two specific values $\sigma_p = 0.5$ and $\sigma_p = 0.8$ used in this experimental setup. We recall that $N = 1000$ and that the results have been averaged over $10^5$ independent runs. In Figure 5(a), we can observe the results of $\text{ESS}_{\text{teo}}/N$ versus $\sigma_p$ (solid line), jointly with the curves $\text{ESS-H}_N^{(2)}/N$ (circles) and $\text{ESS-H}_N^{(\infty)}/N$ (squares).

Optimal linear combination of $\text{ESS-H}_N^{(2)}$ and $\text{ESS-H}_N^{(\infty)}$. Since the formulas $\text{ESS-H}_N^{(2)}$ and $\text{ESS-H}_N^{(\infty)}$ are the most used in practice, we again consider their linear combination,
$$\text{Comb-ESS}_N(\bar{w}) = a_1\,\text{ESS-H}_N^{(2)}(\bar{w}) + a_2\,\text{ESS-H}_N^{(\infty)}(\bar{w}). \qquad (58)$$

Figure 4: Ratio of ESS values over $N$ (with $N = 1000$) versus $\mu_p$.
The curve corresponding to the theoretical ESS value, i.e., $\text{ESS}_{\text{teo}}/N$, is shown as a black solid line in both panels. Panel (a) also depicts the curves $\text{ESS-H}_N^{(2)}/N$ (circles) and $\text{ESS-H}_N^{(\infty)}/N$ (squares). Panel (b) shows the curve $\text{ESS-H}_N^{(4)}/N$ (dashed line) and the linear combination in Eqs. (54)-(55) (squares). The approximation provided by $\text{ESS-H}_N^{(4)}$ is virtually perfect for $\mu_p \le 1$.

In this scenario, the LS solution gives
$$a_1 = 0.2715, \qquad a_2 = 0.8483, \qquad (59)$$
hence $\text{ESS-H}_N^{(\infty)}$ takes more importance in this setup. Figure 5(b) shows the curve corresponding to $\text{Comb-ESS}_N(\bar{w})/N$ with a dashed line and green squares.

Optimal $\beta$ for $\text{ESS-H}_N^{(\beta)}(\bar{w})$. Furthermore, we have computed the curves (as functions of $\sigma_p$) of $\text{ESS-H}_N^{(\beta)}(\bar{w})$ for different values of $\beta$, considering a grid of values of $\beta$ denoted as $\mathcal{G}$. We consider the $L_1$ distance between each $\text{ESS-H}_N^{(\beta)}(\bar{w})$ curve and the theoretical ESS curve, and compute
$$\beta^* = \arg\min_{\beta \in \mathcal{G}} |\text{ESS-H}_N^{(\beta)} - \text{ESS}_{\text{teo}}|. \qquad (60)$$
In this scenario, we obtain $\beta^* \approx 7.6$. The corresponding curve is depicted in Figure 5(b) with a dashed line and red triangles. We can see that we obtain a very good approximation of $\text{ESS}_{\text{teo}}/N$, although slightly worse than in the case described in the previous section. Moreover, here the optimal value is $\beta^* \approx 7.6$ whereas, in the previous section, it was $\beta^* \approx 4$.

Figure 5: Ratio of ESS values over $N$ (with $N = 1000$) versus $\sigma_p$. The curve corresponding to the theoretical ESS value, i.e., $\text{ESS}_{\text{teo}}/N$, is shown as a black solid line in both panels. Panel (a) also depicts the curves $\text{ESS-H}_N^{(2)}/N$ (circles) and $\text{ESS-H}_N^{(\infty)}/N$ (squares). Panel (b) shows the curves $\text{ESS-H}_N^{(7.6)}/N$ (dashed line) and the linear combination in Eq. (59) (squares).
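The LS fit behind the coefficients in Eqs. (55) and (59) can be reproduced schematically. The curves below are synthetic stand-ins, not the paper's Monte Carlo estimates; the point is only the mechanics of fitting $a_1, a_2$ over a grid of proposal parameters:

```python
import numpy as np

def fit_comb_ess(ess2_curve, essinf_curve, ess_ref_curve):
    """Least-squares fit of the coefficients a1, a2 of Eqs. (54)/(58),
    matching a reference curve (e.g., the Monte Carlo estimate of ESS_teo)."""
    A = np.column_stack([ess2_curve, essinf_curve])
    coeffs, *_ = np.linalg.lstsq(A, ess_ref_curve, rcond=None)
    return coeffs

# toy curves standing in for the averaged ESS-H curves over the parameter grid
ess2 = np.array([1.0, 0.8, 0.5, 0.3])
essinf = np.array([0.9, 0.6, 0.4, 0.2])
reference = 0.6 * ess2 + 0.4 * essinf        # known ground-truth combination
a1, a2 = fit_comb_ess(ess2, essinf, reference)
assert np.isclose(a1, 0.6) and np.isclose(a2, 0.4)
```

The same least-squares machinery, with the averaged curves of the experiments, yields the coefficients reported in the text.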
Discussion of the results. Figure 5(b) shows the curves of the ESS rates corresponding to the theoretical ESS curve (solid line), the best linear combination corresponding to Eqs. (58)-(59) (green squares), and the curve corresponding to $\text{ESS-H}_N^{(\beta^*)}$ (red triangles). Again, the linear combination can return values greater than 1 (recall that we are considering $\text{ESS}/N$). This behavior could be exploited in future works, since $\text{ESS}_{\text{teo}}/N$ can actually exceed 1 (see [14, Section 3.3]). Moreover, we can see that $\text{ESS-H}_N^{(7.6)}(\bar{w})$ performs particularly well in this scenario, providing a close approximation to the theoretical ESS curve. Hence, in this setup, we would suggest the use of $\text{ESS-H}_N^{(7.6)}(\bar{w})$. For simplicity in computation and comparison, one could consider the closest integer and use $\beta = 8$,
$$\text{ESS-H}_N^{(8)}(\bar{w}) = \left(\frac{1}{\sum_{n=1}^N \bar{w}_n^8}\right)^{\frac{1}{7}}. \qquad (61)$$
Finally, it is important to remark that, even if the optimal $\beta^* \approx 7.6$ (or 8) differs from the value $\beta^* \approx 4$ suggested in the previous section, both values differ from 2 (which corresponds to the typical formula employed in the literature) and both values are bigger than 2. The expression with $\beta \to \infty$, i.e., $\text{ESS-H}_N^{(\infty)} = 1/\max_n \bar{w}_n$, seems suitable as a lower bound for the theoretical value $\text{ESS}_{\text{teo}}$ in both setups. These considerations can be relevant clues for future applications and studies.

8.2 Application to variable selection in a regression problem with real data

Finding connections with other fields creates opportunities for new applications of the ESS formulas. As described in Section 7.5, the ESS can be applied in a feature selection problem to find the effective number of components. In this section, we provide an example of this application with a real dataset. Let us consider a regression problem, where we observe a dataset of $N$ pairs $\{x_n, y_n\}_{n=1}^N$, with each input vector $x_n = [x_{n,1}, \ldots$
$, x_{n,K}]$ formed by $K$ variables, and the outputs $y_n$ are scalar values [42]. We consider the case $K \le N$ and assume a linear observation model,
$$y_n = \theta_0 + \theta_1 x_{n,1} + \theta_2 x_{n,2} + \ldots + \theta_K x_{n,K} + \epsilon_n, \qquad (62)$$
where $\epsilon_n$ is Gaussian noise with zero mean and variance $\sigma_\epsilon^2$, i.e., $\epsilon_n \sim \mathcal{N}(\epsilon; 0, \sigma_\epsilon^2)$. More specifically, in this real dataset [42, 41, 15], we have $K = 122$ features and $N = 1214$ data points $x_i$. We focus on the first of the two outputs in the dataset (called "arousal"). We set $V(k) = -2\log(\ell_{\max})$ with $\ell_{\max} = \max_{\theta} p(y|\theta_k)$ and $k \le K$, after ranking the 122 variables (see [42]), where the likelihood function $p(y|\theta_k)$ is induced by Eq. (62).

In order to find the effective number of variables $N_{\text{eff}} \le K = 122$, we compare with different well-known information criteria (AIC, BIC and HQIC; considering the cost function $C(k) = V(k) + \lambda k$, each information criterion suggests the use of a different parameter $\lambda$) and other methods provided in the literature. For the spectral information criterion (SIC), we test two confidence interval parameters, 95% and 99%. We also test different stable ESS formulas, obtaining the weights as in Eq. (50). We test the expressions in the Huggins-Roy family, $\text{ESS-H}_N^{(\beta)}$, with $\beta \to 1$, $\beta = 2$, $\beta \to \infty$, and the other stable formulas given in Eqs. (37), (39), (42), and (44). All the results are rounded to the closest integer. The results provided by each method are given in Table 4.

Table 4: Results in the variable selection example with a real dataset.

Scheme      | AIC  | BIC  | HQIC | UAED | SIC-95 | SIC-99 | ENV
N_eff       | 44   | 17   | 41   | 11   | 7      | 17     | 13
Ref.        | [45] | [43] | [21] | [38] | [35]   | [35]   | [36]

ESS formula | β→1  | β=2  | β→∞  | Plus | Q      | Gini   | Gol
N_eff       | 10   | 5    | 3    | 11   | 24     | 11     | 4
Eq.         | (7)  | (6)  | (8)  | (37) | (39)   | (42)   | (44)

After an exhaustive analysis, the authors in [42, Section 4-C] suggest that there are 7 very relevant variables (level 1 of [42, Section 4-C]), another 7 relevant variables (level 2), and another 2 variables at a third level of importance; hence, overall, 16 variables among the very relevant, relevant, and important ones (16 out of 122 possible features). The minimum value in Table 4 is 3, provided by $\text{ESS-H}_N^{(\infty)}$, whereas the maximum value is 44, given by AIC. These values and the rest of the results in Table 4 are in line with the conclusions in [42]. More specifically, the results given by SIC-99, BIC, UAED, ENV, the perplexity $\text{ESS-H}_N^{(1)}$, ESS-Plus and ESS-Gini satisfy $10 \le N_{\text{eff}} \le 17$, and are close to the results of the analysis in [42]. Hence, in this experiment, some ESS formulas, such as the perplexity $\text{ESS-H}_N^{(1)}$, ESS-Plus and ESS-Gini, seem to provide good performance as effective numbers of components in model selection.

9 Conclusions

In this work, we have analyzed alternative effective sample size (ESS) measures for Monte Carlo algorithms based on importance sampling techniques. We have highlighted the connection between the practical ESS formulas used in the literature and entropy families [9]. We have shown that all the ESS functions included in the Huggins-Roy ESS family fulfill all the required theoretical conditions described in [34], and we have also highlighted the relationship of this family with the Rényi entropy [9]. We have also shown the application of the Gini impurity index as an ESS formula and its connection to the Tsallis entropy.
Furthermore, we have studied the performance of different Huggins-Roy ESS formulas by numerical simulations, also introducing an optimal linear combination of the most promising ESS indices. In two numerical examples, we have obtained the best ESS approximations within the Huggins-Roy family in two different setups,
$$\text{ESS} = \left(\frac{1}{\sum_{n=1}^N \bar{w}_n^4}\right)^{1/3} \quad \text{and} \quad \text{ESS} = \left(\frac{1}{\sum_{n=1}^N \bar{w}_n^8}\right)^{1/7}.$$
These formulas provide a good approximation (and, in the first case, almost a perfect match) of the theoretical ESS values in the two considered experimental scenarios. Moreover, the expression $\text{ESS} = 1/\max_n \bar{w}_n$, which corresponds to $\beta \to \infty$, also provides good performance in some specific cases (and plays the role of a lower bound of the ESS measures in other cases). All these considerations suggest that the use of $\beta > 2$ can be more adequate in practical applications, e.g., in order to fight sample degeneracy and impoverishment within a particle filtering algorithm.

The relationship with the entropy families has also clarified the connections with other fields: possible applications in ecology, economics, political science, and machine learning have been discussed. The application of the ESS expressions as the effective number of components in model selection seems to be promising, but should be further investigated and tested. Moreover, the construction of these connections with other fields can also yield novel contributions in the IS context. As a final consideration, finding a novel and broader family that contains all the stable ESS formulas (including those that do not belong to the Huggins-Roy family) could be an object of future research.
Acknowledgement

The work was partially supported by the project Starting Grant for Rttb, BA-GRAPH "Efficient Bayesian inference for graph-supported data", of the University of Catania (UPB-28722052144), and by the project LikeFree-BA-GRAPH funded by "PIAno di inCEntivi per la RIcerca di Ateneo 2024/2026" of the University of Catania (UPB-28722052159), Italy.

References

[1] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2):174-188, February 2002.
[2] W. H. Berger and F. L. Parker. Diversity of planktonic foraminifera in deep-sea sediments. Science (New York, N.Y.), 168(3937):1345-1347, 1970.
[3] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 1st edition, 2007.
[4] M. F. Bugallo, L. Martino, and J. Corander. Adaptive importance sampling in signal processing. Digital Signal Processing, 47:36-49, 2015.
[5] O. Cappé, R. Douc, A. Guillin, J. M. Marin, and C. P. Robert. Adaptive importance sampling in general mixture classes. Statistics and Computing, 18:447-459, 2008.
[6] O. Cappé, A. Guillin, J. M. Marin, and C. P. Robert. Population Monte Carlo. Journal of Computational and Graphical Statistics, 13(4):907-929, 2004.
[7] L. Ceriani and P. Verme. The origins of the Gini index: extracts from Variabilità e Mutabilità (1912) by Corrado Gini. The Journal of Economic Inequality, 10(3):421-443, 2012.
[8] N. Chopin. A sequential particle filter for static models. Biometrika, 89:539-552, 2002.
[9] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, New York (USA), 1991.
[10] P. M. Djurić, J. H. Kotecha, J. Zhang, Y. Huang, T. Ghirmai, M. F. Bugallo, and J. Míguez. Particle filtering. IEEE Signal Processing Magazine, 20(5):19-38, September 2003.
[11] A. Doucet, N.
de F reitas, and N. Gordon, editors. Se quential Monte Carlo Metho ds in Pr actic e . Springer, New Y ork, 2001. [12] A. Doucet, S. Go dsill, and C. Andrieu. On sequential Mon te Carlo Sampling metho ds for Ba yesian filtering. Statistics and Computing , 10(3):197–208, 2000. [13] A. Doucet and A. M. Johansen. A tutorial on particle filtering and smo othing: fifteen years later. te chnic al r ep ort , 2008. [14] V. Elvira, L. Martino, and C. P . Rob ert. Rethinking the Effective Sample Size. International Statistic al R eview , 90(3):525–550, 2022. [15] J. F an, M. Thorogo o d, and P . P asquier. Emo-soundscap es: A dataset for soundscap e emotion recognition. In 2017 Seventh international c onfer enc e on affe ctive c omputing and intel ligent inter action (ACII) , pages 196–201, 2017. 27 [16] D. Gamerman and H. F. Lop es. Markov Chain Monte Carlo: Sto chastic Simulation for Bayesian Infer enc e, . Chapman & Hall/CR C T exts in Statistical Science, 2006. [17] C. Gini. Measuremen t of inequality and incomes. The Ec onomic Journal , 31:124–126, 1921. [18] G. V. Goloso v. The effective num b er of parties: A new approach. Party Politics , 16(2):171– 192, 2010. [19] N. Gordon, D. Salmond, and A. F. M. Smith. No vel approac h to nonlinear and non-Gaussian Ba yesian state estimation. IEE Pr o c e e dings-F R adar and Signal Pr o c essing , 140:107–113, 1993. [20] J A Hanley and B J McNeil. The meaning and use of the area under a receiver op erating c haracteristic (ROC) curve. R adiolo gy , 143(1):29–36, 1982. [21] E. J. Hannan and B. G. Quinn. The determination of the order of an autoregression. Journal of the R oyal Statistic al So ciety. Series B (Metho dolo gic al) , 41(2):190–195, 1979. [22] J. H Huggins and D. M Roy . Con v ergence of sequen tial Mon te Carlo based sampling metho ds. arXiv:1503.00966 , 2015. [23] S. Inoua. Beware the Gini index! a new inequalit y measure. pr eprint arXiv:2110.01741 , pages 1–26, 2021. [24] L. Jost. En tropy and div ersit y . 
Oikos , 113(2):363–375, 2006. [25] D. Kaplan. Knee p oint, 2024. MA TLAB Cen tral File Exchange. [26] A. Kong. A note on imp ortance sampling using standardized weigh ts. T e chnic al R ep ort 348, Dep artment of Statistics, University of Chic ago , 1992. [27] A. Kong, J. S. Liu, and W. H. W ong. Sequen tial imputations and Bay esian missing data problems. Journal of the Americ an Statistic al Asso ciation , 89(425):278–288, 1994. [28] M. Krzywinski and N. Altman. Classification and regression trees. Natur e Metho ds , 14(8):757– 758, 2017. [29] M. Laakso and T aagep era R. ”effectiv e” n um b er of parties: A measure with application to w est europ e. Comp ar ative Politic al Studies , 12:3–27, 1979. [30] F. Liang, C. Liu, and R. Caroll. A dvanc e d Markov Chain Monte Carlo Metho ds: L e arning fr om Past Samples . Wiley Series in Computational Statistics, England, 2010. [31] J. S. Liu. Monte Carlo Str ate gies in Scientific Computing . Springer, 2004. [32] F. Llorente and L. Martino. Optimalit y in imp ortance sampling: a gentle survey . arXiv:2502.07396 , pages 1–40, 2025. 28 [33] M. O. Lorenz. Metho ds of measuring the concen tration of wealth. Public ations of the A meric an Statistic al Asso ciation , 9(70):209–219, 1905. [34] L. Martino, V. Elvira, and F. Louzada. Effective sample size for imp ortance sampling based on discrepancy measures. Signal Pr o c essing , 131:386–401, 2017. [35] L. Martino, R. San Millan-Castillo, and E. Morgado. Sp ectral information criterion for automatic elb o w detection. Exp ert Systems with Applic ations , 231:120705, 2023. [36] L. Martino, E. Morgado, and R. San Millan Castillo. An index of effectiv e n um b er of v ariables for uncertaint y and reliability analysis in mo del selection problems. Signal Pr o c essing , 227:109735, 2025. [37] J. Molinar. Counting the n um b er of parties: An alternativ e index. The A m eric an Politic al Scienc e R eview , 85(4):1383–1391, 1991. [38] E. Morgado, L. Martino, and R. 
San Millan-Castillo. Universal and automatic elb o w detection for learning the effectiv e num b er of comp onents in mo del selection problems. Digital Signal Pr o c essing , 140:104103, 2023. [39] A. J. On umanyi, D. N. Molokomme, S. J. Isaac, and A. M. Abu-Mahfouz. Auto elb ow: An automatic elbow detection metho d for estimating the num b er of clusters in a dataset. Applie d Scienc es , 12(15), 2022. [40] C. P . Rob ert and G. Casella. Intr o ducing Monte Carlo Metho ds with R . Springer, 2010. [41] R. San Mill´ an-Castillo, L. Martino, and E. Morgado. A v ariable selection analysis for soundscap e emotion mo delling using decision tree regression and mo dern information criteria. IEEE A c c ess , 2024. [42] R. San Mill´ an-Castillo, L. Martino, E. Morgado, and F. Llorente. An exhaustive v ariable selection study for linear mo dels of soundscape emotions: Rankings and Gibbs analysis. IEEE/A CM T r ansactions on A udio, Sp e e ch, and L anguage Pr o c essing , 30:2460–2474, 2022. [43] G. Sc h w arz et al. Estimating the dimension of a model. The annals of statistics , 6(2):461–464, 1978. [44] E. H. Simp oson. Measuremen t of div ersity . Natur e , 163(4148):688–688, Apr 1949. [45] D.J. Spiegelhalter, N. G. Best, B. P . Carlin, and A. V an der Linde. Bay esian measures of mo del complexity and fit. J. R. Stat. So c. B , 64:583–616, 2002. [46] C. Tsallis. P ossible generalization of Boltzmann-Gibbs statistics. Journal of Statistic al Physics , 52(1):479–487, Jul 1988. [47] S. Yitzhaki and E. Sc hec htman. Mor e Than a Dozen A lternative Ways of Sp el ling Gini , pages 11–31. Springer New Y ork, 2013. 29 [48] J. Zhang, P . F u, F. Meng, X. Y ang, J. Xu, and Y. Cui. Estimation algorithm for c hloroph yll-a concen trations in w ater from h yp ersp ectral images based on feature deriv ation and ensemble learning. Ec olo gic al Informatics , 71:101783, 2022. [49] K. Zyczko wski, P . Horo dec ki, A. Sanp era, and M. Lew enstein. V olume of the set of separable states. 
Phys. Rev. A, 58:883–892, 1998.

A Probabilistic interpretation

Let us define a pair of random variables $\{X_t, Z_t\}$ corresponding to a pair of samples that are independently drawn with replacement according to the pmf defined by $\bar{w}_n$, with $n = 1, \ldots, N$. We denote with $t \in \mathbb{N}$ a sub-index corresponding to the trial/experiment. We perform different independent trials. Let us also define the random variable $T = \min\{t \in \mathbb{N} \text{ such that } X_t = Z_t\}$. We aim to compute the expected number of trials needed to obtain a first pair containing the same sample twice, i.e.,
$$
E[T] = \sum_{t=1}^{\infty} t \cdot \text{Prob}(T = t). \qquad (63)
$$
Note now that
$$
\text{Prob}(T = 1) = \sum_{n=1}^{N} \bar{w}_n^2, \qquad \text{Prob}(T = 2) = \left(1 - \sum_{n=1}^{N} \bar{w}_n^2\right) \sum_{n=1}^{N} \bar{w}_n^2,
$$
and, in general,
$$
\text{Prob}(T = t) = \left(1 - \sum_{n=1}^{N} \bar{w}_n^2\right)^{t-1} \sum_{n=1}^{N} \bar{w}_n^2.
$$
Thus, replacing into Eq. (63), we have
$$
E[T] = \sum_{t=1}^{\infty} t \left(1 - \sum_{n=1}^{N} \bar{w}_n^2\right)^{t-1} \sum_{n=1}^{N} \bar{w}_n^2, \qquad (64)
$$
$$
= \left(\sum_{n=1}^{N} \bar{w}_n^2\right) \sum_{t=1}^{\infty} t \left(1 - \sum_{n=1}^{N} \bar{w}_n^2\right)^{t-1}, \qquad (65)
$$
$$
= \frac{\sum_{n=1}^{N} \bar{w}_n^2}{1 - \sum_{n=1}^{N} \bar{w}_n^2} \sum_{t=1}^{\infty} t \left(1 - \sum_{n=1}^{N} \bar{w}_n^2\right)^{t}. \qquad (66)
$$
To simplify the expression above, we can set $r = 1 - \sum_{n=1}^{N} \bar{w}_n^2$, so that we can rewrite it as
$$
E[T] = \frac{1-r}{r} \left[ \sum_{t=1}^{\infty} t r^t \right], \qquad (67)
$$
$$
= \frac{1-r}{r} \cdot \frac{r}{(1-r)^2} = \frac{1}{1-r} = \frac{1}{\sum_{n=1}^{N} \bar{w}_n^2}, \qquad (68)
$$
where we have used the equality $\sum_{t=1}^{\infty} t r^t = \frac{r}{(1-r)^2}$, valid for $0 \le r < 1$, which is a well-known result on power series.

B Other form for the ESS formula in Eq. (6)

Let us recall that $\sum_{n=1}^{N} \bar{w}_n = 1$, so that the arithmetic mean of the normalized weights is always $\mu = \frac{1}{N} \sum_{n=1}^{N} \bar{w}_n = \frac{1}{N}$. Note that the ESS formula in Eq. (6) can be written as
$$
\text{ESS}_N(\bar{\mathbf{w}}) = \frac{1}{\sum_{n=1}^{N} \bar{w}_n^2} = \frac{1}{\frac{1}{N} + N \widehat{\sigma}^2}, \qquad (69)
$$
where $\widehat{\sigma}^2 = \frac{1}{N} \sum_{n=1}^{N} (\bar{w}_n - \mu)^2$ is the variance of the normalized weights. If $\widehat{\sigma}^2 = 0$, then $\text{ESS}_N(\bar{\mathbf{w}})$ reaches the maximum value $N$.
We can write:
$$
\frac{1}{\frac{1}{N} + N \widehat{\sigma}^2} = \frac{1}{\frac{1}{N} + N \left[ \frac{1}{N} \sum_{n=1}^{N} (\bar{w}_n - \mu)^2 \right]},
$$
$$
= \frac{1}{\frac{1}{N} + N \left[ \frac{1}{N} \sum_{n=1}^{N} \bar{w}_n^2 + \frac{1}{N} \sum_{n=1}^{N} \mu^2 - \frac{2}{N} \mu \sum_{n=1}^{N} \bar{w}_n \right]},
$$
$$
= \frac{1}{\frac{1}{N} + N \left[ \frac{1}{N} \sum_{n=1}^{N} \bar{w}_n^2 + \frac{1}{N} N \mu^2 - 2 \mu^2 \right]},
$$
$$
= \frac{1}{\frac{1}{N} + N \left[ \frac{1}{N} \sum_{n=1}^{N} \bar{w}_n^2 - \mu^2 \right]},
$$
$$
= \frac{1}{\frac{1}{N} + \sum_{n=1}^{N} \bar{w}_n^2 - N \mu^2},
$$
$$
= \frac{1}{\frac{1}{N} + \sum_{n=1}^{N} \bar{w}_n^2 - \frac{1}{N}} = \frac{1}{\sum_{n=1}^{N} \bar{w}_n^2},
$$
where we have used $\sum_{n=1}^{N} \bar{w}_n = 1$ and $\mu = \frac{1}{N}$, so that $\frac{2}{N}\mu \sum_{n=1}^{N} \bar{w}_n = 2\mu^2$. The chain of equalities above proves the identity (69).
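Both appendix results are easy to check numerically. The sketch below (an illustration, not part of the paper) verifies the variance identity of Eq. (69) exactly, and estimates $E[T]$ from Appendix A by direct simulation of the pairs $(X_t, Z_t)$; the weight vector and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Normalized weights defining the pmf (any example summing to 1).
w = np.array([0.4, 0.3, 0.2, 0.1])
N = len(w)

# --- Appendix B: 1 / sum(w^2) == 1 / (1/N + N * sigma2), exactly.
ess = 1.0 / np.sum(w ** 2)
mu = 1.0 / N                       # mean of normalized weights
sigma2 = np.mean((w - mu) ** 2)    # their variance
assert np.isclose(ess, 1.0 / (1.0 / N + N * sigma2))

# --- Appendix A: E[T] == 1 / sum(w^2), by Monte Carlo.
# Each trial draws a pair (X_t, Z_t) i.i.d. from w; T is the first
# trial index with X_t == Z_t (a geometric variable with p = sum(w^2)).
runs, horizon = 20_000, 100
X = rng.choice(N, size=(runs, horizon), p=w)
Z = rng.choice(N, size=(runs, horizon), p=w)
match = (X == Z)
T = match.argmax(axis=1) + 1       # first matching trial, 1-indexed
# T.mean() should be close to ess = 1 / sum(w^2).
```

With these weights $\sum_n \bar{w}_n^2 = 0.3$, so the Monte Carlo mean of $T$ should concentrate around $1/0.3 \approx 3.33$, matching Eq. (68). (The truncation at `horizon = 100` trials is harmless here, since the probability of no match within 100 pairs is $0.7^{100}$, which is negligible.)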