Percentile rank scores are congruous indicators of relative performance, or aren't they?
Percentile ranks and the I3 indicator were introduced by Bornmann, Leydesdorff, Mutz and Opthof. These two notions are based on the concept of percentiles (or quantiles) for discrete data. As several definitions for these notions exist we propose one…
Authors: Ronald Rousseau
Ronald Rousseau

KHBO (Association K.U.Leuven), Faculty of Engineering Technology, Zeedijk 101, B-8400 Oostende, Belgium
E-mail: ronald.rousseau@khbo.be
K.U.Leuven, Dept. Mathematics, Celestijnenlaan 200B, B-3000 Leuven (Heverlee), Belgium
Universiteit Antwerpen, IBW, Venusstraat 35, B-2000 Antwerpen, Belgium

Abstract

Percentile ranks and the I3 indicator were introduced by Bornmann, Leydesdorff, Mutz and Opthof. These two notions are based on the concept of percentiles (or quantiles) for discrete data. As several definitions for these notions exist, we propose one that we think is suitable in this context. Next we show that if the notion of relative congruous indicators is carefully defined, then percentile rank scores are congruous indicators of relative performance. The I3 indicator is a strictly congruous indicator of absolute performance.

Keywords: percentiles, percentile rank scores, I3 indicators, congruous indicators

Introduction

During recent years the field of scientometrics, and especially the part involved in research evaluation, has gone through a number of major shocks, as old truths and methods no longer seem to be valid (Lundberg, 2007; Opthof & Leydesdorff, 2010; van Raan et al., 2010). How should indicators for research evaluation be calculated? Can arithmetic averages still play a prominent role? Because of problems with the so-called crown indicator and even with the standard (and other synchronous) impact factors (Rousseau & Leydesdorff, 2011), colleagues have begun a search for other indicators that do not depend on averages.

An axiomatic approach

Following the lead of mathematicians, statisticians, econometricians and ecologists (Dalton, 1920; Hardy, Littlewood & Pólya, 1952; Pielou, 1975; Stirling, 2007), informetricians have also taken recourse to axiomatic approaches.
Examples of such approaches are the study of the informetric laws (Egghe, 2005), inequality (Egghe, 2005; Rousseau, 1992), ranking journals (Palacios-Huerta & Volij, 2004) and h-type indices (Marchant, 2009; Quesada, 2009; Rousseau, 2008; Woeginger, 2008). When it seemed that there might be problems with the original crown indicator, it was an axiomatic approach that dealt the final blow. Indeed, one of the requirements for an acceptable indicator is that it be consistent (for the definition of this notion we refer the reader to (Waltman et al., 2011a)). However, in (Waltman et al., 2011a) it is shown that the (old) crown indicator (a ratio of averages) does not satisfy this requirement, while the Karolinska indicator (an average of ratios) does. Yet averages of ratios may also present some practical problems, as illustrated in (Waltman et al., 2011b).

The property we will use is based on (Waltman & Van Eck, 2009a; Waltman et al., 2011a). However, contrary to their definition, the notion of expected number of citations does not play a role at all in our approach. It is also very close to the notion of independence as defined by Bouyssou & Marchant (2011). The exact difference between consistency and independence is discussed in (Bouyssou & Marchant, 2011). As we will slightly change the definition of independence (and consistency), we refer to it by a different name, using congruous instead of consistent.

Definition 1. Congruous indicators of average performance

Let A and B denote sets of documents containing the same number of documents (#A = #B). Let A’ and B’ be the sets A and B to which the same document has been added. Then an indicator of average performance f is said to be strictly congruous if:

f(A) > f(B) if and only if f(A’) > f(B’)   (1)

It is said to be congruous if:

f(A) > f(B) ⇒ f(A’) ≥ f(B’)   (2)

Clearly, if an indicator of average performance is strictly congruous it is also congruous.
The opposite is not true. Consider, for example, the indicator F(A) = Min(3, average number of citations of articles in A). Then, if F(A) > F(B), it is easy to add to both sets an article with a large number of citations, leading to F(A’) = F(B’) = 3.

Definition 2. Congruous indicators of total performance

Let A and B denote sets of documents and let A’ and B’ be the sets A and B to which the same document has been added. Then an indicator of total performance f is said to be strictly congruous if:

f(A) > f(B) if and only if f(A’) > f(B’)   (3)

Percentile rank scores

Consider a set A and a reference set S containing all elements in A, hence A ⊂ S. Moreover, we assume that a function g from S to the positive real numbers is given, leading to the multiset g(S). Note that we consider g(S) as a multiset, as we consider the images g(s), s in S, as separate entities (even if their values are the same). A standard situation is the case that A consists of a set of articles, the set S consists of all articles in the journals in which the set A is published (published in the same year), and g is a function which maps an article to the number of citations it has received over a given period (and there may be several articles with the same number of citations).

Now a rule is given which subdivides the set S into K disjoint classes, based on the values of the function g. If a document belongs to class k then it receives a score $x_k$. Note that this score only depends on the class (and hence on S), but not on A. Again a standard situation is the case that there are 100 percentile classes (or ten decile classes). In the case of percentiles, articles belonging to the top 1% receive a score of 100, those belonging to the top 2% (and not to the top 1%) receive a score of 99, and so on.

Definition 3.
Percentile rank scores (Bornmann & Mutz, 2011; Leydesdorff et al., 2011)

Let A be a set of N documents and let $n_A(k)$ (or simply $n(k)$ when the set A is of no importance) be the number of documents in A that belong to class k. Then the percentile rank score of A is defined as:

$R(A) = \sum_{k=1}^{K} x_k \, \frac{n_A(k)}{N}$   (4)

Clearly, the value of R(A) depends not only on A, but also on the reference set S, the K classes used and their scores.

A note on percentiles and related classes

Percentile rank scores use percentiles and, usually, classes based on percentiles. Hence these two notions must be clarified. There is, however, no general agreement about how to define percentiles for discrete data, see e.g. (Hyndman & Fan, 1996). We follow the approach by (Beirlant et al., 2005). This definition is simple and has the advantage that it always gives the highest score to the best performer in a reference set. We provide an example illustrating the concepts used and compare with the definition as used in (Leydesdorff et al., 2011). Yet, in the appendix, we will also use another definition (Egghe & Rousseau, 2001) and illustrate the difference.

Given a finite multiset of numbers (some numbers may occur several times, hence this is a multiset and not necessarily a set), $M = [x_i, i = 1, \ldots, n]$, ranked from smallest to largest, we define for $x \in \mathbb{R}$:

$F_M(x) = \frac{1}{n} \, \#\{ i \,;\, x_i \le x \,;\, i = 1, \ldots, n \}$

This function is a right-continuous step function with discontinuities in the points $x_i$.

Let M = [1, 15, 79, 2, 3, 11, 15, 15, 185, 47, 71, 18, 101] or, in ranked order, M = [$x_1$ = 1, $x_2$ = 2, $x_3$ = 3, $x_4$ = 11, $x_5$ = 15, $x_6$ = 15, $x_7$ = 15, $x_8$ = 18, $x_9$ = 47, $x_{10}$ = 71, $x_{11}$ = 79, $x_{12}$ = 101, $x_{13}$ = 185]; n = 13. Then, for the example multiset M, the values of the corresponding function $F_M$ are shown in Table 1.

Table 1. Values of $F_M(x)$

$F_M(x)$    Range of x
0           x < 1
0.077       1 ≤ x < 2
0.154       2 ≤ x < 3
0.231       3 ≤ x < 11
0.308       11 ≤ x < 15
0.538       15 ≤ x < 18
0.615       18 ≤ x < 47
0.692       47 ≤ x < 71
0.769       71 ≤ x < 79
0.846       79 ≤ x < 101
0.923       101 ≤ x < 185
1           185 ≤ x
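The step function tabulated in Table 1 can be recomputed with a few lines of Python. This is only a minimal sketch: the name `ecdf` is ours, not the paper's, and exact fractions are used so that the rounding to three decimals happens only when printing.

```python
from bisect import bisect_right
from fractions import Fraction

def ecdf(values):
    """Return the function x -> #{i : x_i <= x} / n for the multiset `values`."""
    ranked = sorted(values)
    n = len(ranked)

    def F(x):
        # bisect_right counts how many ranked values are <= x
        return Fraction(bisect_right(ranked, x), n)

    return F

M = [1, 15, 79, 2, 3, 11, 15, 15, 185, 47, 71, 18, 101]
F = ecdf(M)
for x in sorted(set(M)):
    print(x, round(float(F(x)), 3))
```

Because the count uses "less than or equal", the function jumps exactly at the data points and is right-continuous, which is what the ranges in Table 1 express.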
Let now 0 < p ≤ 1; then the p-th quantile, denoted as $Q_M(p)$, is a kind of inverse of $F_M$. However, as $F_M$ is not injective, an inverse in the strict sense cannot be defined. Consequently, a choice must be made and, following (Beirlant et al., 2005), we define

$Q_M(p) = \inf\{ x \,;\, F_M(x) \ge p \}$

For p = 0 we put $Q_M(0) = \min_i x_i$. When p takes the values 0, 0.01, 0.02, …, 0.99, 1.00 then these quantiles are called percentiles. We note that, in particular, $Q_M(1) = \max_i x_i$. The function $Q_M$ is also a step function, with jumps determined by the points $x_i$ (it jumps at $p = F_M(x_i)$); in these points the function is left-continuous.

For our example, $Q_M(p)$ equals:

1 for p = 0, 0.01, …, 0.07;
2 for p = 0.08, …, 0.15;
3 for p = 0.16, …, 0.23;
11 for p = 0.24, …, 0.30;
15 for p = 0.31, …, 0.53;
18 for p = 0.54, …, 0.61;
47 for p = 0.62, …, 0.69;
71 for p = 0.70, …, 0.76;
79 for p = 0.77, …, 0.84;
101 for p = 0.85, …, 0.92;
185 for p = 0.93, …, 1.00.

Percentile values can be used to delineate classes. Examples may be: using a hundred or ten classes of equal breadth, or six classes of unequal breadth, such as the NSF categories. We propose using the following percentile classes and scores in g(S): the 99th percentile class is $[Q(0.99), Q(1.0)]$ with score 100; the 98th percentile class is $[Q(0.98), Q(0.99)[$ with score 99; and generally the t-th percentile class (t = 0, 1, …, 99) is $[Q(t/100), Q((t+1)/100)[$ with score t+1. The last percentile class (t = 0) is $[Q(0), Q(0.01)[$ with score 1. Note that these intervals are right-open, except for the 99th percentile class, which is a closed interval. In this way the largest number in the set always belongs to the highest percentile class. We further note that for a small number of discrete data most of these intervals will be empty.

Similarly, decile classes are defined as $[Q(0.9), Q(1.0)]$ with score 10; $[Q(0.8), Q(0.9)[$ with score 9; and the k-th class (k = 0, 1, 2, …, 9) is $[Q(k/10), Q((k+1)/10)[$ with score k+1.
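The quantile rule and the class intervals described above can be sketched in Python as follows. The function names `quantile` and `class_score` are illustrative assumptions, and the sketch relies on the observation that, for p > 0, the smallest x whose cumulative share reaches p is the ceil(n·p)-th smallest value of the multiset. Integer arithmetic is used for p = t/100 to avoid floating-point comparisons.

```python
def quantile(ranked, t, classes=100):
    """Quantile at p = t/classes for a ranked multiset; the minimum when t == 0."""
    n = len(ranked)
    if t == 0:
        return ranked[0]
    k = -(-n * t // classes)  # ceil(n * t / classes) in integer arithmetic
    return ranked[k - 1]

def class_score(values, x, classes=100):
    """Score t+1 of the class [Q(t/classes), Q((t+1)/classes)[ that contains x."""
    ranked = sorted(values)
    if not ranked[0] <= x <= ranked[-1]:
        raise ValueError("x lies outside the range of the reference multiset")
    # The top class is a closed interval, so it always contains the maximum.
    if x >= quantile(ranked, classes - 1, classes):
        return classes
    for t in range(classes - 1):
        if quantile(ranked, t, classes) <= x < quantile(ranked, t + 1, classes):
            return t + 1
    raise ValueError("no class found")

M = [1, 15, 79, 2, 3, 11, 15, 15, 185, 47, 71, 18, 101]
scores = {x: class_score(M, x) for x in sorted(set(M))}
print(scores)
```

Passing `classes=10` gives the decile variant. Empty classes need no special handling here: an interval whose two endpoints coincide simply never matches.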
In our example the percentile scores are: $x_{13}$ = 185 has score 100; $x_{12}$ = 101 has score 93; $x_{11}$ = 79 has score 85; $x_{10}$ = 71 has score 77; $x_9$ = 47 has score 70; $x_8$ = 18 has score 62; $x_5 = x_6 = x_7$ = 15 have score 54; $x_4$ = 11 has score 31; $x_3$ = 3 has score 24; $x_2$ = 2 has score 16 and $x_1$ = 1 has score 8.

Indeed, let us calculate, as an example, the 92nd percentile class. This interval is equal to $[Q_M(0.92), Q_M(0.93)[$ = [101, 185[ (reading values from Table 1). Hence 101 belongs to the 92nd percentile class, and hence its percentile score is 93. We further note that for our example the 93rd percentile class is empty. Indeed, $[Q_M(0.93), Q_M(0.94)[$ = [185, 185[.

How do Leydesdorff et al. (2011) determine percentiles? The quantile of paper $x_j$ is defined as $100 \cdot \#\{i : x_i < x_j\} / n$. Paper $x_j$ then belongs to percentile class $\lfloor 100 \cdot \#\{i : x_i < x_j\} / n \rfloor$ (where the symbol $\lfloor z \rfloor$ denotes the largest integer smaller than or equal to z) with score $\lfloor 100 \cdot \#\{i : x_i < x_j\} / n \rfloor + 1$.

As an example of this procedure we determine the percentile score of $x_8$ = 18. According to the Leydesdorff et al. procedure we first determine $100 \cdot \#\{i : x_i < x_8\} / n$ = 7*100/13 = 53.846. This implies that $x_8$ belongs to percentile class 53 and hence has a score equal to 54. Note that in our approach above the score is 62. As the Leydesdorff et al. procedure shifts all scores, the best performer does not receive a score of 100 (or any other chosen highest score), but a score that depends heavily on n. In our approach no ad hoc procedure (e.g. adding 0.9) is necessary to compensate for this dependence, as in (Leydesdorff & Bornmann, 2011).

Note that the following alternative procedure easily leads to the same values as in the more formal approach explained above. Determine $100 \cdot \#\{i : x_i \le x_j\} / n$ (the only difference with the Leydesdorff et al. procedure is that we added an equality sign). Paper $x_j$ then belongs to percentile class $\lfloor 100 \cdot \#\{i : x_i \le x_j\} / n \rfloor$ with score $\lfloor 100 \cdot \#\{i : x_i \le x_j\} / n \rfloor + 1$ (for $x_n$ there is no increase by one).
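The two counting procedures differ only in whether ties and the paper itself are counted, which is easy to see in code. The sketch below is an illustration under our own naming (`leydesdorff_score` and `alternative_score` are hypothetical names); the cap at 100 implements the remark that the largest element receives no increase by one.

```python
def leydesdorff_score(values, x):
    """floor(100 * #{x_i < x} / n) + 1, counting strictly smaller values."""
    n = len(values)
    below = sum(1 for v in values if v < x)
    return 100 * below // n + 1

def alternative_score(values, x):
    """floor(100 * #{x_i <= x} / n) + 1, capped at 100 for the largest value."""
    n = len(values)
    at_or_below = sum(1 for v in values if v <= x)
    return min(100 * at_or_below // n + 1, 100)

M = [1, 15, 79, 2, 3, 11, 15, 15, 185, 47, 71, 18, 101]
for x in sorted(set(M)):
    print(x, leydesdorff_score(M, x), alternative_score(M, x))
```

For the 13-element example the strict-inequality count gives the best performer a score below 100, which is the dependence on n discussed above.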
As an example of this procedure we again determine the percentile score of $x_8$ = 18. According to this alternative procedure we first determine $100 \cdot \#\{i : x_i \le x_8\} / n$ = 8*100/13 = 61.54. This implies that $x_8$ belongs to percentile class 61 and hence has a score equal to 62.

Percentile rank scores are strictly congruous indicators of average performance

The truth of this proposition does not depend on the way percentiles and percentile classes are calculated, as long as these are fixed. Yet there is a finer point to be observed, which is made further on.

Proposition. Percentile rank scores are strictly congruous indicators of average performance.

Proof. Assume that

$R(A) = \sum_{k=1}^{K} x_k \frac{n_A(k)}{N} > R(B) = \sum_{k=1}^{K} x_k \frac{n_B(k)}{N}$

(recall that A and B must have the same number of elements). We now add to A and to B a document that belongs to class j and denote the new sets by A’ and B’. Then

$R(A') = \frac{N}{N+1} R(A) + \frac{1}{N+1} x_j$, while $R(B') = \frac{N}{N+1} R(B) + \frac{1}{N+1} x_j$.

Hence $R(A') - R(B') = \frac{N}{N+1} (R(A) - R(B))$, which has the same sign as $R(A) - R(B)$, so that R(A’) > R(B’) if and only if R(A) > R(B). Removing a common document from the sets A and B leads to a similar conclusion. This shows that R is a strictly congruous indicator of average performance.

Percentile rank scores are not congruous indicators of absolute performance. Note though that they are not meant to be!

Example

Let A consist of two articles: one with score 1 and one with score 4. Its percentile rank score is (1+4)/2 = 2.5. Let B consist of four articles: two with score 1, one with score 3 and one with score 6. Its percentile rank score is (1+1+3+6)/4 = 2.75. Now we add to A and B the same article with score 5.
The new percentile rank score of A is now: (1+4+5)/3 = 3.33.
The new percentile rank score of B is: (1+1+3+5+6)/5 = 16/5 = 3.2.
So B scored higher than A before the addition, but lower afterwards.

Although we have shown that percentile rank scores are strictly congruous indicators of average performance, there is a way to violate the requirement of congruousness for relative indicators. This is the finer point we mentioned above.
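Both the small example with sets of different size and the proposition for sets of equal size can be checked numerically. The following is a hedged sketch: `rank_score` and `strictly_congruous` are names chosen here for illustration, class scores are taken as given, and the brute-force check covers only small cases, so it illustrates the proposition rather than proving it.

```python
from fractions import Fraction
from itertools import combinations_with_replacement, product

def rank_score(scores):
    """Percentile rank score of a set, given the class score of each document."""
    return Fraction(sum(scores), len(scores))

# Sets of different size (the Example above): the ordering is not preserved.
A, B = [1, 4], [1, 1, 3, 6]
before = rank_score(A) < rank_score(B)
after = rank_score(A + [5]) < rank_score(B + [5])
print(before, after)

def strictly_congruous(max_score=4, size=3):
    """Check 'R(A) > R(B) iff R(A') > R(B')' for all equal-sized small sets."""
    sets = list(combinations_with_replacement(range(1, max_score + 1), size))
    for a, b in product(sets, repeat=2):
        for j in range(1, max_score + 1):  # score of the added document
            lhs = rank_score(a) > rank_score(b)
            rhs = rank_score(a + (j,)) > rank_score(b + (j,))
            if lhs != rhs:
                return False
    return True

print(strictly_congruous())
```

Exact fractions are used so that equal scores compare as equal. The classes are held fixed throughout, which is the condition under which the proposition is stated.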
Let us consider the following counterexample.

A counterexample

Assume that classes are determined as deciles. Those in the highest decile receive a score of 10; the next 10% a score of 9 and so on. The reference set consists of 99 articles which have received 99, 98, ..., 1 citation(s). The function F is here $F(x) = \lfloor x \rfloor / 99$, for 0 ≤ x ≤ 99. The 9th decile is then $[Q(0.9), Q(1.0)]$ = [90, 99], with score 10; the 8th decile is $[Q(0.8), Q(0.9)[$ = [80, 90[, with score 9 and so on.

Group A consists of the articles receiving 96, 86, 76, 66, 56, 46, 36, 26, 16, 6 citations. Group A has a percentile rank score of 55/10 = 5.5. Group B consists of the articles receiving 89, 88, 79, 69, 59, 49, 39, 29, 19, 9 citations. Group B has a percentile rank score of 54/10 = 5.4, so that group A has a higher score than group B.

Now we add a new article to A, to B and to the reference set. This article has no citations. For the new situation we have: $F(x) = (\lfloor x \rfloor + 1) / 100$, for 0 ≤ x ≤ 99. The 9th decile is then $[Q(0.9), Q(1.0)]$ = [89, 99], with score 10; the 8th is $[Q(0.8), Q(0.9)[$ = [79, 89[, with score 9 and so on. The new score of A is (55+1)/11 ≈ 5.09. However, the new score for B is: (10+9+9+8+...+2+1)/11 = 64/11 ≈ 5.82. Hence, by adding an article without any citations, B’s percentile rank score became higher than A’s and even higher than A’s original score.

We admit that this counterexample is of a purely theoretical nature. The problem that occurred can easily be avoided by stipulating in the definition of congruousness that the added article d must already belong to the intersection of the reference sets for A and B. Adding d then does not change either of the two reference sets, and hence classes derived from percentiles do not change either. Consequently, we propose to add this requirement to the definition of congruousness. This leads to the following definition.

Definition 1a. Congruous indicators of average performance

Let A and B denote sets of documents containing the same number of documents (#A = #B).
Let A ⊂ S (the reference set of A) and let B ⊂ T (the reference set of B). Let A’ and B’ be the sets A and B to which the same document d ∈ S ∩ T has been added. Then an indicator of average performance f is said to be strictly congruous if:

f(A) > f(B) if and only if f(A’) > f(B’)   (5)

The I3 indicator

The I3 indicator (Leydesdorff & Bornmann, 2011), where I3 stands for Integrated Impact Indicator, is defined in a similar way as the percentile rank score given in equation (4). No division by N is performed, but the role of the reference set S is the same. Hence, using the notation introduced above, we have the following definition.

Definition 4. The I3 score of a set A is defined as:

$I3(A) = \sum_{k=1}^{K} x_k \, n_A(k)$   (6)

Proposition. I3 scores are strictly congruous indicators of absolute performance.

Proof. Assume that

$I3(A) = \sum_{k=1}^{K} x_k \, n_A(k) > I3(B) = \sum_{k=1}^{K} x_k \, n_B(k)$

(now A and B do not necessarily have the same number of elements). Adding to sets A and B a document that belongs to class j and denoting the new sets by A’ and B’ gives: $I3(A') = I3(A) + x_j$, while $I3(B') = I3(B) + x_j$. This trivially shows that I3 is a strictly congruous indicator of absolute performance.

Notes

Note 1
As observed by Leydesdorff and Bornmann, the I3 score of a set A can also be defined without the notion of classes. In that case one just uses the quantile value of each element in A (with respect to the reference set S). This does not change the fact that the I3 indicator is a strictly congruous indicator of absolute performance.

Note 2
The proposal by Waltman and van Eck (2009b), see also (Plomp, 1990), to use the number of highly cited publications as an indicator (the HCP-indicator) can be considered as a special case of the I3 score.
Indeed, taking only two classes (the highly cited ones and the other ones) and giving the highly cited ones a score of 1, and the other ones a score of 0, reduces formula (6) to the number of highly cited publications in a set A.

Conclusion

When the notion of relative congruous indicators is carefully defined, percentile rank scores are congruous indicators of relative performance. Similarly, the I3 indicator is a strictly congruous indicator of absolute performance. The HCP-indicator can be considered as a special (simple) case of the I3 indicator.

Acknowledgements. The author thanks Loet Leydesdorff for many useful discussions about the notions studied in this article. He also thanks Yuxian Liu, Leo Egghe, Raf Guns and Ying (Fred) Ye for helpful suggestions. Work of the author is supported by NSFC grants 70773101 and 7101017006.

References

Beirlant, J., Dierckx, G. & Hubert, M. (2005). Statistiek en Wetenschap. Acco: Leuven.
Bornmann, L. & Mutz, R. (2011). Further steps towards an ideal method of measuring citation performance: The avoidance of citation (ratio) averages in field-normalization. Journal of Informetrics, 5(1), 228-230.
Bouyssou, D. & Marchant, T. (2011). Ranking scientists and departments in a consistent manner. Journal of the American Society for Information Science and Technology (to appear).
Dalton, H. (1920). The measurement of the inequality of incomes. The Economic Journal, 30, 348-361.
Egghe, L. (2005). Power laws in the information production process: Lotkaian informetrics. Elsevier: Amsterdam.
Egghe, L. & Rousseau, R. (2001). Elementary statistics for effective library and information service management. Aslib: London.
Hardy, G., Littlewood, J.E. & Pólya, G. (1952). Inequalities (sec. ed.). Cambridge University Press, Cambridge (UK).
Hyndman, R.J. & Fan, Y. (1996). Sample quantiles in statistical packages. The American Statistician, 50(4), 361-364.
Leydesdorff, L. & Bornmann, L. (2011).
Integrated Impact Indicators (I3) compared with Impact Factors (IFs): An alternative research design with policy implications. Journal of the American Society for Information Science and Technology, 62 (to appear).
Leydesdorff, L., Bornmann, L., Mutz, R. & Opthof, T. (2011). Turning the tables on citation analysis one more time: Principles for comparing sets of documents. Journal of the American Society for Information Science and Technology, 62(7), 1370-1381.
Lundberg, J. (2007). Lifting the crown – citation z-score. Journal of Informetrics, 1(2), 145-154.
Marchant, T. (2009). An axiomatic characterization of the ranking based on the h-index and some other bibliometric rankings of authors. Scientometrics, 80(2), 325-342.
Opthof, T. & Leydesdorff, L. (2010). Caveats for the journal and field normalizations in the CWTS (“Leiden”) evaluations of research performance. Journal of Informetrics, 4(3), 423-430.
Palacios-Huerta, I. & Volij, O. (2004). The measurement of intellectual influence. Econometrica, 72, 963–977.
Pielou, E.C. (1975). Ecological diversity. Wiley, New York.
Plomp, R. (1990). The significance of the number of highly cited papers as an indicator of scientific prolificacy. Scientometrics, 19(3-4), 185-197.
Quesada, A. (2009). Monotonicity and the Hirsch index. Journal of Informetrics, 3, 158-160.
Rousseau, R. (1992). Concentration and diversity in informetric research. Doctoral thesis, UIA.
Rousseau, R. (2008). Woeginger’s axiomatization of the h-index and its relation to the g-index, the h(2)-index and the R2-index. Journal of Informetrics, 2(4), 335-340.
Rousseau, R. & Leydesdorff, L. (2011). Simple arithmetic versus intuitive understanding: the case of the impact factor. ISSI Newsletter, Vol. 7(1), 10-14.
Stirling, A. (2007). A general framework for analysing diversity in science, technology and society. Journal of the Royal Society Interface, 4(15), 707-719.
van Raan, A. F.
J., van Leeuwen, T. N., Visser, M. S., van Eck, N. J., & Waltman, L. (2010). Rivals for the crown: Reply to Opthof and Leydesdorff. Journal of Informetrics, 4(3), 431-435.
Waltman, L. & van Eck, J.N. (2009a). A taxonomy of bibliometric performance indicators based on the property of consistency. In: ISSI 2009 (B. Larsen & J. Leta, eds.). Sao Paulo: BIREME & Federal University of Rio de Janeiro, pp. 1002-1003.
Waltman, L. & van Eck, J.N. (2009b). A simple alternative to the H-index. ISSI Newsletter, 5(3), 46-48.
Waltman, L., Van Eck, N. J., Van Leeuwen, T. N., Visser, M. S., & Van Raan, A. F. J. (2011a). Towards a new crown indicator: Some theoretical considerations. Journal of Informetrics, 5(1), 37-47.
Waltman, L., Van Eck, N. J., Van Leeuwen, T. N., Visser, M. S., & Van Raan, A. F. J. (2011b). Towards a new crown indicator: an empirical analysis. Scientometrics, 87(3), 467-481.
Woeginger, G. J. (2008). An axiomatic characterization of the Hirsch-index. Mathematical Social Sciences, 56(2), 224–232.

Appendix

As said before, there exist many definitions for the notion of a quantile. In this appendix we compare the definition as proposed e.g. in (Beirlant et al., 2005), denoted Q, with the one used, e.g., in (Egghe & Rousseau, 2001). For 0 ≤ p ≤ 1, quantiles in the sense of Egghe & Rousseau (2001), denoted here as $Q^{ER}(p)$, can be defined as

$Q^{ER}(p) = \frac{1}{2} \left( \lim_{q \to p^-} Q(q) + \lim_{q \to p^+} Q(q) \right)$   (0 < p < 1)

with $Q^{ER}(0) = Q(0)$ and $Q^{ER}(1) = Q(1)$. Limits of a function in a point p are calculated by first removing this point p from the domain of definition and then calculating the limit (in this case the left-hand or the right-hand limit) [footnote 1]. If Q is continuous in the point p then $Q^{ER}(p) = Q(p)$. Hence, there can only be a difference between these two notions in points where the quantile function is discontinuous, that is, in points p of the form p = m/n, 0 < m < n, with m a natural number. In the example used in this article, n = 13, hence m/n is never a multiple of 1/100.
So for the example we have used, percentile values coincide for the two definitions. Next we present an example where percentile values (and deciles) do not coincide. We take n = 10. Let $S_0$ be a set of articles with corresponding citations [1, 3, 7, 8, 8, 12, 17, 23, 30, 60]. Then Table 2 gives the values of $F_{S_0}$.

Table 2. Values of $F_{S_0}(x)$

Range of x       $F_{S_0}(x)$
x < 1            0
1 ≤ x < 3        0.1
3 ≤ x < 7        0.2
7 ≤ x < 8        0.3
8 ≤ x < 12       0.5
12 ≤ x < 17      0.6
17 ≤ x < 23      0.7
23 ≤ x < 30      0.8
30 ≤ x < 60      0.9
60 ≤ x           1

Now $Q_{S_0}$ is defined as:

1 for 0 ≤ p ≤ 0.10
3 for 0.10 < p ≤ 0.20
7 for 0.20 < p ≤ 0.30
8 for 0.30 < p ≤ 0.50
12 for 0.50 < p ≤ 0.60
17 for 0.60 < p ≤ 0.70
23 for 0.70 < p ≤ 0.80
30 for 0.80 < p ≤ 0.90
60 for 0.90 < p ≤ 1.00

while $Q^{ER}_{S_0}$ is defined as:

1 for 0 ≤ p < 0.10; 2 for p = 0.10;
3 for 0.10 < p < 0.20; 5 for p = 0.20;
7 for 0.20 < p < 0.30; 7.5 for p = 0.30;
8 for 0.30 < p < 0.50; 10 for p = 0.50;
12 for 0.50 < p < 0.60; 14.5 for p = 0.60;
17 for 0.60 < p < 0.70; 20 for p = 0.70;
23 for 0.70 < p < 0.80; 26.5 for p = 0.80;
30 for 0.80 < p < 0.90; 45 for p = 0.90;
60 for 0.90 < p ≤ 1.00

Footnote 1. There exist different definitions of limits in a point. However, to the best of our knowledge these definitions coincide when the point p is first removed. This is the reason for this special procedure.

Corresponding decile classes, based on $Q_{S_0}$ (written Q for short), are:

[Q(0.9), Q(1.0)] = [30, 60] = {30, 60} with score 10
[Q(0.8), Q(0.9)[ = [23, 30[ = {23} with score 9
[Q(0.7), Q(0.8)[ = [17, 23[ = {17} with score 8
[Q(0.6), Q(0.7)[ = [12, 17[ = {12} with score 7
[Q(0.5), Q(0.6)[ = [8, 12[ = {8, 8} (this must be considered as a multiset) with score 6
[Q(0.4), Q(0.5)[ = Ø
[Q(0.3), Q(0.4)[ = [7, 8[ = {7} with score 4
[Q(0.2), Q(0.3)[ = [3, 7[ = {3} with score 3
[Q(0.1), Q(0.2)[ = [1, 3[ = {1} with score 2
[Q(0), Q(0.1)[ = Ø

Based on $Q^{ER}_{S_0}$ (written $Q^{ER}$ for short) we have the following deciles and decile scores:

[$Q^{ER}$(0.9), $Q^{ER}$(1.0)] = [45, 60] = {60} with score 10
[$Q^{ER}$(0.8), $Q^{ER}$(0.9)[ = [26.5, 45[ = {30} with score 9
[$Q^{ER}$(0.7), $Q^{ER}$(0.8)[ = [20, 26.5[ = {23} with score 8
[$Q^{ER}$(0.6), $Q^{ER}$(0.7)[ = [14.5, 20[ = {17} with score 7
[$Q^{ER}$(0.5), $Q^{ER}$(0.6)[ = [10, 14.5[ = {12} with score 6
[$Q^{ER}$(0.4), $Q^{ER}$(0.5)[ = [8, 10[ = {8, 8} (again a multiset) with score 5
[$Q^{ER}$(0.3), $Q^{ER}$(0.4)[ = [7.5, 8[ = Ø
[$Q^{ER}$(0.2), $Q^{ER}$(0.3)[ = [5, 7.5[ = {7} with score 3
[$Q^{ER}$(0.1), $Q^{ER}$(0.2)[ = [2, 5[ = {3} with score 2
[$Q^{ER}$(0), $Q^{ER}$(0.1)[ = [0, 2[ = {1} with score 1

This example illustrates the difference between these two definitions of percentiles.

The counterexample we presented is not valid anymore when using the function $Q^{ER}$. Hence we provide a slight variation. Consider the set g(T) = {0, 1, …, 99} as the set of citations (n = 100) received by a reference set T. Then $F_T(x) = (\lfloor x \rfloor + 1)/100$ for 0 ≤ x ≤ 99, while $F_T(x) = 0$ for x < 0 and $F_T(x) = 1$ for x ≥ 99. Consequently $Q_T(p) = 100p - 1$ for p = 0.01, 0.02, …, 0.99, 1, while $Q^{ER}_T(p) = 100p - 0.5$ for p = 0.01, 0.02, …, 0.99; and $Q^{ER}_T(1.00) = 99$.

Then $[Q_T(0.9), Q_T(1.0)]$ = [89, 99], with score 10; and similarly $[Q_T(d), Q_T(d+0.1)[$ = [100d-1, 100(d+0.1)-1[, for d = 0.1, …, 0.8, with score 10(d+0.1); and $[Q_T(0), Q_T(0.1)[$ = [0, 9[, with score 1. Further: $[Q^{ER}_T(0.9), Q^{ER}_T(1.0)]$ = [89.5, 99] with score 10; and similarly $[Q^{ER}_T(d), Q^{ER}_T(d+0.1)[$ = [100d-0.5, 100(d+0.1)-0.5[, for d = 0.1, …, 0.8, with score 10(d+0.1); and $[Q^{ER}_T(0), Q^{ER}_T(0.1)[$ = [0, 9.5[ with score 1.
When group A consists of the articles receiving 96, 86, 76, 66, 56, 46, 36, 26, 16, 6 citations, then its percentile rank score (using the classes based on the Egghe & Rousseau quantiles $Q^{ER}_T$) is 55/10 = 5.5; while group B, consisting of the articles receiving 89, 88, 79, 69, 59, 49, 39, 29, 19, 9 citations, has a percentile rank score of 54/10 = 5.4, so that group A has a higher score than group B.

Now adding an article with zero citations leads to the new reference set $T_0$ consisting of articles with citations [0, 0, 1, …, 99]. Then $F_{T_0}(x) = (\lfloor x \rfloor + 2)/101$, for 0 ≤ x ≤ 99. Consequently (see the note below)

$Q^{ER}_{T_0}(p) = Q_{T_0}(p) = \lceil 101p \rceil - 2 = 100p - 1$ for p = 0.01, 0.02, …, 0.99,

where $\lceil t \rceil$ denotes the smallest integer larger than or equal to t. Moreover, we put $Q^{ER}_{T_0}(0) = Q_{T_0}(0) = 0$ and $Q^{ER}_{T_0}(1) = Q_{T_0}(1) = 99$.

Then $[Q^{ER}_{T_0}(0.9), Q^{ER}_{T_0}(1.0)] = [Q_{T_0}(0.9), Q_{T_0}(1.0)]$ = [89, 99], with score 10; and similarly $[Q^{ER}_{T_0}(d), Q^{ER}_{T_0}(d+0.1)[ = [Q_{T_0}(d), Q_{T_0}(d+0.1)[$ = [100d-1, 100(d+0.1)-1[, for d = 0.1, …, 0.8, with score 10(d+0.1); and finally $[Q^{ER}_{T_0}(0), Q^{ER}_{T_0}(0.1)[ = [Q_{T_0}(0), Q_{T_0}(0.1)[$ = [0, 9[, with score 1. Now the new score of A is (55+1)/11 ≈ 5.09, while the new score for B is: (10+9+9+8+...+2+1)/11 = 64/11 ≈ 5.82. Again we have a counterexample.

Note: $Q^{ER}_{T_0}(p) = Q_{T_0}(p)$ for p = 0.01, 0.02, …, 0.99, as $Q_{T_0}$ is continuous for p = 0.01, 0.02, …, 0.99. Indeed, $Q_{T_0}$ is discontinuous in points p of the form m/101, with m a natural number between 1 and 100. Now m/101 is a percentile if it has the form q/100 with q a natural number between 0 and 100. In that case, one has 101*q = 100*m. Clearly, 100*m is a multiple of 100. As 101 is a prime number, 100 and 101 have no divisors in common. So 101*q = 100*m only if q is a multiple of 100. As 0 < q < 100 this is not possible. This shows that $Q_{T_0}$ is continuous for p = 0.01, 0.02, …, 0.99 and hence, in these points: $Q^{ER}_{T_0}(p) = Q_{T_0}(p)$.
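The decile counterexample of the main text (a reference set with 1, …, 99 citations, to which one zero-cited article is added) can be replayed numerically. The sketch below is an illustration only: the function names are ours, and it uses the quantile rule of the main text (the smallest value whose cumulative share reaches p), not the Egghe & Rousseau variant discussed in this appendix.

```python
from fractions import Fraction

def decile_score(reference, x):
    """Score of citation count x under the classes [Q(k/10), Q((k+1)/10)[."""
    ranked = sorted(reference)
    n = len(ranked)

    def q(t):
        # Quantile at p = t/10: the ceil(n*t/10)-th smallest value (minimum for t = 0).
        return ranked[0] if t == 0 else ranked[-(-n * t // 10) - 1]

    if x >= q(9):  # the top class is closed, so it contains the maximum
        return 10
    return next(k + 1 for k in range(9) if q(k) <= x < q(k + 1))

def rank_score(reference, group):
    """Mean decile score of the articles in `group` relative to `reference`."""
    return Fraction(sum(decile_score(reference, c) for c in group), len(group))

A = [96, 86, 76, 66, 56, 46, 36, 26, 16, 6]
B = [89, 88, 79, 69, 59, 49, 39, 29, 19, 9]
S = list(range(1, 100))   # 99 articles with 1, ..., 99 citations
S_new = [0] + S           # the zero-cited article joins the reference set

print(rank_score(S, A), rank_score(S, B))
print(rank_score(S_new, A + [0]), rank_score(S_new, B + [0]))
```

The ordering of A and B reverses only because the class boundaries shift when the reference set changes. If the same `reference` is passed before and after the addition, the ordering is preserved, which is the situation Definition 1a enforces.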