Fractional counting of authorship to quantify scientific research output

We investigate the problem of counting co-authorship in order to quantify the impact and relevance of scientific research output through normalized \textit{h-index} and \textit{g-index}. We use the papers whose authors belong to a subset of full profe…

Authors: Vincenzo Carbone (Dipartimento di Fisica, Università della Calabria; IPCF/CNR – Università della Calabria)

Vincenzo Carbone
Dipartimento di Fisica, Università della Calabria, Cosenza, Italy
IPCF/CNR - Università della Calabria, Cosenza, Italy
October 16, 2018

Abstract

We investigate the problem of counting co-authorship in order to quantify the impact and relevance of scientific research output through normalized h-index and g-index. We use the papers whose authors belong to a subset of full professors of the Italian Settore Scientifico Disciplinare (SSD) FIS01 - Experimental Physics. In this SSD two populations, characterized by the number of co-authors of each paper, are roughly present. The total number of citations for each individual, as well as their h-index and g-index, strongly depends on the average number of co-authors. We show that, in order to remove the dependence of the various indices on the two populations, the best way to define a fractional counting of authorship is to divide the number of citations received by each paper by the square root of the number of co-authors. This allows us to obtain some information which can be used for a better understanding of the scientific knowledge made through the process of writing and publishing papers.

The ensemble of papers published by a scientist at a given epoch, which have been cited by the scientific community, contains useful information on the impact and relevance of the research output of the individual. In 2005 J.E. Hirsch introduced the celebrated h-index [1], which represents a measure of research achievement and depends on both the number of a scientist's publications and their impact on his or her peers. Simply said, the h-index is the highest number of papers signed by a scientist that have each received at least that number of citations.
Thus, someone with an h-index of $H$ has published $H$ papers, each of which has received at least $H$ citations. The h-index is a better measure than other bibliometric parameters: counting total papers could reward those with many mediocre publications, whereas counting just the highest-ranked papers may not recognize a large and consistent body of work during a scientific career. The parameter immediately attracted a lot of attention from the scientific world, policy makers and the public media. The growth of the number of papers on the h-index is spectacular, and it is practically impossible to present a complete reference list (e.g. [2, 3, 4, 5, 6, 7, 8]). Scientific news editors [2] enthusiastically received the new index, and researchers in various fields of science [9, 10, 11, 12], particularly in the bibliometric research community [3, 4], started follow-up work. The idea of ranking scientists by a fair measure stirred the fire, because such rankings could make election procedures of scientific academies more objective and transparent.

Apart from the simple definition of the h-index, the conclusions of Hirsch's paper [1], which are based on the analysis of real data, are very interesting. Hirsch showed that it is hard to inflate one's own h-index, for example by self-citation, because the parameter relies on how a body of work is received over time and it is very hard to manipulate an entire career. Hirsch suggests that after 20 years in research, an $H \simeq 20$ is a sign of success, and $H \simeq 40$ indicates outstanding scientists likely to be found only at the major research laboratories. An $H \simeq 12$ should be good enough to secure university tenure [1].
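The definition of the h-index given above can be made concrete with a short sketch. The function name `h_index` and the citation record are illustrative assumptions; they are not taken from the paper or its dataset.

```python
def h_index(citations):
    """Largest H such that H papers have at least H citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the paper at this rank still has enough citations
        else:
            break         # ranks only grow and citations only shrink from here
    return h


# Hypothetical citation record (one entry per paper), for illustration only.
example_record = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example_record))
```

Sorting in decreasing order and stopping at the first rank that exceeds the citation count mirrors the verbal definition directly. A single very highly cited paper (the 25 here) counts no more than any other paper above the cut.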
What is also interesting in the data analysis by Hirsch is that, applying the method to prominent physicists, 84% of Nobel prize winners have substantial h-indices $H \geq 30$, while prominent physicists have $H \geq 50$ [1], thus indicating that Nobel prizes, or even a brilliant scientific career, do not originate in one stroke of luck but in a body of scientific work.

Among others, one of the main and perhaps the only serious disadvantage of the h-index has been revealed by L. Egghe [13, 14], who noted that the h-index is insensitive to one or several outstandingly highly cited papers. Indeed, although highly cited papers are important for the determination of the value $H$ of the h-index, once such a highly cited paper is selected to belong to the top $H$ papers, its actual number of citations at any time is not used anymore. Once a paper is selected to the top group, the h-index calculated in subsequent years remains insensitive to the citations of this paper, whatever the number of subsequent citations. To overcome this disadvantage of the h-index while keeping its advantages, the g-index has been introduced [13, 14]. Note that by definition the papers on rank $1, \ldots, H$ each have at least $H$ citations, and hence these papers have, together, at least $H^2$ citations. The parameter $G$ defined through the g-index [13] is just the largest rank such that the first $G$ papers have, together, at least $G^2$ citations. Obviously $G \geq H$ in all cases.

Actually, scientific work is in general carried out by collaborations among two or more scientists, so that a lot of attention has been given to coauthored papers, say papers signed by more than one author [15]. How are the credits of each author counted? In other words, does every author in an $n$-authored paper get a credit of 1 (total counting) or does every author get a credit of $1/n$ (fractional counting)?
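The two conventions in this question can be compared with a minimal sketch. The function names and the scheme labels `"total"` and `"fractional"` are assumptions chosen for illustration. The quantity of interest is the summed credit that a single paper hands out to all its authors.

```python
from fractions import Fraction


def author_credit(n_authors, scheme):
    """Credit assigned to each author of one n-authored paper."""
    if n_authors < 1:
        raise ValueError("a paper needs at least one author")
    if scheme == "total":
        return Fraction(1)             # every author gets a full credit
    if scheme == "fractional":
        return Fraction(1, n_authors)  # the credit is shared equally
    raise ValueError(f"unknown scheme: {scheme}")


def paper_weight(n_authors, scheme):
    """Sum of the credits of all authors of one paper."""
    return n_authors * author_credit(n_authors, scheme)


# Hypothetical author counts, for illustration only.
for n in (1, 4, 300):
    print(n, paper_weight(n, "total"), paper_weight(n, "fractional"))
```

Under total counting the weight of a paper grows with the number of authors, while under fractional counting it stays equal to one.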
In general fractional counting is preferred because it does not increase the total weight of a single paper. The same question has been posed by Hirsch [1], who states that \textit{... a scientist with a high H achieved mostly through papers with many co-authors would be treated overly kindly by his or her H. Subfields with typically large collaborations (e.g. high-energy experiments) will exhibit larger H values, and I suggest that in cases of large differences in the number of co-authors, it may be useful in comparing different individuals to normalize H by a factor that reflects the average number of co-authors} [1]. Possible solutions range from the simple division of $H$ by the average number of researchers in the publications of the Hirsch core [9, 15], to discounting the h-index for career length, multi-authorship and self-citations [16], and to taking into account the actual number of co-authors and the scientist's relative position in the byline [17]. Even if all the above proposals present advantages and disadvantages, in this paper I investigate how the fractional counting of authorship should simply work on a real case.

Aiming at a more accurate approach to fractional counting, we investigate the scientific performance of a subset of anonymous individuals. We select individuals among the Italian full professors of experimental physics belonging to the Settore Scientifico Disciplinare (SSD) FIS01. This choice is due to the fact that different individuals, belonging to incomparable experimental facilities, coexist within this SSD. Using the Thomson ISI Web of Science database (available at http://isiknowledge.com), we select all the papers of a subset of $N = 60$ full professors belonging to the above mentioned SSD, which roughly corresponds to 25% of all the full professors of the SSD.
Let us consider, for each $j$-th individual ($j = 1, 2, \ldots, 60$), the average number $M_j$ of authors per publication and the total number of citations $C_{tot}$ of the $j$-th scientist at a given epoch. The value $M_j$ is correlated with the number of publications $n_j$, thus trivially both the usual h-index and g-index are correlated with $M_j$, as can be seen in fig. 1. As suggested by Hirsch [1], the total number of citations is linearly related to $H^2$ through $C_{tot} = \alpha H^2$, with a parameter $\alpha = 4.45 \pm 0.06$, in agreement with Hirsch [1]. However, the value of $G^2$ is also related to $C_{tot}$ through the linear relation $C_{tot} = \beta G^2$, where $\beta = 1.68 \pm 0.02$ (not shown here). What is interesting in fig. 1 is that, as naively expected, two different populations are present within the SSD FIS01, which differ in the average number of co-authors of their papers. The two populations belong to the same SSD, as is well known, even if, as shown here, $n_j$ and both $H$ and $G$ are strongly dependent on $M_j$. In other words, the larger the number of co-authors, the higher the parameters which denote scientific performance. It goes without saying that it is much easier to get a high h-index when one has written many papers with many collaborators. Note that this is crucial if we conjecture that funding, tenure positions, etc. could be attributed on the basis of scientific performance. For example, it could be conjectured that an individual might be required to have an h-index greater than a threshold value, $H \geq H_{th}$, in order to access a position of full professor in the SSD FIS01. It is clear that the non-homogeneity due to the two populations within FIS01 would make an objective evaluation meaningless. In order to avoid the rejection a priori of the use of an objective scientometric index, we must investigate the problem of co-authorship.
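The linear relation $C_{tot} = \alpha H^2$ quoted above can be estimated with a one-parameter least-squares fit through the origin. The text does not say which fitting procedure was used, so the estimator below is an assumption, and the $(H, C_{tot})$ pairs are synthetic values built from the quoted $\alpha = 4.45$ only to check that the routine recovers it.

```python
def fit_through_origin(h_values, c_tot_values):
    """Least-squares slope alpha for the model C_tot = alpha * H**2."""
    x = [h * h for h in h_values]                    # regressor is H squared
    num = sum(xi * ci for xi, ci in zip(x, c_tot_values))
    den = sum(xi * xi for xi in x)
    if den == 0:
        raise ValueError("need at least one non-zero H")
    return num / den


# Synthetic, noise-free data (not the paper's dataset).
h_vals = [5, 10, 20, 30]
c_vals = [4.45 * h * h for h in h_vals]
alpha = fit_through_origin(h_vals, c_vals)
print(round(alpha, 6))
```

The same function applies to $C_{tot} = \beta G^2$ by passing the g-index values instead of the h-index values.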
I propose to weight each $i$-th paper of the $j$-th individual according to a power $m_i^\mu$ of the number of co-authors ($m_i$ is the number of co-authors of the $i$-th paper), and to compare the fractional indices which result from this operation. More formally, let us consider the weighted number of citations for each paper, $\chi^{(i)}_\mu = C_i / m_i^\mu$, where $C_i$ is the number of citations collected by the $i$-th paper, ordered such that $\chi^{(1)}_\mu > \chi^{(2)}_\mu > \ldots$. The fractional h-index $h_\mu$ and g-index $g_\mu$ for the $j$-th individual are then defined as the maximum integers such that

$$\chi^{(h)}_\mu \geq h_\mu \tag{1}$$

and

$$\sqrt{\sum_{i=1}^{g_\mu} \chi^{(i)}_\mu} \geq g_\mu \tag{2}$$

A moment of reflection suffices to realize that the maximum weight $\mu = 1$ has the same effect as no weight $\mu = 0$, because the two populations should still persist, even if with their dependence on $M_j$ roughly reversed. The most useful way to proceed is then to find the value of the parameter $0 < \mu < 1$ such that the resulting indices are independent of $M_j$. In other words, we calculate a value $\mu^\star$ which minimizes the dependence of the parameters on $M_j$. Looking at fig. 1 we can conjecture that there roughly exists a linear relation between $h_\mu$ and $M_j$, and between $g_\mu$ and $M_j$. Then, by using the relation

$$f_p(N, \mu) = \left| N \sum_{j=1}^{N} M_j p_\mu - \sum_{j=1}^{N} M_j \sum_{j=1}^{N} p_\mu \right| \tag{3}$$

(the index $p$ stands for both $h$ and $g$), we can simply define a best parameter through

$$\mu^\star(N) \simeq \min_\mu \{ f_g(N, \mu) + f_h(N, \mu) \} \tag{4}$$

Using our dataset of $N = 60$ individuals as an example, we find that the best-fit parameter which minimizes the dependence on $M_j$ is $\mu^\star \simeq 0.53 \pm 0.01$, curiously close to $1/2$. In fig. 2 we report the values of both $h_\mu$ and $g_\mu$ as a function of $M_j$, where the independence of both normalized indices from $M_j$ is clearly shown.
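The procedure of eqs. (1) to (4) can be sketched as follows. Several points are assumptions rather than the author's stated method: each paper is represented as a pair (citations $C_i$, number of authors $m_i \geq 1$), the quantity in eq. (3) is taken in absolute value, and $\mu^\star$ is located by a plain grid search. All function names and the toy data are hypothetical.

```python
def weighted_citations(papers, mu):
    """chi_mu^(i) = C_i / m_i**mu for each (C_i, m_i), in decreasing order."""
    return sorted((c / m ** mu for c, m in papers), reverse=True)


def fractional_h(papers, mu):
    """Largest integer h such that the h-th weighted count is >= h (eq. 1)."""
    h = 0
    for rank, chi in enumerate(weighted_citations(papers, mu), start=1):
        if chi < rank:
            break
        h = rank
    return h


def fractional_g(papers, mu):
    """Largest integer g whose top-g weighted counts sum to >= g**2 (eq. 2)."""
    g, running = 0, 0.0
    for rank, chi in enumerate(weighted_citations(papers, mu), start=1):
        running += chi
        if running >= rank * rank:   # same condition as sqrt(sum) >= g
            g = rank
    return g


def dependence(mean_authors, index_values):
    """|N sum(M_j p_j) - sum(M_j) sum(p_j)|, the quantity of eq. (3)."""
    n = len(mean_authors)
    cross = sum(m * p for m, p in zip(mean_authors, index_values))
    return abs(n * cross - sum(mean_authors) * sum(index_values))


def best_mu(scientists, grid):
    """Grid value of mu that minimizes f_g + f_h (eq. 4)."""
    mean_authors = [sum(m for _, m in papers) / len(papers)
                    for papers in scientists]

    def cost(mu):
        h_vals = [fractional_h(papers, mu) for papers in scientists]
        g_vals = [fractional_g(papers, mu) for papers in scientists]
        return (dependence(mean_authors, h_vals)
                + dependence(mean_authors, g_vals))

    return min(grid, key=cost)


# Toy records, one list of (citations, authors) pairs per scientist.
toy_scientists = [
    [(40, 2), (12, 3), (6, 2), (1, 1)],
    [(900, 300), (400, 250), (350, 280), (120, 310), (90, 200)],
    [(15, 4), (9, 5), (4, 3)],
]
mu_grid = [k / 20 for k in range(21)]
print(best_mu(toy_scientists, mu_grid))
```

With `mu = 0` the two functions reduce to the ordinary h-index and g-index, and with `mu = 1` every paper's citations are divided by its full author count. The expression inside `dependence` is proportional to the covariance between $M_j$ and the index, so driving it to zero removes the linear trend.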
It is worth reporting that the squares of the fractional indices are also linearly related to the fractional number $C^{(\mu)}_{tot}$ of total citations, obtained by summing the fractional citations $\chi^{(i)}_\mu$ over all papers. This is expressed through the linear relations $C^{(\mu)}_{tot} = \alpha_\mu h_\mu^2 = \beta_\mu g_\mu^2$, where $\alpha_\mu = 5.45 \pm 0.08$ and $\beta_\mu = 2.04 \pm 0.07$.

The problem of the determination of typical values $h^\star_\mu$ and $g^\star_\mu$, and mainly of their fluctuations within a given ensemble of individuals, should be of some practical interest. To evaluate these values we can build up the histograms of both $h_\mu$ and $g_\mu$ calculated for $\mu = 0.53$, which are reproduced in fig. 3. The empirical distributions for both normalized indices can be very well reproduced through a Cauchy-Lorentz distribution function

$$L_p(x) = L_0 + \frac{2A}{\pi} \frac{\sigma_p}{4(x - p)^2 + \sigma_p^2} \tag{5}$$

The positions $p$ of the maxima of the distributions can be considered as typical h-index and g-index for the class of scientists at hand, while typical fluctuations are described by the values of $\sigma_p$. In our example, the best fit corresponds to $h^\star_\mu \simeq 6.80 \pm 0.01$ with $\sigma_h \simeq 4.0 \pm 0.1$, and $g^\star_\mu \simeq 11.70 \pm 0.07$ with $\sigma_g \simeq 8.5 \pm 0.6$.

The information we obtained can be used to infer something about scientific processes of knowledge. The fact that the best way to overcome the difficulty of co-authorship seems to be weighting each paper by the square root of the number of authors of that paper is quite evocative of a random walk dynamics. This should perhaps indicate that in big experimental collaborations, whose output is a paper with a lot of co-authors, the effective work is carried out independently by relatively small groups of scientists, as usually happens in smaller laboratories within Universities.
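The Cauchy-Lorentz form of eq. (5) is easy to evaluate directly, which helps in reading the fitted parameters. In the sketch below the peak position and width are the central values quoted in the text for the normalized h-index, while the amplitude $A$ and offset $L_0$ are not given there, so $A = 1$ and $L_0 = 0$ are illustrative assumptions. The function name is hypothetical.

```python
import math


def cauchy_lorentz(x, p, sigma, area, offset=0.0):
    """L_p(x) = L_0 + (2A/pi) * sigma / (4 (x - p)**2 + sigma**2)."""
    return offset + (2.0 * area / math.pi) * sigma / (
        4.0 * (x - p) ** 2 + sigma ** 2
    )


# Central values quoted in the text; A = 1 and L_0 = 0 are assumptions.
p_h, sigma_h = 6.80, 4.0
peak = cauchy_lorentz(p_h, p_h, sigma_h, area=1.0)
half = cauchy_lorentz(p_h + sigma_h / 2.0, p_h, sigma_h, area=1.0)
print(peak, half, half / peak)
```

With a zero offset the curve peaks at $2A/(\pi \sigma_p)$ and falls to half of that value at $x = p \pm \sigma_p/2$, so in this parametrization $\sigma_p$ is the full width at half maximum.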
Moreover, the occurrence of a Cauchy-Lorentz distribution for normalized indices indicates that the various scientists tend to differentiate enough to generate a process of homogeneous broadening. Very interestingly, the fact that $\sigma_g > \sigma_h$ in the distribution functions means that the normalized g-index is the result of a larger broadening with respect to the h-index. This indicates that a successful scientific career is actually the result of a few research papers with a great impact and some more papers with fewer citations.

In conclusion, I investigated the problem of how to weight a co-authored paper in order not to reject a priori the possibility of objectively using parameters such as the h-index or the g-index. I introduced the fractional indices $h_\mu$ and $g_\mu$ built up by weighting the citations of the $i$-th paper with a power $\mu$ of the number of co-authors. The best-fit parameter which minimizes the strong dependence of the number of papers, citations and indices on the average number of co-authors is close to $\mu \simeq 1/2$. More interestingly, we found that, at least within the SSD FIS01 where two populations of scientists coexist, the above fractional count can give rise to a single population. The information on the distribution functions of normalized indices could be very useful, for example, during the selection procedures of scientific academies, research funding and tenure decisions, which are often seen as opaque, clubby and capricious. In fact the hypothetical committee could be free to use threshold values $h_{th} \simeq h^\star - r\sigma_h$ and $g_{th} \simeq g^\star - r\sigma_g$, where $r$ is an arbitrary parameter, as one of the objective criteria to select younger scientists.
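The threshold rule above amounts to simple arithmetic, sketched here with the central values of the fit quoted in the text. The values of $r$ are arbitrary choices for illustration, as is the function name. The text leaves $r$ entirely to the committee.

```python
def threshold(typical, sigma, r):
    """Threshold value typical - r * sigma for a normalized index."""
    return typical - r * sigma


# Central values of the fit reported in the text.
H_STAR, SIGMA_H = 6.80, 4.0
G_STAR, SIGMA_G = 11.70, 8.5

# Illustrative values of the arbitrary parameter r.
for r in (0.0, 0.5, 1.0):
    h_th = threshold(H_STAR, SIGMA_H, r)
    g_th = threshold(G_STAR, SIGMA_G, r)
    print(f"r = {r:.1f}: h_th = {h_th:.2f}, g_th = {g_th:.2f}")
```

Larger values of $r$ lower both thresholds, and because $\sigma_g > \sigma_h$ the g-index threshold drops faster.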
Of course this is just one of the possible ways to overcome the problem, and different methods can be investigated, even if they might be aimed at the solution of the presence of a double population. Bibliometric indicators such as the normalized h-index and g-index, which as we showed are useful parameters to evaluate the output of science and which give us some information about the way scientists actually work, cannot be considered as the only yardstick to evaluate the career of an individual.

Acknowledgments: I'm very grateful to C. Basile and S. Donato who provided a partial dataset used in the paper. I'm grateful to P. Veltri and R. Bartolino for fruitful discussions.

References

[1] Hirsch, J.E. (2005) An index to quantify an individual's scientific research output. PNAS 102, 16569-16572.
[2] Ball, P. (2005) Index aims for fair ranking of scientists. Nature 436, 900.
[3] Bornmann, L. & Daniel, H.D. (2005) Does the h-index for ranking of scientists really work? Scientometrics 65, 391-392.
[4] Braun, T., Glänzel, W. & Schubert, A. (2005) A Hirsch-type index for journals. The Scientist 19, 22.
[5] Egghe, L. & Rousseau, R. (2006) An informetric model for the Hirsch index. Scientometrics 69, 121-129.
[6] Glänzel, W. (2006) On the H-index - A mathematical approach to a new measure of publication activity and citation impact. Scientometrics 67, 315-321.
[7] Rousseau, R. (2007) The influence of missing publications on the Hirsch index. J. of Informetrics 1, 2-7.
[8] van Raan, A.F.J. (2006) Comparison of the Hirsch-index with standard bibliometric indicators and with peer judgment for 147 chemistry research groups. Scientometrics 67, 491-502.
[9] Batista, P.D., Campiteli, M.G., Kinouchi, O. & Martinez, A.S. (2006) Is it possible to compare researchers with different scientific interests? Scientometrics 68, 179-189.
[10] Iglesias, J.E. & Pecharroman, C. (2007) Scaling the h-index for different scientific ISI fields. Scientometrics 73, 303-320.
[11] Popov, S.B. (2005) A parameter to quantify dynamics of a researcher's scientific activity, preprint available at http://arxiv.org/abs/physics/0508113.
[12] Radicchi, F., Fortunato, S. & Castellano, C. (2008) Universality of citation distributions: Toward an objective measure of scientific impact. PNAS 105, 17268-17272.
[13] Egghe, L. (2006) Theory and practice of the g-index. Scientometrics 69, 131-152.
[14] Egghe, L. (2006) How to improve the h-index. The Scientist 20, 14.
[15] Egghe, L. (2008) Mathematical theory of the h-index and g-index in case of fractional counting of authorship. J. American Society for Information Science and Technology 59, 1608-1616.
[16] Burrell, Q. (2007) Should the h-index be discounted? In: W. Glänzel and A. Schubert (editors). The multidimensional world of Tibor Braun. Leuven: ISSI, 65-68.
[17] Wan, J.-K., Hua, P.-H. & Rousseau, R. (2008) The pure h-index: calculating an author's h-index by taking co-authors into account, preprint.

[Figure 1 axes: upper panel, # papers vs. average # authors; lower panel, H, G indices vs. average # authors.]
Figure 1: In the upper panel we report the number of papers as a function of the average number of coauthors for a given individual. In the lower panel we report both the indices $H$ (squares) and $G$ (triangles) as a function of the average number of coauthors for a given individual.

[Figure 2 axes: upper panel, H indices vs. average # authors; lower panel, G indices vs. average # authors.]
Figure 2: In the upper panel we report the values of both $H$ (white symbols) and of their normalized value $h_\mu$ (black symbols) for $\mu = 0.53$. In the lower panel we report the values of both $G$ (white symbols) and of their normalized value $g_\mu$ (black symbols) for $\mu = 0.53$.

[Figure 3 axes: probability density vs. binned H, G indices.]
Figure 3: We report the binned values of both $h_\mu$ (circles) and $g_\mu$ (squares) for $\mu = 0.53$. Superimposed as full lines we report the fitted Cauchy-Lorentz functions $L_h(x)$ and $L_g(x)$ (see text).
