Archetypal Athletes
Discussions of outstanding athletes, whether outstanding in a positive or a negative sense, are common practice. The rapidly growing amount of collected sports data now makes it possible to support such discussions with state-of-the-art statistical methodology. Given a (multivariate) da…
Authors: Manuel J. A. Eugster
Manuel J. A. Eugster
Ludwig-Maximilians-Universität München

Discussions of outstanding athletes, whether outstanding in a positive or a negative sense, are common practice. The rapidly growing amount of collected sports data now makes it possible to support such discussions with state-of-the-art statistical methodology. Given a (multivariate) data set with collected data of athletes within a specific sport, outstanding athletes are values on the data set boundary. In the present paper we propose archetypal analysis to compute these extreme values. The so-called archetypes, i.e., archetypal athletes, approximate the observations as convex combinations. We interpret the archetypal athletes and their characteristics and, furthermore, the composition of all athletes based on the archetypal athletes. The application of archetypal analysis is demonstrated on basketball statistics and soccer skill ratings.

Keywords: archetypal analysis, convex hull, extreme value, basketball, soccer

1 Introduction

"Dirk Nowitzki is the best basketball player. No, it's Kevin Durant!" "Cristiano Ronaldo is the number one, Lionel Messi the number two soccer player in the world." "Ronaldinho is the better dribbler, but Zinédine Zidane is faster." These and similar statements can be found in almost all discussions on sports and the practicing athletes. They are interesting to debate, but they also have a great impact on many (managerial) decisions, from a coach's tactical specification via engagements of new players through to a company's selection of a brand ambassador. As a consequence, more and more sports data are collected. A large number of statistics (the variables) per sport and athlete (the observations) are measured in order to investigate such statements using state-of-the-art statistical methodology.
The foundations of statements like the introductory examples are (possibly implicit) constructed orders of the athletes. Given that no uniquely defined strict order (and therefore no minimum and maximum) exists for observations with more than one dimension, most approaches are based on an appropriate reduction of the collected statistics to the one-dimensional space (where a strict order exists). General methods are, for example, ordination methods like multidimensional scaling and principal components analysis (e.g., 8); specialized methods are, for example, the EA Sports Player Performance Index (previously Actim index) for soccer (11) and the Total Player Rating for baseball (16). Obviously, the reduction to the one-dimensional space implies a loss of information: it enables a simple ranking of the athletes, but in case of an objective evaluation it might cause discrepancies.

Archetypal analysis has the aim to find a few, not necessarily observed, extremal observations (the archetypes) in a multivariate data set such that all the data can be well represented as convex combinations of the archetypes. The archetypes themselves are restricted to being convex combinations of the individuals in the data set and lie on the data set boundary, i.e., the convex hull. This statistical method was first introduced by Cutler and Breiman (3) and has found applications in different areas, e.g., in economics (10, 13), astrophysics (2), and pattern recognition (1).

Archetypes can be seen as data-driven extreme values. In sports data, these extreme values are the archetypal athletes: athletes who are outstanding, positively and/or negatively, in one or more of the collected statistics. For interpretation, we identify the archetypal athletes as different types of "good" and "bad", and set the observations in relation to them.
Statements like "Dirk Nowitzki is the best basketball player" are then easily verified: the athlete has to be an archetypal athlete (or its nearest observation). Furthermore, statements like "Ronaldinho is the better dribbler" are verified by interpreting not only the observations' nearest archetypes but their (convex) combinations of all archetypes.

The paper is organized as follows. In Section 2 we outline archetypal analysis by introducing the formal optimization problem. We illustrate the idea of archetypal analysis using a two-dimensional subset of NBA player statistics from the season 2009/2010. In Section 3 we then identify and discuss archetypal athletes for two popular sports. Section 3.1 extends the illustrative NBA example and computes archetypal basketball players using common statistics from the season 2009/2010. Section 3.2 computes archetypal soccer players of the German Bundesliga, the English Premier League, the Italian Lega Serie A, and the Spanish La Liga using skill ratings (as of September 2011). Finally, in Section 4 the conclusions are given. All data sets and source codes for replicating our analyses are freely available (see the section on computational details).

2 Archetypal analysis

Consider an n × m matrix X representing a multivariate data set with n observations and m attributes. For given k, the archetypal problem is to find the matrix Z of k m-dimensional archetypes. More precisely, the problem is to find the two n × k coefficient matrices α and β which minimize the residual sum of squares

$$\mathrm{RSS} = \| X - \alpha Z^\top \|_2 \quad \text{with} \quad Z = X^\top \beta \tag{1}$$

subject to the constraints

$$\sum_{j=1}^{k} \alpha_{ij} = 1 \quad \text{with } \alpha_{ij} \ge 0 \text{ and } i = 1, \ldots, n,$$

$$\sum_{i=1}^{n} \beta_{ji} = 1 \quad \text{with } \beta_{ji} \ge 0 \text{ and } j = 1, \ldots, k.$$
The constraints imply that (1) the approximated data are convex combinations of the archetypes, i.e., $X = \alpha Z^\top$, and (2) the archetypes are convex combinations of the data points, i.e., $Z = X^\top \beta$. $\| \cdot \|_2$ denotes the Euclidean matrix norm.

Cutler and Breiman (3) present an alternating constrained least squares algorithm to solve the problem: it alternates between finding the best α for given archetypes Z and finding the best archetypes Z for given α; at each step several convex least squares problems are solved, and the overall RSS is reduced successively.

By the definition of the problem, archetypes lie on the boundary of the convex hull of the data. Let N be the number of data points which define the boundary of the convex hull; then Cutler and Breiman (3) showed: if 1 < k < N, there are k archetypes on the boundary which minimize the RSS; if k = N, exactly the data points which define the convex hull are the archetypes with RSS = 0; and if k = 1, the sample mean minimizes the RSS. In practice, however, these theoretical results cannot always be achieved (6). Furthermore, there is no rule for the correct number of archetypes k for a given problem instance. A simple method to determine the value of k is to run the algorithm for increasing values of k and use the "elbow criterion" on the RSS, where a "flattening" of the curve indicates the correct value of k. For detailed explanations we refer to Cutler and Breiman (3, on the original algorithm), Eugster and Leisch (6, on numerical issues, stability, and computational complexity), and Eugster and Leisch (7, on robustness).
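To make Formula 1 and the boundary cases concrete, the following is a minimal Python sketch. It only evaluates the RSS for given α and β; it is not the alternating least squares algorithm of Cutler and Breiman. Two things are assumptions made for illustration: the norm is taken to be the Frobenius norm (the text only speaks of a Euclidean matrix norm), and the data set is a hypothetical toy example consisting of the four corners of the unit square plus its center. The helper names are illustrative and are not part of the archetypes package.

```python
import math


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def is_convex(rows, tol=1e-9):
    """True if every row is non-negative and sums to 1."""
    return all(min(r) >= -tol and abs(sum(r) - 1.0) <= tol for r in rows)


def rss(X, alpha, beta):
    """Evaluate ||X - alpha Z^T|| with Z = X^T beta (Frobenius norm assumed).

    X is n x m, alpha is n x k (rows are convex weights),
    beta is n x k (columns are convex weights).
    """
    assert is_convex(alpha)
    assert is_convex(transpose(beta))
    Z = matmul(transpose(X), beta)          # m x k, one archetype per column
    approx = matmul(alpha, transpose(Z))    # n x m approximation of X
    return math.sqrt(sum((x - a) ** 2
                         for row_x, row_a in zip(X, approx)
                         for x, a in zip(row_x, row_a)))


# Hypothetical toy data: four hull points (unit square) and one interior point.
X = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]]

# Case k = N = 4: the archetypes are exactly the four hull points.
beta_hull = [[1.0, 0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 1.0],
             [0.0, 0.0, 0.0, 0.0]]
alpha_hull = [[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.25, 0.25, 0.25, 0.25]]
rss_hull = rss(X, alpha_hull, beta_hull)

# Case k = 1: compare the sample mean with a corner point as single archetype.
alpha_one = [[1.0] for _ in X]
beta_mean = [[0.2] for _ in X]
beta_corner = [[1.0], [0.0], [0.0], [0.0], [0.0]]
rss_mean = rss(X, alpha_one, beta_mean)
rss_corner = rss(X, alpha_one, beta_corner)
```

In this toy setting `rss_hull` is zero, because every point is an exact convex combination of the four hull points, and `rss_mean` is smaller than `rss_corner`, which agrees with the statement that the sample mean is the best single archetype.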
In order to illustrate archetypal analysis, we use a two-dimensional subset of the NBA player statistics from the season 2009/2010 which we analyze in Section 3.1: the two variables are total minutes played (Min) and field goals made (FGM) of 441 players, i.e., we investigate "the score efficiency". Figure 1a shows the data set. The majority of players are in the range [0, 1000] of Min and [0, 200] of FGM. With increasing Min, the variance in FGM increases, and the shape of the data set suggests the estimation of three archetypes. Figure 1b visualizes the k = 3 archetypes solution (red), together with the data's convex hull (gray). We see that this archetypes solution is a reasonable approximation of the convex hull (note that the archetypes do not have to be observed data points). Using this solution, the data points inside the archetypes solution are exactly approximated, whereas the data points outside the archetypes solution are approximated with an error, as they are projected onto the hull of the archetypes solution.

Figure 1: (a) Data set of two NBA player statistics (Min versus FGM) from the season 2009/2010. (b) Convex hull (gray) and the corresponding three archetypes solution (red).

The three archetypes can be interpreted as follows. Concerning the two variables total minutes played (Min) and field goals made (FGM), three types of extreme players are in the data set:

Archetype 1 is the natural "maximum" with high values in all variables (Min = 3234, FGM = 793); this archetype represents a type of "good" scorer.

Archetype 2 is the natural "minimum" with low values in all variables (Min = 7, FGM = 0); this archetype represents a type of "bad" scorer.
Archetype 3 is another extreme value with a high number of Min but a (relatively) low number of FGM (Min = 2713, FGM = 256); this archetype represents another type of "bad" scorer (i.e., an ineffective scorer).

Note that there is no archetype with a low number of Min and a high number of FGM; such an archetype would represent another type of "good" scorer (i.e., an effective scorer). An important aspect of the interpretation is that it is conditioned on the given data; e.g., the number of field goals made is obviously related to the position and tactical orientation of a player, but this information is not available in this illustrative data set and therefore cannot contribute to the interpretation of "good" and "bad" players.

Having identified the possible extreme values within the given data set, the next step is to set the observations in relation to them. The α coefficients of the archetypal problem (Formula 1) define how much each archetype contributes to the approximation of each individual observation (as a convex combination). This allows the assignment of the observations to their nearest archetypes and, consequently, the identification of the most archetypal observation(s). Figure 2 shows the corresponding ternary plot of the α coefficients for the above k = 3 archetypes solution.

Figure 2: (a) Visualization of the α coefficients using a ternary plot and (b) the data set (Min versus FGM) in case of the k = 3 archetypes solution. The red dots are the archetypes' nearest players; dots colored blue, orange, and green are players where Archetype 1, 2, and 3, respectively, contribute more than 0.8.
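The assignment of observations to archetypes via their α coefficients can be sketched in a few lines of Python. This is only an illustration of the thresholding idea, not code from the paper: the function name is hypothetical, the threshold of 0.8 is the one used for the colored points in Figure 2, and the α values are copied from the tables of this section (they are rounded to two decimals, so a row may sum to 0.99).

```python
def dominant_archetype(alphas, threshold=0.8):
    """Return the 1-based index of the archetype whose coefficient exceeds
    `threshold`, or None if no single archetype dominates the observation."""
    best = max(range(len(alphas)), key=lambda j: alphas[j])
    return best + 1 if alphas[best] > threshold else None


# Alpha coefficients (Archetype 1, 2, 3) as reported in this section.
players = {
    "Kevin Durant": (1.00, 0.00, 0.00),
    "Lebron James": (0.95, 0.05, 0.00),
    "Dwayne Jones": (0.00, 1.00, 0.00),
    "Jason Kidd": (0.06, 0.00, 0.94),
    "Vince Carter": (0.44, 0.23, 0.32),
    "Paul Millsap": (0.35, 0.23, 0.42),
}

assignment = {name: dominant_archetype(a) for name, a in players.items()}
```

With these values, Kevin Durant and Lebron James are assigned to Archetype 1, Dwayne Jones to Archetype 2, and Jason Kidd to Archetype 3, while Vince Carter and Paul Millsap are not assigned to any single archetype. Any threshold above 0.5 guarantees that at most one archetype can qualify.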
The three players (red points) nearest to the respective archetypes (red crosses) are:

             Name          Team  Role  Min   FGM  α·1   α·2   α·3
Archetype 1  Kevin Durant  OKL   SF    3241  794  1.00  0.00  0.00
Archetype 2  Dwayne Jones  PHO   C        7    0  0.00  1.00  0.00
Archetype 3  Jason Kidd    DAL   PG    2883  284  0.06  0.00  0.94

Archetypes 1 and 3 have well-defined nearest observations; Archetype 2, on the contrary, has a set of nearest observations, and the concrete player identification should be considered as a "random" selection from the set of similar players.

We have identified Archetype 1 as the "good" archetype in this data setting; on this account, Kevin Durant can be considered as the best scorer. To find other good scorers, we look at the observations where Archetype 1 contributes more than 0.8 (blue points):

Name              Team  Role  Min   FGM  α·1   α·2   α·3
Kevin Durant      OKL   SF    3241  794  1.00  0.00  0.00
Lebron James      CLE   SF    2967  768  0.95  0.05  0.00
Kobe Bryant       LAL   SG    2834  716  0.89  0.11  0.00
Dwyane Wade       MIA   SG    2793  719  0.89  0.11  0.00
Dirk Nowitzki     DAL   PF    3041  720  0.89  0.05  0.06
Amare Stoudemire  PHO   PF    2835  704  0.88  0.12  0.00
Carmelo Anthony   DEN   SF    2636  688  0.85  0.15  0.00
David Lee         NYK   C     3018  686  0.82  0.05  0.13
Derrick Rose      CHI   PG    2872  672  0.82  0.10  0.08

We proceed equivalently for the other two archetypes: Players where Archetype 3 contributes more than 0.8 are Jason Kidd, Thabo Sefolosha, Earl Watson, Anthony Parker, Derek Fisher, Ron Artest, and Marcus Camby (green points). Five randomly selected players where Archetype 2 contributes more than 0.8 are Ryan Bowen, Sean Marks, Ian Mahinmi, Jamaal Magloire, and Quinton Ross (orange points).

Observations toward the center of the data set are not approximated by one archetype alone, but each archetype contributes a significant fraction.
The following five players, for example, are randomly selected from the data set's center (brown points):

Name            Team  Role  Min   FGM  α·1   α·2   α·3
Vince Carter    ORL   SG    2310  434  0.44  0.23  0.32
Anthony Morrow  GSW   SG    2019  329  0.28  0.31  0.40
C.J. Miles      UTA   SF    1497  241  0.21  0.49  0.31
Paul Millsap    UTA   PF    2275  385  0.35  0.23  0.42
Rodney Stuckey  DET   PG    2499  449  0.44  0.16  0.40

As we can see, based on the α coefficients of these players, no assignment to one of the archetypes is possible.

Besides setting the observations in relation to their nearest archetype using the observations' highest α, the interpretation of all α coefficients of an observation is of interest as well. Suppose that, for example, the data set describes skill ratings of players; then the α coefficients can be interpreted as the players' compositions of skills. See Section 3.2 for such an application of archetypal analysis.

3 Archetypal athletes

Archetypal analysis in general makes it possible to compute data-driven extreme values and the corresponding observations' (convex) combinations of all archetypes. In case of sports data this allows us to identify and interpret archetypal athletes. Furthermore, all athletes are set in relation to the archetypal athletes and can then be "evaluated" according to them.

In this section we determine archetypal athletes for two popular sports and their representative leagues. Section 3.1 extends the illustrative two-dimensional example and computes archetypal basketball players with common statistics from the NBA season 2009/2010. Section 3.2 determines archetypal soccer players of the German Bundesliga, the English Premier League, the Italian Lega Serie A, and the Spanish La Liga using skill ratings (as of September 2011).

3.1 Archetypal basketball players

We determine the archetypal basketball players of the NBA season 2009/2010. Kubatko et al.
(9) define basic variables used in what is now the mainstream of basketball statistics. Following their suggestion, we use a data set provided by Steele (15) with 19 statistics of 441 players. Figure 3 visualizes the data using a parallel coordinates plot. In comparison to the two-dimensional illustrative example, no structure is easily observable; and there is, for example, no player who is the maximum over all statistics.

We fit k = 1, ..., 10 archetypes; Figure 4 shows the corresponding scree plot: the first "elbow" is at k = 4 (RSS = 0.04), the second one at k = 8 (RSS = 0.03). The additional error reduction between k = 4 and k = 8 is marginal, and we decide on k = 4 archetypal basketball players.

Figure 5 displays the percentile plots (i.e., the percentile value in an archetype as compared to the data) of the four archetypal basketball players available in the NBA season 2009/2010. The particular characteristics are:

Archetype 1 is the archetypal "bench warmer" with few games played and therefore low values in all statistics.

Archetype 2 is the archetypal rebounder and defensive player with high values in the rebounds, blocks, and foul-related statistics, and low values in the three-pointers.

Archetype 3 is the archetypal three-point shooter with high values in the three-pointer statistics and low values in the free throws and rebounds.

Archetype 4 is the archetypal offensive player with high values in all throw-related statistics and low values in foul-related statistics.

Figure 3: Parallel coordinates plot of the statistics of 441 players from the NBA season 2009/2010.

Figure 4: Scree plot of the residual sum of squares (RSS) for 1 to 10 archetypes.
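Reading an "elbow" off a scree plot is usually done by eye, but the idea can be sketched programmatically. The following Python snippet is one possible heuristic, not the procedure used in the paper: it returns the first k after which the RSS drop falls below a fraction of the largest drop. The function name and the tolerance are assumptions, and the RSS curve is illustrative: only the values at k = 4 (0.04) and k = 8 (0.03) are taken from the text, all other values are made up to mimic a flattening curve.

```python
def elbow(rss_values, tol=0.1):
    """Return the first k (1-based) after which the decrease in RSS is
    smaller than `tol` times the largest decrease observed in the curve."""
    drops = [rss_values[i] - rss_values[i + 1] for i in range(len(rss_values) - 1)]
    largest = max(drops)
    for i, drop in enumerate(drops):
        if drop < tol * largest:
            return i + 1
    return len(rss_values)


# Illustrative RSS values for k = 1, ..., 10; only k = 4 and k = 8 match the
# values reported in the text, the remaining numbers are hypothetical.
rss_curve = [0.20, 0.11, 0.07, 0.04, 0.038, 0.036, 0.034, 0.03, 0.029, 0.028]

k_first_elbow = elbow(rss_curve)
```

For this illustrative curve the heuristic selects k = 4. A different tolerance, or a curve with a pronounced second drop, can change the answer, which is why the choice of k remains a judgment call.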
Figure 5: Percentile plot of the four archetypal basketball players solution. Each of the four panels (Archetype 1 to Archetype 4) shows percentiles from 0 to 100 for the statistics GamesPlayed, TotalMinutesPlayed, FieldGoalsMade, FieldGoalsAttempted, ThreesMade, ThreesAttempted, FreeThrowsMade, FreeThrowsAttempted, OffensiveRebounds, TotalRebounds, Assists, Steals, Turnovers, Blocks, PersonalFouls, Disqualifications, TotalPoints, Technicals, and GamesStarted.

Archetype 1 represents a type of "bad" basketball player while all others represent different types of "good" players. The four basketball players nearest to one of the four archetypes are:

             Name            Team  Role  α·1   α·2   α·3   α·4
Archetype 1  Dwayne Jones    PHO   C     1.00  0.00  0.00  0.00
Archetype 2  Taj Gibson      CHI   SF    0.00  1.00  0.00  0.00
Archetype 3  Anthony Morrow  GSW   SG    0.00  0.00  0.96  0.04
Archetype 4  Kevin Durant    OKL   SF    0.00  0.00  0.00  1.00

On this account, Taj Gibson, Anthony Morrow, and Kevin Durant can be considered as the best basketball players of the season 2009/2010 with respect to the characteristics of their corresponding archetypes. However, note that in case of Archetype 3 the player is not exactly the archetype. In order to find all good players, we look at the observations where one of the three "good" archetypes contributes more than 0.95:

Archetype    Name              Team  Role  α·1   α·2   α·3   α·4
Archetype 2  Taj Gibson        CHI   SF    0.00  1.00  0.00  0.00
             Andrew Bogut      MIL   C     0.00  1.00  0.00  0.00
             Samuel Dalembert  PHI   C     0.02  0.98  0.00  0.00
             Jason Thompson    SAC   PF    0.03  0.96  0.00  0.00
Archetype 3  Anthony Morrow    GSW   SG    0.00  0.00  0.96  0.04
             Steve Blake       LAC   PG    0.02  0.00  0.96  0.02
Archetype 4  Kevin Durant      OKL   SF    0.00  0.00  0.00  1.00
             Lebron James      CLE   SF    0.00  0.00  0.00  1.00
             Dwyane Wade       MIA   SG    0.00  0.00  0.00  1.00
             Kobe Bryant       LAL   SG    0.03  0.00  0.00  0.97

The equal coefficients, e.g.
for the first two players in case of Archetype 2, occur due to rounding to two decimal places. The threshold 0.95 is arbitrarily defined; this is, in fact, the only subjective decision one has to make when discussing the quality of athletes using archetypal analysis.

3.2 Archetypal soccer players

The skill ratings are from the PES Stats Database (12) (PSD), a community-based approach to create a database with accurate statistics and skill ratings for soccer players (originally for the video game "Pro Evolution Soccer" by Konami). The extracted data set consists of 25 skills of 1658 players (all positions, i.e., Defender, Midfielder, and Forward, except Goalkeepers) from the German Bundesliga, the English Premier League, the Italian Serie A, and the Spanish La Liga. The skills are rated from 0 to 100 and describe different abilities of the players: physical abilities like balance, stamina, and top speed; ball skills like dribble, pass, and shot accuracy and speed; and general skills like attack and defence performance, technique, aggression, and teamwork. Note that we assume that the differences are interpretable, i.e., the ratings are on a ratio scale.

Figure 6: Parallel coordinates plot of the skill ratings of 1658 players from the German Bundesliga, the English Premier League, the Italian Serie A, and the Spanish La Liga.

Figure 6 shows a parallel coordinates plot of the data set. Most skills range between 50 and 100; this is due to the fact that PSD describes soccer players of all hierarchy levels of a league system. In any case, no real structure is visible in the data, and there are no players who are the maximum or the minimum over all skills.

We decide to use k = 4 archetypal soccer players; see the online supplement for the decision process (section on computational details). Figure 7 displays the percentile plots of the four archetypal soccer players.
The particular characteristic skills are:

Archetype 1 is the archetypal offensive player with all skills high except defense, balance, header, and jump.

Archetype 2 is the archetypal center forward with high skills in attack, shot, acceleration, header, and jump, and low passing skills.

Archetype 3 is the archetypal weak soccer player with high skills in running, but low skills in most ball-related skills.

Archetype 4 is the archetypal defender with high skills in defense, balance, header, and jump.

Figure 7: Percentile plot of the four archetypal soccer players solution. Each of the four panels (Archetype 1 to Archetype 4) shows percentiles from 0 to 100 for the skills Attack, Defence, Balance, Stamina, TopSpeed, Acceleration, Response, Agility, DribbleAccuracy, DribbleSpeed, ShortPassAccuracy, ShortPassSpeed, LongPassAccuracy, LongPassSpeed, ShotAccuracy, ShotPower, ShotTechnique, FreeKickAccuracy, Curling, Header, Jump, Technique, Aggression, Mentality, and Teamwork.

Figure 8: Parallel coordinates plot of the α coefficients of the four archetypal soccer players solution with highlighted defenders (red).

To verify this interpretation we look at the α coefficients in combination with the players' position; Figure 8 shows, as an example, the parallel coordinates plot with the "Defender" position highlighted (red). As we can see, nearly all defenders have a high α coefficient for Archetype 4.

Now, in order to investigate the question of the best soccer player, we have to make a (subjective) definition of "the best" in terms of the four archetypes. For us, the best player is a combination of Archetype 1 and Archetype 2 with Archetype 1 contributing more than Archetype 2 (according to the common view that offensive players are match-winning).
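One possible reading of this definition can be written down as a small Python filter. This is a sketch under stated assumptions, not the paper's code: the function name is hypothetical, and "a combination of Archetype 1 and Archetype 2" is interpreted as the coefficients of Archetypes 3 and 4 being zero after rounding to two decimals (tolerance 0.005). The first four players and their α values are taken from the table that follows in the text; the last two entries are hypothetical players added only to show who is excluded.

```python
def best_players(alphas_by_player, tol=0.005):
    """Select players composed only of Archetypes 1 and 2 (coefficients of
    Archetypes 3 and 4 at most `tol`) with Archetype 1 contributing more
    than Archetype 2; return their names ordered by alpha_1, descending."""
    selected = [
        (name, a) for name, a in alphas_by_player.items()
        if a[2] <= tol and a[3] <= tol and a[0] > a[1]
    ]
    selected.sort(key=lambda item: item[1][0], reverse=True)
    return [name for name, _ in selected]


soccer_alphas = {
    "Giuseppe Rossi": (0.51, 0.49, 0.00, 0.00),
    "Cristiano Ronaldo": (0.68, 0.32, 0.00, 0.00),
    "Wayne Rooney": (0.82, 0.18, 0.00, 0.00),
    "Leo Messi": (0.79, 0.21, 0.00, 0.00),
    # Hypothetical players, not from the data set:
    "Hypothetical Defender": (0.05, 0.05, 0.10, 0.80),
    "Hypothetical Striker": (0.40, 0.60, 0.00, 0.00),
}

ranking = best_players(soccer_alphas)
```

Under this reading the ranking is Wayne Rooney, Leo Messi, Cristiano Ronaldo, Giuseppe Rossi. The hypothetical defender is excluded because Archetype 4 contributes, and the hypothetical striker because Archetype 2 outweighs Archetype 1.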
The following soccer players satisfy this definition (ordered according to α·1):

Name                Club                        α·1   α·2   α·3   α·4
Wayne Rooney        Manchester United FC        0.82  0.18  0.00  0.00
Leo Messi           FC Barcelona                0.79  0.21  0.00  0.00
Cristiano Ronaldo   Real Madrid CF              0.68  0.32  0.00  0.00
Antonio Di Natale   Udinese Calcio              0.67  0.33  0.00  0.00
Carlos Tevez        Manchester City FC          0.66  0.34  0.00  0.00
Diego Forlan        FC Internazionale Milano    0.64  0.36  0.00  0.00
Dimitar Berbatov    Manchester United FC        0.60  0.40  0.00  0.00
Adrian Mutu         AC Cesena                   0.60  0.40  0.00  0.00
Zlatan Ibrahimovic  AC Milan                    0.54  0.46  0.00  0.00
Luis Suarez         Liverpool FC                0.53  0.47  0.00  0.00
Mladen Petric       Hamburger SV                0.53  0.47  0.00  0.00
Xavi Hernandez      FC Barcelona                0.52  0.48  0.00  0.00
Didier Drogba       Chelsea FC                  0.52  0.48  0.00  0.00
Giuseppe Rossi      Villarreal CF               0.51  0.49  0.00  0.00

Based on our definition and the given skill rating data set, Wayne Rooney is the best player, followed by Leo Messi and Cristiano Ronaldo.

4 Conclusion

The present paper applies the statistical method archetypal analysis to sports data. This makes it possible to compute outstanding athletes (positively and/or negatively outstanding), i.e., the archetypal athletes. Statements like "Dirk Nowitzki is the best basketball player. No, it's Kevin Durant!" can then be discussed in a completely data-driven way and with a well-defined and reproducible amount of subjectivity. The proposed way is (1) to estimate the archetypes, i.e., the archetypal athletes, then (2) to identify the archetypal athletes as different types of "good" and "bad" athletes, and finally (3) to set all athletes in relation to the archetypes (using the α coefficients). The two examples, basketball and soccer, show that this is an appropriate approach; the estimated archetypal athletes are consistent with the general opinion.
Computational details

All computations and graphics have been done using the statistical software R 2.13.1 (14), the archetypes package (4), and the SportsAnalytics package (5). R itself and all packages used are freely available under the terms of the General Public License from the Comprehensive R Archive Network at http://CRAN.R-project.org/.

Data sets and source codes for replicating our analyses are available in the SportsAnalytics package. An individual analysis is executed via (replace *** with nba-2d, nba, or soccer):

R> demo("archeplayers-***", package = "SportsAnalytics")

The source code file for a demo is accessible via:

R> edit(file = system.file("demo", "archeplayers-***.R",
+    package = "SportsAnalytics"))

References

[1] Christian Bauckhage and Christian Thurau. Making archetypal analysis practical. In Proceedings of the 31st DAGM Symposium on Pattern Recognition, pages 272–281, 2009. doi: 10.1007/978-3-642-03798-6_28.

[2] Ben H. P. Chan, Daniel A. Mitchell, and Lawrence E. Cram. Archetypal analysis of galaxy spectra. Monthly Notices of the Royal Astronomical Society, 338:790–795, 2003. doi: 10.1046/j.1365-8711.2003.06099.x.

[3] Adele Cutler and Leo Breiman. Archetypal analysis. Technometrics, 36(4):338–347, 1994.

[4] Manuel J. A. Eugster. archetypes: Archetypal Analysis, 2010. URL http://cran.r-project.org/package=archetypes. R package version 2.0-2.

[5] Manuel J. A. Eugster. SportsAnalytics: Sports Analytics, 2011. URL http://cran.r-project.org/package=SportsAnalytics. R package version 0.1.

[6] Manuel J. A. Eugster and Friedrich Leisch. From Spider-man to Hero – Archetypal analysis in R. Journal of Statistical Software, 30(8):1–23, 2009. URL http://www.jstatsoft.org/v30/i08.

[7] Manuel J. A. Eugster and Friedrich Leisch. Weighted and robust archetypal analysis. Computational Statistics and Data Analysis, 55(3):1215–1225, 2011. doi: 10.1016/j.csda.2010.10.017. Preprint available from http://epub.ub.uni-muenchen.de/11498/.

[8] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, second edition, 2009. ISBN 0387848576.

[9] Justin Kubatko, Dean Oliver, Kevin Pelton, and Dan T. Rosenbaum. A starting point for analyzing basketball statistics. Journal of Quantitative Analysis in Sports, 3:Article 1, 2007. doi: 10.2202/1559-0410.1070.

[10] Shan Li, Paul Wang, Jordan Louviere, and Richard Carson. Archetypal analysis: A new way to segment markets based on extreme individuals. In A Celebration of Ehrenberg and Bass: Marketing Knowledge, Discoveries and Contribution. Proceedings of the ANZMAC 2003 Conference, December 1-3, 2003, pages 1674–1679, 2003.

[11] Ian McHale and Phil Scarf. Ranking football players. Significance, 2:54–57, 2005. doi: 10.1111/j.1740-9713.2005.00091.x.

[12] PES Stats Database. PSD – PES Stats Database. Website, 2011. http://pesstatsdatabase.com/; visited on November 4, 2021.

[13] Giovanni C. Porzio, Giancarlo Ragozini, and Domenico Vistocco. On the use of archetypes as benchmarks. Applied Stochastic Models in Business and Industry, 24(5):419–437, 2008. doi: 10.1002/asmb.v24:5.

[14] R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2011. URL http://www.R-project.org/. ISBN 3-900051-07-0.

[15] Doug Steele. Doug's NBA & MLB statistics home page. Website, 2011. http://dougstats.com/; visited on November 4, 2021.

[16] John Thorn and Pete Palmer. The Hidden Game of Baseball. Doubleday, 1984. ISBN 0385182848.

Affiliation:

Manuel J. A. Eugster
Institut für Statistik
Ludwig-Maximilians-Universität München
80530 Munich, Germany
E-mail: Manuel.Eugster@stat.uni-muenchen.de
Website: http://www.statistik.lmu.de/~eugster/