Karl Pearson’s Theoretical Errors and the Advances They Inspired

Karl Pearson played an enormous role in determining the content and organization of statistical research in his day, through his research, his teaching, his establishment of laboratories, and his initiation of a vast publishing program. His technical…

Authors: ** Stephen M. Stigler (Ernest DeWitt Burton Distinguished Service Professor, Department of Statistics, University of Chicago) **

Statistical Science 2008, Vol. 23, No. 2, 261–271. DOI: 10.1214/08-STS256. © Institute of Mathematical Statistics, 2008.

Abstract. Karl Pearson played an enormous role in determining the content and organization of statistical research in his day, through his research, his teaching, his establishment of laboratories, and his initiation of a vast publishing program. His technical contributions had initially and continue today to have a profound impact upon the work of both applied and theoretical statisticians, partly through their inadequately acknowledged influence upon Ronald A. Fisher. Particular attention is drawn to two of Pearson’s major errors that nonetheless have left a positive and lasting impression upon the statistical world.

Key words and phrases: Karl Pearson, R. A. Fisher, Chi-square test, degrees of freedom, parametric inference, history of statistics.

Stephen M. Stigler is the Ernest DeWitt Burton Distinguished Service Professor in the Department of Statistics, University of Chicago, 5734 University Avenue, Chicago, Illinois 60637, USA; e-mail: stigler@uchicago.edu. This paper is based upon a talk presented at the Royal Statistical Society in March 2007, at a symposium celebrating the 150th anniversary of Karl Pearson’s birth.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2008, Vol. 23, No. 2, 261–271. This reprint differs from the original in pagination and typographic detail.

1. INTRODUCTION

Karl Pearson surely ranks among the more productive and intellectually energetic scholars in history. He cannot match the most prolific humanists, such as one of whom it has been said, “he had no unpublished thought,” but in the domain of quantitative science Pearson has no serious rival. Even the immensely prolific Leonhard Euler, whose collected works are still being published more than two centuries after his death, falls short of Pearson in sheer volume. A list of Pearson’s works fills a hardbound book; that book lists 648 works and is still incomplete (Morant, 1939). My own moderate collection of his works, itself very far from complete (it omits his contributions to Biometrika), occupies 5 feet of shelf space. And his were not casually constructed works: when a student or a new co-worker would do the laborious calculations for some statistical analysis, Pearson would redo the work to greater accuracy, as a check. An American visiting Pearson in the early 1930s once asked him how he found the time to write so much and compute so much. Pearson replied, “You Americans would not understand, but I never answer a telephone or attend a committee meeting” (Stouffer, 1958).

Pearson’s accomplishments were not merely voluminous; they could be luminously enlightening as well. Today the most famous of these are Pearson’s Product Moment Correlation Coefficient and the Chi-square test, dating respectively from 1896 and 1900 (Pearson, 1896, 1900a, 1900b). He was a driving force behind the founding of Biometrika, which he edited for 36 years and made into the first important journal in mathematical statistics. He also established another journal (the Annals of Eugenics) and several additional serial publications, two research laboratories, and a school of statistical thought.
Pearson pioneered in the use of machine calculation, and he supervised the calculation of a series of mathematical tables that influenced statistical practice for decades. He made other discoveries, less commonly associated with his name. He was in 1897 the first to name the phenomenon of “spurious correlation,” thus publicly identifying a powerful idea that made him and countless descendants more aware of the pitfalls expected in any serious statistical investigation of society (Pearson, 1897). And in a series of investigations of craniometry he introduced the idea of landmarks to the statistical study of shapes.

Pearson was at one time well known for the Pearson Family of Frequency Curves. That family is seldom referred to today, but there is a small fact (really a striking discovery) he found in its early development that I would call attention to. When we think of the normal approximation to the binomial, we usually think in terms of large samples. Pearson discovered that there is a sense in which the two distributions agree exactly for even the smallest number of trials. It is well known that the normal density is characterized by the differential equation

$$\frac{d}{dx}\log(f(x)) = \frac{f'(x)}{f(x)} = -\frac{(x-\mu)}{\sigma^2}.$$

Pearson discovered that $p(k)$, the probability function for the symmetric binomial distribution ($n$ independent trials, $p = 0.5$ each trial), satisfies the analogous difference equation exactly:

$$\frac{p(k+1) - p(k)}{(p(k+1) + p(k))/2} = -\frac{(k + 1/2) - n/2}{(n+1)\cdot 1/2 \cdot 1/2}$$

or

$$\frac{\text{rate of change } p(k) \text{ to } p(k+1)}{\text{average of } p(k) \text{ and } p(k+1)} = -\frac{\text{midpoint of } (k, k+1) - \mu_n}{\sigma^2_{n+1}}$$

for all $n$, $k$.
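The exactness of this difference equation is easy to confirm by direct computation. The following Python sketch is an illustration added for the reader, not something found in Pearson (1895); the function names are hypothetical, and exact rational arithmetic is used so that equality means equality and not merely agreement to rounding error.

```python
from fractions import Fraction
from math import comb


def symmetric_binomial_pmf(n, k):
    """Exact p(k) for n independent trials with success probability 1/2."""
    return Fraction(comb(n, k), 2 ** n)


def difference_ratio(n, k):
    """Left-hand side: change from p(k) to p(k+1), divided by their average."""
    p_k = symmetric_binomial_pmf(n, k)
    p_next = symmetric_binomial_pmf(n, k + 1)
    return (p_next - p_k) / ((p_next + p_k) / 2)


def normal_analogue(n, k):
    """Right-hand side: -(midpoint of (k, k+1) - n/2) / ((n + 1) * 1/2 * 1/2)."""
    midpoint = Fraction(2 * k + 1, 2)
    mean = Fraction(n, 2)
    variance_with_n_plus_1 = Fraction(n + 1, 4)
    return -(midpoint - mean) / variance_with_n_plus_1


def identity_holds(max_n=30):
    """Check the identity for every n up to max_n and every k with 0 <= k < n."""
    return all(
        difference_ratio(n, k) == normal_analogue(n, k)
        for n in range(1, max_n + 1)
        for k in range(n)
    )


if __name__ == "__main__":
    print(identity_holds(30))
```

Both sides reduce algebraically to $2(n - 2k - 1)/(n + 1)$, which is why the check succeeds even for $n = 1$.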
The appearance of $n + 1$ instead of $n$ in the denominator might be considered a minor fudge, but the equation still demonstrates a really fundamental agreement in the shapes of the two distributions that does not rely upon asymptotics (Pearson, 1895, page 356).

All of these Pearsonian achievements are indeed substantial, and constitute ample reason to celebrate him 150 years after his birth. But if these are all we saluted, I would hold that Pearson is being underappreciated. To properly gauge his impact upon modern statistics, we must take a look at parts of two works of his that are typically not held in high regard. Indeed, they are usually mentioned in derision, as exhibiting two major errors that show Pearson’s limitations and highlight the great gulf that lay between Pearson and the Fisherian era that was to follow. I wish to return now to these two works and reassess them. I intend to argue that these errors should count among the more influential of his works, and that they helped pave the way for the creation of modern mathematical statistics.

2. PEARSON’S FIRST MAJOR ERROR

Louis Napoleon George Filon was born in France, but his family moved to England when he was three years old (Jeffrey, 1938). He first encountered Karl Pearson as a student at University College London. After receiving a B.A. in 1896, Filon served as Pearson’s Demonstrator until 1898, and together they wrote a monumental memoir on the “probable errors of frequency constants,” a paper read to the Royal Society in 1897 and published in their Transactions in 1898. In 1912 Filon succeeded Pearson as Goldsmid Professor of Applied Mathematics and Mechanics. At Pearson’s retirement banquet in 1934, Filon (who was by that time Vice-Chancellor of the University of London) explained the genesis of this, their only work together in statistics.

“K. P. lectured to us on the Mathematical Theory of Statistics, and on one occasion wrote down a certain integral as zero, which it should have been on accepted principle. Unfortunately I have always been one of those wrong-headed persons who refuse to accept the statements of Professors, unless I can justify them for myself. After much labour, I actually arrived at the value of the integral directly, and it was nothing like zero. I took this result to K. P., and then, if I may say so, the fun began. The battle lasted, I think, about a week, but in the end I succeeded in convincing Professor Pearson. It was typical of K. P. that, the moment he was really convinced, he saw the full consequences of the result, proceeded at once to build up a new theory (which involved scrapping some previously published results) and generously associated me with himself in the resulting paper” (Filon, 1934).

The term “probable error” was introduced early in the 19th century to mean what we would now call the median error of an estimate. Thus it is a value which, when divided by 0.6745, gives the standard deviation for an unbiased estimate with an approximately normal distribution. My guess is that the lecture that Filon referred to involved the formula for the probable error of Pearson’s product-moment estimate of the correlation coefficient for bivariate normal distributions. Pearson had given this incorrectly in 1896, and one of the signal achievements of the Pearson–Filon paper was to correct that error (Stigler, 1986, page 343). But the 1898 paper did much more: it purported to give the approximate distributions for the probable errors of all the estimated frequency constants, indeed their entire joint distribution, for virtually any statistical problem.
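The factor 0.6745 that converts a probable error into a standard deviation is simply the 75th percentile of the standard normal distribution: half of all normal errors fall within that many standard deviations of zero. A minimal Python sketch of the conversion follows; the function names are hypothetical and chosen only for this illustration.

```python
from statistics import NormalDist


def probable_error_factor():
    """Median absolute error of a standard normal variate (its 75th percentile)."""
    return NormalDist().inv_cdf(0.75)


def standard_deviation_from_probable_error(probable_error):
    """Convert a probable error to a standard deviation, assuming normality."""
    return probable_error / probable_error_factor()


if __name__ == "__main__":
    print(round(probable_error_factor(), 4))
```

The conversion is only as good as the normal approximation for the estimate in question, which is the same caveat attached to the probable error itself.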
The theory presented was relatively short; most of the paper was taken up with a large number of applications.

Unfortunately, quite a number of these applications proved to be in error. There are some indications Pearson may have realized this by 1903, but if he did sense trouble with the paper, he did not call it to public attention. In 1922 Ronald Fisher repaired Pearson’s omission when he noted in particular that outside of the case of the normal distribution, nearly all of the applications in Pearson–Filon were erroneous. This included many method-of-moments estimates (the gold-standard method for the Pearsonian school). A significant Pearson achievement came to be labeled an error, one eventually overcome by a Fisher success, and in consequence, the 1898 paper has suffered a low reputation. But despite these problems, the paper had, arguably, a significant and largely underappreciated positive impact upon statistics.

At first glance the Pearson–Filon argument may appear strikingly modern, apparently expanding a log-likelihood ratio to derive an asymptotic approximately multivariate normal distribution for the errors of estimation. The authors considered a multivariate set of $m$-dimensional measurements $x_1, x_2, x_3, \ldots, x_m$ of “a complex of organs,” and they stated the “frequency surface” should be given by

“$z = f(x_1, x_2, x_3, \ldots x_m; c_1, c_2, c_3, \ldots c_p)$, where $c_1, c_2, c_3, \ldots, c_p$, are $p$ frequency constants, which define the form as distinguished from the position of the frequency surface, and which will be functions of standard deviations, moments, skewnesses, coefficients of correlation, &c., &c., of individual organs, and of pairs of organs in the complex.”

The “position” of the surface would be given in terms of the means $h_1, h_2, \ldots, h_m$ of the $x$’s, which are implicit in this notation as they are stated to give the origin of the surface, and so there are $m + p$ frequency constants to be determined from a set of $n$ measurements, each $m$-dimensional.

To determine the “probable errors” of the frequency constants, Pearson and Filon looked at the ratio formed by dividing the product of $n$ such surfaces (for the $n$ vector measurements) into what a similar product would be if the frequency constants had been different values:

$$\frac{P_\Delta}{P_0} = \frac{\prod f(x_1 + \Delta h_1, x_2 + \Delta h_2, \ldots, x_m + \Delta h_m;\; c_1 + \Delta c_1, c_2 + \Delta c_2, \ldots, c_p + \Delta c_p)}{\prod f(x_1, x_2, \ldots, x_m;\; c_1, c_2, \ldots, c_p)}.$$

The logarithm of this ratio is then the difference of two sums. After being expanded “by Taylor’s theorem,” this yields a series with typical terms, writing $S$ for summation, they found to be

$$\begin{aligned}
\log(P_\Delta/P_0) ={}& \Delta h_r\, S\frac{d}{dx_r}(\log f) + \tfrac{1}{2}(\Delta h_r)^2\, S\frac{d^2}{dx_r^2}(\log f) + \Delta h_r \Delta h_{r'}\, S\frac{d^2}{dx_r\,dx_{r'}}(\log f) \\
&+ \Delta c_s\, S\frac{d}{dc_s}(\log f) + \tfrac{1}{2}(\Delta c_s)^2\, S\frac{d^2}{dc_s^2}(\log f) + \Delta c_s \Delta c_{s'}\, S\frac{d^2}{dc_s\,dc_{s'}}(\log f) \\
&+ \Delta h_r \Delta c_s\, S\frac{d^2}{dh_r\,dc_s}(\log f) + \cdots + \text{cubic terms in } \Delta h \text{ and } \Delta c + \text{\&c},
\end{aligned}$$

“where $f$ stands for $f(x_1, x_2, x_3, \ldots, x_m; c_1, c_2, c_3, \ldots, c_p)$.”

Pearson and Filon then “replace sums by integrals,” for example, by replacing the second summation above, namely $S\frac{d^2}{dx_r^2}(\log f)$, by $-B_r = \int\!\!\int\!\!\int \cdots f\,\frac{d^2 \log f}{dx_r^2}\,dx_1\,dx_2 \cdots dx_m$. With their notation the frequency surface encompassed a volume $= n$ (i.e., it was not a relative frequency surface), so if the $f$ were taken as a density or relative frequency surface this would be tantamount to replacing $\frac{1}{n} S\frac{d^2}{dx_r^2}(\log f)$ by its expectation $E[\frac{d^2}{dx_r^2}(\log f)]$, which would equal $-B_r$. The integrals were then evaluated, and higher-order terms discarded, to get

$$P_\Delta = P_0\,\mathrm{expt.}\, -\tfrac{1}{2}\{B_r(\Delta h_r)^2 - 2C_{rr'}\,\Delta h_r \Delta h_{r'} - 2G_{rs}\,\Delta h_r \Delta c_s + E_s(\Delta c_s)^2 - 2F_{ss'}\,\Delta c_s \Delta c_{s'} + \text{\&c.} \cdots\} \tag{1}$$

where $B_r$, etc. are integrals given in terms of derivatives of $\log f$. We are then told:

“This represents the probability of the observed unit, i.e. the individuals ($x_1, x_2, x_3, \ldots x_m$, for all sets), occurring, on the assumption that the errors $\Delta h_1, \Delta h_2, \ldots, \Delta h_m, \Delta c_1, \ldots, \Delta c_p$, have been made in the determination of the frequency constants. In other words, we have here the frequency distribution for errors in the values of the frequency constants.”

Several of their steps, such as the cavalier substitution of integrals for sums or the discarding of remainder terms, may seem insufficiently defended, but the general drift is so similar to what we tend to see today that it would be easy for an uncritical reader to accept it, believing that it is probably essentially accurate, and that with some effort and additional regularity conditions all should be well. After all, such a reader might say, the replacement can be justified under reasonable regularity conditions, and even the last step would be sanctioned by a loose inverse probability argument such as was common at that time. That reader would be wrong.

3. THE SOURCE OF THE ERROR

The key to understanding what went wrong with the Pearson–Filon argument is at the very beginning, as a closer reading shows. Modern readers have understandably tended to take the frequency surface $z = f(x_1, x_2, x_3, \ldots, x_m; c_1, c_2, c_3, \ldots, c_p)$ as a parametric model. But in fact, in their notation, $z$ is the fitted surface in terms of estimated $h$’s and $c$’s.
Pearson and Filon explained this implicitly in the first paragraph on page 231, when they write exclusively in terms of the group of $n$ individuals and the “means” $h_i$ (referring to the arithmetic means) as determining the origin of the surface, and explicitly in the short fourth paragraph, where they refer to the $h + \Delta h$’s and $c + \Delta c$’s of the numerator as being considered “instead of the observed values.” This runs counter to modern statistical practice, which would focus on a specification in terms of parameters, not the estimates, with one value considered the true or population value, and the focus would be the deviations of the estimates from that value. The idea of this type of parametric modeling was, however, only to be introduced in 1922 by Fisher (Stigler, 2005), and Pearson’s “frequency constants” were not parameters, even if they were sometimes employed in an equivalent fashion. This difference was, as we shall see, highly consequential; it was the source of the principal difficulties in the argument.

Because Pearson and Filon took the estimates as a starting point, the Taylor expansion they gave was about the estimated values. The expansion itself is fine, but when they came to substituting integrals for sums, they inadvertently encountered a problem. Consider the first sum that involves a general frequency constant $c$, namely $S\frac{d}{dc_s}(\log f)$. (The earlier terms involve the $h$’s but have been written in terms of derivatives with respect to the $x$’s; the issue is more clearly addressed with the $c$’s.) If we took $f$ as a density, it might not be unreasonable to replace this sum (divided by $n$) by its limit in probability, the expectation $E[\frac{d}{dc_s}(\log f)]$. But under what distribution should the expectation be computed? With Fisher it would be computed under the distribution with the true values of the parameters.
But Pearson lacked that notion; for him there was no “true value,” only a summary estimate in terms of observed values.

Someone (perhaps Fourier) has been quoted as saying that “Mathematics has no symbols for confused ideas.” Anyone seeking a counterexample to this need look no further than Pearson–Filon. With their symbol $f = f(x_1, x_2, x_3, \ldots, x_m; c_1, c_2, c_3, \ldots, c_p)$ submerging the role of estimated values and elevating them in the process to surrogates for true values, the argument goes astray. All expectations are computed as if the estimated values were true values, and the result is a distribution for errors that does not in any way depend upon the method used to estimate. Pearson and Filon replaced $S\frac{d}{dc_s}(\log f)$ by the integral $D_s = \int\!\!\int\!\!\int \cdots f\,\frac{d \log f}{dc_s}\,dx_1\,dx_2 \cdots dx_m$, which reduces identically to zero (if the two $f$’s are identical) under fairly general regularity conditions. But the same would not be generally true for $\int\!\!\int\!\!\int \cdots f(x \mid \theta)\,\frac{d \log f(x \mid \hat\theta)}{dc_s}\,dx_1\,dx_2 \cdots dx_m$, writing $\hat\theta$ for $(h_1, h_2, h_3, \ldots, h_m; c_1, c_2, c_3, \ldots, c_p)$, and $\theta$ for the potential true value $\hat\theta$ is intended to estimate.

The summation $S\frac{d}{dc_s}(\log f)$ is identically zero if maximum likelihood estimates are used, but it remained for Fisher to notice that this term is not in general negligible; in fact, it will contribute asymptotically to the variance term if the estimate $\hat\theta$ is inefficient, as would be the case for many of Pearson’s moment-based estimates. Pearson and Filon wrote that the expansion represented the distribution “on the assumption that the errors $\Delta h_1, \Delta h_2, \ldots, \Delta h_m, \Delta c_1, \ldots, \Delta c_p$ have been made in the determination of the frequency constants.” But that is not what they had done.
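The distinction between the two integrals can be made concrete with a small numerical check. The Python sketch below is an illustration under assumptions of my own choosing, not an example from Pearson and Filon: the density is taken to be normal with mean zero, the single “frequency constant” is its standard deviation, and the function names are hypothetical. The integral of $f(x \mid \theta)\, d\log f(x \mid \hat\theta)/dc$ is then $-1/\hat\sigma + \sigma^2/\hat\sigma^3$, which vanishes only when the fitted value coincides with the true one.

```python
import math


def normal_density(x, sigma):
    """Density of a normal distribution with mean 0 and standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def score_in_sigma(x, sigma):
    """Derivative of log normal_density(x, sigma) with respect to sigma."""
    return -1.0 / sigma + x * x / sigma ** 3


def expected_score(sigma_true, sigma_fitted, half_width=12.0, steps=20000):
    """Midpoint-rule integral of f(x | true) times the score evaluated at the fitted value."""
    lower = -half_width * sigma_true
    dx = 2.0 * half_width * sigma_true / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * dx
        total += normal_density(x, sigma_true) * score_in_sigma(x, sigma_fitted)
    return total * dx


if __name__ == "__main__":
    print(expected_score(1.0, 1.0))  # the two f's are identical
    print(expected_score(1.0, 1.2))  # fitted value differs from the true value
```

Treating the fitted value as if it were the true one, as the Pearson–Filon notation invites, amounts to always evaluating the first case.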
With their notation of a simple $f$ without arguments, and their fixation on the estimated values, they were led to mathematical misadventure.

If the substitution of limiting integrals for sums had been valid, the likelihood ratio $P_\Delta/P_0$ would then have been the ratio of the probability densities of the sample with estimated $h$’s and $c$’s (the denominator) to that for a hypothetical set of alternative values (the $h + \Delta h$’s and $c + \Delta c$’s, the numerator). In modern terminology, Pearson’s final expression (1) was claimed to be an approximation for $P_\Delta$, the (conditional) density of the sample $x$ given that the estimated values are in error by $\Delta h$ or $\Delta c$, and their final claim (“In other words, we have here the frequency distribution for errors in the values of the frequency constants”) was an assertion that formula (1) also gives the (conditional) density of the errors $\Delta h$ and $\Delta c$ given the data $x$. This last statement was not explained, but later in 1916 correspondence with Fisher (quoted in Stigler, 2005) Pearson described it as the use of inverse probability, making it a naive Bayesian approach with a uniform prior, such as was practiced routinely over the 19th century and was sometimes referred to as the Gaussian method.

Pearson’s contemporaries did not raise questions about the memoir. When Edgeworth discussed and extended it in 1908, he gave no indication he saw anything amiss (Edgeworth, 1908). Only in 1922 did Ronald Fisher criticize the approach of the paper in a lengthy footnote (Fisher, 1922a, page 329), writing that “It is unfortunate that in this memoir no sufficient distinction is drawn between the population and the sample . . . .” Fisher went on to say that the results implicitly assume the estimates actually maximized the likelihood function, whereas they were applied in many cases where this was not the case.
He wrote,

“It would appear that shortly before 1898 the process which leads to the correct value, of the probable errors of optimum statistics, was hit upon and found to agree with the probable errors of statistics found by the method of moments for normal curves and surfaces; without further enquiry it would appear to have been assumed that this process was valid in all cases, its directness and simplicity being peculiarly attractive. The mistake was at the time, perhaps, a natural one; but that it should have been discovered and corrected without revealing the inefficiency of the method of moments is a very remarkable circumstance” (Fisher, 1922a, page 329).

It is worth pointing out that the size of the correction Fisher noted was needed was not small. Fisher gave several examples of nonnormal members of Pearson’s own family of curves where the lower bound of the efficiency of the moment-based estimates was zero. Since Fisher measured efficiency as a ratio of variances, this meant that the correction needed for Pearson’s 1898 expressions for “probable errors” could be enormous, in fact arbitrarily large. The 1898 expressions were not larger than the actual probable errors, but there was little else that could be said. There was no finite limit to the amount they underestimated the actual probable errors.

The major error in the paper was due (as Fisher noted) to a conceptual confusion, a taking of the estimated frequency constants in part of the analysis in the place of the actual frequency constants. Pearson had run aground after encountering a need for a clear notion of a set of values for his frequency constants; he did not have a framework to encompass both estimates and targets of estimation. To some degree then, the Pearson–Filon error can be seen to be due to the lack of the notion of parametric families.
Pearson and Filon used notation in this memoir suggestive of parametric families, but the lack of conceptual clarity led to a confused and ultimately erroneous analysis. Pearson thought of “frequency constants” as quantities such as moments, derivable from arbitrary density curves with the same meaning in all cases and with the sample moments as clearly leading to the best estimates.

Fisher’s comments were apt; perhaps even too generous, although it is doubtful Pearson would have agreed with such an assessment. That the Pearson–Filon procedure can be shown to work for efficient estimates is a species of mathematical accident, albeit one that may have helped to deceive Pearson and probably produced overconfidence when it gave the results he knew should hold for the normal distribution case. The fact that the identities they claimed in general would work for many efficient estimates is mathematics that would have been foreign to Pearson and Filon, and unachievable without the full notion of parametric families.

From 1903 on, Pearson subtly distanced himself from the paper without ever calling attention to the errors, but he never repudiated it. In 1899 William F. Sheppard published a long study of “normal correlation” (Sheppard, 1899). Sheppard appears to have not seen the Pearson–Filon paper (at any rate he did not cite it), and a part of what he presented included probable errors for the frequency constants in the normal case, derived by methods different from Pearson–Filon. The methods he used were quite straightforward: writing estimates as linear functions of frequency counts (using a Taylor expansion if necessary), and then finding moments from the variances and covariances of the counts in ways that remain standard today. The directness of Sheppard’s methods must have appealed to Pearson.
In a sequence of articles, all with the same title “On the probable errors of frequency constants” (Pearson, 1903, 1913, 1920), he presented what he called “simple proofs of the main propositions,” all the while with the Pearson–Filon paper receding into the background. In 1903 he gave only a general reference to the 1898 treatment (as well as to Sheppard); in 1913 he only referred to the formulae in 1898 for the case of the normal correlation coefficient (where they were correct); in 1920 he did not cite the 1898 work at all. In the 1903 paper he included formulae based upon Sheppard’s approach that were capable of being worked out for getting probable errors for estimates in five types of curves within the Pearson family, but only for methods of moments estimates.

The 1898 paper had a considerable impact upon statistical practice in making the use of probable errors available for the entire span of the new methodology including moment estimates. It could even be argued that the wrong, generally overoptimistic probable errors were better than none at all. And again, the paper had a significant impact upon Fisher. While preparing his 1922 memoir, Fisher clearly had Pearson and Filon before him, and his discussion of the asymptotic variance of maximum likelihood estimates (Fisher, 1922a, pages 328–329), involving the expansion of the density of a sample, reflects that. However, Fisher used the expansion in a different way, and operated under different assumptions. He began by assuming that the estimate tended to normality with large samples, and under that restriction and the assumption that the estimates maximized likelihood, Fisher used the expansion to show how the asymptotic variance could be found from the second derivative of the log density.
Pearson and Filon had sketched a solution to a problem that was not the one they had embarked upon. But it was Fisher who recognized, with the conceptual apparatus of parametric families, that this sketch could lead to the solution of his own problem. Pearson the pioneer had laid a path that was insufficiently well-lit for his own travel, but it provided a brightly lit highway for Fisher.

4. PEARSON’S SECOND MAJOR ERROR

Pearson introduced the Chi-square test in 1900, and it has been widely celebrated as a great achievement in statistical methodology. In 1984 the editors of a popular science magazine selected it as one of twenty discoveries made during the twentieth century that have changed our lives (Hacking, 1984). Yet for all this celebration, virtually no historical mention of the paper is made by statisticians without adding damning words to the effect that Pearson erred in claiming, as we would now put it, that no correction in degrees of freedom need be made when parameters are estimated under the null hypothesis. Worse for Pearson’s reputation, such accounts further note that the error stood uncorrected until it was sensed in 1915 by Greenwood and Yule and definitively corrected in 1922 and 1924 by Ronald Fisher, thus seemingly turning Pearson’s landmark publication into Fisher’s triumph over ignorance.

Pearson has had some defenders in this matter; some have even suggested that Pearson was right all along. For example, Karl’s son Egon and George Barnard have separately advanced tentative (and I think half-hearted) statistical cases that might be made for proceeding as Pearson did (Pearson, 1938, page 30; Barnard, 1992). But a cold, clear-eyed look at the original 1900 paper shows that such excuses cannot be reconciled with Pearson’s text. He did make an error, and a big, consequential one too.
The crucial passage from Pearson’s 1900 article is on pages 165–166. Pearson considered a test of fit based upon a total of $N$ frequency counts from a sample independently distributed among $n + 1$ groups or categories, with

$m$ = theoretical frequency [i.e., the expected frequency for the group in question],
$m_s$ = theoretical frequency deduced from data for the sample [i.e., expected frequency using the data to find the “best” value for the group],
$m'$ = observed frequency [for the group],

and with the total count $N = \sum m = \sum m_s = \sum m'$.

Pearson recognized that the estimated theoretical frequency $m_s$ would typically differ from the theoretical frequency $m$, and he denoted that difference by $\mu$; that is, $\mu = m - m_s$. His analysis gave particular attention to the relative error, namely $\mu/m_s$, “which,” he told us, “will, as a rule, be small.”

The gist of Pearson’s argument was to show that the Chi-square statistic based upon the theoretical frequencies, $\chi^2 = \sum \frac{(m' - m)^2}{m}$, is close to the Chi-square statistic based upon the estimated theoretical frequencies, $\chi_s^2 = \sum \frac{(m' - m_s)^2}{m_s}$; so close, in fact, that the discrepancy could for all practical purposes be ignored.[1]

[1] Pearson again employed $S$ for $\sum$, and his argument is made harder than necessary to understand by two clear typographical errors. The typographical errors are an evident missing left parenthesis in the numerator of the second term on his first line of equations on page 165, and a missing $m_s$ in the denominator of the second term of the second line of equations [it reappeared, correctly, when this term was repeated two lines later; that equation is our equation (3) below].

Pearson had evidently expanded $h(m) = \frac{(m' - m)^2}{m} = \frac{(m' - m_s - \mu)^2}{m_s + \mu}$ in a Taylor series about $m_s$, discarded the terms of higher order than $(\mu/m_s)^2$, and then summed the results over the $n + 1$ groups. Proceeding in this way, he would have found

$$h'(m) = -\frac{(m'^2 - m^2)}{m^2}, \qquad h''(m) = \frac{2m'^2}{m^3}, \qquad h'''(m) = -\frac{6m'^2}{m^4}, \quad \ldots.$$

And so, since $\mu = m - m_s$,

$$\begin{aligned}
h(m) &= h(m_s) + \mu h'(m_s) + \frac{\mu^2}{2} h''(m_s) + \frac{\mu^3}{6} h'''(m_s) + \cdots \\
&= \frac{(m' - m_s)^2}{m_s} - \frac{\mu}{m_s}\,\frac{m'^2 - m_s^2}{m_s} + \left(\frac{\mu}{m_s}\right)^2 \frac{m'^2}{m_s} - \left(\frac{\mu}{m_s}\right)^3 \frac{m'^2}{m_s} + \cdots \\
&= \frac{(m' - m_s)^2}{m_s} - \frac{\mu}{m_s}\,\frac{m'^2 - m_s^2}{m_s} + \left(\frac{\mu}{m_s}\right)^2 \frac{m'^2}{m_s},
\end{aligned}$$

dropping terms of higher order than $(\mu/m_s)^2$. Sum both sides over the $n + 1$ groups and this is the expression Pearson arrives at:

$$\chi^2 = \chi_s^2 - \sum\left\{\frac{\mu}{m_s}\,\frac{m'^2 - m_s^2}{m_s}\right\} + \sum\left\{\left(\frac{\mu}{m_s}\right)^2 \frac{m'^2}{m_s}\right\}, \tag{2}$$

and hence,

$$\chi^2 - \chi_s^2 = -\sum\left\{\frac{\mu}{m_s}\,\frac{m'^2 - m_s^2}{m_s}\right\} + \sum\left\{\left(\frac{\mu}{m_s}\right)^2 \frac{m'^2}{m_s}\right\}. \tag{3}$$

The term $-m_s^2$ in the numerator of the first term on the right-hand side of (3) is superfluous when summed over groups since $\sum \mu = \sum m - \sum m_s = 0$, but it plays a role in Pearson’s argument, which is no doubt why he left it in. For future reference, I note that exactly the same result can be arrived at more simply by noting

$$\chi^2 = \sum\left\{\frac{m'^2 - 2mm' + m^2}{m}\right\} = \sum\left\{\frac{m'^2}{m}\right\} - 2N + N = \sum\left\{\frac{m'^2}{m}\right\} - N,$$

and similarly $\chi_s^2 = \sum\left\{\frac{m'^2}{m_s}\right\} - N$; then

$$\chi^2 - \chi_s^2 = \sum\left\{\frac{m'^2}{m}\right\} - \sum\left\{\frac{m'^2}{m_s}\right\} = \sum m'^2\left\{\frac{1}{m} - \frac{1}{m_s}\right\}.$$

If we then expand $m^{-1}$ as a function of $m$ about $m_s$ (again neglecting third- or higher order terms), we get

$$\chi^2 - \chi_s^2 = -\sum\left\{\frac{\mu}{m_s}\,\frac{m'^2}{m_s}\right\} + \sum\left\{\left(\frac{\mu}{m_s}\right)^2 \frac{m'^2}{m_s}\right\}. \tag{4}$$

This agrees exactly with Pearson’s expansion (3) when the superfluous term “$-m_s^2$” is dropped, as would have to be the case since the function being expanded ($\chi^2 - \chi_s^2$) is the same in both cases.
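The algebra relating the exact difference to expansions (3) and (4) can be checked on any small table of counts. The Python sketch below is an illustration only: the function names are hypothetical and the three-group frequencies in the usage example are invented for the purpose, chosen so that all three sets of frequencies share the same total $N$.

```python
def chi_square(observed, expected):
    """Sum over groups of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def exact_difference(observed, theoretical, fitted):
    """chi^2 - chi_s^2 written as the sum of m'^2 * (1/m - 1/m_s)."""
    return sum(
        o * o * (1.0 / m - 1.0 / ms)
        for o, m, ms in zip(observed, theoretical, fitted)
    )


def expansion_3(observed, theoretical, fitted):
    """Second-order expansion (3), keeping the superfluous -m_s^2 term."""
    total = 0.0
    for o, m, ms in zip(observed, theoretical, fitted):
        r = (m - ms) / ms  # relative error mu / m_s
        total += -r * (o * o - ms * ms) / ms + r * r * o * o / ms
    return total


def expansion_4(observed, theoretical, fitted):
    """Second-order expansion (4), with the superfluous term dropped."""
    total = 0.0
    for o, m, ms in zip(observed, theoretical, fitted):
        r = (m - ms) / ms  # relative error mu / m_s
        total += (-r + r * r) * o * o / ms
    return total


if __name__ == "__main__":
    observed = [18, 30, 52]     # m', invented for illustration
    theoretical = [20, 30, 50]  # m
    fitted = [19, 30, 51]       # m_s
    print(chi_square(observed, theoretical) - chi_square(observed, fitted))
    print(exact_difference(observed, theoretical, fitted))
    print(expansion_3(observed, theoretical, fitted))
    print(expansion_4(observed, theoretical, fitted))
```

Expansions (3) and (4) coincide whenever $\sum m = \sum m_s$, and both differ from the exact difference only by the neglected terms of third and higher order in $\mu/m_s$.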
It is not hard to show reasonably generally under the hypothesis of fit that the terms dropped, even when summed, are indeed with high probability negligible when $N$ is large [$O_P(N^{-1/2})$]. In order to see where Pearson was led astray, we must then look to the paragraph following his equations. Pearson’s argument proceeded as follows: He recognized that the difference (3) between these two Chi-squares should be positive: the deviation of the observed counts from the theoretical counts should be greater than the same deviation if the theoretical counts are adjusted to fit the observed. He wished to argue that the difference (3) was not large. His argument was in two parts: (i) the first term on the right-hand side of (3) should be expected to be either negative (thus canceling out part of the second term) or at least very small; (ii) the second term was nonnegative of course, but it would be expected to be small in any case, because it involved for each summand the square of the relative error $\mu/m_s$, which Pearson had stated (page 164) “will, as a rule, be small,” and much smaller still when squared. He gave no citation for this “rule,” but two years earlier he had explicitly cited Gauss, Laplace and Poisson, among others, as sanctioning the dropping of terms involving the squares of errors thought to be small (Pearson and Filon, 1898, page 246). Presumably in stating this he assumed good estimates and ample data. He granted that in some cases where the fit was bad the deviations would be quite large, but then both Chi-squares would be large and the discrepancy between them unimportant.

There are two points to make about Pearson’s argument. The first is that his analysis (i) of the first term may seem dubious to modern eyes, but it is not the source of the error.
He noted that the first term will be positive only if the two terms multiplied ($\mu = m - m_s$ and $m'^2 - m_s^2$) are negatively correlated[2]; that is, if there was a tendency for the $m$’s to be ordered $m' > m_s > m$ or $m' < m_s < m$. He thought such a tendency “seems impossible,” but this is unconvincing, at least under the null hypothesis of fit. Might we not then expect often to find $m' > m_s > m$ or $m' < m_s < m$, with $m_s$ a compromise between theory and observation? He might have had an alternative hypothesis in mind, where $m'$ would then tend to track the true theoretical expectation $m$, leaving the estimate $m_s$ (made under false assumptions) off to one side. Although Pearson’s argument on point (i) can be questioned, his conclusion is correct. As Fisher would observe later, the first term is in fact zero (or nearly so) if the estimated $m$’s are chosen well (minimum Chi-square or maximum likelihood) due to the (near) orthogonality of $m - m_s$ and $m' - m_s$ in those cases (much like that of $\bar{X}$ and $X_i - \bar{X}$ for normal distributions).

[2] This would presumably be why he left the superfluous term “$-m_s^2$” in the expression.

In any event, it is part (ii) of his argument that is crucial, and that argument fails, and fails dramatically. The second term on the right-hand side of (3) should not be expected to be small under either null or alternative hypothesis. At this distance in time it may seem surprising that Pearson did not realize this. Already in 1938 his son Egon registered this surprise in a biographical memoir of his father (Pearson, 1938, page 30), when he noted that for any multinomial distribution, if Chi-square is computed with no parameter restrictions (so each theoretical value is estimated by the corresponding observed count and $m_s = m'$), then the fit with the estimated values is perfect.
We would thus have $\chi^2 - \chi_s^2 = \chi^2 - 0$, while the right-hand side of (3) gives

$$-\sum\left\{\frac{\mu}{m_s}\,\frac{m'^2 - m_s^2}{m_s}\right\} + \sum\left\{\left(\frac{\mu}{m_s}\right)^2 \frac{m'^2}{m_s}\right\} = -0 + \sum\left\{\frac{(m - m')^2}{m'}\right\}.$$

In this extreme case the second term is asymptotically equivalent to the original Chi-square itself under the null hypothesis, and so it is certainly not negligible. The test of fit is not interesting here (we would say the degrees of freedom is zero), but it shows starkly the devastating effect estimated parameters can have upon the statistic, even when (as in Egon’s example) the relative error itself ($\mu/m_s$) would be small [$O_P(N^{-1/2})$]. Why, Egon seemed to ask, would Karl have not seen this? Egon offered his father’s possible “hurry in execution” as one explanation.

5. FISHER’S CORRECTION

Ironically, Pearson did consider a similar example in 1922 and rejected its relevance. In 1922 Fisher (1922b) published his first comment on the degrees of freedom issue, and at that time he dealt only with the case Greenwood and Yule had noticed, the case of $r \times c$ contingency tables. There, Fisher’s argument was keyed to the way the linear relations with the marginal totals inhibited the estimated expectations under the null hypothesis, thus reducing the “degrees of freedom,” a term Fisher introduced there. At that time, Fisher made no attempt to address the question for tests of fit more generally. Pearson immediately rebutted in Biometrika. The reply focused upon what Pearson thought (mistakenly) was a confusion between different sampling models (fixed totals or full multinomial sampling), and Pearson invoked the traditional custom of astronomers and others of substituting estimates with small standard errors without penalty in large samples. He thought Fisher had blundered and was offering an exclusively conditional analysis, given the estimated quantities.
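Returning briefly to Egon Pearson's saturated example from the end of Section 4, where $m_s = m'$: the point is easy to reproduce numerically. The sketch below assumes made-up counts with $N = 10000$ (hypothetical values, not taken from any of the papers discussed). It shows the relative errors $\mu/m_s$ staying around one or two percent while the second term of (3) comes out nearly as large as $\chi^2$ itself.

```python
# Sketch of the saturated case m_s = m', using made-up counts (N = 10000).
# The counts are hypothetical; only the pattern matters.

def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over the groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [2450, 5080, 2470]          # m' (hypothetical counts)
m = [2500.0, 5000.0, 2500.0]           # m  (hypothetical theoretical frequencies)
m_s = [float(o) for o in observed]     # saturated fit: m_s = m'

chi2 = chi_square(observed, m)
chi2_s = chi_square(observed, m_s)     # zero, because the fit is perfect

rel_err = [(a - b) / b for a, b in zip(m, m_s)]      # mu / m_s, group by group

# The two terms on the right-hand side of (3).
first_term = -sum(r * (o * o - b * b) / b
                  for r, o, b in zip(rel_err, observed, m_s))
second_term = sum(r * r * o * o / b
                  for r, o, b in zip(rel_err, observed, m_s))

print("relative errors:", rel_err)
print("chi^2          :", chi2)
print("chi_s^2        :", chi2_s)
print("first term     :", first_term)
print("second term    :", second_term)
```

With these counts the second term and $\chi^2$ differ only in the third significant figure, which is the sense in which the "small" squared relative errors fail to make the sum small.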
Pearson noted (1922, page 187) that if you estimated “the first $p - 1$ moment-coefficients” a perfect fit would be obtained; he rejected such a conditional analysis as restricting the random sampling and antithetical to the question at issue. He did not see (and Fisher’s exposition would have made it difficult for him to see) that in the contingency table setting the conditional and unconditional tests were the same.

In 1924 Fisher returned to address the more general question, and if we look at Fisher’s treatment there, we see exactly where Pearson’s argument about the second term of (3) failed, and exactly what he lacked for a successful treatment (Fisher, 1924). Writing in 1924, Fisher clearly had Pearson’s paper in front of him. Fisher used slightly different notation,[3] but for ease of comparison I shall translate to Pearson’s notation. Fisher’s development was slightly streamlined in that Fisher did give the simpler expression for the difference of Chi-squares:

$$\chi^2 - \chi_s^2 = \sum m'^2\left(\frac{1}{m} - \frac{1}{m_s}\right).$$

[3] Fisher used $\chi'$, $x$, $m'$ and $n$ where Pearson used $\chi_s$, $m'$, $m_s$ and $N$.

It is exactly this expression that Fisher expanded in a Taylor series, just as Pearson had done, but with one absolutely crucial difference. Fisher was now armed with his own recently introduced notion of a parametric family, and where Pearson had simply dealt with this as a function of $m$, Fisher had $m = m(\theta)$ and expanded as a function of $\theta$, not $m$. He found the same two terms Pearson had found, but expressed them differently:

$$\frac{1}{m} - \frac{1}{m_s} = -\frac{1}{m_s^2}\frac{\partial m_s}{\partial\theta}\,\delta\theta + \left\{\frac{2}{m_s^3}\left(\frac{\partial m_s}{\partial\theta}\right)^2 - \frac{1}{m_s^2}\frac{\partial^2 m_s}{\partial\theta^2}\right\}\frac{(\delta\theta)^2}{2} + \text{higher-order terms}.$$

If this is multiplied by $m'^2$ and summed it gives

$$\chi^2 - \chi_s^2 = -\delta\theta \sum\left\{\frac{m'^2}{m_s^2}\frac{\partial m_s}{\partial\theta}\right\} + \frac{(\delta\theta)^2}{2}\sum\left\{\frac{2m'^2}{m_s^3}\left(\frac{\partial m_s}{\partial\theta}\right)^2 - \frac{m'^2}{m_s^2}\frac{\partial^2 m_s}{\partial\theta^2}\right\}.$$
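This expansion in $\theta$ can be tried out on a concrete one-parameter family. The sketch below is an illustration under stated assumptions, not Fisher's own example: the three-cell family $m(\theta) = N(\theta^2,\ 2\theta(1-\theta),\ (1-\theta)^2)$, the counts, and the "true" value $\theta = 0.5$ are all hypothetical, and $\delta\theta$ is taken to mean $\theta - \theta_s$, matching $\mu = m - m_s$. For this family the maximum likelihood estimate is $(2n_1 + n_2)/(2N)$. The last quantity computed, $(\delta\theta)^2 \sum \frac{1}{m_s}(\partial m_s/\partial\theta)^2$, is the leading part of the second term.

```python
# Sketch: the two-term expansion in theta for a hypothetical one-parameter family
# m(theta) = n * (theta^2, 2*theta*(1-theta), (1-theta)^2).

def expected(theta, n):
    """Expected counts m(theta) for the assumed three-cell family."""
    return [n * theta ** 2, 2 * n * theta * (1 - theta), n * (1 - theta) ** 2]

def d_expected(theta, n):
    """First derivatives of m(theta) with respect to theta."""
    return [2 * n * theta, n * (2 - 4 * theta), -2 * n * (1 - theta)]

def d2_expected(n):
    """Second derivatives of m(theta); constant in theta for this family."""
    return [2.0 * n, -4.0 * n, 2.0 * n]

def chi_square(observed, exp):
    """Sum of (observed - expected)^2 / expected over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, exp))

observed = [270, 480, 250]                               # m' (hypothetical counts)
n = sum(observed)
theta = 0.5                                              # assumed true value
theta_s = (2 * observed[0] + observed[1]) / (2 * n)      # ML estimate for this family
delta = theta - theta_s                                  # delta theta (assumed sign convention)

m_s = expected(theta_s, n)
dm = d_expected(theta_s, n)
d2m = d2_expected(n)

exact = chi_square(observed, expected(theta, n)) - chi_square(observed, m_s)

# First-order term: small here because the ML estimate nearly minimizes chi-square.
first = -delta * sum(o * o * d / e ** 2 for o, e, d in zip(observed, m_s, dm))

# Second-order term, as in the displayed expansion.
second = 0.5 * delta ** 2 * sum(
    2 * o * o * d * d / e ** 3 - o * o * d2 / e ** 2
    for o, e, d, d2 in zip(observed, m_s, dm, d2m)
)

# Leading part of the second term after replacing m'/m_s by 1.
leading = delta ** 2 * sum(d * d / e for e, d in zip(m_s, dm))

print("exact difference :", exact)
print("first term       :", first)
print("second term      :", second)
print("leading part     :", leading)
```

In this example nearly all of the difference comes from the second-order term, while the first-order term is close to zero.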
Fisher was now able to see that if the minimum Chi-square estimate $\hat\theta$ is used, then his first term and Pearson’s first term actually vanished (since then the first summation is exactly $\frac{d}{d\theta}\chi^2\big|_{\theta = \hat\theta} = 0$), and he knew already that the same would be true asymptotically for the maximum likelihood estimate or any other efficient estimate of $\theta$. He then replaced $m'/m_s$ by unity (its asymptotic value) to get

$$\chi^2 - \chi_s^2 = (\delta\theta)^2 \sum\left\{\frac{m'^2}{m_s^3}\left(\frac{\partial m_s}{\partial\theta}\right)^2 - \frac{m'^2}{2m_s^2}\frac{\partial^2 m_s}{\partial\theta^2}\right\}$$

$$\approx (\delta\theta)^2 \sum\left\{\frac{1}{m_s}\left(\frac{\partial m_s}{\partial\theta}\right)^2 - \frac{1}{2}\frac{\partial^2 m_s}{\partial\theta^2}\right\} = (\delta\theta)^2 \sum\left\{\frac{1}{m_s}\left(\frac{\partial m_s}{\partial\theta}\right)^2\right\}.$$

The last step used the fact that $\sum\left\{\frac{\partial^2 m_s}{\partial\theta^2}\right\} = \frac{\partial^2}{\partial\theta^2}\sum m_s = \frac{\partial^2}{\partial\theta^2} N = 0$. Based upon his own 1922 paper, he now noted that $\sum\left\{\frac{1}{m_s}\left(\frac{\partial m_s}{\partial\theta}\right)^2\right\}$ would, in the case of a single estimated parameter $\theta$, estimate (and approximate asymptotically) the reciprocal of the variance of any efficient estimate. This would give in modern notation

$$\chi^2 - \chi_s^2 = \frac{(\hat\theta - \theta)^2}{\sigma^2(\hat\theta)}.$$

This difference then was asymptotically equivalent to the square of a standard normal random variable. The degree of freedom that is lost by estimation became clearly visible.

There are two views that may be taken of this. One I have already mentioned: that the alchemist Fisher’s concept of a parametric family had turned Pearson’s base expressions into statistical gold. Posterity has used this to diminish Pearson’s reputation: how could he have missed such a simple and (now) obvious step? But there is another, to me more persuasive view. For over 20 years that step was anything but obvious. Pearson’s perceptive student G. Udny Yule initially accepted the 1900 rule, for example using 8 rather than 4 degrees of freedom for a Chi-square test of a $3 \times 3$ contingency table in Yule (1906, page 349).
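The two counts in that $3 \times 3$ example follow from simple arithmetic, sketched below with illustrative function names. Treating the fitted expectations as if they were known leaves $rc - 1$ degrees of freedom, while allowing for the fitted row and column totals leaves $(r-1)(c-1)$. The gap between them, $(r-1) + (c-1)$, is the number of independent marginal proportions estimated from the data.

```python
# Sketch: degrees of freedom for an r x c contingency table under the two rules.
# Function names are illustrative, not taken from any of the papers discussed.

def df_ignoring_estimation(rows, cols):
    """Cells minus one: the count used when fitted expectations are treated as known."""
    return rows * cols - 1

def df_independence(rows, cols):
    """(rows - 1) * (cols - 1): the count after allowing for the fitted marginal totals."""
    return (rows - 1) * (cols - 1)

for r, c in [(2, 2), (3, 3)]:
    lost = df_ignoring_estimation(r, c) - df_independence(r, c)
    print(f"{r} x {c}: {df_ignoring_estimation(r, c)} vs {df_independence(r, c)} "
          f"(difference {lost})")
```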
Only in 1915, after years of experience, did Greenwood and Yule (1915) bring the puzzle to wider notice, and even then neither they nor anyone else had a clear view of the source of the problem. And so it stood until Fisher.

Even with Fisher’s work before us, we must marvel at how far Pearson had gone. He had lacked only one ingredient, parametric families, but what he had managed to do was to identify the issue and present it in such a clear way that when Fisher combined Pearson’s 1900 development with the deceptively simple idea of parametric families, the solution must have sprung to mind nearly immediately. It took Fisher’s genius to answer the question, but he would scarcely have been in a position to do so without the path-breaking formulation of Pearson the pioneer.

It is not anachronistic to see Pearson as erring in 1900. Even without the notion of parametric families he could have seen a discrepancy without seeing a resolution, just as Greenwood and Yule did, when they found the Chi-square test for $2 \times 2$ tables gave results inconsistent with a comparison of the two columns as binomial counts. Pearson erred, but the error led to Fisher’s discovery of degrees of freedom. Pearson had not only solved the great problem of testing multinomial goodness of fit against all alternatives, he had also isolated and formulated another great problem in terms that two decades later permitted another genius, armed with his own major discovery, an easy solution.

6. CONCLUSION

The errors Pearson made did not go undetected because they were small; to the contrary, they were large and of potentially large practical consequence.
For example, in Pearson and Filon’s own numerical example for a Type III or Gamma density (Pearson and Filon, 1898, pages 279–280), the probable error given for the shape parameter $p$ is only about a fifth of what it should have been (Fisher, 1922a, page 336). If the curves being fit by the method of moments had been closer to the normal shape, the errors would have been smaller, but if not, there was no finite bound on how far off they could be. For Chi-squares for $2 \times 2$ tables Pearson would give 3 rather than 1 degree of freedom; for $3 \times 3$ tables he would give 8 rather than 4 degrees of freedom. In these and other examples the effect upon inferences could be devastating.

Not only were the errors Pearson made not easily discovered; even after they were pointed out in 1922 they were not widely understood. In 1924, a Handbook of Mathematical Statistics was published, prepared under the auspices of the U.S. National Research Council (Rietz, 1924). The Editor-in-Chief was H. L. Rietz, and major contributions were made by Harvard University Professor E. V. Huntington and University of Michigan Professor H. C. Carver (later the founding editor of the Annals of Mathematical Statistics). Carver cited the Pearson–Filon paper without any indication he saw the problem with it (page 95). Fisher’s (1922b) first correction to the degrees of freedom for contingency tables was briefly cited without comment by Rietz, but Rietz (with evident approval) also gave in more detail Pearson’s argument that no correction for estimating expected values was needed (pages 80–81). Elsewhere in the volume, Huntington wrote warmly of the method of moments, and nowhere was Fisher’s magnum opus of 1922 referred to. Even in England understanding was slow.
By 1938 Egon Pearson had conceded the degrees of freedom issue, but he seemed to have not accepted Fisher’s point about Pearson and Filon (Pearson, 1938, pages 28–29).

Both Pearson and Fisher were giants in our history; despite their lack of mutual appreciation we cannot imagine modern statistics without both. Pearson’s errors were substantial and not to be glossed over, but they should not obscure the even greater achievements they accompanied. Pearson had a giant ambition and the energy to realize it. He sought to create a whole new statistical system, and for a time succeeded. He did not have a mathematical mind equal to Fisher’s, and he became mired in and never escaped from an incompletely developed conceptual apparatus that was not equal to the full task at hand. But he took statistics to a higher level nonetheless. If Pearson could never come to admit some failures, it was surely due to a stubbornness that even he recognized in himself. In the Preface for the Second Edition of The Grammar of Science (1900b), Pearson wrote,

  “If I have not paid greater attention to my numerous critics, it is not that I have failed to study them; it is simply that I have remained, obstinately it may be, convinced that the views expressed are, relatively to our present state of knowledge, substantially correct” (Pearson, 1900b, page ix).

So it was with his statistical work as well.

Pearson’s impact upon Fisher may in the end stand as one of his greater achievements. Pearson had no student more diligent than Fisher, despite their differences. When in 1945 Fisher wrote an ill-fated biographical account of Pearson for the Dictionary of National Biography (rejected by the Dictionary and not published until by A. W. F.
Edwards, in 1994), he wrote to the editor that he had made a “lifelong study of Pearson’s writings.” Fisher further stated, “I have during the last 35 years at various times had occasion to look at probably all of [Pearson’s fundamental statistical memoirs] and at the immense output which was published in Biometrika.” It was from reading Pearson’s work and Pearson’s journal that Fisher’s interest in statistics developed in the way it did, and in the case of the two examples discussed here, the effect of the Pearsonian blueprint could scarcely be more evident. Fisher saw Pearson clearly, warts and all, and while he did not acknowledge the extent of his debt to Pearson, its extent is clear to other, less involved readers. As in Newton’s famous statement, Fisher stood on the shoulders of a giant (Merton, 1965).

Porter’s recent biography (2004) is illuminating on Pearson’s pre-statistical life. Eisenhart (1974) remains the most complete discussion of K. P.’s statistical work. For other discussion relating to this early work see Aldrich (1997), Hald (1998), Magnello (1996, 1998). On Chi-square see in particular Fienberg (1980), Hacking (1984), Plackett (1983) and Stigler (1999, Chapter 19). For other aspects of the Pearson–Fisher relationship see Stigler (2005, 2007a, 2007b). Pearson himself returned to that topic of Chi-square frequently, including Pearson (1915, 1922, 1923, 1932), most of these under the instigation of Fisher.

REFERENCES

Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912–1922. Statist. Sci. 12 162–176. MR1617519
Barnard, G. A. (1992). Introduction to Pearson (1900). In Breakthroughs in Statistics II (S. Kotz and N. L. Johnson, eds.) 1–10. Springer, New York.
Edgeworth, F. Y. (1908). On the probable errors of frequency-constants (contd.). J. Roy. Statist. Soc. 71 499–512.
Edwards, A. W. F.
(1994). R. A. Fisher on Karl Pearson. Notes and Records Roy. Soc. London 48 97–106. MR1272638
Eisenhart, C. (1974). Karl Pearson. In Dictionary of Scientific Biography 10 447–473. Charles Scribner’s Sons, New York.
Fienberg, S. E. (1980). Fisher’s contributions to the analysis of categorical data. In R. A. Fisher: An Appreciation (S. E. Fienberg and D. V. Hinkley, eds.) 75–84. Lecture Notes in Statist. 1. Springer, New York.
Filon, L. N. G. (1934). Remarks in Speeches Delivered at a Dinner in University College, London, in Honour of Professor Karl Pearson, 23 April 1934. The University Press, Cambridge.
Fisher, R. A. (1922a). On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc. London (A) 222 309–368; reprinted as Paper 18 in Fisher (1974).
Fisher, R. A. (1922b). On the interpretation of $\chi^2$ from contingency tables, and the calculation of P. J. Roy. Statist. Soc. 85 87–94; reprinted as Paper 19 in Fisher (1974).
Fisher, R. A. (1924). Conditions under which $\chi^2$ measures the discrepancy between observation and hypothesis. J. Roy. Statist. Soc. 87 442–450; reprinted as Paper 34 in Fisher (1974).
Fisher, R. A. (1974). The Collected Papers of R. A. Fisher (J. H. Bennett, ed.). Univ. Adelaide.
Greenwood, M. and Yule, G. U. (1915). The statistics of anti-typhoid and anti-cholera inoculations, and the interpretation of such statistics in general. In Proc. Roy. Soc. Medicine, Section of Epidemiology and State Medicine 8 113–190; reprinted (1971) in Statistical Papers of George Udny Yule (A. Stuart and M. G. Kendall, eds.) 171–248. Griffin, London.
Hacking, I. (1984). Trial by number. Science 84 5 69–70.
Hald, A. (1998). A History of Mathematical Statistics From 1750 to 1930. Wiley, New York. MR1619032
Jeffrey, G. B. (1938). Louis Napoleon George Filon. J. London Math. Soc. 13 310–318.
Magnello, M. E. (1996).
Karl Pearson’s Gresham lectures: W. F. R. Weldon, speciation and the origins of Pearsonian statistics. British J. History of Science 29 43–63.
Magnello, M. E. (1998). Karl Pearson’s mathematization of inheritance: From ancestral heredity to Mendelian genetics (1895–1909). Ann. Sci. 55 35–94. MR1605849
Merton, R. K. (1965). On the Shoulders of Giants: A Shandean Postscript. The Free Press, New York.
Morant, G. M. (1939). A Bibliography of the Statistical and Other Writings of Karl Pearson (Compiled with the Assistance of B. L. Welch). Cambridge Univ. Press.
Pearson, E. S. (1938). Karl Pearson: An Appreciation of Some Aspects of His Life and Work. Cambridge Univ. Press.
Pearson, K. (1895). Mathematical contributions to the theory of evolution, II: Skew variation in homogeneous material. Philos. Trans. Roy. Soc. London (A) 186 343–414. Reprinted in Karl Pearson’s Early Statistical Papers (1956) 41–112. Cambridge Univ. Press.
Pearson, K. (1896). Mathematical contributions to the theory of evolution, III: Regression, heredity and panmixia. Philos. Trans. Roy. Soc. London (A) 187 253–318. Reprinted in Karl Pearson’s Early Statistical Papers (1956) 113–178. Cambridge Univ. Press.
Pearson, K. (1897). Mathematical contributions to the theory of evolution. On a form of spurious correlation which may arise when indices are used in the measurement of organs. Proc. Roy. Soc. London 60 489–498.
Pearson, K. (1900a). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos. Magazine, 5th Series 50 157–175. Reprinted in Karl Pearson’s Early Statistical Papers (1948) 339–357. Cambridge Univ. Press.
Pearson, K. (1900b). The Grammar of Science, 2nd ed. [First was 1892]. Adam and Charles Black, London.
Pearson, K.
(1903, 1913, 1920). On the probable errors of frequency constants. Biometrika 2 273–281, 9 1–10, 13 113–132.
Pearson, K. (1915). On a brief proof of the fundamental formula for testing the goodness of fit of frequency distributions, and on the probable error of “P.” Philosophical Magazine, 6th Series 31 369–378.
Pearson, K. (1922). On the $\chi^2$ test of goodness of fit. Biometrika 14 186–191.
Pearson, K. (1923). Further note on the $\chi^2$ test of goodness of fit. Biometrika 14 418.
Pearson, K. and Filon, L. N. G. (1898). Mathematical contributions to the theory of evolution IV. On the probable errors of frequency constants and on the influence of random selection on variation and correlation. Philos. Trans. Roy. Soc. London (A) 191 229–311. Reprinted in Karl Pearson’s Early Statistical Papers (1956) 179–261. Cambridge Univ. Press. Abstract in Proc. Roy. Soc. London (1897) 62 173–176.
Pearson, K. (1932). Experimental discussion of the ($\chi^2$, P) test for goodness of fit. Biometrika 24 351–381.
Plackett, R. L. (1983). Karl Pearson and the chi-squared test. Internat. Statist. Rev. 51 59–72. MR0703306
Porter, T. M. (2004). Karl Pearson: The Scientific Life in a Statistical Age. Princeton Univ. Press. MR2054951
Rietz, H. L., ed. (1924). Handbook of Mathematical Statistics. Houghton Mifflin, Boston.
Sheppard, W. F. (1899). On the application of the theory of errors to cases of normal distribution and normal correlation. Philos. Trans. Roy. Soc. London (A) 192 101–167, 531.
Stigler, S. M. (1986). The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard Univ. Press, Cambridge, MA. MR0852410
Stigler, S. M. (1999). Statistics on the Table. Harvard Univ. Press, Cambridge, MA. MR1712969
Stigler, S. M. (2005). Fisher in 1921. Statist. Sci. 20 32–49. MR2182986
Stigler, S. M. (2007a). The pedigree of the International Biometrics Society.
Biometrics 63 317–321. MR2370789
Stigler, S. M. (2007b). The epic story of maximum likelihood. Statist. Sci. 22 598–620.
Stouffer, S. A. (1958). Karl Pearson: An appreciation on the 100th anniversary of his birth. J. Amer. Statist. Assoc. 58 23–27. MR0109122
Yule, G. U. (1906). The influence of bias and of personal equation in statistics of ill-defined qualities. J. Anthropological Institute of Great Britain and Ireland 36 325–381.
