Karl Pearson’s Theoretical Errors and the Advances They Inspired

Statistical Science 2008, Vol. 23, No. 2, 261–271. DOI: 10.1214/08-STS256. © Institute of Mathematical Statistics, 2008

Stephen M. Stigler

Abstract. Karl Pearson played an enormous role in determining the content and organization of statistical research in his day, through his research, his teaching, his establishment of laboratories, and his initiation of a vast publishing program. His technical contributions had initially and continue today to have a profound impact upon the work of both applied and theoretical statisticians, partly through their inadequately acknowledged influence upon Ronald A. Fisher. Particular attention is drawn to two of Pearson’s major errors that nonetheless have left a positive and lasting impression upon the statistical world.

Key words and phrases: Karl Pearson, R. A. Fisher, Chi-square test, degrees of freedom, parametric inference, history of statistics.

Stephen M. Stigler is the Ernest DeWitt Burton Distinguished Service Professor in the Department of Statistics, University of Chicago, 5734 University Avenue, Chicago, Illinois 60637, USA e-mail: stigler@uchicago.edu. This paper is based upon a talk presented at the Royal Statistical Society in March 2007, at a symposium celebrating the 150th anniversary of Karl Pearson’s birth.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2008, Vol. 23, No. 2, 261–271. This reprint differs from the original in pagination and typographic detail.

1. INTRODUCTION

Karl Pearson surely ranks among the more productive and intellectually energetic scholars in history. He cannot match the most prolific humanists, such as one of whom it has been said, “he had no unpublished thought,” but in the domain of quantitative science Pearson has no serious rival. Even the immensely prolific Leonhard Euler, whose collected works are still being published more than two centuries after his death, falls short of Pearson in sheer volume. A list of Pearson’s works fills a hardbound book; that book lists 648 works and is still incomplete (Morant, 1939). My own moderate collection of his works, itself very far from complete (it omits his contributions to Biometrika), occupies 5 feet of shelf space. And his were not casually constructed works: when a student or a new co-worker would do the laborious calculations for some statistical analysis, Pearson would redo the work to greater accuracy, as a check. An American visiting Pearson in the early 1930s once asked him how he found the time to write so much and compute so much. Pearson replied, “You Americans would not understand, but I never answer a telephone or attend a committee meeting” (Stouffer, 1958).

Pearson’s accomplishments were not merely voluminous; they could be luminously enlightening as well. Today the most famous of these are Pearson’s Product Moment Correlation Coefficient and the Chi-square test, dating respectively from 1896 and 1900 (Pearson, 1896, 1900a, 1900b). He was a driving force behind the founding of Biometrika, which he edited for 36 years and made into the first important journal in mathematical statistics. He also established another journal (the Annals of Eugenics) and several additional serial publications, two research laboratories, and a school of statistical thought.
Pearson pioneered in the use of machine calculation, and he supervised the calculation of a series of mathematical tables that influenced statistical practice for decades. He made other discoveries, less commonly associated with his name. He was in 1897 the first to name the phenomenon of “spurious correlation,” thus publicly identifying a powerful idea that made him and countless descendants more aware of the pitfalls expected in any serious statistical investigation of society (Pearson, 1897). And in a series of investigations of craniometry he introduced the idea of landmarks to the statistical study of shapes.

Pearson was at one time well known for the Pearson Family of Frequency Curves. That family is seldom referred to today, but there is a small fact (really a striking discovery) he found in its early development that I would call attention to. When we think of the normal approximation to the binomial, we usually think in terms of large samples. Pearson discovered that there is a sense in which the two distributions agree exactly for even the smallest number of trials. It is well known that the normal density is characterized by the differential equation

$$\frac{d}{dx}\log(f(x)) = \frac{f'(x)}{f(x)} = -\frac{(x-\mu)}{\sigma^2}.$$

Pearson discovered that $p(k)$, the probability function for the symmetric binomial distribution ($n$ independent trials, $p = 0.5$ each trial), satisfies the analogous difference equation exactly:

$$\frac{p(k+1)-p(k)}{(p(k+1)+p(k))/2} = -\frac{(k+1/2)-n/2}{(n+1)\cdot 1/2\cdot 1/2}$$

or

$$\frac{\text{rate of change } p(k) \text{ to } p(k+1)}{\text{average of } p(k) \text{ and } p(k+1)} = -\frac{\text{midpoint of } (k,k+1) - \mu_n}{\sigma^2_{n+1}}$$

for all $n$, $k$.
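As an illustrative check (not part of Pearson's memoir or of the original article), the difference equation can be verified in exact rational arithmetic for small numbers of trials. The function names below are hypothetical and chosen only for this sketch; the check assumes the symmetric case $p = 0.5$ stated above.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, k):
    """Exact probability of k successes in n fair trials (p = 1/2)."""
    return Fraction(comb(n, k), 2 ** n)


def lhs(n, k):
    """Change from p(k) to p(k+1), divided by the average of the two."""
    p0, p1 = binom_pmf(n, k), binom_pmf(n, k + 1)
    return (p1 - p0) / ((p1 + p0) / 2)


def rhs(n, k):
    """-(midpoint of (k, k+1) - n/2) / ((n + 1) * 1/2 * 1/2)."""
    midpoint = Fraction(2 * k + 1, 2)
    mean_n = Fraction(n, 2)
    variance_n_plus_1 = Fraction(n + 1, 4)
    return -(midpoint - mean_n) / variance_n_plus_1


def identity_holds(max_n=30):
    """True if both sides agree exactly for every 1 <= n <= max_n, 0 <= k < n."""
    return all(lhs(n, k) == rhs(n, k)
               for n in range(1, max_n + 1)
               for k in range(n))
```

Because `Fraction` arithmetic is exact, agreement here is an equality rather than a numerical approximation, which is the point of Pearson's observation: no large-sample argument is involved.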
The appearance of $n + 1$ instead of $n$ in the denominator might be considered a minor fudge, but the equation still demonstrates a really fundamental agreement in the shapes of the two distributions that does not rely upon asymptotics (Pearson, 1895, page 356).

All of these Pearsonian achievements are indeed substantial, and constitute ample reason to celebrate him 150 years after his birth. But if these are all we saluted, I would hold that Pearson is being underappreciated. To properly gauge his impact upon modern statistics, we must take a look at parts of two works of his that are typically not held in high regard. Indeed, they are usually mentioned in derision, as exhibiting two major errors that show Pearson’s limitations and highlight the great gulf that lay between Pearson and the Fisherian era that was to follow. I wish to return now to these two works and reassess them. I intend to argue that these errors should count among the more influential of his works, and that they helped pave the way for the creation of modern mathematical statistics.

2. PEARSON’S FIRST MAJOR ERROR

Louis Napoleon George Filon was born in France, but his family moved to England when he was three years old (Jeffrey, 1938). He first encountered Karl Pearson as a student at University College London. After receiving a B.A. in 1896, Filon served as Pearson’s Demonstrator until 1898, and together they wrote a monumental memoir on the “probable errors of frequency constants,” a paper read to the Royal Society in 1897 and published in their Transactions in 1898. In 1912 Filon succeeded Pearson as Goldsmid Professor of Applied Mathematics and Mechanics. At Pearson’s retirement banquet in 1934, Filon (who was by that time Vice-Chancellor of the University of London) explained the genesis of this, their only work together in statistics.

“K. P. lectured to us on the Mathematical Theory of Statistics, and on one occasion wrote down a certain integral as zero, which it should have been on accepted principle. Unfortunately I have always been one of those wrong-headed persons who refuse to accept the statements of Professors, unless I can justify them for myself. After much labour, I actually arrived at the value of the integral directly, and it was nothing like zero. I took this result to K. P., and then, if I may say so, the fun began. The battle lasted, I think, about a week, but in the end I succeeded in convincing Professor Pearson. It was typical of K. P. that, the moment he was really convinced, he saw the full consequences of the result, proceeded at once to build up a new theory (which involved scrapping some previously published results) and generously associated me with himself in the resulting paper” (Filon, 1934).

The term “probable error” was introduced early in the 19th century to mean what we would now call the median error of an estimate. Thus it is a value which, when divided by 0.6745, gives the standard deviation for an unbiased estimate with an approximately normal distribution. My guess is that the lecture that Filon referred to involved the formula for the probable error of Pearson’s product-moment estimate of the correlation coefficient for bivariate normal distributions. Pearson had given this incorrectly in 1896, and one of the signal achievements of the Pearson–Filon paper was to correct that error (Stigler, 1986, page 343). But the 1898 paper did much more: it purported to give the approximate distributions for the probable errors of all the estimated frequency constants, indeed their entire joint distribution, for virtually any statistical problem.
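As a small numerical aside on the definition of “probable error” given above, the constant 0.6745 is the upper quartile of the standard normal distribution, so that half of the errors of a normally distributed estimate fall within one probable error of zero. The sketch below uses Python's standard `statistics.NormalDist`; the helper names are hypothetical.

```python
from statistics import NormalDist

# Upper quartile of the standard normal distribution, approximately 0.6745.
QUARTILE = NormalDist().inv_cdf(0.75)


def probable_error(sd):
    """Median absolute error of an unbiased, normally distributed estimate."""
    return QUARTILE * sd


def sd_from_probable_error(pe):
    """Recover the standard deviation by dividing by the quartile constant."""
    return pe / QUARTILE


def central_probability(sd):
    """Probability that the error lies within one probable error of zero."""
    dist = NormalDist(0.0, sd)
    pe = probable_error(sd)
    return dist.cdf(pe) - dist.cdf(-pe)
```

Under the normality assumption, `central_probability` returns one half for any standard deviation, which is the defining property of the median error.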
The theory presented was relatively short; most of the paper was taken up with a large number of applications.

Unfortunately, quite a number of these applications proved to be in error. There are some indications Pearson may have realized this by 1903, but if he did sense trouble with the paper, he did not call it to public attention. In 1922 Ronald Fisher repaired Pearson’s omission when he noted in particular that outside of the case of the normal distribution, nearly all of the applications in Pearson–Filon were erroneous. This included many method-of-moments estimates (the gold-standard method for the Pearsonian school). A significant Pearson achievement came to be labeled an error, one eventually overcome by a Fisher success, and in consequence, the 1898 paper has suffered a low reputation. But despite these problems, the paper had, arguably, a significant and largely underappreciated positive impact upon statistics.

At first glance the Pearson–Filon argument may appear strikingly modern, apparently expanding a log-likelihood ratio to derive an asymptotic approximately multivariate normal distribution for the errors of estimation. The authors considered a multivariate set of $m$-dimensional measurements $x_1, x_2, x_3, \ldots, x_m$ of “a complex of organs,” and they stated the “frequency surface” should be given by

“$z = f(x_1, x_2, x_3, \ldots x_m; c_1, c_2, c_3, \ldots c_p)$, where $c_1, c_2, c_3, \ldots, c_p$, are $p$ frequency constants, which define the form as distinguished from the position of the frequency surface, and which will be functions of standard deviations, moments, skewnesses, coefficients of correlation, &c., &c., of individual organs, and of pairs of organs in the complex.”

The “position” of the surface would be given in terms of the means $h_1, h_2, \ldots, h_m$ of the $x$’s, which are implicit in this notation as they are stated to give the origin of the surface, and so there are $m + p$ frequency constants to be determined from a set of $n$ measurements, each $m$-dimensional.

To determine the “probable errors” of the frequency constants, Pearson and Filon looked at the ratio formed by dividing the product of $n$ such surfaces (for the $n$ vector measurements) into what a similar product would be if the frequency constants had been different values:

$$\frac{P_\Delta}{P_0} = \frac{\prod f(x_1+\Delta h_1, x_2+\Delta h_2, \ldots, x_m+\Delta h_m;\ c_1+\Delta c_1, c_2+\Delta c_2, \ldots, c_p+\Delta c_p)}{\prod f(x_1, x_2, \ldots, x_m;\ c_1, c_2, \ldots, c_p)}.$$

The logarithm of this ratio is then the difference of two sums. After being expanded “by Taylor’s theorem,” this yields a series with typical terms, writing $S$ for summation, they found to be

$$\begin{aligned}
\log(P_\Delta/P_0) ={}& \Delta h_r\, S\frac{d}{dx_r}(\log f) + \frac{1}{2}(\Delta h_r)^2\, S\frac{d^2}{dx_r^2}(\log f) + \Delta h_r \Delta h_{r'}\, S\frac{d^2}{dx_r\,dx_{r'}}(\log f) \\
&+ \Delta c_s\, S\frac{d}{dc_s}(\log f) + \frac{1}{2}(\Delta c_s)^2\, S\frac{d^2}{dc_s^2}(\log f) + \Delta c_s \Delta c_{s'}\, S\frac{d^2}{dc_s\,dc_{s'}}(\log f) \\
&+ \Delta h_r \Delta c_s\, S\frac{d^2}{dh_r\,dc_s}(\log f) + \cdots + \text{cubic terms in } \Delta h \text{ and } \Delta c + \text{\&c},
\end{aligned}$$

“where $f$ stands for $f(x_1, x_2, x_3, \ldots, x_m; c_1, c_2, c_3, \ldots, c_p)$.”

Pearson and Filon then “replace sums by integrals,” for example, by replacing the second summation above, namely $S\frac{d^2}{dx_r^2}(\log f)$, by

$$-B_r = \int\!\!\int\!\!\int \cdots f\, \frac{d^2 \log f}{dx_r^2}\, dx_1\, dx_2 \cdots dx_m.$$

With their notation the frequency surface encompassed a volume $= n$ (i.e., it was not a relative frequency surface), so if the $f$ were taken as a density or relative frequency surface this would be tantamount to replacing $\frac{1}{n} S\frac{d^2}{dx_r^2}(\log f)$ by its expectation $E\left[\frac{d^2}{dx_r^2}(\log f)\right]$, which would equal $-B_r$. The integrals were then evaluated, and higher-order terms discarded, to get

$$P_\Delta = P_0 \exp\left(-\frac{1}{2}\left\{B_r(\Delta h_r)^2 - 2C_{rr'}\Delta h_r \Delta h_{r'} - 2G_{rs}\Delta h_r \Delta c_s + E_s(\Delta c_s)^2 - 2F_{ss'}\Delta c_s \Delta c_{s'} + \text{\&c.} \cdots\right\}\right) \tag{1}$$

where $B_r$, etc. are integrals given in terms of derivatives of $\log f$. We are then told:

“This represents the probability of the observed unit, i.e. the individuals ($x_1, x_2, x_3, \ldots x_m$, for all sets), occurring, on the assumption that the errors $\Delta h_1, \Delta h_2, \ldots, \Delta h_m, \Delta c_1, \ldots, \Delta c_p$, have been made in the determination of the frequency constants. In other words, we have here the frequency distribution for errors in the values of the frequency constants.”

Several of their steps, such as the cavalier substitution of integrals for sums or the discarding of remainder terms, may seem insufficiently defended, but the general drift is so similar to what we tend to see today that it would be easy for an uncritical reader to accept it, believing that it is probably essentially accurate, and that with some effort and additional regularity conditions all should be well. After all, such a reader might say, the replacement can be justified under reasonable regularity conditions, and even the last step would be sanctioned by a loose inverse probability argument such as was common at that time. That reader would be wrong.

3. THE SOURCE OF THE ERROR

The key to understanding what went wrong with the Pearson–Filon argument is at the very beginning, as a closer reading shows. Modern readers have understandably tended to take the frequency surface $z = f(x_1, x_2, x_3, \ldots, x_m; c_1, c_2, c_3, \ldots, c_p)$ as a parametric model. But in fact, in their notation, $z$ is the fitted surface in terms of estimated $h$’s and $c$’s.
Pearson and Filon explained this implicitly in the first paragraph on page 231, when they write exclusively in terms of the group of $n$ individuals and the “means” $h_i$ (referring to the arithmetic means) as determining the origin of the surface, and explicitly in the short fourth paragraph, where they refer to the $h + \Delta h$’s and $c + \Delta c$’s of the numerator as being considered “instead of the observed values.” This runs counter to modern statistical practice, which would focus on a specification in terms of parameters, not the estimates, with one value considered the true or population value, and the focus would be the deviations of the estimates from that value. The idea of this type of parametric modeling was, however, only to be introduced in 1922 by Fisher (Stigler, 2005), and Pearson’s “frequency constants” were not parameters, even if they were sometimes employed in an equivalent fashion. This difference was, as we shall see, highly consequential; it was the source of the principal difficulties in the argument.

Because Pearson and Filon took the estimates as a starting point, the Taylor expansion they gave was about the estimated values. The expansion itself is fine, but when they came to substituting integrals for sums, they inadvertently encountered a problem. Consider the first sum that involves a general frequency constant $c$, namely $S\frac{d}{dc_s}(\log f)$. (The earlier terms involve the $h$’s but have been written in terms of derivatives with respect to the $x$’s; the issue is more clearly addressed with the $c$’s.) If we took $f$ as a density, it might not be unreasonable to replace this sum (divided by $n$) by its limit in probability, the expectation $E\left[\frac{d}{dc_s}(\log f)\right]$. But under what distribution should the expectation be computed? With Fisher it would be computed under the distribution with the true values of the parameters.
But Pearson lacked that notion; for him there was no “true value,” only a summary estimate in terms of observed values.

Someone (perhaps Fourier) has been quoted as saying that “Mathematics has no symbols for confused ideas.” Anyone seeking a counterexample to this need look no further than Pearson–Filon. With their symbol $f = f(x_1, x_2, x_3, \ldots, x_m; c_1, c_2, c_3, \ldots, c_p)$ submerging the role of estimated values and elevating them in the process to surrogates for true values, the argument goes astray. All expectations are computed as if the estimated values were true values, and the result is a distribution for errors that does not in any way depend upon the method used to estimate. Pearson and Filon replaced $S\frac{d}{dc_s}(\log f)$ by the integral

$$D_s = \int\!\!\int\!\!\int \cdots f\, \frac{d \log f}{dc_s}\, dx_1\, dx_2 \cdots dx_m,$$

which reduces identically to zero (if the two $f$’s are identical) under fairly general regularity conditions. But the same would not be generally true for

$$\int\!\!\int\!\!\int \cdots f(x\,|\,\theta)\, \frac{d \log f(x\,|\,\hat\theta)}{dc_s}\, dx_1\, dx_2 \cdots dx_m,$$

writing $\hat\theta$ for $(h_1, h_2, h_3, \ldots, h_m; c_1, c_2, c_3, \ldots, c_p)$, and $\theta$ for the potential true value $\hat\theta$ is intended to estimate.

The summation $S\frac{d}{dc_s}(\log f)$ is identically zero if maximum likelihood estimates are used, but it remained for Fisher to notice that this term is not in general negligible; in fact, it will contribute asymptotically to the variance term if the estimate $\hat\theta$ is inefficient, as would be the case for many of Pearson’s moment-based estimates. Pearson and Filon wrote that the expansion represented the distribution “on the assumption that the errors $\Delta h_1, \Delta h_2, \ldots, \Delta h_m, \Delta c_1, \ldots, \Delta c_p$ have been made in the determination of the frequency constants.” But that is not what they had done.
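The contrast between the two integrals can be made concrete in a deliberately simple case. The sketch below is an illustration under stated assumptions, not an example from Pearson–Filon: it takes a single normal measurement with mean zero and treats the standard deviation as the only frequency constant, then evaluates the integral of $f(x\,|\,\theta)\, d\log f(x\,|\,\hat\theta)/dc$ by the trapezoid rule. All function names are hypothetical.

```python
from math import exp, pi, sqrt


def normal_density(x, sd):
    """Density of a normal distribution with mean 0 and standard deviation sd."""
    return exp(-0.5 * (x / sd) ** 2) / (sd * sqrt(2 * pi))


def score_sd(x, sd):
    """Derivative of log normal_density(x, sd) with respect to sd."""
    return -1.0 / sd + x * x / sd ** 3


def expected_score(true_sd, fitted_sd, half_width=12.0, steps=20000):
    """Trapezoid-rule value of the integral of f(x | true) * score(x | fitted)."""
    lower = -half_width * true_sd
    upper = half_width * true_sd
    dx = (upper - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lower + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * normal_density(x, true_sd) * score_sd(x, fitted_sd)
    return total * dx


def expected_score_closed_form(true_sd, fitted_sd):
    """Exact value for this model: -1/fitted + true^2 / fitted^3."""
    return -1.0 / fitted_sd + true_sd ** 2 / fitted_sd ** 3
```

When the fitted and true values coincide the integral vanishes, as Pearson and Filon assumed; when they differ it does not, and in this model the closed form shows the departure is of first order in the estimation error.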
With their notation of a simple $f$ without arguments, and their fixation on the estimated values, they were led to mathematical misadventure.

If the substitution of limiting integrals for sums had been valid, the likelihood ratio $P_\Delta/P_0$ would then have been the ratio of the probability densities of the sample with estimated $h$’s and $c$’s (the denominator) to that for a hypothetical set of alternative values (the $h + \Delta h$’s and $c + \Delta c$’s, the numerator). In modern terminology, Pearson’s final expression (1) was claimed to be an approximation for $P_\Delta$, the (conditional) density of the sample $x$ given that the estimated values are in error by $\Delta h$ or $\Delta c$, and their final claim (“In other words, we have here the frequency distribution for errors in the values of the frequency constants”) was an assertion that formula (1) also gives the (conditional) density of the errors $\Delta h$ and $\Delta c$ given the data $x$. This last statement was not explained, but later in 1916 correspondence with Fisher (quoted in Stigler, 2005) Pearson described it as the use of inverse probability, making it a naive Bayesian approach with a uniform prior, such as was practiced routinely over the 19th century and was sometimes referred to as the Gaussian method.

Pearson’s contemporaries did not raise questions about the memoir. When Edgeworth discussed and extended it in 1908, he gave no indication he saw anything amiss (Edgeworth, 1908). Only in 1922 did Ronald Fisher criticize the approach of the paper in a lengthy footnote (Fisher, 1922a, page 329), writing that “It is unfortunate that in this memoir no sufficient distinction is drawn between the population and the sample . . . .” Fisher went on to say that the results implicitly assume the estimates actually maximized the likelihood function, whereas they were applied in many cases where this was not the case.
He wrote,

“It would appear that shortly before 1898 the process which leads to the correct value, of the probable errors of optimum statistics, was hit upon and found to agree with the probable errors of statistics found by the method of moments for normal curves and surfaces; without further enquiry it would appear to have been assumed that this process was valid in all cases, its directness and simplicity being peculiarly attractive. The mistake was at the time, perhaps, a natural one; but that it should have been discovered and corrected without revealing the inefficiency of the method of moments is a very remarkable circumstance” (Fisher, 1922a, page 329).

It is worth pointing out that the size of the correction Fisher noted was needed was not small. Fisher gave several examples of nonnormal members of Pearson’s own family of curves where the lower bound of the efficiency of the moment-based estimates was zero. Since Fisher measured efficiency as a ratio of variances, this meant that the correction needed for Pearson’s 1898 expressions for “probable errors” could be enormous, in fact arbitrarily large. The 1898 expressions were not larger than the actual probable errors, but there was little else that could be said. There was no finite limit to the amount they underestimated the actual probable errors.

The major error in the paper was due (as Fisher noted) to a conceptual confusion, a taking of the estimated frequency constants in part of the analysis in the place of the actual frequency constants. Pearson had run aground after encountering a need for a clear notion of a set of values for his frequency constants; he did not have a framework to encompass both estimates and targets of estimation. To some degree then, the Pearson–Filon error can be seen to be due to the lack of the notion of parametric families.
Pearson and Filon used notation in this memoir suggestive of parametric families, but the lack of conceptual clarity led to a confused and ultimately erroneous analysis. Pearson thought of “frequency constants” as quantities such as moments, derivable from arbitrary density curves with the same meaning in all cases and with the sample moments as clearly leading to the best estimates.

Fisher’s comments were apt; perhaps even too generous, although it is doubtful Pearson would have agreed with such an assessment. That the Pearson–Filon procedure can be shown to work for efficient estimates is a species of mathematical accident, albeit one that may have helped to deceive Pearson and probably produced overconfidence when it gave the results he knew should hold for the normal distribution case. The fact that the identities they claimed in general would work for many efficient estimates is mathematics that would have been foreign to Pearson and Filon, and unachievable without the full notion of parametric families.

From 1903 on, Pearson subtly distanced himself from the paper without ever calling attention to the errors, but he never repudiated it. In 1899 William F. Sheppard published a long study of “normal correlation” (Sheppard, 1899). Sheppard appears to have not seen the Pearson–Filon paper (at any rate he did not cite it), and a part of what he presented included probable errors for the frequency constants in the normal case, derived by methods different from Pearson–Filon. The methods he used were quite straightforward: writing estimates as linear functions of frequency counts (using a Taylor expansion if necessary), and then finding moments from the variances and covariances of the counts in ways that remain standard today.

The directness of Sheppard’s methods must have appealed to Pearson.
In a sequence of articles, all with the same title “On the probable errors of frequency constants” (Pearson, 1903, 1913, 1920), he presented what he called “simple proofs of the main propositions,” all the while with the Pearson–Filon paper receding into the background. In 1903 he gave only a general reference to the 1898 treatment (as well as to Sheppard); in 1913 he only referred to the formulae in 1898 for the case of the normal correlation coefficient (where they were correct); in 1920 he did not cite the 1898 work at all. In the 1903 paper he included formulae based upon Sheppard’s approach that were capable of being worked out for getting probable errors for estimates in five types of curves within the Pearson family, but only for methods of moments estimates.

The 1898 paper had a considerable impact upon statistical practice in making the use of probable errors available for the entire span of the new methodology including moment estimates. It could even be argued that the wrong, generally overoptimistic probable errors were better than none at all. And again, the paper had a significant impact upon Fisher. While preparing his 1922 memoir, Fisher clearly had Pearson and Filon before him, and his discussion of the asymptotic variance of maximum likelihood estimates (Fisher, 1922a, pages 328–329), involving the expansion of the density of a sample, reflects that. However, Fisher used the expansion in a different way, and operated under different assumptions. He began by assuming that the estimate tended to normality with large samples, and under that restriction and the assumption that the estimates maximized likelihood, Fisher used the expansion to show how the asymptotic variance could be found from the second derivative of the log density.
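In modern terms, the result attributed to Fisher above says that the large-sample variance of a maximum likelihood estimate is approximately the reciprocal of $n$ times the negative expected second derivative of the log density. The following sketch illustrates this with an exponential distribution, a model chosen here purely for convenience and not one discussed in the text; it is not Fisher's own derivation, and the function names, seed, and sample sizes are arbitrary assumptions.

```python
import random
from statistics import fmean, variance


def fisher_asymptotic_variance(rate, n):
    """Variance suggested by the second derivative of the log density.

    For f(x) = rate * exp(-rate * x), the second derivative of log f with
    respect to rate is -1 / rate**2 for every x, so the expected information
    per observation is 1 / rate**2.
    """
    information_per_observation = 1.0 / rate ** 2
    return 1.0 / (n * information_per_observation)


def simulated_mle_variance(rate, n, reps, seed=0):
    """Sample variance of the MLE (1 / sample mean) over repeated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.expovariate(rate) for _ in range(n)]
        estimates.append(1.0 / fmean(sample))
    return variance(estimates)
```

For example, comparing `simulated_mle_variance(2.0, 200, 2000)` with `fisher_asymptotic_variance(2.0, 200)` should give values of similar size, with the simulated figure slightly larger because the approximation is only asymptotic.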
Pearson and Filon had sketched a solution to a problem that was not the one they had embarked upon. But it was Fisher who recognized, with the conceptual apparatus of parametric families, that this sketch could lead to the solution of his own problem. Pearson the pioneer had laid a path that was insufficiently well-lit for his own travel, but it provided a brightly lit highway for Fisher.

4. PEARSON’S SECOND MAJOR ERROR

Pearson introduced the Chi-square test in 1900, and it has been widely celebrated as a great achievement in statistical methodology. In 1984 the editors of a popular science magazine selected it as one of twenty discoveries made during the twentieth century that have changed our lives (Hacking, 1984). Yet for all this celebration, virtually no historical mention of the paper is made by statisticians without adding damning words to the effect that Pearson erred in claiming, as we would now put it, that no correction in degrees of freedom need be made when parameters are estimated under the null hypothesis. Worse for Pearson’s reputation, such accounts further note that the error stood uncorrected until it was sensed in 1915 by Greenwood and Yule and definitively corrected in 1922 and 1924 by Ronald Fisher, thus seemingly turning Pearson’s landmark publication into Fisher’s triumph over ignorance.

Pearson has had some defenders in this matter; some have even suggested that Pearson was right all along. For example, Karl’s son Egon and George Barnard have separately advanced tentative (and I think half-hearted) statistical cases that might be made for proceeding as Pearson did (Pearson, 1938, page 30; Barnard, 1992). But a cold, clear-eyed look at the original 1900 paper shows that such excuses cannot be reconciled with Pearson’s text. He did make an error, and a big, consequential one too.
The crucial passage from Pearson’s 1900 article is on pages 165–166. Pearson considered a test of fit based upon a total of $N$ frequency counts from a sample independently distributed among $n + 1$ groups or categories, with

$m$ = theoretical frequency [i.e., the expected frequency for the group in question],
$m_s$ = theoretical frequency deduced from data for the sample [i.e., expected frequency using the data to find the “best” value for the group],
$m'$ = observed frequency [for the group],

and with the total count $N = \sum m = \sum m_s = \sum m'$.

Pearson recognized that the estimated theoretical frequency $m_s$ would typically differ from the theoretical frequency $m$, and he denoted that difference by $\mu$; that is, $\mu = m - m_s$. His analysis gave particular attention to the relative error, namely $\mu/m_s$, “which,” he told us, “will, as a rule, be small.”

The gist of Pearson’s argument was to show that the Chi-square statistic based upon the theoretical frequencies, $\chi^2 = \sum \frac{(m'-m)^2}{m}$, is close to the Chi-square statistic based upon the estimated theoretical frequencies, $\chi_s^2 = \sum \frac{(m'-m_s)^2}{m_s}$; so close, in fact, that the discrepancy could for all practical purposes be ignored.¹

¹ Pearson again employed $S$ for $\sum$, and his argument is made harder than necessary to understand by two clear typographical errors. The typographical errors are an evident missing left parenthesis in the numerator of the second term on his first line of equations on page 165, and a missing $m_s$ in the denominator of the second term of the second line of equations [it reappeared, correctly, when this term was repeated two lines later; that equation is our equation (3) below].

Pearson had evidently expanded $h(m) = \frac{(m'-m)^2}{m} = \frac{(m'-m_s-\mu)^2}{m_s+\mu}$ in a Taylor series about $m_s$, discarded the terms of higher order than $(\mu/m_s)^2$, and then summed the results over the $n + 1$ groups. Proceeding in this way, he would have found

$$h'(m) = -\frac{(m'^2 - m^2)}{m^2}, \qquad h''(m) = \frac{2m'^2}{m^3}, \qquad h'''(m) = -\frac{6m'^2}{m^4}, \ \ldots.$$

And so, since $\mu = m - m_s$,

$$\begin{aligned}
h(m) &= h(m_s) + \mu h'(m_s) + \frac{\mu^2}{2} h''(m_s) + \frac{\mu^3}{6} h'''(m_s) + \cdots \\
&= \frac{(m'-m_s)^2}{m_s} - \frac{\mu}{m_s}\,\frac{m'^2 - m_s^2}{m_s} + \left(\frac{\mu}{m_s}\right)^2 \frac{m'^2}{m_s} - \left(\frac{\mu}{m_s}\right)^3 \frac{m'^2}{m_s} + \cdots \\
&= \frac{(m'-m_s)^2}{m_s} - \frac{\mu}{m_s}\,\frac{m'^2 - m_s^2}{m_s} + \left(\frac{\mu}{m_s}\right)^2 \frac{m'^2}{m_s},
\end{aligned}$$

dropping terms of higher order than $(\mu/m_s)^2$. Sum both sides over the $n + 1$ groups and this is the expression Pearson arrives at:

$$\chi^2 = \chi_s^2 - \sum\left\{\frac{\mu}{m_s}\,\frac{m'^2 - m_s^2}{m_s}\right\} + \sum\left\{\left(\frac{\mu}{m_s}\right)^2 \frac{m'^2}{m_s}\right\}, \tag{2}$$

and hence,

$$\chi^2 - \chi_s^2 = -\sum\left\{\frac{\mu}{m_s}\,\frac{m'^2 - m_s^2}{m_s}\right\} + \sum\left\{\left(\frac{\mu}{m_s}\right)^2 \frac{m'^2}{m_s}\right\}. \tag{3}$$

The term $-m_s^2$ in the numerator of the first term on the right-hand side of (3) is superfluous when summed over groups since $\sum \mu = \sum m - \sum m_s = 0$, but it plays a role in Pearson’s argument, which is no doubt why he left it in.

For future reference, I note that exactly the same result can be arrived at more simply by noting

$$\chi^2 = \sum\left\{\frac{m'^2 - 2mm' + m^2}{m}\right\} = \sum\left\{\frac{m'^2}{m}\right\} - 2N + N = \sum\left\{\frac{m'^2}{m}\right\} - N,$$

and similarly $\chi_s^2 = \sum\left\{\frac{m'^2}{m_s}\right\} - N$; then

$$\chi^2 - \chi_s^2 = \sum\left\{\frac{m'^2}{m}\right\} - \sum\left\{\frac{m'^2}{m_s}\right\} = \sum m'^2\left\{\frac{1}{m} - \frac{1}{m_s}\right\}.$$

If we then expand $m^{-1}$ as a function of $m$ about $m_s$ (again neglecting third- or higher order terms), we get

$$\chi^2 - \chi_s^2 = -\sum\left\{\frac{\mu}{m_s}\,\frac{m'^2}{m_s}\right\} + \sum\left\{\left(\frac{\mu}{m_s}\right)^2 \frac{m'^2}{m_s}\right\}. \tag{4}$$

This agrees exactly with Pearson’s expansion (3) when the superfluous term “$-m_s^2$” is dropped, as would have to be the case since the function being expanded ($\chi^2 - \chi_s^2$) is the same in both cases.
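The shortcut identity for Chi-square and the two-term expansion (4) are easy to check numerically. The sketch below is an illustration only: the counts are invented for the purpose (they are not data from Pearson's paper), and the function names are hypothetical. It assumes, as in the text, that observed, theoretical and fitted frequencies all sum to the same total $N$.

```python
def chi_square(observed, expected):
    """Sum over groups of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_shortcut(observed, expected):
    """Sum of observed^2 / expected, minus N; valid when both sum to N."""
    n_total = sum(observed)
    return sum(o * o / e for o, e in zip(observed, expected)) - n_total


def second_order_difference(observed, theoretical, fitted):
    """The two retained terms on the right-hand side of (4)."""
    first = -sum((m - ms) / ms * o * o / ms
                 for o, m, ms in zip(observed, theoretical, fitted))
    second = sum(((m - ms) / ms) ** 2 * o * o / ms
                 for o, m, ms in zip(observed, theoretical, fitted))
    return first + second


# Invented example with N = 200 in three groups.
observed = [48, 102, 50]       # m'
theoretical = [50, 100, 50]    # m
fitted = [49, 101, 50]         # m_s
exact_difference = chi_square(observed, theoretical) - chi_square(observed, fitted)
approx_difference = second_order_difference(observed, theoretical, fitted)
```

In this invented example the exact and second-order differences agree to about three decimal places, the gap being the neglected third-order term.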
It is not hard to show reasonably generally under the hypothesis of fit that the terms dropped, even when summed, are indeed with high probability negligible when $N$ is large [$O_P(N^{-1/2})$]. In order to see where Pearson was led astray, we must then look to the paragraph following his equations. Pearson’s argument proceeded as follows: He recognized that the difference (3) between these two Chi-squares should be positive: the deviation of the observed counts from the theoretical counts should be greater than the same deviation if the theoretical counts are adjusted to fit the observed. He wished to argue that the difference (3) was not large. His argument was in two parts: (i) the first term on the right-hand side of (3) should be expected to be either negative (thus canceling out part of the second term) or at least very small; (ii) the second term was nonnegative of course, but it would be expected to be small in any case, because it involved for each summand the square of the relative error $\mu/m_s$, which Pearson had stated (page 164) “will, as a rule, be small,” and much smaller still when squared. He gave no citation for this “rule,” but two years earlier he had explicitly cited Gauss, Laplace and Poisson, among others, as sanctioning the dropping of terms involving the squares of errors thought to be small (Pearson and Filon, 1898, page 246). Presumably in stating this he assumed good estimates and ample data. He granted that in some cases where the fit was bad the deviations would be quite large, but then both Chi-squares would be large and the discrepancy between them unimportant.

There are two points to make about Pearson’s argument. The first is that his analysis (i) of the first term may seem dubious to modern eyes, but it is not the source of the error.
He noted that the first term will be positive only if the two terms multiplied ($\mu = m - m_s$ and $m'^{2} - m_s^2$) are negatively correlated²; that is, if there was a tendency for the $m$’s to be ordered $m' > m_s > m$ or $m' < m_s < m$. He thought such a tendency “seems impossible,” but this is unconvincing, at least under the null hypothesis of fit. Might we not then expect often to find $m' > m_s > m$ or $m' < m_s < m$, with $m_s$ a compromise between theory and observation? He might have had an alternative hypothesis in mind, where $m'$ would then tend to track the true theoretical expectation $m$, leaving the estimate $m_s$ (made under false assumptions) off to one side. Although Pearson’s argument on point (i) can be questioned, his conclusion is correct. As Fisher would observe later, the first term is in fact zero (or nearly so) if the estimated $m$’s are chosen well (minimum Chi-square or maximum likelihood) due to the (near) orthogonality of $m - m_s$ and $m' - m_s$ in those cases (much like that of $\bar{X}$ and $X_i - \bar{X}$ for normal distributions).

² This would presumably be why he left the superfluous term “$-m_s^2$” in the expression.

In any event, it is part (ii) of his argument that is crucial, and that argument fails, and fails dramatically. The second term on the right-hand side of (3) should not be expected to be small under either null or alternative hypothesis. At this distance in time it may seem surprising that Pearson did not realize this. Already in 1938 his son Egon registered this surprise in a biographical memoir of his father (Pearson, 1938, page 30), when he noted that for any multinomial distribution, if Chi-square is computed with no parameter restrictions (so each theoretical value is estimated by the corresponding observed count and $m_s = m'$), then the fit with the estimated values is perfect.
We would thus have $\chi^2 - \chi_s^2 = \chi^2 - 0$, while the right-hand side of (3) gives

$$
- \sum \left\{ \frac{\mu}{m_s}\,\frac{m'^{2} - m_s^2}{m_s} \right\} + \sum \left\{ \left(\frac{\mu}{m_s}\right)^2 \frac{m'^{2}}{m_s} \right\} = -0 + \sum \left\{ \frac{(m - m')^2}{m'} \right\}.
$$

In this extreme case the second term is asymptotically equivalent to the original Chi-square itself under the null hypothesis, and so it is certainly not negligible. The test of fit is not interesting here (we would say the degrees of freedom is zero), but it shows starkly the devastating effect estimated parameters can have upon the statistic, even when (as in Egon’s example) the relative error itself ($\mu/m_s$) would be small [$O_P(N^{-1/2})$]. Why, Egon seemed to ask, would Karl have not seen this? Egon offered his father’s possible “hurry in execution” as one explanation.

5. FISHER’S CORRECTION

Ironically, Pearson did consider a similar example in 1922 and rejected its relevance. In 1922 Fisher (1922b) published his first comment on the degrees of freedom issue, and at that time he dealt only with the case Greenwood and Yule had noticed, the case of $r \times c$ contingency tables. There, Fisher’s argument was keyed to the way the linear relations with the marginal totals inhibited the estimated expectations under the null hypothesis, thus reducing the “degrees of freedom,” a term Fisher introduced there. At that time, Fisher made no attempt to address the question for tests of fit more generally. Pearson immediately rebutted in Biometrika. The reply focused upon what Pearson thought (mistakenly) was a confusion between different sampling models (fixed totals or full multinomial sampling), and Pearson invoked the traditional custom of astronomers and others of substituting estimates with small standard errors without penalty in large samples. He thought Fisher had blundered and was offering an exclusively conditional analysis, given the estimated quantities.
Pearson noted (1922, page 187) that if you estimated “the first $p - 1$ moment-coefficients” a perfect fit would be obtained; he rejected such a conditional analysis as restricting the random sampling and antithetical to the question at issue. He did not see (and Fisher’s exposition would have made it difficult for him to see) that in the contingency table setting the conditional and unconditional tests were the same.

In 1924 Fisher returned to address the more general question, and if we look at Fisher’s treatment there, we see exactly where Pearson’s argument about the second term of (3) failed, and exactly what he lacked for a successful treatment (Fisher, 1924). Writing in 1924, Fisher clearly had Pearson’s paper in front of him. Fisher used slightly different notation,³ but for ease of comparison I shall translate to Pearson’s notation. Fisher’s development was slightly streamlined in that Fisher did give the simpler expression for the difference of Chi-squares:

$$
\chi^2 - \chi_s^2 = \sum m'^{2} \left( \frac{1}{m} - \frac{1}{m_s} \right).
$$

³ Fisher used $\chi'$, $x$, $m'$ and $n$ where Pearson used $\chi_s$, $m'$, $m_s$ and $N$.

It is exactly this expression that Fisher expanded in a Taylor series, just as Pearson had done, but with one absolutely crucial difference. Fisher was now armed with his own recently introduced notion of a parametric family, and where Pearson had simply dealt with this as a function of $m$, Fisher had $m = m(\theta)$ and expanded as a function of $\theta$, not $m$. He found the same two terms Pearson had found, but expressed them differently:

$$
\frac{1}{m} - \frac{1}{m_s} = -\frac{1}{m_s^2}\frac{\partial m_s}{\partial \theta}\,\delta\theta + \left\{ \frac{2}{m_s^3}\left(\frac{\partial m_s}{\partial \theta}\right)^2 - \frac{1}{m_s^2}\frac{\partial^2 m_s}{\partial \theta^2} \right\} \frac{(\delta\theta)^2}{2} + \text{higher-order terms}.
$$

If this is multiplied by $m'^{2}$ and summed it gives

$$
\chi^2 - \chi_s^2 = -\delta\theta \sum \left\{ \frac{m'^{2}}{m_s^2}\frac{\partial m_s}{\partial \theta} \right\} + \frac{(\delta\theta)^2}{2} \sum \left\{ \frac{2m'^{2}}{m_s^3}\left(\frac{\partial m_s}{\partial \theta}\right)^2 - \frac{m'^{2}}{m_s^2}\frac{\partial^2 m_s}{\partial \theta^2} \right\}.
$$
Fisher was now able to see that if the minimum Chi-square estimate $\hat{\theta}$ is used, then his first term and Pearson’s first term actually vanished (since then the first summation is exactly $\frac{d}{d\theta}\chi^2 \big|_{\theta = \hat{\theta}} = 0$), and he knew already that the same would be true asymptotically for the maximum likelihood estimate or any other efficient estimate of $\theta$. He then replaced $m'/m_s$ by unity (its asymptotic value) to get

$$
\begin{aligned}
\chi^2 - \chi_s^2 &= (\delta\theta)^2 \sum \left\{ \frac{m'^{2}}{m_s^3}\left(\frac{\partial m_s}{\partial \theta}\right)^2 - \frac{m'^{2}}{2m_s^2}\frac{\partial^2 m_s}{\partial \theta^2} \right\} \\
&\approx (\delta\theta)^2 \sum \left\{ \frac{1}{m_s}\left(\frac{\partial m_s}{\partial \theta}\right)^2 - \frac{1}{2}\frac{\partial^2 m_s}{\partial \theta^2} \right\} \\
&= (\delta\theta)^2 \sum \left\{ \frac{1}{m_s}\left(\frac{\partial m_s}{\partial \theta}\right)^2 \right\}.
\end{aligned}
$$

The last step used the fact that $\sum \left\{ \frac{\partial^2 m_s}{\partial \theta^2} \right\} = \frac{\partial^2}{\partial \theta^2} \sum m_s = \frac{\partial^2}{\partial \theta^2} N = 0$. Based upon his own 1922 paper, he now noted that $\sum \left\{ \frac{1}{m_s}\left(\frac{\partial m_s}{\partial \theta}\right)^2 \right\}$ would, in the case of a single estimated parameter $\theta$, estimate (and approximate asymptotically) the reciprocal of the variance of any efficient estimate. This would give in modern notation

$$
\chi^2 - \chi_s^2 = \frac{(\hat{\theta} - \theta)^2}{\sigma^2(\hat{\theta})}.
$$

This difference then was asymptotically equivalent to the square of a standard normal random variable. The degree of freedom that is lost by estimation became clearly visible.

There are two views that may be taken of this. One I have already mentioned: that the alchemist Fisher’s concept of a parametric family had turned Pearson’s base expressions into statistical gold. Posterity has used this to diminish Pearson’s reputation: how could he have missed such a simple and (now) obvious step? But there is another, to me more persuasive view. For over 20 years that step was anything but obvious. Pearson’s perceptive student G. Udny Yule initially accepted the 1900 rule, for example using 8 rather than 4 degrees of freedom for a Chi-square test of a 3 × 3 contingency table in Yule (1906, page 349).
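Fisher’s conclusion, that $\chi^2 - \chi_s^2$ behaves asymptotically like the square of a standard normal variable, can be illustrated by a small simulation. What follows is a minimal sketch under stated assumptions, not a reconstruction of anything in Fisher’s or Pearson’s papers: the three-cell family with probabilities $\theta^2$, $2\theta(1-\theta)$, $(1-\theta)^2$ is a hypothetical choice made because its maximum likelihood estimate has the closed form $\hat{\theta} = (2n_1 + n_2)/(2N)$ with variance $\theta(1-\theta)/(2N)$, and the values of $\theta$, $N$, the number of replications and the seed are arbitrary.

```python
import random


def cell_probabilities(theta):
    """Hypothetical one-parameter family for three cells (an assumption)."""
    return [theta ** 2, 2.0 * theta * (1.0 - theta), (1.0 - theta) ** 2]


def draw_counts(rng, probs, n_total):
    """Draw one multinomial sample by inverting the cumulative probabilities."""
    counts = [0] * len(probs)
    for _ in range(n_total):
        u = rng.random()
        cumulative = 0.0
        for index, p in enumerate(probs):
            cumulative += p
            if u < cumulative:
                counts[index] += 1
                break
        else:
            counts[-1] += 1  # guard against rounding in the cumulative sum
    return counts


def chi_square(observed, expected):
    """Pearson's statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate(theta=0.3, n_total=200, reps=4000, seed=1):
    """Average chi2, chi2_s, their difference, and the squared standardized error."""
    rng = random.Random(seed)
    true_probs = cell_probabilities(theta)
    m = [n_total * p for p in true_probs]              # theoretical frequencies
    variance = theta * (1.0 - theta) / (2.0 * n_total)  # variance of theta_hat
    sums = {"chi2": 0.0, "chi2_s": 0.0, "difference": 0.0, "z2": 0.0}
    used = 0
    for _ in range(reps):
        counts = draw_counts(rng, true_probs, n_total)
        theta_hat = (2.0 * counts[0] + counts[1]) / (2.0 * n_total)
        if theta_hat <= 0.0 or theta_hat >= 1.0:
            continue  # skip degenerate fits with a zero expected frequency
        m_s = [n_total * p for p in cell_probabilities(theta_hat)]
        chi2 = chi_square(counts, m)
        chi2_s = chi_square(counts, m_s)
        sums["chi2"] += chi2
        sums["chi2_s"] += chi2_s
        sums["difference"] += chi2 - chi2_s
        sums["z2"] += (theta_hat - theta) ** 2 / variance
        used += 1
    return {name: total / used for name, total in sums.items()}


if __name__ == "__main__":
    for name, value in simulate().items():
        print(name, round(value, 3))
```

If the theory sketched above holds for this family, the average of $\chi^2$ should land near 2 (three cells, no estimation), while the averages of $\chi_s^2$, of the difference, and of the squared standardized error should each land near 1, which is the single degree of freedom lost by estimating $\theta$.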
Only in 1915, after years of experience, did Greenwood and Yule (1915) bring the puzzle to wider notice, and even then neither they nor anyone else had a clear view of the source of the problem. And so it stood until Fisher.

Even with Fisher’s work before us, we must marvel at how far Pearson had gone. He had lacked only one ingredient, parametric families, but what he had managed to do was to identify the issue and present it in such a clear way that when Fisher combined Pearson’s 1900 development with the deceptively simple idea of parametric families, the solution must have sprung to mind nearly immediately. It took Fisher’s genius to answer the question, but he would scarcely have been in a position to do so without the path-breaking formulation of Pearson the pioneer.

It is not anachronistic to see Pearson as erring in 1900. Even without the notion of parametric families he could have seen a discrepancy without seeing a resolution, just as Greenwood and Yule did, when they found the Chi-square test for 2 × 2 tables gave results inconsistent with a comparison of the two columns as binomial counts. Pearson erred, but the error led to Fisher’s discovery of degrees of freedom. Pearson had not only solved the great problem of testing multinomial goodness of fit against all alternatives, he had also isolated and formulated another great problem in terms that two decades later permitted another genius, armed with his own major discovery, an easy solution.

6. CONCLUSION

The errors Pearson made did not go undetected because they were small; to the contrary, they were large and of potentially large practical consequence.
For example, in Pearson and Filon’s own numerical example for a Type III or Gamma density (Pearson and Filon, 1898, pages 279–280), the probable error given for the shape parameter $p$ is only about a fifth of what it should have been (Fisher, 1922a, page 336). If the curves being fit by the method of moments had been closer to the normal shape, the errors would have been smaller, but if not, there was no finite bound on how far off they could be. For Chi-squares for 2 × 2 tables Pearson would give 3 rather than 1 degree of freedom; for 3 × 3 tables he would give 8 rather than 4 degrees of freedom. In these and other examples the effect upon inferences could be devastating.

Not only were the errors Pearson made not easily discovered; even after they were pointed out in 1922 they were not widely understood. In 1924, a Handbook of Mathematical Statistics was published, prepared under the auspices of the U.S. National Research Council (Rietz, 1924). The Editor-in-Chief was H. L. Rietz, and major contributions were made by Harvard University Professor E. V. Huntington and University of Michigan Professor H. C. Carver (later the founding editor of the Annals of Mathematical Statistics). Carver cited the Pearson–Filon paper without any indication he saw the problem with it (page 95). Fisher’s (1922b) first correction to the degrees of freedom for contingency tables was briefly cited without comment by Rietz, but Rietz (with evident approval) also gave in more detail Pearson’s argument that no correction for estimating expected values was needed (pages 80–81). Elsewhere in the volume, Huntington wrote warmly of the method of moments, and nowhere was Fisher’s magnum opus of 1922 referred to. Even in England understanding was slow.
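The degrees of freedom quoted above for 2 × 2 and 3 × 3 tables follow from two simple counts, sketched here with illustrative function names that are not taken from any of the papers discussed. The first count treats the table as $rc$ cells with only the grand total fixed; the second, which is the standard $(r-1)(c-1)$ formula, also subtracts the $(r-1) + (c-1)$ marginal proportions estimated from the data.

```python
def cells_minus_one(rows, cols):
    """Count under the 1900 rule: all cells free except for the fixed total."""
    return rows * cols - 1


def corrected_degrees_of_freedom(rows, cols):
    """Count after also removing the estimated row and column proportions."""
    return rows * cols - 1 - (rows - 1) - (cols - 1)  # equals (rows-1)*(cols-1)


for rows, cols in [(2, 2), (3, 3)]:
    print(
        f"{rows} x {cols} table:",
        cells_minus_one(rows, cols),
        "versus",
        corrected_degrees_of_freedom(rows, cols),
    )
```

The gap between the two counts grows with the size of the table, since it equals the number of estimated marginal proportions, $(r-1) + (c-1)$.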
By 1938 Egon Pearson had conceded the degrees of freedom issue, but he seemed to have not accepted Fisher’s point about Pearson and Filon (Pearson, 1938, pages 28–29).

Both Pearson and Fisher were giants in our history; despite their lack of mutual appreciation we cannot imagine modern statistics without both. Pearson’s errors were substantial and not to be glossed over, but they should not obscure the even greater achievements they accompanied. Pearson had a giant ambition and the energy to realize it. He sought to create a whole new statistical system, and for a time succeeded. He did not have a mathematical mind equal to Fisher’s, and he became mired in and never escaped from an incompletely developed conceptual apparatus that was not equal to the full task at hand. But he took statistics to a higher level nonetheless. If Pearson could never come to admit some failures, it was surely due to a stubbornness that even he recognized in himself. In the Preface for the Second Edition of The Grammar of Science (1900b), Pearson wrote, “If I have not paid greater attention to my numerous critics, it is not that I have failed to study them; it is simply that I have remained – obstinately it may be – convinced that the views expressed are, relatively to our present state of knowledge, substantially correct” (Pearson, 1900b, page ix). So it was with his statistical work as well.

Pearson’s impact upon Fisher may in the end stand as one of his greater achievements. Pearson had no student more diligent than Fisher, despite their differences. When in 1945 Fisher wrote an ill-fated biographical account of Pearson for the Dictionary of National Biography (rejected by the Dictionary and not published until 1994, by A. W. F. Edwards), he wrote to the editor that he had made a “lifelong study of Pearson’s writings.” Fisher further stated, “I have during the last 35 years at various times had occasion to look at probably all of [Pearson’s fundamental statistical memoirs] and at the immense output which was published in Biometrika.” It was from reading Pearson’s work and Pearson’s journal that Fisher’s interest in statistics developed in the way it did, and in the case of the two examples discussed here, the effect of the Pearsonian blueprint could scarcely be more evident. Fisher saw Pearson clearly, warts and all, and while he did not acknowledge the extent of his debt to Pearson, its extent is clear to other, less involved readers. As in Newton’s famous statement, Fisher stood on the shoulders of a giant (Merton, 1965).

Porter’s recent biography (2004) is illuminating on Pearson’s pre-statistical life. Eisenhart (1974) remains the most complete discussion of K. P.’s statistical work. For other discussion relating to this early work see Aldrich (1997), Hald (1998), Magnello (1996, 1998). On Chi-square see in particular Fienberg (1980), Hacking (1984), Plackett (1983) and Stigler (1999, Chapter 19). For other aspects of the Pearson–Fisher relationship see Stigler (2005, 2007a, 2007b). Pearson himself returned to that topic of Chi-square frequently, including Pearson (1915, 1922, 1923, 1932), most of these under the instigation of Fisher.

REFERENCES

Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912–1922. Statist. Sci. 12 162–176. MR1617519

Barnard, G. A. (1992). Introduction to Pearson (1900). In Breakthroughs in Statistics II (S. Kotz and N. L. Johnson, eds.) 1–10. Springer, New York.

Edgeworth, F. Y. (1908). On the probable errors of frequency-constants (contd.). J. Roy. Statist. Soc. 71 499–512.

Edwards, A. W. F. (1994). R. A. Fisher on Karl Pearson. Notes and Records Roy. Soc. London 48 97–106. MR1272638

Eisenhart, C. (1974). Karl Pearson. In Dictionary of Scientific Biography 10 447–473. Charles Scribner’s Sons, New York.

Fienberg, S. E. (1980). Fisher’s contributions to the analysis of categorical data. In R. A. Fisher: An Appreciation (S. E. Fienberg and D. V. Hinkley, eds.) 75–84. Lecture Notes in Statist. 1. Springer, New York.

Filon, L. N. G. (1934). Remarks in Speeches Delivered at a Dinner in University College, London, in Honour of Professor Karl Pearson, 23 April 1934. The University Press, Cambridge.

Fisher, R. A. (1922a). On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc. London (A) 222 309–368; reprinted as Paper 18 in Fisher (1974).

Fisher, R. A. (1922b). On the interpretation of $\chi^2$ from contingency tables, and the calculation of P. J. Roy. Statist. Soc. 85 87–94; reprinted as Paper 19 in Fisher (1974).

Fisher, R. A. (1924). Conditions under which $\chi^2$ measures the discrepancy between observation and hypothesis. J. Roy. Statist. Soc. 87 442–450; reprinted as Paper 34 in Fisher (1974).

Fisher, R. A. (1974). The Collected Papers of R. A. Fisher (J. H. Bennett, ed.). Univ. Adelaide.

Greenwood, M. and Yule, G. U. (1915). The statistics of anti-typhoid and anti-cholera inoculations, and the interpretation of such statistics in general. In Proc. Roy. Soc. Medicine, Section of Epidemiology and State Medicine 8 113–190; reprinted (1971) in Statistical Papers of George Udny Yule (A. Stuart and M. G. Kendall, eds.) 171–248. Griffin, London.

Hacking, I. (1984). Trial by number. Science 84 5 69–70.

Hald, A. (1998). A History of Mathematical Statistics From 1750 to 1930. Wiley, New York. MR1619032

Jeffrey, G. B. (1938). Louis Napoleon George Filon. J. London Math. Soc. 13 310–318.

Magnello, M. E. (1996). Karl Pearson’s Gresham lectures: W. F. R. Weldon, speciation and the origins of Pearsonian statistics. British J. History of Science 29 43–63.

Magnello, M. E. (1998). Karl Pearson’s mathematization of inheritance: From ancestral heredity to Mendelian genetics (1895–1909). Ann. Sci. 55 35–94. MR1605849

Merton, R. K. (1965). On the Shoulders of Giants: A Shandean Postscript. The Free Press, New York.

Morant, G. M. (1939). A Bibliography of the Statistical and Other Writings of Karl Pearson (Compiled with the Assistance of B. L. Welch). Cambridge Univ. Press.

Pearson, E. S. (1938). Karl Pearson: An Appreciation of Some Aspects of His Life and Work. Cambridge Univ. Press.

Pearson, K. (1895). Mathematical contributions to the theory of evolution, II: Skew variation in homogeneous material. Philos. Trans. Roy. Soc. London (A) 186 343–414. Reprinted in Karl Pearson’s Early Statistical Papers (1956) 41–112. Cambridge Univ. Press.

Pearson, K. (1896). Mathematical contributions to the theory of evolution, III: Regression, heredity and panmixia. Philos. Trans. Roy. Soc. London (A) 187 253–318. Reprinted in Karl Pearson’s Early Statistical Papers (1956) 113–178. Cambridge Univ. Press.

Pearson, K. (1897). Mathematical contributions to the theory of evolution. On a form of spurious correlation which may arise when indices are used in the measurement of organs. Proc. Roy. Soc. London 60 489–498.

Pearson, K. (1900a). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos. Magazine, 5th Series 50 157–175. Reprinted in Karl Pearson’s Early Statistical Papers (1948) 339–357. Cambridge Univ. Press.

Pearson, K. (1900b). The Grammar of Science, 2nd ed. [First was 1892]. Adam and Charles Black, London.

Pearson, K. (1903, 1913, 1920). On the probable errors of frequency constants. Biometrika 2 273–281, 9 1–10, 13 113–132.

Pearson, K. (1915). On a brief proof of the fundamental formula for testing the goodness of fit of frequency distributions, and on the probable error of “P.” Philosophical Magazine, 6th Series 31 369–378.

Pearson, K. (1922). On the $\chi^2$ test of goodness of fit. Biometrika 14 186–191.

Pearson, K. (1923). Further note on the $\chi^2$ test of goodness of fit. Biometrika 14 418.

Pearson, K. and Filon, L. N. G. (1898). Mathematical contributions to the theory of evolution IV. On the probable errors of frequency constants and on the influence of random selection on variation and correlation. Philos. Trans. Roy. Soc. London (A) 191 229–311. Reprinted in Karl Pearson’s Early Statistical Papers (1956) 179–261. Cambridge Univ. Press. Abstract in Proc. Roy. Soc. London (1897) 62 173–176.

Pearson, K. (1932). Experimental discussion of the ($\chi^2$, P) test for goodness of fit. Biometrika 24 351–381.

Plackett, R. L. (1983). Karl Pearson and the chi-squared test. Internat. Statist. Rev. 51 59–72. MR0703306

Porter, T. M. (2004). Karl Pearson: The Scientific Life in a Statistical Age. Princeton Univ. Press. MR2054951

Rietz, H. L., ed. (1924). Handbook of Mathematical Statistics. Houghton Mifflin, Boston.

Sheppard, W. F. (1899). On the application of the theory of errors to cases of normal distribution and normal correlation. Philos. Trans. Roy. Soc. London (A) 192 101–167, 531.

Stigler, S. M. (1986). The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard Univ. Press, Cambridge, MA. MR0852410

Stigler, S. M. (1999). Statistics on the Table. Harvard Univ. Press, Cambridge, MA. MR1712969

Stigler, S. M. (2005). Fisher in 1921. Statist. Sci. 20 32–49. MR2182986

Stigler, S. M. (2007a). The pedigree of the International Biometrics Society. Biometrics 63 317–321. MR2370789

Stigler, S. M. (2007b). The epic story of maximum likelihood. Statist. Sci. 22 598–620.

Stouffer, S. A. (1958). Karl Pearson – An appreciation on the 100th anniversary of his birth. J. Amer. Statist. Assoc. 58 23–27. MR0109122

Yule, G. U. (1906). The influence of bias and of personal equation in statistics of ill-defined qualities. J. Anthropological Institute of Great Britain and Ireland 36 325–381.
