On the history and use of some standard statistical models
This paper tries to tell the story of the general linear model, which saw the light of day 200 years ago, and the assumptions underlying it. We distinguish three principal stages (ignoring earlier more isolated instances). The model was first propose…
Authors: E. L. Lehmann
IMS Collections, Probability and Statistics: Essays in Honor of David A. Freedman, Vol. 2 (2008) 114–126
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/193940307000000419

E. L. Lehmann¹
University of California, Berkeley

Abstract: This paper tries to tell the story of the general linear model, which saw the light of day 200 years ago, and the assumptions underlying it. We distinguish three principal stages (ignoring earlier more isolated instances). The model was first proposed in the context of astronomical and geodesic observations, where the main source of variation was observational error. This was the main use of the model during the 19th century. In the 1920’s it was developed in a new direction by R. A. Fisher whose principal applications were in agriculture and biology. Finally, beginning in the 1930’s and 40’s it became an important tool for the social sciences. As new areas of applications were added, the assumptions underlying the model tended to become more questionable, and the resulting statistical techniques more prone to misuse.

1. Introduction

It was 200 years ago, in 1805, that Legendre first published the method of least squares and a vague formulation of what has come to be known as the standard linear model [27]. This model has played a central role in the statistical methodology used in the physical, biological and social sciences, and it is the aim of the present paper to sketch this role. I am not trying to write a history of the linear model,² but am mainly concerned with the role the underlying assumptions have played in these three areas of application.
The model (defined in Section 6) assumes that each observation is the sum of two components: a deterministic term (which is a linear combination of the relevant explanatory variables) and a random term representing error or other “disturbances”. The error terms are assumed to be independently, identically distributed (i.i.d.) according to a common normal distribution, and Sections 2–5 are therefore concerned with the role of these assumptions in the special case of the simple normal i.i.d. model. This model is of interest also in its own right as the standard model for the one-sample problem. Sections 6 and 7 take up issues concerning the non-random term of the linear model.

To conclude this introduction, let me briefly consider the general nature of mathematical, and particularly statistical, models.

¹ University of California, Department of Statistics, 367 Evans Hall #3860, Berkeley, CA 94720-3860, USA, e-mail: shaffer@stat.berkeley.edu
AMS 2000 subject classifications: 62A01, 62-03, 62J05.
Keywords and phrases: assumptions, independence, least squares, linear model, normality, observational studies.
² For accounts of this history see Seal [41] and Plackett [38], and in a broader context Stigler [45] and Hald [20].

The first use of the term “model” by a statistician that I have found occurs in Karl Pearson’s Grammar of Science [33]. In this book he emphasizes the distinction between the real phenomenon and the model, which he usually calls “conception”, but to which on a few occasions he refers as “models”. For example he writes [33], p. 206:

“The scientist postulates nothing in the world beyond sense [i.e. sense perceptions]; for him the atom and the ether are–like the geometric surface–models by which he resumes the world of sense.”
The role and construction of mathematical models played a central role in the work of the mathematician Richard von Mises (1883-1953) who developed such models for a number of disciplines, among them aerodynamics, hydrodynamics, and plasticity. Besides undertaking these efforts in the physical sciences, von Mises felt that there was a great need for a similar treatment of probability theory. Instead of relying on Laplace’s inadequate definition in terms of equally likely cases, he wanted to build a model for probability that would represent the physical reality underlying this concept. In his fundamental paper of 1919 on the foundations of probability theory [48], he describes his approach as follows [my translation from the German]:

“The present treatment is based on the assumption that probability theory is a natural science, of the same kind as geometry or theoretical mechanics. It has the aim to present the relations and dependencies of specific observable phenomenon, not as a faithful description of reality, but as its abstraction and idealization.”

The observable phenomenon for which the paper builds a model is the stability of the long-run frequency of an outcome in a long sequence of repeated random events, for example, the frequency of heads in a long sequence of tosses of a coin. The principal constituents of the model are infinite sequences of trials with random outcomes, of which it is assumed that the frequency of a given outcome tends to a limit. To define randomness, von Mises requires that the same limit should obtain in any predetermined subsequence which can be chosen in the light of the observations up to this point.

It turned out that this formulation was too complicated; it was also too narrow in that it only applied to situations which allowed a large number of repetitions.
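The two ingredients of von Mises’ model described above, a stable long-run frequency and the requirement that the same frequency persist in a subsequence chosen from earlier outcomes, can be illustrated by a small simulation. The following is only a sketch under stated assumptions: the coin is simulated with Python’s pseudo-random generator, the sequence is finite, and the selection rule “keep a toss if the previous toss was a head” is a hypothetical example of a place selection, not one taken from von Mises’ paper.

```python
import random


def running_frequency(outcomes):
    """Return the relative frequency of heads (1s) after each toss."""
    freqs, heads = [], 0
    for k, x in enumerate(outcomes, start=1):
        heads += x
        freqs.append(heads / k)
    return freqs


def select_after_head(outcomes):
    """Place selection: keep toss k only if toss k-1 was a head.

    The decision to keep a toss uses earlier outcomes only, never the
    toss itself (an illustrative rule, assumed for this sketch).
    """
    return [cur for prev, cur in zip(outcomes, outcomes[1:]) if prev == 1]


rng = random.Random(0)  # fixed seed so the run is reproducible
tosses = [rng.randint(0, 1) for _ in range(100_000)]  # 1 = head, 0 = tail

full_freq = running_frequency(tosses)[-1]
selected = select_after_head(tosses)
sub_freq = sum(selected) / len(selected)

print(f"frequency of heads, all tosses:      {full_freq:.4f}")
print(f"frequency of heads, selected tosses: {sub_freq:.4f}")
```

For a fair simulated coin both frequencies should lie close to 1/2. A sequence such as strict alternation of heads and tails would also have overall frequency 1/2 but would fail the subsequence check, which is the kind of regularity the randomness requirement is meant to exclude.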
As a formulation of probability theory it was replaced by a quite different approach proposed by Kolmogorov [23]. Kolmogorov did not construct a model for probability theory. Instead, he stated a small number of very simple axioms which any ideal model should satisfy. In this system probability itself was left undefined, subject only to these axioms. In this way the system could be interpreted and fleshed out not only by the frequency concept of probability but also by the idea of probability as degree of belief. (The background of Kolmogorov’s formulation is discussed in Shafer and Vovk [42].)

Although von Mises’ modeling approach was not the one that was ultimately adopted by the profession, it nevertheless exerted great influence, particularly on Kolmogorov who cited the effect it had on his own formulation and who later took it as a starting point of a renewed effort to get a grip on the crucial (and difficult) concept of randomness.³

³ For a more detailed review of these foundational issues, see von Plato [49].

2. The assumption of normality in the 19th century

The most widely used statistical model for a sequence of repeated measurements has been the i.i.d. normal model according to which the observations are independently distributed with a common normal distribution. The first person to write down the formula for what today is called the normal density was De Moivre [10] who derived it as the limit of the binomial but did not consider it as a probability density in its own right. It was only in the beginning of the 19th Century through the combined insights of Laplace and Gauss that the central role of the normal distribution was realized.

The normal distribution became the acknowledged model for the distribution of errors of physical (particularly astronomical) measurements and was called the Law of Errors.⁴ It had a theoretical basis in the so called Law of Elementary Errors which assumed that an observational error is the sum of a large number of small independent errors and is therefore approximately normally distributed by the Central Limit Theorem. This argument was reinforced by extensive experience which showed good approximate agreement with the normal form.

⁴ The term “normal distribution” was first suggested by Peirce [37], Lexis [28] and Galton [16] and began to take hold in the 1880’s (see Stigler [46], Chapter 22).

The many textbooks on the subject all agreed on this point. For example, Brunt [6, 7], after citing theoretical arguments in favor of the normal distribution, states that “the final justification of Gauss’ error curve rests upon the fact that it works well in practice and yields curves which in very many cases agree very closely with the observed frequency curves. The normal law is to be regarded as proved by experience and explained by Hagen’s hypothesis [i.e. the law of elementary errors.]” In a German text, Helmert [21, 22] explains [my translation]: “The form of the distribution of errors can only be determined through observation. . . . According to experience the [normal] law usually provides a close approximation.” Similar statements can be found in other British, German and French texts.

The history of the Theory of Errors is recounted in meticulous detail in a book on the subject by Czuber [9]. It was realized of course that the normal law could only be an approximation since in practice the observations were discrete and bounded. Nevertheless, as Czuber writes [my translation]:

“An essential support of Gauss’ Law of Errors is provided by the agreement which exists between its consequences and the results of observations that have really been obtained. It has led to the general acceptance by observers despite the concerns that can be raised against the various theoretical arguments in its favor.”

Czuber devotes 14 pages to empirical comparisons of the law of errors with experience, and another section to the “elimination of contradictory observations” [i.e. gross errors], because it is only after such “doubtful observations” have been removed that the law applies. He analyzes seven data sets from astronomy and geodesy, and in addition some experimental results of measurements carried out specifically to test the Gaussian law. His method in these examples is to compare the numbers of observations in various intervals with those predicted by theory. His conclusion is that the agreement is “satisfactory”.

It should be noted that when Czuber mentions the general acceptance of the law, this statement is implicitly restricted to the principal areas of application he considers: the physical sciences and particularly astronomy and geodesy.

The normal distribution lost its exclusive position toward the end of the 19th Century when a strong interest developed in systematic application of statistical methods to biological, sociological and economic investigations. There the distributions encountered were often far from normal and frequently asymmetric. As a result, families of models were developed that include skewed and heavy-tailed distributions. The most influential of these at the time was the system proposed by Karl Pearson [34]. Stigler [45] (p. 335) explains Pearson’s success:

“In thirty pages of detailed examples he demonstrated the successful flexibility and practicality of the system with a force that bludgeoned any potential skeptic into submission. The examples ranged from Venn’s barometric pressures to the heights of St. Louis school girls to Weldon’s crabs to statistics on pauperism . . . . Not only was his work more general than others, it was practical and came to public view with a record of proven accomplishment.”

3. Fisher and the assumption of normality

Early in the 20th Century the normal distribution regained its ascendancy through the small sample work of R.A. Fisher. As Geary [17] summarized the situation in a historical overview:

“Our historian will find a significant change of attitude a quarter of a century ago following the brilliant work of R.A. Fisher who showed that when universal normality could be assumed, inferences of the widest practical usefulness could be drawn. Prejudice in favor of normality returned in full force and interest in non-normality receded in the background.”

But could universal normality be assumed? Fisher developed his new methods not in the context of astronomical measurements but for application to biological and agricultural data, and here the assumption is on much shakier ground. In the earlier applications, the principal source of variation in the observations was observational error. Now to this is added the variability of the subjects (trees, agricultural plots, farms, . . . ) being measured. These subjects Fisher modeled as random samples from an infinite population, and he based his derivations on the assumption that the population distribution of the characteristic being measured is normal. But this distribution depends on the nature of the population, and the basis for such an assumption in these circumstances often is weak.
Fisher’s treatment of the assumptions of normality in his enormously successful 1925 book, “Statistical Methods for Research Workers” (SMRW) [11], gave rise to a heated controversy between Fisher and E.S. Pearson. The conflict had its origin in Pearson’s 1929 review of the 2nd Edition of SMRW for the journal “Nature”. On the whole the review was favorable, but it contained the following critical paragraph.

“There is one criticism however which must be made from the statistical point of view. A large number of these tests are based . . . on the assumption that the population sampled is of the ‘normal’ form. That this is the case can be gathered from a careful reading of the test, but the point is not sufficiently emphasized. It does not appear reasonable to lay stress on the ‘exactness’ of the tests when no means whatever are given of appreciating how rapidly they became inexact as the population sampled diverges from normality. That the tests, for example, connected with the analysis of variance [here he is referring to the F-test for variances] are far more dependent on normality than those involved Student’s z (or t) distribution is almost certain, but no clear indication of the need for caution in their application is given.”

To make things worse, the review also contained the sentence: “It would seem wiser in the long-run even in a textbook, to admit the incompleteness of theory in this direction, rather than risk giving the reader the impression that the solution of all his problems has been achieved.”

In defense of his criticism,⁵ Pearson (in a letter to Gosset) cited a paper by the American economist Tolley [47] which showed that its author indeed had been misled by Fisher’s book.
Tolley wrote:

“Recently the English School of Statisticians has developed formulas and probability tables to accompany them which, they state, are applicable regardless of the form of the frequency distribution. These formulas are given, most of them without proof, in Fisher’s book [11] . . . . If we accept the statements of those who have developed these newer formulas, skew frequency distributions and small samples need cause no further difficulty as far as measurement of error is concerned.”

Fisher was shaken by Tolley’s misunderstanding and in a letter to Gosset (June 27, 1929) admitted some culpability: “The claim of exactness for the solutions and tests given was wrong, although a careful reader would find that I had kept within the letter of the law by hidden allusions to normality.”

A careful reading of SMRW concerning its treatment of normality in fact shows the following.

1. For some of the results, the assumption is stated (though never emphasized).
2. More often, results are stated without any qualifications.
3. The treatment of examples is very deficient in this respect. As Gosset pointed out to Fisher (June 24, 1929): “Although when you think about it you agree that ‘exactness’ or even appropriate use depends on normality, in practice you don’t consider the question at all when you apply your tables to your examples: not one word.”
4. In the chapter on distributions (which introduces the normal, Poisson and binomial and no others), Fisher states that it is important to know “the experimental conditions upon which they occur.” However, neither this nor any later chapter contains any discussion of the conditions under which data can be expected to be normally distributed (not one word, as Gosset might say).
5. Another important omission (not even taken up in later editions after it had been pointed out by Pearson) is the great sensitivity to the assumption of normality in tests of variances (rather than means), which makes these tests essentially useless.

This criticism must of course be seen against the background of the enormous achievements and novelty of the book. In the process of making accessible Fisher’s research of the preceding decade, it established a new paradigm and revolutionized statistical methodology.

The book provides little information about Fisher’s own attitude toward the assumption of normality, which forms the basis of his work on the analysis of variance, covariance and regression. The clearest statement of his position that I have been able to find is in a letter of 1929 to “Nature” entitled “Statistics and biological research”, which was the concluding document in the controversy with Pearson. Basing his defense of the assumption on experience rather than theory, Fisher claims:

“On the practical side there is little enough room for anxiety, especially among biologists, who are used to checking the adequacy of their methods by control experiments. The difficulty of obtaining decisive results often flows from heterogeneity of materials, often from causes of bias, often, too, from the difficulty of setting up an experiment in such a way as to obtain a valid estimate of errors. I have never known difficulty to arise in biological work from imperfect normality of the variation, often though I have examined data for this particular cause of difficulty; nor is there, I believe, any case to contrary in the literature. This is not to say that the deviation from ‘Student’s’ t-distribution found by Shewhart and Winters [43], for samples from rectangular and triangular distributions, may not have a real application in some technological work, but rather that such deviations have not been found, and are scarcely to be looked for, in biological research and ordinarily conducted.”

⁵ For more details on this controversy see Pearson [32] and the Fisher-Gosset correspondence (Gosset [18]).

Fisher is not so naive as to claim universal normality even for the kind of biological data with which he is dealing. What he claims instead is that his methods are insensitive to departures from normality, in modern terminology that they are fairly robust against non-normality. This turned out, as Pearson had already discovered through simulation studies, to be true for tests of means but not for tests of variances. (However, even for means a theorem of Bahadur and Savage [3] suggests the need for some caution. They showed that for any given sample size, no matter how large, there exist distributions for which the size of the t-test is arbitrarily close to 1.)

4. The role of normality after Fisher

Fisher’s “Statistical Methods” was enormously influential. Together with the work on which it was based it was the most important instrument for the movement from 19th Century large-sample to 20th Century small-sample statistics. The volume sold well, particularly considering how small the statistical community was at the time. A second edition became necessary in 1928 and a third in 1930, with the first three editions selling 1050, 1250 and 1500 copies respectively. This flow continued throughout Fisher’s life with new editions appearing every 2 to 3 years. However, in the 1930’s other texts also began to bring the new methodology to ever widening audiences.
By far the most successful of these was George Snedecor’s “Statistical Methods” [44], which was published in 1937, and the seven editions of which sold the unparalleled number of 237,000 copies. For several years it was one of the most cited publications in the Science Citation Index, and in 1995 still had nearly 2000 entries.⁶

⁶ Cited from Carriquiri and David [8].

Snedecor’s book is essentially a more clearly written, simpler, very user-friendly, version of Fisher’s SMRW. It also contains a large number of numerical examples, mainly from agriculture and biology. The book avoids complications and pays practically no attention to the assumptions underlying the recommended procedures including the assumption of normality. This is in line with Snedecor’s philosophy which he explains in the Preface: “To the mathematical statistician must be delegated the task of developing the theory and devising the methods, accompanying these latter by adequate statements of the limitations of their use (my italics).” He adds that “None but the biologist can decide whether the conditions are fulfilled in his experiments.” This seems to beg the question, for how can the biologists or other users make this decision when they are not clearly told what the conditions are?

This lack of attention to assumptions is shared by most of the many texts on statistics in the 40’s and 50’s that followed those of Fisher and Snedecor. A fairly comprehensive look at the role of the assumptions underlying Fisher’s small-sample methods was provided in 1959 by Scheffé in his book “The Analysis of Variance” [40], in a justly famous chapter of nearly 40 pages entitled “The effects of departures from the underlying assumptions.”

Scheffé measures the departure of a distribution from normality in terms of the coefficients $\gamma_1$ of skewness and $\gamma_2$ of kurtosis, the standardized 3rd and 4th moments. For any symmetric distribution $\gamma_1 = 0$, and for the normal distribution also $\gamma_2 = 0$. In the introduction to the chapter, he provides an idea of the range of $(\gamma_1, \gamma_2)$ in engineering data, in routine chemical analyses, in ages of marriage, in barometric heights, and in the length and breadth of beans.

Scheffé points out that for large n the distribution of Student’s t is approximately normal and hence “the inferences about the mean which are valid in the case of normality must be correct for large n regardless of the form of the population.” This is followed by a discussion of the effect of non-normality on the $\chi^2$-test for variance, where it is shown that non-normality causes serious errors. Thus, Scheffé concludes that “the effect of violation of the normality assumption is slight on inferences about the mean but dangerous on inferences about variances.” He points out that these results had already been noted by E. S. Pearson [31], and their reason by Box [4].

However, Scheffé’s concern is an exception. The standard textbook attitude toward the assumption of normality is well summarized by Brownlee [5] (p. 179):

“It is presumably on this foundation [that means are approximately normal by the Central Limit Theorem] that applied statisticians have found empirically that usually there is no great need to fuss about the normality assumption. After a statistician has analyzed several quite widely differing transformations of a variable in a fair number of specific instances and found that the conclusions reached are substantially identical for all the transformations, then he ceases to worry unduly about the normality assumption in most situations.”

5. The assumption of independence

The standard model for the one-sample problem, i.e. for a number of repeated measurements or other observations of a common quantity, assumed not only that the observations are normally distributed but also that they are independent. This assumption has received much less attention than the assumption of normality. As Kruskal [26] states: “An almost universal assumption in statistical models for repeated measurements of real-world quantities is that these measurements are independent, yet we know that such independence is fragile.” As an explanation of the casual assumption of independence he suggests:

“One answer is ignorance. . . . Far more important than simple ignorance, is seductive simplicity: It is so easy to multiply marginal probabilities, formulas simplify, and manipulation is relatively smooth, so the investigator neglects dependence, or hopes that it makes little difference. Sometimes the hope is realized, but more often dependence can make a tremendous difference.”

In many situations, the assumption of independence seems natural, even obvious, because of the absence of any direct influence of one observation on another. However, a “spurious” (but nevertheless very real) dependence may be caused by the presence of a common factor. This phenomenon was investigated (and the term “spurious correlation” coined) by Karl Pearson in two papers of 1897 and 1902 [35, 36], in which he reports extensive experiments on measurements carried out by himself.
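How much difference a common factor can make is easy to see in a simulation. The sketch below is an illustration under assumptions that are not taken from the sources cited above: each sample consists of 10 observations X_i = Z + e_i, where Z is a single normal draw shared by all observations of the sample (the hypothetical common factor) and the e_i are independent standard normal errors. The true mean is 0, so every rejection by the two-sided 5% t-test is a false alarm. The constant 2.262 is the tabulated critical value of Student’s t with 9 degrees of freedom.

```python
import math
import random
import statistics

N = 10               # sample size (illustrative choice)
T_CRIT_DF9 = 2.262   # two-sided 5% critical value of Student's t, 9 d.f.


def rejection_rate(common_sd, n_sims=10_000, seed=1):
    """Fraction of simulated samples in which the t-test rejects mean 0.

    common_sd is the standard deviation of the shared factor Z;
    common_sd = 0 reproduces the usual i.i.d. normal model, while
    common_sd = 1 gives correlation 1/2 between any two observations.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        z = rng.gauss(0.0, common_sd)                 # one draw per sample
        sample = [z + rng.gauss(0.0, 1.0) for _ in range(N)]
        mean = statistics.fmean(sample)
        sd = statistics.stdev(sample)
        t = mean / (sd / math.sqrt(N))
        if abs(t) > T_CRIT_DF9:
            rejections += 1
    return rejections / n_sims


iid_rate = rejection_rate(common_sd=0.0)
dep_rate = rejection_rate(common_sd=1.0)
print(f"false-alarm rate, independent observations: {iid_rate:.3f}")
print(f"false-alarm rate, common factor present:    {dep_rate:.3f}")
```

Under independence the false-alarm rate stays near the nominal 5%. With the common factor it is many times larger, because the sample standard deviation reflects only the independent errors and so understates the variability of the mean.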
A difficulty for any serious investigation of the effect of dependence is the great variety of forms that dependence can take. Even Scheffé, who includes a discussion of dependence in his chapter on failures of assumptions (mentioned in the preceding section) cannot do more than examine two or three special cases. His conclusion: “The effect of correlation in the observations can be very serious on inferences about means.”

Following Scheffé, many of the later books on linear models and regression analysis included discussion of the effects of non-normality, dependence, and inequality of variances. Particularly noteworthy in this regard is Miller [30] who provides a careful treatment of these issues as an integral part of each chapter of his book. There was also a trickle-down effect on some of the more general introductions to statistical methods. Thus, the seventh (1980) edition of Snedecor’s text (Snedecor had died in 1974 and Cochran had become a co-author) contained a substantial chapter on “Failure in the assumptions”. Its treatment of dependence closely paralleled that of Scheffé and in particular also warned about the effect of dependence on the t-test.

6. The linear model

The previous sections have been concerned with the i.i.d. normal model for the one-sample problem, i.e. for repeated measurements or observations of the same quantity. In the remainder of the paper we shall consider the general linear model where the same issues arise as well as some new ones. The model is given by

$$Y_i = \sum_{j=1}^{n} x_{ij} \beta_j + \varepsilon_i \qquad (i = 1, \ldots, n)$$

where the $Y$’s are the observed values, the $x$’s are known constants, the $\beta$’s unknown parameters, and the $\varepsilon$’s the errors.
Of the $\varepsilon$’s it is typically assumed that they are independently normally distributed with mean 0 and common variance $\sigma^2$.

This model and the proposal to estimate the $\beta$’s by means of least squares, are due to Gauss and Legendre at the beginning of the 19th Century. Known as the theory of combinations of observations, the resulting methodology, applied primarily to astronomy and geodesy, became the principal statistical activity throughout much of the 19th Century, with many textbooks devoted to it. (When Gosset in 1904 needed statistical methods for his brewery work, his main sources of information were Airy’s [1] (3rd Ed) book with the ponderous title “On the Algebraic and Numerical Theory of Errors of Observations” and Merriman’s “A Textbook on Least Squares” [29].) As Stigler [45] (p. 11) puts it: “The method of least squares was the dominant theme – the leitmotif – of nineteenth-century mathematical statistics.”

In this theory, probability calculations such as that of determining the probable error of the estimates were carried out assuming large samples. In this way the variance of the normal error distribution could be assumed known, and the estimates (which were linear functions of the observations) could therefore be assumed to be (approximately) normally distributed with known variance.

The linear model became a much more flexible instrument through Fisher’s introduction of analysis of variance and covariance, and regression analysis. At the same time he extended its application to the more complicated data of biology with their complex sources of variation mentioned in Section 4. Fisher was able to overcome many of the difficulties arising in this extension through the use of randomization and other aspects of experimental design set forth in his 1935 book on this subject [12].
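The least squares estimates of the β’s discussed earlier in this section are linear functions of the observations, obtained by solving the normal equations (X'X)b = X'Y. The following minimal sketch shows one way to compute them. It assumes that the matrix of the known constants has full column rank, and it uses hypothetical values throughout: 50 observations, two coefficients (an intercept and one regressor) and simulated i.i.d. normal errors with variance 1.

```python
import random


def least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination.

    X is a list of n rows, each holding p known constants x_ij; y holds the
    n observed values. Returns the p least squares estimates. Assumes X has
    full column rank, so that X'X is nonsingular.
    """
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y], one row per coefficient.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    beta = [0.0] * p
    for r in reversed(range(p)):
        tail = sum(A[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = (A[r][p] - tail) / A[r][r]
    return beta


# Hypothetical data: intercept 2.0 and slope 0.5 are illustrative values only.
rng = random.Random(42)
true_beta = [2.0, 0.5]
X = [[1.0, float(i)] for i in range(50)]   # column of ones and one regressor
y = [true_beta[0] * row[0] + true_beta[1] * row[1] + rng.gauss(0.0, 1.0)
     for row in X]                         # i.i.d. N(0, 1) errors

beta_hat = least_squares(X, y)
print("least squares estimates:", [round(b, 3) for b in beta_hat])
```

The sketch covers only the computation. Whether the resulting estimates mean anything depends on the assumptions about the errors and about the structural part, which is the concern of the rest of this section.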
After Fisher’s biologica l applications , the linear mo del still faced one imp or tant challenge: the application o f regress ion mo dels to the so cial sciences, particularly to econo mics. Here in addition to the as sumptions concerning the ra ndom terms ε i (normality and independence), questions concer ning the adequacy of the structural part Σ x ij β j bec ame particularly imp or tant. F ree dma n [ 15 ] (p. 9) co mpares the situation with that in astrono m y where “The r elev ant v ariables were known from Newtonian m ec hanics, and so w ere the func- tional forms of the equations connecting them. M easuremen t could b e done w i th high precision. Muc h was known ab out the errors in the measurements and the equations.” Regarding the co r resp onding iss ues in the so cial sciences, I shall here restric t attent ion to economics 7 where after some ea rly is olated instances, statistical model building was pursue d systematica lly (b eginning in the 1 930’s) by writer s such a s Tinbergen, Haavelmo [ 19 ] and Ko opmans . It was develop ed further in the 194 0’s by the work of the Cowles Co mmission, which was then the cen ter of econo mic resear ch. Ab out these efforts Ko opmans [ 24 ] p oints out that “this theory has b een widely applied to data obtained from agricultural experiments or fr om measuremen ts in biological p opulations. There are some essential differences betw een data of this ki nd and those usually encount ered in economic problems. In agricultural experiment s some of the determining v ariables can b e completely con trolled b y the exp erimenter . . . . Other determining v ariables l ess under his control are usually by their nature s ub ject to adequate independent v ariation. In that respect they bear a resemblance to the v ariables represen ting measurable charac teristics of individuals of a biological p opulation which is usually conceive d as random drawings from a st able (m y italics) probabilit y distribu- tion. 
In economic analysis variables at the control of an experimenting institution are exceptional. Further only a few types of variables . . . are so erratic in nature that they could reasonably be regarded as drawings from any stable distribution . . . . Further the relations between the variables studied in this type of analysis are themselves subject to gradual or abrupt changes, according to institutional or technical changes in society . . . . Therefore the number of observations from which the regression coefficients have to be estimated is limited by the very nature of the problem.”

Commenting on the relative roles of the economist and the statistician in this work, Koopmans states:

“The economist – or, in general, the expert in the field to which the dependent variable belongs – should by economic reasoning and general economic experience . . . devise a set of determining variables which he expects to be a complete set [i.e. such that the effect of any additional variables can safely be absorbed into the error term].”

Koopmans makes it clear that the determination of the appropriate structural part is not the task of the statistician but of the subject matter expert. This point is also made by Arrow [2] when he states that

“The method of scientific investigation indicated in the preceding paragraphs calls then for intensive a priori thinking to formulate a model, followed by the selection of a best-fitting structure from that model by appropriate statistical techniques.”

Once the model has been formulated, statistical techniques make it possible to deduce far-reaching conclusions. However, the reliability of these conclusions is of course limited by the reliability of the model. The difficulty of verifying the model assumptions on the one hand, and the power of the statistical machinery to deliver results on the other, have provided a fertile ground for misuse of the regression methodology. The chief critic of this misuse has been David Freedman who over the past 25 years has called attention to it, both in a series of close to 50 substantial papers, and as a statistical consultant and expert witness in more than 100 cases.

⁷ For applications to psychology see Krüger et al. [25] (Chapters 2 and 3).

As an example of Freedman's criticism consider his detailed analysis of a book by Hope, which deals with the effects of education on class mobility (Freedman [13]). In this paper Freedman points out that

“one problem noticeable to a statistician is that investigators do not pay attention to the stochastic assumptions behind the models. It does not seem possible to derive these assumptions from current theory, nor are they easily validated empirically on a case-by-case basis.”

The paper ends with the devastating conclusion:

“My opinion is that investigators need to think more about the underlying process, and look more closely at the data, without the distorting prism of conventional (and largely irrelevant) stochastic models. Estimating nonexistent parameters cannot be very fruitful. And it must be equally a waste of time to test theories on the basis of statistical hypotheses that are rooted neither in prior theory nor in fact, even if the algorithms are recited in every statistics text without caveat.”

The opening sentence of this conclusion suggests an alternative to the methodology Freedman is criticizing: better data, more substantive knowledge and input, multiple studies under varying conditions as had been carried out for example in establishing smoking as a major cause of lung cancer and other diseases. These requirements had been foreshadowed in the passages from Koopmans and Arrow cited earlier in this section.
Freedman capped his long involvement in these issues as scientist, consultant and expert witness with a text: “Statistical Models: Theory and Practice” [15]. As the title suggests, the book provides an account not only of the theory of statistical modeling but also, through the careful examination of many real-life examples, a guide to how such modeling should and should not be used.

An additional valuable feature of the book is a review of the literature on statistical modeling which is both a resource and encouragement for further study.

7. Conclusions

This paper has been concerned with the assumptions underlying the linear model and the special case of the normal i.i.d. model. It considered three assumptions: normality, independence and, in the more general case, the linear structure of the deterministic part. Of these three, the assumption of normality has received by far the most attention in the literature. For the i.i.d. case and a few simple linear models it has led to an alternative nonparametric methodology which has been developed to avoid it. Unfortunately these nonparametric methods are no help with respect to dependence (or the inequality of variances, a topic we have not discussed here).

A point emphasized by this survey is that different fields of study involve different kinds of data and have very different modeling situations. Thus, when the only source of variation is observational error, as often is the case in the physical sciences, the situation is much simpler than in the biological sciences where one is dealing with samples from a somewhat heterogeneous (but stable) population. And these situations, in turn, are easier to handle than those of observational studies in the social sciences where the populations are less stable and where random sampling typically is not possible.
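How strongly a standard procedure can depend on the normality assumption singled out in these conclusions can be checked by simulation. The following sketch is an illustration added here, not an analysis from the paper. It takes the two-sided test for equality of two variances based on the ratio of sample variances (the test studied by Box [4]) and rests on several assumptions of its own: two samples of size 10, a nominal 5% level, critical values obtained by Monte Carlo under normality rather than from F tables, and the Laplace (double exponential) distribution as one arbitrary example of a heavy-tailed alternative. In every simulated pair both samples come from the same distribution, so each rejection is a false alarm.

```python
import random
from statistics import variance


def draw_normal(rng, n):
    """n independent standard normal observations."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def draw_laplace(rng, n):
    """n independent Laplace observations.

    The difference of two independent unit exponentials is Laplace distributed.
    """
    return [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]


def simulate_ratios(draw, rng, n, reps):
    """Sorted variance ratios for pairs of samples from the same distribution."""
    return sorted(
        variance(draw(rng, n)) / variance(draw(rng, n)) for _ in range(reps)
    )


def rejection_rate(ratios, lower, upper):
    """Fraction of ratios falling outside the acceptance region [lower, upper]."""
    return sum(r < lower or r > upper for r in ratios) / len(ratios)


rng = random.Random(1953)
n, reps = 10, 4000

# Calibrate a nominal 5% two-sided test under normality (Monte Carlo
# stand-in for the tabulated F critical values).
calibration = simulate_ratios(draw_normal, rng, n, reps)
lower = calibration[reps * 25 // 1000]
upper = calibration[reps * 975 // 1000]

# Apply the same critical values to fresh normal and fresh Laplace data.
normal_rate = rejection_rate(simulate_ratios(draw_normal, rng, n, reps), lower, upper)
laplace_rate = rejection_rate(simulate_ratios(draw_laplace, rng, n, reps), lower, upper)

print("false rejection rate, normal data: ", normal_rate)
print("false rejection rate, Laplace data:", laplace_rate)
```

With normal data the rate should stay near the nominal 5%, whereas with the heavier-tailed data it should be markedly higher, although the variances are equal in both cases. The sample size and the alternative distribution are arbitrary choices, and other choices will give different figures.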
This subject matter dependence creates a difficulty for general texts on statistical methods but it does not absolve them from responsibility. As a minimum, such texts should provide warnings against the unsubstantiated use of standard models. A useful cautionary example might be the F-test of variances with its strong dependence on the assumption of normality. Even with the t-test, which is fairly robust against non-normality, one should not tempt students into Tolley's error (discussed in Section 4). A warning that should be included in such texts is the danger of the too facile assumption of independence. Another distinction that would be worth mentioning is that between observational and experimental studies, perhaps with references to the literature (for example the book by Rosenbaum [39]) treating the special methods developed for the former.

Though general statistics texts could, and should, include this kind of material, they can only go so far. As mentioned by Koopmans, the statistician and the subject matter specialist each have a role to play. A general text cannot provide the subject matter knowledge and the special features that are needed for successful modeling in specific cases. Experience with similar data is required, knowledge of theory and, as Freedman points out: shoe leather.

Acknowledgments. I am grateful to Persi Diaconis and Juliet Shaffer for many helpful critical comments.

References

[1] Airy, G. (1861, 1879). On the Algebraical and Numerical Theory of Errors of Observations, 3rd ed. Macmillan, London.
[2] Arrow, K. (1951). Mathematical models in the social sciences. In The Policy Sciences (Lerner and Lasswell, eds.). Stanford Univ. Press. MR0044815
[3] Bahadur, R. and Savage, L. (1956). The nonexistence of certain statistical procedures in nonparametric problems. Ann. Math. Statist. 27 1115–1122. MR0084241
[4] Box, G. (1953). Non-normality and tests of variances. Biometrika 40 318–335. MR0058937
[5] Brownlee, K. (1960). Statistical Theory and Methodology in Science and Engineering. Wiley, New York. MR0119268
[6] Brunt, D. (1917). The Combination of Observations. Cambridge Univ. Press.
[7] Brunt, D. (1931). The Combination of Observations. Cambridge Univ. Press.
[8] Carriquiry, A. and David, H. (2001). George Waddel Snedecor. In Statisticians of the Centuries (Heyde and Seneta, eds.). Springer, New York.
[9] Czuber, E. (1891). Theorie der Beobachtungsfehler. Teubner, Leipzig.
[10] De Moivre, A. (1733). Approximatio ad Summam Terminorum Binomii (a + b)^n in Seriem Expansi. Printed for private circulation.
[11] Fisher, R. (1925). Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh.
[12] Fisher, R. (1935). The Design of Experiments. Oliver and Boyd, Edinburgh.
[13] Freedman, D. (1987). As others see us (with discussion). J. Educ. Statist. 12 101–223. Reprinted in J. Shaffer, ed. The Role of Models in Nonexperimental Social Science. AERA/ASA, Washington, D.C. (1997).
[14] Freedman, D. (1991). Statistical models and shoe leather. In Sociological Methodology (P. Marsden, ed.). Amer. Social Assoc., Washington, D.C.
[15] Freedman, D. (2005). Statistical Models: Theory and Practice. Cambridge Univ. Press.
[16] Galton, F. (1877). Typical laws of heredity. Nature 15 492–495, 512–514, 532–533.
[17] Geary, R. (1947). Testing for normality. Biometrika 34 209–242. MR0023497
[18] Gosset, W. (1970). Letters from W. S. Gosset to R. A. Fisher, 1915–1936 with summaries by R. A. Fisher and a Foreword by L. McMullen. Printed for private circulation.
[19] Haavelmo, T. (1944). The probability approach in econometrics. Econometrica 12 Supplement. MR0010953
[20] Hald, A. (1998). A History of Mathematical Statistics from 1750 to 1930. Wiley, New York. MR1619032
[21] Helmert, F. (1872). Die Ausgleichsrechnung nach der Methode der Kleinsten Quadrate. Teubner, Leipzig.
[22] Helmert, F. (1907). Die Ausgleichsrechnung nach der Methode der Kleinsten Quadrate. Teubner, Leipzig.
[23] Kolmogorov, A. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer, Berlin.
[24] Koopmans, T. (1937). Linear Regression Analysis of Economic Time Series. Netherlands Economic Institute, Haarlem.
[25] Krüger, L., Gigerenzer, G. and Morgan, M., eds. (1987). The Probabilistic Revolution 2. MIT Press, Cambridge, MA. MR0929869
[26] Kruskal, W. (1988). Miracles and statistics: The casual assumption of independence. J. Amer. Statist. Assoc. 83 929–940. MR0997572
[27] Legendre, A. (1805). Nouvelles Méthodes pour la Détermination des Orbites des Comètes. Courcier, Paris.
[28] Lexis, W. (1877). Theorie der Massenerscheinungen in der Menschlichen Gesellschaft. Wagner, Freiburg.
[29] Merriman, M. (1884). A textbook on the method of least squares, 8th ed. 1900.
[30] Miller, R. (1986). Beyond ANOVA, Basics of Applied Statistics. Wiley, New York. MR0838087
[31] Pearson, E. (1931). The analysis of variance in cases of non-normal variation. Biometrika 23 114–133.
[32] Pearson, E. (1990). Student. Clarendon Press, Oxford. MR1255103
[33] Pearson, K. (1892). Grammar of Science. Walter Scott, London. 3rd ed. of 1911 reprinted by Meridian Books, 1957.
[34] Pearson, K. (1895). Contributions to the mathematical theory of evolution, II. Skew variation in homogeneous material. Phil. Trans. Roy. Soc. London 186 343–414.
[35] Pearson, K. (1897). On a form of spurious correlation which may arise when indices are used in the measurement of organs. Proc. Roy. Soc. 60 489–497.
[36] Pearson, K. (1902). On the mathematical theory of errors of judgement, with special reference to the personal equation. Phil. Trans. Roy. Soc. London 198 235–299.
[37] Peirce, C. (1873). On the theory of errors of observations. Appendix 21 to the Report of the Superintendent of the U.S. Coast Survey for the year ending June 1870. 200–224. Reprinted in Stigler: American Contributions to Mathematical Statistics in the Nineteenth Century 2 (1980). Arno Press, New York.
[38] Plackett, R. (1972). The discovery of the method of least squares. Biometrika 59 239–251. MR0326871
[39] Rosenbaum, P. (1995, 2002). Observational Studies. Springer, New York.
[40] Scheffé, H. (1959). The Analysis of Variance. Wiley, New York. MR0116429
[41] Seal, H. (1967). The historical development of the Gauss linear model. Biometrika 54 1–24. MR0214170
[42] Shafer, G. and Vovk, V. (2006). The sources of Kolmogorov's Grundbegriffe. Statist. Sci. 21 70–98. MR2275967
[43] Shewhart, W. and Winters, F. (1928). Small samples – new experimental results. J. Amer. Statist. Assoc. 23 144–153.
[44] Snedecor, G. (1937). Statistical Methods. The Iowa State College Press, Ames, Iowa.
[45] Stigler, S. (1986). The History of Statistics. Belknap Press, Cambridge, MA. MR0852410
[46] Stigler, S. (1999). Statistics on the Table. Harvard Univ. Press, Cambridge, MA. MR1712969
[47] Tolley, H. (1929). Economic data from the sampling point of view. J. Amer. Statist. Assoc. 24 69–72.
[48] von Mises, R. (1919). Grundlagen der Wahrscheinlichkeitsrechnung. Math. Zeitschrift 5 52–99.
[49] von Plato, J. (1994). Creating Modern Probability. Cambridge Univ. Press.