Comment: Fisher Lecture: Dimension Reduction in Regression [arXiv:0708.3774]


Authors: Ronald Christensen

Statistical Science 2007, Vol. 22, No. 1, 27–31. DOI: 10.1214/088342307000000041. Main article DOI: 10.1214/088342306000000682. © Institute of Mathematical Statistics, 2007.

Ronald Christensen is Professor, Department of Mathematics and Statistics, University of New Mexico, Albuquerque, New Mexico 87131-0001, USA (e-mail: fletcher@stat.unm.edu).

I am pleased to participate in this well-deserved recognition of Dennis Cook's remarkable career. Cook points out Fisher's insistence that predictor variables in regression be chosen without reference to the dependent variable. Reduction by principal components clearly satisfies that dictum. One of my primary objections to partial least squares regression when I first encountered it as an alternative to principal components was that the predictor variables were being chosen with reference to the dependent variable. (I now have other objections to partial least squares.) Yet on the other hand, variable selection in regression is well accepted, and it clearly chooses variables based on their relationship to the dependent variable. Perhaps variable selection is better thought of as a form of shrinkage estimation rather than as a process for choosing predictor variables.

Cook also reiterates something that I think is difficult to overemphasize: Fisher's point that "More or less elaborate forms [for models] will be suitable according to the volume of the data." We see this now on a regular basis as modern technology provides larger data sets to which elaborate models are regularly fitted.

With regard to Cook's work, it seems to me that the key issue in the development of Cook's models (2), (5), (10) and (13) is whether they are broadly reasonable. The question did not seem to be extensively addressed, but Cook shows that much can be gained if we can reasonably use them. When they are appropriate, the results in the corresponding propositions are rather stunning. It has long been known that the best regression model available, technically the best predictor of a random variable $y$ based on a $p$-dimensional random vector $x$, is the conditional mean $E(y \mid x)$. The problem with this result is that it requires us to know the joint distribution of $(x', y)$. Most of what we commonly recognize as regression analysis is an attempt to model the relationship $E(y \mid x)$. This includes linear regression, nonlinear regression, generalized linear models and the various approaches to "nonparametric" (actually, highly parametric) regression. Under the models being considered, there exists a $p \times d$ matrix $\Gamma$ such that
$$y \mid x \sim y \mid \Gamma' x.$$
This means that $E(y \mid x) = E(y \mid \Gamma' x)$ regardless of what modeling strategy we choose to use. If anything, this dimensionality reduction from $p$ to $d$ is of more importance to nonparametric regression than to other forms because, as the number of predictor variables increases, nonparametric regression gets hit harder by the curse of dimensionality than less highly parametric forms. As a result, nonparametric regression should benefit most from the existence of a generally valid reduction in dimensionality.
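[Editor's illustration, not part of the original comment.] A minimal simulation sketch of this point, with an assumed toy model and a plain k-nearest-neighbor smoother (all names here are illustrative): when $y$ depends on $x$ only through $\Gamma'x$, smoothing on the $d$-dimensional reduction typically beats the same smoother applied to all $p$ coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 500, 10, 1

gamma = np.zeros((p, d)); gamma[0, 0] = 1.0      # true reduction: y depends on x only through x[0]
x = rng.normal(size=(n, p))
y = np.sin(x @ gamma).ravel() + 0.1 * rng.normal(size=n)

def knn_predict(x_train, y_train, x_test, k=10):
    """Plain k-nearest-neighbor regression (Euclidean distance)."""
    d2 = ((x_test[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)

x_new = rng.normal(size=(n, p))
y_new = np.sin(x_new @ gamma).ravel()            # true regression function at new points

full = knn_predict(x, y, x_new)                      # smooth in all p dimensions
reduced = knn_predict(x @ gamma, y, x_new @ gamma)   # smooth in the d-dimensional reduction
print("MSE, p-dim smoother:", np.mean((full - y_new) ** 2))
print("MSE, d-dim smoother:", np.mean((reduced - y_new) ** 2))
```

With these settings the reduced smoother usually shows a markedly smaller error, which is the curse-of-dimensionality effect described above.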
The issue with these four models is to estimate the column space of $\Gamma$, say, $C(\Gamma)$. In the first six sections, the results are all closely tied to the eigenvectors (principal component vectors) of some estimated covariance matrix for the predictor variables $x$, say $\hat\Sigma$. For model (2), the space is spanned by the first $d$ principal component vectors of the usual $\hat\Sigma$. For model (5), the space is spanned by the first $d$ principal component vectors of a restricted version of $\hat\Sigma$. For models (10) and (13), the estimation procedure is a bit more complicated. The key is that for both models (10) and (13) the population covariance matrix of $x$ can be written as
$$\Sigma = \Gamma V D V' \Gamma' + \Gamma_0 V_0 D_0 V_0' \Gamma_0',$$
with $D$ and $D_0$ diagonal matrices, in such a way that
$$\Sigma(\Gamma V) = (\Gamma V) D, \qquad \Sigma(\Gamma_0 V_0) = (\Gamma_0 V_0) D_0.$$
This implies that the eigenvectors of $\Sigma$ are either in $C(\Gamma)$ or in $C(\Gamma_0) \equiv C(\Gamma)^\perp$, the orthogonal complement of $C(\Gamma)$. The problem is to establish which $d$ out of the $p$ orthogonal eigenvectors belong in $C(\Gamma)$. To estimate $C(\Gamma)$, find the orthogonal eigenvectors of $\hat\Sigma$, say, $v_1, \ldots, v_p$, and check the likelihood of every one of the $\binom{p}{d}$ combinations that has $d$ of the $v_i$'s in $C(\Gamma)$ and the remaining $p - d$ vectors in the orthogonal complement. Whichever combination maximizes the likelihood provides the estimate of $C(\Gamma)$. In case $\binom{p}{d}$ is large, Cook provides a sequential selection method. The key difference between the procedures for models (10) and (13) is that the likelihoods are different.
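[Editor's illustration, not part of the original comment.] A small numerical check of this eigenvector structure, under assumed toy choices of $\Gamma$, $V$, $D$ and their complements: the sketch builds a $\Sigma$ of the stated form, confirms that each eigenvector lies in $C(\Gamma)$ or $C(\Gamma)^\perp$, and enumerates the $\binom{p}{d}$ candidate partitions a likelihood search would compare.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
p, d = 5, 2

# Orthonormal basis split into C(Gamma) and its orthogonal complement.
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))
Gamma, Gamma0 = Q[:, :d], Q[:, d:]

# Sigma = Gamma V D V' Gamma' + Gamma0 V0 D0 V0' Gamma0', with V, V0 orthogonal.
V, _ = np.linalg.qr(rng.normal(size=(d, d)))
V0, _ = np.linalg.qr(rng.normal(size=(p - d, p - d)))
D, D0 = np.diag([5.0, 4.0]), np.diag([3.0, 2.0, 1.0])
Sigma = Gamma @ V @ D @ V.T @ Gamma.T + Gamma0 @ V0 @ D0 @ V0.T @ Gamma0.T

# Each eigenvector of Sigma lies in C(Gamma) or in C(Gamma)^perp:
# its projection onto C(Gamma) has norm (essentially) 0 or 1.
_, evecs = np.linalg.eigh(Sigma)
P_Gamma = Gamma @ Gamma.T                 # ppo onto C(Gamma), since Gamma'Gamma = I
print(np.round(np.linalg.norm(P_Gamma @ evecs, axis=0), 8))

# The estimation problem: decide which d of the p eigenvectors span C(Gamma).
parts = list(combinations(range(p), d))   # the (p choose d) candidate partitions
print(len(parts), "candidate partitions to compare by likelihood")
```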
The remainder of my discussion is an attempt to put the question of estimating the reduced space into the context of multivariate linear model theory. To do this, I will change Cook's notation completely, so that the problem looks more like standard multivariate linear models, but then reidentify the parts of the problem that interest Cook. I do not presume that any of this is new to Cook, but I found it helpful in understanding the process. In discussing multivariate linear models, liberal use is made of Kronecker products, Vec operators and their properties; see, for example, Christensen (2002, Definition B.5 and Subsection B.5). Recall also that for a univariate linear model $Y = X\beta + e$, $E(e) = 0$ and $\mathrm{Cov}(e) = V$,
$$\mathrm{SSE} \equiv (Y - X\hat\beta)' V^{-1} (Y - X\hat\beta) = Y' V^{-1} (Y - X\hat\beta).$$
Moreover, least squares estimates are BLUEs, and thus maximum likelihood, if $C(VX) \subset C(X)$. We will apply these facts to the multivariate models. Finally, let $J_r^s$ denote an $r \times s$ matrix of 1's with $J_r \equiv J_r^1$, and for a matrix $A$ let $P_A = A(A'A)^- A'$ be the perpendicular projection operator (ppo) onto $C(A)$, with $\mathrm{r}(A)$ the rank of $A$.

The standard multivariate linear model involves dependent variables $y_1, \ldots, y_q$. If $n$ observations are taken on each dependent variable, we have $y_{ih}$, $i = 1, \ldots, n$, $h = 1, \ldots, q$. Let $Y_h = [y_{1h}, \ldots, y_{nh}]'$ and $y_i' = [y_{i1}, \ldots, y_{iq}]$. For each $h$, we have a linear model,
$$Y_h = X\beta_h + e_h, \qquad E(e_h) = 0, \qquad \mathrm{Cov}(e_h) = \sigma_{hh} I,$$
where $X$ is a known $n \times p$ matrix that is the same for all dependent variables, but $\beta_h$ and the error vector $e_h = [e_{1h}, \ldots, e_{nh}]'$ are peculiar to the dependent variable $Y_h$. The multivariate linear model consists of fitting the $q$ linear models simultaneously. Letting
$$Y_{n \times q} = [Y_1, \ldots, Y_q], \qquad B_{p \times q} = [\beta_1, \ldots, \beta_q], \qquad e_{n \times q} = [e_1, \ldots, e_q],$$
the multivariate linear model is
$$Y = XB + e. \qquad (1)$$
Alternatively, thinking of $X$ as a matrix with rows $x_i'$ and $e$ as having rows $\varepsilon_i'$, we can write the multivariate linear model as
$$y_i' = x_i' B + \varepsilon_i', \qquad i = 1, \ldots, n.$$
To perform maximum likelihood, we assume that the $\varepsilon_i$'s are independent $N(0, \Sigma)$ random vectors. It is reasonably well known that for any ppo $P_A$,
$$E_{Y|X}(Y' P_A Y) = \mathrm{r}(A)\Sigma + B' X' P_A X B. \qquad (2)$$
$\Sigma$ is now being used for the conditional covariance matrix of $y \mid x$, whereas Cook used $\Sigma$ for the marginal covariance matrix of $x$.

A generalization of the multivariate linear model that is often associated with growth curve models (cf. Christensen, 2001) is
$$Y = X \Gamma Z' + e, \qquad (3)$$
where the unknown parameter matrix $B$ in (1) is replaced by a product of a reduced parameter matrix $\Gamma$ that is $p \times d$ and a fixed, known matrix $Z$ that is $q \times d$ with $\mathrm{r}(Z) \le d < q$. This is essentially Cook's model (5) when applied to data and using drastically different notation. (Our $Y$ is his $X$, our $X$ is his known function of $y$, $F_y$, our $Z$ is his $\Gamma$, etc.) The ultimate goal of our exercise is to drop the assumption that we know $Z$ and estimate it, or more properly $C(Z)$, from the data. But for now, we act as if $Z$ is known.

Note that the "growth curve" model specifies something akin to a linear model for each row of $Y$,
$$y_i = Z(\Gamma' x_i) + \varepsilon_i, \qquad i = 1, \ldots, n.$$
Moreover, using Kronecker products and Vec operators, we can turn the multivariate growth curve model (3) into a univariate linear model,
$$\mathrm{Vec}(Y) = [Z \otimes X] \mathrm{Vec}(\Gamma) + \mathrm{Vec}(e), \qquad (4)$$
with $\mathrm{Vec}(e) \sim N(0, [\Sigma \otimes I_n])$.

There are a couple of refinements to model (4) used in Cook's development. First, the growth curve model is specified as
$$Y = J\mu' + X \Gamma Z' + e, \qquad (5)$$
with $J'X = 0$ and $Z'Z = I_d$. As a linear model, (5) becomes
$$\mathrm{Vec}(Y) = [I_q \otimes J_n]\mu + [Z \otimes X] \mathrm{Vec}(\Gamma) + \mathrm{Vec}(e).$$
Second is the assumption in Cook's models (2) and (5) that $\Sigma = \sigma^2 I_q$, in which case
$$\mathrm{Cov}[\mathrm{Vec}(e)] = \sigma^2 [I_q \otimes I_n] = \sigma^2 I_{nq},$$
so standard estimation results apply to the model. In particular, least squares estimates of the parameters $\mu$ and $\mathrm{Vec}(\Gamma)$ are maximum likelihood estimates, and the likelihood function for fixed $\sigma^2$ evaluated at the maximum likelihood estimates of $\mu$ and $\Gamma$ is, ignoring the constant,
$$-\frac{nq}{2}\log(\sigma^2) - \frac{\mathrm{SSE}}{2\sigma^2}. \qquad (6)$$
Performing the usual computations necessary for finding least squares estimates, but using properties of Kronecker products and Vec operators and exploiting the fact that since $J'X = 0$ we have $C([I_q \otimes J_n]) \perp C([Z \otimes X])$, so that estimation of $\mu$ and $\Gamma$ can be performed separately, the least squares estimates reduce to
$$\hat\mu = \bar y_\cdot, \qquad \hat\Gamma = (X'X)^- X' Y Z (Z'Z)^-,$$
or, alternatively, $X \hat\Gamma Z' = P_X Y P_Z$. The maximum likelihood estimate of $\sigma^2$ is obtained by differentiating (6) with respect to $\sigma^2$ and setting the derivative equal to 0, yielding $\hat\sigma^2 = \mathrm{SSE}/nq$.
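[Editor's illustration, not part of the original comment.] The Vec/Kronecker bookkeeping is easy to verify numerically. In this sketch (names are mine; numpy's column-major reshape plays the role of Vec), we check that $\mathrm{Vec}(X\Gamma Z') = [Z \otimes X]\mathrm{Vec}(\Gamma)$ and that the least squares fit satisfies $X\hat\Gamma Z' = P_X Y P_Z$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q, d = 40, 3, 6, 2

def vec(A):
    """Stack the columns of A (the Vec operator)."""
    return A.reshape(-1, order="F")

def ppo(A):
    """Perpendicular projection operator onto C(A)."""
    return A @ np.linalg.pinv(A.T @ A) @ A.T

X = rng.normal(size=(n, p)); X -= X.mean(axis=0)        # center so J'X = 0
Z, _ = np.linalg.qr(rng.normal(size=(q, d)))            # Z'Z = I_d
Gamma = rng.normal(size=(p, d))

# Vec(X Gamma Z') = [Z kron X] Vec(Gamma)
print(np.allclose(np.kron(Z, X) @ vec(Gamma), vec(X @ Gamma @ Z.T)))

# Least squares in the growth curve model Y = J mu' + X Gamma Z' + e.
mu = rng.normal(size=q)
Y = np.ones((n, 1)) @ mu[None, :] + X @ Gamma @ Z.T + 0.1 * rng.normal(size=(n, q))
Gamma_hat = np.linalg.pinv(X.T @ X) @ X.T @ Y @ Z       # (X'X)^- X'Y Z, using Z'Z = I
print(np.allclose(X @ Gamma_hat @ Z.T, ppo(X) @ Y @ ppo(Z)))
```

Both checks print True; the second identity holds exactly, noise and all, because it is an algebraic property of the projections.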
In particular, the perpendicular projection operator onto $C([I_q \otimes J_n], [Z \otimes X])$ is $[I_q \otimes (1/n) J_n^n] + [P_Z \otimes P_X]$, so
$$\begin{aligned}
\mathrm{SSE} &= \mathrm{Vec}(Y)'[I_q \otimes (I - (1/n) J_n^n)]\mathrm{Vec}(Y) - \mathrm{Vec}(Y)'[P_Z \otimes P_X]\mathrm{Vec}(Y) \\
&= \|\mathrm{Vec}[(I - (1/n) J_n^n) Y]\|^2 - \|\mathrm{Vec}(P_X Y P_Z)\|^2 \qquad (7) \\
&= \mathrm{tr}[Y'(I - (1/n) J_n^n) Y] - \mathrm{tr}[P_Z Y' P_X Y P_Z].
\end{aligned}$$
Using notation analogous to Cook's, three estimators that we will use frequently are
$$\hat\Sigma \equiv \frac{1}{n} Y'\Big(I - \frac{1}{n} J_n^n\Big) Y, \qquad \hat\Sigma_{\mathrm{fit}} \equiv \frac{1}{n} Y' P_X Y, \qquad \hat\Sigma_{\mathrm{res}} \equiv \frac{1}{n} Y'\Big(I - \frac{1}{n} J_n^n - P_X\Big) Y.$$
Note that $\hat\Sigma$ is the maximum likelihood estimate of the covariance matrix when fitting the usual multivariate one-sample model $Y = J_n \mu' + e$. Using the assumption $Z'Z = I_d$, so that $ZZ' = P_Z$,
$$\mathrm{SSE} = \mathrm{tr}[n\hat\Sigma] - \mathrm{tr}[Z' n \hat\Sigma_{\mathrm{fit}} Z].$$
As Cook points out, this depends on $C(Z)$ rather than $Z$ itself.

We are finally in a position to address Cook's question, the fact that we do not actually know $Z$. To maximize the likelihood (6) as a function of $Z$, we need to maximize $\mathrm{tr}[Z' n \hat\Sigma_{\mathrm{fit}} Z]$ as a function of $Z$ subject to $Z'Z = I_d$. If we think about finding the columns of $Z$ sequentially, that is, finding $z_1$ to maximize $z_1' \hat\Sigma_{\mathrm{fit}} z_1$ subject to $\|z_1\|^2 = 1$, then finding $z_2$ to maximize $z_2' \hat\Sigma_{\mathrm{fit}} z_2$ subject to $\|z_2\|^2 = 1$ and $z_2' z_1 = 0$, and so on, this is a standard problem in multivariate analysis that is solved by finding the eigenvectors of $\hat\Sigma_{\mathrm{fit}}$ corresponding to the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_q \ge 0$. Of course, since $Z$ is $q \times d$, we consider only the first $d$ eigenvectors.

To examine a model equivalent to Cook's model (2), we consider the most extreme (largest) choice for $X$, which is $C(X) = C(J_n)^\perp$. In this case, $P_X = I_n - (1/n) J_n^n$, so that $\hat\Sigma_{\mathrm{fit}} = \hat\Sigma$. It follows that the maximum likelihood estimate of $Z$ consists of the first $d$ principal component vectors. As pointed out by Cook, the number of parameters in our $Z$ matrix is $pd$. However, with this choice of $X$, $p = n - 1$, so the number of parameters rises with the sample size.
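[Editor's illustration, not part of the original comment.] Here is how the three estimators and the resulting eigenvector estimate of $C(Z)$ might look in code, under an assumed toy data-generating model; the principal-angle comparison at the end is one reasonable way to check subspace recovery.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q, d = 200, 3, 6, 2

X = rng.normal(size=(n, p)); X -= X.mean(axis=0)          # J'X = 0
Z, _ = np.linalg.qr(rng.normal(size=(q, d)))              # Z'Z = I_d
Gamma = rng.normal(size=(p, d))
Y = rng.normal(size=q) + X @ Gamma @ Z.T + 0.5 * rng.normal(size=(n, q))

P_X = X @ np.linalg.pinv(X.T @ X) @ X.T                   # ppo onto C(X)
C = np.eye(n) - np.ones((n, n)) / n                       # I - (1/n)J_n^n

Sigma_hat = Y.T @ C @ Y / n                               # one-sample MLE covariance
Sigma_fit = Y.T @ P_X @ Y / n
Sigma_res = Y.T @ (C - P_X) @ Y / n

# Maximize tr(Z' Sigma_fit Z) subject to Z'Z = I_d: take the d leading eigenvectors.
_, evecs = np.linalg.eigh(Sigma_fit)                      # eigenvalues in ascending order
Z_hat = evecs[:, ::-1][:, :d]

# Cosines of the principal angles between C(Z_hat) and C(Z); values near 1 mean agreement.
print(np.linalg.svd(Z.T @ Z_hat, compute_uv=False))
```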
For Cook's models (10) and (13) in Section 6, the covariance structure changes. As indicated earlier, the estimation methods ultimately involve determining which of the principal component directions are most likely, where the principal components can be computed from some estimate of $\Sigma$, which may be any, or preferably all, of $\hat\Sigma$, $\hat\Sigma_{\mathrm{fit}}$ or $\hat\Sigma_{\mathrm{res}}$. For Cook's model (13) we again have
$$\mathrm{Vec}(Y) = [I_q \otimes J_n]\mu + [Z \otimes X]\mathrm{Vec}(\Gamma) + \mathrm{Vec}(e),$$
but recalling that $Z'Z = I_d$, we now incorporate a matrix $Z_0$ with $Z_0' Z = 0$ and $Z_0' Z_0 = I_{q-d}$ and assume
$$\mathrm{Vec}(e) \sim N(0, [(Z_0 \Omega_0^2 Z_0' + Z \Omega^2 Z') \otimes I_n]).$$
Observe that least squares estimates will still be BLUEs, and thus maximum likelihood estimates, because $C([(Z_0 \Omega_0^2 Z_0' + Z \Omega^2 Z') \otimes I_n][Z \otimes X]) \subset C([Z \otimes X])$. The SSE now involves the perpendicular projection operators as in (7), but also involves the inverse of the covariance matrix. With our assumptions about $Z$ and $Z_0$,
$$[(Z_0 \Omega_0^2 Z_0' + Z \Omega^2 Z') \otimes I_n]^{-1} = [(Z_0 \Omega_0^{-2} Z_0' + Z \Omega^{-2} Z') \otimes I_n].$$
The SSE becomes
$$\begin{aligned}
\mathrm{SSE} &= \mathrm{Vec}(Y)'[(Z \Omega^{-2} Z' + Z_0 \Omega_0^{-2} Z_0') \otimes (I - (1/n) J_n^n)]\mathrm{Vec}(Y) - \mathrm{Vec}(Y)'[Z \Omega^{-2} Z' \otimes P_X]\mathrm{Vec}(Y) \\
&= \mathrm{Vec}(Y)'\mathrm{Vec}[(I - (1/n) J_n^n) Y (Z \Omega^{-2} Z' + Z_0 \Omega_0^{-2} Z_0')] - \mathrm{Vec}(Y)'\mathrm{Vec}(P_X Y Z \Omega^{-2} Z') \\
&= \mathrm{tr}[Y'(I - (1/n) J_n^n) Y (Z \Omega^{-2} Z' + Z_0 \Omega_0^{-2} Z_0')] - \mathrm{tr}(Y' P_X Y Z \Omega^{-2} Z') \\
&= \mathrm{tr}[\Omega_0^{-2} Z_0' Y' (I - (1/n) J_n^n) Y Z_0] + \mathrm{tr}\{\Omega^{-2} Z'[Y'(I - (1/n) J_n^n) Y - Y' P_X Y] Z\} \\
&= \mathrm{tr}[\Omega_0^{-2} Z_0' n \hat\Sigma Z_0] + \mathrm{tr}\{\Omega^{-2} Z'[n \hat\Sigma - n \hat\Sigma_{\mathrm{fit}}] Z\}.
\end{aligned}$$
The likelihood will be
$$-\frac{n}{2}\log(|Z_0 \Omega_0^2 Z_0' + Z \Omega^2 Z'|) - \frac{1}{2}\mathrm{SSE} = -\frac{n}{2}\log(|\Omega_0^2|) - \frac{n}{2}\mathrm{tr}[\Omega_0^{-2} Z_0' \hat\Sigma Z_0] - \frac{n}{2}\log(|\Omega^2|) - \frac{n}{2}\mathrm{tr}(\Omega^{-2} Z'[\hat\Sigma - \hat\Sigma_{\mathrm{fit}}] Z),$$
which, maximizing over $\Omega$ and $\Omega_0$, Cook indicates reduces to
$$-\frac{n}{2}\log(|Z_0' \hat\Sigma Z_0|) - \frac{n}{2}\log(|Z'[\hat\Sigma - \hat\Sigma_{\mathrm{fit}}] Z|).$$
As before, Cook's model (10) can be viewed as the special case where $C(X) = C(J_n)^\perp$, so that the second term in the likelihood disappears.
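[Editor's illustration, not part of the original comment.] A sketch of the combinatorial search over eigenvector partitions using the reduced likelihood just displayed; the toy data-generating choices (in particular taking the candidate directions to be eigenvectors of $\hat\Sigma$) are assumptions for illustration.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n, p, q, d = 200, 3, 6, 2

X = rng.normal(size=(n, p)); X -= X.mean(axis=0)
Q, _ = np.linalg.qr(rng.normal(size=(q, q)))
Z, Z0 = Q[:, :d], Q[:, d:]                                # Z'Z = I_d, Z0'Z = 0
Gamma = rng.normal(size=(p, d))

# Errors with covariance Z Omega^2 Z' + Z0 Omega0^2 Z0' (Omega = I, Omega0 = 0.5 I here).
e = rng.normal(size=(n, d)) @ Z.T + 0.5 * rng.normal(size=(n, q - d)) @ Z0.T
Y = X @ Gamma @ Z.T + e

P_X = X @ np.linalg.pinv(X.T @ X) @ X.T
C = np.eye(n) - np.ones((n, n)) / n
Sigma_hat = Y.T @ C @ Y / n
Sigma_fit = Y.T @ P_X @ Y / n

_, V = np.linalg.eigh(Sigma_hat)                          # candidate directions

def loglik(S):
    """-(n/2)[log|Z0' Sig Z0| + log|Z'(Sig - Sig_fit)Z|] at the partition S."""
    Zc = V[:, list(S)]                                    # d columns proposed for C(Z)
    Z0c = V[:, [j for j in range(q) if j not in S]]       # the remaining q - d columns
    _, ld0 = np.linalg.slogdet(Z0c.T @ Sigma_hat @ Z0c)
    _, ld1 = np.linalg.slogdet(Zc.T @ (Sigma_hat - Sigma_fit) @ Zc)
    return -(n / 2) * (ld0 + ld1)

best = max(combinations(range(q), d), key=loglik)         # all (q choose d) partitions
Z_hat = V[:, list(best)]
print(np.linalg.svd(Z.T @ Z_hat, compute_uv=False))       # near 1 if C(Z) is recovered
```

For large $\binom{q}{d}$ this exhaustive loop is exactly what Cook's sequential selection method is designed to avoid.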
To continue this discussion, we need to leave the conditional world of linear models and consider the unconditional expected values of $\hat\Sigma$, $\hat\Sigma_{\mathrm{fit}}$ and $\hat\Sigma_{\mathrm{res}}$. Conditionally, applying (2) to model (5) when $J'A = 0$ gives
$$E_{Y|X}(Y' P_A Y) = \mathrm{r}(A)\Sigma + Z \Gamma' X' P_A X \Gamma Z'. \qquad (8)$$
For $\hat\Sigma$ and $\hat\Sigma_{\mathrm{fit}}$ the appropriate ppo has $P_A X = X$, and for $\hat\Sigma_{\mathrm{res}}$ the ppo has $P_A X = 0$. We have assumed that $J'X = 0$, which is only reasonable if the random rows of $X$ have been adjusted by their sample means; nonetheless, it is reasonable to define the marginal covariance matrix of a row of $X$ as
$$V_x \equiv \frac{1}{n-1} E_X(X'X).$$
These results quickly yield the expectations
$$E(\hat\Sigma) = \frac{n-1}{n}\Sigma + \frac{n-1}{n} Z \Gamma' V_x \Gamma Z', \qquad E(\hat\Sigma_{\mathrm{fit}}) = \frac{\mathrm{r}(X)}{n}\Sigma + \frac{n-1}{n} Z \Gamma' V_x \Gamma Z', \qquad E(\hat\Sigma_{\mathrm{res}}) = \frac{n-1-\mathrm{r}(X)}{n}\Sigma.$$
In particular, with $\Sigma = Z_0 \Omega_0^2 Z_0' + Z \Omega^2 Z'$, Cook's Proposition 4 says that the estimates converge in probability to the limits of their expected values.

Cook's second simulation has a true model with $d = 1$, $n = 250$, $q = 10$ (his $p$), $p = 1$, $\Gamma = 1$, $\Omega = \sigma$, $\Omega_0 = \sigma_0 I_9$ and $V_x = \sigma_x^2$ (his $\sigma_Y^2$). Here,
$$E(\hat\Sigma) = \frac{n-1}{n}\sigma_0^2 Z_0 Z_0' + \frac{n-1}{n}(\sigma^2 + \sigma_x^2) Z Z',$$
$$E(\hat\Sigma_{\mathrm{fit}}) = \frac{\mathrm{r}(X)}{n}\sigma_0^2 Z_0 Z_0' + \Big[\frac{\mathrm{r}(X)}{n}\sigma^2 + \frac{n-1}{n}\sigma_x^2\Big] Z Z',$$
$$E(\hat\Sigma_{\mathrm{res}}) = \frac{n-1-\mathrm{r}(X)}{n}\sigma_0^2 Z_0 Z_0' + \frac{n-1-\mathrm{r}(X)}{n}\sigma^2 Z Z'.$$
Cook's simulation results make good sense in terms of these expected values. Terms involving $\mathrm{r}(X)/n$ should be unimportant. When $\sigma_0$ is small, $E(\hat\Sigma)$ is dominated by $(\sigma^2 + \sigma_x^2) Z Z'$, which is larger than the corresponding terms $\sigma_x^2 Z Z'$ and $\sigma^2 Z Z'$ for $\hat\Sigma_{\mathrm{fit}}$ and $\hat\Sigma_{\mathrm{res}}$, respectively, so it works best. When $\sigma_0$ is comparable to $\sigma$ and $\sigma_x$, $\hat\Sigma_{\mathrm{fit}}$ works well, because $E(\hat\Sigma_{\mathrm{fit}})$ is much less affected by $\sigma_0^2$ than the other estimates. And when $\sigma_0$ is large, $\hat\Sigma_{\mathrm{res}}$ works well because then we need to look at the eigenvectors for small eigenvalues of $\hat\Sigma_{\mathrm{res}}$ and $\hat\Sigma$, but the term $\sigma^2 Z Z'$ for $\hat\Sigma_{\mathrm{res}}$ is smaller than the term $(\sigma^2 + \sigma_x^2) Z Z'$ for $\hat\Sigma$, whereas the expected value of $\hat\Sigma_{\mathrm{fit}}$ is relatively unaffected by $\sigma_0^2$ getting large.

As Cook mentions, when $\sigma_x^2 + \sigma^2 = \sigma_0^2$, there is very little ability for $\hat\Sigma$ to identify $C(Z)$ because then $E(\hat\Sigma) = [(n-1)/n]\sigma_0^2 I_q$, so we cannot really expect the eigenvectors of $\hat\Sigma$ to help us identify $C(Z)$. Similarly, when $\sigma^2 = \sigma_0^2$, $E(\hat\Sigma_{\mathrm{res}}) = \{[n-1-\mathrm{r}(X)]/n\}\sigma_0^2 I_q$.

Cook's models (2), (5), (10) and (13) involve specialized structure for $\Sigma$. His model (16) allows for general $\Sigma$. Nonetheless, the expectations of the estimates show that there should almost always be some ability to estimate $C(Z)$. In particular,
$$E\Big[\frac{n}{\mathrm{r}(X)}\hat\Sigma_{\mathrm{fit}} - \frac{n}{n-1}\hat\Sigma\Big] = \frac{n-1-\mathrm{r}(X)}{\mathrm{r}(X)} Z \Gamma' V_x \Gamma Z',$$
so the first $d$ principal component vectors of the estimate $[n/\mathrm{r}(X)]\hat\Sigma_{\mathrm{fit}} - [n/(n-1)]\hat\Sigma$ should be at least a reasonable estimate of a basis for $C(Z)$. For large samples this is similar to looking at the directions determined by $\hat\Sigma_{\mathrm{fit}}$, but in the extreme case of $C(X) = C(J_n)^\perp$, the estimator is degenerate at 0. Of course, according to Cook's Proposition 6, for the general $\Sigma$ (Cook's $\sigma^2 \Delta$) of model (16), it no longer suffices to estimate $C(Z)$; we need to estimate $C(\Sigma^{-1} Z)$. Fortunately, $\hat\Sigma_{\mathrm{res}}$ provides an estimate of $\Sigma$, so we can just transform the estimated basis for $C(Z)$. For large $n$, it makes sense to base estimation of $C(Z)$ on $\hat\Sigma_{\mathrm{fit}}$, but rather than transforming its eigenvectors, one could alternatively look directly at the eigenvectors of $\hat\Sigma_{\mathrm{res}}^{-1} \hat\Sigma_{\mathrm{fit}}$, which is Cook's recommendation when $p = d$. This intuitive approach based on the expected values of (matrix) quadratic forms seems analogous to using Henderson's method 3 for estimating variance components, whereas Cook is recommending a maximum likelihood procedure, which I suspect is better.
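[Editor's illustration, not part of the original comment.] Both moment-based routes just described are easy to write down; this sketch uses an assumed toy model with a general $\Sigma$, and the sorting of the nonsymmetric eigenproblem by eigenvalue size is my choice, not prescribed by the article.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, q, d = 400, 3, 6, 1

X = rng.normal(size=(n, p)); X -= X.mean(axis=0)
Z, _ = np.linalg.qr(rng.normal(size=(q, d)))
Gamma = rng.normal(size=(p, d))
A = rng.normal(size=(q, q)); Sigma = A @ A.T / q + 0.5 * np.eye(q)   # a general SPD Sigma
Y = X @ Gamma @ Z.T + rng.normal(size=(n, q)) @ np.linalg.cholesky(Sigma).T

P_X = X @ np.linalg.pinv(X.T @ X) @ X.T
C = np.eye(n) - np.ones((n, n)) / n
Sigma_hat = Y.T @ C @ Y / n
Sigma_fit = Y.T @ P_X @ Y / n
Sigma_res = Y.T @ (C - P_X) @ Y / n
rX = np.linalg.matrix_rank(X)

# Moment-based estimate of a basis for C(Z): leading eigenvectors of
# (n/r(X)) Sigma_fit - (n/(n-1)) Sigma_hat, whose expectation is
# [(n-1-r(X))/r(X)] Z Gamma' V_x Gamma Z'.
M = (n / rX) * Sigma_fit - (n / (n - 1)) * Sigma_hat
_, evecs = np.linalg.eigh(M)
Z_hat = evecs[:, ::-1][:, :d]
print(np.linalg.svd(Z.T @ Z_hat, compute_uv=False))       # near 1 if C(Z) is recovered

# For general Sigma, look instead at the eigenvectors of Sigma_res^{-1} Sigma_fit
# (a nonsymmetric product, hence np.linalg.eig and a sort by eigenvalue size).
w, v = np.linalg.eig(np.linalg.solve(Sigma_res, Sigma_fit))
dirs = np.real(v[:, np.argsort(-np.real(w))[:d]])         # estimated directions
```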
I found the relationship between model (16) and ordinary least squares (OLS), discussed in Section 7.4, fascinating. It seems that the gains over OLS demonstrated in the simulations are due to imposing alternative structure on the inverse regression relationship. I am comforted by the idea that using OLS is not bad but rather that, if we can find an appropriate model with more structure, we can do better than OLS. Of course, this only applies when the sufficient reduction is one-dimensional. With more dimensions, we need our full range of regression tools to develop a relationship between the dependent variable and the reduced predictor variables.

I initially found Cook's discussion of standardization in Section 7.3 disturbing. I am not dogmatic about the need to standardize variables prior to finding principal components. When the measurements are all on similar scales, using the original scales seems reasonable to me, as when measuring the height, length and width of turtle shells in centimeters. On the other hand, if I measure length in kilometers and height and width in millimeters, the first principal component will essentially ignore the lengths, regardless of any role that length might play in prediction. I suspect that one point of Cook's discussion is that in a situation where you need to standardize the variables, there will be little reason to suppose that his models (2) or (5) are appropriate, which means there is little reason to use principal components. More generally, his point seems to be that standardization is necessary, but that a more complete standardization than merely rescaling the variables is needed.

As I indicated at the beginning of my discussion, my biggest problem with these procedures is that I do not have a good feel for when the models (2), (5), (10) and (13) will be appropriate. Multivariate linear model theory should allow us to use $\hat\Sigma_{\mathrm{res}}$ to test the assumption of Cook's models (2) and (5) that $\Sigma = \sigma^2 I$. I am less sure whether it will provide a test of whether $\Sigma = \sigma^2 Z Z' + \sigma_0^2 Z_0 Z_0'$ when $Z$ is unknown, but a generalized likelihood ratio test seems plausible. In any case, the procedures in Section 7.2 seem generally applicable.

REFERENCES

Christensen, R. (2001). Advanced Linear Modeling, 2nd ed. Springer, New York. MR1880721

Christensen, R. (2002). Plane Answers to Complex Questions: The Theory of Linear Models, 3rd ed. Springer, New York. MR1903460
