A note on Influence diagnostics in nonlinear mixed-effects elliptical models

This paper provides general matrix formulas for computing the score function, the (expected and observed) Fisher information and the $\Delta$ matrices (required for the assessment of local influence) for a quite general model which includes the one p…

Authors: Alexandre G. Patriota

Alexandre G. Patriota
Institute of Mathematics and Statistics, University of São Paulo, São Paulo/SP, 05508-090, Brazil

Abstract

This paper provides general matrix formulas for computing the score function, the (expected and observed) Fisher information and the $\Delta$ matrices (required for the assessment of local influence) for a quite general model which includes the one proposed by Russo et al. (2009). Additionally, we also present an expression for the generalized leverage. The matrix formulation has a considerable advantage: despite the complexity of the postulated model, all general formulas are compact and clear.

Key words: Elliptical models, Influence diagnostics, Matrix operations, Nonlinear models.

1. Main results

Recently, Russo et al. (2009) introduced an interesting nonlinear mixed model considering an elliptical distribution for the response variable. The authors also present a motivating example with a kinetics longitudinal data set which was first presented in Vonesh and Carter (1992) and previously analyzed under the assumption of normality. Russo et al. (2009) analyze this data set considering heavy-tailed distributions which may accommodate "large" observations. The authors compute the score function, the Fisher information and some influence measures, but some matrices are presented only through their $(r, s)$ element. The first author to compute expressions for the entries of the expected Fisher information in a multivariate elliptical distribution was, perhaps, Mitchell (1989). Other recent papers have adopted the same strategy, namely Savalli et al. (2006) and Osorio et al. (2007).
Since writing a matrix by entering it element by element is not efficient, we present a matrix version of these quantities (considering a more general model) which, besides being an aesthetic improvement, avoids that cumbersome task. Moreover, the compactness of the expressions might encourage other researchers to study more complex models. We also show matrix versions of some expectations of a variable with elliptical distribution that can be useful in a multivariate context.

* Corresponding author. Email addresses: patriota.alexandre@gmail.com and patriota@ime.usp.br (Alexandre G. Patriota). Preprint submitted to Elsevier, September 17, 2021.

The nonlinear model studied in Russo et al. (2009) is given by

$$ y_i = f(x_i, \alpha) + Z_i b_i + \epsilon_i, \quad i = 1, \ldots, n, \tag{1} $$

and, as defined by the authors, $f$ is an $m_i$-dimensional nonlinear function of $\alpha$, $x_i$ is a vector of covariates, $Z_i$ is a matrix of known constants, $\alpha$ is a $p \times 1$ vector of unknown parameters and $b_i$ is an $r \times 1$ vector of unobserved random regression coefficients, where $(y_i, b_i)$ follows an elliptical distribution such that

$$ \begin{pmatrix} y_i \\ b_i \end{pmatrix} \overset{ind}{\sim} El_{m_i + r}\left( \begin{pmatrix} f(x_i, \alpha) \\ 0 \end{pmatrix} ; \begin{pmatrix} Z_i D Z_i^\top + \sigma^2 I_{m_i} & Z_i D \\ D Z_i^\top & D \end{pmatrix} \right), $$

where $I_{m_i}$ is an $(m_i \times m_i)$ identity matrix. To avoid numerical integration, Russo et al. (2009) consider the marginal model, that is, $y_i \overset{ind}{\sim} El_{m_i}(f(x_i, \alpha); \Sigma_i)$, where $\Sigma_i = Z_i D Z_i^\top + \sigma^2 I_{m_i}$. The vector of parameters of interest is defined as $\theta = (\alpha^\top, \gamma^\top)^\top$, where $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_q)^\top$ is the vector of parameters involved in $\Sigma_i$ with, in this case, $\gamma_0 = \sigma^2$. In addition to the authors' suppositions, the functional form of $f(x_i, \alpha)$ must be known and twice continuously differentiable with respect to each element of $\alpha$.
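As a concrete illustration of the marginal covariance in model (1), the following minimal NumPy sketch builds $\Sigma_i = Z_i D Z_i^\top + \sigma^2 I_{m_i}$ for a single unit. The function name `marginal_covariance` and the toy values of `Z`, `D` and `sigma2` are assumptions made for illustration; they do not come from Russo et al. (2009).

```python
import numpy as np


def marginal_covariance(Z, D, sigma2):
    """Return Sigma_i = Z_i D Z_i^T + sigma^2 I for one unit (hypothetical helper)."""
    Z = np.asarray(Z, dtype=float)
    D = np.asarray(D, dtype=float)
    return Z @ D @ Z.T + sigma2 * np.eye(Z.shape[0])


# Toy design: random intercept and slope observed at three time points.
Z = np.column_stack([np.ones(3), np.array([0.0, 1.0, 2.0])])
D = np.array([[1.0, 0.2],
              [0.2, 0.5]])
Sigma = marginal_covariance(Z, D, sigma2=0.25)
```

Any other structure for $\Sigma_i$ would replace only the body of this helper; the rest of the computations depend on $\Sigma_i$ and its derivatives alone.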
In this paper, we consider the following model,

$$ y_i \overset{ind}{\sim} El_{m_i}\big(f(x_i, \alpha); \Sigma_i(w_i, \gamma)\big), \tag{2} $$

where $w_i$ and $x_i$ may have common components. The functional form of the covariance matrix $\Sigma_i(w_i, \gamma)$ is known and twice continuously differentiable with respect to each element of $\gamma$. Since $\theta$ must be identifiable in model (1), we suppose that the model fulfills this requirement. To see that model (1) is a special case of (2), take $w_i = Z_i$ and $\Sigma_i(Z_i, \gamma) = Z_i D Z_i^\top + \sigma^2 I_{m_i}$. As model (2) does not impose a specific structure on $\Sigma_i$, it can represent other multivariate models. For instance, model (1) can be generalized just by considering $R_i(z_i, \sigma^2)$ instead of $\sigma^2 I_{m_i}$, where $z_i$ is a vector of extra dispersion covariates. Then, in this context, we have that $\Sigma_i(w_i, \gamma) = Z_i D Z_i^\top + R_i(z_i, \sigma)$ and $\gamma = (\tau^\top, \sigma^\top)^\top$, where $w_i = (Z_i^\top, z_i^\top)^\top$, $\tau$ is a $q_1 \times 1$ vector of dispersion parameters involved in $D$ and $\sigma$ is a $q_2 \times 1$ vector of dispersion parameters associated with the model error term. We can go further and assign, for instance, a first-order autoregressive covariance matrix to the error terms, that is, $\Sigma_i(w_i, \gamma) = Z_i D Z_i^\top + \sigma^2 V(\rho)$, where $V_{rs}(\rho) = \rho^{|r-s|}/(1 - \rho)$; then $w_i = Z_i$, $q_2 = 2$ and $\gamma = (\tau, \sigma^2, \rho)^\top$. In general, $\Sigma_i(w_i, \gamma)$ may be any structured covariance matrix with the aforementioned properties. To keep the same notation, consider $\gamma = (\gamma_0, \ldots, \gamma_q)^\top$, i.e., $q_1 + q_2 = q + 1$; then the number of parameters is still $b = p + q + 1$ (here, $b$ is fixed and $b \ll n$).

Russo et al. (2009) show that the score functions considering model (1) are given by

$$ U_\alpha = \sum_{i=1}^n v_i J_i^\top \Sigma_i^{-1} r_i \quad \text{and} \quad U_{\gamma_j} = -\frac{1}{2} \sum_{i=1}^n \left\{ \mathrm{tr}\big(\Sigma_i^{-1} \dot{\Sigma}_{i(j)}\big) - v_i r_i^\top \Sigma_i^{-1} \dot{\Sigma}_{i(j)}^{-1} \Sigma_i^{-1} r_i \right\} $$

for $j = 0, \ldots, q$, where $v_i = -2 W_g(u_i)$, $u_i = r_i^\top \Sigma_i^{-1} r_i$, $r_i = y_i - f(x_i, \alpha)$, $J_i = \partial f(x_i, \alpha)/\partial \alpha^\top$, $\dot{\Sigma}_{i(j)} = \partial \Sigma_i/\partial \gamma_j$, $W_g(u_i) = d \log g(u_i)/du_i$ and $g(\cdot)$ is the density generator function with properties defined in Russo et al. (2009). Notice that the score function $U_{\gamma_j}$ as printed above has a typographical error: $\dot{\Sigma}_{i(j)}$ should not be inverted. The right form is given by

$$ U_{\gamma_j} = -\frac{1}{2} \sum_{i=1}^n \left\{ \mathrm{tr}\big(\Sigma_i^{-1} \dot{\Sigma}_{i(j)}\big) - v_i r_i^\top \Sigma_i^{-1} \dot{\Sigma}_{i(j)} \Sigma_i^{-1} r_i \right\} $$

for $j = 0, \ldots, q$. The authors also show that the expected Fisher information considering model (1) is given by

$$ K_{\theta\theta} = \begin{pmatrix} K_{\alpha\alpha} & 0 \\ 0 & K_{\gamma\gamma} \end{pmatrix}, \quad \text{where} \quad K_{\alpha\alpha} = \sum_{i=1}^n \frac{4 d_{gi}}{m_i} J_i^\top \Sigma_i^{-1} J_i, $$

and the $(r, s)$ element of $K_{\gamma\gamma}$ is given by

$$ K_{\gamma_r \gamma_s} = \sum_{i=1}^n \left\{ \frac{a_{rsi}}{4}(c_i - 1) + c_i \frac{1}{2} \mathrm{tr}\big(\Sigma_i^{-1} \dot{\Sigma}_{i(r)} \Sigma_i^{-1} \dot{\Sigma}_{i(s)}\big) \right\} $$

with $c_i = 4 f_{gi}/[m_i(m_i + 2)]$; the quantities $d_{gi}$, $f_{gi}$ and $a_{rsi}$ are defined in Russo et al. (2009). Note that the above score functions and Fisher information are essentially the same as those under model (2), but here the matrix $\Sigma_i$ does not have the specific structure considered in Russo et al. (2009); it is left in general form. Note also that the score function and Fisher information for $\gamma$ are written element by element.

This paper is organized as follows. Section 1.1 presents a matrix version of the score function and the (observed and expected) Fisher information, and shows an iterative re-weighted least squares algorithm to attain the maximum-likelihood estimate of $\theta$. Section 1.2 shows a matrix version of the $\Delta$ matrices presented by Russo et al. (2009), which are also applicable to model (2). Additionally, Section 1.3 presents an expression for the generalized leverage in model (2). We do not present an application in this paper, since it can be seen just as complementary material to Russo et al. (2009).

1.1. Matrix version for the score function and Fisher information

The following two matrix results will be used intensively in the computation of the expressions derived in this paper. Let $A$, $B$, $C$ and $D$ be $n \times n$ matrices, and write $A = (a_1, a_2, \ldots, a_n)$ and $C = (c_1, c_2, \ldots, c_n)$, where $a_i$ and $c_i$ are $n \times 1$ vectors; then

$$ \mathrm{tr}\{A^\top C D B^\top\} = \mathrm{vec}(A)^\top (B \otimes C)\, \mathrm{vec}(D) \quad \text{and} \quad A^\top B C = \{a_r^\top B c_s\}, \tag{3} $$

where $\mathrm{vec}(\cdot)$ is the vec operator, which transforms a matrix into a vector by stacking the columns of the matrix one underneath the other, and "$\otimes$" indicates the Kronecker product. These results and other methods in matrix differential calculus can be studied in Magnus and Neudecker (2007). Define the following quantities,

$$ F_i = \begin{pmatrix} J_i & 0 \\ 0 & V_i \end{pmatrix}, \quad H_i = \begin{pmatrix} \Sigma_i^{-1} & 0 \\ 0 & \frac{1}{2} \Sigma_i^{-1} \otimes \Sigma_i^{-1} \end{pmatrix}, \quad \dot{u}_i = \begin{pmatrix} v_i r_i \\ -\mathrm{vec}(\Sigma_i - v_i r_i r_i^\top) \end{pmatrix} $$

and

$$ V_i = \big( \mathrm{vec}(\dot{\Sigma}_{i(0)}), \ldots, \mathrm{vec}(\dot{\Sigma}_{i(q)}) \big), $$

where $F_i$ has rank $b$ (i.e., the functions $f$ and $\Sigma_i$ must be defined so that this condition holds). Then, by using (3) and after some algebra, the score function and the expected Fisher information, considering model (2), can be written, respectively, as

$$ U_\theta = \sum_{i=1}^n F_i^\top H_i \dot{u}_i \quad \text{and} \quad K_{\theta\theta} = \sum_{i=1}^n F_i^\top H_i O_i H_i F_i, \tag{4} $$

where

$$ O_i = c_i \begin{pmatrix} \dfrac{4 d_{gi}}{m_i c_i} \Sigma_i & 0 \\ 0 & 2 \Sigma_i \otimes \Sigma_i \end{pmatrix} + (c_i - 1) \begin{pmatrix} 0 & 0 \\ 0 & \mathrm{vec}(\Sigma_i) \mathrm{vec}(\Sigma_i)^\top \end{pmatrix}. $$

The Fisher information given in (4) can clearly be interpreted as a quadratic form which can be easily obtained through direct matrix operations. Thus, a joint iterative procedure for attaining the MLE of $\theta$ can be formulated as the following re-weighted least squares algorithm

$$ \hat{\theta}^{(m+1)} = \left( \sum_{i=1}^n F_i^{(m)\top} \tilde{H}_i^{(m)} F_i^{(m)} \right)^{-1} \left( \sum_{i=1}^n F_i^{(m)\top} \tilde{H}_i^{(m)} \tilde{u}_i^{(m)} \right), \quad m = 1, 2, \ldots \tag{5} $$

where the quantities with the superscript "$(m)$" are evaluated at $\hat{\theta}^{(m)}$, $\tilde{H}_i = H_i O_i H_i$, $\tilde{u}_i = H_i^{-1} O_i^{-1} \dot{u}_i + F_i \hat{\theta}$ and $m$ is the iteration counter. Under normality we have that $c_i = 1$, $O_i = H_i^{-1}$ and $v_i = 1$, and it is easy to see that this iterative procedure (under normality) is a special case of the one proposed in Patriota and Lemonte (2009).

Next, we provide a matrix formulation for the observed Fisher information, which requires harder matrix operations than those used for the expected Fisher information. The observed Fisher information presented in Russo et al. (2009), which is the same observed Fisher information considering model (2), is given by $-\ddot{L}_{\theta\theta} = -\sum_{i=1}^n \ddot{L}_{\theta\theta,i}$, with

$$ \ddot{L}_{\theta\theta,i} = \frac{\partial^2 L_i(\theta)}{\partial \theta \partial \theta^\top} = \begin{pmatrix} \ddot{L}_{\alpha\alpha,i} & \ddot{L}_{\alpha\gamma,i} \\ \ddot{L}_{\gamma\alpha,i} & \ddot{L}_{\gamma\gamma,i} \end{pmatrix}, $$

where

$$ \ddot{L}_{\alpha\alpha,i} = 2 J_i^\top \Sigma_i^{-1} \left\{ W_g(u_i) \Sigma_i + 2 W_g'(u_i) r_i r_i^\top \right\} \Sigma_i^{-1} J_i - 2 W_g(u_i) [I_p \otimes r_i^\top \Sigma_i^{-1}] D_i, $$

$$ \ddot{L}_{\alpha\gamma,i} = (\ddot{L}_{\alpha\gamma_0,i}, \ddot{L}_{\alpha\gamma_1,i}, \ldots, \ddot{L}_{\alpha\gamma_q,i}) \tag{6} $$

with

$$ \ddot{L}_{\alpha\gamma_j,i} = 2 J_i^\top \Sigma_i^{-1} \left\{ W_g(u_i) \Sigma_i + W_g'(u_i) r_i r_i^\top \right\} \Sigma_i^{-1} \dot{\Sigma}_{i(j)} \Sigma_i^{-1} r_i, $$

and the element $(j, k)$ of $\ddot{L}_{\gamma\gamma,i}$ has the form

$$ \frac{1}{2} \mathrm{tr}\left\{ \Sigma_i^{-1} \big( \dot{\Sigma}_{i(j)} \Sigma_i^{-1} \dot{\Sigma}_{i(k)} - \dot{\Sigma}_{i(jk)} \big) \right\} + r_i^\top \Sigma_i^{-1} \Big\{ W_g'(u_i) \dot{\Sigma}_{i(j)} \Sigma_i^{-1} r_i r_i^\top \Sigma_i^{-1} \dot{\Sigma}_{i(k)} - W_g(u_i) \dot{\Sigma}_{i(jk)} + W_g(u_i) \dot{\Sigma}_{i(j)} \Sigma_i^{-1} \dot{\Sigma}_{i(k)} + W_g(u_i) \dot{\Sigma}_{i(k)} \Sigma_i^{-1} \dot{\Sigma}_{i(j)} \Big\} \Sigma_i^{-1} r_i \tag{7} $$

with

$$ \dot{\Sigma}_{i(jk)} = \frac{\partial^2 \Sigma_i}{\partial \gamma_j \partial \gamma_k}, \quad D_i = \begin{pmatrix} a_{i(11)} & \cdots & a_{i(1p)} \\ \vdots & \ddots & \vdots \\ a_{i(p1)} & \cdots & a_{i(pp)} \end{pmatrix} \quad \text{and} \quad a_{i(rs)} = \frac{\partial^2 f}{\partial \alpha_r \partial \alpha_s}. $$

Note that quantities (6) and (7) are not written in matrix form. A compact matrix version of $\ddot{L}_{\theta\theta}$ is given in (8).
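The matrix forms in (4) translate almost line by line into array code. The sketch below assumes normality (so that $v_i = 1$, $c_i = 1$ and $O_i = H_i^{-1}$) and computes the contribution of one unit to $U_\theta$ and $K_{\theta\theta}$. The helper names and the toy linear model with $\Sigma_i = \gamma_0 I + \gamma_1 Z_i Z_i^\top$ are hypothetical choices for illustration only; a general elliptical model would also need $v_i$, $c_i$ and $d_{gi}$ for its density generator.

```python
import numpy as np


def vec(a):
    """Stack the columns of a matrix into one vector (column-major order)."""
    return np.asarray(a).reshape(-1, order="F")


def score_and_information(r, J, Sigma, dSigma):
    """Score U and expected information K for one unit, assuming normality.

    r      : residual y_i - f(x_i, alpha), shape (m,)
    J      : Jacobian of f with respect to alpha, shape (m, p)
    Sigma  : covariance matrix Sigma_i, shape (m, m)
    dSigma : list with the derivative of Sigma_i for each gamma_j
    """
    m, p = J.shape
    V = np.column_stack([vec(d) for d in dSigma])   # m^2 x (q + 1)
    Sinv = np.linalg.inv(Sigma)
    # F_i = blockdiag(J_i, V_i)
    F = np.block([[J, np.zeros((m, V.shape[1]))],
                  [np.zeros((m * m, p)), V]])
    # H_i = blockdiag(Sigma^{-1}, 0.5 * Sigma^{-1} (x) Sigma^{-1})
    H = np.block([[Sinv, np.zeros((m, m * m))],
                  [np.zeros((m * m, m)), 0.5 * np.kron(Sinv, Sinv)]])
    # With v_i = 1 the vector u_dot is (r, -vec(Sigma - r r^T)).
    u_dot = np.concatenate([r, -vec(Sigma - np.outer(r, r))])
    U = F.T @ H @ u_dot
    K = F.T @ H @ F          # H O H = H because O = H^{-1} under normality
    return U, K


# Toy data for a single unit (all values are made up for illustration).
rng = np.random.default_rng(0)
m, p = 4, 2
X = rng.normal(size=(m, p))          # f(x, alpha) = X alpha, so J = X
Z = rng.normal(size=(m, 1))
alpha = np.array([0.5, -1.0])
gamma = np.array([1.5, 0.7])         # (gamma_0, gamma_1)
y = rng.normal(size=m)
Sigma = gamma[0] * np.eye(m) + gamma[1] * (Z @ Z.T)
dSigma = [np.eye(m), Z @ Z.T]
U, K = score_and_information(y - X @ alpha, X, Sigma, dSigma)
```

Summing `U` and `K` over units and updating $\theta$ by adding $K_{\theta\theta}^{-1} U_\theta$ gives the same step that (5) expresses in re-weighted least squares form.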
In matrix form, the second derivative of the log-likelihood is

$$ \ddot{L}_{\theta\theta} = \sum_{i=1}^n \left\{ F_i^\top H_i \ddot{O}_i H_i F_i + \big[\dot{u}_i^\top H_i\big] \left[ \frac{\partial F_i}{\partial \theta} \right] \right\}, \tag{8} $$

where

$$ \ddot{O}_i = 2 W_g(u_i) \begin{pmatrix} \Sigma_i & 2 \Sigma_i \otimes r_i^\top \\ 2 \Sigma_i \otimes r_i & 2\big(\Sigma_i \otimes (r_i r_i^\top) + (r_i r_i^\top) \otimes \Sigma_i\big) \end{pmatrix} + 2 \begin{pmatrix} 0 & 0 \\ 0 & \Sigma_i \otimes \Sigma_i \end{pmatrix} + 4 W_g'(u_i) \begin{pmatrix} r_i r_i^\top & (r_i r_i^\top) \otimes r_i^\top \\ (r_i r_i^\top) \otimes r_i & \mathrm{vec}(r_i r_i^\top) \mathrm{vec}(r_i r_i^\top)^\top \end{pmatrix}, $$

$\partial F_i/\partial \theta$ is an $m_i(m_i + 1) \times b \times b$ array, and $[\dot{u}_i^\top H_i][\partial F_i/\partial \theta]$ is the bracket product of $\dot{u}_i^\top H_i$ and $\partial F_i/\partial \theta$ (for further details see Wei, 1998, p. 188).

In what follows, we present some matrix results on elliptical variables. Here, $r_i \overset{ind}{\sim} El_{m_i}(0, \Sigma_i)$; then, adapting the results of Mitchell (1989) to a matrix version, we have that

a) $E(r_i v_i) = 0$,
b) $E(r_i r_i^\top v_i) = \Sigma_i$,
c) $E(r_i r_i^\top v_i^2) = \dfrac{4 d_{gi}}{m_i} \Sigma_i$,
d) $E(\mathrm{vec}(r_i r_i^\top) r_i^\top v_i^2) = 0$,
e) $E(\mathrm{vec}(r_i r_i^\top) \mathrm{vec}(r_i r_i^\top)^\top v_i^2) = c_i \big( \mathrm{vec}(\Sigma_i) \mathrm{vec}(\Sigma_i)^\top + \Sigma_i \otimes \Sigma_i + P_i (\Sigma_i \otimes \Sigma_i) \big)$,

where $P_i$ is a commutation matrix such that $\mathrm{vec}(A) = P_i \mathrm{vec}(A^\top)$ for any matrix $A$ with appropriate dimensions. Therefore, as we are considering a function $g(\cdot)$ with regular properties (differentiation and integration are interchangeable), we have that $E(\dot{u}_i) = 0$ and $E(-\ddot{L}_{\theta\theta}) = K_{\theta\theta}$.

1.2. Matrix version for $\Delta$

The diagnostic technique developed in Cook (1986) is a widespread tool to check model assumptions and conduct diagnostic studies. The author proposes to look at the likelihood displacement $LD(\omega) = 2\{L(\hat{\theta}) - L(\hat{\theta}_\omega)\}$ to find possibly influential observations in the MLEs, where $L(\theta) = \sum_i L_i(\theta)$ is the log-likelihood function and $\omega$ is an $s \times 1$ vector of perturbations restricted to an open set $\Omega \subset \mathbb{R}^s$. A vector of no perturbation $\omega_0 \in \Omega$ is also defined, for which $LD(\omega_0) = 0$, i.e., $L(\theta \mid \omega_0) = L(\theta)$.
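To make the likelihood displacement concrete, here is a toy sketch under case-weight perturbation, where the perturbed log-likelihood is $\sum_i \omega_i L_i(\theta)$. It assumes the simplest possible model, $y_i \sim N(\theta, 1)$ with scalar $\theta$, which is far simpler than model (2) and is chosen only because both maximizers are available in closed form. The function name and the data are hypothetical.

```python
import numpy as np


def likelihood_displacement(y, w):
    """LD(w) = 2 {L(theta_hat) - L(theta_hat_w)} for y_i ~ N(theta, 1).

    theta_hat maximizes the unperturbed log-likelihood and theta_hat_w
    maximizes the case-weighted one, sum_i w_i L_i(theta).
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)

    def loglik(theta):
        # Unperturbed log-likelihood, up to an additive constant.
        return -0.5 * np.sum((y - theta) ** 2)

    theta_hat = y.mean()                      # unweighted MLE
    theta_hat_w = np.sum(w * y) / np.sum(w)   # weighted MLE
    return 2.0 * (loglik(theta_hat) - loglik(theta_hat_w))


y = np.array([1.0, 2.0, 3.0, 10.0])
ld_none = likelihood_displacement(y, np.ones(4))             # no perturbation
ld_drop = likelihood_displacement(y, [1.0, 1.0, 1.0, 0.0])   # last case removed
```

With all weights equal to one the displacement is zero, matching $LD(\omega_0) = 0$; giving zero weight to the outlying last observation produces a clearly positive displacement.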
In his seminal paper, Cook shows that the normal curvature at the unit direction $\ell$ has the following form

$$ C_\ell(\theta) = 2 \left| \ell^\top \Delta^\top (\ddot{L}_{\theta\theta})^{-1} \Delta \ell \right|, $$

where $\Delta = \partial^2 L(\theta \mid \omega)/\partial \theta \partial \omega^\top$, and both $\Delta$ and $\ddot{L}_{\theta\theta}$ are evaluated at $\theta = \hat{\theta}$ and $\omega = \omega_0$. Thus, $C_{d_{\max}}$ is twice the largest eigenvalue of $B = -\Delta^\top \ddot{L}_{\theta\theta}^{-1} \Delta$ and $d_{\max}$ is the corresponding eigenvector. The index plot of $d_{\max}$ may reveal how to perturb the model (or data) to obtain large changes in the estimate of $\theta$. For more detailed information, we refer the reader to the work of Russo et al. (2009) and the references therein.

Note that, by using the quantities defined above, we can write the $b \times n$ matrix $\Delta$ under the case-weight perturbation (i.e., $L_i(\theta \mid \omega) = \omega_i L_i(\theta)$) and under the scale perturbation (i.e., the perturbed log-likelihood function $L_i(\theta \mid \omega)$ is built by replacing $\Sigma_i$ with $\omega_i^{-1} \Sigma_i$ in $L_i(\theta)$), respectively, as

$$ \Delta = \big( \hat{F}_1^\top \hat{H}_1 \hat{\dot{u}}_1, \ldots, \hat{F}_n^\top \hat{H}_n \hat{\dot{u}}_n \big) \quad \text{and} \quad \Delta = \big( \hat{F}_1^\top \hat{H}_1 \hat{\dot{v}}_1, \ldots, \hat{F}_n^\top \hat{H}_n \hat{\dot{v}}_n \big), \tag{9} $$

where the quantities with a hat are evaluated at $\hat{\theta}$ and

$$ \dot{v}_i = -2 \big( W_g(u_i) + u_i W_g'(u_i) \big) \begin{pmatrix} r_i \\ \mathrm{vec}(r_i r_i^\top) \end{pmatrix}. $$

In Russo et al. (2009), the $\Delta$ matrix under the case-weight perturbation is presented with the same typo as the score function. Finally, the $b \times N$ matrix $\Delta$ under the response perturbation (i.e., the perturbed log-likelihood function $L_i(\theta \mid \omega)$ is built by replacing $y_i$ with $y_i + \omega_i$ in $L_i(\theta)$) becomes

$$ \Delta = \big( \hat{F}_1^\top \hat{H}_1 \hat{G}_1, \ldots, \hat{F}_n^\top \hat{H}_n \hat{G}_n \big), \tag{10} $$

where $N = \sum_{i=1}^n m_i$ and

$$ G_i = -2 \begin{pmatrix} W_g(u_i) I_{m_i} + 2 W_g'(u_i) r_i r_i^\top \Sigma_i^{-1} \\ 2 r_i \otimes \big( W_g(u_i) I_{m_i} + W_g'(u_i) r_i r_i^\top \Sigma_i^{-1} \big) \end{pmatrix}. $$

Note that formulas (9) and (10) are easily handled by any statistical software.

1.3. Generalized leverage

In this section, we compute the generalized leverage proposed by Wei et al. (1998). Let $y = \mathrm{vec}(y_1, \ldots, y_n)$ and $\mu(\alpha) = \mathrm{vec}(f(x_1, \alpha), \ldots, f(x_n, \alpha))$. The authors have shown that the generalized leverage is obtained by evaluating the $N \times N$ matrix

$$ GL(\theta) = D_\theta (-\ddot{L}_{\theta\theta})^{-1} \ddot{L}_{\theta Y} $$

at $\theta = \hat{\theta}$, where $D_\theta = \partial \mu(\alpha)/\partial \theta^\top$ and $\ddot{L}_{\theta Y} = \partial^2 \ell(\theta)/\partial \theta \partial Y^\top$. The main idea behind the concept of leverage is that of evaluating the influence of $Y_i$ on its own predicted value. As noted by the authors, the generalized leverage is invariant under reparameterizations, and observations with large $GL_{ii}$ are leverage points. Under the model defined in (2), we have that

$$ D_\theta = \begin{pmatrix} J_1 & 0 \\ J_2 & 0 \\ \vdots & \vdots \\ J_n & 0 \end{pmatrix} \quad \text{and} \quad \ddot{L}_{\theta Y} = \big( F_1^\top H_1 G_1, \ldots, F_n^\top H_n G_n \big). $$

Index plots of $GL_{ii}$ may reveal those observations with high influence on their own predicted values.

It is worth emphasizing that other models are special cases of the formulas derived in this paper. One just has to define $f(x_i, \alpha)$ and $\Sigma_i(w_i, \gamma)$ and find their derivatives. That is, the score vector and the (expected and observed) Fisher information, as well as the curvatures and the generalized leverage (when available), of several works are special cases of the proposed matrix formulation (to mention just a few of them, see for instance Paula et al., 2003; Savalli et al., 2006; Osorio et al., 2007; Paula et al., 2009; Russo et al., 2009).

2. Conclusion

In this short communication, we presented a matrix formulation of the score function, the (expected and observed) Fisher information, the generalized leverage and the $\Delta$ matrices under case-weight, scale and response perturbations for a very general elliptical model which includes the nonlinear mixed-effects elliptical model proposed in Russo et al. (2009).
The general expressions derived in this paper can be applied to many other models and have advantages for numerical purposes because they require only simple operations on matrices and vectors.

Acknowledgments

I gratefully acknowledge grants from FAPESP.

References

Cook, D. (1986). Assessment of local influence. Journal of the Royal Statistical Society, Series B, 48(2), 133–169.

Magnus, J. R. and Neudecker, H. (2007). Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, Chichester, 3rd edition.

Mitchell (1989). The information matrix, skewness tensor and a-connections for the general multivariate elliptic distribution. Annals of the Institute of Statistics and Mathematics, 41(2), 289–304.

Osorio, F., Paula, G. A. and Galea, M. (2007). Assessment of local influence in elliptical linear models with longitudinal structure. Computational Statistics and Data Analysis, 51, 4354–4368.

Patriota, A. G. and Lemonte, A. J. (2009). Bias correction in a multivariate normal regression model with general parameterization. Statistics & Probability Letters, 79(15), 1655–1662.

Paula, G. A., Cysneiros, F. J. A. and Galea, M. (2003). Local influence and leverage in elliptical nonlinear regression models. In: Proceedings of the 18th International Workshop on Statistical Modelling, Verbeke, G., Molenberghs, G., Aerts, A. and Fieuws, S. (Eds). Leuven: Katholieke Universiteit Leuven, 361–365.

Paula, G. A., Medeiros, M. and Vilca-Labra, F. E. (2009). Influence diagnostics for linear models with first-order autoregressive elliptical errors. Statistics & Probability Letters, 79(3), 339–346.

Russo, C. M., Paula, G. A. and Aoki, R. (2009). Influence diagnostics in nonlinear mixed-effects elliptical models. Computational Statistics and Data Analysis, doi:10.1016/j.csda.2009.05.004.

Savalli, C., Paula, G. A. and Cysneiros, F. J. A. (2006). Assessment of variance components in elliptical linear mixed models. Statistical Modelling (England), 6(1), 59–76.

Vonesh, E. F. and Carter, R. L. (1992). Mixed-effects nonlinear regression for unbalanced repeated measures. Biometrics, 48, 1–17.

Wei, B.-C. (1998). Exponential Family Nonlinear Models. Singapore: Springer.

Wei, B.-C., Hu, Y.-Q. and Fung, W.-K. (1998). Generalized leverage and its applications. Scandinavian Journal of Statistics, 25, 25–37.
