Estimation of linear autoregressive models with Markov-switching, the E.M. algorithm revisited
This work concerns the estimation of linear autoregressive models with Markov-switching using the expectation-maximisation (E.M.) algorithm. Our method generalises the method introduced by Elliott for general hidden Markov models and avoids the use of backward recursions.
Authors: Joseph Rynkiewicz (SAMOS/MATISSE, University of Paris 1, 90 rue de Tolbiac, Paris, France, rynkiewi@univ-paris1.fr)
October 30, 2018

Keywords: Maximum likelihood estimation, Expectation-Maximisation algorithm, Hidden Markov models, Switching models.

1 Introduction

In the present paper we consider an extension of the basic hidden Markov model (HMM). Let $(X_t, Y_t)_{t\in\mathbb{Z}}$ be the process such that:

1. $(X_t)_{t\in\mathbb{Z}}$ is a Markov chain on a finite state space $E = \{e_1, \dots, e_N\}$, which can be identified without loss of generality with the simplex of $\mathbb{R}^N$, where $e_i$ is the unit vector in $\mathbb{R}^N$ with unity as the $i$th element and zeros elsewhere.

2. Given $(X_t)_{t\in\mathbb{Z}}$, the process $(Y_t)_{t\in\mathbb{Z}}$ is a sequence of linear autoregressive models in $\mathbb{R}$, and the distribution of $Y_n$ depends only on $X_n$ and $Y_{n-1}, \dots, Y_{n-p}$. Hence, for a fixed $t$, the dynamics of the model are
$$Y_{t+1} = F_{X_{t+1}}(Y_{t-p+1}^t) + \sigma_{X_{t+1}}\,\varepsilon_{t+1},$$
with $F_{X_{t+1}} \in \{F_{e_1}, \dots, F_{e_N}\}$ linear functions, $\sigma_{X_{t+1}} \in \{\sigma_{e_1}, \dots, \sigma_{e_N}\}$ strictly positive numbers, and $(\varepsilon_t)_{t\in\mathbb{N}^*}$ an i.i.d. sequence of $\mathcal{N}(0,1)$ Gaussian random variables.

Definition 1. Write $\mathcal{F}_t = \sigma\{X_0, \dots, X_t\}$ for the $\sigma$-field generated by $X_0, \dots, X_t$; $\mathcal{Y}_t = \sigma\{Y_0, \dots, Y_t\}$ for the $\sigma$-field generated by $Y_0, \dots, Y_t$; and $\mathcal{G}_t = \sigma\{(X_0, Y_0), \dots, (X_t, Y_t)\}$ for the $\sigma$-field generated by $X_0, \dots, X_t$ and $Y_0, \dots, Y_t$.

The Markov property implies here that $P(X_{t+1} = e_i \mid \mathcal{F}_t) = P(X_{t+1} = e_i \mid X_t)$. Write $a_{ij} = P(X_{t+1} = e_i \mid X_t = e_j)$ and $A = (a_{ij}) \in \mathbb{R}^{N\times N}$, and define
$$V_{t+1} := X_{t+1} - E[X_{t+1} \mid \mathcal{F}_t] = X_{t+1} - AX_t.$$
With the previous notations, we obtain the general equations of the model, for $t \in \mathbb{N}$:
$$X_{t+1} = AX_t + V_{t+1}, \qquad Y_{t+1} = F_{X_{t+1}}(Y_{t-p+1}^t) + \sigma_{X_{t+1}}\,\varepsilon_{t+1}. \quad (1)$$

The parameters of the model are the transition probabilities of the matrix $A$, the coefficients of the linear functions $F_{e_i}$ and the standard deviations $\sigma_{e_i}$. A successful method for estimating such a model is to compute the maximum likelihood estimator (the likelihood being computed conditionally on the first $p$ observations) with the E.M. algorithm introduced by Dempster, Laird and Rubin (1977). Generally, this algorithm requires the computation of the conditional expectation of the hidden states given the observations (the E-step), which can be done with the Baum-Welch forward-backward algorithm (see Baum et al. (1970)). The derivation of the M-step of the E.M. algorithm is then immediate, since the optimal parameters of the regression functions are obtained by weighted linear regression. However, we show here that these two steps can be embedded into one: at each iteration of the E.M. algorithm we compute directly the optimal coefficients of the regression functions, as well as the variances and the transition matrix, thanks to a generalisation of the method introduced by Elliott (1994).
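To make the model concrete, the following minimal sketch simulates equation (1). It is an illustration only: the function name, the two-regime parameter values and the initialisation of the first $p$ observations are our own hypothetical choices, not taken from the paper.

```python
import numpy as np

def simulate_ms_ar(A, theta, sigma, T, rng=None):
    """Simulate the Markov-switching AR(p) model of equation (1).

    A     : (N, N) transition matrix, A[i, j] = P(X_{t+1} = e_i | X_t = e_j)
    theta : (N, p+1) regression coefficients per regime, (a_0, ..., a_p)
    sigma : (N,) noise standard deviations per regime
    """
    rng = np.random.default_rng() if rng is None else rng
    N, p = theta.shape[0], theta.shape[1] - 1
    x = rng.integers(N)                    # initial hidden state (index of e_i)
    y = list(rng.standard_normal(p))       # arbitrary first p observations
    states = []
    for _ in range(T):
        x = rng.choice(N, p=A[:, x])       # columns of A sum to one
        psi = np.r_[1.0, y[-1:-p-1:-1]]    # (1, y_t, ..., y_{t-p+1})
        y.append(theta[x] @ psi + sigma[x] * rng.standard_normal())
        states.append(x)
    return np.array(y[p:]), np.array(states)

# Two-regime AR(1) example with hypothetical parameters
A = np.array([[0.95, 0.10],
              [0.05, 0.90]])
theta = np.array([[0.0, 0.5],     # regime 1: Y_{t+1} = 0.5 Y_t + eps
                  [1.0, -0.3]])   # regime 2: Y_{t+1} = 1 - 0.3 Y_t + 0.5 eps
sigma = np.array([1.0, 0.5])
y, states = simulate_ms_ar(A, theta, sigma, T=500, rng=np.random.default_rng(0))
```

Columns of $A$ sum to one here because $a_{ij} = P(X_{t+1} = e_i \mid X_t = e_j)$ indexes the destination state by row.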
2 Change of measure

The fundamental technique employed throughout this paper is the discrete-time change of measure. Write $\sigma$ for the vector $(\sigma_{e_1}, \dots, \sigma_{e_N})$, $\phi(\cdot)$ for the density of $\mathcal{N}(0,1)$ and $\langle\cdot,\cdot\rangle$ for the inner product in $\mathbb{R}^N$. We wish to introduce a new probability measure $\bar{P}$, using a density $\Lambda$, so that $\frac{d\bar{P}}{dP} = \Lambda$ and, under $\bar{P}$, the random variables $Y_t$ are $\mathcal{N}(0,1)$ i.i.d. random variables. Define
$$\lambda_l = \frac{\langle\sigma, X_{l-1}\rangle\,\phi(y_l)}{\phi(\varepsilon_l)}, \quad l \in \mathbb{N}^*, \qquad \Lambda_0 = 1, \qquad \Lambda_t = \prod_{l=1}^t \lambda_l,$$
and construct the new probability measure $\bar{P}$ by setting the restriction of the Radon-Nikodym derivative to $\mathcal{G}_t$ equal to $\Lambda_t$. The following lemma is then a straightforward adaptation of Lemma 4.1 of Elliott (1994) (see the annexe).

Lemma 1. Under $\bar{P}$ the $Y_t$ are $\mathcal{N}(0,1)$ i.i.d. random variables.

Conversely, suppose we start with a probability measure $\bar{P}$ such that under $\bar{P}$:

1. $(X_t)_{t\in\mathbb{N}}$ is a Markov chain with transition matrix $A$;
2. $(Y_t)_{t\in\mathbb{N}}$ is a sequence of $\mathcal{N}(0,1)$ i.i.d. random variables.

We construct a new probability measure $P$ such that under $P$ we have $Y_{t+1} = F_{X_t}(Y_{t-p+1}^t) + \sigma_{X_t}\varepsilon_{t+1}$. To construct $P$ from $\bar{P}$, we introduce $\bar{\lambda}_l := (\lambda_l)^{-1}$ and $\bar{\Lambda}_t := (\Lambda_t)^{-1}$ and define $P$ by putting $\frac{dP}{d\bar{P}}\big|_{\mathcal{G}_t} = \bar{\Lambda}_t$.

Definition 2. Let $(H_t)_{t\in\mathbb{N}}$ be a sequence adapted to $(\mathcal{G}_t)$. We shall write
$$\gamma_t(H_t) = \bar{E}\big[\bar{\Lambda}_t H_t \mid \mathcal{Y}_t\big] \qquad \text{and} \qquad \Gamma_i(Y_{t+1}) = \frac{\phi\Big(\frac{Y_{t+1} - F_{e_i}(Y_{t-p+1}^t)}{\langle\sigma, e_i\rangle}\Big)}{\langle\sigma, e_i\rangle\,\phi(Y_{t+1})}.$$

The proof of the following theorem is a detailed adaptation of the proof of Theorem 5.3 of Elliott (1994) (see the annexe).

Theorem 1. Suppose $H_t$ is a scalar $\mathcal{G}$-adapted process of the form: $H_0$ is $\mathcal{F}_0$-measurable and, for $t \ge 0$,
$$H_{t+1} = H_t + \alpha_{t+1} + \langle\beta_{t+1}, V_{t+1}\rangle + \delta_{t+1} f(Y_{t+1}),$$
where $V_{t+1} = X_{t+1} - AX_t$, $f$ is a scalar-valued function and $\alpha$, $\beta$, $\delta$ are $\mathcal{G}$-predictable processes ($\beta$ being an $N$-dimensional vector process). Then
$$\gamma_{t+1,t+1}(H_{t+1}) := \gamma_{t+1}(H_{t+1}X_{t+1}) = \sum_{i=1}^N \Big[ \big\langle\gamma_t(H_t X_t), \Gamma_i(y_{t+1})\big\rangle\, a_i + \big\langle\gamma_t(\alpha_{t+1} X_t), \Gamma_i(y_{t+1})\big\rangle\, a_i + \big\langle\gamma_t(\delta_{t+1} X_t), \Gamma_i(y_{t+1})\big\rangle\, f(y_{t+1})\, a_i + \big(\mathrm{diag}(a_i) - a_i a_i^T\big)\big\langle\gamma_t(\beta_{t+1} X_t), \Gamma_i(y_{t+1})\big\rangle \Big], \quad (2)$$
where $a_i := Ae_i$ is the $i$th column of $A$, $a_i^T$ is the transpose of $a_i$ and $\mathrm{diag}(a_i)$ is the matrix with the vector $a_i$ on its diagonal and zeros elsewhere.

We will now consider special cases of the process $H$. In all cases, we calculate the quantity $\gamma_{t,t}(H_t)$ and deduce $\gamma_t(H_t)$ by summing the components of $\gamma_{t,t}(H_t)$. Then the conditional Bayes theorem gives the conditional expectation of $H_t$:
$$\hat{H}_t := E[H_t \mid \mathcal{Y}_t] = \frac{\gamma_t(H_t)}{\gamma_t(1)}.$$

3 Application to the Expectation step (E-step) of the E.M. algorithm

We will use the previous theorem to compute the conditional quantities needed by the E.M. algorithm. Let
$$J_t^{rs} = \sum_{l=1}^t \langle X_{l-1}, e_r\rangle\langle X_l, e_s\rangle$$
be the number of jumps from state $e_r$ to state $e_s$ up to time $t$. We obtain
$$\gamma_{t+1,t+1}(J_{t+1}^{rs}) = \sum_{i=1}^N \big\langle\gamma_{t,t}(J_t^{rs}), \Gamma_i(Y_{t+1})\big\rangle\, a_i + \big\langle\gamma_t(X_t), \Gamma_r(Y_{t+1})\big\rangle\, a_{sr}\, e_s. \quad (3)$$

Write now $O_t^r = \sum_{n=1}^{t+1} \langle X_n, e_r\rangle$ for the number of times, up to $t$, that $X$ occupies the state $e_r$. We obtain
$$\gamma_{t+1,t+1}(O_{t+1}^r) = \sum_{i=1}^N \big\langle\gamma_{t,t}(O_t^r), \Gamma_i(Y_{t+1})\big\rangle\, a_i + \big\langle\gamma_t(X_t), \Gamma_r(Y_{t+1})\big\rangle\, a_r. \quad (4)$$
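To fix ideas, here is a minimal sketch of one step of these forward-only recursions, under our reading that $\langle\gamma_t(\cdot\,X_t), \Gamma_i(Y_{t+1})\rangle$ weights the $i$th component by $\Gamma_i(Y_{t+1})$; the helper name `filter_step` and the array layouts are ours, not the paper's.

```python
import numpy as np
from scipy.stats import norm

def filter_step(A, theta, sigma, y_hist, y_next, gX, gJ, gO):
    """One forward update of the unnormalised filters, after (2)-(4).

    gX : (N,)      gamma_t(X_t)
    gJ : (N, N, N) gJ[r, s] = gamma_{t,t}(J^{rs}_t) as an N-vector
    gO : (N, N)    gO[r]    = gamma_{t,t}(O^r_t)   as an N-vector
    y_hist : array (y_{t-p+1}, ..., y_t) of the last p observations
    """
    N = A.shape[0]
    psi = np.r_[1.0, y_hist[::-1]]                 # (1, y_t, ..., y_{t-p+1})
    # Gamma_i(y_{t+1}) for every state i, from Definition 2
    G = norm.pdf((y_next - theta @ psi) / sigma) / (sigma * norm.pdf(y_next))
    gX_new = A @ (G * gX)                          # state filter
    gJ_new = np.einsum('ij,rsj,j->rsi', A, gJ, G)  # propagation term of (3)
    gO_new = np.einsum('ij,rj,j->ri', A, gO, G)    # propagation term of (4)
    for r in range(N):
        gO_new[r] += gX[r] * G[r] * A[:, r]            # correction term of (4)
        for s in range(N):
            gJ_new[r, s, s] += gX[r] * G[r] * A[s, r]  # correction term of (3)
    # In practice, rescale everything (e.g. by gX_new.sum()) to avoid underflow.
    return gX_new, gJ_new, gO_new
```

The conditional expectations $\hat{H}_t$ are then obtained by summing the components of each filtered vector and dividing by $\gamma_t(1)$, as described above.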
For the regression functions, the M-step of the E.M. algorithm is achieved by finding the parameters minimising the weighted sum of squares
$$\sum_{t=1}^n \gamma_i(t)\Big(y_t - \big(a_0^i + a_1^i y_{t-1} + \cdots + a_p^i y_{t-p}\big)\Big)^2,$$
where $\gamma_i(t)$ is the conditional expectation of the hidden state indicator $\langle X_t, e_i\rangle$ at time $t$ given the observations $y_{-p+1}, \dots, y_n$. Write $\psi^T(t) = (1, y_{t-1}, \dots, y_{t-p})$ and $\theta_i = (a_0^i, \dots, a_p^i)$, and suppose that the matrix $\sum_{t=1}^n \gamma_i(t)\psi(t)\psi^T(t)$ is invertible. The estimator $\hat{\theta}_i(n)$ of $\theta_i$ is given by
$$\hat{\theta}_i(n) = \Big[\sum_{t=1}^n \gamma_i(t)\psi(t)\psi^T(t)\Big]^{-1} \sum_{t=1}^n \gamma_i(t)\psi(t)\,Y_t.$$

Hence, in order to compute $\hat{\theta}_i(n)$, we need to estimate the conditional expectation of the following processes:

1. $TA_{t+1}^r(j) = \sum_{l=1}^{t+1} \langle X_l, e_r\rangle\, Y_{l-j}\, Y_{l+1}$ for $-1 \le j \le p$ and $1 \le r \le N$.
2. $TB_{t+1}^r(i,j) = \sum_{l=1}^{t+1} \langle X_l, e_r\rangle\, Y_{l-j}\, Y_{l-i}$ for $0 \le i, j \le p$ and $1 \le r \le N$.
3. $TC_{t+1}^r = \sum_{l=1}^{t+1} \langle X_l, e_r\rangle\, Y_{l+1}$.
4. $TD_{t+1}^r(j) = \sum_{l=1}^{t+1} \langle X_l, e_r\rangle\, Y_{l-j}$ for $0 \le j \le p$ and $1 \le r \le N$.

Applying Theorem 1 with $H_{t+1}(j) = TA_{t+1}^r(j)$, $H_0 = 0$, $\alpha_{t+1} = 0$, $\beta_{t+1} = 0$, and either $\delta_{t+1} = \langle X_t, e_r\rangle Y_{t-j}$ with $f(Y_{t+1}) = Y_{t+1}$ if $j \ne -1$, or $\delta_{t+1} = \langle X_t, e_r\rangle$ with $f(Y_{t+1}) = Y_{t+1}^2$ if $j = -1$, gives
$$\gamma_{t+1,t+1}\big(TA_{t+1}^r(j)\big) = \sum_{i=1}^N \big\langle\gamma_{t,t}(TA_t^r(j)), \Gamma_i(Y_{t+1})\big\rangle\, a_i + \big\langle\gamma_t(X_t), \Gamma_r(Y_{t+1})\big\rangle\, Y_{t-j}\, Y_{t+1}\, a_r, \quad (5)$$
where $a_r$ is the $r$th column of $A$.

Then, applying Theorem 1 with $H_{t+1}(i,j) = TB_{t+1}^r(i,j)$, $H_0 = 0$, $\alpha_{t+1} = 0$, $\beta_{t+1} = 0$, $\delta_{t+1} = \langle X_t, e_r\rangle Y_{t-j} Y_{t-i}$ and $f(Y_{t+1}) = 1$ gives
$$\gamma_{t+1,t+1}\big(TB_{t+1}^r(i,j)\big) = \sum_{k=1}^N \big\langle\gamma_{t,t}(TB_t^r(i,j)), \Gamma_k(Y_{t+1})\big\rangle\, a_k + \big\langle\gamma_t(X_t), \Gamma_r(Y_{t+1})\big\rangle\, Y_{t-j}\, Y_{t-i}\, a_r. \quad (6)$$

Next, applying Theorem 1 with $H_{t+1} = TC_{t+1}^r$, $H_0 = 0$, $\alpha_{t+1} = 0$, $\beta_{t+1} = 0$, $\delta_{t+1} = \langle X_t, e_r\rangle$ and $f(Y_{t+1}) = Y_{t+1}$ gives
$$\gamma_{t+1,t+1}\big(TC_{t+1}^r\big) = \sum_{i=1}^N \big\langle\gamma_{t,t}(TC_t^r), \Gamma_i(Y_{t+1})\big\rangle\, a_i + \big\langle\gamma_t(X_t), \Gamma_r(Y_{t+1})\big\rangle\, Y_{t+1}\, a_r. \quad (7)$$

Finally, applying Theorem 1 with $H_{t+1}(j) = TD_{t+1}^r(j)$, $H_0 = 0$, $\alpha_{t+1} = 0$, $\beta_{t+1} = 0$, $\delta_{t+1} = \langle X_t, e_r\rangle Y_{t-j}$ and $f(Y_{t+1}) = 1$ gives
$$\gamma_{t+1,t+1}\big(TD_{t+1}^r(j)\big) = \sum_{i=1}^N \big\langle\gamma_{t,t}(TD_t^r(j)), \Gamma_i(Y_{t+1})\big\rangle\, a_i + \big\langle\gamma_t(X_t), \Gamma_r(Y_{t+1})\big\rangle\, Y_{t-j}\, a_r. \quad (8)$$

The "Maximisation" pass of the E.M. algorithm is now achieved by updating the parameters in the following way.

Parameters of the transition matrix. The parameters of the transition matrix are updated with the formula
$$\hat{a}_{sr} = \frac{\gamma_T(J_T^{sr})}{\gamma_T(O_T^r)}. \quad (9)$$

Parameters of the regression functions. For $1 \le r \le N$, let $R^r := (R_{ij}^r)_{1\le i,j\le p+1}$ be the symmetric matrix with $R_{11}^r = \hat{O}^r$, $R_{1j}^r = R_{j1}^r = \hat{TD}^r(j-2)$ for $2 \le j \le p+1$, and $R_{ij}^r = \hat{TB}^r(i-2, j-2)$ for $2 \le i, j \le p+1$, and let $C^r = \big(\hat{TC}^r, (\hat{TA}^r(j))_{0\le j\le p-1}\big)$. We can then compute the updated parameter $\hat{\theta}_r$ of the regression function $F_{e_r}$ with the formula
$$\hat{\theta}_r = (R^r)^{-1} C^r. \quad (10)$$

Parameters of the variances. Finally, thanks to the previous conditional expectations, we can directly calculate the parameters $\hat{\sigma}_1, \dots, \hat{\sigma}_N$, since for $1 \le r \le N$ the conditional expectation of the mean square error of the $r$th model is
$$\hat{\sigma}_r^2 = \frac{1}{\hat{O}^r}\Big(\hat{TA}^r(-1) + \hat{\theta}_r^T R^r \hat{\theta}_r - 2\,\hat{\theta}_r^T C^r\Big). \quad (11)$$
This completes the M-step of the E.M. algorithm.
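Assembling these expectations, one E.M. parameter update can be sketched as below. This is a plausible implementation of (9)-(11) under our own indexing conventions (in particular the lag ranges for $TA$, $TB$ and $TD$ are chosen so that the normal-equation system is $(p+1)$-dimensional); it is an illustration, not the paper's code.

```python
import numpy as np

def m_step(J_hat, O_hat, TA, TB, TC, TD, p):
    """One M-step from the conditional expectations, after (9)-(11).

    J_hat : (N, N)          J_hat[s, r] = expected jumps e_r -> e_s
    O_hat : (N,)            expected occupation time of each state
    TA    : (N, p+2)        TA[r, j+1] = E[TA^r(j)], j = -1, ..., p
    TB    : (N, p+1, p+1)   TB[r, i, j] = E[TB^r(i, j)]
    TC    : (N,)            E[TC^r]
    TD    : (N, p+1)        TD[r, j] = E[TD^r(j)], j = 0, ..., p
    """
    N = O_hat.shape[0]
    A_new = J_hat / O_hat[None, :]            # (9): a_sr = J^{sr} / O^r
    theta = np.empty((N, p + 1))
    sigma2 = np.empty(N)
    for r in range(N):
        R = np.empty((p + 1, p + 1))          # weighted normal-equation matrix
        R[0, 0] = O_hat[r]
        R[0, 1:] = R[1:, 0] = TD[r, :p]       # first row/column: TD terms
        R[1:, 1:] = TB[r, :p, :p]             # inner block: TB terms
        C = np.r_[TC[r], TA[r, 1:p + 1]]      # right-hand side of (10)
        theta[r] = np.linalg.solve(R, C)      # (10)
        sigma2[r] = (TA[r, 0] + theta[r] @ R @ theta[r]
                     - 2 * theta[r] @ C) / O_hat[r]   # (11); TA[r,0] = TA^r(-1)
    return A_new, theta, np.sqrt(sigma2)
```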
4 Conclusion

Using the discrete Girsanov measure transform, we propose a new way to apply the E.M. algorithm in the case of Markov-switching linear autoregressions. Note that, contrary to the Baum-Welch algorithm, we use no backward recursion, although the computational cost increases slightly, since the number of operations is multiplied by $N^2$, where $N$ is the number of hidden states of the Markov chain.

References

Baum, L.E., Petrie, T., Soules, G. and Weiss, N. (1970). A maximization technique occurring in the statistical estimation of probabilistic functions of Markov processes. Annals of Mathematical Statistics, 41(1):164-171.

Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the E.M. algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38.

Elliott, R.J. (1994). Exact adaptive filters for Markov chains observed in Gaussian noise. Automatica, 30(9):1399-1408.

Annexe

Proof of Lemma 1

Lemma 2 (Lemma 1 restated). Under $\bar{P}$ the $Y_t$ are $\mathcal{N}(0,1)$ i.i.d. random variables.

Proof. The proof is based on the conditional Bayes theorem and is a simple rewriting of the proof of Elliott. We have
$$\bar{P}(Y_{t+1} \le \tau \mid \mathcal{G}_t) = \bar{E}\big[\mathbf{1}_{\{Y_{t+1}\le\tau\}} \mid \mathcal{G}_t\big].$$
Thanks to the conditional Bayes theorem,
$$\bar{E}\big[\mathbf{1}_{\{Y_{t+1}\le\tau\}} \mid \mathcal{G}_t\big] = \frac{E\big[\Lambda_{t+1}\mathbf{1}_{\{Y_{t+1}\le\tau\}} \mid \mathcal{G}_t\big]}{E[\Lambda_{t+1} \mid \mathcal{G}_t]} = \frac{\Lambda_t}{\Lambda_t} \times \frac{E\big[\lambda_{t+1}\mathbf{1}_{\{Y_{t+1}\le\tau\}} \mid \mathcal{G}_t\big]}{E[\lambda_{t+1} \mid \mathcal{G}_t]}.$$
Now
$$E[\lambda_{t+1} \mid \mathcal{G}_t] = \int_{-\infty}^{\infty} \frac{\langle\sigma, X_t\rangle\,\phi(Y_{t+1})}{\phi(\varepsilon_{t+1})}\,\phi(\varepsilon_{t+1})\,d\varepsilon_{t+1} = \int_{-\infty}^{\infty} \langle\sigma, X_t\rangle\,\phi\big(F_{X_t}(Y_{t-p+1}^t) + \langle\sigma, X_t\rangle\,\varepsilon_{t+1}\big)\,d\varepsilon_{t+1} = 1,$$
and since $\varepsilon_{t+1} = \frac{Y_{t+1} - F_{X_t}(Y_{t-p+1}^t)}{\langle\sigma, X_t\rangle}$:
$$\bar{P}(Y_{t+1}\le\tau \mid \mathcal{G}_t) = E\big[\lambda_{t+1}\mathbf{1}_{\{Y_{t+1}\le\tau\}} \mid \mathcal{G}_t\big] = \int_{-\infty}^{\infty} \frac{\langle\sigma, X_t\rangle\,\phi(Y_{t+1})}{\phi(\varepsilon_{t+1})}\,\mathbf{1}_{\{Y_{t+1}\le\tau\}}\,\phi(\varepsilon_{t+1})\,d\varepsilon_{t+1} = \int_{-\infty}^{\tau} \phi(y_{t+1})\,dy_{t+1} = \bar{P}(Y_{t+1}\le\tau). \;\square$$
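The key identity of this proof, $E[\lambda_{t+1} \mid \mathcal{G}_t] = 1$, is easy to check numerically. The sketch below uses hypothetical values for $\sigma_{e_r}$ and $F_{X_t}(Y_{t-p+1}^t)$; we take $\sigma_{e_r} > 1/\sqrt{2}$ so that $\lambda_{t+1}$ has finite variance and the Monte Carlo average is stable.

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo check (ours, for illustration) that E[lambda_{t+1} | G_t] = 1:
# fix X_t = e_r and the past observations, draw eps ~ N(0,1) under P, and
# average lambda = <sigma, X_t> phi(Y_{t+1}) / phi(eps),
# where Y_{t+1} = F_{X_t}(Y^t_{t-p+1}) + sigma_r * eps.
rng = np.random.default_rng(0)
sigma_r, f_val = 0.9, 1.3          # hypothetical sigma_{e_r} and F value
eps = rng.standard_normal(1_000_000)
y_next = f_val + sigma_r * eps
lam = sigma_r * norm.pdf(y_next) / norm.pdf(eps)
print(lam.mean())                  # ~ 1.0: Lambda_t is a P-martingale density
```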
Proof of Theorem 1

Theorem 2 (Theorem 1 restated). Suppose $H_t$ is a scalar $\mathcal{G}$-adapted process of the form: $H_0$ is $\mathcal{F}_0$-measurable and, for $t \ge 0$,
$$H_{t+1} = H_t + \alpha_{t+1} + \langle\beta_{t+1}, V_{t+1}\rangle + \delta_{t+1}f(Y_{t+1}),$$
where $V_{t+1} = X_{t+1} - AX_t$, $f$ is a scalar-valued function and $\alpha$, $\beta$, $\delta$ are $\mathcal{G}$-predictable processes ($\beta$ being an $N$-dimensional vector process). Then
$$\gamma_{t+1,t+1}(H_{t+1}) = \sum_{i=1}^N \Big[\big\langle\gamma_t(H_tX_t), \Gamma_i(y_{t+1})\big\rangle\, a_i + \big\langle\gamma_t(\alpha_{t+1}X_t), \Gamma_i(y_{t+1})\big\rangle\, a_i + \big\langle\gamma_t(\delta_{t+1}X_t), \Gamma_i(y_{t+1})\big\rangle\, f(y_{t+1})\, a_i + \big(\mathrm{diag}(a_i) - a_ia_i^T\big)\big\langle\gamma_t(\beta_{t+1}X_t), \Gamma_i(y_{t+1})\big\rangle\Big], \quad (12)$$
where $a_i := Ae_i$, $a_i^T$ is the transpose of $a_i$ and $\mathrm{diag}(a_i)$ is the matrix with the vector $a_i$ on its diagonal and zeros elsewhere.

Proof. Here again it is only a rewriting of the proof of Elliott. We begin with the two following results.

Result 1.
$$\bar{E}[V_{t+1} \mid \mathcal{Y}_{t+1}] = \bar{E}\big[\bar{E}[V_{t+1} \mid \mathcal{G}_t, Y_{t+1}] \mid \mathcal{Y}_{t+1}\big] = \bar{E}\big[\bar{E}[V_{t+1} \mid \mathcal{G}_t] \mid \mathcal{Y}_{t+1}\big] = 0. \quad (13)$$

Result 2.
$$X_{t+1}X_{t+1}^T = AX_t(AX_t)^T + AX_tV_{t+1}^T + V_{t+1}(AX_t)^T + V_{t+1}V_{t+1}^T.$$
Since $X_{t+1}$ is of the form $(0, \dots, 0, 1, 0, \dots, 0)^T$, we have
$$X_{t+1}X_{t+1}^T = \mathrm{diag}(X_{t+1}) = \mathrm{diag}(AX_t) + \mathrm{diag}(V_{t+1}),$$
so, using $AX_t(AX_t)^T = A\,\mathrm{diag}(X_t)\,A^T$,
$$V_{t+1}V_{t+1}^T = \mathrm{diag}(AX_t) + \mathrm{diag}(V_{t+1}) - A\,\mathrm{diag}(X_t)\,A^T - AX_tV_{t+1}^T - V_{t+1}(AX_t)^T.$$
Finally we obtain
$$\langle V_{t+1}\rangle := E[V_{t+1}V_{t+1}^T \mid \mathcal{F}_t] = E[V_{t+1}V_{t+1}^T \mid X_t] = \mathrm{diag}(AX_t) - A\,\mathrm{diag}(X_t)\,A^T. \quad (14)$$

Main proof. We have
$$\gamma_{t+1,t+1}(H_{t+1}) = \bar{E}\big[\bar{\Lambda}_{t+1}H_{t+1}X_{t+1} \mid \mathcal{Y}_{t+1}\big] = \bar{E}\big[(AX_t + V_{t+1})\big(H_t + \alpha_{t+1} + \langle\beta_{t+1}, V_{t+1}\rangle + \delta_{t+1}f(y_{t+1})\big)\bar{\Lambda}_{t+1} \mid \mathcal{Y}_{t+1}\big].$$
Thanks to equation (13),
$$\gamma_{t+1,t+1}(H_{t+1}) = \bar{E}\big[\big((H_t + \alpha_{t+1} + \delta_{t+1}f(y_{t+1}))AX_t + \langle\beta_{t+1}, V_{t+1}\rangle V_{t+1}\big)\bar{\Lambda}_{t+1} \mid \mathcal{Y}_{t+1}\big],$$
so
$$\gamma_{t+1,t+1}(H_{t+1}) = \sum_{j=1}^N \bar{E}\big[(H_t + \alpha_{t+1} + \delta_{t+1}f(y_{t+1}))\langle AX_t, e_j\rangle e_j\,\bar{\Lambda}_{t+1} \mid \mathcal{Y}_{t+1}\big] + \bar{E}\big[\langle\beta_{t+1}, V_{t+1}\rangle V_{t+1}\,\bar{\Lambda}_{t+1} \mid \mathcal{Y}_{t+1}\big],$$
hence
$$\gamma_{t+1,t+1}(H_{t+1}) = \sum_{j=1}^N\sum_{i=1}^N \bar{E}\big[(H_t + \alpha_{t+1} + \delta_{t+1}f(y_{t+1}))\langle X_t, e_i\rangle\,\bar{\Lambda}_{t+1} \mid \mathcal{Y}_{t+1}\big]\,a_{ji}\,e_j + \bar{E}\big[\langle\beta_{t+1}, V_{t+1}\rangle V_{t+1}\,\bar{\Lambda}_{t+1} \mid \mathcal{Y}_{t+1}\big].$$
Writing $a_i = Ae_i$, so that $\sum_{j=1}^N a_{ji}e_j = a_i$, this becomes
$$\gamma_{t+1,t+1}(H_{t+1}) = \sum_{i=1}^N \bar{E}\big[(H_t + \alpha_{t+1} + \delta_{t+1}f(y_{t+1}))\langle X_t, e_i\rangle\,\bar{\Lambda}_{t+1} \mid \mathcal{Y}_{t+1}\big]\,a_i + \bar{E}\big[\langle\beta_{t+1}, V_{t+1}\rangle V_{t+1}\,\bar{\Lambda}_{t+1} \mid \mathcal{Y}_{t+1}\big].$$
Since, for a process $H_t$ adapted to the sigma-algebra $\mathcal{G}_t$,
$$\bar{E}\big[\bar{\Lambda}_{t+1}H_t \mid \mathcal{Y}_{t+1}\big] = \sum_{i=1}^N \big\langle\gamma_t(H_tX_t), \Gamma_i(y_{t+1})\big\rangle,$$
we have, for all $e_r \in E$,
$$\bar{E}\big[\bar{\Lambda}_{t+1}H_t\langle X_t, e_r\rangle \mid \mathcal{Y}_{t+1}\big] = \sum_{i=1}^N \big\langle\gamma_t(H_tX_t\langle X_t, e_r\rangle), \Gamma_i(y_{t+1})\big\rangle = \sum_{i=1}^N \big\langle\gamma_t(H_tX_tX_t^Te_r), \Gamma_i(y_{t+1})\big\rangle.$$
But we also have
$$\gamma_t(H_tX_tX_t^T) = \sum_{i=1}^N \big\langle\gamma_t(H_tX_t), e_i\big\rangle\, e_ie_i^T,$$
so
$$\bar{E}\big[\bar{\Lambda}_{t+1}H_t\langle X_t, e_r\rangle \mid \mathcal{Y}_{t+1}\big] = \sum_{i=1}^N \big\langle\gamma_t(H_tX_tX_t^Te_r), \Gamma_i(y_{t+1})\big\rangle = \big\langle\gamma_t(H_tX_t), \Gamma_r(y_{t+1})\big\rangle.$$
Since $\alpha$, $\beta$, $\delta$ are $\mathcal{G}$-predictable and $f(y_{t+1})$ is measurable with respect to $\mathcal{Y}_{t+1}$, the result (14) yields the conclusion. $\square$