A constructive proof of the existence of Viterbi processes


Authors: Jüri Lember, Alexey Koloydenko

SUBMITTED TO THE IEEE TRANSACTIONS ON INFORMATION THEORY

Abstract—Since the early days of digital communication, hidden Markov models (HMMs) have been routinely used in speech recognition, natural language processing, image analysis, and bioinformatics. In an HMM $(X_i, Y_i)_{i \ge 1}$, the observations $X_1, X_2, \ldots$ are assumed to be conditionally independent given an "explanatory" Markov process $Y_1, Y_2, \ldots$, which is itself unobserved; moreover, the conditional distribution of $X_i$ depends solely on $Y_i$. Central to the theory and applications of HMMs is the Viterbi algorithm, which finds a maximum a posteriori (MAP) estimate $q_{1:n} = (q_1, q_2, \ldots, q_n)$ of $Y_{1:n}$ given observed data $x_{1:n}$. Maximum a posteriori paths are also known as Viterbi paths or alignments. Recently, attempts have been made to study the behavior of Viterbi alignments as $n \to \infty$. Thus, it has been shown that in some special cases a well-defined limiting Viterbi alignment exists. While innovative, these attempts have relied on rather strong assumptions and involved proofs which are existential. This work proves the existence of infinite Viterbi alignments in a more constructive manner and for a very general class of HMMs.

Index Terms—Asymptotics, HMM, maximum a posteriori path, Viterbi algorithm, Viterbi extraction, Viterbi training.

I. INTRODUCTION

Let $Y = (Y_i)_{i \ge 1}$ be a Markov chain with state space $S = \{1, \ldots, K\}$, $K > 1$, and transition matrix $\mathbb{P} = (p_{ij})_{i,j \in S}$. Suppose that $Y$ is irreducible and aperiodic, so that a unique stationary distribution $\pi = \pi \mathbb{P}$ exists; suppose further that $Y_i \sim \pi$ from time $i = 1$. To every state $l \in S$, let us assign an emission distribution $P_l$ on $(\mathcal{X}, \mathcal{B})$, where $\mathcal{X} = \mathbb{R}^D$, the $D$-dimensional Euclidean space.
Let $f_l$ be the density of $P_l$ with respect to a suitable reference measure $\lambda$ on $(\mathcal{X}, \mathcal{B})$. Most commonly, $\lambda$ is either the Lebesgue measure (continuously distributed $X_i$) or the counting measure (discretely distributed $X_i$).

Definition 1.1: The stochastic process $(X, Y)$ is a hidden Markov model if there is a (measurable) function $h$ such that for each $n$, $X_n = h(Y_n, e_n)$, where $e_1, e_2, \ldots$ are i.i.d. and independent of $Y$.

Hence, the emission distribution $P_l$ is the distribution of $h(l, e_n)$. The distribution of $X$ is completely determined by $\mathbb{P}$ and the emission distributions $P_l$, $l \in S$. It can be shown that $X$ is also ergodic [1], [2], [3].

Let $x_{1:n} = (x_1, \ldots, x_n)$ and $y_{1:n} = (y_1, \ldots, y_n)$ be fixed observed and unobserved realizations, respectively, of the HMM $(X_i, Y_i)_{i \ge 1}$ up to time $n$.

J. Lember is with the Institute of Mathematical Statistics, Tartu University, J. Liivi 2-507, 50409, Estonia. E-mail: jyri@ut.ee
A. Koloydenko is with the Division of Statistics of Nottingham University, University Park, Nottingham, NG7 2RD, UK. E-mail: alexey.koloydenko@nottingham.ac.uk
Manuscript received April 8, 2008; revised ??, 2008.

Treating $y_{1:n}$ as parameters to be estimated, let $\Lambda(q_{1:n}; x_{1:n})$ be the likelihood function $P(Y_{1:n} = q_{1:n}) \prod_{i=1}^n f_{q_i}(x_i; \theta_{q_i})$ of $q_{1:n}$, and let $\mathcal{V}(x_{1:n})$ be the set of the maximum-likelihood estimates $v(x_{1:n}) \in S^n$ of $y_{1:n}$. The elements of $\mathcal{V}(x_{1:n})$ are called (Viterbi) alignments and are commonly computed by the Viterbi algorithm [4], [5]. If $P(Y_{1:n} = q_{1:n})$ is thought of as the prior distribution of $Y_{1:n}$, then the $v(x_{1:n})$'s also maximize the probability mass function of the posterior distribution of $Y$, hence the term maximum a posteriori (MAP) paths.
Besides their direct significance for the prediction of $Y$ from $X$, Viterbi alignments, or MAP paths, are also central to the theory and applications of HMMs [6] in the more general setting where parameters of the emission distributions $P_l$ and of the transition probabilities $p_{ij}$, $i,j \in S$, are also unknown and of interest. Therefore, the asymptotic behavior of Viterbi alignments is also crucial for inference on the unknown parameters [6], [7].

To appreciate that the question of extending $v(x_{1:n})$ ad infinitum is not a trivial one, even if the problem of non-uniqueness of $v(x_{1:n})$ is disregarded, suffice it to say that an additional observation $x_{n+1}$ can in principle change the entire alignment based on $x_{1:n}$, i.e. $v(x_{1:n})$ and $v(x_{1:n+1})_{1:n}$ can disagree significantly, if not fully. Fortunately, the situation is not hopeless, and in this paper we prove that in most HMMs alignments can be consistently extended piecewise. Specifically, motifs of (contiguous) observations $z_{1:b}$, called barriers, are observed with positive probability, forcing Viterbi alignments based on extended observations $(x_{1:n}, z_{1:b}, x_{n+b+1:n+b+r})$, $n \ge 0$, $r \ge 1$, to stabilize as follows. Roughly, $v(x_{1:n} z_{1:b} x_{n+b+1:n+b+r})_{1:n} = v(x_{1:n})$ for all $x_{1:n}$ and all extensions $x_{n+b+1:n+b+r}$. To be more precise, a particular state $l \in S$ and an element $b_k$ of the barrier $b$, called a node, can be found such that, regardless of the observations before and after $b$, the alignment has to go through $l$ at time $u = n + k$. The optimality principle then ensures the stabilization $v(x_{1:n} z_{1:b} x_{n+b+1:n+b+r})_{1:u} = v(x_{1:u})$ and, in particular, $v_u = l$.

Suppose now that $x_{1:n}$ contains several barriers with nodes occurring at times $u_1 < \cdots < u_m \le n$.
Then the Viterbi alignment $v(x_{1:n})$ can be constructed piecewise as follows. Let $v(x_{1:n}) = (v^1, v^2, \ldots, v^m, v^{m+1})$, where $v^1$ is the alignment based on $x_{1:u_1}$ and ending in $l$, and let $v^i$, for $i = 2, 3, \ldots, m+1$, be the conditional alignment based on $x_{u_{i-1}:u_i}$ given that $Y_{u_{i-1}} = l$; note that the alignments $v^i$, $i = 2, 3, \ldots, m$, also end in $l$. Now, if a new observation $x_{n+1}$ is added, then the last segment $v^{m+1}$ can change, but the segments $v^1, \ldots, v^m$ remain intact.

Suppose now that a realization $x_{1:\infty}$ contains infinitely many barriers, and hence also infinitely many nodes. Then the (piecewise) infinite alignment $v(x_{1:\infty})$ is defined naturally as the infinite succession of the segments $v^1, v^2, \ldots$. In this paper, we prove that for some fixed integer $M > 0$, the probability that the finite random process $X_{1:M}$ generates a barrier is positive. Since $X$ is ergodic, almost every realization $x_{1:\infty}$ has infinitely many barriers and, therefore, the infinite piecewise alignment is well-defined. Evidently, the piecewise alignment gives rise to a decoding process $v : \mathcal{X}^\infty \mapsto S^\infty$ via $V_{1:\infty} = v(X_{1:\infty})$, which we shall call the Viterbi alignment process. The construction ensures that $V$ is regenerative and ergodic. Note also how this piecewise construction naturally calls for a buffered on-line implementation in which the memory used to store $x_{u_{i-1}:u_i}$ can be released once $v^i$ has been computed.

A. Previous related work and contribution of this work

The problem of constructing infinite Viterbi processes was brought to the attention of the IEEE Information Theory community fairly recently by [8] and [9].
Although the piecewise structure of Viterbi alignments was already acknowledged in [10], to the best of our knowledge, the subject was first seriously considered in [8], [9]. In these latter works, the existence of infinite alignments was proved for certain special cases, such as $K = 2$ and Markov chains observed in additive white Gaussian noise. In particular, in these cases the authors of [8], [9] proved the existence of 'meeting times' and 'meeting states', which are a special (stronger) type of node. While innovative, the main result of [8] (Theorem 2) makes several restrictive assumptions and is proved in an existential manner, which prevents its extension beyond the $K = 2$ case.

Independently of these works, [11], [7], [12] have developed a more general theory that includes the problem of estimating the unknown parameters ($\theta_i$ and $p_{ij}$, $i,j \in S$). Namely, the focus of this theory has been the Viterbi training (VT), or extraction, algorithm [13]. Competing with EM-based procedures, this algorithm provides computationally and intuitively appealing estimates which, on the other hand, are biased, even in the limit as $n \to \infty$. In order to reduce this bias, the adjusted Viterbi training (VA) has been introduced in [11], [7], [12]. Naturally, VA relies on the existence of infinite alignments and their ergodic properties. Although the general theory has been presented in [12], [7], some of the main results of the theory (Lemmas 3.1 and 3.2 of [7]) have appeared without proof due to limitations of scope and size. This paper slightly refines these results and, most importantly, presents their complete proofs. Whereas these results are formulated for general HMMs ($K \ge 2$), [14] has most recently considered in full detail the special case $K = 2$, generalizing similar results of [8], [9].
Specifically, it has been proved in [14] that infinitely many barriers (and hence the infinite Viterbi alignment) exist for any aperiodic and irreducible 2-state HMM. Thus, the results presented here generalize those of [14] and [8], [9] to $K \ge 2$. It turns out that this generalization is far from straightforward and requires more advanced analysis and tools. Furthermore, as we show below, when $K > 2$, not every aperiodic and irreducible HMM has infinitely many nodes, undermining the piecewise construction of infinite alignments for those models. The disappearance of nodes is due to the fact that an aperiodic and irreducible Markov chain can have zeros in its transition matrix. If this possibility is excluded, as is the case in [8], [9], the 'meeting times' and 'meeting states' of [8], [9] are sufficient to prove the existence of infinite Viterbi alignments for many HMMs used in practice. In their recent communication with us, the authors of [8], [9] have corrected those statements in their above works where the strict positivity of the transition matrix is implicitly assumed but formally omitted (see [7] for details). At the same time, in order to accommodate zeros in the transition matrix, [7] introduced a more general notion of nodes, effectively removing the limitations of the notion of 'meeting times' and 'meeting states'. However, the price for this generalization has been rather high, due to the interfering issue of non-uniqueness of (finite) Viterbi alignments. For a detailed treatment of the piecewise construction of the infinite alignment and process in general HMMs, and of the role of the infinite Viterbi process in the adjusted Viterbi training theory, we refer to the state-of-the-art article [7].

B. Organization of the rest of the paper

In §II we briefly outline the construction of the infinite alignments (§II-B) based on [7]. This includes the definitions of nodes (§II-A) and barriers (§II-C). Next, §III states our main results, which first appeared in [7] and guarantee the existence of the alignment process $V$. In §III-B, we give a counterexample to explain the necessity of our technical assumptions. In §IV, we present a complete and detailed proof of our main results. This is followed in §V by a brief discussion of the significance of the presented results.

II. CONSTRUCTION

A. Nodes

First, consider the scores

$$\delta_u(l) \overset{\text{def}}{=} \max_{q \in S^{u-1}} \Lambda\bigl((q, l); x_{1:u}\bigr). \qquad (1)$$

Thus, $\delta_u(l)$ is the maximum of the likelihood over the paths terminating at time $u$ in state $l$. Note that $\delta_1(l) = \pi_l f_l(x_1)$, and the recursion

$$\delta_{u+1}(j) = \max_{l \in S} \bigl(\delta_u(l)\, p_{lj}\bigr) f_j(x_{u+1}), \quad \forall u \ge 1,\ \forall j \in S,$$

helps to verify that $\mathcal{V}(x_{1:n})$, the set of all the Viterbi alignments, can be written as follows:

$$\mathcal{V}(x_{1:n}) = \{v \in S^n : \forall i \in S,\ \delta_n(v_n) \ge \delta_n(i) \text{ and } \forall u : 1 \le u < n,\ v_u \in t(u, v_{u+1})\},$$

where, $\forall u \ge 1$ and $\forall j \in S$,

$$t(u, j) \overset{\text{def}}{=} \{l \in S : \forall i \in S,\ \delta_u(l)\, p_{lj} \ge \delta_u(i)\, p_{ij}\}. \qquad (2)$$

Next, we introduce $p^{(r)}_{ij}(u)$, the maximum of the likelihood realized along the paths connecting states $i$ and $j$ at times $u$ and $u + r$, respectively. Thus, $p^{(0)}_{ij}(u) \overset{\text{def}}{=} p_{ij}$, and $\forall u \ge 1$ and $\forall r \ge 1$, let

$$p^{(r)}_{ij}(u) \overset{\text{def}}{=} \max_{q_{1:r} \in S^r} p_{iq_1} f_{q_1}(x_{u+1})\, p_{q_1 q_2} f_{q_2}(x_{u+2})\, p_{q_2 q_3} \cdots p_{q_{r-1} q_r} f_{q_r}(x_{u+r})\, p_{q_r j}. \qquad (3)$$

Note also that

$$\delta_{u+1}(j) = \max_{i \in S} \left(\delta_{u-r}(i)\, p^{(r)}_{ij}(u - r)\right) f_j(x_{u+1}) \quad \forall r < u,$$
$$p^{(r)}_{ij}(u) = \max_{q \in S} p^{(r-1)}_{iq}(u) f_q(x_{u+r})\, p_{qj}. \qquad (4)$$

Definition 2.1: Let $0 \le r < n$, $u \le n - r$, and let $l \in S$.
Given $x_{1:u+r}$, the first $u + r$ observations, $x_u$ is said to be an $l$-node of order $r$ if

$$\delta_u(l)\, p^{(r)}_{lj}(u) \ge \delta_u(i)\, p^{(r)}_{ij}(u) \quad \forall i, j \in S. \qquad (5)$$

Also, $x_u$ is said to be a node of order $r$ if it is an $l$-node of order $r$ for some $l \in S$; $x_u$ is said to be a strong node of order $r$ if the inequalities in (5) are strict for every $i, j \in S$, $i \ne l$.¹

Let $x_{1:n}$ be such that $x_{u_i}$ is an $l_i$-node of order $r$, $1 \le i \le k$, for some $k < n$, and assume $u_k + r < n$ and $u_{i+1} > u_i + r$ for all $i = 1, 2, \ldots, k - 1$. Such nodes are said to be separated.

B. Piecewise alignment

Suppose $x_{1:n}$ is such that for some $u_i$, $r_i$, $i = 1, 2, \ldots, k$, with $u_1 + r_1 < u_2 + r_2 < \cdots < u_k + r_k < n$, $x_{u_i}$ is an $l_i$-node of order $r_i$. It then follows easily from the definition of a node that there exists a Viterbi alignment $v(x_{1:n}) \in \mathcal{V}(x_{1:n})$ that goes through $l_i$ at $u_i$ (i.e. $v_{u_i} = l_i$) for each $i = 1, 2, \ldots, k$ (see [7]). It is not difficult to verify that such a $v(x_{1:n})$ can actually be computed as follows. Obtain $v^1$, a path that is optimal among all those that end at $u_1$ in $l_1$. (Note that unless the order of the node $x_{u_1}$ is 0, $v^1$ need not be in $\mathcal{V}(x_{1:u_1})$.) Given $x_{u_1+1:u_2}$, continue by taking $v^2$ to be a maximum-likelihood path from $l_1$ to $l_2$; that is, $v^2$ maximizes the constrained likelihood under the initial distribution $(p_{l_1 \cdot})$ and the constraint $v^2_{u_2 - u_1} = l_2$. Now, $(v^1, v^2)$ maximizes the likelihood given $x_{1:u_2}$ over all paths ending in $l_2$. Similarly, we define the pieces $v^3, \ldots, v^k$. Finally, $v^{k+1}$ is chosen to maximize the (unconstrained) likelihood given $x_{u_k+1:n}$ under the initial distribution $(p_{l_k \cdot})$.

The separated-nodes assumption $u_{i+1} > u_i + r$, $1 \le i < k$, is not restrictive at all, since it is always possible to choose from any infinite sequence of nodes an infinite subsequence of separated ones.
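Both the $\delta$-recursion of §II-A and the node condition (5) are directly computable. Below is a minimal Python sketch (our own naming and 0-based state indices, not the paper's; raw likelihoods rather than log-likelihoods, so it is only suitable for short sequences): `viterbi` implements recursion (1)–(2) with backtracking, and `is_l_node` checks the inequalities (5) given the scores $\delta_u(\cdot)$ and the matrix $p^{(r)}(u)$.

```python
import numpy as np

def viterbi(pi, P, densities, x):
    """MAP path by the Viterbi algorithm: the forward pass computes the
    scores delta_u(l) of eq. (1) via the recursion; backtracking follows
    the sets t(u, j) of eq. (2), breaking ties by lowest state index."""
    K, n = len(pi), len(x)
    f = np.array([[densities[l](xi) for l in range(K)] for xi in x])  # f_l(x_u)
    delta = np.zeros((n, K))
    back = np.zeros((n, K), dtype=int)
    delta[0] = pi * f[0]                      # delta_1(l) = pi_l f_l(x_1)
    for u in range(1, n):
        scores = delta[u - 1][:, None] * P    # entry (i, j): delta_u(i) p_ij
        back[u] = scores.argmax(axis=0)       # an element of t(u, j)
        delta[u] = scores.max(axis=0) * f[u]  # delta_{u+1}(j)
    path = [int(delta[-1].argmax())]
    for u in range(n - 1, 0, -1):
        path.append(int(back[u][path[-1]]))
    return path[::-1], delta

def is_l_node(delta_u, p_r, l, strong=False):
    """Condition (5): delta_u(l) p^(r)_{lj}(u) >= delta_u(i) p^(r)_{ij}(u)
    for all i, j; 'strong' requires strict inequality for every i != l."""
    scores = delta_u[:, None] * p_r           # entry (i, j): delta_u(i) p^(r)_{ij}(u)
    if strong:
        others = np.delete(scores, l, axis=0)
        return bool(np.all(scores[l] > others.max(axis=0)))
    return bool(np.all(scores[l] >= scores.max(axis=0)))
```

Note that `viterbi` fixes one particular tie-breaking rule (lowest index via `argmax`), which need not coincide with the proper tie-breaking discussed in §II-B.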
The reason for this requirement has to do with the non-uniqueness of alignments, and is as follows. The fact that $x_{u_i}$ is an $r$th-order $l_i$-node guarantees that, when backtracking from $u_i + r$ down to $u_i$, ties (if any) can be broken in such a way that, regardless of the values of $x_{u_i+r+1:n}$ and of how ties are broken between $n$ and $u_i + r$, the alignment goes through $l_i$ at $u_i$. At the same time, the segment $u_i, \ldots, u_i + r$ is 'delicate': unless $x_{u_i}$ is a strong node, breaking the ties arbitrarily within $u_i, \ldots, u_i + r$ can result in $v_{u_i} \ne l_i$. Hence, when neither $x_{u_i}$ nor $x_{u_{i+1}}$ is strong and $u_{i+1} \le u_i + r$, breaking the ties in favor of $x_{u_i}$ can result in $v_{u_{i+1}} \ne l_{i+1}$. Clearly, such a pathological situation is impossible if $r = 0$, and might also be rare in practice even for $r > 0$.

¹Note that if $x_u$ is a node of order $r$, it is then also a node of any order higher than $r$. Hence, the order of a node is defined to be the minimum such $r$.

To formalize the piecewise construction, let

$$\mathcal{W}^l(x_{1:n}) \overset{\text{def}}{=} \{v \in S^n : v_n = l,\ \Lambda(v; x_{1:n}) \ge \Lambda(w; x_{1:n})\ \forall w \in S^n : w_n = l\},$$
$$\mathcal{V}^l(x_{1:n}) \overset{\text{def}}{=} \{v \in \mathcal{V}(x_{1:n}) : v_n = l\}$$

be the set of maximizers of the constrained likelihood, and the subset of maximizers of the (unconstrained) likelihood, respectively, all elements of which go through $l$ at $n$. Note that, unlike $\mathcal{W}^l(x_{1:n})$, $\mathcal{V}^l(x_{1:n})$ might be empty. It can be shown that $\mathcal{V}^l(x_{1:n}) \ne \emptyset \Rightarrow \mathcal{V}^l(x_{1:n}) = \mathcal{W}^l(x_{1:n})$. Also, let the subscript $(l)$ in $\mathcal{W}^m_{(l)}(x_{1:n})$ and $\mathcal{V}_{(l)}(x_{1:n})$ refer to $(p_{li})_{i \in S}$ being used as the initial distribution in place of $\pi$. With these notations, the piecewise alignment is $v = (v^1, \ldots, v^{k+1}) \in \mathcal{V}(x_{1:n})$, where

$$v^1 \in \mathcal{W}^{l_1}(x_{1:u_1}), \quad v^{k+1} \in \mathcal{V}_{(l_k)}(x_{u_k+1:n}),$$
$$v^i \in \mathcal{W}^{l_i}_{(l_{i-1})}(x_{u_{i-1}+1:u_i}), \quad 2 \le i \le k. \qquad (6)$$

Moreover, for $i = 1, 2, \ldots, k$, the partial paths $w(i) \overset{\text{def}}{=} (v^1, \ldots, v^i) \in \mathcal{W}^{l_i}(x_{1:u_i})$.

If $x_{1:\infty}$ has infinitely many (separated) nodes $\{x_{u_k}\}_{k \ge 1}$, then $v(x_{1:\infty})$, an infinite piecewise alignment based on the node times $\{u_k(x_{1:\infty})\}_{k \ge 1}$, can be defined as follows. If the sets $\mathcal{W}^{l_i}_{(l_{i-1})}(x_{u_{i-1}+1:u_i})$, $i = 2, \ldots, k$, as well as $\mathcal{V}_{(l_k)}(x_{u_k+1:n})$ and $\mathcal{W}^{l_1}(x_{1:u_1})$, are singletons, then (6) immediately defines a unique infinite alignment $v(x_{1:\infty}) = (v^1(x_{1:u_1}), v^2(x_{u_1+1:u_2}), \ldots)$. Otherwise, ties must be broken.

If we want our infinite alignment process $V$ to be regenerative (see [7]), a natural consistency condition must be imposed on the rules used to select a unique $v(x_{1:n})$ from $\mathcal{W}^{l_1}(x_{1:u_1}) \times \mathcal{W}^{l_2}_{(l_1)}(x_{u_1+1:u_2}) \times \cdots \times \mathcal{W}^{l_k}_{(l_{k-1})}(x_{u_{k-1}+1:u_k}) \times \mathcal{V}_{(l_k)}(x_{u_k+1:n})$. In [7], the resulting infinite alignments, as well as decodings $v : \mathcal{X}^\infty \to S^\infty$ based on such alignments, are called proper. This condition is perhaps best understood through the following example. Suppose that for some $x_{1:5} \in \mathcal{X}^5$, $\mathcal{W}^1_{(1)}(x_{1:5}) = \{12211, 11211\}$, and suppose the tie is broken in favor of $11211$. Now, whenever $\mathcal{W}^1_{(l)}(x'_{1:4})$ contains $\{1221, 1121\}$, we naturally require that $1221$ not be selected. In particular, we select $1121$ from $\mathcal{W}^1_{(1)}(x_{1:4}) = \{1221, 1121\}$. Subsequently, $112$ is selected from $\mathcal{W}^2_{(1)}(x_{1:3}) = \{122, 112\}$, and so on. It can be shown that a decoding by piecewise alignment (6), with ties broken in favor of the minimum (or maximum) under the reverse lexicographic ordering of $S^n$, $n \in \mathbb{N}$, is a proper decoding. Note also that we break ties locally, i.e. within the individual intervals $u_{i-1}+1, \ldots, u_i$, $i \ge 2$, enclosed by adjacent nodes. This is in contrast to a global ordering of $\mathcal{V}(x_{1:n})$, such as the one in [8], [9].
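Concretely, taking the minimum under the reverse lexicographic ordering amounts to comparing candidate paths from their last state backwards, and on the candidate sets of the example above this rule indeed selects 11211, 1121, and 112. A one-line sketch (the tuple encoding of paths is our own convention):

```python
def reverse_lex_min(paths):
    """Select the minimum under the reverse lexicographic ordering of S^n,
    i.e. compare candidate paths starting from their last state."""
    return min(paths, key=lambda p: p[::-1])
```

For instance, `reverse_lex_min([(1, 2, 2, 1, 1), (1, 1, 2, 1, 1)])` returns `(1, 1, 2, 1, 1)`, matching the selection of 11211 in the example.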
Since a global order need not respect the decomposition (6), it can fail to produce an infinite alignment going through infinitely many nodes, unless the nodes are strong.

C. Barriers

Recall (Definition 2.1) that nodes of order $r$ at time $u$ are defined relative to the entire realization $x_{1:u+r}$. Thus, whether $x_u$ is a node or not depends, in principle, on all observations up to $x_u$. We show below that typically a block $x^b_{1:k} \in \mathcal{X}^k$ ($k \ge r$) can be found such that for any $w \ge 1$ and for any $x'_{1:w} \in \mathcal{X}^w$, the $(w + k - r)$th element of $(x'_{1:w}, x^b_{1:k})$ is a node of order $r$ (relative to $(x'_{1:w}, x^b_{1:k})$). Sequences $x^b_{1:k}$ that ensure the existence of such persistent nodes are called barriers in [7]. Specifically:

Definition 2.2: Given $l \in S$, $x^b_{1:k} \in \mathcal{X}^k$ is called a (strong) $l$-barrier of order $r \ge 0$ and length $k \ge 1$ if, for any $w \ge 1$ and for every $x'_{1:w} \in \mathcal{X}^w$, $(x'_{1:w}, x^b_{1:k})$ is such that $(x'_{1:w}, x^b_{1:k})_{w+k-r}$ is a (strong) $l$-node of order $r$.

III. EXISTENCE

A. Clusters and main results

For each $i \in S$, let $G_i \overset{\text{def}}{=} \{x \in \mathcal{X} : f_i(x) > 0\}$.

Definition 3.1: We call a subset $C \subset S$ a cluster if the following conditions are satisfied:

$$\min_{j \in C} P_j\bigl(\cap_{i \in C} G_i\bigr) > 0, \quad \text{and} \quad \max_{j \notin C} P_j\bigl(\cap_{i \in C} G_i\bigr) = 0.$$

Hence, a cluster is a maximal subset of states such that $G_C = \cap_{i \in C} G_i$, the intersection of the supports of the corresponding emission distributions, is 'detectable'. Distinct clusters need not be disjoint, and a cluster can consist of a single state. In this latter case such a state is not hidden, since it is exposed by any observation it emits. When $K = 2$, $S$ is the only cluster possible, since otherwise all observations would expose their states and the underlying Markov chain would cease to be hidden.
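For a finite observation alphabet and small $K$, the two conditions of Definition 3.1 can be checked by brute force over subsets of $S$ (we check only the two displayed conditions, not maximality separately; all names below are ours):

```python
from itertools import combinations

def clusters(G, prob):
    """Brute-force the subsets C satisfying Definition 3.1 for a finite alphabet.

    G    : list of K sets, G[i] = support of emission distribution P_i.
    prob : function (j, A) -> P_j(A) for a set A of symbols.
    Returns all C with min_{j in C} P_j(G_C) > 0 and max_{j not in C} P_j(G_C) = 0,
    where G_C is the intersection of the supports over C.
    """
    K = len(G)
    out = []
    for size in range(1, K + 1):
        for C in combinations(range(K), size):
            GC = set.intersection(*(G[i] for i in C))
            inside = all(prob(j, GC) > 0 for j in C)
            outside = all(prob(j, GC) == 0 for j in range(K) if j not in C)
            if inside and outside:
                out.append(set(C))
    return out
```

For example, with four states where states 1, 2 emit only 'a' and states 3, 4 emit only 'b', this returns the two disjoint clusters $\{1, 2\}$ and $\{3, 4\}$ (0-based in code).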
In practice, many other HMMs have the entirety of $S$ as their (necessarily unique) cluster.

We now state the main results. For every state $l \in S$, let

$$p^*_l = \max_j p_{jl}. \qquad (7)$$

Lemma 3.1: Assume that for each state $l \in S$,

$$P_l\left(\left\{x \in \mathcal{X} : f_l(x)\, p^*_l > \max_{i, i \ne l} f_i(x)\, p^*_i\right\}\right) > 0. \qquad (8)$$

Moreover, assume that there exist a cluster $C \subset S$ and a positive integer $m$ such that the $m$th power of the sub-stochastic matrix $Q = (p_{ij})_{i,j \in C}$ is strictly positive. Then, for some integers $M$ and $r$, $M > r \ge 0$, there exist a set $B = B_1 \times \cdots \times B_M \subset \mathcal{X}^M$, an $M$-tuple of states $q_{1:M} \in S^M$, and a state $l \in S$, such that every $x_{1:M} \in B$ is an $l$-barrier of order $r$ (and length $M$), $q_{M-r} = l$, and $P(X_{1:M} \in B, Y_{1:M} = q_{1:M}) > 0$.

Lemma 3.1 implies that $P(X_{1:M} \in B) > 0$. Also, since every element of $B$ is a barrier of order $r$, the ergodicity of $X$ therefore guarantees that almost every realization of $X$ contains infinitely many $l$-barriers of order $r$. Hence, almost every realization of $X$ also has infinitely many $l$-nodes of order $r$.

In two-state HMMs, $S$ is the only cluster (otherwise the Markov chain would not be hidden), hence $Q = \mathbb{P}$. Irreducibility and aperiodicity in this case imply strict positivity of $\mathbb{P}^2$. Thus, the only condition to be verified is (8), which in this case reads $P_1(\{x \in \mathcal{X} : f_1(x) p^*_1 > f_2(x) p^*_2\}) > 0$ and $P_2(\{x \in \mathcal{X} : f_2(x) p^*_2 > f_1(x) p^*_1\}) > 0$. In [14], it is shown that in the case of two-state HMMs, one of these two positivity conditions is always met, which, in fact, turns out to be sufficient for the existence of infinitely many strong barriers in this ($K = 2$) case. Thus, any two-state HMM with irreducible and aperiodic $Y$ has infinitely many strong barriers. Lemma 3.1 significantly generalizes this and the associated results of [14].
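When the observation alphabet is finite, condition (8) can be verified exactly by enumerating the alphabet rather than analytically. A sketch (names are ours; `F` holds the emission pmfs row-wise, one row per state):

```python
import numpy as np

def condition_8_holds(P, F):
    """Check condition (8) exactly for finite (discrete) emission alphabets.

    P : (K, K) transition matrix; F : (K, X) emission pmfs f_l(x).
    For each l, condition (8) requires
    P_l({x : f_l(x) p*_l > max_{i != l} f_i(x) p*_i}) > 0.
    """
    p_star = P.max(axis=0)             # p*_l = max_j p_jl, eq. (7)
    W = F * p_star[:, None]            # entry (l, x): f_l(x) p*_l
    for l in range(P.shape[0]):
        others = np.delete(W, l, axis=0).max(axis=0)
        if F[l, W[l] > others].sum() <= 0:   # P_l of the strict-dominance set
            return False
    return True
```

For instance, a symmetric two-state HMM whose two emission pmfs are distinct satisfies (8), while identical emission pmfs make the strict inequality impossible for every state.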
The case $K = 2$ is special in several respects, hence the generalization is technically involved; in particular, the CLT-based proof of the existence of infinitely many nodes in [8] (Theorem 2) does not apply when $K > 2$.

For certain technical reasons, instead of extracting subsequences of separated nodes from the general infinite sequences of nodes guaranteed by Lemma 3.1, we achieve node separation by adjusting the notion of barriers. Namely, note that two $r$th-order $l$-barriers $x_{j:j+M-1}$ and $x_{i:i+M-1}$ might be in $B$ with $j < i \le j + r$, implying that the associated nodes $x_{j+M-r-1}$ and $x_{i+M-r-1}$ are not separated. Thus, we impose on $B$ the following condition:

$$x_{j:j+M-1},\ x_{i:i+M-1} \in B,\ i \ne j \ \Rightarrow\ |i - j| > r. \qquad (9)$$

If (9) holds, we say that the barriers from $B \subset \mathcal{X}^M$ are separated. This is often easy to achieve by a simple extension of $B$, as shown in the following example. Suppose there exists $x \in \mathcal{X}$ such that $x \notin B_m$ for all $m = 1, 2, \ldots, M$. All elements of $B^* \overset{\text{def}}{=} \{x\} \times B$ are evidently barriers, and moreover, they are now separated. The following lemma incorporates a more general version of the above example.

Lemma 3.2: Suppose the assumptions of Lemma 3.1 are satisfied. Then, for some integers $M$ and $r$, $M > r \ge 0$, there exist $B = B_1 \times \cdots \times B_M \subset \mathcal{X}^M$, $q_{1:M} \in S^M$, and $l \in S$, such that every $x^b_{1:M} \in B$ is a separated $l$-barrier of order $r$ (and length $M$), $q_{M-r} = l$, and $P(X_{1:M} \in B, Y_{1:M} = q_{1:M}) > 0$.

B. Counterexamples

The condition on $C$ in Lemma 3.1 might seem technical and even unnecessary. We next give an example of an HMM where the cluster condition is not met and no node (barrier) can occur. Then, we modify the example to enforce the cluster condition and consequently gain barriers.
Example 3.2: Let $K = 4$ and consider an ergodic Markov chain with transition matrix

$$\mathbb{P} = \begin{pmatrix} \frac{1}{2} & 0 & 0 & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{1}{2} & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ 0 & \frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}.$$

Let the emission distributions be such that (8) is satisfied and $G_1 = G_2$, $G_3 = G_4$, and $G_1 \cap G_3 = \emptyset$. Hence, in this case there are two disjoint clusters, $C_1 = \{1, 2\}$ and $C_2 = \{3, 4\}$. The matrices $Q_i$ corresponding to $C_i$, $i = 1, 2$, are

$$Q_1 = Q_2 = \begin{pmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{pmatrix}.$$

Evidently, the cluster assumption of Lemma 3.1 is not satisfied. Note also that the alignment cannot change (in one step) its state to the opposite one within the same cluster. Since the supports $G_{1,2}$ and $G_{3,4}$ are disjoint, any observation exposes the corresponding cluster. Hence any sequence of observations can be regarded as a sequence of blocks emitted from alternating clusters. However, the alignment inside each block stays constant. It can be shown that in this case no $x_u$ can be a node (of any order) for any $n > 1$, $x_{1:n} \in \mathcal{X}^n$, and $1 \le u < n$.

Let us modify the HMM in Example 3.2 to ensure the assumptions of Lemma 3.1.

Example 3.3: Let $\epsilon$ be such that $0 < \epsilon < \frac{1}{2}$, and let us replace $\mathbb{P}$ by the following transition matrix:

$$\begin{pmatrix} \frac{1}{2} - \epsilon & \epsilon & 0 & \frac{1}{2} \\ \epsilon & \frac{1}{2} - \epsilon & \frac{1}{2} & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ 0 & \frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}.$$

Let the emission distributions be as in the previous example. In this case, the cluster $C_1$ satisfies the assumption of Lemma 3.1. As previously, every observation exposes its cluster. Lemma 3.1 now applies to guarantee barriers and nodes. To be more specific, let $\epsilon = 1/4$, $f_1(x) = \exp(-x)$, $x \ge 0$, $f_2(x) = 2\exp(-2x)$, $x \ge 0$, and $f_3(x) = \exp(x)$, $x \le 0$, $f_4(x) = 2\exp(2x)$, $x \le 0$. It can then be verified that if $x_{1:2} = (1, 1)$, then $x_1$ is a 1-node of order 2. Indeed, in that case any element of $B = (0, +\infty) \times (\log 2, +\infty) \times (0, +\infty)$ is a 1-barrier of order 2.
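The contrast between Examples 3.2 and 3.3 can be checked mechanically: the cluster condition of Lemma 3.1 only asks that some power $Q^m$ of the restriction of $\mathbb{P}$ to a cluster be strictly positive. A sketch (0-based state indices; the function name and the Wielandt-bound cutoff on the number of powers to try are our own choices):

```python
import numpy as np

def cluster_condition_holds(P, C):
    """Does some power Q^m of Q = (p_ij)_{i,j in C} have all entries > 0?
    Positivity of the pattern of Q^m depends only on the zero pattern of Q;
    by Wielandt's theorem, if Q is primitive then its ((|C|-1)^2 + 1)-th
    power is already strictly positive, so checking up to that bound suffices."""
    Q = P[np.ix_(list(C), list(C))]
    Qm = np.eye(len(C))
    for _ in range((len(C) - 1) ** 2 + 1):
        Qm = Qm @ Q
        if np.all(Qm > 0):
            return True
    return False

# Example 3.2: Q_1 is diagonal, so no power is ever strictly positive.
P_32 = np.array([[.5, 0, 0, .5], [0, .5, .5, 0], [.5, 0, .5, 0], [0, .5, 0, .5]])
# Example 3.3 with eps = 1/4: Q_1 = [[1/4, 1/4], [1/4, 1/4]] is already positive.
eps = 0.25
P_33 = np.array([[.5 - eps, eps, 0, .5], [eps, .5 - eps, .5, 0],
                 [.5, 0, .5, 0], [0, .5, 0, .5]])
print(cluster_condition_holds(P_32, [0, 1]))  # False: no barriers can form
print(cluster_condition_holds(P_33, [0, 1]))  # True: Lemma 3.1 applies
```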
Another way to modify the HMM in Example 3.2 so as to enforce the assumptions of Lemma 3.1 is to change the emission probabilities. Namely, assume that the supports $G_i$, $i = 1, \ldots, 4$, are such that $P_j(\cap_{i=1}^4 G_i) > 0$ for all $j \in S$, and that (8) holds. Now, $S = \{1, \ldots, 4\}$ is the only cluster. Since the matrix $\mathbb{P}^2$ has all its entries positive, the conditions of Lemma 3.1 are now satisfied and barriers can be constructed.

IV. PROOF OF THE MAIN RESULT

A. Proof of Lemma 3.1

The proof below is a rather direct construction which is, however, technically involved. In order to facilitate the exposition, we have divided it into 17 short parts, as follows.

1) $\mathcal{X}_l \subset \mathcal{X}$: It follows from assumption (8) and the finiteness of $S$ that there exists an $\epsilon > 0$ such that for all $l \in S$, $P_l(\mathcal{X}_l) > 0$, where

$$\mathcal{X}_l \overset{\text{def}}{=} \left\{x \in \mathcal{X} : \max_{i, i \ne l} p^*_i f_i(x) < (1 - \epsilon)\, p^*_l f_l(x)\right\}. \qquad (10)$$

(Note that $p^*_l > 0$ for all $l \in S$ by the irreducibility of $Y$.) Also note that the sets $\mathcal{X}_l$, $l \in S$, are disjoint and have positive reference measure, $\lambda(\mathcal{X}_l) > 0$.

2) $Z \subset \mathcal{X}$ and $\delta$-$K$ bounds on the cluster densities $f_i$, $i \in C$: Let $C$ be a cluster as in the assumptions of the Lemma. The existence of $C$ implies the existence of a set $\hat{Z} \subset \cap_{i \in C} G_i$ and a $\delta > 0$, such that $\lambda(\hat{Z}) > 0$ and, $\forall z \in \hat{Z}$, the following statements hold: (i) $\min_{i \in C} f_i(z) > \delta$; (ii) $\max_{j \notin C} f_j(z) = 0$. Indeed, $\min_{j \in C} P_j(\cap_{i \in C} G_i) > 0$ implies (and indeed is equivalent to) $\lambda(\cap_{i \in C} G_i) > 0$. The latter implies the existence of $\hat{Z} \subset \cap_{i \in C} G_i$ with positive $\lambda$-measure and of $\delta > 0$ such that (i) holds. Since $\lambda(\cap_{i \in C} G_i) > 0$, the condition $P_j(\cap_{i \in C} G_i) = 0$ for $j \notin C$ implies (is equivalent to) $f_j = 0$ $\lambda$-almost everywhere on $\cap_{i \in C} G_i$. Thus, $\max_{j \notin C} f_j = 0$ $\lambda$-almost everywhere on $\cap_{i \in C} G_i$, which implies (ii).
Evidently, $K > 0$ can be chosen sufficiently large to make $\lambda(\{z \in \mathcal{X} : f_i(z) \ge K\})$ arbitrarily small and, in particular, to guarantee that

$$\lambda(\{z \in \mathcal{X} : f_i(z) \ge K\}) < \frac{\lambda(\hat{Z})}{|C|},$$

where $|C|$ is the size of $C$. Clearly then, redefining $\hat{Z} \overset{\text{def}}{=} \hat{Z} \cap \{z \in \mathcal{X} : f_i(z) < K,\ i \in C\}$ preserves $\lambda(\hat{Z}) > 0$. Next, consider

$$\lambda\bigl(\hat{Z} \setminus (\cup_{l \in S} \mathcal{X}_l)\bigr). \qquad (11)$$

If (11) is positive, then define

$$Z \overset{\text{def}}{=} \hat{Z} \setminus (\cup_{l \in S} \mathcal{X}_l). \qquad (12)$$

If (11) is zero, then there must be $s \in C$ such that $\lambda(\hat{Z} \cap \mathcal{X}_s) > 0$, and in this case let

$$Z \overset{\text{def}}{=} \hat{Z} \cap \mathcal{X}_s. \qquad (13)$$

Such an $s \in S$ must clearly exist, since $\lambda(\hat{Z}) > 0$ but $\lambda(\hat{Z} \setminus (\cup_{l \in S} \mathcal{X}_l)) = 0$. To see that $s$ must necessarily be in the cluster $C$, note that $\forall s \notin C$, $f_s(z) = 0$ $\forall z \in \hat{Z}$, which implies $\hat{Z} \cap \mathcal{X}_s = \emptyset$.

3) Sequences $s$, $a$, and $b$ of states in $S$: Let us define an auxiliary sequence of states $q_1, q_2, \ldots$ as follows. If (11) is zero, that is, if $Z = \hat{Z} \cap \mathcal{X}_s$ for some $s \in C$, then define $q_1 = s$; otherwise let $q_1$ be an arbitrary state in $C$. Let $q_2$ be a state with maximal probability of transition to $q_1$, i.e. $p_{q_2 q_1} = p^*_{q_1}$. Suppose $q_2 \ne q_1$. Then find $q_3$ with $p_{q_3 q_2} = p^*_{q_2}$. If $q_3 \notin \{q_1, q_2\}$, find $q_4$ with $p_{q_4 q_3} = p^*_{q_3}$, and so on. Let $U$ be the first index such that $q_U \in \{q_1, \ldots, q_{U-1}\}$, that is, $q_U = q_T$ for some $T < U$. This means that there exists a sequence of states $\{q_T, \ldots, q_U\}$ such that

- $q_T = q_U$;
- $q_{T+i} = \arg\max_j p_{j q_{T+i-1}}$, $i = 1, \ldots, U - T$.

To simplify the notation, and without loss of generality, assume $q_U = 1$. Reorder and rename the states as follows:

$$s_1 \overset{\text{def}}{=} q_{U-1},\ s_2 \overset{\text{def}}{=} q_{U-2},\ \ldots,\ s_i \overset{\text{def}}{=} q_{U-i},\ \ldots,\ s_L \overset{\text{def}}{=} q_T = 1, \quad i = 1, \ldots, L \overset{\text{def}}{=} U - T,$$
$$a_1 \overset{\text{def}}{=} q_{T-1},\ a_2 \overset{\text{def}}{=} q_{T-2},\ \ldots,\ a_P \overset{\text{def}}{=} q_1, \quad \text{where } P \overset{\text{def}}{=} T - 1.$$

Hence, $\{q_1, \ldots, q_{T-1}, q_T, q_{T+1}, \ldots, q_{U-1}, q_U\} = \{a_P, \ldots, a_1, 1, s_{L-1}, \ldots, s_1, 1\}$. Note that if $T = 1$, then $P = 0$ and $\{q_1, \ldots, q_{U-1}, q_U\} = \{1, s_{L-1}, \ldots, s_1, 1\}$. We have thus introduced the special sequences $a = (a_1, a_2, \ldots, a_P)$ and $s = (s_1, s_2, \ldots, s_{L-1}, 1)$. Clearly,

$$p_{s_{i-1} s_i} = p^*_{s_i},\ i = 2, \ldots, L, \qquad p_{1 s_1} = p^*_{s_1},$$
$$p_{a_{i-1} a_i} = p^*_{a_i},\ i = 2, \ldots, P, \qquad p_{s_L a_1} = p^*_{a_1},\ s_L = 1. \qquad (14)$$

Next, we exhibit $b = (b_1, \ldots, b_R)$, another auxiliary sequence, for some $R \ge 1$, characterized as follows: (i) $b_R = 1$; (ii) $\exists b_0 \in C$ such that $p_{b_0 b_1} p_{b_1 b_2} \cdots p_{b_{R-1} b_R} > 0$; (iii) if $R > 1$, then $b_{i-1} \ne b_i$ for every $i = 1, \ldots, R$. Thus, the path $b_{1:R}$ connects the cluster $C$ to state 1 in $R$ steps. Let us also require that $R$ be the minimum such number. Clearly, such $b$ and $b_0$ do exist, by the irreducibility of $Y$. Note also that the minimality of $R$ guarantees (iii) (in the special case $R = 1$ it may happen that $b_1 = 1 \in S$ and $p_{11} > 0$, in which case $b_0$ can also be taken to be 1).

4) Determining $k$: Let $Q^m$ be the $m$th power of the sub-stochastic matrix $Q = (p_{ij})_{i,j \in C}$, and let $q_{ij}$ be the entries of $Q^m$. By the hypothesis of the Lemma, $q_{ij} > 0$ $\forall i, j \in C$. This means that for every $i, j \in C$, there exists a positive-probability path from $i$ to $j$ of length $m$. Let $q^*_{ij}$ be the probability of a maximum-probability such path from $i$ to $j$. In other words, for every $i, j \in C$, there exist states $w_1, \ldots, w_{m-1} \in C$ such that

$$p_{i w_1} p_{w_1 w_2} \cdots p_{w_{m-2} w_{m-1}} p_{w_{m-1} j} = q^*_{ij} > 0. \qquad (15)$$

Let us define

$$q = \min_{i,j \in C} q^*_{ij} > 0, \quad \text{and} \qquad (16)$$

$$A = \max_{i \in S} \max_{j \in S} \left\{\frac{p^*_i}{p_{ji}} : p_{ji} > 0\right\}, \qquad (17)$$

where the $p^*_i$ are as defined in (7).
Choose $k$ sufficiently large for the following to hold:
$$(1 - \epsilon)^{k-1} < q^2 \Bigl(\frac{\delta}{K}\Bigr)^{2m} A^{-R}, \quad (18)$$
where $\epsilon$ is as in (10), and $\delta$ and $K$ are as introduced in §IV-A2.

5) The $s$-path: We now fix the state sequence
$$b_0, b_1, \ldots, b_R, s_1, s_2, \ldots, s_{2Lk}, a_1, \ldots, a_P, \quad (19)$$
where $s_{Lj+i} = s_i$, $j = 1, \ldots, 2k-1$, $i = 1, \ldots, L$ (and in particular $s_{Lj} = 1$, $j = 1, \ldots, 2k$). The sequence (19) will be called the $s$-path. The $s$-path is a concatenation of $2k$ $s$-cycles $s_{1:L}$, the beginning and the end of which are connected to the cluster $C$ via the positive-probability paths $b$ and $a$, respectively (recall that $a_P = q_1 \in C$ and $b_R = 1$ by construction). Additionally, the $b_R, s_1, s_2, \ldots, s_{2Lk}, a_1, \ldots, a_P$ segment of the $s$-path (19) has the important property (14): every consecutive transition along this segment occurs with the maximal transition probability given its destination state. (However, $b$, the beginning of the $s$-path, need not satisfy this property.) The $s$-path is almost ready to serve as the $q_{1:M}$ promised by the Lemma; its conversion to $q_{1:M}$ will be completed in §IV-A17. In fact, the idea of the Lemma and its proof is to exhibit (a cylinder subset of) observations such that, once emitted along the $s$-path, these observations trap the Viterbi backtracking so that the latter winds up on the $s$-path. That will guarantee that an observation corresponding to the beginning of the $s$-path is a node.

6) The barrier: Consider the following sequence of observations
$$z_0, z_1, \ldots, z_m, y'_1, \ldots, y'_{R-1}, y_0, y_1, \ldots, y_{2Lk}, y''_1, \ldots, y''_P, z'_1, \ldots, z'_m, \quad (20)$$
where $z_0, z_i, z'_i \in Z$, $i = 1, \ldots, m$; $y'_i \in X_{b_i}$, $i = 1, \ldots, R-1$; $y_0 \in X_1$; $y_{i+Lj} \in X_{s_i}$, $j = 0, 1, \ldots, 2k-1$, $i = 1, \ldots, L$; and $y''_i \in X_{a_i}$, $i = 1, \ldots, P$.
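The choice of $k$ in (18) is purely numerical: one takes the smallest $k$ for which the geometric factor $(1-\epsilon)^{k-1}$ drops below the constant on the right-hand side. A sketch, with the function and argument names our own:

```python
def smallest_k(eps, q, delta, K, A, m, R):
    """Smallest integer k >= 1 with
        (1 - eps)**(k - 1) < q**2 * (delta / K)**(2 * m) * A**(-R),
    i.e. condition (18).  eps is from (10), q from (16), A from (17),
    and delta < K are the constants of Sec. IV-A2; 0 < eps < 1 is assumed,
    so the left-hand side decays geometrically and the loop terminates."""
    bound = q ** 2 * (delta / K) ** (2 * m) * A ** (-R)
    k = 1
    while (1 - eps) ** (k - 1) >= bound:
        k += 1
    return k
```

Since the right-hand side of (18) is a fixed positive constant, such a finite $k$ always exists.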
From this point on through §IV-A15, we shall be proving that $y_{Lk}$ is a 1-node of order $(kL + m + P)$ and, therefore, that (20) is a 1-barrier of order $(kL + m + P)$. First, let $u \ge 2Lk + 2m + 1 + P + R$ and let $x_{1:u}$ be any sequence of observations containing the sequence (20) in the tail.

7) $\alpha$, $\beta$, $\gamma$, $\eta$: Recall the definition of the scores $\delta_u(i)$ (1) and the maximum partial likelihoods $p^{(r)}_{ij}(u)$ (3). We now introduce the following abbreviated notation. For any $i, j \in S$ and appropriate $r \ge 0$, let
$$\delta_i(y_l) \stackrel{\mathrm{def}}{=} \delta_{u-P-m-2kL+l}(i), \qquad p^{(r)}_{ij}(y_l) \stackrel{\mathrm{def}}{=} p^{(r)}_{ij}(u-P-m-2kL+l), \qquad \forall\, l : 0 \le l \le 2kL, \quad (21)$$
$$p^{(r)}_{ij}(y'_l) \stackrel{\mathrm{def}}{=} p^{(r)}_{ij}(u-P-m-2kL-R+l), \qquad \forall\, l : 1 \le l \le R-1,$$
$$\delta_i(z_l) \stackrel{\mathrm{def}}{=} \delta_{u-2Lk-2m-P-R+l}(i), \qquad p^{(r)}_{ij}(z_l) \stackrel{\mathrm{def}}{=} p^{(r)}_{ij}(u-2Lk-2m-P-R+l), \qquad \forall\, l : 0 \le l \le m,$$
$$\delta_i(z'_l) \stackrel{\mathrm{def}}{=} \delta_{u-m+l}(i), \qquad p^{(r)}_{ij}(z'_l) \stackrel{\mathrm{def}}{=} p^{(r)}_{ij}(u-m+l), \qquad \forall\, l : 1 \le l \le m. \quad (22)$$
Also, we will frequently use the scores corresponding to $z_0$, $z_m$, $y_0$, and $y_{Lk}$; hence the following further abbreviations:
$$\alpha_i \stackrel{\mathrm{def}}{=} \delta_i(z_0), \qquad \beta_i \stackrel{\mathrm{def}}{=} \delta_i(z_m), \qquad \gamma_i \stackrel{\mathrm{def}}{=} \delta_i(y_0), \qquad \eta_i \stackrel{\mathrm{def}}{=} \delta_i(y_{Lk}).$$
Note that for every $j \notin C$, $f_j(z_0) = f_j(z'_l) = f_j(z_l) = 0$, $l = 1, \ldots, m$, by the construction of $Z$ (§IV-A2). Hence $\alpha_j = \beta_j = 0$ for all $j \notin C$, and a more general implication is that for every $j \in S$,
$$\beta_j = \max_{i \in C} \alpha_i\, p^{(m-1)}_{ij}(z_0) f_j(z_m) = \alpha_{i_\beta(j)}\, p^{(m-1)}_{i_\beta(j)\, j}(z_0) f_j(z_m) \quad \text{for some } i_\beta(j) \in C; \quad (23)$$
$$\gamma_j = \max_{i \in C} \beta_i\, p^{(R-1)}_{ij}(z_m) f_j(y_0) = \beta_{i_\gamma(j)}\, p^{(R-1)}_{i_\gamma(j)\, j}(z_m) f_j(y_0) \quad \text{for some } i_\gamma(j) \in C. \quad (24)$$
Also, we will use the following representation of $\eta_j$ in terms of $\gamma$:
$$\eta_j = \max_{i \in S} \gamma_i\, p^{(kL-1)}_{ij}(y_0) f_j(y_{kL}) = \gamma_{i_\eta(j)}\, p^{(kL-1)}_{i_\eta(j)\, j}(y_0) f_j(y_{kL}) \quad \text{for some } i_\eta(j) \in S. \quad (25)$$
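All of the scores above are instances of the standard Viterbi recursion: $\delta_1(j) = \pi_j f_j(x_1)$ and $\delta_{u+1}(j) = \max_i \delta_u(i)\, p_{ij} f_j(x_{u+1})$, cf. (1) and (4). For reference, a generic sketch of this recursion with backtracking; this is plain Python with emission densities passed as callables, not the paper's $p^{(r)}_{ij}$ notation:

```python
def viterbi(P, pi, f, x):
    """Viterbi scores delta_u(j) and one maximum a posteriori path.
    P: K x K transition matrix (list of lists, rows summing to 1);
    pi: initial distribution; f: list of K emission densities; x: observations."""
    K, n = len(pi), len(x)
    delta = [[pi[j] * f[j](x[0]) for j in range(K)]]
    back = [[0] * K]
    for u in range(1, n):
        row, ptr = [], []
        for j in range(K):
            # delta_{u+1}(j) = max_i delta_u(i) p_{ij} f_j(x_{u+1}); record argmax
            i_best = max(range(K), key=lambda i: delta[-1][i] * P[i][j])
            ptr.append(i_best)
            row.append(delta[-1][i_best] * P[i_best][j] * f[j](x[u]))
        delta.append(row)
        back.append(ptr)
    # backtracking from the best terminal score
    path = [max(range(K), key=lambda j: delta[-1][j])]
    for u in range(n - 1, 0, -1):
        path.append(back[u][path[-1]])
    return delta, path[::-1]
```

The proof's "trapping" argument is precisely about forcing the `back` pointers so that backtracking from any sufficiently distant future winds up on the $s$-path.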
8) Bounds on $\beta$: Recall (§IV-A3) that $b_0 \in C$. We show that for every $j \in S$,
$$\beta_j < q^{-1} \Bigl(\frac{K}{\delta}\Bigr)^m \beta_{b_0}. \quad (26)$$
Fix $j \in S$ and consider $\alpha_{i_\beta(j)}$ from (23). Let $v_1, \ldots, v_{m-1}$ be a path that realizes $p^{(m-1)}_{i_\beta(j)\, j}(z_0)$. Then
$$\beta_j = \alpha_{i_\beta(j)}\, p_{i_\beta(j) v_1} f_{v_1}(z_1)\, p_{v_1 v_2} f_{v_2}(z_2) \cdots p_{v_{m-1} j} f_j(z_m) < \alpha_{i_\beta(j)} K^m.$$
(The last inequality follows from (12), (13).) Let $w_1, \ldots, w_{m-1}$ be a maximum-probability path from $i_\beta(j)$ to $b_0$ as in (15). Thus,
$$\beta_{b_0} \ge \alpha_{i_\beta(j)}\, p^{(m-1)}_{i_\beta(j)\, b_0}(z_0) f_{b_0}(z_m) \ge \alpha_{i_\beta(j)}\, p_{i_\beta(j) w_1} f_{w_1}(z_1)\, p_{w_1 w_2} f_{w_2}(z_2) \cdots p_{w_{m-1} b_0} f_{b_0}(z_m) \ge \alpha_{i_\beta(j)}\, q\, \delta^m.$$
(The last inequality again follows from (12), (13).) Since $q > 0$ (16), we thus obtain
$$\beta_j < \alpha_{i_\beta(j)} K^m \le \frac{\beta_{b_0}}{q\, \delta^m}\, K^m,$$
as required.

9) Likelihood ratio bounds: We next prove the following claims:
$$p^{(L-1)}_{i1}(y_{lL}) \le p^{(L-1)}_{11}(y_{lL}) \qquad \forall\, i \in S,\ \forall\, l = 0, \ldots, 2k-1, \quad (27)$$
$$\frac{p^{(L-1)}_{ij}(y_{lL})\, f_j(y_{(l+1)L})}{p^{(L-1)}_{11}(y_{lL})\, f_1(y_{(l+1)L})} < 1 - \epsilon \qquad \forall\, i, j \in S,\ j \ne 1,\ \forall\, l : 0 \le l \le 2k-1, \quad (28)$$
$$p^{(R-1)}_{ij}(z_m) f_j(y_0) \le A^R\, p^{(R-1)}_{b_0 1}(z_m) f_1(y_0) \qquad \forall\, i, j \in S, \quad (29)$$
$$\frac{p^{(m+P-1)}_{ij}(y_{2kL})}{p^{(m+P-1)}_{1j}(y_{2kL})} \le q^{-1} \Bigl(\frac{K}{\delta}\Bigr)^{m-1} \qquad \forall\, j \in C,\ \forall\, i \in S. \quad (30)$$
If $L = 1$, then (27) becomes $p_{i1} \le p_{11}$ for all $i \in S$, which is true by the assumption $p^*_1 = p_{11}$ made in the course of constructing the $s$ sequence (§IV-A3). If $L = 1$, then (28) becomes
$$\frac{p_{ij} f_j(y_{l+1})}{p_{11} f_1(y_{l+1})} < 1 - \epsilon \qquad \forall\, i, j \in S,\ j \ne 1,$$
and thus, since $y_{l+1} \in X_1$, $0 \le l < 2k$, in this case (28) is true by the definition of $X_1$ (§IV-A1) (and the fact that $p^*_1 = p_{11}$). Let us next prove (27) and (28) for the case $L > 1$. Consider any $l = 0, 1, \ldots, 2k-1$.
Note that the definitions of the $s$-path (19) and $X_{s_i}$ (§IV-A1), together with the fact that $y_{lL+i} \in X_{s_i}$ for $1 \le i < L$, imply that, given the observations $y_{Ll+1:L(l+1)-1}$, the path $s_{1:L-1}$ realizes the maximum in $p^{(L-1)}_{11}(y_{lL})$, i.e.
$$p^{(L-1)}_{11}(y_{lL}) = p_{1 s_1} f_{s_1}(y_{lL+1})\, p_{s_1 s_2} \cdots p_{s_{L-2} s_{L-1}} f_{s_{L-1}}(y_{(l+1)L-1})\, p_{s_{L-1} 1}. \quad (31)$$
(Indeed,
$$p_{1 s_1} f_{s_1}(y_{lL+1})\, p_{s_1 s_2} \cdots p_{s_{L-2} s_{L-1}} f_{s_{L-1}}(y_{(l+1)L-1})\, p_{s_{L-1} 1} = p^*_{s_1} f_{s_1}(y_{lL+1})\, p^*_{s_2} \cdots p^*_{s_{L-1}} f_{s_{L-1}}(y_{(l+1)L-1})\, p^*_1,$$
and for $i = 1, 2, \ldots, L-1$, $p^*_{s_i} f_{s_i}(y_{lL+i}) \ge p_{hj} f_j(y_{lL+i})$ for any $h, j \in S$.) Suppose $j \ne 1$ and $t_{1:L-1}$ realizes $p^{(L-1)}_{ij}(y_{lL})$, i.e.
$$p^{(L-1)}_{ij}(y_{lL}) = p_{i t_1} f_{t_1}(y_{lL+1})\, p_{t_1 t_2} \cdots p_{t_{L-2} t_{L-1}} f_{t_{L-1}}(y_{(l+1)L-1})\, p_{t_{L-1} j}. \quad (32)$$
Hence, with $t_0$ and $t_L$ standing for $i$ and $j$, respectively (and $s_0 = s_L = 1$), the left-hand side of (28) becomes
$$\Bigl(\frac{p_{t_0 t_1} f_{t_1}(y_{lL+1})}{p_{s_0 s_1} f_{s_1}(y_{lL+1})}\Bigr) \Bigl(\frac{p_{t_1 t_2} f_{t_2}(y_{lL+2})}{p_{s_1 s_2} f_{s_2}(y_{lL+2})}\Bigr) \cdots \Bigl(\frac{p_{t_{L-2} t_{L-1}} f_{t_{L-1}}(y_{(l+1)L-1})}{p_{s_{L-2} s_{L-1}} f_{s_{L-1}}(y_{(l+1)L-1})}\Bigr) \Bigl(\frac{p_{t_{L-1} t_L} f_j(y_{(l+1)L})}{p_{s_{L-1} s_L} f_1(y_{(l+1)L})}\Bigr). \quad (33)$$
For $h = 1, \ldots, L$ such that $t_h \ne s_h$,
$$\frac{p_{t_{h-1} t_h} f_{t_h}(y_{lL+h})}{p_{s_{h-1} s_h} f_{s_h}(y_{lL+h})} < 1 - \epsilon, \qquad \text{since } y_{lL+h} \in X_{s_h}. \quad (34)$$
For all other $h$, $s_h = t_h$, and therefore the left-hand side of (34) becomes $p_{t_{h-1} t_h} / p_{s_{h-1} s_h} = p_{t_{h-1} s_h} / p^*_{s_h} \le 1$ (by property (14)). Since the last term of the product (33) above does satisfy (34) ($j \ne 1$), (28) is thus proved. Suppose next that $t_1, \ldots, t_{L-1}$ realizes $p^{(L-1)}_{i1}(y_{lL})$.
With $s_0 = 1$ and $t_0 = i$, similarly to the previous arguments, we have
$$\frac{p^{(L-1)}_{i1}(y_{lL})}{p^{(L-1)}_{11}(y_{lL})} = \prod_{h=1}^{L-1} \Bigl(\frac{p_{t_{h-1} t_h} f_{t_h}(y_{lL+h})}{p_{s_{h-1} s_h} f_{s_h}(y_{lL+h})}\Bigr) \frac{p_{t_{L-1} 1}}{p_{s_{L-1} 1}} \le 1,$$
implying (27).

Let us now prove (29). To that end, note that for all states $h, i, j \in S$ such that $p_{jh} > 0$, it follows from the definitions (7) and (17) that
$$\frac{p_{ih}}{p_{jh}} \le \frac{p^*_h}{p_{jh}} \le A. \quad (35)$$
If $R = 1$, then (29) becomes $p_{ij} f_j(y_0) \le A\, p_{b_0 1} f_1(y_0)$. By the definition of $X_1$ (recall that $y_0 \in X_1$), we have that for every $i, j \in S$, $p_{ij} f_j(y_0) \le p^*_1 f_1(y_0)$. Using (35) with $h = 1$ and $j = b_0$, we get $p^*_1 f_1(y_0) \le A\, p_{b_0 1} f_1(y_0)$ ($p_{b_0 1} > 0$ by the construction of $b$, §IV-A3). Putting these together, we obtain
$$p_{ij} f_j(y_0) \le p^*_1 f_1(y_0) \le A\, p_{b_0 1} f_1(y_0),$$
as required. Consider now the case $R > 1$. Let $t_{1:R-1}$ be a path that realizes $p^{(R-1)}_{ij}(z_m)$, i.e.
$$p^{(R-1)}_{ij}(z_m) = p_{i t_1} f_{t_1}(y'_1)\, p_{t_1 t_2} f_{t_2}(y'_2) \cdots p_{t_{R-2} t_{R-1}} f_{t_{R-1}}(y'_{R-1})\, p_{t_{R-1} j}.$$
By the definition of $X_l$ (§IV-A1) and the facts that $y'_r \in X_{b_r}$, $r = 1, 2, \ldots, R-1$, and $y_0 \in X_1$, we have
$$p^{(R-1)}_{ij}(z_m) f_j(y_0) \le p^*_{b_1} f_{b_1}(y'_1)\, p^*_{b_2} f_{b_2}(y'_2) \cdots p^*_{b_{R-1}} f_{b_{R-1}}(y'_{R-1})\, p^*_1 f_1(y_0). \quad (36)$$
Now, by the construction of $b$ (§IV-A3), $p_{b_{r-1} b_r} > 0$ for $r = 1, \ldots, R$ ($b_R = 1$). Thus, the argument behind (35) applies here to bound the right-hand side of (36) from above by
$$A\, p_{b_0 b_1} f_{b_1}(y'_1)\, A\, p_{b_1 b_2} f_{b_2}(y'_2) \cdots A\, p_{b_{R-2} b_{R-1}} f_{b_{R-1}}(y'_{R-1})\, A\, p_{b_{R-1} 1} f_1(y_0) \le A^R\, p^{(R-1)}_{b_0 1}(z_m) f_1(y_0),$$
as required.

Let us now prove (30). If $m = 1$, then (30) becomes
$$p^{(P)}_{ij}(y_{2kL}) \le p^{(P)}_{1j}(y_{2kL})\, q^{-1} \qquad \forall\, j \in C,\ \forall\, i \in S. \quad (37)$$
If $P = 0$, then (37) reduces to $p_{ij} \le p_{1j}\, q^{-1}$, which is true because in this case the state $q_1 = q_T = 1$ belongs to $C$ (§IV-A3) and $p_{1j}\, q^{-1} \ge 1$ ((15), (16) with $m = 1$). To see why (37) is true with $P \ge 1$, note that by the same argument as used for proving (27) and (28), we now get, for all $h, l \in S$,
$$p^{(P-1)}_{1 a_P}(y_{2kL}) f_{a_P}(y''_P) \ge p^{(P-1)}_{hl}(y_{2kL}) f_l(y''_P). \quad (38)$$
Also, since $a_P = q_1 \in C$ (§IV-A3), $p_{a_P j}\, q^{-1} \ge 1$ ((15), (16) with $m = 1$). Thus
$$p^{(P)}_{ij}(y_{2kL}) \stackrel{\text{by (4)}}{=} \max_{l \in S} p^{(P-1)}_{il}(y_{2kL}) f_l(y''_P)\, p_{lj} \stackrel{\text{by (38)}}{\le} p^{(P-1)}_{1 a_P}(y_{2kL}) f_{a_P}(y''_P) \max_{l \in S} p_{lj} \le p^{(P-1)}_{1 a_P}(y_{2kL}) f_{a_P}(y''_P) \le p^{(P-1)}_{1 a_P}(y_{2kL}) f_{a_P}(y''_P)\, p_{a_P j}\, q^{-1} \stackrel{\text{by (4)}}{\le} p^{(P)}_{1j}(y_{2kL})\, q^{-1}.$$
For $m > 1$, let $t_{1:m-1}$ be a path realizing $p^{(m-1)}_{hj}(y''_P)$. Thus,
$$p^{(m-1)}_{hj}(y''_P) = p_{h t_1} f_{t_1}(z'_1)\, p_{t_1 t_2} f_{t_2}(z'_2) \cdots f_{t_{m-1}}(z'_{m-1})\, p_{t_{m-1} j} < K^{m-1}. \quad (39)$$
(This is true since $z'_r \in Z$ for $r = 1, 2, \ldots, m-1$ (§IV-A2), and thus, for $p^{(m-1)}_{hj}(y''_P)$ to be positive it is necessary that $t_r \in C$, $r = 1, \ldots, m-1$, implying $f_{t_r}(z'_r) < K$.) Now let $w_{1:m-1}$ be a maximum-probability path from $a_P$ to $j$ as in (15); its states lie in $C$, and $p^{(m-1)}_{a_P j}(y''_P)$, which is clearly positive ($z'_r \in Z$ for $r = 1, \ldots, m-1$, and $a_P, j \in C$; recall the positivity assumption on $Q^m$, §IV-A4), is bounded below by the contribution of this path. We thus have
$$p^{(m-1)}_{a_P j}(y''_P) \ge p_{a_P w_1} f_{w_1}(z'_1)\, p_{w_1 w_2} f_{w_2}(z'_2) \cdots f_{w_{m-1}}(z'_{m-1})\, p_{w_{m-1} j} = q^*_{a_P j}\, f_{w_1}(z'_1) f_{w_2}(z'_2) \cdots f_{w_{m-1}}(z'_{m-1}) > q\, \delta^{m-1}. \quad (40)$$
Combining the bounds (39) and (40) ($q > 0$, (16)), we obtain
$$p^{(m-1)}_{hj}(y''_P) < p^{(m-1)}_{a_P j}(y''_P)\, \Bigl(\frac{K}{\delta}\Bigr)^{m-1} \big/\, q. \quad (41)$$
Finally,
$$p^{(P+m-1)}_{ij}(y_{2kL}) \stackrel{\text{by (4)}}{=} \max_{l \in S} p^{(P-1)}_{il}(y_{2kL}) f_l(y''_P)\, p^{(m-1)}_{lj}(y''_P) \stackrel{\text{by (38), (41)}}{<} p^{(P-1)}_{1 a_P}(y_{2kL}) f_{a_P}(y''_P)\, p^{(m-1)}_{a_P j}(y''_P)\, \Bigl(\frac{K}{\delta}\Bigr)^{m-1} \big/\, q \stackrel{\text{by (4)}}{\le} p^{(P+m-1)}_{1j}(y_{2kL})\, \Bigl(\frac{K}{\delta}\Bigr)^{m-1} \big/\, q.$$

10) $\gamma_j \le \mathrm{const} \times \gamma_1$: Combining (24), (26), and (29), we see that for every state $j \in S$,
$$\gamma_j \stackrel{\text{by (24)}}{=} \beta_{i_\gamma(j)}\, p^{(R-1)}_{i_\gamma(j)\, j}(z_m) f_j(y_0) \stackrel{\text{by (29)}}{\le} \beta_{i_\gamma(j)}\, p^{(R-1)}_{b_0 1}(z_m) f_1(y_0)\, A^R \stackrel{\text{by (26)}}{\le} q^{-1} \Bigl(\frac{K}{\delta}\Bigr)^m A^R\, \beta_{b_0}\, p^{(R-1)}_{b_0 1}(z_m) f_1(y_0) \le U \max_{i \in S} \beta_i\, p^{(R-1)}_{i1}(z_m) f_1(y_0) \stackrel{\text{by (24)}}{=} U \gamma_1,$$
where
$$U \stackrel{\mathrm{def}}{=} q^{-1} \Bigl(\frac{K}{\delta}\Bigr)^m A^R. \quad (42)$$
Hence
$$\gamma_j \le U \gamma_1 \qquad \forall\, j \in S. \quad (43)$$

11) Further bounds on likelihoods: Let $l \ge 0$ and $n > 0$ be integers such that $l + n \le 2k$, but arbitrary otherwise. Expanding $p^{(nL-1)}_{11}(y_{lL})$ recursively according to (4), we obtain
$$p^{(nL-1)}_{11}(y_{lL}) = \max_{i_{1:n-1} \in S^{n-1}} p^{(L-1)}_{1 i_1}(y_{lL}) f_{i_1}(y_{(l+1)L})\, p^{(L-1)}_{i_1 i_2}(y_{(l+1)L}) f_{i_2}(y_{(l+2)L}) \cdots p^{(L-1)}_{i_{n-2} i_{n-1}}(y_{(l+n-2)L}) f_{i_{n-1}}(y_{(l+n-1)L})\, p^{(L-1)}_{i_{n-1} 1}(y_{(l+n-1)L}). \quad (44)$$
Since for any $i_1 \in S$,
$$p^{(L-1)}_{1 i_1}(y_{lL}) f_{i_1}(y_{(l+1)L}) \le p^{(L-1)}_{11}(y_{lL}) f_1(y_{(l+1)L}),$$
as well as
$$p^{(L-1)}_{i_{r-1} i_r}(y_{(l+r-1)L}) f_{i_r}(y_{(l+r)L}) \stackrel{\text{by (28)}}{\le} p^{(L-1)}_{11}(y_{(l+r-1)L}) f_1(y_{(l+r)L}), \qquad r = 2, \ldots, n-1,$$
and since for any $i_{n-1} \in S$,
$$p^{(L-1)}_{i_{n-1} 1}(y_{(l+n-1)L}) \stackrel{\text{by (27)}}{\le} p^{(L-1)}_{11}(y_{(l+n-1)L}),$$
the maximization in (44) above is achieved as follows:
$$p^{(nL-1)}_{11}(y_{lL}) = p^{(L-1)}_{11}(y_{lL}) f_1(y_{(l+1)L})\, p^{(L-1)}_{11}(y_{(l+1)L}) f_1(y_{(l+2)L}) \cdots p^{(L-1)}_{11}(y_{(l+n-2)L}) f_1(y_{(l+n-1)L})\, p^{(L-1)}_{11}(y_{(l+n-1)L}). \quad (45)$$
Now we replace state 1 by generic states $i, j \in S$ at both ends of the paths in (44) and repeat the above arguments. Thus, also using (45), we arrive at the bound (46) below:
$$p^{(nL-1)}_{ij}(y_{lL}) f_j(y_{(l+n)L}) \le \prod_{u=l+1}^{l+n} p^{(L-1)}_{11}(y_{(u-1)L}) f_1(y_{uL}) \stackrel{\text{by (45)}}{=} p^{(nL-1)}_{11}(y_{lL}) f_1(y_{(l+n)L}) \qquad \forall\, i, j \in S. \quad (46)$$
In particular, (46) states that for all $i, j \in S$,
$$p^{(kL-1)}_{ij}(y_0) f_j(y_{kL}) \le p^{(kL-1)}_{11}(y_0) f_1(y_{kL}). \quad (47)$$

12) $\eta_j \le \mathrm{const} \times \eta_1$: In order to see that
$$\eta_j \le U \eta_1 \qquad \forall\, j \in S, \quad (48)$$
note:
$$\eta_j \stackrel{\text{(25)}}{=} \max_{i \in S} \gamma_i\, p^{(kL-1)}_{ij}(y_0) f_j(y_{kL}) \stackrel{\text{by (47)}}{\le} \max_{i \in S} \gamma_i\, p^{(kL-1)}_{11}(y_0) f_1(y_{kL}) \stackrel{\text{by (43)}}{\le} U \gamma_1\, p^{(kL-1)}_{11}(y_0) f_1(y_{kL}) \stackrel{\text{by (25)}}{\le} U \eta_1.$$

13) A representation of $\eta_1$: Recall that $k$, the number of cycles in the $s$-path, was chosen sufficiently large for (18) to hold (in particular, $k > 1$). We now prove that there exists $\kappa \in \{1, \ldots, k-1\}$ such that
$$\eta_1 = \delta_1(y_{\kappa L})\, p^{((k-\kappa)L-1)}_{11}(y_{\kappa L}) f_1(y_{kL}). \quad (49)$$
The relation (49) states that (given the observations $x_{1:u}$) a maximum-likelihood path from time 1 (observation $x_1$) to time $u - m - P - kL$ (observation $y_{kL}$) goes through state 1 at time $u - m - P - 2kL + \kappa L$, that is, when $y_{\kappa L}$ is observed. To see this, suppose no such $\kappa$ existed. Then, applying (4) to (25), and recalling that $\delta_1(y_{\kappa L})$ was introduced in (21), we would have
$$\eta_1 = \gamma_{i_\eta(1)}\, p^{(L-1)}_{i_\eta(1)\, j_1}(y_0) f_{j_1}(y_L)\, p^{(L-1)}_{j_1 j_2}(y_L) f_{j_2}(y_{2L})\, p^{(L-1)}_{j_2 j_3}(y_{2L}) \cdots p^{(L-1)}_{j_{k-1}\, 1}(y_{(k-1)L}) f_1(y_{kL})$$
for some $j_1 \ne 1, \ldots, j_{k-1} \ne 1$.
Furthermore, this would imply
$$\eta_1 \stackrel{\text{by (28), (27)}}{<} \gamma_{i_\eta(1)} (1-\epsilon)^{k-1} \prod_{i=1}^{k} p^{(L-1)}_{11}(y_{(i-1)L}) f_1(y_{iL}) \stackrel{\text{by (18)}}{<} \gamma_{i_\eta(1)}\, q^2 \Bigl(\frac{\delta}{K}\Bigr)^{2m} A^{-R} \prod_{i=1}^{k} p^{(L-1)}_{11}(y_{(i-1)L}) f_1(y_{iL}) \stackrel{\text{by (43)}}{\le} \gamma_1\, U q^2 \Bigl(\frac{\delta}{K}\Bigr)^{2m} A^{-R} \prod_{i=1}^{k} p^{(L-1)}_{11}(y_{(i-1)L}) f_1(y_{iL}) \stackrel{\text{by (42)}}{=} \gamma_1\, q \Bigl(\frac{\delta}{K}\Bigr)^{m} \prod_{i=1}^{k} p^{(L-1)}_{11}(y_{(i-1)L}) f_1(y_{iL}) < \gamma_1 \prod_{i=1}^{k} p^{(L-1)}_{11}(y_{(i-1)L}) f_1(y_{iL}). \quad (50)$$
(The last inequality follows from $q \le 1$ (16) and $\delta < K$, §IV-A2.) On the other hand, by definition (25) (and a $(k-1)$-fold application of (4)),
$$\eta_1 \ge \gamma_1 \prod_{i=1}^{k} p^{(L-1)}_{11}(y_{(i-1)L}) f_1(y_{iL}),$$
which evidently contradicts (50) above. Therefore $\kappa$ satisfying (49), with $1 \le \kappa < k$, does exist.

14) An implication of (45) and (49) for $\delta_1(y_{lL})$: Clearly, the arguments of the previous subsection (§IV-A13) remain valid if $k$ is replaced by any $l \in \{k, \ldots, 2k\}$. Hence the following generalization of (49): for some $\kappa(l) < l$,
$$\delta_1(y_{lL}) = \delta_1(y_{\kappa(l)L})\, p^{((l-\kappa(l))L-1)}_{11}(y_{\kappa(l)L}) f_1(y_{lL}). \quad (51)$$
We apply (51) recursively, starting with $\kappa^{(0)} \stackrel{\mathrm{def}}{=} l$ and returning $\kappa^{(1)} \stackrel{\mathrm{def}}{=} \kappa(l) < l$. If $\kappa^{(1)} \le k$, we stop; otherwise we substitute $\kappa^{(1)}$ for $l$ and obtain $\kappa^{(2)} \stackrel{\mathrm{def}}{=} \kappa(\kappa^{(1)}) < \kappa^{(1)}$, and so on, until $\kappa^{(j)} \le k$ for some $j > 0$. Thus,
$$\delta_1(y_{lL}) = \delta_1(y_{\kappa^{(j)}L})\, p^{((\kappa^{(j-1)}-\kappa^{(j)})L-1)}_{11}(y_{\kappa^{(j)}L}) f_1(y_{\kappa^{(j-1)}L}) \cdots p^{((l-\kappa^{(1)})L-1)}_{11}(y_{\kappa^{(1)}L}) f_1(y_{lL}). \quad (52)$$
Applying (45) to the appropriate factors on the right-hand side of (52) above, we obtain
$$\delta_1(y_{lL}) = \delta_1(y_{\kappa^{(j)}L})\, p^{(L-1)}_{11}(y_{\kappa^{(j)}L}) f_1(y_{(\kappa^{(j)}+1)L}) \cdots p^{(L-1)}_{11}(y_{(k-1)L}) f_1(y_{kL})\, p^{(L-1)}_{11}(y_{kL}) f_1(y_{(k+1)L}) \cdots p^{(L-1)}_{11}(y_{(\kappa^{(j-1)}-1)L}) f_1(y_{\kappa^{(j-1)}L}) \cdots p^{(L-1)}_{11}(y_{(\kappa^{(1)}-1)L}) f_1(y_{\kappa^{(1)}L}) \cdots p^{(L-1)}_{11}(y_{(l-1)L}) f_1(y_{lL}). \quad (53)$$
Also, according to (45),
$$\delta_1(y_{\kappa^{(j)}L})\, p^{(L-1)}_{11}(y_{\kappa^{(j)}L}) f_1(y_{(\kappa^{(j)}+1)L}) \cdots p^{(L-1)}_{11}(y_{(k-1)L}) = \delta_1(y_{\kappa^{(j)}L})\, p^{((k-\kappa^{(j)})L-1)}_{11}(y_{\kappa^{(j)}L}).$$
At the same time,
$$\delta_1(y_{\kappa^{(j)}L})\, p^{((k-\kappa^{(j)})L-1)}_{11}(y_{\kappa^{(j)}L}) f_1(y_{kL}) \stackrel{\text{by (4)}}{\le} \eta_1. \quad (54)$$
However, we cannot have strict inequality in (54), since that, by virtue of (53), would contradict the maximality of $\delta_1(y_{lL})$. We have thus arrived at
$$\delta_1(y_{lL}) = \eta_1\, p^{(L-1)}_{11}(y_{kL}) f_1(y_{(k+1)L}) \cdots p^{(L-1)}_{11}(y_{(l-1)L}) f_1(y_{lL}). \quad (55)$$
In summary, for any $l$ with $k \le l \le 2k$ there exists a realization of $\delta_1(y_{lL})$ that goes through state 1 every time $y_{iL}$, $i = k, \ldots, l$, is observed.

15) $y_{kL}$ is a $(kL+m+P)$-order 1-node: In §IV-A16 we will prove that for any $i \in S$, $i \ne 1$, and any $j \in C$,
$$\eta_i\, p^{(kL+m+P-1)}_{ij}(y_{kL}) \le \eta_1\, p^{(kL+m+P-1)}_{1j}(y_{kL}), \quad (56)$$
which implies that $y_{kL}$ is a 1-node of order $kL+m+P$. Indeed, let $l \in S$ be arbitrary. Since $f_j(z'_m) = 0$ for every $j \in S \setminus C$, any maximum-likelihood path to state $l$ at time $u+1$ (observation $x_{u+1}$) must go through a state in $C$ at time $u$ (observation $x_u = z'_m$).
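The recursive unwinding of (51) in §IV-A14 above is simply an iteration of the map $l \mapsto \kappa(l)$ until the index falls to $k$ or below. A sketch, with $\kappa$ supplied as an abstract callable (since (51) only guarantees its existence, not a formula) and the function name our own:

```python
def unwind_kappa(l, k, kappa):
    """Starting from kappa^(0) = l, apply kappa^(j) = kappa(kappa^(j-1))
    until kappa^(j) <= k, collecting the whole chain
    l = kappa^(0) > kappa^(1) > ... > kappa^(j), as in Sec. IV-A14."""
    chain = [l]
    while chain[-1] > k:
        nxt = kappa(chain[-1])
        # (51) guarantees kappa(l) < l, so the iteration terminates
        assert nxt < chain[-1], "kappa must strictly decrease"
        chain.append(nxt)
    return chain
```

Because each step strictly decreases the index, the chain reaches $\{1, \ldots, k\}$ after finitely many steps, which is all that the argument needs.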
Formally,
$$\eta_i\, p^{(kL+m+P)}_{il}(y_{kL}) = \max_{j \in S} \eta_i\, p^{(kL+m+P-1)}_{ij}(y_{kL}) f_j(z'_m)\, p_{jl} = \max_{j \in C} \eta_i\, p^{(kL+m+P-1)}_{ij}(y_{kL}) f_j(z'_m)\, p_{jl} \stackrel{\text{by (56)}}{\le} \max_{j \in C} \eta_1\, p^{(kL+m+P-1)}_{1j}(y_{kL}) f_j(z'_m)\, p_{jl} \stackrel{\text{by (4)}}{=} \eta_1\, p^{(kL+m+P)}_{1l}(y_{kL}).$$
Therefore, by Definition 2.1, $y_{kL}$ is a 1-node of order $kL+m+P$.

16) Proof of (56): Let $i \in S$ and $j \in C$ be arbitrary. Let the state $j^* \in S$ be such that
$$p^{(kL+m+P-1)}_{ij}(y_{kL}) = p^{(kL-1)}_{ij^*}(y_{kL}) f_{j^*}(y_{2kL})\, p^{(m+P-1)}_{j^* j}(y_{2kL}) = \nu(i, j^*)\, p^{(m+P-1)}_{j^* j}(y_{2kL}),$$
where
$$\nu(i, j) \stackrel{\mathrm{def}}{=} p^{(kL-1)}_{ij}(y_{kL}) f_j(y_{2kL}) \qquad \text{for all } i, j \in S.$$
We consider the following two cases separately.

1. There exists a path realizing $p^{(kL-1)}_{ij^*}(y_{kL})$ that goes through state 1 at the time of observing $y_{lL}$ for some $l \in \{k, \ldots, 2k\}$:
$$p^{(kL-1)}_{ij^*}(y_{kL}) = p^{((l-k)L-1)}_{i1}(y_{kL}) f_1(y_{lL})\, p^{((2k-l)L-1)}_{1j^*}(y_{lL}). \quad (57)$$
Equation (57) above, together with the fundamental recursion (4), yields the following:
$$\eta_i\, p^{(kL-1)}_{ij^*}(y_{kL}) \stackrel{\text{by (57)}}{=} \eta_i\, p^{((l-k)L-1)}_{i1}(y_{kL}) f_1(y_{lL})\, p^{((2k-l)L-1)}_{1j^*}(y_{lL}) \stackrel{\text{by (21), (4)}}{\le} \delta_1(y_{lL})\, p^{((2k-l)L-1)}_{1j^*}(y_{lL}). \quad (58)$$
At the same time, the right-hand side of (58) can be expressed as follows:
$$\delta_1(y_{lL})\, p^{((2k-l)L-1)}_{1j^*}(y_{lL}) \stackrel{\text{by (55)}}{=} \eta_1\, p^{((l-k)L-1)}_{11}(y_{kL}) f_1(y_{lL})\, p^{((2k-l)L-1)}_{1j^*}(y_{lL}) \stackrel{\text{by (45)}}{=} \eta_1\, p^{(kL-1)}_{1j^*}(y_{kL}). \quad (59)$$
Therefore, if there exists $l \in \{k, \ldots, 2k\}$ such that (57) holds, we have, by virtue of (58) and (59),
$$\eta_i\, p^{(kL-1)}_{ij^*}(y_{kL}) \le \eta_1\, p^{(kL-1)}_{1j^*}(y_{kL}), \qquad \text{that is,} \qquad \eta_i\, \nu(i, j^*) \le \eta_1\, \nu(1, j^*). \quad (60)$$
Hence,
$$\eta_i\, p^{(kL+m+P-1)}_{ij}(y_{kL}) = \eta_i\, \nu(i, j^*)\, p^{(m+P-1)}_{j^* j}(y_{2kL}) \stackrel{\text{by (60)}}{\le} \eta_1\, \nu(1, j^*)\, p^{(m+P-1)}_{j^* j}(y_{2kL}) \stackrel{\text{by (4)}}{\le} \eta_1\, p^{(kL+m+P-1)}_{1j}(y_{kL}),$$
and (56) holds.

2. Assume now that no path exists to satisfy (57). Arguing as for (50), we obtain
$$\nu(i, j^*) < (1-\epsilon)^{k-1} \prod_{n=k+1}^{2k} p^{(L-1)}_{11}(y_{(n-1)L}) f_1(y_{nL}). \quad (61)$$
By (45), the (partial likelihood) product on the right-hand side of (61) equals $\nu(1, 1)$. Thus,
$$\eta_i\, \nu(i, j^*)\, p^{(m+P-1)}_{j^* j}(y_{2kL}) \stackrel{\text{by (61)}}{<} \eta_i\, (1-\epsilon)^{k-1} \nu(1, 1)\, p^{(m+P-1)}_{j^* j}(y_{2kL}) \stackrel{\text{by (18)}}{<} \eta_i\, q^2 \Bigl(\frac{\delta}{K}\Bigr)^{2m} A^{-R}\, \nu(1, 1)\, p^{(m+P-1)}_{j^* j}(y_{2kL}) \stackrel{\text{by (42), (48)}}{\le} \eta_1\, q \Bigl(\frac{\delta}{K}\Bigr)^{m} \nu(1, 1)\, p^{(m+P-1)}_{j^* j}(y_{2kL}). \quad (62)$$
Hence, for every $j' \in S$,
$$\eta_i\, \nu(i, j')\, p^{(m+P-1)}_{j' j}(y_{2kL}) \le \eta_i\, \nu(i, j^*)\, p^{(m+P-1)}_{j^* j}(y_{2kL}) \stackrel{\text{by (62)}}{<} \eta_1\, q \Bigl(\frac{\delta}{K}\Bigr)^{m} \nu(1, 1)\, p^{(m+P-1)}_{j^* j}(y_{2kL}) \stackrel{\text{by (30)}}{\le} \eta_1\, \Bigl(\frac{\delta}{K}\Bigr) \nu(1, 1)\, p^{(m+P-1)}_{1j}(y_{2kL}) < \eta_1\, \nu(1, 1)\, p^{(m+P-1)}_{1j}(y_{2kL}) \stackrel{\text{by (4)}}{\le} \eta_1\, p^{(kL+m+P-1)}_{1j}(y_{kL})$$
(the first inequality holds by the choice of $j^*$), which, by virtue of (4), implies (56).

17) Completion of the $s$-path to $q_{1:M}$ and conclusion: Finally, let $M = 2m + 2Lk + P + R + 2$, $r = kL + P + m$, and $l = 1$. Recall from §IV-A3 that $b_0 \in C$. Since all the entries of $Q^m$ are positive, there exists a path $v_{0:m-1}$ in $C$ such that $p_{v_i v_{i+1}} > 0$ and $p_{v_{m-1} b_0} > 0$. Similarly, there must exist a path $u_{1:m}$ in $C$ such that $p_{u_i u_{i+1}} > 0$ for all $i = 1, \ldots, m-1$, and $p_{a_P u_1} > 0$ (recall that $a_P \in C$). Hence, by these, and by the constructions of §IV-A5, all of the transitions of the following sequence occur with positive probabilities:
$$q_{1:M} \stackrel{\mathrm{def}}{=} (v_{0:m-1}, b_{0:R}, s_{1:2Lk}, a_{1:P}, u_{1:m}). \quad (63)$$
Clearly, the actual probability of observing $q_{1:M}$ is positive, as required.
By the constructions of §§IV-A1–IV-A3, the conditional probability, given $q_{1:M}$, of the set $B$ below is evidently positive, as required:
$$B \stackrel{\mathrm{def}}{=} Z^{m+1} \times X_{b_1} \times \cdots \times X_{b_{R-1}} \times X_1 \times X_{s_1} \times \cdots \times X_{s_{2kL-1}} \times X_1 \times X_{a_1} \times \cdots \times X_{a_P} \times Z^m.$$
Finally, since the sequence (20) below was chosen from $B$ arbitrarily (§IV-A6) and has been shown to be an $l$-barrier of order $r$, this completes the proof of the Lemma:
$$(z_{0:m}, y'_{1:R-1}, y_{0:2Lk}, y''_{1:P}, z'_{1:m}) \in B. \quad (20)$$

B. Proof of Lemma 3.2

Proof: We use the notation of the preceding proof (§IV-A) and consider the following two distinct situations. In the first (§IV-B1), all barriers from $B$, as constructed in the proof above, are already separated; obviously, there is nothing to do in this case. The second situation (§IV-B2) is complementary, and in that case a simple extension will immediately ensure separation.

1) All $y \in B$ are already separated: Recall the definition of $Z$ from §IV-A2, and consider the two cases of that definition separately. First, suppose $Z = \hat{Z} \setminus (\cup_{l \in S} X_l)$, in which case $Z$ and $X_l$ are disjoint for every $l \in S$. This implies that every barrier (20) is already separated. Indeed, for any $w$, $1 \le w \le r$, and for any $y \in B$, the fact that $y_{M-\max(m,w)} \notin Z$, for example, makes it impossible that $(y'_{1:w}, y_{1:M-w}) \in B$ for any $y'_{1:w} \in \mathcal{X}^w$. Consider now the case when $Z = \hat{Z} \cap X_s$ for some $s \in C$. Then
$$B \subset X_s^{m+1} \times X_{b_1} \times \cdots \times X_{b_{R-1}} \times X_1 \times X_{s_1} \times \cdots \times X_{s_{2kL-1}} \times X_1 \times X_{a_1} \times \cdots \times X_{a_{P-1}} \times X_s^{m+1}. \quad (64)$$
Let $y \in B$ be arbitrary. Assume first $L > 1$. By construction (§IV-A3), the states $s_1, \ldots, s_L$ are all distinct. We now show that $(y'_{1:w}, y_{1:M-w}) \notin B$ for any $y'_{1:w} \in \mathcal{X}^w$ when $1 \le w \le r$.
Note that the sequence
$$q_{m+2:m+R+2kL+P+1} = (b_{1:R-1}, 1, s_{1:2kL-1}, 1, a_{1:P-1}, s)$$
is such that no two consecutive states are equal. It is straightforward to verify that there exist indices $j$, $0 \le j \le m-1$, such that, when shifted $w$ positions to the right, the pair $y_{j+1:j+2} \in X_s^2$ would at the same time have to belong to $X_{q_{j+1+w}} \times X_{q_{j+2+w}}$ with $m+1 \le j+1+w < j+2+w \le m+R+2kL+1+P$. This is clearly a contradiction, since $X_{q_{j+1+w}}$ and $X_{q_{j+2+w}}$ are disjoint for that range of indices $j$. A verification of the above fact simply amounts to verifying that the inequality
$$\max(0,\, m-w) \le j \le \min(m-1,\ m+R+2kL-1+P-w)$$
is consistent for any $w$ from the admissible range:
i.) When $0 \ge m-w$ and $m-1 \le m+R+2kL-1+P-w$ (i.e. $m \le w \le \min(r, R+2kL+P)$), $0 \le j \le m-1$ is evidently consistent.
ii.) When $0 \ge m-w$ and $m-1 > m+R+2kL-1+P-w$ (i.e. $\max(m, R+2kL+P+1) \le w \le r$), $0 \le j \le m+R+2kL-1+P-w$ is also consistent, since $m+R+2kL-1+P-r = R+kL-1 \ge 0$.
iii.) When $0 < m-w$ and $m-1 \le m+R+2kL-1+P-w$ (i.e. $1 \le w \le \min(m-1, R+2kL+P)$), $m-w \le j \le m-1$ is consistent, since $w \ge 1$.
iv.) When $0 < m-w$ and $m-1 > m+R+2kL-1+P-w$ (i.e. $\max(1, R+2kL+P+1) \le w < m$), $m-w \le j \le m+R+2kL-1+P-w$ is consistent, since $R+2kL-1 \ge 0$.

Next, consider the case of $L = 1$ but $s \ne 1$ (that is, $P > 0$). Then
$$B \subset X_s^{m+1} \times X_{b_1} \times \cdots \times X_{b_{R-1}} \times X_1^{2k+1} \times X_{a_1} \times \cdots \times X_{a_{P-1}} \times X_s^{m+1}.$$
If $s \ne 1$, then also $b_i \ne 1$, $i = 1, \ldots, R-1$, and $a_i \ne 1$, $i = 1, \ldots, P-1$. To see that $y$ is separated in this case, simply note that $y_{M-\max(w,\, m+1)} \notin X_s$ for any admissible $w$.

2) Barriers $y \in B$ need not be separated: Finally, we consider the case when $L = 1$ and $s = 1$ (where $s \in C$ is such that $Z = \hat{Z} \cap X_s$).
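The consistency of the index interval in cases i.)–iv.) above can also be checked numerically by brute force over all admissible shifts $w$. A sketch, with the function and argument names our own:

```python
def pair_index_exists(m, R, k, L, P):
    """Check that for every shift w, 1 <= w <= r = kL + P + m, the interval
        max(0, m - w) <= j <= min(m - 1, m + R + 2*k*L - 1 + P - w)
    is nonempty, i.e. some pair y_{j+1:j+2} in X_s x X_s is forced, after the
    shift, onto two consecutive distinct (hence disjoint) X-sets."""
    r = k * L + P + m
    return all(
        max(0, m - w) <= min(m - 1, m + R + 2 * k * L - 1 + P - w)
        for w in range(1, r + 1)
    )
```

Exhausting small values of $m \ge 1$, $R \ge 1$, $k \ge 1$, $L \ge 1$, $P \ge 0$ confirms the case analysis: the interval is nonempty for every admissible $w$.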
This implies that $P = 0$, $1 \in C$, and $p_{11} > 0$, which in turn implies that $R = 1$ and
$$B \subset X_1^{m+1} \times X_1^{2k+1} \times X_1^{m+1} = X_1^{2m+2k+3}.$$
Clearly, the barriers from $B$ need not be, and indeed are not, separated. It is, however, easy to extend them to separated ones. Indeed, let $q_0 \ne 1$ be such that $p_{q_0 1} > 0$, and redefine $B \stackrel{\mathrm{def}}{=} X_{q_0} \times B$. Evidently, any shift of any $y \in B$ by $w$ ($1 \le w \le r$) positions to the right makes it impossible for $y_1$ to be simultaneously in $X_{q_0}$ and in $X_1$ (since the latter sets are disjoint, §IV-A1).

V. CONCLUSION

As discussed in §I, and in §I-A in particular, the proper infinite alignments (§II-B) allow us to define the decoding process $V$, which is regenerative and can further be stationarized to become ergodic [7]. This in turn allows us to study the distribution and asymptotic properties not only of the Viterbi process $V$ but also of the joint process $(X, V)$. In particular, this reveals how different these properties are from the properties of the underlying chain $Y$ and the HMM $(X, Y)$, respectively. More specifically, since the process $V$ (resp. $(X, V)$) can deviate from the process $Y$ (resp. $(X, Y)$) significantly, using the Viterbi alignments $v_{1:n}$ as estimates for the hidden paths $Y_{1:n}$ might lead to incorrect conclusions not only for finite $n$ (as generally appreciated) but also in the limit as $n \to \infty$ [7]. This certainly does not mean that one should not make inference based on $V$; it simply suggests that the aforementioned differences may need to be taken into account. One example of how these asymptotic differences can be successfully accounted for is the adjusted Viterbi training for HMM parameter estimation [11], [12], [7]. If known, or possibly estimated, these differences might also be appreciated when the Viterbi paths are used for prediction, or segmentation, of $Y$, e.g. in speech segmentation, in segmentation of DNA sequences into coding and non-coding regions, or in detection of CpG islands in DNA sequences [15]. Indeed, in segmentation of DNA sequences, the underlying chain $Y$ has few, often two, states (e.g. coding and non-coding regions, or CpG islands and non-CpG regions), and the probabilities of transitions between the states are very low; hence both the true ($Y$) and the predicted ($V$) hidden paths consist of long constant blocks. At the same time, it has been noted that the predicted constant blocks can be somewhat longer than what the chain parameters would suggest. With the help of the infinite Viterbi process $V$, it is now clear that this discrepancy is not simply due to random fluctuations but is systematic, does not vanish asymptotically, and is a direct consequence of the fact that the transition probabilities of $V$ do indeed often underestimate the true ones. Note that in these examples, unlike in the estimation of the HMM emission parameters, the overall performance is directly linked to the accuracy of the transition probability estimates. Thus, finding the differences between the processes $(X, Y)$ and $(X, V)$ in this case might help find better alignments.

ACKNOWLEDGMENT

The first author has been supported by the Estonian Science Foundation Grant 7553. The authors thank Eurandom (The Netherlands) for initiating and stimulating their research on hidden Markov models, of which this work has been an integral part. The authors also thank Dr. A. Caliebe for valuable discussions and for emphasizing the significance of the topic of path estimation in HMMs.

REFERENCES

[1] Y. Ephraim and N. Merhav, "Hidden Markov processes," IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1518–1569, 2002, special issue on Shannon theory: perspectives, trends, and applications.
[2] V. Genon-Catalot, T. Jeantheau, and C. Larédo, "Stochastic volatility models as hidden Markov models and statistical applications," Bernoulli, vol. 6, no. 6, pp. 1051–1079, 2000.
[3] B. G. Leroux, "Maximum-likelihood estimation for hidden Markov models," Stochastic Process. Appl., vol. 40, no. 1, pp. 127–143, 1992.
[4] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Inform. Theory, vol. 13, no. 2, pp. 260–269, 1967.
[5] L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[6] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, ser. Springer Series in Statistics. New York: Springer, 2005, with Randal Douc's contributions to Chapter 9 and Christian P. Robert's to Chapters 6, 7 and 13, with Chapter 14 by Gersende Fort, Philippe Soulier and Moulines, and Chapter 15 by Stéphane Boucheron and Elisabeth Gassiat.
[7] J. Lember and A. Koloydenko, "The adjusted Viterbi training for hidden Markov models," Bernoulli, vol. 14, no. 1, pp. 180–206, 2008.
[8] A. Caliebe and U. Rösler, "Convergence of the maximum a posteriori path estimator in hidden Markov models," IEEE Trans. Inform. Theory, vol. 48, no. 7, pp. 1750–1758, 2002.
[9] A. Caliebe, "Properties of the maximum a posteriori path estimator in hidden Markov models," IEEE Trans. Inform. Theory, vol. 52, no. 1, pp. 41–51, 2006.
[10] J. A. Kogan, "Hidden Markov models estimation via the most informative stopping times for the Viterbi algorithm," in Image Models (and Their Speech Model Cousins) (Minneapolis, MN, 1993/1994), ser. IMA Vol. Math. Appl. New York: Springer, 1996, vol. 80, pp. 115–130.
[11] J. Lember and A. Koloydenko, "Adjusted Viterbi training: A proof of concept," Probab. Eng. Inf. Sci., vol. 21, no. 3, pp. 451–475, 2007.
[12] A. Koloydenko, M. Käärik, and J. Lember, "On adjusted Viterbi training," Acta Appl. Math., vol. 96, no. 1-3, pp. 309–326, 2007.
[13] F. Jelinek, "Continuous speech recognition by statistical methods," Proc. IEEE, vol. 64, pp. 532–556, 1976.
[14] J. Lember and A. Koloydenko, "Infinite Viterbi alignments in the two-state hidden Markov models," in Proc. 8th Tartu Conf. Multivariate Statist., July 2007, accepted.
[15] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.

Jüri Lember was born in 1968 in Tallinn, Estonia. He received the diploma and M.Sc. degrees in mathematical statistics in 1992 and 1994, respectively, from the University of Tartu, Estonia. He received the Ph.D. degree in mathematics from the University of Tartu, Estonia, in 1999. He completed his compulsory military service in 1987–1989, and was a Postdoctoral Research Fellow in the Institute of Mathematical Statistics, University of Tartu, in 1999–2000. He held a Postdoctoral Research position at Eurandom, The Netherlands, in 2001–2003. Since 2003, he has been a Lecturer and a Senior Researcher in the Institute of Mathematical Statistics, University of Tartu. His scientific interests include probability theory, theoretical statistics, information theory, and speech recognition. Dr. Lember has been a member of the Estonian Statistical Society as well as the Estonian Mathematical Society since 2003. He has been awarded Estonian Science Foundation grants for the periods 2004–2007 and 2008–2011.

Alexey Koloydenko received the B.S. degrees in physics and mathematics (with an information systems minor) in 1994 from the Voronezh University, Russian Federation, and Norwich University, USA, respectively. He received in 1996 the M.S. (tech.) degree in physics and radio-electronics from the Voronezh University, Russian Federation, and the M.S. degree in mathematics and statistics from the University of Massachusetts at Amherst, USA. He received the Ph.D. degree in mathematics and statistics from the University of Massachusetts at Amherst, USA, in 2000. He held Postdoctoral Research and Teaching positions with the Department of Mathematics and Statistics of the University of Massachusetts at Amherst, the Statistics and Computer Science Departments of the University of Chicago, and Eurandom, The Netherlands, in 2000, 2001–2002, and 2002–2005, respectively. Since 2005 he has been a Lecturer in Statistics at the University of Nottingham, UK. His research interests include statistical processing and analysis of images, diffusion weighted MRI, algebraic aspects of probability theory and statistics, and hidden Markov models. Dr. Koloydenko has been a member of the Pattern Analysis, Statistical Modelling and Computational Learning European network (PASCAL) since 2004.