Asymptotics of input-constrained binary symmetric channel capacity
We study the classical problem of noisy constrained capacity in the case of the binary symmetric channel (BSC), namely, the capacity of a BSC whose inputs are sequences chosen from a constrained set. Motivated by a result of Ordentlich and Weissman [In Proceedings of IEEE Information Theory Workshop (2004) 117–122], we derive an asymptotic formula (when the noise parameter is small) for the entropy rate of a hidden Markov chain, observed when a Markov chain passes through a BSC. Using this result, we establish an asymptotic formula for the capacity of a BSC with input process supported on an irreducible finite type constraint, as the noise parameter tends to zero.
Authors: Guangyue Han, Brian Marcus
The Annals of Applied Probability 2009, Vol. 19, No. 3, 1063–1091. DOI: 10.1214/08-AAP570. © Institute of Mathematical Statistics, 2009.

ASYMPTOTICS OF INPUT-CONSTRAINED BINARY SYMMETRIC CHANNEL CAPACITY

By Guangyue Han and Brian Marcus

University of Hong Kong and University of British Columbia

We study the classical problem of noisy constrained capacity in the case of the binary symmetric channel (BSC), namely, the capacity of a BSC whose inputs are sequences chosen from a constrained set. Motivated by a result of Ordentlich and Weissman [In Proceedings of IEEE Information Theory Workshop (2004) 117–122], we derive an asymptotic formula (when the noise parameter is small) for the entropy rate of a hidden Markov chain, observed when a Markov chain passes through a BSC. Using this result, we establish an asymptotic formula for the capacity of a BSC with input process supported on an irreducible finite type constraint, as the noise parameter tends to zero.

1. Introduction and background. Let $X$, $Y$ be discrete random variables with alphabets $\mathcal{X}$, $\mathcal{Y}$ and joint probability mass function $p_{X,Y}(x,y) \triangleq P(X = x, Y = y)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$ [for notational simplicity, we will write $p(x,y)$ rather than $p_{X,Y}(x,y)$, and similarly $p(x)$, $p(y)$ rather than $p_X(x)$, $p_Y(y)$, resp., when it is clear from the context]. The entropy $H(X)$ of the discrete random variable $X$, which measures the level of uncertainty of $X$, is defined as (in this paper $\log$ is taken to mean the natural logarithm)

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x).$$

The conditional entropy $H(Y|X)$, which measures the level of uncertainty of $Y$ given $X$, is defined as

$$H(Y|X) = \sum_{x \in \mathcal{X}} p(x) H(Y|X = x).$$

Received September 2008. Supported by the University of Hong Kong under Grant No.
200709159007, and supported by the Research Grants Council of the Hong Kong Special Administrative Region, China, under Grant No. HKU 701708P.

AMS 2000 subject classifications. Primary 60K99, 94A15; secondary 60J10.

Key words and phrases. Hidden Markov chain, entropy, constrained capacity.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Probability, 2009, Vol. 19, No. 3, 1063–1091. This reprint differs from the original in pagination and typographic detail.

Equivalently,

$$H(Y|X) \triangleq -\sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y|x) \log p(y|x) = -\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} p(x,y) \log p(y|x).$$

The definitions above naturally include the case when $X$, $Y$ are vector-valued variables, for example, $X = X_k^\ell \triangleq (X_k, X_{k+1}, \ldots, X_\ell)$, a sequence of discrete random variables. For a left-infinite discrete stationary stochastic process $X = X_{-\infty}^0 \triangleq \{X_i : i = 0, -1, -2, \ldots\}$, the entropy rate of $X$ is defined to be

$$H(X) = \lim_{n \to \infty} \frac{1}{n+1} H(X_{-n}^0), \qquad (1.1)$$

where $H(X_{-n}^0)$ denotes the entropy of the vector-valued random variable $X_{-n}^0$. Given another stationary process $Y = Y_{-\infty}^0$, we similarly define the conditional entropy rate

$$H(Y|X) = \lim_{n \to \infty} \frac{1}{n+1} H(Y_{-n}^0 | X_{-n}^0). \qquad (1.2)$$

A simple monotonicity argument on page 64 of [8] shows the existence of the limit in (1.1). Using the chain rule for entropy (see page 21 of [8]), we obtain

$$H(Y_{-n}^0 | X_{-n}^0) = H(X_{-n}^0, Y_{-n}^0) - H(X_{-n}^0),$$

and so we can apply the same argument to the processes $(X, Y)$ and $X$ to obtain the limit in (1.2). If $Y = Y_{-\infty}^0$ is a stationary finite-state Markov chain, then $H(Y)$ has a simple analytic form. Specifically, denoting by $\Delta$ the transition probability matrix of $Y$, we have

$$H(Y) = H(Y_0 | Y_{-1}) = -\sum_{i,j} P(Y_0 = i)\, \Delta(i,j) \log \Delta(i,j). \qquad (1.3)$$
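As a quick numerical illustration of (1.3) (ours, not part of the original article), the entropy rate of a finite-state stationary Markov chain can be evaluated directly from its transition matrix, using the natural logarithm as the paper does:

```python
import math

import numpy as np

def stationary_distribution(Delta):
    """Stationary (left Perron) vector: pi @ Delta = pi, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(Delta.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def markov_entropy_rate(Delta):
    """H(Y) = -sum_{i,j} pi_i Delta(i,j) log Delta(i,j), as in (1.3)."""
    pi = stationary_distribution(Delta)
    # Guard the log at zero entries: 0 * log 0 is taken to be 0.
    terms = np.where(Delta > 0, Delta * np.log(np.where(Delta > 0, Delta, 1.0)), 0.0)
    return -float(pi @ terms.sum(axis=1))

# A fair-coin chain has entropy rate log 2 (in nats).
print(markov_entropy_rate(np.array([[0.5, 0.5], [0.5, 0.5]])))
```

The zero-entry guard matters below, where the paper's main interest is precisely chains with some transition probabilities equal to zero.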
A function $Z = Z_{-\infty}^0$ of the stationary Markov chain $Y$ with the form $Z_i = \Phi(Y_i)$ is called a hidden Markov chain; here $\Phi$ is a function defined on the alphabet of $Y_i$, taking values in the alphabet of $Z_i$. We often write $Z = \Phi(Y)$. Hidden Markov chains are typically not Markov.

For a hidden Markov chain $Z$, the entropy rate $H(Z)$ was studied by Blackwell [6] as early as 1957, where the analysis suggested the intrinsic complexity of $H(Z)$ as a function of the process parameters. He gave an expression for $H(Z)$ in terms of a measure $Q$ on a simplex, obtained by solving an integral equation dependent on the parameters of the process. However, the measure is difficult to extract from the equation in any explicit way, and the entropy rate is difficult to compute.

Recently, the problem of computing the entropy rate of a hidden Markov chain has drawn much interest, and many approaches have been adopted to tackle this problem. These include asymptotic expansions as Markov chain parameters tend to extremes [14, 17, 18, 22, 23, 34, 35], analyticity results [13], variations on a classical bound [9] and efficient Monte Carlo methods [2, 27, 31]; connections with the top Lyapunov exponent of a random matrix product have also been observed [11, 15, 16, 17], relating to earlier work on Lyapunov exponents [4, 25, 26, 28].

Of particular interest are hidden Markov chains which arise as output processes of noisy channels. For example, the binary symmetric channel with crossover probability $\varepsilon$ [denoted BSC($\varepsilon$)] is an object which transforms input processes to output processes by means of a fixed i.i.d. binary noise process $E = \{E_n\}$ with $p_{E_n}(0) = 1 - \varepsilon$ and $p_{E_n}(1) = \varepsilon$.
Specifically, given an arbitrary binary input process $X = \{X_n\}$, which is independent of $E$, define at time $n$, $Z_n(\varepsilon) = X_n \oplus E_n$, where $\oplus$ denotes binary addition modulo 2; then $Z_\varepsilon = \{Z_n(\varepsilon)\}$ is the output process corresponding to $X$.

When the input $X$ is a stationary Markov chain, the output $Z_\varepsilon$ can be viewed as a hidden Markov chain by appropriately augmenting the state space of $X$ [10]; specifically, in the case that $X$ is a first order binary Markov chain with transition probability matrix

$$\Pi = \begin{pmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{pmatrix},$$

then $Y_\varepsilon = \{Y_n(\varepsilon)\} = \{(X_n, E_n)\}$ is jointly Markov with transition probability matrix (rows and columns indexed by the states $(0,0), (0,1), (1,0), (1,1)$)

$$\Delta = \begin{pmatrix} \pi_{00}(1-\varepsilon) & \pi_{00}\varepsilon & \pi_{01}(1-\varepsilon) & \pi_{01}\varepsilon \\ \pi_{00}(1-\varepsilon) & \pi_{00}\varepsilon & \pi_{01}(1-\varepsilon) & \pi_{01}\varepsilon \\ \pi_{10}(1-\varepsilon) & \pi_{10}\varepsilon & \pi_{11}(1-\varepsilon) & \pi_{11}\varepsilon \\ \pi_{10}(1-\varepsilon) & \pi_{10}\varepsilon & \pi_{11}(1-\varepsilon) & \pi_{11}\varepsilon \end{pmatrix},$$

and $Z_\varepsilon = \{Z_n(\varepsilon)\}$ is a hidden Markov chain with $Z_n(\varepsilon) = \Phi(Y_n(\varepsilon))$, where $\Phi$ maps states $(0,0)$ and $(1,1)$ to $0$ and maps states $(0,1)$ and $(1,0)$ to $1$.

In Section 2 we give asymptotics for the entropy rate of a hidden Markov chain, obtained by passing a binary Markov chain, of arbitrary order, through BSC($\varepsilon$) as the noise $\varepsilon$ tends to zero. In Section 2.1 we review, from [18], the result when the transition probabilities are strictly positive. In Section 2.2 we develop the formula when some transition probabilities are zero (which is our main focus), thereby generalizing a specific result from [23].

The remainder of the paper is devoted to asymptotics for noisy constrained channel capacity. The capacity of the (unconstrained) BSC($\varepsilon$) is defined

$$C(\varepsilon) = \lim_{n \to \infty} \sup_{X_{-n}^0} \frac{1}{n+1} \bigl( H(Z_{-n}^0(\varepsilon)) - H(Z_{-n}^0(\varepsilon) \,|\, X_{-n}^0) \bigr); \qquad (1.4)$$

here $X_{-n}^0$ is a finite-length input process from time $-n$ to $0$ and $Z_{-n}^0(\varepsilon)$ is the corresponding output process.
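To make the state-space augmentation concrete, the following sketch (ours, not from the article) builds $\Delta$ from $\Pi$ and $\varepsilon$ with states ordered $(0,0), (0,1), (1,0), (1,1)$, together with the map $\Phi((x,e)) = x \oplus e$:

```python
import numpy as np

def joint_transition_matrix(Pi, eps):
    """Delta for Y_n = (X_n, E_n): the row for (x, e) assigns column (x2, e2)
    probability Pi[x][x2] * P(E = e2), since the noise is i.i.d. and
    independent of X (so the current e does not affect the next state)."""
    p_e = (1.0 - eps, eps)
    Delta = np.zeros((4, 4))
    for x in range(2):
        for e in range(2):
            for x2 in range(2):
                for e2 in range(2):
                    Delta[2 * x + e, 2 * x2 + e2] = Pi[x][x2] * p_e[e2]
    return Delta

def Phi(state):
    """Phi maps state (x, e) to the channel output bit x XOR e."""
    x, e = divmod(state, 2)
    return x ^ e
```

Note that rows $(0,0)$ and $(0,1)$ coincide, as do rows $(1,0)$ and $(1,1)$: the next state depends on the pair $(x, e)$ only through $x$.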
Seminal results of information theory, due to Shannon [30], include the following: (1) the capacity is the optimal rate of transmission possible with arbitrarily small probability of error, and (2) the capacity can be explicitly computed: $C(\varepsilon) = 1 - H(\varepsilon)$, where $H(\varepsilon)$ is the binary entropy function defined as

$$H(\varepsilon) = \varepsilon \log(1/\varepsilon) + (1-\varepsilon) \log(1/(1-\varepsilon)).$$

Generally speaking, it is very difficult to calculate the capacity of a generic channel. For a discrete memoryless channel without input constraints, the Blahut–Arimoto algorithm [1, 7] can be applied to approximate the capacity numerically. A generalized Blahut–Arimoto algorithm has been proposed to numerically compute the local maximum mutual information rate of a finite state machine channel [32].

We are interested in input-constrained channel capacity, i.e., the capacity of BSC($\varepsilon$) where the possible inputs are constrained, described as follows. Let $\mathcal{X} = \{0,1\}$, let $\mathcal{X}^*$ denote all the finite length binary words, and let $\mathcal{X}^n$ denote all the binary words of length $n$. A binary finite type constraint [20, 21] $\mathcal{S}$ is a subset of $\mathcal{X}^*$ defined by a finite set (denoted by $\mathcal{F}$) of forbidden words; in other words, no element of $\mathcal{S}$ contains any element of $\mathcal{F}$ as a contiguous subsequence. A prominent example is the $(d,k)$-RLL constraint $\mathcal{S}(d,k)$, which forbids any sequence with fewer than $d$ or more than $k$ consecutive zeros in between two 1's. For $\mathcal{S}(d,k)$ with $k < \infty$, a forbidden set $\mathcal{F}$ is

$$\mathcal{F} = \{1\underbrace{0\cdots0}_{l}1 : 0 \le l < d\} \cup \{\underbrace{0\cdots0}_{k+1}\}.$$

When $k = \infty$, one can choose $\mathcal{F}$ to be

$$\mathcal{F} = \{1\underbrace{0\cdots0}_{l}1 : 0 \le l < d\};$$

in particular, when $d = 1$, $k = \infty$, $\mathcal{F}$ can be chosen to be $\{11\}$. These constraints on input sequences arise in magnetic recording in order to eliminate the most damaging error events [21].
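A small sketch of Shannon's formula (ours, not from the article): the statement $C(\varepsilon) = 1 - H(\varepsilon)$ reads naturally in bits, while the paper's $\log$ is natural, so we compute $H(\varepsilon)$ in nats and divide by $\log 2$ to convert:

```python
import math

def binary_entropy_nats(eps):
    """H(eps) = eps log(1/eps) + (1 - eps) log(1/(1 - eps)), natural log,
    with the usual convention H(0) = H(1) = 0."""
    if eps <= 0.0 or eps >= 1.0:
        return 0.0
    return eps * math.log(1.0 / eps) + (1.0 - eps) * math.log(1.0 / (1.0 - eps))

def bsc_capacity_bits(eps):
    """Unconstrained BSC capacity 1 - H(eps), with H measured in bits."""
    return 1.0 - binary_entropy_nats(eps) / math.log(2.0)

print(bsc_capacity_bits(0.0), bsc_capacity_bits(0.5))  # 1.0 0.0
```

A noiseless channel ($\varepsilon = 0$) carries one bit per symbol; at $\varepsilon = 1/2$ the output is independent of the input and the capacity is zero.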
We will use $\mathcal{S}_n$ to denote the subset of $\mathcal{S}$ consisting of words of length $n$. A finite type constraint $\mathcal{S}$ is irreducible if for any $u, v \in \mathcal{S}$, there is a $w \in \mathcal{S}$ such that $uwv \in \mathcal{S}$.

For a finite binary stochastic (not necessarily stationary) process $X = X_{-n}^0$, define the set of allowed words with respect to $X$ as

$$\mathcal{A}(X_{-n}^0) = \{w_{-n}^0 \in \mathcal{X}^{n+1} : P(X_{-n}^0 = w_{-n}^0) > 0\}.$$

For a left-infinite binary stochastic (again not necessarily stationary) process $X = X_{-\infty}^0$, define the set of allowed words with respect to $X$ as

$$\mathcal{A}(X) = \{w_{-m}^0 \in \mathcal{X}^* : m \ge 0,\ P(X_{-m}^0 = w_{-m}^0) > 0\}.$$

For a constrained BSC($\varepsilon$) with input sequences in $\mathcal{S}$, the noisy constrained capacity $C(\mathcal{S}, \varepsilon)$ is defined as

$$C(\mathcal{S}, \varepsilon) = \lim_{n \to \infty} \sup_{\mathcal{A}(X_{-n}^0) \subseteq \mathcal{S}} \frac{1}{n+1} \bigl( H(Z_{-n}^0(\varepsilon)) - H(Z_{-n}^0(\varepsilon) \,|\, X_{-n}^0) \bigr),$$

where again $Z_{-n}^0(\varepsilon)$ is the output process corresponding to the input process $X_{-n}^0$. Let $\mathcal{P}$ (resp. $\mathcal{P}_n$) denote the set of all left-infinite (resp. length $n$) stationary processes over the alphabet $\mathcal{X}$. Using the approach in Section 12.4 of [12], one can show that

$$C(\mathcal{S}, \varepsilon) = \lim_{n \to \infty} \sup_{X_{-n}^0 \in \mathcal{P}_{n+1},\, \mathcal{A}(X_{-n}^0) \subseteq \mathcal{S}} \frac{1}{n+1} \bigl( H(Z_{-n}^0(\varepsilon)) - H(Z_{-n}^0(\varepsilon) \,|\, X_{-n}^0) \bigr) \qquad (1.5)$$

$$= \sup_{X \in \mathcal{P},\, \mathcal{A}(X) \subseteq \mathcal{S}} H(Z_\varepsilon) - H(Z_\varepsilon | X),$$

where $Z_{-n}^0(\varepsilon)$, $Z_\varepsilon$ are the output processes corresponding to the input processes $X_{-n}^0$, $X$, respectively.

In Section 3 we apply the results of Section 2 to derive an asymptotic formula for the capacity of the input-constrained BSC($\varepsilon$) (again as $\varepsilon$ tends to zero) for any irreducible finite type input constraint. In Section 4 we consider the special case of the $(d,k)$-RLL constraint, and compute the coefficients of the asymptotic formulas.
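As an illustrative sketch (ours), membership of a finite word in $\mathcal{S}(d,k)$ can be checked directly from the forbidden-word description: reject any run of more than $k$ zeros, and any run of fewer than $d$ zeros sandwiched between two 1's:

```python
def satisfies_rll(word, d, k=float("inf")):
    """Check the (d, k)-RLL constraint on a sequence of bits.

    Forbidden factors: 1 0^l 1 with l < d, and 0^(k+1) anywhere.
    Runs of zeros at either end of the word only need to be <= k."""
    run = 0          # length of the current run of zeros
    seen_one = False # have we passed a 1 yet?
    for b in word:
        if b == 0:
            run += 1
            if run > k:
                return False
        else:
            if seen_one and run < d:
                return False
            seen_one = True
            run = 0
    return True
```

For instance, with $d = 1$, $k = \infty$ this is exactly the "no 11" check used repeatedly below.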
Regarding prior work on $C(\mathcal{S}, \varepsilon)$, the best results in the literature have been in the form of bounds and numerical simulations based on producing random (and, hopefully, typical) channel output sequences (see, e.g., [3, 29, 33] and references therein). These methods allow for fairly precise numerical approximations of the capacity for given constraints and channel parameters.

For a more detailed introduction to entropy, capacity and related concepts in information theory, we refer to standard textbooks such as [8, 12].

2. Asymptotics of entropy rate. Consider a BSC($\varepsilon$) and suppose the input is an $m$th order irreducible Markov chain $X$ defined by the transition probabilities $P(X_t = a_0 \,|\, X_{t-m}^{t-1} = a_{-m}^{-1})$, $a_{-m}^0 \in \mathcal{X}^{m+1}$; here again $\mathcal{X} = \{0,1\}$, and the output hidden Markov chain will be denoted by $Z_\varepsilon$.

2.1. When transition probabilities of $X$ are all positive. This case is treated in [18]:

Theorem 2.1 ([18], Theorem 3). If $P(X_t = a_0 \,|\, X_{t-m}^{t-1} = a_{-m}^{-1}) > 0$ for all $a_{-m}^0 \in \mathcal{X}^{m+1}$, the entropy rate of $Z_\varepsilon$ for small $\varepsilon$ is

$$H(Z_\varepsilon) = H(X) + g(X)\,\varepsilon + O(\varepsilon^2), \qquad (2.1)$$

where, denoting by $\bar{z}_i$ the Boolean complement of $z_i$, and $\check{z}_1^{2m+1} = z_1 \cdots z_m \bar{z}_{m+1} z_{m+2} \cdots z_{2m+1}$, we have

$$g(X) = \sum_{z_1^{2m+1} \in \mathcal{X}^{2m+1}} P_X(z_1^{2m+1}) \log \frac{P_X(z_1^{2m+1})}{P_X(\check{z}_1^{2m+1})}. \qquad (2.2)$$

We remark that the expression here for $g(X)$ is a familiar quantity in information theory, known as the Kullback–Leibler divergence; specifically, $g(X)$ is the divergence between the two distributions $P_X(z_1^{2m+1})$ and $P_X(\check{z}_1^{2m+1})$. In [18] a complete proof is given for first-order Markov chains, as well as a sketch of the generalization to higher order Markov chains.
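For a first order chain ($m = 1$) with strictly positive transitions, the sum (2.2) runs over the eight words $z_1^3$ and can be evaluated by brute force; a small sketch of ours:

```python
import itertools
import math

def g_first_order(Pi):
    """g(X) of Theorem 2.1 for m = 1: the divergence between the law of
    (Z_1, Z_2, Z_3) under X and the same law with the middle bit flipped.
    Assumes a 2-state chain with all transition probabilities positive."""
    p01, p10 = Pi[0][1], Pi[1][0]
    pi = (p10 / (p01 + p10), p01 / (p01 + p10))  # stationary distribution

    def P(z):
        return pi[z[0]] * Pi[z[0]][z[1]] * Pi[z[1]][z[2]]

    total = 0.0
    for z in itertools.product((0, 1), repeat=3):
        z_check = (z[0], 1 - z[1], z[2])  # flip the middle bit
        total += P(z) * math.log(P(z) / P(z_check))
    return total
```

As a divergence, $g(X) \ge 0$, with equality exactly when the law of $z_1^3$ is invariant under flipping the middle bit (e.g., the fair-coin chain).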
Alternatively, after appropriately enlarging the state space of $X$ to convert the $m$th order Markov chain to a first order Markov chain, we can use Theorem 1.1 of [13] to show that $H(Z_\varepsilon)$ is analytic with respect to $\varepsilon$ at $\varepsilon = 0$, and Theorem 2.5 of [14] to show that all the derivatives of $H(Z_\varepsilon)$ at $\varepsilon = 0$ can be computed explicitly (in principle) without taking limits. Theorem 2.1 does this explicitly (in fact) for the first derivative.

2.2. When transition probabilities of $X$ are not necessarily all positive. First consider the case when $X$ is a binary first order Markov chain with the transition probability matrix

$$\begin{pmatrix} 1-p & p \\ 1 & 0 \end{pmatrix}, \qquad (2.3)$$

where $0 \le p \le 1$. This process generates sequences satisfying the $(d,k) = (1,\infty)$-RLL constraint, which simply means that the string 11 is forbidden. Sequences generated by the output process $Z_\varepsilon$, however, will generally not satisfy the constraint. The probability of the constraint-violating sequences at the output of the channel is polynomial in $\varepsilon$, which will generally contribute a term $O(\varepsilon \log \varepsilon)$ to the entropy rate $H(Z_\varepsilon)$ when $\varepsilon$ is small. This was already observed for the probability transition matrix (2.3) in [23], where it is shown that

$$H(Z_\varepsilon) = H(X) + \frac{p(2-p)}{1+p}\,\varepsilon \log(1/\varepsilon) + O(\varepsilon) \qquad (2.4)$$

as $\varepsilon \to 0$.

In the following we shall generalize formulas (2.1) and (2.4) and derive a formula for the entropy rate of any hidden Markov chain $Z_\varepsilon$, obtained when passing a Markov chain $X$ of any order $m$ through a BSC($\varepsilon$). We will apply the Birch bounds [5], for $n \ge m$, which yield

$$H(Z_0(\varepsilon) \,|\, Z_{-n+m}^{-1}(\varepsilon), X_{-n}^{-n+m-1}, E_{-n}^{-n+m-1}) \le H(Z_\varepsilon) \le H(Z_0(\varepsilon) \,|\, Z_{-n}^{-1}(\varepsilon)). \qquad (2.5)$$
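The flavor of (2.4) can be checked numerically (our sketch, not from the article): for the chain (2.3) with $p = 1/2$, the Birch upper bound $H(Z_0(\varepsilon) \,|\, Z_{-5}^{-1}(\varepsilon))$ can be computed exactly by summing $P(z)$ over all output words via the forward recursion, and for small $\varepsilon$ it sits slightly above $H(X) = \tfrac{2}{3}\log 2$, with an excess of order $\varepsilon\log(1/\varepsilon)$:

```python
import itertools
import math

import numpy as np

def output_block_entropy(Pi, pi0, eps, L):
    """Exact H(Z_1^L) for the BSC(eps) output of the stationary chain (Pi, pi0),
    by a forward recursion over all 2^L output words."""
    def emit(b):
        # P(Z = b | X = x) for x = 0, 1: the bit is flipped with probability eps.
        return np.array([1 - eps if b == 0 else eps,
                         eps if b == 0 else 1 - eps])
    H = 0.0
    for z in itertools.product((0, 1), repeat=L):
        alpha = pi0 * emit(z[0])
        for zt in z[1:]:
            alpha = (alpha @ Pi) * emit(zt)
        pz = alpha.sum()
        H -= pz * math.log(pz)
    return H

p = 0.5
Pi = np.array([[1 - p, p], [1.0, 0.0]])            # the matrix (2.3)
pi0 = np.array([1 / (1 + p), p / (1 + p)])         # its stationary distribution
eps = 1e-4
# Conditional entropy H(Z_0 | Z_{-5}^{-1}) = H(Z_1^6) - H(Z_1^5), by stationarity.
cond = (output_block_entropy(Pi, pi0, eps, 6)
        - output_block_entropy(Pi, pi0, eps, 5))
HX = (1 / (1 + p)) * (-(1 - p) * math.log(1 - p) - p * math.log(p))  # H(X)
print(cond - HX)  # small positive excess, dominated by eps*log(1/eps)
```

The excess here is roughly $\tfrac{p(2-p)}{1+p}\,\varepsilon\log(1/\varepsilon) \approx 4.6\times 10^{-4}$ at $\varepsilon = 10^{-4}$, plus lower-order terms; the exact brute-force computation is feasible only because the block length is tiny.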
Note that the lower bound is really just $H(Z_0(\varepsilon) \,|\, Z_{-n+m}^{-1}(\varepsilon), X_{-n}^{-n+m-1})$, since $Z_{-n+m}^0(\varepsilon)$, if conditioned on $X_{-n}^{-n+m-1}$, is independent of $E_{-n}^{-n+m-1}$.

Lemma 2.2. For a stationary input process $X_{-n}^0$ and the corresponding output process $Z_{-n}^0(\varepsilon)$ through BSC($\varepsilon$), and $0 \le k \le n$,

$$H(Z_0(\varepsilon) \,|\, Z_{-n+k}^{-1}(\varepsilon), X_{-n}^{-n+k-1}) = H(X_0 | X_{-n}^{-1}) + f_n^k(X_{-n}^0)\,\varepsilon\log(1/\varepsilon) + g_n^k(X_{-n}^0)\,\varepsilon + O(\varepsilon^2\log\varepsilon),$$

where $f_n^k(X_{-n}^0)$ and $g_n^k(X_{-n}^0)$ are given by (2.8) and (2.9) below, respectively.

Proof. In this proof $w = w_{-n}^{-1}$, where $w_{-j}$ is a single binary bit, and we let $v$ denote a single binary bit. We use the notation for probability:

$$p_{XZ}(w) = P(X_{-n}^{-n+k-1} = w_{-n}^{-n+k-1},\ Z_{-n+k}^{-1}(\varepsilon) = w_{-n+k}^{-1}),$$

$$p_{XZ}(wv) = P(X_{-n}^{-n+k-1} = w_{-n}^{-n+k-1},\ Z_{-n+k}^{-1}(\varepsilon) = w_{-n+k}^{-1},\ Z_0(\varepsilon) = v)$$

and

$$p_{XZ}(v|w) = P(Z_0(\varepsilon) = v \,|\, Z_{-n+k}^{-1}(\varepsilon) = w_{-n+k}^{-1},\ X_{-n}^{-n+k-1} = w_{-n}^{-n+k-1}).$$

We remark that the definition of $p_{XZ}$ does depend on $\varepsilon$ and on how we partition $w_{-n}^{-1}$ according to $k$; however, we keep the dependence implicit for notational simplicity. We split $H(Z_0(\varepsilon) \,|\, Z_{-n+k}^{-1}(\varepsilon), X_{-n}^{-n+k-1})$ into five terms:

$$H(Z_0(\varepsilon) \,|\, Z_{-n+k}^{-1}(\varepsilon), X_{-n}^{-n+k-1}) = \sum_{wv \in \mathcal{A}(X)} -p_{XZ}(wv)\log(p_{XZ}(v|w)) + \sum_{w \in \mathcal{A}(X),\, wv \notin \mathcal{A}(X)} -p_{XZ}(wv)\log(p_{XZ}(v|w)) \qquad (2.6)$$
$$+ \sum_{p_{XZ}(w)=\Theta(\varepsilon),\, p_{XZ}(wv)=\Theta(\varepsilon)} -p_{XZ}(wv)\log(p_{XZ}(v|w)) + \sum_{p_{XZ}(w)=\Theta(\varepsilon),\, p_{XZ}(wv)=O(\varepsilon^2)} -p_{XZ}(wv)\log(p_{XZ}(v|w)) + \sum_{p_{XZ}(w)=O(\varepsilon^2)} -p_{XZ}(wv)\log(p_{XZ}(v|w));$$

here by $\alpha = \Theta(\beta)$ we mean, as usual, that there exist positive constants $C_1, C_2$ such that $C_1|\beta| \le |\alpha| \le C_2|\beta|$, while by $\alpha = O(\beta)$ we mean that there exists a positive constant $C$ such that $|\alpha| \le C|\beta|$. Note that from

$$p_{XZ}(w) = \sum_{u_{-n+k}^{-1} :\, w_{-n}^{-n+k-1} u_{-n+k}^{-1} \in \mathcal{A}(X_{-n}^{-1})} P(X_{-n}^{-n+k-1} = w_{-n}^{-n+k-1},\ X_{-n+k}^{-1} = u_{-n+k}^{-1}) \left( \prod_{j=-n+k}^{-1} p_E(u_j \oplus w_j) \right),$$

we see that $p_{XZ}(w) = \Theta(\varepsilon)$ is equivalent to the statement that $w \notin \mathcal{A}(X_{-n}^{-1})$, and by flipping exactly one of the bits in $w_{-n+k}^{-1}$, one obtains, from $w$, a sequence in $\mathcal{A}(X_{-n}^{-1})$.

For the fourth term, we have

$$\sum_{p_{XZ}(w)=\Theta(\varepsilon),\, p_{XZ}(wv)=O(\varepsilon^2)} -p_{XZ}(wv)\log(p_{XZ}(v|w)) = O(\varepsilon^2\log\varepsilon).$$

For the fifth term, we have

$$\sum_{p_{XZ}(w)=O(\varepsilon^2)} -p_{XZ}(wv)\log(p_{XZ}(v|w)) = \sum_{p_{XZ}(w)=O(\varepsilon^2)} -p_{XZ}(w) \sum_v p_{XZ}(v|w)\log(p_{XZ}(v|w)) \le (\log 2) \sum_{p_{XZ}(w)=O(\varepsilon^2)} p_{XZ}(w) = O(\varepsilon^2),$$

where we use the fact that $-\sum_v p_{XZ}(v|w)\log(p_{XZ}(v|w)) \le \log 2$ for any $w$. We conclude that the sum of the fourth term and the fifth term is $O(\varepsilon^2\log\varepsilon)$.

For a binary sequence $u_{-n}^{-1}$, define $h_n^k(u_{-n}^{-1})$ to be

$$h_n^k(u_{-n}^{-1}) = \sum_{j=1}^{n-k} p_X(u_{-n}^{-j-1}\bar{u}_{-j}u_{-j+1}^{-1}) - (n-k)\,p_X(u_{-n}^{-1}). \qquad (2.7)$$

Note that with this notation, $h_n^k(w)$ and $h_{n+1}^k(wv)$ can be expressed as derivatives with respect to $\varepsilon$ at $\varepsilon = 0$:

$$h_n^k(w) = p'_{XZ}(w)|_{\varepsilon=0}, \qquad h_{n+1}^k(wv) = p'_{XZ}(wv)|_{\varepsilon=0}.$$
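The derivative interpretation of (2.7) can be sanity-checked numerically (our sketch, for the chain (2.3) with $p = 1/2$ and $k = 0$, so that $p_{XZ} = p_Z$): the one-sided difference quotient of $p_Z(w)$ at $\varepsilon = 0$ should match $h_n^0(w)$.

```python
import itertools
import math

import numpy as np

p = 0.5
Pi = np.array([[1 - p, p], [1.0, 0.0]])
pi0 = np.array([1 / (1 + p), p / (1 + p)])

def p_X(word):
    """Stationary probability of a word under the first order chain Pi."""
    prob = pi0[word[0]]
    for a, b in zip(word, word[1:]):
        prob *= Pi[a, b]
    return prob

def h(word, k=0):
    """h_n^k(word) from (2.7): flip each of the n - k rightmost bits once."""
    n = len(word)
    total = -(n - k) * p_X(word)
    for i in range(k, n):  # indices k..n-1 correspond to positions -(n-k)..-1
        flipped = list(word)
        flipped[i] ^= 1
        total += p_X(tuple(flipped))
    return total

def p_Z(word, eps):
    """p_XZ(word) with k = 0: sum over inputs u of p_X(u) * prod p_E(u_j XOR w_j)."""
    s = 0.0
    for u in itertools.product((0, 1), repeat=len(word)):
        noise = math.prod((1 - eps) if a == b else eps for a, b in zip(u, word))
        s += p_X(u) * noise
    return s

w = (1, 0, 1, 1)  # contains "11", so p_X(w) = 0 and p_Z(w) = Theta(eps)
print(h(w), (p_Z(w, 1e-6) - p_Z(w, 0.0)) / 1e-6)
```

Here $h_4^0(w) = p_X(1001) + p_X(1010) = \tfrac{1}{12} + \tfrac{1}{6} = \tfrac14$: only the two single-bit flips that land back in $\mathcal{A}(X)$ contribute.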
Then for the first term, we have

$$\sum_{wv \in \mathcal{A}(X)} -p_{XZ}(wv)\log(p_{XZ}(v|w)) = -\sum_{wv \in \mathcal{A}(X)} (p_X(wv) + h_{n+1}^k(wv)\varepsilon + O(\varepsilon^2)) \log\left( p_X(v|w) + \frac{h_{n+1}^k(wv)p_X(w) - h_n^k(w)p_X(wv)}{p_X^2(w)}\,\varepsilon + O(\varepsilon^2) \right)$$

$$= H(X_0|X_{-n}^{-1}) - \sum_{wv \in \mathcal{A}(X)} \left( h_{n+1}^k(wv)\log p_X(v|w) + \frac{h_{n+1}^k(wv)p_X(w) - h_n^k(w)p_X(wv)}{p_X(w)} \right)\varepsilon + O(\varepsilon^2).$$

For the second term, it is easy to check that for $w \in \mathcal{A}(X)$ and $wv \notin \mathcal{A}(X)$, $p_{XZ}(v|w) = \Theta(\varepsilon)$ and so $p_{XZ}(wv) = h_{n+1}^k(wv)\varepsilon + O(\varepsilon^2)$; we then obtain

$$\sum_{w \in \mathcal{A}(X),\, wv \notin \mathcal{A}(X)} -p_{XZ}(wv)\log(p_{XZ}(v|w)) = -\sum_{w \in \mathcal{A}(X),\, wv \notin \mathcal{A}(X)} h_{n+1}^k(wv)\,\varepsilon \log\frac{h_{n+1}^k(wv)\varepsilon + O(\varepsilon^2)}{p_X(w)} + O(\varepsilon^2)\log\Theta(\varepsilon)$$

$$= \sum_{w \in \mathcal{A}(X),\, wv \notin \mathcal{A}(X)} h_{n+1}^k(wv)\,\varepsilon\log(1/\varepsilon) - \sum_{w \in \mathcal{A}(X),\, wv \notin \mathcal{A}(X)} h_{n+1}^k(wv)\log\left(\frac{h_{n+1}^k(wv)}{p_X(w)}\right)\varepsilon + O(\varepsilon^2\log\varepsilon).$$

For the third term, we have

$$\sum_{p_{XZ}(w)=\Theta(\varepsilon),\, p_{XZ}(wv)=\Theta(\varepsilon)} -p_{XZ}(wv)\log(p_{XZ}(v|w)) = -\sum_{p_{XZ}(w)=\Theta(\varepsilon),\, p_{XZ}(wv)=\Theta(\varepsilon)} (h_{n+1}^k(wv)\varepsilon + O(\varepsilon^2)) \log\left( \frac{h_{n+1}^k(wv)}{h_n^k(w)} + O(\varepsilon) \right)$$

$$= -\sum_{p_{XZ}(w)=\Theta(\varepsilon),\, p_{XZ}(wv)=\Theta(\varepsilon)} h_{n+1}^k(wv)\log\left(\frac{h_{n+1}^k(wv)}{h_n^k(w)}\right)\varepsilon + O(\varepsilon^2).$$

In summary, $H(Z_0(\varepsilon) \,|\, Z_{-n+k}^{-1}(\varepsilon), X_{-n}^{-n+k-1})$ can be rewritten as

$$H(Z_0(\varepsilon) \,|\, Z_{-n+k}^{-1}(\varepsilon), X_{-n}^{-n+k-1}) = H(X_0|X_{-n}^{-1}) + f_n^k(X_{-n}^0)\,\varepsilon\log(1/\varepsilon) + g_n^k(X_{-n}^0)\,\varepsilon + O(\varepsilon^2\log\varepsilon),$$

where [see (2.7) for the definition of $h_n^k(\cdot)$]

$$f_n^k(X_{-n}^0) = \sum_{w \in \mathcal{A}(X),\, wv \notin \mathcal{A}(X)} h_{n+1}^k(wv) = \sum_{w \in \mathcal{A}(X),\, wv \notin \mathcal{A}(X)} \left( \sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1}\bar{w}_{-j}w_{-j+1}^{-1}v) + p_X(w_{-n}^{-1}) \right) \qquad (2.8)$$
and

$$g_n^k(X_{-n}^0) = -\sum_{wv \in \mathcal{A}(X)} \left( h_{n+1}^k(wv)\log p_X(v|w) + \frac{h_{n+1}^k(wv)p_X(w) - h_n^k(w)p_X(wv)}{p_X(w)} \right) - \sum_{w \in \mathcal{A}(X),\, wv \notin \mathcal{A}(X)} h_{n+1}^k(wv)\log\frac{h_{n+1}^k(wv)}{p_X(w)} - \sum_{p_{XZ}(w)=\Theta(\varepsilon),\, p_{XZ}(wv)=\Theta(\varepsilon)} h_{n+1}^k(wv)\log\frac{h_{n+1}^k(wv)}{h_n^k(w)}. \qquad (2.9)$$

Remark 2.3. For any $\delta > 0$ and fixed $n$, the constant in $O(\varepsilon^2\log\varepsilon)$ in Lemma 2.2 can be chosen uniformly on $\mathcal{P}_{n+1,\delta}$, where $\mathcal{P}_{n+1,\delta}$ denotes the set of binary stationary processes $X = X_{-n}^0$ such that, for all $w_{-n}^0 \in \mathcal{A}(X)$, we have $p_X(w) \ge \delta$.

Theorem 2.4. For an $m$th order Markov chain $X$ passing through a BSC($\varepsilon$), with $Z_\varepsilon$ as the output hidden Markov chain,

$$H(Z_\varepsilon) = H(X) + f(X)\,\varepsilon\log(1/\varepsilon) + g(X)\,\varepsilon + O(\varepsilon^2\log\varepsilon),$$

where $f(X) = f_{2m}^0(X_{-2m}^0) = f_{2m}^m(X_{-2m}^0)$ and $g(X) = g_{3m}^0(X_{-3m}^0) = g_{3m}^m(X_{-3m}^0)$.

Proof. We apply Lemma 2.2 to the Birch upper and lower bounds [equation (2.5)] of $H(Z_\varepsilon)$. For the upper bound, $k = 0$, we have, for all $n$,

$$H(Z_0(\varepsilon) \,|\, Z_{-n}^{-1}(\varepsilon)) = H(X_0|X_{-n}^{-1}) + f_n^0(X_{-n}^0)\,\varepsilon\log(1/\varepsilon) + g_n^0(X_{-n}^0)\,\varepsilon + O(\varepsilon^2\log\varepsilon).$$

And for the lower bound, $k = m$, we have, for $n \ge m$,

$$H(Z_0(\varepsilon) \,|\, Z_{-n+m}^{-1}(\varepsilon), X_{-n}^{-n+m-1}) = H(X_0|X_{-n}^{-1}) + f_n^m(X_{-n}^0)\,\varepsilon\log(1/\varepsilon) + g_n^m(X_{-n}^0)\,\varepsilon + O(\varepsilon^2\log\varepsilon).$$

The first term always coincides for the upper and lower bounds. When $n \ge m$, since $X$ is an $m$th order Markov chain, $H(X_0|X_{-n}^{-1}) = H(X_0|X_{-m}^{-1}) = H(X)$. Let $w = w_{-n}^{-1}$, where $w_{-j}$ is a single bit, and $v$ denotes a single bit. If $w \in \mathcal{A}(X)$ and $wv \notin \mathcal{A}(X)$, then $p_X(w_{-m}^{-1}v) = 0$. It then follows that for an $m$th order Markov chain, when $n \ge 2m$,

$$f_n^m(X_{-n}^0) = f_n^0(X_{-n}^0) = f_{2m}^0(X_{-2m}^0) = f_{2m}^m(X_{-2m}^0). \qquad (2.10)$$

Now consider $g_n^k(X_{-n}^0)$.
When $0 \le k \le m$, we have the following facts [for a detailed derivation of (2.11)–(2.13), see the Appendix]:

if $wv \in \mathcal{A}(X)$, then

$$p_X(v|w) = p_X(v|w_{-m}^{-1}) \quad \text{for } n \ge m; \qquad (2.11)$$

if $w \in \mathcal{A}(X)$, $wv \notin \mathcal{A}(X)$, then

$$\frac{h_{n+1}^k(wv)}{p_X(w)} \text{ is constant (as a function of } n \text{ and } k \text{) for } n \ge 2m,\ 0 \le k \le m; \qquad (2.12)$$

if $p_{XZ}(w) = \Theta(\varepsilon)$, $p_{XZ}(wv) = \Theta(\varepsilon)$, then

$$\frac{h_{n+1}^k(wv)}{h_n^k(w)} \text{ is constant for } n \ge 3m,\ 0 \le k \le m. \qquad (2.13)$$

It then follows [see the derivations of (2.14)–(2.16) in the Appendix] that

$$\sum_{wv \in \mathcal{A}(X)} \frac{h_{n+1}^k(wv)p_X(w) - h_n^k(w)p_X(wv)}{p_X(w)} \text{ is constant (as a function of } n \text{) for } n \ge 2m,\ 0 \le k \le m, \qquad (2.14)$$

$$\sum_{w \in \mathcal{A}(X),\, wv \notin \mathcal{A}(X)} h_{n+1}^k(wv)\log\frac{h_{n+1}^k(wv)}{p_X(w)} \text{ is constant for } n \ge 2m,\ 0 \le k \le m, \qquad (2.15)$$

and

$$\sum_{wv \in \mathcal{A}(X)} h_{n+1}^k(wv)\log p_X(v|w) + \sum_{p_{XZ}(w)=\Theta(\varepsilon),\, p_{XZ}(wv)=\Theta(\varepsilon)} h_{n+1}^k(wv)\log\frac{h_{n+1}^k(wv)}{h_n^k(w)} \text{ is constant for } n \ge 3m,\ 0 \le k \le m. \qquad (2.16)$$

Consequently, we have

$$g_n^m(X_{-n}^0) = g_n^0(X_{-n}^0) = g_{3m}^0(X_{-3m}^0) = g_{3m}^m(X_{-3m}^0). \qquad (2.17)$$

Let $f(X) = f_{2m}^0(X_{-2m}^0)$ and $g(X) = g_{3m}^0(X_{-3m}^0)$; then the theorem follows.

Remark 2.5. Note that this result applies in particular to the case when the transition probabilities of $X$ are all positive; thus, in this case the formula should reduce to that of Theorem 2.1. Indeed, when all transition probabilities of $X$ are positive, $f(X)$ vanishes since the summation in (2.8) is taken over an empty set; on the other hand, again from (2.8), if some of the transition probabilities of $X$ are zero, then $f(X)$ does not vanish [to see this, note that when $w \in \mathcal{A}(X)$, $wv \notin \mathcal{A}(X)$, necessarily we will have $w\bar{v} \in \mathcal{A}(X)$]. The agreement of $g(X)$ with the expression in Theorem 2.1 is a straightforward, but tedious, computation.

Remark 2.6.
Together with Remark 2.3, the proof of Theorem 2.4 implies that for any $\delta > 0$ and fixed $m$, the constant in $O(\varepsilon^2\log\varepsilon)$ in Theorem 2.4 can be chosen uniformly on $\mathcal{Q}_{m,\delta}$, where $\mathcal{Q}_{m,\delta}$ denotes the set of all $m$th order Markov chains $X$ such that, whenever $w = w_{-m}^0 \in \mathcal{A}(X)$, we have $p_X(w) \ge \delta$.

Remark 2.7. The error term in the formula of Theorem 2.4 cannot be improved, in the sense that, in some cases, the error term is dominated by a strictly positive constant times $\varepsilon^2\log\varepsilon$. As we showed in Theorem 2.4, the Birch upper bound with $n = 3m$ yields

$$H(Z_0(\varepsilon) \,|\, Z_{-n}^{-1}(\varepsilon)) = H(X) + f(X)\,\varepsilon\log(1/\varepsilon) + g(X)\,\varepsilon + O(\varepsilon^2\log\varepsilon).$$

Together with (2.6), one checks that the $\Theta(\varepsilon^2\log\varepsilon)$ term in the error term $O(\varepsilon^2\log\varepsilon)$ is contributed by [see the second term in (2.6) with $k = 0$]

$$\sum_{w \in \mathcal{A}(X),\, wv \notin \mathcal{A}(X)} -p_Z(wv)\log(p_Z(v|w))$$

and [see the fourth term in (2.6) with $k = 0$]

$$\sum_{p_Z(w)=\Theta(\varepsilon),\, p_Z(wv)=O(\varepsilon^2)} -p_Z(wv)\log(p_Z(v|w)),$$

and this $\Theta(\varepsilon^2\log\varepsilon)$ term does not vanish, at least in certain cases. For instance, consider the input Markov chain $X$ with the following transition probability matrix:

$$\begin{pmatrix} 1-p & p \\ 1 & 0 \end{pmatrix},$$

where $0 < p < 1$. Then one checks that for this case, $m = 1$, $n = 3$, and the coefficient of the above-mentioned $\Theta(\varepsilon^2\log\varepsilon)$ term takes the form

$$\frac{1 - 6p + 7p^2 - p^3}{1+p},$$

which is strictly positive for $p$ close to $0$.

3. Asymptotics of capacity. Consider a binary irreducible finite type constraint $\mathcal{S}$ defined by $\mathcal{F}$, which consists of forbidden words of length $\hat{m} + 1$. In general, there are many such $\mathcal{F}$'s corresponding to the same $\mathcal{S}$ with different lengths; here we may choose $\mathcal{F}$ to be the one with the smallest length $\hat{m} + 1$, and $\hat{m} = \hat{m}(\mathcal{S})$ is defined to be the topological order of the constraint $\mathcal{S}$.
For example, the order of $\mathcal{S}(d,k)$, discussed in the introduction, is $k$ [20]. The topological order of a finite type constraint is analogous to the order of a Markov chain.

Recall from (1.5) that for an input-constrained BSC($\varepsilon$) with input sequences $X_{-n}^0$ in $\mathcal{S}$ and with the corresponding output $Z_{-n}^0(\varepsilon)$, the capacity can be written as

$$C(\mathcal{S}, \varepsilon) = \lim_{n \to \infty} \sup_{X_{-n}^0 \in \mathcal{P}_{n+1},\, \mathcal{A}(X_{-n}^0) \subseteq \mathcal{S}} \frac{1}{n+1} \bigl( H(Z_{-n}^0(\varepsilon)) - H(Z_{-n}^0(\varepsilon) \,|\, X_{-n}^0) \bigr).$$

Since the noise distribution is symmetric and the noise process $E$ is i.i.d. and independent of $X$, this can be simplified to

$$C(\mathcal{S}, \varepsilon) = \lim_{n \to \infty} \sup_{X_{-n}^0 \in \mathcal{P}_{n+1},\, \mathcal{A}(X_{-n}^0) \subseteq \mathcal{S}} \frac{H(Z_{-n}^0(\varepsilon))}{n+1} - H(\varepsilon),$$

which can be rewritten as

$$C(\mathcal{S}, \varepsilon) = \lim_{n \to \infty} \sup_{X_{-n}^0 \in \mathcal{P}_{n+1},\, \mathcal{A}(X_{-n}^0) \subseteq \mathcal{S}} H(Z_0(\varepsilon) \,|\, Z_{-n}^{-1}(\varepsilon)) - H(\varepsilon),$$

where we used the chain rule for entropy (see page 21 of [8]),

$$H(Z_{-n}^0(\varepsilon)) = \sum_{j=0}^n H(Z_0(\varepsilon) \,|\, Z_{-j}^{-1}(\varepsilon)),$$

and the fact that (further) conditioning reduces entropy (see page 27 of [8]):

$$H(Z_0(\varepsilon) \,|\, Z_{-j_1}^{-1}(\varepsilon)) \ge H(Z_0(\varepsilon) \,|\, Z_{-j_2}^{-1}(\varepsilon)) \quad \text{for } j_1 \le j_2.$$

Recall from (1.5) that

$$C(\mathcal{S}, \varepsilon) = \sup_{X \in \mathcal{P},\, \mathcal{A}(X) \subseteq \mathcal{S}} H(Z_\varepsilon) - H(Z_\varepsilon | X).$$

Now let

$$H_n(\mathcal{S}, \varepsilon) = \sup_{X_{-n}^0 \in \mathcal{P}_{n+1},\, \mathcal{A}(X_{-n}^0) \subseteq \mathcal{S}} H(Z_0(\varepsilon) \,|\, Z_{-n}^{-1}(\varepsilon))$$

and

$$h_m(\mathcal{S}, \varepsilon) = \sup_{X \in \mathcal{M}_m,\, \mathcal{A}(X) \subseteq \mathcal{S}} H(Z_\varepsilon),$$

where $\mathcal{M}_m$ denotes the set of all $m$th order binary irreducible Markov chains; we then have the bounds for $C(\mathcal{S}, \varepsilon)$:

$$h_m(\mathcal{S}, \varepsilon) - H(\varepsilon) \le C(\mathcal{S}, \varepsilon) \le H_n(\mathcal{S}, \varepsilon) - H(\varepsilon). \qquad (3.1)$$
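At $\varepsilon = 0$, $C(\mathcal{S}, \varepsilon)$ reduces to the noiseless capacity of the constraint, which for an irreducible finite type constraint is the log of the Perron eigenvalue of the adjacency matrix of a graph presentation (a standard fact in constrained coding; see [21]). A sketch of ours for the $(1,\infty)$-RLL constraint:

```python
import math

import numpy as np

def noiseless_capacity(A):
    """log of the Perron (largest-modulus) eigenvalue of the adjacency matrix A
    of a presentation of the constraint; natural log, as in the paper."""
    return math.log(max(abs(np.linalg.eigvals(A))))

# (1, inf)-RLL ("no 11"): states record the last bit; 0 -> {0, 1}, 1 -> {0}.
A_golden = np.array([[1.0, 1.0], [1.0, 0.0]])
print(noiseless_capacity(A_golden))  # log((1 + sqrt(5)) / 2) ~ 0.4812
```

The value $\log\frac{1+\sqrt{5}}{2}$ is the classical capacity of the golden mean shift, and provides the constant term $C(\mathcal{S})$ in the asymptotic expansions of Section 3.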
Noting that

$$\sup_{X_{-n}^0 \in \mathcal{P}_{n+1},\, \mathcal{A}(X_{-n}^0) \subsetneq \mathcal{S}_{n+1}} H(X_0|X_{-n}^{-1}) < \sup_{X_{-n}^0 \in \mathcal{P}_{n+1},\, \mathcal{A}(X_{-n}^0) = \mathcal{S}_{n+1}} H(X_0|X_{-n}^{-1})$$

(here $\subsetneq$ means "proper subset of"), and that $H(Z_0(\varepsilon) \,|\, Z_{-n}^{-1}(\varepsilon))$ is continuous at $\varepsilon = 0$, we conclude that, for $\varepsilon$ sufficiently small ($\varepsilon < \varepsilon_0$), one may choose $\delta > 0$ (here, $\delta$ depends on $n$ and $m$) such that

$$H_n(\mathcal{S}, \varepsilon) = \sup_{X_{-n}^0 \in \mathcal{P}_{n+1,\delta},\, \mathcal{A}(X_{-n}^0) = \mathcal{S}_{n+1}} H(Z_0(\varepsilon) \,|\, Z_{-n}^{-1}(\varepsilon)).$$

So from now on we only consider stationary processes $X = X_{-n}^0$ with $\mathcal{A}(X_{-n}^0) = \mathcal{S}_{n+1}$.

Now for a stationary process $X = X_{-n}^0$, define $\vec{p}_n$ as the following probability vector indexed by all the elements of $\mathcal{S}_{n+1}$:

$$\vec{p}_n = \vec{p}_n(X_{-n}^0) = (P(X_{-n}^0 = w_{-n}^0) : w_{-n}^0 \in \mathcal{S}_{n+1}).$$

To emphasize the dependence of $X_{-n}^0$ on $\vec{p}_n$, in the following we shall rewrite $X_{-n}^0$ as $X_{-n}^0(\vec{p}_n)$. For an $m$th order binary irreducible Markov chain $X = X_{-\infty}^0$, slightly abusing the notation, define $\vec{p}_m$ as the following probability vector indexed by all the elements of $\mathcal{S}_{m+1}$:

$$\vec{p}_m = \vec{p}_m(X_{-\infty}^0) = (P(X_{-m}^0 = w_{-m}^0) : w_{-m}^0 \in \mathcal{S}_{m+1}).$$

Similarly, to emphasize the dependence of $X = X_{-\infty}^0$ on $\vec{p}_m$, in the following we shall rewrite $X$ as $X_{\vec{p}_m}$. And we shall use $Z_{-n}^0(\vec{p}_n, \varepsilon)$ to denote the output process obtained by passing $X_{-n}^0(\vec{p}_n)$ through BSC($\varepsilon$), and use $Z_{\vec{p}_m,\varepsilon}$ to denote the output process obtained by passing $X_{\vec{p}_m}$ through BSC($\varepsilon$).

Lemma 3.1. For any stationary process $X_{-n}^0(\vec{p}_n)$ with $\mathcal{A}(X_{-n}^0(\vec{p}_n)) = \mathcal{S}_{n+1}$, $H(X_0(\vec{p}_n) \,|\, X_{-n}^{-1}(\vec{p}_n))$, as a function of $\vec{p}_n$, has a negative definite Hessian matrix.

Proof. Note that

$$H(X_0(\vec{p}_n) \,|\, X_{-n}^{-1}(\vec{p}_n)) = -\sum_{x_{-n}^0 \in \mathcal{S}} p(x_{-n}^0) \log p(x_0 | x_{-n}^{-1}).$$
For two different probability vectors $\vec{p}_n$ and $\vec{q}_n$, consider the convex combination $\vec{r}_n(t) = t\vec{p}_n + (1-t)\vec{q}_n$, where $0 \le t \le 1$. It suffices to prove that $H(X_0(\vec{r}_n(t)) \,|\, X_{-n}^{-1}(\vec{r}_n(t)))$ has a strictly negative second derivative with respect to $t$. Now consider a single term in $H(X_0(\vec{p}_n) \,|\, X_{-n}^{-1}(\vec{p}_n))$:

$$-(t\vec{p}_n(x_{-n}^0) + (1-t)\vec{q}_n(x_{-n}^0)) \log\frac{t\vec{p}_n(x_{-n}^0) + (1-t)\vec{q}_n(x_{-n}^0)}{t\vec{p}_n(x_{-n}^{-1}) + (1-t)\vec{q}_n(x_{-n}^{-1})},$$

where $\vec{p}_n(x_{-n}^{-1})$, $\vec{q}_n(x_{-n}^{-1})$ denote the corresponding marginal probabilities of the length-$n$ subword. Note that for two formal symbols $\alpha$ and $\beta$, if we assume $\alpha'' = 0$ and $\beta'' = 0$, the second order formal derivative of $\alpha\log\frac{\alpha}{\beta}$ can be computed as

$$\left(\alpha\log\frac{\alpha}{\beta}\right)'' = \left(\frac{\alpha'}{\sqrt{\alpha}} - \frac{\sqrt{\alpha}\,\beta'}{\beta}\right)^2.$$

It then follows that the second derivative of this term (with respect to $t$) can be calculated as

$$-\left(\frac{\vec{p}_n(x_{-n}^0) - \vec{q}_n(x_{-n}^0)}{\sqrt{t\vec{p}_n(x_{-n}^0) + (1-t)\vec{q}_n(x_{-n}^0)}} - \sqrt{t\vec{p}_n(x_{-n}^0) + (1-t)\vec{q}_n(x_{-n}^0)}\; \frac{\vec{p}_n(x_{-n}^{-1}) - \vec{q}_n(x_{-n}^{-1})}{t\vec{p}_n(x_{-n}^{-1}) + (1-t)\vec{q}_n(x_{-n}^{-1})}\right)^2.$$

That is, the expression above is always nonpositive, and is equal to $0$ only if

$$\frac{\vec{p}_n(x_{-n}^0) - \vec{q}_n(x_{-n}^0)}{t\vec{p}_n(x_{-n}^0) + (1-t)\vec{q}_n(x_{-n}^0)} = \frac{\vec{p}_n(x_{-n}^{-1}) - \vec{q}_n(x_{-n}^{-1})}{t\vec{p}_n(x_{-n}^{-1}) + (1-t)\vec{q}_n(x_{-n}^{-1})},$$

which is equivalent to

$$P(X_0(\vec{p}_n) = x_0 \,|\, X_{-n}^{-1}(\vec{p}_n) = x_{-n}^{-1}) = P(X_0(\vec{q}_n) = x_0 \,|\, X_{-n}^{-1}(\vec{q}_n) = x_{-n}^{-1}). \qquad (3.2)$$

Since $\mathcal{S}$ is an irreducible finite type constraint and $\mathcal{A}(X_{-n}^0(\vec{p}_n)) = \mathcal{A}(X_{-n}^0(\vec{q}_n)) = \mathcal{S}_{n+1}$, the expression (3.2) cannot be true for every $x_{-n}^0$ unless $\vec{p}_n = \vec{q}_n$. So we conclude that the second derivative of $H(X_0(\vec{r}_n(t)) \,|\, X_{-n}^{-1}(\vec{r}_n(t)))$ (with respect to $t$) is strictly negative.
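The formal second-derivative identity for $\alpha \log(\alpha/\beta)$ with $\alpha'' = \beta'' = 0$ can be checked numerically (our sketch, with arbitrarily chosen linear $\alpha$, $\beta$ kept positive on the interval of interest):

```python
import math

def second_diff(f, t, h=1e-4):
    """Central second difference approximating f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)

# alpha, beta linear in t, so alpha'' = beta'' = 0 as in the lemma.
a0, a1 = 0.3, 0.2
b0, b1 = 0.6, -0.1
alpha = lambda t: a0 + a1 * t
beta = lambda t: b0 + b1 * t
f = lambda t: alpha(t) * math.log(alpha(t) / beta(t))

t = 0.4
lhs = second_diff(f, t)
rhs = (a1 / math.sqrt(alpha(t)) - math.sqrt(alpha(t)) * b1 / beta(t)) ** 2
print(lhs, rhs)  # the two values agree to high accuracy
```

Writing the second derivative as a square is what makes the nonpositivity of each entropy term (and hence the concavity claimed in Lemma 3.1) transparent.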
Thus, $H(X_0(\vec{p}_n) \mid X_{-n}^{-1}(\vec{p}_n))$, as a function of $\vec{p}_n$, has a strictly negative definite Hessian. $\square$

For $m \ge \hat{m}$, over all $m$th order Markov chains $X_{\vec{p}_m}$ with $\mathcal{A}(X_{\vec{p}_m}) = S$, $H(X_{\vec{p}_m})$ is maximized at some unique Markov chain $X_{\vec{p}_m^{\max}}$ (see [20, 24]). Moreover, $X_{\vec{p}_m^{\max}}$ does not depend on $m$ and is an $\hat{m}$th order Markov chain, so we will drop the subscript $m$ and use $X_{\vec{p}^{\max}}$ instead to denote $X_{\vec{p}_m^{\max}}$ for any $m \ge \hat{m}$. The same idea shows that over all stationary distributions $X_{-n}^0(\vec{p}_n)$ ($n \ge \hat{m}$) with $\mathcal{A}(X_{-n}^0(\vec{p}_n)) = S_{n+1}$, $H(X_0(\vec{p}_n) \mid X_{-n}^{-1}(\vec{p}_n))$ is maximized at $\vec{p}_n^{\max}$, which corresponds to the above unique $X_{\vec{p}^{\max}}$ as well.

Note that $C(S) = C(S, 0)$ is equal to the noiseless capacity of the constraint $S$. This quantity has been extensively studied, and several interpretations and methods for its explicit derivation are known (see, e.g., [21] and the extensive bibliography therein). It is well known that $C(S) = H(X_{\vec{p}^{\max}})$ (see [20, 24]).

Theorem 3.2.
1. If $n \ge 3\hat{m}(S)$,
\[
H_n(S, \varepsilon) = C(S) + f(X_{\vec{p}^{\max}})\, \varepsilon \log(1/\varepsilon) + g(X_{\vec{p}^{\max}})\, \varepsilon + O(\varepsilon^2 \log^2 \varepsilon).
\]
2. If $m \ge \hat{m}(S)$,
\[
h_m(S, \varepsilon) = C(S) + f(X_{\vec{p}^{\max}})\, \varepsilon \log(1/\varepsilon) + g(X_{\vec{p}^{\max}})\, \varepsilon + O(\varepsilon^2 \log^2 \varepsilon).
\]
Here, as defined in Theorem 2.4, $f(X_{\vec{p}^{\max}}) = f_{2\hat{m}}^0(X_{-2\hat{m}}^0(\vec{p}^{\max}))$ and $g(X_{\vec{p}^{\max}}) = g_{3\hat{m}}^0(X_{-3\hat{m}}^0(\vec{p}^{\max}))$.

Proof. We first prove the statement for $H_n(S, \varepsilon)$. As mentioned before, for $\varepsilon$ sufficiently small ($\varepsilon < \varepsilon_0$), $H_n(S, \varepsilon)$ is achieved by some $X_{-n}^0$ with $\mathcal{A}(X_{-n}^0) = S_{n+1}$, and one may choose $\delta$ such that
\[
H_n(S, \varepsilon) = \sup_{\vec{p}_n :\, X_{-n}^0(\vec{p}_n) \in \mathcal{P}_{n+1,\delta},\ \mathcal{A}(X_{-n}^0(\vec{p}_n)) = S_{n+1}} H(Z_0(\vec{p}_n, \varepsilon) \mid Z_{-n}^{-1}(\vec{p}_n, \varepsilon)).
\]
Below, we assume $\varepsilon < \varepsilon_0$, $X_{-n}^0(\vec{p}_n) \in \mathcal{P}_{n+1,\delta}$ and $\mathcal{A}(X_{-n}^0(\vec{p}_n)) = S_{n+1}$; for notational convenience, we rewrite $f_n^0(X_{-n}^0(\vec{p}_n))$ as $f_n(\vec{p}_n)$ and $g_n^0(X_{-n}^0(\vec{p}_n))$ as $g_n(\vec{p}_n)$. In Lemma 2.2 we have proved that
\[
H(Z_0(\vec{p}_n, \varepsilon) \mid Z_{-n}^{-1}(\vec{p}_n, \varepsilon)) = H(X_0(\vec{p}_n) \mid X_{-n}^{-1}(\vec{p}_n)) + f_n(\vec{p}_n)\, \varepsilon \log(1/\varepsilon) + g_n(\vec{p}_n)\, \varepsilon + O(\varepsilon^2 \log \varepsilon).
\]
Moreover, by Remark 2.3, for any $\delta > 0$ the $O(\varepsilon^2 \log \varepsilon)$ term is uniform on $\mathcal{P}_{n+1,\delta}$; that is, there is a constant $C$ (depending on $n$) such that, for all $X_{-n}^0$ with $X_{-n}^0(\vec{p}_n) \in \mathcal{P}_{n+1,\delta}$ and $\mathcal{A}(X_{-n}^0) = S_{n+1}$,
\[
\bigl|H(Z_0(\vec{p}_n, \varepsilon) \mid Z_{-n}^{-1}(\vec{p}_n, \varepsilon)) - H(X_0(\vec{p}_n) \mid X_{-n}^{-1}(\vec{p}_n)) - f_n(\vec{p}_n)\, \varepsilon \log(1/\varepsilon) - g_n(\vec{p}_n)\, \varepsilon\bigr| \le C \varepsilon^2 |{\log \varepsilon}|.
\]
Let $\vec{q}_n = \vec{p}_n - \vec{p}_n^{\max}$. Since $H(X_0(\vec{p}_n) \mid X_{-n}^{-1}(\vec{p}_n))$ is maximized at $\vec{p}_n^{\max}$, we can expand $H(X_0(\vec{p}_n) \mid X_{-n}^{-1}(\vec{p}_n))$ around $\vec{p}_n^{\max}$:
\[
H(X_0(\vec{p}_n) \mid X_{-n}^{-1}(\vec{p}_n)) = H(X_0(\vec{p}_n^{\max}) \mid X_{-n}^{-1}(\vec{p}_n^{\max})) + \vec{q}_n^{\,t} K_1 \vec{q}_n + O(|\vec{q}_n|^3) = H(X_{\vec{p}^{\max}}) + \vec{q}_n^{\,t} K_1 \vec{q}_n + O(|\vec{q}_n|^3),
\]
where $K_1$ is a negative definite matrix by Lemma 3.1 (the second equality follows from the fact that $X_{\vec{p}^{\max}}$ is an $\hat{m}$th order Markov chain). So for $|\vec{q}_n|$ sufficiently small, we have
\[
H(X_0(\vec{p}_n) \mid X_{-n}^{-1}(\vec{p}_n)) < H(X_{\vec{p}^{\max}}) + (1/2)\, \vec{q}_n^{\,t} K_1 \vec{q}_n.
\]
Now we expand $f_n(\vec{p}_n)$ and $g_n(\vec{p}_n)$ around $\vec{p}_n^{\max}$:
\[
f_n(\vec{p}_n) = f_n(\vec{p}_n^{\max}) + K_2 \cdot \vec{q}_n + O(|\vec{q}_n|^2), \qquad g_n(\vec{p}_n) = g_n(\vec{p}_n^{\max}) + K_3 \cdot \vec{q}_n + O(|\vec{q}_n|^2)
\]
(here, $K_2$ and $K_3$ are vectors of first order partial derivatives).
Then, for $|\vec{q}_n|$ sufficiently small, we have
\[
f_n(\vec{p}_n) \le f_n(\vec{p}_n^{\max}) + 2\sum_j |K_{2,j}|\,|q_{n,j}|, \qquad g_n(\vec{p}_n) \le g_n(\vec{p}_n^{\max}) + 2\sum_j |K_{3,j}|\,|q_{n,j}|,
\]
where $K_{2,j}$, $K_{3,j}$, $q_{n,j}$ are the $j$th coordinates of $K_2$, $K_3$, $\vec{q}_n$, respectively. With a change of coordinates, if necessary, we may assume $K_1$ is a diagonal matrix with strictly negative diagonal elements $K_{1,j}$. In the following we assume $0 < \varepsilon < \varepsilon_0$, and we may further assume that for some $\ell \ge 1$,
\[
|q_{n,j}| > 4|K_{2,j}/K_{1,j}|\, \varepsilon \log(1/\varepsilon) + 4|K_{3,j}/K_{1,j}|\, \varepsilon \qquad \text{for } j \le \ell - 1,
\]
and
\[
|q_{n,j}| \le 4|K_{2,j}/K_{1,j}|\, \varepsilon \log(1/\varepsilon) + 4|K_{3,j}/K_{1,j}|\, \varepsilon \qquad \text{for } j \ge \ell.
\]
Then for each $j \le \ell - 1$, we have $(1/2) K_{1,j} q_{n,j}^2 + 2|K_{2,j}|\,|q_{n,j}|\, \varepsilon \log(1/\varepsilon) + 2|K_{3,j}|\,|q_{n,j}|\, \varepsilon < 0$. Thus,
\[
\begin{aligned}
H(Z_0(\vec{p}_n, \varepsilon) \mid Z_{-n}^{-1}(\vec{p}_n, \varepsilon))
&< H(X_{\vec{p}^{\max}}) + f_n(\vec{p}_n^{\max})\, \varepsilon \log(1/\varepsilon) + g_n(\vec{p}_n^{\max})\, \varepsilon \\
&\quad + \sum_j \bigl((1/2) K_{1,j} q_{n,j}^2 + 2|K_{2,j}|\,|q_{n,j}|\, \varepsilon \log(1/\varepsilon) + 2|K_{3,j}|\,|q_{n,j}|\, \varepsilon\bigr) + C\varepsilon^2 |{\log \varepsilon}| \\
&< H(X_{\vec{p}^{\max}}) + f_n(\vec{p}_n^{\max})\, \varepsilon \log(1/\varepsilon) + g_n(\vec{p}_n^{\max})\, \varepsilon \\
&\quad + \sum_{j \ge \ell} (1/2) K_{1,j} \bigl(4|K_{2,j}/K_{1,j}|\, \varepsilon \log(1/\varepsilon) + 4|K_{3,j}/K_{1,j}|\, \varepsilon\bigr)^2 \\
&\quad + \sum_{j \ge \ell} 2|K_{2,j}| \bigl(4|K_{2,j}/K_{1,j}|\, \varepsilon \log(1/\varepsilon) + 4|K_{3,j}/K_{1,j}|\, \varepsilon\bigr)\, \varepsilon \log(1/\varepsilon) \\
&\quad + \sum_{j \ge \ell} 2|K_{3,j}| \bigl(4|K_{2,j}/K_{1,j}|\, \varepsilon \log(1/\varepsilon) + 4|K_{3,j}/K_{1,j}|\, \varepsilon\bigr)\, \varepsilon + C\varepsilon^2 |{\log \varepsilon}|.
\end{aligned}
\]
Collecting terms, we eventually reach
\[
H(Z_0(\vec{p}_n, \varepsilon) \mid Z_{-n}^{-1}(\vec{p}_n, \varepsilon)) < H(X_{\vec{p}^{\max}}) + f_n(\vec{p}_n^{\max})\, \varepsilon \log(1/\varepsilon) + g_n(\vec{p}_n^{\max})\, \varepsilon + O(\varepsilon^2 \log^2 \varepsilon),
\]
and since $H_n(S, \varepsilon)$ is the supremum of the left-hand side expression, together with $H(X_{\vec{p}^{\max}}) = C(S)$, we have
\[
H_n(S, \varepsilon) \le C(S) + f_n(\vec{p}_n^{\max})\, \varepsilon \log(1/\varepsilon) + g_n(\vec{p}_n^{\max})\, \varepsilon + O(\varepsilon^2 \log^2 \varepsilon).
\]
As discussed in Theorem 2.4, we have
\[
f_n(\vec{p}_n^{\max}) = f(X_{\vec{p}^{\max}}), \qquad n \ge 2\hat{m}, \tag{3.3}
\]
and
\[
g_n(\vec{p}_n^{\max}) = g(X_{\vec{p}^{\max}}), \qquad n \ge 3\hat{m}. \tag{3.4}
\]
So eventually we reach
\[
H_n(S, \varepsilon) \le C(S) + f(X_{\vec{p}^{\max}})\, \varepsilon \log(1/\varepsilon) + g(X_{\vec{p}^{\max}})\, \varepsilon + O(\varepsilon^2 \log^2 \varepsilon).
\]
The reverse inequality follows trivially from the definition of $H_n(S, \varepsilon)$.

We now prove the statement for $h_m(S, \varepsilon)$. First, observe that
\[
H_{3m}(S, \varepsilon) \ge h_m(S, \varepsilon) \ge h_{\hat{m}}(S, \varepsilon) \ge H(Z_{\vec{p}^{\max}, \varepsilon}),
\]
where $Z_{\vec{p}^{\max}, \varepsilon}$ is the output process corresponding to the input process $X_{\vec{p}^{\max}}$. By part 1, $H_{3m}(S, \varepsilon)$ is of the form
\[
C(S) + f(X_{\vec{p}^{\max}})\, \varepsilon \log(1/\varepsilon) + g(X_{\vec{p}^{\max}})\, \varepsilon + O(\varepsilon^2 \log^2 \varepsilon).
\]
By Theorem 2.4, $H(Z_{\vec{p}^{\max}, \varepsilon})$ is of the same form. Thus, $h_m(S, \varepsilon)$ is also of the same form, as desired. $\square$

Corollary 3.3.
\[
C(S, \varepsilon) = C(S) + (f(X_{\vec{p}^{\max}}) - 1)\, \varepsilon \log(1/\varepsilon) + (g(X_{\vec{p}^{\max}}) - 1)\, \varepsilon + O(\varepsilon^2 \log^2 \varepsilon).
\]
In fact, for each $m \ge \hat{m}(S)$, $h_m(S, \varepsilon) - H(\varepsilon)$ is of this form.

Proof. This follows from Theorem 3.2, inequality (3.1) and the fact that
\[
H(\varepsilon) = \varepsilon \log(1/\varepsilon) + (1 - \varepsilon) \log(1/(1 - \varepsilon)) = \varepsilon \log(1/\varepsilon) + \varepsilon + O(\varepsilon^2). \qquad \square
\]

Remark 3.4. Note that the error term here for noisy constrained capacity is $O(\varepsilon^2 \log^2 \varepsilon)$, which is larger than the error term, $O(\varepsilon^2 \log \varepsilon)$, for the entropy rate in Theorem 2.4. At least in some cases, this cannot be improved, as we show at the end of the next section.

4. Binary symmetric channel with $(d, k)$-RLL constrained input. We now apply the results of the preceding section to compute asymptotics for the noisy constrained BSC channel with inputs restricted to the $(d, k)$-RLL constraint $S(d, k)$. Expressions (2.8) and (2.9) allow us to explicitly compute $f(X_{\vec{p}^{\max}})$ and $g(X_{\vec{p}^{\max}})$. In this section, as an example, we derive the explicit expression for $f(X_{\vec{p}^{\max}})$, omitting the computation of $g(X_{\vec{p}^{\max}})$, which involves a tedious derivation.
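The identity $C(S) = H(X_{\vec{p}^{\max}})$ used above can be checked numerically. The following sketch (ours, not from the paper) constructs the maximum entropy Markov chain, the Parry measure, for the golden mean shift $S(1, \infty)$: for an adjacency matrix $A$ with Perron eigenvalue $\lambda$ and right eigenvector $v$, the maximizing chain has transition probabilities $P_{ij} = A_{ij} v_j/(\lambda v_i)$ and entropy rate $\log \lambda$.

```python
import math

# Adjacency matrix of the golden mean shift S(1, inf):
# a 1 must be followed by a 0; 0 may be followed by 0 or 1.
A = [[1.0, 1.0], [1.0, 0.0]]

# Power iteration for the Perron eigenvalue lam and right eigenvector v.
v = [1.0, 1.0]
lam = 1.0
for _ in range(200):
    w = [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]
    lam = max(w)
    v = [x / lam for x in w]

# Parry measure: P_ij = A_ij * v_j / (lam * v_i), the entropy-maximizing chain.
P = [[A[i][j] * v[j] / (lam * v[i]) for j in range(2)] for i in range(2)]

# Stationary distribution of a 2-state chain: mu_0 * P_01 = mu_1 * P_10.
mu0 = P[1][0] / (P[0][1] + P[1][0])
mu = [mu0, 1.0 - mu0]

# Entropy rate of the chain; it should equal log(lam) = C(S).
H = -sum(mu[i] * P[i][j] * math.log(P[i][j])
         for i in range(2) for j in range(2) if P[i][j] > 0)
print(lam, H, math.log(lam))
```

Here $\lambda$ is the golden mean, and the resulting transition matrix is exactly the one appearing in Example 4.1 below, namely $\begin{pmatrix} 1/\lambda & 1/\lambda^2 \\ 1 & 0 \end{pmatrix}$.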
We remark that, for a BSC($\varepsilon$) with $(d, k)$-RLL constrained input, similar expressions have, in some cases, been independently obtained in [19]. It is first shown in [19] that, in the case $k \le 2d$, for any binary stationary Markov chain $X$, of any order, with $\mathcal{A}(X) \subseteq S(d, k)$, $f(X) = 1$, and so, in this case,
\[
C(S(d, k), \varepsilon) = C(S(d, k), 0) + O(\varepsilon);
\]
that is, the noisy constrained capacity differs from the noiseless capacity by $O(\varepsilon)$, rather than $O(\varepsilon \log \varepsilon)$. In the following we take a look at this using a different approach. For this, first note that for any $d, k$, $f(X)$ takes the form
\[
f(X) = \sum_{\substack{l_1 + l_2 \le k-1,\\ 0 \le l_2 \le d-1,\ l_1 \ge d}} p_X(1 0^{l_1+l_2+1} 1)
+ \sum_{\substack{l_1 + l_2 = k,\\ l_1 \ge d}} p_X(1 0^{l_1} 1 0^{l_2})
+ \sum_{1 \le l \le d} p_X(1 0^{l}). \tag{4.1}
\]
Now, when $k \le 2d$,
\[
\sum_{l_1 + l_2 = k,\ l_1 \ge d} p_X(1 0^{l_1} 1 0^{l_2}) = \sum_{d \le l_1 \le k} p_X(1 0^{l_1} 1) = p_X(1)
\]
and
\[
\sum_{\substack{l_1 + l_2 \le k-1,\\ 0 \le l_2 \le d-1,\ l_1 \ge d}} p_X(1 0^{l_1+l_2+1} 1) = p_X(1 0^{d+1}) + p_X(1 0^{d+2}) + \cdots + p_X(1 0^{k}).
\]
So
\[
f(X) = p_X(1) + p_X(10) + \cdots + p_X(1 0^{d}) + p_X(1 0^{d+1}) + \cdots + p_X(1 0^{k}) = 1,
\]
as desired.

Now we consider the general RLL constraint $S(d, k)$. By Corollary 3.3, we have
\[
C(S(d, k), \varepsilon) = C(S(d, k)) + (f(X_{\vec{p}^{\max}}) - 1)\, \varepsilon \log(1/\varepsilon) + (g(X_{\vec{p}^{\max}}) - 1)\, \varepsilon + O(\varepsilon^2 \log^2 \varepsilon). \tag{4.2}
\]
For any irreducible finite type constraint, the noiseless capacity and the Markov process of maximal entropy rate can be computed in various ways (which all go back to Shannon; see [21] or [20], page 444). Let $A$ denote the adjacency matrix of the standard graph presentation, with $k + 1$ states, of $S(d, k)$, and let $\rho_0$ denote the reciprocal of its largest eigenvalue. One can write $C(S(d, k)) = -\log \rho_0$, and in this case $\rho_0$ is the real root of
\[
\sum_{\ell = d}^{k} \rho_0^{\ell+1} = 1. \tag{4.3}
\]
In the following we compute $f(X_{\vec{p}^{\max}})$ explicitly in terms of $\rho_0$.
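As a numerical illustration (ours, not from the paper), $\rho_0$ can be obtained from (4.3) by bisection, since the left-hand side is increasing in $\rho_0$ on $(0,1)$, and cross-checked against the reciprocal of the largest eigenvalue of the $(k+1)$-state adjacency matrix; the instance $(d, k) = (1, 3)$ is an arbitrary choice.

```python
import math

d, k = 1, 3  # an illustrative finite (d,k)-RLL constraint with d < k

# Solve (4.3): sum_{l=d}^{k} rho**(l+1) = 1 by bisection on (0, 1).
def phi(rho):
    return sum(rho ** (l + 1) for l in range(d, k + 1)) - 1.0

lo, hi = 0.0, 1.0
for _ in range(200):
    mid = (lo + hi) / 2
    if phi(mid) < 0:
        lo = mid
    else:
        hi = mid
rho0 = (lo + hi) / 2
capacity = -math.log(rho0)  # C(S(d,k)) = -log(rho0), in nats

# Cross-check: spectral radius of the standard (k+1)-state presentation.
# State i = number of 0s since the last 1; a 0 moves i -> i+1 (for i < k),
# and a 1 moves i -> 0, allowed only when i >= d.
n = k + 1
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    if i + 1 < n:
        A[i][i + 1] = 1.0
    if i >= d:
        A[i][0] = 1.0

# Power iteration for the largest eigenvalue lam of A; rho0 should be 1/lam.
v = [1.0] * n
lam = 1.0
for _ in range(500):
    w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = max(w)
    v = [x / lam for x in w]
print(rho0, 1.0 / lam, capacity)
```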
Let $\vec{w} = (w_0, w_1, \ldots, w_k)$ and $\vec{v} = (v_0, v_1, \ldots, v_k)$ denote the left and right eigenvectors of $A$, scaled such that $\vec{w} \cdot \vec{v} = 1$. Then one checks that, with $X = X_{\vec{p}^{\max}}$,
\[
p_X(1) = w_0 v_0 = \frac{1}{(k+1) - \sum_{j=d+1}^{k} \sum_{l=0}^{j-d-1} 1/\rho_0^{\,l-j}},
\]
\[
p_X(1 0^{l_1+l_2+1} 1) = p_X(1)\, \rho_0^{l_1+l_2+2}, \qquad p_X(1 0^{k} 1) = p_X(1)\, \rho_0^{k+1},
\]
\[
\begin{aligned}
p_X(1 0^{l_1} 1 0^{l_2}) &= p_X(1 0^{l_1} 1 0^{l_2} 1) + p_X(1 0^{l_1} 1 0^{l_2+1} 1) + \cdots + p_X(1 0^{l_1} 1 0^{k} 1) \\
&= p_X(1)\, \rho_0^{l_1+l_2+2} (1 + \rho_0 + \cdots + \rho_0^{k-l_2})
= p_X(1)\, \rho_0^{l_1+l_2+2}\, \frac{1 - \rho_0^{k-l_2+1}}{1 - \rho_0}
\end{aligned}
\]
and
\[
p_X(1 0^{l}) = p_X(1 0^{l} 1) + p_X(1 0^{l+1} 1) + \cdots + p_X(1 0^{k} 1)
= p_X(1)\, \rho_0^{l+1} (1 + \rho_0 + \cdots + \rho_0^{k-l})
= p_X(1)\, \rho_0^{l+1}\, \frac{1 - \rho_0^{k-l+1}}{1 - \rho_0}.
\]
So we obtain an explicit expression:
\[
\begin{aligned}
f(X_{\vec{p}^{\max}}) &= \sum_{\substack{l_1+l_2 \le k-1,\\ 0 \le l_2 \le d-1,\ l_1 \ge d}} p_X(1 0^{l_1+l_2+1} 1)
+ \left(\sum_{l_1 = k,\ l_2 = 0} + \sum_{\substack{l_1+l_2 = k,\\ k-1 \ge l_1 \ge d}}\right) p_X(1 0^{l_1} 1 0^{l_2})
+ \sum_{1 \le l \le d} p_X(1 0^{l}) \\
&= p_X(1)\, \rho_0^{k+1} + \sum_{\substack{l_1+l_2 \le k-1,\\ 0 \le l_2 \le d-1,\ l_1 \ge d}} p_X(1)\, \rho_0^{l_1+l_2+2}
+ \sum_{\substack{l_1+l_2 = k,\\ k-1 \ge l_1 \ge d}} p_X(1)\, \rho_0^{l_1+l_2+2}\, \frac{1 - \rho_0^{k-l_2+1}}{1 - \rho_0}
+ \sum_{1 \le l \le d} p_X(1)\, \rho_0^{l+1}\, \frac{1 - \rho_0^{k-l+1}}{1 - \rho_0}.
\end{aligned}
\]
The coefficient $g$ can also be computed explicitly but takes a more complicated form.

Example 4.1. Consider a first order stationary Markov chain $X$ with $\mathcal{A}(X) \subseteq S(1, \infty)$, transmitted over BSC($\varepsilon$) with corresponding output $Z$, a hidden Markov chain. In this case, $X$ can be characterized by the following probability vector:
\[
\vec{p}_1 = (p_X(00), p_X(01), p_X(10)).
\]
Note that $\hat{m}(S) = 1$, and the only sequence $w_{-2} w_{-1} v$ which satisfies the requirement that $w_{-2} w_{-1}$ is in $S$ and $w_{-2} w_{-1} v$ is not allowable in $S$ is $011$.
It then follows that
\[
f(X_{\vec{p}_1}) = p(01\bar{1}) + p(0\bar{1}1) + p(\bar{0}11) = \pi_{01}(2 - \pi_{01})/(1 + \pi_{01}), \tag{4.4}
\]
where $\pi_{01}$ denotes the transition probability from $0$ to $1$ in $X$. Straightforward, but tedious, computation also leads to
\[
\begin{aligned}
g(X_{\vec{p}_1}) = (1 + \pi_{01})^{-1} \bigl(
& 2\pi_{01} - \pi_{01}^2 - 2\pi_{01}^3 + 3\pi_{01}^4 - \pi_{01}^5 \\
& + (-2\pi_{01} + 4\pi_{01}^3 - 2\pi_{01}^4) \log 2 \\
& + (-1 + 3\pi_{01} - \pi_{01}^2 - 2\pi_{01}^3 + 5\pi_{01}^4 - 3\pi_{01}^5) \log \pi_{01} \\
& + (2 - 6\pi_{01} + 7\pi_{01}^3 - 8\pi_{01}^4 + 3\pi_{01}^5) \log(1 - \pi_{01}) \\
& + (2\pi_{01} + \pi_{01}^2 - 3\pi_{01}^3 + \pi_{01}^4) \log(2 - \pi_{01}) \bigr).
\end{aligned}
\]
Thus,
\[
H(Z_{\vec{p}_1, \varepsilon}) = H(X_{\vec{p}_1}) + \bigl(\pi_{01}(2 - \pi_{01})/(1 + \pi_{01})\bigr)\, \varepsilon \log(1/\varepsilon) + (g(X_{\vec{p}_1}) - 1)\, \varepsilon + O(\varepsilon^2 \log \varepsilon).
\]
This asymptotic formula was originally proven in [23], with a less precise result that replaces $(g(X_{\vec{p}_1}) - 1)\varepsilon + O(\varepsilon^2 \log^2(1/\varepsilon))$ by $O(\varepsilon)$.

The maximum entropy Markov chain $X_{\vec{p}^{\max}}$ on $S(1, \infty)$ is defined by the transition probability matrix
\[
\begin{pmatrix} 1/\lambda & 1/\lambda^2 \\ 1 & 0 \end{pmatrix}
\]
and
\[
C(S) = H(X_{\vec{p}^{\max}}) = \log \lambda,
\]
where $\lambda$ is the golden mean. Thus, in this case, $\pi_{01} = 1/\lambda^2$, and so by Corollary 3.3 we obtain
\[
C(S, \varepsilon) = \log \lambda - \frac{2\lambda + 2}{4\lambda + 3}\, \varepsilon \log(1/\varepsilon) + \bigl(g(X_{\vec{p}_1})|_{\pi_{01} = 1/\lambda^2} - 1\bigr)\, \varepsilon + O(\varepsilon^2 \log^2(1/\varepsilon)).
\]
We now show that the error term in the above formula cannot be improved, in the sense that the error term is of size at least a positive constant times $\varepsilon^2 \log^2(1/\varepsilon)$. First observe that if we parameterize $\vec{p}_1 = \vec{p}_1(\varepsilon)$ in any way, we obtain
\[
C(S, \varepsilon) \ge H(Z_{\vec{p}_1(\varepsilon), \varepsilon}) - H(\varepsilon). \tag{4.5}
\]
Since $\vec{p}_1$ is uniquely determined by the transition probability $\pi_{01}$, we shall rewrite $\vec{p}_1(\varepsilon)$ as $\pi_{01}(\varepsilon)$. We shall also rewrite the value $\pi_{01} = 1/\lambda^2$ at the maximum entropy Markov chain as $p^{\max}$. Choose the parametrization $\pi_{01}(\varepsilon) = p^{\max} + \alpha \varepsilon \log(1/\varepsilon)$, where $\alpha$ is selected as follows.
Let $K_1$ denote the value of the second derivative of $H(X_{\pi_{01}})$ at $\pi_{01} = p^{\max}$ (the first derivative at $\pi_{01} = p^{\max}$ is $0$). Let $K_2$ denote the value of the first derivative of $f(X_{\pi_{01}})$ at $\pi_{01} = p^{\max}$. These values can be computed explicitly: $K_1$ from the formula (1.3) for the entropy rate of a first order Markov chain, and $K_2$ from (4.4) above. A computation shows that $K_1 \approx -3.065$ and $K_2 \approx 0.571$ (all that really matters is that neither constant is $0$). Let $\alpha$ be any number such that $0 < \alpha < K_2/|K_1|$.

From Theorem 2.4 and Remark 2.6, we have
\[
H(Z_{\pi_{01}(\varepsilon), \varepsilon}) \ge H(X_{\pi_{01}(\varepsilon)}) + f(X_{\pi_{01}(\varepsilon)})\, \varepsilon \log(1/\varepsilon) + g(X_{\pi_{01}(\varepsilon)})\, \varepsilon + C_1 \varepsilon^2 \log \varepsilon \tag{4.6}
\]
for some constant $C_1$ (independent of $\varepsilon$, for $\varepsilon$ sufficiently small). We also have
\[
H(X_{\pi_{01}(\varepsilon)}) \ge H(X_{p^{\max}}) + K_1 (\alpha \varepsilon \log(1/\varepsilon))^2 + C_2 (\alpha \varepsilon \log(1/\varepsilon))^3 \tag{4.7}
\]
for some constant $C_2$. And
\[
f(X_{\pi_{01}(\varepsilon)}) \ge f(X_{p^{\max}}) + K_2 (\alpha \varepsilon \log(1/\varepsilon)) + C_3 (\alpha \varepsilon \log(1/\varepsilon))^2, \tag{4.8}
\]
\[
g(X_{\pi_{01}(\varepsilon)}) \ge g(X_{p^{\max}}) + C_4 (\alpha \varepsilon \log(1/\varepsilon)) \tag{4.9}
\]
for constants $C_3$, $C_4$. And recall that
\[
H(\varepsilon) = \varepsilon \log(1/\varepsilon) + (1 - \varepsilon) \log(1/(1 - \varepsilon)) = \varepsilon \log(1/\varepsilon) + \varepsilon + C_5 \varepsilon^2 \tag{4.10}
\]
for some constant $C_5$. Recalling that $H(X_{p^{\max}}) = C(S)$ and combining (4.5)–(4.10), we see that
\[
C(S, \varepsilon) \ge C(S) + (f(X_{p^{\max}}) - 1)\, \varepsilon \log(1/\varepsilon) + (g(X_{p^{\max}}) - 1)\, \varepsilon + K_1 (\alpha \varepsilon \log(1/\varepsilon))^2 + K_2\, \alpha \varepsilon^2 \log^2(1/\varepsilon)
\]
plus "error terms" which add up to
\[
C_1 \varepsilon^2 \log \varepsilon + C_2 (\alpha \varepsilon \log(1/\varepsilon))^3 + C_3 \alpha^2 (\varepsilon \log(1/\varepsilon))^3 + C_4\, \alpha \varepsilon^2 \log(1/\varepsilon) + C_5 \varepsilon^2,
\]
which is lower bounded by a constant $M$ times $\varepsilon^2 \log(1/\varepsilon)$. Thus, we see that the difference between $C(S, \varepsilon)$ and
\[
C(S) + (f(X_{p^{\max}}) - 1)\, \varepsilon \log(1/\varepsilon) + (g(X_{p^{\max}}) - 1)\, \varepsilon
\]
is lower bounded by
\[
\alpha (K_1 \alpha + K_2)\, \varepsilon^2 \log^2(1/\varepsilon) + M \varepsilon^2 \log(1/\varepsilon). \tag{4.11}
\]
Since $\alpha > 0$ and $K_1 \alpha + K_2 > 0$, for sufficiently small $\varepsilon$, (4.11) is lower bounded by a positive constant times $\varepsilon^2 \log^2(1/\varepsilon)$, as desired.

APPENDIX

We first prove (2.11)–(2.13).

• (2.11) follows trivially from the fact that $X$ is an $m$th order Markov chain.

• Now consider (2.12). For $w \in \mathcal{A}(X)$ and $wv \notin \mathcal{A}(X)$,
\[
h_{n+1}^k(wv) = \sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) + p_X(w_{-n}^{-1} \bar{v})
= \sum_{j=1}^{m} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) + p_X(w_{-n}^{-1} \bar{v}).
\]
So
\[
\begin{aligned}
\frac{h_{n+1}^k(wv)}{p_X(w)} &= \frac{\sum_{j=1}^{m} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) + p_X(w_{-n}^{-1} \bar{v})}{p_X(w_{-n}^{-1})} \\
&= \left(\sum_{j=1}^{m} p_X(w_{-m}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v \mid w_{-2m}^{-m-1}) + p_X(w_{-m}^{-1} \bar{v} \mid w_{-2m}^{-m-1})\right) p_X(w_{-n}^{-m-1}) \bigl(p_X(w_{-m}^{-1} \mid w_{-2m}^{-m-1})\, p_X(w_{-n}^{-m-1})\bigr)^{-1} \\
&= \frac{\sum_{j=1}^{m} p_X(w_{-2m}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) + p_X(w_{-2m}^{-1} \bar{v})}{p_X(w_{-2m}^{-1})}.
\end{aligned}
\]

• For (2.13), there are two cases. If $p_X(w_{-n}^{-m-1}) = 0$,
\[
\frac{h_{n+1}^k(wv)}{h_n^k(w)} = \frac{\sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v)}{\sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1})}
= \frac{\sum_{j=m+1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v)}{\sum_{j=m+1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1})}
= p_X(v \mid w_{-m}^{-1}).
\]
If $p_X(w_{-n}^{-m-1}) > 0$,
\[
\frac{h_{n+1}^k(wv)}{h_n^k(w)} = \frac{\sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v)}{\sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1})}
= \frac{\sum_{j=1}^{2m} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v)}{\sum_{j=1}^{2m} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1})}
= \frac{\sum_{j=1}^{2m} p_X(w_{-3m}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v)}{\sum_{j=1}^{2m} p_X(w_{-3m}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1})}.
\]
Using (2.11)–(2.13), we now proceed to prove (2.14)–(2.16).
• For (2.14), we have
\[
\begin{aligned}
&\sum_{wv \in \mathcal{A}(X)} \frac{h_{n+1}^k(wv)\, p_X(w) - h_n^k(w)\, p_X(wv)}{p_X(w)} \\
&\quad = \sum_{wv \in \mathcal{A}(X)} h_{n+1}^k(wv) - \sum_{wv \in \mathcal{A}(X)} h_n^k(w)\, p_X(v \mid w_{-m}^{-1}) \\
&\quad = \sum_{wv \in \mathcal{A}(X)} \left(\sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) + p_X(w_{-n}^{-1} \bar{v})\right) - (n+1-k) \sum_{wv \in \mathcal{A}(X)} p_X(wv) \\
&\qquad - \sum_{wv \in \mathcal{A}(X)} \sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1})\, p_X(v \mid w_{-m}^{-1}) + (n-k) \sum_{wv \in \mathcal{A}(X)} p_X(wv) \\
&\quad = \sum_{wv \in \mathcal{A}(X)} \sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) - \sum_{w \in \mathcal{A}(X)} \sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1}) + \sum_{wv \in \mathcal{A}(X)} p_X(w_{-n}^{-1} \bar{v}) - 1 \\
&\quad = \sum_{wv \in \mathcal{A}(X)} \sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) - \sum_{w \in \mathcal{A}(X)} \left(\sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} 0) + \sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} 1)\right) \\
&\qquad + \sum_{w_{-m}^{-1} v \in \mathcal{A}(X)} p_X(w_{-m}^{-1} \bar{v}) - 1 \\
&\quad = \sum_{wv \in \mathcal{A}(X)} \sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) - \sum_{wv \in \mathcal{A}(X)} \sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) \\
&\qquad - \sum_{w \in \mathcal{A}(X),\, wv \notin \mathcal{A}(X)} \sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) + \sum_{w_{-m}^{-1} v \in \mathcal{A}(X)} p_X(w_{-m}^{-1} \bar{v}) - 1 \\
&\quad = -\sum_{w_{-2m}^{-1} \in \mathcal{A}(X),\, w_{-2m}^{-1} v \notin \mathcal{A}(X)} \sum_{j=1}^{m} p_X(w_{-2m}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) + \sum_{w_{-m}^{-1} v \in \mathcal{A}(X)} p_X(w_{-m}^{-1} \bar{v}) - 1.
\end{aligned}
\]

• For (2.15), we have
\[
\begin{aligned}
\sum_{w \in \mathcal{A}(X),\, wv \notin \mathcal{A}(X)} h_{n+1}^k(wv) \log \frac{h_{n+1}^k(wv)}{p_X(w)}
&= \sum_{w \in \mathcal{A}(X),\, wv \notin \mathcal{A}(X)} h_{n+1}^k(wv) \log \frac{h_{2m+1}^0(w_{-2m}^{-1} v)}{p_X(w_{-2m}^{-1})} \\
&= \sum_{w \in \mathcal{A}(X),\, wv \notin \mathcal{A}(X)} \sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) \log \frac{h_{2m+1}^0(w_{-2m}^{-1} v)}{p_X(w_{-2m}^{-1})} \\
&= \sum_{w_{-2m}^{-1} \in \mathcal{A}(X),\, w_{-2m}^{-1} v \notin \mathcal{A}(X)} \sum_{j=1}^{m} p_X(w_{-2m}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) \log \frac{h_{2m+1}^0(w_{-2m}^{-1} v)}{p_X(w_{-2m}^{-1})}.
\end{aligned}
\]
• For (2.16), we have
\[
\begin{aligned}
&\sum_{wv \in \mathcal{A}(X)} h_{n+1}^k(wv) \log p_X(v \mid w) + \sum_{p_X^Z(w) = \Theta(\varepsilon),\, p_X^Z(wv) = \Theta(\varepsilon)} h_{n+1}^k(wv) \log \frac{h_{n+1}^0(wv)}{h_n^0(w)} \\
&\quad = \sum_{wv \in \mathcal{A}(X)} \left(\sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) + p_X(w_{-n}^{-1} \bar{v}) - (n+1-k)\, p_X(wv)\right) \log p_X(v \mid w_{-m}^{-1}) \\
&\qquad + \sum_{\substack{p_X^Z(w) = \Theta(\varepsilon),\, p_X^Z(wv) = \Theta(\varepsilon),\\ p_X(w_{-n}^{-m-1}) = 0}} h_{n+1}^k(wv) \log \frac{h_{n+1}^k(wv)}{h_n^k(w)} + \sum_{\substack{p_X^Z(w) = \Theta(\varepsilon),\, p_X^Z(wv) = \Theta(\varepsilon),\\ p_X(w_{-n}^{-m-1}) > 0}} h_{n+1}^k(wv) \log \frac{h_{n+1}^k(wv)}{h_n^k(w)} \\
&\quad = \left(\sum_{wv \in \mathcal{A}(X)} + \sum_{\substack{p_X^Z(w) = \Theta(\varepsilon),\, p_X^Z(wv) = \Theta(\varepsilon),\\ p_X(w_{-n}^{-m-1}) = 0}}\right) \left(\sum_{j=1}^{n-k} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) + p_X(w_{-n}^{-1} \bar{v})\right) \log p_X(v \mid w_{-m}^{-1}) \\
&\qquad - (n+1-k) \sum_{w_{-m}^{-1} v \in \mathcal{A}(X)} p_X(w_{-m}^{-1} v) \log p_X(v \mid w_{-m}^{-1}) + \sum_{\substack{p_X^Z(w_{-3m}^{-1}) = \Theta(\varepsilon),\, p_X^Z(w_{-3m}^{-1} v) = \Theta(\varepsilon),\\ p_X(w_{-3m}^{-m-1}) > 0}} h_{3m+1}^0(wv) \log \frac{h_{3m+1}^0(wv)}{h_{3m}^0(w)} \\
&\quad = (n-k-m) \sum_{w_{-m}^{-1} v \in \mathcal{A}(X)} p_X(w_{-m}^{-1} v) \log p_X(v \mid w_{-m}^{-1}) + \sum_{wv \in \mathcal{A}(X)} \left(\sum_{j=1}^{m} p_X(w_{-n}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) + p_X(w_{-n}^{-1} \bar{v})\right) \log p_X(v \mid w_{-m}^{-1}) \\
&\qquad - (n+1-k) \sum_{w_{-m}^{-1} v \in \mathcal{A}(X)} p_X(w_{-m}^{-1} v) \log p_X(v \mid w_{-m}^{-1}) + \sum_{\substack{p_X^Z(w_{-3m}^{-1}) = \Theta(\varepsilon),\, p_X^Z(w_{-3m}^{-1} v) = \Theta(\varepsilon),\\ p_X(w_{-3m}^{-m-1}) > 0}} h_{3m+1}^0(wv) \log \frac{h_{3m+1}^0(wv)}{h_{3m}^0(w)} \\
&\quad = (-m-1) \sum_{w_{-m}^{-1} v \in \mathcal{A}(X)} p_X(w_{-m}^{-1} v) \log p_X(v \mid w_{-m}^{-1}) + \sum_{w_{-2m}^{-1} v \in \mathcal{A}(X)} \left(\sum_{j=1}^{m} p_X(w_{-2m}^{-j-1} \bar{w}_{-j} w_{-j+1}^{-1} v) + p_X(w_{-2m}^{-1} \bar{v})\right) \log p_X(v \mid w_{-m}^{-1}) \\
&\qquad + \sum_{\substack{p_X^Z(w_{-3m}^{-1}) = \Theta(\varepsilon),\, p_X^Z(w_{-3m}^{-1} v) = \Theta(\varepsilon),\\ p_X(w_{-3m}^{-m-1}) > 0}} h_{3m+1}^0(wv) \log \frac{h_{3m+1}^0(wv)}{h_{3m}^0(w)}.
\end{aligned}
\]

Acknowledgments.
We are grateful to Wojciech Szpankowski, who raised the problem addressed in this paper and suggested a version of the result in Corollary 3.3. We also thank the anonymous reviewers for helpful comments.

REFERENCES

[1] Arimoto, S. (1972). An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Trans. Inform. Theory IT-18 14–20. MR0403796
[2] Arnold, D. and Loeliger, H. (2001). The information rate of binary-input channels with memory. In Proceedings of 2001 IEEE International Conference on Communications (Helsinki, Finland) 2692–2695.
[3] Arnold, D. M., Loeliger, H.-A., Vontobel, P. O., Kavčić, A. and Zeng, W. (2006). Simulation-based computation of information rates for channels with memory. IEEE Trans. Inform. Theory 52 3498–3508. MR2242361
[4] Arnold, L., Gundlach, V. M. and Demetrius, L. (1994). Evolutionary formalism for products of positive random matrices. Ann. Appl. Probab. 4 859–901. MR1284989
[5] Birch, J. J. (1962). Approximation for the entropy for functions of Markov chains. Ann. Math. Statist. 33 930–938. MR0141162
[6] Blackwell, D. (1957). The entropy of functions of finite-state Markov chains. In Transactions of the First Prague Conference on Information Theory, Statistical Decision Functions, Random Processes 13–20. Publishing House of the Czechoslovak Academy of Sciences, Prague. MR0100297
[7] Blahut, R. E. (1972). Computation of channel capacity and rate-distortion functions. IEEE Trans. Inform. Theory IT-18 460–473. MR0476161
[8] Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. Wiley, New York. MR1122806
[9] Egner, S., Balakirsky, V., Tolhuizen, L., Baggen, S. and Hollmann, H. (2004). On the entropy rate of a hidden Markov model. In Proceedings of the 2004 IEEE International Symposium on Information Theory (Chicago, IL) 12.
[10] Ephraim, Y.
and Merhav, N. (2002). Hidden Markov processes. IEEE Trans. Inform. Theory 48 1518–1569. Special issue on Shannon theory: Perspective, trends, and applications. MR1909472
[11] Gharavi, R. and Anantharam, V. (2005). An upper bound for the largest Lyapunov exponent of a Markovian product of nonnegative matrices. Theoret. Comput. Sci. 332 543–557. MR2122519
[12] Gray, R. M. (1990). Entropy and Information Theory. Springer, New York. MR1070359
[13] Han, G. and Marcus, B. (2006). Analyticity of entropy rate of hidden Markov chains. IEEE Trans. Inform. Theory 52 5251–5266. MR2300690
[14] Han, G. and Marcus, B. (2007). Derivatives of entropy rate in special families of hidden Markov chains. IEEE Trans. Inform. Theory 53 2642–2652. MR2319402
[15] Holliday, T., Goldsmith, A. and Glynn, P. (2003). Capacity of finite state Markov channels with general inputs. In Proceedings of the 2003 IEEE International Symposium on Information Theory 289.
[16] Holliday, T., Goldsmith, A. and Glynn, P. (2006). Capacity of finite state channels based on Lyapunov exponents of random matrices. IEEE Trans. Inform. Theory 52 3509–3532. MR2242362
[17] Jacquet, P., Seroussi, G. and Szpankowski, W. (2004). On the entropy of a hidden Markov process (extended abstract). In Proceedings of Data Compression Conference 362–371.
[18] Jacquet, P., Seroussi, G. and Szpankowski, W. (2007). On the entropy of a hidden Markov process. Theoret. Comput. Sci. 395 203–219. MR2424508
[19] Jacquet, P., Seroussi, G. and Szpankowski, W. (2007). Noisy constrained capacity. In Proceedings of the 2007 IEEE International Symposium on Information Theory (Nice, France) 986–990.
[20] Lind, D. and Marcus, B. (1995). An Introduction to Symbolic Dynamics and Coding. Cambridge Univ. Press, Cambridge. MR1369092
[21] Marcus, B. H., Roth, R. M. and Siegel, P. H. (1998).
Constrained systems and coding for recording channels. In Handbook of Coding Theory (V. S. Pless and W. C. Huffman, eds.) II 1635–1764. North-Holland, Amsterdam. MR1667956
[22] Ordentlich, E. and Weissman, T. (2006). On the optimality of symbol-by-symbol filtering and denoising. IEEE Trans. Inform. Theory 52 19–40. MR2237333
[23] Ordentlich, E. and Weissman, T. (2004). New bounds on the entropy rate of hidden Markov processes. In Proceedings of IEEE Information Theory Workshop (San Antonio, TX) 24–29.
[24] Parry, W. (1964). Intrinsic Markov chains. Trans. Amer. Math. Soc. 112 55–66. MR0161372
[25] Peres, Y. (1990). Analytic dependence of Lyapunov exponents on transition probabilities. In Lyapunov Exponents (Oberwolfach, 1990). Lecture Notes in Mathematics (L. Arnold, H. Crauel and J.-P. Eckmann, eds.) 1486 64–80. Springer, Berlin. MR1178947
[26] Peres, Y. (1992). Domains of analytic continuation for the top Lyapunov exponent. Ann. Inst. H. Poincaré Probab. Statist. 28 131–148. MR1158741
[27] Pfister, H., Soriaga, J. and Siegel, P. (2001). The achievable information rates of finite-state ISI channels. In Proceedings of IEEE GLOBECOM (San Antonio, TX) 2992–2996.
[28] Ruelle, D. (1979). Analycity properties of the characteristic exponents of random matrix products. Adv. in Math. 32 68–80. MR534172
[29] Shamai (Shitz), S. and Kofman, Y. (1990). On the capacity of binary and Gaussian channels with run-length limited inputs. IEEE Trans. Commun. 38 584–594.
[30] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Tech. J. 27 379–423, 623–656. MR0026286
[31] Sharma, V. and Singh, S. (2001). Entropy and channel capacity in the regenerative setup with applications to Markov channels. In Proceedings of IEEE International Symposium on Information Theory (Washington, DC) 283.
[32] Vontobel, P. O.
, Kavčić, A., Arnold, D. M. and Loeliger, H.-A. (2008). A generalization of the Blahut-Arimoto algorithm to finite-state channels. IEEE Trans. Inform. Theory 54 1887–1918. MR2450041
[33] Zehavi, E. and Wolf, J. (1988). On runlength codes. IEEE Trans. Inform. Theory 34 45–54.
[34] Zuk, O., Kanter, I. and Domany, E. (2005). The entropy of a binary hidden Markov process. J. Stat. Phys. 121 343–360. MR2185333
[35] Zuk, O., Domany, E., Kanter, I. and Aizenman, M. (2006). From finite-system entropy to entropy rate for a hidden Markov process. IEEE Signal Processing Letters 13 517–520.

Department of Mathematics
University of Hong Kong
Pokfulam Road, Pokfulam
Hong Kong
E-mail: ghan@maths.hku.hk

Department of Mathematics
University of British Columbia
Vancouver, B.C. V6T 1Z2
Canada
E-mail: marcus@math.ubc.ca