Classical Capacities of Averaged and Compound Quantum Channels
We determine the capacity of compound classical-quantum channels. As a consequence we obtain the capacity formula for the averaged classical-quantum channels. The capacity result for compound channels demonstrates, as in the classical setting, the ex…
Authors: Igor Bjelakovic, Holger Boche
Classical Capacities of Compound and Averaged Quantum Channels

Igor Bjelaković and Holger Boche
Heinrich-Hertz-Chair for Mobile Communications, Technische Universität Berlin, Werner-von-Siemens-Bau (HFT 6), Einsteinufer 25, 10587 Berlin, Germany & Institut für Mathematik, Technische Universität Berlin, Straße des 17. Juni 136, 10623 Berlin, Germany
Email: {igor.bjelakovic, holger.boche}@mk.tu-berlin.de

Abstract: We determine the capacity of compound classical-quantum channels. As a consequence we obtain the capacity formula for the averaged classical-quantum channels. The capacity result for compound channels demonstrates, as in the classical setting, the existence of reliable universal classical-quantum codes in scenarios where the only a priori information about the channel used for the transmission of information is that it belongs to a given set of memoryless classical-quantum channels. Our approach is based on a universal classical approximation of the quantum relative entropy which in turn relies on a universal hypothesis testing result.

Index Terms: Compound quantum channels, averaged quantum channels, coding theorem, capacity, universal quantum codes

I. INTRODUCTION

In this paper we present the coding theorems for compound and averaged channels with classical input and quantum output (cq-channels). The result nicely supplements recent results of Datta and Dorlas [6], where they considered finite weighted sums of memoryless quantum channels and determined their classical capacity. This is one of the basic examples of channels with long-term memory.
This is obviously equivalent to the determination of the classical capacity for the associated compound channel consisting of finitely many channels, since for finite sums we can easily bound the error probabilities of the individual memoryless branches by the error probability of the averaged channel and vice versa. Unfortunately, the beautiful method of proof in [6] does not apply when the number of channels is infinite.

Roughly, the interest in compound channels is motivated by the fact that in many situations we have only limited knowledge about the channel which is used for the transmission of information. In the compound setting we know merely that the memoryless cq-channel which is in use belongs to some given finite or infinite set of memoryless cq-channels which is a priori known to the sender and receiver. Their goal is to construct coding-decoding strategies that work well for the whole set of channels simultaneously. The situation is comparable with the universal source coding scenario considered in [17] by Jozsa and M., P., and R. Horodecki. Averaged cq-channels are close relatives of compound channels, the difference being that in this situation the communicating parties have access to an additional a priori probability distribution governing the appearance of the particular member of the compound channel.

This work is supported by the Deutsche Forschungsgemeinschaft DFG via project Bj 57/1-1 "Entropie und Kodierung großer Quanten-Informationssysteme".

The paper is organized as follows: In Section II we give a rapid overview of the classical theory of compound channels. Section III is devoted to the notion of compound cq-channels and the definition of the capacity for this class of channels. The subsequent Section IV contains the first pillar of our argument.
Namely, we construct, using an idea going back to Nagaoka, a universal classical approximation of the quantum relative entropy for classes of uncorrelated quantum states. The central Section V starts with a relation between a minimization procedure arising in universal hypothesis testing and the minimization process required for the determination of the capacity of compound cq-channels, which is based on Donald's inequality (cf. Lemmata 5.1 and 5.3). Then we proceed with the direct and the (strong) converse part of the coding theorem for compound cq-channels (see footnote 1). As a by-product we can prove in Section VI the coding theorem and the weak converse for arbitrary averaged cq-channels with memoryless branches. This extends, in part, the results of Ahlswede [2] to the cq-situation. Moreover, the results of Datta and Dorlas [6] are generalized to averages of memoryless cq-channels with respect to arbitrary probability measures, provided the set of channels has some appropriate measurable structure.

Footnote 1: After the submission of this paper Hayashi [12] obtained a similar result via Weyl-Schur duality. His result can be used to give another proof of the direct part of the coding theorem for averaged channels. His error bounds are exponential but depend on the channel.

A. Notation

We will assume tacitly throughout the paper that all Hilbert spaces are over the field $\mathbb{C}$. The identity operator acting on a Hilbert space $\mathcal{H}$ is denoted by $\mathbf{1}_{\mathcal{H}}$ or simply by $\mathbf{1}$ if it is clear from the context which Hilbert space is under consideration. The set of density operators acting on the finite-dimensional Hilbert space $\mathcal{H}$ is denoted by $\mathcal{S}(\mathcal{H})$ and the set of probability distributions on a finite set $A$ will be abbreviated by $\mathcal{P}(A)$. $|A|$ denotes the cardinality of the set $A$. The projection onto
the range of a density operator $\rho \in \mathcal{S}(\mathcal{H})$, $\dim \mathcal{H} < \infty$, is called the support of $\rho$ and we dedicate the notation $\mathrm{supp}(\rho)$ to it. The relative entropy of the state (i.e. density operator) $\rho$ with respect to the state $\sigma$ is given by
\[ S(\rho\|\sigma) := \begin{cases} \mathrm{tr}(\rho \log \rho - \rho \log \sigma) & \text{if } \mathrm{supp}(\rho) \le \mathrm{supp}(\sigma), \\ \infty & \text{else,} \end{cases} \]
where $\mathrm{tr}$ stands for the trace and $\log$ is the binary logarithm. The classical analog of the relative entropy, known as the Kullback-Leibler distance, is defined by
\[ D(p\|q) := \begin{cases} \sum_{a \in A} \big( p(a) \log p(a) - p(a) \log q(a) \big) & \text{if } p \ll q, \\ \infty & \text{else,} \end{cases} \]
where $p, q \in \mathcal{P}(A)$. The relation $p \ll q$ means that $q(a) = 0$ for some $a \in A$ implies $p(a) = 0$ or, equivalently, that $\mathrm{supp}(p) \subset \mathrm{supp}(q)$, where $\mathrm{supp}(p) := \{a \in A : p(a) > 0\}$. The von Neumann entropy of a density operator $\rho \in \mathcal{S}(\mathcal{H})$, $\dim \mathcal{H} < \infty$, is defined to be $S(\rho) := -\mathrm{tr}(\rho \log \rho)$. The Shannon entropy of $p \in \mathcal{P}(A)$, $|A| < \infty$, is given by $H(p) := -\sum_{x \in A} p(x) \log p(x)$.

The $n$-fold Cartesian product of a finite set $A$ with itself is denoted by $A^n$. We set $x^n := (x_1, \ldots, x_n)$ for sequences $(x_1, \ldots, x_n) \in A^n$. The notation we use for the logarithms is as follows: $\log_a$ is the logarithm to the base $a > 1$ and $\log$ is understood as $\log_2$.

II. SHORT OVERVIEW OF THE CLASSICAL THEORY OF COMPOUND CHANNELS

The basic classical theory of compound channels was developed independently by Blackwell, Breiman, Thomasian [4] and Wolfowitz [24]. Blackwell, Breiman and Thomasian proved the coding theorem with the weak converse. Wolfowitz, on the other hand, obtained the coding theorem with the strong converse for the maximum error criterion by an entirely different method of proof. We recall at this place briefly the capacity formula just to emphasize the similarity to the capacity formula (6) for the cq-case.
For an arbitrary set $T$ and finite sets $A$, $B$ we consider the family of discrete channels $W_t : A \to B$, $t \in T$. The compound channel, denoted by $T$, is simply the whole family of discrete memoryless channels $\{W_t^n\}_{t \in T, n \in \mathbb{N}}$. Let $\lambda \in (0,1)$. An $(n, M_n, \lambda)_{\max}$-code for the compound channel $T$ is a set of tuples $(x^n(i), B_i)_{i=1}^{M_n}$ where $x^n(i) \in A^n$, $B_i \subseteq B^n$, $B_i \cap B_j = \emptyset$ for $i \neq j$, and $W_t^n(B_i | x^n(i)) \ge 1 - \lambda$ for all $i = 1, \ldots, M_n$ and all $t \in T$. A similar definition of the $(n, M_n, \lambda)_{\mathrm{av}}$-codes can be given simply by replacing the maximum error criterion by the average one. Thus the goal is to find reliable codes which work well for all discrete memoryless channels indexed by the set $T$.

The work [4], [24] can be summarized as follows: The weak capacity of the compound channel $T$ with respect to both the maximum and average error criteria is given by
\[ C(T) = \max_{p \in \mathcal{P}(A)} \inf_{t \in T} I(p, W_t), \tag{1} \]
where $\mathcal{P}(A)$ denotes the set of probability distributions on $A$ and $I(p, W_t)$ is the mutual information of the channel $W_t$ with respect to the input distribution $p$. Wolfowitz has shown that the RHS of (1) is the strong capacity with respect to the maximum error criterion. Ahlswede gives an example in [1] that demonstrates that, surprisingly, the strong converse need not hold for compound channels if the average probability of error is used in the definition of the capacity.

III. COMPOUND CQ-CHANNELS

We consider here a set of cq-channels $W_t : A \ni x \mapsto D_{t,x} \in \mathcal{S}(\mathcal{H})$, $t \in T$, for an arbitrary set $T$, where $A$ is a finite set and $\mathcal{H}$ is a finite-dimensional Hilbert space. The $n$-th memoryless extension of the cq-channel $W_t$ is given by $W_t^n(x^n) := D_{t,x^n} := D_{t,x_1} \otimes \ldots \otimes D_{t,x_n}$ for $x^n \in A^n$. The compound cq-channel is given by the family $\{W_t^n\}_{t \in T, n \in \mathbb{N}}$. We will write simply $T$ for the compound cq-channel.
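The memoryless extension $D_{t,x^n} = D_{t,x_1} \otimes \ldots \otimes D_{t,x_n}$ can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions: the qubit channel `W_t`, its two output states, and the helper names `kron`, `memoryless_extension` and `trace` are hypothetical and not taken from the paper; density operators are stored as nested lists of real numbers.

```python
from functools import reduce


def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def memoryless_extension(channel, word):
    """Return D_{t,x^n} = D_{t,x_1} (x) ... (x) D_{t,x_n} for one fixed channel.

    `channel` maps each input letter x to its output density matrix D_{t,x}.
    """
    return reduce(kron, (channel[x] for x in word))


def trace(a):
    """Trace of a square matrix given as nested lists."""
    return sum(a[i][i] for i in range(len(a)))


# Hypothetical qubit cq-channel with input alphabet A = {0, 1}
# (the numbers are illustrative only).
W_t = {
    0: [[1.0, 0.0], [0.0, 0.0]],  # D_{t,0} = |0><0|
    1: [[0.5, 0.5], [0.5, 0.5]],  # D_{t,1} = |+><+|
}

# Output state for the input sequence x^3 = (0, 1, 1): an 8 x 8 density matrix.
D = memoryless_extension(W_t, (0, 1, 1))
print(len(D), trace(D))
```

The dimension of the output grows as $d^n$, which is why such a direct representation is only practical for very short block lengths.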
An $n$-code, $n \in \mathbb{N}$, for the compound cq-channel $T$ is a family $\mathcal{C}_n := (x^n(i), b_i)_{i=1}^{M_n}$ consisting of sequences $x^n(i) \in A^n$ and positive semi-definite operators $b_i \in \mathcal{B}(\mathcal{H})^{\otimes n}$ such that $\sum_{i=1}^{M_n} b_i \le \mathbf{1}^{\otimes n}$. The number $M_n$ is called the size of the code. A code $\mathcal{C}_n$ is called an $(n, M_n, \lambda)_{\max}$-code for the compound cq-channel $T$ if the size of $\mathcal{C}_n$ is $M_n$, $x^n(i) \in A^n$, and if
\[ e_m(t, \mathcal{C}_n) := \max_{i=1,\ldots,M_n} \big(1 - \mathrm{tr}(D_{t,x^n(i)} b_i)\big) \le \lambda \quad \forall t \in T, \tag{2} \]
with an analogous definition of an $(n, M_n, \lambda)_{\mathrm{av}}$-code w.r.t. the average error probability criterion, i.e. we replace $e_m(t, \mathcal{C}_n) \le \lambda$ by
\[ e_a(t, \mathcal{C}_n) := \frac{1}{M_n} \sum_{i=1}^{M_n} \big(1 - \mathrm{tr}(D_{t,x^n(i)} b_i)\big) \le \lambda \quad \forall t \in T \]
in the definition. Thus an $(n, M_n, \lambda)_{\max}$-code for the compound channel $T$ ensures that the maximal error probability for all channels of class $T$ is bounded from above by $\lambda$. A more intuitive description of the compound channel is that the sender and receiver actually do not know which channel from the set $T$ is used during the transmission of the $n$-block. Their prior knowledge is merely that the channel is memoryless and belongs to the set $T$. This is a channel analog of the universal source coding problem for a set of memoryless sources (cf. [17]).

A real number $R \ge 0$ is said to be an achievable rate for the compound channel if there is a sequence of codes $(\mathcal{C}_n)_{n \in \mathbb{N}}$ of sizes $M_n$ such that
\[ \liminf_{n \to \infty} \frac{1}{n} \log M_n \ge R, \tag{3} \]
and
\[ \lim_{n \to \infty} \sup_{t \in T} e(t, \mathcal{C}_n) = 0. \tag{4} \]
The weak capacity, denoted by $C(T)$, of the compound channel $T$ is defined as the least upper bound of all achievable rates. $R \ge 0$ is called a $\lambda$-achievable rate for the compound channel $T$, $\lambda \in [0,1)$, if there is a sequence of codes $(\mathcal{C}_n)_{n \in \mathbb{N}}$ of sizes $M_n$ for which (3) holds but the error condition is relaxed to
\[ \sup_{t \in T} e(t, \mathcal{C}_n) \le \lambda \quad \forall n \in \mathbb{N}. \]
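The two error criteria $e_m(t, \mathcal{C}_n)$ and $e_a(t, \mathcal{C}_n)$, and their worst case over $t \in T$, can be sketched numerically as follows. This is a minimal illustration, assuming block length $n = 1$ and a finite set $T$ of two hypothetical qubit channels; the names `tr_prod`, `errors`, `compound_errors`, the channels `noiseless` and `noisy`, and all numerical values are assumptions made for the example.

```python
def tr_prod(a, b):
    """tr(a b) for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))


def errors(outputs, decoders):
    """Return (e_m, e_a) for one fixed channel t.

    outputs[i]  : D_{t,x^n(i)}, the output state belonging to codeword i
    decoders[i] : b_i, the decoding operator for message i
    """
    per_message = [1.0 - tr_prod(d, b) for d, b in zip(outputs, decoders)]
    return max(per_message), sum(per_message) / len(per_message)


def compound_errors(channels, codewords, decoders):
    """Worst case over all channels of e_m(t, C_n) and e_a(t, C_n)."""
    pairs = [errors([w[x] for x in codewords], decoders) for w in channels]
    return max(p[0] for p in pairs), max(p[1] for p in pairs)


# Two hypothetical qubit channels (block length n = 1, alphabet A = {0, 1}).
noiseless = {0: [[1.0, 0.0], [0.0, 0.0]], 1: [[0.0, 0.0], [0.0, 1.0]]}
noisy = {0: [[0.9, 0.0], [0.0, 0.1]], 1: [[0.2, 0.0], [0.0, 0.8]]}

# Code with M_n = 2 messages: codewords x(1) = 0, x(2) = 1 and the
# projective decoder b_1 = |0><0|, b_2 = |1><1|, so b_1 + b_2 = 1.
codewords = [0, 1]
decoders = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]

worst_max, worst_avg = compound_errors([noiseless, noisy], codewords, decoders)
print(worst_max, worst_avg)
```

In this toy example the code is a $(1, 2, \lambda)_{\max}$-code for the two-element compound channel exactly when $\lambda$ is at least the first printed value, since the bound must hold for every channel in the set simultaneously.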
The $\lambda$-capacity $C(T, \lambda)$ is the least upper bound of all $\lambda$-achievable rates.

The Holevo information of a cq-channel $W_t : A \to \mathcal{S}(\mathcal{H})$ with respect to the input distribution $p \in \mathcal{P}(A)$ is defined by
\[ \chi(p, W_t) := S(D_t) - \sum_{x \in A} p(x) S(D_{t,x}), \tag{5} \]
where $D_t := \sum_{x \in A} p(x) D_{t,x}$ is the averaged output state and $S(\cdot)$ stands for the von Neumann entropy. As shown in [16], [20], [23], and [19], the $\lambda$-capacity of a single memoryless cq-channel $W$ is given by
\[ C(W, \lambda) = \max_{p \in \mathcal{P}(A)} \chi(p, W) \quad \forall \lambda \in (0,1). \]
The main result of our paper is an analog of the capacity formula (1) and can be stated as follows.

Theorem 3.1: Let $T$ be an arbitrary compound cq-channel with finite input alphabet $A$ and finite-dimensional output Hilbert space $\mathcal{H}$. Then
\[ C(T, \lambda) = \max_{p \in \mathcal{P}(A)} \inf_{t \in T} \chi(p, W_t) \tag{6} \]
holds for any $\lambda \in (0,1)$.

Proof: The achievability, i.e. the inequality $C(T, \lambda) \ge \max_{p \in \mathcal{P}(A)} \inf_{t \in T} \chi(p, W_t)$, follows from Theorem 5.10. On the other hand, Theorem 5.13 shows that we cannot be better than the right hand side of (6), which establishes the inequality $C(T, \lambda) \le \max_{p \in \mathcal{P}(A)} \inf_{t \in T} \chi(p, W_t)$.

IV. UNIVERSAL CLASSICAL APPROXIMATION OF THE QUANTUM RELATIVE ENTROPY

The purpose of this section is the derivation of a universal classical approximation of quantum relative entropies of a given set $\Omega \subset \mathcal{S}(\mathcal{H})$ with respect to a reference state $\sigma \in \mathcal{S}(\mathcal{H})$. The first result of this kind was obtained in the paper [14] by Hiai and Petz in the case $|\Omega| = 1$. Basically they have shown that for given states $\rho, \sigma \in \mathcal{S}(\mathcal{H})$ we can approximate $S(\rho^{\otimes l}\|\sigma^{\otimes l})$ by the Kullback-Leibler divergence of the probability distributions $p_l$ and $q_l$ given by
\[ p_l(i) = \mathrm{tr}(\rho^{\otimes l} P_i), \qquad q_l(i) = \mathrm{tr}(\sigma^{\otimes l} P_i), \]
for suitable projections $P_i = P_i(l, \rho, \sigma) \in \mathcal{B}(\mathcal{H})^{\otimes l}$ with $\sum_{i=1}^{N_l} P_i = \mathbf{1}_{\mathcal{H}}^{\otimes l}$. The approximation error does not exceed $\dim \mathcal{H} \cdot \log(l+1)$.
Precisely, Hiai and Petz have shown that
\[ S(\rho^{\otimes l}\|\sigma^{\otimes l}) \ge D(p_l\|q_l) \ge S(\rho^{\otimes l}\|\sigma^{\otimes l}) - \dim \mathcal{H} \cdot \log(l+1). \]
This approximation result for the quantum relative entropy was the crucial step for a construction of projections $Q_n \in \mathcal{B}(\mathcal{H})^{\otimes n}$ for each $n \in \mathbb{N}$ with the properties
1) $\lim_{n \to \infty} \mathrm{tr}(\rho^{\otimes n} Q_n) = 1$ and,
2) $\limsup_{n \to \infty} \frac{1}{n} \log \mathrm{tr}(\sigma^{\otimes n} Q_n) \le -S(\rho\|\sigma)$.
These properties are exactly the direct part of the quantum version of Stein's Lemma. Subsequently, Nagaoka observed that these arguments can be reversed, i.e. starting from the direct part of Stein's Lemma we can construct a classical approximation of the quantum relative entropy by simply considering the projections $Q_n$ and $\mathbf{1}_{\mathcal{H}}^{\otimes n} - Q_n$ and the probability distributions $p_n = (\mathrm{tr}(\rho^{\otimes n} Q_n), 1 - \mathrm{tr}(\rho^{\otimes n} Q_n))$, $q_n = (\mathrm{tr}(\sigma^{\otimes n} Q_n), 1 - \mathrm{tr}(\sigma^{\otimes n} Q_n))$ (see footnote 2; cf. our inequality chain (7) for more details). It is an interesting fact that Nagaoka's argument produces for each $n \in \mathbb{N}$ pairs of projections which give rise to a good approximation of the quantum relative entropy.

Our approach to the universal classical approximation is motivated by Nagaoka's argument and therefore we need a universal version of Stein's Lemma or Sanov's Theorem from [3]. Actually we need a slightly sharper result than that obtained in [3]. The main tool to obtain this sharpening is contained in the following

Lemma 4.1: Let $X$ be a finite set and $r \in \mathcal{P}(X)$ with $r(x) > 0$ for all $x \in X$. Then for each $\delta > 0$, $k \in \mathbb{N}$, and any set $\Omega_k \subset \mathcal{P}(X)$ there is a subset $X_{k,\delta} \subset X^k$ with
1) $q^{\otimes k}(X_{k,\delta}) \ge 1 - (k+1)^{|X|} 2^{-kc\delta^2}$ for all $q \in \Omega_k$ with a universal constant $c > 0$.
2) $r^{\otimes k}(X_{k,\delta}) \le (k+1)^{|X|} 2^{-k(D(\Omega_k\|r) - \eta(\delta, r))}$,
with $D(\Omega_k\|r) := \inf_{q \in \Omega_k} D(q\|r)$ and $\eta(\delta, r) := -\delta \log \frac{\delta}{|X|} - \delta \log r_{\min}$, where $r_{\min}$ denotes the smallest positive value of $r$.
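To get a feeling for how the polynomial prefactor $(k+1)^{|X|}$ competes with the exponential terms in Lemma 4.1, the following sketch evaluates $\eta(\delta, r)$ and the base-2 logarithms of the two bounds. It is only an illustration: the lemma does not specify the universal constant $c$, so the value `c=0.5` used in the example is a placeholder assumption, as are the distribution `r`, the divergence value `1.0`, and the function names `eta` and `log2_bounds`.

```python
import math


def eta(delta, r):
    """eta(delta, r) = -delta*log2(delta/|X|) - delta*log2(r_min)."""
    r_min = min(v for v in r if v > 0)
    return -delta * math.log2(delta / len(r)) - delta * math.log2(r_min)


def log2_bounds(k, delta, r, divergence, c):
    """Base-2 logarithms of the two bounds in Lemma 4.1.

    First entry : log2 of (k+1)^{|X|} 2^{-k c delta^2}, the bound on the
                  q-probability of the complement of X_{k,delta}.
    Second entry: log2 of (k+1)^{|X|} 2^{-k (D - eta)}, the bound on the
                  r-probability of X_{k,delta}.
    `divergence` plays the role of D(Omega_k || r); `c` is a placeholder
    for the unspecified universal constant.
    """
    log_poly = len(r) * math.log2(k + 1)
    return (log_poly - k * c * delta ** 2,
            log_poly - k * (divergence - eta(delta, r)))


r = [0.5, 0.25, 0.25]  # illustrative reference distribution on |X| = 3 points
for k in (100, 200000):
    print(k, log2_bounds(k, delta=0.05, r=r, divergence=1.0, c=0.5))
```

A positive first entry means that bound 1) is still vacuous at that block length; for fixed $\delta$ both logarithms eventually become negative and decrease linearly in $k$, while $\eta(\delta, r) \to 0$ as $\delta \to 0$.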
Proof: The proof uses the well known type bounding techniques from [5] and [21] and is therefore omitted.

Footnote 2: We learned this from the paper [18] by Ogawa and Hayashi, who attribute this observation to Nagaoka.

A (discrete) projection valued measure (PVM) on a finite-dimensional Hilbert space $\mathcal{K}$ is a set $M := \{P_i\}_{i=1}^m$ consisting of projections $P_i \in \mathcal{B}(\mathcal{K})$ such that $\sum_{i=1}^m P_i = \mathbf{1}_{\mathcal{K}}$. For two states $\rho, \sigma \in \mathcal{S}(\mathcal{K})$ and any PVM $M$ on $\mathcal{K}$ we define
\[ S_M(\rho\|\sigma) := \sum_{i=1}^m \big( \mathrm{tr}(\rho P_i) \log \mathrm{tr}(\rho P_i) - \mathrm{tr}(\rho P_i) \log \mathrm{tr}(\sigma P_i) \big) \]
if $(\mathrm{tr}(\rho P_i))_{i=1}^m \ll (\mathrm{tr}(\sigma P_i))_{i=1}^m$ and $S_M(\rho\|\sigma) := \infty$ else.

Theorem 4.2: Let $\sigma \in \mathcal{S}(\mathcal{H})$ be invertible. Then for each $l \in \mathbb{N}$ there is a real number $\zeta_l(\sigma)$ with $\lim_{l \to \infty} \zeta_l(\sigma) = 0$ such that for any set $\Omega_l \subset \mathcal{S}(\mathcal{H})$ there is a PVM $M_l = \{P_l, \mathbf{1}_{\mathcal{H}}^{\otimes l} - P_l\}$ on $\mathcal{H}^{\otimes l}$ with
\[ S_{M_l}(\rho^{\otimes l}\|\sigma^{\otimes l}) \ge l\,(S(\Omega_l\|\sigma) - \zeta_l(\sigma)) \]
for all $\rho \in \Omega_l$, with $S(\Omega_l\|\sigma) := \inf_{\rho \in \Omega_l} S(\rho\|\sigma)$. Consequently,
\[ \inf_{\rho \in \Omega_l} S_{M_l}(\rho^{\otimes l}\|\sigma^{\otimes l}) \ge l\,(S(\Omega_l\|\sigma) - \zeta_l(\sigma)). \]

Proof: The proof is based on the following observation: Let $M_l = \{P_l, \mathbf{1}_{\mathcal{H}}^{\otimes l} - P_l\}$ be any PVM on $\mathcal{H}^{\otimes l}$ with the properties
1) $\mathrm{tr}(\rho^{\otimes l} P_l) \ge 1 - \tau_{1,l}$ for all $\rho \in \Omega_l$ with $\lim_{l \to \infty} \tau_{1,l} = 0$ and
2) $\mathrm{tr}(\sigma^{\otimes l} P_l) \le 2^{-l(S(\Omega_l\|\sigma) - \tau_{2,l})}$ with $\lim_{l \to \infty} \tau_{2,l} = 0$.
Then using these relations we can lower-bound $S_{M_l}$ for each $\rho \in \Omega_l$ as follows: First of all, since $\sigma$ is invertible we have $S(\rho^{\otimes l}\|\sigma^{\otimes l}) < \infty$ for each $\rho \in \Omega_l$. Thus, the monotonicity of the relative entropy yields
\[ S_{M_l}(\rho^{\otimes l}\|\sigma^{\otimes l}) \le S(\rho^{\otimes l}\|\sigma^{\otimes l}) < \infty \]
for all $\rho \in \Omega_l$.
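The monotonicity step $S_M(\rho\|\sigma) \le S(\rho\|\sigma)$ for a two-outcome PVM $M = \{P, \mathbf{1} - P\}$ can be checked numerically in a very restricted special case. The sketch below assumes a single qubit, a real symmetric state $\rho$, and an invertible $\sigma$ that is diagonal in the computational basis, so that no matrix logarithm routine is needed; the function names and the example states are hypothetical and chosen only for illustration.

```python
import math


def xlog2(x):
    """x * log2(x) with the convention 0 * log 0 = 0."""
    return 0.0 if x <= 0.0 else x * math.log2(x)


def relative_entropy_diag_sigma(rho, sigma_diag):
    """S(rho || sigma) for a real symmetric qubit state rho and an
    invertible sigma = diag(sigma_diag) (restricted special case)."""
    a, b, d = rho[0][0], rho[0][1], rho[1][1]
    gap = math.sqrt((a - d) ** 2 + 4.0 * b * b)
    eig = [(a + d + gap) / 2.0, (a + d - gap) / 2.0]  # eigenvalues of rho
    tr_rho_log_rho = sum(xlog2(e) for e in eig)
    # For diagonal sigma, tr(rho log sigma) only sees the diagonal of rho.
    tr_rho_log_sigma = (a * math.log2(sigma_diag[0])
                        + d * math.log2(sigma_diag[1]))
    return tr_rho_log_rho - tr_rho_log_sigma


def measured_relative_entropy(rho, sigma_diag, P):
    """S_M(rho || sigma) for the two-outcome PVM M = {P, 1 - P}."""
    p1 = sum(rho[i][j] * P[j][i] for i in range(2) for j in range(2))
    q1 = sum(sigma_diag[i] * P[i][i] for i in range(2))
    total = 0.0
    for p, q in ((p1, q1), (1.0 - p1, 1.0 - q1)):
        if p > 0.0:
            if q <= 0.0:
                return math.inf  # outcome distribution not dominated
            total += p * math.log2(p / q)
    return total


rho = [[0.8, 0.3], [0.3, 0.2]]   # illustrative qubit state
sigma_diag = [0.7, 0.3]          # illustrative invertible diagonal state
P_zero = [[1.0, 0.0], [0.0, 0.0]]  # projection |0><0|
P_plus = [[0.5, 0.5], [0.5, 0.5]]  # projection |+><+|

S = relative_entropy_diag_sigma(rho, sigma_diag)
S_zero = measured_relative_entropy(rho, sigma_diag, P_zero)
S_plus = measured_relative_entropy(rho, sigma_diag, P_plus)
print(S, S_zero, S_plus)
```

Both measured values stay below the quantum relative entropy, and the size of the gap depends on the chosen projection; Theorem 4.2 asserts that on $l$-fold products a suitable two-outcome PVM loses at most $l\,\zeta_l(\sigma)$ relative to $l\,S(\Omega_l\|\sigma)$, uniformly over $\Omega_l$.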
Consequently we can lower-bound $\frac{1}{l} S_{M_l}(\rho^{\otimes l}\|\sigma^{\otimes l})$ using the relations 1) and 2):
\[ \begin{aligned} \frac{1}{l} S_{M_l}(\rho^{\otimes l}\|\sigma^{\otimes l}) &\ge -\frac{1}{l} H\big( (\mathrm{tr}(\rho^{\otimes l} P_l), \mathrm{tr}(\rho^{\otimes l}(\mathbf{1}_{\mathcal{H}}^{\otimes l} - P_l))) \big) - \mathrm{tr}(\rho^{\otimes l} P_l) \frac{1}{l} \log \mathrm{tr}(\sigma^{\otimes l} P_l) \\ &\ge -\frac{\log 2}{l} + \mathrm{tr}(\rho^{\otimes l} P_l)\,(S(\Omega_l\|\sigma) - \tau_{2,l}) \\ &\ge -\frac{1}{l} + (1 - \tau_{1,l})(S(\Omega_l\|\sigma) - \tau_{2,l}) \\ &\ge S(\Omega_l\|\sigma) - \zeta_l(\sigma), \end{aligned} \tag{7} \]
with
\[ \zeta_l(\sigma) := (1 - \tau_{1,l})\,\tau_{2,l} - \tau_{1,l} \log \lambda_{\min}(\sigma) + \frac{1}{l}, \tag{8} \]
where $\lambda_{\min}(\sigma)$ denotes the smallest eigenvalue of $\sigma$. Thus our remaining job is the construction of the PVM with the properties described above.

To this end let $l \in \mathbb{N}$ and $\Omega_l \subset \mathcal{S}(\mathcal{H})$ be given. For $m \in \mathbb{N}$ we can find $k, y \in \mathbb{N}$ with $0 \le y < m$ such that $l = km + y$. Then applying exactly the same bounding technique as in the proof of Theorem 2 in [3], but using our Lemma 4.1 instead of their Lemma 1, we obtain for each $\delta > 0$ a projection $P_{l,\delta} \in \mathcal{B}(\mathcal{H})^{\otimes l}$ with
1) $\mathrm{tr}(\rho^{\otimes l} P_{l,\delta}) \ge 1 - (k+1)^{d^m} 2^{-kc\delta^2}$ with a universal constant $c > 0$ and where $d = \dim(\mathcal{H})$,
2) $\frac{1}{l} \log \mathrm{tr}(\sigma^{\otimes l} P_{l,\delta}) \le -S(\Omega_l\|\sigma) + d\,\frac{\log(m+1)}{m} + (d^{2m} + d^m)\,\frac{\log(k+1)}{km} + \eta(\delta, \sigma)$,
with $\eta(\delta, \sigma) = -\delta \log \frac{\delta}{d} - \delta \log \lambda_{\min}(\sigma)$.

Choosing $m = m_l := \lceil \log_d(l^{1/8}) \rceil$ it is easily seen that for $k = k_l = \frac{l - y_l}{m_l}$ with $0 \le y_l < m_l$ and $\delta_l := l^{-1/4}$ we have
\[ \lim_{l \to \infty} \tau_{1,l} = 0 \quad \text{and} \quad \lim_{l \to \infty} \tau_{2,l} = 0, \]
where
\[ \tau_{1,l} := (k_l + 1)^{d^{m_l}}\, 2^{-k_l c \delta_l^2}, \tag{9} \]
and
\[ \tau_{2,l} := d\,\frac{\log(m_l + 1)}{m_l} + (d^{2m_l} + d^{m_l})\,\frac{\log(k_l + 1)}{k_l m_l} + \eta(\delta_l, \sigma). \tag{10} \]
The desired PVM is then given by $M_l := \{P_l, \mathbf{1}_{\mathcal{H}}^{\otimes l} - P_l\}$ with $P_l := P_{l,\delta_l}$.

Remark 4.3: An alternative proof of Theorem 4.2 might be based on the techniques developed by Hayashi in [10], [11]. He constructs there a sequence of PVM's on $\mathcal{H}^{\otimes l}$ via representation theory of Lie groups which depends merely on $\sigma$ and shows how to derive Stein's Lemma.
Thus we are forced to uniformly bound the errors of the first and second kind in Hayashi's setting for the whole family $\Omega_l$ in order to obtain a universal abelian approximation of the quantum relative entropy.

V. CAPACITY OF COMPOUND CQ-CHANNELS

Let $T$ be an arbitrary compound channel and for a fixed $p \in \mathcal{P}(A)$ define
\[ \Omega_p := \Big\{ \rho_t := \sum_{x \in A} p(x)\, |x\rangle\langle x| \otimes D_{t,x} : t \in T \Big\}, \]
where each $\rho_t \in \Omega_p$ is seen as a density operator in $\mathcal{A}_{\mathrm{diag}} \otimes \mathcal{B}(\mathcal{H})$ with
\[ \mathcal{A}_{\mathrm{diag}} := \bigoplus_{x \in A} \mathbb{C}\, |x\rangle\langle x| \]
being the algebra of operators diagonal w.r.t. the basis $\{|x\rangle\}_{x \in A}$ of $\mathbb{C}^{|A|}$ (see footnote 3). Moreover, for each $t \in T$ we set
\[ \sigma_t := \sum_{x \in A} p(x) D_{t,x}. \]
In what follows we identify the probability distribution $p$ with a diagonal density operator, i.e. we set
\[ p = \sum_{x \in A} p(x)\, |x\rangle\langle x| \in \mathcal{A}_{\mathrm{diag}}. \]
It is well known that
\[ S(\rho_t\|p \otimes \sigma_t) = \chi(p, W_t) \]
holds, where $S(\rho_t\|p \otimes \sigma_t)$ is the relative entropy.

Footnote 3: $\mathcal{A}_{\mathrm{diag}}$ has a natural structure of a $*$-algebra, thus $\mathcal{A}_{\mathrm{diag}} \otimes \mathcal{B}(\mathcal{H})$ is an admissible construction.

Lemma 5.1 (Donald's Inequality): Consider any $t, t' \in T$. Then
\[ S(\rho_{t'}\|p \otimes \sigma_t) \ge S(\rho_{t'}\|p \otimes \sigma_{t'}) \]
and equality holds iff $\sigma_{t'} = \sigma_t$.

Proof: The claimed inequality can be seen as a special instance of Donald's identity [7]. We give a short direct proof for the reader's convenience. If $\mathrm{supp}(\rho_{t'})$ is not dominated by $\mathrm{supp}(p \otimes \sigma_t)$ we have $S(\rho_{t'}\|p \otimes \sigma_t) = +\infty$. But on the other hand $S(\rho_{t'}\|p \otimes \sigma_{t'}) = \chi(p, W_{t'}) < +\infty$ for any $t' \in T$. Thus the claimed inequality is trivially fulfilled and is always strict in this case. Assume now that $\mathrm{supp}(\rho_{t'})$ is dominated by $\mathrm{supp}(p \otimes \sigma_t)$,
then we obtain
\[ \begin{aligned} S(\rho_{t'}\|p \otimes \sigma_t) &= \mathrm{tr}(\rho_{t'} \log \rho_{t'} - \rho_{t'} \log p \otimes \sigma_t) \\ &= -S(\rho_{t'}) - \mathrm{tr}(\rho_{t'} \log p \otimes \sigma_t) \\ &= -S(\rho_{t'}) + S(p) - \mathrm{tr}(\sigma_{t'} \log \sigma_t) \\ &= -S(\rho_{t'}) + S(p) - \mathrm{tr}(\sigma_{t'} \log \sigma_t) + \mathrm{tr}(\sigma_{t'} \log \sigma_{t'}) - \mathrm{tr}(\sigma_{t'} \log \sigma_{t'}) \\ &= -S(\rho_{t'}) + S(p) + S(\sigma_{t'}) + \mathrm{tr}(\sigma_{t'} \log \sigma_{t'} - \sigma_{t'} \log \sigma_t) \\ &= S(\rho_{t'}\|p \otimes \sigma_{t'}) + S(\sigma_{t'}\|\sigma_t) \\ &\ge S(\rho_{t'}\|p \otimes \sigma_{t'}), \end{aligned} \]
where we used the fact that $S(\sigma_{t'}\|\sigma_t) \ge 0$ in the last line. We are done now since $S(\sigma_{t'}\|\sigma_t) = 0$ iff $\sigma_{t'} = \sigma_t$.

Remark 5.2: A glance at the proof of Lemma 5.1 shows that the following stronger conclusion holds (see footnote 4). For any $t' \in T$ and any state $\sigma \in \mathcal{S}(\mathcal{H})$
\[ S(\rho_{t'}\|p \otimes \sigma) \ge S(\rho_{t'}\|p \otimes \sigma_{t'}) \]
with equality iff $\sigma = \sigma_{t'}$.

Footnote 4: We would like to thank the Associate Editor for pointing out this improvement of Lemma 5.1.

For given $p \in \mathcal{P}(A)$ and $t \in T$ we set
\[ S(\Omega_p\|p \otimes \sigma_t) := \inf_{r \in T} S(\rho_r\|p \otimes \sigma_t). \]

Lemma 5.3: For each $p \in \mathcal{P}(A)$ we have
\[ \inf_{t' \in T} S(\Omega_p\|p \otimes \sigma_{t'}) = \inf_{t' \in T} S(\rho_{t'}\|p \otimes \sigma_{t'}). \]

Proof: It is clear that
\[ \inf_{t' \in T} S(\Omega_p\|p \otimes \sigma_{t'}) \le \inf_{t' \in T} S(\rho_{t'}\|p \otimes \sigma_{t'}) \]
holds. For the reverse inequality we choose an arbitrary $\varepsilon > 0$ and a $t(\varepsilon) \in T$ with
\[ S(\Omega_p\|p \otimes \sigma_{t(\varepsilon)}) \le \inf_{t' \in T} S(\Omega_p\|p \otimes \sigma_{t'}) + \frac{\varepsilon}{2}, \tag{11} \]
and an $s(\varepsilon) \in T$ such that
\[ S(\rho_{s(\varepsilon)}\|p \otimes \sigma_{t(\varepsilon)}) \le S(\Omega_p\|p \otimes \sigma_{t(\varepsilon)}) + \frac{\varepsilon}{2} \le \inf_{t' \in T} S(\Omega_p\|p \otimes \sigma_{t'}) + \varepsilon, \tag{12} \]
where the last inequality follows from (11). Donald's inequality, Lemma 5.1, shows that
\[ S(\rho_{s(\varepsilon)}\|p \otimes \sigma_{s(\varepsilon)}) \le S(\rho_{s(\varepsilon)}\|p \otimes \sigma_{t(\varepsilon)}), \]
and consequently by (12) that
\[ \inf_{t' \in T} S(\rho_{t'}\|p \otimes \sigma_{t'}) \le \inf_{t' \in T} S(\Omega_p\|p \otimes \sigma_{t'}) + \varepsilon \]
holds for every $\varepsilon > 0$. This shows our claim.

A. The Direct Part of the Coding Theorem

The crucial point in our code construction for the compound cq-channels will be the following one-shot version of the coding theorem, which is based on (and is an easy consequence of) the ideas developed by Hayashi and Nagaoka in [13]. In order to formulate the result properly we need some notation. Let $W : K \to \mathcal{S}(\mathcal{K})$ be any cq-channel with finite input alphabet $K$ and finite-dimensional output Hilbert space $\mathcal{K}$. Let $D_k := W(k)$ for all $k \in K$. For any $w \in \mathcal{P}(K)$ we consider the states
\[ \rho := \sum_{k \in K} w(k)\, |k\rangle\langle k| \otimes D_k, \]
and $w \otimes \sigma$ with
\[ \sigma = \sum_{k \in K} w(k) D_k \]
acting on the Hilbert space $\mathbb{C}^{|K|} \otimes \mathcal{K}$. Let $\mathcal{B}_{\mathrm{diag}}$ denote the set of operators on $\mathbb{C}^{|K|}$ that are diagonal with respect to the orthonormal basis $\{|k\rangle\}_{k \in K}$.

Theorem 5.4 (Hayashi & Nagaoka [13]): Let a cq-channel $W : K \to \mathcal{S}(\mathcal{K})$ and $w \in \mathcal{P}(K)$ be given, with finite set $K$ and finite-dimensional Hilbert space $\mathcal{K}$. Let $P \in \mathcal{B}_{\mathrm{diag}} \otimes \mathcal{B}(\mathcal{K})$ be a projection with
1) $\mathrm{tr}(\rho P) \ge 1 - \lambda$ with some $\lambda > 0$ and
2) $\mathrm{tr}((w \otimes \sigma) P) \le 2^{-\mu}$ for some $\mu > 0$.
Then for each $0 < \gamma < \mu$ we can find $k_1, \ldots, k_{[2^{\mu-\gamma}]} \in K$ and $b_1, \ldots, b_{[2^{\mu-\gamma}]} \in \mathcal{B}(\mathcal{K})$ with $b_i \ge 0$ and $\sum_{i=1}^{[2^{\mu-\gamma}]} b_i \le \mathbf{1}_{\mathcal{K}}$ such that
\[ \frac{1}{[2^{\mu-\gamma}]} \sum_{i=1}^{[2^{\mu-\gamma}]} \big(1 - \mathrm{tr}(D_{k_i} b_i)\big) \le 2 \cdot \lambda + 4 \cdot 2^{-\gamma}. \]

Proof: All arguments needed in the proof of this theorem are contained explicitly or implicitly in [13]. We provide the proof in Appendix I for completeness and in order to make the presentation more self-contained.

As in the classical approaches to the direct part of the coding theorem we need a discrete approximation of our compound cq-channel. A partition $\Pi$ of $\mathcal{S}(\mathcal{H})$ is a family $\{\pi_1, \ldots, \pi_y\}$ of subsets of $\mathcal{S}(\mathcal{H})$ such that $\pi_i \cap \pi_j = \emptyset$ for $i \neq j$ and $\mathcal{S}(\mathcal{H}) = \bigcup_{i=1}^y \pi_i$ hold.
We say that the diameter of the partition $\Pi = \{\pi_1, \ldots, \pi_y\}$ of $\mathcal{S}(\mathcal{H})$ is at most $\kappa > 0$ if
\[ \sup_{\rho, \sigma \in \pi_i} \|\rho - \sigma\|_1 \le \kappa \quad \forall i = 1, \ldots, y. \]
We borrow from [22] a basic partitioning result for $\mathcal{S}(\mathcal{H})$ which is proven by a packing argument in the $d^2$-dimensional cube.

Theorem 5.5 (Winter, Lemma II.8 in [22]): For any $\kappa > 0$ there is a partition $\Pi = \{\pi_1, \ldots, \pi_y\}$ of $\mathcal{S}(\mathcal{H})$ having diameter at most $\kappa$ with $y \le K \kappa^{-d^2}$, where the number $K > 0$ depends only on the dimension $d$ of $\mathcal{H}$.

Applying this result $|A|$ times yields for each $\kappa > 0$ a partition $\Pi$ of the set of cq-channels $CQ(A, \mathcal{H})$ with input alphabet $A$ and output Hilbert space $\mathcal{H}$ with at most $K^{|A|} \cdot \kappa^{-|A| d^2}$ elements. For $n \in \mathbb{N}$ we choose $\kappa = \kappa_n := \frac{1}{n^2}$ and a partition $\Pi_{\kappa_n} = \{\pi_{1,n}, \ldots, \pi_{y,n}\}$ of $CQ(A, \mathcal{H})$ with at most $K^{|A|} \cdot n^{2|A| d^2}$ elements and diameter not exceeding $\kappa_n$. This $\Pi_{\kappa_n}$ produces a partition
\[ \Pi'_n := \{\pi_{i,n} \cap T : i = 1, \ldots, y,\ \pi_{i,n} \cap T \neq \emptyset\} \]
of the given compound cq-channel $T$. From each $\pi_{i,n} \cap T \neq \emptyset$ we select one cq-channel $W_{t_i}$ and denote this finite set of channels by $T'_n$. Let $U : A \to \mathcal{S}(\mathcal{H})$ denote the useless cq-channel $U(x) := (1/d) \cdot \mathbf{1}_{\mathcal{H}}$. We set
\[ W'_t := \Big(1 - \frac{1}{n^2}\Big) W_t + \frac{1}{n^2} U \]
for all $t \in T'_n$. The resulting set of channels will be denoted by $T_n$. Written in terms of density operators this defining relation means that we consider
\[ D'_{t,x} := \Big(1 - \frac{1}{n^2}\Big) D_{t,x} + \frac{1}{n^2 d} \mathbf{1}_{\mathcal{H}}, \tag{13} \]
for all $t \in T'_n$ and all $x \in A$.

Lemma 5.6: Let $T$ be any compound cq-channel and choose $n \in \mathbb{N}$. Then the associated compound cq-channel $T_n$ has the following properties:
1) $|T_n| \le K^{|A|} \cdot n^{2|A| d^2}$.
2) For each $t \in T$ we can find at least one $s \in T_n$ such that for all $x^n \in A^n$
\[ \|D_{t,x^n} - D'_{s,x^n}\|_1 \le \frac{4}{n}, \]
where $\|\cdot\|_1$ denotes the trace distance. The same statement holds if we reverse the roles of $t \in T$ and $s \in T_n$.
3) There is a constant $C = C(d)$ such that for each $p \in \mathcal{P}(A)$ and all $n \in \mathbb{N}$
\[ \Big| \min_{s \in T_n} \chi(p, W'_s) - \inf_{t \in T} \chi(p, W_t) \Big| \le C/n \]
holds.

Proof: The first part of the lemma is clear by our construction of $T_n$. The second assertion follows from the general fact that for states $\rho_1, \ldots, \rho_n, \sigma_1, \ldots, \sigma_n \in \mathcal{S}(\mathcal{H})$ the relation
\[ \|\rho_1 \otimes \ldots \otimes \rho_n - \sigma_1 \otimes \ldots \otimes \sigma_n\|_1 \le \sum_{i=1}^n \|\rho_i - \sigma_i\|_1 \]
holds, and that for each $t \in T$ we can find $s' \in T'_n$ with $\|D_{t,x} - D_{s',x}\|_1 \le 2/n^2$ for all $x \in A$, and to each $s' \in T'_n$ there is obviously $s \in T_n$ with $\|D_{s',x} - D'_{s,x}\|_1 \le 2/n^2$ for all $x \in A$.

The last part of the lemma is easily deduced from the Fannes inequality [8], which states that for any states $\rho, \sigma \in \mathcal{S}(\mathcal{H})$ with $\|\rho - \sigma\|_1 \le \delta \le 1/e$ we have $|S(\rho) - S(\sigma)| \le \delta \log d - \delta \log \delta$. Indeed, for each $n \in \mathbb{N}$ choose $s_n \in T_n$ with
\[ \chi(p, W'_{s_n}) = \min_{s \in T_n} \chi(p, W'_s). \tag{14} \]
Then observing that
\[ \chi(p, W'_{s_n}) = S\Big( \sum_{x \in A} p(x) D'_{s_n,x} \Big) - \sum_{x \in A} p(x) S(D'_{s_n,x}), \]
and that we can find $t \in T$ with $\|D_{t,x} - D'_{s_n,x}\|_1 \le 4/n^2$ for all $x \in A$ leads via the Fannes inequality to
\[ |\chi(p, W'_{s_n}) - \chi(p, W_t)| \le 2 \Big( \frac{4}{n^2} \log d - \frac{4}{n^2} \log \frac{4}{n^2} \Big), \tag{15} \]
provided that $n \ge \sqrt{4e}$. (14) and (15) show that
\[ \inf_{t \in T} \chi(p, W_t) \le \min_{s \in T_n} \chi(p, W'_s) + 2 \Big( \frac{4}{n^2} \log d - \frac{4}{n^2} \log \frac{4}{n^2} \Big) = \min_{s \in T_n} \chi(p, W'_s) + O(n^{-1}). \]
A similar argument shows the reverse inequality and we are done.

Remark 5.7: At this point we pause for a moment to indicate why our discretization Lemma 5.6 does not suffice to reduce the capacity problem for arbitrary sets of channels to the finite case solved by Datta and Dorlas [6].
Let us assume that we want to construct codes for the channel $T_n$ of block length $n$. The proof strategy in [6], translated into the setting of our Lemma 5.6, would consist of a measurement that detects the branch from $T_n$, combined with reliable codes for individual channels from $T_n$. In order to detect which channel is in use during the transmission Datta and Dorlas construct a sequence $x^{mL_n} \in A^{mL_n}$, $L_n := \binom{|T_n|}{2}$, and a PVM $\{p_t^{mL_n}\}_{t \in T_n}$ in $\mathcal{B}(\mathcal{H}^{\otimes mL_n})$ with
\[ \mathrm{tr}\big( p_t^{mL_n} W_t^{mL_n}(x^{mL_n}) \big) \ge (1 - |T_n| f^m)^{|T_n| - 1}, \tag{16} \]
where $f \in (0,1)$. It is easily seen using standard volumetric arguments with respect to the Hausdorff measure on the set of cq-channels that for open sets $T$ (w.r.t. the relative topology) of channels $|T_n| \ge \mathrm{poly}(n)$ with degree strictly larger than 1. Hence, $L_n = \mathrm{poly}(n)$. And since the rightmost quantity in (16) has to approach 1 we have to choose $m = m(n)$ as an increasing sequence depending on $n$. Thus for large $n$
\[ m_n L_n = m_n\, \mathrm{poly}(n) \ge n \]
and no more block length is left for coding.

In the course of the proof of Theorem 5.10 we will need two probabilistic inequalities which go back to the work of Blackwell, Breiman, and Thomasian [4] and Hoeffding [15]. Let $\{V_t\}_{t \in T}$ be a finite set of stochastic matrices $V_t : X \to J$ with finite sets $X$ and $J$. For $r \in \mathcal{P}(X)$ we set
\[ p_t(x, j) := r(x) V_t(j|x) \quad (x \in X,\ j \in J), \]
and
\[ q_t(j) := \sum_{x \in X} r(x) V_t(j|x). \]
Moreover, for each $a \in \mathbb{N}$ we define the averaged channel $V^a : X^a \to J^a$ by
\[ V^a(j^a|x^a) := \frac{1}{|T|} \sum_{t \in T} V_t^a(j^a|x^a), \]
the joint input-output distribution
\[ p'^a(x^a, j^a) := r^{\otimes a}(x^a) V^a(j^a|x^a), \]
and
\[ q^a := \frac{1}{|T|} \sum_{t \in T} q_t^{\otimes a}. \]
For each $t \in T$ and $a \in \mathbb{N}$ let
\[ i_t^a(x^a, j^a) := \frac{1}{a} \log \frac{V_t^a(j^a|x^a)}{q_t^{\otimes a}(j^a)}, \tag{17} \]
and
\[ i^a(x^a, j^a) := \frac{1}{a} \log \frac{V^a(j^a|x^a)}{q^a(j^a)}, \tag{18} \]
where $x^a \in X^a$ and $j^a \in J^a$.

Theorem 5.8 (Blackwell, Breiman, Thomasian [4]): With the notation introduced in the preceding paragraph we have for all $\alpha, \beta \in \mathbb{R}$
\[ P(i^a \le \alpha) \le \frac{1}{|T|} \sum_{t \in T} P_t(i_t^a \le \alpha + \beta) + |T|\, 2^{-a\beta}. \]

Our proof of Theorem 5.10 will also require Hoeffding's tail inequality:

Theorem 5.9 (Hoeffding [15]): Let $X_1, \ldots, X_a$ be independent real valued random variables such that each $X_i$ takes values in the interval $[u_i, o_i]$ with probability one, $i = 1, \ldots, a$. Then for any $\tau > 0$ we have
\[ P\Big( \sum_{i=1}^a (X_i - E(X_i)) \ge a\tau \Big) \le e^{-\frac{2a^2\tau^2}{\sum_{i=1}^a (o_i - u_i)^2}} \]
and
\[ P\Big( \sum_{i=1}^a (X_i - E(X_i)) \le -a\tau \Big) \le e^{-\frac{2a^2\tau^2}{\sum_{i=1}^a (o_i - u_i)^2}}. \]

With all these preliminary results we are now able to state and prove our main objective:

Theorem 5.10 (Direct Part): Let $T$ be an arbitrary compound cq-channel. Then for each $\lambda \in (0,1)$ and any $\alpha > 0$ we can find $(n, M_n, \lambda)_{\max}$-codes with
\[ \frac{1}{n} \log M_n \ge \max_{p \in \mathcal{P}(A)} \inf_{t \in T} \chi(p, W_t) - \alpha, \]
for all $n \in \mathbb{N}$ with $n \ge n_0(\alpha, \lambda)$. Consequently, for each $\lambda \in (0,1)$
\[ C(T, \lambda) \ge \max_{p \in \mathcal{P}(A)} \inf_{t \in T} \chi(p, W_t). \]

Proof: Our strategy will be, roughly, to construct a "good" projection for the averaged channel
\[ W^n = \frac{1}{|T_n|} \sum_{t \in T_n} W_t'^{\,n} \]
via Theorem 4.2, Theorem 5.8, and Theorem 5.9. This means that for a suitably chosen input distribution $p \in \mathcal{P}(A)$, the associated state
\[ \rho^{(n)} = \sum_{x^n \in A^n} p^{\otimes n}(x^n)\, |x^n\rangle\langle x^n| \otimes \frac{1}{|T_n|} \sum_{t \in T_n} W_t'^{\,n}(x^n) \]
and the resulting product of the marginal states $p^{\otimes n} \otimes \sigma^{(n)}$, we will find a projection $P_n \in (\mathcal{A}_{\mathrm{diag}} \otimes \mathcal{B}(\mathcal{H}))^{\otimes n}$ with
1) $\mathrm{tr}(\rho^{(n)} P_n) \approx 1$, and
2) $\mathrm{tr}((p^{\otimes n} \otimes \sigma^{(n)}) P_n) \lesssim 2^{-n \inf_{t \in T} \chi(p, W_t)}$.
Then we will apply Theorem 5.4 to obtain a good code for $W^n$.
This code performs well for the compound channel $T_n$ since the error probability depends affinely on the channel. Finally, by Lemma 5.6 we see that the code obtained in this way is also reliable for the original channel $T$.

Let $p = \mathrm{argmax}_{p'\in\mathcal{P}(A)}\big(\inf_{t\in T}\chi(p',W_t)\big)$. We assume w.l.o.g. that $\inf_{t\in T}\chi(p,W_t)>0$, because otherwise the assertion of the theorem is trivially true. Our goal is to construct $(n,M_n,\frac{\lambda}{2})_{\max}$-codes $C_n$ for the approximating channel $T_n$ with

$$M_n \ge 2^{n(\inf_{t\in T}\chi(p,W_t)-\alpha)}$$

for all sufficiently large $n\in\mathbb{N}$. Then by Lemma 5.6 $C_n$ is also an $(n,M_n,\frac{\lambda}{2}+\frac{4}{n})_{\max}$-code for the original channel $T$. Choosing $n$ large enough we can ensure that $\frac{4}{n}\le\frac{\lambda}{2}$, and our proof would be accomplished.

In what follows we use the abbreviations

$$\Omega_{p,n} := \Big\{\rho'_t : \rho'_t = \sum_{x\in A} p(x)\,|x\rangle\langle x|\otimes D'_{t,x},\ t\in T_n\Big\}$$

and for $t\in T_n$ we write

$$\sigma'_t := \sum_{x\in A} p(x) D'_{t,x},$$

where $p\in\mathcal{P}(A)$ is arbitrary. Note that by (13) we have for each $t\in T_n$

$$\lambda_{\min}(p\otimes\sigma'_t) \ge p_{\min}\frac{1}{n^2 d}. \tag{19}$$

Moreover, it is clear from the definition of $T_n$ that $\mathrm{supp}(\rho'_t)$ is dominated by $\mathrm{supp}(p\otimes\sigma'_s)$ for each $t,s\in T_n$ and that $\mathrm{supp}(p\otimes\sigma'_s) = \mathrm{supp}(p)\otimes\mathbf{1}_{\mathcal{H}}$ for all $s\in T_n$.

Now choose any $s\in T_n$. By the properties of the supports just mentioned we may assume w.l.o.g. that $p\otimes\sigma'_s$ is invertible. Then for fixed $l\in\mathbb{N}$ we can find $a,b\in\mathbb{N}$ with $n=al+b$, $0\le b<l$, and obtain from Theorem 4.2 a PVM $M_l=\{P_{1,l},P_{2,l}\}$ with $P_{i,l}\in(A_{\mathrm{diag}}\otimes\mathcal{B}(\mathcal{H}))^{\otimes l}$, $i=1,2$, with

$$S_{M_l}\big(\rho_t^{\prime\,\otimes l}\,\|\,(p\otimes\sigma'_s)^{\otimes l}\big) \ge l\big(S(\Omega_{p,n}\|p\otimes\sigma'_s)-\zeta_l(p\otimes\sigma'_s)\big) \ge l\big(\min_{t\in T_n}\chi(p,W'_t)-\zeta_l(p\otimes\sigma'_s)\big), \tag{20}$$

where we have used Lemma 5.3. Since $P_{i,l}\in(A_{\mathrm{diag}}\otimes\mathcal{B}(\mathcal{H}))^{\otimes l}$ for $i=1,2$ we can find projections $\{r_{i,x^l}\}_{x^l\in A^l}\subset\mathcal{B}(\mathcal{H})^{\otimes l}$, $i=1,2$, with

$$P_{i,l} = \sum_{x^l\in A^l} |x^l\rangle\langle x^l|\otimes r_{i,x^l} \quad (i=1,2).$$
The relation $(\mathbf{1}_{A_{\mathrm{diag}}}\otimes\mathbf{1}_{\mathcal{H}})^{\otimes l} = P_{1,l}+P_{2,l}$ implies

$$\mathbf{1}_{\mathcal{H}}^{\otimes l} = r_{1,x^l}+r_{2,x^l} \quad \forall x^l\in A^l. \tag{21}$$

For each $x^l\in A^l$ let $\{e_{x^l,j}\}_{j=1}^{\mathrm{tr}(r_{1,x^l})}$ be an orthonormal basis of the range of $r_{1,x^l}$ and $\{e_{x^l,j}\}_{j=\mathrm{tr}(r_{1,x^l})+1}^{d^l}$ an orthonormal basis of the range of $r_{2,x^l}$. Then by (21) the set $\{|x^l\rangle\otimes e_{x^l,j}\}_{x^l\in A^l,\,j=1}^{d^l}$ is an orthonormal basis of $(\mathbb{C}^{|A|}\otimes\mathcal{H})^{\otimes l}$, and we have by definition

$$P_{1,l} = \sum_{x^l\in A^l} |x^l\rangle\langle x^l|\otimes\sum_{j=1}^{\mathrm{tr}(r_{1,x^l})} |e_{x^l,j}\rangle\langle e_{x^l,j}|,$$

and similarly

$$P_{2,l} = \sum_{x^l\in A^l} |x^l\rangle\langle x^l|\otimes\sum_{j=\mathrm{tr}(r_{1,x^l})+1}^{d^l} |e_{x^l,j}\rangle\langle e_{x^l,j}|,$$

i.e. the PVM $Q_l(s) := \{|x^l\rangle\langle x^l|\otimes|e_{x^l,j}\rangle\langle e_{x^l,j}|\}_{x^l\in A^l,\,j=1}^{d^l}$ consisting of one-dimensional projections is a refinement of the PVM $M_l=\{P_{1,l},P_{2,l}\}$. Thus by the monotonicity of the relative entropy and (20) we obtain

$$S_{Q_l(s)}\big(\rho_t^{\prime\,\otimes l}\,\|\,(p\otimes\sigma'_s)^{\otimes l}\big) \ge l\big(\min_{t\in T_n}\chi(p,W'_t)-\zeta_l(p\otimes\sigma'_s)\big) \tag{22}$$

for all $t\in T_n$, and consequently

$$\min_{s\in T_n}\min_{t\in T_n} S_{Q_l(s)}\big(\rho_t^{\prime\,\otimes l}\,\|\,(p\otimes\sigma'_s)^{\otimes l}\big) \ge l\big(\min_{t\in T_n}\chi(p,W'_t)-\zeta_l(p)\big), \tag{23}$$

where $\zeta_l(p) = \max_{s\in T_n}\zeta_l(p\otimes\sigma'_s)$.

Claim: For the choice $l=l_n=[\sqrt{n}]$ we have

$$\lim_{n\to\infty}\zeta_{l_n}(p) = 0. \tag{24}$$

Recall from the proof of Theorem 4.2 that

$$\zeta_{l_n}(p\otimes\sigma'_s) = (1-\tau_{1,l_n})\,\tau_{2,l_n}(s) - \tau_{1,l_n}\log\lambda_{\min}(p\otimes\sigma'_s) + \frac{1}{l_n},$$

where $\tau_{1,l}$ and $\tau_{2,l}=\tau_{2,l}(s)$ are defined in (9) and (10). Our remaining goal is to prove

$$\lim_{n\to\infty}\max_{s\in T_n}\tau_{2,l_n}(s) = 0, \tag{25}$$

and

$$\lim_{n\to\infty}\tau_{1,l_n}\max_{s\in T_n}\big(-\log\lambda_{\min}(p\otimes\sigma'_s)\big) = 0. \tag{26}$$

In order to simplify the notation and streamline the subsequent arguments we introduce the following terminology: Let $(a_n)_{n\in\mathbb{N}}$ and $(b_n)_{n\in\mathbb{N}}$ be two sequences of non-negative reals. We write $a_n\sim_+ b_n$ if $\lim_{n\to\infty}\frac{a_n}{b_n}>0$.
The validity of the assertions (25) and (26) can be easily deduced from (19) and the facts that $k_{l_n}\sim_+\frac{n^{1/2}}{\log n^{1/16}}$, $\delta_{l_n}\sim_+ n^{-1/8}$, and $k_{l_n}\delta_{l_n}^2\sim_+\frac{n^{3/8}}{\log n^{1/16}}$. For example, we have by (19)

$$0 \le \tau_{1,l_n}\max_{s\in T_n}\big(-\log\lambda_{\min}(p\otimes\sigma'_s)\big) \le -\tau_{1,l_n}\log\frac{p_{\min}}{n^2\cdot d} = 2^{-k_{l_n}\delta_{l_n}^2\big(c-o(n^0)-\frac{1}{k_{l_n}\delta_{l_n}^2}\log\frac{n^2 d}{p_{\min}}\big)},$$

which tends to 0 as $n\to\infty$ since $k_{l_n}\delta_{l_n}^2\sim_+\frac{n^{3/8}}{\log n^{1/16}}$. Thus, (26) is proven. In order to prove (25) it suffices to show that

$$\lim_{n\to\infty}\max_{s\in T_n}\big(-\delta_{l_n}\log\delta_{l_n}-\delta_{l_n}\log\lambda_{\min}(p\otimes\sigma'_s)\big) = 0.$$

But this is clear from

$$\max_{s\in T_n}\big(-\delta_{l_n}\log\delta_{l_n}-\delta_{l_n}\log\lambda_{\min}(p\otimes\sigma'_s)\big) \le -\delta_{l_n}\log\delta_{l_n}-\delta_{l_n}\log\frac{p_{\min}}{n^2 d}$$

and $\delta_{l_n}\sim_+ n^{-1/8}$.

Choose $s^*\in T_n$ such that

$$s^* = \mathrm{argmin}_{s\in T_n}\Big(\min_{t\in T_n} S_{Q_l(s)}\big(\rho_t^{\prime\,\otimes l}\,\|\,(p\otimes\sigma'_s)^{\otimes l}\big)\Big), \tag{27}$$

and consider the corresponding PVM $Q_{l_n}(s^*) = \{|x^{l_n}\rangle\langle x^{l_n}|\otimes|e_{x^{l_n},j}\rangle\langle e_{x^{l_n},j}|\}_{x^{l_n}\in A^{l_n},\,j=1}^{d^{l_n}}$. For each $t\in T_n$ we define

$$p_t(x^{l_n},j) := \mathrm{tr}\big(\rho_t^{\prime\,\otimes l_n}\,|x^{l_n}\rangle\langle x^{l_n}|\otimes|e_{x^{l_n},j}\rangle\langle e_{x^{l_n},j}|\big) = p^{\otimes l_n}(x^{l_n})\,\mathrm{tr}\big(D'_{t,x^{l_n}}|e_{x^{l_n},j}\rangle\langle e_{x^{l_n},j}|\big) = p^{\otimes l_n}(x^{l_n})\,V_t(j|x^{l_n}),$$

where for each $t\in T_n$ the stochastic matrix $V_t: A^{l_n}\to\{1,\ldots,d^{l_n}\}$ is given by

$$V_t(j|x^{l_n}) := \mathrm{tr}\big(D'_{t,x^{l_n}}|e_{x^{l_n},j}\rangle\langle e_{x^{l_n},j}|\big)$$

for $x^{l_n}\in A^{l_n}$, $j\in\{1,\ldots,d^{l_n}\}$. By (27), (23), and (24) we get

$$\min_{t\in T_n} I(p^{\otimes l_n},V_t) \ge l_n\big(\min_{t\in T_n}\chi(p,W'_t)-\zeta_{l_n}(p)\big), \tag{28}$$

with $\lim_{n\to\infty}\zeta_{l_n}(p)=0$. Together with Lemma 5.6, (28) implies that

$$\frac{1}{l_n}\min_{t\in T_n} I(p^{\otimes l_n},V_t) \ge \inf_{t\in T}\chi(p,W_t)-\frac{C}{n}-\zeta_{l_n}(p). \tag{29}$$

This implies that we can find $n_1(\varepsilon_1)$ such that

$$\frac{1}{l_n}\min_{t\in T_n} I(p^{\otimes l_n},V_t) \ge \frac{1}{2}\inf_{t\in T}\chi(p,W_t) > 0 \tag{30}$$

for all $n\ge n_1(\varepsilon_1)$. The last inequality in (30) holds by our general assumption that $\inf_{t\in T}\chi(p,W_t)>0$.
Choose any $n\ge n_1(\varepsilon_1)$. Let

$$\Theta := \Big\{\theta\in\mathbb{R} : 0<\theta<\frac{1}{6}\inf_{t\in T}\chi(p,W_t)\Big\}$$

and

$$I_n := \min_{t\in T_n} I(p^{\otimes l_n},V_t) = \min_{s\in T_n}\min_{t\in T_n} D(p_t\|r\otimes q_s), \tag{31}$$

where $r := p^{\otimes l_n}$ and $q_t(j) := \sum_{x^{l_n}} r(x^{l_n})V_t(j|x^{l_n})$ for all $j\in\{1,\ldots,d^{l_n}\}$. Moreover, in order to simplify our notation, we set $X := A^{l_n}$ and $J := \{1,\ldots,d^{l_n}\}$ and suppress the $n$-dependence of $a$ and $l$ temporarily. Recalling the definition of $i_t^a$ and $i^a$ from (17) and (18), we obtain from Theorem 5.8 for $\alpha := I_n-2l\theta$, $\beta := l\theta$, $\theta\in\Theta$

$$P(i^a\le I_n-2l\theta) \le \frac{1}{|T_n|}\sum_{t\in T_n} P_t(i_t^a\le I_n-l\theta) + |T_n|\,2^{-al\theta}. \tag{32}$$

Our construction of the compound cq-channel $T_n$ implies that for all $t\in T_n$, $x\in X$, $j\in J$

$$V_t(j|x) \ge \frac{1}{(n^2 d)^l}.$$

Consequently $q_t(j)\ge\frac{1}{(n^2 d)^l}$ for all $j\in J$, and

$$-l\log n^2 d \le \log\frac{V_t(j|x)}{q_t(j)} \le l\log n^2 d. \tag{33}$$

Since $i_t^a$ is a normalized sum of i.i.d. random variables each of which takes values in $[-l\log n^2 d,\ l\log n^2 d]$ by (33), we can apply Theorem 5.9 and obtain

$$P_t(i_t^a\le I_n-l\theta) \le \exp\Big(-\frac{a\,l^2\theta^2}{4\,l^2(\log n^2 d)^2}\Big) \tag{34}$$

for all $t\in T_n$, since $I_n\le E_t(i_t^a)$ for all $t\in T_n$. (34) and (32) show that

$$P(i^a\le I_n-2l\theta) \le \exp\Big(-\frac{a\theta^2}{16(\log nd)^2}\Big) + |T_n|\,2^{-al\theta}. \tag{35}$$

The set $X_{a,\theta}\subset X^a\times J^a = A^{la}\times\{1,\ldots,d^l\}^a$ given by

$$X_{a,\theta} := \{(x^a,j^a) : i^a(x^a,j^a) > I_n-2l\theta\}$$

is now used to construct an orthogonal projection $P_{la,\theta}\in(A_{\mathrm{diag}}\otimes\mathcal{B}(\mathcal{H}))^{\otimes la}$ defined by

$$P_{la,\theta} := \sum_{(x^a,j^a)\in X_{a,\theta}} |x^a\rangle\langle x^a|\otimes|e_{x^a,j^a}\rangle\langle e_{x^a,j^a}|,$$

where we identify each $x^a\in X^a$ with a sequence in $A^{la}$. Moreover, $e_{x^a,j^a} := e_{x_1,j_1}\otimes\ldots\otimes e_{x_a,j_a}$. By the definition of the set $X_{a,\theta}$ the relations

$$p'_a(X_{a,\theta}) \ge 1-\exp\Big(-\frac{a\theta^2}{16(\log nd)^2}\Big)-|T_n|\,2^{-al\theta}, \tag{36}$$

and

$$(r^{\otimes a}\otimes q_a)(X_{a,\theta}) \le 2^{-a(I_n-2l\theta)} \tag{37}$$

hold.
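The Hoeffding step behind (34) can be sanity-checked numerically. The following is a minimal sketch under simplifying assumptions: it uses hypothetical i.i.d. fair coin flips in place of the information densities of the proof, computes the lower tail exactly from binomial coefficients, and compares it with the right-hand side of Theorem 5.9. The function names are illustrative only.

```python
import math

def hoeffding_bound(a, tau, widths):
    """Right-hand side of Theorem 5.9: exp(-2 a^2 tau^2 / sum_i (o_i - u_i)^2)."""
    return math.exp(-2.0 * a * a * tau * tau / sum(w * w for w in widths))

def exact_lower_tail(a, tau):
    """P(sum_i (X_i - 1/2) <= -a*tau) for i.i.d. fair coin flips X_i in {0, 1}."""
    threshold = a / 2.0 - a * tau + 1e-9   # small slack against float rounding
    favourable = sum(math.comb(a, k) for k in range(a + 1) if k <= threshold)
    return favourable / 2 ** a

a, tau = 100, 0.1
bound = hoeffding_bound(a, tau, [1.0] * a)   # X_i in [0, 1], so o_i - u_i = 1
tail = exact_lower_tail(a, tau)              # exact probability of at most 40 heads
```

In this toy case the exact tail is noticeably smaller than the bound, which is typical: Hoeffding's inequality trades tightness for the fact that it only needs the ranges $[u_i,o_i]$, which is exactly what (33) provides in the proof.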
(36) and (37) imply by definition of the projection $P_{la,\theta}\in(A_{\mathrm{diag}}\otimes\mathcal{B}(\mathcal{H}))^{\otimes la}$ that

$$\mathrm{tr}(\rho^{(la)}P_{la,\theta}) \ge 1-\exp\Big(-\frac{a\theta^2}{16(\log nd)^2}\Big)-|T_n|\,2^{-al\theta}, \tag{38}$$

and

$$\mathrm{tr}\big((p^{\otimes la}\otimes\sigma^{(la)})P_{la,\theta}\big) \le 2^{-a(I_n-2l\theta)}, \tag{39}$$

where

$$\rho^{(la)} := \frac{1}{|T_n|}\sum_{t\in T_n}\rho_t^{\prime\,\otimes la} = \sum_{x^{al}\in A^{al}} p^{\otimes al}(x^{al})\,|x^{al}\rangle\langle x^{al}|\otimes\frac{1}{|T_n|}\sum_{t\in T_n} D'_{t,x^{al}},$$

and

$$\sigma^{(la)} := \frac{1}{|T_n|}\sum_{t\in T_n}\sigma_t^{\prime\,\otimes la}.$$

Since $n=al+b$, $0\le b<l$, we can define a projection $P_{n,\theta}\in(A_{\mathrm{diag}}\otimes\mathcal{B}(\mathcal{H}))^{\otimes n}$ by

$$P_{n,\theta} := P_{la,\theta}\otimes(\mathbf{1}_{A_{\mathrm{diag}}}\otimes\mathbf{1}_{\mathcal{H}})^{\otimes(n-la)}.$$

(38) and (39) then yield

$$\mathrm{tr}(\rho^{(n)}P_{n,\theta}) \ge 1-\exp\Big(-\frac{a_n\theta^2}{16(\log nd)^2}\Big)-|T_n|\,2^{-a_n l_n\theta}, \tag{40}$$

and

$$\mathrm{tr}\big((p^{\otimes n}\otimes\sigma^{(n)})P_{n,\theta}\big) \le 2^{-a_n(I_n-2l_n\theta)} \le 2^{-a_n l_n(\inf_{t\in T}\chi(p,W_t)-\varepsilon_n-2\theta)} \tag{41}$$

by (29), where $\varepsilon_n := \frac{C}{n}+\zeta_{l_n}(p)$. Thus for $n\ge n_2(\theta)$ we conclude from (41), the fact that $\lim_{n\to\infty}\varepsilon_n=0$, and $0\le b_n\le[n^{1/2}]$ that

$$\mathrm{tr}\big((p^{\otimes n}\otimes\sigma^{(n)})P_{n,\theta}\big) \le 2^{-n(\inf_{t\in T}\chi(p,W_t)-3\theta)}. \tag{42}$$

Since the states $\rho^{(n)}\in(A_{\mathrm{diag}}\otimes\mathcal{B}(\mathcal{H}))^{\otimes n}$ and $\sigma^{(n)}\in\mathcal{B}(\mathcal{H})^{\otimes n}$ correspond to the averaged cq-channel $W^n = \frac{1}{|T_n|}\sum_{t\in T_n} W_t^{\prime\,n}$, we can apply Theorem 5.4 with

$$\lambda = \lambda_n := \exp\Big(-\frac{a_n\theta^2}{16(\log nd)^2}\Big)+|T_n|\,2^{-a_n l_n\theta}, \qquad \mu = \mu_n := n\big(\inf_{t\in T}\chi(p,W_t)-3\theta\big), \qquad \gamma = \gamma_n = n\theta,$$

and end up with an $(n,\ M'_n = [2^{n(\inf_{t\in T}\chi(p,W_t)-4\theta)}],\ \lambda'_n)_{\mathrm{av}}$-code for the channel $W^n = \frac{1}{|T_n|}\sum_{t\in T_n} W_t^{\prime\,n}$, where

$$\lambda'_n = 2\lambda_n+4\cdot 2^{-n\theta}.$$

By standard arguments we can select a sub-code for $W^n$ with $M_n\ge(1/2)\cdot M'_n$ and maximum error probability $\tilde\lambda_n\le 2\lambda'_n$. We denote this $(n,M_n,\tilde\lambda_n)_{\max}$-code by $C_n$. But since

$$W^n = \frac{1}{|T_n|}\sum_{t\in T_n} W_t^{\prime\,n},$$

it is clear that $C_n$ is an $(n,M_n,|T_n|\tilde\lambda_n)_{\max}$-code for the compound channel $T_n$. We know from our Lemma 5.6 that $|T_n|\le K^{|A|} n^{|A| d^2}$.
Thus, since $l_n=[\sqrt{n}]$ and $a_n=\frac{n-b_n}{l_n}$, the quantity $\tilde\lambda_n\le 4\lambda_n+8\cdot 2^{-n\theta}$ decays faster than any inverse polynomial in $n$ whereas $|T_n|$ grows only polynomially, so that

$$\lim_{n\to\infty}|T_n|\tilde\lambda_n = 0,$$

and we are done since

$$M_n \ge (1/2)\,[2^{n(\inf_{t\in T}\chi(p,W_t)-4\theta)}] \ge [2^{n(\inf_{t\in T}\chi(p,W_t)-5\theta)}]$$

for all sufficiently large $n\in\mathbb{N}$.

Remark 5.11: Note that the error probability of the codes constructed in the proof of Theorem 5.10 behaves like $1/n$ asymptotically. This is caused by our choice of $\tau_n$ as $\tau_n=1/n^2$. So we can achieve a faster decay of the decoding errors by using better sequences $\tau_n$. For example, if we choose $\tau_n = 2^{-n^{1/16}}$ and replace $D'_{t,x}$ in (13) by

$$D'_{t,x} := (1-\tau_n)D_{t,x}+\frac{\tau_n}{d}\mathbf{1}_{\mathcal{H}}$$

for all $x\in A$ and $t\in T'_n$, we obtain, as a careful inspection and a painless modification of the arguments applied so far show, for each sufficiently small $\theta>0$ $(n,M_n,\lambda_n)_{\max}$-codes for the compound cq-channel $T$ with

$$M_n \ge [2^{n(\max_{p\in\mathcal{P}(A)}\inf_{t\in T}\chi(p,W_t)-5\theta)}]$$

and

$$\lambda_n \le 2^{-c(\theta)n^{1/16}}$$

for an appropriate positive constant $c(\theta)$.

B. The Strong Converse

For the proof of the strong converse we simply follow Wolfowitz' strategy in [24], [25]. To this end we use Winter's result from [23], which is the core of the strong converse for the single memoryless cq-channel:

Theorem 5.12 (Winter [23]): For $\lambda\in(0,1)$ there exists a constant $K'(\lambda,\dim\mathcal{H},|A|)$ such that for every memoryless cq-channel $\{W^n\}_{n\in\mathbb{N}}$ with finite input alphabet $A$ and finite-dimensional output Hilbert space $\mathcal{H}$ and every $(n,M_n,\lambda)_{\max}$-code with code words of the same type $p\in\mathcal{P}(A)$ the inequality

$$M_n \le 2^{n\big(\chi(p,W)+K'(\lambda,\dim\mathcal{H},|A|)\frac{1}{\sqrt{n}}\big)}$$

holds.

The proof of this theorem is implicit in the proof of Theorem 13 in [23].

Theorem 5.13 (Strong Converse): Let $\lambda\in(0,1)$.
Then there is a constant $K = K(\lambda,\dim\mathcal{H},|A|)$ such that for any compound cq-channel $\{W_t^n\}_{t\in T,\,n\in\mathbb{N}}$ and any $(n,M_n,\lambda)_{\max}$-code $C_n$

$$\frac{1}{n}\log M_n \le \max_{p\in\mathcal{P}(A)}\inf_{t\in T}\chi(p,W_t)+K\frac{1}{\sqrt{n}}$$

holds.

Proof: Wolfowitz' proof of the strong converse [24], [25] for the classical compound channel extends mutatis mutandis to the cq-case once we have Theorem 5.12. We fix $n\in\mathbb{N}$ and consider any $(n,M_n,\lambda)_{\max}$-code $C_n = (u_i,b_i)_{i=1}^{M_n}$. Each code word $u_i\in A^n$ induces a type (empirical distribution) $p_{u_i}\in\mathcal{P}(A)$, and according to the standard type counting lemma (cf. [5]) there are at most $(n+1)^{|A|}$ different types. We divide our code $C_n$ into sub-codes $C_{n,j} = (u'_k,b'_k)_{k=1}^{M_{n,j}}$ such that the code words of each $C_{n,j}$ belong to the same type class, i.e. induce the same type. It is clear that the maximum error probabilities of these sub-codes are bounded from above by $\lambda$ for all $t\in T$. Since we have a uniform bound on the error probabilities on each channel in the class $T$, we may apply Winter's Theorem 5.12 and obtain

$$M_{n,j} \le 2^{n\big(\chi(p_j,W_t)+K'(\lambda,\dim\mathcal{H},|A|)\frac{1}{\sqrt{n}}\big)} \quad \forall t\in T, \tag{43}$$

where $p_j$ denotes the type of the code words belonging to the sub-code $C_{n,j}$. Since the left-hand side of (43) does not depend on $t$ we may conclude that

$$M_{n,j} \le 2^{n\big(\inf_{t\in T}\chi(p_j,W_t)+K'(\lambda,\dim\mathcal{H},|A|)\frac{1}{\sqrt{n}}\big)} \le 2^{n\big(\max_{p\in\mathcal{P}(A)}\inf_{t\in T}\chi(p,W_t)+K'(\lambda,\dim\mathcal{H},|A|)\frac{1}{\sqrt{n}}\big)} \tag{44}$$

holds. Then, recalling that there are at most $(n+1)^{|A|}$ sub-codes and using (44), we arrive at

$$M_n \le (n+1)^{|A|}\,2^{n\big(\max_{p\in\mathcal{P}(A)}\inf_{t\in T}\chi(p,W_t)+K'\frac{1}{\sqrt{n}}\big)} \le 2^{n\big(\max_{p\in\mathcal{P}(A)}\inf_{t\in T}\chi(p,W_t)+K\frac{1}{\sqrt{n}}\big)},$$

with a suitable constant $K = K(\lambda,\dim\mathcal{H},|A|)$.

VI. AVERAGED CHANNELS

In this section we extend the results of Datta and Dorlas [6] to arbitrary averaged channels whose branches are memoryless cq-channels.
Let $(T,\Sigma,\mu)$ be a probability space, i.e. $T$ is a set, $\Sigma$ is a $\sigma$-algebra, and $\mu$ is a probability measure on $\Sigma$. Moreover, we consider a memoryless compound cq-channel $\{W_t^n\}_{t\in T,\,n\in\mathbb{N}}$ with finite input alphabet $A$ and finite-dimensional output Hilbert space $\mathcal{H}$. We assume that the branches $W_t$, $t\in T$, depend measurably on $t\in T$, i.e. we assume that for each fixed $x\in A$ the maps $T\ni t\mapsto D_{t,x}\in\mathcal{S}(\mathcal{H})$ are measurable. We assume here that $\mathcal{S}(\mathcal{H})$ is endowed with its natural Borel $\sigma$-algebra.

The averaged channel $W = \{W^n\}_{n\in\mathbb{N}}$ is defined by the following prescription: For any $n\in\mathbb{N}$ we have a map $W^n: A^n\ni x^n\mapsto D_{x^n}\in\mathcal{S}(\mathcal{H}^{\otimes n})$, where $D_{x^n}$ is the density operator uniquely determined by the requirement that for all $b\in\mathcal{B}(\mathcal{H}^{\otimes n})$ the relation

$$\mathrm{tr}(D_{x^n}b) = \int\mathrm{tr}(D_{t,x^n}b)\,\mu(dt)$$

holds (see footnote 5). A code $C_n = (x^n(i),b_i)_{i=1}^{M_n}$ for the averaged channel $\{W^n\}_{n\in\mathbb{N}}$ consists as before of codewords $x^n(i)\in A^n$ and decoding operators $b_i\in\mathcal{B}(\mathcal{H})^{\otimes n}$, $b_i\ge 0$, $\sum_{i=1}^{M_n} b_i\le\mathbf{1}_{\mathcal{H}}^{\otimes n}$. The integer $M_n$ is the size of the code. Achievable rates and the capacity $C(W)$ are defined in a similar fashion as for memoryless cq-channels.

Footnote 5: Note that $\mathrm{tr}(D_{t,x^n}b)$ depends measurably on $t$ since tensor and ordinary products of operators are continuous and hence measurable operations.

We will show in the following two subsections that, in analogy to the classical case [2], the weak capacity of $W$ is given by

$$C(W) = \sup_{p\in\mathcal{P}(A)}\operatorname*{ess\text{-}inf}_{t\in T}\chi(p,W_t), \tag{45}$$

where ess-inf denotes the essential infimum (see footnote 6). Clearly, we cannot expect the strong converse to hold because of Ahlswede's [2] counterexamples in the classical setting.

A. The direct part of the Coding Theorem

We will need some simple properties of the essential infimum in the proof of the direct part of the coding theorem for the averaged channel $W$.
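For intuition, the essential infimum $\sup\{c\in\mathbb{R} : \mu(\{t : f(t)<c\})=0\}$ is easy to evaluate when $T$ is finite: it is the smallest value of $f$ on the points of positive measure. The sketch below illustrates this with hypothetical values of $f$ and $\mu$ (they are not taken from the paper) and also checks, for this toy case, that the set $\{t : f(t)\ge\operatorname{ess\text{-}inf} f\}$ carries full measure.

```python
def ess_inf(f, mu):
    """Essential infimum of f on a finite probability space with weights mu.

    For finite T, sup{c : mu({t : f(t) < c}) = 0} equals the smallest value
    of f over the points t with mu(t) > 0.
    """
    return min(f[t] for t in f if mu[t] > 0)

# Hypothetical data: think of f(t) as chi(p, W_t) for one fixed input distribution p.
f = {"t1": 0.0, "t2": 0.7, "t3": 0.9}
mu = {"t1": 0.0, "t2": 0.25, "t3": 0.75}

a = ess_inf(f, mu)                               # the null point t1 is ignored
plain_inf = min(f.values())                      # worst case over all of T
mass_above = sum(mu[t] for t in f if f[t] >= a)  # mu({t : f(t) >= a})
```

The gap between `plain_inf` and `a` is the mechanism by which the averaged-channel formula (45) can exceed the compound-channel capacity: branches of measure zero do not count.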
Footnote 6: The essential infimum of a measurable function $f: T\to\mathbb{R}$ on the probability space $(T,\Sigma,\mu)$ is defined by $\operatorname*{ess\text{-}inf}_{t\in T} f := \sup\{c\in\mathbb{R} : \mu(\{t\in T : f(t)<c\})=0\}$.

We start with a simple general property of the essential infimum:

Lemma 6.1: Let $(T,\Sigma,\mu)$ be a probability space and $f: T\to\mathbb{R}$ any measurable function. Let $a := \operatorname*{ess\text{-}inf}_{t\in T} f$. Then the set

$$A := \{t\in T : f(t)\ge a\}$$

satisfies $\mu(A)=1$.

Proof: The assertion of the lemma follows easily from the definition of the essential infimum.

Our proof of the direct part of the coding theorem will be based on a reduction to the case of compound cq-channels. Therefore we have to give another characterization of

$$\sup_{p\in\mathcal{P}(A)}\operatorname*{ess\text{-}inf}_{t\in T}\chi(p,W_t)$$

in terms of the optimization processes appearing in the capacity formula for the compound cq-channels. To this end we define for any $p\in\mathcal{P}(A)$

$$a(p) := \operatorname*{ess\text{-}inf}_{t\in T}\chi(p,W_t),$$

and

$$T_p := \{t\in T : \chi(p,W_t)\ge a(p)\}.$$

Lemma 6.2: Let $\{W^n\}_{n\in\mathbb{N}}$ be the averaged cq-channel defined by the probability space $(T,\Sigma,\mu)$ and the compound cq-channel $T$. Then

$$\sup_{p\in\mathcal{P}(A)}\max_{q\in\mathcal{P}(A)}\inf_{t\in T_p}\chi(q,W_t) = \sup_{q\in\mathcal{P}(A)}\operatorname*{ess\text{-}inf}_{t\in T}\chi(q,W_t).$$

Proof: $\mu(T_p)=1$ holds by Lemma 6.1. For $p,q\in\mathcal{P}(A)$ and the corresponding sets $T_p,T_q\subseteq T$ we have

$$\inf_{t\in T_p}\chi(q,W_t) \le \inf_{t\in T_p\cap T_q}\chi(q,W_t) \le \operatorname*{ess\text{-}inf}_{t\in T}\chi(q,W_t), \tag{46}$$

where the last inequality is justified by the observation that $\mu(T_p\cap T_q)=1$ and that $T_p\cap T_q\subseteq\{t\in T : \chi(q,W_t)\ge\inf_{t\in T_p\cap T_q}\chi(q,W_t)\}$, i.e. $\mu(\{t\in T : \chi(q,W_t)<\inf_{t\in T_p\cap T_q}\chi(q,W_t)\})=0$, and (46) holds by definition of the essential infimum.
(46) implies that

$$\max_{q\in\mathcal{P}(A)}\inf_{t\in T_p}\chi(q,W_t) \le \sup_{q\in\mathcal{P}(A)}\operatorname*{ess\text{-}inf}_{t\in T}\chi(q,W_t),$$

and consequently

$$\sup_{p\in\mathcal{P}(A)}\max_{q\in\mathcal{P}(A)}\inf_{t\in T_p}\chi(q,W_t) \le \sup_{q\in\mathcal{P}(A)}\operatorname*{ess\text{-}inf}_{t\in T}\chi(q,W_t). \tag{47}$$

In order to show the reverse inequality we choose for any $\varepsilon>0$ a $q_\varepsilon\in\mathcal{P}(A)$ with

$$\sup_{q\in\mathcal{P}(A)}\operatorname*{ess\text{-}inf}_{t\in T}\chi(q,W_t) \le \operatorname*{ess\text{-}inf}_{t\in T}\chi(q_\varepsilon,W_t)+\varepsilon. \tag{48}$$

By definition of the set $T_{q_\varepsilon}$ as

$$T_{q_\varepsilon} = \{t\in T : \chi(q_\varepsilon,W_t)\ge a(q_\varepsilon)\},$$

with $a(q_\varepsilon) = \operatorname*{ess\text{-}inf}_{t\in T}\chi(q_\varepsilon,W_t)$, we have

$$\operatorname*{ess\text{-}inf}_{t\in T}\chi(q_\varepsilon,W_t) \le \inf_{t\in T_{q_\varepsilon}}\chi(q_\varepsilon,W_t). \tag{49}$$

The inequalities (48) and (49) show that

$$\sup_{q\in\mathcal{P}(A)}\operatorname*{ess\text{-}inf}_{t\in T}\chi(q,W_t) \le \inf_{t\in T_{q_\varepsilon}}\chi(q_\varepsilon,W_t)+\varepsilon,$$

which in turn yields

$$\sup_{q\in\mathcal{P}(A)}\operatorname*{ess\text{-}inf}_{t\in T}\chi(q,W_t) \le \sup_{p\in\mathcal{P}(A)}\max_{q\in\mathcal{P}(A)}\inf_{t\in T_p}\chi(q,W_t)+\varepsilon.$$

Since $\varepsilon>0$ can be made arbitrarily small and the left-hand side of the last inequality does not depend on $\varepsilon$, we finally obtain

$$\sup_{q\in\mathcal{P}(A)}\operatorname*{ess\text{-}inf}_{t\in T}\chi(q,W_t) \le \sup_{p\in\mathcal{P}(A)}\max_{q\in\mathcal{P}(A)}\inf_{t\in T_p}\chi(q,W_t),$$

which concludes our proof.

Theorem 6.3 (Direct Part): Let $W$ denote the averaged cq-channel. Then

$$C(W) \ge \sup_{p\in\mathcal{P}(A)}\operatorname*{ess\text{-}inf}_{t\in T}\chi(p,W_t).$$

Proof: We assume that

$$\sup_{p\in\mathcal{P}(A)}\operatorname*{ess\text{-}inf}_{t\in T}\chi(p,W_t) > 0,$$

since otherwise the assertion of the theorem is trivially true. By Lemma 6.2 it is enough to show that for each $p\in\mathcal{P}(A)$ with $\max_{q\in\mathcal{P}(A)}\inf_{t\in T_p}\chi(q,W_t)>0$ the rate

$$\max_{q\in\mathcal{P}(A)}\inf_{t\in T_p}\chi(q,W_t)-\varepsilon$$

is achievable for each sufficiently small $\varepsilon>0$. But this follows immediately if we apply our Theorem 5.10 to the compound channel $T_p$, since any good code for the compound cq-channel $T_p$ has the same performance for the averaged channel $W^n$ due to the fact that $\mu(T_p)=1$.

B. The Weak Converse

We start with a general property of the essential infimum which will help us to reduce the arguments in the proof of the weak converse to Fano's inequality and Holevo's bound via Markov's inequality.

Lemma 6.4: Consider a probability space $(T,\Sigma,\mu)$. Let $n\in\mathbb{N}$ and let $f,f_n: T\to\mathbb{R}$ be measurable bounded functions with

$$\lim_{n\to\infty} f_n(t) = f(t) \quad \forall t\in T. \tag{50}$$

Let $(G_n)_{n\in\mathbb{N}}$ be a sequence of measurable subsets of $T$ with $\lim_{n\to\infty}\mu(G_n)=1$. Then

$$\limsup_{n\to\infty}\inf_{t\in G_n} f_n(t) \le \operatorname*{ess\text{-}inf}_{t\in T} f \tag{51}$$

holds.

Proof: The proof will be accomplished if we can show the following two inequalities:

$$\limsup_{n\to\infty}\inf_{t\in G_n} f_n(t) \le \limsup_{n\to\infty}\inf_{t\in G_n} f(t), \tag{52}$$

and

$$\limsup_{n\to\infty}\inf_{t\in G_n} f(t) \le \operatorname*{ess\text{-}inf}_{t\in T} f. \tag{53}$$

Proof of (52): Set $b_n := \inf_{t\in G_n} f(t)$ and $b'_n := \inf_{t\in G_n} f_n(t)$. Then to any $\varepsilon>0$ we can find a $t_\varepsilon\in G_n$ with

$$f(t_\varepsilon) \le b_n+\varepsilon, \tag{54}$$

and, by (50), there is $n(\varepsilon)\in\mathbb{N}$ such that for all $n\ge n(\varepsilon)$ we have

$$f_n(t_\varepsilon) \le f(t_\varepsilon)+\varepsilon. \tag{55}$$

Then the definition of $b'_n$, (55), and (54) yield

$$b'_n \le b_n+2\varepsilon$$

for all $n\ge n(\varepsilon)$. This implies

$$\limsup_{n\to\infty} b'_n \le \limsup_{n\to\infty} b_n+2\varepsilon,$$

and since $\varepsilon>0$ is arbitrary we obtain (52).

Proof of (53): As in the first part of the proof we use the abbreviation $b_n := \inf_{t\in G_n} f(t)$, and additionally we set

$$b := \limsup_{n\to\infty} b_n.$$

Then by the basic properties of the upper limit we can select a subsequence $(n_i)_{i\in\mathbb{N}}$ with

$$\lim_{i\to\infty} b_{n_i} = b. \tag{56}$$

In order to keep the notation as simple as possible we will denote this induced sequence $(b_{n_i})_{i\in\mathbb{N}}$ by $(b_n)_{n\in\mathbb{N}}$, i.e. we simply rename the subsequence. For any fixed $n\in\mathbb{N}$ we consider the sequence $(A_{n,k})_{k\in\mathbb{N}}$ of measurable subsets of $T$ defined by $A_{n,k} := \bigcup_{i=1}^k G_{n+i}$. Note that for each $n\in\mathbb{N}$ the sequence $(A_{n,k})_{k\in\mathbb{N}}$ has the following properties, which are easy to check:

1) $A_{n,1}\subset A_{n,2}\subset\ldots$,
2) $\lim_{k\to\infty}\mu(A_{n,k})=1$,
3) $a_{n,k} := \inf_{t\in A_{n,k}} f(t) = \min\{b_{n+1},b_{n+2},\ldots,b_{n+k}\}$, and the sequence $(a_{n,k})_{k\in\mathbb{N}}$ is non-increasing for any $n\in\mathbb{N}$, and
4) for $A_n := \bigcup_{k\in\mathbb{N}} A_{n,k}$ and $a_n := \inf_{t\in A_n} f(t)$ we have $\mu(A_n)=1$, $a_n\le\operatorname*{ess\text{-}inf}_{t\in T} f$, and $a_n = \lim_{k\to\infty} a_{n,k}$ for each $n\in\mathbb{N}$.

In view of these properties it suffices to prove that for each $\varepsilon>0$ there is $n(\varepsilon)\in\mathbb{N}$ such that

$$b-\varepsilon \le a_{n(\varepsilon),k} \le b+\varepsilon \quad \forall k\in\mathbb{N} \tag{57}$$

holds. In fact, (57) then implies that $b-\varepsilon\le a_{n(\varepsilon)}\le b+\varepsilon$, since $a_{n(\varepsilon)} = \lim_{k\to\infty} a_{n(\varepsilon),k}$, and by choosing an appropriate sequence $(\varepsilon_j)_{j\in\mathbb{N}}$ with $\varepsilon_j\searrow 0$ we can conclude that $b = \limsup_{j\to\infty} a_{n(\varepsilon_j)}$. But then $b\le\operatorname*{ess\text{-}inf}_{t\in T} f$ by $a_{n(\varepsilon_j)}\le\operatorname*{ess\text{-}inf}_{t\in T} f$ for all $j\in\mathbb{N}$. Thus we only need to prove (57), which follows from (56) (with our convention to suppress the index $i$): To any $\varepsilon>0$ we can find by (56) an $n(\varepsilon)\in\mathbb{N}$ such that for all $n\ge n(\varepsilon)$ we have

$$b-\varepsilon \le b_n \le b+\varepsilon.$$

Then by property 3) above we obtain for each $k\in\mathbb{N}$

$$b-\varepsilon \le \min\{b_{n(\varepsilon)+1},\ldots,b_{n(\varepsilon)+k}\} = a_{n(\varepsilon),k} \le b+\varepsilon,$$

which is the desired relation.

As a last preliminary result we need the generalization of Lemma 6 in [4].

Lemma 6.5: Let $\{W^n\}_{n\in\mathbb{N}}$ be a memoryless cq-channel with input alphabet $A$ and output Hilbert space $\mathcal{H}$. Then for any $(n,M_n,\varepsilon_n)_{\mathrm{av}}$-code $C_n = (x^n(i),b_i)_{i=1}^{M_n}$ with distinct codewords we have

$$(1-\varepsilon_n)\log M_n \le n\chi(p^*,W)+1,$$

where $p^* = \frac{1}{M_n}\sum_{i=1}^{M_n} p_{x^n(i)}\in\mathcal{P}(A)$ with the empirical distributions or types $p_{x^n(i)}\in\mathcal{P}(A)$ of the codewords $x^n(i)$ for $i=1,\ldots,M_n$.

Proof: The proof is based upon similar arguments as that of the corresponding Lemma 6 in [4]. The only additional argument we need is Holevo's bound. The details are as follows. We may assume w.l.o.g.
that $\sum_{i=1}^{M_n} b_i = \mathbf{1}_{\mathcal{H}}^{\otimes n}$ and define the corresponding classical channel by

$$K(j|i) := \mathrm{tr}(D_{x^n(i)}b_j), \quad i,j\in\{1,\ldots,M_n\}.$$

Let $\nu\in\mathcal{P}(A^n)$ be given by $\nu(x^n) = \frac{1}{M_n}$ if $x^n$ is one of $x^n(i)$, $i=1,\ldots,M_n$, and $\nu(x^n)=0$ else. In what follows we consider the marginal distributions $\nu_1,\ldots,\nu_n\in\mathcal{P}(A)$ induced by $\nu\in\mathcal{P}(A^n)$. It is obvious that

$$p^*(a) = \frac{1}{n}\sum_{j=1}^n\nu_j(a) \quad \forall a\in A \tag{58}$$

holds. From Fano's inequality and Holevo's bound we obtain

$$(1-\varepsilon_n)\log M_n \le I(\nu,K)+1 \le \chi(\nu,W^n)+1, \tag{59}$$

where $I(\nu,K)$ denotes the mutual information evaluated for the input distribution $\nu$ and the classical channel $K$. Using the subadditivity (cf. [16]) and concavity (w.r.t. the input distribution) of the Holevo information we get

$$\chi(\nu,W^n) \le \sum_{j=1}^n\chi(\nu_j,W) \le n\chi(p^*,W), \tag{60}$$

where we have used (58) in the last inequality. Inserting (60) into (59) yields the claimed relation.

The corresponding weak converse is the content of the next theorem.

Theorem 6.6 (Weak Converse): Let $W$ be the averaged channel defined by the probability space $(T,\Sigma,\mu)$ and the compound channel $T$. Then any sequence $(C_n)_{n\in\mathbb{N}}$ of $(n,M_n,\varepsilon_n)_{\mathrm{av}/\max}$-codes with $\lim_{n\to\infty}\varepsilon_n=0$ fulfills

$$\limsup_{n\to\infty}\frac{1}{n}\log M_n \le \sup_{p\in\mathcal{P}(A)}\operatorname*{ess\text{-}inf}_{t\in T}\chi(p,W_t).$$

Proof: Let $(C_n)_{n\in\mathbb{N}}$ be any sequence of $(n,M_n,\varepsilon_n)_{\mathrm{av}}$-codes with $\lim_{n\to\infty}\varepsilon_n=0$, i.e.

$$\int e_{\mathrm{av}}(t,C_n)\,\mu(dt) = \varepsilon_n,$$

where

$$e_{\mathrm{av}}(t,C_n) = \frac{1}{M_n}\sum_{i=1}^{M_n}\big(1-\mathrm{tr}(D_{t,x^n(i)}b_i)\big).$$

Set

$$G_n := \{t\in T : e_{\mathrm{av}}(t,C_n)\le\sqrt{\varepsilon_n}\}. \tag{61}$$

Then Markov's inequality yields

$$\mu(G_n) \ge 1-\sqrt{\varepsilon_n}. \tag{62}$$

If we choose $n_1\in\mathbb{N}$ such that $\sqrt{\varepsilon_n}<\frac{1}{2}$ for all $n\ge n_1$, then all the code words are distinct and we can apply Lemma 6.5 to each $t\in G_n$ (cf.
(61)) leading to

$$(1-\sqrt{\varepsilon_n})\log M_n \le n\chi(p^*,W_t)+1,$$

which is equivalent to

$$\frac{1}{n}\log M_n \le \frac{\chi(p^*,W_t)+\frac{1}{n}}{1-\sqrt{\varepsilon_n}} \tag{63}$$

for all $t\in G_n$ and all $n\ge n_1$. Since (63) holds for all $t\in G_n$ we obtain

$$\frac{1}{n}\log M_n \le \frac{\inf_{t\in G_n}\chi(p^*,W_t)+\frac{1}{n}}{1-\sqrt{\varepsilon_n}}. \tag{64}$$

Recall that $p^*$ depends on the block length $n$. Thus we are done if we can show that

$$\limsup_{n\to\infty}\max_{p\in\mathcal{P}(A)}\inf_{t\in G_n}\chi(p,W_t) \le \sup_{p\in\mathcal{P}(A)}\operatorname*{ess\text{-}inf}_{t\in T}\chi(p,W_t) \tag{65}$$

holds. For each $n\in\mathbb{N}$ with $n\ge n_1$ we choose $p_n\in\mathcal{P}(A)$ with

$$\inf_{t\in G_n}\chi(p_n,W_t) = \max_{p\in\mathcal{P}(A)}\inf_{t\in G_n}\chi(p,W_t).$$

By passing to a subsequence if necessary we may assume that

$$\lim_{n\to\infty}\inf_{t\in G_n}\chi(p_n,W_t) = \limsup_{n\to\infty}\max_{p\in\mathcal{P}(A)}\inf_{t\in G_n}\chi(p,W_t). \tag{66}$$

By selecting a further subsequence we can even ensure that $\lim_{j\to\infty} p_{n_j} =: p'\in\mathcal{P}(A)$ due to the compactness of $\mathcal{P}(A)$. By (66) we have

$$\lim_{j\to\infty}\inf_{t\in G_{n_j}}\chi(p_{n_j},W_t) = \limsup_{n\to\infty}\max_{p\in\mathcal{P}(A)}\inf_{t\in G_n}\chi(p,W_t). \tag{67}$$

Now, since $\lim_{j\to\infty}\chi(p_{n_j},W_t) = \chi(p',W_t)$ for all $t\in T$ by the continuity of the Holevo information, and since $\lim_{j\to\infty}\mu(G_{n_j})=1$ by (62), we see that the assumptions of Lemma 6.4 are fulfilled for the functions $f_j(t) := \chi(p_{n_j},W_t)$ and $f(t) := \chi(p',W_t)$. Thus Lemma 6.4 and (67) show that

$$\limsup_{n\to\infty}\max_{p\in\mathcal{P}(A)}\inf_{t\in G_n}\chi(p,W_t) \le \operatorname*{ess\text{-}inf}_{t\in T}\chi(p',W_t) \le \sup_{p\in\mathcal{P}(A)}\operatorname*{ess\text{-}inf}_{t\in T}\chi(p,W_t).$$

This is exactly (65) and we are done.

VII. CONCLUSION

In this paper we have shown the existence of universally "good" classical-quantum codes for two particularly interesting cq-channel models with limited channel knowledge. We determined the optimal transmission rates for the classes of compound and averaged cq-channels.
For the first model we could prove the strong converse for the maximum error criterion, whereas for the latter only a weak converse is established. The coding theorems for compound and averaged cq-channels imply in an obvious way the corresponding capacity formulas for the classical product state capacities of compound and averaged quantum channels (cf. the arguments in [16], [20], [23] for memoryless quantum channels). To be specific, the classical product state capacity of a family $\{\mathcal{N}_t: \mathcal{B}(\mathcal{H}')\to\mathcal{B}(\mathcal{H})\}_{t\in T}$ of quantum channels, as described by completely positive, trace preserving maps, is given, according to our results, by

$$C_1(\{\mathcal{N}_t\}_{t\in T}) = \sup_{\{p_i,D_i\}}\inf_{t\in T}\chi(\{p_i,\mathcal{N}_t(D_i)\}),$$

where the supremum is taken over all ensembles $\{p_i,D_i\}$ of possible input states $D_i\in\mathcal{S}(\mathcal{H}')$ occurring according to the probability distribution $(p_i)$, and

$$\chi(\{p_i,\mathcal{N}_t(D_i)\}) := S\Big(\sum_i p_i\mathcal{N}_t(D_i)\Big)-\sum_i p_i S(\mathcal{N}_t(D_i)).$$

The full classical capacity of $\{\mathcal{N}_t\}_{t\in T}$ is then

$$C(\{\mathcal{N}_t\}_{t\in T}) = \lim_{n\to\infty}\frac{1}{n} C_1(\{\mathcal{N}_t^{\otimes n}\}_{t\in T}),$$

and the limit is in general necessary by a counterexample to the additivity conjecture given by Hastings [9].

The capacity results for compound and averaged cq-channels show nicely the impact of the degree of channel uncertainty on the capacity. In fact, for the compound cq-channel we merely know that the information transmission happens over an unknown memoryless cq-channel which belongs to an a priori given set of channels. The capacity formula (6) is the best worst-case rate we can guarantee simultaneously for all involved channels. For averaged cq-channels, on the other hand, the formula (45) takes into account only the almost sure worst-case cq-channel, since we are given additional information represented by the probability measure on the memoryless branches.
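The difference between the worst-case formula (6) and the almost-sure worst-case formula (45) can be checked numerically on a toy family. The sketch below is illustrative only and rests on several assumptions that are not part of the paper: three hypothetical qubit cq-channels with binary input (one useless branch that always outputs the maximally mixed state and two branches with pure orthogonal outputs in rotated real bases), a hypothetical prior `mu` that gives the useless branch measure zero, and a grid search standing in for the supremum over input distributions.

```python
import math

def h2(x):
    """Binary entropy in bits, with the convention 0 * log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def qubit_entropy(rho):
    """Von Neumann entropy (bits) of a real symmetric 2x2 density matrix."""
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    disc = max(0.0, 1.0 - 4.0 * det)       # eigenvalues are (1 +- sqrt(disc)) / 2
    return h2((1.0 + math.sqrt(disc)) / 2.0)

def holevo(p0, channel):
    """chi(p, W) for the input distribution (p0, 1 - p0) and W = (W(0), W(1))."""
    rho0, rho1 = channel
    avg = [[p0 * rho0[i][j] + (1.0 - p0) * rho1[i][j] for j in range(2)]
           for i in range(2)]
    return (qubit_entropy(avg)
            - p0 * qubit_entropy(rho0) - (1.0 - p0) * qubit_entropy(rho1))

def rotated_basis_channel(angle):
    """b -> U |e_b><e_b| U^* for the real rotation U by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    return ([[c * c, c * s], [c * s, s * s]],
            [[s * s, -c * s], [-c * s, c * c]])

mixed = [[0.5, 0.0], [0.0, 0.5]]
channels = {1: (mixed, mixed),               # useless branch: chi = 0 for every p
            2: rotated_basis_channel(0.0),
            3: rotated_basis_channel(0.7)}
mu = {1: 0.0, 2: 0.5, 3: 0.5}                # hypothetical prior; branch 1 is a null set

grid = [k / 100.0 for k in range(101)]       # grid search stands in for sup over p
compound_rate = max(min(holevo(p, W) for W in channels.values()) for p in grid)
averaged_rate = max(min(holevo(p, channels[t]) for t in channels if mu[t] > 0)
                    for p in grid)
```

Up to floating-point error, `compound_rate` comes out as 0 and `averaged_rate` as 1 bit: the useless branch dominates the infimum in the compound setting but is invisible to the essential infimum.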
Consequently, the capacity of compound cq-channels is smaller than the capacity of their averaged counterparts in many natural situations. A simple example illustrating this effect is as follows. Let $T := \{1,\ldots,K\}$ be a finite set and let $W_1,\ldots,W_K: \{0,1\}\to\mathcal{S}(\mathbb{C}^2)$ be cq-channels defined as follows. Let $W_1$ be any channel with capacity $C(W_1)=0$. For $j\in\{2,\ldots,K\}$ select distinct unitaries $U_2,\ldots,U_K$ acting on $\mathbb{C}^2$ and define

$$W_j(b) := U_j|e_b\rangle\langle e_b|U_j^*,$$

where $b\in\{0,1\}$, $j\in\{2,\ldots,K\}$, and $e_0,e_1$ is the canonical basis of $\mathbb{C}^2$. Note that for each $p\in\mathcal{P}(\{0,1\})$ and $j\in\{2,\ldots,K\}$

$$\chi(p,W_j) = H(p)$$

holds, and consequently

$$C(W_2) = \ldots = C(W_K) = 1.$$

Since any sequence of codes with asymptotically vanishing probability of error for the compound cq-channel $T$ has to be reliable for each of our channels $W_1,\ldots,W_K$, and especially for $W_1$, we see that the only achievable rate for $T$ is 0. Consequently

$$C(T) = 0.$$

Now, if both the transmitter and the receiver have the additional information that the channels from $T$ are drawn according to the a priori probability distribution $\mu(1)=0$ and $\mu(i)=\frac{1}{K-1}$ for $i\in\{2,\ldots,K\}$, then it follows from Theorem 6.3 that

$$C(W) \ge \sup_{p\in\mathcal{P}(\{0,1\})}\operatorname*{ess\text{-}inf}_{t\in T}\chi(p,W_t) = \sup_{p\in\mathcal{P}(\{0,1\})}\min_{i\in\{2,\ldots,K\}}\chi(p,W_i) = \sup_{p\in\mathcal{P}(\{0,1\})} H(p) = 1,$$

where $W$ denotes the averaged channel associated with $T$ and $\mu$.

ACKNOWLEDGMENT

We are grateful to M. Hayashi, who helped us clarify the story of his approach to universal quantum hypothesis testing. We thank the Associate Editor and the anonymous referee for many useful comments and suggestions that improved the readability of the paper.

APPENDIX I
PROOF OF THEOREM 5.4

This appendix is devoted to the proof of Theorem 5.4.
We will apply a random coding argument of Hayashi and Nagaoka, which in turn is based on the following operator inequality that we quote from the work [13] by Hayashi and Nagaoka:

Theorem 1.1 (Hayashi & Nagaoka [13]): Let $\mathcal{K}$ be a finite-dimensional Hilbert space. For any operators $a,b\in\mathcal{B}(\mathcal{K})$ with $0\le a\le\mathbf{1}$ and $b\ge 0$, we have

$$\mathbf{1}-\sqrt{a+b}^{\,-1}\,a\,\sqrt{a+b}^{\,-1} \le 2(\mathbf{1}-a)+4b, \tag{68}$$

where $(\cdot)^{-1}$ denotes the generalized inverse.

Let us first note that our projection $P\in\mathcal{B}_{\mathrm{diag}}\otimes\mathcal{B}(\mathcal{K})$ can be uniquely written as

$$P = \sum_{k\in K}|k\rangle\langle k|\otimes P_k,$$

with suitable projections $P_k\in\mathcal{B}(\mathcal{K})$ for all $k\in K$. With this representation we have

$$\mathrm{tr}(\rho P) = \sum_{k\in K} w(k)\,\mathrm{tr}(D_k P_k), \tag{69}$$

and

$$\mathrm{tr}((w\otimes\sigma)P) = \sum_{k\in K} w(k)\,\mathrm{tr}(\sigma P_k). \tag{70}$$

Now let us set $M := [2^{\mu-\gamma}]$ and consider i.i.d. random variables $U_1,\ldots,U_M$ with values in $K$, each of which is distributed according to $w\in\mathcal{P}(K)$. Moreover we set

$$b_i(U_1,\ldots,U_M) := \Big(\sum_{j=1}^M P_{U_j}\Big)^{-1/2} P_{U_i}\Big(\sum_{j=1}^M P_{U_j}\Big)^{-1/2}. \tag{71}$$

Applying Theorem 1.1 we obtain

$$\mathbf{1}_{\mathcal{K}}-b_i(U_1,\ldots,U_M) \le 2(\mathbf{1}_{\mathcal{K}}-P_{U_i})+4\sum_{\substack{j=1\\ j\ne i}}^M P_{U_j}. \tag{72}$$

In the following consideration we use the shorthand $e(U)$ for the average error probability of the random code $(U_i,b_i(U_1,\ldots,U_M))_{i=1}^M$, i.e. we set

$$e(U) := \frac{1}{M}\sum_{i=1}^M\mathrm{tr}\big(D_{U_i}(\mathbf{1}_{\mathcal{K}}-b_i(U_1,\ldots,U_M))\big).$$

Recalling the fact that $U_1,\ldots,U_M$ are i.i.d., each distributed according to $w$, (72) yields

$$E_{U_1,\ldots,U_M}(e(U)) \le \frac{2}{M}\sum_{i=1}^M\sum_{k\in K} w(k)\,\mathrm{tr}(D_k(\mathbf{1}_{\mathcal{K}}-P_k))+\frac{4(M-1)}{M}\sum_{i=1}^M\sum_{k\in K} w(k)\,\mathrm{tr}(\sigma P_k) \le 2\,\mathrm{tr}(\rho(\mathbf{1}-P))+4\cdot M\cdot\mathrm{tr}((w\otimes\sigma)P) \le 2\cdot\lambda+4\cdot 2^{-\gamma}, \tag{73}$$

where we have used (69) and (70) in the second inequality. (73) shows that there must be at least one deterministic code $(k_i,b_i)_{i=1}^M$, which is a realization of the random code $(U_i,b_i(U_1,\ldots,U_M))_{i=1}^M$, with average error probability less than $2\cdot\lambda+4\cdot 2^{-\gamma}$, which concludes the proof of Theorem 5.4.

REFERENCES

[1] R. Ahlswede, "Certain Results in Coding Theory for Compound Channels I", Proc. Colloquium Inf. Theory, Bolyai Mathematical Society, Debrecen, Hungary, 35-59 (1967)
[2] R. Ahlswede, "The Weak Capacity of Averaged Channels", Z. Wahrscheinlichkeitstheorie verw. Geb. 11, 61-73 (1968)
[3] I. Bjelaković, J.-D. Deuschel, T. Krüger, R. Seiler, Ra. Siegmund-Schultze, A. Szkoła, "A Quantum Version of Sanov's Theorem", Commun. Math. Phys. 260, 659-671 (2005)
[4] D. Blackwell, L. Breiman, A.J. Thomasian, "The Capacity of a Class of Channels", Ann. Math. Stat. 30, No. 4, 1229-1241 (1959)
[5] I. Csiszár, J. Körner, "Information Theory; Coding Theorems for Discrete Memoryless Systems", Akadémiai Kiadó, Budapest/Academic Press Inc., New York 1981
[6] N. Datta, T. Dorlas, "Coding Theorem for a Class of Quantum Channels with Long-Term Memory", J. Physics A: Math. Gen. 40, 8147-8164 (2007). Available at: http://arxiv.org/abs/quant-ph/0610049
[7] M.J. Donald, "Further results on the relative entropy", Math. Proc. Camb. Phil. Soc. 101, 363-373 (1987)
[8] M. Fannes, "A Continuity Property of the Entropy density for Spin Lattice Systems", Commun. Math. Phys. 31, 291-294 (1973)
[9] M.B. Hastings, "A Counterexample to Additivity of Minimum Output Entropy", arXiv:0809.3972
[10] M. Hayashi, "Asymptotics of Quantum Relative Entropy from a Representation Theoretical Viewpoint", J. Physics A: Math. Gen. 34, 3413-3419 (2001)
[11] M. Hayashi, "Optimal sequence of quantum measurements in the sense of Stein's lemma in quantum hypothesis testing", J. Phys. A: Math. Gen. 35, 10759-10773 (2002)
[12] M. Hayashi, "Universal coding for classical-quantum channel", arXiv:0805.4092
[13] M. Hayashi, H.
Nagaoka, “General Formulas for Capacit y of Classical - Quantum Channels”, IEEE T rans. Inf. Th. V ol. 49. No. 7, 1753-1768 (2003) [14] F . Hiai, D. Petz, “The Proper Formula for Relati ve Entrop y and its Asymptotics in Quantu m Probabili ty”, Commun. Math. Phys. 143, 99- 114 (1991) [15] W . Hoef fding, “Proba bilit y inequalitie s for sums of bounded random v ariabl es”, Jour . Amer . Math. Stat. Association V ol. 58, 13-30 (1963) [16] A.S. Hole vo, “The Capacit y of the Quantum Channel with General Signal States”, IE EE T rans. Inf. Th. V ol. 44, No. 1, 269-273, (1998) [17] R. Jozsa, M. Horodecki, P . Horodecki , R. Horodecki, “Uni versal Quan- tum Information Compression” , Phys. Rev . L etter s V ol. 81, No. 8, 1714- 1717 (1998) [18] T . Oga wa, M. Hayashi, “ A New Proof of the Direct Part of Stein’ s Lemma in Quantum Hypothesis T esting ”, A vai lable at: http:/ /arxi v .org/abs/qua nt-ph/0110125 [19] T . Ogaw a, H. Nagaoka, “Strong Conv erse to the Quantum Channel Coding Theore m”, IEEE T rans. Inf . Th. V ol. 45, No. 7, 2486-2489 (1999) [20] B. Schumache r , M.D. W estmoreland, “Sendi ng Classical Information via Noisy Quantum Channel”, Phys. R ev . A V ol. 56, No. 1, 131-138, (1997) [21] P .C. Shields, “The E rgodic Theory of Discrete Sample Paths”, G raduate Studies in Mathemat ics V ol. 13, American Mathematical Soci ety 1996 15 [22] A. Wint er , “Coding Theorems of Quantum Information T heory”, Ph.D . dissertat ion, Unive rsit ¨ at Bielefeld, Bielefel d, Germany 1999, A v aila ble at : http://www . arxi v . org/a bs/quant -ph/9907077 [23] A. Wi nter , “Coding Theorem and Strong Con verse for Quantum Chan- nels”, IEE E T rans. Inf. Th. V ol. 45, No. 7, 2481-2485 (1999) [24] J. W olfowit z, “Simultaneous Channe ls”, Arc h. Rational Mech. A nal. V ol. 4, No. 4, 371-386 (1960) [25] J. W olfo witz, “ Codi ng Theorems of Informat ion Theory”, Erg ebnisse der Mathemat ik und ihrer Grenzge biete 31 , 3. Edition, Springer -V erlag, Berlin 1978