Joint Source–Channel Coding via Statistical Mechanics: Thermal Equilibrium Between the Source and the Channel*

Neri Merhav
Department of Electrical Engineering
Technion – Israel Institute of Technology
Haifa 32000, ISRAEL

Abstract

We examine the classical joint source–channel coding problem from the viewpoint of statistical physics and demonstrate that in the random coding regime, the posterior probability distribution of the source given the channel output is dominated by source sequences that exhibit a behavior highly parallel to that of thermal equilibrium between two systems of particles that exchange energy, where one system corresponds to the source and the other corresponds to the channel. The thermodynamical entropies of the dual physical problem are analogous to the conditional and unconditional Shannon entropies of the source, and so their balance in thermal equilibrium yields a simple formula for the mutual information between the source and the channel output that is induced by the typical code in an ensemble of joint source–channel codes, under certain conditions. We also demonstrate how our results can be used in applications, like the wiretap channel, and how they can be extended to multiuser scenarios, like that of the multiple access channel.

Index Terms: joint source–channel coding, statistical physics, thermal equilibrium, mutual information, entropy.

1 Introduction

Consider the following two seemingly unrelated problems, which serve as simple special cases of a more general setting we study later in this paper.

The first is an elementary problem in statistical physics: We have two subsystems of particles which are brought into thermal equilibrium with each other, as well as with the environment (a

* Part of this work was carried out during a visit at Hewlett–Packard Laboratories, Palo Alto, CA, U.S.A., in the Summer of 2008.
heat bath) at temperature $T$. The first subsystem consists of $N$ particles having magnetic moments (spins), $\{s_i\}$, each of which may be oriented either in the direction of an applied external magnetic field $B$, in which case $s_i = +1$, or in the opposite direction, in which case $s_i = -1$; its energy in both cases is given by $-s_iB$ (up to a certain multiplicative constant, which carries the appropriate physical units and is irrelevant for the purpose of this discussion). In the second subsystem, there are $n$ non-interacting particles $\{s_i'\}_{i=1}^n$, each of which may lie in one of two possible states: the state $s_i' = 0$, in which the particle has zero energy, and the state $s_i' = 1$, in which it has energy $e_0$. What is the average energy possessed by each one of these subsystems in equilibrium, as a function of $e_0$, $T$, $n$, $N$, and $B$?

The second problem is in Information Theory; in particular, it is in joint source–channel coding, where some of the notation used is deliberately chosen to be the same as in the previous paragraph: A binary memoryless source generates a vector $s$ of symbols $(s_1, s_2, \ldots, s_N)$, $s_i \in \{+1,-1\}$, $i = 1,\ldots,N$, with probabilities $q = \Pr\{S_i = +1\}$ and $1-q = \Pr\{S_i = -1\}$. This vector is encoded into a binary channel codeword $x(s)$ of length $n$ and transmitted over a binary symmetric channel (BSC) with crossover probability $p < 1/2$, and a binary $n$-vector $y$ is received at the channel output. Consider the posterior distribution
\[
P(s|y) = \frac{P(s)W(y|x(s))}{\sum_{s'} P(s')W(y|x(s'))},
\]
where $P(s)$ and $W(y|x)$ are the probability distributions that govern the source and the channel, respectively, as described above.
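To make the posterior of the second problem concrete, the following small Python sketch computes $P(s|y)$ exactly for a toy instance of this setup; the block lengths, the values of $q$ and $p$, and the random codebook below are all illustrative choices, not values taken from the paper.

```python
import itertools
import math
import random

def posterior(q, p, codebook, y):
    """Exact posterior P(s|y) for a binary memoryless source sent over a BSC.

    q: Pr{S_i = +1}; p: BSC crossover probability;
    codebook: dict mapping each source tuple s to its codeword x(s);
    y: received binary n-vector.
    """
    def P_source(s):
        return math.prod(q if si == +1 else 1 - q for si in s)

    def W(y, x):  # BSC likelihood: p^d (1-p)^(n-d), d = Hamming distance
        d = sum(xi != yi for xi, yi in zip(x, y))
        return p ** d * (1 - p) ** (len(y) - d)

    weights = {s: P_source(s) * W(y, codebook[s]) for s in codebook}
    Z = sum(weights.values())
    return {s: w / Z for s, w in weights.items()}

random.seed(0)
N, n, q, p = 3, 6, 0.7, 0.1  # illustrative sizes and parameters
sources = list(itertools.product([+1, -1], repeat=N))
codebook = {s: tuple(random.randint(0, 1) for _ in range(n)) for s in sources}
s0 = sources[0]                                             # transmitted message
y = tuple(b ^ (random.random() < p) for b in codebook[s0])  # BSC output
post = posterior(q, p, codebook, y)
```

Note that, since the posterior is a ratio, any constant common to the numerator and the denominator cancels out, a freedom that is exploited later in Section 4.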
Thus, clearly, $P(s|y)$ is proportional to $P(s)W(y|x(s))$, or, equivalently, $\ln P(s|y)$ is (within a term that is independent of $s$) given by $\ln P(s) + \ln W(y|x(s))$. For a typical code drawn uniformly at random from the ensemble of codes, what are the relative contributions of the source and the channel to this quantity, for those vectors $s$ that dominate $P(s|y)$ (i.e., those that capture the vast majority of the posterior probability)?

It turns out, as we shall see in Section 3 below, that the two problems have virtually identical answers (in a sense that will be made clear and precise therein), provided that the parameters $T$ and $B$ of the first problem are related to the parameters $p$ and $q$ of the second problem by
\[
p = \frac{\exp\{-e_0/kT\}}{1+\exp\{-e_0/kT\}} \tag{1}
\]
and
\[
q = \frac{\exp\{B/kT\}}{2\cosh(B/kT)}, \tag{2}
\]
or, equivalently,
\[
e_0 = kT\ln\frac{1-p}{p} \tag{3}
\]
and
\[
B = \frac{kT}{2}\ln\frac{q}{1-q}, \tag{4}
\]
where $k$ is Boltzmann's constant.

Thermal equilibrium between the two subsystems in the above-described physical problem dictates a certain balance between their thermodynamical entropies, in order to arrive at the maximum total entropy (by the second law of thermodynamics) for the total energy possessed by the entire system at the given temperature $T$. As the thermodynamical entropy, in its statistical–mechanical definition, is intimately related to the Shannon entropy, this equilibrium relation between the thermodynamical entropies of the physical problem gives rise to an analogous relation between Shannon entropies pertaining to the joint source–channel coding problem in the random coding regime. In particular, it relates the entropy of the source to its conditional entropy given the channel output, whose difference is exactly the mutual information between the source and the channel output.
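The correspondence (1)–(4) can be checked mechanically. The following sketch (with $k$ set to 1 and arbitrary illustrative values of $T$, $p$, and $q$) verifies that (3) and (4) indeed invert (1) and (2):

```python
import math

k = 1.0  # Boltzmann's constant in arbitrary units (illustrative choice)

def p_of(e0, T):   # eq. (1): crossover probability from the level spacing e0
    return math.exp(-e0 / (k * T)) / (1 + math.exp(-e0 / (k * T)))

def q_of(B, T):    # eq. (2): source bias from the magnetic field B
    return math.exp(B / (k * T)) / (2 * math.cosh(B / (k * T)))

def e0_of(p, T):   # eq. (3): inverse of (1)
    return k * T * math.log((1 - p) / p)

def B_of(q, T):    # eq. (4): inverse of (2)
    return (k * T / 2) * math.log(q / (1 - q))

T, p, q = 2.5, 0.11, 0.8  # illustrative temperature and probabilities
assert abs(p_of(e0_of(p, T), T) - p) < 1e-12
assert abs(q_of(B_of(q, T), T) - q) < 1e-12
```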
The final outcome of this is a simple formula for calculating the mutual information rate between the input and the output of a coded system for the typical code in a given ensemble, under certain conditions. This calculation builds strongly on the random energy model (REM) of spin glasses due to Derrida [3, 4, 5] and on its relation to the random code ensemble (RCE), as described in [12].

Clearly, under the regime of reliable communication, the mutual information rate between the source and the channel output coincides with the entropy rate of the source, as the conditional entropy rate of the source given the channel output vanishes. Thus, the problem of calculating the mutual information under reliable communication conditions is easy and, in fact, not quite interesting. The same calculation, however, when the conditions of reliable communication are not met, appears less trivial. But what would be the motivation for such a calculation? Here are just a few examples that motivate it: Consider a user that, in addition to its desired signal, receives also a relatively strong interfering signal (codeword), which is intended for other users, and which comes from a codebook whose rate exceeds the capacity of the crosstalk channel between the interferer and our user, so that the user cannot fully decode this interference. Nonetheless, our user would like to learn as much as possible about the interfering signal, for many possible reasons: for example, the user may wish to learn the interference signal in order to identify where it originates from, or in order to estimate it and subtract it (interference cancellation). The mutual information rate, call it $I$, between the interference signal and the channel output then gives some assessment concerning the quality of this estimation.
For one thing, $D(I)$, where $D(\cdot)$ is the distortion–rate function of the source, is a lower bound on the distortion in estimating this signal. Moreover, if the channel is Gaussian, one can calculate the exact minimum mean square error (MMSE) from the mutual information rate $I$ by taking its derivative w.r.t. the signal-to-noise ratio (SNR) [9]. Another application comes from scenarios where the above-described receiver is a hostile party (an eavesdropper), from which one would like to conceal information as much as possible. The natural setup, in this context, is that of the wiretap channel (cf. [14], as well as many follow-up papers), where excess channel noise beyond capacity is harnessed as an effective key that secures the data communication. As we show in the sequel, the mutual information rate between the transmitted message and the eavesdropper, who suffers from this excess noise, is strongly related to the equivocation, which is a customary measure of security in Shannon-theoretic secrecy systems.

The outline of this paper is as follows. In Section 2, we establish notation conventions. In Section 3, we provide some basic background in elementary statistical physics, which will be needed in the sequel. In Section 4, we derive our main result, which is a formula for the mutual information rate. In Section 5, we demonstrate how it is applied to the wiretap channel, and finally, in Section 6, we demonstrate how our results can be extended to multiuser scenarios, like that of the multiple access channel.

2 Notation Conventions

Throughout this paper, scalar random variables (RV's) will be denoted by capital letters, like $S$, $X$, and $Y$, their sample values will be denoted by the respective lower case letters, and their alphabets will be denoted by the respective calligraphic letters.
A similar convention will apply to random vectors and their sample values, which will be denoted by the same symbols in the bold face font. Thus, for example, $X$ will denote a random $n$-vector $(X_1,\ldots,X_n)$, and $x = (x_1,\ldots,x_n)$ is a specific vector value in $\mathcal{X}^n$, the $n$-th Cartesian power of $\mathcal{X}$. Sources and channels will be denoted generically by the letters $P$, $Q$, $M$ and $W$. Whenever clarity and unambiguity require it, these letters will be subscripted by the names of the relevant RV's, following the standard notation conventions in the literature; for example, $P_S$ will denote the probability distribution of a random variable $S$, $P_{X|Y}$ will denote the conditional probability distribution of $X$ given $Y$, and so on. The cardinality of a finite set $\mathcal{A}$ will be denoted by $|\mathcal{A}|$. Information-theoretic quantities, like entropies and mutual informations, will be denoted following the usual conventions of the information theory literature.

3 Background

In this section, we provide a brief account of the very basic background in statistical physics that is needed for this paper.

Consider a physical system of $N$ particles, which can be in a variety of microscopic states ('microstates'), defined by combinations of physical quantities associated with these particles, e.g., positions, momenta, angular momenta, spins, etc., of all $N$ particles. For each such microstate of the system, which we shall designate by a vector $s = (s_1,\ldots,s_N)$, there is an associated energy, given by a Hamiltonian (energy function), $\mathcal{E}(s)$. For example, if $s_i = (p_i, r_i)$, where $p_i$ is the momentum vector of particle number $i$ and $r_i$ is its position vector, then, classically,
\[
\mathcal{E}(s) = \sum_{i=1}^N\left[\frac{\|p_i\|^2}{2m}+mgz_i\right],
\]
where $m$ is the mass of each particle, $z_i$ is its height (one of the coordinates of $r_i$), and $g$ is the gravitational acceleration.
One of the most fundamental results in statistical physics (based on the law of energy conservation and on the basic postulate that all microstates of the same energy level are equiprobable) is that when the system is in thermal equilibrium with its environment, the probability of a microstate $s$ is given by the Boltzmann–Gibbs distribution
\[
P(s) = \frac{e^{-\beta\mathcal{E}(s)}}{Z(\beta)}, \tag{5}
\]
where $\beta = 1/(kT)$, $k$ being Boltzmann's constant and $T$ being the temperature, and $Z(\beta)$ is the normalization constant, called the partition function, which is given by
\[
Z(\beta) = \sum_{s} e^{-\beta\mathcal{E}(s)} \quad\mbox{or}\quad Z(\beta) = \int ds\, e^{-\beta\mathcal{E}(s)},
\]
depending on whether $s$ is discrete or continuous. The role of the partition function is by far deeper than just being a normalization factor, as it is actually the key quantity from which many macroscopic physical quantities can be derived. For example, the free energy$^1$ is $-\frac{1}{\beta}\ln Z(\beta)$, the average internal energy (i.e., the expectation of $\mathcal{E}(s)$, where $s$ is drawn according to (5)) is given by the negative derivative of $\ln Z(\beta)$, the heat capacity is obtained from the second derivative, etc. One of the ways to obtain eq. (5) is as the maximum entropy distribution under an energy constraint (owing to the second law of thermodynamics), where $\beta$ plays the role of a Lagrange multiplier that controls this energy level.

Let us define the quantity
\[
\Omega_{N,\delta}(\epsilon) = \left|\left\{s:\ (\epsilon-\delta/2)N \le \mathcal{E}(s) \le (\epsilon+\delta/2)N\right\}\right|, \tag{6}
\]
and let us assume that the limit
\[
\Sigma(\epsilon) = \lim_{\delta\to 0}\lim_{N\to\infty} \frac{\ln \Omega_{N,\delta}(\epsilon)}{N}
\]
exists and that $\Sigma(\epsilon)$ is a differentiable concave function. $\Sigma(\epsilon)$ is the entropy of the physical system in its statistical–mechanical definition. We will see shortly that it is intimately related to the Shannon entropy associated with the Boltzmann–Gibbs probability distribution $P(s)$ defined above.
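As a minimal numerical illustration of the role of the partition function, the sketch below takes a single two-level particle (energies $0$ and $e_0$, with illustrative parameter values) and checks that the negative derivative of $\ln Z(\beta)$ reproduces the average internal energy computed directly from (5):

```python
import math

# Two-level particle with energies 0 and e0 (illustrative values); by
# independence, N such particles would simply multiply ln Z by N.
e0, beta = 1.3, 0.8

def logZ(b):
    # Z(beta) = 1 + exp(-beta * e0): sum of the Boltzmann weights of the two states
    return math.log(1 + math.exp(-b * e0))

# Average internal energy as the negative derivative of ln Z (numerically):
h = 1e-6
avg_energy_from_derivative = -(logZ(beta + h) - logZ(beta - h)) / (2 * h)

# The same quantity as a direct expectation under the distribution (5):
Z = 1 + math.exp(-beta * e0)
avg_energy_direct = 0 * (1 / Z) + e0 * math.exp(-beta * e0) / Z

assert abs(avg_energy_from_derivative - avg_energy_direct) < 1e-5
```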
To see why the concavity assumption makes sense, note that, at least when $P(s)$ is a product distribution (namely, when $\mathcal{E}(s) = \sum_i \mathcal{E}(s_i)$),
\[
\Omega_{N_1+N_2,\delta}\left(\frac{N_1\epsilon_1+N_2\epsilon_2}{N_1+N_2}\right) \ge \Omega_{N_1,\delta}(\epsilon_1)\cdot\Omega_{N_2,\delta}(\epsilon_2),
\]
since for every configuration $s$ where $N_1 \le N$ particles have total energy $N_1\epsilon_1$ and $N_2 = N-N_1$ particles have total energy $N_2\epsilon_2$, the total energy of all $N = N_1+N_2$ particles is obviously $N_1\epsilon_1+N_2\epsilon_2$; but the converse is not true, since there are other ways to split the total energy $N_1\epsilon_1+N_2\epsilon_2$ between the two complementary subsets of particles. Thus, taking the logarithm of both sides, dividing by $(N_1+N_2)$, then taking the limits $N_1, N_2 \to\infty$ such that $N_1/N_2$ tends to a given constant, and finally taking the limit $\delta\to 0$, one readily observes that $\Sigma(\epsilon)$ is concave. An argument of the same spirit can be exercised in somewhat more general situations, e.g., when $P(s)$ has a Markov structure (namely, when the physical system has some nearest-neighbor interactions), though some more caution is required.

Denoting
\[
\psi(\beta) = \lim_{N\to\infty}\frac{1}{N}\ln\sum_{s}\exp\{-\beta\mathcal{E}(s)\},
\]
it is readily seen that
\[
\psi(\beta) = \lim_{\delta\to 0}\lim_{N\to\infty}\frac{1}{N}\ln\sum_{j\ge 0}\Omega_{N,\delta}((j+1/2)\delta)\cdot\exp\{-N\beta j\delta\} = \sup_{\epsilon\ge 0}[\Sigma(\epsilon)-\beta\epsilon], \tag{7}
\]
i.e., $\psi(\cdot)$ and $\Sigma(\cdot)$ are a Legendre-transform pair. Since $\Sigma(\cdot)$ is assumed concave, the inverse transform relation
\[
\Sigma(\epsilon) = \inf_{\beta\ge 0}[\beta\epsilon+\psi(\beta)]
\]
holds true as well, and so the derivatives $\beta(\epsilon) \triangleq d\Sigma/d\epsilon$ and $\epsilon(\beta) = -d\psi/d\beta$ (which are the minimizer of $[\beta\epsilon+\psi(\beta)]$ and the maximizer of $[\Sigma(\epsilon)-\beta\epsilon]$, respectively) are inverses of each other.

$^1$ The free energy is the maximum work that the system can carry out in any process at fixed temperature. The maximum is attained when the process is reversible (slow, quasi-static changes in the system).
It follows then that
\[
\Sigma(\epsilon) = \psi(\beta) - \beta\cdot\frac{d\psi}{d\beta},
\]
but, as is readily seen, $-d\psi/d\beta$ is the per-particle average internal energy, where the expectation operator $E$ is associated with the Boltzmann distribution. This, in turn, is readily verified to agree with the expression of the Shannon entropy rate $H(S)$ of the distribution $P(s)$:
\[
H(S) = \lim_{N\to\infty}\frac{1}{N}E\ln\frac{1}{P(S)} = \lim_{N\to\infty}\frac{1}{N}E\ln\frac{Z(\beta)}{\exp\{-\beta\mathcal{E}(S)\}} = \psi(\beta)+\beta\cdot\lim_{N\to\infty}\frac{1}{N}E\{\mathcal{E}(S)\}. \tag{8}
\]
Thus, $\Sigma(\epsilon) = H(S)$ whenever $\beta$ and $\epsilon$ are related by $\beta = \beta(\epsilon)$, or, equivalently, $\epsilon = \epsilon(\beta)$. For a given $\beta$, the Boltzmann–Gibbs distribution has a sharp peak (for large $N$) at the level of $\epsilon(\beta)$. We then say that this value of $\epsilon$ is the dominant energy level: not only is it the average energy, there is also a strong concentration of the probability about this value as $N$ grows without bound.

The second law of thermodynamics asserts that in an isolated system (which does not exchange energy with its environment), the total entropy cannot decrease, and hence, in equilibrium, it reaches its maximum. Now, suppose that we have a physical system that is composed of two subsystems, one having $N$ particles with microstates $\{s\}$ and Hamiltonian $\mathcal{E}_1(s)$, and the other having $n$ particles with microstates $\{s'\}$ and Hamiltonian $\mathcal{E}_2(s')$. Let us suppose that these two subsystems are in thermal contact and that they both reside in a very large environment (heat bath) having a fixed temperature $T = 1/(k\beta)$. The two subsystems are allowed to exchange energy with each other as well as with the heat bath. How is the total energy of the system split between the two subsystems? An example of two such subsystems was described in the first few paragraphs of the Introduction.
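Relation (8) can be verified directly for the two-level gas of the Introduction. In the sketch below (illustrative $e_0$ and $\beta$, with $k = 1$ and entropies in nats), the per-particle Shannon entropy equals $\psi(\beta)+\beta E\{\mathcal{E}(S)\}$ to machine precision:

```python
import math

e0, beta = 0.9, 1.7        # illustrative two-level system (energies 0 and e0)

Z = 1 + math.exp(-beta * e0)   # per-particle partition function
psi = math.log(Z)              # psi(beta) = (1/N) ln Z for i.i.d. particles
p1 = math.exp(-beta * e0) / Z  # Pr{particle is in the excited state}
avg_E = e0 * p1                # per-particle average internal energy

# Per-particle Shannon entropy (in nats) of the Boltzmann-Gibbs distribution:
shannon = -p1 * math.log(p1) - (1 - p1) * math.log(1 - p1)

# Eq. (8): H(S) = psi(beta) + beta * E{E(S)}
assert abs(shannon - (psi + beta * avg_E)) < 1e-12
```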
The partition function of the composite system is given by
\[
Z(\beta) = \sum_{s,s'}\exp\{-\beta[\mathcal{E}_1(s)+\mathcal{E}_2(s')]\},
\]
and so the dominant energy level, as we saw before, is the one that achieves the associated normalized log-partition function $\psi(\beta)$, i.e., the solution $\epsilon_0$ to the equation $d\Sigma(\epsilon)/d\epsilon = \beta$, where $\Sigma(\epsilon)$ is the entropy of the combined system. Let us confine attention now to the set of combined microstates $\{(s,s')\}$ of the composite system which have energy $(N+n)\epsilon_0$. More precisely, assume that the ratio $n/N = \lambda$ is held fixed, so that $(N+n)\epsilon_0 = N(1+\lambda)\epsilon_0$, and let us define
\[
\Omega_{N,n,\delta}(\epsilon_0) = \left|\left\{(s,s'):\ N(1+\lambda)(\epsilon_0-\delta/2) \le \mathcal{E}_1(s)+\mathcal{E}_2(s') \le N(1+\lambda)(\epsilon_0+\delta/2)\right\}\right|.
\]
Clearly, every configuration $(s,s')$ with energy about $N(1+\lambda)\epsilon_0$ corresponds to some allocation of part of the energy in one subsystem and the remaining energy in the other. Thus, defining $\Omega^{(1)}_{N,\delta}(\epsilon)$ and $\Omega^{(2)}_{n,\delta}(\epsilon)$ as the enumerators of microstates with energy about $\epsilon$ in each one of the two subsystems individually (as defined in eq. (6)), we have, for $\hat{\delta} = \delta(1+\lambda)$:
\[
\Omega_{N,n,\hat{\delta}}(\epsilon_0) = \sum_{j\ge 0}\Omega^{(1)}_{N,\delta}((j+1/2)\delta)\cdot\Omega^{(2)}_{n,\delta}\left(\frac{(1+\lambda)\epsilon_0-(j+1/2)\delta}{\lambda}\right).
\]
Defining $\Sigma(\epsilon)$ as $\lim_{\delta\to 0}\lim_{N\to\infty}[\ln\Omega_{N,\lambda N,\hat{\delta}}(\epsilon)]/[N(1+\lambda)]$, we find, after taking logarithms of both sides, dividing by $N(1+\lambda)$, letting $N\to\infty$, and then $\delta\to 0$, that $\Sigma(\epsilon_0)$ is given by the weighted supremal convolution$^2$:
\[
\Sigma(\epsilon_0) = \sup_{0\le\epsilon\le(1+\lambda)\epsilon_0}\left[\frac{1}{1+\lambda}\cdot\Sigma_1(\epsilon)+\frac{\lambda}{1+\lambda}\cdot\Sigma_2\left(\frac{(1+\lambda)\epsilon_0-\epsilon}{\lambda}\right)\right].
\]
Assuming that the maximum is achieved by $\epsilon^*\in(0,(1+\lambda)\epsilon_0)$, it is characterized by a vanishing derivative of the expression in the square brackets, i.e., it is the solution to the equation
\[
\Sigma_1'(\epsilon) = \Sigma_2'\left(\frac{(1+\lambda)\epsilon_0-\epsilon}{\lambda}\right), \tag{9}
\]
where $\epsilon$ is the unknown, and where $\Sigma_i'$ is the derivative of $\Sigma_i$, $i = 1,2$.
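A small numerical sketch of the weighted supremal convolution and of the equilibrium condition (9): the two concave functions below are purely illustrative stand-ins (not the physical entropies of the paper), chosen only so that the grid maximizer can be checked against the matching-derivatives condition.

```python
# Illustrative concave stand-ins for Sigma_1 and Sigma_2, with their derivatives:
def S1(e):  return -(e - 1.0) ** 2
def dS1(e): return -2.0 * (e - 1.0)
def S2(e):  return -0.5 * (e - 2.0) ** 2
def dS2(e): return -1.0 * (e - 2.0)

lam, eps0 = 2.0, 1.5        # ratio n/N and total per-particle energy (illustrative)

# Weighted supremal convolution, maximized over a fine grid of energy splits:
M = 450000
best_val, e_star = float("-inf"), None
for i in range(1, M):
    e = (1 + lam) * eps0 * i / M
    val = S1(e) / (1 + lam) + lam / (1 + lam) * S2(((1 + lam) * eps0 - e) / lam)
    if val > best_val:
        best_val, e_star = val, e

# Equilibrium condition (9): the derivatives of the two entropies must match
# at the maximizing split e_star.
assert abs(dS1(e_star) - dS2(((1 + lam) * eps0 - e_star) / lam)) < 1e-3
```

For these particular quadratics the maximizer can also be found in closed form ($\epsilon^* = 0.9$ for the values above), which the grid search reproduces.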
This equation characterizes the thermal equilibrium between the two subsystems and the heat bath. Now, the left-hand side is exactly $\beta$. Thus, $\epsilon^*$, the per-particle energy share of the first subsystem, is the solution to the equation $\Sigma_1'(\epsilon) = \beta$ (or, equivalently, of eq. (9), as said), and the remaining energy per particle, $[(1+\lambda)\epsilon_0-\epsilon^*]/\lambda$, belongs to the other subsystem.

Comment. Returning to the example that opens the Introduction, a simple calculation shows that the dominant energies are
\[
B\cdot E\left\{\sum_{i=1}^N S_i\right\} = NB\tanh\left(\frac{B}{kT}\right)
\]
in the first subsystem, and
\[
e_0\cdot E\left\{\sum_{i=1}^n S_i'\right\} = \frac{ne_0\exp\{-e_0/kT\}}{1+\exp\{-e_0/kT\}}
\]
in the second subsystem. Thus,
\[
\epsilon^* = B\tanh\left(\frac{B}{kT}\right) \quad\mbox{and}\quad \frac{(1+\lambda)\epsilon_0-\epsilon^*}{\lambda} = \frac{e_0\exp\{-e_0/kT\}}{1+\exp\{-e_0/kT\}}.
\]
In the parallel joint source–channel coding problem described in the Introduction, and to be further studied in a more general setting in the sequel, we have $\ln P(s) = (\frac{1}{2}\ln\frac{q}{1-q})\cdot\sum_{i=1}^N s_i + \mbox{const}$ and $\ln W(y|x) = (\ln\frac{p}{1-p})\cdot\sum_{i=1}^n(x_i\oplus y_i) + \mbox{const}$, with $\oplus$ denoting modulo-2 addition. The dominant contribution to $P(s|y)$ comes from those $\{s\}$ for which $\sum_{i=1}^N s_i$ is about its typical value $N[(+1)\cdot q + (-1)\cdot(1-q)] = N(2q-1) = N\tanh(B/kT)$ (in analogy to the energy of the first subsystem above, where we have used the relations (1)–(4)), and for which $\sum_{i=1}^n(x_i\oplus y_i)$ is about $np = n\exp\{-e_0/kT\}/[1+\exp\{-e_0/kT\}]$ (in analogy to the energy of the second subsystem).

$^2$ The supremal convolution between two functions $f(x)$ and $g(x)$ is generally defined as $h(x) = \sup_t[f(x-t)+g(t)]$. The qualifier "weighted", in our context, refers to the fact that both functions, as well as their arguments, are weighted by $1/(1+\lambda)$ and $\lambda/(1+\lambda)$.
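The two dominant-energy formulas in the Comment follow from elementary expectations under the Boltzmann weights; the sketch below (arbitrary illustrative $B$, $e_0$, $T$, with $k = 1$) checks both:

```python
import math

k = 1.0                     # Boltzmann constant in arbitrary units (illustrative)
B, e0, T = 0.7, 1.2, 1.5    # illustrative field, level spacing, and temperature
beta = 1 / (k * T)

# Spin subsystem: the energy of s_i = +/-1 is -s_i*B, so P(s_i) ~ exp(beta*B*s_i).
Zs = math.exp(beta * B) + math.exp(-beta * B)
mean_spin = (+1) * math.exp(beta * B) / Zs + (-1) * math.exp(-beta * B) / Zs
assert abs(mean_spin - math.tanh(B / (k * T))) < 1e-12   # E{S_i} = tanh(B/kT)

# Two-level subsystem: states 0 (energy 0) and 1 (energy e0).
Zp = 1 + math.exp(-beta * e0)
occupancy = math.exp(-beta * e0) / Zp                    # E{S'_i}
expected = e0 * math.exp(-e0 / (k * T)) / (1 + math.exp(-e0 / (k * T)))
assert abs(e0 * occupancy - expected) < 1e-12            # per-particle energy share
```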
Notice that these two typical contributions to the log-posterior probability agree also with the corresponding typical contributions, $\ln P(s_0)$ and $\ln W(y|x(s_0))$, of the real message $s_0$ that was actually transmitted. This is true regardless of whether the communication is reliable or not, i.e., it continues to hold no matter whether the entropy rate of the source is smaller or larger than $\lambda$ times the mutual information between the input and the output of the channel.

Returning to the general discussion above, note that the same considerations continue to hold even if one of the systems, say the second one, has an effective negative entropy, that is, $\Omega^{(2)}_{n,\delta}([(1+\lambda)\epsilon_0-\epsilon^*]/\lambda) < 1$, which means that for each microstate $s$ of the first subsystem with per-particle energy $\epsilon^*$, only a fraction of the compatible combined microstates $\{(s,s')\}$ have normalized energy $\epsilon_0$. Of course, $\Omega_{N,n,\hat{\delta}}(\epsilon_0)$ must be larger than 1. In the sequel, we shall see that in the joint source–channel coding problem, the source and the channel constitute a mechanism which is highly parallel to that of equilibrium energy-sharing between two subsystems in a heat bath, where the subsystem corresponding to the channel has a negative effective thermodynamic entropy in this sense.

We should comment that in order to determine the energy sharing between the two subsystems in the above discussion, it was not necessary to consider how they thermally interact with each other and to go through the weighted supremal convolution between their entropies, as we did. We could have determined these energies simply by considering the equilibrium of each one of the subsystems individually with the heat bath,$^3$ thus equating the derivative of each one of the entropy functions to $\beta$.
Nonetheless, we have deliberately chosen to present the supremal convolution because, in the sequel, it is this relation that will lead to the derivation of the mutual information in the joint source–channel coding problem.

4 Formulation, Main Results and Discussion

Consider an information source, $S_1, S_2,\ldots$, whose symbols $\{S_i\}$ take on values in a finite alphabet $\mathcal{S}$. The source is characterized by a sequence of probability distributions, $P(s)$, $s \triangleq (s_1,\ldots,s_N)$, $N = 1,2,\ldots$. Consider next a discrete memoryless channel (DMC), which is characterized by a matrix of single-letter transition probabilities $\{W(y|x),\ x\in\mathcal{X},\ y\in\mathcal{Y}\}$, where $\mathcal{X}$ and $\mathcal{Y}$ are finite alphabets. The operation rate of the channel relative to the source is $\lambda$ channel uses per source symbol, which means that while the source produces an $N$-vector $s = (s_1,\ldots,s_N)\in\mathcal{S}^N$, the channel conveys $n$ channel symbols; namely, it receives an $n$-vector $x = (x_1,\ldots,x_n)\in\mathcal{X}^n$ and outputs an $n$-vector $y = (y_1,\ldots,y_n)\in\mathcal{Y}^n$, where $n = \lambda N$. The parameter $\lambda$ is referred to as the bandwidth expansion factor of the channel relative to the source.

For the sake of convenience in drawing the analogy with statistical mechanics, we will think of both the source and the channel as Boltzmann distributions with certain Hamiltonians at a certain common inverse temperature $\beta$; that is, $P(s)$ is proportional to $\exp\{-\beta\mathcal{E}_S(s)\}$ and $W(y|x)$ is proportional to $\exp\{-\beta\mathcal{E}_C(x,y)\}$, where $\mathcal{E}_S(\cdot)$ and $\mathcal{E}_C(\cdot,\cdot)$ are the Hamiltonians of the source and the channel, respectively.

$^3$ When doing so, the other system then becomes part of the heat bath anyway.
For a pair of $n$-vectors $x$ and $y$, we will denote $W(y|x) = \prod_{i=1}^n W(y_i|x_i)$, and keep in mind that it is proportional to $\exp\{-\beta\mathcal{E}_C(x,y)\}$, where $\mathcal{E}_C(x,y) \triangleq \sum_{i=1}^n\mathcal{E}_C(x_i,y_i)$. Clearly, there is no loss of generality in this representation of the source and the channel, since there is always at least one way of doing this: for example, one can simply take $\beta = 1$, $\mathcal{E}_S(s) = -\ln P(s)$, and $\mathcal{E}_C(x,y) = -\ln W(y|x)$. The point is, however, that by doing this we have slightly extended the scope: instead of one source and one channel, we are actually considering a family of sources and channels, both indexed by a common parameter $\beta$ that controls the degree of uniformity or skewedness of the distribution.

An $(N,n)$ joint source–channel code, for the above-defined source and channel, is a mapping from the set $\mathcal{S}^N$ to $\mathcal{X}^n$. Every source string $s$ is mapped into a channel input vector $x \triangleq (x_1,\ldots,x_n)$, and when we wish to emphasize the dependence of $x$ on $s$, we denote it as $x(s)$. The code is assumed to be selected at random, where for each $s$, the codeword $x(s)$ is drawn under a distribution$^4$ $M(x)$, independently$^5$ of all other codewords. The receiver estimates $s$ by applying a certain function to the received channel output sequence $y \triangleq (y_1,\ldots,y_n)$, i.e., it implements a function from $\mathcal{Y}^n$ to $\mathcal{S}^N$, which will be denoted by $\hat{s} = \hat{s}(y)$. In some applications, the receiver (or the observer) may not necessarily attempt full-fledged decoding of the message, but may opt to merely estimate a certain function of the source sequence (e.g., some statistic, such as its composition). Our study of the mutual information induced by the joint source–channel code will be strongly based on the posterior distribution, which, for a given (randomly selected) code, is defined as
\[
P_\beta(s|y) = \frac{P(s)W(y|x(s))}{\sum_{s'\in\mathcal{S}^N}P(s')W(y|x(s'))} = \frac{\exp\{-\beta[\mathcal{E}_S(s)+\mathcal{E}_C(x(s),y)]\}}{\sum_{s'}\exp\{-\beta[\mathcal{E}_S(s')+\mathcal{E}_C(x(s'),y)]\}}. \tag{10}
\]
On a technical note, observe that since the posterior distribution is given by a ratio, this allows slightly more freedom in the definition of the Hamiltonians $\mathcal{E}_S$ and $\mathcal{E}_C$, as certain common constants in the numerator and the denominator may cancel each other. For example, if the source is binary and memoryless, as described in the example given in the Introduction, then $P(s)$ is proportional to $\exp\{-(\frac{1}{2}\ln\frac{1-q}{q})\sum_{i=1}^N s_i\}$, and so one can define $\mathcal{E}_S(s)$ to be proportional to $\sum_{i=1}^N s_i$, where the factor $\frac{1}{2}\ln\frac{1-q}{q}$ can be split between a part that is absorbed in the Hamiltonian itself and a part that is attributed to the inverse temperature parameter $\beta$.

$^4$ A more general model would allow a distribution $M$ that depends on $s$. For example, if $\mathcal{S}^N$ can be naturally divided into type classes (as in the case of memoryless sources, Markov sources, etc.), then it is plausible to let $M$ depend on the type class of $s$. However, among all sequences in $\mathcal{S}^N$, the important ones are those that are typical to the source (the others can be ignored in the large-$N$ limit), and these are equiprobable on the exponential scale; so the distribution $M$ can be taken to be the same for all of them without loss of asymptotic optimality.

$^5$ The independence assumption is made here mostly for the sake of simplicity. It can be somewhat relaxed, as long as the concentration properties specified below continue to hold.
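The remark about a family of sources indexed by $\beta$ amounts to exponential tilting of the original distribution; here is a minimal sketch (the three-symbol distribution is purely illustrative):

```python
import math

P = [0.5, 0.3, 0.2]                    # an arbitrary source distribution (illustrative)
E_S = [-math.log(pi) for pi in P]      # Hamiltonian choice E_S(s) = -ln P(s)

def tilted(beta):
    """The family of sources P_beta(s) proportional to exp{-beta * E_S(s)}."""
    w = [math.exp(-beta * e) for e in E_S]
    Z = sum(w)                         # source partition function Z_S(beta)
    return [wi / Z for wi in w]

# beta = 1 recovers the original source exactly:
assert all(abs(a - b) < 1e-9 for a, b in zip(tilted(1.0), P))
# Small beta flattens the distribution; large beta skews it further:
assert max(tilted(0.01)) < max(P)
assert max(tilted(5.0)) > max(P)
```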
A similar comment applies to the channel, but here some more caution is required since, in general, the constant of proportionality that relates $W(y|x)$ and $\exp\{-\beta\mathcal{E}_C(x,y)\}$ may depend on $x$, unless the code is of constant composition and/or the channel is symmetric in the sense that $\sum_y\exp\{-\beta\mathcal{E}_C(x,y)\}$ is independent of $x$ for all $\beta$ (which is the case, e.g., in modulo-additive channels, like the BSC). If neither of these conditions holds (i.e., if the code is not of constant composition and the channel is not symmetric), we keep the choice of $\mathcal{E}_C(x,y)$ as being proportional to $-\ln W(y|x)$.

For a given choice of the Hamiltonians $\mathcal{E}_S$ and $\mathcal{E}_C$, in view of these considerations, let us define the joint source–channel partition function as the denominator of the posterior distribution, i.e.,
\[
Z(\beta|y) \triangleq \sum_{s\in\mathcal{S}^N}\exp\{-\beta[\mathcal{E}_S(s)+\mathcal{E}_C(x(s),y)]\}.
\]
In the course of studying the properties of a typical realization of the joint source–channel partition function, pertaining to a given code ensemble, we will make a few observations, which were already mentioned briefly in the Introduction:

1. Similarly to results that have already been observed in the context of the pure channel coding problem [12], the statistical–mechanical system pertaining to $Z(\beta|y)$ undergoes a phase transition, which corresponds, in the realm of coded systems, to the transition between reliable and unreliable communication, namely, the point at which the entropy rate of the source exceeds the mutual information between the input and the output of the channel.
2. When identifying the set of source vectors $\{s\}$ that dominate $Z(\beta|y)$ (i.e., those that contribute most to $Z(\beta|y)$) above the phase transition temperature, one observes a situation that parallels that of thermal equilibrium between two physical subsystems, one corresponding to the source and the other to the channel. To be more specific, if $\mathcal{E}(s,y) = \mathcal{E}_S(s)+\mathcal{E}_C(x(s),y)$ is thought of as the total 'energy' shared by the source and the code/channel, then the dominant messages $\{s\}$ split this total average energy between the source and the channel components in a way that corresponds to thermal equilibrium between the two parallel physical subsystems.

3. The balance between the thermodynamical entropies of the two physical subsystems that lie in equilibrium, as described in item no. 2, is identified with the simple relation between the corresponding Shannon entropies of the source, namely, the unconditional source entropy and the conditional entropy given the channel output, whose difference is the mutual information between the source and the channel output. This gives rise to a simple formula for the mutual information rate induced by a typical code in the ensemble.

In analogy to the definitions and the assumptions outlined in Section 3, we now make a few definitions and assumptions concerning the joint source–channel coding model.

A.1 Defining
\[
\Omega^{(S)}_{N,\delta}(\epsilon) \triangleq \left|\left\{s\in\mathcal{S}^N:\ (\epsilon-\delta/2)N \le \mathcal{E}_S(s) \le (\epsilon+\delta/2)N\right\}\right|,
\]
our first assumption is that
\[
\Sigma_S(\epsilon) \triangleq \lim_{\delta\to 0}\lim_{N\to\infty}\frac{\ln\Omega^{(S)}_{N,\delta}(\epsilon)}{N}
\]
exists and that $\Sigma_S(\epsilon)$ is a differentiable concave function.

A.2 For a given $y$, define
\[
\phi_{n,\delta}(\epsilon|y) \triangleq \frac{1}{n}\ln\Pr\{n(\epsilon-\delta/2) \le \mathcal{E}_C(X,y) \le n(\epsilon+\delta/2)\},
\]
where the random vector $X$ is drawn under the random coding distribution $M$, independently of $y$.
Then, our second assumption is that for all $\epsilon \ge 0$, $\lim_{\delta\to 0}\lim_{n\to\infty} E\{\phi_{n,\delta}(\epsilon|Y)\}$ tends uniformly to a differentiable function $\phi(\epsilon)$, where the expectation $E$ is w.r.t. both the random selection of the codebook and the random actions of the source and the channel. Moreover, we assume that $\lim_{\delta\to 0}\lim_{n\to\infty}\phi_{n,\delta}(\epsilon|Y)$ tends to $\phi(\epsilon)$ uniformly almost surely.

A.3 Let $\Sigma_S(\epsilon)$ and $\phi(\epsilon)$ be defined as above, and let $\Sigma_0(\epsilon)$ be defined by the weighted supremal convolution
$$\Sigma_0(\epsilon) \triangleq \max_{0 \le \epsilon' \le (1+\lambda)\epsilon} \left[\frac{\Sigma_S(\epsilon')}{1+\lambda} + \frac{\lambda}{1+\lambda}\,\phi\!\left(\frac{(1+\lambda)\epsilon - \epsilon'}{\lambda}\right)\right].$$
Our third assumption is that $\Sigma_0(\epsilon)$ is a concave function throughout the range of $\epsilon$ where it is non-negative.

We now define
$$\Sigma(\epsilon) = \begin{cases} \Sigma_0(\epsilon) & \Sigma_0(\epsilon) \ge 0 \\ -\infty & \Sigma_0(\epsilon) < 0 \end{cases}$$
As we shall see below, while $\Sigma_0(\epsilon)$ gives the logarithm of the expected number of configurations with total energy $\epsilon$, the function $\Sigma(\epsilon)$ gives the number of such configurations for a typical code in the ensemble. To see this, note that if $\Sigma_S(\epsilon') + \lambda\phi([(1+\lambda)\epsilon - \epsilon']/\lambda) < 0$ for all $\epsilon'$, then for every $\epsilon'$ the product of the number of configurations $\{s\}$ for which $E_S(s)$ is about $N\epsilon'$ and the probability that a randomly chosen codeword would provide the complementary energy $[(1+\lambda)\epsilon - \epsilon']/\lambda$ is less than one. This means that there is a very low probability of finding any configuration with total energy $\epsilon$, and so $\Sigma(\epsilon)$, which is the normalized logarithm of the number of such configurations (i.e., the thermodynamical entropy of the combined system), is equal to $-\infty$ for a typical code realization. Note that the concavity of $\Sigma_0(\epsilon)$ across the range where it is non-negative implies that $\Sigma(\epsilon)$ is concave as well.

In analogy to the discussion of the previous section, let us define
$$Z_S(\beta) \triangleq \sum_s \exp\{-\beta E_S(s)\}.$$
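To make the weighted supremal convolution in A.3 concrete, the following sketch (our own illustration, not from the paper) evaluates $\Sigma_0(\epsilon)$ by brute-force grid search for a hypothetical binary setting: $\Sigma_S(\epsilon') = h_2(\epsilon')$ (a binary memoryless source with $E_S$ equal to the Hamming weight) and $\phi(\epsilon) = h_2(\epsilon) - \ln 2$ (fair-coin codewords over a BSC, as in Example 1 below), with $\lambda = 1$. In this symmetric case the stationarity condition gives $\epsilon' = \epsilon$, so $\Sigma_0(\epsilon) = h_2(\epsilon) - (\ln 2)/2$.

```python
import math

def h2(x):
    """Binary entropy in nats; h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def sigma0(eps, lam=1.0, grid=20000):
    """Weighted supremal convolution
    max_{0 <= e' <= (1+lam)*eps} [Sigma_S(e')/(1+lam) + lam/(1+lam)*phi(((1+lam)*eps - e')/lam)]
    for Sigma_S = h2 and phi(e) = h2(e) - ln 2 (fair-coin BSC ensemble)."""
    best = -float("inf")
    hi = (1 + lam) * eps
    for k in range(grid + 1):
        ep = hi * k / grid
        arg = ((1 + lam) * eps - ep) / lam
        if 0.0 <= arg <= 1.0 and 0.0 <= ep <= 1.0:
            val = (h2(ep) + lam * (h2(arg) - math.log(2))) / (1 + lam)
            best = max(best, val)
    return best

# For lam = 1 the maximizing e' equals eps, so Sigma_0(eps) = h2(eps) - (ln 2)/2:
for eps in (0.1, 0.2, 0.3, 0.4):
    assert abs(sigma0(eps) - (h2(eps) - math.log(2) / 2)) < 1e-4
```

The grid search also makes it easy to see where $\Sigma_0(\epsilon)$ turns negative, i.e., where $\Sigma(\epsilon) = -\infty$ for a typical code, in accordance with assumption A.3.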
Then, $\psi_S(\beta) \triangleq \lim_{N\to\infty}\frac{1}{N}\ln Z_S(\beta)$ and $\Sigma_S(\epsilon)$ are a Legendre-transform pair. Since $\Sigma_S(\cdot)$ is assumed concave, the inverse transform relation
$$\Sigma_S(\epsilon) = \inf_{\beta \ge 0}[\beta\epsilon + \psi_S(\beta)]$$
holds true as well, and so the derivatives $\beta_S(\epsilon) \triangleq d\Sigma_S/d\epsilon$ and $\epsilon_S(\beta) = -d\psi_S/d\beta$ are inverses of each other. It follows then that the Shannon entropy rate $H(S)$ of $P(s)$ (which depends on $\beta$) agrees with $\Sigma_S(\epsilon)$ whenever $\beta$ and $\epsilon$ are related by $\beta = \beta_S(\epsilon)$, or equivalently, $\epsilon = \epsilon_S(\beta)$.

Referring to the partition function $Z(\beta|y)$, let us distinguish between the contribution of the true sequence $s_0$ that the source actually emitted, i.e.,
$$Z_c(\beta|y) = \exp\{-\beta[E_S(s_0) + E_C(x(s_0),y)]\},$$
and the contribution of all other (erroneous) source vectors,
$$Z_e(\beta|y) = \sum_{s \ne s_0} \exp\{-\beta[E_S(s) + E_C(x(s),y)]\}.$$
Now, $\ln Z_c(\beta|y)$ is typically around $-[E\{E_S(S)\} + E\{E_C(X(S),Y)\}]$. As for $Z_e(\beta|y)$, let us define
$$\Omega_{N,\delta}(\epsilon|y) = \{s \ne s_0 :\ N(1+\lambda)(\epsilon - \delta/2) \le E_S(s) + E_C(x(s),y) \le N(1+\lambda)(\epsilon + \delta/2)\}.$$
Then, similarly as in the previous section, one readily observes that for $\delta' = \delta(1+\lambda)$, we have:
$$\Omega_{N,\delta'}(\epsilon|y) = \sum_{j\ge 0} \Omega_{N,\delta}^{(S)}((j+1/2)\delta) \times \Pr\{N(1+\lambda)(\epsilon - \delta'/2) - N(j+1)\delta \le E_C(X,y) \le N(1+\lambda)(\epsilon + \delta'/2) - Nj\delta\}$$
$$= \sum_{j\ge 0} \Omega_{N,\delta}^{(S)}((j+1/2)\delta) \exp\{n\phi_{n,\delta}([(1+\lambda)\epsilon - (j+1/2)\delta]/\lambda\,|\,y)\}. \qquad (11)$$
Taking logarithms of both sides, dividing by $N + n = N(1+\lambda)$, letting $N$ grow without bound, and finally letting $\delta$ go to zero, we obtain^6 that:
$$\lim_{N\to\infty} \frac{\ln \hat\Omega_{N,\delta'}(\epsilon|Y)}{N(1+\lambda)} \overset{\mathrm{a.s.}}{=} \begin{cases} \Sigma_0(\epsilon) & \Sigma_0(\epsilon) \ge 0 \\ -\infty & \Sigma_0(\epsilon) < 0 \end{cases}$$
but the r.h.s. is exactly $\Sigma(\epsilon)$. Thus, as explained earlier, $\Sigma(\epsilon)$ is the thermodynamical entropy associated with the combined source-channel system.
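As a concrete sanity check of the Legendre-transform pair (our own illustration, assuming a binary memoryless source with $E_S(s)$ equal to the Hamming weight of $s$, so that $Z_S(\beta) = (1+e^{-\beta})^N$ and $\psi_S(\beta) = \ln(1+e^{-\beta})$), the inverse relation $\Sigma_S(\epsilon) = \inf_{\beta\ge 0}[\beta\epsilon + \psi_S(\beta)]$ recovers the binary entropy $h_2(\epsilon)$, and $\beta_S(\epsilon) = \ln[(1-\epsilon)/\epsilon]$ and $\epsilon_S(\beta) = 1/(1+e^\beta)$ are indeed inverses:

```python
import math

def h2(x):
    return 0.0 if x in (0.0, 1.0) else -x * math.log(x) - (1 - x) * math.log(1 - x)

def psi_S(beta):
    # ln-partition-function rate of a binary source with E_S = Hamming weight
    return math.log(1.0 + math.exp(-beta))

def sigma_S(eps, bmax=50.0, grid=100000):
    # Numerical inf over beta >= 0 of beta*eps + psi_S(beta)
    return min(b * eps + psi_S(b) for b in (bmax * k / grid for k in range(grid + 1)))

eps = 0.11
# The inverse Legendre transform recovers the source entropy rate:
assert abs(sigma_S(eps) - h2(eps)) < 1e-4

# beta_S and eps_S are inverse functions of each other:
beta = math.log((1 - eps) / eps)          # beta_S(eps)
assert abs(1.0 / (1.0 + math.exp(beta)) - eps) < 1e-12   # eps_S(beta_S(eps)) = eps
```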
The concavity of $\Sigma(\epsilon)$ then implies that it agrees (after the appropriate scaling) with the conditional Shannon entropy rate of the source given the channel output, $H(S|Y)$, i.e., the entropy rate pertaining to the sequence of conditional probabilities $P(s|y)$ defined above. For a given $\epsilon$ in the range where $\Sigma(\epsilon)$ is finite, let $\epsilon' = \epsilon^*$ achieve the supremum defining $\Sigma(\epsilon)$.

(Footnote 6: At this point, we are using the fact [12],[11] that for an ensemble of independently selected codewords, the number of codewords that contribute energy $E_C(X,y) \approx n[(1+\lambda)\epsilon - \epsilon']/\lambda$ is, with very high probability, zero if $\Sigma_S(\epsilon') + \lambda\phi([(1+\lambda)\epsilon - \epsilon']/\lambda) < 0$, and around $\exp\{N[\Sigma_S(\epsilon') + \lambda\phi([(1+\lambda)\epsilon - \epsilon']/\lambda)]\}$ if $\Sigma_S(\epsilon') + \lambda\phi([(1+\lambda)\epsilon - \epsilon']/\lambda) > 0$. The assumption of independent codewords can be relaxed as long as this concentration property continues to hold.)

At this point, one should distinguish between two situations. In the first situation, $\epsilon$ is on the boundary of the range where $\Sigma(\epsilon)$ is finite and positive, namely, $\Sigma(\epsilon) = 0$. In this case, the partition function $Z(\beta|y)$ (and hence also $P_\beta(s|y)$) is dominated by a subexponential number of configurations $\{s\}$, and so the entropy rate $H(S|Y) = 0$, which means that the system is frozen in its glassy phase (cf. [12],[11] and references therein). In the second situation, $\epsilon$ is an internal point of the range where $\Sigma(\epsilon) > 0$, where we will also assume that $\epsilon^* \in (0, (1+\lambda)\epsilon)$; this is the paramagnetic phase (or the disordered phase) of $Z_e(\beta|y)$. Then, the derivative of the function being maximized vanishes, i.e.,
$$\left.\frac{d\Sigma_S(\epsilon')}{d\epsilon'}\right|_{\epsilon'=\epsilon^*} - \left.\frac{d\phi(\epsilon'')}{d\epsilon''}\right|_{\epsilon''=[(1+\lambda)\epsilon-\epsilon^*]/\lambda} = 0,$$
or equivalently,
$$\Sigma_S'(\epsilon^*) = \phi'\!\left(\frac{(1+\lambda)\epsilon - \epsilon^*}{\lambda}\right), \qquad (12)$$
where $\Sigma_S'$ and $\phi'$ denote the derivatives of $\Sigma_S$ and $\phi$, respectively. As before, eq.
(12) gives rise to thermal equilibrium between the physical system corresponding to the source and the one that pertains to the code/channel. Next, observe that the left-hand side is exactly $\beta_S(\epsilon^*)$. Thus,
$$\beta_S(\epsilon^*) = \phi'\!\left(\frac{(1+\lambda)\epsilon - \epsilon^*}{\lambda}\right),$$
which means that given the value of the total per-particle energy $\epsilon$, we can find how the dominant codewords split the energy between the source and the channel: we can solve the above equation with the given $\epsilon$, with $\epsilon^*$ as an unknown. Then, the source contribution will be $\epsilon^*$ and the channel contribution will be $[(1+\lambda)\epsilon - \epsilon^*]/\lambda$.

The discussion above holds for every value of $\epsilon$ for which $\Sigma(\epsilon) > 0$. The dominant value of $\epsilon$ is $\epsilon_0$, the one that achieves $E\{\ln Z(\beta|Y)\}/[N(1+\lambda)]$ for large $N$, in other words, the achiever of:
$$\psi(\beta) = \lim_{N\to\infty} \frac{E\{\ln Z(\beta|Y)\}}{N(1+\lambda)} = \sup_{\epsilon \ge 0}[\Sigma(\epsilon) - \beta\epsilon].$$
Thus, the dominant value of $\epsilon$, which is relevant for the previous paragraph, is $\epsilon_0$, which in turn depends only on $\beta$. But since $\Sigma$ is assumed concave, $\psi$ and $\Sigma$ are also a Legendre-transform pair, and so $\epsilon_0$ and $\beta$ are related via the derivatives, $\epsilon_0 = \epsilon(\beta) \triangleq -\psi'(\beta)$ and $\beta = \beta(\epsilon) = \Sigma'(\epsilon)$, where again, primes denote derivatives. In summary, given $\beta$, $\epsilon_0 = \epsilon(\beta)$ and $\epsilon^* = \epsilon_S(\beta)$. Thus, $\beta_S(\epsilon^*)$ in the equilibrium equation is $\beta_S(\epsilon_S(\beta)) \equiv \beta$, since $\beta_S(\cdot)$ and $\epsilon_S(\cdot)$ are inverses of one another. Thus, the equilibrium equation applied to the dominant energy $\epsilon_0$ becomes
$$\beta = \Sigma_S'(\epsilon^*) = \phi'\!\left(\frac{(1+\lambda)\epsilon_0 - \epsilon^*}{\lambda}\right).$$
If, in addition, $\phi$ is concave, then $\phi'$ is monotone and thus has an inverse, which is given by $-\zeta'$, the negative of the derivative of the Legendre transform $\zeta(t) = \sup_\epsilon[\phi(\epsilon) - \epsilon t]$, and then
$$\frac{(1+\lambda)\epsilon_0 - \epsilon^*}{\lambda} = -\zeta'(\beta).$$
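To illustrate how the equilibrium equation pins down the energy split, here is a small numerical sketch (an illustration under assumed characteristics, not from the paper: a binary source with $\beta_S(\epsilon) = \ln[(1-\epsilon)/\epsilon]$ and a fair-coin BSC ensemble with $\phi'(\epsilon) = \ln[(1-\epsilon)/\epsilon]$, as in Example 1 below). Solving $\beta_S(\epsilon^*) = \phi'([(1+\lambda)\epsilon_0 - \epsilon^*]/\lambda)$ by bisection shows that, since the two subsystems happen to have identical temperature-energy characteristics here, the per-particle energy splits equally: $\epsilon^* = \epsilon_0$ for every $\lambda$.

```python
import math

def beta_S(e):      # dSigma_S/deps for the binary source, Sigma_S = h2
    return math.log((1 - e) / e)

def phi_prime(e):   # dphi/deps for the fair-coin BSC ensemble, phi = h2 - ln 2
    return math.log((1 - e) / e)

def equilibrium_split(eps0, lam, tol=1e-12):
    """Bisect on eps* in (0, (1+lam)*eps0) for
    beta_S(eps*) = phi_prime(((1+lam)*eps0 - eps*)/lam).
    The bracketed function is strictly decreasing, so the root is unique."""
    lo = 1e-9
    hi = min(1.0, (1 + lam) * eps0) - 1e-9
    f = lambda e: beta_S(e) - phi_prime(((1 + lam) * eps0 - e) / lam)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# With identical characteristics on both sides, the split is symmetric:
for lam in (0.5, 1.0, 2.0):
    assert abs(equilibrium_split(0.1, lam) - 0.1) < 1e-6
```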
Now, observe that if, for a typical $y$, either $Z_c(\beta|y)$ dominates $Z_e(\beta|y)$, or $Z_e(\beta|y)$ is in its frozen phase, then $H(S|Y)$ vanishes, and so the mutual information rate $\lim_{N\to\infty} I(S;Y)/N = H(S)$. For the complementary case, our main result is the following:

Theorem 1. Let $E\{I(S;Y)\}$ denote the expected mutual information, where the expectation is taken w.r.t. the ensemble of joint source-channel codes. Then, under Assumptions A.1-A.3:
$$\lim_{N\to\infty} \frac{E\{I(S;Y)\}}{N} = -\lambda\phi(-\zeta'(\beta)),$$
provided that $\Sigma(\epsilon_0) > 0$.

Remark: From the above discussion, it is apparent that this result applies also to the almost-sure limit of $I(S;Y)/N$ w.r.t. the code ensemble.

Proof.
$$\lim_{N\to\infty} \frac{E\{I(S;Y)\}}{N} = H(S) - H(S|Y) = \Sigma_S(\epsilon^*) - (1+\lambda)\Sigma(\epsilon_0) = -\lambda\phi\!\left(\frac{(1+\lambda)\epsilon_0 - \epsilon^*}{\lambda}\right) = -\lambda\phi(-\zeta'(\beta)). \qquad (13)$$

Discussion. We have obtained, then, a very simple formula that depends solely on the random coding distribution. But what is the meaning of $\zeta'(\beta)$? Since $-\phi(\epsilon)$ is, in fact, the large deviations rate function for the event $E_C(X,y) \le n\epsilon$, and $\zeta(t)$ is its Legendre transform, it must be the almost-sure limit of the log-moment generating function, that is,
$$\zeta(t) \overset{\mathrm{a.s.}}{=} \lim_{n\to\infty} \frac{1}{n} \ln \sum_{x\in X^n} M(x) e^{-tE_C(x,Y)},$$
where, as defined above, $M$ is the random coding distribution that governs each one of the independent, randomly selected codewords. Thus,
$$-\zeta'(\beta) \overset{\mathrm{a.s.}}{=} \lim_{n\to\infty} \frac{1}{n} \cdot \frac{\sum_x M(x)\, E_C(x,Y)\, e^{-\beta E_C(x,Y)}}{\sum_x M(x)\, e^{-\beta E_C(x,Y)}}.$$
But the Boltzmann weight $e^{-\beta E_C(x,y)}$ is proportional to $W(y|x)$, and so $-\zeta'(\beta)$ is exactly the asymptotic almost-sure normalized conditional expectation of the energy, $\lim_{n\to\infty} E\{E_C(X,Y)|Y\}/n$, stemming from the action of the channel on the message $x(s_0)$ that was actually transmitted.
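Theorem 1 can be checked numerically in the simplest case (our own sketch, anticipating Example 1 below: fair-coin codewords over a BSC($p$) with the Hamming-distance Hamiltonian and $\beta = \ln[(1-p)/p]$). The per-symbol log-moment generating function is $\zeta(t) = \ln[(1+e^{-t})/2]$; its negative derivative at $\beta$ recovers $p$, the expected per-symbol channel energy, and the theorem then gives $\lambda(\ln 2 - h_2(p))$:

```python
import math

def h2(x):
    return 0.0 if x in (0.0, 1.0) else -x * math.log(x) - (1 - x) * math.log(1 - x)

def zeta(t):
    # Per-symbol log-MGF of -t*E_C for fair-coin codewords, E_C = Hamming distance:
    # (1/2)*e^{-t*0} + (1/2)*e^{-t*1}, independent of y by symmetry.
    return math.log((1.0 + math.exp(-t)) / 2.0)

def minus_zeta_prime(t, h=1e-6):
    # Central-difference numerical derivative of -zeta at t
    return -(zeta(t + h) - zeta(t - h)) / (2 * h)

p, lam = 0.1, 2.0
beta = math.log((1 - p) / p)

# -zeta'(beta) recovers the channel crossover probability p:
assert abs(minus_zeta_prime(beta) - p) < 1e-6

# Theorem 1: lim E{I(S;Y)}/N = -lam*phi(-zeta'(beta)) = lam*(ln 2 - h2(p)):
phi = lambda e: h2(e) - math.log(2)
mi_rate = -lam * phi(minus_zeta_prime(beta))
assert abs(mi_rate - lam * (math.log(2) - h2(p))) < 1e-5
```

In other words, $-\zeta'(\beta)$ is here the expected normalized Hamming distance between the transmitted codeword and the channel output, exactly the conditional energy expectation identified above.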
This quantity, in turn, is assumed to concentrate about its mean, which is $\lim_{n\to\infty} E\{E_C(X,Y)\}/n$. Thus, $Z_e(\beta|y)$ and $P(s|y)$ are dominated by (erroneous) sequences $\{s\}$ whose normalized energy $\epsilon_0$ consists of a source contribution $\epsilon^* = \lim_{N\to\infty} E\{E_S(S)\}/N$ and a channel contribution $[(1+\lambda)\epsilon_0 - \epsilon^*]/\lambda$ that agrees with the normalized energy generated by the noise, i.e., it agrees with $\lim_{n\to\infty} E\{E_C(X,Y)\}/n$, where $X$ and $Y$ are related via the channel $W$. Moreover, this is also the typical energy composition of the true message $s_0$ that was actually transmitted (cf. the definition of $Z_c(\beta|y)$). Thus, the above conclusion holds true regardless of whether the entropy rate of the source is smaller (in which case $s_0$ dominates $Z(\beta|y)$) or larger than $\lambda$ times the normalized mutual information between $X$ and $Y$ (in which case erroneous messages dominate $Z(\beta|y)$ for a typical $y$). We have already seen this behavior in the special case of the binary source and the BSC.

Example 1. Suppose that the channel is a BSC and the codewords are generated by fair coin tossing. In this case, $W(y|x)$ is proportional to $\exp\{-\beta E_C(x,y)\}$, where $E_C(x,y)$ is the Hamming distance and $\beta = \ln\frac{1-p}{p}$. Here, $\phi(\epsilon) = h_2(\epsilon) - \ln 2$, whose derivative is $\phi'(\epsilon) = \ln\frac{1-\epsilon}{\epsilon}$, and so $-\zeta'(\beta)$, the inverse of $\phi'$, is given by $-\zeta'(\beta) = 1/(1+e^\beta) = p$. It follows then that if, in addition, the source is binary and memoryless with parameter $q$, then $P(s|y)$ is dominated by vectors $\{s\}$ whose energy is as described in the Introduction. Also, the normalized mutual information is $-\lambda\phi(-\zeta'(\beta)) = -\lambda\phi(p) = \lambda(\ln 2 - h_2(p))$. Somewhat more generally, let each coordinate $X_i(s)$, $i = 1, \ldots, n$, of each codeword be drawn i.i.d.
with probabilities $\Pr\{X_i(s) = 1\} = 1 - \Pr\{X_i(s) = 0\} = m$. Then, it is easy to show (using the method of types [2]) that
$$-\phi(p) = \min_{\{P_{X|Y}:\ E d(X,Y) \le p\}} [I(X;Y) + D(P_X \| M)], \qquad Y \sim \mathrm{Bernoulli}(m*p),$$
where $m*p$ denotes the binary convolution of $m$ and $p$ (i.e., $m*p = m(1-p) + p(1-m)$), $d(\cdot,\cdot)$ is the Hamming distance, and $P_X$ is the marginal of $X$ induced by $Y$ (which is Bernoulli($m*p$)) and the reversed channel $P_{X|Y}$, which is to be optimized. By eliminating the divergence term, we lower bound $-\phi(p)$ by the rate-distortion function of $Y$ at Hamming distortion $p$, which is $h_2(m*p) - h_2(p)$. On the other hand, returning to the original minimization problem, by selecting $P_{X|Y}$ (instead of minimizing over $P_{X|Y}$) to be the reverse channel induced by $M$ and $W_{Y|X}$ (which is the BSC($p$)), we obtain the same quantity also as an upper bound. Thus, $-\phi(p) = h_2(m*p) - h_2(p)$, and so,
$$\lim_{N\to\infty} \frac{E\{I(S;Y)\}}{N} = \lambda[h_2(m*p) - h_2(p)].$$

Comment: An alternative view of the derivation of the asymptotic mutual information rate between $S$ and $Y$ comes from the following chain of equalities:
$$\lim_{N\to\infty} \frac{E\{I(S;Y)\}}{N} = \lim_{N\to\infty} \frac{1}{N} E\left\{\ln \frac{P(Y|S)}{P(Y)}\right\}$$
$$= \lim_{N\to\infty} \frac{1}{N} E\{\ln \exp\{-\beta E_C(X(S),Y)\}\} - \lim_{N\to\infty} \frac{1}{N} E\left\{\ln\left[\sum_s \frac{1}{Z_S(\beta)} \exp\{-\beta[E_S(s) + E_C(X(s),Y)]\}\right]\right\}$$
$$= -\beta[(1+\lambda)\epsilon_0 - \epsilon^*] + \psi_S(\beta) - \Sigma_S(\epsilon^*) - \lambda\phi\!\left(\frac{(1+\lambda)\epsilon_0 - \epsilon^*}{\lambda}\right) + \beta(1+\lambda)\epsilon_0$$
$$= \beta\epsilon^* + \psi_S(\beta) - \Sigma_S(\epsilon^*) - \lambda\phi\!\left(\frac{(1+\lambda)\epsilon_0 - \epsilon^*}{\lambda}\right)$$
$$= -\lambda\phi\!\left(\frac{(1+\lambda)\epsilon_0 - \epsilon^*}{\lambda}\right), \qquad (14)$$
where we have used the fact that the summation over $s$ is dominated by configurations with per-particle energy $\epsilon_0$, which is allocated as $\epsilon^*$ and $[(1+\lambda)\epsilon_0 - \epsilon^*]/\lambda$.

5 Application to the Wiretap Channel

In this section, we demonstrate how our results apply to the wiretap channel.
Wyner, in his well-known paper on the wiretap channel [14], studied the problem of secure communication across a degraded broadcast channel, without using a secret key, where the legitimate receiver has access to the output of the good channel and the wiretapper receives the output of the bad channel. In that paper, Wyner characterized the optimum trade-off between reliable coding rates and the equivocation at the wiretapper, which was defined in terms of the conditional entropy of the source given the output of the bad channel, observed by the wiretapper.

Consider a DMS $P$ as before, and a cascade of two finite-alphabet DMC's, $W_{Y|X}$ followed immediately by $W_{Z|Y}$, both^7 operating at a relative rate of $\lambda$ channel symbols per source symbol. The source $s \in S^N$ is encoded into a channel input vector $x(s) \in X^n$, $n = \lambda N$, and then transmitted. A code for the wiretap channel should be designed in such a way that, on the one hand, the legitimate receiver can estimate the source $s$ from the output $y \in Y^n$ of the channel $W_{Y|X}$ within an arbitrarily small probability of error, whereas on the other hand, the eavesdropper, which has access to $z \in Z^n$, should be able to learn as little as possible about the source, in the sense that the asymptotic equivocation, $\Delta = \limsup_{N\to\infty} H(S|Z)/N$, should be as large as possible. Wyner showed [14] that the largest achievable value of $\Delta$ is given by $\lambda\Gamma(H(S)/\lambda)$, where
$$\Gamma(R) \triangleq \max_{P_X:\ I(X;Y) \ge R} [I(X;Y) - I(X;Z)].$$
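As a concrete illustration of $\Gamma(R)$ (our own sketch, not from the paper, for a hypothetical degraded binary symmetric cascade: $W_{Y|X} = \mathrm{BSC}(p_1)$ and $W_{Z|Y} = \mathrm{BSC}(p_2)$, so that $X \to Z$ is effectively a BSC($p_1 * p_2$)), we restrict the maximization to Bernoulli($m$) inputs, for which $I(X;Y) = h_2(m*p_1) - h_2(p_1)$ and $I(X;Z) = h_2(m*p_1*p_2) - h_2(p_1*p_2)$, and evaluate $\Gamma(R)$ by grid search; $\Gamma(0)$ then recovers $h_2(p_1*p_2) - h_2(p_1)$, achieved by the uniform input:

```python
import math

def h2(x):
    return 0.0 if x in (0.0, 1.0) else -x * math.log(x) - (1 - x) * math.log(1 - x)

def conv(a, b):            # binary convolution a*b = a(1-b) + b(1-a)
    return a * (1 - b) + b * (1 - a)

def gamma(R, p1, p2, grid=10000):
    """Wyner's Gamma(R) = max_{P_X: I(X;Y) >= R} [I(X;Y) - I(X;Z)], evaluated
    over Bernoulli(m) inputs for the degraded cascade BSC(p1) -> BSC(p2)."""
    q = conv(p1, p2)       # effective crossover of the cascade X -> Z
    best = None
    for k in range(grid + 1):
        m = k / grid
        Ixy = h2(conv(m, p1)) - h2(p1)
        Ixz = h2(conv(m, q)) - h2(q)
        if Ixy >= R:
            d = Ixy - Ixz
            best = d if best is None else max(best, d)
    return best

p1, p2 = 0.1, 0.2
# Gamma(0) equals h2(p1*p2) - h2(p1), attained at the uniform input m = 1/2:
Cs = h2(conv(p1, p2)) - h2(p1)
assert abs(gamma(0.0, p1, p2) - Cs) < 1e-6
```

The restriction to Bernoulli inputs is an assumption of this sketch; for these symmetric channels it loses nothing, since the maximizing input is uniform.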
In particular, the secrecy capacity $C_s$, which is the solution to the equation $R = \Gamma(R)$, is the rate at which the potential secrecy that the wiretap channel can offer is fully exploited: If the entropy of the source, $H(S)/\lambda$, is less than or equal to $C_s$ (supposing that $\lambda$ can be chosen in such a way), then the coding scheme of [14] that asymptotically achieves $C_s$ works as follows. Let $X^*$ be the random variable $X$ that achieves $\Gamma(R)$, for some $R$ in the range $H(S)/\lambda \le R \le C_s$, and let $Y^*$ and $Z^*$ be the corresponding outputs of the two channels. We first compress the source $S$ to its entropy, and then apply channel coding so that the good receiver can still decode reliably for large $N$ and $n$, but the bad one cannot. Now, since $H(S)/\lambda \le C_s$, then by the definitions of $\Gamma(\cdot)$ and $C_s$, $I(X^*;Y^*) \ge H(S)/\lambda + I(X^*;Z^*)$. Accordingly, the channel codebook is composed of about $e^{NH(S)} = e^{nH(S)/\lambda}$ bins (one for each typical source sequence), each of size slightly less than $e^{nI(X^*;Z^*)}$. The codeword actually transmitted is randomly chosen among all codewords of the bin pertaining to the index of the compressed source sequence. Note that the eavesdropper could have decoded the message had it been informed of the bin to which the transmitted codeword belongs, since the rate of the bin, as said, is (slightly) less than $I(X^*;Z^*)$. The idea then is that this information is irrelevant, since it is independent of the source vector, and so it does not help the eavesdropper in learning anything about the source.

(Footnote 7: The notation of the output of the second channel, $Z$, should not be confused with the notation of the partition function, since we do not refer to the partition function in this section.)
Indeed, if we represent the transmitted codeword $x$ as $f(c(s), u)$, where $c(s)$ stands for the bit string of the lossless compression of $s$, indicating the bin index using $nH(S)/\lambda$ nats, and $u$ is an independent random bit string of length $nI(X^*;Z^*)$ nats, then we have the following. On the one hand,
$$H(X|Z) \le H(c(S), U|Z) = H(c(S)|Z) + H(U|Z, c(S)),$$
where the term $H(U|Z, c(S))$ essentially vanishes since, as mentioned above, every bin forms a channel sub-code that is reliably decodable by the eavesdropper. On the other hand,
$$H(X|Z) = H(X) - I(X;Z),$$
thus the equivocation achieved is:
$$H(S|Z) \ge H(c(S)|Z) \sim H(X) - I(X;Z),$$
where the first term on the r.h.s. is essentially $n[H(S)/\lambda + I(X^*;Z^*)]$ and the second term, which is a mutual information induced by a code above capacity, can be evaluated using our above results, provided that the channel code is randomly selected from an ensemble that satisfies our assumptions. For example, if the codewords are chosen i.i.d. according to the distribution of $X^*$, then $I(X;Z)$ is approximately $nI(X^*;Z^*)$, and then full secrecy is achieved, as $H(S|Z)/N$ is essentially equal to $H(S)$. Nonetheless, since the rate of the code, $H(S)/\lambda + I(X^*;Z^*)$, is less than $I(X^*;Y^*)$, the legitimate decoder can still decode reliably. Our results can also be used to assess the secrecy achieved by random coding distributions other than i.i.d. according to $X^*$, while ensuring that the good decoder can still decode reliably.
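The rate accounting in the last two paragraphs can be made explicit with a small numeric sketch (all channel and source parameters below are our own hypothetical choices, not from the paper): for a degraded binary symmetric cascade with uniform-input codewords, the bin structure satisfies both feasibility constraints, and the equivocation bound collapses to full secrecy, $H(S|Z)/N = H(S)$.

```python
import math

def h2(x):
    return 0.0 if x in (0.0, 1.0) else -x * math.log(x) - (1 - x) * math.log(1 - x)

def conv(a, b):                  # binary convolution a*b
    return a * (1 - b) + b * (1 - a)

# Hypothetical setting: BSC(p1) to the legitimate receiver, effective BSC(p1*p2)
# to the eavesdropper, lam channel symbols per source symbol, binary memoryless
# source with parameter q_src.
p1, p2, lam, q_src = 0.1, 0.2, 2.0, 0.11
q = conv(p1, p2)

H_S  = h2(q_src)                 # source entropy rate (nats per source symbol)
I_XY = math.log(2) - h2(p1)      # I(X*;Y*) for the uniform input
I_XZ = math.log(2) - h2(q)       # I(X*;Z*) for the uniform input
Cs   = h2(q) - h2(p1)            # secrecy capacity of this cascade

rate = H_S / lam + I_XZ          # bins of rate H(S)/lam, each of size ~ e^{n I(X*;Z*)}
assert H_S / lam <= Cs           # the source is compressible enough for full secrecy
assert rate <= I_XY              # the legitimate decoder can still decode reliably

# Equivocation accounting: H(S|Z)/N >= lam*[H(X)/n - I(X;Z)/n] = H(S) (full secrecy)
equivocation_rate = lam * (rate - I_XZ)
assert abs(equivocation_rate - H_S) < 1e-9
```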
6 Extension to Multiuser Settings

The above ideas can be extended in a natural manner to multiuser communication situations, and in this section, we demonstrate this for the multiple access channel (MAC), where the underlying principle is again thermal equilibrium between the subsystems pertaining to the different users and that of the channel. As before, our focus is on the regime where reliable communication cannot hold (the paramagnetic phase).

As an example, consider a randomly selected joint source-channel code for a MAC with two users, in the following setting. We are given two independent sources, $S_1, S_2, \ldots$ and $T_1, T_2, \ldots$, governed by probability distributions $P_S(\cdot)$ and $P_T(\cdot)$, which are proportional to $\exp\{-\beta E_S(\cdot)\}$ and to $\exp\{-\beta E_T(\cdot)\}$, with partition functions $Z_S(\beta)$ and $Z_T(\beta)$, respectively. Each $N$-vector of the first source, $s = (s_1, \ldots, s_N) \in S^N$, is encoded into a channel input vector $x_S(s) \in X_S^n$, and each $N$-vector of the second source, $t = (t_1, \ldots, t_N) \in T^N$, is encoded into a channel input vector $x_T(t) \in X_T^n$. Both codebooks are selected independently, where each codevector of the first code is chosen independently according to distribution $M_S$ and each codevector of the second codebook is selected independently according to distribution $M_T$. Both codewords are fed into a memoryless MAC $W(y|x_S, x_T)$, which is proportional to $\exp\{-\beta E_C(x_S, x_T, y)\}$. If we wish to estimate the mutual information $E\{I(S,T;Y)\}$ induced by the code, this is quite a trivial extension of the former derivation. But what about $E\{I(S;Y)\}$? Here, it will be more convenient to adopt the alternative derivation of eq. (14).
Considering the partition function
$$Z(\beta|y) = \sum_{s,t} \exp\{-\beta[E_S(s) + E_T(t) + E_C(x_S(s), x_T(t), y)]\},$$
let $\epsilon_S^*$, $\epsilon_T^*$, and $\epsilon_C^*$ denote the dominant energies allocated to the source $S$, the source $T$, and the MAC, respectively. Also, for a typical randomly chosen codeword $x_S(s)$ of the source message $s$ actually transmitted, let us define $e^{n\phi_{n,\delta}(\epsilon|x_S(s),y)}$ as the probability (under $M_T$) that $E_C(x_S(s), X_T, y)$ is between $n(\epsilon - \delta/2)$ and $n(\epsilon + \delta/2)$, for given $x_S(s)$ and $y$, and assume that as $n \to \infty$ and then $\delta \to 0$, $\phi_{n,\delta}(\epsilon|x_S(s), y)$ tends uniformly almost surely to a certain function, which will be denoted by $\phi(\epsilon|S)$. Now,
$$\lim_{N\to\infty} \frac{E\{I(S;Y)\}}{N} = \lim_{N\to\infty} \frac{1}{N} E\{\ln P(Y|S)\} - \lim_{N\to\infty} \frac{1}{N} E\{\ln P(Y)\}$$
$$= \lim_{N\to\infty} \frac{1}{N} E\left\{\ln\left[\frac{1}{Z_T(\beta)} \sum_t \exp\{-\beta[E_T(t) + E_C(X_S(S), X_T(t), Y)]\}\right]\right\}$$
$$\quad - \lim_{N\to\infty} \frac{1}{N} E\left\{\ln\left[\frac{1}{Z_S(\beta) Z_T(\beta)} \sum_{s,t} \exp\{-\beta[E_S(s) + E_T(t) + E_C(X_S(s), X_T(t), Y)]\}\right]\right\}$$
$$= \psi_S(\beta) + \Sigma_T(\epsilon_T^*) + \lambda\phi(\epsilon_C^*|S) - \beta(\epsilon_C^* + \epsilon_T^*) - \Sigma_T(\epsilon_T^*) - \Sigma_S(\epsilon_S^*) - \lambda\phi(\epsilon_C^*) + \beta(\epsilon_S^* + \epsilon_T^* + \epsilon_C^*)$$
$$= \lambda[\phi(\epsilon_C^*|S) - \phi(\epsilon_C^*)] \qquad (15)$$
The last line of the above chain of equalities can be intuitively explained as follows: The term $-\lambda\phi(\epsilon_C^*)$ stands for $\lim_{N\to\infty} E\{I(S,T;Y)\}/N$, by the same reasoning as before (viewing the pair $(S,T)$ as a single entity). Similarly, the term $-\lambda\phi(\epsilon_C^*|S)$ corresponds to the conditional mutual information rate $\lim_{N\to\infty} E\{I(T;Y|S)\}/N$, since the true $S$ is given and only the random codeword of $T$ is selected. Thus, by the chain rule of mutual information, the difference gives the mutual information rate between $S$ and $Y$.

Example 2.
Consider the binary modulo-2 additive MAC, $Y = X_S \oplus X_T \oplus V$, where all variables take on values in $\{0,1\}$, $\oplus$ denotes addition modulo 2 (XOR), and $V$ is Bernoulli with parameter $p = \Pr\{V = 1\}$, independent of $X_T$ and $X_S$. Similarly as in Example 1, let the codebooks of the two users be generated by i.i.d. distributions with parameters $m_S$ and $m_T$, respectively. Now, as before, $\epsilon_C^* = p$, and the probability that $X_S \oplus X_T$, whose components are Bernoulli($m_S * m_T$), would fall within distance $np$ from a typical $y$, whose components are Bernoulli($m_S * m_T * p$), is of the exponential order of $e^{n[h_2(p) - h_2(m_S * m_T * p)]}$; thus $\phi(p) = h_2(p) - h_2(m_S * m_T * p)$. On the other hand, the probability of the same event conditioned on $x_S$ is the probability that $X_T$ would fall within distance $np$ from $y \oplus x_S = x_T \oplus v$ (which has Bernoulli($m_T * p$) components), and is thus of the exponential order of $e^{n\phi(p|S)} = e^{n[h_2(p) - h_2(m_T * p)]}$. It follows then that
$$\lim_{N\to\infty} \frac{E\{I(S;Y)\}}{N} = \lambda[h_2(m_S * m_T * p) - h_2(m_T * p)].$$
In the special case where $m_T = 1/2$, we get $\lim_{N\to\infty} I(S;Y)/N = 0$ regardless of $m_S$, in agreement with intuition, as $X_T$ behaves like Bernoulli(1/2) noise in the paramagnetic regime.

References

[1] S. Arimoto, "On the converse to the coding theorem for discrete memoryless channels," IEEE Trans. Inform. Theory, vol. IT-19, no. 5, pp. 357-359, May 1973.

[2] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.

[3] B. Derrida, "Random-energy model: limit of a family of disordered models," Phys. Rev. Lett., vol. 45, no. 2, pp. 79-82, July 1980.

[4] B. Derrida, "The random energy model," Physics Reports (Review Section of Physics Letters), vol. 67, no. 1, pp. 29-35, 1980.

[5] B.
Derrida, "Random-energy model: an exactly solvable model of disordered systems," Phys. Rev. B, vol. 24, no. 5, pp. 2613-2626, September 1981.

[6] G. Dueck and J. Körner, "Reliability function of a discrete memoryless channel at rates above capacity," IEEE Trans. Inform. Theory, vol. IT-25, no. 1, pp. 82-85, January 1979.

[7] G. D. Forney, Jr., "Exponential error bounds for erasure, list, and decision feedback schemes," IEEE Trans. Inform. Theory, vol. IT-14, no. 2, pp. 206-220, March 1968.

[8] R. G. Gallager, Information Theory and Reliable Communication, J. Wiley & Sons, 1968.

[9] D. Guo, S. Shamai, and S. Verdú, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Trans. Inform. Theory, vol. 51, no. 4, pp. 1261-1282, April 2005.

[10] C. Kittel, Elementary Statistical Physics, John Wiley & Sons, 1958.

[11] N. Merhav, "The random energy model in a magnetic field and joint source-channel coding," Physica A: Statistical Mechanics and Its Applications, vol. 387, issue 22, pp. 5662-5674, September 15, 2008. doi:10.1016/j.physa.2008.05.040

[12] M. Mézard and A. Montanari, Information, Physics and Computation, draft, November 2007 [http://www.stanford.edu/~montanar/BOOK/book.html].

[13] P. Ruján, "Finite temperature error-correcting codes," Phys. Rev. Lett., vol. 70, no. 19, pp. 2968-2971, May 1993.

[14] A. D. Wyner, "The wire-tap channel," Bell System Technical Journal, vol. 54, no. 8, pp. 1355-1387, October 1975.