Joint Source–Channel Coding via Statistical Mechanics: Thermal Equilibrium Between the Source and the Channel*

Neri Merhav
Department of Electrical Engineering
Technion – Israel Institute of Technology
Haifa 32000, ISRAEL

Abstract

We examine the classical joint source–channel coding problem from the viewpoint of statistical physics and demonstrate that in the random coding regime, the posterior probability distribution of the source given the channel output is dominated by source sequences that exhibit a behavior highly parallel to that of thermal equilibrium between two systems of particles that exchange energy, where one system corresponds to the source and the other corresponds to the channel. The thermodynamical entropies of the dual physical problem are analogous to the conditional and unconditional Shannon entropies of the source, and so their balance in thermal equilibrium yields a simple formula for the mutual information between the source and the channel output that is induced by the typical code in an ensemble of joint source–channel codes, under certain conditions. We also demonstrate how our results can be used in applications, like the wiretap channel, and how they can be extended to multiuser scenarios, like that of the multiple access channel.

Index Terms: joint source–channel coding, statistical physics, thermal equilibrium, mutual information, entropy.

1 Introduction

Consider the following two seemingly unrelated problems, which serve as simple special cases of a more general setting we study later in this paper.

The first is an elementary problem in statistical physics: We have two subsystems of particles which are brought into thermal equilibrium with each other, as well as with the environment (a

* Part of this work was carried out during a visit at Hewlett–Packard Laboratories, Palo Alto, CA, U.S.A., in the Summer of 2008.
heat bath) at temperature $T$. The first subsystem consists of $N$ particles having magnetic moments (spins), $\{s_i\}$, each of which may be oriented either in the direction of an applied external magnetic field $B$, in which case $s_i = +1$, or in the opposite direction, in which case $s_i = -1$; its energy in both cases is given by $-s_iB$ (up to a certain multiplicative constant, which carries the appropriate physical units and is irrelevant for the purpose of this discussion). In the second subsystem, there are $n$ non-interacting particles $\{s_i'\}_{i=1}^n$, each of which may lie in one of two possible states: the state $s_i' = 0$, in which the particle has zero energy, and the state $s_i' = 1$, in which it has energy $e_0$. What is the average energy possessed by each one of these subsystems in equilibrium, as a function of $e_0$, $T$, $n$, $N$, and $B$?

The second problem is in Information Theory; in particular, it is in joint source–channel coding, where some of the notation used is deliberately chosen to be the same as in the previous paragraph: A binary memoryless source generates a vector $s$ of symbols $(s_1, s_2, \ldots, s_N)$, $s_i \in \{+1,-1\}$, $i = 1,\ldots,N$, with probabilities $q = \Pr\{S_i = +1\}$ and $1-q = \Pr\{S_i = -1\}$. This vector is encoded into a binary channel codeword $x(s)$ of length $n$ and transmitted over a binary symmetric channel (BSC) with crossover probability $p < 1/2$, and a binary $n$-vector $y$ is received at the channel output. Consider the posterior distribution
\[
P(s|y) = \frac{P(s)W(y|x(s))}{\sum_{s'} P(s')W(y|x(s'))},
\]
where $P(s)$ and $W(y|x)$ are the probability distributions that govern the source and the channel, respectively, as described above.
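To make the posterior of the second problem concrete, the following small Python sketch computes $P(s|y)$ exactly for a toy instance of this setup; the block lengths, the values of $q$ and $p$, and the random codebook below are all illustrative choices, not values taken from the paper.

```python
import itertools
import math
import random

def posterior(q, p, codebook, y):
    """Exact posterior P(s|y) for a binary memoryless source sent over a BSC.

    q: Pr{S_i = +1}; p: BSC crossover probability;
    codebook: dict mapping each source tuple s to its codeword x(s);
    y: received binary n-vector.
    """
    def P_source(s):
        return math.prod(q if si == +1 else 1 - q for si in s)

    def W(y, x):  # BSC likelihood: p^d (1-p)^(n-d), d = Hamming distance
        d = sum(xi != yi for xi, yi in zip(x, y))
        return p ** d * (1 - p) ** (len(y) - d)

    weights = {s: P_source(s) * W(y, codebook[s]) for s in codebook}
    Z = sum(weights.values())
    return {s: w / Z for s, w in weights.items()}

random.seed(0)
N, n, q, p = 3, 6, 0.7, 0.1  # illustrative sizes and parameters
sources = list(itertools.product([+1, -1], repeat=N))
codebook = {s: tuple(random.randint(0, 1) for _ in range(n)) for s in sources}
s0 = sources[0]                                             # transmitted message
y = tuple(b ^ (random.random() < p) for b in codebook[s0])  # BSC output
post = posterior(q, p, codebook, y)
```

Note that, since the posterior is a ratio, any constant common to the numerator and the denominator cancels out, a freedom that is exploited later in Section 4.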
Thus, clearly, $P(s|y)$ is proportional to $P(s)W(y|x(s))$, or, equivalently, $\ln P(s|y)$ is (within a term that is independent of $s$) given by $\ln P(s) + \ln W(y|x(s))$. For a typical code drawn uniformly at random from the ensemble of codes, what are the relative contributions of the source and the channel to this quantity, for those vectors $s$ that dominate $P(s|y)$ (i.e., those that capture the vast majority of the posterior probability)?

It turns out, as we shall see in Section 3 below, that the two problems have virtually identical answers (in a sense that will be made clear and precise therein), provided that the parameters $T$ and $B$ of the first problem are related to the parameters $p$ and $q$ of the second problem by
\[
p = \frac{\exp\{-e_0/kT\}}{1+\exp\{-e_0/kT\}} \tag{1}
\]
and
\[
q = \frac{\exp\{B/kT\}}{2\cosh(B/kT)}, \tag{2}
\]
or, equivalently,
\[
e_0 = kT\ln\frac{1-p}{p} \tag{3}
\]
and
\[
B = \frac{kT}{2}\ln\frac{q}{1-q}, \tag{4}
\]
where $k$ is Boltzmann's constant.

Thermal equilibrium between the two subsystems in the above-described physical problem dictates a certain balance between their thermodynamical entropies, in order to arrive at the maximum total entropy (by the second law of thermodynamics) for the total energy possessed by the entire system at the given temperature $T$. As the thermodynamical entropy, in its statistical–mechanical definition, is intimately related to the Shannon entropy, this equilibrium relation between the thermodynamical entropies of the physical problem gives rise to an analogous relation between Shannon entropies pertaining to the joint source–channel coding problem in the random coding regime. In particular, it relates the entropy of the source to its conditional entropy given the channel output, whose difference is exactly the mutual information between the source and the channel output.
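The correspondence (1)–(4) can be checked mechanically. The following sketch (with $k$ set to 1 and arbitrary illustrative values of $T$, $p$, and $q$) verifies that (3) and (4) indeed invert (1) and (2):

```python
import math

k = 1.0  # Boltzmann's constant in arbitrary units (illustrative choice)

def p_of(e0, T):   # eq. (1): crossover probability from the level spacing e0
    return math.exp(-e0 / (k * T)) / (1 + math.exp(-e0 / (k * T)))

def q_of(B, T):    # eq. (2): source bias from the magnetic field B
    return math.exp(B / (k * T)) / (2 * math.cosh(B / (k * T)))

def e0_of(p, T):   # eq. (3): inverse of (1)
    return k * T * math.log((1 - p) / p)

def B_of(q, T):    # eq. (4): inverse of (2)
    return (k * T / 2) * math.log(q / (1 - q))

T, p, q = 2.5, 0.11, 0.8  # illustrative temperature and probabilities
assert abs(p_of(e0_of(p, T), T) - p) < 1e-12
assert abs(q_of(B_of(q, T), T) - q) < 1e-12
```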
The final outcome of this is a simple formula for calculating the mutual information rate between the input and the output of a coded system for the typical code in a given ensemble, under certain conditions. This calculation builds strongly on the random energy model (REM) of spin glasses due to Derrida [3, 4, 5] and on its relation to the random code ensemble (RCE), as described in [12].

Clearly, under the regime of reliable communication, the mutual information rate between the source and the channel output coincides with the entropy rate of the source, as the conditional entropy rate of the source given the channel output vanishes. Thus, the problem of calculating the mutual information under reliable communication conditions is easy and, in fact, not quite interesting. The same calculation, however, when the conditions of reliable communication are not met, appears less trivial. But what would be the motivation for such a calculation? Here are just a few examples that motivate it: Consider a user that, in addition to its desired signal, receives also a relatively strong interfering signal (codeword), which is intended for other users, and which comes from a codebook whose rate exceeds the capacity of the crosstalk channel between the interferer and our user, so that the user cannot fully decode this interference. Nonetheless, our user would like to learn as much as possible about the interfering signal, for many possible reasons: for example, the user may wish to learn the interference signal in order to identify where it originates from, or in order to estimate it and subtract it (interference cancellation). The mutual information rate, call it $I$, between the interference signal and the channel output then gives some assessment concerning the quality of this estimation.
For one thing, $D(I)$, where $D(\cdot)$ is the distortion–rate function of the source, is a lower bound on the distortion in estimating this signal. Moreover, if the channel is Gaussian, one can calculate the exact minimum mean square error (MMSE) from the mutual information rate $I$ by taking its derivative w.r.t. the signal-to-noise ratio (SNR) [9]. Another application comes from scenarios where the above-described receiver is a hostile party (an eavesdropper), from which one would like to conceal information as much as possible. The natural setup, in this context, is that of the wiretap channel (cf. [14], as well as many follow-up papers), where excess channel noise beyond capacity is harnessed as an effective key that secures the data communication. As we show in the sequel, the mutual information rate between the transmitted message and the eavesdropper, who suffers from this excess noise, is strongly related to the equivocation, which is a customary measure of security in Shannon-theoretic secrecy systems.

The outline of this paper is as follows. In Section 2, we establish notation conventions. In Section 3, we provide some basic background in elementary statistical physics, which will be needed in the sequel. In Section 4, we derive our main result, which is a formula for the mutual information rate. In Section 5, we demonstrate how it is applied to the wiretap channel, and finally, in Section 6, we demonstrate how our results can be extended to multiuser scenarios, like that of the multiple access channel.

2 Notation Conventions

Throughout this paper, scalar random variables (RV's) will be denoted by capital letters, like $S$, $X$, and $Y$, their sample values will be denoted by the respective lower case letters, and their alphabets will be denoted by the respective calligraphic letters.
A similar convention will apply to random vectors and their sample values, which will be denoted by the same symbols in the bold face font. Thus, for example, $X$ will denote a random $n$-vector $(X_1,\ldots,X_n)$, and $x = (x_1,\ldots,x_n)$ is a specific vector value in $\mathcal{X}^n$, the $n$-th Cartesian power of $\mathcal{X}$. Sources and channels will be denoted generically by the letters $P$, $Q$, $M$ and $W$. Whenever clarity and unambiguity require it, these letters will be subscripted by the names of the relevant RV's, following the standard notation conventions in the literature; for example, $P_S$ will denote the probability distribution of a random variable $S$, $P_{X|Y}$ will denote the conditional probability distribution of $X$ given $Y$, and so on. The cardinality of a finite set $\mathcal{A}$ will be denoted by $|\mathcal{A}|$. Information-theoretic quantities, like entropies and mutual informations, will be denoted following the usual conventions of the information theory literature.

3 Background

In this section, we provide a brief account of the very basic background in statistical physics that is needed for this paper.

Consider a physical system of $N$ particles, which can be in a variety of microscopic states ('microstates'), defined by combinations of physical quantities associated with these particles, e.g., positions, momenta, angular momenta, spins, etc., of all $N$ particles. For each such microstate of the system, which we shall designate by a vector $s = (s_1,\ldots,s_N)$, there is an associated energy, given by a Hamiltonian (energy function), $\mathcal{E}(s)$. For example, if $s_i = (p_i, r_i)$, where $p_i$ is the momentum vector of particle number $i$ and $r_i$ is its position vector, then, classically,
\[
\mathcal{E}(s) = \sum_{i=1}^N\left[\frac{\|p_i\|^2}{2m}+mgz_i\right],
\]
where $m$ is the mass of each particle, $z_i$ is its height (one of the coordinates of $r_i$), and $g$ is the gravitational acceleration.
One of the most fundamental results in statistical physics (based on the law of energy conservation and on the basic postulate that all microstates of the same energy level are equiprobable) is that when the system is in thermal equilibrium with its environment, the probability of a microstate $s$ is given by the Boltzmann–Gibbs distribution
\[
P(s) = \frac{e^{-\beta\mathcal{E}(s)}}{Z(\beta)}, \tag{5}
\]
where $\beta = 1/(kT)$, $k$ being Boltzmann's constant and $T$ being the temperature, and $Z(\beta)$ is the normalization constant, called the partition function, which is given by
\[
Z(\beta) = \sum_{s} e^{-\beta\mathcal{E}(s)} \quad\mbox{or}\quad Z(\beta) = \int ds\, e^{-\beta\mathcal{E}(s)},
\]
depending on whether $s$ is discrete or continuous. The role of the partition function is by far deeper than just being a normalization factor, as it is actually the key quantity from which many macroscopic physical quantities can be derived. For example, the free energy$^1$ is $-\frac{1}{\beta}\ln Z(\beta)$, the average internal energy (i.e., the expectation of $\mathcal{E}(s)$, where $s$ is drawn according to (5)) is given by the negative derivative of $\ln Z(\beta)$, the heat capacity is obtained from the second derivative, etc. One of the ways to obtain eq. (5) is as the maximum entropy distribution under an energy constraint (owing to the second law of thermodynamics), where $\beta$ plays the role of a Lagrange multiplier that controls this energy level.

Let us define the quantity
\[
\Omega_{N,\delta}(\epsilon) = \left|\left\{s:\ (\epsilon-\delta/2)N \le \mathcal{E}(s) \le (\epsilon+\delta/2)N\right\}\right|, \tag{6}
\]
and let us assume that the limit
\[
\Sigma(\epsilon) = \lim_{\delta\to 0}\lim_{N\to\infty} \frac{\ln \Omega_{N,\delta}(\epsilon)}{N}
\]
exists and that $\Sigma(\epsilon)$ is a differentiable concave function. $\Sigma(\epsilon)$ is the entropy of the physical system in its statistical–mechanical definition. We will see shortly that it is intimately related to the Shannon entropy associated with the Boltzmann–Gibbs probability distribution $P(s)$ defined above.
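As a minimal numerical illustration of the role of the partition function, the sketch below takes a single two-level particle (energies $0$ and $e_0$, with illustrative parameter values) and checks that the negative derivative of $\ln Z(\beta)$ reproduces the average internal energy computed directly from (5):

```python
import math

# Two-level particle with energies 0 and e0 (illustrative values); by
# independence, N such particles would simply multiply ln Z by N.
e0, beta = 1.3, 0.8

def logZ(b):
    # Z(beta) = 1 + exp(-beta * e0): sum of the Boltzmann weights of the two states
    return math.log(1 + math.exp(-b * e0))

# Average internal energy as the negative derivative of ln Z (numerically):
h = 1e-6
avg_energy_from_derivative = -(logZ(beta + h) - logZ(beta - h)) / (2 * h)

# The same quantity as a direct expectation under the distribution (5):
Z = 1 + math.exp(-beta * e0)
avg_energy_direct = 0 * (1 / Z) + e0 * math.exp(-beta * e0) / Z

assert abs(avg_energy_from_derivative - avg_energy_direct) < 1e-5
```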
To see why the concavity assumption makes sense, note that, at least when $P(s)$ is a product distribution (namely, when $\mathcal{E}(s) = \sum_i \mathcal{E}(s_i)$),
\[
\Omega_{N_1+N_2,\delta}\left(\frac{N_1\epsilon_1+N_2\epsilon_2}{N_1+N_2}\right) \ge \Omega_{N_1,\delta}(\epsilon_1)\cdot\Omega_{N_2,\delta}(\epsilon_2),
\]
since for every configuration $s$ where $N_1 \le N$ particles have total energy $N_1\epsilon_1$ and $N_2 = N-N_1$ particles have total energy $N_2\epsilon_2$, the total energy of all $N = N_1+N_2$ particles is obviously $N_1\epsilon_1+N_2\epsilon_2$; but the converse is not true, since there are other ways to split the total energy $N_1\epsilon_1+N_2\epsilon_2$ between the two complementary subsets of particles. Thus, taking the logarithm of both sides, dividing by $(N_1+N_2)$, then taking the limits $N_1, N_2 \to\infty$ such that $N_1/N_2$ tends to a given constant, and finally taking the limit $\delta\to 0$, one readily observes that $\Sigma(\epsilon)$ is concave. An argument of the same spirit can be exercised in somewhat more general situations, e.g., when $P(s)$ has a Markov structure (namely, when the physical system has some nearest-neighbor interactions), though some more caution is required.

Denoting
\[
\psi(\beta) = \lim_{N\to\infty}\frac{1}{N}\ln\sum_{s}\exp\{-\beta\mathcal{E}(s)\},
\]
it is readily seen that
\[
\psi(\beta) = \lim_{\delta\to 0}\lim_{N\to\infty}\frac{1}{N}\ln\sum_{j\ge 0}\Omega_{N,\delta}((j+1/2)\delta)\cdot\exp\{-N\beta j\delta\} = \sup_{\epsilon\ge 0}[\Sigma(\epsilon)-\beta\epsilon], \tag{7}
\]
i.e., $\psi(\cdot)$ and $\Sigma(\cdot)$ are a Legendre-transform pair. Since $\Sigma(\cdot)$ is assumed concave, the inverse transform relation
\[
\Sigma(\epsilon) = \inf_{\beta\ge 0}[\beta\epsilon+\psi(\beta)]
\]
holds true as well, and so the derivatives $\beta(\epsilon) \triangleq d\Sigma/d\epsilon$ and $\epsilon(\beta) = -d\psi/d\beta$ (which are the minimizer of $[\beta\epsilon+\psi(\beta)]$ and the maximizer of $[\Sigma(\epsilon)-\beta\epsilon]$, respectively) are inverses of each other.

$^1$ The free energy is the maximum work that the system can carry out in any process at fixed temperature. The maximum is attained when the process is reversible (slow, quasi-static changes in the system).
It follows then that
\[
\Sigma(\epsilon) = \psi(\beta) - \beta\cdot\frac{d\psi}{d\beta},
\]
but, as is readily seen, $-d\psi/d\beta$ is the per-particle average internal energy, where the expectation operator $E$ is associated with the Boltzmann distribution. This, in turn, is readily verified to agree with the expression of the Shannon entropy rate $H(S)$ of the distribution $P(s)$:
\[
H(S) = \lim_{N\to\infty}\frac{1}{N}E\ln\frac{1}{P(S)} = \lim_{N\to\infty}\frac{1}{N}E\ln\frac{Z(\beta)}{\exp\{-\beta\mathcal{E}(S)\}} = \psi(\beta)+\beta\cdot\lim_{N\to\infty}\frac{1}{N}E\{\mathcal{E}(S)\}. \tag{8}
\]
Thus, $\Sigma(\epsilon) = H(S)$ whenever $\beta$ and $\epsilon$ are related by $\beta = \beta(\epsilon)$, or, equivalently, $\epsilon = \epsilon(\beta)$. For a given $\beta$, the Boltzmann–Gibbs distribution has a sharp peak (for large $N$) at the level of $\epsilon(\beta)$. We then say that this value of $\epsilon$ is the dominant energy level: not only is it the average energy, there is also a strong concentration of the probability about this value as $N$ grows without bound.

The second law of thermodynamics asserts that in an isolated system (which does not exchange energy with its environment), the total entropy cannot decrease, and hence, in equilibrium, it reaches its maximum. Now, suppose that we have a physical system that is composed of two subsystems, one having $N$ particles with microstates $\{s\}$ and Hamiltonian $\mathcal{E}_1(s)$, and the other having $n$ particles with microstates $\{s'\}$ and Hamiltonian $\mathcal{E}_2(s')$. Let us suppose that these two subsystems are in thermal contact and that they both reside in a very large environment (heat bath) having a fixed temperature $T = 1/(k\beta)$. The two subsystems are allowed to exchange energy with each other as well as with the heat bath. How is the total energy of the system split between the two subsystems? An example of two such subsystems was described in the first few paragraphs of the Introduction.
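Relation (8) can be verified directly for the two-level gas of the Introduction. In the sketch below (illustrative $e_0$ and $\beta$, with $k = 1$ and entropies in nats), the per-particle Shannon entropy equals $\psi(\beta)+\beta E\{\mathcal{E}(S)\}$ to machine precision:

```python
import math

e0, beta = 0.9, 1.7        # illustrative two-level system (energies 0 and e0)

Z = 1 + math.exp(-beta * e0)   # per-particle partition function
psi = math.log(Z)              # psi(beta) = (1/N) ln Z for i.i.d. particles
p1 = math.exp(-beta * e0) / Z  # Pr{particle is in the excited state}
avg_E = e0 * p1                # per-particle average internal energy

# Per-particle Shannon entropy (in nats) of the Boltzmann-Gibbs distribution:
shannon = -p1 * math.log(p1) - (1 - p1) * math.log(1 - p1)

# Eq. (8): H(S) = psi(beta) + beta * E{E(S)}
assert abs(shannon - (psi + beta * avg_E)) < 1e-12
```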
The partition function of the composite system is given by
\[
Z(\beta) = \sum_{s,s'}\exp\{-\beta[\mathcal{E}_1(s)+\mathcal{E}_2(s')]\},
\]
and so the dominant energy level, as we saw before, is the one that achieves the associated normalized log-partition function $\psi(\beta)$, i.e., the solution $\epsilon_0$ to the equation $d\Sigma(\epsilon)/d\epsilon = \beta$, where $\Sigma(\epsilon)$ is the entropy of the combined system. Let us confine attention now to the set of combined microstates $\{(s,s')\}$ of the composite system which have energy $(N+n)\epsilon_0$. More precisely, assume that the ratio $n/N = \lambda$ is held fixed, so that $(N+n)\epsilon_0 = N(1+\lambda)\epsilon_0$, and let us define
\[
\Omega_{N,n,\delta}(\epsilon_0) = \left|\left\{(s,s'):\ N(1+\lambda)(\epsilon_0-\delta/2) \le \mathcal{E}_1(s)+\mathcal{E}_2(s') \le N(1+\lambda)(\epsilon_0+\delta/2)\right\}\right|.
\]
Clearly, every configuration $(s,s')$ with energy about $N(1+\lambda)\epsilon_0$ corresponds to some allocation of part of the energy in one subsystem and the remaining energy in the other. Thus, defining $\Omega^{(1)}_{N,\delta}(\epsilon)$ and $\Omega^{(2)}_{n,\delta}(\epsilon)$ as the enumerators of microstates with energy about $\epsilon$ in each one of the two subsystems individually (as defined in eq. (6)), we have, for $\hat{\delta} = \delta(1+\lambda)$:
\[
\Omega_{N,n,\hat{\delta}}(\epsilon_0) = \sum_{j\ge 0}\Omega^{(1)}_{N,\delta}((j+1/2)\delta)\cdot\Omega^{(2)}_{n,\delta}\left(\frac{(1+\lambda)\epsilon_0-(j+1/2)\delta}{\lambda}\right).
\]
Defining $\Sigma(\epsilon)$ as $\lim_{\delta\to 0}\lim_{N\to\infty}[\ln\Omega_{N,\lambda N,\hat{\delta}}(\epsilon)]/[N(1+\lambda)]$, we find, after taking logarithms of both sides, dividing by $N(1+\lambda)$, letting $N\to\infty$, and then $\delta\to 0$, that $\Sigma(\epsilon_0)$ is given by the weighted supremal convolution$^2$:
\[
\Sigma(\epsilon_0) = \sup_{0\le\epsilon\le(1+\lambda)\epsilon_0}\left[\frac{1}{1+\lambda}\cdot\Sigma_1(\epsilon)+\frac{\lambda}{1+\lambda}\cdot\Sigma_2\left(\frac{(1+\lambda)\epsilon_0-\epsilon}{\lambda}\right)\right].
\]
Assuming that the maximum is achieved by $\epsilon^*\in(0,(1+\lambda)\epsilon_0)$, it is characterized by a vanishing derivative of the expression in the square brackets, i.e., it is the solution to the equation
\[
\Sigma_1'(\epsilon) = \Sigma_2'\left(\frac{(1+\lambda)\epsilon_0-\epsilon}{\lambda}\right), \tag{9}
\]
where $\epsilon$ is the unknown, and where $\Sigma_i'$ is the derivative of $\Sigma_i$, $i = 1,2$.
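A small numerical sketch of the weighted supremal convolution and of the equilibrium condition (9): the two concave functions below are purely illustrative stand-ins (not the physical entropies of the paper), chosen only so that the grid maximizer can be checked against the matching-derivatives condition.

```python
# Illustrative concave stand-ins for Sigma_1 and Sigma_2, with their derivatives:
def S1(e):  return -(e - 1.0) ** 2
def dS1(e): return -2.0 * (e - 1.0)
def S2(e):  return -0.5 * (e - 2.0) ** 2
def dS2(e): return -1.0 * (e - 2.0)

lam, eps0 = 2.0, 1.5        # ratio n/N and total per-particle energy (illustrative)

# Weighted supremal convolution, maximized over a fine grid of energy splits:
M = 450000
best_val, e_star = float("-inf"), None
for i in range(1, M):
    e = (1 + lam) * eps0 * i / M
    val = S1(e) / (1 + lam) + lam / (1 + lam) * S2(((1 + lam) * eps0 - e) / lam)
    if val > best_val:
        best_val, e_star = val, e

# Equilibrium condition (9): the derivatives of the two entropies must match
# at the maximizing split e_star.
assert abs(dS1(e_star) - dS2(((1 + lam) * eps0 - e_star) / lam)) < 1e-3
```

For these particular quadratics the maximizer can also be found in closed form ($\epsilon^* = 0.9$ for the values above), which the grid search reproduces.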
This equation characterizes the thermal equilibrium between the two subsystems and the heat bath. Now, the left-hand side is exactly $\beta$. Thus, $\epsilon^*$, the per-particle energy share of the first subsystem, is the solution to the equation $\Sigma_1'(\epsilon) = \beta$ (or, equivalently, of eq. (9), as said), and the remaining energy per particle, $[(1+\lambda)\epsilon_0-\epsilon^*]/\lambda$, belongs to the other subsystem.

Comment. Returning to the example that opens the Introduction, a simple calculation shows that the dominant energies are
\[
B\cdot E\left\{\sum_{i=1}^N S_i\right\} = NB\tanh\left(\frac{B}{kT}\right)
\]
in the first subsystem, and
\[
e_0\cdot E\left\{\sum_{i=1}^n S_i'\right\} = \frac{ne_0\exp\{-e_0/kT\}}{1+\exp\{-e_0/kT\}}
\]
in the second subsystem. Thus,
\[
\epsilon^* = B\tanh\left(\frac{B}{kT}\right) \quad\mbox{and}\quad \frac{(1+\lambda)\epsilon_0-\epsilon^*}{\lambda} = \frac{e_0\exp\{-e_0/kT\}}{1+\exp\{-e_0/kT\}}.
\]
In the parallel joint source–channel coding problem described in the Introduction, and to be further studied in a more general setting in the sequel, we have $\ln P(s) = (\frac{1}{2}\ln\frac{q}{1-q})\cdot\sum_{i=1}^N s_i + \mbox{const}$ and $\ln W(y|x) = (\ln\frac{p}{1-p})\cdot\sum_{i=1}^n(x_i\oplus y_i) + \mbox{const}$, with $\oplus$ denoting modulo-2 addition. The dominant contribution to $P(s|y)$ comes from those $\{s\}$ for which $\sum_{i=1}^N s_i$ is about its typical value $N[(+1)\cdot q + (-1)\cdot(1-q)] = N(2q-1) = N\tanh(B/kT)$ (in analogy to the energy of the first subsystem above, where we have used the relations (1)–(4)), and for which $\sum_{i=1}^n(x_i\oplus y_i)$ is about $np = n\exp\{-e_0/kT\}/[1+\exp\{-e_0/kT\}]$ (in analogy to the energy of the second subsystem).

$^2$ The supremal convolution between two functions $f(x)$ and $g(x)$ is generally defined as $h(x) = \sup_t[f(x-t)+g(t)]$. The qualifier "weighted", in our context, refers to the fact that both functions, as well as their arguments, are weighted by $1/(1+\lambda)$ and $\lambda/(1+\lambda)$.
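The two dominant-energy formulas in the Comment follow from elementary expectations under the Boltzmann weights; the sketch below (arbitrary illustrative $B$, $e_0$, $T$, with $k = 1$) checks both:

```python
import math

k = 1.0                     # Boltzmann constant in arbitrary units (illustrative)
B, e0, T = 0.7, 1.2, 1.5    # illustrative field, level spacing, and temperature
beta = 1 / (k * T)

# Spin subsystem: the energy of s_i = +/-1 is -s_i*B, so P(s_i) ~ exp(beta*B*s_i).
Zs = math.exp(beta * B) + math.exp(-beta * B)
mean_spin = (+1) * math.exp(beta * B) / Zs + (-1) * math.exp(-beta * B) / Zs
assert abs(mean_spin - math.tanh(B / (k * T))) < 1e-12   # E{S_i} = tanh(B/kT)

# Two-level subsystem: states 0 (energy 0) and 1 (energy e0).
Zp = 1 + math.exp(-beta * e0)
occupancy = math.exp(-beta * e0) / Zp                    # E{S'_i}
expected = e0 * math.exp(-e0 / (k * T)) / (1 + math.exp(-e0 / (k * T)))
assert abs(e0 * occupancy - expected) < 1e-12            # per-particle energy share
```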
Notice that these two typical contributions to the log-posterior probability agree also with the corresponding typical contributions, $\ln P(s_0)$ and $\ln W(y|x(s_0))$, of the real message $s_0$ that was actually transmitted. This is true regardless of whether the communication is reliable or not, i.e., it continues to hold no matter whether the entropy rate of the source is smaller or larger than $\lambda$ times the mutual information between the input and the output of the channel.

Returning to the general discussion above, note that the same considerations continue to hold even if one of the systems, say the second one, has an effective negative entropy, that is, $\Omega^{(2)}_{n,\delta}([(1+\lambda)\epsilon_0-\epsilon^*]/\lambda) < 1$, which means that for each microstate $s$ of the first subsystem with per-particle energy $\epsilon^*$, only a fraction of the compatible combined microstates $\{(s,s')\}$ have normalized energy $\epsilon_0$. Of course, $\Omega_{N,n,\hat{\delta}}(\epsilon_0)$ must be larger than 1. In the sequel, we shall see that in the joint source–channel coding problem, the source and the channel constitute a mechanism which is highly parallel to that of equilibrium energy-sharing between two subsystems in a heat bath, where the subsystem corresponding to the channel has a negative effective thermodynamic entropy in this sense.

We should comment that in order to determine the energy sharing between the two subsystems in the above discussion, it was not necessary to consider how they thermally interact with each other and to go through the weighted supremal convolution between their entropies, as we did. We could have determined these energies simply by considering the equilibrium of each one of the subsystems individually with the heat bath,$^3$ thus equating the derivative of each one of the entropy functions to $\beta$.
Nonetheless, we have deliberately chosen to present the supremal convolution because, in the sequel, it is this relation that will lead to the derivation of the mutual information in the joint source–channel coding problem.

4 Formulation, Main Results and Discussion

Consider an information source, $S_1, S_2,\ldots$, whose symbols $\{S_i\}$ take on values in a finite alphabet $\mathcal{S}$. The source is characterized by a sequence of probability distributions, $P(s)$, $s \triangleq (s_1,\ldots,s_N)$, $N = 1,2,\ldots$. Consider next a discrete memoryless channel (DMC), which is characterized by a matrix of single-letter transition probabilities $\{W(y|x),\ x\in\mathcal{X},\ y\in\mathcal{Y}\}$, where $\mathcal{X}$ and $\mathcal{Y}$ are finite alphabets. The operation rate of the channel relative to the source is $\lambda$ channel uses per source symbol, which means that while the source produces an $N$-vector $s = (s_1,\ldots,s_N)\in\mathcal{S}^N$, the channel conveys $n$ channel symbols; namely, it receives an $n$-vector $x = (x_1,\ldots,x_n)\in\mathcal{X}^n$ and outputs an $n$-vector $y = (y_1,\ldots,y_n)\in\mathcal{Y}^n$, where $n = \lambda N$. The parameter $\lambda$ is referred to as the bandwidth expansion factor of the channel relative to the source.

For the sake of convenience in drawing the analogy with statistical mechanics, we will think of both the source and the channel as Boltzmann distributions with certain Hamiltonians at a certain common inverse temperature $\beta$; that is, $P(s)$ is proportional to $\exp\{-\beta\mathcal{E}_S(s)\}$ and $W(y|x)$ is proportional to $\exp\{-\beta\mathcal{E}_C(x,y)\}$, where $\mathcal{E}_S(\cdot)$ and $\mathcal{E}_C(\cdot,\cdot)$ are the Hamiltonians of the source and the channel, respectively.

$^3$ When doing so, the other system then becomes part of the heat bath anyway.
For a pair of $n$-vectors $x$ and $y$, we will denote $W(y|x) = \prod_{i=1}^n W(y_i|x_i)$, and keep in mind that it is proportional to $\exp\{-\beta\mathcal{E}_C(x,y)\}$, where $\mathcal{E}_C(x,y) \triangleq \sum_{i=1}^n\mathcal{E}_C(x_i,y_i)$. Clearly, there is no loss of generality in this representation of the source and the channel, since there is always at least one way of doing this: for example, one can simply take $\beta = 1$, $\mathcal{E}_S(s) = -\ln P(s)$, and $\mathcal{E}_C(x,y) = -\ln W(y|x)$. The point is, however, that by doing this we have slightly extended the scope: instead of one source and one channel, we are actually considering a family of sources and channels, both indexed by a common parameter $\beta$ that controls the degree of uniformity or skewedness of the distribution.

An $(N,n)$ joint source–channel code, for the above-defined source and channel, is a mapping from the set $\mathcal{S}^N$ to $\mathcal{X}^n$. Every source string $s$ is mapped into a channel input vector $x \triangleq (x_1,\ldots,x_n)$, and when we wish to emphasize the dependence of $x$ on $s$, we denote it as $x(s)$. The code is assumed to be selected at random, where for each $s$, the codeword $x(s)$ is drawn under a distribution$^4$ $M(x)$, independently$^5$ of all other codewords. The receiver estimates $s$ by applying a certain function to the received channel output sequence $y \triangleq (y_1,\ldots,y_n)$, i.e., it implements a function from $\mathcal{Y}^n$ to $\mathcal{S}^N$, which will be denoted by $\hat{s} = \hat{s}(y)$. In some applications, the receiver (or the observer) may not necessarily attempt full-fledged decoding of the message, but may opt to merely estimate a certain function of the source sequence (e.g., some statistic, such as its composition). Our study of the mutual information induced by the joint source–channel code will be strongly based on the posterior distribution, which, for a given (randomly selected) code, is defined as
\[
P_\beta(s|y) = \frac{P(s)W(y|x(s))}{\sum_{s'\in\mathcal{S}^N}P(s')W(y|x(s'))} = \frac{\exp\{-\beta[\mathcal{E}_S(s)+\mathcal{E}_C(x(s),y)]\}}{\sum_{s'}\exp\{-\beta[\mathcal{E}_S(s')+\mathcal{E}_C(x(s'),y)]\}}. \tag{10}
\]
On a technical note, observe that since the posterior distribution is given by a ratio, this allows slightly more freedom in the definition of the Hamiltonians $\mathcal{E}_S$ and $\mathcal{E}_C$, as certain common constants in the numerator and the denominator may cancel each other. For example, if the source is binary and memoryless, as described in the example given in the Introduction, then $P(s)$ is proportional to $\exp\{-(\frac{1}{2}\ln\frac{1-q}{q})\sum_{i=1}^N s_i\}$, and so one can define $\mathcal{E}_S(s)$ to be proportional to $\sum_{i=1}^N s_i$, where the factor $\frac{1}{2}\ln\frac{1-q}{q}$ can be split between a part that is absorbed in the Hamiltonian itself and a part that is attributed to the inverse temperature parameter $\beta$.

$^4$ A more general model would allow a distribution $M$ that depends on $s$. For example, if $\mathcal{S}^N$ can be naturally divided into type classes (as in the case of memoryless sources, Markov sources, etc.), then it is plausible to let $M$ depend on the type class of $s$. However, among all sequences in $\mathcal{S}^N$, the important ones are those that are typical to the source (the others can be ignored in the large-$N$ limit), and these are equiprobable on the exponential scale; so the distribution $M$ can be taken to be the same for all of them without loss of asymptotic optimality.

$^5$ The independence assumption is made here mostly for the sake of simplicity. It can be somewhat relaxed, as long as the concentration properties specified below continue to hold.
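The remark about a family of sources indexed by $\beta$ amounts to exponential tilting of the original distribution; here is a minimal sketch (the three-symbol distribution is purely illustrative):

```python
import math

P = [0.5, 0.3, 0.2]                    # an arbitrary source distribution (illustrative)
E_S = [-math.log(pi) for pi in P]      # Hamiltonian choice E_S(s) = -ln P(s)

def tilted(beta):
    """The family of sources P_beta(s) proportional to exp{-beta * E_S(s)}."""
    w = [math.exp(-beta * e) for e in E_S]
    Z = sum(w)                         # source partition function Z_S(beta)
    return [wi / Z for wi in w]

# beta = 1 recovers the original source exactly:
assert all(abs(a - b) < 1e-9 for a, b in zip(tilted(1.0), P))
# Small beta flattens the distribution; large beta skews it further:
assert max(tilted(0.01)) < max(P)
assert max(tilted(5.0)) > max(P)
```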
A similar comment applies to the channel, but here some more caution is required since, in general, the constant of proportionality that relates $W(y|x)$ and $\exp\{-\beta\mathcal{E}_C(x,y)\}$ may depend on $x$, unless the code is of constant composition and/or the channel is symmetric in the sense that $\sum_y\exp\{-\beta\mathcal{E}_C(x,y)\}$ is independent of $x$ for all $\beta$ (which is the case, e.g., in modulo-additive channels, like the BSC). If neither of these conditions holds (i.e., if the code is not of constant composition and the channel is not symmetric), we keep the choice of $\mathcal{E}_C(x,y)$ as being proportional to $-\ln W(y|x)$.

For a given choice of the Hamiltonians $\mathcal{E}_S$ and $\mathcal{E}_C$, in view of these considerations, let us define the joint source–channel partition function as the denominator of the posterior distribution, i.e.,
\[
Z(\beta|y) \triangleq \sum_{s\in\mathcal{S}^N}\exp\{-\beta[\mathcal{E}_S(s)+\mathcal{E}_C(x(s),y)]\}.
\]
In the course of studying the properties of a typical realization of the joint source–channel partition function, pertaining to a given code ensemble, we will make a few observations, which were already mentioned briefly in the Introduction:

1. Similarly to results that have already been observed in the context of the pure channel coding problem [12], the statistical–mechanical system pertaining to $Z(\beta|y)$ undergoes a phase transition, which corresponds, in the realm of coded systems, to the transition between reliable and unreliable communication, namely, the point at which the entropy rate of the source exceeds the mutual information between the input and the output of the channel.
2. When identifying the set of source vectors $\{s\}$ that dominate $Z(\beta|y)$ (i.e., those that contribute most to $Z(\beta|y)$) above the phase transition temperature, one observes a situation that parallels that of thermal equilibrium between two physical subsystems, one corresponding to the source and the other to the channel. To be more specific, if $\mathcal{E}(s,y) = \mathcal{E}_S(s)+\mathcal{E}_C(x(s),y)$ is thought of as the total 'energy' shared by the source and the code/channel, then the dominant messages $\{s\}$ split this total average energy between the source and the channel components in a way that corresponds to thermal equilibrium between the two parallel physical subsystems.

3. The balance between the thermodynamical entropies of the two physical subsystems that lie in equilibrium, as described in item no. 2, is identified with the simple relation between the corresponding Shannon entropies of the source, namely, the unconditional source entropy and the conditional entropy given the channel output, whose difference is the mutual information between the source and the channel output. This gives rise to a simple formula for the mutual information rate induced by a typical code in the ensemble.

In analogy to the definitions and the assumptions outlined in Section 3, we now make a few definitions and assumptions concerning the joint source–channel coding model.

A.1 Defining
\[
\Omega^{(S)}_{N,\delta}(\epsilon) \triangleq \left|\left\{s\in\mathcal{S}^N:\ (\epsilon-\delta/2)N \le \mathcal{E}_S(s) \le (\epsilon+\delta/2)N\right\}\right|,
\]
our first assumption is that
\[
\Sigma_S(\epsilon) \triangleq \lim_{\delta\to 0}\lim_{N\to\infty}\frac{\ln\Omega^{(S)}_{N,\delta}(\epsilon)}{N}
\]
exists and that $\Sigma_S(\epsilon)$ is a differentiable concave function.

A.2 For a given $y$, define
\[
\phi_{n,\delta}(\epsilon|y) \triangleq \frac{1}{n}\ln\Pr\{n(\epsilon-\delta/2) \le \mathcal{E}_C(X,y) \le n(\epsilon+\delta/2)\},
\]
where the random vector $X$ is drawn under the random coding distribution $M$, independently of $y$.
Then, our second assumption is that for all $\epsilon \ge 0$, $\lim_{\delta\to 0}\lim_{n\to\infty} E\{\phi_{n,\delta}(\epsilon|Y)\}$ tends uniformly to a differentiable function $\phi(\epsilon)$, where the expectation $E$ is w.r.t. both the random selection of the codebook and the random actions of the source and the channel. Moreover, we assume that $\lim_{\delta\to 0}\lim_{n\to\infty}\phi_{n,\delta}(\epsilon|Y)$ tends to $\phi(\epsilon)$ uniformly almost surely.

A.3 Let $\Sigma_S(\epsilon)$ and $\phi(\epsilon)$ be defined as above, and let $\Sigma_0(\epsilon)$ be defined by the weighted supremal convolution
$$\Sigma_0(\epsilon) \triangleq \max_{0 \le \epsilon' \le (1+\lambda)\epsilon} \left[\frac{\Sigma_S(\epsilon')}{1+\lambda} + \frac{\lambda}{1+\lambda}\,\phi\!\left(\frac{(1+\lambda)\epsilon - \epsilon'}{\lambda}\right)\right].$$
Our third assumption is that $\Sigma_0(\epsilon)$ is a concave function throughout the range of $\epsilon$ where it is non-negative.

We now define
$$\Sigma(\epsilon) = \begin{cases} \Sigma_0(\epsilon) & \Sigma_0(\epsilon) \ge 0 \\ -\infty & \Sigma_0(\epsilon) < 0 \end{cases}$$
As we shall see below, while $\Sigma_0(\epsilon)$ gives the logarithm of the expected number of configurations with total energy $\epsilon$, the function $\Sigma(\epsilon)$ gives the number of such configurations for a typical code in the ensemble. To see this, note that if $\Sigma_S(\epsilon') + \lambda\phi([(1+\lambda)\epsilon - \epsilon']/\lambda) < 0$ for all $\epsilon'$, then for every $\epsilon'$ the product of the number of configurations $\{s\}$ for which $E_S(s)$ is about $N\epsilon'$ and the probability that a randomly chosen codeword would provide the complementary energy $[(1+\lambda)\epsilon - \epsilon']/\lambda$ is less than one. This means that there is a very low probability of finding any configuration with total energy $\epsilon$, and so $\Sigma(\epsilon)$, which is the normalized logarithm of the number of such configurations (i.e., the thermodynamical entropy of the combined system), is equal to $-\infty$ for a typical code realization. Note that the concavity of $\Sigma_0(\epsilon)$ across the range where it is non-negative implies that $\Sigma(\epsilon)$ is concave as well.

In analogy to the discussion of the previous section, let us define
$$Z_S(\beta) \triangleq \sum_s \exp\{-\beta E_S(s)\}.$$
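To make the weighted supremal convolution in A.3 concrete, the following sketch (our own illustration, not from the paper) evaluates $\Sigma_0(\epsilon)$ by brute-force grid search for a hypothetical binary setting: $\Sigma_S(\epsilon') = h_2(\epsilon')$ (a binary memoryless source with $E_S$ equal to the Hamming weight) and $\phi(\epsilon) = h_2(\epsilon) - \ln 2$ (fair-coin codewords over a BSC, as in Example 1 below), with $\lambda = 1$. In this symmetric case the stationarity condition gives $\epsilon' = \epsilon$, so $\Sigma_0(\epsilon) = h_2(\epsilon) - (\ln 2)/2$.

```python
import math

def h2(x):
    """Binary entropy in nats; h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def sigma0(eps, lam=1.0, grid=20000):
    """Weighted supremal convolution
    max_{0 <= e' <= (1+lam)*eps} [Sigma_S(e')/(1+lam) + lam/(1+lam)*phi(((1+lam)*eps - e')/lam)]
    for Sigma_S = h2 and phi(e) = h2(e) - ln 2 (fair-coin BSC ensemble)."""
    best = -float("inf")
    hi = (1 + lam) * eps
    for k in range(grid + 1):
        ep = hi * k / grid
        arg = ((1 + lam) * eps - ep) / lam
        if 0.0 <= arg <= 1.0 and 0.0 <= ep <= 1.0:
            val = (h2(ep) + lam * (h2(arg) - math.log(2))) / (1 + lam)
            best = max(best, val)
    return best

# For lam = 1 the maximizing e' equals eps, so Sigma_0(eps) = h2(eps) - (ln 2)/2:
for eps in (0.1, 0.2, 0.3, 0.4):
    assert abs(sigma0(eps) - (h2(eps) - math.log(2) / 2)) < 1e-4
```

The grid search also makes it easy to see where $\Sigma_0(\epsilon)$ turns negative, i.e., where $\Sigma(\epsilon) = -\infty$ for a typical code, in accordance with assumption A.3.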
Then, $\psi_S(\beta) \triangleq \lim_{N\to\infty}\frac{1}{N}\ln Z_S(\beta)$ and $\Sigma_S(\epsilon)$ are a Legendre-transform pair. Since $\Sigma_S(\cdot)$ is assumed concave, the inverse transform relation
$$\Sigma_S(\epsilon) = \inf_{\beta \ge 0}[\beta\epsilon + \psi_S(\beta)]$$
holds true as well, and so the derivatives $\beta_S(\epsilon) \triangleq d\Sigma_S/d\epsilon$ and $\epsilon_S(\beta) = -d\psi_S/d\beta$ are inverses of each other. It follows then that the Shannon entropy rate $H(S)$ of $P(s)$ (which depends on $\beta$) agrees with $\Sigma_S(\epsilon)$ whenever $\beta$ and $\epsilon$ are related by $\beta = \beta_S(\epsilon)$, or equivalently, $\epsilon = \epsilon_S(\beta)$.

Referring to the partition function $Z(\beta|y)$, let us distinguish between the contribution of the true sequence $s_0$ that the source actually emitted, i.e.,
$$Z_c(\beta|y) = \exp\{-\beta[E_S(s_0) + E_C(x(s_0),y)]\},$$
and the contribution of all other (erroneous) source vectors,
$$Z_e(\beta|y) = \sum_{s \ne s_0} \exp\{-\beta[E_S(s) + E_C(x(s),y)]\}.$$
Now, $\ln Z_c(\beta|y)$ is typically around $-[E\{E_S(S)\} + E\{E_C(X(S),Y)\}]$. As for $Z_e(\beta|y)$, let us define
$$\Omega_{N,\delta}(\epsilon|y) = \{s \ne s_0 :\ N(1+\lambda)(\epsilon - \delta/2) \le E_S(s) + E_C(x(s),y) \le N(1+\lambda)(\epsilon + \delta/2)\}.$$
Then, similarly as in the previous section, one readily observes that for $\delta' = \delta(1+\lambda)$, we have:
$$\Omega_{N,\delta'}(\epsilon|y) = \sum_{j\ge 0} \Omega_{N,\delta}^{(S)}((j+1/2)\delta) \times \Pr\{N(1+\lambda)(\epsilon - \delta'/2) - N(j+1)\delta \le E_C(X,y) \le N(1+\lambda)(\epsilon + \delta'/2) - Nj\delta\}$$
$$= \sum_{j\ge 0} \Omega_{N,\delta}^{(S)}((j+1/2)\delta) \exp\{n\phi_{n,\delta}([(1+\lambda)\epsilon - (j+1/2)\delta]/\lambda\,|\,y)\}. \qquad (11)$$
Taking logarithms of both sides, dividing by $N + n = N(1+\lambda)$, letting $N$ grow without bound, and finally letting $\delta$ go to zero, we obtain^6 that:
$$\lim_{N\to\infty} \frac{\ln \hat\Omega_{N,\delta'}(\epsilon|Y)}{N(1+\lambda)} \overset{\mathrm{a.s.}}{=} \begin{cases} \Sigma_0(\epsilon) & \Sigma_0(\epsilon) \ge 0 \\ -\infty & \Sigma_0(\epsilon) < 0 \end{cases}$$
but the r.h.s. is exactly $\Sigma(\epsilon)$. Thus, as explained earlier, $\Sigma(\epsilon)$ is the thermodynamical entropy associated with the combined source-channel system.
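As a concrete sanity check of the Legendre-transform pair (our own illustration, assuming a binary memoryless source with $E_S(s)$ equal to the Hamming weight of $s$, so that $Z_S(\beta) = (1+e^{-\beta})^N$ and $\psi_S(\beta) = \ln(1+e^{-\beta})$), the inverse relation $\Sigma_S(\epsilon) = \inf_{\beta\ge 0}[\beta\epsilon + \psi_S(\beta)]$ recovers the binary entropy $h_2(\epsilon)$, and $\beta_S(\epsilon) = \ln[(1-\epsilon)/\epsilon]$ and $\epsilon_S(\beta) = 1/(1+e^\beta)$ are indeed inverses:

```python
import math

def h2(x):
    return 0.0 if x in (0.0, 1.0) else -x * math.log(x) - (1 - x) * math.log(1 - x)

def psi_S(beta):
    # ln-partition-function rate of a binary source with E_S = Hamming weight
    return math.log(1.0 + math.exp(-beta))

def sigma_S(eps, bmax=50.0, grid=100000):
    # Numerical inf over beta >= 0 of beta*eps + psi_S(beta)
    return min(b * eps + psi_S(b) for b in (bmax * k / grid for k in range(grid + 1)))

eps = 0.11
# The inverse Legendre transform recovers the source entropy rate:
assert abs(sigma_S(eps) - h2(eps)) < 1e-4

# beta_S and eps_S are inverse functions of each other:
beta = math.log((1 - eps) / eps)          # beta_S(eps)
assert abs(1.0 / (1.0 + math.exp(beta)) - eps) < 1e-12   # eps_S(beta_S(eps)) = eps
```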
The concavity of $\Sigma(\epsilon)$ then implies that it agrees (after the appropriate scaling) with the conditional Shannon entropy rate of the source given the channel output, $H(S|Y)$, i.e., the entropy rate pertaining to the sequence of conditional probabilities $P(s|y)$ defined above. For a given $\epsilon$ in the range where $\Sigma(\epsilon)$ is finite, let $\epsilon' = \epsilon^*$ achieve the supremum defining $\Sigma(\epsilon)$.

(Footnote 6: At this point, we are using the fact [12],[11] that for an ensemble of independently selected codewords, the number of codewords that contribute energy $E_C(X,y) \approx n[(1+\lambda)\epsilon - \epsilon']/\lambda$ is, with very high probability, zero if $\Sigma_S(\epsilon') + \lambda\phi([(1+\lambda)\epsilon - \epsilon']/\lambda) < 0$, and around $\exp\{N[\Sigma_S(\epsilon') + \lambda\phi([(1+\lambda)\epsilon - \epsilon']/\lambda)]\}$ if $\Sigma_S(\epsilon') + \lambda\phi([(1+\lambda)\epsilon - \epsilon']/\lambda) > 0$. The assumption of independent codewords can be relaxed as long as this concentration property continues to hold.)

At this point, one should distinguish between two situations. In the first situation, $\epsilon$ is on the boundary of the range where $\Sigma(\epsilon)$ is finite and positive, namely, $\Sigma(\epsilon) = 0$. In this case, the partition function $Z(\beta|y)$ (and hence also $P_\beta(s|y)$) is dominated by a subexponential number of configurations $\{s\}$, and so the entropy rate $H(S|Y) = 0$, which means that the system is frozen in its glassy phase (cf. [12],[11] and references therein). In the second situation, $\epsilon$ is an internal point of the range where $\Sigma(\epsilon) > 0$, where we will also assume that $\epsilon^* \in (0, (1+\lambda)\epsilon)$; this is the paramagnetic phase (or the disordered phase) of $Z_e(\beta|y)$. Then, the derivative of the function being maximized vanishes, i.e.,
$$\left.\frac{d\Sigma_S(\epsilon')}{d\epsilon'}\right|_{\epsilon'=\epsilon^*} - \left.\frac{d\phi(\epsilon'')}{d\epsilon''}\right|_{\epsilon''=[(1+\lambda)\epsilon-\epsilon^*]/\lambda} = 0,$$
or equivalently,
$$\Sigma_S'(\epsilon^*) = \phi'\!\left(\frac{(1+\lambda)\epsilon - \epsilon^*}{\lambda}\right), \qquad (12)$$
where $\Sigma_S'$ and $\phi'$ denote the derivatives of $\Sigma_S$ and $\phi$, respectively. As before, eq.
(12) gives rise to thermal equilibrium between the physical system corresponding to the source and the one that pertains to the code/channel. Next, observe that the left-hand side is exactly $\beta_S(\epsilon^*)$. Thus,
$$\beta_S(\epsilon^*) = \phi'\!\left(\frac{(1+\lambda)\epsilon - \epsilon^*}{\lambda}\right),$$
which means that given the value of the total per-particle energy $\epsilon$, we can find how the dominant codewords split the energy between the source and the channel: we can solve the above equation with the given $\epsilon$, with $\epsilon^*$ as an unknown. Then, the source contribution will be $\epsilon^*$ and the channel contribution will be $[(1+\lambda)\epsilon - \epsilon^*]/\lambda$.

The discussion above holds for every value of $\epsilon$ for which $\Sigma(\epsilon) > 0$. The dominant value of $\epsilon$ is $\epsilon_0$, the one that achieves $E\{\ln Z(\beta|Y)\}/[N(1+\lambda)]$ for large $N$, in other words, the achiever of:
$$\psi(\beta) = \lim_{N\to\infty} \frac{E\{\ln Z(\beta|Y)\}}{N(1+\lambda)} = \sup_{\epsilon \ge 0}[\Sigma(\epsilon) - \beta\epsilon].$$
Thus, the dominant value of $\epsilon$, which is relevant for the previous paragraph, is $\epsilon_0$, which in turn depends only on $\beta$. But since $\Sigma$ is assumed concave, $\psi$ and $\Sigma$ are also a Legendre-transform pair, and so $\epsilon_0$ and $\beta$ are related via the derivatives, $\epsilon_0 = \epsilon(\beta) \triangleq -\psi'(\beta)$ and $\beta = \beta(\epsilon) = \Sigma'(\epsilon)$, where again, primes denote derivatives. In summary, given $\beta$, $\epsilon_0 = \epsilon(\beta)$ and $\epsilon^* = \epsilon_S(\beta)$. Thus, $\beta_S(\epsilon^*)$ in the equilibrium equation is $\beta_S(\epsilon_S(\beta)) \equiv \beta$, since $\beta_S(\cdot)$ and $\epsilon_S(\cdot)$ are inverses of one another. Thus, the equilibrium equation applied to the dominant energy $\epsilon_0$ becomes
$$\beta = \Sigma_S'(\epsilon^*) = \phi'\!\left(\frac{(1+\lambda)\epsilon_0 - \epsilon^*}{\lambda}\right).$$
If, in addition, $\phi$ is concave, then $\phi'$ is monotone and thus has an inverse, which is given by $-\zeta'$, the negative of the derivative of the Legendre transform $\zeta(t) = \sup_\epsilon[\phi(\epsilon) - \epsilon t]$, and then
$$\frac{(1+\lambda)\epsilon_0 - \epsilon^*}{\lambda} = -\zeta'(\beta).$$
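To illustrate how the equilibrium equation pins down the energy split, here is a small numerical sketch (an illustration under assumed characteristics, not from the paper: a binary source with $\beta_S(\epsilon) = \ln[(1-\epsilon)/\epsilon]$ and a fair-coin BSC ensemble with $\phi'(\epsilon) = \ln[(1-\epsilon)/\epsilon]$, as in Example 1 below). Solving $\beta_S(\epsilon^*) = \phi'([(1+\lambda)\epsilon_0 - \epsilon^*]/\lambda)$ by bisection shows that, since the two subsystems happen to have identical temperature-energy characteristics here, the per-particle energy splits equally: $\epsilon^* = \epsilon_0$ for every $\lambda$.

```python
import math

def beta_S(e):      # dSigma_S/deps for the binary source, Sigma_S = h2
    return math.log((1 - e) / e)

def phi_prime(e):   # dphi/deps for the fair-coin BSC ensemble, phi = h2 - ln 2
    return math.log((1 - e) / e)

def equilibrium_split(eps0, lam, tol=1e-12):
    """Bisect on eps* in (0, (1+lam)*eps0) for
    beta_S(eps*) = phi_prime(((1+lam)*eps0 - eps*)/lam).
    The bracketed function is strictly decreasing, so the root is unique."""
    lo = 1e-9
    hi = min(1.0, (1 + lam) * eps0) - 1e-9
    f = lambda e: beta_S(e) - phi_prime(((1 + lam) * eps0 - e) / lam)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# With identical characteristics on both sides, the split is symmetric:
for lam in (0.5, 1.0, 2.0):
    assert abs(equilibrium_split(0.1, lam) - 0.1) < 1e-6
```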
Now, observe that if, for a typical $y$, either $Z_c(\beta|y)$ dominates $Z_e(\beta|y)$, or $Z_e(\beta|y)$ is in its frozen phase, then $H(S|Y)$ vanishes, and so the mutual information rate $\lim_{N\to\infty} I(S;Y)/N = H(S)$. For the complementary case, our main result is the following:

Theorem 1. Let $E\{I(S;Y)\}$ denote the expected mutual information, where the expectation is taken w.r.t. the ensemble of joint source-channel codes. Then, under Assumptions A.1-A.3:
$$\lim_{N\to\infty} \frac{E\{I(S;Y)\}}{N} = -\lambda\phi(-\zeta'(\beta)),$$
provided that $\Sigma(\epsilon_0) > 0$.

Remark: From the above discussion, it is apparent that this result applies also to the almost-sure limit of $I(S;Y)/N$ w.r.t. the code ensemble.

Proof.
$$\lim_{N\to\infty} \frac{E\{I(S;Y)\}}{N} = H(S) - H(S|Y) = \Sigma_S(\epsilon^*) - (1+\lambda)\Sigma(\epsilon_0) = -\lambda\phi\!\left(\frac{(1+\lambda)\epsilon_0 - \epsilon^*}{\lambda}\right) = -\lambda\phi(-\zeta'(\beta)). \qquad (13)$$

Discussion. We have obtained, then, a very simple formula that depends solely on the random coding distribution. But what is the meaning of $\zeta'(\beta)$? Since $-\phi(\epsilon)$ is, in fact, the large deviations rate function for the event $E_C(X,y) \le n\epsilon$, and $\zeta(t)$ is its Legendre transform, it must be the almost-sure limit of the log-moment generating function, that is,
$$\zeta(t) \overset{\mathrm{a.s.}}{=} \lim_{n\to\infty} \frac{1}{n} \ln \sum_{x\in X^n} M(x) e^{-tE_C(x,Y)},$$
where, as defined above, $M$ is the random coding distribution that governs each one of the independent, randomly selected codewords. Thus,
$$-\zeta'(\beta) \overset{\mathrm{a.s.}}{=} \lim_{n\to\infty} \frac{1}{n} \cdot \frac{\sum_x M(x)\, E_C(x,Y)\, e^{-\beta E_C(x,Y)}}{\sum_x M(x)\, e^{-\beta E_C(x,Y)}}.$$
But the Boltzmann weight $e^{-\beta E_C(x,y)}$ is proportional to $W(y|x)$, and so $-\zeta'(\beta)$ is exactly the asymptotic almost-sure normalized conditional expectation of the energy, $\lim_{n\to\infty} E\{E_C(X,Y)|Y\}/n$, stemming from the action of the channel on the message $x(s_0)$ that was actually transmitted.
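Theorem 1 can be checked numerically in the simplest case (our own sketch, anticipating Example 1 below: fair-coin codewords over a BSC($p$) with the Hamming-distance Hamiltonian and $\beta = \ln[(1-p)/p]$). The per-symbol log-moment generating function is $\zeta(t) = \ln[(1+e^{-t})/2]$; its negative derivative at $\beta$ recovers $p$, the expected per-symbol channel energy, and the theorem then gives $\lambda(\ln 2 - h_2(p))$:

```python
import math

def h2(x):
    return 0.0 if x in (0.0, 1.0) else -x * math.log(x) - (1 - x) * math.log(1 - x)

def zeta(t):
    # Per-symbol log-MGF of -t*E_C for fair-coin codewords, E_C = Hamming distance:
    # (1/2)*e^{-t*0} + (1/2)*e^{-t*1}, independent of y by symmetry.
    return math.log((1.0 + math.exp(-t)) / 2.0)

def minus_zeta_prime(t, h=1e-6):
    # Central-difference numerical derivative of -zeta at t
    return -(zeta(t + h) - zeta(t - h)) / (2 * h)

p, lam = 0.1, 2.0
beta = math.log((1 - p) / p)

# -zeta'(beta) recovers the channel crossover probability p:
assert abs(minus_zeta_prime(beta) - p) < 1e-6

# Theorem 1: lim E{I(S;Y)}/N = -lam*phi(-zeta'(beta)) = lam*(ln 2 - h2(p)):
phi = lambda e: h2(e) - math.log(2)
mi_rate = -lam * phi(minus_zeta_prime(beta))
assert abs(mi_rate - lam * (math.log(2) - h2(p))) < 1e-5
```

In other words, $-\zeta'(\beta)$ is here the expected normalized Hamming distance between the transmitted codeword and the channel output, exactly the conditional energy expectation identified above.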
This quantity, in turn, is assumed to concentrate about its mean, which is $\lim_{n\to\infty} E\{E_C(X,Y)\}/n$. Thus, $Z_e(\beta|y)$ and $P(s|y)$ are dominated by (erroneous) sequences $\{s\}$ whose normalized energy $\epsilon_0$ consists of a source contribution $\epsilon^* = \lim_{N\to\infty} E\{E_S(S)\}/N$ and a channel contribution $[(1+\lambda)\epsilon_0 - \epsilon^*]/\lambda$ that agrees with the normalized energy generated by the noise, i.e., it agrees with $\lim_{n\to\infty} E\{E_C(X,Y)\}/n$, where $X$ and $Y$ are related via the channel $W$. Moreover, this is also the typical energy composition of the true message $s_0$ that was actually transmitted (cf. the definition of $Z_c(\beta|y)$). Thus, the above conclusion holds true regardless of whether the entropy rate of the source is smaller (in which case $s_0$ dominates $Z(\beta|y)$) or larger than $\lambda$ times the normalized mutual information between $X$ and $Y$ (in which case erroneous messages dominate $Z(\beta|y)$ for a typical $y$). We have already seen this behavior in the special case of the binary source and the BSC.

Example 1. Suppose that the channel is a BSC and the codewords are generated by fair coin tossing. In this case, $W(y|x)$ is proportional to $\exp\{-\beta E_C(x,y)\}$, where $E_C(x,y)$ is the Hamming distance and $\beta = \ln\frac{1-p}{p}$. Here, $\phi(\epsilon) = h_2(\epsilon) - \ln 2$, whose derivative is $\phi'(\epsilon) = \ln\frac{1-\epsilon}{\epsilon}$, and so $-\zeta'(\beta)$, the inverse of $\phi'$, is given by $-\zeta'(\beta) = 1/(1+e^\beta) = p$. It follows then that if, in addition, the source is binary and memoryless with parameter $q$, then $P(s|y)$ is dominated by vectors $\{s\}$ whose energy is as described in the Introduction. Also, the normalized mutual information is $-\lambda\phi(-\zeta'(\beta)) = -\lambda\phi(p) = \lambda(\ln 2 - h_2(p))$. Somewhat more generally, let each coordinate $X_i(s)$, $i = 1, \ldots, n$, of each codeword be drawn i.i.d.
with probabilities $\Pr\{X_i(s) = 1\} = 1 - \Pr\{X_i(s) = 0\} = m$. Then, it is easy to show (using the method of types [2]) that
$$-\phi(p) = \min_{\{P_{X|Y}:\ E d(X,Y) \le p\}} [I(X;Y) + D(P_X \| M)], \qquad Y \sim \mathrm{Bernoulli}(m*p),$$
where $m*p$ denotes the binary convolution of $m$ and $p$ (i.e., $m*p = m(1-p) + p(1-m)$), $d(\cdot,\cdot)$ is the Hamming distance, and $P_X$ is the marginal of $X$ induced by $Y$ (which is Bernoulli($m*p$)) and the reversed channel $P_{X|Y}$, which is to be optimized. By eliminating the divergence term, we lower bound $-\phi(p)$ by the rate-distortion function of $Y$ at Hamming distortion $p$, which is $h_2(m*p) - h_2(p)$. On the other hand, returning to the original minimization problem, by selecting $P_{X|Y}$ (instead of minimizing over $P_{X|Y}$) to be the reverse channel induced by $M$ and $W_{Y|X}$ (which is the BSC($p$)), we obtain the same quantity also as an upper bound. Thus, $-\phi(p) = h_2(m*p) - h_2(p)$, and so,
$$\lim_{N\to\infty} \frac{E\{I(S;Y)\}}{N} = \lambda[h_2(m*p) - h_2(p)].$$

Comment: An alternative view of the derivation of the asymptotic mutual information rate between $S$ and $Y$ comes from the following chain of equalities:
$$\lim_{N\to\infty} \frac{E\{I(S;Y)\}}{N} = \lim_{N\to\infty} \frac{1}{N} E\left\{\ln \frac{P(Y|S)}{P(Y)}\right\}$$
$$= \lim_{N\to\infty} \frac{1}{N} E\{\ln \exp\{-\beta E_C(X(S),Y)\}\} - \lim_{N\to\infty} \frac{1}{N} E\left\{\ln\left[\sum_s \frac{1}{Z_S(\beta)} \exp\{-\beta[E_S(s) + E_C(X(s),Y)]\}\right]\right\}$$
$$= -\beta[(1+\lambda)\epsilon_0 - \epsilon^*] + \psi_S(\beta) - \Sigma_S(\epsilon^*) - \lambda\phi\!\left(\frac{(1+\lambda)\epsilon_0 - \epsilon^*}{\lambda}\right) + \beta(1+\lambda)\epsilon_0$$
$$= \beta\epsilon^* + \psi_S(\beta) - \Sigma_S(\epsilon^*) - \lambda\phi\!\left(\frac{(1+\lambda)\epsilon_0 - \epsilon^*}{\lambda}\right)$$
$$= -\lambda\phi\!\left(\frac{(1+\lambda)\epsilon_0 - \epsilon^*}{\lambda}\right), \qquad (14)$$
where we have used the fact that the summation over $s$ is dominated by configurations with per-particle energy $\epsilon_0$, which is allocated as $\epsilon^*$ and $[(1+\lambda)\epsilon_0 - \epsilon^*]/\lambda$.

5 Application to the Wiretap Channel

In this section, we demonstrate how our results apply to the wiretap channel.
Wyner, in his well-known paper on the wiretap channel [14], studied the problem of secure communication across a degraded broadcast channel, without using a secret key, where the legitimate receiver has access to the output of the good channel and the wiretapper receives the output of the bad channel. In that paper, Wyner characterized the optimum trade-off between reliable coding rates and the equivocation at the wiretapper, which was defined in terms of the conditional entropy of the source given the output of the bad channel, observed by the wiretapper.

Consider a DMS $P$ as before, and a cascade of two finite-alphabet DMC's, $W_{Y|X}$ followed immediately by $W_{Z|Y}$, both^7 operating at a relative rate of $\lambda$ channel symbols per source symbol. The source $s \in S^N$ is encoded into a channel input vector $x(s) \in X^n$, $n = \lambda N$, and then transmitted. A code for the wiretap channel should be designed in such a way that, on the one hand, the legitimate receiver can estimate the source $s$ from the output $y \in Y^n$ of the channel $W_{Y|X}$ within an arbitrarily small probability of error, whereas on the other hand, the eavesdropper, which has access to $z \in Z^n$, should be able to learn as little as possible about the source, in the sense that the asymptotic equivocation, $\Delta = \limsup_{N\to\infty} H(S|Z)/N$, should be as large as possible. Wyner showed [14] that the largest achievable value of $\Delta$ is given by $\lambda\Gamma(H(S)/\lambda)$, where
$$\Gamma(R) \triangleq \max_{P_X:\ I(X;Y) \ge R} [I(X;Y) - I(X;Z)].$$
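As a concrete illustration of $\Gamma(R)$ (our own sketch, not from the paper, for a hypothetical degraded binary symmetric cascade: $W_{Y|X} = \mathrm{BSC}(p_1)$ and $W_{Z|Y} = \mathrm{BSC}(p_2)$, so that $X \to Z$ is effectively a BSC($p_1 * p_2$)), we restrict the maximization to Bernoulli($m$) inputs, for which $I(X;Y) = h_2(m*p_1) - h_2(p_1)$ and $I(X;Z) = h_2(m*p_1*p_2) - h_2(p_1*p_2)$, and evaluate $\Gamma(R)$ by grid search; $\Gamma(0)$ then recovers $h_2(p_1*p_2) - h_2(p_1)$, achieved by the uniform input:

```python
import math

def h2(x):
    return 0.0 if x in (0.0, 1.0) else -x * math.log(x) - (1 - x) * math.log(1 - x)

def conv(a, b):            # binary convolution a*b = a(1-b) + b(1-a)
    return a * (1 - b) + b * (1 - a)

def gamma(R, p1, p2, grid=10000):
    """Wyner's Gamma(R) = max_{P_X: I(X;Y) >= R} [I(X;Y) - I(X;Z)], evaluated
    over Bernoulli(m) inputs for the degraded cascade BSC(p1) -> BSC(p2)."""
    q = conv(p1, p2)       # effective crossover of the cascade X -> Z
    best = None
    for k in range(grid + 1):
        m = k / grid
        Ixy = h2(conv(m, p1)) - h2(p1)
        Ixz = h2(conv(m, q)) - h2(q)
        if Ixy >= R:
            d = Ixy - Ixz
            best = d if best is None else max(best, d)
    return best

p1, p2 = 0.1, 0.2
# Gamma(0) equals h2(p1*p2) - h2(p1), attained at the uniform input m = 1/2:
Cs = h2(conv(p1, p2)) - h2(p1)
assert abs(gamma(0.0, p1, p2) - Cs) < 1e-6
```

The restriction to Bernoulli inputs is an assumption of this sketch; for these symmetric channels it loses nothing, since the maximizing input is uniform.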
In particular, the secrecy capacity $C_s$, which is the solution to the equation $R = \Gamma(R)$, is the rate at which the potential secrecy that the wiretap channel can offer is fully exploited: If the entropy of the source, $H(S)/\lambda$, is less than or equal to $C_s$ (supposing that $\lambda$ can be chosen in such a way), then the coding scheme of [14] that asymptotically achieves $C_s$ works as follows. Let $X^*$ be the random variable $X$ that achieves $\Gamma(R)$, for some $R$ in the range $H(S)/\lambda \le R \le C_s$, and let $Y^*$ and $Z^*$ be the corresponding outputs of the two channels. We first compress the source $S$ to its entropy, and then apply channel coding so that the good receiver can still decode reliably for large $N$ and $n$, but the bad one cannot. Now, since $H(S)/\lambda \le C_s$, then by the definitions of $\Gamma(\cdot)$ and $C_s$, $I(X^*;Y^*) \ge H(S)/\lambda + I(X^*;Z^*)$. Accordingly, the channel codebook is composed of about $e^{NH(S)} = e^{nH(S)/\lambda}$ bins (one for each typical source sequence), each of size slightly less than $e^{nI(X^*;Z^*)}$. The codeword actually transmitted is randomly chosen among all codewords of the bin pertaining to the index of the compressed source sequence. Note that the eavesdropper could have decoded the message had it been informed of the bin to which the transmitted codeword belongs, since the rate of the bin, as said, is (slightly) less than $I(X^*;Z^*)$. The idea then is that this information is irrelevant, since it is independent of the source vector, and so it does not help the eavesdropper in learning anything about the source.

(Footnote 7: The notation of the output of the second channel, $Z$, should not be confused with the notation of the partition function, since we do not refer to the partition function in this section.)
Indeed, if we represent the transmitted codeword $x$ as $f(c(s), u)$, where $c(s)$ stands for the bit string of the lossless compression of $s$, indicating the bin index using $nH(S)/\lambda$ nats, and $u$ is an independent random bit string of length $nI(X^*;Z^*)$ nats, then we have the following. On the one hand,
$$H(X|Z) \le H(c(S), U|Z) = H(c(S)|Z) + H(U|Z, c(S)),$$
where the term $H(U|Z, c(S))$ essentially vanishes since, as mentioned above, every bin forms a channel sub-code that is reliably decodable by the eavesdropper. On the other hand,
$$H(X|Z) = H(X) - I(X;Z),$$
thus the equivocation achieved is:
$$H(S|Z) \ge H(c(S)|Z) \sim H(X) - I(X;Z),$$
where the first term on the r.h.s. is essentially $n[H(S)/\lambda + I(X^*;Z^*)]$ and the second term, which is a mutual information induced by a code above capacity, can be evaluated using our above results, provided that the channel code is randomly selected from an ensemble that satisfies our assumptions. For example, if the codewords are chosen i.i.d. according to the distribution of $X^*$, then $I(X;Z)$ is approximately $nI(X^*;Z^*)$, and then full secrecy is achieved, as $H(S|Z)/N$ is essentially equal to $H(S)$. Nonetheless, since the rate of the code, $H(S)/\lambda + I(X^*;Z^*)$, is less than $I(X^*;Y^*)$, the legitimate decoder can still decode reliably. Our results can also be used to assess the secrecy achieved by random coding distributions other than i.i.d. according to $X^*$, while ensuring that the good decoder can still decode reliably.
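The rate accounting in the last two paragraphs can be made explicit with a small numeric sketch (all channel and source parameters below are our own hypothetical choices, not from the paper): for a degraded binary symmetric cascade with uniform-input codewords, the bin structure satisfies both feasibility constraints, and the equivocation bound collapses to full secrecy, $H(S|Z)/N = H(S)$.

```python
import math

def h2(x):
    return 0.0 if x in (0.0, 1.0) else -x * math.log(x) - (1 - x) * math.log(1 - x)

def conv(a, b):                  # binary convolution a*b
    return a * (1 - b) + b * (1 - a)

# Hypothetical setting: BSC(p1) to the legitimate receiver, effective BSC(p1*p2)
# to the eavesdropper, lam channel symbols per source symbol, binary memoryless
# source with parameter q_src.
p1, p2, lam, q_src = 0.1, 0.2, 2.0, 0.11
q = conv(p1, p2)

H_S  = h2(q_src)                 # source entropy rate (nats per source symbol)
I_XY = math.log(2) - h2(p1)      # I(X*;Y*) for the uniform input
I_XZ = math.log(2) - h2(q)       # I(X*;Z*) for the uniform input
Cs   = h2(q) - h2(p1)            # secrecy capacity of this cascade

rate = H_S / lam + I_XZ          # bins of rate H(S)/lam, each of size ~ e^{n I(X*;Z*)}
assert H_S / lam <= Cs           # the source is compressible enough for full secrecy
assert rate <= I_XY              # the legitimate decoder can still decode reliably

# Equivocation accounting: H(S|Z)/N >= lam*[H(X)/n - I(X;Z)/n] = H(S) (full secrecy)
equivocation_rate = lam * (rate - I_XZ)
assert abs(equivocation_rate - H_S) < 1e-9
```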
6 Extension to Multiuser Settings

The above ideas can be extended in a natural manner to multiuser communication situations, and in this section, we demonstrate this for the multiple access channel (MAC), where the underlying principle is again thermal equilibrium between the subsystems pertaining to the different users and that of the channel. As before, our focus is on the regime where reliable communication cannot hold (the paramagnetic phase).

As an example, consider a randomly selected joint source-channel code for a MAC with two users, in the following setting. We are given two independent sources, $S_1, S_2, \ldots$ and $T_1, T_2, \ldots$, governed by probability distributions $P_S(\cdot)$ and $P_T(\cdot)$, which are proportional to $\exp\{-\beta E_S(\cdot)\}$ and to $\exp\{-\beta E_T(\cdot)\}$, with partition functions $Z_S(\beta)$ and $Z_T(\beta)$, respectively. Each $N$-vector of the first source, $s = (s_1, \ldots, s_N) \in S^N$, is encoded into a channel input vector $x_S(s) \in X_S^n$, and each $N$-vector of the second source, $t = (t_1, \ldots, t_N) \in T^N$, is encoded into a channel input vector $x_T(t) \in X_T^n$. Both codebooks are selected independently, where each codevector of the first code is chosen independently according to distribution $M_S$ and each codevector of the second codebook is selected independently according to distribution $M_T$. Both codewords are fed into a memoryless MAC $W(y|x_S, x_T)$, which is proportional to $\exp\{-\beta E_C(x_S, x_T, y)\}$. If we wish to estimate the mutual information $E\{I(S,T;Y)\}$ induced by the code, this is quite a trivial extension of the former derivation. But what about $E\{I(S;Y)\}$? Here, it will be more convenient to adopt the alternative derivation of eq. (14).
Considering the partition function
$$Z(\beta|y) = \sum_{s,t} \exp\{-\beta[E_S(s) + E_T(t) + E_C(x_S(s), x_T(t), y)]\},$$
let $\epsilon_S^*$, $\epsilon_T^*$, and $\epsilon_C^*$ denote the dominant energies allocated to the source $S$, the source $T$, and the MAC, respectively. Also, for a typical randomly chosen codeword $x_S(s)$ of the source message $s$ actually transmitted, let us define $e^{n\phi_{n,\delta}(\epsilon|x_S(s),y)}$ as the probability (under $M_T$) that $E_C(x_S(s), X_T, y)$ is between $n(\epsilon - \delta/2)$ and $n(\epsilon + \delta/2)$, for given $x_S(s)$ and $y$, and assume that as $n \to \infty$ and then $\delta \to 0$, $\phi_{n,\delta}(\epsilon|x_S(s), y)$ tends uniformly almost surely to a certain function, which will be denoted by $\phi(\epsilon|S)$. Now,
$$\lim_{N\to\infty} \frac{E\{I(S;Y)\}}{N} = \lim_{N\to\infty} \frac{1}{N} E\{\ln P(Y|S)\} - \lim_{N\to\infty} \frac{1}{N} E\{\ln P(Y)\}$$
$$= \lim_{N\to\infty} \frac{1}{N} E\left\{\ln\left[\frac{1}{Z_T(\beta)} \sum_t \exp\{-\beta[E_T(t) + E_C(X_S(S), X_T(t), Y)]\}\right]\right\}$$
$$\quad - \lim_{N\to\infty} \frac{1}{N} E\left\{\ln\left[\frac{1}{Z_S(\beta) Z_T(\beta)} \sum_{s,t} \exp\{-\beta[E_S(s) + E_T(t) + E_C(X_S(s), X_T(t), Y)]\}\right]\right\}$$
$$= \psi_S(\beta) + \Sigma_T(\epsilon_T^*) + \lambda\phi(\epsilon_C^*|S) - \beta(\epsilon_C^* + \epsilon_T^*) - \Sigma_T(\epsilon_T^*) - \Sigma_S(\epsilon_S^*) - \lambda\phi(\epsilon_C^*) + \beta(\epsilon_S^* + \epsilon_T^* + \epsilon_C^*)$$
$$= \lambda[\phi(\epsilon_C^*|S) - \phi(\epsilon_C^*)] \qquad (15)$$
The last line of the above chain of equalities can be intuitively explained as follows: The term $-\lambda\phi(\epsilon_C^*)$ stands for $\lim_{N\to\infty} E\{I(S,T;Y)\}/N$, by the same reasoning as before (viewing the pair $(S,T)$ as a single entity). Similarly, the term $-\lambda\phi(\epsilon_C^*|S)$ corresponds to the conditional mutual information rate $\lim_{N\to\infty} E\{I(T;Y|S)\}/N$, since the true $S$ is given and only the random codeword of $T$ is selected. Thus, by the chain rule of mutual information, the difference gives the mutual information rate between $S$ and $Y$.

Example 2.
Consider the binary modulo-2 additive MAC, $Y = X_S \oplus X_T \oplus V$, where all variables take on values in $\{0,1\}$, $\oplus$ denotes addition modulo 2 (XOR), and $V$ is Bernoulli with parameter $p = \Pr\{V = 1\}$, independent of $X_T$ and $X_S$. Similarly as in Example 1, let the codebooks of the two users be generated by i.i.d. distributions with parameters $m_S$ and $m_T$, respectively. Now, as before, $\epsilon_C^* = p$, and the probability that $X_S \oplus X_T$, whose components are Bernoulli($m_S * m_T$), would fall within distance $np$ from a typical $y$, whose components are Bernoulli($m_S * m_T * p$), is of the exponential order of $e^{n[h_2(p) - h_2(m_S * m_T * p)]}$; thus $\phi(p) = h_2(p) - h_2(m_S * m_T * p)$. On the other hand, the probability of the same event conditioned on $x_S$ is the probability that $X_T$ would fall within distance $np$ from $y \oplus x_S = x_T \oplus v$ (which has Bernoulli($m_T * p$) components), and is thus of the exponential order of $e^{n\phi(p|S)} = e^{n[h_2(p) - h_2(m_T * p)]}$. It follows then that
$$\lim_{N\to\infty} \frac{E\{I(S;Y)\}}{N} = \lambda[h_2(m_S * m_T * p) - h_2(m_T * p)].$$
In the special case where $m_T = 1/2$, we get $\lim_{N\to\infty} I(S;Y)/N = 0$ regardless of $m_S$, in agreement with intuition, as $X_T$ behaves like Bernoulli(1/2) noise in the paramagnetic regime.

References

[1] S. Arimoto, "On the converse to the coding theorem for discrete memoryless channels," IEEE Trans. Inform. Theory, vol. IT-19, no. 5, pp. 357-359, May 1973.

[2] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.

[3] B. Derrida, "Random-energy model: limit of a family of disordered models," Phys. Rev. Lett., vol. 45, no. 2, pp. 79-82, July 1980.

[4] B. Derrida, "The random energy model," Physics Reports (Review Section of Physics Letters), vol. 67, no. 1, pp. 29-35, 1980.

[5] B.
Derrida, "Random-energy model: an exactly solvable model of disordered systems," Phys. Rev. B, vol. 24, no. 5, pp. 2613-2626, September 1981.

[6] G. Dueck and J. Körner, "Reliability function of a discrete memoryless channel at rates above capacity," IEEE Trans. Inform. Theory, vol. IT-25, no. 1, pp. 82-85, January 1979.

[7] G. D. Forney, Jr., "Exponential error bounds for erasure, list, and decision feedback schemes," IEEE Trans. Inform. Theory, vol. IT-14, no. 2, pp. 206-220, March 1968.

[8] R. G. Gallager, Information Theory and Reliable Communication, J. Wiley & Sons, 1968.

[9] D. Guo, S. Shamai, and S. Verdú, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Trans. Inform. Theory, vol. 51, no. 4, pp. 1261-1282, April 2005.

[10] C. Kittel, Elementary Statistical Physics, John Wiley & Sons, 1958.

[11] N. Merhav, "The random energy model in a magnetic field and joint source-channel coding," Physica A: Statistical Mechanics and Its Applications, vol. 387, issue 22, pp. 5662-5674, September 15, 2008. doi:10.1016/j.physa.2008.05.040

[12] M. Mézard and A. Montanari, Information, Physics and Computation, draft, November 2007 [http://www.stanford.edu/~montanar/BOOK/book.html].

[13] P. Ruján, "Finite temperature error-correcting codes," Phys. Rev. Lett., vol. 70, no. 19, pp. 2968-2971, May 1993.

[14] A. D. Wyner, "The wire-tap channel," Bell System Technical Journal, vol. 54, no. 8, pp. 1355-1387, October 1975.