Achieving the Gaussian Rate-Distortion Function by Prediction

The "water-filling" solution for the quadratic rate-distortion function of a stationary Gaussian source is given in terms of its power spectrum. This formula naturally lends itself to a frequency domain "test-channel" realization. We provide an alter…

**Authors:** Ram Zamir, Yuval Kochman and Uri Erez (Dept. Electrical Engineering-Systems, Tel Aviv University)

**Achieving the Gaussian Rate-Distortion Function by Prediction**

Ram Zamir, Yuval Kochman and Uri Erez
Dept. Electrical Engineering-Systems, Tel Aviv University

**Abstract:** The "water-filling" solution for the quadratic rate-distortion function of a stationary Gaussian source is given in terms of its power spectrum. This formula naturally lends itself to a frequency domain "test-channel" realization. We provide an alternative time-domain realization for the rate-distortion function, based on linear prediction. The predictive test-channel has some interesting implications, including the optimality at all distortion levels of pre/post filtered vector-quantized differential pulse code modulation (DPCM), and a duality relationship with decision-feedback equalization (DFE) for inter-symbol interference (ISI) channels.

**Keywords:** Test channel, water-filling, pre/post-filtering, DPCM, Shannon lower bound, ECDQ, directed information, equalization, MMSE estimation, decision feedback.

**I. INTRODUCTION**

The water-filling solution for the quadratic rate-distortion function $R(D)$ of a stationary Gaussian source is given in terms of the spectrum of the source. Similarly, the capacity $C$ of a power-constrained ISI channel with Gaussian noise is given by a water-filling solution relative to the effective noise spectrum. Both these formulas amount to limiting values of mutual information between vectors in the frequency domain. In contrast, linear prediction along the time domain can translate these vector mutual-information quantities into scalar ones. Indeed, for capacity, Cioffi et al. [4] showed that $C$ is equal to the scalar mutual information over a slicer embedded in a decision-feedback noise-prediction loop. We show that a parallel result holds for the rate-distortion function: $R(D)$ is equal to the scalar mutual information over an additive white Gaussian noise (AWGN) channel embedded in a source prediction loop, as shown in Figure 1. This result implies that $R(D)$ can essentially be realized in a sequential manner (as will be clarified later), and it joins other observations regarding the role of minimum mean-square error (MMSE) estimation in successive encoding and decoding of Gaussian channels and sources [7], [6], [3].

**The Quadratic-Gaussian Rate-Distortion Function**

The rate-distortion function (RDF) of a stationary source with memory is given as a limit of normalized mutual information associated with vectors of source samples. For a real-valued source $\{X_n\} = \ldots, X_{-2}, X_{-1}, X_0, X_1, X_2, \ldots$ and expected mean-squared distortion level $D$, the RDF can be written as [2]

$$R(D) = \lim_{n\to\infty} \frac{1}{n} \inf I(X_1,\ldots,X_n; Y_1,\ldots,Y_n)$$

where the infimum is over all channels $X \to Y$ such that $\frac{1}{n} E\|Y - X\|^2 \le D$. A channel which realizes this infimum is called an optimum test-channel. When the source is zero-mean Gaussian, the RDF takes an explicit form in the frequency domain in terms of the power spectrum

$$S(e^{j2\pi f}) = \sum_k R[k]\, e^{-jk2\pi f}, \quad -1/2 < f < 1/2,$$

where $R[k] = E\{X_n X_{n+k}\}$ is the autocorrelation function of the source.

(The work of the first two authors was partially supported by the Israel Science Foundation, grant ISF 1259/07.)
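For concreteness, the spectral quantities above can be evaluated numerically. The following is a minimal sketch, not part of the paper, that assumes an AR(1) source (a standard illustrative example) and approximates $S(e^{j2\pi f}) = \sum_k R[k] e^{-jk2\pi f}$ by truncating the autocorrelation sum, checking it against the closed-form AR(1) spectrum:

```python
import numpy as np

# Illustrative assumption: AR(1) source X_n = a*X_{n-1} + W_n, W_n ~ N(0, s2w).
a, s2w = 0.8, 1.0
s2x = s2w / (1 - a**2)                  # source variance R[0]
K = 200                                 # truncation order for the autocorrelation sum
k = np.arange(-K, K + 1)
R = s2x * a ** np.abs(k)                # autocorrelation R[k] = sigma_x^2 * a^|k|

f = np.linspace(-0.5, 0.5, 1001)
# S(e^{j 2 pi f}) = sum_k R[k] e^{-j k 2 pi f}  (real-valued, since R[k] is even)
S = np.real(np.exp(-1j * 2 * np.pi * np.outer(f, k)) @ R)

# closed-form AR(1) spectrum, for comparison
S_exact = s2w / np.abs(1 - a * np.exp(-1j * 2 * np.pi * f)) ** 2
print(np.max(np.abs(S - S_exact)))      # small truncation error, on the order of a^K
```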
The water-filling solution, illustrated in Figure 2, gives a parametric formula for the Gaussian RDF in terms of a parameter $\theta$ [8], [2], [5]:

$$R(D) = \int_{-1/2}^{1/2} \frac{1}{2}\log\!\left(\frac{S(e^{j2\pi f})}{D(e^{j2\pi f})}\right) df = \int_{f:\, S(e^{j2\pi f}) > \theta} \frac{1}{2}\log\!\left(\frac{S(e^{j2\pi f})}{\theta}\right) df \tag{1}$$

where the distortion spectrum is given by

$$D(e^{j2\pi f}) = \begin{cases} \theta, & \text{if } S(e^{j2\pi f}) > \theta \\ S(e^{j2\pi f}), & \text{otherwise,} \end{cases} \tag{2}$$

and where we choose the water level $\theta$ so that the total distortion is $D$:

$$D = \int_{-1/2}^{1/2} D(e^{j2\pi f})\, df. \tag{3}$$

In the special case of a memoryless (white) Gaussian source $\sim \mathcal{N}(0, \sigma^2)$, the power spectrum is flat, $S(e^{j2\pi f}) = \sigma^2$, so $\theta = D$ and the RDF simplifies to

$$\frac{1}{2}\log\!\left(\frac{\sigma^2}{D}\right), \quad 0 < D \le \sigma^2. \tag{4}$$

The optimum test-channel can be written in this case in a backward additive-noise form, $X = Y + N$ with $N \sim \mathcal{N}(0, D)$, or in a forward linear additive-noise form, $Y = \beta(\alpha X + N)$ with $\alpha = \beta = \sqrt{1 - D/\sigma^2}$ and $N \sim \mathcal{N}(0, D)$.

In the general stationary case, the forward channel realization of the Gaussian RDF has several equivalent forms [8, Sec. 9.7], [2, Sec. 4.5]. The one which is more useful for our purpose replaces $\alpha$ and $\beta$ above by linear time-invariant filters, while keeping the noise $N$ as AWGN [18]:

$$Y_n = h_{2,n} * (h_{1,n} * X_n + N_n) \tag{5}$$

where $N_n \sim \mathcal{N}(0, \theta)$ is AWGN with $\theta = \theta(D)$ the water level, $*$ denotes convolution, and $h_{1,n}$ and $h_{2,n}$ are the impulse responses of a suitable pre-filter and post-filter, respectively. See (13)-(18) in the next section.

Fig. 1. Predictive test channel: source $X_n$ enters the pre-filter $H_1(e^{j2\pi f})$, producing $U_n$; a prediction loop with predictor $g(V_{n-L}^{n-1})$ and AWGN $N_n$ produces $V_n$; the post-filter $H_2(e^{j2\pi f})$ outputs the reconstruction $Y_n$.

Fig. 2. The water-filling solution.

If we take a discrete approximation of (1),

$$\sum_i \frac{1}{2}\log\!\left(\frac{S(e^{j2\pi f_i})}{D(e^{j2\pi f_i})}\right), \tag{6}$$

then each component has the memoryless form of (4). Hence, we can think of the frequency domain formula (1) as an encoding of parallel (independent) Gaussian sources, where source $i$ is a memoryless Gaussian source $X_i \sim \mathcal{N}(0, S(e^{j2\pi f_i}))$ encoded at distortion level $D(e^{j2\pi f_i})$; see [5]. Indeed, practical frequency domain source coding schemes such as Transform Coding and Sub-band Coding [10] get close to the RDF of a stationary Gaussian source using an "array" of parallel scalar quantizers.

**Rate-Distortion and Prediction**

Our main result is a predictive channel realization for the quadratic-Gaussian RDF (1), which can be viewed as the time-domain counterpart of the frequency domain formulation above. The notions of entropy power and the Shannon lower bound (SLB) provide a simple relation between the Gaussian RDF and prediction, and motivate our result. Recall that the entropy power is the variance of a white Gaussian process having the same entropy rate as the source [5]; for a zero-mean Gaussian source with power spectrum $S(e^{j2\pi f})$, the entropy power is given by

$$P_e(X) = \exp\!\left(\int_{-1/2}^{1/2} \log\!\left(S(e^{j2\pi f})\right) df\right). \tag{7}$$

In the context of Wiener's spectral-factorization theory, the entropy power quantifies the MMSE in one-step linear prediction of a Gaussian source from its infinite past [2]:

$$P_e(X) = \inf_{\{a_i\}} E\!\left(X_n - \sum_{i=1}^{\infty} a_i X_{n-i}\right)^2. \tag{8}$$

The error process associated with the infinite-order optimum predictor,

$$Z_n = X_n - \sum_{i=1}^{\infty} a_i X_{n-i}, \tag{9}$$

is called the innovation process.
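The parametric solution (1)-(3) lends itself to direct numerical evaluation: bisect on the water level $\theta$ until the total distortion (3) equals the target $D$, then integrate (1) over the frequencies where $S > \theta$. A minimal sketch under the same illustrative AR(1) assumption (the grid resolution and bisection count are arbitrary choices):

```python
import numpy as np

f = np.linspace(-0.5, 0.5, 4001)
a, s2w = 0.8, 1.0
S = s2w / np.abs(1 - a * np.exp(-1j * 2 * np.pi * f)) ** 2   # source spectrum

def total_distortion(theta):
    # eq. (3) with the distortion spectrum (2); the interval has length 1,
    # so the integral is just the grid average
    return np.minimum(S, theta).mean()

def rdf(D):
    # bisect on the water level theta so that total_distortion(theta) = D
    lo, hi = 0.0, S.max()
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if total_distortion(mid) < D else (lo, mid)
    theta = 0.5 * (lo + hi)
    # eq. (1): integrate 0.5*log(S/theta) only where S exceeds the water level
    rate = np.where(S > theta, 0.5 * np.log2(S / theta), 0.0).mean()
    return rate, theta

rate, theta = rdf(D=0.1)
print(f"R(0.1) = {rate:.4f} bits/sample, water level theta = {theta:.4f}")
```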
The orthogonality principle of MMSE estimation implies that the innovation process has zero mean and is white; in the Gaussian case uncorrelatedness implies independence, so

$$Z_n \sim \mathcal{N}(0, P_e(X)) \tag{10}$$

is a memoryless process. See, e.g., [7].

From an information theoretic perspective, the entropy power plays a role in the SLB:

$$R(D) \ge \frac{1}{2}\log\!\left(\frac{P_e(X)}{D}\right). \tag{11}$$

Equality in the SLB holds if the distortion level is smaller than or equal to the lowest value of the power spectrum, $D \le S_{\min} \triangleq \min_f S(e^{j2\pi f})$, in which case $D(e^{j2\pi f}) = \theta = D$ [2]. It follows that for distortion levels below $S_{\min}$ the RDF of a Gaussian source with memory is equal to the RDF of its memoryless innovation process $Z_n$:

$$R(D) = R_Z(D) = \frac{1}{2}\log\!\left(\frac{\sigma_Z^2}{D}\right), \quad D \le S_{\min}, \tag{12}$$

where $\sigma_Z^2 = P_e(X)$.

We shall see later in Section II how identity (12) translates into a predictive test-channel, which can realize the RDF not only for small but for all distortion levels. This test channel is motivated by the sequential structure of Differential Pulse Code Modulation (DPCM) [12], [10]. The goal of DPCM is to translate the encoding of dependent source samples into a series of independent encodings. The task of removing the time dependence is achieved by (linear) prediction: at each time instant the incoming source sample is predicted from previously encoded samples, the prediction error is encoded by a scalar quantizer and added to the predicted value to form the new reconstruction. See Figure 3.

Fig. 3. DPCM quantization scheme.

A negative result along this direction was recently given by Kim and Berger [13]. They showed that the RDF of an auto-regressive (AR) Gaussian process cannot be achieved by directly encoding its innovation process. This can be viewed as open-loop prediction, because the innovation process is extracted from the clean source rather than from the quantized source [12], [9]. Here we give a positive result, showing that the RDF can be achieved if we embed the quantizer inside the prediction loop, i.e., by closed-loop prediction as done in DPCM. The RDF-achieving system consists of pre- and post-filters, and an AWGN channel embedded in a source prediction loop. As we show, the scalar (unconditioned) mutual information over this inner AWGN channel is equal to the RDF.

After presenting and proving our main result in Sections II and III, respectively, we discuss its characteristics and operational implications. Section IV discusses the spectral features of the solution. Section V relates the solution to vector-quantized DPCM of parallel sources. Section VI shows an implementation by Entropy Coded Dithered Quantization (ECDQ), while extending the ECDQ rate formula [16] to the case of a system with feedback. Finally, in Section VII we relate prediction in source coding to prediction for channel equalization and to recent observations by Forney [7]. As in [7], our analysis is based on the properties of information measures; the only result we need from Wiener's estimation theory is the orthogonality principle.
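Before moving on, the tightness of the SLB (11) below $S_{\min}$, and hence the identity (12), can be confirmed numerically: for such $D$ the water-filling integral (1) reduces to $\frac{1}{2}\log(P_e(X)/D)$ with $P_e(X)$ computed from (7). A self-contained sketch (same illustrative AR(1) spectrum as above):

```python
import numpy as np

f = np.linspace(-0.5, 0.5, 4001)
a, s2w = 0.8, 1.0
S = s2w / np.abs(1 - a * np.exp(-1j * 2 * np.pi * f)) ** 2

Pe = np.exp(np.log(S).mean())       # entropy power (7); equals s2w for an AR(1) source
D = 0.5 * S.min()                   # a distortion below S_min, so theta = D in (2)

slb = 0.5 * np.log2(Pe / D)                     # Shannon lower bound (11)
wf = (0.5 * np.log2(S / D)).mean()              # eq. (1) with D(e^{j2pi f}) = D everywhere
print(slb, wf)                                  # agree: the SLB is tight for D <= S_min
```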
**II. MAIN RESULT**

Consider the system in Figure 1, which consists of three basic blocks: a pre-filter $H_1(e^{j2\pi f})$, a noisy channel embedded in a closed loop, and a post-filter $H_2(e^{j2\pi f})$, where $H(e^{j2\pi f})$ denotes the frequency response of a filter with impulse response $h_n$,

$$H(e^{j2\pi f}) = \sum_n h_n e^{-jn2\pi f}, \quad -1/2 < f < 1/2.$$

The system parameters are derived from the water-filling solution (1)-(2), and depend on the source spectrum $S(e^{j2\pi f})$ and the distortion level $D$. The source samples $\{X_n\}$ are passed through a pre-filter, whose phase is arbitrary and whose squared magnitude response is given by

$$|H_1(e^{j2\pi f})|^2 = 1 - \frac{D(e^{j2\pi f})}{S(e^{j2\pi f})} \tag{13}$$

where $0/0$ is taken as 1.

Fig. 4. Equivalent channel: $X_n$ enters the pre-filter $H_1(e^{j2\pi f})$, producing $U_n$; AWGN $N_n$ is added, producing $V_n$; the post-filter $H_2(e^{j2\pi f})$ outputs $Y_n$.

The pre-filter output, denoted $U_n$, is fed to the central block which generates a process $V_n$ according to the following recursion equations:

$$\hat{U}_n = g(V_{n-1}, V_{n-2}, \ldots, V_{n-L}) \tag{14}$$
$$Z_n = U_n - \hat{U}_n \tag{15}$$
$$Z^q_n = Z_n + N_n \tag{16}$$
$$V_n = \hat{U}_n + Z^q_n \tag{17}$$

where $N_n \sim \mathcal{N}(0, \theta)$ is a zero-mean white Gaussian noise, independent of the input process $\{U_n\}$, whose variance is equal to the water level $\theta = \theta(D)$; and $g(\cdot)$ is some prediction function for the input $U_n$ given the $L$ past samples of the output process $(V_{n-1}, V_{n-2}, \ldots, V_{n-L})$. (No initial condition on $V_n$ is needed, as we assume a two-sided input process $X_n$ and the system is stable.) Finally, the post-filter frequency response is the complex conjugate of the frequency response of the pre-filter,

$$H_2(e^{j2\pi f}) = H_1^*(e^{j2\pi f}). \tag{18}$$

Equivalently, the impulse response of the post-filter is the reflection of the impulse response of the pre-filter:

$$h_{2,n} = h_{1,-n}. \tag{19}$$

See the comment regarding causality at the end of the section.

The block from $U_n$ to $V_n$ is equivalent to the configuration of DPCM [12], [10], with the DPCM quantizer replaced by the additive Gaussian noise channel $Z^q_n = Z_n + N_n$. In particular, the recursion equations (14)-(17) imply that this block satisfies the well known "DPCM error identity" [12]:

$$V_n = U_n + (Z^q_n - Z_n) = U_n + N_n. \tag{20}$$

That is, the output $V_n$ is a noisy version of the input $U_n$ via the AWGN channel $V_n = U_n + N_n$. Thus, the system of Figure 1 is equivalent to the system depicted in Figure 4, which corresponds to the forward channel realization (5) of the quadratic-Gaussian RDF.

In DPCM the prediction function $g$ is linear:

$$g(V_{n-1}, \ldots, V_{n-L}) = \sum_{i=1}^{L} a_i V_{n-i} \tag{21}$$

where $a_1, \ldots, a_L$ are chosen to minimize the mean-squared prediction error:

$$\sigma_L^2 = \min_{a_i} E\!\left(U_n - \sum_{i=1}^{L} a_i V_{n-i}\right)^2. \tag{22}$$

Because $V_n$ is the result of passing $U_n$ through an AWGN channel, we call this "noisy prediction". If $\{U_n\}$ and $\{V_n\}$ are jointly Gaussian, then the best predictor of any order is linear, so $\sigma_L^2$ is also the MMSE in estimating $U_n$ from the vector $(V_{n-1}, \ldots, V_{n-L})$. Clearly, this MMSE is non-increasing with the prediction order $L$, and as $L$ goes to infinity it converges to

$$\sigma_\infty^2 = \lim_{L\to\infty} \sigma_L^2, \tag{23}$$

the optimum infinite-order prediction error in $U_n$ given the past

$$V_n^- \triangleq \{V_{n-1}, V_{n-2}, \ldots\}. \tag{24}$$

We shall see later in Section IV that $\sigma_\infty^2 = P_e(V) - \theta$. We further elaborate on the relationship with DPCM in Section V.
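The recursion (14)-(17) is straightforward to simulate, and doing so makes the DPCM error identity (20) tangible: it holds exactly, by construction, for any predictor $g$. A minimal sketch (the AR(1) input standing in for the pre-filter output and the first-order predictor coefficient are illustrative assumptions, not the optimal infinite-order $g$ of the theory):

```python
import numpy as np

rng = np.random.default_rng(0)
n, a, theta = 10_000, 0.8, 0.1
U = np.zeros(n)                       # stand-in for the pre-filter output U_n: an AR(1) process
for t in range(1, n):
    U[t] = a * U[t - 1] + rng.normal()

N = rng.normal(scale=np.sqrt(theta), size=n)   # in-loop AWGN with variance theta
V = np.zeros(n)
a1 = 0.7                                       # illustrative first-order predictor coefficient
for t in range(n):
    U_hat = a1 * V[t - 1] if t > 0 else 0.0    # (14): predict U_t from past loop outputs
    Z = U[t] - U_hat                           # (15): prediction error
    Zq = Z + N[t]                              # (16): the AWGN channel inside the loop
    V[t] = U_hat + Zq                          # (17): loop output

# DPCM error identity (20): V_n = U_n + N_n exactly, whatever the predictor
print(np.max(np.abs(V - U - N)))               # zero up to floating-point round-off
```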
We now state our main result.

**Theorem 1 (Predictive test channel):** For any stationary source with power spectrum $S(e^{j2\pi f})$ and distortion level $D$, the system of Figure 1, with the pre-filter (13) and the post-filter (18), satisfies

$$E(Y_n - X_n)^2 = D. \tag{25}$$

Furthermore, if the source $X_n$ is Gaussian and $g = g(V_n^-)$ achieves the optimum infinite-order prediction error $\sigma_\infty^2$ (23), then

$$I(Z_n; Z_n + N_n) = \frac{1}{2}\log\!\left(1 + \frac{\sigma_\infty^2}{\theta}\right) = R(D), \tag{26}$$

where the left hand side is the scalar mutual information over the channel (16).

The proof is given in Section III. The result above is in sharp contrast to the classical realization of the RDF (5), which involves mutual information rate over a test-channel with memory. In a sense, the core of the encoding process in the system of Figure 1 amounts to a memoryless AWGN test-channel (although, as we discuss in the sequel, the channel (16) is not quite memoryless nor additive). From a practical perspective, this system provides a bridge between DPCM and rate-distortion theory for a general distortion level $D > 0$.

Another interesting feature of the system is the relationship between the prediction error process $Z_n$ and the original process $X_n$. If $X_n$ is an auto-regressive (AR) process, then in the limit of small distortion ($D \to 0$), $Z_n$ is roughly its innovation process (9). Hence, unlike in open-loop prediction [13], encoding the innovations in a closed-loop system is optimal in the limit of high-resolution encoding. We shall return to this point, as well as discuss the case of general resolution, in Section IV.

Finally, we note that while the central block of the system is sequential and hence causal, the pre- and post-filters are non-causal and therefore their realization in practice requires delay. Specifically, since by (19) $h_{2,n} = h_{1,-n}$, if one of the filters is causal then the other must be anti-causal. Often the filter's response is infinite, hence the required delay is infinite as well. Of course, one can approximate the desired spectrum (in the $L_2$ sense and hence also in the rate-distortion sense) to any degree using filters of sufficiently large but finite delay $\delta$, so the system distortion is actually measured between $Y_n$ and $X_{n-\delta}$. In this sense, Theorem 1 holds in general in the limit as the system delay $\delta$ goes to infinity.

If we insist on a system with causal reconstruction ($\delta = 0$), then we cannot realize the pre- and post-filters (13) and (18), and some loss in performance must be paid. Nevertheless, if the source spectrum is bounded from below by a positive constant, then it can be seen from (13) that in the limit of small distortion ($D \to 0$) the filters can be omitted, i.e., $H_1 = H_2 = 1$ for all $f$. Hence, a causal system (the central block in Figure 1) is asymptotically optimal at "high resolution" conditions. Furthermore, the redundancy of an AWGN channel above the RDF is at most 0.5 bit per source sample for any source and at any resolution; see, e.g., [16]. It thus follows from Lemma 1 below (which directly characterizes the information rate of the central block of Figure 1) that a causal system (the system of Figure 1 without the filters) loses at most 0.5 bit at any resolution. These observations shed some light on the "cost of causality" in encoding stationary Gaussian sources [14].
It is an open question, though, whether a redundancy better than 0.5 bit can be guaranteed when using causal pre- and post-filters in the system of Figure 1.

**III. PROOF OF MAIN RESULT**

We start with Lemma 1 below, which shows an identity between the mutual information rate over the central block of Figure 1 and the scalar mutual information (26). This identity holds regardless of the pre- and post-filters, and only assumes optimum infinite-order prediction in the feedback loop. Let

$$I(\{U_n\}; \{V_n\}) = \lim_{n\to\infty} \frac{1}{n} I(U_1, \ldots, U_n; V_1, \ldots, V_n) \tag{27}$$

denote the mutual information rate between jointly stationary sources $\{U_n\}$ and $\{V_n\}$, whenever the limit exists.

**Lemma 1:** For any stationary Gaussian process $\{U_n\}$ in Figure 1, if $\hat{U}_n$ is the optimum infinite-order predictor of $U_n$ from $V_n^-$ (so the variance of $Z_n$ is $\sigma_\infty^2$ as defined in (23)), then

$$I(\{U_n\}; \{V_n\}) = I(Z_n; Z_n + N_n). \tag{28}$$

*Proof:* For any finite-order predictor $g(V_{n-L}^{n-1})$ we can write

$$I(\{U_n\}; V_i \mid V_{i-L}^{i-1}) = I(\{U_n\}, U_i - \hat{U}_i^{(L)};\, V_i - \hat{U}_i^{(L)} \mid V_{i-L}^{i-1})$$
$$= I(\{U_n\}, Z_i^{(L)};\, Z_i^{(L)} + N_i \mid V_{i-L}^{i-1})$$
$$= I(Z_i^{(L)};\, Z_i^{(L)} + N_i \mid V_{i-L}^{i-1}) \tag{29}$$
$$= I(Z_i^{(L)};\, Z_i^{(L)} + N_i) \tag{30}$$

where $\hat{U}_i^{(L)} = g(V_{i-L}^{i-1})$ is the $L$-th order predictor output at time $i$, and $Z_i^{(L)}$ is the prediction error. The first equality above follows since manipulating the condition does not affect the conditional mutual information; the second equality follows from the definition of $Z_i^{(L)}$; (29) follows since $N_i$ is independent of $(\{U_n\}, V_i^-)$ and therefore $(Z_i^{(L)} + N_i) \leftrightarrow (Z_i^{(L)}, V_{i-L}^{i-1}) \leftrightarrow \{U_n\}$ form a Markov chain; and (30) follows from two facts: first, since $N_i$ is independent of $\{U_i\}$ and previous $N_i$'s, it is also independent of the pair $(Z_i^{(L)}, V_{i-L}^{i-1})$ by the recursive structure of the system; second, we assume optimum (MMSE) prediction, hence the orthogonality principle implies that the prediction error $Z_i^{(L)}$ is orthogonal to the measurements $V_{i-L}^{i-1}$, so by Gaussianity they are also independent; hence by the two facts $V_{i-L}^{i-1}$ is independent of the pair $(Z_i^{(L)}, N_i)$. Since by (22) the variance of the $L$-th order prediction error $Z_i^{(L)}$ is $\sigma_L^2$, while the variance of the noise $N_i$ is $\theta$, we thus obtain from (30)

$$I(\{U_n\}; V_i \mid V_{i-L}^{i-1}) = \frac{1}{2}\log\!\left(1 + \frac{\sigma_L^2}{\theta}\right). \tag{31}$$

This implies in the limit as $L \to \infty$

$$I(\{U_n\}; V_i \mid V_i^-) = \frac{1}{2}\log\!\left(1 + \frac{\sigma_\infty^2}{\theta}\right) \tag{32}$$
$$= I(Z_n; Z_n + N_n). \tag{33}$$

Note that by stationarity, $I(\{U_n\}; V_i \mid V_i^-)$ is independent of $i$. Thus,

$$I(\{U_n\}; V_1) + I(\{U_n\}; V_2 \mid V_1) + \ldots + I(\{U_n\}; V_i \mid V_1^{i-1}),$$

normalized by $1/i$, converges as $i \to \infty$ to $I(\{U_n\}; V_i \mid V_i^-)$. By the definition of mutual information rate (27) and by the chain rule for mutual information [5], this implies that the left hand side of (28) is equal to

$$I(\{U_n\}; \{V_n\}) = I(\{U_n\}; V_i \mid V_i^-). \tag{34}$$

Combining (33) and (34) the lemma is proved.

Theorem 1 is a simple consequence of Lemma 1 above and the forward channel realization of the RDF.
As discussed in the previous section, the DPCM error identity (20) implies that the entire system of Figure 1 is equivalent to the system depicted in Figure 4, consisting of a pre-filter (13), an AWGN channel with noise variance $\theta$, and a post-filter (18). This is also the forward channel realization (5) of the RDF [8], [2], [18]. In particular, as simple spectral analysis shows, the power spectrum of the overall error process $Y_n - X_n$ is equal to the water-filling distortion spectrum $D(e^{j2\pi f})$ in (2). Hence, by (3) the total distortion is $D$, and (25) follows.

We turn to prove the second part of the theorem (equation (26)). Since the system of Figure 4 is equivalent to the forward channel realization (5) of the RDF of $\{X_n\}$, we have

$$I(\{X_n\}; \{Y_n\}) = R(D) \tag{35}$$

where $I$ denotes the mutual information rate (27). Since $\{U_n\}$ is a function of $\{X_n\}$, and since the post-filter $H_2$ is invertible within the pass-band of the pre-filter $H_1$, we also have

$$I(\{X_n\}; \{Y_n\}) = I(\{U_n\}; \{V_n\}). \tag{36}$$

The theorem now follows by combining (36), (35) and Lemma 1.

An alternative proof of Theorem 1, based only on spectral considerations, is given at the end of the next section.

**IV. PROPERTIES OF THE PREDICTIVE TEST-CHANNEL**

The following observations shed light on the behavior of the test channel of Figure 1.

*Prediction in the high resolution regime.* If the power spectrum $S(e^{j2\pi f})$ is everywhere positive (e.g., if $\{X_n\}$ can be represented as an AR process), then in the limit of small distortion $D \to 0$, the pre- and post-filters (13), (18) converge to all-pass filters, and the power spectrum of $U_n$ becomes the power spectrum of the source $X_n$. Furthermore, noisy prediction of $U_n$ (from the "noisy past" $V_n^-$, where $V_n = U_n + N_n$) becomes equivalent to clean prediction of $U_n$ from its own past $U_n^-$. Hence, in this limit the prediction error $Z_n$ is equivalent to the innovation process of $X_n$ (9). In particular, $Z_n$ is an i.i.d. process whose variance is $P_e(X)$, the entropy power of the source (7).

*Prediction in the general case.* Interestingly, for general distortion $D > 0$, the prediction error $Z_n$ is not white, as the noisiness of the past does not allow the predictor $g$ to remove all the source memory. Nevertheless, the noisy version of the prediction error, $Z^q_n = Z_n + N_n$, is white for every $D > 0$, because it amounts to predicting $V_n$ from its own infinite past: since $N_n$ has zero mean and is white (and therefore independent of the past), the $\hat{U}_n$ that minimizes the prediction error of $U_n$ is also the optimal predictor for $V_n = U_n + N_n$. In particular, in view of (8) and (10), we have

$$Z^q_n \sim \mathcal{N}(0, P_e(V)) \tag{37}$$

where $P_e(V)$ is the entropy power of the process $V_n$. And since $Z^q_n$ is the independent sum of $Z_n$ and $N_n$, we also have the relation $P_e(V) = \sigma_\infty^2 + \theta$, where $\sigma_\infty^2$ is the variance of $Z_n$ (23) and $\theta$ is the variance of $N_n$.

*Sequential additivity.* The whiteness of $Z^q_n$ might seem at first a contradiction, because $Z^q_n$ is the sum of a non-white process, $Z_n$, and a white process $N_n$; nevertheless, $\{Z_n\}$ and $\{N_n\}$ are not independent, because $Z_n$ depends on past values of $N_n$ through the feedback loop and the past of $V_n$.
Thus, the channel $Z^q_n = Z_n + N_n$ is not quite additive but "sequentially additive": each new noise sample is independent of the present and the past, but not necessarily of the future. In particular, this channel satisfies

$$I(Z_n; Z_n + N_n \mid Z_1 + N_1, \ldots, Z_{n-1} + N_{n-1}) = I(Z_n; Z_n + N_n), \tag{38}$$

so by the chain rule for mutual information $\bar{I}(\{Z_n\}; \{Z_n + N_n\}) > I(Z_n; Z_n + N_n)$. Later in Section VI we rewrite (38) in terms of directed mutual information.

*The channel when the SLB is tight.* As long as $D$ is smaller than the lowest point of the source power spectrum $S_{\min}$, we have $D(e^{j2\pi f}) = \theta = D$ in (1), and the quadratic-Gaussian RDF coincides with the SLB (11). In this case, the following properties hold for the predictive test channel:

- The power spectra of $U_n$ and $Y_n$ are the same and are equal to $S(e^{j2\pi f}) - D$.
- The power spectrum of $V_n$ is equal to the power spectrum of the source, $S(e^{j2\pi f})$.
- The variance of $Z^q_n$ is equal to the entropy power of $V_n$ by (37), which is equal to $P_e(X)$.
- As a consequence we have
$$I(Z_n; Z_n + N_n) = h(Z^q_n) - h(N_n) = h\big(\mathcal{N}(0, P_e(V))\big) - h\big(\mathcal{N}(0, D)\big) = \frac{1}{2}\log\!\left(\frac{P_e(X)}{D}\right)$$
which is indeed the SLB (11).

As discussed in the Introduction, the SLB is also the RDF of the innovation process (12), i.e., the conditional RDF of the source $X_n$ given its infinite clean past $X_n^-$.

*An alternative derivation of Theorem 1 in the spectral domain.* For a general $D$, we can use (37) and the equivalent channel of Figure 4 to re-derive the scalar mutual information-RDF identity (26). Note that for any $D$ the power spectrum of $U_n$ and $Y_n$ is equal to $\max\{0, S(e^{j2\pi f}) - \theta\}$, where $\theta = \theta(D)$ is the water level. Thus the power spectrum of $V_n = U_n + N_n$ is given by $\max\{\theta, S(e^{j2\pi f})\}$. Since, as discussed above, the variance of $Z^q_n = Z_n + N_n$ is given by the entropy power of the process $V_n$, we have

$$I(Z_n; Z_n + N_n) = \frac{1}{2}\log\!\left(\frac{P_e(\max\{\theta, S(e^{j2\pi f})\})}{\theta}\right) = R(D)$$

where $P_e(\cdot)$ as a function of the spectrum is given in (7), and the second equality follows from (1).
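This spectral form of the identity (26), $\frac{1}{2}\log\big(P_e(\max\{\theta, S\})/\theta\big) = R(D)$, is easy to check numerically for any water level. A minimal sketch (same illustrative AR(1) spectrum as in the earlier snippets):

```python
import numpy as np

f = np.linspace(-0.5, 0.5, 4001)
a, s2w = 0.8, 1.0
S = s2w / np.abs(1 - a * np.exp(-1j * 2 * np.pi * f)) ** 2

theta = 0.4                                   # any water level above S_min; fixes D via (3)
D = np.minimum(S, theta).mean()               # total distortion (3)

R_wf = np.where(S > theta, 0.5 * np.log2(S / theta), 0.0).mean()  # water-filling RDF (1)
Pe_V = np.exp(np.log(np.maximum(S, theta)).mean())  # entropy power of V: spectrum max{theta, S}
R_pred = 0.5 * np.log2(Pe_V / theta)                # scalar MI over the in-loop channel
print(D, R_wf, R_pred)                              # the two rates agree
```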
**V. VECTOR-QUANTIZED DPCM AND D\*PCM**

As mentioned earlier, the structure of the central block of the channel of Figure 1 is that of a DPCM encoder, with the scalar quantizer replaced by the AWGN channel $Z^q_n = Z_n + N_n$. However, if we wish to implement the additive noise by a quantizer whose rate is the mutual information $I(Z_n; Z_n + N_n)$, we must use vector quantization (VQ). Indeed, while scalar quantization noise is approximately uniform over intervals, good high-dimensional lattices generate near-Gaussian quantization noise [17]. Yet, how can we combine VQ and DPCM without violating the sequential nature of the system? In particular, the quantized sample $Z^q_n$ must be available to generate $V_n$ before the system can predict $U_{n+1}$ and generate $Z_{n+1}$. One way to achieve the VQ gain and still retain the sequential structure of the system is by adding a "spatial" dimension, i.e., by jointly encoding a large number of parallel sources, as happens, e.g., in video coding. Figure 5 shows DPCM encoding of $K$ parallel sources.

Fig. 5. DPCM of parallel sources: $K$ independent pre/post filters and prediction loops, with joint ($K$-dimensional VQ) quantization of the prediction errors.

The spectral shaping and prediction are done in the time domain for each source separately. Then, the resulting vector of $K$ prediction errors is quantized jointly at each time instant by a vector quantizer. The desired properties of additive quantization error, and a rate which is equal to $K$ times the mutual information $I(Z_n; Z_n + N_n)$, can be approached in the limit of large $K$ by a suitable choice of the quantizer. In the next section we discuss one way to do that using lattice ECDQ.

What if we have only one source instead of $K$ parallel sources? If the source has decaying memory, we can still approximate the parallel source coding approach above, at the cost of large delay, by using interleaving. We divide the (pre-filtered) source into $K$ long blocks, which are separately predicted and then interleaved and jointly quantized as if they were parallel sources. See Figure 6. This is analogous to the method used in [11] for combining coding-decoding and decision-feedback equalization (DFE).

Fig. 6. VQ-DPCM for a single source using interleaving. ($\Pi$ and $\Pi^{-1}$ denote interleaving and de-interleaving, respectively.)

If we do not use any of the above, but restrict ourselves to scalar quantization ($K = 1$), then we have a pre/post-filtered DPCM scheme. By combining Theorem 1 with known bounds on the performance of (memoryless) entropy-constrained scalar quantizers (e.g., [18]), we have

$$H(Q_{\text{opt}}(Z_n)) \le R(D) + \frac{1}{2}\log\!\left(\frac{2\pi e}{12}\right) \tag{39}$$

where $\frac{1}{2}\log(2\pi e/12) \approx 0.254$ bit. See Remark 3 in the next section regarding scalar/lattice ECDQ. Hence, Theorem 1 implies that in principle, a pre/post-filtered DPCM scheme is optimal, up to the loss of the VQ gain, at all distortion levels and not only in the high resolution regime.

A different approach to combining VQ and prediction is first to extract the innovation process and then to quantize it. It is interesting to mention that this method of "open-loop" prediction, which we mentioned earlier regarding the model of [13], is known in the quantization literature as D\*PCM [12]. The best pre-filter for D\*PCM under a high resolution assumption turns out to be the "half-whitening filter", $|H_1(e^{j2\pi f})|^2 = 1/\sqrt{S(e^{j2\pi f})}$, with the post-filter $H_2(e^{j2\pi f})$ being its inverse. But even with this optimum filter, D\*PCM is inferior to DPCM: the optimal distortion gain of D\*PCM over a non-predictive scheme is

$$G_{D^*PCM} = \frac{\sigma_X^2}{\left(\int_{-1/2}^{1/2} \sqrt{S_X(e^{j2\pi f})}\, df\right)^2}$$

(strictly greater than one for non-white spectra by the Cauchy-Schwarz inequality). Comparing to the optimum prediction gain obtained by the DPCM scheme, $G_{DPCM} = \sigma_X^2 / P_e(X)$, we have

$$\frac{G_{DPCM}}{G_{D^*PCM}} = \frac{\left(\int_{-1/2}^{1/2} \sqrt{S_X(e^{j2\pi f})}\, df\right)^2}{P_e(X)} = \left(\frac{\sigma_{\tilde{U}}^2}{P_e(\tilde{U})}\right)^2,$$

where $\tilde{U}_n$ is the pre-filter output in the D\*PCM scheme. This ratio is strictly greater than one for non-white spectra.
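Both prediction gains above are simple spectral functionals, so the DPCM vs. D\*PCM comparison can be evaluated directly. A minimal sketch (illustrative AR(1) spectrum; any non-white spectrum gives a ratio above one):

```python
import numpy as np

f = np.linspace(-0.5, 0.5, 4001)
a, s2w = 0.8, 1.0
S = s2w / np.abs(1 - a * np.exp(-1j * 2 * np.pi * f)) ** 2

var_x = S.mean()                           # sigma_X^2: integral of the spectrum
Pe = np.exp(np.log(S).mean())              # entropy power P_e(X)
G_dpcm = var_x / Pe                        # optimum closed-loop prediction gain
G_dstar = var_x / np.sqrt(S).mean() ** 2   # D*PCM gain with the half-whitening filter
print(G_dpcm, G_dstar, G_dpcm / G_dstar)   # ratio > 1 for any non-white spectrum
```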
**VI. ECDQ IN A CLOSED-LOOP SYSTEM**

Subtractive dithering of a uniform/lattice quantizer is a common approach to making the quantization noise additive. As shown in [16], the conditional entropy of the dithered lattice quantizer (given the dither) is equal to the mutual information in an additive noise channel, where the noise is uniform over the lattice cell. Furthermore, for "good" high-dimensional lattices, the noise becomes closer to a white Gaussian process [17]. Thus, ECDQ (entropy-coded dithered quantization) provides a natural way to realize the inner AWGN channel block of the predictive test-channel. One difficulty, however, which we observe in this section, is that the results developed in [16] do not apply to the case where the ECDQ input depends on previous ECDQ outputs and the entropy coding is conditioned on the past. This situation indeed happens in predictive coding, when ECDQ is embedded within a feedback loop. As we shall see, the right measure in this case is the directed information.

An ECDQ operating on the source $Z_n$ is depicted in Figure 7.

Fig. 7. ECDQ structure: the input $Z_n$ plus dither enters the lattice quantizer; the quantizer outputs $Q_n$ are entropy coded into a bit sequence $s$, and the dither is subtracted to form $Z^q_n$.

A dither sequence $D_n$, independent of the input sequence $Z_n$, is added before the quantization and subtracted after. If the quantizer has a lattice structure of dimension $K \ge 1$, then we assume that the sequence length is $L = MK$ for some integer $M$, so the quantizer is activated $M$ times. In this section, we use bold notation for $K$-blocks corresponding to a single quantizer operation. At each quantizer operation instant $m$, a dither vector $\mathbf{D}_m$ is independently and uniformly distributed over the basic lattice cell. The lattice points at the quantizer output, $\mathbf{Q}_m$, $m = 1, \ldots, M$, are fed into an entropy coder which is allowed to jointly encode the sequence, and has knowledge of the dither as well; thus for an input sequence of length $L$ it achieves an average rate of

$$R_{ECDQ} \triangleq \frac{1}{L} H(\mathbf{Q}_1^M \mid \mathbf{D}_1^M) \tag{40}$$

bits per source sample. The entropy coder produces a sequence $s$ of $\lceil L R_{ECDQ} \rceil$ bits, from which the decoder can recover $\mathbf{Q}_1, \ldots, \mathbf{Q}_M$, and then subtract the dither to obtain the reconstruction sequence $Z^q_n = Q_n - D_n$, $n = 1, \ldots, L$. The reconstruction error sequence $N_n = Z^q_n - Z_n$, called in the sequel the "ECDQ noise", has $K$-blocks which are uniformly distributed over the mirror image of the basic lattice cell and are mutually i.i.d. [16]. It is further stated in [16, Thm. 1] that the input and the noise sequences, $\mathbf{Z} = Z_1^L$ and $\mathbf{N} = N_1^L$, are statistically independent, and that the ECDQ rate is equal to the mutual information over an additive noise channel with input $\mathbf{Z}$ and noise $\mathbf{N}$:

$$R_{ECDQ} = \frac{1}{L} I(\mathbf{Z}; \mathbf{Z}^q) = \frac{1}{L} I(\mathbf{Z}; \mathbf{Z} + \mathbf{N}). \tag{41}$$

However, the derivation of [16, Thm. 1] makes the implicit assumption that the quantizer is used without feedback, that is, the current input is conditionally independent of past outputs given the past inputs. (In other words, the dependence on the past, if it exists, is only due to memory in the source.) When there is feedback, this condition does not necessarily hold, which implies that (even with the dither) the sequences $\mathbf{Z}$ and $\mathbf{N}$ are possibly dependent. Specifically, since feedback is causal, the input $Z_n$ can depend on past values of the ECDQ noise $N_n$, so their joint distribution in general has the form

$$f(Z_1^L, N_1^L) = \prod_{m=1}^{M} f(\mathbf{N}_m)\, f(\mathbf{Z}_m \mid \mathbf{N}_1^{m-1}) \tag{42}$$

where $\mathbf{Z}_m = Z_{(m-1)K+1}^{mK}$ denotes the $m$-th $K$-block, and similarly for $\mathbf{N}_m$. In this case, the mutual information rate of (41) over-estimates the true rate of the ECDQ.
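The basic dither mechanism itself, before any entropy coding or feedback enters the picture, is easy to demonstrate with a scalar ($K = 1$) uniform quantizer: with subtractive dither the reconstruction error is uniform over a cell and uncorrelated with the input, as stated in [16]. A minimal sketch (step size and input distribution are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, delta = 100_000, 0.5                          # samples and quantizer step (illustrative)
Z = rng.normal(size=n)                           # arbitrary input sequence
dither = rng.uniform(-delta / 2, delta / 2, n)   # dither, uniform over the basic cell

Q = delta * np.round((Z + dither) / delta)       # uniform (1-D lattice) quantizer on Z + dither
Zq = Q - dither                                  # subtract the dither at the decoder
N = Zq - Z                                       # the "ECDQ noise"

print(N.min(), N.max())                # confined to (-delta/2, delta/2)
print(np.var(N), delta ** 2 / 12)      # variance matches delta^2/12
print(np.corrcoef(Z, N)[0, 1])         # ~0: the noise is uncorrelated with the input
```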
Massey shows in [15] that for DMCs with feedback, traditional mutual information is not a suitable measure, and should be replaced by directed information. The directed information between the sequences $\mathbf{Z}$ and $\mathbf{Z}^q = (Z^q_1, \ldots, Z^q_L)$ is defined as

$$I(\mathbf{Z} \to \mathbf{Z}^q) \triangleq \sum_{n=1}^{L} I(Z_1^n; Z^q_n \mid Z^{q,n-1}_1) = \sum_{n=1}^{L} I(Z_n; Z^q_n \mid Z^{q,n-1}_1) \tag{43}$$

where the second equality holds whenever the channel from $Z_n$ to $Z^q_n$ is memoryless, as in our case. In contrast, the mutual information between $\mathbf{Z}$ and $\mathbf{Z}^q$ is given by

$$I(\mathbf{Z}; \mathbf{Z}^q) = \sum_{n=1}^{L} I(Z_1^L; Z^q_n \mid Z^{q,n-1}_1) \tag{44}$$

which by the chain rule for mutual information is in general higher. For our purposes, we will define the $K$-block directed information:

$$I_K(\mathbf{Z} \to \mathbf{Z}^q) \triangleq \sum_{m=1}^{M} I(\mathbf{Z}_1^m; \mathbf{Z}^q_m \mid \mathbf{Z}^{q,m-1}_1). \tag{45}$$

The following result, proven in Appendix A, extends Massey's observation to ECDQ with feedback, and generalizes the result of [16, Thm. 1]:

**Theorem 2 (ECDQ rate with feedback):** The ECDQ system with causal feedback defined by (42) satisfies

$$R_{ECDQ} = \frac{1}{L} I_K(\mathbf{Z} \to \mathbf{Z}^q) = \frac{1}{L} I_K(\mathbf{Z} \to \mathbf{Z} + \mathbf{N}). \tag{46}$$

Remarks:

1. When there is no feedback, the past and future input blocks $(\mathbf{Z}_1^{m-1}, \mathbf{Z}_{m+1}^M)$ are conditionally independent of the current output block $\mathbf{Z}^q_m$ given the current input block $\mathbf{Z}_m$, implying by the chain rule that (43) coincides with (44), and Theorem 2 reduces to [16, Thm. 1].

2. Even for scalar quantization ($K = 1$), the ECDQ rate (40) refers to joint entropy coding of the whole input vector. This does not contradict the sequential nature of the system, since entropy coding can be implemented causally. Indeed, it follows from the chain rule for entropy that it is enough to encode the instantaneous quantizer output $\mathbf{Q}_m$ conditioned on past quantizer outputs $\mathbf{Q}_1^{m-1}$ and on past and present dither samples $\mathbf{D}_1^m$, in order to achieve the joint entropy of the quantizer in (40).

3. If we do not condition the entropy coding on the past, then we have

$$R_{ECDQ} = I(Z_n; Z_n + N_n^{(\text{uniform})}) \tag{47}$$
$$\le I(Z_n; Z_n + N_n^{(\text{gauss})}) + \frac{1}{2}\log\!\left(\frac{2\pi e}{12}\right) \tag{48}$$
$$= R(D) + \frac{1}{2}\log\!\left(\frac{2\pi e}{12}\right) \tag{49}$$

where $N_n^{(\text{uniform})}$, the scalar quantization noise, is uniformly distributed over an interval of length $\sqrt{12D}$ (so that its variance is $D$), and where (49) follows from Theorem 1. This implies (39) in the previous section.

4. We can embed a $K$-dimensional lattice ECDQ for $K > 1$ in the predictive test channel of Figure 1, instead of the additive noise channel, using the vector-DPCM (VDPCM) configuration discussed in the previous section. For good lattices, when the quantizer dimension $K \to \infty$, the noise process $\mathbf{N}$ in the rate expressions (41) and (46) becomes white Gaussian, and the scheme achieves the rate-distortion function. Indeed, combining Theorems 1 and 2, we see that the average rate per sample of such VDPCM with ECDQ satisfies

$$R_{VDPCM-ECDQ} = I(Z_n; Z_n + N_n) = R(D).$$

This implies, in particular, that the entropy coder does not need to be conditioned on the past at all, as the predictor handles all the memory. However, when the quantization noise is not Gaussian, or the predictor is not optimal, the entropy coder can use the residual time-dependence after prediction to further reduce the coding rate. The resulting rate of the ECDQ would be the average directed information between the source and its reconstruction, as stated in Theorem 2.
**VII. A DUAL RELATIONSHIP WITH DECISION-FEEDBACK EQUALIZATION**

In this section we make an analogy between the predictive form of the Gaussian RDF and the "information optimality" of decision-feedback equalization (DFE) for colored Gaussian channels. As we shall see, a symmetric equivalent form of this channel coding problem, including a water-pouring transmission filter, an MMSE receive filter and a noise-prediction feedback loop, exhibits a striking resemblance to the pre/post-filtered predictive test-channel of Figure 1.

We consider the (real-valued) discrete-time time-invariant linear Gaussian channel,

$$R_n = c_n * X_n + Z_n, \tag{50}$$

where the transmitted signal $X_n$ is subject to a power constraint $E[X_n^2] \le P$, the channel dispersion is modeled by a linear time-invariant filter $c_n$, and the channel noise $Z_n$ is (possibly colored) Gaussian noise. Let $U_n$ represent the data stream, which we model as an i.i.d. zero-mean Gaussian random process with variance $\sigma_U^2$. Further, let $h_{1,n}$ be a spectral shaping filter, satisfying

$$\sigma_U^2 \int_{-1/2}^{1/2} |H_1(e^{j2\pi f})|^2\, df \le P \tag{51}$$

so the channel input $X_n = h_{1,n} * U_n$ indeed satisfies the power constraint. For the moment, we make no further assumption on $h_{1,n}$.

The channel (50) has inter-symbol interference (ISI) due to the channel filter $c_n$, as well as colored Gaussian noise. Let us assume that the channel frequency response is non-zero everywhere, and pass the received signal $R_n$ through a zero-forcing (ZF) linear equalizer $\frac{1}{C(z)}$, resulting in $Y_n$. We thus arrive at an equivalent ISI-free channel,

$$Y_n = X_n + N_n, \tag{52}$$

where the power spectrum of $N_n$ is $S_N(e^{j2\pi f}) = S_Z(e^{j2\pi f}) / |C(e^{j2\pi f})|^2$. The mutual information rate (normalized per symbol) (27) between the input and output of the channel (52) is

$$I(\{X_n\}; \{Y_n\}) = \int_{-1/2}^{1/2} \frac{1}{2}\log\!\left(1 + \frac{S_X(e^{j2\pi f})}{S_N(e^{j2\pi f})}\right) df. \tag{53}$$

We note that if the spectral shaping filter $h_{1,n}$ satisfies the optimum "water-filling" power allocation condition [5], then (53) equals the channel capacity. Similarly to the observations made in Section I with respect to the RDF, we note (as reflected in (53)) that capacity may be achieved by parallel AWGN coding over narrow frequency bands (as done in practice in Discrete Multitone (DMT) / Orthogonal Frequency-Division Multiplexing (OFDM) systems). An alternative approach, based on time-domain prediction rather than the Fourier transform, is offered by the canonical MMSE feed-forward equalizer / decision feedback equalizer (FFE-DFE) structure used in single-carrier transmission. It is well known that this scheme, coupled with AWGN coding, can achieve the capacity of linear Gaussian channels. This has been shown using different approaches by numerous authors; see [11], [4], [1], [7] and references therein. Our exposition closely follows that of Forney [7]. We now recount this result, based on linear prediction of the error sequence; see the system in Figure 8 and its equivalent channel in Figure 9. In the communication literature, this structure is referred to as "noise prediction".

Fig. 8. MMSE-DFE scheme in predictive form: data symbols $U_n$, shaping filter, channel with noise $N_n$, estimation filter producing $\hat{U}_n$, noise predictor, and slicer making (correct) decisions.
It can be recast into the more familiar FFE-DFE form by absorbing a part of the predictor into the estimator filter, forming the usual FFE. As a first step, let $\hat{U}_n$ be the optimal MMSE estimator of $U_n$ from the equivalent channel output sequence $\{Y_n\}$ of (52). Since $\{U_n\}$ and $\{Y_n\}$ are jointly Gaussian and stationary, this estimator is linear and time-invariant. Note that the combination of the ZF equalizer $\frac{1}{C(z)}$ at the receiver front-end and the estimator above is equivalent to direct MMSE estimation of $U_n$ from the original channel output $R_n$ (50). Denote the estimation error, which is composed in general of ISI and Gaussian noise, by $D_n$. Then

$$U_n = \hat{U}_n + D_n \tag{54}$$

where $\{D_n\}$ is statistically independent of $\{\hat{U}_n\}$ due to the orthogonality principle and Gaussianity. Assuming the decoder has access to past symbols $U_n^- = U_{n-1}, U_{n-2}, \ldots$ (see in the sequel), the decoder also knows the past estimation errors $D_n^- = D_{n-1}, D_{n-2}, \ldots$ and may form an optimal linear predictor, $\hat{D}_n$, of the current estimation error $D_n$, which may then be added to $\hat{U}_n$ to form $V_n$. The prediction error $E_n = D_n - \hat{D}_n$ has variance $P_e(D)$, the entropy power of $D_n$. It follows that

$$U_n = \hat{U}_n + D_n = V_n - \hat{D}_n + D_n = V_n + E_n, \tag{55}$$

and therefore

$$E\{U_n - V_n\}^2 = \sigma_E^2 = E\{D_n - \hat{D}_n\}^2 = P_e(D). \tag{56}$$

The channel (55), which describes the input/output relation of the slicer in Figure 8, is often referred to as the backward channel. Furthermore, since $U_n$ and $E_n$ are i.i.d. Gaussian, and since by the orthogonality principle $E_n$ is independent of present and past values of $V_n$ (but dependent on future values through the feedback loop), it is a "sequentially additive" AWGN channel. See Figure 10 for a geometric view of these properties. Notice the strong resemblance to the channel (16), $Z^q_n = Z_n + N_n$, in the predictive test-channel of the RDF: in both channels the output and the noise are i.i.d. and Gaussian, but the input has memory and it depends on past outputs via the feedback loop.

Fig. 10. Geometric view of the estimation process.

We have therefore derived the following.

**Theorem 3 (Information optimality of noise prediction):** For stationary Gaussian processes $U_n$ and $N_n$, if $H_2(e^{j2\pi f})$ is chosen to be the optimal estimation filter of $U_n$ from $Y_n$, and the predictor $g(\cdot)$ is chosen to be the optimal prediction filter of $D_n$ (with $L \to \infty$), then the mutual information rate (53) of the channel from $X_n$ to $Y_n$ (or from $U_n$ to $Y_n$) is equal to the scalar mutual information $I(V_n; V_n + E_n)$ of the channel (55). Furthermore, if $H_1(e^{j2\pi f})$ is chosen such that $S_X(e^{j2\pi f})$ equals the water-filling spectrum of the channel input, then this mutual information equals the channel capacity.

Fig. 9. Noise-prediction equivalent channel: $U_n$, shaping filter $H_1(e^{j2\pi f})$, channel input $X_n$, noise $N_n$, estimation filter $H_2(e^{j2\pi f})$, and predictor $g(D_{n-L}^{n-1})$.

*Proof:* Let $U_n^- = \{U_{n-1}, U_{n-2}, \ldots\}$ and $D_n^- = \{D_{n-1}, D_{n-2}, \ldots\}$.
Using the chain rule of mutual information we have

$$I(\{U_n\}; \{Y_n\}) = h(\{U_n\}) - h(\{U_n\} \mid \{Y_n\})$$
$$= h(\{U_n\}) - h(U_n \mid \{Y_n\}, U_n^-)$$
$$= h(\{U_n\}) - h(U_n - \hat{U}_n \mid \{Y_n\}, U_n^-)$$
$$= h(\{U_n\}) - h(D_n \mid \{Y_n\}, U_n^-)$$
$$= h(\{U_n\}) - h(D_n \mid \{Y_n\}, D_n^-)$$
$$= h(\{U_n\}) - h(D_n - \hat{D}_n \mid \{Y_n\}, D_n^-)$$
$$= h(\{U_n\}) - h(E_n \mid \{Y_n\}, D_n^-)$$
$$= h(\{U_n\}) - h(E_n) \tag{57}$$
$$= I(V_n; V_n + E_n),$$

where $h(\cdot)$ denotes differential entropy rate, and where (57) follows from successive application of the orthogonality principle [7], since we assumed optimum estimation and prediction filters, which are MMSE estimators in the Gaussian setting.

In view of (53) and (56), and since $I(\{U_n\}; \{Y_n\}) = I(\{X_n\}; \{Y_n\})$, Theorem 3 can be rewritten as

$$\int_{-1/2}^{1/2} \frac{1}{2}\log\!\left(1 + \frac{S_X(e^{j2\pi f})}{S_N(e^{j2\pi f})}\right) df = \frac{1}{2}\log\!\left(\frac{\sigma_U^2}{\sigma_E^2}\right) \tag{58}$$

from which we obtain the following well known formula for the "SNR at the slicer" for an infinite-order FFE-DFE [4], [1]:

$$\frac{\sigma_U^2}{\sigma_E^2} = \exp\!\left(\int_{-1/2}^{1/2} \log\!\left(1 + \frac{S_X(e^{j2\pi f})}{S_N(e^{j2\pi f})}\right) df\right).$$

We make a few remarks and interpretations regarding the capacity-achieving predictive configuration, which further enhance its duality relationship with the predictive realization of the RDF.

*Slicing and coding.* We assumed that the decoder has access to past symbols. In the simplest realization, this is achieved by a decision element ("slicer") that works on a symbol-by-symbol basis. In practice however, to approach capacity, the slicer must be replaced by a "decoder". Here we must actually break with the assumption that $X_n$ is a Gaussian process: we implicitly assume that the $X_n$ are symbols of a capacity-achieving AWGN code. The slicer should be viewed as a mnemonic aid where in practice an optimal decoder should be used. However, we encounter two problems with this interpretation. First, the common view of a slicer is as a nearest-neighbor quantizer. Thus in order to function correctly, the noise $E_n$ in (55) must be independent of the symbols $U_n$ and not of the estimator $V_n$ (i.e., the channel should be "forward" additive: $V_n = U_n + E_n$). This can be achieved by dithering the codebook via a modulo-shift as in [6]. This is reminiscent of the dithered quantization approach of Section VI. Another difficulty is the conflict between the inherent decoding delay of a good code and the sequential nature of the noise-prediction DFE configuration. Again (as with vector-DPCM in Section V), this may in principle be solved by incorporating an interleaver, as suggested by Guess and Varanasi [11].

*Capacity-achieving shaping filter.* For any spectral shaping filter $h_{1,n}$, the mutual information is given by (53). The shaping filter which maximizes the mutual information (and yields capacity) under the power constraint (51) is given by the parametric water-filling formula:

$$\sigma_U^2 |H_1(e^{j2\pi f})|^2 = [\theta - S_N(e^{j2\pi f})]^+, \tag{59}$$

where the "water level" $\theta$ is chosen so that the power constraint is met with equality,

$$\sigma_X^2 = \int_{-1/2}^{1/2} \sigma_U^2 |H_1(e^{j2\pi f})|^2\, df = \int_{-1/2}^{1/2} [\theta - S_N(e^{j2\pi f})]^+\, df = P. \tag{60}$$
Using this choice, and arbitrarily setting

$$\sigma_U^2 = \theta, \tag{61}$$

it can be verified that the shaping filter $H_1(e^{j2\pi f})$ and the estimation filter $H_2(e^{j2\pi f})$ satisfy the same complex-conjugate relation as the RDF-achieving pre- and post-filters (13) and (18):

$$H_2(e^{j2\pi f}) = H_1^*(e^{j2\pi f}).$$

Under the same choice, we also have that

$$S_D(e^{j2\pi f}) = \min\!\left\{S_N(e^{j2\pi f}), \theta\right\}. \tag{62}$$

*Shaping, estimation and prediction at high SNR.* At high signal-to-noise ratio (SNR), the shaping filter $H_1$ and the estimation filter $H_2$ become all-pass, and can be replaced by scalar multipliers. If we set the symbol variance as in (61), then at high SNR we get $\sigma_U^2 \approx P$, so $X_n \approx U_n$ and $\hat{U}_n \approx Y_n$. It follows that the estimation error $D_n \approx N_n$, and therefore the slicer error $E_n$ becomes simply the prediction error (or the entropy power) of the channel noise $N_n$. This is the well known "zero-forcing DFE" solution for optimum detection at high SNR [1]. We shall next see that the same behavior of the slicer error holds even under non-asymptotic conditions.

*The prediction process when the Shannon upper bound is tight.* The Shannon upper bound (SUB) on capacity states that

$$C \le \frac{1}{2}\log(2\pi e \sigma_Y^2) - h(N) \le \frac{1}{2}\log\!\left(\frac{P + \sigma_N^2}{P_e(N)}\right) \triangleq C_{SUB}, \tag{63}$$

where $\sigma_N^2 = \int_{-1/2}^{1/2} S_N(e^{j2\pi f})\, df$ is the variance of the equivalent noise, and where equality holds if and only if the output $Y_n$ is white. This in turn is satisfied if and only if $\theta \ge \max_f S_N(e^{j2\pi f})$, in which case $\theta = P + \sigma_N^2$. If we choose $\sigma_U^2$ according to (61), we have:

- The shaping and estimation filters satisfy $|H_1(e^{j2\pi f})|^2 = |H_2(e^{j2\pi f})|^2 = 1 - S_N(e^{j2\pi f})/\theta$.
- $U_n$ and $Y_n$ are white, with the same variance $\theta$.
- $X_n$ and $\hat{U}_n$ have the same power spectrum, $\theta - S_N(e^{j2\pi f})$.
- The power spectrum of $D_n$ is equal to the power spectrum of the noise $N_n$, $S_N(e^{j2\pi f})$. Consequently, the variance of $E_n$, which is equal to the entropy power of $D_n$, is equal to $P_e(N)$.
- As a consequence we have
$$I(V_n; V_n + E_n) = h(U_n) - h(E_n) = h\big(\mathcal{N}(0, \theta)\big) - h\big(\mathcal{N}(0, P_e(N))\big) = \frac{1}{2}\log\!\left(\frac{P + \sigma_N^2}{P_e(N)}\right)$$
which is indeed the SUB (63).

*An alternative derivation of Theorem 3 in the spectral domain.* Similarly to the alternative proof of Theorem 1, one can prove Theorem 3 using the spectra derived above.
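The slicer-SNR identity (58) can indeed be verified spectrally under the water-filling choice (59)-(61): by (62) the estimation-error spectrum is $\min\{S_N, \theta\}$, so $\sigma_E^2 = P_e(D)$ and both sides of (58) become integrals of the noise spectrum. A minimal sketch (the colored noise spectrum and the power constraint $P$ are illustrative assumptions):

```python
import numpy as np

f = np.linspace(-0.5, 0.5, 4001)
S_N = 0.2 / np.abs(1 - 0.6 * np.exp(-1j * 2 * np.pi * f)) ** 2  # illustrative noise spectrum
P = 1.0                                                          # power constraint

# water level: integral of [theta - S_N]^+ equals P, per (59)-(60)
lo, hi = 0.0, S_N.max() + P + 1.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if np.maximum(mid - S_N, 0.0).mean() < P else (lo, mid)
theta = 0.5 * (lo + hi)

S_X = np.maximum(theta - S_N, 0.0)              # water-filling input spectrum (59)
lhs = np.log(1 + S_X / S_N).mean()              # LHS of (58) in nats (equals 2C)
S_D = np.minimum(S_N, theta)                    # estimation-error spectrum (62)
rhs = np.log(theta) - np.log(S_D).mean()        # log(sigma_U^2 / P_e(D)), with sigma_U^2 = theta (61)
print(lhs, rhs)                                 # the two sides agree
```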
**VIII. SUMMARY**

We demonstrated the dual role of prediction in the rate-distortion theory of Gaussian sources and the capacity of ISI channels. These observations shed light on the configurations of DPCM (for source compression) and FFE-DFE (for channel demodulation), and show that in principle they are "information lossless" for any distortion / SNR level. The theoretic bounds, RDF and capacity, can be approached in practice by appropriate use of feedback and linear estimation in the time domain, combined with coding across the "spatial" domain. A prediction-based system has in many cases a lower delay than that of a frequency domain approach, as is well known in practice. We slightly touched on this issue when discussing the 0.5 bit loss due to avoiding the ("non-causal") pre/post filters. But the full potential of this aspect requires further study.

It is tempting to ask whether the predictive form of the RDF can be extended to more general sources and distortion measures (and similarly for capacity of more general ISI channels). Yet, examination of the arguments in our derivation reveals that it is strongly tied to the quadratic-Gaussian case:

- The orthogonality principle, implied by the MMSE criterion, guarantees the whiteness of the noisy prediction error $Z^q_n$ and its orthogonality with the past.
- Gaussianity implies that orthogonality is equivalent to statistical independence.

For other error criteria and/or non-Gaussian sources, prediction (either linear or non-linear) is in general unable to remove the dependence on the past. Hence the scalar mutual information over the prediction error channel would in general be greater than the mutual information rate of the source before prediction.

**ACKNOWLEDGEMENT**

We'd like to thank Robert M. Gray for pointing out to us the origin of DPCM in a U.S. patent by C. C. Cutler in 1952.

**APPENDIX A: PROOF OF THEOREM 2**

It will be convenient to look at $K$-blocks, which we denote by bold letters as in Section VI. Substituting the ECDQ rate definition (40) and the $K$-block directed information definition (45), the required result (46) becomes:

$$H(\mathbf{Q}_1^M \mid \mathbf{D}_1^M) = \sum_{m=1}^{M} I(\mathbf{Z}_1^m; \mathbf{Z}^q_m \mid \mathbf{Z}^{q,m-1}_1).$$

Using the chain rule for entropies, it is enough to show that:

$$H(\mathbf{Q}_m \mid \mathbf{Q}_1^{m-1}, \mathbf{D}_1^M) = I(\mathbf{Z}^q_m; \mathbf{Z}_1^m \mid \mathbf{Z}^{q,m-1}_1).$$

To that end, we have the following sequence of equalities:

$$H(\mathbf{Q}_m \mid \mathbf{Q}_1^{m-1}, \mathbf{D}_1^M) \overset{(a)}{=} H(\mathbf{Q}_m \mid \mathbf{Q}_1^{m-1}, \mathbf{D}_1^m)$$
$$\overset{(b)}{=} H(\mathbf{Q}_m \mid \mathbf{Q}_1^{m-1}, \mathbf{D}_1^m) - H(\mathbf{Q}_m \mid \mathbf{Q}_1^{m-1}, \mathbf{Z}_1^m, \mathbf{D}_1^m)$$
$$= I(\mathbf{Q}_m; \mathbf{Z}_1^m \mid \mathbf{Q}_1^{m-1}, \mathbf{D}_1^m)$$
$$\overset{(c)}{=} I(\mathbf{Q}_m - \mathbf{D}_m; \mathbf{Z}_1^m \mid \mathbf{Q}_1^{m-1}, \mathbf{D}_1^m)$$
$$= I(\mathbf{Z}^q_m; \mathbf{Z}_1^m \mid \mathbf{Q}_1^{m-1}, \mathbf{D}_1^m)$$
$$= I(\mathbf{Z}^q_m; \mathbf{Z}_1^m \mid \mathbf{Q}_1^{m-1} - \mathbf{D}_1^{m-1}, \mathbf{D}_1^m)$$
$$= I(\mathbf{Z}^q_m; \mathbf{Z}_1^m \mid \mathbf{Z}^{q,m-1}_1, \mathbf{D}_1^m)$$
$$\overset{(d)}{=} I(\mathbf{Z}^q_m; \mathbf{Z}_1^m \mid \mathbf{Z}^{q,m-1}_1, \mathbf{D}_m)$$
$$= I(\mathbf{Z}^q_m, \mathbf{D}_m; \mathbf{Z}_1^m \mid \mathbf{Z}^{q,m-1}_1) - I(\mathbf{D}_m; \mathbf{Z}_1^m \mid \mathbf{Z}^{q,m-1}_1)$$
$$\overset{(e)}{=} I(\mathbf{Z}^q_m; \mathbf{Z}_1^m \mid \mathbf{Z}^{q,m-1}_1) - I(\mathbf{D}_m; \mathbf{Z}_1^m \mid \mathbf{Z}^{q,m-1}_1)$$
$$\overset{(f)}{=} I(\mathbf{Z}^q_m; \mathbf{Z}_1^m \mid \mathbf{Z}^{q,m-1}_1).$$

In this sequence, equality (a) comes from the independent dither generation and the causality of the feedback. (b) is justified because $\mathbf{Q}_m$ is a deterministic function of the elements on which the subtracted entropy is conditioned, so that entropy is 0. In (c) we subtract from the left hand side argument of the mutual information one of the variables upon which the mutual information is conditioned. (d) and (e) hold since each dither vector $\mathbf{D}_m$ is a deterministic function of the corresponding quantizer output $\mathbf{Z}^q_m$. Finally, (f) is true since $\mathbf{Z}_1^m$ is independent of $\mathbf{D}_m$ (both conditioned on past quantized values and unconditioned).

**REFERENCES**

[1] J. Barry, E. A. Lee and D. G. Messerschmitt. *Digital Communication*. Kluwer Academic Press, 2004 (third edition).
[2] T. Berger. *Rate Distortion Theory: A Mathematical Basis for Data Compression*. Prentice-Hall, Englewood Cliffs, NJ, 1971.
[3] J. Chen, C. Tian, T. Berger, and S. S. Hemami. Multiple Description Quantization via Gram-Schmidt Orthogonalization. *IEEE Trans. Information Theory*, IT-52:5197-5217, Dec. 2006.
[4] J. M. Cioffi, G. P. Dudevoir, M. V. Eyuboglu, and G. D. J. Forney. MMSE Decision-Feedback Equalizers and Coding - Part I: Equalization Results. *IEEE Trans. Communications*, COM-43:2582-2594, Oct. 1995.
[5] T. M. Cover and J. A. Thomas. *Elements of Information Theory*. Wiley, New York, 1991.
[6] U. Erez and R. Zamir. Achieving 1/2 log(1 + SNR) on the AWGN channel with lattice encoding and decoding. *IEEE Trans. Information Theory*, IT-50:2293-2314, Oct. 2004.
[7] G. D. Forney, Jr. Shannon meets Wiener II: On MMSE estimation in successive decoding schemes. In *42nd Annual Allerton Conference on Communication, Control, and Computing*, Allerton House, Monticello, Illinois, Oct. 2004.
[8] R. G. Gallager. *Information Theory and Reliable Communication*. Wiley, New York, NY, 1968.
[9] A. Gersho and R. M. Gray. *Vector Quantization and Signal Compression*. Kluwer Academic Pub., Boston, 1992.
[10] J. D. Gibson, T. Berger, T. Lookabaugh, D. Lindbergh, and R. L. Baker. *Digital Compression for Multimedia: Principles and Standards*. Morgan Kaufmann Pub., San Francisco, 1998.
[11] T. Guess and M. K. Varanasi. An information-theoretic framework for deriving canonical decision-feedback receivers in Gaussian channels. *IEEE Trans. Information Theory*, IT-51:173-187, Jan. 2005.
[12] N. S. Jayant and P. Noll. *Digital Coding of Waveforms*. Prentice-Hall, Englewood Cliffs, NJ, 1984.
[13] K. T. Kim and T. Berger. Sending a Lossy Version of the Innovations Process is Suboptimal in QG Rate-Distortion. In *Proceedings of ISIT-2005, Adelaide, Australia*, pages 209-213, 2005.
[14] T. Linder and R. Zamir. Causal coding of stationary sources and individual sequences with high resolution. *IEEE Trans. Information Theory*, IT-52:662-680, Feb. 2006.
[15] J. Massey. Causality, Feedback and Directed Information. In *Proc. IEEE Int. Symp. on Information Theory*, pages 303-305, 1990.
[16] R. Zamir and M. Feder. On universal quantization by randomized uniform/lattice quantizers. *IEEE Trans. Information Theory*, pages 428-436, March 1992.
[17] R. Zamir and M. Feder. On lattice quantization noise. *IEEE Trans. Information Theory*, pages 1152-1159, July 1996.
[18] R. Zamir and M. Feder. Information rates of pre/post-filtered dithered quantizers. *IEEE Trans. Information Theory*, pages 1340-1353, September 1996.
