Channels that Heat Up

Tobias Koch, Amos Lapidoth
ETH Zurich, Zurich, Switzerland
Email: {tkoch, lapidoth}@isi.ee.ethz.ch

Paul P. Sotiriadis
Johns Hopkins University, Baltimore, MD, USA
Email: pps@jhu.edu

[The material in this paper was presented in part at the 2007 IEEE International Symposium on Information Theory (ISIT), Nice, France, and at the 2007 IEEE Information Theory Workshop (ITW), Lake Tahoe, CA, USA.]

Abstract

This work considers an additive noise channel where the time-$k$ noise variance is a weighted sum of the channel input powers prior to time $k$. This channel is motivated by point-to-point communication between two terminals that are embedded in the same chip. Transmission heats up the entire chip and hence increases the thermal noise at the receiver. The capacity of this channel (both with and without feedback) is studied at low transmit powers and at high transmit powers. At low transmit powers, the slope of the capacity-vs-power curve at zero is computed, and it is shown that the heating-up effect is beneficial. At high transmit powers, conditions are determined under which the capacity is bounded, i.e., under which the capacity does not grow to infinity as the allowed average power tends to infinity.

1 Introduction

Thermal heating in electronic systems is strongly related to performance limitation, aging, reliability, and safety issues. High performance-density and small physical size (area or volume) make thermal heating important and challenging to address. This is enhanced by the trend of modern (micro-)electronics technology to pack more and faster operations within the smallest possible physical area in order to increase performance, reduce cost and size, and thereby expand the potential applications of the product and make it more profitable.

Electrical power dissipation into heat raises the local temperature of the circuit; more accurately, the temperature depends on the circuit activity. The temperature influences the power of the intrinsic noise in the circuit, which in turn reduces the effective communication or computation capacity of the circuit. This "negative" performance feedback is expected to become a bottleneck of future technology [1], [2].

This work aims to add this dimension to our understanding of the coupling mechanism between communication and computation performance and thermal heating. To this end, a class of communication channels is introduced where the channel's noise power depends dynamically on the channel's activity, and its channel capacity is studied.

To support the previous statements and motivate the mathematical development of this new class of channels, we first discuss the underlying physical mechanism that connects circuit activity with power consumption and thermal heating.

Thermal heating is unavoidable in electronic circuits. Every circuit block converts part of the power it draws from the power supply network (and to a certain extent from its interconnections with other blocks) into heat, which raises the local temperature.

A circuit block in a microchip occupies a certain physical space within which heat is distributively generated and diffused according to the heat diffusion equation (ignoring other heat sources)

$$ C_{hv} \frac{\partial T}{\partial t} = \nabla \cdot \Bigl( \frac{1}{\rho_{thd}} \nabla T \Bigr) + E' \qquad (1) $$
where $C_{hv}$ is the volumetric heat capacity of the material, $\partial T/\partial t$ is the change in temperature over time, $\nabla\cdot$ is the divergence, $\rho_{thd}$ is the distributed thermal resistance, $\nabla T$ is the temperature gradient, and $E'$ is the power density of the added heat [3], [4].

In many cases the diffusion equation can be replaced by the corresponding ordinary differential equation (ODE) that provides a lumped model of the thermal dynamics. Consider for example a microchip (die), made out of material of lower thermal resistance, which is internally heated by the activity of circuits and transfers the heat to the environment (e.g., air), which has much higher resistance. In this case we can write

$$ C_h \frac{dT}{dt} = \frac{T_e - T}{\rho_{th}} + E \qquad (2) $$

where $C_h$ is the heat capacity of the microchip (die), $\rho_{th}$ is the thermal resistance between the die and the environment (e.g., air), $T_e$ is the temperature of the environment, and $E$ is the instantaneous heat generated, i.e., the electrical power converted into heat by the circuit. Solving (2) with the assumption that at time $t = 0$ we have $T = T_e$ with $T_e$ being fixed, we obtain

$$ T(t) = T_e + \frac{1}{C_h} \int_0^t e^{\frac{\xi - t}{\rho_{th} C_h}} E(\xi)\, d\xi, \qquad t \in \mathbb{R}. \qquad (3) $$

If the circuit operates based on a reference clock of period $\tau$, (3) can be approximated by its discrete version

$$ T_k = T_e + \sum_{\ell=1}^{k-1} \frac{\tau}{C_h}\, e^{-\frac{\tau}{\rho_{th} C_h}(k-\ell)}\, E_\ell, \qquad k \in \mathbb{Z}^+, \qquad (4) $$

where $\mathbb{Z}^+$ denotes the set of positive integers, and where the sequences $\{T_k\}$ and $\{E_k\}$ are the samples at integer multiples of $\tau$ of $T(\cdot)$ and $E(\cdot)$, respectively. Equation (4) shows the fading memory effect of temperature. Note that (4) also captures discrete versions of distributed or higher-order lumped approximations of the diffusion equation (1).

Every electronic circuit has some intrinsically generated noise. This noise is added to the received signal, degrading its quality. Especially in the popular class of circuits based on MOS transistors [5], this noise is dominated by a thermal noise component that is stationary Gaussian, and in most applications it can be considered white. The variance of the thermal noise $N$ follows the Johnson-Nyquist formula

$$ N = \lambda T W \qquad (5) $$

where $W$ is the considered bandwidth, $T$ is the temperature of the receiver circuit block, and $\lambda$ is a proportionality constant [5], [6], [7].

The transmission of information is typically associated with dissipation of energy into heat. Thus, in view of (4) and (5), this motivates a channel model where the variance $\theta^2$ of the additive noise is determined by the history of the power of the transmitted signal, i.e.,

$$ \theta^2(x_1, \dots, x_{k-1}) = \sigma^2 + \sum_{\ell=1}^{k-1} \alpha_{k-\ell}\, x_\ell^2, \qquad k \in \mathbb{Z}^+, \qquad (6) $$

where $x_\ell$ is the transmitted symbol at time $\ell \in \mathbb{Z}^+$, and where $\sigma^2$ and $\{\alpha_\ell\}$ will be defined in Section 2.

The rest of this paper is organized as follows. Section 2 describes the channel model in more detail. Section 3 discusses channel capacity and lists some important properties thereof. The main results are presented in Section 4. The proofs of the results are given in Sections 5 and 6. Section 7 concludes with a summary.
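To make the lumped thermal model concrete, the following minimal Python sketch simulates the discrete temperature recursion (4) and the induced noise variance (6). All numerical constants, the random power trace, and the choice of a geometric profile $\alpha_\ell = \rho^\ell$ are illustrative assumptions for this sketch, not parameters taken from the analysis.

```python
import numpy as np

# Illustrative constants (assumptions for this sketch, not from the paper):
# clock period tau, heat capacity C_h, thermal resistance rho_th, ambient T_e.
tau, C_h, rho_th, T_e = 1e-3, 1.0, 2.0, 300.0

rng = np.random.default_rng(0)
E = rng.uniform(0.0, 1.0, size=200)      # dissipated power E_ell per clock slot

# Temperature recursion (4): exponentially fading memory of past power.
T = [T_e + sum(tau / C_h * np.exp(-tau * (k - l) / (rho_th * C_h)) * E[l - 1]
               for l in range(1, k))
     for k in range(1, len(E) + 1)]

# Noise-variance model (6) with the geometric profile alpha_ell = rho**ell.
sigma2, rho = 1.0, 0.5
x = rng.standard_normal(200)             # past channel inputs
theta2 = [sigma2 + sum(rho ** (k - l) * x[l - 1] ** 2 for l in range(1, k))
          for k in range(1, len(x) + 1)]
print(T[:3], theta2[:3])
```

Both sequences start at their idle values ($T_1 = T_e$ and $\theta_1^2 = \sigma^2$) because the sums over past slots are empty at $k = 1$.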
2 Channel Model

[Figure 1: A schema of the communication system: the transmitter maps the message $M$ to channel inputs $X_k$, the channel produces outputs $Y_k$, the receiver forms the estimate $\hat{M}$, and a delay element feeds $Y_{k-1}$ back to the transmitter.]

We consider the communication system depicted in Figure 1. The message $M$ to be transmitted over the channel is assumed to be uniformly distributed over the set $\mathcal{M} = \{1, \dots, |\mathcal{M}|\}$ for some positive integer $|\mathcal{M}|$. The encoder maps the message to the length-$n$ sequence $X_1, \dots, X_n$, where $n$ is the block-length. In the absence of feedback, the sequence $X_1^n$ is a function of the message $M$, i.e., $X_1^n = \phi_n(M)$ for some mapping $\phi_n : \mathcal{M} \to \mathbb{R}^n$. Here $A_m^n$ stands for $A_m, \dots, A_n$, and $\mathbb{R}$ denotes the set of real numbers. If there is a feedback link, then $X_k$, $k = 1, \dots, n$ is not only a function of the message $M$ but also of the past channel output symbols $Y_1^{k-1}$, i.e., $X_k = \varphi_n^{(k)}(M, Y_1^{k-1})$ for some mapping $\varphi_n^{(k)} : \mathcal{M} \times \mathbb{R}^{k-1} \to \mathbb{R}$. The receiver guesses the transmitted message $M$ based on the $n$ channel output symbols $Y_1^n$, i.e., $\hat{M} = \psi_n(Y_1^n)$ for some mapping $\psi_n : \mathbb{R}^n \to \mathcal{M}$.

Conditional on $X_1 = x_1, \dots, X_k = x_k \in \mathbb{R}$, the time-$k$ channel output $Y_k \in \mathbb{R}$ is given by

$$ Y_k = x_k + \sqrt{\sigma^2 + \sum_{\ell=1}^{k-1} \alpha_{k-\ell}\, x_\ell^2} \cdot U_k, \qquad k \in \mathbb{Z}^+, \qquad (7) $$

where $\{U_k\}$ is a zero-mean, unit-variance, stationary and weakly-mixing random process, drawn independently of $M$, and being of finite fourth moment and of finite differential entropy rate, i.e.,

$$ \mathsf{E}[U_k^4] < \infty \qquad \text{and} \qquad h\bigl( U_k \bigm| U_{-\infty}^{k-1} \bigr) > -\infty. \qquad (8) $$

See [8] for a definition of weak mixing. For example, $\{U_k\}$ could be a stationary and ergodic Gaussian process [9]. In particular, the case of most interest is when $\{U_k\}$ are independent and identically distributed (IID), zero-mean, unit-variance Gaussian random variables, and the reader is encouraged to focus on this case.

The parameter $\sigma^2$ is assumed to be positive. It accounts for the temperature of the device when the transmitter is silent. The coefficients $\alpha_\ell$, $\ell \in \mathbb{Z}^+$ are nonnegative and bounded, i.e.,

$$ \alpha_\ell \geq 0, \quad \ell \in \mathbb{Z}^+ \qquad \text{and} \qquad \sup_{\ell \in \mathbb{Z}^+} \alpha_\ell < \infty. \qquad (9) $$

They characterize the dissipation of the heat produced by the transmission of the message $M$. [Footnote 1: It seems reasonable to assume that the sequence $\{\alpha_\ell\}$ is monotonically nonincreasing, i.e., $\alpha_\ell \leq \alpha_{\ell'}$ for $\ell \geq \ell'$. This assumption is, however, not required for the results stated in this paper.] An example of a heat dissipation profile that satisfies (9) is the geometric heat dissipation profile, where $\{\alpha_\ell\}$ is a geometric sequence, i.e.,

$$ \alpha_\ell = \rho^\ell, \qquad \ell \in \mathbb{Z}^+ \qquad (10) $$

for some $0 < \rho < 1$. The heat dissipation depends inter alia on the efficiency of the heat sink that is employed in order to absorb the produced heat. In the above example (10), the heat sink's efficiency is described by the parameter $\rho$: the smaller $\rho$, the more efficient the heat sink. In general, an efficient heat sink is modeled by a heat dissipation profile for which the sequence $\{\alpha_\ell\}$ decays fast.

We study the above channel under an average-power constraint on the inputs, i.e., the mappings $\phi_n$ (without feedback) and $\varphi_n^{(1)}, \dots, \varphi_n^{(n)}$ (with feedback) are chosen such that—averaged over the message $M$ and channel outputs $Y_1^n$—the sequence $X_1^n$ satisfies

$$ \frac{1}{n} \sum_{k=1}^n \mathsf{E}[X_k^2] \leq P, \qquad (11) $$

and we define the signal-to-noise ratio (SNR) as

$$ \mathrm{SNR} \triangleq \frac{P}{\sigma^2}. \qquad (12) $$
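The channel law (7) is straightforward to simulate in the IID Gaussian case. The sketch below is an illustrative implementation assuming the geometric profile (10); it exploits that, for this profile, the interference term obeys the one-step update $\theta_{k+1}^2 - \sigma^2 = \rho\,(\theta_k^2 - \sigma^2 + x_k^2)$. The function name and default values are our own, not the paper's.

```python
import numpy as np

def heatup_channel(x, sigma2=1.0, rho=0.5, rng=None):
    """Simulate (7) with IID standard Gaussian {U_k} and the geometric
    profile (10): theta_k^2 = sigma2 + sum_{l<k} rho**(k-l) * x_l**2."""
    rng = rng if rng is not None else np.random.default_rng()
    y = np.empty(len(x))
    heat = 0.0                            # weighted sum of past input powers
    for k, xk in enumerate(x):
        y[k] = xk + np.sqrt(sigma2 + heat) * rng.standard_normal()
        heat = rho * (heat + xk ** 2)     # geometric one-step update
    return y

print(heatup_channel(np.ones(10)))
```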
Remark 1. The results presented in this paper do not change when (11) is replaced by a per-message average-power constraint, i.e., when the mappings $\phi_n$ and $\varphi_n^{(1)}, \dots, \varphi_n^{(n)}$ are chosen such that, for each message $m \in \mathcal{M}$ and for any given sequence of output symbols $Y_1^n = y_1^n$, the sequence $x_1^n$ satisfies

$$ \frac{1}{n} \sum_{k=1}^n x_k^2 \leq P. \qquad (13) $$

Indeed, all achievability results (which are based on schemes that ignore the feedback) are derived under (13), whereas all converse results are derived under (11). Since all mappings $\phi_n$ and $\varphi_n^{(1)}, \dots, \varphi_n^{(n)}$ that satisfy (13) also fulfill (11), this implies that the achievability results as well as the converse results derived in this paper hold irrespective of whether constraint (11) or (13) is imposed.

3 Channel Capacity

Let the rate $R$ (in nats per channel use) be defined as

$$ R \triangleq \frac{\log |\mathcal{M}|}{n}, \qquad (14) $$

where $\log(\cdot)$ denotes the natural logarithm function. A rate is said to be achievable if there exists a sequence of mappings $\{\phi_n\}$ (without feedback) or $\{\varphi_n^{(1)}, \dots, \varphi_n^{(n)}\}$ (with feedback) and $\{\psi_n\}$ such that the error probability $\Pr(\hat{M} \neq M)$ tends to zero as $n$ goes to infinity. The capacity $C$ is the supremum of all achievable rates. We denote by $C(\mathrm{SNR})$ the capacity under the input constraint (11) when there is no feedback, and we add the subscript "FB" to indicate that there is a feedback link. Clearly,

$$ C(\mathrm{SNR}) \leq C_{\mathrm{FB}}(\mathrm{SNR}), \qquad (15) $$

as we can always ignore the feedback link.

In the absence of feedback, the information capacity is defined as

$$ C_{\mathrm{Info}}(\mathrm{SNR}) \triangleq \lim_{n \to \infty} \frac{1}{n} \sup I(X_1^n; Y_1^n), \qquad (16) $$

where the supremum is over all joint distributions on $X_1, \dots, X_n$ satisfying (11). When there is a feedback link, we define the information capacity as

$$ C_{\mathrm{Info,FB}}(\mathrm{SNR}) \triangleq \lim_{n \to \infty} \frac{1}{n} \sup I(M; Y_1^n), \qquad (17) $$

where the supremum is over all mappings $\varphi_n^{(1)}, \dots, \varphi_n^{(n)}$ satisfying (11). By Fano's inequality [10, Thm. 2.11.1], no rate above $C_{\mathrm{Info}}(\mathrm{SNR})$ and $C_{\mathrm{Info,FB}}(\mathrm{SNR})$ is achievable, i.e.,

$$ C(\mathrm{SNR}) \leq C_{\mathrm{Info}}(\mathrm{SNR}) \qquad \text{and} \qquad C_{\mathrm{FB}}(\mathrm{SNR}) \leq C_{\mathrm{Info,FB}}(\mathrm{SNR}). \qquad (18) $$

See [11] for conditions that guarantee that $C_{\mathrm{Info}}(\mathrm{SNR})$ is achievable. Note that the channel (7) is not stationary [Footnote 2: By a stationary channel we mean a channel where for any stationary sequence of channel inputs $\{X_k\}$ and corresponding channel outputs $\{Y_k\}$ the pair $\{(X_k, Y_k)\}$ is jointly stationary.] since the variance of the additive noise depends on the time-index $k$. It is therefore prima facie not clear whether the inequalities in (18) hold with equality.

In this paper, we shall investigate the capacities $C(\mathrm{SNR})$ and $C_{\mathrm{FB}}(\mathrm{SNR})$ at low SNR and at high SNR. To study capacity at low SNR, we compute the capacities per unit cost, defined as [12]

$$ \dot{C}(0) \triangleq \sup_{\mathrm{SNR} > 0} \frac{C(\mathrm{SNR})}{\mathrm{SNR}} \qquad \text{and} \qquad \dot{C}_{\mathrm{FB}}(0) \triangleq \sup_{\mathrm{SNR} > 0} \frac{C_{\mathrm{FB}}(\mathrm{SNR})}{\mathrm{SNR}}. \qquad (19) $$

It will become apparent later that the suprema in (19) are attained as SNR tends to zero. Note that (15) implies

$$ \dot{C}(0) \leq \dot{C}_{\mathrm{FB}}(0). \qquad (20) $$

At high SNR, we study conditions under which capacity is unbounded in the SNR. Notice that when the allowed transmit power is large, there is a trade-off between optimizing the present transmission and minimizing the interference to future transmissions. Indeed, increasing the transmission power may help to overcome the present ambient noise, but it also heats up the chip and thus increases the noise variance in future receptions. Prima facie it is not clear that, as we increase the allowed transmit power, the capacity tends to infinity. We shall see that this is not necessarily the case.
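This trade-off can be previewed with a back-of-the-envelope computation. Under the purely heuristic assumption of a constant per-symbol power $P$ and the geometric profile (10), the noise variance settles at $\sigma^2 + P\rho/(1-\rho)$, so the effective per-symbol SNR saturates as $P$ grows; all values below are illustrative.

```python
# Heuristic steady state under constant power P and the geometric
# profile (10): theta2 -> sigma2 + P * rho / (1 - rho), so P / theta2
# saturates at (1 - rho) / rho as P grows (illustrative values).
rho, sigma2 = 0.5, 1.0
for P in [1.0, 10.0, 100.0, 1000.0]:
    theta2 = sigma2 + P * rho / (1 - rho)
    print(f"P = {P:7.1f}   effective per-symbol SNR = {P / theta2:.3f}")
```

This saturation is consistent with Theorem 3 below, which shows that the capacity is bounded in the SNR for geometric decay.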
4 Main Results

Our main results are presented in the following two sections. Section 4.1 focuses on capacity at low SNR and presents our results on the capacity per unit cost. Section 4.2 provides a sufficient condition and a necessary condition on $\{\alpha_\ell\}$ under which capacity is bounded in the SNR.

4.1 Capacity per Unit Cost

The results presented in this section hold under the additional assumptions that

$$ \sum_{\ell=1}^{\infty} \alpha_\ell \triangleq \alpha < \infty \qquad (21) $$

and that $\{U_k\}$ is IID.

Proposition 1. Consider the above channel model, and assume additionally that the sequence $\{\alpha_\ell\}$ satisfies (21) and that $\{U_k\}$ is IID. Then

$$ \sup_{\mathrm{SNR} > 0} \frac{C_{\mathrm{Info}}(\mathrm{SNR})}{\mathrm{SNR}} \geq \sup_{\mathrm{SNR} > 0} \frac{C_{\alpha=0}(\mathrm{SNR})}{\mathrm{SNR}}, \qquad (22) $$

where $C_{\alpha=0}(\mathrm{SNR})$ denotes the capacity of the channel $Y_k = x_k + \sigma \cdot U_k$, which is a special case of (7) for $\alpha = 0$.

Proof. See Appendix A.

This proposition demonstrates that the heating up can only increase the information capacity per unit cost. Thus at low SNR the heating effect is unharmful. For Gaussian noise, i.e., if $\{U_k\}$ is a sequence of IID, zero-mean, unit-variance Gaussian random variables, the heating effect is beneficial.

Theorem 2. Consider the above channel model, and assume additionally that the sequence $\{\alpha_\ell\}$ satisfies (21) and that $\{U_k\}$ is a sequence of IID, zero-mean, unit-variance Gaussian random variables. Then, irrespective of whether feedback is available or not, the corresponding capacity per unit cost is given by

$$ \dot{C}_{\mathrm{FB}}(0) = \dot{C}(0) = \lim_{\mathrm{SNR} \downarrow 0} \frac{C(\mathrm{SNR})}{\mathrm{SNR}} = \frac{1}{2} \Bigl( 1 + \sum_{\ell=1}^{\infty} \alpha_\ell \Bigr). \qquad (23) $$

Proof. See Section 5.

For example, for the geometric heat dissipation profile (10) we obtain from Theorem 2

$$ \dot{C}_{\mathrm{FB}}(0) = \dot{C}(0) = \frac{1}{2} \cdot \frac{1}{1 - \rho}, \qquad 0 < \rho < 1. \qquad (24) $$

Thus the capacity per unit cost is monotonically increasing in $\rho$.

The above result might be counterintuitive, because it suggests not to use heat sinks at low SNR. Nevertheless, it can be heuristically explained by noting that the heating effect increases the channel gain. [Footnote 3: The channel gain is given by the ratio of the "desired" power at the channel output to the "desired" power at the channel input.] Indeed, if we split up the channel output

$$ Y_k = X_k + \sqrt{\sigma^2 + \sum_{\ell=1}^{k-1} \alpha_{k-\ell} X_\ell^2} \cdot U_k $$

into a data-dependent part

$$ \tilde{X}_k = X_k + \sqrt{\sum_{\ell=1}^{k-1} \alpha_{k-\ell} X_\ell^2} \cdot U_k $$

and a data-independent part $Z_k$ (with $\{Z_k\}$ being a sequence of IID, zero-mean, variance-$\sigma^2$ Gaussian random variables drawn independently of $\{(U_k, X_k)\}$), then the channel gain $G$ for (7) is given by

$$ G \triangleq \lim_{n \to \infty} \sup \frac{\sum_{k=1}^n \mathsf{E}\bigl[ \tilde{X}_k^2 \bigr]}{\sum_{k=1}^n \mathsf{E}[X_k^2]} = 1 + \sum_{\ell=1}^{\infty} \alpha_\ell, \qquad (25) $$

where the supremum is over all joint distributions on $X_1, \dots, X_n$ satisfying (11). Thus, in view of (25), Theorem 2 demonstrates that the capacity per unit cost is determined by the channel gain $G$. This result is not specific to (7) but has also been observed for other channel models. For example, the same is true for fading channels whenever the additive noise is Gaussian [13], [14].
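As a quick numerical check of Theorem 2 for the geometric profile (10), the sketch below compares the slope $\frac{1}{2}(1 + \sum_\ell \alpha_\ell)$ of (23) with the closed form $\frac{1}{2(1-\rho)}$ of (24). The truncation length of the geometric tail is an arbitrary assumption.

```python
import numpy as np

# Low-SNR slope (23) versus the closed form (24) for alpha_ell = rho**ell.
for rho in [0.1, 0.5, 0.9]:
    alpha = rho ** np.arange(1, 10_000)          # truncated geometric tail
    slope = 0.5 * (1.0 + alpha.sum())
    print(f"rho = {rho}:  0.5*(1 + sum alpha) = {slope:.4f}   "
          f"1/(2*(1-rho)) = {1.0 / (2.0 * (1.0 - rho)):.4f}")
```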
4.2 Conditions for Bounded Capacity

While at low SNR the heating effect is beneficial, at high SNR it is detrimental. In fact, it turns out that the capacity can even be bounded in the SNR, i.e., the capacity does not tend to infinity as the SNR tends to infinity. The following theorem provides a sufficient condition and a necessary condition on $\{\alpha_\ell\}$ for the capacity to be bounded. Note that the results presented in this section do not require the additional assumptions made in Section 4.1: we neither assume that the sequence $\{\alpha_\ell\}$ satisfies (21) nor that $\{U_k\}$ is IID.

Theorem 3. Consider the channel model described in Section 2. Then

$$ \text{i)} \quad \Bigl( \liminf_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} > 0 \Bigr) \implies \Bigl( \sup_{\mathrm{SNR} > 0} C_{\mathrm{FB}}(\mathrm{SNR}) < \infty \Bigr) \qquad (26) $$

$$ \text{ii)} \quad \Bigl( \limsup_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} = 0 \Bigr) \implies \Bigl( \sup_{\mathrm{SNR} > 0} C(\mathrm{SNR}) = \infty \Bigr), \qquad (27) $$

where we define, for any $a > 0$, $a/0 \triangleq \infty$ and $0/0 \triangleq 0$.

Proof. See Section 6.

For example, for a geometric heat dissipation (10) we have

$$ \lim_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} = \rho, \qquad 0 < \rho < 1, $$

and it follows from Theorem 3 that the corresponding capacity is bounded. On the other hand, for a sub-geometric heat dissipation, i.e., $\alpha_\ell = \rho^{\ell^\kappa}$, $\ell \in \mathbb{Z}^+$ for some $0 < \rho < 1$ and $\kappa > 1$, we obtain

$$ \lim_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} = \lim_{\ell \to \infty} \rho^{(\ell+1)^\kappa - \ell^\kappa} = 0, $$

and Theorem 3 implies that the corresponding capacity is unbounded. Roughly speaking, we can say that whenever the sequence of coefficients $\{\alpha_\ell\}$ decays not faster than geometrically, then capacity is bounded in the SNR, and whenever the sequence of coefficients $\{\alpha_\ell\}$ decays faster than geometrically, then capacity is unbounded in the SNR.

Remark 2. For Part i) of Theorem 3, the assumptions that the process $\{U_k\}$ is weakly-mixing and that it has a finite fourth moment are not needed. These assumptions are only needed in the proof of Part ii). [Footnote 4: They are needed to prove Lemma 5.] In Part ii) of Theorem 3, the condition on the left-hand side (LHS) of (27) can be replaced by

$$ \lim_{\ell \to \infty} \frac{1}{\ell} \log \frac{1}{\alpha_\ell} = \infty. \qquad (28) $$

This condition (28) is weaker than the original condition (27) because

$$ \Bigl( \limsup_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} = 0 \Bigr) \implies \Bigl( \lim_{\ell \to \infty} \frac{1}{\ell} \log \frac{1}{\alpha_\ell} = \infty \Bigr). $$

When neither the LHS of (26) nor the LHS of (27) holds, i.e.,

$$ \limsup_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} > 0 \qquad \text{and} \qquad \liminf_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} = 0, \qquad (29) $$

then capacity can be bounded or unbounded. Example 1 exhibits a sequence $\{\alpha_\ell\}$ satisfying (29) for which the capacity is bounded, and Example 2 provides a sequence $\{\alpha_\ell\}$ satisfying (29) for which the capacity is unbounded. [Footnote 5: The provided sequences $\{\alpha_\ell\}$ are not monotonically decreasing in $\ell$. Consequently, Examples 1 & 2 are of mathematical rather than practical interest. Nevertheless, they show that when neither condition of Theorem 3 is satisfied, one can construct simple examples yielding a bounded capacity or an unbounded capacity, thus demonstrating the difficulty of finding conditions that are both necessary and sufficient for the capacity to be bounded.]

Example 1. Consider the sequence $\{\alpha_\ell\}$ where all coefficients with an even index are equal to 1, and where all coefficients with an odd index are 0. It satisfies (29) because $\limsup_{\ell \to \infty} \alpha_{\ell+1}/\alpha_\ell = \infty$ and $\liminf_{\ell \to \infty} \alpha_{\ell+1}/\alpha_\ell = 0$. Then the time-$k$ channel output $Y_k$ corresponding to the channel inputs $(x_1, \dots, x_k)$ is given by

$$ Y_k = x_k + \sqrt{\sigma^2 + \sum_{\ell=1}^{\lfloor (k-1)/2 \rfloor} x_{k-2\ell}^2} \cdot U_k, \qquad k \in \mathbb{Z}^+, $$

where $\lfloor \cdot \rfloor$ denotes the floor function. Thus at even times the output $Y_{2k}$, $k \in \mathbb{Z}^+$ only depends on the "even" inputs $(X_2, X_4, \dots, X_{2k})$, while at odd times the output $Y_{2k+1}$, $k \in \mathbb{Z}_0^+$ only depends on the "odd" inputs $(X_1, X_3, \dots, X_{2k+1})$. By proceeding along the lines of the proof of Part i) of Theorem 3 while choosing in (60) $\beta = 1/y_{k-2}^2$, it can be shown that the capacity of this channel is bounded. [Footnote 6: Intuitively, with this choice of $\{\alpha_\ell\}$ the channel can be divided into two parallel channels, one connecting the inputs and outputs at even times, and the other connecting the inputs and outputs at odd times. As both channels have the coefficients $\tilde{\alpha}_0 = \tilde{\alpha}_1 = \dots = 1$, it follows from Theorem 3 that the capacity of each parallel channel is bounded, and therefore so is the capacity of the original channel.]
Example 2. Consider the sequence $\{\alpha_\ell\}$ where all coefficients with an even positive index are 0, and where all other coefficients are 1. (Again, we have $\limsup_{\ell \to \infty} \alpha_{\ell+1}/\alpha_\ell = \infty$ and $\liminf_{\ell \to \infty} \alpha_{\ell+1}/\alpha_\ell = 0$.) In this case the time-$k$ channel output $Y_k$ corresponding to $(x_1, \dots, x_k)$ is given by

$$ Y_k = x_k + \sqrt{\sigma^2 + \sum_{\ell=1}^{\lfloor k/2 \rfloor} x_{k-2\ell+1}^2} \cdot U_k, \qquad k \in \mathbb{Z}^+. $$

Using Gaussian inputs of power $2P$ at even times while setting the inputs to be zero at odd times, and measuring the channel outputs only at even times, reduces the channel to a memoryless additive noise channel and demonstrates (using the result of [15]) the achievability of

$$ R = \frac{1}{4} \log(1 + 2\, \mathrm{SNR}), $$

which is unbounded in the SNR.

The two seemingly-similar examples thus lead to completely different capacity results. The crucial difference between Example 1 and Example 2 is that in the former example the interference at even times is caused by the past channel inputs at even times, whereas in the latter example the interference at even times is caused by the past channel inputs at odd times. Thus in Example 2 setting all "odd" inputs to zero cancels (at even times) the interference from past channel inputs and hence transforms the channel into an additive noise channel whose capacity is unbounded. Evidently, this approach does not work for Example 1.
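The dichotomy of Theorem 3 can be checked mechanically on the two decay profiles discussed above. The sketch below computes the coefficient ratios for a geometric and a sub-geometric profile; the values of $\rho$ and $\kappa$ are illustrative choices.

```python
import numpy as np

# Ratio test of Theorem 3: geometric decay keeps the ratio at rho > 0
# (capacity bounded), while sub-geometric decay drives the ratio to 0
# (capacity unbounded).
rho, kappa = 0.5, 2.0
l = np.arange(1, 20, dtype=float)
geo = rho ** l                    # alpha_l = rho**l
subgeo = rho ** (l ** kappa)      # alpha_l = rho**(l**kappa), kappa > 1
print("geometric ratios    :", (geo[1:] / geo[:-1])[-3:])
print("sub-geometric ratios:", (subgeo[1:] / subgeo[:-1])[-3:])
```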
5 Proof of Theorem 2

In Section 5.1 we derive an upper bound on the feedback capacity $C_{\mathrm{FB}}(\mathrm{SNR})$, and in Section 5.2 we derive a lower bound on the capacity $C(\mathrm{SNR})$ in the absence of feedback. These bounds are used in Section 5.3 to derive an upper bound on $\dot{C}_{\mathrm{FB}}(0)$ and a lower bound on $\dot{C}(0)$, which are then both shown to be equal to $\frac{1}{2}(1+\alpha)$. Together with (20) this proves Theorem 2.

5.1 Converse

The upper bound on $C_{\mathrm{FB}}(\mathrm{SNR})$ is based on (18) and on an upper bound on $\frac{1}{n} I(M; Y_1^n)$, which for our channel can be expressed, using the chain rule for mutual information, as

$$ \frac{1}{n} I(M; Y_1^n) = \frac{1}{n} \sum_{k=1}^n \Bigl( h\bigl( Y_k \bigm| Y_1^{k-1} \bigr) - h\bigl( Y_k \bigm| Y_1^{k-1}, M \bigr) \Bigr) = \frac{1}{n} \sum_{k=1}^n \Bigl( h\bigl( Y_k \bigm| Y_1^{k-1} \bigr) - h\bigl( Y_k \bigm| Y_1^{k-1}, M, X_1^k \bigr) \Bigr) $$
$$ = \frac{1}{n} \sum_{k=1}^n \Biggl( h\bigl( Y_k \bigm| Y_1^{k-1} \bigr) - h(U_k) - \frac{1}{2} \mathsf{E}\Bigl[ \log\Bigl( \sigma^2 + \sum_{\ell=1}^{k-1} \alpha_{k-\ell} X_\ell^2 \Bigr) \Bigr] \Biggr), \qquad (30) $$

where the second equality follows because $X_1^k$ is a function of $M$ and $Y_1^{k-1}$; and the last equality follows from the behavior of differential entropy under translation and scaling [10, Thms. 9.6.3 & 9.6.4], and because $U_k$ is independent of $(Y_1^{k-1}, M, X_1^k)$.

Evaluating the differential entropy $h(U_k)$ of a Gaussian random variable, and using the trivial lower bound $\mathsf{E}[\log(\sigma^2 + \sum_{\ell=1}^{k-1} \alpha_{k-\ell} X_\ell^2)] \geq \log \sigma^2$, we obtain the final upper bound

$$ \frac{1}{n} I(M; Y_1^n) \leq \frac{1}{n} \sum_{k=1}^n \Bigl( h\bigl( Y_k \bigm| Y_1^{k-1} \bigr) - \frac{1}{2} \log(2\pi e \sigma^2) \Bigr) \leq \frac{1}{n} \sum_{k=1}^n \frac{1}{2} \log\Bigl( 1 + \sum_{\ell=1}^{k} \alpha_{k-\ell} \mathsf{E}[X_\ell^2]/\sigma^2 \Bigr) $$
$$ \leq \frac{1}{2} \log\Bigl( 1 + \frac{1}{n} \sum_{k=1}^n \sum_{\ell=1}^{k} \alpha_{k-\ell} \mathsf{E}[X_\ell^2]/\sigma^2 \Bigr) = \frac{1}{2} \log\Bigl( 1 + \frac{1}{n} \sum_{k=1}^n \mathsf{E}[X_k^2]/\sigma^2 \sum_{\ell=0}^{n-k} \alpha_\ell \Bigr) $$
$$ \leq \frac{1}{2} \log\Bigl( 1 + (1+\alpha)\, \frac{1}{n} \sum_{k=1}^n \mathsf{E}[X_k^2]/\sigma^2 \Bigr) \leq \frac{1}{2} \log\bigl( 1 + (1+\alpha)\, \mathrm{SNR} \bigr), \qquad (31) $$

where we define $\alpha_0 \triangleq 1$. Here the second inequality follows because conditioning cannot increase entropy and from the entropy-maximizing property of Gaussian random variables [10, Thm. 9.6.5]; the next inequality follows by Jensen's inequality; the following equality by rewriting the double sum; the subsequent inequality follows because the coefficients are nonnegative, which implies that $\sum_{\ell=0}^{n-k} \alpha_\ell \leq \sum_{\ell=0}^{\infty} \alpha_\ell = 1 + \alpha$; and the last inequality follows from the power constraint (11).

5.2 Direct Part

As aforementioned, the above channel (7) is not stationary, and it is therefore prima facie not clear whether $C_{\mathrm{Info}}(\mathrm{SNR})$ is achievable. We shall sidestep this problem by studying the capacity of a different channel whose time-$k$ channel output $\tilde{Y}_k \in \mathbb{R}$ is, conditional on the sequence $\{X_k\} = \{x_k\}$, given by

$$ \tilde{Y}_k = x_k + \sqrt{\sigma^2 + \sum_{\ell=-\infty}^{k-1} \alpha_{k-\ell}\, x_\ell^2} \cdot U_k, \qquad k \in \mathbb{Z}^+, \qquad (32) $$

where $\{U_k\}$ and $\{\alpha_\ell\}$ are defined in Section 2. This channel has the advantage that it is stationary and ergodic in the sense that when $\{X_k\}$ is a stationary and ergodic process, then the pair $\{(X_k, \tilde{Y}_k)\}$ is jointly stationary and ergodic. It follows that if the sequences $\{X_k, k = 0, -1, \dots\}$ and $\{X_k, k = 1, 2, \dots\}$ are independent of each other, and if the random variables $X_k$, $k = 0, -1, \dots$ are bounded, then any rate that can be achieved over this new channel is also achievable over the original channel. Indeed, the original channel (7) can be converted into (32) by adding

$$ S_k = \sqrt{\sum_{\ell=-\infty}^{0} \alpha_{k-\ell}\, X_\ell^2} \cdot U_{-k} $$

to the channel output $Y_k$ [Footnote 7: The boundedness of the random variables $X_k$, $k = 0, -1, \dots$ guarantees that the quantity $\sum_{\ell=-\infty}^{0} \alpha_{k-\ell}\, x_\ell^2$ is finite for any realization of $\{X_k, k = 0, -1, \dots\}$.], and, since the independence of $\{X_k, k = 0, -1, \dots\}$ and $\{X_k, k = 1, 2, \dots\}$ ensures that the sequence $\{S_k, k \in \mathbb{Z}^+\}$ is independent of the message $M$, it follows that any rate achievable over (32) can be achieved over (7) by using a receiver that generates $\{S_k, k \in \mathbb{Z}^+\}$ and then guesses $M$ based on $(Y_1 + S_1, \dots, Y_n + S_n)$. [Footnote 8: Note that this approach is specific to the case where $\{U_k\}$ is a sequence of Gaussian random variables. Indeed, it relies heavily on the fact that, given $\{X_k\} = \{x_k\}$, the additive noise term on the right-hand side of (32) can be written as the sum of two independent random variables, of which one only depends on $\{X_k, k = 0, -1, \dots\}$ and the other only on $\{X_k, k = 1, 2, \dots\}$. This surely holds for Gaussian random variables, but it does not necessarily hold for other distributions on $\{U_k\}$.]

We shall consider channel inputs $\{X_k\}$ that are blockwise IID in blocks of $L$ symbols (for some $L \in \mathbb{Z}^+$). Thus, denoting $\mathbf{X}_b = (X_{bL+1}, \dots, X_{(b+1)L})^{\mathsf{T}}$ (where $(\cdot)^{\mathsf{T}}$ denotes the transpose), $\{\mathbf{X}_b\}$ is a sequence of IID random length-$L$ vectors with $\mathbf{X}_b$ taking on the value $(\xi, 0, \dots, 0)^{\mathsf{T}}$ with probability $\delta$ and $(0, \dots, 0)^{\mathsf{T}}$ with probability $1 - \delta$, for some $\xi \in \mathbb{R}$. Note that to satisfy the average-power constraint (11) we shall choose $\xi$ and $\delta$ so that

$$ \frac{\xi^2}{\sigma^2}\, \delta = L\, \mathrm{SNR}. \qquad (33) $$
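Before proceeding with the analysis, here is a small sketch of the blockwise IID on-off input process used in this construction, with $\xi$ and $\delta$ tied together through (33). The function name and the default on-symbol $\xi$ are assumptions made for illustration.

```python
import numpy as np

def onoff_blocks(num_blocks, L, snr, sigma2=1.0, xi=10.0, rng=None):
    """Blockwise IID inputs of the direct part: each length-L block is
    (xi, 0, ..., 0) w.p. delta and all-zero otherwise, with delta set by
    (33): xi**2 * delta / sigma2 = L * snr."""
    rng = rng if rng is not None else np.random.default_rng()
    delta = L * snr * sigma2 / xi ** 2
    assert delta <= 1.0, "choose a larger on-symbol xi for this SNR"
    x = np.zeros((num_blocks, L))
    x[rng.random(num_blocks) < delta, 0] = xi
    return x.ravel()

x = onoff_blocks(num_blocks=100_000, L=8, snr=0.1, rng=np.random.default_rng(2))
print("empirical average power:", np.mean(x ** 2))   # ~ sigma2 * snr = 0.1
```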
Let $\tilde{\mathbf{Y}}_b = (\tilde{Y}_{bL+1}, \dots, \tilde{Y}_{(b+1)L})^{\mathsf{T}}$. Noting that the pair $\{(\mathbf{X}_b, \tilde{\mathbf{Y}}_b)\}$ is jointly stationary and ergodic, it follows from [11] that the rate

$$ \lim_{n \to \infty} \frac{1}{n} I\bigl( \mathbf{X}_0^{\lfloor n/L \rfloor - 1}; \tilde{\mathbf{Y}}_0^{\lfloor n/L \rfloor - 1} \bigr) $$

is achievable over the new channel (32) and thus yields a lower bound on the capacity $C(\mathrm{SNR})$ of the original channel (7). We lower bound this rate as

$$ \frac{1}{n} I\bigl( \mathbf{X}_0^{\lfloor n/L \rfloor - 1}; \tilde{\mathbf{Y}}_0^{\lfloor n/L \rfloor - 1} \bigr) = \frac{1}{n} \sum_{b=0}^{\lfloor n/L \rfloor - 1} I\bigl( \mathbf{X}_b; \tilde{\mathbf{Y}}_0^{\lfloor n/L \rfloor - 1} \bigm| \mathbf{X}_0^{b-1} \bigr) \geq \frac{1}{n} \sum_{b=0}^{\lfloor n/L \rfloor - 1} I\bigl( \mathbf{X}_b; \tilde{\mathbf{Y}}_b \bigm| \mathbf{X}_0^{b-1} \bigr) $$
$$ \geq \frac{1}{n} \sum_{b=0}^{\lfloor n/L \rfloor - 1} \Bigl( I\bigl( \mathbf{X}_b; \tilde{\mathbf{Y}}_b \bigm| \mathbf{X}_{-\infty}^{b-1} \bigr) - I\bigl( \mathbf{X}_{-\infty}^{-1}; \tilde{\mathbf{Y}}_b \bigm| \mathbf{X}_0^{b} \bigr) \Bigr), \qquad (34) $$

where we use the chain rule and the nonnegativity of mutual information. It is shown in Appendix B that

$$ \lim_{b \to \infty} I\bigl( \mathbf{X}_{-\infty}^{-1}; \tilde{\mathbf{Y}}_b \bigm| \mathbf{X}_0^{b} \bigr) = 0. \qquad (35) $$

This together with a Cesàro-type theorem [10, Thm. 4.2.3] yields

$$ \lim_{n \to \infty} \frac{1}{n} I\bigl( \mathbf{X}_0^{\lfloor n/L \rfloor - 1}; \tilde{\mathbf{Y}}_0^{\lfloor n/L \rfloor - 1} \bigr) \geq \frac{1}{L} I\bigl( \mathbf{X}_0; \tilde{\mathbf{Y}}_0 \bigm| \mathbf{X}_{-\infty}^{-1} \bigr) - \frac{1}{L} \lim_{n \to \infty} \frac{1}{\lfloor n/L \rfloor} \sum_{b=0}^{\lfloor n/L \rfloor - 1} I\bigl( \mathbf{X}_{-\infty}^{-1}; \tilde{\mathbf{Y}}_b \bigm| \mathbf{X}_0^{b} \bigr) = \frac{1}{L} I\bigl( \mathbf{X}_0; \tilde{\mathbf{Y}}_0 \bigm| \mathbf{X}_{-\infty}^{-1} \bigr), \qquad (36) $$

where the first inequality follows by the stationarity of $\{(\mathbf{X}_b, \tilde{\mathbf{Y}}_b)\}$, which implies that $I(\mathbf{X}_b; \tilde{\mathbf{Y}}_b \mid \mathbf{X}_{-\infty}^{b-1})$ does not depend on $b$, and by noting that $\lim_{n \to \infty} \lfloor n/L \rfloor / n = 1/L$.

We proceed to analyze $I(\mathbf{X}_0; \tilde{\mathbf{Y}}_0 \mid \mathbf{X}_{-\infty}^{-1} = x_{-\infty}^{-1})$ for a given sequence $\mathbf{X}_{-\infty}^{-1} = x_{-\infty}^{-1}$. Making use of the canonical decomposition of mutual information (e.g., [12, Eq. (10)]), we have

$$ I\bigl( \mathbf{X}_0; \tilde{\mathbf{Y}}_0 \bigm| \mathbf{X}_{-\infty}^{-1} = x_{-\infty}^{-1} \bigr) = I\bigl( X_1; \tilde{\mathbf{Y}}_0 \bigm| \mathbf{X}_{-\infty}^{-1} = x_{-\infty}^{-1} \bigr) $$
$$ = \int D\Bigl( P_{\tilde{\mathbf{Y}}_0 \mid X_1 = x,\, x_{-\infty}^{-1}} \Bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, x_{-\infty}^{-1}} \Bigr)\, dP_{X_1}(x) - D\Bigl( P_{\tilde{\mathbf{Y}}_0 \mid x_{-\infty}^{-1}} \Bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, x_{-\infty}^{-1}} \Bigr) $$
$$ = \delta\, D\Bigl( P_{\tilde{\mathbf{Y}}_0 \mid X_1 = \xi,\, x_{-\infty}^{-1}} \Bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, x_{-\infty}^{-1}} \Bigr) - D\Bigl( P_{\tilde{\mathbf{Y}}_0 \mid x_{-\infty}^{-1}} \Bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, x_{-\infty}^{-1}} \Bigr), \qquad (37) $$

where the first equality follows because, for our choice of input distribution, $X_2 = \dots = X_L = 0$ and hence $X_1$ conveys as much information about $\tilde{\mathbf{Y}}_0$ as $\mathbf{X}_0$. Here $D(\cdot \| \cdot)$ denotes relative entropy, i.e.,

$$ D(P_1 \| P_0) = \begin{cases} \int \log \frac{dP_1}{dP_0}\, dP_1 & \text{if } P_1 \ll P_0 \\ +\infty & \text{otherwise,} \end{cases} $$

and $P_{\tilde{\mathbf{Y}}_0 \mid X_1 = \xi, x_{-\infty}^{-1}}$, $P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0, x_{-\infty}^{-1}}$, and $P_{\tilde{\mathbf{Y}}_0 \mid x_{-\infty}^{-1}}$ denote the distributions of $\tilde{\mathbf{Y}}_0$ conditional on the inputs $(X_1 = \xi, \mathbf{X}_{-\infty}^{-1} = x_{-\infty}^{-1})$, $(X_1 = 0, \mathbf{X}_{-\infty}^{-1} = x_{-\infty}^{-1})$, and on $\mathbf{X}_{-\infty}^{-1} = x_{-\infty}^{-1}$, respectively. Thus $P_{\tilde{\mathbf{Y}}_0 \mid X_1 = \xi, x_{-\infty}^{-1}}$ is the law of an $L$-variate Gaussian random vector of mean $(\xi, 0, \dots, 0)^{\mathsf{T}}$ and of diagonal covariance matrix $\mathsf{K}^{(\xi)}_{x_{-\infty}^{-1}}$ with diagonal entries

$$ \mathsf{K}^{(\xi)}_{x_{-\infty}^{-1}}(1,1) = \sigma^2 + \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L}\, x_{\ell L + 1}^2, \qquad \mathsf{K}^{(\xi)}_{x_{-\infty}^{-1}}(i,i) = \sigma^2 + \alpha_{i-1}\, \xi^2 + \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L + i - 1}\, x_{\ell L + 1}^2, \quad i = 2, \dots, L; $$

$P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0, x_{-\infty}^{-1}}$ is the law of an $L$-variate, zero-mean Gaussian random vector of diagonal covariance matrix $\mathsf{K}^{(0)}_{x_{-\infty}^{-1}}$ with diagonal entries

$$ \mathsf{K}^{(0)}_{x_{-\infty}^{-1}}(i,i) = \sigma^2 + \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L + i - 1}\, x_{\ell L + 1}^2, \qquad i = 1, \dots, L; $$
and $P_{\tilde{\mathbf{Y}}_0 \mid x_{-\infty}^{-1}}$ is given by

$$ P_{\tilde{\mathbf{Y}}_0 \mid x_{-\infty}^{-1}} = \delta\, P_{\tilde{\mathbf{Y}}_0 \mid X_1 = \xi,\, x_{-\infty}^{-1}} + (1 - \delta)\, P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, x_{-\infty}^{-1}}. $$

In order to evaluate the first term on the right-hand side (RHS) of (37), we note that the relative entropy of two real, $L$-variate Gaussian random vectors of means $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ and of covariance matrices $\mathsf{K}_1$ and $\mathsf{K}_2$ is given by

$$ D\bigl( \mathcal{N}(\boldsymbol{\mu}_1, \mathsf{K}_1) \bigm\| \mathcal{N}(\boldsymbol{\mu}_2, \mathsf{K}_2) \bigr) = \frac{1}{2} \log \det \mathsf{K}_2 - \frac{1}{2} \log \det \mathsf{K}_1 + \frac{1}{2} \operatorname{tr}\bigl( \mathsf{K}_1 \mathsf{K}_2^{-1} - \mathsf{I}_L \bigr) + \frac{1}{2} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^{\mathsf{T}} \mathsf{K}_2^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2), \qquad (38) $$

with $\det \mathsf{A}$ and $\operatorname{tr}(\mathsf{A})$ denoting the determinant and the trace of the matrix $\mathsf{A}$, and where $\mathsf{I}_L$ denotes the $L \times L$ identity matrix. The second term on the RHS of (37) is analyzed in the next subsection. Let

$$ \mathsf{E}\Bigl[ D\Bigl( P_{\tilde{\mathbf{Y}}_0 \mid \mathbf{X}_{-\infty}^{-1}} \Bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, \mathbf{X}_{-\infty}^{-1}} \Bigr) \Bigr] = \mathsf{E}_{\mathbf{X}_{-\infty}^{-1}}\Bigl[ D\Bigl( P_{\tilde{\mathbf{Y}}_0 \mid x_{-\infty}^{-1}} \Bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, x_{-\infty}^{-1}} \Bigr) \Bigr] $$

denote the second term on the RHS of (37) averaged over $\mathbf{X}_{-\infty}^{-1}$. Then using (38) & (37) and taking expectations over $\mathbf{X}_{-\infty}^{-1}$, we obtain, again defining $\alpha_0 \triangleq 1$,

$$ \frac{1}{L} I\bigl( \mathbf{X}_0; \tilde{\mathbf{Y}}_0 \bigm| \mathbf{X}_{-\infty}^{-1} \bigr) = \frac{\delta}{L} \frac{\xi^2}{\sigma^2} \frac{1}{2} \sum_{i=1}^{L} \mathsf{E}\Biggl[ \frac{\alpha_{i-1}}{1 + \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L + i - 1} X_{\ell L + 1}^2 / \sigma^2} \Biggr] - \frac{\delta}{L} \frac{1}{2} \sum_{i=2}^{L} \mathsf{E}\Biggl[ \log\Biggl( 1 + \frac{\alpha_{i-1}\, \xi^2}{\sigma^2 + \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L + i - 1} X_{\ell L + 1}^2} \Biggr) \Biggr] - \frac{1}{L} \mathsf{E}\Bigl[ D\Bigl( P_{\tilde{\mathbf{Y}}_0 \mid \mathbf{X}_{-\infty}^{-1}} \Bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, \mathbf{X}_{-\infty}^{-1}} \Bigr) \Bigr] $$
$$ \geq \frac{\delta}{L} \frac{\xi^2}{\sigma^2} \frac{1}{2} \sum_{i=1}^{L} \frac{\alpha_{i-1}}{1 + \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L + i - 1} \mathsf{E}[X_{\ell L + 1}^2] / \sigma^2} - \frac{\delta}{L} \frac{1}{2} \sum_{i=2}^{L} \log\bigl( 1 + \alpha_{i-1}\, \xi^2/\sigma^2 \bigr) - \frac{1}{L} \mathsf{E}\Bigl[ D\Bigl( P_{\tilde{\mathbf{Y}}_0 \mid \mathbf{X}_{-\infty}^{-1}} \Bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, \mathbf{X}_{-\infty}^{-1}} \Bigr) \Bigr] $$
$$ \geq \frac{1}{2}\, \mathrm{SNR} \sum_{i=1}^{L} \frac{\alpha_{i-1}}{1 + \alpha L\, \mathrm{SNR}} - \frac{1}{2}\, \mathrm{SNR} \sum_{i=2}^{L} \frac{\log\bigl( 1 + \alpha_{i-1}\, \xi^2/\sigma^2 \bigr)}{\xi^2/\sigma^2} - \frac{1}{L} \mathsf{E}\Bigl[ D\Bigl( P_{\tilde{\mathbf{Y}}_0 \mid \mathbf{X}_{-\infty}^{-1}} \Bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, \mathbf{X}_{-\infty}^{-1}} \Bigr) \Bigr], \qquad (39) $$

where the first inequality follows by the lower bound $\mathsf{E}[1/(1+X)] \geq 1/(1+\mathsf{E}[X])$, which is a consequence of Jensen's inequality applied to the convex function $1/(1+x)$, $x > 0$, and by the upper bound

$$ \mathsf{E}\Biggl[ \log\Biggl( 1 + \frac{\alpha_{i-1}\, \xi^2}{\sigma^2 + \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L + i - 1} X_{\ell L + 1}^2} \Biggr) \Biggr] \leq \log\bigl( 1 + \alpha_{i-1}\, \xi^2/\sigma^2 \bigr), \qquad i = 2, \dots, L; $$

and the second inequality follows by (33) and by upper bounding $\sum_{\ell=-\infty}^{-1} \alpha_{-\ell L + i - 1} \leq \sum_{\ell=1}^{\infty} \alpha_\ell = \alpha$, $i = 1, \dots, L$.

The final lower bound now follows by (39) and (36),

$$ \lim_{n \to \infty} \frac{1}{n} I\bigl( \mathbf{X}_0^{\lfloor n/L \rfloor - 1}; \tilde{\mathbf{Y}}_0^{\lfloor n/L \rfloor - 1} \bigr) \geq \frac{1}{2}\, \mathrm{SNR} \sum_{i=1}^{L} \frac{\alpha_{i-1}}{1 + \alpha L\, \mathrm{SNR}} - \frac{1}{2}\, \mathrm{SNR} \sum_{i=2}^{L} \frac{\log\bigl( 1 + \alpha_{i-1}\, \xi^2/\sigma^2 \bigr)}{\xi^2/\sigma^2} - \frac{1}{L} \mathsf{E}\Bigl[ D\Bigl( P_{\tilde{\mathbf{Y}}_0 \mid \mathbf{X}_{-\infty}^{-1}} \Bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, \mathbf{X}_{-\infty}^{-1}} \Bigr) \Bigr], \qquad (40) $$

and by recalling that

$$ C(\mathrm{SNR}) \geq \lim_{n \to \infty} \frac{1}{n} I\bigl( \mathbf{X}_0^{\lfloor n/L \rfloor - 1}; \tilde{\mathbf{Y}}_0^{\lfloor n/L \rfloor - 1} \bigr). \qquad (41) $$

5.3 Asymptotic Analysis

We start by analyzing the upper bound (31). Using that $\log(1+x) \leq x$, $x > -1$, we have

$$ \frac{C_{\mathrm{FB}}(\mathrm{SNR})}{\mathrm{SNR}} \leq \frac{1}{2} \frac{\log(1 + (1+\alpha)\, \mathrm{SNR})}{\mathrm{SNR}} \leq \frac{1}{2}(1+\alpha), \qquad (42) $$

and we thus obtain

$$ \dot{C}_{\mathrm{FB}}(0) = \sup_{\mathrm{SNR} > 0} \frac{C_{\mathrm{FB}}(\mathrm{SNR})}{\mathrm{SNR}} \leq \frac{1}{2}(1+\alpha). \qquad (43) $$

In order to derive a lower bound on $\dot{C}(0)$, we first note that

$$ \dot{C}(0) = \sup_{\mathrm{SNR} > 0} \frac{C(\mathrm{SNR})}{\mathrm{SNR}} \geq \lim_{\mathrm{SNR} \downarrow 0} \frac{C(\mathrm{SNR})}{\mathrm{SNR}} \qquad (44) $$

and proceed by analyzing the limiting ratio of the lower bound (40) to SNR as SNR tends to zero. To this end we shall first show that

$$ \lim_{\mathrm{SNR} \downarrow 0} \frac{\mathsf{E}\bigl[ D\bigl( P_{\tilde{\mathbf{Y}}_0 \mid \mathbf{X}_{-\infty}^{-1}} \bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, \mathbf{X}_{-\infty}^{-1}} \bigr) \bigr]}{\mathrm{SNR}} = 0. \qquad (45) $$

We recall that for any pair of distributions $P_0$ and $P_1$ satisfying $P_1 \ll P_0$ [12, p. 1023]

$$ \lim_{\beta \downarrow 0} \frac{D\bigl( \beta P_1 + (1-\beta) P_0 \bigm\| P_0 \bigr)}{\beta} = 0. \qquad (46) $$
Thus, for any given $\mathbf{X}_{-\infty}^{-1} = x_{-\infty}^{-1}$, (46) together with $\delta = \mathrm{SNR}\, L\, \sigma^2 / \xi^2$ implies that

$$ \lim_{\mathrm{SNR} \downarrow 0} \frac{D\bigl( P_{\tilde{\mathbf{Y}}_0 \mid x_{-\infty}^{-1}} \bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, x_{-\infty}^{-1}} \bigr)}{\mathrm{SNR}} = 0. \qquad (47) $$

In order to show that this also holds when $D( P_{\tilde{\mathbf{Y}}_0 \mid x_{-\infty}^{-1}} \| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0, x_{-\infty}^{-1}} )$ is averaged over $\mathbf{X}_{-\infty}^{-1}$, we derive in the following the uniform upper bound

$$ \sup_{x_{-\infty}^{-1}} D\bigl( P_{\tilde{\mathbf{Y}}_0 \mid x_{-\infty}^{-1}} \bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, x_{-\infty}^{-1}} \bigr) = D\bigl( P_{\tilde{\mathbf{Y}}_0 \mid x_{-\infty}^{-1}} \bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, x_{-\infty}^{-1}} \bigr) \Bigr|_{x_{-\infty}^{-1} = 0}. \qquad (48) $$

The claim (45) then follows by upper bounding

$$ \mathsf{E}\Bigl[ D\Bigl( P_{\tilde{\mathbf{Y}}_0 \mid \mathbf{X}_{-\infty}^{-1}} \Bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, \mathbf{X}_{-\infty}^{-1}} \Bigr) \Bigr] \leq D\bigl( P_{\tilde{\mathbf{Y}}_0 \mid x_{-\infty}^{-1}} \bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, x_{-\infty}^{-1}} \bigr) \Bigr|_{x_{-\infty}^{-1} = 0} \qquad (49) $$

and by (47).

In order to prove (48), we use that any Gaussian random vector can be expressed as the sum of two independent Gaussian random vectors to write the channel output $\tilde{\mathbf{Y}}_0$ as

$$ \tilde{\mathbf{Y}}_0 = \mathbf{X}_0 + \mathbf{V} + \mathbf{W}, \qquad (50) $$

where, conditional on $\mathbf{X}_{-\infty}^{0} = x_{-\infty}^{0}$, $\mathbf{V}$ and $\mathbf{W}$ are $L$-variate, zero-mean Gaussian random vectors, drawn independently of each other and having the respective diagonal covariance matrices $\mathsf{K}_{\mathbf{V} \mid x_0}$ and $\mathsf{K}_{\mathbf{W} \mid x_{-\infty}^{-1}}$ whose diagonal entries are given by

$$ \mathsf{K}_{\mathbf{V} \mid x_0}(1,1) = \sigma^2, \qquad \mathsf{K}_{\mathbf{V} \mid x_0}(i,i) = \sigma^2 + \alpha_{i-1}\, x_1^2, \quad i = 2, \dots, L, $$

and

$$ \mathsf{K}_{\mathbf{W} \mid x_{-\infty}^{-1}}(i,i) = \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L + i - 1}\, x_{\ell L + 1}^2, \qquad i = 1, \dots, L. $$

Thus $\mathbf{W}$ is the portion of the noise due to $\mathbf{X}_{-\infty}^{-1}$, and $\mathbf{V}$ is the portion of the noise that remains after subtracting $\mathbf{W}$. Note that $\mathbf{X}_0 + \mathbf{V}$ and $\mathbf{W}$ are independent of each other because $\mathbf{X}_0$ is, by construction, independent of $\mathbf{X}_{-\infty}^{-1}$. The upper bound (48) now follows by

$$ D\bigl( P_{\tilde{\mathbf{Y}}_0 \mid x_{-\infty}^{-1}} \bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, x_{-\infty}^{-1}} \bigr) = D\bigl( P_{\mathbf{X}_0 + \mathbf{V} + \mathbf{W} \mid x_{-\infty}^{-1}} \bigm\| P_{\mathbf{X}_0 + \mathbf{V} + \mathbf{W} \mid X_1 = 0,\, x_{-\infty}^{-1}} \bigr) \leq D\bigl( P_{\mathbf{X}_0 + \mathbf{V}} \bigm\| P_{\mathbf{X}_0 + \mathbf{V} \mid X_1 = 0} \bigr) = D\bigl( P_{\tilde{\mathbf{Y}}_0 \mid x_{-\infty}^{-1}} \bigm\| P_{\tilde{\mathbf{Y}}_0 \mid X_1 = 0,\, x_{-\infty}^{-1}} \bigr) \Bigr|_{x_{-\infty}^{-1} = 0}, \qquad (51) $$

where $P_{\mathbf{X}_0 + \mathbf{V} + \mathbf{W} \mid x_{-\infty}^{-1}}$ and $P_{\mathbf{X}_0 + \mathbf{V} + \mathbf{W} \mid X_1 = 0, x_{-\infty}^{-1}}$ denote the distributions of $\mathbf{X}_0 + \mathbf{V} + \mathbf{W}$ conditional on the inputs $\mathbf{X}_{-\infty}^{-1} = x_{-\infty}^{-1}$ and on $(X_1 = 0, \mathbf{X}_{-\infty}^{-1} = x_{-\infty}^{-1})$, respectively; $P_{\mathbf{X}_0 + \mathbf{V}}$ denotes the unconditional distribution of $\mathbf{X}_0 + \mathbf{V}$; and $P_{\mathbf{X}_0 + \mathbf{V} \mid X_1 = 0}$ denotes the distribution of $\mathbf{X}_0 + \mathbf{V}$ conditional on $X_1 = 0$. Here the inequality follows by the data processing inequality for relative entropy (see [10, Sec. 2.9]) and by noting that $\mathbf{X}_0 + \mathbf{V}$ is independent of $\mathbf{X}_{-\infty}^{-1}$.

Returning to the analysis of (40), we obtain from (44) and (45)

$$ \dot{C}(0) \geq \lim_{\mathrm{SNR} \downarrow 0} \frac{C(\mathrm{SNR})}{\mathrm{SNR}} \geq \lim_{\mathrm{SNR} \downarrow 0} \Biggl( \frac{1}{2} \sum_{i=1}^{L} \frac{\alpha_{i-1}}{1 + \alpha L\, \mathrm{SNR}} - \frac{1}{2} \sum_{i=2}^{L} \frac{\log\bigl( 1 + \alpha_{i-1}\, \xi^2/\sigma^2 \bigr)}{\xi^2/\sigma^2} \Biggr) = \frac{1}{2} \sum_{i=1}^{L} \alpha_{i-1} - \frac{1}{2} \sum_{i=2}^{L} \frac{\log\bigl( 1 + \alpha_{i-1}\, \xi^2/\sigma^2 \bigr)}{\xi^2/\sigma^2}. \qquad (52) $$

By letting first $\xi^2$ go to infinity while holding $L$ fixed, and by then letting $L$ go to infinity, we obtain the desired lower bound on the capacity per unit cost

$$ \dot{C}(0) \geq \lim_{\mathrm{SNR} \downarrow 0} \frac{C(\mathrm{SNR})}{\mathrm{SNR}} \geq \frac{1}{2}(1+\alpha). \qquad (53) $$

Thus (53), (20), and (43) yield

$$ \frac{1}{2}(1+\alpha) \leq \lim_{\mathrm{SNR} \downarrow 0} \frac{C(\mathrm{SNR})}{\mathrm{SNR}} \leq \dot{C}(0) \leq \dot{C}_{\mathrm{FB}}(0) \leq \frac{1}{2}(1+\alpha), \qquad (54) $$

which proves Theorem 2.
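As a numerical sanity check of the asymptotic step above, the sketch below evaluates the right-hand side of (52) for the geometric profile (10) with growing $\xi^2/\sigma^2$ and $L$; it approaches $\frac{1}{2}(1+\alpha) = 1/(2(1-\rho))$, as the proof prescribes. The particular parameter ladder is an assumption for illustration.

```python
import numpy as np

# Evaluate the lower bound (52): 0.5*sum_{i=1..L} alpha_{i-1}
#   - 0.5*sum_{i=2..L} log(1 + alpha_{i-1}*g) / g,  g = xi^2/sigma^2,
# for alpha_0 = 1 and alpha_l = rho**l (illustrative rho = 0.5).
rho = 0.5
for L, g in [(4, 1e2), (16, 1e4), (64, 1e8)]:
    alpha = np.concatenate(([1.0], rho ** np.arange(1, L)))  # alpha_0..alpha_{L-1}
    bound = 0.5 * alpha.sum() - 0.5 * np.sum(np.log1p(alpha[1:] * g) / g)
    print(f"L = {L:3d}, xi^2/sigma^2 = {g:.0e}:  bound = {bound:.4f}")
print("limit 1/(2*(1-rho)) =", 1.0 / (2.0 * (1.0 - rho)))
```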
6 Proof of Theorem 3

6.1 Part i)

In order to show that

$$ \liminf_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} > 0 \qquad (55) $$

implies that the feedback capacity $C_{\mathrm{FB}}(\mathrm{SNR})$ is bounded, we derive a capacity upper bound which is based on (18) and on an upper bound on $\frac{1}{n} I(M; Y_1^n)$. Again we define $\alpha_0 \triangleq 1$. We first note that, according to (55), we can find an $\ell_0 \in \mathbb{Z}^+$ and a $0 < \rho < 1$ so that

$$ \alpha_{\ell_0} > 0 \qquad \text{and} \qquad \frac{\alpha_{\ell+1}}{\alpha_\ell} \geq \rho, \quad \ell \geq \ell_0. \qquad (56) $$

We continue with the chain rule for mutual information:

$$ \frac{1}{n} I(M; Y_1^n) = \frac{1}{n} \sum_{k=1}^{\ell_0} I\bigl( M; Y_k \bigm| Y_1^{k-1} \bigr) + \frac{1}{n} \sum_{k=\ell_0+1}^{n} I\bigl( M; Y_k \bigm| Y_1^{k-1} \bigr). \qquad (57) $$

Each summand in the first sum on the RHS of (57) is upper bounded by

$$ I\bigl( M; Y_k \bigm| Y_1^{k-1} \bigr) \leq h(Y_k) - h\bigl( Y_k \bigm| Y_1^{k-1}, M \bigr) = h(Y_k) - \frac{1}{2} \mathsf{E}\Bigl[ \log\Bigl( \sigma^2 + \sum_{\ell=1}^{k-1} \alpha_{k-\ell} X_\ell^2 \Bigr) \Bigr] - h\bigl( U_k \bigm| U_1^{k-1} \bigr) $$
$$ \leq \frac{1}{2} \log\Biggl( 2\pi e \Bigl( 1 + \sum_{\ell=1}^{k} \alpha_{k-\ell} \frac{\mathsf{E}[X_\ell^2]}{\sigma^2} \Bigr) \Biggr) - h\bigl( U_k \bigm| U_1^{k-1} \bigr) \leq \frac{1}{2} \log\Biggl( 2\pi e \Bigl( 1 + \Bigl( \sup_{\ell' \in \mathbb{Z}_0^+} \alpha_{\ell'} \Bigr) \sum_{\ell=1}^{k} \frac{\mathsf{E}[X_\ell^2]}{\sigma^2} \Bigr) \Biggr) - h\bigl( U_k \bigm| U_1^{k-1} \bigr) $$
$$ \leq \frac{1}{2} \log\Bigl( 2\pi e \Bigl( 1 + \Bigl( \sup_{\ell' \in \mathbb{Z}_0^+} \alpha_{\ell'} \Bigr) n\, \mathrm{SNR} \Bigr) \Bigr) - h\bigl( U_k \bigm| U_1^{k-1} \bigr) \leq \frac{1}{2} \log\Bigl( 2\pi e \Bigl( 1 + \Bigl( \sup_{\ell' \in \mathbb{Z}_0^+} \alpha_{\ell'} \Bigr) n\, \mathrm{SNR} \Bigr) \Bigr) - h\bigl( U_k \bigm| U_{-\infty}^{k-1} \bigr). \qquad (58) $$

Recall that $\sup_{\ell' \in \mathbb{Z}_0^+} \alpha_{\ell'}$ is finite (9). Here the first inequality follows because conditioning cannot increase entropy; the following equality follows because $(X_1^k, U_1^{k-1})$ is a function of $(M, Y_1^{k-1})$, from the behavior of entropy under translation and scaling [10, Thms. 9.6.3 & 9.6.4], and from the fact that, conditional on $U_1^{k-1}$, $U_k$ is independent of $(X_1^k, M, Y_1^{k-1})$; the subsequent inequality follows from the entropy-maximizing property of Gaussian random variables and by lower bounding $\mathsf{E}[\log(\sigma^2 + \sum_{\ell=1}^{k-1} \alpha_{k-\ell} X_\ell^2)] \geq \log \sigma^2$; the next inequality by upper bounding each coefficient $\alpha_\ell \leq \sup_{\ell' \in \mathbb{Z}_0^+} \alpha_{\ell'}$, $\ell = 1, \dots, k$; the subsequent inequality follows from the power constraint (11); and the last inequality follows because conditioning cannot increase entropy.

The summands in the second sum on the RHS of (57) are upper bounded using the general upper bound for mutual information [16, Thm. 5.1]

$$ I(X; Y) \leq \int D\bigl( W(\cdot \mid x) \bigm\| R(\cdot) \bigr)\, dQ(x), \qquad (59) $$

where $W(\cdot \mid \cdot)$ is the channel law, $Q(\cdot)$ is the distribution on the channel input $X$, and $R(\cdot)$ is any distribution on the output alphabet. Thus any choice of output distribution $R(\cdot)$ yields an upper bound on the mutual information. We upper bound $I(M; Y_k \mid Y_1^{k-1} = y_1^{k-1})$, $k = \ell_0 + 1, \dots, n$ for a given $Y_1^{k-1} = y_1^{k-1}$ by choosing $R(\cdot)$ to be a Cauchy distribution whose density is given by

$$ \frac{\sqrt{\beta}}{\pi} \cdot \frac{1}{1 + \beta y_k^2}, \qquad y_k \in \mathbb{R}, \qquad (60) $$

where we choose the scale parameter $\beta$ to be [Footnote 9: When $y_{k-\ell_0} = 0$, then with this choice of $\beta$ the density of the Cauchy distribution (60) is undefined. However, this event is of zero probability and has therefore no impact on the mutual information $I(M; Y_k \mid Y_1^{k-1})$.]

$$ \beta = \frac{1}{\tilde{\beta}\, y_{k-\ell_0}^2} \qquad \text{and} \qquad \tilde{\beta} = \min\Biggl\{ \frac{\rho^{\ell_0 - 1}\, \alpha_{\ell_0}}{\max_{\ell' = 0, \dots, \ell_0 - 1} \alpha_{\ell'}},\ \alpha_{\ell_0},\ \rho^{\ell_0} \Biggr\}, \qquad (61) $$

with $0 < \rho < 1$ and $\ell_0 \in \mathbb{Z}^+$ given by (56). Note that (56) together with (9) implies that

$$ 0 < \tilde{\beta} < 1 \qquad \text{and} \qquad \tilde{\beta}\, \alpha_\ell \leq \alpha_{\ell + \ell_0}, \quad \ell \in \mathbb{Z}_0^+. \qquad (62) $$

Applying (60) to (59) yields

$$ I\bigl( M; Y_k \bigm| Y_1^{k-1} = y_1^{k-1} \bigr) \leq \mathsf{E}\Biggl[ \log\Biggl( 1 + \frac{Y_k^2}{\tilde{\beta}\, Y_{k-\ell_0}^2} \Biggr) \Biggm| Y_1^{k-1} = y_1^{k-1} \Biggr] + \frac{1}{2} \log\bigl( \tilde{\beta}\, y_{k-\ell_0}^2 \bigr) + \log \pi - h\bigl( Y_k \bigm| M, Y_1^{k-1} = y_1^{k-1} \bigr), \qquad (63) $$

and we thus obtain, averaging over $Y_1^{k-1}$,

$$ I\bigl( M; Y_k \bigm| Y_1^{k-1} \bigr) \leq \log \pi - h\bigl( Y_k \bigm| Y_1^{k-1}, M \bigr) + \frac{1}{2} \mathsf{E}\bigl[ \log\bigl( \tilde{\beta}\, Y_{k-\ell_0}^2 \bigr) \bigr] + \mathsf{E}\bigl[ \log\bigl( \tilde{\beta}\, Y_{k-\ell_0}^2 + Y_k^2 \bigr) \bigr] - \mathsf{E}\bigl[ \log\bigl( Y_{k-\ell_0}^2 \bigr) \bigr] - \log \tilde{\beta}. \qquad (64) $$

We evaluate the terms on the RHS of (64) individually. We begin with

$$ h\bigl( Y_k \bigm| Y_1^{k-1}, M \bigr) \geq \frac{1}{2} \mathsf{E}\Bigl[ \log\Bigl( \sigma^2 + \sum_{\ell=1}^{k-1} \alpha_{k-\ell} X_\ell^2 \Bigr) \Bigr] + h\bigl( U_k \bigm| U_{-\infty}^{k-1} \bigr), \qquad (65) $$

where we use the same steps as in the equality in (58) and that conditioning cannot increase entropy.
The next term is upper bounded by

$$ \mathsf{E}\bigl[ \log\bigl( \tilde{\beta}\, Y_{k-\ell_0}^2 \bigr) \bigr] = \mathsf{E}\Bigl[ \mathsf{E}\Bigl[ \log\Bigl( \tilde{\beta} \bigl( X_{k-\ell_0} + \theta\bigl( X_1^{k-\ell_0-1} \bigr) U_{k-\ell_0} \bigr)^2 \Bigr) \Bigm| X_1^{k-\ell_0} \Bigr] \Bigr] \leq \mathsf{E}\Bigl[ \log\Bigl( \tilde{\beta}\, \mathsf{E}\Bigl[ \bigl( X_{k-\ell_0} + \theta\bigl( X_1^{k-\ell_0-1} \bigr) U_{k-\ell_0} \bigr)^2 \Bigm| X_1^{k-\ell_0} \Bigr] \Bigr) \Bigr] $$
$$ = \mathsf{E}\Biggl[ \log\Biggl( \tilde{\beta} X_{k-\ell_0}^2 + \tilde{\beta} \sigma^2 + \tilde{\beta} \sum_{\ell=1}^{k-\ell_0-1} \alpha_{k-\ell_0-\ell} X_\ell^2 \Biggr) \Biggr] \leq \mathsf{E}\Biggl[ \log\Biggl( \sigma^2 + \sum_{\ell=1}^{k-\ell_0} \alpha_{k-\ell} X_\ell^2 \Biggr) \Biggr], \qquad (66) $$

where we define, for a given $X_1^{k-1} = x_1^{k-1}$,

$$ \theta\bigl( x_1^{k-1} \bigr) \triangleq \sqrt{\sigma^2 + \sum_{\ell=1}^{k-1} \alpha_{k-\ell}\, x_\ell^2}. \qquad (67) $$

Here the first inequality in (66) follows from Jensen's inequality, and the second inequality follows from (62). Similarly, we use Jensen's inequality along with (62) to upper bound

$$ \mathsf{E}\bigl[ \log\bigl( \tilde{\beta}\, Y_{k-\ell_0}^2 + Y_k^2 \bigr) \bigr] \leq \mathsf{E}\Biggl[ \log\Biggl( \sigma^2 + \sum_{\ell=1}^{k-\ell_0} \alpha_{k-\ell} X_\ell^2 + \sigma^2 + \sum_{\ell=1}^{k} \alpha_{k-\ell} X_\ell^2 \Biggr) \Biggr] \leq \log 2 + \mathsf{E}\Biggl[ \log\Biggl( \sigma^2 + \sum_{\ell=1}^{k} \alpha_{k-\ell} X_\ell^2 \Biggr) \Biggr]. \qquad (68) $$

In order to lower bound $\mathsf{E}[\log(Y_{k-\ell_0}^2)]$ we need the following lemma.

Lemma 4. Let $X$ be a random variable of density $f_X(x)$, $x \in \mathbb{R}$. Then, for any $0 < \delta \leq 1$ and $0 < \eta < 1$ we have

$$ \sup_{c \in \mathbb{R}} \mathsf{E}\bigl[ \log |X + c|^{-1} \cdot \mathsf{I}\{ |X + c| \leq \delta \} \bigr] \leq \epsilon(\delta, \eta) + \frac{1}{\eta}\, h^{-}(X), \qquad (69) $$

where $\mathsf{I}\{\cdot\}$ denotes the indicator function [Footnote 10: The indicator function $\mathsf{I}\{\text{statement}\}$ takes on the value 1 if the statement is true and 0 otherwise.]; $h^{-}(X)$ is defined as

$$ h^{-}(X) \triangleq \int_{\{x \in \mathbb{R} : f_X(x) > 1\}} f_X(x) \log f_X(x)\, dx; \qquad (70) $$

and where $\epsilon(\delta, \eta) > 0$ tends to zero as $\delta \downarrow 0$.

Proof. See [16, Lemma 6.7].

We write the expectation as

$$ \mathsf{E}\bigl[ \log\bigl( Y_{k-\ell_0}^2 \bigr) \bigr] = \mathsf{E}\Bigl[ \mathsf{E}\Bigl[ \log\Bigl( \bigl( X_{k-\ell_0} + \theta\bigl( X_1^{k-\ell_0-1} \bigr) U_{k-\ell_0} \bigr)^2 \Bigr) \Bigm| X_1^{k-\ell_0} \Bigr] \Bigr] $$

and lower bound the conditional expectation for a given $X_1^{k-\ell_0} = x_1^{k-\ell_0}$ by

$$ \mathsf{E}\Bigl[ \log\Bigl( \bigl( X_{k-\ell_0} + \theta\bigl( X_1^{k-\ell_0-1} \bigr) U_{k-\ell_0} \bigr)^2 \Bigr) \Bigm| X_1^{k-\ell_0} = x_1^{k-\ell_0} \Bigr] = \log \theta^2\bigl( x_1^{k-\ell_0-1} \bigr) - 2\, \mathsf{E}\Biggl[ \log \biggl| \frac{x_{k-\ell_0}}{\theta\bigl( x_1^{k-\ell_0-1} \bigr)} + U_{k-\ell_0} \biggr|^{-1} \Biggm| X_1^{k-\ell_0} = x_1^{k-\ell_0} \Biggr] $$
$$ \geq \log \theta^2\bigl( x_1^{k-\ell_0-1} \bigr) - 2\, \epsilon(\delta, \eta) - \frac{2}{\eta}\, h^{-}(U_{k-\ell_0}) + \log \delta^2 \qquad (71) $$

for some $0 < \delta \leq 1$ and $0 < \eta < 1$. Here the inequality follows by splitting the conditional expectation into the two expectations

$$ \mathsf{E}\biggl[ \log \Bigl| \tfrac{x_{k-\ell_0}}{\theta( x_1^{k-\ell_0-1} )} + U_{k-\ell_0} \Bigr|^{-1} \cdot \mathsf{I}\Bigl\{ \Bigl| \tfrac{x_{k-\ell_0}}{\theta( x_1^{k-\ell_0-1} )} + U_{k-\ell_0} \Bigr| \leq \delta \Bigr\} \biggm| X_1^{k-\ell_0} = x_1^{k-\ell_0} \biggr] + \mathsf{E}\biggl[ \log \Bigl| \tfrac{x_{k-\ell_0}}{\theta( x_1^{k-\ell_0-1} )} + U_{k-\ell_0} \Bigr|^{-1} \cdot \mathsf{I}\Bigl\{ \Bigl| \tfrac{x_{k-\ell_0}}{\theta( x_1^{k-\ell_0-1} )} + U_{k-\ell_0} \Bigr| > \delta \Bigr\} \biggm| X_1^{k-\ell_0} = x_1^{k-\ell_0} \biggr] $$

and by then upper bounding the first term using Lemma 4 and the second term by $-\log \delta$. Averaging (71) over $X_1^{k-\ell_0}$ yields

$$ \mathsf{E}\bigl[ \log\bigl( Y_{k-\ell_0}^2 \bigr) \bigr] \geq \mathsf{E}\Biggl[ \log\Biggl( \sigma^2 + \sum_{\ell=1}^{k-\ell_0-1} \alpha_{k-\ell_0-\ell} X_\ell^2 \Biggr) \Biggr] - 2\, \epsilon(\delta, \eta) - \frac{2}{\eta}\, h^{-}(U_{k-\ell_0}) + \log \delta^2. \qquad (72) $$

Note that, since $U_{k-\ell_0}$ is of unit variance, (8) together with [16, Lemma 6.4] implies that $h^{-}(U_{k-\ell_0})$ is finite.
Turning back to the upper bound (64), we obtain from (65), (66), (68), and (72)

$$ I\bigl( M; Y_k \bigm| Y_1^{k-1} \bigr) \leq \log \pi - \frac{1}{2} \mathsf{E}\Bigl[ \log\Bigl( \sigma^2 + \sum_{\ell=1}^{k-1} \alpha_{k-\ell} X_\ell^2 \Bigr) \Bigr] - h\bigl( U_k \bigm| U_{-\infty}^{k-1} \bigr) + \frac{1}{2} \mathsf{E}\Bigl[ \log\Bigl( \sigma^2 + \sum_{\ell=1}^{k-\ell_0} \alpha_{k-\ell} X_\ell^2 \Bigr) \Bigr] + \log 2 $$
$$ \quad + \mathsf{E}\Bigl[ \log\Bigl( \sigma^2 + \sum_{\ell=1}^{k} \alpha_{k-\ell} X_\ell^2 \Bigr) \Bigr] - \mathsf{E}\Bigl[ \log\Bigl( \sigma^2 + \sum_{\ell=1}^{k-\ell_0-1} \alpha_{k-\ell_0-\ell} X_\ell^2 \Bigr) \Bigr] + 2\, \epsilon(\delta, \eta) + \frac{2}{\eta}\, h^{-}(U_{k-\ell_0}) - \log \delta^2 - \log \tilde{\beta} $$
$$ \leq \mathsf{E}\Bigl[ \log\Bigl( \sigma^2 + \sum_{\ell=1}^{k} \alpha_{k-\ell} X_\ell^2 \Bigr) \Bigr] - \mathsf{E}\Bigl[ \log\Bigl( \sigma^2 + \sum_{\ell=1}^{k-\ell_0-1} \alpha_{k-\ell_0-\ell} X_\ell^2 \Bigr) \Bigr] + \mathsf{K}, \qquad (73) $$

where

$$ \mathsf{K} \triangleq \log \frac{2\pi}{\tilde{\beta}\, \delta^2} - h\bigl( U_k \bigm| U_{-\infty}^{k-1} \bigr) + \frac{2}{\eta}\, h^{-}(U_{k-\ell_0}) + 2\, \epsilon(\delta, \eta) \qquad (74) $$

is a finite constant, and where the last inequality in (73) follows because for any $X_{k-\ell_0+1}^{k-1} = x_{k-\ell_0+1}^{k-1}$ we have $\sum_{\ell=1}^{k-\ell_0} \alpha_{k-\ell}\, x_\ell^2 \leq \sum_{\ell=1}^{k-1} \alpha_{k-\ell}\, x_\ell^2$. Note that $\mathsf{K}$ does not depend on $k$ because the process $\{U_k\}$ is stationary.

Turning back to the evaluation of the second sum on the RHS of (57), we use that for any sequences $\{a_k\}$ and $\{b_k\}$

$$ \sum_{k=\ell_0+1}^{n} (a_k - b_k) = \sum_{k=n-2\ell_0+1}^{n} (a_k - b_{k-n+3\ell_0}) + \sum_{k=\ell_0+1}^{n-2\ell_0} (a_k - b_{k+2\ell_0}). \qquad (75) $$

Defining

$$ a_k \triangleq \mathsf{E}\Bigl[ \log\Bigl( \sigma^2 + \sum_{\ell=1}^{k} \alpha_{k-\ell} X_\ell^2 \Bigr) \Bigr] \qquad (76) $$

and

$$ b_k \triangleq \mathsf{E}\Bigl[ \log\Bigl( \sigma^2 + \sum_{\ell=1}^{k-\ell_0-1} \alpha_{k-\ell_0-\ell} X_\ell^2 \Bigr) \Bigr], \qquad (77) $$

we have for the first sum on the RHS of (75)

$$ \sum_{k=n-2\ell_0+1}^{n} (a_k - b_{k-n+3\ell_0}) = \sum_{k=n-2\ell_0+1}^{n} \mathsf{E}\Biggl[ \log\Biggl( \frac{\sigma^2 + \sum_{\ell=1}^{k} \alpha_{k-\ell} X_\ell^2}{\sigma^2 + \sum_{\ell=1}^{k-n+2\ell_0-1} \alpha_{k-n+2\ell_0-\ell} X_\ell^2} \Biggr) \Biggr] \leq 2\ell_0 \log\Biggl( 1 + \Bigl( \sup_{\ell \in \mathbb{Z}_0^+} \alpha_\ell \Bigr) n\, \mathrm{SNR} \Biggr), \qquad (78) $$

which follows by lower bounding the denominator by $\sigma^2$, and by then using Jensen's inequality together with the third and fourth inequalities in (58). For the second sum on the RHS of (75) we have

$$ \sum_{k=\ell_0+1}^{n-2\ell_0} (a_k - b_{k+2\ell_0}) = \sum_{k=\ell_0+1}^{n-2\ell_0} \mathsf{E}\Biggl[ \log\Biggl( \frac{\sigma^2 + \sum_{\ell=1}^{k} \alpha_{k-\ell} X_\ell^2}{\sigma^2 + \sum_{\ell=1}^{k+\ell_0-1} \alpha_{k+\ell_0-\ell} X_\ell^2} \Biggr) \Biggr] \leq \sum_{k=\ell_0+1}^{n-2\ell_0} \mathsf{E}\Biggl[ \log\Biggl( \frac{\sigma^2 + \sum_{\ell=1}^{k} \alpha_{k+\ell_0-\ell} X_\ell^2}{\sigma^2 + \sum_{\ell=1}^{k+\ell_0-1} \alpha_{k+\ell_0-\ell} X_\ell^2} \Biggr) \Biggr] - (n - 3\ell_0) \log \tilde{\beta} \leq -(n - 3\ell_0) \log \tilde{\beta}, \qquad (79) $$

where the first inequality follows by adding $\log \tilde{\beta}$ to the expectation and by then upper bounding $\tilde{\beta}\, \alpha_\ell \leq \alpha_{\ell+\ell_0}$, $\ell \in \mathbb{Z}_0^+$ (62); and the last inequality follows because for any given $X_{k+1}^{k+\ell_0-1} = x_{k+1}^{k+\ell_0-1}$ we have $\sum_{\ell=1}^{k} \alpha_{k+\ell_0-\ell}\, x_\ell^2 \leq \sum_{\ell=1}^{k+\ell_0-1} \alpha_{k+\ell_0-\ell}\, x_\ell^2$.

We now apply (73), (75), (78), and (79) to upper bound

$$ \frac{1}{n} \sum_{k=\ell_0+1}^{n} I\bigl( M; Y_k \bigm| Y_1^{k-1} \bigr) \leq \frac{n-\ell_0}{n}\, \mathsf{K} + \frac{2\ell_0}{n} \log\Biggl( 1 + \Bigl( \sup_{\ell \in \mathbb{Z}_0^+} \alpha_\ell \Bigr) n\, \mathrm{SNR} \Biggr) - \frac{n-3\ell_0}{n} \log \tilde{\beta}, \qquad (80) $$

which together with (57) and (58) yields

$$ \frac{1}{n} I(M; Y_1^n) \leq \frac{n-\ell_0}{n}\, \mathsf{K} - \frac{n-3\ell_0}{n} \log \tilde{\beta} + \frac{\ell_0}{2n} \log(2\pi e) - \frac{\ell_0}{n}\, h\bigl( U_k \bigm| U_{-\infty}^{k-1} \bigr) + \frac{\ell_0}{n} \frac{5}{2} \log\Biggl( 1 + \Bigl( \sup_{\ell \in \mathbb{Z}_0^+} \alpha_\ell \Bigr) n\, \mathrm{SNR} \Biggr). \qquad (81) $$

This converges to $\mathsf{K} - \log \tilde{\beta} < \infty$ as we let $n$ tend to infinity, thus proving that $\liminf_{\ell \to \infty} \alpha_{\ell+1}/\alpha_\ell > 0$ implies that the capacity $C_{\mathrm{FB}}(\mathrm{SNR})$ is bounded in the SNR.

6.2 Part ii)

We shall show that

$$ \lim_{\ell \to \infty} \frac{1}{\ell} \log \frac{1}{\alpha_\ell} = \infty \qquad (82) $$

implies that the capacity $C(\mathrm{SNR})$ in the absence of feedback is unbounded in the SNR. Part ii) of Theorem 3 then follows by noting that

$$ \Bigl( \limsup_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} = 0 \Bigr) \implies \Bigl( \lim_{\ell \to \infty} \frac{1}{\ell} \log \frac{1}{\alpha_\ell} = \infty \Bigr). \qquad (83) $$

We prove the claim by proposing a coding scheme that achieves an unbounded rate. We first note that (82) implies that for any $0 < \varepsilon < 1$ we can find an $\ell_0 \in \mathbb{Z}^+$ so that

$$ \alpha_\ell < \varepsilon^\ell, \qquad \ell \geq \ell_0. \qquad (84) $$
If there exists an $\ell_0 \in \mathbb{Z}^+$ so that $\alpha_\ell = 0$, $\ell \geq \ell_0$, then we can achieve the (unbounded) rate

$$ R = \frac{1}{2L} \log(1 + L\, \mathrm{SNR}), \qquad L \geq \ell_0 \qquad (85) $$

by a coding scheme where the channel inputs $\{X_{kL+1}, k \in \mathbb{Z}_0^+\}$ are IID, zero-mean Gaussian random variables of variance $LP$, and where the other inputs are deterministically zero. Indeed, by waiting $L$ time-steps, the chip's temperature cools down to the ambient one, so that the noise variance is independent of the previous channel inputs and we can achieve—after appropriate normalization—the capacity of the additive white Gaussian noise (AWGN) channel [15].

For the more general case (84) we propose the following encoding and decoding scheme. Let $x_1^n(m)$, $m \in \mathcal{M}$ denote the codeword sent out by the transmitter that corresponds to the message $M = m$. We choose some $L \geq \ell_0$ and generate the components $x_{kL+1}(m)$, $m \in \mathcal{M}$, $k = 0, \dots, \lfloor n/L \rfloor - 1$ independently of each other according to a zero-mean Gaussian law of variance $P$. The other components are set to zero. [Footnote 11: It follows from the weak law of large numbers that, for any $m \in \mathcal{M}$, $\frac{1}{n} \sum_{k=1}^n x_k^2(m)$ converges to $P/L$ in probability as $n$ tends to infinity. This guarantees that the probability that a codeword does not satisfy the per-message power constraint (13)—and hence also the average-power constraint (11)—vanishes as $n$ tends to infinity.]

The receiver uses a nearest neighbor decoder in order to guess $M$ based on the received sequence of channel outputs $y_1^n$. Thus it computes $\| \mathbf{y} - \mathbf{x}(m') \|^2$ for each $m' \in \mathcal{M}$ and decides on the message that satisfies

$$ \hat{M} = \arg\min_{m' \in \mathcal{M}} \| \mathbf{y} - \mathbf{x}(m') \|^2, \qquad (86) $$

where ties are resolved with a fair coin flip. Here $\|\cdot\|$ denotes the Euclidean norm, and $\mathbf{y}$ and $\mathbf{x}(m')$ denote the respective vectors $(y_1, y_{L+1}, \dots, y_{(\lfloor n/L \rfloor - 1)L+1})^{\mathsf{T}}$ and $(x_1(m'), x_{L+1}(m'), \dots, x_{(\lfloor n/L \rfloor - 1)L+1}(m'))^{\mathsf{T}}$.

We are interested in the average probability of error $\Pr(\hat{M} \neq M)$, averaged over all codewords in the codebook, and averaged over all codebooks. By the symmetry of the codebook construction, the probability of error corresponding to the $m$-th message $\Pr(\hat{M} \neq M \mid M = m)$ does not depend on $m$, and we thus conclude that $\Pr(\hat{M} \neq M) = \Pr(\hat{M} \neq M \mid M = 1)$. We further note that

$$ \Pr\bigl( \hat{M} \neq M \bigm| M = 1 \bigr) \leq \Pr\Biggl( \bigcup_{m'=2}^{|\mathcal{M}|} \Bigl\{ \| \mathbf{Y} - \mathbf{X}(m') \|^2 \leq \| \mathbf{Z} \|^2 \Bigr\} \Biggm| M = 1 \Biggr), \qquad (87) $$

where

$$ \mathbf{Z} = \Bigl( \theta\bigl( X_1^{0}(1) \bigr) U_1,\ \theta\bigl( X_1^{L}(1) \bigr) U_{L+1},\ \dots,\ \theta\bigl( X_1^{(\lfloor n/L \rfloor - 1)L}(1) \bigr) U_{(\lfloor n/L \rfloor - 1)L+1} \Bigr)^{\mathsf{T}}, $$

so that, conditional on $M = 1$, $\| \mathbf{Z} \|^2$ equals $\| \mathbf{Y} - \mathbf{X}(1) \|^2$. In order to analyze (87) we need the following lemma.

Lemma 5. Consider the channel described in Section 2, and assume that $\{\alpha_\ell\}$ satisfies (82). Further assume that $\{X_{kL+1}, k \in \mathbb{Z}_0^+\}$ is a sequence of IID, zero-mean Gaussian random variables of variance $P$, and that $X_k = 0$ if $k \bmod L \neq 1$ (where $k \bmod L$ stands for the remainder upon dividing $k$ by $L$). Let the set $\mathcal{D}_\epsilon$ be defined as

$$ \mathcal{D}_\epsilon \triangleq \Bigl\{ (\mathbf{y}, \mathbf{z}) \in \mathbb{R}^{\lfloor n/L \rfloor} \times \mathbb{R}^{\lfloor n/L \rfloor} : \Bigl| \tfrac{1}{\lfloor n/L \rfloor} \| \mathbf{y} \|^2 - \bigl( \sigma^2 + P + \alpha^{(L)} P \bigr) \Bigr| < \epsilon,\ \Bigl| \tfrac{1}{\lfloor n/L \rfloor} \| \mathbf{z} \|^2 - \bigl( \sigma^2 + \alpha^{(L)} P \bigr) \Bigr| < \epsilon \Bigr\}, \qquad (88) $$

with $\alpha^{(L)}$ being defined as

$$ \alpha^{(L)} \triangleq \sum_{\ell=1}^{\infty} \alpha_{\ell L}. \qquad (89) $$

Then

$$ \lim_{n \to \infty} \Pr\bigl( (\mathbf{Y}, \mathbf{Z}) \in \mathcal{D}_\epsilon \bigr) = 1 \qquad (90) $$

for any $\epsilon > 0$.

Proof. See Appendix C.

In order to upper bound the RHS of (87), we proceed along the lines of [15], [14].
We have

$$ \Pr\Biggl( \bigcup_{m'=2}^{|\mathcal{M}|} \Bigl\{ \| \mathbf{Y} - \mathbf{X}(m') \|^2 \leq \| \mathbf{Z} \|^2 \Bigr\} \Biggm| M = 1 \Biggr) \leq \Pr\bigl( (\mathbf{Y}, \mathbf{Z}) \notin \mathcal{D}_\epsilon \bigr) + \int_{\mathcal{D}_\epsilon} \Pr\Biggl( \bigcup_{m'=2}^{|\mathcal{M}|} \Bigl\{ \| \mathbf{y} - \mathbf{X}(m') \|^2 \leq \| \mathbf{z} \|^2 \Bigr\} \Biggm| (\mathbf{y}, \mathbf{z}), M = 1 \Biggr)\, dP(\mathbf{y}, \mathbf{z}), \qquad (91) $$

where we use that, by the symmetry of the codebook construction, the law of $(\mathbf{Y}, \mathbf{Z})$ does not depend on $M$. It follows from Lemma 5 that the first term on the RHS of (91) vanishes as $n$ tends to infinity. Since the codewords are independent of each other, conditional on $M = 1$ the distribution of $\mathbf{X}(m')$, $m' = 2, \dots, |\mathcal{M}|$ does not depend on $(\mathbf{y}, \mathbf{z})$. We upper bound the second term on the RHS of (91) by analyzing $\Pr( \| \mathbf{y} - \mathbf{X}(m') \|^2 \leq \| \mathbf{z} \|^2 \mid (\mathbf{y}, \mathbf{z}), M = 1 )$, $m' = 2, \dots, |\mathcal{M}|$ and by then applying the union of events bound.

For $m' = 2, \dots, |\mathcal{M}|$, we have

$$ \Pr\bigl( \| \mathbf{y} - \mathbf{X}(m') \|^2 \leq \| \mathbf{z} \|^2 \bigm| (\mathbf{y}, \mathbf{z}) \bigr) \leq \exp\Biggl\{ -s \lfloor n/L \rfloor \bigl( \sigma^2 + \alpha^{(L)} P + \epsilon \bigr) + \frac{s\, \| \mathbf{y} \|^2}{1 - 2sP} - \frac{1}{2} \lfloor n/L \rfloor \log(1 - 2sP) \Biggr\}, \qquad (\mathbf{y}, \mathbf{z}) \in \mathcal{D}_\epsilon \qquad (92) $$

for any $s < 0$. This follows by upper bounding $\| \mathbf{z} \|^2$ by $\lfloor n/L \rfloor ( \sigma^2 + \alpha^{(L)} P + \epsilon )$ and from Chernoff's bound [17, Sec. 5.4]. Using that, for $(\mathbf{y}, \mathbf{z}) \in \mathcal{D}_\epsilon$, $\| \mathbf{y} \|^2 > \lfloor n/L \rfloor ( \sigma^2 + P + \alpha^{(L)} P - \epsilon )$, it follows from the union of events bound and from (92) that (91) goes to zero as $n$ tends to infinity if for some $s < 0$ the rate $R$ satisfies

$$ R < \frac{s}{L} \bigl( \sigma^2 + \alpha^{(L)} P + \epsilon \bigr) + \frac{1}{2L} \log(1 - 2sP) - \frac{s}{L} \cdot \frac{\sigma^2 + P + \alpha^{(L)} P - \epsilon}{1 - 2sP}. \qquad (93) $$

Thus choosing $s = -\frac{1}{2} \cdot \frac{1}{1 + \alpha^{(L)} P}$ yields that any rate below

$$ -\frac{1}{2L} \cdot \frac{\sigma^2 + \alpha^{(L)} P + \epsilon}{1 + \alpha^{(L)} P} + \frac{1}{2L} \log\Bigl( 1 + \frac{P}{1 + \alpha^{(L)} P} \Bigr) + \frac{1}{2L} \cdot \frac{\sigma^2 + P + \alpha^{(L)} P - \epsilon}{1 + \alpha^{(L)} P} \cdot \frac{1}{1 + \frac{P}{1 + \alpha^{(L)} P}} \qquad (94) $$

is achievable. As $P$ tends to infinity, this converges to

$$ \frac{1}{2L} \log\Bigl( 1 + \frac{1}{\alpha^{(L)}} \Bigr) > \frac{1}{2L} \log \frac{1}{\alpha^{(L)}}. \qquad (95) $$

It remains to show that, given (84), we can make $-\frac{1}{L} \log \alpha^{(L)}$ arbitrarily large. Indeed, (84) implies that

$$ \alpha^{(L)} = \sum_{\ell=1}^{\infty} \alpha_{\ell L} < \sum_{\ell=1}^{\infty} \varepsilon^{\ell L} = \frac{\varepsilon^L}{1 - \varepsilon^L}, $$

and (95) can therefore be further lower bounded by

$$ \frac{1}{2L} \log\bigl( 1 - \varepsilon^L \bigr) + \frac{1}{2} \log \frac{1}{\varepsilon}. \qquad (96) $$

Letting $L$ tend to infinity then yields that we can achieve any rate below $\frac{1}{2} \log \frac{1}{\varepsilon}$. As this can be made arbitrarily large by choosing $\varepsilon$ sufficiently small, we conclude that $\lim_{\ell \to \infty} \frac{1}{\ell} \log \frac{1}{\alpha_\ell} = \infty$ implies that the capacity is unbounded.
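To illustrate the achievability argument of Part ii), the sketch below simulates the special case where $\alpha_\ell = 0$ for $\ell \geq L$ (here with $\alpha_\ell = 1$ for $\ell < L$, an illustrative choice): transmitting an IID Gaussian symbol of power $LP$ every $L$-th slot leaves the noise variance at the sampling instants equal to $\sigma^2$, so the sampled channel is memoryless AWGN with the rate (85). All numerical values are assumptions for the example.

```python
import numpy as np

# Interleaved transmission: Gaussian symbols of power L*P every L-th slot,
# zeros in between; alpha_l = 1 for l < L and alpha_l = 0 for l >= L.
rng = np.random.default_rng(3)
L, P, sigma2, n_blocks = 8, 100.0, 1.0, 2000

x = np.zeros(n_blocks * L)
x[::L] = rng.normal(0.0, np.sqrt(L * P), n_blocks)

theta2_sampled = []
y = np.empty_like(x)
for k in range(len(x)):
    theta2 = sigma2 + np.sum(x[max(0, k - L + 1):k] ** 2)   # channel law (7)
    if k % L == 0:
        theta2_sampled.append(theta2)                       # previous burst is L slots back
    y[k] = x[k] + np.sqrt(theta2) * rng.standard_normal()

print("noise variance at sampled slots:", np.mean(theta2_sampled))  # == sigma2
print("rate (85):", 0.5 / L * np.log(1 + L * P / sigma2), "nats per channel use")
```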
7 Conclusion

We studied a model for on-chip communication with nonideal heat sinks. To account for the heating-up effect, we proposed a channel model where the variance of the additive noise depends on a weighted sum of the past channel input powers. The weights characterize the efficiency of the heat sink.

To study the capacity of this channel at low SNR, we computed the capacity per unit cost. We showed that the heating effect is not just unharmful but can even be beneficial, in the sense that the capacity per unit cost can be larger than the capacity per unit cost of a corresponding channel with an ideal heat sink, i.e., where the weights describing the dependency of the noise variance on the channel input powers are zero. This suggests that at low SNR no heat sinks should be used.

Studying capacity at high SNR, we derived a sufficient condition and a necessary condition on the weights for the capacity to be bounded in the SNR. We showed that when the sequence of weights decays not faster than geometrically, then capacity is bounded in the SNR. On the other hand, if the sequence of weights decays faster than geometrically, then capacity is unbounded in the SNR. This result demonstrates the importance of an efficient heat sink at high SNR.

Acknowledgment

Fruitful discussions with Ashish Khisti and Michèle Wigger are gratefully acknowledged. Sergio Verdú's comments at ISIT 2007 on our low-SNR results are also much appreciated.

A Proof of Proposition 1

We first note that, by the expression for the capacity per unit cost of a memoryless channel [12], we have

$$ \sup_{\mathrm{SNR} > 0} \frac{C_{\alpha=0}(\mathrm{SNR})}{\mathrm{SNR}} = \sup_{\zeta^2 > 0} \frac{D\bigl( W_{\alpha=0}(\cdot \mid \zeta) \bigm\| W_{\alpha=0}(\cdot \mid 0) \bigr)}{\zeta^2 / \sigma^2}, \qquad (97) $$

where $W_{\alpha=0}(\cdot \mid \cdot)$ denotes the channel law of the channel

$$ Y_k = x_k + \sigma \cdot U_k. \qquad (98) $$

Thus to prove Proposition 1 it suffices to show that

$$ \sup_{\mathrm{SNR} > 0} \frac{C_{\mathrm{Info}}(\mathrm{SNR})}{\mathrm{SNR}} \geq \sup_{\zeta^2 > 0} \frac{D\bigl( W_{\alpha=0}(\cdot \mid \zeta) \bigm\| W_{\alpha=0}(\cdot \mid 0) \bigr)}{\zeta^2 / \sigma^2}. $$

We shall obtain this result by deriving a lower bound on $C_{\mathrm{Info}}(\mathrm{SNR})$ and by then computing its limiting ratio to SNR as SNR tends to zero. In order to lower bound $C_{\mathrm{Info}}(\mathrm{SNR})$, which was defined in (16) as

$$ C_{\mathrm{Info}}(\mathrm{SNR}) = \lim_{n \to \infty} \frac{1}{n} \sup I(X_1^n; Y_1^n), $$

we evaluate $\frac{1}{n} I(X_1^n; Y_1^n)$ for inputs $\{X_k\}$ that are blockwise IID in blocks of $L$ symbols (for some $L \in \mathbb{Z}^+$). Thus $\{(X_{bL+1}, \dots, X_{(b+1)L}), b \in \mathbb{Z}_0^+\}$ is a sequence of IID random length-$L$ vectors with $(X_{bL+1}, \dots, X_{(b+1)L})$ taking on the value $(\xi, 0, \dots, 0)$ with probability $\delta$ and $(0, \dots, 0)$ with probability $1 - \delta$, for some $\xi \in \mathbb{R}$. To satisfy the power constraint (11) we shall choose $\xi$ and $\delta$ such that

$$ \frac{\xi^2}{\sigma^2}\, \delta = L\, \mathrm{SNR}. \qquad (99) $$

We use the chain rule for mutual information to write

$$ \frac{1}{n} I(X_1^n; Y_1^n) = \frac{1}{n} \sum_{b=0}^{\lfloor n/L \rfloor - 1} I\bigl( X_{bL+1}; Y_1^n \bigm| X_1^{bL} \bigr) \geq \frac{1}{n} \sum_{b=0}^{\lfloor n/L \rfloor - 1} I\bigl( X_{bL+1}; Y_{bL+1} \bigm| X_1^{bL} \bigr), \qquad (100) $$

where the inequality follows because reducing observations cannot increase mutual information.

Let $R^{(\xi)}_{\text{on-off}}(\mathrm{snr})$ denote the maximum rate achievable on (98) using on-off keying with on-symbol $\xi$ and with its corresponding probability $\wp$ chosen in order to satisfy the power constraint $\mathrm{snr}$, i.e.,

$$ R^{(\xi)}_{\text{on-off}}(\mathrm{snr}) \triangleq \sup_{P_X(\xi) = 1 - P_X(0) = \wp,\ \xi^2 \wp / \sigma^2 \leq \mathrm{snr}} I(X; X + \sigma \cdot U_k), \qquad \mathrm{snr} \geq 0. \qquad (101) $$

Notice that $R^{(\xi)}_{\text{on-off}}(\mathrm{snr})$, $\mathrm{snr} \geq 0$ is a nonnegative, monotonically nondecreasing function of $\mathrm{snr}$ with $R^{(\xi)}_{\text{on-off}}(0) = 0$. From the strict concavity of mutual information it follows that $R^{(\xi)}_{\text{on-off}}(\mathrm{snr}) > 0$ whenever $\mathrm{snr} > 0$. Also, for a fixed $\xi$, $\mathrm{snr} \mapsto R^{(\xi)}_{\text{on-off}}(\mathrm{snr})$ is concave in $\mathrm{snr}$. Consequently, for some $\mathrm{snr}_0 > 0$, the function $\mathrm{snr} \mapsto R^{(\xi)}_{\text{on-off}}(\mathrm{snr})$ is strictly monotonic in the interval $\mathrm{snr} \in [0, \mathrm{snr}_0]$, and hence the supremum on the RHS of (101) is attained for $\wp = \mathrm{snr}\, \sigma^2 / \xi^2$, $\mathrm{snr} \in [0, \mathrm{snr}_0]$.

By writing $I(X_{bL+1}; Y_{bL+1} \mid X_1^{bL} = x_1^{bL})$ for a given $X_1^{bL} = x_1^{bL}$ as

$$ I\bigl( X_{bL+1}; Y_{bL+1} \bigm| X_1^{bL} = x_1^{bL} \bigr) = I\bigl( X_{bL+1}; X_{bL+1} + \theta\bigl( x_1^{bL} \bigr) U_{bL+1} \bigr) = I\Biggl( X_{bL+1}; \frac{\sigma}{\theta\bigl( x_1^{bL} \bigr)} X_{bL+1} + \sigma\, U_{bL+1} \Biggr) $$

(with $\theta( x_1^{bL} )$ defined in (67)), and by using that for $\mathrm{snr} \in [0, \mathrm{snr}_0]$ the supremum on the RHS of (101) is attained for $\wp = \mathrm{snr}\, \sigma^2 / \xi^2$, we obtain

$$ I\bigl( X_{bL+1}; Y_{bL+1} \bigm| X_1^{bL} = x_1^{bL} \bigr) = R^{(\xi)}_{\text{on-off}}\Biggl( \frac{L\, \mathrm{SNR}}{1 + \sum_{\ell=0}^{b-1} \alpha_{(b-\ell)L}\, x_{\ell L + 1}^2 / \sigma^2} \Biggr), \qquad \mathrm{SNR} \in [0, \mathrm{SNR}_0], \qquad (102) $$

where $\mathrm{SNR}_0 \triangleq \mathrm{snr}_0 / L$.
Averaging over $X_1^{bL}$ and combining with (100) yields

$$\frac{1}{n}I\bigl(X_1^n;Y_1^n\bigr)\ge\frac{1}{n}\sum_{b=0}^{\lfloor n/L\rfloor-1}\mathsf{E}\Biggl[R^{(\xi)}_{\text{on-off}}\Biggl(\frac{L\,\mathrm{SNR}}{1+\sum_{\ell=0}^{b-1}\alpha_{(b-\ell)L}X^2_{\ell L+1}/\sigma^2}\Biggr)\Biggr]\ge\frac{\lfloor n/L\rfloor}{n}R^{(\xi)}_{\text{on-off}}\Biggl(\frac{L\,\mathrm{SNR}}{1+\sum_{\ell=1}^{\infty}\alpha_{\ell L}\,\xi^2/\sigma^2}\Biggr),\qquad\mathrm{SNR}\in[0,\mathrm{SNR}_0], \qquad (103)$$

where the second inequality follows by upper-bounding $\sum_{\ell=0}^{b-1}\alpha_{(b-\ell)L}X^2_{\ell L+1}/\sigma^2\le\sum_{\ell=1}^{\infty}\alpha_{\ell L}\,\xi^2/\sigma^2$ and by using that $\mathrm{snr}\mapsto R^{(\xi)}_{\text{on-off}}(\mathrm{snr})$ is monotonically nondecreasing in $\mathrm{snr}$. The lower bound on $C_{\text{Info}}(\mathrm{SNR})$ then follows by letting $n$ tend to infinity:

$$C_{\text{Info}}(\mathrm{SNR})\ge\lim_{n\to\infty}\frac{1}{n}I\bigl(X_1^n;Y_1^n\bigr)\ge\frac{1}{L}R^{(\xi)}_{\text{on-off}}\Biggl(\frac{L\,\mathrm{SNR}}{1+\sum_{\ell=1}^{\infty}\alpha_{\ell L}\,\xi^2/\sigma^2}\Biggr). \qquad (104)$$

With this we can lower-bound the information capacity per unit cost as

$$\sup_{\mathrm{SNR}>0}\frac{C_{\text{Info}}(\mathrm{SNR})}{\mathrm{SNR}}\ge\lim_{\mathrm{SNR}\downarrow 0}\frac{C_{\text{Info}}(\mathrm{SNR})}{\mathrm{SNR}}\ge\lim_{\mathrm{SNR}\downarrow 0}\frac{1}{L\,\mathrm{SNR}}\,R^{(\xi)}_{\text{on-off}}\Biggl(\frac{L\,\mathrm{SNR}}{1+\sum_{\ell=1}^{\infty}\alpha_{\ell L}\xi^2/\sigma^2}\Biggr)
=\lim_{\mathrm{SNR}\downarrow 0}\frac{R^{(\xi)}_{\text{on-off}}\Bigl(\frac{L\,\mathrm{SNR}}{1+\sum_{\ell=1}^{\infty}\alpha_{\ell L}\xi^2/\sigma^2}\Bigr)}{\frac{L\,\mathrm{SNR}}{1+\sum_{\ell=1}^{\infty}\alpha_{\ell L}\xi^2/\sigma^2}}\cdot\frac{1}{1+\sum_{\ell=1}^{\infty}\alpha_{\ell L}\,\xi^2/\sigma^2}
=\lim_{\mathrm{SNR}'\downarrow 0}\frac{R^{(\xi)}_{\text{on-off}}(\mathrm{SNR}')}{\mathrm{SNR}'}\cdot\frac{1}{1+\sum_{\ell=1}^{\infty}\alpha_{\ell L}\,\xi^2/\sigma^2}, \qquad (105)$$

where the first inequality follows by lower-bounding the supremum by the limit, and where the last equality follows by substituting $\mathrm{SNR}'=\frac{L\,\mathrm{SNR}}{1+\sum_{\ell=1}^{\infty}\alpha_{\ell L}\xi^2/\sigma^2}$. Proceeding along the lines of the proof of [12, Thm. 3], it can be shown that

$$\lim_{\mathrm{SNR}'\downarrow 0}\frac{R^{(\xi)}_{\text{on-off}}(\mathrm{SNR}')}{\mathrm{SNR}'}=\frac{D\bigl(W_{\alpha=0}(\cdot\mid\xi)\,\big\|\,W_{\alpha=0}(\cdot\mid 0)\bigr)}{\xi^2/\sigma^2}, \qquad (106)$$

and therefore

$$\sup_{\mathrm{SNR}>0}\frac{C_{\text{Info}}(\mathrm{SNR})}{\mathrm{SNR}}\ge\frac{D\bigl(W_{\alpha=0}(\cdot\mid\xi)\,\big\|\,W_{\alpha=0}(\cdot\mid 0)\bigr)}{\xi^2/\sigma^2}\cdot\frac{1}{1+\sum_{\ell=1}^{\infty}\alpha_{\ell L}\,\xi^2/\sigma^2}. \qquad (107)$$

Noting that (9) & (21) imply

$$0\le\lim_{L\to\infty}\sum_{\ell=1}^{\infty}\alpha_{\ell L}\le\lim_{L\to\infty}\sum_{\ell=L}^{\infty}\alpha_{\ell}=0, \qquad (108)$$

we obtain, by letting $L$ tend to infinity,

$$\sup_{\mathrm{SNR}>0}\frac{C_{\text{Info}}(\mathrm{SNR})}{\mathrm{SNR}}\ge\frac{D\bigl(W_{\alpha=0}(\cdot\mid\xi)\,\big\|\,W_{\alpha=0}(\cdot\mid 0)\bigr)}{\xi^2/\sigma^2}. \qquad (109)$$

Maximizing (109) over $\xi^2$ then yields

$$\sup_{\mathrm{SNR}>0}\frac{C_{\text{Info}}(\mathrm{SNR})}{\mathrm{SNR}}\ge\sup_{\xi^2>0}\frac{D\bigl(W_{\alpha=0}(\cdot\mid\xi)\,\big\|\,W_{\alpha=0}(\cdot\mid 0)\bigr)}{\xi^2/\sigma^2}, \qquad (110)$$

which, in view of (97), proves Proposition 1.

B Appendix to Section 5.2

We shall prove that

$$\lim_{b\to\infty}I\bigl(X_{-\infty}^{-1};\tilde{Y}_b\bigm|X_0^b\bigr)=0. \qquad (111)$$

Let $\alpha^{(i)}_b$ be defined as

$$\alpha^{(1)}_0\triangleq 0, \qquad (112)$$
$$\alpha^{(i)}_b\triangleq\alpha_{bL+i-1},\qquad(b,i)\in\mathbb{Z}_0^+\times\mathbb{Z}^+\setminus\{(0,1)\}. \qquad (113)$$

We have

$$I\bigl(X_{-\infty}^{-1};\tilde{Y}_b\bigm|X_0^b\bigr)=\sum_{i=1}^{L}I\bigl(X_{-\infty}^{-1};\tilde{Y}_{bL+i}\bigm|X_0^b,\tilde{Y}_{bL+1}^{bL+i-1}\bigr)\le\sum_{i=1}^{L}\Bigl(h\bigl(\tilde{Y}_{bL+i}\bigm|X_0^b\bigr)-h\bigl(\tilde{Y}_{bL+i}\bigm|X_{-\infty}^b\bigr)\Bigr)$$
$$\le\frac{1}{2}\sum_{i=1}^{L}\mathsf{E}\Biggl[\log\Biggl((2\pi e)\Biggl(\sigma^2+\sum_{\ell=0}^{b}\alpha^{(i)}_{b-\ell}X^2_{\ell L+1}+PL\sum_{\ell=b+1}^{\infty}\alpha^{(i)}_{\ell}\Biggr)\Biggr)\Biggr]-\frac{1}{2}\sum_{i=1}^{L}\mathsf{E}\Biggl[\log\Biggl((2\pi e)\Biggl(\sigma^2+\sum_{\ell=0}^{b}\alpha^{(i)}_{b-\ell}X^2_{\ell L+1}+\sum_{\ell=-\infty}^{-1}\alpha^{(i)}_{b-\ell}X^2_{\ell L+1}\Biggr)\Biggr)\Biggr]$$
$$\le\frac{1}{2}\sum_{i=1}^{L}\mathsf{E}\Biggl[\log\Biggl((2\pi e)\Biggl(\sigma^2+\sum_{\ell=0}^{b}\alpha^{(i)}_{b-\ell}X^2_{\ell L+1}+PL\sum_{\ell=b+1}^{\infty}\alpha^{(i)}_{\ell}\Biggr)\Biggr)\Biggr]-\frac{1}{2}\sum_{i=1}^{L}\mathsf{E}\Biggl[\log\Biggl((2\pi e)\Biggl(\sigma^2+\sum_{\ell=0}^{b}\alpha^{(i)}_{b-\ell}X^2_{\ell L+1}\Biggr)\Biggr)\Biggr]$$
$$=\frac{1}{2}\sum_{i=1}^{L}\mathsf{E}\Biggl[\log\Biggl(1+\frac{PL\sum_{\ell=b+1}^{\infty}\alpha^{(i)}_{\ell}}{\sigma^2+\sum_{\ell=0}^{b}\alpha^{(i)}_{b-\ell}X^2_{\ell L+1}}\Biggr)\Biggr]\le\frac{1}{2}\sum_{i=1}^{L}\log\Biggl(1+L\,\mathrm{SNR}\sum_{\ell=b+1}^{\infty}\alpha^{(i)}_{\ell}\Biggr), \qquad (114)$$

where the first inequality follows because conditioning cannot increase entropy and because, conditional on $X_{-\infty}^b$, $\tilde{Y}_{bL+i}$ is independent of $\tilde{Y}_{bL+1}^{bL+i-1}$; the next inequality follows from the entropy-maximizing property of Gaussian random variables; the subsequent inequality follows because $\sum_{\ell=-\infty}^{-1}\alpha^{(i)}_{b-\ell}X^2_{\ell L+1}\ge 0$, $i=1,\ldots,L$; and the last inequality follows because $\sum_{\ell=0}^{b}\alpha^{(i)}_{b-\ell}X^2_{\ell L+1}\ge 0$, $i=1,\ldots,L$.
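Since each summand in the last line of (114) is of the form $\frac{1}{2}\log(1+L\,\mathrm{SNR}\cdot\text{tail})$, the bound vanishes as $b\to\infty$ whenever the weights are summable. A quick numerical illustration (not from the paper; the geometric weights $\alpha_\ell=\rho^\ell$ with $\rho=1/2$, $L=4$, and $\mathrm{SNR}=10$ are assumed values):

    import numpy as np

    # Illustrative decay of the bound in the last line of (114), in the
    # weakened form (116) below, for assumed geometric weights alpha_l = rho^l.
    rho, L, SNR = 0.5, 4, 10.0

    def bound(b):
        tail = rho**(b + 1) / (1 - rho)    # sum_{l >= b+1} rho^l
        return 0.5 * L * np.log(1 + L * SNR * tail)

    for b in [0, 4, 8, 16, 32]:
        print(f"b = {b:2d}: (L/2) log(1 + L*SNR*tail) = {bound(b):.3e} nats")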
By upper-bounding

$$\sum_{\ell=b+1}^{\infty}\alpha^{(i)}_{\ell}\le\sum_{\ell=b+1}^{\infty}\alpha_{\ell},\qquad i=1,\ldots,L, \qquad (115)$$

we obtain

$$I\bigl(X_{-\infty}^{-1};\tilde{Y}_b\bigm|X_0^b\bigr)\le\frac{L}{2}\log\Biggl(1+L\,\mathrm{SNR}\sum_{\ell=b+1}^{\infty}\alpha_{\ell}\Biggr), \qquad (116)$$

and (111) follows by noting that (21) implies

$$\lim_{b\to\infty}\sum_{\ell=b+1}^{\infty}\alpha_{\ell}=0.$$

C Proof of Lemma 5

We shall show that for any $\epsilon>0$

$$\lim_{n\to\infty}\Pr\biggl(\Bigl|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2-\bigl(\sigma^2+P+\alpha^{(L)}P\bigr)\Bigr|\ge\epsilon\biggr)=0 \qquad (117)$$

and

$$\lim_{n\to\infty}\Pr\biggl(\Bigl|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Z}\|^2-\bigl(\sigma^2+\alpha^{(L)}P\bigr)\Bigr|\ge\epsilon\biggr)=0. \qquad (118)$$

Lemma 5 then follows by the union-of-events bound.

In order to prove (117) & (118), we first note that

$$\frac{1}{\lfloor n/L\rfloor}\mathsf{E}\bigl[\|\mathbf{Y}\|^2\bigr]=\sigma^2+P+\frac{P}{\lfloor n/L\rfloor}\sum_{k=1}^{\lfloor n/L\rfloor-1}\sum_{\ell=1}^{k}\alpha_{\ell L}, \qquad (119)$$
$$\frac{1}{\lfloor n/L\rfloor}\mathsf{E}\bigl[\|\mathbf{Z}\|^2\bigr]=\sigma^2+\frac{P}{\lfloor n/L\rfloor}\sum_{k=1}^{\lfloor n/L\rfloor-1}\sum_{\ell=1}^{k}\alpha_{\ell L}, \qquad (120)$$

and therefore, by Cesàro's mean [10, Thm. 4.2.3],

$$\lim_{n\to\infty}\frac{1}{\lfloor n/L\rfloor}\mathsf{E}\bigl[\|\mathbf{Y}\|^2\bigr]=\sigma^2+P+\alpha^{(L)}P, \qquad (121)$$
$$\lim_{n\to\infty}\frac{1}{\lfloor n/L\rfloor}\mathsf{E}\bigl[\|\mathbf{Z}\|^2\bigr]=\sigma^2+\alpha^{(L)}P, \qquad (122)$$

where $\alpha^{(L)}$ was defined in (89) as

$$\alpha^{(L)}=\sum_{\ell=1}^{\infty}\alpha_{\ell L}.$$

Thus, for any $\epsilon>0$ and $0<\varepsilon<\epsilon$, there exists an $n_0$ such that for all $n\ge n_0$

$$\Bigl|\frac{1}{\lfloor n/L\rfloor}\mathsf{E}\bigl[\|\mathbf{Y}\|^2\bigr]-\bigl(\sigma^2+P+\alpha^{(L)}P\bigr)\Bigr|\le\varepsilon, \qquad (123)$$
$$\Bigl|\frac{1}{\lfloor n/L\rfloor}\mathsf{E}\bigl[\|\mathbf{Z}\|^2\bigr]-\bigl(\sigma^2+\alpha^{(L)}P\bigr)\Bigr|\le\varepsilon, \qquad (124)$$

and it follows from the triangle inequality that

$$\Bigl|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2-\bigl(\sigma^2+P+\alpha^{(L)}P\bigr)\Bigr|\le\Bigl|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2-\frac{1}{\lfloor n/L\rfloor}\mathsf{E}\bigl[\|\mathbf{Y}\|^2\bigr]\Bigr|+\varepsilon, \qquad (125)$$
$$\Bigl|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Z}\|^2-\bigl(\sigma^2+\alpha^{(L)}P\bigr)\Bigr|\le\Bigl|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Z}\|^2-\frac{1}{\lfloor n/L\rfloor}\mathsf{E}\bigl[\|\mathbf{Z}\|^2\bigr]\Bigr|+\varepsilon. \qquad (126)$$

From this we obtain

$$\Pr\biggl(\Bigl|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2-\bigl(\sigma^2+P+\alpha^{(L)}P\bigr)\Bigr|\ge\epsilon\biggr)\le\Pr\biggl(\Bigl|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2-\frac{1}{\lfloor n/L\rfloor}\mathsf{E}\bigl[\|\mathbf{Y}\|^2\bigr]\Bigr|\ge\epsilon-\varepsilon\biggr)\le\frac{\mathrm{Var}\Bigl(\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2\Bigr)}{(\epsilon-\varepsilon)^2} \qquad (127)$$

and

$$\Pr\biggl(\Bigl|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Z}\|^2-\bigl(\sigma^2+\alpha^{(L)}P\bigr)\Bigr|\ge\epsilon\biggr)\le\Pr\biggl(\Bigl|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Z}\|^2-\frac{1}{\lfloor n/L\rfloor}\mathsf{E}\bigl[\|\mathbf{Z}\|^2\bigr]\Bigr|\ge\epsilon-\varepsilon\biggr)\le\frac{\mathrm{Var}\Bigl(\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Z}\|^2\Bigr)}{(\epsilon-\varepsilon)^2}, \qquad (128)$$

with $\mathrm{Var}(A)=\mathsf{E}[(A-\mathsf{E}[A])^2]$ denoting the variance of $A$. Here the last inequalities in (127) & (128) follow from Chebyshev's inequality [17, Sec. 5.4]. It remains to show that

$$\lim_{n\to\infty}\mathrm{Var}\biggl(\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2\biggr)=\lim_{n\to\infty}\mathrm{Var}\biggl(\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Z}\|^2\biggr)=0. \qquad (129)$$

We shall prove (129) for $\mathbf{Y}$; the proof for $\mathbf{Z}$ follows along the same lines. We begin by writing

$$\mathrm{Var}\biggl(\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2\biggr)=\frac{1}{(\lfloor n/L\rfloor)^2}\mathrm{Var}\Biggl(\sum_{k=0}^{\lfloor n/L\rfloor-1}Y^2_{kL+1}\Biggr)=\frac{1}{(\lfloor n/L\rfloor)^2}\sum_{k=0}^{\lfloor n/L\rfloor-1}\mathrm{Var}\bigl(Y^2_{kL+1}\bigr)+\frac{2}{(\lfloor n/L\rfloor)^2}\sum_{\substack{k=1,\,j=0\\k>j}}^{\lfloor n/L\rfloor-1}\mathrm{Cov}\bigl(Y^2_{kL+1},Y^2_{jL+1}\bigr), \qquad (130)$$

where $\mathrm{Cov}(A,B)=\mathsf{E}[(A-\mathsf{E}[A])(B-\mathsf{E}[B])]$ denotes the covariance between $A$ and $B$. We shall evaluate both terms on the RHS of (130) separately. For the sake of clarity we shall omit the details of the derivations and show only the main steps. Unless otherwise stated, these steps can be derived in a straightforward way using that: i) $\{X_{kL+1},\,k\in\mathbb{Z}_0^+\}$ is a sequence of IID, zero-mean, variance-$P$ Gaussian random variables whose fourth moments are given by $3P^2$, while all odd moments are zero; ii) $X_k=0$ if $k\bmod L\ne 1$; iii) $\{U_k\}$ (and hence also $\{U_{kL+1},\,k\in\mathbb{Z}_0^+\}$) is a zero-mean, unit-variance, stationary & weakly-mixing random process; iv) $\{X_k\}$ and $\{U_k\}$ are independent of each other.
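Before bounding the two terms on the RHS of (130), a small Monte Carlo experiment may help visualize the concentration claimed in (117). The sketch below is illustrative only: it additionally assumes Gaussian $U_k$, geometric weights $\alpha_\ell=\rho^\ell$ with $\rho=1/2$, $L=4$, $P=2$, $\sigma^2=1$. It simulates the blockwise outputs $Y_{kL+1}=X_{kL+1}+\theta(X_1^{kL})\cdot U_{kL+1}$ and checks that the normalized energy approaches $\sigma^2+P+\alpha^{(L)}P$ with shrinking spread as the number of blocks $m=\lfloor n/L\rfloor$ grows.

    import numpy as np

    # Monte Carlo illustration of (117); all parameter values are assumptions.
    rng = np.random.default_rng(0)
    rho, L, P, sigma2 = 0.5, 4, 2.0, 1.0
    alpha_L = rho**L / (1 - rho**L)        # alpha^(L) = sum_{l>=1} alpha_{lL}

    def sample_energy(m):
        """One realization of (1/m) ||Y||^2 over m blocks."""
        X = rng.normal(0.0, np.sqrt(P), m)             # inputs X_{kL+1}
        X2 = X**2
        w = rho**(L * np.arange(1, m + 1))             # alpha_{lL} = rho^(lL)
        energy = 0.0
        for k in range(m):
            # noise variance sigma^2 + sum_{l=1}^{k} alpha_{lL} X_{(k-l)L+1}^2
            var = sigma2 + np.dot(w[:k], X2[:k][::-1])
            Y = X[k] + np.sqrt(var) * rng.normal()
            energy += Y**2
        return energy / m

    target = sigma2 + P + alpha_L * P
    for m in [50, 200, 800]:
        s = [sample_energy(m) for _ in range(100)]
        print(f"m = {m:3d}: mean = {np.mean(s):.3f} (target {target:.3f}), "
              f"std = {np.std(s):.3f}")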
For the first sum on the RHS of (130) it suffices to show that $\mathrm{Var}(Y^2_{kL+1})<\infty$, $k\in\mathbb{Z}_0^+$. Indeed, this sum contains only $\lfloor n/L\rfloor$ summands and hence, when divided by $(\lfloor n/L\rfloor)^2$, it vanishes as $n$ tends to infinity, provided that $\mathrm{Var}(Y^2_{kL+1})<\infty$, $k\in\mathbb{Z}_0^+$. We have

$$\mathrm{Var}\bigl(Y^2_{kL+1}\bigr)=\mathsf{E}\bigl[Y^4_{kL+1}\bigr]-\bigl(\mathsf{E}\bigl[Y^2_{kL+1}\bigr]\bigr)^2\le\mathsf{E}\bigl[Y^4_{kL+1}\bigr]=\mathsf{E}\Bigl[\bigl(X_{kL+1}+\theta\bigl(X_1^{kL}\bigr)\cdot U_{kL+1}\bigr)^4\Bigr]$$
$$=3P^2+6P\Biggl(\sigma^2+P\sum_{\ell=1}^{k}\alpha_{\ell L}\Biggr)+\Biggl(\sigma^4+2\sigma^2 P\sum_{\ell=1}^{k}\alpha_{\ell L}+2P^2\sum_{\ell=1}^{k}\alpha^2_{\ell L}+P^2\Biggl(\sum_{\ell=1}^{k}\alpha_{\ell L}\Biggr)^{\!2}\,\Biggr)\mathsf{E}\bigl[U^4_{kL+1}\bigr]$$
$$\le 3P^2+6P\bigl(\sigma^2+P\alpha^{(L)}\bigr)+\Biggl(\sigma^4+2\sigma^2 P\alpha^{(L)}+2P^2\sum_{\ell=1}^{\infty}\alpha^2_{\ell L}+P^2\bigl(\alpha^{(L)}\bigr)^2\Biggr)\mathsf{E}\bigl[U^4_{kL+1}\bigr], \qquad (131)$$

where the second inequality follows by upper-bounding $\sum_{\ell=1}^{k}\alpha_{\ell L}\le\alpha^{(L)}$. Note that (84) implies that $\alpha^{(L)}$ and $\sum_{\ell=1}^{\infty}\alpha^2_{\ell L}$ are bounded. Since $U_{kL+1}$ has a finite fourth moment, it follows that (for finite $P$)

$$\mathrm{Var}\bigl(Y^2_{kL+1}\bigr)<\infty. \qquad (132)$$

In order to show that the second term on the RHS of (130) vanishes as $n$ tends to infinity, we shall evaluate $\mathrm{Cov}(Y^2_{kL+1},Y^2_{jL+1})=\mathsf{E}[Y^2_{kL+1}Y^2_{jL+1}]-\mathsf{E}[Y^2_{kL+1}]\,\mathsf{E}[Y^2_{jL+1}]$ for $k\in\mathbb{Z}^+$, $j\in\mathbb{Z}_0^+$, $k>j$. We have

$$\mathsf{E}\bigl[Y^2_{kL+1}Y^2_{jL+1}\bigr]=\mathsf{E}\Bigl[\bigl(X_{kL+1}+\theta\bigl(X_1^{kL}\bigr)\cdot U_{kL+1}\bigr)^2\bigl(X_{jL+1}+\theta\bigl(X_1^{jL}\bigr)\cdot U_{jL+1}\bigr)^2\Bigr]$$
$$=P^2+P\Biggl(\sigma^2+P\sum_{\ell=1}^{j}\alpha_{\ell L}\Biggr)+P\Biggl(\sigma^2+P\sum_{\ell=1}^{k}\alpha_{\ell L}\Biggr)+2P^2\alpha_{(k-j)L}+\Biggl(\sigma^2+P\sum_{\ell=1}^{k}\alpha_{\ell L}\Biggr)\Biggl(\sigma^2+P\sum_{\ell'=1}^{j}\alpha_{\ell' L}\Biggr)\mathsf{E}\bigl[U^2_{kL+1}U^2_{jL+1}\bigr]+2P^2\sum_{\ell=1}^{j}\alpha_{\ell L}\,\alpha_{(\ell+k-j)L}\,\mathsf{E}\bigl[U^2_{kL+1}U^2_{jL+1}\bigr]. \qquad (133)$$

Evaluating

$$\mathsf{E}\bigl[Y^2_{kL+1}\bigr]\mathsf{E}\bigl[Y^2_{jL+1}\bigr]=P^2+P\Biggl(\sigma^2+P\sum_{\ell=1}^{j}\alpha_{\ell L}\Biggr)+P\Biggl(\sigma^2+P\sum_{\ell=1}^{k}\alpha_{\ell L}\Biggr)+\Biggl(\sigma^2+P\sum_{\ell=1}^{k}\alpha_{\ell L}\Biggr)\Biggl(\sigma^2+P\sum_{\ell'=1}^{j}\alpha_{\ell' L}\Biggr), \qquad (134)$$

we obtain from (133) & (134)

$$\mathrm{Cov}\bigl(Y^2_{kL+1},Y^2_{jL+1}\bigr)=2P^2\alpha_{(k-j)L}+2P^2\sum_{\ell=1}^{j}\alpha_{\ell L}\,\alpha_{(\ell+k-j)L}\,\mathsf{E}\bigl[U^2_{kL+1}U^2_{jL+1}\bigr]+\Biggl(\sigma^2+P\sum_{\ell=1}^{k}\alpha_{\ell L}\Biggr)\Biggl(\sigma^2+P\sum_{\ell'=1}^{j}\alpha_{\ell' L}\Biggr)\Bigl(\mathsf{E}\bigl[U^2_{kL+1}U^2_{jL+1}\bigr]-1\Bigr). \qquad (135)$$

Summing over $k$ and $j$, dividing by $(\lfloor n/L\rfloor)^2$, substituting $\nu=k-j$, and using the stationarity of $\{U_k\}$ yields

$$\frac{2}{(\lfloor n/L\rfloor)^2}\sum_{\substack{k=1,\,j=0\\k>j}}^{\lfloor n/L\rfloor-1}\mathrm{Cov}\bigl(Y^2_{kL+1},Y^2_{jL+1}\bigr)=\frac{2}{(\lfloor n/L\rfloor)^2}\sum_{j=0}^{\lfloor n/L\rfloor-2}\sum_{\nu=1}^{\lfloor n/L\rfloor-1-j}2P^2\alpha_{\nu L}+\frac{2}{(\lfloor n/L\rfloor)^2}\sum_{j=0}^{\lfloor n/L\rfloor-2}\sum_{\nu=1}^{\lfloor n/L\rfloor-1-j}2P^2\sum_{\ell=1}^{j}\alpha_{\ell L}\,\alpha_{(\ell+\nu)L}\,\mathsf{E}\bigl[U^2_{\nu L+1}U^2_1\bigr]+\frac{2}{(\lfloor n/L\rfloor)^2}\sum_{j=0}^{\lfloor n/L\rfloor-2}\sum_{\nu=1}^{\lfloor n/L\rfloor-1-j}\Biggl(\sigma^2+P\sum_{\ell=1}^{j+\nu}\alpha_{\ell L}\Biggr)\Biggl(\sigma^2+P\sum_{\ell'=1}^{j}\alpha_{\ell' L}\Biggr)\Bigl(\mathsf{E}\bigl[U^2_{\nu L+1}U^2_1\bigr]-1\Bigr). \qquad (136)$$

The first two terms on the RHS of (136) can be upper-bounded using (84), i.e., $\alpha_{\ell}<\rho^{\ell}$, $0<\rho<1$, $\ell\ge\ell_0$. Indeed, noting that $L\ge\ell_0$, we have

$$\sum_{\nu=1}^{\lfloor n/L\rfloor-1-j}\alpha_{\nu L}<\sum_{\nu=1}^{\lfloor n/L\rfloor-1-j}\rho^{\nu L}<\sum_{\nu=1}^{\lfloor n/L\rfloor}\rho^{\nu L} \qquad (137)$$

and

$$\sum_{\nu=1}^{\lfloor n/L\rfloor-1-j}\sum_{\ell=1}^{j}\alpha_{\ell L}\,\alpha_{(\ell+\nu)L}<\sum_{\nu=1}^{\lfloor n/L\rfloor-1-j}\sum_{\ell=1}^{j}\bigl(\rho^{2L}\bigr)^{\ell}\rho^{\nu L}<\sum_{\nu=1}^{\lfloor n/L\rfloor}\sum_{\ell=1}^{\infty}\bigl(\rho^{2L}\bigr)^{\ell}\rho^{\nu L}=\frac{\rho^{2L}}{1-\rho^{2L}}\sum_{\nu=1}^{\lfloor n/L\rfloor}\rho^{\nu L}. \qquad (138)$$
Consequently, with (137) we can upper-bound the first term on the RHS of (136) as

$$\frac{2}{(\lfloor n/L\rfloor)^2}\sum_{j=0}^{\lfloor n/L\rfloor-2}\sum_{\nu=1}^{\lfloor n/L\rfloor-1-j}2P^2\alpha_{\nu L}<\frac{4P^2}{(\lfloor n/L\rfloor)^2}\sum_{j=0}^{\lfloor n/L\rfloor-2}\sum_{\nu=1}^{\lfloor n/L\rfloor}\rho^{\nu L}=4P^2\,\frac{\lfloor n/L\rfloor-1}{\lfloor n/L\rfloor}\,\frac{1}{\lfloor n/L\rfloor}\sum_{\nu=1}^{\lfloor n/L\rfloor}\rho^{\nu L}, \qquad (139)$$

and it follows from Cesàro's mean that this upper bound tends to zero as $n$ tends to infinity. Likewise, with (138) we can upper-bound the second term on the RHS of (136) as

$$\frac{2}{(\lfloor n/L\rfloor)^2}\sum_{j=0}^{\lfloor n/L\rfloor-2}\sum_{\nu=1}^{\lfloor n/L\rfloor-1-j}2P^2\sum_{\ell=1}^{j}\alpha_{\ell L}\,\alpha_{(\ell+\nu)L}\,\mathsf{E}\bigl[U^2_{\nu L+1}U^2_1\bigr]\le\frac{4P^2}{(\lfloor n/L\rfloor)^2}\sum_{j=0}^{\lfloor n/L\rfloor-2}\sum_{\nu=1}^{\lfloor n/L\rfloor-1-j}\sum_{\ell=1}^{j}\alpha_{\ell L}\,\alpha_{(\ell+\nu)L}\,\mathsf{E}\bigl[U^4_1\bigr]<4P^2\,\frac{\rho^{2L}}{1-\rho^{2L}}\,\mathsf{E}\bigl[U^4_1\bigr]\,\frac{\lfloor n/L\rfloor-1}{\lfloor n/L\rfloor}\,\frac{1}{\lfloor n/L\rfloor}\sum_{\nu=1}^{\lfloor n/L\rfloor}\rho^{\nu L}, \qquad (140)$$

where the first inequality follows from the Cauchy-Schwarz inequality. As above, it follows from Cesàro's mean that this upper bound tends to zero as $n$ tends to infinity.

It thus remains to show that the last term on the RHS of (136) vanishes as $n$ tends to infinity. We have, for each $j=0,\ldots,\lfloor n/L\rfloor-2$,

$$\sum_{\nu=1}^{\lfloor n/L\rfloor-1-j}\Biggl(\sigma^2+P\sum_{\ell=1}^{j+\nu}\alpha_{\ell L}\Biggr)\Biggl(\sigma^2+P\sum_{\ell'=1}^{j}\alpha_{\ell' L}\Biggr)\Bigl(\mathsf{E}\bigl[U^2_{\nu L+1}U^2_1\bigr]-1\Bigr)\le\sum_{\nu=1}^{\lfloor n/L\rfloor-1-j}\Biggl(\sigma^2+P\sum_{\ell=1}^{j+\nu}\alpha_{\ell L}\Biggr)\Biggl(\sigma^2+P\sum_{\ell'=1}^{j}\alpha_{\ell' L}\Biggr)\Bigl|\mathsf{E}\bigl[U^2_{\nu L+1}U^2_1\bigr]-1\Bigr|\le\sum_{\nu=1}^{\lfloor n/L\rfloor-1-j}\bigl(\sigma^2+P\alpha^{(L)}\bigr)^2\Bigl|\mathsf{E}\bigl[U^2_{\nu L+1}U^2_1\bigr]-1\Bigr|\le\sum_{\nu=1}^{\lfloor n/L\rfloor}\bigl(\sigma^2+P\alpha^{(L)}\bigr)^2\Bigl|\mathsf{E}\bigl[U^2_{\nu L+1}U^2_1\bigr]-1\Bigr|, \qquad (141)$$

where the first inequality follows by upper-bounding $\mathsf{E}[U^2_{\nu L+1}U^2_1]-1\le|\mathsf{E}[U^2_{\nu L+1}U^2_1]-1|$, and the second inequality follows by upper-bounding $\sum_{\ell'=1}^{j}\alpha_{\ell' L}\le\sum_{\ell=1}^{j+\nu}\alpha_{\ell L}\le\sum_{\ell=1}^{\infty}\alpha_{\ell L}=\alpha^{(L)}$. The last term on the RHS of (136) is therefore upper-bounded by

$$\frac{2}{(\lfloor n/L\rfloor)^2}\sum_{j=0}^{\lfloor n/L\rfloor-2}\sum_{\nu=1}^{\lfloor n/L\rfloor-1-j}\Biggl(\sigma^2+P\sum_{\ell=1}^{j+\nu}\alpha_{\ell L}\Biggr)\Biggl(\sigma^2+P\sum_{\ell'=1}^{j}\alpha_{\ell' L}\Biggr)\Bigl(\mathsf{E}\bigl[U^2_{\nu L+1}U^2_1\bigr]-1\Bigr)\le\frac{2}{(\lfloor n/L\rfloor)^2}\sum_{j=0}^{\lfloor n/L\rfloor-2}\sum_{\nu=1}^{\lfloor n/L\rfloor}\bigl(\sigma^2+P\alpha^{(L)}\bigr)^2\Bigl|\mathsf{E}\bigl[U^2_{\nu L+1}U^2_1\bigr]-1\Bigr|=2\bigl(\sigma^2+P\alpha^{(L)}\bigr)^2\,\frac{\lfloor n/L\rfloor-1}{\lfloor n/L\rfloor}\,\frac{1}{\lfloor n/L\rfloor}\sum_{\nu=1}^{\lfloor n/L\rfloor}\Bigl|\mathsf{E}\bigl[U^2_{\nu L+1}U^2_1\bigr]-1\Bigr|. \qquad (142)$$

It now follows from the weakly-mixing property of $\{U_k\}$ that [8, Thm. 6.1]

$$\lim_{n\to\infty}\frac{1}{\lfloor n/L\rfloor}\sum_{\nu=1}^{\lfloor n/L\rfloor}\Bigl|\mathsf{E}\bigl[U^2_{\nu L+1}U^2_1\bigr]-1\Bigr|=\lim_{n\to\infty}\frac{1}{\lfloor n/L\rfloor}\sum_{\nu=1}^{\lfloor n/L\rfloor}\Bigl|\mathsf{E}\bigl[U^2_{\nu L+1}U^2_1\bigr]-\mathsf{E}\bigl[U^2_{\nu L+1}\bigr]\mathsf{E}\bigl[U^2_1\bigr]\Bigr|=0,$$

so that the last term on the RHS of (136) vanishes as $n$ tends to infinity. Thus (139), (140), and (142) show that (136) vanishes as $n$ tends to infinity, which in turn shows, together with (130) and (132), that

$$\lim_{n\to\infty}\mathrm{Var}\biggl(\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2\biggr)=0.$$

Together with (127), this proves (117). The proof of (118) follows along the same lines.

References

[1] R. Venkatesan, A. Kaloyeros, M. Beylansky, S. J. Souri, K. Banerjee, K. C. Saraswat, A. Rahman, R. Reif, and J. D. Meindl, "Interconnect limits on gigascale integration (GSI) in the 21st century," Proc. IEEE, vol. 89, no. 3, pp. 305-324, Mar. 2001.

[2] L. B. Kish, "End of Moore's law: thermal (noise) death of integration in micro and nano electronics," Physics Lett. A, no. 3-4, pp. 144-149, Dec. 2002.

[3] K. E. Goodson, "Thermal conduction in electronic microstructures," in CRC Handbook of Thermal Engineering, 1st ed., ser. Mechanical Engineering Handbook Series, F. Keith, Ed., Dec. 1999.
[4] J. H. Lienhard IV and J. H. Lienhard V, A Heat Transfer Textbook, 3rd ed. Cambridge, Massachusetts, USA: Phlogiston Press, 2008.

[5] Y. Tsividis, Operation and Modeling of the MOS Transistor, 2nd ed. USA: Oxford University Press, 2003.

[6] C. C. Enz and Y. Cheng, "MOS transistor modeling for RF IC design," IEEE J. Solid-State Circuits, vol. 35, no. 2, pp. 186-201, Feb. 2000.

[7] B. Razavi, "CMOS technology characterization for analog and RF design," IEEE J. Solid-State Circuits, vol. 34, no. 3, pp. 268-276, Mar. 1999.

[8] K. Petersen, Ergodic Theory, ser. Cambridge Studies in Advanced Mathematics 2. Cambridge University Press, 1983.

[9] G. Maruyama, "The harmonic analysis of stationary stochastic processes," Memoirs of the Faculty of Science, Series A, vol. 4, no. 1, pp. 45-106, 1949.

[10] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 1991.

[11] S. Verdú and T. S. Han, "A general formula for channel capacity," IEEE Trans. Inform. Theory, vol. 40, no. 4, pp. 1147-1157, July 1994.

[12] S. Verdú, "On channel capacity per unit cost," IEEE Trans. Inform. Theory, vol. 36, pp. 1019-1030, Sept. 1990.

[13] S. Verdú, "Spectral efficiency in the wideband regime," IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1319-1343, June 2002.

[14] A. Lapidoth and S. Shamai (Shitz), "Fading channels: how perfect need 'perfect side information' be?" IEEE Trans. Inform. Theory, vol. 48, no. 5, pp. 1118-1134, May 2002.

[15] A. Lapidoth, "Nearest neighbor decoding for additive non-Gaussian noise channels," IEEE Trans. Inform. Theory, vol. 42, pp. 1520-1529, Sept. 1996.

[16] A. Lapidoth and S. M. Moser, "Capacity bounds via duality with applications to multiple-antenna systems on flat fading channels," IEEE Trans. Inform. Theory, vol. 49, no. 10, pp. 2426-2467, Oct. 2003.

[17] R. G. Gallager, Information Theory and Reliable Communication. John Wiley & Sons, 1968.
