Joint source and channel coding for MIMO systems: Is it better to be robust or quick?
We develop a framework to optimize the tradeoff between diversity, multiplexing, and delay in MIMO systems to minimize end-to-end distortion. We first focus on the diversity-multiplexing tradeoff in MIMO systems, and develop analytical results to min…
Authors: Tim Holliday, Andrea J. Goldsmith, H. Vincent Poor
Abstract

We develop a framework to optimize the tradeoff between diversity, multiplexing, and delay in MIMO systems to minimize end-to-end distortion. The goal is to find the optimal balance between the increased data rate provided by antenna multiplexing, the reduction in transmission errors provided by antenna diversity and automatic repeat request (ARQ), and the delay introduced by ARQ. We first focus on the diversity-multiplexing tradeoff in MIMO systems, and develop analytical results to minimize distortion of a vector quantizer concatenated with a space-time MIMO channel code. In the high SNR regime we obtain a closed-form expression for the end-to-end distortion as a function of the optimal point on the diversity-multiplexing tradeoff curve. For large but finite SNR we find this optimal point via convex optimization. The same general framework can also be used to minimize end-to-end distortion for a broad class of practical source and channel codes, which we illustrate with an example. We then consider MIMO systems using ARQ retransmission to provide additional diversity at the expense of delay. We show that for sources without a delay constraint, distortion is minimized by maximizing the ARQ window size. This results in an ARQ-enhanced multiplexing-diversity tradeoff region, with distortion minimized over this region in the same manner as without ARQ. However, under a source delay constraint the problem formulation changes to account for delay distortion associated with random message arrival and random ARQ completion times. Moreover, the simplifications associated with a high SNR assumption break down for this analysis, since retransmissions, and the delay they cause, become rare events.
We thus use a dynamic programming formulation to capture the channel diversity-multiplexing tradeoff at finite SNR as well as the random arrival and retransmission dynamics. This formulation is used to solve for the optimal multiplexing-diversity-delay tradeoff to minimize end-to-end distortion associated with the source encoder, channel, and ARQ retransmissions. Our results show that a delay-sensitive system should adapt its operating point on the diversity-multiplexing-delay tradeoff region to the system dynamics. We provide numerical results that demonstrate significant performance gains of this adaptive policy over a static allocation of diversity/multiplexing in the channel code and a static ARQ window size.

Keywords: ARQ, diversity-multiplexing-delay tradeoff, joint source-channel coding, MIMO channels.

The first author is with Goldman Sachs, the second author is with Stanford University, and the third author is with Princeton University. This research was supported by the Office of Naval Research under Grant N00014-05-1-0168, by DARPA's ITMANET program under Grant 1105741-1-TFIND, and by the National Science Foundation under Grants ANI-03-38807 and CNS-06-25637.

I. INTRODUCTION

Multiple antennas can significantly improve the performance of wireless systems. In particular, with channel knowledge at the receiver, a data rate increase equal to the minimum number of transmit/receive antennas can be obtained by multiplexing data streams across the parallel channels associated with the channel gain matrix. Alternatively, multiple antennas enable transmit and/or receive diversity, which decreases the probability of error.
In a landmark result, Zheng and Tse [27] developed a rigorous fundamental tradeoff between the data rate increase possible via multiplexing versus the channel error probability reduction possible via diversity, characterizing how a higher spatial multiplexing gain leads to lower diversity and vice versa. The main result in [27] is an explicit characterization of the diversity-multiplexing tradeoff region. This result generated much activity in finding diversity-multiplexing tradeoffs for other channel models as well as design of space-time codes that achieve any point on the tradeoff region [1], [8], [6], [16], [18], [24]. The diversity-multiplexing tradeoff was also extended to the multiple access channel in [23]. Delay provides a third dimension in the tradeoff region, and this dimension was explored for MIMO channels based on the automatic repeat request (ARQ) protocol in [7]. In particular, this work characterized the three-dimensional tradeoff between diversity, multiplexing, and ARQ-delay for MIMO systems.

Our goal in this paper is to answer the following question: "Given the diversity-multiplexing-delay tradeoff region, where should a system operate on this region?" In order to answer this question we require a performance metric from a layer above the physical layer; while physical layer tradeoffs are based on the channel model, the optimization between these tradeoffs depends on what is most important for the application's end-to-end performance. The higher layer metric of interest in this paper will be end-to-end distortion. Specifically, our system model consists of a lossy source encoder concatenated with a MIMO channel encoder and, in the last section, an ARQ retransmission protocol.
Our goal is to determine the optimal point on the diversity-multiplexing or diversity-multiplexing-delay tradeoff region that minimizes the combined distortion due to the source compression, channel, and delays in the end-to-end system.

Our problem formulation differs from the Shannon-theoretic joint source-channel coding problem in that we do not assume asymptotically long block lengths for either the source or channel code. In particular, the traditional joint source/channel code formulation assumes stationary and ergodic sources and channels in the asymptotic regime of large source dimension and channel code blocklength. Shannon showed that under these assumptions the source should be encoded at a rate just below channel capacity and then transmitted over the channel at this rate. Since the rate is less than capacity, the channel introduces negligible error, hence the end-to-end distortion equals the distortion introduced by compressing the source to a rate below the channel capacity. Shannon's well-known separation theorem indicates that this transmission scheme is optimal for minimizing end-to-end distortion and does not require any coordination between the source and channel coders or decoders other than agreeing on the channel transmission rate [4], [5].

Our joint source/channel code formulation is fundamentally different from Shannon's since we assume a finite blocklength for the channel code. This assumption is inherent to the diversity-multiplexing tradeoff since, without finite blocklength, the channel introduces negligible error and hence the diversity gain in terms of channel error probability is meaningless. The finite blocklength guarantees there is a nonnegligible probability of error in the channel transmission.
Thus there is a tradeoff between resolution at the source encoder and robustness at the channel encoder: limiting source distortion requires a high-rate source code, for which the multiple antennas of the channel must be used mainly for multiplexing. Alternatively, the source can be encoded at a lower rate with more distortion, and then the channel error probability can be reduced through increased diversity. Our joint source/channel code must determine the best tradeoff between these two to minimize end-to-end distortion. When retransmission is possible and the source is delay-sensitive, there is an additional tradeoff between reducing channel errors through retransmissions versus the delay these retransmissions entail.

Joint source/channel code optimization for the binary symmetric channel (BSC) with finite blocklength channel codes and asymptotically high source dimension was previously studied in [15]. We will use several key ideas and results from this prior work in our asymptotic analysis, in particular its decomposition of end-to-end distortion into separate components associated with either the source code or the channel code. By applying this decomposition to MIMO channels instead of the BSC, we obtain the optimal operating point on the Zheng/Tse diversity-multiplexing tradeoff region in the asymptotic limit of high source dimension and channel SNR. For any SNR the MIMO channel under multiplexing can be viewed as a parallel channel, and source/channel coding for parallel channels has been previously explored in [17]. That work differs from ours in that the source models were not high dimensional and the nonergodic parallel channels did not have the same diversity-multiplexing tradeoff characterization as in a MIMO system.

We first develop a closed-form expression for the optimal "distortion exponent", introduced in [17], under asymptotically high SNR.
Specifically, for a multiplexing rate $r$ and average distortion measure $\bar{D}(r)$ we compute

$$d^*_D = \min_r \left[ \lim_{\mathrm{SNR} \to \infty} \frac{\log \bar{D}(r)}{\log \mathrm{SNR}} \right], \tag{1}$$

where $d^*_D$ is the optimal exponential rate at which the distortion goes to zero with SNR. We show that the optimal distortion exponent corresponds to a particular point on the diversity-multiplexing tradeoff curve that is determined by the source characteristics. We also demonstrate there is no loss in optimality for separate source and channel encoding and decoding given the channel transmission rate. Our optimization framework can also be used to optimize the diversity-multiplexing tradeoff at finite SNR; however, the solution is no longer in closed form and must be found using tools from convex optimization. We extend this general optimization framework to a wide variety of practical source-channel codes in non-asymptotic regimes.

We next consider the impact of ARQ retransmissions and their associated delay. When the source does not have a delay constraint, the ARQ delay incurs no cost in terms of additional distortion. Hence, the ARQ protocol should use the maximum window size to enhance the diversity-multiplexing tradeoff region associated with the MIMO channel alone. The large window size essentially allows coding over larger blocklengths than without ARQ, which from Shannon theory does not reduce data rate, only probability of error. In the high SNR regime the optimal distortion exponent for the diversity-multiplexing tradeoff region enhanced by ARQ is found in the same manner as without ARQ. Not surprisingly, a delay constraint on the source changes the problem considerably, since the source burstiness and queuing delay must now be incorporated into the problem formulation. These characteristics are known to be a significant obstacle in merging analysis of the fundamental limits at the physical layer with end-to-end network performance [10].
In this setting the simplicity associated with the high SNR regime breaks down, since at high SNR retransmissions and their associated delay have very low probability, which essentially removes the third dimension of delay in our tradeoff region. We thus use dynamic programming to model and optimize over the system dynamics as well as the fundamental physical layer tradeoffs to minimize end-to-end distortion of a MIMO channel with ARQ.

The remainder of this paper is organized as follows. In the next section we present the channel model and summarize the diversity-multiplexing tradeoff results from [27]. In Section III we develop our source encoding framework and apply the MIMO channel error probability results of [27] to the upper and lower bounds on end-to-end distortion of [15]. Section IV obtains a closed-form expression for the optimal operating point on the MIMO channel diversity-multiplexing tradeoff curve in the high SNR regime to minimize end-to-end distortion. This optimal point is also found for large, but finite, SNR using convex optimization. In Section V we present a similar formulation for optimizing diversity and multiplexing in progressive video transmission using space-time codes. ARQ retransmission and its corresponding delay is considered in Section VI, where a dynamic programming formulation is used to optimize the operating point on the diversity-multiplexing-delay tradeoff region for minimum end-to-end distortion of delay-constrained sources. A summary and closing thoughts are provided in Section VII.

II. CHANNEL MODEL

We will use the same channel model and notation as in [27]. Consider a wireless channel with $M$ transmit antennas and $N$ receive antennas. The fading coefficients $h_{ij}$ that model the gain from transmit antenna $i$ to receive antenna $j$ are independent and identically distributed (i.i.d.) complex Gaussian with unit variance.
The channel gain matrix $H$ with elements $H(i,j) = (h_{ij} : i \in \{1, \ldots, M\},\ j \in \{1, \ldots, N\})$ is assumed to be known at the receiver and unknown at the transmitter. We assume that the channel remains constant over a block of $T$ symbols, while each block is i.i.d. Therefore, in each block we can represent the channel as

$$Y = \sqrt{\frac{\mathrm{SNR}}{M}}\, H X + W, \tag{2}$$

where $X \in \mathbb{C}^{M \times T}$ and $Y \in \mathbb{C}^{N \times T}$ are the transmitted and received signal vectors, respectively. The additive noise vector $W$ is i.i.d. complex Gaussian with unit variance.

We construct a family of codes $\{C(\mathrm{SNR})\}$ of block length $T$ for this channel, one for each SNR level. Define $P_e(\mathrm{SNR})$ as the average probability of error and $R(\mathrm{SNR})$ as the number of bits per symbol for the codebook. A channel code scheme $\{C(\mathrm{SNR})\}$ is said to achieve multiplexing gain $r$ and diversity gain $d$ if

$$\lim_{\mathrm{SNR} \to \infty} \frac{R(\mathrm{SNR})}{\log_2 \mathrm{SNR}} = r, \tag{3}$$

and

$$\lim_{\mathrm{SNR} \to \infty} \frac{\log_2 P_e(\mathrm{SNR})}{\log_2 \mathrm{SNR}} = -d. \tag{4}$$

All logarithms we consider will have base 2 and we therefore suppress this base notation in the remainder of the paper. For each $r$ we define the optimal diversity gain $d^*(r)$ as the supremum of the diversity gain achieved by any scheme. The main result from [27] that we will use in the next section is summarized in the following statement.

Diversity-Multiplexing Tradeoff [27]: Assume the block length satisfies $T \geq M + N - 1$. Then the optimal tradeoff between diversity gain and multiplexing gain is the piecewise-linear function connecting the points $d^*(r) = (M - r)(N - r)$, for integer values of $r$ such that $0 \leq r \leq \min(M, N)$.

This function $d^*(r)$ is plotted in Figure 1. In the Zheng/Tse framework the rate of the codebook $\{C(\mathrm{SNR})\}$ must scale with $\log \mathrm{SNR}$, otherwise the multiplexing gain will go to zero.
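The tradeoff curve stated above is easy to evaluate numerically. The following is a minimal sketch, not code from the paper: the function name `dmt_diversity_gain` is hypothetical, and it simply interpolates linearly between the integer points $(r, (M-r)(N-r))$, which is only the optimal tradeoff under the stated condition $T \geq M + N - 1$.

```python
import math


def dmt_diversity_gain(r: float, M: int, N: int) -> float:
    """Piecewise-linear d*(r) connecting (M - r)(N - r) at integer r.

    Hypothetical helper for illustration; assumes T >= M + N - 1 so that
    the curve is the optimal diversity-multiplexing tradeoff.
    """
    r_max = min(M, N)
    if not 0 <= r <= r_max:
        raise ValueError("r must lie in [0, min(M, N)]")
    lo = math.floor(r)
    if lo == r_max:
        # Right endpoint: full multiplexing leaves no diversity.
        return 0.0
    hi = lo + 1
    d_lo = (M - lo) * (N - lo)  # value at the integer point below r
    d_hi = (M - hi) * (N - hi)  # value at the integer point above r
    return d_lo + (r - lo) * (d_hi - d_lo)
```

For a 2x2 system, for instance, the curve runs from $d^* = 4$ at $r = 0$ through $d^* = 1$ at $r = 1$ to $d^* = 0$ at $r = 2$, with straight segments in between.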
Hence, in the following sections we will assume, without loss of generality, that the rate of the codebook is $T r \log \mathrm{SNR}$ for any choice of $0 \leq r \leq \min(M, N)$ and block length $T$. We also assume that the codebook achieves the optimal diversity gain $d^*(r)$ for any choice of $r$. Codes achieving the optimal diversity-multiplexing tradeoff for MIMO channels have been investigated in many works, including [6], [8], [9], [20] and the references therein.

Fig. 1. The optimal diversity-multiplexing tradeoff for $T \geq M + N - 1$.

III. END-TO-END DISTORTION

This section presents our system model for the end-to-end transmission of source data. We use the same source coding model as [15] in order to exploit their decomposition of end-to-end distortion into separate source and channel distortion components. We assume the original source data $u$ is a random variable with probability density $h(u)$, which has support on a closed and bounded subset of $\Re^k$ with non-empty interior. An $s$-bit quantizer is applied to $u$ via the following transformation:

$$Q(u) = \sum_{i=1}^{2^s} v_i\, I_{[A_i]}(u), \tag{5}$$

where $I_{[A_i]}(u) = I[u \in A_i]$ is the standard indicator function, and $\{A_i\}_{i=1}^{2^s}$ is a partition of $\Re^k$ into disjoint regions. Each region $A_i$ is represented by a single codevector $v_i$. The $p$th-order distortion due to the encoding process is

$$D_s(Q) = \sum_{i=1}^{2^s} \int_{A_i} \|u - v_i\|^p\, h(u)\, du, \tag{6}$$

where $\|u - v_i\|^p$ is the $p$th power of the Euclidean norm. We assume that the rate of the channel codebook $\{C(\mathrm{SNR})\}$ is matched to the rate of the quantizer (i.e. $s = T r \log \mathrm{SNR}$). Each codevector from the quantizer $v_1, \ldots, v_{2^s}$ is mapped into a codeword from $\{C(\mathrm{SNR})\}$ through a permutation mapping $\pi$. We assume the mapping $\pi$ is chosen equally likely at random from the $(2^s)!$ possibilities.
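As a concrete instance of the source distortion (6), the sketch below evaluates $D_s(Q)$ numerically for a deliberately simple case that is an assumption of this example rather than the paper's setup: a scalar source ($k = 1$) uniform on $[0, 1)$, quantized by a uniform quantizer whose $2^s$ cells are represented by their midpoints. The function name `uniform_quantizer_distortion` and the midpoint-rule integration are illustrative choices.

```python
def uniform_quantizer_distortion(s: int, p: int = 2,
                                 samples_per_cell: int = 1000) -> float:
    """Evaluate D_s(Q) from (6) for u ~ Uniform[0, 1) and a uniform quantizer.

    Illustrative assumptions: k = 1, h(u) = 1 on [0, 1), regions A_i are
    equal-width cells, and each codevector v_i is the cell midpoint.
    """
    cells = 2 ** s
    width = 1.0 / cells
    du = width / samples_per_cell
    total = 0.0
    for i in range(cells):
        v_i = (i + 0.5) * width  # codevector representing region A_i
        # Midpoint-rule approximation of the integral over A_i.
        for n in range(samples_per_cell):
            u = i * width + (n + 0.5) * du
            total += abs(u - v_i) ** p * du
    return total
```

In this toy case with $p = 2$ the integral has the closed form $2^{-2s}/12$, so each additional quantizer bit divides the distortion by four.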
The codeword $\pi(i)$ is transmitted over the channel described in Section II and decoded at the receiver. Let $q(\pi(j) \mid \pi(i))$ be the probability that codeword $\pi(j)$ is decoded at the receiver given that $\pi(i)$ was transmitted. The probability $q(\cdot \mid \cdot)$ will depend on the SNR, the quantizer $Q$'s codeword set, and the permutation mapping $\pi$. Hence, we can write the total end-to-end distortion as follows:

$$D_\tau(Q, \mathrm{SNR}, \pi) = \sum_{i=1}^{2^s} \sum_{j=1}^{2^s} q(\pi(j) \mid \pi(i)) \int_{A_i} \|u - v_j\|^p\, h(u)\, du. \tag{7}$$

Ideally we would like to be able to analyze the distortion averaged over all index assignments and possibly remove the dependence on $h$ and $Q$. In general we cannot find a closed-form expression for this distortion due to the dependence on $Q$'s codewords, $\pi$, $h$, and the SNR. However, given our matched source and channel rate $s = T r \log \mathrm{SNR}$, it is clear that we have a tradeoff between transmitting at a high data rate to reduce source distortion and transmitting at a low data rate to reduce channel errors. In particular, if we run full multiplexing in the MIMO channel (i.e. set $r = \min(M, N)$) we can use a large $s$. This would result in low distortion at the source encoder but possibly create many transmission errors. Conversely, we could use full diversity in the channel (i.e. set $d = MN$) to combat errors and then suffer the distortion from a low value of $s$. Between the two extremes lies a source code rate $s$ and a corresponding channel multiplexing rate $r$ that minimizes (7).

Although we cannot find a simple general expression for $D_\tau(Q, \mathrm{SNR}, \pi)$, in the following subsections we will determine tight asymptotic bounds for the distortion through the use of high-resolution source coding theory and high-SNR analysis of the MIMO channel.
In addition, as the SNR approaches infinity we will find a simple expression for the optimal choice of $r$ and $s$ that depends only on the block length $T$, source dimension $k$, number of transmit antennas $M$, and number of receive antennas $N$.

The high-resolution asymptotic regime is often used in source coding theory to obtain analytical results, since the performance characteristics of many encoder types are well understood in this regime [26]. Moreover, it has been shown that the high-resolution asymptotics often provide a good approximation for non-asymptotic performance [19], [22]. As described in [26], we say that a quantizer $Q$ operates in the high-resolution asymptotic regime if its noiseless distortion asymptotically approaches

$$D_s(Q) = 2^{-ps/k + O(1)}, \tag{8}$$

as $s$ goes to infinity, where the $O(1)$ term in (8) may depend on $p$, $k$, and $s$. Many practical quantizers achieve this asymptotic distortion, e.g. uniform and lattice-based quantizers [3], [25]. This high-resolution asymptotic regime is quite accurate for our system model since we require the rate of our channel codebook $\{C(\mathrm{SNR})\}$ to scale as $r \log \mathrm{SNR}$. Hence, at asymptotically high SNR, the source coder will receive an increasing number of bits, thereby approaching its high-resolution regime.

In the next two subsections we will construct upper and lower asymptotic bounds for the end-to-end average distortion of our system. The starting point for both bounds comes from the analysis of [15]. In Section IV we will show that these bounds are tight and find the optimal multiplexing rate that minimizes distortion in the high SNR regime.

A. Upper Bound for Distortion

We first construct an upper bound for the end-to-end distortion (7) that depends on $\pi$.
As shown in [15],

$$
\begin{aligned}
D_\tau(Q, \mathrm{SNR}, \pi) &= \sum_{i=1}^{2^s} \sum_{j=1}^{2^s} q(\pi(j) \mid \pi(i)) \int_{A_i} \|u - v_j\|^p\, h(u)\, du \\
&= \sum_{i=1}^{2^s} q(\pi(i) \mid \pi(i)) \int_{A_i} \|u - v_i\|^p\, h(u)\, du + \sum_{\substack{i,j=1 \\ i \neq j}}^{2^s} q(\pi(j) \mid \pi(i)) \int_{A_i} \|u - v_j\|^p\, h(u)\, du \\
&\leq \sum_{i=1}^{2^s} \int_{A_i} \|u - v_i\|^p\, h(u)\, du + O(1) \sum_{i=1}^{2^s} P(A_i) \sum_{\substack{j=1 \\ j \neq i}}^{2^s} q(\pi(j) \mid \pi(i)) \\
&\leq D_s(Q) + O(1) \max_i P_{e \mid \pi(i)}(\mathrm{SNR}),
\end{aligned} \tag{9}
$$

where $P_{e \mid \pi(i)}$ is the probability of codeword error given that codeword $\pi(i)$ was transmitted. This bound essentially splits (7) into two pieces: one corresponding to correctly received channel codewords and the other corresponding to erroneous channel decoding. The term corresponding to correct transmission is bounded by the noiseless distortion $D_s(Q)$, while the term corresponding to errors is bounded by a constant (see footnote 1) multiplied by the channel codeword error probability.

Footnote 1: This term is $O(1)$ because our source is bounded.

By construction, the rate of our channel codebook (and hence the source encoder) is $s = T r \log \mathrm{SNR}$, therefore

$$D_s(Q) = 2^{-ps/k + O(1)} = 2^{-\frac{pTr}{k} \log \mathrm{SNR} + O(1)} \tag{10}$$

as $s$ approaches infinity or, equivalently, as $\log \mathrm{SNR}$ approaches infinity.

In order to bound the probability of codeword error we need a few quantities from [27]. For the channel defined in (2), let $P_{\mathrm{out}}(r \log \mathrm{SNR})$ and $d_{\mathrm{out}}(r)$ be the outage probability and outage exponent that satisfy

$$P_{\mathrm{out}}(r \log \mathrm{SNR}) = 2^{-d_{\mathrm{out}}(r) \log \mathrm{SNR} + o(\log \mathrm{SNR})}. \tag{11}$$

The exponent $d_{\mathrm{out}}(r)$ can be directly computed and the equation for doing so is presented in [27]. We can also bound the probability of error with no outage through

$$P(\text{error, no outage}) \leq 2^{-d_G(r) \log \mathrm{SNR} + o(\log \mathrm{SNR})}, \tag{12}$$

where $d_G(r)$ is the exponent associated with choosing the channel codewords to be i.i.d. Gaussian. Again, the formula for computing $d_G(r)$ can be found in [27]. Then we can bound the overall probability of error $P_e(\mathrm{SNR})$ by

$$
\begin{aligned}
P_e(\mathrm{SNR}) &\leq P_{\mathrm{out}}(r \log \mathrm{SNR}) + P(\text{error, no outage}) \\
&\leq 2^{-d_{\mathrm{out}}(r) \log \mathrm{SNR} + o(\log \mathrm{SNR})} + 2^{-d_G(r) \log \mathrm{SNR} + o(\log \mathrm{SNR})}.
\end{aligned} \tag{13}
$$

With the bound (13) in hand we may now upper bound the total distortion by

$$D_\tau(Q, \mathrm{SNR}, \pi) \leq 2^{-\frac{pTr}{k} \log \mathrm{SNR} + O(1)} + O(1)\, 2^{-d_{\mathrm{out}}(r) \log \mathrm{SNR} + o(\log \mathrm{SNR})} + O(1)\, 2^{-d_G(r) \log \mathrm{SNR} + o(\log \mathrm{SNR})}. \tag{14}$$

Note that the distortion upper bound in (14) does not depend on the source-to-channel codeword mapping $\pi$, since the bounds (11) and (12) as well as the source distortion (10) do not depend on this mapping. Hence, the bound (14) holds for the distortion averaged over all possible source-codeword mappings, and only depends on the quantizer $Q$ through the parameters $p$, $s$, and $k$. Thus, by averaging over all source-channel codeword mappings we get that for any quantizer $Q$ satisfying (8) in the high-resolution asymptotic regime, the end-to-end average distortion is bounded above by

$$\bar{D}_\tau(\mathrm{SNR}) = E_\pi\left[D_\tau(Q, \mathrm{SNR}, \pi)\right] \leq 2^{-\frac{pTr}{k} \log \mathrm{SNR} + O(1)} + O(1)\, 2^{-d_{\mathrm{out}}(r) \log \mathrm{SNR} + o(\log \mathrm{SNR})} + O(1)\, 2^{-d_G(r) \log \mathrm{SNR} + o(\log \mathrm{SNR})}. \tag{15}$$

B. Lower Bound for Distortion

Our lower bound for distortion will also make use of a result from [15]. Let $\bar{D}_\tau(Q, \mathrm{SNR})$ be the distortion averaged over all $(2^s)!$ possible mappings $\pi$. Then from [15] we have

$$\bar{D}_\tau(Q, \mathrm{SNR}) \geq 2^{-ps/k + O(1)} + O(1)\, P_e(\mathrm{SNR}). \tag{16}$$

Note that as in the upper bound, for any quantizer $Q$ satisfying (8) in the asymptotic regime, the lower bound depends on $Q$ only through the parameters $p$, $s$, and $k$. However, a key difference between this bound and the upper bound (14) is that it is based on averaging distortion over all source-codeword mappings $\pi$. In particular, this bound is based on the assumption that each source-to-channel codeword mapping is random and equally probable (i.e. the probability of mapping a given source codeword to a given channel codeword is uniform).
From [27] we may lower bound the error probability $P_e(\mathrm{SNR})$ via the outage exponent as

$$P_e(\mathrm{SNR}) \geq 2^{-d_{\mathrm{out}}(r) \log \mathrm{SNR} + o(\log \mathrm{SNR})}. \tag{17}$$

Thus our lower bound for average distortion for any quantizer $Q$ satisfying (8) in the asymptotic regime of high resolution becomes

$$\bar{D}_\tau(\mathrm{SNR}) \geq 2^{-ps/k + O(1)} + O(1)\, 2^{-d_{\mathrm{out}}(r) \log \mathrm{SNR} + o(\log \mathrm{SNR})}. \tag{18}$$

IV. MINIMIZING TOTAL DISTORTION

In this section we will optimize the bounds presented in the previous section and show that they are tight. In order to achieve analytical results for the minimum distortion bound we consider the asymptotic regime of SNR approaching infinity. In general, our total distortion is an exponential sum of the form

$$2^{f(r) \log \mathrm{SNR}} + 2^{g(r) \log \mathrm{SNR}}, \tag{19}$$

where we define $f(r)$ as the source distortion exponent and $g(r)$ as the channel distortion exponent. We minimize total distortion in the form of (19) by choosing the exponents $f(r)$ and $g(r)$ to be within $o(1)$ of each other. The function $f(r)$ depends on the source distortion while $g(r)$ depends on the channel error probability. For example, in (18), if we assume the bound is tight and neglect terms that become negligible at high SNR, then $f(r) = -pTr/k$ (since $s = T r \log \mathrm{SNR}$) and $g(r) = -d_{\mathrm{out}}(r)$. Note that if the exponents in (19) are not of the same order then one term in the sum dominates the other as SNR approaches infinity. As we shall see, the fact that these two terms are of the same order is the key to obtaining a closed-form expression for the optimal diversity-multiplexing tradeoff point.

A. Asymptotic Regime

We first consider the upper bound for total distortion (14). We need to match the exponents for the three terms in the bound, otherwise one term will not go to zero as the SNR goes to infinity. Fortunately, part of this has already been accomplished in [27].
Specifically, for the case where the block length satisfies $T \geq M + N - 1$ it was shown in [27] that $d_{\mathrm{out}}(r) = d_G(r) = d^*(r)$, although the $o(\log \mathrm{SNR})$ terms are not the same. Hence, if we consider the asymptotic regime of SNR approaching infinity we have

$$\lim_{\mathrm{SNR} \to \infty} \frac{\log \bar{D}_\tau(\mathrm{SNR})}{\log \mathrm{SNR}} \leq \lim_{\mathrm{SNR} \to \infty} \frac{\log \left[ 2^{-\frac{pTr}{k} \log \mathrm{SNR} + O(1)} + O(1)\, 2^{-d^*(r) \log \mathrm{SNR} + o(\log \mathrm{SNR})} \right]}{\log \mathrm{SNR}}.$$

If we choose an $r^*$ that solves

$$d^*(r^*) = \frac{pTr^*}{k}, \tag{20}$$

where $d^*(r)$ is the piecewise-linear function connecting $(N - r)(M - r)$ for integer values of $0 \leq r \leq \min(M, N)$, then we have

$$
\begin{aligned}
\lim_{\mathrm{SNR} \to \infty} \frac{\log \bar{D}_\tau(\mathrm{SNR})}{\log \mathrm{SNR}} &\leq \lim_{\mathrm{SNR} \to \infty} \frac{\log \left[ 2^{-d^*(r^*) \log \mathrm{SNR} + O(1)} + O(1)\, 2^{-d^*(r^*) \log \mathrm{SNR} + o(\log \mathrm{SNR})} \right]}{\log \mathrm{SNR}} \\
&\leq \lim_{\mathrm{SNR} \to \infty} \frac{\log \left[ O(1)\, 2^{-d^*(r^*) \log \mathrm{SNR} + o(\log \mathrm{SNR})} \right]}{\log \mathrm{SNR}} \\
&= -d^*(r^*).
\end{aligned}
$$

We now consider the lower bound (18) on average distortion. Again, for the case where $T \geq M + N - 1$ we have that $d_{\mathrm{out}}(r) = d^*(r)$. We can match the exponents in (18) by choosing the same $r^*$ that satisfies (20), which yields

$$
\begin{aligned}
\lim_{\mathrm{SNR} \to \infty} \frac{\log \bar{D}_\tau(\mathrm{SNR})}{\log \mathrm{SNR}} &\geq \lim_{\mathrm{SNR} \to \infty} \frac{\log \left[ 2^{-\frac{pTr}{k} \log \mathrm{SNR} + O(1)} + O(1)\, 2^{-d^*(r) \log \mathrm{SNR} + o(\log \mathrm{SNR})} \right]}{\log \mathrm{SNR}} \\
&\geq \lim_{\mathrm{SNR} \to \infty} \frac{\log \left[ 2^{-d^*(r^*) \log \mathrm{SNR} + O(1)} + O(1)\, 2^{-d^*(r^*) \log \mathrm{SNR} + o(\log \mathrm{SNR})} \right]}{\log \mathrm{SNR}} \\
&= -d^*(r^*).
\end{aligned}
$$

Since the asymptotic upper and lower bounds are tight, we have proved the following theorem:

Theorem 1: In the limit of asymptotically high SNR, the optimal end-to-end distortion for a vector quantizer cascaded with the MIMO channel characterized by (2) satisfies

$$d^*_D = \lim_{\mathrm{SNR} \to \infty} \frac{\log \bar{D}_\tau(\mathrm{SNR})}{\log \mathrm{SNR}} = -\min\left(d^*(r), \frac{pTr}{k}\right) = -d^*(r^*). \tag{21}$$

The choice of optimal multiplexing rate $r^*$ is illustrated in Figure 2, which plots $d^*(r)$ from Figure 1 together with $pTr/k$ as a function of $r$.
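The balance condition (20) can be solved numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it uses bisection, which works here because $d^*(r)$ is non-increasing and $pTr/k$ is increasing in $r$, so the two curves cross exactly once on $[0, \min(M, N)]$. The names `d_star` and `optimal_multiplexing_gain` are hypothetical.

```python
import math


def d_star(r: float, M: int, N: int) -> float:
    """Piecewise-linear tradeoff curve d*(r), assuming T >= M + N - 1."""
    lo = min(math.floor(r), min(M, N) - 1)  # left end of the linear segment
    hi = lo + 1
    d_lo = (M - lo) * (N - lo)
    d_hi = (M - hi) * (N - hi)
    return d_lo + (r - lo) * (d_hi - d_lo)


def optimal_multiplexing_gain(M: int, N: int, T: int, k: int,
                              p: int = 2, tol: float = 1e-12) -> float:
    """Solve d*(r) = p*T*r/k for r by bisection on [0, min(M, N)]."""
    lo, hi = 0.0, float(min(M, N))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d_star(mid, M, N) > p * T * mid / k:
            lo = mid  # channel exponent still larger: crossing lies to the right
        else:
            hi = mid  # source exponent larger: crossing lies to the left
    return 0.5 * (lo + hi)
```

For example, with $M = N = 2$, $p = 2$ and $T = k = 3$, the first segment gives $4 - 3r = 2r$, so the sketch returns $r^* \approx 0.8$ with distortion exponent $d^*(r^*) \approx 1.6$.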
We see that the source distortion exponent $pTr/k$ increases linearly with $r$, while the channel distortion exponent $d^*(r)$ decreases piecewise linearly with $r$. To balance the source and channel distortion, $r^*$ is chosen such that $d^*(r^*) = pTr^*/k$.

Fig. 2. The optimal multiplexing rate $r^*$ to balance source and channel distortion.

It should be noted that the tightness of the above bounds only holds when $T \geq M + N - 1$. For $T < M + N - 1$ the upper bound remains the same while the lower bound changes, which leaves a gap between our bounds.

B. Asymptotic Distortion Properties

The asymptotic distortion and optimal distortion exponent from Theorem 1 possess a few non-intuitive properties. First, while it is possible to choose $d^*(r) = MN$ (full diversity) or $r = \min(M, N)$ (full multiplexing), it is never optimal to do so. When minimizing $\bar{D}_\tau(\mathrm{SNR})$ we require non-zero amounts of both diversity and multiplexing, otherwise one of the terms in the distortion bounds (15) and (18) will not tend to zero as SNR approaches infinity.

It is also interesting to examine the optimal distortion exponent as the block length $T$ or source dimension $k$ become large. As $k$ becomes large (and $T$ remains fixed) we must increase $r^*$ in order to match the terms in (20). This is consistent with our intuition since a high dimensional source will require a large amount of multiplexing, otherwise the distortion at the source encoder becomes very large. It is more surprising that as $T$ becomes large (and $k$ remains fixed) we should decrease $r^*$, i.e. increase diversity at the expense of multiplexing. This is in contrast to traditional source-channel coding, where we encode our source at a rate just below the channel capacity ($\min(M, N) \log \mathrm{SNR}$) when the block length tends to infinity. In this case, however, we don't encode at channel capacity because the source dimension $k$ remains fixed as $T$ becomes large.
Thus, since the source encoding rate is proportional to $T$, we are already getting an asymptotically large channel rate for source encoding, and therefore should use our antennas for diversity rather than additional rate through multiplexing.

C. Source-Channel Code Separation

One feature that we do share with the traditional source-channel coding results is the notion of separation. In a traditional Shannon-theoretic framework, the source encoder needs to know only the channel capacity to design its source code. Then one may encode the source independently of the channel (at the channel capacity rate) and achieve the optimal end-to-end distortion. In this case the end-to-end distortion is due only to the source encoder since the channel is error free (over asymptotically long block lengths).

In our model we consider a source encoder concatenated with a MIMO channel that is restricted to transmission over finite block lengths. With this restriction the channel introduces errors even at transmission rates below capacity. These channel errors give rise to the diversity-multiplexing tradeoff. Under this finite blocklength channel coding we obtain a source and channel coding strategy to minimize end-to-end distortion. Our results indicate that separate source and channel coding is still optimal for this minimization. However, we now get (equal) distortion from both the source and channel code, in contrast to the optimal strategy in Shannon's separation theorem where the source is encoded at a rate below channel capacity and thus no distortion is introduced by the channel.

D. Non-asymptotic Bounds

We now analyze the behavior of our distortion bounds and the corresponding choice of $r^*$ for finite SNR.
In particular, we will consider the case of large but finite SNR, such that the SNR is sufficiently large to neglect the $O(1)$ term in the exponent of (8) and (18), and to assume $O(1) \approx 1$ and neglect the $o(\log \mathrm{SNR})$ exponential term in (15) and (18). With these approximations the optimal diversity-multiplexing tradeoff is obtained by solving the following convex optimization problem:

$$\min_{r} \; 2^{-\frac{pT}{k} r \log \mathrm{SNR}} + 2^{-d^*(r) \log \mathrm{SNR}} \tag{22}$$
$$\text{s.t.} \quad 0 \leq r \leq \min(M, N).$$

Figures 3, 4, and 5 provide numerical results based on the solution to (22), comparing the total end-to-end distortion versus the number of antennas assigned to multiplexing. Each plot contains a family of curves corresponding to different SNR levels. The difference between the three plots is the ratio of the block length $T$ to the source vector dimension $k$. Notice that for $T$ much smaller than $k$ (Figure 3) we will use almost all of our antennas for multiplexing. For $k$ of the same order as $T$ (Figure 4) we will choose about the same number of antennas for multiplexing and for diversity. For $k$ smaller than $T$ (Figure 5) we will use more antennas for diversity than for multiplexing.

Note that even at low SNR we can still find $r^*$ via the convex optimization formulation in (22), but we must include the neglected terms $O(1)$ and $o(\log \mathrm{SNR})$ in the distortion expressions to which we apply this optimization. In our numerical results we found that neglecting these terms for SNRs above 20 dB had little impact.

[Figure 3 plot. Title: Total Distortion vs. Multiplexing Antennas. Axes: Number of Antennas Assigned to Multiplexing (x), Total Distortion (y). Legend: SNR = .11, .15, .25, .39, .67, 1.]
Fig. 3. Total distortion vs. number of antennas assigned to multiplexing in an 8x8 system ($T \ll k$).
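The finite-SNR objective in (22) can be explored numerically with a short sketch such as the one below. It assumes the logarithm in (22) is base 2, so that each term reduces to a power of SNR, and it uses a dense grid search over $r$ instead of a convex solver purely to stay dependency-free. The names `total_distortion` and `best_rate` are hypothetical, and the parameter values are illustrative rather than a reproduction of Figures 3 to 5.

```python
import math


def dmt(r, M, N):
    """Piecewise-linear d*(r) joining (j, (M - j)(N - j)) at integer j."""
    m = min(M, N)
    j = min(int(r), m - 1)
    d_left = (M - j) * (N - j)
    d_right = (M - j - 1) * (N - j - 1)
    return d_left + (r - j) * (d_right - d_left)


def total_distortion(r, snr, M, N, p, T, k):
    """Objective of (22) with log taken base 2 (an assumption of this sketch)."""
    log_snr = math.log2(snr)
    source_term = 2.0 ** (-(p * T / k) * r * log_snr)
    channel_term = 2.0 ** (-dmt(r, M, N) * log_snr)
    return source_term + channel_term


def best_rate(snr_db, M, N, p, T, k, steps=8000):
    """Grid search for the minimizer of (22) over 0 <= r <= min(M, N)."""
    snr = 10.0 ** (snr_db / 10.0)
    m = min(M, N)
    grid = [m * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda r: total_distortion(r, snr, M, N, p, T, k))


if __name__ == "__main__":
    for k in (1000, 100):
        r_opt = best_rate(40, 8, 8, p=2, T=500, k=k)
        print("k =", k, "optimal r =", round(r_opt, 3))
```

At large SNR the minimizer sits close to the point where the two exponents are equal, which is one way to sanity-check the sketch against the asymptotic analysis.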
[Figure 4 plot. Title: Total Distortion vs. Multiplexing Rate, T=500, p=2, k=1000. Axes: Multiplexing Rate (x), Total Distortion (y). Legend: SNR = 1, 20, 40, 60, 80.]
Fig. 4. Total distortion vs. number of antennas assigned to multiplexing in an 8x8 system ($T \approx k$).

[Figure 5 plot. Title: Total Distortion vs. Multiplexing Rate, T=500, p=2, k=100. Axes: Multiplexing Rate (x), Total Distortion (y, scale $\times 10^{-4}$). Legend: SNR = 1, 20, 40, 60, 80.]
Fig. 5. Total distortion vs. number of antennas assigned to multiplexing in an 8x8 system ($T \gg k$).

V. PRACTICAL SOURCE AND CHANNEL CODING

While the results in the previous section lead to closed-form solutions for optimal joint source-channel coding in the high SNR regime, they only apply to a specific class of source and channel codes and distortion metrics. We now examine the diversity-multiplexing tradeoff for a broad class of source codes, channel codes, and distortion metrics. The basic optimization framework (22) can still be applied to this more general class of problems. Furthermore, this framework can be applied in non-asymptotic settings, thereby allowing us to study the diversity-multiplexing tradeoff under typical operating conditions.

In this section we present an example of end-to-end distortion optimization, via the diversity-multiplexing tradeoff, for source/channel distortion models that are fitted to real video streams and MIMO channels. We use the progressive video encoder model developed in [13]. The overall mean-square distortion is evaluated as

$$D_\tau = D_e + D_c, \tag{23}$$

where $D_e$ is the distortion induced by the source encoder and $D_c$ is the distortion created by errors in the channel. Although the total distortion is represented by two separate components, the components share some common terms, so we will still have a tradeoff between diversity and multiplexing. The model for source distortion $D_e$ developed in [13] consists of a six-parameter analytical formula that is fitted to a particular traffic stream.
Numerical results for $D_e$ as a function of the source encoding rate are provided in [13, Figure 2]. The source encoder design is based on a parameter $\beta$ corresponding to the amount of redundant data in consecutive encoding blocks. In general a larger value of $\beta$ leads to a smaller $D_e$ at the cost of increased complexity. The model for the channel distortion $D_c$ is fitted to the following equation,

$$D_c = \sigma^2 P_e(N_u) \left[ \frac{\gamma + \beta}{\gamma} \ln\!\left( 1 + \frac{\gamma}{\beta} \right) - \frac{1}{\gamma} + \frac{1}{2} \right], \tag{24}$$

where, given $\beta$, the parameters $\sigma^2$ and $\gamma$ are based on the particular source encoder and traffic stream, $N_u$ is the number of antennas used for multiplexing, and $P_e(N_u)$ is the probability of codeword error as a function of $N_u$. We will assume sources with $\beta = .01$ in our analysis, since this setting provides the lowest distortion for any given rate. This source encoder setting also provides the highest sensitivity to channel errors, which allows us to highlight the tradeoff between multiplexing and diversity in our optimization.

Our channel transmission scheme follows the setup in [16]. We use 8 transmit and 8 receive antennas with a set of linear space-time codes that can trade off multiplexing for diversity (specifically, these codes only trade integer values of $r$ and $(M, N)$). The actual code construction in [16] is fairly complex and involves several inner and outer codes designed to handle both Ricean and Rayleigh fading channels in a MIMO orthogonal frequency division multiplexing (OFDM) system. For the purposes of our numerical results the actual code design is irrelevant; we only require the probability of error as a function of SNR and the number of antennas assigned to multiplexing, which is given in [16, Figure 4]. Our optimization can be applied to space-time channel codes developed by other authors [8], [6], [18] by using the error probability associated with their codes in our optimization.
Since the channel coding scheme of [16] does not permit us to assign fractions of antennas, we must solve the following integer program for the optimal distortion and number of multiplexing antennas:

$$\min_{N_u} \; D_e + D_c \tag{25}$$
$$\text{s.t.} \quad N_u \in \{1, 2, 4, 8\}.$$

[Figure 6 plot. Title: Total Distortion vs. Multiplexing Antennas. Axes: Number of Antennas Assigned to Multiplexing (x), Total Distortion (y). Legend: SNR = .11, .15, .25, .39, .67, 1.]
Fig. 6. Total distortion vs. number of antennas assigned to multiplexing for differing levels of SNR.

Figure 6 contains a set of curves that show the total distortion achieved as a function of the number of antennas assigned to multiplexing. The uppermost curve corresponds to the lowest SNR and the bottom curve corresponds to the highest SNR. We see that we have an explicit tradeoff here that depends on SNR. At low SNR the total distortion is minimized by assigning most antennas to diversity to compensate for the high error probability in the channel. As SNR increases we assign more antennas to multiplexing, since this is a better use of antennas when the error probability is low.

One significant difference between this plot and the asymptotic results in Section IV is that here we do assign our antennas to full multiplexing as the SNR becomes large. The reason we observe this behavior is that the rate of our codebook in this example does not scale with SNR. Thus, as the SNR becomes large we eventually reach a point where distortion would be reduced by moving to a higher rate code that is not available in the 8x8 space-time code under consideration. Hence, the optimal choice in this case is to eventually move to full multiplexing. The implication of this result is that a MIMO system should have enough antennas to exploit full multiplexing at all available SNRs.
A design framework for such codes has been developed in [6], but the error probability analysis of these codes is still needed to perform the joint source-channel coding optimization.

VI. THE DIVERSITY-MULTIPLEXING-DELAY TRADEOFF

Instead of accepting decoding errors in the channel, many wireless systems perform error correction via some form of ARQ. In particular, the receiver has some form of error detection code, and if a transmission error is detected on a given packet, a feedback path is used to send this error information back to the transmitter, which then resends part or all of the packet to increase the chance of successful decoding. The packet retransmissions, combined with random arrival times of the messages at the transmitter, cause queues to form in front of the source coder, and hence each block of data will experience random delays. Here, the notion of delay we wish to capture is the time between the arrival time of a message at the transmitter and the time at which it is successfully decoded at the receiver (also known as the "sojourn time" in queueing systems).

While ARQ increases the probability of decoding a packet correctly, it also introduces additional delay. The window size of the ARQ protocol determines how many retransmission attempts will be made for a given packet. The larger this window size, the more likely the packet will be successfully received, and the larger the possible delays associated with retransmission will be. ARQ can be viewed as a form of diversity, and hence it complements antenna diversity in MIMO systems. For MIMO systems with ARQ, there is a three-dimensional tradeoff between diversity due to multiple antennas and ARQ, multiplexing, and delay.
This three-dimensional tradeoff region was recently characterized by El Gamal, Caire, and Damen in [7], and we will use this region in lieu of the Zheng/Tse diversity-multiplexing region in this section. We will first summarize results from [7] characterizing this region, then use this region to optimize the diversity-multiplexing-ARQ tradeoff for distortion under delay constraints.

A. The ARQ Protocol and its Diversity Gain

We assume the same $M \times N$ channel model (2) as before and the following ARQ scheme. Each information message is encoded into a sequence of $L$ blocks, each of size $T$. Transmission commences with the first block, and after decoding the message the receiver sends a positive (ACK) or negative (NACK) acknowledgement back to the transmitter. In the case of a NACK the transmitter sends the next block in the sequence, and the receiver uses all accumulated blocks to try to decode the message. This process proceeds until either the receiver correctly decodes the message or all $L$ blocks have been sent. If a NACK is sent after the transmission of the $L$th block then an error is declared, the message is removed from the system, and the transmitter starts over with the next queued message.

As in [7] we will use the term "round" to describe a single block transmission of length $T$. We will refer to all $L$ rounds associated with the ARQ protocol as an "ARQ block". Hence, each ARQ block consists of up to $L$ rounds, and each round is of size $T$.

The fading coefficients $h_{ij}$ that model the gain from transmit antenna $j$ to receive antenna $i$ are i.i.d. complex Gaussian with unit variance. The channel gain matrix $H$ with elements $H(i, j) = (h_{ij} : i \in \{1, \ldots, N\}, j \in \{1, \ldots, M\})$ is assumed to be known at the receiver and unknown at the transmitter. There are two channel models investigated in [7]: the long-term static model and the short-term static model.
In the long-term static model the channel remains constant over each ARQ block of up to $LT$ symbols, and the fading associated with each ARQ block is i.i.d. In the short-term static model the fading is constant over one ARQ round, then changes to a new i.i.d. value. The long-term model applies to a quasi-static situation such as might be seen in a wireless LAN channel. The short-term model is more dynamic and might correspond to fading associated with a portable mobile device. The ARQ diversity gain is very similar for the two models. In particular, the diversity exponent for the short-term static model is a factor of $L$ larger than for the long-term static model, corresponding to the $L$-fold time diversity in the short-term model. We will use the long-term static model in our analysis and numerical results, since it allows us to focus on the diversity associated with the ARQ rather than time diversity. Our analysis easily extends to the short-term static model by adding the extra factor of $L$ to the ARQ diversity exponent.

Under the long-term static channel model, in round $l \in \{1, \ldots, L\}$ of an ARQ block we can represent the channel as

$$Y_l = \sqrt{\frac{\mathrm{SNR}}{M}} \, H X_l + W, \tag{26}$$

where $X_l \in \mathbb{C}^{M \times T}$ and $Y_l \in \mathbb{C}^{N \times T}$ are the transmitted and received signals in block $l$, respectively. The additive noise $W$ has i.i.d. complex Gaussian entries with unit variance.

With the above model in hand, let us define a family of codes $\{C(\mathrm{SNR})\}$, indexed by the SNR level. Each code has length $LT$ and the bit rate of the first block in each code is $b(\mathrm{SNR})/T$. Suppose we consider a sequence of ARQ blocks. At time $s$ the random variable $B[s] = b(\mathrm{SNR})$ if a message is successfully decoded at the receiver, and $B[s] = 0$ otherwise.
Then, we can define the average throughput of the ARQ protocol using these codes as

$$\eta(\mathrm{SNR}) = \liminf_{\tau \to \infty} \frac{1}{T\tau} \sum_{s=1}^{\tau} B[s], \tag{27}$$

and we can view $\eta(\mathrm{SNR})$ as the average number of transmitted bits per channel use. Further define $P_e(\mathrm{SNR})$ as the average probability of error of the ARQ block (i.e., the probability that a NACK is sent after $L$ transmission rounds). The multiplexing gain of the ARQ protocol is defined in [7] as

$$r = \lim_{\mathrm{SNR} \to \infty} \frac{\eta(\mathrm{SNR})}{\log \mathrm{SNR}}, \tag{28}$$

and the diversity gain as

$$d = -\lim_{\mathrm{SNR} \to \infty} \frac{\log P_e(\mathrm{SNR})}{\log \mathrm{SNR}}. \tag{29}$$

For each $r$ and $L$ we define the optimal diversity gain $d^*(r, L)$ as the supremum of the diversity gain achieved by any scheme. For $L = 1$ (i.e., no ARQ) we have the original diversity-multiplexing tradeoff from Section II. Hence, $d^*(r, 1)$ is the piecewise linear function $d^*(r)$ joining the points $(k, (M - k)(N - k))$ at integer values of $k$ for $0 \leq k \leq \min(M, N)$. For $L > 1$ we have the following result from [7].

Diversity Gain of ARQ: The diversity gain for the ARQ protocol with a maximum of $L$ blocks is

$$d^*(r, L) = d^*\!\left(\frac{r}{L}\right). \tag{30}$$

The diversity gain achieved by ARQ is quite remarkable. According to (30), for any $r < \min(M, N)$ we can achieve the full diversity gain $d = MN$ for sufficiently large $L$. Thus, for $L$ sufficiently large, there is no reason to utilize spatial diversity, since all needed diversity can be obtained through ARQ. For $L$ not sufficiently large, the maximum ARQ window size would still be utilized to minimize the amount of spatial diversity required.

The diversity-multiplexing-ARQ tradeoff (30) is analogous to the Zheng-Tse diversity-multiplexing tradeoff $d^*(r)$. Thus, the same analysis as in Section III can be applied to minimize end-to-end distortion based on the diversity-multiplexing tradeoff $d^*(r, L)$ induced by the ARQ.
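The effect of the ARQ window on (30) can be checked numerically with the sketch below, which evaluates $d^*(r, L) = d^*(r/L)$ and then balances it against the source exponent $pTr/k$ in the same way as without ARQ. The names `arq_diversity` and `balanced_rate_arq` are hypothetical. Returning $r = \min(M, N)$ when the two exponents do not cross on the feasible interval is an assumption of this sketch and is not stated in the text.

```python
def dmt(r, M, N):
    """Piecewise-linear d*(r) joining (j, (M - j)(N - j)) at integer j."""
    m = min(M, N)
    j = min(int(r), m - 1)
    d_left = (M - j) * (N - j)
    d_right = (M - j - 1) * (N - j - 1)
    return d_left + (r - j) * (d_right - d_left)


def arq_diversity(r, L, M, N):
    """ARQ diversity gain from (30): d*(r, L) = d*(r / L)."""
    return dmt(r / L, M, N)


def balanced_rate_arq(M, N, L, p, T, k, tol=1e-12):
    """Rate at which d*(r, L) equals p*T*r/k, searched over [0, min(M, N)]."""
    m = float(min(M, N))

    def gap(r):
        return arq_diversity(r, L, M, N) - p * T * r / k

    if gap(m) >= 0:
        return m        # no crossing on the interval (assumed boundary choice)
    lo, hi = 0.0, m
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for L in (1, 2, 4):
        print("L =", L,
              "d*(1, L) =", arq_diversity(1.0, L, 2, 2),
              "balanced r =", round(balanced_rate_arq(2, 2, L, p=2, T=1, k=1), 4))
```

For a fixed $r$ the diversity gain approaches $MN$ as `L` grows, and the balancing rate moves toward more multiplexing, which is the sense in which a larger ARQ window substitutes for spatial diversity.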
In particular, end-to-end distortion for MIMO channels with asymptotically high SNR and ARQ retransmissions, in the absence of a delay constraint, is minimized using the following procedure:
1) choose the largest ARQ window size $L$ possible,
2) determine the resulting ARQ diversity gain $d^*(r, L)$ from (30),
3) solve (20) for the optimal rate $r^*$ using $d^*(r, L)$ instead of $d^*(r)$.
This procedure not only minimizes end-to-end distortion, but also indicates that separate source and channel coding is optimal, provided the source and channel encoders know $r^*$ and the maximum value of $L$. Moreover, the results in [8] show that the rate penalty for ARQ is negligible in the high SNR regime.

In order to analyze the diversity, multiplexing, and delay tradeoff for delay-sensitive sources we must recognize two important subtleties about the above results. First, in systems that transmit delay-constrained traffic we may not be able to tolerate a long ARQ window (in some cases ARQ may not be tolerated at all). Second, we must carefully consider the impact of asymptotically high SNR, which is crucial in the proofs of the above results. Specifically, in the high SNR regime the occurrence of a NACK in the ARQ protocol becomes a rare event (i.e., the probability of a NACK tends to zero as SNR approaches infinity). Therefore, with probability tending to one, each message is decoded correctly during the first transmission attempt, resulting in a multiplexing gain equivalent to that of a system without ARQ. The increasingly rare errors are corrected by the ARQ process, which results in increased diversity.

The main difficulty in using these asymptotic results to evaluate delay performance is that in the high SNR regime there is essentially no delay due to ARQ. In other words, queuing delays associated with retransmissions are rare in the high SNR regime.
Based on this fact and using standard results from queuing theory, one can show that under stable arrival rates the arriving messages almost always find the system empty. Hence, with high probability an arriving message will immediately begin transmission and suffer no queuing delay.

In wireless systems, errors during a transmission attempt are not rare events. Indeed, most wireless systems typically become reliable only after the application of ARQ. In other words, errors after completion of the ARQ process might be rare events, but errors during the ARQ process are not rare. As we shall see in the next subsection, this subtle difference requires an optimization framework that can model and optimize over the queuing dynamics associated with ARQ.

B. Delay-Distortion Model

This section presents our model for a delay-sensitive system. We do not assume a high SNR regime in our analysis since, as stated in the previous section, this leads to rare ARQ errors and hence effectively removes the ARQ queuing delay. We do assume that the finite SNR is fixed for each problem instance, i.e., we do not optimize power control, although this optimization was investigated in [7] and shown to provide significant diversity gains in the long-term static channel.

We assume the original source data $u$ is a random vector with probability density $h(u)$, which has support on a closed, bounded subset of $\mathbb{R}^k$ with non-empty interior. During each transmission block of length $T$ an instance of $u$ arrives at the system independently with probability $\lambda$ and is queued for transmission. We assume that each message has a deadline $k$ at the receiver. Hence, if a message arrives at time $t$ and is not received by time $t + kT$ then its deadline expires and the message is dropped from the system. We assume that each message is quantized according to the scheme discussed below.
The quantized version of each message is then mapped into a codeword in the codebook $\{C(\mathrm{SNR})\}$ and passed to the MIMO-ARQ transmitter discussed in the previous section. Due to the random message arrival times and the random completion times of the ARQ process we will have queuing and delay in this system. Our goal is to select a diversity gain, multiplexing gain, and ARQ window size to minimize the distortion created by both the quantizer and the messages lost due to channel error or delay.

The intuition behind the diversity-multiplexing-ARQ tradeoff is straightforward. We would like to use as much multiplexing as possible, since this will allow us to use more bits to describe a message and reduce encoder distortion. However, high levels of multiplexing induce more errors in the wireless channel, thereby requiring longer ARQ windows to reduce errors. The longer ARQ windows induce higher delays, which also cause higher distortion due to messages missing their deadlines. We must balance all of these quantities to optimize system performance.

We use the same vector encoder and distortion model from Section III. As before, we assume that the total average distortion $D_\tau(F, \mathrm{SNR})$ can be split into two dependent pieces

$$D_\tau(F, \mathrm{SNR}) = D_s(F) + D_e(d, \mathrm{SNR}), \tag{31}$$

where $D_e(d, \mathrm{SNR})$ is the distortion caused by messages declared in error. Here the errors are incurred whenever the ARQ process fails or when a message's deadline expires. We also assume the distortion due to erroneous messages is bounded by the overall loss probability:

$$D_e(d, \mathrm{SNR}) \leq P_e(\mathrm{SNR}) + P\{\mathrm{Delay} > k\}, \tag{32}$$

where $P\{\mathrm{Delay} > k\}$ is the probability that a message violates its deadline and $P_e(\mathrm{SNR})$ is the probability of error for the ARQ block, which depends on its window size $L$. Our goal is to minimize the total delay-distortion bound

$$D_\tau(F, \mathrm{SNR}) \leq D_s(F) + P_e(\mathrm{SNR}) + P\{\mathrm{Delay} > k\}. \tag{33}$$

In order to optimize (33) we require a formulation that accounts for the different delays experienced by each message. Hence, as described in the next section, we turn to the theory of Markov decision processes to model and solve this problem.

C. Minimizing Distortion via Dynamic Programming

We now develop a dynamic programming optimization framework to minimize (33). We assume without loss of generality that the queue in our system is of maximum size $k$. This is not a restrictive assumption, since each message requires at least one time block of size $T$ for transmission; hence any arriving message that sees more than $k$ messages in the queue will not be able to meet its deadline and could be dropped without affecting our performance analysis.

Note that unlike standard queuing models that only track the number of messages awaiting transmission, we must also track the amount of time a particular message has waited in the queue. In particular, given that one message is queued for transmission, our state space model must differentiate between a message that has just arrived and a message whose deadline is about to expire. Since the queue size is bounded, we can only have a finite number of messages in the queue, and hence the combined message and waiting time model exists in a finite space.

We define the queue process $X_Q = (X_Q(n) : n \geq 0)$, which takes values on a finite space $\mathcal{X}_Q$. Similarly, we define the state of the ARQ process $X_L = (X_L(n) : n \geq 0)$ on a finite space $\mathcal{X}_L$. Here, the state of the ARQ process denotes the number of the current transmission round in the current ARQ block. Finally, we define the overall state of the system as a process $X = (X(n) : n \geq 0)$ such that $X(n) = (X_Q(n), X_L(n))$ (i.e., the space $\mathcal{X}$ is the product space of $\mathcal{X}_Q$ and $\mathcal{X}_L$).
Since the arrival process is geometric and each ARQ round is assumed to be i.i.d., the process $X$ is a finite-state discrete-time Markov chain. The transition dynamics of this Markov chain are governed by the choices of diversity, multiplexing, and the ARQ window size. We assume that at the start of each ARQ block the transmitter chooses the number of bits to assign to the vector encoder and hence the amount of spatial diversity and multiplexing in the codeword selected from $\{C(\mathrm{SNR})\}$. The transmitter also selects the length of the ARQ window. These choices then remain fixed until either the message is received or the ARQ window expires.

Define the space of actions $\mathcal{A}$ as the set of all possible combinations of multiplexing gain and ARQ window length. Note that a choice of multiplexing gain implicitly selects the number of bits given to the source encoder as well as the amount of spatial diversity. We assume that the numbers of antennas $M$ and $N$ are finite and that the ARQ window size is also finite. Hence, the action space $\mathcal{A}$ is a finite set.

We define the control policy $g$ as a probability distribution on the space $\mathcal{X} \times \mathcal{A}$. We can view the elements of $g$ as

$$g(x, a) = P\{\text{action } a \text{ chosen in state } x\}, \quad \forall x \in \mathcal{X},\ a \in \mathcal{A}.$$

For any control $g$, the Markov chain $X$ is irreducible and aperiodic$^2$. Define $Q(g)$ as the transition matrix for $X$ corresponding to control policy $g$. Hence, $Q(g) = (Q_{i,j}(g) : i, j \in \mathcal{X})$ is a stochastic matrix with entries

$$Q_{i,j}(g) = P(X(n+1) = j \mid X(n) = i, g) = \sum_{a \in \mathcal{A}} P(X(n+1) = j \mid X(n) = i, A(n) = a)\, g(i, a).$$

For each state-action pair we define a reward function $r(x, a)$. For the states in $\mathcal{X}$ corresponding to completion of the ARQ process the reward function denotes the distortion incurred in that particular state. Hence,

$$r(x, a) = 2^{-ps/k} + I[\text{ARQ Fails}] + I[\mathrm{Delay} > k]. \tag{34}$$

Let $\mathcal{G}$ be the set of all available control policies. Then for any $g \in \mathcal{G}$ define the limiting average value of $g$ starting from state $x$ as

$$V(x, g) = \limsup_{n \to \infty} \left[ \frac{1}{n+1} \sum_{k=0}^{n} E_{x,g}\left[ r(X(k), g) \right] \right],$$

where $r(X(k), g)$ is the random reward earned at time $k$ under control policy $g$. Since $X$ is an irreducible and aperiodic Markov chain for any control $g$, we know from [2] that the above value function reduces to

$$V(x, g) = \pi(g)\, r(g) \quad \forall x \in \mathcal{X}, \tag{35}$$

where $\pi(g) = \pi(g) Q(g)$ is the stationary distribution of $X$ under control $g$ and $r(g)$ is the column vector of rewards earned for each state $x \in \mathcal{X}$ under control $g$. Hence, the value function is simply the expected value of our reward function $r$ with respect to the stationary distribution of $X$.

Notice that given our definition for $r$ in (34), the value function $V(g)$ provides us with the delay-based distortion (33) caused by control policy $g$. Thus we want to minimize distortion by minimizing the value function $V(g)$. Specifically, our goal is to find a $g \in \mathcal{G}$ that minimizes $V(x, g)$. From [2] we know this problem can be solved through the following linear program:

$$\min_{s} \; \sum_{x \in \mathcal{X}} \sum_{a \in \mathcal{A}} r(x, a)\, s_{xa} \tag{36}$$

subject to:

$$\sum_{x \in \mathcal{X}} \sum_{a \in \mathcal{A}} \left( \delta(x, x') - p(x' \mid x, a) \right) s_{xa} = 0, \quad x' \in \mathcal{X},$$
$$\sum_{x \in \mathcal{X}} \sum_{a \in \mathcal{A}} s_{xa} = 1,$$
$$s_{xa} \geq 0; \quad a \in \mathcal{A},\ x \in \mathcal{X},$$

where $\delta(x, x')$ is the Kronecker delta, $s_{xa}$ is the steady-state probability of being in state $x$ and taking action $a$, and $p(x' \mid x, a)$ is the probability of jumping to state $x'$ given action $a$ in state $x$. The state-action frequencies $s_{xa}$ provide a unique mapping to an optimal control $g^*$ [2].

$^2$ To create a non-irreducible Markov chain we would be required to successfully transmit a packet with probability one.
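The relationship between a policy $g$, its transition matrix $Q(g)$, and the average value $V(g) = \pi(g) r(g)$ in (35) can be illustrated on a toy model. The sketch below is not the queue/ARQ chain of this paper: the two states, two actions, transition probabilities, and per-stage distortions are all hypothetical numbers chosen only for illustration. Brute-force enumeration of deterministic policies stands in for the linear program (36), which is practical only for very small models; all function names are likewise hypothetical.

```python
from itertools import product


def policy_matrix(P, g):
    """Q(g)[i][j] = sum_a P[a][i][j] * g[i][a], as in the definition of Q(g)."""
    n_states, n_actions = len(g), len(P)
    return [[sum(g[i][a] * P[a][i][j] for a in range(n_actions))
             for j in range(n_states)] for i in range(n_states)]


def stationary(Q, iters=2000):
    """Stationary distribution pi = pi Q by power iteration (assumes an
    irreducible, aperiodic chain so that the iteration converges)."""
    n = len(Q)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * Q[i][j] for i in range(n)) for j in range(n)]
    return pi


def average_cost(P, r, g):
    """V(g) = pi(g) r(g): expected per-stage distortion in steady state."""
    pi = stationary(policy_matrix(P, g))
    return sum(pi[x] * g[x][a] * r[x][a]
               for x in range(len(g)) for a in range(len(P)))


def best_deterministic_policy(P, r):
    """Enumerate every deterministic policy and keep the lowest average cost."""
    n_states, n_actions = len(r), len(P)
    best = None
    for choice in product(range(n_actions), repeat=n_states):
        g = [[1.0 if a == choice[x] else 0.0 for a in range(n_actions)]
             for x in range(n_states)]
        cost = average_cost(P, r, g)
        if best is None or cost < best[1]:
            best = (choice, cost)
    return best


# Hypothetical toy model. States: 0 = short queue, 1 = backlogged queue.
# Actions: 0 = favor multiplexing, 1 = favor diversity.
P = [
    [[0.6, 0.4], [0.3, 0.7]],   # P[0][i][j]: multiplexing lets the backlog grow
    [[0.9, 0.1], [0.7, 0.3]],   # P[1][i][j]: diversity clears the queue faster
]
r = [[0.1, 0.3],                # r[x][a]: per-stage distortion (made-up values)
     [1.0, 0.5]]

POLICY, COST = best_deterministic_policy(P, r)

if __name__ == "__main__":
    print("best action per state:", POLICY, "average distortion:", round(COST, 4))
```

In this toy instance the best policy multiplexes when the queue is short and switches to diversity when it is backlogged. That outcome depends entirely on the made-up numbers and should not be read as a result of the paper.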
With this dynamic programming formulation in hand we can solve for the optimal diversity gain, multiplexing gain, and ARQ window size as a function of queue state and deadline sensitivity. We demonstrate the performance of these solutions with a numerical example in the next subsection.

D. Distortion Results

Consider the ARQ system described above with messages arriving in each time block with probability $\lambda = 0.9$. We assume a 4x4 MIMO-ARQ system ($M = N = 4$) with an SNR of 10 dB that utilizes the incremental redundancy codes proposed in [6], which have been shown to achieve the diversity-multiplexing-ARQ tradeoff. For these codes we allow the ARQ window size to take values in a finite set $L \in \{1, \ldots, 4\}$. We also consider the deadline length $k$ ranging over several values ($k \in \{2, \ldots, 8\}$) to examine the impact of delay sensitivity on the solution to our dynamic program (36). For each value of $k$ we solve a new version of (36). The plots below contain the data accumulated by averaging over all of these solutions.

Figure 7 plots the optimal ARQ window length as a function of queue state for different values of $k$. We see that for short deadlines we cannot afford long ARQ windows for any queue state. As the deadlines become more relaxed we can increase the ARQ window size. However, as the queue fills up we are forced to again decrease the amount of ARQ diversity.

Figure 8 plots the optimal multiplexing gain $r$ as a function of queue state for different values of $k$. Here we see that with short deadlines we must use fairly low amounts of spatial multiplexing (i.e., high spatial diversity), since we cannot use ARQ diversity. As the deadlines become more relaxed we can increase the amount of spatial multiplexing and use ARQ for diversity.
Once again, as the queue fills up we must switch back to low levels of multiplexing or, equivalently, high levels of diversity to ensure a lower error probability and hence that fewer retransmissions are needed to clear a given message from the system.

[Figure 7 plot. Title: Optimal ARQ Window Size vs. Queue State and Deadline. Axes: Queue State, Deadline Length, Optimal ARQ Window.]
Fig. 7. Optimal ARQ window size vs. queue state vs. deadline length $k$ (SNR = 10 dB).

[Figure 8 plot. Title: Optimal Multiplexing Rate vs. Deadline Length and Queue State. Axes: Queue State, Deadline Length, Optimal Multiplexing Rate.]
Fig. 8. Optimal multiplexing gain vs. queue state vs. deadline length $k$ (SNR = 10 dB).

We also evaluate the performance advantage gained by adapting the settings of diversity, multiplexing, and ARQ rather than choosing fixed allocations. For $k = 4$ we computed the distortion resulting from all possible fixed allocations of ARQ window length and multiplexing gain. The curved surface in Figure 9 plots the distortion of these fixed allocations for all values of $L$ and $r$. The flat surface in Figure 9 is the distortion achieved by the adaptive scheme (plotted as a reference), which indicates a distortion reduction of up to 70 dB. Even in the most favorable cases, the adaptive scheme outperforms any fixed scheme by more than 50%.

[Figure 9 plot. Title: Total Distortion of Fixed ARQ and Multiplexing Allocations vs. Adaptive Allocation. Axes: Multiplexing rate, Number of ARQ attempts, Total Distortion in dB.]
Fig. 9. Distortion for the fixed allocation problem vs. multiplexing gain vs. ARQ window size (SNR = 10 dB).

VII. SUMMARY

We have investigated the optimal tradeoff between diversity, multiplexing, and delay in MIMO systems to minimize end-to-end distortion under both asymptotic assumptions as well as in practical operating conditions.
We first considered the tradeoff between diversity and multiplexing without a delay constraint. In particular, for the asymptotic regime of high SNR and source dimension, we obtained a closed-form expression for the optimal rate on the Zheng/Tse diversity-multiplexing tradeoff region as a simple function of the source dimension, code blocklength, and distortion norm. We also showed that in this asymptotic regime separate source and channel coding at the optimized rate minimizes end-to-end distortion. However, in contrast to codes designed according to Shannon's separation theorem, the finite blocklength assumption in our setting causes distortion to be introduced by both the source code and the channel code, even though the source encoding rate is below channel capacity. We showed that the same optimization framework can be applied even without an asymptotically large SNR. However, outside this asymptotic regime, closed-form expressions for the optimal diversity-multiplexing tradeoff (and corresponding transmission rate) cannot be found, and convex optimization tools are required to find this optimal operating point. Finally, we developed an optimization framework to minimize end-to-end distortion for a broad class of practical source and channel codes, and applied this framework to a specific example of a video source code and space-time channel code. Our numerical results illustrate quantitatively how the optimal number of antennas used for multiplexing increases with both the source rate and the SNR.

We then extended our analysis to delay-constrained sources and MIMO systems using an ARQ retransmission protocol. ARQ provides additional diversity in the system at the expense of delay. Minimizing end-to-end distortion thus entails finding the optimal operating point on the diversity-multiplexing-delay tradeoff region.
We developed a dynamic programming formulation for this optimization to capture the diversity-multiplexing tradeoffs of the channel as well as the dynamics of random message arrival times and random ARQ block completion times. The dynamic program can be solved using standard techniques, which we applied to a 4x4 MIMO system with different ARQ window sizes and delay constraints. We obtained numerical results indicating the optimal amount of diversity, multiplexing, and ARQ to use as a function of the queue state and message deadline. We also demonstrated that adaptation of the diversity-multiplexing characteristics of the MIMO channel code to the time-varying backlog in the system leads to distortion reduction of up to 70 dB versus a static allocation.

The unconsummated union between information theory and networks has vexed both communities for many years. As pointed out in [10], part of the reason for this disconnect is that source burstiness and end-to-end delay are major components in the study of networks, yet play little role in traditional Shannon theory, where delay is asymptotically infinite and channel capacity inherently assumes a source with infinite data to send. We hope that our work provides one small step towards consummating this union by merging information-theoretic tradeoffs associated with the channel with models and analysis tools from networking to handle source burstiness and system delay. Much work remains to be done in this area by extending our ideas and developing new ones for coupling the fundamental performance limits of general multihop networks with queuing delay, traffic statistics, and end-to-end metric optimization for heterogeneous applications running over these networks.

VIII. ACKNOWLEDGMENTS

We are deeply grateful to the four reviewers for their detailed and insightful comments, which helped to greatly improve the clarity and exposition of the paper. We want to thank Reviewer D in particular for suggesting Figure 2 to illustrate the optimization of the multiplexing rate r∗.

REFERENCES

[1] K. Azarian, H. El Gamal, and P. Schniter, "On the Achievable Diversity-Multiplexing Tradeoff in Half-Duplex Cooperative Channels", IEEE Trans. Inform. Theory, Vol. 51, No. 12, pp. 4152–4172, Dec. 2005.
[2] D. P. Bertsekas, Dynamic Programming and Optimal Control, Boston: Athena Scientific, 1995.
[3] J. Bucklew and G. Wise, "Multidimensional asymptotic quantization theory with rth power distortion measures", IEEE Trans. Inform. Theory, Vol. 28, No. 2, pp. 239–247, Mar. 1982.
[4] T. M. Cover and J. A. Thomas, Elements of Information Theory, New York: John Wiley & Sons, 1991.
[5] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, New York, 1981.
[6] H. El Gamal, G. Caire, and M. O. Damen, "Lattice coding and decoding achieve the optimal diversity-vs-multiplexing tradeoff of MIMO channels", IEEE Trans. Inform. Theory, Vol. 50, No. 6, pp. 968–985, June 2004.
[7] H. El Gamal, G. Caire, and M. O. Damen, "The MIMO ARQ Channel: Diversity-Multiplexing-Delay Tradeoff", IEEE Trans. Inform. Theory, Vol. 52, No. 8, pp. 3601–3621, Aug. 2006.
[8] H. El Gamal and A. R. Hammons Jr., "On the design of algebraic space-time codes for MIMO block fading channels", IEEE Trans. Inform. Theory, Vol. 49, No. 1, pp. 151–163, Jan. 2003.
[9] H. El Gamal and M. O. Damen, "Universal space-time coding", IEEE Trans. Inform. Theory, Vol. 49, No. 5, pp. 1097–1119, May 2003.
[10] A. Ephremides and B. Hajek, "Information theory and communication networks: an unconsummated union", IEEE Trans. Inform. Theory, Vol. 44, No. 6, pp. 2416–2434, Oct. 1998.
[11] A. Gersho, "Asymptotically Optimal Block Quantization", IEEE Trans. Inform. Theory, Vol. 25, No. 7, pp. 373–380, July 1979.
[12] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Boston: Kluwer Academic, 1992.
[13] B. Girod, K. Stuhlmuller, N. Farber, "Trade-Off Between Source and Channel Coding for Video Transmission", Proc. of the IEEE Intl. Conf. Image Proc. (ICIP), Vol. 1, pp. 399–402, Sept. 2000.
[14] D. Gross and C. M. Harris, Fundamentals of Queueing Theory, New York: John Wiley & Sons, 2000.
[15] B. Hochwald, K. Zeger, "Tradeoff Between Source and Channel Coding", IEEE Trans. Inform. Theory, Vol. 43, No. 5, pp. 1412–1424, Sept. 1997.
[16] M. Kuhn, I. Hammerstroem, A. Wittneben, "Linear Scalable Space-Time Codes: Tradeoff Between Spatial Multiplexing and Transmit Diversity", Proc. of SPAWC 2003, pp. 21–25, June 2003.
[17] J. N. Laneman, E. Martinian, G. W. Wornell, J. G. Apostolopoulos, "Source-channel diversity for parallel channels", IEEE Trans. Inform. Theory, Vol. 51, No. 10, pp. 3518–3539, Oct. 2005.
[18] H. F. Lu and P. V. Kumar, "Rate-diversity tradeoff of space-time codes with fixed alphabet and optimal constellations for PSK modulation", IEEE Trans. Inform. Theory, Vol. 49, No. 10, pp. 2747–2751, Oct. 2003.
[19] J. Max, "Quantizing for Minimum Distortion", IEEE Trans. Inform. Theory, Vol. 6, No. 1, pp. 7–12, March 1960.
[20] B. A. Sethuraman, B. S. Rajan, and V. Shashidhar, "Full diversity, high rate, space-time block codes from division algebras", IEEE Trans. Inform. Theory, Vol. 49, No. 10, pp. 2596–2616, Oct. 2003.
[21] K. Stuhlmuller, N. Farber, M. Link, and B. Girod, "Analysis of Video Transmission over Lossy Channels", IEEE J. Select. Areas Commun., Vol. 18, No. 6, pp. 1012–1032, June 2000.
[22] A. Trushkin, "Sufficient conditions for uniqueness of a locally optimal quantizer for a class of convex error weighting functions", IEEE Trans. Inform. Theory, Vol. 28, No. 2, pp. 187–198, Mar. 1982.
[23] D. Tse, P. Viswanath and L. Zheng, "Diversity-Multiplexing Tradeoff in Multiple Access Channels", IEEE Trans. Inform. Theory, Vol. 50, No. 9, pp. 1859–1874, Sept. 2004.
[24] H. Yao and G. W. Wornell, "Achieving the Full MIMO Diversity-Multiplexing Frontier with Rotation-Based Space-Time Codes", Proc. Allerton Conf. Commun., Contr., and Computing, Oct. 2003.
[25] P. Zador, "Asymptotic quantization error of continuous signals and the quantization dimension", IEEE Trans. Inform. Theory, Vol. 28, No. 2, pp. 139–149, Mar. 1982.
[26] K. Zeger and V. Manzella, "Asymptotic bounds on optimal noisy channel quantization via random coding", IEEE Trans. Inform. Theory, Vol. 40, No. 6, pp. 1926–1938, Nov. 1994.
[27] L. Zheng, D. Tse, "Diversity and Multiplexing: the Optimal Tradeoff in Multiple Antenna Channels", IEEE Trans. Inform. Theory, Vol. 49, No. 5, pp. 1073–1096, May 2003.

List of Figures and Captions

• Figure 1: The optimal diversity-multiplexing tradeoff for T ≥ M + N − 1
• Figure 2: The optimal multiplexing rate r∗ to balance source and channel distortion
• Figure 3: Total distortion vs. number of antennas assigned to multiplexing in an 8x8 system (T << k)
• Figure 4: Total distortion vs. number of antennas assigned to multiplexing in an 8x8 system (T k)
• Figure 5: Total distortion vs. number of antennas assigned to multiplexing in an 8x8 system (T >> k)
• Figure 6: Total distortion vs. number of antennas assigned to multiplexing for differing levels of SIR.
• Figure 7: Optimal ARQ window size vs. queue state vs. deadline length k (SNR=10 dB)
• Figure 8: Optimal multiplexing gain vs. queue state vs. deadline length k (SNR=10 dB)
• Figure 9: Distortion for the fixed allocation problem vs. multiplexing gain vs. ARQ window size (SNR=10 dB)

Author Biographies

Holliday: Tim Holliday received the B.S. degree in general engineering from Harvey Mudd College, Claremont, CA, in 1997; the M.S. degree in electrical engineering from Stanford University, Stanford, CA, in 2001; and the Ph.D. degree in management science and engineering from Stanford University in 2004. His industry experience includes summer internships at Lucent Technologies Bell Labs in the summers of 2000 and 2001. He also served as a Communications Officer in the U.S. Air Force Reserve from 1997 through 2004. From 2004 to 2006 he was a Postdoctoral Research Associate at Princeton University, Princeton, NJ. In 2007 he joined Goldman Sachs as an associate. His research interests include stochastic processes and modeling, cross-layer design in wireless communications, and information theory.

Goldsmith: Andrea J. Goldsmith is a professor of Electrical Engineering at Stanford University, and was previously an assistant professor of Electrical Engineering at Caltech. She has also held industry positions at Maxim Technologies and at AT&T Bell Laboratories, and is currently on leave from Stanford as co-founder and CTO of Quantenna Communications, Inc. Her research includes work on capacity of wireless channels and networks, wireless communication and information theory, energy-constrained wireless communications, wireless communications for distributed control, and cross-layer design of wireless networks. She is author of the book "Wireless Communications" and co-author of the book "MIMO Wireless Communications," both published by Cambridge University Press. She received the B.S., M.S. and Ph.D. degrees in Electrical Engineering from U.C. Berkeley. Dr. Goldsmith is a Fellow of the IEEE and of Stanford.
She has received several awards for her research, including the National Academy of Engineering Gilbreth Lectureship, the Alfred P. Sloan Fellowship, the Stanford Terman Fellowship, the National Science Foundation CAREER Development Award, and the Office of Naval Research Young Investigator Award. She was also a co-recipient of the 2005 IEEE Communications Society and Information Theory Society joint paper award. She currently serves as associate editor for the IEEE Transactions on Information Theory and as editor for the Journal on Foundations and Trends in Communications and Information Theory and in Networks. She was previously an editor for the IEEE Transactions on Communications and for the IEEE Wireless Communications Magazine, and has served as guest editor for several IEEE journal and magazine special issues. Dr. Goldsmith is active in committees and conference organization for the IEEE Information Theory and Communications Societies and is an elected member of the Board of Governors for both societies. She is a distinguished lecturer for the IEEE Communications Society, the vice-president and student committee founder of the IEEE Information Theory Society, and was the technical program co-chair for the 2007 IEEE International Symposium on Information Theory.

Poor: H. Vincent Poor (S'72, M'77, SM'82, F'87) received the Ph.D. degree in EECS from Princeton University in 1977. From 1977 until 1990, he was on the faculty of the University of Illinois at Urbana-Champaign. Since 1990 he has been on the faculty at Princeton, where he is the Dean of Engineering and Applied Science, and the Michael Henry Strater University Professor of Electrical Engineering. Dr. Poor's research interests are in the areas of stochastic analysis, statistical signal processing and their applications in wireless networks and related fields.
Among his publications in these areas are the recent book MIMO Wireless Communications (Cambridge University Press, 2007), co-authored with Ezio Biglieri, et al., and the forthcoming book Quickest Detection (Cambridge University Press, 2008), co-authored with Olympia Hadjiliadis. Dr. Poor is a member of the National Academy of Engineering, a Fellow of the American Academy of Arts and Sciences, and a former Guggenheim Fellow. He is also a Fellow of the Institute of Mathematical Statistics, the Optical Society of America, and other organizations. In 1990, he served as President of the IEEE Information Theory Society, and in 2004-07 as the Editor-in-Chief of these Transactions. Recent recognition of his work includes the 2005 IEEE Education Medal, the 2007 IEEE Marconi Prize Paper Award, and the 2007 Technical Achievement Award of the IEEE Signal Processing Society.