Optimal Constellations for the Low SNR Noncoherent MIMO Block Rayleigh Fading Channel

Reliable communication over the discrete-input/continuous-output noncoherent multiple-input multiple-output (MIMO) Rayleigh block fading channel is considered when the signal-to-noise ratio (SNR) per degree of freedom is low. Two key problems are pos…

Authors: Shivratna G. Srinivasan and Mahesh K. Varanasi (paper draft date: November 9, 2018)

Shivratna G. Srinivasan and Mahesh K. Varanasi

This work was supported in part by NSF grants CCF-0434410 and CCF-0431170. It was presented in part at the Allerton Conf. Commun., Cntrl. and Comput., Monticello, IL, Sept. 2006 and the Intnl. Symp. on Inform. Th., Nice, France, June 2007. The authors are with the ECE department of the University of Colorado at Boulder, CO, 80309-0425. This paper was submitted Mar. 9, 2007, revised Mar. 21, 2008 and accepted Aug. 15, 2008.

November 9, 2018 DRAFT

Abstract

Reliable communication over the discrete-input/continuous-output noncoherent multiple-input multiple-output (MIMO) Rayleigh block fading channel is considered when the signal-to-noise ratio (SNR) per degree of freedom is low. Two key problems are posed and solved to obtain the optimum discrete input. In both problems, the average and peak power per space-time slot of the input constellation are constrained. In the first one, the peak power to average power ratio (PPAPR) of the input constellation is held fixed, while in the second problem, the peak power is fixed independently of the average power. In the first PPAPR-constrained problem, the mutual information, which grows as $O(\mathrm{SNR}^2)$, is maximized up to second order in SNR. In the second peak-constrained problem, where the mutual information behaves as $O(\mathrm{SNR})$, the structure of constellations that are optimal up to first order, or equivalently, that minimize energy/bit, is explicitly characterized. Furthermore, among constellations that are first-order optimal, those that maximize the mutual information up to second order, or equivalently, the wideband slope, are characterized.
In both PPAPR-constrained and peak-constrained problems, the optimal constellations are obtained in closed form as solutions to non-convex optimizations, and interestingly, they are found to be identical. Due to its special structure, the common solution is referred to as Space Time Orthogonal Rank one Modulation, or STORM. In both problems, it is seen that STORM provides a sharp characterization of the behavior of noncoherent MIMO capacity.

Key Words: capacity, constellation design, energy/bit, low SNR, MIMO, noncoherent communication, non-convex optimization, peak-to-average power ratio, peak-power, Rayleigh fading, STORM, wideband slope.

I. INTRODUCTION

In this paper, we consider the problem of communicating reliably over a MIMO block Rayleigh fading channel in the low SNR regime. We assume the noncoherent model, wherein neither the transmitter nor the receiver is assumed to have instantaneous channel state information (CSI), while both have knowledge of the channel distribution. In scenarios where the mobile receivers are moving at a high speed or when the number of transmit antennas is large, channel estimation at the receiver might be insufficient due to the small coherence times involved. The problem of the receiver acquiring CSI is further exacerbated in the low SNR regime, where the channel estimates can be unreliable. As a result, the more common assumption of perfect CSI at the receiver, namely that of coherent communications, may not hold true in such cases. A more fundamental rationale for studying the noncoherent model is as follows.
Since in practice the channel is not known to the receiver at the start of communication, an information theoretic formulation of the noncoherent problem (which implicitly accounts for the resources needed for implicit channel estimation without constraining the transmission scheme in any way) is more fundamental than the coherent formulation. Systems that assume coherent transmission, by arguing that the channel can be acquired at the receiver through pilot-symbol assisted transmission to perform explicit channel estimation, are inherently suboptimal in general and do not take into account the resources, namely energy and degrees of freedom, needed for pilot transmissions, as they should.

IEEE TRANS. INFORM. TH.

The study of noncoherent fading channels at low SNR is motivated by their application in wideband (WB) and ultra-wideband (UWB) channels. In such scenarios, the signal power is spread over a large bandwidth, rendering the SNR per degree of freedom low. Transmissions over wideband fading channels experience both time and frequency selectivity. However, within a short window of time or frequency, the channel fading coefficients are known to be highly correlated. One widespread approach to deal with frequency-selectivity is therefore to divide the original wideband channel into several parallel narrowband channels such that each narrowband channel experiences flat fading or a single tap coefficient. To deal with time-selectivity, a common approach is to model each narrowband channel through block fading. In the block fading model, the channel coefficients are assumed fixed for a duration in time, following which they assume independent and identically distributed realizations (here adequate interleaving across time and frequency windows is implicitly assumed).
In this work, we model the wideband channel as a block faded narrowband channel in the low SNR regime. This simplifying channel modeling assumption helps capture the essence of the original wideband channel, and is widely adopted in the analysis of MIMO fading channels.

The study of noncoherent SISO fading channels at low SNR dates back to the 1960s. Two equivalent notions of optimality in the literature that are indicators of energy efficiency in the low SNR regime are (1) the input being first order optimal with respect to Shannon capacity or (2) the input achieving the minimum energy per bit, $\frac{E_b}{N_0}_{\min}$, required for reliable communication. A classical result by Shannon [1] is that in the limit of infinite bandwidth or vanishing SNR, the minimum energy/bit required for reliable communications over an AWGN channel is $-1.59$ dB. Early work by Kennedy [2] and Jacobs [3] (also see Gallager [4] and the references therein) studied wideband SISO Rayleigh fading channels with an average power constrained input and showed that in the limit of infinite bandwidth or vanishing SNR, the required minimum energy/bit is again $-1.59$ dB, the same as that of an AWGN channel. A remarkable observation then was that the minimum energy/bit required is the same whether or not the receiver has knowledge of the channel fading coefficients. Telatar and Tse [5], and Verdu [6] show that the minimum energy/bit is $-1.59$ dB even for fairly general multipath SISO fading channels and general MIMO fading channels, respectively. A common approach adopted to obtain $\frac{E_b}{N_0}_{\min}$ for fading channels is to consider the achievable rate of a certain scheme (often $M$-ary Frequency Shift Keying or MFSK), which is transmitted at arbitrarily low duty cycles (cf. [2, 4, 5]).
The required result is then obtained by either showing that the energy/bit of the scheme at vanishing SNR matches that of the AWGN channel, or by deriving an upper bound on capacity that is tight with respect to the achievable lower bound. However, this approach fixes the input a priori, and therefore no determination can be made as to the necessary conditions for a constellation to achieve the minimum energy/bit. The characterization of the class of signals (more generally, input distributions) that are both necessary and sufficient to achieve the minimum energy/bit had been an important and long standing open problem.

Signals such as arbitrarily low duty-cycled FSK tend to have prohibitively large peak-to-average-power ratios (PAPR) and are consequently difficult to implement in practice. Such signals are therefore referred to as "peaky" signals in the literature. Using certain types of fourth moments of the input as measures of peakiness, Medard and Gallager [7], and Subramanian and Hajek [8] showed that signaling that is not peaky in either time or frequency dimensions cannot achieve the minimum energy/bit as SNR $\to 0$. Verdu [6] formalized this notion further for fairly general noncoherent MIMO fading channels and established that flash signaling, where the input distribution converges to a zero mass and a non-zero mass that is transmitted with vanishing probability as SNR $\to 0$, is both necessary and sufficient to achieve the minimum energy/bit.

While noncoherent communications is sufficient to transmit at the AWGN minimum energy/bit of $-1.59$ dB, the work in [6] resolves another major difficulty. It introduces and explains the crucial role of wideband slope ($S_0$) at large but finite bandwidths.
The wideband slope is a measure of how fast the energy/bit of the optimal scheme approaches the minimum energy/bit, and is synonymous with the notion of second order optimality with respect to Shannon capacity. One main result of [6] is that for noncoherent MIMO channels with an average power constraint, the wideband slope is zero. This result implies that to approach the minimum energy/bit, the bandwidth for reliable noncoherent communications becomes prohibitively large and the associated signaling scheme prohibitively peaky, and therefore no realistic (i.e., bandwidth limited and peak-limited) scheme can achieve the minimum energy/bit.

Hence it was important to pose problems that provide meaningful second-order performance when considering noncoherent fading channels at low SNR. One way was to impose suitable peak-constraints on the input. It is shown in Rao and Hassibi [9] that under certain regularity conditions on the signal, which include making the fourth and sixth moments finite, the noncoherent MIMO capacity grows as $O(\mathrm{SNR}^2)$. Similar expressions for the mutual information up to the second order are obtained in closed form in [10, 11] with different assumptions on the fading matrices and peak-power constraints. Even though such problems have capacity behaving as $O(\mathrm{SNR}^2)$, and hence the minimum energy/bit not occurring at a vanishing SNR, they are important since they involve practical modulation schemes with reasonable PAPR. Schemes designed to satisfy such regularity conditions must be deployed in the vicinity of the SNR where the minimum energy/bit is achieved.

Also relevant is the interesting case of the peak-constraint imposed being independent of the average power constraint, resulting in $O(\mathrm{SNR})$ growth of capacity.
In this case, it will be shown here that the wideband slope is no longer zero (unlike in the problem with only an average power constraint). Therefore, the energy/bit approaches the minimum energy/bit at a non-zero rate as SNR $\to 0$. Gursoy and Verdu [12] consider SISO Rician fast fading channels and impose different peak power constraints in addition to the average power constraint on the input. For certain combinations of peak and average-power constraints, they characterize the $\frac{E_b}{N_0}_{\min}$ and $S_0$ for SISO Rician fast fading channels. For a combination of peak and average power constraints, they show that On-Off Quadrature Phase Shift Keying (OOQPSK) achieves the minimum energy per bit as well as the optimal wideband slope for the noncoherent SISO Rician fast fading channel. This result is obtained in [12] by directly evaluating a second order expansion of mutual information for OOQPSK, and this approach cannot be extended to more general MIMO block fading models. To the best of the authors' knowledge, this is the only input distribution reported in the literature that is both first and second order optimal, in the context of peak-constrained noncoherent communications over fading channels.

Abou-Faycal et al. [13] consider a noncoherent SISO Rayleigh fast fading channel and prove that the capacity achieving distribution is discrete with a finite number of points, one of them being at the origin. In [14], the authors consider a SISO Rician fast fading channel and show that the capacity achieving distribution is discrete even when certain types of peak-constraints are imposed. While there is no formal proof of the discreteness of the optimal input for MIMO Rayleigh fading channels, it is expected to be the case.
Despite these results, discrete input optimization of information theoretic measures is rarely considered since the optimizations encountered are often seen as being analytically intractable. Another compelling reason for considering the problem of maximizing mutual information as a finite dimensional optimization over a discrete and finite cardinality input is that the solution, if obtainable, would offer insights simultaneously into information theoretic as well as coding-modulation aspects. Indeed, even when capacity achieving probability distribution functions are found, the problem of practical transmission would still be unresolved, as it would not be clear how the choice of a quantization of the optimum input would affect performance. Some recent works that deal with discrete signal constellation design using information theoretic criteria, but only under an average power constraint and via numerical optimization techniques, are [15–17]. While the results in [16] provided numerically computable tight lower bounds on the capacity of the noncoherent MIMO channel, the associated constellations may be hard to implement in practice due to their limited analytical structure and lack of strict peak or peak-to-average power ratio constraints.

In this paper, we pose and solve two key problems of obtaining the optimum discrete input of finite cardinality for peak-constrained MIMO noncoherent block Rayleigh fading channels in closed form. Given the results of [13, 14], it is expected that there will be no loss in optimizing over discrete inputs as opposed to input distribution functions. In both problems, we assume average power constraints on the input.
In addition, we also assume natural peak constraints per antenna and per time slot, which closely emulate constraints on power amplifiers, instead of the fourth and higher order moment constraints on the input used in [7–9]. In the first problem, the peak power to average power ratio (PPAPR) of the input constellation is held fixed, while in the second, the peak power is fixed independently of the average power. We refer to these two problems as the PPAPR-constrained and peak-constrained cases, respectively. We show that, interestingly, in the case of the noncoherent MIMO Rayleigh fading channel at low SNR, such joint optimizations of information theoretic metrics over complex signal matrices and their respective probabilities are indeed analytically tractable and result in elegant closed form solutions.

In the PPAPR-constrained case, it can be shown that the input satisfies certain regularity conditions specified in [9]. For such inputs, the mutual information is obtained up to second order in [9] and shown to grow as $O(\mathrm{SNR}^2)$. In one of the key contributions here, we maximize this second order mutual information jointly over the matrix-valued elements of a finite input constellation and their probabilities, when the cardinality of the constellation is no greater than $T+1$, where $T$ is the channel coherence blocklength.

In the peak-constrained case, the mutual information behaves as $O(\mathrm{SNR})$. Here, we explicitly characterize the structure of constellations of any finite cardinality that are optimal up to first order, or equivalently, that minimize energy/bit or maximize capacity per unit energy. More importantly, among constellations of cardinality no greater than $T+1$ that are first-order optimal, those that maximize the mutual information up to second order, or equivalently, the wideband slope, are characterized.
In both the PPAPR-constrained and peak-constrained problems, the optimal solutions to finite dimensional non-convex optimizations are obtained in closed form. Moreover, the solutions are established to be both necessary and sufficient to optimize their respective information theoretic metrics. Interestingly, the solutions to both the PPAPR-constrained and peak-constrained problems are found to be identical. Due to its special structure, we refer to the common solution as Space Time Orthogonal Rank one Modulation, or STORM.

Moreover, in the PPAPR-constrained case, STORM (with cardinality $T+1$) is shown to be near-optimal even among constellations of unconstrained cardinality, even for modest values of $T$ and PAPR. Hence, there is not much to be gained by using more than $T+1$ points in this case. In the peak-constrained case, we first obtain necessary and sufficient conditions for a constellation of any finite cardinality to achieve the minimum energy/bit. Among all such constellations, when the cardinality is no greater than $T+1$, STORM is established as being both first and second order optimal.

Our approach provides a far more detailed characterization of the first and second order behavior of noncoherent MIMO capacity than in existing literature. Specifically, we show that when the peak power is less than a certain threshold, it is possible to have a wideband slope that is non-zero, and obtain the maximum wideband slope achievable by a $T+1$ point p.m.f. Moreover, the energy/bit and the wideband slope achieved by STORM reveal a fundamental energy-vs-bandwidth efficiency tradeoff that enables the determination of the operating (low) SNR and peak power most suitable for a given application.
It also follows from our analysis and optimization that while the conventional MIMO On-Off Keying (OOK) also achieves the minimum energy per bit, STORM has a wideband slope that is $T$ times greater, which translates into an increase in bandwidth efficiency (or a decrease in the PAPR) by a factor of $T$ in the wideband regime. Given typical values of the coherence blocklength $T$, these gains are potentially huge. Our results and conclusions also temper the conclusions of [6], obtained under only the average power constraint, regarding noncoherent communications over fading channels. Among the several new insights that STORM provides on communications in the low SNR regime, one that runs contrary to conventional wisdom is that, under the practical constraints considered in this work, it helps to use all available transmit antennas, not just one, to transmit linearly dependent signals across them in the low SNR regime.

Note that in this work, the input distribution is not a priori assumed or restricted as it is in most prior work. STORM is obtained through novel techniques involving non-convex optimization of information theoretic measures. Consequently, our approach provides necessary and sufficient conditions for a constellation to be optimal for the noncoherent MIMO Rayleigh fading channel, resolving a long-standing open problem. Low duty cycled $M$-ary FSK (MFSK) [2, 4, 5], which is often proposed to achieve first order optimality in a SISO channel, is seen to be closely related to a special case of STORM. However, the zero symbol in STORM is information bearing, which is not the case in low duty cycled MFSK. This can make STORM have higher achievable rates, especially in the PPAPR-constrained case. Moreover, in this work, we specify a class of STORM constellations.
One subtle insight afforded through different STORM constellations is that optimal signal constellations need not be peaky in the frequency dimension (as in low duty-cycled MFSK), in addition to being peaky in the time dimension. In the process, we discover a new optimal SISO constellation which may be called "Permuted MFSK" due to its relation to MFSK but would have better spectral properties in general.

To close this section, some notational conventions used throughout the paper are described. Matrices are denoted by boldfaced capital letters, and vectors by boldfaced small letters. The symbol $\otimes$ denotes the Kronecker product. The matrices $\mathbf{X}^T$, $\bar{\mathbf{X}}$ and $\mathbf{X}^*$ denote the transpose, complex conjugate, and conjugate transpose of $\mathbf{X}$, respectively. Moreover, $\mathrm{tr}(\mathbf{X})$ and $|\mathbf{X}|$ denote the trace and determinant of the matrix $\mathbf{X}$. The notation $[\mathbf{X}]_{ij}$ refers to the $(i,j)$th element of the matrix $\mathbf{X}$. The notation $\mathbf{X}(m)$ refers to the $m$th row of the matrix $\mathbf{X}$. For an integer $N$, $\mathbf{I}_N$ is an $N \times N$ identity matrix and $\mathbf{1}_N$ is the $N$ length column vector of ones. The block diagonal matrix with matrices $\mathbf{A}_1, \ldots, \mathbf{A}_N$ along the block diagonal and zeros elsewhere is denoted as $\mathrm{blockdiag}(\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_N)$. $\mathrm{E}[\cdot]$ denotes the expectation operator. A function $f(\rho)$ is said to behave as $o(\rho)$ when $\lim_{\rho \to 0} \frac{f(\rho)}{\rho} = 0$. The symbol $\mathcal{X}^C$ is used to denote the complement of the set $\mathcal{X}$. The symbol $\preceq$ is used to denote generalized inequality, i.e., if $\mathbf{A} \preceq \mathbf{B}$ then $\mathbf{B} - \mathbf{A}$ is positive semidefinite (psd). The first and second derivatives of a function $f(x)$ at $x = c$ are denoted by $\dot{f}(c)$ and $\ddot{f}(c)$, respectively. The function $\log(\cdot)$ always refers to the natural logarithm, unless otherwise specified. Complex, circularly symmetric, Gaussian random vectors with mean $\mathbf{m}$ and covariance matrix $\mathbf{Q}$ are said to be $\mathcal{CN}(\mathbf{m}, \mathbf{Q})$ distributed.

II. SYSTEM MODEL

Consider a MIMO channel with $N_t$ transmit and $N_r$ receive antennas. The random channel matrix $\mathbf{H} \in \mathbb{C}^{N_t \times N_r}$ is assumed to be constant for a duration of $T$ symbols, after which it changes to an independent value. It has independent, identically distributed (i.i.d.) $\mathcal{CN}(0, 1)$ entries. The distribution of $\mathbf{H}$ is known to the transmitter and receiver. The realizations of $\mathbf{H}$, however, are unknown at both ends. With the transmitted symbol denoted as $\mathbf{X} \in \mathbb{C}^{T \times N_t}$, the output of the channel can be written as

$$\mathbf{Y} = \mathbf{X}\mathbf{H} + \mathbf{N}. \tag{1}$$

The entries of the additive noise matrix $\mathbf{N}$ are assumed to be i.i.d. $\mathcal{CN}(0, 1)$ distributed random variables. The symbol $\mathbf{X}$ is drawn from a finite constellation or alphabet $\mathcal{C}$ with matrix-valued elements.

Two key cases based on the types of constraints imposed are considered in this work.

(i) PPAPR-constrained case: It is assumed that the average SNR at each receive antenna is constrained to be at most $P$, so that

$$\frac{1}{T}\,\mathrm{E}[\mathrm{tr}(\mathbf{X}\mathbf{X}^*)] \le P. \tag{2}$$

Moreover, a peak-power constraint is imposed per space-time slot, namely,

$$\|\mathbf{X}\|_\infty \triangleq \max_{i,j} |[\mathbf{X}]_{ij}| \le \sqrt{K}, \quad \forall\, \mathbf{X} \in \mathcal{C}. \tag{3}$$

This is the most natural and practically meaningful peak-power constraint, as it restricts the peak power per antenna and per time slot (to be at most $K$). It accurately models constraints on individual transmit RF power amplifiers in practice. The PPAPR constraint is that the ratio $\frac{K}{P}$ is taken to be a fixed constant. This condition ensures that as the average SNR $P \to 0$, the maximum peak power also goes to zero.

(ii) Peak-constrained case: Here, the average power constraint (2) and the peak-power constraint (3) are assumed to hold. In this case however, $K$ is assumed to be a fixed constant independent of $P$.
In other words, in contrast to the PPAPR-constrained case, the peak power remains constrained by $K$ (and does not change) as the average SNR $P \to 0$.

For convenience, we will denote the average energy per block of $T$ symbols as $E = PT$.

The noncoherent MIMO Rayleigh fading channel thus described is completely specified by the input constraints and the transition probability density function (p.d.f.) of $\mathbf{Y}$ conditioned on $\mathbf{X}$ being transmitted, which is easily seen to be

$$p(\mathbf{Y}|\mathbf{X}) = \frac{\exp\left\{-\mathrm{tr}\left(\mathbf{Y}^*(\mathbf{I}_T + \mathbf{X}\mathbf{X}^*)^{-1}\mathbf{Y}\right)\right\}}{\pi^{T N_r}\,|\mathbf{I}_T + \mathbf{X}\mathbf{X}^*|^{N_r}}.$$

Finally, there will also be occasion to use the notion of the peak-to-average power ratio (PAPR) of a constellation $\mathcal{C}$, which is defined as

$$\max_{m,n} \frac{\max_{\mathbf{X} \in \mathcal{C}} |[\mathbf{X}]_{mn}|^2}{\mathrm{E}\left\{|[\mathbf{X}]_{mn}|^2\right\}}. \tag{4}$$

III. MAXIMIZING THE MUTUAL INFORMATION AT LOW SNR UNDER THE PPAPR CONSTRAINT

Consider the above-defined finite input and continuous output noncoherent MIMO Rayleigh fading channel over which the input constellation $\{\mathbf{X}_i\}_{i=1}^{L}$ is used with corresponding transmission probabilities $\{P_i\}_{i=1}^{L}$. The mutual information between the transmitted and received signals, normalized by the block length $T$ (in units of nats/dimension), is thus given as

$$I(\mathbf{X};\mathbf{Y}) = \frac{1}{T}\sum_i P_i \int_{\mathbf{Y}} p(\mathbf{Y}|\mathbf{X}_i) \log\left(\frac{p(\mathbf{Y}|\mathbf{X}_i)}{p(\mathbf{Y})}\right) d\mathbf{Y}. \tag{5}$$

A closed form expression for $I(\mathbf{X};\mathbf{Y})$ is unfortunately not known for general SNR. At asymptotically low SNR however, and when the input signal satisfies certain regularity conditions to avoid inputs being prohibitively peaky, the authors in [9] show that the mutual information is zero up to first order for the continuous input and continuous output counterpart of the above channel.
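As a numerical aside on the system model, the channel (1) and the transition density stated above can be sketched in a few lines. This is only an illustrative sketch: the helper names `crandn`, `simulate_block` and `log_likelihood` are hypothetical and do not come from the paper, and NumPy is assumed as the numerical library.

```python
import numpy as np

def crandn(rng, shape):
    """i.i.d. CN(0, 1) samples: real and imaginary parts each have variance 1/2."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def simulate_block(X, n_rx, rng):
    """One coherence block of Y = X H + N, with X of size T x N_t (hypothetical helper)."""
    T, n_tx = X.shape
    H = crandn(rng, (n_tx, n_rx))   # channel, unknown to both ends
    N = crandn(rng, (T, n_rx))      # additive noise
    return X @ H + N

def log_likelihood(Y, X):
    """log p(Y | X) for the noncoherent block Rayleigh fading density."""
    T, n_rx = Y.shape
    S = np.eye(T) + X @ X.conj().T                  # I_T + X X^*
    _, logdet = np.linalg.slogdet(S)                # log |I_T + X X^*|
    quad = np.trace(Y.conj().T @ np.linalg.solve(S, Y)).real
    return -quad - n_rx * logdet - T * n_rx * np.log(np.pi)

rng = np.random.default_rng(0)
X = crandn(rng, (4, 2))             # T = 4, N_t = 2 (arbitrary illustrative symbol)
Y = simulate_block(X, 3, rng)       # N_r = 3
print(log_likelihood(Y, X))
```

Because the density depends on $\mathbf{X}$ only through $\mathbf{X}\mathbf{X}^*$, replacing $\mathbf{X}$ by $\mathbf{X}\mathbf{U}$ for any $N_t \times N_t$ unitary $\mathbf{U}$ leaves `log_likelihood` unchanged, which is one way to see that the receiver cannot distinguish such inputs.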
Moreover, the mutual information up to the second order in $P$ is also obtained in closed form through a Taylor series expansion and without any assumption on the signal structure beyond the regularity conditions. Note that the expression for mutual information up to second order was also derived earlier in [10] and [11], but with more stringent conditions on the input distribution. For the sake of completeness, the key theorem in [9] for the continuous input and continuous output channel, slightly modified to account for the different power normalizations in this paper, is stated next.

Theorem 1: [9, Theorem 1] Let $p(\mathbf{Y})$ denote the p.d.f. of $\mathbf{Y}$.

1. First order result: If (i) $\frac{\partial p(\mathbf{Y})}{\partial P}$ exists at $P = 0$, and (ii) $\lim_{P \to 0} \frac{\mathrm{E}[\mathrm{tr}\{(\mathbf{X}\mathbf{X}^*)^2\}]}{P} = 0$, the mutual information between the transmitted and received signals $\mathbf{X}$ and $\mathbf{Y}$ is zero to first order in $P$, i.e., $I(\mathbf{X};\mathbf{Y}) = o(P)$.

2. Second order result: If, in addition, (i) $\frac{\partial^2 p(\mathbf{Y})}{\partial P^2}$ exists at $P = 0$, (ii) $\mathrm{E}[\mathrm{tr}\{(\mathbf{X}\mathbf{X}^*)^2\}] < \infty$ and (iii) $\lim_{P \to 0} \frac{\mathrm{E}[\mathrm{tr}\{(\mathbf{X}\mathbf{X}^*)^3\}]}{P^2} = 0$, then the mutual information between $\mathbf{X}$ and $\mathbf{Y}$ up to second order in $P$ is given by

$$I(\mathbf{X};\mathbf{Y}) = \frac{N_r}{2T}\,\mathrm{tr}\left\{\mathrm{E}[(\mathbf{X}\mathbf{X}^*)^2] - (\mathrm{E}[\mathbf{X}\mathbf{X}^*])^2\right\} + o(P^2). \tag{6}$$

The applicability of the above result to the discrete input channel with the PPAPR constraint is next discussed. Firstly, following the proof of the above theorem in [9], it can be seen to hold for the discrete input (and continuous output) case and yield the same expression as in (6) for mutual information, with the expectations in (6) now over the discrete instead of the continuous input as in [9]. The existence of the first and second derivatives of $p(\mathbf{Y})$ at $P = 0$ is easily verified for the problem at hand. With the PPAPR constraint in effect, the peakiness conditions, namely conditions 1.ii, 2.ii and 2.iii of Theorem 1, are also easily verified to hold.
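One way to check the three peakiness conditions is sketched below. It uses only the per-entry peak constraint (3) with $K = \zeta P$ for a fixed ratio $\zeta$, and the fact that $\mathbf{X}\mathbf{X}^*$ is positive semidefinite; it is an illustrative bound and not necessarily the argument intended by the authors or used in [9].

```latex
% Every entry of X obeys |[X]_{mn}|^2 <= K = \zeta P, so
\mathrm{tr}(\mathbf{X}\mathbf{X}^*) = \sum_{m,n} |[\mathbf{X}]_{mn}|^2 \le K N_t T = \zeta N_t T \, P .
% XX^* is psd, hence tr{(XX^*)^k} <= (tr(XX^*))^k for k = 2, 3, which gives
\frac{\mathrm{E}[\mathrm{tr}\{(\mathbf{X}\mathbf{X}^*)^2\}]}{P} \le (\zeta N_t T)^2 P \to 0 ,
\qquad
\mathrm{E}[\mathrm{tr}\{(\mathbf{X}\mathbf{X}^*)^2\}] \le (\zeta N_t T)^2 P^2 < \infty ,
\qquad
\frac{\mathrm{E}[\mathrm{tr}\{(\mathbf{X}\mathbf{X}^*)^3\}]}{P^2} \le (\zeta N_t T)^3 P \to 0
\quad \text{as } P \to 0 .
```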
Hence, it can be concluded that for a discrete input satisfying the PPAPR constraint (i) the mutual information is zero up to first order in $P$ and (ii) denoting the coefficient of $P^2$ in the mutual information $I(\mathbf{X};\mathbf{Y})$ of (5) as $I_{\mathrm{low}}$, we have

$$I_{\mathrm{low}} \triangleq \lim_{P \to 0} \frac{I(\mathbf{X};\mathbf{Y})}{P^2} = \lim_{P \to 0} \frac{N_r}{2P^2T}\,\mathrm{tr}\left\{\mathrm{E}[(\mathbf{X}\mathbf{X}^*)^2] - (\mathrm{E}[\mathbf{X}\mathbf{X}^*])^2\right\}. \tag{7}$$

Evidently, the dominant second order term in the mutual information at low SNR is $I_{\mathrm{low}}P^2$. The problem of interest is hence to maximize $I_{\mathrm{low}}$ over $\{\mathbf{X}_i\}_{i=1}^{L}$ and $\{P_i\}_{i=1}^{L}$ under an average power constraint $\sum_i P_i\,\mathrm{tr}(\mathbf{X}_i\mathbf{X}_i^*) \le E$ and a peak power constraint $\|\mathbf{X}_i\|_\infty = \max_{m,n} |[\mathbf{X}_i]_{mn}| \le \sqrt{K}$ $\forall\, i$.

Before unveiling the solution to the above problem, we note that in [9] the mutual information up to second order is maximized over continuous input distributions under two different peak power constraints. The solutions however rely on the assumption that the input signal has the form

$$\mathbf{S} = \mathbf{\Phi}\bar{\mathbf{V}}, \tag{8}$$

where $\mathbf{\Phi}$ is an isotropically distributed unitary random matrix and $\bar{\mathbf{V}}$ is a diagonal (random) matrix with non-negative entries. While this imposition entails no loss of optimality for the case when only the average power is constrained (which is a seminal result of [18]), it does result in a loss of optimality, and a significant one at that, when the peak-power constraint of [9] is enforced, which is that the diagonal entries satisfy $|\bar{V}_i|^2 \le K$. Due to the suboptimal restriction in (8), the maximizations in [9] lead to the misleading conclusion that it is optimal to use a single transmit antenna in the low SNR regime. In [11] also, the authors perform the same maximization over continuous input distributions but under a more relaxed peak-constraint $\mathrm{tr}(\mathbf{X}\mathbf{X}^*) \le \epsilon$ and conclude that a single antenna should be used.
Different from [9] and [11], the optimization problem considered here does not sub-optimally restrict the signals to be as in (8), while considering average power constrained discrete inputs and the practically relevant peak-power constraint per space-time slot. These assumptions result in a significantly different and more challenging problem than those considered in [9] or [11]. Indeed, in contrast to [9] or [11], our results indicate that in the PPAPR-constrained problem, at sufficiently low SNR, it actually helps to use all transmit antennas.

For the PPAPR-constrained problem, the set of all feasible constellations with cardinality $L$ is denoted as $\mathcal{S}_L$ and can be described as

$$\mathcal{S}_L = \left\{ (\mathbf{X}_i, P_i)_{i=1}^{L} : P_i \ge 0,\ \mathbf{X}_i \in \mathbb{C}^{T \times N_t},\ \sum_{i=1}^{L} P_i = 1,\ \sum_{i=1}^{L} P_i\,\mathrm{tr}(\mathbf{X}_i\mathbf{X}_i^*) \le E,\ \|\mathbf{X}_i\|_\infty \le \sqrt{K},\ \forall\, i \right\}.$$

It is assumed, without loss of generality, that $K N_t T \ge E$, because otherwise the average power constraint cannot be active and one can therefore solve the problem by changing the average power constraint to $E' = K N_t T$. Let the PPAPR be denoted as $\zeta = \frac{K}{P}$, a constant in the PPAPR-constrained case as $P$ varies.

Let $I^*_{\mathrm{low},L}$ be the maximum mutual information up to second order achievable by any constellation in the set $\mathcal{S}_L$, so that

$$I^*_{\mathrm{low},L} = \max_{(\mathbf{X}_i, P_i)_{i=1}^{L} \in \mathcal{S}_L} I_{\mathrm{low}}. \tag{9}$$

Note that when $P_i = 0$, the symbol $\mathbf{X}_i$ is not used and therefore the set of feasible constellations in $\mathcal{S}_{L'}$ is included in the set $\mathcal{S}_L$ for any $L' < L$. Hence, $I^*_{\mathrm{low},L}$ is the maximum mutual information up to second order achievable by any constellation of cardinality no greater than $L$. The maximum mutual information up to second order when there is no upper limit on the cardinality of the discrete input constellation is defined as $I^*_{\mathrm{low}} = \lim_{L \to \infty} I^*_{\mathrm{low},L}$.
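To make the optimization problem concrete, the following sketch evaluates the second order term of (7) at a given $P$ and tests membership in the feasible set $\mathcal{S}_L$. The function names `second_order_term` and `in_feasible_set` and the tolerance `tol` are hypothetical choices made for illustration, and the on-off example at the end is an assumed test case rather than one taken from the paper.

```python
import numpy as np

def second_order_term(Xs, probs, n_rx, P):
    """N_r / (2 P^2 T) * tr{E[(XX^*)^2] - (E[XX^*])^2} at a given P.

    I_low in (7) is the limit of this quantity as P -> 0 (hypothetical helper).
    """
    T = Xs[0].shape[0]
    grams = [X @ X.conj().T for X in Xs]                      # X_i X_i^*
    mean_gram = sum(p * G for p, G in zip(probs, grams))      # E[XX^*]
    mean_sq = sum(p * (G @ G) for p, G in zip(probs, grams))  # E[(XX^*)^2]
    val = np.trace(mean_sq - mean_gram @ mean_gram).real
    return n_rx * val / (2 * P**2 * T)

def in_feasible_set(Xs, probs, E, K, tol=1e-9):
    """Check the constraints defining S_L, up to a numerical tolerance."""
    probs = np.asarray(probs, dtype=float)
    ok_probs = np.all(probs >= -tol) and abs(probs.sum() - 1.0) <= tol
    avg_energy = sum(p * np.linalg.norm(X) ** 2 for p, X in zip(probs, Xs))
    ok_avg = avg_energy <= E + tol
    ok_peak = all(np.max(np.abs(X)) <= np.sqrt(K) + tol for X in Xs)
    return bool(ok_probs and ok_avg and ok_peak)

# Assumed example: two-point on-off input with N_t = 1, T = 2, zeta = K / P = 4.
P, zeta, n_rx, T = 0.01, 4.0, 2, 2
K = zeta * P
Xs = [np.sqrt(K) * np.ones((T, 1)), np.zeros((T, 1))]
probs = [1 / zeta, 1 - 1 / zeta]
print(in_feasible_set(Xs, probs, E=P * T, K=K))
print(second_order_term(Xs, probs, n_rx, P))
```

For this particular input the term can be worked out by hand as $N_r(\zeta - 1)$, which gives a quick sanity check on the implementation.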
It will be shown in what follows that $I^*_{\text{low},L}\big|_{L=T+1}$ (and its associated constellation of size $T+1$) is near-optimal in that it can be very close to $I^*_{\text{low}}$ (and the as yet unknown constellation which achieves the latter). The following theorem is one of the main results in this paper.

Theorem 2: (PPAPR-constrained case) Let the coherence time $T \ge 2$. When $L \le T+1$, the maximum second-order mutual information with an $L$-point input constellation is given as
\[ I^*_{\text{low},L} = \frac{N_t N_r T}{2}\left[\zeta - \frac{1}{(L-1)N_t}\right]. \tag{10} \]
An $L$-point constellation (or p.m.f.) achieves $I^*_{\text{low},L}$ with $L \le T+1$ if and only if (iff) it is of the following form:
\[ (\mathbf{X}_i, P_i) = \left(\sqrt{K}\,\mathbf{v}_i\mathbf{w}_i^*,\ \frac{E}{(L-1)K N_t T}\right), \quad 1 \le i \le L-1, \tag{11} \]
\[ (\mathbf{X}_L, P_L) = \left(\mathbf{0}_{T \times N_t},\ 1 - \frac{E}{K N_t T}\right), \tag{12} \]
where for each $i$, $\mathbf{v}_i \in \mathbb{C}^{T \times 1}$ is the $i$th column of a unitary matrix $\mathbf{V}$, $\mathbf{w}_i \in \mathbb{C}^{N_t \times 1}$, and
\[ \left|[\mathbf{v}_i\mathbf{w}_i^*]_{mn}\right| = 1, \quad \forall\, i, m, n. \tag{13} \]
Furthermore, $I^*_{\text{low}}$, the maximum second-order mutual information with an unconstrained cardinality, is bounded above and below as
\[ I^*_{\text{low},L}\big|_{L=T+1} = \frac{N_r}{2}\left(\zeta N_t T - 1\right) \le I^*_{\text{low}} < \frac{N_r}{2}\,\zeta N_t T. \tag{14} \]
Proof: The proof is given in Section III-B.

The optimal signal constellation for $L = T+1$ given in Theorem 2 can be viewed as a space-time code (employing unequal transmission probabilities) that achieves the maximum mutual information up to second order at low SNR. Based on its structure, it is referred to as Space Time Orthogonal Rank one Modulation (STORM), because each non-zero matrix is of unit rank and is orthogonal to the other constellation matrices by construction. Two examples of matrices that can be used for the unitary matrix $\mathbf{V}$ are the Discrete Fourier Transform (DFT) matrix and the Hadamard matrix (when it exists). In one embodiment of STORM, $\mathbf{w}_i$ can be chosen to be $\mathbf{1}_{N_t}$ for all $i$.
In this case, each of the $L-1 = T$ non-zero constellation points is formed from a column of $\mathbf{V}$, and this column is repeated over the $N_t$ antennas. The $L$th point is of course the all-zero matrix. It can be seen that for STORM, the PAPR as defined in (4) is $K N_t / P = \zeta N_t \ge 1$.

Clearly, the ratio between the lower and upper bounds on $I^*_{\text{low}}$ in (14) is nearly equal to unity when $\zeta N_t T \gg 1$. This is evidently true even for moderate and practical values of PAPR and $T$. As an example, for $\zeta = 2$, $N_t = 2$ and $T = 4$, the ratio is $15/16 \approx 0.94$. Hence, even for moderate values of PAPR and $T$, the $(T+1)$-point STORM almost achieves $I^*_{\text{low}}$ (the limit with unconstrained cardinality), and there is not much to be gained by using more than $T+1$ points.

A. Remarks

Since STORM achieves a significant fraction of $I^*_{\text{low}}$ even for moderate values of $\zeta$ and $T$, the following insights from its structure and from the mutual information up to second order that it achieves at low SNR are of interest. For brevity, the mutual information up to second order at low SNR is simply referred to as mutual information in the rest of this section.

1. It can be seen that the mutual information of STORM increases linearly with the maximum peak power $K$. That it is an increasing function is to be expected, since peaky signaling is known to achieve the noncoherent capacity in the low SNR regime when there is only an average power constraint. Moreover, the mutual information also increases linearly with the product $N_t N_r$ of the numbers of transmit and receive antennas. The use of a single antenna is evidently suboptimal by a factor of $N_t$.

2. A reason that is often cited in the literature for explaining the efficacy of using a single antenna at low SNR is that the number of channel parameters that are to be implicitly estimated is the least in this case.
The use of a single antenna however is not necessary to ensure this and can even be detrimental to performance, as explained above. Consider STORM, where the received signal when the $i$th non-zero signal is transmitted is
\[ \mathbf{Y} = \sqrt{K}\,\mathbf{v}_i\mathbf{w}_i^*\mathbf{H} + \mathbf{N} = \sqrt{K}\,\mathbf{v}_i\mathbf{h}^T + \mathbf{N}, \tag{15} \]
where $\mathbf{h}^T = \mathbf{w}_i^*\mathbf{H}$, so that $\mathbf{h}$ is $\mathcal{CN}(\mathbf{0}, N_t\mathbf{I}_{N_r})$ distributed. Therefore, the effective channel (15) does in fact involve only $N_r$ (and not $N_t N_r$) unknown channel coefficients even though all transmit antennas are used. The optimality of the unit-rank structure of STORM could thus indeed be attributed to the difficulty of (implicit) estimation of $N_t N_r$ coefficients at low SNR, because it avoids this task by focusing the power on just $N_r$ effective unknown path gains, while at the same time making use of all the transmit antennas.

3. Consider the case when $\mathbf{w}_i = \mathbf{1}_{N_t}$ for all $i$ in (11), which is sufficient for the $(T+1)$-point STORM to be optimal. Then the symbols sent by all transmit antennas at any given time are identical, and the fading gains effectively add up at each receive antenna. So, why not just use a single transmit antenna? All transmit antennas must be used because otherwise the effective received power is smaller, due to the peak-power constraint which limits the symbol power per antenna and per time slot.

4. A canonical embodiment of STORM is one that results from setting $\mathbf{w}_i = \mathbf{1}_{N_t}$ for all $i$ and $\mathbf{V} = [\mathbf{v}_1 \cdots \mathbf{v}_T]$ to be a $T$-dimensional DFT matrix in (11). A convenient feature of this DFT version of STORM is that the entries of the signal matrices can be transmitted using PSK symbols with an additional zero point. Alternatively, a $T$-dimensional Hadamard matrix can be used for $\mathbf{V}$ (when it exists). The advantage of using a Hadamard matrix is that it is enough to transmit real symbols for each entry, specifically, BPSK and an additional zero point.
Hadamard matrices of dimension $T$ exist when $T = 2^n$ for any natural number $n$ and also for many multiples of 4. In Appendix-B, we show how block decoding of STORM may be simplified using either the Fast Fourier Transform (FFT) or the Fast Hadamard Transform (FHT), when $L-1$ is a power of 2.

5. Consider the special case when there is only a peak constraint on the input (i.e., $K N_t T = E$). Here, it can be seen that STORM has no zero point (so $L = T$) and is given by
\[ (\mathbf{X}_i, P_i) = \left(\sqrt{K}\,\mathbf{v}_i\mathbf{w}_i^*,\ \frac{1}{L}\right), \quad 1 \le i \le L. \tag{16} \]
Hence, all points are equiprobable and the PAPR is unity, thus facilitating practical implementation. Moreover, this constellation is near-optimal when there is only a peak constraint and when $T \gg 1$, as seen from the bounds on $I^*_{\text{low}}$ in (14) of Theorem 2.

6. The canonical version of STORM can be seen as a form of generalized $(T+1)$-ary ON-OFF signaling with repetition coding across the transmit antennas and with unequal probabilities of ON and OFF signaling, with the ON signaling actually being the classical $T$-ary, equiprobable Frequency Shift Keying ($T$-FSK). The larger the allowed PPAPR, the higher the probability of the OFF signal. In fact, STORM takes advantage of all the peak power allowed for each space-time slot when transmitting non-zero symbols, while meeting the average power constraint by the inclusion of the zero symbol with as high a probability as the PPAPR constraint would allow.

7. Consider the special case of a SISO system when there is only a per-time-slot peak power constraint $K$. Here, Theorem 2 establishes the second-order optimality of equiprobable $T$-FSK at low SNR among all $T$-ary constellations, and the near-optimality under unconstrained cardinality when $T \gg 1$.
In general for SISO systems however, depending on the peak and average power constraints, an additional zero signal is needed, of probability different from that of the $T$ (equiprobable) $T$-FSK signals.

8. The mutual information of STORM may be expressed as $\frac{E N_r}{2}\left(K N_t - \frac{E}{T^2}\right)$. For fixed $K$, $T$, and $E$, it increases linearly with $N_t$. This may be attributed to the fact that increasing $N_t$ with fixed $K$, $T$ and $E$ increases the overall peak power $\mathrm{tr}(\mathbf{X}\mathbf{X}^*) = \zeta N_t E$, while simultaneously decreasing the probability $\frac{1}{\zeta N_t}$ of transmitting a non-zero signal, thereby making the signals more peaky in the time domain. On the other hand, when $T$ is increased for fixed $N_t$ and $E$, the overall peak power $\mathrm{tr}(\mathbf{X}\mathbf{X}^*) = \zeta N_t E$ and the probability $\frac{1}{\zeta N_t}$ of transmitting a non-zero signal remain fixed, but the mutual information increases with $T$. To get some insight on why this is so, consider the canonical version of STORM. An increase in $T$ implies that the $T$-FSK transmissions (repeated over each antenna) become more peaky in the frequency domain^1.

9. STORM constellations other than the canonical ones can also be constructed. For example, one can use the inverses of the DFT and Hadamard matrices as a choice of $\mathbf{V}$. More generally, if $\tilde{\mathbf{V}}$ is unitary with unit-magnitude elements, so is $\mathbf{V} = \mathbf{P}\tilde{\mathbf{V}}\mathbf{Q}$, where $\mathbf{P}$ and $\mathbf{Q}$ are $T \times T$ permutation matrices. $\mathbf{Q}$ only permutes the columns of $\tilde{\mathbf{V}}$, thereby renumbering the signals and leaving the STORM constellation unchanged. However, row permutations induced by $\mathbf{P}$ would result in constellations that are no longer peaky in the frequency domain as compared to the canonical DFT version of STORM. It is unclear how the complete class of STORM constellations can be constructed. In this regard, note that the $\mathbf{w}_i$ vectors can be arbitrary as long as their elements have unit magnitudes.
So "repetition" across transmit antennas can involve arbitrary phase rotations, or multiplication by possibly distinct unit-magnitude complex numbers.

10. The cutoff rate for the discrete-input (of cardinality $L$) and continuous-output channel is given by
\[ R_0 = \max_{\{P_i\}_{i=1}^L,\ \{\mathbf{X}_i\}_{i=1}^L} -\log\left\{ \sum_i \sum_j P_i P_j \int \sqrt{p(\mathbf{y}|i)\,p(\mathbf{y}|j)}\ d\mathbf{y} \right\}. \tag{17} \]
The cutoff rate was initially advocated as a design criterion for modulation schemes in [19] and [20]. It is a lower bound on the random coding exponent, and also provides an exponentially accurate description of the attainable error probability when communicating at the critical rate [19]. Let the argument of $\max(\cdot)$ in (17) be denoted as the cutoff rate expression. For the noncoherent MIMO channel at low SNR, the cutoff rate expression is easily shown to be (cf. [21])
\[ CR_{\text{low}} = \frac{N_r}{8}\sum_{i,j} P_i P_j\,\mathrm{tr}\left\{\left(\mathbf{X}_i\mathbf{X}_i^* - \mathbf{X}_j\mathbf{X}_j^*\right)^2\right\} + o(P^2). \tag{18} \]
An interesting property of $CR_{\text{low}}$ [16] is that when the input constellation satisfies the regularity conditions, $\lim_{P \to 0} \frac{CR_{\text{low}}}{I_{\text{low}}} = \frac{1}{2}$. In the limit of low SNR therefore, the cutoff rate expression behaves identically to the mutual information. Therefore, the $(T+1)$-point STORM also maximizes the cutoff rate expression up to second order at low SNR.

^1 This was pointed out by a reviewer.

11. An often used noncoherent constellation design criterion (cf. [22, 23]) is to maximize the worst-case chordal distance, which is given by $\min_{j \ne i} \mathrm{tr}\left\{\mathbf{I} - \mathbf{X}_i^*\mathbf{X}_j\mathbf{X}_j^*\mathbf{X}_i\right\}$. For STORM, the worst-case chordal distance is the maximum possible, as for every $i \ne j$, $\mathbf{X}_i^*\mathbf{X}_j = \mathbf{0}_{N_t \times N_t}$.
Moreover, the difference between any two distinct matrices in STORM has unit rank, and hence the scheme would have a diversity order of $N_r$ at high SNR if employed as a coherent space-time code [24], whereas constellation design at high SNR for the coherent MIMO channel is typically geared towards achieving maximum diversity ($N_t N_r$). Theorem 2 shows that optimal noncoherent constellations at low SNR have quite the opposite properties from good coherent constellations at high SNR.

12. Subsequent to the conference version of this paper [25] (see also [26]), Sethuraman et al. [27] consider a MIMO Rayleigh fading channel with the noncoherent assumption and with the fading process modeled as stationary and ergodic, as well as correlated over time. The authors characterize input distributions which are optimal for the stationary and ergodic MIMO channel, under average-power constraints and peak constraints which are per space-time slot, similar to the PPAPR-constrained case here. Interestingly, one distribution identified in [27] which achieves the capacity up to second order can be seen to be closely related to the canonical version of STORM here. While this distribution is obtained for a different fading process, the channel coherence time $T$ here can be thought of as playing the same role as channel memory in [27].

B. Proof of Theorem 2

In this subsection, the proof of Theorem 2 is given. The following definitions and lemmas are needed first from [28].

Definition 1: A convex maximization problem is an optimization problem of the following form:
\[ \max_{\mathbf{x} \in X} f(\mathbf{x}), \tag{19} \]
where $f(\mathbf{x})$ is a convex function and $X \subset \Re^n$ is a convex set.

Definition 2: A point $\mathbf{x}$ on the boundary of a convex set $X$ is called an extreme point if there are no distinct points $\mathbf{x}_1, \mathbf{x}_2 \in X$ such that $\mathbf{x} = \lambda\mathbf{x}_1 + (1-\lambda)\mathbf{x}_2$, $0 < \lambda < 1$.
Lemma 1: A closed, bounded convex set in $\Re^n$ is the convex hull of its extreme points.

Lemma 2: The global maximum of a convex function $f$ over a compact convex set $X$ is attained at an extreme point of $X$. A point in a compact convex set $X$ is a global maximizer of a strictly convex function $f$ iff it is an extreme point of $X$.

Definition 3: A polyhedron is defined to be the set of points $P = \{\mathbf{x} \in \Re^n : \mathbf{A}\mathbf{x} \le \mathbf{b}\}$, where $\mathbf{A} \in \Re^{m \times n}$ and $\mathbf{b} \in \Re^m$. A bounded polyhedron is called a polytope. The extreme points of a polytope are referred to as vertices.

The next lemma gives the necessary and sufficient conditions for a point to be a vertex of a general polytope.

Lemma 3: With the same notation as in Definition 3, let $\mathbf{a}_i^T$, $1 \le i \le m$, denote the rows of the matrix $\mathbf{A}$. Further, for $\mathbf{x} \in P$, let $I = \left\{ i \in \{1, \ldots, m\} : \mathbf{a}_i^T\mathbf{x} = b_i \right\}$ describe the inequalities which are binding (active) at $\mathbf{x}$, and let $\mathbf{A}_I$ be the matrix with rows $\mathbf{a}_i^T$, $i \in I$. Then $\mathbf{x} \in P$ is a vertex of $P$ iff $\mathrm{rank}(\mathbf{A}_I) = n$.

The following lemma more sharply specifies the vertices of a special polytope which will be useful in the proof of Theorem 2.

Lemma 4: Consider the polytope defined by
\[ D = \left\{ \mathbf{d} : \sum_i P_i d_i \le E,\ 0 \le d_i \le Q,\ i = 1, \ldots, L \right\}, \tag{20} \]
which is the intersection of the half-plane $\sum_i P_i d_i \le E$ and the hypercube $0 \le d_i \le Q$. Each vertex of $D$ consists of $L-1$ entries that are either $Q$ or $0$, and exactly one entry $c$ such that $0 \le c \le Q$.

Proof: The polytope $D$ can be expressed in the standard form $\mathbf{A}\mathbf{d} \le \mathbf{b}$ given in Definition 3 by setting
\[ \mathbf{A} = \begin{bmatrix} \mathbf{q}^T \\ \mathbf{I}_L \\ -\mathbf{I}_L \end{bmatrix}_{(2L+1) \times L} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} E \\ Q\mathbf{1}_L \\ \mathbf{0}_L \end{bmatrix}, \tag{21} \]
where $\mathbf{q} = [P_1\ P_2\ \ldots\ P_L]^T$. Let $\mathbf{x}$ be a vertex of the polytope described by $\mathbf{A}\mathbf{d} \le \mathbf{b}$. Then, the rows of $\mathbf{A}$ which satisfy $\mathbf{a}_i^T\mathbf{x} = b_i$ should form a matrix with rank $L$ by Lemma 3.
If $\mathbf{x}$ is a vertex for which $\mathbf{q}^T\mathbf{x} = E$, then there are at least $L-1$ more linearly independent rows of $\mathbf{A}$ that correspond to active constraints. Suppose $k$ of them are of the form $x_j = Q$ for $j \in J \subseteq \{1, 2, \ldots, L\}$; then at least $L-1-k$ active constraints (out of the remaining $L-k$ constraints) must be of the form $x_j = 0$ for $j \in J^C$. Hence, at most one entry of $\mathbf{x}$ can lie anywhere between $0$ and $Q$ (call it $c$). If $\mathbf{x}$ is such that $\mathbf{q}^T\mathbf{x} < E$, then it is a vertex by Lemma 3 iff $x_j = Q$ for all $j$ in a subset $J \subseteq \{1, 2, \ldots, L\}$ for which $Q\sum_{j \in J} P_j < E$, and $x_j = 0$ for all $j \in J^C$ (there are as many such vertices as there are subsets $J$ for which $Q\sum_{j \in J} P_j < E$). In this case, all the entries of the vertex are either $Q$ or $0$ (set $c = 0$ or $Q$).

Proof (of Theorem 2): The problem that needs to be solved here is essentially
\[ \max_{\{P_i\}_{i=1}^L,\ \{\mathbf{X}_i\}_{i=1}^L} I_{\text{low}} \tag{22} \]
subject to
\[ \sum_i P_i\,\mathrm{tr}(\mathbf{X}_i\mathbf{X}_i^*) \le E, \qquad \|\mathbf{X}_i\|_\infty \le \sqrt{K}\ \ \forall i, \qquad \sum_i P_i = 1, \qquad P_i \ge 0\ \ \forall i, \]
where $I_{\text{low}}$ is given in (7). Maximizing $I_{\text{low}}$ is equivalent to maximizing
\[ \sum_i P_i\,\mathrm{tr}\left(\mathbf{X}_i\mathbf{X}_i^*\mathbf{X}_i\mathbf{X}_i^*\right) - \mathrm{tr}\left(\sum_i P_i\mathbf{X}_i\mathbf{X}_i^* \sum_j P_j\mathbf{X}_j\mathbf{X}_j^*\right) \tag{23} \]
\[ = \sum_i P_i(1-P_i)\,\mathrm{tr}\left(\mathbf{X}_i\mathbf{X}_i^*\mathbf{X}_i\mathbf{X}_i^*\right) - \sum_{i,\,j \ne i} P_i P_j\,\mathrm{tr}\left(\mathbf{X}_j^*\mathbf{X}_i\mathbf{X}_i^*\mathbf{X}_j\right) \tag{24} \]
\[ \le \sum_{i=1}^L P_i(1-P_i)\,\mathrm{tr}\left(\mathbf{X}_i\mathbf{X}_i^*\mathbf{X}_i\mathbf{X}_i^*\right). \tag{25} \]
Since terms of the form $\mathrm{tr}\left(\mathbf{X}_j^*\mathbf{X}_i\mathbf{X}_i^*\mathbf{X}_j\right)$ are non-negative, (25) follows by replacing all negative terms in (24) by zero. Let $\mathbf{x}_{ik}$ denote the $k$th column of the matrix $\mathbf{X}_i$. Equality in (25) occurs iff $\mathbf{x}_{jk}^*\mathbf{x}_{il} = 0$ for all $k$, $l$, and $j \ne i$. The strategy is to maximize the bound in (25) and show later that the signal constellation that maximizes it achieves equality in (25) when $L \le T+1$, thereby maximizing $I_{\text{low}}$ in these cases.
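The step from (23) to (24), and the non-negativity used for (25), can be spelled out as follows. This is only a sketch of the intermediate algebra; the shorthand $\mathbf{A}_i = \mathbf{X}_i\mathbf{X}_i^*$ is introduced here for illustration and does not appear in the original derivation.

```latex
\begin{align*}
\mathrm{tr}\Big(\sum_i P_i \mathbf{A}_i \sum_j P_j \mathbf{A}_j\Big)
  &= \sum_i P_i^2\,\mathrm{tr}(\mathbf{A}_i^2)
   + \sum_{i}\sum_{j \neq i} P_i P_j\,\mathrm{tr}(\mathbf{A}_i \mathbf{A}_j), \\
\mathrm{tr}(\mathbf{A}_i \mathbf{A}_j)
  &= \mathrm{tr}(\mathbf{X}_i\mathbf{X}_i^*\mathbf{X}_j\mathbf{X}_j^*)
   = \mathrm{tr}(\mathbf{X}_j^*\mathbf{X}_i\mathbf{X}_i^*\mathbf{X}_j)
   = \|\mathbf{X}_i^*\mathbf{X}_j\|_F^2 \;\ge\; 0 .
\end{align*}
```

Subtracting the first identity from $\sum_i P_i\,\mathrm{tr}(\mathbf{A}_i^2)$ gives (24). The second identity, which uses the cyclic property of the trace, shows that each cross term is a squared Frobenius norm, so it vanishes exactly when $\mathbf{X}_i^*\mathbf{X}_j = \mathbf{0}$, i.e. when every column of $\mathbf{X}_i$ is orthogonal to every column of $\mathbf{X}_j$.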
So, let us consider the optimization problem
\[ \max_{\{P_i\}_{i=1}^L,\ \{\mathbf{X}_i\}_{i=1}^L} \sum_i P_i(1-P_i)\,\mathrm{tr}\left(\mathbf{X}_i\mathbf{X}_i^*\mathbf{X}_i\mathbf{X}_i^*\right) \tag{26} \]
subject to
\[ \sum_i P_i\,\mathrm{tr}(\mathbf{X}_i\mathbf{X}_i^*) \le E, \qquad \|\mathbf{X}_i\|_\infty \le \sqrt{K}\ \ \forall i, \qquad \sum_i P_i = 1, \qquad P_i \ge 0\ \ \forall i. \]
In Appendix-A, a simple argument is given that shows that the maximization of (26) is a non-convex optimization problem. A two-stage approach is thus adopted for solving the optimization in (26). In the first stage, the objective function is maximized over $\{\mathbf{X}_i\}_{i=1}^L$ while holding $\{P_i\}_{i=1}^L$ fixed. In the second stage, the resulting objective function is maximized over $\{P_i\}_{i=1}^L$. Furthermore, it is shown that the optimization in the first stage can be split into two successive convex maximization problems and that the optimization in the second stage is a convex minimization problem. It is this nice structure that is exploited to obtain the signal matrices $\{\mathbf{X}_i\}_{i=1}^L$ and the probabilities $\{P_i\}_{i=1}^L$ that jointly optimize the upper bound on mutual information (up to second order at low SNR) in (26).

Consider first the optimization in (26) over $\{\mathbf{X}_i\}_{i=1}^L$ for fixed $\{P_i\}_{i=1}^L$. This problem is decomposed into two steps. In the first step, $\mathrm{tr}(\mathbf{X}_i\mathbf{X}_i^*) = d_i$ is fixed for some $\{d_i\}_{i=1}^L$ and the best set of $\{\mathbf{X}_i\}_{i=1}^L$ is found. Note that $d_i$ is equal to the energy of the $i$th signal, and because of the peak power constraint, it is sufficient to restrict $d_i \in [0, K N_t T]$. In the second step, the resulting objective function is optimized over $d_i$, $i = 1, \ldots, L$. Geometrically, we first find the matrices $\{\mathbf{X}_i\}_{i=1}^L$ that maximize the objective function over the contour $\mathrm{tr}(\mathbf{X}_i\mathbf{X}_i^*) = d_i$ for all $i$, and then optimize the resulting objective over $\{d_i\}_{i=1}^L$, thereby obtaining the best contour for an arbitrary but fixed $\{P_i\}_{i=1}^L$.
As is shown below, both these problems can be solved as convex maximization problems.

With $\mathrm{tr}(\mathbf{X}_i\mathbf{X}_i^*) = d_i \in [0, K N_t T]$ for all $i$, it is clear that the objective function in (26) is maximized when, for each $i$, $\mathbf{X}_i$ is chosen according to
\[ \max_{\substack{\mathrm{tr}(\mathbf{X}_i\mathbf{X}_i^*) = d_i \\ \|\mathbf{X}_i\|_\infty \le \sqrt{K}}} \mathrm{tr}\left(\mathbf{X}_i\mathbf{X}_i^*\mathbf{X}_i\mathbf{X}_i^*\right). \tag{27} \]
Let the eigenvalues of the positive semidefinite matrix $\mathbf{X}_i\mathbf{X}_i^*$ be $\{\lambda_m\}_{m=1}^T$ (the dependence on $i$ is implicit). Then, the solution of (27) is upper bounded by the solution of
\[ \max_{\substack{\sum_m \lambda_m = d_i \\ \lambda_m \ge 0\ \forall m}} \sum_m \lambda_m^2, \tag{28} \]
with equality iff the additional constraints $\|\mathbf{X}_i\|_\infty \le \sqrt{K}$ hold for each $i$ for the matrix that achieves the maximum in (28). Since the objective function in (28) is strictly convex while the constraint set is a polytope, the problem in (28) is a strictly convex maximization problem. Hence by Lemma 2, a solution is globally optimal iff it is a vertex of the constraint set. In this case, the constraint polytope has $T+1$ vertices, which can be found by inspection to be
\[ [0\ 0\ \ldots\ 0\ 0]^T,\ [d_i\ 0\ \ldots\ 0\ 0]^T,\ [0\ d_i\ 0\ \ldots\ 0]^T,\ \ldots,\ [0\ \ldots\ 0\ 0\ d_i]^T \tag{29} \]
since none of them can be expressed as a convex combination of any other points in the set, and any point in the set can be expressed as a convex combination of the points in (29). Now, since all the vertices except the all-zero vector give the same value $d_i^2$ for the objective function, $d_i^2$ is the sought maximum. This in turn implies that all the matrices $\{\mathbf{X}_i\}_{i=1}^L$ have to be of unit rank for the objective functions to achieve their maximum value of $d_i^2$ for each $i$ (we adopt the convention that the all-zero matrix is of unit rank). Let the number of matrices in $\{\mathbf{X}_i\}_{i=1}^L$ which are not $\mathbf{0}_{T \times N_t}$ be $L'$.
If more than one of the $d_i$'s are zero, they would all correspond to the same zero signal point $\mathbf{0}_{T \times N_t}$ and their respective probabilities would simply add up, resulting in one effective zero symbol matrix. Therefore, $L = L'+1$ or $L = L'$ depending on whether or not there is a zero symbol. When $L' \le T$, consider the following constellation $\{\mathbf{X}_i\}_{i=1}^L$:
\[ \mathbf{X}_i = \sqrt{\frac{d_i}{T N_t}}\ \mathbf{v}_i\mathbf{w}_i^*, \quad d_i > 0, \tag{30} \]
\[ \mathbf{X}_i = \mathbf{0}, \quad d_i = 0, \tag{31} \]
where the vectors $\mathbf{v}_i$ and $\mathbf{w}_i$ are constrained as in (13). Note that the matrices in (30) and (31) are of unit rank and satisfy $\mathrm{tr}(\mathbf{X}_i\mathbf{X}_i^*) = d_i$, $1 \le i \le L$. Hence they solve the problem in (28). Now, since $d_i \le K N_t T$, using (13), it follows that $\|\mathbf{X}_i\|_\infty \le \sqrt{K}$ for all $i$, and hence they also solve the problem in (27). Moreover, since $L' \le T$, any pair of different constellation matrices have orthogonal columns (since the $\mathbf{v}_i$'s are orthogonal), which ensures that (25) holds with equality. It will eventually be shown that the optimal values of the non-zero $\{d_i\}$ are all equal, with $d_i = K N_t T$ for all $i$. This in turn implies that the structure in (30) and (31) is also necessary.

When $L' > T$, the set of $\mathbf{v}_i$ in (30) can no longer be selected to be orthogonal to each other. Nevertheless, a set of rank-one matrices with the structure given in (30) but with a non-orthogonal set of $\mathbf{v}_i$ (normalized in the same way) still solves both (27) and (28). Therefore, the expression $\frac{N_r}{2T}\sum_{i=1}^L P_i(1-P_i)d_i^2$ serves as an upper bound on the maximum mutual information up to second order achievable by any constellation of cardinality $L = L'+1$, which is $I^*_{\text{low},L}$.

In summary, the best constellation $\{\mathbf{X}_i\}_{i=1}^L$ can be specified for any set of non-negative $\{d_i\}_{i=1}^L$. It remains to find the best $\{d_i\}_{i=1}^L$ according to
\[ \max_{\{d_i\}_{i=1}^L} \sum_i P_i(1-P_i)\,d_i^2 \tag{32} \]
subject to
\[ \sum_i P_i d_i \le E, \tag{33} \]
\[ 0 \le d_i \le K N_t T\ \ \forall i. \tag{34} \]
For a fixed $\{P_i\}_{i=1}^L$, this is also a strictly convex maximization problem over a polytope. Hence, a vertex of the polytope is both necessary and sufficient to achieve the global optimum. The polytope constraint set is exactly of the form considered in Lemma 4, which states that each vertex would consist of $L-1$ entries that are either $K N_t T$ or $0$, and at most one entry $c$ such that $0 < c < K N_t T$. For vertices for which $\sum_i P_i d_i < E$, it is necessarily the case that all entries are either $0$ or $K N_t T$.

Consider the second stage of the optimization, which is over $\{P_i\}_{i=1}^L$. Following the result of the optimization in the first stage, the optimal $\mathbf{d}$ and the corresponding probabilities are of the form
\[ \mathbf{d} = \big[\underbrace{K N_t T\ \ \cdots\ \ K N_t T}_{M \text{ times}}\ \ c\ \ 0\big]^T, \quad 0 < c \le K N_t T, \tag{35} \]
\[ \mathbf{P} = [P_1\ \ldots\ P_M\ \ P_{M+1}\ \ P_{M+2}]^T, \tag{36} \]
where $M$ denotes the number of entries in $\mathbf{d}$ that are equal to $K N_t T$. Note that when $P_{M+1} = 0$, the constellation point corresponding to the entry $c$ such that $0 < c < K N_t T$ is not transmitted. We know that whenever (33) is strict, there cannot be an extreme point $\mathbf{d}$ of the constraint set formed by (33) and (34) which has an entry $c$ such that $0 < c < K N_t T$. Therefore, in the case of a strict half-plane constraint, we will take $P_{M+1} = 0$ for the optimal constellation without any loss of generality, which simplifies the subsequent convex minimization problem. The cardinality $L$ of the constellation depends on the number of non-zero probabilities in the optimal constellation and is related to $M$ by $L \le M+2$ in general.

With the structure of the optimal $\mathbf{d}$, the optimal set of probabilities is determined next in terms of $M$ and $c$. Following that, the values of $M$ and $c$ are obtained that maximize the resulting objective function.
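The claims made about the rank-one structure in (30), namely that each point has energy $d_i$, respects the per-slot peak constraint, and has columns orthogonal to those of every other point, can be checked numerically. Below is a minimal sketch in plain Python, assuming a DFT choice for $\mathbf{V}$ and the all-ones choice for $\mathbf{w}_i$; the helper names and the values $T = 4$, $N_t = 2$, $K = 1.5$ are illustrative assumptions, not values from the paper.

```python
import cmath
import math

def dft_columns(T):
    """Columns of a T-point DFT matrix with unit-magnitude entries (unnormalized)."""
    return [[cmath.exp(-2j * math.pi * m * i / T) for m in range(T)]
            for i in range(T)]

def rank_one_point(v, w, d):
    """X = sqrt(d / (T * Nt)) * v w^*, stored as a T x Nt list of rows, as in (30)."""
    T, Nt = len(v), len(w)
    scale = math.sqrt(d / (T * Nt))
    return [[scale * v[m] * w[n].conjugate() for n in range(Nt)] for m in range(T)]

def energy(X):
    """tr(X X^*), the sum of squared entry magnitudes."""
    return sum(abs(x) ** 2 for row in X for x in row)

def peak(X):
    """Largest entry magnitude, i.e. the norm used in the peak constraint."""
    return max(abs(x) for row in X for x in row)

def cross_gram_norm(X, Y):
    """Squared Frobenius norm of X^* Y; zero iff all columns of X and Y are orthogonal."""
    T = len(X)
    total = 0.0
    for k in range(len(X[0])):
        for l in range(len(Y[0])):
            inner = sum(X[m][k].conjugate() * Y[m][l] for m in range(T))
            total += abs(inner) ** 2
    return total

# Illustrative parameters (assumptions): full peak energy d = K * Nt * T.
T, Nt, K = 4, 2, 1.5
d = K * Nt * T
w = [1 + 0j] * Nt                      # repetition across antennas
points = [rank_one_point(v, w, d) for v in dft_columns(T)]
```

With $d = K N_t T$ the scale factor equals $\sqrt{K}$, so every entry sits exactly at the peak limit, which matches the form of the non-zero points in (11).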
For convenience, consider minimizing the negative of the objective function in (32) after the optimal $\mathbf{d}$ is substituted, as follows:
\[ \min_{\{P_i\}_{i=1}^{M+2}}\ -K^2 N_t^2 T^2 \sum_{i=1}^M P_i(1-P_i) - c^2 P_{M+1}(1-P_{M+1}) \tag{37} \]
subject to
\[ K N_t T \sum_{i=1}^M P_i + c\,P_{M+1} \le E, \tag{38} \]
\[ \sum_{i=1}^{M+2} P_i = 1, \qquad P_i \ge 0,\ \ 1 \le i \le M+2. \tag{39} \]
The optimization over $\mathbf{P}$ in (37) is the more commonly studied convex minimization problem [29]. The Lagrangian can be written as
\[ \mathcal{L}\left(\mathbf{P}, \beta, \lambda, \{\mu_i\}_{i=1}^{M+2}\right) = -K^2 N_t^2 T^2 \sum_{i=1}^M P_i(1-P_i) - c^2 P_{M+1}(1-P_{M+1}) + \beta\left(\sum_{i=1}^{M+2} P_i - 1\right) + \lambda\left(K N_t T \sum_{i=1}^M P_i + c\,P_{M+1} - E\right) - \sum_{i=1}^{M+2} \mu_i P_i. \tag{40} \]
It can be verified that Slater's conditions [29] are satisfied and hence strong duality holds. Therefore, the Karush-Kuhn-Tucker (KKT) conditions are both necessary and sufficient for the optimal solution $\mathbf{P}$ and are given as
\[ \lambda \ge 0, \qquad \mu_i \ge 0\ \ \forall i, \qquad K N_t T \sum_{i=1}^M P_i + c\,P_{M+1} \le E, \qquad \sum_{i=1}^{M+2} P_i = 1, \]
\[ \lambda\left(K N_t T \sum_{i=1}^M P_i + c\,P_{M+1} - E\right) = 0, \qquad \mu_i P_i = 0,\ \ 1 \le i \le M+2, \]
\[ -K^2 N_t^2 T^2 (1 - 2P_i) + \lambda K N_t T + \beta - \mu_i = 0, \quad 1 \le i \le M, \]
\[ -c^2 (1 - 2P_{M+1}) + \lambda c + \beta - \mu_{M+1} = 0, \qquad \beta - \mu_{M+2} = 0. \]
By eliminating the slack variables $\mu_i$, we get
\[ K^2 N_t^2 T^2 (2P_i - 1) + \lambda K N_t T + \beta \ge 0, \quad 1 \le i \le M \tag{41} \]
\[ c^2 (2P_{M+1} - 1) + \lambda c + \beta \ge 0 \tag{42} \]
\[ \beta \ge 0 \tag{43} \]
\[ \lambda\left(K N_t T \sum_{i=1}^M P_i + c\,P_{M+1} - E\right) = 0 \tag{44} \]
\[ \beta P_{M+2} = 0 \tag{45} \]
\[ \left[K^2 N_t^2 T^2 (2P_i - 1) + \lambda K N_t T + \beta\right] P_i = 0, \quad 1 \le i \le M \tag{46} \]
\[ \left[c^2 (2P_{M+1} - 1) + \lambda c + \beta\right] P_{M+1} = 0 \tag{47} \]
\[ P_i \ge 0, \quad 1 \le i \le M+2 \tag{48} \]
\[ \sum_{i=1}^{M+2} P_i = 1 \tag{49} \]
\[ K N_t T \sum_{i=1}^M P_i + c\,P_{M+1} \le E. \tag{50} \]
From (46), it can be seen that $P_i$ can take one of two values, namely,
\[ P_i = 0 \quad \text{or} \quad P_i = \frac{1}{2} - \frac{\lambda K N_t T + \beta}{2 K^2 N_t^2 T^2}. \tag{51} \]
Points with zero probability are redundant, and since the optimal number $M$ is determined only later, it may be assumed that the $M$ probabilities $P_i$ for $1 \le i \le M$ are the same and given in (51); these probabilities are denoted simply as "$P_i$". Similarly from (47), $P_{M+1}$ can take one of two values, namely,
\[ P_{M+1} = 0 \quad \text{or} \quad P_{M+1} = \frac{1}{2} - \frac{\lambda c + \beta}{2c^2}. \tag{52} \]
Four cases must be considered to find the solutions to the KKT conditions. Recall that $K N_t T \ge E$.

Case 1: $K N_t T \sum_{i=1}^M P_i + c\,P_{M+1} < E$, $P_{M+2} = 0$. The strict inequality in (50) implies that $\lambda = 0$ from (44). Since the power constraint is a strict inequality, we may take $P_{M+1} = 0$ from the discussion that follows (36). Therefore, $P_i = \frac{1}{M}$ is necessary to satisfy (49). From (51), we obtain $\beta = \frac{M-2}{M} K^2 N_t^2 T^2$. The condition $\beta \ge 0$ implies that $M \ge 2$. The strict inequality in (50) together with $P_{M+1} = 0$ implies that this case holds when $K N_t T < E$, which is never true. Therefore, this case does not occur.

Case 2: $K N_t T \sum_{i=1}^M P_i + c\,P_{M+1} < E$, $P_{M+2} > 0$. The strict inequality in (50) implies that $\lambda = 0$ from (44). Since the power constraint is a strict inequality, we may take $P_{M+1} = 0$ from the discussion that follows (36). Since $P_{M+2} > 0$, we have $\beta = 0$ from (45). Therefore, $P_i = \frac{1}{2}$ from (51) and $P_{M+2} = \frac{1}{2}$. From (50), this case applies when $K N_t T < 2E$ and $M = 1$.

Case 3: $K N_t T \sum_{i=1}^M P_i + c\,P_{M+1} = E$, $P_{M+2} > 0$. Since $P_{M+2} > 0$, we must have $\beta = 0$ by (45). There are three sub-cases here, viz., (i) $P_i > 0$, $P_{M+1} > 0$; (ii) $P_i > 0$, $P_{M+1} = 0$; and (iii) $P_i = 0$, $P_{M+1} > 0$. We first consider sub-case (i).

(i) Using the values $P_i = \frac{1}{2} - \frac{\lambda}{2 K N_t T}$ and $P_{M+1} = \frac{1}{2} - \frac{\lambda}{2c}$ from (51) and (52) in the power constraint equality, we can solve for $\lambda$ as $\lambda = \frac{M K N_t T + c - 2E}{M+1}$.
Substituting this value of $\lambda$ in (51) and (52), we obtain
\[ P_i = \frac{K N_t T - c + 2E}{2(M+1) K N_t T}, \tag{53} \]
\[ P_{M+1} = \frac{M(c - K N_t T) + 2E}{2c(M+1)}. \tag{54} \]
Using the above probabilities in the objective function $f$ given in (37), we observe that
\[ \frac{d^2 f}{dc^2} = -\frac{M}{2(M+1)} \le 0, \tag{55} \]
which means that $f$ is a concave function of $c$. Since $P_{M+1} \ge 0$, we get from (52) that $\lambda \le c$. Therefore, the range of $c$ in this case is given by $\lambda \le c \le K N_t T$. Since the optimization of $f$ over $c$ is a concave minimization problem, the minimum is either at $c = \lambda$ or at $c = K N_t T$ by Lemma 2. Choosing $c = \lambda$ gives $P_{M+1} = 0$ from (52), $\lambda = K N_t T - \frac{2E}{M}$, and therefore
\[ P_i = \frac{E}{M K N_t T}. \tag{56} \]
Consequently, from (49) we get that
\[ P_{M+2} = 1 - \frac{E}{K N_t T}. \tag{57} \]
If $c$ were instead chosen to be $K N_t T$, then from (53) and (54), $P_i = \frac{E}{(M+1) K N_t T}$, $P_{M+1} = \frac{E}{(M+1) K N_t T}$, and therefore $P_{M+2} = 1 - \frac{E}{K N_t T}$. Since we are yet to optimize over $M$, this solution is clearly identical to that obtained in (56) and (57), with $M+1$ peak-energy points in place of $M$. So we may choose $c = \lambda$ itself as the solution. For $c = \lambda$, since $\lambda \ge 0$, this case requires $K N_t T \ge \frac{2E}{M}$. Moreover, the power constraint equality requires that $K N_t T \ge E$. Hence, this sub-case solves the convex optimization problem for the cases $K N_t T \ge E$, $M \ge 2$ and $K N_t T \ge 2E$, $M = 1$. Even for sub-cases (ii) and (iii), it can be easily verified that we get essentially the same solutions as in the previous sub-case.

Case 4: $K N_t T \sum_{i=1}^M P_i + c\,P_{M+1} = E$, $P_{M+2} = 0$. The cases $K N_t T \ge E$, $M \ge 2$ and $K N_t T \ge E$, $M = 1$ are solved completely through Cases 2 and 3. This is true because, by strong duality, the constellations obtained in Cases 2 and 3 are both necessary and sufficient for optimality. Moreover, since $K N_t T < E$ does not occur, we do not solve for Case 4 since we will get no new solutions or insights.
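The Case 3 solution can be sanity-checked with exact arithmetic. The sketch below, a hypothetical illustration rather than part of the proof, plugs the probabilities (56) and (57) into the constraints (49) and (50), the stationarity condition (46) with $\beta = 0$ and $\lambda = K N_t T - 2E/M$, and the objective of (32). The shorthand `A` for $K N_t T$ and the numerical values $M = 4$, $A = 16$, $E = 4$ are assumptions chosen only so that the fractions come out simply.

```python
from fractions import Fraction

def storm_probabilities(M, A, E):
    """Probabilities (56)-(57): M equiprobable peak-energy points plus one zero point.

    A stands for K * Nt * T (a shorthand introduced here for illustration).
    """
    p_on = Fraction(E) / (M * Fraction(A))
    p_off = 1 - Fraction(E) / Fraction(A)
    return [p_on] * M + [p_off]

def objective(probs, energies):
    """The objective of (32): sum_i P_i (1 - P_i) d_i^2."""
    return sum(p * (1 - p) * d * d for p, d in zip(probs, energies))

# Illustrative values (assumptions, not from the paper).
M, A, E = 4, Fraction(16), Fraction(4)
probs = storm_probabilities(M, A, E)
energies = [A] * M + [Fraction(0)]

total_probability = sum(probs)                              # should equal 1, cf. (49)
average_power = sum(p * d for p, d in zip(probs, energies)) # should equal E, cf. (50)

lam = A - 2 * E / M                                         # multiplier for c = lambda
stationarity = A * A * (2 * probs[0] - 1) + lam * A         # bracket in (46) with beta = 0

value = objective(probs, energies)
closed_form = A * E * (1 - E / (M * A))                     # direct substitution of (56)-(57)
```

Here each ON point has probability $1/16$, the zero point has probability $3/4$, the power constraint is met with equality, the bracket in (46) vanishes, and the objective equals 60 both by direct summation and by the closed form.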
The last step is to find the best possible $M$. For convenience, we revert to the problem stated as a maximization of the objective function $f$. From Case 3, which yields the only pertinent solution for $T \ge 2$, the objective function with the optimal probabilities given in (56) and (57) is
\[ f = K N_t T E \left(1 - \frac{E}{M K N_t T}\right). \tag{58} \]
Notice that $f$ is an increasing function of $M$, so $M$ needs to be chosen as large as possible. However, if $M$ is chosen so that $M > T$, inequality (25) would be strict, since it is not possible to make the columns of all pairs of different constellation matrices orthogonal. Therefore, $M = T$ is optimal among $M$ satisfying $M \le T$. When we take the limit as $M \to \infty$, we get an upper bound on the mutual information which is not achievable (hence the strict inequality for the upper bound in (14)).

To complete the proof, notice that we may use the jointly optimal $\mathbf{P}$ and $\mathbf{d}$ with the structure of constellation points given in (30) and (31), so that the upper bound in (25) is achieved with equality when $M \le T$. Therefore, the optimal constellations have been obtained for the case $M \le T$. When $M > T$, we can obtain an upper bound on the maximum achievable mutual information by letting $M \to \infty$ in (58) (and multiplying by the factor $\frac{N_r}{2T}$).

C. Spectral Efficiency

Consider the normalized energy per bit for reliable communications, which is given as
\[ \frac{E_b}{N_0} = \frac{P}{C(P)}, \tag{59} \]
where $C(P)$ is the Shannon capacity for the channel in bits per dimension. For the case when $C(P)$ is a non-decreasing concave function, it can be seen that (59) achieves its minimum value over all $P$ as $P \to 0$. However, this is not true in the PPAPR-constrained case. Indeed, since the capacity is $O(P^2)$, $\frac{E_b}{N_0} \to \infty$ as $P \to 0$. Therefore, it is not energy-efficient to operate at asymptotically low SNR in this case.
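The divergence of the energy per bit can be made explicit with a one-line expansion. As a sketch, assume that the capacity in the PPAPR-constrained case behaves as $C(P) = a P^2 + o(P^2)$ for some constant $a > 0$; the symbol $a$ is a placeholder introduced here for illustration and absorbs the second-order coefficient and the unit conversion.

```latex
\[
\frac{E_b}{N_0} \;=\; \frac{P}{C(P)} \;=\; \frac{P}{a P^2 + o(P^2)}
\;=\; \frac{1}{a P + o(P)} \;\longrightarrow\; \infty
\quad \text{as } P \to 0 .
\]
```

Under this assumption the energy per bit grows like $1/(aP)$ at low power, so any minimum of $E_b/N_0$ must occur at a strictly positive SNR.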
The mutual information of STORM at any SNR is

$$I_{\mathrm{STORM}}(P) = \sum_i P_i \, E_{Y|X_i}\!\left[ \log\!\left( \frac{p(Y|X_i)}{\sum_j P_j \, p(Y|X_j)} \right) \right]. \tag{60}$$

The expectations in (60) can be calculated using Monte-Carlo integration. Thus the normalized energy per bit required for STORM can be determined as $\frac{E_b}{N_0} = \frac{P}{I_{\mathrm{STORM}}(P)}$ over the entire range of SNRs. Extensive simulations over a variety of cases show that the minimum energy per bit typically occurs at a low but non-vanishing SNR. STORM should hence be used in the vicinity of this SNR for maximum spectral efficiency. However, in the absence of the capacity of the noncoherent MIMO channel at a general SNR, there is no fair yardstick against which to compare the energy per bit of STORM with that of the capacity-achieving scheme.

IV. THE PEAK-CONSTRAINED CASE

In this section, the peak-constrained problem is considered, where the peak constraint $K$ in (3) is a fixed constant, independent of the average power $P$. It can be shown by a simple time-sharing argument that the channel capacity in this case is concave and non-decreasing in $P$. Therefore, the normalized energy per bit $\frac{E_b}{N_0}$ given in (59) attains its minimum value over all $P$ as $P \to 0$. Let us denote the normalized minimum energy per bit for our channel model by $\frac{E_b}{N_0}_{\min}$, in keeping with common usage [6]. Since $C(P)$ is a non-decreasing function of $P$, it can be assumed without any loss of generality that the average power constraint is $\frac{1}{T} E[\mathrm{tr}(XX^*)] = P$ instead of $\frac{1}{T} E[\mathrm{tr}(XX^*)] \le P$. The capacity function (in bits/dimension) admits the following Taylor series expansion

$$C(P) = \dot{C}(0) P \log_2 e + \frac{1}{2} \ddot{C}(0) P^2 \log_2 e + o(P^2), \tag{61}$$

where $\dot{C}(0)$ and $\ddot{C}(0)$ are the first and second derivatives of $C(P)$ computed in nats/dimension.
The notation and units introduced above for $C(P)$, $\dot{C}(0)$ and $\ddot{C}(0)$ will be used in the rest of this paper. The capacity per unit energy (in bits per joule) is the reciprocal of $\frac{E_b}{N_0}_{\min}$, and is equal to $\dot{C}(0) \log_2 e$ in the peak-constrained case; either metric can be considered a measure of energy efficiency. Therefore, the notions of minimizing the energy per bit and maximizing the information rate per unit energy will be used interchangeably. The minimization of energy per bit is considered in Section IV-A. Note however that since this minimum occurs at a vanishing SNR, a fixed rate (in bits/sec) of communication can only be achieved in the limit of infinite bandwidth. It is hence of interest to communicate at low but non-vanishing SNR and also to do so in a bandwidth-efficient manner, which brings us to the notion of wideband slope introduced in [6]. The slope of the capacity function versus $\frac{E_b}{N_0}$ (also called the spectral efficiency function) in bits per second per hertz per 3 dB at zero spectral efficiency is defined as the wideband slope in [6] and was shown to be given in terms of $\dot{C}(0)$ and $\ddot{C}(0)$ as

$$S_0 = \frac{2 \left[ \dot{C}(0) \right]^2}{-\ddot{C}(0)}. \tag{62}$$

The motivation for considering the wideband slope as a performance metric is that, while achieving $\frac{E_b}{N_0}_{\min}$ is desirable for energy efficiency, the rate of convergence of $\frac{E_b}{N_0}$ to $\frac{E_b}{N_0}_{\min}$ as $P \to 0$ is also an important factor at low $P$, which in turn is closely tied to spectral efficiency. The higher the wideband slope, the greater the spectral efficiency when operating at small but non-vanishing SNR. This point about the importance of the wideband slope was highlighted through several examples in the insightful work of [6].
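The two low-SNR figures of merit defined by (59), (61) and (62) can be computed directly from the first two derivatives of the capacity at zero. The sketch below is only illustrative: the function names are made up here, and the numerical example uses the textbook complex AWGN capacity $C(P) = \log(1 + P)$ nats, which is not the channel studied in this paper but has the well-known values $\dot{C}(0) = 1$ and $\ddot{C}(0) = -1$.

```python
import math

def min_energy_per_bit_db(c_dot):
    """Minimum Eb/N0 in dB: the reciprocal of c_dot * log2(e), per (59) and (61)."""
    return 10.0 * math.log10(1.0 / (c_dot * math.log2(math.e)))

def wideband_slope(c_dot, c_ddot):
    """Wideband slope S_0 = 2 * c_dot^2 / (-c_ddot), per (62).

    c_dot and c_ddot are the first and second derivatives of C(P) at P = 0,
    in nats per dimension. A second derivative of -infinity yields S_0 = 0.
    """
    return 2.0 * c_dot ** 2 / (-c_ddot)

# Illustration with the AWGN values c_dot = 1, c_ddot = -1 (an assumed example).
awgn_slope = wideband_slope(1.0, -1.0)
awgn_ebn0_min_db = min_energy_per_bit_db(1.0)
```

For the AWGN example this gives a slope of 2 and a minimum energy per bit of about $-1.59$ dB. Passing `float('-inf')` as the second derivative returns a slope of zero, which is the behaviour associated with signaling that becomes unboundedly peaky.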
An important example provided there was that of noncoherent communications with an input average power constraint alone, and the wideband slope in this case was found to be $S_0 = 0$, in contrast to that of coherent communication where it is positive. This result implies that to approach $\frac{E_b}{N_0}_{\min}$, the bandwidth for reliable noncoherent communications becomes prohibitively large and the associated signaling scheme prohibitively peaky, and therefore no realistic (i.e., bandwidth-limited and peak-limited) scheme can achieve $\frac{E_b}{N_0}_{\min}$. In this work, the noncoherent MIMO channel is considered with a peak constraint on the input, in addition to the average power constraint. It is shown that with the additional peak constraint, which is necessary for meaningful results at low SNR, there is a tradeoff between the minimum energy per bit and the wideband slope. This provides a far more detailed characterization of the wideband slope than if only the average power constraint were imposed, and in particular it shows that it is possible to have $S_0 > 0$ provided the peak constraint on the input is less than a certain constant. In the process, the $T + 1$ point constellation that is optimal in wideband slope (or maximizes $\ddot{C}(0)$) from among constellations that achieve minimum energy per bit (or equivalently, $\dot{C}(0)$) is derived in Section IV-B; interestingly, it turns out to be STORM again. STORM is hence optimal in spectral efficiency in the wideband regime. Apart from providing fundamental limits on peak-limited MIMO noncoherent communications, our results and conclusions also temper the pessimistic conclusions that result from the consideration of noncoherent communication under just an average power constraint [6].

A. Achieving minimum energy per bit

In this section, the necessary and sufficient conditions for a constellation to achieve $\frac{E_b}{N_0}_{\min}$ are derived. First, the following definition and lemma are needed from optimization theory [28, 29].

Definition 4: A function $f$ is strictly quasiconcave over a convex set $A$ iff for any $x_1, x_2 \in A$, and for $0 < \theta < 1$,

$$f(\theta x_1 + (1 - \theta) x_2) > \min\{ f(x_1), f(x_2) \}. \tag{63}$$

Lemma 5: The global minimum of a strictly quasiconcave function $f$ over a compact convex set $A$ is attained at a point $x \in A$ only if $x$ is an extreme point of $A$.

Theorem 3: Consider a constellation $\mathcal{C}$ with non-zero matrices $\{X_i\}_{i=1}^{L-1}$ and respective probabilities $\{P_i\}_{i=1}^{L-1}$, and the zero matrix with probability $P_0$. Let $\mathcal{C}$ satisfy the average power constraint $E[\mathrm{tr}(XX^*)] = PT = E$ and the peak constraint (3) as in the peak-constrained problem. Then, $\mathcal{C}$ achieves the capacity per unit energy as $P \to 0$ iff its constellation matrices and respective probabilities are of the following form

$$X_i = \sqrt{K}\, v_i w_i^*, \quad 1 \le i \le L - 1 \tag{64}$$

$$X_0 = 0_{T \times N_t}, \tag{65}$$

$$\sum_{i=1}^{L-1} P_i = \frac{P}{K N_t} \tag{66}$$

$$P_0 = 1 - \frac{P}{K N_t}, \tag{67}$$

where for each $i$, $v_i \in \mathbb{C}^{T \times 1}$, $w_i \in \mathbb{C}^{N_t \times 1}$ and $|[v_i w_i^*]_{mn}| = 1 \;\forall\, i, m, n$. The capacity per unit energy achieved by the above constellation is

$$N_r \cdot \left( 1 - \frac{\log(1 + K N_t T)}{K N_t T} \right) \log_2 e \ \text{ bits/joule}. \tag{68}$$

Proof: Let the mutual information between $\mathcal{C}$ and the output $Y$ be denoted as $I(P)$ (in nats per dimension). It is known from [30] that to achieve the capacity per unit energy, it is sufficient to use one symbol apart from the zero energy symbol. Therefore, our formulation, which assumes a discrete input with an arbitrary number of points, is without any loss of generality.
The optimization problem that is to be solved is given as

$$\max_{\{P_i\}_{i=0}^{L-1},\, \{X_i\}_{i=1}^{L-1}} \dot{I}(0), \tag{69}$$

$$\text{subject to } \sum_{i=0}^{L-1} P_i \,\mathrm{tr}(X_i X_i^*) = PT, \quad \|X_i\|_\infty \le \sqrt{K} \ \forall i, \quad \sum_{i=1}^{L-1} P_i = 1 - P_0, \quad P_i \ge 0 \ \forall i.$$

A general formula for $\dot{I}(0)$ was derived in [6] and is given as

$$\dot{I}(0) = \lim_{P \to 0} \frac{E_X\!\left[ D\!\left( P_{Y|X} \,\|\, P_{Y|X=0} \right) \right]}{E_X[\mathrm{tr}(XX^*)]}. \tag{70}$$

Since

$$\max_{\{P_i\}_{i=0}^{L-1},\, \{X_i\}_{i=1}^{L-1}} \lim_{P \to 0} \frac{E_X\!\left[ D\!\left( P_{Y|X} \,\|\, P_{Y|X=0} \right) \right]}{E_X[\mathrm{tr}(XX^*)]} \;\le\; \lim_{P \to 0} \max_{\{P_i\}_{i=0}^{L-1},\, \{X_i\}_{i=1}^{L-1}} \frac{E_X\!\left[ D\!\left( P_{Y|X} \,\|\, P_{Y|X=0} \right) \right]}{E_X[\mathrm{tr}(XX^*)]}, \tag{71}$$

an upper bound for the optimal value of the problem in (69) is

$$\lim_{P \to 0} \max_{\{P_i\}_{i=0}^{L-1},\, \{X_i\}_{i=1}^{L-1}} \frac{E_X\!\left[ D\!\left( P_{Y|X} \,\|\, P_{Y|X=0} \right) \right]}{E_X[\mathrm{tr}(XX^*)]}, \tag{72}$$

$$\text{subject to } \sum_{i=0}^{L-1} P_i \,\mathrm{tr}(X_i X_i^*) = PT, \quad \|X_i\|_\infty \le \sqrt{K} \ \forall i, \quad \sum_{i=1}^{L-1} P_i = 1 - P_0, \quad P_i \ge 0 \ \forall i.$$

The objective function in (72) can be evaluated as

$$\frac{E_X\!\left[ D\!\left( P_{Y|X} \,\|\, P_{Y|X=0} \right) \right]}{E_X[\mathrm{tr}(XX^*)]} = N_r \cdot \left( 1 - \frac{\sum_{i=1}^{L-1} P_i \log\det(I + X_i X_i^*)}{\sum_{i=1}^{L-1} P_i \,\mathrm{tr}(X_i X_i^*)} \right). \tag{73}$$

Consequently, the problem that needs to be solved is

$$\min_{\{P_i\}_{i=0}^{L-1},\, \{X_i\}_{i=1}^{L-1}} \frac{\sum_{i=1}^{L-1} P_i \log\det(I + X_i X_i^*)}{\sum_{i=1}^{L-1} P_i \,\mathrm{tr}(X_i X_i^*)}, \tag{74}$$

$$\text{subject to } \sum_{i=0}^{L-1} P_i \,\mathrm{tr}(X_i X_i^*) = PT, \quad \|X_i\|_\infty \le \sqrt{K} \ \forall i, \quad \sum_{i=1}^{L-1} P_i = 1 - P_0, \quad P_i \ge 0 \ \forall i.$$

Relaxing the peak constraint, the optimal value of the problem in (74) over the signal constellation (but with the probabilities fixed) is lower bounded by the optimal value of the problem

$$\min_{\{X_i\}_{i=1}^{L-1},\, \{d_i\}_{i=1}^{L-1},\, \{P_i\}_{i=0}^{L-1}} \frac{\sum_{i=1}^{L-1} P_i \log\det(I + X_i X_i^*)}{\sum_{i=1}^{L-1} P_i d_i}, \tag{75}$$

$$\text{subject to } \sum_{i=0}^{L-1} P_i d_i = PT, \quad \mathrm{tr}(X_i X_i^*) = d_i, \quad 0 \le d_i \le K N_t T, \ \forall i.$$
The optimal values of problems (74) and (75) are the same iff the $\{X_i\}_{i=1}^{L-1}$ that solves (75) also satisfies $\|X_i\|_\infty \le \sqrt{K}, \ \forall i$. As in the PPAPR-constrained problem, the above problem can be solved as a two-stage optimization, where in the first stage, the probabilities $\{P_i\}_{i=0}^{L-1}$ are fixed and the constellation $\{X_i\}_{i=1}^{L-1}$ is optimized. In the second stage, the resulting objective function is optimized over $\{P_i\}_{i=0}^{L-1}$. Consider a fixed, feasible but otherwise arbitrary $\{P_i\}_{i=0}^{L-1}$. It can be verified that for each $i$,

$$\min_{\mathrm{tr}(X_i X_i^*) = d_i} \log\det(I + X_i X_i^*) = \log(1 + d_i), \tag{76}$$

is solved iff $X_i$ has unit rank. Therefore, the problem in (75) can be re-written as

$$\min_{\{d_i\}_{i=1}^{L-1},\, \{P_i\}_{i=0}^{L-1}} \frac{\sum_{i=1}^{L-1} P_i \log(1 + d_i)}{\sum_{i=1}^{L-1} P_i d_i}, \tag{77}$$

$$\text{subject to } \sum_{i=1}^{L-1} P_i d_i = PT, \quad 0 \le d_i \le K N_t T, \ \forall i.$$

Let $d = [d_1 \; d_2 \; \ldots \; d_{L-1}]^T$. Consider the set

$$A_t = \left\{ d : h(d) = \frac{\sum_{i=1}^{L-1} P_i \log(1 + d_i)}{\sum_{i=1}^{L-1} P_i d_i} > t, \ d_i \ge 0 \ \forall i, \ t \ge 0 \right\}. \tag{78}$$

Since $\sum_{i=1}^{L-1} P_i \log(1 + d_i) - t \sum_{i=1}^{L-1} P_i d_i$ is strictly concave for every real $t$, the set $A_t$ is convex. Therefore, considering any two points $d_1, d_2 \in A_t$ where $t = \min\{h(d_1), h(d_2)\}$ and using Definition 4, $\frac{\sum_{i=1}^{L-1} P_i \log(1 + d_i)}{\sum_{i=1}^{L-1} P_i d_i}$ is a strictly quasiconcave function of $d$. Hence, from Lemma 5, the solution of (77) is achieved at a vertex of the constraint set. Using Lemma 4, each vertex of the constraint set consists of $L - 1$ entries that are either $K N_t T$ or $0$, and exactly one entry $c$ such that $0 \le c \le K N_t T$. It can therefore be assumed, without loss of generality, that the optimal $d$ and the corresponding probabilities are

$$d = [\, \underbrace{K N_t T \; \ldots \; K N_t T}_{M \text{ times}} \;\; c \;\; 0 \,]^T, \quad 0 \le c \le K N_t T, \tag{79}$$

$$P = [\, P_1 \; \ldots \; P_M \;\; P_c \;\; P_0 \,]^T, \tag{80}$$

where, for convenience, the symbol $M$ is introduced to denote the number of entries in $d$ that are equal to $K N_t T$. Since the objective function is a symmetric function of $d$, the specific arrangement of the entries is immaterial. Using this structure for $d$, the problem in (77) can be re-written and bounded from below as

$$\min_{c,\, \{P_i\}_{i=0}^{L-1}} \frac{\sum_{i=1}^{M} P_i \log(1 + K N_t T) + P_c \log(1 + c)}{\sum_{i=1}^{M} P_i K N_t T + c P_c}, \tag{81}$$

$$\text{subject to } 0 \le c \le K N_t T, \quad \sum_{i=1}^{M} P_i K N_t T + c P_c = PT, \quad \sum_{i=1}^{M} P_i = 1 - P_0$$

$$\ge \min_{\substack{c,\, \{P_i\}_{i=0}^{L-1} \\ 0 \le c \le K N_t T}} \frac{\sum_{i=1}^{M} P_i \log(1 + K N_t T) + P_c \log(1 + c)}{\sum_{i=1}^{M} P_i K N_t T + c P_c}. \tag{82}$$

The problem in (82) is easily seen to be the minimization of a strictly quasiconcave function over $c$. Therefore, the solution has to be among the vertices of $0 \le c \le K N_t T$, i.e., either $c = 0$ or $c = K N_t T$. Notice that with either choice of $c$, the objective function is $\frac{\log(1 + K N_t T)}{K N_t T}$, and is independent of $\{P_i\}_{i=0}^{L-1}$. Therefore, the upper bound on the optimal value of the problem in (69) is

$$N_r \cdot \left( 1 - \frac{\log(1 + K N_t T)}{K N_t T} \right). \tag{83}$$

Since $d_i = K N_t T \ \forall i$, for equality to hold in the inequality leading to (75), it is necessary and sufficient that the non-zero matrices $\{X_i\}_{i=1}^{L-1}$ be of the form

$$X_i = \sqrt{K}\, v_i w_i^* \ \ \forall i, \tag{84}$$

where $v_i \in \mathbb{C}^{T \times 1}$, $w_i \in \mathbb{C}^{N_t \times 1}$ are such that $|[v_i w_i^*]_{mn}| = 1 \ \forall\, i, m, n$. By substituting (84) in (69), a lower bound on the optimal value of (69) is obtained, which coincides with the upper bound in (83), implying that (83) is the optimal value of the problem in (69). From the power constraint, $\sum_{i=1}^{L-1} P_i = \frac{P}{K N_t}$ must be true and $P_0 = 1 - \frac{P}{K N_t} > 0$. Therefore, it can be concluded that

$$\dot{C}(0) = N_r \cdot \left( 1 - \frac{\log(1 + K N_t T)}{K N_t T} \right). \tag{85}$$

Note that the capacity per unit energy in (68) is independent of the number of points $L$.
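The closed form in (68) and (85) is easy to evaluate numerically. The following sketch is illustrative only (the function names are hypothetical); it computes the capacity per unit energy in bits/joule and the corresponding minimum energy per bit in dB as a function of $K$, $N_t$, $T$ and $N_r$.

```python
import math

def capacity_per_unit_energy(K, Nt, T, Nr):
    """Evaluate (68): Nr * (1 - log(1 + K*Nt*T) / (K*Nt*T)) * log2(e), in bits/joule."""
    d = K * Nt * T  # normalized peak power K N_t T
    return Nr * (1.0 - math.log1p(d) / d) * math.log2(math.e)

def min_energy_per_bit_db(K, Nt, T, Nr):
    """Minimum energy per bit in dB, i.e. the reciprocal of (68)."""
    return -10.0 * math.log10(capacity_per_unit_energy(K, Nt, T, Nr))

# Normalized peak power equal to one (0.125 * 2 * 4 = 1), single receive antenna.
at_unit_peak = capacity_per_unit_energy(0.125, 2, 4, 1)

# Very large peak power: the value approaches Nr * log2(e).
at_large_peak = capacity_per_unit_energy(1e12, 1, 1, 1)
```

At $K N_t T = 1$ and $N_r = 1$ the value is $\log_2 e - 1 \approx 0.44$ bits/joule, and as $K$ grows it approaches $\log_2 e$, for which the minimum energy per bit is about $-1.59$ dB. Only the product $K N_t T$ and $N_r$ enter the result.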
In particular, it can be achieved with a 2-point constellation.

Corollary: The following two-point constellation achieves the capacity per unit energy as the average power $P \to 0$:

$$(X_1, P_1) = \left( \sqrt{K}\, v w^*, \ \frac{P}{K N_t} \right) \tag{86}$$

$$(X_2, P_2) = \left( 0_{T \times N_t}, \ 1 - \frac{P}{K N_t} \right), \tag{87}$$

where $v$ and $w$ are column vectors such that $|[v w^*]_{mn}| = 1 \ \forall\, m, n$.

The above 2-point constellation is referred to as MIMO-OOK (on-off keying). This constellation can also be obtained directly through a simplified general formula for the capacity per unit energy derived in [30]. It turns out that the simplified formula in [30] can be evaluated using techniques similar to those used in the proof of Theorem 3, and is also a more direct approach than the derivation of the capacity per unit energy in [31]. For the sake of completeness, it is given in Appendix-C.

Clearly, Theorem 3 implies that there is a large class of constellations which achieve $\frac{E_b}{N_0}_{\min}$. For instance, the cardinality can be any $L \ge 2$. Moreover, only the sum of probabilities of the non-zero points is constrained to be $\frac{P}{K N_t}$, while the individual probabilities can be arbitrary. Further, there is no restriction on the relationship between $X_i$ and $X_j$, $\forall j \ne i$. In particular, the $X_i$ can all be taken equal to a unit rank matrix $X$ with elements of equal magnitude (equal to $\sqrt{K}$) for all $i = 1, 2, \ldots, L - 1$. In this case, the non-zero points would coincide and become one non-zero point with probability $\frac{P}{K N_t}$, thereby reducing to the 2-point MIMO-OOK constellation of Corollary 3.

B. Maximizing the wideband slope

A key insight provided by [6] is that even though different schemes may achieve $\frac{E_b}{N_0}_{\min}$, an analysis of their wideband slopes could reveal vast differences in the rate of growth of their energy efficiencies around $\frac{E_b}{N_0}_{\min}$, and therefore differentiate their spectral efficiencies. The wideband slope, which is the measure of spectral efficiency at low but non-vanishing SNR, is therefore critical in the analysis of wideband channels. Our next aim is therefore to optimize the wideband slope over constellations which achieve $\frac{E_b}{N_0}_{\min}$. The next theorem provides a formula for the wideband slope $S_0$ when evaluated for an arbitrary generalized OOK constellation.

Theorem 4: Consider a constellation $\mathcal{C}$ with non-zero matrices $\{X_i\}_{i=1}^{M}$ and respective probabilities $\{P_i\}_{i=1}^{M}$, and the zero matrix with probability $P_0$. Then

$$S_0 = \begin{cases} \dfrac{2}{T N_r^2} \, \dfrac{\left( \sum_{i=1}^{M} P_i \,\mathrm{tr}(X_i X_i^*) - \sum_{i=1}^{M} P_i \log\det(I + X_i X_i^*) \right)^2}{\displaystyle \sum_{i=1}^{M} \frac{P_i^2}{(1 - P_0)^2} \frac{1}{\left| I - X_i X_i^* X_i X_i^* \right|^{N_r}} + \sum_i \sum_{j \ne i} \frac{P_i P_j}{(1 - P_0)^2} \frac{1}{\left| I - X_i X_i^* X_j X_j^* \right|^{N_r}} - 1}, & \text{if } I - X_i X_i^* X_j X_j^* \text{ is positive definite } \forall i, j \\[2ex] 0, & \text{otherwise.} \end{cases} \tag{88}$$

Proof: See Appendix-D.

The following corollary indicates a fundamental limitation in approaching the capacity per unit energy for a constellation of arbitrary cardinality.

Corollary 1: Consider a constellation $\mathcal{C}$ with non-zero matrices $\{X_i\}_{i=1}^{M}$ and respective probabilities $\{P_i\}_{i=1}^{M}$, and the zero matrix with probability $P_0$. Let $\mathcal{C}$ satisfy the average and peak power constraints in the statement of Theorem 3. Suppose $\mathcal{C}$ achieves the capacity per unit energy. Then the wideband slope $S_0$ is $0$ when $K N_t T > 1$.

Proof: Since $\mathcal{C}$ achieves the capacity per unit energy, it satisfies the necessary conditions stated in Theorem 3.
From Theorem 4, the wideband slope is non-zero only when the matrix

$$I - X_i X_i^* X_j X_j^* \tag{89}$$

is positive definite for all pairs $i, j$. The proof of the corollary follows when the necessary conditions for achieving the capacity per unit energy in Theorem 3 are substituted in (89) and simplified.

Theorem 5: Among all constellations of Theorem 3 which achieve $\frac{E_b}{N_0}_{\min}$ with $T + 1$ points, STORM has the maximum wideband slope.

Proof: Since the constellations under consideration achieve $\frac{E_b}{N_0}_{\min}$, the numerator in (88) is a fixed constant. Further, given the necessary conditions for the constellation to achieve $\frac{E_b}{N_0}_{\min}$, the denominator of the wideband slope can be simplified as

$$\sum_{i=1}^{M} \frac{P_i^2}{(1 - P_0)^2} \frac{1}{\left( 1 - K^2 N_t^2 T^2 \right)^{N_r}} + \sum_i \sum_{j \ne i} \frac{P_i P_j}{(1 - P_0)^2} \frac{1}{\left| I - X_i X_i^* X_j X_j^* \right|^{N_r}} - 1, \tag{90}$$

where the matrices $\{X_i\}_{i=1}^{M}$ are of unit rank with entries of equal magnitude $\sqrt{K}$, and $K N_t T < 1$ (to ensure that $I - X_i X_i^* X_j X_j^*$ is positive semidefinite $\forall i, j$). Clearly, (90) is minimized when there exist rank-one matrices $\{X_i\}_{i=1}^{M}$ such that $X_j^* X_i = 0 \ \forall i, j \ne i$. Such a set exists for $M \le T$, and is denoted by $X_i = \sqrt{K}\, v_i w_i^* \ \forall i$, where the definitions for $v_i$ and $w_i$ are the same as in Theorem 2. The problem that needs to be solved is thus

$$\min_{\substack{\{P_i\}_{i=1}^{M} \\ \sum_{i=1}^{M} P_i = 1 - P_0, \ \sum_{i=1}^{M} P_i = \frac{P}{K N_t}}} \ \sum_{i=1}^{M} \frac{P_i^2}{(1 - P_0)^2} \left( \frac{1}{\left( 1 - K^2 N_t^2 T^2 \right)^{N_r}} - 1 \right). \tag{91}$$

The objective function in (91) can be easily shown to be a Schur-convex function [32] of $[P_1 \; P_2 \; \ldots \; P_M]$. Hence, the minimum occurs when each of the probabilities $\{P_i\}_{i=1}^{M}$ is equal to $\frac{1 - P_0}{M}$. The optimal value of (91) is therefore

$$\frac{1}{M} \left( \frac{1}{\left( 1 - K^2 N_t^2 T^2 \right)^{N_r}} - 1 \right). \tag{92}$$

Clearly, $M$ has to be made as large as possible, but to ensure achievability of the optimal value in (91), it can be no greater than $T + 1$.
Therefore, set $M = T + 1$. Evidently, the solution to (91) when $M > T + 1$ would provide an upper bound on the maximum wideband slope.

Theorem 5 establishes the optimality of STORM among $T + 1$ point constellations in the peak-constrained case. This means that STORM is spectrally most efficient among all $T + 1$ (or fewer) point constellations that achieve maximum capacity per unit energy in the low SNR regime. The following corollary provides the wideband slopes of MIMO-OOK and STORM.

Corollary 2: The wideband slopes of MIMO-OOK and STORM are, respectively,

$$S_0^{\mathrm{OOK}} = \begin{cases} \dfrac{2}{T N_r^2} \, \dfrac{\left( K N_t T - \log(1 + K N_t T) \right)^2}{\dfrac{1}{\left( 1 - K^2 N_t^2 T^2 \right)^{N_r}} - 1} & \text{if } K N_t T < 1; \\[2ex] 0 & \text{if } K N_t T \ge 1. \end{cases} \tag{93}$$

$$S_0^{\mathrm{STORM}} = \begin{cases} \dfrac{2}{N_r^2} \, \dfrac{\left( K N_t T - \log(1 + K N_t T) \right)^2}{\dfrac{1}{\left( 1 - K^2 N_t^2 T^2 \right)^{N_r}} - 1} & \text{if } K N_t T < 1; \\[2ex] 0 & \text{if } K N_t T \ge 1. \end{cases} \tag{94}$$

Proof: The wideband slopes follow by substituting the MIMO-OOK and STORM constellations in the result of Theorem 4.

C. Remarks

Since STORM was obtained as the optimal constellation even in the PPAPR-constrained case, many of the remarks on STORM following Theorem 2 and in Section III-A apply to the peak-constrained case as well. Here we only state new insights pertinent to the peak-constrained case.

1. From (68), it is seen that $\lim_{K \to \infty} \dot{C}(0) = N_r$. Therefore, for asymptotically large peak powers, the well-known result on the capacity per unit energy with only an average power constraint [6], which is common to both coherent and noncoherent MIMO channels, is recovered. Indeed, when $N_r = 1$, we obtain the minimum energy to transmit one bit of information to be $-1.59$ dB, which is a classical result. By relaxing the peak constraint, STORM can be seen to be optimal even for the case when there is merely an average power constraint (or with respect to infinite bandwidth capacity).

2. When the signals are subject to just an average power constraint, it is shown in [6] that $S_0 = 0$ for the noncoherent MIMO channel. Therefore, signals whose energy per bit approaches $\frac{E_b}{N_0}_{\min}$ would have to have bandwidths that become prohibitively large. However, when there is an additional peak-power constraint $K$ which is a fixed constant, and for the case when the normalized peak power $K N_t T < 1$, Corollary 2 shows that $S_0$ is strictly positive. Hence, it is realistic to design signals that achieve $\frac{E_b}{N_0}_{\min}$ in this scenario for low but non-vanishing SNR. Similar insights were also noted in [12], but in the simpler context of the SISO Rician fading channel with unit block length under peak and average power constraints.

3. While both MIMO-OOK and STORM achieve $\frac{E_b}{N_0}_{\min}$, according to Corollary 2, the wideband slope of STORM is higher by a factor of $T$. This means that at a certain energy per bit and for the same transmission rate, and as SNR $\to 0$, the bandwidth needed by STORM for the same spectral efficiency is less than that of MIMO-OOK by a factor of $T$. Given typical values of the coherence time $T$, this higher spectral efficiency of STORM can translate into huge savings. To give a sense of the significant gains, Figures 1 and 2 plot the spectral efficiency vs. the energy per bit for STORM and MIMO-OOK.

4. Figures 3 and 4 plot the energy per bit and wideband slope of STORM vs. the normalized peak power $K N_t T$, for different values of $N_r$. As the normalized peak power increases, it is seen that $\frac{E_b}{N_0}_{\min}$ decreases. This is expected, as peakier signaling is more energy efficient. However, as the normalized peak power gets close to $1$, the wideband slope approaches $0$. In fact, the wideband slope attains its maximum at an intermediate value between $0$ and $1$ (say $K N_t T = c^*$).
Since for any point in the region $0 \le K N_t T \le c^*$ there is a point corresponding to $c^* \le K N_t T \le 1$ with lower $\frac{E_b}{N_0}_{\min}$ and the same wideband slope, it makes most sense to operate in the region $c^* \le K N_t T \le 1$. Assuming only an average power constraint, the analysis in [6] shows that $S_0 = 0$ for noncoherent communications. The scheme that achieves $\frac{E_b}{N_0}_{\min}$ has the non-zero signals migrating to $\infty$ in amplitude as $P \to 0$. The results in [6] show in effect that it is unrealistic to realize the peak-unconstrained minimum energy per bit (STORM having zero wideband slope for all $K N_t T \ge 1$ is clearly a stronger statement). Under realistic assumptions on the peak constraint however, it has been shown here that $S_0 > 0$ is possible when $K N_t T < 1$. Moreover, a sharp characterization is provided which shows that there is a tradeoff between $\frac{E_b}{N_0}_{\min}$ and $S_0$ for STORM in the region $c^* \le K N_t T \le 1$.

5. For the same number of bits transmitted reliably per joule at low SNR, MIMO-OOK requires an operating SNR which is $10 \log_{10} T$ dB smaller than that of STORM. This can be seen from the fact that the wideband slope of STORM is $T$ times that of MIMO-OOK and that the mutual information per joule is given as

$$\frac{I(P)}{P} = \dot{I}(0) (\log_2 e) + \frac{1}{2} \ddot{I}(0) (\log_2 e) P + o(P), \tag{95}$$

and the wideband slope is $S_0^I = \frac{2 [\dot{I}(0)]^2}{-\ddot{I}(0)}$. Now, since the peak power is a fixed constant, this implies that the PAPR of MIMO-OOK at any small but non-vanishing SNR would be greater than that of STORM by a factor of $T$. Since in the low SNR regime, peakiness of the signal constellations is a crucial factor, using STORM can potentially result in large reductions in the required PAPR and facilitate implementation. These large savings are illustrated in Figure 5, where the approximation of $\frac{I(P)}{P}$ vs. $P$ is plotted for STORM and MIMO-OOK. In the example shown, the convergence to the capacity per unit energy is faster for STORM by a factor of $10 \log_{10} T = 9$ dB relative to MIMO-OOK.

6. It has been shown in Corollary 1 that whenever $K N_t T > 1$, the wideband slope is $0$. Therefore, even though the noncoherent capacity per unit energy is $N_r \log_2 e$ bits/joule, it is prohibitively expensive (in terms of bandwidth) to reliably transmit at any rate more than the peak-constrained capacity per unit energy evaluated at $K N_t T = 1$, which from equation (68) is $N_r (\log_2 e - 1)$ bits/joule. Hence, the capacity per unit energy at $K N_t T = 1$ can be taken to be the realistic limit for noncoherent MIMO communication. Note that this limit is also $N_r$ bits/joule smaller than the coherent capacity per unit energy. Since the analysis of the noncoherent channel neither assumes any particular scheme for channel estimation nor ignores the resources for (implicit) channel estimation, the realistic capacity per unit energy of $N_r (\log_2 e - 1)$ bits/joule can be argued as being more fundamental than the coherent capacity per unit energy of $N_r \log_2 e$ bits/joule. The difference between the two can be thought of as the fundamental or minimal cost of (implicit) channel estimation.

7. The dependence of $\frac{E_b}{N_0}_{\min}$ on $K$, $N_t$ and $T$ is only through the product $K N_t T$. So, increasing one or more of these quantities has the effect of lowering $\frac{E_b}{N_0}_{\min}$. However, this effect is beneficial when $K N_t T < c^*$; beyond that, the tradeoff between energy efficiency and bandwidth efficiency quantified here allows a designer to choose a suitable operating point. To illustrate this point, Figure 6 plots the approximation of $\frac{I(P)}{P}$ vs. $P$ for different values of $K N_t T$.
It is evident from Figure 6 that as $K N_t T$ gets close to one, the number of bits transmitted reliably per joule converges to the capacity per unit energy at much smaller SNRs (and hence larger bandwidths). Since the PAPR of STORM at SNR $P$ is $\frac{K N_t}{P}$, it is interesting to note that when $K N_t T$ is fixed, increasing $T$ decreases the PAPR required for the same energy per bit, which is an advantage in practice. Increasing $N_t$ with $K N_t T$ fixed decreases $K$ and therefore reduces the peak power per antenna and time slot (though not changing the PAPR), which may also be helpful in practice.

8. An interesting observation from Figures 3 and 4 is that using more receive antennas always lowers $\frac{E_b}{N_0}_{\min}$, while it does not always increase the wideband slope. Figure 7 illustrates that while the approximation of $\frac{I(P)}{P}$ increases with $N_r$ in general, the convergence to the capacity per unit energy occurs more slowly, and hence a lower SNR is needed to operate close to it as $N_r$ increases.

9. Even though the optimal scheme for a cardinality greater than $T + 1$ is as yet unknown, STORM offers a concrete solution whose structure is also simple and practical. In [6], the positive impact on the wideband slope of using constellations with cardinality greater than two is illustrated in several contexts other than under the noncoherent assumption. Even so, following [30] and due to analytical convenience, many recent papers [12, 31, 33] in noncoherent communications focus on the two-point ON-OFF scheme to achieve the capacity per unit energy. The results in this section demonstrate that there are compelling reasons to look beyond the two-point ON-OFF scheme in the low SNR regime.

10. Recently, [34–37] have investigated the possibility of channel coherence length scaling with SNR, so as to diminish the cost of acquiring channel knowledge.
It should be interesting to pose and solve the optimization problems of this work under such scenarios.

V. CONCLUSION

We pose two important problems on reliable communications over noncoherent MIMO spatially i.i.d. Rayleigh fading channels at low SNR. In both formulations, we assume an average-power constraint on the input and a natural per-antenna, per-time slot peak-power constraint. In the first problem formulation, the peak-power to average-power ratio is held fixed (PPAPR-constrained) and the mutual information, which grows as $O(\mathrm{SNR}^2)$, is maximized up to second order jointly over input signal matrices and their respective probabilities, when the cardinality of the constellation is no greater than $T + 1$ ($T$ is the coherence blocklength). In the second problem formulation (peak-constrained), the peak power is a fixed constant independent of SNR. Here, necessary and sufficient conditions for a constellation of any cardinality to achieve the minimum energy/bit are derived. Over the set of all $T + 1$ point constellations which achieve the minimum energy/bit, we optimize the second-order behaviour of mutual information. The resulting constellations are both first- and second-order optimal among all $T + 1$ point constellations. Both the PPAPR-constrained and peak-constrained problems result in finite-dimensional non-convex optimization problems. Even so, they admit elegant solutions in closed form, which are identical in both formulations. We refer to this common solution as Space Time Orthogonal Rank-one Modulation (STORM), and it provides several new insights on noncoherent communications at low SNR.

In the PPAPR-constrained case, we show that the $T + 1$ point STORM is near-optimal with respect to the maximum mutual information up to second order with unconstrained cardinality, even for modest values of $T$ and PAPR.
Therefore, there is not much to be gained by using more than $T + 1$ points in the PPAPR-constrained case. In the peak-constrained case, our approach enables us to provide a sharp characterization of the first- and second-order behavior of noncoherent MIMO capacity, which also sheds light on the cost of implicit estimation of channel state information in the low SNR regime. The energy/bit and the wideband slope achieved by STORM also reveal a fundamental energy-vs-bandwidth efficiency tradeoff that enables the determination of the operating (low) SNR and peak power most suitable for a given application. Moreover, while the more conventional MIMO On-Off Keying (OOK) also achieves the minimum energy per bit, STORM has a wideband slope that is $T$ times greater, which translates into an increase in bandwidth efficiency (or a decrease in the PAPR) by a factor of $T$ in the wideband regime. Given typical values of the coherence blocklength $T$, these gains are potentially huge.

VI. ACKNOWLEDGEMENTS

The authors would like to thank the anonymous reviewers for their helpful comments on the originally submitted version of this paper.

APPENDIX

A. Proof of non-convexity

A simple argument is given to show that (25) is a non-convex optimization problem. We need the following definition of matrix convexity.

Definition 5: A function $f : \mathbb{R}^{n \times n} \to \mathbb{R}^{m \times m}$ is matrix convex with respect to matrix inequality if for any positive semidefinite $X_1, X_2$ and for any $\theta \in [0, 1]$

$$f(\theta X_1 + (1 - \theta) X_2) \preceq \theta f(X_1) + (1 - \theta) f(X_2). \tag{96}$$

Since $\{X_i\}_{i=1}^{L}$ is a set of complex matrices, the optimization over the signals amounts to an equivalent joint optimization over the real and imaginary parts of $X_i$ given by $X_i = \widehat{X}_i + j \widetilde{X}_i, \ \forall i$. In order to show that this joint optimization is non-convex, we will consider the contour given by $\widetilde{X}_i = 0, \ \forall i$.
With the imaginary parts being zero, the function in (25) becomes
$$\sum_i P_i (1 - P_i)\, \mathrm{tr}\left( \widehat{X}_i \widehat{X}_i^* \widehat{X}_i \widehat{X}_i^* \right).$$
It can be seen that $g(\widehat{X}) = \widehat{X}\widehat{X}^*$ is matrix-convex over $\widehat{X}$, and $h(A) = \mathrm{tr}(AA^*)$ is a non-decreasing convex function over positive semidefinite matrices $A$. Therefore, the composition $f(\widehat{X}) = h \circ g = \mathrm{tr}\left( \widehat{X}\widehat{X}^*\widehat{X}\widehat{X}^* \right)$ is a convex function over $\widehat{X}$ [29]. Further, since $\mathrm{tr}(\widehat{X}\widehat{X}^*)$ and $\|\widehat{X}\|_\infty$ are convex functions of $\widehat{X}$ [29], the constraints $\sum_i P_i\, \mathrm{tr}(\widehat{X}_i \widehat{X}_i^*) \le E$ and $\|\widehat{X}_i\|_\infty \le \sqrt{K}$ are convex sets in $\{\widehat{X}_i\}_{i=1}^{L}$. For an arbitrary but fixed set of probabilities $\{P_i\}_{i=1}^{L}$, the objective function is convex in $\{\widehat{X}_i\}_{i=1}^{L}$, while the constraint set is the intersection of convex sets and is hence convex. Therefore, the problem of optimizing (25) over $\{\widehat{X}_i\}_{i=1}^{L}$ is a convex maximization problem and not a convex optimization problem. Since for a fixed $\{P_i\}_{i=1}^{L}$, the problem of optimizing over $\{X_i\}_{i=1}^{L}$ is a non-convex optimization problem for the imaginary parts of $X_i$ fixed, the joint optimization over $\{P_i\}_{i=1}^{L}$ and $\{X_i\}_{i=1}^{L}$ is also non-convex.

B. A low complexity block decoder

In some applications, decoding of a block of symbols at a time may be required. This need arises for instance in uncoded systems, where there is no coding across blocks. Another possibility is when there is coding across blocks, but hard decision decoding is employed at the receiver so that the blocks of symbols are first decoded via the MAP rule following which the outer code is decoded. In all such cases, we show in this section that the optimal MAP decoding of STORM can be simplified using Fast Fourier Transform (FFT) or Fast Hadamard Transform (FHT) algorithms. Consider the $T+1$ point STORM as described in (11) and (12).
Let the received signal matrix be $R \in \mathbb{C}^{T \times N_r}$. The optimal MAP rule to decode a block at the receiver is
$$\hat{j} = \arg\max_j\; P_j\, p(R \mid X_j) \tag{97}$$
$$\phantom{\hat{j}} = \arg\max_j\; P_j\, \frac{\exp\left\{ -\mathrm{tr}\left( R^* \left( I_T + X_j X_j^* \right)^{-1} R \right) \right\}}{\pi^{T N_r} \left| I_T + X_j X_j^* \right|^{N_r}}. \tag{98}$$
For convenience, we will first find the maximum in (98) among the non-zero signal matrices, and then compare it with the metric for the zero matrix. Substituting STORM that is defined with permutation matrix $P$, we get that the maximum metric among non-zero matrices is
$$\max_{i=1,\ldots,T}\; \frac{E}{K N_t T^2}\, \frac{\exp\left\{ -\mathrm{tr}\left( Y^* \left( I_T + K N_t v_i v_i^* \right)^{-1} Y \right) \right\}}{\pi^{T N_r} \left| I_T + K N_t v_i v_i^* \right|^{N_r}}, \tag{99}$$
where $Y = P^* R$ is a sufficient statistic, which is simply the received matrix with the permutation removed. The term $(I_T + K N_t v_i v_i^*)^{-1}$ can be simplified by applying Woodbury's identity, i.e., using
$$(A + BCD)^{-1} = A^{-1} - A^{-1} B \left( C^{-1} + D A^{-1} B \right)^{-1} D A^{-1}.$$
Since $v_i^* v_i = T$, this gives $(I_T + K N_t v_i v_i^*)^{-1} = I_T - \frac{K N_t}{1 + K N_t T} v_i v_i^*$. Also, using the identity $|I + AB| = |I + BA|$, so that $|I_T + K N_t v_i v_i^*| = 1 + K N_t T$, (99) becomes
$$\max_{i=1,\ldots,T}\; \frac{E \exp\{ -\mathrm{tr}(Y Y^*) \}}{\pi^{T N_r} K N_t T^2 (1 + K N_t T)^{N_r}}\, \exp\left\{ \frac{K N_t}{1 + K N_t T}\, \mathrm{tr}\left( Y^* v_i v_i^* Y \right) \right\}. \tag{100}$$
Clearly, among the non-zero constellation matrices, the MAP metric is maximized when $\|Y^* v_i\|^2$ is maximized. Let $V$ be the $T$ dimensional DFT or Hadamard matrix. Then each row of the matrix $Y^* V$ would represent the DFT or Hadamard transform of the corresponding row of $Y^*$. The non-zero constellation matrix with the maximum MAP metric would therefore correspond to the column of $Y^* V$ with the maximum $\ell_2$-norm. The $N_r$ DFTs or Hadamard transforms involved can be efficiently computed using fast algorithms (FFTs and FHTs). Now, the metric corresponding to the zero matrix would be
$$\left( 1 - \frac{E}{K N_t T} \right) \frac{\exp\left( -\mathrm{tr}(Y Y^*) \right)}{\pi^{T N_r}}. \tag{101}$$
Since the factor $\exp(-\mathrm{tr}(YY^*))/\pi^{T N_r}$ is common to (100) and (101) for a given received signal, we can divide the metric in (100) by (101) and then take the natural logarithm of the resulting expression so that
$$\Omega_i = \ln\left[ \frac{E}{T (K N_t T - E)(1 + K N_t T)^{N_r}} \right] + \frac{K N_t}{1 + K N_t T}\, \mathrm{tr}\left( Y^* v_i v_i^* Y \right). \tag{102}$$
Now letting $i = \arg\max_{k=1,\ldots,T} \|Y^* v_k\|^2$, the final simplified decoding rule can be given as
$$\hat{j} = \begin{cases} i & \text{if } \Omega_i \ge 0 \\ T+1 & \text{if } \Omega_i < 0. \end{cases} \tag{103}$$

C. Derivation of MIMO-OOK

Theorem 6: The capacity per unit energy (in nats/joule) for the i.i.d. MIMO block Rayleigh fading channel with a peak power constraint on the input signal $\|X\|_\infty \le \sqrt{K}$ is
$$\dot{C}(0) = N_r \left[ 1 - \frac{\log(1 + K N_t T)}{K N_t T} \right], \tag{104}$$
and is achieved as $P \to 0$ by the two point constellation given as
$$(X_1, P_1) = \left( \sqrt{K}\, v w^*,\; \frac{P}{K N_t} \right) \tag{105}$$
$$(X_2, P_2) = \left( 0_{T \times N_t},\; 1 - \frac{P}{K N_t} \right), \tag{106}$$
where $v \in \mathbb{C}^{T \times 1}$, $w \in \mathbb{C}^{N_t \times 1}$ and $|[v w^*]_{mn}| = 1$ $\forall m, n$.

Proof: From [30], it is known that to achieve the channel capacity per unit energy, it is enough to transmit one non-zero symbol, given in (105), apart from the symbol $0$. Since we are dealing with a memoryless, discrete and matrix input channel (1) with the cost-function given by $b(X) = \mathrm{tr}(XX^*)$, the capacity per unit energy under a fixed peak power constraint is given by [30]
$$\dot{C}(0) = \sup_{\substack{X \ne 0 \\ \|X\|_\infty \le \sqrt{K}}} \frac{D\left( p(Y \mid X)\, \|\, p(Y \mid 0) \right)}{\mathrm{tr}(XX^*)}. \tag{107}$$
Using the expression for the Kullback-Leibler distance, which can be obtained easily (cf. [17]), we obtain
$$\dot{C}(0) = \sup_{\substack{X \ne 0 \\ \|X\|_\infty \le \sqrt{K}}} N_r \left[ 1 - \frac{\log\det\left( I_T + XX^* \right)}{\mathrm{tr}(XX^*)} \right] \tag{108}$$
$$\phantom{\dot{C}(0)} = \sup_{\substack{X \ne 0,\; d \\ \mathrm{tr}(XX^*) = d,\; \|X\|_\infty \le \sqrt{K}}} N_r \left[ 1 - \frac{\log\det\left( I_T + XX^* \right)}{d} \right]. \tag{109}$$
Let the matrix $XX^*$ have eigenvalues $\{\lambda_i\}_{i=1}^{T}$.
Then (109) can be upper bounded as
$$\dot{C}(0) \le \sup_{\substack{\{\lambda_i\}_{i=1}^{T},\; d \\ \sum_i \lambda_i = d,\; d \le K N_t T}} N_r \left[ 1 - \frac{\sum_i \log(1 + \lambda_i)}{d} \right] \tag{110}$$
$$\phantom{\dot{C}(0)} = \sup_{d \,:\, d \le K N_t T} N_r \left[ 1 - \frac{\log(1 + d)}{d} \right] \tag{111}$$
$$\phantom{\dot{C}(0)} = N_r \left[ 1 - \frac{\log(1 + K N_t T)}{K N_t T} \right]. \tag{112}$$
The expression in (111) is obtained by noting that since $-\sum_i \log(1 + \lambda_i)$ is a convex function of $[\lambda_1\; \lambda_2\; \ldots\; \lambda_T]^T$, the supremum in (110) is achieved at the extreme point $[d\; 0\; \ldots\; 0]^T$ by Lemma 2. Since $\left[ 1 - \frac{\log(1+d)}{d} \right]$ is a monotonically increasing function of $d$, we obtain (112) by substituting the maximum value of $d$. The inequality in (110) is achieved with equality when $X$ is of unit rank, $\mathrm{tr}(XX^*) = d$ and $\|X\|_\infty \le \sqrt{K}$. The supremum in (111) is achieved when $d = K N_t T$, and the unit rank $X$ satisfies both $\mathrm{tr}(XX^*) = K N_t T$ as well as $\|X\|_\infty \le \sqrt{K}$, which in turn is true iff it is of the form given in (105). To satisfy the average power constraint, set $P_1 = \frac{P}{K N_t}$.

D. Proof of Theorem 4

The results regarding generalized on-off signaling given in [6] are employed. In particular, note that Theorem 10 in [6] provides the $(E_b/N_0)_{\min}$ and $S_0$ achieved by a generalized on-off signaling scheme. For convenience, that result is summarized here. The generalized on-off signaling scheme has a $P_0$ mass at the all-zero matrix $0_{T \times N_t}$. The input pdf conditioned on the input being nonzero is denoted by $P_X$, with distribution $F_X$. With the input pdf conditioned on the all-zero matrix given by $\mathcal{P}_0$, the overall input pdf $\mathcal{P}_X$ is
$$\mathcal{P}_X = P_0\, \mathcal{P}_0 + (1 - P_0)\, P_X. \tag{113}$$
Denoting the pdf of the output conditioned on the input by $P_{Y|X}$, the output pdf corresponding to $P_X$ is given by
$$P_Y = \int P_{Y|X = X}\; dF_X(X). \tag{114}$$
The wideband slope $S_0$ achieved by generalized on-off signaling is
$$S_0 = \frac{2}{T}\, \frac{\left[ E_{P_X}\left\{ D\left( P_{Y|X}\, \|\, P_{Y|X=0} \right) \right\} \right]^2}{\Delta\left( P_Y\, \|\, P_{Y|X=0} \right)}, \tag{115}$$
where $\Delta(\cdot \| \cdot)$ denotes Pearson's $\chi^2$-divergence and is defined as
$$\Delta\left( P_Y\, \|\, P_{Y|X=0} \right) \triangleq E_{P_{Y|X=0}}\left[ \left( \frac{P_Y}{P_{Y|X=0}} - 1 \right)^2 \right]. \tag{116}$$
For the channel model under consideration in this paper, we have
$$P_Y = \sum_{i=1}^{M} \frac{P_i}{1 - P_0}\, \frac{1}{\pi^{T N_r} \left| I + X_i X_i^* \right|^{N_r}} \exp\left\{ -\mathrm{tr}\left( Y^* \left( I + X_i X_i^* \right)^{-1} Y \right) \right\},$$
$$P_{Y|X=0} = \frac{\exp\left( -\mathrm{tr}(Y^* Y) \right)}{\pi^{T N_r}},$$
and using the above expressions in (116), one obtains
$$\begin{aligned}
\Delta\left( P_Y\, \|\, P_{Y|X=0} \right) = E_{P_{Y|X=0}}\Bigg[ &\sum_{i=1}^{M} \frac{P_i^2}{(1 - P_0)^2}\, \frac{e^{2\,\mathrm{tr}\left( Y^* \left( I - (I + X_i X_i^*)^{-1} \right) Y \right)}}{\left| I + X_i X_i^* \right|^{2 N_r}} \\
&+ 2 \sum_{i,\, j \ne i} \frac{P_i P_j}{(1 - P_0)^2}\, \frac{e^{\mathrm{tr}\left( Y^* \left( 2I - (I + X_i X_i^*)^{-1} - (I + X_j X_j^*)^{-1} \right) Y \right)}}{\left| I + X_i X_i^* \right|^{N_r} \left| I + X_j X_j^* \right|^{N_r}} \\
&- 2 \sum_{i=1}^{M} \frac{P_i}{1 - P_0}\, \frac{e^{\mathrm{tr}\left( Y^* \left( I - (I + X_i X_i^*)^{-1} \right) Y \right)}}{\left| I + X_i X_i^* \right|^{N_r}} + 1 \Bigg]. 
\end{aligned} \tag{117}$$
The above expression can be evaluated using the result from [38] that if $z$ is $\mathcal{CN}(0, K)$ distributed, then $E_z\left[ \exp(z^* A z) \right] = \{\det(I - KA)\}^{-1}$ if $I - KA$ is positive definite. Otherwise, the expectation diverges. Applying this result to each of the $N_r$ independent columns of $Y$, (117) becomes
$$\begin{aligned}
\Delta\left( P_Y\, \|\, P_{Y|X=0} \right) = 1 &+ \sum_{i=1}^{M} \frac{P_i^2}{(1 - P_0)^2}\, \frac{1}{\left| I + X_i X_i^* \right|^{2 N_r} \left| I - \left( 2I - 2 (I + X_i X_i^*)^{-1} \right) \right|^{N_r}} \\
&+ \sum_{i,\, j \ne i} \frac{2 P_i P_j}{(1 - P_0)^2}\, \frac{1}{\left| I + X_i X_i^* \right|^{N_r} \left| I + X_j X_j^* \right|^{N_r} \left| (I + X_i X_i^*)^{-1} + (I + X_j X_j^*)^{-1} - I \right|^{N_r}} \\
&- 2 \sum_{i=1}^{M} \frac{P_i}{1 - P_0}\, \frac{1}{\left| I + X_i X_i^* \right|^{N_r} \left| I + X_i X_i^* \right|^{-N_r}},
\end{aligned} \tag{118}$$
if $I - X_i X_i^* X_j X_j^*$ is positive definite $\forall i, j$, and $\infty$ otherwise. Simplification of (118) results in $\Delta\left( P_Y\, \|\, P_{Y|X=0} \right)$ given in Theorem 4.

REFERENCES

[1] C. E. Shannon, “Communication in the Presence of Noise,” Proceedings of the IRE, vol. 37, no. 1, pp. 10–21, Jan. 1949.
[2] R. S. Kennedy, Fading Dispersive Communication Channels, Wiley and Sons, 1969.
[3] I. Jacobs, “The asymptotic behavior of incoherent M-ary communication systems,” Proceedings of the IEEE, vol. 51, no. 1, pp. 251–252, Jan. 1963.
[4] R. G. Gallager, Information Theory and Reliable Communication, Wiley and Sons, 1968, Section 8.6.
[5] I. E. Telatar and D. N. C. Tse, “Capacity and mutual information of wideband multipath fading channels,” IEEE Trans. Inform. Theory, vol. 46, no. 4, pp. 1384–1400, July 2000.
[6] Sergio Verdú, “Spectral efficiency in the wideband regime,” IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1319–1343, June 2002, Special Issue on Shannon Theory: Perspective, Trends, and Applications.
[7] M. Medard and R. G. Gallager, “Bandwidth Scaling for fading multipath channels,” IEEE Trans. Inform. Theory, vol. 48, no. 4, pp. 840–852, Apr. 2002.
[8] V. G. Subramanian and B. Hajek, “Broad-band fading channels: signal burstiness and capacity,” IEEE Trans. Inform. Theory, vol. 48, no. 4, pp. 809–827, Apr. 2002.
[9] Chaitanya Rao and Babak Hassibi, “Analysis of multiple-antenna wireless links at Low SNR,” IEEE Trans. Inform. Theory, vol. 50, no. 9, pp. 2123–2130, Sept. 2004.
[10] V. Prelov and Sergio Verdú, “Second-order asymptotics of mutual information,” IEEE Trans. Inform. Theory, vol. 50, no. 8, pp. 1567–1580, Aug. 2004.
[11] Bruce Hajek and V. Subramaniam, “Capacity and reliability function for small peak signal constraints,” IEEE Trans. Inform. Theory, vol. 48, no. 4, pp. 828–839, Apr. 2002.
[12] Mustafa Cenk Gursoy, H. Vincent Poor, and Sergio Verdú, “Noncoherent Rician Fading channel - Part II: Spectral Efficiency in the Low-Power Regime,” IEEE Trans. Wireless Commun., vol. 4, no. 5, pp. 2207–2221, Sept. 2005.
[13] Ibrahim C. Abou-Faycal, Mitchell D. Trott, and Shlomo Shamai (Shitz), “The Capacity of discrete-time memoryless Rayleigh fading channels,” IEEE Trans. Inform. Theory, vol. 47, no. 4, pp. 1290–1301, May 2001.
[14] Mustafa Cenk Gursoy, H. Vincent Poor, and Sergio Verdú, “The Noncoherent Rician Fading Channel - Part I: Structure of the Capacity-Achieving Input,” IEEE Trans. Wireless Commun., vol. 4, no. 5, pp. 2193–2206, Sept. 2005.
[15] Jianyi Huang and Sean P. Meyn, “Characterization and Computation of Optimal Distributions for Channel Coding,” IEEE Trans. Inform. Theory, vol. 51, no. 7, pp. 2336–2351, July 2005.
[16] Shivratna Giri Srinivasan and Mahesh K. Varanasi, “Constellation Design for the Noncoherent MIMO Rayleigh-Fading Channel at General SNR,” IEEE Trans. Inform. Theory, vol. 53, no. 4, pp. 1572–1584, Apr. 2006.
[17] M. J. Borran, Ashutosh Sabharwal, and B. Aazhang, “On design criteria and construction of noncoherent space-time constellations,” IEEE Trans. Inform. Theory, vol. 49, no. 10, pp. 2332–2351, Oct. 2003.
[18] T. L. Marzetta and Bertrand M. Hochwald, “Capacity of a mobile multiple-antenna communication link in Rayleigh flat fading,” IEEE Trans. Inform. Theory, vol. 45, no. 1, pp. 139–157, Jan. 1999.
[19] J. M. Wozencraft and R. S. Kennedy, “Modulation and demodulation for probabilistic coding,” IEEE Trans. Inform. Theory, vol. 12, pp. 291–297, July 1966.
[20] J. L. Massey, “Coding and modulation in digital communication,” in Proc. Intl. Zurich Seminar on Communications, Zurich, Switzerland, 1974.
[21] Alfred O. Hero, III and Thomas L. Marzetta, “Cut-off rate and signal design for the Rayleigh fading space–time channel,” IEEE Trans. Inform. Theory, vol. 47, no. 6, pp. 2400–2416, Sept. 2001.
[22] D. Agrawal, T. J. Richardson, and R. L. Urbanke, “Multiple-antenna signal constellations for fading channels,” IEEE Trans. Inform. Theory, vol. 47, no. 6, pp. 2618–2626, Sept. 2001.
[23] B. M. Hochwald, Thomas L. Marzetta, T. J. Richardson, W. Sweldens, and R. Urbanke, “Systematic design of unitary space–time constellations,” IEEE Trans. Inform. Theory, vol. 46, no. 6, pp. 1962–1973, Sept. 2000.
[24] Vahid Tarokh, Nambirajan Seshadri, and A. Robert Calderbank, “Space–time codes for high data rate wireless communications: Performance criterion and code construction,” IEEE Trans. Inform. Theory, vol. 44, no. 2, pp. 744–765, Mar. 1998.
[25] Shivratna Giri Srinivasan and Mahesh K. Varanasi, “STORM: Optimal constellations for noncoherent MIMO communications at low SNR under PAPR constraints,” in Proc. Allerton Conf. on Comm. Control, and Comput., Monticello, Illinois, Sept. 2006.
[26] Shivratna Giri Srinivasan and Mahesh K. Varanasi, “Mutual information optimal constellations for the low SNR noncoherent MIMO Rayleigh fading channel,” in Proc. IEEE Intl. Symposium on Information Theory, Nice, France, June 2007.
[27] Vignesh Sethuraman, Ligong Wang, Bruce Hajek, and Amos Lapidoth, “Low SNR Capacity of Fading Channels - MIMO and Delay Spread,” in Proc. IEEE Intl. Symposium on Information Theory, Nice, France, June 2007.
[28] Reiner Horst, Panos M. Pardalos, and Nguyen V. Thoai, Introduction to Global Optimization, Kluwer, 2000.
[29] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, U.K., 2004.
[30] Sergio Verdú, “On channel capacity per unit cost,” IEEE Trans. Inform. Theory, vol. 36, no. 5, pp. 1019–1030, Sept. 1990.
[31] X. Wu and R. Srikant, “MIMO Channels in the Low SNR Regime: Communication Rate, Error Exponent and Signal Peakiness,” IEEE Trans. Inform. Theory, Apr. 2007.
[32] A. W. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications, Academic Press, New York, 1979.
[33] Vignesh Sethuraman and Bruce Hajek, “Capacity per unit energy of fading channels with a peak constraint,” IEEE Trans. Inform. Theory, vol. 51, no. 9, pp. 3102–3120, Sept. 2005.
[34] Lizhong Zheng, David N. C. Tse, and Muriel Medard, “Channel coherence in the low SNR regime,” IEEE Trans. Inform. Theory, vol. 53, no. 3, pp. 976–997, Mar. 2007.
[35] Siddharth Ray, Muriel Medard, and Lizhong Zheng, “On non-coherent MIMO channels in the wideband regime: Capacity and reliability,” IEEE Trans. Inform. Theory, vol. 53, no. 6, pp. 1983–2009, June 2007.
[36] Vasanthan Raghavan, Gautham Hariharan, and Akbar Sayeed, “Capacity of sparse multipath channels in the ultra-wideband regime,” IEEE Journal on Selected Topics in Signal Processing, vol. 1, no. 3, pp. 357–371, Oct. 2007.
[37] Gautham Hariharan and Akbar Sayeed, “Non-coherent capacity and reliability of sparse multipath channels in the wideband regime,” in Information Theory and Applications Workshop, San Diego, Jan. 2007.
[38] G. L. Turin, “The characteristic function of hermitian quadratic forms in complex normal variables,” Biometrika, vol. 47, pp. 199–201, June 1960.

Fig. 1. Plot of spectral efficiency vs. energy/bit of STORM using the second order approximation of $I(P)$ in (95), for different values of $K N_t T$. (Plot title: Spectral Efficiency vs. Energy per Bit, $T = 8$, $N_r = 2$. Horizontal axis: Normalized Energy per Bit Eb/N0 (bits/Joule). Vertical axis: Spectral Efficiency (bits/s/Hz). Curves: MIMO-OOK and STORM for $K N_t T = 0.6$ and $K N_t T = 0.8$.)

Fig. 2. Plot of spectral efficiency vs. energy/bit of STORM using the second order approximation of $I(P)$ in (95), for different values of $N_r$. (Plot title: Normalized peak power $K N_t T = 0.7$, $T = 8$. Horizontal axis: Normalized Energy per bit Eb/N0 (bits/Joule). Vertical axis: Spectral Efficiency (bits/s/Hz). Curves: MIMO-OOK and STORM for $N_r = 2$ and $N_r = 3$.)
Fig. 3. Energy per bit (dB) of STORM vs. $K N_t T$ for different $N_r$. (Plot title: Minimum energy per bit for STORM. Horizontal axis: $K N_t T$. Curves: $N_r = 1, 2, 3, 4$.)

Biography of Shivratna G. Srinivasan

Shivratna Giri Srinivasan (S'07) received the B.Tech. degree in electrical engineering from the Indian Institute of Technology, Madras, in 2002. He received the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Colorado at Boulder in 2005 and 2007, respectively. In September 2007, he joined Qualcomm, Inc., San Diego, as a Senior Engineer and is working on 4G wireless modem designs.

Biography of Mahesh K. Varanasi

Mahesh K. Varanasi (S'87–M'89–SM'95) received the Ph.D. degree in electrical engineering from Rice University, Houston, TX, in 1989. He joined the Electrical and Computer Engineering department of the University of Colorado at Boulder in 1989 as an Assistant Professor, where he was later an Associate Professor during 1996–2001 and has been a Professor since 2001. His research and teaching interests are in the areas of communication and information theory, wireless communication and coding, detection and estimation theory, and signal processing. He has published on a variety of topics in these fields and is a Highly Cited Researcher in the “Computer Science” category according to the ISI Web of Science. He is currently serving as an Editor for the IEEE Transactions on Wireless Communications.

Fig. 4. Wideband slope of STORM vs. $K N_t T$ for different $N_r$. (Plot title: Wideband slope of STORM. Horizontal axis: $K N_t T$. Curves: $N_r = 1, 2, 3, 4$.)
Fig. 5. Plot comparing the first order approximation of $I(P)/P$ in (95) vs. $P$ for STORM and MIMO-OOK for different values of $K N_t T$. (Plot title: Bits transmitted reliably per joule, $T = 8$, $N_r = 2$. Horizontal axis: SNR (dB). Vertical axis: bits/joule. Curves: capacity per unit energy, MIMO-OOK and STORM for $K N_t T = 0.7$ and $K N_t T = 0.9$.)

Fig. 6. Plot of the first order approximation of $I(P)/P$ in (95) vs. $P$ for STORM, and its convergence to the capacity per unit energy, for different values of $K N_t T$. (Plot title: Bits transmitted reliably per joule, STORM, $N_r = 2$. Horizontal axis: SNR (dB). Vertical axis: bits/joule. Curves: capacity per unit energy and STORM for $K N_t T = 0.5$, $0.7$ and $0.95$.)

Fig. 7. Plot of the first order approximation of $I(P)/P$ in (95) vs. $P$ for STORM, for different values of $N_r$. (Plot title: Bits transmitted reliably per joule, STORM, $K N_t T = 0.7$. Horizontal axis: SNR (dB). Vertical axis: bits/joule. Curves: capacity per unit energy and STORM for $N_r = 1, 2, 3$.)
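The capacity per unit energy that serves as the reference level in the bits-per-joule plots is the closed form of Theorem 6, eq. (104). The following is a minimal sketch that evaluates it numerically. The function names are hypothetical and not taken from the paper. The conversion to minimum energy per bit uses the standard relation $(E_b/N_0)_{\min} = \ln 2 / \dot{C}(0)$ from [6], [30] and assumes unit noise variance, which is an assumption of this sketch rather than something stated in this section.

```python
import math

LN2 = math.log(2.0)


def capacity_per_unit_energy(n_r, peak):
    """Eq. (104): N_r * (1 - log(1 + K*N_t*T) / (K*N_t*T)), in nats/joule.

    `peak` stands for the normalized peak power K*N_t*T and must be positive.
    """
    if peak <= 0:
        raise ValueError("peak = K*N_t*T must be positive")
    return n_r * (1.0 - math.log1p(peak) / peak)


def min_energy_per_bit_db(n_r, peak):
    """Minimum energy per bit in dB, assuming (Eb/N0)_min = ln(2) / C_dot(0)."""
    return 10.0 * math.log10(LN2 / capacity_per_unit_energy(n_r, peak))


if __name__ == "__main__":
    for peak in (0.5, 0.7, 0.9):
        c_nats = capacity_per_unit_energy(2, peak)
        print(f"K*N_t*T = {peak}: {c_nats:.4f} nats/joule, "
              f"{c_nats / LN2:.4f} bits/joule, "
              f"Eb/N0 min = {min_energy_per_bit_db(2, peak):.2f} dB")
```

Because $1 - \log(1+d)/d$ is increasing in $d$, a larger allowed peak power always raises the value returned by `capacity_per_unit_energy`, which is the monotonicity used to pass from (111) to (112).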
(Additional plot, no caption recovered. Title: Bits transmitted reliably per joule, STORM. Horizontal axis: $K N_t T$. Vertical axis: bits/joule. Curves: capacity per unit energy and SNR = −20 dB, −17 dB, −14 dB, −11 dB.)

(Additional plot, no caption recovered. Title: Bits transmitted reliably per joule, $T = 4$, $N_r = 1$. Horizontal axis: $K N_t T$. Vertical axis: bits/joule. Curves: capacity per unit energy, MIMO OOK and STORM at SNR = −20 dB and SNR = −17 dB.)

(Additional plot, no caption recovered. Title: Bits transmitted reliably per unit energy, $T = 8$, SNR = −20 dB. Horizontal axis: $K N_t T$. Vertical axis: bits/joule. Curves: capacity per unit energy, STORM, MIMO OOK.)
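The block decoder of Appendix B, eqs. (100) to (103), can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes $T$ is a power of two, that the STORM vectors are the unnormalized DFT columns $v_k[t] = e^{j 2\pi k t / T}$ (the exact definition in (11) and (12) is not reproduced in this section), that the permutation has already been removed so the input is $Y = P^* R$, and that $0 < E < K N_t T$. Indices are 0-based, with the value `T` standing for the all-zero matrix. The names `fft` and `storm_map_decode` are hypothetical, and the small pure-Python FFT is included only to keep the example self-contained.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT: X[k] = sum_t x[t] * exp(-2j*pi*k*t/n), n a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def storm_map_decode(Y, E, K, Nt):
    """Simplified MAP rule (103) for the (T+1)-point constellation.

    Y is a list of T rows, each holding N_r complex samples (permutation removed).
    Returns k in 0..T-1 for the non-zero point built on DFT column k (assumed
    convention), or T for the all-zero matrix.
    """
    T = len(Y)
    Nr = len(Y[0])
    a = K * Nt
    # energy[k] = ||Y^* v_k||^2: one length-T FFT per receive antenna, summed over antennas.
    energy = [0.0] * T
    for n in range(Nr):
        column = [Y[t][n] for t in range(T)]
        for k, val in enumerate(fft(column)):
            energy[k] += abs(val) ** 2
    k_best = max(range(T), key=energy.__getitem__)
    # Omega from (102): log prior/determinant term plus scaled correlation energy.
    bias = math.log(E / (T * (a * T - E) * (1.0 + a * T) ** Nr))
    omega = bias + a / (1.0 + a * T) * energy[k_best]
    return k_best if omega >= 0 else T


if __name__ == "__main__":
    T, K, Nt, E = 8, 0.0875, 1, 0.01
    v3 = [cmath.exp(2j * math.pi * 3 * t / T) for t in range(T)]
    Y = [[10 * v3[t], 10j * v3[t]] for t in range(T)]  # strong signal along v_3, N_r = 2
    print(storm_map_decode(Y, E, K, Nt))
```

The cost is $N_r$ transforms of length $T$ plus one comparison against zero, instead of evaluating the full metric (98) for each of the $T+1$ candidates. A Hadamard-based variant would replace `fft` with a fast Hadamard transform and leave the rest unchanged.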
