Communication under Strong Asynchronism
Aslan Tchamkerten, Venkat Chandar, and Gregory Wornell
Electrical Engineering and Computer Science Department, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Email: {tcham,vchandar,gww}@mit.edu

Abstract

We consider asynchronous communication over point-to-point discrete memoryless channels without feedback. The transmitter starts sending one block codeword at an instant that is uniformly distributed within a certain time period, which represents the level of asynchronism. The receiver, by means of a sequential decoder, must isolate the message without knowing when the codeword transmission starts, but being cognizant of the asynchronism level. We are interested in how quickly the receiver can isolate the sent message, particularly in the regime where the asynchronism level is exponentially larger than the codeword length, which we refer to as 'strong asynchronism.' This model of sparse communication might represent the situation of a sensor that remains idle most of the time and only occasionally transmits information to a remote base station, which needs to take action quickly. Because of the limited amount of energy the sensor possesses, and assuming the same cost per transmitted symbol, it is of interest to consider minimum-size codewords for a given asynchronism level. The first result is an asymptotic characterization of the largest asynchronism level, in terms of the codeword length, for which reliable communication can be achieved: vanishing error probability can be guaranteed as the codeword length N tends to infinity while the asynchronism level grows as e^{Nα} if and only if α does not exceed the synchronization threshold, a constant that admits a simple closed-form expression and is at least as large as the capacity of the synchronized channel.
The second result is the characterization of a set of achievable strictly positive rates in the regime where the asynchronism level is exponential in the codeword length, and where the rate is defined with respect to the expected (random) delay between the time information starts being emitted and the time the receiver makes a decision. Interestingly, this achievability result is obtained by a coding strategy whose decoder not only operates asynchronously, but also has an almost universal decision rule, in the sense that it is almost independent of the channel statistics. As an application of the first result we consider antipodal signaling over an additive Gaussian channel and derive a simple necessary condition relating blocklength, asynchronism level, and SNR for achieving reliable communication.

Index Terms: Asynchronous communication, detection and isolation problem, discrete-time communication, error exponent, low probability of detection, point-to-point communication, quickest detection, sequential analysis, sparse communication, stopping times

I. INTRODUCTION

A common assumption in information theory is that 'whenever the transmitter speaks the receiver listens.' In other words, there is in general an assumption of perfect synchronization between the transmitter and the receiver, and basic quantities, such as the channel capacity, are defined under this hypothesis [13]. In practice this assumption is rarely fulfilled. Time uncertainty due, for instance, to bursty sources of information often causes asynchronous communication, i.e., communication for which the receiver has only partial knowledge of when information is sent.

This work was supported in part by NSF under Grant No. CCF-0515122, and by a University IR&D Grant from Draper Laboratory.

Fig. 1. Communication is carried over a discrete memoryless channel Q(·|·) with input X and output Y.
When 'no information' is sent, the input of the channel is the '⋆' symbol.

There are, however, notable channels for which asynchronism effects have been studied from an information theoretic standpoint. An example is the multiple access channel (see, e.g., [3], [9], [12], [16]), for which the capacity region has been computed under various assumptions on the users' asynchronism. Another important example is the insertion, deletion, and substitution channel, for which only bounds on the capacity are known (see, e.g., [1], [7], [8], [6]).

In this paper we propose an information theoretic framework that models users' asynchronism for point-to-point discrete-time communication without feedback. We consider the situation where the transmitter may start sending information at a time unknown to the receiver. The time at which transmission starts is assumed to be uniformly distributed within a certain interval, which defines the asynchronism level between the transmitter and the receiver. A suitable notion of rate is introduced, and scaling laws between block message size and asynchronism level are given for which reliable communication can or cannot be achieved.

Our first result is the characterization of the highest asynchronism level, with respect to the codeword length, under which reliable communication can still be achieved. This limit is attained by a coding strategy that operates at vanishing rate. This strategy also allows for communication at positive rates while operating at asynchronism levels that are exponentially larger than the codeword length.

In Section II we formally introduce our model and draw connections with the related 'detection and isolation' problem in sequential analysis. Section III contains our main results, Section IV is devoted to the proofs, and we end with final remarks in Section V.
The proofs often make use of large deviations type bounding techniques, for which we refer the reader to [5, Chapters 1.1 and 1.2] or [4, Chapter 12].

II. PROBLEM FORMULATION AND BACKGROUND

We consider discrete-time communication over a discrete memoryless channel characterized by its finite input and output alphabets X and Y, respectively, its transition probability matrix Q(y|x) for all y ∈ Y and x ∈ X, and a 'noise' symbol ⋆ ∈ X (see Fig. 1). Throughout the paper we always assume that for all y ∈ Y there is some x ∈ X for which Q(y|x) > 0. The codebook consists of M ≥ 2 equally likely codewords of length N composed of symbols from X (possibly including the ⋆ symbol). The transmission of a particular codeword starts at a random time ν, independent of the codeword to be sent, uniformly distributed in {1, 2, ..., A}, where the integer A ≥ 1 characterizes the asynchronism level. We assume that the receiver knows A but not ν. If A = 1 the channel is said to be synchronized. Throughout the paper, whenever we refer to the capacity of a channel, it is intended to be the capacity of the synchronized channel, and we only consider channels Q with strictly positive capacity C(Q). (We speak of 'reliable communication' whenever arbitrarily low error probability can be achieved.)

Before and after the transmission of the information, i.e., before time ν and after time ν + N − 1, the receiver observes noise. Specifically, conditioned on the value of ν and on the message m to be conveyed, the receiver observes independent symbols Y₁, Y₂, ... distributed as follows. If i ≤ ν − 1 or i ≥ ν + N, the distribution is Q(·|⋆). At any time i ∈ {ν, ν + 1, ..., ν + N − 1} the distribution is Q(·|c_{i−ν+1}(m)), where c_n(m) denotes the nth symbol of the codeword c^N(m) assigned to message m.

Fig. 2. Time representation of what is sent (upper arrow) and what is received (lower arrow). The '⋆' represents the 'noise' symbol. At time ν message m starts being sent and decoding occurs at time τ.

The decoder consists of a sequential test (τ, φ), where τ is a stopping time with respect to the output sequence Y₁, Y₂, ... indicating when decoding happens, and where φ denotes a decision rule that declares the decoded message (see Fig. 2). We are interested in reliable and quick decoding. To that aim we first define the average decoding error probability as

P(E) = (1/A)(1/M) Σ_{m=1}^{M} Σ_{l=1}^{A} P_{m,l}(E),

where E denotes the event that the decoded message does not correspond to the sent message, and where the subscripts m, l indicate conditioning on the event that message m starts being sent at time l. Second, we define the average communication rate with respect to the average delay it takes the receiver to react to a sent message, i.e.,

R = ln M / E(τ − ν)^+    (1)

with

E(τ − ν)^+ ≜ (1/A)(1/M) Σ_{m=1}^{M} Σ_{l=1}^{A} E_{m,l}(τ − l)^+,

where x^+ denotes max{0, x} and E_{m,l} denotes the expectation with respect to P_{m,l}.

With the above definitions we now introduce the notion of achievable rate with respect to a certain asynchronism level, as well as the notion of synchronization threshold.

Definition 1. An asynchronism exponent α is achievable at a rate R if, for any ε > 0, there exists a block code with (sufficiently large) codeword length N, operating under asynchronism level A = e^{(α−ε)N}, while yielding a rate at least as large as R − ε and an error probability P(E) ≤ ε. The supremum of the set of asynchronism exponents that are achievable at rate R is denoted α(R, Q).

Note that, for a given channel Q, the asynchronism exponent function α(R, Q) is non-increasing in R.

Definition 2.
The synchronization threshold of a channel Q, denoted by α(Q), is the supremum of the set of achievable asynchronism exponents over all rates, i.e., α(Q) = α(R = 0, Q).

(Recall that a stopping time τ is an integer-valued random variable with respect to a sequence of random variables {Y_i}_{i=1}^∞ such that the event {τ = n}, conditioned on {Y_i}_{i=1}^n, is independent of {Y_i}_{i=n+1}^∞ for all n ≥ 1. Formally, φ is an F_τ-measurable map, where F₁, F₂, ... is the natural filtration induced by the process Y₁, Y₂, .... Here ln denotes the natural logarithm.)

(In our model, one message is sent in a certain interval with probability one. An interesting extension, which we do not consider, is to give some probability to the event that no message is sent: the receiver would know that with probability 1 − p a message starts being sent within a certain interval and that with probability p no message is sent.)

Throughout the paper we often use the terminology 'coding strategy' or 'coding scheme' to denote an infinite sequence of codebook/decoder pairs indexed by the blocklength. In particular, whenever we refer to a coding strategy that 'achieves a certain rate,' this is intended asymptotically in the limit N → ∞.

Let us comment on the above bursty communication model and its associated notions of rate and synchronization threshold. First, observe that we do not introduce a feedback channel from the receiver to the transmitter. With noiseless feedback it would be possible to inform the transmitter of the receiver's decoding time, say in the form of ack/nack, therefore allowing the sending of multiple messages instead of just one as in our model. Here the noiselessness assumption is crucial.
If the feedback is noisy, the receiver's decision may be wrongly recognized by the transmitter, which may result in a loss of message synchronization between transmitter and receiver (say, the receiver has not yet decoded the first message while the transmitter has already started to emit the second one). Therefore, in order to avoid a potential second source of asynchronism, we omit feedback in our study and limit transmission to only one message.

The definition of the rate with respect to the average delay E(τ − ν)^+ (see (1)) is motivated by the following considerations. At first sight, a natural measure of delay may be the codeword length N. However, in light of the use of sequential decoding, the codeword length does not provide a measure of the delay needed for the information to be reliably decoded. Another candidate for the delay one might consider is E(τ) or, equivalently, E(ν) + E(τ − ν). The fact that this delay takes into account the initial offset E(ν) can be regarded as a weakness, since this offset can be influenced neither by the transmitter nor by the receiver. Also, with such a delay measure, in the regime of positive asynchronism exponents we are interested in, the rate is always (asymptotically) vanishing for any reliable coding strategy. Instead, we propose to consider E(τ − ν)^+, the average time the transmitter needs to wait until the receiver makes a decision.

Also note that, in the definition of achievable rate (Definition 1), we choose to grow A with N. Indeed, when A is fixed the problem becomes trivial: by using sufficiently long codewords and simply decoding at the (fixed) time A + N − 1, the asynchronism effect on the rate can be made negligible.

We now briefly discuss the notion of synchronization threshold. This threshold is defined with respect to zero rate coding strategies, that is, strategies for which ln M / E(τ − ν)^+ tends to zero (as N → ∞).
However, because E(τ − ν)^+ and N need not coincide in general, zero rate coding strategies need not, in general, yield a vanishing ratio ln M / N as N tends to infinity. Indeed, as we will see, one can operate arbitrarily closely to the synchronization threshold while keeping ln M / N asymptotically bounded away from zero. (To see why the alternative rate ln M / (E(ν) + E(τ − ν)) always vanishes in the regime of positive asynchronism exponents: to achieve vanishing error probability as M (or N) tends to infinity, the reaction delay E(τ − ν) must grow at least linearly with ln M, for otherwise reliable communication above capacity would be possible; similarly, M and N must satisfy N ≥ ln M. Moreover, when A = e^{Nα} for some α > 0, we have E(ν) ≈ e^{Nα}/2 since ν is uniformly distributed in {1, 2, ..., A}. Therefore, in this regime, the rate ln M / (E(ν) + E(τ − ν)) vanishes as N → ∞ for any coding strategy that achieves arbitrarily low error probability.)

Perhaps the closest sequential decision problem our model relates to is a generalization of the change-point problem, often called the 'detection and isolation problem,' introduced by Nikiforov in 1995 (see [11], [10], and [2] for a survey). A process Y₁, Y₂, ... starts with some initial distribution and changes it at some unknown time. The post-change distribution can be any of a given set of M distributions. By sequentially observing Y₁, Y₂, ... the goal is to react quickly to the statistical change and isolate its cause, i.e., the post-change distribution. Hence, our synchronization problem takes the form of a detection and isolation problem where the change in distribution is induced by the transmitted message. However, to the best of our knowledge, studies related to the detection and isolation problem usually assume that, once the observed process jumps into one of its post-change distributions, it remains in that state forever. This means that, eventually, if we wait long enough, a correct decision is possible. Instead, in the synchronization problem the change in distribution is local, since it lasts only the duration of a codeword. In particular, once the codeword is 'missed,' no recovery is possible. Finally, optimal decoding rules for the detection and isolation problem seem to have been obtained only in the limit of small error probabilities P(E) while keeping M, the number of post-change distributions, fixed. In our case we typically let M grow as (1/P(E))^ξ for some ξ > 0.

III. RESULTS

Our first result is the characterization of the synchronization threshold.

Theorem 1. For any discrete memoryless channel Q, the synchronization threshold as given in Definition 2 is

α(Q) = max_x D(Q(·|x) ‖ Q(·|⋆)),

where D(Q(·|x) ‖ Q(·|⋆)) is the divergence (Kullback-Leibler distance) between Q(·|x) and Q(·|⋆). Furthermore, any asynchronism exponent α < α(Q) can be achieved by a coding strategy that yields lim_{N→∞} ln M / N > 0.

The theorem says that vanishing error probability can be achieved as the blocklength N tends to infinity if the asynchronism level grows as e^{Nα} with α < max_x D(Q(·|x) ‖ Q(·|⋆)). Conversely, any coding strategy that operates at an asynchronism exponent α > max_x D(Q(·|x) ‖ Q(·|⋆)) cannot achieve arbitrarily low error probability. The second part of the theorem shows the distinction between the delay measured by the codeword length N and by the expected 'reaction time' E(τ − ν)^+: arbitrarily close to the synchronization threshold one can (asymptotically) guarantee ln M / N to be strictly positive, while the question remains open for the rate ln M / E(τ − ν)^+.
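The threshold in Theorem 1 is directly computable for any channel matrix. The following minimal sketch (not from the paper; the three-input binary-output channel is a hypothetical example) evaluates α(Q) = max_x D(Q(·|x) ‖ Q(·|⋆)) and checks it against the closed form ln 2 − H(ε) that arises for a hard-decision channel of the kind discussed in Fig. 3 below.

```python
import math

def kl(p, q):
    """KL divergence D(p||q) in nats; assumes q[y] > 0 wherever p[y] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def sync_threshold(Q, star):
    """alpha(Q) = max_x D(Q(.|x) || Q(.|star)) over all input symbols x."""
    return max(kl(row, Q[star]) for row in Q.values())

# Hypothetical channel: inputs {+1, -1, star}, binary output, crossover eps.
eps = 0.1
Q = {
    +1: [1 - eps, eps],   # Q(.|+1)
    -1: [eps, 1 - eps],   # Q(.|-1)
    0:  [0.5, 0.5],       # Q(.|star): pure noise looks like a fair coin
}
alpha = sync_threshold(Q, star=0)
# For this channel, alpha = ln 2 - H(eps), with H the binary entropy in nats.
H = -eps * math.log(eps) - (1 - eps) * math.log(1 - eps)
assert abs(alpha - (math.log(2) - H)) < 1e-12
```

Note that the maximization runs over all inputs, including ⋆ itself, whose divergence term is zero; the threshold is therefore attained by the input whose output distribution is farthest from the pure-noise distribution.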
Specifically, it remains to be seen whether α(Q) = lim_{R↓0} α(R, Q) (assuming α(Q) < ∞). This issue will be discussed in Section III-B.

At least some connections between channel capacity and synchronization threshold exist. Although these two quantities are not directly related, both refer to limits on hypothesis discrimination: the first concerns a purely isolation problem, whereas the second concerns an almost purely detection problem (since there is no rate constraint). It may be interesting to note that the synchronization threshold α(Q) is always at least as large as C(Q). To see this, let P be the capacity achieving distribution of the (synchronized) channel Q. It is well known [4, Lemma 13.8.1] that for any distribution V on Y,

D(PQ ‖ P × P_Y) ≤ D(PQ ‖ P × V),

where P_Y is the right marginal of PQ = P(·)Q(·|·). Letting V = Q(·|⋆) we get

C(Q) ≜ D(PQ ‖ P × P_Y)
     ≤ D(PQ ‖ P × Q(·|⋆))
     = Σ_x P(x) Σ_y Q(y|x) ln [Q(y|x)/Q(y|⋆)]
     ≤ max_x D(Q(·|x) ‖ Q(·|⋆))
     = α(Q).

Finally, it can be checked that if C(Q) = 0 then α(Q) = 0.

(In the discussion of the detection and isolation problem above, 'optimal decoding rules' refer to sequential tests yielding minimum reaction delay, usually a function of τ − ν, given a certain error probability.)

Fig. 3. Antipodal signaling over a Gaussian channel with hard decision at the decoder: inputs +1 and −1 are received correctly with probability 1 − ε and flipped with probability ε, while the noise symbol ⋆ = 0 yields each output with probability 1/2.

Example: the Gaussian channel

As an application of Theorem 1 we consider antipodal signaling over a Gaussian channel and derive a necessary condition between asynchronism level, blocklength, and signal-to-noise ratio (SNR) for achieving reliable communication. Suppose communication takes place over an additive channel Y = X + Z, where X denotes the input, Y the output, and Z is a normally distributed random variable, independent of X, with zero mean and unit variance.
We consider antipodal signaling, that is, c_i(m) = ±√SNR for all i ∈ {1, 2, ..., N} and m ∈ {1, ..., M}, where the SNR is some positive constant. Before decoding, the receiver makes a hard decision on each received symbol and declares +1 if Y_i ≥ 0 and −1 if Y_i < 0. The noise symbol ⋆ equals zero, meaning that when no information is sent the receiver declares +1 or −1 with probability 1/2. The inputs +√SNR and −√SNR are received correctly with probability 1 − ε and are flipped with probability ε, where ε = e^{−(SNR/2)(1+o(1))} as the SNR tends to infinity. The discrete channel Q that results from the hard decision procedure is depicted in Fig. 3.

From Theorem 1, any coding strategy that yields vanishing error probability satisfies

lim sup_{N→∞} (1/N) ln A ≤ α(Q),

where

α(Q) = max_x D(Q(·|x) ‖ Q(·|⋆)) = ln 2 − H(ε) = ln 2 − H(e^{−(SNR/2)(1+o(1))})

as SNR → ∞, with H(ε) ≜ −ε ln ε − (1 − ε) ln(1 − ε). Therefore, as N tends to infinity, in order to achieve reliable communication it is necessary that

(1/N) ln A ≤ ln 2 − H(e^{−(SNR/2)(1+o₁(1))}) + o₂(1),

where o₁(1) and o₂(1) are vanishing functions of the SNR and of N, respectively. Because of the chosen quantization, in the limit of high SNR we have (1/N) ln A ≲ ln 2, and an increase in power results in a negligible increase of the asynchronism level for which reliable communication is possible (for fixed blocklength). To exploit power at high SNR it is necessary to have a finer quantization at the output. Finally, notice that for this (quantized) channel the synchronization threshold coincides with the channel capacity.

While we do not characterize the asynchronism exponent function α(R, Q) for R > 0, Theorem 2 provides a nontrivial lower bound on α(R, Q) for any R ∈ [0, C(Q)).
We use the notation (PQ)_Y to denote the right marginal of a joint distribution P(·)Q(·|·) and, given a joint distribution J on X × Y, we denote by I(J) the mutual information induced by J. Also, we denote by P_{Y|X} the set of conditional distributions of the form V(y|x) with x ∈ X and y ∈ Y.

Theorem 2. Let Q be a discrete memoryless channel. If for some constants α ≥ 0, t₁ ≥ 0, t₂ > 1, and input distribution P with I(PQ) > 0, the following inequalities

a. α < inf { D((PV)_Y ‖ (PQ)_Y) : V ∈ P_{Y|X}, D((PV)_Y ‖ Q(·|⋆)) < t₁α/(δ(t₁ + t₂ − 1)) },

b. α < min { D(PV ‖ PQ) : V ∈ P_{Y|X}, I(PV) ≤ t₂α/(δ(t₁ + t₂ − 1)) },

c. t₁/t₂ < D((PQ)_Y ‖ Q(·|⋆)) / I(PQ)

are satisfied for some δ ∈ (0, 1), then the rate I(PQ)/t₂ is achievable at asynchronism exponent α.

Note that conditions a and b in Theorem 2 are easy to check numerically since they involve only convex optimizations. Also notice that the right hand side of inequality b is the sphere packing exponent function of the channel Q with input distribution P, evaluated at t₂α/(δ(t₁ + t₂ − 1)) (see [5, p.166]).

Corollary. For any channel Q with capacity C(Q) > 0, any rate R ∈ (0, C(Q)) can be achieved at a strictly positive asynchronism exponent.

Proof of the Corollary: Consider the inequalities a, b, and c from Theorem 2. First choose some P and t₂ > 1 so that I(PQ)/t₂ ≥ R and so that (PQ)_Y ≠ Q(·|⋆) (this is always possible since C(Q) > 0). By setting t₁ = 0, inequality c holds (since its right hand side is strictly positive). Also, inequality a holds for any finite α (the infimum is over an empty set and hence equals infinity). For inequality b, observe that its right hand side is a decreasing function of α and has a strictly positive value at α = 0 (since I(PQ) > 0). It follows that inequality b holds for strictly positive and small enough values of α.

A. Coding for asynchronous channels

In this section we present the coding scheme from which one deduces Theorem 2 and the direct part of Theorem 1. As we will see, our scheme does not subdivide the synchronization problem into a detection problem followed by a message isolation problem: detection and isolation are treated jointly.

The codebook is randomly generated according to some distribution P. If the aim is only to communicate reliably at a certain asynchronism exponent α, there are some degrees of freedom in choosing P. One possible choice is to pick a P that satisfies

D((PQ)_Y ‖ Q(·|⋆)) + I(PQ) − ln M / N > α

with D((PQ)_Y ‖ Q(·|⋆)) > 0 and I(PQ) > 0, where M represents the size of the message set and N the length of the codewords (see proof of Proposition 2). In the regime where the asynchronism exponent is close to α(Q), the codewords are mainly composed of the symbol arg max_x D(Q(·|x) ‖ Q(·|⋆)). Indeed, in this asynchronism regime, the main source of error comes from a missed detection of the sent codeword, later referred to as 'false-alarm.' We deal with this source of error by distilling information using codewords with (mostly) symbols that induce output distributions that are 'as far as possible' from the output distribution induced by the ⋆ symbol. Finally, if the aim is to accommodate both rate and asynchronism constraints, the distribution P has to satisfy the conditions explicitly stated in Theorem 2.

For the decoder, let us first observe that our communication model admits two sources of error. The first comes from an atypical behavior of the noise during the period when no information is conveyed, which may result in a false-alarm. The second comes from an atypical behavior of the channel during information transmission, which may result in a miss-isolation of the sent codeword.
These two sources of error depend on the asynchronism level as well as on the communication rate: the higher the asynchronism, the higher the first source of error; the higher the communication rate, the higher the second source of error. Accordingly, our decoder is the combination of two criteria parameterized by constants that are chosen based on the level of asynchronism and according to the rate we aim at.

More specifically, the decoder observes the channel outputs Y₁, Y₂, ... and makes a decision as soon as it observes i consecutive output symbols, with i ∈ {1, 2, ..., N}, that simultaneously satisfy two conditions. The first condition is that these symbols should look 'sufficiently different' from the noise, as measured by the divergence. The second condition is that these symbols must be sufficiently correlated, in a mutual information sense, with one of the codewords. We formalize this below.

For j ≥ i we write x_i^j for x_i, x_{i+1}, ..., x_j. If i = 1 we use the shorthand notation x^j instead of x_1^j. Given a pair (x^n, y^n), we denote by P̂_{(x^n, y^n)} the empirical distribution of (x^n, y^n), i.e., P̂_{(x^n, y^n)}(x, y) = (1/n) Σ_{i=1}^n 1_{(x,y)}(x_i, y_i), where 1_{(x,y)}(x_i, y_i) = 1 if (x_i, y_i) = (x, y) and 0 otherwise.

To each message m ∈ {1, 2, ..., M} associate the stopping time

τ_m = inf{ n ≥ 1 : ∃ i ∈ {1, ..., N} so that

    i D(P̂_{Y_{n−i+1}^n} ‖ Q(·|⋆)) ≥ t₁ ln M

and

    min_{k ∈ {1,...,i}} [ k I(P̂_{c^k(m), y_{n−i+1}^{n−i+k}}) + (i − k) I(P̂_{c_{k+1}^i(m), y_{n−i+k+1}^n}) ] ≥ t₂ ln M },    (4)

where t₁ ≥ 0 and t₂ > 1 are fixed threshold constants to be appropriately chosen according to the asynchronism level and the desired communication rate. The decoding is made at time τ = min_{m ∈ {1,2,...,M}} τ_m, and the message m̄ that is declared is any that satisfies τ_{m̄} = τ.
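To make the rule concrete, here is a toy sequential decoder (an illustration under simplifying assumptions, not the paper's construction): it combines the divergence test with the simpler single-window mutual information test i·I(P̂_{c^i(m), y}) ≥ t₂ ln M of (3) below, in place of the split minimization over k in (4). The channel, codebook, and thresholds are hypothetical.

```python
import math
from collections import Counter

def kl(p, q):
    """D(p||q) in nats over dict-valued pmfs; requires q[k] > 0 where p[k] > 0."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)

def empirical(seq):
    """Empirical distribution (type) of a sequence, as a dict."""
    n = len(seq)
    return {k: v / n for k, v in Counter(seq).items()}

def empirical_mi(xs, ys):
    """Mutual information (nats) of the empirical joint type of (xs, ys)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = empirical(xs), empirical(ys)
    return sum((c / n) * math.log((c / n) / (px[x] * py[y]))
               for (x, y), c in joint.items())

def decode(stream, codebook, Q_star, t1, t2):
    """Sequential decoder sketch: stop at the first n where, for some message m
    and window length i <= N, the last i outputs (a) look unlike noise,
    i*D(P_hat_Y || Q(.|star)) >= t1*ln M, and (b) are correlated with the
    length-i codeword prefix, i*I(P_hat) >= t2*ln M. Returns (n, m) or None."""
    M = len(codebook)
    N = len(next(iter(codebook.values())))
    for n in range(1, len(stream) + 1):
        for i in range(1, min(n, N) + 1):
            win = stream[n - i:n]
            if i * kl(empirical(win), Q_star) < t1 * math.log(M):
                continue  # window still looks like pure noise
            for m, cw in codebook.items():
                if i * empirical_mi(cw[:i], win) >= t2 * math.log(M):
                    return n, m
    return None

# Toy run: 4 noise symbols, then codeword 0 is received cleanly.
out = decode([0, 1, 0, 1, 0, 0, 1, 1],
             {0: (0, 0, 1, 1), 1: (1, 1, 0, 0)},
             {0: 0.5, 1: 0.5}, t1=0.0, t2=3.5)
assert out == (8, 0)  # decodes message 0 right when its last symbol arrives
```

Note that with a uniform noise distribution and balanced codewords, any t₁ > 0 would make the divergence test fail even on a clean codeword, since the codeword's type matches Q(·|⋆); this is precisely why, near the threshold, codewords consist mostly of the symbol maximizing D(Q(·|x) ‖ Q(·|⋆)).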
It should be emphasized that there may be other sequential decoders that also achieve the synchronization threshold. The one we propose has the property that it also allows for communication at positive rates and positive asynchronism exponents. Also, an interesting feature of the above decoder is that, in addition to operating in an asynchronous setting, it is almost universal, in the sense that its rule does not depend on the channel statistics, except for the noise distribution Q(·|⋆). In fact, this decoder is an extension of a sequential universal decoder introduced in [15, eq. (10)] for the synchronized setting.

It may seem to the reader that the mutual information condition in (4), given by

min_{k ∈ {1,...,i}} [ k I(P̂_{c^k(m), y_{n−i+1}^{n−i+k}}) + (i − k) I(P̂_{c_{k+1}^i(m), y_{n−i+k+1}^n}) ] ≥ t₂ ln M,    (2)

is convoluted, and that it could be replaced, for instance, by

i I(P̂_{c^i(m), y_{n−i+1}^n}) ≥ t₂ ln M.    (3)

Our choice is motivated by a technical consideration related to the false-alarm event induced by the i last symbols being generated partly inside and partly outside the transmission period (see Case II of the proof of Lemma 2).

In the context of asynchronous communication, the same decoding rule as above is considered in [14], but without the divergence condition, i.e., a decision is made as soon as condition (2) holds for some m and i. With the mutual information condition alone, however, it was not possible to prove that reliable communication can be achieved for asynchronism exponents higher than the capacity of the channel.

Fig. 4. A binary symmetric channel with crossover probability ε and noise symbol ⋆ = 0. Its sphere packing exponent at zero rate, E_sp(R = 0, Q), given by max_P min_{V : I(PV)=0} D(PV ‖ PQ), can be smaller than α(Q). Specifically, Theorem 1 yields α(Q) = ε ln[ε/(1−ε)] + (1−ε) ln[(1−ε)/ε], and it can be checked that E_sp(R = 0, Q) ≤ 0.5 ln[0.5/(1−ε)] + 0.5 ln[0.5/ε]. Therefore E_sp(R = 0, Q) ≤ 0.5(1 + o(1)) α(Q) as ε → 0.

Fig. 5. Example of a channel for which α(0, Q) = lim_{R↓0} α(R, Q): inputs 1 and 0 are received noiselessly, and the noise symbol ⋆ = −1 yields each output with probability 1/2.

B. Continuity of α(·, Q) at R = 0

We discuss the continuity of α(·, Q) at R = 0 in light of Theorem 2. The right hand side of inequality b, the sphere packing bound, is associated with the miss-isolation error event of the sent codeword for the coding scheme discussed in Section III-A (this will be seen in the proof of Theorem 2). Therefore, regardless of the rate, any achievable synchronization exponent α obtained via Theorem 2 is bounded by the sphere packing exponent at zero rate, which can be smaller than the synchronization threshold (see Fig. 4 for an example). This motivates the conjecture that α(0, Q) ≠ lim_{R↓0} α(R, Q) in general.

Note that there are channels for which the asynchronism exponent function is continuous at zero rate, such as the one given in Fig. 5. Indeed, in this case α(Q) = ln 2 by Theorem 1. Then, considering the three inequalities given in Theorem 2, let t₁ = 0 and let the input distribution P be defined as P(1) = p = 1 − P(0) for some fixed p ∈ (0, 1/2). With this choice of t₁ and P, inequality a holds for any finite α (the infimum is infinite) and inequality c holds for any t₂ > 1 since its right hand side is strictly positive. We now focus on inequality b. Observe that any channel V ≠ Q with inputs 0 and 1 gives D(PV ‖ PQ) = +∞. Therefore, for any δ ∈ (0, 1) and t₂ > 1, the right hand side of inequality b is infinite if Q satisfies

t₂ α / (δ(t₂ − 1)) < I(PQ),    (5)

and zero otherwise.
Now pick an arbitrarily small µ > 0 and choose P with p sufficiently close to 1/2 so that

I(PQ) ≥ α(Q) − µ/2.    (6)

We conclude from (5) and (6) that, by choosing δ close enough to one and t₂ large enough, any asynchronism exponent α ≤ α(Q) − µ can be achieved at all rates up to I(PQ)/t₂.

Fig. 6. Parsing of the received sequence of maximal length A + N − 1 into s blocks y(1), y(2), ..., y(s) of length N, where s is the integer part of (A + N − 1)/N.

IV. ANALYSIS

In this section we prove the converse and the direct part of Theorem 1. The converse shows that no coding strategy achieves vanishing error probability while operating at an asynchronism exponent higher than α(Q). For the direct part we show that the coding scheme proposed in Section III-A can reliably operate arbitrarily closely to the asynchronism exponent α(Q). By extending the analysis of this scheme we then prove Theorem 2. The difference between the achievability schemes of Theorems 1 and 2 lies in the codebooks: for Theorem 1 the codebook is randomly generated according to a certain distribution P, while for Theorem 2 we impose that each codeword is (essentially) of constant composition P uniformly over its length.

Proposition 1 (Converse). Suppose that Q(y|⋆) > 0 for all y ∈ Y. Then no coding strategy achieves an asynchronism exponent strictly greater than max_{x ∈ X} D(Q(·|x) ‖ Q(·|⋆)).

Proposition 1 assumes that Q(y|⋆) > 0 for all y ∈ Y. Indeed, if Q(y|⋆) = 0 for some y ∈ Y, it will be shown in Proposition 2 that reliable communication can be achieved irrespective of the exponential growth rate of the asynchronism level with respect to the blocklength.

Proof of Proposition 1: Suppose there are two equally likely messages, m and m′, and that the decoder is given the sequence of maximal length y₁, y₂, ..., y_{A+N−1}.
We make the hypothesis that each codeword $c(m)$ and $c(m')$ consists of a single symbol repeated $N$ times. The case where each codeword uses multiple symbols is obtained by a straightforward extension of the single-symbol case and is therefore omitted. Also, we optimistically assume that the receiver is cognizant of the fact that the sent message is delivered during one of the $s$ distinct time slots of duration $N$, where $s$ is the integer part of $(A + N - 1)/N$, as shown in Fig. 6. An easy computation shows that, given a sequence $y^{A+N-1}$, the maximum a posteriori decoder declares message $m$ or $m'$ depending on whether the sum

$$\sum_{l=1}^{s} z(y^{(l)})$$

is positive or negative (if the sum is zero the decoder declares one of the two messages at random), with

$$z(y^{(l)}) \triangleq \frac{Q(y^{(l)}|c(m))}{Q(y^{(l)}|\star)} - \frac{Q(y^{(l)}|c(m'))}{Q(y^{(l)}|\star)}, \qquad (7)$$

where $Q(y^{(l)}|c(m))$ denotes the probability of the $l$th block $y^{(l)}$ of size $N$ given the codeword $c(m)$, and where $Q(y^{(l)}|\star)$ refers to the same probability now conditioned on the string of $N$ consecutive $\star$. The probability of the error event $\mathcal{E}$ is hence lower bounded as

$$P(\mathcal{E}) \ge \frac{1}{2}\left[ P_m\left( \sum_{l=1}^{s} z(Y^{(l)}) < 0 \right) + P_{m'}\left( \sum_{l=1}^{s} z(Y^{(l)}) > 0 \right) \right]$$

where $P_m$ refers to the probability conditioned on message $m$ being sent. Note that under $P_m$ and $P_{m'}$ the $z(Y^{(l)})$ are all i.i.d. according to the noise distribution, except for $z(Y^{(\nu)})$, whose distribution depends on the sent message.

Let $T_m$ be the set of sequences $y^N$ that are strongly typical with respect to $Q(\cdot|c(m))$ [5, p. 33], i.e., any sequence $y^N \in T_m$ satisfies $|n(y; y^N)/N - Q(y|c(m))| < \mu$, where $n(y; y^N)$ is the number of times the symbol $y$ appears in $y^N$. We choose the strong typicality constant $\mu$ so that $0 < \mu \ll 1$ and the blocklength $N$ large enough that $P_m(Y^{(\nu)} \in T_m) \ge 1 - \mu$. We define $T_{m'}$ analogously.
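The per-block statistic (7) can be sketched numerically. The following is a minimal illustration under an assumed toy channel; the transition probabilities below are invented for illustration and are not the channel of the paper's figures.

```python
# Toy DMC with inputs {0, 1} plus the idle symbol '*'; outputs {0, 1}.
# Q[x][y] is the probability of output y given input x (illustrative values).
Q = {
    0:   {0: 0.9, 1: 0.1},
    1:   {0: 0.2, 1: 0.8},
    '*': {0: 0.5, 1: 0.5},  # noise distribution Q(.|*)
}

def block_prob(y_block, x):
    """Probability of an output block when the single symbol x is repeated."""
    p = 1.0
    for y in y_block:
        p *= Q[x][y]
    return p

def z(y_block, x_m, x_m2):
    """Statistic (7): difference of the two codeword likelihoods, each
    normalized by the likelihood of the all-idle string."""
    noise = block_prob(y_block, '*')
    return block_prob(y_block, x_m) / noise - block_prob(y_block, x_m2) / noise

# A block dominated by 0s favors the repetition codeword of symbol 0:
assert z((0, 0, 0, 0), 0, 1) > 0
# ...and a block dominated by 1s favors the other codeword:
assert z((1, 1, 1, 1), 0, 1) < 0
```

The MAP decoder then sums $z$ over the $s$ blocks and decides according to the sign of the total.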
Further, we define $h \triangleq \max_{y^N \in T_m \cup T_{m'}} |z(y^N)|$. Using the independence of $z(Y^{(\nu)})$ and $\sum_{l \neq \nu} z(Y^{(l)})$ under $P_m$, we get

$$P_m\left( \sum_{l=1}^{s} z(Y^{(l)}) < 0 \right) \ge P_m\left( \{Y^{(\nu)} \in T_m\} \cap \left\{ \sum_{l \neq \nu} z(Y^{(l)}) < -h \right\} \right) \ge (1 - \mu)\, P\left( \sum_{l=1}^{s-1} z(Y^{(l)}) < -h \right).$$

The sum in the argument of the last term above involves $s - 1$ independent random variables distributed according to $Q(\cdot|\star)$. For simplicity, from now on we denote these random variables by $Z_l$ instead of $z(Y^{(l)})$. We then deduce that

$$P(\mathcal{E}) \ge \frac{1 - \mu}{2}\, P\left( \sum_{l=1}^{s-1} Z_l > h \right). \qquad (8)$$

In the remaining part of the proof we show that, if $s = e^{(\alpha(Q) + \varepsilon)N}$ with $\varepsilon > 0$, the random walk $\sum_{l=1}^{s-1} Z_l$ crosses $h$ with nonvanishing probability as $N$ tends to infinity, proving the Proposition. At the core of the argument lies the following Lemma, whose proof is deferred to the Appendix.

Lemma 1. Let $P$ be a distribution over some finite alphabet $\mathcal{A} = \{a_1, a_2, \ldots, a_{|\mathcal{A}|}\}$ and suppose that, for some integer $s \ge 1$ and some constant $\delta_0 \in (0,1)$,

$$\frac{3}{s \delta_0} < \min\{P(a_1), P(a_2)\}.$$

Let $\hat{P}$ be an empirical type (Footnote 11) over $\mathcal{A}^s$ so that $\min\{\hat{P}(a_1)/P(a_1),\, P(a_2)/\hat{P}(a_2)\} \ge \delta_0$ and $\hat{P}(a_2) \ge 1/s$. Let $\bar{P}$ be defined so that $\bar{P}(a_1) = \hat{P}(a_1) - 3/s$, $\bar{P}(a_2) = \hat{P}(a_2) + 3/s$, and $\bar{P}(a_i) = \hat{P}(a_i)$ for any $a_i \in \mathcal{A} \setminus \{a_1, a_2\}$. Then

$$P^s(T(\bar{P})) \ge \delta\, P^s(T(\hat{P}))$$

for some strictly positive constant $\delta = \delta(\delta_0)$, where $P^s$ denotes the product distribution induced by $P$ over $\mathcal{A}^s$, and where $T(\hat{P})$ and $T(\bar{P})$ denote the sets of sequences of length $s$ with empirical type $\hat{P}$ and $\bar{P}$, respectively.

We use the lemma with $\mathcal{A} = \{a : a = z(y^N) \text{ for some } y^N \in \mathcal{Y}^N\}$, with $s$ defined as the integer part of $e^{N(\alpha + \varepsilon)}$ for some arbitrary $\varepsilon > 0$, and with $P$ defined as $P(a) = \sum_{y^N : z(y^N) = a} Q(y^N | \star)$ for all $a \in \mathcal{A}$.
Also, we let $a_1 = h$, let $a_2$ be the symbol in $\mathcal{A}$ with the highest probability under $P$, and let $\hat{P}$ be any distribution on $\mathcal{A}$ so that

$$\left| 1 - \frac{\hat{P}(a_i)}{P(a_i)} \right| < \mu \quad \text{for } i \in \{1, 2\}.$$

In the sequel we label such distributions $\hat{P}$ as 'typical types.' We now assume that $s$, $P$, $\hat{P}$, $a_1$, and $a_2$ satisfy the hypothesis of Lemma 1; this will be shown at the end of the proof. Suppose by contradiction that the right-hand side of (8) goes to zero as $N \to \infty$, i.e., that

$$P^s\left( \sum_{l=1}^{s} Z_l \le h \right) \ge 1 - \rho \qquad (9)$$

for any arbitrary $\rho > 0$ and $N$ large enough.

Footnote 11: An empirical type over $\mathcal{A}^s$ is a distribution $\hat{P}$ over $\mathcal{A}$ so that $\hat{P}(a)$ is an integer multiple of $1/s$, for all $a \in \mathcal{A}$.

Assume for the moment that (9) implies, for $N$ large enough,

$$P^s\left( \left\{ \sum_{l=1}^{s} Z_l \le h \right\} \cap \left\{ Z^s \text{ has a typical type } \hat{P} \right\} \right) \ge 1 - \mu - \rho. \qquad (10)$$

This implication will be shown at the end of the proof. Now, for a given typical type $\hat{P}$, let $\bar{P}$ be defined as in Lemma 1. Observe that if $Z^s$ belongs to the event

$$\left\{ \sum_{l=1}^{s} Z_l \le h \right\} \cap \left\{ Z^s \text{ has typical type } \hat{P} \right\}$$

then $Z^s$ has a type $\hat{P}$ that yields a $\bar{P}$ whose type class (Footnote 12) belongs to the event (Footnote 13)

$$\left\{ \sum_{l=1}^{s} Z_l > h \right\}.$$

Hence, from Lemma 1 and (10), there exists some $\delta > 0$ so that

$$P^s\left( \sum_{l=1}^{s} Z_l > h \right) \ge \delta (1 - \mu - \rho) \qquad (11)$$

for $N$ large enough, which is in contradiction with (9) for $\rho$ small enough. We conclude that $P\left( \sum_{l=1}^{s} Z_l > h \right)$ is asymptotically bounded away from zero, and so is the right-hand side of (8). To conclude the proof we need to justify the step from (9) to (10) and to check that $P$ and $\hat{P}$ satisfy the hypothesis of the lemma with our choice of $a_1$ and $a_2$. For this last check, first note that $z(y^N)$ depends only on the type of $y^N$. Without loss of generality we assume that $h$ is achieved by a type in $T_m$.
Hence we have

$$P(a_1) \triangleq \sum_{y^N \in T_m} Q(y^N|\star) \ge e^{-N D(Q(\cdot|x) \| Q(\cdot|\star))(1 + \eta)}\, \mathrm{poly}(N),$$

where $x$ is the $N$-times-repeated symbol of the codeword $c(m)$, and where $\eta = \eta(\mu) > 0$ goes to zero as $\mu$ vanishes. (Footnote 14.) It follows that $s P(a_1)$ grows exponentially with $N$ provided $\mu$ is small enough. Thus the condition $3/(s\delta_0) < P(a_1)$ is trivially satisfied for any $\delta_0 \in (0,1)$. Also, our choice of $a_2$ gives $3/(s\delta_0) < P(a_2)$ for any $\delta_0$. This is because $P(a_2) \ge \mathrm{poly}(N)$, since there are polynomially many types of length $N$ and $a_2$ is generated by the type of highest probability. Finally, that the conditions $\min\{\hat{P}(a_1)/P(a_1),\, P(a_2)/\hat{P}(a_2)\} \ge \delta_0$ and $\hat{P}(a_2) \ge 1/s$ are satisfied follows from the definition of $\hat{P}$.

Footnote 12: The type class of $\bar{P}$ is the set of all sequences $z^s$ that have type $\bar{P}$.

Footnote 13: This step follows by noting first that $a_1 = e^{N D(Q(\cdot|x) \| Q(\cdot|\star))(1 + o(1))}$ as $\mu \to 0$ and $N \to \infty$, and second that $a_2/a_1 = o(1)$ as $N \to \infty$ (for $\mu > 0$ small enough).

Footnote 14: Throughout the paper we use the notation $\mathrm{poly}(N)$ to denote any term that is either a polynomial in $N$ or the inverse of a polynomial in $N$.

Finally, we show that

$$P^s\left( Z^s \text{ has typical type } \hat{P} \right)$$

can be made arbitrarily close to one as $N$ tends to infinity, justifying the step from (9) to (10). Using Chebyshev's inequality and the fact that the variance of a binomial is dominated by its expectation, we get (Footnote 15)

$$P^s\left( \left| \frac{\hat{P}_{Z^s}(a_1)}{P(a_1)} - 1 \right| \ge \mu \right) = P^s\left( \left| \sum_{l=1}^{s} \mathbb{1}_{a_1}(Z_l) - s P(a_1) \right| \ge s \mu P(a_1) \right) \le \frac{1}{s \mu^2 P(a_1)},$$

which goes to zero as $N \to \infty$ since we proved above that $s P(a_1)$ grows (exponentially) with $N$. A similar argument shows that $P^s\left( \left| \hat{P}_{Z^s}(a_2)/P(a_2) - 1 \right| \ge \mu \right)$ vanishes as $N$ increases. Since

$$P^s\left( Z^s \text{ has typical type } \hat{P} \right) = P^s\left( \left| \frac{\hat{P}_{Z^s}(a_i)}{P(a_i)} - 1 \right| < \mu,\ i = 1, 2 \right),$$

the claim is proved.
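The Chebyshev step above can be checked exactly for a binomial count. A minimal numerical sketch, with arbitrary illustrative values of $s$, $p$, and $\mu$:

```python
import math

def binom_pmf(s, p, k):
    return math.comb(s, k) * p**k * (1 - p)**(s - k)

def deviation_prob(s, p, mu):
    """Exact P(|K/(s p) - 1| >= mu) for K ~ Binomial(s, p), i.e. the chance
    that the empirical frequency of a symbol deviates from p by a factor mu."""
    return sum(binom_pmf(s, p, k) for k in range(s + 1)
               if abs(k / (s * p) - 1) >= mu)

def chebyshev_bound(s, p, mu):
    """The bound used in the text (variance of a binomial <= its mean)."""
    return 1.0 / (s * mu**2 * p)

s, p, mu = 200, 0.3, 0.2
assert deviation_prob(s, p, mu) <= chebyshev_bound(s, p, mu)
# The bound, and hence the deviation probability, vanishes as s grows:
assert chebyshev_bound(4 * s, p, mu) < chebyshev_bound(s, p, mu)
```

In the proof the role of $s$ is the number of noise blocks, which grows exponentially in $N$, so the right-hand side indeed tends to zero.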
The direct part of Theorem 1 is obtained by a random coding argument associated with the scheme presented in Section III-A. We assume that all the components of all codewords are chosen i.i.d. according to some distribution $P$ to be specified later. Given that message $m$ starts being emitted at time $l$, we bound the probability of error as

$$P_{m,l}(\mathcal{E}) \le P_{m,l}\left( \min_{m' \neq m} \tau_{m'} < l + N - 1 \right) + P_{m,l}(\tau_m \ge l + N),$$

with $\tau_m$ as defined in (4); the two terms are interpreted as the probability of false alarm and the probability of missing the correct codeword, respectively. In order to upper bound these two terms, let us define the event $E(m, n, i, k)$ as the intersection of the events

$$\left\{ k\, I\!\left( \hat{P}_{C^k(m), Y_{n-i+1}^{n-i+k}} \right) + (i - k)\, I\!\left( \hat{P}_{C_{k+1}^i(m), Y_{n-i+k+1}^{n}} \right) \ge t_2 \ln M \right\}$$

and

$$\left\{ i\, D\!\left( \hat{P}_{Y_{n-i+1}^{n}} \,\Big\|\, Q(\cdot|\star) \right) \ge t_1 \ln M \right\}.$$

Also let $E(m, n, i) = \cap_{k = 1, 2, \ldots, i}\, E(m, n, i, k)$. We interpret $E(m, n, i)$ as the event that message $m$ is declared at time $n$ by observing the last $i$ symbols. With these definitions we have

$$P_{m,l}\left( \min_{m' \neq m} \tau_{m'} < l + N - 1 \right) \le \sum_{\substack{m' \neq m \\ n \in [1, \ldots, A+N-1] \\ i \in [1, \ldots, N \wedge n]}} P_{m,l}(E(m', n, i)) \qquad (12)$$

from the union bound (the notation $a \wedge b$ is used for the minimum of $a$ and $b$), and

$$P_{m,l}(\tau_m \ge l + N) \le P_{m,l}\left( E(m, l + N - 1, N)^c \right). \qquad (13)$$

Lemmas 2 and 3 below upper bound the right-hand sides of (12) and (13). We denote by $\mathcal{P}$, $\mathcal{P}_X$, and $\mathcal{P}_Y$ the sets of all distributions on $\mathcal{X} \times \mathcal{Y}$, $\mathcal{X}$, and $\mathcal{Y}$, respectively. Later we will also use $\mathcal{P}_{Y|X}$ to denote the set of conditional distributions of the form $V(y|x)$ with $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. Further, we denote by $\mathcal{P}_n$ the set of all types of length $n$ over $\mathcal{X} \times \mathcal{Y}$, and similarly for the types of length $n$ over $\mathcal{X}$ and over $\mathcal{Y}$. As mentioned earlier, the notation $\mathrm{poly}(N)$ is used for a term that grows no faster than polynomially in $N$.

Footnote 15: Here $\mathbb{1}_{a_1}(Z_l)$ equals $1$ if $Z_l = a_1$, and zero otherwise.

Lemma 2 (false alarm).
Assume the codebook to be randomly generated so that each sample of each codeword is i.i.d. according to some distribution $P$. For any threshold constants $t_1, t_2 \in \mathbb{R}$ and asynchronism level $A \ge 1$,

$$\sum_{\substack{m' \neq m \\ n \in [1, \ldots, A+N-1] \\ i \in [1, \ldots, N \wedge n]}} P_{m,l}(E(m', n, i)) \le \left( M^{-(t_1 + t_2 - 1)} A + M^{-(t_2 - 1)} \right) \mathrm{poly}(N).$$

Notice that the above bound on the false-alarm error probability does not depend on $P$. Also notice that if $t_1 + t_2 \le 1$ or $t_2 \le 1$ the lemma is trivial.

Proof of Lemma 2: We distinguish the case when $E(m', n, i)$ is generated outside the message transmission period from the case when it is generated partly outside and partly inside the message transmission period. In both cases we will use the identity

$$D(V \| P_1 P_2) = I(V) + D(V_X \| P_1) + D(V_Y \| P_2), \qquad (14)$$

where $V$ denotes any distribution on $\mathcal{X} \times \mathcal{Y}$ with marginals $V_X$ and $V_Y$, and where $P_1$ and $P_2$ are any distributions on $\mathcal{X}$ and $\mathcal{Y}$, respectively.

Case I: $E(m', n, i)$ is generated outside the message transmission period (i.e., $n < l$ or $n - i + 1 \ge l + N$). By definition $E(m', n, i) \subset E(m', n, i, i)$, hence from Theorem 12.1.4 of [4] and (14) we get

$$P_{m,l}(E(m', n, i)) \le P_{m,l}(E(m', n, i, i)) \le \sum_{\substack{V \in \mathcal{P}_i \\ i I(V) \ge t_2 \ln M \\ i D(V_Y \| Q(\cdot|\star)) \ge t_1 \ln M}} e^{-i D(V \,\|\, P\, Q(\cdot|\star))} \le \sum_{\substack{V \in \mathcal{P}_i \\ i I(V) \ge t_2 \ln M \\ i D(V_Y \| Q(\cdot|\star)) \ge t_1 \ln M}} e^{-i I(V) - i D(V_Y \| Q(\cdot|\star))} \le (i + 1)^{|\mathcal{X}||\mathcal{Y}|} M^{-t_2} M^{-t_1} \le \mathrm{poly}(N)\, M^{-t_2} M^{-t_1}, \qquad (15)$$

where the last two inequalities hold since $|\mathcal{P}_i| \le (i + 1)^{|\mathcal{X}||\mathcal{Y}|}$ by Lemma 2.2 of [5] and because $i \le N$.

Case II: $E(m', n, i)$ is generated partly outside and partly inside the message transmission period (i.e., $n \ge l$ and $n - i + 1 \le l + N - 1$). Here the event $E(m', n, i)$ involves the output random variables $Y_{n-i+1}, Y_{n-i+2}, \ldots$
$, Y_n$, the first $k$ of them being distributed according to the noise distribution and the remaining $i - k$ according to the distribution induced by the sent codeword. Since, by definition, $E(m', n, i) \subset E(m', n, i, k)$ for any $k \in \{0, 1, \ldots, i\}$, a computation similar to that of Case I based on the identity (14) yields

$$P_{m,l}(E(m', n, i)) \le P_{m,l}(E(m', n, i, k)) \le \sum_{\substack{V_1 \in \mathcal{P}_k,\; V_2 \in \mathcal{P}_{i-k} \\ k I(V_1) + (i-k) I(V_2) \ge t_2 \ln M}} e^{-k D(V_1 \,\|\, P\, Q(\cdot|\star)) - (i-k) D(V_2 \,\|\, P P_Y)} \le \sum_{\substack{V_1 \in \mathcal{P}_k,\; V_2 \in \mathcal{P}_{i-k} \\ k I(V_1) + (i-k) I(V_2) \ge t_2 \ln M}} e^{-k I(V_1) - (i-k) I(V_2)} \le \mathrm{poly}(N)\, M^{-t_2}, \qquad (16)$$

where $P_Y(y) \triangleq \sum_{x \in \mathcal{X}} P(x) Q(y|x)$. Combining Cases I and II we get

$$\sum_{\substack{m' \neq m \\ n \in [1, \ldots, A+N-1] \\ i \in [1, \ldots, N \wedge n]}} P_{m,l}(E(m', n, i)) \le \left( M^{-(t_1 + t_2 - 1)} A + M^{-(t_2 - 1)} \right) \mathrm{poly}(N),$$

yielding the desired result.

Lemma 3 (miss). Assume the codebook to be randomly generated so that each sample of each codeword is i.i.d. according to some distribution $P$. For any threshold constants $t_1 \ge 0$ and $t_2 \ge 0$,

$$P_{m,l}\left( E(m, l+N-1, N)^c \right) \le \mathrm{poly}(N) \left[ \exp\left( -N \inf_{\substack{V \in \mathcal{P}_Y \\ D(V \| Q(\cdot|\star)) \le t_1 \ln M / N}} D(V \| P_Y) \right) + \exp\left( -N \inf_{\substack{V \in \mathcal{P} \\ I(V) \le t_2 \ln M / N}} D(V \| PQ) \right) \right].$$

Proof (of Proposition 2): Using Lemmas 2 and 3 we get, for any $A \ge 1$, $t_1 \ge 0$, $t_2 > 1$, and distribution $P$,

$$P(\mathcal{E}) \le \mathrm{poly}(N) \left[ M^{-(t_1 + t_2 - 1)} A + M^{-(t_2 - 1)} + \exp\left( -N \inf_{\substack{V \in \mathcal{P}_Y \\ D(V \| Q(\cdot|\star)) \le t_1 \ln M / N}} D(V \| P_Y) \right) + \exp\left( -N \inf_{\substack{V \in \mathcal{P} \\ I(V) \le t_2 \ln M / N}} D(V \| PQ) \right) \right]. \qquad (19)$$

Assume first that $Q(y|\star) > 0$ for all $y \in \mathcal{Y}$, implying that $D(P_Y \| Q(\cdot|\star)) < \infty$ for any input distribution $P$. The case where $Q(y|\star) = 0$ for some $y \in \mathcal{Y}$ is considered at the end of the proof. Pick an input distribution $P$ so that $I(PQ) > 0$ and $D(P_Y \| Q(\cdot|\star)) > 0$ (this is possible since $C(Q) > 0$), fix $t_2 > 1$, and let $\mu > 0$ be a small constant (later we will take $t_2 \to \infty$ and $\mu \to 0$).
Then, choosing the ratio $\ln M / N > 0$ and the constant $t_1 \ge 0$ so that

$$t_2 \frac{\ln M}{N} = I(PQ) - \mu/2 \qquad (20)$$

and

$$t_1 \frac{\ln M}{N} = D(P_Y \| Q(\cdot|\star)) - \mu/2, \qquad (21)$$

the second, third, and fourth terms inside the large brackets in (19) decay exponentially with $N$. Now for the first term. From (20) and (21) we get

$$t_1 + t_2 = \frac{N}{\ln M} \left( D(P_Y \| Q(\cdot|\star)) + I(PQ) - \mu \right). \qquad (22)$$

For the first term to go to zero exponentially with $N$ we further choose $A = M^{t_1 + t_2 - (1 + \mu)}$ or, equivalently, using (20) and (22),

$$A = e^{N \left( D(P_Y \| Q(\cdot|\star)) + I(PQ) - \mu - \frac{\ln M}{N}(1 + \mu) \right)} = e^{N \left( D(P_Y \| Q(\cdot|\star)) + I(PQ) - \mu - \frac{1 + \mu}{t_2} \left( I(PQ) - \mu/2 \right) \right)}. \qquad (23)$$

Since $\mu$ can be made arbitrarily small and $t_2$ arbitrarily large, we conclude from (23) that, as long as $A = e^{N\alpha}$ with

$$\alpha < D(P_Y \| Q(\cdot|\star)) + I(PQ), \qquad (24)$$

the right-hand side of (19) goes to zero as $N$ tends to infinity. Maximizing the right-hand side of (24) over the input distributions $P$ gives $\max_x D(Q(\cdot|x) \| Q(\cdot|\star))$, yielding the desired result. To prove this we show that

$$\sup_{\substack{P:\; D(P_Y \| Q(\cdot|\star)) > 0 \\ I(PQ) > 0}} \left( D(P_Y \| Q(\cdot|\star)) + I(PQ) \right) = \max_x D(Q(\cdot|x) \| Q(\cdot|\star)). \qquad (25)$$

(The domain over which the supremum is taken is nonempty since $C(Q) > 0$.) Since we assumed that $Q(y|\star) > 0$ for all $y \in \mathcal{Y}$, the function $D(P_Y \| Q(\cdot|\star)) + I(PQ)$ is continuous in $P$ and therefore

$$\sup_{\substack{P:\; D(P_Y \| Q(\cdot|\star)) > 0 \\ I(PQ) > 0}} \left( D(P_Y \| Q(\cdot|\star)) + I(PQ) \right) = \max_P \left( D(P_Y \| Q(\cdot|\star)) + I(PQ) \right).$$

Rewriting $D(P_Y \| Q(\cdot|\star)) + I(PQ)$ we get

$$D(P_Y \| Q(\cdot|\star)) + I(PQ) = \sum_x P(x)\, D(Q(\cdot|x) \| Q(\cdot|\star)),$$

hence the supremum equals $\max_x D(Q(\cdot|x) \| Q(\cdot|\star))$. We now focus on the case where $Q(y|\star) = 0$ for some $y \in \mathcal{Y}$.
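The decomposition just used, $D(P_Y \| Q(\cdot|\star)) + I(PQ) = \sum_x P(x)\, D(Q(\cdot|x) \| Q(\cdot|\star))$, and the resulting maximization can be verified numerically on a toy channel. The channel and input distribution below are illustrative assumptions, not taken from the paper:

```python
import math

def kl(p, q):
    """KL divergence D(p || q) in nats, for distributions given as dicts."""
    return sum(pv * math.log(pv / q[y]) for y, pv in p.items() if pv > 0)

# Toy DMC with inputs {0, 1} plus the idle symbol '*'; illustrative values.
Q = {
    0:   {0: 0.9, 1: 0.1},
    1:   {0: 0.2, 1: 0.8},
    '*': {0: 0.5, 1: 0.5},
}
P = {0: 0.4, 1: 0.6}  # an arbitrary input distribution on X = {0, 1}

PY = {y: sum(P[x] * Q[x][y] for x in P) for y in (0, 1)}  # output law P_Y
I = sum(P[x] * kl(Q[x], PY) for x in P)                   # I(PQ)

# Identity behind (25): D(P_Y||Q(.|*)) + I(PQ) = sum_x P(x) D(Q(.|x)||Q(.|*)).
lhs = kl(PY, Q['*']) + I
rhs = sum(P[x] * kl(Q[x], Q['*']) for x in P)
assert abs(lhs - rhs) < 1e-9

# The supremum over P is attained by putting all mass on the best input:
alpha = max(kl(Q[x], Q['*']) for x in (0, 1))
assert rhs <= alpha + 1e-12
```

The quantity `alpha` is the synchronization threshold $\max_x D(Q(\cdot|x) \| Q(\cdot|\star))$ of this toy channel.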
Pick an input distribution $P$ such that $I(PQ) > 0$ and $D(P_Y \| Q(\cdot|\star)) = \infty$; one possibility is to take $P$ as the uniform distribution over $\mathcal{X}$. Again consider the four terms in the large brackets in (19). Fix $t_2 > 1$ and fix the ratio $\ln M / N$ so that

$$0 < t_2 \frac{\ln M}{N} < I(PQ).$$

It follows that the second and fourth terms decay exponentially with $N$. Moreover, with our choice of input distribution, the third term decays exponentially with $N$ irrespective of how large $t_1$ is. By letting $A = M^{t_1}$ it follows that all four terms decay exponentially with $N$, irrespective of the exponential growth rate of $A$ with respect to $N$. Hence, when $Q(y|\star) = 0$ for some $y \in \mathcal{Y}$, an arbitrarily large asynchronism exponent can be achieved. (Note that above we always assumed $\ln M / N$ to be a strictly positive constant; therefore the second part of the claim of the proposition follows.)

To prove Theorem 2 we consider the same random coding argument used in proving Proposition 2, except that we modify the random codebook ensemble so that each codeword now satisfies a certain prefix condition. This condition will allow us to treat the codewords as being essentially of constant composition (see, e.g., [5, p. 117]) uniformly over their length, yielding an improved error probability exponent compared to the case where the codewords are i.i.d. $P$. The random construction of a codebook satisfying the prefix condition is obtained as follows. Given a message $m$, the codeword $c^N(m)$ is generated so that all of its symbols are i.i.d. according to a distribution $P$. If the obtained codeword does not satisfy the prefix condition, we discard it and regenerate a new codeword until the prefix condition is satisfied. The prefix condition requires that all prefixes $c^i(m)$ of size $i$ greater than $N/\ln N$ have empirical type $\hat{P}_{c^i(m)}$ close to $P$, in the sense that $\| P - \hat{P}_{c^i(m)} \| \le 1/\ln N$ (Footnote 18).
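The codebook construction just described amounts to rejection sampling. The following minimal sketch uses small, arbitrary parameters; it is schematic and not tied to the asymptotic regime of the proofs:

```python
import math
import random

def l1_dist(p, q):
    """L1 distance between two distributions given as dicts."""
    return sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in set(p) | set(q))

def empirical_type(seq):
    t = {}
    for a in seq:
        t[a] = t.get(a, 0.0) + 1.0 / len(seq)
    return t

def satisfies_prefix_condition(word, P):
    """Every prefix of length i > N/ln N must have empirical type within
    1/ln N of P in L1 distance."""
    N = len(word)
    min_len = int(N / math.log(N))
    return all(l1_dist(empirical_type(word[:i]), P) <= 1.0 / math.log(N)
               for i in range(min_len + 1, N + 1))

def draw_codeword(N, P, rng):
    """Draw i.i.d.-P words, discarding those that violate the condition."""
    symbols, weights = zip(*P.items())
    while True:
        word = tuple(rng.choices(symbols, weights=weights, k=N))
        if satisfies_prefix_condition(word, P):
            return word

rng = random.Random(0)
P = {0: 0.5, 1: 0.5}
c = draw_codeword(400, P, rng)
assert satisfies_prefix_condition(c, P)
# A grossly atypical word is rejected by the condition:
assert not satisfies_prefix_condition((0,) * 400, P)
```

In view of Lemma 4, the acceptance probability of the sampling loop tends to one as $N$ grows, so the resampling overhead is asymptotically negligible.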
If $N$ is large enough, with overwhelming probability a random codeword will satisfy the prefix condition. Indeed, by the union bound, the probability of generating a sequence $c^N(m)$ that does not satisfy the prefix condition is upper bounded by $N \exp\left( -\Theta(N/(\ln N)^3) \right)$, which tends to zero as $N$ tends to infinity. This proves the following lemma.

Lemma 4. The probability that a sequence $C_1, C_2, \ldots, C_N$ of random variables i.i.d. according to $P$ does not satisfy the prefix condition tends to zero as $N$ goes to infinity.

To prove Theorem 2 we will need Lemmas 5 and 6, which bound the probabilities of false alarm and miss assuming the codewords satisfy the prefix condition. Before establishing these lemmas we make a small digression on the growth rate of $M$ and $N$.

Footnote 18: Here $\|\cdot\|$ is the $L_1$ norm. Also, the choice $N/\ln N$ for the minimum prefix size could be replaced by any function $f(N)$ so that $f(N) = o(N)$ while $\ln N / f(N) = o(1)$.

Referring to the achievability scheme of Section III-A, decoding may happen only if $i$ is such that the condition

$$\min_{k \in [1, \ldots, i]} \left[ k\, I\!\left( \hat{P}_{C^k(m), Y_{n-i+1}^{n-i+k}} \right) + (i - k)\, I\!\left( \hat{P}_{C_{k+1}^i(m), Y_{n-i+k+1}^{n}} \right) \right] \ge t_2 \ln M$$

is satisfied. Thus, a lower bound on the values of $i$ for which decoding may happen is $\ln M / \ln |\mathcal{X}|$, since $I(\cdot) \le \ln |\mathcal{X}|$ and $t_2 > 1$. In order to guarantee that, whenever decoding happens, only codeword prefixes of size larger than $N/\ln N$ (the size of the smallest constant-composition prefix) are involved, we impose that $M$ and $N$ satisfy

$$\frac{N}{\ln N} \le \frac{\ln M}{\ln |\mathcal{X}|}. \qquad (26)$$

Lemma 5 (false alarm, with prefix condition). Assume the codebook to be randomly generated so that each codeword satisfies the prefix condition according to $P$, and assume that (26) holds.
For any threshold constants $t_1, t_2 \in \mathbb{R}$ and any asynchronism level $A \ge 1$,

$$\sum_{\substack{m' \neq m \\ n \in [1, \ldots, A+N-1] \\ i \in [1, \ldots, N \wedge n]}} P_{m,l}(E(m', n, i)) \le \mathrm{poly}(N) \left( M^{-(t_1 + t_2 - 1 + o(1))} A + M^{-(t_2 - 1 + o(1))} \right)$$

as $N \to \infty$.

Lemma 6 (miss, with prefix condition). Assume the codebook to be randomly generated so that each codeword satisfies the prefix condition according to $P$, and assume that (26) holds. For any $t_1 \ge 0$ and $t_2 > 0$,

$$P_{m,l}\left( E(m, l+N-1, N)^c \right) \le \mathrm{poly}(N) \left[ \exp\left( -N \inf_{\substack{V \in \mathcal{P}_{Y|X} \\ D((PV)_Y \| Q(\cdot|\star)) \le t_1 \ln M / N}} D\left( (PV)_Y \,\|\, (PQ)_Y \right) \right) + \exp\left( -N \inf_{\substack{V \in \mathcal{P}_{Y|X} \\ I(PV) \le t_2 \ln M / N}} D(PV \| PQ) \right) \right].$$

The average decoding delay is bounded as $E_{m,l}(\tau - l)^+ \le E_{m,l}(\tau_m - l)^+$ [...] satisfy

$$t_1 < t_2\, \frac{D(P_Y \| Q(\cdot|\star))}{I(PQ)}, \qquad (39)$$

so that

$$\inf_{\substack{V \in \mathcal{P}_{Y|X} \\ D((PV)_Y \| Q(\cdot|\star)) \le t_1 \ln M / N}} D\left( (PV)_Y \,\|\, (PQ)_Y \right) > 0,$$

and hence

$$\exp\left( -j \inf_{\substack{V \in \mathcal{P}_{Y|X} \\ D((PV)_Y \| Q(\cdot|\star)) \le t_1 \ln M / N}} D\left( (PV)_Y \,\|\, (PQ)_Y \right) \right) < 1,$$

and the ratio $\ln M / N$ can be chosen so that the inequalities

$$\alpha < \inf_{\substack{V \in \mathcal{P}_{Y|X} \\ D((PV)_Y \| Q(\cdot|\star)) \le t_1 \ln M / N}} D\left( (PV)_Y \,\|\, (PQ)_Y \right) \quad \text{and} \quad \alpha < \min_{\substack{V \in \mathcal{P}_{Y|X} \\ I(PV) \le t_2 \ln M / N}} D(PV \| PQ) \qquad (43)$$

are satisfied.

Footnote 19: The term $1/M$ in the definition of $j$ can be replaced by any positive strictly decreasing function of $M$.

Footnote 20: Here we are using the fact that if for some $\varepsilon > 0$ we have $\min_{x : g(x) \le c} f(x) = m + \varepsilon$, then $\min_{x : f(x) \le m} g(x) \ge c$.

Therefore, if the inequalities (39) and (43) are satisfied, the delay is bounded as

$$E_{m,l}(\tau_m - l)^+ \le \frac{t_2 \ln M}{I(PQ)} (1 + o(1)). \qquad (44)$$

We now bound the error probability. To that aim we consider the false-alarm and miss events and obtain, by Lemmas 5 and 6,

$$P(\mathcal{E}) \le \mathrm{poly}(N) \left[ M^{-(t_1 + t_2 - 1)(1 + o(1))} A + M^{-(t_2 - 1)(1 + o(1))} + \exp\left( -N \min_{\substack{V \in \mathcal{P}_{Y|X} \\ D((PV)_Y \| Q(\cdot|\star)) \le t_1 \ln M / N}} D\left( (PV)_Y \,\|\, (PQ)_Y \right) \right) + \exp\left( -N \min_{\substack{V \in \mathcal{P}_{Y|X} \\ I(PV) \le t_2 \ln M / N}} D(PV \| PQ) \right) \right].$$

It follows that if $P$, $t_1 \ge 0$, $t_2 > 1$, $\alpha$, and the ratio $\ln M / N$ satisfy the following conditions,

a. $\displaystyle \alpha < \inf_{\substack{V \in \mathcal{P}_{Y|X} \\ D((PV)_Y \| Q(\cdot|\star)) < \frac{t_1 \alpha}{\delta (t_1 + t_2 - 1)}}} D\left( (PV)_Y \,\|\, (PQ)_Y \right)$

b. $\displaystyle \alpha < \min_{\substack{V \in \mathcal{P}_{Y|X} \\ I(PV) \le \frac{t_2 \alpha}{\delta (t_1 + t_2 - 1)}}} D(PV \| PQ)$

c.
$\displaystyle \frac{t_1}{t_2} < \frac{D\left( (PQ)_Y \,\|\, Q(\cdot|\star) \right)}{I(PQ)}$ (46)

d. $\displaystyle \frac{\ln M}{N} \ge \frac{\alpha}{\delta (t_1 + t_2 - 1)}$ (47)

for some $\delta \in (0,1)$, then the asynchronism exponent $\alpha$ can be achieved at rate $I(PQ)/t_2$. Note that if conditions a, b, and c are satisfied for some $\alpha$, $P$, $t_1 \ge 0$, $t_2 > 1$, and $\delta \in (0,1)$, one can always choose $N/\ln M$ so that condition d is satisfied. Hence, if conditions a, b, and c are satisfied for some $\alpha$, $P$, $t_1 \ge 0$, $t_2 > 1$, and $\delta \in (0,1)$, the asynchronism exponent $\alpha$ can be achieved at rate $I(PQ)/t_2$.

To conclude the proof we show that

$$j = \frac{t_2 \ln M}{I(PQ)} (1 + o(1)).$$

To that aim we show that $d(\delta) = 1 + o(1)$ as $\delta \to 0$. Since $I(PV)$ is a continuous function over the compact set

$$\{ V \in \mathcal{P}_{Y|X} : D(PV \| PQ) \le \delta \}, \qquad (48)$$

the minimum in the denominator of the right-hand side of (34) is well defined, and so is $d(\delta)$. We now show that, for $\delta$ small enough, the set in (48) contains no trivial conditional probability $V$, that is, no $V \in \mathcal{P}_{Y|X}$ such that $V(\cdot|x)$ is the same for all $x \in \mathcal{X}$. This will imply that $d(\delta) = 1 + o(1)$ as $\delta \to 0$. Let $W(x, y) = W_X(x) W_Y(y)$ for all $(x, y) \in \mathcal{X} \times \mathcal{Y}$. The identity (14) yields

$$D(PQ \| W) = I(PQ) + D(P \| W_X) + D(P_Y \| W_Y) \ge I(PQ), \qquad (49)$$

where $P_Y(y) \triangleq \sum_{x \in \mathcal{X}} P(x) Q(y|x)$. Since the set $\mathcal{P}_\pi$ of product measures in $\mathcal{P}$ is compact and $D(PQ \| \cdot)$ is continuous over $\mathcal{P}_\pi$, from (49) we have

$$\min_{W \in \mathcal{P}_\pi} D(PQ \| W) \ge I(PQ). \qquad (50)$$

Since $I(PQ) > 0$, from (50) one deduces that $\min_{W \in \mathcal{P}_\pi} D(W \| PQ)$ is strictly positive (Footnote 21), and therefore the set (48) contains no trivial conditional probability. Therefore, for $\delta$ small enough, the denominator in the definition (34) is strictly positive, implying that $d(\delta)$ is finite. We then deduce that $d(\delta) = 1 + o(1)$ as $\delta \to 0$.

V.
CONCLUDING REMARKS

We introduced a new model for asynchronous and sparse communication and derived scaling laws between the asynchronism level and the blocklength for reliable and quick decoding. Perhaps the main conclusion is that even in the regime of strong asynchronism, i.e., when the asynchronism level is exponential in the codeword length, reliable and quick decoding can be achieved.

At this point several directions might be pursued. Perhaps the first is the characterization of the asynchronism exponent function $\alpha(\cdot, Q)$ at positive rates. To make this problem easier one may want to consider a less stringent rate definition. Indeed, the definition of rate we adopted takes $E(\tau - \nu)^+$ as the delay. As a consequence, in the exponential asynchronism regime we mostly focused on, it is difficult to guarantee a high communication rate: even though the probability of 'missing the codeword' is exponentially small in the codeword length, once the codeword is missed we pay a huge penalty in terms of delay, of the order of the asynchronism level, which is exponentially large in the codeword length. Therefore, instead of imposing that $E(\tau - \nu)^+$ be bounded by some $d$, we may consider a delay constraint of the form $P((\tau - \nu)^+ \le d) \approx 1$ and define the rate as $\ln M / d$.

Another direction is the extension of the proposed model to include the event that no message is sent: the receiver knows that with probability $1 - p$ one message is sent and with probability $p$ no message is sent. For this setting, 'natural' scalings between $p$ and the asynchronism level remain to be discovered.

Finally, a word about feedback. We omitted feedback in our study in order to avoid a potential additional source of asynchronism. Nevertheless, since feedback is inherently available in any communication system, it is of interest to include, say, a one-bit perfect feedback link from the receiver to the transmitter.
In this case variable-length codes can be used and the asynchronism level might be defined directly with respect to $E(\tau - \nu)^+$ instead of the blocklength.

VI. APPENDIX

Proof of Lemma 1: The multinomial expansion for $P^s(T(\hat{P}))$ (see, e.g., [4, equation 12.25]) gives

$$P^s(T(\hat{P})) = \binom{s}{s\hat{P}(a_1),\, s\hat{P}(a_2),\, \ldots,\, s\hat{P}(a_{|\mathcal{A}|})} \prod_{a \in \mathcal{A}} P(a)^{s\hat{P}(a)}.$$

Using the hypotheses on $P$, $\hat{P}$, and $\bar{P}$ gives $\hat{P}(a_i) \ge 3/s$ for $i \in \{1, 2\}$, hence

$$\frac{P^s(T(\bar{P}))}{P^s(T(\hat{P}))} = \left( \frac{P(a_2)}{P(a_1)} \right)^3 \frac{\left( s\hat{P}(a_1) - 2 \right)\left( s\hat{P}(a_1) - 1 \right)\left( s\hat{P}(a_1) \right)}{\left( s\hat{P}(a_2) + 1 \right)\left( s\hat{P}(a_2) + 2 \right)\left( s\hat{P}(a_2) + 3 \right)} = \left( \frac{P(a_2)}{P(a_1)} \right)^3 \left( \frac{\hat{P}(a_1)}{\hat{P}(a_2)} \right)^3 \frac{\left( 1 - \frac{1}{s\hat{P}(a_1)} \right)\left( 1 - \frac{2}{s\hat{P}(a_1)} \right)}{\left( 1 + \frac{1}{s\hat{P}(a_2)} \right)\left( 1 + \frac{2}{s\hat{P}(a_2)} \right)\left( 1 + \frac{3}{s\hat{P}(a_2)} \right)} \ge \delta$$

for some $\delta = \delta(\delta_0) > 0$.

Footnote 21: We use the fact that $D(P_1 \| P_2) = 0$ if and only if $P_1 = P_2$.

Lemma 7. For any distribution $J$ on $\mathcal{X} \times \mathcal{Y}$ and any constant $r \ge 0$,

$$\min_{t_1 \in [0,1]}\ \min_{\substack{V_1, V_2 \in \mathcal{P} \\ t_1 I(V_1) + (1 - t_1) I(V_2) \le r}} \left[ t_1 D(V_1 \| J) + (1 - t_1) D(V_2 \| J) \right] = \min_{\substack{V \in \mathcal{P} \\ I(V) \le r}} D(V \| J).$$

Proof: If $r \ge I(J)$ the claim trivially holds, since the left- and right-hand sides of the above equation both equal zero. From now on we assume that $r < I(J)$. Define

$$a = \min_{t_1 \in [0,1]}\ \min_{\substack{V_1, V_2 \in \mathcal{P} \\ t_1 I(V_1) + (1 - t_1) I(V_2) \le r \\ I(V_1) = I(V_2)}} \left[ t_1 D(V_1 \| J) + (1 - t_1) D(V_2 \| J) \right]$$

and

$$b = \min_{t_1 \in [0,1]}\ \inf_{\substack{V_1, V_2 \in \mathcal{P} \\ t_1 I(V_1) + (1 - t_1) I(V_2) \le r \\ I(V_1) > I(V_2)}} \left[ t_1 D(V_1 \| J) + (1 - t_1) D(V_2 \| J) \right].$$

Since $a = \min_{V \in \mathcal{P} : I(V) \le r} D(V \| J)$, to prove the Lemma it suffices to show that $b \ge \min_{V \in \mathcal{P} : I(V) \le r} D(V \| J)$. This is done via the following two claims, proved below.

Claim i: $\min_{V : I(V) \le r} D(V \| J) = \min_{V : I(V) = r} D(V \| J)$.

Claim ii: the function $f(r) \triangleq \min_{V : I(V) = r} D(V \| J)$ is convex.
Using the above claims we have

$$b = \inf_{r_1 > r_2} \left[ \frac{r - r_2}{r_1 - r_2}\, f(r_1) + \frac{r_1 - r}{r_1 - r_2}\, f(r_2) \right] \ge f(r),$$

since $\frac{r - r_2}{r_1 - r_2}\, r_1 + \frac{r_1 - r}{r_1 - r_2}\, r_2 = r$ and $f$ is convex, and therefore $b \ge \min_{V \in \mathcal{P} : I(V) \le r} D(V \| J)$.

The proof of the above claims is based on the convexity of $D(J_1 \| J_2)$ in the pair $(J_1, J_2)$ (see, e.g., [5, Lemma 3.5, p. 50]). For claim i, let $r > 0$ (if $r = 0$ the claim holds trivially) and suppose that $I(V) < r$. By defining $\bar{V} = \lambda V + (1 - \lambda) J$ with $\lambda \in [0, 1)$ we have $D(\bar{V} \| J) < D(V \| J)$ by convexity. On the other hand, letting $V_X$ and $V_Y$ denote the left and right marginals of $V$, we have

$$I(\bar{V}) \le D\left( \lambda V + (1 - \lambda) J \,\big\|\, \lambda V_X V_Y + (1 - \lambda) J_X J_Y \right) \le \lambda D(V \| V_X V_Y) + (1 - \lambda) D(J \| J_X J_Y) = \lambda I(V) + (1 - \lambda) I(J) < r,$$

where the last inequality holds for $\lambda$ sufficiently close to one. Therefore $\bar{V}$ strictly improves upon $V$ and claim i follows. (Notice that in [5, p. 169] a similar argument holds for the sphere packing exponent.)

For claim ii, let $V_1$ and $V_2$ achieve $f(r_1)$ and $f(r_2)$ for some $r_1 \neq r_2$, and let $V = \lambda V_1 + (1 - \lambda) V_2$. By convexity we have $D(V \| J) \le \lambda D(V_1 \| J) + (1 - \lambda) D(V_2 \| J) = \lambda f(r_1) + (1 - \lambda) f(r_2)$ and $I(V) \le r$. This yields claim ii.

ACKNOWLEDGMENT

The authors wish to thank Ashish Khisti for interesting discussions.

REFERENCES

[1] R. Ahlswede and J. Wolfowitz, "Channels without synchronization," Advances in Applied Probability 3 (1971), no. 2, 383-403.
[2] M. Basseville and I. Nikiforov, "Fault isolation for diagnosis: nuisance rejection and multiple hypothesis testing," Rapport de recherche 4438, INRIA, 2002.
[3] T. M. Cover, R. J. McEliece, and E. C. Posner, "Asynchronous multiple-access channel capacity," IEEE Trans. Inform. Th. 4 (1981), 409-413.
[4] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
[5] I.
Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Channels, Academic Press, New York, 1981.
[6] S. Diggavi and M. Grossglauser, "On transmission over deletion channels," Allerton Conference, Monticello, Illinois, October 2001.
[7] R. L. Dobrushin, "Shannon's theorems for channels with synchronization errors," Problems of Information Transmission 3 (1967), no. 4, 11-26.
[8] E. Drinea and M. Mitzenmacher, "A simple lower bound for the capacity of the deletion channel," IEEE Trans. Inform. Th. 52 (2006), 4657-4660.
[9] J. Y. N. Hui and P. A. Humblet, "The capacity of the totally asynchronous multiple-access channel," IEEE Trans. Inform. Th. 31 (1985), no. 2, 207-216.
[10] T. L. Lai, "Sequential multiple hypothesis testing and efficient fault detection-isolation in stochastic systems," IEEE Trans. Inform. Th. 46 (2000), 595-607.
[11] I. V. Nikiforov, "A generalized change detection problem," IEEE Trans. Inform. Th. 41 (1995), 171-187.
[12] G. S. Poltyrev, "Coding in an asynchronous multiple-access channel," Problems Inform. Trans. 36 (1983), 12-21.
[13] C. E. Shannon, "A mathematical theory of communication," The Bell Sys. Tech. Journal 27 (1948), 379-423.
[14] A. Tchamkerten, A. Khisti, and G. W. Wornell, "Information theoretic perspectives on synchronization," IEEE Intl. Symp. on Info. Th. (ISIT), 2006, pp. 371-375.
[15] A. Tchamkerten and I. E. Telatar, "Variable length coding over an unknown channel," IEEE Trans. Inform. Th. 52 (2006), no. 5, 2126-2145.
[16] S. Verdú, "The capacity region of the symbol-asynchronous Gaussian multiple-access channel," IEEE Trans. Inform. Th. 35 (1989), no. 4, 733-751.