Distributed Source Coding for Interactive Function Computation
Authors: Nan Ma, Prakash Ishwar
Distributed Source Coding for Interactive Function Computation

Nan Ma and Prakash Ishwar
Department of Electrical and Computer Engineering
Boston University, Boston, MA 02215
{nanma, pi}@bu.edu

Abstract

A two-terminal interactive distributed source coding problem with alternating messages for function computation at both locations is studied. For any number of messages, a computable characterization of the rate region is provided in terms of single-letter information measures. While interaction is useless in terms of the minimum sum-rate for lossless source reproduction at one or both locations, the gains can be arbitrarily large for function computation even when the sources are independent. For a class of sources and functions, interaction is shown to be useless, even with infinitely many messages, when a function has to be computed at only one location, but is shown to be useful if functions have to be computed at both locations. For computing the Boolean AND function of two independent Bernoulli sources at both locations, an achievable infinite-message sum-rate with infinitesimal-rate messages is derived in terms of a two-dimensional definite integral and a rate-allocation curve. A general framework for multiterminal interactive function computation, based on an information exchange protocol which successively switches among different distributed source coding configurations, is developed. For networks with a star topology, multiple rounds of interactive coding are shown to decrease the scaling law of the total network rate by an order of magnitude as the network grows.

Index Terms: distributed source coding, function computation, interactive coding, rate-distortion region, Slepian-Wolf coding, two-way coding, Wyner-Ziv coding.

November 2, 2021    DRAFT
I. Introduction

In networked systems where distributed inferencing and control need to be performed, the raw data (source samples) generated at different nodes (information sources) need to be transformed and combined in a number of ways to extract actionable information. This requires performing distributed computations on the source samples. A pure data-transfer solution approach would advocate first reliably reproducing the source samples at decision-making nodes and then performing suitable computations to extract actionable information. Two-way interaction and statistical dependencies among source, destination, and relay nodes would be utilized, if at all, primarily to improve the reliability of data reproduction rather than the overall computation-efficiency. However, to maximize the overall computation-efficiency, it is necessary for nodes to interact bidirectionally, perform computations, and exploit statistical dependencies in data, as opposed to only generating, receiving, and forwarding data. In this paper we attempt to formalize this common wisdom through some examples of distributed function-computation problems, with the goal of minimizing the total number of bits exchanged per source sample. Our objective is to highlight the role of interaction in computation-efficiency within a distributed source coding framework involving block-coding asymptotics and vanishing probability of function-computation error.

1 This material is based upon work supported by the US National Science Foundation (NSF) under award (CAREER) CCF-0546598. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. A part of this work was presented in ISIT'08.
We derive information-theoretic characterizations of the set of feasible coding rates for these problems and explore the fascinating interplay of function-structure, distribution-structure, and interaction.

A. Problem setting

Consider the following general two-terminal interactive distributed source coding problem with alternating messages, illustrated in Figure 1. Here, n samples X := X^n := (X(1), ..., X(n)) ∈ X^n of an information source are available at location A. A different location B has n samples Y ∈ Y^n of a second information source which is statistically correlated with X. Location A desires to produce a sequence Ẑ_A ∈ Z_A^n such that d_A^(n)(X, Y, Ẑ_A) ≤ D_A, where d_A^(n) is a nonnegative distortion function of 3n variables. Similarly, location B desires to produce a sequence Ẑ_B ∈ Z_B^n such that d_B^(n)(X, Y, Ẑ_B) ≤ D_B. All alphabets are assumed to be finite. To achieve the desired objective, t coded messages, M_1, ..., M_t, of respective bit rates (bits per source sample) R_1, ..., R_t, are sent alternately from the two locations, starting with location A or location B. The message sent from a location can depend on the source samples at that location and on all the previous messages (which are available to both locations). There is enough memory at both locations to store all the source samples and messages. An important goal is to characterize the set of all rate t-tuples R := (R_1, ..., R_t) for which both P(d_A^(n)(X, Y, Ẑ_A) > D_A) → 0 and P(d_B^(n)(X, Y, Ẑ_B) > D_B) → 0 as n → ∞. This set of rate tuples is called the rate region.

Fig. 1. Interactive distributed source coding with t alternating messages.

B. Related work

The available literature closely related to this problem can be roughly partitioned into three broad categories.
The salient features of related problems in these three categories are summarized below using the notation of the problem setting described above.

1) Communication complexity [1]: Here, X and Y are typically deterministic, t is not fixed in advance, and d_A^(n) and d_B^(n) are the indicator functions of the sets {Ẑ_A ≠ f_A(X, Y)} and {Ẑ_B ≠ f_B(X, Y)} respectively. Thus, the goal is to compute the function f_A(X, Y) at location A and the function f_B(X, Y) at location B. Both deterministic and randomized coding strategies have been studied. If coding is deterministic, the functions are required to be computed without error, i.e., D_A = D_B = 0. If coding is randomized, with the sources of randomness independent of each other and of X and Y, then Ẑ_A and Ẑ_B are random variables. In this case, computation could be required to be error-free with the termination time t random (the Las Vegas framework), or the termination time t could be held fixed but large enough to keep the probability of computation error smaller than some desired value (the Monte Carlo framework). The coding-efficiency for function computation is called communication complexity. When coding is deterministic, communication complexity is measured in terms of the minimum value, over all codes, of the total number of bits that need to be exchanged between the two locations to compute the functions without error, irrespective of the values of the sources. When coding is randomized, both the worst-case and the expected value of the total number of bits, over all sources of randomization, have been considered. The focus of much of the literature has been on establishing order-of-magnitude upper and lower bounds for the communication complexity, and not on characterizing the set of all source coding rate tuples in bits per source sample.
In fact, the ranges of f_A and f_B considered in the communication complexity literature are often orders of magnitude smaller than their domains. This would correspond to a vanishing source coding rate. Recently, however, Giridhar and Kumar successfully applied the communication complexity framework to study how the rate of function computation can scale with the size of the network for deterministic sources [2], [3]. They considered a network where each node observes a (deterministic) sequence of source samples, and a sink node where the sequence of function values needs to be computed. To study how the computation rate scales with the network size, they considered the class of connected random planar networks and the class of co-located networks, and focused on the divisible and symmetric families of functions.

2) Interactive source reproduction: Kaspi [4] considered a distributed block source coding [5, Section 14.9] formulation of this problem for discrete memoryless stationary sources taking values in finite alphabets. However, the focus was on source reproduction with distortion and not function computation. The source reproduction quality was measured in terms of two single-letter distortion functions of the form d_A^(n)(x, y, ẑ_A) := (1/n) Σ_{i=1}^n d_A(y(i), ẑ_A(i)) and d_B^(n)(x, y, ẑ_B) := (1/n) Σ_{i=1}^n d_B(x(i), ẑ_B(i)). Coupled single-letter distortion functions of the form d_A(x(i), y(i), ẑ_A(i)) and d_B(x(i), y(i), ẑ_B(i)), and probability of block error for lossless reproduction, were not considered. For a fixed number of messages t, a single-letter characterization of the sum-rate pair (Σ_{j odd} R_j, Σ_{j even} R_j) (not the entire rate region) was derived. However, no examples were presented to illustrate the benefits of two-way source coding.
The key question, "does two-way (interactive) distributed source coding with more messages require a strictly smaller sum-rate than with fewer messages?", was left unanswered. The recent paper by Yang and He [6] studied two-terminal interactive source coding for the lossless reproduction of a stationary non-ergodic source X at B with decoder side-information Y. Here, the code termination criterion depended on the sources and previous messages, so that t was a random variable. Two-way interactive coding was shown to be strictly better than one-way non-interactive coding.

3) Interactive function computation: In [7], Yamamoto studied the problem where (X, Y) is a doubly symmetric binary source,^2 terminal B is required to compute a Boolean function of the sources satisfying an expected per-sample Hamming distortion criterion corresponding to d_B^(n)(x, y, ẑ_B) := (1/n) Σ_{i=1}^n (f_B(x(i), y(i)) ⊕ ẑ_B(i)), where f_B(x, y) is a Boolean function, only one message is allowed, i.e., t = 1, and nothing is required to be computed at terminal A, i.e., d_A^(n) = 0. This is equivalent to Wyner-Ziv source coding [5] with decoder side-information for a per-sample distortion function which depends on the decoder reconstruction and both the sources. Yamamoto computed the rate-distortion function for all 16 Boolean functions of two binary variables and showed that they are of only three forms.

2 (X(i), Y(i)) ~ iid p_XY(x, y) = 0.5(1 - p)δ_xy + 0.5p(1 - δ_xy), where δ_ij is the Kronecker delta and x, y ∈ {0, 1}. We say (X, Y) ~ DSBS(p).
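As a quick numerical illustration of the DSBS model in footnote 2, the following Python sketch (ours, not from the paper; the function name `dsbs_pmf` is hypothetical) builds the joint pmf of DSBS(p) and checks that both marginals are uniform and that the crossover probability P(X ≠ Y) equals p.

```python
from itertools import product

def dsbs_pmf(p):
    """Joint pmf of a doubly symmetric binary source:
    p_XY(x, y) = 0.5*(1 - p) if x == y, else 0.5*p."""
    return {(x, y): 0.5 * (1 - p) if x == y else 0.5 * p
            for x, y in product((0, 1), repeat=2)}

pmf = dsbs_pmf(0.1)
# Both marginals are uniform Bernoulli(1/2)...
p_x = {x: sum(pmf[(x, y)] for y in (0, 1)) for x in (0, 1)}
assert abs(p_x[0] - 0.5) < 1e-12 and abs(p_x[1] - 0.5) < 1e-12
# ...and the crossover probability P(X != Y) equals p.
assert abs(sum(v for (x, y), v in pmf.items() if x != y) - 0.1) < 1e-12
```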
In [8], Han and Kobayashi studied a three-terminal problem where X and Y are discrete memoryless stationary sources taking values in finite alphabets, X is observed at terminal one and Y at terminal two, and terminal three wishes to compute a samplewise function of the sources losslessly. Terminals one and two can each send only one message to terminal three. Han and Kobayashi characterized the class of functions for which the rate region of this problem coincides with the Slepian-Wolf [5] rate region. Orlitsky and Roche [9] studied a distributed block source coding problem whose setup coincides with Kaspi's problem [4] described above. However, the focus was on computing a samplewise function f_B(X, Y) = (f_B(X(i), Y(i)))_{i=1}^n of the two sources at terminal B using up to two messages (t ≤ 2). Nothing was required to be computed at terminal A, i.e., d_A^(n) = 0. Both probability of block error P({Ẑ_B ≠ f_B(X, Y)}) and per-sample expected Hamming distortion (1/n) Σ_{i=1}^n P(Ẑ_B(i) ≠ f_B(X(i), Y(i))) were considered. A single-letter characterization of the rate region was derived. Example 8 in [9] showed that the sum-rate with two messages is strictly smaller than with one message.

C. Contributions

We study the two-terminal interactive function computation problem described in Section I-A for discrete memoryless stationary sources taking values in finite alphabets. The goal is to compute samplewise functions at one or both locations, and the two functions can be the same or different. We focus on a distributed block source coding formulation involving a probability of block error which is required to vanish as the blocklength tends to infinity.
We derive a computable characterization of the rate region and the minimum sum-rate for any finite number of messages in terms of single-letter information quantities (Theorem 1 and Corollary 1). We show how the rate regions for different numbers of messages and different starting locations are nested (Proposition 1). We show how the Markov chain and conditional entropy constraints associated with the rate region are related to certain geometrical properties of the support-set of the joint distribution and the function-structure (Lemma 1). This relationship provides a link to the concept of monochromatic rectangles which has been studied in the communication complexity literature. We also consider a concurrent kind of interaction, where messages are exchanged simultaneously, and show how the minimum sum-rate is bounded by the sum-rate for alternating-message interaction (Proposition 2). We also consider per-sample average distortion criteria based on coupled single-letter distortion functions which involve the decoder output and both sources. For expected distortion as well as probability of excess distortion, we discuss how the single-letter characterization of the rate-distortion region is related to the rate region for probability of block error (Section III-B). Striking examples are presented to show how the benefit of interactive coding depends on the function-structure, computation at one or both locations, and the structure of the source distribution. Interactive coding is useless (in terms of the minimum sum-rate) if the goal is lossless source reproduction at one or both locations, but the gains can be arbitrarily large for computing nontrivial functions involving both sources, even when the sources are independent (Sections IV-A, IV-B, and IV-C).
For certain classes of sources and functions, interactive coding is shown to have no advantage (Theorems 2 and 3). In fact, for doubly symmetric binary sources, interactive coding, even with an unbounded number of messages, is useless for computing any function at one location (Section IV-D), but is useful if computation is desired at both locations (Section IV-E). For independent Bernoulli sources, when the Boolean AND function is required to be computed at both locations, we develop an achievable infinite-message sum-rate with an infinitesimal rate for each message (Section IV-F). This sum-rate is expressed in analytic closed form, in terms of two two-dimensional definite integrals, which represent the total rate flowing in each direction, and a rate-allocation curve which coordinates the progression of function computation. We develop a general formulation of multiterminal interactive function computation in terms of an interaction protocol which switches among many distributed source coding configurations (Section V). We show how results for the two-terminal problem can be used to develop insights into optimum topologies for information flow in larger networks through a linear program involving cut-set lower bounds (Sections V-B and V-C). We show that allowing any arbitrary number of interactive message exchanges over multiple rounds cannot reduce the minimum total rate for the Körner-Marton problem [10]. For networks with a star topology, however, we show that interaction can, in fact, decrease the scaling law of the total network rate by an order of magnitude as the network grows (Example 3 in Section V-C).

Notation: In this paper, the terms terminal, node, and location are synonymous and are used interchangeably. The acronym 'iid' stands for independent and identically distributed and 'pmf' stands for probability mass function.
Boldface letters such as x, X, etc., are used to denote vectors. Although the dimension of a vector is suppressed in this notation, it will be clear from the context. With the exception of the symbols R, D, N, L, A, and B, random quantities are denoted in upper case, e.g., X, X, etc., and their specific instantiations are denoted in lower case, e.g., X = x, X = x, etc. When X denotes a random variable, X^n denotes the ordered tuple (X_1, ..., X_n) and X_m^n denotes the ordered tuple (X_m, ..., X_n). However, for a set S, S^n denotes the n-fold Cartesian product S × ... × S. The symbol X(i^-) denotes (X(1), ..., X(i - 1)) and X(i^+) denotes (X(i + 1), ..., X(n)). The indicator function of a set S, which is equal to one if x ∈ S and zero otherwise, is denoted by 1_S(x). The support-set of a pmf p is the set over which it is strictly positive and is denoted by supp(p). Symbols ⊕, ∧, and ∨ represent Boolean XOR, AND, and OR respectively.

II. Two-terminal interactive function computation

A. Interactive distributed source code

We consider two statistically dependent discrete memoryless stationary sources taking values in finite alphabets. For i = 1, ..., n, let (X(i), Y(i)) ~ iid p_XY(x, y), x ∈ X, y ∈ Y, |X| < ∞, |Y| < ∞. Here, p_XY is a joint pmf which describes the statistical dependencies among the samples observed at the two locations at each time instant i. Let f_A : X × Y → Z_A and f_B : X × Y → Z_B be functions of interest at locations A and B respectively, where Z_A and Z_B are finite alphabets. The desired outputs at locations A and B are Z_A and Z_B respectively, where for i = 1, ..., n, Z_A(i) := f_A(X(i), Y(i)) and Z_B(i) := f_B(X(i), Y(i)).
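The samplewise structure of the desired outputs can be made concrete with a small sketch (illustrative only; the helper name `samplewise` is ours): each location's target sequence is obtained by applying its function coordinate by coordinate to the source sample pairs.

```python
def samplewise(f, xs, ys):
    """Apply a per-sample function f to n source sample pairs:
    Z(i) = f(X(i), Y(i)) for i = 1, ..., n."""
    return [f(x, y) for x, y in zip(xs, ys)]

# Example: f_A = f_B = Boolean AND of two binary sources.
f_and = lambda x, y: x & y
xs = [0, 1, 1, 0, 1]
ys = [1, 1, 0, 0, 1]
assert samplewise(f_and, xs, ys) == [0, 1, 0, 0, 1]
```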
Definition 1: A (two-terminal) interactive distributed source code (for function computation) with initial location A and parameters (t, n, |M_1|, ..., |M_t|) is the tuple (e_1, ..., e_t, g_A, g_B) of t block encoding functions e_1, ..., e_t and two block decoding functions g_A, g_B, of blocklength n, where for j = 1, ..., t,

(Enc. j)  e_j : X^n × M_1 × ... × M_{j-1} → M_j, if j is odd,
          e_j : Y^n × M_1 × ... × M_{j-1} → M_j, if j is even,
(Dec. A)  g_A : X^n × M_1 × ... × M_t → Z_A^n,
(Dec. B)  g_B : Y^n × M_1 × ... × M_t → Z_B^n.

The output of e_j, denoted by M_j, is called the j-th message, and t is the number of messages. The outputs of g_A and g_B are denoted by Ẑ_A and Ẑ_B respectively. For each j, (1/n) log_2 |M_j| is called the j-th block-coding rate (in bits per sample). Intuitively speaking, t coded messages, M_1, ..., M_t, are sent alternately from the two locations, starting with location A. The message sent from a location can depend on the source samples at that location and on all the previous messages (which are available to both locations from previous message transfers). There is enough memory at both locations to store all the source samples and messages. We consider two types of fidelity criteria for interactive function computation in this paper: 1) probability of block error and 2) per-sample distortion.

B. Probability of block error and operational rate region

Of interest here are the probabilities of block error P(Z_A ≠ Ẑ_A) and P(Z_B ≠ Ẑ_B), which are multi-letter distortion functions. The performance of t-message interactive coding for function computation is measured as follows.

Definition 2: A rate tuple R = (R_1, ..., R_t) is admissible for t-message interactive function computation with initial location A if, ∀ε > 0, ∃ N(ε, t) such that ∀n > N(ε, t), there exists an interactive distributed source code with initial location A and parameters (t, n, |M_1|, ..., |M_t|) satisfying

(1/n) log_2 |M_j| ≤ R_j + ε,  j = 1, ..., t,
P(Z_A ≠ Ẑ_A) ≤ ε,  P(Z_B ≠ Ẑ_B) ≤ ε.

The set of all admissible rate tuples, denoted by R_t^A, is called the operational rate region for t-message interactive function computation with initial location A. The rate region is closed and convex due to the way it has been defined. The minimum sum-rate R_{sum,t}^A is given by min Σ_{j=1}^t R_j, where the minimization is over R ∈ R_t^A. For initial location B, the rate region and the minimum sum-rate are denoted by R_t^B and R_{sum,t}^B respectively.

C. Per-sample distortion and operational rate-distortion region

Let d_A : X × Y × Z_A → R_+ and d_B : X × Y × Z_B → R_+ be bounded single-letter distortion functions. The fidelity of function computation can be measured by the per-sample average distortions

d_A^(n)(x, y, ẑ_A) := (1/n) Σ_{i=1}^n d_A(x(i), y(i), ẑ_A(i)),
d_B^(n)(x, y, ẑ_B) := (1/n) Σ_{i=1}^n d_B(x(i), y(i), ẑ_B(i)).

Of interest here are either the expected per-sample distortions E[d_A^(n)(X, Y, Ẑ_A)] and E[d_B^(n)(X, Y, Ẑ_B)] or the probabilities of excess distortion P(d_A^(n)(X, Y, Ẑ_A) > D_A) and P(d_B^(n)(X, Y, Ẑ_B) > D_B). Note that although the desired functions f_A and f_B do not explicitly appear in these fidelity criteria, they are subsumed by d_A and d_B because they accommodate general relationships between the sources and the outputs of the decoding functions. The performance of t-message interactive coding for function computation is measured as follows.
Definition 3: A rate-distortion tuple (R, D) = (R_1, ..., R_t, D_A, D_B) is admissible for t-message interactive function computation with initial location A if, ∀ε > 0, ∃ N(ε, t) such that ∀n > N(ε, t), there exists an interactive distributed source code with initial location A and parameters (t, n, |M_1|, ..., |M_t|) satisfying

(1/n) log_2 |M_j| ≤ R_j + ε,  j = 1, ..., t,
E[d_A^(n)(X, Y, Ẑ_A)] ≤ D_A + ε,  E[d_B^(n)(X, Y, Ẑ_B)] ≤ D_B + ε.

The set of all admissible rate-distortion tuples, denoted by RD_t^A, is called the operational rate-distortion region for t-message interactive function computation with initial location A. The rate-distortion region is closed and convex due to the way it has been defined. The sum-rate-distortion function R_{sum,t}^A(D) is given by min Σ_{j=1}^t R_j, where the minimization is over all R such that (R, D) ∈ RD_t^A. For initial location B, the rate-distortion region and the sum-rate-distortion function are denoted by RD_t^B and R_{sum,t}^B(D) respectively. The admissibility of a rate-distortion tuple can also be defined in terms of the probability of excess distortion, by replacing the expected distortion conditions in Definition 3 with the conditions P(d_A^(n)(X, Y, Ẑ_A) > D_A) ≤ ε and P(d_B^(n)(X, Y, Ẑ_B) > D_B) ≤ ε. Although these conditions appear to be more stringent,^3 it can be shown^4 that they lead to the same operational rate-distortion region. For simplicity, we focus on the expected distortion conditions as in Definition 3.

D. Discussion

For a t-message interactive distributed source code, if |M_t| = 1, then M_t = constant (a null message), nothing needs to be sent in the last step, and the t-message code reduces to a (t - 1)-message code. Thus the (t - 1)-message rate region is contained within the t-message rate region.
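The null-message reduction above amounts to simple rate-tuple bookkeeping, sketched below (our own illustration; the helper names are hypothetical): appending a null message pads a rate tuple with a zero, and a null first message lets a code that starts at the other location be viewed as one that starts at this location. In either case the sum-rate is unchanged.

```python
def append_null_message(rates):
    """A (t-1)-message rate tuple becomes a t-message tuple with the
    same initial location by sending a final null message (rate 0)."""
    return tuple(rates) + (0.0,)

def prepend_null_message(rates):
    """A (t-1)-message rate tuple with initial location B becomes a
    t-message tuple with initial location A via a null first message."""
    return (0.0,) + tuple(rates)

r = (0.5, 0.25)   # a 2-message rate tuple (bits per sample)
assert append_null_message(r) == (0.5, 0.25, 0.0)
assert prepend_null_message(r) == (0.0, 0.5, 0.25)
assert sum(append_null_message(r)) == sum(r)   # sum-rate unchanged
```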
For generality and convenience, |M_j| = 1 is allowed for all j ≤ t. The following proposition summarizes some key properties of the rate regions which are needed in the sequel.

Proposition 1:
(i) If (R_1, ..., R_{t-1}) ∈ R_{t-1}^A, then (R_1, ..., R_{t-1}, 0) ∈ R_t^A. Hence R_{sum,(t-1)}^A ≥ R_{sum,t}^A.
(ii) If (R_1, ..., R_{t-1}) ∈ R_{t-1}^B, then (0, R_1, ..., R_{t-1}) ∈ R_t^A. Hence R_{sum,(t-1)}^B ≥ R_{sum,t}^A. Similarly, R_{sum,(t-1)}^A ≥ R_{sum,t}^B.
(iii) lim_{t→∞} R_{sum,t}^A = lim_{t→∞} R_{sum,t}^B =: R_{sum,∞}.

Proof: (i) Any (t - 1)-message code with initial location A can be regarded as a special case of a t-message code with initial location A by taking |M_t| = 1. (ii) Any (t - 1)-message code with initial location B can be regarded as a special case of a t-message code with initial location A by taking |M_1| = 1. (iii) From (i), R_{sum,t}^A and R_{sum,t}^B are nonincreasing in t and bounded from below by zero, so the limits exist. From (ii), R_{sum,(t-1)}^A ≥ R_{sum,t}^B ≥ R_{sum,(t+1)}^A, hence the limits are equal.

Proposition 1 is also true for any fixed distortion levels (D_A, D_B) if we replace rate regions and minimum sum-rates in the proposition by rate-distortion regions and sum-rate-distortion functions respectively.

3 Any tuple which is admissible according to the probability of excess distortion criteria is also admissible according to the expected distortion criteria.
4 Using strong-typicality arguments in the proof of the achievability part of the single-letter characterization of the rate-distortion region.

E. Interaction with concurrent message exchanges

In contrast to the type of interaction described in Section II-A, which involves alternating message transfers, one could also consider another type of interaction which involves concurrent message exchanges.
In this type of interaction, in the j-th round, two messages M_j^{AB} and M_j^{BA} are generated simultaneously by encoding functions e_j^{AB} (at location A) and e_j^{BA} (at location B) respectively. These messages are based on the source samples which are available at each location and on all the previous messages {M_i^{AB}, M_i^{BA}}_{i=1}^{j-1}, which are available to both locations from previous rounds of interaction. Then M_j^{AB} and M_j^{BA} are exchanged. In t rounds, 2t messages are transferred. After t rounds of interaction, decoding functions g_A and g_B generate function estimates based on all the messages and the source samples which are available at locations A and B respectively. We can define the rate region and the rate-distortion region for concurrent interaction as in Sections II-B and II-C for alternating interaction. Let R_{sum,t}^{conc} denote the minimum sum-rate for t-round interactive function computation with concurrent message exchanges. The following proposition shows how the minimum sum-rates for concurrent and alternating types of interaction bound each other. It is based on a purely structural comparison of the alternating and concurrent modes of interaction.

Proposition 2: (i) R_{sum,t}^A ≥ R_{sum,t}^{conc} ≥ R_{sum,(t+1)}^A. (ii) lim_{t→∞} R_{sum,t}^{conc} = lim_{t→∞} R_{sum,t}^A = R_{sum,∞}.

Proof: (i) The first inequality holds because any t-message interactive code with alternating messages and initial location A can be regarded as a special case of a t-round interactive code with concurrent messages by taking |M_j^{AB}| = 1 for all even j and |M_j^{BA}| = 1 for all odd j. The second inequality can be proved as follows.
Given any t-round interactive code with concurrent messages and encoding functions {e_j^{AB}, e_j^{BA}}_{j=1}^t, one can construct a (t + 1)-message interactive code with alternating messages as follows: (1) Set e_1 := e_1^{AB}. (2) For j = 2, ..., t, if j is even, define e_j as the combination of e_{j-1}^{BA} and e_j^{BA}; otherwise, define e_j as the combination of e_{j-1}^{AB} and e_j^{AB}. (3) If t is even, set e_{t+1} := e_t^{AB}; otherwise set e_{t+1} := e_t^{BA}. It can be verified by induction that the inputs of {e_1, ..., e_{t+1}} defined in this way are indeed available when these encoding functions are used. Hence these are valid encoding functions for interactive coding with alternating messages. This (t + 1)-message interactive code with alternating messages has the same sum-rate as the original t-round interactive code with concurrent messages. Therefore R_{sum,(t+1)}^A ≤ R_{sum,t}^{conc}. (ii) This follows from (i).

Although a t-round interactive code with concurrent messages uses 2t messages, its sum-rate performance is bounded by that of an alternating-message code with only (t + 1) messages. When t is large, the benefit of concurrent interaction over alternating interaction disappears. For this reason, and because for two-terminal function computation it is easier to describe results for alternating interaction, in Sections III and IV our discussion will be confined to alternating interaction. For multiterminal function computation, however, the framework of concurrent interaction becomes more convenient. Hence in Section V we consider multiterminal function computation problems with concurrent interaction.

III. Rate region
A. Probability of block error

When the probability of block error is used to measure the quality of function computation, the rate region for t-message interactive distributed source coding with alternating messages can be characterized in terms of single-letter mutual information quantities involving auxiliary random variables satisfying conditional entropy constraints and Markov chain constraints. This characterization is provided by Theorem 1.

Theorem 1:

R_t^A = { R | ∃ U^t = (U_1, ..., U_t) such that ∀i = 1, ..., t,
    R_i ≥ I(X; U_i | Y, U^{i-1}), with U_i − (X, U^{i-1}) − Y, if i is odd,
    R_i ≥ I(Y; U_i | X, U^{i-1}), with U_i − (Y, U^{i-1}) − X, if i is even,
    H(f_A(X, Y) | X, U^t) = 0,  H(f_B(X, Y) | Y, U^t) = 0 },        (3.1)

where U^t are auxiliary random variables taking values in alphabets whose cardinalities are bounded as follows:

    |U_j| ≤ |X| · Π_{i=1}^{j-1} |U_i| + t − j + 3, if j is odd,
    |U_j| ≤ |Y| · Π_{i=1}^{j-1} |U_i| + t − j + 3, if j is even.    (3.2)

It should be noted that the right side of (3.1) is convex and closed. This is because R_t^A is convex and closed and Theorem 1 shows that the right side of (3.1) is the same as R_t^A. In fact, the convexity and closedness of the right side of (3.1) can be shown directly, without appealing to Theorem 1 and the properties of R_t^A. This is explained at the end of Appendix I. The proof of achievability follows from standard random coding and random binning arguments as in the source coding with side information problems studied by Wyner, Ziv, Gray, Ahlswede, and Körner [5] (also see Kaspi [4]). We only develop the intuition and informally sketch the steps leading to the proof of achievability. The key idea is to use a sequence of "Wyner-Ziv-like" codes. First, Enc. 1 quantizes X to U_1 ∈ (U_1)^n using a random codebook-1. The codewords are further randomly distributed into bins and the bin index of U_1 is sent to location B.
Enc.2 identifies U_1 from the bin with the help of Y as decoder side information. Next, Enc.2 jointly quantizes (Y, U_1) to U_2 ∈ (U_2)^n using a random codebook-2. The codewords are randomly binned and the bin index of U_2 is sent to location A. Enc.3 identifies U_2 from the bin with the help of (X, U_1) as decoder side information. Generally, for the j-th message, j odd, Enc.j jointly quantizes (X, U^{j-1}) to U_j ∈ (U_j)^n using a random codebook-j. The codewords are randomly binned and the bin index of U_j is sent to location B. Enc.(j+1) identifies U_j from the bin with the help of (Y, U^{j-1}) as decoder side information. If j is even, interchange the roles of locations A and B and of sources X and Y in the procedure for an odd j. Note that H(f_A(X,Y) | X, U^t) = 0 implies the existence of a deterministic function φ_A such that φ_A(X, U^t) = f_A(X,Y). At the end of t messages, Dec.A produces Ẑ_A by Ẑ_A(i) = φ_A(X(i), U^t(i)), ∀ i = 1, ..., n. Similarly, Dec.B produces Ẑ_B. The rate and Markov chain constraints ensure that all quantized codewords are jointly strongly typical with the sources and are recovered with a probability which tends to one as n → ∞. The conditional entropy constraints ensure that the corresponding block error probabilities for function computation go to zero as the blocklength tends to infinity. The (weak) converse is proved in Appendix I following [4], using standard information inequalities, suitably defined auxiliary random variables, and convexification (time-sharing) arguments. The conditional entropy constraints are established using Fano's inequality as in [8, Lemma 1]. The proof of the cardinality bounds for the alphabets of the auxiliary random variables is also sketched.
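For intuition, the constraints in (3.1) can be verified numerically for a toy single-message (t = 1) choice. A minimal sketch, assuming X, Y iid Ber(1/2) and independent, f_B(x,y) = x AND y, and the (illustrative, not optimal) choice U_1 = X; the helper functions are ours:

```python
from math import log2
from itertools import product

# Toy check of (3.1) for t = 1: X, Y independent Ber(1/2),
# f_B(x, y) = x AND y, and the illustrative choice U_1 = X.
pXYU = {}
for x, y in product((0, 1), repeat=2):
    pXYU[(x, y, x)] = 0.25            # U_1 = X deterministically

def marg(p, keep):
    """Marginalize a joint pmf (dict over tuples) onto coordinates `keep`."""
    out = {}
    for xyu, pr in p.items():
        key = tuple(xyu[i] for i in keep)
        out[key] = out.get(key, 0.0) + pr
    return out

def cond_ent(p, target, given):
    """H(target | given), coordinates given as index lists into the tuples."""
    pj = marg(p, given + target)
    pg = marg(p, given)
    return -sum(pr * log2(pr / pg[k[:len(given)]]) for k, pr in pj.items())

# Conditional entropy constraint: H(f_B(X,Y) | Y, U_1) = 0, i.e.,
# location B can compute the AND from (Y, U_1).
pF = {(y, u, x & y): pr for (x, y, u), pr in pXYU.items()}
assert abs(cond_ent(pF, [2], [0, 1])) < 1e-12
# Rate constraint: I(X; U_1 | Y) = H(X|Y) - H(X|Y,U_1) = 1 bit,
# since U_1 = X fully reveals X and Y carries no information about X here.
rate = cond_ent(pXYU, [0], [1]) - cond_ent(pXYU, [0], [1, 2])
assert abs(rate - 1.0) < 1e-12
```

The same two helper functions can be reused to test any candidate U^t on a small alphabet before attempting an analytical optimization of (3.3).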
Corollary 1: For all t,

  (i) R^A_{sum,t} = min_{U^t} [ I(X; U^t | Y) + I(Y; U^t | X) ],   (3.3)
  (ii) R^A_{sum,t} ≥ H(f_B(X,Y) | Y) + H(f_A(X,Y) | X),   (3.4)

where in (i) U^t are subject to all the Markov chain and conditional entropy constraints in (3.1) and the cardinality bounds given by (3.2).

Proof: For (i), add all the rate inequalities in (3.1) while enforcing all the constraints. Inequality (ii) can be proved either using (3.3) and relaxing the Markov chain constraints, or using the following cut-set bound argument. If Y is also available at location A, then Z_B = f_B(X,Y) can be computed at location A. Hence, by the converse part of the Slepian-Wolf theorem [5], the sum-rate of all messages from A to B must be at least H(f_B(X,Y) | Y) for B to form Z_B. Similarly, the sum-rate of all messages from B to A must be at least H(f_A(X,Y) | X).

Although (3.1) and (3.3) provide computable single-letter characterizations of R^A_t and R^A_{sum,t}, respectively, for all finite t, they do not provide a characterization of R_{sum,∞} in terms of computable single-letter information quantities. This is because the cardinality bounds for the alphabets of the auxiliary random variables U^t, given by (3.2), grow with t. The Markov chain and conditional entropy constraints of (3.1) imply certain structural properties which the support set of the joint distribution of the source and auxiliary random variables needs to satisfy. These properties are formalized below in Lemma 1. This lemma provides a bridge between certain concepts which have played a key role in the communication complexity literature [1] and distributed source coding theory. In order to state the lemma, we need to introduce some terminology used in the communication complexity literature [1], adapted to our framework and notation.
A subset A ⊆ X × Y is called f-monochromatic if the function f is constant on A. A subset A ⊆ X × Y is called a rectangle if A = S_X × S_Y for some S_X ⊆ X and some S_Y ⊆ Y. Subsets of the form {x} × S_Y, x ∈ S_X, are called rows, and subsets of the form S_X × {y}, y ∈ S_Y, are called columns of the rectangle A = S_X × S_Y. By definition, the empty set is simultaneously a rectangle, a row, and a column. If each row of a rectangle A is f-monochromatic, then A is said to be row-wise f-monochromatic. Similarly, if each column of a rectangle A is f-monochromatic, then A is said to be column-wise f-monochromatic. Clearly, if A is both row-wise and column-wise f-monochromatic, then it is an f-monochromatic subset of X × Y.

Lemma 1: Let U^t be any set of auxiliary random variables satisfying the Markov chain and conditional entropy constraints of (3.1). Let A(u^t) := {(x,y) | p_{XYU^t}(x, y, u^t) > 0} denote the projection of the u^t-slice of supp(p_{XYU^t}) onto X × Y. If supp(p_{XY}) = X × Y, then for all u^t, the following four conditions hold. (i) A(u^t) is a rectangle. (ii) A(u^t) is row-wise f_A-monochromatic. (iii) A(u^t) is column-wise f_B-monochromatic. (iv) If, in addition, f_A = f_B = f, then A(u^t) is f-monochromatic.

Proof: (i) The Markov chains in (3.1) induce the following factorization of the joint pmf:

  p_{XYU^t}(x, y, u^t) = p_{XY}(x,y) · p_{U_1|X}(u_1|x) · p_{U_2|Y,U_1}(u_2|y,u_1) · p_{U_3|X,U^2}(u_3|x,u^2) · · · =: p_{XY}(x,y) φ_X(x, u^t) φ_Y(y, u^t),

where φ_X is the product of all the factors having conditioning on x and φ_Y is the product of all the factors having conditioning on y. Let S_X(u^t) := {x | φ_X(x, u^t) > 0} and S_Y(u^t) := {y | φ_Y(y, u^t) > 0}. Since p_{XY}(x,y) > 0 for all x and y, A(u^t) = S_X(u^t) × S_Y(u^t).
(ii) This follows from the conditional entropy constraint H(f_A(X,Y) | X, U^t) = 0 in (3.1). (iii) This follows from the conditional entropy constraint H(f_B(X,Y) | Y, U^t) = 0 in (3.1). (iv) This follows from parts (ii) and (iii) of this lemma.

Note that A(u^t) is the empty set if, and only if, p_{U^t}(u^t) = 0. The above lemma holds for all values of t. The fact that the set A(u^t) has a rectangular shape is a consequence of the fact that the auxiliary random variables U^t need to satisfy the Markov chain constraints in (3.1). These Markov chain constraints are, in turn, consequences of the structural constraints which are inherent to the coding process: messages alternate from one terminal to the other and can depend on only the source samples and all the previously received messages which are available at a terminal. The rectangular property depends "less directly" on the function-structure than on the structure of the coding process and the structure of the joint source distribution. On the other hand, the fact that A(u^t) is row-wise and/or column-wise monochromatic is a consequence of the fact that the auxiliary random variables U^t need to satisfy the conditional entropy constraints in (3.1). This property is more closely tied to the structure of the function and the structure of the joint distribution of the sources. Lemma 1 will be used to prove Theorems 2 and 4 in the sequel.

B. Rate-distortion region

When per-sample distortion criteria are used, the single-letter characterization of the rate-distortion region is given by Theorem 1 with the conditional entropy constraints in (3.1) replaced by the following expected distortion constraints: there exist deterministic functions ĝ_A and ĝ_B such that E[d_A(X, Y, ĝ_A(X, U^t))] ≤ D_A and E[d_B(X, Y, ĝ_B(Y, U^t))] ≤ D_B.
The proof of achievability is similar to that of Theorem 1. The distortion constraints get satisfied "automatically" by using strongly typical sets in the random coding and binning arguments. The proof of the converse given in Appendix I will continue to hold if equations (I.4) and (I.5) are replaced by E[d^(n)_A(X, Y, Ẑ_A)] ≤ D_A + ǫ and E[d^(n)_B(X, Y, Ẑ_B)] ≤ D_B + ǫ, respectively, and the subsequent steps in the proof are changed appropriately. The following proposition clarifies the relationship between the rate region for probability of block error and the rate-distortion region.

Proposition 3: Let d_H denote the Hamming distortion function. If d_A(x, y, ẑ_A) = d_H(f_A(x,y), ẑ_A), d_B(x, y, ẑ_B) = d_H(f_B(x,y), ẑ_B), and D_A = D_B = 0, then {R | (R, 0, 0) ∈ RD^A_t} = R^A_t.

Proof: In order to show that {R | (R, 0, 0) ∈ RD^A_t} ⊇ R^A_t, note that ∀ R ∈ R^A_t we have ǫ ≥ P(Z_A ≠ Ẑ_A) ≥ E[d^(n)_A(X, Y, Ẑ_A)] and ǫ ≥ P(Z_B ≠ Ẑ_B) ≥ E[d^(n)_B(X, Y, Ẑ_B)] for the distortion functions assumed in the statement of the proposition. Therefore (R, 0, 0) ∈ RD^A_t. In order to show that {R | (R, 0, 0) ∈ RD^A_t} ⊆ R^A_t, note that ∀ R such that (R, 0, 0) ∈ RD^A_t, we have d_A(X, Y, ĝ_A(X, U^t)) = d_H(f_A(X,Y), ĝ_A(X, U^t)) = 0, which implies f_A(X,Y) = ĝ_A(X, U^t), which in turn implies H(f_A(X,Y) | X, U^t) = 0. Similarly, we have H(f_B(X,Y) | Y, U^t) = 0. Therefore R ∈ R^A_t.

Although the proof of the single-letter characterization of RD^A_t implies the proof of Theorem 1 for R^A_t, since the focus of this paper is on probability of block error and the proofs for RD^A_t are very similar, we provide the detailed converse proof only for Theorem 1 for R^A_t.

IV. Examples

Does interaction really help?
In other words, does interactive coding with more messages strictly outperform coding with fewer messages in terms of the sum-rate? When only one nontrivial function has to be computed at only one location, at least one message is needed. In this situation, interaction will be considered to be "useful" if there exists t > 1 such that R^A_{sum,t} < R^A_{sum,1}. When nontrivial functions have to be computed at both locations, at least two messages are needed, one going from A to B and the other from B to A. Since messages go in both directions, a two-message code can potentially be considered interactive. However, this is a trivial form of interaction because function computation is impossible without two messages. Therefore, in this situation, interaction will be considered to be useful if there exists t > 2 such that R^A_{sum,t} < R^A_{sum,2}. Corollary 1 does not directly tell us if or when interaction is useful. In this section we explore the value of interaction in different scenarios through some striking examples. Interaction does help in Examples IV-C, IV-E, and IV-F, and does not (even with infinite messages) in Examples IV-A, IV-B, and IV-D.

A. Interaction is useless for reproducing one source at one location: f_A(x,y) := 0, f_B(x,y) := x.

Only X needs to be reproduced at location B. Unless H(X|Y) = 0, at least one message is necessary. From (3.4), ∀ t ≥ 1, R^A_{sum,t} ≥ H(X|Y). But R^A_{sum,1} = H(X|Y) by Slepian-Wolf coding [5] with X as source and Y as decoder side information. Hence, by Proposition 1(i), R^A_{sum,t} = R^A_{sum,1} = H(X|Y) for all t ≥ 1.

B. Interaction is useless for reproducing both sources at both locations: f_A(x,y) := y, f_B(x,y) := x.

Unless H(X|Y) = 0 or H(Y|X) = 0, at least two messages are necessary.
From (3.4), ∀ t ≥ 2, R^A_{sum,t} ≥ H(X|Y) + H(Y|X). But R^A_{sum,2} = H(X|Y) + H(Y|X) by Slepian-Wolf coding, first with X as source and Y as decoder side information, and then vice versa. Hence, by Proposition 1(i), R^A_{sum,t} = R^A_{sum,2} = H(X|Y) + H(Y|X) for all t ≥ 2.

Examples IV-A and IV-B show that if the goal is source reproduction with vanishing distortion, interaction is useless.⁵ To discover the value of interaction, we must study either nonzero distortions or functions which involve both sources. Our focus is on the latter.

C. Benefit of interaction can be arbitrarily large for function computation: X ⊥ Y, X ∼ Uniform{1, ..., L}, p_Y(1) = 1 − p_Y(0) = p ∈ (0,1), f_A(x,y) := 0, f_B(x,y) := xy (real multiplication).

This is an expanded version of Example 8 in [9]. At least one message is necessary. If t = 1, an achievable scheme is to send X by Slepian-Wolf coding at the rate H(X|Y) = log_2 L so that the function can be computed at location B. Although location B is required to compute only the samplewise product and is not required to reproduce X, it turns out, rather surprisingly, that the one-message rate H(X|Y) cannot be decreased. This is a direct consequence of a lemma due to Han and Kobayashi, which we now state, adapted to our situation and notation.

Lemma 2 (Han and Kobayashi [8, Lemma 1]): Let supp(p_{XY}) = X × Y. If for all x_1, x_2 ∈ X with x_1 ≠ x_2, there exists y_0 ∈ Y such that f_B(x_1, y_0) ≠ f_B(x_2, y_0), then R^A_{sum,1} ≥ H(X|Y).

The condition of Lemma 2 is satisfied in our present example with y_0 = 1. Therefore we have R^A_{sum,1} = H(X|Y) = log_2 L. With one extra message and initial location B, however, Y can be reproduced at location A by entropy coding at the rate R_1 = H(Y) = h_2(p) bits per sample.
Then Z_B can be computed at location A and conveyed to location B via Slepian-Wolf coding at the rate R_2 = H(f_B(X,Y) | Y) = p log_2 L bits per sample, where h_2 is the binary entropy function. Therefore R^B_{sum,2} ≤ h_2(p) + p log_2 L. The benefit of even one extra message can be significant: for fixed L, the ratio R^A_{sum,1} / R^B_{sum,2} can be made arbitrarily large for suitably small p; for fixed p, the difference R^A_{sum,1} − R^B_{sum,2} can be made arbitrarily large for suitably large L. Extrapolating from this example, one might be led to believe that the benefit of interaction arises from computing nontrivial functions which involve both sources, as opposed to reproducing the sources themselves; in other words, that the function-structure determines whether interaction is beneficial or not (recall that the sources were independent in this example). However, the structure of the joint distribution plays an equally important role, and this aspect will be highlighted in the next example.

⁵ However, interaction can prove useful for source reproduction when it is either required to be error-free [11], [12] or when the sources are stationary but non-ergodic [6].

D. Interaction can be useless for computing any function at one location: Y = X ⊕ W, X ⊥ W, X ∼ Ber(q), W ∼ Ber(p), f_A(x,y) := 0, f_B(x,y) := any function.

If f_B(x,y) does not depend on x, i.e., there exists a function f′ such that f_B(x,y) = f′(y), no communication is needed and interaction does not help. If f_B(x,y) depends on x, then there exists y_0 ∈ {0,1} such that f_B(0, y_0) ≠ f_B(1, y_0). Theorem 2 below, proved in Appendix II, shows that interaction does not help even with infinite messages.

Theorem 2: Let f_A(x,y) = 0 and let Y = X ⊕ W, with X ⊥ W, X ∼ Ber(q), and W ∼ Ber(p).
If there exists a y_0 ∈ {0,1} such that f_B(0, y_0) ≠ f_B(1, y_0), then for all t ∈ Z^+, R^A_{sum,t} = H(X|Y).

Remark: The conclusion of Theorem 2, that interaction does not help, cannot be directly deduced from (3.4): when f_B(x,y) = x ∧ y (Boolean AND), the lower bound in Corollary 1(ii), H(X ∧ Y | Y) = H(X | Y = 1) p_Y(1), is strictly less than H(X|Y) if 0 < p, q < 1. The result of Theorem 2 can be generalized to the following theorem for non-binary sources. The proof of this theorem is provided in Appendix II immediately after the proof of Theorem 2.

Theorem 3: Let f_A(x,y) = 0 and let supp(p_{XY}) = X × Y. If (i) the only column-wise f_B-monochromatic rectangles of X × Y are subsets of rows and columns, and (ii) there exists a random variable W and deterministic functions ψ and η such that Y = ψ(X, W), X = η(Y, W), and H(Y|X) = H(W),⁶ then for all t ∈ Z^+, R^A_{sum,t} = H(X|Y).

The examples till this point have highlighted the effects of function-structure and distribution-structure on the benefit of interaction. The next example will highlight a slightly different aspect of function-structure associated with the situation in which both sides need to compute the same nontrivial function which involves both sources. The distribution-structure in the next example will be essentially the same as in Example IV-D but with q = 1/2 and 0 < p < 1, i.e., (X,Y) ∼ DSBS(p). However, both locations will need to compute the samplewise Boolean AND function. Interestingly, in this situation the benefit of interaction returns, as explained below.

E. Interaction can be useful for computing a function of sources at both locations: (X,Y) ∼ DSBS(p), p ∈ (0,1), f_A(x,y) = f_B(x,y) := x ∧ y.

Since both locations need to compute nontrivial functions, at least two messages are needed.
In a 2-message code with initial location A, location B should be able to produce Z_B after receiving the first message. By Lemma 2, R_1 ≥ H(X|Y) = h_2(p). With R_1 = h_2(p) and a Slepian-Wolf code with Y as side information, X can be reproduced at location B. Thus, for the second message, R_2 = H(f_B(X,Y) | X) = (1/2) h_2(p) is both necessary and sufficient to ensure that location A can produce Z_A. Hence R^A_{sum,2} = (3/2) h_2(p). If a third message is allowed, one choice of auxiliary random variables in (3.1) is U_1 := X ∨ W, W ∼ Ber(1/2), W ⊥ (X,Y), U_2 := Y ∧ U_1, and U_3 := X ∧ U_2. Hence U_3 = X ∧ Y = f_B(X,Y), which implies H(f_A(X,Y) | X, U^3) = H(f_B(X,Y) | Y, U^3) = 0. Hence,

  R^A_{sum,3} ≤ I(X; U^3 | Y) + I(Y; U^3 | X) = (5/4) h_2(p) + (1/2) h_2((1−p)/2) − (1−p)/2  (a)<  (3/2) h_2(p) = R^A_{sum,2},

where step (a) holds for all p ∈ (0,1) and the gap is maximum for p = 1/3. When p = 0.5, X ⊥ Y, and an achievable 3-message sum-rate is ≈ 1.406 < 1.5 = R^A_{sum,2}. Note that, as a special case of Example IV-D, if (X,Y) ∼ DSBS(p) and only location B needs to compute the Boolean AND function, interaction is useless. But if both locations need to compute it, and p ∈ (0,1), then the benefit of interaction returns. Motivated by the benefits of using more and more messages, we investigate infinite-message interaction in the following example.

⁶ It is easy to see that if Y = ψ(X, W), then H(Y|X) = H(W) ⇔ X ⊥ W and H(W | X, Y) = 0.

F. An achievable infinite-message sum-rate as a definite integral with infinitesimal-rate messages: X ⊥ Y, X ∼ Ber(p), Y ∼ Ber(q), p, q ∈ (0,1), f_A(x,y) = f_B(x,y) = x ∧ y.
[Fig. 2. (a) 4-message interactive code; (b) ∞-message interactive code; (c) ∞-message interactive code with the optimal rate-allocation curve Γ* when q ≥ p.]

As in Example IV-E, the 2-message minimum sum-rate is R^A_{sum,2} = H(X|Y) + H(f_B(X,Y) | X) = h_2(p) + p h_2(q). Example IV-E demonstrates the gain of interaction. This inspires us to generalize the 3-message code of Example IV-E to an arbitrary number of messages and evaluate an achievable infinite-message sum-rate. Since we are interested in the limit t → ∞, it is sufficient to consider even-valued t due to Proposition 1. Define real auxiliary random variables (V_x, V_y) ∼ Uniform([0,1]²). If X := 1_{[1−p,1]}(V_x) and Y := 1_{[1−q,1]}(V_y), then (X,Y) has the correct joint pmf, i.e., p_X(1) = 1 − p_X(0) = p, p_Y(1) = 1 − p_Y(0) = q, and X ⊥ Y. We will interpret 0 and 1 as real zero and real one, respectively, as needed. This interpretation will allow us to express Boolean arithmetic in terms of real arithmetic. Thus X ∧ Y (Boolean AND) = XY (real multiplication). Define a rate-allocation curve Γ parametrically by Γ := {(α(s), β(s)), 0 ≤ s ≤ 1}, where α and β are real, nondecreasing, absolutely continuous functions with α(0) = β(0) = 0, α(1) = (1−p), and β(1) = (1−q). The significance of Γ will become clear later. Now choose a partition of [0,1], 0 = s_0 < s_1 < · · ·
< s_{t/2−1} < s_{t/2} = 1, such that max_{i=1,...,t/2} (s_i − s_{i−1}) < Δ_t. For i = 1, ..., t/2, define t auxiliary random variables as follows:

  U_{2i−1} := 1_{[α(s_i),1] × [β(s_{i−1}),1]}(V_x, V_y),   U_{2i} := 1_{[α(s_i),1] × [β(s_i),1]}(V_x, V_y).

In Figure 2(a), (V_x, V_y) is uniformly distributed on the unit square and the U^t are defined to be 1 in rectangular regions which are nested. The following properties can be verified:

P1: U_1 ≥ U_2 ≥ · · · ≥ U_t.

P2: H(X ∧ Y | X, U^t) = H(X ∧ Y | Y, U^t) = 0, since U_t = 1_{[1−p,1] × [1−q,1]}(V_x, V_y) = X ∧ Y.

P3: U^t satisfy all the Markov chain constraints in (3.1). For example, consider U_{2i} − (Y, U_{2i−1}) − X. If U_{2i−1} = 0, then U_{2i} = 0 and the Markov chain holds. If U_{2i−1} = Y = 1, then (V_x, V_y) ∈ [α(s_i),1] × [1−q,1], so U_{2i} = 1 and the Markov chain holds. Given U_{2i−1} = 1 and Y = 0, we have (V_x, V_y) ∼ Uniform([α(s_i),1] × [β(s_{i−1}),1−q]), so V_x and V_y are conditionally independent. Thus X ⊥ U_{2i} | (U_{2i−1} = 1, Y = 0), because, upon conditioning, X is a function of only V_x and U_{2i} is a function of only V_y. So the Markov chain U_{2i} − (Y, U_{2i−1}) − X holds in all situations.

P4: (Y, U_{2i}) ⊥ X | U_{2i−1} = 1; this can be proved by the same method as in P3.

P2 and P3 show that the U^t satisfy all the constraints in (3.1). For i = 1, ..., t/2, the (2i)-th rate is given by

  I(Y; U_{2i} | X, U_{2i−1})
   (P1)= I(Y; U_{2i} | X, U_{2i−1} = 1) p_{U_{2i−1}}(1)
   (P4)= I(Y; U_{2i} | U_{2i−1} = 1) p_{U_{2i−1}}(1)
   = H(Y | U_{2i−1} = 1) p_{U_{2i−1}}(1) − H(Y | U_{2i}, U_{2i−1} = 1) p_{U_{2i−1}}(1)
   (b)= H(Y | U_{2i−1} = 1) p_{U_{2i−1}}(1) − H(Y | U_{2i} = 1) p_{U_{2i}}(1)
   = (1 − α(s_i)) [ (1 − β(s_{i−1})) h_2( q / (1 − β(s_{i−1})) ) − (1 − β(s_i)) h_2( q / (1 − β(s_i)) ) ]
   (c)= (1 − α(s_i)) ∫_{β(s_{i−1})}^{β(s_i)} log_2( (1 − v_y) / (1 − q − v_y) ) dv_y = ∫∫_{[α(s_i),1] × [β(s_{i−1}),β(s_i)]} w_y(v_y, q) dv_x dv_y,

where step (b) is due to property P4 and because (U_{2i−1}, U_{2i}) = (1,0) implies Y = 0, hence H(Y | U_{2i}, U_{2i−1} = 1) p_{U_{2i−1}}(1) = H(Y | U_{2i} = 1, U_{2i−1} = 1) p_{U_{2i},U_{2i−1}}(1,1) (P1)= H(Y | U_{2i} = 1) p_{U_{2i}}(1), and step (c) is because

  ∂/∂v_y [ −(1 − v_y) h_2( q / (1 − v_y) ) ] = log_2( (1 − v_y) / (1 − q − v_y) ) =: w_y(v_y, q).

The (2i)-th rate can thus be expressed as a 2-D integral of a weight function w_y over the rectangular region Reg(2i) := [α(s_i),1] × [β(s_{i−1}), β(s_i)] (a horizontal bar in Figure 2(a)). Therefore, the sum of the rates of all messages sent from location B to location A is the integral of w_y over the union of all the corresponding horizontal bars in Figure 2(a). Similarly, the sum of the rates of all messages sent from location A to location B can be expressed as the integral of another weight function w_x(v_x, p) := log_2( (1 − v_x) / (1 − p − v_x) ) over the union of all the vertical bars in Figure 2(a). Now let t → ∞ such that Δ_t → 0. Since α and β are absolutely continuous, (α(s_i) − α(s_{i−1})) → 0 and (β(s_i) − β(s_{i−1})) → 0. The union of the horizontal (resp. vertical) bars in Figure 2(a) tends to the region W_y (resp. W_x) in Figure 2(b). Hence an achievable infinite-message sum-rate is given by

  ∫∫_{W_x} w_x(v_x, p) dv_x dv_y + ∫∫_{W_y} w_y(v_y, q) dv_x dv_y,   (4.5)

which depends on only the rate-allocation curve Γ, which coordinates the progress of the source descriptions at A and B. Since W_x ∪ W_y is independent of Γ, (4.5) is minimized when W_x = W*_x := {(v_x, v_y) ∈ [0,1−p] × [0,1−q] : w_x(v_x, p) ≤ w_y(v_y, q)} ∪ ([0,1−p] × [1−q,1]).
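Step (c) above can be sanity-checked numerically: the difference of scaled conditional entropies equals the integral of the weight function w_y. A minimal sketch using midpoint-rule quadrature; the particular values of q and of the partition points are arbitrary test choices of ours:

```python
from math import log2

def h2(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x*log2(x) - (1 - x)*log2(1 - x)

def w_y(v, q):
    """Weight function of step (c): log2((1-v)/(1-q-v)), valid for v < 1-q."""
    return log2((1 - v) / (1 - q - v))

q, b0, b1 = 0.3, 0.1, 0.4            # arbitrary test values; requires b1 <= 1-q
# Left side: difference of scaled conditional entropies (step before (c))
lhs = (1 - b0)*h2(q/(1 - b0)) - (1 - b1)*h2(q/(1 - b1))
# Right side: numerical integral of w_y over [b0, b1] (midpoint rule)
n = 100000
dv = (b1 - b0)/n
rhs = sum(w_y(b0 + (k + 0.5)*dv, q) for k in range(n)) * dv
assert lhs > 0
assert abs(lhs - rhs) < 1e-6
```

The same check works for the A-to-B rates with w_x in place of w_y, confirming that the per-message rates are exactly the areas under the weight functions over the horizontal and vertical bars.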
For q ≥ p, the boundary Γ* separating W*_x and W*_y is given by the piecewise linear curve connecting (0,0), ((q−p)/q, 0), and (1−p, 1−q), in that order (see Figure 2(c)). For W_x = W*_x, (4.5) can be evaluated in closed form and is given by

  h_2(p) + p h_2(q) + p log_2 q + p (1 − q) log_2 e.   (4.6)

Recall that R^A_{sum,2} = h_2(p) + p h_2(q). The difference, p(log_2 q + (1 − q) log_2 e), is an increasing function of q for q ∈ (0,1] and equals 0 when q = 1. Hence the difference is negative for q ∈ (0,1). So R_{sum,∞} < R^A_{sum,2} and interaction does help. In particular, when p = q = 1/2 ((X,Y) ∼ iid Ber(1/2)), an infinite-message code achieves the sum-rate 1 + (log_2 e)/4 ≈ 1.361, compared with the 3-message achievable sum-rate 1.406 and the 2-message minimum sum-rate 1.5 in Example IV-E. It should be noted that for finite t, Γ is staircase-like and contains only horizontal and vertical segments, whereas Γ* contains an oblique segment. So a code with finite t generated in this way never achieves the infinite-message sum-rate; the latter can be approached only as t → ∞ with each message using an infinitesimal rate. Note that the achievable sum-rate (4.5) is not shown to be the optimal sum-rate R_{sum,∞}, because we only consider a particular construction of the auxiliary random variables. We have, however, the following lower bound for R_{sum,∞}, which can be proved by a technique similar to the proof of Theorem 2.

Theorem 4: If X ⊥ Y, X ∼ Ber(p), Y ∼ Ber(q), f_A(x,y) = f_B(x,y) = x ∧ y, and 0 < p, q < 1, then

  R_{sum,∞} ≥ h_2(p) + h_2(q) − (1 − pq) h_2( (1−p)(1−q) / (1 − pq) ).

The proof is given in Appendix III. This lower bound is strictly less than (4.6) when 0 < p, q < 1.
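The chain of sum-rates discussed above can be verified numerically. A minimal sketch; the function names are ours, and r3 evaluates the 3-message bound of Example IV-E (which coincides with the independent-source case at p = 1/2):

```python
from math import log2, e

def h2(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x*log2(x) - (1 - x)*log2(1 - x)

def r2(p, q):
    """2-message minimum sum-rate h2(p) + p h2(q)."""
    return h2(p) + p*h2(q)

def r3(p):
    """3-message achievable sum-rate of Example IV-E for DSBS(p)."""
    return 1.25*h2(p) + 0.5*h2((1 - p)/2) - (1 - p)/2

def r_inf_ach(p, q):
    """Closed-form achievable infinite-message sum-rate (4.6)."""
    return h2(p) + p*h2(q) + p*log2(q) + p*(1 - q)*log2(e)

def r_inf_lb(p, q):
    """Lower bound on R_sum,inf from Theorem 4."""
    return h2(p) + h2(q) - (1 - p*q)*h2((1 - p)*(1 - q)/(1 - p*q))

p = q = 0.5
assert abs(r2(p, q) - 1.5) < 1e-12                       # 2-message rate: 1.5
assert abs(r3(p) - 1.4056) < 1e-4                        # 3-message rate: ~1.406
assert abs(r_inf_ach(p, q) - (1 + log2(e)/4)) < 1e-12    # (4.6): ~1.361
assert abs(r_inf_lb(p, q) - (2 - 0.75*h2(1/3))) < 1e-12  # Theorem 4: ~1.311
assert r_inf_lb(p, q) < r_inf_ach(p, q) < r3(p) < r2(p, q)
```

The final assertion checks the strict ordering 1.311 < 1.361 < 1.406 < 1.5, i.e., each additional level of interaction strictly reduces the achievable sum-rate while remaining above the Theorem 4 bound.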
For example, when p = q = 1/2 ((X,Y) ∼ iid Ber(1/2)), the bound in Theorem 4 gives us R_{sum,∞} ≥ 2 − (3/4) h_2(1/3) ≈ 1.311, compared with the infinite-message achievable sum-rate 1.361.

V. Multiterminal Interactive Function Computation

We can consider multiterminal interactive function computation problems as generalizations of the two-terminal interactive function computation problem. At a high level, interactive function computation may be thought of as a form of distributed source coding with progressive levels of feedback. Although the multiterminal problem is significantly more intricate, important insights can be extracted by leveraging results for the two-terminal problem. The ability to progressively refine information bidirectionally in multiple rounds lies at the heart of interactive function computation. This ability to refine information can have a significant impact on the efficiency of information transport in large networks, as discussed in Section V-C (see Example 3).

A. Problem formulation

Let m be the number of nodes. Consider m statistically dependent discrete memoryless stationary sources taking values in finite alphabets. For each j = 1, ..., m, let X_j := (X_j(1), ..., X_j(n)) ∈ (X_j)^n denote the n source samples which are available at node j. For i = 1, ..., n, let (X_1(i), X_2(i), ..., X_m(i)) ∼ iid p_{X_1,...,X_m}, where p_{X_1,...,X_m} is a joint pmf which describes the statistical dependencies among the samples observed at the m nodes at each time instant. For each j and i, let Z_j(i) := f_j(X_1(i), ..., X_m(i)) ∈ Z_j and let Z_j := (Z_j(1), ..., Z_j(n)). The tuple Z_j denotes n samples of the samplewise function of all the sources which is desired to be computed at node j.
Let the topology of the network be characterized by a directed graph G = (V, E), where V := {1, ..., m} is the vertex set of all the nodes and E is the edge set of all the directed links which are available for communication. The network topology describes the connectivity and information-flow constraints in the network. It is assumed that the topology is consistent with the goals of function computation, that is, for every node which computes a nontrivial function depending on the source samples at other nodes, there exists a set of directed paths over which information can be transferred from the relevant nodes to perform the computation. In order to perform the computations, a t-round multiterminal interactive distributed source code for function computation can be defined by extending the notion of a t-round concurrent-message interactive code for the two-terminal problem (see Section II-E) in the following manner. In the i-th round, i = 1, ..., t, for each directed link (j,k) ∈ E, a message M_{jki} is generated at node j as a pre-specified deterministic function of X_j and all the messages to and from this node in all the previous rounds. Then all the messages in the i-th round are transferred concurrently over all the available directed links. After t rounds, at each node j, a decoding function reproduces Z_j as Ẑ_j based on X_j and all the messages to and from this node. As part of the t-round interactive code specification, a message over any link in any round is allowed to be a null message, i.e., no message is sent over the link, and this is known in advance as part of the code. By incorporating null messages, the concurrent-message interactive coding scheme described above subsumes all conceivable types of interaction.
Let a link be called active in a given round if it does not carry a null message in that round. For each round i, let E_i denote the subset of directed links in E which are active. A t-round interaction protocol is the sequence of directed subgraphs E_1, ..., E_t which describes how the nodes are permitted to exchange messages over different rounds. This controls the dynamics of information flow in the network. Our key point of view, illustrated in Figure 3, is that interactive function computation is, at its heart, an interaction protocol which successively switches the information-flow topology among several basic distributed source coding configurations. In the two-terminal case, the alternating-message interaction protocol is simple: messages alternate from one node to the other; the only free parameter in the protocol is the initial node, which must be chosen to minimize the sum-rate. For this protocol, there is essentially only one type of configuration and accordingly only one basic distributed source coding strategy, namely, Wyner-Ziv-like coding with all the previously received messages as common side information available to both nodes. The multiterminal case is, however, significantly more intricate. For instance, with three nodes there are several basic configurations in addition to the point-to-point one, e.g., many-to-one, one-to-many, and relay, as shown in Figure 3.

[Fig. 3. Interactive function computation can be viewed as an interaction protocol which successively switches among several basic distributed source coding configurations. Panels: 3-terminal interactive function computation; many-to-one configuration; one-to-many configuration; relay configuration.]
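The protocol abstraction just described, a sequence of active-edge subsets E_1, ..., E_t with absent links carrying null messages, can be sketched as a small data structure. A toy sketch; the node labels and the two-round star schedule are our own illustration, not a construction from the paper:

```python
# Toy representation of a t-round interaction protocol on a directed
# graph G = (V, E): a sequence E_1, ..., E_t of active-edge subsets.
# A link absent from E_i carries a null message in round i.
V = {1, 2, 3}
E = {(1, 3), (3, 1), (2, 3), (3, 2), (1, 2), (2, 1)}

def is_valid_protocol(schedule, E):
    """Every active link in every round must be an available link of G."""
    return all(edge in E for round_edges in schedule for edge in round_edges)

# Round 1: leaf nodes 1 and 2 talk to hub node 3 (many-to-one configuration);
# round 2: node 3 replies to both leaves (one-to-many configuration).
schedule = [{(1, 3), (2, 3)}, {(3, 1), (3, 2)}]
assert is_valid_protocol(schedule, E)
assert not is_valid_protocol([{(1, 9)}], E)   # node 9 has no link in G
```

The two-round schedule above switches between exactly the many-to-one and one-to-many configurations of Figure 3, which is the kind of configuration-switching the section describes.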
The efficiency of communication for function computation can be measured at various levels. The most precise characterization would be in terms of the $(t|E|)$-dimensional rate tuple $(R_{jki})_{(j,k)\in E,\, i=1,\ldots,t}$ corresponding to the number of bits per sample on each link in each round. A coarser characterization would be in terms of the $|E|$-dimensional total-rate tuple $(R_{jk})_{(j,k)\in E}$, where $R_{jk}$ is the total number of bits per sample transferred through link $(j,k)$ over all the rounds. The coarsest characterization would be in terms of the sum-total-rate, which is the sum of the total number of bits per sample over all the rounds and all the links. One can then define admissible rates, admissible total-rates, and the minimum sum-total-rate $R_{sum,t}$, following Definition 2, in terms of rates for which there exist encoding and decoding functions for which the block error probability of function computation goes to zero as the block length goes to infinity. Let $t^*$ denote the minimum number of rounds for which function computation is feasible. Computation is nontrivial if $t^* \geq 1$. Clearly, $t^*$ is not more than the diameter of the largest connected component of the network, which is itself not more than $(m-1)$. Hence $t^* \leq (m-1)$. We will consider interaction to be useful if $R_{sum,t} < R_{sum,t^*}$ for some $t > t^*$. The search for an optimum interactive code is a twofold search over all interaction protocols and over all distributed source codes. The interaction protocol dictates which nodes transmit and which nodes receive messages in each round. The distributed source code dictates what information to send and how to decode it.
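The three granularities are related by simple aggregation. In the sketch below, the per-link, per-round rate values are hypothetical numbers used only to illustrate the bookkeeping:

```python
from collections import defaultdict

# Hypothetical per-link, per-round rates: (j, k, i) -> R_jki in bits/sample.
R = {(1, 2, 1): 0.50, (2, 3, 2): 0.50, (1, 3, 1): 0.25}

# |E|-dimensional total-rate tuple: R_jk = sum of R_jki over rounds i.
total_rate = defaultdict(float)
for (j, k, i), r in R.items():
    total_rate[(j, k)] += r

# Scalar sum-total-rate: sum over all links and all rounds.
sum_total_rate = sum(R.values())
```

Moving from the finest to the coarsest description discards information about *when* and *where* bits flow, which is precisely why the sum-total-rate is the natural single figure of merit for comparing protocols.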
In the two-terminal case, the standard machinery of random coding and binning is adequate to characterize the rate region and the minimum sum-rate because the code can be viewed as a sequence of Wyner–Ziv-like codes. In the multiterminal case, however, finding a computable characterization of the rate regions in terms of single-letter information measures can be challenging because the rate regions for even non-interactive special cases, such as the many-to-one, one-to-many, and relay configurations (see Figure 3), are longstanding open problems. For many of these configurations, the standard machinery of random coding and binning falls short of giving the optimal performance, as exemplified by the Körner–Marton problem [10]. These difficulties notwithstanding, results for the two-terminal interactive function computation problem can be used to develop insightful performance bounds and architectural guidelines for the general multiterminal problems. This is discussed in the following two subsections.

B. Cut-set bounds

Given any $t$-round multiterminal interactive function computation problem, we can formulate a $t$-round two-terminal interactive function computation problem with concurrent messages by regarding a set of nodes $S \subseteq V$ as one terminal and the complement $S^c$ as the other. The minimum sum-rate for this two-terminal problem is a lower bound for the minimum sum-total-rate between $S$ and $S^c$ in the original multiterminal problem. Let $R_{A,B} := \sum_{j \in A,\, k \in B,\, (j,k) \in E} R_{jk}$ denote the sum-total-rate from a set of nodes $A$ to a set of nodes $B$ (over all rounds and over all available directed links from $A$ to $B$).
Let $R^{S,S^c}_{sum,t}$ denote the minimum sum-rate of the $t$-round two-terminal problem with concurrent messages, with sources $(X_j)_{j \in S}$ at $A$ and $(X_j)_{j \in S^c}$ at $B$, and functions $(f_j(X^m))_{j \in S}$ and $(f_j(X^m))_{j \in S^c}$ to be computed at $A$ and $B$ respectively. A systematic method for developing cut-set lower bounds for the minimum sum-total-rate of the $t$-round multiterminal problem is to formulate a linear program with $(R_{jk})_{(j,k)\in E}$ as the variables and the sum-total-rate $\sum_{(j,k)\in E} R_{jk}$ as the linear objective function to be minimized, subject to the following linear inequality constraints: for all $S \subseteq V$,
\[
R_{S,S^c} \geq H\big((f_j(X^m))_{j\in S^c} \,\big|\, (X_j)_{j\in S^c}\big), \qquad
R_{S^c,S} \geq H\big((f_j(X^m))_{j\in S} \,\big|\, (X_j)_{j\in S}\big), \qquad
R_{S,S^c} + R_{S^c,S} \geq R^{S,S^c}_{sum,t},
\]
and $R_{jk} \geq 0$ for all $(j,k) \in E$. Note that the first two constraints respectively come from the first two terms on the right side of Corollary 1(ii). Such cut-set bounds can often provide insights into when interaction may be useful and when it may not be (see examples below).

C. Examples

Example 1: Consider three nodes with sources $(X_1, X_2) \sim \mathrm{DSBS}(p)$, $p \in (0,1)$, and $X_3 = 0$. The functions desired at nodes 1, 2, and 3 are $f_1 = 0$, $f_2 = 0$, and $f_3(x_1, x_2) = x_1 \oplus x_2$ respectively. In other words, correlated sources $X_1$ and $X_2$ are available at nodes 1 and 2 respectively, and node 3 needs to compute the samplewise Boolean XOR function $X_1 \oplus X_2$. Assume that this three-terminal network has a fully connected topology $E$. First consider the 1-round many-to-one interaction protocol given by $E_1 = \{(1,3), (2,3)\}$. Under this interaction protocol, the distributed function computation problem reduces to the Körner–Marton problem [10] and is illustrated in Figure 4(a).
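For Example 1, the cut-set linear program described above can be written down explicitly. The sketch below is an illustrative check (not code from the paper): it keeps only the two entropy cut constraints for $S = \{1\}$ and $S = \{2\}$, omits the two-terminal terms $R^{S,S^c}_{sum,t}$, and assumes SciPy is available. Even this relaxation already yields the lower bound $2h_2(p)$.

```python
import math
from scipy.optimize import linprog  # assumption: SciPy is installed

def h2(x):
    """Binary entropy function in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

p = 0.25
edges = [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]  # fully connected E
idx = {e: i for i, e in enumerate(edges)}

# Entropy cut constraints, written as -sum <= -rhs for linprog:
#   S = {1}: R12 + R13 >= H(X1 xor X2 | X2) = h2(p)
#   S = {2}: R21 + R23 >= H(X1 xor X2 | X1) = h2(p)
A_ub, b_ub = [], []
for cut_edges, rhs in [([(1, 2), (1, 3)], h2(p)), ([(2, 1), (2, 3)], h2(p))]:
    row = [0.0] * len(edges)
    for e in cut_edges:
        row[idx[e]] = -1.0
    A_ub.append(row)
    b_ub.append(-rhs)

res = linprog(c=[1.0] * len(edges), A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None)] * len(edges))
lower_bound = res.fun  # = 2 * h2(p) for this relaxation
```

As shown next, $2h_2(p)$ is achieved by the Körner–Marton scheme, so for this example the cut-set linear program is tight.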
Fig. 4. (a) Many-to-one Körner–Marton scheme. (b) Relay scheme. (c) General interactive scheme. When $(X_1, X_2) \sim \mathrm{DSBS}(p)$, $p \in (0,1)$, all three schemes have the same minimum sum-total-rate $2h_2(p)$.

The distributed source coding scheme of Körner and Marton based on binary linear codes (see [10]) achieves the goal of computing the Boolean XOR at node 3 with $R_{131} = R_{231} = R_{13} = R_{23} = H(X_1 \oplus X_2) = h_2(p)$. Hence the sum-total-rate of this non-interactive many-to-one coding scheme is given by $R_{13} + R_{23} = 2h_2(p)$ bits per sample. Thus, in this example, $t^* = 1$ and the coding is non-interactive. Next, consider the 2-round relay-based interaction protocol given by $E_1 = \{(1,2)\}$ and $E_2 = \{(2,3)\}$, as illustrated in Figure 4(b). Consider the following coding strategy. Using Slepian–Wolf coding in the first round, with $R_{121} = R_{12} = H(X_1|X_2) = h_2(p)$, $X_1$ can be reproduced at node 2. Then, $X_1 \oplus X_2$ can be computed at node 2 and the result of the computation can be conveyed to node 3 in the second round by entropy-coding at the rate given by $R_{232} = R_{23} = H(X_1 \oplus X_2) = h_2(p)$. Hence the sum-total-rate of this relay scheme is given by $R_{12} + R_{23} = 2h_2(p)$ bits per sample. Since under this protocol information is constrained to flow in only one direction, from source node 1 to source node 2 in round one and then from node 2 to the destination node 3 in round two, distributed source codes which respect this protocol are, truly speaking, non-interactive. Finally, consider general $t$-round interactive codes. The cut-set lower bound between $\{1\}$ and $\{2,3\}$ for computing $X_1 \oplus X_2$ at $\{2,3\}$ gives $R_{12} + R_{13} \geq H(X_1 \oplus X_2 | X_2) = h_2(p)$.
Interchanging the roles of nodes 1 and 2 in the previous cut-set bound, we also have $R_{21} + R_{23} \geq H(X_1 \oplus X_2 | X_1) = h_2(p)$. Adding these two bounds gives $R_{12} + R_{13} + R_{21} + R_{23} \geq 2h_2(p)$. Hence, $R_{sum,t} \geq 2h_2(p)$. This shows that the sum-total-rates of the many-to-one Körner–Marton and the relay schemes are optimum. No amount of interaction can reduce the sum-total-rate of these non-interactive schemes.

Example 2: Consider three nodes with sources $(X_1, X_2) \sim \mathrm{DSBS}(p)$, $p \in (0,1)$, and $X_3 = 0$. The functions desired at nodes 1, 2, and 3 are $f_1 = 0$, $f_2 = 0$, and $f_3(x_1, x_2) = x_1 \wedge x_2$ respectively. In other words, correlated sources $X_1$ and $X_2$ are available at nodes 1 and 2 respectively, and node 3 needs to compute the samplewise Boolean AND function instead of the XOR function of Example 1. As in Example 1, assume that this three-terminal network has a fully connected topology $E$. Consider a general $t$-round interactive code with the following interaction protocol: for all $i = 1, \ldots, t$, $E_i = \{(1,3), (3,1), (2,3), (3,2)\}$ (see Figure 5(a)). Note that nodes 1 and 2 cannot directly communicate with each other under this interaction protocol. Due to Theorem 2, the cut-set lower bound between $\{1\}$ and $\{2,3\}$ for computing $X_1 \wedge X_2$ at $\{2,3\}$ is given by $R_{13} + R_{31} \geq H(X_1|X_2) = h_2(p)$. Similarly, we have $R_{23} + R_{32} \geq H(X_2|X_1) = h_2(p)$.

Fig. 5. (a) Interactive many-to-one scheme. (b) Relay scheme. When $(X_1, X_2) \sim \mathrm{DSBS}(p)$ and $p \in (1/3, 1)$, the minimum sum-total-rate for (b) is less than that for (a).

Adding these two bounds gives $R_{13} + R_{31} + R_{23} + R_{32} \geq 2h_2(p)$. It should be clear that $t^* = 1$ because nodes 1 and 2 can send all their source samples to node 3 in one round.
If there is only one round, there is no advantage to be gained by transferring messages between nodes 1 and 2. This observation, together with the above cut-set bound, shows that $R_{sum,t^*} \geq 2h_2(p)$. Now consider the 2-round relay scheme illustrated in Figure 5(b). Using Slepian–Wolf coding in the first round, with $R_{121} = R_{12} = H(X_1|X_2) = h_2(p)$, $X_1$ can be reproduced at node 2. Then, $X_1 \wedge X_2$ can be computed at node 2 and the result of the computation can be conveyed to node 3 in the second round by entropy-coding at the rate given by $R_{232} = R_{23} = H(X_1 \wedge X_2) = h_2\!\left(\frac{1-p}{2}\right)$. Hence the sum-total-rate of this relay scheme is given by $R_{12} + R_{23} = h_2(p) + h_2\!\left(\frac{1-p}{2}\right)$ bits per sample, which is less than $2h_2(p)$ when $p > 1/3$. Thus, for $p > 1/3$, $R_{sum,2} < R_{sum,t^*}$ and interaction is useful.⁷ In fact, when $p > 1/3$, a single message from node 1 to node 2 is more beneficial in terms of the sum-total-rate than multiple rounds of two-way communication between nodes 1 and 3 and between nodes 2 and 3.

Example 3: Consider $m \geq 3$ nodes and $m$ independent sources $X_1, \ldots, X_m$, each of which is iid $\mathrm{Ber}(1/2)$. For each $i$, the $i$-th source $X_i$ is observed at only the $i$-th node. Only node 1 needs to compute the function $f_1(x^m) = \min_{j=1}^m (x_j)$. Assume that the network has a star topology with node 1 as the central node, as illustrated in Figure 6. Specifically, let $E = \{(j,1), (1,j)\}_{j=2}^m$. Consider non-interactive coding schemes in which information is constrained to flow in only one direction, from the leaf nodes to the central node, as illustrated in Figure 6(a). Specifically, the interaction protocol is given by $E_i = \{(j,1)\}_{j=2}^m$ for each $i = 1, \ldots, t$. Since information flows in only one direction from the leaf nodes to the central node, there is no loss of generality in assuming that $t = 1$. For each
$j = 2, \ldots, m$, let us compute the cut-set bound $R^{S,S^c}_{sum,t}$ with $S = \{j\}$ and $t = 1$. Using Lemma 2, we obtain $R_{j1} \geq H(X_j | X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_m) = H(X_j) = 1$. Therefore, $R_{sum,1} \geq (m-1)$. Since this is achievable by transferring all the data to node 1, $R_{sum,1} = (m-1) = \Theta(m)$. Thus, in this example, $t^* = 1$.

⁷ Truly speaking, this coding scheme is non-interactive because information flows in only one direction from node 1 to node 2 and then from node 2 to node 3.

Fig. 6. (a) Non-interactive function computation. (b) Interactive function computation. When $(X_1, \ldots, X_m) \sim$ iid $\mathrm{Ber}(1/2)$, the minimum sum-total-rate for (b) is orderwise smaller than that for (a).

Now consider the following $(2m-2)$-round interactive coding scheme in which information flows in both directions, from the leaf nodes to the central node and back, as illustrated in Figure 6(b). In round number $(2i-1)$, where $i$ ranges through integers from 1 through $(m-1)$, node 1 sends the sequence $\big(\min_{j=1}^i X_j(k)\big)_{k=1}^n$ to node $(i+1)$ at the rate $R_{1(i+1)(2i-1)} = H\big(\min_{j=1}^i X_j\big) = h_2(1/2^i)$ bits per sample. Node $(i+1)$ then computes the sequence $\big(\min_{j=1}^{i+1} X_j(k)\big)_{k=1}^n$ and sends it back to node 1 in round number $2i$, using Slepian–Wolf coding (or conditional coding) with the previous message as correlated side information available to the decoder (and the encoder). This can be done at the rate given by $R_{(i+1)1(2i)} = H\big(\min_{j=1}^{i+1} X_j \,\big|\, \min_{j=1}^i X_j\big) = 1/2^i$ bits per sample. It can be verified that the message sequence in round number $(2m-2)$ is the desired function. The sum-total-rate of this scheme is given by
\[
\sum_{i=1}^{m-1} \left[ h_2\!\left(\frac{1}{2^i}\right) + \frac{1}{2^i} \right]
\]
\[
\leq \sum_{i=1}^{m-1} \left[ \frac{1}{2^i}\log_2 e + \frac{i}{2^i} + \frac{1}{2^i} \right] < 3 + \log_2 e,
\]
where the first inequality is because $h_2(p) \leq p \log_2(e/p)$. Thus, for all $m \geq 6$,
\[
R_{sum,(2m-2)} < 3 + \log_2 e < 5 \leq (m-1) = R_{sum,t^*},
\]
showing that interaction is useful. In fact, the minimum sum-total-rate $R_{sum,(2m-2)}$ is $O(1)$ with respect to the number of nodes $m$ in the network. This is orderwise smaller than the $\Theta(m)$ of any 1-round non-interactive coding scheme.⁸

⁸ Note the following: a) In studying how the minimum sum-total-rate scales with network size, the coding blocklength is out of the picture because it has already been "sent to infinity". b) Even though $H(\min_{j=1}^m X_j) \to 0$ as $m \to \infty$, we cannot have nodes send nothing ($t = 0$) and set the output of node 1 to be identically zero. This is because then the probability of block error will be equal to one.

The above examples can be interpreted in two ways. From the perspective of protocol design, these examples show that for a given topology, certain information-routing configurations are fundamentally more efficient than certain others for function computation. From the perspective of network architecture, these examples show that certain topologies are fundamentally more efficient than certain others for function computation. The last example shows that the scaling laws governing the information transport efficiency in large networks can be dramatically different depending on whether the information transport is interactive or non-interactive.

VI. Concluding remarks

In this paper, we studied the two-terminal interactive function computation problem within a distributed source coding framework and demonstrated that the benefit of interaction depends on both the function-structure and the distribution-structure.
We formulated a multiterminal interactive function computation problem and demonstrated that interaction can change the scaling law of communication efficiency in large networks. There are several directions for future work. In two-terminal interactive function computation, a computable characterization of the infinite-message minimum sum-rate is still open. The achievable infinite-message sum-rate of Section IV-F, involving definite integrals and a rate-allocation curve, appears to be a promising approach. We have obtained only a partial characterization of the structure of functions and distributions for which interaction is not beneficial. An interesting direction would be to find necessary and sufficient conditions under which interaction is useful. The multiterminal interactive function computation problem is wide open. A promising direction would be to study how the total network rate scales with network size and understand how it is related to the network topology, the function-structure, and the distribution-structure.

Appendix I
Theorem 1 converse proof

If a rate tuple $\mathbf{R} = (R_1, \ldots, R_t)$ is admissible for the $t$-message interactive function computation with initial location $A$, then for all $\epsilon > 0$ there exists $N(\epsilon, t)$ such that for all $n > N(\epsilon, t)$ there exists an interactive distributed source code with initial location $A$ and parameters $(t, n, |\mathcal{M}_1|, \ldots, |\mathcal{M}_t|)$ satisfying
\[
\frac{1}{n} \log_2 |\mathcal{M}_j| \leq R_j + \epsilon, \;\; j = 1, \ldots, t, \qquad
P(Z_A \neq \hat{Z}_A) \leq \epsilon, \qquad P(Z_B \neq \hat{Z}_B) \leq \epsilon.
\]
Define auxiliary random variables, for $i = 1, \ldots, n$, $U_1(i) := \{M_1, X(i-), Y(i+)\}$, and for $j = 2, \ldots, t$, $U_j := M_j$.
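Before the chain of inequalities, it helps to record the indexing shorthand used below. These definitions are our reading of the chain-rule steps that follow (past samples of $X$, future samples of $Y$), stated here for convenience:

```latex
% Shorthand used in the single-letterization below (our paraphrase):
% X = (X(1), ..., X(n)) and Y = (Y(1), ..., Y(n)) are the length-n blocks,
\[
X(i-) := \big(X(1), \ldots, X(i-1)\big), \qquad
Y(i+) := \big(Y(i+1), \ldots, Y(n)\big).
\]
% With this notation the chain rule gives, for example,
\[
H(X \mid M_1, Y) \;=\; \sum_{i=1}^{n} H\big(X(i) \,\big|\, X(i-), M_1, Y\big),
\]
% which is the expansion used repeatedly in (I.1)--(I.5).
```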
Information inequalities: For the first rate, we have
\begin{align}
n(R_1 + \epsilon) &\geq H(M_1) \geq H(M_1|Y) \geq I(M_1; X|Y) = H(X|Y) - H(X|M_1, Y) \nonumber\\
&= \sum_{i=1}^n \big[ H(X(i)|Y(i)) - H(X(i)|X(i-), M_1, Y) \big] \nonumber\\
&\geq \sum_{i=1}^n \big[ H(X(i)|Y(i)) - H(X(i)|X(i-), M_1, Y(i), Y(i+)) \big] \nonumber\\
&= \sum_{i=1}^n I(X(i); M_1, X(i-), Y(i+) \,|\, Y(i)) = \sum_{i=1}^n I(X(i); U_1(i)|Y(i)). \tag{I.1}
\end{align}
For an odd $j \geq 2$, we have
\begin{align}
n(R_j + \epsilon) &\geq H(M_j) \geq H(M_j | M^{j-1}, Y) \geq I(M_j; X | M^{j-1}, Y) = H(X|M^{j-1}, Y) - H(X|M^j, Y) \nonumber\\
&= \sum_{i=1}^n \big[ H(X(i)|X(i-), M^{j-1}, Y) - H(X(i)|X(i-), M^j, Y) \big] \nonumber\\
&\stackrel{(a)}{=} \sum_{i=1}^n \big[ H(X(i)|X(i-), M^{j-1}, Y(i), Y(i+)) - H(X(i)|X(i-), M^j, Y) \big] \nonumber\\
&\geq \sum_{i=1}^n \big[ H(X(i)|X(i-), M^{j-1}, Y(i), Y(i+)) - H(X(i)|X(i-), M^j, Y(i), Y(i+)) \big] \nonumber\\
&= \sum_{i=1}^n I(X(i); M_j \,|\, M^{j-1}, X(i-), Y(i+), Y(i)) = \sum_{i=1}^n I(X(i); U_j \,|\, U_1(i), U_2^{j-1}, Y(i)). \tag{I.2}
\end{align}
Step (a) is because the Markov chain $X(i) - (M^{j-1}, X(i-), Y(i), Y(i+)) - Y(i-)$ holds for each $i = 1, \ldots, n$. Similarly, for an even $j \geq 2$, we have
\begin{align}
n(R_j + \epsilon) &\geq H(M_j) \geq I(M_j; Y | M^{j-1}, X) = H(Y|M^{j-1}, X) - H(Y|M^j, X) \nonumber\\
&= \sum_{i=1}^n \big[ H(Y(i)|Y(i+), M^{j-1}, X) - H(Y(i)|Y(i+), M^j, X) \big] \nonumber\\
&\stackrel{(b)}{=} \sum_{i=1}^n \big[ H(Y(i)|Y(i+), M^{j-1}, X(i), X(i-)) - H(Y(i)|Y(i+), M^j, X) \big] \nonumber\\
&\geq \sum_{i=1}^n \big[ H(Y(i)|Y(i+), M^{j-1}, X(i), X(i-)) - H(Y(i)|Y(i+), M^j, X(i), X(i-)) \big] \nonumber\\
&= \sum_{i=1}^n I(Y(i); M_j \,|\, M^{j-1}, X(i-), Y(i+), X(i)) = \sum_{i=1}^n I(Y(i); U_j \,|\, U_1(i), U_2^{j-1}, X(i)). \tag{I.3}
\end{align}
Step (b) is because the Markov chain $Y(i) - (M^{j-1}, X(i-), X(i), Y(i+)) - X(i+)$ holds for each $i = 1, \ldots, n$.
By the condition $P(Z_A \neq \hat{Z}_A) \leq \epsilon$ and Fano's inequality [5],
\begin{align}
h_2(\epsilon) + \epsilon \log_2(|\mathcal{Z}_A^n| - 1) &\geq H(Z_A | M^t, X) \geq \sum_{i=1}^n H(Z_A(i) | Z_A(i+), M^t, X) \nonumber\\
&\geq \sum_{i=1}^n H(Z_A(i) | Z_A(i+), Y(i+), M^t, X) \stackrel{(c)}{=} \sum_{i=1}^n H(Z_A(i) | Y(i+), M^t, X) \nonumber\\
&\stackrel{(d)}{=} \sum_{i=1}^n H(Z_A(i) | Y(i+), M^t, X(i-), X(i)) = \sum_{i=1}^n H(Z_A(i) | U_1(i), U_2^t, X(i)). \tag{I.4}
\end{align}
Step (c) is because for each $i$, $Z_A(i) = f_A(X(i), Y(i))$. Step (d) is because the Markov chain $Z_A(i) - (X(i), Y(i)) - (M^t, X(i-), X(i), Y(i+)) - X(i+)$ holds for each $i$. Similarly, we also have
\[
h_2(\epsilon) + \epsilon \log_2(|\mathcal{Z}_B^n| - 1) \geq \sum_{i=1}^n H(Z_B(i) | U_1(i), U_2^t, Y(i)). \tag{I.5}
\]
Timesharing: We now introduce a timesharing random variable $Q$ taking values in $\{1, \ldots, n\}$ equally likely, which is independent of all the other random variables. Defining $U_1 := (U_1(Q), Q)$, $X := X(Q)$, $Y := Y(Q)$, $Z_A := Z_A(Q)$, $Z_B := Z_B(Q)$, we can continue (I.1) as
\begin{align}
R_1 + \epsilon &\geq \frac{1}{n} \sum_{i=1}^n I(X(i); U_1(i)|Y(i)) = I(X(Q); U_1(Q) \,|\, Y(Q), Q) \nonumber\\
&\stackrel{(e)}{=} I(X(Q); U_1(Q), Q \,|\, Y(Q)) = I(X; U_1 | Y), \tag{I.6}
\end{align}
where step (e) is because $Q$ is independent of all the other random variables and the joint pmf of $(X(Q), Y(Q)) \sim p_{XY}$ does not depend on $Q$. Similarly, (I.2) and (I.3) become
\[
R_j + \epsilon \geq
\begin{cases}
I(X; U_j \,|\, Y, U^{j-1}), & j \geq 2, \; j \text{ odd}, \\
I(Y; U_j \,|\, X, U^{j-1}), & j \geq 2, \; j \text{ even},
\end{cases} \tag{I.7}
\]
and (I.4) and (I.5) become
\[
\frac{1}{n} h_2(\epsilon) + \epsilon \log_2 |\mathcal{Z}_A| \geq H(Z_A | U^t, X), \tag{I.8}
\]
\[
\frac{1}{n} h_2(\epsilon) + \epsilon \log_2 |\mathcal{Z}_B| \geq H(Z_B | U^t, Y). \tag{I.9}
\]
Concerning the Markov chains, we can verify that $U_1(i) - X(i) - Y(i)$ holds for each $i = 1, \ldots, n$, which implies $I(U_1(Q); Y(Q) | X(Q), Q) = 0$, which implies $I(U_1(Q), Q; Y(Q) | X(Q)) = 0$, i.e., $I(U_1; Y|X) = 0$.
For each odd $j \geq 2$, we can verify that $U_j - (X(i), U_1(i), U_2^{j-1}) - Y(i)$ holds for each $i$, which implies $I(U_j; Y(Q) \,|\, X(Q), U_1(Q), U_2^{j-1}, Q) = 0$, and hence $I(U_j; Y | X, U^{j-1}) = 0$. Similarly, we can prove the Markov chains for even $j$'s. So we have
\[
I(U_j; Y | X, U^{j-1}) = 0, \; j \text{ odd}, \qquad
I(U_j; X | Y, U^{j-1}) = 0, \; j \text{ even}. \tag{I.10}
\]
Cardinality bounds: The cardinalities $|\mathcal{U}_j|$, $j = 1, \ldots, t$, can be bounded as in (3.2) by counting the constraints that the $U_j$'s need to satisfy and applying the Carathéodory theorem recursively, as explained below (also see [13]). Let $U^t$ be a given set of random variables satisfying (I.6) to (I.10). If the $|\mathcal{U}_j|$, $j = 1, \ldots, t$, are larger than the alphabet sizes given by (3.2), it is possible to derive an alternative set of random variables satisfying (3.2) while keeping fixed the values on the right sides of (I.6) to (I.9) given by $U^t$, as well as all the Markov chains (I.10) satisfied by the given $U^t$. The derivation of an alternative set of random variables from $U^t$ has a recursive structure. Suppose that for $j = 1, \ldots, (k-1)$, alternative $U_j$ have been derived satisfying (3.2) without changing the right sides of (I.6) to (I.9) and without violating the Markov chain constraints (I.10). We focus on deriving an alternative random variable $\tilde{U}_k$ from $U_k$. We illustrate the derivation for only an odd-valued $k$. The joint pmf of $(X, Y, U^t)$ can be factorized as
\[
p_{XYU^t} = p_{U_k} \; p_{XU^{k-1}|U_k} \; p_{Y|XU^{k-1}} \; p_{U_{k+1}^t|XYU^k} \tag{I.11}
\]
due to the Markov chain $U_k - (X, U^{k-1}) - Y$. It should be noted that $Z_A$ and $Z_B$, being deterministic functions of $(X, Y)$, are conditionally independent of $U^t$ given $(X, Y)$. The main idea is to alter $p_{U_k}$ to $p_{\tilde{U}_k}$ keeping fixed all the other factors on the right side of (I.11).
We alter $p_{U_k}$ to $p_{\tilde{U}_k}$ in a manner which leaves $p_{XYU^{k-1}}$ unchanged while simultaneously preserving the right sides of (I.6) to (I.9). Leaving $p_{XYU^{k-1}}$ unchanged ensures that the Markov chain constraints (I.10) continue to hold for $U^{k-1}$. Fixing all the factors in (I.11) except the first ensures that the Markov chain constraints (I.10) continue to hold for $(\tilde{U}_k, U_{k+1}^t)$. To keep $p_{XYU^{k-1}}$ unchanged, it is sufficient to keep $p_{XU^{k-1}}$ unchanged, because $p_{Y|XU^{k-1}}$ is kept fixed in (I.11). Keeping $p_{XU^{k-1}}$ and $p_{XU^{k-1}|U_k}$ fixed while altering $p_{U_k}$ requires that
\[
p_{XU^{k-1}}(x, u^{k-1}) = \sum_{u_k} p_{U_k}(u_k) \, p_{XU^{k-1}|U_k}(x, u^{k-1}|u_k) \tag{I.12}
\]
hold for all tuples $(x, u^{k-1})$. This leads to $|\mathcal{X}| \prod_{j=1}^{k-1} |\mathcal{U}_j| - 1$ linear constraints on $p_{U_k}$ (the minus one is because $\sum_{x, u^{k-1}} p_{XU^{k-1}}(x, u^{k-1}) = 1$). With $p_{XYU^{k-1}}$ unchanged, the right sides of (I.6) and (I.7) for $j = 1, \ldots, (k-1)$ also remain unchanged. For $j = k$, $k$ odd, the right side of (I.7) can be written as follows:
\[
i_k = H(X|Y, U^{k-1}) - \sum_{u_k} p_{U_k}(u_k) \, H(X|Y, U^{k-1}, U_k = u_k). \tag{I.13}
\]
The quantity $i_k$ is equal to the value of $I(X; U_k | Y, U^{k-1})$ evaluated for the original set of random variables $U^t$, which did not satisfy the cardinality bounds (3.2). The quantities $H(X|Y, U^{k-1})$ and $H(X|Y, U^{k-1}, U_k = u_k)$ in (I.13) are held fixed because $p_{XYU^{k-1}}$ is kept unchanged and all factors except the first in (I.11) are fixed.
In a similar manner, for each $j > k$, $j$ odd, the right side of (I.7) can be written as follows:
\[
i_j = \sum_{u_k} p_{U_k}(u_k) \, I(X; U_j \,|\, Y, U^{k-1}, U_{k+1}^j, U_k = u_k), \tag{I.14}
\]
where $i_j$ is equal to the value of $I(X; U_j | Y, U^{j-1})$ evaluated for the original $U^t$, and $I(X; U_j \,|\, Y, U^{k-1}, U_{k+1}^j, U_k = u_k)$ is held fixed for all $j > k$, $j$ odd, because all factors except the first in (I.11) are fixed. Again, for each $j > k$, $j$ even, the right side of (I.7) can be written as follows:
\[
i_j = \sum_{u_k} p_{U_k}(u_k) \, I(Y; U_j \,|\, X, U^{k-1}, U_{k+1}^j, U_k = u_k), \tag{I.15}
\]
where $i_j$ is equal to the value of $I(Y; U_j | X, U^{j-1})$ evaluated for the original $U^t$, and $I(Y; U_j \,|\, X, U^{k-1}, U_{k+1}^j, U_k = u_k)$ is held fixed for all $j > k$, $j$ even, because all factors except the first in (I.11) are fixed. The right sides of (I.8) and (I.9) can respectively also be written as follows:
\[
h_A = \sum_{u_k} p_{U_k}(u_k) \, H(Z_A \,|\, X, U^{k-1}, U_{k+1}^t, U_k = u_k), \tag{I.16}
\]
\[
h_B = \sum_{u_k} p_{U_k}(u_k) \, H(Z_B \,|\, Y, U^{k-1}, U_{k+1}^t, U_k = u_k), \tag{I.17}
\]
where $h_A$ and $h_B$ are respectively equal to the values of $H(Z_A | X, U^t)$ and $H(Z_B | Y, U^t)$ evaluated for the original $U^t$, and $H(Z_A \,|\, X, U^{k-1}, U_{k+1}^t, U_k = u_k)$ and $H(Z_B \,|\, Y, U^{k-1}, U_{k+1}^t, U_k = u_k)$ are held fixed because $Z_A$ and $Z_B$ are deterministic functions of $(X, Y)$ and all factors except the first in (I.11) are fixed. Equations (I.13) through (I.17) impose $(t - k + 3)$ linear constraints on $p_{U_k}$. When the linear constraints imposed by (I.12) are accounted for, altogether there are no more than $|\mathcal{X}| \prod_{j=1}^{k-1} |\mathcal{U}_j| + t - k + 2$ linear constraints on $p_{U_k}$. The vector $(\{p_{XU^{k-1}}(x, u^{k-1})\}, i_k, \ldots$
$, i_t, h_A, h_B)$ belongs to the convex hull of $|\mathcal{U}_k|$ vectors whose $|\mathcal{X}| \prod_{j=1}^{k-1} |\mathcal{U}_j| + t - k + 2$ components are given by $\{p_{XU^{k-1}|U_k}(x, u^{k-1}|u_k)\}$, $H(X|Y, U^{k-1}, U_k = u_k)$, $\{I(X; U_j \,|\, Y, U^{k-1}, U_{k+1}^j, U_k = u_k)\}_{j>k,\, j \text{ odd}}$, $\{I(Y; U_j \,|\, X, U^{k-1}, U_{k+1}^j, U_k = u_k)\}_{j>k,\, j \text{ even}}$, $H(Z_A \,|\, X, U^{k-1}, U_{k+1}^t, U_k = u_k)$, and $H(Z_B \,|\, Y, U^{k-1}, U_{k+1}^t, U_k = u_k)$. By the Carathéodory theorem, $p_{U_k}$ can be replaced by $p_{\tilde{U}_k}$ such that the new random variable $\tilde{U}_k \in \tilde{\mathcal{U}}_k$, where $\tilde{\mathcal{U}}_k \subseteq \mathcal{U}_k$ contains only $|\mathcal{X}| \prod_{j=1}^{k-1} |\mathcal{U}_j| + t - k + 3$ elements, while (I.10) and the right sides of (I.6) to (I.9) remain unchanged.

Taking limits: Thus far, we have shown that for all $\epsilon > 0$ and all $n > N(\epsilon, t)$, there exists $p_{U^t|XY}(u^t|x, y, \epsilon, n)$ such that $U^t$ satisfies (3.2) and (I.6) to (I.10). It should be noted that $p_{U^t|XY}(u^t|x, y, \epsilon, n)$ may depend on $(\epsilon, n)$, whereas for all $j = 1, \ldots, t$, $|\mathcal{U}_j|$ is finite and independent of $(\epsilon, n)$. Therefore, for each $(\epsilon_0, n_0)$, $p_{U^t|XY}(u^t|x, y, \epsilon_0, n_0)$ is a finite-dimensional stochastic matrix taking values in a compact set. Let $\{\epsilon_l\}$ be any sequence of real numbers such that $\epsilon_l > 0$ and $\epsilon_l \to 0$ as $l \to \infty$. Let $\{n_l\}$ be any sequence of blocklengths such that $n_l > N(\epsilon_l, t)$. Since $p_{U^t|XY}$ lives in a compact set, there exists a subsequence of $\{p_{U^t|XY}(u^t|x, y, \epsilon_l, n_l)\}$ converging to a limit $p_{\bar{U}^t|XY}(u^t|x, y)$. Denote the auxiliary random variables derived from the limit pmf by $\bar{U}^t$. Due to the continuity of conditional mutual information and conditional entropy measures, (I.6) to (I.10) become
\[
R_j \geq
\begin{cases}
I(X; \bar{U}_j \,|\, Y, \bar{U}^{j-1}), & j \text{ odd}, \\
I(Y; \bar{U}_j \,|\, X, \bar{U}^{j-1}), & j \text{ even},
\end{cases}
\qquad
\begin{cases}
I(\bar{U}_j; Y \,|\, X, \bar{U}^{j-1}) = 0, & j \text{ odd}, \\
I(\bar{U}_j; X \,|\, Y, \bar{U}^{j-1}) = 0, & j \text{ even},
\end{cases}
\]
\[
H(Z_A | \bar{U}^t, X) = 0, \qquad H(Z_B | \bar{U}^t, Y) = 0.
\]
Therefore $\mathbf{R}$ belongs to the right side of (3.1).

Remarks: The convexity of the theoretical characterization of the rate region can be established in a manner similar to the timesharing argument in the above proof. The closedness of the region can also be established in a manner similar to the limit argument in the last paragraph of the above proof, using the following facts: (i) All the alphabets are finite, thus $p_{U^t|XY}$ takes values in a compact set. Therefore the limit point of a sequence of conditional probabilities exists. (ii) Conditional mutual information measures are continuous with respect to the probability distributions.

Appendix II
Proofs of Theorems 2 and 3

Proof of Theorem 2: We only need to show that $R^A_{sum,t} \geq H(X|Y)$ for $p, q \in (0,1)$. If $p, q \in (0,1)$ then $p_{XY}(x, y) > 0$ for all $(x, y) \in \mathcal{X} \times \mathcal{Y}$. Let $U^t$ be any set of auxiliary random variables in (3.3) satisfying all the Markov chain and conditional entropy constraints of (3.1). Due to Lemma 1(i), for any $u^t$, $A(u^t)$ is a rectangle of $\mathcal{X} \times \mathcal{Y}$. Due to Lemma 1(iii) and the assumption that $f_B(0, y_0) \neq f_B(1, y_0)$, $A(u^t)$ cannot be $\mathcal{X} \times \mathcal{Y}$. Therefore $A(u^t)$ could be a row of $\mathcal{X} \times \mathcal{Y}$, a column, a singleton, or the empty set. Let
\[
\phi(u^t) :=
\begin{cases}
0, & \text{if } A(u^t) \text{ is empty}, \\
1, & \text{if } A(u^t) \text{ is a row of } \mathcal{X} \times \mathcal{Y}, \\
2, & \text{otherwise}.
\end{cases}
\]
Now, $p_{U^t}(u^t) = 0 \Leftrightarrow A(u^t)$ is empty. Therefore
\[
p_{\phi(U^t)}(0) = \sum_{u^t : A(u^t) \text{ is empty}} p_{U^t}(u^t) = \sum_{u^t : p_{U^t}(u^t) = 0} p_{U^t}(u^t) = 0.
\]
Hence $p_{XY\phi(U^t)}(x, y, 0) = 0$ for all $x$ and $y$. By the definition of a row of $\mathcal{X} \times \mathcal{Y}$, we have $H(X \,|\, U^t, \phi(U^t) = 1) = 0$, which implies that $H(X \,|\, Y, U^t, \phi(U^t) = 1) = 0$. Similarly, we have $H(Y \,|\, X, U^t, \phi(U^t) = 2) = 0$.
Loosely speaking, this means that knowing the auxiliary random variables $U^t = u^t$ (representing the messages in the proof of achievability), there are only two possible alternatives: (1) $H(Y | X, U^t = u^t) = 0$, that is, $Y$ can be reproduced at location $A$; (2) $H(X | Y, U^t = u^t) = 0$, that is, $X$ can be reproduced at location $B$. Thus, interestingly, although the goal was to only compute a function of the sources at location $B$, after $t$ messages have been communicated, each location can, in fact, reproduce a part of the source from the other location. In the case where $X$ is not known at location $B$, $Y$ must be known at location $A$. To continue the proof, for any $t \in \mathbb{Z}^+$,
\begin{align}
R^A_{sum,t} &= \min\big[ I(X; U^t|Y) + I(Y; U^t|X) \big] \nonumber\\
&= \min\big[ H(X|Y) - H(X|Y, U^t, \phi(U^t)) + H(Y|X) - H(Y|X, U^t, \phi(U^t)) \big] \nonumber\\
&\stackrel{(a)}{=} \min\big[ H(X|Y) - H(X|Y, U^t, \phi(U^t) = 2)\, p_{\phi(U^t)}(2) + H(Y|X) - H(Y|X, U^t, \phi(U^t) = 1)\, p_{\phi(U^t)}(1) \big] \nonumber\\
&= \min\big[ H(X|Y) - H(Y \oplus W \,|\, Y, U^t, \phi(U^t) = 2)\, p_{\phi(U^t)}(2) + H(Y|X) - H(X \oplus W \,|\, X, U^t, \phi(U^t) = 1)\, p_{\phi(U^t)}(1) \big] \nonumber\\
&\geq \min\big[ H(X|Y) - H(W \,|\, \phi(U^t) = 2)\, p_{\phi(U^t)}(2) + H(Y|X) - H(W \,|\, \phi(U^t) = 1)\, p_{\phi(U^t)}(1) \big] \nonumber\\
&= \min\big[ H(X|Y) + H(Y|X) - H(W \,|\, \phi(U^t)) \big] \nonumber\\
&\geq H(X|Y) + H(Y|X) - H(W) \stackrel{(b)}{=} H(X|Y),
\end{align}
where all the minimizations above are subject to all the Markov chain and conditional entropy constraints in (3.1). In step (a) we used the conditions $H(X|Y, U^t, \phi(U^t) = 1) = 0$ and $H(Y|X, U^t, \phi(U^t) = 2) = 0$, and in step (b) we used the fact that $H(Y|X) = H(W) = h_2(p)$.

Proof of Theorem 3: This follows immediately by examining the proof of Theorem 2 and making the following observations. Observe that $A(u^t)$ can only be a subset of a row or a column.
This follows from the first assumption in the statement of the theorem, namely that these are the only column-wise f_B-monochromatic rectangles of X×Y. Next, observe that if

    φ(u^t) :=  0, if A(u^t) is empty
               1, if A(u^t) is a subset of a row of X×Y
               2, otherwise,

then H(X|Y,U^t,φ(U^t)=1) = 0 and H(Y|X,U^t,φ(U^t)=2) = 0, as in the proof of Theorem 2. Finally, observe that the series of information inequalities in the previous proof continues to hold if X⊕W and Y⊕W are replaced by ψ(X,W) and η(Y,W), respectively. This is due to the second assumption in the statement of the theorem, which also states that H(Y|X) = H(W).

Appendix III: Proof of Theorem 4

Since 0 < p, q < 1, p_{XY}(x,y) > 0 for all (x,y) ∈ X×Y. Let U^t be any set of auxiliary random variables in (3.3) satisfying all the Markov chain and conditional entropy constraints of (3.1). Due to Lemma 1(i) and (iv), for any u^t, A(u^t) is an f_A-monochromatic rectangle of X×Y. Since f_A(x,y) = x∧y, A(u^t) can be {(0,0),(0,1)}, {(0,0),(1,0)}, any singleton {(x,y)}, or the empty set. Let

    φ(u^t) :=  0, if A(u^t) is empty
               1, if A(u^t) = {(1,1)}
               2, if A(u^t) ∋ (1,0)
               3, otherwise.

Since p_{U^t}(u^t) = 0 ⇔ A(u^t) is empty, we have p_{φ(U^t)}(0) = 0. Therefore p_{XYφ(U^t)}(x,y,0) = 0 for all x and y. When X = Y = 0, φ(U^t) can only be 2 or 3, that is, p_{XYφ(U^t)}(0,0,0) = p_{XYφ(U^t)}(0,0,1) = 0. The condition p_{XYφ(U^t)}(0,0,0) = 0 is obvious. To see why p_{XYφ(U^t)}(0,0,1) = 0 is true, note that φ(u^t) = 1 if, and only if, A(u^t) = {(1,1)}, which implies that p_{XYU^t}(0,0,u^t) = 0, because A(u^t) is the set of all (x,y) for which p_{XYU^t}(x,y,u^t) > 0, and (0,0) is not in it.
Therefore,

    p_{XYφ(U^t)}(0,0,1) = Σ_{u^t : A(u^t) = {(1,1)}} p_{XYU^t}(0,0,u^t) = 0.

Reasoning in a similar fashion, we can summarize the relationship between X, Y, and φ(U^t) as shown in Table I. For each value of (x,y), the values of φ(U^t) shown in the table are those for which p_{XYφ(U^t)} is possibly nonzero; for all values of φ different from those shown, p_{XYφ(U^t)} is zero. For example, the entry "X = 0, Y = 0, φ(U^t) = 2 or 3" means that for i ∉ {2,3}, p_{XYφ(U^t)}(0,0,i) = 0.

TABLE I: Relation between X, Y, and φ(U^t)

               Y = 0              Y = 1
    X = 0      φ(U^t) = 2 or 3    φ(U^t) = 3
    X = 1      φ(U^t) = 2         φ(U^t) = 1

Let λ := p_{φ(U^t)|XY}(2|0,0). Then

    p_{φ(U^t)}(0) = 0,
    p_{φ(U^t)}(1) = p_{XY}(1,1),
    p_{φ(U^t)}(2) = p_{XY}(1,0) + λ p_{XY}(0,0),
    p_{φ(U^t)}(3) = p_{XY}(0,1) + (1−λ) p_{XY}(0,0).

For any t ∈ Z_+,

    R^A_{sum,t} = min[ I(X;U^t|Y) + I(Y;U^t|X) ]
                = min[ I(X;U^t,φ(U^t)|Y) + I(Y;U^t,φ(U^t)|X) ]
                ≥ min[ I(X;φ(U^t)|Y) + I(Y;φ(U^t)|X) ]
                = min[ H(X|Y) + H(Y|X) − H(X|Y,φ(U^t)) − H(Y|X,φ(U^t)) ]
                = min[ h_2(p) + h_2(q) − H(X|Y=0,φ(U^t)=2) p_{φ(U^t)}(2) − H(Y|X=0,φ(U^t)=3) p_{φ(U^t)}(3) ]
                ≥ min_{0≤λ≤1} [ h_2(p) + h_2(q) − h_2( p_{XY}(1,0)/p_{φ(U^t)}(2) ) p_{φ(U^t)}(2) − h_2( p_{XY}(0,1)/p_{φ(U^t)}(3) ) p_{φ(U^t)}(3) ],

where all the minimizations above, except the last one, are subject to all the Markov chain and conditional entropy constraints in (3.1). The last expression is minimized when λ* = q(1−p)/(p+q−2pq). Evaluating the minimum value of the objective function, we have

    R_{sum,∞} = lim_{t→∞} R^A_{sum,t} ≥ h_2(p) + h_2(q) − (1−pq) h_2( (1−p)(1−q)/(1−pq) ).
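The closed-form bound at the end of the proof of Theorem 4 can be checked numerically. The sketch below is ours, not part of the paper; it assumes X ~ Bernoulli(p) and Y ~ Bernoulli(q), independent, so that p_XY(1,0) = p(1−q), p_XY(0,1) = (1−p)q, and p_XY(0,0) = (1−p)(1−q). It minimizes the bracketed expression over λ on a grid and compares the minimum with the closed form h_2(p) + h_2(q) − (1−pq) h_2((1−p)(1−q)/(1−pq)).

```python
import math

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def bound_at(lam, p, q):
    """The bracketed expression in the last minimization, as a function
    of lambda = p_{phi|XY}(2|0,0), for independent Ber(p), Ber(q) sources."""
    p2 = p * (1 - q) + lam * (1 - p) * (1 - q)        # p_phi(2)
    p3 = (1 - p) * q + (1 - lam) * (1 - p) * (1 - q)  # p_phi(3)
    val = h2(p) + h2(q)
    if p2 > 0:
        val -= h2(p * (1 - q) / p2) * p2
    if p3 > 0:
        val -= h2((1 - p) * q / p3) * p3
    return val

def closed_form(p, q):
    """The evaluated minimum: h2(p)+h2(q) - (1-pq) h2((1-p)(1-q)/(1-pq))."""
    return h2(p) + h2(q) - (1 - p * q) * h2((1 - p) * (1 - q) / (1 - p * q))

p, q = 0.3, 0.6
grid_min = min(bound_at(k / 2000, p, q) for k in range(2001))
assert abs(grid_min - closed_form(p, q)) < 1e-4  # grid minimum matches closed form
print(f"min over lambda: {grid_min:.6f}  closed form: {closed_form(p, q):.6f}")
```

The function names are ours. A coarse grid suffices here because the objective is smooth and flat near its minimizer; the check confirms that the closed form is indeed the minimum of the bracketed expression under the stated source model.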