The CEO problem with inter-block memory
Authors: Victoria Kostina, Babak Hassibi
Abstract—An $n$-dimensional source with memory is observed by $K$ isolated encoders via parallel channels, who compress their observations to transmit to the decoder via noiseless rate-constrained links while leveraging their memory of the past. At each time instant, the decoder receives $K$ new codewords from the observers, combines them with the past received codewords, and produces a minimum-distortion estimate of the latest block of $n$ source symbols. This scenario extends the classical one-shot CEO problem to multiple rounds of communication with communicators maintaining the memory of the past. We extend the Berger-Tung inner and outer bounds to the scenario with inter-block memory, showing that the minimum asymptotically (as $n \to \infty$) achievable sum rate required to achieve a target distortion is bounded by minimal directed mutual information problems. For the Gauss-Markov source observed via $K$ parallel AWGN channels, we show that the inner bound is tight and solve the corresponding minimal directed mutual information problem, thereby establishing the minimum asymptotically achievable sum rate. Finally, we explicitly bound the rate loss due to a lack of communication among the observers; that bound is attained with equality in the case of identical observation channels. The general coding theorem is proved via a new nonasymptotic bound that uses stochastic likelihood coders and whose asymptotic analysis yields an extension of the Berger-Tung inner bound to the causal setting. The analysis of the Gaussian case is facilitated by reversing the channels of the observers.

Index Terms—CEO problem, Berger-Tung bound, distributed source coding, causal rate-distortion theory, Gauss-Markov source, LQG control, directed information.

[Footnote: The authors are with the California Institute of Technology (e-mail: vkostina@caltech.edu, hassibi@caltech.edu). This work was supported in part by the National Science Foundation (NSF) under grants CCF-1751356 and CCF-1817241. The work of Babak Hassibi was supported in part by the NSF under grants CNS-0932428, CCF-1018927, CCF-1423663 and CCF-1409204, by a grant from Qualcomm Inc., by NASA's Jet Propulsion Laboratory through the President and Director's Fund, and by King Abdullah University of Science and Technology. A part of this work was presented at ISIT 2020 [1].]

I. INTRODUCTION

We set up the CEO (chief executive or estimation officer) problem with inter-block memory as follows. An information source $\{X_i\}$ emits a block of length $n$, $X_i \in \mathcal{A}^n$, at time $i$; it is observed by $K$ encoders through $K$ noisy channels; at time $i$, the $k$-th encoder sees $Y^k_i$ generated according to $P_{Y^k_i | X_1, \ldots, X_i, Y^k_1, \ldots, Y^k_{i-1}}$. See Fig. 1. The encoders (observers) communicate to the decoder (CEO) via their separate noiseless rate-constrained links. At each time $i$, the $k$-th observer forms a codeword based on the observations it has seen so far, i.e., $Y^k_1, \ldots, Y^k_i$. The decoder at time $i$ forms the estimate $\hat{X}_i \in \hat{\mathcal{A}}^n$ based on the codewords it has received thus far. The goal is to minimize the average distortion
$$\frac{1}{t} \sum_{i=1}^{t} \mathbb{E}\left[ d\left(X_i, \hat{X}_i\right) \right], \qquad (1)$$
where $t$ is the time horizon over which the source is being tracked, and $d \colon \mathcal{A}^n \times \hat{\mathcal{A}}^n \mapsto \mathbb{R}_+$ is the distortion measure. Encoding and decoding operations leverage the memory of the past but cannot look into the future.
In this causal setting, no delay is allowed either at the encoders in producing codewords to encode $X_i$ or at the decoder in producing $\hat{X}_i$.

[Figure: block diagram in which encoders ENC 1, ..., ENC K observe $X_i$ through channels CH 1, ..., CH K as $Y^1_i, \ldots, Y^K_i$ and communicate over rate-constrained links $R_1, \ldots, R_K$ to the decoder DEC, which produces $\hat{X}_i$.] Fig. 1. The CEO problem with inter-block memory: the encoders and the decoder keep the memory of their past observations.

In the classical setting with $t = 1$, the CEO problem was first introduced by Berger et al. [2] for a finite alphabet source. In the classical Gaussian CEO problem, an i.i.d. Gaussian source is observed via AWGN channels and reproduced under mean squared error (MSE) distortion.
The Gaussian CEO problem was studied by Viswanathan and Berger [3], who proved an achievability bound on the rate-distortion dimension for the case of $K$ identical Gaussian channels, by Oohama [4], who derived the sum-rate rate-distortion region for that special case, by Prabhakaran et al. [5] and Oohama [6], who determined the full Gaussian CEO rate region, by Chen et al. [7], who proved that the minimum sum rate is achieved via waterfilling, and by Behroozi and Soleymani [8] and Chen and Berger [9], who showed rate-optimal successive coding schemes. Wagner et al. [10] found the rate region of the distributed Gaussian lossy compression problem by coupling it to the Gaussian CEO problem. Wagner and Anantharam [11] showed an outer bound to the rate region of the multiterminal source coding problem that is tighter than the Berger-Tung outer bound [12], [13]. Wang et al. [14] showed a simple converse on the sum rate of the vector Gaussian CEO problem. Concurrently, Ekrem and Ulukus [15] and Wang and Chen [16] showed an outer bound to the rate region of the vector Gaussian CEO problem that is tight in some cases and not tight in others, and that particularizes the outer bound in [11] to the Gaussian case. Courtade and Weissman [17] determined the distortion region of the distributed source coding and the CEO problem under logarithmic loss.

None of the above results directly apply to the tracking problem in Fig. 1 because of the past memory in encoding the $n$-blocks of observations and in producing $\hat{X}_i$ in (1), which imposes blockwise causality constraints onto the coding process. The most basic scenario of source coding with causality constraints is that of a single observer directly seeing the information source [18]. The causal rate-distortion function for the Gauss-Markov source was computed by Gorbunov and Pinsker [19]. The link between the minimum attainable linear quadratic Gaussian (LQG) control cost and the causal rate-distortion function is elucidated in [20]–[22]. A semidefinite program to compute the causal rate-distortion function for vector Gauss-Markov sources is provided in [23]. The remote Gaussian causal rate-distortion function, which corresponds to setting $K = 1$ in Fig. 1, is computed in [22]. The causal rate-distortion function of the Gauss-Markov source with a Gaussian side observation available at the decoder (the causal counterpart of the Wyner-Ziv setting) is computed in [24] for the scalar source and in [25] for the vector source. That causal Wyner-Ziv setting can be viewed as a special case of our causal CEO problem (2), (3) with two observers, with one of the observers enjoying an infinite rate. Stability of linear Gaussian systems with multiple isolated observers is investigated in [26].

The first contribution of this paper is an extension of the Berger-Tung inner and outer bounds [12], [13] to the distributed tracking setting of Fig. 1 that sandwiches the minimum asymptotically achievable (as $n \to \infty$) sum rate $R_1 + \ldots + R_K$ required to achieve a given average distortion (1). Provided that the components of each $X_i \in \mathcal{A}^n$ are i.i.d. ($X_i$ can still depend on $X_1, \ldots, X_{i-1}$), the channels act on each of those components independently, and the distortion measure is separable, that minimum sum rate is bounded in terms of the directed mutual information from the encoders to the decoder.
The converse (outer bound) follows via standard data processing and single-letterization arguments. To prove the achievability, we show a nonasymptotic bound for blockwise-causal distributed lossy source coding that can be viewed as an extension of the nonasymptotic Berger-Tung inner bound by Yassaee et al. [27], [28], applicable to the setting with $K = 2$ sources and $t = 1$ rounds of communication, to the setting with an arbitrary number of sources and communication rounds. We view the horizon-$t$ causal coding problem as a multiterminal coding problem in which at each step coded side information from past steps is available, and we use a stochastic likelihood coder (SLC) by Yassaee et al. [27], [28] to perform encoding operations. The SLC-based encoder mimics the operation of the joint typicality encoder while admitting sharp nonasymptotic bounds on its performance. While the SLC-based decoder of [27], [28] is ill-suited to the case $K > 2$, we propose a novel decoder that falls into the class of generalized likelihood decoders [29] and uses $K$ different threshold tests depending on the point of the rate-distortion region the code is operating at. An asymptotic analysis of our nonasymptotic bound yields an extension of the Berger-Tung inner bound [12], [13] to the setting with inter-block memory.

The second contribution of the paper is an explicit evaluation of the minimum sum rate for the causal Gaussian CEO problem. In that scenario, the source is an $n$-dimensional Gauss-Markov source,
$$X_{i+1} = aX_i + V_i, \qquad (2)$$
and the $k$-th observer sees
$$Y^k_i = X_i + W^k_i, \quad k = 1, \ldots, K, \qquad (3)$$
where $X_1$ and $\{V_i, W^1_i, W^2_i, \ldots, W^K_i\}_{i=1}^{t}$ are independent Gaussian vectors of length $n$ with i.i.d. components; each component of $V_i$ is distributed as $\mathcal{N}(0, \sigma^2_V)$, and each component of $W^k_i$ as $\mathcal{N}(0, \sigma^2_{W^k})$. Note that different observation channels can have different noise powers. The distortion measure is the normalized squared error
$$d\left(X_i, \hat{X}_i\right) = \frac{1}{n} \left\|X_i - \hat{X}_i\right\|^2. \qquad (4)$$
We characterize the minimum sum rate as a convex optimization problem over $K$ parameters; an explicit formula is given in the case of identical observation channels. Similar to the corresponding result for $t = 1$ [5], [6], [30, Th. 12.3], our extension of the Berger-Tung inner bound is tight in this case. To compute the bound, we split up the minimal directed mutual information problem into a sum of easier-to-solve optimization problems. To tie the parameters of those optimization problems back to those of the original optimization problem, we extend the technique developed by Wang et al. [14] for the time horizon $t = 1$ to $t > 1$. A device that helps us track the behavior of optimal estimation errors over multiple time instances is the reversal of the channels from $\{X_i\}$ to $\{Y^k_i\}$:
$$X_i = \bar{X}^k_i + W^{k\prime}_i, \qquad (5)$$
where
$$\bar{X}^k_i \triangleq \mathbb{E}\left[X_i \mid Y^k_1, \ldots, Y^k_i\right], \qquad (6)$$
and $W^{k\prime}_i \perp \bar{X}^k_i$ are independent Gaussian random vectors representing the errors in estimating $X_i$ from $\{Y^k_j\}_{j=1}^{i}$. While for $t = 1$ it does not matter whether the encoders compress $Y^k_1$ or $\bar{X}^k_1$, since the latter is just a scaled version of the former, for $t > 1$, compressing $Y^k_i$ instead of $\bar{X}^k_i$ is in general suboptimal.
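To make the channel reversal concrete, the following minimal simulation (ours, not from the paper; scalar $n = 1$ case, arbitrary parameter values) forms the causal MMSE estimates (6) with per-observer Kalman filters and checks empirically that the residual $W^{k\prime}_i = X_i - \bar{X}^k_i$ in (5) is uncorrelated with $\bar{X}^k_i$:

```python
import numpy as np

rng = np.random.default_rng(0)
a, s2V = 0.9, 1.0                        # source parameters in (2); arbitrary
s2W = np.array([0.5, 2.0, 1.0])          # observation noise powers in (3)
K, T = len(s2W), 200_000

# Simulate the scalar (n = 1) Gauss-Markov source and its K noisy observations.
X = np.empty(T)
X[0] = rng.normal(scale=np.sqrt(s2V / (1 - a**2)))       # stationary start
for i in range(1, T):
    X[i] = a * X[i-1] + rng.normal(scale=np.sqrt(s2V))
Y = X + rng.normal(scale=np.sqrt(s2W)[:, None], size=(K, T))

# Each observer runs a causal Kalman (MMSE) filter: Xbar^k_i = E[X_i | Y^k_[i]].
Xbar = np.empty((K, T))
for k in range(K):
    xhat, p = 0.0, s2V / (1 - a**2)      # predictive mean and variance
    for i in range(T):
        g = p / (p + s2W[k])             # Kalman gain
        xhat += g * (Y[k, i] - xhat)     # filtering update
        p *= (1 - g)                     # filtering error variance
        Xbar[k, i] = xhat
        xhat, p = a * xhat, a**2 * p + s2V    # one-step prediction

# Backward channel (5): X_i = Xbar^k_i + W'^k_i, with the residual
# empirically uncorrelated with the estimate.
for k in range(K):
    r = X - Xbar[k]
    print(f"observer {k}: corr(residual, estimate) = "
          f"{np.corrcoef(r, Xbar[k])[0, 1]:+.4f}, var(residual) = {r.var():.4f}")
```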
The third contribution of the paper is a bound on the rate loss due to a lack of communication among the different encoders in the causal Gaussian CEO problem: as long as the target distortion is not too small, the rate loss is bounded above by $K - 1$ times the difference between the remote and the direct rate-distortion functions. The bound is attained with equality if the observation channels are identical, indicating that among all possible observer channels with the same minimum MSE in the estimation of $\{X_i\}$ from $\{Y^k_j\}_{j \le i, k = 1, \ldots, K}$, the identical-channels case is the hardest to compress. This result contributes to the discussions of the rate loss in the classical CEO [31, Cor. 1] and multiple descriptions [32, Lemma 3] problems.

The rest of the paper is organized as follows. In Section II, we consider the general (non-Gaussian) causal CEO problem and prove direct and converse bounds to the minimum sum rate in terms of minimal directed mutual information problems (Theorem 1). In Section III, we characterize the causal Gaussian CEO rate-distortion function (Theorem 4). In Section IV, we bound the rate loss due to isolated observers (Theorem 5).

Notation: Logarithms are natural base. For a natural number $M$, $[M] \triangleq \{1, \ldots, M\}$. Notation $X \leftarrow Y$ reads "replace $X$ by $Y$"; notation $X \perp Y$ reads "$X$ is independent of $Y$"; notation $\triangleq$ reads "by definition". The temporal index is indicated in the subscript and the spatial index in the superscript: $Y^k_{[t]}$ is the temporal vector $(Y^k_1, \ldots, Y^k_t)$; $Y^{[K]}_i$ is the spatial vector $(Y^1_i, \ldots, Y^K_i)^\mathrm{T}$; $Y^{[K]}_{[t]} \triangleq (Y^1_{[t]}, \ldots, Y^K_{[t]})$. The delay operator $\mathcal{D}$ acts as $\mathcal{D}X_{[t]} \triangleq (0, X_1, \ldots, X_{t-1})$. For a random vector $X$ with i.i.d. components, $X$ denotes a random variable distributed the same as each of its components. We adopt the following shorthand notation for causally conditioned [33] probability kernels:
$$P_{Y_{[t]} \| X_{[t]}} \triangleq \prod_{i=1}^{t} P_{Y_i | Y_{[i-1]}, X_{[i]}}. \qquad (7)$$
Given a distribution $P_{X_{[t]}}$ and a causal kernel $P_{Y_{[t]} \| X_{[t]}}$, the directed mutual information is defined as [34]
$$I\left(X_{[t]} \to Y_{[t]}\right) \triangleq \sum_{i=1}^{t} I\left(X_{[i]}; Y_i \mid Y_{[i-1]}\right). \qquad (8)$$
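As an illustration (ours, added here; not part of the original text), unrolling (8) for $t = 2$ gives
$$I\left(X_{[2]} \to Y_{[2]}\right) = I(X_1; Y_1) + I(X_1, X_2; Y_2 \mid Y_1),$$
whereas the ordinary mutual information expands as
$$I\left(X_{[2]}; Y_{[2]}\right) = I(X_1, X_2; Y_1) + I(X_1, X_2; Y_2 \mid Y_1).$$
The two differ only in the first term: directed information does not charge $Y_1$ for statistical dependence on the yet-unrealized $X_2$, which is exactly the causality restriction.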
II. SUM RATE VIA DIRECTED INFORMATION

A. Overview

In this section, we present and prove our extension of the Berger-Tung bounds to the setting with inter-block memory that sandwich the minimum achievable sum rate in terms of minimal directed mutual information problems. The bounds apply to an abstract source with abstract observations. The operational scenario and achievable rates are formally defined in Section II-B. The directed mutual information bounds are presented in Section II-C. The converse is proven in Section II-D. The nonasymptotic achievability bound and its asymptotic analysis are presented in Section II-E. A set of remarks in Section II-F completes Section II.

B. Operational problem setting

A CEO code with inter-block memory, or a causal CEO code, is formally defined as follows.

Definition 1 (A CEO code with inter-block memory). Consider a discrete-time random process $\{X_i\}_{i=1}^{t}$ on $\mathcal{X}$, observed by $K$ causal observers via the channels
$$P_{Y^k_{[t]} \| X_{[t]}} \colon \mathcal{X}^{\otimes t} \mapsto \mathcal{Y}^{\otimes t}, \quad k \in [K]. \qquad (9)$$
Let $d \colon \mathcal{X} \times \hat{\mathcal{X}} \mapsto \mathbb{R}_+$ be the distortion measure. A CEO code with inter-block memory consists of:

a) $K$ encoding policies
$$P_{B^k_{[t]} \| Y^k_{[t]}} \colon \mathcal{Y}^{\otimes t} \mapsto \prod_{i=1}^{t} [M^k_i], \quad k \in [K], \qquad (10)$$

b) a decoding policy
$$P_{\hat{X}_{[t]} \| B^{[K]}_{[t]}} \colon \prod_{i=1}^{t} \prod_{k=1}^{K} [M^k_i] \mapsto \hat{\mathcal{X}}^{\otimes t}. \qquad (11)$$

If the encoding and decoding policies satisfy
$$\frac{1}{t} \sum_{i=1}^{t} \mathbb{E}\left[d\left(X_i, \hat{X}_i\right)\right] \le d, \qquad (12)$$
we say that they form an $(M^{[K]}_{[t]}, d)$ average distortion code. If the encoding and decoding policies satisfy
$$\mathbb{P}\left[\bigcup_{i=1}^{t} \left\{ d\left(X_i, \hat{X}_i\right) > d_i \right\}\right] \le \epsilon, \qquad (13)$$
we say that they form an $(M^{[K]}_{[t]}, d_{[t]}, \epsilon)$ excess distortion code. The probability measure in (12) and (13) is generated by the joint distribution $P_{X_{[t]}} P_{Y^{[K]}_{[t]} \| X_{[t]}} P_{\hat{X}_{[t]} \| B^{[K]}_{[t]}} \prod_{k=1}^{K} P_{B^k_{[t]} \| Y^k_{[t]}}$.

A distortion measure $d_n \colon \mathcal{A}^n \times \hat{\mathcal{A}}^n \mapsto \mathbb{R}_+$ is called separable if
$$d_n(x, \hat{x}) = \frac{1}{n} \sum_{i=1}^{n} d(x(i), \hat{x}(i)), \qquad (14)$$
where $d \colon \mathcal{A} \times \hat{\mathcal{A}} \mapsto \mathbb{R}_+$, and $x(i)$, $\hat{x}(i)$ denote the $i$-th components of vectors $x \in \mathcal{A}^n$ and $\hat{x} \in \hat{\mathcal{A}}^n$, respectively.

Definition 2 (Operational rate-distortion function). Consider a discrete-time random process $\{X_i\}_{i=1}^{t}$ on $\mathcal{X} = \mathcal{A}^n$ equipped with a separable distortion measure, observed by $K$ causal observers via the channels (9). The rate-distortion tuple $(R^{[K]}, d)$ is asymptotically achievable at time horizon $t$ if $\forall \gamma > 0$, $\exists n_0 \in \mathbb{N}$ such that $\forall n \ge n_0$, an $(M^{[K]}_{[t]}, d + \gamma)$ average distortion CEO code with inter-block memory exists, where
$$\frac{1}{nt} \sum_{i=1}^{t} \log M^k_i \le R_k, \quad k \in [K]. \qquad (15)$$
The rate-distortion pair $(R, d)$ is asymptotically achievable if a rate-distortion tuple $(R^{[K]}, d)$ with
$$\sum_{k=1}^{K} R_k \le R \qquad (16)$$
is asymptotically achievable. The causal CEO rate-distortion function at time horizon $t$ is defined as
$$R^t_{\mathrm{CEO}}(d) \triangleq \inf\left\{ R \colon (R, d) \text{ is achievable at time horizon } t \text{ in the CEO problem} \right\}. \qquad (17)$$

C. Berger-Tung bounds with inter-block memory

Consider a discrete-time random process $\{X_i\}_{i=1}^{t}$ on $\mathcal{X} = \mathcal{A}^n$ equipped with separable distortion measure $d$, observed by $K$ causal observers via the channels (9) with $\mathcal{Y} = \mathcal{B}^n$ and
$$P_{X_i | X_{[i-1]}} = P^{\otimes n}_{X_i | X_{[i-1]}}, \qquad (18)$$
$$P_{Y^k_i | X_{[i]}, Y^k_{[i-1]}} = P^{\otimes n}_{Y^k_i | X_{[i]}, Y^k_{[i-1]}}. \qquad (19)$$
Denote the minimal directed mutual information problems
$$\overline{R}^t_{\mathrm{CEO}}(d) \triangleq \inf_{\substack{P_{U^{[K]}_{[t]} \| Y^{[K]}_{[t]}} \colon (22) \\ P_{\hat{X}_{[t]} \| U^{[K]}_{[t]}} \colon (24)}} \frac{1}{t} I\left(Y^{[K]}_{[t]} \to U^{[K]}_{[t]}\right), \qquad (20)$$
$$\underline{R}^t_{\mathrm{CEO}}(d) \triangleq \inf_{\substack{P_{U^{[K]}_{[t]} \| Y^{[K]}_{[t]}} \colon (23) \\ P_{\hat{X}_{[t]} \| U^{[K]}_{[t]}} \colon (24)}} \frac{1}{t} I\left(Y^{[K]}_{[t]} \to U^{[K]}_{[t]}\right), \qquad (21)$$
where the constraints are as follows:
$$P_{U^{[K]}_{[t]} \| Y^{[K]}_{[t]}} = \prod_{k=1}^{K} P_{U^k_{[t]} \| Y^k_{[t]}}, \qquad (22)$$
$$P_{U^k_{[t]} \| Y^{[K]}_{[t]}} = P_{U^k_{[t]} \| Y^k_{[t]}} \quad \forall k \in [K], \qquad (23)$$
$$\frac{1}{t} \sum_{i=1}^{t} \mathbb{E}\left[d\left(X_i, \hat{X}_i\right)\right] \le d. \qquad (24)$$
Fixing a $k \in [K]$ and marginalizing $\{U^{k'}_{[t]}, k' \ne k\}$ out of both sides of (22), one can see that any joint distribution that satisfies the separate encoding constraint (22) also satisfies (23). Thus, the optimization problems (20) and (21) differ in that the constraint (22) is more stringent than (23). They represent extensions of the Berger-Tung inner ((20)) and outer ((21)) bounds [30, Th. 12.1, 12.2] to the causal setting.
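For intuition about the objective in (20), the following sketch (ours) evaluates $\frac{1}{t} I(Y_{[t]} \to U_{[t]})$ for a single observer ($K = 1$, so the separate-encoding constraint (22) is vacuous) and a simple additive Gaussian test channel $U_i = Y_i + Z_i$; the helper gauss_cmi and all parameter values are our illustrative choices:

```python
import numpy as np

def gauss_cmi(S, A, B, C):
    """I(A;B|C) in nats for jointly Gaussian variables with covariance S,
    via 0.5*(logdet S_{A,C} + logdet S_{B,C} - logdet S_{A,B,C} - logdet S_C)."""
    def ld(idx):
        idx = list(idx)
        return 0.0 if not idx else np.linalg.slogdet(S[np.ix_(idx, idx)])[1]
    return 0.5 * (ld(A + C) + ld(B + C) - ld(A + B + C) - ld(C))

# Scalar Gauss-Markov source (2) seen by one observer (3); test channel U = Y + Z.
a, s2V, s2W, s2Z, t = 0.9, 1.0, 1.0, 0.5, 8   # arbitrary illustrative values
# Joint covariance of (X_1..X_t, Y_1..Y_t, U_1..U_t), stationary AR(1) source.
SX = np.array([[a**abs(i - j) * s2V / (1 - a**2) for j in range(t)]
               for i in range(t)])
SY = SX + s2W * np.eye(t)
SU = SY + s2Z * np.eye(t)
S = np.block([[SX, SX, SX], [SX, SY, SY], [SX, SY, SU]])

Yidx = lambda i: [t + j for j in range(i + 1)]       # Y_{[i+1]} (0-based i)
Uidx = lambda i: [2*t + j for j in range(i + 1)]     # U_{[i+1]}
# Directed information (8): sum over i of I(Y_{[i]}; U_i | U_{[i-1]}).
DI = sum(gauss_cmi(S, Yidx(i), [2*t + i], Uidx(i - 1) if i > 0 else [])
         for i in range(t))
print(f"(1/t) I(Y_[t] -> U_[t]) = {DI / t:.4f} nats")
```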
One can convexify $\overline{R}^t_{\mathrm{CEO}}(d)$ by adding to the optimization parameters a scalar $\alpha \in (0, 1]$ and a distribution $P_{\tilde{U}^{[K]}_{[t]} \| \tilde{Y}^{[K]}_{[t]}}$ satisfying the separate encoding constraint analogous to (22), and replacing the directed information in (20) by $\alpha I(Y^{[K]}_{[t]} \to U^{[K]}_{[t]}) + (1 - \alpha) I(\tilde{Y}^{[K]}_{[t]} \to \tilde{U}^{[K]}_{[t]})$. This is equivalent to introducing into (20) a binary time-sharing random variable. Given the achievability of (20), the achievability of the convexification follows by the standard time-sharing argument [30, Ch. 4.4]. Since a mixture of distributions $P_{U^{[K]}_{[t]} \| Y^{[K]}_{[t]}}$ satisfying (23) also satisfies (23), the convexity of $\underline{R}^t_{\mathrm{CEO}}(d)$ follows from the convexity of directed mutual information in $P_{U^{[K]}_{[t]} \| Y^{[K]}_{[t]}}$, with no need for an explicit auxiliary time-sharing random variable.

Theorem 1 (Berger-Tung bounds with inter-block memory). Consider a discrete-time random process $\{X_i\}_{i=1}^{t}$ on $\mathcal{X} = \mathcal{A}^n$ equipped with a separable distortion measure $d$, observed by $K$ causal observers via the channels (9) with $\mathcal{Y} = \mathcal{B}^n$ and (18), (19) satisfied. Suppose further that for some $p > 1$, there exists a vector $\hat{x}_{[t]}$ such that
$$\left( \mathbb{E}\left[ \left( \frac{1}{t} \sum_{i=1}^{t} d(X_i, \hat{x}_i) \right)^{p} \right] \right)^{\frac{1}{p}} \le d_p < \infty. \qquad (25)$$
The causal rate-distortion function is bounded as
$$\underline{R}^t_{\mathrm{CEO}}(d) \le R^t_{\mathrm{CEO}}(d) \le \overline{R}^t_{\mathrm{CEO}}(d). \qquad (26)$$

Condition (25) is a technical condition needed to apply a standard argument using Hölder's inequality to pass from excess to average distortion in the proof of the achievability bound (Appendix B). To prove the upper bound on the sum rate in (26), we actually show a more accurate characterization of the entire rate tuple $R^{[K]}$ (Theorem 3, below). We will see in Section III below that the inner (upper) bound in (26) is tight in the quadratic Gaussian setting. This is in line with the corresponding result in the setting of block coding without inter-block memory [30, Th. 12.3]. While in general the $t$-step optimization problems (20) and (21) are challenging to compute, we illustrate in this paper that the normalized limit as $t \to \infty$ is possible to compute in the Gaussian setting. Similar limit results in other communication scenarios were shown in [19], [22], [24], [25], [35]–[38].

D. Theorem 1: proof of converse

The proof of the converse uses standard techniques. We will use the following definition and lemma. Causally conditioned directed information is defined as
$$I\left(X_{[t]} \to Y_{[t]} \| Z_{[t]}\right) \triangleq \sum_{i=1}^{t} I\left(X_{[i]}; Y_i \mid Y_{[i-1]}, Z_{[i]}\right). \qquad (27)$$

Lemma 1 ([33, (3.14)–(3.16)]). Directed information chain rules:
$$I\left((X_{[t]}, Y_{[t]}) \to Z_{[t]}\right) = I\left(X_{[t]} \to Z_{[t]}\right) + I\left(Y_{[t]} \to Z_{[t]} \| X_{[t]}\right), \qquad (28)$$
$$I\left(X_{[t]} \to (Y_{[t]}, Z_{[t]})\right) = I\left(X_{[t]} \to Y_{[t]} \| \mathcal{D}Z_{[t]}\right) + I\left(X_{[t]} \to Z_{[t]} \| Y_{[t]}\right). \qquad (29)$$

Fix an $(M^{[K]}_{[t]}, d)$ code in Definition 1. Denote by $B^k_i \in [M^k_i]$ the codeword sent by the $k$-th encoder at time $i$. Since the codewords satisfy the sum rate constraint (16),
$$ntR \ge \sum_{k=1}^{K} H\left(B^k_{[t]}\right) \qquad (30)$$
$$\ge H\left(B^{[K]}_{[t]}\right) \qquad (31)$$
$$\ge I\left(Y^{[K]}_{[t]} \to B^{[K]}_{[t]}\right) \qquad (32)$$
$$\ge \inf_{\substack{P_{B^{[K]}_{[t]} \| Y^{[K]}_{[t]}} = \prod_{k=1}^{K} P_{B^k_{[t]} \| Y^k_{[t]}}, \\ P_{\hat{X}_{[t]} \| B^{[K]}_{[t]}} \colon (12) \text{ holds}}} I\left(Y^{[K]}_{[t]} \to B^{[K]}_{[t]}\right), \qquad (33)$$
where (31) holds because the joint entropy is upper-bounded by the sum of individual entropies, and (32) holds because the mutual information is upper-bounded by the entropy.
Note that (33) is the $n$-letter version of (20). We proceed to apply a standard single-letterization argument to (33). For an $n$-dimensional vector $Y^k_i$, we denote by $Y^k_i(j)$ its $j$-th component; for sets $\mathcal{K} \subseteq [K]$ and $\mathcal{I} \subseteq [n]$, we denote by $Y^{\mathcal{K}}_i(\mathcal{I})$ the components of the vectors $Y^k_i \colon k \in \mathcal{K}$ indexed by $\mathcal{I}$. We introduce auxiliary random objects
$$U^k_i(j) = \left( B^k_i, Y^{[K]}_i([j-1]) \right), \quad j \in [n]. \qquad (34)$$
The directed mutual information in the right side of (33) can be rewritten in terms of $U^{[K]}_i$ and bounded as follows:
$$I\left(Y^{[K]}_{[t]} \to B^{[K]}_{[t]}\right) = \sum_{j=1}^{n} I\left(Y^{[K]}_{[t]}(j) \to B^{[K]}_{[t]} \| Y^{[K]}_{[t]}([j-1])\right) \qquad (35)$$
$$= \sum_{j=1}^{n} \left[ I\left(Y^{[K]}_{[t]}(j) \to \left(B^{[K]}_{[t]}, Y^{[K]}_{[t]}([j-1])\right)\right) - I\left(Y^{[K]}_{[t]}(j) \to Y^{[K]}_{[t]}([j-1]) \| \mathcal{D}B^{[K]}_{[t]}\right) \right] \qquad (36)$$
$$= \sum_{j=1}^{n} I\left(Y^{[K]}_{[t]}(j) \to \left(B^{[K]}_{[t]}, Y^{[K]}_{[t]}([j-1])\right)\right) \qquad (37)$$
$$= \sum_{j=1}^{n} I\left(Y^{[K]}_{[t]}(j) \to U^{[K]}_{[t]}(j)\right) \qquad (38)$$
$$\ge \min_{d_j, \, j \in [n] \colon \sum_j d_j \le nd} t \sum_{j=1}^{n} \underline{R}^t_{\mathrm{CEO}}(d_j) \qquad (39)$$
$$\ge nt\, \underline{R}^t_{\mathrm{CEO}}(d), \qquad (40)$$
where (35) is by the chain rule of mutual information; (36) is by the chain rule of directed information (29); (37) holds because $P_{B^{[K]}_{[t]} | Y^{[K]}_{[t]}} = P_{B^{[K]}_{[t]} \| Y^{[K]}_{[t]}}$ is a causal kernel, which means that $P_{Y^{[K]}_{[t]} \| \mathcal{D}B^{[K]}_{[t]}} = P_{Y^{[K]}_{[t]}}$, hence conditioning on $\mathcal{D}B^{[K]}_{[t]}$ in (36) can be eliminated, and the resulting directed information is zero because different components of the vector $Y^k_i$ are independent due to (18), (19); (38) is by substituting (34); (39) holds because $U^k_i(j)$ (34) satisfies $P_{U^k_{[t]}(j) \| Y^{[K]}_{[t]}(j)} = P_{U^k_{[t]}(j) \| Y^k_{[t]}(j)}$, the distortion measure is separable, and (18), (19) hold; and (40) is by the convexity of $\underline{R}^t_{\mathrm{CEO}}(d)$ as a function of $d$.

E. Theorem 1: proof of achievability

To show that (26) is achievable in the asymptotics $n \to \infty$, we first show a nonasymptotic bound. Then, via an asymptotic analysis of the bound, we derive an extension of the Berger-Tung inner bound [12], [13] to the setting with inter-block memory. Before we present our nonasymptotic achievability bound in Theorem 2 below, we prepare some notation. For a fixed conditional distribution $P_{U^k_i Y^k_{[i]} | U^k_{[i-1]}}$, denote the conditional information density
$$\imath\left(y^k_{[i]}; u^k_i \mid u^k_{[i-1]}\right) \triangleq \log \frac{\mathrm{d}P_{U^k_i | Y^k_{[i]}, U^k_{[i-1]}}\left(u^k_i \mid y^k_{[i]}, u^k_{[i-1]}\right)}{\mathrm{d}P_{U^k_i | U^k_{[i-1]}}\left(u^k_i \mid u^k_{[i-1]}\right)}. \qquad (41)$$
For a fixed joint distribution $P_{U^{[K]}_{[i]}}$, denote the relative conditional information densities
$$\jmath^k\left(u^{[K]}_{[i]}\right) \triangleq \log \frac{\mathrm{d}P_{U^k_i | U^{[k-1]}_i, U^{[K]}_{[i-1]}}\left(u^k_i \mid u^{[k-1]}_i, u^{[K]}_{[i-1]}\right)}{\mathrm{d}P_{U^k_i | U^k_{[i-1]}}\left(u^k_i \mid u^k_{[i-1]}\right)}. \qquad (42)$$
For a permutation $\pi \colon [K] \mapsto [K]$, we denote the ordered set
$$\pi(\mathcal{K}) \triangleq \{\pi(k) \colon k \in \mathcal{K}\}. \qquad (43)$$

Theorem 2 (Nonasymptotic Berger-Tung inner bound with inter-block memory). Fix $P_{Y^{[K]}_{[t]}}$ and parameters $M^{[K]}_{[t]}$, $d_{[t]}$, $\epsilon$.
For any scalars $\alpha^k_i$, $\beta^k_i$, any integers $L^k_i \ge M^k_i$, $i \in [t]$, $k \in [K]$, any causal kernels $P_{U^{[K]}_{[t]} \| Y^{[K]}_{[t]}} = \prod_{k=1}^{K} P_{U^k_{[t]} \| Y^k_{[t]}}$ and $P_{\hat{X}_{[t]} \| U^{[K]}_{[t]}}$, and any permutation $\pi \colon [K] \mapsto [K]$, there exists an $(M^{[K]}_{[t]}, d_{[t]}, \epsilon)$ excess distortion CEO code with inter-block memory such that
$$\epsilon \le \mathbb{P}[\mathcal{E}] + \gamma, \qquad (44)$$
where the event $\mathcal{E}$ is given by
$$\mathcal{E} \triangleq \bigcup_{i=1}^{t} \left\{ d\left(X_i, \hat{X}_i\right) > d_i \right\} \;\cup\; \bigcup_{i=1}^{t} \bigcup_{k=1}^{K} \left\{ \imath\left(Y^k_{[i]}; U^k_i \mid U^k_{[i-1]}\right) > \log L^k_i - \alpha^k_i \right\} \;\cup\; \bigcup_{i=1}^{t} \bigcup_{k=1}^{K} \left\{ \jmath^{\pi(k)}\left(U^{\pi([K])}_{[i]}\right) < \log \frac{L^{\pi(k)}_i}{M^{\pi(k)}_i} + \beta^{\pi(k)}_i \right\}, \qquad (45)$$
and the constant $\gamma$ is given by
$$\gamma \triangleq 1 - \frac{1}{\prod_{i=1}^{t} \left[ \sum_{\mathcal{K} \subseteq [K]} \exp\left(-\sum_{k \in \mathcal{K}} \beta^k_i\right) \right] \prod_{k=1}^{K} \left(1 + \exp(-\alpha^k_i)\right)}. \qquad (46)$$

Proof sketch. We employ the achievability proof technique developed by Yassaee et al. [27], [28] that uses a stochastic likelihood coder (SLC) to perform encoding operations. An SLC makes a randomized decision that coincides with high probability with the choice that a maximum likelihood (ML) coder would make (in fact, the error probability of the SLC exceeds that of the ML coder by at most a factor of 2 [39, Th. 7]). We view the horizon-$t$ causal coding problem as a multiterminal coding problem in which at each step coded side information from past steps is available, and we define the SLC based on the auxiliary transition probability kernel $P_{U^k_i | Y^k_{[i]}, U^k_{[i-1]}}$ (see (132) in Appendix A) that is also used to generate random codebooks. While [28, Th. 6] shows a sharp nonasymptotic bound for the classical distributed source coding problem with $K = 2$ terminals, the decoder employed there does not extend to the case $K > 2$. In (136) in Appendix A, we propose a novel decoder that falls into the class of generalized likelihood decoders (GLD) conceptualized by Merhav [29, eq. (4)] and that uses an auxiliary indicator function $g\left(u^{[K]}_{[i]}\right)$ (137). With our GLD we are able to recover the full Berger-Tung region ((52), below) for any $K$. One can view the set of outcomes $u^{[K]}_{[i]}$ for which $g\left(u^{[K]}_{[i]}\right) = 1$ as a jointly typical set. That set depends on the choice of $\pi$ and thus on the particular rate point that the code is operating at. Checking for membership in that set involves $K$ threshold tests. In contrast, the jointly typical set defined by Oohama [4, eq. (46)] involves $2^K - 1$ threshold tests, one for each nonempty subset of $[K]$. Full details are given in Appendix A.

Theorem 3 (Berger-Tung inner bound with inter-block memory). Under the assumptions of Theorem 1, the rate-distortion tuple $(R^{[K]}, d)$ is asymptotically achievable at time horizon $t$ if for some single-letter causal kernels $P_{U^{[K]}_{[t]} \| Y^{[K]}_{[t]}}$, $P_{\hat{X}_{[t]} \| U^{[K]}_{[t]}}$ satisfying (22), (24) and some permutation $\pi \colon [K] \mapsto [K]$, it holds for all $k \in [K]$ that
$$R_{\pi(k)} > \frac{1}{t} I\left(Y^{\pi(k)}_{[t]} \to U^{\pi(k)}_{[t]} \,\Big\|\, U^{\pi([k-1])}_{[t]}, \mathcal{D}U^{[K]}_{[t]}\right). \qquad (47)$$

Proof. Appendix B.

Theorem 3 implies that the sum rate
$$\sum_{k=1}^{K} R_k > \frac{1}{t} I\left(Y^{[K]}_{[t]} \to U^{[K]}_{[t]}\right) \qquad (48)$$
is achievable. Indeed, summing (47) over $k$ and using the Markov chain $U^k_i - \left(Y^k_{[i]}, U^k_{[i-1]}\right) - U^{[K] \setminus \{k\}}_{[i]}$ leads to (48). Therefore, the sum rate in (20) is achievable.

F. Remarks

We conclude Section II with a set of remarks.
1. Theorems 2 and 3 are easily extended to distributed source coding with inter-block memory, where the goal is to separately compress (and jointly decompress) $K$ processes $\{Y^k_i\}$ under the individual distortion constraints
$$\frac{1}{t} \sum_{i=1}^{t} \mathbb{E}\left[d^k\left(Y^k_i, \hat{Y}^k_i\right)\right] \le d^k, \quad k \in [K]. \qquad (49)$$
Theorem 2 continues to hold with $\left\{d\left(X_i, \hat{X}_i\right) > d_i\right\}$ in (45) replaced by $\left\{d^k\left(Y^k_i, \hat{Y}^k_i\right) > d^k_i\right\}$. Consequently, Theorem 3 also continues to hold, replacing the constraint in (24) by
$$\frac{1}{t} \sum_{i=1}^{t} \mathbb{E}\left[d^k\left(Y^k_i, \hat{Y}^k_i\right)\right] \le d^k, \quad k \in [K]. \qquad (50)$$

2. The case $t = 1$ corresponds to the classical CEO / distributed source coding problems. The region in (47) simplifies to
$$R_{\pi(k)} > I\left(Y^{\pi(k)}; U^{\pi(k)} \mid U^{\pi([k-1])}\right), \quad \forall k \in [K], \ \forall \text{ permutation } \pi \colon [K] \mapsto [K]. \qquad (51)$$
The multiterminal Berger-Tung inner region is usually (e.g., [17, Def. 7], [5, eq. (2)]) specified as
$$\sum_{k \in \mathcal{A}} R_k > I\left(Y^{\mathcal{A}}; U^{\mathcal{A}} \mid U^{\mathcal{A}^c}\right), \quad \forall \mathcal{A} \subseteq [K]. \qquad (52)$$
These characterizations are equivalent (Appendix C).

3. While the sum rate bound in (48) is the same regardless of the choice of permutation $\pi$, different $\pi$'s in (47) correspond to different orders in which the chain rule of mutual information can be applied, and are needed to specify the full achievable region of rates and distortions.

4. We chose to omit the time-sharing random variable in Theorem 3 for simplicity of presentation. It can be introduced in (47) using the standard time-sharing argument [30, Ch. 4.4].

III. GAUSSIAN RATE-DISTORTION FUNCTION

A. Problem setup

This section focuses on the scenario of the Gauss-Markov source in (2) observed through the Gaussian channels in (3) under squared error distortion (4). Given an encoding policy in Definition 1, the optimal decoding policy $P_{\hat{X}_{[t]} \| B^{[K]}_{[t]}}$ that achieves the minimum expected squared error is
$$\hat{X}_i \triangleq \mathbb{E}\left[X_i \mid B^{[K]}_{[i]}\right]. \qquad (53)$$
For simplicity, we focus on the infinite time-horizon limit
$$R_{\mathrm{CEO}}(d) \triangleq \limsup_{t \to \infty} R^t_{\mathrm{CEO}}(d). \qquad (54)$$
In other words, the causal CEO rate-distortion function $R_{\mathrm{CEO}}(d)$ is the infimum of $R$'s such that $\forall \gamma > 0$, $\exists t_0 \ge 0$ such that $\forall t \ge t_0$, $\exists n_0 \in \mathbb{N}$ such that $\forall n \ge n_0$, an $(M^{[K]}_{[t]}, d + \gamma)$ average distortion CEO code with inter-block memory exists with $M^{[K]}_{[t]}$ satisfying (15) and (16). Taking the limit $t \to \infty$ simplifies the solution of many minimal directed mutual information problems ([22, Th. 9], [24, Th. 6, Th. 7], [35, Th. 1], [36, Th. 2], [37, Th. 1]) by eliminating the transient effects due to the starting location $X_1$ of the process $\{X_i\}$ that is being transmitted. In this steady-state regime, the optimal rate allocation across time is uniform (i.e., $\log M^k_1 = \ldots = \log M^k_t$ in (15)). Furthermore, $R^t_{\mathrm{CEO}}(d)$ approaches its steady-state value (54) as $O\left(\frac{1}{t}\right)$ (this is a consequence of [24, eq. (83)-(85), (92)] and (82), (86), (93) below).

In Section III-B, we present the Gaussian rate-distortion function as a convex optimization problem over $K$ parameters (Theorem 4), which reduces to an explicit formula in the identical-channels case (Corollary 1). These results are obtained by showing that the inner bound in Theorem 1 is tight in the Gaussian case and by evaluating the corresponding minimal directed mutual information. In Section III-C, we give auxiliary estimation lemmas that are useful in the proof of Theorem 4. We give the proof of Theorem 4 in Section III-D.
Notation: For a random process $\{X_i\}$ on $\mathbb{R}$, its stationary variance (which can be $+\infty$) is denoted by
$$\sigma^2_X \triangleq \limsup_{i \to \infty} \mathbb{E}\left[X_i^2\right]. \qquad (55)$$
The minimum mean squared error (MMSE) in the estimation of $X_i$ from $Y^{[K]}_{[i]}$ is denoted by
$$\sigma^2_{X_i | Y^{[K]}_{[i]}} \triangleq \mathbb{E}\left[\left(X_i - \mathbb{E}\left[X_i \mid Y^{[K]}_{[i]}\right]\right)^2\right], \qquad (56)$$
and the steady-state causal MMSE by
$$\sigma^2_{X \| Y^{[K]}} \triangleq \limsup_{i \to \infty} \sigma^2_{X_i | Y^{[K]}_{[i]}}. \qquad (57)$$

B. Gaussian rate-distortion function

In Theorem 4, the Gaussian rate-distortion function is expressed as a convex optimization problem over parameters $\{d_k\}_{k=1}^{K}$ that determine the individual rates of the transmitters and that correspond to the MSE achievable at the decoder in the estimation of $\{X_i\}_{i=1}^{t}$ provided that the codewords from the $k$-th transmitter are decoded correctly.

Theorem 4 (Gaussian rate-distortion function with inter-block memory). For all $\sigma^2_{X \| Y^{[K]}} < d < \sigma^2_X$, the causal CEO rate-distortion function (54) for the Gauss-Markov source in (2) observed through the Gaussian channels in (3) is given by
$$R_{\mathrm{CEO}}(d) = \frac{1}{2} \log \frac{\bar{d}}{d} + \min_{\{d_k\}_{k=1}^{K}} \sum_{k=1}^{K} \frac{1}{2} \log \left( \frac{\bar{d}_k - \sigma^2_{X \| Y^k}}{d_k - \sigma^2_{X \| Y^k}} \cdot \frac{d_k}{\bar{d}_k} \right), \qquad (58)$$
where
$$\bar{d} \triangleq a^2 d + \sigma^2_V, \qquad (59)$$
$$\bar{d}_k \triangleq a^2 d_k + \sigma^2_V, \qquad (60)$$
and the minimum is over $d_k$, $k \in [K]$, that satisfy
$$\frac{1}{d} \le \frac{1}{\sigma^2_{X \| Y^{[K]}}} - \sum_{k=1}^{K} \left( \frac{1}{\sigma^2_{X \| Y^k}} - \frac{1}{d_k} \right), \qquad (61)$$
$$\sigma^2_{X \| Y^k} \le d_k \le \sigma^2_X. \qquad (62)$$

Proof. Section III-D.

If the source is observed directly by one or more of the encoders, say if $\sigma^2_{X \| Y^1} = 0$, then $d_1 = d$, $d_2 = \ldots = d_K = \sigma^2_X$ is optimal, and (58) reduces to the causal rate-distortion function [19, eq. (1.43)] (see also, e.g., [20], [40, Th. 3], [22, (64)], [24, Th. 6]):
$$R(d) = \frac{1}{2} \log \frac{\bar{d}}{d}. \qquad (63)$$
The sum over $k \in [K]$ in (58) is thus the penalty due to the encoders not observing the source directly and not communicating with each other.

If the observation channels satisfy
$$\sigma^2_{X \| Y^1} = \ldots = \sigma^2_{X \| Y^K}, \qquad (64)$$
we can explicitly write the rate-distortion function $R^{K\text{-sym}}_{\mathrm{CEO}}(d)$ for this symmetrical scenario.

Corollary 1 (Observation channels with the same SNR). If, in the scenario of Theorem 4, the observation channels satisfy (64), the causal CEO rate-distortion function (54) is given by
$$R^{K\text{-sym}}_{\mathrm{CEO}}(d) = \frac{1}{2} \log \frac{\bar{d}}{d} + \frac{K}{2} \log \left( \frac{\bar{d}_1 - \sigma^2_{X \| Y^1}}{d_1 - \sigma^2_{X \| Y^1}} \cdot \frac{d_1}{\bar{d}_1} \right), \qquad (65)$$
where $d_1$ satisfies
$$\frac{1}{d} = \frac{1}{\sigma^2_{X \| Y^{[K]}}} - \frac{K}{\sigma^2_{X \| Y^1}} + \frac{K}{d_1}. \qquad (66)$$

Proof. It suffices to show that the minimum in (58) is attained by $d_1 = \ldots = d_K$. Since each of the terms in the sum in (58) is a convex function of $d_k$, applying Jensen's inequality concludes the proof.
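As a sanity check (ours, not from the paper), the convex program (58)-(62) can be solved numerically and compared against the closed form (65)-(66) in the identical-channels case; the per-observer steady-state MMSE is computed with a standard scalar Riccati (Kalman-filter) recursion, and $\sigma^2_{X \| Y^{[K]}}$ is obtained from relation (68) quoted below. All parameter values are arbitrary, and SciPy's SLSQP is an arbitrary solver choice:

```python
import numpy as np
from scipy.optimize import minimize

a, s2V = 0.9, 1.0                        # source parameters in (2); arbitrary
K, s2W1 = 3, 1.0                         # identical observation channels in (3)
s2X = s2V / (1 - a**2)                   # stationary source variance (|a| < 1)

# Steady-state per-observer causal MMSE sigma^2_{X||Y^k}: scalar Riccati recursion.
p = s2X
for _ in range(5000):
    p = 1.0 / (1.0 / (a**2 * p + s2V) + 1.0 / s2W1)
s_k = np.full(K, p)                      # sigma^2_{X||Y^k}, identical channels
s_all = 1.0 / (K / p - (K - 1) / s2X)    # sigma^2_{X||Y^[K]} via (68)

d = 2.0 * s_all                          # a target distortion in (s_all, s2X)

def summand(dk):                         # k-th term of the sum in (58)
    dbar_k = a**2 * dk + s2V             # (60)
    return 0.5 * np.log((dbar_k - s_k) / (dk - s_k) * dk / dbar_k)

d1 = K / (1.0/d - 1.0/s_all + K/p)       # (66)
cons = [{'type': 'ineq',                 # constraint (61)
         'fun': lambda dk: 1.0/s_all - np.sum(1.0/s_k - 1.0/dk) - 1.0/d}]
res = minimize(lambda dk: np.sum(summand(dk)), x0=np.full(K, d1),
               bounds=[(s + 1e-9, s2X) for s in s_k], constraints=cons)

R_num = 0.5 * np.log((a**2 * d + s2V) / d) + res.fun                 # (58)
dbar1 = a**2 * d1 + s2V
R_cf = 0.5 * np.log((a**2 * d + s2V) / d) \
       + 0.5 * K * np.log((dbar1 - p) / (d1 - p) * d1 / dbar1)       # (65)
print(f"numerical (58): {R_num:.5f} nats, closed form (65): {R_cf:.5f} nats")
```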
Let us think now of adding identical observers by letting $K \to \infty$ in (64). Since $\sigma^2_{X \| Y^{[K]}} \to 0$, had the observers communicated with each other, they could have recovered the source exactly, and they could have operated at the sum rate (63) in the limit. As the following result demonstrates, $\lim_{K \to \infty} R^{K\text{-sym}}_{\mathrm{CEO}}(d)$ is actually strictly greater than (63); thus a nonvanishing penalty due to separate encoding is present in this regime. See Section IV for a more thorough discussion of the loss due to separate encoding.

Corollary 2 (Many channels asymptotics). In the scenario of Corollary 1,
$$\lim_{K \to \infty} R^{K\text{-sym}}_{\mathrm{CEO}}(d) = \frac{1}{2} \log \frac{\bar{d}}{d} + \frac{1}{2} \cdot \frac{\frac{1}{d} - \frac{1}{\bar{d}}}{\frac{1}{\sigma^2_{X \| Y^1}} - \frac{1}{\sigma^2_X}}. \qquad (67)$$

Proof. By Lemma 3 in Section III-C below,
$$\frac{1}{\sigma^2_{X \| Y^{[K]}}} = \frac{K}{\sigma^2_{X \| Y^1}} - \frac{K-1}{\sigma^2_X}. \qquad (68)$$
Eliminating $d_1$ and $\sigma^2_{X \| Y^{[K]}}$ from (65) using (66) and (68), one readily verifies that
$$R^{K\text{-sym}}_{\mathrm{CEO}}(d) - \frac{1}{2} \log \frac{\bar{d}}{d} = \frac{1}{2} \cdot \frac{\frac{1}{d} - \frac{1}{\bar{d}}}{\frac{1}{\sigma^2_{X \| Y^1}} - \frac{1}{\sigma^2_X}} + O\left(\frac{1}{K}\right), \qquad (69)$$
and (67) follows.

Corollary 2 extends the result of Oohama [4, Cor. 1] to compression with inter-block memory, and coincides with it if $a = 0$.

Considering the scenario where the encoders and the decoder do not memorize past observations or codewords, we may invoke the results on the classical Gaussian CEO problem in [5], [7] to express the minimum achievable sum rate as
$$R^{\text{no memory}}_{\mathrm{CEO}}(d) = \frac{1}{2} \log \frac{\sigma^2_X}{d} + \min_{\{d_k\}_{k=1}^{K}} \sum_{k=1}^{K} \frac{1}{2} \log \left( \frac{\sigma^2_X - \sigma^2_{X | Y^k}}{d_k - \sigma^2_{X | Y^k}} \cdot \frac{d_k}{\sigma^2_X} \right), \qquad (70)$$
where the minimum is over
$$\frac{1}{d} \le \frac{1}{\sigma^2_{X | Y^{[K]}}} - \sum_{k=1}^{K} \left( \frac{1}{\sigma^2_{X | Y^k}} - \frac{1}{d_k} \right), \qquad (71)$$
$$\sigma^2_{X | Y^k} \le d_k \le \sigma^2_X. \qquad (72)$$
Here $\sigma^2_{X | Y^k} \triangleq \lim_{i \to \infty} \sigma^2_{X_i | Y^k_i}$ and $\sigma^2_{X | Y^{[K]}} \triangleq \lim_{i \to \infty} \sigma^2_{X_i | Y^{[K]}_i}$ denote the stationary MMSEs achievable in the estimation of $X_i$ from $Y^k_i$ and from $Y^{[K]}_i$, respectively, i.e., without memory of the past.

If $a = 0$, the observed process (2) becomes a stationary memoryless Gaussian process, and the predictive MMSEs reduce to the variance of $X_i$: $\bar{d} = \bar{d}_k = \sigma^2_X = \sigma^2_V$; similarly, $\sigma^2_{X | Y^k} = \sigma^2_{X \| Y^k}$ and $\sigma^2_{X | Y^{[K]}} = \sigma^2_{X \| Y^{[K]}}$, and the result of Theorem 4 coincides with the classical Gaussian CEO rate-distortion function (70). This shows that if the source is memoryless, asymptotically there is no benefit in keeping the memory of previously encoded estimates as permitted by Definition 1. Classical codes that forget the past after encoding the current block of length $n$ perform just as well. If $|a| > 1$, the benefit due to memory is infinite: indeed, since the source is unstable, $\sigma^2_X = \infty$, while $\bar{d} < \infty$. If $|a| < 1$, that benefit is finite and is characterized by the discrepancy between the stationary variance $\sigma^2_X = \frac{\sigma^2_V}{1 - a^2}$ of the process $\{X_i\}_{i=1}^{\infty}$ and the steady-state predictive MMSE $\bar{d} < \sigma^2_X$, as well as that between $\sigma^2_{X | Y^k}$ and $\sigma^2_{X \| Y^k}$.

C. MMSE estimation lemmas

We record two elementary estimation lemmas that will be instrumental in the proof of Theorem 4.

Lemma 2. Let $X \sim \mathcal{N}(0, \sigma^2_X)$, $W \sim \mathcal{N}(0, \sigma^2_W)$, $W \perp X$, and let
$$Y = X + W. \qquad (73)$$
Then,
$$\sigma^2_{X | Y} = \sigma^2_X \left( 1 - \frac{\sigma^2_X}{\sigma^2_Y} \right). \qquad (74)$$

Proof. Appendix D.

Lemma 3. Let $\bar{X}^k$ and $W'_k$ be Gaussian random variables, $\{\bar{X}^k\}_{k=1}^{K} \perp \{W'_j\}_{j=1}^{K}$, such that $W'_k \perp W'_j$, $j \ne k$, and
$$X = \bar{X}^k + W'_k. \qquad (75)$$
Then, the MMSE estimate and the estimation error $\sigma^2_{W'} \triangleq \sigma^2_{X | \bar{X}^{[K]}}$ of $X$ given the vector $\bar{X}^{[K]}$ satisfy
$$\mathbb{E}\left[X \mid \bar{X}^{[K]}\right] = \sum_{k=1}^{K} \frac{\sigma^2_{W'}}{\sigma^2_{W'_k}} \bar{X}^k, \qquad (76)$$
$$\frac{1}{\sigma^2_{W'}} = \sum_{k=1}^{K} \frac{1}{\sigma^2_{W'_k}} - \frac{K-1}{\sigma^2_X}. \qquad (77)$$

Proof. Appendix D.

Lemma 3 converts the "forward channels" from $X$ to the observations $Y^k$,
$$Y^k = X + W^k, \quad k = 1, \ldots, K, \qquad (78)$$
where $W^k \sim \mathcal{N}(0, \sigma^2_{W^k} \mathrm{I})$ ($\mathrm{I}$ denotes the identity matrix), $W^k \perp W^j$, $j \ne k$, into "backward channels" from the estimates $\bar{X}^k$ to $X$ (75). While both representations are equivalent, (75) is more convenient to work with. Backward channel representations find widespread use in rate-distortion theory [41].
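Lemmas 2 and 3 are easy to verify numerically in the one-shot setting; the following check (ours, with arbitrary parameters) compares (74) and (77) against direct joint-Gaussian MMSE computations:

```python
import numpy as np

s2X = 2.0
s2W = np.array([0.5, 1.0, 3.0])            # forward channel noise powers
K = len(s2W)

# Lemma 2, eq. (74), vs the standard scalar Gaussian MMSE formula.
s2Y = s2X + s2W
lemma2 = s2X * (1 - s2X / s2Y)             # sigma^2_{X|Y^k} per (74)
direct = s2X * s2W / (s2X + s2W)
assert np.allclose(lemma2, direct)

# Lemma 3, eq. (77): MMSE of X given all K observations, computed directly
# via Cov(X,Y) Cov(Y)^{-1} Cov(Y,X) and via the backward-channel formula.
Sy = np.full((K, K), s2X) + np.diag(s2W)   # Cov(Y), where Y^k = X + W^k
cxy = np.full(K, s2X)                      # Cov(X, Y)
mmse_direct = s2X - cxy @ np.linalg.solve(Sy, cxy)
mmse_lemma3 = 1.0 / (np.sum(1.0 / lemma2) - (K - 1) / s2X)
print(mmse_direct, mmse_lemma3)            # the two agree
assert np.isclose(mmse_direct, mmse_lemma3)
```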
D. Proof of Theorem 4: converse

1) Proof overview: We evaluate the $n$-letter converse bound (33). We break up the minimal directed mutual information problem in (33) into subproblems, and we use the tools we developed in [24] to evaluate the causal rate-distortion functions for each subproblem. To link the parameters of the subproblems together to obtain the solution of the original problem, we extend the proof technique of Wang et al. [14], developed for the case $t = 1$, to $t > 1$. Converting the "forward channels" from $X_{[t]}$ to the observations $Y^k_{[t]}$ into the "backward channels" from the MMSE estimates $\bar{X}^k_{[t]}$ to $X_{[t]}$ and applying the lemmas in Section III-C above are key to that extension.

2) Decoupling the problem into $K$ subproblems: Recall the notation in (6). We expand the right-hand side of (33):
$$\inf I\left(Y^{[K]}_{[t]} \to B^{[K]}_{[t]}\right) \ge \inf I\left(\bar{X}^{[K]}_{[t]} \to B^{[K]}_{[t]}\right) \qquad (79)$$
$$= \inf I\left(\left(X_{[t]}, \bar{X}^{[K]}_{[t]}\right) \to B^{[K]}_{[t]}\right) \qquad (80)$$
$$= \inf \left\{ I\left(X_{[t]} \to B^{[K]}_{[t]}\right) + I\left(\bar{X}^{[K]}_{[t]} \to B^{[K]}_{[t]} \| X_{[t]}\right) \right\} \qquad (81)$$
$$= \inf \left\{ I\left(X_{[t]} \to B^{[K]}_{[t]}\right) + \sum_{k=1}^{K} I\left(\bar{X}^k_{[t]} \to B^k_{[t]} \| X_{[t]}\right) \right\}, \qquad (82)$$
where
• (79) holds by the chain rule (28), using $I\left(\bar{X}^{[K]}_{[t]} \to B^{[K]}_{[t]} \| Y^{[K]}_{[t]}\right) = 0$. The infimum is over kernels $P_{B^{[K]}_{[t]} \| \bar{X}^{[K]}_{[t]}}$ satisfying both the separate encoding constraint
$$P_{B^{[K]}_{[t]} \| \bar{X}^{[K]}_{[t]}} = \prod_{k=1}^{K} P_{B^k_{[t]} \| \bar{X}^k_{[t]}} \qquad (83)$$
and the distortion constraint
$$\frac{1}{nt} \sum_{i=1}^{t} \mathbb{E}\left[\left\|X_i - \hat{X}_i\right\|^2\right] \le d, \qquad (84)$$
where $\hat{X}_i$ (53) is the MMSE estimate of $X_i$ given $B^{[K]}_{[i]}$;
• (80) is due to the chain rule of directed information (28) and $I\left(X_{[t]} \to B^{[K]}_{[t]} \| \bar{X}^{[K]}_{[t]}\right) = 0$;
• (81) is by the chain rule of directed information (28);
• (82) is due to (83).

3) Using causal rate-distortion functions to evaluate the terms in (82): We lower-bound the first term in (82) using a classical result on the point-to-point causal Gaussian rate-distortion function [19, eq. (1.43)]¹:
$$\lim_{t \to \infty} \inf_{(83) \colon (84) \text{ holds}} \frac{1}{t} I\left(X_{[t]} \to B^{[K]}_{[t]}\right) \ge \lim_{t \to \infty} \inf_{P_{\hat{X}_{[t]} \| X_{[t]}} \colon (84) \text{ holds}} \frac{1}{t} I\left(X_{[t]} \to \hat{X}_{[t]}\right) \qquad (85)$$
$$= \frac{n}{2} \log \frac{\bar{d}}{d}, \qquad (86)$$
where $\bar{d}$ is uniquely determined by $d$ via (59). Furthermore, (86) is achieved by the Gaussian kernel $P_{\hat{X}^\star_{[t]} \| X_{[t]}}$ such that
$$X_i = \hat{X}^\star_i + Z'_i, \quad Z'_i \sim \mathcal{N}(0, d\,\mathrm{I}), \qquad (87)$$
where $\{Z'_i\}$ are i.i.d. and independent of $\{\hat{X}^\star_i\}$, and
$$d = \sigma^2_{X \| \hat{X}^\star}, \qquad (88)$$
$$\bar{d} = \sigma^2_{X \| \mathcal{D}\hat{X}^\star}. \qquad (89)$$

For each of the remaining $K$ terms in (82), note that $\{\bar{X}^k_i\}$ is a Gauss-Markov process,
$$\bar{X}^k_{i+1} = a\bar{X}^k_i + \bar{V}^k_i, \qquad (90)$$
where $\bar{V}^k_i \sim \mathcal{N}\left(0, \left(\sigma^2_{X_i | Y^k_{[i-1]}} - \sigma^2_{X_i | Y^k_{[i]}}\right) \mathrm{I}\right)$. The process $\{X_i\}$ can be expressed through $\{\bar{X}^k_i\}$ as
$$X_i = \bar{X}^k_i + W^{k\prime}_i, \qquad (91)$$
where the $W^{k\prime}_i$ are independent, $W^{k\prime}_i \sim \mathcal{N}\left(0, \sigma^2_{X_i | Y^k_{[i]}} \mathrm{I}\right)$, and $W^{k\prime}_i \perp \bar{X}^k_i$. Thus, we may apply the result [24, Th. 7] on the causal counterpart of the Gaussian Wyner-Ziv rate-distortion function to the process $\{\bar{X}^k_i\}$ (90) with side information $\{X_i\}$ (91) to write (while stated for the scalar Gaussian source, the same argument applies to $n$ parallel Gaussian sources of the same power, as is the case here; see [25] for the general vector case)
$$\lim_{t \to \infty} \inf_{P_{B^k_{[t]} \| \bar{X}^k_{[t]}} \colon \frac{1}{t} \sum_{i=1}^{t} \sigma^2_{\bar{X}^k_i | X_{[i]}, B^k_{[i]}} \le \rho_k} \frac{1}{t} I\left(\bar{X}^k_{[t]} \to B^k_{[t]} \| X_{[t]}\right) \qquad (92)$$
$$= \frac{n}{2} \log \frac{\bar{\rho}_k}{\rho_k}, \qquad (93)$$
where $\bar{\rho}_k$ is uniquely determined by $\rho_k$ via
$$\frac{1}{\bar{\rho}_k} = \frac{1}{\sigma^2_{W^{k\prime}}} + \frac{1}{a^2 \rho_k + \sigma^2_{\bar{V}^k}}. \qquad (94)$$
Furthermore, (93) is attained by the Gaussian kernel $P_{B^{k\star} \| \bar{X}^k}$,
$$B^{k\star}_i = \bar{X}^k_i + Z^k_i, \quad Z^k_i \sim \mathcal{N}\left(0, \sigma^2_{Z^k} \mathrm{I}\right), \qquad (95)$$
where $\{Z^k_i\}$ are i.i.d. and independent of $\{\bar{X}^k_i\}$, and
$$\rho_k = \sigma^2_{\bar{X}^k \| X, B^{k\star}}, \qquad (96)$$
$$\bar{\rho}_k = \sigma^2_{\bar{X}^k \| X, \mathcal{D}B^{k\star}}. \qquad (97)$$
The variances $\sigma^2_{Z^k}$ in (95) are set to satisfy (96).

[Footnote ¹: See also [24, Th. 6]; while stated for the scalar Gaussian source, the same argument applies to $n$ parallel Gaussian sources of the same power, as is the case here; see [23] for the general vector case.]
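A small sketch (ours) of how the subproblem rate (93) can be evaluated: given $\rho_k$ and the steady-state variances entering (94), compute $\bar{\rho}_k$ and then the rate; all numbers below are illustrative placeholders, not values from the paper:

```python
import numpy as np

def subproblem_rate(rho_k, s2_Wprime, s2_Vbar, a, n=1):
    """(n/2) log(rhobar_k / rho_k) per (93), with rhobar_k given by (94)."""
    rhobar_k = 1.0 / (1.0 / s2_Wprime + 1.0 / (a**2 * rho_k + s2_Vbar))
    return 0.5 * n * np.log(rhobar_k / rho_k), rhobar_k

rate, rhobar = subproblem_rate(rho_k=0.2, s2_Wprime=0.6, s2_Vbar=0.5, a=0.9)
print(f"rhobar_k = {rhobar:.4f}, subproblem rate = {rate:.4f} nats")
```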
4) Linking $\{\rho_k\}_{k=1}^{K}$ to $d$: It remains to establish the connection between $\{\rho_k\}_{k=1}^{K}$ (96) and $d$ (88). Setting $\hat{X}^\star_i$ in (87) to
$$\hat{X}^\star_i \triangleq \mathbb{E}\left[X_i \mid B^{[K]\star}_{[i]}\right] \qquad (98)$$
attains equality in (85), implying that the same Gaussian kernel (95) simultaneously attains the infima of both terms in (82). Thus, putting together (82), (86) and (93), we have
$$R_{\mathrm{CEO}}(d) \ge \inf_{\left\{\sigma^2_{\bar{X}^k \| X, B^{k\star}}\right\}_{k=1}^{K} \colon \sigma^2_{X \| B^{[K]\star}} = d} \left\{ \frac{1}{2} \log \frac{\sigma^2_{X \| \mathcal{D}B^{[K]\star}}}{\sigma^2_{X \| B^{[K]\star}}} + \sum_{k=1}^{K} \frac{1}{2} \log \frac{\sigma^2_{\bar{X}^k \| X, \mathcal{D}B^{k\star}}}{\sigma^2_{\bar{X}^k \| X, B^{k\star}}} \right\}. \qquad (99)$$
Invoking Lemma 3 with $X \leftarrow X_i$, $\bar{X}^k \leftarrow \bar{X}^k_i$, $W'_k \leftarrow W^{k\prime}_i$, we express
$$\bar{X}_i \triangleq \mathbb{E}\left[X_i \mid Y^{[K]}_{[i]}\right] \qquad (100)$$
$$= \sum_{k=1}^{K} \frac{\sigma^2_{X_i | Y^{[K]}_{[i]}}}{\sigma^2_{X_i | Y^k_{[i]}}} \bar{X}^k_i, \qquad (101)$$
which implies in particular
$$\mathbb{E}\left[\bar{X}_i \mid X_{[i]}, B^{[K]\star}_{[i]}\right] = \sum_{k=1}^{K} \frac{\sigma^2_{X_i | Y^{[K]}_{[i]}}}{\sigma^2_{X_i | Y^k_{[i]}}} \mathbb{E}\left[\bar{X}^k_i \mid X_{[i]}, B^{[K]\star}_{[i]}\right] \qquad (102)$$
$$= \sum_{k=1}^{K} \frac{\sigma^2_{X_i | Y^{[K]}_{[i]}}}{\sigma^2_{X_i | Y^k_{[i]}}} \mathbb{E}\left[\bar{X}^k_i \mid X_{[i]}, B^{k\star}_{[i]}\right]. \qquad (103)$$
It follows that the steady-state causal MMSE in estimating $\bar{X}_i$ from $X_{[i]}$ and $B^{[K]\star}_{[i]}$ satisfies
$$\sigma^2_{\bar{X} \| X, B^{[K]\star}} = \sum_{k=1}^{K} \frac{\sigma^4_{X \| Y^{[K]}}}{\sigma^4_{X \| Y^k}} \rho_k. \qquad (104)$$
Observe that
$$\sigma^2_{\bar{X}_i | X_{[i]}, B^{[K]\star}_{[i]}} = \sigma^2_{\bar{X}_i - \mathbb{E}\left[\bar{X}_i \mid X_{[i]}, B^{[K]\star}_{[i]}\right]} \qquad (105)$$
$$= \sigma^2_{\bar{X}_i - X_i - \mathbb{E}\left[\bar{X}_i - X_i \mid X_{[i]}, B^{[K]\star}_{[i]}\right]} \qquad (106)$$
$$= \sigma^2_{\bar{X}_i - X_i - \mathbb{E}\left[\bar{X}_i - X_i \mid X_i - \hat{X}^\star_i\right]} \qquad (107)$$
$$= \sigma^2_{X_i - \bar{X}_i | X_i - \hat{X}^\star_i}. \qquad (108)$$
Now, we apply Lemma 2 with $X \leftarrow X_i - \bar{X}_i$, $Y \leftarrow X_i - \hat{X}^\star_i$, $W \leftarrow \bar{X}_i - \hat{X}^\star_i$ to establish
$$\lim_{i \to \infty} \sigma^2_{X_i - \bar{X}_i | X_i - \hat{X}^\star_i} = \sigma^2_{X \| Y^{[K]}} \left(1 - \frac{\sigma^2_{X \| Y^{[K]}}}{d}\right), \qquad (109)$$
which, together with (104) and (108), means
$$\frac{1}{d} \le \frac{1}{\sigma^2_{X \| Y^{[K]}}} - \sum_{k=1}^{K} \frac{\rho_k}{\sigma^4_{X \| Y^k}}. \qquad (110)$$
Also, note that
$$0 \le \rho_k \le \sigma^2_{\bar{X}^k \| X}. \qquad (111)$$
We can now simplify the constraint set in the infimum in (99): the infimum is over $\{\rho_k\}_{k=1}^{K}$ that satisfy (110) and (111). It remains to clarify how the form in (58), (61), (62), parameterized in terms of
$$d_k \triangleq \sigma^2_{X \| B^{k\star}} \qquad (112)$$
rather than $\rho_k$, is obtained. An application of Lemma 2 with $X \leftarrow X_i - \bar{X}^k_i$, $Y \leftarrow X_i - \hat{X}^k_i$, $W \leftarrow \bar{X}^k_i - \hat{X}^k_i$ leads to
$$\rho_k = \sigma^2_{X \| Y^k} \left(1 - \frac{\sigma^2_{X \| Y^k}}{d_k}\right). \qquad (113)$$
Plugging (113) into (110) leads to (61). Applying Lemma 2 with $X \leftarrow X_i - \bar{X}^k_i$, $Y \leftarrow X_i$, $W \leftarrow \bar{X}^k_i$, we express
$$\sigma^2_{\bar{X}^k \| X} = \sigma^2_{X \| Y^k} \left(1 - \frac{\sigma^2_{X \| Y^k}}{\sigma^2_X}\right), \qquad (114)$$
which, together with (113), implies the equivalence of (111) and (62). Finally, applying Lemma 2 with $X \leftarrow X_i - \bar{X}^k_i$, $Y \leftarrow X_i - a\hat{X}^k_{i-1}$, $W \leftarrow \bar{X}^k_i - a\hat{X}^k_{i-1}$, we express
$$\bar{\rho}_k = \sigma^2_{X \| Y^k} \left(1 - \frac{\sigma^2_{X \| Y^k}}{\bar{d}_k}\right). \qquad (115)$$
Plugging (113) and (115) into (99), we conclude the equivalence of (99) and (58).

E. Proof of Theorem 4: achievability

We evaluate the Berger-Tung inner bound with inter-block memory (20). In the proof of the converse, we lower-bounded the $n$-letter version of that bound, i.e., (33), by computing the right-hand side of (82). Thus, it suffices to show that equality holds in (79). But this is easily verified by substituting the optimal kernel (95) into the left side of (79).
IV. LOSS DUE TO ISOLATED OBSERVERS

A. Overview

In Section IV, we investigate how the rate-distortion function in Theorem 4 compares to what would have been achievable had the encoders communicated with each other. A tight upper bound on the rate loss due to separate encoding is presented in Section IV-B (Theorem 5). Its proof relies on an upper bound on $R_{\mathrm{CEO}}(d)$ presented in Section IV-C (Proposition 1). The proof of Theorem 5 in Section IV-D concludes the section.

B. Loss due to isolated observers

Unrestricted communication among the encoders is equivalent to having one encoder that sees all the observation processes $\left\{Y^{[K]}_i\right\}$. It is also equivalent to allowing joint encoding policies $P_{B^{[K]}_{[t]} \| Y^{[K]}_{[t]}}$ in lieu of the independent encoding policies $\prod_{k=1}^{K} P_{B^k_{[t]} \| Y^k_{[t]}}$ in Definition 1. The lossy compression setup in which the encoder has access only to a noise-corrupted version of the source has been referred to as the "remote", "indirect", or "noisy" rate-distortion problem in the literature [41]–[44]. The setting with causal coding was considered in [22, Th. 5–8, Cor. 1]. We denote the joint encoding counterpart of the operational fundamental limit $R_{\mathrm{CEO}}(d)$ (54) by $R_{\mathrm{rm}}(d)$ (remote). The following result is a corollary to Theorem 4.

Corollary 3 (Remote rate-distortion function with inter-block memory). For all $\sigma^2_{X \| Y^{[K]}} < d < \sigma^2_X$, the rate-distortion function with joint encoding for the Gauss-Markov source in (2) observed through the Gaussian channels in (3) is given by
$$R_{\mathrm{rm}}(d) = \frac{1}{2} \log \frac{\bar{d} - \sigma^2_{X \| Y^{[K]}}}{d - \sigma^2_{X \| Y^{[K]}}}, \qquad (116)$$
where $\bar{d}$ is defined in (59).

Proof. Examining its proof, it is easy to see that Theorem 4 continues to hold in the scenario with vector observations $Y^k_i$ (that are still required to be jointly Gaussian with $X_i$). In light of this fact, we view the joint encoding scenario as the CEO scenario with a single encoder that has access to all $K$ observations, and we see that (58) indeed reduces to (116) in that case.

Previously, the minimal mutual information problem leading to $R_{\mathrm{rm}}(d)$ was solved in [22] in a different form using a different method; both forms are equivalent (Appendix E).

The loss due to isolated encoders is bounded as follows.

Theorem 5 (Loss due to isolated observers). Consider the causal Gaussian CEO problem (2), (3). Assume that the target distortion $d$ satisfies $\sigma^2_{X \| Y^{[K]}} < d$ and
$$\frac{1}{d} \ge \frac{1}{\sigma^2_{X \| Y^{[K]}}} + \frac{K}{\sigma^2_X} - \min_{k \in [K]} \frac{K}{\sigma^2_{X \| Y^k}}. \qquad (117)$$
Then, the rate loss due to isolated observers is bounded as
$$R_{\mathrm{CEO}}(d) - R_{\mathrm{rm}}(d) \le (K - 1)\left(R_{\mathrm{rm}}(d) - R(d)\right), \qquad (118)$$
with equality if and only if the $\sigma^2_{X \| Y^k}$ are all the same, where $R(d)$ is given in (63) and $R_{\mathrm{rm}}(d)$ is given in (116).

Proof. Section IV-D.

Theorem 5 parallels the corresponding result for the classical Gaussian CEO problem [31, Cor. 1], and recovers it if $a = 0$. It is interesting that in both cases, the rate loss is bounded above by $K - 1$ times the difference between the remote and the direct rate-distortion functions. In the case of identical observation channels, condition (117) reduces to $d \le \sigma^2_X$. The rate loss (118) grows without bound in the high resolution regime $d \downarrow \sigma^2_{X \| Y^{[K]}}$ and vanishes in the low resolution regime $d \uparrow \sigma^2_X$.
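The ingredients of the bound (118) are straightforward to compute. The sketch below (ours, with arbitrary parameters) evaluates $R(d)$, $R_{\mathrm{rm}}(d)$, and the right side of (118) for identical channels, obtaining $\sigma^2_{X \| Y^{[K]}}$ from (68) and $\sigma^2_{X \| Y^1}$ from a scalar Riccati recursion (a standard Kalman-filter computation, not a procedure from the paper); in this case (117) reduces to $d \le \sigma^2_X$:

```python
import numpy as np

a, s2V, K = 0.9, 1.0, 3
s2X = s2V / (1 - a**2)

p = s2X                                    # steady-state sigma^2_{X||Y^1}
for _ in range(5000):
    p = 1.0 / (1.0 / (a**2 * p + s2V) + 1.0)    # unit observation noise
s1 = p
s_all = 1.0 / (K / s1 - (K - 1) / s2X)     # (68), identical channels

d = 2.0 * s_all                            # target distortion in (s_all, s2X)
dbar = a**2 * d + s2V                      # (59)
R_direct = 0.5 * np.log(dbar / d)                       # (63)
R_remote = 0.5 * np.log((dbar - s_all) / (d - s_all))   # (116)
print(f"R(d) = {R_direct:.4f}, R_rm(d) = {R_remote:.4f} nats")
print(f"rate-loss bound (118): {(K - 1) * (R_remote - R_direct):.4f} nats")
```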
C. A suboptimal waterfilling allocation

We present an upper bound to $R_{\mathrm{CEO}}(d)$, which is obtained by waterfilling over the $d_k$'s. This parallels the corresponding result for the classical Gaussian CEO problem [31, Cor. 1]. Like [31], we use waterfilling to obtain this result, but unlike the case $t = 1$ considered in [31], where waterfilling is optimal [7], it is only suboptimal if $t > 1$, due to the memory of the past steps at the encoders and the decoder. This is unsurprising, as for the same reason waterfilling cannot be applied to solve the vector Gaussian rate-distortion problem for $t > 1$ [22, Remark 2].

Proposition 1 (Suboptimal waterfilling rate allocation). For all $\sigma^2_{X \| Y^{[K]}} < d < \sigma^2_X$, the causal CEO rate-distortion function for the Gauss-Markov source in (2) observed through the Gaussian channels in (3) is upper-bounded as
$$R_{\mathrm{CEO}}(d) \le \frac{1}{2} \log \frac{\bar{d}}{d} + \sum_{k=1}^{K} \frac{1}{2} \log \left( \frac{\bar{d}_k - \sigma^2_{X \| Y^k}}{d_k - \sigma^2_{X \| Y^k}} \cdot \frac{d_k}{\bar{d}_k} \right), \qquad (119)$$
where $d_k$, $k \in [K]$, satisfy
$$\frac{1}{\sigma^2_{X \| Y^k}} - \frac{1}{d_k} = \min\left\{ \frac{1}{\lambda}, \ \frac{1}{\sigma^2_{X \| Y^k}} - \frac{1}{\sigma^2_X} \right\}, \qquad (120)$$
$\lambda$ is the solution to
$$\sum_{k=1}^{K} \min\left\{ \frac{1}{\lambda}, \ \frac{1}{\sigma^2_{X \| Y^k}} - \frac{1}{\sigma^2_X} \right\} = \frac{1}{\sigma^2_{X \| Y^{[K]}}} - \frac{1}{d}, \qquad (121)$$
and $\bar{d}$, $\bar{d}_k$ are defined in (59), (60), respectively. The inequality in (119) holds with equality if all $\sigma^2_{X \| Y^k}$ are equal.

Proof. We first check that the choice in (120) is feasible. Since the right side of (120) is lower-bounded by $0$ and upper-bounded by $\frac{1}{\sigma^2_{X \| Y^k}} - \frac{1}{\sigma^2_X}$, (62) is satisfied. Furthermore, substituting (121) ensures that (61) is satisfied with equality. To claim equality in the symmetrical case, it suffices to recall that in that case, the minimum in (58) is attained by $d_1 = \ldots = d_K$ (Corollary 1).

D. Proof of Theorem 5

Under the assumption (117), the waterfilling allocation in Proposition 1 results in all active transmitters, and (120) reduces to
$$\frac{1}{\sigma^2_{X \| Y^k}} - \frac{1}{d_k} = \frac{1}{\lambda}, \qquad (122)$$
while (121) reduces to
$$\lambda = K \left( \frac{1}{\sigma^2_{X \| Y^{[K]}}} - \frac{1}{d} \right)^{-1}. \qquad (123)$$
Substituting (122) into (119), we conclude that under assumption (117),
$$R_{\mathrm{CEO}}(d) \le \frac{1}{2} \log \frac{\bar{d}}{d} + \frac{1}{2} \sum_{k=1}^{K} \log \left[ \left( \frac{1}{\sigma^2_{X \| Y^k}} - \frac{1}{\bar{d}_k} \right) \lambda \right] \qquad (124)$$
$$\le \frac{1}{2} \log \frac{\bar{d}}{d} + \frac{K}{2} \log \left[ \sum_{k=1}^{K} \left( \frac{1}{\sigma^2_{X \| Y^k}} - \frac{1}{\bar{d}_k} \right) \frac{\lambda}{K} \right] \qquad (125)$$
$$= \frac{1}{2} \log \frac{\bar{d}}{d} + \frac{K}{2} \log \left[ \left( \frac{1}{\sigma^2_{X \| Y^{[K]}}} - \frac{1}{\bar{d}} \right) \frac{\lambda}{K} \right] \qquad (126)$$
$$= \frac{1}{2} \log \frac{\bar{d} - \sigma^2_{X \| Y^{[K]}}}{d - \sigma^2_{X \| Y^{[K]}}} + \frac{K-1}{2} \log \left[ \frac{\bar{d} - \sigma^2_{X \| Y^{[K]}}}{d - \sigma^2_{X \| Y^{[K]}}} \cdot \frac{d}{\bar{d}} \right], \qquad (127)$$
where
• (125) is by Jensen's inequality, since log is concave;
• (126) is due to
$$\frac{1}{\sigma^2_{X \| Y^{[K]}}} = \sum_{k=1}^{K} \frac{1}{\sigma^2_{X \| Y^k}} - \frac{K-1}{\sigma^2_X}, \qquad (128)$$
$$\frac{1}{\bar{d}} = \sum_{k=1}^{K} \frac{1}{\bar{d}_k} - \frac{K-1}{\sigma^2_X}, \qquad (129)$$
which hold by Lemma 3 even if the source is nonstationary (that is, $|a| \ge 1$ and $\sigma^2_X = \infty$), as a simple limiting argument taking $\frac{K-1}{\sigma^2_X}$ to $0$ confirms;
• (127) holds by substituting (123) into (126).

Notice that (118) is just another way to write (127), using (116) and (63). To verify the condition for equality, note that '$=$' holds in (124) in the symmetrical case by Proposition 1, and that '$=$' holds in (125) only in the symmetrical case, due to the strict concavity of the log function.
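The allocation (120)-(121) is a one-dimensional waterfilling problem; a minimal implementation (ours; bisection over $1/\lambda$, with assumed values for the steady-state MMSEs of three distinct channels) is:

```python
import numpy as np

a, s2V = 0.9, 1.0
s2X = s2V / (1 - a**2)
s_k = np.array([0.4, 0.6, 0.9])            # sigma^2_{X||Y^k}, assumed values
s_all = 1.0 / (np.sum(1.0 / s_k) - (len(s_k) - 1) / s2X)   # cf. (128)
d = 2.0 * s_all                            # target distortion in (s_all, s2X)

cap = 1.0 / s_k - 1.0 / s2X                # per-channel cap in (120), (121)
target = 1.0 / s_all - 1.0 / d             # right side of (121)

lo, hi = 0.0, cap.max()                    # bisect over u = 1/lambda
for _ in range(200):
    u = 0.5 * (lo + hi)
    if np.sum(np.minimum(u, cap)) < target:
        lo = u
    else:
        hi = u
d_k = 1.0 / (1.0 / s_k - np.minimum(u, cap))       # (120)
dbar, dbar_k = a**2 * d + s2V, a**2 * d_k + s2V    # (59), (60)
R_wf = 0.5 * np.log(dbar / d) \
       + np.sum(0.5 * np.log((dbar_k - s_k) / (d_k - s_k) * d_k / dbar_k))
print("d_k =", np.round(d_k, 4), f"  upper bound (119) = {R_wf:.4f} nats")
```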
V. CONCLUSION

In this paper, we set up the causal CEO problem (Definition 1, Definition 2) and we prove that the rate-distortion function is upper-bounded by the directed mutual information from the encoders to the decoder minimized subject to the distortion constraint and the separate encoding constraint, and lower-bounded by the minimal directed mutual information subject to a weaker constraint (Theorem 1).

The proof of the direct coding theorem hinges upon an SLC-based nonasymptotic bound (Theorem 2) that extends [28, Th. 6] to the case with $K > 2$ observers and $t > 1$ time steps. An asymptotic analysis of Theorem 2 leads to an extension of the Berger-Tung inner bound [12], [13] to $t > 1$ time steps (Theorem 3). By showing that the achievability bound in Theorem 1 is tight in the Gaussian case and by solving the corresponding minimal directed mutual information problem, we characterize the causal Gaussian CEO rate-distortion function as a convex optimization problem over $K$ parameters (Theorem 4). We give an explicit formula in the identical-channels case (Corollary 1), and we study its asymptotic behavior as $K \to \infty$ (Corollary 2). We derive the causal Gaussian remote rate-distortion function as a corollary to Theorem 4 with $K = 1$ (Corollary 3). Using a suboptimal waterfilling allocation over the $K$ optimization parameters in Theorem 4 (Proposition 1), we upper-bound the rate loss due to separated observers (Theorem 5).

We chose not to treat correlation between the $n$ components of $X_i$ and $W^k_i$ in this paper merely to keep things simple. We expect our results to generalize to the scenario in which the components of the source and the noise are not i.i.d. A further interesting generalization would be to consider the general vector state-space model
$$X_{i+1} = \mathrm{A}X_i + V_i, \qquad (130)$$
$$Y^k_i = \mathrm{C}X_i + W^k_i, \qquad (131)$$
where $\mathrm{A}$ is an $n \times n$ matrix and $\mathrm{C}$ is an $m \times n$ matrix. It will also be interesting to determine the full rate-distortion region of the causal Gaussian CEO problem, as opposed to the sum rate we found in this paper. While Theorem 3 already gives an inner bound to that region, developing a converse remains open. The techniques in [11], [15], [16] appear promising in that pursuit. Certain causal multiterminal source coding problems also appear within reach in view of the result in [10] and the applicability of Theorem 3 to multiterminal source coding.

APPENDIX A: PROOF OF THEOREM 2

Codebooks: Encoder $k$ maintains separate codebooks $\mathcal{U}^k_1, \mathcal{U}^k_2, \ldots, \mathcal{U}^k_t$ to use at the transmission instances $1, 2, \ldots, t$, respectively. Codebook $\mathcal{U}^k_i$ is an $n \times L^k_1 \times \ldots \times L^k_i$-dimensional array: there is a separate codebook for each possible realization of past chosen codewords. For a vector of indices $\ell_{[i]} \in \prod_{j=1}^{i} [L^k_j]$, we denote by $\mathcal{U}^k_i(\ell_{[i]})$ the codeword corresponding to index $\ell_i$, given the past indices $\ell_{[i-1]}$. For subsets $\mathcal{K} \subseteq [K]$ and $\mathcal{I} \subseteq [t]$, we denote the collection of codebooks $\mathcal{U}^{\mathcal{K}}_{\mathcal{I}} \triangleq (\mathcal{U}^k_i \colon k \in \mathcal{K}, i \in \mathcal{I})$. For indices $\ell^k_i \in [L^k_i]$, $i \in [t]$, $k \in [K]$, we denote their collection $\ell^{\mathcal{K}}_{\mathcal{I}} \triangleq (\ell^k_i \colon k \in \mathcal{K}, i \in \mathcal{I})$. Finally, $\mathcal{U}^{\mathcal{K}}_{\mathcal{I}}\left(\ell^{\mathcal{K}}_{\mathcal{I}}\right) \triangleq (\mathcal{U}^k_i(\ell^k_{[i]}) \colon k \in \mathcal{K}, i \in \mathcal{I})$ denotes the codewords corresponding to $\ell^{\mathcal{K}}_{\mathcal{I}}$; $1^{\mathcal{K}}_{\mathcal{I}}$ denotes the array of 1's of dimension $|\mathcal{K}| \times |\mathcal{I}|$.

Codebook 1 for encoder $k$, $\mathcal{U}^k_1$, consists of $L^k_1$ codewords drawn i.i.d. from $P_{U^k_1}$. For $i = 2, \ldots, t$, codebook $i$ for user $k$, $\mathcal{U}^k_i$, consists of $L^k_i$ codewords drawn i.i.d. from $P_{U^k_i | U^k_{[i-1]} = \mathcal{U}^k_{[i-1]}(\ell^k_{[i-1]})}$, for each $\ell^k_{[i-1]} \in \prod_{j=1}^{i-1} [L^k_j]$.

Random binning: Let $B^k_i \colon [L^k_i] \mapsto [M^k_i]$, $i = 1, 2, \ldots, t$, be random mappings in which each element of $[L^k_i]$ is mapped equiprobably and independently to the set $[M^k_i]$. We will use the notation $B^{\mathcal{K}}_{\mathcal{I}}\left(\ell^{\mathcal{K}}_{\mathcal{I}}\right) \triangleq (B^k_i(\ell^k_{[i]}) \colon k \in \mathcal{K}, i \in \mathcal{I})$ for the bin indices corresponding to $\ell^{\mathcal{K}}_{\mathcal{I}}$.
In the description of the coding operations that follows, we denote the instances of the random codebooks in operation by $u^k_i$ and those of the random binning functions by $b^k_i$.

Encoders: The encoders use the stochastic likelihood coder (SLC) [27], [28], followed by random binning. Each user $k$ maintains a collection of encoders indexed by time $i = 1, 2, \ldots, t$; at time $i$, encoder $i$ is invoked to form and transmit a codeword.

Encoder $i$ for user $k$: Given an observation $y^k_i$ and past codeword indices $\ell^k_{[i-1]} \in \prod_{j=1}^{i-1}[L^k_j]$, the SLC chooses an index $\ell^k_i \in [L^k_i]$ with probability

$$Q_{U^k_i \,|\, Y^k_{[i]} = y^k_{[i]},\ U^k_{[i-1]} = u^k_{[i-1]}(\ell^k_{[i-1]})}\big(u^k_i(\ell^k_{[i]})\big) = \frac{\exp\imath\big(y^k_{[i]};\ u^k_i(\ell^k_{[i]})\ \big|\ u^k_{[i-1]}(\ell^k_{[i-1]})\big)}{\sum_{\ell=1}^{L^k_i}\exp\imath\big(y^k_{[i]};\ u^k_i(\ell^k_{[i-1]},\ell)\ \big|\ u^k_{[i-1]}(\ell^k_{[i-1]})\big)}, \qquad(132)$$

where the conditional information density is with respect to the given distribution $P_{Y^k_i U^k_i | U^k_{[i-1]}}$. Encoder $i$ transmits $m^k_i = b^k_i(\ell^k_i)$ to the decoder, a realization of the random variable we denote by $B^k_i$. The causal encoder $k$ is the resulting causal probability kernel

$$Q_{B^k_{[t]}\|Y^k_{[t]}}\big(m^k_{[t]}\ \big\|\ y^k_{[t]}\big) = \sum_{\ell^k_{[t]}}\mathbf 1\big\{b^k_{[t]}(\ell^k_{[t]}) = m^k_{[t]}\big\}\,Q_{U^k_{[t]}\|Y^k_{[t]}}\big(\ell^k_{[t]}\ \big\|\ y^k_{[t]}\big). \qquad(133)$$

Since the encoders operate independently,

$$Q_{U^{[K]}_{[t]}\|Y^{[K]}_{[t]}} = \prod_{k=1}^{K}Q_{U^k_{[t]}\|Y^k_{[t]}}, \qquad(134)$$
$$Q_{B^{[K]}_{[t]}\|Y^{[K]}_{[t]}} = \prod_{k=1}^{K}Q_{B^k_{[t]}\|Y^k_{[t]}}. \qquad(135)$$
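To illustrate the selection rule (132) in the simplest possible case, the toy sketch below implements a one-shot SLC ($t = 1$, no past codewords) for a small discrete joint distribution of our own choosing: an index $\ell$ is sampled with probability proportional to $\exp\imath(y; u(\ell))$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy one-shot model (our own choice): a joint pmf P_YU on small alphabets.
P_YU = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.10, 0.20]])                  # rows index y, columns index u
P_Y = P_YU.sum(axis=1, keepdims=True)
P_U = P_YU.sum(axis=0, keepdims=True)
info_dens = np.log(P_YU / (P_Y * P_U))           # information density i(y; u)

def slc(y, codebook):
    """Stochastic likelihood coder: sample index l with probability
    proportional to exp i(y; u(l)), i.e., (132) specialized to t = 1."""
    w = np.exp(info_dens[y, codebook])           # unnormalized SLC weights
    return rng.choice(len(codebook), p=w / w.sum())

codebook = rng.choice(2, size=8, p=P_U.ravel())  # L = 8 codewords i.i.d. ~ P_U
ell = slc(y=0, codebook=codebook)
```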
Decoder: Having received the collection of bin numbers $m^{[K]}_i \in \prod_{k=1}^{K}[M^k_i]$ at time $i$, and remembering the past, the decoder invokes a generalized likelihood decoder (GLD) [29, eq. (4)] to select, among the indices that fall into those bins, a collection of indices $\hat\ell^{[K]}_i \in \prod_{k=1}^{K}[L^k_i]$ with probability

$$Q_{\hat U^{[K]}_i | B^{[K]}_{[i]} = b^{[K]}_{[i]},\ \hat U^{[K]}_{[i-1]} = u^{[K]}_{[i-1]}(\hat\ell^{[K]}_{[i-1]})}\big(u^{[K]}_i(\hat\ell^{[K]}_i)\big) = \frac{g\big(u^{[K]}_{[i]}(\hat\ell^{[K]}_{[i]})\big)\,\mathbf 1\big\{b^{[K]}_i(\hat\ell^{[K]}_i) = m^{[K]}_i\big\}}{\sum_{\ell^{[K]}_i} g\big(u^{[K]}_{[i]}(\hat\ell^{[K]}_{[i-1]},\ell^{[K]}_i)\big)\,\mathbf 1\big\{b^{[K]}_i(\ell^{[K]}_i) = m^{[K]}_i\big\}}, \qquad(136)$$

where

$$g\big(u^{[K]}_{[i]}\big) \triangleq \prod_{k=1}^{K}\mathbf 1\Big\{\imath^{\pi(k)}\big(u^{\pi([K])}_{[i]}\big) \ge \log\frac{L^{\pi(k)}_i}{M^{\pi(k)}_i} + \beta^{\pi(k)}_i\Big\}. \qquad(137)$$

Having determined $\hat\ell^{[K]}_i$, the decoder applies the given transformation $P_{\hat X_i | U^{[K]}_{[i]}, \hat X_{[i-1]}}$ to form the estimate of the source, $\hat X_i\big(u^{[K]}_{[i]}(\hat\ell^{[K]}_{[i]})\big)$. The causal decoder is the resulting causal kernel $Q_{\hat X_{[t]}\|B^{[K]}_{[t]}}$.
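The toy sketch below mimics the decoding rule (136) for a single encoder and a single time step: among the indices whose bin matches the received one, it samples according to a 0/1 test standing in for $g$ in (137). The actual decoder couples all $K$ encoders and conditions on past codeword estimates, so this is only schematic; the names `gld` and `passes_test` are ours.

```python
import numpy as np

rng = np.random.default_rng(3)

def gld(received_bin, B, passes_test):
    """Toy GLD, one encoder and one time step: restrict attention to indices
    l whose bin B[l] matches the received bin, then sample among those that
    pass the 0/1 threshold test playing the role of g in (137)."""
    w = np.array([1.0 if (B[l] == received_bin and passes_test(l)) else 0.0
                  for l in range(len(B))])
    if w.sum() == 0.0:
        return None                      # no surviving candidate: an error
    return rng.choice(len(B), p=w / w.sum())

B = rng.integers(0, 4, size=16)          # a binning of 16 indices into 4 bins
ell_hat = gld(received_bin=B[5], B=B, passes_test=lambda l: True)
```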
Error analysis: We consider two error events:

$$\mathcal E_{\mathrm{dec}}\colon\quad \hat U^{[K]}_{[t]} \neq U^{[K]}_{[t]}, \qquad(138)$$

$$\mathcal E_{\mathrm{enc}}\colon\quad \bigcup_{i=1}^{t}\Big\{d\big(X_i, \hat X_i(U^{[K]}_{[i]})\big) > d_i\Big\}, \qquad(139)$$

where $U^{[K]}_{[i]}$ are the codewords chosen by the encoders in the encoding step (132), and $\hat U^{[K]}_{[t]}$ is the decoder's estimate of those codewords after the decoding step (136). Note that $\mathcal E_{\mathrm{dec}}$ is the event that some codewords are not recovered (a decoding error), and $\mathcal E_{\mathrm{enc}}$ is the event that some distortions exceed their thresholds even if all the codewords are recovered correctly (an encoding error). We denote for brevity by $\mathcal F$ the sigma-algebra generated by $\big(Y^{[K]}_{[t]},\ U^{[K]}_{[t]}(1^{[K]}_{[t]}),\ B^{[K]}_{[t]}(1^{[K]}_{[t]}),\ \hat X_{[t]}(U^{[K]}_{[t]}(1^{[K]}_{[t]}))\big)$; by $\mathsf Q$ the probability measure generated by the code; and by $F^k_i$, $G_i$ the denominators in (132) and (136), respectively.

Following Shannon's random coding argument and the Jensen-inequality technique of Yassaee et al. [27], [28], we proceed to bound an expectation of the indicator of the correct decoding event with respect to both the actual source code and the random codebooks:

$$\mathbb E\left[\mathsf Q\left[\prod_{i=1}^{t}\mathbf 1\big\{d(X_i,\hat X_i)\le d_i\big\}\,\middle|\,U^{[K]}_{[t]}, B^{[K]}_{[t]}\right]\right] \ge \mathbb E\left[\mathsf Q\left[\mathcal E^c_{\mathrm{enc}}\cap\mathcal E^c_{\mathrm{dec}}\,\middle|\,U^{[K]}_{[t]}, B^{[K]}_{[t]}\right]\right] \qquad(140)$$

$$= \mathbb E\Bigg[\sum_{\ell^{[K]}_{[t]}}Q_{U^{[K]}_{[t]}\|Y^{[K]}_{[t]}}\big(U^{[K]}_{[t]}(\ell^{[K]}_{[t]})\ \big\|\ Y^{[K]}_{[t]}\big)\sum_{m^{[K]}_{[t]}}\mathbf 1\big\{B^{[K]}_{[t]}(\ell^{[K]}_{[t]}) = m^{[K]}_{[t]}\big\}\,Q_{\hat U^{[K]}_{[t]}\|B^{[K]}_{[t]}=m^{[K]}_{[t]}}\big(U^{[K]}_{[t]}(\ell^{[K]}_{[t]})\big)\,\mathbf 1\{\mathcal E^c_{\mathrm{enc}}\}\Bigg] \qquad(141)$$

$$= \prod_{k=1}^{K}\prod_{i=1}^{t}M^k_i L^k_i\ \mathbb E\Bigg[\mathbb E\Big[Q_{U^{[K]}_{[t]}\|Y^{[K]}_{[t]}}\big(U^{[K]}_{[t]}(1^{[K]}_{[t]})\ \big\|\ Y^{[K]}_{[t]}\big)\,\mathbf 1\big\{B^{[K]}_{[t]}(1^{[K]}_{[t]}) = 1^{[K]}_{[t]}\big\}\,Q_{\hat U^{[K]}_{[t]}\|B^{[K]}_{[t]}=1^{[K]}_{[t]}}\big(U^{[K]}_{[t]}(1^{[K]}_{[t]})\big)\,\mathbf 1\{\mathcal E^c_{\mathrm{enc}}\}\ \Big|\ \mathcal F\Big]\Bigg] \qquad(142)$$

$$\ge \prod_{k=1}^{K}\prod_{i=1}^{t}M^k_i L^k_i\ \mathbb E\Bigg[\prod_{k=1}^{K}\prod_{i=1}^{t}\frac{\exp\imath\big(Y^k_{[i]}; U^k_i(1_{[i]})\ \big|\ U^k_{[i-1]}(1_{[i-1]})\big)}{\mathbb E\big[F^k_i\ \big|\ \mathcal F\big]}\cdot\prod_{i=1}^{t}\frac{g\big(U^{[K]}_{[i]}(1^{[K]}_{[i]})\big)\,\mathbf 1\big\{B^{[K]}_i(1^{[K]}) = 1^{[K]}\big\}}{\mathbb E[G_i\,|\,\mathcal F]}\,\mathbf 1\Big\{d\big(X_i,\hat X_i(U^{[K]}_{[i]}(1^{[K]}_{[i]}))\big)\le d_i\Big\}\Bigg], \qquad(143)$$

where
• the expectation $\mathbb E$ in (141) is with respect to the codebooks $U^{[K]}_{[t]}$, the random binning functions $B^{[K]}_{[t]}$, the decoder $P_{\hat X_{[t]}\|U^{[K]}_{[t]}}$, and $(X_{[t]}, Y^{[K]}_{[t]})$;
• (142) uses the fact that both the codewords and the binning functions for the $i$-th time instant are independently and identically distributed, so each choice of $\ell^{[K]}_{[t]}$ and $m^{[K]}_{[t]}$ results in the same probability as the choice $\ell^{[K]}_{[t]} = 1^{[K]}_{[t]}$, $m^{[K]}_{[t]} = 1^{[K]}_{[t]}$; here we also conditioned on $\mathcal F$ before taking an outer expectation with respect to it, which facilitates the next step of the calculation;
• the main step (143) is shown as follows. The product $Q_{U^{[K]}_{[t]}\|Y^{[K]}_{[t]}}\,Q_{\hat U^{[K]}_{[t]}\|B^{[K]}_{[t]}}$ is proportional to the product of the $(K+1)t$ factors $\prod_{i=1}^{t}\frac{1}{G_i}\prod_{k=1}^{K}\frac{1}{F^k_i}$. Applying Jensen's inequality to this jointly convex function of $(K+1)t$ variables yields

$$\mathbb E\left[\prod_{i=1}^{t}\frac{1}{G_i}\prod_{k=1}^{K}\frac{1}{F^k_i}\,\middle|\,\mathcal F\right] \ge \prod_{i=1}^{t}\frac{1}{\mathbb E[G_i|\mathcal F]}\prod_{k=1}^{K}\frac{1}{\mathbb E[F^k_i|\mathcal F]}. \qquad(144)$$

We compute each factor in (144) as follows. First,

$$\mathbb E\big[F^k_i\ \big|\ \mathcal F\big] = \mathbb E\left[\sum_{\ell=1}^{L^k_i}\exp\imath\big(Y^k_{[i]}; U^k_i(1_{[i-1]},\ell)\ \big|\ U^k_{[i-1]}(1_{[i-1]})\big)\,\middle|\,\mathcal F\right] \qquad(145)$$

$$= \exp\imath\big(Y^k_{[i]}; U^k_i(1_{[i]})\ \big|\ U^k_{[i-1]}(1_{[i-1]})\big) + (L^k_i - 1)\,\mathbb E\Big[\exp\imath\big(Y^k_{[i]}; U^k_i(1_{[i-1]},2)\ \big|\ U^k_{[i-1]}(1_{[i-1]})\big)\ \Big|\ \mathcal F\Big] \qquad(146)$$

$$= \exp\imath\big(Y^k_{[i]}; U^k_i(1_{[i]})\ \big|\ U^k_{[i-1]}(1_{[i-1]})\big) + L^k_i - 1, \qquad(147)$$

where to write (146) we used that the codewords $\{U^k_i(1_{[i-1]},\ell)\colon \ell\neq 1\}$ are identically distributed conditioned on $\mathcal F$. To evaluate $\mathbb E[G_i|\mathcal F]$, we partition the set of all $\ell^{[K]}_i \in \prod_{k=1}^{K}[L^k_i]$ into index sets parameterized by $\mathcal K\subseteq[K]$:

$$\mathcal L_i(\mathcal K) \triangleq \Big\{\ell^{[K]} \in \prod_{k=1}^{K}[L^k_i]\colon\ \ell^{\pi(k)} = 1,\ k\in\mathcal K;\ \ell^{\pi(k)}\neq 1,\ k\in\mathcal K^c\Big\}, \qquad(148)$$

and, for each $\ell^{[K]}_i\in\mathcal L_i(\mathcal K)$ with $\mathcal K\subset[K]$, we upper-bound $g(\cdot)$ as

$$g\big(u^{[K]}_{[i]}(1^{[K]}_{[i-1]},\ell^{[K]}_i)\big) \qquad(149)$$
$$\le \prod_{k\in\mathcal K^c}\mathbf 1\Big\{\imath^{\pi(k)}\big(u^{\pi([K])}_{[i]}\big)\ge\log\frac{L^{\pi(k)}_i}{M^{\pi(k)}_i}+\beta^{\pi(k)}_i\Big\} \qquad(150)$$
$$\le \prod_{k\in\mathcal K^c}\frac{M^{\pi(k)}_i}{L^{\pi(k)}_i}\exp\Big(\imath^{\pi(k)}\big(u^{\pi([K])}_{[i]}(1^{[K]}_{[i-1]},\ell^{[K]}_i)\big)-\beta^{\pi(k)}_i\Big), \qquad(151)$$

while for $\mathcal K = [K]$ we upper-bound it as

$$g\big(u^{[K]}_{[i]}(1^{[K]}_{[i]})\big)\le 1. \qquad(152)$$

Note that for each $\ell^{[K]}_i\in\mathcal L_i(\mathcal K)$, $\mathcal K\subset[K]$,

$$\mathbb E\left[\prod_{k\in\mathcal K^c}\exp\imath^{\pi(k)}\big(U^{\pi([K])}_{[i]}(1^{[K]}_{[i-1]},\ell^{[K]}_i)\big)\,\middle|\,\mathcal F\right] = 1. \qquad(153)$$

The upper bound in (151) and the equality in (153) are key to the analysis of our GLD (136). Now, $\mathbb E[G_i|\mathcal F]$ is bounded as

$$\mathbb E[G_i|\mathcal F] = \mathbb E\left[\sum_{\ell^{[K]}_i\in\prod_{k=1}^{K}[L^k_i]} g\big(U^{[K]}_{[i]}(1^{[K]}_{[i-1]},\ell^{[K]}_i)\big)\,\mathbf 1\big\{B^{[K]}_i(\ell^{[K]}_i)=1^{[K]}\big\}\,\middle|\,\mathcal F\right] \qquad(154)$$

$$= g\big(U^{[K]}_{[i]}(1^{[K]}_{[i]})\big)\,\mathbf 1\big\{B^{[K]}_i(1^{[K]})=1^{[K]}\big\} + \sum_{\mathcal K\subset[K]}\mathbb E\left[\sum_{\ell^{[K]}_i\in\mathcal L_i(\mathcal K)} g\big(U^{[K]}_{[i]}(1^{[K]}_{[i-1]},\ell^{[K]}_i)\big)\,\middle|\,\mathcal F\right]\prod_{k\in\mathcal K^c}\frac{1}{M^{\pi(k)}_i}\cdot\mathbf 1\big\{B^{\pi(\mathcal K)}_i(1^{\mathcal K})=1^{\mathcal K}\big\} \qquad(155)$$

$$\le \mathbf 1\big\{B^{[K]}_i(1^{[K]})=1^{[K]}\big\} + \sum_{\mathcal K\subset[K]}\exp\Big(-\sum_{k\in\mathcal K^c}\beta^{\pi(k)}_i\Big)\,\mathbf 1\big\{B^{\pi(\mathcal K)}_i(1^{\mathcal K})=1^{\mathcal K}\big\}, \qquad(156)$$

where (156) follows from (151), (152) and (153). Now, plugging (147) and (156) into (143) and computing the expectation in (143) with respect to the codebooks and the binning functions, we conclude that the probability of successful decoding is bounded from below as

$$1-\epsilon \ge \mathbb E\left[\prod_{k=1}^{K}\prod_{i=1}^{t}\frac{1}{\frac{1}{L^k_i}\exp\imath\big(Y^k_{[i]};U^k_i\ \big|\ U^k_{[i-1]}\big)+1-\frac{1}{L^k_i}}\cdot\frac{g\big(U^{[K]}_{[i]}\big)\,\mathbf 1\big\{d\big(X_i,\hat X_i(U^{[K]}_{[i]})\big)\le d_i\big\}}{1+\sum_{\mathcal K\subset[K]}\exp\big(-\sum_{k\in\mathcal K^c}\beta^k_i\big)}\right]. \qquad(157)$$

Loosening the bound (157): Here we again follow the recipe of Yassaee et al. [27], [28]:

$$1-\epsilon \ge \mathbb E\left[\prod_{k=1}^{K}\prod_{i=1}^{t}\frac{1}{\frac{1}{L^k_i}\exp\imath\big(Y^k_{[i]};U^k_i\ \big|\ U^k_{[i-1]}\big)+1}\cdot\frac{g\big(U^{[K]}_{[i]}\big)\,\mathbf 1\big\{d\big(X_i,\hat X_i(U^{[K]}_{[i]})\big)\le d_i\big\}}{\sum_{\mathcal K\subseteq[K]}\exp\big(-\sum_{k\in\mathcal K}\beta^k_i\big)}\right] \qquad(158)$$

$$\ge \mathbb P[\mathcal E^c]\prod_{k=1}^{K}\prod_{i=1}^{t}\frac{1}{\big(1+\exp(-\alpha^k_i)\big)\Big[\sum_{\mathcal K\subseteq[K]}\exp\big(-\sum_{k\in\mathcal K}\beta^k_i\big)\Big]}, \qquad(159)$$

where (158) holds by weakening (157) using $1-\frac{1}{L^k_i}\le 1$ and by rewriting, for brevity,

$$1+\sum_{\mathcal K\subset[K]}\exp\Big(-\sum_{k\in\mathcal K^c}\beta^k_i\Big) = \sum_{\mathcal K\subseteq[K]}\exp\Big(-\sum_{k\in\mathcal K}\beta^k_i\Big); \qquad(160)$$

and (159) is obtained by weakening (158): we multiply the random variable inside the expectation by $\mathbf 1\{\mathcal E^c\}$ and use the conditions in $\mathcal E$ (45) to upper-bound $\imath\big(Y^k_{[i]};U^k_i\ \big|\ U^k_{[i-1]}\big)$ in the denominator. Rewriting (159), we obtain

$$\epsilon \le 1 - \mathbb P[\mathcal E^c]\prod_{k=1}^{K}\prod_{i=1}^{t}\frac{1}{\big(1+\exp(-\alpha^k_i)\big)\Big[\sum_{\mathcal K\subseteq[K]}\exp\big(-\sum_{k\in\mathcal K}\beta^k_i\big)\Big]} \qquad(161)$$
$$= \mathbb P[\mathcal E] + \gamma\,\mathbb P[\mathcal E^c] \qquad(162)$$
$$\le \mathbb P[\mathcal E] + \gamma. \qquad(163)$$
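The reindexing identity (160) used in the last step is easy to sanity-check numerically, since replacing each proper subset by its complement turns the left enumeration into the right one; the check below, with arbitrary positive $\beta_k$ of our own choosing, is not from the paper.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
K = 4
beta = rng.exponential(size=K)           # arbitrary positive beta_k

subsets = [set(s) for r in range(K + 1) for s in combinations(range(K), r)]

# Left side of (160): 1 plus a sum over proper subsets, exponent over the complement.
lhs = 1.0 + sum(np.exp(-beta[list(set(range(K)) - s)].sum())
                for s in subsets if len(s) < K)
# Right side of (160): a sum over all subsets, exponent over the subset itself.
rhs = sum(np.exp(-beta[list(s)].sum()) for s in subsets)

assert np.isclose(lhs, rhs)
```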
APPENDIX B: PROOF OF THEOREM 3

We analyze the bound in Theorem 2 with

$$P_{U^k_{[t]}\|Y^k_{[t]}} = P^{\otimes n}_{U^k_{[t]}\|Y^k_{[t]}}, \qquad(164)$$
$$P_{\hat X_{[t]}\|U^{[K]}_{[t]}} = P^{\otimes n}_{\hat X_{[t]}\|U^{[K]}_{[t]}}, \qquad(165)$$

single-letter kernels chosen so that

$$\mathbb E\Big[d\big(X_i,\hat X_i(U^{[K]}_{[i]})\big)\Big] = d_i - \delta \qquad(166)$$

for some $\delta > 0$. We also fix an arbitrary permutation $\pi\colon [K]\mapsto[K]$. Denote for brevity the divergences

$$D^{\pi(k)}_i \triangleq \frac{1}{n}\,\mathbb E\Big[\imath^{\pi(k)}\big(U^{\pi([K])}_{[i]}\big)\Big] = D\Big(P_{U^{\pi(k)}_i | U^{\pi([k-1])}_i, U^{\pi([K])}_{[i-1]}}\ \Big\|\ P_{U^{\pi(k)}_i | U^{\pi(k)}_{[i-1]}}\ \Big|\ P_{U^{\pi([k-1])}_i, U^{\pi([K])}_{[i-1]}}\Big). \qquad(167)$$

For $k\in[K]$, $i\in[t]$, let

$$\alpha^k_i = \beta^k_i = n\delta, \qquad(168)$$

and choose $L^k_i$, $M^k_i$ to satisfy

$$\log L^k_i \ge n\,I\big(Y^k_{[i]}; U^k_i\ \big|\ U^k_{[i-1]}\big) + 2\alpha^k_i, \qquad(169)$$
$$\log M^{\pi(k)}_i \ge \log L^{\pi(k)}_i - n D^{\pi(k)}_i + 2\beta^k_i. \qquad(170)$$

Note that since $U^k_i - \big(Y^k_{[i]}, U^k_{[i-1]}\big) - U^{[K]\setminus\{k\}}_{[i]}$ form a Markov chain, it holds that

$$I\big(Y^{\pi(k)}_{[i]}; U^{\pi(k)}_i\ \big|\ U^{\pi([k-1])}_i, U^{\pi([K])}_{[i-1]}\big) = I\big(Y^{\pi(k)}_{[i]}; U^{\pi(k)}_i\ \big|\ U^{\pi(k)}_{[i-1]}\big) - D^{\pi(k)}_i, \qquad(171)$$

and thus, summing both sides of (170) over $i\in[t]$, we obtain (cf. (47))

$$\frac{1}{n}\sum_{i=1}^{t}\log M^{\pi(k)}_i \ge I\big(Y^{\pi(k)}_{[t]}\to U^{\pi(k)}_{[t]}\ \big\|\ U^{\pi([k-1])}_{[t]}, D U^{[K]}_{[t]}\big) + 4t\delta. \qquad(172)$$

Applying the union bound to $\mathbb P[\mathcal E]$ and the law of large numbers to each of the resultant $(2K+1)t$ terms, we conclude that $\mathbb P[\mathcal E]\to 0$ as $n\to\infty$. Furthermore, $\gamma\to 0$ as $n\to\infty$, and therefore, by Theorem 2, there exists a sequence of codes with $\log L^k_i$ and $\log M^k_i$ satisfying (169), (170) whose excess-distortion probability vanishes as $n\to\infty$. Under our assumption (25) on the $p$-th moment of the distortion measure, the existence of an $(M^{[K]}_{[t]}, d_{[t]}, \epsilon)$ excess-distortion code with $\frac{1}{t}\sum_{i=1}^{t}d_i\le d$ implies the existence of an $\big(M^{[K]}_{[t]},\ d(1-\epsilon)+d_p\,\epsilon^{1-1/p}\big)$ average-distortion code, via a standard argument using Hölder's inequality [45, Th. 25.5].

APPENDIX C: TWO CHARACTERIZATIONS OF THE BERGER-TUNG BOUND

Proposition 2. The region $\mathcal R$ in (52) is equivalent to the region $\mathcal R'$ in (51).

Proof of Proposition 2. Observe that any subset $\mathcal A$ of $[K]$ with cardinality $k$ is equal to $\pi([k])$ for some permutation $\pi$ on $[K]$. First, we show that $\mathcal R' \subseteq \mathcal R$. Fix $\pi$ and consider $\mathcal K = \pi([k])$. Since, given $Y_k$, $U_k$ is independent of $U_{[K]\setminus\{k\}}$,

$$I(Y_{\mathcal K}; U_{\mathcal K}) = \sum_{j=1}^{k} I\big(Y_{\pi(j)}; U_{\pi(j)}\ \big|\ U_{\pi([j-1])}\big), \qquad(173)$$
$$I(Y_{\mathcal K^c}; U_{\mathcal K^c}\,|\,U_{\mathcal K}) = \sum_{j=k+1}^{K} I\big(Y_{\pi(j)}; U_{\pi(j)}\ \big|\ U_{\pi([j-1])}\big). \qquad(174)$$

From (174), we conclude that any set of rates that satisfies (51) for $\pi$ must also satisfy (52) for $\mathcal A = \mathcal K^c$. Thus, $\mathcal R' \subseteq \mathcal R$.

To show that $\mathcal R \subseteq \mathcal R'$, note, using the operational Markov chain condition $U_{\mathcal B} - Y_{\mathcal B} - Y_{\mathcal A\setminus\mathcal B} - U_{\mathcal A\setminus\mathcal B}$, that for all $\mathcal B\subseteq\mathcal A$,

$$I(Y_{\mathcal A}; U_{\mathcal A}) = I(Y_{\mathcal A\setminus\mathcal B}; U_{\mathcal A\setminus\mathcal B}\,|\,U_{\mathcal B}) + I(Y_{\mathcal B}; U_{\mathcal B}). \qquad(175)$$

Since

$$\begin{cases} S_1 \ge I_1\\ S_1+S_2 \ge I_1+I_2 \end{cases} \iff \begin{cases} S_1 \ge I_1\\ S_2 \ge I_2, \end{cases} \qquad(176)$$

(175) implies that for any $\mathcal A\subseteq[K]$,

$$\begin{cases} \sum_{k\in\mathcal A^c} R_k \ge I(Y_{\mathcal A^c}; U_{\mathcal A^c}\,|\,U_{\mathcal A})\\ \sum_{k\in[K]} R_k \ge I(Y_{[K]}; U_{[K]}) \end{cases} \qquad(177)$$

$$\iff \begin{cases} \sum_{k\in\mathcal A} R_k \ge I(Y_{\mathcal A}; U_{\mathcal A})\\ \sum_{k\in\mathcal A^c} R_k \ge I(Y_{\mathcal A^c}; U_{\mathcal A^c}\,|\,U_{\mathcal A}), \end{cases} \qquad(178)$$

and for any $\mathcal B\subseteq\mathcal A$,

$$\begin{cases} \sum_{k\in\mathcal B} R_k \ge I(Y_{\mathcal B}; U_{\mathcal B})\\ \sum_{k\in\mathcal A} R_k \ge I(Y_{\mathcal A}; U_{\mathcal A}) \end{cases} \qquad(179)$$

$$\iff \begin{cases} \sum_{k\in\mathcal B} R_k \ge I(Y_{\mathcal B}; U_{\mathcal B})\\ \sum_{k\in\mathcal A\setminus\mathcal B} R_k \ge I(Y_{\mathcal A\setminus\mathcal B}; U_{\mathcal A\setminus\mathcal B}\,|\,U_{\mathcal B}). \end{cases} \qquad(180)$$

For $\mathcal B = \pi([k-1])$ and $\mathcal A = \pi([k])$, the second inequality in (180) is exactly the inequality in (51). Since any set of rates satisfying (52) must also satisfy (180) for all $\mathcal B\subseteq\mathcal A\subseteq[K]$, we conclude that $\mathcal R\subseteq\mathcal R'$. ∎

APPENDIX D: MMSE ESTIMATION LEMMAS

Lemmas 2 and 3 are corollaries to the following result.

Lemma 4. Let $X\sim\mathcal N(0,\sigma^2_X)$, and let

$$Y_k = X + W_k, \quad k = 1,\ldots,K, \qquad(181)$$

where $W_k\sim\mathcal N(0,\sigma^2_{W_k})$ and $W_k\perp W_j$, $j\neq k$. Then the MMSE estimate of $X$ given $Y_{[K]}$ and the corresponding estimation error are given by

$$\mathbb E\big[X\ \big|\ Y_{[K]}\big] = \sum_{k=1}^{K}\frac{\sigma^2_{X|Y_{[K]}}}{\sigma^2_{W_k}}\,Y_k, \qquad(182)$$
$$\frac{1}{\sigma^2_{X|Y_{[K]}}} = \frac{1}{\sigma^2_X} + \sum_{k=1}^{K}\frac{1}{\sigma^2_{W_k}}. \qquad(183)$$

Proof of Lemma 4. The result is well known; we provide a proof for completeness. For jointly Gaussian random vectors $X$, $Y$,

$$\mathbb E[X\,|\,Y=y] = \mathbb E[X] + \Sigma_{XY}\Sigma_{YY}^{-1}\big(y-\mathbb E[Y]\big), \qquad(184)$$
$$\mathrm{Cov}[X\,|\,Y] = \Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}. \qquad(185)$$

Denote for brevity

$$\Sigma_W \triangleq \mathrm{diag}\big(\sigma^2_{W_1},\ldots,\sigma^2_{W_K}\big). \qquad(186)$$

In our case, $X$ is a scalar, $Y = Y_{[K]}$ is a vector, and, with $\mathbf 1$ denoting the all-ones column vector of length $K$,

$$\Sigma_{XX} = \sigma^2_X, \qquad(187)$$
$$\Sigma_{YY} = \Sigma_W + \sigma^2_X\,\mathbf 1\mathbf 1^\top, \qquad(188)$$
$$\Sigma_{XY} = \sigma^2_X\,\mathbf 1^\top. \qquad(189)$$

Using the matrix inversion lemma, we readily compute

$$\mathrm{Cov}[X\,|\,Y]^{-1} = \Sigma_{XX}^{-1} - \Sigma_{XX}^{-1}\Sigma_{XY}\big(\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY} - \Sigma_{YY}\big)^{-1}\Sigma_{YX}\Sigma_{XX}^{-1} \qquad(190)$$
$$= \Sigma_{XX}^{-1} + \Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_W^{-1}\Sigma_{YX}\Sigma_{XX}^{-1} \qquad(191)$$
$$= \frac{1}{\sigma^2_X} + \frac{1}{\sigma^2_{W_1}} + \ldots + \frac{1}{\sigma^2_{W_K}}, \qquad(192)$$

which shows (183). To show (182), we apply the matrix inversion lemma to $\Sigma_{YY}$ to write

$$\Sigma_{YY}^{-1} = \Sigma_W^{-1} - \Sigma_W^{-1}\,\mathbf 1\,\sigma^2_{X|Y_{[K]}}\,\mathbf 1^\top\,\Sigma_W^{-1}. \qquad(193)$$

It is easy to verify, using (183), that

$$\sigma^2_X\,\mathbf 1^\top\left(\frac{1}{\sigma^2_{X|Y_{[K]}}}\,\mathsf I_K - \Sigma_W^{-1}\,\mathbf 1\mathbf 1^\top\right) = \mathbf 1^\top, \qquad(194)$$

where $\mathsf I_K$ is the $K\times K$ identity matrix, so

$$\mathbb E[X\,|\,Y=y] = \Sigma_{XY}\Sigma_{YY}^{-1}\,y \qquad(195)$$
$$= \sigma^2_{X|Y_{[K]}}\,\mathbf 1^\top\,\Sigma_W^{-1}\,y, \qquad(196)$$

which is equivalent to (182). ∎
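Lemma 4 is also easy to confirm numerically by direct Gaussian conditioning; the sketch below compares (182)-(183) against the textbook formulas (184)-(185) for one arbitrary choice of variances (our own, for illustration only).

```python
import numpy as np

rng = np.random.default_rng(4)
K, var_x = 3, 2.0
var_w = np.array([0.5, 1.0, 2.0])          # noise variances, our own choice

Sigma_yy = np.full((K, K), var_x) + np.diag(var_w)   # (188)
Sigma_xy = np.full(K, var_x)                         # (189)

# (183): posterior precision = prior precision + sum of the noise precisions.
post_var = var_x - Sigma_xy @ np.linalg.solve(Sigma_yy, Sigma_xy)   # (185)
assert np.isclose(1.0 / post_var, 1.0 / var_x + (1.0 / var_w).sum())

# (182): the MMSE estimate weights each Y_k by post_var / var_w[k].
y = rng.normal(size=K)                     # an arbitrary observation vector
est_direct = Sigma_xy @ np.linalg.solve(Sigma_yy, y)                # (184)
est_lemma = (post_var / var_w * y).sum()
assert np.isclose(est_direct, est_lemma)
```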
Proof of Lemma 2. Equality (74) follows from

$$\sigma^2_Y = \sigma^2_X + \sigma^2_W, \qquad(197)$$
$$\frac{1}{\sigma^2_{X|Y}} = \frac{1}{\sigma^2_X} + \frac{1}{\sigma^2_W}, \qquad(198)$$

where (198) is a particularization of (183). ∎

Proof of Lemma 3. Notice that (75), with $\bar X_k = \mathbb E[X\,|\,Y_k]$ and $W'_k\sim\mathcal N(0,\sigma^2_{X|Y_k})$, is just another way to write (181). Reparameterizing (182) and (183) accordingly, one recovers (76) and (77). ∎

Remark 1. We may use Lemma 4 to derive the Kalman filter for the estimation of $X_i$ in (2) given the history of observations $Y^{[K]}_{[i]}$ in (3):

$$\bar X_i = a\bar X_{i-1} + \sum_{k=1}^{K}\frac{\sigma^2_{X_i|Y^{[K]}_{[i]}}}{\sigma^2_{W_k}}\big(Y^k_i - a\bar X_{i-1}\big), \qquad(199)$$
$$\frac{1}{\sigma^2_{X_i|Y^{[K]}_{[i]}}} = \frac{1}{\sigma^2_{X_i|Y^{[K]}_{[i-1]}}} + \sum_{k=1}^{K}\frac{1}{\sigma^2_{W_k}}, \qquad(200)$$

where $\bar X_i$ is defined in (100). Equation (199) is the Kalman filter recursion with the Kalman gain equal to the row vector $\sigma^2_{X_i|Y^{[K]}_{[i]}}\Big(\frac{1}{\sigma^2_{W_1}},\ldots,\frac{1}{\sigma^2_{W_K}}\Big)$, and (200) is the corresponding Riccati recursion for the MSE.
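A minimal simulation of the recursions (199)-(200), under illustrative parameter values of our own choosing (the stationary initialization assumes $|a| < 1$):

```python
import numpy as np

rng = np.random.default_rng(5)
a, var_v = 0.9, 1.0                      # Gauss-Markov source, as in (2)
var_w = np.array([0.5, 1.0, 2.0])        # K = 3 observation noises, as in (3)
t = 200

p = var_v / (1 - a ** 2)                 # prior variance of X_1 (stationary case)
x = rng.normal(scale=np.sqrt(p))         # source state X_1
x_hat = 0.0                              # \bar X_0: the prior mean
for i in range(t):
    y = x + rng.normal(scale=np.sqrt(var_w))                 # K observations (3)
    p = 1.0 / (1.0 / p + (1.0 / var_w).sum())                # Riccati update (200)
    x_hat = a * x_hat + (p / var_w * (y - a * x_hat)).sum()  # Kalman update (199)
    x = a * x + rng.normal(scale=np.sqrt(var_v))             # source step (2)
    p = a ** 2 * p + var_v               # predicted error variance for time i+1
```

Here the gain multiplying each innovation is exactly $\sigma^2_{X_i|Y^{[K]}_{[i]}}/\sigma^2_{W_k}$, matching the row vector in Remark 1.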
APPENDIX E: TWO EQUIVALENT REPRESENTATIONS OF $R_{\mathrm{rm}}(d)$

In this appendix, we verify that (116) coincides with the lower bound on the causal remote rate-distortion function derived in [22]. Indeed, [22, Cor. 1 and Th. 9] imply

$$R_{\mathrm{rm}}(d) \ge \frac{1}{2}\log\left(a^2 + \frac{\sigma^2_{X\|DY_{[K]}} - \sigma^2_{X\|Y_{[K]}}}{d - \sigma^2_{X\|Y_{[K]}}}\right). \qquad(201)$$

Here, $\sigma^2_{X\|DY_{[K]}} - \sigma^2_{X\|Y_{[K]}}$ is the variance of the innovations of the Gauss-Markov process $\{\bar X_i\}$, i.e.,

$$\bar X_{i+1} = a\bar X_i + \bar V_i, \qquad \bar V_i\sim\mathcal N\big(0,\ \sigma^2_{X\|DY_{[K]}} - \sigma^2_{X\|Y_{[K]}}\big). \qquad(202)$$

The form in (201) leads to that in (116) via (59) and

$$\sigma^2_{X\|DY_{[K]}} = a^2\,\sigma^2_{X\|Y_{[K]}} + \sigma^2_V. \qquad(203)$$

ACKNOWLEDGEMENT

We thank both anonymous reviewers for their insightful and careful reviews, which are reflected in the final version.

REFERENCES

[1] V. Kostina and B. Hassibi, "Fundamental limits of distributed tracking," in Proceedings 2020 IEEE International Symposium on Information Theory, June 2020, pp. 2438-2443.
[2] T. Berger, Z. Zhang, and H. Viswanathan, "The CEO problem [multiterminal source coding]," IEEE Transactions on Information Theory, vol. 42, no. 3, pp. 887-902, 1996.
[3] H. Viswanathan and T. Berger, "The quadratic Gaussian CEO problem," IEEE Transactions on Information Theory, vol. 43, no. 5, pp. 1549-1559, 1997.
[4] Y. Oohama, "The rate-distortion function for the quadratic Gaussian CEO problem," IEEE Transactions on Information Theory, vol. 44, no. 3, pp. 1057-1070, May 1998.
[5] V. Prabhakaran, D. Tse, and K. Ramachandran, "Rate region of the quadratic Gaussian CEO problem," in Proceedings 2004 International Symposium on Information Theory, June 2004, p. 119.
[6] Y. Oohama, "Rate-distortion theory for Gaussian multiterminal source coding systems with several side informations at the decoder," IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 2577-2593, 2005.
[7] J. Chen, X. Zhang, T. Berger, and S. B. Wicker, "An upper bound on the sum-rate distortion function and its corresponding rate allocation schemes for the CEO problem," IEEE Journal on Selected Areas in Communications, vol. 22, no. 6, pp. 977-987, 2004.
[8] H. Behroozi and M. R. Soleymani, "Optimal rate allocation in successively structured Gaussian CEO problem," IEEE Transactions on Wireless Communications, vol. 8, no. 2, pp. 627-632, 2009.
[9] J. Chen and T. Berger, "Successive Wyner-Ziv coding scheme and its application to the quadratic Gaussian CEO problem," IEEE Transactions on Information Theory, vol. 54, no. 4, pp. 1586-1603, 2008.
[10] A. B. Wagner, S. Tavildar, and P. Viswanath, "Rate region of the quadratic Gaussian two-encoder source-coding problem," IEEE Transactions on Information Theory, vol. 54, no. 5, pp. 1938-1961, 2008.
[11] A. B. Wagner and V. Anantharam, "An improved outer bound for multiterminal source coding," IEEE Transactions on Information Theory, vol. 54, no. 5, pp. 1919-1937, 2008.
[12] T. Berger, "Multiterminal source coding," in The Information Theory Approach to Communications. New York: Springer-Verlag, 1978.
[13] S.-Y. Tung, "Multiterminal source coding," Ph.D. dissertation, School of Electrical Engineering, Cornell University, 1978.
[14] J. Wang, J. Chen, and X. Wu, "On the sum rate of Gaussian multiterminal source coding: New proofs and results," IEEE Transactions on Information Theory, vol. 56, no. 8, pp. 3946-3960, 2010.
[15] E. Ekrem and S. Ulukus, "An outer bound for the vector Gaussian CEO problem," IEEE Transactions on Information Theory, vol. 60, no. 11, pp. 6870-6887, 2014.
[16] J. Wang and J. Chen, "Vector Gaussian multiterminal source coding," IEEE Transactions on Information Theory, vol. 60, no. 9, pp. 5533-5552, 2014.
[17] T. A. Courtade and T. Weissman, "Multiterminal source coding under logarithmic loss," IEEE Transactions on Information Theory, vol. 60, no. 1, pp. 740-761, Jan. 2014.
[18] A. Gorbunov and M. S. Pinsker, "Nonanticipatory and prognostic epsilon entropies and message generation rates," Problemy Peredachi Informatsii, vol. 9, no. 3, pp. 12-21, 1973.
[19] A. Gorbunov and M. S. Pinsker, "Prognostic epsilon entropy of a Gaussian message and a Gaussian source," Problemy Peredachi Informatsii, vol. 10, no. 2, pp. 5-25, 1974.
[20] S. Tatikonda, A. Sahai, and S. Mitter, "Stochastic linear control over a communication channel," IEEE Transactions on Automatic Control, vol. 49, no. 9, pp. 1549-1561, Sep. 2004.
[21] E. Silva, M. Derpich, J. Ostergaard, and M. Encina, "A characterization of the minimal average data rate that guarantees a given closed-loop performance level," IEEE Transactions on Automatic Control, vol. 61, no. 8, pp. 2171-2186, Nov. 2016.
[22] V. Kostina and B. Hassibi, "Rate-cost tradeoffs in control," IEEE Transactions on Automatic Control, vol. 64, no. 11, pp. 4525-4540, Apr. 2019.
[23] T. Tanaka, K.-K. K. Kim, P. A. Parrilo, and S. K. Mitter, "Semidefinite programming approach to Gaussian sequential rate-distortion trade-offs," IEEE Transactions on Automatic Control, vol. 62, no. 4, pp. 1896-1910, 2017.
[24] V. Kostina and B. Hassibi, "Rate-cost tradeoffs in scalar LQG control and tracking with side information," in 2018 56th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, Oct. 2018, pp. 421-428.
[25] O. Sabag, P. Tian, V. Kostina, and B. Hassibi, "The minimal directed information needed to improve the LQG cost," in 2020 59th IEEE Conference on Decision and Control (CDC), 2020, pp. 1842-1847.
[26] A. P. Johnston and S. Yüksel, "Stochastic stabilization of partially observed and multi-sensor systems driven by unbounded noise under fixed-rate information constraints," IEEE Transactions on Automatic Control, vol. 59, no. 3, pp. 792-798, 2014.
[27] M. H. Yassaee, M. R. Aref, and A. Gohari, "A technique for deriving one-shot achievability results in network information theory," in Proceedings 2013 IEEE International Symposium on Information Theory, Istanbul, Turkey, July 2013.
[28] M. H. Yassaee, M. R. Aref, and A. Gohari, "A technique for deriving one-shot achievability results in network information theory," 2013.
[29] N. Merhav, "The generalized stochastic likelihood decoder: Random coding and expurgated bounds," IEEE Transactions on Information Theory, vol. 63, no. 8, pp. 5039-5051, 2017.
[30] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.
[31] V. Kostina, "Rate loss in the Gaussian CEO problem," in Proceedings 2019 IEEE Information Theory Workshop, Visby, Gotland, Sweden, Aug. 2019.
[32] J. Østergaard and R. Zamir, "Incremental refinement using a Gaussian test channel," in 2011 IEEE International Symposium on Information Theory Proceedings, 2011, pp. 2233-2237.
[33] G. Kramer, "Directed information for channels with feedback," Ph.D. dissertation, ETH Zurich, Dept. of Electrical Engineering, 1998.
[34] J. Massey, "Causality, feedback and directed information," in Proc. Int. Symp. Inf. Theory Applic. (ISITA-90), Nov. 1990, pp. 303-305.
[35] T. Tanaka, "Semidefinite representation of sequential rate-distortion function for stationary Gauss-Markov processes," in Proceedings 2015 IEEE Conference on Control Applications (CCA), Sep. 2015, pp. 1217-1222.
[36] N. Guo and V. Kostina, "Optimal causal rate-constrained sampling of the Wiener process," in Proceedings 57th Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, Sep. 2019.
[37] O. Sabag, V. Kostina, and B. Hassibi, "Feedback capacity of MIMO Gaussian channels," in Proceedings 2021 IEEE International Symposium on Information Theory, July 2021, pp. 7-12.
[38] O. Sabag, V. Kostina, and B. Hassibi, "Feedback capacity of MIMO Gaussian channels," arXiv preprint arXiv:2106.01994, June 2021.
[39] J. Liu, P. Cuff, and S. Verdú, "On α-decodability and α-likelihood decoder," in Proceedings 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, Oct. 2017, pp. 118-124.
[40] M. S. Derpich and J. Ostergaard, "Improved upper bounds to the causal quadratic rate-distortion function for Gaussian stationary sources," IEEE Transactions on Information Theory, vol. 58, no. 5, pp. 3131-3152, May 2012.
[41] T. Berger, Rate Distortion Theory. Englewood Cliffs, NJ: Prentice-Hall, 1971.
[42] R. Dobrushin and B. Tsybakov, "Information transmission with additional noise," IRE Transactions on Information Theory, vol. 8, no. 5, pp. 293-304, Sep. 1962.
[43] H. S. Witsenhausen, "Indirect rate distortion problems," IEEE Transactions on Information Theory, vol. 26, no. 5, pp. 518-521, Sep. 1980.
[44] V. Kostina and S. Verdú, "Nonasymptotic noisy lossy source coding," IEEE Transactions on Information Theory, vol. 62, no. 11, pp. 6111-6123, Nov. 2016.
[45] Y. Polyanskiy, Lecture Notes on Information Theory, 2012.

Victoria Kostina (S'12-M'14) is a Professor of Electrical Engineering and of Computing and Mathematical Sciences at Caltech.
She received a bachelor's degree from Moscow Institute of Physics and Technology (2004), where she was affiliated with the Institute for Information Transmission Problems of the Russian Academy of Sciences, a master's degree from the University of Ottawa (2006), and a PhD from Princeton University (2013). She received the Natural Sciences and Engineering Research Council of Canada postgraduate scholarship (2009-2012), the Princeton Electrical Engineering Best Dissertation Award (2013), the Simons-Berkeley research fellowship (2015), and the NSF CAREER award (2017). Kostina's research spans information theory, coding, control, learning, and communications.

Babak Hassibi was born in Tehran, Iran, in 1967. He received the B.S. degree from the University of Tehran in 1989, and the M.S. and Ph.D. degrees from Stanford University in 1993 and 1996, respectively, all in electrical engineering. He has been with the California Institute of Technology since January 2001, where he is currently the Mose and Lilian S. Bohn Professor of Electrical Engineering. From 2013 to 2016 he was the Gordon M. Binder/Amgen Professor of Electrical Engineering, and from 2008 to 2015 he was Executive Officer of Electrical Engineering, as well as Associate Director of Information Science and Technology. From October 1996 to October 1998 he was a research associate at the Information Systems Laboratory, Stanford University, and from November 1998 to December 2000 he was a Member of the Technical Staff in the Mathematical Sciences Research Center at Bell Laboratories, Murray Hill, NJ. He has also held short-term appointments at Ricoh California Research Center, the Indian Institute of Science, and Linköping University, Sweden. His research interests include communications and information theory, control and network science, and signal processing and machine learning. He is the coauthor of the books (both with A. H. Sayed and T. Kailath) Indefinite Quadratic Estimation and Control: A Unified Approach to $H^2$ and $H^\infty$ Theories (New York: SIAM, 1999) and Linear Estimation (Englewood Cliffs, NJ: Prentice Hall, 2000). He is a recipient of an Alborz Foundation Fellowship, the 1999 O. Hugo Schuck best paper award of the American Automatic Control Council (with H. Hindi and S. P. Boyd), the 2002 National Science Foundation Career Award, the 2002 Okawa Foundation Research Grant for Information and Telecommunications, the 2003 David and Lucille Packard Fellowship for Science and Engineering, the 2003 Presidential Early Career Award for Scientists and Engineers (PECASE), and the 2009 Al-Marai Award for Innovative Research in Communications, and was a participant in the 2004 National Academy of Engineering "Frontiers in Engineering" program. He has been a Guest Editor for the IEEE Transactions on Information Theory special issue on "space-time transmission, reception, coding and signal processing," was an Associate Editor for Communications of the IEEE Transactions on Information Theory during 2004-2006, and is currently an Editor for the journal Foundations and Trends in Information and Communication and for the IEEE Transactions on Network Science and Engineering. He was an IEEE Information Theory Society Distinguished Lecturer for 2016-2017 and was General Co-Chair of the 2020 IEEE International Symposium on Information Theory (ISIT 2020).