Gossip Algorithms for Distributed Signal Processing
Authors: Alexandros G. Dimakis, Soummya Kar, José M. F. Moura, Michael G. Rabbat, Anna Scaglione
DIMAKIS ET AL.: GOSSIP ALGORITHMS FOR DISTRIBUTED SIGNAL PROCESSING

Gossip Algorithms for Distributed Signal Processing

Alexandros G. Dimakis, Soummya Kar, José M. F. Moura, Michael G. Rabbat, and Anna Scaglione

Abstract

Gossip algorithms are attractive for in-network processing in sensor networks because they do not require any specialized routing, there is no bottleneck or single point of failure, and they are robust to unreliable wireless network conditions. Recently, there has been a surge of activity in the computer science, control, signal processing, and information theory communities, developing faster and more robust gossip algorithms and deriving theoretical performance guarantees. This article presents an overview of recent work in the area. We describe convergence rate results, which are related to the number of transmitted messages and thus the amount of energy consumed in the network for gossiping. We discuss issues related to gossiping over wireless links, including the effects of quantization and noise, and we illustrate the use of gossip algorithms for canonical signal processing tasks including distributed estimation, source localization, and compression.

I. INTRODUCTION

Collaborative in-network processing is a major tenet of wireless sensor networking, and has received much attention from the signal processing, control, and information theory communities during the past decade [1]. Early research in this area considered applications such as detection, classification, tracking, and pursuit [2]–[5]. By exploiting local computation resources at each node, it is possible to reduce the amount of data that needs to be transmitted out of the network, thereby saving bandwidth and energy, extending the network lifetime, and reducing latency.

Manuscript received November 16, 2009; revised March 26, 2010. A. G. Dimakis is with the Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA 90089 USA (e-mail: dimakis@usc.edu). S. Kar and J. M. F. Moura are with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: soummyak@andrew.cmu.edu; moura@ece.cmu.edu). M. G. Rabbat is with the Department of Electrical and Computer Engineering, McGill University, Montréal, QC H3A 2A7 Canada (e-mail: michael.rabbat@mcgill.ca). A. Scaglione is with the Department of Electrical and Computer Engineering, University of California, Davis, CA 95616 USA (e-mail: ascaglione@ucdavis.edu). The work of Kar and Moura was partially supported by the NSF under grants ECS-0225449 and CNS-0428404, and by the Office of Naval Research under MURI N000140710747. The work of Rabbat was partially supported by NSERC under grant RGPIN 341596-2007, by MITACS, and by FQRNT under grant 2009-NC-126057. The work of Scaglione is supported by the NSF under grant CCF-0729074.

In addition to having on-board sensing and processing capabilities, the archetypal sensor network node is battery-powered and uses a wireless radio to communicate with the rest of the network. Since each wireless transmission consumes bandwidth and, on common platforms, also consumes considerably more energy than processing data locally [6], [7], reducing the amount of data transmitted can significantly prolong battery life. In applications where the phenomenon being sensed varies slowly in space, the measurements at nearby sensors will be highly correlated. In-network processing can compress the data to avoid wasting transmissions on redundant information.
In other applications, rather than collecting data from each node, the goal of the system may be to compute a function of the data, such as estimating parameters, fitting a model, or detecting an event. In-network processing can be used to carry out the computation within the network so that, instead of transmitting raw data to a fusion center, only the results of the computation are transmitted to the end-user. In many situations, in-network computation leads to considerable energy savings over the centralized approach [8], [9].

Many previous approaches to in-network processing assume that the network can provide specialized routing services. For example, some schemes require the existence of a cyclic route through the network that passes through every node precisely one time¹ [9]–[11]. Others are based on forming a spanning tree rooted at the fusion center or information sink, and then aggregating data up the tree [8], [12], [13]. Although using a fixed routing scheme is intuitive, there are many drawbacks to this approach in wireless networking scenarios. Aggregating data towards a fusion center at the root of a tree can cause a bottleneck in communications near the root and creates a single point of failure. Moreover, wireless links are unreliable, and in dynamic environments, a significant amount of undesirable overhead traffic may be generated just to establish and maintain routes.

A. Gossip Algorithms for In-Network Processing

This article presents an overview of gossip algorithms and issues related to their use for in-network processing in wireless sensor networks. Gossip algorithms have been widely studied in the computer science community for information dissemination and search [14]–[16]. More recently, they have been developed and studied for information processing in sensor networks. They have the attractive property that no specialized routing is required. Each node begins with a subset of the data in the network.
At each iteration, information is exchanged between a subset of nodes, and then this information is processed by the receiving nodes to compute a local update.

Gossip algorithms for in-network processing have primarily been studied as solutions to consensus problems, which capture the situation where a network of agents must achieve a consistent opinion through local information exchanges with their neighbors. Early work includes that of Tsitsiklis et al. [17], [18]. Consensus problems have arisen in numerous applications including: load balancing [19]; alignment, flocking, and multi-agent collaboration [20], [21]; vehicle formation [22]; tracking and data fusion [23]; and distributed inference [24].

The canonical example of a gossip algorithm for information aggregation is a randomized protocol for distributed averaging. The problem setup is such that each node in an n-node network initially has a scalar measurement value, and the goal is to have every node compute the average of all n initial values, often referred to as the average consensus. In pairwise randomized gossiping [25], each node maintains an estimate of the network average, which it initializes with its own measurement value. Let x(t) denote the vector of estimates of the global average after the t-th gossip round, where x(0) is the vector of initial measurements; that is, x_i(t) is the estimate² at node i after t iterations. In one iteration, a randomly selected pair of neighboring nodes in the network exchange their current estimates, and then update their estimates by setting

  x_i(t+1) = x_j(t+1) = (x_i(t) + x_j(t)) / 2.

¹ This is a Hamiltonian cycle, in graph-theoretic terms.
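The pairwise update above is easy to simulate. The following sketch is illustrative only (the ring topology, node count, seed, and round budget are arbitrary choices, not taken from [25]); it shows the network sum being preserved at every round while all estimates contract towards the average.

```python
import random

def pairwise_gossip(x0, edges, rounds, seed=1):
    """Randomized pairwise gossip: each round, a uniformly random edge
    (i, j) is activated and both endpoints replace their estimates with
    the pairwise average (x_i(t) + x_j(t)) / 2."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        avg = (x[i] + x[j]) / 2.0
        x[i] = avg
        x[j] = avg
    return x

# Ring of 6 nodes; the network sum (hence the average x_ave = 2.5)
# is preserved at every round.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
x_final = pairwise_gossip([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], ring, rounds=10000)
```

With enough rounds on a connected graph, every entry of `x_final` is close to 2.5, matching the convergence claim made below.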
A straightforward analysis of such an algorithm shows that the estimates at each node are guaranteed to converge to the average, x_ave = (1/n) Σ_{i=1}^n x_i(0), as long as the network is connected (information can flow between all pairs of nodes), and as long as each pair of neighboring nodes gossips frequently enough; this is made more precise in Section II below. Note that the primitive described above can be used to compute any function of the form Σ_{i=1}^n f_i(x_i(0)) by properly setting the initial value at each node, and while this is not the most general type of query, many useful computations can be reduced to this form, as will be further highlighted in Sections IV and V.

Gossip algorithms can be classified as being randomized or deterministic. The scheme described above is randomized and asynchronous, since at each iteration a random pair of nodes is active. In deterministic, synchronous gossip algorithms, at each iteration node i updates x_i(t+1) with a convex combination of its own value and the values received from all of its neighbors, e.g., as discussed in [26]. Asynchronous gossip is much better suited to wireless sensor network applications, where synchronization itself is a challenging task. Asynchronous gossip can be implemented using the framework described in [18], [27]. Each node runs an independent Poisson clock, and when node i's clock "ticks", it randomly selects and gossips with one neighbor. In this formulation, denoting the probability that node i chooses a neighbor j by P_{i,j}, conditions for convergence can be expressed directly as properties of these probabilities.

Gossip and consensus algorithms have also been the subject of study within the systems and control community, with a focus on characterizing conditions for convergence and stability of synchronous gossiping, as well as optimization of the algorithm parameters P_{i,j}; see the excellent surveys by Olfati-Saber and Murray [28] and Ren et al.
[29], and references therein.

B. Paper Outline

Our overview of gossip algorithms begins on the theoretical side and progresses towards sensor network applications. Each gossip iteration requires wireless transmission and thus consumes valuable bandwidth and energy resources. Section II discusses techniques for bounding rates of convergence for gossip, and thus the number of transmissions required. Because standard pairwise gossip converges slowly on wireless network topologies, a large body of work has focused on developing faster gossip algorithms for wireless networks, and this work is also described. When transmitting over a wireless channel, one must also consider issues such as noise and coding. Section III discusses the effects of finite transmission rates and quantization on convergence of gossip algorithms. Finally, Section IV illustrates how gossip algorithms can be applied to accomplish distributed signal processing tasks such as distributed estimation and compression.

² Throughout, we will sometimes alternatively refer to the estimates x_i(t) as states, and to nodes as agents.

II. RATES OF CONVERGENCE AND FASTER GOSSIP

Gossip algorithms are iterative, and the number of wireless messages transmitted is proportional to the number of iterations executed. Thus, it is important to characterize the rate of convergence of gossip and to understand what factors influence these rates. This section surveys convergence results, describing the connection between the rate of convergence and the underlying network topology, and then describes developments that have been made in the area of fast gossip algorithms for wireless sensor networks.

A. Analysis of Gossip Algorithms

In pairwise gossip, only two nodes exchange information at each iteration. More generally, a subset of nodes may average their information.
All the gossip algorithms that we will be interested in can be described by an equation of the form

  x(t+1) = W(t) x(t),   (1)

where the W(t) are randomly selected averaging matrices, selected independently across time, and x(t) ∈ R^n is the vector of gossip states after t iterations. When restricted to pairwise averaging algorithms, in each gossip round only the values of two nodes i, j are averaged (as in [25]) and the corresponding W(t) matrices have 1/2 in the coordinates (i,i), (i,j), (j,i), (j,j) and a diagonal identity for every other node. When pairwise gossip is performed on a graph G = (V, E), only the matrices that average nodes that are neighbors on G (i.e., (i,j) ∈ E) are selected with non-zero probability. More generally, we will be interested in matrices that average sets of node values and leave the remaining nodes unchanged. A matrix W(t) acting on a vector x(t) is a set-averaging matrix for a set S of nodes if

  x_i(t+1) = (1/|S|) Σ_{j∈S} x_j(t),  for i ∈ S,   (2)

and x_i(t+1) = x_i(t) for i ∉ S. Such matrices therefore have entry 1/|S| at the coordinates corresponding to the set S and a diagonal identity for all other entries. It is therefore easy to see that all such matrices will have the following properties:

  1ᵀ W(t) = 1ᵀ,  W(t) 1 = 1,   (3)

which respectively ensure that the average is preserved at every iteration, and that 1, the vector of ones, is a fixed point. Further, any set-averaging matrix W is symmetric and doubly stochastic. A matrix is doubly stochastic if its rows sum to unity and its columns also sum to unity, as implied by (3). The well-known Birkhoff–von Neumann theorem states that a matrix is doubly stochastic if and only if it is a convex combination of permutation matrices.
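These algebraic properties are straightforward to verify numerically. The sketch below constructs a set-averaging matrix for an arbitrary example (n = 5, S = {1, 3, 4}; these values are illustrative, not from the text) and checks the properties in (3), along with symmetry, the projection property, and positive semidefiniteness discussed next.

```python
import numpy as np

def set_averaging_matrix(n, S):
    """Set-averaging matrix: entry 1/|S| on every coordinate pair in
    S x S, and a diagonal identity for all nodes outside S."""
    W = np.eye(n)
    S = sorted(S)
    for i in S:
        for j in S:
            W[i, j] = 1.0 / len(S)  # overwrites the diagonal inside S too
    return W

W = set_averaging_matrix(5, {1, 3, 4})
ones = np.ones(5)
```

Checking `W.sum(axis=0)` and `W.sum(axis=1)` confirms double stochasticity, `W @ W == W` confirms the projection property, and the eigenvalues of `W` are all non-negative.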
In the context of gossip, the only permutation matrices which contribute to the convex combination are those which permute nodes in S to other nodes in S, and keep all nodes not in S fixed. The matrix W must also be a projection matrix, i.e., W² = W, since averaging the same set twice no longer changes the vector x(t). It then follows that W must also be positive semidefinite.

We are now ready to understand the evolution of the estimate vector x(t) through the product of these randomly selected set-averaging matrices:

  x(t+1) = W(t) x(t) = ∏_{k=0}^{t} W(k) x(0).   (4)

Since the W(t) are selected independently across time, E[W(t)] = E[W(0)], and we can drop the time index and simply refer to the expected averaging matrix E[W], which is the average of symmetric, doubly stochastic, positive semidefinite matrices and therefore also has these properties. The desired behavior is that x(t+1) → x_ave 1, which is equivalent to asking that

  ∏_{k=0}^{t} W(k) → (1/n) 1 1ᵀ.   (5)

B. Expected Behavior

We start by looking at the expected evolution of the random vector x(t) by taking expectations on both sides of (4):

  E[x(t+1)] = E[ ∏_{k=0}^{t} W(k) ] x(0) = (E[W])^{t+1} x(0),   (6)

where the second equality is true because the matrices are selected independently. Since E[W] is a convex combination of the matrices W(t), which all satisfy the conditions (3), it is clear that E[W] is also a doubly stochastic matrix. We can see that the expected evolution of the estimate vector follows a Markov chain that has the x_ave 1 vector as its stationary distribution. In other words, 1 is an eigenvector of E[W] with eigenvalue 1. Therefore, if the Markov chain corresponding to E[W] is irreducible and aperiodic, our estimate vector will converge in expectation to the desired average. Let λ2(E[W]) be the second largest eigenvalue of E[W].
If condition (3) holds and if λ2(E[W]) < 1, then x(t) converges to x_ave 1 in expectation and in mean square. Further precise conditions for convergence in expectation and in mean square can be found in [30].

C. Convergence Rate

The problem with the expectation analysis is that it gives no estimate of the rate of convergence, a key parameter for applications. Since the algorithms are randomized, we need to specify what we mean by convergence. One notion that yields clean theoretical results involves defining convergence as the first time the normalized error is small with high probability, controlling both error and probability with one parameter ε.

Definition 1 (ε-averaging time T_ave(ε)): Given ε > 0, the ε-averaging time is the earliest gossip round in which the vector x(t) is ε-close to the normalized true average with probability greater than 1 − ε:

  T_ave(ε) = sup_{x(0)} inf { t = 0, 1, 2, ... : P( ||x(t) − x_ave 1|| / ||x(0)|| ≥ ε ) ≤ ε }.   (7)

Observe that the convergence time is defined for the worst case over the initial vector of measurements x(0). This definition was first used in [25] (see also [31] for a related analysis).

The key technical theorem used in the analysis of gossip algorithms is the following connection between the averaging time and the second largest eigenvalue of E[W]:

Theorem 1: For any gossip algorithm that uses set-averaging matrices and converges in expectation, the averaging time is bounded by

  T_ave(ε, E[W]) ≤ 3 log(1/ε) / log( 1/λ2(E[W]) ) ≤ 3 log(1/ε) / (1 − λ2(E[W])).   (8)

This theorem is a slight generalization of Theorem 3 from [25] for non-pairwise averaging gossip algorithms. There is also a lower bound of the same order, which implies that³ T_ave(ε, E[W]) = Θ( log(1/ε) / (1 − λ2(E[W])) ).
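To make Theorem 1 concrete, the sketch below forms E[W] for uniform pairwise gossip on a small ring (an arbitrary example topology, not one from the text), extracts λ2, and evaluates the upper bound in (8).

```python
import numpy as np
from math import log

def expected_pairwise_matrix(n, edges):
    """E[W] for pairwise gossip where each edge (i, j) is activated
    uniformly at random; each W averages nodes i and j and leaves
    every other node unchanged."""
    EW = np.zeros((n, n))
    for i, j in edges:
        W = np.eye(n)
        W[i, i] = W[j, j] = W[i, j] = W[j, i] = 0.5
        EW += W / len(edges)
    return EW

def averaging_time_bound(EW, eps):
    """First upper bound in (8): 3 log(1/eps) / log(1/lambda_2)."""
    lam2 = np.sort(np.linalg.eigvalsh(EW))[-2]  # second largest eigenvalue
    return 3.0 * log(1.0 / eps) / log(1.0 / lam2)

ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # 5-node ring
EW = expected_pairwise_matrix(5, ring)
bound = averaging_time_bound(EW, eps=0.01)
```

Here E[W] inherits double stochasticity from the individual averaging matrices, λ2 lies strictly inside (0, 1) because the ring is connected, and `bound` gives a finite round count guaranteeing ε-accuracy with probability at least 1 − ε.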
The topology of the network influences the convergence time of the gossip algorithm, and using this theorem that influence can be precisely quantified: the matrix E[W] is completely specified by the network topology and the selection probabilities of which nodes gossip. The rate at which the spectral gap 1 − λ2(E[W]) approaches zero, as n increases, controls the ε-averaging time T_ave. The spectral gap is related to the mixing time (see, e.g., [32]) of a random walk on the network topology. Roughly, the gossip averaging time is the mixing time of the simple random walk on the graph times a factor of n. One therefore would like to understand how the spectral gap scales for different models of networks and gossip algorithms.

This was first analyzed for the complete graph and uniform pairwise gossiping [15], [25], [30]. For this case it was shown that λ2(E[W]) = 1 − 1/n and, therefore, T_ave = Θ(n log(1/ε)). Since only nearest neighbors interact, each gossip round costs two transmitted messages, and therefore Θ(n log(1/ε)) gossip messages need to be exchanged to converge to the global average within accuracy ε. This yields Θ(n log n) messages to have a vanishing error with probability 1/n, an excellent performance for a randomized algorithm with no coordination that averages n nodes on the complete graph. For other well-connected graphs (including expanders and small-world graphs), uniform pairwise gossip converges very quickly, asymptotically requiring the same number of messages, Θ(n log(1/ε)), as the complete graph. Note that any algorithm that averages n numbers with constant error and constant probability of success requires Ω(n) messages.

If the network topology is fixed, one can ask what selection of pairwise gossiping probabilities maximizes the convergence rate (i.e., maximizes the spectral gap).
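Before turning to optimized probabilities, the complete-graph scaling above is easy to check numerically. The sketch below computes λ2(E[W]) for uniform pairwise gossip on complete graphs of a few sizes; because the exact constant in the spectral gap depends on the clock model, the check below only verifies the order-wise Θ(1/n) behavior.

```python
import numpy as np

def complete_graph_lambda2(n):
    """Second largest eigenvalue of E[W] for uniform pairwise gossip
    on the complete graph (all n(n-1)/2 pairs equally likely)."""
    EW = np.zeros((n, n))
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for i, j in edges:
        W = np.eye(n)
        W[i, i] = W[j, j] = W[i, j] = W[j, i] = 0.5
        EW += W / len(edges)
    return np.sort(np.linalg.eigvalsh(EW))[-2]

# Spectral gap 1 - lambda_2 should shrink like Theta(1/n).
gaps = {n: 1.0 - complete_graph_lambda2(n) for n in (10, 20, 40)}
```

As n doubles, the computed gap roughly halves, which through Theorem 1 gives the T_ave = Θ(n log(1/ε)) scaling quoted above.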
This problem is equivalent to designing a Markov chain which approaches stationarity optimally fast and, interestingly, it can be formulated as a semidefinite program (SDP) which can be solved efficiently [25], [26], [33]. Unfortunately, for random geometric graphs⁴ and grids, which are the relevant topologies for large wireless ad hoc and sensor networks, even the optimized version of pairwise gossip is extremely wasteful in terms of communication requirements. For example, for a grid topology, the number of required messages scales like Θ(n² log(1/ε)) [25], [35]. Observe that this is of the same order as the energy required for every node to flood its estimate to all other nodes. On the contrary, the obvious solution of averaging numbers on a spanning tree and flooding the average back to all the nodes requires only O(n) messages. Constructing and maintaining a spanning tree in dynamic and ad hoc networks introduces significant overhead and complexity, but a quadratic number of messages is a high price to pay for fault tolerance.

³ Because our primary interest is in understanding scaling laws (how many messages are needed as the network size grows), our discussion centers on the order-wise behavior of gossip algorithms. Recall the Landau or "big O" notation: a function f is asymptotically bounded above by g, written f(n) = O(g(n)), if there exist constants N > 0 and c₁ > 0 such that f(n) ≤ c₁ g(n) for all n ≥ N; f is asymptotically bounded below by g, written f(n) = Ω(g(n)), if there exist constants c₂ > 0 and N > 0 such that f(n) ≥ c₂ g(n) for all n ≥ N; and f is asymptotically bounded above and below by g, written f(n) = Θ(g(n)), if c₂ g(n) ≤ f(n) ≤ c₁ g(n) for all n ≥ N.

D. Faster Gossip Algorithms

Pairwise gossip converges very slowly on grids and random geometric graphs because of its diffusive nature.
Information from nodes is essentially performing random walks, and, as is well known, a random walk on the two-dimensional lattice has to perform d² steps to cover distance d. One approach to gossiping faster is to modify the algorithm so that there is some directionality in the underlying diffusion of information. Assuming that nodes have knowledge of their geographic location, we can use a modified algorithm called geographic gossip [35]. The idea of geographic gossip is to combine gossip with greedy geographic routing towards a randomly selected location. If each node has knowledge of its own location and under some mild assumptions on the network topology, greedy geographic routing can be used to build an overlay network where any pair of nodes can communicate. The overlay network is a complete graph on which pairwise uniform gossip converges in Θ(n log(1/ε)) iterations. At each iteration, we perform greedy routing, which costs Θ(√(n/log n)) messages on a random geometric graph (also the order of the diameter of the network). In total, geographic gossip thus requires Θ(n^{1.5} log(1/ε) / √(log n)) messages. The technical part of the analysis involves understanding how this can be done with only local information: assuming that each node only knows its own location, routing towards a randomly selected location is not identical to routing towards a randomly selected node. If the nodes are evenly spaced, however, these two processes are almost the same and the Θ(n^{1.5}) message scaling still holds [35].

Li and Dai [36], [37] recently proposed Location-Aided Distributed Averaging (LADA), a scheme that uses partial locations and Markov chain lifting to create fast gossiping algorithms. Lifting of gossip algorithms is based on the seminal work of Diaconis et al. [38] and Chen et al. [39] on lifting Markov chain samplers to accelerate convergence rates.
The basic idea is to lift the original chain to one with additional states; in the context of gossiping, this corresponds to replicating each node and associating all replicas of a node with the original. LADA creates one replica of a node for each neighbor and associates the policy of a node, given that it receives a message from a neighbor, with that particular lifted state. In this manner, LADA suppresses the diffusive nature of reversible Markov chains that causes pairwise randomized gossip to be slow. The cluster-based LADA algorithm performs slightly better than geographic gossip, requiring Θ(n^{1.5} log(1/ε) / (log n)^{1.5}) messages for random geometric graphs. While the theoretical machinery is different, LADA algorithms also use directionality to accelerate gossip, but can operate even with partial location information and have smaller total delay compared to geographic gossip, at the cost of a somewhat more complicated algorithm.

⁴ The family of random geometric graphs with n nodes and connectivity radius r, denoted G(n, r), is obtained by placing n nodes uniformly at random in the unit square, and placing an edge between two nodes if their Euclidean distance is no more than r. In order to process data in the entire network, it is important that the network be connected (i.e., there is a route between every pair of nodes). A fundamental result due to Gupta and Kumar [34] states that the critical connectivity threshold for G(n, r) is r_con(n) = Θ(√(log n / n)); that is, if r does not scale as fast as r_con(n), then the network is not connected with high probability, and if r scales at least as fast as r_con(n), then the network is connected with high probability. Throughout this paper, when using random geometric graphs it is implied that we are using G(n, r_con(n)), in order to ensure that information flows across the entire network.
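The G(n, r) construction in the footnote can be sketched in a few lines. The constant factor of 2 above the threshold, the node count, and the seed are arbitrary choices; connectivity here holds with high probability, not deterministically.

```python
import math
import random

def random_geometric_graph(n, r, seed=7):
    """G(n, r): n points uniform on the unit square, with an edge
    between any two points at Euclidean distance at most r."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = pts[i][0] - pts[j][0]
            dy = pts[i][1] - pts[j][1]
            if dx * dx + dy * dy <= r * r:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def is_connected(adj):
    """Depth-first search from node 0; connected iff all nodes reached."""
    seen, stack = {0}, [0]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)

n = 200
r = 2.0 * math.sqrt(math.log(n) / n)  # a constant factor above r_con(n)
adj = random_geometric_graph(n, r)
```

Shrinking r well below √(log n / n) in this sketch quickly produces isolated nodes, matching the Gupta–Kumar threshold behavior.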
A related scheme based on lifting was proposed concurrently by Jung, Shah, and Shin [40]. Mosk-Aoyama and Shah [41] use an algorithm based on the work of Flajolet and Martin [42] to compute averages and bound the averaging time in terms of a "spreading time" associated with the communication graph, with a similar scaling for the number of messages on grids and RGGs.

Just as algorithms based on lifting incorporate additional memory at each node (by way of additional states in the lifted Markov chain), another collection of algorithms seeks to accelerate gossip computations by having nodes remember a few previous state values and incorporate these values into the updates at each iteration. These memory-based schemes can be viewed as predicting the trajectory as seen by each node, and using this prediction to accelerate convergence. The schemes are closely related to shift-register methods studied in numerical analysis to accelerate linear system solvers. The challenge of this approach is to design local predictors that provide speedups without creating instabilities. Empirical evidence that such schemes can accelerate convergence rates is shown in [43], and numerical methods for designing linear prediction filters are presented in [44], [45]. Recent work of Oreshkin et al. [46] shows that improvements in convergence rate on par with those of geographic gossip are achieved by a deterministic, synchronous gossip algorithm using only one extra tap of memory at each node. Extending these theoretical results to asynchronous gossip algorithms remains an open area of research.

The geographic gossip algorithm uses location information to route packets on long paths in the network. One natural extension of the algorithm is to allow all the nodes on the routed path to be averaged jointly. This can be easily performed by aggregating the sum and the hop length while routing.
As long as the information of the average can be routed back on the same path, all the intermediate nodes can replace their estimates with the updated value. This modified algorithm is called geographic gossip with path averaging. It was recently shown [47] that this algorithm converges much faster, requiring only Θ(√n) gossip interactions and Θ(n log(1/ε)) messages, which is clearly minimal.

A related distributed algorithm was introduced by Savas et al. [48], using multiple random walks that merge in the network. The proposed algorithm does not require any location information and uses the minimal number of messages, Θ(n log n), to average on grid topologies with high probability. The coalescence of information reduces the number of nodes that update information, resulting in optimal communication requirements but also less fault tolerance. In most gossip algorithms all nodes keep updating their information which, as we discuss in the next section, adds robustness with respect to changes to the network and noise in communications.

Finally, we note the recent development of schemes that exploit the broadcast nature of wireless communications in order to accelerate gossip rates of convergence [49], [50], either by having all neighbors that overhear a transmission execute a local update, or by having nodes eavesdrop on their neighbors' communication and then use this information to strategically select which neighbor to gossip with next. The next section discusses issues arising when gossiping specifically over wireless networks.

III. RATE LIMITATIONS IN GOSSIP ALGORITHMS

Rate limitations are relevant due to the bandwidth restrictions and the power limitations of nodes.
Finite transmission rates imply that nodes learn of their neighbors' states with finite precision; if the distortion is measured by the MSE, then it is well established that the operational distortion-rate function decays exponentially with the number of bits [51], which implies that the precision doubles for each additional bit of representation. For example, in an AWGN channel with path loss inversely proportional to the distance squared, r², the rate R needs to be below the capacity bound R < C = (1/2) log(1 + γ r⁻²). Then, at a fixed power budget, every bit of additional precision requires approximately shrinking the range by half; i.e., fixing γ, the channel capacity increases as the inter-node distance decreases. For a uniform network deployment, this would reduce the size of each node's neighborhood by about 75%, decreasing the network connectivity and therefore the convergence speed. This simple argument illustrates the importance of understanding whether the performance of gossip algorithms degrades gracefully as the communication rate of each link decreases.

Before summarizing the key findings of selected literature on the subject of average consensus under communication constraints, we explain why some papers care about this issue and some do not.

A. Are Rate Constraints Significant?

In most sensor network architectures today, the overhead of packet headers and reliable communication is so great that using a few bytes to encode the gossip state variables exchanged leads to negligible additional cost, while practically giving a precision that can be seen as infinite. Moreover, we can ignore bit errors in transmissions, which very rarely go undetected thanks to CRC bits. It is natural to ask: why should one bother studying rate constraints at all?
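The precision-versus-range trade-off above can be checked with a quick calculation. The reference SNR γ = 1 and the two ranges below are arbitrary illustrative values: halving r quadruples γ r⁻², which at moderate-to-high SNR adds roughly one bit of capacity.

```python
from math import log2

def capacity_bits(gamma, r):
    """Capacity bound C = (1/2) log2(1 + gamma / r^2), in bits per
    channel use, for an AWGN channel with 1/r^2 path loss."""
    return 0.5 * log2(1.0 + gamma / r ** 2)

gamma = 1.0
c_far = capacity_bits(gamma, 0.2)   # original range
c_near = capacity_bits(gamma, 0.1)  # range halved
extra_bits = c_near - c_far
```

Here `extra_bits` comes out just under 1, illustrating the claim that each additional bit of precision costs roughly a halving of the communication range at fixed power.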
One should bother because existing sensor network modems are optimized to transmit long messages, infrequently, to nearby neighbors, in order to promote spatial bandwidth reuse, and were not designed with decentralized iterative computation in mind. Transmission rates are calculated by amortizing the overhead of establishing the link over the duration of very long transmission sessions. Optimally encoding for computation in general (and for gossiping in particular) is an open problem; very few have treated the subject of communication for computation in an information-theoretic sense (see, e.g., [52], [53]), and consensus gossiping is nearly absent from the landscape of network information theory. This is not an accident. Broken up into parts, consensus gossip contains the elements of complex classical problems in information theory, such as multi-terminal source coding, the two-way channel, the feedback channel, the multiple access of correlated sources, and the relay channel [54]; this is a frightening collection of open questions. However, as the number of possible applications of consensus gossip primitives expands, designing source and channel encoders to solve precisely this class of problems more efficiently, even though perhaps not optimally, is a worthy task. Desired features are efficiency in exchanging, frequently and possibly in an optimal order, a few correlated bits, and exchanging with nodes that are (at least occasionally) very far away, to promote rapid diffusion. Such forms of communication are very important in sensor networks and network control. Even if fundamental limits are hard to derive, there are several heuristics that have been applied to the problem to yield achievable bounds.

Numerous papers have studied the effects of intermittent or lossy links in the context of gossip algorithms (i.i.d. and correlated models, symmetric and asymmetric) [55]–[64].
In these models, lossy links correspond to masking some edges from the topology at each iteration, and, as we have seen above, the topology directly affects the convergence rate. Interestingly, a common thread running through all of the work in this area is that, so long as the network remains connected on average, convergence of gossip algorithms is not affected by lossy or intermittent links, and convergence speeds degrade gracefully. Another widely studied aspect is source coding for average consensus, which we consider next in Section III-B. It is fair to say that, particularly in wireless networks, the problem of channel coding is essentially open, as we will discuss in Section III-C.

B. Quantized consensus

Quantization maps the exchanged state variable $x_j(t)$ onto codes that correspond to discrete points $Q_{t,j}(x_j(t)) = q_j(t) \in \mathcal{Q}_{t,j} \subset \mathbb{R}$. The set $\mathcal{Q}_{t,j}$ is referred to as the code used at time $t$ by node $j$; the points $q_j(t)$ are used to generate an approximation $\hat{x}_j(t)$ of the state $x_j(t) \in \mathbb{R}$ that each node needs to transmit; the quantizer rate, in bits, is $R_{t,j} = \log_2 |\mathcal{Q}_{t,j}|$, where $|\mathcal{A}|$ denotes the cardinality of the set $\mathcal{A}$. Clearly, under the constraints specified previously on the network update matrix $W(t)$, the consensus states $\{c\mathbf{1} : c \in \mathbb{R}\}$ are fixed points. The evolution of the nodes' quantized states is that of an automaton; under asynchronous random exchanges, the network state forms a Markov chain with $\prod_{j=1}^{n} |\mathcal{Q}_{t,j}|$ possible states, in which the consensus states $\{c\mathbf{1} : c \in \mathbb{R}\}$ are absorbing states. The cumulative number of bits that quantized consensus diffuses throughout the network asymptotically is
\[
R^{\infty}_{\mathrm{tot}} = \sum_{t=1}^{\infty} R_{t,\mathrm{tot}} = \sum_{t=1}^{\infty} \sum_{j=1}^{n} R_{t,j}. \tag{9}
\]
The first simple question is: for a fixed uniform quantizer with step size $\Delta$, i.e., $\hat{x}_j(t) = \mathrm{uni}_\Delta(x_j(t)) = \arg\min_{q \in \mathcal{Q}} |x_j(t) - q|$, where $\mathcal{Q} = \{0, \pm\Delta, \pm 2\Delta, \ldots, \pm(2^{R-1}-1)\Delta\}$, do the states $x(t)$ always converge (in a probabilistic sense) to the fixed points $c\mathbf{1}$? The second is: what is the distortion $d\big(\lim_{t\to\infty} x(t), \frac{1}{n}\sum_{i=1}^{n} x_i(0)\,\mathbf{1}\big)$ due to limited $R_{t,i}$ or a total budget $R_{\mathrm{tot}}$? Fig. 1 illustrates the basic answers through numerical simulation. Interestingly, with a synchronous gossip update, quantization introduces new fixed points other than consensus (Fig. 1(a)), while asynchronous gossiping in general reaches consensus, but without guarantees on the location of the outcome (Fig. 1(b)).

Fig. 1. Quantized consensus over a random geometric graph with $n = 50$ nodes, transmission radius $r = 0.3$, and initial states in $[0, 1]$, with uniform quantization with 128 quantization levels: (a) synchronous updates; (b) pairwise exchange.

Kashyap et al. [65] first considered a fixed-code quantized consensus algorithm that preserves the network average at every iteration. In their paper, the authors draw an analogy between quantization and load balancing among processors, which naturally comes with an integer constraint, since the total number of tasks is finite and divisible only by integers (see, e.g., [19], [66], [67]). Distributed policies to attain a balance among loads were previously proposed in [68], [69]. Writing the average as $\frac{1}{n}\sum_{j=1}^{n} x_j(0) = S/n$ and denoting $L \triangleq \lfloor S/n \rfloor$, it is proven in [65] that, under these updates, any algorithm meeting the aforementioned conditions drives every node to either $L$ or $L+1$, thereby approximating the average.
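The qualitative behavior in Fig. 1 is easy to reproduce in simulation; a minimal sketch of asynchronous quantized gossip (random pairwise exchanges on a complete graph; the topology and all constants are illustrative assumptions, not the setup of Fig. 1):

```python
import random

DELTA = 1.0 / 128                       # 128 quantization levels on [0, 1]

def uni(x, delta=DELTA):
    """Uniform quantizer: nearest point of {0, ±delta, ±2*delta, ...}."""
    return round(x / delta) * delta

def quantized_gossip(x0, iters, rng):
    """Asynchronous pairwise gossip where each node of a random pair
    averages the *quantized* states (no dithering)."""
    x = list(x0)
    n = len(x)
    for _ in range(iters):
        i, j = rng.sample(range(n), 2)  # random pair (complete graph)
        avg = 0.5 * (uni(x[i]) + uni(x[j]))
        x[i] = x[j] = avg
    return x

rng = random.Random(0)
x0 = [rng.random() for _ in range(50)]
xf = quantized_gossip(x0, 20000, rng)
spread = max(xf) - min(xf)
drift = abs(sum(xf) / 50 - sum(x0) / 50)
print(spread, drift)  # nodes (nearly) agree, but the agreed value can miss the true average
```

The quantization errors are not average-preserving here, which is exactly why the consensus value drifts away from the initial mean, as discussed in the text.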
The random gossip algorithm analyzed in [70] leads to a similar result, where the final consensus state differs by at most one bin from the true average; the same authors discuss bounds on the rate of convergence in [71]. In these protocols the agents remain uncertain about which interval contains the actual average: the nodes whose final value is $L$ will conclude that the average is in $[L-1, L+1]$, and those who end with $L+1$ will think that the average is in $[L, L+2]$. Benezit et al. [72] proposed a slight modification of the policy, considering a fixed-rate class of quantization strategies based on voting, requiring only 2 bits of memory per agent and attaining a consensus on the interval that contains the actual average. To overcome the fact that not all nodes end up with the same quantized value, a simple variant on the quantized consensus problem that guarantees almost sure convergence to a unique consensus point was proposed concurrently in [73] and [74]. The basic idea is to dither the state variables by adding a uniform random variable $u \sim \mathcal{U}(-\frac{\Delta}{2}, \frac{\Delta}{2})$ prior to quantizing the states, i.e., $\hat{x}_i(t) = \mathrm{uni}_\Delta(x_i(t) + u)$. This modest change enables gossip to converge to a consensus almost surely, as shown in [75], and guarantees that the nodes make exactly the same decision. However, the algorithm can deviate more from the actual average than the quantized consensus policies considered in [65]. The advantage of using a fixed code is its low complexity, but with relatively modest additional cost the performance can improve considerably. Carli et al. [76] noticed that the issue of quantizing for consensus averaging has analogies with the problem of stabilizing a system using quantized feedback [77], which amounts to partitioning the state space into sets whose points can be mapped to an identical feedback control signal.
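A key property behind dithering is the classical fact that uniform dither spanning one quantization bin makes the quantizer output unbiased for any input. A minimal numerical check (the step size and test value are arbitrary choices):

```python
import random

DELTA = 0.1

def uni(x, delta=DELTA):
    """Uniform quantizer with step delta."""
    return round(x / delta) * delta

def dithered(x, rng, delta=DELTA):
    """Dithered quantizer: quantize x plus uniform dither on [-delta/2, delta/2)."""
    u = rng.uniform(-delta / 2, delta / 2)
    return uni(x + u, delta)

rng = random.Random(1)
x = 0.123  # a state value that is not a lattice point
samples = [dithered(x, rng) for _ in range(200000)]
mean = sum(samples) / len(samples)
print(mean)  # ≈ 0.123: dithering makes the quantization error zero-mean
```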
Hence, the authors resorted to control-theoretic tools to infer effective strategies for quantization. In particular, instead of using a static mapping, they model the quantizer $Q_t(x_i(t))$ at each node $i$ as a dynamical system with internal state $\xi_i(t)$, which is coupled with the consensus update through a quantized error variable $q_i(t)$ (see Fig. 2). They study two particular strategies. They refer to the first as the zoom in/zoom out uniform coder/decoder, where they adaptively quantize the state as follows. The node states are defined as
\[
\xi_i(t) = (\hat{x}_{-1,i}(t), f_i(t)). \tag{10}
\]
The quantized feedback and its update are
\[
\hat{x}_i(t) = \hat{x}_{-1,i}(t+1) = \hat{x}_{-1,i}(t) + f_i(t)\, q_i(t), \tag{11}
\]
\[
q_i(t) = \mathrm{uni}_\Delta\!\left(\frac{x_i(t) - \hat{x}_{-1,i}(t)}{f_i(t)}\right), \tag{12}
\]
which is basically a differential encoding, and $f_i(t)$ is the step size, updated according to
\[
f_i(t+1) = \begin{cases} k_{\mathrm{in}}\, f_i(t) & \text{if } |q_i(t)| < 1, \\ k_{\mathrm{out}}\, f_i(t) & \text{if } |q_i(t)| = 1, \end{cases} \tag{13}
\]
which allows the encoder to adaptively zoom in and out, depending on the range of $q_i(t)$.

Fig. 2. Quantized consensus: node $i$ encoder and node $j$ decoder, with memory.

The second strategy has the same node states but uses a logarithmic quantizer,
\[
\hat{x}_i(t) = \xi_i(t+1) = \xi_i(t) + q_i(t), \tag{14}
\]
\[
q_i(t) = \log_\delta(x_i(t) - \xi_i(t)), \tag{15}
\]
where the logarithmic quantization amounts to
\[
q_i(t) = \mathrm{sign}(x_i(t) - \xi_i(t)) \left(\frac{1+\delta}{1-\delta}\right)^{\ell_i(t)}, \tag{16}
\]
with $\ell_i(t)$ chosen such that
\[
\frac{1}{1+\delta} \;\le\; |x_i(t) - \xi_i(t)| \left(\frac{1+\delta}{1-\delta}\right)^{-\ell_i(t)} \;\le\; \frac{1}{1-\delta}.
\]
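A sketch of the zoom in/zoom out coder of (10)-(13) tracking a slowly varying scalar; the constants $k_{\mathrm{in}}$, $k_{\mathrm{out}}$, $\Delta$, and the test signal are illustrative assumptions, not values from [76]. Because encoder and decoder run the same update driven only by the transmitted symbol $q$, their reconstructions stay exactly synchronized:

```python
import math

def zoom_coder_step(x, xhat, f, delta=0.25, k_in=0.5, k_out=2.0):
    """One step of the zoom in/zoom out differential coder.
    Returns the transmitted symbol q and the updated (xhat, f)."""
    e = (x - xhat) / f                           # normalized prediction error
    q = max(-1.0, min(1.0, round(e / delta) * delta))  # (12), saturating at ±1
    xhat_next = xhat + f * q                     # (11): differential reconstruction
    f_next = k_in * f if abs(q) < 1 else k_out * f     # (13): zoom in / zoom out
    return q, xhat_next, f_next

xhat_enc = xhat_dec = 0.0
f_enc = f_dec = 1.0
errs = []
for t in range(200):
    x = math.sin(0.05 * t)                       # slowly varying state to track
    q, xhat_enc, f_enc = zoom_coder_step(x, xhat_enc, f_enc)
    # the decoder applies the same update using only the received q
    xhat_dec = xhat_dec + f_dec * q
    f_dec = 0.5 * f_dec if abs(q) < 1 else 2.0 * f_dec
    errs.append(abs(x - xhat_enc))
print(max(errs[50:]))  # the tracking error stays bounded after the initial transient
```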
In [76], numerical results are provided for the convergence of the zoom-in/out quantizer, while the properties of the logarithmic quantizer are studied analytically. Remarkably, the authors prove that if the state average is preserved and if $0 < \delta < \frac{1+\lambda_{\min}(W)}{3-\lambda_{\min}(W)}$, then the network asymptotically reaches exactly the same state as the unquantized average consensus; in other words, for all $i$, $\lim_{t\to\infty} x_i(t) = \frac{1}{n}\sum_{i=1}^{n} x_i(0)$. One needs to observe that the logarithmic quantizer replaces state values in an uncountable set $\mathbb{R}$ with discrete countable outputs $\ell \in \mathbb{N}$ in the most efficient way [77], but there are still infinitely many such sets; in other words, the logarithmic quantizer has unlimited range and therefore $R_{t,j} = \infty$. Hence, in practice, one will have to accept a penalty in accuracy when its range is limited. The vast signal processing literature on sampling and quantization can obviously be applied to the consensus problem as well to find heuristics. It is not hard to recognize that the quantizers analyzed in [76] are equivalent to predictive quantizers. Noting that the states are both temporally and spatially correlated, it is clear that encoding using side information available at both transmitter and receiver can yield improved performance and lower cost; this is the tenet of the work in [78], [79], which analyzed a more general class of quantizers. They can be captured in a framework similar to that of [76] by adding an auxiliary state variable $\zeta_{ij}(t)$, which affects the state of the decoder only (see the decoder in Fig. 3).

Fig. 3. Node $j$ decoder, with memory and side information.
The idea is similar, since $\hat{x}_{-1,i}(t+1)$ in (11) is replaced in [79] by the optimum linear minimum mean-squared error prediction, performed using the $k$ previous states: $\hat{x}_{-1,i}(t) = \sum_{l=1}^{k} a_{i,k}(l)\,\hat{x}_i(t-l)$. Similarly, the receiver state is introduced to exploit the idea of coding with side information [80], where the side information about $x_i(t)$ available at, say, receiver $j$ consists of the receiver's present state $x_j(t)$, as well as possibly its own past states and those of neighbors in communication with node $j$. The augmented decoder state $(\zeta_{ij}(t), \xi_i(t))$ in [79] is useful to reap the benefits of the refinement in the prediction of $x_i(t)$ that the decoder can obtain using its own side information. This prediction $\hat{x}_{-1,ij}(t)$ can approximate the true state $x_i(t)$ more closely than the transmitter's $\hat{x}_{-1,i}(t)$, and this, in turn, means that (12) can be replaced by a nested quantizer, such as, for example, the nested lattice quantizers in [81]. In practice, to keep the complexity at bay, one can use a static nested lattice quantizer at the transmitter without any memory, while using the current local state as the node-$j$ decoder state, i.e., $\zeta_{ij}(t) = x_j(t)$. The main analytical result in [79] is the conclusion that, even with the lowest complexity (i.e., prediction memory $k = 1$ only, or $\zeta_{ij}(t) = x_j(t)$ and no memory), a finite $R^{\infty}_{\mathrm{tot}} < \infty$ suffices to guarantee that the network reaches consensus with a bounded error $d\big(\lim_{t\to\infty} x(t), \frac{1}{n}\sum_{i=1}^{n} x_i(0)\,\mathbf{1}\big) \le D^{\infty}_{\mathrm{tot}}$ that decreases as a function of $R^{\infty}_{\mathrm{tot}}$. This is useful to establish, since one may argue that, as long as the network is finite, flooding each value from each node, rather than gossiping, would require a total cost in terms of transmitted bits that is finite, and that this cost can also be reduced via optimal joint source and network coding methods.
It is meaningful to ask whether gossiping can also lead to a similar rate-distortion tradeoff, and the result in [79] suggests that this is, indeed, the case. Recent work has begun to investigate information-theoretic performance bounds for gossip. These bounds characterize the rate-distortion tradeoff either (i) as a function of the underlying network topology, assuming that each link has a finite capacity [82], or (ii) as a function of the rate of the information source providing new measurements to each sensor [83].

C. Wireless channel coding for average consensus

Quantization provides a source code, but equally important is the channel code that is paired with it. First, using the separation of source and channel coding in wireless networks is not optimal in general. Second, and more intuitively, in a wireless network there is a variety of rates that can be achieved by a variety of nodes under different traffic conditions. The two key elements that determine what communications can take place are scheduling and channel coding. Theoretically, there is no fixed-range communication; any range can be reached, albeit with lower capacity. Also, there is no such thing as a collision; rather, there is a tradeoff among the rates at which multiple users can simultaneously access the channel. The computational codes proposed in [84] aim to strike a near-optimal tradeoff for each gossip iteration by utilizing the additive-noise multiple access channel as a tool for directly computing the average of the neighborhood. The idea advocated by the authors echoes their previous work [53]: nodes send lattice codes that, when added through the channel, result in a lattice point that encodes a specific algebraic sum of the inputs.
Owing to the algebraic structure of the channel codes and the linearity of the channel, each recipient directly decodes the linear combination of the neighbors' states, which provides a new estimate of the network average when added to the local state. The only drawbacks of this approach are that 1) it requires channel state information at the transmitter, and 2) only one recipient can be targeted at a time. The scenario considered is closer to that in [83], since a stream of data needs to be averaged and a finite round is dedicated to each input. The key result proven is that the number of rounds of gossip grows as $O(\log n^2/r^2)$, where $r$ is the radius of the neighborhood.

IV. SENSOR NETWORK APPLICATIONS OF GOSSIP

This section illustrates how gossip algorithms can be applied to solve representative problems in wireless sensor networks. Of course, gossip algorithms are not suited to all distributed signal processing tasks. They have proven useful, so far, for problems that involve computing functions that are linear combinations of data or statistics at each node. Two straightforward applications arise from distributed inference and distributed detection. When sensors make conditionally independent observations, the log-likelihood function conditioned on a hypothesis $H_j$ is simply the sum of local log-likelihood functions, $\sum_{i=1}^{n} \log p(x_i \mid H_j)$, and so gossip can be used for distributed detection (see also [85], [86]). Similarly, if sensor readings can be modeled as i.i.d. Gaussian with unknown mean, distributed inference of the mean boils down to computing the average of the sensor measurements, and again gossip can be applied. Early papers that made a broader connection are those of Saligrama et al. [86] and Moallemi and Van Roy [87], which both discuss connections between gossip algorithms and belief propagation.

Below we consider three additional example applications. Section IV-A describes a gossip algorithm for distributed linear parameter estimation that uses stochastic approximation to overcome quantization noise effects. Sections IV-B and IV-C illustrate how gossip can be used for distributed source localization and distributed compression, respectively. We also note that gossip algorithms have recently been applied to problems in camera networks for distributed pose estimation [88], [89].

A. Robust Gossip for Distributed Linear Parameter Estimation

The present section focuses on robust gossiping for distributed linear parameter estimation of a vector of parameters with low-dimensional observations at each sensor. We describe the common assumptions on sensing, the network topology, and the gossiping protocols. Although we focus on estimation, the formulation is quite general and applies to many inference problems, including distributed detection and distributed localization.

1) Sensing/Observation Model: Let $\theta \in \mathbb{R}^{m \times 1}$ be an $m$-dimensional parameter that is to be estimated by a network of $n$ sensors. We refer to $\theta$ as a parameter, although it is a vector of $m$ parameters. For definiteness, we assume the following observation model for the $i$-th sensor:
\[
z_i(t) = H_i \theta + w_i(t), \tag{17}
\]
where $\{z_i(t) \in \mathbb{R}^{m_i \times 1}\}_{t \ge 0}$ is the i.i.d. observation sequence for the $i$-th sensor, and $\{w_i(t)\}_{t \ge 0}$ is a zero-mean i.i.d. noise sequence of bounded variance. For most practical sensor network applications, each sensor observes only a subset of $m_i$ of the components of $\theta$, with $m_i \ll m$. Under such conditions, in isolation, each sensor can estimate at most only a part of the parameter. Since we are interested in obtaining a consistent estimate of the entire parameter $\theta$ at each sensor, we need some type of observability condition. We assume that the matrix
\[
\sum_{i=1}^{n} H_i^T H_i \tag{18}
\]
is full rank.
Note that this invertibility is required even by a centralized estimator (one with access to data from all sensors at all times) to obtain a consistent estimate of $\theta$. It turns out that, under reasonable assumptions on the network connectivity, this necessary condition for centralized observability is also sufficient for distributed observability, i.e., for each sensor to obtain a consistent estimate of $\theta$. It is not necessary to restrict attention to time-invariant observation matrices: the $H_i$'s can be random and time-varying [90], as would be required in most regression-based analyses. In general, the observations need not come from a linear statistical model and may follow distributions parameterized by $\theta$. Distributed observability would then correspond to asymptotic distinguishability of the collection of these distributions over the network. A generic formulation in such a setting requires the notion of separably estimable observation models (see [91]). An equivalent formulation of the estimation problem in the setting considered above comes from the distributed least mean square (LMS) adaptive filtering framework [92]–[94]. The objective there is slightly different. While we are interested in consistent estimates of the entire parameter at each sensor, the LMS formulations require adapting, in a distributed way, to the environment to produce a desired response at each sensor, and the observability issue is not of primary importance. A generic framework for distributed estimation, both in the static parameter case and when the parameter is non-stationary, is addressed in [95]. An important aspect of algorithm design in these cases is the choice of the inter-sensor weight sequence for fusing data or estimates.
In the static parameter case, where the objective is to drive all the sensors to the true parameter value, the weight sequence necessarily decays over time to overcome the accumulation of observation and other forms of noise, whereas in the dynamic parameter estimation case the weight sequence must remain bounded away from zero, so that the algorithm retains tracking ability. We direct the reader to the recent article [96] for a discussion along these lines. In the dynamic case, we also point to the significant literature on distributed Kalman filtering (see, e.g., [97]–[101] and the references therein), where the objective is not consensus seeking among the local estimates but, in general, optimizing fusion strategies to minimize the mean-squared error at each sensor. It is important to note here that average consensus is a specific case of a distributed parameter estimation model in which each sensor initially takes a single measurement, and sensing of the field is not required thereafter for the duration of the gossip algorithm. Several distributed inference protocols (for example, [24], [102], [103]) are based on this approach, where either the sensors take a single snapshot of the field at the start and then initiate distributed consensus protocols (or, more generally, distributed optimization, as in [103]) to fuse the initial estimates, or the observation rate of the sensors is assumed to be much slower than the inter-sensor communication rate, thus permitting a separation of the two time scales.

2) Distributed linear parameter estimation: We now briefly discuss distributed parameter estimation in the linear observation model (17). Starting from an initial deterministic estimate of the parameters (the initial states may be random; we assume they are deterministic for notational simplicity), $x_i(0) \in \mathbb{R}^{m \times 1}$, each sensor generates, by a distributed iterative algorithm, a sequence of estimates $\{x_i(t)\}_{t \ge 0}$.
To simplify the discussion in this section, we assume a synchronous update model in which all nodes exchange information and update their local estimates at each iteration. The parameter estimate $x_i(t+1)$ at the $i$-th sensor at time $t+1$ is a function of: 1) its previous estimate; 2) the quantized estimates communicated at time $t$ by its neighboring sensors; and 3) the new observation $z_i(t)$. The data is subtractively dithered and quantized, i.e., there exist a vector quantizer $Q(\cdot)$ and a family $\{\nu_{ij}^l(t)\}$ of i.i.d. random variables, uniformly distributed on $[-\Delta/2, \Delta/2)$, such that the quantized data received by the $i$-th sensor from the $j$-th sensor at time $t$ is $Q(x_j(t) + \nu_{ij}(t))$, where $\nu_{ij}(t) = [\nu_{ij}^1(t), \cdots, \nu_{ij}^m(t)]^T$. It then follows that the quantization error $\varepsilon_{ij}(t) \in \mathbb{R}^{m \times 1}$ is a random vector whose components are i.i.d. uniform on $[-\Delta/2, \Delta/2)$ and independent of $x_j(t)$.

3) Stochastic approximation algorithm: Let $\mathcal{N}_i(t)$ denote the neighbors of node $i$ at iteration $t$; that is, $j \in \mathcal{N}_i(t)$ if $i$ can receive a transmission from $j$ at time $t$. In this manner, we allow the connectivity of the network to vary with time. Based on the current state $x_i(t)$, the quantized exchanged data $\{Q(x_j(t) + \nu_{ij}(t))\}_{j \in \mathcal{N}_i(t)}$, and the observation $z_i(t)$, the updated estimate at node $i$ is
\[
x_i(t+1) = x_i(t) - \alpha(t) \Big[\, b \sum_{j \in \mathcal{N}_i(t)} \big( x_i(t) - Q(x_j(t) + \nu_{ij}(t)) \big) - H_i^T \big( z_i(t) - H_i x_i(t) \big) \Big]. \tag{19}
\]
In (19), $b > 0$ is a constant and $\{\alpha(t)\}_{t \ge 0}$ is a sequence of weights satisfying the persistence condition⁵:
\[
\alpha(t) \ge 0, \quad \sum_t \alpha(t) = \infty, \quad \sum_t \alpha^2(t) < \infty. \tag{20}
\]
Algorithm (19) is distributed because the update at sensor $i$ involves only data from the sensors in its neighborhood $\mathcal{N}_i(t)$.
The following result from [91] characterizes the desired statistical properties of the distributed parameter estimation algorithm just described. The flavor of these results is common to other stochastic approximation algorithms [104]. First, we have a law-of-large-numbers-like result, which guarantees that the estimates at each node converge to the true parameter:
\[
\mathbb{P}\left( \lim_{t\to\infty} x_i(t) = \theta \right) = 1, \quad \forall i. \tag{21}
\]
If, in addition to the conditions mentioned above, the weight sequence is taken to be
\[
\alpha(t) = \frac{a}{t+1} \tag{22}
\]
for some constant $a > 0$, we also obtain a central-limit-theorem-like result describing the distribution of the estimation error over time. Specifically, for $a$ sufficiently large, the error $\sqrt{t}\left(x(t) - \mathbf{1} \otimes \theta\right)$ converges in distribution to a zero-mean multivariate normal with a covariance matrix that depends on the observation matrices, the quantization parameters, the variance of the measurement noise $w_i(t)$, and the constants $a$ and $b$. The two most common techniques for analyzing stochastic approximation algorithms are stochastic Lyapunov functions and the ordinary differential equation method [104]. For the distributed estimation algorithm (19), the results just mentioned can be derived using the Lyapunov approach [91], [105]. Performance analysis of the algorithm for an example network is illustrated in Figure 4. An example network of $n = 45$ sensors is deployed randomly on a $25 \times 25$ grid, where sensors communicate within a fixed radius and are further constrained to have a maximum of 6 neighbors per node. The true parameter is $\theta^* \in \mathbb{R}^{45}$, with each node associated with a single component of $\theta^*$. For the experiment, each component of $\theta^*$ is generated by an instantiation of a zero-mean Gaussian random variable of variance 25.
(Note that the parameter $\theta^*$ here has a physical significance and may represent the state of the field to be estimated. In this example, the field is assumed to be white and stationary; hence each sample of the field has the same Gaussian distribution and is independent of the others. More generally, the components of $\theta^*$ may correspond to random field samples, as dictated by the sensor deployment, representing a discretization of the PDE governing the field.) Each sensor observes the corresponding field component in additive Gaussian noise; for example, sensor 1 observes $z_1(t) = \theta^*_1 + w_1(t)$, where $w_1(t) \sim \mathcal{N}(0, 1)$. Clearly, such a model satisfies the distributed observability condition
\[
G = \sum_i H_i^T H_i = I = G^{-1} \tag{23}
\]
(note that here $H_i = e_i^T$, where $e_i$ is the standard unit vector with a 1 at the $i$-th component and zeros elsewhere). Fig. 4(a) shows the network topology, and Fig. 4(b) shows the normalized error of each sensor plotted against the iteration index $t$ for an instantiation of the algorithm. The normalized error for the $i$-th sensor at time $t$ is given by the quantity $\| x_i(t) - \theta^* \| / 45$, i.e., the estimation error normalized by the dimension of $\theta^*$. We note that the errors converge to zero, as established by the theoretical findings. The decrease is rapid at the beginning and slows down as $t$ increases.

⁵We need the $\alpha(t)$ to sum to infinity, so that the algorithm 'persists' and does not stop; on the other hand, the $\alpha$ sequence should be square summable to prevent the buildup of noise over time.

Fig. 4. Illustration of distributed linear parameter estimation: (a) example network deployment of 45 nodes; (b) convergence of the normalized estimation error at each sensor.
This is a standard property of stochastic-approximation-based algorithms and is attributed to the decreasing weight sequence $\alpha(t)$ required for convergence. It is interesting to note that, although the individual sensors suffer from low-rank observations of the true parameter, by collaborating each of them can reconstruct the true parameter value. The asymptotic normality shows that the estimation error at each sensor decays as $1/\sqrt{t}$, a decay rate similar to that of a centralized estimator having access to all the sensor observations at all times. The efficiency of the distributed estimator is measured in terms of its asymptotic variance, the lower limit being the Fisher information rate of the corresponding centralized estimator. As expected, because of the distributed nature of the protocol (information needs to disseminate across the entire network) and the quantized (noisy) inter-sensor communication, the achieved asymptotic variance is larger than the centralized Fisher information rate. In the absence of quantization (perfect communication), it can be shown that the parameter $a$ in (22) can be designed appropriately so that the asymptotic variance of the decentralized estimator matches the centralized Fisher information rate, showing that the distributed estimator described above is efficient for the centralized estimation problem. An example of interest, with Gaussian observation noise, is studied in [91], where it is shown that the asymptotic variance attainable by the distributed algorithm is the same as that of the optimum (in the sense of Cramér-Rao) centralized estimator having access to all information simultaneously. This is an interesting result, as it holds irrespective of the network topology.
Such a phenomenon is attributed to a time-scale separation between the consensus potential and the innovation rate (the rate at which new information enters the network) when inter-sensor communication is unquantized (perfect), with possible link failures. As noted before, the observation model need not be linear for distributed parameter estimation. In [91], a large class of nonlinear observation models is considered and a notion of distributed nonlinear observability, called separably estimable observation models, is introduced. Under the separably estimable condition, there exist local transforms under which the updates can be made linear. However, such a state transformation induces different time scales on the consensus potential and the innovation update, giving the algorithm a mixed time-scale behavior (see [91], [106] for details). This mixed time-scale behavior and the effect of biased perturbations make standard stochastic approximation techniques inapplicable.

B. Source Localization

A canonical problem, encompassing many of the challenges that commonly arise in wireless sensor network applications, is that of estimating the location of an energy-emitting source [1]. Patwari et al. [107] present an excellent overview of the many approaches that have been developed for this problem. The aim in this section is to illustrate how gossip algorithms can be used for source localization using received signal strength (RSS) measurements. Let $\theta \in \mathbb{R}^2$ denote the coordinates of the unknown source, and, for $i = 1, \ldots, n$, let $y_i \in \mathbb{R}^2$ denote the location of the $i$-th sensor. The RSS measurement at node $i$ is modeled as
\[
f_i = \frac{\alpha}{\| y_i - \theta \|^\beta} + w_i, \tag{24}
\]
where $\alpha > 0$ is the signal strength emitted at the source, $\beta$ is the path-loss coefficient, and $w_i$ is additive white Gaussian noise. Typical values of $\beta$ are between 2 and 4. This model was validated experimentally in [108].
Centralized maximum likelihood estimators for single- and multiple-source scenarios based on this model are presented in [109] and [110]. Because the maximum likelihood problem is, in general, nonlinear and non-convex, it is challenging to solve in a decentralized fashion. Distributed approaches based on finding a cyclic route through the network are presented in [9], [10]. An alternative approach, using gossip algorithms [111], forms a location estimate $\hat{\theta}$ by taking a linear combination of the sensor locations weighted by a function of their RSS measurements,
\[
\hat{\theta} = \frac{\sum_{i=1}^{n} y_i K(f_i)}{\sum_{i=1}^{n} K(f_i)}, \tag{25}
\]
where $K : \mathbb{R}^+ \to \mathbb{R}^+$ is a monotone increasing function satisfying $K(0) = 0$ and $\lim_{f \to \infty} K(f) < \infty$. Intuitively, nodes that are close to the source measure high RSS values, so their locations should be given more weight than nodes that are further away, which measure lower RSS values. Taking $K(f) = \mathbb{1}_{\{f \ge \gamma\}}$, where $\gamma > 0$ is a positive threshold and $\mathbb{1}_{\{\cdot\}}$ is the indicator function, (25) reduces to
\[
\hat{\theta}_1 = \frac{\sum_{i=1}^{n} y_i \mathbb{1}_{\{\| y_i - \theta \| \le \gamma^{-1/\beta}\}}}{\sum_{i=1}^{n} \mathbb{1}_{\{\| y_i - \theta \| \le \gamma^{-1/\beta}\}}}, \tag{26}
\]
which is simply the centroid of the locations of sensors that are no further than $\gamma^{-1/\beta}$ from the source. In [111], it was shown that this estimator enjoys some attractive properties. First, if the sensor locations $y_i$ are modeled as uniform and random over the region being sensed, then $\hat{\theta}_1$ is a consistent estimator as the number of sensors grows. It is interesting to note that one does not necessarily need to know the parameters $\alpha$ or $\beta$ precisely to implement this estimator. In particular, because (25) is self-normalizing, the estimator automatically adapts to the source signal strength $\alpha$. In addition, [111] shows that this estimator is robust to the choice of $\gamma$.
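The estimator (25) can be computed fully distributedly, since its numerator and denominator are both network averages. A minimal sketch under illustrative assumptions (uniform node placement on the unit square, indicator kernel, pairwise gossip on a complete graph; all constants are made up):

```python
import random

def rss_localization(seed=0, n=200, alpha=10.0, beta=2.0, gamma=250.0, iters=20000):
    """Gossip-based source localization: two parallel pairwise-gossip averages
    compute the numerator and denominator of the estimator in eq. (25)."""
    rng = random.Random(seed)
    theta = (0.6, 0.4)                        # unknown source (ground truth)
    ys = [(rng.random(), rng.random()) for _ in range(n)]
    def rss(y):                               # measurement model, eq. (24)
        d2 = (y[0] - theta[0]) ** 2 + (y[1] - theta[1]) ** 2
        return alpha / d2 ** (beta / 2) + rng.gauss(0.0, 0.1)
    K = lambda f: 1.0 if f >= gamma else 0.0  # indicator kernel -> centroid (26)
    f = [rss(y) for y in ys]
    xN = [(y[0] * K(fi), y[1] * K(fi)) for y, fi in zip(ys, f)]
    xD = [K(fi) for fi in f]
    for _ in range(iters):                    # randomized pairwise gossip
        i, j = rng.sample(range(n), 2)
        xN[i] = xN[j] = ((xN[i][0] + xN[j][0]) / 2, (xN[i][1] + xN[j][1]) / 2)
        xD[i] = xD[j] = (xD[i] + xD[j]) / 2
    est = (xN[0][0] / xD[0], xN[0][1] / xD[0])  # node 0's local estimate
    return est, theta

est, theta = rss_localization()
err = ((est[0] - theta[0]) ** 2 + (est[1] - theta[1]) ** 2) ** 0.5
print(err)  # the local estimate lands close to the true source
```

With $\alpha = 10$, $\beta = 2$, and $\gamma = 250$, only nodes within roughly $(\alpha/\gamma)^{1/\beta} = 0.2$ of the source pass the threshold, so every node ends up with (approximately) the centroid of that disk.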
In particular, even if β is not known precisely, the performance of (25) degrades gracefully. On the other hand, the maximum likelihood approach is very sensitive to model mismatch, and estimating α and β can be challenging. Note that (25) is a ratio of linear functions of the measurements at each node. To compute (25), we run two parallel instances of gossip over the network, one each for the numerator and the denominator. If each node initializes x_i^N(0) = y_i K(f_i) and x_i^D(0) = K(f_i), then executing gossip iterations will cause the values at each node to converge to lim_{t→∞} x_i^N(t) = (1/n) Σ_{j=1}^{n} y_j K(f_j) and lim_{t→∞} x_i^D(t) = (1/n) Σ_{j=1}^{n} K(f_j). Of course, in a practical implementation one would stop gossiping after a fixed number of iterations, t_stop, which depends on the desired accuracy and network topology. Then, each node can locally compute the estimate x_i^N(t_stop)/x_i^D(t_stop) of the source's location. Note that throughout this section it was assumed that each node knows its own location. This can also be accomplished using a gossip-style algorithm, as described in [112].

C. Distributed Compression and Field Estimation

Extracting information in an energy-efficient and communication-efficient manner is a fundamental challenge in wireless sensor network systems. In many cases, users are interested in gathering data to see an "image" of activity or sensed values over the entire region. Let f_i ∈ R denote the measurement at node i, and let f ∈ R^n denote the network signal obtained by stacking these values into a vector. Having each sensor transmit f_i directly to an information sink is inefficient in many situations. In particular, when the values at different nodes are correlated or the signal is compressible, one can transmit less data without losing the salient information in the signal.
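The two parallel gossip instances described above can be sketched as follows. This is a minimal simulation, not the authors' implementation: the fully connected pairing, the indicator kernel, and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup (all values are assumptions, not from the article)
n, alpha, beta, gamma = 200, 10.0, 2.0, 50.0
theta = np.array([0.5, 0.5])                       # true source location
y = rng.uniform(0.0, 1.0, size=(n, 2))             # sensor locations
dist = np.linalg.norm(y - theta, axis=1)
f = alpha / dist**beta + rng.normal(0.0, 0.01, n)  # RSS model (24)

K = (f >= gamma).astype(float)   # indicator kernel K(f) = 1{f >= gamma}

# Two parallel gossip instances for the numerator and denominator of (25)
xN = y * K[:, None]              # x_i^N(0) = y_i K(f_i)
xD = K.copy()                    # x_i^D(0) = K(f_i)

# Pairwise randomized gossip; for simplicity any two nodes may exchange
for _ in range(20000):
    i, j = rng.choice(n, size=2, replace=False)
    xN[i] = xN[j] = (xN[i] + xN[j]) / 2
    xD[i] = xD[j] = (xD[i] + xD[j]) / 2

# After enough iterations every node holds (nearly) the network averages,
# so each node can form the same location estimate locally
est = xN[0] / xD[0]
```

Because (25) is a ratio of two averages, the per-node ratio x_i^N/x_i^D is invariant to the 1/n normalization that gossip introduces, which is why no node needs to know n.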
Distributed source coding approaches attempt to reduce the total number of bits transmitted by leveraging the celebrated results of Slepian and Wolf [113] to code with side information [114], [115]. These approaches make assumptions about statistical characteristics of the underlying data distribution that may be difficult to verify in practice. An alternative approach is based on linear transform coding, gossip algorithms, and compressive sensing. It has been observed that many natural signals are compressible under some linear transformation.

Fig. 5. Example illustrating compression of a smooth signal. Panel (a) shows the original smooth signal, which is sampled at 500 random node locations; nodes are connected as in a random geometric graph. Panel (b) illustrates the m-term approximation error decay in both the original basis and using the eigenvectors of the graph Laplacian as a transform, which is analogous to taking a Fourier transform of signals supported on the network. Panel (c) illustrates the reconstruction error after gossiping on random linear combinations of the sensor measurements and reconstructing using compressed sensing techniques. Note that using more random linear projections (larger k) gives lower error, but the number of projections used is much smaller than the network size.

That is, although f
may have energy in all locations (i.e., f_i > 0 for all i), there is a linear basis transformation matrix T ∈ R^{n×n} such that when f is represented in terms of the basis T by computing θ = T f, the transformed signal θ is compressible (i.e., θ_j ≈ 0 for many j). For example, it is well known that smooth one-dimensional signals are well approximated using the Fourier basis, and piecewise smooth signals with smooth boundaries (a reasonable model for images) are well approximated using wavelet bases [116]. To formally capture the notion of compressibility using ideas from the theory of nonlinear approximation [117], we reorder the coefficients θ_j in order of decreasing magnitude,

    |\theta_{(1)}| \ge |\theta_{(2)}| \ge |\theta_{(3)}| \ge \cdots \ge |\theta_{(n)}|,    (27)

and then define the best m-term approximation of f in T as f^{(m)} = \sum_{j=1}^{m} \theta_{(j)} T_{:,(j)}, where T_{:,j} denotes the j-th column of T. This is analogous to projecting f onto the m-dimensional subspace of T that captures the most energy in f. We then say that f is α-compressible in T, for α ≥ 1, when the mean squared approximation error behaves like

    \frac{1}{n} \|f - f^{(m)}\|^2 \le C m^{-2\alpha},    (28)

for some constant C > 0. Since the error exhibits a power-law decay in m for compressible signals, it is possible to achieve a small mean squared approximation error while only computing and/or communicating the few most significant coefficients θ_{(1)}, ..., θ_{(m)}. Figure 5 shows an example where 500 nodes forming a random geometric graph sample a smooth function. As a compressing basis T, we use the eigenvectors of the normalized graph Laplacian (a function of the network topology), which are analogous to the Fourier basis vectors for signals supported on G [118]. Observe that each coefficient θ_j is a linear function of the data at each node, and so one could conceivably compute these coefficients using gossip algorithms.
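The m-term approximation and the error decay in (28) can be illustrated numerically. The sketch below uses the Laplacian of a simple path graph as a stand-in for the random-geometric-graph Laplacian of Fig. 5; the test signal and the values of m are illustrative assumptions.

```python
import numpy as np

n = 500
t = np.linspace(0.0, 1.0, n)
f = np.sin(2 * np.pi * t) + 0.5 * np.cos(6 * np.pi * t)   # smooth signal

# Laplacian of a path graph: a simple stand-in for the geometric-graph
# Laplacian used in Fig. 5; its eigenvectors act as Fourier-like modes
L = np.diag(np.r_[1.0, 2.0 * np.ones(n - 2), 1.0])
L -= np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
_, U = np.linalg.eigh(L)        # orthonormal eigenvector basis (columns)
theta = U.T @ f                 # transform coefficients, theta = T f

def mterm_error(m):
    """Mean squared error of the best m-term approximation, as in (28)."""
    keep = np.argsort(np.abs(theta))[::-1][:m]   # m largest coefficients
    return np.mean((f - U[:, keep] @ theta[keep]) ** 2)

errs = [mterm_error(m) for m in (5, 20, 80)]     # decays rapidly with m
```

Since U is orthonormal, the m-term error is just the energy of the discarded coefficients, so for a smooth (compressible) signal it drops sharply as m grows.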
Assuming that each node i knows the values {T_{j,i}}_{j=1}^{n} in each basis vector, to compute θ_j we can initialize x_i(0) = n T_{j,i} f_i, and, by gossiping, each node will compute lim_{t→∞} x_i(t) = Σ_{k=1}^{n} T_{j,k} f_k = θ_j. The main challenge with this approach is that the indices of the most significant coefficients are very signal-specific and are generally not known in advance. We can avoid this issue by making use of the recent theory of compressive sensing [119]–[121], which says that one can recover sparse signals from a small collection of random linear combinations of the measurements. In the present setting, to implement the gathering of k compressive sensing measurements using gossip algorithms, each node initializes k parallel instances of gossip with x_{i,j}(0) = n A_{i,j} f_i, j = 1, ..., k, where the A_{i,j} are, e.g., i.i.d. zero-mean normal random variables with variance 1/n. Let \bar{x}_j denote the limiting value of the j-th gossip instance at each node. Stacking these into the vector \bar{x}, any node can recover an estimate of the signal f by solving the optimization

    \min_{\theta} \|\bar{x} - A T^{T} \theta\|^2 + \tau \sum_{i=1}^{n} |\theta_i|,    (29)

where τ > 0 is a regularization parameter. In practice, the values A_{i,j} can be pseudo-randomly generated at each node using a predefined seeding mechanism. Then, any user can retrieve the gossip values {x_{i,j}(t)}_{j=1}^{k} from any node i and solve the reconstruction. Moreover, note that the compressing transformation T only needs to be known at reconstruction time, and to initialize the gossip instances each node only needs its measurement and the pseudo-randomly generated values A_{i,j}. In general, there is a tradeoff between 1) k, the number of compressed sensing measurements collected, 2) the accuracy to which the gossip algorithm is run, 3) the number of transmissions required for this computation, and 4) the average reconstruction accuracy available at each node.
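An end-to-end sketch of this pipeline is shown below, with two simplifying assumptions: the gossip stage is replaced by the exact averages it converges to, and T is taken to be the identity, so f itself is sparse. The reconstruction (29) is solved with plain iterative soft-thresholding (ISTA), one standard solver for this objective; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, s = 200, 60, 5     # network size, projections, sparsity (all assumed)

# Sparse network signal; for simplicity take T = I, so f = theta is sparse
f = np.zeros(n)
f[rng.choice(n, size=s, replace=False)] = rng.normal(0.0, 1.0, s)

# Projection weights A[i, j] ~ N(0, 1/n), pseudo-randomly seeded per node
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, k))

# Gossip on x_{i,j}(0) = n A[i,j] f_i converges to xbar_j = sum_i A[i,j] f_i;
# here we substitute the exact limit rather than simulate the iterations
xbar = (n * A * f[:, None]).mean(axis=0)

# Solve (29) (with a 1/2 on the quadratic term) by iterative soft-thresholding
M = A.T                                     # k x n measurement matrix
tau = 0.01                                  # regularization parameter
step = 1.0 / np.linalg.norm(M, 2) ** 2      # step size from the Lipschitz bound
theta = np.zeros(n)
for _ in range(2000):
    theta = theta - step * (M.T @ (M @ theta - xbar))              # gradient
    theta = np.sign(theta) * np.maximum(np.abs(theta) - step * tau, 0.0)
```

Note that k = 60 measurements suffice here even though n = 200, reflecting the compressive sensing guarantee that the number of projections scales with the sparsity, not the network size.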
For an α-compressible signal f, compressed sensing theory provides bounds on the mean squared reconstruction error as a function of k and α, assuming the values \bar{x} are calculated precisely. Larger k corresponds to lower error, and the error decays rapidly with k (similar to the m-term approximation), so one can obtain a very accurate estimate of f with k ≪ n measurements. Inaccurate computation of the compressed sensing values \bar{x}, due to gossiping for a finite number of iterations, can be thought of as adding noise to the values \bar{x}, and increases the overall reconstruction error. Figure 5(c) illustrates, via numerical simulation, the tradeoff between varying k and the number of gossip iterations. For more on the theoretical performance guarantees achievable in this formulation, see [122], [123]. A gossip-based approach to solving the reconstruction problem in a distributed fashion is described in [124]. For an alternative approach to using gossip for distributed field estimation, see [125].

V. CONCLUSION AND FUTURE DIRECTIONS

Because of their simplicity and robustness, gossip algorithms are an attractive approach to distributed in-network processing in wireless sensor networks, and this article has surveyed recent results in this area. A major concern in sensor networks revolves around conserving limited bandwidth and energy resources, and in the context of iterative gossip algorithms this is directly related to the rate of convergence. One thread of the discussion covered fast gossiping in wireless network topologies. Another thread focused on understanding and designing for the effects of wireless transmission, including source and channel coding. Finally, we have illustrated how gossip algorithms can be used for a diverse range of tasks, including estimation and compression. Currently, this research is branching into a number of directions.
One area of active research is investigating gossip algorithms that go beyond computing linear functions and averages. Just as the average can be viewed as the minimizer of a quadratic cost function, researchers are studying what other classes of functions can be optimized within the gossip framework [126]. A related direction is investigating the connections between gossip algorithms and message-passing algorithms for distributed inference and information fusion, such as belief propagation [87], [127]. While it is clear that computing pairwise averages is similar to the sum-product algorithm for computing marginals of distributions, there is no explicit connection between these families of distributed algorithms. It would be interesting to demonstrate that pairwise gossip and its generalizations correspond to messages of the sum-product (or max-product) algorithm for an appropriate Markov random field. Such potentials would guarantee convergence (which is not guaranteed for general iterative message-passing) and further establish explicit convergence and message scheduling results. Another interesting research direction involves understanding the effects of intermittent links and dynamic topologies, and in particular the effects of node mobility. Early work [128] has analyzed i.i.d. mobility models and shown that mobility can greatly benefit convergence under some conditions. Generalizing to more realistic mobility models seems to be a very interesting research direction that would also be relevant in practice, since gossip algorithms are more useful in such dynamic environments. Gossip algorithms are certainly relevant in other applications that arise in social networks and the interaction of mobile devices with social networks. Distributed inference and information fusion in such dynamic networked environments is certainly going to pose substantial challenges for future research.

REFERENCES

[1] F. Zhao and L.
Guibas, Wireless Sensor Networks: An Information Processing Approach. Morgan Kaufmann, 2004.
[2] F. Zhao, J. Liu, J. Liu, L. Guibas, and J. Reich, “Collaborative signal and information processing: An information-directed approach,” Proc. IEEE, vol. 91, no. 8, pp. 1199–1209, Aug. 2003.
[3] B. Sinopoli, C. Sharp, L. Schenato, S. Schaffert, and S. Sastry, “Distributed control applications within sensor networks,” Proc. IEEE, vol. 91, no. 8, pp. 1235–1246, Aug. 2003.
[4] C. Chong and S. Kumar, “Sensor networks: Evolution, opportunities, and challenges,” Proc. IEEE, vol. 91, no. 8, pp. 1247–1256, Aug. 2003.
[5] R. Brooks, P. Ramanathan, and A. Sayeed, “Distributed target classification and tracking in sensor networks,” Proc. IEEE, vol. 91, no. 8, pp. 1163–1171, Aug. 2003.
[6] G. Pottie and W. Kaiser, “Wireless integrated network sensors,” Communications of the ACM, vol. 43, no. 5, pp. 51–58, 2000.
[7] V. Shnayder, M. Hempstead, B. Chen, G. Werner-Allen, and M. Welsh, “Simulating the power consumption of large-scale sensor network applications,” in Proc. ACM Conf. on Embedded Networked Sensor Systems, Baltimore, Nov. 2004.
[8] Y. Yu, B. Krishnamachari, and V. Prasanna, “Energy-latency tradeoffs for data gathering in wireless sensor networks,” in IEEE Infocom, Hong Kong, March 2004.
[9] M. Rabbat and R. Nowak, “Distributed optimization in sensor networks,” in Proc. IEEE/ACM Symposium on Information Processing in Sensor Networks, Berkeley, CA, April 2004.
[10] D. Blatt and A. Hero, “Energy based sensor network source localization via projection onto convex sets (POCS),” IEEE Transactions on Signal Processing, vol. 54, no. 9, pp. 3614–3619, 2006.
[11] S. Son, M. Chiang, S. Kulkarni, and S. Schwartz, “The value of clustering in distributed estimation for sensor networks,” in IEEE Wirelesscom, Maui, June 2005.
[12] A. Ciancio, S. Pattem, A.
Ortega, and B. Krishnamachari, “Energy-efficient data representation and routing for wireless sensor networks based on a distributed wavelet compression algorithm,” in Proc. IEEE/ACM Symposium on Information Processing in Sensor Networks, Nashville, Apr. 2006.
[13] S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. Yin, and F. Yu, “Data-centric storage in sensornets with GHT, a geographic hash table,” Mobile Nets. and Apps., vol. 8, no. 4, pp. 427–442, 2003.
[14] R. Karp, C. Schindelhauer, S. Shenker, and B. Vocking, “Randomized rumor spreading,” in Annual Symp. on Foundations of Computer Science, vol. 41, 2000, pp. 565–574.
[15] D. Kempe, A. Dobra, and J. Gehrke, “Computing aggregate information using gossip,” in Proc. Foundations of Computer Science, Cambridge, MA, Oct. 2003.
[16] P. Levis, N. Patel, D. Culler, and S. Shenker, “Trickle: A self-regulating algorithm for code propagation and maintenance in wireless sensor networks,” in Proc. USENIX/ACM Symp. on Networked Systems Design and Implementation, vol. 246, 2004.
[17] J. Tsitsiklis, “Problems in decentralized decision making and computation,” Ph.D. dissertation, Massachusetts Institute of Tech., Nov. 1984.
[18] J. Tsitsiklis, D. Bertsekas, and M. Athans, “Distributed asynchronous deterministic and stochastic gradient optimization algorithms,” IEEE Trans. Automatic Control, vol. AC-31, no. 9, pp. 803–812, Sep. 1986.
[19] G. V. Cybenko, “Dynamic load balancing for distributed memory multiprocessors,” Journal on Parallel and Distributed Computing, vol. 7, pp. 279–301, 1989.
[20] A. Jadbabaie, J. Lin, and A. S. Morse, “Coordination of groups of mobile autonomous agents using nearest neighbor rules,” IEEE Transactions on Automatic Control, vol. AC-48, no. 6, pp. 988–1001, June 2003.
[21] R. Olfati-Saber and R. M. Murray, “Consensus problems in networks of agents with switching topology and time-delays,” IEEE Trans. Automat. Contr., vol. 49, no. 9, pp.
1520–1533, Sept. 2004.
[22] J. A. Fax and R. M. Murray, “Information flow and cooperative control of vehicle formations,” IEEE Transactions on Automatic Control, vol. 49, no. 9, pp. 1465–1476, Sep. 2004.
[23] V. Saligrama and D. Castanon, “Reliable distributed estimation with intermittent communications,” in 45th IEEE Conference on Decision and Control, San Diego, CA, Dec. 2006, pp. 6763–6768.
[24] S. Kar, S. A. Aldosari, and J. M. F. Moura, “Topology for distributed inference on graphs,” IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2609–2613, June 2008.
[25] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossip algorithms,” IEEE Trans. Inf. Theory, vol. 52, no. 6, pp. 2508–2530, Jun. 2006.
[26] L. Xiao and S. Boyd, “Fast linear iterations for distributed averaging,” in Proc. IEEE Conf. on Decision and Control, Hawaii, Dec. 2003.
[27] D. Bertsekas and J. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997.
[28] R. Olfati-Saber, J. Fax, and R. Murray, “Consensus and cooperation in networked multi-agent systems,” Proc. IEEE, vol. 95, no. 1, pp. 215–233, Jan. 2007.
[29] W. Ren, R. Beard, and E. Atkins, “Information consensus in multivehicle cooperative control,” IEEE Control Systems Magazine, vol. 27, no. 2, pp. 71–82, April 2007.
[30] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Analysis and optimization of randomized gossip algorithms,” in Proceedings of the 43rd Conference on Decision and Control (CDC 2004), 2004.
[31] F. Fagnani and S. Zampieri, “Randomized consensus algorithms over large scale networks,” IEEE J. on Selected Areas of Communications, to appear, 2008.
[32] A. Sinclair, “Improved bounds for mixing rates of Markov chains and multicommodity flow,” Combinatorics, Probability and Computing, vol. 1, 1992.
[33] S. Boyd, P. Diaconis, and L. Xiao, “Fastest mixing Markov chain on a graph,” SIAM Review, vol. 46, pp. 667–689, 2003.
[34] P. Gupta and P. R. Kumar, “Critical power for asymptotic connectivity in wireless networks,” in Stochastic Analysis, Control, Optimization, and Applications, Boston, 1998, pp. 1106–1110.
[35] A. G. Dimakis, A. D. Sarwate, and M. J. Wainwright, “Geographic gossip: Efficient aggregation for sensor networks,” in ACM/IEEE Symposium on Information Processing in Sensor Networks, 2006.
[36] W. Li and H. Dai, “Location-aided fast distributed consensus,” IEEE Transactions on Information Theory, submitted, 2008.
[37] ——, “Cluster-based distributed consensus,” IEEE Trans. Wireless Communications, vol. 8, no. 1, pp. 28–31, Jan. 2009.
[38] P. Diaconis, S. Holmes, and R. Neal, “Analysis of a nonreversible Markov chain sampler,” Annals of Applied Probability, pp. 726–752, 2000.
[39] F. Chen, L. Lovasz, and I. Pak, “Lifting Markov chains to speed up mixing,” in Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing. ACM, 1999, pp. 275–281.
[40] K. Jung, D. Shah, and J. Shin, “Fast gossip through lifted Markov chains,” in Proc. Allerton Conf. on Comm., Control, and Comp., Urbana-Champaign, IL, Sep. 2007.
[41] D. Mosk-Aoyama and D. Shah, “Information dissemination via gossip: Applications to averaging and coding,” April 2005, http://arxiv.org/cs.NI/0504029.
[42] P. Flajolet and G. Martin, “Probabilistic counting algorithms for data base applications,” Journal of Computer and System Sciences, vol. 31, no. 2, pp. 182–209, 1985.
[43] M. Cao, D. A. Spielman, and E. M. Yeh, “Accelerated gossip algorithms for distributed computation,” in Proc. 44th Annual Allerton Conf. Comm., Control, and Comp., Monticello, IL, Sep. 2006.
[44] E. Kokiopoulou and P. Frossard, “Polynomial filtering for fast convergence in distributed consensus,” IEEE Trans. Signal Processing, vol. 57, no. 1, pp. 342–354, Jan. 2009.
[45] B. Johansson and M.
Johansson, “Faster linear iterations for distributed averaging,” in Proc. IFAC World Congress, Seoul, South Korea, Jul. 2008.
[46] B. Oreshkin, M. Coates, and M. Rabbat, “Optimization and analysis of distributed averaging with short node memory,” to appear, IEEE Trans. Signal Processing, Jul. 2010.
[47] F. Benezit, A. G. Dimakis, P. Thiran, and M. Vetterli, “Gossip along the way: Order-optimal consensus through randomized path averaging,” in Proceedings of Allerton Conference, Monticello, IL, 2007.
[48] O. Savas, M. Alanyali, and V. Saligrama, “Efficient in-network processing through local ad-hoc information coalescence,” in DCOSS, 2006, pp. 252–265.
[49] T. Aysal, M. Yildiz, A. Sarwate, and A. Scaglione, “Broadcast gossip algorithms for consensus,” IEEE Transactions on Signal Processing, vol. 57, no. 7, pp. 2748–2761, July 2009.
[50] D. Ustebay, B. Oreshkin, M. Coates, and M. Rabbat, “Rates of convergence for greedy gossip with eavesdropping,” in Proc. Allerton Conf. on Communication, Control, and Computing, Monticello, IL, 2008.
[51] A. Ortega and K. Ramchandran, “Rate-distortion methods for image and video compression,” IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 23–50, Nov. 1998.
[52] A. Orlitsky and A. El Gamal, “Average and randomized communication complexity,” IEEE Transactions on Information Theory, vol. 36, no. 1, pp. 3–16, Jan. 1990.
[53] B. Nazer and M. Gastpar, “Computation over multiple-access channels,” IEEE Transactions on Information Theory, vol. 53, no. 10, pp. 3498–3516, Oct. 2007.
[54] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley and Sons, Inc., 1991.
[55] S. Kar and J. M. F. Moura, “Sensor networks with random links: Topology design for distributed consensus,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3315–3326, July 2008.
[56] Y. Hatano and M.
Mesbahi, “Agreement over random networks,” in 43rd IEEE Conference on Decision and Control, vol. 2, Dec. 2004, pp. 2010–2015.
[57] M. G. Rabbat, R. D. Nowak, and J. A. Bucklew, “Generalized consensus computation in networked systems with erasure links,” in Proc. of the 6th Intl. Wkshp. on Sign. Proc. Adv. in Wireless Communications, New York, NY, 2005, pp. 1088–1092.
[58] S. Patterson and B. Bamieh, “Distributed consensus with link failures as a structured stochastic uncertainty problem,” in 46th Allerton Conf. on Comm., Control, and Comp., Monticello, IL, Sept. 2008.
[59] C. Wu, “Synchronization and convergence of linear dynamics in random directed networks,” IEEE Transactions on Automatic Control, vol. 51, no. 7, pp. 1207–1210, July 2006.
[60] M. Porfiri and D. Stilwell, “Stochastic consensus over weighted directed networks,” in Proceedings of the 2007 American Control Conference, New York City, USA, July 11-13 2007.
[61] D. Jakovetic, J. Xavier, and J. M. F. Moura, “Weight optimization for consensus algorithms with correlated switching topology,” IEEE Transactions on Signal Processing, vol. abs/0906.3736, June 2009, submitted.
[62] A. T. Salehi and A. Jadbabaie, “On consensus in random networks,” in The Allerton Conference on Communication, Control, and Computing, Allerton House, IL, September 2006.
[63] S. Kar and J. M. F. Moura, “Distributed consensus algorithms in sensor networks with imperfect communication: Link failures and channel noise,” IEEE Transactions on Signal Processing, vol. 57, no. 1, pp. 355–369, January 2009.
[64] A. Nedic, A. Ozdaglar, and P. Parrilo, “Constrained consensus and optimization in multi-agent networks,” IEEE Transactions on Automatic Control, 2009, to appear.
[65] A. Kashyap, T. Basar, and R. Srikant, “Quantized consensus,” Automatica, vol. 43, no. 7, pp. 1192–1203, 2007.
[66] R. Subramanian and I. D.
Scherson, “An analysis of diffusive load-balancing,” in SPAA ’94: Proceedings of the Sixth Annual ACM Symposium on Parallel Algorithms and Architectures. New York, NY, USA: ACM, 1994, pp. 220–225.
[67] Y. Rabani, A. Sinclair, and R. Wanka, “Local divergence of Markov chains and the analysis of iterative load-balancing schemes,” in Proceedings of the 39th IEEE Symposium on Foundations of Computer Science (FOCS 98), 1998, pp. 694–703.
[68] W. Aiello, B. Awerbuch, B. Maggs, and S. Rao, “Approximate load balancing on dynamic and asynchronous networks,” in STOC ’93: Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing. New York, NY, USA: ACM, 1993, pp. 632–641.
[69] B. Ghosh and S. Muthukrishnan, “Dynamic load balancing by random matchings,” J. Comput. Syst. Sci., vol. 53, no. 3, pp. 357–370, 1996.
[70] J. Lavaei and R. Murray, “On quantized consensus by means of gossip algorithm - part i: Convergence proof,” in American Control Conference, ACC ’09, June 2009, pp. 394–401.
[71] ——, “On quantized consensus by means of gossip algorithm - part ii: Convergence time,” in American Control Conference, ACC ’09, June 2009, pp. 2958–2965.
[72] F. Benezit, P. Thiran, and M. Vetterli, “Interval consensus: From quantized gossip to voting,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, 2009, pp. 3661–3664.
[73] S. Kar and J. M. F. Moura, “Distributed consensus algorithms in sensor networks: Quantized data and random link failures,” accepted for publication in the IEEE Transactions on Signal Processing, September 2009. [Online]. Available: http://arxiv.org/abs/0712.1609
[74] T. C. Aysal, M. Coates, and M. Rabbat, “Distributed average consensus using probabilistic quantization,” in IEEE/SP 14th Workshop on Statistical Signal Processing, SSP ’07, Aug. 2007, pp. 640–644.
[75] T. Aysal, M. Coates, and M.
Rabbat, “Distributed average consensus with dithered quantization,” IEEE Trans. Signal Processing, vol. 56, no. 10, pp. 4905–4918, Oct. 2008.
[76] R. Carli, F. Fagnani, A. Speranzon, and S. Zampieri, “Communication constraints in the average consensus problem,” Automatica, vol. 44, no. 3, pp. 671–684, 2008.
[77] N. Elia and S. Mitter, “Stabilization of linear systems with limited information,” IEEE Transactions on Automatic Control, vol. 46, no. 9, pp. 1384–1400, Sep. 2001.
[78] M. Yildiz and A. Scaglione, “Differential nested lattice encoding for consensus problems,” in Proc. Information Processing in Sensor Networks, April 2007, pp. 89–98.
[79] ——, “Coding with side information for rate-constrained consensus,” IEEE Transactions on Signal Processing, vol. 56, no. 8, pp. 3753–3764, Aug. 2008.
[80] A. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,” IEEE Transactions on Information Theory, vol. 22, no. 1, pp. 1–10, Jan. 1976.
[81] R. Zamir, S. Shamai, and U. Erez, “Nested linear/lattice codes for structured multiterminal binning,” IEEE Transactions on Information Theory, vol. 48, no. 6, pp. 1250–1276, Jun. 2002.
[82] O. Ayaso, D. Shah, and M. Dahleh, “Distributed computation under bit constraints,” in 47th IEEE Conference on Decision and Control, CDC 2008, Dec. 2008, pp. 4837–4842.
[83] A. El Gamal and H.-I. Su, “Distributed lossy averaging,” in IEEE International Symposium on Information Theory, ISIT 2009, June–July 2009, pp. 1453–1457.
[84] B. Nazer, A. G. Dimakis, and M. Gastpar, “Neighborhood gossip: Concurrent averaging through local interference,” in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2009, April 2009, pp. 3657–3660.
[85] S. Kar and J.
Moura, “Consensus based detection in sensor networks: Topology optimization under practical constraints,” in Proc. International Workshop on Information Theory in Sensor Networks, Santa Fe, NM, June 2007.
[86] V. Saligrama, M. Alanyali, and O. Savas, “Distributed detection in sensor networks with packet loss and finite capacity links,” IEEE Trans. Signal Processing, vol. 54, no. 11, pp. 4118–4132, Nov. 2006.
[87] C. Moallemi and B. Van Roy, “Consensus propagation,” IEEE Transactions on Information Theory, vol. 52, no. 11, pp. 4753–4766, 2006.
[88] R. Tron, R. Vidal, and A. Terzis, “Distributed pose estimation in camera networks via consensus on SE(3),” in Proc. IEEE Conf. on Distributed Smart Cameras, Palo Alto, Sep. 2008.
[89] A. Jorstad, P. Burlina, I. Wang, D. Lucarelli, and D. DeMenthon, “Model-based pose estimation by consensus,” in Proc. Intelligent Sensors, Sensor Networks, and Inf. Processing, Sydney, Dec. 2008.
[90] S. Kar and J. M. F. Moura, “A linear iterative algorithm for distributed sensor localization,” in 42nd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, Oct. 2008, pp. 1160–1164.
[91] S. Kar, J. M. F. Moura, and K. Ramanan, “Distributed parameter estimation in sensor networks: Nonlinear observation models and imperfect communication,” Aug. 2008, submitted for publication. [Online]. Available: http://arxiv.org/abs/0809.0009
[92] C. G. Lopes and A. H. Sayed, “Diffusion least-mean squares over adaptive networks: Formulation and performance analysis,” IEEE Trans. Signal Processing, vol. 56, no. 7, pp. 3122–3136, July 2008.
[93] S. Stankovic, M. Stankovic, and D. Stipanovic, “Decentralized parameter estimation by consensus based stochastic approximation,” in 46th IEEE Conference on Decision and Control, New Orleans, LA, USA, 12-14 Dec. 2007, pp. 1535–1540.
[94] I. Schizas, G. Mateos, and G.
Giannakis, “Stability analysis of the consensus-based distributed LMS algorithm,” in Proceedings of the 33rd International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, Nevada, USA, April 1-4 2008, pp. 3289–3292.
[95] S. Ram, V. Veeravalli, and A. Nedic, “Distributed and recursive parameter estimation in parametrized linear state-space models,” submitted for publication, April 2008.
[96] S. S. Ram, V. V. Veeravalli, and A. Nedic, “Distributed and recursive nonlinear least square parameter estimation: Linear and separable models,” in Sensor Networks: Where Theory Meets Practice, G. Ferrari, Ed. Springer-Verlag, 2009.
[97] R. Olfati-Saber, “Distributed Kalman filter with embedded consensus filters,” in ECC-CDC’05, 44th IEEE Conference on Decision and Control and European Control Conference, 2005.
[98] S. Kirti and A. Scaglione, “Scalable distributed Kalman filtering through consensus,” in Proceedings of the 33rd International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, Nevada, USA, April 1-4 2008, pp. 2725–2728.
[99] U. A. Khan and J. M. F. Moura, “Distributing the Kalman filter for large-scale systems,” accepted for publication, IEEE Transactions on Signal Processing, 2008.
[100] A. Ribeiro, I. D. Schizas, S. I. Roumeliotis, and G. B. Giannakis, “Kalman filtering in wireless sensor networks: Incorporating communication cost in state estimation problems,” IEEE Control Systems Magazine, 2009, submitted.
[101] R. Carli, A. Chiuso, L. Schenato, and S. Zampieri, “Distributed Kalman filtering using consensus strategies,” IEEE Journal on Selected Areas in Communications, vol. 26, no. 4, pp. 622–633, 2008.
[102] A. Das and M. Mesbahi, “Distributed linear parameter estimation in sensor networks based on Laplacian dynamics consensus algorithm,” in 3rd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks, vol. 2, Reston, VA, USA, 28-28 Sept. 2006, pp.
440–449.
[103] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, “Consensus in ad hoc WSNs with noisy links - part i: Distributed estimation of deterministic signals,” IEEE Transactions on Signal Processing, vol. 56, no. 1, pp. 350–364, January 2008.
[104] M. Nevel’son and R. Has’minskii, Stochastic Approximation and Recursive Estimation. Providence, Rhode Island: American Mathematical Society, 1973.
[105] M. Huang and J. Manton, “Stochastic Lyapunov analysis for consensus algorithms with noisy measurements,” in Proc. American Control Conf., New York, Jul. 2007.
[106] S. Kar and J. M. F. Moura, “A mixed time scale algorithm for distributed parameter estimation: Nonlinear observation models and imperfect communication,” in Proceedings of the 34th International Conference on Acoustics, Speech, and Signal Processing, Taipei, Taiwan, April 2009, pp. 3669–3672.
[107] N. Patwari, J. Ash, S. Kyperountas, A. Hero, R. Moses, and N. Correal, “Locating the nodes: Cooperative localization in wireless sensor networks,” IEEE Signal Processing Magazine, vol. 22, no. 4, pp. 54–69, July 2005.
[108] D. Li and Y. Hu, “Energy based collaborative source localization using acoustic micro-sensor array,” J. EURASIP Applied Signal Processing, vol. 2003, no. 4, pp. 321–337, 2003.
[109] X. Sheng and Y. Hu, “Energy based acoustic source localization,” in Proc. ACM/IEEE Int. Conf. on Information Processing in Sensor Networks, Palo Alto, April 2003.
[110] ——, “Maximum likelihood multiple-source localization using acoustic energy measurements with wireless sensor networks,” IEEE Transactions on Signal Processing, vol. 53, no. 1, pp. 44–53, Jan. 2005.
[111] M. Rabbat, R. Nowak, and J. Bucklew, “Robust decentralized source localization via averaging,” in Proc. IEEE ICASSP, Philadelphia, PA, Mar. 2005.
[112] U. Khan, S. Kar, and J.
Moura, “Distributed sensor localization in random en vironments using minimal number of anchor nodes, ” IEEE T rans. Signal Pr ocessing , vol. 57, no. 5, pp. 2000–2016, May 2009. [113] D. Slepian and J. W olf, “Noiseless coding of correlated information sources, ” IEEE Tr ans. Inf. Theory , vol. 19, no. 4, pp. 471–480, 1973. [114] S. Servetto, “On the feasibility of large-scale wireless sensor networks, ” Pr oc. Allerton Conf. on Comm., Contr ol, and Computing , 2002. [115] S. Pradhan, J. Kusuma, and K. Ramchandran, “Distributed compression in a dense microsensor network, ” IEEE Signal Processing Magazine , v ol. 19, no. 2, pp. 51–60, March 2002. [116] S. Mallat, A W avelet T our of Signal Pr ocessing . Academic Press, 1999. [117] R. DeV ore, “Nonlinear approximation, ” Acta numerica , vol. 7, pp. 51–150, 1998. [118] F . Chung, Spectral Graph Theory . American Math. Society , 1997. [119] E. J. Candes and T . T ao, “Decoding by linear programming, ” IEEE T rans. Inform. Theory , vol. 51, no. 12, pp. 4203–4215, Dec. 2005. [120] D. L. Donoho, “Compressed sensing, ” IEEE Tr ans. Inform. Theory , vol. 52, no. 4, pp. 1289–1306, Apr . 2006. [121] E. J. Candes and T . T ao, “Near-optimal signal recovery from random projections: Universal encoding strategies?” IEEE T rans. Inform. Theory , vol. 52, no. 12, pp. 5406–5425, Dec. 2006. [122] M. Rabbat, J. Haupt, A. Singh, and R. No wak, “Decentralized compression and predistribution via randomized gossiping, ” in Pr oc. Information Processing in Sensor Networks , Nashville, TN, Apr . 2006. [123] J. Haupt, W . Bajwa, M. Rabbat, and R. Now ak, “Compressed sensing for networked data, ” IEEE Signal Pr ocessing Magazine , vol. 25, no. 2, pp. 92–101, Mar . 2008. [124] A. Schmidt and J. Moura, “ A distributed sensor fusion algorithm for the inv ersion of sparse fields, ” in Pr oc. Asilomar Conf. on Signals, Systems, and Computers , Pacific Grov e, CA, Nov . 2009. [125] R. Sarkar, X. Zhu, and J. 
Gao, “Hierarchical spatial gossip for multi-resolution representations in sensor networks, ” in Pr oc. Int. Conf. on Information Pr ocessing in Sensor Networks , April 2007, pp. 420–429. [126] S. S. Ram, A. Nedi ´ c, and V . V eera valli, “ Asynchronous gossip algorithms for stochastic optimization, ” in Pr oc. IEEE Conf. on Decision and Contr ol , Shanghai, China, Dec. 2009. [127] M. Cetin, L. Chen, J. Fisher, A. Ihler, R. Moses, M. W ainwright, and A. W illsky , “Distributed fusion in sensor networks, ” IEEE Signal Pr ocessing Magazine , vol. 23, no. 4, pp. 42–55, 2006. [128] A. Sarwate and A. Dimakis, “The impact of mobility on gossip algorithms, ” Proceedings of IEEE Infocom , 2009.