Picking up the Pieces: Self-Healing in Reconfigurable Networks

Jared Saia∗ and Amitabh Trehan∗

∗ Department of Computer Science, University of New Mexico, Albuquerque, NM 87131-1386; email: {saia, amitabh}@cs.unm.edu. This research was partially supported by NSF CAREER Award 0644058, NSF CCR-0313160, and an AFOSR MURI grant.

Abstract

We consider the problem of self-healing in networks that are reconfigurable in the sense that they can change their topology during an attack. Our goal is to maintain connectivity in these networks, even in the presence of repeated adversarial node deletion, by carefully adding edges after each attack. We present a new algorithm, DASH, that provably ensures that: 1) the network stays connected even if an adversary deletes up to all nodes in the network; and 2) no node ever increases its degree by more than $2 \log n$, where $n$ is the number of nodes initially in the network. DASH is fully distributed; adds new edges only among neighbors of deleted nodes; and has average latency and bandwidth costs that are at most logarithmic in $n$. DASH has these properties irrespective of the topology of the initial network, and is thus orthogonal and complementary to traditional topology-based approaches to defending against attack. We also prove lower bounds showing that DASH is asymptotically optimal in terms of minimizing maximum degree increase over multiple attacks. Finally, we present empirical results on power-law graphs that show that DASH performs well in practice, and that it significantly outperforms naive algorithms in reducing maximum degree increase.

1. Introduction

On August 15, 2007 the Skype network crashed for about 48 hours, disrupting service to approximately 200 million users [8, 13, 16, 19, 20]. Skype attributed this outage to failures in their "self-healing mechanisms" [2]. We believe that this outage is indicative of a much broader problem. Modern computer systems have complexity unprecedented in the history of engineering: we are approaching scales of billions of components. Such systems are less akin to a traditional engineering enterprise such as a bridge, and more akin to a living organism in terms of complexity. A bridge must be designed so that key components never fail, since there is no way for the bridge to automatically recover from system failure. In contrast, a living organism cannot be designed so that no component ever fails: there are simply too many components. For example, skin can be cut and still heal. Designing skin that can heal is much more practical than designing skin that is completely impervious to attack. Unfortunately, current algorithms ensure robustness in computer networks through hardening individual components or, at best, adding lots of redundant components. Such an approach is increasingly unscalable.

In this paper, we focus on a new, responsive approach for maintaining robust networks. Our approach is responsive in the sense that it responds to an attack (or component failure) by changing the topology of the network. Our approach works irrespective of the initial state of the network, and is thus orthogonal and complementary to traditional non-responsive techniques. There are many desirable invariants to maintain in the face of an attack. Here we focus only on one of the simplest and most fundamental invariants: maintaining network connectivity.

The responsive approach will only work on networks that are reconfigurable, in the sense that the topology of the network can be changed. Not all networks have this property. However, many large-scale networks are reconfigurable. For example, peer-to-peer and overlay networks are reconfigurable, as are wireless and mobile networks. More generally, many social networks, such as a company's organizational chart; infrastructure networks, such as an airline's transportation network; and biological networks, such as the human brain, are also reconfigurable. The increasing importance of these types of networks calls for new mathematical and algorithmic methods to study and exploit their flexibility.

Our Model: We now describe our model of attack and network response. We assume that the network is initially a connected graph over $n$ nodes. We assume that every node knows not only its neighbors in the network but also the neighbors of its neighbors, i.e. neighbor-of-neighbor (NoN) information. In particular, for all nodes $x$, $y$ and $z$ such that $x$ is a neighbor of $y$ and $y$ is a neighbor of $z$, $x$ knows $z$. There are many ways that such information can be efficiently maintained, see e.g. [14, 18].

We assume that there is an adversary that is attacking the network. This adversary knows the network topology and our algorithm, and it has the ability to delete carefully selected nodes from the network. However, we assume the adversary is constrained in that in any time step it can only delete a small number of nodes from the network¹. We further assume that after the adversary deletes some node $x$ from the network, the neighbors of $x$ become aware of this deletion and have a small amount of time to react.

When a node $x$ is deleted, we allow the neighbors of $x$ to react to this deletion by adding some set of edges amongst themselves. We assume that these edges can only be between nodes which were previously neighbors of $x$. This is to ensure that, as much as possible, edges are added which respect locality information in the underlying network. We assume that there is very limited time to react to the deletion of $x$ before the adversary deletes another node. Thus, the algorithm for deciding which edges to add between the neighbors of $x$ must be fast and localized.
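The model above can be illustrated with a minimal sketch. It assumes the network is stored as a dictionary of adjacency sets; the helper names `non_view` and `delete_node` are hypothetical and chosen only for this illustration. It shows why NoN information suffices for the neighbors of a deleted node to find each other.

```python
def non_view(G, x):
    """Neighbor-of-neighbor (NoN) knowledge held by x: for each neighbor y
    of x, the other neighbors of y."""
    return {y: G[y] - {x} for y in G[x]}


def delete_node(G, x):
    """The adversary deletes x. Returns the former neighbors of x, the only
    nodes the model allows to add edges in response."""
    nbrs = G.pop(x)
    for y in nbrs:
        G[y].discard(x)
    return nbrs


# Path a - b - c: a and c learn of each other through b.
G = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
a_knows = non_view(G, "a")        # recorded before the attack
survivors = delete_node(G, "b")   # the adversary removes b

# a already knew c through NoN, so the two can reconnect once b is gone.
G["a"].add("c")
G["c"].add("a")
```

In this toy case the healing step is a single edge; choosing which edges to add when a deleted node has many neighbors is the subject of the algorithm itself.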
Our Results: We introduce an algorithm for self-healing of reconfigurable networks, called DASH (an acronym for Degree based Self-Healing). DASH is locality-aware in that it uses only the neighbors of the deleted node for reconnection. We prove that DASH maintains connectivity in the network, and that it increases the degree of any node by no more than $O(\log n)$. During reconnection of nodes, our algorithm uses only local information; therefore, it is scalable and can be implemented in a completely distributed manner. Algorithm DASH is described as Algorithm 1 in Section 2. The main characteristics of DASH are summarized in the following theorem, which is proved in Section 2.

Theorem 1. DASH guarantees the following properties even if up to all the nodes in the network are deleted:
• The degree of any vertex is increased by at most $2 \log n$.
• The number of messages any node of initial degree $d$ sends out and receives is no more than $2(d + 2 \log n) \ln n$ with high probability² over all node deletions.
• The latency to reconnect is $O(1)$ after attack; and the amortized latency to update the state of the network over $\Theta(n)$ deletions is $O(\log n)$ with high probability.

¹ Throughout this paper, for ease of exposition, we will assume that the adversary deletes only one node from the network before the algorithm responds. However, our main algorithm, DASH, can easily handle the situation where any number of nodes are removed, so long as the neighbor-of-neighbor graph remains connected.

² Throughout this paper, we use the phrase with high probability (w.h.p.) to mean with probability at least $1 - 1/n^C$ for any fixed constant $C$.

We also prove (in Section 3) the following lower bound that shows that DASH is asymptotically optimal.

Theorem 2. Consider any locality-aware algorithm that increases the degree of any node after an attack by at most a fixed constant. Then there exists a graph and a strategy of deletions on that graph that will force the algorithm to increase the degree of some node by at least $\log n$.

We also present empirical results (in Section 4) showing that DASH performs well in practice and that it significantly outperforms naive algorithms in terms of reducing the maximum degree increase. Finally (in Section 4) we describe SDASH, a heuristic based on DASH that we show empirically both keeps node degrees small and also keeps shortest paths between nodes short.

Related Work: There have been numerous papers that discuss strategies for adding additional capacity and rerouting in anticipation of failures [7, 9, 12, 17, 21, 22]. Here we focus on results that are responsive in some sense. Médard, Finn, Barry, and Gallager [15] propose constructing redundant trees to make backup routes possible when an edge or node is deleted. Anderson, Balakrishnan, Kaashoek, and Morris [1] modify some existing nodes to be RON (Resilient Overlay Network) nodes to detect failures and reroute accordingly. Some networks have enough redundancy built in so that separate parts of the network can function on their own in case of an attack [11]. In all these past results, the network topology is fixed. In contrast, our algorithm adds edges to the network as node failures occur. Further, our algorithm does not dictate routing paths or specifically require redundant components to be placed in the network initially. In this paper, we build on earlier work done in [5, 6], which proposed a simple line algorithm for self-healing to maintain network connectivity.

Table of Contents: The rest of our paper is organized as follows. Section 2 describes the algorithm DASH and its theoretical properties. Section 3 gives a lower bound on locality-aware algorithms. Section 4 gives empirical results for DASH and several other simple algorithms on random power-law networks.
It also describes and gives results for SDASH. We conclude and give areas for future work in Section 5.

2. DASH: An Algorithm for Self-Healing

In this section, we describe DASH and prove certain properties about it. In brief, when a deletion occurs, DASH asks the neighbors of the deleted node to reconnect themselves into a certain kind of complete binary tree. Then messages are propagated so that the nodes can keep track of which connected component they belong to. Let the actual network at a particular time step be $G(V, E)$. Let $E'$ be the edges (i.e. healing edges) that have been added by the algorithm up to that time step (note $E' \subseteq E$). Let $G' = (V, E')$. We show that $G'$ is a forest in Lemma 1.

2.1. DASH: Degree based Self-Healing

As the acronym suggests, DASH employs information about previous degree increase to control further degree increase for a node. When a deletion occurs, we assume the neighbors of the deleted node are able to detect the deletion. Then they employ DASH to heal. To maintain connectivity, DASH connects the neighbors of a deleted node as a binary tree. The tree is structured so that the vertices which have incurred the maximum degree increase previously get to be leaves and thus do not increase their degree in this round. Notice that at least half the vertices in a binary tree are leaves. The nodes maintain information about the virtual network and their connected component in this network. The algorithm tries to use only a single node from each component during reconnection and thus adds only a low number of new edges during healing.

To describe DASH we give some definitions. Let $N(v, G)$ be the neighbors of vertex $v$ in the graph $G$ representing the real network. Let $N(v, G')$ be the neighbors of vertex $v$ in the graph $G'$ consisting of the edges added by the healing algorithm. Let $\delta(v)$ be the degree increase of the vertex $v$ compared to its initial degree.
Note that this is not the same as the degree of $v$ in $G'$. When a node $v$ is deleted, partition the neighbors of $v$ in $G$ that do not have the same ID as $v$ according to their ID. Let $UN(v, G)$ (Unique Neighbors) be the set having one representative from each of the partitions. If there is more than one node as a possible representative from a partition, we include the one with the lowest initial ID. Note that $UN(v, G) \cap N(v, G') = \emptyset$ and $UN(v, G) \cup N(v, G') \subseteq N(v, G)$. The ID of a node allows us to keep track of which connected component in $G'$ it belongs to. The lowest ID of any node in that component is broadcast and all the nodes in the component take on this ID.

Algorithm 1 DASH: Degree-Based Self-Healing
1: Init: for the given network $G(V, E)$, initialize each vertex with an ID selected uniformly at random from [0,1].
2: while true do
3:   If a vertex $v$ is deleted, do
4:   Nodes in $UN(v, G) \cup N(v, G')$ are reconnected into a complete binary tree. To connect the tree, go left to right, top down, mapping nodes to the complete binary tree in increasing order of $\delta$ value.
5:   Let MINID be the minimum ID of any node in $UN(v, G) \cup N(v, G')$. Propagate MINID to all the nodes in the tree of $UN(v, G) \cup N(v, G')$ in $G'$. All these nodes now set their ID to MINID.
6: end while

Our main results about DASH are stated in Theorem 1.

Theorem 1. DASH is a distributed algorithm with the following properties:
• The degree of any vertex is increased by at most $2 \log n$.
• The latency to reconnect is $O(1)$.
• The number of messages any node of degree $d$ sends out and receives is no more than $2(d + 2 \log n) \ln n$ with high probability over all node deletions.
• The amortized latency for ID propagation is $O(\log n)$ with high probability over all node deletions.

2.2. Proof of Theorem 1

For analysis, we use the following definitions:
• Let $T(x, y)$ be the tree in $G' - y$ that contains $x$.
• Each vertex $v$ will have a weight, $w(v)$. The weight of a vertex will start at 1 and may increase during the algorithm. If $v$ is deleted, $w(v)$ is added to an arbitrarily chosen neighbor in $G'$.
• Let $W(S) = \sum_{v \in V} w(v)$ for a graph $S(V, E)$, i.e. the sum of the weights of all vertices in $S$.
• For vertex $v$, let
  $\mathrm{rem}(v) = \sum_{u \in N(v, G')} W(T(u, v)) - \max_{u \in N(v, G')} W(T(u, v)) + w(v)$.
  We will show that as the degree of a vertex increases in our algorithm, so will the rem value of that vertex. Intuitively, $\mathrm{rem}(v)$ is large when removing $v$ from its tree in $G'$ gives rise to many connected components with large weight.

Lemma 1. The edges added by the algorithm, $E'$, form a forest.

Proof. We prove this by induction on the number of nodes deleted.
Base Case: Initially, $G'$ is a forest because $E'$ is empty.
Inductive Step: We note that $E'$ and $G'$ change only when a deletion occurs. Consider the $i$-th deletion and let $v$ be the node deleted. Let $v$ belong to tree $T_v$ in $G'$ just prior to the deletion of $v$. Now, for all $x, y \in N(v, G')$, $x$ and $y$ are not connected in $E'$ other than through $v$, since that would have implied the existence of a cycle through $v$, contradicting the Inductive Hypothesis. Note also that for all $z \in UN(v, G)$, $z \notin T_v$. Since we select only one node from each tree $T_i$ in which $v$ had a neighbor, no pair of nodes in $UN(v, G) \cup N(v, G')$ are connected in $G'$ once $v$ is removed. We reconnect all the nodes in $UN(v, G) \cup N(v, G')$ in a binary tree and propagate the minimum ID. Since we are adding edges between nodes which previously were in separate connected components in $G'$, no cycles are introduced. Hence, $G'$ remains a forest.

Lemma 2. For any vertex $v$, $\mathrm{rem}(v)$ is non-decreasing over any vertex deletion where $v$ has not been deleted.

Proof.
By Lemma 1, every vertex $v$ in $G'$ belongs to some tree, which we will call $T_v$. For every $T_v$ in $G'$, $W(T_v)$ is the sum of the weights of all vertices in $T_v$. By definition,
$\mathrm{rem}(v) = \sum_{u \in N(v, G')} W(T(u, v)) - \max_{u \in N(v, G')} W(T(u, v)) + w(v)$.
Therefore,
$\mathrm{rem}(v) = W(T_v) - \max_{u \in N(v, G')} W(T(u, v))$.
Observe first that $W(T_v)$ cannot decrease even when there is a deletion in $T_v$, because the deleted vertex's weight is not "lost", but added to some member of $T_v$. Since $W(T_v)$ cannot decrease, $\mathrm{rem}(v)$ can only decrease if the maximum subtree weight increases more than $W(T_v)$. Since the maximum subtree is a subset of the tree $T_v$, any increase or decrease in the maximum subtree is also counted in $W(T_v)$. Thus, $\mathrm{rem}(v)$ cannot decrease.

Lemma 3. For any node $v$, for all nodes $q \in N(v, G')$, $W(T(v, q)) \ge \mathrm{rem}(v)$.

Figure 1. $W(T(v, m)) \ge \mathrm{rem}(v)$.

Proof. For all nodes $q$,
$W(T(v, q)) = \sum_{u \in N(v, G'),\, u \ne q} W(T(u, v)) + w(v) \ge \sum_{u \in N(v, G')} W(T(u, v)) - \max_{u \in N(v, G')} W(T(u, v)) + w(v) = \mathrm{rem}(v)$.
For example, in Figure 1, $W(T(v, m)) = W(T(l, v)) + W(T(r, v)) + w(v) \ge \mathrm{rem}(v)$.

Lemma 4. For any node $v$, $\mathrm{rem}(v) \ge 2^{\delta(v)/2}$, where $\delta(v)$, as defined earlier, is the degree increase of the vertex $v$ in $G$.

Proof. Let $t$ be the number of rounds of healing, where a round is a single adversarial deletion followed by self-healing by DASH. We prove this lemma by induction on $t$. Let $G'_t$, $\mathrm{rem}_t(v)$ and $\delta_t(v)$ be $G'$, $\mathrm{rem}(v)$ and $\delta(v)$ respectively at time $t$.
Base Case: $t = 0$. In this case, all nodes $v$ have $\delta(v) = 0$ and $\mathrm{rem}(v) = 1$. Thus, $\mathrm{rem}(v) \ge 2^0$.
Inductive Step: Consider the network at round $t$. We assume by the inductive hypothesis that for all nodes $v$ in $G'$, $\mathrm{rem}_{t-1}(v) \ge 2^{\delta_{t-1}(v)/2}$. Our goal is to show that $\mathrm{rem}_t(v) \ge 2^{\delta_t(v)/2}$.
Suppose node $x$ was deleted at round $t$. According to our algorithm, some or all of the neighbors of $x$ will be reconnected as a binary tree. Let us call this tree $RT$ (short for Reconstruction Tree). Let $T(x, y)$ be the tree in $G'_{t-1} - y$ that contains $x$, and $T'(x, y)$ be the tree in $G'_t - y$ that contains $x$. Consider a surviving vertex $v$. If $v$ is not a part of $RT$, then by a simple application of Lemma 2, our induction holds. If $v$ is a part of $RT$, there are 3 possibilities:

1. $v$ is a leaf node in $RT$.
The degree of $v$ did not change. Thus, $\delta_t(v) = \delta_{t-1}(v)$. By Lemma 2, $\mathrm{rem}_t(v) \ge \mathrm{rem}_{t-1}(v)$. Thus, using the induction hypothesis, $\mathrm{rem}_t(v) \ge 2^{\delta_t(v)/2}$.

2. $v$ is the root of $RT$.

Figure 2. Node $v$ is the root, with 2 children.

If $v$ has only one child in $RT$, then this is the same as the previous case with the parent and child roles reversed, and the induction holds. Let us consider the case when $v$ has two children in $RT$. Now, $\delta_t(v)$ has increased by 1. Let $z$ be the neighbor of $v$ such that $W(T(z, v))$ is the largest among all neighbors of $v$ except $x$. Note that $W(T'(z, v)) = W(T(z, v))$, since this subtree was not involved in the reconstruction. Consider the possibly empty subtree of $v$ rooted at $z$. Let the two children of $v$ in $RT$ be $w_1$ and $w_2$, as illustrated in Figure 2. By our algorithm, we know that $\delta_{t-1}(w_1) \ge \delta_{t-1}(v)$ and $\delta_{t-1}(w_2) \ge \delta_{t-1}(v)$. Thus, using the inductive hypothesis and Lemma 3, we have that
$W(T(w_1, x)) \ge \mathrm{rem}_{t-1}(w_1) \ge 2^{\delta_{t-1}(w_1)/2}$ and $W(T(w_2, x)) \ge \mathrm{rem}_{t-1}(w_2) \ge 2^{\delta_{t-1}(w_2)/2}$.
By Lemma 2, this implies that in $G'_t$,
$W(T'(w_1, v)) \ge 2^{\delta_{t-1}(w_1)/2} \ge 2^{\delta_{t-1}(v)/2}$,
$W(T'(w_2, v)) \ge 2^{\delta_{t-1}(w_2)/2} \ge 2^{\delta_{t-1}(v)/2}$.
Assume without loss of generality that $W(T'(w_1, v)) \le W(T'(w_2, v))$.
There are two cases:

(a) $W(T(z, v)) < W(T'(w_1, v))$.
In this case $\mathrm{rem}_{t-1}(v)$ did not include $W(T(x, v))$. But $\mathrm{rem}_t(v)$ will include $W(T'(w_1, v))$. Hence,
$\mathrm{rem}_t(v) \ge \mathrm{rem}_{t-1}(v) + W(T'(w_1, v)) \ge 2^{\delta_{t-1}(v)/2} + 2^{\delta_{t-1}(v)/2} = 2^{(\delta_{t-1}(v)+2)/2} = 2^{(\delta_t(v)+1)/2}$.

(b) $W(T(z, v)) \ge W(T'(w_1, v))$.
In this case $\mathrm{rem}_t(v)$ will include $W(T'(w_1, v))$ and the smaller of $W(T'(w_2, v))$ and $W(T'(z, v))$. Note that by Lemmas 3 and 2, the inductive hypothesis, and the fact that $\delta_{t-1}(w_1) \ge \delta_{t-1}(v)$,
$W(T'(w_1, v)) \ge \mathrm{rem}_t(w_1) \ge \mathrm{rem}_{t-1}(w_1) \ge 2^{\delta_{t-1}(w_1)/2} \ge 2^{\delta_{t-1}(v)/2}$.
Also, since by assumption $W(T'(w_2, v)) \ge W(T'(w_1, v))$, we know that $W(T'(w_2, v)) \ge 2^{\delta_{t-1}(v)/2}$. Further, since $W(T'(z, v)) = W(T(z, v)) \ge W(T'(w_1, v))$, we know that $W(T'(z, v)) \ge 2^{\delta_{t-1}(v)/2}$. Hence,
$\mathrm{rem}_t(v) \ge 2^{\delta_{t-1}(v)/2} + 2^{\delta_{t-1}(v)/2} = 2^{(\delta_{t-1}(v)+2)/2} = 2^{(\delta_t(v)+1)/2}$.

3. $v$ is an internal node in $RT$.

Figure 3. Internal node $v$ with 1 child.

Figure 4. Internal node $v$ with 2 children.

For node $v$ to become an internal node, the deleted neighbor $x$ must have at least three other neighbors. Three neighbors of $x$ are shown as $C_1$, $C_2$ and $P$ in Figures 3 and 4. Also, now $v$'s degree can increase by 1, as illustrated in Figure 3, or by 2, as illustrated in Figure 4. Let us consider these cases separately:

(a) $\delta_t(v) = \delta_{t-1}(v) + 1$.
This can only happen when $v$ has a parent and a single child in $RT$, as in Figure 3. Let $P$ be the parent of $v$ and $C_1$ the child of $v$. $C_1$ has to be a leaf node since the tree is complete and $v$ has only one child. Observe that there exists at least one leaf node besides $C_1$ in the tree, accessible to $v$ only via $P$. Let this node be $C_2$ and let $P_2$ be its parent. Note that $P_2$ and $P$ may even be the same node.
In our algorithm, any leaf node in $RT$ has a $\delta$ value no less than the $\delta$ value of any internal node. Thus,
$\delta_{t-1}(C_1) \ge \delta_{t-1}(v)$ and $\delta_{t-1}(C_2) \ge \delta_{t-1}(v)$.
These inequalities, Lemmas 2 and 3, and the Inductive Hypothesis imply that
$W(T'(C_1, v)) \ge \mathrm{rem}_t(C_1) \ge \mathrm{rem}_{t-1}(C_1) \ge 2^{\delta_{t-1}(v)/2}$;
$W(T'(C_2, P_2)) \ge \mathrm{rem}_t(C_2) \ge \mathrm{rem}_{t-1}(C_2) \ge 2^{\delta_{t-1}(v)/2}$;
$W(T(v, x)) \ge \mathrm{rem}_t(v) \ge \mathrm{rem}_{t-1}(v) \ge 2^{\delta_{t-1}(v)/2}$.
Since $\mathrm{rem}_t(v)$ can exclude at most one of $W(T'(C_1, v))$, $W(T'(C_2, P_2))$ and $W(T(v, x))$,
$\mathrm{rem}_t(v) \ge 2^{\delta_{t-1}(v)/2} + 2^{\delta_{t-1}(v)/2} = 2^{(\delta_t(v)+1)/2}$.

(b) $\delta_t(v) = \delta_{t-1}(v) + 2$.
In this case $v$ has two children in $RT$, $C_1$ and $C_2$, as illustrated in Figure 4. The analysis is similar to the case above. The value $\mathrm{rem}_t(v)$ can exclude at most one of $W(T'(C_1, v))$, $W(T'(C_2, v))$ and $W(T(v, x))$, and we can show that all three of these values are at least $2^{\delta_{t-1}(v)/2}$. Thus, $\mathrm{rem}_t(v) \ge 2^{\delta_t(v)/2}$.

Hence, the induction holds.

Lemma 5. For all vertices $v$, $\mathrm{rem}(v)$ is always no more than $n$.

Proof. No vertex is counted twice in a rem value since the subtrees of a vertex are disjoint. Since the number of vertices in the subtrees cannot be more than the number of vertices remaining, the rem value is always no more than the sum of the weights of all undeleted vertices in $G'$. Define $W^*$ to be the sum of weights of all undeleted vertices in $G'$. After initialization, $W^* = n$, since there are $n$ vertices. At each step of the algorithm, $W^* = n$, since the weight of the deleted vertex is added to one of the remaining vertices. Thus, for node $v$, $\mathrm{rem}(v) \le n$.

Lemma 6. DASH increases the degree of any vertex by at most $O(\log n)$.

Proof. Every vertex $v$ starts with $\mathrm{rem}(v) = w(v) = 1$. We know that $\mathrm{rem}(v) \ge 2^{\delta(v)/2}$ by Lemma 4. Since $\mathrm{rem}(v)$ is at most $n$, $2^{\delta(v)/2} \le n$.
Taking log of both sides, $\delta(v)/2 \le \log n$. Solving for $\delta(v)$ gives $\delta(v) \le 2 \log n$.

Lemma 7. The latency to reconnect the network in DASH is $O(1)$.

Proof. During the reconnection process, DASH requires communication only between nodes one hop away; thus, the latency is just $O(1)$.

Lemma 8. The number of messages any node of initial degree $d$ sends out and receives is no more than $2(d + 2 \log n) \ln n$ with high probability over all node deletions.

Proof. In DASH, after the reconnections have been made, messages are sent out by nodes when the minimum ID has to be propagated. By analogy with the record breaking problem [10], it is easily shown that w.h.p. a node has its ID reduced no more than $2 \ln n$ times, where the record is the node's ID. These are the only messages the node needs to transmit or receive. Each time its ID changes, the node sends this message to all its neighbors. Thus, it sends or receives $O((d + \log n) \ln n)$ messages, since the final degree of the node is at most $d + 2 \log n$.

Lemma 9. The amortized latency for ID propagation is $O(\log n)$ with high probability over all node deletions.

Proof. Again, by analogy with the record breaking problem, a node sends messages to its neighbors (neighbors, by definition, are a single hop away) only $O(\log n)$ times with high probability. Thus, messages are transmitted $O(n \log n)$ times over all the nodes. Over $O(n)$ deletions, this implies that the amortized latency for messages (involving ID propagation) is only $O(\log n)$.

2.3. Proof of Theorem 1

The proof of Theorem 1 now follows immediately from Lemmas 6, 7, 8 and 9.

3. Lower bounds on locality-aware algorithms

3.1. Necessity of component tracking for healing strategies

To begin with, we give an insight as to why a healing strategy might need to keep track of connected components.

Lemma 10.
For a tree, deletion of a node of degree $d$ increases the sum total of degrees of its neighbors by $d - 2$ for a locality-aware acyclic healing strategy.

Proof. A locality-aware acyclic healing strategy will reconnect the neighbors of a deleted node without creating any cycles. If there were no cycles in the original graph involving the neighbors and not involving the deleted node, then such a strategy can only reconnect these neighbors as a tree to maintain their connectivity. A node of degree $d$ has $d$ neighbors. Since it was part of a tree, this node and its neighbors also constitute a tree. Let us call this the immediate subtree. The immediate subtree had $d$ edges and a total of $2d$ degrees. These $d$ neighbors are now reconnected as a tree with $d - 1$ edges and $2(d - 1)$ degrees. Each of these neighbors lost a single degree due to the deletion of its edge to the deleted node. Thus, the total degrees gained on reconstruction are $2(d - 1) - d = d - 2$.

It is reasonable to assume that an efficient healing algorithm adds close to the minimum possible number of edges at each step to maintain connectivity of the neighbors of the deleted node. In $G'$, if a deleted node $v$ had two neighbors which had an alternate path between themselves not involving $v$, then the algorithm may need to use only one of them for reconnection to other nodes. By extension, if there were many neighbors which had alternate connections between them, the algorithm may need to use only one of these nodes. This is equivalent to stating that the algorithm may need to use only one node from a connected component. Knowing that certain nodes are in the same component would allow the algorithm to do this. $G'$ is comprised only of edges added by the healing algorithm, and is always a forest.
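The degree count in Lemma 10 can be checked numerically on the smallest relevant case. The sketch below is an illustration only: it assumes a star whose center is the deleted node and heals the neighbors into a path, which is just one of the possible trees; the function name `degree_gain` is hypothetical.

```python
def degree_gain(d):
    """Star with a deleted center of degree d. The neighbors 1..d are healed
    into a path, which is one possible tree on d nodes. Returns the change in
    the sum of the neighbors' degrees."""
    before = d  # each neighbor had degree 1: its edge to the center
    deg = {u: 0 for u in range(1, d + 1)}
    for u in range(1, d):  # path edges (1,2), (2,3), ..., (d-1,d)
        deg[u] += 1
        deg[u + 1] += 1
    return sum(deg.values()) - before


gains = {d: degree_gain(d) for d in (3, 4, 10)}
```

Any other tree on the same $d$ neighbors also has $d - 1$ edges, so the total would be the same; only the way the increase is distributed among the neighbors changes.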
If the adversary mainly deletes nodes with degree greater than 2 and the algorithm does not use the component information, the sum total of degrees of the neighbors of the deleted nodes will increase by $d - 2$, i.e. at least 1, at each step. After many ($O(n)$) deletions, only a few nodes will be left, and these will have $O(n)$ degree increase.

3.2. A lower bound on healing by degree-bounded locality-aware healing algorithms

We now prove our result regarding the lower bounds for locality-aware algorithms in Theorem 2. Our lower bound occurs on graphs that are originally trees. To state the proof, we need to prove some other lemmas.

First, we define the following operation that the adversary can perform on trees, where we assume self-healing is applied after every deletion:

Prune(r, s): For a node $r$ and its subtree headed by node $s$, the Prune operation on $s$ leads to deletion of all the nodes in that subtree including $s$. This operation can be accomplished by repeatedly deleting leaf nodes in the subtree until all the nodes including $s$ are deleted.

Figure 5. Steps in Prune(v, x). Leaf nodes are deleted at each step.

Lemma 11. Deletion of a node with degree at least 3 increases the degree of at least one node by at least 1, no matter how the healing occurs.

Proof. Any reconnection of more than two nodes has a 3-node line (as in Figure 6) as a subgraph. Here the internal node has a degree increase of 1. Thus, at least one node increases its degree by at least 1.

Figure 6. An internal node in a 3-node line reconnection suffers a degree increase.

For further discussion, we define the following:

Degree-bounded / M-degree-bounded: A healing algorithm is degree-bounded or $M$-degree-bounded if any node can increase its degree by at most $M$ in a single round of deletion and healing.

Lemma 12. Consider an $M$-degree-bounded locality-aware healing algorithm used on a tree.
In such a situation, deletion of a node $v$ with degree at least $M + 3$ leads to degree increase for at least two neighbors of $v$.

Proof. Node $v$ has $M + 3$ neighbors. By Lemma 10, the sum total of degree increase of neighbors is $M + 1$ when the graph is a tree. Since one node can get a maximum degree increase of $M$, at least one other node has to incur the rest of the degree increase. Thus, at least two nodes have to increase their degrees.

Figure 7. $(M+2)$-ary tree.

Algorithm 2 LEVELATTACK: level-by-level attack on an $(M+2)$-ary tree
1: Consider an $(M+2)$-ary tree $T$ of depth $D$ with levels numbered 0 to $D$, the root being at level 0.
2: $i \leftarrow D - 1$
3: while $i \ge 0$ do
4:   for each node $v$ at level $i$ do
5:     if $v$ has $c > M + 2$ children, remove the excess $c - (M + 2)$ nodes by deleting those with least degree increases and their subtrees by using the Prune operation, so that $v$ now has $M + 2$ children.
6:     delete $v$.
7:   end for
8:   $i \leftarrow i - 1$
9: end while

Here, we introduce a new attack strategy:

LEVELATTACK: This strategy is described in Algorithm 2. In brief, the adversary deletes nodes one level at a time, beginning one level above the leaves of an $(M+2)$-ary complete tree and going up to the root.

The reasoning behind the strategy is the following: if the adversary deletes a node of degree $M + 3$ in a tree, this ensures that a degree increase of at least 1 is passed to its children. What the adversary must do is ensure that $\log n$ of these degree increases are credited to the same node.

Lemma 13. Assume an $(M+2)$-ary tree $T$, a degree-bounded locality-aware healing algorithm and the LEVELATTACK adversarial strategy. Then, when LEVELATTACK deletes a node at level $i$, $0 < i < D$, some leaf node of the original tree increases its degree by at least $D - i$.

Proof. The proof is by induction.
Base case: In the LEVELATTACK strategy, the nodes at level $D - 1$ are deleted first.
Thus, a deletion of a node at level $D - 1$ is our base case. A node at level $D - 1$ has $M + 3$ neighbors. By Lemma 12, there is at least one leaf node that increases its degree by 1 or more. Thus, the base case holds.
Inductive step: Assume the hypothesis holds for nodes at level $i + 1$. We now show that it holds for nodes at level $i$. Consider a node, say $X$, at level $i \ge 0$. It had $M + 2$ children at level $i + 1$. By the inductive hypothesis, each of these deletions led to at least one node with degree increase $D - (i + 1)$. Moreover, $X$ is not among these $M + 2$ nodes. Moreover, all of these are now neighbors of $X$, since $X$ itself was involved in each of these deletions. The Prune operation in step 5 retains only these $M + 2$ nodes as children of $X$. Each of these children has degree increase $D - (i + 1)$ and was originally a leaf node of $T$. The adversary now deletes $X$. By Lemma 12, at least one of these children incurs a degree increase.

Theorem 2. Consider any locality-aware algorithm that increases the degree of any node after an attack by at most a fixed constant. Then there exists a graph and a strategy of deletions on that graph that will force the algorithm to increase the degree of some node by at least $\log n$.

Proof. It is sufficient to give a graph and an attack strategy such that any degree-bounded locality-aware healing algorithm will have to increase a particular node's degree by $\log n$. Let $M$ be the constant degree increase that is the maximum that the healing algorithm can impose on any one node in the graph. Then, for a graph which is a full $(M+2)$-ary tree (Figure 7), the adversary uses LEVELATTACK. Consider an $(M+2)$-ary tree $T$ of depth $D$ with levels numbered 0 to $D$. By Lemma 13, after the last deletion in the adversary strategy, which is the deletion of the root of $T$, i.e. the node at level 0, there is at least one node left which has a degree increase of $D$.
Since D is Θ(log n) for constant M, this adversary strategy achieves a degree increase of Ω(log n).

4. Experiments

We carried out a number of experiments to ascertain the performance of various healing algorithms. We used a number of attack strategies to measure how different healing strategies performed with regard to degree increase and stretch, where stretch is the maximum ratio of distance increase in the healed network compared to the original network, over all pairs of nodes. Our empirical results on stretch and a heuristic for maintaining low stretch are described in Section 4.6.

4.1. Methodology

Most of our experiments were conducted on random graphs. These graphs were generated by the preferential attachment model proposed by Barabási [3, 4]. The experimental approach was the following:

• For each graph size, for a particular deletion and healing strategy, repeat for 30 random instances of the graph:
  – Repeat while there are nodes in the graph:
    ∗ delete a single node according to the deletion strategy.
    ∗ repair according to the self-healing strategy.
    ∗ measure the statistics (e.g. maximum change of degree for any node) for the graph.
• Average the statistics for each graph size.

4.2. Attack Strategies

The aim of the adversary is to collapse the network by trying to overload a node beyond its maximum capacity. There are many possible attack strategies. One strategy is to delete the node with the maximum degree. We call this the MaxNodeStrategy. It would seem that a strategy that places additional burden on an already heavily burdened node would be a good strategy. For the adversary, one good strategy is to continuously delete a randomly chosen neighbor of the highest-degree node in the network. We call this the NeighborOfMaxStrategy (NMS).
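The two attack strategies named above can be sketched in a few lines. This is a minimal illustration, not the code used in the experiments: the adjacency-dictionary representation, the function names `max_node`, `neighbor_of_max` and `delete_node`, and the tie-breaking by smallest label are all assumptions made for the example.

```python
import random


def max_node(adj):
    """MaxNode-style choice: a node of maximum degree.

    Ties are broken by smallest label (an assumption of this sketch).
    """
    return max(sorted(adj), key=lambda v: len(adj[v]))


def neighbor_of_max(adj, rng=random):
    """NMS-style choice: a random neighbor of a maximum-degree node."""
    hub = max_node(adj)
    neighbors = sorted(adj[hub])
    if not neighbors:
        # The hub is isolated, so there is no neighbor to attack;
        # falling back to the hub itself is a choice made for this sketch.
        return hub
    return rng.choice(neighbors)


def delete_node(adj, v):
    """Remove v and all of its incident edges from the graph in place."""
    for u in adj.pop(v):
        adj[u].discard(v)


if __name__ == "__main__":
    # A star: node 0 is the hub, nodes 1..3 are its neighbors.
    graph = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
    print("MaxNode target:", max_node(graph))
    print("NMS target:", neighbor_of_max(graph, random.Random(7)))
```

Passing a seeded `random.Random` instance makes a run of the NMS choice repeatable, which is convenient when averaging over many graph instances.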
This strategy also seems plausible for the kinds of networks we are considering: it is reasonable to expect the hubs, or high-degree nodes, to be better protected and more resilient to attack, while their less significant neighbors should be easier to take down.

4.3. Healing strategies

We tried various locality-aware healing strategies, some of which are the following:

• Graph heal: On each deletion, we reconnect the neighbors of the deleted node in a binary tree, regardless of whether this introduces any cycles in the graph formed by the new edges introduced for healing. This is a naive algorithm, since the nodes use more edges than are required for maintaining connectivity.
• Binary tree heal: On each deletion, we reconnect the neighbors of the deleted node in a binary tree, being careful not to introduce any cycles in the graph formed by the new edges introduced for healing. This is done using random IDs, which can then be used to identify which tree a particular node belongs to. This is an improvement on the previous algorithm but still naive, since it does not take into consideration the previous degree increase suffered by nodes during healing.
• DASH (degree-based binary tree heal): DASH is smarter than the previous algorithms, as borne out by the results of the experiments. The DASH algorithm was described in Section 2.1 and stated as Algorithm 1.
• SDASH (surrogate degree-based binary tree heal): A heuristic based on DASH that tries to keep both node degrees and path lengths small (described in Section 4.6.2).

4.4. Degree increase

The NeighborOfMaxStrategy consistently resulted in higher degree increase; hence, we report results for only this attack strategy. Our experimental results clearly show that DASH and SDASH are good healing strategies. They performed well against both adversary strategies.
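To make the degree-based reconnection concrete, the sketch below builds the healing edges for one deletion. It covers only the tree-building step, following the rule stated in Algorithm 3 (nodes are placed in a complete binary tree top down, left to right, in increasing order of δ). The function names, the dictionary `delta` of prior degree increases, and the tie-breaking by label are assumptions of this example; the ID bookkeeping that DASH uses to decide which neighbors take part is omitted.

```python
from collections import Counter


def degree_ordered_tree_edges(nodes, delta):
    """Healing edges that join `nodes` into a complete binary tree.

    Nodes are laid out in heap order (top down, left to right) by
    increasing delta, so nodes that have already absorbed the most
    degree increase end up as leaves and gain only one new edge.
    """
    order = sorted(nodes, key=lambda u: (delta.get(u, 0), u))
    # In heap order, the parent of position i is position (i - 1) // 2.
    return [(order[(i - 1) // 2], order[i]) for i in range(1, len(order))]


def added_degree(edges):
    """Number of new healing edges incident to each node."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


if __name__ == "__main__":
    prior = {"a": 3, "b": 0, "c": 1, "d": 0, "e": 2}
    edges = degree_ordered_tree_edges(prior.keys(), prior)
    print(edges)
    print(dict(added_degree(edges)))
```

In a binary tree no node gains more than three new edges (one to its parent, two to its children), and the ordering by δ steers those larger gains toward nodes that have so far been burdened least.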
Figure 8 shows that DASH and SDASH have much lower degree increase than the other, more naive strategies. Also, this degree increase was less than log n, which is consistent with our theoretical results. SDASH has the additional nice property that it keeps path lengths small over multiple adversarial deletions.

Figure 8. Maximum degree increase: DASH vs. other algorithms

4.5. Messages

Figure 9(a) shows that the number of times a node's ID changes is less than log n, as expected, for all healing strategies. Figure 9(b) shows the maximum number of messages a node sent out for the different strategies. Note that the number of messages a node sends out has to be less than or equal to the number of times the node changes ID times the degree of the node. Thus, algorithms with higher degree increase perform poorly.

4.6. Heuristics and experiments involving Stretch

4.6.1 Stretch

Stretch is an important property we would also like our self-healing algorithms to minimize. The stretch for any two nodes is the ratio between their distance in the new healed network and their distance in the original network. Stretch for the network is the maximum stretch over all pairs of nodes. Stretch is also closely related to the diameter of the network. In some sense, maintaining low degree increase and low stretch are contradictory aims, since a high-degree node will lead to shorter paths and possibly lower stretch in the network.

Figure 9. ID changes and messages exchanged per node: (a) ID changes for nodes; (b) number of messages exchanged for Component(ID) information maintenance

4.6.2 SDASH: a strategy with good empirical results

SDASH is an algorithm we have devised which empirically has both low degree increase and low stretch. During self-healing, we say a node surrogates if it replaces its deleted neighbor in the network, i.e., it takes all the connections of the deleted neighbor to itself.
Surrogation never increases stretch since the paths never increase in length. In certain situations, it turns out that surrogation can be done without degree increase. In such situations, SDASH does surrogation; otherwise it simply applies DASH. SDASH is described in Algorithm 3.

Algorithm 3 SDASH: Surrogate Degree-Based Self-Healing
1: Init: for given network G(V, E), initialize each vertex with a random number ID in [0, 1] selected uniformly at random.
2: while true do
3:   If a vertex v is deleted, do
4:   Let m ∈ UN(v, G) ∪ N(v, G′) be the node with maximum degree increase (δ) of all nodes in UN(v, G) ∪ N(v, G′).
5:   if there is a node w ∈ UN(v, G) ∪ N(v, G′) with δ(w) + |UN(v, G) ∪ N(v, G′)| − 1 ≤ δ(m) then
6:     connect all nodes in UN(v, G) ∪ N(v, G′) to w.
7:   else
8:     Nodes in UN(v, G) ∪ N(v, G′) are reconnected into a complete binary tree. To connect the tree, go left to right, top down, mapping nodes to the complete binary tree in increasing order of δ value.
9:   end if
10:  Let MINID be the minimum ID of any node in UN(v, G) ∪ N(v, G′). Propagate MINID to all the nodes in the tree of UN(v, G) ∪ N(v, G′) in G′. All these nodes now set their ID to MINID.
11: end while

As can be seen in the figures that follow, SDASH seems to allow a degree increase of up to O(log n) and stretch of up to O(log n). We are working on proving theoretical properties of this algorithm.

4.6.3 Stretch: empirical results

Figure 10 shows the performance of some of our algorithms for stretch. We determined that the MaxNodeStrategy is most effective for the adversary when trying to maximize stretch, and so our results in Figure 10 are against that adversarial strategy. The more naive degree-control healing strategies do a good job of minimizing stretch.
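The stretch metric itself can be computed with breadth-first search. The sketch below is one way to do it under stated assumptions: graphs are unweighted adjacency dictionaries, only pairs of nodes that survive in the healed network are compared, and pairs that were not connected in the original network are skipped. The function names `bfs_distances` and `network_stretch` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def network_stretch(original, healed):
    """Maximum, over surviving pairs, of healed distance / original distance."""
    worst = 0.0
    for s in healed:
        d_old = bfs_distances(original, s)
        d_new = bfs_distances(healed, s)
        for t in healed:
            if t == s or t not in d_old:
                continue  # same node, or not connected in the original
            if t not in d_new:
                return float("inf")  # healing failed to keep s and t connected
            worst = max(worst, d_new[t] / d_old[t])
    return worst


if __name__ == "__main__":
    # Original: a star with hub 0. Healed: hub deleted, leaves joined in a path.
    original = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    healed = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    print(network_stretch(original, healed))
```

In this example the two end leaves were 2 hops apart through the hub and are 3 hops apart after healing, so the path-shaped repair has stretch 1.5; a repair that attaches all leaves to one surrogate would keep every pair within 2 hops.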
However, it is important to keep in mind that these more naive algorithms increase the node degrees to a point where they are unlikely to be useful for many applications. In contrast, our experiments show that SDASH does a good job of minimizing both stretch and degree increase.

Figure 10. Stretch for various algorithms

5. Conclusions and future work

We have studied the problem of self-healing in networks that are reconfigurable in the sense that new edges can be added to the network. We have described DASH, a simple, efficient and localized algorithm for self-healing that provably maintains network connectivity while increasing the degree of any node by no more than O(log n). We have shown that DASH is asymptotically optimal in terms of minimizing the degree increase of any node. Further, we have presented empirical results on power-law networks showing that DASH significantly outperforms the naive algorithms for this problem.

Several interesting problems remain open, including the following: Can we not only maintain connectivity, but also provably ensure that lengths of shortest paths in the graph do not increase by too much? Can we remove the need for propagating IDs in order to maintain connected component information, or is such information strictly necessary to keep the degree increase small? Can we use the self-healing idea to protect invariants for combinatorial objects besides graphs? For example, can we provide algorithms to rewire a circuit so that it maintains essential functionality even when multiple gates fail?

5.1 Acknowledgments

We gratefully acknowledge the help of Iching Boman, Dr. Deepak Kapur and his class Introduction to Proofs, Logic and Term-rewriting, and the UNM Computer Science Theory Seminar in writing this paper.

References
[1] D. Andersen, H. Balakrishnan, F. Kaashoek, and R. Morris. Resilient overlay networks. SIGOPS Oper. Syst. Rev., 35(5):131–145, 2001.
[2] V. Arak.
What happened on August 16, August 2007.
[3] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509, 1999.
[4] A.-L. Barabási and E. Bonabeau. Scale-free networks. Scientific American, pages 50–59, 2003.
[5] I. Boman, J. Saia, C. T. Abdallah, and E. Schamiloglu. Brief announcement: Self-healing algorithms for reconfigurable networks. In Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS), 2006.
[6] I.-C. C. Boman. Algorithms for self-healing networks. M.S. Thesis, Computer Science, University of New Mexico, 2006.
[7] R. D. Doverspike and B. Wilson. Comparison of capacity efficiency of DCS network restoration routing techniques. J. Network Syst. Manage., 2(2), 1994.
[8] K. Fisher. Skype talks of "perfect storm" that caused outage, clarifies blame, August 2007.
[9] T. Frisanco. Optimal spare capacity design for various protection switching methods in ATM networks. In Communications, 1997. ICC 97 Montreal, 'Towards the Knowledge Millennium'. 1997 IEEE International Conference on, volume 1, pages 293–298, 1997.
[10] N. Glick. Breaking records and breaking boards. In The American Mathematical Monthly, volume 85, pages 2–26, January 1978.
[11] S. Goel, S. Belardo, and L. Iwan. A resilient network that can operate under duress: To support communication between government agencies during crisis situations. Proceedings of the 37th Hawaii International Conference on System Sciences, 0-7695-2056-1/04:1–11, 2004.
[12] R. R. Iraschko, M. H. MacGregor, and W. D. Grover. Optimal capacity placement for path restoration in STM or ATM mesh-survivable networks. IEEE/ACM Trans. Netw., 6(3):325–336, 1998.
[13] O. Malik. Does Skype Outage Expose P2P's Limitations?, August 2007.
[14] G. S. Manku, M. Naor, and U. Wieder. Know thy neighbor's neighbor: the power of lookahead in randomized P2P networks.
In Proceedings of the 36th ACM Symposium on Theory of Computing (STOC), 2004.
[15] M. Medard, S. G. Finn, and R. A. Barry. Redundant trees for preplanned recovery in arbitrary vertex-redundant or edge-redundant graphs. IEEE/ACM Transactions on Networking, 7(5):641–652, 1999.
[16] M. Moore. Skype's outage not a hang-up for user base, August 2007.
[17] K. Murakami and H. S. Kim. Comparative study on restoration schemes of survivable ATM networks. In INFOCOM (1), pages 345–352, 1997.
[18] M. Naor and U. Wieder. Know thy neighbor's neighbor: Better routing for skip-graphs and small worlds. In The Third International Workshop on Peer-to-Peer Systems (IPTPS), 2004.
[19] B. Ray. Skype hangs up on users, August 2007.
[20] B. Stone. Skype: Microsoft Update Took Us Down, August 2007.
[21] B. van Caenegem, N. Wauters, and P. Demeester. Spare capacity assignment for different restoration strategies in mesh survivable networks. In Communications, 1997. ICC 97 Montreal, 'Towards the Knowledge Millennium'. 1997 IEEE International Conference on, volume 1, pages 288–292, 1997.
[22] Y. Xiong and L. G. Mason. Restoration strategies and spare capacity requirements in self-healing ATM networks. IEEE/ACM Trans. Netw., 7(1):98–110, 1999.
