Min-Cost 2-Connected Subgraphs With k Terminals
In the k-2VC problem, we are given an undirected graph G with edge costs and an integer k; the goal is to find a minimum-cost 2-vertex-connected subgraph of G containing at least k vertices. A slightly more general version is obtained if the input al…
Authors: Chandra Chekuri, Nitish Korula
Chandra Chekuri∗  Nitish Korula†

Abstract
In the k-2VC problem, we are given an undirected graph G with edge costs and an integer k; the goal is to find a minimum-cost 2-vertex-connected subgraph of G containing at least k vertices. A slightly more general version is obtained if the input also specifies a subset S ⊆ V of terminals and the goal is to find a subgraph containing at least k terminals. Closely related to the k-2VC problem, and in fact a special case of it, is the k-2EC problem, in which the goal is to find a minimum-cost 2-edge-connected subgraph containing k vertices. The k-2EC problem was introduced by Lau et al. [22], who also gave a poly-logarithmic approximation for it. No previous approximation algorithm was known for the more general k-2VC problem. We describe an O(log n · log k) approximation for the k-2VC problem.

1 Introduction
Connectivity and network design problems play an important role in combinatorial optimization and algorithms, both for their theoretical appeal and for their many real-world applications. An interesting and large class of problems is of the following type: given a graph G(V, E) with edge or node costs, find a minimum-cost subgraph H of G that satisfies certain connectivity properties. For example, given an integer λ > 0, one can ask for the minimum-cost spanning subgraph that is λ-edge or λ-vertex connected. If λ = 1, this is the classical minimum spanning tree (MST) problem. For λ > 1 the problem is NP-hard and also APX-hard to approximate. More general versions of connectivity problems are obtained if one seeks a subgraph in which a subset S ⊆ V of the nodes, referred to as terminals, is λ-connected. The well-known Steiner tree problem is to find a minimum-cost subgraph that (1-)connects a given set S.
Many of these problems are special cases of the survivable network design problem (SNDP). In SNDP, each pair of nodes u, v ∈ V specifies a connectivity requirement r(u, v), and the goal is to find a minimum-cost subgraph that has r(u, v) disjoint paths for each pair u, v. Given the intractability of these connectivity problems, there has been a large amount of work on approximation algorithms. A number of elegant and powerful techniques and results have been developed over the years (see [19, 25]). In particular, the primal-dual method [1, 17] and iterated rounding [20] have led to some remarkable results, including a 2-approximation for edge-connectivity SNDP [20].

An interesting class of problems, related to some of the connectivity problems described above, is obtained by requiring that only k of the given terminals be connected. These problems are partly motivated by applications in which one seeks to maximize profit given an upper bound (budget) on the cost. For example, a useful problem in vehicle routing applications is to find a path that maximizes the number of vertices in it subject to a budget B on the length of the path. In the exact optimization setting, the profit maximization problem is equivalent to the problem of minimizing the cost/length of a path subject to the constraint that at least k vertices are included. Of course, the two versions need not be approximation-equivalent; nevertheless, understanding one is often fruitful or necessary to understand the other.

∗ Dept. of Computer Science, University of Illinois, Urbana, IL 61801. Partially supported by NSF grants CCF 07-28782 and CNS-0721899, and a US-Israeli BSF grant 2002276. chekuri@cs.uiuc.edu
† Dept. of Computer Science, University of Illinois, Urbana, IL 61801. Partially supported by NSF grant CCF 07-28782. nkorula2@uiuc.edu
The most well-studied of these problems is the k-MST problem; the goal here is to find a minimum-cost subgraph of the given graph G that contains at least k vertices (or terminals). This problem has attracted considerable attention in the approximation algorithms literature, and its study has led to several new algorithmic ideas and applications [3, 15, 14, 6, 4]. We note that the Steiner tree problem can be reduced relatively easily, in an approximation-preserving fashion, to the k-MST problem. More recently, Lau et al. [22] considered the natural generalization of k-MST to higher connectivity. In particular, they defined the (k, λ)-subgraph problem as follows: find a minimum-cost subgraph of the given graph G that contains at least k vertices and is λ-edge-connected. We use the notation k-λEC to refer to this problem. In [22], an O(log³ k) approximation was claimed for the k-2EC problem. However, the algorithm and proof in [22] are incorrect. More recently, and in work independent of ours, the authors of [22] obtained a different algorithm for k-2EC that yields an O(log n log k) approximation. We give a more detailed comparison between their approach and ours later. It is also shown in [22] that a good approximation for k-λEC when λ is large would yield an improved algorithm for the k-densest subgraph problem [12]; in this problem, one seeks a k-vertex subgraph of a given graph G that has the maximum number of edges. The k-densest subgraph problem admits an O(n^δ) approximation for some fixed constant δ < 1/3 [12], but has resisted attempts at an improved approximation for a number of years now. In this paper, we consider the vertex-connectivity generalization of the k-MST problem.
We define the k-λVC problem as follows: given an integer k and a graph G with edge costs, find the minimum-cost λ-vertex-connected subgraph of G that contains at least k vertices. We also consider the terminal version of the problem, where the subgraph has to contain k terminals from a given terminal set S ⊆ V. It can be easily shown that the k-λEC problem reduces to the k-λVC problem for any k ≥ 1. We also observe that the k-λEC problem with terminals can be easily reduced, as follows, to the uniform problem where every vertex is a terminal: for each terminal v ∈ S, create n dummy vertices v_1, v_2, ..., v_n and attach each v_i to v with λ parallel edges of zero cost. Now set k′ = kn in the new graph. One can avoid using parallel edges by creating a clique on v_1, v_2, ..., v_n using zero-cost edges and connecting λ of these vertices to v. Note, however, that this reduction only works for edge-connectivity. We are not aware of a reduction from the k-λVC problem with a given set of terminals to the k-λVC problem, even when λ = 2. In this paper, we consider the k-2VC problem; our main result is the following.

Theorem 1.1. There is an O(log ℓ · log k) approximation for the k-2VC problem, where ℓ is the number of terminals.

Corollary 1.2. There is an O(log ℓ · log k) approximation for the k-2EC problem, where ℓ is the number of terminals.

One of the technical ingredients that we develop is the theorem below, which may be of independent interest. Given a graph G with edge costs and weights on terminals S ⊆ V, we define density(H) for a subgraph H to be the ratio of the cost of edges in H to the total weight of terminals in H.

Theorem 1.3. Let G be a 2-vertex-connected graph with edge costs, and let S ⊆ V be a set of terminals.
Then there is a simple cycle C containing at least 2 terminals (a non-trivial cycle) such that the density of C is at most the density of G. Moreover, such a cycle can be found in polynomial time.

Using the above theorem and an LP approach, we obtain the following.

Corollary 1.4. Given a graph G(V, E) with edge costs and ℓ terminals S ⊆ V, there is an O(log ℓ) approximation for the problem of finding a minimum-density non-trivial cycle.

Note that Theorem 1.3 and Corollary 1.4 are of interest because we seek a cycle with at least two terminals. A minimum-density cycle containing only one terminal can be found by using the well-known min-mean cycle algorithm in directed graphs [2]. We remark, however, that although we suspect that the problem of finding a minimum-density non-trivial cycle is NP-hard, we currently do not have a proof. Theorem 1.3 shows that the problem is equivalent to the dens-2VC problem, defined in the next section.

Remark: The reader may wonder whether k-2EC or k-2VC admits a constant-factor approximation, since the k-MST problem admits one. We note that the main technical tool underlying the O(1) approximations for the k-MST problem [5, 15, 11] is a special property that holds for an LP relaxation of the prize-collecting Steiner tree problem [17], which is a Lagrangian relaxation of the Steiner tree problem. Such a property is not known to hold for generalizations of k-MST, including k-2EC, k-2VC, and the k-Steiner forest problem [18]. Thus, one is forced to rely on alternative and problem-specific techniques.

1.1 Overview of Technical Ideas
We consider the rooted version of k-2VC: the goal is to find a min-cost subgraph that 2-connects at least k terminals to a specified root vertex r. It is relatively straightforward to reduce k-2VC to its rooted version (see Section 2 for details).
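The single-terminal case mentioned in the discussion of Corollary 1.4 reduces to the classical minimum mean cycle problem in directed graphs. For reference, the following Python sketch implements Karp's algorithm for that problem; it is one standard method and not necessarily the one presented in [2]. It charges one unit per edge, so a cycle's value is its cost divided by its number of edges; handling general terminal weights (a cost-to-weight ratio) would need a ratio-cycle variant that is not shown here. The function name min_mean_cycle is our own.

```python
import math


def min_mean_cycle(n, edges):
    """Karp's minimum mean cycle algorithm for a directed graph.

    n: number of vertices, labelled 0..n-1
    edges: list of directed edges (u, v, w) with weight w
    Returns the minimum mean weight of a directed cycle, or None if acyclic.
    """
    # d[k][v] = minimum weight of a walk with exactly k edges ending at v and
    # starting anywhere (as if a zero-weight super-source reached every vertex).
    d = [[math.inf] * n for _ in range(n + 1)]
    d[0] = [0] * n
    for k in range(1, n + 1):
        for u, v, w in edges:
            if d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w
    best = None
    for v in range(n):
        if d[n][v] == math.inf:
            continue
        # Karp's formula: min over v of max over k of (d_n(v) - d_k(v)) / (n - k).
        worst = max((d[n][v] - d[k][v]) / (n - k) for k in range(n) if d[k][v] < math.inf)
        if best is None or worst < best:
            best = worst
    return best
```

For example, on a directed triangle with edge weights 1, 2 and 3 the function returns 2.0, the mean weight of its only cycle.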
We draw inspiration from algorithmic ideas that led to poly-logarithmic approximations for the k-MST problem. To describe our approach to the rooted k-2VC problem, we define a closely related problem. For a subgraph H that contains r, let k(H) be the number of terminals that are 2-connected to r in H. Then the density of H is simply the ratio of the cost of H to k(H). The dens-2VC problem is to find a 2-connected subgraph of minimum density. An O(log ℓ) approximation for the dens-2VC problem (where ℓ is the number of terminals) can be derived in a somewhat standard way by using a bucketing and scaling trick on a linear programming relaxation for the problem. We exploit the known bound of 2 on the integrality gap of a natural LP for the SNDP problem with vertex-connectivity requirements in {0, 1, 2} [13]. The bucketing and scaling trick has seen several uses in the past and has recently been highlighted in several applications [8, 9, 7].

Our algorithm for k-2VC uses a greedy approach at the high level. We start with an empty subgraph G′ and use the approximation algorithm for dens-2VC in an iterative fashion to greedily add terminals to G′ until at least k′ ≥ k terminals are in G′. This approach would yield an O(log ℓ log k) approximation if k′ = O(k). However, the last iteration of the dens-2VC algorithm may add many more terminals than desired, with the result that k′ ≫ k. In this case, we cannot bound the quality of the solution obtained by the algorithm. To overcome this problem, one can try to prune the subgraph H added in the last iteration so that it has only the desired number of terminals. For the k-MST problem, H is a tree and pruning is quite easy. We remark that this yields a rather straightforward O(log n log k) approximation for k-MST, which could have been discovered well before the more clever analysis given in [3].
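The greedy outer loop just described (and given as pseudocode in Section 2) can be sketched in Python as below. This is a hypothetical skeleton, not the authors' implementation: dens_oracle stands in for the O(log ℓ)-approximation for dens-2VC and prune for the pruning step, a subgraph is modelled only by its cost and its terminal set, and the toy instance at the end is invented purely for illustration.

```python
def greedy_rooted_k2vc(k, dens_oracle, prune):
    """Greedy outer loop for rooted k-2VC (hypothetical sketch).

    A subgraph is modelled only as (cost, set_of_terminals).
    dens_oracle(covered) -> (cost, terminals): stand-in for the dens-2VC
        approximation, run with the `covered` terminals as non-terminals.
    prune(subgraph, need) -> (cost, terminals): stand-in for the pruning
        step; returns a subgraph with exactly `need` terminals.
    Returns (total_cost, covered); total_cost counts every added piece in
    full, so it upper-bounds the cost of the union G'.
    """
    remaining = k                  # k' in the text
    covered = set()                # terminals already 2-connected in G'
    total_cost = 0
    while remaining > 0:
        cost, terminals = dens_oracle(covered)
        new = set(terminals) - covered
        if not new:
            raise RuntimeError("oracle returned no new terminals")
        if len(new) <= remaining:
            # Augmentation step: add H and mark its terminals as non-terminals.
            total_cost += cost
            covered |= new
            remaining -= len(new)
        else:
            # Pruning step: H has too many terminals; keep only `remaining`.
            cost, kept = prune((cost, new), remaining)
            total_cost += cost
            covered |= set(kept)
            remaining = 0
    return total_cost, covered


# Toy instance: each block is a candidate subgraph (cost, terminals).
BLOCKS = [(4, {"a", "b", "c", "d"}), (3, {"e"}), (10, {"f", "g"})]


def toy_oracle(covered):
    """Return the block of minimum density over its uncovered terminals."""
    live = [(cost, ts) for cost, ts in BLOCKS if ts - covered]
    return min(live, key=lambda b: b[0] / len(b[1] - covered))


def toy_prune(subgraph, need):
    """Keep `need` terminals (alphabetically first) at the same cost."""
    cost, terminals = subgraph
    return cost, set(sorted(terminals)[:need])
```

With the toy blocks, k = 5 takes two augmentation steps (total cost 7), while k = 2 triggers the pruning step in the first iteration (total cost 4). The real pruning step of Theorem 2.3 has a different cost bound; toy_prune only mimics its interface.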
One of our technical contributions is to give a pruning step for the k-2VC problem. To accomplish this, we use two algorithmic ideas. The first is encapsulated in the cycle-finding algorithm of Theorem 1.3. Second, we use this cycle-finding algorithm to repeatedly merge subgraphs until we get the desired number of terminals in one subgraph. This latter step requires care. The cycle-merging scheme is inspired by a similar approach from the work of Lau et al. [22] on the k-2EC problem and from [10] on the directed orienteering problem. These ideas yield an O(log ℓ · log² k) approximation. We give a slightly modified cycle-merging algorithm with a more sophisticated and non-trivial analysis to obtain an improved O(log ℓ · log k) approximation.

Some remarks are in order to compare our work with that of [22] on the k-2EC problem. The combinatorial algorithm in [22] is based on finding a low-density cycle or a related structure called a bi-cycle. The algorithm in [22] to find such a structure is incorrect. Further, the cycles are contracted along the way, which limits the approach to the k-2EC problem (contracting a cycle in a 2-node-connected graph may make the resulting graph not 2-node-connected). In our algorithm, we do not contract cycles and instead introduce dummy terminals with weights to capture the number of terminals in an already formed component. This requires us to address the minimum-density non-trivial simple cycle problem, which we do via Theorem 1.3 and Corollary 1.4. In independent work, Lau et al. [23] obtain a new and correct O(log n log k)-approximation for k-2EC. They also follow the same approach that we do in using the LP for finding dense subgraphs followed by the pruning step. However, in the pruning step they use a completely different approach; they use the sophisticated idea of nowhere-zero 6-flows [24].
Although the use of this idea is elegant, the approach works only for the k-2EC problem, while our approach is less complex and leads to an algorithm for the more general k-2VC problem.

2 The Algorithm for the k-2VC Problem
We work with graphs in which some vertices are designated as terminals. Given a graph G with edge costs and terminal weights, we define the density of a subgraph H to be the sum of the costs of edges in H divided by the sum of the weights of terminals in H. Henceforth, we use "2-connected graph" to mean a 2-vertex-connected graph. The goal of the k-2VC problem is to find a minimum-cost 2-connected subgraph on at least k terminals.¹ Recall that in the rooted k-2VC problem, the goal is to find a min-cost subgraph on at least k terminals in which every terminal is 2-connected to the specified root r. The (unrooted) k-2VC problem can be reduced to the rooted version by guessing 2 vertices u, v that are in an optimal solution, creating a new root vertex r, and connecting it with 0-cost edges to u and v. It is not hard to show that any solution to the rooted problem in the modified graph can be converted to a solution to the unrooted problem by adding 2 minimum-cost vertex-disjoint paths between u and v. (Since u and v are in the optimal solution, the cost of these added paths cannot be more than OPT.) We omit further details from this extended abstract.

In the dens-2VC problem, the goal is to find a subgraph H of minimum density in which all terminals of H are 2-connected to the root. The following lemma is proved in Section 2.1 below. It relies on a 2-approximation, via a natural LP, for the min-cost 2-connectivity problem due to Fleischer, Jain and Williamson [13], together with some standard techniques.

Lemma 2.1. There is an O(log ℓ)-approximation algorithm for the dens-2VC problem, where ℓ is the number of terminals in the given instance.
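A small brute-force Python sketch of the density used by dens-2VC follows: it counts only the terminals of H that are 2-connected to the root r. Assumptions (ours, not the paper's): H is a simple undirected graph given as a dict from vertex pairs to edge costs, terminal weights default to 1, and the function names are hypothetical. It tests 2-connectivity via Menger's theorem by deleting one vertex at a time, which is far slower than a flow-based test and is meant only as an illustration.

```python
from collections import defaultdict


def _reachable(adj, s, t, skip_vertex=None, skip_edge=None):
    """DFS from s to t, optionally ignoring one vertex and/or one edge."""
    seen, stack = {s}, [s]
    while stack:
        x = stack.pop()
        if x == t:
            return True
        for y in adj[x]:
            if y in seen or y == skip_vertex:
                continue
            if skip_edge is not None and frozenset((x, y)) == skip_edge:
                continue
            seen.add(y)
            stack.append(y)
    return False


def two_connected_to_root(adj, r, t):
    """True if r and t have 2 internally vertex-disjoint paths (simple graph)."""
    if t not in adj or r not in adj or not _reachable(adj, r, t):
        return False
    if t in adj[r]:
        # The edge rt is one path; a second r-t path must avoid that edge.
        return _reachable(adj, r, t, skip_edge=frozenset((r, t)))
    # Menger: no single vertex other than r and t may separate them.
    return all(_reachable(adj, r, t, skip_vertex=x) for x in adj if x not in (r, t))


def density(edges, terminals, r, weight=None):
    """cost(H) / total weight of the terminals of H 2-connected to r."""
    weight = weight or {}
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    k_h = sum(weight.get(t, 1) for t in terminals
              if t != r and two_connected_to_root(adj, r, t))
    cost = sum(edges.values())
    return cost / k_h if k_h else float("inf")
```

For a 4-cycle r, a, b, c of unit-cost edges plus a pendant edge of cost 2 from a to d, with terminals a, b and d, only a and b are 2-connected to r, so the density is 6 / 2 = 3.0.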
Let OPT be the cost of an optimal solution to the k-2VC problem. We assume knowledge of OPT; this assumption can be dispensed with using standard methods. We pre-process the graph by deleting any terminal that does not have 2 vertex-disjoint paths to the root r of total cost at most OPT. The high-level description of the algorithm for the rooted k-2VC problem is given below.

    k′ ← k; G′ ← the empty graph
    while k′ > 0:
        use the approximation algorithm for dens-2VC to find a subgraph H in G
        if k(H) ≤ k′:
            G′ ← G′ ∪ H; k′ ← k′ − k(H)
            mark all terminals in H as non-terminals
        else:
            prune H to obtain H′ that contains k′ terminals
            G′ ← G′ ∪ H′; k′ ← 0
    output G′

¹ In fact, our algorithm solves the harder problem in which terminals have weights, and the goal is to find a minimum-cost 2-connected subgraph in which the sum of terminal weights is at least k. For simplicity of exposition, however, we stick to the more restricted version.

At the beginning of any iteration of the while loop, the graph contains a solution to the dens-2VC problem of density at most OPT/k′ (the optimal solution costs OPT and still contains at least k′ terminals that have not been marked). Therefore, the graph H returned always has density at most O(log ℓ) · OPT/k′. If k(H) ≤ k′, we add H to G′ and decrease k′ by k(H); we refer to this as the augmentation step. Otherwise, we have a graph H of good density, but with too many terminals. In this case, we prune H to find a graph with the required number of terminals; this is the pruning step. A simple set-cover type argument shows the following lemma:

Lemma 2.2. If, at every augmentation step, we add a graph of density at most O(log ℓ) · OPT/k′ (where k′ is the number of additional terminals that must be selected), the total cost of all the augmentation steps is at most O(log ℓ · log k) · OPT.

Therefore, we now only have to bound the cost of the graph H′ added in the pruning step; we prove the following theorem in Section 4.
Theorem 2.3. Let ⟨G, k⟩ be an instance of the rooted k-2VC problem with root r, such that every vertex of G has 2 vertex-disjoint paths to r of total cost at most L, and such that density(G) ≤ ρ. There is a polynomial-time algorithm to find a solution to this instance of cost at most O(log k) ρk + 2L.

We can now prove our main result for the k-2VC problem, Theorem 1.1.

Proof of Theorem 1.1: Let OPT be the cost of an optimal solution to the (rooted) k-2VC problem. By Lemma 2.2, the total cost of the augmentation steps of our greedy algorithm is O(log ℓ · log k) · OPT. To bound the cost of the pruning step, let k′ be the number of additional terminals that must be covered just prior to this step. The algorithm for the dens-2VC problem returns a graph H with k(H) > k′ terminals and density at most O(log ℓ) · OPT/k′. As a result of our pre-processing step, every vertex has 2 vertex-disjoint paths to r of total cost at most OPT. Now, we use Theorem 2.3 to prune H and find a graph H′ with k′ terminals and cost at most O(log k) · density(H) · k′ + 2·OPT ≤ O(log ℓ · log k) · OPT + 2·OPT. Therefore, the total cost of our solution is O(log ℓ · log k) · OPT. ✷

It remains only to prove Lemma 2.1, that there is an O(log ℓ)-approximation for the dens-2VC problem, and Theorem 2.3, bounding the cost of the pruning step. We prove the former in Section 2.1 below. Before the latter is proved in Section 4, we develop some tools in Section 3; chief among these tools is Theorem 1.3.

2.1 An O(log ℓ)-approximation for the dens-2VC problem
Recall that the dens-2VC problem was defined as follows: given a graph G(V, E) with edge costs, a set T ⊆ V of terminals, and a root r ∈ V(G), find a subgraph H of minimum density in which every terminal of H is 2-connected to r.
(Here, the density of H is defined as the cost of H divided by the number of terminals it contains, not including r.) We describe an algorithm for dens-2VC that gives an O(log ℓ)-approximation, and sketch its proof. We use an LP-based approach, a bucketing and scaling trick (see [7, 8, 9] for applications of this idea), and a constant-factor bound on the integrality gap of an LP for SNDP with vertex-connectivity requirements in {0, 1, 2} [13].

We define LP-dens as the following LP relaxation of dens-2VC. For each terminal t, the variable y_t indicates whether or not t is chosen in the solution. (By normalizing ∑_t y_t to 1 and minimizing the sum of edge costs, we minimize the density.) 𝒞_t is the set of all simple cycles containing t and the root r; for any C ∈ 𝒞_t, f_C indicates how much 'flow' is sent from t to r through C. (Note that a pair of internally vertex-disjoint paths between t and r forms a cycle; the flow along a cycle is 1 if we can 2-connect t to r using the edges of the cycle.) The variable x_e indicates whether the edge e is used by the solution.

$$
\begin{aligned}
\min \quad & \sum_{e \in E} c(e)\, x_e \\
\text{s.t.} \quad & \sum_{t \in T} y_t = 1 \\
& \sum_{C \in \mathcal{C}_t} f_C \ge y_t && \forall\, t \in T \\
& \sum_{C \in \mathcal{C}_t \,:\, e \in C} f_C \le x_e && \forall\, t \in T,\ e \in E \\
& x_e,\ f_C,\ y_t \ge 0
\end{aligned}
$$

It is not hard to see that an optimal solution to LP-dens has cost at most the density of an optimal solution to dens-2VC. We now show how to obtain an integral solution of density at most O(log ℓ) · OPT_LP, where OPT_LP is the cost of an optimal solution to LP-dens. The linear program LP-dens has an exponential number of variables but a polynomial number of non-trivial constraints; it can, however, be solved in polynomial time. Fix an optimal solution to LP-dens of cost OPT_LP, and for each 0 ≤ i < 2 log ℓ (for ease of notation, assume log ℓ is an integer), let Y_i be the set of terminals t such that 2^{−(i+1)} < y_t ≤ 2^{−i}.
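The bucketing just defined, together with the choice of a heavy bucket Y_i made next, can be sketched in Python as follows. This is an illustrative sketch only: it assumes the LP values y_t have already been computed (solving LP-dens is not shown), uses base-2 logarithms and rounds log ℓ up instead of assuming it is an integer; heavy_bucket is a hypothetical name.

```python
import math
from collections import defaultdict


def heavy_bucket(y):
    """Bucket terminals by LP value and return the heaviest bucket.

    y maps each terminal t to its LP value y_t (the values sum to 1).
    Bucket i holds the terminals with 2^-(i+1) < y_t <= 2^-i, for
    0 <= i < 2*ceil(log2(l)); terminals with smaller y_t fall in no bucket.
    Returns (i, sorted members of Y_i) for the bucket of largest total y.
    """
    num_buckets = max(1, 2 * math.ceil(math.log2(len(y))))
    buckets = defaultdict(list)
    for t, yt in y.items():
        if yt <= 0:
            continue
        i = 0
        # Find i with 2^-(i+1) < yt <= 2^-i by halving the threshold.
        while i < num_buckets and yt <= 2.0 ** -(i + 1):
            i += 1
        if i < num_buckets:
            buckets[i].append(t)
    best = max(buckets, key=lambda i: sum(y[t] for t in buckets[i]))
    return best, sorted(buckets[best])
```

Scaling the LP solution by 2 ** (i + 1) then makes every y_t with t in Y_i exceed 1, which is the property the argument uses before invoking the integrality gap of [13].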
Since ∑_{t∈T} y_t = 1, there is some index i such that ∑_{t∈Y_i} y_t ≥ 1/(2 log ℓ). Since every terminal t ∈ Y_i has y_t ≤ 2^{−i}, the number of terminals in Y_i is at least 2^{i−1}/log ℓ. We claim that there is a subgraph H of G with cost at most O(2^{i+2} OPT_LP) in which every terminal of Y_i is 2-connected to the root. If this is true, the density of H is at most O(log ℓ · OPT_LP) (indeed, 2^{i+2} OPT_LP divided by 2^{i−1}/log ℓ equals 8 log ℓ · OPT_LP), and hence we have an O(log ℓ)-approximation for the dens-2VC problem.

To prove our claim about the cost of the subgraph H in which every terminal of Y_i is 2-connected to r, consider scaling up the given optimum solution of LP-dens by a factor of 2^{i+1}. For each terminal t ∈ Y_i, the flow from t to r in this scaled solution² is at least 1, and the cost of the scaled solution is 2^{i+1} OPT_LP. In [13], the authors describe a linear program LP1 to find a minimum-cost subgraph in which a given set of terminals is 2-connected to the root, and show that this linear program has an integrality gap of 2. The variables x_e in the 'scaled solution' to LP-dens correspond to a feasible solution of LP1 with Y_i as the set of terminals; the integrality gap of 2 implies that there is a subgraph H in which every terminal of Y_i is 2-connected to the root, with cost at most 2^{i+2} OPT_LP. Therefore, the algorithm for dens-2VC is:

1. Find an optimal fractional solution to LP-dens.
2. Find a set of terminals Y_i such that ∑_{t∈Y_i} y_t ≥ 1/(2 log ℓ).
3. Find a min-cost subgraph H in which every terminal in Y_i is 2-connected to r, using the algorithm of [13].

H has density at most O(log ℓ) times the density of an optimal solution to dens-2VC.

3 Finding Low-density Non-trivial Cycles
A cycle C ⊆ G is non-trivial if it contains at least 2 terminals.
We define the min-density non-trivial cycle problem: given a graph G(V, E), with S ⊆ V marked as terminals, edge costs, and terminal weights, find a minimum-density cycle that contains at least 2 terminals. Note that if we remove the requirement that the cycle be non-trivial (that is, that it contain at least 2 terminals), the problem reduces to the min-mean cycle problem in directed graphs, and can be solved exactly in polynomial time (see [2]).

² This is an abuse of the term 'solution', since after scaling, ∑_{t∈T} y_t = 2^{i+1}.

Algorithms for the min-density non-trivial cycle problem are a useful tool for solving the k-2VC and k-2EC problems. In this section, we give an O(log ℓ)-approximation algorithm for the minimum-density non-trivial cycle problem. First, we prove Theorem 1.3: a 2-connected graph with edge costs and terminal weights contains a simple non-trivial cycle with density no more than the average density of the graph. We give two algorithms to find such a cycle; the first, described in Section 3.1, is simpler, but its running time is not polynomial. A more technical proof that leads to a strongly polynomial-time algorithm is described in Section 3.2; we recommend that this proof be skipped on a first reading.

3.1 An Algorithm to Find Cycles of Average Density
To find a non-trivial cycle of density at most that of the 2-connected input graph G, we start with an arbitrary non-trivial cycle and successively find cycles of better density until we obtain a cycle with density at most density(G). The following lemma shows that if a cycle C has an ear with density less than density(C), we can use this ear to find a cycle of lower density.

Lemma 3.1. Let C be a non-trivial cycle, and H an ear incident to C at u and v, such that cost(H)/weight(H − {u, v}) < density(C).
Let S_1 and S_2 be the two internally disjoint paths between u and v in C. Then H ∪ S_1 and H ∪ S_2 are both simple cycles, and one of these is non-trivial and has density less than density(C).

Proof. C has at least 2 terminals, so it has finite density; H must then have at least 1 terminal. Let c_1, c_2 and c_H be, respectively, the sum of the costs of the edges in S_1, S_2 and H, and let w_1, w_2 and w_H be the sum of the weights of the terminals in S_1, S_2 and H − {u, v}. Assume w.l.o.g. that S_1 has density at most that of S_2 (that is, c_1/w_1 ≤ c_2/w_2).³ S_1 must contain at least one terminal, and so H ∪ S_1 is a simple non-trivial cycle. The statement density(H ∪ S_1) < density(C) is equivalent to (c_H + c_1)(w_1 + w_2) < (c_1 + c_2)(w_H + w_1), which holds because

$$
\begin{aligned}
(c_H + c_1)(w_1 + w_2) &= c_1 w_1 + c_1 w_2 + c_H (w_1 + w_2) \\
&\le c_1 w_1 + c_2 w_1 + c_H (w_1 + w_2) && (\text{density}(S_1) \le \text{density}(S_2)) \\
&< c_1 w_1 + c_2 w_1 + (c_1 + c_2)\, w_H && (c_H / w_H < \text{density}(C)) \\
&= (c_1 + c_2)(w_H + w_1).
\end{aligned}
$$

Therefore, H ∪ S_1 is a simple cycle containing at least 2 terminals, of density less than density(C).

Lemma 3.2. Given a cycle C in a 2-connected graph G, let G′ be the graph formed from G by contracting C to a single vertex v. If H is a connected component of G′ − v, then H ∪ {v} is 2-connected in G′.

Proof. Let H be an arbitrary connected component of G′ − v, and let H′ = H ∪ {v}. To prove that H′ is 2-connected, we first observe that v is 2-connected to any vertex x ∈ H. (Any set that separates x from v in H′ separates x from the cycle C in G.) It now follows that for all vertices x, y ∈ V(H), x and y are 2-connected in H′. Suppose deleting some vertex u separates x from y. The vertex u cannot be v, since H is a connected component of G′ − v. But if u ≠ v, then v and x are in the same component of H′ − u, since v is 2-connected to x in H′.
Similarly, v and y are in the same component of H′ − u, and so deleting u does not separate x from y.

We now show that given any 2-connected graph G, we can find a non-trivial cycle of density no more than that of G.

³ It is possible that one of S_1 and S_2 has cost 0 and weight 0. In this case, let S_1 be the component with non-zero weight.

Theorem 3.3. Let G be a 2-connected graph with at least 2 terminals. Then G contains a simple non-trivial cycle X such that density(X) ≤ density(G).

Proof. Let C be an arbitrary non-trivial simple cycle; such a cycle always exists since G is 2-connected and has at least 2 terminals. If density(C) > density(G), we give an algorithm that finds a new non-trivial cycle C′ such that density(C′) < density(C). Repeating this process, we obtain cycles of successively better densities until eventually finding a non-trivial cycle X of density at most density(G).

Let G′ be the graph formed by contracting the given cycle C to a single vertex v. In G′, v is not a terminal, and so has weight 0. Consider the 2-connected components of G′ (by Lemma 3.2, each such component is formed by adding v to a connected component of G′ − v), and pick the one of minimum density. If H is this component, density(H) < density(G) by an averaging argument (since density(C) > density(G), the components of G′ together have density less than density(G), and so the best of them does too). H contains at least 1 terminal. If it contains 2 or more terminals, recursively find a non-trivial cycle C′ in H such that density(C′) ≤ density(H) < density(C). If C′ exists in the given graph G, it has the desired properties, and we are done. Otherwise, C′ contains v, and the edges of C′ form an ear of C in the original graph G. The density of this ear is less than the density of C, so we can apply Lemma 3.1 to obtain a non-trivial cycle in G that has density less than density(C).
Finally, if H has exactly 1 terminal u, find any 2 vertex-disjoint paths using edges of H from u to distinct vertices in the cycle C. (Since G is 2-connected, such paths always exist.) The cost of these paths is at most cost(H), and concatenating these 2 paths gives an ear of C in G. This ear contains u, the only terminal of H, so its density is at most density(H) < density(C); again, we use Lemma 3.1 to obtain a cycle in G with the desired properties.

We remark again that the algorithm of Theorem 3.3 does not lead to a polynomial-time algorithm, even if all edge costs and terminal weights are polynomially bounded. In Section 3.2, we describe a strongly polynomial-time algorithm that, given a graph G, finds a non-trivial cycle of density at most that of G. Note that neither of these algorithms necessarily gives a good approximation to the min-density non-trivial cycle problem, because the optimal non-trivial cycle may have density much less than that of G. However, we can use Theorem 3.3 to prove the following theorem:

Theorem 3.4. There is an α-approximation to the (unrooted) dens-2VC problem if and only if there is an α-approximation to the problem of finding a minimum-density non-trivial cycle.

Proof. Assume we have a γ(ℓ)-approximation for the dens-2VC problem; we use it to find a low-density non-trivial cycle. Solve the dens-2VC problem on the given graph; since the optimal cycle is a 2-connected graph, our solution H to the dens-2VC problem has density at most γ(ℓ) times the density of this cycle. Find a non-trivial cycle in H of density at most that of H; it has density at most γ(ℓ) times that of an optimal non-trivial cycle. Note that any instance of the (unrooted) dens-2VC problem has an optimal solution that is a non-trivial cycle. (Consider any optimal solution H of density ρ; by Theorem 1.3, H contains a non-trivial cycle of density at most ρ.
This cycle is a valid solution to the dens-2VC problem.) Therefore, a β(ℓ)-approximation for the min-density non-trivial cycle problem gives a β(ℓ)-approximation for the dens-2VC problem.

Theorem 3.4 and Lemma 2.1 imply an O(log ℓ)-approximation for the minimum-density non-trivial cycle problem; this proves Corollary 1.4.

We say that a graph G(V, E) is minimally 2-connected on its terminals if, for every edge e ∈ E, some pair of terminals is not 2-connected in the graph G − e. Section 3.2 shows that in any graph that is minimally 2-connected on its terminals, every cycle is non-trivial. Therefore, the problem of finding a minimum-density non-trivial cycle in such graphs is just that of finding a minimum-density cycle, which can be solved exactly in polynomial time. However, as we explain at the end of the section, this does not directly lead to an efficient algorithm for arbitrary graphs.

[Figure 1: H is an earring of C, with clasps c_4, c_6, c_9; c_4 is its first clasp, and c_9 its last clasp. The arrow indicates the arc of H.]

3.2 A Strongly Polynomial-time Algorithm to Find Cycles of Average Density
In this section, we describe a strongly polynomial-time algorithm which, given a 2-connected graph G(V, E) with edge costs and terminal weights, finds a non-trivial cycle of density at most that of G. We begin with several definitions. Let C be a cycle in a graph G, and let G′ be the graph formed by deleting C from G. Let H_1, H_2, ..., H_m be the connected components of G′; we refer to these as earrings of C.⁴ For each H_i, the vertices of C incident to it are called its clasps. From the definition of an earring, for any pair of clasps of H_i, there is a path between them whose internal vertices are all in H_i. We say that a vertex of C is an anchor if it is the clasp of some earring. (An anchor may be a clasp of multiple earrings.)
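The earrings, clasps and anchors just defined, together with the segments and arcs introduced next, can be computed directly. The Python sketch below is a hypothetical illustration, not part of the paper's algorithm: it assumes G is simple and 2-connected and given as adjacency sets, it reads "deleting C" as deleting the vertices of C (so a chord of C forms no earring), and it reports clasps, anchors and segments by their positions c_0, c_1, ... on C.

```python
def earrings(adj, cycle):
    """Earrings, clasps, anchors and segments of a cycle C (illustrative sketch).

    adj: dict vertex -> set of neighbours of an undirected simple graph G
    cycle: vertices of C in clockwise order; cycle[0] is the origin c_0
    Returns (rings, anchors, segments); each ring is (vertices, clasps, arc_length).
    """
    pos = {v: i for i, v in enumerate(cycle)}
    seen, rings = set(), []
    for s in adj:
        if s in pos or s in seen:
            continue
        comp, stack = set(), [s]          # one connected component of G - C
        seen.add(s)
        while stack:
            x = stack.pop()
            comp.add(x)
            for y in adj[x]:
                if y not in pos and y not in seen:
                    seen.add(y)
                    stack.append(y)
        clasps = sorted({pos[y] for x in comp for y in adj[x] if y in pos})
        # First clasp = lowest position, last clasp = highest; arc length q - p.
        rings.append((comp, clasps, clasps[-1] - clasps[0]))
    anchors = sorted({c for _, clasps, _ in rings for c in clasps})
    # Consecutive anchors around C bound the segments (the last one wraps past c_0).
    segments = [(anchors[j], anchors[(j + 1) % len(anchors)]) for j in range(len(anchors))]
    return rings, anchors, segments


def build_adj(edge_list):
    """Adjacency sets for an undirected edge list."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# Toy instance: a 6-cycle with earrings {x} (on c1, c3) and {y, z} (on c2, c5).
CYCLE = ["c0", "c1", "c2", "c3", "c4", "c5"]
EDGES = [(CYCLE[i], CYCLE[(i + 1) % 6]) for i in range(6)]
EDGES += [("x", "c1"), ("x", "c3"), ("y", "c2"), ("y", "z"), ("z", "c5")]
RINGS, ANCHORS, SEGMENTS = earrings(build_adj(EDGES), CYCLE)
```

On the toy instance, the earring {x} has the shorter arc (from c_1 to c_3), so Theorem 3.5 asserts that the segments (1, 2) and (2, 3) inside its arc are safe.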
A segment $S$ of $C$ is a path contained in $C$ such that the endpoints of $S$ are both anchors and no internal vertex of $S$ is an anchor. (Note that the endpoints of $S$ might be clasps of the same earring, or of distinct earrings.) It is easy to see that the segments partition the edge set of $C$. By deleting a segment, we mean deleting its edges and internal vertices. Observe that if $S$ is deleted from $G$, the only vertices of $G - S$ that lose an edge are the endpoints of $S$. A segment $S$ is safe if the graph $G - S$ is 2-connected.

Arbitrarily pick a vertex $o$ of $C$ as the origin, and consecutively number the vertices of $C$ clockwise around the cycle as $o = c_0, c_1, c_2, \ldots, c_r = o$. The first clasp of an earring $H$ is its lowest-numbered clasp, and the last clasp is its highest-numbered clasp. (If the origin is a clasp of $H$, it is considered the first clasp, not the last.) The arc of an earring is the subgraph of $C$ found by traversing clockwise from its first clasp $c_p$ to its last clasp $c_q$; the length of this arc is $q - p$, that is, the number of edges it contains. Note that if an arc contains the origin, the origin must be the first vertex of the arc. Figure 1 illustrates several of these definitions.

Theorem 3.5. Let $H$ be an earring of minimum arc length. Every segment contained in the arc of $H$ is safe.

Proof. Let $\mathcal{H}$ be the set of earrings whose arc is identical to that of $H$. Since they have the same arc, we refer to this as the arc of $\mathcal{H}$, or the critical arc. Let $c_a$ be the first clasp and $c_b$ the last clasp of every earring in $\mathcal{H}$. Because the earrings in $\mathcal{H}$ have arcs of minimum length, any earring $H' \notin \mathcal{H}$ has a clasp $c_x$ that is not in the critical arc. (That is, $x < a$ or $x > b$.) We must show that every segment contained in the critical arc is safe; recall that a segment $S$ is safe if the graph $G - S$ is 2-connected.
Given an arbitrary segment $S$ in the critical arc, let $c_p$ and $c_q$ ($p < q$) be the anchors that are its endpoints.

⁴ If $H_i$ were simply a path, it would be an ear of $C$, but $H_i$ may be more complex.

Figure 2: The various cases of Theorem 3.5, illustrated in the order presented. In each case, one of the two vertex-disjoint paths from $c_p$ to $c_q$ is indicated with dashed lines, and the other with dotted lines.

We prove that there are always two internally vertex-disjoint paths between $c_p$ and $c_q$ in $G - S$; this suffices to show 2-connectivity. We consider several cases, depending on the earrings that contain $c_p$ and $c_q$; Figure 2 illustrates these cases.

If $c_p$ and $c_q$ are both clasps of the same earring $H'$, it is easy to find two vertex-disjoint paths between them in $G - S$. The first path goes clockwise from $c_q$ to $c_p$ around the cycle $C$. The second path is entirely contained in the earring $H'$ (an earring is connected in $G - C$, so we can always find such a path).

Otherwise, $c_p$ and $c_q$ are clasps of distinct earrings. We consider three cases: both $c_p$ and $c_q$ are clasps of earrings in $\mathcal{H}$; neither is; or exactly one is.

1. Suppose first that both $c_p$ and $c_q$ are clasps of earrings in $\mathcal{H}$. Let $c_p$ be a clasp of $H_1$, and $c_q$ a clasp of $H_2$. The first path goes from $c_q$ to $c_a$ through $H_2$, and then clockwise along the critical arc from $c_a$ to $c_p$. The second path goes clockwise along the critical arc from $c_q$ to $c_b$, and then from $c_b$ to $c_p$ through $H_1$. It is easy to see that these paths are internally vertex-disjoint.

2. Now suppose that neither $c_p$ nor $c_q$ is a clasp of an earring in $\mathcal{H}$. Let $c_p$ be a clasp of $H_1$, and $c_q$ a clasp of $H_2$.
The first path follows the critical arc clockwise from $c_q$ to $c_b$ (the last clasp of the critical arc), goes from $c_b$ to $c_a$ through some $H \in \mathcal{H}$, and again follows the critical arc clockwise from $c_a$ to $c_p$. Internal vertices of this path are all in $H$ or on the critical arc. Let $c_{p'}$ be a clasp of $H_1$ not on the critical arc, and $c_{q'}$ a clasp of $H_2$ not on the critical arc. The second path goes from $c_p$ to $c_{p'}$ through $H_1$, from $c_{p'}$ to $c_{q'}$ along the cycle $C$ outside the critical arc, and from $c_{q'}$ to $c_q$ through $H_2$. Internal vertices of this path are in $H_1$, in $H_2$, or in $C$ but not on the critical arc (since both $c_{p'}$ and $c_{q'}$ are outside the critical arc). Therefore, we have two vertex-disjoint paths from $c_p$ to $c_q$.

3. Finally, consider the case that exactly one of $c_p, c_q$ is a clasp of an earring in $\mathcal{H}$. Suppose $c_p$ is a clasp of $H_1 \in \mathcal{H}$, and $c_q$ is a clasp of $H_2 \notin \mathcal{H}$; the other case (where $H_1 \notin \mathcal{H}$ and $H_2 \in \mathcal{H}$) is symmetric and omitted, though Figure 2 illustrates its paths. Let $q'$ be the index of a clasp of $H_2$ outside the critical arc. The first path goes from $c_q$ to $c_b$ along the critical arc, and then from $c_b$ to $c_p$ through $H_1$. The second path goes from $c_q$ to $c_{q'}$ through $H_2$, and from $c_{q'}$ to $c_p$ clockwise around $C$. Note that the last part of this path enters the critical arc at $c_a$ and continues through the arc until $c_p$. Internal vertices of the first path that are in $C$ lie on the critical arc and have index greater than $q$. Internal vertices of the second path that belong to $C$ are either not on the critical arc, or have index between $a$ and $p$. Therefore, the two paths are internally vertex-disjoint.

We now describe our algorithm to find a non-trivial cycle of good density, proving Theorem 1.3: Let $G$ be a 2-connected graph with edge costs and terminal weights, and at least 2 terminals.
There is a polynomial-time algorithm to find a non-trivial cycle $X$ in $G$ such that $\mathrm{density}(X) \le \mathrm{density}(G)$.

Proof of Theorem 1.3: Let $G$ be a graph with $\ell$ terminals and density $\rho$; we describe a polynomial-time algorithm that either finds a non-trivial cycle in $G$ of density at most $\rho$, or finds a 2-connected proper subgraph $G'$ of $G$ that contains all $\ell$ terminals. In the latter case, $G'$ also has density at most $\rho$, so we can recurse on $G'$ until we eventually find a non-trivial cycle of density at most $\rho$.

We first find, in $O(n^3)$ time, a minimum-density cycle $C$ in $G$. By Theorem 3.3, $C$ has density at most $\rho$, because the minimum-density non-trivial cycle has at most this density. If $C$ contains at least 2 terminals, we are done. Otherwise, $C$ contains exactly one terminal $v$. Since $G$ contains at least 2 terminals, there must exist at least one earring of $C$. Let $v$ be the origin of this cycle $C$, and $H$ an earring of minimum arc length. By Theorem 3.5, every segment in the arc of $H$ is safe. Let $S$ be such a segment; since $v$ was selected as the origin, $v$ is not an internal vertex of $S$. As $v$ is the only terminal of $C$, $S$ contains no terminals, and therefore the graph $G' = G - S$ is 2-connected and contains all $\ell$ terminals of $G$. ✷

The proof above also shows that if $G$ is minimally 2-connected on its terminals (that is, $G$ has no 2-connected proper subgraph containing all its terminals), every cycle of $G$ is non-trivial. (If a cycle contains 0 or 1 terminals, it has a safe segment containing no terminals, which can be deleted; this gives a contradiction.) Therefore, given a graph that is minimally 2-connected on its terminals, finding a minimum-density non-trivial cycle is equivalent to finding a minimum-density cycle, and so can be solved exactly in polynomial time.
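The procedure in the proof of Theorem 1.3 can be sketched in Python under several stated assumptions. The $O(n^3)$ minimum-density cycle routine is replaced by brute-force enumeration of simple cycles, so the sketch only suits toy graphs; vertices are assumed to be integers (so they can be ordered); the input graph is assumed to be 2-connected with at least two terminals of positive weight; and, as an assumption of this sketch only, each chord of $C$ is treated as an earring with no internal vertex, so that every internal vertex of a segment has degree 2. All function names are hypothetical.

```python
from collections import deque
from math import inf


def adjacency(cost):
    """Neighbour sets of the graph whose edges are the keys of `cost`."""
    adj = {}
    for edge in cost:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def simple_cycles(adj):
    """Yield each simple cycle as a vertex list, once per direction, starting
    from its smallest vertex. Exponential time: toy graphs only."""
    for s in sorted(adj):
        stack = [(s, [s])]
        while stack:
            u, path = stack.pop()
            for v in adj[u]:
                if v == s and len(path) >= 3:
                    yield path
                elif v > s and v not in path:
                    stack.append((v, path + [v]))


def density(cycle, cost, weight):
    """Edge cost of the cycle divided by the total weight of its terminals."""
    c = sum(cost[frozenset((cycle[i - 1], cycle[i]))] for i in range(len(cycle)))
    w = sum(weight.get(v, 0) for v in cycle)
    return c / w if w > 0 else inf


def clasp_sets(adj, cycle):
    """Clasp set of every earring of C (component of G - V(C)); each chord of
    C is also treated as an earring with no inner vertex (an assumption)."""
    on_c, seen, result = set(cycle), set(), []
    for start in adj:
        if start in on_c or start in seen:
            continue
        seen.add(start)
        clasps, queue = set(), deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in on_c:
                    clasps.add(v)
                elif v not in seen:
                    seen.add(v)
                    queue.append(v)
        result.append(clasps)
    cycle_edges = {frozenset((cycle[i - 1], cycle[i])) for i in range(len(cycle))}
    chords = {frozenset((u, v)) for u in cycle for v in adj[u] if v in on_c}
    result.extend(set(ch) for ch in chords - cycle_edges)
    return result


def good_nontrivial_cycle(cost, weight):
    """Return (X, remaining edge costs): X is a cycle with >= 2 terminals and
    density(X) <= density of the 2-connected input graph."""
    cost = dict(cost)  # safe segments are deleted from this copy
    while True:
        adj = adjacency(cost)
        cycle = min(simple_cycles(adj), key=lambda cyc: density(cyc, cost, weight))
        terminals = [v for v in cycle if v in weight]
        if len(terminals) >= 2:
            return cycle, cost
        # Exactly one terminal on C: renumber C so that it is the origin c_0.
        o = cycle.index(terminals[0])
        cycle = cycle[o:] + cycle[:o]
        idx = {v: i for i, v in enumerate(cycle)}
        earrings = [{idx[c] for c in cl} for cl in clasp_sets(adj, cycle)]
        # Critical arc [a, b]: an earring arc of minimum length b - a.
        a, b = min(((min(e), max(e)) for e in earrings), key=lambda ab: ab[1] - ab[0])
        anchors = sorted({i for e in earrings for i in e if a <= i <= b})
        p, q = anchors[0], anchors[1]  # first segment c_p ... c_q of the arc
        # Delete the (safe) segment. Its inner vertices have no other edges,
        # so they drop out of the graph along with the segment's edges.
        for j in range(p, q):
            del cost[frozenset((cycle[j], cycle[j + 1]))]


# Hypothetical instance: terminals 0 and 3. The cheap triangle 0-1-2 holds
# only terminal 0, so the safe segment 1-2 is deleted before a 4-cycle is found.
edges = {frozenset(e): c for e, c in
         [((0, 1), 1), ((1, 2), 1), ((2, 0), 1), ((1, 3), 10), ((2, 3), 10)]}
weights = {0: 1, 3: 1}
cycle, remaining = good_nontrivial_cycle(edges, weights)
print(cycle, density(cycle, edges, weights))
```

In this example the input graph has density $23/2$; the sketch deletes edge $\{1,2\}$ and then returns the 4-cycle through vertices $0, 1, 3, 2$, of density $22/2$.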
This suggests a natural algorithm for the problem: given a graph that is not minimally 2-connected on its terminals, delete edges and vertices until the graph is minimally 2-connected on the terminals, and then find a minimum-density cycle. As shown above, this gives a cycle of density no more than that of the input graph, but this may not be the minimum-density non-trivial cycle of the original graph. For instance, there exist instances where the minimum-density non-trivial cycle uses edges of a safe segment $S$ that might be deleted by this algorithm.

4 Pruning 2-connected Graphs of Good Density

In this section, we prove Theorem 2.3. We are given a graph $G$ and $S \subseteq V$, a set of at least $k$ terminals. Further, every terminal in $G$ has two vertex-disjoint paths to the root $r$ of total cost at most $L$. Let $\ell$ be the number of terminals in $G$, and $\mathrm{cost}(G)$ its total cost; $\rho = \mathrm{cost}(G)/\ell$ is the density of $G$. We describe an algorithm that finds a subgraph $H$ of $G$ that contains at least $k$ terminals, each of which is 2-connected to the root, and that has total edge cost $O(\log k)\,\rho k + 2L$. We can assume $\ell > (8 \log k) \cdot k$, or the trivial solution of taking the entire graph $G$ suffices.

The main phase of our algorithm proceeds by maintaining a set of 2-connected subgraphs that we call clusters, and repeatedly finding low-density cycles that merge clusters of similar weight to form larger clusters. (The weight of a cluster $X$, denoted by $w_X$, is roughly the number of terminals it contains.) Clusters are grouped into tiers by weight; tier $i$ contains clusters with weight at least $2^i$ and less than $2^{i+1}$. Initially, each terminal is a separate cluster in tier 0. We say a cluster is large if it has weight at least $k$, and small otherwise. The algorithm stops when most terminals are in large clusters. We now describe the algorithm MergeClusters.
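The tier and size bookkeeping just described can be sketched in Python, assuming cluster weights are positive integers (they start at 1 and are summed on each merge). The helper names are hypothetical. `int.bit_length` gives $\lfloor \log_2 w \rfloor$ exactly, and `(k - 1).bit_length()` equals $\lceil \log_2 k \rceil$ for $k \ge 1$, which is the number of iterations of the For loop in MergeClusters; the example checks that every small cluster falls in one of those tiers.

```python
def tier(weight: int) -> int:
    """Tier i of a cluster: the unique i with 2**i <= weight < 2**(i + 1)."""
    if weight < 1:
        raise ValueError("cluster weights are positive integers")
    return weight.bit_length() - 1


def is_large(weight: int, k: int) -> bool:
    """A cluster is large if its weight is at least k, and small otherwise."""
    return weight >= k


def iteration_tiers(k: int) -> range:
    """Tiers 0, 1, ..., ceil(log2 k) - 1 handled by MergeClusters (k >= 1).

    (k - 1).bit_length() equals ceil(log2 k) without floating-point rounding.
    """
    return range((k - 1).bit_length())


# Small clusters have weight 1..k-1; their tiers are exactly the iterations.
k = 10
print(sorted({tier(w) for w in range(1, k)}), list(iteration_tiers(k)))
# prints [0, 1, 2, 3] [0, 1, 2, 3]
```

Integer arithmetic is used deliberately: computing $\lceil \log_2 k \rceil$ with floating-point logarithms can be off by one for exact powers of two.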
To simplify notation, let $\alpha$ be the quantity $2\lceil \log k \rceil \rho$. We say that a cycle is good if it has density at most $\alpha$; that is, good cycles have density at most $O(\log k)$ times the density of the input graph.

MergeClusters:
  For (each i in {0, 1, ..., ⌈log₂ k⌉ − 1}) do:
    If (i = 0):
      Every terminal has weight 1
    Else:
      Mark all vertices as non-terminals
      For (each small 2-connected cluster X in tier i) do:
        Add a (dummy) terminal v_X to G of weight w_X
        Add (dummy) edges of cost 0 from v_X to two (arbitrary) distinct vertices of X
    While (G has a non-trivial cycle C of density at most α in G):
      Let X_1, X_2, ..., X_q be the small clusters that contain a terminal or an edge of C.
        (Note that the terminals in C belong to a subset of {X_1, ..., X_q}.)
      Form a new cluster Y (of a higher tier) by merging the clusters X_1, ..., X_q
      w_Y ← Σ_{j=1}^{q} w_{X_j}
      If (i = 0):
        Mark all terminals in Y as non-terminals
      Else:
        Delete all (dummy) terminals in Y and the associated (dummy) edges

We briefly remark on some salient features of this algorithm and our analysis before presenting the details of the proofs.

1. In iteration $i$, the terminals correspond to tier $i$ clusters. Clusters are 2-connected subgraphs of $G$, and by using cycles to merge clusters, we preserve 2-connectivity as the clusters become larger.

2. When a cycle $C$ is used to merge clusters, all small clusters that contain an edge of $C$ (regardless of their tier) are merged to form the new cluster. Therefore, at any stage of the algorithm, all currently small clusters are edge-disjoint. Large clusters, on the other hand, are frozen; even if they intersect a good cycle $C$, they are not merged with other clusters on $C$. Thus, at any time, an edge may be in multiple large clusters but in at most one small cluster.

3.
In iteration $i$ of MergeClusters, the density of a cycle $C$ is determined only by its cost and the weight of the terminals in $C$ corresponding to tier $i$ clusters. Though small clusters of other (lower or higher) tiers might be merged using $C$, we do not use their weight to pay for the edges of $C$.

4. The $i$th iteration terminates when no good cycles can be found using the remaining tier $i$ clusters. At this point, there may be some remaining terminals that correspond to clusters which are not merged to form clusters of higher tiers. However, our choice of $\alpha$ (which defines the density of good cycles) is such that we can bound the number of terminals that are "left behind" in this fashion. Therefore, when the algorithm terminates, most terminals are in large clusters.

By bounding the density of large clusters, we can find a solution to the rooted $k$-2VC problem of bounded density. Because we always use cycles of low density to merge clusters, an analysis similar to that of [22] and [10] shows that every large cluster has density at most $O(\log^2 k)\rho$. We first present this analysis, though it does not suffice to prove Theorem 2.3. A more careful analysis shows that there is at least one large cluster of density at most $O(\log k)\rho$; this allows us to prove the desired theorem.

We now formally prove that MergeClusters has the desired behavior. First, we present a series of claims which together show that when the algorithm terminates, most terminals are in large clusters, and all clusters are 2-connected.

Remark 4.1. Throughout the algorithm, the graph $G$ is always 2-connected. The weight of a cluster is at most the number of terminals it contains.

Proof. The only structural changes to $G$ are when new vertices are added as terminals; they are added with edges to two distinct vertices of $G$.
This preserves 2-connectivity, as does deleting these terminals with the associated edges. To see that the second claim is true, observe that if a terminal contributes weight to a cluster, it is contained in that cluster. A terminal can be in multiple clusters, but it contributes to the weight of exactly one cluster.

We use the following simple proposition in proofs of 2-connectivity; the proof is straightforward, and hence omitted.

Proposition 4.2. Let $H_1 = (V_1, E_1)$ and $H_2 = (V_2, E_2)$ be 2-connected subgraphs of a graph $G(V,E)$ such that $|V_1 \cap V_2| \ge 2$. Then the graph $H_1 \cup H_2 = (V_1 \cup V_2, E_1 \cup E_2)$ is 2-connected.

Lemma 4.3. The clusters formed by MergeClusters are all 2-connected.

Proof. Let $Y$ be a cluster formed by using a cycle $C$ to merge clusters $X_1, X_2, \ldots, X_q$. The edges of the cycle $C$ form a 2-connected subgraph of $G$, and we assume by induction that each $X_j$ is 2-connected. Further, $C$ contains at least 2 vertices of each $X_j$,⁵ so we can use induction and Proposition 4.2: if $C \cup \bigcup_{l=1}^{j} X_l$ is 2-connected, then, since $C$ contains 2 vertices of $X_{j+1}$, $C \cup \bigcup_{l=1}^{j+1} X_l$ is 2-connected. We have thus shown that $Y = C \cup \bigcup_{j=1}^{q} X_j$ is 2-connected, but $C$ (and hence $Y$) might contain dummy terminals and the corresponding dummy edges. However, each such terminal with its 2 associated edges is an ear of $Y$; deleting them leaves $Y$ 2-connected.

Lemma 4.4. The total weight of small clusters in tier $i$ that are not merged to form clusters of higher tiers is at most $\frac{\ell}{2\lceil \log k \rceil}$.

Proof. Suppose this were not true, and consider the end of iteration $i$, when MergeClusters can find no more cycles of density at most $\alpha$ using the remaining small tier $i$ clusters.
But the total cost of all the edges is at most $\mathrm{cost}(G)$ (dummy edges have cost 0), and the sum of the remaining terminal weights is greater than $\frac{\ell}{2\lceil \log k \rceil}$; this implies that the density of the graph (using the remaining terminals) is less than $2\lceil \log k \rceil \cdot \frac{\mathrm{cost}(G)}{\ell} = \alpha$. But by Theorem 3.3, the graph must then contain a good non-trivial cycle, and so the while loop would not have terminated.

Corollary 4.5. When the algorithm MergeClusters terminates, the total weight of large clusters is at least $\ell/2 > (4 \log k) \cdot k$.

Proof. Each terminal not in a large cluster contributes to the weight of a cluster that was not merged with others to form a cluster of a higher tier. The previous lemma shows that the total weight of such clusters in any tier is at most $\frac{\ell}{2\lceil \log k \rceil}$; since there are $\lceil \log k \rceil$ tiers, the total number of terminals not in large clusters is at most $\lceil \log k \rceil \cdot \frac{\ell}{2\lceil \log k \rceil} = \ell/2$.

⁵ A cluster $X_j$ may be a singleton vertex (for instance, if we are in tier 0), but such a vertex does not affect 2-connectivity.

So far, we have shown that most terminals reach large clusters, all of which are 2-connected, but we have not argued about the density of these clusters. The next lemma says that if we can find a large cluster of good density, we can find a solution to the $k$-2VC problem of good density.

Lemma 4.6. Let $Y$ be a large cluster formed by MergeClusters. If $Y$ has density at most $\delta$, we can find a graph $Y'$ with at least $k$ terminals, each of which is 2-connected to $r$, of total cost at most $2\delta k + 2L$.

Proof. Let $X_1, X_2, \ldots, X_q$ be the clusters merged to form $Y$, in order around the cycle $C$ that merged them; each $X_j$ was a small cluster, of weight less than $k$.
A simple averaging argument shows that there is a consecutive segment of $X_j$'s with total weight between $k$ and $2k$ such that the cost of the edges of $C$ connecting these clusters, together with the costs of the clusters themselves, is at most $2\delta k$. Let $X_a$ be the "first" cluster of this segment, and $X_b$ the "last". Let $v$ and $w$ be arbitrary terminals of $X_a$ and $X_b$ respectively. Connect each of $v$ and $w$ to the root $r$ using two vertex-disjoint paths; the cost of this step is at most $2L$. (We assumed that every terminal could be 2-connected to $r$ using disjoint paths of total cost at most $L$.) The graph $Y'$ thus constructed has at least $k$ terminals, and total cost at most $2\delta k + 2L$.

We show that every vertex $z$ of $Y'$ is 2-connected to $r$; this completes our proof. Let $z$ be an arbitrary vertex of $Y'$, and suppose there is a cut-vertex $x$ which, when deleted, separates $z$ from $r$. Both $v$ and $w$ are 2-connected to $r$, and therefore neither is in the same component as $z$ in $Y' - x$. However, we describe two vertex-disjoint paths $P_v$ and $P_w$ in $Y'$ from $z$ to $v$ and $w$ respectively; deleting $x$ cannot separate $z$ from both $v$ and $w$, which gives a contradiction.

The paths $P_v$ and $P_w$ are easy to find; let $X_j$ be the cluster containing $z$. The cycle $C$ contains a path from a vertex $z_1 \in X_j$ to a vertex $v' \in X_a$, and another (vertex-disjoint) path from $z_2 \in X_j$ to $w' \in X_b$. Concatenating these paths with paths from $v'$ to $v$ in $X_a$ and from $w'$ to $w$ in $X_b$ gives us vertex-disjoint paths $P_1$ from $z_1$ to $v$ and $P_2$ from $z_2$ to $w$. Since $X_j$ is 2-connected, we can find vertex-disjoint paths from $z$ to $z_1$ and $z_2$, which gives us the desired paths $P_v$ and $P_w$.⁶

We now present the two analyses of density referred to earlier. The key difference between the weaker and the tighter analysis is in the way we bound edge costs.
In the former, each large cluster pays for its edges separately, using the fact that all cycles used have density at most $\alpha = O(\log k)\rho$. In the latter, we crucially use the fact that small clusters which share edges are merged. Roughly speaking, because small clusters are edge-disjoint, the average density of small clusters must be comparable to the density of the input graph $G$. Once an edge is in a large cluster, we can no longer use the edge-disjointness argument; we must pay for these edges separately, but we can bound this cost.

First, the following lemma allows us to show that every large cluster has density at most $O(\log^2 k)\rho$.

Lemma 4.7. For any cluster $Y$ formed by MergeClusters during iteration $i$, the total cost of edges in $Y$ is at most $(i+1) \cdot \alpha w_Y$.

Proof. We prove this lemma by induction on the number of vertices in a cluster. Let $\mathcal{S}$ be the set of clusters merged using a cycle $C$ to form $Y$. Let $\mathcal{S}_1$ be the set of clusters in $\mathcal{S}$ of tier $i$, and $\mathcal{S}_2 = \mathcal{S} - \mathcal{S}_1$. ($\mathcal{S}_2$ contains clusters of tiers less than or greater than $i$ that contained an edge of $C$.) The cost of edges in $Y$ is at most the sum of the cost of $C$, the cost of $\mathcal{S}_1$, and the cost of $\mathcal{S}_2$. Since all clusters in $\mathcal{S}_2$ have been formed during iteration $i$ or earlier, and are smaller than $Y$, we can use induction to show that the cost of edges in $\mathcal{S}_2$ is at most $(i+1)\alpha \sum_{X \in \mathcal{S}_2} w_X$. All clusters in $\mathcal{S}_1$ are of tier $i$, and so must have been formed before iteration $i$ (any cluster formed during iteration $i$ is of a strictly greater tier), so we use induction to bound the cost of edges in $\mathcal{S}_1$ by $i\alpha \sum_{X \in \mathcal{S}_1} w_X$.

⁶ The vertex $z$ may not be in any cluster $X_j$. In this case, $P_v$ is formed by using edges of $C$ from $z$ to $v' \in X_a$, and then a path from $v'$ to $v$; $P_w$ is formed similarly.
Finally, because $C$ was a good-density cycle, and only clusters of tier $i$ contribute to calculating the density of $C$, the cost of $C$ is at most $\alpha \sum_{X \in \mathcal{S}_1} w_X$. Therefore, the total cost of edges in $Y$ is at most $(i+1)\alpha \sum_{X \in \mathcal{S}} w_X = (i+1)\alpha w_Y$.

Let $Y$ be an arbitrary large cluster; since we have only $\lceil \log k \rceil$ tiers, the previous lemma implies that the cost of $Y$ is at most $\lceil \log k \rceil \cdot \alpha w_Y = O(\log^2 k)\rho w_Y$. That is, the density of $Y$ is at most $O(\log^2 k)\rho$, and we can use this fact together with Lemma 4.6 to find a solution to the rooted $k$-2VC problem of cost at most $O(\log^2 k)\rho k + 2L$. This completes the "weaker" analysis, but it does not suffice to prove Theorem 2.3; to prove the theorem, we need a large cluster $Y$ of density $O(\log k)\rho$, instead of $O(\log^2 k)\rho$.

For the purpose of the more careful analysis, we implicitly construct a forest $F$ on the clusters formed by MergeClusters. Initially, the vertex set of $F$ is just $S$, the set of terminals, and $F$ has no edges. Every time a cluster $Y$ is formed by merging $X_1, X_2, \ldots, X_q$, we add a corresponding vertex $Y$ to the forest $F$, and add edges from $Y$ to each of $X_1, \ldots, X_q$; $Y$ is the parent of $X_1, \ldots, X_q$. We also associate a cost with each vertex in $F$: the cost of the vertex $Y$ is the cost of the cycle used to form $Y$ from $X_1, \ldots, X_q$. We thus build up trees as the algorithm proceeds; the root of any tree corresponds to a cluster that has not yet become part of a bigger cluster. The leaves of the trees correspond to vertices of $G$; they all have cost 0. Also, any large cluster $Y$ formed by the algorithm is at the root of its tree; we refer to this tree as $T_Y$.

For each large cluster $Y$ after MergeClusters terminates, we say that $Y$ is of type $i$ if $Y$ was formed during iteration $i$ of MergeClusters.
We now define the final-stage clusters of $Y$: they are the clusters formed during iteration $i$ that became part of $Y$. (We include $Y$ itself in the list of final-stage clusters; even though $Y$ was formed in iteration $i$ of MergeClusters, it may contain other final-stage clusters. For instance, during iteration $i$, we may merge several tier $i$ clusters to form a cluster $X$ of tier $j > i$. Then, if we find a good-density cycle $C$ that contains an edge of $X$, $X$ will merge with the other clusters of $C$.) The penultimate clusters of $Y$ are those clusters that exist just before the beginning of iteration $i$ and become a part of $Y$. Equivalently, the penultimate clusters are those formed before iteration $i$ that are the immediate children in $T_Y$ of final-stage clusters.

Figure 3 illustrates the definitions of final-stage and penultimate clusters. Such a tree could be formed if, in iteration $i - 1$, four clusters of tier $i - 1$ merged to form $D$, a cluster of tier $i + 1$. Subsequently, in iteration $i$, clusters $H$ and $J$ merge to form $F$. We next find a good cycle containing $E$ and $G$; $F$ contains an edge of this cycle, so these three clusters are merged to form $B$. Note that the cost of this cycle is paid for by the weights of $E$ and $G$ only; $F$ is a tier $i + 1$ cluster, and so its weight is not included in the density calculation. Finally, we find a good cycle paid for by $A$ and $C$; since $B$ and $D$ share edges with this cycle, they all merge to form the large cluster $Y$.

Figure 3: A part of the tree $T_Y$ corresponding to $Y$, a large cluster of type $i$. [Tree: $Y$ has children $A$ (tier $i$), $B$ (tier $i+2$), $C$ (tier $i$), and $D$ (tier $i+1$); $B$ has children $E$ (tier $i$), $F$ (tier $i+1$), and $G$ (tier $i$); $F$ has children $H$ and $J$ (both tier $i$).] The number in each vertex indicates the tier of the corresponding cluster. Only final-stage and penultimate clusters are shown: final-stage clusters are indicated with a double circle; all other clusters are penultimate.
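The forest $F$, and the split of $T_Y$ into final-stage and penultimate clusters, can be sketched in Python. The `Cluster` record, the `merge` helper, and the cycle costs are hypothetical; the example rebuilds the tree of Figure 3 with $i = 5$ and assumes, for simplicity, that every penultimate cluster was formed in iteration $i - 1$ (the text only requires that they were formed before iteration $i$). Since a cluster's ancestors in $T_Y$ are formed no earlier than it and no later than $Y$, the final-stage clusters are exactly those reached from $Y$ through clusters formed in iteration $i$.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A vertex of the forest F: a cluster, the MergeClusters iteration that
    formed it, and the cost of the cycle used to form it."""
    name: str
    formed_in: int            # -1 for a terminal (a leaf of F)
    cycle_cost: float = 0.0   # leaves of F have cost 0
    children: list = field(default_factory=list)


def merge(name, iteration, cycle_cost, parts):
    """Record in F that a cycle of cost `cycle_cost` merged `parts` into a new cluster."""
    return Cluster(name, iteration, cycle_cost, list(parts))


def final_and_penultimate(y):
    """Split the clusters of T_Y, for a large cluster Y of type i = y.formed_in.

    Final-stage: clusters formed during iteration i that became part of Y
    (Y included). Penultimate: immediate children of final-stage clusters
    that were formed before iteration i.
    """
    i = y.formed_in
    final, penultimate, stack = [], [], [y]
    while stack:
        x = stack.pop()
        final.append(x)
        for child in x.children:
            if child.formed_in == i:
                stack.append(child)        # formed in iteration i: final-stage
            else:
                penultimate.append(child)  # formed earlier: penultimate
    return final, penultimate


# The tree of Figure 3, with i = 5 and hypothetical cycle costs.
i = 5
A, C, E, G, H, J = (Cluster(n, i - 1) for n in "ACEGHJ")
D = Cluster("D", i - 1)  # formed in iteration i - 1
F = merge("F", i, 2.0, [H, J])
B = merge("B", i, 3.0, [E, F, G])
Y = merge("Y", i, 4.0, [A, B, C, D])
final, penultimate = final_and_penultimate(Y)
# Sum of the costs of final-stage vertices: the quantity bounded in Lemma 4.8.
print(sorted(x.name for x in final), sum(x.cycle_cost for x in final))
# prints ['B', 'F', 'Y'] 9.0
```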
An edge of a large cluster $Y$ is said to be a final edge if it is used in a cycle $C$ that produces a final-stage cluster of $Y$. All other edges of $Y$ are called penultimate edges; note that any penultimate edge is in some penultimate cluster of $Y$. We define the final cost of $Y$ to be the sum of the costs of its final edges, and its penultimate cost to be the sum of the costs of its penultimate edges; clearly, the cost of $Y$ is the sum of its final and penultimate costs. We bound the final costs and penultimate costs separately.

Recall that an edge is a final edge of a large cluster $Y$ of type $i$ if it is used by MergeClusters to form a cycle $C$ during iteration $i$, the iteration in which $Y$ is formed. The reason we can bound the cost of final edges is that the cost of any such cycle is at most $\alpha$ times the weight of clusters contained in the cycle, and a cluster does not contribute to the weight of more than one cycle in an iteration. (This is also the essence of Lemma 4.7.) We formalize this intuition in the next lemma.

Lemma 4.8. The final cost of any large cluster $Y$ is at most $\alpha w_Y$, where $w_Y$ is the weight of $Y$.

Proof. Let $Y$ be an arbitrary large cluster, of type $i$. In the construction of the tree $T_Y$, we associated with each vertex of $T_Y$ the cost of the cycle used to form the corresponding cluster. To bound the total final cost of $Y$, we must bound the sum of the costs of vertices of $T_Y$ associated with final-stage clusters. The weight $w_Y$ of $Y$ is at least the sum of the weights of the penultimate tier $i$ clusters that become a part of $Y$. Therefore, it suffices to show that the sum of the costs of vertices of $T_Y$ associated with final-stage clusters is at most $\alpha$ times the sum of the weights of $Y$'s penultimate tier $i$ clusters. (Note that a tier $i$ cluster must have been formed prior to iteration $i$, and hence it cannot itself be a final-stage cluster.)
A cycle was used to construct a final-stage cluster $X$ only if its cost was at most $\alpha$ times the sum of the weights of the penultimate tier $i$ clusters that become a part of $X$. (Larger clusters may become a part of $X$, but they do not contribute weight to the density calculation.) Therefore, if $X$ is a vertex of $T_Y$ corresponding to a final-stage cluster, the cost of $X$ is at most $\alpha$ times the sum of the weights of its tier $i$ immediate children in $T_Y$. But $T_Y$ is a tree, and so no vertex corresponding to a penultimate tier $i$ cluster has more than one parent. That is, the weight of a penultimate cluster pays for only one final-stage cluster. Therefore, the sum of the costs of vertices associated with final-stage clusters is at most $\alpha$ times the sum of the weights of $Y$'s penultimate tier $i$ clusters, and so the final cost of $Y$ is at most $\alpha w_Y$.

Lemma 4.9. If $Y_1$ and $Y_2$ are distinct large clusters of the same type, no edge is a penultimate edge of both $Y_1$ and $Y_2$.

Proof. Suppose, by way of contradiction, that some edge $e$ is a penultimate edge of both $Y_1$ and $Y_2$, which are large clusters of type $i$. Let $X_1$ (respectively $X_2$) be a penultimate cluster of $Y_1$ (resp. $Y_2$) containing $e$. As penultimate clusters, both $X_1$ and $X_2$ are formed before iteration $i$. But until iteration $i$, neither is part of a large cluster, and two small clusters cannot share an edge without being merged. Therefore, $X_1$ and $X_2$ must have been merged, so they cannot belong to distinct large clusters, giving the desired contradiction.

Theorem 4.10. After MergeClusters terminates, at least one large cluster has density at most $O(\log k)\rho$.

Proof. We define the penultimate density of a large cluster to be the ratio of its penultimate cost to its weight. Consider the total penultimate costs of all large clusters: for any $i$, each edge $e \in E(G)$ can be a penultimate edge of at most one large cluster of type $i$.
This implies that each edge can be a penultimate edge of at most $\lceil \log k \rceil$ clusters. Therefore, the sum of the penultimate costs of all large clusters is at most $\lceil \log k \rceil\, \mathrm{cost}(G)$. Further, the total weight of all large clusters is at least $\ell/2$. Therefore, the (weighted) average penultimate density of large clusters is at most $\frac{2\lceil \log k \rceil \mathrm{cost}(G)}{\ell} = 2\lceil \log k \rceil \rho$, and hence there exists a large cluster $Y$ of penultimate density at most $2\lceil \log k \rceil \rho$. The penultimate cost of $Y$ is, therefore, at most $2\lceil \log k \rceil \rho w_Y$, and from Lemma 4.8, the final cost of $Y$ is at most $\alpha w_Y$. Therefore, the density of $Y$ is at most $\alpha + 2\lceil \log k \rceil \rho = O(\log k)\rho$.

Theorem 4.10 and Lemma 4.6 together imply that we can find a solution to the rooted $k$-2VC problem of cost at most $O(\log k)\rho k + 2L$. This completes our proof of Theorem 2.3.

5 Conclusions

We list the following open problems:

• Can the approximation ratio for the $k$-2VC problem be improved from the current $O(\log \ell \log k)$ to $O(\log n)$ or better? Removing the dependence on $\ell$ to obtain even $O(\log^2 k)$ could be interesting. If not, can one improve the approximation ratio for the easier $k$-2EC problem?

• Can we obtain approximation algorithms for the $k$-$\lambda$VC or $k$-$\lambda$EC problems for $\lambda > 2$? In general, few results are known for problems where vertex-connectivity is required to be greater than 2, but there has been more progress with higher edge-connectivity requirements.

• Given a 2-connected graph of density $\rho$ with some vertices marked as terminals, we show that it contains a non-trivial cycle with density at most $\rho$, and give an algorithm to find such a cycle. We have also found an $O(\log \ell)$-approximation for the problem of finding a minimum-density non-trivial cycle. Is there a constant-factor approximation for this problem? Can it be solved exactly in polynomial time?
Acknowledgments: We thank Mohammad Salavatipour for helpful discussions on k-2EC and related problems. We thank Erin Wolf Chambers for useful suggestions on notation.

References

[1] A. Agrawal, P. N. Klein, and R. Ravi. When trees collide: An Approximation Algorithm for the Generalized Steiner Problem on Networks. SIAM J. on Computing, 24(3):440–456, 1995.
[2] R. Ahuja, T. Magnanti, and J. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, Upper Saddle River, NJ, 1993.
[3] B. Awerbuch, Y. Azar, A. Blum, and S. Vempala. New Approximation Guarantees for Minimum Weight k-Trees and Prize-Collecting Salesmen. SIAM J. on Computing, 28(1):254–262, 1999. Preliminary version in Proc. of ACM STOC, 1995.
[4] A. Blum, S. Chawla, D. Karger, T. Lane, A. Meyerson, and M. Minkoff. Approximation Algorithms for Orienteering and Discounted-Reward TSP. SIAM J. on Computing, 37(2):653–670, 2007.
[5] A. Blum, R. Ravi, and S. Vempala. A Constant-factor Approximation Algorithm for the k-MST Problem. JCSS, 58:101–108, 1999. Preliminary version in Proc. of ACM STOC, 1996.
[6] K. Chaudhuri, B. Godfrey, S. Rao, and K. Talwar. Paths, trees, and minimum latency tours. Proc. of IEEE FOCS, 36–45, 2003.
[7] C. Chekuri, G. Even, A. Gupta, and D. Segev. Set Connectivity Problems in Undirected Graphs and the Directed Steiner Network Problem. Proc. of ACM-SIAM SODA, 532–541, 2008.
[8] C. Chekuri, M. T. Hajiaghayi, G. Kortsarz, and M. R. Salavatipour. Approximation algorithms for Non-uniform Buy-at-bulk Network Design. Proc. of IEEE FOCS, 677–686, 2006.
[9] C. Chekuri, M. T. Hajiaghayi, G. Kortsarz, and M. R. Salavatipour. Approximation Algorithms for Node-weighted Buy-at-bulk Network Design. Proc. of ACM-SIAM SODA, 1265–1274, 2007.
[10] C. Chekuri, N. Korula, and M. Pál. Improved Algorithms for Orienteering and Related Problems. Proc. of ACM-SIAM SODA, 661–670, 2008.
[11] F. A. Chudak, T. Roughgarden, and D. P. Williamson. Approximate k-MSTs and k-Steiner Trees via the Primal-Dual Method and Lagrangean Relaxation. Proc. of IPCO, 60–70, 2001.
[12] U. Feige, G. Kortsarz, and D. Peleg. The Dense k-Subgraph Problem. Algorithmica, 29(3):410–421, 2001. Preliminary version in Proc. of IEEE FOCS, 1993.
[13] L. Fleischer, K. Jain, and D. P. Williamson. Iterative Rounding 2-approximation Algorithms for Minimum-cost Vertex Connectivity Problems. J. of Computer and System Sciences, 72(5):838–867, 2006.
[14] N. Garg. Saving an ε: A 2-approximation for the k-MST Problem in Graphs. Proc. of ACM STOC, 396–402, 2005.
[15] N. Garg. A 3-approximation for the Minimum Tree Spanning k Vertices. Proc. of IEEE FOCS, 302–309, 1996.
[16] M. X. Goemans and D. P. Williamson. A General Approximation Technique for Constrained Forest Problems. SIAM J. on Computing, 24(2):296–317, 1995.
[17] M. X. Goemans and D. P. Williamson. The Primal-Dual method for Approximation Algorithms and its Application to Network Design Problems. In D. S. Hochbaum, editor, Approximation Algorithms for NP-Hard Problems. PWS Publishing Company, 1996.
[18] M. T. Hajiaghayi and K. Jain. The Prize-Collecting Generalized Steiner Tree Problem via a New Approach of Primal-Dual Schema. Proc. of ACM-SIAM SODA, 631–640, 2006.
[19] D. S. Hochbaum, editor. Approximation Algorithms for NP-Hard Problems. PWS Publishing Company, 1996.
[20] K. Jain. A Factor 2 Approximation Algorithm for the Generalized Steiner Network Problem. Combinatorica, 21(1):39–60, 2001. Preliminary version in Proc. of IEEE FOCS, 448–457, 1998.
[21] D. S. Johnson. Approximation Algorithms for Combinatorial Problems. J. of Computer and System Sciences, 9(3):256–278, 1974.
[22] L. C. Lau, J. Naor, M. Salavatipour, and M. Singh. Survivable Network Design with Degree or Order Constraints. Proc. of ACM STOC, 2007.
[23] L. C. Lau, J. Naor, M. Salavatipour, and M. Singh. Survivable Network Design with Degree or Order Constraints. Manuscript, 2007.
[24] P. D. Seymour. Nowhere-zero 6-flows. J. Comb. Theory B, 30:130–135, 1981.
[25] V. V. Vazirani. Approximation Algorithms. Springer, 2001.