Drawing (Complete) Binary Tanglegrams: Hardness, Approximation, Fixed-Parameter Tractability
A \emph{binary tanglegram} is a drawing of a pair of rooted binary trees whose leaf sets are in one-to-one correspondence; matching leaves are connected by inter-tree edges. For applications, for example, in phylogenetics, it is essential that both t…
Authors: Kevin Buchin, Maike Buchin, Jaroslaw Byrka
Drawing (Complete) Binary Tanglegrams: Hardness, Approximation, Fixed-Parameter Tractability∗

Kevin Buchin†, Maike Buchin‡, Jaroslaw Byrka§, Martin Nöllenburg¶, Yoshio Okamoto‖, Rodrigo I. Silveira∗∗, Alexander Wolff††

Abstract

A binary tanglegram is a drawing of a pair of rooted binary trees whose leaf sets are in one-to-one correspondence; matching leaves are connected by inter-tree edges. For applications, for example, in phylogenetics, it is essential that both trees are drawn without edge crossings and that the inter-tree edges have as few crossings as possible. It is known that finding a tanglegram with the minimum number of crossings is NP-hard and that the problem is fixed-parameter tractable with respect to that number.

We prove that under the Unique Games Conjecture there is no constant-factor approximation for binary trees. We show that the problem is NP-hard even if both trees are complete binary trees. For this case we give an $O(n^3)$-time 2-approximation and a new, simple fixed-parameter algorithm. We show that the maximization version of the dual problem for binary trees can be reduced to a version of MaxCut for which the algorithm of Goemans and Williamson yields a 0.878-approximation.

Keywords. Binary tanglegram · crossing minimization · NP-hardness · approximation algorithm · fixed-parameter tractability

∗ Work started at the 10th Korean Workshop on Computational Geometry, Dagstuhl, Germany, 2007. A preliminary version [BBB+09] of this paper was presented at the 16th International Symposium on Graph Drawing (GD'08).
† Faculteit Wiskunde en Informatica, TU Eindhoven, The Netherlands. Email: k.a.buchin@tue.nl. K. Buchin was supported by the Netherlands Organisation for Scientific Research (NWO) under project no. 639.022.707 and 642.065.503.
‡ Faculteit Wiskunde en Informatica, TU Eindhoven, The Netherlands. Email: m.e.buchin@tue.nl. M.
Buchin was supported by the German Research Foundation (DFG) under grant no. BU 2419/1-1 and by the Netherlands Organisation for Scientific Research (NWO) under project no. 642.065.503.
§ Institute of Computer Science, University of Wroclaw, Poland. Email: jby@ii.uni.wroc.pl. J. Byrka was partially supported by MNiSW grant number N N206 368839, 2010-2013. His research was partially conducted at TU Eindhoven and at EPFL, Lausanne.
¶ Institute of Theoretical Informatics, Karlsruhe Institute of Technology (KIT), Germany. Email: noellenburg@kit.edu. M. Nöllenburg was supported by grant WO 758/4-3 of the German Research Foundation (DFG).
‖ Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Japan. Email: okamoto@is.titech.ac.jp. Y. Okamoto was partially supported by Grant-in-Aid for Scientific Research and Global COE Program “Computationism as a Foundation for the Sciences” from Ministry of Education, Science and Culture, Japan, and Japan Society for the Promotion of Science.
∗∗ Dept. Matemàtica Aplicada II, Universitat Politècnica de Catalunya, Spain. Email: rodrigo.silveira@upc.edu. R. Silveira was supported by the Netherlands Organisation for Scientific Research (NWO).
†† Lehrstuhl I, Institut für Informatik, Universität Würzburg, Germany. WWW: www1.informatik.uni-wuerzburg.de/en/staff/wolff_alexander

1 Introduction

In this paper we are interested in drawing so-called tanglegrams [Pag02], that is, comparative drawings of pairs of rooted trees whose leaf sets are in one-to-one correspondence. The need to visually compare pairs of trees arises in applications such as the analysis of software projects, phylogenetics, or clustering. In the first application, trees may represent package-class-method hierarchies or the decomposition of a project into layers, units, and modules [HvW08].
The aim is to analyze changes in hierarchy over time or to compare human-made decompositions with automatically generated ones. Whereas trees in software analysis can have nodes of arbitrary degree, trees from our second application, that is, (rooted) phylogenetic trees, are binary trees. This makes binary tanglegrams an interesting special case, see Fig. 1. Tanglegrams in phylogenetics are used, for example, to study cospeciation [Pag02] or to compare evolutionary trees for the speciation of a single lineage but from different tree building methods. Hierarchical clusterings, our third application, are usually visualized by a binary tree-like structure called dendrogram, where elements are represented by the leaves and each internal node of the tree represents the cluster containing the leaves in its subtree. Pairs of dendrograms stemming from different clustering processes of the same data can be compared visually using tanglegrams.

Note that we are interested in minimizing the number of crossings for visualization purposes. The minimum, as a number, is not primarily intended to be a tree-distance measure (since, for example, a crossing number of zero does not mean that two trees are equal). Examples of such measures are nearest-neighbor interchange and subtree transfer [DHJ+97].

Fig. 1: A binary tanglegram showing two evolutionary trees for lice of pocket gophers [HSV+94]: (a) arbitrary layout; (b) layout by our 2-approximation algorithm.

Let $S$ and $T$ be two rooted, unordered, $n$-leaf trees with node sets $V(S)$ and $V(T)$, edge sets $E(S)$ and $E(T)$, and leaf sets $L(S) \subseteq V(S)$ and $L(T) \subseteq V(T)$, respectively. In the remainder of the paper, unless explicitly stated otherwise, trees are considered to be rooted and unordered.
We say that the pair of trees $\langle S, T \rangle$ is uniquely leaf-labeled if there are two bijective labeling functions $\lambda_S \colon L(S) \to \Lambda$ and $\lambda_T \colon L(T) \to \Lambda$, where $\Lambda = \{1, \dots, n\}$ is a set of labels. For a uniquely leaf-labeled pair of trees $\langle S, T \rangle$ we define the set $E(S, T) = \{uv \mid u \in L(S), v \in L(T), \lambda_S(u) = \lambda_T(v)\}$ of inter-tree edges, where each edge in $E(S, T)$ connects two leaves with the same label.

Tanglegram Layout Problem¹ (TL). Given a uniquely leaf-labeled pair of trees $\langle S, T \rangle$, find a tanglegram of $\langle S, T \rangle$, that is, a drawing of the graph $G = (V(S) \cup V(T), E(S) \cup E(T) \cup E(S, T))$ in the plane, with the following properties:

1. the subdrawing of $S$ is a plane, leftward drawing of $S$ with the leaves $L(S)$ on the line $x = 0$ and each parent node strictly to the left of all its children;
2. the subdrawing of $T$ is a plane, rightward drawing of $T$ with the leaves $L(T)$ on the line $x = 1$ and each parent node strictly to the right of all its children;
3. the inter-tree edges $E(S, T)$ are drawn as straight-line segments;
4. the number of crossings (between inter-tree edges) in the drawing is minimum.

¹ The name follows the common terminology in the biology literature [Pag02, LPR+07, VASG10]. Note that the problem has also been called the two-tree crossing minimization problem [FKP05] or the stratified tree ordering problem [DS04].

In this paper we consider binary tanglegrams, that is, tanglegrams that consist of two rooted binary trees. We call the restriction of TL to binary trees the binary TL problem. We say that a rooted binary tree is complete (or perfect) if all its leaves have the same distance to the root. Accordingly, we call the restriction of the binary TL problem to complete binary trees the complete binary TL problem.
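To make property 4 concrete: once the vertical orders of the leaves on the lines $x = 0$ and $x = 1$ are fixed, two straight-line inter-tree edges cross exactly when their endpoints appear in opposite relative order on the two lines. The following is a minimal Python sketch of that count; the function name `count_crossings` and the list-of-labels input format are illustrative assumptions, not notation from the paper, and the quadratic loop is chosen for clarity rather than speed.

```python
from itertools import combinations


def count_crossings(sigma, tau):
    """Count crossings among inter-tree edges for two fixed leaf orders.

    sigma: labels of L(S), listed from top to bottom on the line x = 0
    tau:   labels of L(T), listed from top to bottom on the line x = 1
    Both lists must contain the same labels, each exactly once.
    """
    pos_s = {label: i for i, label in enumerate(sigma)}
    pos_t = {label: i for i, label in enumerate(tau)}
    if (len(pos_s) != len(sigma) or len(pos_t) != len(tau)
            or pos_s.keys() != pos_t.keys()):
        raise ValueError("leaf sets must be in one-to-one correspondence")

    crossings = 0
    for a, b in combinations(sigma, 2):
        # Edges with labels a and b cross iff a and b are ordered
        # differently on the two vertical lines.
        if (pos_s[a] - pos_s[b]) * (pos_t[a] - pos_t[b]) < 0:
            crossings += 1
    return crossings
```

For example, `count_crossings([1, 2, 3], [2, 1, 3])` counts the single crossing between the edges labeled 1 and 2. The sketch only evaluates a given pair of orders; it does not check that the orders can be realized by plane drawings of $S$ and $T$.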
Figure 1 shows two binary tanglegrams for the same pair of trees, an arbitrary tanglegram and one with a minimum number of crossings.

The TL problem is purely combinatorial: Given a tree $T$, we say that a linear order of $L(T)$ is compatible with $T$ if for each node $v$ of $T$ the leaves in the subtree of $v$ form an interval in the order. For a binary tree $T$ the linear orders of $L(T)$ that are compatible with $T$ are exactly those orders that can be obtained from an initial plane leftward (or rightward) drawing of $T$ by performing a sequence of subtree swaps that flip the order of the two child subtrees at an internal node. Given a permutation $\pi$ of $\{1, \dots, n\}$, we call $(i, j)$ an inversion in $\pi$ if $i < j$ and $\pi(i) > \pi(j)$. For fixed orders $\sigma$ of $L(S)$ and $\tau$ of $L(T)$ we define the permutation $\pi_{\tau,\sigma}$, which for a given position in $\tau$ returns the position in $\sigma$ of the leaf having the same label. Now the TL problem consists in finding an order $\sigma$ of $L(S)$ compatible with $S$ and an order $\tau$ of $L(T)$ compatible with $T$ such that the number of inversions in $\pi_{\tau,\sigma}$ is minimum.

Related problems. In graph drawing the so-called two-sided crossing minimization problem (2SCM) is an important problem that occurs when computing layered graph layouts. Such layouts were introduced by Sugiyama et al. [STT81] and are widely used for drawing hierarchical graphs. In 2SCM, vertices of a bipartite graph are to be placed on two parallel lines (called layers) such that vertices on one line are adjacent only to vertices on the other line. As in TL the objective is to minimize the number of edge crossings provided that edges are drawn as straight-line segments. In one-sided crossing minimization (1SCM) the order of the vertices on one of the layers is fixed. Even 1SCM is NP-hard [EW94].
In contrast to TL, a vertex in an instance of 1SCM or 2SCM can have several incident edges and the linear order of the vertices in the non-fixed layer is not required to be compatible with a tree. The following is known about 1SCM. The median heuristic of Eades and Wormald [EW94] yields a 3-approximation and a randomized algorithm of Nagamochi [Nag05] yields an expected 1.4664-approximation. Dujmovič et al. [DFK08] give an FPT algorithm that runs in $O^*(1.4656^k)$ time, where $k$ is the minimum number of crossings in any 2-layer drawing of the given graph that respects the vertex order of the fixed layer. The $O^*(\cdot)$-notation ignores polynomial factors.

Previous work. Dwyer and Schreiber [DS04] draw series of related tanglegrams in 2.5 dimensions. Each tree is drawn on a plane, and the planes are stacked on top of each other. They consider a one-sided version of binary TL by fixing the layout of the first tree in the stack, and then, plane-by-plane, computing the leaf order of the next tree in $O(n^2 \log n)$ time each. Binary TL is also studied by Fernau et al. [FKP05], although they refer to it as the two-tree crossing minimization problem. They show that binary TL is NP-hard and give a fixed-parameter algorithm that runs in $O^*(c^k)$ time, where $c$ is a constant estimated to be 1024 and $k$ is the minimum number of crossings in any drawing of the given tanglegram. In addition, they show that the one-sided version of binary TL can be solved in $O(n \log^2 n)$ time. This improves on the result of Dwyer and Schreiber [DS04]. Fernau et al. also make the simple observation that the edges of the tanglegram can be directed from one root to the other. Thus the existence of a crossing-free tanglegram can be verified using a linear-time upward-planarity test for single-source directed acyclic graphs [BDMT98]. Later, apparently not being aware of the above-mentioned results, Lozano et al.
[LPR+07] give a quadratic-time algorithm for the same special case, to which they refer as planar tanglegram layout. Holten and van Wijk [HvW08] present a visualization tool for general tanglegrams that heuristically reduces crossings (using the barycenter method for 1SCM on a per-level base) and draws inter-tree edges in bundles (using Bézier curves).

Our results. We first analyze the complexity of binary TL, see Section 2. We show that binary TL is essentially as hard as the MinUncut problem. If the (widely accepted) Unique Games Conjecture holds, it is NP-hard to approximate MinUncut, and thus binary TL, within any constant factor [KV05]. This motivates us to consider complete binary TL. It turns out that this special case has a rich structure. We start our investigation by giving a new reduction from Max2Sat that establishes the NP-hardness of complete binary TL.

The main result of this paper is a simple recursive factor-2 approximation algorithm for complete binary TL, see Section 3. It runs in $O(n^3)$ time and extends to $d$-ary trees. Our algorithm can also process non-complete binary tanglegrams, but without guaranteeing any approximation ratio. It works well in practice and is quite fast when combined with branch-and-bound [NVWH09].

Next we consider a dual problem: maximize the number of edge pairs that do not cross. We show that this problem (for binary trees) can be reduced to a version of MaxCut for which the algorithm of Goemans and Williamson [GW95] yields a 0.878-approximation.

Finally, we investigate the parameterized complexity of complete binary TL. Our parameter is the number $k$ of crossings in an optimal drawing. We give a new FPT algorithm for complete binary TL that is much simpler and faster than the FPT algorithm for binary TL by Fernau et al. [FKP05]. The running time of our algorithm is $O(4^k n^2)$, see Section 4.
An interesting feature of the algorithm is that the parameter does not drop in each level of the recursion.

Subsequent work. Since the presentation of the preliminary version [BBB+09] of this work, the TL problem has received a lot of attention. We briefly summarize these recent developments. Böcker et al. [BHTW09] present a fixed-parameter algorithm for binary TL that runs in $O(2^k n^4)$ time. They further give a kernel-like bound for complete binary TL. Baumann et al. [BBL10] study a generalized version of TL, in which the leaves no longer have to be in one-to-one correspondence; instead, the inter-tree edges may form any bipartite graph. They show how to formulate the problem as a quadratic linear-ordering problem with additional side constraints. Bansal et al. [BCEFB09] study the same generalization, but restricted to binary TL. For the one-sided case (where the leaf order of one tree is fixed), they give a polynomial-time algorithm. On instances of (non-generalized) one-sided binary TL, their algorithm runs in $O(n \log^2 n / \log\log n)$ time, improving on the algorithm of Fernau et al. Finally, Venkatachalam et al. [VASG10] give an $O(n \log n)$-time solution for the same problem.

2 Complexity

In this section we consider the complexity of binary TL, which Fernau et al. [FKP05] have shown to be NP-complete. We strengthen their findings in two ways. First, we show that it is unlikely that an efficient constant-factor approximation for binary TL exists. Second, we show that TL remains hard even when restricted to complete binary tanglegrams.

We start by showing that binary TL is essentially as hard as MinUncut, the dual formulation of the classic MaxCut problem [GJ79]. This result relates the existence of a constant-factor approximation for binary TL to the Unique Games Conjecture (UGC). The UGC was introduced by Khot [Kho02] in the context of interactive proofs.
It concerns a scenario with two provers and a single round of answers to a question of the verifier. The word “unique” refers to the strategy of the verifier, who for any fixed answer of one of the provers will accept the proof only if the other prover gives the unique second part of the proof. The provers cannot communicate with each other. Still they want to maximize the probability of the proof being accepted given that questions of the verifier are drawn randomly from a given distribution. The UGC states that it is NP-hard to decide whether the optimal strategy of the provers gives them a high probability of success.

The UGC became famous when it was discovered that it implies optimal hardness-of-approximation results for problems such as MaxCut and VertexCover, and forbids constant-factor approximation algorithms for problems such as MinUncut and SparsestCut [KV05]. We reduce the MinUncut problem to the binary TL problem, which, by the result of Khot and Vishnoi [KV05], makes it unlikely that an efficient constant-factor approximation for binary TL exists.

The MinUncut problem is defined as follows. Given an undirected graph $G = (V, E)$, find a partition $(V_1, V_2)$ of the vertex set $V$ that minimizes the number of edges that are not cut by the partition, that is,

$$\min_{(V_1, V_2)} \bigl|\{uv \in E : \{u, v\} \subseteq V_1 \text{ or } \{u, v\} \subseteq V_2\}\bigr|.$$

Note that an optimal solution for MinUncut of a graph $G$ is at the same time an optimal solution for MaxCut of $G$. Nevertheless, the MinUncut problem is more difficult to approximate.

Theorem 2.1. Under the Unique Games Conjecture it is NP-hard to approximate the TL problem for binary trees within any constant factor.

Proof. As mentioned above, we reduce from the MinUncut problem. Our reduction is similar to the reduction in the NP-hardness proof by Fernau et al. [FKP05]. Consider an instance $G = (V, E)$ of the MinUncut problem.
We construct a binary TL instance $\langle S, T \rangle$ as follows. The two trees $S$ and $T$ are isomorphic and there are three groups of edges connecting leaves of $S$ to leaves of $T$. For simplicity of exposition, we permit multiple inter-tree edges between a pair of leaves and also an inter-tree connection of a leaf to many other leaves in the other tree. In the actual trees, we replace each such meta-leaf by a binary tree with the appropriate number of regular leaves.

Let $V = \{v_1, v_2, \dots, v_n\}$ be the vertex set of the graph $G$ that constitutes our MinUncut instance. Then we construct both $S$ and $T$ as follows. We start with what we call the backbone path $\langle v_{11}, v_{12}, v_{21}, v_{22}, \dots, v_{n1}, v_{n2}, a \rangle$ from the root node $v_{11}$ to a central leaf $a$. Additionally, for $i \in \{1, \dots, n\}$ and $j \in \{1, 2\}$, we attach each node $v_{ij}$ to a leaf $\ell_{ij}$. (The construction of $S$ and $T$ is illustrated, for the complete graph $K_3 = (\{v_1, v_2, v_3\}, \{v_1 v_2, v_2 v_3, v_3 v_1\})$, in Fig. 2.) In the remainder of this proof, where needed, we use a superscript to denote the tree to which a leaf belongs.

The inter-tree edges between $S$ and $T$ form the following three groups.

• Group A contains $n^{11}$ edges connecting the central leaves of the two trees.
• Group B contains, for each $v_i \in V$, $n^7$ edges connecting $\ell^S_{i1}$ with $\ell^T_{i2}$ and $n^7$ edges connecting $\ell^S_{i2}$ with $\ell^T_{i1}$.
• Group C contains, for each $v_i v_j \in E$, a single edge from $\ell^S_{i1}$ to $\ell^T_{j1}$.

Note that group C contains possibly more than one inter-tree edge attached to a single leaf in the described tree. The actual, final tree is then obtained by replacing each leaf of the tree described above by a tree with $O(n)$ new leaves such that no two inter-tree edges share a leaf. This replacement may cause new crossings, but no more than $O(n^2)$. Hence, these crossings can be neglected in the analysis, where only terms of order $n^{11}$ will matter.
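The three edge groups of the construction can be tabulated directly from the graph. The following Python sketch does this at the level of meta-leaves, before the replacement by regular leaves; the function name `reduction_edge_groups`, the encoding of $\ell_{ij}$ as the tuple `(i, j)`, the string `"a"` for the central leaf, and the dictionary layout are all illustrative assumptions rather than part of the proof.

```python
def reduction_edge_groups(n, edges):
    """Tabulate the inter-tree edge groups A, B, C of the reduction.

    n:     number of vertices v_1, ..., v_n of the MinUncut instance
    edges: list of pairs (i, j), 1 <= i, j <= n, one per edge v_i v_j

    Returns a dict mapping each group name to a dict
    {(leaf in S, leaf in T): multiplicity}, where the leaf l_ij is
    encoded as the tuple (i, j) and the central leaf as "a".
    """
    groups = {"A": {}, "B": {}, "C": {}}

    # Group A: n^11 parallel edges between the two central leaves.
    groups["A"][("a", "a")] = n ** 11

    # Group B: for each vertex, n^7 edges l^S_i1 -- l^T_i2
    # and n^7 edges l^S_i2 -- l^T_i1.
    for i in range(1, n + 1):
        groups["B"][((i, 1), (i, 2))] = n ** 7
        groups["B"][((i, 2), (i, 1))] = n ** 7

    # Group C: one edge l^S_i1 -- l^T_j1 per graph edge v_i v_j.
    for i, j in edges:
        key = ((i, 1), (j, 1))
        groups["C"][key] = groups["C"].get(key, 0) + 1

    return groups
```

For $K_3$, calling `reduction_edge_groups(3, [(1, 2), (2, 3), (3, 1)])` yields one group-A bundle of $3^{11}$ edges, six group-B bundles of $3^7$ edges each, and three single group-C edges.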
Next, we show how to transform any partition in $G$ into a solution of the corresponding binary TL instance $\langle S, T \rangle$. For our reduction we will apply this transformation to the partition of an optimal solution to the given MinUncut instance. Let $(V_1^*, V_2^*)$ be the given partition of $G$ and suppose that $k$ is the number of edges that are not cut. We now construct a drawing of $\langle S, T \rangle$ such that at most $k \cdot n^{11} + O(n^{10})$ pairs of edges cross. (In the example of Fig. 2 we consider the cut $(\{v_1\}, \{v_2, v_3\})$ with the uncut edge $v_2 v_3$.) We simply draw, for each vertex $v_i \in V_1^*$, the leaves $\ell^S_{i1}$ and $\ell^T_{i2}$ above the backbones, and the leaves $\ell^S_{i2}$ and $\ell^T_{i1}$ below the backbones. Symmetrically, for each vertex $v_i \in V_2^*$, we draw the leaves $\ell^S_{i1}$ and $\ell^T_{i2}$ below the backbones, and the leaves $\ell^S_{i2}$ and $\ell^T_{i1}$ above the backbones.

Let us check the resulting number of crossings. There are $k \cdot n^{11}$ A–C crossings, no A–B crossings, at most $|E| \cdot n^8 \in O(n^{10})$ B–C crossings, and at most $|E|^2 \in O(n^4)$ C–C crossings. (In Fig. 2, we have $k = 1$, $|E| = 3$, and $n^{11} + 2n^7 + 1$ crossings in total.)

Fig. 2: Binary TL instance corresponding to the graph $K_3$ and the cut $(\{v_1\}, \{v_2, v_3\})$. The crossings of the inter-tree edges are marked by gray ellipses.

Now, suppose there exists, for some constant $\alpha$, an $\alpha$-approximation algorithm for the binary TL problem. Applying this algorithm to the instance $\langle S, T \rangle$ defined above yields a drawing $D(S, T)$ with at most $\alpha \cdot k \cdot n^{11} + O(n^{10})$ crossings. Let us assume that $n$ is much larger than $\alpha$ and than any of the constants hidden in the $O(\cdot)$-notation. We show that from such a drawing $D(S, T)$ we would be able to reconstruct a cut $(V_1, V_2)$ in $G$ with at most $\alpha \cdot k$ uncut edges.

First, observe that nodes $\ell^S_{i1}$ and $\ell^T_{i2}$ must be drawn either both above or both below the backbones, otherwise there would be $n^{18}$ A–B crossings. Similarly, $\ell^S_{i2}$ must be on the same side as $\ell^T_{i1}$. Next, observe that nodes $\ell^S_{i1}$ and $\ell^S_{i2}$ must be drawn on different sides of the backbones, otherwise there would be $O(n^{14})$ B–B crossings. Finally, observe that if we interpret the set of vertices $v_i$ for which $\ell^S_{i1}$ is drawn above the backbone as the set $V_1$ of a partition of $G$ and its complement as the set $V_2$, then this partition leaves at most $\alpha \cdot k$ edges from $E$ uncut. Hence, an $\alpha$-approximation for the binary TL problem would provide an $\alpha$-approximation for the MinUncut problem, which would contradict the UGC.

The above negative result for binary TL is our motivation to investigate the complexity of complete binary TL. It turns out that even this special case is hard. Unlike Fernau et al. [FKP05], who showed hardness of binary TL by a reduction from MaxCut using extremely unbalanced trees, we use a quite different reduction from a variant of Max2Sat.

Theorem 2.2. The TL problem is NP-complete even for complete binary trees.

Proof. Recall the Max2Sat problem, which is defined as follows. Given a set $U = \{x_1, \dots, x_n\}$ of Boolean variables, a set $C = \{c_1, \dots, c_m\}$ of disjunctive clauses containing two literals each, and an integer $K$, the question is whether there is a truth assignment of the variables such that at least $K$ clauses are satisfied. We consider a restricted version of Max2Sat, where each variable appears in at most three clauses. This version remains NP-complete [RRR98].

Our reduction constructs two complete binary trees $S$ and $T$, in which certain aligned subtrees serve as variable gadgets and others as clause gadgets.
We further determine an integer $K'$ such that the instance $\langle S, T \rangle$ has at most $K'$ crossings if and only if the corresponding Max2Sat instance has a truth assignment that satisfies at least $K$ clauses.

The high-level structure of the two trees is depicted in Fig. 3. From top to bottom, the four subtrees at level 2 on both sides are a clause subtree, a variable subtree, another clause subtree, and finally a dummy subtree. The subtrees are connected to each other by inter-tree edges such that in any optimal solution they must be aligned in the depicted (or mirrored) order. Each clause gadget appears twice, once in each clause subtree, and is connected to the variable gadgets belonging to its two literals. Pairs of corresponding gadgets in $S$ and $T$ are connected to each other. Finally, non-crossing dummy edges connect unused leaves in order to make $S$ and $T$ complete. In the following, we describe the gadgets in more detail.

Variable gadgets. The basic structure of a variable gadget consists of two complete binary trees with 32 leaves each as shown in Fig. 4. Each tree has three highlighted subtrees of size 2 labeled $a, b, c$ and $a', b', c'$, respectively. From each of these subtrees there is one red connector edge leaving the gadget at the top and one leaving it at the bottom. As long as two connector edges from the same tree do not cross each other, they transfer the vertical order of the labeled subtrees towards a clause gadget. We define the configuration in Fig. 4a as true and the configuration in Fig. 4b as false. If the configuration is in its true state, the induced vertical order of the connector edges is $a < b < c$, otherwise the order is inverse: $c < b < a$.

It can easily be verified that both states have the same number of crossings. To see that this number is optimal, observe that each pair of connector edges from the same subtree (for example, subtree $a$) always crosses all 26 gray edges in the gadget.
Furthermore, all 24 crossings of two connector edges in the figure are mandatory. Finally, the four crossings among the gray edges between subtrees 1 and $2'$ and subtrees 2 and $1'$ are also optimal. (Otherwise, if subtree 1 is aligned with subtree $2'$, there are 12 edges from the upper subtree on the left to the lower subtree on the right and 10 edges from the lower subtree on the left to the upper subtree on the right that yield in total at least 120 gray–gray crossings in addition to the 24 red–red crossings and the 156 red–gray crossings as opposed to a total of 184 crossings in either configuration of Fig. 4.) Note that some internal swaps within the subtrees 1, 2, $1'$, $2'$ are possible that do not affect the number of crossings; none of them, however, changes the order of the connector edges since in any optimal solution the subtrees of the four crossing gray edges must always stay in the center of the gadget.

Note that so far the gadget in the figure is designed for a single appearance of the variable since the four connector-edge triplets are required for a single clause. For the Max2Sat reduction, however, each variable can appear up to three times in different clauses. By appending a complete binary tree with four leaves as in Fig. 5 to each leaf of the gadget in Fig. 4 and copying each edge accordingly, the above arguments still hold for the enlarged trees with 128 leaves each. Unused connector edges in opposite subtrees are linked to each other ($a$ to $a'$, $b$ to $b'$, $c$ to $c'$) as in Fig. 4b such that the number of crossings in the gadget remains balanced for both states.

Clause gadgets. For each clause $c_i = l_{i1} \lor l_{i2}$, where $l_{i1}$ and $l_{i2}$ denote the two literals, we create two clause gadgets: one in the upper clause subtrees and one in the lower clause subtrees (recall Fig. 3).
Fig. 3: High-level structure of the two trees $S$ and $T$. Red edges connect clause and variable gadgets, green edges connect corresponding gadget halves, and gray edges are dummy edges to complete the trees.

Fig. 4: The variable gadget in its two optimal configurations with 184 crossings: (a) $x$ = true; (b) $x$ = false. Red edges are drawn solid, whereas dash-dot style is used for gray edges.

Fig. 5: Replacing each edge by four edges: (a) a single gray edge; (b) two pairs of connector edges for a variable used in three clauses.

Each gadget itself consists of two parts: one part uses the connectors from the first variable in the left tree and those from the second variable in the right tree, and the other part vice versa. Figure 6 shows one such part of the gadget in the lower clause subtrees, where the connector edges lead upwards. The gadget in the upper clause subtree is simply a mirrored version.

The basic structure consists of two aligned subtrees with eight leaves as depicted in Fig. 6. Three of the leaves on each side serve as the missing endpoints for the triplets of connector edges from the corresponding variables. Recall that for a positive literal with value true the order of the connector edges is $a < b < c$, and for a positive literal with value false it is $c < b < a$. (For negative literals the meaning of the orders is inverted.) The two connector leaves for the edges labeled $a$ and $b$ are in the same four-leaf subtree, the connector leaf for $c$ is in the other subtree.

Three cases need to be distinguished. If (1) both literals are true, then the configuration in Fig. 6a is optimal with 21 crossings. If (2) only one literal is true, then Fig. 6b shows again an optimal configuration with 21 crossings.
Here the tree on the right side swapped the subtrees of the root node. Finally, if (3) both literals are false, there are at least 22 crossings in the gadget as shown in Fig. 6c. Since this substructure is repeated four times for each clause, we have 84 induced crossings for satisfied clauses and 88 induced crossings for unsatisfied clauses.

Fig. 6: Gadget for the clause $c_i = l_{i1} \lor l_{i2}$: (a) true ∨ true, 21 crossings; (b) false ∨ true, 21 crossings; (c) false ∨ false, 22 crossings.

Fig. 7: Linking adjacent variable gadgets for $x_i$ and $x_{i+1}$.

Reduction. We construct the gadgets for all variables and clauses and link them together as two trees $S$ and $T$, which are filled up with dummy leaves and edges such that they become complete binary trees. The general layout is as depicted in Fig. 3, where each dummy leaf in $S$ is connected to the opposite dummy leaf in $T$ such that there are no crossings among dummy edges. In each of the four main subtrees all dummy edges are consecutive. Thus of all dummy edges only those in the variable subtree have crossings with exactly half the connector edges.

It remains to compute the minimum number $M$ of crossings that are always necessary, even if all clauses are satisfied. Then the Max2Sat instance has a solution with at least $K$ satisfied clauses if and only if the constructed TL instance has a solution with at most $K' = M + 4(|C| - K)$ crossings. We get the corresponding variable assignment directly from the layout of the variable gadgets.

The first step for computing $M$ is to fix an (arbitrary) order for the variable gadgets in the variable subtree. Let this order be $x_1 < x_2 < \dots < x_n$.
We want to achieve that any other order would increase the number of crossings by a number that is too large for it to be part of an optimum solution. We first establish neighbor links between adjacent variable gadgets. For these neighbor links we need eight of the 128 leaves in each half of each variable gadget as shown in Fig. 7. Since both subtrees below the root of $x_i$ in $S$ and both subtrees below the root of $x_{i+1}$ in $T$ are connected to each other, the minimum number of crossings of those edges is independent of the truth state of each gadget. The next step is to enlarge the variable gadgets even further by repeatedly doubling all leaves until each variable gadget has at least $cm^2$ gray edges for some constant $c$. (Note that in subtrees containing red connector edges, we do not duplicate any red edges but rather create new gray edges, similarly to Fig. 4b.) Now changing the variable order causes at least $8cm^2$ additional crossings since at least eight neighbor links would cross at least one variable gadget. We explain how to choose $c$ later.

Once the order of the variables is fixed, we sort all clauses lexicographically (a clause with variables $x_i < x_j$ is smaller than a clause with variables $x_k < x_l$ if $x_i < x_k$ or if $x_i = x_k$ and $x_j < x_l$) and place smaller clauses towards the top of the clause subtrees. Consider two clause gadgets in the same clause subtree. Then, in the given clause order, there are crossings between their connector-edge triplets if and only if the intervals between their respective variables intersect in the variable order. Since these crossings are unavoidable for the given variable order, the number of connector-triplet crossings in the lexicographic order of the clauses is optimal. There are at most 36 crossings between the connector-edge triples of any pair of clause gadgets in each of the two clause subtrees.
So for all clause pairs in both clause subtrees we get at most $\gamma = 2 \cdot 36 \cdot m(m-1)/2$ crossings. If we choose the constant $c$ so that $8cm^2 > \gamma$, it never pays off to change the given variable order.

So we can finally compute all necessary crossings between connector edges, dummy edges and intra-gadget edges, which yields the number $M$. Since each gadget has polynomial size, the two trees and the number $M$ can be computed in polynomial time.

It is obvious that the complete binary TL problem is in NP.

3 Approximation Algorithm

We start with a basic observation about binary tanglegrams. As we have noted in the introduction, TL is a purely combinatorial problem, that is, it suffices to determine two leaf orders $\sigma$ and $\tau$ that are compatible with the input trees $S$ and $T$, respectively. These orders are completely determined by fixing an order of the two subtrees of each inner node $v \in S^\circ \cup T^\circ$, where $S^\circ$ and $T^\circ$ denote the sets of inner nodes of $S$ and $T$. The algorithm will recursively split the two trees $S$ and $T$ at their roots into two equally sized subinstances and determine leaf orders of $S$ and $T$ by choosing a locally optimal order of the subtrees below the left and right root of the current subinstance.

Let $\langle S_0, T_0 \rangle$ be an input instance for complete binary TL. We assume that an initial layout of $S_0$ and $T_0$ is given, that is, the subtrees of each $v \in S_0^\circ \cup T_0^\circ$ are ordered (otherwise choose an arbitrary initial layout). The root of a tree $T$ is denoted as $v_T$. For a binary tree $T$ with the two ordered subtrees $T_1$ and $T_2$ of $v_T$, we use the notation $T = (T_1, T_2)$. For each subinstance $\langle S, T \rangle$ with $S = (S_1, S_2)$ and $T = (T_1, T_2)$, we need to consider the four configurations $(S_1, S_2) \times (T_1, T_2)$ (initial layout), $(S_2, S_1) \times (T_1, T_2)$ (swap at $v_S$), $(S_1, S_2) \times (T_2, T_1)$ (swap at $v_T$), and $(S_2, S_1) \times (T_2, T_1)$ (swap at $v_S$ and $v_T$).
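The statement that a layout is fully determined by one swap decision per inner node can be made concrete with a small sketch. The representation is an assumption made only for illustration: trees are nested Python tuples, an inner node is addressed by its path of child indices from the root, matching leaves carry the same label in both trees, and the names `leaf_order` and `crossings` are hypothetical.

```python
from itertools import combinations

def leaf_order(tree, swaps, path=()):
    """Return the leaves of a nested-tuple binary tree from top to bottom.

    `swaps` is the set of inner nodes (given as paths of child indices
    in the initial layout) whose two subtrees are drawn in swapped order.
    """
    if not isinstance(tree, tuple):              # a leaf
        return [tree]
    first, second = (1, 0) if path in swaps else (0, 1)
    return (leaf_order(tree[first], swaps, path + (first,))
            + leaf_order(tree[second], swaps, path + (second,)))

def crossings(sigma, tau):
    """Count pairs of inter-tree edges that cross for leaf orders sigma, tau."""
    pos = {leaf: i for i, leaf in enumerate(tau)}
    ranks = [pos[leaf] for leaf in sigma]
    # Two edges cross exactly if their endpoints appear in opposite order.
    return sum(1 for i, j in combinations(range(len(ranks)), 2)
               if ranks[i] > ranks[j])

# Toy instance (not from the paper): leaf x of S is matched to leaf x of T.
S = ((1, 2), (3, 4))
T = ((1, 3), (4, 2))

initial = crossings(leaf_order(S, set()), leaf_order(T, set()))
# Swap only the lower inner node of T, addressed by the path (1,).
improved = crossings(leaf_order(S, set()), leaf_order(T, {(1,)}))
```

The quadratic inversion count keeps the sketch short; it is not meant to reflect the running times discussed in the text.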
For each configuration, we recursively solve two subinstances and then choose the configuration with the minimum number of crossings. We always split the instance $\langle S, T \rangle$ into an upper and a lower half, that is, the subinstances depend on the swap decision. If we swap both $v_S$ and $v_T$ or none, the two subinstances are $\langle S_1, T_1 \rangle$ and $\langle S_2, T_2 \rangle$; if only one side is swapped, the subinstances are $\langle S_1, T_2 \rangle$ and $\langle S_2, T_1 \rangle$. We solve both subinstances independently.

In order to achieve the desired approximation ratio, however, we cannot ignore the swap history of the predecessor nodes of $v_T$ and $v_S$. This history can be regarded as two bit strings $h_S$ and $h_T$ that represent the swap and no-swap decisions made at the previous steps of the recursion. Figure 8 shows an instance $\langle S, T \rangle$ and its swap history. The history is used to compute the number of current-level crossings of $\langle S, T \rangle$, that is, the number of crossings that are caused by the swap decisions made for the current subinstance. The number of current-level crossings and the recursively computed numbers of crossings of the subinstances determine which of the four configurations of the current instance is the best one.

Let $\mathrm{lca}(a, b)$ be the lowest common ancestor of two nodes $a$ and $b$ of the same tree. An important observation that is necessary to compute the number of current-level crossings is the following.

Observation. For each pair of inter-tree edges $ab$ and $cd$, $a, c \in L(S)$ and $b, d \in L(T)$, the swap decisions at the lowest common ancestors $\mathrm{lca}(a, c)$ and $\mathrm{lca}(b, d)$ completely determine whether $ab$ and $cd$ cross or not. Given the order of the subtrees of $\mathrm{lca}(a, c)$, swapping or not swapping the subtrees of $\mathrm{lca}(b, d)$ (and vice versa) causes or removes the crossing of $ab$ and $cd$.

Fig.
8: The context of an instance $\langle S, T \rangle$ that is split into the subinstances $\langle S_1, T_2 \rangle$ and $\langle S_2, T_1 \rangle$ since $T_1$ and $T_2$ are swapped at $v_T$. The swap history is indicated by binary swap variables along the paths to the roots $v_{S_0}$ and $v_{T_0}$.

When considering the current-level crossings of a subinstance $\langle S, T \rangle$ we know from the swap history which of the nodes on the paths $P_S$ and $P_T$ from $v_S$ and $v_T$ to the roots $v_{S_0}$ and $v_{T_0}$ of the full trees, respectively, have swapped their subtrees. Hence, for $v_S$ we can compute the current-level crossings of all pairs of edges $ab$ and $cd$ with $a \in L(S_1)$, $c \in L(S_2)$, and $\mathrm{lca}(b, d) \in P_T$; analogously, we can compute the crossings of all pairs of edges $ab$ and $cd$ with $b \in L(T_1)$, $d \in L(T_2)$, and $\mathrm{lca}(a, c) \in P_S$. Note that if $\mathrm{lca}(b, d)$ or $\mathrm{lca}(a, c)$ is not one of the predecessor nodes of $v_T$ or $v_S$, but it is a node in the subtree $T$ or $S$, then the crossing of the edges $ab$ and $cd$ will be considered in a subsequent step. Otherwise, our algorithm cannot account for the crossing and we may underestimate the number of crossings. Yet, we are able to bound this error later in Theorem 3.2.

Algorithm 1 defines the recursive routine RecSplit that computes our tanglegram layout. It is initially called with the parameters RecSplit($S_0, T_0, \varepsilon, \varepsilon$), where $\varepsilon$ is the empty string.

In order to quickly calculate the number of current-level crossings we use a preprocessing step. To that end, we compute two tables $C_=$ and $C_\times$ of size $O(n^2)$. For each pair $(v, w)$ of inner nodes in $S^\circ \times T^\circ$, the entry $C_=[v, w]$ stores the number of crossings of edge pairs $ab$ and $cd$ with $\mathrm{lca}(a, c) = v$ and $\mathrm{lca}(b, d) = w$ if either both or none of $v$ and $w$ swap their subtrees. An entry $C_\times[v, w]$ stores the analogous number of crossings if exactly one of $v$ and $w$ swaps its subtrees.

Lemma 3.1. The tables $C_=$ and $C_\times$ can be computed in $O(n^2)$ time.

Proof.
We initialize all entries as 0 and preprocess $S_0$ and $T_0$ in linear time to support lowest-common-ancestor queries in $O(1)$ time [GT83]. Then we determine for each pair of inter-tree edges their lowest common ancestors in $S_0$ and $T_0$ and increment the corresponding table entry depending on which two configurations yield the crossing. This takes $O(n^2)$ time for all edge pairs.

Once we have computed $C_=$ and $C_\times$, we can determine the number of current-level crossings for any subinstance $\langle S, T \rangle$ in $O(\log n)$ time by summing up the appropriate table entries depending on the swap history along the paths $P_T$ and $P_S$, which are of length $O(\log n)$. The running time of Algorithm 1 satisfies the recurrence $T(n) \le 8\,T(n/2) + O(\log n)$, which solves to $T(n) = O(n^3)$ by the master method [CLRS01]. We now prove that the algorithm yields a 2-approximation.

Theorem 3.2. Given a complete binary TL instance $\langle S_0, T_0 \rangle$ with $n$ leaves in each tree, Algorithm 1 computes in $O(n^3)$ time a drawing of $\langle S_0, T_0 \rangle$ that has at most twice as many crossings as an optimal drawing.

Proof. Fix any drawing $\delta$ of $\langle S_0, T_0 \rangle$. Algorithm 1 tries, for each subinstance $\langle S, T \rangle$ of $\langle S_0, T_0 \rangle$, all four possible configurations of $S = (S_1, S_2)$ and $T = (T_1, T_2)$, among them the configuration in $\delta$.
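The preprocessing of Lemma 3.1 and the recursion of RecSplit can be sketched together in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: trees are complete binary trees given as nested tuples, inner nodes are addressed by paths of child indices, matching leaves share a label, and the lowest common ancestor is found by a naive common-prefix scan instead of the constant-time structure of [GT83], so the table construction here is slower than $O(n^2)$. The names `crossing_tables` and `rec_split` are hypothetical.

```python
from itertools import combinations

def leaf_paths(tree, path=()):
    """Map each leaf label to its root-to-leaf path of child indices."""
    if not isinstance(tree, tuple):
        return {tree: path}
    out = {}
    for i, child in enumerate(tree):
        out.update(leaf_paths(child, path + (i,)))
    return out

def crossing_tables(S0, T0):
    """cost[v, w] = [C_=[v, w], C_x[v, w]] for inner nodes v of S0, w of T0."""
    ps, pt = leaf_paths(S0), leaf_paths(T0)
    cost = {}
    for x, y in combinations(ps, 2):
        # First index where the paths differ = depth of the LCA.
        k = next(i for i in range(len(ps[x])) if ps[x][i] != ps[y][i])
        l = next(i for i in range(len(pt[x])) if pt[x][i] != pt[y][i])
        crosses = (ps[x][k] < ps[y][k]) != (pt[x][l] < pt[y][l])
        # A pair crossing in the initial layout crosses iff both or neither
        # LCA swaps (index 0); otherwise iff exactly one swaps (index 1).
        cost.setdefault((ps[x][:k], pt[x][:l]), [0, 0])[0 if crosses else 1] += 1
    return cost

def rec_split(S, T, cost, vs=(), vt=(), hs=(), ht=()):
    """Return (counted crossings, sigma, tau) for the subinstance at vs, vt.

    hs and ht hold the swap bits of the proper ancestors of vs and vt,
    root first (the swap history).
    """
    if not isinstance(S, tuple):                  # n = 1
        return 0, [S], [T]
    best = None
    for sw_s in (0, 1):
        for sw_t in (0, 1):
            # Current-level crossings: (vs, vt) plus pairs with one LCA
            # on the path of already decided ancestors.
            cl = cost.get((vs, vt), (0, 0))[sw_s ^ sw_t]
            for d, bit in enumerate(ht):
                cl += cost.get((vs, vt[:d]), (0, 0))[sw_s ^ bit]
            for d, bit in enumerate(hs):
                cl += cost.get((vs[:d], vt), (0, 0))[bit ^ sw_t]
            # h = 0 is the upper half, h = 1 the lower half of the drawing.
            halves = [rec_split(S[sw_s ^ h], T[sw_t ^ h], cost,
                                vs + (sw_s ^ h,), vt + (sw_t ^ h,),
                                hs + (sw_s,), ht + (sw_t,)) for h in (0, 1)]
            total = cl + halves[0][0] + halves[1][0]
            if best is None or total < best[0]:
                best = (total, halves[0][1] + halves[1][1],
                        halves[0][2] + halves[1][2])
    return best

# Toy instance (not from the paper).
S = ((1, 2), (3, 4))
T = ((1, 3), (4, 2))
cost = crossing_tables(S, T)
counted, sigma, tau = rec_split(S, T, cost)
```

On this toy instance the counted value happens to equal the true number of crossings of the returned layout; in general it is only a lower bound, which is exactly the gap that the proof of Theorem 3.2 bounds.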
Algorithm 1: RecSplit($S, T, h_S, h_T$)
Input: $n$-leaf trees $S = (S_1, S_2)$ and $T = (T_1, T_2)$, swap histories $h_S$ and $h_T$
Output: lower bound $\mathrm{cr}_{ST}$ on the number of crossings created by the algorithm; orders $\sigma$ and $\tau$ for the leaves of $S$ and $T$, respectively

    if $n = 1$ then
        return $(\mathrm{cr}_{ST}, \sigma, \tau) = (0, v_S, v_T)$
    else
        $\mathrm{cr}_{ST} \leftarrow \infty$
        foreach $(\mathrm{swp}_S, \mathrm{swp}_T) \in \{0, 1\}^2$ do    // loop through all four cases to swap subtrees of S and T
            $\mathrm{cl} \leftarrow$ current-level crossings induced by $(\mathrm{swp}_S, \mathrm{swp}_T)$
            $(\mathrm{cr}_1, \sigma_{1+\mathrm{swp}_S}, \tau_{1+\mathrm{swp}_T}) \leftarrow$ RecSplit($S_{1+\mathrm{swp}_S}, T_{1+\mathrm{swp}_T}, (h_S, \mathrm{swp}_S), (h_T, \mathrm{swp}_T)$)
            $(\mathrm{cr}_2, \sigma_{2-\mathrm{swp}_S}, \tau_{2-\mathrm{swp}_T}) \leftarrow$ RecSplit($S_{2-\mathrm{swp}_S}, T_{2-\mathrm{swp}_T}, (h_S, \mathrm{swp}_S), (h_T, \mathrm{swp}_T)$)
            if $\mathrm{cl} + \mathrm{cr}_1 + \mathrm{cr}_2 < \mathrm{cr}_{ST}$ then
                $\mathrm{cr}_{ST} \leftarrow \mathrm{cl} + \mathrm{cr}_1 + \mathrm{cr}_2$
                if $\mathrm{swp}_S = 0$ then $\sigma \leftarrow (\sigma_1, \sigma_2)$ else $\sigma \leftarrow (\sigma_2, \sigma_1)$
                if $\mathrm{swp}_T = 0$ then $\tau \leftarrow (\tau_1, \tau_2)$ else $\tau \leftarrow (\tau_2, \tau_1)$
        return $(\mathrm{cr}_{ST}, \sigma, \tau)$

Assume that the configuration in $\delta$ is $\langle (S_1, S_2), (T_1, T_2) \rangle$. We determine an upper bound on the number of crossings that the algorithm fails to count for the drawing $\delta$. In each of the trees $S_0$ and $T_0$ we distinguish four different areas for the endpoints of the edges: above $S_1$, in $S_1$, in $S_2$, below $S_2$ and similarly above $T_1$, in $T_1$, in $T_2$, below $T_2$. We number these regions from 0 to 3, see Fig. 9. This allows us to classify the edges into 16 groups (two of which, 0–0 and 3–3, are not relevant). We denote the number of $i$–$j$ edges, that is, edges from area $i$ to area $j$, by $n_{ij}(S, T)$ (for $i, j \in \{0, 1, 2, 3\}$). Figure 9a shows the four groups of $i$–$j$ edges for $i = 1$.

The only crossings that the algorithm does not take into account are crossings between edges whose lowest common ancestors lie in parts of $S_0$ and $T_0$ that are split apart into different branches of the recursion.
For the subinstance $\langle S, T \rangle$, which is split into $\langle S_1, T_1 \rangle$ and $\langle S_2, T_2 \rangle$, this means that for all $n_{12}(S, T)$ edges that run between $S_1$ and $T_2$, we fail to consider all crossings between pairs of two such edges. Similarly, we do not consider any pair of the $n_{21}(S, T)$ edges between $S_2$ and $T_1$.

Let us return to the drawing $\delta$ and consider the set $\mathcal{I}$ of subinstances that correspond to $\delta$, that is, all pairs of opposing subtrees in $\delta$. For each subinstance $\langle S, T \rangle \in \mathcal{I}$ we do not account for crossings of pairs of 1–2 edges and pairs of 2–1 edges since these edges run between two subinstances that are solved independently. In the worst case all these edge pairs cross and the algorithm misses $\binom{n_{12}(S,T)}{2} + \binom{n_{21}(S,T)}{2}$ crossings. Let $c_\delta$ be the number of crossings of $\delta$ counted by the algorithm, and let $|\delta|$ be the actual number of crossings of $\delta$. Clearly, we have $c_\delta \le |\delta|$. We can bound $|\delta|$ from above by

$$|\delta| \le c_\delta + \sum_{\langle S,T \rangle \in \mathcal{I}} \left[ \binom{n_{12}(S,T)}{2} + \binom{n_{21}(S,T)}{2} \right] \le c_\delta + \sum_{\langle S,T \rangle \in \mathcal{I}} \frac{n_{12}^2(S,T) + n_{21}^2(S,T)}{2}. \quad (1)$$

Fig. 9: Areas of the endpoints and types of edges incident to $L(S)$ and $L(T)$. Cardinalities $n_{ij}(S, T)$ are abbreviated as $n_{ij}$. (a) Edges incident to $L(S_1)$ are separated into four groups by the area of their second endpoint. (b) All edge groups that cross the $n_{12}$ 1–2 edges between $L(S_1)$ and $L(T_2)$.

We now show that $\sum_{\langle S,T \rangle \in \mathcal{I}} (n_{12}^2(S,T) + n_{21}^2(S,T)) \le 2 c_\delta$. For the sake of convenience, we abbreviate $n_{ij}(S, T)$ by $n_{ij}$ in the following. We will bound $n_{12}^2$ by the number of crossings of the 1–2 edges in $\delta$ that are counted by the algorithm. This number is at least

$$c_{12} = n_{12} \cdot (n_{03} + n_{20} + n_{21} + n_{30} + n_{31}) \quad (2)$$

as can be seen in Fig. 9b.
All these crossings are current-level crossings at this or some earlier point in the algorithm. Since our (sub)trees are complete and thus $S_1$ and $T_1$ have the same number of leaves, we obtain

$$n_{10} + n_{12} + n_{13} = n_{01} + n_{21} + n_{31}. \quad (3)$$

Furthermore, we have the following equality for the edges from areas 0 on both sides:

$$n_{01} + n_{02} + n_{03} = n_{10} + n_{20} + n_{30}. \quad (4)$$

From (3) we obtain $n_{12} \le n_{01} - n_{10} + n_{21} + n_{31}$ and from (4) we obtain $n_{01} - n_{10} \le n_{20} + n_{30}$. Hence, we have $n_{12} \le n_{20} + n_{30} + n_{21} + n_{31}$. With (2) this yields

$$n_{12}^2 \le n_{12} \cdot (n_{20} + n_{30} + n_{21} + n_{31}) \le c_{12}, \quad (5)$$

that is, $n_{12}^2$ is bounded by the number of crossings that involve a 1–2 edge in $\delta$ and that are counted by the algorithm. Analogously, we obtain

$$n_{21}^2 \le n_{21} \cdot (n_{02} + n_{03} + n_{12} + n_{13}) \le c_{21}, \quad (6)$$

that is, $n_{21}^2$ is bounded by the number of crossings counted by the algorithm that involve a 2–1 edge in $\delta$. So from (5) and (6) we have $n_{12}^2 \le c_{12}$ and $n_{21}^2 \le c_{21}$. Applying this argument to all subinstances $\langle S, T \rangle \in \mathcal{I}$ we get

$$\sum_{\langle S,T \rangle \in \mathcal{I}} (n_{12}^2(S,T) + n_{21}^2(S,T)) \le \sum_{\langle S,T \rangle \in \mathcal{I}} c_{12}(S,T) + \sum_{\langle S,T \rangle \in \mathcal{I}} c_{21}(S,T) \le 2 \cdot c_\delta. \quad (7)$$

Fig. 10: Example of a tanglegram for which our algorithm may output a drawing (left) that has roughly twice as many crossings as the optimal drawing (right).

The fact that $\sum_{\langle S,T \rangle \in \mathcal{I}} c_{12}(S,T) \le c_\delta$ holds is due to each edge crossing of $\delta$ appearing in at most one term $c_{12}(S, T)$. This can be seen as follows. Let $ab$ be a 1–2 edge in the subinstance $\langle S, T \rangle$. Then in all parent instances of the recursion, $ab$ was still a 1–1 edge or a 2–2 edge; such edges do not appear in any previous $c_{12}$-term. In a subsequent instance $\langle S', T' \rangle$ below $\langle S, T \rangle$ in the recursion the edge $ab$ might in fact reappear, for example as a 0–3 edge.
At that point, however, it is considered as an edge that crosses one of the 1–2 edges of $\langle S', T' \rangle$, say $cd$. But then $cd$ was considered as a 1–1 or 2–2 edge in all previous instances. Hence, the crossing between $ab$ and $cd$ does not appear in any other $c_{12}$-term. Analogous reasoning yields $\sum_{\langle S,T \rangle \in \mathcal{I}} c_{21}(S,T) \le c_\delta$.

Plugging (7) into (1) yields $|\delta| \le 2 c_\delta$. Now let $A^\star$ be the solution computed by Algorithm 1 and let $S^\star$ be an optimal solution. We denote their actual numbers of crossings by $|A^\star|$ and $|S^\star|$, respectively. By $c_{A^\star}$ and $c_{S^\star}$ we denote the number of crossings counted by our algorithm for the drawings $A^\star$ and $S^\star$, respectively. Since $|\delta| \le 2 c_\delta$ for any drawing $\delta$, since the algorithm returns a drawing that minimizes the counted number of crossings, and since $c_\delta \le |\delta|$, we get $|A^\star| \le 2 c_{A^\star} \le 2 c_{S^\star} \le 2 |S^\star|$, that is, the algorithm is indeed a factor-2 approximation.

We note that the approximation factor of 2 is tight: let $n = 4m$, let $S$ have leaves ordered $1, \ldots, 4m$, and let $T$ have leaves ordered $1, \ldots, m, 3m, \ldots, 2m+1, m+1, \ldots, 2m, 3m+1, \ldots, 4m$ (see Fig. 10). Then our algorithm may construct a drawing with $m^2 + 2\binom{m}{2} = 2m^2 - m$ crossings, while the optimal drawing has only $m^2$ crossings.

Non-complete binary trees. Algorithm 1 can also be applied to non-complete tanglegrams with minor modifications. The only essential difference is that during the algorithm we can encounter the situation that a single leaf $v$ of one tree is paired with a larger subtree $T'$ of the other tree. In that case we continue the recursion for those subtrees of $T'$ that contain an edge to $v$ in order to find their locally optimal swap decisions. For non-complete tanglegrams, however, the approximation factor does not hold any more. Nöllenburg et al. [NVWH09] have evaluated several heuristics for binary TL, among them the modified version of Algorithm 1.

Generalization to $d$-ary trees. The algorithm can be generalized to complete $d$-ary trees.
The recurrence relation of the running time changes to $T(n) \le d \cdot (d!)^2 \cdot T(n/d) + O(\log n)$ since we need to consider all $d!$ subtree orderings of both trees, each triggering $d$ subinstances of size $n/d$. This resolves to $T(n) = O(n^{1 + 2\log_d(d!)})$. For $d \ge 3$ the running time is upper-bounded by $O(n^{2d - 1.7})$. At the same time the approximation factor increases to $1 + \binom{d}{2}$. This is because for any pair $(i, j)$ with $1 \le i < j \le d$ the algorithm fails to account for potential crossings between the trees $S_i$ and $T_j$ as well as between $S_j$ and $T_i$. This number can be bounded for each of the $\binom{d}{2}$ pairs by the number of crossings in the optimal solution using our arguments for binary trees.

Maximization version. Instead of the original TL problem, which minimizes the number of pairs of edges that cross each other, we now consider the dual problem TL$^\star$ of maximizing the number of pairs of edges that do not cross. The sets of optimal solutions for the two problems are the same, but from the perspective of approximation the problems differ a lot, at least in the binary case: in contrast to binary TL, which is hard to approximate as we have shown in Theorem 2.1, binary TL$^\star$ has a constant-factor approximation algorithm. We show this by reducing binary TL$^\star$ to a constrained version of the MaxCut problem, which can be solved approximately with the semidefinite programming (SDP) rounding algorithm of Goemans and Williamson [GW95]. Their algorithm runs in polynomial time; solving the underlying SDP relaxation of the problem is the most time-consuming step. Still, SDP relaxations of MaxCut instances of up to 7000 variables can be solved in practice [BM01].

Theorem 3.3. There exists a polynomial-time factor-0.878 approximation algorithm for binary TL$^\star$.

Proof. Let $\langle S, T \rangle$ be an instance of binary TL$^\star$. Fix any initial drawing of $\langle S, T \rangle$.
As before, we associate a decision variable with each inner node of the two trees. The variable decides whether we do or do not swap the children at the corresponding node. We model this situation by a weighted graph $G = (V, E)$; a swap decision corresponds to deciding to which side of a cut the corresponding vertex is assigned. More precisely, for each inner node $u$ of $\langle S, T \rangle$, the graph $G$ contains two vertices $u$ and $u'$. We will also impose a constraint that $u$ and $u'$ must be separated by the cut we are looking for. As we will indicate later, we can use the algorithm of Goemans and Williamson [GW95] to find large cuts among those separating all pairs of type $(u, u')$.

For each pair $ab$ and $cd$ of inter-tree edges with $a, c \in L(S)$ and $b, d \in L(T)$, the graph $G$ contains a weighted edge that we construct as follows. Let $v = \mathrm{lca}(a, c)$ and $w = \mathrm{lca}(b, d)$ be the lowest common ancestors of the edge pair. If $ab$ and $cd$ cross in the initial drawing, we add the edge $vw$ with weight 1 to $G$. If the edge is already present, we increase its weight by one. If the two edges do not cross in the initial drawing, then we analogously add the edge $vw'$ to $G$ or increase its weight by one.

Consider a cut in $G$ that for each inner node $u$ of $\langle S, T \rangle$ separates $u$ and $u'$. We claim that any such cut encodes a drawing of $\langle S, T \rangle$. To see this, let $(F, N = V \setminus F)$ be such a cut. Starting from the initial drawing we construct a new drawing as follows. Let $u$ be an inner node of $\langle S, T \rangle$. If $u \in F$ and $u' \in N$, we swap the children of the inner node $u$ of the current drawing. If $u \in N$ and $u' \in F$, we do nothing. (Note that exchanging the roles of the sets $F$ and $N$ yields the mirrored drawing with the same number of crossings.)

For a moment, think of $G$ as a multigraph that is obtained by replacing each edge of weight $k$ by $k$ edges of weight one.
Let us argue that the above-described procedure to decode drawings from cuts has the property that in the resulting drawing of $\langle S, T \rangle$, pairs of inter-tree edges that do not cross correspond one-to-one to edges in $G$ that are cut by $(F, N)$. Consider first the cut corresponding to the initial drawing, namely the cut with $u \in N$ for each inner node $u$ of $\langle S, T \rangle$, and observe that the claim holds for this cut. Now consider a single swap operation at an inner node $u$ of $\langle S, T \rangle$ and the corresponding change in the cut. Note that it changes the "cut status" of exactly those pairs of edges that have $u$ as the lowest common ancestor of two of their endpoints; at the same time it also changes the cut status of exactly the edges in $G$ corresponding to these pairs of edges in the drawing. Since any cut in $G$ may be reached by a finite sequence of such swap operations from the initial one, the property holds for any cut. Therefore, the number of pairs of non-crossing inter-tree edges in the obtained drawing equals the total weight of the cut (in the original, weighted version of $G$).

The resulting optimization problem is the MaxResCut problem, that is, MaxCut with additional constraints forcing certain pairs of vertices to be separated by the cut. Goemans and Williamson [GW95], when describing their famous algorithm for the MaxCut problem, observed that adding constraints to separate certain pairs of vertices does not make the problem harder to approximate. It is sufficient to encode these constraints as additional linear constraints in the SDP relaxation and to observe that random hyperplanes used to separate vertices always separate such constrained pairs. We use their SDP rounding algorithm for MaxResCut to compute a 0.878-approximation of the largest cut in $G$. This cut determines which of the subtrees in the initial drawing must be swapped to obtain a drawing that is a 0.878-approximation to binary TL$^\star$.
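The construction of $G$ in the proof can be illustrated on a toy instance. The sketch below builds the weighted edges and evaluates cuts that separate every pair $(u, u')$; it does not run the SDP rounding of [GW95] but simply enumerates all admissible cuts, which is feasible only for tiny inputs. The nested-tuple tree representation, the addressing of nodes by paths, and the names `build_graph` and `cut_weight` are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations, product

def leaf_paths(tree, path=()):
    """Map each leaf label to its root-to-leaf path of child indices."""
    if not isinstance(tree, tuple):
        return {tree: path}
    out = {}
    for i, child in enumerate(tree):
        out.update(leaf_paths(child, path + (i,)))
    return out

def inner_nodes(tree, path=()):
    """List the paths of all inner nodes of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return []
    nodes = [path]
    for i, child in enumerate(tree):
        nodes += inner_nodes(child, path + (i,))
    return nodes

def build_graph(S, T):
    """Weighted edges of G; a vertex is (tree name, node path, primed flag)."""
    ps, pt = leaf_paths(S), leaf_paths(T)
    G = Counter()
    for x, y in combinations(ps, 2):
        k = next(i for i in range(len(ps[x])) if ps[x][i] != ps[y][i])
        l = next(i for i in range(len(pt[x])) if pt[x][i] != pt[y][i])
        crosses = (ps[x][k] < ps[y][k]) != (pt[x][l] < pt[y][l])
        v = ("S", ps[x][:k], False)
        w = ("T", pt[x][:l], not crosses)   # w if the pair crosses, else w'
        G[v, w] += 1
    return G

def cut_weight(G, in_F):
    """Weight of the cut in which u is in F iff in_F[tree, path] is True.

    The primed copy u' is always placed on the opposite side of u.
    """
    def side(vertex):
        tree, path, primed = vertex
        return in_F[tree, path] != primed
    return sum(k for (v, w), k in G.items() if side(v) != side(w))

# Toy instance (not from the paper): 4 leaves, hence 6 edge pairs.
S = ((1, 2), (3, 4))
T = ((1, 3), (4, 2))
G = build_graph(S, T)
nodes = ([("S", p) for p in inner_nodes(S)]
         + [("T", p) for p in inner_nodes(T)])
initial_cut = cut_weight(G, {u: False for u in nodes})
best_cut = max(cut_weight(G, dict(zip(nodes, bits)))
               for bits in product([False, True], repeat=len(nodes)))
```

Here `initial_cut` is the number of non-crossing pairs of the initial drawing and `best_cut` the optimum of TL$^\star$ for the toy instance; an SDP-based solver would replace the exhaustive `max`.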
Note that our proof also works in a slightly more general case, namely for pairs of (not necessarily binary) trees where for each inner node the only choice for arranging the children is between a given permutation and the reverse permutation obtained by swapping the whole block of children.

4 Fixed-Parameter Tractability

We consider the following parameterized variant of the complete binary TL problem. Given a complete binary TL instance $\langle S, T \rangle$ and a non-negative integer $k$, decide whether there exists a layout of $S$ and $T$ with at most $k$ induced inter-tree edge crossings. Our algorithm makes use of the same technique to count current-level crossings as the 2-approximation algorithm. Hence, we precompute the crossing tables $C_=$ and $C_\times$ in $O(n^2)$ time as before, see Lemma 3.1.

The algorithm traverses the inner nodes of $S$ in breadth-first order. It starts at the root of $S$ and its corresponding node in $T$ (in this case the root of $T$), branches into all four possible subtree configurations (at the root it actually suffices to consider two of them), and subtracts from $k$ the number of current-level crossings in each branch. Then we proceed recursively with the next node $v$ in $S$, its corresponding opposite node $w$ in $T$, and the reduced parameter $k'$ of allowed crossings. In each node of the search tree we count the current-level crossings for each of the subtree orders of $v$ and $w$ by summing up in linear time the appropriate entries in $C_=$ and $C_\times$ for $v$ (or $w$) and all of the $O(n)$ subtree orders that are already fixed in $T$ (or $S$). Once we reach a leaf of the search tree we know the exact number of crossings since each pair of edges $ab$ and $cd$ is counted as soon as the subtree orders of both $\mathrm{lca}(a, c)$ and $\mathrm{lca}(b, d)$ are fixed. Obviously, we stop following a branch of the search tree when the parameter value drops below 0.
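The bounded search tree can be sketched as follows, including the no-branching shortcut that Lemma 4.1 justifies for complete binary trees. This is a minimal sketch under assumptions, not the authors' code: trees are complete and given as nested tuples, nodes are addressed by paths of child indices, matching leaves share a label, and the names `crossing_tables` and `fpt_layout` are hypothetical. The optimization of trying only two configurations at the root is omitted.

```python
from itertools import combinations, product

def leaf_paths(tree, path=()):
    """Map each leaf label to its root-to-leaf path of child indices."""
    if not isinstance(tree, tuple):
        return {tree: path}
    out = {}
    for i, child in enumerate(tree):
        out.update(leaf_paths(child, path + (i,)))
    return out

def crossing_tables(S, T):
    """cost[v, w] = [C_=[v, w], C_x[v, w]] for inner nodes v of S, w of T."""
    ps, pt = leaf_paths(S), leaf_paths(T)
    cost = {}
    for x, y in combinations(ps, 2):
        k = next(i for i in range(len(ps[x])) if ps[x][i] != ps[y][i])
        l = next(i for i in range(len(pt[x])) if pt[x][i] != pt[y][i])
        crosses = (ps[x][k] < ps[y][k]) != (pt[x][l] < pt[y][l])
        cost.setdefault((ps[x][:k], pt[x][:l]), [0, 0])[0 if crosses else 1] += 1
    return cost

def fpt_layout(S, T, k):
    """Return swap bits (fixed_s, fixed_t) of a layout with at most k
    crossings, or None if the search finds no such layout."""
    cost = crossing_tables(S, T)
    height = len(next(iter(leaf_paths(S).values())))  # uniform leaf depth

    def entry(v, w, differ):
        return cost.get((v, w), (0, 0))[differ]

    def search(queue, fixed_s, fixed_t, budget):
        if budget < 0:
            return None
        if not queue:
            return fixed_s, fixed_t
        (v, w), rest = queue[0], queue[1:]
        options = []
        for s, t in product((0, 1), repeat=2):
            # Crossings that become determined by fixing v and w now.
            cl = entry(v, w, s ^ t)
            cl += sum(entry(v, w2, s ^ b) for w2, b in fixed_t.items())
            cl += sum(entry(v2, w, b ^ t) for v2, b in fixed_s.items())
            options.append((cl, s, t))
        options.sort()
        if options[0][0] == 0:
            options = options[:1]        # Lemma 4.1: no branching needed
        for cl, s, t in options:
            children = []
            if len(v) + 1 < height:      # children are inner nodes
                # Upper child of v faces upper child of w, lower faces lower.
                children = [(v + (s ^ h,), w + (t ^ h,)) for h in (0, 1)]
            found = search(rest + children, {**fixed_s, v: s},
                           {**fixed_t, w: t}, budget - cl)
            if found is not None:
                return found
        return None

    return search([((), ())], {}, {}, k)

# Toy instance (not from the paper); its optimum has one crossing.
S = ((1, 2), (3, 4))
T = ((1, 3), (4, 2))
no_layout = fpt_layout(S, T, 0)
one_crossing = fpt_layout(S, T, 1)
```

Appending the children to the end of the queue gives the breadth-first traversal described above; the returned dictionaries map each inner node to its swap bit.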
For the search tree to have bounded height, we need to ensure that whenever we move to the next subinstance, the parameter value decreases at least by one. At first sight this seems problematic: if a subinstance does not incur any current-level crossings, the parameter will not drop. The following key lemma, which does not hold for non-complete binary trees, shows that there is a way out. It says that if there is an order of the subtrees in a subinstance that does not incur any current-level crossings, then we can ignore the other three subtree orders and do not have to branch.

Lemma 4.1. Let $\langle S, T \rangle$ be a complete binary TL instance, and let $v_S$ be a node of $S$ and $v_T$ a node of $T$ such that $v_S$ and $v_T$ have the same distance to their respective root. Further, let $(S_1, S_2)$ be the subtrees incident to $v_S$ and let $(T_1, T_2)$ be the subtrees incident to $v_T$. If the subinstance $\langle (S_1, S_2), (T_1, T_2) \rangle$ does not incur any current-level crossings, then each of the subinstances $\langle (S_1, S_2), (T_2, T_1) \rangle$, $\langle (S_2, S_1), (T_1, T_2) \rangle$, and $\langle (S_2, S_1), (T_2, T_1) \rangle$ has at least as many crossings as the instance $\langle (S_1, S_2), (T_1, T_2) \rangle$, for any fixed ordering of the leaves of $S_1$, $S_2$, $T_1$ and $T_2$.

Proof. If the subinstance $\langle (S_1, S_2), (T_1, T_2) \rangle$ does not incur any current-level crossings, this excludes certain types of edges. We categorize the inter-tree edges originating from the four subtrees according to their destinations as before, and use the notation $n_{ij}$ for the number of edges between area $i$ on the left and area $j$ on the right, see Fig. 11a. First of all, there are either no edges between $S_1$ and $T_2$ or no edges between $S_2$ and T_1$, since a 1–2 edge and a 2–1 edge would cross each other. We consider only the first case, that is, $n_{12} = 0$; the second case $n_{21} = 0$ is symmetric. In both cases, we have $n_{13} = n_{31} = n_{20} = n_{02} = 0$.
Since we consider complete binary trees, we obtain the three equalities $n_{10} = n_{01} + n_{21}$, $n_{32} = n_{23} + n_{21}$, and $n_{01} + n_{11} = n_{23} + n_{22}$. We fix an ordering $\sigma$ of the leaves of the four subtrees $S_1$, $S_2$, $T_1$, and $T_2$.

We first compare the number of crossings in the subinstance $\langle (S_1, S_2), (T_1, T_2) \rangle$ with the number of crossings in the subinstance $\langle (S_2, S_1), (T_2, T_1) \rangle$, see Figures 11a and 11b. The subinstance $\langle (S_1, S_2), (T_1, T_2) \rangle$ can have at most $n_{21}(n_{11} + n_{22})$ crossings that do not occur in $\langle (S_2, S_1), (T_2, T_1) \rangle$. However, $\langle (S_2, S_1), (T_2, T_1) \rangle$ has at least $n_{10}(n_{23} + n_{21} + n_{22}) + n_{23} n_{11} + n_{32}(n_{01} + n_{21} + n_{11}) + n_{01} n_{22}$ crossings that do not appear in $\langle (S_1, S_2), (T_1, T_2) \rangle$. Plugging in the above equalities for $n_{10}$ and $n_{32}$, we get $(n_{01} + n_{21})(n_{23} + n_{21} + n_{22}) + n_{23} n_{11} + (n_{23} + n_{21})(n_{01} + n_{21} + n_{11}) + n_{01} n_{22} \ge n_{21}(n_{11} + n_{22})$. Thus, the subinstance $\langle (S_2, S_1), (T_2, T_1) \rangle$ has at least as many crossings with respect to the fixed leaf order $\sigma$ as $\langle (S_1, S_2), (T_1, T_2) \rangle$ has.

Next, we compare the number of crossings in the subinstance $\langle (S_1, S_2), (T_1, T_2) \rangle$ with the number of crossings in the subinstance $\langle (S_1, S_2), (T_2, T_1) \rangle$, see Figures 11a and 11c. Now the number of additional crossings of $\langle (S_1, S_2), (T_1, T_2) \rangle$ is at most $n_{21} n_{22}$, and the subinstance $\langle (S_1, S_2), (T_2, T_1) \rangle$ introduces at least $(n_{01} + n_{11})(n_{32} + n_{22}) + n_{32} n_{21}$ additional crossings. With the equality $n_{01} + n_{11} = n_{23} + n_{22}$ and the inequality $n_{32} + n_{22} \ge n_{21}$ we get $(n_{01} + n_{11})(n_{32} + n_{22}) + n_{32} n_{21} \ge (n_{23} + n_{22} + n_{32})\, n_{21} \ge n_{22} n_{21}$. Thus, the subinstance $\langle (S_1, S_2), (T_2, T_1) \rangle$ has at least as many crossings with respect to $\sigma$ as $\langle (S_1, S_2), (T_1, T_2) \rangle$ has.
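For readers who want to check the two estimates term by term, the following expansion spells out the comparison. It uses only the three equalities stated above and the fact that all $n_{ij}$ are non-negative; the inequality $n_{32} + n_{22} \ge n_{21}$ follows from $n_{32} = n_{23} + n_{21}$.

```latex
\begin{align*}
(n_{01}+n_{21})(n_{23}+n_{21}+n_{22}) &\ge n_{21}\,n_{22},\\
(n_{23}+n_{21})(n_{01}+n_{21}+n_{11}) &\ge n_{21}\,n_{11},\\
\text{hence}\quad
n_{10}(n_{23}+n_{21}+n_{22}) + n_{23}n_{11}
 + n_{32}(n_{01}+n_{21}+n_{11}) + n_{01}n_{22}
 &\ge n_{21}(n_{11}+n_{22}),\\[4pt]
(n_{01}+n_{11})(n_{32}+n_{22}) + n_{32}n_{21}
 &= (n_{23}+n_{22})(n_{32}+n_{22}) + n_{32}n_{21}\\
 &\ge (n_{23}+n_{22})\,n_{21} + n_{32}\,n_{21}
 \;\ge\; n_{22}\,n_{21}.
\end{align*}
```

The first three lines cover the comparison with $\langle (S_2, S_1), (T_2, T_1) \rangle$, the last two the comparison with $\langle (S_1, S_2), (T_2, T_1) \rangle$.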
Fig. 11: Edge types and crossings of the instance $\langle S, T \rangle$. Only non-empty classes of edge types are shown. (a) $\langle (S_1, S_2), (T_1, T_2) \rangle$. (b) $\langle (S_2, S_1), (T_2, T_1) \rangle$. (c) $\langle (S_1, S_2), (T_2, T_1) \rangle$.

By symmetry, the same holds for the last case $\langle (S_2, S_1), (T_1, T_2) \rangle$, which incurs at least $n_{11} n_{21}$ additional crossings, the maximum number of crossings that can be present in $\langle (S_1, S_2), (T_1, T_2) \rangle$ but not in $\langle (S_2, S_1), (T_1, T_2) \rangle$.

Counting the current-level crossings takes $O(n)$ time for each node that fixes its subtree order. If an order does not incur any current-level crossings we might need to fix in total up to $O(n)$ subtree orders and count the incurred crossings until we reach a new node of the search tree. Thus we spend $O(n^2)$ time for each of the $O(4^k)$ search-tree nodes. Including the preprocessing this yields a total running time of $O(n^2 + 4^k n^2)$. If the algorithm reaches a leaf of the search tree it has fixed all subtree orders in $S$ and $T$ and thus found a layout of the input instance that has at most $k$ inter-tree edge crossings. If the search stops without reaching a leaf there is no layout of $\langle S, T \rangle$ with at most $k$ inter-tree edge crossings.

Theorem 4.2. Given a complete binary TL instance $\langle S, T \rangle$ with $n$ leaves in each tree and an integer $k$, in $O(4^k n^2)$ time we can either determine a layout of $\langle S, T \rangle$ with at most $k$ inter-tree edge crossings or report that no such layout exists.

Finally, the fact that Lemma 4.1 relies on the completeness of the two trees is illustrated in Fig. 12. Here we have an example of an instance whose optimal layout requires a current-level crossing (Fig. 12a).
At the same time, the configuration $\langle (S_1, S_2), (T_2, T_1) \rangle$ has no current-level crossing. If Lemma 4.1 held for non-complete trees, the leaf order of the optimal layout copied into the layout without current-level crossings would produce at most as many crossings as in the other layout. Figure 12b shows that this is not true in our example. The best solution of the configuration $\langle (S_1, S_2), (T_2, T_1) \rangle$ still has two crossings and is not optimal (Fig. 12c). Hence, we do have to consider all subtree orders even if one of them incurs no current-level crossings. This means that we cannot bound the size of the search tree in terms of the parameter $k$ as we have done for complete binary trees.

Fig. 12: Example of a binary TL instance with an optimal layout that has one crossing (a). The same order of the leaves in the subtrees $S_2$ and $T_2$ yields four crossings for a configuration without current-level crossings (b). The best layout that avoids the current-level crossing still has two crossings (c).

5 Open Problems

We have shown that one cannot expect to find a constant-factor approximation for binary TL. Would it help if one of the two given trees was complete? We have given a factor-2 approximation for complete binary TL. It is natural to ask whether we can do better. An alternative optimization goal is to remove a minimum number of inter-tree edges in order to obtain a planar tanglegram.

Acknowledgments

We thank Danny Holten and Jack van Wijk for introducing us to this exciting problem and David Bryant for pointing us to the work of Roderic Page on host and parasite trees.

References

[BBB+09] Kevin Buchin, Maike Buchin, Jaroslaw Byrka, Martin Nöllenburg, Yoshio Okamoto, Rodrigo I. Silveira, and Alexander Wolff.
Drawing (complete) binary tanglegrams: Hardness, approximation, fixed-parameter tractability. In I. G. Tollis and M. Patrignani, editors, Proc. 16th Internat. Symp. Graph Drawing (GD'08), volume 5417 of Lecture Notes Comput. Sci., pages 324–335. Springer-Verlag, 2009.
[BBL10] Frank Baumann, Christoph Buchheim, and Frauke Liers. Exact bipartite crossing minimization under tree constraints. In P. Festa, editor, Proc. 9th Internat. Sympos. Experimental Algorithms (SEA'10), volume 6049, pages 118–128. Springer-Verlag, 2010.
[BCEFB09] Mukul S. Bansal, Wen-Chieh Chang, Oliver Eulenstein, and David Fernández-Baca. Generalized binary tanglegrams: Algorithms and applications. In Sanguthevar Rajasekaran, editor, Proc. 1st Internat. Conf. Bioinformatics Comput. Biol. (BICoB'09), volume 5462 of Lecture Notes Comput. Sci., pages 114–125. Springer-Verlag, 2009.
[BDMT98] Paola Bertolazzi, Giuseppe Di Battista, Carlo Mannino, and Roberto Tamassia. Optimal upward planarity testing of single-source digraphs. SIAM J. Comput., 27(1):132–169, 1998.
[BHTW09] Sebastian Böcker, Falk Hüffner, Anke Truss, and Magnus Wahlström. A faster fixed-parameter approach to drawing binary tanglegrams. In Jianer Chen and Fedor Fomin, editors, Proc. 4th Internat. Workshop Parameterized and Exact Comput. (IWPEC'09), volume 5917 of Lecture Notes Comput. Sci., pages 38–49. Springer-Verlag, 2009.
[BM01] Samuel Burer and Renato D.C. Monteiro. A projected gradient algorithm for solving the Maxcut SDP relaxation. Optimization Methods and Software, 15:175–200, 2001.
[CLRS01] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, 2nd edition, 2001.
[DFK08] Vida Dujmović, Henning Fernau, and Michael Kaufmann. Fixed parameter algorithms for one-sided crossing minimization revisited. J.
Discrete Algorithms, 6(2):313–323, 2008. [p. 3]

[DHJ+97] Bhaskar DasGupta, Xin He, Tao Jiang, Ming Li, John Tromp, and Louxin Zhang. On distances between phylogenetic trees. In Proc. 18th Annu. ACM-SIAM Sympos. Discrete Algorithms (SODA’97), pages 427–436, 1997. [p. 2]

[DS04] Tim Dwyer and Falk Schreiber. Optimal leaf ordering for two and a half dimensional phylogenetic tree visualization. In Neville Churcher and Clare Churcher, editors, Proc. Australasian Sympos. Inform. Visual. (InVis.au’04), volume 35 of CRPIT, pages 109–115. Australian Comput. Soc., 2004. [pp. 2, 3]

[EW94] Peter Eades and Nicholas Wormald. Edge crossings in drawings of bipartite graphs. Algorithmica, 10:379–403, 1994. [p. 3]

[FKP05] Henning Fernau, Michael Kaufmann, and Mathias Poths. Comparing trees via crossing minimization. In R. Ramanujam and Sandeep Sen, editors, Proc. 25th Intern. Conf. Found. Softw. Techn. Theoret. Comput. Sci. (FSTTCS’05), volume 3821 of Lecture Notes Comput. Sci., pages 457–469. Springer-Verlag, 2005. [pp. 2, 3, 4, 5, 6]

[GJ79] Michael R. Garey and David S. Johnson. Computers and Intractability. W. H. Freeman, 1979. [p. 4]

[GT83] Harold N. Gabow and Robert Endre Tarjan. A linear-time algorithm for a special case of disjoint set union. In Proc. 15th Annu. ACM Symp. Theory Comput. (STOC’83), pages 246–251, 1983. [p. 11]

[GW95] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM, 42(6):1115–1145, 1995. [pp. 4, 15]

[HSV+94] M. S. Hafner, P. D. Sudman, F. X. Villablanca, T. A. Spradling, J. W. Demastes, and S. A. Nadler. Disparate rates of molecular evolution in cospeciating hosts and parasites. Science, 265:1087–1090, 1994. [p. 2]

[HvW08] Danny Holten and Jarke J. van Wijk. Visual comparison of hierarchically organized data. In Proc. 10th Eurographics/IEEE-VGTC Sympos.
Visualization (EuroVis’08), pages 759–766, 2008. [pp. 2, 3]

[Kho02] Subhash Khot. On the power of unique 2-prover 1-round games. In Proc. 34th Annu. ACM Sympos. Theory Comput. (STOC’02), pages 767–775, 2002. [p. 4]

[KV05] Subhash Khot and Nisheeth K. Vishnoi. The unique games conjecture, integrality gap for cut problems and embeddability of negative type metrics into $\ell_1$. In Proc. 46th Annu. IEEE Sympos. Foundat. Comput. Sci. (FOCS’05), pages 53–62, 2005. [pp. 4, 5]

[LPR+07] Antoni Lozano, Ron Y. Pinter, Oleg Rokhlenko, Gabriel Valiente, and Michal Ziv-Ukelson. Seeded tree alignment and planar tanglegram layout. In R. Giancarlo and S. Hannenhalli, editors, Proc. 7th Internat. Workshop Algorithms Bioinformatics (WABI’07), volume 4645 of Lecture Notes Comput. Sci., pages 98–110. Springer-Verlag, 2007. [pp. 2, 3]

[Nag05] Hiroshi Nagamochi. An improved bound on the one-sided minimum crossing number in two-layered drawings. Discrete Comput. Geom., 33(4):565–591, 2005. [p. 3]

[NVWH09] Martin Nöllenburg, Markus Völker, Alexander Wolff, and Danny Holten. Drawing binary tanglegrams: An experimental evaluation. In Proc. 11th Workshop Algorithm Engineering and Experiments (ALENEX’09), pages 106–119. SIAM, 2009. [pp. 4, 14]

[Pag02] Roderic D. M. Page, editor. Tangled Trees: Phylogeny, Cospeciation, and Coevolution. University of Chicago Press, 2002. [p. 2]

[RRR98] Venkatesh Raman, B. Ravikumar, and S. Srinivasa Rao. A simplified NP-complete MAXSAT problem. Inform. Process. Lett., 65:1–6, 1998. [p. 6]

[STT81] Kozo Sugiyama, Shojiro Tagawa, and Mitsuhiko Toda. Methods for visual understanding of hierarchical system structures. IEEE Trans. Syst. Man Cybern., 11(2):109–125, 1981. [p. 3]

[VASG10] Balaji Venkatachalam, Jim Apple, Katherine St. John, and Daniel Gusfield. Untangling tanglegrams: Comparing trees by their drawings. IEEE/ACM Trans. Comput. Biol. Bioinf., PrePrints, 2010. (doi: 10.1109/TCBB.2010.57). [pp. 2, 4]