Average-case analysis of perfect sorting by reversals
A sequence of reversals that takes a signed permutation to the identity is perfect if at no step a common interval is broken. Determining a parsimonious perfect sequence of reversals that sorts a signed permutation is NP-hard. Here we show that, desp…
Authors: Mathilde Bouvel (LIAFA), Cedric Chauve, Marni Mishna
Average-case analysis of perfect sorting by reversals

Mathilde Bouvel∗, Cedric Chauve†, Marni Mishna†, Dominique Rossin∗

November 17, 2018

Abstract

A sequence of reversals that takes a signed permutation to the identity is perfect if at no step a common interval is broken. Determining a parsimonious perfect sequence of reversals that sorts a signed permutation is NP-hard. Here we show that, despite this worst-case analysis, with probability one, sorting can be done in polynomial time. Further, we find asymptotic expressions for the average length and number of reversals in commuting permutations, an interesting sub-class of signed permutations.

1 Introduction

The sorting of signed permutations by reversals is a simple combinatorial problem with a direct application in genome arrangement studies. Different sorting scenarios provide estimates for evolutionary distance and can help explain the differences in gene orders between two species (see [9] for example). Initially, the shortest (parsimonious) sequences of reversals were sought, and polynomial time algorithms to find such sequences were described ([13, 8, 18]). Recently, biologically motivated refinements have been considered, specifically accounting for groups of genes that are co-localized with the different homologous genes (genes having a single common ancestor) in the genomes of different species. These groups are likely together in the common ancestral genome, and were not disrupted during evolution; hence, we expect them to appear together at every step of the evolution. In terms of our combinatorial model, a group of co-localized genes is modeled by a common interval, that is, a collection of sequential numbers that are not broken by any reversal move.
This constraint leads us back to the basic algorithmic problem: What is the smallest number of reversals required to sort a signed permutation into the identity permutation without breaking any (subset of) common interval? These scenarios are called perfect [11]. Because of the additional constraint, it is possible that the smallest perfect sorting scenario is longer than the smallest scenario. This refined problem is already known to be NP-hard [11]. However, several authors have given sub-instances which can be solved in polynomial time [3, 4, 10], and fixed parameter tractable algorithms exist [4, 5]. For example, commuting permutations are the sub-class with the striking property that a scenario remains perfect even when the sequence of reversals is reordered. Examples of commuting scenarios arise in the study of mammals. All of the known sub-problems can be expressed in terms of the strong interval tree associated to a permutation, and we focus our attention on the structure of this tree.

Recently, several works have investigated expected properties of combinatorial objects related to genomic distance computation, such as the breakpoint graph [20, 21, 19, 17]. We explore this route here, but focusing on the strong interval tree, to conduct an average case analysis of perfect sorting by reversals. First, in Section 3, we prove that for large enough $n$, with probability 1, computing a perfect reversal sorting scenario for signed permutations can be done in time polynomial in $n$, despite the fact that the problem is NP-hard. Secondly, in Section 4, we show that in parsimonious perfect scenarios for commuting permutations of length $n$, the average number of reversals is asymptotically $1.2n$, and the average length of a reversal is $1.02\sqrt{n}$.

∗ CNRS, Université Paris Diderot, LIAFA, Paris, France. Supported by ANR project GAMMA BLAN07-2_195422.
† Department of Mathematics, Simon Fraser University, Burnaby (BC), Canada.

2 Preliminaries

We first summarize the combinatorial and algorithmic frameworks for perfect sorting by reversals. For a more detailed treatment, we refer to [4].

Permutations, reversals, common intervals and perfect scenarios. A signed permutation on $[n]$ is a permutation on the set of integers $[n] = \{1, 2, \ldots, n\}$ in which each element has a sign, positive or negative. Negative integers are represented by placing a bar over them. We denote by $Id_n$ (resp. $\overline{Id}_n$) the identity (resp. reversed identity) permutation, $(1\ 2\ \ldots\ n)$ (resp. $(\bar{n}\ \ldots\ \bar{2}\ \bar{1})$). When the number $n$ of elements is clear from the context, we will simply write $Id$ or $\overline{Id}$.

An interval $I$ of a signed permutation $\sigma$ on $[n]$ is a segment of adjacent elements of $\sigma$. The content of $I$ is the subset of $[n]$ defined by the absolute values of the elements of $I$. Given $\sigma$, an interval is defined by its content and from now on, when the context is unambiguous, we identify an interval with its content.

The reversal of an interval of a signed permutation reverses the order of the elements of the interval, while changing their signs. If $\sigma$ is a permutation, we denote by $\overline{\sigma}$ the permutation obtained by reversing the complete permutation $\sigma$. A scenario for $\sigma$ is a sequence of reversals that transforms $\sigma$ into $Id_n$ or $\overline{Id}_n$. The length of such a scenario is the number of reversals it contains. The length of a reversal is the number of elements in the interval that is reversed.

Two distinct intervals $I$ and $J$ commute if their contents trivially intersect, that is either $I \subset J$, or $J \subset I$, or $I \cap J = \emptyset$. If intervals $I$ and $J$ do not commute, they overlap. A common interval of a permutation $\sigma$ on $[n]$ is a subset of $[n]$ that is an interval in both $\sigma$ and the identity permutation $Id_n$. The singletons and the set $\{1, 2, \ldots, n\}$ are always common intervals, called trivial common intervals.

A scenario $S$ for $\sigma$ is called a perfect scenario if every reversal of $S$ commutes with every common interval of $\sigma$. A perfect scenario of minimal length is called a parsimonious perfect scenario.

A permutation $\sigma$ is said to be commuting if there exists a perfect scenario for $\sigma$ such that for every pair of reversals of this scenario, the corresponding intervals commute. In such a case, this property holds for every perfect scenario for $\sigma$ [4].

Figure 1: The strong interval tree $T_S(\sigma)$ of the permutation $\sigma = (1\ \bar{8}\ 4\ 2\ \bar{5}\ 3\ 9\ \bar{6}\ 7\ 12\ 10\ \overline{14}\ 13\ \overline{11}\ 15\ \overline{17}\ 16\ 18)$. Prime and linear vertices are distinguished by their shape. There are three non-trivial linear vertices, the rectangular vertices, and three prime vertices, the round vertices. The root and the vertex $\{6, 7\}$ are increasing linear vertices, while the linear vertices $\{16, 17\}$ and $\{13, 14\}$ are decreasing. [Figure not reproduced. Its leaves are $\{1\}, \ldots, \{18\}$ and its internal vertices are $\{2,3,4,5\}$, $\{6,7\}$, $\{2,\ldots,9\}$, $\{13,14\}$, $\{16,17\}$, $\{10,\ldots,14\}$ and the root $\{1,\ldots,18\}$.]

The strong interval tree. A common interval $I$ of a permutation $\sigma$ is a strong interval of $\sigma$ if it commutes with every other common interval of $\sigma$. The inclusion order of the set of strong intervals defines an $n$-leaf tree, denoted by $T_S(\sigma)$, whose leaves are the singletons, and whose root is the interval containing all elements of the permutation. The strong interval tree of $\sigma$ can be computed in linear time and space (see [7] for example). We call the tree $T_S(\sigma)$ the strong interval tree of $\sigma$, and we identify a vertex of $T_S(\sigma)$ with the strong interval it represents.
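The definitions above (reversal, common interval, commuting intervals, strong interval) can be sketched in a few lines of Python. This is only an illustrative brute-force sketch: the function names are hypothetical, a signed permutation is assumed to be encoded as a list of non-zero integers, and the enumeration is quadratic or worse, unlike the linear-time algorithm of [7].

```python
def reverse(sigma, i, j):
    """Reversal of positions i..j (0-based, inclusive): reverse order, flip signs."""
    return sigma[:i] + [-x for x in reversed(sigma[i:j + 1])] + sigma[j + 1:]


def common_intervals(sigma):
    """All common intervals of sigma and the identity, as frozensets of contents."""
    vals = [abs(x) for x in sigma]
    found = []
    for i in range(len(vals)):
        lo = hi = vals[i]
        for j in range(i, len(vals)):
            lo, hi = min(lo, vals[j]), max(hi, vals[j])
            # The segment i..j holds consecutive integers iff max - min == j - i.
            if hi - lo == j - i:
                found.append(frozenset(vals[i:j + 1]))
    return found


def commute(a, b):
    """Two contents commute if one contains the other or they are disjoint."""
    return a <= b or b <= a or not (a & b)


def strong_intervals(sigma):
    """Common intervals that commute with every common interval of sigma."""
    ci = common_intervals(sigma)
    return [a for a in ci if all(commute(a, b) for b in ci)]


if __name__ == "__main__":
    # The permutation of Figure 1, with barred elements written as negatives.
    sigma = [1, -8, 4, 2, -5, 3, 9, -6, 7, 12, 10, -14, 13, -11, 15, -17, 16, 18]
    for s in strong_intervals(sigma):
        if 1 < len(s) < len(sigma):
            print(sorted(s))
```

On the permutation of Figure 1, the non-trivial strong intervals found this way are the six internal vertices other than the root listed in the figure.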
In a more combinatorial context, this tree is also called the substitution decomposition tree [1]. If $\sigma$ is a signed permutation, the sign of every element of $\sigma$ is given to the corresponding leaf in $T_S(\sigma)$.

Let $I$ be a strong interval of $\sigma$ and $\mathcal{I} = (I_1, \ldots, I_k)$ the unique partition of the elements of $I$ into maximal strong intervals, from left to right. The quotient permutation of $I$, denoted $\sigma_I$, is defined as follows: $\sigma_I(i)$ is smaller than $\sigma_I(j)$ in $\sigma_I$ if any element of $I_i$ is smaller (in absolute value if $\sigma$ is a signed permutation) than any element of $I_j$. The vertex $I$, or equivalently the strong interval $I$ of $\sigma$, is either increasing linear, if $\sigma_I$ is the identity permutation, or decreasing linear, if $\sigma_I$ is the reversed identity permutation, or prime, otherwise. For exposition purposes we consider that an increasing vertex is positive and a decreasing vertex is negative.

The strong interval tree as computed by the algorithm of [7] contains the nature (increasing linear, decreasing linear, or prime) of each vertex. The algorithm can be adapted to also compute, in linear time, the quotient permutation associated to each strong interval. (See Fig. 1 for an example.) For a vertex $I$ of $T_S(\sigma)$, we denote by $L(I)$ the set of elements of $\sigma$ that label leaves of the subtree of $T_S(\sigma)$ rooted at $I$.

The strong interval tree as a guide for perfect sorting by reversals. We now describe important properties, related to the strong interval tree, of the algorithm described in [4] for perfectly sorting a signed permutation by reversals. Let $\sigma$ be a signed permutation of size $n$ and $T_S(\sigma)$ its strong interval tree, having $m$ internal vertices, called $I_1, \ldots, I_m$, including $p$ prime vertices.

Theorem 1. [4]
1. The algorithm described in [4] can compute a parsimonious perfect scenario for $\sigma$ in worst-case time $O(2^p\, n \sqrt{n \log(n)})$.
2. $\sigma$ is a commuting permutation if and only if $p = 0$.
3. If $\sigma$ is a commuting permutation, then the set of reversals of every perfect scenario is the set $\{L(I_j) \mid I_j \text{ has a sign different from its parent in } T_S(\sigma)\}$.

Remark 1. The strong interval tree of an unsigned permutation is equivalent to the modular decomposition tree of the corresponding labeled permutation graph (see [4] for example). Commuting permutations have also been investigated, in connection with permutation patterns, under the name of separable permutations [14].

3 On the number of prime vertices

Motivated by the average-time complexity of the algorithm described in [4] for computing a parsimonious perfect scenario, we first investigate the average shape of a strong interval tree of a permutation of size $n$. Such a tree is characterized by the shape of the tree along with the quotient permutations labeling internal vertices. For prime vertices, those quotient permutations correspond to simple permutations as defined in [2]. We first concentrate on enumerative results on simple permutations. Next, we derive from them enumerative consequences on the number of permutations whose strong interval tree has a given shape. Exhibiting a family of shapes with only one prime vertex, we can prove that nearly all permutations have a strong interval tree of this special shape.

3.1 Combinatorial preliminaries: strong interval trees and simple permutations

Let $T_S(\sigma)$ be the strong interval tree of a permutation $\sigma$ of length $n$. From a combinatorial point of view it is simply a plane tree (the children of a vertex are totally ordered) with $n$ leaves and its internal vertices labeled by their quotient permutation: an internal vertex having $k$ children can be labeled either by the permutation $(1\ 2\ \ldots\ k)$ (increasing linear vertex), the permutation $(k\ k{-}1\ \ldots\ 1)$ (decreasing linear vertex) or a permutation of length $k$ whose only common intervals are trivial (prime vertex). Because $T_S(\sigma)$ represents the common intervals between $\sigma$ and the identity permutation, it has two important properties.

Property 1.
1. No edge can be incident to two increasing or two decreasing linear vertices.
2. The labeling of the leaves by the integers $\{1, \ldots, n\}$ is implicitly defined by the labels of the internal vertices.

Permutations whose common intervals are trivial are called simple permutations. The shortest simple permutations are of length 4 and are $(3\ 1\ 4\ 2)$ and $(2\ 4\ 1\ 3)$. The enumeration of simple permutations was investigated in [2]. The authors prove that this enumerative sequence is not P-recursive, and there is no known closed formula for the number of simple permutations of a given size. However, it was shown in [2] that an asymptotic equivalent for the number $s_n$ of simple permutations of size $n$ is
\[ s_n = \frac{n!}{e^2}\left(1 - \frac{4}{n} + \frac{2}{n(n-1)} + O\!\left(\frac{1}{n^3}\right)\right) \quad \text{when } n \to \infty. \tag{1} \]

3.2 Average shape of strong interval trees

A twin in a strong interval tree is a vertex of degree 2 such that each of its two children is a leaf. A twin is then a linear vertex. The following result, which applies both to signed permutations and to unsigned permutations, is the main result of this section.

Theorem 2. Asymptotically, with probability 1, a random permutation $\sigma$ of size $n$ has a strong interval tree such that the root is a prime vertex and every child of the root is either a leaf or a twin. Moreover the probability that $T_S(\sigma)$ has such a shape with exactly $k$ twins is $\frac{2^k}{e^2\, k!}$.

The proof follows from Lemma 1 and Equation 1.

Lemma 1. If $p'_{n,k}$ denotes the number of permutations of length $n$ which contain a common interval $I$ of length $k$ then for any fixed positive integer $c$:
\[ \sum_{k=c+2}^{n-c} \frac{p'_{n,k}}{n!} = O(n^{-c}). \]

Proof.
This lemma generalizes to any common interval the following result.

Lemma 2. [2, Lemma 7] A common interval in a permutation is said to be minimal if it is not a singleton and each common interval included in it is trivial. If $p_{n,k}$ denotes the number of permutations of length $n$ which contain a minimal common interval of length $k$ then for any fixed positive integer $c$:
\[ \sum_{k=c+2}^{n-c} \frac{p_{n,k}}{n!} = O(n^{-c}). \]

The proof of Lemma 1 is very similar to the one given in [2]. We have $p'_{n,k} \le (n-k+1)\, k!\, (n-k+1)!$. Indeed, the right hand side counts the number of quotient permutations corresponding to $I$ ($k!$), the possible values of the minimal element of $I$ ($n-k+1$) and the structure of the rest of the permutation with one more element which marks the insertion of $I$ ($(n-k+1)!$). Only the extremal terms of the sum can have magnitude $O(n^{-c})$ and the remaining terms have magnitude $O(n^{-c-1})$. Since there are fewer than $n$ terms, the result of Lemma 1 follows.

Proof of Theorem 2. Lemma 1 with $c = 1$ gives that the proportion of non-simple permutations with common intervals of size greater than or equal to 3 is $O(n^{-1})$. But permutations whose common intervals are only of size 1, 2 or $n$ are exactly permutations whose strong interval tree has a prime root and every child is either a leaf or a twin. Then the number of permutations whose strong interval tree has a prime root with $k$ twins is $s_{n-k} \binom{n-k}{k} 2^k$. From Equation 1 the asymptotics for this number is $\frac{n!\, 2^k}{e^2\, k!}$, proving Theorem 2.

3.3 Average time complexity of perfect sorting by reversals

Corollary 1. The algorithm described in [4] for computing a parsimonious perfect scenario for a random permutation runs in polynomial time with probability 1 as $n \to \infty$.

Figure 2: $p_n$, up to $n = 25$.

Proof.
Direct consequence of point 1 in Theorem 1 and of Theorem 2, applied to signed permutations.

This result however does not imply that the average complexity of this algorithm is polynomial, as the average time complexity is the sum of the complexity on all instances of size $n$ divided by the number of instances. Formally, to assess the average time complexity, we need to prove that as $n$ grows, the ratio
\[ p_n = \frac{\sum_p 2^p\, T_{n,p}}{T_n} \]
is bounded by a polynomial in $n$, where $T_n$ is the number of strong interval trees with $n$ leaves and $T_{n,p}$ the number of such trees with $p$ prime vertices. Let $T(x, y)$ be the bivariate generating function $T(x, y) = \sum_{n,p} T_{n,p}\, x^n y^p$. Then $p_n = [x^n]\,T(x, 2) / T_n$. Let moreover $P(x)$ be the generating function of simple permutations, $P(x) = \sum_{n \ge 0} s_n x^n$ (whose first terms can be obtained from entry A111111 in [16]). Using the specification for strong interval trees given in Section 3.1 and techniques described in [12] for example, it is immediate that $T(x, y)$ satisfies the following system of functional equations:
\[ \begin{cases} T(x,y) = x + y\,P(T(x,y)) + 2\,\dfrac{B(x,y)^2}{1 - B(x,y)} \\[2mm] B(x,y) = x + y\,P(T(x,y)) + \dfrac{B(x,y)^2}{1 - B(x,y)} \end{cases} \]
By iterating these equations, we computed the first 25 values of $p_n$ (Fig. 2). They suggest that $p_n$ is even bounded by a constant close to 2, and lead us to Conjecture 1.

Conjecture 1. The average-time complexity of the algorithm described in [4] for computing a parsimonious perfect scenario is polynomial, bounded by $O(n\sqrt{n})$.

4 Average-case properties of commuting permutations

We now study the family of commuting (signed) permutations, and more precisely the average number of reversals in a parsimonious perfect scenario for a commuting permutation and the average length of a reversal of such a scenario.

Let $\sigma$ be a commuting permutation of size $n$, i.e. a signed permutation whose strong interval tree $T_S(\sigma)$ has no prime vertex. It follows from the combinatorial specification of strong interval trees given in Section 3.1 that $T_S(\sigma)$ is simply a plane tree with internal vertices having at least two children and a sign on the root (which implicitly defines the signs of the other internal vertices, from point 1 in Property 1, and the labels $\{1, \ldots, n\}$ of the leaves). These trees are then Schröder trees (entry A001003 in the On-Line Encyclopedia of Integer Sequences [16]) with a sign on the root.

Theorem 3. The average length of a parsimonious perfect scenario for a commuting permutation of length $n$ is asymptotically
\[ \frac{1 + \sqrt{2}}{2}\, n \simeq 1.2\,n. \]

Proof. From the previous section and points 2 and 3 in Theorem 1, the problem of computing the expected number of reversals of a parsimonious perfect scenario reduces to computing the expected number of internal vertices of $T_S(\sigma)$ other than the root (because two adjacent linear vertices cannot have the same sign) and the expected number of leaves whose sign in $\sigma$ differs from the sign of their parent in $T_S(\sigma)$.

The expected number of leaves whose sign in $\sigma$ is different from that of their parent in $T_S(\sigma)$ is obviously $n/2$, as the sign of a leaf and the sign of its parent are independent.

To compute the average number of internal vertices in a Schröder tree, we use symbolic methods as defined in [12]. Let us define the bivariate generating function $S(x, y) = \sum_{k,n} S_{n,k}\, x^n y^k$, where $S_{n,k}$ denotes the number of Schröder trees with $n$ leaves and $k$ internal vertices. The average number of internal vertices in a Schröder tree with $n$ leaves is
\[ \frac{\sum_k k\, S_{n,k}}{\sum_k S_{n,k}} = \frac{[x^n]\, \left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1}}{[x^n]\, S(x,1)}. \]
A Schröder tree can be recursively described as a single leaf, or a root having at least two children, which are again Schröder trees.
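The recursive description above can be turned directly into a small counting program for the numbers $S_{n,k}$. The following is a minimal sketch, not the computation used by the authors: the helper names are hypothetical, polynomials in $y$ are assumed to be stored as `{exponent: coefficient}` dictionaries, and the auxiliary table `F` (non-empty sequences of trees) is an implementation device introduced only for this example.

```python
from fractions import Fraction


def poly_add(a, b):
    """Sum of two polynomials in y, each stored as {exponent: coefficient}."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0) + v
    return out


def poly_mul(a, b):
    """Product of two polynomials in y."""
    out = {}
    for ka, va in a.items():
        for kb, vb in b.items():
            out[ka + kb] = out.get(ka + kb, 0) + va * vb
    return out


def schroder_table(n_max):
    """S[n][k] = number of Schroeder trees with n leaves and k internal vertices."""
    S = [None, {0: 1}]  # S[1]: the single leaf
    F = [None, {0: 1}]  # F[n]: non-empty sequences of trees with n leaves in total
    for n in range(2, n_max + 1):
        # G: sequences of at least two trees, split as first tree + non-empty rest.
        G = {}
        for j in range(1, n):
            G = poly_add(G, poly_mul(S[j], F[n - j]))
        # A root above such a sequence adds one internal vertex.
        S_n = {k + 1: v for k, v in G.items()}
        S.append(S_n)
        F.append(poly_add(S_n, G))
    return S


def average_internal(S_n):
    """Exact average number of internal vertices over the trees counted by S_n."""
    total = sum(S_n.values())
    return Fraction(sum(k * v for k, v in S_n.items()), total)


if __name__ == "__main__":
    S = schroder_table(30)
    print([sum(S[n].values()) for n in range(1, 7)])  # start of A001003
    print(float(average_internal(S[30])) / 30, 2 ** -0.5)
```

The row sums give the first Schröder numbers, and the exact averages can be compared with the estimate $n/\sqrt{2}$ for internal vertices that appears in the proof of Theorem 3.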
Consequently, $S(x, y)$ satisfies the equation
\[ S(x,y) = x + \frac{y\, S(x,y)^2}{1 - S(x,y)}, \]
and solving this equation gives
\[ S(x,y) = \frac{(x+1) - \sqrt{(x+1)^2 - 4x(y+1)}}{2(y+1)}. \tag{2} \]
We compute an asymptotic equivalent of the number $[x^n]\,S(x,1)$, the number of Schröder trees ([16, entry A001003]).

Asymptotic study of $S(x,1)$. By Equation 2 we obtain
\[ S(x,1) = \frac{(x+1) - \sqrt{(x+1)^2 - 8x}}{4} = \frac{(x+1) - \sqrt{\left(1 - \frac{x}{3+2\sqrt{2}}\right)\left(1 - \frac{x}{3-2\sqrt{2}}\right)}}{4}, \]
which yields the equivalent, when $x \to 3 - 2\sqrt{2}$, $x < 3 - 2\sqrt{2}$,
\[ S(x,1) \sim \frac{2 - \sqrt{2}}{2} - \frac{\sqrt{3\sqrt{2} - 4}}{2} \left(1 - \frac{x}{3-2\sqrt{2}}\right)^{1/2}. \]
Applying the techniques of [12, chapters 4 and 6] gives the following equivalent of the coefficients $[x^n]\,S(x,1)$ when $n \to \infty$:
\[ [x^n]\,S(x,1) \sim \frac{\sqrt{3\sqrt{2} - 4}}{4}\, (3 + 2\sqrt{2})^n\, \frac{1}{\sqrt{\pi n^3}}. \]

Asymptotic study of $\left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1}$. By Equation 2 we obtain
\[ \left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1} = \frac{(x-1)^2 - (x+1)\sqrt{(x+1)^2 - 8x}}{8 \sqrt{\left(1 - \frac{x}{3+2\sqrt{2}}\right)\left(1 - \frac{x}{3-2\sqrt{2}}\right)}}. \]
From the above expression, we can obtain an equivalent of $\left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1}$ when $x \to 3 - 2\sqrt{2}$, $x < 3 - 2\sqrt{2}$. Namely,
\[ \left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1} \sim \frac{3 - 2\sqrt{2}}{4\sqrt{3\sqrt{2} - 4}} \left(1 - \frac{x}{3-2\sqrt{2}}\right)^{-1/2}. \]
As before, we deduce that an equivalent of the coefficients $[x^n] \left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1}$ when $n \to \infty$ is
\[ [x^n] \left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1} \sim \frac{3 - 2\sqrt{2}}{4\sqrt{3\sqrt{2} - 4}}\, (3 + 2\sqrt{2})^n\, \frac{1}{\sqrt{\pi n}}. \]
An equivalent of the average number of internal vertices in a Schröder tree with $n$ leaves is now easily derived as
\[ \frac{[x^n] \left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1}}{[x^n]\, S(x,1)} \sim \frac{3 - 2\sqrt{2}}{3\sqrt{2} - 4}\, n \sim \frac{n}{\sqrt{2}}. \]

Combining all results together. The number above is the average number of internal vertices in Schröder trees with $n$ leaves, including the root if it is not a leaf (i.e. $n > 1$).
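As a reading aid, the two coefficient extractions used in this proof can be traced back to the standard transfer rule for algebraic singularities (see [12]). The block below is a sketch of that step; the symbols $\rho$ and $\alpha$ are notation introduced here, with $\rho = 3 - 2\sqrt{2}$ and hence $1/\rho = 3 + 2\sqrt{2}$.

```latex
% Standard transfer, for \alpha not in {0, -1, -2, ...}:
[x^n]\,(1 - x/\rho)^{-\alpha} \sim \frac{n^{\alpha - 1}}{\Gamma(\alpha)}\,\rho^{-n}

% \alpha = -1/2, using \Gamma(-1/2) = -2\sqrt{\pi}:
[x^n]\,(1 - x/\rho)^{1/2} \sim -\frac{\rho^{-n}}{2\sqrt{\pi n^3}}

% \alpha = 1/2, using \Gamma(1/2) = \sqrt{\pi}:
[x^n]\,(1 - x/\rho)^{-1/2} \sim \frac{\rho^{-n}}{\sqrt{\pi n}}

% Constants: the factor -\sqrt{3\sqrt{2}-4}/2 in S(x,1) times -1/2 gives \sqrt{3\sqrt{2}-4}/4.
% Ratio: since 3\sqrt{2} - 4 = \sqrt{2}\,(3 - 2\sqrt{2}),
\frac{3 - 2\sqrt{2}}{3\sqrt{2} - 4}\; n = \frac{n}{\sqrt{2}}
```

The constant term $(2 - \sqrt{2})/2$ of the expansion of $S(x,1)$ does not contribute to the coefficients for $n \ge 1$, which is why only the square-root term matters.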
A given Schröder tree with $n$ leaves can have its internal vertices and leaves signed in $2^{n+1}$ ways (2 choices for the sign of the root, which define the signs of all other internal vertices, and $2^n$ choices for the signs of the $n$ leaves). As these signs do not change the number of internal vertices of the tree, the average number of internal vertices in such signed Schröder trees does not change. We also have to discard the root as it does not define a reversal, but this does not change the asymptotic behaviour. Adding $n/2$ to account for signed leaves that define reversals, we obtain
\[ \frac{1 + \sqrt{2}}{2}\, n. \]

Remark 2. It is interesting to note the large representation of reversals of length 1, which make up almost half of the expected reversals. A similar property was observed in [15] on datasets of bacterial genomes.

Theorem 4. The average length of a reversal in a parsimonious perfect scenario for a commuting permutation of length $n$ is asymptotically
\[ \frac{2^{7/4} \sqrt{3 - 2\sqrt{2}}}{1 + \sqrt{2}}\, \sqrt{\pi n} \simeq 1.02 \sqrt{n}. \]

Proof. We want to compute the ratio between the average sum of the lengths of the reversals of a parsimonious perfect scenario for a commuting permutation and the average length of such a scenario. The latter was obtained above (Theorem 3), and we concentrate on the former.

A reversal defined by a vertex $x$ of the strong interval tree $T_S(\sigma)$ is of length $|L(x)|$ (it reverses the segment of the signed permutation that contains the leaves of the subtree rooted at $x$, see [4]).

We first focus on the average value of the sum of the sizes of all subtrees in a Schröder tree. For simplicity in the computation, we will also count the whole tree and the leaves as subtrees (the leaves being obviously of size 1), which gives the quantity we want to compute, up to subtracting $3/2 \cdot n$ from the final result.
We first define the bivariate generating function (which we again call $S$, but which is slightly different), following the standard analytic method defined in [12]:
\[ S(x,y) = \sum_{k,n} S_{n,k}\, x^n y^k, \]
where $S_{n,k}$ denotes the number of Schröder trees with $n$ leaves and sizes of subtrees (including leaves and the whole tree) that sum to $k$. The average value of the sum of the sizes of every subtree in a Schröder tree with $n$ leaves is
\[ \frac{\sum_k k\, S_{n,k}}{\sum_k S_{n,k}} = \frac{[x^n] \left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1}}{[x^n]\, S(x,1)}. \]
A Schröder tree can be recursively described as a single leaf or a root having at least two children, which are again Schröder trees. In the second case, the subtrees are those involved in the children of the root, plus the tree itself (which is a subtree of size $n$), which gives the functional equation
\[ S(x,y) = xy + \frac{S(xy, y)^2}{1 - S(xy, y)}. \tag{3} \]
Since this equation involves both $S(x, y)$ and $S(xy, y)$, we cannot extract from it an expression for $S(x, y)$ as in the proof of Theorem 3. But since the average value of the sum of the sizes of every subtree in a Schröder tree with $n$ leaves can be obtained as the ratio above, we do not need to compute $S(x, y)$ but only $S(x, 1)$ and $\left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1}$.

Asymptotic study of $S(x,1)$. By Equation 3 we obtain $S(x,1) = \frac{(x+1) - \sqrt{(x+1)^2 - 8x}}{4}$, which is the same function as in the proof of Theorem 3. Hence,
\[ [x^n]\,S(x,1) \sim \frac{\sqrt{3\sqrt{2} - 4}}{4}\, (3 + 2\sqrt{2})^n\, \frac{1}{\sqrt{\pi n^3}}. \]

Asymptotic study of $\left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1}$. Differentiating Equation 3 with respect to $x$ and to $y$, and setting $y = 1$, gives:
\[ \begin{aligned} \frac{\partial S}{\partial x}(x,1) &= 1 + \frac{\partial S}{\partial x}(x,1) \cdot \frac{2S(x,1) - S(x,1)^2}{(1 - S(x,1))^2}, \\ \frac{\partial S}{\partial y}(x,1) &= x + \left( x\, \frac{\partial S}{\partial x}(x,1) + \frac{\partial S}{\partial y}(x,1) \right) \cdot \frac{2S(x,1) - S(x,1)^2}{(1 - S(x,1))^2}. \end{aligned} \]
From this system, we can extract the following equation, where $S(x, 1)$ has been computed before:
\[ \left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1} = \frac{\partial S}{\partial y}(x,1) = \frac{x}{(1 - C)^2}, \quad \text{where } C = \frac{2S(x,1) - S(x,1)^2}{(1 - S(x,1))^2}. \]
The singularity closest to the origin is $3 - 2\sqrt{2}$, and the Taylor development of the above around this singularity gives:
\[ \left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1} \sim \frac{3 - 2\sqrt{2}}{2\left(1 - \frac{x}{3 - 2\sqrt{2}}\right)}. \]
Applying the techniques of [12], this yields the following equivalent of the coefficients $[x^n] \left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1}$ when $n \to \infty$:
\[ [x^n] \left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1} \sim \frac{3 - 2\sqrt{2}}{2}\, (3 + 2\sqrt{2})^n. \]
Then
\[ \frac{[x^n] \left.\frac{\partial S(x,y)}{\partial y}\right|_{y=1}}{[x^n]\, S(x,1)} \sim 2^{3/4} \sqrt{3 - 2\sqrt{2}}\, \sqrt{\pi n^3} \]
gives the average sum of the sizes of all subtrees of a Schröder tree.

This is independent of the signs added to give the strong interval tree of a commuting permutation, so this number is also the expected sum of the sizes of all subtrees of the strong interval tree associated to a random commuting permutation. To get the expected sum of the lengths of the reversals of a parsimonious perfect scenario for a random commuting permutation, we need to remove the size of the whole tree, which was counted as a subtree ($n$), and the total size of the $n$ subtrees defined by the leaves ($n$), and to add the contribution of the reversals of size 1 ($n/2$ on average). None of this changes the above asymptotics.

Dividing by the average number of reversals of such a scenario (Theorem 3), we obtain Theorem 4.

5 Conclusion

We showed that perfect sorting by reversals, although an intractable problem, is very likely to be solved in polynomial time for random signed permutations. This result relies on a study of the shape of a random strong interval tree, which shows that asymptotically such trees are mostly composed of a large prime vertex at the root and small subtrees.
As the strong interval tree of a permutation is equivalent to the modular decomposition tree of the corresponding labeled permutation graph [4], this result agrees with the general belief that the modular decomposition tree of a random graph has a large prime root. We were also able to give precise asymptotic results for the expected lengths of a parsimonious perfect scenario and of a reversal of such a scenario for random commuting permutations.

Our research leaves at least one open problem: proving that computing a parsimonious perfect scenario can be done in polynomial time on average. It would also be interesting to see if our approach can be extended to the perfect rearrangement problem for the Double-Cut-and-Join model that has been introduced recently [6] and has the intriguing property that instances that were hard to solve for reversals can be solved in polynomial time in the DCJ context, and conversely.

References

[1] M. Albert and M. Atkinson. Simple permutations and pattern restricted permutations. Discrete Math., 300(1-3):1-15, 2005.
[2] M. Albert, M. Atkinson, and M. Klazar. The enumeration of simple permutations. J. Integer Seq., 6, 2003.
[3] S. Bérard, A. Bergeron, and C. Chauve. Conservation of combinatorial structures in evolution scenarios. In Comparative Genomics 2004, volume 3388 of LNCS/LNBI, pages 1-14, 2004.
[4] S. Bérard, A. Bergeron, C. Chauve, and C. Paul. Perfect sorting by reversals is not always difficult. IEEE/ACM Trans. Comput. Biol. Bioinform., 4:4-16, 2007.
[5] S. Bérard, C. Chauve, and C. Paul. A more efficient algorithm for perfect sorting by reversals. Inform. Proc. Letters, 106:90-95, 2008.
[6] S. Bérard, C. Chauve, C. Paul, and E. Tannier. Perfect DCJ rearrangement. In RECOMB-CG 2008, volume 5267 of LNCS/LNBI, pages 158-169, 2008.
[7] A. Bergeron, C. Chauve, F. de Montgolfier, and M. Raffinot. Computing common intervals of k permutations, with applications to modular decomposition of graphs. SIAM J. Discrete Math., 22:1022-1039, 2008.
[8] A. Bergeron, J. Mixtacki, and J. Stoye. Mathematics of Evolution and Phylogeny, chapter The inversion distance problem. Oxford University Press, 2005.
[9] G. Bourque and P. Pevzner. Genome-scale evolution: reconstructing gene orders in the ancestral species. Genome Res., 12:26-36, 2002.
[10] Y. Diekmann, M.-F. Sagot, and E. Tannier. Evolution under reversals: Parsimony and conservation of common intervals. IEEE/ACM Trans. Comput. Biol. Bioinform., 4:301109, 2007.
[11] M. Figeac and J.-S. Varré. Sorting by reversals with common intervals. In WABI 2004, volume 3240 of LNCS/LNBI, pages 26-37, 2004.
[12] P. Flajolet and R. Sedgewick. Analytic Combinatorics. Cambridge University Press, 2008.
[13] S. Hannenhalli and P. Pevzner. Transforming cabbage into turnip: Polynomial algorithm for sorting signed permutations by reversals. J. ACM, 46:1-27, 1999.
[14] L. Ibarra. Finding pattern matchings for permutations. Inform. Proc. Letters, 61:293-295, 1997.
[15] J.-F. Lefebvre, N. El-Mabrouk, E. R. M. Tillier, and D. Sankoff. Detection and validation of single gene inversions. In ISMB (Supplement of Bioinformatics), pages 190-196, 2003.
[16] N. J. A. Sloane. The on-line encyclopedia of integer sequences, 2007. Published electronically at www.research.att.com/~njas/sequences/.
[17] K. Swenson, Y. Lin, V. Rajan, and B. Moret. Hurdles hardly have to be heeded. In RECOMB-CG 2008, volume 5267 of LNCS/LNBI, pages 241-251, 2008.
[18] E. Tannier, A. Bergeron, and M.-F. Sagot. Advances on sorting by reversals. Discrete Appl. Math., 155:881-888, 2007.
[19] A. B. W. Xu and D. Sankoff. Poisson adjacency distributions in genome comparison: multichromosomal, circular, signed and unsigned cases. Bioinformatics, 24(16):i146-i152, 2008.
[20] C. Z. W. Xu and D. Sankoff. Paths and cycles in breakpoint graph of random multichromosomal genomes. J. Comput. Biol., 14(4):423-435, 2007.
[21] W. Xu. The distribution of distances between randomly constructed genomes: Generating function, expectation, variance and limits. J. Bioinform. Comput. Biol., 6(1):23-36, 2008.