Parametric analysis of RNA folding

We extend recent methods for parametric sequence alignment to the parameter space for scoring RNA folds. This involves the construction of an RNA polytope. A vertex of this polytope corresponds to RNA secondary structures with common branching. We us…

Authors: Valerie Hower, Christine E. Heitsch

Abstract
We extend recent methods for parametric sequence alignment to the parameter space for scoring RNA folds. This involves the construction of an RNA polytope. A vertex of this polytope corresponds to RNA secondary structures with common branching. We use this polytope and its normal fan to study the effect of varying three parameters in the free energy model that are not determined experimentally. Our results indicate that variation of these specific parameters does not have a dramatic effect on the structures predicted by the free energy model. We additionally map a collection of known RNA secondary structures to the RNA polytope.
Keywords: RNA secondary structure; plane tree; free energy; thermodynamic model; parametric analysis
1 Introduction
Determining the structure of RNA molecules remains a fundamental scientific challenge, since current methods cannot always identify the "correct" fold from the large number of possible configurations. A common method for predicting the secondary structure of a single RNA molecule, termed the thermodynamic model, involves free energy minimization [21, 32, 34]. Extensions to this approach, such as suboptimal structure prediction and partition function calculations, still depend on the parameters from the thermodynamic model to score possible secondary structures. The free energy of a secondary structure is calculated by scoring substructures according to a set of parameters, most of which are determined experimentally (see [16] for a review). A dynamic programming algorithm, used in software packages like mfold [20, 33], computes the minimal free energy as well as the optimal secondary structure(s) [19].
In this work, we address variation in the parameter space for scoring secondary structures, focusing on three parameters from the multi-branch loop energy function that are not based on measurement. Specifically, we address the following questions. What is the geometry of the parameter space for scoring RNA folds, and how does this geometry relate back to the biology? How sensitive is the thermodynamic model to variation of the ad hoc multi-branch loop parameters? We answer these questions using geometric combinatorics. We find that variation of the multi-branch loop parameters has a smaller effect than the change in the parameter space coming from improved measurement. Moreover, regardless of the choice of multi-branch loop parameters used in the current version of the thermodynamic model, the minimal energy structures have a low degree of branching.
Our results are achieved by applying techniques from geometric combinatorics to give a parametric analysis of RNA folding. We construct an RNA polytope whose vertices correspond to sets of secondary structures with common branching. Its normal fan subdivides the parameter space so that the parameters lying in the same cone give the same minimal free energy structures. These approaches have been used recently in parametric sequence alignment [7, 8, 22] and for more general hidden Markov models [4, 23]. There is also earlier polyhedral work on parametric sequence alignment [13, 30] and related work on secondary structure comparison [29] and sequence/structure alignment [18]. We additionally make comparisons with biological structures, and this work supports our theoretical results.
2 Background
2.1 Plane trees and RNA folding
We use a simplified model of RNA folding in which a secondary structure $S$ is represented by a rooted plane tree $T = T(S)$. Single-stranded RNA sequences fold into molecular structures.
One step in this folding process is the formation of Watson-Crick and G-U base pairs. The set of (nested) base pairs determines the secondary structure of an RNA sequence. As illustrated in Figure 1, a secondary structure has two basic types of substructures: runs of stacked base pairs, which are called helices, and the single-stranded regions known as loops. Every component of a secondary structure is given an associated free energy score by the thermodynamic model. To a first approximation, the score of a loop is determined by its degree, the number of base pairs contained in the loop. There are different energy functions for the external loop, hairpin loops (degree 1), bulge/internal loops (degree 2), and multi-branch loops (degree greater than 2). If $L$ is a multi-branch loop, then the free energy of $L$ is
$$E(L) = a + b\,n_1 + c\,n_2 + q, \tag{1}$$
where $n_1$ is the number of single-stranded bases in $L$, $n_2$ is the number of helices in $L$, $q$ is the sum of the single-base stacking energies in $L$, and $a$, $b$, $c$ are the parameters for offset, free base, and helix penalties, respectively [34]. In this work, our analysis is primarily focused on the three parameters $a$, $b$, and $c$ from this function, since they are not experimentally determined.
Our results are obtained by considering rooted plane trees as a simplified model of RNA folding. Plane trees have long been used to enumerate possible RNA secondary structures [24] and to compare them [17, 25]. The interaction between combinatorics and RNA folding has continued to develop over the last 20 years, including the use of trees as more abstract representations of RNA folding, for instance in [11] and related work as well as in [2, 3, 14]. A rooted plane tree (also called a plane tree or ordered tree [5, 27]) is a tree with a specified root vertex such that the subtrees of any given vertex are ordered.
This ordering comes from the 5′ → 3′ linear arrangement of the RNA sequence. Plane trees with $n$ edges are one of the many combinatorial objects counted by the Catalan numbers
$$C_n = \frac{1}{n+1}\binom{2n}{n}. \tag{2}$$
To obtain $T$, we assign the root vertex to the exterior loop of $S$, and the non-root vertices of $T$ correspond to the remaining loops in $S$. Two vertices in $T$ share an edge when their loops in $S$ are connected by a helix. As an example, we give a secondary structure in Figure 1A together with its associated plane tree in Figure 1B.
Figure 1: Secondary structures as rooted plane trees (A: a secondary structure; B: its associated plane tree)
Technically, a secondary structure $S$ must be free of pseudoknots in order to construct $T$. While pseudoknots do occur in secondary structures, the thermodynamic model cannot predict them; moreover, one can create a nested, pseudoknot-free structure from a given fold in several ways, some of which are described in [26], and our approach is described in the Materials and Methods section.
Given a plane tree $T$ with $n$ edges, we write $r$ for the degree of the root vertex and, for $0 \le k \le n$, $d_k$ for the number of non-root vertices with $k$ children. Thus, $d_k$ gives the number of non-root vertices in $T$ with degree $k+1$, and this is the number of loops in $S$ with $k+1$ branches. To assign an energy to a plane tree, we assign weights to the vertices based on the down-degree (number of children) of each vertex. In terms of secondary structures, we are assigning the same energy to every loop of a given type in the fold. This is a simplification of the scoring for the thermodynamic model, in which the energy of a structure is the sum of the energies of the loops.
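The tree encoding and the vertex-weight energy can be made concrete with a short sketch. It assumes that a plane tree is given as a balanced-parenthesis (Dyck) word, with one `(` for each edge traversed away from the root and one `)` for each edge traversed back. This encoding is chosen here for illustration and is not taken from the paper, and all function names are hypothetical. `tree_energy` assigns $a_3$ per helix of the exterior loop, $a_0$ per hairpin, $a_1$ per bulge/internal loop, and $c_2 + a_2(k+1)$ to a multi-branch loop with $k+1$ helices, matching the weights in the energy formula that follows. `linear_energy` evaluates the reduced form $(c_2 + 2a_2)n + \theta\cdot(r, d_0, d_1)$ so the two can be compared.

```python
from math import comb


def dyck_words(n):
    """Yield all balanced parenthesis words with n pairs, i.e. all plane trees with n edges."""
    def extend(prefix, opened, closed):
        if closed == n:
            yield prefix
            return
        if opened < n:  # descend along a new edge
            yield from extend(prefix + "(", opened + 1, closed)
        if closed < opened:  # climb back toward the root
            yield from extend(prefix + ")", opened, closed + 1)
    yield from extend("", 0, 0)


def child_counts(word):
    """Return the number of children of every vertex; index 0 is the root."""
    counts = [0]
    path = [0]  # vertices on the path from the root to the current vertex
    for step in word:
        if step == "(":
            counts[path[-1]] += 1
            counts.append(0)
            path.append(len(counts) - 1)
        else:
            path.pop()
    return counts


def count_vector(word):
    """(r, d0, d1): root degree, number of hairpins, number of bulges/internal loops."""
    counts = child_counts(word)
    non_root = counts[1:]
    return counts[0], non_root.count(0), non_root.count(1)


def tree_energy(word, a0, a1, a2, a3, c2):
    """Sum of vertex weights (simplified loop energies) of the plane tree."""
    counts = child_counts(word)
    energy = a3 * counts[0]  # exterior loop: a3 per helix
    for k in counts[1:]:
        if k == 0:
            energy += a0  # hairpin loop
        elif k == 1:
            energy += a1  # bulge/internal loop
        else:
            energy += c2 + a2 * (k + 1)  # multi-branch loop with k + 1 helices
    return energy


def linear_energy(word, a0, a1, a2, a3, c2):
    """The same energy written as (c2 + 2 a2) n + theta . (r, d0, d1)."""
    n = len(word) // 2
    r, d0, d1 = count_vector(word)
    theta = (a3 - a2, a0 - c2 - a2, a1 - c2 - 2 * a2)
    return (c2 + 2 * a2) * n + theta[0] * r + theta[1] * d0 + theta[2] * d1


# The number of trees matches equation (2), and both energy expressions agree
# on every tree (integer weights keep the comparison exact).
for n in range(1, 8):
    words = list(dyck_words(n))
    assert len(words) == comb(2 * n, n) // (n + 1)
    for w in words:
        assert tree_energy(w, 5, 3, 2, -1, 7) == linear_energy(w, 5, 3, 2, -1, 7)
print(count_vector("((()))"), count_vector("()()()"))  # (1, 1, 2) (3, 3, 0)
```

The straight-line tree `((()))` and the star `()()()` give the count vectors $(1,1,n-1)$ and $(n,n,0)$ for $n = 3$.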
If $T$ is a plane tree with $n$ edges, the free energy of $T$ is
$$\begin{aligned}
E(T) &= a_3 r + a_0 d_0 + a_1 d_1 + \sum_{k=2}^{n} \left[c_2 + a_2(k+1)\right] d_k \\
&= a_3 r + a_0 d_0 + a_1 d_1 + (c_2 + 2a_2)\sum_{k=2}^{n} d_k + a_2 \sum_{k=2}^{n} (k-1)\, d_k \\
&= a_3 r + a_0 d_0 + a_1 d_1 + (c_2 + 2a_2)(n - d_0 - d_1) + a_2 (d_0 - r) \\
&= (c_2 + 2a_2)\, n + (a_3 - a_2)\, r + (a_0 - c_2 - a_2)\, d_0 + (a_1 - c_2 - 2a_2)\, d_1,
\end{aligned}$$
where we have used the relations
$$\sum_{k=2}^{n} d_k = n - d_0 - d_1 \quad\text{and}\quad \sum_{k=2}^{n} (k-1)\, d_k = d_0 - r$$
that hold for all plane trees [27]. (The first holds because the $n$ non-root vertices are counted by $\sum_k d_k = n$; the second because the non-root vertices have $\sum_k k\, d_k = n - r$ children in total.)
To minimize free energy, we must minimize $E(T)$ over the space of all plane trees. Since this space is infinite, we will typically think of $n$ as fixed but arbitrary and minimize the free energy function over the finite space of plane trees with $n$ edges. For a given set of parameters $a_0, a_1, a_2, a_3, c_2$, the term $(c_2 + 2a_2)\,n$ is then constant, so this is equivalent to minimizing the inner product
$$E'(T) = (\theta_2, \theta_3, \theta_4) \cdot (r, d_0, d_1), \tag{3}$$
where $\theta_2 = a_3 - a_2$, $\theta_3 = a_0 - c_2 - a_2$, and $\theta_4 = a_1 - c_2 - 2a_2$.
2.2 Geometric combinatorics
In this section, we present some basic definitions in geometric combinatorics. We refer the reader to [12, 31] for a more detailed treatment. A set $U \subset \mathbb{R}^d$ is convex if for any two points $x, y \in U$, the line segment connecting $x$ and $y$ is contained in $U$, that is, $\{\alpha x + (1-\alpha) y \mid 0 \le \alpha \le 1\} \subset U$. For any subset $U$ of $\mathbb{R}^d$, the convex hull of $U$, written $\operatorname{conv} U$, is the intersection of all convex sets that contain $U$. A lattice polytope $\Delta \subset \mathbb{R}^d$ is the convex hull of a finite collection of lattice points: $\Delta = \operatorname{conv} A$, where $A = \{y_1, y_2, y_3, \ldots, y_r\} \subset \mathbb{Z}^d$. Any lattice polytope $\Delta$ is characterized by a finite collection of defining inequalities
$$\{c_i \cdot x \ge b_i\}_{i \in I}, \quad\text{where } c_i \in \mathbb{Z}^d,\ x \in \Delta,\ b_i \in \mathbb{Z}. \tag{4}$$
A face $F$ of $\Delta$ is a subset defined by setting some of the defining inequalities to equality, i.e.,
$$F = \left\{ x \in \Delta \;\middle|\; c_{i_1} \cdot x = b_{i_1},\ c_{i_2} \cdot x = b_{i_2},\ \ldots,\ c_{i_k} \cdot x = b_{i_k} \right\},$$
and the dimension of $F$ is the dimension of its affine span. The vertices of $\Delta$ are the 0-dimensional faces, while the facets have dimension $\dim \Delta - 1$. The boundary of $\Delta$, written $\partial \Delta$, is the union of all faces of $\Delta$ of dimensions $0, 1, 2, \ldots, \dim \Delta - 1$. A convex polyhedral cone $\sigma$ is the positive hull of a finite collection of lattice points in $\mathbb{Z}^d$:
$$\sigma = \{t_1 z_1 + t_2 z_2 + \cdots + t_s z_s \mid t_i \ge 0,\ z_i \in \mathbb{Z}^d\},$$
and we write $\sigma = \langle z_1, z_2, \ldots, z_s \rangle$. Associated to each lattice polytope $\Delta$ is its normal fan $\mathcal{N}(\Delta)$, a collection of cones that subdivides $\mathbb{R}^d$. The rays (1-dimensional cones) of $\mathcal{N}(\Delta)$ are of the form $\langle c_i \rangle$ for $i \in I$ in (4). Moreover, the cones $\sigma \in \mathcal{N}(\Delta)$ are in one-to-one correspondence with the faces $F$ of $\Delta$:
$$\sigma_F = \{ v \in \mathbb{R}^d \mid u \cdot v \le x \cdot v \ \ \forall u \in F,\ \forall x \in \Delta \}. \tag{5}$$
Note that $\dim \sigma_F = \dim \Delta - \dim F$ when $\Delta$ is full-dimensional. In terms of minimization, equation (5) states that, for every vector $v \in \sigma_F$, the points of $F$ minimize the dot product with $v$ among all points of $\Delta$. As an example of the above concepts, we give a 2-dimensional polytope $\Delta$ in Figure 2A and its normal fan $\mathcal{N}(\Delta)$ in Figure 2B. The four vertices of $\Delta$ correspond to the four 2-dimensional cones in $\mathcal{N}(\Delta)$, and the four facets of $\Delta$ correspond to the four rays of $\mathcal{N}(\Delta)$.
Figure 2: A 2-dimensional polytope $\Delta$ (A) and its normal fan $\mathcal{N}(\Delta)$ (B)
3 Results
3.1 Plane trees that minimize energy
Fixing $n \ge 5$, the possible count vectors $(r, d_0, d_1)$ of plane trees were classified by the second author [14]; they fall into one of four classes, listed in Table I, with $r, d_0, d_1 \ge 0$ in all cases.
Since $r$, $d_0$, and $d_1$ must all be integers, the vertices in Table IB and Table ID differ depending on whether $n$ is even or odd.
Table I: Sets of inequalities and corresponding vertices for plane trees
(A) Inequalities: $r = 1$, $d_0 = 1$, $d_1 = n - 1$. Vertices ($n$ even or odd): $\{(1,1,n-1)\}$.
(B) Inequalities: $r = 1$, $2 \le d_0 \le n$, $n - 2d_0 + 1 \le d_1 \le n - d_0 - 1$. Vertices, $n$ even: $\{(1,2,n-3),\ (1,\tfrac{n+2}{2},0),\ (1,\tfrac{n}{2},1),\ (1,n-1,0)\}$. Vertices, $n$ odd: $\{(1,2,n-3),\ (1,\tfrac{n+1}{2},0),\ (1,n-1,0)\}$.
(C) Inequalities: $r = d_0$, $2 \le d_0 \le n$, $d_1 = n - d_0$. Vertices ($n$ even or odd): $\{(2,2,n-2),\ (n,n,0)\}$.
(D) Inequalities: $2 \le r \le 2d_0 - n + d_1$, $3 \le d_0 \le n - 1$, $n - 2d_0 + 2 \le d_1 \le n - d_0 - 1$. Vertices, $n$ even: $\{(2,n-1,0),\ (n-2,n-1,0),\ (2,3,n-4),\ (2,\tfrac{n+2}{2},0)\}$. Vertices, $n$ odd: $\{(2,n-1,0),\ (n-2,n-1,0),\ (2,\tfrac{n+3}{2},0),\ (3,\tfrac{n+3}{2},0),\ (2,3,n-4),\ (2,\tfrac{n+1}{2},1)\}$.
We want to minimize the linear energy function over this point set (which includes count vectors from all four cases), and hence we let $P_n$ be the convex hull of the union of the four polytopes listed in Table I. Since a linear function attains its minimum over a polytope at a vertex, regardless of our choice of energy parameters a minimum energy plane tree with $n$ edges can always be found at a vertex of $P_n$. The following proposition describes the vertices of $P_n$.
Proposition 3.1.1. Define $\Psi_n$ as follows:
$$\Psi_n := \begin{cases} \operatorname{conv}\{(1,\tfrac{n+1}{2},0),\ (1,n-1,0),\ (1,1,n-1),\ (n,n,0)\} & n \text{ odd},\\ \operatorname{conv}\{(1,\tfrac{n+2}{2},0),\ (1,\tfrac{n}{2},1),\ (2,\tfrac{n+2}{2},0),\ (1,n-1,0),\ (1,1,n-1),\ (n,n,0)\} & n \text{ even}. \end{cases}$$
Then $\Psi_n = P_n$ for $n \ge 5$.
Proof. Clearly $\Psi_n \subset P_n$, so it suffices to show that each lattice point of $P_n$ in Table I is contained in $\Psi_n$.
The normal fan of $\Psi_n$ has rays
$$\{(-1,2,1),\ (1,0,0),\ (1,1-n,2-n),\ (0,0,1)\} \text{ for } n \text{ odd}, \qquad \{(-1,2,1),\ (1,0,0),\ (1,1-n,2-n),\ (0,0,1),\ (0,1,1)\} \text{ for } n \text{ even}.$$
Moreover, for each lattice point $t = (r, d_0, d_1)$ in Table I, one can verify that $t$ satisfies the defining inequalities of $\Psi_n$:
$$\begin{aligned} (r,d_0,d_1)\cdot(-1,2,1) &\ge n, & (r,d_0,d_1)\cdot(1,0,0) &\ge 1,\\ (r,d_0,d_1)\cdot(1,1-n,2-n) &\ge 2n - n^2, & (r,d_0,d_1)\cdot(0,0,1) &\ge 0, \end{aligned}$$
and for $n$ even we additionally have $(r,d_0,d_1)\cdot(0,1,1) \ge \frac{n+2}{2}$. This gives $P_n \subset \Psi_n$, and we have equality.
In the sequel, we will primarily focus on the rational tetrahedron
$$\Delta_n := \operatorname{conv}\{(1,\tfrac{n+1}{2},0),\ (1,n-1,0),\ (1,1,n-1),\ (n,n,0)\}$$
regardless of whether $n$ is even or odd. There are several reasons for this. First, asymptotically, there is no difference between $P_n$ and $\Delta_n$ for $n$ even. The normal fan $\mathcal{N}(P_n)$ is obtained from $\mathcal{N}(\Delta_n)$ by adding a single ray and subdividing the full-dimensional cone $\sigma = \langle (1,0,0), (0,0,1), (-1,2,1) \rangle$ corresponding to the vertex $(1,\tfrac{n+1}{2},0)$. Thus, when $n$ is even, the parameters giving $(1,1,n-1)$, $(1,n-1,0)$, or $(n,n,0)$ the minimal energy are the same regardless of whether we use the subdivision of $\mathbb{R}^3$ determined by $\mathcal{N}(P_n)$ or that determined by $\mathcal{N}(\Delta_n)$. Moreover, the parameters in $\sigma$ will yield $(1,\tfrac{n+2}{2},0)$, $(1,\tfrac{n}{2},1)$, or $(2,\tfrac{n+2}{2},0)$ as minimal, and the trees corresponding to these three count vectors are all similar, as discussed in Proposition 3.3.1.
3.2 Lattice points in $\partial P_n$
Suppose $S$ is a secondary structure whose plane tree has count vector $(r, d_0, d_1)$. If $(r, d_0, d_1) \in \operatorname{int} P_n$, then there is no (nonzero) choice of parameters that can make $S$ have minimal free energy.
Conversely, if $(r, d_0, d_1) \in \operatorname{int} F$ for some face $F$ of $P_n$ (interior taken relative to $F$), then any parameter vector in the cone $\sigma_F \in \mathcal{N}(P_n)$ yields $S$ with minimal energy. We thus want to determine the count vectors lying on $\partial P_n$.
All four sets of inequalities in Table I intersect $\partial P_n$. Let $Q_A$, $Q_B$, $Q_C$, and $Q_D$ be the polyhedra described in Table IA, IB, IC, and ID, respectively. Then $Q_A, Q_B, Q_C \subset \partial P_n$ and
$$\begin{aligned} Q_A &= \{(1,1,n-1)\},\\ (Q_A \cup Q_B) \cap \mathbb{Z}^3 &= \operatorname{conv}\{(1,n-1,0),\ (1,1,n-1),\ (1,\tfrac{n+1}{2},0)\} \cap \mathbb{Z}^3,\\ (Q_A \cup Q_C) \cap \mathbb{Z}^3 &= \operatorname{conv}\{(1,1,n-1),\ (n,n,0)\} \cap \mathbb{Z}^3. \end{aligned}$$
Since $Q_D$ is 3-dimensional, it cannot be contained in the boundary of $P_n$. We do, however, have
$$(Q_D \cap \partial P_n) \cap \mathbb{Z}^3 = (\operatorname{int} E_1 \cup \operatorname{int} F_1 \cup \operatorname{int} F_2) \cap \mathbb{Z}^3, \tag{6}$$
where $E_1 = \operatorname{conv}\{(n,n,0),\ (1,\tfrac{n+1}{2},0)\}$, $F_1 = \operatorname{conv}\{(n,n,0),\ (1,\tfrac{n+1}{2},0),\ (1,1,n-1)\}$, and $F_2 = \operatorname{conv}\{(n,n,0),\ (1,\tfrac{n+1}{2},0),\ (1,n-1,0)\}$. Equation (6) follows from counting lattice points in the objects on the left and right hand sides of the equation, using the same technique as in Proposition 3.2.1. The plane trees defined in Table ID that lie on $\partial P_n$ satisfy $d_1 = 0$ or $r = 2d_0 - n + d_1$. Their associated secondary structures either have no bulges/internal loops or have a maximal number of helices in the exterior loop.
Next, we count the number of lattice points in the interior of each face of $P_n$. For an edge of the form $E = \operatorname{conv}\{(x_1,y_1,z_1),\ (x_2,y_2,z_2)\}$, we use the formula
$$\#\left(\operatorname{int} E \cap \mathbb{Z}^3\right) = \gcd\left(|x_1 - x_2|,\ |y_1 - y_2|,\ |z_1 - z_2|\right) - 1$$
and obtain the following counts (taking $n$ odd, so that $(1,\tfrac{n+1}{2},0)$ is a lattice point). The edges $\operatorname{conv}\{(n,n,0),\ (1,\tfrac{n+1}{2},0)\}$ and $\operatorname{conv}\{(1,1,n-1),\ (1,\tfrac{n+1}{2},0)\}$ each have $\tfrac{1}{2}(n-3)$ lattice points in their interiors. A total of $\tfrac{1}{2}(n-5)$ lattice points are in the interior of $\operatorname{conv}\{(1,n-1,0),\ (1,\tfrac{n+1}{2},0)\}$.
The interior of $\operatorname{conv}\{(n,n,0),\ (1,1,n-1)\}$ contains $n-2$ lattice points, and there are no interior lattice points for the edges $\operatorname{conv}\{(n,n,0),\ (1,n-1,0)\}$ and $\operatorname{conv}\{(1,1,n-1),\ (1,n-1,0)\}$.
To determine the number of lattice points in a facet $F$ of $P_n$, we use Pick's theorem [15]
$$\#\left(\operatorname{int} F \cap \mathbb{Z}^3\right) = \operatorname{Area}(F) - \tfrac{1}{2}\,\#\left(\partial F \cap \mathbb{Z}^3\right) + 1,$$
where the area of $F$ is normalized with respect to the 2-dimensional sublattice containing $F$. We illustrate Pick's theorem with the following proposition.
Proposition 3.2.1. There are no interior lattice points in the facet $F = \operatorname{conv}\{(1,1,n-1),\ (1,n-1,0),\ (n,n,0)\}$.
Proof. The triangle $F$ lies on the hyperplane $-X + (n-1)Y + (n-2)Z = n^2 - 2n$ in $\mathbb{R}^3$. Since $(-1, n-1, n-2)$ is a primitive normal vector, we normalize the area of $F$ by dividing by $\sqrt{(-1)^2 + (n-1)^2 + (n-2)^2} = \sqrt{2(n^2 - 3n + 3)}$. Before normalization, the area of $F$ is
$$\frac{1}{2}\sqrt{\begin{vmatrix} 1 & 1 & n \\ 1 & n-1 & n \\ 1 & 1 & 1 \end{vmatrix}^2 + \begin{vmatrix} 1 & n-1 & n \\ n-1 & 0 & 0 \\ 1 & 1 & 1 \end{vmatrix}^2 + \begin{vmatrix} n-1 & 0 & 0 \\ 1 & 1 & n \\ 1 & 1 & 1 \end{vmatrix}^2} = \frac{1}{2}\sqrt{2n^4 - 10n^3 + 20n^2 - 18n + 6} = \frac{1}{2}(n-1)\sqrt{2(n^2 - 3n + 3)}.$$
Moreover, using the counts above for the interior lattice points in the edges of $F$, we have
$$\#\left(\partial F \cap \mathbb{Z}^3\right) = (n-2) + 0 + 0 + 3 = n + 1.$$
Applying Pick's theorem yields
$$\#\left(\operatorname{int} F \cap \mathbb{Z}^3\right) = \tfrac{1}{2}(n-1) - \tfrac{1}{2}(n+1) + 1 = 0.$$
For the other three facets of $P_n$, each contains $\tfrac{1}{4}(n-3)^2$ interior lattice points. In total, this gives $\tfrac{1}{4}(3n^2 - 8n + 13)$ lattice points on $\partial P_n$, all of which correspond to plane trees.
3.3 Biological meaning of $P_n$ and $\mathcal{N}(P_n)$
3.3.1 The vertices of $P_n$
The vertices of $P_n$ represent the secondary structures with the maximum number of helices in a loop (so-called "maximal degree of branching") and the fewest helices in a loop ("minimal degree of branching"), as described below.
If $T$ is a plane tree whose count vector is a vertex of $P_n$, then $T$ has $n$ edges and $n+1$ vertices. If, in addition, $T$ has count vector $(n,n,0)$, then the degree of the root vertex is $n$ and the $n+1$ vertices are the root together with the $n$ leaves (vertices with 0 children). Thus, a secondary structure corresponding to $T$ has no internal loops, bulges, or multi-branch loops, and the exterior loop has $n$ helices. If $T$ has count vector $(1,1,n-1)$, the root vertex has degree 1, there is one leaf, and there are $n-1$ vertices of degree 2 (1 child). Thus, $T$ is a straight line, and a secondary structure corresponding to $T$ has no multi-branch loops and the exterior loop has one helix. If $T$ has count vector $(1,n-1,0)$, the $n+1$ vertices are the root (with degree 1), $n-1$ leaves, and one vertex of degree $n$. Secondary structures corresponding to $T$ have no internal loops or bulges and one multi-branch loop with $n$ helices. In addition, the exterior loop has one helix.
The remaining vertices, $(1,\tfrac{n+1}{2},0)$ for $n$ odd and $(1,\tfrac{n}{2},1)$, $(2,\tfrac{n+2}{2},0)$, or $(1,\tfrac{n+2}{2},0)$ for $n$ even, are dealt with in the following proposition.
Proposition 3.3.1.
(i) For $n$ odd, any plane tree with count vector $(1,\tfrac{n+1}{2},0)$ satisfies $d_2 = \tfrac{n-1}{2}$ and $d_i = 0$ for $i > 2$.
(ii) For $n$ even, any plane tree with count vector $(1,\tfrac{n}{2},1)$ or $(2,\tfrac{n+2}{2},0)$ satisfies $d_2 = \tfrac{n-2}{2}$ and $d_i = 0$ for $i > 2$.
(iii) For $n$ even, any plane tree with count vector $(1,\tfrac{n+2}{2},0)$ satisfies $d_2 = \tfrac{n-4}{2}$, $d_3 = 1$, and $d_i = 0$ for $i > 3$.
Proof. For (i), suppose $n$ is odd and $T$ is a plane tree with $n$ edges, $r = 1$, $d_0 = \tfrac{n+1}{2}$, and $d_1 = 0$. Then $T$ has $\tfrac{n+1}{2} + 1$ vertices of degree 1, and the remaining $n + 1 - \left(\tfrac{n+1}{2} + 1\right) = \tfrac{n-1}{2}$ vertices have degree at least 3. Thus,
$$\sum_{v \in V} \deg v = \tfrac{1}{2}(n+1) + 1 + \sum_{\deg v \ge 3} \deg v \;\ge\; \tfrac{1}{2}(n+1) + 1 + \tfrac{3}{2}(n-1) = 2n.$$
However, since $\sum_{v \in V} \deg v = 2|E| = 2n$, we must have equality.
Thus, all other vertices must have degree 3 (2 children). The proof of (ii) is nearly identical to that of (i). For (iii), a plane tree with $n$ edges, $r = 1$, $d_0 = \tfrac{n+2}{2}$, and $d_1 = 0$ has $\tfrac{n+4}{2}$ vertices of degree 1 and no vertices of degree 2. Such a tree cannot have all other vertices of degree 3, as this would yield a graph with an odd number of odd-degree vertices. Thus, there is a vertex $v_0$ of even degree $p \ge 4$. This gives
$$\sum_{v \in V} \deg v = \tfrac{1}{2}(n+4) + p + \sum_{\substack{\deg v \ge 3\\ v \ne v_0}} \deg v \;\ge\; \tfrac{1}{2}(n+4) + 4 + \tfrac{3}{2}(n-4) = 2n.$$
As before, this inequality must be an equality, and hence $p = 4$ and all other vertices have degree 3.
Thus, for $n$ odd, the count vector $(1,\tfrac{n+1}{2},0)$ corresponds to secondary structures with no interior loops/bulges, in which all multi-branch loops have 3 helices and the exterior loop has one helix. When $n$ is even, a secondary structure with $n$ helices and all three of these properties is not possible. We instead have three cases, each with exactly one of the properties relaxed: a structure corresponding to $(1,\tfrac{n}{2},1)$ has one interior loop/bulge, the count vector $(1,\tfrac{n+2}{2},0)$ arises from structures having one multi-branch loop with 4 helices (all other multi-branch loops have 3 helices), and the exterior loop of a structure corresponding to $(2,\tfrac{n+2}{2},0)$ has 2 helices. For $n$ odd, plane trees representative of those described in this section are shown in Figure 3.
Figure 3: The RNA polytope $P_n$.
Remark. The map from plane trees to count vectors is generically many-to-one. Three of the 4 vertices, however, correspond to exactly one tree: $(n,n,0)$, $(1,1,n-1)$, and $(1,n-1,0)$. The trees with count vector $(1,\tfrac{n+1}{2},0)$ are in one-to-one correspondence with full binary trees with $n-1$ edges (by removing the root vertex). There are $C_{\frac{n-1}{2}}$ such trees [6], where $C_{\frac{n-1}{2}}$ is the $\tfrac{n-1}{2}$th Catalan number defined in equation (2).
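For small odd $n$, the counting claims of Sections 3.2 and 3.3.1 can be checked by exhaustive enumeration. The sketch below illustrates one way to do this; it is not the authors' code. It encodes plane trees as Dyck words (an assumption of this sketch), tallies the count vectors $(r, d_0, d_1)$, and tests which of them lie on $\partial P_n$ by checking whether any of the four facet inequalities from the proof of Proposition 3.1.1 is tight. The function names are hypothetical. The enumeration visits $C_n$ trees, so it is practical only for small $n$.

```python
from collections import Counter, defaultdict
from math import comb


def dyck_words(n):
    """Yield balanced parenthesis words with n pairs (plane trees with n edges)."""
    def extend(prefix, opened, closed):
        if closed == n:
            yield prefix
            return
        if opened < n:
            yield from extend(prefix + "(", opened + 1, closed)
        if closed < opened:
            yield from extend(prefix + ")", opened, closed + 1)
    yield from extend("", 0, 0)


def child_counts(word):
    """Children of every vertex, root first."""
    counts, path = [0], [0]
    for step in word:
        if step == "(":
            counts[path[-1]] += 1
            counts.append(0)
            path.append(len(counts) - 1)
        else:
            path.pop()
    return counts


def census(n):
    """Map each count vector (r, d0, d1) to the d_k profiles of the trees realizing it."""
    trees = defaultdict(list)
    for word in dyck_words(n):
        counts = child_counts(word)
        non_root = counts[1:]
        vector = (counts[0], non_root.count(0), non_root.count(1))
        trees[vector].append(Counter(non_root))  # profile[k] is d_k
    return trees


def on_boundary(vector, n):
    """True if some facet inequality of P_n (n odd) holds with equality."""
    r, d0, d1 = vector
    return (-r + 2 * d0 + d1 == n
            or r == 1
            or r + (1 - n) * d0 + (2 - n) * d1 == 2 * n - n * n
            or d1 == 0)


n = 9  # any odd n >= 5
trees = census(n)
boundary = [v for v in trees if on_boundary(v, n)]
assert len(boundary) == (3 * n * n - 8 * n + 13) // 4       # Section 3.2
for vertex in [(n, n, 0), (1, 1, n - 1), (1, n - 1, 0)]:
    assert len(trees[vertex]) == 1                             # Remark: unique trees
binary = trees[(1, (n + 1) // 2, 0)]
m = (n - 1) // 2
assert len(binary) == comb(2 * m, m) // (m + 1)                # Remark: Catalan count
assert all(d[2] == m and max(d) <= 2 for d in binary)          # Proposition 3.3.1(i)
print(len(boundary), len(binary))  # 46 14
```

Because every count vector of a plane tree lies in $P_n$, a tight inequality is the same as lying on $\partial P_n$. The boundary tally therefore also confirms that every lattice point of $\partial P_n$ is realized by some tree.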
3.3.2 The rays in $\mathcal{N}(P_n)$
The energy function $E'$ in equation (3) scores a secondary structure with $n$ helices based on the number of helices in the exterior loop, the number of hairpin loops, and the number of bulges/internal loops. The normal fan $\mathcal{N}(P_n)$ of $P_n$ subdivides the $(\theta_2, \theta_3, \theta_4)$ parameter space. Each vector $(x, y, z) \in \mathbb{R}_{\theta_2} \times \mathbb{R}_{\theta_3} \times \mathbb{R}_{\theta_4}$ corresponds to a scoring function in which $x$ gives the weight of a helix in the exterior loop, $y$ gives the weight of a hairpin loop, and $z$ gives the weight of a bulge/internal loop. Since energy is minimized, a positive weight is a penalty and a negative weight is a reward. The fan $\mathcal{N}(P_n)$ consists of cones generated by elements of the power set
$$\mathcal{P}\left(\{(1,0,0),\ (0,0,1),\ (-1,2,1),\ (1,1-n,2-n)\}\right).$$
Thus, a parameter vector $v \in \mathbb{R}_{\theta_2} \times \mathbb{R}_{\theta_3} \times \mathbb{R}_{\theta_4}$ has the form $t_1 y_1 + t_2 y_2 + t_3 y_3$ with $t_1, t_2, t_3 \ge 0$ and $y_1, y_2, y_3 \in \{(1,0,0),\ (0,0,1),\ (-1,2,1),\ (1,1-n,2-n)\}$. A generic vector in $\mathbb{R}^3$ lies in the interior of one of the 3-dimensional cones in $\mathcal{N}(P_n)$, and hence we give a brief interpretation of the parameter vectors with $t_i \ne 0$ for $i = 1, 2, 3$.
Scoring vectors in the interior of the cone $\langle (0,0,1), (1,0,0), (1,1-n,2-n) \rangle$ reward hairpin loops, penalize helices in the exterior loop, and can either penalize or reward bulges/internal loops. (Writing $v = t_1(0,0,1) + t_2(1,0,0) + t_3(1,1-n,2-n)$, the hairpin weight is $t_3(1-n) < 0$ and the exterior-helix weight is $t_2 + t_3 > 0$.) If $v \in \operatorname{int} \langle (0,0,1), (1,0,0), (-1,2,1) \rangle$, then $v$ gives a penalty for both hairpin loops and interior loops/bulges. Helices in the exterior loop can be beneficial or harmful with this scoring vector, and $v$ can equally penalize helices in the exterior loop, hairpin loops, and internal loops/bulges. Scoring vectors in the interior of one of the two remaining cones can reward or penalize all three quantities. These are not independent, however.
For instance, if $v \in \operatorname{int} \langle (1,1-n,2-n), (0,0,1), (-1,2,1) \rangle$ and hairpin loops are disadvantageous under $v$'s scoring scheme, then helices in the exterior loop are beneficial. If $w \in \operatorname{int} \langle (1,0,0), (1,1-n,2-n), (-1,2,1) \rangle$ and $w$ rewards hairpin loops, then $w$ rewards bulges/internal loops. Similarly, if $w$ penalizes bulges/internal loops, then $w$ penalizes hairpin loops. Also, scoring vectors in the interior of the cone $\langle (1,1-n,2-n), (0,0,1), (-1,2,1) \rangle$ can equally reward hairpin loops, internal loops/bulges, and helices in the exterior loop.
3.4 Variation in the parameter space
In this section, we add additional information to the parameters $\{\theta_2, \theta_3, \theta_4\}$ in order to study the effect of varying the multi-branch loop parameters in the thermodynamic model of RNA folding. We obtain free energy parameters for plane trees using one of the four combinatorial sequences having the form $X^4 (Y^6 X^4 Z^6 X^4)^k$, where $k \ge 1$ and either $X = A$ and $\{Y, Z\} = \{C, G\}$, or $X = C$ and $\{Y, Z\} = \{A, U\}$. In these sequences, the segments of the form $Y^6$ pair with the $Z^6$ segments while the $X$ nucleotides remain unpaired; moreover, all the loops of a given type have the same free energy. We do not include the possibilities $X = U$ and $\{Y, Z\} = \{C, G\}$, or $X = G$ and $\{Y, Z\} = \{A, U\}$, because we want to prevent G-U pairing.
For a given sequence, we use both the current (version 3.0) [19] and previous (version 2.3) [28] thermodynamic parameters, determined by the Turner lab. The parameters $a_3$, $a_0$, and $a_1$ are based on experimental measurement and are listed in Table II. The parameters $a_2$ and $c_2$ come from the multi-branch loop scoring function in equation (1), where the parameters $a$, $b$, and $c$ in this function are not determined experimentally.
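To see where the cut-offs in Table III come from, one can evaluate $E'$ from equation (3) at the four vertices of $P_n$ (for $n$ odd) and keep the smallest. The sketch below does this under three assumptions. It uses the substitution $a_2 = 4b + c + a_3$ and $c_2 = a$ that the next paragraph derives for these combinatorial sequences, and it takes $a_3$, $a_0$, $a_1$ from Table II. The sample $(a, b, c)$ triples and the function names are illustrative choices, not values from the paper. Exact rational arithmetic (`fractions.Fraction`) is used so that ties at a cut-off are detected exactly.

```python
from fractions import Fraction as F


def vertices(n):
    """Vertices of P_n for odd n (Proposition 3.1.1), keyed by their count vectors."""
    return {
        "(1,1,n-1)": (1, 1, n - 1),
        "(1,(n+1)/2,0)": (1, (n + 1) // 2, 0),
        "(n,n,0)": (n, n, 0),
        "(1,n-1,0)": (1, n - 1, 0),
    }


def theta(a3, a0, a1, a, b, c):
    """(theta2, theta3, theta4) of equation (3) for the combinatorial sequences."""
    a2 = 4 * b + c + a3  # per-helix multi-branch weight (four free bases per helix)
    c2 = a               # multi-branch offset
    return (a3 - a2, a0 - c2 - a2, a1 - c2 - 2 * a2)


def minimal_vertices(th, n):
    """Names of the vertices of P_n that minimize E' = th . (r, d0, d1)."""
    scores = {name: sum(t * x for t, x in zip(th, v)) for name, v in vertices(n).items()}
    best = min(scores.values())
    return sorted(name for name, s in scores.items() if s == best)


# Table II values (a3, a0, a1) for two of the sequences [X, Y, Z].
AGC_v30 = (F("-1.9"), F("4.1"), F("2.3"))
CAU_v30 = (F("-0.4"), F("5.0"), F("3.7"))
AGC_v23 = (F("-1.9"), F("3.5"), F("3.0"))

n = 101
# a + 12b + 3c = 5.9, just below the 6.0 cut-off for [A, G, C], version 3.0.
print(minimal_vertices(theta(*AGC_v30, F("5.0"), F(0), F("0.3")), n))  # ['(1,(n+1)/2,0)']
# a + 12b + 3c = 6.2, just above that cut-off.
print(minimal_vertices(theta(*AGC_v30, F("5.0"), F(0), F("0.4")), n))  # ['(n,n,0)']
# a + 12b + 3c = 4.6 for [C, A, U], version 3.0 (cut-off 3.6).
print(minimal_vertices(theta(*CAU_v30, F("3.4"), F(0), F("0.4")), n))  # ['(1,1,n-1)']
# a + 12b + 3c = 9.7 for [A, G, C], version 2.3 (cut-off 5.4).
print(minimal_vertices(theta(*AGC_v23, F("4.6"), F("0.4"), F("0.1")), n))  # ['(n,n,0)']
```

Returning a sorted list rather than a single name makes ties explicit. This matters exactly at a cut-off and in the degenerate $b = c = 0$ case discussed below.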
If $L$ is a multi-branch loop with $n_1$ single-stranded bases and $n_2$ helices, and $L$ appears in a secondary structure for one of our 4 combinatorial sequences, then $n_1 = 4n_2$. Additionally, for each helix in $L$, the single-base stacking energy is $a_3$. Thus, the free energy of $L$ in equation (1) becomes
$$E(L) = a + 4b\,n_2 + c\,n_2 + a_3 n_2,$$
and the parameters $a_2$ and $c_2$ in the free energy function $E'$ in equation (3) can be written as $a_2 = 4b + c + a_3$ and $c_2 = a$.
Table II: Energy parameters for plane trees
Sequence | Turner 3.0: $a_3$, $a_0$, $a_1$ | Turner 2.3: $a_3$, $a_0$, $a_1$
X=A, Y=G, Z=C | -1.9, 4.1, 2.3 | -1.9, 3.5, 3.0
X=A, Y=C, Z=G | -1.6, 4.5, 2.3 | -1.6, 3.8, 3.0
X=C, Y=A, Z=U | -0.4, 5.0, 3.7 | -0.4, 4.3, 4.0
X=C, Y=U, Z=A | -0.6, 4.9, 3.7 | -0.6, 4.2, 4.0
Table III illustrates three types of variation: variation of specific nucleotides in the combinatorial sequence, variation of the version of Turner's energy parameters, and variation of the $a$, $b$, $c$ parameters. The effect of varying the multi-branch loop parameters $a$, $b$, $c$ is more or less the same for each sequence and energy table: two different count vectors can be minimal depending on the value of $a + 12b + 3c$. Indeed, $E'(n,n,0) - E'(1,\tfrac{n+1}{2},0) = \tfrac{n-1}{2}(2\theta_2 + \theta_3)$ with $2\theta_2 + \theta_3 = a_0 - a_3 - (a + 12b + 3c)$, and $E'(1,1,n-1) - E'(1,\tfrac{n+1}{2},0) = \tfrac{n-1}{2}(2\theta_4 - \theta_3)$ with $2\theta_4 - \theta_3 = 2a_1 - a_0 - 3a_3 - (a + 12b + 3c)$. Technically, a third vertex of $P_n$ has minimal energy in some cases when $b = c = 0$. However, if the free base and helix penalties are both zero, the multi-branch energy function will have no penalties for the number of single-stranded bases and the number of stems in a loop. This does not agree with the free energy model. Varying the sequence alone, we obtain differences in the cut-off values for $a + 12b + 3c$. On the whole, however, nucleotide variation in the combinatorial sequence does not give qualitative differences in the minimal energy plane trees. We do see (in 3 of the 4 sequences) qualitative differences in the minimal energy trees when we compare version 3.0 parameters to version 2.3 parameters.
For instance, when $a + 12b + 3c$ is large, 3 of the 4 sequences give the 'straight line' tree with count vector $(1,1,n-1)$ minimal with version 3.0 parameters. Using version 2.3, all four sequences result in the maximal degree of branching, with count vector $(n,n,0)$ having minimal energy. This difference in minimal energy trees is not too surprising, because the change from version 2.3 to version 3.0 was based on more accurate experimental measurement; the predicted secondary structures have indeed changed. It is worth noting that if we use the actual penalties for offset, free base, and helix from versions 2.3 and 3.0 of the Turner energies, we obtain
$$a + 12b + 3c = \begin{cases} 4.6 & \text{v3.0},\\ 9.7 & \text{v2.3}. \end{cases}$$
Thus, all four combinatorial sequences yield $(n,n,0)$ with minimal energy for version 2.3. Moreover, as 9.7 is well above the cut-off for all four sequences, slight variation in these parameters will not change the predicted structure. For version 3.0, the two combinatorial sequences with unpaired poly-A segments have $(1,\tfrac{n+1}{2},0)$ minimal, while $(1,1,n-1)$ is minimal for the other two sequences. Also, 4.6 is much closer to the cut-off values for these sequences, so small changes in these parameters could change which trees have minimal energy.
Table III: Restrictions on $a$, $b$, $c$ parameters from full-dimensional cones in $\mathcal{N}(P_n)$. Each entry gives the version 3.0 value with the version 2.3 value in parentheses, for the sequences [X, Y, Z] = [A, G, C], [A, C, G], [C, A, U], [C, U, A], in that order.
Vertex $(1,1,n-1)$; rays $(1,1-n,2-n)$, $(1,0,0)$, $(-1,2,1)$; $a + 12b + 3c \ge$: N/A (N/A), 4.9 (N/A), 3.6 (N/A), 4.3 (N/A).
Vertex $(1,\tfrac{n+1}{2},0)$; rays $(0,0,1)$, $(1,0,0)$, $(-1,2,1)$; $a + 12b + 3c \le$: 6.0 (5.4), 4.9 (5.4), 3.6 (4.7), 4.3 (4.8).
Vertex $(n,n,0)$; rays $(1,1-n,2-n)$, $(0,0,1)$, $(-1,2,1)$; $a + 12b + 3c \ge$: 6.0 (5.4), N/A (5.4), N/A (4.7), N/A (4.8).
Vertex $(1,n-1,0)$; rays $(0,0,1)$, $(1,0,0)$, $(1,1-n,2-n)$; $b = c = 0$ and $a =$: 6.0 (5.4), N/A (5.4), N/A (4.7), N/A (4.8).
3.5 RNA STRAND database analysis
3.5.1 Overall shape of data
Our initial collection of secondary structures contains 145 structures from 137 distinct RNA sequences, as described in Materials and Methods. The sequences range from 19 to 4216 nucleotides. We exclude structures with fewer than 5 helices from further analysis, because not all the vertices of $P_n$ listed in Proposition 3.1.1 are valid and distinct when $n \le 4$. We have 110 structures with $n \ge 5$ (from 103 sequences), with average (median) length 739 (367) and average (median) $n$ of 27 (13). We break these into classes, based on the number of helices, as depicted in Table IV. While our collection contains more small and medium trees than large trees, this reflects the frequency in the RNA STRAND database. For instance, according to an analysis done by RNA STRAND, the average (median) number of helices over the entire database is 28 (8). This count does, however, include the sequences with fewer than 5 helices and uses a less restrictive definition of bulges/internal loops and helices: internal loops/bulges can have any number of unpaired bases, and helices can have any number of base pairs. Our large trees come from 16S ribosomal RNA and 23S ribosomal RNA sequences and have a minimum sequence length of 954 nucleotides. In the RNA STRAND database, only 20% of the 4666 structures contain at least 954 nucleotides.
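The exclusion rule and the size classes above amount to a simple binning on the number of helices $n$. The following is a minimal sketch with the class boundaries read off Table IV. It treats the upper end (136) of the Large range as the largest $n$ observed rather than as a cut-off, which is an assumption. The function name is hypothetical.

```python
def size_category(n_helices):
    """Table IV size class for a structure with n_helices helices, or None if excluded."""
    if n_helices < 5:
        return None  # vertices of P_n are not all valid and distinct for n <= 4
    if n_helices <= 12:
        return "Small"
    if n_helices <= 40:
        return "Medium"
    return "Large"  # assumption: no upper cut-off beyond the observed maximum


print([size_category(n) for n in (4, 5, 12, 13, 40, 41, 136)])
# [None, 'Small', 'Small', 'Medium', 'Medium', 'Large', 'Large']
```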
| Category | Range of $n$ | Number of trees | Average length | Median length | Average $n$ | Median $n$ |
|---|---|---|---|---|---|---|
| Small | 5–12 | 50 | 244 | 220 | 9 | 9 |
| Medium | 13–40 | 40 | 676 | 512 | 22 | 19 |
| Large | 41–136 | 20 | 2104 | 1831 | 82 | 76 |

Table IV: Trees in the RNA STRAND collection by size (lengths in nucleotides).

3.5.2 Location of count vectors on the polytope

It is important to know when biologically correct secondary structures can be predicted by the free energy model. With our simplified energy function $E'$ in (3), we ask whether the biologically correct structures can be minimal for some choice of parameters. As mentioned in § 3.2, this amounts to determining when the corresponding count vectors lie on the boundary of $P_n$.

Seventy-one of the 110 count vectors lie on the boundary of $P_n$: 49 lie in the interior of a facet, 18 lie in the interior of an edge, and 4 are vertices. The average number of edges is 17 for plane trees on the boundary of $P_n$ and 45 for plane trees in the interior of $P_n$. Of those in the interior of a facet, 28 are minimal for parameters in $\langle(-1, 2, 1)\rangle$, 7 for parameters in $\langle(0, 0, 1)\rangle$, and 14 for parameters in $\langle(1, 0, 0)\rangle$. Of those in the interior of an edge, 8 are minimal for parameters in $\langle(-1, 2, 1), (1, 0, 0)\rangle$, 9 for parameters in $\langle(-1, 2, 1), (1, 1-n, 2-n)\rangle$, and 1 for parameters in $\langle(1, 0, 0), (0, 0, 1)\rangle$. The 4 count vectors that are vertices of $P_n$ have $n = 5$ or $6$ and form the set $\{(5, 5, 0), (6, 6, 0), (1, 4, 0), (1, 1, 4)\}$. Figure 4 shows the location of the count vectors for small, medium, and large trees, given as the percentage of trees in each category.
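As an illustration of the boundary test above, the following Python sketch (ours, not the authors' software) reads a count vector off a plane tree and reports which rays of $N(P_n)$ belong to facets containing it, so that the number of tight rays distinguishes interior, facet, edge, and vertex. It assumes a hypothetical helix-level bracket encoding (one matched pair per helix, after the filtering of § 5.3) and assumes that $P_n$ is the tetrahedron of Table III; the right-hand side of each facet inequality is our own evaluation of the ray at the vertices listed there.

```python
"""Sketch: count vector (r, d0, d1) of a plane tree and its location on P_n.

Assumptions (ours, not from the paper's software):
  * a tree is a balanced bracket string, one "()" pair per helix;
  * P_n is the tetrahedron of Table III, so each ray u of N(P_n) gives a
    facet inequality u . (r, d0, d1) >= (minimum of u over the four vertices).
"""
from typing import List, Tuple


def count_vector(brackets: str) -> Tuple[int, int, int, int]:
    """Return (n, r, d0, d1): helices, exterior-loop helices, hairpins, bulges/internal loops."""
    n = r = d0 = d1 = 0
    children: List[int] = []  # number of inner helices for each currently open helix
    for ch in brackets:
        if ch == "(":
            n += 1
            if children:
                children[-1] += 1  # nested inside an open helix
            else:
                r += 1  # helix in the exterior loop (root degree)
            children.append(0)
        elif ch == ")":
            if not children:
                raise ValueError("unbalanced brackets")
            k = children.pop()
            if k == 0:
                d0 += 1  # no inner helix: hairpin loop (leaf)
            elif k == 1:
                d1 += 1  # exactly one inner helix: bulge/internal loop
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if children:
        raise ValueError("unbalanced brackets")
    return n, r, d0, d1


def tight_rays(n: int, r: int, d0: int, d1: int) -> List[Tuple[int, int, int]]:
    """Rays of N(P_n) whose facet contains (r, d0, d1); an empty list means interior."""
    facets = [  # (ray u, minimum of u . x over the vertices of Table III)
        ((-1, 2, 1), n),
        ((0, 0, 1), 0),
        ((1, 0, 0), 1),
        ((1, 1 - n, 2 - n), n * (2 - n)),
    ]
    tight = []
    for u, bound in facets:
        value = u[0] * r + u[1] * d0 + u[2] * d1
        if value < bound:
            raise ValueError("count vector lies outside P_n")
        if value == bound:
            tight.append(u)
    return tight


LOCATION = {0: "interior", 1: "facet", 2: "edge", 3: "vertex"}

if __name__ == "__main__":
    for tree in ["((((()))))", "((()()))()", "(()()())(())()"]:
        n, r, d0, d1 = count_vector(tree)
        rays = tight_rays(n, r, d0, d1)
        print(tree, (r, d0, d1), LOCATION[len(rays)], rays)
```

In this sketch the path of five helices gives the vertex $(1, 1, 4)$, the tree `((()()))()` lies in the interior of the facet with ray $(-1, 2, 1)$, and the last tree, whose loop with three inner helices makes every inequality strict, lies in the interior of $P_7$.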
Figure 4: Location of count vectors on $P_n$ for small, medium, and large trees (in percentage).

3.5.3 Closest vertex to count vectors

To determine which of the 4 vertices of $P_n$ is closest to a given count vector, we map the tetrahedron

$$\operatorname{conv}\{(1, n, n, 0), (1, 1, 1, n-1), (1, 1, n-1, 0), (1, 1, \tfrac{n+1}{2}, 0)\}$$

onto the standard tetrahedron with vertices $\{(1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1)\}$. This is accomplished with the matrix

$$
\begin{pmatrix}
-\frac{1}{n-1} & \frac{1}{n-1} & 0 & 0 \\
0 & 0 & 0 & \frac{1}{n-1} \\
-\frac{n}{n-3} & -\frac{1}{n-3} & \frac{2}{n-3} & \frac{1}{n-3} \\
\frac{2n(n-2)}{(n-1)(n-3)} & \frac{2}{(n-1)(n-3)} & -\frac{2}{n-3} & -\frac{2(n-2)}{(n-1)(n-3)}
\end{pmatrix}
\qquad (7)
$$

which has determinant $\frac{2}{(n-1)^2(n-3)}$. For a given $n$, any count vector $(r, d_0, d_1)$ can be written as a sum $a_1(n, n, 0) + a_2(1, 1, n-1) + a_3(1, n-1, 0) + a_4(1, \frac{n+1}{2}, 0)$ with $0 \le a_i \le 1$ and $a_1 + a_2 + a_3 + a_4 = 1$. After applying the linear transformation (7), the lattice point $(1, r, d_0, d_1)$ has coordinates $(a_1, a_2, a_3, a_4)$. The coordinate $a_i$ measures the 'closeness' to vertex $i$: for a given RNA structure, the largest of the $a_i$ identifies the vertex closest to its count vector. Moreover, if $t = \max\{a_1, a_2, a_3, a_4\}$, then $0.25 \le t \le 1$, since the $a_i$ are nonnegative and sum to 1.

Fifty-two of the 110 structures are closest to $(1, \frac{n+1}{2}, 0)$, 38 are closest to $(1, 1, n-1)$, 10 are closest to $(n, n, 0)$, and 6 are closest to $(1, n-1, 0)$. Additionally, 2 are equally close to $(1, 1, n-1)$ and $(n, n, 0)$, and 2 are equally close to $(1, 1, n-1)$ and $(1, \frac{n+1}{2}, 0)$. The average values of $(a_1, a_2, a_3, a_4)$ over the 110 structures are $(0.181, 0.357, 0.138, 0.332)$, which shows that, as a whole, the count vectors are closest to $(1, 1, n-1)$ and $(1, \frac{n+1}{2}, 0)$. We say a count vector is 'close' to vertex $i$ if $a_i > 0.625$.
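The change of coordinates (7) can be checked with exact rational arithmetic. The following Python sketch (ours; the function names are illustrative) builds the matrix for a given $n \ge 5$, maps $(1, r, d_0, d_1)$ to $(a_1, a_2, a_3, a_4)$ using the vertex order of the sum above, and reports the closest vertex (or vertices, in case of a tie) together with the 0.625 'close' test.

```python
"""Sketch: barycentric coordinates of a count vector via matrix (7)."""
from fractions import Fraction
from typing import List

# Vertex i of the tetrahedron, in the order used for (a1, a2, a3, a4).
VERTICES = ["(n,n,0)", "(1,1,n-1)", "(1,n-1,0)", "(1,(n+1)/2,0)"]
CLOSE = Fraction(5, 8)  # 0.625, halfway between 1/4 and 1


def matrix7(n: int) -> List[List[Fraction]]:
    """Matrix (7); undefined for n = 1 or n = 3 (the analysis uses n >= 5)."""
    m = Fraction(n)
    p, q = m - 1, m - 3
    zero = Fraction(0)
    return [
        [-1 / p, 1 / p, zero, zero],
        [zero, zero, zero, 1 / p],
        [-m / q, -1 / q, 2 / q, 1 / q],
        [2 * m * (m - 2) / (p * q), 2 / (p * q), -2 / q, -2 * (m - 2) / (p * q)],
    ]


def barycentric(n: int, r: int, d0: int, d1: int) -> List[Fraction]:
    """Apply matrix (7) to the homogenized lattice point (1, r, d0, d1)."""
    point = (1, r, d0, d1)
    return [sum((c * x for c, x in zip(row, point)), Fraction(0)) for row in matrix7(n)]


def closest(coords: List[Fraction]) -> List[str]:
    """All vertices attaining the largest coordinate t (ties are all reported)."""
    t = max(coords)
    return [VERTICES[i] for i, a in enumerate(coords) if a == t]


def is_close(coords: List[Fraction]) -> bool:
    """'Close' to a vertex in the sense of the text: some a_i exceeds 0.625."""
    return max(coords) > CLOSE


if __name__ == "__main__":
    a = barycentric(5, 2, 3, 1)
    print([str(x) for x in a], closest(a), is_close(a))
```

Using `Fraction` rather than floating point makes ties between vertices, such as the four ties reported above, exact rather than subject to rounding.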
The value 0.625 is halfway between the smallest and largest possible values of $t$. With this definition, 22% of the small trees, 5% of the medium trees, and none of the large trees are close to vertices. Thirteen trees in total are close to vertices: 8 are close to $(1, 1, n-1)$, 2 to $(1, \frac{n+1}{2}, 0)$, 2 to $(n, n, 0)$, and 1 to $(1, n-1, 0)$. All thirteen of these lattice points lie on the boundary of $P_n$ and hence correspond to minimal energy trees for some choice of parameter values.

4 Discussion and Conclusions

We have used a simple scheme for scoring RNA folds: energy is assigned to a secondary structure based solely on the total number of helices, the number of helices in the exterior loop, and the numbers of hairpin loops and bulges/internal loops. Fixing the total number of helices, the extremal folds are those with the maximal and minimal degrees of branching. When a generic parameter vector is chosen, precisely one of those folds has minimal energy. For more specific choices of parameters (biologically realistic or not), the number of minimal count vectors is on the order of the square of the total number of helices. While this seems large, the number of count vectors that cannot be minimal for any choice of parameters is on the order of the cube of the total number of helices. Thus, when this total is large, we would not expect such a scoring scheme to predict the correct structures accurately. This is supported by our RNA STRAND analysis, in which 85% of the count vectors from known structures with a high number of helices cannot be minimal for any choice of parameters. None of these structures are 'close' to the extremal folds. This is not unexpected, however, since even the highly detailed free energy model is not accurate for large RNA molecules [9].
On the other hand, when the total number of helices is small, only 10% of the known structures cannot be minimal under our scoring scheme. While the scoring function used in this work is too simplistic to implement in prediction software, our results suggest that for small RNA molecules the full free energy model is not necessary for accurate predictions. We are not the first to make this observation: [10] analyzed several simple probabilistic RNA folding models, one with as few as 21 free parameters, whose accuracies are comparable to mfold's. In their study, the sequences used for testing came from ribonuclease P RNA, transfer-messenger RNA, and signal recognition particle RNA, all of which yield small to medium trees by our classification. While 21 parameters are far too many for parametric analysis using polyhedral geometry, perhaps a simple model incorporating some thermodynamic and some probabilistic parameters can accurately predict the folding of small RNA molecules.

We compared the variation of the multi-branch loop parameters to two other types of variation in the parameter space. Fixing the combinatorial sequence and energy version, two possible count vectors can be minimal as the multi-branch loop parameters vary. If we use the most recent (and more accurate) energy version, we find that for 3 of the 4 sequences these two count vectors are $(1, 1, n-1)$ and $(1, \frac{n+1}{2}, 0)$. Interestingly, these are the two vertices closest to the known structures in our RNA STRAND collection. Moreover, regardless of the choice of multi-branch loop parameters in the current version of the thermodynamic model, predicted structures have a low degree of branching, both in the exterior loop and in the multi-branch loops. Of the three possible variations, the most significant changes come from varying the energy version, as the possible predicted structures for version 2.3 have a high degree of branching.
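The comparison of energy versions can be reproduced from the thresholds in Table III alone. The Python sketch below (ours) stores, for each energy version and combinatorial sequence, the cutoff on $x = a + 12b + 3c$ together with the count vectors minimal below and above it, and evaluates $x = 4.6$ (version 3.0) and $x = 9.7$ (version 2.3). It assumes, as in the discussion of these two values, that $x$ alone decides between the two vertices; the other restrictions of Table III, such as the $b = c = 0$ row, are not modelled.

```python
"""Sketch: the count vector Table III predicts for a given x = a + 12b + 3c."""
from typing import Dict, Tuple

HALF, LINE, BRANCHED = "(1,(n+1)/2,0)", "(1,1,n-1)", "(n,n,0)"

# version -> sequence -> (cutoff, minimal vertex below cutoff, minimal vertex above cutoff)
TABLE_III: Dict[str, Dict[str, Tuple[float, str, str]]] = {
    "3.0": {
        "[A,G,C]": (6.0, HALF, BRANCHED),
        "[A,C,G]": (4.9, HALF, LINE),
        "[C,A,U]": (3.6, HALF, LINE),
        "[C,U,A]": (4.3, HALF, LINE),
    },
    "2.3": {
        "[A,G,C]": (5.4, HALF, BRANCHED),
        "[A,C,G]": (5.4, HALF, BRANCHED),
        "[C,A,U]": (4.7, HALF, BRANCHED),
        "[C,U,A]": (4.8, HALF, BRANCHED),
    },
}


def combo(a: float, b: float, c: float) -> float:
    """The combination of multi-branch loop penalties restricted in Table III."""
    return a + 12 * b + 3 * c


def minimal_vertex(version: str, sequence: str, x: float) -> str:
    """Vertex of P_n with minimal energy for this x (both vertices at the cutoff)."""
    cutoff, below, above = TABLE_III[version][sequence]
    if x < cutoff:
        return below
    if x > cutoff:
        return above
    return below + " and " + above  # on the boundary between the two cones


if __name__ == "__main__":
    for version, x in (("3.0", 4.6), ("2.3", 9.7)):
        for seq, (cutoff, _, _) in TABLE_III[version].items():
            print(version, seq, minimal_vertex(version, seq, x), f"margin {x - cutoff:+.1f}")
```

The printed margins make the sensitivity argument concrete: 9.7 clears every version 2.3 cutoff by more than 4, whereas 4.6 lies within 0.3 of two of the version 3.0 cutoffs.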
Even though the penalties for offset, free base, and helix in the multi-branch loop energy calculation are chosen without specific measurement, they do not appear to have a dramatic effect on the predicted structures. One would hope that the experimentally determined parameters are what truly govern the predicted structures, and our findings support this possibility.

5 Materials and Methods

5.1 Selection of secondary structures from the RNA STRAND database

The RNA STRAND database [1] was searched by type of RNA (for example, 16S ribosomal RNA, cis-regulatory element, or group I intron). Each type of RNA was sorted by molecule length, and structures were selected from a variety of organisms to be representative of the different lengths appearing in the database for that type of RNA. Visual inspection of the secondary structures was important in selecting the structures for our collection. It allowed for the inclusion of structures of similar length with different types of branching. It also prevented our collection from containing nearly identical structures formed by two different RNA molecules of the same type. Finally, visual inspection kept our collection from having a plethora of structures with only one or two helices; such structures are overrepresented in the RNA STRAND database.

5.2 Removal of pseudoknots from .ct files

In order to obtain a plane tree from a given secondary structure, pseudoknots were removed. A Perl script read the .ct file and stored the closing pairs of all helices, with helices as defined in § 5.3. Each pair $(i, j)$ and $(i', j')$ of closing pairs was tested to see whether $i < i' < j < j'$. If so, the pairs $(i, j)$ and $(i', j')$ were printed to a file. Next, for each pair $(i, j)$ and $(i', j')$ in the output file, one of the associated helices was removed according to the following rubric.
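The crossing test above, together with the removal rubric described next, can be sketched in Python as follows. This is our illustration, not the Perl script used for this work: a helix is represented by its closing pair $(i, j)$ with $i < j$, its length (number of base pairs) is supplied in a dictionary, and a tie is returned as an unresolved pair so that both versions of the .ct file can be produced.

```python
"""Sketch of the pseudoknot-removal rubric of Section 5.2 (illustrative only)."""
from collections import Counter
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]  # closing pair (i, j) of a helix, with i < j


def crossing_pairs(closing: List[Pair]) -> List[Tuple[Pair, Pair]]:
    """All pairs of closing pairs (i, j), (i2, j2) with i < i2 < j < j2."""
    out = []
    for i, j in closing:
        for i2, j2 in closing:
            if i < i2 < j < j2:
                out.append(((i, j), (i2, j2)))
    return out


def helices_to_remove(
    crossings: List[Tuple[Pair, Pair]], length: Dict[Pair, int]
) -> Tuple[Set[Pair], List[Tuple[Pair, Pair]]]:
    """Apply the rubric; return (helices to remove, tied pairs needing two .ct versions)."""
    times_listed = Counter(p for crossing in crossings for p in crossing)
    # Rule 1: a closing pair listed more than once is removed as the pseudoknot.
    remove = {p for p, k in times_listed.items() if k > 1}
    ties = []
    for p, q in crossings:
        if times_listed[p] > 1 or times_listed[q] > 1:
            continue  # already resolved by rule 1
        # Rule 2: an isolated crossing loses its shorter helix.
        if length[p] < length[q]:
            remove.add(p)
        elif length[q] < length[p]:
            remove.add(q)
        else:
            ties.append((p, q))  # Rule 3: equal lengths, save both versions
    return remove, ties


if __name__ == "__main__":
    length = {(1, 20): 5, (10, 30): 3, (40, 60): 4, (50, 70): 4,
              (80, 100): 6, (90, 110): 4, (95, 105): 3}
    print(helices_to_remove(crossing_pairs(list(length)), length))
```

In the toy input, the helix closed by (80, 100) crosses two others and is removed by the first rule, the shorter helix (10, 30) loses its isolated crossing, and (40, 60) and (50, 70) tie.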
If some closing pair $(i, j)$ appears multiple times, its helix was removed under the assumption that it formed a pseudoknot. If neither $(i, j)$ nor $(i', j')$ was listed with any other closing pair, the shorter of the 2 corresponding helices was removed. In the event that the two helices had the same number of paired bases, two versions of the .ct file were saved: one with the first helix removed and one with the second helix removed.

5.3 Calculation of $n, r, d_0, d_1$ from .ct files

After all the pseudoknots were removed from the .ct files of secondary structures in our collection, a Perl script calculated $n$, $r$, $d_0$, and $d_1$. In our simplified model of RNA folding, all helices have the same energy, independent of the number of base pairs in the helix. Similarly, all bulges/internal loops have the same energy regardless of the number of free bases in the loop. Because of this, very small bulges/internal loops and very short helices were ignored. Bulges and interior loops were required to have at least 3 unpaired bases. No restrictions were placed on the number of free bases in a hairpin loop, which was important in order to maintain the graph structure (edges connecting two vertices).

Figure 5: Helices and internal loops: A, B, and C are fragments from structures in the RNA STRAND database.

Each helix, with a choice of closing pair, has a 'left length' and a 'right length'. The left length of a helix is the number of bases in the portion of the sequence that terminates at one of the closing bases. The right length of a helix is the number of bases in the portion of the sequence that originates at one of the closing bases. The closing pair of a helix and its right length are depicted in Figure 5A. For this structure, the helix with closing pair G–C has left length 28 and right length 25.
For our analyses, a helix was defined to have both left and right length 3 or greater. Thus, the piece of secondary structure shown in Figure 5B has two helices (one with left and right length 5 and one with left and right length 3) and one hairpin loop. Similarly, with our definitions, the fragment depicted in Figure 5C has only 1 interior loop, which contains the base pairs G–C and U–A. The single C–G base pair is not considered a helix, and since each of the internal loops containing the C–G pair has more than 3 unpaired bases, the C–G base pair is not considered part of either helix.

Acknowledgements

V.H. and C.E.H. were both supported by NIH grant 1R01GM083621-01 (P.I. Heitsch). C.E.H. also acknowledges funding from a Career Award at the Scientific Interface (CASI) from the Burroughs Wellcome Fund (BWF). Additionally, V.H. would like to thank Justin Filoseta for the remarkable computer support at the Georgia Institute of Technology.

References

[1] M. Andronescu, V. Bereg, H. Hoos, and A. Condon. RNA STRAND: The RNA secondary structure and statistical analysis database. BMC Bioinformatics, 9(1):340, 2008.
[2] Y. Bakhtin and C. E. Heitsch. Large deviations for random trees. J Stat Phys, 132(3):551–560, 2008.
[3] Y. Bakhtin and C. E. Heitsch. Large deviations for random trees and the branching of RNA secondary structures. Bull Math Biol, 71(1):84–106, 2009.
[4] N. Beerenwinkel, C. N. Dewey, and K. M. Woods. Parametric inference of recombination in HIV genomes. Preprint available at arXiv:q-bio/0512019v1, Dec 2005.
[5] N. Dershowitz and S. Zaks. Enumerations of ordered trees. Discrete Math, 31(1):9–28, 1980.
[6] E. Deutsch. Ordered trees with prescribed root degrees, node degrees, and branch lengths. Discrete Math, 282(1–3):89–94, 2004.
[7] C. N. Dewey, P. M. Huggins, K. Woods, B. Sturmfels, and L. Pachter. Parametric alignment of Drosophila genomes. PLoS Comput Biol, 2(6):606–614, 2006.
[8] C. N. Dewey and K. Woods. Parametric sequence alignment. In B. Sturmfels and L. Pachter, editors, Algebraic statistics for computational biology, pages 193–205. Cambridge University Press, New York, 2005.
[9] K. J. Doshi, J. J. Cannone, C. W. Cobaugh, and R. R. Gutell. Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction. BMC Bioinformatics, 5(1):105, 2004.
[10] R. Dowell and S. Eddy. Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics, 5(1):71, 2004.
[11] H. H. Gan, S. Pasquali, and T. Schlick. Exploring the repertoire of RNA secondary motifs using graph theory; implications for RNA design. Nucleic Acids Res, 31(11):2926–2943, June 2003.
[12] B. Grünbaum. Convex polytopes, volume 221 of Graduate Texts in Mathematics. Springer-Verlag, New York, second edition, 2003. Prepared and with a preface by Volker Kaibel, Victor Klee, and Günter M. Ziegler.
[13] D. Gusfield, K. Balasubramanian, and D. Naor. Parametric optimization of sequence alignment. Algorithmica, 12(4–5):312–326, 1994.
[14] C. E. Heitsch. Combinatorial insights into RNA secondary structures. In preparation.
[15] H. Iseri. An exploration of Pick's theorem in space. Math Mag, 81(2):106–115, 2008.
[16] J. SantaLucia Jr. and D. H. Turner. Measuring the thermodynamics of RNA secondary structure formation. Biopolymers, 44(3):309–319, 1997.
[17] S.-Y. Le, R. Nussinov, and J. V. Maizel. Tree graphs of RNA secondary structures and their comparisons. Comput Biomed Res, 22(5):461–473, October 1989.
[18] H.-P. Lenhof, K. Reinert, and M. Vingron. A polyhedral approach to RNA sequence structure alignment. J Comput Biol, 5:517–530, 1998.
[19] D. H. Mathews, M. D. Disney, J. L. Childs, S. J. Schroeder, M. Zuker, and D. H. Turner. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci, 101(19):7287–7292, 2004.
[20] D. H. Mathews, J. Sabina, M. Zuker, and D. H. Turner. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol, 288(5):911–940, May 21 1999.
[21] D. H. Mathews and D. H. Turner. Prediction of RNA secondary structure by free energy minimization. Curr Opin Struct Biol, 16(3):270–278, 2006.
[22] L. Pachter and B. Sturmfels. Parametric inference for biological sequence analysis. Proc Natl Acad Sci, 101(46):16138–16143, 2004.
[23] L. Pachter and B. Sturmfels. Tropical geometry of statistical models. Proc Natl Acad Sci, 101(46):16132–16137, 2004.
[24] W. R. Schmitt and M. S. Waterman. Linear trees and RNA secondary structure. Discrete Appl Math, 51(3):317–323, 1994.
[25] B. A. Shapiro and K. Zhang. Comparing multiple RNA secondary structures using tree comparisons. Comput Appl Biosci, 6(4):309–318, October 1990.
[26] S. Smit, K. Rother, J. Heringa, and R. Knight. From knotted to nested RNA structures: a variety of computational methods for pseudoknot removal. RNA, 14(3):410–416, 2008.
[27] R. P. Stanley. Enumerative combinatorics. Vol. 2, volume 62 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1999.
[28] A. E. Walter and D. H. Turner. Sequence dependence of stability for coaxial stacking of RNA helixes with Watson-Crick base paired interfaces. Biochemistry, 33(42):12715–12719, Oct 25 1994.
[29] L. Wang and J. Zhao. Parametric alignment of ordered trees. Bioinformatics, 19(17):2237–2245, Nov 22 2003.
[30] M. S. Waterman, M. Eggert, and E. Lander. Parametric sequence comparisons. Proc Natl Acad Sci, 89(12):6090–6093, 1992.
[31] G. M. Ziegler. Lectures on polytopes, volume 152 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1995.
[32] M. Zuker. Calculating nucleic acid secondary structure. Curr Opin Struct Biol, 10(3):303–310, 2000.
[33] M. Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res, 31(13):3406–3415, 2003.
[34] M. Zuker, D. Mathews, and D. Turner. Algorithms and thermodynamics for RNA secondary structure prediction: A practical guide. In J. Barciszewski and B. Clark, editors, RNA Biochemistry and Biotechnology, NATO ASI Series, pages 11–43. Kluwer Academic Publishers, 1999.
