Central and Local Limit Theorems for RNA Structures
Authors: Emma Y. Jin and Christian M. Reidys
Date: July 2007.
Key words and phrases: $k$-noncrossing RNA structure, pseudoknot, generating function, singularity, central limit theorem, local limit theorem.

Abstract. A $k$-noncrossing RNA pseudoknot structure is a graph over $\{1,\dots,n\}$ without 1-arcs, i.e. arcs of the form $(i,i+1)$, and in which there exists no $k$-set of mutually intersecting arcs. In particular, RNA secondary structures are 2-noncrossing RNA structures. In this paper we prove a central and a local limit theorem for the distribution of the numbers of 3-noncrossing RNA structures over $n$ nucleotides with exactly $h$ bonds. We build on the results of [10] and [11], where the generating function of $k$-noncrossing RNA pseudoknot structures and the asymptotics of its coefficients were derived. The results of this paper explain the findings on the numbers of arcs of RNA secondary structures obtained by molecular folding algorithms and predict the distributions for $k$-noncrossing RNA folding algorithms, which are currently being developed.

1. Introduction

An RNA molecule consists of the primary sequence of the four nucleotides A, G, U and C together with the Watson-Crick (A-U, G-C) and (U-G) base pairing rules. The latter specify the pairs of nucleotides that can potentially form bonds. Single stranded RNA molecules form helical structures whose bonds satisfy the above base pairing rules and which, in many cases, determine their function. For instance RNA ribosomes are capable of catalytic activity, cleaving other RNA molecules. Not all possible bonds are realized, though. Due to bio-physical constraints and the chemistry of Watson-Crick base pairs there exist rather severe constraints on the bonds of an RNA molecule. In light of this, three decades ago Waterman et al. pioneered the concept of RNA secondary structures [21, 17], which are subject to the strictest combinatorial constraints.

Any structure can be represented by drawing the primary sequence horizontally, ignoring all chemical bonds of its backbone, see Fig. 1. Then one draws all bonds satisfying the Watson-Crick base pairing rules as arcs in the upper half-plane, effectively identifying the structure with the set of all arcs. In this representation, RNA secondary structures have no 1-arcs, i.e. arcs of the form $(i,i+1)$, and no two arcs $(i_1,j_1)$, $(i_2,j_2)$, where $i_1<j_1$ and $i_2<j_2$, with the property $i_1<i_2<j_1<j_2$. In other words there exist no two arcs that cross in the diagram representation of the structure.

Figure 1. RNA secondary structures. Diagram representation (top): the primary sequence, AGGCAAUCUACAGCGU, is drawn horizontally and its backbone bonds are ignored. All bonds are drawn in the upper half-plane. Secondary structures have the property that no two arcs intersect and all arcs have minimum length 2. Outer planar graph representation (bottom).

It is well-known that there exist additional types of nucleotide interactions [1]. These bonds are called pseudoknots [23] and occur in functional RNA (RNAseP [14]) and ribosomal RNA [12], and are conserved in the catalytic core of group I introns. Pseudoknots appear in plant viral RNAs, and in vitro RNA evolution experiments [20] have produced families of RNA structures with pseudoknot motifs when binding HIV-1 reverse transcriptase. Important mechanisms like ribosomal frame shifting [3] also involve pseudoknot interactions.
$k$-noncrossing RNA structures, introduced in [10], capture these pseudoknot bonds and generalize the concept of RNA secondary structures in a natural way. In the diagram representation a $k$-noncrossing RNA structure has no 1-arcs and contains at most $k-1$ mutually crossing arcs.

Figure 2. $k$-noncrossing RNA structures. (a) secondary structure (with isolated labels 3, 7, 8, 10), (b) planar 3-noncrossing RNA structure, 2, 9 being isolated, (c) the smallest non-planar 3-noncrossing structure.

The starting point of this paper was the experimental finding that 3-noncrossing RNA structures for random sequences of length 100 over the nucleotides A, G, U and C exhibited sharply concentrated numbers of arcs (centered at 39). It was furthermore intriguing that the numbers of arcs were significantly higher than those in RNA secondary structures. While it is evident that 3-noncrossing RNA structures have more arcs than secondary structures, the jump from 27 to 39 (for $n=100$) with a maximum number of 50 arcs was not anticipated. Since all these quantities were explicitly known via the generating functions for $k$-noncrossing RNA structures in [10], we could easily confirm that the numbers of 3-noncrossing RNA structures with exactly $h$ arcs, $S'_3(n,h)$, indeed satisfy almost "perfectly" a Gaussian distribution with a mean of 39, see Fig. 3. We also found that a central limit theorem holds for RNA secondary structures with $h$ arcs, see Figure 4. These observations motivated us to understand how and why these limit distributions arise, which is what the present paper is about. Our main results can be summarized as follows:

Figure 3. Central limit theorem and local limit theorem for 3-noncrossing RNA structures of length $n=100$ with exactly $h$ arcs: we display the central limit theorem (left) for $S'_3(100,h)$, $h=1,2,\dots,50$ (labeled by red dots) with mean $0.39089\cdot 100=39.089$ and variance $0.041565\cdot 100=4.1565$, and for the local limit theorem (right) we display the difference $\sqrt{4.1565}\,P\!\left(\frac{X_n-39.089}{\sqrt{4.1565}}=x\right)-\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$, which is maximal close to the peak of the distribution.

Theorem. Let $S'_3(n,h)$ denote the number of 3-noncrossing RNA structures with exactly $h$ arcs. Then the random variable $X_n$ having distribution $P(X_n=h)=S'_3(n,h)/S_3(n)$ satisfies a central and local limit theorem with mean $0.39089\,n$ and variance $0.041565\,n$.

Our particular strategy is rooted in our recent work on asymptotic enumeration of $k$-noncrossing RNA structures [11] and a paper of Bender [2], who showed how such central limit theorems arise in case of singularities that are poles. In order to put our results into context let us provide some background on central and local limit theorems. Suppose we are given a set $A_n$ (of size $a_n$). For instance let $A_n$ be the set of subsets of $\{1,\dots,n\}$. Suppose further we are given $A_{n,k}$ (of size $a_{n,k}$), $k\in\mathbb{N}$, representing a disjoint set partition of $A_n$. For instance let $A_{n,k}$ be the set of subsets with exactly $k$ elements. Consider the random variable $\xi_n$ having the probability distribution $P(\xi_n=k)=a_{n,k}/a_n$; then the corresponding probability generating function is given by
$$\sum_{k\ge 0}P(\xi_n=k)\,w^k=\sum_{k\ge 0}\frac{a_{n,k}}{a_n}\,w^k=\frac{\sum_{k\ge 0}a_{n,k}\,w^k}{\sum_{k\ge 0}a_{n,k}\,1^k}.$$
Let $\varphi_n(w)=\sum_{k\ge 0}a_{n,k}w^k$; then $\frac{\varphi_n(w)}{\varphi_n(1)}$ is the probability generating function of $\xi_n$ and
$$f(z,w)=\sum_{n\ge 0}\varphi_n(w)\,z^n=\sum_{n\ge 0}\sum_{k\ge 0}a_{n,k}\,w^k z^n$$
is called the bivariate generating function. For instance, in our example we have $P(\xi_n=k)=\binom{n}{k}/2^n$ and the resulting bivariate generating function is
$$\sum_{n\ge 0}\sum_{k\le n}\binom{n}{k}w^k z^n=\frac{1}{1-z(1+w)}.\tag{1.1}$$
The key idea consists in considering $f(z,w)$ as being parameterized by $w$ and studying the change of its singularity in an $\epsilon$-disc centered at $w=1$. Indeed the moment generating function is given by
$$E(e^{s\xi_n})=\sum_{k\ge 0}\frac{a_{n,k}}{a_n}e^{sk}=\frac{\varphi_n(e^s)}{\varphi_n(1)}=\frac{[z^n]f(z,e^s)}{[z^n]f(z,1)}$$
and $\frac{[z^n]f(z,e^{it})}{[z^n]f(z,1)}=E(e^{it\xi_n})$ is the characteristic function of $\xi_n$. This shows that the coefficients of $f(z,w)$ control the distribution, which can, for large $n$, be obtained via singularity analysis.

The resulting analysis can be amazingly simple. Let us showcase this in the case of the binomial distribution. Here the bivariate generating function is given by eq. (1.1). The simple pole $r(s)$ of $f(z,e^s)$ is $\frac{1}{1+e^s}$. Observe that $\frac{\varphi_n(e^s)}{\varphi_n(1)}\sim\left(\frac{r(0)}{r(s)}\right)^n$ holds uniformly for $s$ in a neighborhood of 0, and Taylor expansion shows
$$\frac{\varphi_n(e^{it})}{\varphi_n(1)}\sim\exp\!\left(i\cdot\frac{n}{2}\cdot t-\frac{1}{2}\cdot\frac{n}{4}\cdot t^2+O(t^3)\right)$$
uniformly for $t$ in any finite interval. It remains to apply the Lévy-Cramér theorem (Theorem 4) to the characteristic function of the normalized random variable $\frac{\xi_n-\frac{n}{2}}{\sqrt{n/4}}$, which yields the asymptotic normality of $\xi_n$. Thus $\binom{n}{k}$ is asymptotically normally distributed with mean $\frac{n}{2}$ and variance $\frac{n}{4}$. As it turns out we will have to work a bit harder to prove our main result.
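The binomial example can be checked numerically. The following sketch (the function name `gaussian_gap` is ours, introduced only for illustration) rescales the weights $\binom{n}{k}/2^n$ by $\sqrt{n/4}$, compares them with the standard normal density at the normalized points, and returns the largest deviation, which should shrink as $n$ grows.

```python
from math import comb, exp, pi, sqrt


def gaussian_gap(n):
    """Largest deviation between rescaled binomial weights and the normal density.

    For P(xi_n = k) = C(n, k) / 2^n with mean n/2 and variance n/4, returns
    max_k | sigma_n * P(xi_n = k) - exp(-x_k^2 / 2) / sqrt(2 pi) |,
    where x_k = (k - n/2) / sigma_n and sigma_n = sqrt(n) / 2.
    """
    mu, sigma = n / 2, sqrt(n) / 2
    gap = 0.0
    for k in range(n + 1):
        x = (k - mu) / sigma
        weight = comb(n, k) / 2**n  # exact integers, a single float division
        density = exp(-x * x / 2) / sqrt(2 * pi)
        gap = max(gap, abs(sigma * weight - density))
    return gap


if __name__ == "__main__":
    for n in (16, 64, 256):
        print(n, gaussian_gap(n))
```

Exact integer arithmetic is used for the weights so that no rounding error enters before the final division.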
The complication is due to the fact that the generating function for 3-noncrossing RNA structures is much more complex (and fascinating) than the bivariate function of eq. (1.1), which has a simple pole as dominant singularity. For instance, the singularity of the generating function for 3-noncrossing RNA structures is not a pole but of algebraic-logarithmic type [5, 6, 16]. Our two main results, Theorem 5 in Section 4 and Theorem 6 in Section 5, shed light on the distribution of 3-noncrossing RNA structures from a global and a local perspective. A central limit theorem represents the global perspective on the limiting distribution of some random variable $X_n$:
$$\lim_{n\to\infty}P\!\left(\frac{X_n-\mu_n}{\sigma_n}<x\right)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{t^2}{2}}\,dt.$$
Bender observed in [2] that a central limit theorem combined with certain smoothness conditions on the coefficients $a_{n,k}$ implies a local limit theorem, which considers the difference between $P\!\left(x\le\frac{X_n-\mu_n}{\sigma_n}<x+1\right)$ and $\frac{1}{\sqrt{2\pi}}\int_{x}^{x+1}e^{-\frac{t^2}{2}}\,dt$ as $n$ tends to infinity. To be precise, $X_n$ satisfies a local limit theorem on some set $S\subset\mathbb{R}$ if and only if
$$\lim_{n\to\infty}\sup_{x\in S}\left|\sigma_n\,P\!\left(\frac{X_n-\mu_n}{\sigma_n}=x\right)-\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\right|=0$$
holds, and we say $X_n$ satisfies a local limit theorem for some $S=\{x\in\mathbb{R}\mid x=o(\sqrt{n})\}$.

Why is the smoothness of the $a_{n,k}$ so important? Suppose $a_{n,k}=\binom{n}{k}+(-1)^k\,2\binom{n}{k}$; then it follows in analogy to our above argument that a central limit theorem with mean $\frac{1}{4}n$ and variance $\frac{1}{4}n$ holds. However, $\eta_n$ does not satisfy a local limit theorem, since
$$\sigma_n\,P\!\left(\frac{\eta_n-\mu_n}{\sigma_n}=x\right)-\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}=\frac{1}{2}\sqrt{n}\,P\!\left(\frac{\eta_n-\frac{1}{4}n}{\frac{1}{2}\sqrt{n}}=x\right)-\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$$
and for $S=\{\frac{\sqrt{n}}{2}\mid n=1,2,\dots\}$ we have $\frac{1}{2}\sqrt{n}\,P(\eta_n=\frac{1}{2}n)-\frac{1}{\sqrt{2\pi}}e^{-\frac{n}{4}}\not\to 0$, the key point being that $a_{n,k}$ flips between $-\binom{n}{k}$ and $3\binom{n}{k}$.

All results of this paper hold for 2-noncrossing RNA structures, i.e. RNA secondary structures. This is a consequence of an analogous analysis of their respective bivariate generating function. In this case, however, no singular expansion is necessary as the generating function itself can be used. The results also put the asymptotic results on RNA secondary structures of [8] on a new level: we can pass from computing exponential growth rates to computing distributions for RNA secondary structures with specific properties. To be precise we have for RNA secondary structures

Theorem. Let $S'_2(n,h)$ denote the number of RNA secondary structures with exactly $h$ arcs. Then the random variable $Y_n$ having distribution $P(Y_n=h)=S'_2(n,h)/S_2(n)$ satisfies a central and local limit theorem with mean $0.27639\,n$ and variance $0.04472\,n$.

In particular the theorem predicts a sharp concentration of the number of RNA secondary structures with 55.278% paired bases (two bases per arc, $2\cdot 0.27639=0.55278$), which agrees with the statistics of RNA secondary structures obtained by folding algorithms [24, 8, 22, 19, 18, 15]. Let us finally remark that much more holds: due to the determinant formula for $k$-noncrossing matchings and the functional identity of Lemma 1, Section 3, our results can be generalized to $k$-noncrossing RNA structures, where $k$ is arbitrary. Why this is of interest can be seen in Fig. 4. For higher $k$ the mean of the central limit theorems for $k$-noncrossing RNA structures will shift towards the maximum combinatorially possible number of arcs. We speculate that each increase in $k$ will basically cut the distance to the maximum arc number in half. This is work in progress.

Figure 4. Central limit theorem of 2-noncrossing and 3-noncrossing RNA structures: both random variables are normalized to $S'_2(n,h)/S_2(n)$ and $S'_3(n,h)/S_3(n)$, respectively. In case of $n=100$, for 2-noncrossing RNA structures we have a mean of $0.276393\,n=27.6393$ and variance $0.044721\,n=4.4721$ (left curve), while for 3-noncrossing RNA structures we have mean $0.39089\,n=39.089$ and variance $0.041565\,n=4.1565$ (right curve). The red dots and magenta dots represent the values $S'_2(n,h)/S_2(n)$ and $S'_3(n,h)/S_3(n)$, respectively.

The paper is structured as follows: In Section 2 we provide some background on $k$-noncrossing RNA structures and all generating functions involved. In Section 3 we give a functional equation for the bivariate generating function of $S'_3(n,h)$ via 3-noncrossing matchings, proved in [11]. We have included its proof in the appendix in order to keep the paper self-contained. This functional identity plays a key role in proving the central limit and local limit theorems in Section 4 and Section 5, respectively. The central limit theorem is proved by analyzing the singular expansion of the analytic function given by the power series $\sum_{n\ge 0}\sum_{h\le\frac{n}{2}}S'_3(n,h)\,w^h z^n$ and using transfer theorems [5, 6, 16]; to prove the local limit theorem, we use a theorem of Hwang [9] and build on our proof of the central limit theorem.

2. RNA structures

Let us begin by illustrating the concept of RNA structures. Suppose we are given the primary sequence AACCAUGUGGUACUUGAUGGCGAC. Structures are combinatorial graphs over the labels of the nucleotides of the primary sequence. These graphs can be represented in several ways. In Figure 5 we represent a 3-noncrossing RNA structure with loop-loop interactions in two ways: first we display the structure as a planar graph and secondly as a diagram, where the bonds are drawn as arcs in the positive half-plane.
In the following we will consider structures as diagram representations of digraphs. A digraph $D_n$ is a pair of sets $V_{D_n}$, $E_{D_n}$, where $V_{D_n}=\{1,\dots,n\}$ and $E_{D_n}\subset\{(i,j)\mid 1\le i<j\le n\}$. $V_{D_n}$ and $E_{D_n}$ are called vertex and arc set, respectively. A $k$-noncrossing digraph is a digraph in which all vertices have degree $\le 1$ and which does not contain a $k$-set of arcs that are mutually intersecting, i.e.
$$\not\exists\,(i_{r_1},j_{r_1}),(i_{r_2},j_{r_2}),\dots,(i_{r_k},j_{r_k});\quad i_{r_1}<i_{r_2}<\dots<i_{r_k}<j_{r_1}<j_{r_2}<\dots<j_{r_k}.\tag{2.1}$$
We will represent digraphs as diagrams (Figure 5) by representing the vertices as integers on a line and connecting any two adjacent vertices by an arc in the upper half-plane. The direction of the arcs is implicit in the linear ordering of the vertices and accordingly omitted.

Definition 1. An RNA structure (of pseudo-knot type $k-2$), $S_{k,n}$, is a digraph in which all vertices have degree $\le 1$, that does not contain a $k$-set of mutually intersecting arcs and 1-arcs, i.e. arcs of the form $(i,i+1)$, respectively. We denote the number of RNA structures by $S_k(n)$ and the number of RNA structures with exactly $\ell$ isolated vertices and with $h$ arcs by $S_k(n,\ell)$ and $S'_k(n,h)$, respectively. Note that $S'_k(n,h)=S_k(n,n-2h)$.

Figure 5. A 3-noncrossing RNA structure, as a planar graph (top) and as a diagram (bottom).

Let $f_k(n,\ell)$ denote the number of $k$-noncrossing digraphs with $\ell$ isolated points. We have shown in [10] that
$$f_k(n,\ell)=\binom{n}{\ell}f_k(n-\ell,0)\tag{2.2}$$
$$\det[I_{i-j}(2x)-I_{i+j}(2x)]\big|_{i,j=1}^{k-1}=\sum_{n\ge 1}f_k(n,0)\cdot\frac{x^n}{n!}\tag{2.3}$$
$$e^{x}\det[I_{i-j}(2x)-I_{i+j}(2x)]\big|_{i,j=1}^{k-1}=\Big(\sum_{\ell\ge 0}\frac{x^\ell}{\ell!}\Big)\Big(\sum_{n\ge 1}f_k(n,0)\frac{x^n}{n!}\Big)=\sum_{n\ge 1}\Big\{\sum_{\ell=0}^{n}f_k(n,\ell)\Big\}\cdot\frac{x^n}{n!}.\tag{2.4}$$
In particular we obtain for $k=2$ and $k=3$
$$f_2(n,\ell)=\binom{n}{\ell}C_{(n-\ell)/2}\quad\text{and}\quad f_3(n,\ell)=\binom{n}{\ell}\left[C_{\frac{n-\ell}{2}+2}\,C_{\frac{n-\ell}{2}}-C^2_{\frac{n-\ell}{2}+1}\right],\tag{2.5}$$
where $C_m$ denotes the $m$-th Catalan number. The derivation of the generating function of $k$-noncrossing RNA structures, given in Theorem 1 below, uses advanced methods and novel constructions of enumerative combinatorics due to Chen et al. [4, 7] and Stanley's mapping between matchings and oscillating tableaux, i.e. families of Young diagrams in which any two consecutive shapes differ by exactly one square. The enumeration is obtained using the reflection principle due to Gessel and Zeilberger [7] and Lindström [13] combined with an inclusion-exclusion argument in order to eliminate the arcs of length 1. In [10] generalizations to restricted (i.e. where arcs of the form $(i,i+2)$ are excluded) and circular RNA structures are given. The following theorem provides all data on numbers of $k$-noncrossing RNA structures with $h$ arcs and the numbers of all $k$-noncrossing RNA structures.

Theorem 1. [10] Let $k\in\mathbb{N}$, $k\ge 2$, let $C_m$ denote the $m$-th Catalan number and $f_k(n,\ell)$ be the number of $k$-noncrossing digraphs over $n$ vertices with exactly $\ell$ isolated vertices. Then the number of RNA structures with $\ell$ isolated vertices, $S_k(n,\ell)$, is given by
$$S_k(n,\ell)=\sum_{b=0}^{(n-\ell)/2}(-1)^b\binom{n-b}{b}f_k(n-2b,\ell),\tag{2.6}$$
where $f_k(n-2b,\ell)$ is given by the generating function in eq. (2.3). Furthermore the number of $k$-noncrossing RNA structures, $S_k(n)$, is
$$S_k(n)=\sum_{b=0}^{\lfloor n/2\rfloor}(-1)^b\binom{n-b}{b}\Big\{\sum_{\ell=0}^{n-2b}f_k(n-2b,\ell)\Big\}\tag{2.7}$$
where $\{\sum_{\ell=0}^{n-2b}f_k(n-2b,\ell)\}$ is given by the generating function in eq. (2.4).

In principle, Theorem 1 contains all information about the numbers of $k$-noncrossing RNA structures.
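For small $n$ the formulas of Theorem 1 can be evaluated directly. The sketch below is a minimal illustration for $k=3$, combining eq. (2.2), eq. (2.5) and eq. (2.6) with $\ell=n-2h$; the function names are ours and not taken from [10]. It tabulates $S'_3(n,h)$ and the resulting mean number of arcs.

```python
from math import comb


def catalan(m):
    """m-th Catalan number."""
    return comb(2 * m, m) // (m + 1)


def matchings3(m):
    """f_3(2m, 0): 3-noncrossing perfect matchings on 2m vertices, eq. (2.5)."""
    return catalan(m + 2) * catalan(m) - catalan(m + 1) ** 2


def f3(n, l):
    """f_3(n, l) = C(n, l) * f_3(n - l, 0), eq. (2.2); zero if n - l is odd."""
    if l > n or (n - l) % 2:
        return 0
    return comb(n, l) * matchings3((n - l) // 2)


def structures_by_arcs(n):
    """[S'_3(n, 0), ..., S'_3(n, floor(n/2))] via eq. (2.6) with l = n - 2h."""
    counts = []
    for h in range(n // 2 + 1):
        l = n - 2 * h
        counts.append(sum((-1) ** b * comb(n - b, b) * f3(n - 2 * b, l)
                          for b in range(h + 1)))
    return counts


def mean_arcs(n):
    """Mean of X_n, where P(X_n = h) = S'_3(n, h) / S_3(n)."""
    counts = structures_by_arcs(n)
    return sum(h * c for h, c in enumerate(counts)) / sum(counts)


if __name__ == "__main__":
    print(structures_by_arcs(5), sum(structures_by_arcs(5)))
    print(mean_arcs(100))
```

For $n=5$ the table can be confirmed by hand: one structure without arcs, six with a single arc of length at least 2, and six with two arcs.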
However, due to the inclusion-exclusion structure of its coefficients it is difficult to interpret them and to express their behavior for large $n$. Subsequent asymptotic analysis [11] produced the following simple formula.

Theorem 2. [11] The number of 3-noncrossing RNA structures is asymptotically given by
$$S_3(n)\sim\frac{10.4724\cdot 4!}{n(n-1)\cdots(n-4)}\left(\frac{5+\sqrt{21}}{2}\right)^{n}.$$

3. A functional equation

We have shown in the introduction that the bivariate generating function is the key to proving the central and local limit theorems. The following lemma, whose proof is given in the appendix, rewrites this bivariate generating function as a composition of two "simple" functions. This is crucial for the singularity analysis insofar as we can use a phenomenon known as persistence of the singularity of the "outer" function (the supercritical case) [5]. It basically means that the type of the singularity is determined by the generating function of $k$-noncrossing matchings.

Lemma 1. [11] Let $x$ be an indeterminate over $\mathbb{R}$ and $w\in\mathbb{R}$ a parameter. Let $\rho_k(w)$ denote the radius of convergence of the power series $\sum_{n\ge 0}\big[\sum_{h\le n/2}S'_k(n,h)\,w^{2h}\big]x^n$. Then for $|x|<\rho_k(w)$
$$\sum_{n\ge 0}\sum_{h\le n/2}S'_k(n,h)\,w^{2h}x^n=\frac{1}{w^2x^2-x+1}\sum_{n\ge 0}f_k(2n,0)\left(\frac{wx}{w^2x^2-x+1}\right)^{2n}\tag{3.1}$$
holds. In particular we have for $w=1$
$$\sum_{n\ge 0}S_k(n)\,z^n=\frac{1}{z^2-z+1}\sum_{n\ge 0}f_k(2n,0)\left(\frac{z}{z^2-z+1}\right)^{2n}\tag{3.2}$$
for $z\in\mathbb{C}$ with $|z|<\rho_k(1)$.

To keep the paper self-contained we give the proof of Lemma 1 in the Appendix. While (3.1) can only be proved on the level of formal power series for real variables, complex analysis, i.e. the interpretation of these generating functions as analytic functions, allows us to extend the equality to arbitrary complex variables.

Lemma 2. Suppose $\epsilon>0$, $k\in\mathbb{N}$, $k\ge 2$ and $w=e^{\frac{s}{2}}$, where $|s|<\epsilon$ and $\varphi_{n,k}(s)=\sum_{h\le n/2}S'_k(n,h)\,e^{hs}$. Let $\rho_k(s)\in\mathbb{R}_+$ denote the radius of convergence of $\sum_{n\ge 0}\varphi_{n,k}(s)\,z^n$ parameterized by $s$. Then we have
$$\forall\,s,z\in\mathbb{C};\ |s|<\epsilon,\ |z|<\rho_k(s);\quad\sum_{n\ge 0}\varphi_{n,k}(s)\,z^n=\frac{1}{e^{s}z^2-z+1}\sum_{n\ge 0}f_k(2n,0)\left(\frac{e^{\frac{s}{2}}z}{e^{s}z^2-z+1}\right)^{2n}.\tag{3.3}$$
Furthermore $\sum_{n\ge 0}\varphi_{n,3}(s)\,z^n$ has an analytic continuation, $\Xi_3(z,s)$. For $\epsilon$ sufficiently small and $|s|<\epsilon$, $\Xi_3(z,s)$ has exactly 6 singularities, 4 of which have distinct moduli.

Proof. We first prove eq. (3.3). For this purpose we observe that
$$\forall\,|s|<\epsilon,\ |z|<\rho_k(s)\quad G(z,s)=\frac{1}{e^{s}z^2-z+1}\sum_{n\ge 0}f_k(2n,0)\left(\frac{e^{\frac{s}{2}}z}{e^{s}z^2-z+1}\right)^{2n}\tag{3.4}$$
considered as a power series in $e^{\frac{1}{2}s}$ is analytic in a neighborhood of $s=0$, since $G(z,0)$ is analytic for $|z|<\rho_k(0)$. In addition, we can interpret $\sum_{n\ge 0}\varphi_{n,k}(s)\,z^n$ as a power series in $e^{\frac{1}{2}s}$:
$$\sum_{n\ge 0}\sum_{h\le n/2}S'_k(n,h)\,e^{hs}z^n=\sum_{h\ge 0}\Big[\sum_{n\ge 2h}S'_k(n,h)\,z^n\Big](e^{s})^h=\sum_{h\ge 0}\psi_h(z)\left(e^{\frac{1}{2}s}\right)^{2h}.\tag{3.5}$$
Therefore $G(z,s)$ and the power series $\sum_{n\ge 0}\varphi_{n,k}(s)\,z^n$ are analytic in the indeterminate $e^{\frac{1}{2}s}$ in an $\epsilon$-disc centered at 0. Lemma 1 implies that for $s\in\,]-\epsilon,\epsilon[$ the analytic functions $G(z,s)$ and $\sum_{n\ge 0}\varphi_{n,k}(s)\,z^n$ are equal. Since any two functions that are analytic at 0 and that coincide on the interval $]-\epsilon,\epsilon[$ are identical, we obtain
$$\forall\,|s|<\epsilon,\ |z|<\rho_k(s)\quad G(z,s)=\sum_{n\ge 0}\varphi_{n,k}(s)\,z^n.\tag{3.6}$$

Claim 1. Suppose $|s|<\epsilon$. Then $\sum_{n\ge 0}\varphi_{n,3}(s)\,z^n$ has an analytic continuation, $\Xi_3(z,s)$, which has exactly 6 singularities, 4 of which have distinct moduli.
In order to prove Claim 1 we observe that the power series $\sum_{n\ge 0}f_3(2n,0)\,y^n$ has the analytic continuation $\Psi(y)$ (obtained by MAPLE sum tools) given by
$$\Psi(y)=\frac{-(1-16y)^{\frac{3}{2}}\,P^{-1}_{\frac{3}{2}}\!\left(-\frac{16y+1}{16y-1}\right)}{16\,y^{\frac{5}{2}}},\tag{3.7}$$
where $P^{m}_{\nu}(x)$ denotes the associated Legendre function of the first kind with the parameters $\nu=\frac{3}{2}$ and $m=-1$. According to eq. (3.6) we have
$$\sum_{n\ge 0}\varphi_{n,k}(s)\,z^n=\frac{1}{e^{s}z^2-z+1}\sum_{n\ge 0}f_k(2n,0)\left(\frac{e^{\frac{s}{2}}z}{e^{s}z^2-z+1}\right)^{2n}\tag{3.8}$$
which implies that $\sum_{n\ge 0}\varphi_{n,3}(s)\,z^n$ has the analytic continuation
$$\forall\,|s|<\epsilon,\quad\Xi_3(z,s)=\frac{1}{e^{s}z^2-z+1}\,\Psi\!\left(\left(\frac{e^{\frac{1}{2}s}z}{e^{s}z^2-z+1}\right)^{2}\right).\tag{3.9}$$
In particular for $s=0$, $\Xi_3(z,0)$ is the analytic continuation of the power series $\sum_{n\ge 0}S_3(n)\,z^n$. We proceed by showing that $\Xi_3(z,s)$ has exactly 6 singularities in $\mathbb{C}$, parameterized by $s$, and that 4 of them have different moduli. Two singularities are given by the roots of $e^{s}z^2-z+1$, namely $\zeta_1(s)=\frac{1-\sqrt{1-4e^{s}}}{2e^{s}}$ and $\zeta_2(s)=\frac{1+\sqrt{1-4e^{s}}}{2e^{s}}$. Observe that $|\zeta_1(0)|=|\zeta_2(0)|=1$ and that the polynomial $e^{s}z^2-z+1$ depends continuously on $e^{\frac{s}{2}}$; therefore $\zeta_1(s)$ and $\zeta_2(s)$ could potentially have equal modulus for $|s|<\epsilon$. The remaining 4 singularities are induced by the unique dominant singularity $\alpha_1=\frac{1}{16}$ of the analytic function $\Psi(y)$. The function $\Psi(y)$ has three singularities: two of them, $\alpha_1=\frac{1}{16}$ and $\alpha_2=+\infty$, are branch points and the other, $\alpha_3=0$, is a removable singularity. The function $g(z)=\left(\frac{e^{\frac{s}{2}}z}{e^{s}z^2-z+1}\right)^{2}$ with $g(0)=0$ has a radius of convergence of 1 as $s$ tends to 0. Therefore the singularity type only depends on $\Psi(y)$ (this is the supercritical case in [5]).
The singularity $\alpha_1=\frac{1}{16}$ gives rise to the equations
$$0=e^{s}z^2-(1+4e^{\frac{1}{2}s})\,z+1\quad\text{and}\quad 0=e^{s}z^2+(4e^{\frac{1}{2}s}-1)\,z+1.$$
Setting $\mu_+(s)=1+4e^{\frac{1}{2}s}$, $\mu_-(s)=1-4e^{\frac{1}{2}s}$ and $\theta_\pm(s)=\sqrt{12e^{s}\pm 8e^{\frac{1}{2}s}+1}$ (the discriminants of the first and second equation, respectively), their roots are given by
$$\zeta_3(s)=\frac{\mu_+(s)-\theta_+(s)}{2e^{s}},\quad\zeta_4(s)=\frac{\mu_+(s)+\theta_+(s)}{2e^{s}},\quad\zeta_5(s)=\frac{\mu_-(s)+\theta_-(s)}{2e^{s}}\quad\text{and}\quad\zeta_6(s)=\frac{\mu_-(s)-\theta_-(s)}{2e^{s}},$$
respectively. Observe that for $|s|<\epsilon$, $e^{\frac{s}{2}}$ is in a neighborhood of 1 over $\mathbb{C}$, hence $\theta_\pm(s)\neq 0$. That leads to 4 distinct roots $\zeta_3(s),\zeta_4(s),\zeta_5(s),\zeta_6(s)$ over $|s|<\epsilon$, all of which have distinct moduli for $s$ in a sufficiently small neighborhood of 0. Indeed, for $s=0$ we have 4 distinct real valued roots
$$\zeta_3(0)=\frac{5-\sqrt{21}}{2},\quad\zeta_4(0)=\frac{5+\sqrt{21}}{2},\quad\zeta_5(0)=\frac{-3+\sqrt{5}}{2}\quad\text{and}\quad\zeta_6(0)=\frac{-3-\sqrt{5}}{2}$$
and the polynomials $e^{s}z^2-(1+4e^{\frac{1}{2}s})z+1$, $e^{s}z^2+(4e^{\frac{1}{2}s}-1)z+1$ and $e^{s}z^2-z+1$ depend continuously on the parameter $e^{\frac{1}{2}s}$, whence Claim 1 and the lemma follow.

4. The central limit theorem

In this section we prove a central limit theorem for the numbers of 3-noncrossing RNA structures with $h$ arcs. We will analyze for fixed but arbitrary $n$ the distribution of $S'_3(n,h)$. Let us first prepare some methods and results used in the proof of Theorem 5. $[z^n]f(z)$ denotes the coefficient of $z^n$ in the power series expansion of $f(z)$ around 0. The scaling property of Taylor coefficients
$$\forall\,\gamma\in\mathbb{C}\setminus 0;\quad[z^n]f(z)=\gamma^{n}\,[z^n]f\!\left(\frac{z}{\gamma}\right),\tag{4.1}$$
shows that w.l.o.g. any singularity analysis can be reduced to the case where 1 is the dominant singularity. We will be interested in the behavior of an analytic function "locally", i.e. around a certain singularity $\rho$.
For this purpose we use the notation
$$f(z)=O(g(z))\ \text{as}\ z\to\rho\iff f(z)/g(z)\ \text{is bounded as}\ z\to\rho\tag{4.2}$$
and if we write $f(z)=O(g(z))$ it is implicitly assumed that $z$ tends to a (unique) singularity. Given two numbers $\phi,R$, where $R>|\rho|>0$ and $0<\phi<\frac{\pi}{2}$, and $\rho\in\mathbb{C}$, the open domain $\Delta_\rho(\phi,R)$ is defined as
$$\Delta_\rho(\phi,R)=\{z\mid |z|<R,\ z\neq\rho,\ |\mathrm{Arg}(z-\rho)|>\phi\}.\tag{4.3}$$
A domain is a $\Delta_\rho$-domain if it is of the form $\Delta_\rho(\phi,R)$ for some $R$ and $\phi$. A function is $\Delta_\rho$-analytic if it is analytic in some $\Delta_\rho$-domain. We use $U(a,r)=\{z\in\mathbb{C}\mid |z-a|<r\}$ to denote the open neighborhood of $a$ in $\mathbb{C}$. Via the following theorem we can extract the coefficients of analytic functions provided these functions satisfy certain "local" properties.

Theorem 3. [5] Let $r\in\mathbb{Z}_{\ge 0}$ and let $f(z,e^{s})$ be a $\Delta_{\rho(s)}$-analytic function parameterized by $s$, which satisfies in the intersection of a neighborhood of $\rho(s)$ with its $\Delta_{\rho(s)}$-domain
$$f(z,e^{s})=b_0(s)+b_1(s)(z-\rho(s))+A(s)\,(\rho(s)-z)^{r}\ln\frac{1}{\rho(s)-z}+R(z,s)\tag{4.4}$$
where $A(s)$, $b_0(s)$, $b_1(s)$ are analytic in $|s|<\epsilon$ and $|R(z,s)|\le c\,|\rho(s)-z|$ for some absolute constant $c\in\mathbb{C}$. That is, we have $f(z,e^{s})=O\big((\rho(s)-z)^{r}\ln(\frac{1}{\rho(s)-z})\big)$ with uniform error bound for $s$ in a neighborhood of 0. Then we have
$$[z^n]f(z,e^{s})=A(s)\,\frac{(-1)^{r}\,r!}{n(n-1)\cdots(n-r)}\left(1-O\!\left(\frac{1}{n}\right)\right)\tag{4.5}$$
for some $A(s)\in\mathbb{C}$, where the error term is again uniform for $s$ from a neighborhood of the origin, i.e. $R(s)\le c\,|s|$, where $c>0$.

Remark. The equivalence between eq. (4.4) and $f(z,e^{s})=O\big((\rho(s)-z)^{r}\ln(\frac{1}{\rho(s)-z})\big)$ for $r\in\mathbb{Z}_{\ge 0}$ can be seen as follows: by definition of $f(z,e^{s})=O\big((\rho(s)-z)^{r}\ln(\frac{1}{\rho(s)-z})\big)$ there exist $A(z,s)$ and $B(z,s)$ such that $f(z,e^{s})=B(z,s)+A(z,s)(\rho(s)-z)^{r}\ln(\frac{1}{\rho(s)-z})$, where $A(z,s)$ and $B(z,s)$ are analytic in a neighborhood of $\rho(s)$. Taylor expansion of $A(z,s)$ and $B(z,s)$ at $z=\rho(s)$ produces
$$\begin{aligned}f(z,e^{s})&=B(z,s)+A(z,s)\,(\rho(s)-z)^{r}\ln\frac{1}{\rho(s)-z}\\&=b_0(s)+b_1(s)(z-\rho(s))+\cdots+\big(a_0(s)+a_1(s)(z-\rho(s))+\cdots\big)(\rho(s)-z)^{r}\ln\frac{1}{\rho(s)-z}\\&=b_0(s)+b_1(s)(z-\rho(s))+a_0(s)\,(\rho(s)-z)^{r}\ln\frac{1}{\rho(s)-z}+R(z,s)\end{aligned}$$
where $R(z,s)=O\big((\rho(s)-z)^{r+1}\ln\frac{1}{\rho(s)-z}\big)$. For $r\in\mathbb{Z}_{\ge 0}$, $\frac{|R(z,s)|}{|\rho(s)-z|}=O\big(|\rho(s)-z|^{r}\ln\frac{1}{|\rho(s)-z|}\big)$ is bounded by an absolute constant as $z$ tends to $\rho(s)$. That implies the error bound is uniform.

The next theorem is a classic result on limit distributions which allows us to prove our main result via characteristic functions, i.e. explicitly by showing $\lim_{n\to\infty}\varphi_n(t)=\varphi(t)$ for any $t\in(-\infty,\infty)$.

Theorem 4. (Lévy-Cramér) Let $\{\xi_n\}$ be a sequence of random variables and let $\{\varphi_n(x)\}$ and $\{F_n(x)\}$ be the corresponding sequences of characteristic and distribution functions. If there exists a function $\varphi(t)$ such that $\lim_{n\to\infty}\varphi_n(t)=\varphi(t)$ uniformly over an arbitrary finite interval enclosing the origin, then there exists a random variable $\xi$ with distribution function $F(x)$ such that $F_n(x)\Longrightarrow F(x)$ uniformly over any finite or infinite interval of continuity of $F(x)$.

We now consider the random variable $X_n$ having the distribution $P(X_n=h)=S'_3(n,h)/S_3(n)$, where $h=0,1,\dots,\lfloor\frac{n}{2}\rfloor$.
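Before turning to the proof, the coefficient formula (4.5) of Theorem 3 can be sanity-checked in the simplest situation $\rho=1$, $A=1$, $b_0=b_1=R=0$, where it is exact. The sketch below (function names are ours) multiplies the series of $(1-z)^{r}$ and $\ln\frac{1}{1-z}$ with exact rational arithmetic and compares the result with $\frac{(-1)^{r}r!}{n(n-1)\cdots(n-r)}$ for $n>r$; the case $r=4$ is the one needed in the proof of Theorem 5.

```python
from fractions import Fraction
from math import comb, factorial


def coeff_log_power(n, r):
    """[z^n] (1 - z)^r * ln(1 / (1 - z)), obtained by multiplying both series.

    (1 - z)^r = sum_j (-1)^j C(r, j) z^j  and  ln(1 / (1 - z)) = sum_{m>=1} z^m / m.
    """
    return sum(Fraction((-1) ** j * comb(r, j), n - j)
               for j in range(min(r, n - 1) + 1))


def closed_form(n, r):
    """(-1)^r r! / (n (n-1) ... (n-r)), the leading factor in eq. (4.5), for n > r."""
    denominator = 1
    for i in range(r + 1):
        denominator *= n - i
    return Fraction((-1) ** r * factorial(r), denominator)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, coeff_log_power(n, 4), closed_form(n, 4))
```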
The key point in the pro of o f Theorem 5 is to compute the co efficients of the biv ariate gener ating function whose v ariable, s is consider ed as a parameter . Intuitiv ely the particular distribution is a result of how the singularity shifts as a function of this parameter. As a result t he pro of is s o mewhat “non-probabilistic” and has tw o distinct parts : (a) t he analytic combinatorics of the biv a r iate generating function and (b) the co mputation of the c haracteristic function with subsequent application of the L´ evy-Cram´ er Theo rem. Theorem 5. The r andom variab le X n − µn √ σ 2 n has asymptotic al ly normal distribution with p ar ameter (0 , 1) , i.e. (4.6) lim n →∞ P X n − µn √ σ 2 n < x = 1 √ 2 π Z x −∞ e − 1 2 t 2 dt and µ, σ 2 ar e given by (4.7) µ = − − 3 2 + 13 42 √ 21 5 2 − 1 2 √ 21 = 0 . 3 9089 and σ 2 = µ 2 − 1 − 94 441 √ 21 5 − √ 21 2 = 0 . 0 4156 5 . Pr o of. W e se t w = e 1 2 s and ϕ n, 3 ( s ) = P h ≤ n/ 2 S ′ 3 ( n, h ) e hs . Since (4.8) X n ≥ 0 ϕ n, 3 ( s ) z n = X n ≥ 0 X h ≤ n/ 2 S ′ 3 ( n, h ) e hs z n , we can co nsider the double generating function P n ≥ 0 P h ≤ n/ 2 S ′ 3 ( n, h ) w 2 h z n as a p ow er series in the complex indeterminant z , parameterized by s . Claim 1 . (4.9) Ψ( z ) = O (1 − 16 z ) 4 ln 1 1 − 16 z holds uniformly for ∀ z ∈ ∆ 1 16 ( φ, R ) ∩ U ( 1 16 , ǫ ); Ψ( z ) is ∆ 1 16 ( φ, R )-analytic and has the singular expa nsion (1 − 16 z ) 4 ln 1 1 − 16 z in the intersection of U ( 1 16 , ǫ ) with the ∆ 1 16 − domain, where ∆ r ( φ, R ) = { z | z | < R, z 6 = r, | Arg( z − r ) | > φ } for some R > r . Fir st ∆ 1 16 ( φ, R )-analyticity of the function (1 − 16 z ) 4 ln 1 1 − 16 z is obvious. W e pro ceed by pr oving that (1 − 16 z ) 4 ln 1 1 − 16 z is the singula r e xpansion of Ψ( z ). The above men tioned scaling prop erty of T aylor co efficients allows us to consider the power series P n ≥ 0 f 3 (2 n, 0)( z 16 ) n ov er the ∆-doma in ∆ 1 ( φ, R ) for some R > 1. 
Using the notation of falling factorials $(n-1)_4 = (n-1)(n-2)(n-3)(n-4)$ we observe
$$f_3(2n,0) = C_{n+2}C_n - C_{n+1}^2 = \frac{1}{(n-1)_4}\,\frac{12\,(n-1)_4\,(2n+1)}{(n+3)(n+1)^2(n+2)^2}\binom{2n}{n}^2 .$$
With this expression for $f_3(2n,0)$ we arrive at the formal identity
$$\sum_{n\ge 5} 16^{-n} f_3(2n,0)\,z^n = O\left(\sum_{n\ge 5}\left[16^{-n}\frac{1}{(n-1)_4}\,\frac{12\,(n-1)_4\,(2n+1)}{(n+3)(n+1)^2(n+2)^2}\binom{2n}{n}^2 - \frac{4!}{(n-1)_4}\,\frac{1}{\pi}\,\frac{1}{n}\right] z^n + \sum_{n\ge 5}\frac{4!}{(n-1)_4}\,\frac{1}{\pi}\,\frac{1}{n}\,z^n\right),$$
where $f(z) = O(g(z))$ denotes that the ratio $f(z)/g(z)$ remains bounded for $z \to 1$, see eq. (4.2). It is clear that the bound
$$\sum_{n\ge 5}\left[16^{-n}\frac{1}{(n-1)_4}\,\frac{12\,(n-1)_4\,(2n+1)}{(n+3)(n+1)^2(n+2)^2}\binom{2n}{n}^2 - \frac{4!}{(n-1)_4}\,\frac{1}{\pi}\,\frac{1}{n}\right] z^n \sim \sum_{n\ge 5}\left[16^{-n}\frac{1}{(n-1)_4}\,\frac{12\,(n-1)_4\,(2n+1)}{(n+3)(n+1)^2(n+2)^2}\binom{2n}{n}^2 - \frac{4!}{(n-1)_4}\,\frac{1}{\pi}\,\frac{1}{n}\right] < \kappa$$
holds uniformly for $z$ in $\Delta_1(\phi,R) \cap U(1,\epsilon)$ and some absolute constant $\kappa < 0.0784$. Therefore we can conclude
$$\sum_{n\ge 5} 16^{-n} f_3(2n,0)\,z^n = O\left(\sum_{n\ge 5}\frac{4!}{(n-1)_4}\,\frac{1}{\pi}\,\frac{1}{n}\,z^n\right). \tag{4.10}$$
We proceed by interpreting the power series on the right-hand side, observing
$$\forall\, n \ge 5; \quad [z^n]\left((1-z)^4 \ln\frac{1}{1-z}\right) = \frac{4!}{(n-1)\cdots(n-4)}\,\frac{1}{n}, \tag{4.11}$$
whence $(1-z)^4 \ln\frac{1}{1-z}$ is (up to the constant factor $\frac{1}{\pi}$ and a polynomial of degree at most 4) the unique analytic continuation of $\sum_{n\ge 5}\frac{4!}{(n-1)_4}\frac{1}{\pi}\frac{1}{n}z^n$. Using the scaling property of Taylor coefficients, $[z^n]f(z) = \gamma^n [z^n] f\left(\frac{z}{\gamma}\right)$, we obtain that
$$\Psi(z) = O\left((1-16z)^4 \ln\frac{1}{1-16z}\right) \tag{4.12}$$
holds uniformly for all $z \in \Delta_{\frac{1}{16}}(\phi,R) \cap U(\frac{1}{16},\epsilon)$. Therefore we have proved that $(1-16z)^4 \ln\frac{1}{1-16z}$ is the singular expansion of $\Psi(z)$ at $z = \frac{1}{16}$, whence Claim 1.

Our next step consists in verifying that, when passing from $\Psi(z)$ to the bivariate generating function $\Psi(z,s) = \Psi\left(\left(\frac{wz}{w^2z^2 - z + 1}\right)^2\right)$, there exists a singular expansion of the form $O\left(\left(1-\frac{z}{\rho_3(s)}\right)^4 \ln\frac{1}{1-\frac{z}{\rho_3(s)}}\right)$, parameterized in $s$.

Claim 2.
Let $0 < \epsilon < 1$. Then for any $|s| < \epsilon$ and $z \in \Delta_{\rho_3(s)}(\phi,R)$ we have $\Psi(z,s) = O\left(\left(1-\frac{z}{\rho_3(s)}\right)^4 \ln\frac{1}{1-\frac{z}{\rho_3(s)}}\right)$, and the error bound is uniform for $s$ in a neighborhood of 0.

To prove the claim we first observe that Claim 1 implies
$$\Psi(z) = \kappa\,(1-16z)^4 \ln\frac{1}{1-16z} + R(z) \tag{4.13}$$
for some absolute constant $\kappa$, where $R(z)$ is the uniform error bound for $z \in \Delta_{\frac{1}{16}}(\phi,R) \cap U(\frac{1}{16},\epsilon)$. That is, for $z \in \Delta_{\frac{1}{16}}(\phi,R) \cap U(\frac{1}{16},\epsilon)$ there exists some absolute constant $c$ such that $|R(z)| \le c\cdot|1-16z|$ holds. According to Lemma 2 we have
$$\begin{aligned}
\Xi_3(z,s) &= \frac{1}{e^s z^2 - z + 1}\; O\left(\left(1 - 16\Big(\frac{e^{\frac{1}{2}s} z}{e^s z^2 - z + 1}\Big)^2\right)^4 \ln\frac{1}{1 - 16\Big(\frac{e^{\frac{1}{2}s} z}{e^s z^2 - z + 1}\Big)^2}\right) \\
&= \frac{\kappa}{e^s z^2 - z + 1}\left(1 - 16\Big(\frac{e^{\frac{1}{2}s} z}{e^s z^2 - z + 1}\Big)^2\right)^4 \ln\frac{1}{1 - 16\Big(\frac{e^{\frac{1}{2}s} z}{e^s z^2 - z + 1}\Big)^2} + R(z,e^s).
\end{aligned}$$
We expand $\left(1 - 16\Big(\frac{e^{\frac{1}{2}s} z}{e^s z^2 - z + 1}\Big)^2\right)^4 \ln\frac{1}{1 - 16\Big(\frac{e^{\frac{1}{2}s} z}{e^s z^2 - z + 1}\Big)^2}$ around $z = \rho_3(s)$, where $\rho_3(s)$ is the solution of $\frac{z\,e^{\frac{1}{2}s}}{e^s z^2 - z + 1} = \frac{1}{4}$ of minimal modulus. Lemma 2 implies that
$$\rho_3(s) = \frac{4e^{\frac{1}{2}s} + 1 - \sqrt{12e^s + 8e^{\frac{1}{2}s} + 1}}{2e^s}$$
is the unique dominant singularity. As a function of $s$ we have $\rho_3'(0) = -\frac{3}{2} + \frac{13}{42}\sqrt{21} \ne 0$. The term $\sqrt{12e^s + 8e^{\frac{1}{2}s} + 1}$ in $\rho_3(s)$ produces two branching points parameterized by $s$, namely $w = e^{\frac{1}{2}s} = -\frac{1}{6}$ and $w = e^{\frac{1}{2}s} = -\frac{1}{2}$, or equivalently $s = 2\ln\frac{1}{6} + 2\pi i$ and $s = 2\ln\frac{1}{2} + 2\pi i$, respectively. The interval between $2\ln\frac{1}{6} + 2\pi i$ and $2\ln\frac{1}{2} + 2\pi i$ divides the complex plane of $s$ into two analytic branches. For any $0 < \epsilon < \min\{|2\ln\frac{1}{2} + 2\pi i|,\ |2\ln\frac{1}{6} + 2\pi i|\} = 6.4343$, the region $|s| < \epsilon$ is disjoint from the interval $[(2\ln\frac{1}{6}, 2\pi),\ (2\ln\frac{1}{2}, 2\pi)]$. Therefore $\rho_3(s)$ is analytic for $|s| < \epsilon$.
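The closed form of $\rho_3(s)$ and the bound $6.4343$ can be checked numerically. The sketch below is only a sanity check under the formulas stated above; the helper names `rho3`, `kernel` and `eps_bound` are illustrative assumptions, not notation from the paper.

```python
import cmath
import math


def rho3(s):
    """Closed form of the dominant singularity rho_3(s), with w = e^{s/2}."""
    w = cmath.exp(s / 2)
    return (4 * w + 1 - cmath.sqrt(12 * w * w + 8 * w + 1)) / (2 * w * w)


def kernel(z, s):
    """The quantity w z / (w^2 z^2 - z + 1); rho_3(s) is defined by kernel = 1/4."""
    w = cmath.exp(s / 2)
    return w * z / (w * w * z * z - z + 1)


# Branch points of the square root: w = -1/2 and w = -1/6,
# i.e. s = 2 ln|w| + 2 pi i on the principal branch of the logarithm.
branch_points = [
    complex(2 * math.log(1 / 2), 2 * math.pi),
    complex(2 * math.log(1 / 6), 2 * math.pi),
]
eps_bound = min(abs(b) for b in branch_points)

if __name__ == "__main__":
    s = 0.1 + 0.2j
    print(rho3(0), kernel(rho3(s), s), eps_bound)
```

For real $s = 0$ the sketch reproduces $\rho_3(0) = \frac{5-\sqrt{21}}{2}$, and the minimum distance from the origin to the two branch points agrees with the stated value to four decimals.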
We next consider $q(z,s) = 1 - 16\Big(\frac{e^{\frac{1}{2}s} z}{e^s z^2 - z + 1}\Big)^2$ as a function of $z$ and compute the Taylor expansion at $\rho_3(s)$:
$$q(z,s) = \alpha\,(\rho_3(s) - z) + O\big((z-\rho_3(s))^2\big),$$
and setting $\alpha = \frac{\sqrt{21}}{5-\sqrt{21}}$ (its value at $s = 0$) we obtain
$$\begin{aligned}
\frac{1}{e^s z^2 - z + 1}\,q(z,s)^4 \ln\frac{1}{q(z,s)}
&= \frac{\big(\alpha(\rho_3(s)-z) + O((z-\rho_3(s))^2)\big)^4 \ln\dfrac{1}{\alpha(\rho_3(s)-z) + O((z-\rho_3(s))^2)}}{e^s (z-\rho_3(s))^2 + (2\rho_3(s)e^s - 1)(z-\rho_3(s)) + e^s\rho_3(s)^2 - \rho_3(s) + 1} \\
&= \frac{[\alpha + O(z-\rho_3(s))]^4\,(\rho_3(s)-z)^4 \ln\dfrac{1}{[\alpha + O(z-\rho_3(s))]\,(\rho_3(s)-z)}}{O(z-\rho_3(s)) + e^s\rho_3(s)^2 - \rho_3(s) + 1} \\
&= O\left((\rho_3(s)-z)^4 \ln\frac{1}{\rho_3(s)-z}\right).
\end{aligned}$$
According to Theorem 3 for $r = 4$, the error term in the expansion of $\frac{1}{e^s z^2 - z + 1}\left[q(z,s)^4 \ln\frac{1}{q(z,s)}\right]$ is uniform for $s$ in a neighborhood of 0. We observe that the resulting error bound for $\Xi_3(z,s)$ is the sum $R(z,e^s) + R_1(z,e^s)$, where
$$|R(z,e^s)| \le c\cdot\left|1 - 16\Big(\frac{e^{\frac{1}{2}s} z}{e^s z^2 - z + 1}\Big)^2\right| = O(\rho_3(s) - z).$$
Therefore the error bound for the expansion of the bivariate function $\Xi_3(z,s)$ is uniform and Claim 2 is proved.

We proceed by using the scaling property of Taylor coefficients, $[z^n]f(z) = \gamma^n [z^n] f\left(\frac{z}{\gamma}\right)$, and apply Theorem 3. Via Theorem 3 we obtain the key information about the coefficients of $\Xi_3(z,s)$, which allows us to substitute $\varphi_{n,3}\left(\frac{it}{\sigma_n}\right)$ in eq. (4.17) below:
$$[z^n]\,\Xi_3(z,s) = K(s)\,\frac{4!}{n(n-1)\cdots(n-4)}\,\big(\rho_3(s)^{-1}\big)^n\left(1 - O\Big(\frac{1}{n}\Big)\right) \quad\text{for some } K(s) \in \mathbb{C}, \tag{4.14}$$
where the error term is again uniform for $s$ in a neighborhood of the origin.

Suppose we are given the random variable (r.v.) $\xi_n$ with mean $\mu_n$ and variance $\sigma_n^2$. We consider the rescaled r.v. $\eta_n = (\xi_n - \mu_n)\,\sigma_n^{-1}$ and the characteristic function of $\eta_n$:
$$f_{\eta_n}(t) = E[e^{it\eta_n}] = E\big[e^{it\frac{\xi_n}{\sigma_n}}\big]\,e^{-i\frac{\mu_n}{\sigma_n}t}. \tag{4.15}$$
In particular, for $\xi_n = X_n$ we obtain, substituting the term $E[e^{it\eta_n}]$,
$$f_{X_n}(t) = \left(\sum_{h=0}^{n} \frac{S_3'(n,h)}{S_3(n)}\,e^{it\frac{h}{\sigma_n}}\right) e^{-i\frac{\mu_n}{\sigma_n}t}. \tag{4.16}$$
Since $\varphi_{n,3}(s) = \sum_{h\le n/2} S_3'(n,h)\,e^{hs}$, we can interpret $S_3(n) = \sum_{h\le n/2} S_3'(n,h)$ as $\varphi_{n,3}(0)$ and $\sum_{h\le n/2} S_3'(n,h)\,e^{h\frac{it}{\sigma_n}}$ as $\varphi_{n,3}\left(\frac{it}{\sigma_n}\right)$, respectively. Therefore we have
$$f_{X_n}(t) = \frac{1}{\varphi_{n,3}(0)}\,\varphi_{n,3}\Big(\frac{it}{\sigma_n}\Big)\,e^{-i\frac{\mu_n}{\sigma_n}t}. \tag{4.17}$$
For $|s| < \epsilon$, eq. (4.14) yields $\varphi_{n,3}(s) = [z^n]\,\Xi_3(z,s) \sim K(s)\,\frac{4!}{n(n-1)\cdots(n-4)}\,\big(\rho_3(s)^{-1}\big)^n$ with uniform error term, and we accordingly obtain
$$f_{X_n}(t) \sim \frac{K\big(\frac{it}{\sigma_n}\big)}{K(0)}\left[\frac{\rho_3\big(\frac{it}{\sigma_n}\big)}{\rho_3(0)}\right]^{-n} e^{-i\frac{\mu_n}{\sigma_n}t}, \tag{4.18}$$
where the error term is uniform for $t$ from any bounded interval. Taking the logarithm we obtain
$$\ln f_{X_n}(t) \sim \ln\frac{K\big(\frac{it}{\sigma_n}\big)}{K(0)} - n\,\ln\frac{\rho_3\big(\frac{it}{\sigma_n}\big)}{\rho_3(0)} - i\frac{\mu_n}{\sigma_n}t. \tag{4.19}$$
Expanding $g(s) = \ln\frac{\rho_3(s)}{\rho_3(0)}$ in its Taylor series at $s = 0$ (note that $g(0) = 0$ holds) yields
$$\ln\frac{\rho_3\big(\frac{it}{\sigma_n}\big)}{\rho_3(0)} = \frac{\rho_3'(0)}{\rho_3(0)}\,\frac{it}{\sigma_n} - \left[\frac{\rho_3''(0)}{\rho_3(0)} - \Big(\frac{\rho_3'(0)}{\rho_3(0)}\Big)^2\right]\frac{t^2}{2\sigma_n^2} + O\left(\Big(\frac{it}{\sigma_n}\Big)^3\right) \tag{4.20}$$
and therefore
$$\ln f_{X_n}(t) \sim \ln\frac{K\big(\frac{it}{\sigma_n}\big)}{K(0)} - n\left\{\frac{\rho_3'(0)}{\rho_3(0)}\,\frac{it}{\sigma_n} - \frac{1}{2}\left[\frac{\rho_3''(0)}{\rho_3(0)} - \Big(\frac{\rho_3'(0)}{\rho_3(0)}\Big)^2\right]\frac{t^2}{\sigma_n^2} + O\left(\Big(\frac{it}{\sigma_n}\Big)^3\right)\right\} - \frac{i\mu_n t}{\sigma_n}. \tag{4.21}$$
Claim 2 implies that $\Xi_3(z,s) = O\left((\rho_3(s)-z)^4 \ln\frac{1}{\rho_3(s)-z}\right)$ is analytic in $s$, where $s$ is contained in a disc of radius $\epsilon$ around 0. Hence $\Xi_3(z,s)$ is in particular continuous in $s$ for $|s| < \epsilon$ and we can conclude from eq. (4.14) for fixed $t \in\, ]-\infty,\infty[$
$$\lim_{n\to\infty}\left(\ln K\Big(\frac{it}{\sigma_n}\Big) - \ln K(0)\right) = 0. \tag{4.22}$$
In view of eq. (4.21) we introduce
$$\mu = -\frac{\rho_3'(0)}{\rho_3(0)}, \qquad \sigma^2 = \Big(\frac{\rho_3'(0)}{\rho_3(0)}\Big)^2 - \frac{\rho_3''(0)}{\rho_3(0)},$$
together with $\mu_n = n\mu$ and $\sigma_n^2 = n\sigma^2$. The terms that are linear in $t$ then cancel and eq. (4.21) becomes
$$\ln f_{X_n}(t) \sim -\frac{t^2}{2} + O\left(\Big(\frac{it}{\sigma_n}\Big)^3\right) \tag{4.23}$$
with uniform error term for $t$ from any bounded interval.
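The constants $\mu$ and $\sigma^2$ introduced above depend only on $\rho_3$ and its first two derivatives at $s = 0$, so they can be cross-checked numerically. The following sketch approximates the derivatives by central finite differences; the step size `h` and the function names are assumptions made for illustration, and the closed form of $\rho_3(s)$ is the one given in the proof of Claim 2.

```python
import math


def rho3(s):
    """rho_3(s) for real s, using the closed form from the proof of Claim 2."""
    w = math.exp(s / 2)
    return (4 * w + 1 - math.sqrt(12 * w * w + 8 * w + 1)) / (2 * w * w)


def mu_sigma2(h=1e-4):
    """Approximate mu = -rho'/rho and sigma^2 = (rho'/rho)^2 - rho''/rho at s = 0."""
    r0 = rho3(0.0)
    r_plus, r_minus = rho3(h), rho3(-h)
    d1 = (r_plus - r_minus) / (2 * h)           # central difference for rho_3'(0)
    d2 = (r_plus - 2 * r0 + r_minus) / (h * h)  # central difference for rho_3''(0)
    mu = -d1 / r0
    sigma2 = (d1 / r0) ** 2 - d2 / r0
    return mu, sigma2


if __name__ == "__main__":
    print(mu_sigma2())
```

The two returned numbers can be compared with the values $0.39089$ and $0.041565$ stated in eq. (4.7).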
This is equivalent to $\lim_{n\to\infty} f_{X_n}(t) = \exp\left(-\frac{t^2}{2}\right)$ with uniform error term. The Lévy-Cramér Theorem (Theorem 4) now implies eq. (4.6), and it remains to compute the values of $\mu$ and $\sigma^2$, which are given by
$$\mu = -\frac{\rho_3'(0)}{\rho_3(0)} = -\frac{-\frac{3}{2} + \frac{13}{42}\sqrt{21}}{\frac{5}{2} - \frac{1}{2}\sqrt{21}} = 0.39089 \tag{4.24}$$
$$\sigma^2 = \mu^2 - \frac{\rho_3''(0)}{\rho_3(0)} = \mu^2 - \frac{1 - \frac{94}{441}\sqrt{21}}{\frac{5-\sqrt{21}}{2}} = 0.041565 \tag{4.25}$$
whence eq. (4.7), and the proof of Theorem 5 is complete.

5. The local limit theorem

In this section we complement the central limit theorem presented in the previous section,
$$\lim_{n\to\infty}\mathbb{P}\left(\frac{X_n - \mu_n}{\sigma_n} < x\right) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{t^2}{2}}\,dt,$$
by considering a "local" perspective on the limiting distribution of $X_n$. For the local limit theorem we analyze the difference between $\mathbb{P}\left(x \le \frac{X_n - \mu_n}{\sigma_n} < x+1\right)$ and $\frac{1}{\sqrt{2\pi}}\int_{x}^{x+1} e^{-\frac{t^2}{2}}\,dt$ as $n$ tends to infinity. $X_n$ satisfies a local limit theorem on some set $S \subset \mathbb{R}$ if and only if
$$\lim_{n\to\infty}\sup_{x\in S}\left|\sigma_n\,\mathbb{P}\left(\frac{X_n - \mu_n}{\sigma_n} = x\right) - \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\right| = 0. \tag{5.1}$$
One key condition for proving a local limit theorem, formulated in eq. (5.2) of Theorem 6 below, is given by
$$\frac{\varphi_n(s)}{\varphi_n(0)} \sim \exp\big(M(s)\beta_n + N(s)\big),$$
where $M(s)$ is differentiable and $N(s)$ is continuous in some $\epsilon$-disc centered at 0. In view of eq. (4.17) and eq. (4.18) in the proof of the central limit theorem, this condition alone implies the central limit theorem. In other words, the local limit theorem implies the central limit theorem. We have shown in the introduction that a central limit theorem does not imply a local limit theorem. Bender observed in [2] that the central limit theorem combined with certain smoothness conditions does imply the local limit theorem. Accordingly, in order to prove the local limit theorem for 3-noncrossing RNA structures with $h$ arcs, our strategy will consist in verifying such smoothness conditions [9].

Theorem 6.
Let $\varphi_n(s) = \sum_k a_{n,k}\,w^k$ and $w = e^s$. Suppose
$$\frac{\varphi_n(s)}{\varphi_n(0)} \sim \exp\big(M(s)\beta_n + N(s)\big) \tag{5.2}$$
holds uniformly for $|s| \le \tau$, $s \in \mathbb{C}$ and $\tau > 0$, where the following conditions are satisfied:
(i) $M(s)$ is differentiable and $N(s)$ is continuous in $|s| < \epsilon$, and furthermore $M(s)$ and $N(s)$ are independent of $n$;
(ii) $\beta_n$ is independent of $t$, $\beta_n \to \infty$ and $M''(0) > 0$;
(iii) there exist constants $\delta$ and $c = c(\delta, r) > 0$, where $0 < \delta \le \tau$, such that
$$\frac{\varphi_n(r+it)}{\varphi_n(r)} = O\big(\exp(-c\,\beta_n)\big) \tag{5.3}$$
holds uniformly for $-\tau \le r \le \tau$ and $\delta \le |t| \le \pi$ as $n$ tends to infinity.
Then the random variable $X_n$ having distribution $\mathbb{P}(X_n = k) = a_{n,k}/a_n$, with mean $M'(0)\beta_n$ and variance $M''(0)\beta_n$, satisfies a local limit theorem on the real set $S = \{x \mid x = o(\sqrt{\beta_n})\}$, i.e.
$$\lim_{n\to\infty}\sup_{x\in S}\left|\sigma_n\,\mathbb{P}\left(\frac{X_n - \mu_n}{\sigma_n} = x\right) - \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\right| = 0. \tag{5.4}$$

With the help of Theorem 6 we can now prove the local limit theorem for 3-noncrossing RNA structures with $h$ arcs.

Theorem 7. Let $S_3'(n,h)$ be the number of 3-noncrossing RNA structures with exactly $h$ arcs. Let $X_n$ be the r.v. having the distribution
$$\forall\, h = 0, 1, \dots, \lfloor n/2 \rfloor, \quad \mathbb{P}(X_n = h) = \frac{S_3'(n,h)}{S_3(n)}. \tag{5.5}$$
Then we have for the set $S = \{x \mid x = o(\sqrt{n})\}$
$$\lim_{n\to\infty}\sup_{x\in S}\left|\sqrt{\sigma^2 n}\;\mathbb{P}\left(\frac{X_n - n\mu}{\sqrt{\sigma^2 n}} = x\right) - \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\right| = 0, \tag{5.6}$$
where $\mu = 0.39089$ and $\sigma^2 = 0.041565$.

Proof. We will show that $\left\{\frac{S_3'(n,h)}{S_3(n)}\right\}$ satisfies the conditions of Theorem 6 for $|s| \le \epsilon$, where $\epsilon$ is sufficiently small but fixed. The crucial equation implying the conditions of Theorem 6 is eq. (4.14) of the proof of Theorem 5:
$$\varphi_{n,3}(s) = K(s)\,\frac{4!}{n(n-1)\cdots(n-4)}\,\big(\rho_3(s)^{-1}\big)^n\left(1 - O\Big(\frac{1}{n}\Big)\right), \quad K(s) \in \mathbb{C},$$
which holds uniformly for $|s| < \epsilon$.
Therefore we have
$$\frac{\varphi_{n,3}(s)}{\varphi_{n,3}(0)} = \frac{K(s)}{K(0)}\left(\frac{\rho_3(0)}{\rho_3(s)}\right)^n\left(1 - O\Big(\frac{1}{n}\Big)\right) \sim \exp\left(n\,\ln\frac{\rho_3(0)}{\rho_3(s)} + \ln\frac{K(s)}{K(0)}\right) \tag{5.7}$$
uniformly for $|s| < \epsilon$. We set
$$\beta_n = n, \qquad M(s) = \ln\frac{\rho_3(0)}{\rho_3(s)} \quad\text{and}\quad N(s) = \ln\frac{K(s)}{K(0)}. \tag{5.8}$$
By construction $\beta_n = n$ is independent of $t$, clearly $\beta_n \to \infty$, and $M(s)$ is differentiable and $N(s)$ is continuous for all $s$ such that $|s| < \epsilon$. In addition $M''(s)$ is analytic for $|s| < \epsilon$ and we have $M''(0) = \mu^2 - \frac{1 - \frac{94}{441}\sqrt{21}}{\frac{5-\sqrt{21}}{2}} = 0.041565 > 0$. Let $\delta = \epsilon$ and $-\epsilon \le r \le \epsilon$. We obtain
$$\frac{\varphi_{n,3}(r+it)}{\varphi_{n,3}(r)} \sim \exp\left(n\,\ln\frac{\rho_3(r)}{\rho_3(r+it)} + \ln\frac{K(r+it)}{K(r)}\right)$$
uniformly for $-\epsilon \le r \le \epsilon$ and $\epsilon \le |t| \le \pi$. Since $\frac{K(r+it)}{K(r)}$ yields a constant factor and $K(s)$ is continuous for $|s| < \epsilon$, it suffices to analyze $\ln\frac{\rho_3(r)}{\rho_3(r+it)}$. We observe that
$$\rho_3(s) = \frac{1 + 4e^{\frac{s}{2}} - \sqrt{12e^s + 8e^{\frac{s}{2}} + 1}}{2e^s} \ne 0$$
for any complex $s$ with $|s| < \epsilon$. The singularities of $\ln\frac{\rho_3(0)}{\rho_3(s)}$ correspond to the zeros of $12e^s + 8e^{\frac{s}{2}} + 1 = (2e^{\frac{s}{2}} + 1)(6e^{\frac{s}{2}} + 1)$, that is $e^{\frac{s}{2}} = -\frac{1}{2}$ or $-\frac{1}{6}$. Observe that for $|s| < \epsilon$, $|e^{\frac{s}{2}}|$ is close to 1. Therefore $\ln\frac{\rho_3(r)}{\rho_3(r+it)}$ is analytic for any $\epsilon \le |t| \le \pi$ and $r \in\, ]-\epsilon,\epsilon[$ and we can conclude
$$\frac{\varphi_{n,3}(r+it)}{\varphi_{n,3}(r)} = O\left(\exp\Big(\mathrm{Re}\Big(n\cdot\ln\frac{\rho_3(r)}{\rho_3(r+it)}\Big)\Big)\right)$$
uniformly for $-\epsilon \le r \le \epsilon$ and $\epsilon \le |t| \le \pi$. Taylor expansion of $\ln\frac{\rho_3(r)}{\rho_3(r+it)}$ in $t$ at 0 shows (see eq. (4.20)) that the dominant real part of $\ln\frac{\rho_3(r)}{\rho_3(r+it)}$ is given by
$$\left[\frac{\rho_3''(r)}{\rho_3(r)} - \Big(\frac{\rho_3'(r)}{\rho_3(r)}\Big)^2\right]\frac{t^2}{2!} < 0 \quad\text{for } r \in\, ]-\epsilon,\epsilon[,$$
in agreement with $M''(0) > 0$. Setting $c_1 = \left[\Big(\frac{\rho_3'(r)}{\rho_3(r)}\Big)^2 - \frac{\rho_3''(r)}{\rho_3(r)}\right]\frac{\pi^2}{2!} > 0$ and $c_2 = \left[\Big(\frac{\rho_3'(r)}{\rho_3(r)}\Big)^2 - \frac{\rho_3''(r)}{\rho_3(r)}\right]\frac{\delta^2}{2!} > 0$ we can conclude
$$\frac{\varphi_{n,3}(r+it)}{\varphi_{n,3}(r)} = O\big(\exp(-c\cdot n)\big)$$
for some $0 < c \le c_2 < c_1$, uniformly for $-\epsilon \le r \le \epsilon$ and $\epsilon \le |t| \le \pi$. Hence Theorem 6 applies, whence Theorem 7.

6. Appendix

Proof of Lemma 1. First we observe that for $x, w \in [-1,1]$ the term $w^2x^2 - x + 1$ is strictly positive. We set
$$F_k(x,w) = \sum_{n\ge 0}\sum_{h\le n/2} S_k'(n,h)\,w^{2h}x^n \tag{6.1}$$
and compute
$$\begin{aligned}
F_k(x,w) &= \sum_{n\ge 0}\sum_{h\le n/2}\sum_{j=0}^{h} (-1)^j\binom{n-j}{j}\binom{n-2j}{2(h-j)} f_k(2(h-j),0)\,w^{2h}x^n \\
&= \sum_{n\ge 0}\sum_{j\le n/2}\sum_{h=j}^{n/2} (-1)^j\binom{n-j}{j}\binom{n-2j}{2(h-j)} f_k(2(h-j),0)\,w^{2h}x^n \\
&= \sum_{j\ge 0}\sum_{n\ge 2j}\sum_{h=j}^{n/2} (-1)^j\binom{n-j}{j}\binom{n-2j}{2(h-j)} f_k(2(h-j),0)\,w^{2h}x^n \\
&= \sum_{j\ge 0}\frac{(-1)^j (wx)^{2j}}{j!}\sum_{n\ge 2j}(n-j)!\sum_{h=j}^{n/2}\binom{n-2j}{2(h-j)} f_k(2(h-j),0)\,\frac{w^{2(h-j)}}{(n-2j)!}\,x^{n-2j}.
\end{aligned}$$
We shift the summation indices, $n' = n - 2j$ and $h' = h - j$, and derive for the right-hand side the expression
$$\begin{aligned}
&= \sum_{j\ge 0}\frac{(-1)^j (wx)^{2j}}{j!}\sum_{n'\ge 0}(n'+j)!\sum_{h=j}^{n/2}\binom{n'}{2(h-j)} f_k(2(h-j),0)\,\frac{w^{2(h-j)}}{n'!}\,x^{n-2j} \\
&= \sum_{j\ge 0}\frac{(-1)^j (wx)^{2j}}{j!}\sum_{n'\ge 0}(n'+j)!\sum_{h'=0}^{n'/2}\binom{n'}{2h'} f_k(2h',0)\,w^{2h'}\,\frac{x^{n'}}{n'!},
\end{aligned}$$
where the upper limit is $n/2 - j = n'/2$. The idea is now to interpret the term $\sum_{h'=0}^{n'/2}\binom{n'}{2h'} f_k(2h',0)\,w^{2h'}\frac{x^{n'}}{n'!}$ as a coefficient of the product of the two power series $e^x$ and $\sum_{n\ge 0} f_k(2n,0)\frac{(wx)^{2n}}{(2n)!}$:
$$\left(\sum_{\ell\ge 0}\frac{x^\ell}{\ell!}\right)\left(\sum_{n\ge 0} f_k(2n,0)\frac{(wx)^{2n}}{(2n)!}\right) = \sum_{n'\ge 0}\left\{\sum_{2n+\ell = n'}\frac{1}{\ell!}\,\frac{1}{(2n)!}\,f_k(2n,0)\,w^{2n}\right\}x^{n'} = \sum_{n'\ge 0}\left\{\sum_{n=0}^{n'/2}\binom{n'}{2n} f_k(2n,0)\,w^{2n}\right\}\frac{x^{n'}}{n'!}.$$
We set $\eta_{n'} = \left\{\sum_{h'=0}^{n'/2}\binom{n'}{2h'} f_k(2h',0)\,w^{2h'}\right\}$. By assumption we have $|x| < \rho_k(w)$ and we next derive, using the Laplace transformation and interchanging integration and summation,
$$\sum_{n'\ge 0}(n'+j)!\,\eta_{n'}\,\frac{x^{n'}}{n'!} = \int_0^\infty \sum_{n'\ge 0}\eta_{n'}\,\frac{(xt)^{n'}}{n'!}\,t^j e^{-t}\,dt. \tag{6.2}$$
Since $|x| < \rho_k(w)$ the above transformation is valid, and using
$$\sum_{n'\ge 0}\left\{\sum_{n=0}^{n'/2}\binom{n'}{2n} f_k(2n,0)\,w^{2n}\right\}\frac{x^{n'}}{n'!} = \left(\sum_{\ell\ge 0}\frac{x^\ell}{\ell!}\right)\left(\sum_{n\ge 0} f_k(2n,0)\frac{(wx)^{2n}}{(2n)!}\right) \tag{6.3}$$
we accordingly obtain
$$\int_0^\infty \sum_{n'\ge 0}\eta_{n'}\,\frac{(xt)^{n'}}{n'!}\,t^j e^{-t}\,dt = \int_0^\infty e^{tx}\sum_{n\ge 0} f_k(2n,0)\frac{(wxt)^{2n}}{(2n)!}\,t^j e^{-t}\,dt. \tag{6.4}$$
The next step is to substitute the term $\sum_{n'\ge 0}(n'+j)!\,\eta_{n'}\frac{x^{n'}}{n'!}$ of eq. (6.2), whence
$$\begin{aligned}
F_k(x,w) &= \sum_{j\ge 0}\frac{(-1)^j (wx)^{2j}}{j!}\int_0^\infty e^{tx}\sum_{n\ge 0} f_k(2n,0)\frac{(wxt)^{2n}}{(2n)!}\,t^j e^{-t}\,dt \\
&= \int_0^\infty \sum_{j\ge 0}\frac{(-1)^j (wx)^{2j}}{j!}\,e^{tx}\sum_{n\ge 0} f_k(2n,0)\frac{(wxt)^{2n}}{(2n)!}\,t^j e^{-t}\,dt.
\end{aligned}$$
The summation over the index $j$ is just an exponential function and we derive
$$\begin{aligned}
&= \int_0^\infty e^{-(w^2x^2 - x + 1)t}\sum_{n\ge 0} f_k(2n,0)\frac{(wxt)^{2n}}{(2n)!}\,dt \\
&= \int_0^\infty e^{-(w^2x^2 - x + 1)t}\sum_{n\ge 0} f_k(2n,0)\,\frac{1}{(2n)!}\left(\frac{wx}{w^2x^2 - x + 1}\right)^{2n}\big((w^2x^2 - x + 1)t\big)^{2n}\,dt.
\end{aligned}$$
We proceed by transforming the integral, introducing $u = (w^2x^2 - x + 1)t$, i.e. $dt = (w^2x^2 - x + 1)^{-1}du$, and accordingly arrive at
$$\begin{aligned}
F_k(x,w) &= \sum_{n\ge 0} f_k(2n,0)\,\frac{1}{(2n)!}\left(\frac{wx}{w^2x^2 - x + 1}\right)^{2n}\int_0^\infty e^{-(w^2x^2 - x + 1)t}\big((w^2x^2 - x + 1)t\big)^{2n}\,dt \\
&= \sum_{n\ge 0} f_k(2n,0)\,\frac{1}{(2n)!}\left(\frac{wx}{w^2x^2 - x + 1}\right)^{2n}\frac{1}{w^2x^2 - x + 1}\,(2n)! \\
&= \frac{1}{w^2x^2 - x + 1}\sum_{n\ge 0} f_k(2n,0)\left(\frac{wx}{w^2x^2 - x + 1}\right)^{2n}.
\end{aligned}$$
In particular, for $w = 1$,
$$\sum_{n\ge 0} S_k(n)\,x^n = \frac{1}{x^2 - x + 1}\sum_{n\ge 0} f_k(2n,0)\left(\frac{x}{x^2 - x + 1}\right)^{2n} \tag{6.5}$$
holds for any $x \in \mathbb{R}$ satisfying $|x| < \rho_k(1)$, where $\rho_k(1)$ is the radius of convergence of the power series $\sum_{n\ge 0} S_k(n)\,z^n$ over $\mathbb{C}$; that is, eq. (6.5) holds for $x \in\, ]-\rho_k(1), \rho_k(1)[$. From complex analysis we know that any two functions that are analytic at 0 and coincide on an open interval which includes 0 are identical. Therefore eq.
(6.5) holds for $z \in \mathbb{C}$, $|z| < \rho_k(1)$, and the proof of the lemma is complete.

Acknowledgments. We are grateful to Prof. Jason Gao for helpful discussions. This work was supported by the 973 Project, the PCSIRT Project of the Ministry of Education, the Ministry of Science and Technology, and the National Science Foundation of China.

References

[1] Mapping RNA form and function. Science, 2, 2005.
[2] E.A. Bender. Central and local limit theorem applied to asymptotic enumeration. J. Combin. Theory A, 15:91–111, 1973.
[3] M. Chamorro, N. Parkin, and H.E. Varmus. An RNA pseudoknot and an optimal heptameric shift site are required for highly efficient ribosomal frameshifting on a retroviral messenger RNA. J. Proc. Natl. Acad. Sci. USA, 89:713–717, 1991.
[4] W.Y.C. Chen, E.Y.P. Deng, R.R.X. Du, R.P. Stanley, and C.H. Yan. Crossings and nestings of matchings and partitions. Trans. Amer. Math. Soc., 359:1555–1575, 2007.
[5] P. Flajolet, J.A. Fill, and N. Kapur. Singularity analysis, Hadamard products, and tree recurrences. J. Comp. Appl. Math., 174:271–313, 2005.
[6] Z. Gao and L.B. Richmond. Central and local limit theorems applied to asymptotic enumeration. J. Appl. Comput. Anal., 41:177–186, 1992.
[7] I.M. Gessel and D. Zeilberger. Random walk in a Weyl chamber. Proc. Amer. Math. Soc., 115:27–31, 1992.
[8] I.L. Hofacker, P. Schuster, and P.F. Stadler. Combinatorics of RNA secondary structures. Discr. Appl. Math., 88:207–237, 1998.
[9] H.K. Hwang. Large deviations of combinatorial distributions. II. Local limit theorems. Ann. Appl. Probab., 8(1):163–181, 1998.
[10] E.Y. Jin, J. Qin, and C.M. Reidys. Combinatorics of RNA structures with pseudoknots. Bull. Math. Biol., 2007. In press.
[11] E.Y. Jin and C.M. Reidys. Asymptotics of RNA structures with pseudoknots. Bull. Math. Biol., 2007. Submitted.
[12] D.A.M. Konings and R.R. Gutell. A comparison of thermodynamic foldings with comparatively derived structures of 16s and 16s-like rRNAs. RNA, 1:559–574, 1995.
[13] B. Lindstroem. On the vector representation of induced matroids. Bull. London Math. Soc., 5:85–90, 1973.
[14] A. Loria and T. Pan. Domain structure of the ribozyme from eubacterial ribonuclease P. RNA, 2:551–563, 1996.
[15] J.S. McCaskill. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29:1105–1119, 1990.
[16] A.M. Odlyzko. Handbook of Combinatorics, chapter 22. Elsevier, 1995.
[17] W.R. Schmitt and M.S. Waterman. Linear trees and RNA secondary structure. Discr. Appl. Math., 51:317–323, 1994.
[18] M. Tacker, W. Fontana, P.F. Stadler, and P. Schuster. Statistics of RNA melting kinetics. Eur. Biophysics J., 23:29–38, 1994.
[19] M. Tacker, P.F. Stadler, E.G. Bauer, I.L. Hofacker, and P. Schuster. Algorithm independent properties of RNA secondary structure predictions. Eur. Biophy. J., 25:115–130, 1996.
[20] C. Tuerk, S. MacDougal, and L. Gold. RNA pseudoknots that inhibit human immunodeficiency virus type 1 reverse transcriptase. Proc. Natl. Acad. Sci. USA, 89:6988–6992, 1992.
[21] M.S. Waterman. Secondary structure of single-stranded nucleic acids. Adv. Math. I (suppl.), 1:167–212, 1978.
[22] M.S. Waterman and T.F. Smith. Rapid dynamic programming algorithms for RNA secondary structure. Adv. Appl. Math., 7:455–464, 1986.
[23] E. Westhof and L. Jaeger. RNA pseudoknots. Current Opinion Struct. Biol., 2:327–333, 1992.
[24] M. Zuker and D. Sankoff. RNA secondary structures and their prediction. Bull. Math. Bio., 46(4):591–621, 1984.

Center for Combinatorics, LPMC-TJKLC, Nankai University, Tianjin 300071, P.R. China, Phone: *86-22-2350-6800, Fax: *86-22-2350-9272
E-mail address: reidys@nankai.edu.cn