A proposal to a generalised splicing with a self assembly approach

Theory of splicing is an abstract model of the recombinant behaviour of DNAs. In a splicing system, two strings to be spliced are taken from the same set and the splicing rule is from another set. Here we propose a generalised splicing (GS) model wit…

Authors: L. Jeganathan, R. Rama, Ritabrata Sengupta

Department of Mathematics, Indian Institute of Technology, Chennai 600 036, India
lj, ramar, rits@iitm.ac.in

Abstract. Theory of splicing is an abstract model of the recombinant behaviour of DNAs. In a splicing system, two strings to be spliced are taken from the same set and the splicing rule is from another set. Here we propose a generalised splicing (GS) model with three components, two strings from two languages and a splicing rule from a third component. We propose a generalised self assembly (GSA) of strings. Two strings $u_1 x v_1$ and $u_2 x v_2$ self assemble over $x$ and generate $u_1 x v_2$ and $u_2 x v_1$. We study the relationship between GS and GSA. We study some classes of generalised splicing languages with the help of generalised self assembly.

1 Introduction

Tom Head proposed [5] an operation called 'splicing' for describing the recombination of DNA sequences under the application of restriction enzymes and ligases. Given two strings $u\alpha\beta v$ and $u'\alpha'\beta'v'$ over some alphabet $V$ and a splicing rule $\alpha \# \beta \$ \alpha' \# \beta'$, two strings $u\alpha\beta'v'$ and $u'\alpha'\beta v$ are produced. The splicing rule $\alpha \# \beta \$ \alpha' \# \beta'$ means that the first string is cut between $\alpha$ and $\beta$ and the second string is cut between $\alpha'$ and $\beta'$, and the fragments recombine crosswise.

The splicing scheme (also written as H-scheme) is a pair $\sigma = (V, R)$ where $V$ is an alphabet and $R \subseteq V^* \# V^* \$ V^* \# V^*$ is the set of splicing rules. Starting from a language, we generate a new language by the iterated application of the splicing rules in $R$. Here $R$ can be infinite. Thus $R$ can be considered as a language over $V \cup \{\#, \$\}$. A splicing language (a language generated by splicing) depends upon the class of the language (in the Chomskian hierarchy) to be spliced and the type of the splicing rules to be applied.
The class of splicing languages $H(FL_1, FL_2)$ consists of the languages generated by taking any two strings from a language in $FL_1$ and splicing them by rules taken from a language in $FL_2$. $FL_1$ and $FL_2$ can be any classes of languages in the Chomskian hierarchy. Detailed investigations on the computational power of splicing are found in [9].

Theory of splicing is an abstract model of the recombinant behaviour of DNAs. In a splicing system, the two strings to be spliced are taken from the same set and the splicing rule is from another set. The reason for taking the two strings from the same set is that, in DNA recombination, both the objects to be spliced are DNAs. For example, a splicing language in the class $H(FIN, REG)$ is a language generated by taking two strings from a finite language and using strings from a regular language as the splicing rules. Any general 'cut' and 'connection' model should include the cutting of two strings taken from two different languages. The strings spliced and the splicing rules have an effect on the language generated by the splicing process. In short, we view a splicing model as having three components: two strings from two languages as the first two components, and a splicing rule as the third component. Our proposal of a generalised splicing model (a formal definition of GS, generalised splicing, is given in section 2, definition 1) will be:

$GS(L_1, L_2, L_3) := \{ z_1, z_2 : (x, y) \models_r (z_1, z_2),\ x \in L_1,\ y \in L_2,\ r \in L_3 \}$.

Instead of taking two strings from the same language, as is done in the theory of splicing, we take them from two different languages. We cut them by using rules from a third language. This means that, taking an arbitrary word $w_1 \in L_1$ and an arbitrary word $w_2 \in L_2$, we cut them by using an arbitrary rule of $L_3$. If $L_1 = L_2$ in the generalised splicing model, we get the usual H-system.
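The proposed operation can be made concrete with a minimal Python sketch. It assumes a rule $\alpha \# \beta \$ \alpha' \# \beta'$ is encoded as a 4-tuple of strings and reads $(x, y) \models_r (z_1, z_2)$ as in the Introduction: cut $x$ between $\alpha$ and $\beta$, cut $y$ between $\alpha'$ and $\beta'$, and recombine crosswise. The names `splice` and `generalised_splicing` are illustrative and do not come from the paper; only finite sets of words and rules are handled.

```python
from typing import Iterable, Set, Tuple

# Hypothetical encoding: (alpha, beta, alpha2, beta2) stands for alpha#beta$alpha2#beta2.
Rule = Tuple[str, str, str, str]


def splice(x: str, y: str, rule: Rule) -> Set[str]:
    """Return every word obtained by one application of `rule` to (x, y)."""
    a1, b1, a2, b2 = rule
    site1, site2 = a1 + b1, a2 + b2
    out: Set[str] = set()
    for i in range(len(x) - len(site1) + 1):
        if not x.startswith(site1, i):
            continue
        cut1 = i + len(a1)  # x is cut between alpha and beta
        for j in range(len(y) - len(site2) + 1):
            if not y.startswith(site2, j):
                continue
            cut2 = j + len(a2)  # y is cut between alpha' and beta'
            out.add(x[:cut1] + y[cut2:])  # u alpha beta' v'
            out.add(y[:cut2] + x[cut1:])  # u' alpha' beta v
    return out


def generalised_splicing(L1: Iterable[str], L2: Iterable[str],
                         L3: Iterable[Rule]) -> Set[str]:
    """One-step GS(L1, L2, L3) for finite L1, L2 and a finite rule set L3."""
    rules = list(L3)
    result: Set[str] = set()
    for x in L1:
        for y in L2:
            for r in rules:
                result |= splice(x, y, r)
    return result


# The rule x#$x# cuts both words right after the common factor x.
print(sorted(splice("aXb", "cXd", ("X", "", "X", ""))))  # ['aXd', 'cXb']
```

Taking the two input sets equal gives the one-step behaviour of an ordinary H-scheme, which matches the remark that $L_1 = L_2$ recovers the usual H-system.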
The motivation of the above proposal of a generalised theory of splicing comes from the self assembly of strings [4]. Two strings $uv$ and $vw$ self assemble over $v$ and generate $uvw$. Here, the overlapping string appears at the end of one string and at the beginning of the other. Then comes the question: what will be the generalisation if we do not restrict the overlapping strings to be at the end (or the beginning) of the strings that participate in the assembling process? As an answer to this question, we propose a generalised self assembly (GSA) of two strings (definition 2). Two strings $u_1 x v_1$ and $u_2 x v_2$ self assemble over the sub-string $x$ and generate the strings $u_1 x v_2$ and $u_2 x v_1$, as illustrated in the right hand side of figure 1.

[Fig. 1. Equivalence of generalised splicing and self assembly, with $w_1 = u_1 x v_1$ and $w_2 = u_2 x v_2$.]

The generated words indicate that the $x$-self assembly of $w_1$ and $w_2$ (self assembly with $x$ as the overlapping string) is just a generalised splicing of $w_1$ and $w_2$ with the splicing rule $x \# \$ x \#$. We take advantage of this equivalence of GS and GSA and plan to investigate the generalised splicing for some classes of languages in the Chomskian hierarchy.
Since an investigation of the classes of languages under the generalised splicing model is going to be more complicated than for the existing H-system, we narrow down the investigation of the generalised splicing model by taking $L_3 = V^+ \cup \{ (w_1, w_2) : w_1 \in L_1, w_2 \in L_2 \}$ ($V$ is the set of common symbols that appear in $L_1$ and $L_2$), which constitutes the set of splicing rules: a word $w \in V^+$ indicates that the splicing rule will be $w \# \$ w \#$, where $\#$ and $\$$ have the usual meanings as in the H-system; a pair of words $(w_1, w_2) \in L_3$ indicates that the splicing rule will be of the form $w_1 \# \$ w_2 \#$. The very purpose of including the pair $(w_1, w_2)$ in $L_3$ is to include the words that are being spliced in the set of words generated by the GS. The necessity of including the parent words is discussed at the end of section 2.

Though the whole theory of splicing can be rewritten with the generalised splicing system, in this paper we investigate $GS(L_1, L_2, L_3)$ for $L_1, L_2 \in \{REG, LIN, CF\}$, with $L_3$ as given in the previous paragraph. For this investigation, we define the GSA of automata, regular grammars, linear grammars and context free grammars (apart from the GSA of two languages). In this paper, section 2 discusses the definitions of GS and GSA. The subsequent sections discuss the generalised self assembly of finite languages, regular languages, linear languages and context free languages.

2 Definitions

Throughout this paper, we follow the terminologies and the notations as in [2], [9].

Definition 1 (Generalised splicing scheme). A generalised splicing scheme is defined as a triplet $\sigma_G = (V_1, V_2, R)$, where $V_1, V_2$ are alphabets, and $R \subseteq V_1^* \# V_1^* \$ V_2^* \# V_2^*$. Here $R$ can be infinite, and $R$ is considered as a set of strings, hence a language.
For a given $\sigma_G$ and languages $L_1 \subseteq V_1^*$ and $L_2 \subseteq V_2^*$, we define

$\sigma_G(L_1, L_2) = \{ z_1, z_2 : (x, y) \models_r (z_1, z_2), \text{ for } x \in L_1,\ y \in L_2,\ r \in R \}$.

Given three families $FL_1, FL_2, FL_3$, we define

$GS(FL_1, FL_2, FL_3) = \{ \sigma_G(L_1, L_2) : L_1 \in FL_1,\ L_2 \in FL_2,\ R \in FL_3 \}$,

i.e. $GS(FL_1, FL_2, FL_3)$ is the family of languages generated by splicing a language of $FL_1$ and a language of $FL_2$, using a set of splicing rules in $FL_3$.

Note 1. Whenever we refer to 'generalised splicing', we mean generalised 2-splicing.

Definition 2 (Generalised self assembly). Let $w_1 \in L_1$, $w_2 \in L_2$ be any two words. The generalised $x$-self assembly operation $GSA_x(w_1, w_2)$ over $x \in \mathrm{sub}(w_1) \cap \mathrm{sub}(w_2)$, $x \neq \varepsilon$, is defined as follows:

$GSA_x(w_1, w_2) = \{ u_1 x v_1,\ u_2 x v_2,\ u_1 x v_2,\ u_2 x v_1 : w_1 = u_1 x v_1,\ w_2 = u_2 x v_2 \}$.

The self assembled words are the words that are generated when we trace from a left corner to a right corner in figure 2.

[Fig. 2. Superimpose over the common sub-string $x$ (overlapping string $x \in \Sigma^+$), with $w_1 = u_1 x v_1$ and $w_2 = u_2 x v_2$.]

Given any two languages $L_1$ and $L_2$ over the alphabets $V_1$ and $V_2$ respectively, we define

$GSA(w_1, w_2) := \bigcup_{x} GSA_x(w_1, w_2)$

and

$GSA(L_1, L_2) := \bigcup_{w_1 \in L_1,\ w_2 \in L_2} GSA(w_1, w_2)$.

Though a self assembly process will not include the parent words $w_1$ and $w_2$ (when $w_1 \neq w_2$), in the above definition we purposefully include the parent words for the sake of more clarity in studying the GS through the GSA approach, i.e. we plan to investigate $GS(L_1, L_2, L_3)$ where $L_3 = V^+ \cup \{ (w_1, w_2) : w_1 \in L_1, w_2 \in L_2 \}$ ($V$ is the set of common symbols that appear in $L_1$ and $L_2$). The pair $(w_1, w_2)$ in the set of splicing rules means that $w_1$ will be cut after $w_1$ and $w_2$ will be cut after $w_2$.
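Definition 2 can be sketched in a few lines of Python. The sketch follows the definition literally, so the parent words appear only when $w_1$ and $w_2$ share at least one non-empty sub-string. The function names are illustrative assumptions and the language version only handles finite sets of words.

```python
from typing import Iterable, Iterator, Set, Tuple


def substrings(w: str) -> Set[str]:
    """All non-empty sub-strings of w."""
    return {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)}


def factorisations(w: str, x: str) -> Iterator[Tuple[str, str]]:
    """Yield every pair (u, v) with w == u + x + v."""
    for i in range(len(w) - len(x) + 1):
        if w.startswith(x, i):
            yield w[:i], w[i + len(x):]


def gsa_x(w1: str, w2: str, x: str) -> Set[str]:
    """GSA_x(w1, w2): self assembly of w1 and w2 over the common factor x."""
    out: Set[str] = set()
    for u1, v1 in factorisations(w1, x):
        for u2, v2 in factorisations(w2, x):
            out |= {u1 + x + v1, u2 + x + v2, u1 + x + v2, u2 + x + v1}
    return out


def gsa(w1: str, w2: str) -> Set[str]:
    """GSA(w1, w2): union of GSA_x over all common non-empty sub-strings x."""
    out: Set[str] = set()
    for x in substrings(w1) & substrings(w2):
        out |= gsa_x(w1, w2, x)
    return out


def gsa_languages(L1: Iterable[str], L2: Iterable[str]) -> Set[str]:
    """GSA(L1, L2) for finite languages given as collections of words."""
    L2 = list(L2)
    out: Set[str] = set()
    for w1 in L1:
        for w2 in L2:
            out |= gsa(w1, w2)
    return out


print(sorted(gsa("ab", "ba")))  # ['a', 'ab', 'aba', 'b', 'ba', 'bab']
```

For `gsa("ab", "ba")` the common sub-strings are `a` and `b`; assembling over `a` contributes `a` and `bab`, and assembling over `b` contributes `aba` and `b`, in addition to the two parent words.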
Note that the parent words $w_1$ and $w_2$ are included in $GS(w_1, w_2)$. With the motivation given in section 1 and with the above two definitions, we have the following theorem.

Theorem 1. Let $L_1$ and $L_2$ be any two languages. Let $V = V_{L_1} \cap V_{L_2}$, where $V_{L_1}$ and $V_{L_2}$ are the alphabets of $L_1$ and $L_2$ respectively. Then $GS(L_1, L_2, R) = GSA(L_1, L_2)$, where $R = V^+ \cup \{ (w_1, w_2) : w_1 \in L_1, w_2 \in L_2 \}$.

3 Generalised Self assembly of finite languages

This is the simplest case. Suppose there are two finite languages $L_1$ and $L_2$, containing $n_1$ and $n_2$ words respectively. Given any two words, there can be only finitely many common sub-strings between them, so only finitely many new words can be generated by self assembly. Since the parent languages are finite, the end product $GSA(L_1, L_2)$ contains only a finite number of words. Thus we get the following theorem:

Theorem 2. Self assembly of two finite languages is finite. So we may write $GSA(FIN, FIN) = FIN$.

4 Generalised Self assembly of regular languages

In this section we investigate the behaviour of the self assembly of two regular languages. We know that regular languages can be generated by regular grammars and are also accepted by finite automata. We shall show that the self assembly of any two regular languages is regular. We shall prove it by both the automata and the grammar approach.

4.1 Generalised Self assembly of regular grammar

In this section we describe, given any two regular grammars $G_1, G_2$ of languages $L_1$ and $L_2$ respectively, how to construct a grammar for the self assembly language $GSA(L_1, L_2)$.
Definition 3 (Self assembly of REG grammars). Let $G_i = (N_i, T_i, R_i, S_i)$, $i = 1, 2$, be regular grammars of the languages $L_1 = L(G_1)$ and $L_2 = L(G_2)$, where the $N_i$ are the sets of non-terminals with $N_1 \cap N_2 = \emptyset$, the $T_i$ are the sets of terminals with $T_1 \cap T_2 \neq \emptyset$ (only then can we self assemble), the $S_i$ are the start symbols and the $R_i$ are the sets of rules. The generalised self assembly of $G_1$ and $G_2$, written as $GSA(G_1, G_2)$, is defined as $G = (N_1 \cup N_2 \cup \{S\}, T_1 \cup T_2, R, S)$, $S \notin N_1 \cup N_2$, where $R$ includes the following rules:
1. $S \to S_1$, $S \to S_2$.
2. All the rules of $R_1$ and $R_2$.
3. For $a \in T_1 \cap T_2$, for each pair of rules $A \to aB \in R_1$ and $A' \to aB' \in R_2$, include the rules $A \to aB'$ and $A' \to aB$ in $R$.

Note 2. REG grammars are ones whose rules are of the form $A \to aB$ or $A \to a$, where $A$ and $B$ are non-terminals and $a$ is a terminal. The two rules can be jointly expressed as $A \to a\gamma$, where $\gamma$ is a non-terminal or $\gamma = \varepsilon$. Rule 3 of Definition 3 is applied to rules in this joint form, so $B$ and $B'$ there may also be $\varepsilon$, as Example 1 illustrates.

Example 1. Let $G_1 = (\{S_1\}, \{a, b\}, R_1 = \{S_1 \to aS_1,\ S_1 \to b\}, S_1)$ and $G_2 = (\{S_2\}, \{a, b\}, R_2 = \{S_2 \to bS_2,\ S_2 \to a\}, S_2)$, so that $L_1 = L(G_1) = a^*b$ and $L_2 = L(G_2) = b^*a$. Then the GSA grammar is $G = (\{S, S_1, S_2\}, \{a, b\}, R, S)$, where the rules $R$ are given as

$S \to S_1$, $S_1 \to aS_1 \mid b \mid bS_2 \mid a$,
$S \to S_2$, $S_2 \to aS_1 \mid bS_2 \mid a \mid b$.

Note that the language generated by $G$, $L(G)$, will include the languages $L(G_1)$ and $L(G_2)$. Thus the GSA of two regular grammars is again regular. In the same spirit as the above definition, we define the GSA of linear grammars and the GSA of context free grammars (for this, we consider the Greibach normal form for CFG).

Theorem 3. Let $G_1$ and $G_2$ be any two regular grammars. Then $L(GSA(G_1, G_2)) = GSA(L(G_1), L(G_2))$.

Proof. Part I:
Case I: $w \in L(G_1)$ or $w \in L(G_2)$.
It is trivial, since the rules $R_1$ and $R_2$ are included in $GSA(G_1, G_2)$.

Case II: $w \notin L(G_1)$ and $w \notin L(G_2)$. Let $w \in GSA(L(G_1), L(G_2))$. There exist $w_1 \in L(G_1)$, $w_2 \in L(G_2)$ and $a \in \Sigma_{w_1} \cap \Sigma_{w_2}$ with $w \in GSA(w_1, w_2)$ and $w = uav$, such that $w_1 = uau_1$, $w_2 = v_1av$, where $u \in \mathrm{prefix}(w_1)$, $v_1 \in \mathrm{prefix}(w_2)$, $u_1 \in \mathrm{suffix}(w_1)$, $v \in \mathrm{suffix}(w_2)$. Since $w_1 \in L(G_1)$, there exists a derivation

$S_1 \Rightarrow^*_{G_1} uA \Rightarrow_{G_1} uaB \Rightarrow^*_{G_1} uau_1$, with $A \to aB \in R_1$,

for deriving $w_1 = uau_1$. Similarly there exists a derivation

$S_2 \Rightarrow^*_{G_2} v_1A' \Rightarrow_{G_2} v_1aB' \Rightarrow^*_{G_2} v_1av$, with $A' \to aB' \in R_2$,

for deriving $w_2 = v_1av$. Since $A \to aB \in R_1$ and $A' \to aB' \in R_2$ imply that $A \to aB' \in R(GSA(G_1, G_2))$, we have the derivation

$S \Rightarrow_{GSA(G_1, G_2)} S_1 \Rightarrow^*_{G_1} uA \Rightarrow_{GSA(G_1, G_2)} uaB' \Rightarrow^*_{G_2} uav$,

i.e. $S \Rightarrow^*_{GSA(G_1, G_2)} uav = w$. Hence $w \in L(GSA(G_1, G_2))$, and therefore $GSA(L(G_1), L(G_2)) \subseteq L(GSA(G_1, G_2))$.

Part II: Let $w \in L(GSA(G_1, G_2))$. Without loss of generality, we assume that $w \notin L(G_1)$ and $w \notin L(G_2)$. Since $w \in L(GSA(G_1, G_2))$, $w$ can be expressed as $w = uav$. So there exists a derivation

$S \Rightarrow_{GSA(G_1, G_2)} S_1 \Rightarrow^*_{G_1} uA \Rightarrow_{GSA(G_1, G_2)} uaB' \Rightarrow^*_{G_2} uav$.

Since $A \to aB' \in R(GSA(G_1, G_2))$ but $A \to aB' \notin R_1, R_2$ (because $A, B \notin N_2$ and $A', B' \notin N_1$), there exist productions of the type $A \to aB \in R_1$ and $A' \to aB' \in R_2$. This implies

$S_1 \Rightarrow^*_{G_1} uA \Rightarrow_{G_1} uaB \Rightarrow^*_{G_1} uax$, using the production $A \to aB$, and
$S_2 \Rightarrow^*_{G_2} yA' \Rightarrow_{G_2} yaB' \Rightarrow^*_{G_2} yav$, using the production $A' \to aB'$.

This gives $uax \in L(G_1)$ and $yav \in L(G_2)$ corresponding to $w = uav \in GSA(L(G_1), L(G_2))$. Hence $L(GSA(G_1, G_2)) \subseteq GSA(L(G_1), L(G_2))$. Hence the result.
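The rule construction of Definition 3 can be sketched as follows. The encoding is an assumption made for illustration: a rule $A \to a\gamma$ in the joint form of Note 2 is a triple `(A, a, gamma)` with `gamma == ""` standing for $\varepsilon$, and the chain rules $S \to S_i$ are triples with an empty terminal. The function name `gsa_grammar` is hypothetical. The sketch only builds the rule set; on the grammars of Example 1 it yields the rules listed there.

```python
from typing import Set, Tuple

# Hypothetical encoding: (A, a, gamma) stands for A -> a gamma,
# where gamma is a non-terminal name or "" (epsilon).
Rule = Tuple[str, str, str]


def gsa_grammar(R1: Set[Rule], S1: str, R2: Set[Rule], S2: str,
                start: str = "S") -> Set[Rule]:
    """Rule set of GSA(G1, G2); assumes disjoint non-terminal names."""
    rules: Set[Rule] = {(start, "", S1), (start, "", S2)}  # item 1
    rules |= R1 | R2                                       # item 2
    for (A, a, g1) in R1:                                  # item 3
        for (A2, b, g2) in R2:
            if a == b:
                rules.add((A, a, g2))   # A  -> a gamma'
                rules.add((A2, a, g1))  # A' -> a gamma
    return rules


# Grammars of Example 1: L(G1) = a*b and L(G2) = b*a.
R1 = {("S1", "a", "S1"), ("S1", "b", "")}
R2 = {("S2", "b", "S2"), ("S2", "a", "")}
for lhs, a, gamma in sorted(gsa_grammar(R1, "S1", R2, "S2")):
    print(f"{lhs} -> {a}{gamma}")
```

The quadratic loop over pairs of rules mirrors item 3 directly; for large grammars one could first group the rules by their terminal symbol.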
4.2 Generalised Self assembly of finite automata

If $L_1$ and $L_2$ are any two REG languages, there exist two finite automata $M_1$ and $M_2$ such that $L_1 = L(M_1)$ and $L_2 = L(M_2)$. While $L_1$ and $L_2$ can self assemble by string overlapping, it is interesting to explore whether the corresponding automata self assemble to an automaton $M$ such that the language of the self assembled automaton is the same as the self assembly of the languages. If a word $w$ is accepted by an FA, every symbol $a$ in $w$ corresponds to an edge '$a$' in the transition diagram of the FA. This gives the idea that the FAs can be self assembled by overlapping edges with the same label. Thus we have the following definition:

Definition 4 (Generalised self assembly of two FAs). Let $M_1 = (Q_1, V_1, \delta_1, q_1, F_1)$ and $M_2 = (Q_2, V_2, \delta_2, q_2, F_2)$ be two machines such that $V_1 \cap V_2 \neq \emptyset$. The generalised self assembly of $M_1$ and $M_2$, written as $GSA(M_1, M_2)$, is defined as $M = (Q, V_1 \cup V_2 \cup \{\varepsilon\}, \delta, q_0, F_1 \cup F_2)$, where $Q = Q_1 \cup Q_2 \cup \{q_0\}$ and $q_0 \notin Q_1 \cup Q_2$ is a new initial state. $\delta$ is defined as follows:
1. $\delta(q_0, \varepsilon) = \{q_1, q_2\}$.
2. For all $a \in V_1 \cup V_2$ and $q \in Q_1 \cup Q_2$: $\delta(q, a) = \delta_1(q, a)$ if $q \in Q_1$, and $\delta(q, a) = \delta_2(q, a)$ if $q \in Q_2$.
3. For every pair of transitions $\delta_1(q_i, a) = q_j$ and $\delta_2(q'_i, a) = q'_j$, with $q_i \in Q_1$ and $q'_i \in Q_2$, we include two new transition rules, $\delta(q_i, a) = q'_j$ and $\delta(q'_i, a) = q_j$.

Note that the language accepted by the GSA of $M_1$ and $M_2$ includes $L(M_1)$ and $L(M_2)$. It is observed that when $G_1$ and $G_2$ are regular grammars, we have $L(GSA(G_1, G_2)) = L(GSA(M_1, M_2))$, where $L(G_1) = L(M_1)$ and $L(G_2) = L(M_2)$.

The idea behind the self assembly of two FAs is the overlapping of the directed edges labelled with the same symbol in the transition diagrams of both finite automata. Every transition rule corresponds to a directed edge in the transition diagram.
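A minimal Python sketch of the construction in Definition 4 is given here under stated assumptions: each input machine is a deterministic automaton encoded as a dictionary from `(state, symbol)` to a state, the two state sets are disjoint and do not contain the new start state `q0`, and the result is a nondeterministic automaton whose only $\varepsilon$-moves leave `q0`. The names `gsa_automaton` and `accepts` are illustrative, not taken from the paper.

```python
from typing import Dict, Set, Tuple

DFA = Dict[Tuple[str, str], str]        # (state, symbol) -> state
NFA = Dict[Tuple[str, str], Set[str]]   # (state, symbol) -> set of states


def gsa_automaton(d1: DFA, q1: str, F1: Set[str],
                  d2: DFA, q2: str, F2: Set[str],
                  q0: str = "q0") -> Tuple[NFA, str, Set[str]]:
    """Build GSA(M1, M2); the symbol "" encodes an epsilon move."""
    delta: NFA = {}

    def add(p: str, a: str, q: str) -> None:
        delta.setdefault((p, a), set()).add(q)

    add(q0, "", q1)                      # item 1: q0 branches to both machines
    add(q0, "", q2)
    for (p, a), q in d1.items():         # item 2: keep all original transitions
        add(p, a, q)
    for (p, a), q in d2.items():
        add(p, a, q)
    for (p, a), q in d1.items():         # item 3: cross over equally labelled edges
        for (p2, b), r in d2.items():
            if a == b:
                add(p, a, r)
                add(p2, a, q)
    return delta, q0, F1 | F2


def accepts(delta: NFA, q0: str, finals: Set[str], word: str) -> bool:
    """Subset simulation; epsilon moves are only followed from q0."""
    current = {q0} | delta.get((q0, ""), set())
    for ch in word:
        current = set().union(*(delta.get((s, ch), set()) for s in current))
    return bool(current & finals)


# M1 accepts a*b and M2 accepts b*a.
d1 = {("p0", "a"): "p0", ("p0", "b"): "p1"}
d2 = {("r0", "b"): "r0", ("r0", "a"): "r1"}
delta, start, finals = gsa_automaton(d1, "p0", {"p1"}, d2, "r0", {"r1"})
print(accepts(delta, start, finals, "bab"))  # True
```

In the example, `bab` is accepted by reading `b` in $M_2$, crossing over on the edge `a` into $M_1$, and finishing with `b` there.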
Let $\delta(q_i, a) = q_j$ and $\delta(q'_i, a) = q'_j$ be transitions in $M_1$ and $M_2$ respectively. In the self assembly of $M_1$ and $M_2$, the directed edges in the transition diagrams that correspond to the above transitions overlap: when the edges overlap, the states $q_i$ and $q'_i$ overlap. To add more clarity, figure 3 is drawn in a way that all the transitions are preserved.

[Fig. 3. Part of self assembled finite automata. Machines $M_1$ and $M_2$ are self assembled at the transition edge $a$. The new FA $M$ is drawn to specifically highlight the assembled states.]

Theorem 4. Let $M_1$ and $M_2$ be any two finite automata. Then $L(GSA(M_1, M_2)) = GSA(L(M_1), L(M_2))$.

Proof. Without loss of generality, we assume that there is only one directed edge labelled $a$ in the transition diagrams of $M_1$ and $M_2$ which can overlap. Further, we can assume that all states of $M_1$ and $M_2$ are differently labelled.

Part I: Let $w \in L(GSA(M_1, M_2))$.

Case I: $w \in L(M_1)$ or $w \in L(M_2)$. Since $L(M_1), L(M_2) \subset GSA(L(M_1), L(M_2))$, we have $w \in GSA(L(M_1), L(M_2))$.

Case II: $w \notin L(M_1)$ and $w \notin L(M_2)$. There exists a path from $q_0$ to one of the final states involving the edge $a$ in the transition graph of $M$ such that the path preceding the edge $a$ is in $M_1$ (or in $M_2$), and the path succeeding the edge $a$ is in $M_2$ (or in $M_1$).
$\Rightarrow$ $w = w_1 a w_2$, where $w_1 \in \mathrm{prefix}(x)$, $x \in L(M_1)$ (or $w_1 \in \mathrm{prefix}(x)$, $x \in L(M_2)$) and $w_2 \in \mathrm{suffix}(x)$, $x \in L(M_2)$ (or $w_2 \in \mathrm{suffix}(x)$, $x \in L(M_1)$); i.e. $w_1$ is the label of the path in $M_1$ (or in $M_2$), and $w_2$ is the label of the path in $M_2$ (or in $M_1$).
$\Rightarrow$ $w$ can be written as the self assembly of the words $w_1 a w'_1$ and $w'_2 a w_2$, where $w_1 a w'_1 \in L(M_1)$ and $w'_2 a w_2 \in L(M_2)$.
$\Rightarrow$ $w \in GSA(L(M_1), L(M_2))$.
Hence $L(GSA(M_1, M_2)) \subset GSA(L(M_1), L(M_2))$.

Part II: Let $w \in GSA(L(M_1), L(M_2))$.
$\Rightarrow$ $w \in GSA(x, y)$ for some $x \in L(M_1)$, $y \in L(M_2)$.
$\Rightarrow$ $w = w_1 a w'_2$ or $w = w_2 a w'_1$, where $x = w_1 a w'_1$, $y = w_2 a w'_2$.
$\Rightarrow$ There exists a path with label $w$ from $q_0$ to one of the final states in the transition graph of $M$, involving the edge $a$.
$\Rightarrow$ $w \in L(GSA(M_1, M_2))$.
Hence the result.

Combining the results above we get the following theorem.

Theorem 5. Generalised self assembly of two regular languages is regular. So we may write $GSA(REG, REG) = REG$.

We may also go a step further. For any $L_1 \in FIN$ we can generate an automaton $M_1$ in this way: for each word, make an automaton which accepts only that word. All together this makes a finite automaton with a unique start state, which may take the empty string $\varepsilon$ and links to each of the individual automata. Now, given a regular language $L_2 \in REG$, we have an automaton $M_2$ accepting it. We can self assemble them by the method described in theorem 4. The result is again a finite automaton. Since $L_2 \subset GSA(L_1, L_2)$ by our construction, this automaton also accepts an infinite number of words. We can summarise this as:

Theorem 6. Self assembly of regular and finite languages is regular. So we may write $GSA(FIN, REG) = REG$.

5 Generalised Self assembly of linear languages

Linear languages (written as LIN) are the ones which are characterised by the following grammar rules:

$X \to P_1 Y P_2$, $X \to P$, (1)

where $X$ and $Y$ are non-terminals ($N$), and $P, P_1, P_2$ are words over terminals ($T$) [2]. If $P_1$ (resp. $P_2$) is $\varepsilon$ the grammar is called left-linear (resp. right-linear).
Any linear language can be generated by a right (or left) linear grammar. Also they are equivalent [2]. Hence for our purpose we convert all the grammars to the right-linear form only, i.e. we are only considering rules of the form:

$X \to P_1 Y$, $X \to P$,

where $P_1, P \in T^+$. Again we may further introduce new non-terminals, such that each rule is of either of the forms:

$X \to aY$, $X \to a$, (2)

where $a \in T \cup \{\varepsilon\}$ and $Y \in N^+$.

Method for self assembly of LIN grammar: Now we use a similar process as given in definition 3. Suppose we have $L_1, L_2 \in LIN$. We construct grammars $G_i = (N_i, T_i, S_i, R_i)$, $i = 1, 2$, for them such that $N_1 \cap N_2 = \emptyset$ and the $R_i$ are of the form of equation 2. Define a grammar $G = (N, T, S, R)$ where $N = N_1 \cup N_2 \cup \{S\}$, $T = T_1 \cup T_2$, $S$ is the new start symbol, and the rules of $R$ are:
1. $S \to S_1$, $S \to S_2$.
2. All the rules of $R_1$ and $R_2$.
3. For $a \in T_1 \cap T_2$, for each pair of rules $A_1 \to a\gamma_1 \in R_1$ and $A_2 \to a\gamma_2 \in R_2$, include the rules $A_1 \to a\gamma_2$ and $A_2 \to a\gamma_1$ in $R$, where $\gamma_1 \in N_1^*$ and $\gamma_2 \in N_2^*$.

The analogous result of theorem 3 follows the same line of argument. Thus we can also conclude that:

Theorem 7. Self assembly of two linear languages is linear; i.e. $GSA(LIN, LIN) = LIN$.

6 Generalised Self assembly of context free languages

We self assemble CF grammars, and thus show that the self assembly of two CF languages is again a CF language. Instead of using general grammar rules, we take the help of the Greibach normal form [6]. To use this, we can assume without loss of generality that the parent languages are $\varepsilon$-free. Now, in Greibach normal form each rule is of the form $A \to a\gamma$, where $\gamma \in N^*$. We use exactly the same method as used for linear grammars. The same lines of argument give us:

Theorem 8. Generalised self assembly of two context free languages is context free; i.e. $GSA(CF, CF) = CF$.
7 Conclusion

In all definitions of GSAs of languages, grammars (definition 3) and FAs (definition 4), the parent words are included in the words generated by the GSA. In fact, in any self assembly process of $w_1$ and $w_2$, the words $w_1$ and $w_2$ will be generated only when $w_1 = w_2$. But, in our definition of GSA, we prefer to include $w_1$ and $w_2$ (even if $w_1 \neq w_2$) in $GSA(w_1, w_2)$ on purpose. Though we can define the GSA of grammars (as well as FAs) so that the parent words are not included in the words generated, the process will be highly complicated. The main purpose of this paper is just to study the generalised splicing in the self assembly approach. For the sake of not losing the clarity of our approach in this study, we prefer to include the parent words in all our definitions, namely GS of languages, GSA of languages, and GSA of grammars.

Thus, we have proved that $GS(FIN, FIN, R) = FIN$, $GS(REG, REG, R) = REG$, $GS(FIN, REG, R) = REG$, $GS(LIN, LIN, R) = LIN$ and $GS(CF, CF, R) = CF$, where $R$ is as mentioned in Theorem 1. This study can further be extended to study the other generalised splicing classes of languages.

References

1. L. Adleman, Towards a mathematical theory of self-assembly, Technical Report (00-72), University of Southern California, 2000.
2. Arto Salomaa, Formal Languages, Academic Press Inc., 1973.
3. Karel Culik II, Tero Harju, Splicing semigroups of dominoes and DNA, Discrete Applied Mathematics, 31(3), 261-277, 1991.
4. Erzsébet Csuhaj-Varjú, Ion Petre, György Vaszil, Self assembly of strings and languages, Theoretical Computer Science, 374(1-3), 74-81, 2007.
5. Tom Head, Formal language theory and DNA: An analysis of the generative capacity of specific recombinant behaviours, Bull. Math. Biology, 49, 737-759, 1987.
6. John Hopcroft, Rajeev Motwani, Jeffrey Ullman, Introduction to automata theory, languages, and computation (2e), Pearson Indian reprint, 2001.
7. Gh. Păun, On the Splicing operation, Discrete Applied Mathematics, 70, 57-79, 1996.
8. Gh. Păun, Grzegorz Rozenberg, Arto Salomaa, Computing by Splicing, Theoretical Computer Science, 168(2), 321-336, 1996.
9. Gh. Păun, Grzegorz Rozenberg, Arto Salomaa, DNA Computing: New Computing Paradigms, Springer-Verlag, 1998.
