Analytic aspects of the shuffle product
There exist very lucid explanations of the combinatorial origins of rational and algebraic functions, in particular with respect to regular and context free languages. In the search to understand how to extend these natural correspondences, we find t…
Authors: Marni Mishna, Mike Zabrocki
Symposium on Theoretical Aspects of Computer Science 2008 (Bordeaux), pp. 561-572
www.stacs-conf.org

ANALYTIC ASPECTS OF THE SHUFFLE PRODUCT

MARNI MISHNA¹ AND MIKE ZABROCKI²

¹ Department of Mathematics, Simon Fraser University, Burnaby, Canada
E-mail address: mmishna@sfu.ca

² Department of Mathematics and Statistics, York University, Toronto, Canada
E-mail address: zabrocki@mathstat.yorku.ca

Abstract. There exist very lucid explanations of the combinatorial origins of rational and algebraic functions, in particular with respect to regular and context-free languages. In the search to understand how to extend these natural correspondences, we find that the shuffle product models many key aspects of D-finite generating functions, a class which contains the algebraic functions. We consider several different takes on the shuffle product, shuffle closure, and shuffle grammars, and give explicit generating function consequences. In the process, we define a grammar class that models D-finite generating functions.

Introduction

Generating functions of languages

The (ordinary) generating function of a language L is the sum
$$L(z) = \sum_{w \in L} z^{|w|},$$
where |w| is the length of the word w. This sum is a formal power series if there are finitely many words of each given length. In this case we say the language is proper, and we can rewrite L(z) as $L(z) = \sum_n \ell(n) z^n$, where ℓ(n) is the number of words in L of length n. When we have an unambiguous grammar describing a regular language or a context-free language, we can automatically generate equations satisfied by the generating function directly from the grammar. These are the well-known translations:
$$L = L_1 + L_2 \implies L(z) = L_1(z) + L_2(z)$$
$$L = L_1 \cdot L_2 \implies L(z) = L_1(z)\,L_2(z)$$
$$L = L_1^* \implies L(z) = (1 - L_1(z))^{-1}.$$
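The following Python sketch illustrates these translations. It is our own example, not taken from the paper. Consider the regular language (a + bb)*. Here L_1 = {a, bb} is a code, so the star is unambiguous and L(z) = 1/(1 − z − z²). The sketch expands this series and compares it with a brute-force count of matching words. The function names are hypothetical.

```python
import re
from itertools import product


def star_series(f, n_terms):
    """Coefficients of 1/(1 - f(z)) for a polynomial f with f(0) = 0.

    Uses g = 1 + f*g, i.e. g[n] = sum_{k>=1} f[k] * g[n-k].
    """
    g = [1] + [0] * (n_terms - 1)
    for n in range(1, n_terms):
        g[n] = sum(f[k] * g[n - k] for k in range(1, min(n, len(f) - 1) + 1))
    return g


def brute_force_counts(pattern, alphabet, n_terms):
    """Count the words of each length over `alphabet` that fully match `pattern`."""
    return [
        sum(1 for t in product(alphabet, repeat=n) if re.fullmatch(pattern, "".join(t)))
        for n in range(n_terms)
    ]


# L1 = {a, bb} has generating function z + z^2, encoded as [0, 1, 1].
series = star_series([0, 1, 1], 12)
counts = brute_force_counts(r"(?:a|bb)*", "ab", 12)
print(series)            # Fibonacci numbers 1, 1, 2, 3, 5, 8, ...
print(series == counts)
```

The brute-force count is exponential in n. It serves only as a check on the series obtained from the grammar translation.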
Generating functions of formal languages are now a well-established tool for algorithm analysis (see [12] for many references) and, increasingly, for random generation [9]. In this context, we are also interested in the exponential generating function of a language. The two are related by the Laplace-Borel transform. For our purposes, however, it suffices to think of the exponential generating function $\hat L(z)$ as the Hadamard product of L(z) and $\exp(z) = \sum z^n/n!$; that is, $\hat L(z) = \sum \ell(n)\, z^n/n!$.

1998 ACM Subject Classification: F.4.3 Formal Languages.
Key words and phrases: generating functions, formal languages, shuffle product.
© M. Mishna and M. Zabrocki, Creative Commons Attribution-NoDerivs License.

One spectacular feature of generating functions of languages is the extent to which their analytic complexity models the complexity of the language. Specifically, we have the two classic results. First, regular languages have rational generating functions. Second, context-free languages which are not inherently ambiguous have algebraic generating functions. The context-free languages form a large and historically important subclass of all objects which have algebraic generating functions. Bousquet-Mélou [6, 7] provides an interesting discussion of the nature of combinatorial structures that possess algebraic and rational generating functions, including broad classes that are not representable as context-free languages.

There remain unanswered questions related to other classes of languages and other classes of functions. An example of the former is Flajolet's question [10]: “In which class of transcendental functions do generating functions of (general) context free languages lie?” An example of the latter is the identification of languages whose generating functions are D-finite¹.
This is an exceptional class of functions [24] which, for the moment, lacks a satisfying combinatorial explanation. We survey some current understandings in Section 1.3, and provide a language-theoretic interpretation of one of them in Section 3.1.

To capture the analytic complexity of D-finite generating functions, we should not expect a simple climb up the language hierarchy (to indexed or context-sensitive languages, say), because different notions of complexity are in competition. For example, the language $\{a^n b^n c^n : n \in \mathbb{N}\}$ is difficult to recognize, but trivial to enumerate. Likewise, the generating function $\sum_n z^{n^2}$ of the relatively simple-looking language $\{a^{n^2} : n \in \mathbb{N}\}$ has a natural boundary at |z| = 1, which is a trademark of very complex analytic behaviour.

The shuffle product

In the absence of obvious answers, we consider a very common and useful operator, the shuffle product, and discover that it fills in many interesting holes in this story. Consider the words w, uw_1 and vw_2, and the letters u, v ∈ Σ. We define the shuffle product of two words recursively by the equations
$$uw_1 \shuffle vw_2 = u\,(w_1 \shuffle vw_2) + v\,(uw_1 \shuffle w_2), \qquad w \shuffle \epsilon = w, \qquad \epsilon \shuffle w = w.$$
Here the union is disjoint, and we distinguish duplicated letters from the second word by a bar: $a \shuffle a = \{a\bar a, \bar a a\}$.

The shuffle product serves us in three ways. First, it lets us define a class of languages whose generating functions form a class strictly containing the algebraic functions. Second, it lets us model a very straightforward combinatorial interpretation of the derivative (indeed, in some interesting non-commutative algebras the shuffle product is even called a derivative). Third, it lets us neatly consider some larger classes which are simultaneously more complex from both the language and the generating function points of view.
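The recursive definition can be transcribed almost literally. In this Python sketch (an illustration, with hypothetical names), words are tuples of letters and the result is a list, so the union stays disjoint. Letters of the second word are barred using the Unicode combining macron (U+0304), one possible encoding of the bar convention.

```python
from math import comb


def shuffle(x, y):
    """Shuffle of two words, following the recursive definition.

    Words are tuples of letters; the result is a list, so the union stays
    disjoint (repeated interleavings are kept, not merged).
    """
    if not x:
        return [y]
    if not y:
        return [x]
    return [(x[0],) + w for w in shuffle(x[1:], y)] + [
        (y[0],) + w for w in shuffle(x, y[1:])
    ]


def bar(w):
    """Put a bar (combining macron, U+0304) on every letter of the word."""
    return tuple(c + "\u0304" for c in w)


print(shuffle(("a",), bar(("a",))))                     # a followed by a-bar, then a-bar followed by a
print(len(shuffle(tuple("ab"), tuple("cde"))), comb(5, 2))
```

A word of length m shuffled with a word of length k yields binom(m + k, m) interleavings, one for each choice of positions for the first word. This count is the source of the binomial convolution in Section 2.2.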
¹ D-finite (also known as holonomic) functions satisfy linear differential equations with polynomial coefficients.

Goal and Results

The aim of this study is two-fold. First, we hope that a greater understanding of the generating function implications of adding the shuffle product to context-free languages provides insight into a larger class of combinatorial problems. Second, we want to understand the combinatorial interpretations of the different function classes that arise between algebraic and D-finite. The shuffle is a natural combinatorial product to consider since it is, in some sense, a generalization of pointing.

In the present work, we first examine the shuffle as an operator on languages. In the second part we consider the shuffle as a grammar production rule for defining languages. We obtain four results:
(i) the shuffle closure of the context-free languages has D-finite generating functions;
(ii) we give the asymptotic growth of coefficients for two classes that use the shuffle;
(iii) we define a special pointing class that describes all D-finite functions;
(iv) we discuss the shuffle closure of a language.

In the next section we review interpretations of differential equations. This is followed by a discussion of the shuffle of languages, and some descriptions of shuffle grammars.

1. Interpreting differential equations combinatorially

1.1. The class of D-finite functions

The class of D-finite functions is of interest to the combinatorialist for many reasons. The coefficient sequence of a D-finite power series is P-recursive: it satisfies a linear recurrence of fixed length with polynomial coefficients. Hence such sequences are easy to generate and manipulate, and one can even “guess” their form.
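The claim that P-recursive sequences are "easy to generate" can be made concrete. Given the polynomial coefficients p_i(n) of the recurrence, the terms can be unrolled exactly with rational arithmetic. The function name `unroll` and the Catalan example are our own illustration: the Catalan numbers are the coefficients of an algebraic, hence D-finite, series.

```python
from fractions import Fraction
from math import comb


def unroll(coeffs, initial, n_terms):
    """Unroll a P-recursive recurrence forwards.

    coeffs[i] is a function n -> p_i(n) for the recurrence
        p_0(n) u(n) + p_1(n) u(n+1) + ... + p_r(n) u(n+r) = 0,
    and `initial` holds u(0), ..., u(r-1). The leading coefficient p_r(n)
    must not vanish for the values of n that are used.
    """
    r = len(coeffs) - 1
    u = [Fraction(v) for v in initial]
    n = 0
    while len(u) < n_terms:
        lower = sum(coeffs[i](n) * u[n + i] for i in range(r))
        u.append(-lower / coeffs[r](n))
        n += 1
    return u


# Catalan numbers: (4n + 2) C(n) - (n + 2) C(n + 1) = 0, with C(0) = 1.
catalan = unroll([lambda n: 4 * n + 2, lambda n: -(n + 2)], [1], 10)
print([int(c) for c in catalan])
```

Exact fractions avoid rounding drift. The price is that numerators and denominators grow, which is acceptable for a few thousand terms.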
By definition, D-finite functions satisfy linear differential equations with polynomial coefficients. In many cases it is therefore relatively straightforward to perform an asymptotic analysis of the coefficients, even without a closed form for the generating function. One important feature that we use here is that a P-recursive sequence grows asymptotically like
$$\ell(n) \sim \lambda\,(n!)^{r/s} \exp\!\big(Q(n^{1/m})\big)\,\omega^n\, n^{\alpha} (\log n)^k,$$
where r, s, m, k ∈ ℕ, Q is a polynomial and λ, ω, α are complex numbers. We contrast this with the asymptotic template satisfied by the coefficients of algebraic functions:
$$\ell(n) \sim \kappa\,\frac{n^d}{\Gamma(d+1)}\,\omega^{-n}, \qquad (1.1)$$
where κ is an algebraic number and d ∈ ℚ \ {−1, −2, …}. (A very complete source on the theory of asymptotic expansions of coefficients of algebraic functions arising in the combinatorial context is [12, Section VII.4.1].) Notable differences include the exponential and logarithmic factors, the power of a factorial, and the allowable exponents of n.

We shall use the following properties of D-finite functions:
(i) The function 1/f is D-finite if, and only if, f is of the form exp(g)h, where g and h are algebraic [23];
(ii) The Hadamard product $f \times g = \sum f_n g_n z^n$ of two D-finite functions $f = \sum f_n z^n$ and $g = \sum g_n z^n$ is also D-finite.

1.2. The simplest shuffle: the point

Pointing (or marking) is an operation that has long been studied in connection with structures generated by grammars. The point of a word w, denoted P(w), is a set of words, each with a different position marked. For example, $P(abc) = \{\dot a bc,\ a\dot b c,\ ab\dot c\}$. From the enumerative point of view, the two languages L and $L_1 = P(L) = \bigcup_{w \in L} P(w)$ satisfy the enumerative relation
$$\ell_1(n) = n\,\ell(n), \qquad (1.2)$$
and hence $L_1(z) = z\,\frac{d}{dz} L(z)$.
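The following sketch checks relation (1.2) directly. The example language is our own choice: words over {a, b} with no factor bb. Marking one position is encoded by upper-casing that letter, which is purely an encoding choice for this sketch.

```python
from itertools import product


def point(w):
    """P(w): one copy of w per position, with that position marked.

    A marked letter is written in upper case (an encoding choice for this sketch).
    """
    return [w[:i] + w[i].upper() + w[i + 1:] for i in range(len(w))]


def words_without_bb(n):
    """A sample language L: words over {a, b} of length n with no factor 'bb'."""
    return ["".join(t) for t in product("ab", repeat=n) if "bb" not in "".join(t)]


for n in range(8):
    ell = len(words_without_bb(n))
    ell_pointed = sum(len(point(w)) for w in words_without_bb(n))
    assert ell_pointed == n * ell  # Eq. (1.2): l_1(n) = n l(n)
print(point("abc"))  # ['Abc', 'aBc', 'abC']
```

Coefficientwise, multiplying ℓ(n) by n is exactly the operator z d/dz. This is why pointing by itself cannot leave the rational or algebraic classes.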
The pointing operator is relevant to our discussion because of the simple bijective correspondence between P(L) and $L \shuffle a = \{w \shuffle a : w \in L\}$. The first obvious question is: “does pointing increase expressive power?” For regular languages and context-free languages the answer is no. We can add, for each non-terminal, a companion non-terminal that generates a language isomorphic to the pointed language. Let $\dot A$ be the pointed version of A. We add the following rules, which model pointing:
$$\dot{(AB)} = \dot A B + A \dot B, \qquad \dot{(A+B)} = \dot A + \dot B.$$
Remark how these rules resemble the corresponding product and sum rules for differentiation. Furthermore, from the point of view of generating functions, the derivative of a rational function is again rational, and the derivative of an algebraic function is again algebraic. We therefore know immediately that we could not hope to enlarge the class of generating functions represented by this method.

Pointing, when paired with a “de-pointing” operator which removes such marks, becomes powerful enough to describe other kinds of constructions, namely labelled cycles and sets [13, 15]. In this case we can describe set partitions, whose exponential generating function exp(exp(z) − 1) is not D-finite. It takes much more effort [5] to define a pointing operator with a differentiation property as in Eq. (1.2) for unlabelled structures defined using Set and Cycle constructions. It is a fruitful exercise, as one can then generate approximate-size samplers with expected linear time complexity.

1.3. Other combinatorial derivatives

Combinatorial species theory [2] provides a rich formalism for explaining the interplay between analytic and combinatorial representations of objects.
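The companion-non-terminal construction can be carried out mechanically. The Python sketch below uses hypothetical names and our own encoding: a grammar is a dict from non-terminals to tuples of symbols, and any symbol not in the dict is a terminal of length 1. It builds the pointed companion X* of every non-terminal using the product and sum rules. It then counts words by length, so that relation (1.2) can be checked on the (unambiguous) Dyck grammar S → ε | u S d S.

```python
from functools import lru_cache

# Dyck words over {u, d}: S -> epsilon | u S d S.
DYCK = {"S": [(), ("u", "S", "d", "S")]}


def pointed(grammar):
    """Add a companion X* for every non-terminal X, using the rules
    (AB)* = A* B + A B*  and  (A + B)* = A* + B*.
    A pointed terminal t* is simply a marked letter."""
    new = dict(grammar)
    for x, rhss in grammar.items():
        new[x + "*"] = [rhs[:i] + (rhs[i] + "*",) + rhs[i + 1:]
                        for rhs in rhss for i in range(len(rhs))]
    return new


def counter(grammar):
    """Return count(X, n), the number of derivations of length-n words from X.

    Assumes no left recursion through the empty word; this holds for the
    Dyck grammar, whose non-empty rules start with a terminal.
    """
    @lru_cache(maxsize=None)
    def count(symbol, n):
        if symbol not in grammar:  # terminal, marked or not
            return 1 if n == 1 else 0
        return sum(count_seq(rhs, n) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_seq(seq, n):
        if not seq:
            return 1 if n == 0 else 0
        total = 0
        for k in range(n + 1):
            head = count(seq[0], k)
            if head:  # skip zero terms (this also avoids recursing on (seq[1:], n))
                total += head * count_seq(seq[1:], n - k)
        return total

    return count


count = counter(pointed(DYCK))
print([count("S", n) for n in range(9)])   # Dyck words: 1, 0, 1, 0, 2, 0, 5, 0, 14
print([count("S*", n) for n in range(9)])  # n times the line above
```

Because the grammar is unambiguous, counting derivations is the same as counting words. The pointed counts therefore reproduce z d/dz applied to the Catalan generating function.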
In particular, using the vehicle of the cycle index series, there are several possible ways to relate species to (multivariate) D-finite functions [18, 21]. In this realm, given any linear differential equation with polynomial coefficients, we can define a set of grammar operators that allow us to construct a pair of species whose difference has a generating function that satisfies the given differential equation. Unfortunately, at present we lack the intuition to understand what this class “is”. Specifically, we lack the tools to construct a test that decides whether a given class or language falls within it. In Section 3.1 we give a language-theoretic interpretation of the derivative of a species: specifically, a grammar system from which, for any linear differential equation with coefficients from ℚ[x], we can generate a language whose generating function satisfies this equation.

1.4. Other differential classes

There are several other natural function classes related to differential equations. A series f(t) ∈ K[[t]] is said to be constructible differentially finite (CDF) if it belongs to some finitely generated ring which is closed under differentiation [3, 4]. This is equivalent to satisfying a system of differential equations of a given form. Combinatorially, any CDF function can be interpreted as a family of enriched trees. Theorem 3 of [3] states that if $\sum a_n t^n/n!$ is CDF, then $|a_n| = O(\alpha^n n!)$ for some constant α. This class is not closed under Hadamard product, and it is unlikely that the image of an arbitrary CDF function under the Borel transform is again CDF. Closure under the Borel transform is the key property required for a meaningful correspondence with respect to the shuffle product.

A larger class which contains both the CDF and the D-finite functions is the class of differentiably algebraic functions.
A function is differentiably algebraic (DA) if it satisfies an algebraic differential equation of the form $P(x, y, y', \ldots, y^{(n)}) = 0$, where P is a non-trivial polynomial in its n + 2 variables. (See Rubel's survey [22] for many references.) The set of DA functions is closed under multiplicative inverse and Hadamard product. These two facts together are sufficient to prove that all of the classes we consider are differentiably algebraic.

1.5. Generating functions and shuffles

Generating functions are a useful tool for the automatic study of certain combinatorial problems. As we shall see, the shuffle operator has a straightforward effect on the generating function. With the aid of the shuffle product, Flajolet et al. [11] perform a straightforward analysis of four problems in random allocation. Using some systematic translations, they derive integral representations for expectations and probability distributions. As they remark, the shuffle of languages appears in several places in the analysis of algorithms (such as the evolution of two stacks in a common memory area).

2. The shuffle of two languages

The shuffle of two languages is defined as
$$L_1 \shuffle L_2 = \bigcup_{w_1 \in L_1,\ w_2 \in L_2} (w_1 \shuffle w_2).$$
In order to use a generating function approach, we assume that L_1 is a language over the alphabet Σ_1, that L_2 is a language over Σ_2, and that Σ_1 ∩ Σ_2 = ∅. If the two languages share an alphabet, it suffices to add a bar on top of the copy from Σ_2.

2.1. The shuffle closure of context-free languages

We consider the shuffle closure of a single language in Section 3.4; here we first concentrate on the shuffle closure of a class of languages. For any given class of languages C, the shuffle closure can be defined as the (infinite) union of S_0, S_1, …, the sequence recursively defined by
$$S_0 = \mathcal{C}, \qquad S_n = \{L_1 \shuffle L_2 : L_1 \in S_{n-1},\ L_2 \in \mathcal{C}\}.$$
The shuffle product is commutative and associative [20], and thus the closure contains $S_j \shuffle S_i$ for any i and j. Remark that for any given language in the closure, there is a bound on the number of shuffle productions that can occur in any derivation tree; namely, if L ∈ S_n, that bound is n. In general, we refer to this union as the closure of the class C under shuffle.

The class of regular languages is closed under the shuffle product, since the shuffle of any two regular languages is regular. However, the context-free languages are not closed under the shuffle product [20], and hence we consider their closure. The prototypical language in this class is the shuffle of (any finite number of) Dyck languages. Let $|w|_a$ count the number of occurrences of the letter a in the word w. Let D be the Dyck language over the alphabet Σ = {u, d}:
$$D = \{w \in \Sigma^* : w = w'v \implies |w'|_u \ge |w'|_d, \text{ and } |w|_u = |w|_d\}.$$
We construct an isomorphic version E over the alphabet {l, r}. The language $D \shuffle E$ encodes random walks restricted to the quarter plane, with steps u(p), d(own), r(ight) and l(eft), that return to the origin. By considering the larger language of Dyck prefixes, we can model walks that end anywhere in the quarter plane. Indeed, since the shuffle preserves two distinct sets of prefix conditions, there are many examples of random walks in bounded regions that can be expressed as shuffles of algebraic languages. It might be interesting to consider other standard questions about classes of languages for this closure class, in particular whether interesting random walks arise.

2.2. The closure is D-finite

To show that the shuffle of two languages with D-finite generating functions also has a D-finite generating function, we use the following classic observation on the enumeration of shuffles of languages. Let L be the shuffle of L_1 and L_2, and suppose the generating series $L_1(z) = \sum \ell_1(n) z^n$ and $L_2(z) = \sum \ell_2(n) z^n$ are known. Then the number of words of length n in L is given by the formula
$$\ell(n) = \sum_{n_1+n_2=n} \binom{n}{n_1}\, \ell_1(n_1)\, \ell_2(n_2).$$
To see this, recognize that a word in L is composed of two words together with a set of n_1 positions for the letters of the word from L_1. This is equivalent to
$$\frac{\ell(n)}{n!} = \sum_{n_1+n_2=n} \frac{\ell_1(n_1)}{n_1!}\,\frac{\ell_2(n_2)}{n_2!}, \qquad (2.1)$$
which amounts to the following relation between the exponential generating functions of the three languages:
$$L = L_1 \shuffle L_2 \implies \hat L(z) = \hat L_1(z)\,\hat L_2(z). \qquad (2.2)$$
Using these relations, we can easily prove the following result.

Proposition 2.1. If L_1 and L_2 are languages with D-finite ordinary generating functions, then the generating series L(z) of $L = L_1 \shuffle L_2$ is also D-finite.

As is the case with many of the most interesting closure properties of D-finite functions, the proof follows from the closure of D-finite functions under Hadamard product [19].

Proof. Since D-finite functions are closed under Hadamard product, the ordinary generating function of a sequence is D-finite if and only if its exponential generating function is D-finite. (Taking the Hadamard product with exp(z), respectively with $\sum n!\, z^n$, which is also D-finite, converts one into the other.) Consequently, if L_1(z) and L_2(z) are D-finite, then so are the exponential generating functions $\hat L_1(z)$ and $\hat L_2(z)$. By closure under product, $\hat L(z)$ is D-finite, and thus so is L(z).

This result has the following consequences.

Corollary 2.2.
If L_1 and L_2 are context-free languages which are not inherently ambiguous, then the generating series L(z) of $L = L_1 \shuffle L_2$ is D-finite.

Corollary 2.3. Any language in the shuffle closure of context-free languages has a D-finite generating function.

2.3. Asymptotic template for ℓ(n)

We continue the example from the previous section using the two Dyck languages D and E. It is straightforward to compute that $D(z) = E(z) = \sum_n \frac{1}{n+1}\binom{2n}{n} z^{2n}$. Passing to the languages of Dyck prefixes, which have $\binom{m}{\lfloor m/2 \rfloor}$ words of length m, the number ℓ(n) of words of length n in their shuffle is given by
$$\ell(n) = \sum_{n_1+n_2=n} \binom{n}{n_1}\binom{n_1}{\lfloor n_1/2 \rfloor}\binom{n_2}{\lfloor n_2/2 \rfloor}.$$
An asymptotic expression for ℓ(n) can be determined by first using the Vandermonde-Chu identity to simplify ℓ(n):
$$\ell(n) = \binom{n}{\lfloor n/2 \rfloor}\binom{n+1}{\lceil n/2 \rceil},$$
and then applying Stirling's formula. This gives $\ell(n) \sim \frac{4}{\pi}\,\frac{4^n}{n}$. The exponent d = −1 of n is excluded by (1.1), so the resulting series is not algebraic. Flajolet uses this technique extensively in [10] to prove that certain context-free languages are inherently ambiguous. Thus the generating functions of our class strictly contain the algebraic functions.

We therefore have some elements of a class of functions with a nice asymptotic expansion. A rough calculation shows that for the shuffle of two languages with respective asymptotic growth $\kappa_i\, n^{r_i} \alpha_i^n$, for i = 1, 2,
$$\ell(n) \sim \kappa\, n^{r_1+r_2} (\alpha_1+\alpha_2)^{n-r_1-r_2}.$$
How could one hope to prove directly that all elements in this class have an expansion of the form $\ell(n) \sim \kappa\, \alpha^n n^r$, where now r can be any rational, and κ is no longer restricted to algebraic numbers?
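Here is a quick numerical check of the computations in this subsection, as a Python sketch with hypothetical names. It evaluates the binomial convolution of Section 2.2 for two copies of the Dyck-prefix counts. It compares the result with the closed form and with a brute-force count of quarter-plane walks, and it inspects the ratio of ℓ(n) to 4^{n+1}/(πn).

```python
from itertools import product
from math import comb, pi


def prefixes(m):
    """Number of Dyck prefixes of length m (never more d's than u's so far)."""
    return comb(m, m // 2)


def shuffle_count(n, ell1=prefixes, ell2=prefixes):
    """l(n) = sum over n1 + n2 = n of binom(n, n1) l1(n1) l2(n2)."""
    return sum(comb(n, k) * ell1(k) * ell2(n - k) for k in range(n + 1))


def closed_form(n):
    """binom(n, floor(n/2)) * binom(n + 1, ceil(n/2)), after Vandermonde-Chu."""
    return comb(n, n // 2) * comb(n + 1, (n + 1) // 2)


def quarter_plane_walks(n):
    """Brute force: walks with steps u, d, r, l that never leave x >= 0, y >= 0."""
    steps = {"u": (0, 1), "d": (0, -1), "r": (1, 0), "l": (-1, 0)}
    total = 0
    for walk in product(steps.values(), repeat=n):
        x = y = 0
        for dx, dy in walk:
            x, y = x + dx, y + dy
            if x < 0 or y < 0:
                break
        else:  # the walk never left the quarter plane
            total += 1
    return total


print([shuffle_count(n) for n in range(8)])
print(closed_form(1000) / 4 ** 1001 * pi * 1000)  # ratio to 4^(n+1)/(pi n); tends to 1
```

The rough formula above agrees with this example. Each prefix count grows like $\sqrt{2/\pi}\, n^{-1/2}\, 2^n$, so r_i = −1/2 and α_i = 2, giving $n^{-1} 4^{n+1}$ up to a constant.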
It seems that it should be possible to prove this in two cases. The first is the shuffles of series which satisfy the hypotheses of Theorem 3.11 of [7], using a more general form of the Chu-Vandermonde identity. The second is the closure of the subclass of context-free languages possessing an ℕ-algebraic generating function. In this case d = −3/2, which simplifies the analyses considerably. Unfortunately, a direct application of Bender's method [12, Theorem VI.2] does not seem to work. Theorem 3.2 states that the asymptotic form will not contain any powers of n! greater than 2. This illustrates a limitation of the expressive power of the shuffle closure of context-free languages: there are known natural combinatorial objects which have D-finite generating functions whose coefficients grow asymptotically with higher powers of n!. For example, the number of k-regular graphs for k > 4 contains (n!)^{5/2}, and the conjectured asymptotic for k-uniform Young tableaux [8] contains $n!^{k/2-1}$.

3. Shuffle grammars

We extend the first approach by allowing the shuffle to come into play earlier in the story: we add the shuffle operator to our grammar rewriting rules. Shuffle grammars as defined by Gischer [14] include a shuffle rule and a shuffle closure rule. We consider the latter in Section 3.4. As we did earlier, we first consider languages which have a natural bound on the number of shuffle productions that can occur in a derivation tree of any word in the language. This is followed by an example of a recursive shuffle grammar, to illustrate how powerful such grammars can be. It has been proven [17] that recursive shuffle grammars do indeed have greater expressive power, but it is not always clear how to interpret the resulting combinatorial families. We begin with a second kind of pointing operator.

3.1. A terminal pointing operator

The traditional pointing operator can be used to model z d/dz, but one can show that this is, in fact, insufficient to generate all D-finite functions. To remedy this, we define a pointing operator which mimics the concept behind the derivative of a species. This pointing operator has the effect of converting a letter to an epsilon by “marking” the letter. Consequently, a letter cannot be marked more than once, and each subsequent time a word is marked, a counter on the mark is incremented. The pointing operator applied to a set of words is the pointing operator applied to each element of the set. Notationally, we distinguish the marks with accumulated primes. We give some examples:
$$P(aab) = a'ab + aa'b + aab'$$
$$P(P(aab)) = a'a''b + a'ab'' + a''a'b + aa'b'' + a''ab' + aa''b'$$
$$P(a'''a'b'') = \emptyset.$$
The length of a word is the number of unmarked letters in it (although, in some sense, the combinatorial objects in the language encode more than just the length). The number of words in the pointing of a word is equal to its length. This gives a straightforward interpretation of the derivative:
$$L_1 = P(L) \implies L_1(z) = \frac{d}{dz} L(z).$$
Using this definition, if A is a symbol which “yields” a language through a grammar

Remark: if we allowed concatenation after marking, we could generate two letters in the same word marked with a single prime, via concatenation of marked words.

Using the marking operation, we can express most D-finite functions, specifically via the differential equations that they satisfy. For example, the series $P(z) = \sum_{n \ge 0} n!\, z^n$ satisfies the differential equation
$$P(z) = 1 + zP(z) + z^2 P'(z).$$
This is modelled by the grammar
A → ε
A → aA
A → bc P(A).
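The terminal pointing operator and the grammar above are small enough to simulate directly. This Python sketch uses our own encoding: a word is a tuple of (letter, number-of-primes) pairs, and each new mark is one prime more than the largest mark already present, which reproduces the examples above. It lists P(aab) and checks that the grammar produces n! words of length n, matching $P(z) = \sum n!\, z^n$. All names are hypothetical.

```python
from math import factorial

# A word is a tuple of (letter, mark) pairs: mark 0 = unmarked,
# mark k >= 1 = k primes (a', a'', ...).


def word(s):
    return tuple((c, 0) for c in s)


def show(w):
    return "".join(c + "'" * m for c, m in w)


def length(w):
    """The length of a word counts only its unmarked letters."""
    return sum(1 for _, m in w if m == 0)


def point(w):
    """Terminal pointing: mark one unmarked letter with the next number of primes."""
    level = max((m for _, m in w), default=0) + 1
    return [w[:i] + ((c, level),) + w[i + 1:] for i, (c, m) in enumerate(w) if m == 0]


def grammar_words(n_max):
    """Words of A -> eps | aA | bc P(A), grouped by length (left concatenation only)."""
    by_length = [[()]]  # the only word of length 0 is the empty word
    for n in range(1, n_max + 1):
        prev = by_length[n - 1]  # both rules turn a word of length n-1 into one of length n
        words = [word("a") + w for w in prev]
        words += [word("bc") + p for w in prev for p in point(w)]
        by_length.append(words)
    return by_length


print([show(w) for w in point(word("aab"))])
print([show(w) for w in grammar_words(3)[3]])
print([len(ws) for ws in grammar_words(6)])  # n! words of length n
```

The count doubles as a proof sketch. A word of length n arises either as a followed by a word of length n − 1, or as bc followed by one of the n − 1 pointings of a word of length n − 1. This gives ℓ(n) = nℓ(n − 1), which is the recurrence encoded by the differential equation.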
An alphabet on three letters (a, b, c) allows us to track the origin of each letter. Here is the result of the third iteration of the rules:
$$1 \oplus a \oplus (aa + bca') \oplus (aaa + abca' + bca'a + bcb''ca' + bcaa' + bcbc''a') \oplus (aaaa + aabca' + abca'a).$$
We will call a pointing grammar one that has rules of the form
$$A \to w, \qquad A \to wB, \qquad A \to P(B). \qquad (3.1)$$
Although we allow only left concatenation (a strategy to avoid concatenating pointed words), these grammar rules can model any D-finite function. We can define a procedure for finding a language given a defining equation satisfied by a D-finite generating function. Say that a generating function T(z) satisfies
$$T(z) = q(z) + q_0(z)T(z) + q_1(z)T'(z) + \cdots + q_n(z)T^{(n)}(z). \qquad (3.2)$$
Now substitute T(z) = P(z) − N(z):
$$P(z) - N(z) = q(z) + q_0(z)\big(P(z) - N(z)\big) + q_1(z)\big(P'(z) - N'(z)\big) + \cdots + q_n(z)\big(P^{(n)}(z) - N^{(n)}(z)\big).$$
We also use the notation $q_i(z) = q_i^+(z) - q_i^-(z)$ (and likewise $q(z) = q^+(z) - q^-(z)$). Here $q_i^+(z)$ collects the positive terms of the polynomial and $q_i^-(z)$ the negated negative ones, so that both have non-negative coefficients. Then if
$$P(z) = q^+(z) + q_0^+(z)P(z) + q_0^-(z)N(z) + \cdots + q_n^+(z)P^{(n)}(z) + q_n^-(z)N^{(n)}(z) \qquad (3.3)$$
and
$$N(z) = q^-(z) + q_0^-(z)P(z) + q_0^+(z)N(z) + \cdots + q_n^-(z)P^{(n)}(z) + q_n^+(z)N^{(n)}(z), \qquad (3.4)$$
then P(z) − N(z) satisfies equation (3.2). Now we can define a language with one rule for each monomial in (3.3) and (3.4). Every term $z^a R^{(k)}(z)$ is represented by a rule of the form
$$\tilde R \to w\, P(\cdots P(R) \cdots),$$
where P occurs k times, R and $\tilde R$ are symbols representing languages whose generating functions are P(z) or N(z), and w is a word of length a. Any language which is generated from rules of the form Eq.
(3.1) has a generating function which satisfies a linear differential equation, and hence is D-finite. We summarize this in the following theorem.

Theorem 3.1. A language which is generated from rules of the form Eq. (3.1) has a D-finite generating function. Moreover, any D-finite function can be written as a difference of two generating functions for languages which are generated by rules of this form.

3.2. Acyclic shuffle dependencies

We consider languages generated by the following rewriting rules, where w is a word, and A, B and C are non-terminals:
$$A \to w, \qquad A \to BC, \qquad A \to B \shuffle C. \qquad (3.5)$$
For any language generated by rules of the above type and a fixed set of non-terminals, we construct a graph whose nodes are the non-terminals. For every production rule $A \to B \shuffle C$, we add an edge from A to B and an edge from A to C. If this graph is acyclic, we say the language has acyclic shuffle dependencies. The next section treats languages that have a cyclic dependency.

We prove that this class of languages is larger than the class generated by the pointing operator of the previous section, because we can generate a language whose generating function is not D-finite. We re-use the Dyck languages D and E defined in Section 2.1. Consider the language generated by the following grammar:
A → D ⧢ E
C → 1 | AC.
The shuffle dependency graph is a tree, and thus this language is in our class. The generating functions of A and C are given by
$$A(z) = -\frac{1}{4z} + \frac{16z - 1}{2\pi z}\,\mathrm{EllipticK}(4\sqrt{z}) + \frac{1}{\pi z}\,\mathrm{EllipticE}(4\sqrt{z}), \qquad C(z) = \frac{1}{1 - A(z)}.$$
Since 1 − A(z) is not of the form exp(algebraic) · algebraic, C(z) is not D-finite. Nonetheless, we can prove an asymptotic result about generating functions in this class.

Theorem 3.2. Let L be a proper language generated by shuffle productions in an unambiguous grammar with rules of the form given in Eq.
(3.5), on an alphabet with k letters. The number of words of length n, ℓ(n), satisfies ℓ(n) = O((n!)^2).

Proof. Since the grammar generates proper languages, there are no shuffle productions with epsilon. Thus the derivation tree of a word of length n can have at most n shuffle productions. In the worst case, each one enlarges the alphabet, so the maximum size of the alphabet that a word of length n can draw on is kn. The total number of words of length n over this alphabet is $(kn)^n$. Since $n^n \le e^n n!$, we get $(kn)^n \le (ke)^n n!$, which is $O((n!)^2)$ because k is fixed.

3.3. Cyclic shuffle dependencies

Languages in this class have an infinite alphabet, since we use a disjoint union in our shuffle. However, the number of words of a given length is finite provided no derivation tree involves a shuffle with ε. Under this restriction, any word of length n comes from an alphabet of no more than a constant multiple of n letters. We consider an important class of this type in the next section.

3.4. The shuffle closure of a language

One class of languages in this category consists of the languages generated using the shuffle closure operator. The shuffle closure of a language is defined recursively as follows:
$$L_1 = L, \qquad L_n = L_{n-1} \shuffle L.$$
The shuffle closure is the union over all finite shuffles:
$$L^{\shuffle} = \bigcup_{n \ge 1} L_n.$$
Equivalently, we write this as a grammar production: A → A ⧢ B | B. The shuffle closure [16, 17] provides an extremely concise notation. In particular, shuffle closures arise in descriptions of sequential execution histories of concurrent processes. Remark that the shuffle closure of a language is one single language, whereas the shuffle closure of the class consisting of that one language is an infinite set of languages.
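The counts behind the shuffle closure can be checked numerically. In this Python sketch (our own, with hypothetical names), L_n for a one-word language L = {w} is first built by brute force, tagging each copy of w so that the union stays disjoint. The result is compared with the count obtained by raising the exponential generating function to the n-th power and mapping back to ordinary coefficients, as in Eq. (2.2). For |w| = k, the expected count is (kn)!/(k!)^n.

```python
from fractions import Fraction
from math import factorial


def shuffle(x, y):
    """All interleavings of tuples x and y (kept as a list: disjoint union)."""
    if not x:
        return [y]
    if not y:
        return [x]
    return [(x[0],) + w for w in shuffle(x[1:], y)] + [(y[0],) + w for w in shuffle(x, y[1:])]


def shuffle_power_brute(w, n):
    """L_n for L = {w}: shuffle n copies of w, tagging copy j with j to keep letters distinct."""
    words = [tuple((c, 0) for c in w)]
    for j in range(1, n):
        copy = tuple((c, j) for c in w)
        words = [s for u in words for s in shuffle(u, copy)]
    return words


def shuffle_power_counts(ell, n, max_len):
    """Ordinary counts (up to max_len) of the n-fold shuffle of a language with counts ell.

    Multiply exponential generating functions, then multiply by m! to return
    to ordinary coefficients (the Laplace step).
    """
    egf = [Fraction(ell[m] if m < len(ell) else 0, factorial(m)) for m in range(max_len + 1)]
    power = [Fraction(1)] + [Fraction(0)] * max_len  # the series 1
    for _ in range(n):
        power = [sum(power[i] * egf[m - i] for i in range(m + 1)) for m in range(max_len + 1)]
    return [int(power[m] * factorial(m)) for m in range(max_len + 1)]


print(len(shuffle_power_brute("ab", 3)), shuffle_power_counts([0, 0, 1], 3, 6)[6])
```

Summing `shuffle_power_counts([0, 1], n, N)` over n = 1, …, N for a single letter recovers m! words of each length m ≤ N. This matches the statement that the shuffle closure of a letter gives all permutations.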
The shuffle closure of a single letter gives all permutations:
$$a^{\shuffle} = a \oplus (a\bar a + \bar a a) \oplus (a\bar a\bar{\bar a} + a\bar{\bar a}\bar a + \bar a a\bar{\bar a} + \bar a\bar{\bar a} a + \bar{\bar a} a\bar a + \bar{\bar a}\bar a a) \oplus \cdots$$
The generating function of this language is $\sum_{n \ge 1} n!\, z^n$. Indeed, the generating function of the shuffle closure of any word of length k is $\sum_{n \ge 1} (kn)!\,(z^k/k!)^n$, which is also D-finite. To prove this formula, we express the generating function of the shuffle closure in terms of the operators which switch between ordinary and exponential generating functions. Recall that $L(z) = \sum a_n z^n \implies \hat L(z) = \sum a_n z^n/n!$, and define the Laplace operator $\mathcal{L}$ by $\mathcal{L} \cdot \hat L(z) = L(z)$. Then
$$L^{\shuffle}(z) = \sum_{n \ge 1} \mathcal{L} \cdot \big[(\hat L(z))^n\big]. \qquad (3.6)$$
Although all of the summands are D-finite, it is possible that the sum is not. Clearly, the shuffle closure does not preserve regularity. Indeed, adding it and the shuffle product to regular languages is enough to generate all recursively enumerable languages. Thus, if there is no bound on the number of shuffles possible in an expression tree, the languages can become far more complex. Nonetheless, the following conjecture seems reasonable. Perhaps it can be proved starting from Eq. (3.6), with a necessarily more sophisticated analysis.

Conjecture 3.3. The shuffle closure of a regular language has a D-finite generating function.

4. Conclusion

A next step is to adapt Boltzmann generators to these languages. Since we can effectively simulate labelled objects in an unlabelled context, we can easily describe objects like strong interval trees. This approach might allow a detailed analysis of certain parameters of permutation sorting by reversals, as applied to comparative genomics [1]. We are also interested in characterizing the context-free languages whose shuffle is not algebraic, and in considering the other natural closure questions that are standard for language classes.
Acknowledgments. We gratefully acknowledge many discussions at the Algebraic Combinatorics Seminar at the Fields Institute. In particular, we acknowledge contributions by N. Bergeron, C. Hollweg, and M. Rosas. We also acknowledge the financial support of NSERC.

References

[1] Séverine Bérard, Anne Bergeron, Cedric Chauve, and Christophe Paul. Perfect sorting by reversals is not always difficult. IEEE/ACM Trans. on Comput. Biology and Bioinformatics, 4(1), 2007.
[2] F. Bergeron, G. Labelle, and P. Leroux. Combinatorial species and tree-like structures, volume 67 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1998.
[3] François Bergeron and Christophe Reutenauer. Combinatorial resolution of systems of differential equations. III. A special class of differentially algebraic series. European J. Combin., 11(6):501–512, 1990.
[4] François Bergeron and Ulrike Sattler. Constructible differentially finite algebraic series in several variables. Theoret. Comput. Sci., 144(1-2):59–65, 1995.
[5] Manuel Bodirsky, Éric Fusy, Mihyun Kang, and Stefan Vigerske. An unbiased pointing operator for unlabeled structures, with applications to counting and sampling. In Nikhil Bansal, Kirk Pruhs, and Clifford Stein, editors, SODA, pages 356–365. SIAM, 2007.
[6] Mireille Bousquet-Mélou. Algebraic generating functions in enumerative combinatorics, and context-free languages. In STACS 05, volume 3404 of Lecture Notes in Comput. Sci., pages 18–35. Springer, 2005.
[7] Mireille Bousquet-Mélou. Rational and algebraic series in combinatorial enumeration. In International Congress of Mathematicians, pages 789–826, 2006.
[8] Frédéric Chyzak, Marni Mishna, and Bruno Salvy. Effective scalar products of D-finite symmetric functions. J. Combin. Theory Ser. A, 112(1):1–43, 2005.
[9] Philippe Duchon, Philippe Flajolet, Guy Louchard, and Gilles Schaeffer. Boltzmann samplers for the random generation of combinatorial structures. Combin. Probab. Comput., 13(4-5):577–625, 2004.
[10] Philippe Flajolet. Analytic models and ambiguity of context-free languages. Theoret. Comput. Sci., 49(2-3):283–309, 1987.
[11] Philippe Flajolet, Danièle Gardy, and Loÿs Thimonier. Birthday paradox, coupon collectors, caching algorithms and self-organizing search. Discrete Appl. Math., 39(3):207–229, 1992.
[12] Philippe Flajolet and Robert Sedgewick. Analytic Combinatorics. http://algo.inria.fr/flajolet/Publications/books.html, 2006.
[13] Philippe Flajolet, Paul Zimmermann, and Bernard Van Cutsem. A calculus for the random generation of labelled combinatorial structures. Theoret. Comput. Sci., 132(1-2):1–35, 1994.
[14] Jay Gischer. Shuffle languages, Petri nets, and context-sensitive grammars. Communications of the ACM, 24(9), September 1981.
[15] Daniel Hill Greene. Labelled Formal Languages and Their Uses. PhD thesis, Stanford University, 1983.
[16] Matthias Jantzen. Extending regular expressions with iterated shuffle. Theoret. Comput. Sci., 38(2-3):223–247, 1985.
[17] Joanna Jędrzejowicz. Infinite hierarchy of expressions containing shuffle closure operator. Inform. Process. Lett., 28(1):33–37, 1988.
[18] Gilbert Labelle and Cédric Lamathe. A theory of general combinatorial differential operators. In Formal Power Series and Algebraic Combinatorics, 2007.
[19] L. Lipshitz. The diagonal of a D-finite power series is D-finite. J. Algebra, 113(2):373–378, 1988.
[20] M. Lothaire. Combinatorics on words, volume 17 of Encyclopedia of Mathematics and its Applications. Addison-Wesley Publishing Co., Reading, Mass., 1983.
[21] Marni Mishna. Automatic enumeration of regular objects. J. Integer Sequences, 10:Article 07.5.5, 2007.
[22] Lee A. Rubel. A survey of transcendentally transcendental functions. Amer. Math. Monthly, 96(9):777–788, 1989.
[23] Michael F. Singer. Algebraic relations among solutions of linear differential equations. Trans. Amer. Math. Soc., 295(2):753–763, 1986.
[24] Richard P. Stanley. Enumerative combinatorics. Vol. 2, volume 62 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1999.

This work is licensed under the Creative Commons Attribution-NoDerivs License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/3.0/.