Analytic aspects of the shuffle product

Symposium on Theoretical Aspects of Computer Science 2008 (Bordeaux), pp. 561-572
www.stacs-conf.org

MARNI MISHNA (1) AND MIKE ZABROCKI (2)

(1) Department of Mathematics, Simon Fraser University, Burnaby, Canada
E-mail address: mmishna@sfu.ca

(2) Department of Mathematics and Statistics, York University, Toronto, Canada
E-mail address: zabrocki@mathstat.yorku.ca

Abstract. There exist very lucid explanations of the combinatorial origins of rational and algebraic functions, in particular with respect to regular and context free languages. In the search to understand how to extend these natural correspondences, we find that the shuffle product models many key aspects of D-finite generating functions, a class which contains the algebraic functions. We consider several different takes on the shuffle product, shuffle closure, and shuffle grammars, and give explicit generating function consequences. In the process, we define a grammar class that models D-finite generating functions.

Introduction

Generating functions of languages

The (ordinary) generating function of a language $L$ is the sum $L(z) = \sum_{w \in L} z^{|w|}$, where $|w|$ is the length of the word $w$. This sum is a formal power series if there are finitely many words of a given length. In this case, we say the language is proper, and we can rewrite $L(z)$ as $L(z) = \sum \ell(n) z^n$, where $\ell(n)$ is the number of words in $L$ of length $n$. In the case where we have an unambiguous grammar to describe a regular language or a context free language, one can automatically generate equations satisfied by the generating function directly from the grammar. These are the well known translations:

$$L = L_1 + L_2 \implies L(z) = L_1(z) + L_2(z)$$
$$L = L_1 \cdot L_2 \implies L(z) = L_1(z)\, L_2(z)$$
$$L = L_1^* \implies L(z) = (1 - L_1(z))^{-1}.$$
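The three translations can be tried out on truncated power series. The sketch below is only an illustration: the helper names (`series_add`, `series_mul`, `series_star`) and the example language $(a + bb)^*$ are assumptions introduced here, not taken from the paper. Since $\{a, bb\}$ is a prefix code, the star is unambiguous and the translation should give $1/(1 - z - z^2)$, whose coefficients are the Fibonacci numbers.

```python
def series_add(f, g):
    """Coefficients of L1(z) + L2(z) (disjoint union of languages)."""
    return [x + y for x, y in zip(f, g)]


def series_mul(f, g):
    """Coefficients of L1(z) * L2(z) (unambiguous concatenation), truncated."""
    n = min(len(f), len(g))
    return [sum(f[i] * g[k - i] for i in range(k + 1)) for k in range(n)]


def series_star(f):
    """Coefficients of 1 / (1 - L1(z)) (unambiguous star), truncated."""
    if f[0] != 0:
        raise ValueError("the star needs a language without the empty word")
    out = [1] + [0] * (len(f) - 1)
    for k in range(1, len(f)):
        # S = 1 + L1 * S, read off coefficient by coefficient
        out[k] = sum(f[i] * out[k - i] for i in range(1, k + 1))
    return out


N = 8                                  # truncation order (hypothetical choice)
word_a = [0, 1] + [0] * (N - 2)        # the language {a}:  z
word_bb = [0, 0, 1] + [0] * (N - 3)    # the language {bb}: z^2
counts = series_star(series_add(word_a, word_bb))
print(counts)  # number of words of each length in (a + bb)^*
```

The same helpers apply to any unambiguous regular expression, provided each sub-language is represented by its list of counts by length.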
Generating functions of formal languages are now a very established tool for algorithm analysis (see [12] for many references) and increasingly for random generation [9]. In this context, we are also interested in the exponential generating function of a language. The two are related by the Laplace-Borel transform; however, it is sufficient for our purposes to think of the exponential generating function $\hat{L}(z)$ as the Hadamard product of $L(z)$ and $\exp(z) = \sum z^n/n!$; that is, $\hat{L}(z) = \sum \ell(n)\, z^n/n!$.

1998 ACM Subject Classification: F.4.3 Formal Languages.
Key words and phrases: generating functions, formal languages, shuffle product.
© M. Mishna and M. Zabrocki. Creative Commons Attribution-NoDerivs License.

One spectacular feature of generating functions of languages is the extent to which their analytic complexity models the complexity of the language. Specifically, we have the two classic results: first, regular languages have rational generating functions, and second, those context-free languages which are not inherently ambiguous have an algebraic generating function. The context-free languages form a large and historically important subclass of all objects which have algebraic generating functions. Bousquet-Mélou provides us [6, 7] with an interesting discussion of the nature of combinatorial structures that possess algebraic and rational generating functions, including broad classes that are not representable as context-free languages.

There remain unanswered questions related to other classes of languages, and other classes of functions. An example of the former is the question of Flajolet [10]: "In which class of transcendental functions do generating functions of (general) context free languages lie?" An example of the latter is the identification of languages whose generating functions are D-finite.¹
This is an exceptional class of functions [24], which, for the moment, lacks a satisfying combinatorial explanation. We survey some current understandings in Section 1.3, and provide a language theoretic interpretation of one in Section 3.1.

To capture the analytic complexity of D-finite generating functions we should not expect a simple climbing of the language hierarchy (to indexed or context sensitive, say), as there are different notions of complexity in competition. For example the language $\{a^n b^n c^n : n \in \mathbb{N}\}$ is difficult to recognize, but trivial to enumerate. Likewise, the generating function of the relatively simple looking language $\{z^{n^2} : n \in \mathbb{N}\}$ has a natural boundary at $|z| = 1$, which is a trademark of very complex analytic behaviour.

The shuffle product

In the absence of the obvious answers, we consider a very common and useful operator, the shuffle product, and discover that it fills in many interesting holes in this story. Consider the words $w$, $uw_1$ and $vw_2$, and the letters $u, v \in \Sigma$. We define the shuffle product of two words recursively by the equation

$$uw_1 \shuffle vw_2 = u(w_1 \shuffle vw_2) + v(uw_1 \shuffle w_2), \qquad w \shuffle \epsilon = w; \quad \epsilon \shuffle w = w.$$

Here the union is disjoint, and we distinguish duplicated letters from the second word by a bar: $a \shuffle a = \{a\bar{a}, \bar{a}a\}$. Using the shuffle product we can define a class of languages with associated generating functions that form a class that strictly contains algebraic functions; it allows us to model a very straightforward combinatorial interpretation of the derivative (indeed in some interesting non-commutative algebras the shuffle product is even called a derivative); and it allows us to neatly consider some larger classes which are simultaneously more complex from the language and generating function points of view.

¹ D-finite, also known as holonomic, functions satisfy linear differential equations with polynomial coefficients.
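The recursive definition of the shuffle product of two words translates almost literally into code. The following is a minimal sketch, assuming that the two words are written over disjoint alphabets; here upper-case letters stand in for the barred copies, which is a convention chosen only for this illustration.

```python
from math import comb
from typing import List


def shuffle(w1: str, w2: str) -> List[str]:
    """All interleavings of w1 and w2, following the recursive definition.

    Assumes the two words use disjoint alphabets, so that the union
    in the definition is disjoint and no word is produced twice.
    """
    if not w1:                      # epsilon shuffled with w is w
        return [w2]
    if not w2:                      # w shuffled with epsilon is w
        return [w1]
    u, v = w1[0], w2[0]
    return ([u + s for s in shuffle(w1[1:], w2)] +
            [v + s for s in shuffle(w1, w2[1:])])


print(shuffle("a", "A"))            # the example a shuffled with its barred copy
words = shuffle("abc", "DE")
print(len(words), comb(5, 3))       # number of interleavings vs. binomial coefficient
```

The number of words in the shuffle of words of lengths $n_1$ and $n_2$ is the binomial coefficient $\binom{n_1+n_2}{n_1}$, since an interleaving is determined by the positions taken by the letters of the first word.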
Goal and Results

The aim of this study is two-fold. We hope that a greater understanding of the generating function implications of adding the shuffle product to context free languages provides insight into a larger class of combinatorial problems. The second goal is to understand the combinatorial interpretations of different function classes that arise between algebraic and D-finite. The shuffle is a natural combinatorial product to consider since it is, in some sense, a generalization of pointing.

In the present work, we first examine the shuffle as an operator on languages, and in the second part we consider the shuffle as a grammar production rule to define languages. We show that the shuffle closure of the context free languages is D-finite; we give the asymptotic growth of coefficients of two classes using shuffle; we define a special pointing class that describes all D-finite functions; and we discuss the shuffle closure of a language.

In the next section we review interpretations of differential equations. This is followed by a discussion on the shuffle of languages, and some descriptions of shuffle grammars.

1. Interpreting differential equations combinatorially

1.1. The class of D-finite functions

The class of D-finite functions is of interest to the combinatorialist for many reasons. The coefficient sequence of a D-finite power series is P-recursive: it satisfies a linear recurrence of fixed length with polynomial coefficients, and hence it is easy to generate, to manipulate, and even to "guess" its form. By definition, D-finite functions satisfy linear differential equations with polynomial coefficients, and thus it is relatively straightforward in many cases to perform an asymptotic analysis on the coefficients, even without a closed form for the generating function.
One important feature that we use here is that a P-recursive sequence grows asymptotically like

$$\ell(n) \sim \lambda\, (n!)^{r/s} \exp\!\big(Q(n^{1/m})\big)\, \omega^n\, n^{\alpha} (\log n)^k,$$

where $r, s, m, n, k \in \mathbb{N}$, $Q$ is a polynomial and $\lambda, \omega, \alpha$ are complex numbers. We contrast this to the asymptotic template satisfied by coefficients of algebraic functions:

$$\ell(n) \sim \frac{\kappa\, n^{d}}{\Gamma(d+1)}\, \omega^{-n}, \tag{1.1}$$

where $\kappa$ is an algebraic number and $d \in \mathbb{Q} \setminus \{-1, -2, \ldots\}$. (A very complete source on the theory of asymptotic expansions of coefficients of algebraic functions arising in the combinatorial context is [12, Section VII.4.1].) Notable differences include the exponential/logarithmic factors, the power of a factorial, and the allowable exponents of $n$.

We shall use the following properties of the D-finite functions: the function $1/f$ is D-finite if, and only if, $f$ is of the form $\exp(g)h$, where $g$ and $h$ are algebraic [23]; the Hadamard product $f \times g = \sum f_n g_n z^n$ of two D-finite functions $f = \sum f_n z^n$ and $g = \sum g_n z^n$ is also D-finite.

1.2. The simplest shuffle: the point

Pointing (or marking) is an operation that has been long studied in connection with structures generated by grammars. The point of a word $w$, denoted $P(w)$, is a set of words, each with a different position marked. For example, $P(abc) = \{\underline{a}bc, a\underline{b}c, ab\underline{c}\}$. From the enumerative point of view we remark that the two languages $L$ and $L_1 = P(L) = \{P(w) : w \in L\}$ satisfy the enumerative relation

$$\ell_1(n) = n\,\ell(n), \tag{1.2}$$

and hence $L_1(z) = z \frac{d}{dz} L(z)$. The pointing operator is relevant to our discussion because of the simple bijective correspondence between $P(L)$ and $L \shuffle a = \{w \shuffle a : w \in L\}$.

The first obvious question is, "does pointing increase expressive power?"
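Before turning to that question, relation (1.2) can be checked on a small example. The sketch below is illustrative only: it writes the marked letter in square brackets (a display convention assumed here), and uses the language of all words of length 3 over $\{a, b\}$ as a hypothetical test case.

```python
from itertools import product


def point(word):
    """All pointed versions of a word; the marked letter is shown in brackets."""
    return [word[:i] + "[" + word[i] + "]" + word[i + 1:]
            for i in range(len(word))]


def point_language(words):
    """Pointing applied to every word of a (finite) language."""
    return [p for w in words for p in point(w)]


n = 3
language = ["".join(t) for t in product("ab", repeat=n)]   # ell(3) = 8 words
pointed = point_language(language)

print(point("abc"))
print(len(language), len(pointed))   # ell(n) and ell_1(n) = n * ell(n)
```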
In the case of regular languages and context free languages the answer is no; we can add a companion non-terminal for each non-terminal that generates a language isomorphic to the pointed language. Let $\underline{A}$ be the pointed version of $A$. We add the following rules which model pointing:

$$\underline{(AB)} = \underline{A}B + A\underline{B}, \qquad \underline{(A + B)} = \underline{A} + \underline{B}.$$

Remark how these rules resemble the corresponding product and sum rules for differentiation. Furthermore, from the point of view of generating functions, we know that the derivative of a rational function is rational again, and the derivative of an algebraic function is again algebraic, and so we know immediately that we could not hope to increase the class of generating functions represented by this method.

Pointing, when paired with a "de-pointing" operator which removes such marks, becomes powerful enough to describe other kinds of constructions, namely labelled cycles and sets [13, 15]. In this case we can describe set partitions, whose exponential generating function $\exp(\exp(z) - 1)$ is not D-finite. It takes much more effort [5] to define a pointing operator with a differentiation property as in Eq. (1.2) for unlabelled structures defined using Set and Cycle constructions. It is a fruitful exercise, as one can then generate approximate size samplers with expected linear time complexity.

1.3. Other combinatorial derivatives

Combinatorial species theory [2] provides a rich formalism for explaining the interplay between analytic and combinatorial representations of objects. In particular, using the vehicle of the cycle index series, there are several possibilities for relating species to (multivariate) D-finite functions [18, 21].
In this realm, given any arbitrary linear differential equation with polynomial coefficients we can define a set of grammar operators that allow us to construct a pair of species whose difference has a generating function that satisfies the given differential equation. Unfortunately at present we lack the intuition to understand what this class "is"; specifically, we lack the tools to construct a test to see if any given class or language falls within it.

In Section 3.1 we give a language theoretic interpretation of the derivative of a species; specifically a grammar system, from which, for any linear differential equation with coefficients from $\mathbb{Q}[x]$, we can generate a language whose generating function satisfies this equation.

1.4. Other differential classes

There are several other natural function classes related to differential equations. A series $f(t) \in K[[t]]$ is said to be constructible differentiably algebraic (CDF) if it belongs to some finitely generated ring which is closed under differentiation [3, 4]. This is equivalent to satisfying a system of differential equations of a given form. Combinatorially, any CDF function can be interpreted as a family of enriched trees. Theorem 3 of [3] gives the result that if $\sum a_n t^n/n!$ is CDF, then $|a_n| = O(\alpha^n n!)$ for some complex constant $\alpha$. This class is not closed under Hadamard product, and an arbitrary CDF function is unlikely to have an image under the Borel transform that is also CDF. This is the key closure property required for a meaningful correspondence with respect to the shuffle product.

A larger class which contains both CDF and D-finite is differentiably algebraic. A function is differentiably algebraic (DA) if it satisfies an algebraic differential equation of the form $P(x, y, y', \ldots, y^{(n)}) = 0$ where $P$ is a non-trivial polynomial in its $n + 2$ variables.
(See Rubel's survey [22] for many references.) The set of DA functions is closed under multiplicative inverse and Hadamard product. These two facts together are sufficient to prove that all of the classes we consider are differentiably algebraic.

1.5. Generating functions and shuffles

Generating functions are a useful tool for the automatic study of certain combinatorial problems. The shuffle operator has a straightforward implication on the generating function, as we shall see.

With the aid of the shuffle product, Flajolet et al. [11] are able to perform a straightforward analysis of four problems in random allocation. By using some systematic translations, they are able to derive integral representations for expectations and probability distributions. As they remark, the shuffle of languages appears in several places relating to analysis of algorithms (such as evolution of two stacks in a common memory area).

2. The shuffle of two languages

The shuffle of two languages is defined as

$$L_1 \shuffle L_2 = \bigcup_{w_1 \in L_1,\ w_2 \in L_2} (w_1 \shuffle w_2).$$

In order to use a generating function approach, we assume that $L_1$ is a language over the alphabet $\Sigma_1$, and $L_2$ is a language over $\Sigma_2$, and $\Sigma_1 \cap \Sigma_2 = \emptyset$. If they share an alphabet, it suffices to add a bar on top of the copy from $\Sigma_2$.

2.1. The shuffle closure of context free languages

We consider the shuffle closure of a language in the next section, and first concentrate on the shuffle closure of a class of languages. For any given class of languages $\mathcal{C}$, the shuffle closure can be defined as the (infinite) union of $S_0, S_1, \ldots$, the sequence recursively defined by

$$S_0 = \mathcal{C}, \qquad S_n = \{L_1 \shuffle L_2 : L_1 \in S_{n-1},\ L_2 \in \mathcal{C}\}.$$

The shuffle product is commutative and associative [20], and thus the closure contains $S_j \shuffle S_i$, for any $i$ and $j$.
Remark that for any given language in the closure, there is a bound on the number of shuffle productions that can occur in any derivation tree; namely, if $L \in S_n$, that bound is $n$. In general, we denote the closure of a class of languages under shuffle as $\mathcal{C}^{\shuffle}$.

The class of regular languages is closed under the shuffle product, since the shuffle of any two regular languages is regular. However, the context free languages are not closed under the shuffle product [20], and hence we consider their closure. The prototypical language in this class is the shuffle of (any finite number of) Dyck languages. Let $|w|_a$ denote the number of occurrences of the letter $a$ in the word $w$. Let $D$ be the Dyck language over the alphabet $\Sigma = \{u, d\}$:

$$D = \{w \in \Sigma^* : w'v = w \implies |w'|_u \ge |w'|_d, \text{ and } |w|_u = |w|_d\}.$$

We construct an isomorphic version $E$, over the alphabet $\{l, r\}$. The language $D \shuffle E$ encodes random walks restricted to the quarter plane with steps from u(p), d(own), r(ight), and l(eft) that return to the origin. By considering the larger language of Dyck prefixes, we can model walks that end anywhere in the quarter plane. Indeed, as the shuffle does preserve two distinct sets of prefix conditions, there are many examples of random walks in bounded regions that can be expressed as shuffles of algebraic languages. It might be interesting to consider other standard questions of classes of languages for this closure class; in particular if interesting random walks arise.

2.2. The closure is D-finite

In order to show that the shuffle product of two languages with D-finite generating functions also has a D-finite generating function, we consider the following classic observation on the enumeration of shuffles of languages.
If $L$ is the shuffle of $L_1$ and $L_2$, then the number of words of length $n$ in $L$ is easily counted when the generating series $L_1(z) = \sum \ell_1(n) z^n$ and $L_2(z) = \sum \ell_2(n) z^n$ are known, by the following formula:

$$\ell(n) = \sum_{n_1 + n_2 = n} \binom{n}{n_1,\, n_2}\, \ell_1(n_1)\, \ell_2(n_2).$$

To see this, recognize that a word in $L$ is composed of two words, and a set of positions for the letters in the word from $L_1$. This is equivalent to

$$\frac{\ell(n)}{n!} = \sum_{n_1 + n_2 = n} \frac{\ell_1(n_1)}{n_1!}\, \frac{\ell_2(n_2)}{n_2!}, \tag{2.1}$$

which amounts to the relation between the exponential generating functions of the three languages:

$$L = L_1 \shuffle L_2 \implies \hat{L}(z) = \hat{L}_1(z)\, \hat{L}_2(z). \tag{2.2}$$

Using these relations, we can easily prove the following result.

Proposition 2.1. If $L_1$ and $L_2$ are languages with D-finite ordinary generating functions, then the generating series $L(z)$ for $L = L_1 \shuffle L_2$ is also D-finite.

As is the case with many of the most interesting closure properties of D-finite functions, the proof follows from the closure of D-finite functions under Hadamard product [19].

Proof. Since D-finite functions are closed under Hadamard product, the ordinary generating function of a sequence is D-finite if and only if its exponential generating function is D-finite. Consequently, if $L_1(z)$ and $L_2(z)$ are D-finite, then so are the exponential generating functions $\hat{L}_1(z)$ and $\hat{L}_2(z)$. By closure under product, $\hat{L}(z)$ is D-finite, and thus so is $L(z)$. □

This result has the following consequences.

Corollary 2.2. If $L_1$ and $L_2$ are context free languages which are not inherently ambiguous, then the generating series $L(z)$ for $L = L_1 \shuffle L_2$ is D-finite.

Corollary 2.3. Any language in the shuffle closure of context free languages has a D-finite generating function.

2.3. Asymptotic template for $\ell(n)$

We continue the example from the previous section using the two Dyck languages $D$ and $E$. It is straightforward to compute that $D(z) = E(z) = \sum \binom{2n}{n} \frac{1}{n+1} z^n$. Thus, $\ell(n)$, the number of words of length $n$ in the shuffle, is given by

$$\ell(n) = \sum_{n_1 + n_2 = n} \binom{n}{n_1} \binom{n_1}{n_1/2} \binom{n_2}{n_2/2}.$$

We remark that an asymptotic expression for $\ell(n)$ can be determined by first using the Vandermonde-Chu identity to simplify $\ell(n)$:

$$\ell(n) = \binom{n}{\lfloor n/2 \rfloor} \binom{n+1}{\lceil n/2 \rceil},$$

and then by applying Stirling's formula. Since $\ell(n) \sim 4^n/n$, we see that the resulting series is not algebraic. Flajolet uses this technique extensively in [10] to prove that certain context-free languages are inherently ambiguous. Thus, the generating functions of our class strictly contain the algebraic functions.

We therefore have some elements of a class of functions with a nice asymptotic expansion. A rough calculation gives that the number of words in the shuffle of two languages, with respective asymptotic growth of $\kappa_i n^{r_i} (\alpha_i)^n$ for $i = 1, 2$, is given by the expression

$$\ell(n) \sim \kappa\, n^{r_1 + r_2} (\alpha_1 + \alpha_2)^{n - r_1 - r_2}.$$

How could one hope to prove directly that all elements in this class have an expansion of the form $\ell(n) \sim \kappa \alpha^n n^r$, where now $r$ can be any rational, and $\kappa$ is no longer restricted to algebraic numbers? It seems that it should be possible to prove this at least for the shuffles of series which satisfy the hypotheses of Theorem 3.11 [7], using a more generalized form of the Chu-Vandermonde identity, or for the closure of the sub-class of context-free languages possessing an $\mathbb{N}$-algebraic generating function. In this case $d = -3/2$, and this simplifies the analyses considerably. Unfortunately, it does not seem that Bender's method [12, Theorem VI.2] applies directly.
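The convolution sum and the closed form above can be compared numerically. The sketch below rests on an assumption about the intended reading: the binomials $\binom{n_i}{n_i/2}$ are taken with $\lfloor n_i/2 \rfloor$, which is the count of Dyck prefixes of length $n_i$, so that the sum counts quarter-plane walks ending anywhere. The function names are hypothetical, and the brute-force enumeration serves only as an independent check for small $n$.

```python
from itertools import product
from math import comb


def by_sum(n):
    """Shuffle count via the convolution formula, reading n_i/2 as floor(n_i/2)."""
    return sum(comb(n, n1) * comb(n1, n1 // 2) * comb(n - n1, (n - n1) // 2)
               for n1 in range(n + 1))


def closed_form(n):
    """binom(n, floor(n/2)) * binom(n+1, ceil(n/2))."""
    return comb(n, n // 2) * comb(n + 1, (n + 1) // 2)


def quarter_plane_walks(n):
    """Brute force: words over {u, d, r, l} whose prefixes stay in x >= 0, y >= 0."""
    steps = {"u": (0, 1), "d": (0, -1), "r": (1, 0), "l": (-1, 0)}
    count = 0
    for word in product("udrl", repeat=n):
        x = y = 0
        for s in word:
            dx, dy = steps[s]
            x, y = x + dx, y + dy
            if x < 0 or y < 0:
                break
        else:                       # no prefix left the quarter plane
            count += 1
    return count


for n in range(7):
    print(n, by_sum(n), closed_form(n), quarter_plane_walks(n))
```

Under this reading the three columns agree for the small values tested, which is consistent with the simplification by the Vandermonde-Chu identity.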
Theorem 3.2 states that the asymptotic form will not contain any powers of $n!$ greater than 2. This illustrates a limitation of the expressive power of the shuffle closure of context free languages: there are known natural combinatorial objects which have D-finite generating functions with coefficients that grow asymptotically with higher powers of $n!$. For example, the number of $k$-regular graphs for $k > 4$ contains $(n!)^{5/2}$, and the conjectured asymptotic for $k$-uniform Young tableaux [8] contains $n!^{k/2 - 1}$.

3. Shuffle grammars

We extend the first approach by allowing the shuffle to come into play earlier in the story; we add the shuffle operator to our grammar rewriting rules. Shuffle grammars as defined by Gischer [14] include a shuffle rule, and a shuffle closure rule. We consider these in Section 3.4.

As we did earlier, we first consider languages which have a natural bound on the number of shuffle productions that can occur in a derivation tree of any word in the language. That is followed by an example of a recursive shuffle grammar to illustrate how powerful they can be. It has been proven [17] that the recursive shuffle grammars do indeed have a greater expressive power, but it is not always clear how to interpret the resulting combinatorial families. We begin with a second kind of pointing operator.

3.1. A terminal pointing operator

The traditional pointing operator can be used to model $z \frac{d}{dz}$, but one can show that this is, in fact, insufficient to generate all D-finite functions. To remedy this, we define a pointing operator which mimics the concept behind the derivative of a species. This pointing operator has the effect of converting a letter to an epsilon by 'marking' the letter.
Consequently, a letter cannot be marked more than once, and each subsequent time a word is marked, there is a counter on the mark which is augmented. The pointing operator applied to a set of words will be the pointing operator applied to each of the elements of the set. Notationally, we distinguish the marks with accumulated primes. We give some examples:

$$P(aab) = a'ab + aa'b + aab'$$
$$P(P(aab)) = a'a''b + a'ab'' + a''a'b + aa'b'' + a''ab' + aa''b'$$
$$P(a'''a'b'') = \emptyset.$$

The length of the word is the number of unmarked letters in a word (but the combinatorial objects in the language encode more than just the length in some sense). The number of words in the pointing of a word is equal to its length. This gives a straightforward interpretation of the derivative:

$$L_1 = P(L) \implies L_1(z) = \frac{d}{dz} L(z).$$

Using this definition if $A$ is a symbol which 'yields' through a grammar a language

Remark, if we allow concatenation after marking, we could generate two letters in the same word marked with a single prime via concatenation of marked words.

Using the marking operation, we can express most D-finite functions, specifically, by the differential equations that they satisfy. For example, the series $P(z) = \sum_{n \ge 0} n!\, z^n$ satisfies the differential equation

$$P(z) = 1 + zP(z) + z^2 P'(z).$$

This is modelled by the grammar

$$A \to \varepsilon, \qquad A \to aA, \qquad A \to bc\,P(A).$$

An alphabet on three letters $(a, b, c)$ allows us to track the origin of each letter. Here is the result of the third iteration of the rules:

$$1 \oplus a \oplus aa + bca' \oplus aaa + abca' + bca'a + bcb''ca' + bcaa' + bcbc''a' \oplus aaaa + aabca' + abca'a.$$

We will call a pointing grammar one that has rules of the form

$$A \to w, \qquad A \to wB, \qquad A \to P(B). \tag{3.1}$$

Despite the fact that we allow only left concatenation (a strategy to avoid concatenating pointed words), these grammar rules can model any D-finite function. We can define a procedure for finding a language given a defining equation satisfied by a D-finite generating function. Say that a generating function $T(z)$ satisfies

$$T(z) = q(z) + q_0(z)T(z) + q_1(z)T'(z) + \cdots + q_n(z)T^{(n)}(z). \tag{3.2}$$

Now substitute $T(z) = P(z) - N(z)$ to obtain

$$P(z) - N(z) = q(z) + q_0(z)\big(P(z) - N(z)\big) + q_1(z)\big(P'(z) - N'(z)\big) + \cdots + q_n(z)\big(P^{(n)}(z) - N^{(n)}(z)\big).$$

Use also the notation $q_i(z) = q_i^+(z) - q_i^-(z)$, where $q_i^+(z)$ collects the positive terms of the polynomial and $q_i^-(z)$ the negative ones. Then if

$$P(z) = q^+(z) + q_0^+(z)P(z) + q_0^-(z)N(z) + \cdots + q_n^+(z)P^{(n)}(z) + q_n^-(z)N^{(n)}(z) \tag{3.3}$$

and

$$N(z) = q^-(z) + q_0^-(z)P(z) + q_0^+(z)N(z) + \cdots + q_n^-(z)P^{(n)}(z) + q_n^+(z)N^{(n)}(z) \tag{3.4}$$

then $P(z) - N(z)$ satisfies equation (3.2). Now we can define a language with a rule for each monomial in (3.3) and (3.4): every term $z^a R^{(k)}(z)$ is represented by a rule of the form $\tilde{R} \to w\, P(\cdots P(R) \cdots)$, where $P$ occurs $k$ times, $R$ and $\tilde{R}$ are symbols representing a language whose generating function is either $P(z)$ or $N(z)$, and $w$ is a word of length $a$.

Any language which is generated from rules of the form Eq. (3.1) has a generating function which satisfies a linear differential equation, and hence is D-finite. We summarize this in the following theorem.

Theorem 3.1. A language which is generated from the rules of the form Eq. (3.1) has a D-finite generating function. Moreover, any D-finite function can be written as a difference of two generating functions for languages which are generated by rules of this form.
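The terminal pointing operator and the grammar $A \to \varepsilon$, $A \to aA$, $A \to bc\,P(A)$ for $\sum n!\,z^n$ can be simulated directly. The sketch below is an illustration under stated assumptions: a word is stored as a tuple of (letter, number of primes) pairs, the counter for a new mark is taken to be one more than the largest mark already present, and words are generated level by level according to their length (number of unmarked letters). The names `point`, `length`, `show` and `words_by_length` are hypothetical.

```python
from math import factorial


def point(word):
    """Terminal pointing: mark one unmarked letter with a fresh counter."""
    counter = max((m for _, m in word), default=0) + 1
    return [word[:i] + ((c, counter),) + word[i + 1:]
            for i, (c, m) in enumerate(word) if m == 0]


def length(word):
    """The length of a word is its number of unmarked letters."""
    return sum(1 for _, m in word if m == 0)


def show(word):
    """Render marks as accumulated primes, e.g. bcb''ca'."""
    return "".join(c + "'" * m for c, m in word)


def words_by_length(max_len):
    """Words generated by A -> eps | aA | bc P(A), grouped by length."""
    levels = [{()}]                                  # length 0: the empty word
    for _ in range(max_len):
        new = set()
        for w in levels[-1]:
            new.add((("a", 0),) + w)                 # A -> aA adds one letter
            for p in point(w):                       # A -> bc P(A): +2 letters, -1 marked
                new.add((("b", 0), ("c", 0)) + p)
        levels.append(new)
    return levels


levels = words_by_length(5)
print([len(s) for s in levels])           # compare with n!
print(sorted(show(w) for w in levels[3]))
```

In this simulation the recurrence $\ell(n) = \ell(n-1) + (n-1)\,\ell(n-1)$ read off from the differential equation is visible in the two branches of the loop.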
3.2. Acyclic shuffle dependencies

We consider languages generated by the following re-writing rules, where $w$ is a word, and $A$, $B$ and $C$ are non-terminals:

$$A \to w, \qquad A \to BC, \qquad A \to B \shuffle C. \tag{3.5}$$

For any language generated by rules of the above type, and a fixed set of non-terminals, we construct the graph with non-terminals as nodes, and for every production rule $A \to B \shuffle C$, we make an edge from $A$ to $B$ and an edge from $A$ to $C$. If this graph is acyclic, we say the language has acyclic shuffle dependencies. The next section treats languages that have a cyclic dependency.

We prove that this class of languages is larger than those generated by the pointing operator of the previous section, because we can generate a language with a generating function that is not D-finite. We re-use the Dyck languages $D$ and $E$ defined in Section 2.1. Consider the language generated by the following grammar:

$$A \to D \shuffle E, \qquad C \to 1 \mid AC.$$

The shuffle dependency graph is a tree, and thus this is in our class. The generating functions of $A$ and $C$ are given by

$$A(z) = -\frac{1}{4z} + \frac{(16z - 1)}{2\pi z}\, \mathrm{EllipticK}(4\sqrt{z}) + \frac{1}{\pi z}\, \mathrm{EllipticE}(4\sqrt{z}), \qquad C(z) = \frac{1}{1 - A(z)}.$$

Since $1 - A(z)$ is not of the form $\exp(\text{algebraic}) \cdot \text{algebraic}$, $C(z)$ is not D-finite. Nonetheless, we can prove an asymptotic result about generating functions in this class.

Theorem 3.2. Let $L$ be a proper language generated by shuffle production in an unambiguous grammar with rules of the form given in Eq. (3.5), on an alphabet with $k$ letters. The number of words of length $n$, $\ell(n)$, satisfies $\ell(n) = O(n!^2)$.

Proof. Since the grammar generates proper languages, there are no shuffle productions with epsilon. Thus, the derivation tree of a word of length $n$ can have at most $n$ shuffle productions.
In the worst case, each one increments the alphabet and so the maximum size of alphabet that a word of length $n$ can draw on is then $kn$. The total number of words from this alphabet is $(kn)^n$. For $k < n$ the result follows by Stirling's formula. □

3.3. Cyclic shuffle dependencies

Languages in this class will have an infinite alphabet since we use a disjoint union in our shuffle. However, the number of words of a given length is finite if there is no derivation tree possible that is a shuffle and an $\epsilon$. Under this restriction, any word of length $n$ comes from an alphabet using no more than a constant multiple of $n$ letters. We consider an important class of this type in the next section.

3.4. The shuffle closure of a language

A class of languages which falls under this category consists of those that are generated using the shuffle closure operator. The shuffle closure of a language is defined recursively in the following way:

$$L_1 = L \shuffle L, \quad \text{and} \quad L_n = L_{n-1} \shuffle L.$$

The shuffle closure is the union over all finite shuffles:

$$L^{\shuffle} = \bigcup_n L_n.$$

Equivalently, we write this as a grammar production: $A \to A \shuffle B \mid B$. The shuffle closure [16, 17] provides extremely concise notation. In particular, it arises in descriptions of sequential execution histories of concurrent processes. Remark that the closure of the language is one single language, whereas the closure of the class of languages that is one language is an infinite set of languages.

The shuffle closure of a single letter gives all permutations:

$$a^{\shuffle} = a \oplus a\bar{a} + \bar{a}a \oplus a\bar{a}\bar{\bar{a}} + a\bar{\bar{a}}\bar{a} + \bar{a}a\bar{\bar{a}} + \bar{a}\bar{\bar{a}}a + \bar{\bar{a}}a\bar{a} + \bar{\bar{a}}\bar{a}a \oplus \ldots$$

The generating function of this language is $\sum n!\, z^n$, and indeed the generating function of the shuffle closure of any word of length $k$ is $\sum (kn)!\, (z^k/k!)^n$, which is also D-finite.
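These counts can be checked by brute force for small cases. The sketch below is an illustration with assumed conventions: the $j$-th copy of a word is made disjoint from the others by tagging each of its letters with the index $j$ (playing the role of the accumulated bars), and `iterated_shuffle` is a hypothetical helper name. Shuffling $n$ tagged copies of a word of length $k$ yields the multinomial number $(kn)!/(k!)^n$ of words, which reduces to $n!$ for a single letter.

```python
from math import factorial


def shuffle(w1, w2):
    """All interleavings of two words given as tuples over disjoint alphabets."""
    if not w1:
        return [w2]
    if not w2:
        return [w1]
    return ([w1[:1] + s for s in shuffle(w1[1:], w2)] +
            [w2[:1] + s for s in shuffle(w1, w2[1:])])


def iterated_shuffle(word, copies):
    """Shuffle `copies` disjoint copies of `word`; copy j is tagged with j."""
    result = [()]
    for j in range(copies):
        tagged = tuple((c, j) for c in word)
        result = [s for w in result for s in shuffle(w, tagged)]
    return result


for n in range(1, 5):
    print(n, len(iterated_shuffle("a", n)), factorial(n))

k = 2
for n in range(1, 4):
    print(n, len(iterated_shuffle("ab", n)), factorial(k * n) // factorial(k) ** n)
```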
To prove our formula above, we express the generating function of $L^{\shuffle}$ in terms of the operators which switch between the ordinary and exponential generating functions. Recall that $L(z) = \sum a_n z^n \implies \hat{L}(z) = \sum a_n z^n/n!$, and we define the Laplace operator $\mathcal{L}$ by $\mathcal{L} \cdot \hat{L}(z) = L(z)$. Then,

$$L_1 = L^{\shuffle} \implies L_1(z) = \sum_n \mathcal{L} \cdot \big[(\hat{L}(z))^n\big]. \tag{3.6}$$

Although all of the summands are D-finite, it is possible that the sum is not.

Clearly, the shuffle closure does not preserve regularity, and indeed adding it and the shuffle product to regular languages is enough to generate all recursively enumerable languages. Thus, we see that if there is no bound on the number of shuffles possible in any expression tree, the languages can get far more complex. Nonetheless the following conjecture seems reasonable, and perhaps it is possible to prove it starting from Eq. (3.6), with a necessarily more sophisticated analysis.

Conjecture 3.3. The shuffle closure of a regular language has a D-finite generating function.

4. Conclusion

A next step is to adapt the Boltzmann generators to these languages. Since we can effectively simulate labelled objects in an unlabelled context, we can easily describe objects like strong interval trees. This approach might allow a detailed analysis of certain parameters of permutation sorting by reversals, as applied to comparative genomics [1]. We are also interested in characterizing the context-free languages whose shuffle is not algebraic, and in considering the other natural questions of closure that are standard for language classes.

Acknowledgments. We gratefully acknowledge many discussions from the Algebraic Combinatorics Seminar at the Fields Institute. In particular, we acknowledge contributions by N. Bergeron, C. Hollweg, and M. Rosas. We wish to also acknowledge the financial support of NSERC.
References

[1] Séverine Bérard, Anne Bergeron, Cedric Chauve, and Christophe Paul. Perfect sorting by reversals is not always difficult. IEEE/ACM Trans. on Comput. Biology and Bioinformatics, 4(1), 2007.
[2] F. Bergeron, G. Labelle, and P. Leroux. Combinatorial species and tree-like structures, volume 67 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1998.
[3] François Bergeron and Christophe Reutenauer. Combinatorial resolution of systems of differential equations. III. A special class of differentially algebraic series. European J. Combin., 11(6):501–512, 1990.
[4] François Bergeron and Ulrike Sattler. Constructible differentially finite algebraic series in several variables. Theoret. Comput. Sci., 144(1-2):59–65, 1995.
[5] Manuel Bodirsky, Éric Fusy, Mihyun Kang, and Stefan Vigerske. An unbiased pointing operator for unlabeled structures, with applications to counting and sampling. In Nikhil Bansal, Kirk Pruhs, and Clifford Stein, editors, SODA, pages 356–365. SIAM, 2007.
[6] Mireille Bousquet-Mélou. Algebraic generating functions in enumerative combinatorics, and context-free languages. In Stacs 05, volume 3404 of Lecture Notes in Comput. Sci., pages 18–35. Springer, 2005.
[7] Mireille Bousquet-Mélou. Rational and algebraic series in combinatorial enumeration. In International Congress of Mathematicians, pages 789–826, 2006.
[8] Frédéric Chyzak, Marni Mishna, and Bruno Salvy. Effective scalar products of D-finite symmetric functions. J. Combin. Theory Ser. A, 112(1):1–43, 2005.
[9] Philippe Duchon, Philippe Flajolet, Guy Louchard, and Gilles Schaeffer. Boltzmann samplers for the random generation of combinatorial structures. Combin. Probab. Comput., 13(4-5):577–625, 2004.
[10] Philippe Flajolet. Analytic models and ambiguity of context-free languages. Theoret. Comput. Sci., 49(2-3):283–309, 1987.
[11] Philippe Flajolet, Danièle Gardy, and Loÿs Thimonier. Birthday paradox, coupon collectors, caching algorithms and self-organizing search. Discrete Appl. Math., 39(3):207–229, 1992.
[12] Philippe Flajolet and Robert Sedgewick. Analytic Combinatorics. http://algo.inria.fr/flajolet/Publications/books.html, 2006.
[13] Philippe Flajolet, Paul Zimmerman, and Bernard Van Cutsem. A calculus for the random generation of labelled combinatorial structures. Theoret. Comput. Sci., 132(1-2):1–35, 1994.
[14] Jay Gischer. Shuffle languages, petri nets, and context-sensitive grammars. Communications of the ACM, 24(9), September 1981.
[15] Daniel Hill Greene. Labelled Formal Languages and Their Uses. PhD thesis, Stanford University, 1983.
[16] Matthias Jantzen. Extending regular expressions with iterated shuffle. Theoret. Comput. Sci., 38(2-3):223–247, 1985.
[17] Joanna Jędrzejowicz. Infinite hierarchy of expressions containing shuffle closure operator. Inform. Process. Lett., 28(1):33–37, 1988.
[18] Gilbert Labelle and Cédric Lamathe. A theory of general combinatorial differential operators. In Formal Power Series and Algebraic Combinatorics, 2007.
[19] L. Lipshitz. The diagonal of a D-finite power series is D-finite. J. Algebra, 113(2):373–378, 1988.
[20] M. Lothaire. Combinatorics on words, volume 17 of Encyclopedia of Mathematics and its Applications. Addison-Wesley Publishing Co., Reading, Mass., 1983.
[21] Marni Mishna. Automatic enumeration of regular objects. J. Integer Sequences, 10:Article 07.5.5, 2007.
[22] Lee A. Rubel. A survey of transcendentally transcendental functions. Amer. Math. Monthly, 96(9):777–788, 1989.
[23] Michael F. Singer. Algebraic relations among solutions of linear differential equations. Trans. Amer. Math. Soc., 295(2):753–763, 1986.
[24] Richard P. Stanley. Enumerative combinatorics. Vol. 2, volume 62 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1999.

This work is licensed under the Creative Commons Attribution-NoDerivs License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/3.0/.
