Perfect Necklaces

We introduce a variant of de Bruijn words that we call perfect necklaces. Fix a finite alphabet. Recall that a word is a finite sequence of symbols in the alphabet and a circular word, or necklace, is the equivalence class of a word under rotations. …

Authors: Nicolás Alvarez, Verónica Becher, Pablo A. Ferrari, Sergio A. Yuhjtman

Nicolás Alvarez, Verónica Becher, Pablo A. Ferrari, Sergio A. Yuhjtman

February 1, 2016

Abstract

We introduce a variant of de Bruijn words that we call perfect necklaces. Fix a finite alphabet. Recall that a word is a finite sequence of symbols in the alphabet and a circular word, or necklace, is the equivalence class of a word under rotations. For positive integers $k$ and $n$, we call a necklace $(k,n)$-perfect if each word of length $k$ occurs exactly $n$ times at positions which are different modulo $n$ for any convention on the starting point. We call a necklace perfect if it is $(k,k)$-perfect for some $k$. We prove that every arithmetic sequence with difference coprime with the alphabet size induces a perfect necklace. In particular, the concatenation of all words of the same length in lexicographic order yields a perfect necklace. For each $k$ and $n$, we give a closed formula for the number of $(k,n)$-perfect necklaces. Finally, we prove that every infinite periodic sequence whose period coincides with some $(k,n)$-perfect necklace for any $n$, passes all statistical tests of size up to $k$, but not all larger tests. This last theorem motivated this work.

Keywords: combinatorics on words, necklaces, de Bruijn words, statistical tests of finite size

1 Introduction

Fix a finite alphabet $\mathcal{A}$ and write $|\mathcal{A}|$ for its cardinality. A word is a finite sequence of symbols in the alphabet. A circular word, or necklace, is the equivalence class of a word under rotations. In this note we introduce perfect necklaces:

Definition 1. A necklace is $(k,n)$-perfect if it has length $n|\mathcal{A}|^k$ and each word of length $k$ occurs exactly $n$ times at positions which are different modulo $n$ for any convention on the starting point. A necklace is perfect if it is $(k,k)$-perfect for some $k$.
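Definition 1 can be checked mechanically on small words. The following is a minimal sketch, assuming the necklace is given as a Python string or tuple of symbols and that the alphabet size is passed explicitly; the function name `is_kn_perfect` is hypothetical and only serves as an illustration.

```python
from collections import defaultdict


def is_kn_perfect(word, k, n, alphabet_size):
    """Return True if the necklace [word] is (k, n)-perfect (Definition 1).

    Assumption: `word` is any one representative of the necklace and
    `alphabet_size` is |A|. The name is hypothetical, not from the text.
    """
    length = len(word)
    if length != n * alphabet_size ** k:
        return False
    # residues[w] collects (position mod n) for every circular occurrence of w.
    residues = defaultdict(list)
    for pos in range(length):
        factor = tuple(word[(pos + i) % length] for i in range(k))
        residues[factor].append(pos % n)
    # Every word of length k must occur, and its n positions must cover
    # each residue 0..n-1 exactly once.
    return (len(residues) == alphabet_size ** k
            and all(sorted(r) == list(range(n)) for r in residues.values()))


print(is_kn_perfect("0011", 2, 1, 2))      # True
print(is_kn_perfect("00011011", 2, 2, 2))  # True
print(is_kn_perfect("00011110", 2, 2, 2))  # False: 00 occurs three times
```

Because the length $n|\mathcal{A}|^k$ is a multiple of $n$, rotating the representative shifts all residues by the same amount, so the answer does not depend on the chosen starting point.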
Perfect necklaces are a variant of the celebrated de Bruijn necklaces [7]. Recall that a de Bruijn necklace of order $k$ in alphabet $\mathcal{A}$ has length $|\mathcal{A}|^k$ and each word of length $k$ occurs in it exactly once. Thus, our $(k,1)$-perfect necklaces coincide with the de Bruijn necklaces of order $k$. For a supreme presentation of de Bruijn necklaces, including a historic account of their discovery and rediscovery, see [2].

Observe that a necklace of length $k|\mathcal{A}|^k$ admits $k$ possible decompositions into $|\mathcal{A}|^k$ consecutive (non-overlapping) words of length $k$. Hence, a necklace is $(k,k)$-perfect if and only if it has length $k|\mathcal{A}|^k$ and each word of length $k$ occurs exactly once in each of the $k$ possible decompositions.

For each $k$ and $n$, we give a characterization of $(k,n)$-perfect necklaces in terms of Eulerian circuits in appropriate graphs (Corollary 14). We give a closed formula for the number of $(k,n)$-perfect necklaces (Theorem 20). These are the most elaborate results in this work.

We show that each arithmetic sequence with difference coprime with the alphabet size induces a perfect necklace (Theorem 5). In particular, the concatenation of all words of the same length in lexicographic order yields a perfect necklace (Corollary 6). This provides a gracious instance of a perfect necklace for any word length.

As far as we know, David Champernowne [5] was the first to consider combinatorial properties in the concatenation of all words of the same length in lexicographic order. He used them in his construction of a real number normal to base 10, a property defined by Émile Borel [3]. He worked with alphabet $\mathcal{A} = \{0, 1, \ldots, 9\}$ and for each $k$, he bounded the number of occurrences of each word of length up to $k$ in the concatenation of all words of length $k$ in lexicographic order.
But Champernowne missed that each word of length $k$ occurs in this sequence exactly $k$ times, once in each of the $k$ different shifts.

2 Perfect necklaces

Notation. We write $\mathcal{A}^*$ for the set of all words, and $\mathcal{A}^k$ for the set of all words of length $k$. The length of a word $w$ is denoted by $|w|$ and the positions in $w$ are numbered from $0$ to $|w|-1$. We write $w(i)$ to denote the symbol in the $i$-th position of $w$. Let $\theta : \mathcal{A}^* \to \mathcal{A}^*$ be the shift operator, such that for each position $i$, $(\theta w)(i) = w((i+1) \bmod |w|)$. That is, the shift operator is defined with the convention of periodicity. With $\theta^n$ we denote the application of the shift $n$ times to the right, and with $\theta^{-n}$, $n$ times to the left. As already stated, a necklace is the equivalence class of a word under rotations. To denote a necklace we write $[w]$ where $w$ is any of the words in the equivalence class. For example, if $\mathcal{A} = \{0,1\}$, $[000]$ contains a single word $000$, because for every $n$, $\theta^n(000) = 000$. $[110]$ contains three words: $\theta^0(110) = 110$, $\theta^1(110) = 101$ and $\theta^2(110) = 011$.

Example 2. Let $\mathcal{A} = \{0,1\}$. We add spaces in the examples just for readability. For words of length 2 there are just two perfect necklaces: [00 01 10 11], [00 10 01 11]. This is a perfect necklace for word length 3: [000 110 101 111 001 010 011 100]. The following are not perfect: [00 01 11 10], [000 101 110 111 010 001 011 100]. The so-called Gray numbers are not perfect, for instance, [000 001 011 010 110 111 101 100].

2.1 Each ordered necklace is perfect

Definition 3. For an ordered alphabet $\mathcal{A}$ and a positive integer $k$, the $k$-ordered necklace has length $k|\mathcal{A}|^k$ and it is obtained by the concatenation of all words of length $k$ in lexicographic order.

For $\mathcal{A} = \{0,1\}$ the following are the ordered necklaces for $k$ equal to 1, 2 and 3 respectively: [01], [00 01 10 11], [000 001 010 011 100 101 110 111].
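Example 2 and Definition 3 lend themselves to a brute-force check based on the $k$ decompositions of a necklace of length $k|\mathcal{A}|^k$. The sketch below is an illustration only: it assumes symbols are the digits 0 to $b-1$ with $b \le 10$ so that words can be plain strings, and the helper names `shift`, `decomposition`, `is_perfect` and `ordered_necklace` are hypothetical.

```python
from itertools import product


def shift(word, n=1):
    """theta^n: position i of the result holds the symbol formerly at i + n."""
    n %= len(word)
    return word[n:] + word[:n]


def decomposition(word, k):
    """Split word into consecutive non-overlapping blocks of length k."""
    return [word[i:i + k] for i in range(0, len(word), k)]


def is_perfect(word, k, b):
    """(k, k)-perfection: each of the k decompositions has b**k distinct blocks."""
    if len(word) != k * b ** k:
        return False
    return all(len(set(decomposition(shift(word, ell), k))) == b ** k
               for ell in range(k))


def ordered_necklace(k, b):
    """Concatenate all words of length k over {0..b-1} in lexicographic order."""
    return "".join("".join(map(str, w)) for w in product(range(b), repeat=k))


# Example 2 (spaces removed).
print(is_perfect("00011011", 2, 2))                  # True
print(is_perfect("00100111", 2, 2))                  # True
print(is_perfect("000110101111001010011100", 3, 2))  # True
print(is_perfect("00011110", 2, 2))                  # False
print(is_perfect("000001011010110111101100", 3, 2))  # False (Gray order)

# Definition 3: the ordered necklaces for small k and b.
print(all(is_perfect(ordered_necklace(k, b), k, b)
          for k in (1, 2, 3) for b in (2, 3)))       # True
```

Since a decomposition has exactly $b^k$ blocks, having $b^k$ distinct blocks is the same as every word of length $k$ occurring exactly once in it.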
We will prove that for every word length, the ordered necklace is perfect. We say that a bijection $\sigma : \mathcal{A}^k \to \mathcal{A}^k$ is a cycle if for each $w \in \mathcal{A}^k$ the set $\{\sigma^j(w) : 0 \le j < |\mathcal{A}|^k\}$ equals $\mathcal{A}^k$. For a word $w$ we write $w(i \ldots j)$ to denote the subsequence of $w$ from position $i$ to $j$.

Lemma 4. Let $\mathcal{A}$ be a finite alphabet, $\sigma : \mathcal{A}^k \to \mathcal{A}^k$ a cycle and $v$ any word in $\mathcal{A}^k$. Let $s = \sigma^0(v)\,\sigma^1(v) \cdots \sigma^{|\mathcal{A}|^k-1}(v)$. The necklace $[s]$ is perfect if and only if for every $\ell$ such that $0 \le \ell < k$, for every $x \in \mathcal{A}^\ell$ and every $y \in \mathcal{A}^{k-\ell}$, there is a unique $w \in \mathcal{A}^k$ such that $w(k-\ell \ldots k-1) = x$ and $(\sigma(w))(0 \ldots k-\ell-1) = y$.

Proof. Assume $[s]$ is $(k,k)$-perfect. Take $\ell$ such that $0 \le \ell < k$, $x \in \mathcal{A}^\ell$ and $y \in \mathcal{A}^{k-\ell}$. Consider $\theta^{-\ell}s$, the $(-\ell)$-th shift of $s$. Since $[s]$ is $(k,k)$-perfect, $xy$ occurs exactly once in the decomposition of $\theta^{-\ell}s$ in consecutive words of length $k$. Thus, there is a unique word $w$ in the decomposition of $s$ in consecutive words of length $k$ whose last $\ell$ symbols are equal to $x$ and such that the first $k-\ell$ symbols of the next word, $\sigma(w)$, are equal to $y$. Conversely, suppose $[s]$ is not $(k,k)$-perfect. Then, there is some $\ell$, $0 \le \ell < k$, such that the decomposition of $\theta^{-\ell}(s)$ contains two equal words of length $k$. This contradicts that for every $x \in \mathcal{A}^\ell$ and every $y \in \mathcal{A}^{k-\ell}$, there is a unique $w \in \mathcal{A}^k$ such that $w(k-\ell \ldots k-1) = x$ and $(\sigma(w))(0 \ldots k-\ell-1) = y$.

Theorem 5. Consider the alphabet $\mathcal{A} = \{0, \ldots, b-1\}$ where $b$ is an integer greater than or equal to 2, a word length $k$ and a positive integer $r$ coprime with $b$. Identify the elements of $\mathcal{A}^k$ with the set of integers modulo $b^k$ according to representation in base $b$. Define the word of length $kb^k$ by the juxtaposition of the elements of $\mathcal{A}^k$ corresponding to the arithmetic sequence $0, r, 2r, \ldots, (b^k-1)r$. Then the associated necklace is perfect.

Proof.
Since $r$ is coprime with $b$, the addition of $r$ defines a cycle $\sigma : \mathcal{A}^k \to \mathcal{A}^k$. We must check that it satisfies the condition in Lemma 4. For any $w$ such that $w(k-\ell \ldots k-1) = x$ we have $\sigma(w)(k-\ell \ldots k-1) = \tilde{x}$, where, abusing notation, $\tilde{x} = x + r \bmod b^\ell$. Since the word $y\tilde{x}$ appears only one time in the cycle, this fixes a unique $w = \sigma^{-1}(y\tilde{x})$ with $w(k-\ell \ldots k-1) = x$ and $(\sigma(w))(0 \ldots k-\ell-1) = y$.

Corollary 6. For an ordered alphabet $\mathcal{A}$ and word length $k$, the $k$-ordered necklace is perfect.

Proof. Take $r = 1$ in Theorem 5.

The following proposition is immediate, so we state it without proof.

Proposition 7. The following operators $\phi : \mathcal{A}^* \to \mathcal{A}^*$ are well defined on necklaces and preserve perfection. That is, for every $k$ and $n$ and for every $s \in \mathcal{A}^*$, if $[s]$ is $(k,n)$-perfect then $[\phi s]$ is $(k,n)$-perfect.

1. The digit permutation operator defined by $\phi(x_0 \ldots x_{kb^k-1}) = (\pi x_0 \ldots \pi x_{kb^k-1})$ for any permutation $\pi : \mathcal{A} \to \mathcal{A}$.

2. The reflection operator $\phi(x_0 \ldots x_{kb^k-1}) = (x_{kb^k-1} \ldots x_0)$.

3 Characterizing and counting perfect necklaces

To characterize and count $(k,n)$-perfect necklaces in alphabet $\mathcal{A}$ we consider Eulerian circuits in an appropriate directed graph, defined from $\mathcal{A}$, $k$ and $n$. Recall that an Eulerian circuit in a graph is a closed path that uses all edges exactly once. A thorough presentation of the material on graphs that we use in this section can be read in the monographs [9, 16, 6]. For the material on combinatorics on words see the books [13, 14]. We write $m \mid n$ when $m$ divides $n$ and we write $\gcd(m,n)$ for the greatest common divisor of $m$ and $n$.

Definition 8. Let $\mathcal{A}$ be an alphabet with cardinality $b$, let $s$ be a word length and let $n$ be a positive integer.
We define the astute graph $G_{s,n}$ as the directed graph with $nb^s$ nodes, where each node is a pair $(u,v)$ with $u \in \mathcal{A}^s$ and $v$ a number between $0$ and $n-1$. There is an edge from $(u,v)$ to $(u',v')$ if the last $s-1$ symbols of $u$ coincide with the first $s-1$ symbols of $u'$ and $(v+1) \bmod n = v'$.

Observe that $G_{s,n}$ is regular (all nodes have in-degree and out-degree equal to $b$) and it is strongly connected (there is a path from every node to every other node).

Remark 9. For any alphabet size, the astute graph $G_{k-1,1}$ coincides with a de Bruijn graph of words of length $k-1$; hence, the Eulerian circuits in $G_{k-1,1}$ yield exactly the de Bruijn necklaces of order $k$. Although each Eulerian circuit in the astute graph $G_{k-1,n}$ gives one $(k,n)$-perfect necklace, each $(k,n)$-perfect necklace can come from several Eulerian circuits in this graph.

3.1 From perfect necklaces to Eulerian circuits

Hereafter, we assume an alphabet $\mathcal{A}$ and we write $b$ for its cardinality.

Definition 10. For a necklace of length $\ell$, $[a_0 a_1 \ldots a_{\ell-1}]$, we define its period as the minimum positive integer $L$ such that for every non-negative integer $j$, $a_{j \bmod \ell} = a_{(j+L) \bmod \ell}$. Notice that the period $L$ always exists, and necessarily $L \mid \ell$. If the period coincides with the length we say the necklace is irreducible.

Definition 11. Let $m, n$ be positive integers. We define
$$d_{m,n} = \prod_i p_i^{\alpha_i},$$
where $\{p_i\}$ is the set of primes that divide $m$, and $\alpha_i$ is the exponent of $p_i$ in the factorization of $n$.

Proposition 12. The period $L$ of a $(k,n)$-perfect necklace satisfies the following:

1. $L = jb^k$ for some $j \mid n$.

2. $d_{b,n} \mid j$.

3. The corresponding irreducible necklace of length $L = jb^k$ is $(k,j)$-perfect.

Proof. Let $[s]$ be $(k,n)$-perfect, with $s = a_0 \ldots a_{nb^k-1}$.

1. Since $[s]$ has length $nb^k$, we know $L \mid nb^k$.
Let us verify that $b^k \mid L$. Since $[s]$ has period $L$, $[a_0 \ldots a_{L-1}]$ is a necklace where all words of length $k$ occur the same number of times. Otherwise, it would be impossible that they occur the same number of times in $[s]$. If each word of length $k$ occurs $j$ times in $[a_0 \ldots a_{L-1}]$, then $L = jb^k$. Since $jb^k \mid nb^k$, we conclude $j \mid n$.

2. The word $a_0 \ldots a_{k-1}$ occurs at position $0$ in $s$ but also at positions $L, 2L, \ldots, (n/j-1)L$. These positions are of the form $qjb^k$ where $0 \le q < n/j$. These numbers must have pairwise different congruences modulo $n$. Equivalently, the $n/j$ numbers of the form $qb^k$, where $0 \le q < n/j$, are all pairwise different modulo $n/j$. This last condition holds exactly when $\gcd(b^k, n/j) = 1$, which in turn is equivalent to $\gcd(b, n/j) = 1$, which is equivalent to $d_{b,n} \mid j$.

3. As argued in Point 1, in the necklace $[a_0 \ldots a_{L-1}]$ every word of length $k$ occurs the same number of times. If the positions of two occurrences of a given word were equal modulo $j$ then they would be equal modulo $n$, but this is impossible because $[s]$ is $(k,n)$-perfect.

Proposition 13. Let $N$ be a $(k,j)$-perfect necklace. If $n$ is such that $d_{b,n} \mid j \mid n$ then the necklace of length $nb^k$ obtained by repeating $N$ exactly $n/j$ times is $(k,n)$-perfect.

Proof. Let $\tilde{N}$ be obtained by repeating $N$ exactly $n/j$ times. Then each word of length $k$ occurs in $\tilde{N}$ exactly $j \times n/j = n$ times. Take a word $w$ of length $k$ and let $q_1, \ldots, q_j$, each between $0$ and $jb^k-1$, be the positions of the occurrences of $w$ in $N$ for some convention on the starting point. Then, $w$ occurs in $\tilde{N}$ at positions $q_i + jb^kt$, where $0 \le t < n/j$. Assume $q_{i_1} + jb^kt_1 \equiv q_{i_2} + jb^kt_2 \pmod{n}$. Taking modulo $j$ we conclude $i_1 = i_2$ because $N$ is $(k,j)$-perfect. Then we have $b^kt_1 \equiv b^kt_2 \pmod{n/j}$.
Since $d_{b,n} \mid j$ we have $\gcd(b, n/j) = 1$, so $t_1 \equiv t_2 \pmod{n/j}$, which implies $t_1 = t_2$.

Corollary 14. Assume an alphabet of $b$ symbols, with $b \ge 2$. Let $k$ and $n$ be positive integers. An Eulerian circuit in the astute graph $G_{k-1,n}$ induces a $(k,n)$-perfect necklace. Each $(k,n)$-perfect necklace of period $jb^k$ corresponds to $j$ different Eulerian circuits in $G_{k-1,j}$. Therefore, the number of Eulerian circuits in the astute graph $G_{k-1,n}$ is
$$e(n) = \sum_{d_{b,n} \mid j \mid n} j\,p(j),$$
where $p(j)$ is the number of irreducible $(k,j)$-perfect necklaces.

3.2 The number of Eulerian circuits in the astute graphs

Let $G$ be a directed graph with $n$ nodes. The adjacency matrix of $G$ is the matrix $A(G) = (a_{i,j})_{i,j=1}^{n}$ where $a_{i,j}$ is the number of edges from node $i$ to node $j$. The characteristic polynomial [6] of a graph $G$ is defined as
$$P(G; x) = \det(xI - A(G)),$$
where $I$ is the identity matrix of dimension $n \times n$.

The BEST theorem (named after de Bruijn, van Aardenne-Ehrenfest, Smith and Tutte) gives a product formula for the number of Eulerian circuits in directed graphs.

Lemma 15 (BEST Theorem [9]). Let $G$ be a regular connected graph with $n$ nodes. Let $v$ be a node of $G$ and let $r(G)$ be the number of spanning trees oriented towards $v$. The number of Eulerian circuits in $G$ is
$$r(G) \cdot \prod_{v=1}^{n} (\mathrm{degree}(v) - 1)!$$

Lemma 16 (Hutschenreuther, Proposition 1.4 [6]). Let $G$ be a regular multigraph with $n$ nodes and degree $b$. For any of its nodes, the number of spanning trees $r(G)$ oriented to it is
$$r(G) = \frac{1}{n} \left. \frac{\partial}{\partial x} P(G; x) \right|_{x=b},$$
where $\frac{\partial}{\partial x}$ is the derivative with respect to $x$.
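Lemma 15 can be tried out numerically on small astute graphs (Definition 8). The sketch below is an illustration under stated assumptions, not the route taken in the text: it obtains $r(G)$ from a cofactor of the Laplacian (the matrix-tree theorem for directed graphs) instead of the characteristic-polynomial derivative of Lemma 16, it assumes the graph is connected with equal in- and out-degrees, and the helper names `astute_graph`, `determinant` and `eulerian_circuits` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def astute_graph(b, s, n):
    """Adjacency matrix (edge multiplicities) of G_{s,n} over alphabet {0..b-1}."""
    nodes = [(u, v) for u in product(range(b), repeat=s) for v in range(n)]
    index = {node: i for i, node in enumerate(nodes)}
    adj = [[0] * len(nodes) for _ in nodes]
    for (u, v) in nodes:
        for a in range(b):
            # Drop the first symbol of u, append a, advance the counter mod n.
            target = ((u + (a,))[1:], (v + 1) % n)
            adj[index[(u, v)]][index[target]] += 1
    return adj


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    det = Fraction(1)
    for c in range(len(m)):
        pivot = next((r for r in range(c, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, len(m)):
            factor = m[r][c] / m[c][c]
            for j in range(c, len(m)):
                m[r][j] -= factor * m[c][j]
    return det


def eulerian_circuits(adj):
    """BEST formula: (spanning trees oriented to node 0) * prod (deg(v) - 1)!."""
    size = len(adj)
    out_deg = [sum(row) for row in adj]
    laplacian = [[(out_deg[i] if i == j else 0) - adj[i][j]
                  for j in range(size)] for i in range(size)]
    # Matrix-tree theorem: delete the row and column of the root (node 0).
    minor = [row[1:] for row in laplacian[1:]]
    count = determinant(minor)
    for deg in out_deg:
        count *= factorial(deg - 1)
    return int(count)


print(eulerian_circuits(astute_graph(2, 1, 1)))  # 1
print(eulerian_circuits(astute_graph(2, 2, 1)))  # 2
print(eulerian_circuits(astute_graph(2, 1, 2)))  # 4
```

The first two values are the numbers of binary de Bruijn necklaces of orders 2 and 3, in line with Remark 9. Exact rational arithmetic is used so that the determinant is not affected by floating-point rounding.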
Given a graph $G$, its line-graph $\Gamma(G)$ is a graph such that each node of $\Gamma(G)$ represents an edge of $G$; and two nodes of $\Gamma(G)$ are adjacent if and only if their corresponding edges share a common node in $G$.

Lemma 17 ([6]). For any directed graph $G$, regular and connected,
$$P(\Gamma(G); x) = x^{m-n} P(G; x),$$
where $\Gamma(G)$ is the line-graph of $G$, $m$ is the number of edges of $G$ and $n$ is the number of nodes of $G$.

In the next lemma we write $\lambda$ for the empty word, namely the unique word in $\mathcal{A}^0$.

Lemma 18. Let $b$ be any alphabet size, $k$ be a word length, and $j$ be an integer such that $\gcd(b,k) \mid j \mid k$. Let $G_{0,j}$ be the graph with the set of nodes $\{(\lambda,0), (\lambda,1), \ldots, (\lambda,j-1)\}$, with $b$ edges from $(\lambda,i)$ to $(\lambda, (i+1) \bmod j)$. Then,
$$P(G_{0,j}; x) = x^j - b^j.$$

Proof. It is easy to check that $P(G_{0,j}; x) = \det(xI - A(G_{0,j}))$, which is equal to $x^j - b^j$.

Lemma 19. Assume an alphabet of $b$ symbols with $b \ge 2$. Let $k$ be a word length and $j$ be a positive integer such that $\gcd(b,k) \mid j \mid k$. The number of Eulerian circuits in the astute graph $G_{k-1,j}$ is $(b!)^{jb^{k-1}} b^{-k}$.

Proof. We write $\Gamma(G)$ to denote the line graph of $G$. Notice that for every positive $s$ and for every $j$,
$$G_{s,j} = \Gamma(G_{s-1,j}).$$
In this proof the value $j$ will remain fixed. Since $G_{k-1,j}$ has $jb^{k-1}$ nodes, each with in-degree $b$ (also out-degree $b$), by Lemma 15 the number of Eulerian circuits in $G_{k-1,j}$ is
$$r(G_{k-1,j}) \cdot \prod_{v=1}^{jb^{k-1}} (\mathrm{degree}(v) - 1)! = r(G_{k-1,j}) \cdot ((b-1)!)^{jb^{k-1}}.$$
The rest of the proof is to determine $r(G_{k-1,j})$ using Lemma 16.
$$\begin{aligned}
P(G_{k-1,j}; x) &= P(\Gamma(G_{k-2,j}); x) \\
&= x^{b^{k-1}j - b^{k-2}j}\, P(G_{k-2,j}; x) \\
&= x^{j(b^{k-1} - b^{k-2})}\, P(\Gamma(G_{k-3,j}); x) \\
&= x^{j(b^{k-1} - b^{k-2})}\, x^{j(b^{k-2} - b^{k-3})}\, P(G_{k-3,j}; x) \\
&= x^{j(b^{k-1} - b^{k-3})}\, P(G_{k-3,j}; x) \\
&= \ldots \\
&= x^{j(b^{k-1} - b^{0})}\, P(G_{0,j}; x) \\
&= x^{j(b^{k-1} - 1)} (x^j - b^j).
\end{aligned}$$
$$\frac{\partial}{\partial x} P(G_{k-1,j}; x) = \frac{\partial}{\partial x}\, x^{j(b^{k-1}-1)} (x^j - b^j) = (jb^{k-1} - j)\, x^{jb^{k-1}-j-1} (x^j - b^j) + x^{jb^{k-1}-j}\, j\, x^{j-1}.$$
$$\left. \frac{\partial}{\partial x} P(G_{k-1,j}; x) \right|_{x=b} = b^{jb^{k-1}-j}\, j\, b^{j-1}.$$
Finally, by Lemma 16,
$$r(G_{k-1,j}) = \frac{1}{jb^{k-1}} \left. \frac{\partial}{\partial x} P(G_{k-1,j}; x) \right|_{x=b} = \frac{1}{jb^{k-1}}\, b^{jb^{k-1}-j}\, j\, b^{j-1} = b^{jb^{k-1}-k}.$$
Hence, the total number of Eulerian circuits in $G_{k-1,j}$ is
$$b^{jb^{k-1}-k}\, ((b-1)!)^{jb^{k-1}} = (b!)^{jb^{k-1}}\, b^{-k}.$$

3.3 The number of perfect necklaces

Recall that by Definition 11, $d_{b,n} = \prod_i p_i^{\alpha_i}$, where $\{p_i\}$ is the set of primes that divide both $b$ and $n$, and $\alpha_i$ is the exponent of $p_i$ in the factorization of $n$. The Euler totient function $\varphi(n)$ counts the positive integers less than or equal to $n$ that are relatively prime to $n$.

Theorem 20. Assume an alphabet of $b$ symbols, with $b \ge 2$. Let $k$ and $n$ be positive integers. The number of $(k,n)$-perfect necklaces is
$$\frac{1}{n} \sum_{d_{b,n} \mid j \mid n} e(j)\, \varphi(n/j),$$
where $e(j) = (b!)^{jb^{k-1}} b^{-k}$ is the number of Eulerian circuits in the graph $G_{k-1,j}$ and $\varphi$ is Euler's totient function.

Proof. Let $p(j)$ be the number of irreducible $(k,j)$-perfect necklaces. Then, the number of $(k,n)$-perfect necklaces is
$$\sum_{d_{b,n} \mid j \mid n} p(j).$$
Let $e(j)$ be the number of Eulerian circuits in the astute graph $G_{k-1,j}$. By Corollary 14, for each $j$ such that $d_{b,n} \mid j \mid n$,
$$e(j) = \sum_{d_{b,n} \mid \ell \mid j} \ell\, p(\ell).$$
Notice that $d_{b,n} = d_{b,j}$. For a lighter notation, in the rest of the proof we abbreviate $d_{b,n}$ as just $d$. Then, writing each such $j$ as a multiple of $d$, we obtain that for each $m$ such that $md \mid n$,
$$e(md) = \sum_{i \mid m} id\, p(id).$$
Let $g(m) = e(md)$ and $f(m) = p(md)\,md$. Writing $\mu$ for the Möbius function we obtain
$$f(m) = \sum_{i \mid m} \mu(m/i)\, g(i).$$
$$p(md)\,md = \sum_{i \mid m} \mu(m/i)\, e(id).$$
$$p(md) = \frac{1}{md} \sum_{i \mid m} \mu(m/i)\, e(id).$$
$$\begin{aligned}
\sum_{d \mid j \mid n} p(j) &= \sum_{m \mid n/d} \frac{1}{md} \sum_{i \mid m} \mu(m/i)\, e(id) \\
&= \sum_{i \mid n/d} e(id) \sum_{i \mid m \mid n/d} \frac{1}{md}\, \mu(m/i) \\
&= \sum_{d \mid j \mid n} e(j) \sum_{j \mid q \mid n} \frac{1}{q}\, \mu(q/j).
\end{aligned}$$
Applying the Möbius inversion,
$$\sum_{j \mid q \mid n} \frac{1}{q}\, \mu(q/j) = \sum_{r \mid n/j} \frac{1}{jr}\, \mu(r) = \frac{1}{n} \sum_{r \mid n/j} \frac{n/j}{r}\, \mu(r) = \frac{1}{n}\, \varphi(n/j).$$
We have used the identity $\varphi(m) = \sum_{r \mid m} \frac{m}{r}\, \mu(r)$, which is simply the inversion of $m = \sum_{r \mid m} \varphi(r)$. By Lemma 19, the number $e(j)$ of Eulerian circuits in the astute graph $G_{k-1,j}$ is $(b!)^{jb^{k-1}} b^{-k}$.

4 Finite-size tests and perfect necklaces

"Given a finite family of tests for randomness there is an infinite sequence x which passes all of them, but x will be rejected by a new more refined test", proposed Norberto Fava to us. Our attempt to formalize this claim led to finite-size tests and perfect periodic sequences. The result is summarized in Proposition 21.

Let $(X_0, X_1, \ldots)$ be a sequence of random variables with values in a given alphabet $\mathcal{A}$ with at least two symbols. We say that the sequence is random if the variables are uniformly distributed in $\mathcal{A}$ and mutually independent. To test if a sample $(x_0, \ldots, x_{n-1}) \in \mathcal{A}^n$ comes from a random sequence we consider the following finite-size hypothesis testing setup. As usual, we write $\mathbb{R}$ for the set of real numbers.

(a) The hypothesis $H_0$: $(X_0, X_1, \ldots)$ is random.

(b) A test-size $k$ and a test function $t : \mathcal{A}^k \to \mathbb{R}$. Denote
$$\tau = E_0\big[t(X_0, \ldots, X_{k-1})\big] = |\mathcal{A}|^{-k} \sum_{(y_0, \ldots, y_{k-1}) \in \mathcal{A}^k} t(y_0, \ldots, y_{k-1}),$$
where $E_0$ is the expectation associated to the hypothesis $H_0$.

(c) A function $T_n : \mathcal{A}^n \to \mathbb{R}$ defined by
$$T_n(x_0, \ldots, x_{n-1}) = \left| \frac{1}{n} \sum_{i=0}^{n-1} t(x_i, \ldots, x_{i+k-1}) - \tau \right|$$
with periodic boundary conditions $x_{n+j} = x_j$. Thus, $T_n(x_0, \ldots, x_{n-1})$ is the absolute difference between the empirical mean of $t$ for the sample and the expected value of $t$ under $H_0$.

(d) An error $\varepsilon > 0$ and the decision rule: if $T_n(x_0, \ldots, x_{n-1}) > \varepsilon$ then reject the sample $(x_0, \ldots, x_{n-1})$ as coming from $H_0$. In this case we say that the test $t$ rejects the sample $(x_0, \ldots, x_{n-1})$.

This is called a test of size $k$ because rejection is decided as a function of the empirical mean of $t$, a function of $k$ successive coordinates. Examples of finite-size tests include the frequency test, block testing, the number of runs in a block, the longest run of ones in a block, etc. There are many (non-finite) tests, like the discrete Fourier transform test, the Kolmogorov-Smirnov test and many others. Those tests also use some function $\tilde{T}_n$ of the sample, not necessarily based on the empirical mean of a $t$. The common feature is the use of the distribution of $\tilde{T}_n(X_1, \ldots, X_n)$ under $H_0$ to compute the probability of rejection when $H_0$ holds. Tests for $H_0$ are used to check if a sequence of numbers produced by a random number generator can be considered random; see Knuth [10] and the battery of tests proposed by L'Ecuyer and Simard [11]. A nice account of the history of hypothesis testing is given by Lehmann [12].

In the usual hypothesis testing the sample-size $n$ is kept fixed. Assuming $H_0$ and repeating the test $j$ times with independent data, the proportion of times that the hypothesis is rejected converges as $j \to \infty$ to the probability under $H_0$ that $T_n(X_0, \ldots, X_{n-1}) > \varepsilon$. Instead, we will take one infinite sequence, test its first $n$ elements, record rejection for each $n$ and take $n \to \infty$.

Let $x = (x_0, x_1, \ldots)$ be an infinite sequence of symbols in $\mathcal{A}$. Fix a test-size $k$, a test function $t$ of size $k$ and let $T_n$ be given by (c). We say that $x$ passes the test $t$ if
$$\lim_{n \to \infty} T_n(x_0, \ldots, x_{n-1}) = 0. \qquad (\ast)$$
That is, for each $\varepsilon > 0$ there is an $n(x, \varepsilon)$ such that for all $n > n(x, \varepsilon)$ we have $T_n(x_0, \ldots, x_{n-1}) \le \varepsilon$. In other words, fixing the test function $t$ of size $k$ and the error $\varepsilon$, the test $t$ rejects $(x_0, \ldots, x_{n-1})$ for at most a finite number of $n$'s. When $(\ast)$ does not hold we say that $t$ rejects $x$.

The random sequence $(X_0, X_1, \ldots)$ of independent identically distributed uniform random variables in $\mathcal{A}$ passes any finite-size test $t$ almost surely. This is the same as saying that the set of real numbers in $[0,1]$ whose $|\mathcal{A}|$-ary representation passes all finite tests has Lebesgue measure 1.

We say that the infinite sequence $x$ is $(k,m)$-perfect if $x$ is periodic with period $m|\mathcal{A}|^k$ and the necklace $[x_0 \ldots x_{m|\mathcal{A}|^k-1}]$ is $(k,m)$-perfect. Recall that $(k,1)$-perfect necklaces are exactly the de Bruijn necklaces of order $k$, so the following proposition considers infinite de Bruijn sequences of order $k$ as a special case: if $x$ is de Bruijn of order $k$ there is a test of size $k+1$ that rejects $x$.

Proposition 21. Assume alphabet $\mathcal{A}$ has at least two symbols. Let $m$ be a positive integer and let the infinite sequence $x$ be $(k,m)$-perfect. Then, the following holds:

1. The infinite sequence $x$ passes every test of size $j \le k$.

2. For each $h > k + \log_{|\mathcal{A}|} m$ there exists a test $t$ of size $h$ such that $t$ rejects $x$.

Proof. Let $b$ be the number of symbols in $\mathcal{A}$. Thus, the period of $x$ has length $mb^k$.

1. Let $t$ be a test of size $k$. For any positive integer $\ell$, by periodicity,
$$T_{mb^k\ell} = \left| \frac{1}{mb^k\ell} \sum_{i=0}^{mb^k\ell-1} t(x_i, \ldots, x_{i+k-1}) - \tau \right| = \left| \frac{\ell}{mb^k\ell} \sum_{i=0}^{mb^k-1} t(x_i, \ldots, x_{i+k-1}) - \tau \right| = 0,$$
because $x$ is $(k,m)$-perfect and by the definition of $\tau$ in (b). Now take $j \in \{0, \ldots, mb^k-1\}$ and use the above identity to get
$$(mb^k\ell + j)\, T_{mb^k\ell+j} = j\, T_j \le j \max|t - \tau| \le mb^k \max|t - \tau|,$$
where $\max|t - \tau| = \max_{z_0, \ldots, z_{k-1}} |t(z_0, \ldots, z_{k-1}) - \tau|$. Hence,
$$T_{mb^k\ell+j} \le \frac{mb^k}{mb^k\ell + j} \max|t - \tau| \le \frac{1}{\ell} \max|t - \tau| \xrightarrow[\ell \to \infty]{} 0.$$
This shows that $x$ passes $t$. Let $\tilde{t}$ be a test of size $j < k$. To see that $x$ also passes $\tilde{t}$ define $t$ of size $k$ as $t(x_0, \ldots, x_{k-1}) = \tilde{t}(x_0, \ldots, x_{j-1})$.

2. Let $h$ be an integer such that $h > k + \log_b m$. Then $b^h > mb^k$ and there are more words $w = w_0 \ldots w_{h-1} \in \mathcal{A}^h$ than the possible $mb^k$ places to start. Hence, there is at least one word $\tilde{w}$ of length $h$ not present in the sequence $x$ and the test $t$ consisting of the indicator of $\tilde{w}$ rejects $x$.

Finite tests and normal numbers. As stated by Borel (see [4]), a real number $x$ is simply normal to base $b^k$ exactly when each block of length $k$ occurs in the $b$-ary expansion of $x$ with asymptotic frequency $b^{-k}$. Hence, a real number is simply normal to base $b^k$ if its $b$-ary expansion passes all tests up to size $k$. We have obtained that for each $k$ and $b$, and for any $m$, each $(k,m)$-perfect sequence in alphabet $\{0, 1, \ldots, b-1\}$ is the $b$-ary expansion of a number that is simply normal to base $b^k$. Borel defines normality to base $b$ as simple normality to all bases $b^k$, for every positive integer $k$. Hence, a number is normal to base $b$ if its $b$-ary expansion passes all statistical tests of finite size. Then, each instance of a number normal to a given base provides an example of a sequence that passes all finite-size tests. Many are known, such as [5, 1] and the references in [4].

Infinite tests and algorithmically random sequences. Martin-Löf introduced infinite tests defined in terms of computability [15].
These tests properly include all tests of finite size, so for every $k$ and $m$, $(k,m)$-perfect sequences are rejected by these tests. The infinite sequences that pass all these tests are the Martin-Löf random sequences, also known as the algorithmically random sequences. Due to the nature of the definition, the algorithmically random sequences cannot be computed but some of them can be defined at the first level of the Arithmetical Hierarchy [8].

Acknowledgments. We thank Norberto Fava and Victor Yohai for motivating the question on the existence of periodic sequences that pass any finite family of finite-size tests. We thank Liliana Forzani and Ricardo Fraiman for enlightening discussions. Alvarez and Becher are members of Laboratoire International Associé INFINIS, Université Paris Diderot-CNRS / Universidad de Buenos Aires-CONICET. Alvarez is supported by a CONICET doctoral fellowship. Becher and Ferrari are supported by the University of Buenos Aires and by CONICET.

References

[1] Verónica Becher and Pablo Ariel Heiber. On extending de Bruijn sequences. Information Processing Letters, 111(18):930–932, 2011.

[2] Jean Berstel and Dominique Perrin. The origins of combinatorics on words. European Journal of Combinatorics, 28(3):996–1022, 2007.

[3] Émile Borel. Les probabilités dénombrables et leurs applications arithmétiques. Supplemento Rendiconti del Circolo Matematico di Palermo, 27:247–271, 1909.

[4] Yann Bugeaud. Distribution modulo one and Diophantine approximation, volume 193 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge, 2012.

[5] David Champernowne. The construction of decimals normal in the scale of ten. Journal of London Mathematical Society, 3:254–260, 1933.

[6] Dragoš M. Cvetković, Michael Doob, and Horst Sachs. Spectra of Graphs. Academic Press, 1980.
[7] N.G. de Bruijn. A combinatorial problem. Indagationes Mathematicae, 8:461–467, 1946. Proc. Koninklijke Nederlandse Akademie v. Wetenschappen 49:758–764.

[8] Rod Downey and Denis Hirschfeldt. Algorithmic Randomness and Complexity. Springer-Verlag New York, Inc., USA, 2010.

[9] F. Harary. Graph Theory. Addison-Wesley Publishing Co. Inc., Reading, Mass., 1969.

[10] Donald E. Knuth. The art of computer programming. Vol. 2. Addison-Wesley, Reading, MA, 1998. Seminumerical algorithms, Third edition [of MR0286318].

[11] Pierre L'Ecuyer and Richard Simard. TestU01: a C library for empirical testing of random number generators. ACM Trans. Math. Software, 33(4):Art. 22, 40, 2007.

[12] Erich L. Lehmann. Fisher, Neyman, and the creation of classical statistics. Springer, New York, 2011.

[13] M. Lothaire. Combinatorics on words. Cambridge Mathematical Library. Cambridge University Press, Cambridge, 1997.

[14] M. Lothaire. Algebraic combinatorics on words, volume 90 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 2002.

[15] Per Martin-Löf. The Definition of Random Sequences. Information and Control, 9(6):602–619, 1966.

[16] W.T. Tutte. Graph Theory. Addison-Wesley, 1984.

Nicolás Alvarez
Departamento de Ciencias e Ingeniería de la Computación
Universidad Nacional del Sur, Argentina.
naa@cs.uns.edu.ar

Verónica Becher
Departamento de Computación, Facultad de Ciencias Exactas y Naturales
Universidad de Buenos Aires & CONICET, Argentina.
vbecher@dc.uba.ar

Pablo Ferrari
Departamento de Matemática, Facultad de Ciencias Exactas y Naturales
Universidad de Buenos Aires & CONICET, Argentina.
pferrari@dm.uba.ar

Sergio Yuhjtman
Departamento de Matemática, Facultad de Ciencias Exactas y Naturales
Universidad de Buenos Aires, Argentina.
syuhjtma@dm.uba.ar
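As a numerical companion to Theorem 20, the closed formula can be evaluated directly and compared with exhaustive enumeration for very small parameters. This is a minimal sketch under stated assumptions: the function names `d`, `totient`, `count_perfect` and `brute_force` are hypothetical, and the brute-force search enumerates all $b^{nb^k}$ words, so it is only usable when $nb^k$ is tiny.

```python
from itertools import product
from math import factorial, gcd


def d(b, n):
    """d_{b,n}: the largest divisor of n whose prime factors all divide b."""
    result, rest = 1, n
    g = gcd(rest, b)
    while g > 1:
        result *= g
        rest //= g
        g = gcd(rest, b)
    return result


def totient(n):
    """Euler's totient, computed naively."""
    return sum(1 for i in range(1, n + 1) if gcd(i, n) == 1)


def count_perfect(b, k, n):
    """Number of (k, n)-perfect necklaces according to Theorem 20."""
    total = 0
    for j in range(1, n + 1):
        if n % j == 0 and j % d(b, n) == 0:
            e_j = factorial(b) ** (j * b ** (k - 1)) // b ** k
            total += e_j * totient(n // j)
    return total // n


def brute_force(b, k, n):
    """Count (k, n)-perfect necklaces by exhaustive search (tiny cases only)."""
    length = n * b ** k
    necklaces = set()
    for w in product(range(b), repeat=length):
        residues = {}
        for pos in range(length):
            factor = tuple(w[(pos + i) % length] for i in range(k))
            residues.setdefault(factor, []).append(pos % n)
        if len(residues) == b ** k and all(
                sorted(r) == list(range(n)) for r in residues.values()):
            # Canonical representative: the lexicographically least rotation.
            necklaces.add(min(w[i:] + w[:i] for i in range(length)))
    return len(necklaces)


print(count_perfect(2, 2, 2), brute_force(2, 2, 2))  # 2 2
print(count_perfect(2, 3, 1), brute_force(2, 3, 1))  # 2 2
print(count_perfect(2, 3, 3))                        # 172
```

The first line agrees with Example 2, which lists exactly two perfect necklaces for binary words of length 2.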
