Problems of robustness for universal coding schemes

V.V. V'yugin

Abstract

The Lempel–Ziv universal coding scheme is asymptotically optimal for the class of all stationary ergodic sources. We study the robustness of this property under small violations of ergodicity. The deficiency of algorithmic randomness is used as a measure of disagreement between a data sequence and a probability measure. We prove that universal compressing schemes from a large class are non-robust in the following sense: if the randomness deficiency grows arbitrarily slowly on initial fragments of an infinite sequence, then the property of asymptotic optimality of any universal compressing algorithm can be violated. Lempel–Ziv compressing algorithms are robust on infinite sequences generated by ergodic Markov chains when the randomness deficiency of their initial fragments of length $n$ grows as $o(n)$.

1 Introduction

Well-known data compression schemes that are universal for classes of stationary ergodic sources, like the Lempel–Ziv algorithms, are asymptotically optimal [1, 2]. In particular, for almost every infinite binary sequence $\omega_1\omega_2\ldots$ generated by an ergodic source with unknown statistics, the average codeword length per bit of the input sequence tends to the entropy of the source as the block length tends to infinity. Robustness under small variations of its parameters is a significant property of a coding algorithm. In this paper we consider the robustness of the asymptotic optimality property under small violations of ergodicity of a source. The deficiency of algorithmic randomness $d_P(\omega_1\ldots\omega_n)$ is used as a measure of disagreement between a data sequence $\omega_1\ldots\omega_n\ldots$ and a probability distribution $P$. This notion is considered in Kolmogorov's theory of algorithmic complexity and randomness [3, 4, 5].
In the framework of this theory we can formulate laws of probability theory, i.e. statements which hold almost surely, in a "pointwise" form, as statements which hold for individual objects. The set of Martin-Löf [6] random sequences is currently used as the standard set of such individual objects. The measure of this set is equal to 1, and laws of probability theory, like the law of large numbers, the law of the iterated logarithm and others, hold for each sequence from this set. A sequence $\omega_1\omega_2\ldots$ is algorithmically random with respect to a computable measure $P$ if and only if the randomness deficiency $d_P(\omega_1\ldots\omega_n)$ of its initial fragments of length $n$ is bounded as $n$ increases (the exact definition of the randomness deficiency is given in Section 2).

"Robustness" of some probability laws under small violations of algorithmic randomness was studied in [7, 8]. These laws hold not only for random sequences but also for sequences from broader sets: the law of large numbers for the symmetric Bernoulli scheme holds for any sequence $\omega_1\omega_2\ldots$ such that $d_P(\omega_1\ldots\omega_n) = o(n)$; the law of the iterated logarithm holds if $d_P(\omega_1\ldots\omega_n) = o(\log\log n)$. Small variations of these conditions imply violations of these laws.

The robustness property can fail for laws of a more general type. It is proved in [9] that Birkhoff's ergodic theorem is non-robust in this sense: an arbitrarily slow growth of the deficiency of randomness on initial fragments of an infinite sequence $\omega_1\omega_2\ldots$ can imply the violation of the statement of this theorem.

We prove that for any unbounded, nonnegative, and nondecreasing function $\sigma(n)$ there exists a stationary ergodic measure $P$ (computable with respect to $\sigma$) such that for any universal code there is an infinite binary sequence $\omega_1\ldots\omega_n\ldots$ for which the inequality $d_P(\omega_1\ldots\omega_n) \le \sigma(n)$ holds for all sufficiently large $n$ and the property of asymptotic optimality of this code is violated.

2 Algorithmic complexity and randomness

Main notions and results on computability can be found in [10]. In this paper we consider algorithms working with constructive objects (integers, rational numbers, or words in a finite alphabet). Let $B$ be some finite alphabet and $B^*$ the set of all words (finite sequences of letters) over it. The empty word $\Lambda$ is also an element of $B^*$. Let $l(x)$ be the length (number of letters) of a word $x \in B^*$. We write $x \subseteq y$ if a word $x$ is a prefix of a word $y$. Two words $x$ and $x'$ are comparable if $x \subseteq x'$ or $x' \subseteq x$. Let $bx$ be the concatenation of $b$ and $x$ (i.e. all letters of $x$ follow all letters of $b$ in $bx$).

The Kolmogorov (algorithmic) complexity of a word $x \in B^*$ (with respect to a word $y \in B^*$) is equal to the length of the shortest binary codeword $p$ (i.e. $p \in \{0,1\}^*$) from which, given $y$, the word $x$ can be reconstructed (footnote 1):

$$K_\psi(x \mid y) = \min\{\, l(p) : \psi(p, y) = x \,\}.$$

By this definition the complexity depends on the partial computable function $\psi$, the method of decoding. A.N. Kolmogorov proved that an optimal decoding algorithm $\psi$ exists such that, for some positive constant $c$ (not depending on $x$, $y$ and $\psi'$),

$$K_\psi(x \mid y) \le K_{\psi'}(x \mid y) + 2K(\psi') + c \tag{1}$$

holds for any computable decoding function $\psi'$ and for all words $x$ and $y$. Here $K(\psi')$ is the length of the shortest program computing values of $\psi'$ (footnote 2).

We fix some optimal decoding function $\psi$. The value $K(x \mid y) = K_\psi(x \mid y)$ is called the (conditional) Kolmogorov complexity of $x$ given $y$. The unconditional complexity of $x$ is defined as $K(x) = K(x \mid \Lambda)$. It follows from [11] that there is no coding algorithm corresponding to $\psi$ (in the sense of Section 4) which computes, for a given $x$, a codeword $p$ of minimal length such that $\psi(p) = x$.
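The definition of $K_\psi(x \mid y)$ can be made concrete for a decoder that is simple enough to search exhaustively. The following Python sketch is only an illustration: `toy_decoder` is a hypothetical, non-optimal decoding function invented here (it is not the optimal $\psi$ of the text, for which no such search procedure exists), and `complexity` computes $\min\{l(p) : \psi(p, y) = x\}$ by brute force, returning $+\infty$ when no program is found.

```python
from itertools import product


def toy_decoder(p, y=""):
    """Hypothetical partial decoding function psi(p, y) (an assumption of this sketch).

    '0' + w  -> w        (literal copy)
    '10' + w -> w + w    (repeat w twice)
    '11' + w -> y + w    (use the conditioning word y)
    Any other program is undefined (None).
    """
    if p.startswith("0"):
        return p[1:]
    if p.startswith("10"):
        return p[2:] * 2
    if p.startswith("11"):
        return y + p[2:]
    return None


def complexity(x, y="", decoder=toy_decoder, max_len=None):
    """K_psi(x | y) = min{ l(p) : psi(p, y) = x }, found by exhaustive search."""
    if max_len is None:
        max_len = len(x) + 1  # the literal program '0' + x always works
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            if decoder("".join(bits), y) == x:
                return n
    return float("inf")  # convention: min of the empty set is +infinity


if __name__ == "__main__":
    print(complexity("0101"))           # repetition helps
    print(complexity("0110"))           # only the literal program works
    print(complexity("0101", "0101"))   # the condition y gives x almost for free
```

The search is feasible only because the toy decoder is total on a small, explicit set of programs; for an optimal decoder the minimum is not computable, as the last sentence above states.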
We will use some properties of Kolmogorov complexity [5, 11]. The incompressibility property asserts that for any positive integers $n$ and $m$ the portion of all sequences $x$ of length $n$ such that

$$K(x) < n - m \tag{2}$$

is less than $2^{-m}$. Indeed, the number of all $x$ satisfying this inequality does not exceed the number of all binary programs generating them. Since the length of any such program is less than $n - m$, the number of these programs is less than $2^{n-m}$.

Footnote 1: We suppose that $\min\emptyset = +\infty$.

Footnote 2: We suppose that some universal programming language is fixed, and all decoding programs are written in this language (the constant $c$ depends on this language).

Let $x$ and $b$ be finite words. It is easy to construct a function which, given any program computing $bx$ and the length of $b$, computes the word $x$. Therefore (footnote 3),

$$K(x) \le K(bx) + 2\log l(b) + c \tag{3}$$

for any $x$, where $c$ is a positive constant not depending on $b$ and $x$.

We consider a probability space $(\Omega, F, P)$, where $\Omega = \{0,1\}^\infty$ and the Borel field $F$ is generated by the balls $\Gamma_x = \{\omega \in \Omega : x \subseteq \omega\}$, where $x \in \{0,1\}^*$. To define a probability measure $P$ on the space $\Omega$ it is sufficient to define consistent values $P(\Gamma_x) = P(x)$ such that $P(\Lambda) = 1$ and $P(x) = P(x0) + P(x1)$ for all $x$, where $x\nu$ denotes the word obtained from $x$ by appending $\nu$ on the right. After that, the function $P$ can be extended by the Kolmogorov extension theorem [12]. The uniform Bernoulli probability distribution on binary sequences is defined by $B_{1/2}(x) = 2^{-l(x)}$. A measure $P$ is called computable if there exists an algorithm which, given a finite sequence $x$ and a degree of accuracy (a rational $\epsilon > 0$), outputs a rational approximation to $P(x)$ with accuracy $\epsilon$.

The notion of an algorithmically random sequence is defined using an algorithmic analogue of a set of measure 0. Let $P$ be a computable probability measure on the set $\Omega$ of all infinite binary sequences.
A set $M \subseteq \Omega$ has $P$-measure 0 if for each rational $\epsilon > 0$ there is a sequence $x(1), x(2), \ldots$ of elements of $\Xi$ such that the set $U_\epsilon = \cup_i \Gamma_{x(i)}$ satisfies $M \subseteq U_\epsilon$ and $P(U_\epsilon) < \epsilon$. A $P$-null set is called effectively $P$-null if there exists a computable function $x(\epsilon, i)$ such that $M \subseteq U_\epsilon = \cup_i \Gamma_{x(\epsilon,i)}$ and $P(U_\epsilon) < \epsilon$ for each rational $\epsilon > 0$. It can be proved that for any computable measure $P$ there exists a largest (with respect to inclusion) effectively $P$-null set [4, 5, 6]. The complement of this largest effectively $P$-null set is called the constructive support of the measure $P$. An infinite sequence $\omega \in \Omega$ is called algorithmically random with respect to the measure $P$ (random in the sense of Martin-Löf) if it belongs to the constructive support of $P$.

Using some modification of decoding algorithms we can define the notion of an algorithmically random sequence in terms of complexity [4, 5, 13]. Let us consider monotonic computable transformations of sequences. Let $A$ and $B$ be finite alphabets, and let a set $\hat\psi \subseteq A^* \times B^*$ be (recursively) enumerable (by means of some algorithm) and such that for any $(x, y), (x', y') \in \hat\psi$, if $x$ and $x'$ are comparable then $y$ and $y'$ are also comparable. Let also $A = \{0,1\}$.

Footnote 3: In what follows all logarithms are to base 2.

The set $\hat\psi$ defines a decoding function which is monotonic with respect to $\subseteq$ (footnote 4):

$$\psi(p) = \sup\{\, x : (p, x) \in \hat\psi \,\}. \tag{4}$$

The class of such monotonic functions $\psi$ determines the corresponding algorithmic complexity $Km_\psi(x) = \min\{\, l(p) : x \subseteq \psi(p) \,\}$. The corresponding optimal complexity $Km(x)$ differs from the complexity $K(x)$ by a term of logarithmic order in $l(x)$. We have

$$K(x) - 2\log l(x) - c \le Km(x) \le K(x) + 2\log K(x) + c \tag{5}$$

for all $x$, where $c$ is a positive constant [4, 5]. For any sequence $\omega$ denote by $\omega^n = \omega_1\ldots\omega_n$ its initial fragment of length $n$. The following fundamental assertion (first proved in [13]) holds.

Proposition 1 Let $P$ be some computable measure. Then

1) for any infinite sequence $\omega$ a constant $c$ exists such that for all $n$ the inequality $Km(\omega^n) \le -\log P(\omega^n) + c$ holds; besides, for any $m$,

$$P\bigl(\cup\{\Gamma_x : -\log P(x) - Km(x) \ge m\}\bigr) \le 2^{-m};$$

2) a sequence $\omega$ is random with respect to the measure $P$ in the sense of Martin-Löf if and only if for some constant $c$ the inequality $Km(\omega^n) \ge -\log P(\omega^n) - c$ holds for all $n$.

This proposition shows that the asymptotic behaviour of the function

$$d_P(\omega^n) = -\log P(\omega^n) - Km(\omega^n)$$

can be used as a quantitative measure of nonrandomness of the sequence $\omega$. By Proposition 1 a sequence $\omega$ is algorithmically random with respect to a computable measure $P$ if and only if $\sup_n d_P(\omega^n) < \infty$. The value $d_P(\omega^n)$ is called the deficiency of algorithmic randomness of a word (finite sequence) $\omega^n$ with respect to the measure $P$ [4, 5, 14].

Basic notions of ergodic theory can be found in [15] (see also Appendix 2 to this paper). A property of "asymptotic optimality of compression" holds for the shortest codeword defining the Kolmogorov complexity.

Footnote 4: Here by the supremum we mean the union of all comparable $x$ in one sequence.

Corollary 1 Let $P$ be an arbitrary computable stationary ergodic measure, and let $H$ be its entropy. Then for $P$-almost all infinite sequences $\omega$ the following limits exist and the corresponding equalities hold:

$$\lim_{n\to\infty} \frac{K(\omega^n)}{n} = \lim_{n\to\infty} \frac{Km(\omega^n)}{n} = \lim_{n\to\infty} \frac{-\log P(\omega^n)}{n} = H. \tag{6}$$

This corollary follows from Proposition 1, relation (5) and the Shannon–McMillan–Breiman theorem [15]. It was first proved for $K(x)$ in [11].
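The rightmost limit in (6) can be observed numerically in the simplest stationary ergodic case. The Python sketch below is an illustration under stated assumptions, not part of the paper's argument: it takes an i.i.d. Bernoulli source with parameter `p` (chosen here for convenience), draws a sample with a seeded pseudo-random generator, and compares $-\log P(\omega^n)/n$ with the entropy. The complexities $K$ and $Km$ themselves are not computable, so only the probabilistic term is shown.

```python
import math
import random


def entropy(p):
    """Entropy (bits per symbol) of an i.i.d. Bernoulli(p) source."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def empirical_rate(n, p, seed=0):
    """-log2 P(omega^n) / n for a pseudo-random sample omega^n from Bernoulli(p)."""
    rng = random.Random(seed)
    ones = sum(rng.random() < p for _ in range(n))
    log_prob = ones * math.log2(p) + (n - ones) * math.log2(1 - p)
    return -log_prob / n


if __name__ == "__main__":
    p = 0.1
    print("entropy H      :", entropy(p))
    for n in (100, 10_000, 200_000):
        print("n =", n, " -log P / n :", empirical_rate(n, p))
```

For growing `n` the printed rate should settle near `entropy(p)`, which is the content of the Shannon–McMillan–Breiman theorem for this special source.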
In [16] a variant of (6) for algorithmically random sequences was obtained: for any infinite sequence $\omega$ random with respect to a computable ergodic measure $P$ with entropy $H$, relations (6) hold with the limit replaced by the upper limit.

3 Non-robustness property of the universal data compression scheme

Robustness under small variations of its parameters is an important property of a compressing algorithm. The following Theorem 1 can be interpreted as the assertion that the "optimal compression scheme" corresponding to Kolmogorov complexity is non-robust in the class of all stationary ergodic sources. As consequences of this theorem we obtain in Section 4 results on non-robustness of computable universal coding schemes (see Propositions 2 and 3).

Theorem 1 For any nonnegative, nondecreasing, and unbounded function $\sigma(n)$ and for any real number $0 < \epsilon < 1/4$ there exist a stationary ergodic measure $P$ with entropy $0 < H \le \epsilon$, computable with respect to $\sigma$, and an infinite binary sequence $\alpha$ such that

$$d_P(\alpha^n) \le \sigma(n) \tag{7}$$

for almost all $n$. Moreover,

$$\limsup_{n\to\infty} \frac{K(\alpha^n)}{n} \ge \frac{1}{4}, \tag{8}$$

$$\liminf_{n\to\infty} \frac{K(\alpha^n)}{n} \le \epsilon. \tag{9}$$

Proof. Let $r > 0$ be a sufficiently small rational number. Let us consider a partition

$$\pi_0 = [0, \tfrac{1}{2}) \cup (\tfrac{1}{2} + r, 1], \qquad \pi_1 = [\tfrac{1}{2}, \tfrac{1}{2} + r]$$

of the semi-open interval $[0, 1)$ (the number $r$ will be specified later). Using the cutting and stacking method (basic definitions for this method are given in Appendix 2) we will define an ergodic transformation $T$ of the interval $[0, 1)$ which will generate a stationary ergodic measure $P$ on the set $\Omega$. To define the measure $P$, consider

$$P(a_1 a_2 \ldots a_n) = \lambda\{\omega : \omega \in [0,1),\ T^i(\omega) \in \pi_{a_i},\ i = 1, 2, \ldots, n\}, \tag{10}$$

where $a_1 a_2 \ldots a_n$ is an arbitrary binary sequence and $\lambda$ is the uniform measure on the interval $[0, 1)$. The measure $P$ is extended to arbitrary Borel subsets of $\Omega$ in the natural way [12].
The ergodic transformation $T$ will be defined by a sequence of gadgets $\Delta_s$, $\Pi_s$, where $s = 0, 1, \ldots$. Let the gadget $\Phi_s$ be the union of these two gadgets. At step $s$ we define an approximation $T_s = T(\Phi_s)$ of the transformation $T$ and the corresponding approximation $P_s$ of the measure $P$ analogously to (10). The transformation $T_s$ determines finite trajectories starting at points of internal intervals of these gadgets and finishing in the top intervals. Any such trajectory has a name, which is a word in the alphabet $\{0, 1\}$. By definition, for any word $a$ (for any set of words $D$) the number $P_s(a)$ ($P_s(D)$, respectively) is equal to the sum of lengths of all intervals of the gadget $\Pi_s$ from which trajectories with names extending $a$ (extending words from $D$) start.

Since the function $\sigma$ is nondecreasing and unbounded, there exists a sequence of positive integers, computable with respect to $\sigma$, such that $0 < h_{-2} < h_{-1} < h_0 < h_1 < \ldots$ and

$$\sigma(h_{i-1}) - \sigma(h_{i-2}) > -\log r + i + 13 \tag{11}$$

for all $i = 0, 1, \ldots$.

The gadgets will be defined by induction on steps. The gadget $\Delta_0$ is defined by cutting the interval $[\tfrac{1}{2} - r, \tfrac{1}{2} + r)$ into $2h_0$ equal parts and stacking them. Let $\Pi_0$ be the gadget defined by cutting the intervals $[0, \tfrac{1}{2} - r)$ and $(\tfrac{1}{2} + r, 1]$ into $2h_0$ equal parts and stacking them. The purpose of this definition is to construct initial gadgets of height $2h_0$ with supports satisfying $\lambda(\hat\Delta_0) = 2r$ and $\lambda(\hat\Pi_0) = 1 - 2r$.

The sequence of gadgets $\{\Delta_s\}$, $s = 0, 1, \ldots$, will define an approximation of the uniform Bernoulli measure concentrated on the names of their trajectories. The sequence of gadgets $\{\Pi_s\}$, $s = 0, 1, \ldots$, will define a measure with sufficiently small entropy. The gadget $\Pi_{s-1}$ will be extended at each step of the construction by half of the gadget $\Delta_{s-1}$.
After that, the independent cutting and stacking process will be applied to this extended gadget. This process eventually defines infinite trajectories of points from the interval $[0, 1)$. The sequence of gadgets $\{\Pi_s\}$, $s = 0, 1, \ldots$, will be complete and will define the needed measure $P$. Lemmas 2 and 3 will ensure that the transformation $T$ and the measure $P$ are ergodic.

The purpose of the construction is to provide conditions under which there exists a point in the interval $[0, 1)$ having an infinite trajectory with a name $\alpha$ satisfying (7), (8) and (9). To implement (8) we periodically extend initial fragments of $\alpha$ by names of trajectories of gadgets $\Delta_{s-1}$ (for suitable $s$) which have maximal complexity. To bound the deficiency of randomness of the initial fragment of length $n$ by the value $\sigma(n)$, we use condition (11) to relate the height of the gadget $\Delta_s$ to the measure of the support of this gadget. We will use Proposition 5 to define an extension with sufficiently small deficiency of randomness. To implement condition (9) it is sufficient to extend names, in long runs of the construction, only by trajectories of the gadgets $\{\Pi_s\}$, $s = 0, 1, \ldots$. For any $s$ only a portion $\le r$ of the support of such a gadget belongs to the element $\pi_1$ of the partition. Then by the ergodic theorem most (sufficiently long) trajectories of this gadget will visit $\pi_1$ according to this frequency, and the names of these trajectories will have a frequency of ones bounded by the small number $2r$, which ensures the bound (9).

Construction. Suppose that at step $s - 1$ ($s > 0$) the gadgets $\Delta_{s-1}$ and $\Pi_{s-1}$ have been defined. Cut the gadget $\Delta_{s-1}$ into two copies $\Delta'$ and $\Delta''$ of equal width (i.e. we cut each column into two subcolumns of equal width) and join $\Pi_{s-1} \cup \Delta''$ into one gadget.
Find a number $R_s$ and do $R_s$-fold independent cutting and stacking of the gadget $\Pi_{s-1} \cup \Delta''$ and also of the gadget $\Delta'$ to obtain new gadgets $\Pi_s$ and $\Delta_s$ of height $2h_s$ such that the gadget $\Pi_{s-1} \cup \Delta''$ is $(1 - 1/s)$-well-distributed in the gadget $\Pi_s$. The needed number $R_s$ exists by Lemma 3 (Appendix 2).

Properties of the construction. Define $T = T\{\Pi_s\}$. Since the sequence of gadgets $\{\Pi_s\}$ is complete (i.e. $\lambda(\hat\Pi_s) \to 1$ and $w(\Pi_s) \to 0$ as $s \to \infty$), the transformation $T$ is defined for $\lambda$-almost all $\omega$. The measure $P$ is defined by (10). The measure $P$ is stationary, since the transformation $T$ preserves the uniform measure $\lambda$. The measure $P$ is ergodic by Lemma 2 (Appendix 2), where $\Upsilon_s = \Pi_s$, since the sequence of gadgets $\Pi_s$ is complete. Besides, the gadget $\Pi_{s-1} \cup \Delta''$ and the gadget $\Pi_{s-1}$ are $(1 - 1/s)$-well-distributed in $\Pi_s$ for any $s$. By construction

$$\lambda(\hat\Delta_i) = 2^{-i+1} r \quad\text{and}\quad \lambda(\hat\Pi_i) = 1 - 2^{-i+1} r \tag{12}$$

for all $i = 0, 1, \ldots$. This construction is algorithmically effective, so the measure $P$ is computable with respect to $\sigma$.

Let us prove that the entropy $H$ of the measure $P$ does not exceed $\epsilon$. Since $\lambda(\pi_1) = r$ and the transformation $T$ preserves the measure $\lambda$, by the ergodic theorem for almost all points of the interval $[0, 1)$ the frequency with which the trajectory starting at the point visits the element $\pi_1$ tends to $r$ as the length of the initial fragment of the trajectory tends to infinity (footnote 5). Thus for any $\delta > 0$ and all sufficiently large $n$ the measure $P$ of all sequences $x$ of length $n$ with portion of ones $\le 2r$ is $\ge 1 - \delta$. Let us consider any such sequence $x$ as an element of the finite set consisting of all sequences of length $n$ containing no more than $2rn \le \frac{n}{2}$ ones. Then we obtain the standard upper bound

$$\frac{K(x)}{n} \le \frac{1}{n}\log\left(2rn\binom{n}{2rn}\right) + \frac{2\log n}{n} \le -3r\log r \tag{13}$$

for all sufficiently large $n$.
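The second inequality in (13) can be checked numerically for concrete values. The Python sketch below is only an illustration: the function names and the sample values of $r$ and $n$ are assumptions made here, the binomial coefficient is evaluated through `math.lgamma`, and $2rn$ is rounded to an integer. It shows the middle expression of (13) falling below $-3r\log r$ once $n$ is large, and exceeding it for a small $n$, which is why the bound is stated only for sufficiently large $n$.

```python
import math


def log2_binom(n, k):
    """log2 of the binomial coefficient C(n, k), via the log-gamma function."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(2)


def counting_bound(n, r):
    """Middle expression of (13): (1/n) log(2rn * C(n, 2rn)) + 2 log(n) / n."""
    k = max(1, round(2 * r * n))  # 2rn rounded to an integer (assumption of this sketch)
    return (math.log2(k) + log2_binom(n, k)) / n + 2 * math.log2(n) / n


def entropy_target(r):
    """Right-hand side of (13): -3 r log r, logarithm to base 2."""
    return -3 * r * math.log2(r)


if __name__ == "__main__":
    r = 0.01
    print("target -3 r log r:", entropy_target(r))
    for n in (100, 1_000, 100_000):
        print("n =", n, " counting bound:", counting_bound(n, r))
```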
By this inequality and by (6) we obtain the upper bound $H \le -3r\log r \le \epsilon$ for the entropy $H$ of the measure $P$, where $r$ is sufficiently small.

Let us prove that an infinite sequence $\alpha$ exists such that the conclusion of Theorem 1 holds. We will define $\alpha$ by induction on steps $s$ as the union of an increasing sequence of initial fragments

$$\alpha(0) \subset \ldots \subset \alpha(k) \subset \ldots \tag{14}$$

For all sufficiently large $k$ the Kolmogorov complexity of the initial fragment $\alpha(k)$ will be small if $k$ is odd, and large otherwise.

Footnote 5: For any $\omega \in [0, 1)$ the frequency of visiting $\pi_1$ by the trajectory starting at $\omega$ is equal to $(1/l)\sum_{i=1}^{l} \chi_1(T^i\omega)$, where $l$ is the length of this trajectory and $\chi_1(r) = 1$ if $r \in \pi_1$, and $\chi_1(r) = 0$ otherwise.

Define $\alpha(0)$ to be the $\Pi_0$-name of some trajectory of length $\ge h_0$ such that $d_P(\alpha(0)) \le 2$. This is possible by Proposition 5 (Appendix 1). Define $s(-1) = s(0) = 0$.

Induction hypotheses. Suppose that $k > 0$ and a sequence $\alpha(0) \subset \ldots \subset \alpha(k-1)$ is already defined, and for some step $s(k-1)$ of the construction the word $\alpha(k-1)$ is the $\Pi_{s(k-1)}$-name of a trajectory of some point from the support of the gadget $\Pi_{s(k-1)}$. We suppose that $l(\alpha(k-1)) > h_{s(k-1)}$, and if $k$ is odd then $d_P(\alpha(k-1)) \le \sigma(h_{s(k-2)}) - 4$. If $k$ is even then $d_P(\alpha(k-1)) \le \sigma(h_{s(k-2)})$ and $P_{s(k-1)}(\alpha(k-1)) > (1/8)P(\alpha(k-1))$.

Let us consider any odd $k$. Define $a = \alpha(k-1)$. Let us consider the set of all intervals (from columns) of the gadget $\Pi_{s-1}$ with the following property: for any trajectory starting from this interval with $\Pi_{s-1}$-name extending $a$, the frequency of visiting the element $\pi_1$ of the partition is $\le 2r$. For the name $\gamma$ of any such trajectory the inequality

$$K(\gamma)/l(\gamma) \le -3r\log r \le \epsilon \tag{15}$$

(analogous to (13)) holds, where $r$ is sufficiently small.
As in the proof of the inequality $H \le \epsilon$, we obtain by the ergodic theorem that for all sufficiently large $s$ the total length of all intervals from this set is $\ge (1/2)P(a)$. Let us consider an arbitrary column from the gadget $\Pi_s$. Divide all its intervals into two equal parts: the upper part and the lower part. We will consider only intervals from the lower part. Any trajectory starting from a point of an interval from this part has length $\ge h_s$.

Fix some $s$ as above and define $s(k) = s$. Let $U_s(a)$ be the set of all intervals from the lower part of the gadget $\Pi_s$ such that trajectories starting from them and having $\Pi_s$-names extending $a$ satisfy the inequality (15). Let $D_a$ be the set of all $\Pi_s$-names of all these trajectories. The inequality $P_s(D_a) = P_s(a) > (1/4)P(a)$ holds for the total length $P_s(D_a)$ of all intervals from $U_s(a)$. Define $\tilde D = \cup_{x \in D}\Gamma_x$. It is easy to prove that a set $C_a \subseteq D_a$ exists such that $P(\tilde C_a) > (1/8)P(\tilde D_a)$ and $P_s(b) > (1/8)P(b)$ for all $b \in C_a$. By Proposition 5 (Appendix 1) a word $b \in C_a$ exists such that $d_P(b^j) \le d_P(a) + 4$ when $l(a) \le j \le l(b)$. Define $\alpha(k) = b$. By the induction hypotheses the inequalities $d_P(a) \le \sigma(h_{s(k-2)}) - 4$ and $l(a) \ge h_{s(k-1)} > h_{s(k-2)}$ hold. Then $d_P(b^j) \le \sigma(h_{s(k-2)}) \le \sigma(l(a)) \le \sigma(j)$ for all $l(a) \le j \le l(b)$. Notice that $l(b) \ge h_{s(k)}$, since any trajectory defining $b$ starts from an interval of the lower part of the gadget $\Pi_s$, and the height of this gadget is $\ge 2h_s$. The remaining induction hypotheses are proved above.

Condition (9) is true, since condition (15) holds for infinitely many initial fragments $\alpha(k)$ of the sequence $\alpha$.

Let $k$ be even. Put $b = \alpha(k-1)$. Let $s = s(k-1) + 1$. Define $s(k) = s$. Let us consider an arbitrary column from the gadget $\Delta_{s-1}$.
Divide all its intervals into two equal parts: the upper part and the lower part. Any trajectory starting from an interval of the lower part has length $\ge L/2$, where $L \ge 2h_{s-1}$ is the height of the gadget $\Delta_{s-1}$. The uniform measure of all such intervals is equal to $\frac{1}{2}\lambda(\hat\Delta_{s-1})$. Let us consider the names $x^{L/2}$ of initial fragments of length $L/2$ of all these trajectories. By the incompressibility property (2) of Kolmogorov complexity and by the choice of $L$, the uniform Bernoulli measure of all sequences of length $L/2$ satisfying

$$\frac{K(x^{L/2})}{l(x^{L/2})} < 1 - \frac{2}{h_{s-2}}$$

is less than $2^{-L/h_{s-2}} \le 1/4$. The names of initial fragments (of length $L/2$) of the remaining trajectories starting from intervals of the lower part of the gadget $\Delta_{s-1}$ satisfy

$$\frac{K(x^{L/2})}{l(x^{L/2})} \ge 1 - \frac{2}{h_{s-2}}. \tag{16}$$

As noted in Appendix 2 (Remark 1), for any step $s$ of the construction the equality $P_{s-1}(x) = 2^{-l(x)}\lambda(\hat\Delta_{s-1})$ holds for the name $x$ of any trajectory of the gadget $\Delta_{s-1}$. We conclude from this equality that the uniform measure of all intervals from the lower part of the gadget $\Delta_{s-1}$ from which trajectories start whose names (more precisely, the initial fragments $x^{L/2}$ of such names) satisfy (16) is at least $\frac{1}{4}\lambda(\hat\Delta_{s-1})$.

By (11) and (12)

$$\gamma = \frac{\lambda(\hat\Delta'')}{\lambda(\hat\Pi_{s-1})} = \frac{\lambda(\hat\Delta_{s-1})}{2\lambda(\hat\Pi_{s-1})} = \frac{2^{-s+1} r}{1 - 2^{-s+2} r} > 2^{-s+1} r \ge 2^{-(\sigma(h_{s-1}) - \sigma(h_{s-2})) + 12}. \tag{17}$$

Let us consider the $R_s$-fold independent cutting and stacking of the gadget $\Pi_{s-1} \cup \Delta''$ in more detail. First, we cut this gadget into $R_s$ copies. When we stack the next copy on the already defined part of the gadget, the portion of all trajectories of any column from the previously constructed part which go to a subcolumn from the gadget $\Delta''$ is equal to

$$\frac{\lambda(\hat\Delta'')}{\lambda(\hat\Pi_{s-1}) + \lambda(\hat\Delta'')} = \frac{\gamma}{1 + \gamma}. \tag{18}$$

This is true since, by definition, any column is covered by a set of subcolumns with the same distribution as the gadget $\Pi_{s-1} \cup \Delta''$ has. The total length of all intervals of the gadget $\Pi_{s-1}$ such that trajectories with names extending $b$ start from these intervals is equal to $P_{s-1}(b)$.

Consider the lower half of all subintervals generated by cutting and stacking of the gadget $\Pi_{s-1}$ in which trajectories with $\Pi_{s-1}$-names extending $b$ start. The length of any such trajectory (in $\Pi_s$) is at least $h_s$. For this reason one of the induction hypotheses will be true. The measure of the remaining subintervals is halved. After that, we consider the subset of these subintervals such that trajectories starting from subintervals of this subset go into subcolumns of the gadget $\Delta''$. The measure of the remaining subintervals is multiplied by the factor $\gamma/(1 + \gamma)$. Further, consider subintervals from the remaining part generating trajectories whose names have in $\Delta''$ fragments satisfying (16). The measure of the remaining part is at least $1/4$ of the previously considered part. We obtain this bound from the previous estimate of the portion of subintervals generating trajectories in the gadget $\Delta''$ of length $\ge L/2$ satisfying (16) (footnote 6). Let $D_b$ be the set of all $\Pi_s$-names of all trajectories starting from subintervals remaining after these selection operations. Then

$$P_s(D_b) \ge \frac{\gamma}{8(1 + \gamma)} P_{s-1}(b). \tag{19}$$

The name of any such trajectory has an initial fragment of type $bx'x^{L/2}$, where $x'x^{L/2}$ is the name of the fragment of this trajectory corresponding to its path in the gadget $\Delta_{s-1}$. The word $x^{L/2}$ has length $L/2$ and satisfies (16). The word $x'$ is the name of the fragment of the trajectory which goes from the lower interval to the interval generating the trajectory with name $x^{L/2}$. We have $l(bx'x^{L/2}) \le 2L = 4l(x^{L/2})$.
By (3) and (16) we obtain for these initial fragments of sufficiently large length

$$\frac{K(bx'x^{L/2})}{l(bx'x^{L/2})} \ge \frac{K(x^{L/2}) - 2\log l(bx')}{4\,l(x^{L/2})} \ge \frac{1}{4} - \frac{1}{h_{s-2}}. \tag{20}$$

We have $P_{s-1}(b) > (1/8)P(b)$ by the induction hypothesis. After that, taking into account that $\gamma \le 1$, we deduce from (19)

$$P(\tilde D_b) \ge P_{s-1}(D_b) \ge \frac{\gamma}{128} P(b).$$

Footnote 6: Recall that $L$ ($\ge 2h_{s-1}$) is the height of the gadgets $\Pi_{s-1}$, $\Delta_{s-1}$.

By Proposition 5 a word $c \in D_b$ exists such that

$$d_P(c^j) \le d_P(b) + 1 - \log\frac{\gamma}{128} \le d_P(b) + (\sigma(h_{s-1}) - \sigma(h_{s-2}) - 12) + 8 \le \sigma(h_{s-1}) - 4 = \sigma(h_{s(k-1)}) - 4$$

for all $l(b) \le j \le l(c)$. Here we have $d_P(b) \le \sigma(h_{s(k-2)}) \le \sigma(h_{s-2})$ by the induction hypothesis. We also used inequality (17). Besides, by the induction hypothesis we have $l(b) \ge h_{s-1}$. Therefore, $d_P(c^j) < \sigma(h_{s-1}) \le \sigma(l(b)) \le \sigma(j)$ for $l(b) \le j \le l(c)$. Define $\alpha(k) = c$. It is easy to see that all induction hypotheses are true for $\alpha(k)$.

The infinite sequence $\alpha$ is defined by the sequence of initial fragments (14). We proved that $d_P(\alpha^j) \le \sigma(j)$ for all $j \ge l(\alpha(1))$. By the construction there are infinitely many initial fragments of the sequence $\alpha$ satisfying (20). The sequence $h_s$, where $s = 0, 1, \ldots$, is monotonically increasing. So condition (8) holds. △

4 Non-robustness property of universal codes

Let $A$ and $B$ be finite alphabets. By a code we mean a computable family of functions (footnote 7) $\phi_n : A^n \to B^*$, where $n = 1, 2, \ldots$. Suppose that $B = \{0, 1\}$. We will consider decodable codes: a computable family of decoding functions $\psi_n : \phi_n(A^n) \to A^n$ such that $\alpha = \psi_n(\phi_n(\alpha))$ for all $n$ and for all $\alpha \in A^n$ is associated with the code. A separating property of the code is required: an algorithm must exist which decodes any sequence of concatenated codewords. Prefix codes satisfy this requirement.
Under the prefix method of coding any two distinct codewords $\phi_n(\alpha)$ and $\phi_n(\alpha')$ are incomparable. For any code $\{\phi_n\}$ the compression ratio $\rho_{\phi_n}(\alpha^n) = l(\phi_n(\alpha^n))/(n\log|A|)$ of an input word $\alpha^n \in A^n$ is defined. We suppose for simplicity that $A = \{0, 1\}$.

In [17, 18] codes universal in the mean for some classes of sources were considered; in [1, 2] a code universal almost everywhere for the class of all stationary ergodic sources was defined. We consider codes universal almost everywhere.

Footnote 7: The function $\phi_n(\alpha)$ is computable in both arguments $n$ and $\alpha$.

A code $\{\phi_n\}$ is called universal with respect to a class of stationary ergodic sources if for any computable stationary ergodic measure $P$ from this class

$$\lim_{n\to\infty} \rho_{\phi_n}(\omega^n) = H \tag{21}$$

holds for $P$-almost every infinite sequence $\omega = \omega_1\omega_2\ldots$, where $H$ is the entropy of the measure $P$.

There exist several types of the Lempel–Ziv universal coding scheme [1, 2]. Let us recall two of them. A coding algorithm is fed with a word $\omega_1\ldots\omega_N$ of length $N$. In the first variant of the algorithm the sequence of letters $\omega_1\omega_2\ldots\omega_N$ is read from the left and divided into subblocks as follows: a pointer to the $k$-th subblock is inserted after $\omega_{i(k)}$ if the subblock $\omega_{i(k-1)+1}\omega_{i(k-1)+2}\ldots\omega_{i(k)-1}$ has already been seen between previous pointers and the subblock $\omega_{i(k-1)+1}\omega_{i(k-1)+2}\ldots\omega_{i(k)}$ has not been seen. To encode the new subblock it is sufficient to memorize the coordinate of the beginning of the sequence $\omega_{i(k-1)+1}\omega_{i(k-1)+2}\ldots\omega_{i(k)-1}$, its length, and the new letter $\omega_{i(k)}$.

The same idea is used in the second variant of the algorithm, but a subblock $\omega_{i(k-1)+1}\omega_{i(k-1)+2}\ldots\omega_{i(k)-1}$ is deemed to have appeared if it occurs anywhere earlier in the word, not necessarily between pointers.

The following proposition on non-robustness of universal codes is an analogue of Theorem 1.
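Before the statement, the first parsing rule recalled above can be sketched in Python. This is a minimal illustration under stated assumptions, not the exact scheme of [1, 2]: the names `lz_parse`, `lz_encode` and `lz_decode` are invented here, and each subblock is represented by the number of the earlier subblock it extends plus the new letter, instead of the (coordinate, length, letter) triple mentioned in the text. The conversion of these tokens into binary codewords is omitted.

```python
def lz_parse(word):
    """Split `word` into subblocks: each new subblock is the shortest prefix
    of the remaining input that has not yet occurred as a subblock."""
    seen = {""}
    phrases = []
    current = ""
    for letter in word:
        current += letter
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        phrases.append(current)  # trailing subblock that was already seen
    return phrases


def lz_encode(word):
    """Encode each subblock as (number of the earlier subblock it extends, new letter).
    Subblock number 0 is the empty word."""
    index = {"": 0}
    tokens = []
    current = ""
    for letter in word:
        if current + letter in index:
            current += letter
        else:
            tokens.append((index[current], letter))
            index[current + letter] = len(index)
            current = ""
    if current:
        tokens.append((index[current], ""))  # incomplete last subblock
    return tokens


def lz_decode(tokens):
    """Invert lz_encode by rebuilding the subblock table on the fly."""
    phrases = [""]
    out = []
    for ref, letter in tokens:
        phrase = phrases[ref] + letter
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


if __name__ == "__main__":
    w = "1011010100010"
    print(lz_parse(w))
    print(lz_encode(w))
    print(lz_decode(lz_encode(w)) == w)
```

The decoder needs no side information beyond the token list, which reflects the separating (decodability) requirement imposed on codes in this section.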
Proposition 2 For any nonnegative, nondecreasing, and unbounded function $\sigma(n)$ and for any real number $0 < \epsilon < 1/4$ there exists a stationary ergodic measure $P$ with entropy $0 < H \le \epsilon$, computable with respect to $\sigma$, such that for each universal (for the class of all stationary ergodic sources) code $\{\phi_n\}$ an infinite binary sequence $\alpha$ exists such that $d_P(\alpha^n) \le \sigma(n)$ for almost all $n$ and

$$\limsup_{n\to\infty} \rho_{\phi_n}(\alpha^n) \ge \frac{1}{4}; \tag{22}$$

$$\liminf_{n\to\infty} \rho_{\phi_n}(\alpha^n) \le \epsilon. \tag{23}$$

Proof. For any $n$ the decoding algorithm $\psi_n$ of the code $\{\phi_n\}$ is defined by $\log n + O(1)$ bits. Then we have

$$K(\alpha^n) \le l(\phi_n(\alpha^n)) + O(\log n). \tag{24}$$

Inequality (22) follows from inequality (8) of Theorem 1. The proof of inequality (23) is analogous to the proof of inequality (9) of Theorem 1. We must only replace condition (15) from the proof of Theorem 1 by $l(\phi_n(\omega^n))/n \le \epsilon$ and take into account property (21) of asymptotic optimality of the code $\{\phi_n\}$. △

Let $\{\phi_N\}$ be a code. Under block realization of the code any sequence of letters $\omega^n = \omega_1\ldots\omega_n$ is divided into consecutive blocks $\omega^n = \tilde\omega_1\ldots\tilde\omega_k$, where $n = (k-1)N + q$, $0 \le q < N$, each $\tilde\omega_i = \omega_{(i-1)N+1}\ldots\omega_{iN}$, $i = 1, 2, \ldots, k-1$, is a block of length $N$, and $\tilde\omega_k = \omega_{(k-1)N+1}\ldots\omega_{(k-1)N+q}$ is the last incomplete block. Any block $\tilde\omega_i$ is encoded by the binary word $\phi_N(\tilde\omega_i)$. In asymptotic estimates (when $n \to \infty$) the method of coding of the last block $\tilde\omega_k$ is unessential (we fix some such method). We write $\phi_N(\omega^n) = \phi_N(\tilde\omega_1)\ldots\phi_N(\tilde\omega_k)$ and $\rho_{\phi_N}(\omega^n) = l(\phi_N(\omega^n))/n$.

It is proved in [2] (Theorem 4) that for any stationary ergodic measure $P$ with entropy $H$ the property of asymptotic optimality holds for the block realization of the Lempel–Ziv code $\{\phi_N\}$ with blocks of length $N$.
The relation

$$\lim_{N \to \infty} \limsup_{n \to \infty} \rho_{\varphi_N}(\omega^n) = H \tag{25}$$

holds for $P$-almost all $\omega$. We can prove that equality (25) also holds for any sequence $\omega$ random in the sense of Martin-Löf with respect to the measure $P$ (i.e. when $d_P(\omega^n) = O(1)$ as $n \to \infty$).

The following analogue of Theorem 1 holds for block realizations of codes with block length $N$ and for codes using a sliding window of length $N$ (when a new letter of the codeword depends only on the $N$ preceding letters of the input word).

Proposition 3. For any nonnegative, nondecreasing, and unbounded function $\sigma(n)$ and for any real number $0 < \epsilon < 1/4$ there exists a stationary ergodic measure $P$, computable with respect to $\sigma$, with entropy $0 < H \le \epsilon$, such that for each code $\{\varphi_N\}$ universal for the class of all stationary ergodic sources, or for each universal code with a sliding window of length $N$, there exists an infinite binary sequence $\alpha$ such that $d_P(\alpha^n) \le \sigma(n)$ for almost all $n$, and for any $N$

$$\limsup_{n \to \infty} \rho_{\varphi_N}(\alpha^n) \ge \frac{1}{4}, \tag{26}$$

and for all sufficiently large $N$

$$\liminf_{n \to \infty} \rho_{\varphi_N}(\alpha^n) \le \epsilon. \tag{27}$$

The proof of this proposition is a small complication of the proof of Proposition 2.

Notice that property (26) also holds for adaptive coding schemes, i.e. when the coding algorithm depends on preceding blocks.

Using Theorem 1 it can be proved that the non-robustness property holds for other well-known universal codes. For example, in [19] a universal forecasting measure $\rho(\omega_1 \dots \omega_n)$ and a code $\psi_n$ such that $l(\psi_n(\omega_1 \dots \omega_n)) \le -\log \rho(\omega_1 \dots \omega_n) + 1$ were defined. This measure is defined as a mixture $\rho(y) = \sum_{k=0}^{\infty} \lambda_k \rho_k(y)$ of measures $\rho_k$ universal for Markov sources of order $k$ constructed in the theory of universal coding [20].
Here $\lambda_k$ is some optimal probability distribution on positive integers (it can be defined as $\lambda_k = c k^{-1} \log^{-2} k$, where $c$ is a constant) and $\varphi(k)$ is the corresponding codeword for a positive integer $k$: $l(\varphi(k)) = \log k + O(\log \log k)$. In [21] a universal code $\psi(u) = \varphi(l(u)) \psi_{l(u)}(u)$, where $u \in B^*$, was constructed. The universality conditions for the measure $\rho$ and for the code $\psi$ are the following (see footnote 8): for any stationary measure $\mu$ with entropy $H(\mu)$, for $\mu$-almost all $\omega \in \Omega$ the mean error of the forecast tends to zero,

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \log \frac{\mu(\omega_{t+1} \mid \omega_1 \dots \omega_t)}{\rho(\omega_{t+1} \mid \omega_1 \dots \omega_t)} = \lim_{t \to \infty} \frac{1}{t} \log \frac{\mu(\omega_1 \dots \omega_t)}{\rho(\omega_1 \dots \omega_t)} = 0, \tag{28}$$

and $\lim_{n \to \infty} l(\psi(\omega^n))/n = \lim_{n \to \infty} -\log \rho(\omega^n)/n = H(\mu)$.

It is easy to derive from the definition of the deficiency of randomness that condition (28) is "robust under violation of randomness"; more precisely, it holds for any computable stationary measure $\mu$ and for any infinite sequence $\omega$ such that $d_\mu(\omega^n) = o(n)$ as $n \to \infty$. But the corresponding universal code $\psi$ is non-robust for the class of all stationary ergodic sources. Since a decoding algorithm exists for the code $\psi$, we have $K(\omega_1 \dots \omega_n) \le l(\psi(\omega_1 \dots \omega_n)) + O(1) \le -\log \rho(\omega_1 \dots \omega_n) + O(\log n)$. Then by Proposition 2 there exists an $\alpha \in \Omega$ such that the conclusion of this proposition holds; in particular, condition (22) holds. Property (23) can be obtained as in the proof of Proposition 2 from the universality of the code.

The property of asymptotic optimality can be robust for narrower classes of stationary ergodic sources, such as i.i.d. sequences of random variables or stationary Markov chains.
Proposition 4. Let $P$ be an arbitrary computable probability measure representing a stationary ergodic Markov chain of fixed order (in particular, an i.i.d. sequence of random variables), let $H$ be its entropy, and let $\{\varphi_n\}$ be a variant of the Lempel–Ziv compressing algorithm. Then for any infinite sequence $\omega$, if $d_P(\omega^n) = o(n)$ then equality (21) holds, and for the block realization of this compressing scheme equality (25) holds.

Footnote 8. We give some simplification of the results of [19, 21].

The proof is based on the constructive character of the proofs of the results from [2]. Birkhoff's ergodic theorem is also used in this proof; in the case of Markov sources it is a variant of the law of large numbers. This law holds for an individual sequence $\omega$ when $d_P(\omega^n) = o(n)$ as $n \to \infty$.

5 Appendix 1

Bounded increase of the deficiency of randomness. In the proof of Theorem 1 a proposition on a bounded increase of the deficiency of randomness was used. Let $P$ be a measure, $P(x) \ne 0$, and let a set $A$ consist of words $y$ such that $x \subseteq y$. Recall that $P(\tilde{A}) = P(\cup \{\Gamma_y : y \in A\})$ for any $A \subseteq \{0, 1\}^*$. Define $P(\tilde{A} \mid x) = P(\tilde{A})/P(x)$.

Proposition 5. Let $P$ be a measure, $x$ be a word, $P(x) \ne 0$, and let a set $A$ consist of words $y$ such that $x \subseteq y$ and $P(\tilde{A}) > 0$. Then for any $0 < \mu < 1$ there exists a subset $A' \subseteq A$ such that $P(\tilde{A}') > \mu P(\tilde{A})$ and $d_P(y^n) \le d_P(x) - \log(1 - \mu) - \log P(\tilde{A} \mid x)$ for all $y \in A'$ and $l(x) \le n \le l(y)$.

Proof. We will use the notion of a supermartingale [12]. A function $M$ is called a $P$-supermartingale if it is defined on $\{0, 1\}^*$ and satisfies the conditions $M(\Lambda) \le 1$ and $M(x) \ge M(x0) P(0 \mid x) + M(x1) P(1 \mid x)$ for all $x$, where $P(\nu \mid x) = P(x\nu)/P(x)$ for $\nu = 0, 1$ (we put here $0/0 = 0 \cdot \infty = 0$).
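The supermartingale conditions above can be checked mechanically on all short words. The following is a minimal sketch under assumptions of our own: $P$ is taken to be a Bernoulli measure with $P(1) = 1/3$, the candidate $M$ is the likelihood ratio of the uniform measure to $P$, and the names `bernoulli` and `is_supermartingale` are hypothetical helpers, not part of the paper.

```python
from fractions import Fraction
from itertools import product


def bernoulli(p):
    """Measure of the cylinder of a binary word under i.i.d. bits with P(1) = p."""
    def measure(word):
        ones = word.count("1")
        return p ** ones * (1 - p) ** (len(word) - ones)
    return measure


def is_supermartingale(M, P, depth):
    """Check M(empty word) <= 1 and M(x) >= M(x0)P(0|x) + M(x1)P(1|x)
    for every binary word x of length < depth."""
    if M("") > 1:
        return False
    for n in range(depth):
        for bits in product("01", repeat=n):
            x = "".join(bits)
            if P(x) == 0:
                # convention 0/0 = 0: the right-hand side vanishes,
                # so the inequality holds for nonnegative M
                continue
            rhs = sum(M(x + v) * P(x + v) / P(x) for v in "01")
            if M(x) < rhs:
                return False
    return True


# Assumed example: P is Bernoulli(1/3), Q is the uniform measure, M = Q / P.
P = bernoulli(Fraction(1, 3))
Q = bernoulli(Fraction(1, 2))
M = lambda x: Q(x) / P(x)
```

With exact rational arithmetic the inequality holds with equality for this $M$, since $Q(x0) + Q(x1) = Q(x)$; a function such as $x \mapsto l(x) + 1$ fails the check already at the empty word.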
A supermartingale $M$ is lower semicomputable if the set $\{(r, x) : r < M(x)\}$, where $r$ is a rational number, is the range of some computable function. We will consider only nonnegative supermartingales. Let us prove that the deficiency of randomness is bounded by the logarithm of some lower semicomputable supermartingale.

Lemma 1. Let $P$ be a computable probability measure. Then there exists a lower semicomputable $P$-supermartingale $M$ such that $d_P(x) \le \log M(x)$ for all $x$.

Proof. Let an optimal function $\psi$ satisfying (4) define the monotone complexity $Km(x)$. Define

$$Q(\alpha) = B_{1/2}(\cup \{\Gamma_p : \alpha \subseteq \psi(p)\}), \tag{29}$$

where $B_{1/2}(\Gamma_\alpha) = 2^{-l(\alpha)}$ is the uniform Bernoulli measure on the set of all binary sequences. It is easy to verify that $Q(\Lambda) \le 1$ and $Q(\alpha) \ge Q(\alpha 0) + Q(\alpha 1)$ for all words $\alpha$. Then the function $M(\alpha) = Q(\alpha)/P(\alpha)$ is a $P$-supermartingale. Since for any $\alpha$ the shortest $p$ such that $\alpha \subseteq \psi(p)$ is an element of the set from (29), we have the inequality $Q(\alpha) \ge 2^{-Km(\alpha)}$, and so $d_P(\alpha) \le \log M(\alpha)$. △

Let $d_P(x) \le \log M(x)$, where $M$ is a lower semicomputable $P$-supermartingale. Let us define the set

$$A_1 = \left\{ y \in A : \exists j \left( l(x) \le j \le l(y) \text{ and } M(y^j) > \frac{1}{(1 - \mu) P(\tilde{A} \mid x)} M(x) \right) \right\}.$$

A set of words $B$ is called prefix-free if for any two distinct words $x, y \in B$ the conditions $x \not\subseteq y$ and $y \not\subseteq x$ hold. By the definition of a supermartingale, for any prefix-free set $B$ such that $x \subseteq y$ for all $y \in B$ the inequality

$$M(x) \ge \sum_{y \in B} M(y) P(y \mid x) \tag{30}$$

holds. For any $y \in A_1$ let $y^p$ be the initial fragment of $y$ of maximal length such that

$$\frac{M(y^p)}{M(x)} > \frac{1}{(1 - \mu) P(\tilde{A} \mid x)}.$$

The set $\{y^p : y \in A_1\}$ is prefix-free. Then by (30) we have

$$1 \ge \sum_{y \in A_1} \frac{M(y^p)}{M(x)} P(y^p \mid x) > \frac{1}{(1 - \mu) P(\tilde{A} \mid x)} \sum_{y \in A_1} P(y^p \mid x) \ge \frac{1}{(1 - \mu) P(\tilde{A} \mid x)} P(\tilde{A}_1 \mid x).$$

From this we obtain $P(\tilde{A}_1 \mid x) < (1 - \mu) P(\tilde{A} \mid x)$.
Define $A' = A - \{y \in A : z \subseteq y \text{ for some } z \in A_1\}$. Then $P(\tilde{A}' \mid x) > \mu P(\tilde{A} \mid x)$. For any $y \in A'$ we have

$$M(y^j) \le M(x) \frac{1}{(1 - \mu) P(\tilde{A} \mid x)}$$

for all $l(x) \le j \le l(y)$. The result of the proposition follows from the inequality $d_P(x) \le \log M(x)$. △

6 Appendix 2

Method of cutting and stacking. An arbitrary measurable mapping of a probability space into itself is called a transformation or a process. A transformation $T$ preserves a measure $P$ if $P(T^{-1}(A)) = P(A)$ for all measurable subsets $A$ of the space. A subset $A$ is called invariant with respect to $T$ if $T^{-1}A = A$. A transformation $T$ is called ergodic if each subset $A$ invariant with respect to $T$ has measure 0 or 1. The simplest example of such a transformation of the space $A^\infty$ of all infinite sequences, where $A = \{0, 1, \dots, k-1\}$ is some finite alphabet, is the (left) shift $T$ defined by $(T\omega)_i = \omega_{i+1}$ for all $i = 1, 2, \dots$. If the shift $T$ preserves the measure $P$ then this measure is called stationary, i.e.

$$P\{\omega : \omega_i = x_1, \dots, \omega_{i+k-1} = x_k\} = P\{\omega : \omega_1 = x_1, \dots, \omega_k = x_k\}$$

for all positive integers $i, k \ge 1$ and all $x_1, \dots, x_k$ equal to 0 or 1.

Recall some notions of symbolic dynamics. We consider the uniform measure $\lambda$ on the unit interval $[0, 1)$ and a transformation $T$ of this interval. A partition is a sequence of pairwise disjoint subsets $\pi = (\pi_1, \dots, \pi_k)$ of the interval $[0, 1)$ whose union is equal to this interval. A transformation $T$ defines a measure on the set of all finite and infinite words over the alphabet $A = \{0, 1, \dots, k-1\}$ as follows:

$$P(a_1 a_2 \dots a_n) = \lambda\{\omega \in [0, 1) : T^i(\omega) \in \pi_{a_i},\ i = 1, 2, \dots, n\}, \tag{31}$$

where $a_1 a_2 \dots a_n$ is a sequence of letters from $A$. The measure $P$ can be extended to all Borel subsets of $A^\infty$ in a natural fashion [12].
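Formula (31) can be made concrete with a toy example. The sketch below assumes a transformation that the paper does not use, the doubling map $T(\omega) = 2\omega \bmod 1$, with the two-cell partition $\pi_0 = [0, 1/2)$, $\pi_1 = [1/2, 1)$. For this particular map the set in (31) is a union of dyadic intervals, so counting midpoints of a sufficiently fine dyadic grid gives its Lebesgue measure exactly; the function names are hypothetical.

```python
from fractions import Fraction


def doubling(omega):
    """Assumed example transformation T(omega) = 2 * omega mod 1 on [0, 1)."""
    return (2 * omega) % 1


def letter(omega):
    """Index of the partition cell containing omega:
    pi_0 = [0, 1/2), pi_1 = [1/2, 1)."""
    return 0 if omega < Fraction(1, 2) else 1


def word_measure(word, T=doubling, cell=letter):
    """Evaluate (31) for the doubling map: the measure of the points omega
    with T^i(omega) in pi_{a_i} for i = 1..n, where word = (a_1, ..., a_n).
    Grid counting is exact here only because the set is a union of dyadic
    intervals of length 2^-(n+1)."""
    n = len(word)
    cells = 2 ** (n + 1)
    hits = 0
    for j in range(cells):
        omega = Fraction(2 * j + 1, 2 * cells)   # midpoint of the j-th grid cell
        ok = True
        for a in word:
            omega = T(omega)                     # now omega is T^i of the start point
            if cell(omega) != a:
                ok = False
                break
        if ok:
            hits += 1
    return Fraction(hits, cells)
```

Under these assumptions every binary word of length $n$ receives measure $2^{-n}$, i.e. the doubling map with this partition induces the uniform Bernoulli measure.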
The measure $P$ defined by (31) is stationary and ergodic with respect to the left shift if and only if the transformation $T$ has the same properties. We use the cutting and stacking method of constructing ergodic processes [22, 23]. Recall the main notions and properties of this method.

A column is a sequence $E = (L_1, \dots, L_h)$ of pairwise disjoint subintervals of the unit interval of equal width; $L_1$ is the base, $L_h$ is the top of the column, $\hat{E} = \cup_{i=1}^{h} L_i$ is the support of the column, $w(E) = \lambda(L_1)$ is the width of the column, $h$ is the height of the column, and $\lambda(\hat{E}) = \lambda(\cup_{i=1}^{h} L_i)$ is the measure of the column. Any column defines an algorithmically effective transformation $T$ which linearly maps $L_j$ onto $L_{j+1}$ for all $j = 1, \dots, h-1$. This transformation $T$ is not defined outside the intervals of the column and at the points of the top interval $L_h$ of this column. Denote $T^0 \omega = \omega$, $T^{i+1} \omega = T(T^i \omega)$. For any $1 \le j < h$ an arbitrary point $\omega \in L_j$ generates a finite trajectory $\omega, T\omega, \dots, T^{h-j}\omega$. A partition $\pi = (\pi_1, \dots, \pi_k)$ is compatible with a column $E$ if for each $j$ there exists an $i$ such that $L_j \subseteq \pi_i$. This number $i$ is called the name of the interval $L_j$, and the corresponding sequence of names of all intervals of the column is called the name of the column $E$. For any point $\omega \in L_j$, where $1 \le j < h$, by the $E$-name of the trajectory $\omega, T\omega, \dots, T^{h-j}\omega$ we mean the sequence of names of the intervals $L_j, \dots, L_h$ of the column $E$. The length of this sequence is $h - j + 1$.

A gadget is a finite collection of disjoint columns. The width of the gadget $w(\Upsilon)$ is the sum of the widths of its columns. A union of gadgets $\Upsilon_i$ with disjoint supports is the gadget $\Upsilon = \cup \Upsilon_i$ whose columns are the columns of all the $\Upsilon_i$. The support of the gadget $\Upsilon$ is the union $\hat{\Upsilon}$ of the supports of all its columns.
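The definitions of a column, its transformation, and the $E$-name of a trajectory can be sketched as a small data structure. This is an illustrative model only: the class `Column`, its field names, and the example column `E` are assumptions introduced here, with intervals stored by their left endpoints and a common width.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Column:
    lefts: list      # left endpoints of L_1, ..., L_h, from base to top
    width: Fraction  # common width w(E) of all intervals
    names: list      # name of each interval under a compatible partition

    def level(self, omega):
        """Return the 1-based index j with omega in L_j, or None."""
        for j, left in enumerate(self.lefts, start=1):
            if left <= omega < left + self.width:
                return j
        return None

    def T(self, omega):
        """Map L_j linearly (by translation) onto L_{j+1}; undefined on the
        top interval and outside the column."""
        j = self.level(omega)
        if j is None or j == len(self.lefts):
            raise ValueError("T is undefined at this point")
        return omega - self.lefts[j - 1] + self.lefts[j]

    def trajectory_name(self, omega):
        """E-name of the trajectory of omega: names of L_j, ..., L_h."""
        j = self.level(omega)
        if j is None:
            raise ValueError("point lies outside the column")
        return self.names[j - 1:]


# Assumed example: L_1 = [0, 1/4), L_2 = [1/2, 3/4), L_3 = [1/4, 1/2),
# named by the partition pi_0 = [0, 1/2), pi_1 = [1/2, 1).
E = Column(lefts=[Fraction(0), Fraction(1, 2), Fraction(1, 4)],
           width=Fraction(1, 4),
           names=[0, 1, 0])
```

In this example a point of the base, such as $1/8$, is carried to $5/8$ and then to $3/8$, where $T$ stops being defined, and its $E$-name has length $h - j + 1 = 3$.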
A transformation $T(\Upsilon)$ is associated with a gadget $\Upsilon$ if it is the union of the transformations defined on all columns of $\Upsilon$. With any gadget $\Upsilon$ the corresponding set of finite trajectories generated by points of its columns is associated. By the $\Upsilon$-name of a trajectory we mean its $E$-name, where $E$ is the column of $\Upsilon$ to which this trajectory corresponds. A gadget $\Upsilon$ extends a column $\Lambda$ if the support of $\Upsilon$ extends the support of $\Lambda$, the transformation $T(\Upsilon)$ extends the transformation $T(\Lambda)$, and the partition corresponding to $\Upsilon$ extends the partition corresponding to $\Lambda$.

The commonly used cutting and stacking operations will now be defined. The distribution of a gadget $\Upsilon$ with columns $E_1, \dots, E_n$ is the vector of probabilities

$$\left( \frac{w(E_1)}{w(\Upsilon)}, \dots, \frac{w(E_n)}{w(\Upsilon)} \right).$$

A gadget $\Upsilon$ is a copy of a gadget $\Lambda$ if they have the same distribution and the corresponding columns have the same partition names. A gadget $\Upsilon$ can be cut into $M$ copies of itself $\Upsilon_m$, $m = 1, \dots, M$, according to a given probability vector $(\gamma_1, \dots, \gamma_M)$ by cutting each column $E_i = (L_{i,j} : 1 \le j \le h(E_i))$ (and its intervals) into disjoint subcolumns $E_{i,m} = (L_{i,j,m} : 1 \le j \le h(E_i))$ such that $w(E_{i,m}) = w(L_{i,j,m}) = \gamma_m w(L_{i,j})$. The gadget $\Upsilon_m = \{E_{i,m} : 1 \le i \le n\}$ is called the copy of the gadget $\Upsilon$ of width $\gamma_m$. The action of the gadget transformation $T$ is not affected by the copying operation.

Another operation is the stacking of gadgets onto gadgets. At first we consider the stacking of columns onto columns and the stacking of gadgets onto columns. Let $E_1 = (L_{1,j} : 1 \le j \le h(E_1))$ and $E_2 = (L_{2,j} : 1 \le j \le h(E_2))$ be two columns of equal width whose supports are disjoint. The new column $E_1 * E_2 = (L_j : 1 \le j \le h(E_1) + h(E_2))$ is defined by $L_j = L_{1,j}$ for all $1 \le j \le h(E_1)$ and $L_j = L_{2,j-h(E_1)}$ for all $h(E_1) < j \le h(E_1) + h(E_2)$.
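The copying and column-stacking operations just defined can be sketched at the level of names and widths. The representation below is an assumption made for illustration: a column is a pair `(names, width)`, a gadget is a list of such pairs, the actual positions of the intervals are ignored, and the helper names are hypothetical.

```python
from fractions import Fraction


def distribution(gadget):
    """Vector of relative column widths w(E_i) / w(gadget)."""
    total = sum(width for _, width in gadget)
    return [width / total for _, width in gadget]


def cut(gadget, gammas):
    """Cut a gadget into len(gammas) copies; copy m keeps every column's
    name and has column widths gamma_m * w(E_i)."""
    if sum(gammas) != 1:
        raise ValueError("gammas must be a probability vector")
    return [[(names, g * width) for names, width in gadget] for g in gammas]


def stack(lower, upper):
    """Stack column `upper` on top of column `lower` (equal widths required):
    the name of the new column is the concatenation of the two names."""
    (names1, w1), (names2, w2) = lower, upper
    if w1 != w2:
        raise ValueError("columns must have equal width")
    return (names1 + names2, w1)


# Assumed example: a gadget with two columns of height 1 and width 1/4.
gadget = [((0,), Fraction(1, 4)), ((1,), Fraction(1, 4))]
copies = cut(gadget, [Fraction(1, 2), Fraction(1, 2)])
```

Each copy has the same distribution as the original gadget, and stacking the second column of the second copy on the first column of the first copy gives a column of height 2 and width $1/8$ with name $(0, 1)$.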
Let a gadget $\Upsilon$ and a column $E$ have the same width, and let their supports be disjoint. A new gadget $E * \Upsilon$ is defined as follows. Cut $E$ into subcolumns $E_i$ according to the distribution of the gadget $\Upsilon$ such that $w(E_i) = w(U_i)$, where $U_i$ is the $i$-th column of the gadget $\Upsilon$. Stack $U_i$ on top of $E_i$ to get the new column $E_i * U_i$. The new gadget consists of the columns $(E_i * U_i)$.

Let $\Upsilon$ and $\Lambda$ be two gadgets of the same width and with disjoint supports. A gadget $\Upsilon * \Lambda$ is defined as follows. Let the columns of $\Upsilon$ be $(E_i)$. Cut $\Lambda$ into copies $\Lambda_i$ such that $w(\Lambda_i) = w(E_i)$ for all $i$. After that, for each $i$ stack the gadget $\Lambda_i$ onto the column $E_i$, i.e. consider the gadget $E_i * \Lambda_i$. The new gadget is the union of the gadgets $E_i * \Lambda_i$ for all $i$. The number of columns of the gadget $\Upsilon * \Lambda$ is the product of the number of columns of $\Upsilon$ and the number of columns of $\Lambda$.

The $M$-fold independent cutting and stacking of a single gadget $\Upsilon$ is defined by cutting $\Upsilon$ into $M$ copies $\Upsilon_i$, $i = 1, \dots, M$, of equal width and successively independently cutting and stacking them to obtain $\Upsilon^{*(M)} = \Upsilon_1 * \dots * \Upsilon_M$.

Remark 1. Several examples of stationary measures constructed using the cutting and stacking method are given in [22, 23]. We use in Section 3 a construction of a sequence of gadgets defining the uniform Bernoulli distribution on the trajectories generated by them. This sequence is constructed using the following scheme. Let a partition $\pi = (\pi_0, \pi_1)$ be given. Let also $\Delta$ be a gadget such that its columns have the same width and are compatible with the partition $\pi$. Let $\lambda(\hat{\Delta} \cap \pi_0) = \lambda(\hat{\Delta} \cap \pi_1)$. Suppose that for some $M$ a gadget $\Delta'$ is constructed from the gadget $\Delta$ by means of $M$-fold independent cutting and stacking, and let $P$ be the measure on trajectories of the gadget $\Delta'$ defined by (31).
Then by the method of cutting and stacking, $P(x) = 2^{-l(x)} \lambda(\hat{\Delta})$ for the trajectory $x$ of any point from the support $\hat{\Delta}'$.

A sequence of gadgets $\{\Upsilon_m\}$ is complete if

• $\lim_{m \to \infty} w(\Upsilon_m) = 0$;
• $\lim_{m \to \infty} \lambda(\hat{\Upsilon}_m) = 1$;
• $\Upsilon_{m+1}$ extends $\Upsilon_m$ for all $m$.

Any complete sequence of gadgets $\{\Upsilon_s\}$ determines a transformation $T = T\{\Upsilon_s\}$ which is defined on the interval $[0, 1)$ almost surely. By definition $T$ preserves the measure $\lambda$. In [22] and [23] conditions sufficient for a process $T$ to be ergodic were suggested. Let a gadget $\Upsilon$ be constructed by cutting and stacking from a gadget $\Lambda$. Let $E$ be a column from $\Upsilon$ and $D$ be a column from $\Lambda$. Then $\hat{E} \cap \hat{D}$ is defined as the union of the subcolumns from $D$ of width $w(E)$ which were used for the construction of $E$. Let $0 < \epsilon < 1$. A gadget $\Lambda$ is $(1 - \epsilon)$-well-distributed in $\Upsilon$ if

$$\sum_{D \in \Lambda} \sum_{E \in \Upsilon} |\lambda(\hat{E} \cap \hat{D}) - \lambda(\hat{E}) \lambda(\hat{D})| < \epsilon. \tag{32}$$

We will use the following two lemmas.

Lemma 2 ([22], Corollary 1; [23], Theorem A.1). Let $\{\Upsilon_n\}$ be a complete sequence of gadgets such that for each $n$ the gadget $\Upsilon_n$ is $(1 - \epsilon_n)$-well-distributed in $\Upsilon_{n+1}$, where $\epsilon_n \to 0$. Then $\{\Upsilon_n\}$ defines an ergodic process.

Lemma 3 ([23], Lemma 2.2). For any $\epsilon > 0$ and any gadget $\Upsilon$ there is an $M$ such that for each $m \ge M$ the gadget $\Upsilon$ is $(1 - \epsilon)$-well-distributed in the gadget $\Upsilon^{*(m)}$ constructed from $\Upsilon$ by $m$-fold independent cutting and stacking.

References

[1] Lempel A., Ziv J. A Universal Algorithm for Sequential Data Compression // IEEE Trans. Inform. Theory. 1977. V.23. N3. P.337–343.
[2] Lempel A., Ziv J. Compression of Individual Sequences via Variable Rate Coding // IEEE Trans. Inform. Theory. 1978. V.24. N5. P.530–536.
[3] Kolmogorov A.N. The Logical Basis for Information Theory and Probability Theory // IEEE Trans. Inf. Theory. 1968. V.14. P.662–664.
[4] Uspensky V.A., Semenov A.L., Shen A.Kh.
Can an Individual Sequence of Zeros and Ones be Random? // Russian Math. Surveys. 1990. V.45. P.121–189.
[5] Li M., Vitányi P. An Introduction to Kolmogorov Complexity and Its Applications. New York: Springer–Verlag. 1997.
[6] Martin-Löf P. The Definition of Random Sequences // Inform. and Control. 1966. V.9. N6. P.602–619.
[7] Vovk V.G. The Law of the Iterated Logarithm for Random Kolmogorov, or Chaotic Sequences // SIAM Theory Probab. Applic. 1987. V.32. P.413–425.
[8] Schnorr C.P. A Unified Approach to the Definition of Random Sequences // Mathematical Systems Theory. 1971. V.5. P.246–258.
[9] V'yugin V.V. Non-robustness Property of the Individual Ergodic Theorem // Probl. Inform. Transm. 2001. V.37. P.27–39.
[10] Rogers H. Theory of Recursive Functions and Effective Computability. New York: McGraw Hill. 1967.
[11] Zvonkin A.K., Levin L.A. The Complexity of Finite Objects and the Algorithmic Concepts of Information and Randomness // Russ. Math. Surv. V.25. P.83–124.
[12] Shiryaev A.N. Probability. Berlin: Springer. 1984.
[13] Levin L.A. On the Notion of Random Sequence // Soviet Math. Dokl. V.14. P.1413–1416.
[14] Kolmogorov A.N., Uspensky V.A. Algorithms and Randomness // Theory Probab. Applic. 1987. V.32. P.389–412.
[15] Billingsley P. Ergodic Theory and Information. New York: Wiley. 1965.
[16] V'yugin V.V. Ergodic Theorems for Individual Random Sequences // Theoretical Computer Science. 1998. V.207. N4. P.343–361.
[17] Fittinghof B.M. Optimal Coding in the Case of Unknown and Changing Message Statistics // Probl. Inform. Transm. 1966. V.2. N2. P.3–11.
[18] Davisson L.D. Universal Noiseless Coding // IEEE Trans. Inform. Theory. 1973. V.19. P.783–795.
[19] Ryabko B. Prediction of Random Sequences and Universal Coding // Probl. Inform. Transm. 1988. V.24. P.3–14.
[20] Krichevsky R.E., Trofimov V.K.
The Performance of Universal Coding // IEEE Trans. Inform. Theory. 1981. V.27. N2. P.199–207.
[21] Ryabko B. Twice Universal Coding // Probl. Inform. Transm. 1984. V.20. P.173–178.
[22] Shields P.C. Cutting and Stacking: a Method for Constructing Stationary Processes // IEEE Trans. Inform. Theory. 1991. V.37. N6. P.1605–1617.
[23] Shields P.C. Two Divergence-Rate Counterexamples // J. Theoret. Probability. 1993. V.6. P.521–545.
