Depth as Randomness Deficiency

Luís Antunes∗ (Computer Science Department, University of Porto), Armando Matos† (Computer Science Department, University of Porto), André Souto‡ (Computer Science Department, University of Porto)§, Paul Vitányi¶ (CWI and Computer Science Department, University of Amsterdam)

November 21, 2018

Abstract. Depth of an object concerns a tradeoff between computation time and excess of program length over the shortest program length required to obtain the object. It gives an unconditional lower bound on the computation time from a given program in the absence of auxiliary information. Variants known as logical depth and computational depth are expressed in Kolmogorov complexity theory. We derive quantitative relations between logical depth and computational depth, and unify the different depth notions by relating them to A. Kolmogorov and L. Levin's fruitful notion of randomness deficiency. Subsequently, we revisit the computational depth of infinite strings, introducing the notion of super deep sequences and relating it to other approaches.

1 Introduction

The information contained in an individual finite object (a finite binary string) can be measured by its Kolmogorov complexity: the length of the shortest binary program that computes the object. Such a shortest program contains no redundancy: every bit is information; but is it meaningful information? If we flip a fair coin to obtain a finite binary string, then with overwhelming probability that string constitutes its own shortest description. However, also with overwhelming probability, all the bits in the string are apparently meaningless information, just random noise.

The opposite of randomness is regularity, and the effective regularities in an object can be used to compress it, giving it lower Kolmogorov complexity. Regular objects contain laws that govern their existence and have meaning. This meaning may be instantly clear, but it is also possible that it becomes intelligible only as the result of a long computation. For example, let the object in question be a book on number theory. The book will list a number of difficult theorems. However, it has very low Kolmogorov complexity, since all theorems are derivable from the initial few definitions. Our estimate of the difficulty of the book is based on the fact that it takes a long time to reproduce the book from part of the information in it. We can transmit all the information in the book by just transmitting the theorems: the receiver will have to spend a long time reconstructing the proofs and the full book. On the other hand, we can send the whole book; then the receiver has all the useful information literally and does not have to spend time extracting it. Hence there is a tradeoff: in both cases we send the same information in terms of Kolmogorov complexity, but in the former case it takes a long time to reconstruct it from a short message, and in the latter case it takes a short time to reconstruct it from a long message. The existence of such a book is itself evidence of some long evolution preceding it.

---
∗ Email: lfa@ncc.up.pt. Web: http://www.ncc.up.pt/~lfa. Address: Departamento de Ciência de Computadores, Rua Campo Alegre, 1021/1055, 4169-007 Porto, Portugal.
† Email: acm@ncc.up.pt. Web: http://www.ncc.up.pt/~acm.
‡ Email: andresouto@dcc.fc.up.pt. Web: http://www.ncc.up.pt/~andresouto.
§ The authors from the University of Porto are partially supported by KCrypt (POSC/EIA/60819/2004) and by funds granted to LIACC through the Programa de Financiamento Plurianual, Fundação para a Ciência e Tecnologia, and Programa POSI.
¶ Email: Paul.Vitanyi@cwi.nl. Web: http://homepages.cwi.nl/~paulv/.
The computational effort needed to transform the information into 'usable' information is called 'depth'. We also use a central notion of Kolmogorov complexity theory: 'randomness deficiency'. The randomness deficiency of an object with respect to a particular distribution quantifies the 'typicality' or 'randomness' of that object for that distribution. A randomness deficiency of 0 tells us that the object is typical (we believe that the object was randomly drawn from the distribution). A high randomness deficiency tells us that the object is atypical and not likely to have been randomly drawn. Finally, we consider the information in one object about another one and vice versa; since these are approximately equal, we call it 'mutual information'.

Results: For finite strings, we derive quantitative relations between the different notions of depth: logical depth and computational depth (Section 3). In Section 4 we prove that these two notions of depth are instances of a more general measure, namely Levin's randomness deficiency: computational depth is the randomness deficiency with respect to the time-bounded universal semimeasure, and logical depth is the least time for which the randomness deficiency with respect to the time-bounded a priori probability is upper bounded by the significance level.

Next, we study the information contained in infinite sequences. Applying the randomness deficiency with respect to $M \otimes M$, where $M$ is the universal lower semicomputable semimeasure over $\{0,1\}^\infty$, Levin [Lev74, Lev84] defined mutual information for infinite sequences. We observe that, despite the correctness of the definition, it does not fully achieve the desired characterization of mutual information. For example, if $\alpha = \alpha_1\alpha_2\ldots$ and $\gamma = \gamma_1\gamma_2\ldots$ are two Kolmogorov random sequences and we construct the sequence $\beta = \alpha_1\gamma_1\alpha_2\gamma_2\ldots$, then $I(\alpha : \beta) = I(\beta : \alpha) = \infty$. However, intuitively $\beta$ has more information about $\alpha$ than the other way around, since from $\beta$ we can fully reconstruct $\alpha$, but from $\alpha$ we can only recover half of $\beta$. In order to match this intuition we propose some definitions of normalized mutual information for infinite sequences. We relate this notion to constructive Hausdorff dimension, using the result proved by Mayordomo in [May02]. Namely, we show that the normalized mutual information of $\alpha$ with respect to $\beta$ is at least the ratio of the constructive Hausdorff dimensions of $\alpha$ and $\beta$, up to an additive factor that measures the difficulty of recovering the initial segments of $\alpha$ from the initial segments of the same size of $\beta$. This connection motivates the definition of dimensional mutual information for infinite sequences. This measure, contrary to the normalized mutual information, is symmetric, and it is at most the minimum of the normalized mutual information of $\alpha$ with respect to $\beta$ and vice versa.

In the last section we revisit the notion of depth for infinite sequences, proposing a new depth measure called dimensional depth. As the name suggests, this measure is related to constructive Hausdorff dimension. We prove that dimensional depth is at most the difference between the time-bounded and resource-unbounded versions of constructive Hausdorff dimension, and finally we fully characterize super deepness using our proposed measures, in a similar way as done in [JLL94].

Previous work: Bennett [Ben88] introduced the notion of logical depth of an object as the amount of time required for an algorithm to derive the object from a shorter description. Antunes et al. [AFMV06] consider logical depth as one instantiation of a more general theme, computational depth, and propose several other variants based on the difference between a resource-bounded Kolmogorov complexity measure and the unbounded Kolmogorov complexity. For infinite sequences, Bennett identified the classes of weakly and strongly deep sequences, and showed that the halting problem is strongly deep. Intuitively, a sequence is strongly deep if no computable time bound is enough to compress infinitely many of its prefixes to within a constant number of bits of their smallest representations. An interpretation of strongly deep objects is given in [LL99]: a strongly deep sequence is analogous to a great work of literature for which no number of readings suffices to exhaust its value. Subsequently, Juedes, Lathrop, and Lutz [JLL94] extended Bennett's work, defining the class of weakly useful sequences. The computational usefulness of a sequence can be measured by the class of computational problems that can be solved efficiently given access to that sequence. More formally, an infinite sequence is weakly useful if every element of a non-negligible set of decidable sequences is reducible to it in recursively bounded time. Juedes, Lathrop, and Lutz [JLL94] proved that every weakly useful sequence is strongly deep in the sense of Bennett. Later, Fenner et al. [FLMR05] proved that there exist sequences that are weakly useful but not strongly useful. Lathrop and Lutz [LL99] introduced refinements (named recursive weak depth and recursive strong depth) of Bennett's notions of weak and strong depth and studied their fundamental properties, showing that recursively weakly (resp. strongly) deep sequences form a proper subclass of the class of weakly (resp. strongly) deep sequences, and also that every weakly useful sequence is recursively strongly deep.
Levin [Lev74, Lev84] showed that the randomness deficiency of $x$ with respect to $\mu$ is, within an additive constant, the largest randomness $\mu$-test for $x$. So $\delta(x|\mu)$ is, in a sense, a universal characterization of the "non-random", "useful", or "meaningful" information in a string $x$ with respect to a probability distribution $\mu$.

2 Preliminaries

We briefly introduce some notions from Kolmogorov complexity, mainly to standardize notation. We refer to the textbook by Li and Vitányi [LV97] for more details.

Let $U$ be a fixed universal Turing machine. For technical reasons we choose one with a separate read-only input tape that is scanned from left to right without backing up, a separate work tape on which the computation takes place, and a separate output tape. Upon halting, the initial segment $p$ of the input that has been scanned is called the input "program", and the contents of the output tape is called the "output". By construction, the set of halting programs is prefix free. We call $U$ the reference universal prefix machine. In the rest of this paper we denote the $n$-length prefix of an infinite sequence $\alpha$ by $\alpha_n$ and the $i$th bit by $\alpha_i$.

Definition 2.1 (i) The (prefix) Kolmogorov complexity of a finite binary string $x$ is defined as
$$K(x) = \min_p \{|p| : U(p) = x\},$$
where $p$ is a program, and the universal a priori probability of $x$ is
$$Q_U(x) = \sum_{U(p)=x} 2^{-|p|}.$$
(ii) A time-constructible function $t$ from natural numbers to natural numbers is a function with the property that $t(n)$ can be constructed from $n$ by a Turing machine in time of order $O(t(n))$.
For every time-constructible $t$, the $t$-time-bounded Kolmogorov complexity of $x$ is defined as
$$K^t(x) = \min_p \{|p| : U(p) = x \text{ in at most } t(|x|) \text{ steps}\},$$
and the $t$-time-bounded universal a priori probability is defined as
$$Q^t_U(x) = \sum_{U^t(p)=x} 2^{-|p|},$$
where $U^t(p) = x$ means that $U$ computes $x$ in at most $t(|x|)$ steps and halts.

A different universal Turing machine may affect the program size $|p|$ by at most a constant additive term, and the running time $t$ by at most a logarithmic multiplicative factor. The same holds for all other measures we introduce.

Levin [Lev74] showed that the Kolmogorov complexity of a string $x$ coincides, up to an additive constant term, with the logarithm of $1/Q_U(x)$. This result is called the "Coding Theorem", since it shows that the shortest upper semicomputable code is a Shannon-Fano code for the greatest lower semicomputable probability mass function. In order to state the Coding Theorem formally we need the following theorem on the existence of a universal lower semicomputable discrete semimeasure (Theorem 4.3.1 in [LV97]).

Theorem 2.2 There exists a universal lower semicomputable discrete semimeasure over $\{0,1\}^*$, denoted by $\mathbf{m}$.

Theorem 2.3 (Coding Theorem) For every $x \in \{0,1\}^n$,
$$K(x) = -\log Q_U(x) = -\log \mathbf{m}(x),$$
with equality up to an additive constant $c$.

Hence, if $x$ has high probability because it has many long descriptions, then it must have a short description too.

We define the mutual information of two finite strings as
$$I(x:y) = K(x) + K(y) - K(x,y).$$
Notice that mutual information is symmetric, i.e., $I(x:y) = I(y:x)$.
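All of these quantities are uncomputable, but their flavor can be illustrated with a real compressor, which gives a computable upper bound on description length. The following sketch is purely illustrative and is not part of the paper's machinery: `zlib` stands in for the universal machine, a deliberately restricted match window stands in for a resource bound, and all function names are ours.

```python
import random
import zlib


def K_approx(x: bytes) -> int:
    """Illustrative upper bound on K(x): bit length of a zlib compression of x.
    K itself is uncomputable; a real compressor only gives an upper bound."""
    return 8 * len(zlib.compress(x, 9))


def K_bounded_approx(x: bytes, wbits: int = 9) -> int:
    """Crude analogue of a resource-bounded K^t: the same compressor restricted
    to a small match window of 2**wbits bytes, i.e. a weaker description mode."""
    c = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return 8 * len(c.compress(x) + c.flush())


def I_approx(x: bytes, y: bytes) -> int:
    """Proxy for the mutual information I(x:y) = K(x) + K(y) - K(x, y)."""
    return K_approx(x) + K_approx(y) - K_approx(x + y)


random.seed(0)
regular = b"0110100110010110" * 500                         # highly regular string
noise = bytes(random.getrandbits(8) for _ in range(8000))   # incompressible string
```

On these inputs, `K_approx(regular)` is far below `K_approx(noise)`, and `I_approx(noise, noise)` is large while the mutual-information proxy of two independent random strings is near zero, mirroring the behavior of the ideal quantities.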
3 Depth

Bennett [Ben88] defines the $b$-significant logical depth of an object $x$ as the time required by the reference universal Turing machine to generate $x$ by a program that is no more than $b$ bits longer than the shortest description of $x$. Bennett speaks of time as a number of steps; without loss of generality we consider the number of steps $t(|x|)$, where $t$ is a time-constructible function.

Definition 3.1 (Logical Depth) The logical depth of a string $x$ at significance level $b$ is
$$\mathrm{ldepth}_b(x) = \min \left\{ t(|x|) : \frac{Q^t_U(x)}{Q_U(x)} \geq 2^{-b} \right\},$$
where the minimum is taken over all time-constructible $t$.

Given a significance level $b$, the logical depth of a string $x$ is the minimal running time $t(|x|)$ such that programs running in at most $t(|x|)$ steps account for approximately a $1/2^b$ fraction of $x$'s universal probability. This is Bennett's Tentative Definition 0.3 in [Ben88], p. 240. In fact, with some probability we can derive the string by simply flipping a coin, but for long strings this probability is exceedingly small. If the string has a short description, then we can flip that description with higher probability. Bennett's proposal tries to express the tradeoff between the probability of flipping a short program and the shortest computation time from program to object.

Antunes et al. [AFMV06] developed the notion of computational depth in order to capture the tradeoff between the amount of help bits required and the reduced computation time to compute a string. The concept is simple: they consider the difference of two versions of Kolmogorov complexity measures.

Definition 3.2 (Basic Computational Depth) Let $t$ be a time-constructible function. For any finite binary string $x$ we define
$$\mathrm{depth}^t(x) = K^t(x) - K(x).$$

In Definition 1 of [Ben88], p. 241, we find:

Definition 3.3 A string $x$ is $(t(|x|), b)$-deep iff $t(|x|)$ is the least number of steps to compute $x$ from a program of length at most $K(x) + b$.

Then it is straightforward that $\mathrm{depth}^t(x) = K^t(x) - K(x)$ iff $x$ is $(t(|x|), K^t(x) - K(x))$-deep. Bennett remarks ([Ben88], p. 241): "The difference between [Definitions 3.3 and 3.1] is rather subtle philosophically and not very great quantitatively." This is followed by [Ben88] Lemma 5 on p. 241, which is an informal version of [LV97] Theorem 7.7.1. The proof of Item (ii) below uses an idea from the proof of the latter theorem.

Definition 3.4 Let $t$ be a recursive function. Define $K(t)$, the (prefix) Kolmogorov complexity of $t$, by $K(t) = \min_i \{ i : T_i \text{ computes } t(\cdot) \}$, where $T_1, T_2, \ldots$ is the standard enumeration of all Turing machines.

Theorem 3.5 Let $t$ be a time-constructible function (hence it is recursive and $K(t)$ is defined in Definition 3.4).
(i) If $b$ is the minimum value such that $\mathrm{ldepth}_b(x) = t(|x|)$, then $\mathrm{depth}^t(x) \geq b + O(1)$.
(ii) If $\mathrm{depth}^t(x) = b$, then $\mathrm{ldepth}_{b + \min\{K(b), K(t)\} + O(1)}(x) \geq t(|x|)$.

Proof. (i) Assume $\mathrm{ldepth}_b(x) = t(|x|)$. So
$$\frac{Q^t_U(x)}{Q_U(x)} \geq 2^{-b},$$
with $t(|x|)$ least. Assume furthermore that $b$ is the least integer such that the inequality holds for this $t(|x|)$. We also have
$$\frac{Q^t_U(x)}{Q_U(x)} \geq \frac{2^{-K^t(x)}}{Q_U(x)} = 2^{-(K^t(x) - K(x) - O(1))} = 2^{-b-\Delta},$$
where $b + \Delta = K^t(x) - K(x) - O(1)$. The first inequality holds since the sum $Q^t_U(x)$ comprises a term $2^{-K^t(x)}$ based on a shortest program of length $K^t(x)$ computing $x$ in at most $t(|x|)$ steps. Since $b$ is the least integer, it follows that $\Delta \geq 0$. Since $\mathrm{depth}^t(x) = K^t(x) - K(x)$, we find that $\mathrm{depth}^t(x) \geq b + O(1)$.

(ii) Assume that $\mathrm{depth}^t(x) = b$, that is, $x$ is $(t(|x|), b)$-deep.
We can enumerate the set $S$ of all programs computing $x$ in time at most $t(|x|)$ by simulating all programs of length $l \leq |x| + 2\log|x|$ for $t(|x|)$ steps. Hence the shortest such program $q$ enumerating $S$ has length $|q| \leq K(x,t) + O(1)$. But we achieve the same effect if, given $x$ and $b$, we enumerate all programs of length $l$ as above in order of increasing running time and stop when the accumulated algorithmic probability exceeds $2^{-K(x)+b}$. The running time of the last program is $t(|x|)$. (This shows that $K(t,x) \leq K(b,x) + O(1)$, not $K(t) \leq K(b) + O(1)$.) The shortest program $r$ doing this has length $|r| \leq K(x,b) + O(1)$. Hence
$$K(S) \leq \min\{K(x,t), K(x,b)\} + O(1).$$
By definition, $Q^t_U(x) = \sum_{p \in S} 2^{-|p|}$. Assume, by way of contradiction, that
$$\frac{Q^t_U(x)}{Q_U(x)} < 2^{-b - \min\{K(b), K(t)\} - O(1)}.$$
Since $Q_U(x) = 2^{-K(x) - O(1)}$, we have
$$Q^t_U(x) < 2^{-K(x) - b - \min\{K(b), K(t)\} - O(1)}.$$
Denote $m = K(x) + b + \min\{K(b), K(t)\} + O(1)$. Therefore $\sum_{p \in S} 2^{-|p|} < 2^{-m}$. Now every string in $S$ can be effectively compressed by at least $m - K(S) - O(1)$ bits. Namely,
$$\sum_{p \in S} 2^{-|p| + m} < 1.$$
The latter inequality is a Kraft inequality, and hence the elements of $S$ can be coded by a prefix code with the code word length for $p$ at most $|p| - m$. In order to make this coding effective, we use a program of length $K(S)$ to enumerate exactly the strings of $S$. This takes an additional $K(S) + O(1)$ bits in the code for each $p \in S$. In this way, each $p \in S$ is effectively compressed by $m - K(S) - O(1)$ bits. Therefore, each $p \in S$ can be compressed by at least $K(x) + b + \min\{K(b), K(t)\} - \min\{K(x,t), K(x,b)\}$ bits, up to an additive constant we can set freely, and hence by more than $b$ bits, which is a contradiction.
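The coding step used here rests on a standard fact: any lengths satisfying Kraft's inequality $\sum_l 2^{-l} \leq 1$ admit a prefix code with exactly those lengths. As a concrete aside (our own illustrative sketch, not part of the proof), the canonical construction is:

```python
def prefix_code_from_lengths(lengths):
    """Canonical prefix code: given codeword lengths satisfying Kraft's
    inequality sum(2**-l) <= 1, assign binary codewords of exactly those
    lengths such that no codeword is a prefix of another."""
    assert sum(2.0 ** -l for l in lengths) <= 1.0, "Kraft inequality violated"
    codewords, value, prev_len = [], 0, 0
    for l in sorted(lengths):
        value <<= (l - prev_len)   # extend the running value to the new length
        codewords.append(format(value, "0{}b".format(l)))
        value += 1                 # next available codeword at this length
        prev_len = l
    return codewords
```

For example, `prefix_code_from_lengths([1, 2, 3, 3])` yields `['0', '10', '110', '111']`; sorting the lengths first guarantees the increment never spills a codeword past its prescribed length as long as Kraft's inequality holds.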
Hence
$$\frac{Q^t_U(x)}{Q_U(x)} \geq 2^{-b - \min\{K(t), K(b)\} - O(1)},$$
which proves (ii). ⋄

4 A Unifying Approach

Logical depth and computational depth are both instances of a more general measure, namely the randomness deficiency of a string $x$ with respect to a probability distribution (Levin [Lev74, Lev84]). In the rest of this paper, with some abuse of notation (see [LV97]), a function $\mu : \{0,1\}^* \to \mathbb{R}$ defines a probability measure, or measure for short, if
$$\mu(\epsilon) = 1, \qquad \mu(x) = \sum_{a \in \{0,1\}} \mu(xa).$$

Definition 4.1 Let $\mu$ be a computable measure. The value
$$\delta(x|\mu) = \left\lfloor \log \frac{Q_U(x)}{\mu(x)} \right\rfloor$$
is the randomness deficiency¹ of $x$ with respect to $\mu$.

Here $Q_U$ is the universal a priori probability of Definition 2.1. Note that $Q_U(x)$ is of the exact order of magnitude of $2^{-K(x)}$ by the Coding Theorem 2.3, i.e., up to multiplicative terms $Q_U(x)$ and $2^{-K(x)}$ are equal. (In the literature, see for example [LV97], $\mathbf{m}(x) = 2^{-K(x)}$ is used instead of $Q_U(x)$; by the Coding Theorem this is equivalent up to a multiplicative independent constant.)

We now observe that the logical depth and the computational depth of a string $x$ equal the randomness deficiency of $x$ with respect to the measures $Q^t(x) = \sum_{U^t(p)=x} 2^{-|p|}$ and $2^{-K^t(x)}$, respectively. The proofs follow directly from the definitions.

Lemma 4.2 Let $x$ be a finite binary string and let $t$ be a time-constructible function.
(i) $\mathrm{ldepth}_b(x) = \min \{ t : \delta(x|Q^t) \leq b \}$.
(ii) $\mathrm{depth}^t(x) = \delta(x|\mathbf{m}^t)$, where $\mathbf{m}^t(z) = 2^{-K^t(z)}$.

5 On the information of infinite strings

Based on the unification of depth concepts for finite strings, in this section we extend those ideas to infinite sequences. In order to motivate our approach we start by introducing Levin's notion of randomness deficiency for infinite sequences.
¹ $\lfloor r \rfloor$ denotes the integer part of $r$, and $\lceil r \rceil$ denotes the smallest integer larger than $r$.

Let $M$ be the universal lower semicomputable (continuous) semimeasure over $\{0,1\}^\infty$, as defined, and proved to exist, in [Lev84] (see also [LV97]). If $\alpha \in \{0,1\}^\infty$ with $\alpha = \alpha_1\alpha_2\ldots$ and $\alpha_i \in \{0,1\}$, we write $\alpha_n = \alpha_1\alpha_2\ldots\alpha_n$. Finally, we write '$M(x)$' and '$\mu(x)$' as notational shorthand for '$M(\Gamma_x)$' and '$\mu(\Gamma_x)$', where $x \in \{0,1\}^*$ and $\Gamma_x$ is the cylinder $\{\omega : \omega \in \{x\}\{0,1\}^\infty\}$. Strictly speaking, $M$ is not over $\{0,1\}^\infty$ but over $\{0,1\}^\infty \cup \{0,1\}^*$ (see also [LV97]), and $M(x)$ is the probability concentrated on the set of finite and infinite sequences starting with $x$.

Definition 5.1 (Levin) The value
$$D(\alpha|\mu) = \left\lfloor \log \sup_n \frac{M(\alpha_n)}{\mu(\alpha_n)} \right\rfloor$$
is called the randomness deficiency of $\alpha$ with respect to the semimeasure $\mu$. Here $M(\alpha_n)$ abbreviates $M(\Gamma_{\alpha_n})$, as above.

Let $\alpha$ and $\beta$ be two sequences and let $M \otimes M$ be defined by $M \otimes M(\alpha, \beta) = M(\alpha) M(\beta)$.

Definition 5.2 (Levin) The value $I(\alpha : \beta) = D((\alpha, \beta) \mid M \otimes M)$ is called the amount of information in $\alpha$ about $\beta$, or the deficiency of their independence.

This definition is equivalent to the mutual information $I(\alpha : \beta) = \sup_n I(\alpha_n : \beta_n)$.

Example 5.3 Let $\alpha$ and $\gamma$ be two random infinite and independent sequences (in the sense that their prefixes are independent). Consider the sequence $\beta = \alpha_1\gamma_1\alpha_2\gamma_2\ldots$ By Definition 5.2 we have
$$I(\alpha : \beta) = \sup_n I(\alpha_n : \beta_n) = \sup_n \left( K(\alpha_n) + K(\beta_n) - K(\alpha_n, \beta_n) \right) \geq \sup_n \left( n + n - \left( n + \frac{n}{2} \right) \right) = \infty.$$
As $I(\beta : \alpha) = I(\alpha : \beta)$, we get $I(\beta : \alpha) = \infty$. However, intuitively $\beta$ contains more information about $\alpha$ than the other way around, since from the sequence $\beta$ we can totally reconstruct $\alpha$, but from $\alpha$ we can only recover half of $\beta$, namely the bits with odd indexes.
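The asymmetry in Example 5.3 can be made concrete on finite prefixes (the sequences in the example are infinite; this toy sketch, with names of our own choosing, only illustrates the recoverability gap):

```python
def interleave(alpha: str, gamma: str) -> str:
    """Build (a prefix of) beta = alpha_1 gamma_1 alpha_2 gamma_2 ..."""
    return "".join(a + g for a, g in zip(alpha, gamma))


def alpha_from_beta(beta: str) -> str:
    """alpha is fully determined by beta: its odd-position (0-indexed even) bits."""
    return beta[0::2]


def known_half_of_beta(beta_len: int, alpha: str) -> str:
    """From alpha alone, only every other bit of beta is known ('?' = unknown)."""
    return "".join(alpha[i // 2] if i % 2 == 0 else "?" for i in range(beta_len))
```

With `alpha = "0110"` and `gamma = "1001"`, `interleave` gives `beta = "01101001"`; `alpha_from_beta(beta)` recovers `alpha` exactly, while `known_half_of_beta(8, alpha)` leaves every bit of $\gamma$ undetermined.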
This seems to be a lacuna in Definition 5.2. The definition says more when the information is finite, but that is precisely when we do not need an accurate result: if the mutual information is finite we can argue that the sequences are essentially independent. In the infinite case, one should be able to classify the cases where the mutual information is infinite. Two infinite sequences may have infinite mutual information, and yet infinite information may still be lacking to reconstruct one of them from the other. In the previous example, $\alpha$ fails to provide all the information of $\beta$ related to $\gamma$, which has infinite information. In this section we present two approaches to reformulating the definition of "mutual information" in order to match our intuition. In order to have a proportion of information as the prefixes grow, we need to introduce some normalization in the process.

5.1 The Mutual Information Point of View

We are looking for a normalized mutual information measure $I_m$ that, applied to Example 5.3, gives
$$I_m(\alpha : \alpha) = 1; \quad I_m(\alpha : \beta) = 1/2; \quad I_m(\beta : \alpha) = 1; \quad I_m(\beta : \beta) = 1.$$
Contrary to Levin's definition of mutual information for infinite sequences, and in accordance with our intuition, the above conditions imply that the normalized version must be non-symmetric.

Definition 5.4 (First attempt) Given two infinite sequences $\alpha$ and $\beta$, the normalized mutual information that $\beta$ has about $\alpha$ is defined as
$$I_m(\beta : \alpha) = \lim_{n \to \infty} \lim_{m \to \infty} \frac{I(\beta_m : \alpha_n)}{I(\alpha_n : \alpha_n)}.$$
The major drawback of this definition is that the limit does not always exist.² However, it does exist for Example 5.3, with the desired properties.
Furthermore, for the same $\alpha$ and $\beta$ we obtain
$$I_m(\alpha : \alpha) = 1; \qquad I_m(\beta : \beta) = 1;$$
$$I_m(\alpha : \beta) = \lim_{n \to \infty} \lim_{m \to \infty} \frac{m + n - (m + n - n/2)}{n} = \frac{1}{2};$$
$$I_m(\beta : \alpha) = \lim_{n \to \infty} \lim_{m \to \infty} \frac{m + n - m}{n} = 1.$$

Definition 5.5 (Normalized mutual information for infinite sequences) Given two infinite sequences $\alpha$ and $\beta$, we define the lower normalized mutual information that $\beta$ has about $\alpha$ as
$$I_{m*}(\beta : \alpha) = \liminf_{n \to \infty} \lim_{m \to \infty} \frac{I(\beta_m : \alpha_n)}{I(\alpha_n : \alpha_n)}$$
and the upper normalized mutual information that $\beta$ has about $\alpha$ as
$$I^*_m(\beta : \alpha) = \limsup_{n \to \infty} \lim_{m \to \infty} \frac{I(\beta_m : \alpha_n)}{I(\alpha_n : \alpha_n)}.$$
Notice that these definitions also fulfill the requirements presented at the beginning of this section with respect to Example 5.3.

We can now define independence with respect to normalized mutual information:

Definition 5.6 Two sequences $\alpha$ and $\beta$ are independent if $I^*_m(\alpha : \beta) = I^*_m(\beta : \alpha) = 0$.

² Notice that there are sequences $\alpha$ for which $\lim_{n} K(\alpha_n)/n$ does not exist.

In [Lut00, Lut02], the author developed a constructive version of Hausdorff dimension. That dimension assigns to every binary sequence $\alpha$ a real number $\dim(\alpha)$ in the interval $[0,1]$. Lutz claims that the dimension of a sequence is a measure of its information density; the idea is to differentiate sequences by non-randomness degrees, namely by their dimension. Our approach is precisely to introduce a measure of the density of information that one sequence has about another, relative to the total amount of the other's information. So we differentiate non-independent sequences by their normalized mutual information. Mayordomo [May02] redefined constructive Hausdorff dimension in terms of Kolmogorov complexity.

Theorem 5.7 (Mayordomo) For every sequence $\alpha$,
$$\dim(\alpha) = \liminf_{n \to \infty} \frac{K(\alpha_n)}{n}.$$

So the connection between constructive dimension and the normalized information measure introduced here is now clear.
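The quantity $\liminf_n K(\alpha_n)/n$ is uncomputable, but its flavor can be illustrated with a compression-ratio stand-in on a single long prefix. This is our own toy proxy, not Mayordomo's result: a sequence with every other bit forced to 0 should come out near half the ratio of a fully random one, mirroring dimensions $1/2$ versus $1$.

```python
import random
import zlib


def dim_proxy(prefix: bytes) -> float:
    """Compression-ratio stand-in for K(alpha_n)/n evaluated on one prefix.
    The real dimension is a liminf over all n; one large n only illustrates it."""
    return len(zlib.compress(prefix, 9)) / len(prefix)


random.seed(1)
full_random = bytes(random.getrandbits(8) for _ in range(8192))  # 'dimension 1' analogue
half_random = bytes(b & 0xAA for b in full_random)               # every other bit forced to 0
```

On these inputs `dim_proxy(full_random)` is close to 1 (random bytes are incompressible), while `dim_proxy(half_random)` drops to roughly half, since only every other bit carries entropy.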
It is only natural to seek results about the constructive Hausdorff dimension of a sequence, knowing the dimension of another sequence and their normalized information.

Lemma 5.8 Let $\alpha$ and $\beta$ be two infinite sequences. Then
$$I^*_m(\alpha : \beta) \cdot \dim(\beta) \geq \dim(\alpha) + \liminf_{n \to \infty} \frac{-K(\alpha_n \mid \beta_n)}{n}.$$
Proof.
$$\begin{aligned}
I^*_m(\alpha : \beta) \cdot \dim(\beta) &= \limsup_{n} \lim_{m} \frac{I(\alpha_m : \beta_n)}{I(\beta_n : \beta_n)} \cdot \liminf_{n} \frac{K(\beta_n)}{n} \\
&\geq \liminf_{n} \liminf_{m} \frac{I(\alpha_m : \beta_n)}{n} \\
&\geq \liminf_{n} \liminf_{m} \frac{I(\alpha_m : \beta_n)}{m} \\
&\geq \liminf_{n} \liminf_{m} \frac{K(\alpha_m) - K(\alpha_m \mid \beta_m)}{m} \\
&\geq \liminf_{m} \frac{K(\alpha_m)}{m} + \liminf_{m} \frac{-K(\alpha_m \mid \beta_m)}{m} \\
&= \dim(\alpha) + \liminf_{m} \frac{-K(\alpha_m \mid \beta_m)}{m}. \qquad ⋄
\end{aligned}$$

Note that in the previous lemma the (unexpected) additive term $\liminf_m \frac{-K(\alpha_m \mid \beta_m)}{m}$ is necessary: it expresses the hardness of recovering $\alpha$ given $\beta$.

We now present the time-bounded version of $\dim(\alpha)$. This definition will be important later in this paper.

Definition 5.9 Let $t$ be a time-constructible function. The $t$-bounded dimension of an infinite sequence $\alpha$ is defined as
$$\dim^t(\alpha) = \liminf_{n \to \infty} \frac{K^t(\alpha_n)}{n}.$$

5.2 The Hausdorff constructive dimension point of view

In this subsection we define a version of mutual information between two sequences based on constructive Hausdorff dimension and establish a connection to it.

Definition 5.10 The dimensional mutual information of the sequences $\alpha$ and $\beta$ is defined as
$$I_{\dim}(\alpha : \beta) = \dim(\alpha) + \dim(\beta) - 2\dim\langle \alpha, \beta \rangle.$$
This measure of mutual information is symmetric. The definition counts $\dim\langle \alpha, \beta \rangle$ twice because encoding the prefixes $\alpha_n$ and $\beta_n$ yields a $2n$-length string.
Notice that
$$\begin{aligned}
I_{\dim}(\alpha : \beta) &= \dim(\alpha) + \dim(\beta) - 2\dim\langle \alpha, \beta \rangle \\
&= \liminf_{n \to \infty} \frac{K(\alpha_{n/2})}{n/2} + \liminf_{n \to \infty} \frac{K(\beta_{n/2})}{n/2} - 2\liminf_{n \to \infty} \frac{K(\langle \alpha, \beta \rangle_n)}{n} \\
&\leq \liminf_{n \to \infty} \frac{K(\alpha_{n/2}) + K(\beta_{n/2}) - K(\alpha_{n/2}, \beta_{n/2})}{n/2} \\
&= \liminf_{n \to \infty} \frac{I(\alpha_n : \beta_n)}{n} \\
&\leq \liminf_{n \to \infty} \frac{I(\alpha_n : \beta_n)}{K(\beta_n)} \\
&\leq \liminf_{n \to \infty} \lim_{m \to \infty} \frac{I(\alpha_m : \beta_n)}{K(\beta_n)} = I_{m*}(\alpha : \beta).
\end{aligned}$$
The last inequality holds due to the following fact:
$$I(\beta_n : \alpha_m) = K(\beta_n) - K(\beta_n \mid \alpha_m) \geq K(\beta_n) - K(\beta_n \mid \alpha_n) = I(\beta_n : \alpha_n).$$
By the symmetry of the definition we also have $I_{\dim}(\alpha : \beta) \leq I_{m*}(\beta : \alpha)$. These two facts prove the following lemma:

Lemma 5.11 Let $\alpha$ and $\beta$ be two sequences. Then
$$I_{\dim}(\alpha : \beta) \leq \min\left( I_{m*}(\alpha : \beta),\ I_{m*}(\beta : \alpha) \right).$$

One can easily modify the definitions introduced in this section by considering the limits as $n$ goes to the length of the string, or to the maximum length of the strings being considered. One should also notice that when $x$ and $y$ are finite strings and $K(y) \geq K(x)$, $I_{m*}(x : y)$ equals $1 - d(x,y)$, where $d(x,y)$ is the normalized information distance studied in [Li03].

6 Depth of infinite strings

In this section we revisit depth for infinite sequences. We introduce a new depth measure, prove that it is closely related to constructive Hausdorff dimension, and use it to characterize super deepness. To motivate our definitions we recall the definitions of the classes of weakly (vs. strongly) deep sequences and weakly useful (vs. strongly useful) sequences.

Definition 6.1 ([Ben88]) An infinite binary sequence $\alpha$ is defined as
- weakly deep if it is not computable in recursively bounded time from any algorithmically random infinite sequence;
- strongly deep if, at every significance level $b$ and for every recursive function $t$, all but finitely many initial segments $\alpha_n$ have logical depth exceeding $t(n)$.
Definition 6.2 ([FLMR05]) An infinite binary sequence $\alpha$ is defined as
- weakly useful if there is a computable time bound within which all the sequences in a non-measure-0 subset of the set of decidable sequences are Turing reducible to $\alpha$;
- strongly useful if there is a computable time bound within which every decidable sequence is Turing reducible to $\alpha$.

The relation between logical depth and usefulness was studied by Juedes, Lathrop and Lutz [JLL94], who defined the conditions for weak and strong usefulness and showed that every weakly useful sequence is strongly deep. This result generalizes Bennett's remark that the diagonal halting problem is strongly deep, strengthening the relation between depth and usefulness. Later, Fenner et al. [FLMR05] proved the existence of sequences that are weakly useful but not strongly useful.

Constructive Hausdorff dimension has a close connection with the information theories for infinite strings studied before; see for example [FLMR05], [Lut00], [Lut02] and [May02]. Therefore, in this section we define the dimensional computational depth of a sequence in order to study the nonrandom information in an infinite sequence.

Definition 6.3 The dimensional depth of a sequence $\alpha$ is defined as
$$\mathrm{depth}^t_{\dim}(\alpha) = \liminf_{n \to \infty} \frac{\delta(\alpha_n \mid 2^{-K^t(\alpha_n)})}{n}.$$

Lemma 6.4 $\mathrm{depth}^t_{\dim}(\alpha) \leq \dim^t(\alpha) - \dim(\alpha)$.

Proof.
$$\mathrm{depth}^t_{\dim}(\alpha) = \liminf_{n \to \infty} \frac{\delta(\alpha_n \mid 2^{-K^t(\alpha_n)})}{n} = \liminf_{n \to \infty} \frac{K^t(\alpha_n) - K(\alpha_n)}{n} \leq \dim^t(\alpha) - \dim(\alpha).$$
The last inequality holds since $\liminf_n (x_n + y_n) \leq \liminf_n x_n + \limsup_n y_n$ and, the values $K(\alpha_n)/n$ being non-negative, $\limsup_n (-K(\alpha_n)/n) = -\liminf_n K(\alpha_n)/n$. ⋄

Now, in the definition of strongly deep sequences, instead of considering a fixed significance level we consider a significance level function $s : \mathbb{N} \to \mathbb{N}$. Naturally, we want $s(n)$ to grow very slowly, so we assume for example that $s = o(n)$.
With this replacement we obtain a tighter definition, since deepness decreases as the significance level increases.

Definition 6.5 A sequence is called super deep if for every significance level function s : N → N such that s = o(n), and for every recursive function t : N → N, all but finitely many initial segments α_n have logical depth exceeding t(n).

We characterize super deep sequences using their dimensional depth via the relation established in Theorem 3.5; in fact we have

  ldepth_b(x) = t(|x|), with b minimal ⇒ depth^t(x) ≥ b + O(1).

Theorem 6.6 A sequence α is super deep if and only if depth^t_dim(α) > 0 for all recursive time bounds t.

Proof. Let α be a super deep sequence. Then for every significance level function s such that s = o(n) and every recursive function t we have that, for almost all n, ldepth_{s(n)}(α_n) > t(n), and hence depth^{t(n)}(α_n) > s(n). Now if, for some time bound g, depth^g_dim(α) = 0, then there exists a bound S with S = o(n) such that, infinitely often, depth^{g(n)}(α_n) < S(n). This is absurd, and therefore depth^t_dim(α) > 0 for all recursive time bounds t.

Conversely, if depth^t_dim(α) > 0 then there is some ε > 0 such that, for almost all n, depth^{t(n)}(α_n) > εn. This implies that ldepth_{s(n)}(α_n) > ldepth_{εn}(α_n) > t(n) for every significance function s = o(n) and almost all n. So α is super deep. ⋄

In the next theorem we present other equivalent ways to define super deepness.

Theorem 6.7 For every sequence α the following conditions are equivalent:

1. α is super deep;

2. for every recursive time bound t : N → N and every significance function g = o(n), depth^t(α_n) > g(n) for all except finitely many n;

3.
for every recursive time bound t : N → N and every significance function g = o(n), Q(α_n) ≥ 2^{g(n)} Q^t(α_n) for all except finitely many n.

Proof. [Sketch] The equivalence (1 ⇔ 2) was proved in Theorem 6.6. To show that (2 ⇔ 3), consider the following sets:

  D^t_g = {α ∈ {0,1}^∞ : depth^t(α_n) ≥ g(n) a.e.}
  D̃^t_g = {α ∈ {0,1}^∞ : Q(α_n) ≥ 2^{g(n)} Q^t(α_n) a.e.}

The proof is now an immediate consequence of the following lemma:

Lemma 6.8 (Lemma 3.5 in [JLL94]) If t is a recursive time bound then there exist constants c_1 and c_2 and a recursive time bound t_1 such that D^{t_1}_{g+c_1} ⊆ D̃^t_g and D̃^t_{g+c_2} ⊆ D^t_g. ⋄

Following the ideas used in [JLL94] to prove that every weakly useful sequence is strongly deep, we can prove that every weakly useful sequence is super deep.

Theorem 6.9 Every weakly useful sequence is super deep.

For the proof of this result we need the following lemmas:

Lemma 6.10 (Lemma 5.5 in [JLL94]) Let s : N → N be strictly increasing and time-constructible with the constant c_s as witness. For each s-time-bounded Turing machine M, there is a constant c_M that satisfies the following. Given non-decreasing functions t, g : N → N, define s*, τ, t̂, ĝ : N → N by

  s*(n) = 2^{s(⌈log n⌉)+1},
  τ(n) = t(s*(n+1) + 4 s*(n+1) + 2(n+1) c_s s(|w|) + 2n s*(n+1) s(|w|)),
  t̂(n) = c_M (1 + τ(n) ⌈log τ(n)⌉),
  ĝ(n) = g(s*(n+1)) + c_M,

where w is the binary representation of n. For all sequences α and β, if β is Turing reducible to α in time s by M and β ∈ D^{t̂}_{ĝ}, then α ∈ D^t_g.

Lemma 6.11 (Corollary 5.9 in [JLL94]) For every recursive function t : N → N and every 0 < γ < 1, the set D^t_{γn} has measure 1 in the set of recursive sequences.

Proof. [of Theorem 6.9] Let α be a weakly useful sequence.
To prove that α is super deep we show that, for every recursive time bound t and every significance level g = o(n), α ∈ D^t_g, where D^t_g is the set defined in the proof of Theorem 6.7. Since α is weakly useful, there exists a recursive time bound s (which without loss of generality we can assume increasing) such that the set DTIME^α(s) of all sequences that are Turing reducible to α in time s has positive measure in the set of recursive sequences. By Lemma 6.10, to conclude that α ∈ D^t_g it suffices to prove that there exists β ∈ D^{t̂}_{ĝ} ∩ DTIME^α(s), where t̂ and ĝ are as described in that lemma.

Fix γ ∈ ]0,1[ and consider t̃(n) = n(1 + τ(n) ⌈log τ(n)⌉), where τ is obtained from t and s as in Lemma 6.10. Since t̃ is recursive, by Lemma 6.11 the set D^{t̃}_{γn} has measure 1 in the set of all recursive sequences. Thus D^{t̃}_{γn} ∩ DTIME^α(s) has positive measure, and in particular is non-empty. As t̃(n) > t̂(n) a.e. and γn > g(n) a.e. (because g = o(n)), it follows directly from the definitions that D^{t̃}_{γn} ⊆ D^{t̂}_{ĝ}, and hence D^{t̂}_{ĝ} ∩ DTIME^α(s) ≠ ∅, as we wanted to show. ⋄

Corollary 6.12 The characteristic sequences of the halting problem and the diagonal halting problem are super deep.

Proof. In [Ben88], the author proved that the characteristic sequences of the halting problem and the diagonal halting problem are weakly useful. It then follows from Theorem 6.9 that these two sequences are super deep. ⋄

Acknowledgement

We thank Harry Buhrman, Lance Fortnow, and Ming Li for comments and suggestions.

References

[AFMV06] L. Antunes, L. Fortnow, D. van Melkebeek and N. Vinodchandran, "Computational depth: concept and applications", Theor. Comput. Sci., volume 354(3), pages 391-404, 2006.

[Ben88] C. Bennett, "Logical depth and physical complexity", in The Universal Turing Machine: A Half-Century Survey, pages 227-257, Oxford University Press, 1988.

[FLMR05] S. Fenner, J. Lutz, E. Mayordomo and P. Reardon, "Weakly useful sequences", Information and Computation, volume 197, pages 41-54, 2005.

[JLL94] D. Juedes, J. Lathrop and J. Lutz, "Computational depth and reducibility", Theoret. Comput. Sci., volume 132, pages 37-70, 1994.

[LL99] J. Lathrop and J. Lutz, "Recursive computational depth", Information and Computation, volume 153, pages 139-172, 1999.

[Lev74] L. Levin, "Laws of information conservation (nongrowth) and aspects of the foundation of probability theory", Probl. Inform. Transm., volume 10, pages 206-210, 1974.

[Lev84] L. Levin, "Randomness conservation inequalities: information and independence in mathematical theories", Information and Control, volume 61, pages 15-37, 1984.

[Li03] M. Li, X. Chen, X. Li, B. Ma and P. Vitányi, "The similarity metric", IEEE Trans. Inform. Th., 50:12 (2004), 3250-3264.

[LV97] M. Li and P. Vitányi, "An introduction to Kolmogorov complexity and its applications", Springer, 2nd edition, 1997.

[Lut00] J. Lutz, "Dimension in complexity classes", Proceedings of the 15th IEEE Conference on Computational Complexity, IEEE Computer Society Press, 2000.

[Lut02] J. Lutz, "The dimensions of individual strings and sequences", Technical Report cs.CC/0203017, ACM Computing Research Repository, 2002.

[May02] E. Mayordomo, "A Kolmogorov complexity characterization of constructive Hausdorff dimension", Information Processing Letters, volume 84, pages 1-3, 2002.
