Effective Complexity and its Relation to Logical Depth
Authors: Nihat Ay, Markus Müller, Arleta Szkoła
October 29, 2008

Abstract—Effective complexity measures the information content of the regularities of an object. It has been introduced by M. Gell-Mann and S. Lloyd to avoid some of the disadvantages of Kolmogorov complexity, also known as algorithmic information content. In this paper, we give a precise formal definition of effective complexity and rigorous proofs of its basic properties. In particular, we show that incompressible binary strings are effectively simple, and we prove the existence of strings that have effective complexity close to their lengths. Furthermore, we show that effective complexity is related to Bennett's logical depth: if the effective complexity of a string x exceeds a certain explicit threshold, then that string must have astronomically large depth; otherwise, the depth can be arbitrarily small.

Index Terms—Effective Complexity, Kolmogorov Complexity, Algorithmic Information Content, Bennett's Logical Depth, Kolmogorov Minimal Sufficient Statistics, Shannon Entropy.

I. INTRODUCTION AND MAIN RESULTS

What is complexity? A great deal of research has been performed on the question in what sense some objects are "more complicated" than others, and how this fact and its consequences can be analyzed mathematically.

One of the most well-known complexity measures is Kolmogorov complexity [1], also called algorithmic complexity or algorithmic information content. In short, the Kolmogorov complexity of some finite binary string x is the length of the shortest computer program that produces x on a universal computer. So Kolmogorov complexity quantifies how well a string can in principle be compressed.
This notion of complexity has found various interesting applications in mathematics and computer science. Despite its usefulness, Kolmogorov complexity does not capture the intuitive notion of complexity very well. For example, random strings without any regularities, say strings that are constructed bitwise by repeated tosses of a fair coin, have very large Kolmogorov complexity. But those strings are not "complex" from an intuitive point of view: they are completely random and do not carry any interesting structure at all.

Effective complexity is an attempt by M. Gell-Mann and S. Lloyd [2], [3] to define a complexity measure that is closer to the intuitive notion of complexity and overcomes the difficulties of Kolmogorov complexity. The main idea of effective complexity is to split the algorithmic information content of some string x into two parts, its random features and its regularities. Then, the effective complexity of x is defined as the algorithmic information content of the regularities alone.

In this paper, we are interested in the basic properties of effective complexity, and how it relates to other complexity measures. In particular, we give a more precise formal definition of effective complexity than has been done previously. We use this formal framework to give detailed proofs of the properties of effective complexity, and we use it to show an unexpected relation between effective complexity and Bennett's logical depth [4].

(N. Ay, M. Müller and A. Szkoła are with the Max Planck Institute for Mathematics in the Sciences, Inselstr. 22, 04103 Leipzig, Germany. E-mail: {nay,szkola}@mis.mpg.de, mueller@math.tu-berlin.de. M. Müller is also with the Institute of Mathematics 7-2, TU Berlin, Straße des 17. Juni 136, 10623 Berlin, Germany.)
Since there are now so many different complexity measures [5], our result contributes to the clarification of the interrelations within this "zoo" of complexity measures. Moreover, we hope that our more formal approach helps to find applications of effective complexity within mathematics, in a similar manner as has been done for Kolmogorov complexity.

We now describe our main results and give a synopsis of this paper:

• After some notational preliminaries in Section II, we motivate and state the main definition of effective complexity in Section III.

• In Section IV, we analyze the basic properties of effective complexity. In particular, we show in Theorem 10 that effective complexity indeed avoids the disadvantage of Kolmogorov complexity that we have explained above: random strings are effectively simple. Although the existence of effectively complex strings has been mentioned in [2], it has not been proved explicitly. Based on the notion of algorithmic statistics as studied by Gács et al. [6], we provide a formal existence proof; see Theorem 14.

• Section V contains our main result (Theorems 18 and 19), the relation between effective complexity and logical depth. In short, it states that if the effective complexity of some string exceeds a certain explicit threshold, then the time it takes to compute that string from a short description must be astronomically large. This threshold is in some sense very sharp, such that the behavior of logical depth with respect to effective complexity is comparable to that of a phase transition (cf. Fig. 2 on page 11).

• In Section VI we show how effective complexity is related to the notion of Kolmogorov minimal sufficient statistics.

• Finally, in the Appendix, we give an explicit example of a computable ensemble on the binary strings that has non-computable entropy.
This illustrates the necessity of the details of our definition in Section III.

We start by introducing notation.

II. PRELIMINARIES AND NOTATION

We denote the finite binary strings {λ, 0, 1, 00, 01, ...} by {0,1}*, where λ is the empty string, and we write ℓ(x) for the length of a binary string x ∈ {0,1}*. An ensemble E is a probability distribution on {0,1}*. All logarithms are in base 2.

We assume that the reader is familiar with the basic concepts of Kolmogorov complexity; a good reference is the book by Li and Vitányi [1]. There is a "plain" and a "prefix" version of Kolmogorov complexity, and we will use both of them in this paper. The plain Kolmogorov complexity C(x) of some string x is defined as the length of the shortest computer program that outputs x if it is given as input to a universal computer V,

C(x) := min{ℓ(p) | V(p) = x}.

Prefix Kolmogorov complexity is defined analogously, but with respect to a universal prefix computer U. A prefix computer U has the property that if U(s) is defined for some string s, then U(st) is undefined for every string t that is not the empty string. So

K(x) := min{ℓ(p) | U(p) = x}.

There are different possible choices of U and V; we fix one of them for the rest of the paper. Several variations of Kolmogorov complexity can easily be defined and will be used in this paper, for example, the complexity of a finite list of strings, or the complexity of an integer or a real number. With a few exceptions below, we will not discuss the details of the definitions here and instead refer the reader to Ref. [1]. The first exception that deserves a more detailed discussion is conditional complexity.
There are two versions of conditional complexity, a "naive" one and a more sophisticated one. The naive definition is

K(x|y) := min{ℓ(p) | p ∈ {0,1}*, U(p, y) = x},   (1)

that is, the complexity of producing string x, given string y as additional "free" information. A more sophisticated version due to Chaitin [7] reads

K*(x|y) := min{ℓ(p) | p ∈ {0,1}*, U(p, y*) = x},   (2)

that is, the complexity of producing x, given a minimal program y* for y. The advantage of K*(·|·) compared with K(·|·) is the validity of a chain rule

K(x,y) += K(y) + K*(x|y)   (3)

for all strings x and y. Here we make use of a well-known notation [6] which helps to suppress additive constants: suppose that f, g : {0,1}* → ℕ are functions on the binary strings, and there is some c ∈ ℕ, independent of the argument value, such that f(s) ≤ g(s) + c for every s ∈ {0,1}*, i.e. the inequality holds uniformly for s ∈ {0,1}*. Then we write

f(s) +< g(s)   (s ∈ {0,1}*).

We use the notation += if both +< and +> hold. Note that the "naive" form of conditional complexity as defined in (1) does not satisfy the chain rule (3). Only the weaker identity

K(x,y) +< K(y) + K(x|y)   (4)

holds in general. We will often use obvious identities like K(x) +< K(x,y) or K(x,y) += K(y,x) without explaining in detail where they come from; we again refer the reader to the book by Li and Vitányi [1].

Another important prerequisite for this paper is the definition of the prefix Kolmogorov complexity K(E) of some ensemble E. In contrast to bit strings, there are several inequivalent notions of a "description", and we can learn from Ref. [6] the lesson that it is very important to specify exactly which of them we will use. Our definition of K(E) for ensembles E is as follows.
First, a program that computes E is a computer program that expects two inputs, namely a string s ∈ {0,1}* and an integer n ∈ ℕ, and that outputs (the binary digits of) an approximation of E(s) with accuracy of at least 2^(−n). Then, our preliminary definition of K(E) is the length of the shortest program for the universal prefix computer U that computes E.

Obviously, not every ensemble E is computable: there is a continuum of string ensembles, but there are only countably many algorithms that compute ensembles. Another unexpected difficulty concerns the entropy of a computable ensemble, defined as

H(E) := −∑_{x∈{0,1}*} E(x) log E(x).

Contrary to a first naive guess, the entropy of a computable ensemble does not need to be computable; all we know for sure is that it is enumerable from below. To illustrate this, we give an explicit example of a computable ensemble with non-computable entropy in Example 22 in Appendix A. Thus, for the rest of the paper, we assume that all ensembles are computable and have computable and finite entropy H(E), unless stated otherwise.

Even when one restricts to the set of ensembles with computable entropy, the map E ↦ H(E) is not necessarily a computable function. Hence the approximate equality K(E, H(E)) += K(E) is not necessarily true uniformly in E. Thus, from now on we replace the preliminary definition of K(E) by

K(E) := K(E, H(E)),

i.e. we assume that computer programs for ensembles E additionally carry a subprogram that computes the entropy H(E).

III. DEFINITION OF EFFECTIVE COMPLEXITY

To define the notion of effective complexity, we follow the steps described in one of the original manuscripts by M. Gell-Mann and S. Lloyd [3]. First, they define the total information of an ensemble as the sum of the ensemble's entropy and complexity.
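To make the notion of a "program that computes E" concrete, here is a small sketch (ours; the ensemble is a hypothetical example chosen for illustration). The ensemble assigning probability 2^(−2n−1) to each string of length n is computable, and its entropy, here exactly 3 bits, can be approximated to any prescribed accuracy 2^(−n_acc) because the tail of the entropy sum admits an explicit bound; in general no such uniform tail bound is available, which is why H(E) need not be computable:

```python
from fractions import Fraction

def E(x: str) -> Fraction:
    """A computable ensemble on {0,1}*: each of the 2^n strings of
    length n gets probability 2^(-2n-1); these probabilities sum to 1."""
    return Fraction(1, 2 ** (2 * len(x) + 1))

def entropy_approx(n_acc: int) -> float:
    """Approximate H(E) = -sum_x E(x) log2 E(x) to accuracy 2^(-n_acc).
    Grouping strings by length L, each length contributes
    2^L * 2^(-2L-1) * (2L+1) = (2L+1) / 2^(L+1), and the tail from
    length L onward sums to exactly (2L+3) / 2^L."""
    total, L = 0.0, 0
    while (2 * L + 3) / 2 ** L > 2 ** (-n_acc):
        total += (2 * L + 1) / 2 ** (L + 1)
        L += 1
    return total

# For this particular ensemble, H(E) is exactly 3 bits.
assert abs(entropy_approx(20) - 3.0) < 2 ** -19
```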
To understand the motivation behind this definition, suppose we are given some data x (a finite binary string) which has been generated by an unknown stochastic process. We would like to make a good guess at the process that generated x, even if we only have one sample of the process. This is similar to a scientist who tries to find a (probabilistic) theory of physics, given only the present state of the universe. To make a good guess at the probability distribution or ensemble E that produced x, we make two natural assumptions:

• The explanation should be simple. In terms of Kolmogorov complexity, this means that K(E) should be small.

• The explanation should not allow all possible outcomes, but should prefer some outcomes (including x) over others. For example, the uniform distribution on a billion different possible physical theories is "simple" (i.e. K(E) is small), but it is not a "good explanation" of our physical world, because it contains a huge amount of arbitrariness. This arbitrariness can be identified with the "measure of ignorance", the entropy of E. Thus, it is natural to demand that the entropy H(E) be small.

Putting both assumptions together, it is natural to consider the sum K(E) + H(E), which is called the "total information" Σ(E). A "good theory" is then an ensemble E with small Σ(E).

Definition 1 (Total Information): For every ensemble E with entropy H(E) := −∑_{x∈{0,1}*} E(x) log E(x), we define the total information Σ(E) of E as

Σ(E) := K(E) + H(E).

Note that the total information is a real number larger than or equal to 1. If E is computable and has finite entropy, as always assumed in this paper, then Σ(E) is finite.

In the subsequent work [2] by M. Gell-Mann and S.
Lloyd, it has been pointed out that H(E) ≈ ∑_{s∈{0,1}*} E(s) K(s|E). It follows that

Σ(E) ≈ ∑_{s∈{0,1}*} E(s) K(s|E) + K(E).   (5)

This has a nice interpretation: the total information gives the average complexity of computing a string with the detour of computing the ensemble.

The next step in [3] is to explain what is meant by a string being "typical" for an ensemble. Going back to the analogy of a scientist trying to find a theory E explaining his data x, a good theory should in fact predict that the appearance of x has non-zero probability. Even more, the probability E(x) should not be too small; it should be at least as large as that of "typical" outcomes of the corresponding process.

What is the probability of a "typical" outcome of a random experiment? Suppose we toss a biased coin with probability p for heads and 1 − p =: q for tails n times, and call the resulting probability distribution E. Then it turns out that typical outcomes x have probability E(x) close to 2^(−nH), where H := −p log p − q log q, and n·H is the entropy of E. In fact, the probability that E(x) lies in between 2^(−n(H+ε)) and 2^(−n(H−ε)) for ε > 0 tends to one as n gets large. In information theory, this is called the "asymptotic equipartition property" (cf. Ref. [8]). An appropriately extended version of this result holds for a large class of stochastic processes, including ergodic processes. This motivates defining that a string x is typical for an ensemble E if its probability is not much smaller than 2^(−H(E)).

Definition 2 (δ-Typical String): Let E be an ensemble, x ∈ {0,1}* a string and δ ≥ 0. We say that x is δ-typical for E if

E(x) ≥ 2^(−H(E)(1+δ)).

We return to the scenario of the scientist who looks for good theories (ensembles E) explaining his data x.
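The asymptotic equipartition property can be checked exactly for small coin-tossing ensembles. The following sketch (our illustration; the parameter values are arbitrary) computes the exact probability mass of the typical set {x : 2^(−n(H+ε)) ≤ E(x) ≤ 2^(−n(H−ε))} by grouping sequences according to their number of heads:

```python
from math import comb, log2

def typical_mass(p: float, n: int, eps: float) -> float:
    """Exact probability that an n-toss sequence x of a p-biased coin
    satisfies 2^(-n(H+eps)) <= P(x) <= 2^(-n(H-eps)), where
    H = -p log2 p - q log2 q is the per-toss entropy."""
    q = 1 - p
    H = -p * log2(p) - q * log2(q)
    mass = 0.0
    for k in range(n + 1):                        # k = number of heads
        log_px = k * log2(p) + (n - k) * log2(q)  # log2 P(x), same for all such x
        if -n * (H + eps) <= log_px <= -n * (H - eps):
            mass += comb(n, k) * p**k * q**(n - k)
    return mass

# The typical set already carries most of the mass for moderate n,
# and its mass tends to one as n grows (the AEP).
assert typical_mass(0.3, 12, 0.2) > 0.7
assert typical_mass(0.3, 200, 0.1) > 0.95
```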
As discussed above, it is natural to look for theories with small total information Σ(E). Moreover, the theory should predict x as a "typical" outcome of the corresponding random experiment; that is, x should be δ-typical for E for some small constant δ. How small can the total information of such a theory be? The next lemma shows that the answer is "not too small".

Lemma 3 (Minimal Total Information): It holds uniformly for x ∈ {0,1}* and δ ≥ 0 that

K(x)/(1+δ) +< inf{Σ(E) | x is δ-typical for E} +< K(x).

Remark. The upper bound K(x) and the computability of E show that the set is finite, and the infimum is indeed a minimum.

Proof: Fix some δ ≥ 0 and some x ∈ {0,1}*. Clearly, x is δ-typical for the singlet distribution E_x, given by E_x(x) = 1 and E_x(x′) = 0 for every x′ ≠ x. This ensemble has entropy H(E_x) = 0. Thus, the total information Σ(E_x) equals the complexity K(E_x). We also have K(E_x) += K(x), as describing the ensemble E_x boils down to describing the string x. Furthermore, the corresponding additive constant does not depend on x or δ. It follows that inf{Σ(E)} +< K(x).

To prove the converse, suppose E is any ensemble such that x is δ-typical for E. Then we have the chain of inequalities

K(x) +< K(x, E) +< K(E) + K(x|E)
     +< K(E) + ⌈−log E(x)⌉
     ≤ K(E) + ⌈H(E)(1+δ)⌉   (6)
     ≤ Σ(E) + δH(E) + 1   (7)
     ≤ Σ(E)(1+δ) + 1.   (8)

The first two inequalities follow from general properties of prefix Kolmogorov complexity, while the third inequality is due to the upper bound

K(x|E) +< ⌈−log E(x)⌉,   (9)

which follows from coding every string x with E(x) ≠ 0 into a prefix code word of length ⌈−log E(x)⌉ (such a code exists due to the Kraft inequality). Moreover, (6) is a consequence
of δ-typicality of x for E, and (7) uses the definition of the total information Σ. ✷

The ultimate goal of effective complexity is to assign a useful complexity measure E(x) to strings x. In our analogy, this means that the scientist wants to assign a natural number to his data x saying how "complex" x is. Simply taking the Kolmogorov complexity K(x) as this value has important drawbacks: it does not at all capture the intuition that "complexity" should measure the "amount of structure" of an object. In fact, if x is uniformly random (i.e. the result of fair coin tossing), then K(x) is large, while the string possesses almost no structure at all.

The strategy of S. Lloyd and M. Gell-Mann [3] is instead to take the complexity K(E) of "the best" theory E that explains the data x. What is "the best" theory? As already discussed, a good theory should have small total information Σ(E), and the data x should be "typical" for E in the sense that the probability E(x) is not much smaller than 2^(−H(E)). Given some data x, there are always many "good theories" which satisfy these requirements. Which one is "the best"? To think about this question, it is helpful to look at a graphical representation of "good theories" and their properties, as described in [3] and depicted in Fig. 1.

[Fig. 1. The minimization domain of effective complexity. Plotted are only those ensembles E for which the fixed string x is typical. Axes: entropy H(E) and complexity K(E); the line H(E) + K(E) = K(x) and the Dirac measure δ_x on the K(E)-axis are marked.]

Suppose we plot the set of theories in the entropy-complexity plane. That is, for every computable ensemble E (with finite and computable entropy), we plot a black dot in the plane, where the x-axis labels the entropy H(E) and the y-axis labels the Kolmogorov complexity K(E).
The Kolmogorov complexity K(E) is integer-valued, and if n ∈ ℕ is small, there are only few ensembles E with K(E) = n (in fact, the number of such ensembles is upper-bounded by 2^n). Thus, there are only few black dots at small values of the y-axis. Going up the y-axis, the number of ensembles and hence the density of the black dots increases.

The total information Σ(E) is the sum of the ensemble's entropy and complexity. Thus, ensembles with constant total information correspond to lines in the plane that are parallel to the tilted line in Fig. 1.

Suppose we fix some data x and plot only those ensembles E such that x is δ-typical for E for some fixed constant δ ≥ 0. This was one of our two requirements that a "good theory" E for x should fulfill. That is, we dismiss all the ensembles for which x is not a typical realization. Then Lemma 3 tells us that all the remaining ensembles must, up to an additive constant, have total information larger than K(x)/(1+δ). Graphically, this means that all those ensembles must lie approximately to the right of the straight line with H(E) + K(E) = K(x). One of these ensembles is the Dirac measure δ_x, the ensemble with δ_x(x) = 1 and δ_x(x′) = 0 for x ≠ x′: it has Kolmogorov complexity K(δ_x) += K(x) and entropy H(δ_x) = 0, hence minimal total information. It corresponds to the circle on the y-axis at the left end of the line.

We also have discussed a second requirement for a "good theory": the total information should be as small as possible. According to Lemma 3, this means that Σ(E) should not be much larger than the Kolmogorov complexity K(x). We identify the "good" theories as those ensembles that are not too far away from the line in Fig. 1; say, we consider those ensembles as "good" that are below the dotted line with Σ(E) = K(x) + ∆.
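The coding bound (9), K(x|E) +< ⌈−log E(x)⌉, used in the proof of Lemma 3 rests on the fact that code word lengths ⌈−log₂ E(x)⌉ satisfy the Kraft inequality, so a prefix code with these lengths exists. A minimal sketch of one classical such construction, the Shannon code (truncating the binary expansion of cumulative probabilities; the example distribution is ours):

```python
import math

def shannon_code(probs: dict) -> dict:
    """Build a prefix-free code assigning each symbol a code word of
    length ceil(-log2 p): sort symbols by decreasing probability and
    take the first ceil(-log2 p) bits of the cumulative probability."""
    code, cum = {}, 0.0
    for sym, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        length = math.ceil(-math.log2(p))
        bits, frac = "", cum
        for _ in range(length):          # binary expansion of cum, truncated
            frac *= 2
            bits += "1" if frac >= 1 else "0"
            frac -= int(frac)
        code[sym] = bits
        cum += p
    return code

code = shannon_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
# Lengths are ceil(-log2 p), and no code word is a prefix of another.
assert {s: len(w) for s, w in code.items()} == {"a": 1, "b": 2, "c": 3, "d": 3}
words = list(code.values())
assert all(not w.startswith(v) for w in words for v in words if v != w)
```

Prefix-freeness holds because consecutive cumulative probabilities differ by at least 2^(−length), so the truncated expansions cannot extend one another.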
Among the remaining good theories, which one is "the best"? The convincing suggestion by M. Gell-Mann and S. Lloyd is that the best theory is the simplest theory; that is, the ensemble E with the minimal Kolmogorov complexity K(E). The complexity K(E) of this minimizing ensemble is then defined as the effective complexity of x. In other words, the effective complexity of x is defined as the smallest possible Kolmogorov complexity of any "good theory" (satisfying the two requirements) for x. This suggests the following preliminary definition (we discuss an important modification below):

Definition 4 (Effective Complexity I): Given parameters δ, ∆ ≥ 0, the effective complexity E_{δ,∆}(x) of any string x ∈ {0,1}* is defined as

E_{δ,∆}(x) := inf{K(E) | x is δ-typical for E, Σ(E) ≤ K(x) + ∆},

or as ∞ if this set is empty. We refer to the set on the right-hand side as the minimization domain of x for effective complexity, and denote it by P_{δ,∆}(x). Thus E_{δ,∆}(x) = min_{E∈P_{δ,∆}(x)} K(E).

Note that ensembles E of the minimization domain P_{δ,∆}(x) of x ∈ {0,1}* satisfy

K(x)/(1+δ) +< Σ(E) ≤ K(x) + ∆.

This notion of effective complexity is closely related to a quantity called "Kolmogorov minimal sufficient statistics". We explain this fact in more detail in Definition 20 and Lemma 21 in Section VI below.

As pointed out by M. Gell-Mann and S. Lloyd, it is often useful to extend this definition of effective complexity by imposing additional conditions ("constraints") on the ensembles that are allowed in the minimization domain. There are basically two intuitive reasons why this is useful. To understand those reasons, we go back to the scenario of a scientist looking for good theories to explain his data x.
Recall the interpretation of the minimization domain of effective complexity as the set of "good theories" for x. Reasons for considering constraints on the ensembles are:

• Given the string x, there might be certain properties of x that the scientist judges to be important. Those properties should be explained by the theories, in the sense that the properties are not just simple random coincidences, but necessary or highly probable properties of each outcome of the corresponding process. For example, suppose that a scientist wants to find good theories that explain the present state of our universe. In addition, that scientist finds it particularly important and interesting that the value of the fine structure constant is about 1/137, and would like to find theories that explain why this constant is close to that value. Then, he will only accept ensembles E that have an expectation value of this constant not too far away from 1/137.

In terms of effective complexity, this scientist views the appearance of a fine structure constant of about 1/137 as an important structural (non-random) property of x, the encoded state of our universe. Thus, he considers this property as part of the regularities of x. Effective complexity is the Kolmogorov complexity of the regularities of x; thus, this scientist tends to find a larger value of effective complexity than other scientists who consider the fine structure constant as unimportant and random.

• The scientist might have additional information on the process that actually created x. This situation is often encountered in thermodynamics. Suppose that x encodes some microscopic properties of a gas in a container that a scientist has measured.
In addition to these measurement results, the scientist typically also has information on several macroscopic observables like the temperature or the total energy in the box; at least, crude upper bounds are usually given by basic properties of the laboratory physics. Then, a "good theory" consistent with the actual physical process within the lab must obey the additional constraints given by the macroscopic observables. In terms of effective complexity, the macroscopic observables and the additional information contribute to the regularities of x and enlarge its effective complexity.

Definition 5 (Effective Complexity II): Given parameters δ, ∆ ≥ 0, the effective complexity E_{δ,∆}(x|C) of a string x constrained to a subset C of all ensembles is

E_{δ,∆}(x|C) := inf{K(E) | x is δ-typical for E, E ∈ C, Σ(E) ≤ K(x) + ∆},   (10)

or ∞ if this set is empty. The set on the right-hand side is called the constrained minimization domain of x for effective complexity and equals P_{δ,∆}(x) ∩ C according to the notation of Definition 4. Thus, E_{δ,∆}(x|C) = min_{E∈P_{δ,∆}(x)∩C} K(E).

Note that we allow C to depend on x. This makes it possible, for example, to introduce the constraint that x is an element of an ensemble of strings that all have the same length: just take C as the set of probability distributions on {0,1}^n, where n = ℓ(x).

In general, restricting the set of ensembles will increase the value of effective complexity, i.e.

C ⊆ D ⇒ E_{δ,∆}(x|C) ≥ E_{δ,∆}(x|D).

This is in agreement with our intuition. Indeed, such restrictions give a way to demand explicitly that some regularities of the considered string x appear as a consequence of regularities of the generating process. As such, they contribute to the effective complexity.
If the constrained set C or the constant ∆ is too small, such that the (constrained) minimization domain P_{δ,∆}(x) ∩ C is empty, then effective complexity is infinite, according to Definition 5. Furthermore, note that

• E_{δ,∆}(x|C) is decreasing in δ and ∆,

• if E_{δ,∆}(x|C) is finite and x ∈ {0,1}^n, then

E_{δ,∆}(x|C) ≤ K(x) + ∆ ≤ n + O(log n).   (11)

In many situations in physics, the constrained sets C appearing in the definition of E_{δ,∆}(x|C) consist of those ensembles that have expectation values of observables within certain intervals. That is, we have real-valued functions f_i, the observables, and the set of ensembles C consists of those ensembles E with

∑_{x∈{0,1}*} E(x) f_i(x) ∈ I_i,

where the sets I_i ⊂ ℝ are intervals or possibly fixed real numbers. Sometimes it even makes sense to allow different intervals I_i(E) for different ensembles E; say, the intervals may all be centered around the same fixed expectation value, but may have a width which grows with the standard deviation of E with respect to the observable f_i. This is explored in more detail in the following example.

Example 6 (Constraints and Observables): Fix some string x ∈ {0,1}*. Let M be an index set and {f_i}_{i∈M} a family of real-valued constraint functions on {0,1}* (the observables). We would like to define a constrained set of ensembles C_x with the following property: C_x shall contain all those ensembles which have expectation values of the observables {f_i} that are "not too far away from" the actual values f_i(x) of the observables evaluated on the string x. This is done in the following way:

To each observable f_i and ensemble E, associate a corresponding interval I_i(E) ⊂ ℝ. The choice of those intervals is somewhat arbitrary; we only demand that they contain the expectation values of the corresponding observables, i.e.
∑_{s∈{0,1}*} E(s) f_i(s) ∈ I_i(E)   for all i ∈ M.

Then define the constrained set of ensembles C_x by

C_x := {E | f_i(x) ∈ I_i(E) for all i ∈ M},

that is, C_x consists of those ensembles E such that the corresponding interval (centered around the corresponding expectation value) contains the "correct" value of the observable evaluated at x. The corresponding effective complexity value x ↦ E_{δ,∆}(x|C_x) has a natural interpretation as the effective complexity of x if the observable properties f_i of x are judged to be important (or are fixed as macroscopic observables). Compare the discussion before Definition 5.

To illustrate the notation introduced in the previous example, we look at the situation where we would like to define the effective complexity of strings under the constraint of fixed length. That is, suppose that we consider the length ℓ(x) of our string x as an important regularity, or that we have additional side information that the unknown random process generates strings of fixed length only. In this case, it makes sense to look at the effective complexity E_{δ,∆}(x|C_x), where

C_x := {E | ℓ(y) ≠ ℓ(x) ⇒ E(y) = 0}.   (12)

Example 7 (Fixed Length Constraint): Consider the effective complexity notion E_{δ,∆}(x|C_x) as explained above. Instead of using Equation (12), we can also use the notation of Example 6: we have only one constraint, so the index set M satisfies #M = 1, for example M = {1}. Our observable f_1 is then the characteristic function on the strings of length ℓ(x), that is,

f_1(s) := 1 if ℓ(s) = ℓ(x), and 0 otherwise.

To every ensemble E, we associate an interval I_1(E) which consists only of the single real number that equals the corresponding expectation value of f_1, i.e.
I_1(E) = { ∑_{s∈{0,1}*} E(s) f_1(s) } = { ∑_{ℓ(s)=ℓ(x)} E(s) } ⊂ ℝ.

It is then easy to see that the set C_x defined in Example 6 above equals the set in Equation (12).

Due to linearity, the constrained sets C_x in Example 6 are always convex. This property, together with a computability condition, will be useful in the following.

Definition 8 (Convex and Decidable Constraints): A set C of ensembles on the binary strings is called

• convex, if for every finite set of ensembles {E_i}_i ⊆ C, every computable convex combination ∑_i λ_i E_i with λ_i ∈ (0,1) and ∑_i λ_i = 1 is also in C,

• decidable, if there exists some algorithm that, given some string x ∈ {0,1}* as input, decides in finite time whether the Dirac measure on x is an element of C or not, that is, whether the measure

δ_x(t) := 1 if t = x, and 0 if t ≠ x,

satisfies δ_x ∈ C or not. In this sense, we may define K(C) as the length of the shortest computer program that computes the corresponding decision function.

We proceed by analyzing some basic properties of effective complexity.

IV. BASIC PROPERTIES OF EFFECTIVE COMPLEXITY

We have remarked that the effective complexity E_{δ,∆}(x|C) can be infinite, for example, if the constant ∆ is too small or the constrained set of ensembles C is too restrictive, such that the minimization domain satisfies P_{δ,∆}(x) ∩ C = ∅. Thus, we start by proving a simple sufficient condition that guarantees that effective complexity is finite.

Lemma 9 (Finiteness of Effective Complexity): There is a constant m ∈ ℕ such that

E_{δ,∆}(x|C) ≤ K(x) + ∆ < ∞

for all strings x with Dirac measure δ_x ∈ C, δ ≥ 0 and ∆ ≥ m.

Proof: Due to (11), we only have to prove that E_{δ,∆}(x|C) is finite.
According to Definition 5, it remains to prove that P_{δ,Δ}(x) ∩ C ≠ ∅ under the conditions given above, where P_{δ,Δ}(x) is the minimization domain of x. To this end, we show that δ_x ∈ P_{δ,Δ}(x). This follows from
• δ_x(x) = 1 = 2^{−H(δ_x)(1+δ)} for every δ ≥ 0, so x is δ-typical for δ_x,
• Σ(δ_x) = H(δ_x) + K(δ_x) = K(δ_x) ≤ K(x) + m, where m ∈ ℕ is a constant. ✷
Our first result resembles the example on p. 51 in [3]. Suppose that we have a random binary string x of length n, say a string which has been determined by tossing a fair coin n times. The Kolmogorov complexity of such a string typically satisfies K(x) ≈ n; that is, structureless random strings have maximal Kolmogorov complexity. This was one of the reasons for S. Lloyd's and M. Gell-Mann's criticism of Kolmogorov complexity and for their attempt to define effective complexity as a more useful and intuitive complexity measure. The following theorem proves that random strings indeed have small effective complexity, which supports the point of view that effective complexity measures only the complexity of the non-random structure of a string.
Before we state that theorem, we have to explain in detail what we mean by a "random" string. It is well known that most strings are incompressible, which is what we mean by "random" at this point. In more detail, if r ∈ ℕ is some fixed parameter, the number of strings x of length n with prefix complexity K(x) ≤ n + K(n) − r does not exceed 2^{n−r+O(1)} (cf. [1, Thm. 3.3.1]). That is, most strings are incompressible in the sense that K(x) ≥ n + K(n) − r. We call such strings r-incompressible.
Theorem 10 (Incompressible Strings are Simple): There exists some global constant c ∈ ℕ such that

E_{δ,Δ}(x) ≤ log n + O(log log n)    (13)

for all r-incompressible strings x of length n, δ ≥ 0 and Δ ≥ r + c.
Moreover, if C is a convex and decidable constrained set of ensembles, then for all r-incompressible strings x of length n with the property that the Dirac measure δ_x ∈ C, we have

E_{δ,Δ}(x|C) ≤ log n + O(log log n) + K(C)

whenever δ ≥ 0 and Δ ≥ r + c + K(C). Note that the (log log n)-term does not depend on C.
Proof: With a suitable choice of c, the first part of the theorem is a special case of the second part (with C := the set of all ensembles), so it is sufficient to prove the second part. Suppose that C and x satisfy the conditions of the theorem. Let E_{x|C} be the uniform distribution on

M_{x|C} := { t ∈ {0,1}* | ℓ(t) = ℓ(x), δ_t ∈ C }.

Then H(E_{x|C}) = log #M_{x|C} ≤ ℓ(x), and

E_{x|C}(x) = 1/#M_{x|C} = 2^{−H(E_{x|C})},

so x is δ-typical for E_{x|C}. Moreover, K(E_{x|C}) <⁺ K(ℓ(x)) + K(C). Thus,

Σ(E_{x|C}) <⁺ ℓ(x) + K(ℓ(x)) + K(C).

The strings x which are r-incompressible satisfy by definition K(x) ≥ ℓ(x) + K(ℓ(x)) − r. Denoting by r(x) the corresponding degree of incompressibility of x gives

Σ(E_{x|C}) <⁺ K(x) + r(x) + K(C),

i.e. there is a global constant c ∈ ℕ such that Σ(E_{x|C}) ≤ K(x) + r(x) + K(C) + c. Now if Δ ≥ r(x) + K(C) + c, it follows from Definition 4 and E_{x|C} ∈ C that

E_{δ,Δ}(x|C) ≤ K(E_{x|C}) ≤ K(ℓ(x)) + K(C) + O(1) ≤ log ℓ(x) + O(log log ℓ(x)) + K(C). ✷

Note that according to the theorem above, every string x ∈ {0,1}* becomes effectively simple if Δ is large enough. Indeed, for every δ ≥ 0, relation (13) is satisfied by x if Δ is larger than the x-dependent threshold

Δ_max(x) := ℓ(x) + log ℓ(x) + c − K(x).

(Here c is the global constant appearing in Theorem 10.)
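The counting argument behind r-incompressibility, stated before Theorem 10, can be checked concretely. The following sketch uses the plain-complexity toy version (the paper's prefix version differs by O(1) terms): there are fewer than 2^{n−r} programs of length below n − r, so fewer than a 2^{−r} fraction of all n-bit strings can be compressed by r or more bits.

```python
def max_compressible_fraction(n: int, r: int) -> float:
    """Upper bound on the fraction of n-bit strings that have a
    description of length < n - r: there are at most
    2^0 + 2^1 + ... + 2^(n-r-1) = 2^(n-r) - 1 candidate programs."""
    num_short_programs = sum(2**i for i in range(n - r))
    return num_short_programs / 2**n

# Fewer than a 2^-r fraction of all n-bit strings can be compressed
# by r or more bits, so most strings are r-incompressible.
for n in (16, 32):
    for r in (1, 4, 8):
        assert max_compressible_fraction(n, r) < 2**-r
```

This is exactly why "random" and "incompressible" coincide for the overwhelming majority of strings: the supply of short programs runs out exponentially fast.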
For strings x of fixed length n, one can give an n-dependent threshold Δ_max(n) := n + c such that

E_{δ,Δ}(x) <⁺ log n + O(log log n) if Δ ≥ Δ_max(n).

On the other hand, due to Lemma 9, to ensure E_{δ,Δ}(x) < ∞ for x ∈ {0,1}*, one should not choose Δ too small, namely Δ ≥ m, where m was a global constant depending only on the reference universal computer. These simple observations show that a discussion of the dependence of the effective complexity of arbitrary but fixed strings x ∈ {0,1}* on the parameter Δ should be useful for a deeper understanding of the concept. In a forthcoming paper [9], we investigate in more detail the behavior of the effective complexity of long strings generated by stochastic processes for different choices of Δ = Δ(n). For the rest of this paper, we assume Δ to be a fixed constant (not depending on n or x) that is larger than the aforementioned constant m.
In what follows we strengthen the result in Theorem 10 in a way that will be interesting later in Section V:
Corollary 11: There exists some global constant c ∈ ℕ such that uniformly

E_{δ,Δ}(x) <⁺ K(C(x)) + r

for all r-incompressible strings x, δ ≥ 0 and Δ ≥ r + c. If C is a convex and decidable set of ensembles, then for all x that additionally satisfy δ_x ∈ C we have

E_{δ,Δ}(x|C) <⁺ K(C(x)) + r + K(C)

whenever Δ ≥ r + c + K(C).
Proof: It follows from the proof of Theorem 10 that E_{δ,Δ}(x|C) <⁺ K(ℓ(x)) + K(C). According to the definition of r-incompressibility, we also have K(ℓ(x)) ≤ K(x) − ℓ(x) + r, thus

E_{δ,Δ}(x|C) <⁺ K(x) − ℓ(x) + r + K(C).

Moreover, it holds [1] that K(x) <⁺ C(x) + K(C(x)) and C(x) <⁺ ℓ(x). Combining these inequalities gives K(x) − ℓ(x) <⁺ K(C(x)), and the claim follows. ✷
The fact that incompressible strings have small effective complexity, and most strings are incompressible, raises the question whether there exists any string with large effective complexity at all.
Fortunately, the answer is "yes"; otherwise, the notion of effective complexity would be an empty concept. On the one hand, we might drop a requirement posed in Theorem 10. There, we restricted ourselves to constrained sets C of ensembles that contain the Dirac measure δ_x, which basically means that the string x should itself fulfill the constraints that are used to define C. The effective complexity of strings x that do not meet this requirement might possibly be large. On the other hand, even among strings that fulfill this requirement, there are still strings that are effectively complex, namely those strings that are called "non-stochastic" in the context of algorithmic and Kolmogorov minimal sufficient statistics [6].
Suppose we have a finite subset S ⊂ {0,1}* of the finite binary strings. There are elements x ∈ S that are easy to specify once the set S is given, in the sense that K(x|S) is small. For example, the smallest element of S in lexicographical order has very small conditional complexity K(x|S). We call such elements atypical. On the other hand, most of the elements of S will not be special in such a way, so that we can specify them only by giving their "index" within the set S, which takes about log #S bits. Thus, most elements x ∈ S will have

K(x|S) >⁺ log #S.

We call such elements typical for S. There is a lemma by Gács, Tromp, and Vitányi [6], stating that there exist strings which are atypical for every simple set S. They are called non-stochastic:
Lemma 12 ([6, Thm. IV.2]): There are constants c_1, c_2 ∈ ℕ such that the following holds true: Suppose n ∈ ℕ is fixed. For every k < n, there is some string x ∈ {0,1}^n with K(x|n) ≤ k, such that

log #S − K(x|S, K(S)) > n − k − c_2

for every S ∋ x with K(S) < k − c_1.
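The "index within S" description used above can be made concrete. The following sketch (the helper name `index_description` is ours, purely for illustration) shows that once S is known, any element of S, typical or not, can be specified with about log₂ #S bits by its position in a canonical enumeration of S; for typical elements, no substantially shorter description exists.

```python
from math import ceil, log2

def index_description(S, x):
    """Describe x ∈ S by its index in a canonical (here: sorted)
    enumeration of S; this takes ceil(log2(#S)) bits once S is known."""
    members = sorted(S)
    idx = members.index(x)
    bits = max(1, ceil(log2(len(S))))
    return format(idx, f"0{bits}b")

S = {"0110", "1010", "1111", "0001", "1000"}
for x in S:
    d = index_description(S, x)
    assert len(d) == ceil(log2(len(S)))   # about log2(#S) bits suffice
    assert sorted(S)[int(d, 2)] == x      # the description recovers x
```

This makes K(x|S) <⁺ log #S plausible for every x ∈ S; the content of the typicality bound is the converse direction.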
We want to use this result to prove that for every n, there are binary strings of length n that have effective complexity of about n. Therefore, we have to show that basically everything we do with ensembles of strings can as well be accomplished with equidistributions on sets:
Lemma 13 (Ensembles and Sets): Let x ∈ {0,1}* and δ, Δ ≥ 0 be arbitrary, and let C be a decidable and convex set of constraints such that δ_x ∈ C. Moreover, let D be an arbitrary set of constraints such that the effective complexity E_{δ,Δ}(x|D) is finite, and let E be the corresponding minimizing ensemble. Then, for every ε > 0, there is a set S ⊂ {0,1}* containing x with

log #S ≤ H(E)(1+δ) + ε,    (14)
K(S) ≤ K(E) + c + K(δ) + K(ε) + K(C),    (15)

such that the equidistribution on S is in C, where c ∈ ℕ is a global constant.
Remark. The most interesting case is D = C, showing that the minimizing ensembles in the definition of effective complexity can "almost" (up to the additive terms above) be chosen to be equidistributions on sets, even in the presence of decidable and convex constraints.
Proof: The minimizing ensemble E in the definition of E_{δ,Δ}(x|D) has the following properties:

E(x) ≥ 2^{−H(E)(1+δ)},    K(E) + H(E) ≤ K(x) + Δ.    (16)

We would like to write a computer program that, given a description of E, computes a list of all strings y ∈ {0,1}* that satisfy the constraints C and the inequality E(y) ≥ 2^{−H(E)(1+δ)}. Such a program could search through all strings y, decide for every y whether this inequality and the constraints hold for y, and do this until the sum of the probabilities of all the previously listed elements exceeds 1 − 2^{−H(E)(1+δ)}. But there is a numerical problem: it is in general impossible for the program to decide with certainty whether this inequality holds, because of the unavoidable numerical error.
Instead, we can construct a computer program that computes a set S ⊂ {0,1}* with the following weaker properties:

y ∈ S ⇒ E(y) ≥ 2^{−H(E)(1+δ)−ε} and y satisfies C,
E(y) ≥ 2^{−H(E)(1+δ)} and y satisfies C ⇒ y ∈ S.

That is, the program computes a set S which definitely contains all strings y with E(y) ≥ 2^{−H(E)(1+δ)} that satisfy the constraints, but it may also contain strings which slightly violate this inequality, as long as they still satisfy the constraints. However, the numerical methods are chosen accurately enough that every element of S is guaranteed to have probability at least 2^{−H(E)(1+δ)−ε}.
This set S contains x and has the desired properties (14) and (15). This can be seen as follows. By definition, E(x) ≥ 2^{−H(E)(1+δ)}, so x ∈ S. Since every element y ∈ S has probability E(y) ≥ 2^{−H(E)(1+δ)−ε}, it holds #S ≤ 2^{H(E)(1+δ)+ε}. Finally, the description length of the corresponding computer program can be estimated via K(S|E) ≤ c + K(δ) + K(ε) + K(C). ✷
Now we are ready to prove the existence of effectively complex strings. First, what should we expect from "effectively complex" strings: how large could the effective complexity E_{δ,Δ}(x) of some string x of length n possibly be? If E is the minimizing ensemble in the definition of E_{δ,Δ}(x), then

E_{δ,Δ}(x) = K(E) ≤ Σ(E) ≤ K(x) + Δ ≤ n + K(n) + O(1).

Thus, the best result we can hope for is the existence of strings of length n that have effective complexity close to n. The next theorem shows exactly this.
Theorem 14 (Effectively Complex Strings): For every δ, Δ ≥ 0 and n ∈ ℕ, there is a string x of length n such that

E_{δ,Δ}(x) ≥ (1−δ)n − (1+2δ) log n − O(log log n).

As effective complexity is increased if constraints are added, the same statement is true for E_{δ,Δ}(x|C) if C is an arbitrary constrained set of ensembles.
Remark. An explicit lower bound is

E_{δ,Δ}(x) ≥ (1−δ)n − (1+2δ) log n − 2 log log n − Δ(4+δ) − 5K(δ) − ω,

where ω ∈ ℕ is a global constant.
Proof: Fix Δ ≥ 0, δ ∈ (0,1), and x ∈ {0,1}^n. Let E_x be the minimizing ensemble associated to E_{δ,Δ}(x), i.e.

K(E_x) = E_{δ,Δ}(x).    (17)

Choose ε > 0 arbitrary. According to Lemma 13, there is a set S_x ⊂ {0,1}* such that x ∈ S_x and

log #S_x ≤ H(E_x)(1+δ) + ε,    (18)
K(S_x) ≤ K(E_x) + c,    (19)

where c ∈ ℕ is the sum of a global constant and K(δ) and K(ε). Let c̃ be the best constant for our universal computer U such that

K(s) ≤ K(s|t) + K(t) + c̃ for all s, t ∈ {0,1}*

and at the same time

K(s|t) ≤ K(s|t,u) + K(u) + c̃ for all s, t, u ∈ {0,1}*.

Using (16), (18) and (19), we conclude that x is almost typical for S_x:

K(x|S_x) ≥ K(x) − K(S_x) − c̃
≥ K(E_x) + H(E_x) − Δ − K(S_x) − c̃
≥ K(S_x) − c + (log #S_x)/(1+δ) − ε/(1+δ) − Δ − K(S_x) − c̃
≥ (log #S_x)/(1+δ) − c − ε − Δ − c̃.

It also follows that

K(x|S_x, K(S_x)) ≥ K(x|S_x) − K(K(S_x)) − c̃    (20)
≥ (log #S_x)/(1+δ) − K(K(S_x)) − c − ε − Δ − 2c̃.

Now we get rid of the term K(K(S_x)). First note that (19), (17) and (11) yield

K(S_x) ≤ K(E_x) + c = E_{δ,Δ}(x) + c ≤ K(x) + Δ + c ≤ n + 2 log n + γ + Δ + c,

where γ ∈ ℕ is some constant such that K(s) ≤ ℓ(s) + 2 log ℓ(s) + γ for every s ∈ {0,1}*, and K(k) ≤ log k + 2 log log k + γ for every k ∈ ℕ. By elementary analysis, it holds that log(a+b) ≤ log a + b/a if a, b > 0. Hence there is some constant κ > 0 which does not depend on n, δ, Δ, or x, such that for n ≥ 2,

log K(S_x) ≤ log n + c + Δ + κ.
Using the same argument with K(K(S_x)) ≤ log K(S_x) + 2 log log K(S_x) + γ, we get for all n ≥ 2

K(K(S_x)) ≤ log n + 2 log log n + 3c + 3Δ + 3κ + γ.

Going back to (20), it follows that

K(x|S_x, K(S_x)) ≥ (log #S_x)/(1+δ) − log n − 2 log log n − Λ,    (21)

where

Λ := 4c + 4Δ + 3κ + γ + ε + 2c̃.    (22)

Note that x was arbitrary, so this equation is valid for every x ∈ {0,1}^n. Let now K_n := max{ K(t) | t ∈ {0,1}^n }, and

k := n − ⌈δ(K_n + Δ + ε) + log n + 2 log log n⌉ − Λ − c_2,

where c_2 is the constant from Lemma 12. If n is large enough, then 0 < k < n holds, and Lemma 12 applies: there is a string x* ∈ {0,1}^n such that

K(x*|S, K(S)) < log #S − n + k + c_2

for every set S ∋ x* with K(S) < k − c_1, where c_1 is another global constant. First note the following inequality:

−δ(K_n + Δ + ε) ≤ −δ(K(x*) + Δ + ε)
≤ −δ(H(E_{x*}) + ε)
≤ −δ(H(E_{x*}) + ε/(1+δ))
= −(δ/(1+δ)) (H(E_{x*})(1+δ) + ε)
≤ (1/(1+δ) − 1) log #S_{x*}.    (23)

Now suppose that K(S_{x*}) < k − c_1. Consequently,

K(x*|S_{x*}, K(S_{x*})) < log #S_{x*} − n + k + c_2
≤ log #S_{x*} − δ(K_n + Δ + ε) − log n − 2 log log n − Λ
≤ (log #S_{x*})/(1+δ) − log n − 2 log log n − Λ.

But this is a contradiction to (21). Hence our assumption must be false, and we must instead have K(S_{x*}) ≥ k − c_1. Thus, using (17), (19), and K_n ≤ n + 2 log n + γ,

E_{δ,Δ}(x*) = K(E_{x*}) ≥ K(S_{x*}) − c ≥ k − c_1 − c
≥ n − δ(n + 2 log n + γ + Δ + ε) − log n − 2 log log n − Λ − c_2 − 1. ✷

Applying relation (11) to the case of the effectively complex strings x* constructed here, we obtain a lower bound on K(x*):

(1−δ)n − (1+2δ) log n − 2 log log n − θ ≤ K(x*),    (24)

where θ = Δ(5+δ) + 5K(δ) + ω. For large n, where the constant θ becomes negligible, this bound is non-trivial and, in particular for δ = 0, remarkably close to the maximal value n + K(n).
On the other hand, from the previous proof and Lemma 12 we deduce the following upper bound on the complexity of x*:

K(x*) ≤ (1−δ)n + K(n) − log n − 2 log log n − Λ + ω̃,

where ω̃ is a global constant. This implies the following relation between the degree of incompressibility r(x*) and the constant Δ:

r(x*) ≥ δn + log n + 2 log log n + Λ − ω̃ ≥ Λ ≥ 4Δ,

where the second inequality holds for sufficiently large n and the last one uses the definition (22) of Λ. Indeed, this relation prevents x* from falling into the domain of Theorem 10, which would force it to have small effective complexity.
Note that all effectively complex strings must, as long as effective complexity is finite, have large Kolmogorov complexity, too. This follows from (11).

V. EFFECTIVE COMPLEXITY AND LOGICAL DEPTH

In this section, we show that effectively complex strings have very large computation times. In more detail, it takes a universal computer an astronomically large amount of time to compute such a string from its minimal program, or from an almost minimal program.
The time it takes to compute a string from its minimal program is discussed by C. Bennett [4] in the context of "logical depth". The notion of logical depth formalizes the idea that some strings are more difficult to construct than others (say, a string describing a proof of the Riemann hypothesis is harder to construct than a uniformly random string). However, the computation time of a string's minimal program is not a very stable measure of this difficulty: there might be programs that are "almost minimal", i.e. only a few bits longer, but run much faster than the minimal program. Thus, logical depth is defined as the shortest time to compute a string from its almost-minimal program.
"Almost-minimal" means that the program is only a few bits longer than the minimal program, and the maximum length overhead is called the "significance level".
Definition 15 ([4], Depth of Finite Strings): Let x ∈ {0,1}* be a string and z ∈ ℕ_0 a parameter. The string's logical depth at significance level z, denoted Depth_z(x), is defined as

Depth_z(x) := min{ T(p) | ℓ(p) − C(x) ≤ z, V(p) = x },

where T(p) denotes the halting time of the universal computer V on input p.
Note that we have used plain Kolmogorov complexity C here; that is, the universal computer V is not assumed to be prefix-free, in contrast to the original definition [4]. As plain and prefix Kolmogorov complexity are closely related, this modification will not result in large quantitative changes. However, for us it has important technical advantages, as we will see below.
Logical depth is sometimes defined in a different manner, with reference to algorithmic probability, cf. [1, Def. 7.7.1]. However, if computation times are large (which will be the case in our context, cf. the remark after Theorem 18), then those different definitions are essentially equivalent [1, Claim 7.7.1].
Clearly, it takes a computer (i.e. a Turing machine) at least ℓ(x) steps to print a string x on its tape. Thus, the depth of a string must be lower-bounded by its length: Depth_z(x) >⁺ ℓ(x) for every z. Following [1], strings that have a depth almost as small as possible, i.e. Depth_z(x) <⁺ ℓ(x), will be called shallow.
The notion of depth depends on the choice of the universal reference computer, but not too much.
As explained in [1], there is a universal 2-tape Turing machine that simulates t steps of an arbitrary k-tape Turing machine in time c · t log t, where c is a constant that only depends on the machine. In particular, this Turing machine can implement obvious tasks like copying n bits from one tape to the other in time n. Therefore, we will fix this "Hennie–Stearns machine" (cf. [1, 6.13]) as our universal reference machine for this section.
To state the first example, a string x will be called m-random if C(x) ≥ ℓ(x) − m.
Example 16 ([1, Ex. 7.7.3]): Random strings are shallow. That is, there are constants β, γ ∈ ℕ such that for every m-random string x it holds that

Depth_{m+β}(x) ≤ ℓ(x) + γ.

As Depth_z is decreasing in z, it also follows that Depth_z(x) ≤ ℓ(x) + γ for all z ≥ m + β.
Proof: There is always a computer program p of length ℓ(x) + β that sequentially lists the bits of x, producing V(p) = x in the most trivial way. Such a program will have a running time of ℓ(x), plus potentially some overhead γ resulting from initialization. If ℓ(x) ≤ C(x) + m, then ℓ(p) ≤ C(x) + m + β, and the claim follows from the definition of depth. ✷
It will be interesting in the following that this conclusion carries over to "most" strings that are r-incompressible as defined in Theorem 10. This will be proved in the next lemma. Moreover, we give a technical result which will be useful below. Note that

K(x) <⁺ C(x) + K(C(x))    (x ∈ {0,1}*),

and we will be interested in strings for which some kind of converse holds. For this purpose, we say that a string x is k-well-behaved for some k ∈ ℕ if

K(x) + k ≥ C(x) + K(C(x)).

In fact, it is stated in [1] that most strings are k-well-behaved if k is large enough. The next lemma shows that incompressible random strings are well-behaved.
Lemma 17 (Incompressible and Shallow Strings): For every n, there are at least 2^n (1 − 2^{−r+c} − 2^{−m}) strings of length n that are r-incompressible and m-random, where c ∈ ℕ is a constant. They satisfy

Depth_{m+β}(x) ≤ ℓ(x) + γ,

where β, γ ∈ ℕ are constants. Moreover, those strings are k-well-behaved, where k ∈ ℕ is a constant that only depends on r and m, but not on x.
Proof: Recall two basic incompressibility facts that are listed in [1]:
• There is a constant c such that there are at least 2^n (1 − 2^{−r+c}) strings of length n which are r-incompressible.
• For every m ∈ ℕ, there are at least 2^n (1 − 2^{−m}) strings x of length n with C(x) ≥ n − m (we call such strings "m-random").
A simple application of the union bound shows that there are at least 2^n (1 − 2^{−r+c} − 2^{−m}) strings of length n that are at the same time r-incompressible and m-random. The upper bound on the logical depth then follows from Example 16.
If x is m-random, then C(x) ≥ ℓ(x) − m, and a converse inequality holds trivially. The well-known continuity property [1] of K then guarantees the existence of a constant l ∈ ℕ (that depends only on m but not on x) such that K(C(x)) ≤ K(ℓ(x)) + l. Moreover, if p ∈ ℕ is a constant such that C(s) ≤ ℓ(s) + p for all strings s, the r-incompressibility property yields

K(x) ≥ ℓ(x) + K(ℓ(x)) − r ≥ C(x) − p + K(C(x)) − l − r.

It follows that x is (p+l+r)-well-behaved. ✷
We will now show that effectively complex strings must have very large logical depth if they are well-behaved. This is in contrast to the fact that incompressible strings (which have small effective complexity according to Theorem 10) are always shallow, that is, have very small depth, as shown in Lemma 17.
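Definition 15 and Example 16 can be illustrated on a toy "computer" given as an explicit, invented program table (this is not a real universal machine; all program strings, outputs, and running times below are made up for illustration). The point is the instability that motivates the significance level z: the shortest program for a string may be very slow, while a program only a few bits longer halts quickly.

```python
# Toy program table: maps a program p to (output V(p), halting time T(p)).
PROGRAMS = {
    "0":      ("0101", 1_000_000),  # minimal program, but very slow
    "110":    ("0101", 12),         # 2 bits longer, runs fast
    "10":     ("1111", 4),
    "111000": ("0101", 5),
}

def C(x):
    """Plain complexity of x relative to the toy table:
    length of the shortest program producing x."""
    return min(len(p) for p, (out, _) in PROGRAMS.items() if out == x)

def depth(x, z):
    """Depth_z(x): minimal halting time over programs for x that are
    at most z bits longer than a minimal program."""
    return min(t for p, (out, t) in PROGRAMS.items()
               if out == x and len(p) - C(x) <= z)

assert C("0101") == 1
assert depth("0101", 0) == 1_000_000  # only the minimal program qualifies
assert depth("0101", 2) == 12         # a slightly longer program is much faster
assert depth("0101", 5) == 5          # more slack, even smaller depth
```

Depth_z is decreasing in z by construction, exactly as noted after Example 16.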
The main idea is as follows: Suppose that an almost-minimal program for the string x has a rather short halting time τ. Then we can consider the ensemble E that is defined by the equidistribution over all strings that have a short program (of length less than or equal to C(x)) with halting time less than or equal to τ. Such an ensemble is simple, x is typical for it, and it has small total information. Thus, it forces the effective complexity of x to be small.
Theorem 18 (Effective Complexity and Depth): There is a global constant ω ∈ ℕ with the following property: Suppose that f: ℕ → ℕ is a strictly increasing, computable function. Furthermore, suppose that x is a k-well-behaved string. If the effective complexity of x satisfies

E_{δ, k+z+K(z)+K(f)+ω+1}(x) > K(C(x)) + K(z) + K(f) + ω

for some arbitrary δ ≥ 0 and z ∈ ℕ, then

Depth_z(x) > f(C(x)).

Remark. Before we prove this theorem, we explain its meaning and implications. First, since C(x) <⁺ ℓ(x), it follows that K(C(x)) is of the order log ℓ(x) or less, which is quite small. Thus, the inequality for E in this theorem is a very weak condition. The function f can be chosen to be simple (such that K(f) is small), but extremely rapidly growing, for example of the form

f(n) := n^(n^(…^n))    (a power tower of n's of height n).

Thus, the consequence of this theorem is that the depth must be astronomically large.
Proof: As E_{δ,Δ} is decreasing in δ, it is sufficient to prove the theorem for the case δ = 0. For y ∈ ℕ and every computable function f: ℕ → ℕ, define the set

τ_{y,f} := { x ∈ {0,1}* | ∃ p ∈ {0,1}* with V(p) = x, ℓ(p) ≤ y, T(p) ≤ f(y) },

where T(p) denotes the halting time of the (plain, not prefix) universal computer V on input p.
Clearly #τ_{y,f} < 2^{y+1}, and if E_{y,f} is the uniform distribution on τ_{y,f}, then H(E_{y,f}) < y + 1. Moreover, there is some constant c ∈ ℕ such that K(E_{y,f}) ≤ K(y) + K(f) + c. Hence the total information satisfies

Σ(E_{y,f}) ≤ y + K(y) + K(f) + c + 1.    (25)

Now let x ∈ {0,1}* be a k-well-behaved string, and let z ∈ ℕ be arbitrary. Suppose that x ∈ τ_{C(x)+z, f}; then x is 0-typical for E := E_{C(x)+z, f}. Since there exists some global constant d ∈ ℕ such that K(a+b) ≤ K(a) + K(b) + d for every a, b ∈ ℕ, we can estimate, using (25),

Σ(E) ≤ C(x) + z + K(C(x)+z) + K(f) + c + 1
≤ C(x) + K(C(x)) + z + K(z) + K(f) + d + c + 1
≤ K(x) + k + z + K(z) + K(f) + c + d + 1,

and we set Δ := k + z + K(z) + K(f) + c + d + 1, so that Σ(E) ≤ K(x) + Δ. By the definition of effective complexity, we get

E_{0,Δ}(x) ≤ K(E) ≤ K(C(x)+z) + K(f) + c ≤ K(C(x)) + K(z) + K(f) + c + d.    (26)

In summary, we have so far shown the following: if there is a program p ∈ {0,1}* with V(p) = x and ℓ(p) ≤ C(x) + z such that the corresponding halting time satisfies T(p) ≤ f(C(x)+z), then the effective complexity is as small as in (26). The negation of this statement, together with f(C(x)+z) ≥ f(C(x)) and ω := c + d, proves the theorem. ✷
Interestingly, incompressible strings just slightly fail to fulfill the inequality of the previous theorem. According to Corollary 11, r-incompressible strings x have effective complexity E_{δ,r+c}(x) <⁺ K(C(x)) + r. Thus, we cannot conclude that those strings have large depth; fortunately so, because most r-incompressible strings are in fact shallow according to Lemma 17.
Thus, it follows that the expression K(C(x)) sharply marks the "edge of depth", in the sense that strings with larger effective complexity always have extremely large depth, while strings with smaller effective complexity can have arbitrarily small depth. In some sense, a phenomenon similar to a "phase transition" occurs at effective complexity K(C(x)) (apart from additive constants that become less and less important in the "thermodynamic limit" ℓ(x) → ∞). This behavior is schematically depicted in Figure 2. The previous theorem says that if the effective complexity exceeds K(C(x)) (omitting all additive constants here), then the logical depth must be astronomically large. On the other hand, if the effective complexity is smaller, different values of depth seem possible. In particular, if x is r-incompressible, then we know from Corollary 11 that its effective complexity is (possibly only up to a few bits) smaller than K(C(x)), while its logical depth is as small as possible (of the order ℓ(x)) due to Lemma 17.

[Figure 2: schematic plot of Depth(x) against effective complexity E(x); the depth jumps from about n = ℓ(x) to "absurdly large" at E(x) ≈ K(C(x)).]
Fig. 2. At effective complexity equal to K(C(x)), logical depth suddenly becomes astronomically large. This is reminiscent of the phenomenon of "phase transition" known from statistical mechanics.

This theorem can easily be extended to the case of effective complexity with constraints, as long as the constrained sets of ensembles satisfy the usual technical conditions:
Theorem 19 (E and Depth with Constraints): There is a global constant ω ∈ ℕ with the following property: Suppose f: ℕ → ℕ is a strictly increasing, computable function. Furthermore, suppose that x is a k-well-behaved string, and the constrained set C is decidable and convex and contains the Dirac measure δ_x.
If the effective complexity of x satisfies

E_{δ,Δ}(x|C) > K(C(x)) + K(z) + K(f) + K(C) + ω

for some Δ ≥ k + z + K(z) + K(f) + ω + 1 + K(C) with δ ≥ 0 and z ∈ ℕ, then

Depth_z(x) > f(C(x)).

Remark. For an explanation and interpretation of this theorem, see the remarks after Theorem 18 above.
Proof: The proof is almost identical to that of Theorem 18 above. The only modification is that the set τ_{y,f} has to be replaced by the set

τ_{y,f,C} := { x ∈ {0,1}* | ∃ p ∈ {0,1}* with V(p) = x, ℓ(p) ≤ y, T(p) ≤ f(y), δ_x satisfies C }.

The convexity condition then ensures that the uniform distribution on τ_{y,f,C}, called E_{y,f,C}, is contained in C. This construction enlarges every additive constant by a term K(C), i.e. the complexity of a computer program that is able to test for strings whether the corresponding Dirac measures are contained in C. ✷

VI. EFFECTIVE COMPLEXITY AND KOLMOGOROV MINIMAL SUFFICIENT STATISTICS

Now we study the relation between effective complexity without constraints and Kolmogorov minimal sufficient statistics (KMSS). For more information on Kolmogorov minimal sufficient statistics and related notions, see [1, 2.2.2].
For strings x and integers k ∈ ℕ, we can define a (version of the) Kolmogorov structure function H_k(x|n) by

H_k(x|n) := min{ log #A | A ⊆ {0,1}^n, x ∈ A, K*(A|n) ≤ k },    (27)

i.e. H_k(x|n) is the logarithm of the minimal size of any subset of strings of length n which contains the string x and has complexity upper-bounded by k, given n. The corresponding minimal set will be called A_k (if there are several minimizers, we take the first set in some canonical order). Hence H_k(x|n) = log #A_k and K*(A_k|n) ≤ k.
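The minimization (27) can be mimicked in a toy setting. In the sketch below, the "model class" is a small, explicitly invented catalog of subsets of {0,1}^4, each carrying an invented description cost that stands in for K*(A|n); this only reproduces the shape of the minimization, not actual Kolmogorov complexities.

```python
from math import log2

CATALOG = [
    # (model set A, toy cost standing in for K*(A | n))
    ({f"{i:04b}" for i in range(16)}, 1),            # all strings: very cheap
    ({s for s in (f"{i:04b}" for i in range(16)) if s[0] == "1"}, 3),
    ({"1010", "1011"}, 6),
    ({"1010"}, 9),                                   # singleton: expensive
]

def H_k(x, k):
    """Toy structure function: min log2(#A) over catalog sets A
    with x ∈ A and cost(A) <= k, as in (27)."""
    sizes = [len(A) for A, cost in CATALOG if x in A and cost <= k]
    return log2(min(sizes)) if sizes else float("inf")

x = "1010"
assert H_k(x, 1) == 4.0   # only the full set is affordable
assert H_k(x, 3) == 3.0   # the "starts with 1" model halves the set
assert H_k(x, 9) == 0.0   # the singleton pins x down exactly
```

The trade-off in Definition 20 is visible here: as the allowed model cost k grows, the remaining "randomness" log #A_k of x within its model shrinks, and k_Δ(x) picks the cheapest model at which the two-part description H_k(x|n) + k comes within Δ of optimal.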
Definition 20 (KMSS): Let x be a string of length n and denote by k_Δ(x) the minimal natural number k satisfying

H_k(x|n) + k ≤ K*(x|n) + Δ.    (28)

A minimal program k*_Δ(x) for A_{k_Δ(x)} is called a Kolmogorov minimal sufficient statistic of the string x.
Originally, the Kolmogorov structure function as well as Kolmogorov minimal sufficient statistics were defined using the plain conditional Kolmogorov complexity C(·|·) in (27) and (28) instead of Chaitin's prefix version K*(·|·). It holds that

ℓ(k*_Δ(x)) = K(A_{k_Δ(x)}) <⁺ k_Δ(x) + K(n).    (29)

Moreover, k_Δ(x) can equivalently be written as

k_Δ(x) = min{ K*(A|n) | log #A + K*(A|n) ≤ K*(x|n) + Δ, x ∈ A ⊂ {0,1}^n }.    (30)

This formula looks very similar to the definition of (unconstrained) effective complexity, as given in Definition 4. Hence the Kolmogorov minimal sufficient statistic is approximately the minimal program of the minimizing set within the minimization domain of effective complexity. We will explore this observation in more detail in the following lemma.
Lemma 21: There is a constant c ∈ ℕ such that for all δ, Δ ≥ 0 it holds that

E_{δ,Δ+c}(x) <⁺ ℓ(k*_Δ(x)) <⁺ k_Δ(x) + K(n)    (31)

uniformly in x ∈ {0,1}*, where n := ℓ(x). Moreover, there is a constant g ∈ ℕ such that for all δ, Δ ≥ 0

E_{δ,Δ}(x) >⁺ k_{K(n)+Δ+δ(K(x)+Δ)+K(δ)+g}(x) − K(δ)    (32)
>⁺ ℓ(k*_{K(n)+Δ+δ(K(x)+Δ)+K(δ)+g}(x)) − K(δ) − K(n)

uniformly in x ∈ {0,1}*.
Proof: Let k := k_Δ(x) and n := ℓ(x). By definition,

k ≥ K*(A_k|n) =⁺ K(A_k, n) − K(n) =⁺ K(A_k) − K(n).

Let E_k be the uniform distribution on A_k. It follows that

H(E_k) + K(E_k) =⁺ log #A_k + K(A_k)
= H_k(x|n) + K(A_k)
<⁺ H_k(x|n) + k + K(n)
≤ K*(x|n) + Δ + K(n)
=⁺ K(x, n) + Δ
=⁺ K(x) + Δ.
Thus, there is some constant c ∈ ℕ such that E_{δ,∆+c}(x) ≤ K(E_k) ⁺< K(A_k). Then (31) follows from (29).

In order to show (32), we use Lemma 13. Let E be the minimizing ensemble in the definition of E_{δ,∆}(x). In particular, it holds

K(E) = E_{δ,∆}(x).   (33)

Fix any ε > 0. Due to Lemma 13, there is a set S′ ⊂ {0,1}* containing x such that

log #S′ ⁺< H(E)(1 + δ),   K(S′) ⁺< K(E) + K(δ).

Let now S := S′ ∩ {0,1}^n. It still holds log #S ⁺< H(E)(1 + δ) and

K(S) ⁺< K(S′) + K(n) ⁺< K(E) + K(δ) + K(n).

Since K*(S|n) ⁺= K(S, n) − K(n) ⁺= K(S) − K(n), we get the chain of inequalities

log #S + K*(S|n) ⁺< H(E)(1 + δ) + K(S) − K(n)
                 ⁺< H(E) + δ H(E) + K(E) + K(δ)
                 ≤ K(x) + ∆ + δ H(E) + K(δ)
                 ⁺< K*(x|n) + K(n) + ∆ + K(δ) + δ(K(x) + ∆).

Using (30) and (33), it follows that

k_{K(n)+∆+δ(K(x)+∆)+K(δ)+g}(x) ≤ K*(S|n) ⁺= K(S) − K(n) ⁺< K(E) + K(δ) = E_{δ,∆}(x) + K(δ).

Then (32) follows again from (29). ✷

VII. CONCLUSIONS

We have given a formal definition of effective complexity and rigorous proofs of its basic properties. In particular, we have shown that there is an interesting relation between effective complexity and logical depth: the depth of a string x is astronomically large if the effective complexity exceeds K(C(x)); otherwise, it can be arbitrarily small. This statement is true up to a few technical conditions and up to certain additive constants. These constants become less and less important for longer and longer strings; this is comparable to the "thermodynamic limit" in statistical mechanics, and the behavior can be compared to that of a phase transition.

So how useful is effective complexity for the study of natural systems?
We do not yet know the answer to this question, but we hope that our mathematically rigorous approach provides first steps towards possible applications within mathematics or theoretical computer science. At least, we have shown that effective complexity has interesting properties; for example, there are strings that have effective complexity close to their lengths. Such strings are rare, however: "most" strings are in fact effectively simple, as follows from Theorem 10. But this property is unavoidable: recall that a major motivation for the definition of effective complexity was that random strings should be simple (in contrast to Kolmogorov complexity). Since almost all strings are random, it follows that almost all strings must be effectively simple.

Most of our results concerning the constrained version of effective complexity E_{δ,∆}(x|C) were derived under the assumption that the Dirac measure δ_x is an element of C. Although this is a natural assumption, the behavior of effective complexity might well be completely different if it is dropped. Investigating such situations in more detail could be useful in order to gain better insight into the concept of effective complexity and its limitations.

Finally, a possible field of application of effective complexity might be statistical mechanics, where the notions of entropy and algorithmic complexity have both already led to interesting conclusions. After all, the constraints given by the set C can be interpreted as macroscopic observables as discussed in Section III, and we have already compared our result on logical depth with certain notions of statistical mechanics.
APPENDIX A
AN EXAMPLE OF NON-COMPUTABLE ENTROPY

As promised in the introduction, we give an explicit construction of a computable ensemble E with the property that the entropy H(E) is finite but not computable.

Example 22 (Non-Computable Entropy): For every n ∈ ℕ, let A_n be the set of strings that start with exactly n − 1 zeroes, such that the n-th bit either does not exist or is a one. That is,

A_1 = { λ, 1, 10, 11, 100, 101, ... },
A_2 = { 0, 01, 010, 011, 0100, 0101, ... },
A_3 = { 00, 001, 0010, 0011, 00100, 00101, ... }

and so on. Clearly, this is a computable partition of {0,1}* into countably infinite, mutually disjoint subsets: ⋃_{n∈ℕ} A_n = {0,1}*, and A_m ∩ A_n = ∅ if m ≠ n.

We now construct an ensemble E such that every set A_n has weight 2^{−n}, that is, E(A_n) := Σ_{x∈A_n} E(x) = 2^{−n}. We distribute the weight 2^{−n} among the members of A_n in such a way that the resulting ensemble E is computable, but has non-computable entropy. This is done as follows: let Ω > 0 be a real number which is not computable, but enumerable from below. That is, there exists a computable sequence (Ω_n)_{n∈ℕ} with Ω_1 := 0 which is increasing, i.e. Ω_{n+1} ≥ Ω_n, and which converges to Ω, i.e. lim_{n→∞} Ω_n = Ω. For example, we may use Chaitin's Omega number [1]

Ω := Σ_{U(x) exists} 2^{−ℓ(x)},

where the sum is over all strings x ∈ {0,1}* such that the universal prefix computer U halts on input x. The number Ω gives the probability that the computer U halts on randomly chosen input. It is a real number between zero and one, and it is obviously enumerable from below, but it is not computable.

Given some weight c ∈ (0,1] and finitely many positive real numbers r_i such that Σ_i r_i = c, the resulting entropy sum −Σ_i r_i log r_i will always be larger than or equal to −c log c.
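This lower bound is easy to check numerically. The following Python sketch is our own illustration (the helper `entropy_sum` and the choice c = 1/4 are arbitrary): no split of a weight c into positive parts pushes the entropy sum below −c log c, while splitting c into k equal parts yields exactly −c log c + c log k, so the sum can be made as large as desired.

```python
import math
import random

def entropy_sum(parts):
    """Compute -sum_i r_i * log2(r_i) for positive weights r_i."""
    return -sum(r * math.log2(r) for r in parts)

c = 0.25
floor = -c * math.log2(c)        # the claimed lower bound -c log c (= 0.5 here)

# Random splits of c into positive parts never fall below the bound.
random.seed(0)
for _ in range(1000):
    cuts = sorted(random.uniform(0, c) for _ in range(4))
    parts = [b - a for a, b in zip([0.0] + cuts, cuts + [c]) if b - a > 0]
    assert entropy_sum(parts) >= floor - 1e-12

# Splitting c into k equal parts gives -c log c + c log k, growing with k.
for k in (1, 2, 8, 1024):
    expected = floor + c * math.log2(k)
    assert abs(entropy_sum([c / k] * k) - expected) < 1e-9

print("lower bound -c log c =", floor)  # prints 0.5
```

The unbounded growth in k is exactly what the construction uses: each weight 2^{−n} can be spread over finitely many strings so as to hit any prescribed entropy sum at or above the floor.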
The converse is also true: given some fixed entropy value s ≥ −c log c, we can always find finitely many positive real numbers r_i with Σ_i r_i = c such that −Σ_i r_i log r_i = s. Such a list of real numbers can be found in an obvious systematic way that can be implemented as a computer program. Thus, for every n ∈ ℕ, we may systematically distribute the weight 2^{−n} over finitely many strings {x_1, ..., x_k} ⊂ A_n such that the corresponding probabilities E(x_i) have entropy sum −2^{−n} log 2^{−n} + Ω_{n+1} − Ω_n, i.e.

Σ_{x∈A_n} E(x) = 2^{−n},
−Σ_{x∈A_n} E(x) log E(x) = −2^{−n} log 2^{−n} + (Ω_{n+1} − Ω_n),

where Ω_{n+1} − Ω_n ≥ 0. The resulting ensemble is obviously computable, and its entropy is

H(E) = −Σ_{n=1}^∞ Σ_{x∈A_n} E(x) log E(x)
     = Σ_{n=1}^∞ ( −2^{−n} log 2^{−n} + Ω_{n+1} − Ω_n )
     = Σ_{n=1}^∞ n·2^{−n} + lim_{N→∞} Ω_N − Ω_1
     = 2 + Ω.

This is not a computable number.

ACKNOWLEDGMENT

The authors would like to thank Eric Smith for helpful discussions. This work has been supported by the Santa Fe Institute.

REFERENCES

[1] M. Li, P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Springer-Verlag (1997)
[2] M. Gell-Mann, S. Lloyd, Effective Complexity, Santa Fe Institute, pp. 387-398 (2003)
[3] M. Gell-Mann, S. Lloyd, Information Measures, Effective Complexity, and Total Information, Complexity, Vol. 2, 44-52 (1996)
[4] C. Bennett, Logical Depth and Physical Complexity, in The Universal Turing Machine: a Half-Century Survey, ed. Rolf Herken, Oxford University Press (1988)
[5] S. Lloyd, Measures of Complexity: A Nonexhaustive List, IEEE Control Systems Magazine 21/4, pp. 7-8 (2001)
[6] P. Gács, J. T. Tromp, P. M. B. Vitányi, Algorithmic Statistics, IEEE Trans. Inf. Th. 47/6, 2443-2463 (2001)
[7] G. J.
Chaitin, Exploring Randomness, Springer-Verlag London (2001)
[8] T. M. Cover, J. A. Thomas, Elements of Information Theory, Wiley Series in Telecommunications, John Wiley & Sons, New York (1991)
[9] N. Ay, M. Müller, A. Szkoła, Effective Complexity of Ergodic Processes (preliminary title), in preparation.