A Complete Theory of Everything (will be subjective)
Increasingly encompassing models have been suggested for our world. Theories range from generally accepted to increasingly speculative to apparently bogus. The progression of theories from ego- to geo- to helio-centric models to universe and multiver…
Authors: Marcus Hutter
A Complete Theory of Everything (will be subjective)

Marcus Hutter
SoCS, RSISE, IAS, CECS
Australian National University
Canberra, ACT, 0200, Australia

September 2010

Abstract

Increasingly encompassing models have been suggested for our world. Theories range from generally accepted to increasingly speculative to apparently bogus. The progression of theories from ego- to geo- to helio-centric models to universe and multiverse theories and beyond was accompanied by a dramatic increase in the sizes of the postulated worlds, with humans being expelled from their center to ever more remote and random locations. Rather than leading to a true theory of everything, this trend faces a turning point after which the predictive power of such theories decreases (actually to zero). Incorporating the location and other capacities of the observer into such theories avoids this problem and allows one to distinguish meaningful from predictively meaningless theories. This also leads to a truly complete theory of everything consisting of a (conventional objective) theory of everything plus a (novel subjective) observer process. The observer localization is neither based on the controversial anthropic principle, nor has it anything to do with the quantum-mechanical observation process. The suggested principle is extended to more practical (partial, approximate, probabilistic, parametric) world models (rather than theories of everything). Finally, I provide a justification of Ockham's razor, and criticize the anthropic principle, the doomsday argument, the no free lunch theorem, and the falsifiability dogma.
Contents

1 Introduction
2 Theories of Something, Everything & Nothing
3 Predictive Power & Observer Localization
4 Complete ToEs (CToEs)
5 Complete ToE - Formalization
6 Universal ToE - Formalization
7 Extensions
8 Justification of Ockham's Razor
9 Discussion
References
A List of Notation

Keywords

world models; observer localization; predictive power; Ockham's razor; universal theories; inductive reasoning; simplicity and complexity; universal self-sampling; no-free-lunch; computability.

"... in spite of it's incomputability, Algorithmic Probability can serve as a kind of 'Gold Standard' for induction systems"
- Ray Solomonoff (1997)

"There is a theory which states that if ever anyone discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable. There is another theory which states that this has already happened."
- Douglas Adams, Hitchhikers guide to the Galaxy (1979)

1 Introduction

This paper uses an information-theoretic and computational approach for addressing the philosophical problem of judging theories (of everything) in physics. In order to keep the paper generally accessible, I've tried to minimize field-specific jargon and mathematics, and focus on the core problem and its solution.

By theory I mean any model which can explain ≈ describe ≈ predict ≈ compress [Hut06a] our observations, whatever the form of the model. Scientists often say that their model explains some phenomenon. What is usually meant is that the model describes (the relevant aspects of) the observations more compactly than the raw data. The model is then regarded as capturing a law (of nature), which is believed to hold true also for unseen/future data.
This process of inferring general conclusions from example instances is called inductive reasoning. For instance, observing 1000 black ravens but no white one supports but cannot prove the hypothesis that all ravens are black. In general, induction is used to find properties or rules or models of past observations. The ultimate purpose of the induced models is to use them for making predictions, e.g. that the next observed raven will also be black. Arguably inductive reasoning is even more important than deductive reasoning in science and everyday life: for scientific discovery, in machine learning, for forecasting in economics, as a philosophical discipline, in common-sense decision making, and last but not least to find theories of everything. Historically, some famous, but apparently misguided philosophers [Sto82, Gar01], including Popper and Miller, even disputed the existence, necessity or validity of inductive reasoning. Meanwhile it is well-known how minimum encoding length principles [Wal05, Grü07], rooted in (algorithmic) information theory [LV08], quantify Ockham's razor principle, and lead to a solid pragmatic foundation of inductive reasoning [Hut07]. Essentially, one can show that the more one can compress, the better one can predict, and vice versa.

A deterministic theory/model allows one to determine an observation sequence from initial conditions, and this sequence could be coded as a bit string. For instance, Newtonian mechanics maps initial planet positions+velocities into a time-series of planet positions. So a deterministic model with initial conditions is "just" a compact representation of an infinite observation string. A stochastic model is "just" a probability distribution over observation strings. Classical models in physics are essentially differential equations describing the time-evolution of some aspects of the world.
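The idea that a lawful observation string has a compact description, while a lawless one does not, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the "world" is a hypothetical rule (the state advances by 3 modulo 8), the names `simulate` and `compressed_size` are invented here, and zlib is only a crude, computable stand-in for the shortest description length, which is itself incomputable.

```python
import random
import zlib


def simulate(initial_state: int, steps: int) -> bytes:
    """Toy deterministic 'world': the state advances by 3 modulo 8 each step."""
    state = initial_state
    observations = []
    for _ in range(steps):
        observations.append(state)
        state = (state + 3) % 8
    return bytes(observations)


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a rough complexity proxy)."""
    return len(zlib.compress(data, 9))


N = 10_000

# Observation string produced by a short rule plus an initial condition.
lawful = simulate(initial_state=1, steps=N)

# Observation string with no exploitable regularity (pseudo-random bytes).
rng = random.Random(0)
lawless = bytes(rng.getrandbits(8) for _ in range(N))

lawful_size = compressed_size(lawful)
lawless_size = compressed_size(lawless)
print("lawful :", N, "->", lawful_size, "bytes")
print("lawless:", N, "->", lawless_size, "bytes")
```

The lawful string shrinks to a small fraction of its raw size, while the pseudo-random one does not shrink at all. A general-purpose compressor only finds simple repetitions, so this understates how far a genuine model could compress structured data.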
A Theory of Everything (ToE) models the whole universe or multiverse, which should include initial conditions. As I will argue, it can be crucial to also localize the observer, i.e. to augment the ToE with a model of the properties of the observer, even for non-quantum-mechanical phenomena. I call a ToE with observer localization, a Complete ToE (CToE).

That the observer itself is important in describing our world is well-known. Most prominently in quantum mechanics, the observer plays an active role in 'collapsing the wave function'. This is a specific and relatively well-defined role of the observer for a particular theory, which is not my concern. I will show that (even the localization of) the observer is indispensable for finding or developing any (useful) ToE. Often, the anthropic principle is invoked for this purpose (our universe is as it is because otherwise we would not exist). Unfortunately its current use is rather vague and limited, if not outright unscientific [Smo04]. In Section 6 I extend Schmidhuber's formal work [Sch00] on computable ToEs to formally include observers. Schmidhuber [Sch00] already discusses observers and mentions sampling universes consistent with our own existence, but this part stays informal. I give a precise and formal account of observers by explicitly separating the observer's subjective experience from the objectively existing universe or multiverse, which besides other things shows that we also need to localize the observer within our universe (not only which universe the observer is in).

In order to make the main point of this paper clear, Section 2 first traverses a number of models that have been suggested for our world, from generally accepted to increasingly speculative and questionable theories.
Section 3 discusses the relative merits of the models, in particular their predictive power (precision and coverage). We will see that localizing the observer, which is usually not regarded as an issue, can be very important. Section 4 gives an informal introduction to the necessary ingredients for CToEs, and how to evaluate and compare them using a quantified instantiation of Ockham's razor. Section 5 gives a formal definition of what constitutes a CToE, introduces more realistic observers with limited perception ability, and formalizes the CToE selection principle. The Universal ToE is a sanity critical point in the development of ToEs, and will be investigated in more detail in Section 6. Extensions to more practical (partial, approximate, probabilistic, parametric) theories (rather than ToEs) are briefly discussed in Section 7. In Section 8 I show that Ockham's razor is well-suited for finding ToEs and briefly criticize the anthropic principle, the doomsday argument, the no free lunch theorem, and the falsifiability dogma. Section 9 concludes.

2 Theories of Something, Everything & Nothing

A number of models have been suggested for our world. They range from generally accepted to increasingly speculative to outright unacceptable. For the purpose of this work it doesn't matter where you personally draw the line. Many now generally accepted theories were once regarded as insane, so using the scientific community or general public as a judge is problematic and can lead to endless discussions: for instance, the historic geo↔heliocentric battle; and the ongoing discussion of whether string theory is a theory of everything or more a theory of nothing. In a sense this paper is about a formal rational criterion to determine whether a model makes sense or not.
In order to make the main point of this paper clear, below I will briefly traverse a number of models. Space constraints prevent explaining these models properly, but most of them are commonly known; see e.g. [Har00, BDH04] for surveys. The presented bogus models help to make clear the necessity of observer localization and hence the importance of this work.

(G) Geocentric model. In the well-known geocentric model, the Earth is at the center of the universe and the Sun, the Moon, and all planets and stars move around Earth. The ancient model assumed concentric spheres, but increasing precision in observations and measurements revealed a quite complex geocentric picture with planets moving with variable speed on epicycles. This Ptolemaic system predicted the celestial motions quite well for its time, but was relatively complex in the common sense and in the sense of involving many parameters that had to be fitted experimentally.

(H) Heliocentric model. In the modern (later) heliocentric model, the Sun is at the center of the solar system (or universe), with all planets (and stars) moving in ellipses around the Sun. Copernicus developed a complete model, much simpler than the Ptolemaic system, which interestingly did not offer better predictions initially, but Kepler's refinements ultimately outperformed all geocentric models. The price for this improvement was to expel the observers (humans) from the center of the universe to one out of 8 moving planets. While today this price seems small, historically it was quite high. Indeed we will compute the exact price later.

(E) Effective theories. After the celestial mechanics of planets had been understood, ever more complex phenomena could be captured with increasing coverage. Newton's mechanics unifies celestial and terrestrial gravitational phenomena.
When unified with special relativity theory one arrives at Einstein's general relativity, predicting large scale phenomena like black holes and the big bang. On the small scale, electrical and magnetic phenomena are unified by Maxwell's equations for electromagnetism. Quantum mechanics and electromagnetism have further been unified to quantum electrodynamics (QED). QED is the most powerful theory ever invented, in terms of precision and coverage of phenomena. It is a theory of all physical and chemical processes, except for radioactivity and gravity.

(P) Standard model of particle physics. Salam, Glashow and Weinberg extended QED to include weak interactions, responsible for radioactive decay. Together with quantum chromodynamics [Hut96], which describes the nucleus, this constitutes the current standard model (SM) of particle physics. It describes all known non-gravitational phenomena in our universe. There is no experiment indicating any limitation (precision, coverage). It has about 20 unexplained parameters (mostly masses and coupling constants) that have to be (and are) experimentally determined (although some regularities can be explained [BH97]). The effective theories of the previous paragraph can be regarded as approximations of SM, hence SM, although founded on a subatomic level, also predicts medium scale phenomena.

(S) String theory. Pure gravitational and pure quantum phenomena are perfectly predictable by general relativity and the standard model, respectively. Phenomena involving both, like the big bang, require a proper final unification. String theory is the candidate for a final unification of the standard model with the gravitational force. As such it describes the universe at its largest and smallest scale, and all scales in-between.
String theory is essentially parameter-free, but is immensely difficult to evaluate and it seems to allow for many solutions (spatial compactifications). For these and other reasons, there is currently no uniquely accepted cosmological model.

(C) Cosmological models. Our concept of what the universe is, seems to ever expand. In ancient times there was Earth, Sun, Moon, and a few planets, surrounded by a sphere of shiny points (fixed stars). The current textbook universe started in a big bang and consists of billions of galaxy clusters each containing billions of stars, probably many with a planetary system. But this is just the visible universe. According to inflation models, which are needed to explain the homogeneity of our universe, the "total" universe is vastly larger than the visible part.

(M) Multiverse theories. Many theories (can be argued to) imply a multitude of essentially disconnected universes (in the conventional sense), often each with their own (quite different) characteristics [Teg04]. In Wheeler's oscillating universe a new big bang follows the assumed big crunch, and this repeats indefinitely. Lee Smolin proposed that every black hole recursively produces new universes on the "other side" with quite different properties. Everett's many-worlds interpretation of quantum mechanics postulates that the wave function doesn't collapse but the universe splits (decoheres) into different branches, one for each possible outcome of a measurement. Some string theorists have suggested that possibly all compactifications in their theory are realized, each resulting in a different universe.

(U) Universal ToE. The last two multiverse suggestions contain the seed of a general idea.
If theory X contains some unexplained elements Y (quantum or compactification or other indeterminism), one postulates that every realization of Y results in its own universe, and we just happen to live in one of them. Often the anthropic principle is used in some hand-waving way to argue why we are in this and not that universe [Smo04]. Taking this to the extreme, Schmidhuber [Sch97, Sch00] postulates a multiverse (which I call universal universe) that consists of every computable universe (note there are "just" countably many computer programs). Clearly, if our universe is computable, then it is contained in the universal universe, so we have a ToE already in our hands. Similar in spirit but neither constructive nor formally well-defined is Tegmark's mathematical multiverse [Teg08].

(R) Random universe. Actually there is a much simpler way of obtaining a ToE. Consider an infinite sequence of random bits (fair coin tosses). It is easy to see that any finite pattern, i.e. any finite binary sequence, occurs (actually infinitely often) in this string. Now consider our observable universe quantized at e.g. Planck level, and code the whole space-time universe into a huge bit string. If the universe ends in a big crunch, this string is finite. (Think of a digital high resolution 3D movie of the universe from the big bang to the big crunch). This big string also appears somewhere in our random string, hence our random string is a perfect ToE. This is reminiscent of the Boltzmann brain idea that in a sufficiently large random universe, there exist low entropy regions that resemble our own universe and/or brain (observer) [BT86, Sec.3.8].

(A) All-a-Carte models. The existence of true randomness is controversial and complicates many considerations. So ToE (R) may be rejected on this ground, but there is a simple deterministic computable variant.
Glue the natural numbers written in binary format, 1,10,11,100,101,110,111,1000,1001,... to one long string.

11011100 10111011 11000100 1...

The decimal version is known as Champernowne's number. Obviously it contains every finite substring by construction. Indeed, it is a Normal Number in the sense that it contains every substring of length n with the same relative frequency ($2^{-n}$). Many irrational numbers like √2, π, and e are conjectured to be normal. So Champernowne's number and probably even √2 are perfect ToEs.

Remarks. I presume that every reader of this section at some point regarded the remainder as bogus. In a sense this paper is about a rational criterion to decide whether a model is sane or insane. The problem is that the line of sanity differs for different people and different historical times.

Moving the earth out of the center of the universe was (and for some even still is) insane. The standard model is accepted by nearly all physicists as the closest approximation to a ToE so far. Only outside physics, often by opponents of reductionism, has this view been criticized. Some respectable researchers including Nobel Laureates go further and take string theory and even some Multiverse theories seriously. Universal ToE also has a few serious proponents. Whether Boltzmann's random noise or my All-a-Carte ToE finds adherents remains to be seen. For me, Universal ToE (U) is the sanity critical point. Indeed UToE will be investigated in greater detail in later sections.

References to the dogmatic Bible, Popper's misguided falsifiability principle [Sto82, Gar01], and wrong applications of Ockham's razor are the most popular pseudo justifications of what theories are (in)sane. More serious arguments involving the usefulness of a theory will be discussed in the next section.
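The binary construction used for the All-a-Carte model (A) is easy to reproduce. The following is a minimal sketch, not part of the original argument; the function name `champernowne_binary` and the chosen prefix lengths are assumptions made for illustration. It rebuilds the first 25 bits shown above and checks that a modest prefix already contains every bit pattern of length 6.

```python
from itertools import count


def champernowne_binary(min_length: int) -> str:
    """Concatenate the binary numerals 1, 10, 11, 100, ... until the
    result has at least `min_length` bits (it may be slightly longer)."""
    pieces = []
    total = 0
    for n in count(1):
        numeral = format(n, "b")
        pieces.append(numeral)
        total += len(numeral)
        if total >= min_length:
            break
    return "".join(pieces)


# The numerals 1..9 give exactly the 25 bits quoted in the text.
first_bits = champernowne_binary(25)
print(first_bits)

# A prefix of about 1000 bits includes all 7-bit numerals, and each
# 6-bit pattern p occurs inside the 7-bit numeral "1" + p.
prefix = champernowne_binary(1000)
patterns = [format(p, "06b") for p in range(64)]
missing = [p for p in patterns if p not in prefix]
print("missing 6-bit patterns:", missing)
```

The same reasoning extends to any pattern length, which is what makes the string "contain everything", and also what makes it uninformative on its own.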
3 Predictive Power & Observer Localization

In the last section I have enumerated some models of (parts of, or including) the universe, roughly sorted in increasing size of the universe. Here I discuss their relative merits, in particular their predictive power (precision and coverage). Analytical or computational tractability also influences the usefulness of a theory, but can be ignored when evaluating its status as a ToE. For example, QED is computationally a nightmare, but this does not at all affect its status as the theory of all electrical, magnetic, and chemical processes. On the other hand, we will see that localizing the observer, which is usually not regarded as an issue, can be very important. The latter has nothing to do with the quantum-mechanical measuring process, although there may be some deeper yet to be explored connection.

Particle physics. The standard model has more power and hence is closer to a ToE than all effective theories (E) together. String theory plus the right choice of compactification reduces to the standard model, so has the same or superior power. The key point here is the inclusion of the "right choice of compactification". Without it, string theory is in some respect less powerful than SM.

Baby universes. Let us now turn to the cosmological models, in particular Smolin's baby universe theory, in which infinitely many universes with different properties exist. The theory "explains" why a universe with our properties exists (since it includes universes with all kinds of properties), but it has little predictive power. The baby universe theory plus a specification in which universe we happen to live would determine the value of the inter-universe variables for our universe, and hence have much more predictive power. So localizing ourselves increases the predictive power of the theory.
Universal ToE. Let us consider the even larger universal multiverse. Assuming our universe is computable, the multiverse generated by UToE contains and hence perfectly describes our universe. But this is of little use, since we can't use UToE for prediction. If we knew our "position" in this multiverse, we would know in which (sub)universe we are. This is equivalent to knowing the program that generates our universe. This program may be close to any of the conventional cosmological models, which indeed have a lot of predictive power. Since locating ourselves in UToE is equivalent to, and hence as hard as, finding a conventional ToE of our universe, we have not gained much.

All-a-Carte models also contain and hence perfectly describe our universe. If and only if we can localize ourselves, we can actually use them for predictions. (For instance, if we knew we were in the center of universe 001011011 we could predict that we will 'see' 0010 when 'looking' to the left and 1011 when looking to the right.) Let u be a snapshot of our space-time universe; a truly gargantuan string. Locating ourselves means to (at least) locate u in the multiverse. We know that u is the u-th number in Champernowne's sequence (interpreting u as a binary number), hence locating u is equivalent to specifying u. So a ToE based on normal numbers is only useful if accompanied by the gargantuan snapshot u of our universe. In light of this, an "All-a-Carte" ToE (without knowing u) is rather a theory of nothing than a theory of everything.

Localization within our universe. The loss of predictive power when enlarging a universe to a multiverse model has nothing to do with multiverses per se. Indeed, the distinction between a universe and a multiverse is not absolute.
For instance, Champernowne's number could also be interpreted as a single universe, rather than a multiverse. It could be regarded as an extreme form of the infinite fantasia land from the NeverEnding Story, where everything happens somewhere. Champernowne's number constitutes a perfect map of the All-a-Carte universe, but the map is useless unless you know where you are. Similarly but less extreme, the inflation model produces a universe that is vastly larger than its visible part, and different regions may have different properties.

Egocentric to Geocentric model. Consider now the "small" scale of our daily life. A young child believes it is the center of the world. Localization is trivial. It is always at "coordinate" (0,0,0). Later it learns that it is just one among a few billion other people and as little or much special as any other person thinks of themselves. In a sense we replace our egocentric coordinate system by one with origin (0,0,0) in the center of Earth. The move away from an egocentric world view has many social advantages, but dis-answers one question: Why am I this particular person and not any other? (It also comes at the cost of constantly having to balance egoistic with altruistic behavior.)

Geocentric to Heliocentric model. While being expelled from the center of the world as an individual, in the geocentric model, at least the human race as a whole remains in the center of the world, with the remaining (dead?) universe revolving around us. The heliocentric model puts the Sun at (0,0,0) and degrades Earth to planet number 3 out of 8. The astronomic advantages are clear, but it dis-answers one question: Why this planet and not one of the others? Typically we are muzzled by questionable anthropic arguments [Bos02, Smo04].
(Another scientific cost is the necessity now to switch between coordinate systems, since the ego- and geocentric views are still useful.)

Heliocentric to modern cosmological model. The next coup of astronomers was to degrade our Sun to one star among billions of stars in our Milky Way, and our Milky Way to one galaxy out of billions of others. It is generally accepted that the question why we are in this particular galaxy in this particular solar system is essentially unanswerable.

Summary. The exemplary discussion above has hopefully convinced the reader that we indeed lose something (some predictive power) when progressing to too large universe and multiverse models. Historically, the higher predictive power of the large-universe models (in which we are seemingly randomly placed) overshadowed the few extra questions they raised compared to the smaller ego/geo/helio-centric models. (We're not concerned here with the psychological disadvantages/damage, which may be large.) But the discussion of the (physical, universal, random, and all-a-carte) multiverse theories has shown that pushing this progression too far will at some point harm predictive power. We saw that this has to do with the increasing difficulty of localizing the observer.

4 Complete ToEs (CToEs)

A ToE by definition is a perfect model of the universe. It should allow one to predict all phenomena. Most ToEs require a specification of some initial conditions, e.g. the state at the big bang, and how the state evolves in time (the equations of motion). In general, a ToE is a program that in principle can "simulate" the whole universe. An All-a-Carte universe perfectly satisfies this condition but apparently is rather a theory of nothing than a theory of everything. So meeting the simulation condition is not sufficient for qualifying as a Complete ToE.
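Why the All-a-Carte model is a theory of nothing can be made concrete with a small sketch: writing down where a string u sits in the binary Champernowne sequence takes at least as many bits as u itself. The code is an illustration under stated assumptions, not a construction from this paper. The helper names `start_offset` and `brute_force_prefix` are hypothetical, the 64-bit "snapshot" u is a pseudo-random stand-in, and only the canonical occurrence of u (as the numeral of the number u) is considered, although u may also occur earlier by chance.

```python
import random


def start_offset(n: int) -> int:
    """0-based bit offset at which the binary numeral of n begins
    in the concatenation 1, 10, 11, 100, ..."""
    k = n.bit_length()
    # There are 2**(j-1) numerals with exactly j bits; add up all j < k.
    shorter = sum(j * 2 ** (j - 1) for j in range(1, k))
    # k-bit numerals that precede n contribute k bits each.
    return shorter + k * (n - 2 ** (k - 1))


def brute_force_prefix(last: int) -> str:
    """Directly concatenate the numerals 1..last (used only as a cross-check)."""
    return "".join(format(m, "b") for m in range(1, last + 1))


# Cross-check the closed form against direct concatenation for small n.
prefix = brute_force_prefix(300)
for n in range(1, 301):
    numeral = format(n, "b")
    start = start_offset(n)
    assert prefix[start:start + len(numeral)] == numeral

# Stand-in universe snapshot u: 64 pseudo-random bits with a leading 1.
rng = random.Random(1)
u = "1" + format(rng.getrandbits(63), "063b")

address = start_offset(int(u, 2))
address_bits = address.bit_length()
print("bits in u:      ", len(u))
print("bits in address:", address_bits)
```

The address is never shorter than u, so the "map" only helps someone who already holds the territory. This is the effect that the observer-localization term is meant to capture.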
We have seen that (objective) ToEs can be completed by specifying the location of the observer. This allows us to make useful predictions from our (subjective) viewpoint. We call a ToE plus observer localization a subjective or complete ToE. If we allow for stochastic (quantum) universes we also need to include the noise. If we consider (human) observers with limited perception ability we need to take that into account too. So

A complete ToE needs specification of
(i) initial conditions
(e) state evolution
(l) localization of observer
(n) random noise
(o) perception ability of observer

We will ignore noise and perception ability in the following and return to these issues in Sections 7 and 5, respectively. Next we need a way to compare ToEs.

Epistemology. I assume that the observers' experience of the world consists of a single temporal binary sequence which gets longer with time. This is definitely true if the observer is a robot equipped with sensors like a video camera whose signal is converted to a digital data stream, fed into a digital computer and stored in a binary file of increasing length. In humans, the signal transmitted by the optic and other sensory nerves could play the role of the digital data stream. Of course (most) human observers do not possess photographic memory. We can deal with this limitation in various ways: digitally record and make accessible upon request the nerve signals from birth till now, or allow for uncertain or partially remembered observations. Classical philosophical theories of knowledge [Alc06] (e.g. as justified true belief) operate on a much higher conceptual level and therefore require stronger (and hence more disputable) philosophical presuppositions.
In my minimalist "spartan" information-theoretic epistemology, a bit-string is the only observation, and all higher ontologies are constructed from it and are pure "imagination".

Predictive power and elegance. Whatever the intermediary guiding principles for designing theories/models (elegance, symmetries, tractability, consistency), the ultimate judge is predictive success. Unfortunately we can never be sure whether a given ToE makes correct predictions in the future. After all we cannot rule out that the world suddenly changes tomorrow in a totally unexpected way (cf. the quote at the beginning of this article). We have to compare theories based on their predictive success in the past. It is also clear that the latter is not enough: For every model we can construct an alternative model that behaves identically in the past but makes different predictions from, say, year 2020 on. Popper's falsifiability dogma is of little help.

Beyond postdictive success, the guiding principle in designing and selecting theories, especially in physics, is elegance and mathematical consistency. The predictive power of the first heliocentric model was not superior to the geocentric one, but it was much simpler. In more profane terms, it has significantly fewer parameters that need to be specified.

Ockham's razor suitably interpreted tells us to choose the simpler among two or more otherwise equally good theories. For justifications of Ockham's razor, see [LV08] and Section 8. Some even argue that by definition, science is about applying Ockham's razor, see [Hut05]. For a discussion in the context of theories in physics, see [GM94]. It is beyond the scope of this paper to repeat these considerations. In Sections 4 and 8 I will show that simpler theories more likely lead to correct predictions, and therefore Ockham's razor is suitable for finding ToEs.

Complexity of a ToE.
In order to apply Ockham's razor in a non-heuristic way, we need to quantify simplicity or complexity. Roughly, the complexity of a theory can be defined as the number of symbols one needs to write the theory down. More precisely, write down a program for the state evolution together with the initial conditions, and define the complexity of the theory as the size in bits of the file that contains the program. This quantification is known as algorithmic information or Kolmogorov complexity [LV08] and is consistent with our intuition, since an elegant theory will have a shorter program than an inelegant one, and extra parameters need extra space to code, resulting in longer programs [Wal05, Grü07]. From now on I identify theories with programs and write Length(q) for the length=complexity of program=theory q.

Standard model versus string theory. To keep the discussion simple, let us pretend that standard model (SM) + gravity (G) and string theory (S) both qualify as ToEs. SM+Gravity is a mixture of a few relatively elegant theories, but contains about 20 parameters that need to be specified. String theory is truly elegant, but ensuring that it reduces to the standard model needs sophisticated extra assumptions (e.g. the right compactification).

SM+G can be written down in one line, plus we have to give 20+ constants, so let's say one page. The meaning (the axioms) of all symbols and operators requires another page. Then we need the basics, natural, real, complex numbers, sets (ZFC), etc., which is another page. That makes 3 pages for a complete description in first-order logic. There are a lot of subtleties though: (a) The axioms are likely mathematically inconsistent, (b) it's not immediately clear how the axioms lead to a program simulating our universe, (c) the theory does not predict the outcome of random events, and (d) some other problems.
Transforming the description into a C program simulating our universe needs a couple of pages more, but I would estimate that around 10 pages overall suffice, which is about 20'000 symbols=bytes. Of course this program will be (i) a very inefficient simulation and (ii) a very naive coding of SM+G. I conjecture that the shortest program for SM+G on a universal Turing machine is much shorter, maybe even only one tenth of this. The numbers are only a quick rule-of-thumb guess. If we start from string theory (S), we need about the same length. S is much more elegant, but we need to code the compactification to describe our universe, which effectively amounts to the same. Note that everything else in the world (all other physics, chemistry, etc.) is emergent.

It would require a major effort to quantify which theory is the simpler one in the sense defined above, but I think it would be worth the effort. It is a quantitative objective way to decide between theories that are (so far) predictively indistinguishable.

CToE selection principle. It is trivial to write down a program for an All-a-Carte multiverse (A). It is also not too hard to write a program for the universal multiverse (U), see Section 6. Lengthwise (A) easily wins over (U), and (U) easily wins over (P) and (S), but as discussed (A) and (U) have serious defects. On the other hand, these theories can only be used for predictions after extra specifications: Roughly, for (A) this amounts to tabling the whole universe, (U) requires defining a ToE in the conventional sense, (P) needs 20 or so parameters and (S) a compactification scheme. Hence localization-wise (P) and (S) easily win over (U), and (U) easily wins over (A). Given this trade-off, it nearly suggests itself that we should include the description length of the observer location in our ToE evaluation measure.
That is, among two CToEs, select the one that has shorter overall length

$$\mathrm{Length}(i) + \mathrm{Length}(e) + \mathrm{Length}(l) \qquad (1)$$

For an All-a-Carte multiverse, the last term contains the gargantuan string $u$, catapulting it from the shortest ToE to the longest CToE, hence (A) will not minimize (1).

ToE versus UToE. Consider any (C)ToE and its program $q$, e.g. (P) or (S). Since (U) runs all programs including $q$, specifying $q$ means localizing (C)ToE $q$ in (U). So (U)+$q$ is a CToE whose length is just some constant bits (the simulation part of (U)) more than that of (C)ToE $q$. So whatever (C)ToE physicists come up with, (U) is nearly as good as this theory. This essentially clarifies the paradoxical status of (U). Naked, (U) is a theory of nothing, but in combination with another ToE it excels to a good CToE, albeit slightly longer=worse than the latter.

Localization within our universe. So far we have only localized our universe in the multiverse, but not ourselves in the universe. To localize our Sun, we could e.g. sort (and index) stars by their creation date, which the model (i)+(e) provides. Most stars last for 1-10 billion years (say an average of 5 billion years). The universe is 14 billion years old, so most stars may be 3rd generation (the Sun definitely is), so the total number of stars that have ever existed should very roughly be 3 times the current number of stars of about $10^{11} \times 10^{11}$. Probably "3" is very crude, but this doesn't really matter for the sake of the argument. In order to localize our Sun we only need its index, which can be coded in about $\log_2(3 \times 10^{11} \times 10^{11}) \doteq 75$ bits. Similarly we can sort and index planets and observers. To localize Earth among the 8 planets needs 3 bits. To localize yourself among 7 billion humans needs 33 bits.
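The bit counts above are base-2 logarithms of the number of candidates, rounded up. The following minimal Python sketch reproduces them from the rough population figures quoted in the text; the helper name `index_bits` and the variable names are introduced here purely for illustration.

```python
import math

def index_bits(n_candidates: float) -> int:
    """Bits needed to single out one item among n_candidates (hypothetical helper)."""
    return math.ceil(math.log2(n_candidates))

# Rough figures from the text: about 3 stellar generations,
# about 10^11 stars in each of about 10^11 galaxies.
stars_ever = 3 * 1e11 * 1e11

sun_bits = index_bits(stars_ever)   # index of the Sun among all stars that ever existed
earth_bits = index_bits(8)          # index of Earth among 8 planets
human_bits = index_bits(7e9)        # index of one person among 7 billion
total_bits = sun_bits + earth_bits + human_bits

print(sun_bits, earth_bits, human_bits, total_bits)  # 75 3 33 111
```

Under these figures, localizing one person in this way costs on the order of a hundred bits, which is negligible next to the length of any candidate ToE program.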
Alternatively one could simply specify the $(x,y,z,t)$ coordinate of the observer, which requires more but still only very few bits. These localization penalties are tiny compared to the difference in predictive power (to be quantified later) of the various theories (ego/geo/helio/cosmo). This explains and justifies theories of large universes in which we occupy a random location.

5 Complete ToE - Formalization

This section formalizes the CToE selection principle and what counts as a CToE. Universal Turing machines are used to formalize the notion of programs as models for generating our universe and our observations. I also introduce more realistic observers with limited perception ability.

Objective ToE. Since we essentially identify a ToE with a program generating a universe, we need to fix some general purpose programming language on a general purpose computer. In theoretical computer science, the standard model is a so-called Universal Turing Machine (UTM) [LV08]. It takes a program coded as a finite binary string $q \in \{0,1\}^*$, executes it and outputs a finite or infinite binary string $u \in \{0,1\}^* \cup \{0,1\}^\infty$. The details do not matter to us, since the conclusions drawn are typically independent of them. In this section we only consider $q$ with infinite output

$$\mathrm{UTM}(q) = u^q_1 u^q_2 u^q_3 \ldots =: u^q_{1:\infty}$$

In our case, $u^q_{1:\infty}$ will be the space-time universe (or multiverse) generated by ToE candidate $q$. So $q$ incorporates items (i) and (e) of Section 4. Surely our universe doesn't look like a bit string, but it can be coded as one as explained in Sections 2 and 7. We have some simple coding in mind, e.g. $u^q_{1:N}$ being the (fictitious) binary data file of a high-resolution 3D movie of the whole universe from big bang to big crunch, augmented by $u^q_{N+1:\infty} \equiv 0$ if the universe is finite. Again, the details do not matter.
Observational process and subjective complete ToE. As we have demonstrated, it is also important to localize the observer. In order to avoid potential qualms with modeling human observers, consider as a surrogate a (conventional, not extra-cosmic) video camera filming=observing parts of the world. The camera may be fixed on Earth or installed on an autonomous robot. It records part of the universe $u$, denoted by $o = o_{1:\infty}$. (If the lifetime of the observer is finite, we append zeros to the finite observation $o_{1:N}$.)

I only consider direct observations like with a camera. Electrons or atomic decays or quasars are not directly observed, but with some (classical) instrument. It is the indicator or camera image of the instrument that is observed (which physicists then usually interpret). This setup avoids having to deal with any form of informal correspondence between theory and real world, or with subtleties of the quantum-mechanical measurement process. The only philosophical presupposition I make is that it is possible to determine uncontroversially whether two finite binary strings (on paper or file) are the same or differ in some bits.

In a computable universe, the observational process within it is obviously also computable, i.e. there exists a program $s \in \{0,1\}^*$ that extracts observations $o$ from universe $u$. Formally

$$\mathrm{UTM}(s, u^q_{1:\infty}) = o^{sq}_{1:\infty} \qquad (2)$$

where we have extended the definition of UTM to allow access to an extra infinite input stream $u^q_{1:\infty}$. So $o^{sq}_{1:\infty}$ is the sequence observed by subject $s$ in universe $u^q_{1:\infty}$ generated by $q$. Program $s$ contains all information about the location, orientation and perception abilities of the observer/camera, hence specifies not only item (l) but also item (o) of Section 4.

A Complete ToE (CToE) consists of a specification of a (ToE,subject) pair $(q,s)$. Since it includes $s$ it is a Subjective ToE.
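Relation (2) can be mimicked with Python generators: a universe program yields an endless bit stream, and an observer program reads that stream and emits its own. This is only a toy analogy under stated assumptions. The periodic universe, the subsampling observer, and all function names below are invented for illustration and are not part of the formal UTM model.

```python
from itertools import count, islice
from typing import Iterator, List

def universe_q() -> Iterator[int]:
    """Toy universe program q: bit k (1-based) is 1 exactly when k is a multiple of 3."""
    for k in count(1):
        yield 1 if k % 3 == 0 else 0

def observer_s(u: Iterator[int], start: int, step: int) -> Iterator[int]:
    """Toy observer program s: sees every `step`-th bit of u, beginning at 1-based position `start`.

    The pair (start, step) plays the role of the observer's location and perception ability.
    """
    return islice(u, start - 1, None, step)

def observe(t: int) -> List[int]:
    """First t bits of o^{sq}: the observer reads the stream produced by the universe."""
    return list(islice(observer_s(universe_q(), start=1, step=2), t))

print(observe(6))  # [0, 1, 0, 0, 1, 0]
```

The observer never sees the full stream, only the bits at odd positions, so two different universes could yield the same observation sequence for this particular `s`.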
CToE selection principle. So far, $s$ and $q$ were fictitious subjects and universe programs. Let $o^{\text{true}}_{1:t}$ be the past observations of some concrete observer in our universe, e.g. your own personal experience of the world from birth till today. The future observations $o^{\text{true}}_{t+1:\infty}$ are of course unknown. By definition, $o^{\text{true}}_{1:t}$ contains all available experience of the observer, including e.g. outcomes of scientific experiments, school education, books read, etc.

The observation sequence $o^{sq}_{1:\infty}$ generated by a correct CToE must be consistent with the true observations $o^{\text{true}}_{1:t}$. If $o^{sq}_{1:t}$ differed from $o^{\text{true}}_{1:t}$ (in a single bit), the subject would have 'experimental' evidence that $(q,s)$ is not a perfect CToE. We can now formalize the CToE selection principle as follows:

Among a given set of perfect ($o^{sq}_{1:t} = o^{\text{true}}_{1:t}$) CToEs $\{(q,s)\}$ select the one of smallest length

$$\mathrm{Length}(q) + \mathrm{Length}(s) \qquad (3)$$

Minimizing length is motivated by Ockham's razor. Inclusion of $s$ is necessary to avoid degenerate ToEs like (U) and (A). The selected CToE $(q^*,s^*)$ can and should then be used for forecasting future observations via $\ldots o^{\text{forecast}}_{t+1:\infty} = \mathrm{UTM}(s^*, u^{q^*}_{1:\infty})$.

6 Universal ToE - Formalization

The Universal ToE is a sanity critical point in the development of ToEs, and will formally be defined and investigated in this section.

Definition of Universal ToE. The Universal ToE generates all computable universes. The generated multiverse can be depicted as an infinite matrix in which each row corresponds to one universe.

$$\begin{array}{c|l}
q & \mathrm{UTM}(q) \\ \hline
\epsilon & u^\epsilon_1\, u^\epsilon_2\, u^\epsilon_3\, u^\epsilon_4\, u^\epsilon_5 \cdots \\
0 & u^0_1\, u^0_2\, u^0_3\, u^0_4 \cdots \\
1 & u^1_1\, u^1_2\, u^1_3 \cdots \\
00 & u^{00}_1\, u^{00}_2 \cdots \\
\vdots & \vdots
\end{array}$$

To fit this into our framework we need to define a single program $\breve{q}$ that generates a single string corresponding to this matrix.
The standard way to linearize an infinite matrix is to dovetail in diagonal serpentines through the matrix:

$$\breve{u}_{1:\infty} := u^\epsilon_1 u^0_1 u^\epsilon_2 u^\epsilon_3 u^0_2 u^1_1 u^{00}_1 u^1_2 u^0_3 u^\epsilon_4 u^\epsilon_5 u^0_4 u^1_3 u^{00}_2 \ldots$$

Formally, define a bijection $i = \langle q,k \rangle$ between a (program, location) pair $(q,k)$ and the natural numbers $\mathbb{N} \ni i$, and define $\breve{u}_i := u^q_k$. It is not hard to construct an explicit program $\breve{q}$ for UTM that computes $\breve{u}_{1:\infty} = u^{\breve{q}}_{1:\infty} = \mathrm{UTM}(\breve{q})$.

One might think that it would have been simpler or more natural to generalize Turing machines to have matrix "tapes". But this is deceiving. If we allow for Turing machines with matrix output, we should also allow for and enumerate all programs $q$ that have a matrix output. This leads to a 3d tensor that needs to be converted to a 2d matrix, which is no simpler than the linearization above.

Partial ToEs. Cutting the universes into bits and interweaving them into one string might appear messy, but is unproblematic for two reasons: First, the bijection $i = \langle q,k \rangle$ is very simple, so any particular universe string $u^q$ can easily be recovered from $\breve{u}$. Second, such an extraction will be included in the localization / observational process $s$, i.e. $s$ will contain a specification of the relevant universe $q$ and which bits $k$ are to be observed.

More problematic is that many $q$ will not produce an infinite universe. This can be fixed as follows: First, we need to be more precise about what it means for $\mathrm{UTM}(q)$ to write $u^q$. We introduce an extra symbol '#' for 'undefined' and set each bit $u^q_i$ initially to '#'. The UTM running $q$ can output bits in any order but can overwrite each location # only once, either with a 0 or with a 1. We implicitly assumed this model above, and similarly for $s$. Now we (have to) also allow for $q$ that leave some or all bits unspecified.
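The serpentine dovetailing and the bijection $i = \langle q,k \rangle$ described above can be written out explicitly. The Python sketch below is one possible realization, under two assumptions made here for concreteness: programs are enumerated in length-lexicographic order ($\epsilon$, 0, 1, 00, ...) as in the matrix, and the pairing is given in closed form via triangular numbers. The paper only requires that some simple bijection exists; the function names are hypothetical.

```python
from itertools import count, islice
from math import isqrt
from typing import Iterator, Tuple

def program(row: int) -> str:
    """Row index 0, 1, 2, 3, ... -> program '', '0', '1', '00', ... (length-lexicographic)."""
    return bin(row + 1)[3:]  # binary of row+1 without the '0b' prefix and the leading 1

def serpentine() -> Iterator[Tuple[int, int]]:
    """Yield (row, k), 0-based row and 1-based bit location k, walking the anti-diagonals
    in alternating direction."""
    for d in count():
        rows = range(d + 1) if d % 2 == 0 else range(d, -1, -1)
        for row in rows:
            yield row, d - row + 1

def pair(row: int, k: int) -> int:
    """Bijection i = <q,k>: 1-based position of bit k of universe `row` in the interleaved string."""
    d = row + k - 1                          # index of the anti-diagonal
    offset = row if d % 2 == 0 else d - row  # position inside that diagonal
    return d * (d + 1) // 2 + offset + 1

def unpair(i: int) -> Tuple[int, int]:
    """Inverse of pair: recover (row, k) from the 1-based position i."""
    j = i - 1
    d = (isqrt(8 * j + 1) - 1) // 2
    offset = j - d * (d + 1) // 2
    row = offset if d % 2 == 0 else d - offset
    return row, d - row + 1

first = [(program(r) or "ε", k) for r, k in islice(serpentine(), 14)]
print(first)
```

The first 14 pairs printed agree with the order of the interleaved string displayed above, and `unpair` is the extraction step that an observer program would use to recover the bits of one particular universe.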
The interleaving computation $\mathrm{UTM}(s, \mathrm{UTM}(q)) = o$ of $s$ and $q$ works as follows: Whenever $s$ wants to read a bit from $u^q_{1:\infty}$ that $q$ has not (yet) written, control is transferred to $q$ until this bit is written. If it is never written, then $o$ will be only partially defined, but such $s$ are usually not considered. (If the undefined location is before $t$, CToE $(q,s)$ is not perfect, since $o^{\text{true}}_{1:t}$ is completely defined.)

Alternatively one may define a more complex dynamic bijection $\langle \cdot,\cdot \rangle$ that orders the bits in the order they are created, or one resorts to generalized Turing machines [Sch00, Sch02], which can overwrite locations and also greatly increase the set of describable universes. These variants (allow one to) make $\breve{u}$ and all $u^q$ complete (no '#' symbol, tape $\in \{0,1\}^\infty$).

ToE versus UToE. We can formalize the argument of Section 4 about simulating a ToE by UToE as follows: If $(q,s)$ is a CToE, then $(\breve{q}, \tilde{s})$ based on UToE $\breve{q}$ and observer $\tilde{s} := rqs$, where program $r$ extracts $u^q$ from $\breve{u}$ and then $o^{sq}$ from $u^q$, is an equivalent but slightly larger CToE, since $\mathrm{UTM}(\tilde{s}, \breve{u}) = o^{sq} = \mathrm{UTM}(s, u^q)$ by definition of $\tilde{s}$, and $\mathrm{Length}(\breve{q}) + \mathrm{Length}(\tilde{s}) = \mathrm{Length}(q) + \mathrm{Length}(s) + O(1)$.

The best CToE. Finally, one may define the best CToE (of an observer with experience $o^{\text{true}}_{1:t}$) as

$$\mathrm{UCToE} := \arg\min_{q,s} \{ \mathrm{Length}(q) + \mathrm{Length}(s) : o^{sq}_{1:t} = o^{\text{true}}_{1:t} \}$$

where $o^{sq}_{1:\infty} = \mathrm{UTM}(s, \mathrm{UTM}(q))$. This may be regarded as a formalization of the holy grail in physics: finding such a ToE.

7 Extensions

Our CToE selection principle is applicable to perfect, deterministic, discrete, and complete models $q$ of our universe. None of the existing sane world models is of this kind.
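For this idealized case, the arg min that defines the UCToE amounts to a filter (keep the perfect CToEs) followed by a minimization of total length. The Python sketch below assumes a finite candidate list whose program lengths and predicted observation prefixes are simply given. The class, the function, and all numbers are hypothetical, chosen only to echo the qualitative ordering discussed in Section 4 (a multiverse with a tiny $q$ but an enormous $s$, a universal multiverse that is a constant worse than the ToE it simulates, and a theory contradicted by the data).

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CToE:
    name: str
    length_q: int   # Length(q) in bits (hypothetical figure)
    length_s: int   # Length(s) in bits (hypothetical figure)
    predicted: str  # predicted observation prefix o^{sq}_{1:t}

    @property
    def total_length(self) -> int:
        return self.length_q + self.length_s

def select_ctoe(candidates: List[CToE], observed: str) -> Optional[CToE]:
    """Among perfect CToEs (prediction equals observation), return one of smallest total length."""
    perfect = [c for c in candidates if c.predicted == observed]
    return min(perfect, key=lambda c: c.total_length, default=None)

observed = "0110100110"
candidates = [
    CToE("A", length_q=50, length_s=10**9, predicted=observed),       # s must table everything
    CToE("U", length_q=500, length_s=160_500, predicted=observed),    # s contains a whole ToE
    CToE("P", length_q=160_000, length_s=120, predicted=observed),    # conventional ToE + observer
    CToE("W", length_q=100, length_s=10, predicted="0110100111"),     # short, but wrong in one bit
]

best = select_ctoe(candidates, observed)
print(best.name, best.total_length)  # P 160120
```

The shortest objective program (A) and the shortest pair overall (W) are both rejected, the former because of its localization cost and the latter because it is not perfect.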
In this section I extend the CToE selection principle to more realistic, partial, approximate, probabilistic, and/or parametric models for finite, infinite and even continuous universes.

Partial theories. Not all interesting theories are ToEs. Indeed, most theories are only partial models of aspects of our world. We can reduce the problem of selecting good partial theories to CToE selection as follows: Let $o^{\text{true}}_{1:t}$ be the complete observation, and $(q,s)$ be some theory explaining only some observations but not all. For instance, $q$ might be the heliocentric model and $s$ be such that all bits in $o^{\text{true}}_{1:t}$ that correspond to planetary positions are predicted correctly. The other bits in $o^{sq}_{1:t}$ are undefined, e.g. the position of cars. We can augment $q$ with a (huge) table $b$ of all bits for which $o^{sq}_i \neq o^{\text{true}}_i$. Together, $(q,b,s)$ allows one to reconstruct $o^{\text{true}}_{1:t}$ exactly. Hence for two different theories, the one with smaller length

$$\mathrm{Length}(q) + \mathrm{Length}(b) + \mathrm{Length}(s) \qquad (4)$$

should be selected. We can actually spare ourselves from tabling all those bits that are unpredicted by all $q$ under consideration, since they contribute the same overall constant. So when comparing two theories it is sufficient to consider only those observations that are correctly predicted by one (or both) theories.

If two partial theories $(q,s)$ and $(q',s')$ predict the same phenomena equally well (i.e. $o^{sq}_{1:t} = o^{s'q'}_{1:t} \neq o^{\text{true}}_{1:t}$), then $b = b'$ and minimizing (4) reduces to minimizing (3).

Approximate theories. Most theories are not perfect but only approximate reality, even in their limited domain. The geocentric model is less accurate than the heliocentric model, Newton's mechanics approximates general relativity, etc. Approximate theories can be viewed as a version of partial theories.
For example, consider predicting locations of planets, with locations being coded by (truncated) real numbers in binary representation; then Einstein gets more bits right than Newton. The remaining erroneous bits could be tabled as above. Errors are often more subtle than simple bit errors, in which case correction programs rather than just tables are needed.

Celestial example. The ancient celestial models just capture the movement of some celestial bodies, and even those only imperfectly. Nevertheless it is interesting to compare them. Let us take as our corpus of observations $o^{\text{true}}_{1:t}$, say, all astronomical tables available in the year 1600, and ignore all other experience.

The geocentric model $q^G$ more or less directly describes the observations, hence $s^G$ is relatively simple. In the heliocentric model $q^H$ it is necessary to include in $s^H$ a non-trivial coordinate transformation to explain the geocentric astronomical data. Assuming both models were perfect, then, if and only if $q^H$ is simpler than $q^G$ by a margin that is larger than the extra complications due to the coordinate transformation ($\mathrm{Length}(q^G) - \mathrm{Length}(q^H) > \mathrm{Length}(s^H) - \mathrm{Length}(s^G)$), we should regard the heliocentric model as better.

If/since the heliocentric model is more accurate, we have to additionally penalize the geocentric model by the number of bits it doesn't predict correctly. This clearly makes the heliocentric model superior.

Probabilistic theories. Contrary to a deterministic theory that predicts the future from the past for sure, a probabilistic theory assigns to each future a certain chance that it will occur. Equivalently, a deterministic universe is described by some string $u$, while a probabilistic universe is described by some probability distribution $Q(u)$, the a priori probability of $u$.
(In the special case of $Q(u') = 1$ for $u' = u$ and 0 else, $Q$ describes the deterministic universe $u$.) Similarly, the observational process may be probabilistic. Let $S(o|u)$ be the probability of observing $o$ in universe $u$. Together, $(Q,S)$ is a probabilistic CToE that predicts observation $o$ with probability $P(o) = \sum_u S(o|u) Q(u)$. A computable probabilistic CToE is one for which there exist programs (of lengths $\mathrm{Length}(Q)$ and $\mathrm{Length}(S)$) that compute the functions $Q(\cdot)$ and $S(\cdot|\cdot)$.

Consider now the true observation $o^{\text{true}}_{1:t}$. The larger $P(o^{\text{true}}_{1:t})$, the "better" is $(Q,S)$. In the degenerate deterministic case, $P(o^{\text{true}}_{1:t}) = 1$ is maximal for a correct CToE, and 0 for a wrong one. In every other case, $(Q,S)$ is only a partial theory that needs completion, since it does not compute $o^{\text{true}}_{1:t}$. Given $P$, it is possible to code $o^{\text{true}}_{1:t}$ in $|\log_2 P(o^{\text{true}}_{1:t})|$ bits (arithmetic or Shannon-Fano code). Assuming that $o^{\text{true}}_{1:t}$ is indeed sampled from $P$, one can show that with high probability this is the shortest possible code. So there exists an effective description of $o^{\text{true}}_{1:t}$ of length

$$\mathrm{Length}(Q) + \mathrm{Length}(S) + |\log_2 P(o^{\text{true}}_{1:t})| \qquad (5)$$

This expression should be used (minimized) when comparing probabilistic CToEs. The principle is reminiscent of classical two-part Minimum Encoding Length principles like MML and MDL [Wal05, Grü07]. Note that the noise corresponds to the errors and the log term to the error table of the previous paragraphs.

Probabilistic examples. Assume $S(o|o) = 1\ \forall o$ and consider the observation sequence $o^{\text{true}}_{1:t} = u^{\text{true}}_{1:t} = 11001001000011111101101010100$. If we assume this is a sequence of fair coin flips, then $Q(o_{1:t}) = P(o_{1:t}) = 2^{-t}$ are very simple functions, but $|\log_2 P(o_{1:t})| = t$ is large.
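The fair-coin figure can be reproduced in a few lines. The Python sketch below evaluates $P(o) = \sum_u S(o|u) Q(u)$ by brute force for the identity observer and computes the log term of (5). The function names are invented for illustration, and the brute-force sum is only feasible for short strings, so it is demonstrated on a 4-bit prefix.

```python
import math
from itertools import product

OBSERVED = "11001001000011111101101010100"  # the sequence from the text, t = 29

def q_fair_coin(u: str) -> float:
    """Q(u) for a fair coin: every string of length t has probability 2^-t."""
    return 2.0 ** -len(u)

def s_identity(o: str, u: str) -> float:
    """S(o|u) for an observer that sees the universe completely."""
    return 1.0 if o == u else 0.0

def predictive_probability(o: str) -> float:
    """P(o) = sum over u of S(o|u) Q(u), enumerating all u of the same length as o."""
    return sum(
        s_identity(o, "".join(bits)) * q_fair_coin("".join(bits))
        for bits in product("01", repeat=len(o))
    )

def log_term_bits(p: float) -> float:
    """|log2 P(o)|, the number of bits needed to code o given P."""
    return abs(math.log2(p))

def two_part_length(length_q: float, length_s: float, p: float) -> float:
    """Expression (5): Length(Q) + Length(S) + |log2 P(o)|."""
    return length_q + length_s + log_term_bits(p)

print(predictive_probability(OBSERVED[:4]))    # 0.0625, i.e. 2^-4
print(log_term_bits(q_fair_coin(OBSERVED)))    # 29.0 bits, equal to t
```

With the identity observer the sum collapses to the single term $Q(o)$, so the log term grows by exactly one bit per observed bit regardless of how short the programs for $Q$ and $S$ are.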
If we assume that $o^{\text{true}}_{1:t}$ is the binary expansion of $\pi$ (which it is), then the corresponding deterministic $Q$ is somewhat more complex, but $|\log_2 P(o^{\text{true}}_{1:t})| = 0$. So for sufficiently large $t$, the deterministic model of $\pi$ is selected, since it leads to a shorter code (5) than the fair-coin-flip model.

Quantum theory is (argued by physicists to be) truly random. Hence all modern ToE candidates (P+G, S, C, M) are probabilistic. This yields huge additive constants $|\log_2 P(o^{\text{true}}_{1:t})|$ to the otherwise quite elegant theories $Q$. Schmidhuber [Sch97, Sch00] argues that all apparent physical randomness is actually only pseudo random, i.e. generated by a small program. If this is true and we could find the random number generator, we could instantly predict all apparent quantum-mechanical random events. This would be a true improvement of existing theories, and indeed the corresponding CToE would be significantly shorter. In [Hut05, Sec.8.6.2] I give an argument why believing in true random noise may be an unscientific position.

Theories with parameters. Many theories in physics depend on real-valued parameters. Since observations have finite accuracy, it is sufficient to specify these parameters to some finite accuracy. Hence the theories including their finite-precision parameters can be coded in finite length. There are general results and techniques [Wal05, Grü07] that allow a comfortable handling of all this. For instance, for smooth parametric models, a parameter accuracy of $O(1/\sqrt{n})$ is needed, which requires $\frac{1}{2}\log_2 n + O(1)$ bits per parameter. The explicable $O(1)$ term depends on the smoothness of the model and prevents 'cheating' (e.g. zipping two parameters into one).

Infinite and continuous universes. So far we have assumed that each time-slice through our universe can be described in finitely many bits and time is discrete.
Assume our universe were the infinite continuous 3+1 dimensional Minkowski space $\mathbb{R}^4$ occupied by (tiny) balls ("particles"). Consider all points $(x,y,z,t) \in \mathbb{R}^4$ with rational coordinates, and let $i = \langle x,y,z,t \rangle$ be a bijection to the natural numbers similar to the dovetailing in Section 6. Let $u_i = 1$ if $(x,y,z,t)$ is occupied by a particle and 0 otherwise. String $u_{1:\infty}$ is an exact description of this universe. The above idea generalizes to any so-called separable mathematical space. Since all spaces occurring in established physical theories are separable, there is currently no ToE candidate that requires uncountable universes. Maybe continuous theories are just convenient approximations of deeper discrete theories. An even more fundamental argument put forward in this context by [Sch00] is that the Löwenheim-Skolem theorem (an apparent paradox) implies that Zermelo-Fraenkel set theory (ZFC) has a countably infinite model. Since all physical theories so far are formalizable in ZFC, it follows that they all have a countable model. For some strange reason (possibly a historical artifact), the adopted uncountable interpretation just seems more convenient.

Multiple theories. Some proponents of pluralism and some opponents of reductionism argue that we need multiple theories on multiple scales for different (overlapping) application domains. They argue that a ToE is not desirable and/or not possible. Here I give a reason why we need one single fundamental theory (with all other theories having to be regarded as approximations): Consider two theories ($T_1$ and $T_2$) with (proclaimed) application domains $A_1$ and $A_2$, respectively.

If predictions of $T_1$ and $T_2$ coincide on their intersection $A_1 \cap A_2$ (or if $A_1$ and $A_2$ are disjoint), we can trivially "unify" $T_1$ and $T_2$ to one theory $T$ by taking their union.
Of course, if this does not result in any simplification, i.e. if $\mathrm{Length}(T) = \mathrm{Length}(T_1) + \mathrm{Length}(T_2)$, we gain nothing. But since nearly all modern theories have some common basis, e.g. use natural or real numbers, a formal unification of the generating programs nearly always leads to $\mathrm{Length}(q) < \mathrm{Length}(q_1) + \mathrm{Length}(q_2)$.

The interesting case is when $T_1$ and $T_2$ lead to different forecasts on $A_1 \cap A_2$. For instance, particle versus wave theory with the atomic world at their intersection, unified by quantum theory. Then we need a reconciliation of $T_1$ and $T_2$, that is, a single theory $T$ for $A_1 \cup A_2$. Ockham's razor tells us to choose a simple (elegant) unification. This rules out naive/ugly/complex solutions like developing a third theory for $A_1 \cap A_2$, or attributing parts of $A_1 \cap A_2$ to $T_1$ or $T_2$ as one sees fit, or averaging the predictions of $T_1$ and $T_2$. Of course $T$ must be consistent with the observations.

Pluralism on a meta level, i.e. allowing besides Ockham's razor other principles for selecting theories, has the same problem on a meta-level: which principle should one use in a concrete situation? To argue that this (or any other) problem cannot be formalized/quantized/mechanized would be (close to) an anti-scientific attitude.

8 Justification of Ockham's Razor

We now prove Ockham's razor under the assumptions stated below and compare it to the No Free Lunch myth. The result itself is not novel [Sch00]. The intention and contribution is to provide an elementary but still sufficiently formal argument, which in particular is free of more sophisticated concepts like Solomonoff's a-priori distribution.

Ockham's razor principle demands to "take the simplest theory consistent with the observations".
Ockham's razor could be regarded as correct if among all considered theories, the one selected by Ockham's razor is the one that most likely leads to correct predictions.

Assumptions. Assume we live in the universal multiverse $\breve{u}$ that consists of all computable universes, i.e. UToE is a correct/true/perfect ToE. Since every computable universe is contained in UToE, it is, at least under the computability assumption, impossible to disprove this assumption. The second assumption we make is that our location in the multiverse is random. We can divide this into two steps: First, the universe $u^q$ in which we happen to be is chosen randomly. Second, our "location" $s$ within $u^q$ is chosen at random. We call these the universal self-sampling assumption. The crucial difference to the informal anthropic self-sampling assumption used in doomsday arguments is discussed below.

Recall the observer program $\tilde{s} := rqs$ introduced in Section 6. We will make the simplifying assumption that $s$ is the identity, i.e. restrict ourselves to "objective" observers that observe their universe completely: $\mathrm{UTM}(\tilde{s}, \breve{u}) = o^{sq} = \mathrm{UTM}(s, u^q) = u^q = \mathrm{UTM}(q)$. Formally, the universal self-sampling assumption can be stated as follows:

A priori it is equally likely to be in any of the universes $u^q$ generated by some program $q \in \{0,1\}^*$.

To be precise, we consider all programs with length bounded by some constant $L$, and take the limit $L \to \infty$.

Counting consistent universes. Let $o^{\text{true}}_{1:t} = u^{\text{true}}_{1:t}$ be the universe observed so far and

$$Q_L := \{ q : \mathrm{Length}(q) \le L \text{ and } \mathrm{UTM}(q) = u^{\text{true}}_{1:t}* \}$$

be the set of all consistent universes (which is non-empty for large $L$), where $*$ is any continuation of $u^{\text{true}}_{1:t}$.
Given $u^{\text{true}}_{1:t}$, we know we are in one of the universes in $Q_L$, which implies by the universal self-sampling assumption a uniform sampling in $Q_L$. Let

$$q_{\min} := \arg\min_q \{ \mathrm{Length}(q) : q \in Q_L \} \quad\text{and}\quad l := \mathrm{Length}(q_{\min})$$

be the shortest consistent $q$ and its length, respectively. Adding (unread) "garbage" $g$ after the end of a program $q$ does not change its behavior, i.e. if $q \in Q_L$, then also $qg \in Q_L$ provided that $\mathrm{Length}(qg) \le L$. Hence for every $g$ with $\mathrm{Length}(g) \le L - l$, we have $q_{\min}g \in Q_L$. Since there are about $2^{L-l}$ such $g$, we have $|Q_L| \gtrsim 2^{L-l}$. It is a deep theorem in algorithmic information theory [LV08] that there are also not significantly more than $2^{L-l}$ programs $q$ equivalent to $q_{\min}$. The proof idea is as follows: One can show that if there are many long equivalent programs, then there must also be a short one. In our case the shortest one is $q_{\min}$, which upper bounds the number of long programs. Together this shows that

$$|Q_L| \approx 2^{L-l}$$

Probabilistic prediction. Given observations $u^{\text{true}}_{1:t}$ we now determine the probability of being in a universe that continues with $u_{t+1:n}$, where $n > t$. Similarly to the previous paragraph we can approximately count the number of such universes:

$$Q^n_L := \{ q : \mathrm{Length}(q) \le L \text{ and } \mathrm{UTM}(q) = u^{\text{true}}_{1:t} u_{t+1:n} * \} \subset Q_L$$
$$q^n_{\min} := \arg\min_q \{ \mathrm{Length}(q) : q \in Q^n_L \} \quad\text{and}\quad l_n := \mathrm{Length}(q^n_{\min})$$
$$|Q^n_L| \approx 2^{L-l_n}$$

The probability of being in a universe with future $u_{t+1:n}$ given $u^{\text{true}}_{1:t}$ is determined by their relative number

$$P(u_{t+1:n} \mid u^{\text{true}}_{1:t}) = \frac{|Q^n_L|}{|Q_L|} \approx 2^{-(l_n - l)} \qquad (6)$$

which is (asymptotically) independent of $L$.

Ockham's razor. Relation (6) implies that the most likely continuation $\hat{u}_{t+1:n} := \arg\max_{u_{t+1:n}} P(u_{t+1:n} \mid u^{\text{true}}_{1:t})$ is (approximately) the one that minimizes $l_n$.
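The counting argument behind (6) can be checked by brute force on a toy machine. The sketch below is only an illustration under strong assumptions: the machine is invented here and is far from universal (a program is a header $1^k0$, then $k$ pattern bits that are output periodically, then unread garbage), and programs of length exactly $L$ are counted rather than of length $\le L$, a variant the technical details later in this section treat as equivalent.

```python
from collections import Counter
from itertools import product
from typing import Optional, Tuple

def run(program: str, n: int) -> Optional[str]:
    """Toy machine: header 1^k 0, then k pattern bits, then ignored garbage.

    Returns the first n output bits (the pattern repeated), or None if malformed.
    """
    k = program.find("0")                 # number of leading 1s, -1 if there is no 0
    pattern = program[k + 1 : 2 * k + 1]
    if k < 1 or len(pattern) < k:
        return None
    return (pattern * (n // k + 1))[:n]

def continuation_counts(observed: str, n: int, L: int) -> Counter:
    """Count programs of length L whose output starts with `observed`, grouped by the
    continuation they produce up to position n."""
    counts: Counter = Counter()
    for bits in product("01", repeat=L):
        out = run("".join(bits), n)
        if out is not None and out.startswith(observed):
            counts[out[len(observed):]] += 1
    return counts

def shortest_consistent(observed: str, n: int, L: int) -> Optional[Tuple[str, str]]:
    """Shortest program (and its first n output bits) consistent with `observed`."""
    for length in range(1, L + 1):
        for bits in product("01", repeat=length):
            out = run("".join(bits), n)
            if out is not None and out.startswith(observed):
                return "".join(bits), out
    return None

counts = continuation_counts("0101", n=6, L=12)
q_min, out_min = shortest_consistent("0101", n=6, L=12)
total = sum(counts.values())
print(q_min, len(q_min))                       # 11001 5
print(total, 2 ** (12 - len(q_min)))           # 140 128
print({u: c / total for u, c in counts.items()})
```

Of the 140 consistent programs, 128 are `q_min` followed by garbage, so $|Q_L|$ is within a small factor of $2^{L-l}$, and the continuation produced by the shortest program ("01") receives 136/140 of the probability mass.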
By definition, $q_{\min}$ is the shortest program in $Q_L = \bigcup_{u_{t+1:n}} Q^n_L$. Therefore

$$P(\hat{u}_{t+1:n} \mid u^{\text{true}}_{1:t}) \approx P(u^{q_{\min}}_{t+1:n} \mid u^{\text{true}}_{1:t})$$

The accuracy of $\approx$ is clarified later. In words:

We are most likely in a universe that is (equivalent to) the simplest universe consistent with our past observations.

This shows that Ockham's razor selects the theory that most likely leads to correct predictions, and hence proves (under the stated assumptions) that Ockham's razor is correct.

Ockham's razor is correct under the universal self-sampling assumption.

Discussion. It is important to note that the universal self-sampling assumption does not by itself have any bias towards simple models $q$. Indeed, most $q$ in $Q_L$ have length close to $L$, and since we sample uniformly from $Q_L$ this actually represents a huge bias towards large models for $L \to \infty$.

The result is also largely independent of the uniform sampling assumption. For instance, sampling a length $l \in \mathbb{N}$ w.r.t. any reasonable (i.e. slower than exponentially decreasing) distribution and then $q$ of length $l$ uniformly leads to the same conclusion.

How reasonable is the UToE? We have already discussed that it is nearly but not quite as good as any other correct ToE. The philosophical, albeit not practical, advantage of UToE is that it is a safer bet, since we can never be sure about the future correctness of a more specific ToE. An a priori argument in favor of UToE is as follows: What is the best candidate for a ToE before, i.e. in absence of, any observations? If somebody (but how and who?) would tell us that the universe is computable but nothing else, universal self-sampling seems like a reasonable a priori UToE.

Comparison to anthropic self-sampling. Our universal self-sampling assumption is related to anthropic self-sampling [Bos02] but crucially different. The anthropic self-sampling assumption states that a priori you are equally likely any of the (human) observers in our universe. First, we sample from any universe and any location (living or dead) in the multiverse and not only among human (or reasonably intelligent) observers. Second, we have no problem of what counts as a reasonable (human) observer. Third, our principle is completely formal. Nevertheless the principles are related since (see inclusion of $s$) given $o^{\text{true}}_{1:t}$ we also sample from the set of reasonable observers, since $o^{\text{true}}_{1:t}$ includes snapshots of other (human) observers.

No Free Lunch (NFL) myth. Wolpert [WM97] considers algorithms for finding the minimum of a function, and compares their average performance. The simplest performance measure is the number of function evaluations needed to find the global minimum. The average is taken uniformly over the set of all functions from and to some fixed finite domain. Since sampling uniformly leads with (very) high probability to a totally random function (white noise), it is clear that on average no optimization algorithm can perform better than exhaustive search, and no reasonable algorithm (that is, one that probes every function argument at most once) performs worse. That is, all reasonable optimization algorithms are equally bad on average. This is the essence of Wolpert's NFL theorem and all variations thereof I am aware of, including the ones for less uniform distributions.

While NFL theorems are cute observations, they are obviously irrelevant, since nobody cares about the minimum of white noise functions. Despite NFL being the holy grail in some research communities, the NFL myth has little to no practical implication [Sto01].

An analogue of NFL for prediction would be as follows: Let $u_{1:n} \in \{0,1\}^n$ be uniformly sampled, i.e. the probability of $u_{1:n}$ is $\lambda(u_{1:n}) = 2^{-n}$.
Given $u^{\text{true}}_{1:t}$ we want to predict $u_{t+1:n}$. Let $u^p_{t+1:n}$ be any deterministic prediction. It is clear that all deterministic predictors $p$ are on average equally bad (w.r.t. symmetric performance measures) in predicting uniform noise ($\lambda(u^p_{t+1:n} \mid u^{\text{true}}_{1:t}) = 2^{-(n-t)}$).

How does this compare to the positive result under universal self-sampling? There we also used a uniform distribution, but over effective models=theories=programs. A priori we assumed all programs to be equally likely, but the resulting universe distribution is far from uniform. Phrased differently, we piped uniform noise (via $M$, see below) through a universal Turing machine. We assume a universal distribution $M$, rather than a uniform distribution $\lambda$. Just assuming that the world has any effective structure breaks NFL down, and makes Ockham's razor work [SH02]. The assumption that the world has some structure is as safe as (or I think even weaker than) the assumption that e.g. classical logic is good for reasoning about the world (and the latter one has to assume to make science meaningful).

Some technical details*. Readers not familiar with Algorithmic Information Theory might want to skip this paragraph. $P(u)$ in (6) tends for $L \to \infty$ to Solomonoff's a priori distribution $M(u)$. In the definition of $M$ [Sol64] only programs of length $= L$, rather than $\le L$ are considered, but since $\lim_{L\to\infty} \frac{1}{L}\sum_{l=1}^{L} a_l = \lim_{L\to\infty} a_L$ if the latter exists, they are equivalent. Modern definitions involve a $2^{-\ell(q)}$-weighted sum of prefix programs, which is also equivalent [LV08]. Finally, $M(u)$ is also equal to the probability that a universal monotone Turing machine with uniform random noise on the input tape outputs a string starting with $u$ [Hut05]. Further, $l \equiv \text{Length}(q_{\min}) = Km(u)$ is the monotone complexity of $u := u^{\text{true}}_{1:t}$.
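The difference between the uniform distribution $\lambda$ and a distribution obtained by feeding uniform random bits into a machine can be made concrete with a small sketch. The machine below is a hypothetical and deliberately non-universal stand-in: it can only emit periodic strings, so it illustrates the mechanism and is not $M$ itself. The names `toy_machine` and `output_distribution`, as well as the 2-bit header encoding, are assumptions made purely for this illustration.

```python
from fractions import Fraction
from itertools import product


def toy_machine(tape, n):
    """Hypothetical toy machine (NOT universal).

    The first two input bits encode a pattern length k in {1, 2, 3, 4};
    the next k bits are the pattern, which is output periodically.
    Any remaining input bits are ignored.
    """
    k = 1 + 2 * tape[0] + tape[1]
    pattern = tape[2:2 + k]
    return "".join(str(pattern[i % k]) for i in range(n))


def output_distribution(n, tape_len=6):
    """Exact distribution of the n-bit output when the input tape
    holds tape_len uniformly random bits (tape_len must be >= 6)."""
    weight = Fraction(1, 2 ** tape_len)
    dist = {}
    for tape in product((0, 1), repeat=tape_len):
        out = toy_machine(tape, n)
        dist[out] = dist.get(out, Fraction(0)) + weight
    return dist


n = 8
dist = output_distribution(n)
uniform = Fraction(1, 2 ** n)  # lambda(u) for every 8-bit string u

print("all zeros   :", dist["00000000"], "vs uniform", uniform)
print("alternating :", dist["01010101"], "vs uniform", uniform)
print("aperiodic   :", dist.get("01101001", Fraction(0)), "vs uniform", uniform)
```

In this toy setting the all-zeros string receives probability 15/64 instead of 1/256, because several short programs produce it, while a string with no period up to 4 receives probability 0. The latter is an artifact of the toy machine: a universal machine assigns positive probability to every finite string.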
It is a deep result in Algorithmic Information Theory that $Km(u) \approx -\log_2 M(u)$. For most $u$ equality holds within an additive constant, but for some $u$ only within logarithmic accuracy [LV08]. Taking the ratio of $M(u) \approx 2^{-Km(u)}$ for $u = u^{\text{true}}_{1:t} u_{t+1:n}$ and $u = u^{\text{true}}_{1:t}$ yields (6). The argument/result is not only technical but also subtle: Not only are there $2^{L-l}$ programs equivalent to $q_{\min}$ but there are also "nearly" $2^{L-l}$ programs that lead to totally different predictions. Luckily they don't harm probabilistic predictions based on $P$, and seldom affect deterministic predictions based on $q_{\min}$ in practice but can do so in theory [Hut06b]. One can avoid this problem by augmenting Ockham's razor with Epicurus' principle of multiple explanations, taking all theories consistent with the observations but weighing them according to their length. See [LV08, Hut05] for details.

9 Discussion

Summary. I have demonstrated that a theory that perfectly describes our universe or multiverse, rather than being a Theory of Everything (ToE), might also be a theory of nothing. I have shown that a predictively meaningful theory can be obtained if the theory is augmented by the localization of the observer. This resulted in a truly Complete Theory of Everything (CToE), which consists of a conventional (objective) ToE plus a (subjective) observer process. Ockham's razor quantified in terms of code-length minimization has been invoked to select the "best" theory (UCToE).

Assumptions. The construction of the subjective complete theory of everything rested on the following assumptions: (i) The observer's experience of the world consists of a single temporal binary sequence $o^{\text{true}}_{1:t}$. All other physical and epistemological concepts are derived. (ii) There exists an objective world independent of any particular observer in it. (iii) The world is computable, i.e. there exists an algorithm (a finite binary string) which when executed outputs the whole space-time universe. This assumption implicitly assumes (i.e. implies) that temporally stable binary strings exist. (iv) The observer is a computable process within the objective world. (v) The algorithms for universe and observer are chosen at random, which I called the universal self-sampling assumption.

Implications. As demonstrated, under these assumptions, the scientific quest for a theory of everything can be formalized. As a side result, this allows one to separate objective knowledge $q$ from subjective knowledge $s$. One might even try to argue that if $q$ for the best $(q,s)$ pair is non-trivial, this is evidence for the existence of an objective reality. Another side result is that there is no hard distinction between a universe and a multiverse; the difference is qualitative and semantic. Last but not least, another implication is the validity of Ockham's razor.

Conclusion. Respectable researchers, including Nobel Laureates, have dismissed and embraced each single model of the world mentioned in Section 2, at different times in history and concurrently. (Excluding All-a-Carte ToEs which I haven't seen discussed before.) As I have shown, Universal ToE is the sanity critical point. The most popular (pseudo) justifications of which theories are (in)sane have been references to the dogmatic Bible, Popper's limited falsifiability principle, and wrong applications of Ockham's razor. This paper contained a more serious treatment of world model selection. I introduced and discussed the usefulness of a theory in terms of predictive power based on model and observer localization complexity.

References

[Alc06] N. Alchin. Theory of Knowledge. John Murray Press, 2nd edition, 2006.
[BDH04] J. D. Barrow, P. C. W. Davies, and C. L. Harper, editors. Science and Ultimate Reality. Cambridge University Press, 2004.
[BH97] A. Blumhofer and M. Hutter. Family structure from periodic solutions of an improved gap equation. Nuclear Physics, B484:80–96, 1997. Missing figures in B494 (1997) 485.
[Bos02] N. Bostrom. Anthropic Bias. Routledge, 2002.
[BT86] J. Barrow and F. Tipler. The Anthropic Cosmological Principle. Oxford Univ. Press, 1986.
[Gar01] M. Gardner. A skeptical look at Karl Popper. Skeptical Inquirer, 25(4):13–14,72, 2001.
[GM94] M. Gell-Mann. The Quark and the Jaguar: Adventures in the Simple and the Complex. W.H. Freeman & Company, 1994.
[Grü07] P. D. Grünwald. The Minimum Description Length Principle. The MIT Press, Cambridge, 2007.
[Har00] E. Harrison. Cosmology: The Science of the Universe. Cambridge University Press, 2nd edition, 2000.
[Hut96] M. Hutter. Instantons in QCD: Theory and application of the instanton liquid model. PhD thesis, Faculty for Theoretical Physics, LMU Munich, 1996. Translation from the German original. arXiv:hep-ph/0107098.
[Hut05] M. Hutter. Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Springer, Berlin, 2005.
[Hut06a] M. Hutter. Human knowledge compression prize, 2006. Open ended, http://prize.hutter1.net/.
[Hut06b] M. Hutter. Sequential predictions based on algorithmic complexity. Journal of Computer and System Sciences, 72(1):95–117, 2006.
[Hut07] M. Hutter. On universal prediction and Bayesian confirmation. Theoretical Computer Science, 384(1):33–48, 2007.
[LV08] M. Li and P. M. B. Vitányi. An Introduction to Kolmogorov Complexity and its Applications. Springer, Berlin, 3rd edition, 2008.
[Sch97] J. Schmidhuber. A computer scientist's view of life, the universe, and everything. In Foundations of Computer Science: Potential - Theory - Cognition, volume 1337 of LNCS, pages 201–208. Springer, Berlin, 1997.
[Sch00] J. Schmidhuber. Algorithmic theories of everything. Report IDSIA-20-00, arXiv:quant-ph/0011122, IDSIA, Manno (Lugano), Switzerland, 2000.
[Sch02] J. Schmidhuber. Hierarchies of generalized Kolmogorov complexities and nonenumerable universal measures computable in the limit. International Journal of Foundations of Computer Science, 13(4):587–612, 2002.
[SH02] J. Schmidhuber and M. Hutter. Universal learning algorithms and optimal search. NIPS 2001 Workshop, 2002. http://www.hutter1.net/idsia/nipsws.htm.
[Smo04] L. Smolin. Scientific alternatives to the anthropic principle. Technical Report hep-th/0407213, arXiv, 2004.
[Sol64] R. J. Solomonoff. A formal theory of inductive inference: Parts 1 and 2. Information and Control, 7:1–22 and 224–254, 1964.
[Sto82] D. C. Stove. Popper and After: Four Modern Irrationalists. Pergamon Press, 1982.
[Sto01] D. Stork. Foundations of Occam's razor and parsimony in learning. NIPS 2001 Workshop, 2001. http://www.rii.ricoh.com/~stork/OccamWorkshop.html.
[Teg04] M. Tegmark. Parallel universes. In Science and Ultimate Reality, pages 459–491. Cambridge University Press, 2004.
[Teg08] M. Tegmark. The mathematical universe. Foundations of Physics, 38(2):101–150, 2008.
[Wal05] C. S. Wallace. Statistical and Inductive Inference by Minimum Message Length. Springer, Berlin, 2005.
[WM97] D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.

A List of Notation

G,H,E,P,S,C,M,U,R,A,...        specific models/theories defined in Section 2
$T \in \{$G,H,E,P,S,C,M,U,R,A,...$\}$        theory/model
ToE        Theory of Everything (in any sense)
ToE candidate        a theory that might be a partial or perfect or wrong ToE
UToE        Universal ToE
CToE        Complete ToE (i+e+l+n+o)
UCToE        Universal Complete ToE
theory        model which can explain ≈ describe ≈ predict ≈ compress observations
universe        typically refers to visible/observed universe
multiverse        un- or only weakly connected collection of universes
predictive power        precision and coverage
precision        the accuracy of a theory
coverage        how many phenomena a theory can explain/predict
prediction        refers to unseen, usually future observations
computability        assumption that our universe is computable
$q^T \in \{0,1\}^*$        the program that generates the universe modeled by theory $T$
$u^q \in \{0,1\}^\infty$        the universe generated by program $q$: $u^q = \text{UTM}(q)$
UTM        Universal Turing Machine
$s \in \{0,1\}^*$        observation model/program; extracts $o$ from $u$
$o^{qs} \in \{0,1\}^\infty$        subject $s$'s observations in universe $u^q$: $o^{qs} = \text{UTM}(s, u^q)$
$o^{\text{true}}_{1:t}$        true past observations
$\breve{q}, \breve{u}$        program and universe of UToE
$S(o|u)$        probability of observing $o$ in universe $u$
$Q(u)$        probability of universe $u$ (according to some prob. theory $T$)
$P(o)$        probability of observing $o$