Why Philosophers Should Care About Computational Complexity

Scott Aaronson
Abstract

One might think that, once we know something is computable, how efficiently it can be computed is a practical question with little further philosophical importance. In this essay, I offer a detailed case that one would be wrong. In particular, I argue that computational complexity theory—the field that studies the resources (such as time, space, and randomness) needed to solve computational problems—leads to new perspectives on the nature of mathematical knowledge, the strong AI debate, computationalism, the problem of logical omniscience, Hume's problem of induction, Goodman's grue riddle, the foundations of quantum mechanics, economic rationality, closed timelike curves, and several other topics of philosophical interest. I end by discussing aspects of complexity theory itself that could benefit from philosophical analysis.

[*] MIT. Email: aaronson@csail.mit.edu. This material is based upon work supported by the National Science Foundation under Grant No. 0844626. Also supported by a DARPA YFA grant, the Sloan Foundation, and a TIBCO Chair.

Contents

1 Introduction
  1.1 What This Essay Won't Cover
2 Complexity 101
3 The Relevance of Polynomial Time
  3.1 The Entscheidungsproblem Revisited
  3.2 Evolvability
  3.3 Known Integers
  3.4 Summary
4 Computational Complexity and the Turing Test
  4.1 The Lookup-Table Argument
  4.2 Relation to Previous Work
  4.3 Can Humans Solve NP-Complete Problems Efficiently?
  4.4 Summary
5 The Problem of Logical Omniscience
  5.1 The Cobham Axioms
  5.2 Omniscience Versus Infinity
  5.3 Summary
6 Computationalism and Waterfalls
  6.1 "Reductions" That Do All The Work
7 PAC-Learning and the Problem of Induction
  7.1 Drawbacks of the Basic PAC Model
  7.2 Computational Complexity, Bleen, and Grue
8 Quantum Computing
  8.1 Quantum Computing and the Many-Worlds Interpretation
9 New Computational Notions of Proof
  9.1 Zero-Knowledge Proofs
  9.2 Other New Notions
10 Complexity, Space, and Time
  10.1 Closed Timelike Curves
  10.2 The Evolutionary Principle
  10.3 Closed Timelike Curve Computation
11 Economics
  11.1 Bounded Rationality and the Iterated Prisoners' Dilemma
  11.2 The Complexity of Equilibria
12 Conclusions
  12.1 Criticisms of Complexity Theory
  12.2 Future Directions
13 Acknowledgments

1 Introduction

The view that machines cannot give rise to surprises is due, I believe, to a fallacy to which philosophers and mathematicians are particularly subject. This is the assumption that as soon as a fact is presented to a mind all consequences of that fact spring into the mind simultaneously with it. It is a very useful assumption under many circumstances, but one too easily forgets that it is false. —Alan M. Turing [126]

The theory of computing, created by Alan Turing, Alonzo Church, Kurt Gödel, and others in the 1930s, didn't only change civilization; it also had a lasting impact on philosophy. Indeed, clarifying philosophical issues was the original point of their work; the technological payoffs only came later! Today, it would be hard to imagine a serious discussion about (say) the philosophy of mind, the foundations of mathematics, or the prospects of machine intelligence that was uninformed by this revolution in human knowledge three-quarters of a century ago.

However, as computers became widely available starting in the 1960s, computer scientists increasingly came to see computability theory as not asking quite the right questions. For almost all the problems we actually want to solve turn out to be computable in Turing's sense; the real question is which problems are efficiently or feasibly computable. The latter question gave rise to a new field, called computational complexity theory (not to be confused with the "other" complexity theory, which studies complex systems such as cellular automata).
Since the 1970s, computational complexity theory has witnessed some spectacular discoveries, which include NP-completeness, public-key cryptography, new types of mathematical proof (such as probabilistic, interactive, and zero-knowledge proofs), and the theoretical foundations of machine learning and quantum computation. To people who work on these topics, the work of Gödel and Turing may look in retrospect like just a warmup to the "big" questions about computation. Because of this, I find it surprising that complexity theory has not influenced philosophy to anything like the extent computability theory has.

The question arises: why hasn't it? Several possible answers spring to mind: maybe computability theory just had richer philosophical implications. (Though as we'll see, one can make a strong case for exactly the opposite.) Maybe complexity has essentially the same philosophical implications as computability, and computability got there first. Maybe outsiders are scared away from learning complexity theory by the "math barrier." Maybe the explanation is social: the world where Gödel, Turing, Wittgenstein, and Russell participated in the same intellectual conversation vanished with World War II; after that, theoretical computer science came to be driven by technology and lost touch with its philosophical origins. Maybe recent advances in complexity theory simply haven't had enough time to enter philosophical consciousness.

However, I suspect that part of the answer is just complexity theorists' failure to communicate what they can add to philosophy's conceptual arsenal. Hence this essay, whose modest goal is to help correct that failure, by surveying some aspects of complexity theory that might interest philosophers, as well as some philosophical problems that I think a complexity perspective can clarify.
To forestall misunderstandings, let me add a note of humility before going further. This essay will touch on many problems that philosophers have debated for generations, such as strong AI, the problem of induction, the relation between syntax and semantics, and the interpretation of quantum mechanics. In none of these cases will I claim that computational complexity theory "dissolves" the philosophical problem—only that it contributes useful perspectives and insights. I'll often explicitly mention philosophical puzzles that I think a complexity analysis either leaves untouched or else introduces itself. But even where I don't do so, one shouldn't presume that I think there are no such puzzles! Indeed, one of my hopes for this essay is that computer scientists, mathematicians, and other technical people who read it will come away with a better appreciation for the subtlety of some of the problems considered in modern analytic philosophy.[1]

[1] When I use the word "philosophy" in this essay, I'll mean philosophy within the analytic tradition. I don't understand Continental or Eastern philosophy well enough to say whether they have any interesting connections with computational complexity theory.

1.1 What This Essay Won't Cover

I won't try to discuss every possible connection between computational complexity and philosophy, or even every connection that's already been made. A small number of philosophers have long invoked computational complexity ideas in their work; indeed, the "philpapers archive" lists 32 papers under the heading Computational Complexity.[2] The majority of those papers prove theorems about the computational complexities of various logical systems. Of the remaining papers, some use "computational complexity" in a different sense than I do—for example, to encompass
computability theory—and some invoke the concept of computational complexity, but no particular results from the field devoted to it. Perhaps the closest in spirit to this essay are the interesting articles by Cherniak [40] and Morton [97]. In addition, many writers have made some version of the observations in Section 4, about computational complexity and the Turing Test: see for example Block [30], Parberry [101], Levesque [87], and Shieber [116].

[2] See philpapers.org/browse/computational-complexity

In deciding which connections to include in this essay, I adopted the following ground rules:

(1) The connection must involve a "properly philosophical" problem—for example, the justification for induction or the nature of mathematical knowledge—and not just a technical problem in logic or model theory.

(2) The connection must draw on specific insights from the field of computational complexity theory: not just the idea of complexity, or the fact that there exist hard problems.

There are many philosophically-interesting ideas in modern complexity theory that this essay mentions only briefly or not at all. One example is pseudorandom generators (see Goldreich [63]): functions that convert a short random "seed" into a long string of bits that, while not truly random, is so "random-looking" that no efficient algorithm can detect any regularities in it. While pseudorandom generators in this sense are not yet proved to exist,[3] there are many plausible candidates, and the belief that at least some of the candidates work is central to modern cryptography. (Section 7.1 will invoke the related concept of pseudorandom functions.)
A second example is fully homomorphic encryption: an extremely exciting new class of methods, the first of which was announced by Gentry [60] in 2009, for performing arbitrary computations on encrypted data without ever decrypting the data. The output of such a computation will look like meaningless gibberish to the person who computed it, but it can nevertheless be understood (and even recognized as the correct output) by someone who knows the decryption key. What are the implications of pseudorandom generators for the foundations of probability, or of fully homomorphic encryption for debates about the semantic meaning of computations? I very much hope that this essay will inspire others to tackle these and similar questions.

Outside of computational complexity, there are at least three other major intersection points between philosophy and modern theoretical computer science. The first one is the semantics of programming languages, which has large and obvious connections to the philosophy of language.[4] The second is distributed systems theory, which provides both an application area and a rich source of examples for philosophical work on reasoning about knowledge (see Fagin et al. [53] and Stalnaker [123]). The third is Kolmogorov complexity (see Li and Vitányi [89]), which studies the length of the shortest computer program that achieves some functionality, disregarding time, memory, and other resources used by the program.[5] In this essay, I won't discuss any of these connections, except in passing (for example, Section 5 touches on logics of knowledge in the context of the "logical omniscience problem," and Section 7 touches on Kolmogorov complexity in the context of PAC-learning). In defense of these omissions, let me offer four excuses. First, these other connections fall outside my stated topic.
Second, they would make this essay even longer than it already is. Third, I lack the requisite background. And fourth, my impression is that philosophers—at least some philosophers—are already more aware of these other connections than they are of the computational complexity connections that I want to explain.

[3] The conjecture that pseudorandom generators exist implies the P ≠ NP conjecture (about which more later), but might be even stronger: the converse implication is unknown.

[4] The Stanford Encyclopedia of Philosophy entry on "The Philosophy of Computer Science," plato.stanford.edu/entries/computer-science, devotes most of its space to this connection.

[5] A variant, "resource-bounded Kolmogorov complexity," does take time and memory into account, and is part of computational complexity theory proper.

2 Complexity 101

Computational complexity theory is a huge, sprawling field; naturally this essay will only touch on small parts of it. Readers who want to delve deeper into the subject are urged to consult one of the many outstanding textbooks, such as those of Sipser [122], Papadimitriou [100], Moore and Mertens [95], Goldreich [62], or Arora and Barak [15]; or survey articles by Wigderson [133, 134], Fortnow and Homer [58], or Stockmeyer [124].

One might think that, once we know something is computable, whether it takes 10 seconds or 20 seconds to compute is obviously the concern of engineers rather than philosophers. But that conclusion would not be so obvious, if the question were one of 10 seconds versus 10^10^10 seconds! And indeed, in complexity theory, the quantitative gaps we care about are usually so vast that one has to consider them qualitative gaps as well. Think, for example, of the difference between reading a 400-page book and reading every possible such book, or between writing down a thousand-digit number and counting to that number.
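The gap between writing down a thousand-digit number and counting to it can be made concrete with a few lines of arithmetic. This is a throwaway sketch, and the rate of 10^9 increments per second is an assumption chosen for illustration:

```python
# Writing a thousand-digit number takes ~1000 steps; counting to it takes
# ~10^1000 steps. Assume (generously) 10^9 increments per second.
digits = 1000
steps_to_write = digits                            # one step per digit
steps_to_count = 10 ** digits                      # roughly the number itself
seconds_per_year = 60 * 60 * 24 * 365
years_to_count = steps_to_count // (10 ** 9 * seconds_per_year)
print(steps_to_write)                              # 1000
print(len(str(years_to_count)))                    # 984: even the number of YEARS has ~10^3 digits
```

The point is not the exact figures but that no change in hardware assumptions moves the answer into a humanly meaningful range.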
More precisely, complexity theory asks the question: how do the resources needed to solve a problem scale with some measure n of the problem size: "reasonably" (like n or n^2, say), or "unreasonably" (like 2^n or n!)? As an example, two n-digit integers can be multiplied using ~n^2 computational steps (by the grade-school method), or even ~n log n log log n steps (by more advanced methods [112]). Either method is considered efficient. By contrast, the fastest known method for the reverse operation—factoring an n-digit integer into primes—uses ~2^(n^(1/3)) steps, which is considered inefficient.[6] Famously, this conjectured gap between the inherent difficulties of multiplying and factoring is the basis for most of the cryptography currently used on the Internet.

Theoretical computer scientists generally call an algorithm "efficient" if its running time can be upper-bounded by any polynomial function of n, and "inefficient" if its running time can be lower-bounded by any exponential function of n.[7] These criteria have the great advantage of theoretical convenience. While the exact complexity of a problem might depend on "low-level encoding details," such as whether our Turing machine has one or two memory tapes, or how the inputs are encoded as binary strings, where a problem falls on the polynomial/exponential dichotomy can be shown to be independent of almost all such choices.[8] Equally important are the closure properties of polynomial and exponential time: a polynomial-time algorithm that calls a polynomial-time subroutine still yields an overall polynomial-time algorithm, while a polynomial-time algorithm that calls an exponential-time subroutine (or vice versa) yields an exponential-time algorithm. There are also more sophisticated reasons why theoretical computer scientists focus on polynomial time (rather than, say, n^2 time or n^(log n) time); we'll explore some of those reasons in Section 5.1.

[6] This method is called the number field sieve, and the quoted running time depends on plausible but unproved conjectures in number theory. The best proven running time is ~2^(√n). Both of these represent nontrivial improvements over the naïve method of trying all possible divisors, which takes ~2^n steps. See Pomerance [105] for a good survey of factoring algorithms.

[7] In some contexts, "exponential" means c^n for some constant c > 1, but in most complexity-theoretic contexts it can also mean c^(n^d) for constants c > 1 and d > 0.

[8] This is not to say that no details of the computational model matter: for example, some problems are known to be solvable in polynomial time on a quantum computer, but not known to be solvable in polynomial time on a classical computer! But in my view, the fact that the polynomial/exponential distinction can "notice" a modelling choice of this magnitude is a feature of the distinction, not a bug.

The polynomial/exponential distinction is open to obvious objections: an algorithm that took 1.00000001^n steps would be much faster in practice than an algorithm that took n^10000 steps! Furthermore, there are many growth rates that fall between polynomial and exponential, such as n^(log n) and 2^(2^√(log n)). But empirically, polynomial-time turned out to correspond to "efficient in practice," and exponential-time to "inefficient in practice," so often that complexity theorists became comfortable making the identification. Why the identification works is an interesting question in its own right, one to which we will return in Section 12.

A priori, insisting that programs terminate after reasonable amounts of time, that they use reasonable amounts of memory, etc.
might sound like relatively-minor amendments to Turing's notion of computation. In practice, though, these requirements lead to a theory with a completely different character than computability theory. Firstly, complexity has much closer connections with the sciences: it lets us pose questions about (for example) evolution, quantum mechanics, statistical physics, economics, or human language acquisition that would be meaningless from a computability standpoint (since all the relevant problems are computable). Complexity also differs from computability in the diversity of mathematical techniques used: while initially complexity (like computability) drew mostly on mathematical logic, today it draws on probability, number theory, combinatorics, representation theory, Fourier analysis, and nearly every other subject about which yellow books are written. Of course, this contributes not only to complexity theory's depth but also to its perceived inaccessibility.

In this essay, I'll argue that complexity theory has direct relevance to major issues in philosophy, including syntax and semantics, the problem of induction, and the interpretation of quantum mechanics. Or that, at least, whether complexity theory does or does not have such relevance is an important question for philosophy! My personal view is that complexity will ultimately prove more relevant to philosophy than computability was, precisely because of the rich connections with the sciences mentioned earlier.

3 The Relevance of Polynomial Time

Anyone who doubts the importance of the polynomial/exponential distinction needs only ponder how many basic intuitions in math, science, and philosophy already implicitly rely on that distinction. In this section I'll give three examples.
3.1 The Entscheidungsproblem Revisited

The Entscheidungsproblem was the dream, enunciated by David Hilbert in the 1920s, of designing a mechanical procedure to determine the truth or falsehood of any well-formed mathematical statement. According to the usual story, Hilbert's dream was irrevocably destroyed by the work of Gödel, Church, and Turing in the 1930s. First, the Incompleteness Theorem showed that no recursively-axiomatizable formal system can encode all and only the true mathematical statements. Second, Church's and Turing's results showed that, even if we settle for an incomplete system F, there is still no mechanical procedure to sort mathematical statements into the three categories "provable in F," "disprovable in F," and "undecidable in F."

However, there is a catch in the above story, which was first pointed out by Gödel himself, in a 1956 letter to John von Neumann that has become famous in theoretical computer science since its rediscovery in the 1980s (see Sipser [121] for an English translation). Given a formal system F (such as Zermelo-Fraenkel set theory), Gödel wrote, consider the problem of deciding whether a mathematical statement S has a proof in F with n symbols or fewer. Unlike Hilbert's original problem, this "truncated Entscheidungsproblem" is clearly decidable. For, if nothing else, we could always just program a computer to search through all 2^n possible bit-strings with n symbols, and check whether any of them encodes a valid F-proof of S. The issue is "merely" that this approach takes an astronomical amount of time: if n = 1000 (say), then the universe will have degenerated into black holes and radiation long before a computer can check 2^1000 proofs!
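The brute-force decidability argument can be sketched in a few lines. This is a toy illustration only: `check_proof` stands in for a real proof checker for the formal system F, and the miniature "proof system" used in the demo is invented purely for the example.

```python
from itertools import product

def truncated_entscheidungsproblem(statement, n, check_proof):
    """Decide whether `statement` has a proof of at most n symbols, by
    enumerating every candidate proof string.  Decidable, but the loop
    visits up to 2^(n+1) candidates -- astronomical already at n = 1000."""
    for length in range(n + 1):
        for bits in product("01", repeat=length):
            if check_proof(statement, "".join(bits)):
                return True
    return False

# Hypothetical toy "proof system", purely for demonstration: a "proof" of
# the statement ("ones", k) is any nonempty bit-string with exactly k ones.
def toy_check(statement, proof):
    return len(proof) > 0 and proof.count("1") == statement[1]

print(truncated_entscheidungsproblem(("ones", 3), 5, toy_check))  # True
print(truncated_entscheidungsproblem(("ones", 9), 5, toy_check))  # False: no 5-symbol proof
```

With n = 5 the loop examines only 63 strings; with n = 1000 the same correct procedure would examine ~2^1000, which is the whole point of the passage above.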
But as Gödel also pointed out, it's far from obvious how to prove that there isn't a much better approach: an approach that would avoid brute-force search, and find proofs of size n in time polynomial in n. Furthermore:

If there actually were a machine with [running time] ~Kn (or even only with ~Kn^2) [for some constant K independent of n], this would have consequences of the greatest magnitude. That is to say, it would clearly indicate that, despite the unsolvability of the Entscheidungsproblem, the mental effort of the mathematician in the case of yes-or-no questions could be completely [added in a footnote: apart from the postulation of axioms] replaced by machines. One would indeed have to simply select an n so large that, if the machine yields no result, there would then also be no reason to think further about the problem.

If we replace the "~Kn or ~Kn^2" in Gödel's challenge by ~Kn^c for an arbitrary constant c, then we get precisely what computer science now knows as the P versus NP problem. Here P (Polynomial-Time) is, roughly speaking, the class of all computational problems that are solvable by a polynomial-time algorithm. Meanwhile, NP (Nondeterministic Polynomial-Time) is the class of computational problems for which a solution can be recognized in polynomial time, even though a solution might be very hard to find.[9] (Think, for example, of factoring a large number, or solving a jigsaw or Sudoku puzzle.) Clearly P ⊆ NP, so the question is whether the inclusion is strict. If P = NP, then the ability to check the solutions to puzzles efficiently would imply the ability to find solutions efficiently. An analogy would be if anyone able to appreciate a great symphony could also compose one themselves!
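The asymmetry between recognizing and finding a solution is easy to exhibit with factoring. A minimal sketch, where the two primes below are just illustrative values:

```python
def verify_factorization(n, factors):
    """NP-style checking: multiplying the claimed factors back together
    takes time polynomial in the number of digits of n."""
    prod = 1
    for f in factors:
        prod *= f
    return prod == n and all(f > 1 for f in factors)

def find_factor(n):
    """Finding a factor by trial division: for a d-digit n with no small
    factors this takes ~10^(d/2) steps -- exponential in the input length."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # n itself is prime

n = 999983 * 1000003                                 # product of two primes
print(verify_factorization(n, [999983, 1000003]))    # True, instantly
print(find_factor(n))                                # 999983, after ~10^6 divisions
```

Verification scales gracefully to thousand-digit numbers; the search loop does not, which is exactly the P versus NP gap in miniature.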
Given the intuitive implausibility of such a scenario, essentially all complexity theorists proceed (reasonably, in my opinion) on the assumption that P ≠ NP, even if they publicly claim open-mindedness about the question. Proving or disproving P ≠ NP is one of the seven million-dollar Clay Millennium Prize Problems[10] (alongside the Riemann Hypothesis, the Poincaré Conjecture proved in 2002 by Perelman, etc.), which should give some indication of the problem's difficulty.[11]

[9] Contrary to a common misconception, NP does not stand for "Non-Polynomial"! There are computational problems that are known to require more than polynomial time (see Section 10), but the NP problems are not among those. Indeed, the classes NP and "Non-Polynomial" have a nonempty intersection exactly if P ≠ NP. For detailed definitions of P, NP, and several hundred other complexity classes, see my Complexity Zoo website: www.complexityzoo.com.

[10] For more information see www.claymath.org/millennium/P_vs_NP/ My own view is that P versus NP is manifestly the most important of the seven problems! For if P = NP, then by Gödel's argument, there is an excellent chance that we could program our computers to solve the other six problems as well.

Now return to the problem of whether a mathematical statement S has a proof with n symbols or fewer, in some formal system F. A suitable formalization of this problem is easily seen to be in NP. For finding a proof might be intractable, but if we're given a purported proof, we can certainly check in time polynomial in n whether each line of the proof follows by a simple logical manipulation of previous lines.
Indeed, this problem turns out to be NP-complete, which means that it belongs to an enormous class of NP problems, first identified in the 1970s, that "capture the entire difficulty of NP." A few other examples of NP-complete problems are Sudoku and jigsaw puzzles, the Traveling Salesperson Problem, and the satisfiability problem for propositional formulas.[12] Asking whether P = NP is equivalent to asking whether any NP-complete problem can be solved in polynomial time, and is also equivalent to asking whether all of them can be.

In modern terms, then, Gödel is saying that if P = NP, then whenever a theorem had a proof of reasonable length, we could find that proof in a reasonable amount of time. In such a situation, we might say that "for all practical purposes," Hilbert's dream of mechanizing mathematics had prevailed, despite the undecidability results of Gödel, Church, and Turing. If you accept this, then it seems fair to say that until P versus NP is solved, the story of Hilbert's Entscheidungsproblem—its rise, its fall, and the consequences for philosophy—is not yet over.

3.2 Evolvability

Creationists often claim that Darwinian evolution is as vacuous an explanation for complex adaptations as "a tornado assembling a 747 airplane as it passes through a junkyard." Why is this claim false? There are several related ways of answering the question, but to me, one of the most illuminating is the following. In principle, one could see a 747 assemble itself in a tornado-prone junkyard—but before that happened, one would need to wait for an expected number of tornadoes that grew exponentially with the number of pieces of self-assembling junk. (This is similar to how, in thermodynamics, n gas particles in a box will eventually congregate themselves in one corner of the box, but only after ~c^n time for some constant c.)
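The contrast can be seen in a toy simulation. Below, "tornado" search would sample random n-bit strings and wait for a target to appear by chance (expected 2^n samples), while a mutate-and-select loop reaches the same target in roughly n ln n expected steps. This is an illustrative sketch only, not a claim about any particular result in evolvability theory:

```python
import random

def mutate_and_select(n, rng):
    """Evolution-style search: flip one random bit, keep the change
    whenever fitness (the number of ones) does not drop.  Reaches the
    all-ones target in ~n ln n expected steps (a coupon-collector bound)."""
    genome = [0] * n
    steps = 0
    while sum(genome) < n:
        steps += 1
        i = rng.randrange(n)
        candidate = genome[:]
        candidate[i] ^= 1                   # point mutation
        if sum(candidate) >= sum(genome):   # selection
            genome = candidate
    return steps

n = 24
print(mutate_and_select(n, random.Random(0)))  # on the order of 100 steps
# "Tornado" search -- drawing rng.getrandbits(n) until the all-ones string
# appears by luck -- would need ~2^24 (about 17 million) samples instead.
```

The exponential/polynomial gap between the two strategies, not any detail of the fitness function, is what the junkyard argument misses.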
By contrast, evolutionary processes can often be observed in simulations—and in some cases, even proved theoretically—to find interesting solutions to optimization problems after a number of steps that grows only polynomially with the number of variables.

Interestingly, in a 1972 letter to Hao Wang (see [130, p. 192]), Kurt Gödel expressed his own doubts about evolution as follows:

I believe that mechanism in biology is a prejudice of our time which will be disproved. In this case, one disproof, in my opinion, will consist in a mathematical theorem to the effect that the formation within geological time of a human body by the laws of physics (or any other laws of similar nature), starting from a random distribution of the elementary particles and the field, is as unlikely as the separation by chance of the atmosphere into its components.

[11] One might ask: can we explain what makes the P ≠ NP problem so hard, rather than just pointing out that many smart people have tried to solve it and failed? After four decades of research, we do have partial explanations for the problem's difficulty, in the form of formal "barriers" that rule out large classes of proof techniques. Three barriers identified so far are relativization [21] (which rules out diagonalization and other techniques with a "computability" flavor), algebrization [8] (which rules out diagonalization even when combined with the main non-relativizing techniques known today), and natural proofs [108] (which shows that many "combinatorial" techniques, if they worked, could be turned around to get faster algorithms to distinguish random from pseudorandom functions).

[12] By contrast, and contrary to a common misconception, there is strong evidence that factoring integers is not NP-complete. It is known that if P ≠ NP, then there are NP problems that are neither in P nor NP-complete [85], and factoring is one candidate for such a problem. This point will become relevant when we discuss quantum computing.

Personally, I see no reason to accept Gödel's intuition on this subject over the consensus of modern biology! But pay attention to Gödel's characteristically-careful phrasing. He does not ask whether evolution can eventually form a human body (for he knows that it can, given exponential time); instead, he asks whether it can do so on a "merely" geological timescale. Just as Gödel's letter to von Neumann anticipated the P versus NP problem, so Gödel's letter to Wang might be said to anticipate a recent effort, by the celebrated computer scientist Leslie Valiant, to construct a quantitative "theory of evolvability" [128]. Building on Valiant's earlier work in computational learning theory (discussed in Section 7), evolvability tries to formalize and answer questions about the speed of evolution. For example: "what sorts of adaptive behaviors can evolve, with high probability, after only a polynomial number of generations? what sorts of behaviors can be learned in polynomial time, but not via evolution?" While there are some interesting early results, it should surprise no one that evolvability is nowhere close to being able to calculate, from first principles, whether four billion years is a "reasonable" or "unreasonable" length of time for the human brain to evolve out of the primordial soup.

As I see it, this difficulty reflects a general point about Gödel's "evolvability" question.
Namely, even supposing Gödel was right, that the mechanistic worldview of modern biology was "as unlikely as the separation by chance of the atmosphere into its components," computational complexity theory seems hopelessly far from being able to prove anything of the kind! In 1972, one could have argued that this merely reflected the subject's newness: no one had thought terribly deeply yet about how to prove lower bounds on computation time. But by now, people have thought deeply about it, and have identified huge obstacles to proving even such "obvious" and well-defined conjectures as P ≠ NP.^13 (Section 4 will make a related point, about the difficulty of proving nontrivial lower bounds on the time or memory needed by a computer program to pass the Turing Test.)

3.3 Known Integers

My last example of the philosophical relevance of the polynomial/exponential distinction concerns the concept of "knowledge" in mathematics.^14 As of 2011, the "largest known prime number," as reported by GIMPS (the Great Internet Mersenne Prime Search),^15 is p := 2^43112609 − 1. But on reflection, what do we mean by saying that p is "known"? Do we mean that, if we desired, we could literally print out its decimal digits (using about 30,000 pages)? That seems like too restrictive a criterion. For, given a positive integer k together with a proof that q = 2^k − 1 was prime, I doubt most mathematicians would hesitate to call q a "known" prime, even if k were so large that printing out its decimal digits (or storing them in a computer memory) were beyond the Earth's capacity. Should we call 2^(2^1000) an "unknown power of 2," just because it has too many decimal digits to list before the Sun goes cold?
All that should really matter, one feels, is that (a) the expression '2^43112609 − 1' picks out a unique positive integer, and (b) that integer has been proven (in this case, via computer, of course) to be prime. But wait! If those are the criteria, then why can't we immediately beat the largest-known-prime record, like so?

    p′ = The first prime larger than 2^43112609 − 1.

Clearly p′ exists, it is unambiguously defined, and it is prime. If we want, we can even write a program that is guaranteed to find p′ and output its decimal digits, using a number of steps that can be upper-bounded a priori.^16 Yet our intuition stubbornly insists that 2^43112609 − 1 is a "known" prime in a sense that p′ is not. Is there any principled basis for such a distinction? The clearest basis that I can suggest is the following. We know an algorithm that takes as input a positive integer k, and that outputs the decimal digits of p = 2^k − 1 using a number of steps that is polynomial (indeed, linear) in the number of digits of p. But we do not know any similarly-efficient algorithm that provably outputs the first prime larger than 2^k − 1.^17

^13 Admittedly, one might be able to prove that Darwinian natural selection would require exponential time to produce some functionality, without thereby proving that any algorithm would require exponential time.

^14 This section was inspired by a question of A. Rupinski on the website MathOverflow. See mathoverflow.net/questions/62925/philosophical-question-related-to-largest-known-primes/

^15 www.mersenne.org

3.4 Summary

The point of these examples was to illustrate that, beyond its utility for theoretical computer science, the polynomial/exponential gap is also fertile territory for philosophy.
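The Section 3.3 distinction between the two "known" primes can be made concrete in a few lines of Python. This is my own illustrative sketch, not from the essay, and the function names are invented: printing the digits of 2^k − 1 takes time polynomial in the length of the output, while the obvious procedure for "the first prime larger than 2^k − 1" searches upward, with a halting guarantee from Bertrand's Postulate but no known polynomial bound.

```python
import random

def mersenne_digits(k: int) -> str:
    """Decimal digits of p = 2^k - 1, in time polynomial (essentially
    linear) in the number of digits of p."""
    return str((1 << k) - 1)

def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Standard Miller-Rabin primality test (probabilistic)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def first_prime_after(n: int) -> int:
    """Search upward from n + 1.  Bertrand's Postulate (footnote 16) bounds
    the search a priori by 2n, but 2n is exponential in the digit count."""
    candidate = n + 1
    while not is_probable_prime(candidate):
        candidate += 1
    return candidate
```

Running mersenne_digits(43112609) to produce the record prime's roughly thirteen million digits is entirely feasible; running first_prime_after on that same number is hopeless in practice, which is the intuitive sense in which one prime is "known" and the other is not.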
I think of the polynomial/exponential gap as occupying a "middle ground" between two other sorts of gaps: on the one hand, small quantitative gaps (such as the gap between n steps and 2n steps); and on the other hand, the gap between a finite number of steps and an infinite number. The trouble with small quantitative gaps is that they are too sensitive to "mundane" modeling choices and the details of technology. But the gap between finite and infinite has the opposite problem: it is serenely insensitive to distinctions that we actually care about, such as that between finding a solution and verifying it, or between classical and quantum physics.^18 The polynomial/exponential gap avoids both problems.

4 Computational Complexity and the Turing Test

Can a computer think? For almost a century, discussions about this question have often conflated two issues. The first is the "metaphysical" issue:

    Supposing a computer program passed the Turing Test (or as strong a variant of the Turing Test as one wishes to define),^19 would we be right to ascribe to it "consciousness," "qualia," "aboutness," "intentionality," "subjectivity," "personhood," or whatever other charmed status we wish to ascribe to other humans and to ourselves?

The second is the "practical" issue:

    Could a computer program that passed (a strong version of) the Turing Test actually be written? Is there some fundamental reason why it couldn't be?

Of course, it was precisely in an attempt to separate these issues that Turing proposed the Turing Test in the first place! But despite his efforts, a familiar feature of anti-AI arguments to this day is that they first assert AI's metaphysical impossibility, and then try to bolster that position with claims about AI's practical difficulties. "Sure," they say, "a computer program might mimic a few minutes of witty banter, but unlike a human being, it would never show fear or anger or jealousy, or compose symphonies, or grow old, or fall in love..." The obvious followup question (and what if a program did do all those things?) is often left unasked, or else answered by listing more things that a computer program could self-evidently never do. Because of this, I suspect that many people who say they consider AI a metaphysical impossibility really consider it only a practical impossibility: they simply have not carried the requisite thought experiment far enough to see the difference between the two.^20 Incidentally, this is as clear-cut a case as I know of where people would benefit from studying more philosophy!

^16 For example, one could use Chebyshev's Theorem (also called Bertrand's Postulate), which says that for all N > 1 there exists a prime between N and 2N.

^17 Cramér's Conjecture states that the spacing between two consecutive n-digit primes never exceeds ~n^2. This conjecture appears staggeringly difficult: even assuming the Riemann Hypothesis, it is only known how to deduce the much weaker upper bound ~n·2^(n/2). But interestingly, if Cramér's Conjecture is proved, expressions like "the first prime larger than 2^k − 1" will then define "known primes" according to my criterion.

^18 In particular, it is easy to check that the set of computable functions does not depend on whether we define computability with respect to a classical or a quantum Turing machine, or a deterministic or nondeterministic one. At most, these choices can change a Turing machine's running time by an exponential factor, which is irrelevant for computability theory.
Thus, the anti-AI arguments that interest me most have always been the ones that target the practical issue from the outset, by proposing empirical "sword-in-the-stone tests" (in Daniel Dennett's phrase [46]) that it is claimed humans can pass but computers cannot. The most famous such test is probably the one based on Gödel's Incompleteness Theorem, as proposed by John Lucas [91] and elaborated by Roger Penrose in his books The Emperor's New Mind [102] and Shadows of the Mind [103]. Briefly, Lucas and Penrose argued that, according to the Incompleteness Theorem, one thing that a computer making deductions via fixed formal rules can never do is to "see" the consistency of its own rules. Yet this, they assert, is something that human mathematicians can do, via some sort of intuitive perception of Platonic reality. Therefore humans (or at least, human mathematicians!) can never be simulated by machines.

Critics pointed out numerous holes in this argument,^21 to which Penrose responded at length in Shadows of the Mind, in my opinion unconvincingly. However, even before we analyze some proposed sword-in-the-stone test, it seems to me that there is a much more basic question. Namely, what does one even mean in saying one has a task that "humans can perform but computers cannot"?

4.1 The Lookup-Table Argument

There is a fundamental difficulty here, which was noticed by others in a slightly different context [30, 101, 87, 116]. Let me first explain the difficulty, and then discuss the difference between my argument and the previous ones. In practice, people judge each other to be conscious after interacting for a very short time, perhaps as little as a few seconds. This suggests that we can put a finite upper bound (to be generous, let us say 10^20) on the number of bits of information that two people A and B would ever realistically exchange, before A had amassed enough evidence to conclude B was conscious.^22 Now imagine a lookup table that stores every possible history H of A and B's conversation, and next to H, the action f_B(H) that B would take next given that history. Of course, like Borges' Library of Babel, the lookup table would consist almost entirely of meaningless nonsense, and it would also be much too large to fit inside the observed universe. But all that matters for us is that the lookup table would be finite, by the assumption that there is a finite upper bound on the conversation length. This implies that the function f_B is computable (indeed, it can be recognized by a finite automaton!). From these simple considerations, we conclude that if there is a fundamental obstacle to computers passing the Turing Test, then it is not to be found in computability theory.^23

In Shadows of the Mind [103, p. 83], Penrose recognizes this problem, but gives a puzzling and unsatisfying response:

    One could equally well envisage computers that contain nothing but lists of totally false mathematical 'theorems,' or lists containing random jumbles of truths and falsehoods. How are we to tell which computer to trust? The arguments that I am trying to make here do not say that an effective simulation of the output of conscious human activity (here mathematics) is impossible, since purely by chance the computer might 'happen' to get it right—even without any understanding whatsoever. But the odds against this are absurdly enormous, and the issues that are being addressed here, namely how one decides which mathematical statements are true and which are false, are not even being touched...

The trouble with this response is that it amounts to a retreat from the sword-in-the-stone test, back to murkier internal criteria. If, in the end, we are going to have to look inside the computer anyway to determine whether it truly "understands" its answers, then why not dispense with computability theory from the beginning? For computability theory only addresses whether or not Turing machines exist to solve various problems, and we have already seen that that is not the relevant issue.

^19 The Turing Test, proposed by Alan Turing [126] in 1950, is a test where a human judge interacts with either another human or a computer conversation program, by typing messages back and forth. The program "passes" the Test if the judge can't reliably distinguish the program from the human interlocutor. By a "strong variant" of the Turing Test, I mean that besides the usual teletype conversation, one could add additional tests requiring vision, hearing, touch, smell, speaking, handwriting, facial expressions, dancing, playing sports and musical instruments, and so on, even though many perfectly-intelligent humans would then be unable to pass the tests!

^20 One famous exception is John Searle [113], who has made it clear that, if (say) his best friend turned out to be controlled by a microchip rather than a brain, then he would regard his friend as never having been a person at all.

^21 See Dennett [46] and Chalmers [37] for example. To summarize: (1) Why should we assume a computer operates within a knowably-sound formal system? If we grant a computer the same freedom to make occasional mistakes that we grant humans, then the Incompleteness Theorem is no longer relevant. (2) Why should we assume that human mathematicians have "direct perception of Platonic reality"? Human mathematicians (such as Frege) have been wrong before about the consistency of formal systems. (3) A computer could, of course, be programmed to output "I believe that formal system F is consistent," and even to output answers to various followup questions about why it believes this. So in arguing that such affirmations "wouldn't really count" (because they wouldn't reflect "true understanding"), AI critics such as Lucas and Penrose are forced to retreat from their vision of an empirical "sword-in-the-stone test," and fall back on other, unspecified criteria related to the AI's internal structure. But then why put the sword in the stone in the first place?

^22 People interacting over the Internet, via email or instant messages, regularly judge each other to be humans rather than spam-bots after exchanging a much smaller number of bits! In any case, cosmological considerations suggest an upper bound of roughly 10^122 bits in any observable process [34].

^23 Some readers might notice a tension here: I explained in Section 2 that complexity theorists care about the asymptotic behavior as the problem size n goes to infinity. So why am I now saying that, for the purposes of the Turing Test, we should restrict attention to finite values of n such as 10^20? There are two answers to this question. The first is that, in contrast to mathematical problems like the factoring problem or the halting problem, it is unclear whether it even makes sense to generalize the Turing Test to arbitrary conversation lengths: for the Turing Test is defined in terms of human beings, and human conversational capacity is finite. The second answer is that, to whatever extent it does make sense to generalize the Turing Test to arbitrary conversation lengths n, I am interested in whether the asymptotic complexity of passing the test grows polynomially or exponentially with n (as the remainder of the section explains).
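The lookup-table construction can be made concrete in a few lines. The sketch below is my own toy illustration, not from the essay: the 10^20-bit bound is scaled down to 8 bits, and some_policy is a hypothetical stand-in for f_B. Tabulating an arbitrary conversation policy over all bounded-length histories yields a finite, trivially computable table, whose size nonetheless grows exponentially with the bound.

```python
from itertools import product

MAX_BITS = 8  # toy stand-in for the essay's generous 10^20-bit bound

def some_policy(history: str) -> str:
    """A hypothetical interlocutor f_B: replies with the parity of 1-bits."""
    return '1' if history.count('1') % 2 else '0'

# Tabulate the policy on every bit-history of length 0..MAX_BITS.
table = {''.join(bits): some_policy(''.join(bits))
         for n in range(MAX_BITS + 1)
         for bits in product('01', repeat=n)}

# The table has exactly the same input/output behavior as some_policy, so
# f_B is computable (a finite automaton suffices).  But it holds
# 2^(MAX_BITS+1) - 1 entries: exponential in the conversation bound.
assert len(table) == 2 ** (MAX_BITS + 1) - 1
```

Replacing MAX_BITS by 10^20 makes the same construction a mathematical object that is finite and computable yet could never fit in the observable universe, which is all the lookup-table argument needs.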
To my mind, there is one direction that Penrose could take from this point to avoid incoherence, though disappointingly, it is not the direction he chooses. Namely, he could point out that, while the lookup table "works," it requires computational resources that grow exponentially with the length of the conversation! This would lead to the following speculation:

    (*) Any computer program that passed the Turing Test would need to be exponentially inefficient in the length of the test, as measured in some resource such as time, memory usage, or the number of bits needed to write the program down. In other words, the astronomical lookup table is essentially the best one can do.^24

If true, speculation (*) would do what Penrose wants: it would imply that the human brain can't even be simulated by computer, within the resource constraints of the observable universe. Furthermore, unlike the earlier computability claim, (*) has the advantage of not being trivially false! On the other hand, to put it mildly, (*) is not trivially true either.

For AI proponents, the lack of compelling evidence for (*) is hardly surprising. After all, if you believe that the brain itself is basically an efficient,^25 classical Turing machine, then you have a simple explanation for why no one has proved that the brain can't be simulated by such a machine! However, complexity theory also makes it clear that, even if we supposed (*) held, there would be little hope of proving it in our current state of mathematical knowledge. After all, we can't even prove plausible, well-defined conjectures such as P ≠ NP.

4.2 Relation to Previous Work

As mentioned before, I'm far from the first person to ask about the computational resources used in passing the Turing Test, and whether they scale polynomially or exponentially with the conversation length.
While many writers ignore this crucial distinction, Block [30], Parberry [101], Levesque [87], Shieber [116], and several others all discussed it explicitly. The main difference is that the previous discussions took place in the context of Searle's Chinese Room argument [113]. Briefly, Searle proposed a thought experiment (the details don't concern us here) purporting to show that a computer program could pass the Turing Test, even though the program manifestly lacked anything that a reasonable person would call "intelligence" or "understanding." In response, many critics said that Searle's argument was deeply misleading, because it implicitly encouraged us to imagine a computer program that was simplistic in its internal operations, something like the giant lookup table described in Section 4.1. And while it was true, the critics went on, that a giant lookup table wouldn't "truly understand" its responses, that point is also irrelevant. For the giant lookup table is a philosophical fiction anyway: something that can't even fit in the observable universe!

^24 As Gil Kalai pointed out to me, one could speculate instead that an efficient computer program exists to pass the Turing Test, but that finding such a program would require exponential computational resources. In that situation, the human brain could indeed be simulated efficiently by a computer program, but maybe not by a program that humans could ever write!

^25 Here, by a Turing machine M being "efficient," we mean that M's running time, memory usage, and program size are modest enough that there is no real problem of principle understanding how M could be simulated by a classical physical system consisting of ~10^11 neurons and ~10^14 synapses. For example, a Turing machine containing a lookup table of size 10^(10^20) would not be efficient in this sense.
If we instead imagine a compact, efficient computer program passing the Turing Test, then the situation changes drastically. For now, in order to explain how the program can be so compact and efficient, we'll need to posit that the program includes representations of abstract concepts, capacities for learning and reasoning, and all sorts of other internal furniture that we would expect to find in a mind.

Personally, I find this response to Searle extremely interesting, since if correct, it suggests that the distinction between polynomial and exponential complexity has metaphysical significance. According to this response, an exponential-sized lookup table that passed the Turing Test would not be sentient (or conscious, intelligent, self-aware, etc.), but a polynomially-bounded program with exactly the same input/output behavior would be sentient. Furthermore, the latter program would be sentient because it was polynomially-bounded.

Yet, as much as that criterion for sentience flatters my complexity-theoretic pride, I find myself reluctant to take a position on such a weighty matter. My point, in Section 4.1, was a simpler and (hopefully) less controversial one: namely, that if you want to claim that passing the Turing Test is flat-out impossible, then like it or not, you must talk about complexity rather than just computability. In other words, the previous writers [30, 101, 87, 116] and I are all interested in the computational resources needed to pass a Turing Test of length n, but for different reasons. Where others invoked complexity considerations to argue with Searle about the metaphysical question, I'm invoking them to argue with Penrose about the practical question.

4.3 Can Humans Solve NP-Complete Problems Efficiently?

In that case, what can we actually say about the practical question?
Are there any reasons to accept the claim I called (*), the claim that humans are not efficiently simulable by Turing machines? In considering this question, we're immediately led to some speculative possibilities. So for example, if it turned out that humans could solve arbitrary instances of NP-complete problems in polynomial time, then that would certainly constitute excellent empirical evidence for (*).^26 However, despite occasional claims to the contrary, I personally see no reason to believe that humans can solve NP-complete problems in polynomial time, and excellent reasons to believe the opposite.^27 Recall, for example, that the integer factoring problem is in NP. Thus, if humans could solve NP-complete problems, then presumably we ought to be able to factor enormous numbers as well!

^26 And amusingly, if we could solve NP-complete problems, then we'd presumably find it much easier to prove that computers couldn't solve them!

^27 Indeed, it is not even clear to me that we should think of humans as being able to solve all P problems efficiently, let alone NP-complete problems! Recall that P is the class of problems that are solvable in polynomial time by a deterministic Turing machine. Many problems are known to belong to P for quite sophisticated reasons: two examples are testing whether a number is prime (though not factoring it!) [9] and testing whether a graph has a perfect matching. In principle, of course, a human could laboriously run the polynomial-time algorithms for such problems using pencil and paper. But is the use of pencil and paper legitimate, where use of a computer would not be? What is the computational power of the "unaided" human intellect? Recent work of Drucker [51], which shows how to use a stock photography collection to increase the "effective memory" available for mental calculations, provides a fascinating empirical perspective on these questions.
But factoring does not exactly seem like the most promising candidate for a sword-in-the-stone test: that is, a task that's easy for humans but hard for computers. As far as anyone knows today, factoring is hard for humans and (classical) computers alike, although with a definite advantage on the computers' side!

The basic point can hardly be stressed enough: when complexity theorists talk about "intractable" problems, they generally mean mathematical problems that all our experience leads us to believe are at least as hard for humans as for computers. This suggests that, even if humans were not efficiently simulable by Turing machines, the "direction" in which they were hard to simulate would almost certainly be different from the directions usually considered in complexity theory. I see two (hypothetical) ways this could happen.

First, the tasks that humans were uniquely good at, like painting or writing poetry, could be incomparable with mathematical tasks like solving NP-complete problems, in the sense that neither was efficiently reducible to the other. This would mean, in particular, that there could be no polynomial-time algorithm even to recognize great art or poetry (since if such an algorithm existed, then the task of composing great art or poetry would be in NP). Within complexity theory, it's known that there exist pairs of problems that are incomparable in this sense. As one plausible example, no one currently knows how to reduce the simulation of quantum computers to the solution of NP-complete problems or vice versa.

Second, humans could have the ability to solve interesting special cases of NP-complete problems faster than any Turing machine.
So for example, even if computers were better than humans at factoring large numbers or at solving randomly-generated Sudoku puzzles, humans might still be better at search problems with "higher-level structure" or "semantics," such as proving Fermat's Last Theorem or (ironically) designing faster computer algorithms. Indeed, even in limited domains such as puzzle-solving, while computers can examine solutions millions of times faster, humans (for now) are vastly better at noticing global patterns or symmetries in the puzzle that make a solution either trivial or impossible. As an amusing example, consider the Pigeonhole Principle, which says that n + 1 pigeons can't be placed into n holes, with at most one pigeon per hole. It's not hard to construct a propositional Boolean formula φ that encodes the Pigeonhole Principle for some fixed value of n (say, 1000). However, if you then feed φ to current Boolean satisfiability algorithms, they'll assiduously set to work trying out possibilities: "let's see, if I put this pigeon here, and that one there ... darn, it still doesn't work!" And they'll continue trying out possibilities for an exponential number of steps, oblivious to the "global" reason why the goal can never be achieved. Indeed, beginning in the 1980s, the field of proof complexity (a close cousin of computational complexity) has been able to show that large classes of algorithms require exponential time to prove the Pigeonhole Principle and similar propositional tautologies (see Beame and Pitassi [24] for a survey).
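To see what such a formula φ looks like, here is a minimal sketch (my own encoding, with illustrative names; not from the essay). It builds the pigeonhole CNF for n holes and n + 1 pigeons, and checks it by brute force, exactly the kind of exhaustive case-splitting that, per the proof-complexity results cited above, cannot escape exponential time on these formulas.

```python
from itertools import combinations, product

def php_clauses(n):
    """CNF encoding of the Pigeonhole Principle: variables are
    (pigeon, hole) pairs, literals are (variable, polarity)."""
    clauses = []
    for i in range(n + 1):                      # every pigeon sits somewhere
        clauses.append([((i, j), True) for j in range(n)])
    for j in range(n):                          # no hole holds two pigeons
        for i1, i2 in combinations(range(n + 1), 2):
            clauses.append([((i1, j), False), ((i2, j), False)])
    return clauses

def satisfiable(n):
    """Brute-force satisfiability check over all 2^((n+1)n) assignments."""
    variables = [(i, j) for i in range(n + 1) for j in range(n)]
    clauses = php_clauses(n)
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if all(any(assignment[v] == pol for v, pol in clause)
               for clause in clauses):
            return True
    return False

# The "global" reason -- n + 1 pigeons don't fit in n holes -- is invisible
# to the search, which simply tries every assignment and fails.
assert satisfiable(2) is False
```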
Still, if we want to build our sword-in-the-stone test on the ability to detect "higher-level patterns" in combinatorial search problems, then the burden is on us to explain what we mean by higher-level patterns, and why we think that no polynomial-time Turing machine, even much more sophisticated ones than we can imagine today, could ever detect those patterns as well. For an initial attempt to understand NP-complete problems from a cognitive science perspective, see Baum [22].

4.4 Summary

My conclusion is that, if you oppose the possibility of AI in principle, then either

(i) you can take the "metaphysical route" (as Searle [113] does with the Chinese Room), conceding the possibility of a computer program passing every conceivable empirical test for intelligence, but arguing that that isn't enough, or

(ii) you can conjecture an astronomical lower bound on the resources needed either to run such a program or to write it in the first place, but here there is little question of proof for the foreseeable future.

Crucially, because of the lookup-table argument, one option you do not have is to assert the flat-out impossibility of a computer program passing the Turing Test, with no mention of quantitative complexity bounds.

5 The Problem of Logical Omniscience

Giving a formal account of knowledge is one of the central concerns in modern analytic philosophy; the literature is too vast even to survey here (though see Fagin et al. [53] for a computer-science-friendly overview).
Typically, formal accounts of knowledge involve conventional "logical" axioms, such as

• If you know P and you know Q, then you also know P ∧ Q

supplemented by "modal" axioms having to do with knowledge itself, such as

• If you know P, then you also know that you know P

• If you don't know P, then you know that you don't know P^28

While the details differ, what most formal accounts of knowledge have in common is that they treat an agent's knowledge as closed under the application of various deduction rules like the ones above. In other words, agents are considered logically omniscient: if they know certain facts, then they also know all possible logical consequences of those facts.

Sadly and obviously, no mortal being has ever attained or even approximated this sort of omniscience (recall the Turing quote from the beginning of Section 1). So for example, I can know the rules of arithmetic without knowing Fermat's Last Theorem, and I can know the rules of chess without knowing whether White has a forced win. Furthermore, the difficulty is not (as sometimes claimed) limited to a few domains, such as mathematics and games. As pointed out by Stalnaker [123], if we assumed logical omniscience, then we couldn't account for any contemplation of facts already known to us, and thus for the main activity and one of the main subjects of philosophy itself!

We can now loosely state what Hintikka [72] called the problem of logical omniscience:

    Can we give some formal account of "knowledge" able to accommodate people learning new things without leaving their armchairs?

^28 Not surprisingly, this particular axiom has engendered controversy: it leaves no possibility for Rumsfeldian "unknown unknowns."
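To get a feel for how demanding the closure axioms are, here is a toy computation of my own (not from the text): closing a knowledge base under just the conjunction rule already inflates n atomic facts into 2^n − 1 "known" sentences, before any genuinely hard deduction enters the picture.

```python
from itertools import combinations

def close_under_conjunction(atoms):
    """All conjunctions formed from a finite set of known atomic sentences."""
    known = set()
    for r in range(1, len(atoms) + 1):
        for subset in combinations(sorted(atoms), r):
            known.add(" and ".join(subset))
    return known

kb = close_under_conjunction({"P", "Q", "R"})
# 3 atoms already yield 2^3 - 1 = 7 "known" sentences; n atoms yield 2^n - 1.
assert len(kb) == 7
```

And this counts only conjunctions; adding the modal axioms and genuine deduction rules makes the closure larger still, which is the gulf between the formal models and any mortal knower.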
Of course, one vacuous "solution" would be to declare that your knowledge is simply a list of all the true sentences^29 that you "know," and that, if the list happens not to be closed under logical deductions, so be it! But this "solution" is no help at all at explaining how or why you know things. Can't we do better?

Intuitively, we want to say that your "knowledge" consists of various non-logical facts ("grass is green"), together with some simple consequences of those facts ("grass is not pink"), but not necessarily all the consequences, and certainly not all consequences that involve difficult mathematical reasoning. Unfortunately, as soon as we try to formalize this idea, we run into problems. The most obvious problem is the lack of a sharp boundary between the facts you know right away, and those you "could" know, but only after significant thought. (Recall the discussion of "known primes" from Section 3.3.) A related problem is the lack of a sharp boundary between the facts you know "only if asked about them," and those you know even if you're not asked. Interestingly, these two boundaries seem to cut across each other. For example, while you've probably already encountered the fact that 91 is composite, it might take you some time to remember it; while you've probably never encountered the fact that 83190 is composite, once asked you can probably assent to it immediately.

But as discussed by Stalnaker [123], there's a third problem that seems much more serious than either of the two above. Namely, you might "know" a particular fact if asked about it one way, but not if asked in a different way! To illustrate this, Stalnaker uses an example that we can recognize immediately from the discussion of the P versus NP problem in Section 3.1.
If I asked you whether 43 × 37 = 1591, you could probably answer easily (e.g., by using (40 + 3)(40 − 3) = 40^2 − 3^2). On the other hand, if I instead asked you what the prime factors of 1591 were, you probably couldn't answer so easily.

    But the answers to the two questions have the same content, even on a very fine-grained notion of content. Suppose that we fix the threshold of accessibility so that the information that 43 and 37 are the prime factors of 1591 is accessible in response to the second question, but not accessible in response to the first. Do you know what the prime factors of 1591 are or not? ... Our problem is that we are not just trying to say what an agent would know upon being asked certain questions; rather, we are trying to use the facts about an agent's question answering capacities in order to get at what the agent knows, even if the questions are not asked. [123, p. 253]

To add another example: does a typical four-year-old child "know" that addition of reals is commutative? Certainly not if we asked her in those words, and if we tried to explain the words, she probably wouldn't understand us. Yet if we showed her a stack of books, and asked her whether she could make the stack higher by shuffling the books, she probably wouldn't make a mistake that involved imagining addition was non-commutative. In that sense, we might say she already "implicitly" knows what her math classes will later make explicit.

In my view, these and other examples strongly suggest that only a small part of what we mean by "knowledge" is knowledge about the truth or falsehood of individual propositions.

^29 If we don't require the sentences to be true, then presumably we're talking about belief rather than knowledge.
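Stalnaker's asymmetry is, at bottom, the verification/search asymmetry from Section 3.1; the following few lines (my illustration, with invented names) make it concrete.

```python
def verify(a: int, b: int, n: int) -> bool:
    """Answering "is 43 * 37 = 1591?" takes one multiplication."""
    return a * b == n

def factor_by_trial(n: int):
    """Answering "what are the prime factors of 1591?" requires search;
    trial division takes time exponential in the number of digits of n."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return None  # n is prime (or 1)

assert verify(43, 37, 1591)
assert factor_by_trial(1591) == (37, 43)
```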
crucially, this remains so even if we restrict our attention to "purely verbalizable" knowledge—indeed, knowledge used for answering factual questions—and not (say) knowledge of how to ride a bike or swing a golf club, or knowledge of a person or a place.[30] Many everyday uses of the word "know" support this idea: Do you know calculus? Do you know Spanish? Do you know the rules of bridge? Each of the above questions could be interpreted as asking: do you possess an internal algorithm, by which you can answer a large (and possibly-unbounded) set of questions of some form? While this is rarely made explicit, the examples of this section and of Section 3.3 suggest adding the proviso: ... answer in a reasonable amount of time?

But suppose we accept that "knowing how" (or "knowing a good algorithm for") is a more fundamental concept than "knowing that." How does that help us at all in solving the logical omniscience problem? You might worry that we're right back where we started. After all, if we try to give a formal account of "knowing how," then just like in the case of "knowing that," it will be tempting to write down axioms like the following:

    If you know how to compute f(x) and g(x) efficiently, then you also know how to compute f(x) + g(x) efficiently.

Naturally, we'll then want to take the logical closure of those axioms. But then, before we know it, won't we have conjured into our imaginations a computationally-omniscient superbeing, who could efficiently compute anything at all?

5.1 The Cobham Axioms

Happily, the above worry turns out to be unfounded. We can write down reasonable axioms for "knowing how to compute efficiently," and then go ahead and take the closure of those axioms, without getting the unwanted consequence of computational omniscience.
Explaining this point will involve a digression into an old and fascinating corner of complexity theory—one that probably holds independent interest for philosophers.

As is well known, in the 1930s Church and Kleene proposed definitions of the "computable functions" that turned out to be precisely equivalent to Turing's definition, but that differed from Turing's in making no explicit mention of machines. Rather than analyzing the process of computation, the Church-Kleene approach was simply to list axioms that the computable functions of natural numbers f : N → N ought to satisfy—for example, "if f(x) and g(x) are both computable, then so is f(g(x))"—and then to define "the" computable functions as the smallest set satisfying those axioms.

In 1965, Alan Cobham [42] asked whether the same could be done for the efficiently or feasibly computable functions. As an answer, he offered axioms that precisely characterize what today we call FP, or Function Polynomial-Time (though Cobham called it L). The class FP consists of all functions of natural numbers f : N → N that are computable in polynomial time by a deterministic Turing machine. Note that FP is "morally" the same as the class P (Polynomial-Time) defined in Section 3.1: they differ only in that P is a class of decision problems (or equivalently, functions f : N → {0, 1}), whereas FP is a class of functions with integer range.

[30] For "knowing" a person suggests having actually met the person, while "knowing" a place suggests having visited the place. Interestingly, in Hebrew, one uses a completely different verb for "know" in the sense of "being familiar with" (makir) than for "know" in the intellectual sense (yodeya).
What was noteworthy about Cobham's characterization of polynomial time was that it didn't involve any explicit mention of either computing devices or bounds on their running time. Let me now list a version of Cobham's axioms, adapted from Arora, Impagliazzo, and Vazirani [16]. Each of the axioms talks about which functions of natural numbers f : N → N are "efficiently computable."

(1) Every constant function f is efficiently computable, as is every function which is nonzero only finitely often.

(2) Pairing: If f(x) and g(x) are efficiently computable, then so is ⟨f(x), g(x)⟩, where ⟨ , ⟩ is some standard pairing function for the natural numbers.

(3) Composition: If f(x) and g(x) are efficiently computable, then so is f(g(x)).

(4) Grab Bag: The following functions are all efficiently computable:
  • the arithmetic functions x + y and x × y
  • |x| = ⌊log₂ x⌋ + 1 (the number of bits in x's binary representation)
  • the projection functions Π₁(⟨x, y⟩) = x and Π₂(⟨x, y⟩) = y
  • bit(⟨x, i⟩) (the i-th bit of x's binary representation, or 0 if i > |x|)
  • diff(⟨x, i⟩) (the number obtained from x by flipping its i-th bit)
  • 2^{|x|²} (called the "smash function")

(5) Bounded Recursion: Suppose f(x) is efficiently computable, and |f(x)| ≤ |x| for all x ∈ N. Then the function g(⟨x, k⟩), defined by

  g(⟨x, k⟩) = f(g(⟨x, ⌊k/2⌋⟩)) if k > 1,
  g(⟨x, k⟩) = x if k = 1,

is also efficiently computable.

A few comments about the Cobham axioms might be helpful. First, the axiom that "does most of the work" is (5). Intuitively, given any natural number k ∈ N that we can generate starting from the original input x ∈ N, the Bounded Recursion axiom lets us set up a "computational process" that runs for log₂ k steps.
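To make this concrete, here is a small illustrative sketch, not part of Cobham's formalism but an assumed Python rendering of axioms (4) and (5), with `int.bit_length()` playing the role of |x|. It shows that Bounded Recursion applies f exactly ⌊log₂ k⌋ times, and that iterating the smash function inflates bit-lengths polynomially:

```python
def smash(x):
    # the "smash function" from axiom (4): 2^{|x|^2}
    return 2 ** (x.bit_length() ** 2)

def bounded_recursion(f, x, k):
    # axiom (5): g(<x, k>) = x if k = 1, else f(g(<x, floor(k/2)>))
    # (k <= 1 guards the base case so k = 0 also terminates)
    if k <= 1:
        return x
    return f(bounded_recursion(f, x, k // 2))

# Counting applications of f: since k is halved at each level, f is
# applied floor(log2 k) times, a "computational process" of log2 k steps.
calls = []
result = bounded_recursion(lambda y: calls.append(1) or y + 1, 0, 1024)
assert len(calls) == 10                  # floor(log2 1024) = 10
assert result == 10

# Iterating smash: an 8-bit input becomes 8^2 + 1 = 65 bits, then
# 65^2 + 1 = 4226 bits, giving arbitrary polynomial growth in bit-length.
x = 0b10110110
assert smash(x).bit_length() == 65
assert smash(smash(x)).bit_length() == 4226
```

(The toy f here is chosen only to count steps; it does not satisfy the |f(x)| ≤ |x| side condition for every x, which the real axiom requires.)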
Second, the role of the "smash function," 2^{|x|²}, is to let us map n-bit integers to n²-bit integers to n⁴-bit integers and so on, and thereby (in combination with the Bounded Recursion axiom) set up computational processes that run for arbitrary polynomial numbers of steps. Third, although addition and multiplication are included as "efficiently computable functions," it is crucial that exponentiation is not included. Indeed, if x and y are n-bit integers, then x^y might require exponentially many bits just to write down.

The basic result is then the following:

Theorem 1 ([42, 110]) The class FP, of functions f : N → N computable in polynomial time by a deterministic Turing machine, satisfies axioms (1)-(5), and is the smallest class that does so.

To prove Theorem 1, one needs to do two things, neither of them difficult: first, show that any function f that can be defined using the Cobham axioms can also be computed in polynomial time; and second, show that the Cobham axioms are enough to simulate any polynomial-time Turing machine.

One drawback of the Cobham axioms is that they seem to "sneak in the concept of polynomial-time through the back door"—both through the "smash function," and through the arbitrary-looking condition |f(x)| ≤ |x| in axiom (5). In the 1990s, however, Leivant [86] and Bellantoni and Cook [25] both gave more "elegant" logical characterizations of FP that avoid this problem. So for example, Leivant showed that a function f belongs to FP, if and only if f is computed by a program that can be proved correct in second-order logic with comprehension restricted to positive quantifier-free formulas.
Results like these provide further evidence—if any was needed—that polynomial-time computability is an extremely natural notion: a "wide target in conceptual space" that one hits even while aiming in purely logical directions.

Over the past few decades, the idea of defining complexity classes such as P and NP in "logical, machine-free" ways has given rise to an entire field called descriptive complexity theory, which has deep connections with finite model theory. While further discussion of descriptive complexity theory would take us too far afield, see the book of Immerman [77] for the definitive introduction, or Fagin [52] for a survey.

5.2 Omniscience Versus Infinity

Returning to our original topic, how exactly do axiomatic theories such as Cobham's (or Church's and Kleene's, for that matter) escape the problem of omniscience? One straightforward answer is that, unlike the set of true sentences in some formal language, which is only countably infinite, the set of functions f : N → N is uncountably infinite. And therefore, even if we define the "efficiently-computable" functions f : N → N by taking a countably-infinite logical closure, we are sure to miss some functions f (in fact, almost all of them!).

The observation above suggests a general strategy to tame the logical omniscience problem. Namely, we could refuse to define an agent's "knowledge" in terms of which individual questions she can quickly answer, and insist on speaking instead about which infinite families of questions she can quickly answer. In slogan form, we want to "fight omniscience with infinity."

Let's see how, by taking this route, we can give semi-plausible answers to the puzzles about knowledge discussed earlier in this section.
First, the reason why you can "know" that 1591 = 43 × 37, but at the same time not "know" the prime factors of 1591, is that, when we speak about knowing the answers to these questions, we really mean knowing how to answer them. And as we saw, there need not be any contradiction in knowing a fast multiplication algorithm but not a fast factoring algorithm, even if we model your knowledge about algorithms as deductively closed. To put it another way, by embedding the two questions

  Q1 = "Is 1591 = 43 × 37?"
  Q2 = "What are the prime factors of 1591?"

into infinite families of related questions, we can break the symmetry between the knowledge entailed in answering them.

Similarly, we could think of a child as possessing an internal algorithm which, given any statement of the form x + y = y + x (for specific x and y values), immediately outputs true, without even examining x and y. However, the child does not yet have the ability to process quantified statements, such as "∀x, y ∈ R  x + y = y + x." In that sense, she still lacks the explicit knowledge that addition is commutative.

Although the "cure" for logical omniscience sketched above solves some puzzles, not surprisingly it raises many puzzles of its own. So let me end this section by discussing three major objections to the "infinity cure."

The first objection is that we've simply pushed the problem of logical omniscience somewhere else. For suppose an agent "knows" how to compute every function in some restricted class such as FP. Then how can we ever make sense of the agent learning a new algorithm?
One natural response is that, even if you have the "latent ability" to compute a function f ∈ FP, you might not know that you have the ability—either because you don't know a suitable algorithm, or because you do know an algorithm, but don't know that it's an algorithm for f. Of course, if we wanted to pursue things to the bottom, we'd next need to tell a story about knowledge of algorithms, and how logical omniscience is avoided there. However, I claim that this represents progress! For notice that, even without such a story, we can already explain some failures of logical omniscience. For example, the reason why you don't know the factors of a large number might not be your ignorance of a fast factoring method, but rather that no such method exists.

The second objection is that, when I advocated focusing on infinite families of questions rather than single questions in isolation, I never specified which infinite families. The difficulty is that the same question could be generalized in wildly different ways. As an example, consider the question

  Q = "Is 432,150 composite?"

Q is an instance of a computational problem that humans find very hard: "given a large integer N, is N composite?" However, Q is also an instance of a computational problem that humans find very easy: "given a large integer N ending in 0, is N composite?" And indeed, we'd expect a person to know the answer to Q if she noticed that 432,150 ends in 0, but not otherwise. To me, what this example demonstrates is that, if we want to discuss an agent's knowledge in terms of individual questions such as Q, then the relevant issue will be whether there exists a generalization G of Q, such that the agent knows a fast algorithm for answering questions of type G, and also recognizes that Q is of type G.
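Both asymmetries above can be made concrete: knowing that 1591 = 43 × 37 without knowing how to factor, and answering Q only via an easy generalization. The following Python sketch is purely illustrative (the function names are mine, not standard ones): verifying a product is a single arithmetic identity, recovering the factors requires search, and noticing a trailing 0 gives a fast algorithm for one particular family of compositeness questions:

```python
# Verifying the product is a single arithmetic identity:
assert (40 + 3) * (40 - 3) == 40**2 - 3**2 == 1591

def prime_factors(n):
    # Recovering the factors requires search (here, trial division).
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

assert prime_factors(1591) == [37, 43]

def composite_because_ends_in_zero(n):
    # A fast algorithm for the *easy* generalization of Q: any integer
    # greater than 10 whose decimal form ends in 0 is divisible by 10,
    # hence composite.  Outside that family, this "agent" has no answer.
    if n > 10 and n % 10 == 0:
        return True
    return None

assert composite_because_ends_in_zero(432150) is True
assert composite_because_ends_in_zero(432151) is None
```

The point is not the code itself but which families it covers: the agent's "knowledge" is the set of question-families for which it has a fast procedure, plus its ability to recognize an instance as belonging to one of them.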
The third objection is just the standard one about the relationship between asymptotic complexity and finite statements. For example, if we model an agent's knowledge using the Cobham axioms, then we can indeed explain why the agent doesn't know how to play perfect chess on an n × n board, for arbitrary values of n.[31] But on a standard 8 × 8 board, playing perfect chess would "merely" require (say) ∼10⁶⁰ computational steps, which is a constant, and therefore certainly polynomial! So strictly on the basis of the Cobham axioms, what explanation could we possibly offer for why a rational agent, who knew the rules of 8 × 8 chess, didn't also know how to play it optimally? While this objection might sound devastating, it's important to understand that it's no different from the usual objection leveled against complexity-theoretic arguments, and can be given the usual response. Namely: asymptotic statements are always vulnerable to being rendered irrelevant, if the constant factors turned out to be ridiculous. However, experience has shown that, for whatever reasons, that happens rarely enough that one can usually take asymptotic behavior as "having explanatory force until proven otherwise." (Section 12 will say more about the explanatory force of asymptotic claims, as a problem requiring philosophical analysis.)

[31] For chess on an n × n board is known to be EXP-complete, and it is also known that P ≠ EXP. See Section 10, and particularly footnote 60, for more details.

5.3 Summary

Because of the difficulties pointed out in Section 5.2, my own view is that computational complexity theory has not yet come close to "solving" the logical omniscience problem, in the sense of giving a satisfying formal account of knowledge that also avoids making absurd predictions. I have no idea whether such an account is even possible.[32]
32 Ho w ever, wh at I’ve tried to sho w in this section is that complexit y theory provides a w ell-defined “limiting case” wh ere the logi cal omniscience problem is solv able, ab out a s well a s one could hop e it to b e. Th e limiting case is where the size of the questions g rows w ithout b ound, and the solution there i s giv en b y the Cobham a xioms: “axioms of knowing how” w hose logical closure one c an take without thereb y inviting omniscience. In other words, when we con template th e omn iscience problem, I claim th at we’ r e in a situation similar to one often faced in physics—where we m ight b e at a loss to un d erstand some p henomenon (sa y , gra vitational ent r op y), e xc ept in limiting cases such as b lac k h ole s. I n epistemology j u st lik e in physics, the limiting cases that we do more-or-less unders tand offer an ob vious starting p oint for those wishin g to tac kle the general case. 6 Compu tati onalism and W aterfalls Ov er the past t wo decades, a certain argum ent ab out computation—whic h I’ll call the waterfal l ar- gument —has b een w idely discussed by philosophers of mind. 33 Lik e Searle’s famous Chinese Ro om argumen t [113], th e wa terfall argum en t seeks to show that computations are “inh eren tly syn tactic,” and can never b e “ab out” anything—and that for this reason, the do ctrine of “computationali sm ” is false. 34 But unlike the Chinese Ro om, t h e waterfall argumen t supplemen ts the bare app eal to in tuition by a fu rther claim: namely , that the “meaning” of a compu tation, to w h atev er exten t it has one, is alw a ys r elative to some e xterna l observer . More concretely , consider a w aterfall (though an y other p hysical system with a large enough state space would do as we ll). Here I do not mean a w aterfall that w as sp ecially engineered to p erform computations, but r e al ly a naturally-o ccurring w aterfall: sa y , Niaga r a F alls. 
Being governed by laws of physics, the waterfall implements some mapping f from a set of possible initial states to a set of possible final states. If we accept that the laws of physics are reversible, then f must also be injective. Now suppose we restrict attention to some finite subset S of possible initial states, with |S| = n. Then f is just a one-to-one mapping from S to some output set T = f(S) with |T| = n. The "crucial observation" is now this: given any permutation σ from the set of integers {1, ..., n} to itself, there is some way to label the elements of S and T by integers in {1, ..., n}, such that we can interpret f as implementing σ. For example, if we let S = {s₁, ..., sₙ} and f(sᵢ) = tᵢ, then it suffices to label the initial state sᵢ by i and the final state tᵢ by σ(i). But the permutation σ could have any "semantics" we like: it might represent a program for playing chess, or factoring integers, or simulating a different waterfall. Therefore "mere computation" cannot give rise to semantic meaning. Here is how Searle [114, p.

[32] Compare the pessimism expressed by Paul Graham [68] about knowledge representation more generally: "In practice formal logic is not much use, because despite some progress in the last 150 years we're still only able to formalize a small percentage of statements. We may never do that much better, for the same reason 1980s-style 'knowledge representation' could never have worked; many statements may have no representation more concise than a huge, analog brain state."

[33] See Putnam [106, appendix] and Searle [114] for two instantiations of the argument (though the formal details of either will not concern us here).

[34] "Computationalism" refers to the view that the mind is literally a computer, and that thought is literally a type of computation.
57] expresses the conclusion:

    If we are consistent in adopting the Turing test or some other "objective" criterion for intelligent behavior, then the answer to such questions as "Can unintelligent bits of matter produce intelligent behavior?" and even, "How exactly do they do it" are ludicrously obvious. Any thermostat, pocket calculator, or waterfall produces "intelligent behavior," and we know in each case how it works. Certain artifacts are designed to behave as if they were intelligent, and since everything follows laws of nature, then everything will have some description under which it behaves as if it were intelligent. But this sense of "intelligent behavior" is of no psychological relevance at all.

The waterfall argument has been criticized on numerous grounds: see Haugeland [71], Block [30], and especially Chalmers [37] (who parodied the argument by proving that a cake recipe, being merely syntactic, can never give rise to the semantic attribute of crumbliness). To my mind, though, perhaps the easiest way to demolish the waterfall argument is through computational complexity considerations.

Indeed, suppose we actually wanted to use a waterfall to help us calculate chess moves. How would we do that? In complexity terms, what we want is a reduction from the chess problem to the waterfall-simulation problem. That is, we want an efficient algorithm that somehow encodes a chess position P into an initial state s_P ∈ S of the waterfall, in such a way that a good move from P can be read out efficiently from the waterfall's corresponding final state, f(s_P) ∈ T.[35] But what would such an algorithm look like? We cannot say for sure—certainly not without detailed knowledge about f (i.e., the physics of waterfalls), as well as the means by which the S and T elements are encoded as binary strings.
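Before turning to that question, it is worth noting that the "crucial observation" driving the waterfall argument is easy to verify directly. A minimal sketch, assuming an arbitrary injective map as a stand-in for the waterfall's physics: whatever injection f we pick, suitable labels make f "implement" any permutation σ we like:

```python
import random

n = 8
S = list(range(n))                      # arbitrary distinct "initial states"
f = {s: (5 * s + 3) % 17 for s in S}    # some fixed injective map, standing in
                                        # for the waterfall's physics
T = [f[s] for s in S]

sigma = list(range(1, n + 1))
random.shuffle(sigma)                   # ANY permutation of {1, ..., n}

# Label initial state s_i by i, and final state f(s_i) by sigma(i):
in_label = {S[i]: i + 1 for i in range(n)}
out_label = {T[i]: sigma[i] for i in range(n)}

# Under these labels, f "implements" sigma -- whichever sigma we chose:
implemented = {in_label[s]: out_label[f[s]] for s in S}
assert implemented == {i + 1: sigma[i] for i in range(n)}
```

The computational work, of course, is hidden entirely in choosing the labels; the question is what it would cost to find them.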
But for any reasonable choice, it seems overwhelmingly likely that any reduction algorithm would just solve the chess problem itself, without using the waterfall in an essential way at all! A bit more precisely, I conjecture that, given any chess-playing algorithm A that accesses a "waterfall oracle" W, there is an equally-good chess-playing algorithm A′, with similar time and space requirements, that does not access W. If this conjecture holds, then it gives us a perfectly observer-independent way to formalize our intuition that the "semantics" of waterfalls have nothing to do with chess.[36]

[35] Technically, this describes a restricted class of reductions, called nonadaptive reductions. An adaptive reduction from chess to waterfalls might solve a chess problem by some procedure that involves initializing a waterfall and observing its final state, then using the results of that aquatic computation to initialize a second waterfall and observe its final state, and so on for some polynomial number of repetitions.

[36] The perceptive reader might suspect that we smuggled our conclusion into the assumption that the waterfall states s_P ∈ S and f(s_P) ∈ T were encoded as binary strings in a "reasonable" way (and not, for example, in a way that encodes the solution to the chess problem). But a crucial lesson of complexity theory is that, when we discuss "computational problems," we always make an implicit commitment about the input and output encodings anyway! So for example, if positive integers were given as input via their prime factorizations, then the factoring problem would be trivial (just apply the identity function). But who cares?
If, in mathematically defining the waterfall-simulation problem, we required input and output encodings that entailed solving chess problems, then it would no longer be reasonable to call our problem (solely) a "waterfall-simulation problem" at all.

6.1 "Reductions" That Do All The Work

Interestingly, the issue of "trivial" or "degenerate" reductions also arises within complexity theory, so it might be instructive to see how it is handled there.

Recall from Section 3.1 that a problem is NP-complete if, loosely speaking, it is "maximally hard among all NP problems" (NP being the class of problems for which solutions can be checked in polynomial time). More formally, we say that L is NP-complete if

(i) L ∈ NP, and
(ii) given any other NP problem L′, there exists a polynomial-time algorithm to solve L′ using access to an oracle that solves L. (Or more succinctly, L′ ∈ P^L, where P^L denotes the complexity class P augmented by an L-oracle.)

The concept of NP-completeness had incredible explanatory power: it showed that thousands of seemingly-unrelated problems from physics, biology, industrial optimization, mathematical logic, and other fields were all identical from the standpoint of polynomial-time computation, and that not one of these problems had an efficient solution unless P = NP. Thus, it was natural for theoretical computer scientists to want to define an analogous concept of P-completeness. In other words: among all the problems that are solvable in polynomial time, which ones are "maximally hard"?

But how should P-completeness even be defined? To see the difficulty, suppose that, by analogy with NP-completeness, we say that L is P-complete if (i) L ∈ P and (ii) L′ ∈ P^L for every L′ ∈ P. Then it is easy to see that the second condition is vacuous: every P problem is P-complete!
For in "reducing" L′ to L, a polynomial-time algorithm can always just ignore the L-oracle and solve L′ by itself, much like our hypothetical chess program that ignored its waterfall oracle. Because of this, condition (ii) must be replaced by a stronger condition; one popular choice is

(ii′) L′ ∈ LOGSPACE^L for every L′ ∈ P.

Here LOGSPACE means, informally, the class of problems solvable by a deterministic Turing machine with a read/write memory consisting of only log n bits, given an input of size n.[37] It's not hard to show that LOGSPACE ⊆ P, and this containment is strongly believed to be strict (though just like with P ≠ NP, there is no proof yet). The key point is that, if we want a non-vacuous notion of completeness, then the reducing complexity class needs to be weaker (either provably or conjecturally) than the class being reduced to. In fact complexity classes even smaller than LOGSPACE almost always suffice in practice.

In my view, there is an important lesson here for debates about computationalism. Suppose we want to claim, for example, that a computation that plays chess is "equivalent" to some other computation that simulates a waterfall. Then our claim is only non-vacuous if it's possible to exhibit the equivalence (i.e., give the reductions) within a model of computation that isn't itself powerful enough to solve the chess or waterfall problems.

[37] Note that a LOGSPACE machine does not even have enough memory to store its input string! For this reason, we think of the input string as being provided on a special read-only tape.

7 PAC-Learning and the Problem of Induction

Centuries ago, David Hume [76] famously pointed out that learning from the past (and, by extension, science) seems logically impossible.
For example, if we sample 500 ravens and every one of them is black, why does that give us any grounds—even probabilistic grounds—for expecting the 501st raven to be black also? Any modern answer to this question would probably refer to Occam's razor, the principle that simpler hypotheses consistent with the data are more likely to be correct. So for example, the hypothesis that all ravens are black is "simpler" than the hypothesis that most ravens are green or purple, and that only the 500 we happened to see were black. Intuitively, it seems Occam's razor must be part of the solution to Hume's problem; the difficulty is that such a response leads to questions of its own:

(1) What do we mean by "simpler"?

(2) Why are simple explanations likely to be correct? Or, less ambitiously: what properties must reality have for Occam's Razor to "work"?

(3) How much data must we collect before we can find a "simple hypothesis" that will probably predict future data? How do we go about finding such a hypothesis?

In my view, the theory of PAC (Probabilistically Approximately Correct) Learning, initiated by Leslie Valiant [127] in 1984, has made large enough advances on all of these questions that it deserves to be studied by anyone interested in induction.[38] In this theory, we consider an idealized "learner," who is presented with points x₁, ..., x_m drawn randomly from some large set S, together with the "classifications" f(x₁), ..., f(x_m) of those points. The learner's goal is to infer the function f, well enough to be able to predict f(x) for most future points x ∈ S. As an example, the learner might be a bank, S might be a set of people (represented by their credit histories), and f(x) might represent whether or not person x will default on a loan.
For simplicity, we often assume that S is a set of binary strings, and that the function f maps each x ∈ S to a single bit, f(x) ∈ {0, 1}. Both assumptions can be removed without significantly changing the theory. The important assumptions are the following:

(1) Each of the sample points x₁, ..., x_m is drawn independently from some (possibly-unknown) "sample distribution" D over S. Furthermore, the future points x on which the learner will need to predict f(x) are drawn from the same distribution.

(2) The function f belongs to a known "hypothesis class" H. This H represents "the set of possibilities the learner is willing to entertain" (and is typically much smaller than the set of all 2^{|S|} possible functions from S to {0, 1}).

Under these assumptions, we have the following central result.

Theorem 2 (Valiant [127]) Consider a finite hypothesis class H, a Boolean function f : S → {0, 1} in H, and a sample distribution D over S, as well as an error rate ε > 0 and failure probability δ > 0 that the learner is willing to tolerate. Call a hypothesis h : S → {0, 1} "good" if

  Pr_{x∼D} [h(x) = f(x)] ≥ 1 − ε.

Also, call sample points x₁, ..., x_m "reliable" if any hypothesis h ∈ H that satisfies h(xᵢ) = f(xᵢ) for all i ∈ {1, ..., m} is good. Then

  m = (1/ε) ln(|H|/δ)

sample points x₁, ..., x_m drawn independently from D will be reliable with probability at least 1 − δ.

[38] See Kearns and Vazirani [82] for an excellent introduction to PAC-learning, and de Wolf [136] for previous work applying PAC-learning to philosophy and linguistics: specifically, to fleshing out Chomsky's "poverty of the stimulus" argument. De Wolf also discusses several formalizations of Occam's Razor other than the one based on PAC-learning.
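To get a feel for the bound, here is a quick numeric check (a hypothetical sketch; the parameter values are just examples, and the function name is mine). With hypotheses describable in n = 1000 bits (so |H| = 2¹⁰⁰⁰), error rate ε = 0.01, and failure probability δ = 10⁻⁶, the required sample size comes to roughly 70,000 points: linear in n, despite the exponentially large hypothesis class:

```python
import math

def pac_sample_size(hypothesis_bits, eps, delta):
    # m = (1/eps) * ln(|H| / delta), with |H| = 2**hypothesis_bits
    return math.ceil((hypothesis_bits * math.log(2) + math.log(1 / delta)) / eps)

m = pac_sample_size(1000, 0.01, 1e-6)
assert 70_000 < m < 71_000    # about 70,697 samples suffice
```

Doubling the description length roughly doubles m, while shrinking δ by further exponential factors costs only additively.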
Intuitively, Theorem 2 says that the behavior of f on a small number of randomly-chosen points probably determines its behavior on most of the remaining points. In other words, if, by some unspecified means, the learner manages to find any hypothesis h ∈ H that makes correct predictions on all its past data points x₁, ..., x_m, then provided m is large enough (and as it happens, m doesn't need to be very large), the learner can be statistically confident that h will also make the correct predictions on most future points.

The part of Theorem 2 that bears the unmistakable imprint of complexity theory is the bound on sample size, m ≥ (1/ε) ln(|H|/δ). This bound has three notable implications. First, even if the class H contains exponentially many hypotheses (say, 2^n), one can still learn an arbitrary function f ∈ H using a linear amount of sample data, since m grows only logarithmically with |H|: in other words, like the number of bits needed to write down an individual hypothesis. Second, one can make the probability that the hypothesis h will fail to generalize exponentially small (say, δ = 2^{−n}), at the cost of increasing the sample size m by only a linear factor. Third, assuming the hypothesis does generalize, its error rate ε decreases inversely with m. It is not hard to show that each of these dependencies is tight, so that for example, if we demand either ε = 0 or δ = 0 then no finite m suffices. This is the origin of the name "PAC-learning": the most one can hope for is to output a hypothesis that is "probably, approximately" correct.

The proof of Theorem 2 is easy: consider any hypothesis h ∈ H that is bad, meaning that

  Pr_{x∼D} [h(x) = f(x)] < 1 − ε.

Then by the independence assumption,

  Pr_{x₁,...,x_m∼D} [h(x₁) = f(x₁) ∧ ··· ∧ h(x_m) = f(x_m)] < (1 − ε)^m.
Now, the number of bad hypotheses is no more than the total number of hypotheses, |H|. So by the union bound, the probability that there exists a bad hypothesis agreeing with f on all of x_1, ..., x_m can be at most |H| · (1 − ε)^m. So to ensure a failure probability of at most δ, it suffices to choose m such that |H| · (1 − ε)^m ≤ δ; since (1 − ε)^m ≤ e^{−εm}, taking m = (1/ε) ln(|H|/δ) does the job.

The relevance of Theorem 2 to Hume's problem of induction is that the theorem describes a nontrivial class of situations where induction is guaranteed to work with high probability. Theorem 2 also illuminates the role of Occam's Razor in induction. In order to learn using a "reasonable" number of sample points m, the hypothesis class H must have a sufficiently small cardinality. But that is equivalent to saying that every hypothesis h ∈ H must have a succinct description—since the number of bits needed to specify an arbitrary hypothesis h ∈ H is simply ⌈log_2 |H|⌉. If the number of bits needed to specify a hypothesis is too large, then H will always be vulnerable to the problem of overfitting: some hypotheses h ∈ H surviving contact with the sample data just by chance.

As pointed out to me by Agustín Rayo, there are several possible interpretations of Occam's Razor that have nothing to do with descriptive complexity: for example, we might want our hypotheses to be "simple" in terms of their ontological or ideological commitments. However, to whatever extent we interpret Occam's Razor as saying that shorter or lower-complexity hypotheses are preferable, Theorem 2 comes closer than one might have thought possible to a mathematical justification for why the Razor works.

Many philosophers might be familiar with alternative formal approaches to Occam's Razor.
For example, within a Bayesian framework, one can choose a prior over all possible hypotheses that gives greater weight to "simpler" hypotheses (where simplicity is measured, for example, by the length of the shortest program that computes the predictions). However, while the PAC-learning and Bayesian approaches are related, the PAC approach has the advantage of requiring only a qualitative decision about which hypotheses one wants to consider, rather than a quantitative prior over hypotheses. Given the hypothesis class H, one can then seek learning methods that work for any f ∈ H. (On the other hand, the PAC approach requires an assumption about the probability distribution over observations, while the Bayesian approach does not.)

7.1 Drawbacks of the Basic PAC Model

I'd now like to discuss three drawbacks of Theorem 2, since I think the drawbacks illuminate philosophical aspects of induction as well as the advantages do.

The first drawback is that Theorem 2 works only for finite hypothesis classes. In science, however, hypotheses often involve continuous parameters, of which there is an uncountable infinity. Of course, one could solve this problem by simply discretizing the parameters, but then the number of hypotheses (and therefore the relevance of Theorem 2) would depend on how fine the discretization was. Fortunately, we can avoid such difficulties by realizing that the learner only cares about the "differences" between two hypotheses insofar as they lead to different predictions. This leads to the fundamental notion of VC-dimension (after its originators, Vapnik and Chervonenkis [129]).

Definition 3 (VC-dimension) A hypothesis class H shatters the sample points {x_1, ..., x_k} ⊆ S if for all 2^k possible settings of h(x_1), ..., h(x_k), there exists a hypothesis h ∈ H compatible with those settings. Then VCdim(H), the VC-dimension of H, is the largest k for which there exists a subset {x_1, ..., x_k} ⊆ S that H shatters (or if no finite maximum exists, then VCdim(H) = ∞).

Clearly any finite hypothesis class has finite VC-dimension: indeed, VCdim(H) ≤ log_2 |H|. However, even an infinite hypothesis class can have finite VC-dimension if it is "sufficiently simple." For example, let H be the class of all functions h_{a,b} : R → {0, 1} of the form

    h_{a,b}(x) = 1 if a ≤ x ≤ b, and 0 otherwise.

Then it is easy to check that VCdim(H) = 2.

With the notion of VC-dimension in hand, we can state a powerful (and harder-to-prove!) generalization of Theorem 2, due to Blumer et al. [31].

Theorem 4 (Blumer et al. [31]) For some universal constant K > 0, the bound on m in Theorem 2 can be replaced by

    m = K · (VCdim(H)/ε) · ln(1/(δε)),

with the theorem now holding for any hypothesis class H, finite or infinite.

If H has infinite VC-dimension, then it is easy to construct a probability distribution D over sample points such that no finite number m of samples from D suffices to PAC-learn a function f ∈ H: one really is in the unfortunate situation described by Hume, of having no grounds at all for predicting that the next raven will be black. In some sense, then, Theorem 4 is telling us that finite VC-dimension is a necessary and sufficient condition for scientific induction to be possible. Once again, Theorem 4 also has an interpretation in terms of Occam's Razor, with the smallness of the VC-dimension now playing the role of simplicity.

The second drawback of Theorem 2 is that it gives us no clues about how to find a hypothesis h ∈ H consistent with the sample data. All it says is that, if we find such an h, then h will probably be close to the truth.
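As a sanity check on the claim above that the interval class has VC-dimension 2, here is a small brute-force Python test (a sketch of my own; the helper names are hypothetical). It verifies that two points can be shattered by hypotheses h_{a,b}, but three cannot:

```python
from itertools import product

def interval_consistent(points, labels):
    """Is some h_{a,b} (1 on [a, b], 0 elsewhere) consistent with the labels?

    A labeling is realizable by an interval iff every point lying between
    the leftmost and rightmost 1-labeled points is itself labeled 1.
    """
    ones = [x for x, lab in zip(points, labels) if lab == 1]
    if not ones:
        return True  # realized by an empty interval, e.g. [1, 0]
    a, b = min(ones), max(ones)
    return all(lab == 1 for x, lab in zip(points, labels) if a <= x <= b)

def shatters(points):
    """Do the interval hypotheses realize all 2^k labelings of `points`?"""
    return all(interval_consistent(points, labs)
               for labs in product([0, 1], repeat=len(points)))

# Any two distinct points are shattered, but no three can be:
# the labeling (1, 0, 1) on sorted points is never an interval.
```

Since two points are shattered and three are not, VCdim(H) = 2, as claimed.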
This second drawback illustrates that, even in the simple setup envisioned by PAC-learning, induction cannot be merely a matter of seeing enough data and then "generalizing" from it, because immense computations might be needed to find a suitable generalization! Indeed, following the work of Kearns and Valiant [81], we now know that many natural learning problems—as an example, inferring the rules of a regular or context-free language from random examples of grammatical and ungrammatical sentences—are computationally intractable in an extremely strong sense: any polynomial-time algorithm for finding a hypothesis consistent with the data would imply a polynomial-time algorithm for breaking widely-used cryptosystems such as RSA![39]

The appearance of cryptography in the above statement is far from accidental. In a sense that can be made precise, learning and cryptography are "dual" problems: a learner wants to find patterns in data, while a cryptographer wants to generate data whose patterns are hard to find. More concretely, one of the basic primitives in cryptography is called a pseudorandom function family. This is a family of efficiently-computable Boolean functions f_s : {0, 1}^n → {0, 1}, parameterized by a short random "seed" s, that are virtually indistinguishable from random functions by a polynomial-time algorithm. Here, we imagine that the would-be distinguishing algorithm can query the function f_s on various points x, and also that it knows the mapping from s to f_s, and so is ignorant only of the seed s itself. There is strong evidence in cryptography that pseudorandom function families exist: indeed, Goldreich, Goldwasser, and Micali [64] showed how to construct one starting from any pseudorandom generator (the latter was mentioned in Section 1.1).
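To make the interface of a pseudorandom function family concrete, here is a toy Python sketch of my own. It is only a heuristic stand-in—it hashes the seed together with the input, whereas the Goldreich–Goldwasser–Micali construction referenced above is built from a pseudorandom generator—but it illustrates the setup the text describes:

```python
import hashlib

def make_f(seed: bytes):
    """A toy 'pseudorandom function' f_s : {0,1}^n -> {0,1}.

    Heuristic stand-in only: we hash seed || x and keep one bit. This is
    not the Goldreich-Goldwasser-Micali construction, but the interface
    is the same: an adversary may query f_s at any points it likes, and
    knows the mapping from seeds to functions, yet is ignorant of the
    seed s itself.
    """
    def f(x: str) -> int:          # x is a bitstring such as "0110"
        digest = hashlib.sha256(seed + x.encode("ascii")).digest()
        return digest[0] & 1
    return f

f_s = make_f(b"short random seed")
training = {x: f_s(x) for x in ["000", "001", "010", "011"]}
# A PAC-learner sees `training` and must predict f_s on new points like
# "100" -- succeeding would distinguish f_s from a truly random function.
```

The next paragraph of the text turns exactly this observation into a hardness result for learning.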
Now, given a pseudorandom function family {f_s}, imagine a PAC-learner whose hypothesis class H consists of f_s for all possible seeds s. The learner is provided some randomly-chosen sample points x_1, ..., x_m ∈ {0, 1}^n, together with the values of f_s on those points: f_s(x_1), ..., f_s(x_m). Given this "training data," the learner's goal is to figure out how to compute f_s for itself—and thereby predict the values of f_s(x) on new points x, points not in the training sample. Unfortunately, it's easy to see that if the learner could do that, then it would thereby distinguish f_s from a truly random function—and thereby contradict our starting assumption that {f_s} was pseudorandom! Our conclusion is that, if the basic assumptions of modern cryptography hold (and in particular, if there exist pseudorandom generators), then there must be situations where learning is impossible purely because of computational complexity (and not because of insufficient data).

[39] In the setting of "proper learning"—where the learner needs to output a hypothesis in some specified format—it is even known that many natural PAC-learning problems are NP-complete (see Pitt and Valiant [104] for example). But in the "improper" setting—where the learner can describe its hypothesis using any polynomial-time algorithm—it is only known how to show that PAC-learning problems are hard under cryptographic assumptions, and there seem to be inherent reasons for this (see Applebaum, Barak, and Xiao [14]).

The third drawback of Theorem 2 is the assumption that the distribution D from which the learner is tested is the same as the distribution from which the sample points were drawn.
To me, this is the most serious drawback, since it tells us that PAC-learning models the "learning" performed by an undergraduate cramming for an exam by solving last year's problems, or an employer using a regression model to identify the characteristics of successful hires, or a cryptanalyst breaking a code from a collection of plaintexts and ciphertexts. It does not, however, model the "learning" of an Einstein or a Szilard, making predictions about phenomena that are different in kind from anything yet observed. As David Deutsch stresses in his recent book The Beginning of Infinity [49], the goal of science is not merely to summarize observations, and thereby let us make predictions about similar observations. Rather, the goal is to discover explanations with "reach," meaning the ability to predict what would happen even in novel or hypothetical situations, like the Sun suddenly disappearing or a quantum computer being built. In my view, developing a compelling mathematical model of explanatory learning—a model that "is to explanation as the PAC model is to prediction"—is an outstanding open problem.[40]

7.2 Computational Complexity, Bleen, and Grue

In 1955, Nelson Goodman [67] proposed what he called the "new riddle of induction," which survives the Occam's Razor answer to Hume's original induction problem. In Goodman's riddle, we are asked to consider the hypothesis "All emeralds are green." The question is, why do we favor that hypothesis over the following alternative, which is equally compatible with all our evidence of green emeralds? "All emeralds are green before January 1, 2030, and then blue afterwards."

The obvious answer is that the second hypothesis adds superfluous complications, and is therefore disfavored by Occam's Razor.
To that, Goodman replies that the definitions of "simple" and "complicated" depend on our language. In particular, suppose we had no words for green or blue, but we did have a word grue, meaning "green before January 1, 2030, and blue afterwards," and a word bleen, meaning "blue before January 1, 2030, and green afterwards." In that case, we could only express the hypothesis "All emeralds are green" by saying "All emeralds are grue before January 1, 2030, and then bleen afterwards"—a manifestly more complicated hypothesis than the simple "All emeralds are grue"!

[40] Important progress toward this goal includes the work of Angluin [11] on learning finite automata from queries and counterexamples, and that of Angluin et al. [12] on learning a circuit by injecting values. Both papers study natural learning models that generalize the PAC model by allowing "controlled scientific experiments," whose results confirm or refute a hypothesis and thereby provide guidance about which experiments to do next.

I confess that, when I contemplate the grue riddle, I can't help but recall the joke about the Anti-Inductivists, who, when asked why they continue to believe that the future won't resemble the past, when that false belief has brought their civilization nothing but poverty and misery, reply, "because anti-induction has never worked before!" Yes, if we artificially define our primitive concepts "against the grain of the world," then we shouldn't be surprised if the world's actual behavior becomes more cumbersome to describe, or if we make wrong predictions. It would be as if we were using a programming language that had no built-in function for multiplication, but only for F(x, y) := 17x − y − x^2 + 2xy.
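Footnote 41 explains how multiplication can be recovered from such a function F; restricting to the pure quadratic case treated there, the identity is easy to sanity-check numerically. This is a sketch of my own, with hypothetical helper names:

```python
def check_identity(p, q, r, s, samples):
    """Verify xy = F(s*x - q*y, -r*x + p*y) / (p*s - q*r)^2, where
    F(u, v) = (p*u + q*v) * (r*u + s*v) is the quadratic form that our
    impoverished language offers in place of multiplication."""
    F = lambda u, v: (p * u + q * v) * (r * u + s * v)
    denom = (p * s - q * r) ** 2
    return all(F(s * x - q * y, -r * x + p * y) == denom * x * y
               for x, y in samples)

# Concrete case: F(x, y) = x^2 + 5xy + 6y^2 = (x + 2y)(x + 3y),
# so (p, q, r, s) = (1, 2, 1, 3) and ps - qr = 1.
pairs = [(x, y) for x in range(-5, 6) for y in range(-5, 6)]
assert check_identity(1, 2, 1, 3, pairs)
```

So a language supplying only F really does let us define multiplication in terms of F once and then forget about F.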
In that case, a normal person's first instinct would be either to switch programming languages, or else to define multiplication in terms of F, and forget about F from that point onward![41]

Now, there is a genuine philosophical problem here: why do grue, bleen, and F(x, y) go "against the grain of the world," whereas green, blue, and multiplication go with the grain? But to me, that problem (like Wigner's puzzlement over "the unreasonable effectiveness of mathematics in natural sciences" [135]) is more about the world itself than about human concepts, so we shouldn't expect any purely linguistic analysis to resolve it. What about computational complexity, then? In my view, while computational complexity doesn't solve the grue riddle, it does contribute a useful insight. Namely, that when we talk about the simplicity or complexity of hypotheses, we should distinguish two issues:

(a) The asymptotic scaling of the hypothesis size, as the "size" n of our learning problem goes to infinity.

(b) The constant-factor overheads.

In terms of the basic PAC model in Section 7, we can imagine a "hidden parameter" n, which measures the number of bits needed to specify an individual point in the set S = S_n. (Other ways to measure the "size" of a learning problem would also work, but this way is particularly convenient.) For convenience, we can identify S_n with the set {0, 1}^n of n-bit strings, so that n = log_2 |S_n|. We then need to consider, not just a single hypothesis class, but an infinite family of hypothesis classes H = {H_1, H_2, H_3, ...}, one for each positive integer n. Here H_n consists of hypothesis functions h that map S_n = {0, 1}^n to {0, 1}. Now let L be a language for specifying hypotheses in H: in other words, a mapping from (some subset of) binary strings y ∈ {0, 1}* to H.
Also, given a hypothesis h ∈ H, let

    κ_L(h) := min {|y| : L(y) = h}

be the length of the shortest description of h in the language L. (Here |y| just means the number of bits in y.) Finally, let

    κ_L(n) := max {κ_L(h) : h ∈ H_n}

be the number of bits needed to specify an arbitrary hypothesis in H_n using the language L. Clearly κ_L(n) ≥ ⌈log_2 |H_n|⌉, with equality if and only if L is "optimal" (that is, if it represents each hypothesis h ∈ H_n using as few bits as possible). The question that concerns us is how quickly κ_L(n) grows as a function of n, for various choices of language L.

[41] Suppose that our programming language provides only multiplication by constants, addition, and the function F(x, y) := ax^2 + bxy + cy^2 + dx + ey + f. We can assume without loss of generality that d = e = f = 0. Then provided ax^2 + bxy + cy^2 factors into two independent linear terms, px + qy and rx + sy, we can express the product xy as

    xy = F(sx − qy, −rx + py) / (ps − qr)^2.

What does any of this have to do with the grue riddle? Well, we can think of the details of L (its syntax, vocabulary, etc.) as affecting the "lower-order" behavior of the function κ_L(n). So for example, suppose we are unlucky enough that L contains the words grue and bleen, but not blue and green. That might increase κ_L(n) by a factor of ten or so—since now, every time we want to mention "green" when specifying our hypothesis h, we instead need a wordy circumlocution like "grue before January 1, 2030, and then bleen afterwards," and similarly for blue.[42] However, a crucial lesson of complexity theory is that the "higher-order" behavior of κ_L(n)—for example, whether it grows polynomially or exponentially with n—is almost completely unaffected by the details of L!
The reason is that, if two languages L_1 and L_2 differ only in their "low-level details," then translating a hypothesis from L_1 to L_2 or vice versa will increase the description length by no more than a polynomial factor. Indeed, as in our grue example, there is usually a "universal translation constant" c such that κ_{L_1}(h) ≤ c·κ_{L_2}(h) or even κ_{L_1}(h) ≤ κ_{L_2}(h) + c for every hypothesis h ∈ H.

The one exception to the above rule is if the languages L_1 and L_2 have different expressive powers. For example, maybe L_1 only allows nesting expressions to depth two, while L_2 allows nesting to arbitrary depths; or L_1 only allows propositional connectives, while L_2 also allows first-order quantifiers. In those cases, κ_{L_1}(h) could indeed be much greater than κ_{L_2}(h) for some hypotheses h, possibly even exponentially greater (κ_{L_1}(h) ≈ 2^{κ_{L_2}(h)}). A rough analogy would be this: suppose you hadn't learned what differential equations were, and had no idea how to solve them even approximately or numerically. In that case, Newtonian mechanics might seem just as complicated to you as the Ptolemaic theory with epicycles, if not more complicated! For the only way you could make predictions with Newtonian mechanics would be using a huge table of "precomputed" differential equation solutions—and to you, that table would seem just as unwieldy and inelegant as a table of epicycles. But notice that in this case, your perception would be the result, not of some arbitrary choice of vocabulary, but of an objective gap in your mathematical expressive powers.

To summarize, our choice of vocabulary—for example, whether we take green/blue or bleen/grue as primitive concepts—could indeed matter if we want to use Occam's Razor to predict the future color of emeralds.
But I think that complexity theory justifies us in treating grue as a "small-n effect": something that becomes less and less important in the asymptotic limit of more and more complicated learning problems.

[42] Though note that, if the language L is expressive enough to allow this, we can simply define green and blue in terms of bleen and grue once, then refer back to those definitions whenever needed! In that case, taking bleen and grue (rather than green and blue) to be the primitive concepts would increase κ_L(n) by only an additive constant, rather than a multiplicative constant. The above fact is related to a fundamental result from the theory of Kolmogorov complexity (see Li and Vitányi [89] for example). Namely, if P and Q are any two Turing-universal programming languages, and if K_P(x) and K_Q(x) are the lengths of the shortest programs in P and Q respectively that output a given string x ∈ {0, 1}*, then there exists a universal "translation constant" c_{PQ}, such that |K_P(x) − K_Q(x)| ≤ c_{PQ} for every x. This c_{PQ} is just the number of bits needed to write a P-interpreter for Q-programs or vice versa.

8 Quantum Computing

Quantum computing is a proposal for using quantum mechanics to solve certain computational problems much faster than we know how to solve them today.[43] To do so, one would need to build a new type of computer, capable of exploiting the quantum effects of superposition and interference. Building such a computer—one large enough to solve interesting problems—remains an enormous challenge for physics and engineering, due to the fragility of quantum states and the need to isolate them from their external environment. In the meantime, though, theoretical computer scientists have extensively studied what we could and couldn't do with a quantum computer if we had one.
For certain problems, remarkable quantum algorithms are known to solve them in polynomial time, even though the best-known classical algorithms require exponential time. Most famously, in 1994 Peter Shor [117] gave a polynomial-time quantum algorithm for factoring integers, and as a byproduct, breaking most of the cryptographic codes used on the Internet today. Besides the practical implications, Shor's algorithm also provided a key piece of evidence that switching from classical to quantum computers would enlarge the class of problems solvable in polynomial time. For theoretical computer scientists, this had a profound lesson: if we want to know the limits of efficient computation, we may need to "leave our armchairs" and incorporate actual facts about physics (at a minimum, the truth or falsehood of quantum mechanics!).[44]

Whether or not scalable quantum computers are built anytime soon, my own (biased) view is that quantum computing represents one of the great scientific advances of our time. But here I want to ask a different question: does quantum computing have any implications for philosophy—and specifically, for the interpretation of quantum mechanics?

From one perspective, the answer seems like an obvious "no." Whatever else it is, quantum computing is "merely" an application of quantum mechanics, as that theory has existed in physics textbooks for 80 years. Indeed, if you accept that quantum mechanics (as currently understood) is true, then presumably you should also accept the possibility of quantum computers, and make the same predictions about their operation as everyone else.
Whether you describe the "reality" behind quantum processes via the Many-Worlds Interpretation, Bohmian mechanics, or some other view (or, following Bohr's Copenhagen Interpretation, refuse to discuss the "reality" at all), seems irrelevant.

From a different perspective, though, a scalable quantum computer would test quantum mechanics in an extremely novel regime—and for that reason, it could indeed raise new philosophical issues. The "regime" quantum computers would test is characterized not by an energy scale or a temperature, but by computational complexity. One of the most striking facts about quantum mechanics is that, to represent the state of n entangled particles, one needs a vector of size exponential in n. For example, to specify the state of a thousand spin-1/2 particles, one needs 2^1000 complex numbers called "amplitudes," one for every possible outcome of measuring the spins in the {up, down} basis. The quantum state, denoted |ψ⟩, is then a linear combination or "superposition" of the possible outcomes, with each outcome |x⟩ weighted by its amplitude α_x:

    |ψ⟩ = Σ_{x ∈ {up, down}^1000} α_x |x⟩.

Given |ψ⟩, one can calculate the probability p_x that any particular outcome |x⟩ will be observed, via the rule p_x = |α_x|^2.[45]

[43] The authoritative reference for quantum computing is the book of Nielsen and Chuang [99]. For gentler introductions, try Mermin [92, 93] or the survey articles of Aharonov [10], Fortnow [57], or Watrous [132]. For a general discussion of polynomial-time computation and the laws of physics (including speculative models beyond quantum computation), see my survey article "NP-complete Problems and Physical Reality" [4].

[44] By contrast, if we only want to know what is computable in the physical universe, with no efficiency requirement, then it remains entirely consistent with current knowledge that Church and Turing gave the correct answer in the 1930s—and that they did so without incorporating any physics beyond what is "accessible to intuition."

Now, there are only about 10^80 atoms in the visible universe, which is a much smaller number than 2^1000. So assuming quantum mechanics is true, it seems Nature has to invest staggering amounts of "computational effort" to keep track of small collections of particles—certainly more than anything classical physics requires![46][47] In the early 1980s, Richard Feynman [55] and others called attention to this point, noting that it underlay something that had long been apparent in practice: the extraordinary difficulty of simulating quantum mechanics using conventional computers. But Feynman also raised the possibility of turning that difficulty around, by building our computers out of quantum components. Such computers could conceivably solve certain problems faster than conventional computers: if nothing else, then at least the problem of simulating quantum mechanics! Thus, quantum computing is interesting not just because of its applications, but (even more, in my opinion) because it is the first technology that would directly "probe" the exponentiality inherent in the quantum description of Nature.

One can make an analogy here to the experiments in the 1980s that first convincingly violated the Bell Inequality.
Like quantum algorithms today, Bell's refutation of local realism was "merely" a mathematical consequence of quantum mechanics. But that refutation (and the experiments that it inspired) made conceptually-important aspects of quantum mechanics no longer possible to ignore—and for that reason, it changed the philosophical landscape. It seems overwhelmingly likely to me that quantum computing will do the same.

Indeed, we can extend the analogy further: just as there were "local realist diehards" who denied that Bell Inequality violation would be possible (and tried to explain it away after it was achieved), so today a vocal minority of computer scientists and physicists (including Leonid Levin [88], Oded Goldreich [61], and Gerard 't Hooft [75]) denies the possibility of scalable quantum computers, even in principle. While they admit that quantum mechanics has passed every experimental test for a century, these skeptics are confident that quantum mechanics will fail in the regime tested by quantum computing—and that whatever new theory replaces it, that theory will allow only classical computing.

[45] This means, in particular, that the amplitudes satisfy the normalization condition Σ_x |α_x|^2 = 1.

[46] One might object that even in the classical world, if we simply don't know the value of (say) an n-bit string, then we also describe our ignorance using exponentially-many numbers: namely, the probability p_x of each possible string x ∈ {0, 1}^n! And indeed, there is an extremely close connection between quantum mechanics and classical probability theory; I often describe quantum mechanics as just "probability theory with complex numbers instead of nonnegative reals." However, a crucial difference is that we can always describe a classical string x as "really" having a definite value; the vector of 2^n probabilities p_x is then just a mental representation of our own ignorance. With a quantum state, we do not have the same luxury, because of the phenomenon of interference between positive and negative amplitudes.

[47] One might also object that, even in classical physics, it takes infinitely many bits to record the state of even a single particle, if its position and momentum can be arbitrary real numbers. And indeed, Copeland [43], Hogarth [73], Siegelmann [118], and other writers have speculated that the continuity of physical quantities might actually allow "hypercomputations"—including solving the halting problem in a finite amount of time! From a modern perspective, though, quantum mechanics and quantum gravity strongly suggest that the "continuity" of measurable quantities such as positions and momenta is a theoretical artifact. In other words, it ought to suffice for simulation purposes to approximate these quantities to some finite precision, probably related to the Planck scale of 10^−33 centimeters or 10^−43 seconds. But the exponentiality of quantum states is different, for at least two reasons. Firstly, it doesn't lead to computational speedups that are nearly as "unreasonable" as the hypercomputing speedups. Secondly, no one has any idea where the theory in question (quantum mechanics) could break down, in a manner consistent with current experiments. In other words, there is no known "killer obstacle" for quantum computing, analogous to the Planck scale for hypercomputing. See Aaronson [2] for further discussion of this point, as well as a proposed complexity-theoretic framework (called "Sure/Shor separators") with which to study such obstacles.
As most quantum computing researchers are quick to point out in response, they would be thrilled if the attempt to build scalable quantum computers led instead to a revision of quantum mechanics! Such an outcome would probably constitute the largest revolution in physics since the 1920s, and ultimately be much more interesting than building a quantum computer. Of course, it is also possible that scalable quantum computing will be given up as too difficult for "mundane" technological reasons, rather than fundamental physics reasons. But that "mundane" possibility is not what skeptics such as Levin, Goldreich, and 't Hooft are talking about.

8.1 Quantum Computing and the Many-Worlds Interpretation

But let's return to the original question: suppose the skeptics are wrong, and it is possible to build scalable quantum computers. Would that have any relevance to the interpretation of quantum mechanics? The best-known argument that the answer is "yes" was made by David Deutsch, a quantum computing pioneer and staunch defender of the Many-Worlds Interpretation. To be precise, Deutsch thinks that quantum mechanics straightforwardly implies the existence of parallel universes, and that it does so independently of quantum computing: on his view, even the double-slit experiment can only be explained in terms of two parallel universes interfering. However, Deutsch also thinks that quantum computing adds emotional punch to the argument. Here is how he put it in his 1997 book The Fabric of Reality [48, p. 217]:

    Logically, the possibility of complex quantum computations adds nothing to a case [for the Many-Worlds Interpretation] that is already unanswerable. But it does add psychological impact. With Shor's algorithm, the argument has been writ very large.
    To those who still cling to a single-universe world-view, I issue this challenge: explain how Shor's algorithm works. I do not merely mean predict that it will work, which is merely a matter of solving a few uncontroversial equations. I mean provide an explanation. When Shor's algorithm has factorized a number, using 10^500 or so times the computational resources that can be seen to be present, where was the number factorized? There are only about 10^80 atoms in the entire visible universe, an utterly minuscule number compared with 10^500. So if the visible universe were the extent of physical reality, physical reality would not even remotely contain the resources required to factorize such a large number. Who did factorize it, then? How, and where, was the computation performed?

There is plenty in the above paragraph for an enterprising philosopher to mine. In particular, how should a nonbeliever in Many-Worlds answer Deutsch's challenge? In the rest of this section, I'll focus on two possible responses.

The first response is to deny that, if Shor's algorithm works as predicted, that can only be explained by postulating "vast computational resources." At the most obvious level, complexity theorists have not yet ruled out the possibility of a fast classical factoring algorithm.[48] More generally, that quantum computers can solve certain problems superpolynomially faster than classical computers is not a theorem, but a (profound, plausible) conjecture.[49][50] If the conjecture failed, then the door would seem open to what we might call "polynomial-time hidden-variable theories": theories that reproduce the predictions of quantum mechanics without invoking any computations outside P.[51] These would be analogous to the local hidden variable theories that Einstein and others had hoped for, before Bell ruled such theories out.
A second response to Deutsch's challenge is that, even if we agree that Shor's algorithm demonstrates the reality of vast computational resources in Nature, it is not obvious that we should think of those resources as "parallel universes." Why not simply say that there is one universe, and that it is quantum-mechanical? Doesn't the parallel-universes language reflect an ironic parochialism: a desire to impose a familiar science-fiction image on a mathematical theory that is stranger than fiction, that doesn't match any of our pre-quantum intuitions (including computational intuitions) particularly well?

One can sharpen the point as follows: if one took the parallel-universes explanation of how a quantum computer works too seriously (as many popular writers do!), then it would be natural to make further inferences about quantum computing that are flat-out wrong. For example:

"Using only a thousand quantum bits (or qubits), a quantum computer could store 2^1000 classical bits."

This is true only for a bizarre definition of the word "store"! The fundamental problem is that, when you measure a quantum computer's state, you see only one of the possible outcomes; the rest disappear. Indeed, a celebrated result called Holevo's Theorem [74] says that, using n qubits, there is no way to store more than n classical bits so that the bits can be reliably retrieved later. In other words: for at least one natural definition of "information-carrying capacity," qubits have exactly the same capacity as bits. To take another example:

[48] Indeed, one cannot rule that possibility out, without first proving P ≠ NP! But even if P ≠ NP, a fast classical factoring algorithm might still exist, again because factoring is not thought to be NP-complete.
[49] A formal version of this conjecture is BPP ≠ BQP, where BPP (Bounded-Error Probabilistic Polynomial-Time) and BQP (Bounded-Error Quantum Polynomial-Time) are the classes of problems efficiently solvable by classical randomized algorithms and quantum algorithms respectively. Bernstein and Vazirani [29] showed that P ⊆ BPP ⊆ BQP ⊆ PSPACE, where PSPACE is the class of problems solvable by a deterministic Turing machine using a polynomial amount of memory (but possibly exponential time). For this reason, any proof of the BPP ≠ BQP conjecture would immediately imply P ≠ PSPACE as well. The latter would be considered almost as great a breakthrough as P ≠ NP.

[50] Complicating matters, there are quantum algorithms that provably achieve exponential speedups over any classical algorithm: one example is Simon's algorithm [119], an important predecessor of Shor's algorithm. However, all such algorithms are formulated in the "black-box model" (see Beals et al. [23]), where the resource to be minimized is the number of queries that an algorithm makes to a hypothetical black box. Because it is relatively easy to analyze, the black-box model is a crucial source of insights about what might be true in the conventional Turing machine model. However, it is also known that the black-box model sometimes misleads us about the "real" situation. As a famous example, the complexity classes IP and PSPACE are equal [115], despite the existence of a black box that separates them (see Fortnow [56] for discussion). Besides the black-box model, unconditional exponential separations between quantum and classical complexities are known in several other restricted models, including communication complexity [107].
[51] Technically, if the hidden-variable theory involved classical randomness, then it would correspond more closely to the complexity class BPP (Bounded-Error Probabilistic Polynomial-Time). However, today there is strong evidence that P = BPP (see Impagliazzo and Wigderson [79]).

"Unlike a classical computer, which can only factor numbers by trying the divisors one by one, a quantum computer could try all possible divisors in parallel."

If quantum computers can harness vast numbers of parallel worlds, then the above seems like a reasonable guess as to how Shor's algorithm works. But it's not how it works at all. Notice that, if Shor's algorithm did work that way, then it could be used not only for factoring integers, but also for the much larger task of solving NP-complete problems in polynomial time. (As mentioned in footnote 12, the factoring problem is strongly believed not to be NP-complete.) But contrary to a common misconception, quantum computers are neither known nor believed to be able to solve NP-complete problems efficiently.[52] As usual, the fundamental problem is that measuring reveals just a single random outcome |x⟩. To get around that problem, and ensure that the right outcome is observed with high probability, a quantum algorithm needs to generate an interference pattern, in which the computational paths leading to a given wrong outcome cancel each other out, while the paths leading to a given right outcome reinforce each other. This is a delicate requirement, and as far as anyone knows, it can only be achieved for a few problems, most of which (like the factoring problem) have special structure arising from algebra or number theory.[53]
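As a toy illustration of this cancellation (not of Shor's algorithm itself, which is far more involved), consider applying the Hadamard gate to a single qubit twice. Each outcome is reached along two computational paths; the paths to the "wrong" outcome |1⟩ carry amplitudes +1/2 and −1/2 and cancel, while the paths to |0⟩ reinforce. A minimal Python sketch:

```python
import math

# Toy interference demo: apply the Hadamard gate twice to |0>.
# State vectors are lists of two amplitudes, [amp(|0>), amp(|1>)].
H = [[1 / math.sqrt(2),  1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]

def apply(gate, state):
    """Multiply a 2x2 gate by a length-2 amplitude vector."""
    return [sum(gate[i][j] * state[j] for j in range(2)) for i in range(2)]

state = [1.0, 0.0]       # start in |0>
state = apply(H, state)  # amplitudes (1/sqrt(2), 1/sqrt(2)): two paths open
state = apply(H, state)  # paths to |1> carry +1/2 and -1/2 and cancel

probs = [a * a for a in state]
print(probs)             # approximately [1.0, 0.0]: |0> is observed with certainty
```

Measuring after the second gate yields |0⟩ with probability 1: the wrong-outcome paths have destroyed each other, which is exactly the effect a quantum algorithm must engineer at scale.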
A Many-Worlder might retort: "sure, I agree that quantum computing involves harnessing the parallel universes in subtle and non-obvious ways, but it's still harnessing parallel universes!" But even here, there's a fascinating irony. Suppose we choose to think of a quantum algorithm in terms of parallel universes. Then to put it crudely, not only must many universes interfere to give a large final amplitude to the right answer; they must also, by interfering, lose their identities as parallel universes! In other words, to whatever extent a collection of universes is useful for quantum computation, to that extent it is arguable whether we ought to call them "parallel universes" at all (as opposed to parts of one exponentially-large, self-interfering, quantum-mechanical blob). Conversely, to whatever extent the universes have unambiguously separate identities, to that extent they're now "decohered" and out of causal contact with each other. Thus we can explain the outputs of any future computations by invoking only one of the universes, and treating the others as unrealized hypotheticals.

To clarify, I don't regard either of the above objections to Deutsch's argument as decisive, and am unsure what I think about the matter. My purpose, in setting out the objections, was simply to illustrate the potential of quantum computing theory to inform debates about the Many-Worlds Interpretation.

9 New Computational Notions of Proof

Since the time of Euclid, there have been two main notions of mathematical proof:

(1) A "proof" is a verbal explanation that induces a sense of certainty (and ideally, understanding) about the statement to be proved, in any human mathematician willing and able to follow it.
[52] There is a remarkable quantum algorithm called Grover's algorithm [69], which can search any space of 2^N possible solutions in only ~2^(N/2) steps. However, Grover's algorithm represents a quadratic (square-root) improvement over classical brute-force search, rather than an exponential improvement. And without any further assumptions about the structure of the search space, Grover's algorithm is optimal, as shown by Bennett et al. [27].

[53] Those interested in further details of how Shor's algorithm works, but still not ready for a mathematical exposition, might want to try my popular essay "Shor, I'll Do It" [1].

(2) A "proof" is a finite sequence of symbols encoding syntactic deductions in some formal system, which start with axioms and end with the statement to be proved.

The tension between these two notions is a recurring theme in the philosophy of mathematics. But theoretical computer science deals regularly with a third notion of proof—one that seems to have received much less philosophical analysis than either of the two above. This notion is the following:

(3) A "proof" is any computational process or protocol (real or imagined) that can terminate in a certain way if and only if the statement to be proved is true.

9.1 Zero-Knowledge Proofs

As an example of this third notion, consider zero-knowledge proofs, introduced by Goldwasser, Micali, and Rackoff [66]. Given two graphs G and H, each with n ≈ 10000 vertices, suppose that an all-powerful but untrustworthy wizard Merlin wishes to convince a skeptical king Arthur that G and H are not isomorphic. Of course, one way Merlin could do this would be to list all n! graphs obtained by permuting the vertices of G, then note that none of these equal H.
However, such a proof would clearly exhaust Arthur's patience (indeed, it could not even be written down within the observable universe). Alternatively, Merlin could point Arthur to some property of G and H that differentiates them: for example, maybe their adjacency matrices have different eigenvalue spectra. Unfortunately, it is not yet proven that, if G and H are non-isomorphic, there is always a differentiating property that Arthur can verify in time polynomial in n.

But as noticed by Goldreich, Micali, and Wigderson [65], there is something Merlin can do instead: he can let Arthur challenge him. Merlin can say:

Arthur, send me a new graph K, which you obtained either by randomly permuting the vertices of G, or randomly permuting the vertices of H. Then I guarantee that I will tell you, without fail, whether K ≅ G or K ≅ H.

It is clear that, if G and H are really non-isomorphic, then Merlin can always answer such challenges correctly, by the assumption that he (Merlin) has unlimited computational power. But it is equally clear that, if G and H are isomorphic, then Merlin must answer some challenges incorrectly, regardless of his computational power—since a random permutation of G is statistically indistinguishable from a random permutation of H.

This protocol has at least four features that merit reflection by anyone interested in the nature of mathematical proof.

First, the protocol is probabilistic. Merlin cannot convince Arthur with certainty that G and H are non-isomorphic, since even if they were isomorphic, there's a 1/2 probability that Merlin would get lucky and answer a given challenge correctly (and hence, a 1/2^k probability that he would answer k challenges correctly).
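The challenge protocol can be sketched in a few lines of Python. This is a toy version only: the graphs here have 3 vertices rather than n ≈ 10000, and Merlin's "unlimited computational power" is simulated by brute force over all n! permutations, which is feasible only because the example is tiny.

```python
import random
from itertools import permutations

def relabel(edges, perm):
    """Apply a vertex permutation to an undirected edge set."""
    return frozenset(frozenset((perm[u], perm[v])) for u, v in map(tuple, edges))

def isomorphic(e1, e2, n):
    """Brute-force isomorphism test over all n! vertex permutations."""
    return any(relabel(e1, p) == e2 for p in permutations(range(n)))

n = 3
G = frozenset(map(frozenset, [(0, 1), (1, 2), (0, 2)]))  # a triangle
H = frozenset(map(frozenset, [(0, 1), (1, 2)]))          # a path: not isomorphic to G

def arthur_challenge():
    """Arthur secretly picks G or H and sends Merlin a random relabeling of it."""
    secret = random.choice(["G", "H"])
    perm = random.sample(range(n), n)
    return secret, relabel(G if secret == "G" else H, perm)

def merlin_answer(K):
    """Merlin identifies which graph the challenge K came from."""
    return "G" if isomorphic(K, G, n) else "H"

# Since G and H really are non-isomorphic, Merlin never fails a challenge.
results = [merlin_answer(K) == secret
           for secret, K in (arthur_challenge() for _ in range(100))]
print(all(results))  # -> True
```

If G and H were instead isomorphic, every challenge K would be consistent with both graphs, and Merlin's answers could do no better than coin-flipping—which is exactly the soundness argument above.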
All Merlin can do is offer to repeat the protocol (say) 100 or 1000 times, and thereby make it less likely that his proof is unsound than that an asteroid will strike Camelot, killing both him and Arthur.

Second, the protocol is interactive. Unlike with proof notions (1) and (2), Arthur is no longer a passive recipient of knowledge, but an active player who challenges the prover. We know from experience that the ability to interrogate a seminar speaker—to ask questions that the speaker could not have anticipated, evaluate the responses, and then possibly ask follow-up questions—often speeds up the process of figuring out whether the speaker knows what he or she is talking about. Complexity theory affirms our intuition here, through its discovery of interactive proofs for statements (such as "G and H are not isomorphic") whose shortest known conventional proofs are exponentially longer.

The third interesting feature of the graph non-isomorphism protocol—a feature seldom mentioned—is that its soundness implicitly relies on a physical assumption. Namely, if Merlin had the power (whether through magic or through ordinary espionage) to "peer into Arthur's study" and directly observe whether Arthur started with G or H, then clearly he could answer every challenge correctly even if G ≅ H. It follows that the persuasiveness of Merlin's "proof" can only be as strong as Arthur's extramathematical belief that Merlin does not have such powers. By now, there are many other examples in complexity theory of "proofs" whose validity rests on assumed limitations of the provers.

As Shieber [116] points out, all three of the above properties of interactive protocols also hold for the Turing Test discussed in Section 4!
The Turing Test is interactive by definition, it is probabilistic because even a program that printed random gibberish would have some nonzero probability of passing the test by chance, and it depends on the physical assumption that the AI program doesn't "cheat" by (for example) secretly consulting a human. For these reasons, Shieber argues that we can see the Turing Test itself as an early interactive protocol—one that convinces the verifier not of a mathematical theorem, but of the prover's capacity for intelligent verbal behavior.[54]

However, perhaps the most striking feature of the graph non-isomorphism protocol is that it is zero-knowledge: a technical term formalizing our intuition that "Arthur learns nothing from the protocol, beyond the truth of the statement being proved."[55] For all Merlin ever tells Arthur is which graph he (Arthur) started with, G or H. But Arthur already knew which graph he started with! This means that, not only does Arthur gain no "understanding" of what makes G and H non-isomorphic, he does not even gain the ability to prove to a third party what Merlin proved to him. This is another aspect of computational proofs that has no analogue with proof notions (1) or (2).

One might complain that, as interesting as the zero-knowledge property is, so far we've only shown it's achievable for an extremely specialized problem. And indeed, just like with factoring integers, today there is strong evidence that the graph isomorphism problem is not NP-complete [33].[56],[57] However, in the same paper that gave the graph non-isomorphism protocol, Goldreich,

[54] Incidentally, this provides a good example of how notions from computational complexity theory can influence philosophy even just at the level of metaphor, forgetting about the actual results.
In this essay, I didn't try to collect such "metaphorical" applications of complexity theory, simply because there were too many of them!

[55] Technically, the protocol is "honest-verifier zero-knowledge," meaning that Arthur learns nothing from his conversation with Merlin besides the truth of the statement being proved, assuming Arthur follows the protocol correctly. If Arthur cheats—for example, by sending a graph K for which he doesn't already know an isomorphism either to G or to H—then Merlin's response could indeed tell Arthur something new. However, Goldreich, Micali, and Wigderson [65] also gave a more sophisticated proof protocol for graph non-isomorphism, which remains zero-knowledge even in the case where Arthur cheats.

[56] Indeed, there is not even a consensus belief that graph isomorphism is outside P! The main reason is that, in contrast to factoring integers, graph isomorphism turns out to be extremely easy in practice. Indeed, finding non-isomorphic graphs that can't be distinguished by simple invariants is itself a hard problem! And in the past, several problems (such as linear programming and primality testing) that were long known to be "efficiently solvable for practical purposes" were eventually shown to be in P in the strict mathematical sense as well.

[57] There is also strong evidence that there are short conventional proofs for graph non-isomorphism—in other words,

Micali, and Wigderson [65] also gave a celebrated zero-knowledge protocol (now called the GMW protocol) for the NP-complete problems. By the definition of NP-complete (see Section 3.1), the GMW protocol meant that every mathematical statement that has a conventional proof (say, in Zermelo-Fraenkel set theory) also has a zero-knowledge proof of comparable size! As an example application, suppose you've just proved the Riemann Hypothesis.
You want to convince the experts of your triumph, but are paranoid about them stealing credit for it. In that case, "all" you need to do is (1) rewrite your proof in a formal language, (2) encode the result as the solution to an NP-complete problem, and then (3) like a 16th-century court mathematician challenging his competitors to a duel, invite the experts to run the GMW protocol with you over the Internet! Provided you answer all their challenges correctly, the experts can become statistically certain that you possess a proof of the Riemann Hypothesis, without learning anything about that proof besides an upper bound on its length.

Better yet, unlike the graph non-isomorphism protocol, the GMW protocol does not assume a super-powerful wizard—only an ordinary polynomial-time being who happens to know a proof of the relevant theorem. As a result, today the GMW protocol is much more than a theoretical curiosity: it and its variants have found major applications in Internet cryptography, where clients and servers often need to prove to each other that they are following a protocol correctly without revealing secret information as they do so.

However, there is one important caveat: unlike the graph non-isomorphism protocol, the GMW protocol relies essentially on a cryptographic hypothesis. For here is how the GMW protocol works: you (the prover) first publish thousands of encrypted messages, each one "committing" you to a randomly-garbled piece of your claimed proof. You then offer to decrypt a tiny fraction of those messages, as a way for skeptical observers to "spot-check" your proof, while learning nothing about its structure besides the useless fact that, say, the 1729th step is valid (but how could it not be valid?).
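The commit-and-spot-check mechanism can be mimicked with a toy hash-based commitment scheme. This is only a sketch of the idea: the real GMW protocol commits to randomly garbled pieces of the proof (for example, colorings of a graph encoding it), and its security rests on the cryptographic hypothesis just mentioned, not on the bare hashing used here.

```python
import hashlib
import random
import secrets

# Toy commit-and-spot-check. The prover commits to each proof step with a
# salted SHA-256 hash, then opens only a small random sample of the steps.
proof_steps = [f"step {i}: ...follows from the previous steps..." for i in range(1000)]
salts = [secrets.token_hex(16) for _ in proof_steps]
commitments = [hashlib.sha256((salt + step).encode()).hexdigest()
               for salt, step in zip(salts, proof_steps)]
# (The prover publishes `commitments`; the salts and steps stay secret.)

# The skeptic picks a few indices at random; the prover opens just those.
sample = random.sample(range(len(proof_steps)), 20)
for i in sample:
    opened = hashlib.sha256((salts[i] + proof_steps[i]).encode()).hexdigest()
    assert opened == commitments[i]   # each spot-checked step verifies

print(f"{len(sample)} of {len(proof_steps)} steps spot-checked; the rest stay hidden")
```

A cheating prover who committed to an invalid step risks being caught on each round, while an honest prover reveals only a vanishing fraction of the proof's structure.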
If the skeptics want to increase their confidence that your proof is sound, then you simply run the protocol over and over with them, using a fresh batch of encrypted messages each time. If the skeptics could decrypt all the messages in a single batch, then they could piece together your proof—but to do that, they would need to break the underlying cryptographic code.

9.2 Other New Notions

Let me mention four other notions of "proof" that complexity theorists have explored in depth over the last twenty years, and that might merit philosophical attention.

• Multi-prover interactive proofs [26, 20], in which Arthur exchanges messages with two (or more) computationally-powerful but untrustworthy wizards. Here, Arthur might become convinced of some mathematical statement, but only under the assumption that the wizards could not communicate with each other during the protocol. (The usual analogy is to a police detective who puts two suspects in separate cells, to prevent them from coordinating their answers.) Interestingly, in some multi-prover protocols, even non-communicating wizards could successfully coordinate their responses to Arthur's challenges (and thereby convince Arthur of a falsehood) through the use of quantum entanglement [41]. However, other protocols are conjectured to remain sound even against entangled wizards [83].

[57, continued] that not just graph isomorphism but also graph non-isomorphism will ultimately turn out to be in NP [84].

• Probabilistically checkable proofs [54, 18], which are mathematical proofs encoded in a special error-correcting format, so that one can become confident of their validity by checking only 10 or 20 bits chosen randomly in a correlated way.
The PCP (Probabilistically Checkable Proofs) Theorem [17, 50], one of the crowning achievements of complexity theory, says that any mathematical theorem, in any standard formal system such as Zermelo-Fraenkel set theory, can be converted in polynomial time into a probabilistically-checkable format.

• Quantum proofs [131, 6], which are proofs that depend for their validity on the output of a quantum computation—possibly, even a quantum computation that requires a special entangled "proof state" fed to it as input. Because n quantum bits might require ~2^n classical bits to simulate, quantum proofs have the property that it might never be possible to list all the "steps" that went into the proof, within the constraints of the visible universe. For this reason, one's belief in the mathematical statement being proved might depend on one's belief in the correctness of quantum mechanics as a physical theory.

• Computationally-sound proofs and arguments [35, 94], which rely for their validity on the assumption that the prover was limited to polynomial-time computations—as well as the mathematical conjecture that crafting a convincing argument for a falsehood would have taken the prover more than polynomial time.

What implications do these new types of proof have for the foundations of mathematics? Do they merely make more dramatic what "should have been obvious all along": that, as David Deutsch argues in The Beginning of Infinity [49], proofs are physical processes taking place in brains or computers, which therefore have no validity independent of our beliefs about physics? Are the issues raised essentially the same as those raised by "conventional" proofs that require extensive computations, like Appel and Haken's proof of the Four-Color Theorem [13]?
Or does appealing, in the course of a "mathematical proof," to (say) the validity of quantum mechanics, the randomness of apparently-random numbers, or the lack of certain superpowers on the part of the prover represent something qualitatively new? Philosophical analysis is sought.

10 Complexity, Space, and Time

What can computational complexity tell us about the nature of space and time? A first answer might be "not much": after all, the definitions of standard complexity classes such as P can be shown to be insensitive to such details as the number of spatial dimensions, and even whether the speed of light is finite or infinite.[58] On the other hand, I think complexity theory does offer insight about the differences between space and time.

[58] More precisely, Turing machines with one-dimensional tapes are polynomially equivalent to Turing machines with k-dimensional tapes for any k, and are also polynomially equivalent to random-access machines (which can "jump" to any memory location in unit time, with no locality constraint). On the other hand, if we care about polynomial differences in speed, and especially if we want to study parallel computing models, details about the spatial layout of the computing and memory elements (as well as the speed of communication among the elements) can become vitally important.

The class of problems solvable using a polynomial amount of memory (but possibly an exponential amount of time[59]) is called PSPACE, for Polynomial Space. Examples of PSPACE problems include simulating dynamical systems, deciding whether a regular grammar generates all possible strings, and executing an optimal strategy in two-player games such as Reversi, Connect Four, and Hex.[60] It is not hard to show that PSPACE is at least as powerful as NP:

P ⊆ NP ⊆ PSPACE ⊆ EXP.
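To make the game examples concrete, here is a toy solver for one-pile Nim (take 1 to 3 stones per move; the last player to move wins), which explores the game tree by depth-first search while reusing memory via memoization. This only illustrates the kind of computation involved; the PSPACE-completeness results cited in footnote 60 concern n × n generalizations of much richer games.

```python
from functools import lru_cache

# Toy game-tree search: one-pile Nim, take 1-3 stones, last to move wins.
@lru_cache(maxsize=None)
def first_player_wins(stones):
    """True if the player to move can force a win from this position."""
    if stones == 0:
        return False   # no move available: the player to move has lost
    # Win if some move leaves the opponent in a losing position.
    return any(not first_player_wins(stones - take)
               for take in (1, 2, 3) if take <= stones)

# The losing positions are exactly the multiples of 4.
print([m for m in range(1, 13) if not first_player_wins(m)])  # -> [4, 8, 12]
```

The search visits positions one branch at a time, so its working memory scales with the depth of the game rather than the (exponentially larger) number of positions—the same reuse of space that makes game-playing problems natural members of PSPACE.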
Here EXP represents the class of problems solvable using an exponential amount of time, and also possibly an exponential amount of memory.[61] Every one of the above containments is believed to be strict, although the only one currently proved to be strict is P ≠ EXP, by an important 1965 result of Hartmanis and Stearns [70] called the Time Hierarchy Theorem.[62],[63] Notice, in particular, that P ≠ NP implies P ≠ PSPACE. So while P ≠ PSPACE is not yet proved, it is an extremely secure conjecture by the standards of complexity theory. In slogan form,

[59] Why "only" an exponential amount? Because a Turing machine with B bits of memory can run for no more than 2^B time steps. After that, the machine must either halt or else return to a configuration previously visited (thereby entering an infinite loop).

[60] Note that, in order to speak about the computational complexity of such games, we first need to generalize them to an n × n board! But if we do so, then for many natural games, the problem of determining which player has the win from a given position is not only in PSPACE, but PSPACE-complete (i.e., it captures the entire difficulty of the class PSPACE). For example, Reisch [109] showed that this is true for Hex. What about a suitable generalization of chess to an n × n board? That's also in PSPACE—but as far as anyone knows, only if we impose a polynomial upper bound on the number of moves in a chess game. Without such a restriction, Fraenkel and Lichtenstein [59] showed that chess is EXP-complete; with such a restriction, Storer [125] showed that chess is PSPACE-complete.

[61] In this context, we call a function f(n) "exponential" if it can be upper-bounded by 2^p(n), for some polynomial p.
Also, note that more than exponential memory would be useless here, since a Turing machine that runs for T time steps can visit at most T memory cells.

[62] More generally, the Time Hierarchy Theorem shows that, if f and g are any two "sufficiently well-behaved" functions that satisfy f(n) ≪ g(n) (for example: f(n) = n^2 and g(n) = n^3), then there are computational problems solvable in g(n) time but not in f(n) time. The proof of this theorem uses diagonalization, and can be thought of as a scaled-down version of Turing's proof of the unsolvability of the halting problem. That is, we argue that, if it were always possible to simulate a g(n)-time Turing machine by an f(n)-time Turing machine, then we could construct a g(n)-time machine that "predicted its own output in advance" and then output something else—thereby causing a contradiction. Using similar arguments, we can show (for example) that there exist computational problems solvable using n^3 bits of memory but not using n^2 bits, and so on in most cases where we want to compare more versus less of the same computational resource. In complexity theory, the hard part is comparing two different resources: for example, determinism versus nondeterminism (the P =? NP problem), time versus space (P =? PSPACE), or classical versus quantum computation (BPP =? BQP). For in those cases, diagonalization by itself no longer works.

[63] The fact that P ≠ EXP has an amusing implication, often attributed to Hartmanis: namely, at least one of the three inequalities (i) P ≠ NP, (ii) NP ≠ PSPACE, (iii) PSPACE ≠ EXP must be true, even though proving any one of them to be true individually would represent a titanic advance in mathematics! The above observation is sometimes offered as circumstantial evidence for P ≠ NP.
Of all our hundreds of unproved beliefs about inequalities between pairs of complexity classes, a large fraction of them must be correct, simply to avoid contradicting the hierarchy theorems. So then why not P ≠ NP in particular (given that our intuition there is stronger than our intuitions for most of the other inequalities)?

complexity theorists believe that space is more powerful than time.

Now, some people have asked how such a claim could possibly be consistent with modern physics. For didn't Einstein teach us that space and time are merely two aspects of the same structure? One immediate answer is that, even within relativity theory, space and time are not interchangeable: space has a positive signature whereas time has a negative signature. In complexity theory, the difference between space and time manifests itself in the straightforward fact that you can reuse the same memory cells over and over, but you can't reuse the same moments of time.[64]

Yet, as trivial as that observation sounds, it leads to an interesting thought. Suppose that the laws of physics let us travel backwards in time. In such a case, it's natural to imagine that time would become a "reusable resource" just like space is—and that, as a result, arbitrary PSPACE computations would fall within our grasp. But is that just an idle speculation, or can we rigorously justify it?

10.1 Closed Timelike Curves

Philosophers, like science-fiction fans, have long been interested in the possibility of closed timelike curves (CTCs), which arise in certain solutions to Einstein's field equations of general relativity.[65] On a traditional understanding, the central philosophical problem raised by CTCs is the grandfather paradox.
This is the situation where you go back in time to kill your own grandfather, therefore you are never born, therefore your grandfather is not killed, therefore you are born, and so on. Does this contradiction immediately imply that CTCs are impossible? No, it doesn't: we can only conclude that, if CTCs exist, then the laws of physics must somehow prevent grandfather paradoxes from arising. How could they do so? One classic illustration is that "when you go back in time to try and kill your grandfather, the gun jams"—or some other "unlikely" event inevitably occurs to keep the state of the universe consistent. But why should we imagine that such a convenient "out" will always be available, in every physical experiment involving CTCs? Normally, we like to imagine that we have the freedom to design an experiment however we wish, without Nature imposing conditions on the experiment (for example: "every gun must jam sometimes") whose reasons can only be understood in terms of distant or hypothetical events.

In his 1991 paper "Quantum mechanics near closed timelike lines," Deutsch [47] gave an elegant proposal for eliminating grandfather paradoxes. In particular he showed that, as long as we assume the laws of physics are quantum-mechanical (or even just classically probabilistic), every experiment involving a CTC admits at least one fixed point: that is, a way to satisfy the conditions of the experiment that ensures consistent evolution. Formally, if S is the mapping from quantum states to themselves induced by "going around the CTC once," then a fixed point is any quantum mixed state[66] ρ such that S(ρ) = ρ. The existence of such a ρ follows from simple linear-algebraic arguments.
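The fixed-point condition can be illustrated in the classically probabilistic special case, a toy sketch rather than the full quantum treatment: model "going around the CTC once" as a stochastic matrix S acting on a probability distribution over the two states "born" and "not born," and find a distribution ρ with S(ρ) = ρ by averaging the iterates.

```python
# Classical toy version of Deutsch's fixed point for the grandfather paradox.
# State 0 = "you are not born", state 1 = "you are born". One trip around
# the CTC flips the state: if born, you kill your grandfather (-> not born);
# if not born, the grandfather survives (-> born).
S = [[0.0, 1.0],   # S[i][j] = Pr(new state = i | old state = j)
     [1.0, 0.0]]

def step(p):
    """Apply the stochastic map S to a probability distribution p."""
    return [sum(S[i][j] * p[j] for j in range(2)) for i in range(2)]

# The running (Cesaro) average of the iterates S^t(p0) converges to a
# fixed point, whatever deterministic starting distribution p0 we pick.
p, avg, steps = [1.0, 0.0], [0.0, 0.0], 1000
for _ in range(steps):
    avg = [avg[i] + p[i] / steps for i in range(2)]
    p = step(p)

print(avg)  # -> approximately [0.5, 0.5]: "born" with probability 1/2
assert abs(step(avg)[0] - avg[0]) < 1e-9   # S(rho) = rho: consistent evolution
```

The iterates themselves just oscillate between the two deterministic states; only the probabilistic mixture (1/2, 1/2) is left unchanged by S, which is exactly Deutsch's resolution described next.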
As one illustration, the "resolution of the grandfather paradox" is now that you are born with probability 1/2, and if you are born, you go back in time to kill your grandfather—from which it follows that you are born with probability 1/2, and so on. Merely by treating states as probabilistic (as, in some sense, they have to be in quantum mechanics[67]), we have made the evolution of the universe consistent.

[64] See my blog post www.scottaaronson.com/blog/?p=368 for more on this theme.

[65] Though it is not known whether those solutions are "physical": for example, whether or not they can survive in a quantum theory of gravity (see [96] for example).

[66] In quantum mechanics, a mixed state can be thought of as a classical probability distribution over quantum states. However, an important twist is that the same mixed state can be represented by different probability distributions: for example, an equal mixture of the states |0⟩ and |1⟩ is physically indistinguishable from an equal mixture of (|0⟩ + |1⟩)/√2 and (|0⟩ − |1⟩)/√2. This is why mixed states are represented mathematically using Heisenberg's density matrix formalism.

But Deutsch's account of CTCs faces at least three serious difficulties. The first difficulty is that the fixed points might not be unique: there could be many mixed states ρ such that S(ρ) = ρ, and then the question arises of how Nature chooses one of them. To illustrate, consider the grandfather anti-paradox: a bit b ∈ {0, 1} that travels around a CTC without changing. We can consistently assume b = 0, or b = 1, or any probabilistic mixture of the two—and unlike the usual situation in physics, here there is no possible boundary condition that could resolve the ambiguity.

The second difficulty, pointed out by Bennett et al.
[28], is that Deutsch's proposal violates the statistical interpretation of quantum mixed states. So for example, if half of an entangled pair (|0⟩_A|0⟩_B + |1⟩_A|1⟩_B)/√2 is placed inside the CTC, while the other half remains outside the CTC, then the process of finding a fixed point will "break" the entanglement between the two halves. As a "remedy" for this problem, Bennett et al. suggest requiring the CTC fixed point ρ to be independent of the entire rest of the universe. To my mind, this remedy is so drastic that it basically amounts to defining CTCs out of existence!

Motivated by these difficulties, Lloyd et al. [90] recently proposed a completely different account of CTCs, based on postselected teleportation. Lloyd et al.'s account avoids both of the problems above—though, perhaps not surprisingly, it introduces other problems of its own.⁶⁸ My own view, for whatever it is worth, is that Lloyd et al. are talking less about "true" CTCs as I would understand the concept, than about postselected quantum-mechanical experiments that simulate CTCs in certain interesting respects. If there are any controversies in physics that call out for expert philosophical attention, surely this is one of them.

10.2 The Evolutionary Principle

Yet so far, we have not even mentioned what I see as the main difficulty with Deutsch's account of CTCs. This is that finding a fixed point might require Nature to solve an astronomically-hard computational problem! To illustrate, consider a science-fiction scenario wherein you go back in time and dictate Shakespeare's plays to him. Shakespeare thanks you for saving him the effort, publishes verbatim the plays that you dictated, and centuries later the plays come down to you, whereupon you go back in time and dictate them to Shakespeare, etc.
Notice that, in contrast to the grandfather paradox, here there is no logical contradiction: the story as we told it is entirely consistent. But most people find the story "paradoxical" anyway. After all, somehow Hamlet gets written, without anyone ever doing the work of writing it! As Deutsch [47] perceptively observed, if there is a "paradox" here, then it is not one of logic but of computational complexity. Specifically, the story violates a commonsense principle that we can loosely articulate as follows:

    Knowledge requires a causal process to bring it into existence.

Like many other important principles, this one might not be recognized as a "principle" at all before we contemplate situations that violate it! Deutsch [47] calls this principle the Evolutionary Principle (EP). Note that some version of the EP was invoked both by William Paley's blind-watchmaker argument, and (ironically) by the arguments of Richard Dawkins [45] and other atheists against the existence of an intelligent designer.

⁶⁷ In more detail, Deutsch's proposal works if the state space consists of classical probability distributions D or quantum mixed states ρ, but not if it consists of pure states |ψ⟩. Thus, if one believed that only pure states were fundamental in physics, and that probability distributions and mixed states always reflected subjective ignorance, one might reject Deutsch's proposal on that ground.
⁶⁸ In particular, in Lloyd et al.'s proposal, the only way to deal with the grandfather paradox is by some variant of "the gun jams": there are evolutions with no consistent solution, and it needs to be postulated that the laws of physics are such that they never occur.
In my survey article "NP-complete Problems and Physical Reality" [4], I proposed and defended a complexity-theoretic analogue of the EP, which I called the NP Hardness Assumption:

    There is no physical means to solve NP-complete problems in polynomial time.

The above statement implies P ≠ NP, but is stronger in that it encompasses probabilistic computing, quantum computing, and any other computational model compatible with the laws of physics. See [4] for a survey of recent results bearing on the NP Hardness Assumption, analyses of claimed counterexamples to the assumption, and possible implications of the assumption for physics.

10.3 Closed Timelike Curve Computation

But can we show more rigorously that closed timelike curves would violate the NP Hardness Assumption? Indeed, let us now show that, in a universe where arbitrary computations could be performed inside a CTC, and where Nature had to find a fixed point for the CTC, we could solve NP-complete problems using only polynomial resources.

We can model any NP-complete problem instance by a function f : {0, …, 2^n − 1} → {0, 1}, which maps each possible solution x to the bit 1 if x is valid, or to 0 if x is invalid. (Here, for convenience, we identify each n-bit solution string x with the nonnegative integer that x encodes in binary.) Our task, then, is to find an x ∈ {0, …, 2^n − 1} such that f(x) = 1. We can solve this problem with just a single evaluation of f, provided we can run the following computer program C inside a closed timelike curve [36, 4, 7]:

    Given input x ∈ {0, …, 2^n − 1}:
        If f(x) = 1, then output x.
        Otherwise, output (x + 1) mod 2^n.

Assuming there exists at least one x such that f(x) = 1, the only fixed points of C—that is, the only ways for C's output to equal its input—are for C to input, and output, such a valid solution x, which therefore appears in C's output register "as if by magic." (If there are no valid solutions, then C's fixed points will simply be uniform superpositions or probability distributions over all x ∈ {0, …, 2^n − 1}.)

Extending the above idea, John Watrous and I [7] (following a suggestion by Fortnow) recently showed that a CTC computer in Deutsch's model could solve all problems in PSPACE. (Recall that PSPACE is believed to be even larger than NP.) More surprisingly, we also showed that PSPACE constitutes the limit on what can be done with a CTC computer; and that this is true whether the CTC computer is classical or quantum. One consequence of our results is that the "naïve intuition" about CTC computers—that their effect would be to "make space and time equivalent as computational resources"—is ultimately correct, although not for the naïve reasons.⁶⁹ A second, amusing consequence is that, once closed timelike curves are available, switching from classical to quantum computers provides no additional benefit!

It is important to realize that our algorithms for solving hard problems with CTCs do not just boil down to "using huge amounts of time to find the answer, then sending the answer back in time to before the computer started." For even in the exotic scenario of a time travel computer, we still require that all resources used inside the CTC (time, memory, etc.) be polynomially bounded.
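To see the fixed-point argument in miniature, here is a sketch (my own illustration, not from the essay; the toy predicate f is hypothetical) that enumerates the deterministic fixed points of the program C. Of course, a classical simulation must scan all 2^n inputs, which is exactly the exponential work that Nature would absorb in Deutsch's model:

```python
# Brute-force search for the fixed points of the program C from Section 10.3.
# A CTC would hand us a fixed point "for free"; classically we must try
# every input, which takes 2^n evaluations of f.

def make_C(f, n):
    """The program C: output x if f(x) = 1, else (x + 1) mod 2^n."""
    def C(x):
        return x if f(x) == 1 else (x + 1) % (2 ** n)
    return C

def fixed_points(C, n):
    """All deterministic fixed points x with C(x) = x."""
    return [x for x in range(2 ** n) if C(x) == x]

# Toy "NP-style" predicate: x is valid iff x * 7 == 42, so x = 6 is the answer.
n = 4
f = lambda x: 1 if x * 7 == 42 else 0
C = make_C(f, n)
print(fixed_points(C, n))  # -> [6]: the only fixed point is the valid solution
```

As the essay notes, if f had no valid solution, C would cycle through all 2^n values and have no deterministic fixed point; the fixed point would then be the uniform distribution over inputs.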
Thus, the ability to solve hard problems comes solely from causal consistency: the requirement that Nature must find some evolution for the CTC computer that avoids grandfather paradoxes.

In Lloyd et al.'s alternative account of CTCs based on postselection [90], hard problems can also be solved, though for different reasons. In particular, building on an earlier result of mine [5], Lloyd et al. show that the power of their model corresponds to a complexity class called PP (Probabilistic Polynomial-Time), which is believed to be strictly smaller than PSPACE but strictly larger than NP. Thus, one might say that Lloyd et al.'s model "improves" the computational situation, but not by much!

So one might wonder: is there any way that the laws of physics could allow CTCs, without opening the door to implausible computational powers? There remains at least one interesting possibility, which was communicated to me by the philosopher Tim Maudlin.⁷⁰ Maybe the laws of physics have the property that, no matter what computations are performed inside a CTC, Nature always has an "out" that avoids the grandfather paradox, but also avoids solving hard computational problems—analogous to "the gun jamming" in the original grandfather paradox. Such an out might involve (for example) an asteroid hitting the CTC computer, or the computer failing for other mysterious reasons. Of course, any computer in the physical world has some nonzero probability of failure, but ordinarily we imagine that the failure probability can be made negligibly small. However, in situations where Nature is being "forced" to find a fixed point, maybe "mysterious computer failures" would become the norm rather than the exception.

To summarize, I think that computational complexity theory changes the philosophical issues raised by time travel into the past.
While discussion traditionally focused on the grandfather paradox, we have seen that there is no shortage of ways for Nature to avoid logical inconsistencies, even in a universe with CTCs. The "real" problem, then, is how to escape the other paradoxes that arise in the course of taming the grandfather paradox! Probably foremost among those is the "computational complexity paradox," of NP-complete and even harder problems getting solved as if by magic.

11 Economics

In classical economics, agents are modeled as rational, Bayesian agents who take whatever actions will maximize their expected utility E_{ω∈Ω}[U(ω)], given their subjective probabilities {p_ω}_{ω∈Ω} over all possible states ω of the world.⁷¹ This, of course, is a caricature that seems almost designed to be attacked, and it has been attacked from almost every angle. For example, humans are not even close to rational Bayesian agents, but suffer from well-known cognitive biases, as explored by Kahneman and Tversky [80] among others.

⁶⁹ Specifically, it is not true that in a CTC universe, a Turing machine tape head could just travel back and forth in time the same way it travels back and forth in space. If one thinks this way, then one really has in mind a second, "meta-time," while the "original" time has become merely one more dimension of space. To put the point differently: even though a CTC would make time cyclic, time would still retain its directionality. This is the reason why, if we want to show that CTC computers have the power of PSPACE, we need a nontrivial argument involving causal consistency.
⁷⁰ This possibility is also discussed at length in Deutsch's paper [47].
Furthermore, the classical view seems to leave no room for critiquing people's beliefs (i.e., their prior probabilities) or their utility functions as irrational—yet it is easy to cook up prior probabilities or utility functions that would lead to behavior that almost anyone would consider insane. A third problem is that, in games with several cooperating or competing agents who act simultaneously, classical economics guarantees the existence of at least one Nash equilibrium among the agents' strategies. But the usual situation is that there are multiple equilibria, and then there is no general principle to predict which equilibrium will prevail, even though the choice might mean the difference between war and peace.

Computational complexity theory can contribute to debates about the foundations of economics by showing that, even in the idealized situation of rational agents who all have perfect information about the state of the world, it will often be computationally intractable for those agents to act in accordance with classical economics. Of course, some version of this observation has been recognized in economics for a long time. There is a large literature on bounded rationality (going back to the work of Herbert Simon [120]), which studies the behavior of economic agents whose decision-making abilities are limited in one way or another.

11.1 Bounded Rationality and the Iterated Prisoners' Dilemma

As one example of an insight to emerge from this literature, consider the Finite Iterated Prisoners' Dilemma. This is a game where two players meet for some fixed number of rounds N, which is finite and common knowledge between the players.
In each round, both players can either "Defect" or "Cooperate" (not knowing the other player's choice), after which they receive the following payoffs:

                   Defect_2    Cooperate_2
    Defect_1        1, 1          4, 0
    Cooperate_1     0, 4          3, 3

Both players remember the entire previous history of the interaction. It is clear that the players will be jointly best off if they both cooperate, but equally clear that if N = 1, then cooperation is not an equilibrium. On the other hand, if the number of rounds N were unknown or infinite, then the players could rationally decide to cooperate, similarly to how humans decide to cooperate in real life. That is, Player 1 reasons that if he defects, then Player 2 will retaliate by defecting in future rounds, and vice versa. So over the long run, both players do best for themselves by cooperating.

The "paradox" is now that, as soon as N becomes known, the above reasoning collapses. For assuming the players are rational, they both realize that, whatever else, neither has anything to lose by defecting in round N—and therefore that is what they do. But since both players know that both will defect in round N, neither one has anything to lose by defecting in round N − 1 either—and they can continue inductively in this way back to the first round. We therefore get the "prediction" that both players will defect in every round, even though that is neither in the players' own interests, nor what actual humans do in experiments.

⁷¹ Here we assume for simplicity that the set Ω of possible states is countable; otherwise we could of course use a continuous probability measure.

In 1985, Neyman [98] proposed an ingenious resolution of this paradox.
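As an aside, the unraveling argument can be checked mechanically. The following sketch (my own illustration, not from the essay) verifies that defection strictly dominates under the payoff table above, and hence that backward induction prescribes defection in all N rounds:

```python
# Payoffs (Player 1, Player 2) from the table above.
PAYOFF = {('D', 'D'): (1, 1), ('D', 'C'): (4, 0),
          ('C', 'D'): (0, 4), ('C', 'C'): (3, 3)}

def dominant_action():
    """Return the action that strictly dominates the other for Player 1
    (the game is symmetric, so the same holds for Player 2)."""
    for a, b in [('D', 'C'), ('C', 'D')]:
        if all(PAYOFF[(a, opp)][0] > PAYOFF[(b, opp)][0] for opp in 'DC'):
            return a
    return None

def subgame_perfect_plan(N):
    """Backward induction: in round N the continuation payoff is zero, so the
    strictly dominant action is played; that makes the continuation after
    round N-1 a constant as well, so the same dominance applies there, and
    so on back to round 1."""
    return [dominant_action()] * N

print(dominant_action())        # -> D (defection strictly dominates)
print(subgame_perfect_plan(5))  # -> ['D', 'D', 'D', 'D', 'D']
# Mutual defection earns each player 5 in total, versus 15 for cooperation.
```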
Specifically, he showed that if the two players have sufficiently small memories—technically, if they are finite automata with k states, for 2 ≤ k < N—then cooperation becomes an equilibrium once again! The basic intuition is that, if both players lack enough memory to count up to N, and both of them know that, and both know that they both know that, and so on, then the inductive argument in the last paragraph fails, since it assumes intermediate strategies that neither player can implement.

While complexity considerations vanquish some of the counterintuitive conclusions of classical economics, equally interesting to me is that they do not vanquish others. As one example, I showed in [3] that Robert Aumann's celebrated agreement theorem [19]—perfect Bayesian agents with common priors can never "agree to disagree"—persists even in the presence of limited communication between the agents.

There are many other interesting results in the bounded rationality literature, too many to do them justice here (but see Rubinstein [111] for a survey). On the other hand, "bounded rationality" is something of a catch-all phrase, encompassing almost every imaginable deviation from rationality—including human cognitive biases, limits on information-gathering and communication, and the restriction of strategies to a specific form (for example, linear threshold functions). Many of these deviations have little to do with computational complexity per se. So the question remains of whether computational complexity specifically can provide new insights about economic behavior.

11.2 The Complexity of Equilibria

There are some very recent advances suggesting that the answer is yes. Consider the problem of finding an equilibrium of a two-player game, given the n × n payoff matrix as input.
In the special case of zero-sum games (which von Neumann studied in 1928), it has long been known how to solve this problem in an amount of time polynomial in n, for example by reduction to linear programming. But in 2006, Daskalakis, Goldberg, and Papadimitriou [44] (with improvements by Chen and Deng [39]) proved the spectacular result that, for a general (not necessarily zero-sum) two-player game, finding a Nash equilibrium is "PPAD-complete." Here PPAD ("Polynomial Parity Argument, Directed") is, roughly speaking, the class of all search problems for which a solution is guaranteed to exist for the same combinatorial reason that every game has at least one Nash equilibrium.

Note that finding a Nash equilibrium cannot be NP-complete, for the technical reason that NP is a class of decision problems, and the answer to the decision problem "does this game have a Nash equilibrium?" is always yes. But Daskalakis et al.'s result says (informally) that the search problem of finding a Nash equilibrium is "as close to NP-complete as it could possibly be," subject to its decision version being trivial. Similar PPAD-completeness results are now known for other fundamental economic problems, such as finding market-clearing prices in Arrow-Debreu markets [38].

Of course, one can debate the economic relevance of these results: for example, how often does the computational hardness that we now know⁷² to be inherent in economic equilibrium theorems actually rear its head in practice? But one can similarly debate the economic relevance of the equilibrium theorems themselves! In my opinion, if the theorem that Nash equilibria exist is considered relevant to debates about (say) free markets versus government intervention, then the theorem that finding those equilibria is PPAD-complete should be considered relevant also.
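To make the contrast concrete, here is the tractable zero-sum case as code: a minimal sketch (my own illustration, not from the essay; NumPy and SciPy assumed) that solves a small zero-sum game by the standard linear-programming reduction. No analogous polynomial-time method is known for general two-player games, which is the content of the PPAD-completeness result.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_value(A):
    """Value and optimal row strategy of a zero-sum game with payoff matrix A
    (payoffs to the row player), via the classic LP reduction:
        maximize v  subject to  A^T x >= v,  sum(x) = 1,  x >= 0."""
    m, n = A.shape
    # Variables: x_1..x_m (row strategy) and v (game value).
    # linprog minimizes, so we minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # One inequality per column j:  v - sum_i x_i * A[i][j] <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0            # strategy probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]  # v is unbounded in sign
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

# Matching pennies: value 0, optimal strategy (1/2, 1/2).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
v, x = zero_sum_value(A)
print(v, x)  # value ~ 0, strategy ~ [0.5, 0.5]
```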
⁷² Subject, as usual, to widely-believed complexity assumptions.

12 Conclusions

The purpose of this essay was to illustrate how philosophy could be enriched by taking computational complexity theory into account, much as it was enriched almost a century ago by taking computability theory into account. In particular, I argued that computational complexity provides new insights into the explanatory content of Darwinism, the nature of mathematical knowledge and proof, computationalism, syntax versus semantics, the problem of logical omniscience, debates surrounding the Turing Test and Chinese Room, the problem of induction, the foundations of quantum mechanics, closed timelike curves, and economic rationality. Indeed, one might say that the "real" question is which philosophical problems don't have important computational complexity aspects! My own opinion is that there probably are such problems (even within analytic philosophy), and that one good candidate is the problem of what we should take as "bedrock mathematical reality": that is, the set of mathematical statements that are objectively true or false, regardless of whether they can be proved or disproved in a given formal system. To me, if we are not willing to say that a given Turing machine M either accepts, rejects, or runs forever (when started on a blank tape)—and that which one it does is an objective fact, independent of our formal axiomatic theories, the laws of physics, the biology of the human brain, cultural conventions, etc.—then we have no basis to talk about any of those other things (axiomatic theories, the laws of physics, and so on). Furthermore, M's resource requirements are irrelevant here: even if M only halts after 2^(2^10000) steps, its output is as mathematically definite as if it had halted after 10 steps.⁷³
Can we say anything general about when a computational complexity perspective is helpful in philosophy, and when it isn't? Extrapolating from the examples in this essay, I would say that computational complexity tends to be helpful when we want to know whether a particular fact does any explanatory work: Sections 3.2, 3.3, 4, 6, and 7 all provided examples of this. Other "philosophical applications" of complexity theory come from the Evolutionary Principle and the NP Hardness Assumption discussed in Section 10.2. If we believe that certain problems are computationally intractable, then we may be able to draw interesting conclusions from that belief about economic rationality, quantum mechanics, the possibility of closed timelike curves, and other issues. By contrast, computational complexity tends to be unhelpful when we only want to know whether a particular fact "determines" another fact, and don't care about the length of the inferential chain.

12.1 Criticisms of Complexity Theory

Despite its explanatory reach, complexity theory has been criticized on various grounds. Here are four of the most common criticisms:

(1) Complexity theory only makes asymptotic statements (statements about how the resources needed to solve problem instances of size n scale as n goes to infinity). But as a matter of logic, asymptotic statements need not have any implications whatsoever for the finite values of n (say, 10,000) that humans actually care about, nor can any finite amount of experimental data confirm or refute an asymptotic claim.

⁷³ The situation is very different for mathematical statements like the Continuum Hypothesis, which can't obviously be phrased as predictions about idealized computational processes (since they're not expressible by first-order or even second-order quantification over the integers).
For those statements, it really is unclear to me what one means by their truth or falsehood apart from their provability in some formal system.

(2) Many of (what we would like to be) complexity theory's basic principles, such as P ≠ NP, are currently unproved mathematical conjectures, and will probably remain that way for a long time.

(3) Complexity theory focuses on only a limited type of computer—the serial, deterministic Turing machine—and fails to incorporate the "messier" computational phenomena found in nature.

(4) Complexity theory studies only the worst-case behavior of algorithms, and does not address whether that behavior is representative, or whether it merely reflects a few "pathological" inputs. So for example, even if P ≠ NP, there might still be excellent heuristics to solve most instances of NP-complete problems that actually arise in practice; complexity theory tells us nothing about such possibilities one way or the other.

For whatever it's worth, criticisms (3) and (4) have become much less accurate since the 1980s. As discussed in this essay, complexity theory has by now branched out far beyond deterministic Turing machines, to incorporate (for example) quantum mechanics, parallel and distributed computing, and stochastic processes such as Darwinian evolution. Meanwhile, although worst-case complexity remains the best-understood kind, today there is a large body of work—much of it driven by cryptography—that studies the average-case hardness of computational problems, for various probability distributions over inputs. And just as almost all complexity theorists believe that P ≠ NP, so almost all subscribe to the stronger belief that there exist hard-on-average NP problems—indeed, that belief is one of the underpinnings of modern cryptography.
A few problems, such as calculating discrete logarithms, are even known to be just as hard on random inputs as they are on the hardest possible input (though whether such "worst-case/average-case equivalence" holds for any NP-complete problem remains a major open question). For these reasons, although speaking about average-case rather than worst-case complexity would complicate some of the arguments in this essay, I don't think it would change the conclusions much.⁷⁴ See Bogdanov and Trevisan [32] for an excellent recent survey of average-case complexity, and Impagliazzo [78] for an evocative discussion of complexity theory's "possible worlds" (for example, the "world" where NP-complete problems turn out to be hard in the worst case but easy on average).

The broader point is that, even if we admit that criticisms (1)-(4) have merit, that does not give us a license to dismiss complexity-theoretic arguments whenever we dislike them! In science, we only ever deal with imperfect, approximate theories—and if we reject the conclusions of the best approximate theory in some area, then the burden is on us to explain why. To illustrate, suppose you believe that quantum computers will never give a speedup over classical computers for any practical problem. Then as an explanation for your stance, you might assert any of the following:

(a) Quantum mechanics is false or incomplete, and an attempt to build a scalable quantum computer would instead lead to falsifying or extending quantum mechanics itself.

(b) There exist polynomial-time classical algorithms for factoring integers, and for all the other problems that admit polynomial-time quantum algorithms. (In complexity terms, the classes BPP and BQP are equal.)

⁷⁴ On the other hand, it would presuppose that we knew how to define reasonable probability distributions over inputs.
But as discussed in Section 4.3, it seems hard to explain what we mean by "structured instances," or "the types of instances that normally arise in practice."

(c) The "constant-factor overheads" involved in building a quantum computer are so large as to negate their asymptotic advantages, for any problem of conceivable human interest.

(d) While we don't yet know which of (a)-(c) holds, we can know on some a priori ground that at least one of them has to hold.

The point is that, even if we can't answer every possible shortcoming of a complexity-theoretic analysis, we can still use it to clarify the choices: to force people to lay some cards on the table, committing themselves either to a prediction that might be falsified or to a mathematical conjecture that might be disproved. Of course, this is a common feature of all scientific theories, not something specific to complexity theory. If complexity theory is unusual here, it is only in the number of "predictions" it juggles that could be confirmed or refuted by mathematical proof (and indeed, only by mathematical proof).⁷⁵

12.2 Future Directions

Even if the various criticisms of complexity theory don't negate its relevance, it would be great to address those criticisms head-on—and more generally, to get a clearer understanding of the relationship between complexity theory and the real-world phenomena that it tries to explain. Toward that end, I think the following questions would all benefit from careful philosophical analysis:

• What is the empirical status of asymptotic claims? What sense can we give to an asymptotic statement "making predictions," or being supported or ruled out by a finite number of observations?

• How can we explain the empirical facts on which complexity theory relies: for example, that we rarely see n^10000 or 1.0000001^n algorithms, or that the computational problems humans care about tend to organize themselves into a relatively small number of equivalence classes?

• Short of proof, how do people form intuitions about the truth or falsehood of mathematical conjectures? What are those intuitions, in cases such as P ≠ NP?

• Do the conceptual conclusions that people sometimes want to draw from conjectures such as P ≠ NP or BPP ≠ BQP—for example, about the nature of mathematical creativity or the interpretation of quantum mechanics—actually depend on those conjectures being true? Are there easier-to-prove statements that would arguably support the same conclusions?

• If P ≠ NP, then how have humans managed to make such enormous mathematical progress, even in the face of the general intractability of theorem-proving? Is there a "selection effect," by which mathematicians favor problems with special structure that makes them easier to solve than arbitrary problems? If so, then what does this structure consist of?

In short, I see plenty of scope for the converse essay to this one: "Why Computational Complexity Theorists Should Care About Philosophy."

⁷⁵ One other example that springs to mind, of a scientific theory many of whose "predictions" take the form of mathematical conjectures, is string theory.

13 Acknowledgments

I am grateful to Oron Shagrir for pushing me to finish this essay, for helpful comments, and for suggesting Section 7.2; to Alex Byrne for suggesting Section 6; to Agustín Rayo for suggesting Section 5; and to David Aaronson, Seamus Bradley, Terrence Cole, Michael Collins, Andy Drucker, Michael Forbes, Oded Goldreich, Bob Harper, Gil Kalai, Dana Moshkovitz, Jan Arne Telle, Dylan Thurston, Ronald de Wolf, Avi Wigderson, and Joshua Zelinsky for their feedback.

References

[1] S. Aaronson.
Shor, I'll do it (weblog entry). www.scottaaronson.com/blog/?p=208.

[2] S. Aaronson. Multilinear formulas and skepticism of quantum computing. In Proc. ACM STOC, pages 118–127, 2004. quant-ph/0311039.

[3] S. Aaronson. The complexity of agreement. In Proc. ACM STOC, pages 634–643, 2005. ECCC TR04-061.

[4] S. Aaronson. NP-complete problems and physical reality. SIGACT News, March 2005. quant-ph/0502072.

[5] S. Aaronson. Quantum computing, postselection, and probabilistic polynomial-time. Proc. Roy. Soc. London, A461(2063):3473–3482, 2005. quant-ph/0412187.

[6] S. Aaronson and G. Kuperberg. Quantum versus classical proofs and advice. Theory of Computing, 3(7):129–157, 2007. Previous version in Proceedings of CCC 2007. quant-ph/0604056.

[7] S. Aaronson and J. Watrous. Closed timelike curves make quantum and classical computing equivalent. Proc. Roy. Soc. London, A465:631–647, 2009. arXiv:0808.2669.

[8] S. Aaronson and A. Wigderson. Algebrization: a new barrier in complexity theory. ACM Trans. on Computation Theory, 1(1), 2009. Conference version in Proc. ACM STOC 2008.

[9] M. Agrawal, N. Kayal, and N. Saxena. PRIMES is in P. www.cse.iitk.ac.in/users/manindra/primality.ps, 2002.

[10] D. Aharonov. Quantum computation - a review. In Dietrich Stauffer, editor, Annual Review of Computational Physics, volume VI. 1998. quant-ph/9812037.

[11] D. Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 75(2):87–106, 1987.

[12] D. Angluin, J. Aspnes, J. Chen, and Y. Wu. Learning a circuit by injecting values. J. Comput. Sys. Sci., 75(1):60–77, 2009. Earlier version in STOC 2006.

[13] K. Appel and W. Haken. Every Planar Map is Four-Colorable. American Mathematical Society, 1989.

[14] B. Applebaum, B. Barak, and D. Xiao. On basing lower-bounds for learning on worst-case assumptions. In Proc.
IEEE FOCS , pages 211–220, 2008 . 51 [15] S. Arora and B. Barak. Complexity The ory: A Mo dern Appr o ach . C am bridge Universit y Press, 2009. Online dr aft at w w w.cs.princeton.edu/theory/complexit y/. [16] S. Arora, R. Impagliazzo, and U. V azirani. Relativiz ing v ersus nonrelativizing tec hniques: the role of lo cal c hec k abilit y . Man uscript, 1992. [17] S. Arora, C . Lu nd, R. Mot wani, M. Su d an, and M. Szegedy . Pro of verificatio n and the hardness of appro ximation problems. J. ACM , 45(3):501 –555, 1998 . [18] S. Arora and S. Safra. Probabilistic c hec king of pro ofs: a new c h aract erization of NP. J. ACM , 45(1):70–1 22, 199 8. [19] R. J. Aumann. Agreeing to disagree. Annal s of Statistics , 4(6):1 236–1239, 19 76. [20] L. Babai, L. F ortno w, and C . Lund . Nondeterministic exp onent ial time has t wo-pro v er inter- activ e proto cols. Computational Complexity , 1(1):3– 40, 1991 . [21] T. Bak er, J. Gill, and R. Solo v a y . Relativizations of the P= ?NP question. SIAM J. Comput. , 4:431– 442, 19 75. [22] E. B. Baum. What Is Thought? Bradford Bo oks, 2004. [23] R. Beals, H. Buh rman, R. Cleve , M. Mosca, and R. de W olf. Qu an tum lo wer b ounds by p olynomials. J. ACM , 48(4) :778–797, 2001. Earlier version in IEEE F OC S 1998, p p. 352-361. quan t-ph /98 02049. [24] P . Beame and T. Pitassi. Prop ositional p roof complexit y: past, p resen t, and f u ture. Curr ent T r ends in The or etic al Computer Scienc e , pages 42–70, 2001. [25] S. Bellan toni and S. A. Co ok. A n ew r ecursion-theoretic charact erization of the p olytime functions. Computational Complexity , 2:97 –110, 1992. Earlier v ersion in S TOC 1992, p. 283-2 93. [26] M. Ben-Or, S. Goldw asser, J. Kilian, and A. Wigderson. Multi-pro ve r int eractiv e pro ofs: ho w to remo ve the in tractabilit y assump tions. In Pr o c . ACM STOC , p age s 113–131 , 198 8. [27] C. Bennett, E. Bernstein, G. Brassard, and U. V azirani. Strengths and weaknesses of quan tum computing. SIA M J . 
Comput. , 26(5):1 510–1523, 19 97. quant-ph/97 01001. [28] C. H. Bennett, D. Leung, G. Smith, and J. A. S molin. Can closed timelik e curv es or nonlin- ear q u an tum mec h anics impro ve quantum s tate d iscr im in atio n or help solv e hard problems? Phys. R ev. L ett. , 103(1705 02), 200 9. arXiv:090 8.3023. [29] E. Bern stein and U. V azirani. Quantum complexit y theory . SIAM J. Comput. , 26 (5):1411– 1473, 1997. First app eared in A CM STO C 1993. [30] N. Blo c k. Searle’s argument s against cognitiv e s cie n ce. In J. Preston and M. Bishop, editors, Views i nto the Chinese R o om: New Essays on Se arle and Artificial Intel lig enc e , pages 70–79. Oxford, 2002. [31] A. Blumer, A. Ehr enfeuc h t, D. Haussler, and M. K. W arm u th. Learnabilit y an d the Vapnik- Chervo n enkis d imension. J. ACM , 36(4):929–9 65, 1989. 52 [32] A. Bogdano v and L. T revisan. Av erage-case complexit y . F ounda tions and T r ends in The o- r etic al Com puter Scienc e , 2(1), 2006. EC CC TR06-073. [33] R. B. Boppana, J. H ˚ astad, and S . Z achos. Do es co-NP ha ve sh ort interacti ve pr oofs? Inform. Pr o c. L ett. , 25:127 –132, 19 87. [34] R. Bousso. Po s itive v acuum energy and the N-b ound. J. H igh Ene r gy Phys. , 0011(038 ), 2000. hep-th/001025 2. [35] G. Brassard, D. Ch aum, and C. Cr´ ep eau. Minim um d isclo su re pro ofs of knowle d ge. J. Comput. Sys. Sci. , 37(2):156 –189, 1988 . [36] T. Br u n. Comp uters with closed timelik e cu r v es can solv e hard problems. F oundations of Physics L etters , 16:245–253 , 2003. gr-qc/02090 61. [37] D. J . Chalmers. The Co nsci ous Mind: In Se ar ch of a F undamental The ory . Oxford, 1996. [38] X. Chen, D. Dai, Y. Du, and S.-H. T eng. Settling the complexit y of Arrow-De b reu equilibria in marke ts with additiv ely s eparable utilities. In Pr o c. IEEE F OCS , p age s 273–282 , 200 9. [39] X. Chen and X. Deng. Settling the complexit y of t wo -play er Nash equilibr ium. In Pr o c. IEE E F O CS , pages 261–271, 2006. 
[40] C. Ch er n iak. Comp u tatio n al c omp lexit y and th e univ ers al acceptance of logic. The Journal of Philosophy , 81(12):739 –758, 1984. [41] R. Clev e, P . Høy er, B. T oner, an d J. W atrous. Consequences and limits of nonlo cal s trate - gies. In Pr o c. IEEE Confer enc e on Computationa l Complexity , pages 236–2 49, 2 004. quan t- ph/040407 6. [42] A. Cobham. T h e in trinsic computational difficult y of fun ctio n s. In Pr o c e e dings of L o gic, Metho dolo gy, and P hilos ophy of Scienc e II . North Holland, 1965 . [43] J. Cop eland. Hyp ercomputation. M i nds and Machines , 12:461 –502, 20 02. [44] C. Dask alakis, P . W. Goldb erg, and C. H. P apadimitriou. The complexit y of computing a Nash equilibrium. Commun. ACM , 52( 2):89–97, 2009. Earlier version in Pr oceedings of STOC’2006. [45] R. Da wkins. The Go d Delusion . Hough ton Mifflin Harcourt, 2006. [46] D. C. Denn ett. D arwin ’s D anger ous Ide a: E volution and the Me anings of Life . Simon & Sc huster, 1995. [47] D. Deutsch. Quantum mechanics near closed timelik e lin es. Phys. R ev. D , 44:3197– 3217, 1991. [48] D. Deutsch. The F abric o f R e ality . P enguin, 1998. [49] D. Deutsc h. The Be ginning of Infinity: Explanations that T r ansform th e World . Allen Lane, 2011. 53 [50] I. Din ur. The PCP theorem b y gap amplification. J. ACM , 54(3):12, 2007. [51] A. Druc ke r . Multiplying 10-digit n umb ers using Flic kr: the p o w er of recognition memory . p eople.csail.mit .edu /andyd/rec m etho d.p d f , 2011. [52] R. F agin. Finite mo del theory - a p ersonal p ersp ectiv e. The or etic al Comput. Sci. , 116:3– 31, 1993. [53] R. F agin, J . Y. Halp ern, Y. Moses, and M. Y. V ardi. R e asoning ab out Know le dge . The MIT Press, 1995. [54] U. F eige, S. Goldwasser, L. Lov´ asz, S. Safra, and M. Szegedy . Interact ive pr oofs and the hardness of appro ximating cliques. J. A CM , 43(2):26 8–292, 199 6. [55] R. P . F eynman. Sim ulating physics with compu ters. Int. J. 
The or etic al Physics , 21(6-7): 467– 488, 1982. [56] L. F ortno w. The role of relativization in complexit y theory . B u l letin of the EA TCS , 52:229– 244, F ebruary 1994. [57] L. F ortnow. One complexit y theorist’s view of qu an tum computing. The or etic al Comput. Sci. , 292(3):597 –610, 2003 . [58] L. F ortno w and S. Homer. A short history of computational complexit y . Bul letin of the EA TCS , (80):95–1 33, 2003. [59] A. F r aenk el and D. Lic h tenstein. Compu ting a p erfect strategy for nxn c hess requires time exp onen tial in n. Journal of Co mbinatorial The ory A , 31:199–21 4, 1981. [60] C. Gent r y . F ully homomo r phic encry p tion usin g ideal lattices. In Pr o c. ACM STOC , pages 169–1 78, 20 09. [61] O. Goldreic h. On q u an tum compu ting. w ww.wisdom.w eizmann.ac.il/˜oded /o n -qc.h tml, 2004. [62] O. Goldreic h. Computational Compl exi ty: A Con c eptual Persp e ctive . Cam br idge Unive r sit y Press, 2008. Earlier version at www.wisd om.w eizmann.ac.il/ ˜o ded/cc-drafts.ht ml. [63] O. Goldreic h . A Primer on Pseudor andom Gener ator s . American Mathematica l So ciet y , 2010. w ww.wisdom.w eizmann.ac.il/˜oded /PDF/ pr g10 .p d f. [64] O. Goldreic h, S. Goldw asser, and S. Mic ali. Ho w to construct r andom fu nctions. J. ACM , 33(4): 792–807, 1 984. [65] O. Goldr eich, S. Micali, an d A. Wigderson. P r oofs that yield nothing b ut their v alidit y or all languages in NP ha ve zero-kno wledge pro of systems. J. ACM , 38(1):6 91–729, 19 91. [66] S. Goldw asser, S. Micali , an d C. Rack off. The kno wledge complexit y of interact ive pro of systems. SIAM J. Comput. , 18(1):1862 08, 1989 . [67] N. Goo dman. F act, Fiction, and F or e c ast . Harv ard Un iv ersit y Press, 1955. [68] P . Gr ah am. Ho w to do p hilosoph y . ww w.paulgraham.com/philosoph y .h tml, 2007. 54 [69] L. K. Gro ver. A fast qu an tum mec hanical algorithm for database searc h. In Pr o c. ACM STOC , pages 212–2 19, 1 996. quant -p h /96 05043. [70] J. 
Hartmanis and R. E. Stearns. On the computational complexit y of algorithms. T r ansactions of the Americ an Mat hematic al So ciety , 117:285–3 06, 1965. [71] J. Haugeland. Synta x, semantic s, ph ysics. In J. Preston and M. Bishop, editors, Views into the Chinese R o om: New Essays on Se arle and Artificial Intel ligenc e , pages 379–39 2. O xford, 2002. [72] J. Hin tikk a. Know le dge and Belief . Cornell Univ ersity Press, 1962. [73] M. Hogarth. Non-Turing compu ters and non-Turin g computabilit y . Biennial Me eting of the Philosophy of Scienc e Asso ciation , 1:126–138 , 1994. [74] A. S. Holev o. Some estimates of the information transmitted by quan tum comm unication c hannels. Pr oblems of Information T r ansmission , 9:177– 183, 1973. English translation. [75] G. ’t Ho oft. Quant u m gravit y as a dissipativ e deterministic system. Classic al and Quantum Gr avity , 16:3263– 3279, 1999 . gr-qc/990 3084. [76] D. Hu m e. An Enquiry c onc erning H uman Understanding . 17 48. 18th.eserv er.org/h u me- enquiry .h tml. [77] N. Immerman. De scriptive Complexity . Springer, 1998. [78] R. Impagliaz zo. A p ersonal view of av erage-c ase complexit y . In P r o c. IEEE Confer enc e on Computation al Complexity , p age s 134–147 , 199 5. [79] R. Impagliazzo and A. Wigderson. P=BPP unless E h as sub exp onenti al circuits: derandom- izing the X OR Lemma. In Pr o c. ACM STO C , pages 220– 229, 1 997. [80] D. Kahneman, P . Slo vic, and A. Tv ersky . Judgment U nder Unc ertainty: Heuristics and Biases . Cambridge Unive r sit y Press, 1982. [81] M. J. Kearns and L. G. V alian t. Cryptographic limitations on learning Bo olea n f orm ulae and finite automata. J. ACM , 41(1):67– 95, 1994. [82] M. J. Kearns and U. V. V azirani. An Intr o duction to Computational L e arning The ory . MIT Press, 1994. [83] J. Kemp e, H. Koba ya sh i, K. Matsumoto, B. T oner, and T . Vidick. E n tangled games are hard to approximat e. SIAM J. Comput. , 40(3): 848–877, 2011. 
Earlier v ersion in FOCS’2008 . arXiv:0704 .2903. [84] A. Kliv ans and D. v an Melk eb eek. Graph nonisomorphism has sub exp onentia l size pr oofs un - less the p olynomial-time hierarc hy collapses. SIAM J . Comput. , 31:1501– 1526, 2 002. Earlier v ersion in A C M STOC 1999. [85] R. E. Ladner. On the structur e of p olynomial time reducibilit y . J. ACM , 22:155 –171, 1975. 55 [86] D. Leiv an t. A f oundational delineat ion of p oly-ti m e. Information and Computa tion , 110(2 ):391–420, 199 4. Earlier v ersion in LICS (Logic In Compu ter Science) 1991, p. 2-11. [87] H. J. Lev esque. Is it enough to get the b eha vior right? In Pr o c e e dings of IJCA I , pages 1439– 1444, 20 09. [88] L. A. Levin. Polynomial time and extra v agan t mo dels, in T he tale of one-wa y functions. Pr oblems of Information T r ansmission , 39(1):9 2–103, 2 003. cs.CR/0012023 . [89] M. Li and P . M. B. Vit´ anyi. An Intr o duction to Kolmo gor ov Complexity and Its Applic ations (3r d e d.) . Sprin ger, 2008. [90] S. Llo yd, L. Maccone, R. Garcia-P atron, V. Gio v annetti, and Y. Shik ano. The quantum mec hanics of time tr a v el through p ost-selec ted telep ortatio n. Phys. R ev. D , 84(02500 7), 2011. arXiv:1007.2615 . [91] J. R. Lucas. Minds, mac hines, and G¨ odel. Philosophy , 36:11 2–127, 1961. [92] N. D. Mermin. F rom cbits to qbits: teac hing computer scien tists quan tum m ec hanics. A mer- ic an J. Phys. , 71(1):23– 30, 2003. quant-ph/020 7118. [93] N. D. Mermin. Quantum Computer Scienc e: An Intr o duction . Cambridge Un iv ersit y Pr ess, 2007. [94] S. Micali. Compu tatio nally sound p ro ofs. SIA M J . Comput. , 30(4):12 53–1298, 200 0. [95] C. Mo ore and S. Mertens. The Natur e of Computation . Oxford Universit y Press, 2011. [96] M. S. Morris, K . S. Thorne, and U. Y urtsev er. W ormh oles, time mac hines, and the w eak energy condition. Phys. R ev. L ett. , 61:1446– 1449, 1988 . [97] A. Morton. Epistemic vir tues, meta virtu es, and computational complexit y . 
Noˆ us , 38(3):481– 502, 2004. [98] A. Neyman. Bounded complexit y justifies coop eration in the finitely rep eated p risoners’ dilemma. E c onomics L etters , 19(3):227–2 29, 1985. [99] M. Nielsen and I. Chuang. Quantum Computation and Q uantum Information . C am bridge Univ ersity Press, 2000. [100] C. H. Papadimitriou. Computa tional Complexity . Addison-W esley , 1994. [101] I. P arb erry . Knowledge , u n derstanding, and compu tational complexit y . In D. S. Levine and W. R. Elsb erry , editors, Optimality in Biolo gic al and Artificial Networks? , p age s 125–1 44. La wrence Erlbaum Asso ciates, 1997. [102] R. Penrose. The Emp er or’s New M ind . O xford, 1989. [103] R. P enrose. Shadows of the M ind: A Se ar c h for the Missing Scienc e of Consciousness . Oxford, 1996. 56 [104] L. Pitt and L. V alian t. Compu tational limitations on learning fr om examples. J. ACM , 35(4): 965–984, 1 988. [105] C. Pomerance . A tale of t wo siev es. Notic es of the Americ an Mathemat ic al So ciety , 43(12 ):1473–1485, 199 6. [106] H. Pu tnam. R epr esentation a nd R e ality . Bradford Bo oks, 1991. [107] R. R az. Exp onen tial s eparatio n of quantum and classical comm un ication complexit y . In Pr o c . ACM STOC , p age s 358–367 , 1999 . [108] A. A. Razb oro v and S. Rud ic h. Natural p r oofs. J. Comput. Sys. Sci. , 55(1) :24–35, 1 997. [109] S. Reisc h . Hex is PSP A C E-co m p lete. A cta Informatic a , 1 5:167–191 , 1981. [110] H. E. Rose. Su b r e cursion: F unctions and Hier ar chies . Clarend on Pr ess, 1984. [111] A. Ru binstein. Mo deling Bounde d R ationa lity . MIT Press, 1998. [112] A. Sch¨ onhage and V. S trassen. Sc h nelle Multiplik atio n großer Zahlen. Computing , (7):281– 292, 1971. [113] J. Searle. Minds, brains, and pr ograms. Behavior al and Br ain Scienc es , 3(417-457 ), 1980. [114] J. Searle. The R e disc overy of the Mind . MIT Press, 1992. [115] A. S hamir. IP=PSP ACE. J. ACM , 39(4):8 69–877, 19 92. [116] S. M. Shieb er. 
Th e Turing test as interac tive pr oof. Noˆ us , 41(4):686– 713, 2007. [117] P . W. Shor. Po lyn omial-time algorithms for prime facto rization and discrete logarithms on a quan tum computer. SIAM J. Comput. , 26(5 ):1484–150 9, 1997 . Earlier version in IEEE F OCS 1994. quant-ph/950 8027. [118] H. T. Siegelmann. Neural and sup er-Tu ring computing. Minds and Machines , 13(1):103– 114, 2003. [119] D. S imon. O n the p o w er of quantum co mp utation. In Pr o c. IE EE FOCS , pages 116–1 23, 1994. [120] H. A. Simon. A b eh a vioral mo del of rational c h oice . The Quart erly Journal of Ec onomics , 69(1): 99–118, 19 55. [121] M. Sipser. The history and status of the P versus NP qu estio n . In Pr o c. A CM STOC , pages 603–6 18, 19 92. [122] M. Sipser. Intr o duction to the The ory of Computation (Se c ond E dition) . Course T echnolog y , 2005. [123] R. S talnak er. Th e pr oblem of logica l omniscience, I and I I. In Context and Con tent: E ssays on Intentionality in Sp e e ch and Thought , Oxford Cognitiv e Science Series, p ages 241–273. Oxford Universit y Press, 1999 . 57 [124] L. J. Sto c kmey er. Classifying the computational complexit y of problems. J. Symb olic L o gic , 52(1): 1–43, 19 87. [125] J. A. S torer. On the complexit y of chess. J. Comput. Sys. Sci. , 27(1):77 –100, 1 983. [126] A. M. T uring. Comp uting m achinery and in telligence. M i nd , 59:433– 460, 195 0. [127] L. G. V alian t. A theory of the learnable. Communic ations of th e A CM , 27:1134– 1142, 1984 . [128] L. G. V alia nt. Evolv abilit y . J. ACM , 56(1), 2009. Confer en ce version in MF CS 2007. ECCC TR06-120 . [129] V. V apnik and A. Chervonenkis. O n th e u niform con v ergence of r elat ive frequ en cie s of ev ents to their probabilities. The ory of P r ob ability and its Applic ations , 16(2):264 –280, 1971 . [130] H. W ang. A L o gic al Journey: F r om G ¨ odel to Philosophy . MIT Press, 1997. [131] J. W atrous. S uccinct q u an tum pro ofs for prop erties of fin ite group s. 
In P r o c. IEEE FOCS , pages 537–546 , 2000 . cs.CC/0009002. [132] J. W atrous. Q uan tum computational complexit y . In Encyclop e dia of Complexity and Systems Scienc e . Sp r inger, 2008. arXiv:0804.340 1. [133] A. Wigderson. P, NP and mathematics - a c omp utatio n al complex- it y p ersp ectiv e. In P r o c e e dings of the International Co ngr ess of Math- ematicians 2006 (M ad rid) , pages 665–712. EMS Pu blishing House, 2007. www.math.ias.edu/˜a vi/PUBLICA TIONS/MYP APERS/W06/w06 .p d f . [134] A. Wigderson. Kno wledge, creativit y and P versus NP, 2009. www.math.ias.edu/˜a vi/PUBLICA TIONS/MYP APERS/A W09/A W09. p d f. [135] E. Wigner. The unreasonable effectiv eness of mathematics in the natural sciences. Commu- nic ations in Pur e and Applie d Mathematics , 13(1), 1960. [136] R. de W olf. Philosophical applicati ons of computational learning theory: Chomsky an innateness and o ccam’s razor. Maste r ’s thesis, Erasm us Unive r sit y , 1 997. home- pages.cwi.nl/˜rdew olf/publ/philosophy/ph thesis.p df. 58