Why Philosophers Should Care About Computational Complexity

Scott Aaronson
Abstract

One might think that, once we know something is computable, how efficiently it can be computed is a practical question with little further philosophical importance. In this essay, I offer a detailed case that one would be wrong. In particular, I argue that computational complexity theory—the field that studies the resources (such as time, space, and randomness) needed to solve computational problems—leads to new perspectives on the nature of mathematical knowledge, the strong AI debate, computationalism, the problem of logical omniscience, Hume's problem of induction, Goodman's grue riddle, the foundations of quantum mechanics, economic rationality, closed timelike curves, and several other topics of philosophical interest. I end by discussing aspects of complexity theory itself that could benefit from philosophical analysis.

[*] MIT. Email: aaronson@csail.mit.edu. This material is based upon work supported by the National Science Foundation under Grant No. 0844626. Also supported by a DARPA YFA grant, the Sloan Foundation, and a TIBCO Chair.

Contents

1 Introduction
  1.1 What This Essay Won't Cover
2 Complexity 101
3 The Relevance of Polynomial Time
  3.1 The Entscheidungsproblem Revisited
  3.2 Evolvability
  3.3 Known Integers
  3.4 Summary
4 Computational Complexity and the Turing Test
  4.1 The Lookup-Table Argument
  4.2 Relation to Previous Work
  4.3 Can Humans Solve NP-Complete Problems Efficiently?
  4.4 Summary
5 The Problem of Logical Omniscience
  5.1 The Cobham Axioms
  5.2 Omniscience Versus Infinity
  5.3 Summary
6 Computationalism and Waterfalls
  6.1 "Reductions" That Do All The Work
7 PAC-Learning and the Problem of Induction
  7.1 Drawbacks of the Basic PAC Model
  7.2 Computational Complexity, Bleen, and Grue
8 Quantum Computing
  8.1 Quantum Computing and the Many-Worlds Interpretation
9 New Computational Notions of Proof
  9.1 Zero-Knowledge Proofs
  9.2 Other New Notions
10 Complexity, Space, and Time
  10.1 Closed Timelike Curves
  10.2 The Evolutionary Principle
  10.3 Closed Timelike Curve Computation
11 Economics
  11.1 Bounded Rationality and the Iterated Prisoners' Dilemma
  11.2 The Complexity of Equilibria
12 Conclusions
  12.1 Criticisms of Complexity Theory
  12.2 Future Directions
13 Acknowledgments

1 Introduction

The view that machines cannot give rise to surprises is due, I believe, to a fallacy to which philosophers and mathematicians are particularly subject. This is the assumption that as soon as a fact is presented to a mind all consequences of that fact spring into the mind simultaneously with it. It is a very useful assumption under many circumstances, but one too easily forgets that it is false. —Alan M. Turing [126]

The theory of computing, created by Alan Turing, Alonzo Church, Kurt Gödel, and others in the 1930s, didn't only change civilization; it also had a lasting impact on philosophy. Indeed, clarifying philosophical issues was the original point of their work; the technological payoffs only came later! Today, it would be hard to imagine a serious discussion about (say) the philosophy of mind, the foundations of mathematics, or the prospects of machine intelligence that was uninformed by this revolution in human knowledge three-quarters of a century ago.

However, as computers became widely available starting in the 1960s, computer scientists increasingly came to see computability theory as not asking quite the right questions. For almost all the problems we actually want to solve turn out to be computable in Turing's sense; the real question is which problems are efficiently or feasibly computable. The latter question gave rise to a new field, called computational complexity theory (not to be confused with the "other" complexity theory, which studies complex systems such as cellular automata).
Since the 1970s, computational complexity theory has witnessed some spectacular discoveries, which include NP-completeness, public-key cryptography, new types of mathematical proof (such as probabilistic, interactive, and zero-knowledge proofs), and the theoretical foundations of machine learning and quantum computation. To people who work on these topics, the work of Gödel and Turing may look in retrospect like just a warmup to the "big" questions about computation. Because of this, I find it surprising that complexity theory has not influenced philosophy to anything like the extent computability theory has.

The question arises: why hasn't it? Several possible answers spring to mind: maybe computability theory just had richer philosophical implications. (Though as we'll see, one can make a strong case for exactly the opposite.) Maybe complexity has essentially the same philosophical implications as computability, and computability got there first. Maybe outsiders are scared away from learning complexity theory by the "math barrier." Maybe the explanation is social: the world where Gödel, Turing, Wittgenstein, and Russell participated in the same intellectual conversation vanished with World War II; after that, theoretical computer science came to be driven by technology and lost touch with its philosophical origins. Maybe recent advances in complexity theory simply haven't had enough time to enter philosophical consciousness.

However, I suspect that part of the answer is just complexity theorists' failure to communicate what they can add to philosophy's conceptual arsenal. Hence this essay, whose modest goal is to help correct that failure, by surveying some aspects of complexity theory that might interest philosophers, as well as some philosophical problems that I think a complexity perspective can clarify.
To forestall misunderstandings, let me add a note of humility before going further. This essay will touch on many problems that philosophers have debated for generations, such as strong AI, the problem of induction, the relation between syntax and semantics, and the interpretation of quantum mechanics. In none of these cases will I claim that computational complexity theory "dissolves" the philosophical problem—only that it contributes useful perspectives and insights. I'll often explicitly mention philosophical puzzles that I think a complexity analysis either leaves untouched or else introduces itself. But even where I don't do so, one shouldn't presume that I think there are no such puzzles! Indeed, one of my hopes for this essay is that computer scientists, mathematicians, and other technical people who read it will come away with a better appreciation for the subtlety of some of the problems considered in modern analytic philosophy.[1]

[1] When I use the word "philosophy" in this essay, I'll mean philosophy within the analytic tradition. I don't understand Continental or Eastern philosophy well enough to say whether they have any interesting connections with computational complexity theory.

1.1 What This Essay Won't Cover

I won't try to discuss every possible connection between computational complexity and philosophy, or even every connection that's already been made. A small number of philosophers have long invoked computational complexity ideas in their work; indeed, the "philpapers archive" lists 32 papers under the heading Computational Complexity.[2] The majority of those papers prove theorems about the computational complexities of various logical systems. Of the remaining papers, some use "computational complexity" in a different sense than I do—for example, to encompass
computability theory—and some invoke the concept of computational complexity, but no particular results from the field devoted to it. Perhaps the closest in spirit to this essay are the interesting articles by Cherniak [40] and Morton [97]. In addition, many writers have made some version of the observations in Section 4, about computational complexity and the Turing Test: see for example Block [30], Parberry [101], Levesque [87], and Shieber [116].

[2] See philpapers.org/browse/computational-complexity

In deciding which connections to include in this essay, I adopted the following ground rules:

(1) The connection must involve a "properly philosophical" problem—for example, the justification for induction or the nature of mathematical knowledge—and not just a technical problem in logic or model theory.

(2) The connection must draw on specific insights from the field of computational complexity theory: not just the idea of complexity, or the fact that there exist hard problems.

There are many philosophically-interesting ideas in modern complexity theory that this essay mentions only briefly or not at all. One example is pseudorandom generators (see Goldreich [63]): functions that convert a short random "seed" into a long string of bits that, while not truly random, is so "random-looking" that no efficient algorithm can detect any regularities in it. While pseudorandom generators in this sense are not yet proved to exist,[3] there are many plausible candidates, and the belief that at least some of the candidates work is central to modern cryptography. (Section 7.1 will invoke the related concept of pseudorandom functions.)
A second example is fully homomorphic encryption: an extremely exciting new class of methods, the first of which was announced by Gentry [60] in 2009, for performing arbitrary computations on encrypted data without ever decrypting the data. The output of such a computation will look like meaningless gibberish to the person who computed it, but it can nevertheless be understood (and even recognized as the correct output) by someone who knows the decryption key. What are the implications of pseudorandom generators for the foundations of probability, or of fully homomorphic encryption for debates about the semantic meaning of computations? I very much hope that this essay will inspire others to tackle these and similar questions.

Outside of computational complexity, there are at least three other major intersection points between philosophy and modern theoretical computer science. The first one is the semantics of programming languages, which has large and obvious connections to the philosophy of language.[4] The second is distributed systems theory, which provides both an application area and a rich source of examples for philosophical work on reasoning about knowledge (see Fagin et al. [53] and Stalnaker [123]). The third is Kolmogorov complexity (see Li and Vitányi [89]), which studies the length of the shortest computer program that achieves some functionality, disregarding time, memory, and other resources used by the program.[5] In this essay, I won't discuss any of these connections, except in passing (for example, Section 5 touches on logics of knowledge in the context of the "logical omniscience problem," and Section 7 touches on Kolmogorov complexity in the context of PAC-learning). In defense of these omissions, let me offer four excuses. First, these other connections fall outside my stated topic.
Second, they would make this essay even longer than it already is. Third, I lack the requisite background. And fourth, my impression is that philosophers—at least some philosophers—are already more aware of these other connections than they are of the computational complexity connections that I want to explain.

[3] The conjecture that pseudorandom generators exist implies the P ≠ NP conjecture (about which more later), but might be even stronger: the converse implication is unknown.

[4] The Stanford Encyclopedia of Philosophy entry on "The Philosophy of Computer Science," plato.stanford.edu/entries/computer-science, devotes most of its space to this connection.

[5] A variant, "resource-bounded Kolmogorov complexity," does take time and memory into account, and is part of computational complexity theory proper.

2 Complexity 101

Computational complexity theory is a huge, sprawling field; naturally this essay will only touch on small parts of it. Readers who want to delve deeper into the subject are urged to consult one of the many outstanding textbooks, such as those of Sipser [122], Papadimitriou [100], Moore and Mertens [95], Goldreich [62], or Arora and Barak [15]; or survey articles by Wigderson [133, 134], Fortnow and Homer [58], or Stockmeyer [124].

One might think that, once we know something is computable, whether it takes 10 seconds or 20 seconds to compute is obviously the concern of engineers rather than philosophers. But that conclusion would not be so obvious, if the question were one of 10 seconds versus 10^10^10 seconds! And indeed, in complexity theory, the quantitative gaps we care about are usually so vast that one has to consider them qualitative gaps as well. Think, for example, of the difference between reading a 400-page book and reading every possible such book, or between writing down a thousand-digit number and counting to that number.
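The gap between writing down a thousand-digit number and counting to it can be made concrete with a few lines of arithmetic. This is a throwaway sketch, and the rate of 10^9 increments per second is an assumption chosen for illustration:

```python
# Writing a thousand-digit number takes ~1000 steps; counting to it takes
# ~10^1000 steps. Assume (generously) 10^9 increments per second.
digits = 1000
steps_to_write = digits                            # one step per digit
steps_to_count = 10 ** digits                      # roughly the number itself
seconds_per_year = 60 * 60 * 24 * 365
years_to_count = steps_to_count // (10 ** 9 * seconds_per_year)
print(steps_to_write)                              # 1000
print(len(str(years_to_count)))                    # 984: even the number of YEARS has ~10^3 digits
```

The point is not the exact figures but that no change in hardware assumptions moves the answer into a humanly meaningful range.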
More precisely, complexity theory asks the question: how do the resources needed to solve a problem scale with some measure n of the problem size: "reasonably" (like n or n^2, say), or "unreasonably" (like 2^n or n!)? As an example, two n-digit integers can be multiplied using ~n^2 computational steps (by the grade-school method), or even ~n log n log log n steps (by more advanced methods [112]). Either method is considered efficient. By contrast, the fastest known method for the reverse operation—factoring an n-digit integer into primes—uses ~2^(n^(1/3)) steps, which is considered inefficient.[6] Famously, this conjectured gap between the inherent difficulties of multiplying and factoring is the basis for most of the cryptography currently used on the Internet.

Theoretical computer scientists generally call an algorithm "efficient" if its running time can be upper-bounded by any polynomial function of n, and "inefficient" if its running time can be lower-bounded by any exponential function of n.[7] These criteria have the great advantage of theoretical convenience. While the exact complexity of a problem might depend on "low-level encoding details," such as whether our Turing machine has one or two memory tapes, or how the inputs are encoded as binary strings, where a problem falls on the polynomial/exponential dichotomy can be shown to be independent of almost all such choices.[8] Equally important are the closure properties of polynomial and exponential time: a polynomial-time algorithm that calls a polynomial-time subroutine still yields an overall polynomial-time algorithm, while a polynomial-time algorithm that calls an exponential-time subroutine (or vice versa) yields an exponential-time algorithm. There are also more sophisticated reasons why theoretical computer scientists focus on polynomial time (rather than, say, n^2 time or n^(log n) time); we'll explore some of those reasons in Section 5.1.

[6] This method is called the number field sieve, and the quoted running time depends on plausible but unproved conjectures in number theory. The best proven running time is ~2^(√n). Both of these represent nontrivial improvements over the naïve method of trying all possible divisors, which takes ~2^n steps. See Pomerance [105] for a good survey of factoring algorithms.

[7] In some contexts, "exponential" means c^n for some constant c > 1, but in most complexity-theoretic contexts it can also mean c^(n^d) for constants c > 1 and d > 0.

[8] This is not to say that no details of the computational model matter: for example, some problems are known to be solvable in polynomial time on a quantum computer, but not known to be solvable in polynomial time on a classical computer! But in my view, the fact that the polynomial/exponential distinction can "notice" a modelling choice of this magnitude is a feature of the distinction, not a bug.

The polynomial/exponential distinction is open to obvious objections: an algorithm that took 1.00000001^n steps would be much faster in practice than an algorithm that took n^10000 steps! Furthermore, there are many growth rates that fall between polynomial and exponential, such as n^(log n) and 2^(2^√(log n)). But empirically, polynomial-time turned out to correspond to "efficient in practice," and exponential-time to "inefficient in practice," so often that complexity theorists became comfortable making the identification. Why the identification works is an interesting question in its own right, one to which we will return in Section 12.

A priori, insisting that programs terminate after reasonable amounts of time, that they use reasonable amounts of memory, etc.
might sound like relatively-minor amendments to Turing's notion of computation. In practice, though, these requirements lead to a theory with a completely different character than computability theory. Firstly, complexity has much closer connections with the sciences: it lets us pose questions about (for example) evolution, quantum mechanics, statistical physics, economics, or human language acquisition that would be meaningless from a computability standpoint (since all the relevant problems are computable). Complexity also differs from computability in the diversity of mathematical techniques used: while initially complexity (like computability) drew mostly on mathematical logic, today it draws on probability, number theory, combinatorics, representation theory, Fourier analysis, and nearly every other subject about which yellow books are written. Of course, this contributes not only to complexity theory's depth but also to its perceived inaccessibility.

In this essay, I'll argue that complexity theory has direct relevance to major issues in philosophy, including syntax and semantics, the problem of induction, and the interpretation of quantum mechanics. Or that, at least, whether complexity theory does or does not have such relevance is an important question for philosophy! My personal view is that complexity will ultimately prove more relevant to philosophy than computability was, precisely because of the rich connections with the sciences mentioned earlier.

3 The Relevance of Polynomial Time

Anyone who doubts the importance of the polynomial/exponential distinction needs only ponder how many basic intuitions in math, science, and philosophy already implicitly rely on that distinction. In this section I'll give three examples.
3.1 The Entscheidungsproblem Revisited

The Entscheidungsproblem was the dream, enunciated by David Hilbert in the 1920s, of designing a mechanical procedure to determine the truth or falsehood of any well-formed mathematical statement. According to the usual story, Hilbert's dream was irrevocably destroyed by the work of Gödel, Church, and Turing in the 1930s. First, the Incompleteness Theorem showed that no recursively-axiomatizable formal system can encode all and only the true mathematical statements. Second, Church's and Turing's results showed that, even if we settle for an incomplete system F, there is still no mechanical procedure to sort mathematical statements into the three categories "provable in F," "disprovable in F," and "undecidable in F."

However, there is a catch in the above story, which was first pointed out by Gödel himself, in a 1956 letter to John von Neumann that has become famous in theoretical computer science since its rediscovery in the 1980s (see Sipser [121] for an English translation). Given a formal system F (such as Zermelo-Fraenkel set theory), Gödel wrote, consider the problem of deciding whether a mathematical statement S has a proof in F with n symbols or fewer. Unlike Hilbert's original problem, this "truncated Entscheidungsproblem" is clearly decidable. For, if nothing else, we could always just program a computer to search through all 2^n possible bit-strings with n symbols, and check whether any of them encodes a valid F-proof of S. The issue is "merely" that this approach takes an astronomical amount of time: if n = 1000 (say), then the universe will have degenerated into black holes and radiation long before a computer can check 2^1000 proofs!
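The brute-force decidability argument can be sketched in a few lines. This is a toy illustration only: `check_proof` stands in for a real proof checker for the formal system F, and the miniature "proof system" used in the demo is invented purely for the example.

```python
from itertools import product

def truncated_entscheidungsproblem(statement, n, check_proof):
    """Decide whether `statement` has a proof of at most n symbols, by
    enumerating every candidate proof string.  Decidable, but the loop
    visits up to 2^(n+1) candidates -- astronomical already at n = 1000."""
    for length in range(n + 1):
        for bits in product("01", repeat=length):
            if check_proof(statement, "".join(bits)):
                return True
    return False

# Hypothetical toy "proof system", purely for demonstration: a "proof" of
# the statement ("ones", k) is any nonempty bit-string with exactly k ones.
def toy_check(statement, proof):
    return len(proof) > 0 and proof.count("1") == statement[1]

print(truncated_entscheidungsproblem(("ones", 3), 5, toy_check))  # True
print(truncated_entscheidungsproblem(("ones", 9), 5, toy_check))  # False: no 5-symbol proof
```

With n = 5 the loop examines only 63 strings; with n = 1000 the same correct procedure would examine ~2^1000, which is the whole point of the passage above.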
But as Gödel also pointed out, it's far from obvious how to prove that there isn't a much better approach: an approach that would avoid brute-force search, and find proofs of size n in time polynomial in n. Furthermore:

If there actually were a machine with [running time] ~Kn (or even only with ~Kn^2) [for some constant K independent of n], this would have consequences of the greatest magnitude. That is to say, it would clearly indicate that, despite the unsolvability of the Entscheidungsproblem, the mental effort of the mathematician in the case of yes-or-no questions could be completely [added in a footnote: apart from the postulation of axioms] replaced by machines. One would indeed have to simply select an n so large that, if the machine yields no result, there would then also be no reason to think further about the problem.

If we replace the "~Kn or ~Kn^2" in Gödel's challenge by ~Kn^c for an arbitrary constant c, then we get precisely what computer science now knows as the P versus NP problem. Here P (Polynomial-Time) is, roughly speaking, the class of all computational problems that are solvable by a polynomial-time algorithm. Meanwhile, NP (Nondeterministic Polynomial-Time) is the class of computational problems for which a solution can be recognized in polynomial time, even though a solution might be very hard to find.[9] (Think, for example, of factoring a large number, or solving a jigsaw or Sudoku puzzle.) Clearly P ⊆ NP, so the question is whether the inclusion is strict. If P = NP, then the ability to check the solutions to puzzles efficiently would imply the ability to find solutions efficiently. An analogy would be if anyone able to appreciate a great symphony could also compose one themselves!
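The asymmetry between recognizing and finding a solution is easy to exhibit with factoring. A minimal sketch, where the two primes below are just illustrative values:

```python
def verify_factorization(n, factors):
    """NP-style checking: multiplying the claimed factors back together
    takes time polynomial in the number of digits of n."""
    prod = 1
    for f in factors:
        prod *= f
    return prod == n and all(f > 1 for f in factors)

def find_factor(n):
    """Finding a factor by trial division: for a d-digit n with no small
    factors this takes ~10^(d/2) steps -- exponential in the input length."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # n itself is prime

n = 999983 * 1000003                                 # product of two primes
print(verify_factorization(n, [999983, 1000003]))    # True, instantly
print(find_factor(n))                                # 999983, after ~10^6 divisions
```

Verification scales gracefully to thousand-digit numbers; the search loop does not, which is exactly the P versus NP gap in miniature.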
Given the intuitive implausibility of such a scenario, essentially all complexity theorists proceed (reasonably, in my opinion) on the assumption that P ≠ NP, even if they publicly claim open-mindedness about the question. Proving or disproving P ≠ NP is one of the seven million-dollar Clay Millennium Prize Problems[10] (alongside the Riemann Hypothesis, the Poincaré Conjecture proved in 2002 by Perelman, etc.), which should give some indication of the problem's difficulty.[11]

[9] Contrary to a common misconception, NP does not stand for "Non-Polynomial"! There are computational problems that are known to require more than polynomial time (see Section 10), but the NP problems are not among those. Indeed, the classes NP and "Non-Polynomial" have a nonempty intersection exactly if P ≠ NP. For detailed definitions of P, NP, and several hundred other complexity classes, see my Complexity Zoo website: www.complexityzoo.com.

[10] For more information see www.claymath.org/millennium/P_vs_NP/ My own view is that P versus NP is manifestly the most important of the seven problems! For if P = NP, then by Gödel's argument, there is an excellent chance that we could program our computers to solve the other six problems as well.

Now return to the problem of whether a mathematical statement S has a proof with n symbols or fewer, in some formal system F. A suitable formalization of this problem is easily seen to be in NP. For finding a proof might be intractable, but if we're given a purported proof, we can certainly check in time polynomial in n whether each line of the proof follows by a simple logical manipulation of previous lines.
Indeed, this problem turns out to be NP-complete, which means that it belongs to an enormous class of NP problems, first identified in the 1970s, that "capture the entire difficulty of NP." A few other examples of NP-complete problems are Sudoku and jigsaw puzzles, the Traveling Salesperson Problem, and the satisfiability problem for propositional formulas.[12] Asking whether P = NP is equivalent to asking whether any NP-complete problem can be solved in polynomial time, and is also equivalent to asking whether all of them can be.

In modern terms, then, Gödel is saying that if P = NP, then whenever a theorem had a proof of reasonable length, we could find that proof in a reasonable amount of time. In such a situation, we might say that "for all practical purposes," Hilbert's dream of mechanizing mathematics had prevailed, despite the undecidability results of Gödel, Church, and Turing. If you accept this, then it seems fair to say that until P versus NP is solved, the story of Hilbert's Entscheidungsproblem—its rise, its fall, and the consequences for philosophy—is not yet over.

3.2 Evolvability

Creationists often claim that Darwinian evolution is as vacuous an explanation for complex adaptations as "a tornado assembling a 747 airplane as it passes through a junkyard." Why is this claim false? There are several related ways of answering the question, but to me, one of the most illuminating is the following. In principle, one could see a 747 assemble itself in a tornado-prone junkyard—but before that happened, one would need to wait for an expected number of tornadoes that grew exponentially with the number of pieces of self-assembling junk. (This is similar to how, in thermodynamics, n gas particles in a box will eventually congregate themselves in one corner of the box, but only after ~c^n time for some constant c.)
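The contrast can be seen in a toy simulation. Below, "tornado" search would sample random n-bit strings and wait for a target to appear by chance (expected 2^n samples), while a mutate-and-select loop reaches the same target in roughly n ln n expected steps. This is an illustrative sketch only, not a claim about any particular result in evolvability theory:

```python
import random

def mutate_and_select(n, rng):
    """Evolution-style search: flip one random bit, keep the change
    whenever fitness (the number of ones) does not drop.  Reaches the
    all-ones target in ~n ln n expected steps (a coupon-collector bound)."""
    genome = [0] * n
    steps = 0
    while sum(genome) < n:
        steps += 1
        i = rng.randrange(n)
        candidate = genome[:]
        candidate[i] ^= 1                   # point mutation
        if sum(candidate) >= sum(genome):   # selection
            genome = candidate
    return steps

n = 24
print(mutate_and_select(n, random.Random(0)))  # on the order of 100 steps
# "Tornado" search -- drawing rng.getrandbits(n) until the all-ones string
# appears by luck -- would need ~2^24 (about 17 million) samples instead.
```

The exponential/polynomial gap between the two strategies, not any detail of the fitness function, is what the junkyard argument misses.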
By contrast, evolutionary processes can often be observed in simulations—and in some cases, even proved theoretically—to find interesting solutions to optimization problems after a number of steps that grows only polynomially with the number of variables.

Interestingly, in a 1972 letter to Hao Wang (see [130, p. 192]), Kurt Gödel expressed his own doubts about evolution as follows:

I believe that mechanism in biology is a prejudice of our time which will be disproved. In this case, one disproof, in my opinion, will consist in a mathematical theorem to the effect that the formation within geological time of a human body by the laws of physics (or any other laws of similar nature), starting from a random distribution of the elementary particles and the field, is as unlikely as the separation by chance of the atmosphere into its components.

[11] One might ask: can we explain what makes the P ≠ NP problem so hard, rather than just pointing out that many smart people have tried to solve it and failed? After four decades of research, we do have partial explanations for the problem's difficulty, in the form of formal "barriers" that rule out large classes of proof techniques. Three barriers identified so far are relativization [21] (which rules out diagonalization and other techniques with a "computability" flavor), algebrization [8] (which rules out diagonalization even when combined with the main non-relativizing techniques known today), and natural proofs [108] (which shows that many "combinatorial" techniques, if they worked, could be turned around to get faster algorithms to distinguish random from pseudorandom functions).

[12] By contrast, and contrary to a common misconception, there is strong evidence that factoring integers is not NP-complete. It is known that if P ≠ NP, then there are NP problems that are neither in P nor NP-complete [85], and factoring is one candidate for such a problem. This point will become relevant when we discuss quantum computing.

Personally, I see no reason to accept Gödel's intuition on this subject over the consensus of modern biology! But pay attention to Gödel's characteristically-careful phrasing. He does not ask whether evolution can eventually form a human body (for he knows that it can, given exponential time); instead, he asks whether it can do so on a "merely" geological timescale. Just as Gödel's letter to von Neumann anticipated the P versus NP problem, so Gödel's letter to Wang might be said to anticipate a recent effort, by the celebrated computer scientist Leslie Valiant, to construct a quantitative "theory of evolvability" [128]. Building on Valiant's earlier work in computational learning theory (discussed in Section 7), evolvability tries to formalize and answer questions about the speed of evolution. For example: "what sorts of adaptive behaviors can evolve, with high probability, after only a polynomial number of generations? what sorts of behaviors can be learned in polynomial time, but not via evolution?" While there are some interesting early results, it should surprise no one that evolvability is nowhere close to being able to calculate, from first principles, whether four billion years is a "reasonable" or "unreasonable" length of time for the human brain to evolve out of the primordial soup.

As I see it, this difficulty reflects a general point about Gödel's "evolvability" question.
Namely, even supposing Gödel was right, that the mechanistic worldview of modern biology was "as unlikely as the separation by chance of the atmosphere into its components," computational complexity theory seems hopelessly far from being able to prove anything of the kind! In 1972, one could have argued that this merely reflected the subject's newness: no one had thought terribly deeply yet about how to prove lower bounds on computation time. But by now, people have thought deeply about it, and have identified huge obstacles to proving even such "obvious" and well-defined conjectures as P ≠ NP.^13 (Section 4 will make a related point, about the difficulty of proving nontrivial lower bounds on the time or memory needed by a computer program to pass the Turing Test.)

3.3 Known Integers

My last example of the philosophical relevance of the polynomial/exponential distinction concerns the concept of "knowledge" in mathematics.^14 As of 2011, the "largest known prime number," as reported by GIMPS (the Great Internet Mersenne Prime Search),^15 is p := 2^43112609 − 1. But on reflection, what do we mean by saying that p is "known"? Do we mean that, if we desired, we could literally print out its decimal digits (using about 30,000 pages)? That seems like too restrictive a criterion. For, given a positive integer k together with a proof that q = 2^k − 1 was prime, I doubt most mathematicians would hesitate to call q a "known" prime, even if k were so large that printing out its decimal digits (or storing them in a computer memory) were beyond the Earth's capacity. Should we call 2^(2^1000) an "unknown power of 2," just because it has too many decimal digits to list before the Sun goes cold?
All that should really matter, one feels, is that (a) the expression '2^43112609 − 1' picks out a unique positive integer, and (b) that integer has been proven (in this case, via computer, of course) to be prime. But wait! If those are the criteria, then why can't we immediately beat the largest-known-prime record, like so?

    p′ = The first prime larger than 2^43112609 − 1.

Clearly p′ exists, it is unambiguously defined, and it is prime. If we want, we can even write a program that is guaranteed to find p′ and output its decimal digits, using a number of steps that can be upper-bounded a priori.^16 Yet our intuition stubbornly insists that 2^43112609 − 1 is a "known" prime in a sense that p′ is not. Is there any principled basis for such a distinction? The clearest basis that I can suggest is the following. We know an algorithm that takes as input a positive integer k, and that outputs the decimal digits of p = 2^k − 1 using a number of steps that is polynomial (indeed, linear) in the number of digits of p. But we do not know any similarly-efficient algorithm that provably outputs the first prime larger than 2^k − 1.^17

^13 Admittedly, one might be able to prove that Darwinian natural selection would require exponential time to produce some functionality, without thereby proving that any algorithm would require exponential time.

^14 This section was inspired by a question of A. Rupinski on the website MathOverflow. See mathoverflow.net/questions/62925/philosophical-question-related-to-largest-known-primes/

^15 www.mersenne.org

3.4 Summary

The point of these examples was to illustrate that, beyond its utility for theoretical computer science, the polynomial/exponential gap is also fertile territory for philosophy.
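The Section 3.3 distinction between the two "known" primes can be made concrete in a few lines of Python. This is my own illustrative sketch, not from the essay, and the function names are invented: printing the digits of 2^k − 1 takes time polynomial in the length of the output, while the obvious procedure for "the first prime larger than 2^k − 1" searches upward, with a halting guarantee from Bertrand's Postulate but no known polynomial bound.

```python
import random

def mersenne_digits(k: int) -> str:
    """Decimal digits of p = 2^k - 1, in time polynomial (essentially
    linear) in the number of digits of p."""
    return str((1 << k) - 1)

def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Standard Miller-Rabin primality test (probabilistic)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def first_prime_after(n: int) -> int:
    """Search upward from n + 1.  Bertrand's Postulate (footnote 16) bounds
    the search a priori by 2n, but 2n is exponential in the digit count."""
    candidate = n + 1
    while not is_probable_prime(candidate):
        candidate += 1
    return candidate
```

Running mersenne_digits(43112609) to produce the record prime's roughly thirteen million digits is entirely feasible; running first_prime_after on that same number is hopeless in practice, which is the intuitive sense in which one prime is "known" and the other is not.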
I think of the polynomial/exponential gap as occupying a "middle ground" between two other sorts of gaps: on the one hand, small quantitative gaps (such as the gap between n steps and 2n steps); and on the other hand, the gap between a finite number of steps and an infinite number. The trouble with small quantitative gaps is that they are too sensitive to "mundane" modeling choices and the details of technology. But the gap between finite and infinite has the opposite problem: it is serenely insensitive to distinctions that we actually care about, such as that between finding a solution and verifying it, or between classical and quantum physics.^18 The polynomial/exponential gap avoids both problems.

4 Computational Complexity and the Turing Test

Can a computer think? For almost a century, discussions about this question have often conflated two issues. The first is the "metaphysical" issue:

    Supposing a computer program passed the Turing Test (or as strong a variant of the Turing Test as one wishes to define),^19 would we be right to ascribe to it "consciousness," "qualia," "aboutness," "intentionality," "subjectivity," "personhood," or whatever other charmed status we wish to ascribe to other humans and to ourselves?

The second is the "practical" issue:

    Could a computer program that passed (a strong version of) the Turing Test actually be written? Is there some fundamental reason why it couldn't be?

Of course, it was precisely in an attempt to separate these issues that Turing proposed the Turing Test in the first place! But despite his efforts, a familiar feature of anti-AI arguments to this day is that they first assert AI's metaphysical impossibility, and then try to bolster that position with claims about AI's practical difficulties. "Sure," they say, "a computer program might mimic a few minutes of witty banter, but unlike a human being, it would never show fear or anger or jealousy, or compose symphonies, or grow old, or fall in love..." The obvious followup question (and what if a program did do all those things?) is often left unasked, or else answered by listing more things that a computer program could self-evidently never do. Because of this, I suspect that many people who say they consider AI a metaphysical impossibility really consider it only a practical impossibility: they simply have not carried the requisite thought experiment far enough to see the difference between the two.^20 Incidentally, this is as clear-cut a case as I know of where people would benefit from studying more philosophy!

^16 For example, one could use Chebyshev's Theorem (also called Bertrand's Postulate), which says that for all N > 1 there exists a prime between N and 2N.

^17 Cramér's Conjecture states that the spacing between two consecutive n-digit primes never exceeds ~n^2. This conjecture appears staggeringly difficult: even assuming the Riemann Hypothesis, it is only known how to deduce the much weaker upper bound ~n·2^(n/2). But interestingly, if Cramér's Conjecture is proved, expressions like "the first prime larger than 2^k − 1" will then define "known primes" according to my criterion.

^18 In particular, it is easy to check that the set of computable functions does not depend on whether we define computability with respect to a classical or a quantum Turing machine, or a deterministic or nondeterministic one. At most, these choices can change a Turing machine's running time by an exponential factor, which is irrelevant for computability theory.
Thus, the anti-AI arguments that interest me most have always been the ones that target the practical issue from the outset, by proposing empirical "sword-in-the-stone tests" (in Daniel Dennett's phrase [46]) that it is claimed humans can pass but computers cannot. The most famous such test is probably the one based on Gödel's Incompleteness Theorem, as proposed by John Lucas [91] and elaborated by Roger Penrose in his books The Emperor's New Mind [102] and Shadows of the Mind [103]. Briefly, Lucas and Penrose argued that, according to the Incompleteness Theorem, one thing that a computer making deductions via fixed formal rules can never do is to "see" the consistency of its own rules. Yet this, they assert, is something that human mathematicians can do, via some sort of intuitive perception of Platonic reality. Therefore humans (or at least, human mathematicians!) can never be simulated by machines.

Critics pointed out numerous holes in this argument,^21 to which Penrose responded at length in Shadows of the Mind, in my opinion unconvincingly. However, even before we analyze some proposed sword-in-the-stone test, it seems to me that there is a much more basic question. Namely, what does one even mean in saying one has a task that "humans can perform but computers cannot"?

4.1 The Lookup-Table Argument

There is a fundamental difficulty here, which was noticed by others in a slightly different context [30, 101, 87, 116]. Let me first explain the difficulty, and then discuss the difference between my argument and the previous ones. In practice, people judge each other to be conscious after interacting for a very short time, perhaps as little as a few seconds. This suggests that we can put a finite upper bound (to be generous, let us say 10^20) on the number of bits of information that two people A and B would ever realistically exchange, before A had amassed enough evidence to conclude B was conscious.^22 Now imagine a lookup table that stores every possible history H of A and B's conversation, and next to H, the action f_B(H) that B would take next given that history. Of course, like Borges' Library of Babel, the lookup table would consist almost entirely of meaningless nonsense, and it would also be much too large to fit inside the observed universe. But all that matters for us is that the lookup table would be finite, by the assumption that there is a finite upper bound on the conversation length. This implies that the function f_B is computable (indeed, it can be recognized by a finite automaton!). From these simple considerations, we conclude that if there is a fundamental obstacle to computers passing the Turing Test, then it is not to be found in computability theory.^23

In Shadows of the Mind [103, p. 83], Penrose recognizes this problem, but gives a puzzling and unsatisfying response:

    One could equally well envisage computers that contain nothing but lists of totally false mathematical 'theorems,' or lists containing random jumbles of truths and falsehoods. How are we to tell which computer to trust? The arguments that I am trying to make here do not say that an effective simulation of the output of conscious human activity (here mathematics) is impossible, since purely by chance the computer might 'happen' to get it right—even without any understanding whatsoever. But the odds against this are absurdly enormous, and the issues that are being addressed here, namely how one decides which mathematical statements are true and which are false, are not even being touched...

The trouble with this response is that it amounts to a retreat from the sword-in-the-stone test, back to murkier internal criteria. If, in the end, we are going to have to look inside the computer anyway to determine whether it truly "understands" its answers, then why not dispense with computability theory from the beginning? For computability theory only addresses whether or not Turing machines exist to solve various problems, and we have already seen that that is not the relevant issue.

^19 The Turing Test, proposed by Alan Turing [126] in 1950, is a test where a human judge interacts with either another human or a computer conversation program, by typing messages back and forth. The program "passes" the Test if the judge can't reliably distinguish the program from the human interlocutor. By a "strong variant" of the Turing Test, I mean that besides the usual teletype conversation, one could add additional tests requiring vision, hearing, touch, smell, speaking, handwriting, facial expressions, dancing, playing sports and musical instruments, and so on, even though many perfectly-intelligent humans would then be unable to pass the tests!

^20 One famous exception is John Searle [113], who has made it clear that, if (say) his best friend turned out to be controlled by a microchip rather than a brain, then he would regard his friend as never having been a person at all.

^21 See Dennett [46] and Chalmers [37] for example. To summarize: (1) Why should we assume a computer operates within a knowably-sound formal system? If we grant a computer the same freedom to make occasional mistakes that we grant humans, then the Incompleteness Theorem is no longer relevant. (2) Why should we assume that human mathematicians have "direct perception of Platonic reality"? Human mathematicians (such as Frege) have been wrong before about the consistency of formal systems. (3) A computer could, of course, be programmed to output "I believe that formal system F is consistent," and even to output answers to various followup questions about why it believes this. So in arguing that such affirmations "wouldn't really count" (because they wouldn't reflect "true understanding"), AI critics such as Lucas and Penrose are forced to retreat from their vision of an empirical "sword-in-the-stone test," and fall back on other, unspecified criteria related to the AI's internal structure. But then why put the sword in the stone in the first place?

^22 People interacting over the Internet, via email or instant messages, regularly judge each other to be humans rather than spam-bots after exchanging a much smaller number of bits! In any case, cosmological considerations suggest an upper bound of roughly 10^122 bits in any observable process [34].

^23 Some readers might notice a tension here: I explained in Section 2 that complexity theorists care about the asymptotic behavior as the problem size n goes to infinity. So why am I now saying that, for the purposes of the Turing Test, we should restrict attention to finite values of n such as 10^20? There are two answers to this question. The first is that, in contrast to mathematical problems like the factoring problem or the halting problem, it is unclear whether it even makes sense to generalize the Turing Test to arbitrary conversation lengths: for the Turing Test is defined in terms of human beings, and human conversational capacity is finite. The second answer is that, to whatever extent it does make sense to generalize the Turing Test to arbitrary conversation lengths n, I am interested in whether the asymptotic complexity of passing the test grows polynomially or exponentially with n (as the remainder of the section explains).
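The lookup-table construction can be made concrete in a few lines. The sketch below is my own toy illustration, not from the essay: the 10^20-bit bound is scaled down to 8 bits, and some_policy is a hypothetical stand-in for f_B. Tabulating an arbitrary conversation policy over all bounded-length histories yields a finite, trivially computable table, whose size nonetheless grows exponentially with the bound.

```python
from itertools import product

MAX_BITS = 8  # toy stand-in for the essay's generous 10^20-bit bound

def some_policy(history: str) -> str:
    """A hypothetical interlocutor f_B: replies with the parity of 1-bits."""
    return '1' if history.count('1') % 2 else '0'

# Tabulate the policy on every bit-history of length 0..MAX_BITS.
table = {''.join(bits): some_policy(''.join(bits))
         for n in range(MAX_BITS + 1)
         for bits in product('01', repeat=n)}

# The table has exactly the same input/output behavior as some_policy, so
# f_B is computable (a finite automaton suffices).  But it holds
# 2^(MAX_BITS+1) - 1 entries: exponential in the conversation bound.
assert len(table) == 2 ** (MAX_BITS + 1) - 1
```

Replacing MAX_BITS by 10^20 makes the same construction a mathematical object that is finite and computable yet could never fit in the observable universe, which is all the lookup-table argument needs.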
To my mind, there is one direction that Penrose could take from this point to avoid incoherence, though disappointingly, it is not the direction he chooses. Namely, he could point out that, while the lookup table "works," it requires computational resources that grow exponentially with the length of the conversation! This would lead to the following speculation:

    (*) Any computer program that passed the Turing Test would need to be exponentially inefficient in the length of the test, as measured in some resource such as time, memory usage, or the number of bits needed to write the program down. In other words, the astronomical lookup table is essentially the best one can do.^24

If true, speculation (*) would do what Penrose wants: it would imply that the human brain can't even be simulated by computer, within the resource constraints of the observable universe. Furthermore, unlike the earlier computability claim, (*) has the advantage of not being trivially false! On the other hand, to put it mildly, (*) is not trivially true either.

For AI proponents, the lack of compelling evidence for (*) is hardly surprising. After all, if you believe that the brain itself is basically an efficient,^25 classical Turing machine, then you have a simple explanation for why no one has proved that the brain can't be simulated by such a machine! However, complexity theory also makes it clear that, even if we supposed (*) held, there would be little hope of proving it in our current state of mathematical knowledge. After all, we can't even prove plausible, well-defined conjectures such as P ≠ NP.

4.2 Relation to Previous Work

As mentioned before, I'm far from the first person to ask about the computational resources used in passing the Turing Test, and whether they scale polynomially or exponentially with the conversation length.
While many writers ignore this crucial distinction, Block [30], Parberry [101], Levesque [87], Shieber [116], and several others all discussed it explicitly. The main difference is that the previous discussions took place in the context of Searle's Chinese Room argument [113]. Briefly, Searle proposed a thought experiment (the details don't concern us here) purporting to show that a computer program could pass the Turing Test, even though the program manifestly lacked anything that a reasonable person would call "intelligence" or "understanding." In response, many critics said that Searle's argument was deeply misleading, because it implicitly encouraged us to imagine a computer program that was simplistic in its internal operations, something like the giant lookup table described in Section 4.1. And while it was true, the critics went on, that a giant lookup table wouldn't "truly understand" its responses, that point is also irrelevant. For the giant lookup table is a philosophical fiction anyway: something that can't even fit in the observable universe!

^24 As Gil Kalai pointed out to me, one could speculate instead that an efficient computer program exists to pass the Turing Test, but that finding such a program would require exponential computational resources. In that situation, the human brain could indeed be simulated efficiently by a computer program, but maybe not by a program that humans could ever write!

^25 Here, by a Turing machine M being "efficient," we mean that M's running time, memory usage, and program size are modest enough that there is no real problem of principle understanding how M could be simulated by a classical physical system consisting of ~10^11 neurons and ~10^14 synapses. For example, a Turing machine containing a lookup table of size 10^(10^20) would not be efficient in this sense.
If we instead imagine a compact, efficient computer program passing the Turing Test, then the situation changes drastically. For now, in order to explain how the program can be so compact and efficient, we'll need to posit that the program includes representations of abstract concepts, capacities for learning and reasoning, and all sorts of other internal furniture that we would expect to find in a mind.

Personally, I find this response to Searle extremely interesting, since if correct, it suggests that the distinction between polynomial and exponential complexity has metaphysical significance. According to this response, an exponential-sized lookup table that passed the Turing Test would not be sentient (or conscious, intelligent, self-aware, etc.), but a polynomially-bounded program with exactly the same input/output behavior would be sentient. Furthermore, the latter program would be sentient because it was polynomially-bounded.

Yet, as much as that criterion for sentience flatters my complexity-theoretic pride, I find myself reluctant to take a position on such a weighty matter. My point, in Section 4.1, was a simpler and (hopefully) less controversial one: namely, that if you want to claim that passing the Turing Test is flat-out impossible, then like it or not, you must talk about complexity rather than just computability. In other words, the previous writers [30, 101, 87, 116] and I are all interested in the computational resources needed to pass a Turing Test of length n, but for different reasons. Where others invoked complexity considerations to argue with Searle about the metaphysical question, I'm invoking them to argue with Penrose about the practical question.

4.3 Can Humans Solve NP-Complete Problems Efficiently?

In that case, what can we actually say about the practical question?
Are there any reasons to accept the claim I called (*), the claim that humans are not efficiently simulable by Turing machines? In considering this question, we're immediately led to some speculative possibilities. So for example, if it turned out that humans could solve arbitrary instances of NP-complete problems in polynomial time, then that would certainly constitute excellent empirical evidence for (*).^26 However, despite occasional claims to the contrary, I personally see no reason to believe that humans can solve NP-complete problems in polynomial time, and excellent reasons to believe the opposite.^27 Recall, for example, that the integer factoring problem is in NP. Thus, if humans could solve NP-complete problems, then presumably we ought to be able to factor enormous numbers as well!

^26 And amusingly, if we could solve NP-complete problems, then we'd presumably find it much easier to prove that computers couldn't solve them!

^27 Indeed, it is not even clear to me that we should think of humans as being able to solve all P problems efficiently, let alone NP-complete problems! Recall that P is the class of problems that are solvable in polynomial time by a deterministic Turing machine. Many problems are known to belong to P for quite sophisticated reasons: two examples are testing whether a number is prime (though not factoring it!) [9] and testing whether a graph has a perfect matching. In principle, of course, a human could laboriously run the polynomial-time algorithms for such problems using pencil and paper. But is the use of pencil and paper legitimate, where use of a computer would not be? What is the computational power of the "unaided" human intellect? Recent work of Drucker [51], which shows how to use a stock photography collection to increase the "effective memory" available for mental calculations, provides a fascinating empirical perspective on these questions.
But factoring does not exactly seem like the most promising candidate for a sword-in-the-stone test: that is, a task that's easy for humans but hard for computers. As far as anyone knows today, factoring is hard for humans and (classical) computers alike, although with a definite advantage on the computers' side!

The basic point can hardly be stressed enough: when complexity theorists talk about "intractable" problems, they generally mean mathematical problems that all our experience leads us to believe are at least as hard for humans as for computers. This suggests that, even if humans were not efficiently simulable by Turing machines, the "direction" in which they were hard to simulate would almost certainly be different from the directions usually considered in complexity theory. I see two (hypothetical) ways this could happen.

First, the tasks that humans were uniquely good at, like painting or writing poetry, could be incomparable with mathematical tasks like solving NP-complete problems, in the sense that neither was efficiently reducible to the other. This would mean, in particular, that there could be no polynomial-time algorithm even to recognize great art or poetry (since if such an algorithm existed, then the task of composing great art or poetry would be in NP). Within complexity theory, it's known that there exist pairs of problems that are incomparable in this sense. As one plausible example, no one currently knows how to reduce the simulation of quantum computers to the solution of NP-complete problems or vice versa.

Second, humans could have the ability to solve interesting special cases of NP-complete problems faster than any Turing machine.
So for example, even if computers were better than humans at factoring large numbers or at solving randomly-generated Sudoku puzzles, humans might still be better at search problems with "higher-level structure" or "semantics," such as proving Fermat's Last Theorem or (ironically) designing faster computer algorithms. Indeed, even in limited domains such as puzzle-solving, while computers can examine solutions millions of times faster, humans (for now) are vastly better at noticing global patterns or symmetries in the puzzle that make a solution either trivial or impossible. As an amusing example, consider the Pigeonhole Principle, which says that n + 1 pigeons can't be placed into n holes, with at most one pigeon per hole. It's not hard to construct a propositional Boolean formula φ that encodes the Pigeonhole Principle for some fixed value of n (say, 1000). However, if you then feed φ to current Boolean satisfiability algorithms, they'll assiduously set to work trying out possibilities: "let's see, if I put this pigeon here, and that one there ... darn, it still doesn't work!" And they'll continue trying out possibilities for an exponential number of steps, oblivious to the "global" reason why the goal can never be achieved. Indeed, beginning in the 1980s, the field of proof complexity (a close cousin of computational complexity) has been able to show that large classes of algorithms require exponential time to prove the Pigeonhole Principle and similar propositional tautologies (see Beame and Pitassi [24] for a survey).
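To see what such a formula φ looks like, here is a minimal sketch (my own encoding, with illustrative names; not from the essay). It builds the pigeonhole CNF for n holes and n + 1 pigeons, and checks it by brute force, exactly the kind of exhaustive case-splitting that, per the proof-complexity results cited above, cannot escape exponential time on these formulas.

```python
from itertools import combinations, product

def php_clauses(n):
    """CNF encoding of the Pigeonhole Principle: variables are
    (pigeon, hole) pairs, literals are (variable, polarity)."""
    clauses = []
    for i in range(n + 1):                      # every pigeon sits somewhere
        clauses.append([((i, j), True) for j in range(n)])
    for j in range(n):                          # no hole holds two pigeons
        for i1, i2 in combinations(range(n + 1), 2):
            clauses.append([((i1, j), False), ((i2, j), False)])
    return clauses

def satisfiable(n):
    """Brute-force satisfiability check over all 2^((n+1)n) assignments."""
    variables = [(i, j) for i in range(n + 1) for j in range(n)]
    clauses = php_clauses(n)
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if all(any(assignment[v] == pol for v, pol in clause)
               for clause in clauses):
            return True
    return False

# The "global" reason -- n + 1 pigeons don't fit in n holes -- is invisible
# to the search, which simply tries every assignment and fails.
assert satisfiable(2) is False
```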
Still, if we want to build our sword-in-the-stone test on the ability to detect "higher-level patterns" in combinatorial search problems, then the burden is on us to explain what we mean by higher-level patterns, and why we think that no polynomial-time Turing machine, even much more sophisticated ones than we can imagine today, could ever detect those patterns as well. For an initial attempt to understand NP-complete problems from a cognitive science perspective, see Baum [22].

4.4 Summary

My conclusion is that, if you oppose the possibility of AI in principle, then either

(i) you can take the "metaphysical route" (as Searle [113] does with the Chinese Room), conceding the possibility of a computer program passing every conceivable empirical test for intelligence, but arguing that that isn't enough, or

(ii) you can conjecture an astronomical lower bound on the resources needed either to run such a program or to write it in the first place, but here there is little question of proof for the foreseeable future.

Crucially, because of the lookup-table argument, one option you do not have is to assert the flat-out impossibility of a computer program passing the Turing Test, with no mention of quantitative complexity bounds.

5 The Problem of Logical Omniscience

Giving a formal account of knowledge is one of the central concerns in modern analytic philosophy; the literature is too vast even to survey here (though see Fagin et al. [53] for a computer-science-friendly overview).
Typically, formal accounts of knowledge involve conventional "logical" axioms, such as

• If you know P and you know Q, then you also know P ∧ Q

supplemented by "modal" axioms having to do with knowledge itself, such as

• If you know P, then you also know that you know P

• If you don't know P, then you know that you don't know P^28

While the details differ, what most formal accounts of knowledge have in common is that they treat an agent's knowledge as closed under the application of various deduction rules like the ones above. In other words, agents are considered logically omniscient: if they know certain facts, then they also know all possible logical consequences of those facts.

Sadly and obviously, no mortal being has ever attained or even approximated this sort of omniscience (recall the Turing quote from the beginning of Section 1). So for example, I can know the rules of arithmetic without knowing Fermat's Last Theorem, and I can know the rules of chess without knowing whether White has a forced win. Furthermore, the difficulty is not (as sometimes claimed) limited to a few domains, such as mathematics and games. As pointed out by Stalnaker [123], if we assumed logical omniscience, then we couldn't account for any contemplation of facts already known to us, and thus for the main activity and one of the main subjects of philosophy itself!

We can now loosely state what Hintikka [72] called the problem of logical omniscience:

    Can we give some formal account of "knowledge" able to accommodate people learning new things without leaving their armchairs?

^28 Not surprisingly, this particular axiom has engendered controversy: it leaves no possibility for Rumsfeldian "unknown unknowns."
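To get a feel for how demanding the closure axioms are, here is a toy computation of my own (not from the text): closing a knowledge base under just the conjunction rule already inflates n atomic facts into 2^n − 1 "known" sentences, before any genuinely hard deduction enters the picture.

```python
from itertools import combinations

def close_under_conjunction(atoms):
    """All conjunctions formed from a finite set of known atomic sentences."""
    known = set()
    for r in range(1, len(atoms) + 1):
        for subset in combinations(sorted(atoms), r):
            known.add(" and ".join(subset))
    return known

kb = close_under_conjunction({"P", "Q", "R"})
# 3 atoms already yield 2^3 - 1 = 7 "known" sentences; n atoms yield 2^n - 1.
assert len(kb) == 7
```

And this counts only conjunctions; adding the modal axioms and genuine deduction rules makes the closure larger still, which is the gulf between the formal models and any mortal knower.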
Of course, one vacuous "solution" would be to declare that your knowledge is simply a list of all the true sentences^29 that you "know," and that, if the list happens not to be closed under logical deductions, so be it! But this "solution" is no help at all at explaining how or why you know things. Can't we do better?

Intuitively, we want to say that your "knowledge" consists of various non-logical facts ("grass is green"), together with some simple consequences of those facts ("grass is not pink"), but not necessarily all the consequences, and certainly not all consequences that involve difficult mathematical reasoning. Unfortunately, as soon as we try to formalize this idea, we run into problems. The most obvious problem is the lack of a sharp boundary between the facts you know right away, and those you "could" know, but only after significant thought. (Recall the discussion of "known primes" from Section 3.3.) A related problem is the lack of a sharp boundary between the facts you know "only if asked about them," and those you know even if you're not asked. Interestingly, these two boundaries seem to cut across each other. For example, while you've probably already encountered the fact that 91 is composite, it might take you some time to remember it; while you've probably never encountered the fact that 83190 is composite, once asked you can probably assent to it immediately.

But as discussed by Stalnaker [123], there's a third problem that seems much more serious than either of the two above. Namely, you might "know" a particular fact if asked about it one way, but not if asked in a different way! To illustrate this, Stalnaker uses an example that we can recognize immediately from the discussion of the P versus NP problem in Section 3.1.
If I asked you whether 43 × 37 = 1591, you could probably answer easily (e.g., by using (40 + 3)(40 − 3) = 40^2 − 3^2). On the other hand, if I instead asked you what the prime factors of 1591 were, you probably couldn't answer so easily.

    But the answers to the two questions have the same content, even on a very fine-grained notion of content. Suppose that we fix the threshold of accessibility so that the information that 43 and 37 are the prime factors of 1591 is accessible in response to the second question, but not accessible in response to the first. Do you know what the prime factors of 1591 are or not? ... Our problem is that we are not just trying to say what an agent would know upon being asked certain questions; rather, we are trying to use the facts about an agent's question answering capacities in order to get at what the agent knows, even if the questions are not asked. [123, p. 253]

To add another example: does a typical four-year-old child "know" that addition of reals is commutative? Certainly not if we asked her in those words, and if we tried to explain the words, she probably wouldn't understand us. Yet if we showed her a stack of books, and asked her whether she could make the stack higher by shuffling the books, she probably wouldn't make a mistake that involved imagining addition was non-commutative. In that sense, we might say she already "implicitly" knows what her math classes will later make explicit.

In my view, these and other examples strongly suggest that only a small part of what we mean by "knowledge" is knowledge about the truth or falsehood of individual propositions.

^29 If we don't require the sentences to be true, then presumably we're talking about belief rather than knowledge.
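Stalnaker's asymmetry is, at bottom, the verification/search asymmetry from Section 3.1; the following few lines (my illustration, with invented names) make it concrete.

```python
def verify(a: int, b: int, n: int) -> bool:
    """Answering "is 43 * 37 = 1591?" takes one multiplication."""
    return a * b == n

def factor_by_trial(n: int):
    """Answering "what are the prime factors of 1591?" requires search;
    trial division takes time exponential in the number of digits of n."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return None  # n is prime (or 1)

assert verify(43, 37, 1591)
assert factor_by_trial(1591) == (37, 43)
```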
crucially, this remains so even if we restrict our attention to "purely verbalizable" knowledge—indeed, knowledge used for answering factual questions—and not (say) knowledge of how to ride a bike or swing a golf club, or knowledge of a person or a place.[30] Many everyday uses of the word "know" support this idea: Do you know calculus? Do you know Spanish? Do you know the rules of bridge? Each of the above questions could be interpreted as asking: do you possess an internal algorithm, by which you can answer a large (and possibly-unbounded) set of questions of some form? While this is rarely made explicit, the examples of this section and of Section 3.3 suggest adding the proviso: ... answer in a reasonable amount of time?

But suppose we accept that "knowing how" (or "knowing a good algorithm for") is a more fundamental concept than "knowing that." How does that help us at all in solving the logical omniscience problem? You might worry that we're right back where we started. After all, if we try to give a formal account of "knowing how," then just like in the case of "knowing that," it will be tempting to write down axioms like the following:

    If you know how to compute f(x) and g(x) efficiently, then you also know how to compute f(x) + g(x) efficiently.

Naturally, we'll then want to take the logical closure of those axioms. But then, before we know it, won't we have conjured into our imaginations a computationally-omniscient superbeing, who could efficiently compute anything at all?

5.1 The Cobham Axioms

Happily, the above worry turns out to be unfounded. We can write down reasonable axioms for "knowing how to compute efficiently," and then go ahead and take the closure of those axioms, without getting the unwanted consequence of computational omniscience.
Explaining this point will involve a digression into an old and fascinating corner of complexity theory—one that probably holds independent interest for philosophers.

As is well known, in the 1930s Church and Kleene proposed definitions of the "computable functions" that turned out to be precisely equivalent to Turing's definition, but that differed from Turing's in making no explicit mention of machines. Rather than analyzing the process of computation, the Church-Kleene approach was simply to list axioms that the computable functions of natural numbers f : N → N ought to satisfy—for example, "if f(x) and g(x) are both computable, then so is f(g(x))"—and then to define "the" computable functions as the smallest set satisfying those axioms.

In 1965, Alan Cobham [42] asked whether the same could be done for the efficiently or feasibly computable functions. As an answer, he offered axioms that precisely characterize what today we call FP, or Function Polynomial-Time (though Cobham called it L). The class FP consists of all functions of natural numbers f : N → N that are computable in polynomial time by a deterministic Turing machine. Note that FP is "morally" the same as the class P (Polynomial-Time) defined in Section 3.1: they differ only in that P is a class of decision problems (or equivalently, functions f : N → {0, 1}), whereas FP is a class of functions with integer range.

[30] For "knowing" a person suggests having actually met the person, while "knowing" a place suggests having visited the place. Interestingly, in Hebrew, one uses a completely different verb for "know" in the sense of "being familiar with" (makir) than for "know" in the intellectual sense (yodeya).
What was noteworthy about Cobham's characterization of polynomial time was that it didn't involve any explicit mention of either computing devices or bounds on their running time. Let me now list a version of Cobham's axioms, adapted from Arora, Impagliazzo, and Vazirani [16]. Each of the axioms talks about which functions of natural numbers f : N → N are "efficiently computable."

(1) Every constant function f is efficiently computable, as is every function which is nonzero only finitely often.

(2) Pairing: If f(x) and g(x) are efficiently computable, then so is ⟨f(x), g(x)⟩, where ⟨ , ⟩ is some standard pairing function for the natural numbers.

(3) Composition: If f(x) and g(x) are efficiently computable, then so is f(g(x)).

(4) Grab Bag: The following functions are all efficiently computable:
  • the arithmetic functions x + y and x × y
  • |x| = ⌊log₂ x⌋ + 1 (the number of bits in x's binary representation)
  • the projection functions Π₁(⟨x, y⟩) = x and Π₂(⟨x, y⟩) = y
  • bit(⟨x, i⟩) (the i-th bit of x's binary representation, or 0 if i > |x|)
  • diff(⟨x, i⟩) (the number obtained from x by flipping its i-th bit)
  • 2^{|x|²} (called the "smash function")

(5) Bounded Recursion: Suppose f(x) is efficiently computable, and |f(x)| ≤ |x| for all x ∈ N. Then the function g(⟨x, k⟩), defined by

  g(⟨x, k⟩) = f(g(⟨x, ⌊k/2⌋⟩)) if k > 1,
  g(⟨x, k⟩) = x if k = 1,

is also efficiently computable.

A few comments about the Cobham axioms might be helpful. First, the axiom that "does most of the work" is (5). Intuitively, given any natural number k ∈ N that we can generate starting from the original input x ∈ N, the Bounded Recursion axiom lets us set up a "computational process" that runs for log₂ k steps.
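To make this concrete, here is a small illustrative sketch, not part of Cobham's formalism but an assumed Python rendering of axioms (4) and (5), with `int.bit_length()` playing the role of |x|. It shows that Bounded Recursion applies f exactly ⌊log₂ k⌋ times, and that iterating the smash function inflates bit-lengths polynomially:

```python
def smash(x):
    # the "smash function" from axiom (4): 2^{|x|^2}
    return 2 ** (x.bit_length() ** 2)

def bounded_recursion(f, x, k):
    # axiom (5): g(<x, k>) = x if k = 1, else f(g(<x, floor(k/2)>))
    # (k <= 1 guards the base case so k = 0 also terminates)
    if k <= 1:
        return x
    return f(bounded_recursion(f, x, k // 2))

# Counting applications of f: since k is halved at each level, f is
# applied floor(log2 k) times, a "computational process" of log2 k steps.
calls = []
result = bounded_recursion(lambda y: calls.append(1) or y + 1, 0, 1024)
assert len(calls) == 10                  # floor(log2 1024) = 10
assert result == 10

# Iterating smash: an 8-bit input becomes 8^2 + 1 = 65 bits, then
# 65^2 + 1 = 4226 bits, giving arbitrary polynomial growth in bit-length.
x = 0b10110110
assert smash(x).bit_length() == 65
assert smash(smash(x)).bit_length() == 4226
```

(The toy f here is chosen only to count steps; it does not satisfy the |f(x)| ≤ |x| side condition for every x, which the real axiom requires.)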
Second, the role of the "smash function," 2^{|x|²}, is to let us map n-bit integers to n²-bit integers to n⁴-bit integers and so on, and thereby (in combination with the Bounded Recursion axiom) set up computational processes that run for arbitrary polynomial numbers of steps. Third, although addition and multiplication are included as "efficiently computable functions," it is crucial that exponentiation is not included. Indeed, if x and y are n-bit integers, then x^y might require exponentially many bits just to write down.

The basic result is then the following:

Theorem 1 ([42, 110]) The class FP, of functions f : N → N computable in polynomial time by a deterministic Turing machine, satisfies axioms (1)-(5), and is the smallest class that does so.

To prove Theorem 1, one needs to do two things, neither of them difficult: first, show that any function f that can be defined using the Cobham axioms can also be computed in polynomial time; and second, show that the Cobham axioms are enough to simulate any polynomial-time Turing machine.

One drawback of the Cobham axioms is that they seem to "sneak in the concept of polynomial-time through the back door"—both through the "smash function," and through the arbitrary-looking condition |f(x)| ≤ |x| in axiom (5). In the 1990s, however, Leivant [86] and Bellantoni and Cook [25] both gave more "elegant" logical characterizations of FP that avoid this problem. So for example, Leivant showed that a function f belongs to FP, if and only if f is computed by a program that can be proved correct in second-order logic with comprehension restricted to positive quantifier-free formulas.
Results like these provide further evidence—if any was needed—that polynomial-time computability is an extremely natural notion: a "wide target in conceptual space" that one hits even while aiming in purely logical directions.

Over the past few decades, the idea of defining complexity classes such as P and NP in "logical, machine-free" ways has given rise to an entire field called descriptive complexity theory, which has deep connections with finite model theory. While further discussion of descriptive complexity theory would take us too far afield, see the book of Immerman [77] for the definitive introduction, or Fagin [52] for a survey.

5.2 Omniscience Versus Infinity

Returning to our original topic, how exactly do axiomatic theories such as Cobham's (or Church's and Kleene's, for that matter) escape the problem of omniscience? One straightforward answer is that, unlike the set of true sentences in some formal language, which is only countably infinite, the set of functions f : N → N is uncountably infinite. And therefore, even if we define the "efficiently-computable" functions f : N → N by taking a countably-infinite logical closure, we are sure to miss some functions f (in fact, almost all of them!).

The observation above suggests a general strategy to tame the logical omniscience problem. Namely, we could refuse to define an agent's "knowledge" in terms of which individual questions she can quickly answer, and insist on speaking instead about which infinite families of questions she can quickly answer. In slogan form, we want to "fight omniscience with infinity."

Let's see how, by taking this route, we can give semi-plausible answers to the puzzles about knowledge discussed earlier in this section.
First, the reason why you can "know" that 1591 = 43 × 37, but at the same time not "know" the prime factors of 1591, is that, when we speak about knowing the answers to these questions, we really mean knowing how to answer them. And as we saw, there need not be any contradiction in knowing a fast multiplication algorithm but not a fast factoring algorithm, even if we model your knowledge about algorithms as deductively closed. To put it another way, by embedding the two questions

  Q1 = "Is 1591 = 43 × 37?"
  Q2 = "What are the prime factors of 1591?"

into infinite families of related questions, we can break the symmetry between the knowledge entailed in answering them.

Similarly, we could think of a child as possessing an internal algorithm which, given any statement of the form x + y = y + x (for specific x and y values), immediately outputs true, without even examining x and y. However, the child does not yet have the ability to process quantified statements, such as "∀x, y ∈ R  x + y = y + x." In that sense, she still lacks the explicit knowledge that addition is commutative.

Although the "cure" for logical omniscience sketched above solves some puzzles, not surprisingly it raises many puzzles of its own. So let me end this section by discussing three major objections to the "infinity cure."

The first objection is that we've simply pushed the problem of logical omniscience somewhere else. For suppose an agent "knows" how to compute every function in some restricted class such as FP. Then how can we ever make sense of the agent learning a new algorithm?
One natural response is that, even if you have the "latent ability" to compute a function f ∈ FP, you might not know that you have the ability—either because you don't know a suitable algorithm, or because you do know an algorithm, but don't know that it's an algorithm for f. Of course, if we wanted to pursue things to the bottom, we'd next need to tell a story about knowledge of algorithms, and how logical omniscience is avoided there. However, I claim that this represents progress! For notice that, even without such a story, we can already explain some failures of logical omniscience. For example, the reason why you don't know the factors of a large number might not be your ignorance of a fast factoring method, but rather that no such method exists.

The second objection is that, when I advocated focusing on infinite families of questions rather than single questions in isolation, I never specified which infinite families. The difficulty is that the same question could be generalized in wildly different ways. As an example, consider the question

  Q = "Is 432,150 composite?"

Q is an instance of a computational problem that humans find very hard: "given a large integer N, is N composite?" However, Q is also an instance of a computational problem that humans find very easy: "given a large integer N ending in 0, is N composite?" And indeed, we'd expect a person to know the answer to Q if she noticed that 432,150 ends in 0, but not otherwise. To me, what this example demonstrates is that, if we want to discuss an agent's knowledge in terms of individual questions such as Q, then the relevant issue will be whether there exists a generalization G of Q, such that the agent knows a fast algorithm for answering questions of type G, and also recognizes that Q is of type G.
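Both asymmetries above can be made concrete: knowing that 1591 = 43 × 37 without knowing how to factor, and answering Q only via an easy generalization. The following Python sketch is purely illustrative (the function names are mine, not standard ones): verifying a product is a single arithmetic identity, recovering the factors requires search, and noticing a trailing 0 gives a fast algorithm for one particular family of compositeness questions:

```python
# Verifying the product is a single arithmetic identity:
assert (40 + 3) * (40 - 3) == 40**2 - 3**2 == 1591

def prime_factors(n):
    # Recovering the factors requires search (here, trial division).
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

assert prime_factors(1591) == [37, 43]

def composite_because_ends_in_zero(n):
    # A fast algorithm for the *easy* generalization of Q: any integer
    # greater than 10 whose decimal form ends in 0 is divisible by 10,
    # hence composite.  Outside that family, this "agent" has no answer.
    if n > 10 and n % 10 == 0:
        return True
    return None

assert composite_because_ends_in_zero(432150) is True
assert composite_because_ends_in_zero(432151) is None
```

The point is not the code itself but which families it covers: the agent's "knowledge" is the set of question-families for which it has a fast procedure, plus its ability to recognize an instance as belonging to one of them.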
The third objection is just the standard one about the relationship between asymptotic complexity and finite statements. For example, if we model an agent's knowledge using the Cobham axioms, then we can indeed explain why the agent doesn't know how to play perfect chess on an n × n board, for arbitrary values of n.[31] But on a standard 8 × 8 board, playing perfect chess would "merely" require (say) ∼10⁶⁰ computational steps, which is a constant, and therefore certainly polynomial! So strictly on the basis of the Cobham axioms, what explanation could we possibly offer for why a rational agent, who knew the rules of 8 × 8 chess, didn't also know how to play it optimally? While this objection might sound devastating, it's important to understand that it's no different from the usual objection leveled against complexity-theoretic arguments, and can be given the usual response. Namely: asymptotic statements are always vulnerable to being rendered irrelevant, if the constant factors turned out to be ridiculous. However, experience has shown that, for whatever reasons, that happens rarely enough that one can usually take asymptotic behavior as "having explanatory force until proven otherwise." (Section 12 will say more about the explanatory force of asymptotic claims, as a problem requiring philosophical analysis.)

[31] For chess on an n × n board is known to be EXP-complete, and it is also known that P ≠ EXP. See Section 10, and particularly footnote 60, for more details.

5.3 Summary

Because of the difficulties pointed out in Section 5.2, my own view is that computational complexity theory has not yet come close to "solving" the logical omniscience problem, in the sense of giving a satisfying formal account of knowledge that also avoids making absurd predictions. I have no idea whether such an account is even possible.[32]
32 Ho w ever, wh at I’ve tried to sho w in this section is that complexit y theory provides a w ell-defined “limiting case” wh ere the logi cal omniscience problem is solv able, ab out a s well a s one could hop e it to b e. Th e limiting case is where the size of the questions g rows w ithout b ound, and the solution there i s giv en b y the Cobham a xioms: “axioms of knowing how” w hose logical closure one c an take without thereb y inviting omniscience. In other words, when we con template th e omn iscience problem, I claim th at we’ r e in a situation similar to one often faced in physics—where we m ight b e at a loss to un d erstand some p henomenon (sa y , gra vitational ent r op y), e xc ept in limiting cases such as b lac k h ole s. I n epistemology j u st lik e in physics, the limiting cases that we do more-or-less unders tand offer an ob vious starting p oint for those wishin g to tac kle the general case. 6 Compu tati onalism and W aterfalls Ov er the past t wo decades, a certain argum ent ab out computation—whic h I’ll call the waterfal l ar- gument —has b een w idely discussed by philosophers of mind. 33 Lik e Searle’s famous Chinese Ro om argumen t [113], th e wa terfall argum en t seeks to show that computations are “inh eren tly syn tactic,” and can never b e “ab out” anything—and that for this reason, the do ctrine of “computationali sm ” is false. 34 But unlike the Chinese Ro om, t h e waterfall argumen t supplemen ts the bare app eal to in tuition by a fu rther claim: namely , that the “meaning” of a compu tation, to w h atev er exten t it has one, is alw a ys r elative to some e xterna l observer . More concretely , consider a w aterfall (though an y other p hysical system with a large enough state space would do as we ll). Here I do not mean a w aterfall that w as sp ecially engineered to p erform computations, but r e al ly a naturally-o ccurring w aterfall: sa y , Niaga r a F alls. 
Being governed by laws of physics, the waterfall implements some mapping f from a set of possible initial states to a set of possible final states. If we accept that the laws of physics are reversible, then f must also be injective. Now suppose we restrict attention to some finite subset S of possible initial states, with |S| = n. Then f is just a one-to-one mapping from S to some output set T = f(S) with |T| = n. The "crucial observation" is now this: given any permutation σ from the set of integers {1, ..., n} to itself, there is some way to label the elements of S and T by integers in {1, ..., n}, such that we can interpret f as implementing σ. For example, if we let S = {s₁, ..., sₙ} and f(sᵢ) = tᵢ, then it suffices to label the initial state sᵢ by i and the final state tᵢ by σ(i). But the permutation σ could have any "semantics" we like: it might represent a program for playing chess, or factoring integers, or simulating a different waterfall. Therefore "mere computation" cannot give rise to semantic meaning. Here is how Searle [114, p.

[32] Compare the pessimism expressed by Paul Graham [68] about knowledge representation more generally: "In practice formal logic is not much use, because despite some progress in the last 150 years we're still only able to formalize a small percentage of statements. We may never do that much better, for the same reason 1980s-style 'knowledge representation' could never have worked; many statements may have no representation more concise than a huge, analog brain state."

[33] See Putnam [106, appendix] and Searle [114] for two instantiations of the argument (though the formal details of either will not concern us here).

[34] "Computationalism" refers to the view that the mind is literally a computer, and that thought is literally a type of computation.
57] expresses the conclusion:

    If we are consistent in adopting the Turing test or some other "objective" criterion for intelligent behavior, then the answer to such questions as "Can unintelligent bits of matter produce intelligent behavior?" and even, "How exactly do they do it" are ludicrously obvious. Any thermostat, pocket calculator, or waterfall produces "intelligent behavior," and we know in each case how it works. Certain artifacts are designed to behave as if they were intelligent, and since everything follows laws of nature, then everything will have some description under which it behaves as if it were intelligent. But this sense of "intelligent behavior" is of no psychological relevance at all.

The waterfall argument has been criticized on numerous grounds: see Haugeland [71], Block [30], and especially Chalmers [37] (who parodied the argument by proving that a cake recipe, being merely syntactic, can never give rise to the semantic attribute of crumbliness). To my mind, though, perhaps the easiest way to demolish the waterfall argument is through computational complexity considerations.

Indeed, suppose we actually wanted to use a waterfall to help us calculate chess moves. How would we do that? In complexity terms, what we want is a reduction from the chess problem to the waterfall-simulation problem. That is, we want an efficient algorithm that somehow encodes a chess position P into an initial state s_P ∈ S of the waterfall, in such a way that a good move from P can be read out efficiently from the waterfall's corresponding final state, f(s_P) ∈ T.[35] But what would such an algorithm look like? We cannot say for sure—certainly not without detailed knowledge about f (i.e., the physics of waterfalls), as well as the means by which the S and T elements are encoded as binary strings.
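Before turning to that question, it is worth noting that the "crucial observation" driving the waterfall argument is easy to verify directly. A minimal sketch, assuming an arbitrary injective map as a stand-in for the waterfall's physics: whatever injection f we pick, suitable labels make f "implement" any permutation σ we like:

```python
import random

n = 8
S = list(range(n))                      # arbitrary distinct "initial states"
f = {s: (5 * s + 3) % 17 for s in S}    # some fixed injective map, standing in
                                        # for the waterfall's physics
T = [f[s] for s in S]

sigma = list(range(1, n + 1))
random.shuffle(sigma)                   # ANY permutation of {1, ..., n}

# Label initial state s_i by i, and final state f(s_i) by sigma(i):
in_label = {S[i]: i + 1 for i in range(n)}
out_label = {T[i]: sigma[i] for i in range(n)}

# Under these labels, f "implements" sigma -- whichever sigma we chose:
implemented = {in_label[s]: out_label[f[s]] for s in S}
assert implemented == {i + 1: sigma[i] for i in range(n)}
```

The computational work, of course, is hidden entirely in choosing the labels; the question is what it would cost to find them.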
But for any reasonable choice, it seems overwhelmingly likely that any reduction algorithm would just solve the chess problem itself, without using the waterfall in an essential way at all! A bit more precisely, I conjecture that, given any chess-playing algorithm A that accesses a "waterfall oracle" W, there is an equally-good chess-playing algorithm A′, with similar time and space requirements, that does not access W. If this conjecture holds, then it gives us a perfectly observer-independent way to formalize our intuition that the "semantics" of waterfalls have nothing to do with chess.[36]

[35] Technically, this describes a restricted class of reductions, called nonadaptive reductions. An adaptive reduction from chess to waterfalls might solve a chess problem by some procedure that involves initializing a waterfall and observing its final state, then using the results of that aquatic computation to initialize a second waterfall and observe its final state, and so on for some polynomial number of repetitions.

[36] The perceptive reader might suspect that we smuggled our conclusion into the assumption that the waterfall states s_P ∈ S and f(s_P) ∈ T were encoded as binary strings in a "reasonable" way (and not, for example, in a way that encodes the solution to the chess problem). But a crucial lesson of complexity theory is that, when we discuss "computational problems," we always make an implicit commitment about the input and output encodings anyway! So for example, if positive integers were given as input via their prime factorizations, then the factoring problem would be trivial (just apply the identity function). But who cares?
If, in mathematically defining the waterfall-simulation problem, we required input and output encodings that entailed solving chess problems, then it would no longer be reasonable to call our problem (solely) a "waterfall-simulation problem" at all.

6.1 "Reductions" That Do All The Work

Interestingly, the issue of "trivial" or "degenerate" reductions also arises within complexity theory, so it might be instructive to see how it is handled there.

Recall from Section 3.1 that a problem is NP-complete if, loosely speaking, it is "maximally hard among all NP problems" (NP being the class of problems for which solutions can be checked in polynomial time). More formally, we say that L is NP-complete if

(i) L ∈ NP, and
(ii) given any other NP problem L′, there exists a polynomial-time algorithm to solve L′ using access to an oracle that solves L. (Or more succinctly, L′ ∈ P^L, where P^L denotes the complexity class P augmented by an L-oracle.)

The concept of NP-completeness had incredible explanatory power: it showed that thousands of seemingly-unrelated problems from physics, biology, industrial optimization, mathematical logic, and other fields were all identical from the standpoint of polynomial-time computation, and that not one of these problems had an efficient solution unless P = NP. Thus, it was natural for theoretical computer scientists to want to define an analogous concept of P-completeness. In other words: among all the problems that are solvable in polynomial time, which ones are "maximally hard"?

But how should P-completeness even be defined? To see the difficulty, suppose that, by analogy with NP-completeness, we say that L is P-complete if (i) L ∈ P and (ii) L′ ∈ P^L for every L′ ∈ P. Then it is easy to see that the second condition is vacuous: every P problem is P-complete!
For in "reducing" L′ to L, a polynomial-time algorithm can always just ignore the L-oracle and solve L′ by itself, much like our hypothetical chess program that ignored its waterfall oracle. Because of this, condition (ii) must be replaced by a stronger condition; one popular choice is

(ii′) L′ ∈ LOGSPACE^L for every L′ ∈ P.

Here LOGSPACE means, informally, the class of problems solvable by a deterministic Turing machine with a read/write memory consisting of only log n bits, given an input of size n.[37] It's not hard to show that LOGSPACE ⊆ P, and this containment is strongly believed to be strict (though just like with P ≠ NP, there is no proof yet). The key point is that, if we want a non-vacuous notion of completeness, then the reducing complexity class needs to be weaker (either provably or conjecturally) than the class being reduced to. In fact complexity classes even smaller than LOGSPACE almost always suffice in practice.

In my view, there is an important lesson here for debates about computationalism. Suppose we want to claim, for example, that a computation that plays chess is "equivalent" to some other computation that simulates a waterfall. Then our claim is only non-vacuous if it's possible to exhibit the equivalence (i.e., give the reductions) within a model of computation that isn't itself powerful enough to solve the chess or waterfall problems.

[37] Note that a LOGSPACE machine does not even have enough memory to store its input string! For this reason, we think of the input string as being provided on a special read-only tape.

7 PAC-Learning and the Problem of Induction

Centuries ago, David Hume [76] famously pointed out that learning from the past (and, by extension, science) seems logically impossible.
For example, if we sample 500 ravens and every one of them is black, why does that give us any grounds—even probabilistic grounds—for expecting the 501st raven to be black also? Any modern answer to this question would probably refer to Occam's razor, the principle that simpler hypotheses consistent with the data are more likely to be correct. So for example, the hypothesis that all ravens are black is "simpler" than the hypothesis that most ravens are green or purple, and that only the 500 we happened to see were black. Intuitively, it seems Occam's razor must be part of the solution to Hume's problem; the difficulty is that such a response leads to questions of its own:

(1) What do we mean by "simpler"?

(2) Why are simple explanations likely to be correct? Or, less ambitiously: what properties must reality have for Occam's Razor to "work"?

(3) How much data must we collect before we can find a "simple hypothesis" that will probably predict future data? How do we go about finding such a hypothesis?

In my view, the theory of PAC (Probabilistically Approximately Correct) Learning, initiated by Leslie Valiant [127] in 1984, has made large enough advances on all of these questions that it deserves to be studied by anyone interested in induction.[38] In this theory, we consider an idealized "learner," who is presented with points x₁, ..., x_m drawn randomly from some large set S, together with the "classifications" f(x₁), ..., f(x_m) of those points. The learner's goal is to infer the function f, well enough to be able to predict f(x) for most future points x ∈ S. As an example, the learner might be a bank, S might be a set of people (represented by their credit histories), and f(x) might represent whether or not person x will default on a loan.
For simplicity, we often assume that S is a set of binary strings, and that the function f maps each x ∈ S to a single bit, f(x) ∈ {0, 1}. Both assumptions can be removed without significantly changing the theory. The important assumptions are the following:

(1) Each of the sample points x₁, ..., x_m is drawn independently from some (possibly-unknown) "sample distribution" D over S. Furthermore, the future points x on which the learner will need to predict f(x) are drawn from the same distribution.

(2) The function f belongs to a known "hypothesis class" H. This H represents "the set of possibilities the learner is willing to entertain" (and is typically much smaller than the set of all 2^{|S|} possible functions from S to {0, 1}).

Under these assumptions, we have the following central result.

Theorem 2 (Valiant [127]) Consider a finite hypothesis class H, a Boolean function f : S → {0, 1} in H, and a sample distribution D over S, as well as an error rate ε > 0 and failure probability δ > 0 that the learner is willing to tolerate. Call a hypothesis h : S → {0, 1} "good" if

  Pr_{x∼D} [h(x) = f(x)] ≥ 1 − ε.

Also, call sample points x₁, ..., x_m "reliable" if any hypothesis h ∈ H that satisfies h(xᵢ) = f(xᵢ) for all i ∈ {1, ..., m} is good. Then

  m = (1/ε) ln(|H|/δ)

sample points x₁, ..., x_m drawn independently from D will be reliable with probability at least 1 − δ.

[38] See Kearns and Vazirani [82] for an excellent introduction to PAC-learning, and de Wolf [136] for previous work applying PAC-learning to philosophy and linguistics: specifically, to fleshing out Chomsky's "poverty of the stimulus" argument. De Wolf also discusses several formalizations of Occam's Razor other than the one based on PAC-learning.
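To get a feel for the bound, here is a quick numeric check (a hypothetical sketch; the parameter values are just examples, and the function name is mine). With hypotheses describable in n = 1000 bits (so |H| = 2¹⁰⁰⁰), error rate ε = 0.01, and failure probability δ = 10⁻⁶, the required sample size comes to roughly 70,000 points: linear in n, despite the exponentially large hypothesis class:

```python
import math

def pac_sample_size(hypothesis_bits, eps, delta):
    # m = (1/eps) * ln(|H| / delta), with |H| = 2**hypothesis_bits
    return math.ceil((hypothesis_bits * math.log(2) + math.log(1 / delta)) / eps)

m = pac_sample_size(1000, 0.01, 1e-6)
assert 70_000 < m < 71_000    # about 70,697 samples suffice
```

Doubling the description length roughly doubles m, while shrinking δ by further exponential factors costs only additively.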
Intuitively, Theorem 2 says that the behavior of f on a small number of randomly-chosen points probably determines its behavior on most of the remaining points. In other words, if, by some unspecified means, the learner manages to find any hypothesis h ∈ H that makes correct predictions on all its past data points x₁, ..., x_m, then provided m is large enough (and as it happens, m doesn't need to be very large), the learner can be statistically confident that h will also make the correct predictions on most future points.

The part of Theorem 2 that bears the unmistakable imprint of complexity theory is the bound on sample size, m ≥ (1/ε) ln(|H|/δ). This bound has three notable implications. First, even if the class H contains exponentially many hypotheses (say, 2^n), one can still learn an arbitrary function f ∈ H using a linear amount of sample data, since m grows only logarithmically with |H|: in other words, like the number of bits needed to write down an individual hypothesis. Second, one can make the probability that the hypothesis h will fail to generalize exponentially small (say, δ = 2^{−n}), at the cost of increasing the sample size m by only a linear factor. Third, assuming the hypothesis does generalize, its error rate ε decreases inversely with m. It is not hard to show that each of these dependencies is tight, so that for example, if we demand either ε = 0 or δ = 0 then no finite m suffices. This is the origin of the name "PAC-learning": the most one can hope for is to output a hypothesis that is "probably, approximately" correct.

The proof of Theorem 2 is easy: consider any hypothesis h ∈ H that is bad, meaning that

  Pr_{x∼D} [h(x) = f(x)] < 1 − ε.

Then by the independence assumption,

  Pr_{x₁,...,x_m∼D} [h(x₁) = f(x₁) ∧ ··· ∧ h(x_m) = f(x_m)] < (1 − ε)^m.
Now, the number of bad hypotheses is no more than the total number of hypotheses, |H|. So by the union bound, the probability that there exists a bad hypothesis agreeing with f on all of x_1, ..., x_m can be at most |H| · (1 − ε)^m. So to ensure a failure probability of at most δ, it suffices to choose m such that |H| · (1 − ε)^m ≤ δ; since (1 − ε)^m ≤ e^{−εm}, taking m = (1/ε) ln(|H|/δ) does the job.

The relevance of Theorem 2 to Hume's problem of induction is that the theorem describes a nontrivial class of situations where induction is guaranteed to work with high probability. Theorem 2 also illuminates the role of Occam's Razor in induction. In order to learn using a "reasonable" number of sample points m, the hypothesis class H must have a sufficiently small cardinality. But that is equivalent to saying that every hypothesis h ∈ H must have a succinct description—since the number of bits needed to specify an arbitrary hypothesis h ∈ H is simply ⌈log_2 |H|⌉. If the number of bits needed to specify a hypothesis is too large, then H will always be vulnerable to the problem of overfitting: some hypotheses h ∈ H surviving contact with the sample data just by chance.

As pointed out to me by Agustín Rayo, there are several possible interpretations of Occam's Razor that have nothing to do with descriptive complexity: for example, we might want our hypotheses to be "simple" in terms of their ontological or ideological commitments. However, to whatever extent we interpret Occam's Razor as saying that shorter or lower-complexity hypotheses are preferable, Theorem 2 comes closer than one might have thought possible to a mathematical justification for why the Razor works.

Many philosophers might be familiar with alternative formal approaches to Occam's Razor.
For example, within a Bayesian framework, one can choose a prior over all possible hypotheses that gives greater weight to "simpler" hypotheses (where simplicity is measured, for example, by the length of the shortest program that computes the predictions). However, while the PAC-learning and Bayesian approaches are related, the PAC approach has the advantage of requiring only a qualitative decision about which hypotheses one wants to consider, rather than a quantitative prior over hypotheses. Given the hypothesis class H, one can then seek learning methods that work for any f ∈ H. (On the other hand, the PAC approach requires an assumption about the probability distribution over observations, while the Bayesian approach does not.)

7.1 Drawbacks of the Basic PAC Model

I'd now like to discuss three drawbacks of Theorem 2, since I think the drawbacks illuminate philosophical aspects of induction as well as the advantages do.

The first drawback is that Theorem 2 works only for finite hypothesis classes. In science, however, hypotheses often involve continuous parameters, of which there is an uncountable infinity. Of course, one could solve this problem by simply discretizing the parameters, but then the number of hypotheses (and therefore the relevance of Theorem 2) would depend on how fine the discretization was. Fortunately, we can avoid such difficulties by realizing that the learner only cares about the "differences" between two hypotheses insofar as they lead to different predictions. This leads to the fundamental notion of VC-dimension (after its originators, Vapnik and Chervonenkis [129]).

Definition 3 (VC-dimension) A hypothesis class H shatters the sample points {x_1, ..., x_k} ⊆ S if for all 2^k possible settings of h(x_1), ..., h(x_k), there exists a hypothesis h ∈ H compatible with those settings. Then VCdim(H), the VC-dimension of H, is the largest k for which there exists a subset {x_1, ..., x_k} ⊆ S that H shatters (or if no finite maximum exists, then VCdim(H) = ∞).

Clearly any finite hypothesis class has finite VC-dimension: indeed, VCdim(H) ≤ log_2 |H|. However, even an infinite hypothesis class can have finite VC-dimension if it is "sufficiently simple." For example, let H be the class of all functions h_{a,b} : R → {0, 1} of the form

    h_{a,b}(x) = 1 if a ≤ x ≤ b, and 0 otherwise.

Then it is easy to check that VCdim(H) = 2.

With the notion of VC-dimension in hand, we can state a powerful (and harder-to-prove!) generalization of Theorem 2, due to Blumer et al. [31].

Theorem 4 (Blumer et al. [31]) For some universal constant K > 0, the bound on m in Theorem 2 can be replaced by

    m = K · (VCdim(H)/ε) · ln(1/(δε)),

with the theorem now holding for any hypothesis class H, finite or infinite.

If H has infinite VC-dimension, then it is easy to construct a probability distribution D over sample points such that no finite number m of samples from D suffices to PAC-learn a function f ∈ H: one really is in the unfortunate situation described by Hume, of having no grounds at all for predicting that the next raven will be black. In some sense, then, Theorem 4 is telling us that finite VC-dimension is a necessary and sufficient condition for scientific induction to be possible. Once again, Theorem 4 also has an interpretation in terms of Occam's Razor, with the smallness of the VC-dimension now playing the role of simplicity.

The second drawback of Theorem 2 is that it gives us no clues about how to find a hypothesis h ∈ H consistent with the sample data. All it says is that, if we find such an h, then h will probably be close to the truth.
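As a sanity check on the claim above that the interval class has VC-dimension 2, here is a small brute-force Python test (a sketch of my own; the helper names are hypothetical). It verifies that two points can be shattered by hypotheses h_{a,b}, but three cannot:

```python
from itertools import product

def interval_consistent(points, labels):
    """Is some h_{a,b} (1 on [a, b], 0 elsewhere) consistent with the labels?

    A labeling is realizable by an interval iff every point lying between
    the leftmost and rightmost 1-labeled points is itself labeled 1.
    """
    ones = [x for x, lab in zip(points, labels) if lab == 1]
    if not ones:
        return True  # realized by an empty interval, e.g. [1, 0]
    a, b = min(ones), max(ones)
    return all(lab == 1 for x, lab in zip(points, labels) if a <= x <= b)

def shatters(points):
    """Do the interval hypotheses realize all 2^k labelings of `points`?"""
    return all(interval_consistent(points, labs)
               for labs in product([0, 1], repeat=len(points)))

# Any two distinct points are shattered, but no three can be:
# the labeling (1, 0, 1) on sorted points is never an interval.
```

Since two points are shattered and three are not, VCdim(H) = 2, as claimed.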
This second drawback illustrates that, even in the simple setup envisioned by PAC-learning, induction cannot be merely a matter of seeing enough data and then "generalizing" from it, because immense computations might be needed to find a suitable generalization! Indeed, following the work of Kearns and Valiant [81], we now know that many natural learning problems—as an example, inferring the rules of a regular or context-free language from random examples of grammatical and ungrammatical sentences—are computationally intractable in an extremely strong sense: any polynomial-time algorithm for finding a hypothesis consistent with the data would imply a polynomial-time algorithm for breaking widely-used cryptosystems such as RSA![39]

The appearance of cryptography in the above statement is far from accidental. In a sense that can be made precise, learning and cryptography are "dual" problems: a learner wants to find patterns in data, while a cryptographer wants to generate data whose patterns are hard to find. More concretely, one of the basic primitives in cryptography is called a pseudorandom function family. This is a family of efficiently-computable Boolean functions f_s : {0, 1}^n → {0, 1}, parameterized by a short random "seed" s, that are virtually indistinguishable from random functions by a polynomial-time algorithm. Here, we imagine that the would-be distinguishing algorithm can query the function f_s on various points x, and also that it knows the mapping from s to f_s, and so is ignorant only of the seed s itself. There is strong evidence in cryptography that pseudorandom function families exist: indeed, Goldreich, Goldwasser, and Micali [64] showed how to construct one starting from any pseudorandom generator (the latter was mentioned in Section 1.1).
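To make the interface of a pseudorandom function family concrete, here is a toy Python sketch of my own. It is only a heuristic stand-in—it hashes the seed together with the input, whereas the Goldreich–Goldwasser–Micali construction referenced above is built from a pseudorandom generator—but it illustrates the setup the text describes:

```python
import hashlib

def make_f(seed: bytes):
    """A toy 'pseudorandom function' f_s : {0,1}^n -> {0,1}.

    Heuristic stand-in only: we hash seed || x and keep one bit. This is
    not the Goldreich-Goldwasser-Micali construction, but the interface
    is the same: an adversary may query f_s at any points it likes, and
    knows the mapping from seeds to functions, yet is ignorant of the
    seed s itself.
    """
    def f(x: str) -> int:          # x is a bitstring such as "0110"
        digest = hashlib.sha256(seed + x.encode("ascii")).digest()
        return digest[0] & 1
    return f

f_s = make_f(b"short random seed")
training = {x: f_s(x) for x in ["000", "001", "010", "011"]}
# A PAC-learner sees `training` and must predict f_s on new points like
# "100" -- succeeding would distinguish f_s from a truly random function.
```

The next paragraph of the text turns exactly this observation into a hardness result for learning.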
Now, given a pseudorandom function family {f_s}, imagine a PAC-learner whose hypothesis class H consists of f_s for all possible seeds s. The learner is provided some randomly-chosen sample points x_1, ..., x_m ∈ {0, 1}^n, together with the values of f_s on those points: f_s(x_1), ..., f_s(x_m). Given this "training data," the learner's goal is to figure out how to compute f_s for itself—and thereby predict the values of f_s(x) on new points x, points not in the training sample. Unfortunately, it's easy to see that if the learner could do that, then it would thereby distinguish f_s from a truly random function—and thereby contradict our starting assumption that {f_s} was pseudorandom! Our conclusion is that, if the basic assumptions of modern cryptography hold (and in particular, if there exist pseudorandom generators), then there must be situations where learning is impossible purely because of computational complexity (and not because of insufficient data).

[39] In the setting of "proper learning"—where the learner needs to output a hypothesis in some specified format—it is even known that many natural PAC-learning problems are NP-complete (see Pitt and Valiant [104] for example). But in the "improper" setting—where the learner can describe its hypothesis using any polynomial-time algorithm—it is only known how to show that PAC-learning problems are hard under cryptographic assumptions, and there seem to be inherent reasons for this (see Applebaum, Barak, and Xiao [14]).

The third drawback of Theorem 2 is the assumption that the distribution D from which the learner is tested is the same as the distribution from which the sample points were drawn.
To me, this is the most serious drawback, since it tells us that PAC-learning models the "learning" performed by an undergraduate cramming for an exam by solving last year's problems, or an employer using a regression model to identify the characteristics of successful hires, or a cryptanalyst breaking a code from a collection of plaintexts and ciphertexts. It does not, however, model the "learning" of an Einstein or a Szilard, making predictions about phenomena that are different in kind from anything yet observed. As David Deutsch stresses in his recent book The Beginning of Infinity [49], the goal of science is not merely to summarize observations, and thereby let us make predictions about similar observations. Rather, the goal is to discover explanations with "reach," meaning the ability to predict what would happen even in novel or hypothetical situations, like the Sun suddenly disappearing or a quantum computer being built. In my view, developing a compelling mathematical model of explanatory learning—a model that "is to explanation as the PAC model is to prediction"—is an outstanding open problem.[40]

7.2 Computational Complexity, Bleen, and Grue

In 1955, Nelson Goodman [67] proposed what he called the "new riddle of induction," which survives the Occam's Razor answer to Hume's original induction problem. In Goodman's riddle, we are asked to consider the hypothesis "All emeralds are green." The question is, why do we favor that hypothesis over the following alternative, which is equally compatible with all our evidence of green emeralds? "All emeralds are green before January 1, 2030, and then blue afterwards."

The obvious answer is that the second hypothesis adds superfluous complications, and is therefore disfavored by Occam's Razor.
To that, Goodman replies that the definitions of "simple" and "complicated" depend on our language. In particular, suppose we had no words for green or blue, but we did have a word grue, meaning "green before January 1, 2030, and blue afterwards," and a word bleen, meaning "blue before January 1, 2030, and green afterwards." In that case, we could only express the hypothesis "All emeralds are green" by saying "All emeralds are grue before January 1, 2030, and then bleen afterwards"—a manifestly more complicated hypothesis than the simple "All emeralds are grue"!

[40] Important progress toward this goal includes the work of Angluin [11] on learning finite automata from queries and counterexamples, and that of Angluin et al. [12] on learning a circuit by injecting values. Both papers study natural learning models that generalize the PAC model by allowing "controlled scientific experiments," whose results confirm or refute a hypothesis and thereby provide guidance about which experiments to do next.

I confess that, when I contemplate the grue riddle, I can't help but recall the joke about the Anti-Inductivists, who, when asked why they continue to believe that the future won't resemble the past, when that false belief has brought their civilization nothing but poverty and misery, reply, "because anti-induction has never worked before!" Yes, if we artificially define our primitive concepts "against the grain of the world," then we shouldn't be surprised if the world's actual behavior becomes more cumbersome to describe, or if we make wrong predictions. It would be as if we were using a programming language that had no built-in function for multiplication, but only for F(x, y) := 17x − y − x^2 + 2xy.
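Footnote 41 explains how multiplication can be recovered from such a function F; restricting to the pure quadratic case treated there, the identity is easy to sanity-check numerically. This is a sketch of my own, with hypothetical helper names:

```python
def check_identity(p, q, r, s, samples):
    """Verify xy = F(s*x - q*y, -r*x + p*y) / (p*s - q*r)^2, where
    F(u, v) = (p*u + q*v) * (r*u + s*v) is the quadratic form that our
    impoverished language offers in place of multiplication."""
    F = lambda u, v: (p * u + q * v) * (r * u + s * v)
    denom = (p * s - q * r) ** 2
    return all(F(s * x - q * y, -r * x + p * y) == denom * x * y
               for x, y in samples)

# Concrete case: F(x, y) = x^2 + 5xy + 6y^2 = (x + 2y)(x + 3y),
# so (p, q, r, s) = (1, 2, 1, 3) and ps - qr = 1.
pairs = [(x, y) for x in range(-5, 6) for y in range(-5, 6)]
assert check_identity(1, 2, 1, 3, pairs)
```

So a language supplying only F really does let us define multiplication in terms of F once and then forget about F.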
In that case, a normal person's first instinct would be either to switch programming languages, or else to define multiplication in terms of F, and forget about F from that point onward![41]

Now, there is a genuine philosophical problem here: why do grue, bleen, and F(x, y) go "against the grain of the world," whereas green, blue, and multiplication go with the grain? But to me, that problem (like Wigner's puzzlement over "the unreasonable effectiveness of mathematics in natural sciences" [135]) is more about the world itself than about human concepts, so we shouldn't expect any purely linguistic analysis to resolve it. What about computational complexity, then? In my view, while computational complexity doesn't solve the grue riddle, it does contribute a useful insight. Namely, that when we talk about the simplicity or complexity of hypotheses, we should distinguish two issues:

(a) The asymptotic scaling of the hypothesis size, as the "size" n of our learning problem goes to infinity.

(b) The constant-factor overheads.

In terms of the basic PAC model in Section 7, we can imagine a "hidden parameter" n, which measures the number of bits needed to specify an individual point in the set S = S_n. (Other ways to measure the "size" of a learning problem would also work, but this way is particularly convenient.) For convenience, we can identify S_n with the set {0, 1}^n of n-bit strings, so that n = log_2 |S_n|. We then need to consider, not just a single hypothesis class, but an infinite family of hypothesis classes H = {H_1, H_2, H_3, ...}, one for each positive integer n. Here H_n consists of hypothesis functions h that map S_n = {0, 1}^n to {0, 1}. Now let L be a language for specifying hypotheses in H: in other words, a mapping from (some subset of) binary strings y ∈ {0, 1}* to H.
Also, given a hypothesis h ∈ H, let

    κ_L(h) := min {|y| : L(y) = h}

be the length of the shortest description of h in the language L. (Here |y| just means the number of bits in y.) Finally, let

    κ_L(n) := max {κ_L(h) : h ∈ H_n}

be the number of bits needed to specify an arbitrary hypothesis in H_n using the language L. Clearly κ_L(n) ≥ ⌈log_2 |H_n|⌉, with equality if and only if L is "optimal" (that is, if it represents each hypothesis h ∈ H_n using as few bits as possible). The question that concerns us is how quickly κ_L(n) grows as a function of n, for various choices of language L.

[41] Suppose that our programming language provides only multiplication by constants, addition, and the function F(x, y) := ax^2 + bxy + cy^2 + dx + ey + f. We can assume without loss of generality that d = e = f = 0. Then provided ax^2 + bxy + cy^2 factors into two independent linear terms, px + qy and rx + sy, we can express the product xy as

    xy = F(sx − qy, −rx + py) / (ps − qr)^2.

What does any of this have to do with the grue riddle? Well, we can think of the details of L (its syntax, vocabulary, etc.) as affecting the "lower-order" behavior of the function κ_L(n). So for example, suppose we are unlucky enough that L contains the words grue and bleen, but not blue and green. That might increase κ_L(n) by a factor of ten or so—since now, every time we want to mention "green" when specifying our hypothesis h, we instead need a wordy circumlocution like "grue before January 1, 2030, and then bleen afterwards," and similarly for blue.[42] However, a crucial lesson of complexity theory is that the "higher-order" behavior of κ_L(n)—for example, whether it grows polynomially or exponentially with n—is almost completely unaffected by the details of L!
The reason is that, if two languages L_1 and L_2 differ only in their "low-level details," then translating a hypothesis from L_1 to L_2 or vice versa will increase the description length by no more than a polynomial factor. Indeed, as in our grue example, there is usually a "universal translation constant" c such that κ_{L_1}(h) ≤ c·κ_{L_2}(h) or even κ_{L_1}(h) ≤ κ_{L_2}(h) + c for every hypothesis h ∈ H.

The one exception to the above rule is if the languages L_1 and L_2 have different expressive powers. For example, maybe L_1 only allows nesting expressions to depth two, while L_2 allows nesting to arbitrary depths; or L_1 only allows propositional connectives, while L_2 also allows first-order quantifiers. In those cases, κ_{L_1}(h) could indeed be much greater than κ_{L_2}(h) for some hypotheses h, possibly even exponentially greater (κ_{L_1}(h) ≈ 2^{κ_{L_2}(h)}). A rough analogy would be this: suppose you hadn't learned what differential equations were, and had no idea how to solve them even approximately or numerically. In that case, Newtonian mechanics might seem just as complicated to you as the Ptolemaic theory with epicycles, if not more complicated! For the only way you could make predictions with Newtonian mechanics would be using a huge table of "precomputed" differential equation solutions—and to you, that table would seem just as unwieldy and inelegant as a table of epicycles. But notice that in this case, your perception would be the result, not of some arbitrary choice of vocabulary, but of an objective gap in your mathematical expressive powers.

To summarize, our choice of vocabulary—for example, whether we take green/blue or bleen/grue as primitive concepts—could indeed matter if we want to use Occam's Razor to predict the future color of emeralds.
But I think that complexity theory justifies us in treating grue as a "small-n effect": something that becomes less and less important in the asymptotic limit of more and more complicated learning problems.

[42] Though note that, if the language L is expressive enough to allow this, we can simply define green and blue in terms of bleen and grue once, then refer back to those definitions whenever needed! In that case, taking bleen and grue (rather than green and blue) to be the primitive concepts would increase κ_L(n) by only an additive constant, rather than a multiplicative constant. The above fact is related to a fundamental result from the theory of Kolmogorov complexity (see Li and Vitányi [89] for example). Namely, if P and Q are any two Turing-universal programming languages, and if K_P(x) and K_Q(x) are the lengths of the shortest programs in P and Q respectively that output a given string x ∈ {0, 1}*, then there exists a universal "translation constant" c_{PQ}, such that |K_P(x) − K_Q(x)| ≤ c_{PQ} for every x. This c_{PQ} is just the number of bits needed to write a P-interpreter for Q-programs or vice versa.

8 Quantum Computing

Quantum computing is a proposal for using quantum mechanics to solve certain computational problems much faster than we know how to solve them today.[43] To do so, one would need to build a new type of computer, capable of exploiting the quantum effects of superposition and interference. Building such a computer—one large enough to solve interesting problems—remains an enormous challenge for physics and engineering, due to the fragility of quantum states and the need to isolate them from their external environment. In the meantime, though, theoretical computer scientists have extensively studied what we could and couldn't do with a quantum computer if we had one.
For certain problems, remarkable quantum algorithms are known to solve them in polynomial time, even though the best-known classical algorithms require exponential time. Most famously, in 1994 Peter Shor [117] gave a polynomial-time quantum algorithm for factoring integers, and as a byproduct, breaking most of the cryptographic codes used on the Internet today. Besides the practical implications, Shor's algorithm also provided a key piece of evidence that switching from classical to quantum computers would enlarge the class of problems solvable in polynomial time. For theoretical computer scientists, this had a profound lesson: if we want to know the limits of efficient computation, we may need to "leave our armchairs" and incorporate actual facts about physics (at a minimum, the truth or falsehood of quantum mechanics!).[44]

Whether or not scalable quantum computers are built anytime soon, my own (biased) view is that quantum computing represents one of the great scientific advances of our time. But here I want to ask a different question: does quantum computing have any implications for philosophy—and specifically, for the interpretation of quantum mechanics?

From one perspective, the answer seems like an obvious "no." Whatever else it is, quantum computing is "merely" an application of quantum mechanics, as that theory has existed in physics textbooks for 80 years. Indeed, if you accept that quantum mechanics (as currently understood) is true, then presumably you should also accept the possibility of quantum computers, and make the same predictions about their operation as everyone else.
Whether you describe the "reality" behind quantum processes via the Many-Worlds Interpretation, Bohmian mechanics, or some other view (or, following Bohr's Copenhagen Interpretation, refuse to discuss the "reality" at all), seems irrelevant.

From a different perspective, though, a scalable quantum computer would test quantum mechanics in an extremely novel regime—and for that reason, it could indeed raise new philosophical issues. The "regime" quantum computers would test is characterized not by an energy scale or a temperature, but by computational complexity. One of the most striking facts about quantum mechanics is that, to represent the state of n entangled particles, one needs a vector of size exponential in n. For example, to specify the state of a thousand spin-1/2 particles, one needs 2^1000 complex numbers called "amplitudes," one for every possible outcome of measuring the spins in the {up, down} basis. The quantum state, denoted |ψ⟩, is then a linear combination or "superposition" of the possible outcomes, with each outcome |x⟩ weighted by its amplitude α_x:

    |ψ⟩ = Σ_{x ∈ {up, down}^1000} α_x |x⟩.

Given |ψ⟩, one can calculate the probability p_x that any particular outcome |x⟩ will be observed, via the rule p_x = |α_x|^2.[45]

[43] The authoritative reference for quantum computing is the book of Nielsen and Chuang [99]. For gentler introductions, try Mermin [92, 93] or the survey articles of Aharonov [10], Fortnow [57], or Watrous [132]. For a general discussion of polynomial-time computation and the laws of physics (including speculative models beyond quantum computation), see my survey article "NP-complete Problems and Physical Reality" [4].

[44] By contrast, if we only want to know what is computable in the physical universe, with no efficiency requirement, then it remains entirely consistent with current knowledge that Church and Turing gave the correct answer in the 1930s—and that they did so without incorporating any physics beyond what is "accessible to intuition."

Now, there are only about 10^80 atoms in the visible universe, which is a much smaller number than 2^1000. So assuming quantum mechanics is true, it seems Nature has to invest staggering amounts of "computational effort" to keep track of small collections of particles—certainly more than anything classical physics requires![46][47] In the early 1980s, Richard Feynman [55] and others called attention to this point, noting that it underlay something that had long been apparent in practice: the extraordinary difficulty of simulating quantum mechanics using conventional computers. But Feynman also raised the possibility of turning that difficulty around, by building our computers out of quantum components. Such computers could conceivably solve certain problems faster than conventional computers: if nothing else, then at least the problem of simulating quantum mechanics! Thus, quantum computing is interesting not just because of its applications, but (even more, in my opinion) because it is the first technology that would directly "probe" the exponentiality inherent in the quantum description of Nature.

One can make an analogy here to the experiments in the 1980s that first convincingly violated the Bell Inequality.
Like quantum algorithms today, Bell's refutation of local realism was "merely" a mathematical consequence of quantum mechanics. But that refutation (and the experiments that it inspired) made conceptually-important aspects of quantum mechanics no longer possible to ignore—and for that reason, it changed the philosophical landscape. It seems overwhelmingly likely to me that quantum computing will do the same.

Indeed, we can extend the analogy further: just as there were "local realist diehards" who denied that Bell Inequality violation would be possible (and tried to explain it away after it was achieved), so today a vocal minority of computer scientists and physicists (including Leonid Levin [88], Oded Goldreich [61], and Gerard 't Hooft [75]) denies the possibility of scalable quantum computers, even in principle. While they admit that quantum mechanics has passed every experimental test for a century, these skeptics are confident that quantum mechanics will fail in the regime tested by quantum computing—and that whatever new theory replaces it, that theory will allow only classical computing.

[45] This means, in particular, that the amplitudes satisfy the normalization condition Σ_x |α_x|^2 = 1.

[46] One might object that even in the classical world, if we simply don't know the value of (say) an n-bit string, then we also describe our ignorance using exponentially-many numbers: namely, the probability p_x of each possible string x ∈ {0, 1}^n! And indeed, there is an extremely close connection between quantum mechanics and classical probability theory; I often describe quantum mechanics as just "probability theory with complex numbers instead of nonnegative reals." However, a crucial difference is that we can always describe a classical string x as "really" having a definite value; the vector of 2^n probabilities p_x is then just a mental representation of our own ignorance. With a quantum state, we do not have the same luxury, because of the phenomenon of interference between positive and negative amplitudes.

[47] One might also object that, even in classical physics, it takes infinitely many bits to record the state of even a single particle, if its position and momentum can be arbitrary real numbers. And indeed, Copeland [43], Hogarth [73], Siegelmann [118], and other writers have speculated that the continuity of physical quantities might actually allow "hypercomputations"—including solving the halting problem in a finite amount of time! From a modern perspective, though, quantum mechanics and quantum gravity strongly suggest that the "continuity" of measurable quantities such as positions and momenta is a theoretical artifact. In other words, it ought to suffice for simulation purposes to approximate these quantities to some finite precision, probably related to the Planck scale of 10^−33 centimeters or 10^−43 seconds. But the exponentiality of quantum states is different, for at least two reasons. Firstly, it doesn't lead to computational speedups that are nearly as "unreasonable" as the hypercomputing speedups. Secondly, no one has any idea where the theory in question (quantum mechanics) could break down, in a manner consistent with current experiments. In other words, there is no known "killer obstacle" for quantum computing, analogous to the Planck scale for hypercomputing. See Aaronson [2] for further discussion of this point, as well as a proposed complexity-theoretic framework (called "Sure/Shor separators") with which to study such obstacles.
As most quantum computing researchers are quick to point out in response, they would be thrilled if the attempt to build scalable quantum computers led instead to a revision of quantum mechanics! Such an outcome would probably constitute the largest revolution in physics since the 1920s, and ultimately be much more interesting than building a quantum computer. Of course, it is also possible that scalable quantum computing will be given up as too difficult for "mundane" technological reasons, rather than fundamental physics reasons. But that "mundane" possibility is not what skeptics such as Levin, Goldreich, and 't Hooft are talking about.

8.1 Quantum Computing and the Many-Worlds Interpretation

But let's return to the original question: suppose the skeptics are wrong, and it is possible to build scalable quantum computers. Would that have any relevance to the interpretation of quantum mechanics? The best-known argument that the answer is "yes" was made by David Deutsch, a quantum computing pioneer and staunch defender of the Many-Worlds Interpretation. To be precise, Deutsch thinks that quantum mechanics straightforwardly implies the existence of parallel universes, and that it does so independently of quantum computing: on his view, even the double-slit experiment can only be explained in terms of two parallel universes interfering. However, Deutsch also thinks that quantum computing adds emotional punch to the argument. Here is how he put it in his 1997 book The Fabric of Reality [48, p. 217]:

    Logically, the possibility of complex quantum computations adds nothing to a case [for the Many-Worlds Interpretation] that is already unanswerable. But it does add psychological impact. With Shor's algorithm, the argument has been writ very large.
    To those who still cling to a single-universe world-view, I issue this challenge: explain how Shor's algorithm works. I do not merely mean predict that it will work, which is merely a matter of solving a few uncontroversial equations. I mean provide an explanation. When Shor's algorithm has factorized a number, using 10^500 or so times the computational resources that can be seen to be present, where was the number factorized? There are only about 10^80 atoms in the entire visible universe, an utterly minuscule number compared with 10^500. So if the visible universe were the extent of physical reality, physical reality would not even remotely contain the resources required to factorize such a large number. Who did factorize it, then? How, and where, was the computation performed?

There is plenty in the above paragraph for an enterprising philosopher to mine. In particular, how should a nonbeliever in Many-Worlds answer Deutsch's challenge? In the rest of this section, I'll focus on two possible responses.

The first response is to deny that, if Shor's algorithm works as predicted, that can only be explained by postulating "vast computational resources." At the most obvious level, complexity theorists have not yet ruled out the possibility of a fast classical factoring algorithm.[48] More generally, that quantum computers can solve certain problems superpolynomially faster than classical computers is not a theorem, but a (profound, plausible) conjecture.[49][50] If the conjecture failed, then the door would seem open to what we might call "polynomial-time hidden-variable theories": theories that reproduce the predictions of quantum mechanics without invoking any computations outside P.[51] These would be analogous to the local hidden variable theories that Einstein and others had hoped for, before Bell ruled such theories out.
A second response to Deutsch's challenge is that, even if we agree that Shor's algorithm demonstrates the reality of vast computational resources in Nature, it is not obvious that we should think of those resources as "parallel universes." Why not simply say that there is one universe, and that it is quantum-mechanical? Doesn't the parallel-universes language reflect an ironic parochialism: a desire to impose a familiar science-fiction image on a mathematical theory that is stranger than fiction, that doesn't match any of our pre-quantum intuitions (including computational intuitions) particularly well?

One can sharpen the point as follows: if one took the parallel-universes explanation of how a quantum computer works too seriously (as many popular writers do!), then it would be natural to make further inferences about quantum computing that are flat-out wrong. For example:

"Using only a thousand quantum bits (or qubits), a quantum computer could store 2^1000 classical bits."

This is true only for a bizarre definition of the word "store"! The fundamental problem is that, when you measure a quantum computer's state, you see only one of the possible outcomes; the rest disappear. Indeed, a celebrated result called Holevo's Theorem [74] says that, using n qubits, there is no way to store more than n classical bits so that the bits can be reliably retrieved later. In other words: for at least one natural definition of "information-carrying capacity," qubits have exactly the same capacity as bits. To take another example:

[48] Indeed, one cannot rule that possibility out, without first proving P ≠ NP! But even if P ≠ NP, a fast classical factoring algorithm might still exist, again because factoring is not thought to be NP-complete.
[49] A formal version of this conjecture is BPP ≠ BQP, where BPP (Bounded-Error Probabilistic Polynomial-Time) and BQP (Bounded-Error Quantum Polynomial-Time) are the classes of problems efficiently solvable by classical randomized algorithms and quantum algorithms respectively. Bernstein and Vazirani [29] showed that P ⊆ BPP ⊆ BQP ⊆ PSPACE, where PSPACE is the class of problems solvable by a deterministic Turing machine using a polynomial amount of memory (but possibly exponential time). For this reason, any proof of the BPP ≠ BQP conjecture would immediately imply P ≠ PSPACE as well. The latter would be considered almost as great a breakthrough as P ≠ NP.

[50] Complicating matters, there are quantum algorithms that provably achieve exponential speedups over any classical algorithm: one example is Simon's algorithm [119], an important predecessor of Shor's algorithm. However, all such algorithms are formulated in the "black-box model" (see Beals et al. [23]), where the resource to be minimized is the number of queries that an algorithm makes to a hypothetical black box. Because it is relatively easy to analyze, the black-box model is a crucial source of insights about what might be true in the conventional Turing machine model. However, it is also known that the black-box model sometimes misleads us about the "real" situation. As a famous example, the complexity classes IP and PSPACE are equal [115], despite the existence of a black box that separates them (see Fortnow [56] for discussion). Besides the black-box model, unconditional exponential separations between quantum and classical complexities are known in several other restricted models, including communication complexity [107].
[51] Technically, if the hidden-variable theory involved classical randomness, then it would correspond more closely to the complexity class BPP (Bounded-Error Probabilistic Polynomial-Time). However, today there is strong evidence that P = BPP (see Impagliazzo and Wigderson [79]).

"Unlike a classical computer, which can only factor numbers by trying the divisors one by one, a quantum computer could try all possible divisors in parallel."

If quantum computers can harness vast numbers of parallel worlds, then the above seems like a reasonable guess as to how Shor's algorithm works. But it's not how it works at all. Notice that, if Shor's algorithm did work that way, then it could be used not only for factoring integers, but also for the much larger task of solving NP-complete problems in polynomial time. (As mentioned in footnote 12, the factoring problem is strongly believed not to be NP-complete.) But contrary to a common misconception, quantum computers are neither known nor believed to be able to solve NP-complete problems efficiently.[52] As usual, the fundamental problem is that measuring reveals just a single random outcome |x⟩. To get around that problem, and ensure that the right outcome is observed with high probability, a quantum algorithm needs to generate an interference pattern, in which the computational paths leading to a given wrong outcome cancel each other out, while the paths leading to a given right outcome reinforce each other. This is a delicate requirement, and as far as anyone knows, it can only be achieved for a few problems, most of which (like the factoring problem) have special structure arising from algebra or number theory.[53]
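As a toy illustration of this cancellation (not of Shor's algorithm itself, which is far more involved), consider applying the Hadamard gate to a single qubit twice. Each outcome is reached along two computational paths; the paths to the "wrong" outcome |1⟩ carry amplitudes +1/2 and −1/2 and cancel, while the paths to |0⟩ reinforce. A minimal Python sketch:

```python
import math

# Toy interference demo: apply the Hadamard gate twice to |0>.
# State vectors are lists of two amplitudes, [amp(|0>), amp(|1>)].
H = [[1 / math.sqrt(2),  1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]

def apply(gate, state):
    """Multiply a 2x2 gate by a length-2 amplitude vector."""
    return [sum(gate[i][j] * state[j] for j in range(2)) for i in range(2)]

state = [1.0, 0.0]       # start in |0>
state = apply(H, state)  # amplitudes (1/sqrt(2), 1/sqrt(2)): two paths open
state = apply(H, state)  # paths to |1> carry +1/2 and -1/2 and cancel

probs = [a * a for a in state]
print(probs)             # approximately [1.0, 0.0]: |0> is observed with certainty
```

Measuring after the second gate yields |0⟩ with probability 1: the wrong-outcome paths have destroyed each other, which is exactly the effect a quantum algorithm must engineer at scale.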
A Many-Worlder might retort: "sure, I agree that quantum computing involves harnessing the parallel universes in subtle and non-obvious ways, but it's still harnessing parallel universes!" But even here, there's a fascinating irony. Suppose we choose to think of a quantum algorithm in terms of parallel universes. Then to put it crudely, not only must many universes interfere to give a large final amplitude to the right answer; they must also, by interfering, lose their identities as parallel universes! In other words, to whatever extent a collection of universes is useful for quantum computation, to that extent it is arguable whether we ought to call them "parallel universes" at all (as opposed to parts of one exponentially-large, self-interfering, quantum-mechanical blob). Conversely, to whatever extent the universes have unambiguously separate identities, to that extent they're now "decohered" and out of causal contact with each other. Thus we can explain the outputs of any future computations by invoking only one of the universes, and treating the others as unrealized hypotheticals.

To clarify, I don't regard either of the above objections to Deutsch's argument as decisive, and am unsure what I think about the matter. My purpose, in setting out the objections, was simply to illustrate the potential of quantum computing theory to inform debates about the Many-Worlds Interpretation.

9 New Computational Notions of Proof

Since the time of Euclid, there have been two main notions of mathematical proof:

(1) A "proof" is a verbal explanation that induces a sense of certainty (and ideally, understanding) about the statement to be proved, in any human mathematician willing and able to follow it.
[52] There is a remarkable quantum algorithm called Grover's algorithm [69], which can search any space of 2^N possible solutions in only ~2^(N/2) steps. However, Grover's algorithm represents a quadratic (square-root) improvement over classical brute-force search, rather than an exponential improvement. And without any further assumptions about the structure of the search space, Grover's algorithm is optimal, as shown by Bennett et al. [27].

[53] Those interested in further details of how Shor's algorithm works, but still not ready for a mathematical exposition, might want to try my popular essay "Shor, I'll Do It" [1].

(2) A "proof" is a finite sequence of symbols encoding syntactic deductions in some formal system, which start with axioms and end with the statement to be proved.

The tension between these two notions is a recurring theme in the philosophy of mathematics. But theoretical computer science deals regularly with a third notion of proof—one that seems to have received much less philosophical analysis than either of the two above. This notion is the following:

(3) A "proof" is any computational process or protocol (real or imagined) that can terminate in a certain way if and only if the statement to be proved is true.

9.1 Zero-Knowledge Proofs

As an example of this third notion, consider zero-knowledge proofs, introduced by Goldwasser, Micali, and Rackoff [66]. Given two graphs G and H, each with n ≈ 10000 vertices, suppose that an all-powerful but untrustworthy wizard Merlin wishes to convince a skeptical king Arthur that G and H are not isomorphic. Of course, one way Merlin could do this would be to list all n! graphs obtained by permuting the vertices of G, then note that none of these equal H.
However, such a proof would clearly exhaust Arthur's patience (indeed, it could not even be written down within the observable universe). Alternatively, Merlin could point Arthur to some property of G and H that differentiates them: for example, maybe their adjacency matrices have different eigenvalue spectra. Unfortunately, it is not yet proven that, if G and H are non-isomorphic, there is always a differentiating property that Arthur can verify in time polynomial in n.

But as noticed by Goldreich, Micali, and Wigderson [65], there is something Merlin can do instead: he can let Arthur challenge him. Merlin can say:

Arthur, send me a new graph K, which you obtained either by randomly permuting the vertices of G, or randomly permuting the vertices of H. Then I guarantee that I will tell you, without fail, whether K ≅ G or K ≅ H.

It is clear that, if G and H are really non-isomorphic, then Merlin can always answer such challenges correctly, by the assumption that he (Merlin) has unlimited computational power. But it is equally clear that, if G and H are isomorphic, then Merlin must answer some challenges incorrectly, regardless of his computational power—since a random permutation of G is statistically indistinguishable from a random permutation of H.

This protocol has at least four features that merit reflection by anyone interested in the nature of mathematical proof.

First, the protocol is probabilistic. Merlin cannot convince Arthur with certainty that G and H are non-isomorphic, since even if they were isomorphic, there's a 1/2 probability that Merlin would get lucky and answer a given challenge correctly (and hence, a 1/2^k probability that he would answer k challenges correctly).
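The challenge protocol can be sketched in a few lines of Python. This is a toy version only: the graphs here have 3 vertices rather than n ≈ 10000, and Merlin's "unlimited computational power" is simulated by brute force over all n! permutations, which is feasible only because the example is tiny.

```python
import random
from itertools import permutations

def relabel(edges, perm):
    """Apply a vertex permutation to an undirected edge set."""
    return frozenset(frozenset((perm[u], perm[v])) for u, v in map(tuple, edges))

def isomorphic(e1, e2, n):
    """Brute-force isomorphism test over all n! vertex permutations."""
    return any(relabel(e1, p) == e2 for p in permutations(range(n)))

n = 3
G = frozenset(map(frozenset, [(0, 1), (1, 2), (0, 2)]))  # a triangle
H = frozenset(map(frozenset, [(0, 1), (1, 2)]))          # a path: not isomorphic to G

def arthur_challenge():
    """Arthur secretly picks G or H and sends Merlin a random relabeling of it."""
    secret = random.choice(["G", "H"])
    perm = random.sample(range(n), n)
    return secret, relabel(G if secret == "G" else H, perm)

def merlin_answer(K):
    """Merlin identifies which graph the challenge K came from."""
    return "G" if isomorphic(K, G, n) else "H"

# Since G and H really are non-isomorphic, Merlin never fails a challenge.
results = [merlin_answer(K) == secret
           for secret, K in (arthur_challenge() for _ in range(100))]
print(all(results))  # -> True
```

If G and H were instead isomorphic, every challenge K would be consistent with both graphs, and Merlin's answers could do no better than coin-flipping—which is exactly the soundness argument above.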
All Merlin can do is offer to repeat the protocol (say) 100 or 1000 times, and thereby make it less likely that his proof is unsound than that an asteroid will strike Camelot, killing both him and Arthur.

Second, the protocol is interactive. Unlike with proof notions (1) and (2), Arthur is no longer a passive recipient of knowledge, but an active player who challenges the prover. We know from experience that the ability to interrogate a seminar speaker—to ask questions that the speaker could not have anticipated, evaluate the responses, and then possibly ask follow-up questions—often speeds up the process of figuring out whether the speaker knows what he or she is talking about. Complexity theory affirms our intuition here, through its discovery of interactive proofs for statements (such as "G and H are not isomorphic") whose shortest known conventional proofs are exponentially longer.

The third interesting feature of the graph non-isomorphism protocol—a feature seldom mentioned—is that its soundness implicitly relies on a physical assumption. Namely, if Merlin had the power (whether through magic or through ordinary espionage) to "peer into Arthur's study" and directly observe whether Arthur started with G or H, then clearly he could answer every challenge correctly even if G ≅ H. It follows that the persuasiveness of Merlin's "proof" can only be as strong as Arthur's extramathematical belief that Merlin does not have such powers. By now, there are many other examples in complexity theory of "proofs" whose validity rests on assumed limitations of the provers.

As Shieber [116] points out, all three of the above properties of interactive protocols also hold for the Turing Test discussed in Section 4!
The Turing Test is interactive by definition, it is probabilistic because even a program that printed random gibberish would have some nonzero probability of passing the test by chance, and it depends on the physical assumption that the AI program doesn't "cheat" by (for example) secretly consulting a human. For these reasons, Shieber argues that we can see the Turing Test itself as an early interactive protocol—one that convinces the verifier not of a mathematical theorem, but of the prover's capacity for intelligent verbal behavior.[54]

However, perhaps the most striking feature of the graph non-isomorphism protocol is that it is zero-knowledge: a technical term formalizing our intuition that "Arthur learns nothing from the protocol, beyond the truth of the statement being proved."[55] For all Merlin ever tells Arthur is which graph he (Arthur) started with, G or H. But Arthur already knew which graph he started with! This means that, not only does Arthur gain no "understanding" of what makes G and H non-isomorphic, he does not even gain the ability to prove to a third party what Merlin proved to him. This is another aspect of computational proofs that has no analogue with proof notions (1) or (2).

One might complain that, as interesting as the zero-knowledge property is, so far we've only shown it's achievable for an extremely specialized problem. And indeed, just like with factoring integers, today there is strong evidence that the graph isomorphism problem is not NP-complete [33].[56],[57] However, in the same paper that gave the graph non-isomorphism protocol, Goldreich,

[54] Incidentally, this provides a good example of how notions from computational complexity theory can influence philosophy even just at the level of metaphor, forgetting about the actual results.
In this essay, I didn't try to collect such "metaphorical" applications of complexity theory, simply because there were too many of them!

[55] Technically, the protocol is "honest-verifier zero-knowledge," meaning that Arthur learns nothing from his conversation with Merlin besides the truth of the statement being proved, assuming Arthur follows the protocol correctly. If Arthur cheats—for example, by sending a graph K for which he doesn't already know an isomorphism either to G or to H—then Merlin's response could indeed tell Arthur something new. However, Goldreich, Micali, and Wigderson [65] also gave a more sophisticated proof protocol for graph non-isomorphism, which remains zero-knowledge even in the case where Arthur cheats.

[56] Indeed, there is not even a consensus belief that graph isomorphism is outside P! The main reason is that, in contrast to factoring integers, graph isomorphism turns out to be extremely easy in practice. Indeed, finding non-isomorphic graphs that can't be distinguished by simple invariants is itself a hard problem! And in the past, several problems (such as linear programming and primality testing) that were long known to be "efficiently solvable for practical purposes" were eventually shown to be in P in the strict mathematical sense as well.

[57] There is also strong evidence that there are short conventional proofs for graph non-isomorphism—in other words,

Micali, and Wigderson [65] also gave a celebrated zero-knowledge protocol (now called the GMW protocol) for the NP-complete problems. By the definition of NP-complete (see Section 3.1), the GMW protocol meant that every mathematical statement that has a conventional proof (say, in Zermelo-Fraenkel set theory) also has a zero-knowledge proof of comparable size! As an example application, suppose you've just proved the Riemann Hypothesis.
You want to convince the experts of your triumph, but are paranoid about them stealing credit for it. In that case, "all" you need to do is (1) rewrite your proof in a formal language, (2) encode the result as the solution to an NP-complete problem, and then (3) like a 16th-century court mathematician challenging his competitors to a duel, invite the experts to run the GMW protocol with you over the Internet! Provided you answer all their challenges correctly, the experts can become statistically certain that you possess a proof of the Riemann Hypothesis, without learning anything about that proof besides an upper bound on its length.

Better yet, unlike the graph non-isomorphism protocol, the GMW protocol does not assume a super-powerful wizard—only an ordinary polynomial-time being who happens to know a proof of the relevant theorem. As a result, today the GMW protocol is much more than a theoretical curiosity: it and its variants have found major applications in Internet cryptography, where clients and servers often need to prove to each other that they are following a protocol correctly without revealing secret information as they do so.

However, there is one important caveat: unlike the graph non-isomorphism protocol, the GMW protocol relies essentially on a cryptographic hypothesis. For here is how the GMW protocol works: you (the prover) first publish thousands of encrypted messages, each one "committing" you to a randomly-garbled piece of your claimed proof. You then offer to decrypt a tiny fraction of those messages, as a way for skeptical observers to "spot-check" your proof, while learning nothing about its structure besides the useless fact that, say, the 1729th step is valid (but how could it not be valid?).
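The commit-and-spot-check mechanism can be mimicked with a toy hash-based commitment scheme. This is only a sketch of the idea: the real GMW protocol commits to randomly garbled pieces of the proof (for example, colorings of a graph encoding it), and its security rests on the cryptographic hypothesis just mentioned, not on the bare hashing used here.

```python
import hashlib
import random
import secrets

# Toy commit-and-spot-check. The prover commits to each proof step with a
# salted SHA-256 hash, then opens only a small random sample of the steps.
proof_steps = [f"step {i}: ...follows from the previous steps..." for i in range(1000)]
salts = [secrets.token_hex(16) for _ in proof_steps]
commitments = [hashlib.sha256((salt + step).encode()).hexdigest()
               for salt, step in zip(salts, proof_steps)]
# (The prover publishes `commitments`; the salts and steps stay secret.)

# The skeptic picks a few indices at random; the prover opens just those.
sample = random.sample(range(len(proof_steps)), 20)
for i in sample:
    opened = hashlib.sha256((salts[i] + proof_steps[i]).encode()).hexdigest()
    assert opened == commitments[i]   # each spot-checked step verifies

print(f"{len(sample)} of {len(proof_steps)} steps spot-checked; the rest stay hidden")
```

A cheating prover who committed to an invalid step risks being caught on each round, while an honest prover reveals only a vanishing fraction of the proof's structure.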
If the skeptics want to increase their confidence that your proof is sound, then you simply run the protocol over and over with them, using a fresh batch of encrypted messages each time. If the skeptics could decrypt all the messages in a single batch, then they could piece together your proof—but to do that, they would need to break the underlying cryptographic code.

9.2 Other New Notions

Let me mention four other notions of "proof" that complexity theorists have explored in depth over the last twenty years, and that might merit philosophical attention.

• Multi-prover interactive proofs [26, 20], in which Arthur exchanges messages with two (or more) computationally-powerful but untrustworthy wizards. Here, Arthur might become convinced of some mathematical statement, but only under the assumption that the wizards could not communicate with each other during the protocol. (The usual analogy is to a police detective who puts two suspects in separate cells, to prevent them from coordinating their answers.) Interestingly, in some multi-prover protocols, even non-communicating wizards could successfully coordinate their responses to Arthur's challenges (and thereby convince Arthur of a falsehood) through the use of quantum entanglement [41]. However, other protocols are conjectured to remain sound even against entangled wizards [83].

[57, continued] that not just graph isomorphism but also graph non-isomorphism will ultimately turn out to be in NP [84].

• Probabilistically checkable proofs [54, 18], which are mathematical proofs encoded in a special error-correcting format, so that one can become confident of their validity by checking only 10 or 20 bits chosen randomly in a correlated way.
The PCP (Probabilistically Checkable Proofs) Theorem [17, 50], one of the crowning achievements of complexity theory, says that any mathematical theorem, in any standard formal system such as Zermelo-Fraenkel set theory, can be converted in polynomial time into a probabilistically-checkable format.

• Quantum proofs [131, 6], which are proofs that depend for their validity on the output of a quantum computation—possibly, even a quantum computation that requires a special entangled "proof state" fed to it as input. Because n quantum bits might require ~2^n classical bits to simulate, quantum proofs have the property that it might never be possible to list all the "steps" that went into the proof, within the constraints of the visible universe. For this reason, one's belief in the mathematical statement being proved might depend on one's belief in the correctness of quantum mechanics as a physical theory.

• Computationally-sound proofs and arguments [35, 94], which rely for their validity on the assumption that the prover was limited to polynomial-time computations—as well as the mathematical conjecture that crafting a convincing argument for a falsehood would have taken the prover more than polynomial time.

What implications do these new types of proof have for the foundations of mathematics? Do they merely make more dramatic what "should have been obvious all along": that, as David Deutsch argues in The Beginning of Infinity [49], proofs are physical processes taking place in brains or computers, which therefore have no validity independent of our beliefs about physics? Are the issues raised essentially the same as those raised by "conventional" proofs that require extensive computations, like Appel and Haken's proof of the Four-Color Theorem [13]?
Or does appealing, in the course of a "mathematical proof," to (say) the validity of quantum mechanics, the randomness of apparently-random numbers, or the lack of certain superpowers on the part of the prover represent something qualitatively new? Philosophical analysis is sought.

10 Complexity, Space, and Time

What can computational complexity tell us about the nature of space and time? A first answer might be "not much": after all, the definitions of standard complexity classes such as P can be shown to be insensitive to such details as the number of spatial dimensions, and even whether the speed of light is finite or infinite.[58] On the other hand, I think complexity theory does offer insight about the differences between space and time.

[58] More precisely, Turing machines with one-dimensional tapes are polynomially equivalent to Turing machines with k-dimensional tapes for any k, and are also polynomially equivalent to random-access machines (which can "jump" to any memory location in unit time, with no locality constraint). On the other hand, if we care about polynomial differences in speed, and especially if we want to study parallel computing models, details about the spatial layout of the computing and memory elements (as well as the speed of communication among the elements) can become vitally important.

The class of problems solvable using a polynomial amount of memory (but possibly an exponential amount of time[59]) is called PSPACE, for Polynomial Space. Examples of PSPACE problems include simulating dynamical systems, deciding whether a regular grammar generates all possible strings, and executing an optimal strategy in two-player games such as Reversi, Connect Four, and Hex.[60] It is not hard to show that PSPACE is at least as powerful as NP:

P ⊆ NP ⊆ PSPACE ⊆ EXP.
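To make the game examples concrete, here is a toy solver for one-pile Nim (take 1 to 3 stones per move; the last player to move wins), which explores the game tree by depth-first search while reusing memory via memoization. This only illustrates the kind of computation involved; the PSPACE-completeness results cited in footnote 60 concern n × n generalizations of much richer games.

```python
from functools import lru_cache

# Toy game-tree search: one-pile Nim, take 1-3 stones, last to move wins.
@lru_cache(maxsize=None)
def first_player_wins(stones):
    """True if the player to move can force a win from this position."""
    if stones == 0:
        return False   # no move available: the player to move has lost
    # Win if some move leaves the opponent in a losing position.
    return any(not first_player_wins(stones - take)
               for take in (1, 2, 3) if take <= stones)

# The losing positions are exactly the multiples of 4.
print([m for m in range(1, 13) if not first_player_wins(m)])  # -> [4, 8, 12]
```

The search visits positions one branch at a time, so its working memory scales with the depth of the game rather than the (exponentially larger) number of positions—the same reuse of space that makes game-playing problems natural members of PSPACE.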
Here EXP represents the class of problems solvable using an exponential amount of time, and also possibly an exponential amount of memory.[61] Every one of the above containments is believed to be strict, although the only one currently proved to be strict is P ≠ EXP, by an important 1965 result of Hartmanis and Stearns [70] called the Time Hierarchy Theorem.[62],[63] Notice, in particular, that P ≠ NP implies P ≠ PSPACE. So while P ≠ PSPACE is not yet proved, it is an extremely secure conjecture by the standards of complexity theory. In slogan form,

[59] Why "only" an exponential amount? Because a Turing machine with B bits of memory can run for no more than 2^B time steps. After that, the machine must either halt or else return to a configuration previously visited (thereby entering an infinite loop).

[60] Note that, in order to speak about the computational complexity of such games, we first need to generalize them to an n × n board! But if we do so, then for many natural games, the problem of determining which player has the win from a given position is not only in PSPACE, but PSPACE-complete (i.e., it captures the entire difficulty of the class PSPACE). For example, Reisch [109] showed that this is true for Hex. What about a suitable generalization of chess to an n × n board? That's also in PSPACE—but as far as anyone knows, only if we impose a polynomial upper bound on the number of moves in a chess game. Without such a restriction, Fraenkel and Lichtenstein [59] showed that chess is EXP-complete; with such a restriction, Storer [125] showed that chess is PSPACE-complete.

[61] In this context, we call a function f(n) "exponential" if it can be upper-bounded by 2^p(n), for some polynomial p.
Also, note that more than exponential memory would be useless here, since a Turing machine that runs for T time steps can visit at most T memory cells.

[62] More generally, the Time Hierarchy Theorem shows that, if f and g are any two "sufficiently well-behaved" functions that satisfy f(n) ≪ g(n) (for example: f(n) = n^2 and g(n) = n^3), then there are computational problems solvable in g(n) time but not in f(n) time. The proof of this theorem uses diagonalization, and can be thought of as a scaled-down version of Turing's proof of the unsolvability of the halting problem. That is, we argue that, if it were always possible to simulate a g(n)-time Turing machine by an f(n)-time Turing machine, then we could construct a g(n)-time machine that "predicted its own output in advance" and then output something else—thereby causing a contradiction. Using similar arguments, we can show (for example) that there exist computational problems solvable using n^3 bits of memory but not using n^2 bits, and so on in most cases where we want to compare more versus less of the same computational resource. In complexity theory, the hard part is comparing two different resources: for example, determinism versus nondeterminism (the P =? NP problem), time versus space (P =? PSPACE), or classical versus quantum computation (BPP =? BQP). For in those cases, diagonalization by itself no longer works.

[63] The fact that P ≠ EXP has an amusing implication, often attributed to Hartmanis: namely, at least one of the three inequalities (i) P ≠ NP, (ii) NP ≠ PSPACE, (iii) PSPACE ≠ EXP must be true, even though proving any one of them to be true individually would represent a titanic advance in mathematics! The above observation is sometimes offered as circumstantial evidence for P ≠ NP.
Of all our hundreds of unproved beliefs about inequalities between pairs of complexity classes, a large fraction of them must be correct, simply to avoid contradicting the hierarchy theorems. So then why not P ≠ NP in particular (given that our intuition there is stronger than our intuitions for most of the other inequalities)?

complexity theorists believe that space is more powerful than time.

Now, some people have asked how such a claim could possibly be consistent with modern physics. For didn't Einstein teach us that space and time are merely two aspects of the same structure? One immediate answer is that, even within relativity theory, space and time are not interchangeable: space has a positive signature whereas time has a negative signature. In complexity theory, the difference between space and time manifests itself in the straightforward fact that you can reuse the same memory cells over and over, but you can't reuse the same moments of time.[64]

Yet, as trivial as that observation sounds, it leads to an interesting thought. Suppose that the laws of physics let us travel backwards in time. In such a case, it's natural to imagine that time would become a "reusable resource" just like space is—and that, as a result, arbitrary PSPACE computations would fall within our grasp. But is that just an idle speculation, or can we rigorously justify it?

10.1 Closed Timelike Curves

Philosophers, like science-fiction fans, have long been interested in the possibility of closed timelike curves (CTCs), which arise in certain solutions to Einstein's field equations of general relativity.[65] On a traditional understanding, the central philosophical problem raised by CTCs is the grandfather paradox.
This is the situation where you go back in time to kill your own grandfather, therefore you are never born, therefore your grandfather is not killed, therefore you are born, and so on. Does this contradiction immediately imply that CTCs are impossible? No, it doesn't: we can only conclude that, if CTCs exist, then the laws of physics must somehow prevent grandfather paradoxes from arising. How could they do so? One classic illustration is that "when you go back in time to try and kill your grandfather, the gun jams"—or some other "unlikely" event inevitably occurs to keep the state of the universe consistent. But why should we imagine that such a convenient "out" will always be available, in every physical experiment involving CTCs? Normally, we like to imagine that we have the freedom to design an experiment however we wish, without Nature imposing conditions on the experiment (for example: "every gun must jam sometimes") whose reasons can only be understood in terms of distant or hypothetical events.

In his 1991 paper "Quantum mechanics near closed timelike lines," Deutsch [47] gave an elegant proposal for eliminating grandfather paradoxes. In particular he showed that, as long as we assume the laws of physics are quantum-mechanical (or even just classically probabilistic), every experiment involving a CTC admits at least one fixed point: that is, a way to satisfy the conditions of the experiment that ensures consistent evolution. Formally, if S is the mapping from quantum states to themselves induced by "going around the CTC once," then a fixed point is any quantum mixed state[66] ρ such that S(ρ) = ρ. The existence of such a ρ follows from simple linear-algebraic arguments.
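The fixed-point condition can be illustrated in the classically probabilistic special case, a toy sketch rather than the full quantum treatment: model "going around the CTC once" as a stochastic matrix S acting on a probability distribution over the two states "born" and "not born," and find a distribution ρ with S(ρ) = ρ by averaging the iterates.

```python
# Classical toy version of Deutsch's fixed point for the grandfather paradox.
# State 0 = "you are not born", state 1 = "you are born". One trip around
# the CTC flips the state: if born, you kill your grandfather (-> not born);
# if not born, the grandfather survives (-> born).
S = [[0.0, 1.0],   # S[i][j] = Pr(new state = i | old state = j)
     [1.0, 0.0]]

def step(p):
    """Apply the stochastic map S to a probability distribution p."""
    return [sum(S[i][j] * p[j] for j in range(2)) for i in range(2)]

# The running (Cesaro) average of the iterates S^t(p0) converges to a
# fixed point, whatever deterministic starting distribution p0 we pick.
p, avg, steps = [1.0, 0.0], [0.0, 0.0], 1000
for _ in range(steps):
    avg = [avg[i] + p[i] / steps for i in range(2)]
    p = step(p)

print(avg)  # -> approximately [0.5, 0.5]: "born" with probability 1/2
assert abs(step(avg)[0] - avg[0]) < 1e-9   # S(rho) = rho: consistent evolution
```

The iterates themselves just oscillate between the two deterministic states; only the probabilistic mixture (1/2, 1/2) is left unchanged by S, which is exactly Deutsch's resolution described next.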
As one illustration, the "resolution of the grandfather paradox" is now that you are born with probability 1/2, and if you are born, you go back in time to kill your grandfather—from which it follows that you are born with probability 1/2, and so on. Merely by treating states as probabilistic (as, in some sense, they have to be in quantum mechanics[67]), we have made the evolution of the universe consistent.

[64] See my blog post www.scottaaronson.com/blog/?p=368 for more on this theme.

[65] Though it is not known whether those solutions are "physical": for example, whether or not they can survive in a quantum theory of gravity (see [96] for example).

[66] In quantum mechanics, a mixed state can be thought of as a classical probability distribution over quantum states. However, an important twist is that the same mixed state can be represented by different probability distributions: for example, an equal mixture of the states |0⟩ and |1⟩ is physically indistinguishable from an equal mixture of (|0⟩ + |1⟩)/√2 and (|0⟩ − |1⟩)/√2. This is why mixed states are represented mathematically using Heisenberg's density matrix formalism.

But Deutsch's account of CTCs faces at least three serious difficulties. The first difficulty is that the fixed points might not be unique: there could be many mixed states ρ such that S(ρ) = ρ, and then the question arises of how Nature chooses one of them. To illustrate, consider the grandfather anti-paradox: a bit b ∈ {0, 1} that travels around a CTC without changing. We can consistently assume b = 0, or b = 1, or any probabilistic mixture of the two—and unlike the usual situation in physics, here there is no possible boundary condition that could resolve the ambiguity.

The second difficulty, pointed out by Bennett et al.
[28], is that Deutsch's proposal violates the statistical interpretation of quantum mixed states. So for example, if half of an entangled pair (|0⟩_A|0⟩_B + |1⟩_A|1⟩_B)/√2 is placed inside the CTC, while the other half remains outside the CTC, then the process of finding a fixed point will "break" the entanglement between the two halves. As a "remedy" for this problem, Bennett et al. suggest requiring the CTC fixed point ρ to be independent of the entire rest of the universe. To my mind, this remedy is so drastic that it basically amounts to defining CTCs out of existence!

Motivated by these difficulties, Lloyd et al. [90] recently proposed a completely different account of CTCs, based on postselected teleportation. Lloyd et al.'s account avoids both of the problems above—though, perhaps not surprisingly, it introduces other problems of its own.⁶⁸ My own view, for whatever it is worth, is that Lloyd et al. are talking less about "true" CTCs as I would understand the concept, than about postselected quantum-mechanical experiments that simulate CTCs in certain interesting respects. If there are any controversies in physics that call out for expert philosophical attention, surely this is one of them.

10.2 The Evolutionary Principle

Yet so far, we have not even mentioned what I see as the main difficulty with Deutsch's account of CTCs. This is that finding a fixed point might require Nature to solve an astronomically-hard computational problem! To illustrate, consider a science-fiction scenario wherein you go back in time and dictate Shakespeare's plays to him. Shakespeare thanks you for saving him the effort, publishes verbatim the plays that you dictated, and centuries later the plays come down to you, whereupon you go back in time and dictate them to Shakespeare, etc.
Notice that, in contrast to the grandfather paradox, here there is no logical contradiction: the story as we told it is entirely consistent. But most people find the story "paradoxical" anyway. After all, somehow Hamlet gets written, without anyone ever doing the work of writing it! As Deutsch [47] perceptively observed, if there is a "paradox" here, then it is not one of logic but of computational complexity. Specifically, the story violates a commonsense principle that we can loosely articulate as follows:

    Knowledge requires a causal process to bring it into existence.

Like many other important principles, this one might not be recognized as a "principle" at all before we contemplate situations that violate it! Deutsch [47] calls this principle the Evolutionary Principle (EP). Note that some version of the EP was invoked both by William Paley's blind-watchmaker argument, and (ironically) by the arguments of Richard Dawkins [45] and other atheists against the existence of an intelligent designer.

⁶⁷ In more detail, Deutsch's proposal works if the state space consists of classical probability distributions D or quantum mixed states ρ, but not if it consists of pure states |ψ⟩. Thus, if one believed that only pure states were fundamental in physics, and that probability distributions and mixed states always reflected subjective ignorance, one might reject Deutsch's proposal on that ground.
⁶⁸ In particular, in Lloyd et al.'s proposal, the only way to deal with the grandfather paradox is by some variant of "the gun jams": there are evolutions with no consistent solution, and it needs to be postulated that the laws of physics are such that they never occur.
In my survey article "NP-complete Problems and Physical Reality" [4], I proposed and defended a complexity-theoretic analogue of the EP, which I called the NP Hardness Assumption:

    There is no physical means to solve NP-complete problems in polynomial time.

The above statement implies P ≠ NP, but is stronger in that it encompasses probabilistic computing, quantum computing, and any other computational model compatible with the laws of physics. See [4] for a survey of recent results bearing on the NP Hardness Assumption, analyses of claimed counterexamples to the assumption, and possible implications of the assumption for physics.

10.3 Closed Timelike Curve Computation

But can we show more rigorously that closed timelike curves would violate the NP Hardness Assumption? Indeed, let us now show that, in a universe where arbitrary computations could be performed inside a CTC, and where Nature had to find a fixed point for the CTC, we could solve NP-complete problems using only polynomial resources.

We can model any NP-complete problem instance by a function f : {0, …, 2^n − 1} → {0, 1}, which maps each possible solution x to the bit 1 if x is valid, or to 0 if x is invalid. (Here, for convenience, we identify each n-bit solution string x with the nonnegative integer that x encodes in binary.) Our task, then, is to find an x ∈ {0, …, 2^n − 1} such that f(x) = 1. We can solve this problem with just a single evaluation of f, provided we can run the following computer program C inside a closed timelike curve [36, 4, 7]:

    Given input x ∈ {0, …, 2^n − 1}:
        If f(x) = 1, then output x.
        Otherwise, output (x + 1) mod 2^n.

Assuming there exists at least one x such that f(x) = 1, the only fixed points of C—that is, the only ways for C's output to equal its input—are for C to input, and output, such a valid solution x, which therefore appears in C's output register "as if by magic." (If there are no valid solutions, then C's fixed points will simply be uniform superpositions or probability distributions over all x ∈ {0, …, 2^n − 1}.)

Extending the above idea, John Watrous and I [7] (following a suggestion by Fortnow) recently showed that a CTC computer in Deutsch's model could solve all problems in PSPACE. (Recall that PSPACE is believed to be even larger than NP.) More surprisingly, we also showed that PSPACE constitutes the limit on what can be done with a CTC computer; and that this is true whether the CTC computer is classical or quantum. One consequence of our results is that the "naïve intuition" about CTC computers—that their effect would be to "make space and time equivalent as computational resources"—is ultimately correct, although not for the naïve reasons.⁶⁹ A second, amusing consequence is that, once closed timelike curves are available, switching from classical to quantum computers provides no additional benefit!

It is important to realize that our algorithms for solving hard problems with CTCs do not just boil down to "using huge amounts of time to find the answer, then sending the answer back in time to before the computer started." For even in the exotic scenario of a time travel computer, we still require that all resources used inside the CTC (time, memory, etc.) be polynomially bounded.
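To see the fixed-point argument in miniature, here is a sketch (my own illustration, not from the essay; the toy predicate f is hypothetical) that enumerates the deterministic fixed points of the program C. Of course, a classical simulation must scan all 2^n inputs, which is exactly the exponential work that Nature would absorb in Deutsch's model:

```python
# Brute-force search for the fixed points of the program C from Section 10.3.
# A CTC would hand us a fixed point "for free"; classically we must try
# every input, which takes 2^n evaluations of f.

def make_C(f, n):
    """The program C: output x if f(x) = 1, else (x + 1) mod 2^n."""
    def C(x):
        return x if f(x) == 1 else (x + 1) % (2 ** n)
    return C

def fixed_points(C, n):
    """All deterministic fixed points x with C(x) = x."""
    return [x for x in range(2 ** n) if C(x) == x]

# Toy "NP-style" predicate: x is valid iff x * 7 == 42, so x = 6 is the answer.
n = 4
f = lambda x: 1 if x * 7 == 42 else 0
C = make_C(f, n)
print(fixed_points(C, n))  # -> [6]: the only fixed point is the valid solution
```

As the essay notes, if f had no valid solution, C would cycle through all 2^n values and have no deterministic fixed point; the fixed point would then be the uniform distribution over inputs.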
Thus, the ability to solve hard problems comes solely from causal consistency: the requirement that Nature must find some evolution for the CTC computer that avoids grandfather paradoxes.

In Lloyd et al.'s alternative account of CTCs based on postselection [90], hard problems can also be solved, though for different reasons. In particular, building on an earlier result of mine [5], Lloyd et al. show that the power of their model corresponds to a complexity class called PP (Probabilistic Polynomial-Time), which is believed to be strictly smaller than PSPACE but strictly larger than NP. Thus, one might say that Lloyd et al.'s model "improves" the computational situation, but not by much!

So one might wonder: is there any way that the laws of physics could allow CTCs, without opening the door to implausible computational powers? There remains at least one interesting possibility, which was communicated to me by the philosopher Tim Maudlin.⁷⁰ Maybe the laws of physics have the property that, no matter what computations are performed inside a CTC, Nature always has an "out" that avoids the grandfather paradox, but also avoids solving hard computational problems—analogous to "the gun jamming" in the original grandfather paradox. Such an out might involve (for example) an asteroid hitting the CTC computer, or the computer failing for other mysterious reasons. Of course, any computer in the physical world has some nonzero probability of failure, but ordinarily we imagine that the failure probability can be made negligibly small. However, in situations where Nature is being "forced" to find a fixed point, maybe "mysterious computer failures" would become the norm rather than the exception.

To summarize, I think that computational complexity theory changes the philosophical issues raised by time travel into the past.
While discussion traditionally focused on the grandfather paradox, we have seen that there is no shortage of ways for Nature to avoid logical inconsistencies, even in a universe with CTCs. The "real" problem, then, is how to escape the other paradoxes that arise in the course of taming the grandfather paradox! Probably foremost among those is the "computational complexity paradox," of NP-complete and even harder problems getting solved as if by magic.

11 Economics

In classical economics, agents are modeled as rational, Bayesian agents who take whatever actions will maximize their expected utility E_{ω∈Ω}[U(ω)], given their subjective probabilities {p_ω}_{ω∈Ω} over all possible states ω of the world.⁷¹ This, of course, is a caricature that seems almost designed to be attacked, and it has been attacked from almost every angle. For example, humans are not even close to rational Bayesian agents, but suffer from well-known cognitive biases, as explored by Kahneman and Tversky [80] among others.

⁶⁹ Specifically, it is not true that in a CTC universe, a Turing machine tape head could just travel back and forth in time the same way it travels back and forth in space. If one thinks this way, then one really has in mind a second, "meta-time," while the "original" time has become merely one more dimension of space. To put the point differently: even though a CTC would make time cyclic, time would still retain its directionality. This is the reason why, if we want to show that CTC computers have the power of PSPACE, we need a nontrivial argument involving causal consistency.
⁷⁰ This possibility is also discussed at length in Deutsch's paper [47].
Furthermore, the classical view seems to leave no room for critiquing people's beliefs (i.e., their prior probabilities) or their utility functions as irrational—yet it is easy to cook up prior probabilities or utility functions that would lead to behavior that almost anyone would consider insane. A third problem is that, in games with several cooperating or competing agents who act simultaneously, classical economics guarantees the existence of at least one Nash equilibrium among the agents' strategies. But the usual situation is that there are multiple equilibria, and then there is no general principle to predict which equilibrium will prevail, even though the choice might mean the difference between war and peace.

Computational complexity theory can contribute to debates about the foundations of economics by showing that, even in the idealized situation of rational agents who all have perfect information about the state of the world, it will often be computationally intractable for those agents to act in accordance with classical economics. Of course, some version of this observation has been recognized in economics for a long time. There is a large literature on bounded rationality (going back to the work of Herbert Simon [120]), which studies the behavior of economic agents whose decision-making abilities are limited in one way or another.

11.1 Bounded Rationality and the Iterated Prisoners' Dilemma

As one example of an insight to emerge from this literature, consider the Finite Iterated Prisoners' Dilemma. This is a game where two players meet for some fixed number of rounds N, which is finite and common knowledge between the players.
In each round, both players can either "Defect" or "Cooperate" (not knowing the other player's choice), after which they receive the following payoffs:

                   Defect_2    Cooperate_2
    Defect_1        1, 1          4, 0
    Cooperate_1     0, 4          3, 3

Both players remember the entire previous history of the interaction. It is clear that the players will be jointly best off if they both cooperate, but equally clear that if N = 1, then cooperation is not an equilibrium. On the other hand, if the number of rounds N were unknown or infinite, then the players could rationally decide to cooperate, similarly to how humans decide to cooperate in real life. That is, Player 1 reasons that if he defects, then Player 2 will retaliate by defecting in future rounds, and vice versa. So over the long run, both players do best for themselves by cooperating.

The "paradox" is now that, as soon as N becomes known, the above reasoning collapses. For assuming the players are rational, they both realize that, whatever else, neither has anything to lose by defecting in round N—and therefore that is what they do. But since both players know that both will defect in round N, neither one has anything to lose by defecting in round N − 1 either—and they can continue inductively in this way back to the first round. We therefore get the "prediction" that both players will defect in every round, even though that is neither in the players' own interests, nor what actual humans do in experiments.

⁷¹ Here we assume for simplicity that the set Ω of possible states is countable; otherwise we could of course use a continuous probability measure.

In 1985, Neyman [98] proposed an ingenious resolution of this paradox.
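As an aside, the unraveling argument can be checked mechanically. The following sketch (my own illustration, not from the essay) verifies that defection strictly dominates under the payoff table above, and hence that backward induction prescribes defection in all N rounds:

```python
# Payoffs (Player 1, Player 2) from the table above.
PAYOFF = {('D', 'D'): (1, 1), ('D', 'C'): (4, 0),
          ('C', 'D'): (0, 4), ('C', 'C'): (3, 3)}

def dominant_action():
    """Return the action that strictly dominates the other for Player 1
    (the game is symmetric, so the same holds for Player 2)."""
    for a, b in [('D', 'C'), ('C', 'D')]:
        if all(PAYOFF[(a, opp)][0] > PAYOFF[(b, opp)][0] for opp in 'DC'):
            return a
    return None

def subgame_perfect_plan(N):
    """Backward induction: in round N the continuation payoff is zero, so the
    strictly dominant action is played; that makes the continuation after
    round N-1 a constant as well, so the same dominance applies there, and
    so on back to round 1."""
    return [dominant_action()] * N

print(dominant_action())        # -> D (defection strictly dominates)
print(subgame_perfect_plan(5))  # -> ['D', 'D', 'D', 'D', 'D']
# Mutual defection earns each player 5 in total, versus 15 for cooperation.
```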
Specifically, he showed that if the two players have sufficiently small memories—technically, if they are finite automata with k states, for 2 ≤ k < N—then cooperation becomes an equilibrium once again! The basic intuition is that, if both players lack enough memory to count up to N, and both of them know that, and both know that they both know that, and so on, then the inductive argument in the last paragraph fails, since it assumes intermediate strategies that neither player can implement.

While complexity considerations vanquish some of the counterintuitive conclusions of classical economics, equally interesting to me is that they do not vanquish others. As one example, I showed in [3] that Robert Aumann's celebrated agreement theorem [19]—perfect Bayesian agents with common priors can never "agree to disagree"—persists even in the presence of limited communication between the agents.

There are many other interesting results in the bounded rationality literature, too many to do them justice here (but see Rubinstein [111] for a survey). On the other hand, "bounded rationality" is something of a catch-all phrase, encompassing almost every imaginable deviation from rationality—including human cognitive biases, limits on information-gathering and communication, and the restriction of strategies to a specific form (for example, linear threshold functions). Many of these deviations have little to do with computational complexity per se. So the question remains of whether computational complexity specifically can provide new insights about economic behavior.

11.2 The Complexity of Equilibria

There are some very recent advances suggesting that the answer is yes. Consider the problem of finding an equilibrium of a two-player game, given the n × n payoff matrix as input.
In the special case of zero-sum games (which von Neumann studied in 1928), it has long been known how to solve this problem in an amount of time polynomial in n, for example by reduction to linear programming. But in 2006, Daskalakis, Goldberg, and Papadimitriou [44] (with improvements by Chen and Deng [39]) proved the spectacular result that, for a general (not necessarily zero-sum) two-player game, finding a Nash equilibrium is "PPAD-complete." Here PPAD ("Polynomial Parity Argument, Directed") is, roughly speaking, the class of all search problems for which a solution is guaranteed to exist for the same combinatorial reason that every game has at least one Nash equilibrium.

Note that finding a Nash equilibrium cannot be NP-complete, for the technical reason that NP is a class of decision problems, and the answer to the decision problem "does this game have a Nash equilibrium?" is always yes. But Daskalakis et al.'s result says (informally) that the search problem of finding a Nash equilibrium is "as close to NP-complete as it could possibly be," subject to its decision version being trivial. Similar PPAD-completeness results are now known for other fundamental economic problems, such as finding market-clearing prices in Arrow-Debreu markets [38].

Of course, one can debate the economic relevance of these results: for example, how often does the computational hardness that we now know⁷² to be inherent in economic equilibrium theorems actually rear its head in practice? But one can similarly debate the economic relevance of the equilibrium theorems themselves! In my opinion, if the theorem that Nash equilibria exist is considered relevant to debates about (say) free markets versus government intervention, then the theorem that finding those equilibria is PPAD-complete should be considered relevant also.
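To make the contrast concrete, here is the tractable zero-sum case as code: a minimal sketch (my own illustration, not from the essay; NumPy and SciPy assumed) that solves a small zero-sum game by the standard linear-programming reduction. No analogous polynomial-time method is known for general two-player games, which is the content of the PPAD-completeness result.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_value(A):
    """Value and optimal row strategy of a zero-sum game with payoff matrix A
    (payoffs to the row player), via the classic LP reduction:
        maximize v  subject to  A^T x >= v,  sum(x) = 1,  x >= 0."""
    m, n = A.shape
    # Variables: x_1..x_m (row strategy) and v (game value).
    # linprog minimizes, so we minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # One inequality per column j:  v - sum_i x_i * A[i][j] <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0            # strategy probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]  # v is unbounded in sign
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

# Matching pennies: value 0, optimal strategy (1/2, 1/2).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
v, x = zero_sum_value(A)
print(v, x)  # value ~ 0, strategy ~ [0.5, 0.5]
```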
⁷² Subject, as usual, to widely-believed complexity assumptions.

12 Conclusions

The purpose of this essay was to illustrate how philosophy could be enriched by taking computational complexity theory into account, much as it was enriched almost a century ago by taking computability theory into account. In particular, I argued that computational complexity provides new insights into the explanatory content of Darwinism, the nature of mathematical knowledge and proof, computationalism, syntax versus semantics, the problem of logical omniscience, debates surrounding the Turing Test and Chinese Room, the problem of induction, the foundations of quantum mechanics, closed timelike curves, and economic rationality. Indeed, one might say that the "real" question is which philosophical problems don't have important computational complexity aspects! My own opinion is that there probably are such problems (even within analytic philosophy), and that one good candidate is the problem of what we should take as "bedrock mathematical reality": that is, the set of mathematical statements that are objectively true or false, regardless of whether they can be proved or disproved in a given formal system. To me, if we are not willing to say that a given Turing machine M either accepts, rejects, or runs forever (when started on a blank tape)—and that which one it does is an objective fact, independent of our formal axiomatic theories, the laws of physics, the biology of the human brain, cultural conventions, etc.—then we have no basis to talk about any of those other things (axiomatic theories, the laws of physics, and so on). Furthermore, M's resource requirements are irrelevant here: even if M only halts after 2^(2^10000) steps, its output is as mathematically definite as if it had halted after 10 steps.⁷³
Can we say anything general about when a computational complexity perspective is helpful in philosophy, and when it isn't? Extrapolating from the examples in this essay, I would say that computational complexity tends to be helpful when we want to know whether a particular fact does any explanatory work: Sections 3.2, 3.3, 4, 6, and 7 all provided examples of this. Other "philosophical applications" of complexity theory come from the Evolutionary Principle and the NP Hardness Assumption discussed in Section 10.2. If we believe that certain problems are computationally intractable, then we may be able to draw interesting conclusions from that belief about economic rationality, quantum mechanics, the possibility of closed timelike curves, and other issues. By contrast, computational complexity tends to be unhelpful when we only want to know whether a particular fact "determines" another fact, and don't care about the length of the inferential chain.

12.1 Criticisms of Complexity Theory

Despite its explanatory reach, complexity theory has been criticized on various grounds. Here are four of the most common criticisms:

(1) Complexity theory only makes asymptotic statements (statements about how the resources needed to solve problem instances of size n scale as n goes to infinity). But as a matter of logic, asymptotic statements need not have any implications whatsoever for the finite values of n (say, 10,000) that humans actually care about, nor can any finite amount of experimental data confirm or refute an asymptotic claim.

⁷³ The situation is very different for mathematical statements like the Continuum Hypothesis, which can't obviously be phrased as predictions about idealized computational processes (since they're not expressible by first-order or even second-order quantification over the integers).
For those statements, it really is unclear to me what one means by their truth or falsehood apart from their provability in some formal system.

(2) Many of (what we would like to be) complexity theory's basic principles, such as P ≠ NP, are currently unproved mathematical conjectures, and will probably remain that way for a long time.

(3) Complexity theory focuses on only a limited type of computer—the serial, deterministic Turing machine—and fails to incorporate the "messier" computational phenomena found in nature.

(4) Complexity theory studies only the worst-case behavior of algorithms, and does not address whether that behavior is representative, or whether it merely reflects a few "pathological" inputs. So for example, even if P ≠ NP, there might still be excellent heuristics to solve most instances of NP-complete problems that actually arise in practice; complexity theory tells us nothing about such possibilities one way or the other.

For whatever it's worth, criticisms (3) and (4) have become much less accurate since the 1980s. As discussed in this essay, complexity theory has by now branched out far beyond deterministic Turing machines, to incorporate (for example) quantum mechanics, parallel and distributed computing, and stochastic processes such as Darwinian evolution. Meanwhile, although worst-case complexity remains the best-understood kind, today there is a large body of work—much of it driven by cryptography—that studies the average-case hardness of computational problems, for various probability distributions over inputs. And just as almost all complexity theorists believe that P ≠ NP, so almost all subscribe to the stronger belief that there exist hard-on-average NP problems—indeed, that belief is one of the underpinnings of modern cryptography.
A few problems, such as calculating discrete logarithms, are even known to be just as hard on random inputs as they are on the hardest possible input (though whether such "worst-case/average-case equivalence" holds for any NP-complete problem remains a major open question). For these reasons, although speaking about average-case rather than worst-case complexity would complicate some of the arguments in this essay, I don't think it would change the conclusions much.⁷⁴ See Bogdanov and Trevisan [32] for an excellent recent survey of average-case complexity, and Impagliazzo [78] for an evocative discussion of complexity theory's "possible worlds" (for example, the "world" where NP-complete problems turn out to be hard in the worst case but easy on average).

The broader point is that, even if we admit that criticisms (1)-(4) have merit, that does not give us a license to dismiss complexity-theoretic arguments whenever we dislike them! In science, we only ever deal with imperfect, approximate theories—and if we reject the conclusions of the best approximate theory in some area, then the burden is on us to explain why. To illustrate, suppose you believe that quantum computers will never give a speedup over classical computers for any practical problem. Then as an explanation for your stance, you might assert any of the following:

(a) Quantum mechanics is false or incomplete, and an attempt to build a scalable quantum computer would instead lead to falsifying or extending quantum mechanics itself.

(b) There exist polynomial-time classical algorithms for factoring integers, and for all the other problems that admit polynomial-time quantum algorithms. (In complexity terms, the classes BPP and BQP are equal.)

⁷⁴ On the other hand, it would presuppose that we knew how to define reasonable probability distributions over inputs.
But as discussed in Section 4.3, it seems hard to explain what we mean by "structured instances," or "the types of instances that normally arise in practice."

(c) The "constant-factor overheads" involved in building a quantum computer are so large as to negate their asymptotic advantages, for any problem of conceivable human interest.

(d) While we don't yet know which of (a)-(c) holds, we can know on some a priori ground that at least one of them has to hold.

The point is that, even if we can't answer every possible shortcoming of a complexity-theoretic analysis, we can still use it to clarify the choices: to force people to lay some cards on the table, committing themselves either to a prediction that might be falsified or to a mathematical conjecture that might be disproved. Of course, this is a common feature of all scientific theories, not something specific to complexity theory. If complexity theory is unusual here, it is only in the number of "predictions" it juggles that could be confirmed or refuted by mathematical proof (and indeed, only by mathematical proof).⁷⁵

12.2 Future Directions

Even if the various criticisms of complexity theory don't negate its relevance, it would be great to address those criticisms head-on—and more generally, to get a clearer understanding of the relationship between complexity theory and the real-world phenomena that it tries to explain. Toward that end, I think the following questions would all benefit from careful philosophical analysis:

• What is the empirical status of asymptotic claims? What sense can we give to an asymptotic statement "making predictions," or being supported or ruled out by a finite number of observations?

• How can we explain the empirical facts on which complexity theory relies: for example, that we rarely see n^10000 or 1.0000001^n algorithms, or that the computational problems humans care about tend to organize themselves into a relatively small number of equivalence classes?

• Short of proof, how do people form intuitions about the truth or falsehood of mathematical conjectures? What are those intuitions, in cases such as P ≠ NP?

• Do the conceptual conclusions that people sometimes want to draw from conjectures such as P ≠ NP or BPP ≠ BQP—for example, about the nature of mathematical creativity or the interpretation of quantum mechanics—actually depend on those conjectures being true? Are there easier-to-prove statements that would arguably support the same conclusions?

• If P ≠ NP, then how have humans managed to make such enormous mathematical progress, even in the face of the general intractability of theorem-proving? Is there a "selection effect," by which mathematicians favor problems with special structure that makes them easier to solve than arbitrary problems? If so, then what does this structure consist of?

In short, I see plenty of scope for the converse essay to this one: "Why Computational Complexity Theorists Should Care About Philosophy."

⁷⁵ One other example that springs to mind, of a scientific theory many of whose "predictions" take the form of mathematical conjectures, is string theory.

13 Acknowledgments

I am grateful to Oron Shagrir for pushing me to finish this essay, for helpful comments, and for suggesting Section 7.2; to Alex Byrne for suggesting Section 6; to Agustín Rayo for suggesting Section 5; and to David Aaronson, Seamus Bradley, Terrence Cole, Michael Collins, Andy Drucker, Michael Forbes, Oded Goldreich, Bob Harper, Gil Kalai, Dana Moshkovitz, Jan Arne Telle, Dylan Thurston, Ronald de Wolf, Avi Wigderson, and Joshua Zelinsky for their feedback.

References

[1] S. Aaronson.
Shor, I'll do it (weblog entry). www.scottaaronson.com/blog/?p=208.

[2] S. Aaronson. Multilinear formulas and skepticism of quantum computing. In Proc. ACM STOC, pages 118–127, 2004. quant-ph/0311039.

[3] S. Aaronson. The complexity of agreement. In Proc. ACM STOC, pages 634–643, 2005. ECCC TR04-061.

[4] S. Aaronson. NP-complete problems and physical reality. SIGACT News, March 2005. quant-ph/0502072.

[5] S. Aaronson. Quantum computing, postselection, and probabilistic polynomial-time. Proc. Roy. Soc. London, A461(2063):3473–3482, 2005. quant-ph/0412187.

[6] S. Aaronson and G. Kuperberg. Quantum versus classical proofs and advice. Theory of Computing, 3(7):129–157, 2007. Previous version in Proceedings of CCC 2007. quant-ph/0604056.

[7] S. Aaronson and J. Watrous. Closed timelike curves make quantum and classical computing equivalent. Proc. Roy. Soc. London, A465:631–647, 2009. arXiv:0808.2669.

[8] S. Aaronson and A. Wigderson. Algebrization: a new barrier in complexity theory. ACM Trans. on Computation Theory, 1(1), 2009. Conference version in Proc. ACM STOC 2008.

[9] M. Agrawal, N. Kayal, and N. Saxena. PRIMES is in P. www.cse.iitk.ac.in/users/manindra/primality.ps, 2002.

[10] D. Aharonov. Quantum computation - a review. In Dietrich Stauffer, editor, Annual Review of Computational Physics, volume VI. 1998. quant-ph/9812037.

[11] D. Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 75(2):87–106, 1987.

[12] D. Angluin, J. Aspnes, J. Chen, and Y. Wu. Learning a circuit by injecting values. J. Comput. Sys. Sci., 75(1):60–77, 2009. Earlier version in STOC 2006.

[13] K. Appel and W. Haken. Every Planar Map is Four-Colorable. American Mathematical Society, 1989.

[14] B. Applebaum, B. Barak, and D. Xiao. On basing lower-bounds for learning on worst-case assumptions. In Proc.
IEEE FOCS , pages 211–220, 2008 . 51 [15] S. Arora and B. Barak. Complexity The ory: A Mo dern Appr o ach . C am bridge Universit y Press, 2009. Online dr aft at w w w.cs.princeton.edu/theory/complexit y/. [16] S. Arora, R. Impagliazzo, and U. V azirani. Relativiz ing v ersus nonrelativizing tec hniques: the role of lo cal c hec k abilit y . Man uscript, 1992. [17] S. Arora, C . Lu nd, R. Mot wani, M. Su d an, and M. Szegedy . Pro of verificatio n and the hardness of appro ximation problems. J. ACM , 45(3):501 –555, 1998 . [18] S. Arora and S. Safra. Probabilistic c hec king of pro ofs: a new c h aract erization of NP. J. ACM , 45(1):70–1 22, 199 8. [19] R. J. Aumann. Agreeing to disagree. Annal s of Statistics , 4(6):1 236–1239, 19 76. [20] L. Babai, L. F ortno w, and C . Lund . Nondeterministic exp onent ial time has t wo-pro v er inter- activ e proto cols. Computational Complexity , 1(1):3– 40, 1991 . [21] T. Bak er, J. Gill, and R. Solo v a y . Relativizations of the P= ?NP question. SIAM J. Comput. , 4:431– 442, 19 75. [22] E. B. Baum. What Is Thought? Bradford Bo oks, 2004. [23] R. Beals, H. Buh rman, R. Cleve , M. Mosca, and R. de W olf. Qu an tum lo wer b ounds by p olynomials. J. ACM , 48(4) :778–797, 2001. Earlier version in IEEE F OC S 1998, p p. 352-361. quan t-ph /98 02049. [24] P . Beame and T. Pitassi. Prop ositional p roof complexit y: past, p resen t, and f u ture. Curr ent T r ends in The or etic al Computer Scienc e , pages 42–70, 2001. [25] S. Bellan toni and S. A. Co ok. A n ew r ecursion-theoretic charact erization of the p olytime functions. Computational Complexity , 2:97 –110, 1992. Earlier v ersion in S TOC 1992, p. 283-2 93. [26] M. Ben-Or, S. Goldw asser, J. Kilian, and A. Wigderson. Multi-pro ve r int eractiv e pro ofs: ho w to remo ve the in tractabilit y assump tions. In Pr o c . ACM STOC , p age s 113–131 , 198 8. [27] C. Bennett, E. Bernstein, G. Brassard, and U. V azirani. Strengths and weaknesses of quan tum computing. SIA M J . 
Comput. , 26(5):1 510–1523, 19 97. quant-ph/97 01001. [28] C. H. Bennett, D. Leung, G. Smith, and J. A. S molin. Can closed timelik e curv es or nonlin- ear q u an tum mec h anics impro ve quantum s tate d iscr im in atio n or help solv e hard problems? Phys. R ev. L ett. , 103(1705 02), 200 9. arXiv:090 8.3023. [29] E. Bern stein and U. V azirani. Quantum complexit y theory . SIAM J. Comput. , 26 (5):1411– 1473, 1997. First app eared in A CM STO C 1993. [30] N. Blo c k. Searle’s argument s against cognitiv e s cie n ce. In J. Preston and M. Bishop, editors, Views i nto the Chinese R o om: New Essays on Se arle and Artificial Intel lig enc e , pages 70–79. Oxford, 2002. [31] A. Blumer, A. Ehr enfeuc h t, D. Haussler, and M. K. W arm u th. Learnabilit y an d the Vapnik- Chervo n enkis d imension. J. ACM , 36(4):929–9 65, 1989. 52 [32] A. Bogdano v and L. T revisan. Av erage-case complexit y . F ounda tions and T r ends in The o- r etic al Com puter Scienc e , 2(1), 2006. EC CC TR06-073. [33] R. B. Boppana, J. H ˚ astad, and S . Z achos. Do es co-NP ha ve sh ort interacti ve pr oofs? Inform. Pr o c. L ett. , 25:127 –132, 19 87. [34] R. Bousso. Po s itive v acuum energy and the N-b ound. J. H igh Ene r gy Phys. , 0011(038 ), 2000. hep-th/001025 2. [35] G. Brassard, D. Ch aum, and C. Cr´ ep eau. Minim um d isclo su re pro ofs of knowle d ge. J. Comput. Sys. Sci. , 37(2):156 –189, 1988 . [36] T. Br u n. Comp uters with closed timelik e cu r v es can solv e hard problems. F oundations of Physics L etters , 16:245–253 , 2003. gr-qc/02090 61. [37] D. J . Chalmers. The Co nsci ous Mind: In Se ar ch of a F undamental The ory . Oxford, 1996. [38] X. Chen, D. Dai, Y. Du, and S.-H. T eng. Settling the complexit y of Arrow-De b reu equilibria in marke ts with additiv ely s eparable utilities. In Pr o c. IEEE F OCS , p age s 273–282 , 200 9. [39] X. Chen and X. Deng. Settling the complexit y of t wo -play er Nash equilibr ium. In Pr o c. IEE E F O CS , pages 261–271, 2006. 
[40] C. Ch er n iak. Comp u tatio n al c omp lexit y and th e univ ers al acceptance of logic. The Journal of Philosophy , 81(12):739 –758, 1984. [41] R. Clev e, P . Høy er, B. T oner, an d J. W atrous. Consequences and limits of nonlo cal s trate - gies. In Pr o c. IEEE Confer enc e on Computationa l Complexity , pages 236–2 49, 2 004. quan t- ph/040407 6. [42] A. Cobham. T h e in trinsic computational difficult y of fun ctio n s. In Pr o c e e dings of L o gic, Metho dolo gy, and P hilos ophy of Scienc e II . North Holland, 1965 . [43] J. Cop eland. Hyp ercomputation. M i nds and Machines , 12:461 –502, 20 02. [44] C. Dask alakis, P . W. Goldb erg, and C. H. P apadimitriou. The complexit y of computing a Nash equilibrium. Commun. ACM , 52( 2):89–97, 2009. Earlier version in Pr oceedings of STOC’2006. [45] R. Da wkins. The Go d Delusion . Hough ton Mifflin Harcourt, 2006. [46] D. C. Denn ett. D arwin ’s D anger ous Ide a: E volution and the Me anings of Life . Simon & Sc huster, 1995. [47] D. Deutsch. Quantum mechanics near closed timelik e lin es. Phys. R ev. D , 44:3197– 3217, 1991. [48] D. Deutsch. The F abric o f R e ality . P enguin, 1998. [49] D. Deutsc h. The Be ginning of Infinity: Explanations that T r ansform th e World . Allen Lane, 2011. 53 [50] I. Din ur. The PCP theorem b y gap amplification. J. ACM , 54(3):12, 2007. [51] A. Druc ke r . Multiplying 10-digit n umb ers using Flic kr: the p o w er of recognition memory . p eople.csail.mit .edu /andyd/rec m etho d.p d f , 2011. [52] R. F agin. Finite mo del theory - a p ersonal p ersp ectiv e. The or etic al Comput. Sci. , 116:3– 31, 1993. [53] R. F agin, J . Y. Halp ern, Y. Moses, and M. Y. V ardi. R e asoning ab out Know le dge . The MIT Press, 1995. [54] U. F eige, S. Goldwasser, L. Lov´ asz, S. Safra, and M. Szegedy . Interact ive pr oofs and the hardness of appro ximating cliques. J. A CM , 43(2):26 8–292, 199 6. [55] R. P . F eynman. Sim ulating physics with compu ters. Int. J. 
The or etic al Physics , 21(6-7): 467– 488, 1982. [56] L. F ortno w. The role of relativization in complexit y theory . B u l letin of the EA TCS , 52:229– 244, F ebruary 1994. [57] L. F ortnow. One complexit y theorist’s view of qu an tum computing. The or etic al Comput. Sci. , 292(3):597 –610, 2003 . [58] L. F ortno w and S. Homer. A short history of computational complexit y . Bul letin of the EA TCS , (80):95–1 33, 2003. [59] A. F r aenk el and D. Lic h tenstein. Compu ting a p erfect strategy for nxn c hess requires time exp onen tial in n. Journal of Co mbinatorial The ory A , 31:199–21 4, 1981. [60] C. Gent r y . F ully homomo r phic encry p tion usin g ideal lattices. In Pr o c. ACM STOC , pages 169–1 78, 20 09. [61] O. Goldreic h. On q u an tum compu ting. w ww.wisdom.w eizmann.ac.il/˜oded /o n -qc.h tml, 2004. [62] O. Goldreic h. Computational Compl exi ty: A Con c eptual Persp e ctive . Cam br idge Unive r sit y Press, 2008. Earlier version at www.wisd om.w eizmann.ac.il/ ˜o ded/cc-drafts.ht ml. [63] O. Goldreic h . A Primer on Pseudor andom Gener ator s . American Mathematica l So ciet y , 2010. w ww.wisdom.w eizmann.ac.il/˜oded /PDF/ pr g10 .p d f. [64] O. Goldreic h, S. Goldw asser, and S. Mic ali. Ho w to construct r andom fu nctions. J. ACM , 33(4): 792–807, 1 984. [65] O. Goldr eich, S. Micali, an d A. Wigderson. P r oofs that yield nothing b ut their v alidit y or all languages in NP ha ve zero-kno wledge pro of systems. J. ACM , 38(1):6 91–729, 19 91. [66] S. Goldw asser, S. Micali , an d C. Rack off. The kno wledge complexit y of interact ive pro of systems. SIAM J. Comput. , 18(1):1862 08, 1989 . [67] N. Goo dman. F act, Fiction, and F or e c ast . Harv ard Un iv ersit y Press, 1955. [68] P . Gr ah am. Ho w to do p hilosoph y . ww w.paulgraham.com/philosoph y .h tml, 2007. 54 [69] L. K. Gro ver. A fast qu an tum mec hanical algorithm for database searc h. In Pr o c. ACM STOC , pages 212–2 19, 1 996. quant -p h /96 05043. [70] J. 
Hartmanis and R. E. Stearns. On the computational complexit y of algorithms. T r ansactions of the Americ an Mat hematic al So ciety , 117:285–3 06, 1965. [71] J. Haugeland. Synta x, semantic s, ph ysics. In J. Preston and M. Bishop, editors, Views into the Chinese R o om: New Essays on Se arle and Artificial Intel ligenc e , pages 379–39 2. O xford, 2002. [72] J. Hin tikk a. Know le dge and Belief . Cornell Univ ersity Press, 1962. [73] M. Hogarth. Non-Turing compu ters and non-Turin g computabilit y . Biennial Me eting of the Philosophy of Scienc e Asso ciation , 1:126–138 , 1994. [74] A. S. Holev o. Some estimates of the information transmitted by quan tum comm unication c hannels. Pr oblems of Information T r ansmission , 9:177– 183, 1973. English translation. [75] G. ’t Ho oft. Quant u m gravit y as a dissipativ e deterministic system. Classic al and Quantum Gr avity , 16:3263– 3279, 1999 . gr-qc/990 3084. [76] D. Hu m e. An Enquiry c onc erning H uman Understanding . 17 48. 18th.eserv er.org/h u me- enquiry .h tml. [77] N. Immerman. De scriptive Complexity . Springer, 1998. [78] R. Impagliaz zo. A p ersonal view of av erage-c ase complexit y . In P r o c. IEEE Confer enc e on Computation al Complexity , p age s 134–147 , 199 5. [79] R. Impagliazzo and A. Wigderson. P=BPP unless E h as sub exp onenti al circuits: derandom- izing the X OR Lemma. In Pr o c. ACM STO C , pages 220– 229, 1 997. [80] D. Kahneman, P . Slo vic, and A. Tv ersky . Judgment U nder Unc ertainty: Heuristics and Biases . Cambridge Unive r sit y Press, 1982. [81] M. J. Kearns and L. G. V alian t. Cryptographic limitations on learning Bo olea n f orm ulae and finite automata. J. ACM , 41(1):67– 95, 1994. [82] M. J. Kearns and U. V. V azirani. An Intr o duction to Computational L e arning The ory . MIT Press, 1994. [83] J. Kemp e, H. Koba ya sh i, K. Matsumoto, B. T oner, and T . Vidick. E n tangled games are hard to approximat e. SIAM J. Comput. , 40(3): 848–877, 2011. 
Earlier v ersion in FOCS’2008 . arXiv:0704 .2903. [84] A. Kliv ans and D. v an Melk eb eek. Graph nonisomorphism has sub exp onentia l size pr oofs un - less the p olynomial-time hierarc hy collapses. SIAM J . Comput. , 31:1501– 1526, 2 002. Earlier v ersion in A C M STOC 1999. [85] R. E. Ladner. On the structur e of p olynomial time reducibilit y . J. ACM , 22:155 –171, 1975. 55 [86] D. Leiv an t. A f oundational delineat ion of p oly-ti m e. Information and Computa tion , 110(2 ):391–420, 199 4. Earlier v ersion in LICS (Logic In Compu ter Science) 1991, p. 2-11. [87] H. J. Lev esque. Is it enough to get the b eha vior right? In Pr o c e e dings of IJCA I , pages 1439– 1444, 20 09. [88] L. A. Levin. Polynomial time and extra v agan t mo dels, in T he tale of one-wa y functions. Pr oblems of Information T r ansmission , 39(1):9 2–103, 2 003. cs.CR/0012023 . [89] M. Li and P . M. B. Vit´ anyi. An Intr o duction to Kolmo gor ov Complexity and Its Applic ations (3r d e d.) . Sprin ger, 2008. [90] S. Llo yd, L. Maccone, R. Garcia-P atron, V. Gio v annetti, and Y. Shik ano. The quantum mec hanics of time tr a v el through p ost-selec ted telep ortatio n. Phys. R ev. D , 84(02500 7), 2011. arXiv:1007.2615 . [91] J. R. Lucas. Minds, mac hines, and G¨ odel. Philosophy , 36:11 2–127, 1961. [92] N. D. Mermin. F rom cbits to qbits: teac hing computer scien tists quan tum m ec hanics. A mer- ic an J. Phys. , 71(1):23– 30, 2003. quant-ph/020 7118. [93] N. D. Mermin. Quantum Computer Scienc e: An Intr o duction . Cambridge Un iv ersit y Pr ess, 2007. [94] S. Micali. Compu tatio nally sound p ro ofs. SIA M J . Comput. , 30(4):12 53–1298, 200 0. [95] C. Mo ore and S. Mertens. The Natur e of Computation . Oxford Universit y Press, 2011. [96] M. S. Morris, K . S. Thorne, and U. Y urtsev er. W ormh oles, time mac hines, and the w eak energy condition. Phys. R ev. L ett. , 61:1446– 1449, 1988 . [97] A. Morton. Epistemic vir tues, meta virtu es, and computational complexit y . 
Noˆ us , 38(3):481– 502, 2004. [98] A. Neyman. Bounded complexit y justifies coop eration in the finitely rep eated p risoners’ dilemma. E c onomics L etters , 19(3):227–2 29, 1985. [99] M. Nielsen and I. Chuang. Quantum Computation and Q uantum Information . C am bridge Univ ersity Press, 2000. [100] C. H. Papadimitriou. Computa tional Complexity . Addison-W esley , 1994. [101] I. P arb erry . Knowledge , u n derstanding, and compu tational complexit y . In D. S. Levine and W. R. Elsb erry , editors, Optimality in Biolo gic al and Artificial Networks? , p age s 125–1 44. La wrence Erlbaum Asso ciates, 1997. [102] R. Penrose. The Emp er or’s New M ind . O xford, 1989. [103] R. P enrose. Shadows of the M ind: A Se ar c h for the Missing Scienc e of Consciousness . Oxford, 1996. 56 [104] L. Pitt and L. V alian t. Compu tational limitations on learning fr om examples. J. ACM , 35(4): 965–984, 1 988. [105] C. Pomerance . A tale of t wo siev es. Notic es of the Americ an Mathemat ic al So ciety , 43(12 ):1473–1485, 199 6. [106] H. Pu tnam. R epr esentation a nd R e ality . Bradford Bo oks, 1991. [107] R. R az. Exp onen tial s eparatio n of quantum and classical comm un ication complexit y . In Pr o c . ACM STOC , p age s 358–367 , 1999 . [108] A. A. Razb oro v and S. Rud ic h. Natural p r oofs. J. Comput. Sys. Sci. , 55(1) :24–35, 1 997. [109] S. Reisc h . Hex is PSP A C E-co m p lete. A cta Informatic a , 1 5:167–191 , 1981. [110] H. E. Rose. Su b r e cursion: F unctions and Hier ar chies . Clarend on Pr ess, 1984. [111] A. Ru binstein. Mo deling Bounde d R ationa lity . MIT Press, 1998. [112] A. Sch¨ onhage and V. S trassen. Sc h nelle Multiplik atio n großer Zahlen. Computing , (7):281– 292, 1971. [113] J. Searle. Minds, brains, and pr ograms. Behavior al and Br ain Scienc es , 3(417-457 ), 1980. [114] J. Searle. The R e disc overy of the Mind . MIT Press, 1992. [115] A. S hamir. IP=PSP ACE. J. ACM , 39(4):8 69–877, 19 92. [116] S. M. Shieb er. 
Th e Turing test as interac tive pr oof. Noˆ us , 41(4):686– 713, 2007. [117] P . W. Shor. Po lyn omial-time algorithms for prime facto rization and discrete logarithms on a quan tum computer. SIAM J. Comput. , 26(5 ):1484–150 9, 1997 . Earlier version in IEEE F OCS 1994. quant-ph/950 8027. [118] H. T. Siegelmann. Neural and sup er-Tu ring computing. Minds and Machines , 13(1):103– 114, 2003. [119] D. S imon. O n the p o w er of quantum co mp utation. In Pr o c. IE EE FOCS , pages 116–1 23, 1994. [120] H. A. Simon. A b eh a vioral mo del of rational c h oice . The Quart erly Journal of Ec onomics , 69(1): 99–118, 19 55. [121] M. Sipser. The history and status of the P versus NP qu estio n . In Pr o c. A CM STOC , pages 603–6 18, 19 92. [122] M. Sipser. Intr o duction to the The ory of Computation (Se c ond E dition) . Course T echnolog y , 2005. [123] R. S talnak er. Th e pr oblem of logica l omniscience, I and I I. In Context and Con tent: E ssays on Intentionality in Sp e e ch and Thought , Oxford Cognitiv e Science Series, p ages 241–273. Oxford Universit y Press, 1999 . 57 [124] L. J. Sto c kmey er. Classifying the computational complexit y of problems. J. Symb olic L o gic , 52(1): 1–43, 19 87. [125] J. A. S torer. On the complexit y of chess. J. Comput. Sys. Sci. , 27(1):77 –100, 1 983. [126] A. M. T uring. Comp uting m achinery and in telligence. M i nd , 59:433– 460, 195 0. [127] L. G. V alian t. A theory of the learnable. Communic ations of th e A CM , 27:1134– 1142, 1984 . [128] L. G. V alia nt. Evolv abilit y . J. ACM , 56(1), 2009. Confer en ce version in MF CS 2007. ECCC TR06-120 . [129] V. V apnik and A. Chervonenkis. O n th e u niform con v ergence of r elat ive frequ en cie s of ev ents to their probabilities. The ory of P r ob ability and its Applic ations , 16(2):264 –280, 1971 . [130] H. W ang. A L o gic al Journey: F r om G ¨ odel to Philosophy . MIT Press, 1997. [131] J. W atrous. S uccinct q u an tum pro ofs for prop erties of fin ite group s. 
In P r o c. IEEE FOCS , pages 537–546 , 2000 . cs.CC/0009002. [132] J. W atrous. Q uan tum computational complexit y . In Encyclop e dia of Complexity and Systems Scienc e . Sp r inger, 2008. arXiv:0804.340 1. [133] A. Wigderson. P, NP and mathematics - a c omp utatio n al complex- it y p ersp ectiv e. In P r o c e e dings of the International Co ngr ess of Math- ematicians 2006 (M ad rid) , pages 665–712. EMS Pu blishing House, 2007. www.math.ias.edu/˜a vi/PUBLICA TIONS/MYP APERS/W06/w06 .p d f . [134] A. Wigderson. Kno wledge, creativit y and P versus NP, 2009. www.math.ias.edu/˜a vi/PUBLICA TIONS/MYP APERS/A W09/A W09. p d f. [135] E. Wigner. The unreasonable effectiv eness of mathematics in the natural sciences. Commu- nic ations in Pur e and Applie d Mathematics , 13(1), 1960. [136] R. de W olf. Philosophical applicati ons of computational learning theory: Chomsky an innateness and o ccam’s razor. Maste r ’s thesis, Erasm us Unive r sit y , 1 997. home- pages.cwi.nl/˜rdew olf/publ/philosophy/ph thesis.p df. 58