Scientific theories of consciousness should be falsifiable and non-trivial. Recent research has given us formal tools to analyze these requirements of falsifiability and non-triviality for theories of consciousness. Surprisingly, many contemporary theories of consciousness fail to pass this bar, including theories based on causal structure but also (as I demonstrate) theories based on function. Herein, I show that these requirements of falsifiability and non-triviality especially constrain the potential consciousness of contemporary Large Language Models (LLMs) because of their proximity to systems that are equivalent to LLMs in terms of input/output function; yet, for these functionally equivalent systems, there cannot be any falsifiable and non-trivial theory of consciousness that judges them conscious. This forms the basis of a disproof of contemporary LLM consciousness. I then show a positive result: theories of consciousness based on (or requiring) continual learning do satisfy the stringent formal constraints for a theory of consciousness in humans. Intriguingly, this work supports a hypothesis: if continual learning is linked to consciousness in humans, the current limitations of LLMs (which do not continually learn) are intimately tied to their lack of consciousness.
In the past decade, a subfield of consciousness research has taken a "structural turn" and become focused on mathematics and formalisms [1,2]. Much of this is due to the influence of Integrated Information Theory (IIT) [3], since it is expressed in a mathematically formalized way [4,5]. In turn, many of the criticisms of IIT have depended on its detailed particulars and whether it is scientifically testable [6,7]. These debates have led to increased interest in defining minimal "toy models" of consciousness [8,9] and in identifying what requirements (such as falsifiability) might constrain theories of consciousness if they are to be testable. This structural turn gives rise to a novel approach: Can a theory of consciousness be so strongly constrained by requirements for testability that its overall nature can be deduced?
Increasingly, such an approach seems a necessity. The field of consciousness research currently has hundreds of existing theories [10,11] and no agreed-upon way to empirically distinguish them. The search for the neural correlates of consciousness [12,13] has revealed that even tightly-controlled and well-funded adversarial collaborations fail [14], such as the recent head-to-head comparison of IIT and Global Neuronal Workspace Theory [15,16], which led to accusations of pseudoscience [7,17]. If consciousness research is to make serious progress, it must winnow the wide field via constraints on properties of theories [18,19]. In this, scientists would act like artists drawing the negative space around consciousness to see its outline clearly.
A further reason to pursue a testability-first approach is that we want answers to questions about consciousness that the contemporary state of consciousness research cannot yet provide. Most prominently, the question of whether or not contemporary LLMs (like ChatGPT, Claude, or Gemini) are conscious has suddenly become critical. There are major risks associated with getting this question wrong. Assigning consciousness where there is none carries a myriad of risks, including increasing the risk of AI psychosis, overestimation of LLM capabilities, inappropriate practices or regulation, and misleading scientific beliefs about human consciousness [20,21]. On the other hand, if contemporary LLMs were conscious they could be considered moral patients [22] and be due considerations like conversational “exit rights” [23] as well as other protections against mistreatment.
It is unlikely that empirical data from LLMs alone can answer the question of their consciousness, given how prone they are to confabulation and prompt-sensitivity [24]. And while there is a small subfield studying LLM “introspection” [25], the best evidence for such introspection is the statistically uncommon detection of injected concepts as anomalies, for which there could be shallow mechanistic explanations [26].
Therefore, most have assumed that answering the question of LLM consciousness would require scientific consensus around a particular theory of consciousness. However, I show that this is not necessary. Instead, it is possible to prove that no non-trivial and falsifiable theory of consciousness could exist for (at minimum) baseline LLMs. This represents much faster progress than trying to assign assumption-dependent probabilities of consciousness, or trying to apply some set of the many (often contradictory) theories that do currently exist [27]. This road to a disproof of LLM consciousness begins with one particularly influential criticism of IIT: The “Unfolding Argument” by Doerig et al. [28] has led to debate in the literature on how to falsify theories of consciousness. Specifically, the argument draws on the universal approximation theorem [29] to argue that any given Recurrent Neural Network (RNN) can be “unfolded” into some functionally equivalent Feedforward Neural Network (FNN). This creates a pathological scientific situation for theories like IIT, since any given RNN might have a certain degree of integrated information, yet its “twin” FNN would have zero integrated information, despite the two networks sharing the same input/output (I/O).
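To make the unfolding concrete, the following is a minimal sketch in Python with NumPy; the network sizes, weights, and simple tanh recurrence are purely illustrative assumptions, not the construction used in [28]. A recurrent computation over a fixed number of timesteps is unrolled into an acyclic, layer-by-layer pass with copied weights, and the two produce identical outputs on every input even though only the recurrent version contains feedback loops.

```python
import numpy as np

# Illustrative dimensions and weights; nothing here is taken from [28].
rng = np.random.default_rng(0)
T, d_x, d_h = 4, 3, 5                        # timesteps, input size, hidden size
W_x = rng.normal(size=(d_h, d_x))            # input-to-hidden weights
W_h = rng.normal(size=(d_h, d_h))            # recurrent hidden-to-hidden weights
W_y = rng.normal(size=(1, d_h))              # hidden-to-output weights

def rnn(xs):
    """Recurrent pass: one hidden state that feeds back into itself (a cycle)."""
    h = np.zeros(d_h)
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)
    return W_y @ h

def unfolded_fnn(xs):
    """Unfolded pass: T distinct feedforward layers, each with its own copy of
    the weights, so the computation graph is strictly acyclic; the input/output
    mapping is nevertheless identical to the recurrent network's."""
    layer_weights = [(W_x.copy(), W_h.copy()) for _ in range(T)]
    a = np.zeros(d_h)                        # activation feeding the first layer
    for t, (Wx_t, Wh_t) in enumerate(layer_weights):
        a = np.tanh(Wx_t @ xs[t] + Wh_t @ a)
    return W_y @ a

xs = rng.normal(size=(T, d_x))
assert np.allclose(rnn(xs), unfolded_fnn(xs))   # same I/O behavior
```

This simple unrolling trades recurrence for depth and so only covers inputs of bounded length; the universal approximation route taken in [28] is what extends the equivalence to the RNN's I/O function more generally.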
Later work by Johannes Kleiner and myself [18] showed that unfolding was a specific case of the independence between what a theory uses to make predictions about consciousness (like the integrated information of IIT) and how a scientist infers the actual states of conscious experiences during experiments (via things like report or behavior or, more broadly, any I/O). We proposed a framework wherein alternative systems (of any kind) can be “substituted” in during empirical testing while holding I/O fixed, which reveals pathologies around falsifiability (thus, the “Substitution Argument”).
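A similarly minimal sketch illustrates the substitution idea; the lookup-table substitute below is an illustrative choice, not the general construction analyzed in [18]. A candidate system's responses to a fixed battery of experimental probes are recorded and then replayed by a substitute whose internals are nothing but a dictionary, so any inference drawn from I/O on those probes applies equally to both systems.

```python
import numpy as np

# "original_system" is a hypothetical stand-in for any candidate system under
# test; the lookup-table substitute is an illustrative choice, not taken from [18].
rng = np.random.default_rng(1)
W = rng.normal(size=(1, 3))

def original_system(x):
    """Some system whose consciousness a theory is asked to judge."""
    return np.tanh(W @ x)

def record_io(system, probes):
    """Record the system's responses to a fixed battery of experimental probes."""
    return {x.tobytes(): system(x) for x in probes}

def make_substitute(io_table):
    """Replay the recorded outputs: identical I/O on every probe, although the
    internals (a dictionary lookup) share nothing with the original system."""
    return lambda x: io_table[x.tobytes()]

probes = [rng.normal(size=3) for _ in range(20)]
substitute = make_substitute(record_io(original_system, probes))

# No experiment confined to the I/O of these probes can tell the two apart,
# even though predictions based on internal structure may differ sharply.
assert all(np.allclose(original_system(x), substitute(x)) for x in probes)
```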
Additionally, we pointed out that, instead of independence, there is a risk of having too much dependency between a theory’s predictions and a scientist’s inferences. This would create theories of consciousness that are trivial.