What does a system modify when it modifies itself?


Authors: Florentin Koch

Self-modification regimes and crossed opacities in cognitive systems

École Polytechnique, Palaiseau, France
florentin.koch@polytechnique.edu

Abstract

When a cognitive system modifies its own functioning, what exactly does it modify: a low-level rule, a control rule, or the norm that evaluates its own revisions? Cognitive science describes executive control, metacognition, and hierarchical learning with considerable precision, but lacks a formal framework distinguishing these targets of transformation and the conditions that separate them. Meanwhile, contemporary artificial intelligence realizes a growing spectrum of self-modification without common criteria for comparison with biological cognition.

We show that the question "what is a self-modifying system?" imposes, by logical derivation, a minimal structure: a rule hierarchy Φ_t = {R_0, …, R_{k_max}}, an unavoidable fixed core (R_{k_max}), and a distinction between effective rules (Φ_t), represented rules (Φ_t^R ⊆ Φ_t), and causally accessible rules (Φ_t^C ⊆ Φ_t). Four self-modification regimes are identified according to the level of the hierarchy at which modification operates: (1) action without modification, (2) low-level modification, (3) structural modification, (4) teleological revision. Each regime is anchored in a characterized cognitive phenomenon and a corresponding artificial system.

Application to the human case yields a central result: the crossing of opacities. Humans possess self-representation (Φ_t^R) and causal power (Φ_t^C) concentrated at the upper levels of their hierarchy (R_{k_max} and neighboring levels), while operational levels (R_0, R_1, …) remain largely opaque. Reflexive artificial systems exhibit the inverse profile: Φ_t^R and Φ_t^C rich at operational levels, empty at the level of R_{k_max}.
This crossed asymmetry constitutes the structural signature of the human/AI comparison. The framework further provides a structural protocol for the question of artificial consciousness, showing that higher-order theories (HOT) and Attention Schema Theory (AST) appear as special cases of the formalism. Four testable predictions and a novel experimental protocol are proposed. Four open problems emerge in cascade: the independence of transformativity and autonomy, the viability of self-modification, the teleological lock, and identity under transformation.

Keywords: cognitive architecture, self-modification, metacognition, reflexivity, hierarchical control, artificial consciousness, human/AI comparison

Notation table

Φ_t : Functional state of the system at time t: {R_0^(t), R_1^(t), …, R_{k_max}^(t)}
R_i^(t) : Level-i rule at time t in the functional hierarchy
R_0 : Lowest-level rule (observable behavior, weights, associative strength)
R_1 : Rule governing the modification of R_0 (learning rule)
R_{k_max} : Teleological norm: ultimate evaluation criterion
k_max : Hierarchy depth: maximum number of levels
E_t : Environment at time t; E_t ⇝ Φ_t denotes feedback (via F)

Representation and causality
Φ_t^R : Self-representation: subset of Φ_t that the system represents to itself (Φ_t^R ⊆ Φ_t)
Φ_t^C : Causally active subset: rules of Φ_t over which the system possesses effective causal power (Φ_t^C ⊆ Φ_t, not necessarily ⊆ Φ_t^R)
Φ_t^R ∩ Φ_t^C : Rules both represented and causally accessible (reflexive modification)
Φ_t^C \ Φ_t^R : Rules causally modified without representation (blind modification)
Φ_t^R \ Φ_t^C : Rules represented but without causal power (introspection without leverage)
A \ B : Set difference: elements of A not belonging to B

Local structure of an operation R_{i+1} → R_i
˜R_i : Compressed, partial, or inaccurate representation of R_i
ˆ(R_{i+1} → R_i) : Representation (possibly degraded) of the causal link between levels
R_{i+1} ⇝ R_i : Capacity (possibly degraded) of R_{i+1} to modify R_i
Reflexivity: Φ_t^R ⊇ {˜R_{i+1}, ˆ(R_{i+1} → R_i), ˜R_i}
Causality: Φ_t^C ⊇ {R_{i+1} ⇝ R_i}

Dynamics
F : Transformation dynamics: (x_{t+1}, p_{t+1}) = F(x_t, p_t); hierarchical form: R_i^(t+1) = R_{i+1}^(t)(R_i^(t); R_{k_max}^(t))
Π : Projection operator: produces Φ_t^R from Φ_t and state s_t

Systemic properties
T : Transformativity: maximum level at which the system can transform itself
A : Organizational autonomy: capacity to maintain its conditions of existence
S1, S2, S3 : Simulation modalities: logical, execution/sandbox, predictive

1 Introduction

1.1 Two literatures, one common question

In cognitive science, an empirical convergence establishes that human cognition is not reducible to local responses to stimuli but deploys hierarchical, stable, and revisable control structures. Miller & Cohen (2001) showed that the prefrontal cortex actively maintains goal representations that bias processing in posterior systems: the agent implements organizing structures, not merely responses. Koechlin et al. (2003) and Badre & Nee (2018) documented a cascading organization of executive processes along a rostro-caudal axis of the frontal cortex, where progressively more abstract forms of control govern more local operations (see also Botvinick et al., 2009). Nelson & Narens (1990) formalized a two-level architecture, object-level and meta-level, connected by monitoring and control flows; Fleming & Dolan (2012) showed that metacognitive accuracy is dissociated from first-order performance and associated with anterior prefrontal cortex. Computational cognitive architectures such as ACT-R (Anderson, 2007), SOAR (Laird, 2012), and CLARION (Sun, 2002) implement various levels of metacognitive control, but none formalizes the conditions under which cognitive modification changes regime.
In parallel, contemporary artificial intelligence realizes a concrete spectrum of self-modification levels. A classical Markov decision process operates with fixed rules; policy gradient learning adjusts low-level rules under an invariant reward function; meta-reinforcement learning (Wang et al., 2016) modifies the learning rules themselves; architecture search (Zoph & Le, 2017) rewrites the network structure. Upstream of these achievements, the computational reflection tradition formalized the conditions of possibility for self-modification: Smith (1984) showed that a system can contain an operative representation of its own interpreter and modify it; Maes (1987) clarified the minimal threshold by positing that reflexivity requires an internal representation that is causally active in determining behavior.

These two literatures address the same question: under what conditions can a cognitive system intervene on the rules that govern its own functioning? Yet they do not converge toward a common framework. The literature on cognitive control and metacognition distinguishes monitoring from regulation, confidence from performance, task switching from adaptation, but does not formalize the conditions under which these modifications change in nature. Conversely, the AI literature provides systems that modify their rules at various levels, but without criteria enabling comparison of these modifications with those of human cognition within the same conceptual space.

This gap is not merely terminological. A frontal patient who perseverates on the Wisconsin Card Sorting Test despite explicit negative feedback (Milner, 1963) has lost the ability to modify an active rule while retaining intact associative learning.
A patient in metacognitive therapy who learns to treat "worrying protects me" as a revisable rule (Wells, 2009) performs a transformation qualitatively distinct from an associative adjustment. These two cases, developed formally in §3, illustrate a gap that the literature does not bridge: between modifying low-level rules within a fixed framework and modifying the framework itself. Katyal & Fleming (2024) note that contemporary metacognition research must recover greater construct breadth, beyond its dominant core centered on confidence.

1.2 Contribution

This article proposes to bridge this gap through a minimal theory of cognitive modification regimes. Our question is twofold: (i) what are the formally distinct regimes of modification that a cognitive system can exert on its own rule hierarchy, and (ii) how does the extent of self-representation determine which regimes are accessible to it? We seek neither to propose a complete theory of mind nor to subsume all forms of cognitive regulation under a single principle, but to introduce a level of analysis that is missing between existing theories.

We specify at the outset what we mean by endogenous self-modification: a self-modifying system is one capable of producing, through its own functioning, a transformation of its functional structure, not merely of its states or outputs. This specification excludes purely exogenous modifications, while acknowledging that in systems coupled to their environment (Odling-Smee et al., 2003), the boundary between endogenous and exogenous is porous.

To address this question, we first show (§2) that the question "what is a self-modifying system?" imposes a minimal structure: a rule hierarchy, a fixed core, and a distinction between effective rules, represented rules, and causally accessible rules. From this formalism, we distinguish four regimes (§3), each anchored in a characterized cognitive phenomenon.
We apply the framework to human cognition (§4), identifying the crossing of opacities as a structural signature and positioning the framework relative to theories of consciousness. The taxonomy gives rise to four structural problems in cascade (§5): the independence of transformativity and autonomy, the viability of self-modification, the teleological lock, and identity under transformation, of which the first three receive partial resolutions within the framework. We derive four testable predictions and propose an experimental protocol (§6). The discussion (§7) synthesizes the contributions and identifies limitations and future directions. These results constitute a bridge between cognitive science, philosophy of mind, and AI safety.

2 Model

The framework rests on the idea that a self-modifying cognitive system can be described as a hierarchy of rules of which the system represents only a subset. This section justifies this structure (§2.1), derives it logically (§2.2), and then formalizes it (§2.3).

2.1 Why a hierarchy of rules?

The hypothesis of hierarchical organization is supported by three convergent arguments.

Empirical argument. Human behavior manifests stable, transferable, and revisable control structures. Miller & Cohen (2001) showed that the prefrontal cortex maintains goal representations that bias posterior processing. Conflict and adaptation paradigms show that these structures are modulable: subjects strategically adjust their selection priorities (Botvinick et al., 2001). Becker et al. (2023) showed experimentally that systematic metacognitive reflection leads subjects to adopt more far-sighted strategies: what is revised is not a punctual response but a decision structure. Koechlin et al. (2003) and Badre & Nee (2018) documented a cascading organization of executive processes along a rostro-caudal axis.

Structural argument.
Simon (1962) showed that stable complex systems are necessarily nearly decomposable. His argument rests on the parable of the watchmakers: between a watchmaker who assembles a thousand pieces in one go and one who assembles stable sub-modules of ten pieces each, only the second survives interruptions, because each perturbation destroys only a sub-module, not the entire assembly. For a self-modifying system, this constraint is reinforced: if any component could be modified without hierarchical organization, a local revision could propagate its effects through the entire system without an isolation mechanism, producing cascades of uncontrolled revisions. More formally, stability requires that interactions between components within a level be much more frequent and rapid than interactions between levels: the property of near-decomposability. Hierarchy is therefore not merely an empirical property of stable systems; it is a viability condition for self-modification.

Logical argument. As we show in §2.2, hierarchy is not merely an observation or a viability constraint: it derives from the very question "what is a self-modifying system?"

2.2 Derivation of the minimal structure

The proposed structure is not one modeling choice among others. It derives, step by step, from the question: what is a self-modifying system?

Step 1: an ordinary dynamical system does not suffice. A classical dynamical system evolves according to a fixed law: x_{t+1} = f(x_t). The state x_t changes, but the law f remains invariant; it belongs to the definition of the system, not to its content. Such a system adapts its states, never its processes.

Step 2: processes must be distinct internal objects. For a system to modify its own processes, these processes must be represented as entities distinct from the ordinary state. We therefore introduce two components: a state x_t and a process p_t.
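The contrast between step 1 and the state/process distinction can be made concrete in a short sketch. The maps f and F below are hypothetical toy laws, not part of the formalism; the point is only the shape of the two dynamics:

```python
# Step 1: an ordinary dynamical system. The law f is fixed; only the state changes.
def f(x):
    return 0.5 * x + 1.0  # hypothetical fixed law

x_fixed = 0.0
for _ in range(3):
    x_fixed = f(x_fixed)  # states evolve; the process never does

# Step 2 (anticipating step 4): the process p is part of the total state s = (x, p),
# and the dynamics F returns both the next state and the next process.
def F(x, p):
    x_next = p(x)            # the current process produces the next state

    def p_next(y):           # ...and the process itself is revised
        return p(y) + 0.1

    return x_next, p_next

x, p = 0.0, (lambda y: 0.5 * y + 1.0)
for _ in range(3):
    x, p = F(x, p)  # both components evolve: minimal endogenous self-modification
```

After three steps the fixed-law system sits at x = 1.75 and will track f forever, while in the second system both the state and the law governing its future evolution have changed.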
As soon as one requires that a process be modifiable, one imposes that it be identifiable, distinguishable, and replaceable, hence that it be a manipulable object. A manipulable object that determines the system's behavior is, in the minimal sense, an operative rule.

Step 3: rules must be part of the system's state. If rules remain external to the system, any modification can only come from outside: the system does not modify itself; it is modified. For self-modification to be endogenous, rules must be internalized. The total state becomes s_t = (x_t, p_t): not only what the system is, but also how it operates.

Step 4: the dynamics must produce both the next state and the next rule. The functioning at time t must be able to determine not only x_{t+1} but also p_{t+1}: (x_{t+1}, p_{t+1}) = F(x_t, p_t). If F produces only x_{t+1}, p_t remains fixed and one falls back into an ordinary dynamical system.

Step 5: the infinite regress imposes a fixed core. The system can modify p_t, but F remains fixed. One might want to make F modifiable by introducing a meta-rule, then a meta-meta-rule, and so on. This escalation is endless. Any attempt to eliminate the distinction between the level that is modified and the level that modifies leads to an infinite regress or circularity. The consequence is structural: any coherent self-modifying system necessarily possesses, at least at each time t, a minimal fixed core. This result is not a postulate: it is a logical constraint. It directly grounds the notion of the teleological norm R_{k_max} and Proposition 1 (causal closure) established in §3.3. The qualification "at each time t" is decisive: it leaves open the possibility that the level serving as the fixed core at one moment may itself become an object of revision at a later moment, a possibility explored in Regime 4 (§3.4).

Step 6: partial representation is unavoidable.
The dynamics F operates on the total state, but nothing guarantees that the system has complete access to its own rules. A biological system implements rules in neural networks of which it has no explicit representation. A computational system can access the source code to which it is given access, yet will discover emergent properties of its execution. Self-modification therefore does not operate on the complete set of effective rules, but on the system's representation of them, a representation that is generally partial, compressed, and potentially inaccurate. This observation requires distinguishing the hierarchy of effective rules from the subset that the system represents.

Synthesis. The structure Φ_t = {R_0, …, R_{k_max}}, a fixed core R_{k_max}, and a distinction Φ_t^R ⊆ Φ_t result from the six steps of the derivation, each necessary for the notion of self-modification to be well-defined: an ordinary fixed-law system does not suffice (step 1), processes must be internal objects (step 2), internalized in the state (step 3), produced by the dynamics (step 4), bounded by a fixed core (step 5), and accessible via partial representation (step 6). Each is satisfied by the cognitive systems, biological or artificial, that the framework seeks to describe.

2.3 Formalism

We denote by Φ_t = {R_0^(t), R_1^(t), …, R_{k_max}^(t)} the functional state of the system at time t, from the most concrete rules R_0 to the teleological norm R_{k_max}. The convention is one of an ascending hierarchical gradient: higher levels govern lower levels.

Terminological clarification. The term "rule" is used in a broad functional sense: it designates any relatively stable structure that constrains or organizes a class of cognitive operations and can become the object of revision. A rule may be instantiated as a goal, an action policy, a heuristic, a generative model, a probabilistic expectation, or a precision-weighting scheme.
The notion is neutral with respect to substrate: it is compatible with active inference (Friston, 2010), which models cognition in terms of generative models rather than symbolic rules. The shift from "process" to "rule" is not an added hypothesis but a consequence of the modifiability requirement (step 2). Every adaptive system changes state; but a self-modifying system modifies at least some of the rules that govern its future evolution.

This precision allows two frequent confusions to be set aside. First, a change in the system's outputs (a different response to a different stimulus) does not constitute self-modification in the sense of the framework: only a change in the hierarchy Φ_t constitutes a modification. A thermostat that adjusts room temperature changes the state of the world, but not its own control rules. Second, the fact that a system partially represents itself (Φ_t^R ≠ ∅) does not yet imply that it can transform the principles organizing its activity. Self-modification thus requires not only variability but a certain internalization of the system's operative structures, and, for Regime 4, effective causal power over the norm.

Representation and causal power. The system does not necessarily have complete access to Φ_t. We denote by Φ_t^R ⊆ Φ_t the set of rules that the system represents to itself, and by Φ_t^C ⊆ Φ_t the set of rules over which the system possesses effective causal power of modification. These two sets do not necessarily coincide, and neither is necessarily included in the other. Three configurations illustrate the distinction.

(i) Causal power without representation. A network trained by gradient descent modifies its weights (R_0) at each iteration under the action of R_1: θ_{t+1} = θ_t − η∇L(θ_t). This process is causally effective, transforming R_0, but the system represents neither R_1 nor the fact that it is learning.
R_1 is in Φ_t^C but not in Φ_t^R.

(ii) Representation without causal power. A human can, through brain imaging, represent the synaptic strength of certain circuits (R_0 in Φ_t^R). But this knowledge confers no causal power: knowing that a given synapse has a given strength does not imply being able to modify it. R_0 is in Φ_t^R but not in Φ_t^C.

(iii) Representation and causal power jointly. A human in metacognitive therapy represents their meta-beliefs (R_2 in Φ_t^R) and can revise them through deliberation (R_2 in Φ_t^C). This is the case where modification is both reflexive and causally effective.

Local structure of an operation R_{i+1} → R_i. Every downward arrow in the hierarchy, every act by which a higher level modifies a lower level, possesses the same internal structure with two components. The reflexive component, when non-empty, contains three elements: Φ_t^R ⊇ {˜R_{i+1}, ˆ(R_{i+1} → R_i), ˜R_i}, where the tilde (˜) denotes a possibly compressed, partial, or inaccurate representation: a representation of the higher rule, of the causal link between levels, and of the modified rule. The causal component contains the effective capacity of R_{i+1} to modify R_i: Φ_t^C ⊇ {R_{i+1} ⇝ R_i}, where ⇝ denotes a possibly degraded causal power. The independence of the two components is essential: a system can causally modify R_i without any representation of what it is doing (gradient descent: Φ_t^R = ∅, Φ_t^C ≠ ∅), or finely represent the link R_{i+1} → R_i without being able to act on it (introspection of a neural circuit: Φ_t^R ≠ ∅, Φ_t^C = ∅).

The central hypothesis is that self-modification operates via two pathways: a structural pathway (the mechanism F modifies Φ_t via Φ_t^C, whether or not there is representation) and a reflexive pathway (the system modifies Φ_t by passing through Φ_t^R ∩ Φ_t^C: representation serves as a causal lever).
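Since Φ_t^R and Φ_t^C are plain subsets of Φ_t, configurations (i)-(iii) reduce to set operations. A minimal sketch (the level labels and the particular membership choices are illustrative, echoing the three examples above):

```python
phi_t = {"R0", "R1", "R2", "Rkmax"}  # effective hierarchy
phi_R = {"R0", "R2"}                 # represented: imaged circuit (R0), meta-belief (R2)
phi_C = {"R1", "R2"}                 # causally accessible: learning rule (R1), meta-belief (R2)

reflexive     = phi_R & phi_C  # (iii) represented and modifiable: metacognitive therapy
blind         = phi_C - phi_R  # (i)   modified without representation: gradient descent
introspective = phi_R - phi_C  # (ii)  represented without leverage: brain imaging

assert reflexive == {"R2"} and blind == {"R1"} and introspective == {"R0"}
assert phi_R <= phi_t and phi_C <= phi_t  # both are subsets of the effective hierarchy
```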
Because Φ_t^R and Φ_t^C are both partial, self-modification is necessarily limited. The framework makes these limits formalizable. The gap between Φ_t^R and Φ_t, and more precisely the extent of Φ_t^R, thus constitutes the central variable of the framework, since it determines which forms of self-modification are accessible to the system.

Teleological norm. We call R_{k_max} the teleological norm: the ultimate evaluation criterion that orients the transformation dynamics. In a supervised learning system, R_{k_max} corresponds to the loss function; in reinforcement learning, to the reward signal; in a living organism, to viability constraints; in human cognition, to a more heterogeneous set of goals, preferences, values, or explicit commitments, a set that is not necessarily unitary or coherent, as value conflicts and moral dilemmas illustrate. The important point is not to impose a single substantive theory of finality, but to recognize that a non-trivial self-modifying system does not transform its rules indifferently: its revisions are oriented by higher-level constraints. Whether R_{k_max} is itself accessible to revision is what separates Regime 3 from Regime 4.

Modes of action. The system has two fundamental modes of action. In the first, a rule R_i acts on the world without modifying Φ_t. In the second, a higher-level rule R_{i+1}, acting on a portion of the architecture, modifies R_i. A third, intermediate case deserves mention: an action on the world can modify Φ_t through feedback; instrumental practice (playing the piano), for example, modifies operative rules through sensorimotor coupling. This case poses no difficulty for the framework: environmental feedback enters F as an input factor. The minimal dynamics can be summarized as:

R_i^(t+1) = R_{i+1}^(t)(R_i^(t); R_{k_max}^(t)).
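One synchronous pass of this dynamics can be sketched with numeric stand-ins for the rules. The particular update functions are hypothetical; only the governance pattern mirrors the equation: R_1 modifies R_0, R_2 governs R_1, and R_{k_max} stays fixed.

```python
R_KMAX = 1.0  # teleological norm: fixed evaluation target (cf. Proposition 1)

def R2(r1_strength, norm):
    """Meta-rule governing R_1; invariant here, so R_1 itself is never revised."""
    return r1_strength

def R1(r0, norm, strength):
    """Learning rule: revises the lowest-level rule R_0, oriented by the norm."""
    return r0 + strength * (norm - r0)

def step(r0, r1_strength):
    """One pass of R_i^(t+1) = R_{i+1}^(t)(R_i^(t); R_kmax^(t))."""
    return R1(r0, R_KMAX, r1_strength), R2(r1_strength, R_KMAX)

r0, s = 0.0, 0.5
for _ in range(3):
    r0, s = step(r0, s)
# r0 moves toward the norm (0.0 -> 0.5 -> 0.75 -> 0.875) while R_1 and R_kmax stay fixed
```

Making R2 return a revised strength instead of the identity would move this toy from Regime 2 into Regime 3: the higher level would then genuinely rewrite the rule below it.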
In full generality, the modification of R_i can depend on the entire represented hierarchy Φ_t^R and on environmental input. The hierarchical notation captures the minimal constraint: modification is principally governed by a higher level and oriented by the teleological norm, an idealization empirically motivated by the cascading organization of prefrontal control (Koechlin et al., 2003; Badre & Nee, 2018). It does not imply that every real architecture is strictly linear or that each level acts only on the immediately lower level.

Figure 1: Hierarchical architecture of the functional state Φ_t and local structure of an operation R_{i+1} → R_i. Left panel: the system Φ_t is organized as a hierarchy of rules from observable behaviors (R_0) to the teleological norm (R_{k_max}). Each level corresponds to a self-modification regime (1-4). The dashed red contour indicates Φ_t^R in the human profile (Φ_t^R,human ≫ Φ_t^C,human at upper levels); the dashed blue contour indicates Φ_t^R in the AI profile (Φ_t^R,AI ≈ Φ_t^C,AI at lower levels). Environmental feedback E_t ⇝ Φ_t enters the dynamics via F. Right panel: local view of an operation R_{i+1} → R_i, decomposed into a reflexive component (Φ_t^R ⊇ {˜R_{i+1}, ˆ(R_{i+1} → R_i), ˜R_i}) and a causal component (Φ_t^C ⊇ {R_{i+1} ⇝ R_i}), with the hierarchical dynamics equation.

3 Four self-modification regimes

We distinguish four regimes, each characterized by the level of Φ_t at which modification operates. Regimes are cumulative: each higher regime presupposes the capacities of the preceding one. We anchor each regime in a cognitive phenomenon and an artificial system. From Regime 2 onward, each distinction yields a theoretical result that the existing literature does not provide.

3.1 Regime 1: action without self-modification

Definition 1 (Fixed regime). A system is in the fixed regime if Φ_{t+1} = Φ_t for all t.
The system produces outputs but does not modify any component of its functional hierarchy.

Formal profile. Φ_t^R = ∅ (the system represents none of its rules); Φ_t^C reduces to the action of R_0 on the environment, a causal power directed outward, not toward the internal architecture. No rule is an object of modification.

Empirical anchor. The spinal reflex arc: the withdrawal of the hand from a nociceptive surface is mediated by a polysynaptic circuit fixed developmentally (Sherrington, 1906). The reflex latency (∼35 ms for the H-reflex) does not vary with experience (Pierrot-Deseilligny & Burke, 2012). In the formalism: R_0 = {stimulus → motor response}, invariant; Φ_t^R = ∅; Φ_t^C(internal) = ∅.

Artificial anchor. A finite automaton with a fixed transition table.

Boundary. Habituation, the progressive decline of the response to a repeated stimulus (Rankin et al., 2009), constitutes the borderline case: as soon as R_0 is modified, the system enters Regime 2.

3.2 Regime 2: low-level modification

Definition 2 (Local regime). A system is in the local regime if only the lowest-level rules are modified, under invariant governance. Formally:

R_0^(t+1) = R_1(R_0^(t); R_{k_max}),

where R_1, …, R_{k_max} are invariant.

Formal profile. Φ_t^R contains at most the current value of R_0, but not the description of R_1 as a rule; Φ_t^C contains R_0 (gradient descent or the Rescorla-Wagner rule effectively modifies R_0), but this causal power is exercised without representational mediation: it is a blind modification (R_1 ∈ Φ_t^C but R_1 ∉ Φ_t^R). The system does not "know" that it is learning.

Ontological clarification and theoretical contribution. Regime 2 clarifies a fundamental distinction: the formalism does not separate "parameters" and "rules"; it contains only rules ordered by degree of abstraction.
What the literature calls a "parameter" (the associative strength V, the weights of a network) is the lowest-level rule R_0. What the literature calls the "learning rule" is R_1. V determines the system's behavior at time t (it is R_0); ΔV = αβ(λ − V) prescribes how V changes (it is R_1). The relevant distinction is not the mathematical type of the object (the distinction between value and function exists) but its position in the governance hierarchy. In such a simple system, the learning rule is the evaluation criterion; this indifferentiation is the signature of minimal Regime 2: the system cannot intervene on its norm because the norm is not represented as distinct from the rule that implements it. The term "rule learning" is commonly used to describe phenomena that, in our formalism, amount to the modification of R_0 under fixed R_1. The distinction is testable: in Regime 2, transfer errors are errors of R_0 (maladapted associative strength); in Regime 3, they are errors of R_1 (learning strategy maladapted to the task structure). Blocking (Kamin, 1969) and long-term potentiation (Bliss & Lømo, 1973) confirm that modification, in classical conditioning, bears exclusively on R_0.

Empirical anchor. The Rescorla-Wagner model (Rescorla & Wagner, 1972) allows a complete instantiation. R_0 designates the system's input-output rule: the function that, for a given conditioned stimulus, produces a conditioned response of a given strength. R_1 designates the update rule ΔV = αβ(λ − V), which governs the modification of R_0. The prediction error (λ − V) modifies R_0, but R_1 is structurally invariant. The crucial point is that the system does not represent R_1 as a rule: it does not "know" that it is learning.

Artificial anchor. Gradient descent: the weights, which constitute R_0, change according to R_1: θ_{t+1} = θ_t − η∇L(θ_t). But R_1, η, and L (R_{k_max}) are fixed.
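The Rescorla-Wagner anchor can be written out directly. In the framework's terms, V is R_0 and the update function is the invariant R_1; the parameter values below are illustrative, not taken from any particular experiment:

```python
def rw_update(V, lam, alpha=0.3, beta=1.0):
    """R_1: the Rescorla-Wagner rule, delta-V = alpha * beta * (lambda - V).

    It modifies R_0 (the associative strength V) but is itself never modified:
    the signature of Regime 2."""
    return V + alpha * beta * (lam - V)

V = 0.0  # R_0: associative strength of the conditioned stimulus
for trial in range(20):
    V = rw_update(V, lam=1.0)  # reinforced trials (lambda = 1)
# V approaches the asymptote lambda, while R_1 (the function itself) never changed
```

Swapping rw_update for a Pearce-Hall-style rule mid-run would be a change of R_1, and hence, per the boundary below, a step into Regime 3.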
Boundary. If R_1 changes, as in the shift from Rescorla-Wagner to Pearce-Hall (Pearce & Hall, 1980), which modifies how attention modulates learning, the system operates in Regime 3.

3.3 Regime 3: rule modification

Definition 3 (Structural regime). A system is in the structural regime if modification can bear on rules R_i of arbitrary level, with the exception of R_{k_max}. Formally:

R_i^(t+1) = R_{i+1}(R_i^(t); R_{k_max}),

where R_{k_max} is invariant.

Formal profile. Φ_t^R includes R_1 as an explicit object: the system represents its own rule and can treat it as revisable. Φ_t^C includes R_1 via R_2 (the meta-rule of change). The transition from Regime 2 to Regime 3 corresponds formally to the entry of R_1 into Φ_t^R ∩ Φ_t^C: the rule is no longer merely applied but represented and causally accessible.

Empirical anchor. The Wisconsin Card Sorting Test (Milner, 1963; Monchi et al., 2001). R_0 = sorting action; R_1 = active sorting rule ("sort by color"); R_2 = meta-rule ("if persistent negative feedback, change sorting criterion"); R_{k_max} = implicit norm ("maximize correct responses"). R_{k_max} does not prescribe how to achieve the objective; R_2 is a particular strategy for satisfying R_{k_max}, and other R_2s would be possible under the same R_{k_max}.

Artificial anchor. Meta-reinforcement learning (Wang et al., 2016): a meta-RL agent learns a learning rule (R_1) under fixed R_{k_max}. Architecture search (Zoph & Le, 2017) and Gödel machines (Schmidhuber, 2007) push this principle to its limits.

Proposition 1 (Causal closure). The logical constraint established at step 5 (§2.2) translates as follows: in any system in Regime 3, there exists at each time t a level k_max such that R_{k_max} is fixed.
This constraint is recognized in the AI safety literature under the name of goal stability (Soares & Fallenstein, 2017), but had not been formulated in terms of hierarchical rule levels nor connected to the clinical dissociation between regimes.

Theoretical contribution: a predicted double dissociation. The formalism predicts that Regimes 2 and 3, having distinct targets in Φ_t, are dissociable by selective lesion. This is exactly what is observed: patients with dorsolateral prefrontal lesions exhibit perseverative errors on the WCST, continuing to apply R_1 ("sort by color") despite persistent negative feedback, yet retain intact associative conditioning (Monchi et al., 2001). The dissociation is not merely compatible with the framework: it is a direct consequence, because the two capacities operate on formally distinct targets in Φ_t, namely R_0 for conditioning and R_1 for rule change. The switch cost (Monsell, 2003) reflects the computational load of replacing R_1, a cost absent in Regime 2, where only R_0 changes under fixed governance.

Transition. Proposition 1 establishes that any system in Regime 3 possesses a fixed core R_{k_max}. But the healthy human seems to violate this closure: they can deliberate on their values, revise their moral criteria, choose experiences they know will transform their preferences. How can a fixed-core system revise the core itself?

3.4 Regime 4: teleological revision

Definition 4 (Reflexive regime). A system is in the teleological revision regime if R_{k_max} ∈ Φ_t^C and R_{k_max} ∈ Φ_t^R, that is, if R_{k_max} is both represented and accessible to the system's causal power of modification. The condition R_{k_max} ∈ Φ_t^C alone (without Φ_t^R) would correspond to a physical system that entirely reconfigures its architecture without representational mediation, a form of self-modification that is powerful but non-deliberative.
Regime 4 in the strong sense requires the conjunction: the system modifies R_{k_max} through the representation it has of it, which opens the possibility of evaluation—and hence of the teleological lock. The threshold separating Regime 3 from Regime 4 is precise: R_{k_max} enters Φ_t^R ∩ Φ_t^C. For this entry to be non-trivial, the representation must be causally active in the sense of Maes (1987).

Theoretical foundations. Regime 4 rests on two results that must be distinguished. The universal Turing machine (1936) showed that a system can simulate any machine, including itself—but simulation is not reflexivity: the universal machine does not represent itself to itself as an object of modification. Smith (1984) crossed an additional threshold in the 3-LISP architecture: the system's representation of its own interpreter is not an inert simulation but a causal lever—modifying it changes the system's behavior in real time. The system can reason about this representation, modify it, and then execute the modified version—realizing an interpretive reentry. This is the transition from self-simulation to reflexive self-modification. Maes (1987) then generalized this threshold independently of any architecture: a system is reflexive if and only if its internal representation of its own structure causally intervenes in its behavior. Description alone constitutes introspection; causally active description constitutes reflexivity.

Empirical anchor. Metacognitive therapy (MCT; Wells, 2009) provides the clearest clinical instantiation. Consider a patient with generalized anxiety disorder: R_0 designates the emotional and behavioral responses; R_1 the first-order beliefs ("this situation is dangerous"); R_2 the meta-beliefs governing R_1 ("worrying protects me"); R_{k_max} the ultimate evaluation criterion.
Classical CBT targets R_1; MCT targets R_2 and R_{k_max}: it leads the patient to treat "worrying protects me" not as a fact but as a revisable rule. Longitudinal studies confirm that changes in meta-beliefs predict symptomatic improvement (Solem et al., 2009), which is compatible with the thesis that the target is hierarchically higher—but strict temporal precedence remains to be demonstrated.

The transformative experience in the sense of Paul (2014) constitutes the framework's limiting case. An agent chooses an experience (parenthood, profound conversion) knowing that it will modify their preferences unpredictably. The evaluation of the modification of R_{k_max} requires R_{k_max} as a criterion—but R_{k_max} is what will change. This is the very structure of the teleological lock (§5.3).

Artificial anchor. Regime 4 remains largely programmatic in AI. Current reward shaping systems modify the reward function during training, but this modification is governed by a fixed meta-criterion. Work on goal stability (Everitt et al., 2021; Soares & Fallenstein, 2017) formalizes related questions in different frameworks. Our framework provides a complementary perspective by situating the problem within a hierarchy of cognitive rules and their representation.

Proposition 2 (Reflexive openness). A system is reflexively open if, for all k, it is in principle possible that R_k ∈ Φ_t^R at a later time.

The tension between Propositions 1 and 2 is apparent: the first imposes a fixed core at each moment, the second says that no level is definitively excluded. The resolution is that causal closure is local and temporal, not absolute. The human satisfies Proposition 1 at each moment while dynamically shifting the core. It is this mobility of the fixed point—not its absence—that characterizes Regime 4. The displacement of the core follows two pathways in humans.
(a) Through environmental coupling: external variables (events, losses, relationships) are internalized as internal states that modify how R_{k_max} evaluates situations (Schwabe & Wolf, 2013; Baumeister, 1990). The core shifts without being explicitly revised. A bereavement, for example, can contract the space of imaginable futures to the point where the decision criteria that oriented the subject's life lose their normative force and are replaced by criteria of immediate survival. R_{k_max} has changed, but not through an act of deliberate revision: it is the system's internal state, modified by coupling with the environment, that shifted the evaluative fixed point. Baumeister (1990) showed that this mechanism can lead to the extreme: when the space of perceived futures contracts to the point of becoming empty, the subject may come to revise R_{k_max} to the annulment of the viability constraint itself.

(b) Through prospective commitment: the system commits to an action it knows will modify R_{k_max}—the current core authorizes the leap, but does not control the result. An individual who decides to become a parent performs a prospective commitment: their current criteria authorize the leap, but do not control the result—parenthood transforms preferences, priorities, and evaluation criteria themselves in unpredictable ways (Paul, 2014). The R_{k_max} that will retrospectively evaluate the decision will no longer be the one that authorized it—the norm changed in the meantime, and it is from the new norm that the old decision is judged. Longitudinal data on the transition to parenthood confirm this pattern: values, temporal priorities, and risk thresholds reorganize in ways unforeseeable by the subject (Nelson et al., 2014).

For an AI, a third pathway would be possible: (c) through existential sandbox.
The system instantiates itself in a comparable or identical environment, applies the modification of R_{k_max} to the copy, observes what the copy does and becomes under the new R_{k_max}, then evaluates the result from its current R_{k_max}—and decides whether to adopt the modification. This possibility is structurally inaccessible to the human, who cannot duplicate themselves. It constitutes a fundamental asymmetry between the two types of systems in their relationship to normative revision. The framework does not provide a formal dynamics of this displacement—this is an acknowledged limitation (§7.2).

Table 1: Correspondences between regimes, targets, representation/causality profiles, cognitive phenomena, and artificial systems.

Regime | Target | Φ_t^R | Φ_t^C | Cognitive phenomenon | Artificial system
1: Fixed | None | ∅ | ∅ (internal) | Spinal reflex | Finite automaton
2: Local | R_0 | R_0 current | R_0 (via R_1, blind) | Conditioning | Gradient descent
3: Structural | R_i (i < k_max) | R_i represented | R_i (via R_{i+1}) | WCST | Meta-RL, AutoML
4: Reflexive | R_{k_max} | R_{k_max} ∈ Φ_t^R | R_{k_max} ∈ Φ_t^C | MCT, transf. exp. | (programmatic)

4 Application to the human case

4.1 The human profile: extent and opacity of Φ_t^R

The monitoring/control architecture of Nelson & Narens (1990) describes a loop in which the meta-level receives information from the object-level and modulates its operations in return: in our framework, certain components of Φ_t become objects of Φ_t^R, so that modification is mediated by self-modeling. Metacognitive accuracy dissociated from first-order performance (Fleming & Dolan, 2012) shows that Φ_t^R possesses its own dimensions of fidelity, independent of Φ_t. In humans, the profile of Φ_t^R is structurally asymmetric. Humans represent their teleological norms (R_{k_max}) relatively well: they can deliberate on their values, life goals, and moral criteria (Paul, 2014; Wells, 2009).
But they have virtually no access to their low-level operative rules (R_i for small i): how they recognize a face, produce a grammatical sentence, or adjust fine motor movements.

Developmental data show that Φ_t^R is constructed progressively. Zelazo (2004) showed that the capacity to represent rules of increasing level follows an ordered developmental trajectory: a 3-year-old can follow a single rule, a 5-year-old can maintain two rules in alternation, and it is only around 7–8 years that the capacity to represent the rule for selecting between rules emerges—a trajectory that corresponds, in the formalism, to a progressive expansion of Φ_t^R toward increasingly higher hierarchical levels. Karmiloff-Smith (1992) formalized the underlying mechanism: representational redescription, by which implicit procedures (R_i that are effective but not represented) become explicit objects of Φ_t^R—a necessary condition for an intervention by R_{i+1} to become possible.

But this process of constructing Φ_t^R is not faithful. Nisbett & Wilson (1977) showed that subjects do not have access to the real causes of their judgments and confabulate post-hoc reasons. This result does not contradict the existence of Φ_t^R—it qualifies it: the "reasons" reported are not high-level R_i to which the subject has access, but reconstructions within Φ_t^R that purport to describe Φ_t without corresponding to it.

Table 2: Compared profiles of representation (Φ_t^R) and causal power (Φ_t^C).

| Φ_t^R (representation) | Φ_t^C (causal power)
Human — R_{k_max} | Rich | Strong (deliberation)
Human — R_0 | Poor (opaque) | None
AI — R_0 ... R_i | Rich (transparent) | Strong (read/write)
AI — R_{k_max} | None | None

The true causes of the judgment (the low-level R_i that actually produce the decision) remain opaque; what the subject reports is a coherent confabulation. Johansson et al.
(2005) reinforced this thesis with the choice blindness paradigm: subjects accept and rationalize choices they did not make, confirming that Φ_t^R is constructed by interpretation, not by direct reading of Φ_t (Carruthers, 2011). The divergence Φ_t ≠ Φ_t^R is therefore not an empirical anomaly—it is the structural norm predicted by the framework.

Conversely, a Smithian system (3-LISP; Smith, 1984) has transparent access to its operative rules but no endogenous capacity for revision of R_{k_max}. This crossing of opacity profiles constitutes the framework's central result: Φ_t^R ⊇ {R_{k_max}} but Φ_t^R ⊉ {R_0, R_1, ...} in humans, whereas Φ_t^R ⊇ {R_0, ..., R_{k_max−1}} but Φ_t^R ⊉ {R_{k_max}} in Smith-type architectures.

Representation and causal power. The crossing concerns the extent of Φ_t^R. But representing a rule and being able to act on it are two distinct capacities. Φ_t^C ⊆ Φ_t designates the subset over which the system possesses effective causal power. In general, Φ_t^C ≠ Φ_t^R.

A thought experiment clarifies the distinction. A human who, through advanced imaging, managed to represent the entirety of their connectome would have a Φ_t^R covering Φ_t. But this knowledge would confer no causal power: knowing that a given synapse has a given strength does not imply being able to modify it. Φ_t^C would remain confined to the upper levels. The extension of Φ_t^R to lower levels would be accompanied by an extension of Φ_t^C only through an additional technique (neurostimulation, brain-computer interface) transforming knowledge into a causal lever.

For a Smithian system, Φ_t^C ≈ Φ_t^R at operational levels: what the system represents (its code, its weights), it can modify in the same act—reading and writing operate in the same formal space. But at upper levels, Φ_t^C is empty, because R_{k_max} is not in Φ_t^R.
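The crossed opacity profiles are just set relations over hierarchy levels, so they can be stated directly as code. The sketch below is illustrative only: the level sets follow Table 2, but the variable names and the helper are hypothetical.

```python
# Illustrative sketch: crossed opacities as set membership over hierarchy
# levels 0..KMAX. Profile contents follow the text; names are hypothetical.

KMAX = 3
LEVELS = set(range(KMAX + 1))   # Phi_t: the effective rules R_0..R_kmax

# Human profile: representation and causal power concentrated at the top.
human_R = {KMAX}                # Phi_t^R
human_C = {KMAX}                # Phi_t^C

# Smith-type reflexive system: rich at operational levels, empty at R_kmax.
smith_R = LEVELS - {KMAX}
smith_C = LEVELS - {KMAX}

def opaque(levels_R):
    """Levels that are effective in Phi_t but absent from Phi_t^R."""
    return LEVELS - levels_R

# The crossing: each system is opaque exactly where the other is transparent.
print(opaque(human_R))          # operational levels, opaque to the human
print(opaque(smith_R))          # the core, opaque to the Smithian system
assert opaque(human_R) & opaque(smith_R) == set()
```

The final assertion is the crossing itself: under these profiles, the two opacity regions are disjoint.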
The human singularity is that the human is the only known system possessing both Φ_t^R and Φ_t^C non-empty at the level of R_{k_max}—even if this causal power is imperfect, slow, and without guarantee. It is this conjunction that defines Regime 4.

Two additional properties reinforce the asymmetry. First, human normativity is not unitary. Schwabe & Wolf (2013) showed that stress triggers a switch from model-based control to model-free control—a reorganization of the governance hierarchy itself, without analogue in current artificial systems. Second, low-level opacity constrains modification to transit through the top of the hierarchy, conferring on human cognition a modification profile qualitatively distinct from that of computational systems, where modification transits through the bottom.

From a strictly structural standpoint, computational reflexivity can exceed human reflexivity on the transparency axis: a Smithian system sees what the human cannot see of themselves. But this transparency does not automatically imply an endogenous capacity for teleological revision.

4.2 Positioning relative to theories of consciousness

The Φ_t/Φ_t^R framework does not constitute a theory of phenomenal consciousness and takes no position on the hard problem (Chalmers, 1996). It does, however, formalize the functional structure of reflexivity—a territory that several theories of consciousness presuppose without fully formalizing. Two theoretical families are directly concerned; the others less so, for reasons we specify at the outset. Among contemporary theories, we retain higher-order theories (HOT) and Attention Schema Theory (AST) as primary interlocutors, because their central mechanism—the causally active metarepresentation—corresponds directly to Φ_t^R ∩ Φ_t^C.
Global Workspace Theory (Baars, 1988; Dehaene & Changeux, 2011) addresses an upstream question: how a content becomes globally accessible. This mechanism can be read as a condition of entry into Φ_t^R, but GWT does not theorize what the system does with this accessibility to reconfigure itself—which is precisely our object. Recurrent Processing Theory (Lamme, 2006) concerns perceptual stabilization in sensory loops—an implementation condition for certain R_i, not the representation of the system's own architecture. Integrated Information Theory (Tononi et al., 2016) concerns the integrated causal structure of the physical substrate; our framework is functional and algorithmic, hence orthogonal to IIT at the level of analysis. The COGITATE results (2025) confirm that the predictions of GWT and IIT are both empirically contested, which reinforces the interest of a distinct positioning.

Higher-order theories (Rosenthal, 2005; Lau & Rosenthal, 2011; Brown et al., 2019) hold that a mental state becomes conscious when it is the object of a higher-order representation. HOT asks: which states are represented by higher-order states? Our framework asks a broader question: which components of the functional architecture—low-level rules, meta-rules, norms—are in Φ_t^R, and does this representation have a causal power of revision (Φ_t^C)? HOT thus appears as the special case where Φ_t^R contains certain first-order representations. Our framework extends the analysis to control rules and teleological norms—a territory that HOT does not address.

AST (Graziano, 2013; Graziano & Webb, 2015) proposes that consciousness is a simplified internal model of attention serving attentional control—an instantiation of the Φ_t/Φ_t^R distinction restricted to the attentional subsystem. The decisive point of convergence is the shared thesis: it is the causal efficacy of self-representation, not its fidelity, that matters.
The framework advances on three problems that HOT and AST leave open or treat as anomalies. First, the formal separation of metarepresentation levels: the literature regularly conflates the presence of a content in processing, global accessibility, reportability, and causally active metarepresentation—a confusion noted by Katyal & Fleming (2024). The formalism separates them: a state can belong to Φ_t without belonging to Φ_t^R (non-represented processing); it can be represented in Φ_t^R without being faithfully so (introspective misrepresentation); it can be represented in Φ_t^R without belonging to Φ_t^C (introspection without reflexivity); it can be in Φ_t^C without being in Φ_t^R (blind causal modification). HOT and AST operate on special cases of these distinctions; the formalism unifies them into a grammar finer than the conscious/unconscious binary.

Second, introspective error as a structural property: HOT treats misrepresentation as a recurring theoretical problem (Block, 2011; Rosenthal, 2005). In our framework, the divergence Φ_t ≠ Φ_t^R is the structural norm—it is a direct consequence of the partiality of the projection operator Π. Data on confabulation, choice blindness, and self-interpretation are no longer theoretical embarrassments—they are direct predictions of the framework.

Third, a structural protocol for artificial consciousness: the notion of "representation of a representation" becomes trivial if it is not constrained (Butlin et al., 2025).
Our framework provides this constraint: the operationalizable question is not "does the system have a second-order representation?" but "which components of its architecture are in Φ_t^R and Φ_t^C, at which level, and does this representation serve to reconfigure functioning?" The crossing of opacities—humans represent their norms well but their operative rules poorly, reflexive systems exhibit the inverse profile—constitutes the most directly testable prediction.

The spectrum of artificial self-modification. Contemporary AI systems realize a spectrum of self-modification levels that confirms the crossing. At inference, an LLM modifies no component of Φ_t; in-context learning (Brown et al., 2020; Coda-Forno et al., 2023) mimics behavioral adaptation without modifying weights—a proto-regime. Fine-tuning and RLHF (Ouyang et al., 2022) constitute a standard Regime 2. This distinction between inference and training is decisive and often obscured. Meta-learning (Wang et al., 2016) and architecture search (Zoph & Le, 2017) reach Regime 3. The Darwin Gödel Machine (Zhang et al., 2025) pushes furthest: the system rewrites its own self-modification code, realizing an interpretive reentry in the sense of Smith (1984)—but R_{k_max} remains externally fixed. The common denominator is that R_{k_max} remains fixed and exogenous. Proposition 1 applies uniformly. The DGM's profile illustrates the inversion: Φ_t^R rich in operative R_i, empty at R_{k_max}—the inverse profile of the human.

5 Structural problems and partial resolutions

The development of the framework gives rise to four problems in logical cascade: each arises only if the preceding one is resolved or at least posed. The first three receive partial resolutions within the framework; the fourth remains open.
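Since each regime is defined by which levels lie in Φ_t^R ∩ Φ_t^C, regime membership can be written as a small decision function. The sketch below is illustrative, not from the paper; the function name, the set encoding of Φ_t^R and Φ_t^C, and the sample profiles are hypothetical simplifications of the taxonomy.

```python
# Illustrative sketch: classifying a system's self-modification regime
# from the hierarchy levels present in Phi_t^R and Phi_t^C.

def regime(represented, accessible, kmax):
    """Return the regime (1-4) given level sets Phi_t^R and Phi_t^C."""
    both = set(represented) & set(accessible)
    if kmax in both:
        return 4                  # teleological revision: the core itself
    if any(0 < i < kmax for i in both):
        return 3                  # structural: some R_i below the core
    if 0 in set(accessible):
        return 2                  # low-level: R_0 changes, possibly blindly
    return 1                      # action without modification

# Profiles from the text (kmax = 3 for illustration):
print(regime(set(), set(), 3))    # spinal reflex / finite automaton
print(regime(set(), {0}, 3))      # gradient descent: blind change of R_0
print(regime({1}, {1}, 3))        # meta-RL / WCST: R_1 enters both sets
print(regime({3}, {3}, 3))        # human profile: the core is in both
```

The ordering of the checks mirrors the cascade in the text: the highest level reached by the intersection Φ_t^R ∩ Φ_t^C fixes the regime.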
5.1 The independence of transformativity and autonomy

The human/AI comparison reveals that transformativity (T—the maximum level at which the system can apply an endogenous transformation) and organizational autonomy (A) are logically independent. A self-optimizing software can rewrite its rules but be stopped or deleted at any time (T high, A nil). A simple organism maintains its conditions of existence but does not modify its rules (A high, T low). The human combines both. It is only at their intersection that a self-modifying system in the strong sense—an independent one—would appear.

The literature has treated T and A separately. Autopoiesis theory (Maturana & Varela, 1980) formalizes organizational closure—the system's capacity to produce the components necessary for its own persistence—but does not address the transformation of rules: autopoiesis describes how a system persists, not how it modifies itself. Von Neumann (1966) formalized a second dimension of A—reproductive autonomy—showing for the first time that the capacity to generate new instances of oneself is a formalizable mechanism. On the T side, meta-learning and self-modification architectures (Wang et al., 2016; Zhang et al., 2025) describe transformation without persistence. Our framework contributes by making the independence of T and A visible and formalizable, which clarifies why a system can be highly advanced on one axis and nil on the other—and why the combination is rare.

5.2 The viability of self-modification

A transforming and autonomous system is not thereby viable. An agent capable of rewriting its own rules can produce irreversible transformations: infinite loops, functional collapse, loss of controllability. The question—can a self-modifying system revise itself without collapsing?—gives rise to an additional condition: the capacity to simulate before acting.

Table 3: Positioning in the (T, A) space.
System | T | A | Characterization
Thermostat | 1 | ∼0 | Regulation without self-modification
Simple organism | 1–2 | high | High autonomy, limited T
Contemporary LLM | 1–2 | 0 | Local, heteronomous
Meta-RL agent | 3 | 0 | Structural, heteronomous
DGM | 3+ | 0 | Self-referential, heteronomous
Human cognition | 4 | high | Reflexive, opaque, autonomous

We distinguish three simulation modalities, ordered by safety. (S1) Logical simulation—static analysis, model checking—allows properties to be established without real execution. (S2) Execution simulation—sandbox, reversible speculative execution—evaluates a modification candidate in an isolated environment. The DGM (Zhang et al., 2025) explicitly realizes this modality. (S3) Predictive simulation—internal world models (Ha & Schmidhuber, 2018)—anticipates consequences by projection.

This distinction appears technical, but it has a deep structural consequence. The crossing of opacities (§4.1) manifests along the safety axis. The human has only S3: episodic prospection (Schacter et al., 2012; Buckner & Carroll, 2007) allows evaluating hypothetical scenarios without implementing them. But architectural opacity precludes any equivalent of S1 or S2: human normative revision proceeds without a sandbox, through direct commitment—which explains why transformative experiences (Paul, 2014) are structurally risky. Artificial systems have S1 and S2 in principle but have nothing to test on R_{k_max}, since it is outside Φ_t^R. Each type of system is vulnerable where the other is protected. The literature on formal verification and sandboxing (cf. Amodei et al., 2016, for a synthesis of AI safety problems) has not connected these techniques to the question of reflexive self-modification—our framework does so by situating simulation as a viability condition, not merely as an engineering technique.

5.3 The teleological lock

Can a viable system rationally revise its own evaluation norm?
Let R_{k_max} be the active criterion. A modification Δ produces R′_{k_max} = R_{k_max} + Δ. Evaluating Δ requires a criterion E. If E = R_{k_max}, the judgment is conservative: any substantial modification would be judged negatively by the current criteria. If E ≠ R_{k_max}, the question shifts: where does E come from?

This problem is analogous to Neurath's boat (Quine, 1960) and has been posed in logical decision theory by Soares & Fallenstein (2017) and formalized by Everitt et al. (2021) via causal influence diagrams. Our framework adds a precise localization: the lock is inactive as long as R_{k_max} is outside Φ_t^C (Regimes 1–3); it activates exactly at the threshold of Regime 4.

Partial resolutions. The human solves this problem de facto. The framework developed in §3.4 shows that the lock is not a dead end: the human solves it de facto through two pathways—environmental coupling (the internal state, modified by experience, shifts the evaluative core without a deliberate act of revision) and prospective commitment (the current core authorizes the leap, but does not control the result). For AI, the existential sandbox (§3.4) opens a third pathway structurally inaccessible to the human. The human resolution proceeds without formal guarantee—the lock is circumvented, not suppressed. This observation constitutes a result of the framework: the teleological lock is not an absolute obstacle but a constraint whose modes of traversal differ according to the system's Φ_t^R/Φ_t^C profile. The human crosses it from above (coupling affecting R_{k_max} through non-deliberative pathways); AI could cross it through causal transparency (sandbox on copy).

Implications for AI design.
The observation that the human solves the lock through dynamic coupling rather than transparency suggests a design principle: not the elimination of the fixed core (which would lead to instability), but its softening through dependence on internal state—a functional analogue of the biological mechanism. The active inference framework (Friston, 2010)—where the agent minimizes variational free energy—corresponds in our formalism to an advanced Regime 3 under a fixed norm (free-energy minimization is R_{k_max}). The open question is whether active inference can be extended to Regime 4: could the system represent and revise the very principle of free-energy minimization, or is this principle by construction outside Φ_t^R?

General constraint. Any guarantee concerning a self-modification is relative to an evaluation rule that does not itself modify at the moment the modification is judged. This stability need not be absolute; it suffices that it hold at the relevant moment of evaluation.

5.4 Identity under transformation

If a system has effectively crossed the teleological lock—if it has revised its R_{k_max}—is the resulting system still "the same"? This problem presupposes the preceding one: it arises only for a system that has effectively realized an endogenous revision of R_{k_max}. What persists through self-modification is neither the operative rules (they change in Regime 2) nor the norm (it changes in Regime 4): it is, possibly, the meta-rule of transformation—the way the system modifies itself. But if this meta-rule is itself revisable (Regime 4 fully open), no structural invariant subsists by necessity. Identity becomes an open problem.
Philosophy has treated this question under the name of the Ship of Theseus; biology illustrates it through niche construction (Odling-Smee et al., 2003), where organisms modify their environment to the point of transforming the selection pressures that act on them—the "rules" of evolution become partially objects of action, and the lineage's identity redefines itself through transformation. But none of these discussions had formalized the problem in terms of hierarchical rule levels and self-representation. Our framework shows that the identity problem does not arise for systems in Regimes 1–3 (their R_{k_max} is fixed, and identity can be defined by this invariant core): it emerges exactly at Regime 4, where the core itself becomes revisable. The framework predicts that the first artificial system to realize an endogenous revision of R_{k_max} will also be the first to face this question in its acute form.

6 Testable predictions

6.1 Neurocognitive gradient and representational asymmetry (P1–P2)

P1: Regime gradient. Regimes 1–4 should correspond to a rostro-caudal gradient of the prefrontal cortex and a monotonically increasing cognitive cost with level k. Existing data are compatible: the WCST activates dorsolateral prefrontal cortex during rule change (Monchi et al., 2001), Badre & D'Esposito (2007) document a rostro-caudal hierarchy associated with the degree of control abstraction, and Koechlin et al. (2003) show progressive anterior recruitment with control level. The framework predicts that this neural hierarchy corresponds to the stratification of levels k in Φ_t—a correspondence that the literature suggests but that our taxonomy formalizes for the first time in terms of self-modification regimes. The prediction is falsifiable: if Regimes 2 and 3 recruited the same networks to the same degree, the formal distinction would lack a neural correlate.

P2: Asymmetry of Φ_t^R.
Humans represent R_{k_max} better than R_i for small i. This asymmetry should manifest in mechanisms of therapeutic change: interventions targeting R_{k_max} (metacognitive therapy) should show qualitatively distinct change patterns from those targeting R_1 (classical CBT). The framework's strong prediction is more precise: the opacity gradient should be monotone—the lower the level, the poorer the representation in Φ_t^R and the weaker the Φ_t^R → Φ_t^C coupling ratio—which is testable through introspection tasks stratified by hierarchical level. Such a gradient would constitute the first quantitative measure of the internal structure of Φ_t^R in humans.

6.2 Crossed opacities and hierarchical plasticity (P3–P4)

P3: Human/AI double dissociation. A meta-learning system should exhibit a Φ_t^R profile rich in operative R_i and poor in R_{k_max}, while the human exhibits the inverse profile. On rule-transfer tasks, error patterns should differ qualitatively: perseveration errors in humans (R_1 maintained incorrectly; Milner, 1963), exploration errors in the artificial agent (R_1 modified incorrectly when the task distribution changes radically). The two-step task literature (Daw et al., 2011) provides the methodological framework for distinguishing model-based from model-free control, but does not connect this distinction to the extent of Φ_t^R—this is what our framework adds.

P4: Hierarchical plasticity. Human singularity lies not only in hierarchical depth but in hierarchical plasticity: the capacity to reorder which level governs which other. Tasks requiring hierarchical reorganization should recruit prefrontal networks distinct from those involved in standard hierarchical control. Data from Schwabe & Wolf (2013) on priority reorganization under stress are compatible.
The transformative experience (Paul, 2014) constitutes the limiting case: the subject enters an experience knowing that their criteria will change—the hierarchical ceiling shifts.

6.3 Protocol: testing the causality of Φ_t^R

Motivation. Two recent results precisely delimit the gap this protocol fills. Becker et al. (2023) showed that systematic metacognitive reflection accelerates the adoption of far-sighted strategies in a planning task—showing that an intervention on strategy representation can reorganize the strategy itself. But their design does not dissociate the representation of the rule (Φ_t^R) from the effective rule (Φ_t): they show that reflection helps, not that it is the correction of self-representation that produces the effect. Conversely, Grinschgl et al. (2021) showed that metacognitive beliefs can be modified by fake feedback without changing the actual offloading strategy—showing that Φ_t^R can diverge from Φ_t without behavioral consequence. Taken together, these results indicate that the complete causal loop—correcting Φ_t^R and observing whether Φ_t changes—has not been directly tested.

Design. A multi-step planning task with structural shift (the optimal strategy changes mid-session) is used. Participants—humans and meta-RL agents—traverse five phases.

Phase 1: Learning. The system acquires an effective strategy R_1 in a stable-structure environment. R_1 is measured by process tracing—that is, observing information-consultation sequences and decision times, which allow the effectively used strategy to be inferred (Mouselab paradigm; Becker et al., 2023). In the agent, the effective strategy is identified by analyzing the internal states of the recurrent network—specifically, a linear classifier is trained to predict, from hidden activations, which strategy the network implements. This classifier identifies the direction in activation space corresponding to R_1.
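The linear-probe step of Phase 1 can be sketched on toy data. This is an illustrative stand-in, not the paper's implementation: the synthetic "hidden activations", the perceptron training rule, and all names are hypothetical, chosen only to show how a learned weight vector plays the role of the R_1 direction in activation space.

```python
# Illustrative sketch: a linear probe trained on toy "hidden activations"
# to predict which strategy a network implements. The learned weights are
# the direction in activation space identified with R_1.

import random

random.seed(0)

def make_activation(strategy):
    # Toy generator: strategy 1 shifts the first two dimensions.
    x = [random.gauss(0, 0.3) for _ in range(4)]
    if strategy == 1:
        x[0] += 1.0
        x[1] += 1.0
    return x

# Labeled activations: which strategy produced each hidden state.
data = [(make_activation(s), s) for s in [0, 1] * 200]

# Perceptron training: w encodes the probe's decision direction.
w, bias = [0.0] * 4, 0.0
for _ in range(20):
    for x, y in data:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + bias > 0 else 0
        if pred != y:
            sign = 1 if y == 1 else -1
            w = [wi + sign * xi for wi, xi in zip(w, x)]
            bias += sign

accuracy = sum(
    (1 if sum(wi * xi for wi, xi in zip(w, x)) + bias > 0 else 0) == y
    for x, y in data
) / len(data)
print(round(accuracy, 2))   # the probe separates the two strategies
```

In the actual protocol the inputs would be the recurrent network's hidden states rather than synthetic vectors, and the probe's weight vector is what Phases 3 and 4 perturb and restore.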
Phase 2: Measuring Φ_t^R. The system's representation of its own strategy is elicited. In the human: structured self-report ("describe how you decide") and prediction of one's own behavior on a hypothetical scenario. In the agent: the linear probe trained in Phase 1 provides the encoding of R_1 in activation space.

Phase 3: Divergence induction. An experimental gap between what the system does (Φ_t) and what it believes it does (Φ_t^R) is created. In the human, a false but plausible description of the subject's strategy is provided (method of Grinschgl et al., 2021: the subject receives manipulated feedback on their own decision-making). In the agent, the activation vector identified in Phase 2 is perturbed, implanting an incorrect representation of the strategy in the internal states—a functional analogue of human fake feedback. In both cases, Φ_t^R is modified without directly touching Φ_t.

Phase 4: Representational correction only. Only Φ_t^R is corrected—the subject is provided with an exact description of their effective strategy—without modifying the reward structure, the task, or rule R_1. In the agent, the correct direction in the hidden states is restored.

Phase 5: Transfer test. The structural shift is introduced (the optimal strategy changes). The dependent variable is adaptation speed, measured by the number of trials needed to reach performance criterion in the new structure.

Conditions. Three between-subjects conditions: (a) correction of Φ_t^R (Phase 4 active), (b) fake feedback maintained (Φ_t^R degraded), (c) control (no intervention on Φ_t^R). The design is a 3 × 2 factorial (condition × system: human vs. meta-RL agent).

Predictions derived from the framework.
First prediction (causality of Φ_t^R): if Φ_t^R is causally active in the sense of Maes (1987), representational correction in Phase 4 should accelerate adaptation in Phase 5—condition (a) faster than (c), which is faster than (b).

Second prediction (crossed opacities): in the human, correction of Φ_t^R should have a strong effect at high levels (meta-strategy, planning criterion) but a weak effect at low levels (motor execution, perceptual parsing). In the meta-RL agent, the effect should be strong at operative levels (the learned policy) but nil on the evaluation criterion (externally fixed and inaccessible). Moreover, in the human, the Φ_t^R → Φ_t^C coupling is strong at upper levels and nearly nil at lower levels; in the agent, coupling is strong wherever Φ_t^R is non-empty, but Φ_t^R is empty at upper levels. This crossed profile constitutes a double dissociation directly derived from the Φ_t^R / Φ_t^C distinction.

Third prediction (Regime 2 / Regime 3 distinction): a "null" control condition can be added where the task structure is random but statistical difficulty is comparable. The framework predicts that correction of Φ_t^R accelerates adaptation only in the structured condition (Regime 3: the subject must change rules), not in the null condition (Regime 2: only R_0 changes). This directly tests the boundary between Regimes 2 and 3.

Implementability. The meta-RL agent version is simulable with existing tools (Wang et al., 2016; Duan et al., 2016): the recurrent network provides direct access to latent representations for Phases 2–4. The human version uses validated paradigms (Mouselab for process tracing, fake feedback for divergence induction). The complete design is pre-registrable and requires neither brain imaging nor specialized equipment.
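On the agent side, the Phase 3 perturbation and Phase 4 restoration described above amount to editing the component of a hidden state along the probe direction while leaving the orthogonal part untouched. A minimal sketch, under the assumption that a unit probe vector w_hat is already available from Phase 1 (all vectors illustrative):

```python
# Phases 3-4 sketch on the agent side. Assumption: w_hat is a unit probe
# direction for R_1 obtained in Phase 1; the vectors below are illustrative.

def project(h, w_hat):
    """Component of hidden state h along the probe direction."""
    return sum(a * b for a, b in zip(h, w_hat))

def set_component(h, w_hat, target):
    """Return h with its component along w_hat replaced by `target`,
    leaving the orthogonal part of the state untouched."""
    delta = target - project(h, w_hat)
    return [hi + delta * wi for hi, wi in zip(h, w_hat)]

w_hat = [1.0, 0.0, 0.0]   # illustrative unit probe direction
h = [2.0, 0.5, -1.0]      # hidden state currently encoding strategy R_1

# Phase 3: implant an incorrect strategy representation (divergence induction).
h_perturbed = set_component(h, w_hat, -2.0)
# Phase 4: restore the correct direction (representational correction only).
h_restored = set_component(h_perturbed, w_hat, 2.0)

print(h_perturbed)  # [-2.0, 0.5, -1.0]
print(h_restored)   # [2.0, 0.5, -1.0]
```

The design choice matters: because only the probe component is edited, the intervention targets Φ_t^R without directly rewriting the policy computation, which is the analogue of correcting a subject's self-description without changing the task or the reward.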
7 Discussion

7.1 Contributions

The framework provides five contributions: (1) a formal taxonomy of four self-modification regimes defined by the level of transformation in the hierarchy R_0 … R_kmax, each anchored in a characterized cognitive phenomenon and a corresponding artificial system; (2) a formalism (Φ_t, Φ_t^R, Φ_t^C) that renders commensurable phenomena typically treated in disjoint frameworks—conditioning, rule change, metacognition, meta-learning, value revision, AI architectures—by positioning them in a common space defined by the transformation target level and the extent of self-representation. This commensurability enables methodological transfers in both directions: clinical dissociations observed in humans (Regime 2 preserved / Regime 3 lost) can guide diagnosis of artificial architectures, and conversely, the structural transparency of computational systems can inform the understanding of human opacities; (3) the identification of crossed opacities—in its dual dimension of representation (Φ_t^R) / causal power (Φ_t^C)—as the structural signature of human/AI comparison; (4) four structural problems in cascade (§5), of which three receive partial resolutions—in particular the precise localization of the teleological lock at the threshold of Regime 4, the identification of three modes of traversal (coupling, prospective commitment, existential sandbox), and viability as an additional condition linking simulation and self-modification; (5) an experimental protocol directly testing the causality of Φ_t^R through independent manipulation of self-representation.
The most consequential contribution for the field may be commensurability itself: the framework provides a formal common space in which human cognition, classical cognitive architectures, and contemporary learning systems can be directly compared—not by verbal analogy, but by positioning in a common space of dimensions (extent of Φ_t^R, maximum transformation target level, status of R_kmax). This commensurability opens the possibility of systematic comparative studies.

7.2 Limitations and future directions

The framework raises four structural problems (§5) and partially resolves three. The identity problem and several empirical limitations structure future directions.

Regime boundaries: formal sharpness, empirical blur. The boundary between regimes is sharp in the formalism but may be blurred empirically. How does one distinguish a highly sophisticated Regime 2 from a minimal Regime 3? The framework proposes a formal criterion (does R_1 change or not?), but observing a behavioral change does not suffice to decide—one must determine whether it is R_0 or R_1 that changed. The protocol in §6.3 proposes a method for a specific case (independent manipulation of Φ_t^R), but general operationalization remains open and constitutes a methodological challenge for the framework.

Mapping Φ_t: the topology of the hierarchical network. The rostro-caudal hierarchy of prefrontal control is solidly established (Koechlin et al., 2003; Badre & Nee, 2018; Badre, 2025), but the fine topology of the network—its linearity, interactions between non-adjacent levels, possible partial orders—remains largely unmapped. The open question is whether the regime taxonomy survives in a partial order or requires a total order. The immediate direction is to systematically map governance relations between levels, distinguishing top-down (hierarchical) influences from lateral and bottom-up ones.
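The total-order question can be made operational: once governance relations between levels are mapped as directed pairs, asking whether the hierarchy is a chain reduces to checking whether the transitive closure of those pairs orders every pair of levels. A sketch with hypothetical relations (level names and edges are illustrative, not empirical claims):

```python
# Sketch: does a mapped governance relation form a total order (a chain)
# or only a partial order? `governs` is a set of (higher, lower) pairs.
# The example relations below are hypothetical illustrations.

def is_total_order(levels, governs):
    """True iff every pair of distinct levels is comparable under the
    transitive closure of `governs`."""
    closure = set(governs)
    changed = True
    while changed:  # naive transitive closure by repeated expansion
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return all(
        (a, b) in closure or (b, a) in closure
        for i, a in enumerate(levels)
        for b in levels[i + 1:]
    )

levels = ["R0", "R1", "R2", "Rkmax"]
chain = {("Rkmax", "R2"), ("R2", "R1"), ("R1", "R0")}
lattice = {("Rkmax", "R2"), ("Rkmax", "R1"), ("R2", "R0"), ("R1", "R0")}

print(is_total_order(levels, chain))    # True: a strict hierarchy
print(is_total_order(levels, lattice))  # False: R1 and R2 are incomparable
```

If empirically mapped relations came out as the second shape, the regime taxonomy would have to be restated over a partial order.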
Measuring Φ_t^R: the missing opacity gradient. Metacognitive accuracy is measurable at the perceptual level through Fleming's signal-theoretic framework (meta-d′; Fleming & Lau, 2014), and neural correlates are identified (anterior prefrontal cortex; Fleming et al., 2010, 2014; Lapate et al., 2020). But all of these measures operate at the same level—the metacognition of R_0 (perception, memory). No one has measured the complete profile of metacognitive accuracy across the levels of the hierarchy, from perceptual judgments (R_0) to strategy judgments (R_1) to value judgments (R_kmax). Prediction P2 (monotone opacity gradient) would constitute the first measure of the internal structure of Φ_t^R in humans. Introspection tasks stratified by hierarchical level would allow this profile to be traced.

Measuring Φ_t^C: causality remains to be established. The clinical MCT literature abundantly documents that changes in meta-beliefs predict symptomatic improvement (Solem et al., 2009; meta-analysis by Normann & Morina, 2018), and a recent systematic review confirms the promise of metacognitions as a transdiagnostic change mechanism while emphasizing that methodological rigor is lacking to establish strict causality. The protocol proposed in §6.3 fills precisely this gap: by manipulating Φ_t^R independently of Φ_t, it tests whether representational correction alone produces a causal effect—which has never been done directly.

Crossed opacities: a double dissociation to be realized. The elements on both sides of the crossing are documented separately—human low-level opacity (Nisbett & Wilson, 1977; Johansson et al., 2005) and the operative transparency of artificial systems (probing, interpretability). But the direct comparison of human vs. meta-RL agent on the same task with measurement of error patterns (P3) has never been carried out. This is the most immediately testable prediction of the framework.
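If introspection tasks stratified by hierarchical level existed, the monotone gradient claimed by prediction P2 would reduce to a rank-correlation check between level index and introspection accuracy. A minimal sketch with illustrative numbers (not empirical data; the accuracy values are assumptions chosen to match the predicted human profile):

```python
# Sketch of the P2 monotonicity test: introspection accuracy per hierarchical
# level (index 0 = R_0, highest index = R_kmax). Values are illustrative.

def spearman_rho(xs, ys):
    """Spearman rank correlation for sequences without ties."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

levels = [0, 1, 2, 3]
accuracy = [0.52, 0.61, 0.74, 0.86]  # illustrative: richer Φ_t^R higher up

rho = spearman_rho(levels, accuracy)
print(f"Spearman rho = {rho:.2f}")  # +1.00 here: a perfectly monotone gradient
```

A positive rho across subjects would support the monotone gradient; in a real study one would of course use a validated implementation with tie handling and a significance test rather than this toy.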
The question of surpassability. What would an artificial system possessing both Φ_t^R and Φ_t^C non-empty at the level of R_kmax, together with powerful Φ_t^R and Φ_t^C at operative levels, look like? Such a system would combine the operative transparency of AI and the normative reflexivity of the human, possessing a reflexivity structurally superior to that of either. This question is both the most speculative and the most consequential of the framework.

Identity under transformation: toward formalization. The identity problem (§5.4) could benefit from connections with computability theory and provability logic. Kleene's recursion theorem guarantees that any computable transformation of programs admits a fixed point—a behavioral invariant. Applied to self-modification, it suggests that certain classes of transformations necessarily preserve an invariant, even if the system modifies itself. Löb's theorem constrains what a formal system can prove about its own modifications—it is the most direct formalization of the teleological lock within mathematical logic. These connections remain to be developed.

Formalizing regime transitions. The framework classifies regimes but does not model how a system transitions from one to another—how Φ_t^R progressively expands. Developmental data (Zelazo, 2004; Karmiloff-Smith, 1992) describe the progressive expansion of Φ_t^R; a formal dynamics of this expansion would constitute a major contribution.

From classificatory framework to dynamical theory. The present formalism constitutes a classificatory framework—analogous to thermodynamic phases—and not yet a dynamical theory—analogous to equations of state. The classification is non-trivial: it produces the four regimes, the crossing of opacities, the teleological lock, and Propositions 1–2.
But to derive quantitative results—bounds on self-modification accuracy, regime transition rates, simulable developmental trajectories—three thresholds remain to be crossed. First, the projection operator Π is named but not modeled: a characterization of Π as a noisy information channel, with a specifiable capacity C_i at each level i, would allow deriving a no free lunch result—the accuracy of self-modification at level i would be bounded above by the quality of representation at level i + 1. Second, the meta-d′ measure (Fleming & Lau, 2014) rigorously quantifies Φ_t^R, but exclusively at the R_0 level (perceptual metacognition). The equivalent for R_1 (accuracy of strategic introspection: does the subject know which strategy they actually use?) and for R_kmax (accuracy of normative introspection) does not exist: without these multi-level measures, the opacity gradient cannot be empirically calibrated. Third, a formal dynamics of Φ_t^R—how self-representation expands, contracts, or deforms over time—is needed to simulate trajectories. Concretely, a simulation would require a parametric form for the rules, a model of Π as an adjustable lossy channel, a dynamics of Π's evolution, and multi-level empirical anchoring. The present article inaugurates the framework; the transition to a simulable dynamical theory constitutes the most ambitious horizon of work it opens.

Bidirectional design. For AI: softening R_kmax through dependence on internal state—architectures where the reward function depends on an aggregated internal state—and the existential sandbox. For human cognition: cognitive augmentation practices (education, psychotherapy, contemplative practices, brain-computer interfaces) reconceptualized as systematic interventions on Φ_t^R and Φ_t^C.

Cultural and social extension.
Φ_t^R is not only produced by the individual but shaped by cultural tools—writing, deliberative institutions, philosophical traditions constitute cultural extensions of Φ_t^R.

8 Conclusion

This article posed a twofold question: what are the formally distinct types of modification that a cognitive system can exert on its own rule hierarchy, and how does the extent of self-representation determine which types are accessible to it?

The derivation (§2) shows that the question "what is a self-modifying system?" imposes a minimal structure: a rule hierarchy, an unavoidable fixed core, and a threefold distinction between effective rules (Φ_t), represented rules (Φ_t^R), and causally accessible rules (Φ_t^C). The taxonomy (§3) distinguishes four regimes—fixed, local, structural, reflexive—each anchored in a characterized cognitive phenomenon. The comparative result (§4) identifies the crossing of opacities in its dual dimension (representation and causal power) and shows that the spectrum of artificial self-modification continually pushes the hierarchical level of transformation upward while encountering the impossibility of an endogenous revision of R_kmax. The positioning relative to theories of consciousness (§4.2) shows that the formalism unifies levels of metarepresentation that the literature conflates, treats introspective error as a structural property rather than an anomaly, and provides a structural protocol for the question of artificial consciousness. The four structural problems in cascade (§5)—three of which are partially resolved—show that the framework has deductive power beyond classification: it localizes the teleological lock at the threshold of Regime 4 and identifies distinct modes of traversal depending on the system's profile.
Four testable predictions and an experimental protocol allow the central thesis—the causality of Φ_t^R—to be submitted to a direct test (§6).

The framework does not propose a complete theory of mind. It introduces a level of analysis that was missing between existing theories of cognitive control, metacognition, and computational self-modification. The formalism renders commensurable phenomena that the literature treats in disjoint frameworks. The four open problems that emerge in cascade—independence of transformativity and autonomy, viability of self-modification, teleological lock, identity under transformation—are not residuals but structural consequences of the taxonomy. The question of how a system can rationally revise the very criterion that evaluates its revisions is, perhaps, the most important question that cognitive science and artificial intelligence share without having yet formalized it in a common language. This article proposes that language.

References

[1] Anderson, J. R. (2007). How can the human mind occur in the physical universe? Oxford University Press.
[2] Amodei, D., Olah, C., Steinhardt, J., et al. (2016). Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.
[3] Baars, B. J. (1988). A cognitive theory of consciousness. Cambridge University Press.
[4] Badre, D. (2025). Cognitive control. Annual Review of Psychology, 76, 167–195.
[5] Badre, D., & D'Esposito, M. (2007). Functional magnetic resonance imaging evidence for a hierarchical organization of the prefrontal cortex. Journal of Cognitive Neuroscience, 19(12), 2082–2099.
[6] Badre, D., & Nee, D. E. (2018). Frontal cortex and the hierarchical control of behavior. Trends in Cognitive Sciences, 22(2), 170–188.
[7] Baumeister, R. F. (1990). Suicide as escape from self. Psychological Review, 97(1), 90–113.
[8] Becker, F., Wirzberger, M., Pammer-Schindler, V., Srinivas, S., & Lieder, F. (2023).
Systematic metacognitive reflection helps people discover far-sighted decision strategies. Judgment and Decision Making, 18, e15.
[9] Bliss, T. V. P., & Lømo, T. (1973). Long-lasting potentiation of synaptic transmission in the dentate area. Journal of Physiology, 232(2), 331–356.
[10] Block, N. (2011). Perceptual consciousness overflows cognitive access. Trends in Cognitive Sciences, 15(12), 567–575.
[11] Botvinick, M. M., Braver, T. S., Barch, D. M., Carter, C. S., & Cohen, J. D. (2001). Conflict monitoring and cognitive control. Psychological Review, 108(3), 624–652.
[12] Botvinick, M. M., Niv, Y., & Barto, A. C. (2009). Hierarchically organized behavior and its neural foundations. Cognition, 113(3), 262–280.
[13] Brown, R., Lau, H., & LeDoux, J. E. (2019). Understanding the higher-order approach to consciousness. Trends in Cognitive Sciences, 23(9), 754–768.
[14] Brown, T. B., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
[15] Buckner, R. L., & Carroll, D. C. (2007). Self-projection and the brain. Trends in Cognitive Sciences, 11(2), 49–57.
[16] Butlin, P., Long, R., Bayne, T., et al. (2025). Identifying indicators of consciousness in AI systems. Trends in Cognitive Sciences, 29(2), 106053.
[17] Carruthers, P. (2011). The opacity of mind. Oxford University Press.
[18] Chalmers, D. J. (1996). The conscious mind. Oxford University Press.
[19] Coda-Forno, J., et al. (2023). Meta-in-context learning in large language models. Advances in Neural Information Processing Systems, 36, 65189–65201.
[20] Cogitate Consortium et al. (2025). Adversarial testing of global neuronal workspace and integrated information theories of consciousness. Nature, 642(8066), 133–142.
[21] Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011).
Model-based influences on humans' choices and striatal prediction errors. Neuron, 69(6), 1204–1215.
[22] Dehaene, S., & Changeux, J.-P. (2011). Experimental and theoretical approaches to conscious processing. Neuron, 70(2), 200–227.
[23] Duan, Y., et al. (2016). RL²: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779.
[24] Everitt, T., Lea, S., & Hutter, M. (2021). Agent incentives: A causal perspective. Proceedings of the AAAI Conference on Artificial Intelligence, 35(13), 11487–11495.
[25] Fleming, S. M., & Dolan, R. J. (2012). The neural basis of metacognitive ability. Philosophical Transactions of the Royal Society B, 367(1594), 1338–1349.
[26] Fleming, S. M., Ryu, J., Golfinos, J. G., & Blackmon, K. E. (2014). Domain-specific impairment in metacognitive accuracy following anterior prefrontal lesions. Brain, 137(10), 2811–2822.
[27] Fleming, S. M. (2024). Metacognition and confidence: A review and synthesis. Annual Review of Psychology, 75, 241–268.
[28] Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138.
[29] Graziano, M. S. A. (2013). Consciousness and the social brain. Oxford University Press.
[30] Graziano, M. S. A., & Webb, T. W. (2015). The attention schema theory. Frontiers in Psychology, 6, 500.
[31] Grinschgl, S., Meyerhoff, H. S., Schwan, S., & Papenmeier, F. (2021). From metacognitive beliefs to strategy selection. Psychological Research, 85, 2654–2666.
[32] Ha, D., & Schmidhuber, J. (2018). World models. arXiv preprint arXiv:1803.10122.
[33] Johansson, P., Hall, L., Sikström, S., & Olsson, A. (2005). Failure of introspective report. Science, 310(5745), 116–119.
[34] Kamin, L. J. (1969). Predictability, surprise, attention, and conditioning. In B. A. Campbell & R. M. Church (Eds.), Punishment and aversive behavior (pp. 279–296).
Appleton-Century-Crofts.
[35] Karmiloff-Smith, A. (1992). Beyond modularity. MIT Press.
[36] Katyal, S., & Fleming, S. M. (2024). The future of metacognition research. Cortex, 171, 223–234.
[37] Koechlin, E., Ody, C., & Kouneiher, F. (2003). The architecture of cognitive control in the human prefrontal cortex. Science, 302(5648), 1181–1185.
[38] Laird, J. E. (2012). The SOAR cognitive architecture. MIT Press.
[39] Lamme, V. A. F. (2006). Towards a true neural stance on consciousness. Trends in Cognitive Sciences, 10(11), 494–501.
[40] Lapate, R. C., et al. (2020). Perceptual metacognition of human faces is causally supported by function of the lateral prefrontal cortex. Communications Biology, 3, 360.
[41] Lau, H., & Rosenthal, D. (2011). Empirical support for higher-order theories of conscious awareness. Trends in Cognitive Sciences, 15(8), 365–373.
[42] Maes, P. (1987). Concepts and experiments in computational reflection. Proceedings of OOPSLA, 147–155.
[43] Maturana, H. R., & Varela, F. J. (1980). Autopoiesis and cognition. D. Reidel.
[44] Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24(1), 167–202.
[45] Milner, B. (1963). Effects of different brain lesions on card sorting. Archives of Neurology, 9(1), 90–100.
[46] Monchi, O., Petrides, M., Petre, V., Worsley, K., & Dagher, A. (2001). Wisconsin Card Sorting revisited. Journal of Neuroscience, 21(19), 7733–7741.
[47] Monsell, S. (2003). Task switching. Trends in Cognitive Sciences, 7(3), 134–140.
[48] Nelson, T. O., & Narens, L. (1990). Metamemory: A theoretical framework and new findings. Psychology of Learning and Motivation, 26, 125–173.
[49] Nelson, S. K., Kushlev, K., English, T., Dunn, E. W., & Lyubomirsky, S. (2014). In defense of parenthood. Psychological Science, 25(1), 3–10.
[50] Nisbett, R. E., & Wilson, T. D. (1977).
Telling more than we can know. Psychological Review, 84(3), 231–259.
[51] Normann, N., & Morina, N. (2018). The efficacy of metacognitive therapy: A systematic review and meta-analysis. Frontiers in Psychology, 9, 2211.
[52] Odling-Smee, F. J., Laland, K. N., & Feldman, M. W. (2003). Niche construction. Princeton University Press.
[53] Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.
[54] Paul, L. A. (2014). Transformative experience. Oxford University Press.
[55] Pearce, J. M., & Hall, G. (1980). A model for Pavlovian learning. Psychological Review, 87(6), 532–552.
[56] Pierrot-Deseilligny, C., & Burke, D. (2012). The circuitry of the human spinal cord. Cambridge University Press.
[57] Quine, W. V. O. (1960). Word and object. MIT Press.
[58] Rankin, C. H., et al. (2009). Habituation revisited. Neurobiology of Learning and Memory, 92(2), 135–138.
[59] Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II (pp. 64–99). Appleton-Century-Crofts.
[60] Rosenthal, D. M. (2005). Consciousness and mind. Oxford University Press.
[61] Schacter, D. L., et al. (2012). The future of memory. Neuron, 76(4), 677–694.
[62] Schmidhuber, J. (2007). Gödel machines. In B. Goertzel & C. Pennachin (Eds.), Artificial General Intelligence (pp. 199–226). Springer.
[63] Schwabe, L., & Wolf, O. T. (2013). Stress and multiple memory systems. Trends in Cognitive Sciences, 17(2), 60–68.
[64] Sherrington, C. S. (1906). The integrative action of the nervous system. Scribner.
[65] Simon, H. A. (1962). The architecture of complexity. Proceedings of the American Philosophical Society, 106(6), 467–482.
[66] Smith, B. C. (1984). Reflection and semantics in LISP.
Proceedings of the 11th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, 23–35.
[67] Soares, N., & Fallenstein, B. (2017). Agent foundations for aligning machine intelligence with human interests. In K. Callaghan et al. (Eds.), The technological singularity (pp. 103–125). Springer.
[68] Solem, S., Håland, Å. T., Vogel, P. A., Hansen, B., & Wells, A. (2009). Change in metacognitions predicts outcome in obsessive-compulsive disorder patients undergoing treatment with exposure and response prevention. Behaviour Research and Therapy, 47(4), 301–307.
[69] Sun, R. (2002). Duality of the mind. Lawrence Erlbaum.
[70] Tononi, G., Boly, M., Massimini, M., & Koch, C. (2016). Integrated information theory. Nature Reviews Neuroscience, 17(7), 450–461.
[71] von Neumann, J. (1966). Theory of self-reproducing automata (A. W. Burks, Ed.). University of Illinois Press.
[72] Wang, J. X., et al. (2016). Learning to reinforcement learn. arXiv preprint arXiv:1611.05763.
[73] Wells, A. (2009). Metacognitive therapy for anxiety and depression. Guilford Press.
[74] Zelazo, P. D. (2004). The development of conscious control in childhood. Trends in Cognitive Sciences, 8(1), 12–17.
[75] Zhang, J., Hu, S., Lu, C., Lange, R., & Clune, J. (2025). Darwin Gödel Machine: Open-ended evolution of self-improving agents. arXiv preprint arXiv:2505.22954.
[76] Zoph, B., & Le, Q. V. (2017). Neural architecture search with reinforcement learning. Proceedings of ICLR.
