A Mathematical Theory of Understanding


Authors: Bahar Taşkesen

University of Chicago, Booth School of Business

March 23, 2026

Abstract

Generative AI has transformed the economics of information production, making explanations, proofs, examples, and analyses available at very low cost. Yet the value of information still depends on whether downstream users can absorb and act on it. A signal conveys meaning only to a learner with the structural capacity to decode it: an explanation that clarifies a concept for one user may be indistinguishable from noise to another who lacks the relevant prerequisites. This paper develops a mathematical model of that learner-side bottleneck. We model the learner as a mind, an abstract learning system characterized by a prerequisite structure over concepts. A mind may represent a human learner, an artificial learner such as a neural network, or any agent whose ability to interpret signals depends on previously acquired concepts. Teaching is modeled as sequential communication with a latent target. Because instructional signals are usable only when the learner has acquired the prerequisites needed to parse them, the effective communication channel depends on the learner's current state of knowledge and becomes more informative as learning progresses. The model yields two limits on the speed of learning and adoption: a structural limit determined by prerequisite reachability and an epistemic limit determined by uncertainty about the target. The framework implies threshold effects in training and capability acquisition. When the teaching horizon lies below the prerequisite depth of the target, additional instruction cannot produce successful completion of teaching; once that depth is reached, completion becomes feasible. This generates non-concave returns to training effort and implies that spreading scarce instructional resources evenly can yield lower output than concentrating them on fewer workers or users. Across heterogeneous learners, a common broadcast curriculum can be slower than personalized instruction by a factor linear in the number of learner types.

1 Introduction

The value of a piece of information depends on the existence of a mind that can decode it. This is evident in ordinary learning: an explanation means nothing to a listener who lacks the background to follow it, and a lecture conveys nothing to a student who cannot parse its content. Information that cannot be absorbed by the intended learner is, in a precise sense, noise. Understanding therefore cannot be reduced to the accumulation of information alone. Whether a signal carries usable information is not an intrinsic property of the signal itself, but a relation between the signal and the conceptual structure of the mind that receives it.

Over the past century, the cost of producing and distributing information has fallen by orders of magnitude, from printed encyclopedias to digital repositories to, most recently, generative AI systems capable of producing explanations, proofs, and worked examples on demand. As the supply of machine-generated information expands, the bottleneck shifts from production to absorption: the ability of downstream users to parse, interpret, and act on what is produced. Whether a signal carries usable information depends on the learner's ability to decode it.
An explanation that conveys meaning to one user may be indistinguishable from noise to another who lacks the relevant prerequisites.

This paper develops a mathematical model of that learner-side bottleneck through a formal theory of understanding. We do not attempt to model every feature of cognition, such as analogy, abstraction, forgetting, or semantic interpretation. Instead, we ask a narrower structural question: given a learner with a fixed prerequisite architecture, which concepts are understandable in principle, through which intermediate states can the learner move, and what limits the speed at which instruction can bring the learner to a target?

Our starting point is a formal model of a mind. We use the term mind for a learning system whose ability to interpret new signals depends on what has already been acquired. The same formal object can represent a human learner, an artificial learner such as a neural network, or any agent whose decoding power is shaped by prerequisite structure. Formally, a mind consists of a concept space together with an axiom set and a family of finitary expansion rules specifying which concepts become accessible once their prerequisites have been acquired. These rules induce an understanding horizon, describing what is reachable in principle from the axioms, and a family of reachable acquired concept sets, describing the intermediate states through which a learner can progress by successive prerequisite-respecting steps. Under a finite-horizon assumption, we show that this reachable family forms a learning space above the axiom set, equivalently an antimatroid, and conversely that every such structure admits a representation by an appropriate mind.

To study the operational consequences of this structure, we model teaching as sequential communication with a latent target concept. The teacher knows the realized target but the learner does not. Instructional signals are filtered through a prerequisite-gated parser induced by the mind: a signal is usable only when its target concept is currently ordered for the learner, and otherwise collapses to a common null observation. The effective learner-side channel is therefore not fixed in advance. It depends on the learner's current knowledge state and changes as instruction proceeds. The same raw broadcast may convey usable information to one learner while collapsing to noise for another. We call this phenomenon the relativity of randomness.

This state dependence creates two distinct obstacles to fast teaching. The first is structural: before a target can be acquired, the learner must move through prerequisite-respecting states until the target becomes currently parseable. The second is epistemic: the learner must infer which target the teacher intends. Our main quantitative result combines these two bottlenecks into a general lower bound on teaching time. Expected completion time must clear both a structural barrier, determined by the shortest valid route to the target, and an epistemic barrier, determined by the cumulative usable information that can pass through the learner-side channel. The structural barrier can be dominant, but the information-theoretic layer remains essential for characterizing when and how instruction becomes usable. In our model, once the prerequisite structure makes the target reachable, one additional signal is enough for identification.
This framework leads to several consequences. Acquiring prerequisites does more than add concepts: in the sense of Blackwell, it refines the statistical experiment through which later instruction about the target is interpreted. This structural change has operational implications for teaching. For deterministic targets, fixed-horizon teaching problems exhibit discontinuous structural thresholds: completion probability jumps from zero to one when the teaching horizon reaches the structural distance to the target concept, implying non-concave returns to instructional time and simple failures of uniform resource allocation. The same structural logic also shapes multi-learner settings. Across heterogeneous learners, teaching with a common broadcast curriculum can be strictly slower than personalized instruction by a factor linear in the number of learner types.

Related literature. The paper draws on several literatures but differs from each in a specific way. At a broad level, our question is how the structure of a learner limits the usable flow of information. This connects combinatorial models of learning, information theory, teaching and machine learning, and models of skill formation, but the present framework combines these ingredients in a way that is specific to prerequisite-gated understanding.

The combinatorial study of feasible learning states originates with knowledge space theory [Doignon and Falmagne, 1999, 2015], where the family of feasible states is taken as a primitive. Independently, Korte et al. [1991] arrived at the same mathematical structure, antimatroids, from the perspective of combinatorial optimization. We recover this structure from a different starting point: a generative model of a mind specified by axioms and finitary expansion rules. The equivalence (Theorem 2.27) shows that the two viewpoints are formally interchangeable, but the generative formulation connects the combinatorial structure directly to closure, derivability, and the teaching bounds developed in the paper.

Shannon's information theory [Shannon, 1948] studies channels whose input-output relationship is fixed. In our framework, by contrast, the learner's parsing map induces an effective channel whose output alphabet depends on the learner's current acquired state. Blackwell's comparison of experiments [Blackwell, 1951, 1953] provides the natural language for this dependence: we show that the parsed experiment induced by a larger acquired state Blackwell-dominates the one induced by a smaller state.

In computational learning theory, Goldman and Kearns [1995] introduce teaching dimension as a combinatorial measure of how many labeled examples suffice to identify a target concept within a learner class. Our setting is different: the main constraint is not only identification, but whether the learner can parse target-relevant signals at all given its current prerequisites. More recent work in machine teaching studies settings in which a single teacher must instruct multiple heterogeneous learners with a common teaching sequence; for example, Zhu et al. [2017] show that common teaching can be strictly harder than individualized teaching, and Zhu et al. [2018] survey the broader landscape.
Our broadcast impossibility result (Theorem 5.6) differs in the source of the penalty: it is driven by prerequisite-gated decodability and the geometry of reachable acquired states, rather than by differences in algorithmic update rules across learners. The term curriculum also appears in machine learning, where it typically refers to the ordering of training examples from easy to hard [Bengio et al., 2009]. There the object being shaped is the optimization trajectory of a parametric model; here it is the sequence of prerequisite-respecting states through which a structured learner can move.

The threshold and allocation results in Section 5 are also related in spirit to models of human-capital accumulation [Becker, 1964, Ben-Porath, 1967, Cunha and Heckman, 2007]. Those models study how current investment affects future skill formation, often through complementarity across stages. Our mechanism is different. In our framework, missing prerequisites create structural thresholds: below the relevant structural depth, completion is impossible regardless of strategy, whereas beyond that threshold completion becomes feasible. The resulting non-smoothness comes from prerequisite-gated decodability rather than from an exogenous production technology. The state-dependent information constraint also connects to rational inattention [Sims, 2003]: both frameworks study limits on usable information, but in rational inattention the bottleneck is imposed through an explicit information cost, whereas here it arises endogenously from the prerequisite structure of the mind.

Finally, the observation that absorptive capacity limits the value of information connects naturally to emerging work on the economic implications of AI-generated content. As generative models reduce the cost of producing explanations, examples, and analyses, the central question becomes who can make use of the resulting output. In our framework, this bottleneck arises from the prerequisite structure of the learner, which determines which generated signals carry usable information and which collapse to noise.

Notation and conventions. We write ∆(Ω) for the set of probability distributions on a finite or countable set Ω. Unless stated otherwise, all logarithms are taken to base 2; accordingly, entropy and mutual information are measured in bits. For a set S, we denote its cardinality by |S| and its power set by 2^S. Finally, δ_x denotes the point mass at x ∈ S.

2 Understanding as a Closure System

What does it mean for a learner to "understand" something? A child who knows addition can follow a multiplication lesson built on repeated addition; one who lacks addition cannot follow that explanation. The same words carry information for one mind and are noise for another. Understanding, in this sense, is not an isolated state but a structured dependency: each concept requires certain prerequisites, and those prerequisites may themselves depend on prior knowledge.

To formalize this idea, we introduce a primitive notion of concept and a nonempty concept space, whose elements represent the conceptual units under consideration. A mind is then specified by two objects: a set of axioms and a set of expansion rules. Axioms are concepts taken as given, requiring no further justification. Each expansion rule states that mastery of a specific finite set of concepts, referred to as its prerequisites, unlocks a new concept.
Different minds may share the same concept space yet differ in their axioms or expansion rules. In that case, the order in which concepts become learnable differs, capturing the familiar observation that individuals with different backgrounds require different learning paths.

Given a mind, the expansion rules induce a closure operator: starting from any set of known concepts, iteratively apply every expansion rule whose prerequisites are satisfied until no new concepts are added. The resulting closure operator satisfies extension, monotonicity, and idempotence, the standard closure axioms. These are not merely formal conveniences. Extension encodes that knowledge is never lost by derivation. Monotonicity encodes that knowing more can only enlarge what is derivable. Idempotence encodes that once all consequences have been drawn, further application changes nothing. Any reasonable notion of logical or conceptual consequence must satisfy these properties. The closure framework provides the basic structural language in which the notion of understanding will be formalized in the sections that follow. We now formalize these ideas using closure operators from order theory.

Definition 2.1 (Concept space). A concept space is a nonempty set C whose elements are concepts.

The concept space C is a modeling primitive: its elements may represent facts ("zebras are animals"), skills ("long division"), propositions ("the fundamental theorem of calculus"), or procedures ("how the simplex method works") at any level of granularity. The framework is invariant to this choice. The modeler selects C in the same way an economist selects the state space in a decision problem or the type space in a mechanism design model: the choice determines which phenomena the model can express, but the theorems themselves do not depend on the particular interpretation. The concept space C may be finite or countably infinite. When concepts admit finite descriptions, they can be represented as finite strings over a finite alphabet, and C can therefore be identified with a subset of that set.

Definition 2.2 (Mind). A mind over a concept space C is a triple m = (C, A_m, E_m) where:

(i) A_m ⊆ C is a set of axioms,
(ii) E_m ⊆ 2^C_fin × C is a set of expansion rules, where 2^C_fin denotes the collection of finite subsets of C.

The axioms A_m are the concepts that the mind m understands a priori: they require no prerequisites. Each expansion rule (S, c) ∈ E_m states that if all concepts in the finite set S are currently understood, then the concept c becomes accessible. The set S is referred to as the prerequisites of c under that rule.

The expansion rules E_m describe the cognitive architecture of the mind, that is, the wiring that determines what can be derived from what, rather than propositions explicitly known by the learner. A rule in E_m is not assumed to be something the learner can articulate; instead, it specifies which concepts become accessible once the learner has mastered the prerequisites. The teacher, by contrast, may or may not know E_m. A teacher with full knowledge of the learner's rules can tailor instruction to the learner's prerequisite structure, whereas a teacher who is ignorant of the learner's type may have to resort to a common broadcast and can then pay the price of universality (Theorem 5.6).
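For readers who prefer executable notation, the following minimal Python sketch encodes Definition 2.2. The representation (strings for concepts, frozensets for prerequisite sets) is an illustrative choice, not part of the formal model; the concrete mind shown is Mind 1 of the arithmetic example introduced below (Example 2.4).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mind:
    """A mind m = (C, A_m, E_m) in the sense of Definition 2.2."""
    concepts: frozenset   # the concept space C
    axioms: frozenset     # A_m, the concepts understood a priori
    rules: tuple          # E_m: pairs (S, c) with S a finite prerequisite set

# Mind 1 of Example 2.4 below: a = counting, b = addition,
# c = spatial arrays, d = multiplication.
mind1 = Mind(
    concepts=frozenset("abcd"),
    axioms=frozenset("a"),
    rules=(
        (frozenset("a"), "b"),    # {a} => b
        (frozenset("b"), "c"),    # {b} => c
        (frozenset("bc"), "d"),   # {b, c} => d
    ),
)
```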
We impose one structural restriction on E_m: each prerequisite set is finite. Accordingly, the granularity of C, i.e., what counts as a single concept, should be chosen so that realistic explanations can be modeled using finitely many prerequisites. Beyond this finitarity requirement, the level of granularity is a modeling choice.

Remark 2.3. We do not model logical inconsistency or belief revision. Concepts are treated as abstract units, and understanding refers to accessibility under a prerequisite structure rather than to semantic truth. This is a deliberate modeling choice, analogous to Shannon's separation of the engineering problem of communication from the semantic content of messages. Accordingly, a concept in our framework may represent a true theorem, a useful heuristic, or even a widespread misconception. The theory is invariant to this distinction: the teaching bounds depend only on the dependency structure induced by the prerequisite rules and on the information geometry of the teaching interaction, not on the truth value of the concepts themselves.

Example 2.4 (Two minds learning arithmetic). Let C = {a, b, c, d} with the informal readings a = counting, b = addition, c = spatial arrays, d = multiplication.

Mind 1 (algorithmic learner). Axioms A_1 = {a}. The expansion rule set E_1 consists of {a} ⇒ b, {b} ⇒ c, {b, c} ⇒ d. This mind first understands addition from counting, then understands spatial arrays through repeated addition, and finally grasps multiplication once it combines repeated addition with the array representation.

Mind 2 (visual learner). Axioms A_2 = {a}. The expansion rule set E_2 consists of {a} ⇒ c, {c} ⇒ b, {b, c} ⇒ d. This mind first understands spatial arrays from counting objects arranged in space, then understands addition by combining arrays, and finally reaches multiplication through the same rule {b, c} ⇒ d.

Both minds in Example 2.4 share the same concept space and the same axioms, and both can eventually derive all four concepts, but the order in which concepts become available differs. A concept that one mind derives early may come late for the other. This is the formal expression of relativity of understanding: individuals with different cognitive architectures can arrive at the same body of knowledge through fundamentally different paths. We will revisit this example throughout the paper.

Example 2.4 is about learning mathematics, but the framework applies to any domain in which understanding has prerequisite structure. The next example illustrates this.

Example 2.5 (Two minds learning text editing on a computer). Let C = {t, s, k, e} with the informal readings t = typing text, s = selecting (highlighting) text, k = keyboard shortcuts, e = efficient editing.

Mind 3 (mouse-first). The axiom set is A_3 = {t}. The expansion rule set E_3 consists of {t} ⇒ s, {s} ⇒ k, {s, k} ⇒ e. This learner first acquires text selection from typing, then acquires keyboard shortcuts once selection is understood, and finally reaches efficient editing once both selection and shortcuts are available.

Mind 4 (shortcut-first). The axiom set is A_4 = {t}. The expansion rule set E_4 consists of {t} ⇒ k, {k} ⇒ s, {s, k} ⇒ e. This learner first acquires keyboard shortcuts from typing, then acquires selection through shortcut-based interaction, and finally reaches efficient editing once both selection and shortcuts are available.
Both minds share the same concept space and the same axiom set, and both can ultimately reach e. However, their prerequisite structures differ: in Mind 3, selection unlocks shortcuts, whereas in Mind 4, shortcuts unlock selection. The final rule {s, k} ⇒ e is shared, but the paths by which its prerequisites are acquired are different.

The expansion rules admit a combinatorial interpretation. They form a directed hypergraph [Berge, 1984] in which each rule (S, c) is a hyperedge from the prerequisite set S to the concept c.

Definition 2.6 (One-step expansion). For a mind m and a set K ⊆ C of currently known concepts, define the one-step expansion by

Φ_m(K) = K ∪ {c ∈ C : ∃ S ⊆ K such that (S, c) ∈ E_m}.

For Mind 1 in Example 2.4, start from K = {a}. The rule {a} ⇒ b fires, since {a} ⊆ {a}, and therefore Φ_1({a}) = {a, b}. Applying the operator again, the rule {b} ⇒ c fires, whereas {b, c} ⇒ d does not, since c ∉ {a, b}. Thus Φ_1({a, b}) = {a, b, c}. Applying the operator once more, the rule {b, c} ⇒ d now fires, so Φ_1({a, b, c}) = {a, b, c, d}. A further application produces no new concepts, so {a, b, c, d} is a fixed point of Φ_1.

Note that Φ_m is extensive: by definition, the union in Definition 2.6 includes K itself, so K ⊆ Φ_m(K) for every K ⊆ C. We use this property freely throughout.
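The one-step expansion is directly computable. The sketch below, under the same illustrative encoding as the earlier snippet (rules as pairs of a frozenset and a concept), reproduces the trace just described for Mind 1.

```python
def one_step_expansion(rules, K):
    """Phi_m(K) of Definition 2.6: K together with every concept having
    some rule whose prerequisite set is contained in K."""
    K = frozenset(K)
    return K | {c for (S, c) in rules if S <= K}

# Mind 1 of Example 2.4.
rules1 = [(frozenset("a"), "b"), (frozenset("b"), "c"), (frozenset("bc"), "d")]

K = frozenset("a")
for _ in range(4):
    K_next = one_step_expansion(rules1, K)
    print(sorted(K), "->", sorted(K_next))
    K = K_next
# ['a'] -> ['a','b'] -> ['a','b','c'] -> ['a','b','c','d'],
# after which {a, b, c, d} is a fixed point of Phi_1.
```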
The expansion rules E m and axioms A m themselv es are not teac hable: they represen t the learner’s fixed cognitiv e arc hitecture, sensory baseline, or dev elopmen tal stage ov er the timescale of a teac hing in teraction. A concept is strictly un teachable ( c / ∈ U m ) when this arc hitecture cannot bridge the gap from the axioms. If a learner even tually grasps a concept that w as structurally inaccessible to their earlier self, we mo del this not as a teaching even t, but as c o gnitive development : a transition into a new mind m ′ with ric her axioms A m ′ , ric her expansion rules E m ′ , or b oth. Our theory b ounds the fundamen tal limits of te aching a fixed architecture; the long-term development of the architecture itself is a separate pro cess. Prop osition 2.10 (Existence and characterization) . F or any mind m and any K ⊆ C : (i) cl m ( K ) exists and is a fixe d p oint of Φ m . (ii) cl m ( K ) = S ∞ n =0 Φ n m ( K ) , wher e Φ 0 m ( K ) = K and Φ n +1 m ( K ) = Φ m (Φ n m ( K )) . (iii) If C is finite, then cl m ( K ) = Φ N m ( K ) for some N ≤ | C | . The existence of a least fixed p oin t in Prop osition 2.10 follows from the Knaster-T arski fixed p oin t theorem [ T arski , 1955 ]; see also [ Aliprantis and Border , 8 2006 ] for a textb o ok treatment. W e give a self-con tained pro of for completeness in Section A . Prop osition 2.11 (Axiomatic characterization of understanding) . F or a given mind m , the set U m = cl m ( A m ) is the unique set U ⊆ C satisfying: (i) Axioms are understo o d: A m ⊆ U . (ii) Closure under expansion: if ( S , c ) ∈ E m and S ⊆ U , then c ∈ U . (iii) Minimalit y: U is the smal lest set satisfying (i) and (ii) . Prop ert y (i) of Prop osition 2.11 ensures that the axioms b elong to U m . Prop- ert y (ii) enforces closure under expansion: whenev er all prerequisites of a concept are already in the set, the concept itself must also b elong to the set. Man y subsets of C satisfy (i) and (ii); the entire concept space C is a trivial example. Prop erty (iii) remov es this am biguity by imp osing minimality: U m admits no prop er subset that b oth contains the axioms and is closed under the expansion rules. T ogether, the three prop erties determine U m uniquely . In this sense, understanding is completely determined by the axioms A m and the expansion rules E m , with no additional degrees of freedom. 2.1 Deriv ations and Equiv alence The closure cl m ( K ) tells us which concepts are reachable from K , but not how they are reached. A deriv ation mak es the “ho w” explicit: it is a ro oted tree whose no des represent rule applications and base concepts, showing step b y step why a concept lies in cl m ( K ) . By Lemma A.2 , every such tree is finite. Definition 2.12 (Deriv ation) . A derivation of c onc ept c fr om K ⊆ C in mind m is a well-founded ro oted tree whose no des are lab eled by concepts, satisfying: (i) Ev ery no de is either a b ase no de or a rule no de : • A b ase no de is a leaf (no children) lab eled by a concept in K . • A rule no de is lab eled b y a concept c ′ and has children in bijection with a set S suc h that ( S , c ′ ) ∈ E m , with each c hild lab eled b y the corresp onding elemen t of S . (ii) The ro ot is lab eled by c . W e write K ⊢ m c if such a deriv ation exists. Example 2.13 (Deriv ation trees for the tw o minds) . Con tin uing Example 2.4 , consider the deriv ation of d (m ultiplication) from the axiom set K = { a } . Figure 1 sho ws the deriv ation trees for b oth minds. 
Example 2.13 (Derivation trees for the two minds). Continuing Example 2.4, consider the derivation of d (multiplication) from the axiom set K = {a}. Figure 1 shows the derivation trees for both minds. In each tree, the leaves (bottom nodes, drawn as squares) are concepts from K, which represents the starting knowledge. Each internal node (drawn as a circle) is a concept derived by applying one expansion rule to its children (the nodes directly below it). The root (top node) is the concept being derived. Reading each tree bottom-up: Mind 1 derives a → b → c → d (addition before arrays); Mind 2 derives a → c → b → d (arrays before addition). The two derivation trees witness the same conclusion, namely that d ∈ cl_1({a}) ∩ cl_2({a}), but through different intermediate paths. This provides a concrete instance of mind-relativity.

[Figure 1: Derivation trees for d (multiplication) from K = {a} (counting) in the two minds of Example 2.4. Each tree is read bottom-up: leaves are concepts already known; each internal node is derived from its children by the expansion rule shown alongside. The root d is the concept being derived. Both trees witness that d belongs to the corresponding understanding closure of {a}, but through different intermediate paths.]

Derivations provide a constructive counterpart to the closure: if a concept belongs to cl_m(K), there must exist a finite chain of rule applications that produces it. The following theorem confirms that the two characterizations are equivalent, that is, nothing belongs to the closure without a derivation, and every derivation stays within the closure.

Theorem 2.14 (Closure-derivability equivalence). For any mind m, any set K ⊆ C, and any concept c ∈ C,

c ∈ cl_m(K) ⟺ K ⊢_m c.

The closure operator cl_m induced by a mind satisfies the usual closure axioms (extension, monotonicity, and idempotence). A closure operator that additionally satisfies a finitary property, namely that membership in the closure depends only on finitely many elements, is called algebraic (see Definition A.3).

Theorem 2.15 (Algebraic closure equivalence).

(i) For any mind m = (C, A_m, E_m), the closure operator cl_m : 2^C → 2^C is an algebraic closure operator on C.
(ii) Conversely, for any set X and any algebraic closure operator f : 2^X → 2^X, there exists a rule set E ⊆ 2^X_fin × X such that, writing Ψ_E(K) = K ∪ {c ∈ X : ∃ S ⊆ K such that (S, c) ∈ E}, one has, for every K ⊆ X,

f(K) = ⋂ {F ⊆ X : K ⊆ F and Ψ_E(F) = F}.

Theorem 2.15 shows that finitary expansion-rule systems and algebraic closure operators are equivalent ways of describing the same finitary consequence relation. In particular, every finitary expansion-rule system induces an algebraic closure operator, and conversely every algebraic closure operator on a set X admits at least one, generally non-unique, presentation by finitary expansion rules. Thus the rule-based component of a mind should be understood not as additional structure beyond closure, but as a presentation of an algebraic closure operator. Conceptually, this separates structure from presentation. The expansion rules describe one particular finite-premise decomposition of the underlying consequence relation, while the intrinsic object is the algebraic closure operator itself.
Concretely, let X be a nonempty set, let f : 2^X → 2^X be an algebraic closure operator, and let A ⊆ X be a chosen set of axioms. Choose any rule set E ⊆ 2^X_fin × X whose induced closure operator is f, as guaranteed by Theorem 2.15 (ii). Then m = (X, A, E) is a mind whose induced closure operator is f, and whose understanding is U_m = f(A). Thus specifying a mind amounts to specifying an algebraic closure operator together with an axiom set, while the rule formalism provides a finite-premise presentation of that closure structure.

2.2 Ordered and Unordered Information

In classical information theory, the information content of a signal is treated as a property of the source model, independent of the particular receiver. In teaching, however, the usefulness of information is fundamentally relative. The same explanation that substantially reduces uncertainty for a prepared learner may convey little or no usable information to a novice. This relativity arises because the ability to extract usable information depends on two internal factors: the learner's prerequisite structure E_m and the learner's acquired concept set K at the time of interaction. A concept that is within reach for one mind may be completely inaccessible to another, either because the two minds operate under different prerequisite rules, or because they share the same rules but begin from different acquired concept sets. Formally, one-step accessibility of a concept is determined by the expansion map: a concept c is reachable from the current acquired concept set K if and only if c ∈ Φ_m(K). Consequently, the information conveyed by a signal is not determined by the signal alone, but by its position with respect to the learner's mind. This relationship defines the effective channel through which teaching occurs.

Definition 2.16 (Ordered and unordered concept). Let m be a mind and let K ⊆ C be the set of concepts the learner currently knows. A concept c ∈ C is:

(i) Ordered for (m, K) if c ∈ Φ_m(K). Equivalently, either c ∈ K, or there exists a rule (S, c) ∈ E_m such that S ⊆ K.
(ii) Unordered for (m, K) if c ∉ Φ_m(K). Equivalently, c ∉ K and for every rule (S, c) ∈ E_m, at least one prerequisite in S is missing from K.
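Definition 2.16 amounts to a one-line predicate on top of the expansion map. A sketch (same illustrative encoding as before) showing which concepts are ordered for Mind 1 at the axiom state:

```python
def is_ordered(rules, K, c):
    """True iff c is ordered for (m, K), i.e. c is in Phi_m(K) (Definition 2.16)."""
    K = frozenset(K)
    return c in K or any(S <= K for (S, c2) in rules if c2 == c)

rules1 = [(frozenset("a"), "b"), (frozenset("b"), "c"), (frozenset("bc"), "d")]
print({c: is_ordered(rules1, "a", c) for c in "abcd"})
# {'a': True, 'b': True, 'c': False, 'd': False}: at K = {a}, only b is
# newly ordered; c and d stay unordered until their prerequisites arrive.
```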
At any given stage of the learning process, the set K represents the concepts actually acquired so far. It is not assumed to be closed under inference. The closure cl_m(K) represents the set of concepts that are in principle reachable from K under the learner's expansion rules. Accordingly, it need not coincide with the learner's current acquired set at a given moment. Later, when we model teaching dynamics (see Section 3), the evolving state K_t will represent the concepts the learner has acquired by time t, whereas cl_m(K_t) will describe the concepts that are potentially accessible from that state.

Remark 2.17. The distinction between ordered and unordered concepts concerns decodability at the present moment, not whether a signal can be stored for later use. A learner with memory could buffer a signal targeting a currently unordered concept; for example, a student might copy down a formula they do not yet understand. Once the prerequisite concepts enter K, the stored signal may become decodable retroactively. In the memoryless parsing model introduced in Definition 3.3, by contrast, a signal targeting an unordered concept is lost immediately. A natural extension would replace that parser with a delayed-parsing variant, in which raw signals are buffered and re-parsed whenever K expands. Such a model could lower the teaching-time lower bound, since information presented too early would no longer be wasted.

Definition 2.18 (Valid ordered curriculum). Let m be a mind and let K_0 ⊆ C be an initial knowledge set. A possibly empty finite sequence γ = ((S_i, c_i))_{i=1}^{L}, L ≥ 0, is a valid ordered curriculum starting from K_0 if:

(i) (S_i, c_i) ∈ E_m for each i = 1, . . . , L;
(ii) defining recursively

K_i = K_{i−1} ∪ {c_i}, i = 1, . . . , L,     (1)

one has S_i ⊆ K_{i−1} for every i = 1, . . . , L.

Definition 2.18 formalizes the idea that a curriculum must respect prerequisites at every step. The rule (S_i, c_i) can be used only when all concepts in S_i are already contained in the current set K_{i−1}. Thus the curriculum follows a prerequisite-respecting path, updating the set of concepts acquired by the learner one step at a time. Here K_i denotes the set of concepts acquired after the first i steps, so that K_0 ⊆ K_1 ⊆ · · · ⊆ K_L.

Theorem 2.19 (Ordering theorem). For any mind m and any target c* ∈ U_m, there exists a valid ordered curriculum γ = ((S_1, c_1), . . . , (S_L, c_L)), L ≥ 0, starting from A_m, such that, if (K_i)_{i=1,...,L} is constructed as in (1) with K_0 = A_m, then c* ∈ K_L.

Example 2.20 (Valid ordered curricula for the two minds). We illustrate Definition 2.18 using the two minds of Example 2.4, both starting from the initial concept set K_0 = {a}.

Mind 1 (algorithmic). A valid ordered curriculum for Mind 1 is

γ_1 = (r_1, r_2, r_3), r_1 = ({a}, b), r_2 = ({b}, c), r_3 = ({b, c}, d).

Writing c_1 = b, c_2 = c, c_3 = d, and defining K_0^{(1)} = {a}, K_i^{(1)} = K_{i−1}^{(1)} ∪ {c_i} (i = 1, 2, 3), we obtain

K_1^{(1)} = {a, b}, K_2^{(1)} = {a, b, c}, K_3^{(1)} = {a, b, c, d}.

Indeed, at each step the prerequisite set of the selected rule is contained in the current acquired concept set.

Mind 2 (visual). A valid ordered curriculum for Mind 2 is

γ_2 = (r′_1, r′_2, r′_3), r′_1 = ({a}, c), r′_2 = ({c}, b), r′_3 = ({b, c}, d).

Writing c′_1 = c, c′_2 = b, c′_3 = d, and defining K_0^{(2)} = {a}, K_i^{(2)} = K_{i−1}^{(2)} ∪ {c′_i} (i = 1, 2, 3), we obtain

K_1^{(2)} = {a, c}, K_2^{(2)} = {a, b, c}, K_3^{(2)} = {a, b, c, d}.

Again, each rule is applicable when used.

Thus the two minds admit different valid ordered curricula from the same starting set. In particular, their first steps must differ. For Mind 1 the only rule whose prerequisite set is contained in {a} is ({a}, b), whereas for Mind 2 the only such rule is ({a}, c). This suggests that a single common curriculum cannot in general respect the structural requirements of both minds simultaneously, foreshadowing the impossibility result of Section 5.2.
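Checking Definition 2.18 is mechanical. The sketch below (encoding as in the earlier snippets; illustrative only) validates the two curricula of Example 2.20 and confirms that neither mind accepts the other's curriculum, since already the first steps differ.

```python
def is_valid_curriculum(rules, K0, curriculum):
    """Definition 2.18: each step (S_i, c_i) must be a rule of the mind whose
    prerequisites lie in the current set; then K_i = K_{i-1} plus {c_i}."""
    K, rule_set = frozenset(K0), set(rules)
    for (S, c) in curriculum:
        if (S, c) not in rule_set or not S <= K:
            return False
        K |= {c}
    return True

rules1 = [(frozenset("a"), "b"), (frozenset("b"), "c"), (frozenset("bc"), "d")]
rules2 = [(frozenset("a"), "c"), (frozenset("c"), "b"), (frozenset("bc"), "d")]
gamma1, gamma2 = list(rules1), list(rules2)  # Example 2.20 applies each rule once, in order

assert is_valid_curriculum(rules1, "a", gamma1)      # Mind 1 accepts gamma_1
assert is_valid_curriculum(rules2, "a", gamma2)      # Mind 2 accepts gamma_2
assert not is_valid_curriculum(rules1, "a", gamma2)  # ...but not each other's
assert not is_valid_curriculum(rules2, "a", gamma1)
```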
Proposition 2.21 (Curricula stay inside the understanding horizon). Let m be a mind, and let γ = ((S_i, c_i))_{i=1}^{L} be a valid ordered curriculum starting from A_m. Let (K_i)_{i=1,...,L} be constructed as in (1) with K_0 = A_m. Then K_i ⊆ U_m for every i = 0, 1, . . . , L. In particular, if c* ∉ U_m, then no valid ordered curriculum starting from A_m can reach c*.

Proposition 2.21 draws a boundary around what any curriculum can achieve. If a concept does not belong to the understanding horizon U_m, then no sequence of rule applications, however long or carefully arranged, can produce it. The barrier is structural, not epistemic: it is not that the teacher lacks information or that the curriculum is poorly designed, but that the expansion rules of the learner do not connect the axioms to the target concept. In this sense, the understanding horizon U_m is the theoretical horizon of the mind m.

A concrete illustration is the attempt to convey the visual experience of the color purple to a learner who has been blind from birth. Here the target concept is not the word purple or its descriptive use, but the sensory concept associated with its visual appearance. Such a learner may understand many relational facts about color: that purple is classified between blue and red in certain systems, that particular objects are called purple by sighted speakers, or that light associated with purple occupies a certain range of wavelengths. But if the learner's mind contains no rule path from its existing concepts to that sensory target, then no curriculum, however long or ingeniously ordered, can reach it.

2.3 Reachable acquired concept sets

The closure operator cl_m identifies what is reachable in principle from the axiom set, but it does not describe the intermediate concept sets through which a learner may pass on the way to that horizon. This distinction is structurally important and closely related to a central idea in the literature on knowledge spaces, where one studies not only which concepts are ultimately attainable, but also which intermediate learning states are feasible along a learning process [Doignon and Falmagne, 1999, Korte and Lovász, 1983]. Closure is a global notion: if a concept lies in cl_m(K), then it is eventually reachable from K, but it need not already belong to the current acquired concept set K. In particular, closure alone does not record which subsets of U_m can arise by successive prerequisite-respecting acquisitions, one concept at a time.

For the structural theory of teaching, we therefore need a finer object than the understanding horizon alone. We introduce the family of reachable acquired concept sets: those subsets of U_m that can be built from the axioms by a finite sequence of locally valid acquisitions. This family is the natural state space for teaching dynamics. Later we show that, under a finite-horizon assumption, it has the combinatorial structure familiar from the knowledge-space literature: after shifting by the axiom core, it forms an antimatroid, equivalently, a learning space. Thus the framework does not take the feasible learning states of [Doignon and Falmagne, 1999] as primitive; rather, it derives them from the axioms and expansion rules of a mind.

Definition 2.22 (Reachable acquired concept sets). A set K ⊆ U_m is reachable if there exists a finite chain A_m = K_0 ⊂ K_1 ⊂ · · · ⊂ K_L = K such that for each i = 0, . . . , L − 1,

K_{i+1} = K_i ∪ {c_i}, c_i ∈ Φ_m(K_i) \ K_i.

Any such chain is called a witnessing chain for the reachability of K.

Assumption 2.23 (Finite understanding horizon). The understanding horizon U_m = cl_m(A_m) is finite.
Assumption 2.23 is imposed only to place the reachable family within the finite combinatorial framework of learning spaces and antimatroids. The definition of reachability itself does not require finiteness. Under this assumption, we define the reachable family of mind m as

K_m = {K ⊆ U_m : K is reachable from A_m}.

The reachable family K_m will later serve as the state space for the teaching dynamics in Section 3, so its internal structure is central to the theory. The next proposition shows that this family has three basic features. It has a distinguished minimum state, every non-minimal reachable state can be obtained from another reachable state by adding a single concept, and it is closed under unions. These properties are natural from the perspective of learning: one can build feasible states step by step, and compatible partial acquisitions can be combined. They also place the reachable family in close correspondence with the combinatorial objects studied in the literature on learning spaces [Doignon and Falmagne, 1999] and antimatroids [Korte and Lovász, 1983].

Proposition 2.24 (Structure of the reachable family). Under Assumption 2.23, the family K_m is finite and satisfies:

(i) A_m is the minimum element of the partially ordered set (K_m, ⊆);
(ii) for every K ∈ K_m with K ≠ A_m, there exists K′ ∈ K_m such that K′ ⊂ K and |K \ K′| = 1;
(iii) if K, K′ ∈ K_m, then K ∪ K′ ∈ K_m;
(iv) U_m is the maximum element of (K_m, ⊆);
(v) ordered by inclusion, (K_m, ⊆) is a finite join-semilattice, and for every K, K′ ∈ K_m the join is given by K ∨ K′ = K ∪ K′.

Properties (i) through (iii) identify the core combinatorial features of the reachable family: a distinguished minimum state, one-step accessibility, and union-closure. These are precisely the ingredients that connect the reachable family to the notions of learning space [Doignon and Falmagne, 1999] and antimatroid [Korte and Lovász, 1983]. To make the connection precise, we recall both concepts and their equivalence.

An antimatroid on a finite set E is a family F ⊆ 2^E satisfying: (i) ∅ ∈ F; (ii) for every nonempty S ∈ F, there exists x ∈ S such that S \ {x} ∈ F (accessibility); and (iii) F is union-closed. While matroids [Whitney, 1935] axiomatize independence structures in which feasibility is closed downward (every subset of a feasible set is feasible), antimatroids [Korte and Lovász, 1983] capture the complementary pattern: feasibility is closed upward under unions, modeling sequential construction under precedence constraints. Independently, Doignon and Falmagne [1999] arrived at the same mathematical structure from a different motivation: modeling the feasible knowledge states of a human learner. They called the resulting object a learning space [Doignon and Falmagne, 2015, Theorem 7], [Doignon and Falmagne, 2016], which is an antimatroid.

The standard definition of a learning space takes the empty set as the minimum element, modeling a learner who begins with no knowledge. In our setting the learner starts from the axiom set A_m, so we introduce a shifted variant that replaces ∅ with A.

Definition 2.25 (A-based learning space). Let U be a finite set and let A ⊆ U. A family F ⊆ 2^U is called an A-based learning space if:

(i) A ∈ F and every K ∈ F satisfies A ⊆ K;
(ii) for every K ∈ F with K ≠ A, there exists x ∈ K \ A such that K \ {x} ∈ F;
(iii) F is union-closed.

Equivalently, the shifted family F̂ = {K \ A : K ∈ F} ⊆ 2^{U \ A} is an antimatroid.
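The reachable family of Definition 2.22 can be enumerated by search, growing states one newly ordered concept at a time. The sketch below (encoding as before; illustrative only) does this for the small mind of Figure 2 below, with axioms {a} and rules {a} ⇒ b, {a} ⇒ c, {b, c} ⇒ d, and spot-checks the union-closure property of Proposition 2.24 (iii).

```python
def one_step(rules, K):
    return K | {c for (S, c) in rules if S <= K}

def reachable_family(rules, axioms):
    """Enumerate K_m (Definition 2.22): breadth-first search over states,
    adding one currently ordered but unacquired concept at a time."""
    start = frozenset(axioms)
    family, frontier = {start}, [start]
    while frontier:
        K = frontier.pop()
        for c in one_step(rules, K) - K:   # each addition extends a witnessing chain
            K_new = K | {c}
            if K_new not in family:
                family.add(K_new)
                frontier.append(K_new)
    return family

rules = [(frozenset("a"), "b"), (frozenset("a"), "c"), (frozenset("bc"), "d")]
fam = reachable_family(rules, "a")
print(sorted("".join(sorted(K)) for K in fam))
# ['a', 'ab', 'abc', 'abcd', 'ac']: note that {a, b, d} is absent
assert all(K | L in fam for K in fam for L in fam)  # union-closure, Prop 2.24 (iii)
```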
Corollary 2.26 (Shifted antimatroid structure). Under Assumption 2.23, the reachable family K_m is an A_m-based learning space. Equivalently, the shifted family K̂_m = {K \ A_m : K ∈ K_m} ⊆ 2^{U_m \ A_m} is an antimatroid.

The next theorem characterizes the reachable families generated by minds: they are precisely the A-based learning spaces.

Theorem 2.27 (Representation of reachable families). Let C be a finite set, let A ⊆ C, and let F ⊆ 2^C. The following are equivalent:

(i) F is an A-based learning space;
(ii) there exists a mind m = (C, A, E_m) whose reachable family satisfies K_m = F.

Moreover, when (i) holds, the mind m_F = (C, A, E_F) with rule set

E_F = {(S, c) : S ∈ F, c ∈ C \ S, S ∪ {c} ∈ F}

satisfies K_{m_F} = F.

Figure 2 illustrates the reachable family for a mind with axiom set A_m = {a} and expansion rules {a} ⇒ b, {a} ⇒ c, {b, c} ⇒ d. Starting from K = {a}, the learner can acquire b or c in either order, since both are individually unlocked by the axiom. However, d becomes reachable only once both b and c have been acquired, so {a, b, c} is the unique gateway to d. The set {a, b, d}, for instance, does not belong to K_m because the rule for d requires c, which is absent. The figure makes both accessibility and union-closure visible.

[Figure 2: The reachable family K_m for a mind with axiom set A_m = {a} and expansion rules {a} ⇒ b, {a} ⇒ c, {b, c} ⇒ d. The concept d becomes reachable only at {a, b, c}, where both prerequisites are present. Sets such as {a, b, d} are structurally unreachable.]

Theorem 2.27 characterizes the reachable families generated by minds as the A-based learning spaces. This has two consequences for the present work. First, the feasible knowledge states of a mind need not be postulated as a primitive; they are derived from axioms and expansion rules, and the resulting state space automatically inherits the rich combinatorial structure of an antimatroid. Second, the converse direction guarantees that the mind formalism is fully expressive: any learning space one might wish to study can be generated by a suitably chosen mind. Thus the structural and the generative viewpoints are equivalent. We note, however, that not every union-closed family above the axioms qualifies as an A-based learning space. Accessibility is an additional requirement. It rules out degenerate state spaces in which the learner cannot progress one concept at a time; see Corollary B.1.

3 Teaching and Learning Dynamics

Understanding characterizes which concepts are in principle accessible under a prerequisite structure. Teaching introduces a second challenge beyond accessibility: the learner must identify the teaching target. A signal about addition in a mathematics course, for example, may indicate that addition is itself the intended endpoint, or it may be an intermediate step on the way to multiplication. This is the identification component of teaching. It is here that intentionality enters. A teaching move is not merely the presentation of a concept; it is an action chosen in light of a target and interpreted by the learner as evidence about that target.
To represent this asymmetry, we model the target concept as a latent variable known to the teacher and unknown to the learner, and we represent the learner's evolving belief as a probability distribution over candidate targets.

The latent target need not be interpreted only as the teacher's intended endpoint. It may also be read as the higher-level concept that renders the currently acquired material globally coherent. On this interpretation, learning involves two coupled dimensions: the acquired concept set expands, while the learner simultaneously infers which larger target those concepts are organizing toward. A concept may therefore be acquired locally before its place in the larger conceptual graph is understood. For example, a learner may acquire many concepts from electromagnetism and electronics while still lacking the bridge concept that connects them to wireless communication. Once that target is identified, previously disconnected material becomes integrated as part of a single explanatory structure.

Teaching dynamics therefore involve both structural and epistemic progress. Structural progress is governed by the prerequisite structure: once the learner is at an acquired set from which a concept is ordered, and the appropriate signal is successfully parsed, that concept enters the learner's acquired concept set. Epistemic progress, by contrast, concerns the gradual resolution of uncertainty about the target. Because the learner does not know which target the teacher intends, each signal must play a dual role: it must be a valid instructional step in the prerequisite structure, and it must simultaneously provide evidence that distinguishes the intended target from the alternatives. From the learner's perspective, the observed signal is therefore a random variable whose distribution depends on both the unknown target and the teaching strategy. Each round can convey only a bounded amount of usable information about the latent target, and the total teaching time is governed by the rate at which this epistemic uncertainty is resolved. If the learner knew the target from the outset, the epistemic dimension would disappear and teaching would reduce to the purely structural problem of reaching a known target by a valid curriculum. We now make these ideas concrete by introducing a stochastic model of teaching.

3.1 A Stochastic Model of Teaching

Fix a probability space (Ω_0, F, P) on which all random variables below are defined. Let Ω ⊆ C be a finite set of target concepts. Let Θ : Ω_0 → Ω be an Ω-valued random variable representing the realized target concept, known to the teacher but unknown to the learner. The learner's goal is to identify Θ. Let Z be a finite set, called the teaching signal set, consisting of the raw signals the teacher can emit. Let ⊥ ∉ Z be an additional symbol representing a null observation produced when a signal cannot be parsed at the learner's current knowledge state. The learner observation set is Y = Z ∪ {⊥}.

Definition 3.1 (Signal target map). A signal target map is a function tgt : Z → C that assigns to each raw teaching signal z ∈ Z the concept tgt(z) ∈ C that the signal is intended to teach. We assume that every target concept is associated with at least one raw signal, that is, Ω ⊆ im(tgt).
For each concept c ∈ C, the fiber tgt^{−1}(c) = {z ∈ Z : tgt(z) = c} is the set of all raw signals designed to teach c, representing different explanations, examples, or phrasings of the same concept. Signals in the fiber tgt^{−1}(c) all target the same concept and therefore have the same structural effect on the learner's acquired concept set. However, they may still differ informationally: distinct signals in the fiber can encode different information about the latent target Θ.

Remark 3.2 (Fixed signal system and notation). The raw signal alphabet Z and the target map tgt are treated as fixed throughout a given teaching problem. Capacity quantities introduced later therefore depend not only on the mind m and the acquired concept set K, but also on this signal system (Z, tgt). When no ambiguity arises, we suppress this dependence in the notation.

The signal target map tgt and the latent target Θ play complementary but distinct roles. The random variable Θ ∈ Ω specifies what the learner must ultimately identify: the realized target concept. The map tgt specifies what each individual signal is about: a signal z with tgt(z) = c is designed to teach concept c, which may or may not equal Θ. In general, signals targeting prerequisite concepts may need to be presented before signals targeting Θ itself can become usable to the learner. Thus the teacher's eventual strategy has two degrees of freedom: which concept to target, and which particular encoding of that concept to use within the fiber tgt^{−1}(c). Consequently, a signal may carry information about the target even when it does not directly target the concept Θ.

We now introduce the parsing map ρ_m, which takes a raw teaching signal together with the learner's current knowledge set and either passes the signal through, when the prerequisites are satisfied, or collapses it to the null token ⊥ otherwise.

Definition 3.3 (Parsing map). A mind m is equipped with a parsing map ρ_m : Z × 2^C → Z ∪ {⊥}, where ⊥ is a null token indicating that the signal is unparseable. For a signal z ∈ Z with target c = tgt(z) and a knowledge set K ⊆ C:

(i) ρ_m(z, K) = z if c ∈ Φ_m(K), equivalently, if either c ∈ K already or there exists a rule (S, c) ∈ E_m with S ⊆ K;
(ii) ρ_m(z, K) = ⊥ if c ∉ Φ_m(K), equivalently, if c ∉ K and no rule for c has all its prerequisites in K.

The condition c ∈ Φ_m(K) is the ordered condition of Definition 2.16. A concept may have multiple prerequisite sets, and the signal is parseable if any one of them is satisfied.

Dynamics. We model teaching as a repeated interaction between a teacher and a learner unfolding over discrete rounds t = 0, 1, 2, . . . . The model uses a concept-level time scale: one round represents a single instructional interaction in which the teacher emits one raw signal, the learner observes its parsed version, and the learner's acquired concept set may be updated as a result. We take the learner's initial acquired concept set to be the axiom set of the mind: K_0 = A_m. For each t ≥ 0, the set K_t ⊆ C denotes the concepts acquired by the learner after the first t rounds of instruction. At round t + 1, the teacher emits a raw signal Z_{t+1} ∈ Z. Given the learner's current acquired concept set K_t, the learner observation is the parsed signal

Y_{t+1} = ρ_m(Z_{t+1}, K_t) ∈ Y.
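The parsing map of Definition 3.3 is again a small predicate on top of the ordered condition. In the sketch below the null token ⊥ is encoded as None (an illustrative choice), and the signal system has one raw signal per concept:

```python
def parse(rules, tgt, z, K):
    """rho_m(z, K) of Definition 3.3: pass z through iff tgt(z) is ordered
    for (m, K); otherwise collapse it to the null token (None here)."""
    c, K = tgt[z], frozenset(K)
    ordered = c in K or any(S <= K for (S, c2) in rules if c2 == c)
    return z if ordered else None

# Mind 1 of Example 2.4, with singleton fibers tgt^{-1}(c).
rules1 = [(frozenset("a"), "b"), (frozenset("b"), "c"), (frozenset("bc"), "d")]
tgt = {"z_b": "b", "z_c": "c", "z_d": "d"}
print(parse(rules1, tgt, "z_b", "a"))  # 'z_b': b is ordered at K = {a}
print(parse(rules1, tgt, "z_d", "a"))  # None:  d collapses to ⊥ at K = {a}
```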
Definition 3.4 (Concept-acquisition update rule). Given the parsed observation Y_{t+1} ∈ Y, the learner's acquired concept set evolves according to

K_{t+1} = K_t ∪ {tgt(Y_{t+1})}   if Y_{t+1} ∈ Z,
K_{t+1} = K_t                    if Y_{t+1} = ⊥.

Under this update rule, each round can add at most one newly acquired concept, namely the concept targeted by the parsed signal when parsing succeeds. Time is therefore measured in units of concept-level teaching opportunities. The rule in Definition 3.4 has two immediate consequences. First, acquisition is monotone: K_t ⊆ K_{t+1} for all t ≥ 0. Second, the set K_t records only concepts that have been explicitly acquired through parsed instruction; it is not automatically closed under the expansion rules. Thus a concept may already be reachable from K_t, in the sense that c ∈ Φ_m(K_t), without yet belonging to K_t itself. The learner acquires such a concept only at a later round in which it receives a parseable signal targeting c. Therefore, U_m = cl_m(A_m) describes what is in principle reachable from the axioms, whereas the process (K_t)_{t≥0} describes what has actually been acquired over time.

Lemma 3.5 (The instructional process stays inside the reachable family). For every t ≥ 0, one has K_t ∈ K_m almost surely.

Lemma 3.5 shows that the stochastic teaching process evolves within the reachable family K_m. Thus the family introduced in Section 2.3 not only describes structurally feasible knowledge states but also forms the natural state space for the instructional dynamics.

Definition 3.6 (Admissible teaching strategy). An admissible teaching strategy is a sequence of stochastic kernels

κ_{t+1}( · | θ, y_1, . . . , y_t) ∈ ∆(Z), t ≥ 0,

so that, conditional on the realized target Θ = θ and the parsed history (Y_1, . . . , Y_t) = (y_1, . . . , y_t), the teacher chooses the next raw signal Z_{t+1} according to κ_{t+1}.

Because the learner's epistemic objective is to identify the latent target Θ, it maintains at each time t a belief over the possible target concepts. This belief is updated from the parsed observations Y_1, . . . , Y_t, rather than from the raw teacher emissions Z_1, . . . , Z_t, which are not directly observed by the learner. Accordingly, define the learner's information filtration by

F_t = σ(Y_1, . . . , Y_t) ⊆ F, t ≥ 1,

and set F_0 = {∅, Ω_0}. Given a fixed prior π_0 and a fixed admissible teaching strategy, the learner's posterior at time t is the random probability vector π_t ∈ ∆(Ω) defined by

π_t(c) = P(Θ = c | F_t), c ∈ Ω.

The conditional probability is taken with respect to the probability law induced by the prior π_0 and the admissible teaching strategy. Thus the learner is modeled as Bayesian: its belief state at time t is the posterior distribution of the latent target given the parsed observation history.

Definition 3.7 (Learning state). A learning state at time t is a pair (K_t, π_t) where

(i) K_t ⊆ C is the learner's acquired concept set at time t;
(ii) π_t ∈ ∆(Ω) is the learner's posterior belief over target concepts.

Thus the learning state records both dimensions of progress in the teaching process: structural progress, captured by the acquired concept set K_t, and epistemic progress, captured by the posterior belief π_t about the latent target. The stochastic teaching dynamics therefore evolve on the product space K_m × ∆(Ω).
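A single round of the dynamics updates both coordinates of the learning state (K_t, π_t): the acquired set by Definition 3.4, and the belief by conditioning on the parsed observation. The sketch below does this for a deterministic strategy, represented as a map from each candidate target to its sequence of raw signals; the function names and the encoding are illustrative assumptions, not notation from the paper.

```python
def parse(rules, tgt, z, K):
    c = tgt[z]
    return z if (c in K or any(S <= K for (S, c2) in rules if c2 == c)) else None

def round_update(rules, tgt, strategy, t, theta, K, belief):
    """One teaching round on the product space K_m x Delta(Omega):
    emit Z_{t+1} = strategy[theta][t], parse it against K_t, apply the
    concept-acquisition rule (Definition 3.4), and condition the belief on
    Y_{t+1}. Under a deterministic strategy, a candidate target survives
    iff it would have produced the same parsed observation at state K_t."""
    y = parse(rules, tgt, strategy[theta][t], K)
    K_new = K | {tgt[y]} if y is not None else K
    consistent = {th for th, p in belief.items()
                  if p > 0 and parse(rules, tgt, strategy[th][t], K) == y}
    total = sum(belief[th] for th in consistent)
    belief_new = {th: (belief[th] / total if th in consistent else 0.0)
                  for th in belief}
    return K_new, belief_new
```

Iterating this update replays the interaction of Example 3.9 below; a simulation follows that example.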
Definition 3.8 (Completion). Teaching is complete at time τ if both

(i) target acquisition: Θ ∈ K_τ;

(ii) identification: π_τ(Θ) = 1.

The framework therefore distinguishes three related notions. First, the understanding horizon U_m = cl_m(A_m) is the set of concepts that are in principle reachable from the axioms under the expansion rules of the mind. Second, the time-indexed set K_t records which concepts have actually been acquired through instruction by time t. Third, the completion condition of Definition 3.8 formalizes successful teaching of the target: it requires both acquisition of the target, Θ ∈ K_τ, and identification of the target, π_τ(Θ) = 1. Acquisition without identification corresponds to having acquired a concept without yet knowing that it is the intended target. Identification without acquisition corresponds to knowing which concept is intended without yet having reached it. Completion requires both.

Example 3.9 (A full teaching interaction). Let C = {a, b, c, d} with the informal readings a = counting, b = addition, c = arrays, d = multiplication. Fix Mind 1 from Example 2.4, with axiom set A_m = {a} and expansion rules

\[ \{a\} \Rightarrow b, \qquad \{b\} \Rightarrow c, \qquad \{b, c\} \Rightarrow d. \]

Let Ω = {b, c, d}, let the prior be uniform on Ω, and let the teacher use the deterministic policy

\[
\begin{aligned}
\Theta = b &: \ (Z_1, Z_2, Z_3) = (z_b^{(1)}, z_b^{(1)}, z_b^{(1)}),\\
\Theta = c &: \ (Z_1, Z_2, Z_3) = (z_b^{(1)}, z_c^{(1)}, z_c^{(1)}),\\
\Theta = d &: \ (Z_1, Z_2, Z_3) = (z_b^{(1)}, z_c^{(1)}, z_d^{(1)}),
\end{aligned}
\]

where tgt(z_b^{(1)}) = b, tgt(z_c^{(1)}) = c, and tgt(z_d^{(1)}) = d. Suppose the realized target is Θ = d. The learner starts from K_0 = {a}, π_0(b) = π_0(c) = π_0(d) = 1/3.

At t = 0, the teacher emits Z_1 = z_b^{(1)}. Since b ∈ Φ_m({a}), the signal is parseable, so Y_1 = z_b^{(1)} and K_1 = {a, b}. Because the same first signal is prescribed under all three targets, the observation Y_1 = z_b^{(1)} does not yet distinguish among them, and therefore π_1 = π_0.

At t = 1, the teacher emits Z_2 = z_c^{(1)}. Since b has already been acquired, the concept c is now ordered, so the signal is parseable: Y_2 = z_c^{(1)} and K_2 = {a, b, c}. Under the stated policy, the history (Y_1, Y_2) = (z_b^{(1)}, z_c^{(1)}) is inconsistent with Θ = b. Hence the posterior assigns zero mass to b and splits mass equally between c and d: π_2(b) = 0, π_2(c) = π_2(d) = 1/2.

At t = 2, the teacher emits Z_3 = z_d^{(1)}. Since both b and c are now present, the concept d is ordered, so the signal is parseable: Y_3 = z_d^{(1)} and K_3 = {a, b, c, d}. Now the full observation history is consistent only with Θ = d, so π_3 = δ_d.

Thus teaching is complete at time τ = 3: the learner has both acquired the target, Θ = d ∈ K_3, and identified it, π_3(d) = 1. This example illustrates the distinction between structural acquisition, encoded by the process (K_t), and epistemic identification, encoded by the posterior process (π_t).
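Example 3.9 can be replayed mechanically. The sketch below reuses the parse/update helpers from the earlier snippet; the posterior step exploits the fact that the policy is deterministic and the prior uniform, so Bayes reduces to restricting the prior to the targets whose induced parsed history matches the observed one. All identifiers are our own illustrative choices.

```python
from fractions import Fraction

# Mind 1 of Example 3.9: axioms {a}; rules {a}⇒b, {b}⇒c, {b,c}⇒d.
rules1 = [(frozenset({"a"}), "b"), (frozenset({"b"}), "c"),
          (frozenset({"b", "c"}), "d")]
tgt1 = {"zb": "b", "zc": "c", "zd": "d"}
policy = {"b": ["zb", "zb", "zb"], "c": ["zb", "zc", "zc"], "d": ["zb", "zc", "zd"]}

def parsed_history(theta, T):
    """Parsed observations Y_1..Y_T and final K_T when the target is theta."""
    K, ys = frozenset({"a"}), []
    for t in range(T):
        y = parse(policy[theta][t], tgt1, rules1, K)
        K, ys = update(K, y, tgt1), ys + [y]
    return ys, K

for T in (1, 2, 3):  # realized target Θ = d
    ys, K = parsed_history("d", T)
    # Deterministic policy + uniform prior: posterior is uniform on the
    # targets whose induced parsed history matches the observed one.
    consistent = [c for c in "bcd" if parsed_history(c, T)[0] == ys]
    print(T, sorted(K), {c: Fraction(1, len(consistent)) for c in consistent})
# T=1: K={a,b}, uniform on {b,c,d}; T=2: K={a,b,c}, mass 1/2 on c and d;
# T=3: K={a,b,c,d}, posterior δ_d, i.e. completion at τ = 3.
```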
3.2 The Epistemic Arrow of Time

We now formalize the epistemic component of the teaching dynamics. The key question is how the learner's uncertainty about the latent target evolves as parsed observations accumulate over time. This motivates the term epistemic arrow of time: although particular observations may be uninformative, posterior uncertainty can only decrease in conditional expectation under Bayesian updating. The information-theoretic notions used below are standard; see, for example, [Cover and Thomas, 2006, §2].

Information-theoretic quantities. Let X and Y be discrete random variables on a probability space (Ω_0, F, P) taking values in finite or countable sets X and Y. We adopt the convention 0 log 0 = 0. The Shannon entropy of X is

\[ H(X) = -\sum_{x \in \mathcal{X}} \mathbb{P}(X = x) \log \mathbb{P}(X = x). \]

For a sub-σ-field G ⊆ F, define the pathwise conditional entropy of X given G by

\[ H(X \mid \mathcal{G}) = -\sum_{x \in \mathcal{X}} \mathbb{P}(X = x \mid \mathcal{G}) \log \mathbb{P}(X = x \mid \mathcal{G}). \]

Its expectation E[H(X | G)] is the usual conditional entropy. For brevity, we write H(X | Y) = E[H(X | σ(Y))]. The mutual information between X and Y is I(X; Y) = H(X) − H(X | Y). The conditional mutual information given G is

\[ I(X; Y \mid \mathcal{G}) = H(X \mid \mathcal{G}) - \mathbb{E}[H(X \mid \mathcal{G} \vee \sigma(Y)) \mid \mathcal{G}]. \]

In the teaching model, Θ is Ω-valued and the learner filtration is F_t = σ(Y_1, ..., Y_t). We define the epistemic entropy at time t by H_t = H(Θ | F_t). Since π_t(c) = P(Θ = c | F_t), this may be written as

\[ H_t = -\sum_{c \in \Omega} \pi_t(c) \log \pi_t(c) \quad \text{a.s.} \]

Thus H_t is the Shannon entropy of the learner posterior at time t.

Proposition 3.10 (Entropy drop equals information flow). The one-round expected reduction in epistemic entropy satisfies

\[ \mathbb{E}[H_t - H_{t+1} \mid \mathcal{F}_t] = I(\Theta; Y_{t+1} \mid \mathcal{F}_t). \]

Proposition 3.10 expresses a conservation principle: the expected reduction in posterior uncertainty about the target equals the conditional mutual information conveyed by the next parsed observation. In other words, expected learning progress in one round is precisely the amount of information that Y_{t+1} carries about the target Θ.

Theorem 3.11 (Epistemic arrow of time). The epistemic entropy process (H_t)_{t≥0} is a supermartingale:

\[ \mathbb{E}[H_{t+1} \mid \mathcal{F}_t] \leq H_t, \]

with equality if and only if Y_{t+1} is independent of Θ given F_t.

Theorem 3.11 formalizes the epistemic arrow of time: posterior uncertainty decreases in conditional expectation, although along particular sample paths it may increase after a realized observation. Equality holds when the next observation carries no information about the target.

Remark 3.12 (Bayesian modeling choice). By defining π_t(c) = P(Θ = c | F_t), we have adopted a Bayesian learner model: the learner belief is the true conditional distribution of the target given the parsed observation history. This is not the only possible choice, but it is natural here for three reasons. First, π_t uses all information contained in the observations and nothing else, so it is determined entirely by the prior π_0 and the filtration F_t. Second, because π_t is the conditional distribution of Θ given F_t, the epistemic entropy H_t coincides with the conditional entropy H(Θ | F_t). This makes mutual information the natural measure of learning progress: each new observation reduces posterior uncertainty by precisely I(Θ; Y_{t+1} | F_t) in conditional expectation. Third, the completion condition π_τ(Θ) = 1 then has a strong interpretation: the parsed observations identify the target, rather than the learner merely arriving at the correct answer by chance.
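As a numerical illustration of Proposition 3.10 and Theorem 3.11 (a sketch only, in the Example 3.9 setup, reusing parsed_history from the previous snippet), the expected epistemic entropy E[H_t] is non-increasing along the stated policy:

```python
from math import log2

def entropy_bits(p):
    """Shannon entropy in bits, with the convention 0·log 0 = 0."""
    return -sum(q * log2(q) for q in p if q > 0)

for T in range(4):
    # Group the three equally likely targets by the parsed history they induce.
    groups = {}
    for c in "bcd":
        groups.setdefault(tuple(parsed_history(c, T)[0]), []).append(c)
    # E[H_T]: each history occurs w.p. |group|/3 and carries a uniform posterior.
    eH = sum(len(g) / 3 * entropy_bits([1 / len(g)] * len(g))
             for g in groups.values())
    print(T, round(eH, 3))
# Prints 1.585, 1.585, 0.667, 0.0: E[H_t] is non-increasing, and each one-round
# drop equals the information conveyed by the new observation (Prop. 3.10).
```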
3.3 Prerequisites and the Relativity of Randomness

The epistemic arrow of time in Theorem 3.11 describes how uncertainty evolves once observations are received. It does not, however, determine what the learner actually observes. In the teaching model the learner does not observe the raw teacher signal Z_{t+1} directly; instead it receives the parsed observation Y_{t+1} = ρ_m(Z_{t+1}, K_t), where the parsing map depends on the learner's current acquired concept set. When the targeted concept is ordered, the signal passes through unchanged; when prerequisites are missing, the parser collapses the signal to the null token ⊥. The effective information channel from Θ to the learner is therefore state dependent. In particular, the same raw broadcast may transmit usable information to one learner while being erased for another. The next result formalizes this phenomenon. As throughout, conditional mutual-information expressions given U_{t+1} or U^c_{t+1} are understood on the event where the relevant conditioning probability is positive, and are taken to be 0 otherwise.

Theorem 3.13 (Relativity of randomness). Let C_{t+1} = tgt(Z_{t+1}) be the targeted concept, and define the unparseability event U_{t+1} = {C_{t+1} ∉ Φ_m(K_t)}. Assume that on parseable rounds the raw teacher signal is informative about the latent target: I(Θ; Z_{t+1} | F_t, U^c_{t+1}) > 0. Then the learner's per-round information transfer exhibits an eventwise dichotomy:

\[
I(\Theta; Y_{t+1} \mid \mathcal{F}_t, U_{t+1}) = 0 \ \text{(erasure)}, \qquad
I(\Theta; Y_{t+1} \mid \mathcal{F}_t, U^c_{t+1}) > 0 \ \text{(informative)}.
\]

Theorem 3.13 shows that the usable information in a teaching signal is state dependent. Under the parsing map ρ_m, if the targeted concept is unordered at K_t, then the parsed observation collapses to ⊥; by Theorem 3.13, the learner receives no further within-event discrimination from the raw signal on that event: conditional on unparseability, the parsed observation is the constant ⊥, although the occurrence of unparseability itself may still be informative about Θ. By contrast, on parseable rounds the same raw broadcast may transmit strictly positive information. In this precise sense, the informational status of a signal is relative to the learner's structural capacity to decode it.

This relativity is consistent with classical information theory. Randomness has always been observer dependent: a ciphertext appears as pure noise without the cryptographic key [Shannon, 1949], and conditional mutual information formalizes the dependence of information on what is known [Cover and Thomas, 2006]. What is distinctive here is the mechanism that generates this dependence: the learner's decoding power is governed by the combinatorial closure operator Φ_m, so prerequisite topology directly determines when the channel behaves as identity and when it behaves as erasure.

The notion of mind-relative randomness introduced earlier is related to, but distinct from, the combinatorial distinction between ordered and unordered concepts from Definition 2.16. The latter is a structural property of the targeted concept relative to the learner's acquired concept set, whereas the former is an epistemic property of the observation process relative to the latent target Θ. In the sharp parsing model, if the teacher targets a concept C_{t+1} = tgt(Z_{t+1}) that is unordered at the current acquired concept set, C_{t+1} ∉ Φ_m(K_t), then the parser maps every such raw signal to the same null observation: Y_{t+1} = ⊥ almost surely on that event. Thus all distinctions among those raw signals are erased at the learner end of the channel.
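The dichotomy of Theorem 3.13 can be exhibited on a two-target toy instance. In the sketch below, the mind, signal names, and teacher rule are hypothetical, chosen only to produce the two regimes; parse is reused from the Section 3.1 sketch.

```python
from collections import defaultdict
from math import log2

def mutual_information(joint):
    """I(Θ; Y) in bits from a joint law {(theta, y): prob}."""
    pt, py = defaultdict(float), defaultdict(float)
    for (t, y), p in joint.items():
        pt[t] += p
        py[y] += p
    return sum(p * log2(p / (pt[t] * py[y]))
               for (t, y), p in joint.items() if p > 0)

# Hypothetical mind: axioms {a}; rules {a}⇒b, {b}⇒d1, {b}⇒d2; Θ uniform on {d1,d2}.
rules2 = [(frozenset({"a"}), "b"), (frozenset({"b"}), "d1"), (frozenset({"b"}), "d2")]
tgt2 = {"z1": "d1", "z2": "d2"}

for K in (frozenset({"a"}), frozenset({"a", "b"})):
    # The teacher emits z1 under Θ = d1 and z2 under Θ = d2: a fully
    # revealing raw-signal law, before parsing.
    joint = {("d1", parse("z1", tgt2, rules2, K)): 0.5,
             ("d2", parse("z2", tgt2, rules2, K)): 0.5}
    print(sorted(K), mutual_information(joint))
# At {a} both signals collapse to ⊥, so I(Θ; Y) = 0 (erasure). At {a, b} the
# channel is the identity and I(Θ; Y) = 1 bit (informative).
```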
However, the appearance of ⊥ does not by itself imply mind-relative randomness. Even though the symbol ⊥ contains no internal distinctions, the event {Y_{t+1} = ⊥} may still convey information about the target Θ. In particular, if the teacher's targeting rule depends on Θ, then the probability that the teacher selects a concept outside Φ_m(K_t) may vary with Θ, and observing ⊥ can update the learner's posterior belief.

Conversely, an ordered round need not be informative. If C_{t+1} ∈ Φ_m(K_t), then the signal is parseable and Y_{t+1} = Z_{t+1}. But even in this case the parsed observation may still be mind-random if, conditional on the public history F_t, the teacher's policy induces the same distribution of Y_{t+1} under every possible target; equivalently, Θ ⊥⊥ Y_{t+1} | F_t. Thus parseability and informativeness are logically distinct: an unordered round may still be informative through the occurrence of erasure, while an ordered round may be uninformative if the parsed signal distribution does not depend on the target.

An immediate consequence of sharp parsing is that repeated rephrasings of the same unordered concept do not help. If c ∉ Φ_m(K_t), then every raw signal targeting c collapses to the null observation ⊥, regardless of how many distinct encodings or phrasings are available (see Corollary B.4). Thus, on that event, repetition and rephrasing do not reduce epistemic uncertainty about the target.

Combined with the prerequisite gating established in Theorem 3.13, these observations formalize a central thesis of the framework: whether a broadcast conveys usable information is not an intrinsic property of the signal itself, but of the interaction among the signal, the learner's current acquired concept set, and the teacher's policy.

Remark 3.14. The relativity of randomness established in Theorem 3.13 suggests a broader perspective in which randomness itself becomes observer dependent. The parsing map ρ_m determines, for each mind and acquired state, which signals are informative and which collapse to noise. In this sense, randomness is not an intrinsic property of a signal but a relation between the signal and the observer's structure of understanding.

4 Speed Limits of Teaching

We now derive the quantitative speed limits of the teaching model. Two obstructions coexist. The first is structural: the learner must acquire enough prerequisite concepts for the target to become reachable. The second is epistemic: the learner must resolve uncertainty about which target concept the teacher intends. The purpose of this section is to formalize both obstructions and combine them into a single lower bound on the expected completion time.

Fix a mind m = (C, A_m, E_m) and a finite target set Ω ⊆ U_m = cl_m(A_m). Thus every target under consideration lies in the learner understanding horizon.

4.1 Identification and state-dependent capacity

Recall that Θ : Ω_0 → Ω is the realized target concept and that the learner observes the parsed history F_t = σ(Y_1, ..., Y_t). The learner epistemic objective is to identify Θ from this history. We say that identification occurs at time t if π_t(Θ) = 1, equivalently, H(Θ | F_t) = 0. Since completion additionally requires target acquisition, identification is a strictly weaker requirement than full teaching completion.
Definition 4.1 (Identification stopping time). A random time τ_id is an identification stopping time if:

(i) τ_id is an (F_t)-stopping time;

(ii) P(τ_id < ∞) = 1;

(iii) Θ is F_{τ_id}-measurable.

Equivalently, H(Θ | F_{τ_id}) = 0 almost surely.

Because the parsing map ρ_m depends on the learner's current acquired concept set K_t, the effective learner-side channel is state dependent. At early stages many raw signals may collapse to ⊥, whereas later the same signals may pass through unchanged once the relevant prerequisites have been acquired. Let K_m be the reachable family introduced in Definition 2.22. For each reachable acquired concept set K ∈ K_m, define the ordered raw-signal set

\[ \mathcal{Z}_{\mathrm{ord}}(K) = \{ z \in \mathcal{Z} : \operatorname{tgt}(z) \in \Phi_m(K) \}. \]

Under sharp parsing, signals in Z_ord(K) pass through unchanged, while all other raw signals collapse to ⊥. This leads to the following learner-side capacity notion.

Definition 4.2 (State-dependent parsed entropy bound). For each K ∈ K_m, define

\[ C_m(K) = \sup \big\{ H\big(\rho_m(Z, K)\big) : Z \text{ is a } \mathcal{Z}\text{-valued random variable} \big\}. \]

The parsed entropy bound C_m(K) also depends on the signal system (Z, tgt). Throughout the paper this instructional interface is treated as fixed, and we therefore suppress this dependence in the notation. For a given interface, the variation of C_m(K) across acquired concept sets is endogenous to the learner state, whereas its numerical level is determined jointly by the mind m and the signal system (Z, tgt). Thus C_m(K) should be understood as a property of the pair (m, (Z, tgt)) evaluated at state K.

Thus C_m(K) is the largest Shannon entropy that a one-round parsed observation ρ_m(Z, K) can attain at the learner end of the channel when the acquired concept set is K, as the law of the raw input signal Z ranges over all Z-valued distributions.

Proposition 4.3 (Statewise one-round information bound). For every t ≥ 0,

\[ I(\Theta; Y_{t+1} \mid \mathcal{F}_t) \leq C_m(K_t) \quad \text{almost surely.} \]

Proposition 4.3 shows that the learner's per-round information gain about the target is bounded by the capacity C_m(K_t), which depends on the learner's acquired concept set at time t. As the learner acquires more concepts, the set of parseable signals grows, and the capacity may increase. The bound is therefore not static: structural progress expands the effective channel through which teaching occurs. This coupling between structural progress and informational capacity is the mechanism through which prerequisites govern the speed of teaching.

The next lemma shows that the learner-side channel can only improve as the learner acquires more concepts.

Lemma 4.4 (Monotonicity of the state-dependent bound). If K, K′ ∈ K_m satisfy K ⊆ K′, then C_m(K) ≤ C_m(K′).

Lemma 4.4 reflects a basic property of the parsing model: acquiring additional concepts cannot reduce the learner's ability to decode signals. When the acquired concept set grows, previously parseable signals remain parseable, and additional signals may become usable. Consequently the entropy of the parsed observation, and therefore the effective channel capacity, cannot decrease as the learner acquires additional concepts.
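Since the supremum in Definition 4.2 is attained by a uniform law on the parsed-observation range (this is proved as Lemma A.5 in the appendix), C_m(K) can be computed by enumeration. The sketch below does so for the mind used in Example 4.14 later in this section, anticipating the values derived there; entropies are in bits, and the instance definitions (rules3, tgt3, Z3) are reused by subsequent sketches.

```python
from math import log2

def capacity(Z, tgt_map, rules, K):
    """C_m(K) = log2 |{ρ_m(z, K) : z ∈ Z}|: log-size of the parsed range."""
    return log2(len({parse(z, tgt_map, rules, K) for z in Z}))

# Mind of Example 4.14: axioms {a}; rules {a}⇒b and {b}⇒d_j for j = 1..4.
rules3 = [(frozenset({"a"}), "b")] + \
         [(frozenset({"b"}), f"d{j}") for j in range(1, 5)]
tgt3 = {"zb": "b", **{f"z{j}": f"d{j}" for j in range(1, 5)}}
Z3 = list(tgt3)

print(capacity(Z3, tgt3, rules3, frozenset({"a"})))       # 1.0 bit   (= log 2)
print(capacity(Z3, tgt3, rules3, frozenset({"a", "b"})))  # ≈ 2.32 bits (= log 5)
```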
This monotonicity admits a stronger statistical interpretation. To formalize it, we use the Blackwell order on statistical experiments [Blackwell, 1953]. Informally, one experiment Blackwell-dominates another if the latter can be obtained from the former by garbling, that is, by post-processing through a stochastic map independent of the underlying state. Equivalently, the dominating experiment is at least as informative for every statistical decision problem.

Definition 4.5 (Blackwell domination). Let Ω be a finite state space, and let W : Ω → ∆(Y) and W′ : Ω → ∆(Y′) be two statistical experiments. We say that W Blackwell-dominates W′ if there exists a Markov kernel G : Y → ∆(Y′) such that for every ω ∈ Ω,

\[ W'(\cdot \mid \omega) = \sum_{y \in \mathcal{Y}} G(\cdot \mid y)\, W(y \mid \omega). \]

Equivalently, W′ is a garbling of W.

Theorem 4.6 (Blackwell order on acquired concept sets). Fix t ≥ 0 and a public history h_t = (y_1, ..., y_t) ∈ Y^t with P((Y_1, ..., Y_t) = h_t) > 0. For each K ∈ K_m, let W_{K,h_t} denote the statistical experiment from Θ to the parsed observation induced by the conditional raw-signal law

\[ \mathbb{P}\big( Z_{t+1} \in \cdot \mid \Theta = \omega,\ (Y_1, \dots, Y_t) = h_t \big), \qquad \omega \in \Omega. \]

If K ⊆ K′, then W_{K′,h_t} Blackwell-dominates W_{K,h_t}.

Theorem 4.6 holds for each realized public history h_t separately. Thus the ordering of acquired concept sets is pathwise rather than merely averaged: conditional on any history for which the next-round raw-signal law is defined, the parsed experiment induced by a larger acquired concept set Blackwell-dominates the parsed experiment induced by a smaller one. This theorem strengthens Lemma 4.4. The monotonicity of C_m(K) says that larger acquired concept sets permit weakly greater parsed entropy. Theorem 4.6 shows more: they induce uniformly more informative experiments in the sense of statistical decision theory. The universal-broadcast theorem of Theorem 5.6 will show that this dependence on the learner prerequisite structure cannot, in general, be eliminated by a common broadcast curriculum.

4.2 Structural and epistemic lower bounds

We now combine the structural and epistemic constraints of the model to derive a single lower bound on teaching time.

Definition 4.7 (Structural distance to a target concept). For c ∈ U_m, define

\[
L_m(c) = \min \Big\{ L \geq 0 : \exists\, K_0, \dots, K_L \in \mathcal{K}_m,\ u_0, \dots, u_{L-1} \in U_m \ \text{such that}\ K_0 = A_m,\ c \in K_L,\ K_{i+1} = K_i \cup \{u_i\},\ u_i \in \Phi_m(K_i) \setminus K_i,\ i = 0, \dots, L-1 \Big\}.
\]

The quantity L_m(c) measures the shortest prerequisite-respecting route from the axioms to a state containing c. It therefore gives the natural structural benchmark against which any completion time must be compared. The first fundamental constraint on teaching time is structural: the learner must traverse the prerequisite chain before the target can be acquired.

Proposition 4.8 (Structural barrier). Let τ be any completion time in the sense of Definition 3.8. Then τ ≥ L_m(Θ) almost surely. Consequently, E[τ] ≥ E[L_m(Θ)].

Proposition 4.8 is the purely geometric obstruction in the model. Regardless of how informative the signals are, the learner cannot complete teaching before traversing a prerequisite-respecting path to a state containing the realized target. Since each round adds at most one concept, the shortest such path gives an unavoidable lower bound on completion time.
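The structural distance L_m(c) can be computed by breadth-first search over reachable acquired concept sets, adding one concept of Φ_m(K) \ K per step. The following sketch is exponential-time in |C| in the worst case and is adequate only for small toy minds such as those in this paper's examples; it reuses one_step_expansion and the rules3 instance from the capacity sketch.

```python
from collections import deque

def structural_distance(rules, axioms, target):
    """L_m(c): length of the shortest prerequisite-respecting chain from the
    axioms to a reachable set containing the target (BFS over states)."""
    start = frozenset(axioms)
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        K, depth = queue.popleft()
        if target in K:
            return depth
        for c in sorted(one_step_expansion(rules, K) - K):
            nxt = K | {c}
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None  # target outside the understanding horizon U_m

print(structural_distance(rules3, {"a"}, "d1"))  # 2: first b, then d_1
```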
To control the epistemic obstruction, we next aggregate the information gained across all rounds up to identification τ_id. The point is that the per-round entropy drop identity from Proposition 3.10 telescopes over time.

Lemma 4.9 (Total information required for identification). Let τ_id be an identification stopping time. Then

\[ \mathbb{E}\left[ \sum_{t=0}^{\tau_{\mathrm{id}} - 1} I(\Theta; Y_{t+1} \mid \mathcal{F}_t) \right] = H(\Theta). \]

Lemma 4.9 says that identification must pay for the full initial uncertainty of the target: the cumulative conditional mutual information transmitted through the parsed observations up to identification equals the entropy of Θ. Thus the learner cannot identify the target until enough usable information has flowed through the learner-side channel to resolve all initial uncertainty.

The next step is to combine this accounting identity with the statewise capacity bound from Proposition 4.3. This converts total required information into a lower bound expressed in terms of the learner trajectory through the reachable family.

Proposition 4.10 (Trajectory information budget). Let τ_id be any identification stopping time. Then

\[ H(\Theta) \leq \mathbb{E}\left[ \sum_{t=0}^{\tau_{\mathrm{id}} - 1} C_m(K_t) \right]. \]

Proposition 4.10 is the dynamic information budget of the model. The total target uncertainty cannot exceed the cumulative parsed capacity along the states visited before identification. In this sense, a curriculum may need to spend rounds building the decoder before it can effectively use it: the states through which the learner passes determine the rate at which target information can be transmitted.

We define

\[ C^{\max}_m = \max_{K \in \mathcal{K}_m} C_m(K). \]

This maximum is well defined because, under Assumption 2.23, the reachable family K_m is finite by Proposition 2.24.

Theorem 4.11 (Global structural-information lower bound). Let τ be any completion time. Then

\[ \mathbb{E}[\tau] \geq \max\left\{ \mathbb{E}[L_m(\Theta)],\ \frac{H(\Theta)}{C^{\max}_m} \right\}. \]

Theorem 4.11 is the central speed law of the framework. Teaching is constrained simultaneously by prerequisite geometry and by information transmission. The lower bound takes the form of a maximum rather than an additive sum because structural progress may itself convey information about the target. Nevertheless both bottlenecks must be cleared.

Assumption 4.12 (Structural signal availability). For every concept u ∈ U_m, there exists a raw signal z ∈ Z such that tgt(z) = u.

Earlier we required only that every possible target concept admit a corresponding signal, that is, Ω ⊆ im(tgt). Assumption 4.12 is stronger: it requires signals for all concepts in the understanding horizon U_m, including intermediate prerequisites. This assumption ensures that the teacher can implement any valid ordered curriculum by emitting signals targeting the concepts that must be acquired along the path to the target.

Proposition 4.13 (Direct target signaling collapses the epistemic term in the baseline model). Let Ω_+ = {c ∈ Ω : π_0(c) > 0} be the support of the prior.

(i) If Ω ⊆ im(tgt), then H(Θ)/C^{max}_m ≤ 1.

(ii) Under Assumption 4.12, there exists an admissible teaching strategy with completion time τ satisfying τ ≤ L_m(Θ) + 1 almost surely, and hence E[τ] ≤ E[L_m(Θ)] + 1.

Proposition 4.13 clarifies the role of the information-theoretic layer in the baseline model. Once the learner has structurally reached the realized target, a single target-specific signal suffices for identification. Thus the dominant obstruction is typically structural: the learner must first acquire the prerequisites that make the target concept reachable.
The information-theoretic analysis nevertheless remains essential. It explains why target-specific instruction is ineffective before the relevant prerequisites are in place, and it provides a principled way to compare the informativeness of different acquired states through Blackwell dominance. In this view, structural progress builds the decoder, and information transmission becomes effective only after that decoder exists.

Example 4.14 (A common prerequisite can open a parseable identification channel). Consider the mind m = (C, A_m, E_m) with C = {a, b, d_1, d_2, d_3, d_4}, A_m = {a}, and expansion rules

\[ \{a\} \Rightarrow b, \qquad \{b\} \Rightarrow d_j, \quad j = 1, 2, 3, 4. \]

Thus b is a common prerequisite, and once b has been acquired, any of the four target concepts d_1, d_2, d_3, d_4 becomes reachable in one additional step. Let Ω = {d_1, d_2, d_3, d_4} with the uniform prior. Then H(Θ) = log 4 = 2. For each j = 1, 2, 3, 4, L_m(d_j) = 2, and therefore E[L_m(Θ)] = 2.

Let the raw signal alphabet be Z = {z_b, z_1, z_2, z_3, z_4}, with tgt(z_b) = b and tgt(z_j) = d_j for j = 1, 2, 3, 4. At the initial acquired concept set {a}, only b is ordered. Hence the parsed observation range is {z_b, ⊥}, so C_m({a}) = log 2. At the acquired concept set {a, b}, all five raw signals are parseable, so the parsed observation range is {z_b, z_1, z_2, z_3, z_4}, and therefore C_m({a, b}) = log 5. Thus acquiring the single prerequisite b enlarges the learner effective channel from 1 bit to log 5 bits per round.

Now suppose the teacher tries to identify the target immediately by sending Z_1 = z_j when Θ = d_j. At the raw-signal level this would reveal the target perfectly. But at the learner initial acquired concept set {a}, none of the targets d_j is ordered, so Y_1 = ρ_m(Z_1, {a}) = ⊥ almost surely. Hence I(Θ; Y_1 | F_0) = 0. Before the common prerequisite b is taught, target-specific instruction is pure erasure.

Consider instead the two-round strategy: Z_1 = z_b for every realization of Θ, and Z_2 = z_j if Θ = d_j. After round 1, the learner has acquired the prerequisite: K_1 = {a, b}. The posterior does not change, because the first signal is independent of Θ. At round 2, the signal z_j is parseable, so the learner observes Y_2 = z_j, acquires d_j, and identifies the target exactly. Thus τ = 2 almost surely.

The lower bound of Theorem 4.11 is therefore tight in this example. Since C^{max}_m = log 5, one obtains

\[ \mathbb{E}[\tau] \geq \max\left\{ \mathbb{E}[L_m(\Theta)],\ \frac{H(\Theta)}{C^{\max}_m} \right\} = \max\left\{ 2,\ \frac{2}{\log 5} \right\} = 2, \]

and the strategy above attains equality.

This example shows that an optimal teacher may rationally spend an entire round on structural preparation rather than on target-specific signaling, because target-specific signals are useless before the common prerequisite b has been acquired. In the baseline model, the information-theoretic term is not the binding lower bound here, since Proposition 4.13 implies that identification costs at most one additional round once the target is structurally reachable. The example nevertheless illustrates the central mechanism of the section: usable information is state dependent, and teaching may need to enlarge the learner parsed alphabet before target information can flow.
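Both sides of the speed law can be checked directly on this example. The sketch below reuses the helpers defined above; by the monotonicity of Lemma 4.4, C^max_m is attained already at {a, b} in this instance, since every raw signal is parseable there.

```python
# Verify Theorem 4.11 on Example 4.14 (helpers and instance defined above).
E_L = sum(structural_distance(rules3, {"a"}, f"d{j}") for j in range(1, 5)) / 4
C_max = capacity(Z3, tgt3, rules3, frozenset({"a", "b"}))  # max attained here
print(max(E_L, log2(4) / C_max))   # max(2, 2/log2(5)) = 2.0

# The two-round strategy attains the bound: first z_b, then z_j when Θ = d_j.
theta = "d3"
K = frozenset({"a"})
for z in ("zb", "z3"):
    K = update(K, parse(z, tgt3, rules3, K), tgt3)
print(theta in K)                  # True: acquisition (and identification) at τ = 2
```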
5 Structural Limits on Teaching

This section develops two consequences of the structural view of teaching. First, for a fixed learner mind, the prerequisite geometry creates threshold effects in finite-horizon teaching: below a critical time budget, completion is impossible for every strategy, while beyond that threshold success becomes feasible and, under mild assumptions, eventually likely. Second, for heterogeneous learners, structural incompatibilities generate an intrinsic inefficiency of universal broadcast curricula: a single common sequence of signals may be forced to pay separately for prerequisites that personalized teaching would handle individually.

Taken together, these results show that the limits of teaching are not merely informational. They are already encoded in the combinatorial structure of the learner prerequisite system. That structure determines both when teaching can begin to succeed and how costly common instruction becomes across different minds.

5.1 Structural thresholds in teaching

A central question in any teaching problem is: given a fixed time budget, what is the probability that teaching succeeds? The prerequisite structure of the learner determines the answer. Below a certain threshold, completion is impossible for every teaching strategy. Once the time horizon exceeds that threshold, completion is no longer ruled out a priori, and under mild assumptions the optimal fixed-horizon success probability converges to one as the horizon grows. The vanishing completion probability below the threshold is not an approximation but a direct consequence of the structural barrier. It also has an immediate implication for resource allocation: when training budgets are scarce, distributing time evenly across learners may produce no completed learners at all, whereas concentrating the same budget on fewer learners can yield strictly positive output.

Recall the stochastic teaching model from Section 3.1. The target concept Θ is drawn from a prior π_0 on Ω, known to both teacher and learner. By Definition 3.8, teaching is complete at the random time τ if both (i) the learner has acquired the target concept, Θ ∈ K_τ; and (ii) the learner has identified the target, π_τ(Θ) = 1. We therefore ask: if the teacher is given a budget of t rounds, what is the maximal probability of completing teaching within that budget? Define

\[ V(t) = \sup_{\text{admissible teaching strategies}} \mathbb{P}(\tau \leq t). \]

Thus V(t) is the optimal success probability achievable with a time budget of t rounds, computed under the prior on Θ. Recall also that for each target c ∈ Ω, the quantity L_m(c) denotes the structural distance from the axiom set A_m to a reachable acquired concept set containing c. Define

\[ L_{\min} = \min \{ L_m(c) : \pi_0(c) > 0 \}. \]

This is the smallest structural distance among targets that can arise under the prior. For expected completion time, the baseline model also admits the upper bound E[τ] ≤ E[L_m(Θ)] + 1 under Assumption 4.12 (Proposition 4.13). The fixed-horizon analysis below complements that statement by describing the threshold structure of success probabilities as a function of the time budget.

Proposition 5.1 (Zero completion below the structural threshold). For every t ∈ N,

\[ V(t) \leq \mathbb{P}\big( L_m(\Theta) \leq t \big). \]

In particular, V(t) = 0 for all t < L_min.

Proposition 5.1 shows that if the time budget is shorter than the structural depth of every possible target, then completion is impossible.
No teaching strategy can circumvent this obstruction, because the learner cannot be moved to a reachable acquired concept set containing the realized target in so few rounds. At the opposite extreme, if some admissible strategy completes teaching in finite expected time, then the optimal fixed-horizon success probability converges to one as the horizon grows.

Proposition 5.2 (Eventual success). If there exists an admissible teaching strategy such that E[τ] < ∞, then V(t) → 1 as t → ∞. More concretely, for any such strategy,

\[ V(t) \geq 1 - \frac{\mathbb{E}[\tau]}{t} \quad \text{for all } t \geq 1. \]

Together, Propositions 5.1 and 5.2 describe the qualitative shape of the fixed-horizon success function V(t): an initial region of structural impossibility, followed by a region in which success becomes increasingly likely as the time budget grows.

To make the allocation implications transparent, it is useful to consider the special case of a deterministic target. Let g ∈ U_m be fixed, and suppose that Θ = g almost surely. Then the prior is degenerate, so π_t = δ_g for all t. Hence identification is automatic, and completion reduces to target acquisition alone. Define the target-acquisition time of g by

\[ \tau_g = \inf \{ t \geq 0 : g \in K_t \}, \]

and define the fixed-horizon acquisition probability by

\[ V_g(t) = \sup_{\text{admissible teaching strategies}} \mathbb{P}(\tau_g \leq t). \]

Proposition 5.3 (Step function for deterministic targets). Assume that the parsing map is given by Definition 3.3 and that Assumption 4.12 holds. Then for every deterministic target g ∈ U_m,

\[
V_g(t) = \begin{cases} 0, & \text{if } t < L_m(g),\\ 1, & \text{otherwise.} \end{cases}
\]

Thus, for a deterministic target, the fixed-horizon acquisition probability is a step function at the structural distance L_m(g). Below that threshold acquisition is impossible; at and above it, acquisition can be achieved with certainty.

Remark 5.4. The threshold structure above contrasts with benchmark models of human-capital accumulation in which training is represented by a smooth production technology for human capital (e.g., [Ben-Porath, 1967, Becker, 1964]). In such models every marginal unit of investment yields a positive, though possibly diminishing, return. In the present framework, prerequisite-gated learning induces a threshold technology: a teaching signal has no effect until the learner prerequisite structure admits the target concept, after which additional signals become productive. The induced production technology is therefore non-concave.

This threshold structure has direct implications for the allocation of training resources. Consider a decision maker who must allocate a fixed instructional budget across learners, for example a firm training workers in a specific skill or an instructor allocating tutoring hours across students. The planner has a total budget of B instructional rounds and must decide how to distribute them across N learners; a toy computation of the resulting threshold effect follows, ahead of the formal statement.
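The computation below is illustrative only. It assumes the step-function acquisition probability established in Proposition 5.3, so that a learner completes if and only if it receives at least L = L_m(g) rounds; the specific numbers are hypothetical.

```python
def completed(budgets, L):
    """Completed learners under the step function of Proposition 5.3:
    a learner finishes iff it receives at least L = L_m(g) rounds."""
    return sum(1 for b in budgets if b >= L)

N, B, L = 10, 30, 5                 # illustrative numbers only
spread = [B // N] * N               # 3 rounds each: all below the threshold
k = min(N, B // L)                  # concentrate: L rounds to k learners
concentrated = [L] * k + [0] * (N - k)
print(completed(spread, L), completed(concentrated, L))  # 0 versus 6
```

The same total budget yields zero output when spread evenly and strictly positive output when concentrated, which is exactly the content of the next proposition.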
Proposition 5.5 (Allocation under structural thresholds). Assume that the parsing map is given by Definition 3.3 and that Assumption 4.12 holds. Fix a deterministic target g ∈ U_m with L = L_m(g) ≥ 1. Consider N identical learners and a total budget of B ∈ N instructional rounds.

(i) Any allocation that gives every learner fewer than L rounds yields zero completed learners.

(ii) There exists an allocation that gives L rounds to min{N, ⌊B/L⌋} learners and 0 rounds to the remaining learners, and under this allocation min{N, ⌊B/L⌋} learners complete.

In particular, if B < NL and the budget is spread so that every learner receives fewer than L rounds, then total output is zero, whereas the concentrated allocation in (ii) yields strictly positive output whenever B ≥ L.

Proposition 5.5 shows that evenly spreading a fixed training budget can waste the entire budget when every learner remains below the structural threshold. By contrast, concentrating the same budget on fewer learners allows those learners to cross the threshold and produce strictly positive output. The source of this effect is structural: for a deterministic target, additional training time has no effect until the prerequisite threshold L_m(g) is reached, at which point completion becomes possible. The zero-output region is therefore not imposed from outside the model but is a direct consequence of the learner prerequisite geometry.

For random targets, the step-function structure need not persist, because different targets may have different structural depths. What remains is the zero-completion phenomenon from Proposition 5.1: if every learner receives fewer than L_min = min{L_m(c) : π_0(c) > 0} rounds, then the completion probability is zero regardless of the teaching strategy. The qualitative allocation lesson therefore extends beyond the deterministic case: if the available budget is spread so thinly that every learner remains below the relevant structural threshold, no learner completes.

5.2 Limits of universal broadcast curricula

The preceding subsection concerned a single learner mind. We now turn to heterogeneous learners whose prerequisite structures differ. In that setting, a teacher restricted to a single broadcast curriculum cannot adapt instruction to individual minds. The next theorem shows that this restriction carries a structural cost: even when each learner can be taught efficiently by a personalized curriculum, any common broadcast may be forced to pay a linear penalty in the number of learner types.

Theorem 5.6 (Linear broadcast penalty for incompatible minds). Fix integers k ≥ 2 and L ≥ 2. Then one can construct

• a finite concept space C,
• a common axiom set A ⊆ C,
• a finite raw-signal alphabet Z together with a signal target map tgt : Z → C,
• minds m_1, ..., m_k on C with common axiom set A_{m_i} = A, i = 1, ..., k, but pairwise distinct rule sets E_{m_i},
• and a common deterministic target concept g ∈ C,

such that:

(i) for each i ∈ {1, ..., k}, there exists a valid ordered curriculum for m_i of length L whose final acquired concept set contains g;

(ii) if a common broadcast sequence Γ = (z_1, ..., z_T) ∈ Z^T is presented to all k minds, and if the induced acquired concept processes start from K^{(i)}_0 = A, i = 1, ..., k, and evolve according to

\[
K^{(i)}_{t+1} =
\begin{cases}
K^{(i)}_t \cup \{\operatorname{tgt}(z_{t+1})\}, & \text{if } \operatorname{tgt}(z_{t+1}) \in \Phi_{m_i}(K^{(i)}_t),\\
K^{(i)}_t, & \text{otherwise,}
\end{cases}
\qquad t = 0, \dots, T-1,
\]

then the condition g ∈ K^{(i)}_T for every i = 1, ..., k implies T ≥ k(L − 1) + 1;

(iii) there exists a common broadcast sequence of length k(L − 1) + 1 for which g ∈ K^{(i)}_{k(L-1)+1} for every i = 1, ..., k.
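One construction in the spirit of the theorem (a sketch; the construction in the appendix may differ in detail) gives each mind a private prerequisite chain of length L − 1 ending in the shared target g, and broadcasts all private chains followed by a single g-signal. Signals that advance one mind are unparseable by the others, so the broadcast length is exactly k(L − 1) + 1.

```python
def broadcast_run(rules_i, signals, tgt_map):
    """Acquired set of one mind after a common broadcast (rule of Thm 5.6(ii))."""
    K = frozenset({"a"})
    for z in signals:
        if tgt_map[z] in one_step_expansion(rules_i, K):
            K = K | {tgt_map[z]}
    return K

k, L = 3, 4
minds, tgt_map = [], {"zg": "g"}
for i in range(k):
    # Private chain of mind i: a ⇒ c_{i,1} ⇒ ... ⇒ c_{i,L-1} ⇒ g (L steps to g).
    chain = [f"c{i}_{j}" for j in range(1, L)]
    rules_i, prev = [], "a"
    for c in chain:
        rules_i.append((frozenset({prev}), c))
        prev = c
    rules_i.append((frozenset({prev}), "g"))
    minds.append(rules_i)
    tgt_map.update({f"z{c}": c for c in chain})

# Broadcast all private chains, then one shared g-signal: k(L-1)+1 rounds.
broadcast = [f"zc{i}_{j}" for i in range(k) for j in range(1, L)] + ["zg"]
print(len(broadcast))                                                  # 10
print(all("g" in broadcast_run(r, broadcast, tgt_map) for r in minds)) # True
```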
Theorem 5.6 is an existence result. For any prescribed number k of learner types and any prescribed personalized teaching length L, one can construct k minds sharing the same axiom set and the same deterministic target concept g, but having different prerequisite structures. For each mind, the target can be acquired in L personalized rounds. However, every common broadcast sequence that succeeds for all minds must have length at least k(L − 1) + 1.

The source of the penalty is purely structural. Each mind possesses a private prerequisite chain leading to the target concept, and signals that advance one mind along its chain are unparseable for the others. Consequently a universal broadcast cannot reuse prerequisite rounds across learner types; it must effectively pay for each private chain separately. This is what generates the linear dependence of the required broadcast length on the number of learner types.

Acknowledgments. The author thanks Teng Andrea Xu for helpful discussions.

Appendix

This document contains supplementary material for the paper A Mathematical Theory of Understanding.

A Proofs
  A.1 Proofs for Section 2
  A.2 Proofs for Section 3
  A.3 Proofs for Section 4
  A.4 Proofs for Section 5
B Additional results

A Proofs

This appendix collects the proofs omitted from the main text, organized by the section in which the corresponding result appears. Supplementary lemmas that are used in the proofs but not stated in the main text are included where they arise.

A.1 Proofs for Section 2

Proof of Lemma 2.7. Let c ∈ Φ_m(K). If c ∈ K, then c ∈ K′ ⊆ Φ_m(K′). Otherwise there exists (S, c) ∈ E_m with S ⊆ K. Since K ⊆ K′, one also has S ⊆ K′, hence c ∈ Φ_m(K′).

Lemma A.1 (Directed-union continuity). If (K_α)_{α∈A} is a nonempty directed family, then Φ_m(⋃_{α∈A} K_α) = ⋃_{α∈A} Φ_m(K_α).

Proof. The inclusion ⊇ follows from monotonicity of Φ_m. For the reverse inclusion, let c ∈ Φ_m(⋃_α K_α). If c ∈ ⋃_α K_α, then c ∈ Φ_m(K_α) for some α ∈ A. Otherwise there exists a finite set S ⊆ ⋃_α K_α with (S, c) ∈ E_m. For each s ∈ S, choose α_s such that s ∈ K_{α_s}. Because the family is directed and S is finite, there exists γ with ⋃_{s∈S} K_{α_s} ⊆ K_γ. Hence S ⊆ K_γ, so c ∈ Φ_m(K_γ).

Proof of Proposition 2.10. (i) We first show that the collection of fixed points of Φ_m containing K is non-empty. Consider the entire concept space C. By extensiveness, C ⊆ Φ_m(C). For the reverse inclusion: every expansion rule (S, c) ∈ E_m has c ∈ C by definition, so Φ_m(C) ⊆ C. Together, Φ_m(C) = C, and clearly K ⊆ C. So C is a fixed point containing K. By Definition 2.8, cl_m(K) = ⋂ {F ⊆ C : K ⊆ F, Φ_m(F) = F}. The intersection is over a non-empty collection, so cl_m(K) is well-defined. We now show it is itself a fixed point. For any fixed point F ⊇ K in the collection, cl_m(K) ⊆ F, so monotonicity gives Φ_m(cl_m(K)) ⊆ Φ_m(F) = F. Since this holds for every such F, we get Φ_m(cl_m(K)) ⊆ cl_m(K). Extensiveness gives the other direction: cl_m(K) ⊆ Φ_m(cl_m(K)). Together: Φ_m(cl_m(K)) = cl_m(K).

(ii) By extensiveness, the sequence K ⊆ Φ_m(K) ⊆ Φ²_m(K) ⊆ ··· is non-decreasing.
Let L = ⋃_{n=0}^∞ Φ^n_m(K). By Lemma A.1,

\[ \Phi_m(L) = \Phi_m\Big( \bigcup_{n=0}^{\infty} \Phi_m^n(K) \Big) = \bigcup_{n=0}^{\infty} \Phi_m^{n+1}(K) = L, \]

so L is a fixed point of Φ_m containing K. Since cl_m(K) is the least such fixed point, cl_m(K) ⊆ L. Conversely, we show by induction that Φ^n_m(K) ⊆ cl_m(K) for all n ≥ 0. Base case: Φ⁰_m(K) = K ⊆ cl_m(K) by definition. Inductive step: suppose Φ^n_m(K) ⊆ cl_m(K). Since cl_m(K) is a fixed point, Φ_m(cl_m(K)) = cl_m(K). Monotonicity then gives Φ^{n+1}_m(K) = Φ_m(Φ^n_m(K)) ⊆ Φ_m(cl_m(K)) = cl_m(K). Since Φ^n_m(K) ⊆ cl_m(K) for every n, we get L = ⋃_{n=0}^∞ Φ^n_m(K) ⊆ cl_m(K).

(iii) If C is finite, the chain K ⊆ Φ¹_m(K) ⊆ Φ²_m(K) ⊆ ··· is an increasing sequence of subsets of C. Whenever Φ^{n+1}_m(K) ≠ Φ^n_m(K), the inclusion is strict, so |Φ^{n+1}_m(K)| ≥ |Φ^n_m(K)| + 1. Since each set has at most |C| elements, strict growth can occur at most |C| − |K| ≤ |C| times. Therefore Φ^N_m(K) = Φ^{N+1}_m(K) for some N ≤ |C|, and the chain stabilizes: cl_m(K) = Φ^N_m(K).

Proof of Proposition 2.11. We first show that U_m = cl_m(A_m) satisfies (i) to (iii).

(i) By Proposition 2.10 (ii), cl_m(A_m) = ⋃_{n≥0} Φ^n_m(A_m). Since Φ⁰_m(A_m) = A_m, it follows that A_m ⊆ cl_m(A_m).

(ii) Let (S, c) ∈ E_m and suppose S ⊆ cl_m(A_m). By Proposition 2.10 (i), cl_m(A_m) is a fixed point of Φ_m, so Φ_m(cl_m(A_m)) = cl_m(A_m). Since (S, c) ∈ E_m and S ⊆ cl_m(A_m), the definition of Φ_m gives c ∈ Φ_m(cl_m(A_m)) = cl_m(A_m).

(iii) Let F ⊆ C satisfy (i) and (ii). Then A_m ⊆ F. Moreover, if c ∈ Φ_m(F), then either c ∈ F, or there exists (S, c) ∈ E_m with S ⊆ F, in which case (ii) gives c ∈ F. Hence Φ_m(F) ⊆ F. By extensiveness of Φ_m, we also have F ⊆ Φ_m(F). Therefore Φ_m(F) = F, so F is a fixed point of Φ_m containing A_m. Since cl_m(A_m) is the intersection of all such fixed points, we conclude that cl_m(A_m) ⊆ F. Thus cl_m(A_m) is the smallest set satisfying (i) and (ii).

Finally, suppose U and U′ both satisfy (i)-(iii). Since U′ satisfies (i) and (ii), the minimality property (iii) for U implies U ⊆ U′. By symmetry, U′ ⊆ U. Hence U = U′.

Proof of Theorem 2.14. The proof of Theorem 2.14 relies on the finiteness of derivation trees, which we establish first.

Lemma A.2 (Finiteness of derivations). Every derivation tree in the sense of Definition 2.12 is finite.

Proof of Lemma A.2. Assume for contradiction that the derivation tree is infinite. By Definition 2.12 (i), every node has finitely many children, since each prerequisite set is finite. Thus the tree is finitely branching. By König's lemma [Diestel, 2024, Lemma 8.1.2], every infinite finitely branching tree has an infinite descending path. This contradicts the well-foundedness requirement in Definition 2.12. Therefore the tree is finite.

Proof of Theorem 2.14. For (⇐), suppose K ⊢_m c. By Lemma A.2, the derivation tree is finite. We argue by induction on its height. If the height is 0, then either c ∈ K, or (∅, c) ∈ E_m; in either case c ∈ cl_m(K). For the induction step, if the root uses a rule (S, c) and each child label s ∈ S has a derivation of smaller height, then by the induction hypothesis S ⊆ cl_m(K), hence c ∈ Φ_m(cl_m(K)) = cl_m(K).

For (⇒), let D = {d ∈ C : K ⊢_m d}.
We show that D is a fixed point of Φ_m containing K. First, K ⊆ D: for any c ∈ K, the single-node tree with root labeled c is a valid derivation, so c ∈ D. Next, we show that Φ_m(D) ⊆ D. Let c ∈ Φ_m(D). If c ∉ D, then there exists (S, c) ∈ E_m with S ⊆ D. For each s ∈ S, choose a derivation of s from K and attach them below a new root labeled c. This gives a derivation of c, a contradiction. If S = ∅, the new root has no children and still forms a valid derivation. Thus c ∈ D, and Φ_m(D) ⊆ D. By extensiveness, D ⊆ Φ_m(D), so D is a fixed point containing K. Therefore cl_m(K) ⊆ D. By definition of D, this means that if c ∈ cl_m(K), then K ⊢_m c.

Proof of Theorem 2.15. We first recall the abstract definition of an algebraic closure operator.

Definition A.3 (Algebraic closure operator). Let X be a set. A map f : 2^X → 2^X is an algebraic closure operator if it satisfies extension, monotonicity, idempotence, and the finitary property: if c ∈ f(K), then c ∈ f(S) for some finite S ⊆ K.

Proof of Theorem 2.15. For (i), extension, monotonicity, and idempotence of cl_m follow from Proposition 2.10. Finitariness follows from Theorem 2.14 and Lemma A.2: if c ∈ cl_m(K), then there is a finite derivation tree using only finitely many base labels from K.

For (ii), define E = {(S, c) : S ⊆ X finite and c ∈ f(S) \ S}. Let g be the closure operator induced by E as in the theorem statement. We show g(K) = f(K) for every K ⊆ X. First, g(K) ⊆ f(K) because f(K) is a fixed point of Ψ_E containing K: if c ∈ Ψ_E(f(K)), then either c ∈ f(K) or else there exists (S, c) ∈ E with S ⊆ f(K), which implies c ∈ f(S) ⊆ f(f(K)) = f(K). Conversely, if c ∈ f(K), then by algebraicity there exists a finite S_0 ⊆ K such that c ∈ f(S_0). If c ∈ S_0, then c ∈ K ⊆ g(K). Otherwise (S_0, c) ∈ E, and since S_0 ⊆ K ⊆ g(K), the rule fires inside g(K), so c ∈ g(K).

Proof of Theorem 2.19. If c* ∈ A_m, the empty curriculum works. Assume therefore that c* ∉ A_m. Since c* ∈ U_m = cl_m(A_m), Theorem 2.14 implies that there exists a derivation tree of c* from A_m. By Lemma A.2, this derivation tree is finite. Let R be the set of all non-base rule nodes in this derivation tree. Form a directed graph on R by retaining the parent-child relation between rule nodes and orienting each edge from child to parent. Because the derivation tree is finite and well-founded, this directed graph is finite and acyclic. Hence it admits a topological ordering v_1, ..., v_L. For each i = 1, ..., L, let (S_i, c_i) be the expansion rule attached to the node v_i. Define

\[ K_0 = A_m, \qquad K_i = K_{i-1} \cup \{c_i\} \quad \text{for } i = 1, \dots, L. \]

We claim that γ = ((S_1, c_1), ..., (S_L, c_L)) is a valid ordered curriculum starting from A_m. Indeed, fix i ∈ {1, ..., L} and let s ∈ S_i. In the derivation tree, the child corresponding to s is either (i) a base node, in which case s ∈ A_m = K_0 ⊆ K_{i−1}; or (ii) a rule node. In that case this child must occur earlier than v_i in the topological order, say it is v_j with j < i. Its label is then c_j = s, so s ∈ K_j ⊆ K_{i−1}. Thus every prerequisite in S_i belongs to K_{i−1}, so S_i ⊆ K_{i−1}. Since also (S_i, c_i) ∈ E_m by construction, each step is valid. Therefore γ is a valid ordered curriculum.
Finally, the root of the derivation tree is a rule node labelled by c*. Hence it is one of the nodes v_1, ..., v_L, say v_r, and therefore c_r = c*. It follows that c* ∈ K_r ⊆ K_L. So the curriculum reaches a final acquired concept set containing c*.

Proof of Proposition 2.21. We argue by induction on i. For i = 0, K_0 = A_m ⊆ U_m by Proposition 2.11 (i). Now suppose K_{i−1} ⊆ U_m. Since γ is valid, (S_i, c_i) ∈ E_m and S_i ⊆ K_{i−1} ⊆ U_m. By Proposition 2.11 (ii), this implies c_i ∈ U_m. Hence K_i = K_{i−1} ∪ {c_i} ⊆ U_m. This proves the claim for all i. The final statement follows immediately.

Proof of Proposition 2.24. Because U_m is finite by Assumption 2.23, the power set 2^{U_m} is finite. Since K_m ⊆ 2^{U_m}, it follows that K_m is finite.

For (i), the trivial chain of length zero shows that A_m ∈ K_m. Moreover, every reachable set contains A_m, since every witnessing chain starts from A_m and only adds concepts. Thus A_m is the minimum element of (K_m, ⊆).

For (ii), let K ∈ K_m with K ≠ A_m. By definition of reachability, there exists a witnessing chain

\[ A_m = K_0 \subset K_1 \subset \cdots \subset K_L = K \]

such that K_{i+1} = K_i ∪ {c_i}, c_i ∈ Φ_m(K_i) \ K_i (i = 0, ..., L−1). Since K ≠ A_m, one has L ≥ 1. Then K_{L−1} ∈ K_m by Lemma B.2, K_{L−1} ⊂ K, and |K \ K_{L−1}| = 1. This proves (ii).

For (iii), let K, K′ ∈ K_m. Choose a witnessing chain for K′:

\[ A_m = K'_0 \subset K'_1 \subset \cdots \subset K'_s = K', \qquad K'_{i+1} = K'_i \cup \{c_i\}, \quad c_i \in \Phi_m(K'_i) \setminus K'_i. \]

For each i = 0, ..., s, define L_i = K ∪ K′_i. Then L_0 = K and L_s = K ∪ K′. Since K′_i ⊆ L_i, monotonicity of Φ_m gives Φ_m(K′_i) ⊆ Φ_m(L_i). Hence, whenever c_i ∉ L_i, one has c_i ∈ Φ_m(K′_i) ⊆ Φ_m(L_i), so L_{i+1} = L_i ∪ {c_i} is a valid extension. If instead c_i ∈ L_i, then L_{i+1} = L_i. Removing repeated sets from the sequence (L_i)_{i=0}^s yields a valid chain from K to K ∪ K′. Concatenating this chain with any witnessing chain from A_m to K shows that K ∪ K′ ∈ K_m. Thus K_m is union-closed.

For (iv), we first show that U_m ∈ K_m. Let K ∈ K_m with K ≠ U_m. Suppose, toward a contradiction, that Φ_m(K) = K. Then K is a fixed point of Φ_m containing A_m. Since U_m = cl_m(A_m) is the least fixed point containing A_m, it follows that U_m ⊆ K. But by definition of K_m, one also has K ⊆ U_m, hence K = U_m, a contradiction. Therefore Φ_m(K) \ K ≠ ∅. Choose any c ∈ Φ_m(K) \ K. Because K ⊆ U_m and U_m is a fixed point of Φ_m, monotonicity gives Φ_m(K) ⊆ Φ_m(U_m) = U_m, so in particular c ∈ U_m. Hence K ∪ {c} is again a reachable subset of U_m. Starting from A_m, repeat this step as long as Φ_m(K) \ K ≠ ∅. Because U_m is finite and each step strictly enlarges the set, the process terminates after finitely many steps at some reachable set F ⊆ U_m satisfying Φ_m(F) = F. Since F is a fixed point containing A_m, minimality of U_m = cl_m(A_m) implies U_m ⊆ F. As also F ⊆ U_m, we conclude that F = U_m. Therefore U_m ∈ K_m. Since every element of K_m is by definition a subset of U_m, it follows that U_m is the maximum element of (K_m, ⊆).

Finally, (v) follows from (iii). For any K, K′ ∈ K_m, the set K ∪ K′ belongs to K_m and is clearly an upper bound of K and K′. If M ∈ K_m is any other upper bound, so that K ⊆ M and K′ ⊆ M, then K ∪ K′ ⊆ M. Hence K ∪ K′ is the least upper bound.
Proof of Corollary 2.26. The result follows directly from Proposition 2.24 and Definition 2.25.

Proof of Theorem 2.27. We prove (ii) ⇒ (i) and (i) ⇒ (ii) separately.

(ii) ⇒ (i). Assume there exists a mind m = (C, A, E_m) such that K_m = F. By Corollary 2.26, the family K_m is an A-based learning space. Hence so is F.

(i) ⇒ (ii). Assume that F is an A-based learning space, and define m_F = (C, A, E_F) using the canonical rule set above. Let K_{m_F} denote the reachable family generated by this mind. We prove that K_{m_F} = F.

Step 1: K_{m_F} ⊆ F. Let K ∈ K_{m_F}. Choose a witnessing chain

\[ A = K_0 \subset K_1 \subset \cdots \subset K_L = K \]

such that K_{i+1} = K_i ∪ {c_i}, c_i ∈ Φ_{m_F}(K_i) \ K_i for i = 0, ..., L−1. We prove by induction on i that K_i ∈ F for all i. For i = 0, one has K_0 = A ∈ F. Now suppose K_i ∈ F. Since c_i ∈ Φ_{m_F}(K_i) \ K_i, there exists a rule (S, c_i) ∈ E_F with S ⊆ K_i. By definition of E_F, S ∈ F and S ∪ {c_i} ∈ F. Because K_i ∈ F and F is union-closed, K_i ∪ (S ∪ {c_i}) ∈ F. Since S ⊆ K_i, this simplifies to K_i ∪ {c_i} = K_{i+1} ∈ F. Thus every K_i lies in F, and in particular K ∈ F. Hence K_{m_F} ⊆ F.

Step 2: F ⊆ K_{m_F}. Let K ∈ F. If K = A, then K is reachable by the trivial chain. Assume now that K ≠ A. Since F is an A-based learning space, repeated application of accessibility yields a descending chain

\[ K = K_L \supset K_{L-1} \supset \cdots \supset K_0 = A \]

such that each K_i ∈ F and K_i = K_{i−1} ∪ {x_i} for i = 1, ..., L. Reverse the chain: A = K_0 ⊂ K_1 ⊂ ··· ⊂ K_L = K. For each i = 1, ..., L, both K_{i−1} and K_i = K_{i−1} ∪ {x_i} belong to F. Therefore, by the definition of E_F, (K_{i−1}, x_i) ∈ E_F. Hence x_i ∈ Φ_{m_F}(K_{i−1}) \ K_{i−1}, so every step in the chain is a valid reachable extension. Thus K is reachable from A, which shows that K ∈ K_{m_F}. Therefore F ⊆ K_{m_F}.

Combining the two inclusions gives K_{m_F} = F.

A.2 Proofs for Section 3

Proof of Lemma 3.5. We argue by induction on t. For t = 0, K_0 = A_m ∈ K_m by the trivial witnessing chain. Now suppose K_t ∈ K_m almost surely. If Y_{t+1} = ⊥, then by Definition 3.4, K_{t+1} = K_t, hence K_{t+1} ∈ K_m. If Y_{t+1} ∈ Z, define c_{t+1} = tgt(Y_{t+1}). Since the parser outputs a non-null signal, Definition 3.3 implies that c_{t+1} ∈ Φ_m(K_t). Hence either c_{t+1} ∈ K_t, in which case K_{t+1} = K_t, or else c_{t+1} ∈ Φ_m(K_t) \ K_t, in which case K_{t+1} = K_t ∪ {c_{t+1}} is a valid one-step reachable extension from K_t in the sense of Definition 2.22. Since K_t ∈ K_m, it follows that K_{t+1} ∈ K_m. This proves the claim.

Proof of Proposition 3.10. By the definition of conditional mutual information,

\[ I(\Theta; Y_{t+1} \mid \mathcal{F}_t) = H(\Theta \mid \mathcal{F}_t) - \mathbb{E}[H(\Theta \mid \mathcal{F}_t \vee \sigma(Y_{t+1})) \mid \mathcal{F}_t]. \]

Since F_{t+1} = F_t ∨ σ(Y_{t+1}) and H_t = H(Θ | F_t), this becomes

\[ I(\Theta; Y_{t+1} \mid \mathcal{F}_t) = H_t - \mathbb{E}[H_{t+1} \mid \mathcal{F}_t]. \]

Because H_t is F_t-measurable, H_t − E[H_{t+1} | F_t] = E[H_t − H_{t+1} | F_t].

Proof of Theorem 3.11. By Proposition 3.10, H_t − E[H_{t+1} | F_t] = I(Θ; Y_{t+1} | F_t) ≥ 0. Hence (H_t) is a supermartingale. Equality holds if and only if I(Θ; Y_{t+1} | F_t) = 0, which is equivalent to conditional independence of Θ and Y_{t+1} given F_t.

Proof of Theorem 3.13. The proof of Theorem 3.13 relies on the following lemma, which we establish first.

Lemma A.4 (Unparseability erases information). Let C_{t+1} = tgt(Z_{t+1}) and U_{t+1} = {C_{t+1} ∉ Φ_m(K_t)}.
Then:

(i) on U_{t+1} one has Y_{t+1} = ⊥ almost surely, and therefore

\[ I(\Theta; Y_{t+1} \mid \mathcal{F}_t, U_{t+1}) = 0, \qquad I(Z_{t+1}; Y_{t+1} \mid \mathcal{F}_t, U_{t+1}) = 0; \]

(ii) if P(U_{t+1} | F_t) = 1, then

\[ I(\Theta; Y_{t+1} \mid \mathcal{F}_t) = 0, \qquad I(Z_{t+1}; Y_{t+1} \mid \mathcal{F}_t) = 0. \]

Proof. On U_{t+1}, Definition 3.3 gives Y_{t+1} = ρ_m(Z_{t+1}, K_t) = ⊥ almost surely. Hence conditional on (F_t, U_{t+1}), the random variable Y_{t+1} is constant, so all the relevant conditional entropies are zero. This proves (i). If P(U_{t+1} | F_t) = 1, then U_{t+1} occurs almost surely conditional on F_t, so Y_{t+1} = ⊥ almost surely conditional on F_t. Again all relevant conditional entropies are zero, proving (ii).

Proof of Theorem 3.13. On U_{t+1}, Lemma A.4 gives I(Θ; Y_{t+1} | F_t, U_{t+1}) = 0. On U^c_{t+1}, Proposition B.3 yields I(Θ; Y_{t+1} | F_t, U^c_{t+1}) = I(Θ; Z_{t+1} | F_t, U^c_{t+1}) > 0.

A.3 Proofs for Section 4

Lemma A.5 (Explicit formula for the parsed entropy bound). Assume Z is finite. Then, for every K ∈ K_m,

\[
C_m(K) =
\begin{cases}
\log\big( |\mathcal{Z}_{\mathrm{ord}}(K)| + 1 \big), & \text{if } \mathcal{Z}_{\mathrm{ord}}(K) \subsetneq \mathcal{Z},\\
\log |\mathcal{Z}|, & \text{if } \mathcal{Z}_{\mathrm{ord}}(K) = \mathcal{Z}.
\end{cases}
\]

Proof of Lemma A.5. Fix K ∈ K_m and define the parsed observation range Y(K) = {ρ_m(z, K) : z ∈ Z} ⊆ Z ∪ {⊥}. For any Z-valued random variable Z, the random variable ρ_m(Z, K) takes values in Y(K) almost surely, so H(ρ_m(Z, K)) ≤ log |Y(K)| by [Cover and Thomas, 2006, p. 41]. Taking the supremum over all such Z gives C_m(K) ≤ log |Y(K)|.

For the reverse inequality, let M = |Y(K)|. For each y ∈ Y(K), choose some representative z_y ∈ Z such that ρ_m(z_y, K) = y. Define a Z-valued random variable Z by P(Z = z_y) = 1/M for each y ∈ Y(K), and P(Z = z) = 0 for all other z ∈ Z. Then ρ_m(Z, K) is uniform on Y(K), so H(ρ_m(Z, K)) = log |Y(K)|. Hence C_m(K) = log |Y(K)|.

Under the parsing map ρ_m, one has

\[
\rho_m(z, K) =
\begin{cases}
z, & \text{if } z \in \mathcal{Z}_{\mathrm{ord}}(K),\\
\bot, & \text{if } z \notin \mathcal{Z}_{\mathrm{ord}}(K).
\end{cases}
\]

If Z_ord(K) ⊊ Z, then Y(K) = Z_ord(K) ∪ {⊥}, so |Y(K)| = |Z_ord(K)| + 1. If instead Z_ord(K) = Z, then every raw signal is parseable, so Y(K) = Z. Substituting these two cases into C_m(K) = log |Y(K)| proves the claim.

Proof of Proposition 4.3. Because K_t is F_t-measurable, conditional on F_t the law of Y_{t+1} = ρ_m(Z_{t+1}, K_t) is obtained by passing the conditional law of Z_{t+1} through the fixed map z ↦ ρ_m(z, K_t). Therefore,

\[ I(\Theta; Y_{t+1} \mid \mathcal{F}_t) \leq H(Y_{t+1} \mid \mathcal{F}_t) \leq C_m(K_t) \quad \text{almost surely.} \]

Proof of Lemma 4.4. Assume K, K′ ∈ K_m with K ⊆ K′. By Lemma 2.7, Φ_m(K) ⊆ Φ_m(K′), hence Z_ord(K) ⊆ Z_ord(K′). Define g_{K,K′} : Z ∪ {⊥} → Z ∪ {⊥} by

\[
g_{K,K'}(y) =
\begin{cases}
y, & \text{if } y \in \mathcal{Z}_{\mathrm{ord}}(K),\\
\bot, & \text{otherwise.}
\end{cases}
\]

Then for every z ∈ Z, ρ_m(z, K) = g_{K,K′}(ρ_m(z, K′)). Indeed, if z ∈ Z_ord(K), then z is ordered at both sets and both sides equal z. If z ∉ Z_ord(K), then the left-hand side is ⊥; on the right-hand side, either ρ_m(z, K′) = ⊥, or else ρ_m(z, K′) = z and g_{K,K′}(z) = ⊥. Now let Z be any Z-valued random variable. Then ρ_m(Z, K) = g_{K,K′}(ρ_m(Z, K′)) almost surely. Thus ρ_m(Z, K) is a deterministic function of ρ_m(Z, K′). By the data processing inequality, H(ρ_m(Z, K)) ≤ H(ρ_m(Z, K′)). Taking suprema over all Z-valued random variables Z yields C_m(K) ≤ C_m(K′).
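The explicit formula of Lemma A.5 can be checked numerically against the enumeration-based capacity from the main-text sketches (an illustration only, on the Example 4.14 instance; capacity, one_step_expansion, rules3, tgt3, Z3, and log2 are as defined there):

```python
def capacity_formula(Z, tgt_map, rules, K):
    """Lemma A.5: log(|Z_ord(K)| + 1) when some signal is unparseable at K,
    and log |Z| when every signal is parseable."""
    n_ord = sum(tgt_map[z] in one_step_expansion(rules, K) for z in Z)
    return log2(n_ord + 1 if n_ord < len(Z) else len(Z))

for K in (frozenset({"a"}), frozenset({"a", "b"}), frozenset({"a", "b", "d1"})):
    assert capacity_formula(Z3, tgt3, rules3, K) == capacity(Z3, tgt3, rules3, K)
    print(sorted(K), capacity_formula(Z3, tgt3, rules3, K))
# 1.0, then log2(5) twice: the bound is non-decreasing along ⊆-chains,
# in line with Lemma 4.4.
```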
Proof of Theorem 4.6  Let $g_{K,K'}$ be the deterministic map constructed in the proof of Lemma 4.4. For every raw signal $z \in Z$, one has $\rho_m(z, K) = g_{K,K'}(\rho_m(z, K'))$. Therefore, conditional on the public history $h_t$,
$$\rho_m(Z_{t+1}, K) = g_{K,K'}\big(\rho_m(Z_{t+1}, K')\big)$$
almost surely. Hence, for every $\omega \in \Omega$ and every $y \in Z \cup \{\bot\}$,
$$W_{K,h_t}(y \mid \omega) = \sum_{y' \in Z \cup \{\bot\}} G_{K,K'}(y \mid y')\, W_{K',h_t}(y' \mid \omega),$$
where $G_{K,K'}(y \mid y') = \mathbf{1}\{g_{K,K'}(y') = y\}$ is the Markov kernel induced by $g_{K,K'}$. Thus $W_{K,h_t}$ is obtained from $W_{K',h_t}$ by post-processing through a Markov kernel independent of $\omega$. Therefore $W_{K,h_t}$ is a garbling of $W_{K',h_t}$, and $W_{K',h_t}$ Blackwell-dominates $W_{K,h_t}$.

Proof of Proposition 4.8  Fix a sample path. By the concept-acquisition update rule, each round adds at most one new concept to the learner's acquired concept set. If completion occurs at time $\tau$, then in particular $\Theta \in K_\tau$. Delete repeated sets from the sequence $K_0, K_1, \ldots, K_\tau$. The resulting strictly increasing sequence is of the form
$$A_m = K_{i_0} \subset K_{i_1} \subset \cdots \subset K_{i_r},$$
where each step adds one concept belonging to the one-step expansion of the previous set. Hence it is a prerequisite-respecting chain ending at a set containing $\Theta$. By definition of $L_m(\Theta)$, any such chain has length at least $L_m(\Theta)$. Since the number of strict acquisitions up to time $\tau$ is at most $\tau$, it follows that $\tau \ge L_m(\Theta)$ almost surely. Taking expectations completes the proof.

Proof of Lemma 4.9  By Proposition 3.10, $\mathbb{E}[H_t - H_{t+1} \mid \mathcal{F}_t] = I(\Theta; Y_{t+1} \mid \mathcal{F}_t)$. Multiplying by $\mathbf{1}\{\tau_{\mathrm{id}} > t\}$ and taking expectations gives
$$\mathbb{E}\big[\mathbf{1}\{\tau_{\mathrm{id}} > t\}(H_t - H_{t+1})\big] = \mathbb{E}\big[\mathbf{1}\{\tau_{\mathrm{id}} > t\}\, I(\Theta; Y_{t+1} \mid \mathcal{F}_t)\big].$$
Summing from $t = 0$ to $n - 1$ yields
$$\mathbb{E}\Bigg[\sum_{t=0}^{n-1} \mathbf{1}\{\tau_{\mathrm{id}} > t\}\, I(\Theta; Y_{t+1} \mid \mathcal{F}_t)\Bigg] = \mathbb{E}[H_0] - \mathbb{E}[H_{\tau_{\mathrm{id}} \wedge n}].$$
Since $\Theta$ is $\mathcal{F}_{\tau_{\mathrm{id}}}$-measurable, one has $H_{\tau_{\mathrm{id}}} = H(\Theta \mid \mathcal{F}_{\tau_{\mathrm{id}}}) = 0$ almost surely. Also, $0 \le H_t \le \log |\Omega|$ for all $t$. Let
$$S_n = \sum_{t=0}^{n-1} \mathbf{1}\{\tau_{\mathrm{id}} > t\}\, I(\Theta; Y_{t+1} \mid \mathcal{F}_t).$$
Because the summands are nonnegative, $S_n$ increases almost surely to $\sum_{t=0}^{\tau_{\mathrm{id}}-1} I(\Theta; Y_{t+1} \mid \mathcal{F}_t)$. Monotone convergence and bounded convergence therefore give
$$\mathbb{E}\Bigg[\sum_{t=0}^{\tau_{\mathrm{id}}-1} I(\Theta; Y_{t+1} \mid \mathcal{F}_t)\Bigg] = \mathbb{E}[H_0].$$
Since $\mathcal{F}_0$ is trivial, $\mathbb{E}[H_0] = H(\Theta)$.

Proof of Proposition 4.10  By Lemma 4.9,
$$H(\Theta) = \mathbb{E}\Bigg[\sum_{t=0}^{\tau_{\mathrm{id}}-1} I(\Theta; Y_{t+1} \mid \mathcal{F}_t)\Bigg].$$
By Proposition 4.3, $I(\Theta; Y_{t+1} \mid \mathcal{F}_t) \le C_m(K_t)$ almost surely for every $t$. Substituting this bound inside the sum yields the result.

Proof of Theorem 4.11  By Proposition 4.8, the structural bound follows: $\mathbb{E}[\tau] \ge \mathbb{E}[L_m(\Theta)]$. For the epistemic part, define the identification time $\tau_{\mathrm{id}} = \inf\{t \ge 0 : H(\Theta \mid \mathcal{F}_t) = 0\}$. Since $\{\tau_{\mathrm{id}} \le t\} = \{H(\Theta \mid \mathcal{F}_t) = 0\} \in \mathcal{F}_t$, $\tau_{\mathrm{id}}$ is an $(\mathcal{F}_t)$-stopping time. Moreover, if $\tau$ is a completion time then identification must already have occurred, so $\tau_{\mathrm{id}} \le \tau$ almost surely. By Proposition 4.10,
$$H(\Theta) \le \mathbb{E}\Bigg[\sum_{t=0}^{\tau_{\mathrm{id}}-1} C_m(K_t)\Bigg].$$
Since $K_t \in \mathcal{K}_m$ almost surely by Lemma 3.5, $C_m(K_t) \le C_m^{\max}$ almost surely. Therefore
$$H(\Theta) \le \mathbb{E}\Bigg[\sum_{t=0}^{\tau_{\mathrm{id}}-1} C_m(K_t)\Bigg] \le C_m^{\max}\, \mathbb{E}[\tau_{\mathrm{id}}] \le C_m^{\max}\, \mathbb{E}[\tau].$$
Rearranging yields $\mathbb{E}[\tau] \ge H(\Theta)/C_m^{\max}$. Combining the two bounds gives the theorem.
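Theorem 4.11 combines the structural and epistemic limits into $\mathbb{E}[\tau] \ge \max\{\mathbb{E}[L_m(\Theta)],\, H(\Theta)/C_m^{\max}\}$. A minimal numerical sketch, with a hypothetical prior and depth table chosen purely for illustration, shows which of the two limits binds:

```python
import math

def expected_completion_lower_bound(prior, depth, c_max):
    """Theorem 4.11 sketch: E[tau] >= max(E[L_m(Theta)], H(Theta)/C_max).
    `prior` maps each candidate target to its probability, `depth` to its
    structural distance L_m(.); entropies are in nats."""
    structural = sum(prior[c] * depth[c] for c in prior)
    entropy = -sum(p * math.log(p) for p in prior.values() if p > 0)
    return max(structural, entropy / c_max)

# Uniform prior over two targets at depths 3 and 5, with C_max = log 4:
# the structural limit (4 rounds) binds, not the epistemic one (0.5).
print(expected_completion_lower_bound(
    {"g1": 0.5, "g2": 0.5}, {"g1": 3, "g2": 5}, math.log(4)))  # 4.0
```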
Proof of Proposition 4.13  For each $c \in \Omega^+$, choose one raw signal $z_c \in Z$ such that $\mathrm{tgt}(z_c) = c$. Because $\mathrm{tgt}$ is a function, the signals $(z_c)_{c \in \Omega^+}$ are pairwise distinct. Since $\Omega \subseteq U_m$ and $U_m$ is a fixed point of $\Phi_m$, every target concept $c \in \Omega$ is ordered at $U_m$. Hence each $z_c$ is parseable at $U_m$, so the parsed observation range at $U_m$ contains at least the distinct symbols $\{z_c : c \in \Omega^+\}$. Therefore, by Lemma A.5,
$$C_m^{\max} \ge C_m(U_m) \ge \log |\Omega^+|.$$
Since entropy is bounded by the logarithm of the support size, $H(\Theta) \le \log |\Omega^+|$, which yields $H(\Theta) \le C_m^{\max}$. This proves (i).

For (ii), fix $c \in \Omega^+$. By definition of $L_m(c)$, there exists a valid ordered curriculum of length $L_m(c)$ from $A_m$ to a set containing $c$. Under Assumption 4.12, the teacher can implement that curriculum by sending one raw signal targeting each concept along the path. After $L_m(c)$ rounds, the learner has acquired $c$. In one additional round, the teacher sends the fixed representative signal $z_c$. Because $c \in K_{L_m(c)}$, the signal $z_c$ is parseable at that state. Since the strategy specifies a unique representative signal for each possible target, the learner identifies the realized target after observing $z_c$. Thus $\tau \le L_m(\Theta) + 1$ almost surely. Taking expectations completes the proof.

A.4 Proofs for Section 5

Proof of Proposition 5.1  By Proposition 4.8, every completion time $\tau$ satisfies $\tau \ge L_m(\Theta)$ almost surely. Hence $\{\tau \le t\} \subseteq \{L_m(\Theta) \le t\}$. Therefore, for every admissible teaching strategy,
$$\mathbb{P}(\tau \le t) \le \mathbb{P}\big(L_m(\Theta) \le t\big).$$
Taking the supremum over strategies yields the first claim. If $t < L_{\min}$, then $L_m(\Theta) > t$ almost surely under the prior, so $\mathbb{P}(L_m(\Theta) \le t) = 0$. Hence $V(t) = 0$.

Proof of Proposition 5.2  Fix an admissible teaching strategy with $\mathbb{E}[\tau] < \infty$. By Markov's inequality,
$$\mathbb{P}(\tau > t) \le \frac{\mathbb{E}[\tau]}{t},$$
and therefore $\mathbb{P}(\tau \le t) \ge 1 - \mathbb{E}[\tau]/t$. Since $V(t)$ is the supremum of $\mathbb{P}(\tau \le t)$ over all admissible strategies, it follows that $V(t) \ge 1 - \mathbb{E}[\tau]/t$. Letting $t \to \infty$ gives $V(t) \to 1$.

Proof of Proposition 5.3  By Proposition 4.8, every acquisition time $\tau_g$ satisfies $\tau_g \ge L_m(g)$ almost surely. Therefore, for every admissible strategy and every $t < L_m(g)$, $\mathbb{P}(\tau_g \le t) = 0$. Taking the supremum over strategies yields $V_g(t) = 0$ for $t < L_m(g)$. Now let $L = L_m(g)$. By definition of structural distance, there exists a witnessing chain
$$A_m = K_0 \subset K_1 \subset \cdots \subset K_L, \qquad g \in K_L,$$
such that $K_{i+1} = K_i \cup \{u_i\}$ with $u_i \in \Phi_m(K_i) \setminus K_i$ for $i = 0, \ldots, L-1$. By Assumption 4.12, for each $u_i$ there exists a raw signal $z_i \in Z$ such that $\mathrm{tgt}(z_i) = u_i$. Since $u_i \in \Phi_m(K_i)$, the signal $z_i$ is parseable at $K_i$. If the teacher sends $z_0, z_1, \ldots, z_{L-1}$ in sequence, the learner moves through the sets $K_0, K_1, \ldots, K_L$ and therefore acquires $g$ after $L$ rounds. Thus there exists an admissible strategy such that $\mathbb{P}(\tau_g \le L) = 1$. Hence $V_g(t) = 1$ for all $t \ge L_m(g)$.
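Proposition 5.3 says the optimal acquisition probability $V_g(t)$ jumps from $0$ to $1$ exactly at the structural distance $L_m(g)$. The sketch below, again using our illustrative rule encoding rather than anything from the paper, computes $L_m(g)$ by breadth-first search over acquired-concept sets and evaluates the resulting step function:

```python
from collections import deque

def structural_depth(goal, axioms, rules):
    """L_m(g): length of the shortest witnessing chain from the axioms to
    a state containing g (breadth-first search over acquired sets)."""
    start = frozenset(axioms)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        K, d = queue.popleft()
        if goal in K:
            return d
        for (S, c) in rules:
            if S <= K and c not in K and (K | {c}) not in seen:
                seen.add(K | {c})
                queue.append((K | {c}, d + 1))
    return None  # g is unreachable for this mind

def best_acquisition_probability(t, depth):
    """V_g(t) per Proposition 5.3: a step function of the horizon t."""
    return 0.0 if depth is None or t < depth else 1.0

rules = [(frozenset({"a"}), "p1"), (frozenset({"p1"}), "p2"),
         (frozenset({"p2"}), "g")]
d = structural_depth("g", {"a"}, rules)
print(d, [best_acquisition_probability(t, d) for t in range(5)])
# 3 [0.0, 0.0, 0.0, 1.0, 1.0]
```

This zero-one shape is what drives the non-concave returns to training effort exploited in Proposition 5.5.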
Proof of Proposition 5.5  For (i), if every learner receives fewer than $L$ rounds, then by Proposition 5.3 the acquisition probability of $g$ is zero for every learner. Hence no learner completes. For (ii), select $\min\{N, \lfloor B/L \rfloor\}$ learners and allocate $L$ rounds to each of them, allocating $0$ rounds to the remaining learners. This is feasible because $L \min\{N, \lfloor B/L \rfloor\} \le L \lfloor B/L \rfloor \le B$ and $\min\{N, \lfloor B/L \rfloor\} \le N$. By Proposition 5.3, each selected learner completes with probability one. The remaining learners receive $0$ rounds, and since $0 < L$, again by Proposition 5.3 they complete with probability zero. Hence the total number of completed learners is $\min\{N, \lfloor B/L \rfloor\}$.

Proof of Theorem 5.6  Let $a$ and $g$ be distinct concepts. For each $i \in \{1, \ldots, k\}$ and each $j \in \{1, \ldots, L-1\}$, let $p_{i,j}$ be pairwise distinct concepts, all also distinct from $a$ and $g$. Define
$$C = \{a, g\} \cup \{p_{i,j} : i = 1, \ldots, k,\; j = 1, \ldots, L-1\}, \qquad A = \{a\}.$$
For each $i \in \{1, \ldots, k\}$, define the rule set of mind $m_i$ by
$$E_{m_i} = \big\{(\{a\}, p_{i,1}),\, (\{p_{i,1}\}, p_{i,2}),\, \ldots,\, (\{p_{i,L-2}\}, p_{i,L-1}),\, (\{p_{i,L-1}\}, g)\big\}.$$
Thus mind $m_i$ has a private prerequisite chain
$$a \to p_{i,1} \to p_{i,2} \to \cdots \to p_{i,L-1} \to g,$$
and no concept $p_{i',j}$ with $i' \neq i$ is reachable in mind $m_i$. Choose raw signals $z_{i,j} \in Z$ (for $i = 1, \ldots, k$ and $j = 1, \ldots, L-1$) and $z_g \in Z$ with $\mathrm{tgt}(z_{i,j}) = p_{i,j}$ and $\mathrm{tgt}(z_g) = g$.

(i) Personalized acquisition in $L$ rounds. Fix $i$. The sequence $z_{i,1}, z_{i,2}, \ldots, z_{i,L-1}, z_g$ is a valid ordered curriculum of length $L$ for $m_i$: each signal becomes parseable once its predecessor on the private chain has been acquired, and the final signal acquires $g$.

(ii) Broadcast lower bound. Consider any common broadcast sequence $\Gamma = (z_1, \ldots, z_T)$ that acquires $g$ for every mind. Fix $i$. Before mind $m_i$ can acquire $g$, it must first acquire all $L - 1$ private prerequisite concepts $p_{i,1}, \ldots, p_{i,L-1}$. Moreover, if $i' \neq i$, then none of the concepts $p_{i,j}$ lies in $U_{m_{i'}}$. Hence a broadcast signal targeting $p_{i,j}$ can help at most mind $m_i$; it produces no acquisition for any other mind. It follows that at least $L - 1$ rounds must be devoted to the private prerequisites of each mind $i$. Summing over $i = 1, \ldots, k$, at least $k(L-1)$ rounds are required to make all minds ready for a signal targeting $g$. Finally, one additional round targeting $g$ is necessary, since $g \notin A$ and is acquired only when a signal with target $g$ is parseable. Hence $T \ge k(L-1) + 1$.

(iii) Tightness. Consider the broadcast sequence
$$z_{1,1}, \ldots, z_{1,L-1},\; z_{2,1}, \ldots, z_{2,L-1},\; \ldots,\; z_{k,1}, \ldots, z_{k,L-1},\; z_g.$$
During the block $z_{i,1}, \ldots, z_{i,L-1}$, only mind $m_i$ advances; all other minds ignore those signals. After the first $k(L-1)$ rounds, each mind $m_i$ has acquired $p_{i,L-1}$. The final signal $z_g$ is therefore parseable for every mind, so all of them acquire $g$ on the last round. Thus the lower bound is attained.
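The tight construction in part (iii) of Theorem 5.6 is easy to replay in code. The following toy simulation, written under the construction's premise that each private-chain signal is ignored by every other mind, counts the rounds consumed by the block curriculum and recovers the bound $k(L-1) + 1$:

```python
def tight_broadcast_length(k, L):
    """Theorem 5.6 sketch: replay the block curriculum from part (iii) for
    k minds whose private chains each have L - 1 prerequisites before g."""
    acquired = [0] * k      # acquired[i] = prerequisites mind i holds
    rounds = 0
    for i in range(k):      # block i: signals z_{i,1}, ..., z_{i,L-1}
        for _ in range(L - 1):
            acquired[i] += 1    # only mind i can parse these signals
            rounds += 1
    rounds += 1             # final signal z_g, now parseable by all minds
    assert all(a == L - 1 for a in acquired)
    return rounds

# k = 3 heterogeneous minds, depth L = 4: broadcast needs 10 rounds,
# versus L = 4 rounds per learner under personalized teaching (part (i)).
print(tight_broadcast_length(3, 4))  # 10 = 3 * (4 - 1) + 1
```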
B Additional results

This appendix collects supplementary results that are invoked in the proofs above but are not essential to the main narrative. It also records additional consequences of the framework that may be of independent interest.

Corollary B.1 (Not every union-closed family above the axioms is a learning space). The class of $A$-based learning spaces on $U$ is a strict subclass of the class of union-closed families $\mathcal{F} \subseteq 2^U$.

Proof. Every $A$-based learning space is, by definition, union-closed and lies above $A$, so only strictness needs to be shown. Let $U = \{a, b\}$, $A = \emptyset$, and $\mathcal{F} = \{\emptyset, \{a, b\}\}$. Then $\mathcal{F}$ is union-closed and contains $A$. However, it fails accessibility, since neither $\{a, b\} \setminus \{a\} = \{b\}$ nor $\{a, b\} \setminus \{b\} = \{a\}$ belongs to $\mathcal{F}$. Hence $\mathcal{F}$ is not an $A$-based learning space.

Lemma B.2 (Prefix closure of reachable acquired concept sets). If $K \in \mathcal{K}_m$ and $A_m = K_0 \subset K_1 \subset \cdots \subset K_L = K$ is a witnessing chain, then every intermediate set $K_i$ also belongs to $\mathcal{K}_m$.

Proof. Each $K_i$ is reachable from $A_m$ by truncating the witnessing chain at step $i$.

Proposition B.3 (Parseability preserves information). Let $U_{t+1} = \{\mathrm{tgt}(Z_{t+1}) \notin \Phi_m(K_t)\}$. Then
$$I(\Theta; Y_{t+1} \mid \mathcal{F}_t, U_{t+1}^c) = I(\Theta; Z_{t+1} \mid \mathcal{F}_t, U_{t+1}^c).$$
In particular, if the right-hand side is strictly positive, then so is the left-hand side.

Proof. On $U_{t+1}^c$, the parser acts as the identity, so $Y_{t+1} = Z_{t+1}$ almost surely. The identity of the conditional mutual informations follows immediately.

Corollary B.4 (Unlimited rephrasing can be useless under sharp parsing). Fix time $t$ and a mind $m$. Let $U_t(c) = \{c \notin \Phi_m(K_t)\}$. Let $(Z_{t+1}^{(j)})_{j \ge 1}$ be any family of $Z$-valued random variables such that $\mathrm{tgt}(Z_{t+1}^{(j)}) = c$ almost surely for every $j \ge 1$, and define $Y_{t+1}^{(j)} = \rho_m(Z_{t+1}^{(j)}, K_t)$. Then for every $j \ge 1$,
$$I(\Theta; Y_{t+1}^{(j)} \mid \mathcal{F}_t) = 0 \quad \text{almost surely on } U_t(c).$$

Proof. On $U_t(c)$, the targeted concept is not ordered, so $Y_{t+1}^{(j)} = \bot$ almost surely. Hence $Y_{t+1}^{(j)}$ is conditionally constant given $\mathcal{F}_t$, so the conditional mutual information is zero.
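Lemma A.4, Proposition B.3, and Corollary B.4 together say that parsing acts as an all-or-nothing filter on information. A small numerical check, using a hypothetical two-target prior and the string "null" standing in for $\bot$, computes the mutual information on both sides of the filter:

```python
import math

def mutual_information(joint):
    """I(X; Y) in nats for a finite joint pmf {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Two equally likely targets emit distinct raw signals. If neither signal
# is parseable, the learner observes only the null symbol and learns
# nothing (Lemma A.4 / Corollary B.4); if both parse, the full log 2 nats
# survive (Proposition B.3).
unparseable = {("t1", "null"): 0.5, ("t2", "null"): 0.5}
parseable = {("t1", "z1"): 0.5, ("t2", "z2"): 0.5}
print(mutual_information(unparseable), mutual_information(parseable))
# 0.0 0.6931...
```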
References

C. D. Aliprantis and K. C. Border. Infinite Dimensional Analysis: A Hitchhiker's Guide. Springer, 2006.

Gary S. Becker. Human Capital: A Theoretical and Empirical Analysis, with Special Reference to Education. University of Chicago Press, 1964.

Yoram Ben-Porath. The production of human capital and the life cycle of earnings. Journal of Political Economy, 75(4):352–365, 1967.

Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In International Conference on Machine Learning, pages 41–48, 2009.

Claude Berge. Hypergraphs: Combinatorics of Finite Sets, volume 45. Elsevier, 1984.

David Blackwell. Comparison of experiments. In Berkeley Symposium on Mathematical Statistics and Probability, pages 93–102. University of California Press, 1951.

David Blackwell. Equivalent comparisons of experiments. The Annals of Mathematical Statistics, 24(2):265–272, 1953.

Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley, 2006.

Flavio Cunha and James J. Heckman. The technology of skill formation. American Economic Review, 97(2):31–47, 2007.

Reinhard Diestel. Graph Theory, volume 173. Springer, 2024.

Jean-Paul Doignon and Jean-Claude Falmagne. Knowledge Spaces. Springer, 1999.

Jean-Paul Doignon and Jean-Claude Falmagne. Knowledge spaces and learning spaces. arXiv preprint, 2015.

Jean-Paul Doignon and Jean-Claude Falmagne. Knowledge spaces and learning spaces. In New Handbook of Mathematical Psychology, Volume 1: Foundations and Methodology, pages 274–321. Cambridge University Press, 2016.

Sally A. Goldman and Michael J. Kearns. On the complexity of teaching. Journal of Computer and System Sciences, 50(1):20–31, 1995.

Bernhard Korte and László Lovász. Structural properties of greedoids. Combinatorica, 3(3):359–374, 1983.

Bernhard Korte, László Lovász, and Rainer Schrader. Greedoids, volume 4 of Algorithms and Combinatorics. Springer-Verlag, 1991.

Claude E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.

Claude E. Shannon. Communication theory of secrecy systems. Bell System Technical Journal, 28(4):656–715, 1949.

Christopher A. Sims. Implications of rational inattention. Journal of Monetary Economics, 50(3):665–690, 2003.

Alfred Tarski. A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics, 5(2):285–309, 1955.

Hassler Whitney. On the abstract properties of linear dependence. American Journal of Mathematics, 57(3):509–533, 1935.

Xiaojin Zhu, Ji Liu, and Manuel Lopes. No learner left behind: On the complexity of teaching multiple learners simultaneously. In International Joint Conference on Artificial Intelligence, pages 3588–3594, 2017.

Xiaojin Zhu, Adish Singla, Sandra Zilles, and Anna N. Rafferty. An overview of machine teaching. arXiv preprint arXiv:1801.05927, 2018.
