A Mathematical Theory of Understanding


Authors: Bahar Taşkesen

University of Chicago, Booth School of Business

March 23, 2026

Abstract

Generative AI has transformed the economics of information production, making explanations, proofs, examples, and analyses available at very low cost. Yet the value of information still depends on whether downstream users can absorb and act on it. A signal conveys meaning only to a learner with the structural capacity to decode it: an explanation that clarifies a concept for one user may be indistinguishable from noise to another who lacks the relevant prerequisites. This paper develops a mathematical model of that learner-side bottleneck. We model the learner as a mind, an abstract learning system characterized by a prerequisite structure over concepts. A mind may represent a human learner, an artificial learner such as a neural network, or any agent whose ability to interpret signals depends on previously acquired concepts. Teaching is modeled as sequential communication with a latent target. Because instructional signals are usable only when the learner has acquired the prerequisites needed to parse them, the effective communication channel depends on the learner's current state of knowledge and becomes more informative as learning progresses. The model yields two limits on the speed of learning and adoption: a structural limit determined by prerequisite reachability and an epistemic limit determined by uncertainty about the target. The framework implies threshold effects in training and capability acquisition. When the teaching horizon lies below the prerequisite depth of the target, additional instruction cannot produce successful completion of teaching; once that depth is reached, completion becomes feasible. This generates non-concave returns to training effort and implies that spreading scarce instructional resources evenly can yield lower output than concentrating them on fewer workers or users. Across heterogeneous learners, a common broadcast curriculum can be slower than personalized instruction by a factor linear in the number of learner types.

1 Introduction

The value of a piece of information depends on the existence of a mind that can decode it. This is evident in ordinary learning: an explanation means nothing to a listener who lacks the background to follow it, and a lecture conveys nothing to a student who cannot parse its content. Information that cannot be absorbed by the intended learner is, in a precise sense, noise. Understanding therefore cannot be reduced to the accumulation of information alone. Whether a signal carries usable information is not an intrinsic property of the signal itself, but a relation between the signal and the conceptual structure of the mind that receives it.

Over the past century, the cost of producing and distributing information has fallen by orders of magnitude, from printed encyclopedias to digital repositories to, most recently, generative AI systems capable of producing explanations, proofs, and worked examples on demand. As the supply of machine-generated information expands, the bottleneck shifts from production to absorption: the ability of downstream users to parse, interpret, and act on what is produced. Whether a signal carries usable information depends on the learner's ability to decode it.
An explanation that conveys meaning to one user may be indistinguishable from noise to another who lacks the relevant prerequisites.

This paper develops a mathematical model of that learner-side bottleneck through a formal theory of understanding. We do not attempt to model every feature of cognition, such as analogy, abstraction, forgetting, or semantic interpretation. Instead, we ask a narrower structural question: given a learner with a fixed prerequisite architecture, which concepts are understandable in principle, through which intermediate states can the learner move, and what limits the speed at which instruction can bring the learner to a target?

Our starting point is a formal model of a mind. We use the term mind for a learning system whose ability to interpret new signals depends on what has already been acquired. The same formal object can represent a human learner, an artificial learner such as a neural network, or any agent whose decoding power is shaped by prerequisite structure. Formally, a mind consists of a concept space together with an axiom set and a family of finitary expansion rules specifying which concepts become accessible once their prerequisites have been acquired. These rules induce an understanding horizon, describing what is reachable in principle from the axioms, and a family of reachable acquired concept sets, describing the intermediate states through which a learner can progress by successive prerequisite-respecting steps. Under a finite-horizon assumption, we show that this reachable family forms a learning space above the axiom set, equivalently an antimatroid, and conversely that every such structure admits a representation by an appropriate mind.

To study the operational consequences of this structure, we model teaching as sequential communication with a latent target concept. The teacher knows the realized target but the learner does not. Instructional signals are filtered through a prerequisite-gated parser induced by the mind: a signal is usable only when its target concept is currently ordered for the learner, and otherwise collapses to a common null observation. The effective learner-side channel is therefore not fixed in advance. It depends on the learner's current knowledge state and changes as instruction proceeds. The same raw broadcast may convey usable information to one learner while collapsing to noise for another. We call this phenomenon the relativity of randomness.

This state dependence creates two distinct obstacles to fast teaching. The first is structural: before a target can be acquired, the learner must move through prerequisite-respecting states until the target becomes currently parseable. The second is epistemic: the learner must infer which target the teacher intends. Our main quantitative result combines these two bottlenecks into a general lower bound on teaching time. Expected completion time must clear both a structural barrier, determined by the shortest valid route to the target, and an epistemic barrier, determined by the cumulative usable information that can pass through the learner-side channel. The structural barrier can be dominant, but the information-theoretic layer remains essential for characterizing when and how instruction becomes usable. In our model, once the prerequisite structure makes the target reachable, one additional signal is enough for identification.
This framework leads to several consequences. Acquiring prerequisites does more than add concepts: in the sense of Blackwell, it refines the statistical experiment through which later instruction about the target is interpreted. This structural change has operational implications for teaching. For deterministic targets, fixed-horizon teaching problems exhibit discontinuous structural thresholds: completion probability jumps from zero to one when the teaching horizon reaches the structural distance to the target concept, implying non-concave returns to instructional time and simple failures of uniform resource allocation. The same structural logic also shapes multi-learner settings. Across heterogeneous learners, teaching with a common broadcast curriculum can be strictly slower than personalized instruction by a factor linear in the number of learner types.

Related literature. The paper draws on several literatures but differs from each in a specific way. At a broad level, our question is how the structure of a learner limits the usable flow of information. This connects combinatorial models of learning, information theory, teaching and machine learning, and models of skill formation, but the present framework combines these ingredients in a way that is specific to prerequisite-gated understanding.

The combinatorial study of feasible learning states originates with knowledge space theory [Doignon and Falmagne, 1999, 2015], where the family of feasible states is taken as a primitive. Independently, Korte et al. [1991] arrived at the same mathematical structure, antimatroids, from the perspective of combinatorial optimization. We recover this structure from a different starting point: a generative model of a mind specified by axioms and finitary expansion rules. The equivalence (Theorem 2.27) shows that the two viewpoints are formally interchangeable, but the generative formulation connects the combinatorial structure directly to closure, derivability, and the teaching bounds developed in the paper.

Shannon's information theory [Shannon, 1948] studies channels whose input-output relationship is fixed. In our framework, by contrast, the learner's parsing map induces an effective channel whose output alphabet depends on the learner's current acquired state. Blackwell's comparison of experiments [Blackwell, 1951, 1953] provides the natural language for this dependence: we show that the parsed experiment induced by a larger acquired state Blackwell-dominates the one induced by a smaller state.

In computational learning theory, Goldman and Kearns [1995] introduce teaching dimension as a combinatorial measure of how many labeled examples suffice to identify a target concept within a learner class. Our setting is different: the main constraint is not only identification, but whether the learner can parse target-relevant signals at all given its current prerequisites. More recent work in machine teaching studies settings in which a single teacher must instruct multiple heterogeneous learners with a common teaching sequence; for example, Zhu et al. [2017] show that common teaching can be strictly harder than individualized teaching, and Zhu et al. [2018] survey the broader landscape.
Our broadcast impossibility result (Theorem 5.6) differs in the source of the penalty: it is driven by prerequisite-gated decodability and the geometry of reachable acquired states, rather than by differences in algorithmic update rules across learners. The term curriculum also appears in machine learning, where it typically refers to the ordering of training examples from easy to hard [Bengio et al., 2009]. There the object being shaped is the optimization trajectory of a parametric model; here it is the sequence of prerequisite-respecting states through which a structured learner can move.

The threshold and allocation results in Section 5 are also related in spirit to models of human-capital accumulation [Becker, 1964, Ben-Porath, 1967, Cunha and Heckman, 2007]. Those models study how current investment affects future skill formation, often through complementarity across stages. Our mechanism is different. In our framework, missing prerequisites create structural thresholds: below the relevant structural depth, completion is impossible regardless of strategy, whereas beyond that threshold completion becomes feasible. The resulting non-smoothness comes from prerequisite-gated decodability rather than from an exogenous production technology. The state-dependent information constraint also connects to rational inattention [Sims, 2003]: both frameworks study limits on usable information, but in rational inattention the bottleneck is imposed through an explicit information cost, whereas here it arises endogenously from the prerequisite structure of the mind.

Finally, the observation that absorptive capacity limits the value of information connects naturally to emerging work on the economic implications of AI-generated content. As generative models reduce the cost of producing explanations, examples, and analyses, the central question becomes who can make use of the resulting output. In our framework, this bottleneck arises from the prerequisite structure of the learner, which determines which generated signals carry usable information and which collapse to noise.

Notation and conventions. We write ∆(Ω) for the set of probability distributions on a finite or countable set Ω. Unless stated otherwise, all logarithms are taken to base 2; accordingly, entropy and mutual information are measured in bits. For a set S, we denote its cardinality by |S| and its power set by 2^S. Finally, δ_x denotes the point mass at x ∈ S.

2 Understanding as a Closure System

What does it mean for a learner to "understand" something? A child who knows addition can follow a multiplication lesson built on repeated addition; one who lacks addition cannot follow that explanation. The same words carry information for one mind and are noise for another. Understanding, in this sense, is not an isolated state but a structured dependency: each concept requires certain prerequisites, and those prerequisites may themselves depend on prior knowledge.

To formalize this idea, we introduce a primitive notion of concept and a nonempty concept space, whose elements represent the conceptual units under consideration. A mind is then specified by two objects: a set of axioms and a set of expansion rules. Axioms are concepts taken as given, requiring no further justification. Each expansion rule states that mastery of a specific finite set of concepts, referred to as its prerequisites, unlocks a new concept.
Different minds may share the same concept space yet differ in their axioms or expansion rules. In that case, the order in which concepts become learnable differs, capturing the familiar observation that individuals with different backgrounds require different learning paths.

Given a mind, the expansion rules induce a closure operator: starting from any set of known concepts, iteratively apply every expansion rule whose prerequisites are satisfied until no new concepts are added. The resulting closure operator satisfies extension, monotonicity, and idempotence, the standard closure axioms. These are not merely formal conveniences. Extension encodes that knowledge is never lost by derivation. Monotonicity encodes that knowing more can only enlarge what is derivable. Idempotence encodes that once all consequences have been drawn, further application changes nothing. Any reasonable notion of logical or conceptual consequence must satisfy these properties. The closure framework provides the basic structural language in which the notion of understanding will be formalized in the sections that follow. We now formalize these ideas using closure operators from order theory.

Definition 2.1 (Concept space). A concept space is a nonempty set C whose elements are concepts.

The concept space C is a modeling primitive: its elements may represent facts ("zebras are animals"), skills ("long division"), propositions ("the fundamental theorem of calculus"), or procedures ("how the simplex method works") at any level of granularity. The framework is invariant to this choice. The modeler selects C in the same way an economist selects the state space in a decision problem or the type space in a mechanism design model: the choice determines which phenomena the model can express, but the theorems themselves do not depend on the particular interpretation. The concept space C may be finite or countably infinite. When concepts admit finite descriptions, they can be represented as finite strings over a finite alphabet, and C can therefore be identified with a subset of that set.

Definition 2.2 (Mind). A mind over a concept space C is a triple m = (C, A_m, E_m) where:

(i) A_m ⊆ C is a set of axioms,
(ii) E_m ⊆ 2^C_fin × C is a set of expansion rules, where 2^C_fin denotes the collection of finite subsets of C.

The axioms A_m are the concepts that the mind m understands a priori: they require no prerequisites. Each expansion rule (S, c) ∈ E_m states that if all concepts in the finite set S are currently understood, then the concept c becomes accessible. The set S is referred to as the prerequisites of c under that rule.

The expansion rules E_m describe the cognitive architecture of the mind, that is, the wiring that determines what can be derived from what, rather than propositions explicitly known by the learner. A rule in E_m is not assumed to be something the learner can articulate; instead, it specifies which concepts become accessible once the learner has mastered the prerequisites. The teacher, by contrast, may or may not know E_m. A teacher with full knowledge of the learner's rules can tailor instruction to the learner's prerequisite structure, whereas a teacher who is ignorant of the learner's type may have to resort to a common broadcast and can then pay the price of universality (Theorem 5.6).
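For readers who prefer executable notation, the following minimal Python sketch encodes Definition 2.2. The representation (strings for concepts, frozensets for prerequisite sets) is an illustrative choice, not part of the formal model; the concrete mind shown is Mind 1 of the arithmetic example introduced below (Example 2.4).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mind:
    """A mind m = (C, A_m, E_m) in the sense of Definition 2.2."""
    concepts: frozenset   # the concept space C
    axioms: frozenset     # A_m, the concepts understood a priori
    rules: tuple          # E_m: pairs (S, c) with S a finite prerequisite set

# Mind 1 of Example 2.4 below: a = counting, b = addition,
# c = spatial arrays, d = multiplication.
mind1 = Mind(
    concepts=frozenset("abcd"),
    axioms=frozenset("a"),
    rules=(
        (frozenset("a"), "b"),    # {a} => b
        (frozenset("b"), "c"),    # {b} => c
        (frozenset("bc"), "d"),   # {b, c} => d
    ),
)
```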
We impose one structural restriction on E_m: each prerequisite set is finite. Accordingly, the granularity of C, i.e., what counts as a single concept, should be chosen so that realistic explanations can be modeled using finitely many prerequisites. Beyond this finitarity requirement, the level of granularity is a modeling choice.

Remark 2.3. We do not model logical inconsistency or belief revision. Concepts are treated as abstract units, and understanding refers to accessibility under a prerequisite structure rather than to semantic truth. This is a deliberate modeling choice, analogous to Shannon's separation of the engineering problem of communication from the semantic content of messages. Accordingly, a concept in our framework may represent a true theorem, a useful heuristic, or even a widespread misconception. The theory is invariant to this distinction: the teaching bounds depend only on the dependency structure induced by the prerequisite rules and on the information geometry of the teaching interaction, not on the truth value of the concepts themselves.

Example 2.4 (Two minds learning arithmetic). Let C = {a, b, c, d} with the informal readings a = counting, b = addition, c = spatial arrays, d = multiplication.

Mind 1 (algorithmic learner). Axioms A_1 = {a}. The expansion rule set E_1 consists of {a} ⇒ b, {b} ⇒ c, {b, c} ⇒ d. This mind first understands addition from counting, then understands spatial arrays through repeated addition, and finally grasps multiplication once it combines repeated addition with the array representation.

Mind 2 (visual learner). Axioms A_2 = {a}. The expansion rule set E_2 consists of {a} ⇒ c, {c} ⇒ b, {b, c} ⇒ d. This mind first understands spatial arrays from counting objects arranged in space, then understands addition by combining arrays, and finally reaches multiplication through the same rule {b, c} ⇒ d.

Both minds in Example 2.4 share the same concept space and the same axioms, and both can eventually derive all four concepts, but the order in which concepts become available differs. A concept that one mind derives early may come late for the other. This is the formal expression of relativity of understanding: individuals with different cognitive architectures can arrive at the same body of knowledge through fundamentally different paths. We will revisit this example throughout the paper.

Example 2.4 is about learning mathematics, but the framework applies to any domain in which understanding has prerequisite structure. The next example illustrates this.

Example 2.5 (Two minds learning text editing on a computer). Let C = {t, s, k, e} with the informal readings t = typing text, s = selecting (highlighting) text, k = keyboard shortcuts, e = efficient editing.

Mind 3 (mouse-first). The axiom set is A_3 = {t}. The expansion rule set E_3 consists of {t} ⇒ s, {s} ⇒ k, {s, k} ⇒ e. This learner first acquires text selection from typing, then acquires keyboard shortcuts once selection is understood, and finally reaches efficient editing once both selection and shortcuts are available.

Mind 4 (shortcut-first). The axiom set is A_4 = {t}. The expansion rule set E_4 consists of {t} ⇒ k, {k} ⇒ s, {s, k} ⇒ e. This learner first acquires keyboard shortcuts from typing, then acquires selection through shortcut-based interaction, and finally reaches efficient editing once both selection and shortcuts are available.
Both minds share the same concept space and the same axiom set, and both can ultimately reach e. However, their prerequisite structures differ: in Mind 3, selection unlocks shortcuts, whereas in Mind 4, shortcuts unlock selection. The final rule {s, k} ⇒ e is shared, but the paths by which its prerequisites are acquired are different.

The expansion rules admit a combinatorial interpretation. They form a directed hypergraph [Berge, 1984] in which each rule (S, c) is a hyperedge from the prerequisite set S to the concept c.

Definition 2.6 (One-step expansion). For a mind m and a set K ⊆ C of currently known concepts, define the one-step expansion by

Φ_m(K) = K ∪ {c ∈ C : ∃ S ⊆ K such that (S, c) ∈ E_m}.

For Mind 1 in Example 2.4, start from K = {a}. The rule {a} ⇒ b fires, since {a} ⊆ {a}, and therefore Φ_1({a}) = {a, b}. Applying the operator again, the rule {b} ⇒ c fires, whereas {b, c} ⇒ d does not, since c ∉ {a, b}. Thus Φ_1({a, b}) = {a, b, c}. Applying the operator once more, the rule {b, c} ⇒ d now fires, so Φ_1({a, b, c}) = {a, b, c, d}. A further application produces no new concepts, so {a, b, c, d} is a fixed point of Φ_1.

Note that Φ_m is extensive: by definition, the union in Definition 2.6 includes K itself, so K ⊆ Φ_m(K) for every K ⊆ C. We use this property freely throughout.
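The one-step expansion is directly computable. The sketch below, under the same illustrative encoding as the earlier snippet (rules as pairs of a frozenset and a concept), reproduces the trace just described for Mind 1.

```python
def one_step_expansion(rules, K):
    """Phi_m(K) of Definition 2.6: K together with every concept having
    some rule whose prerequisite set is contained in K."""
    K = frozenset(K)
    return K | {c for (S, c) in rules if S <= K}

# Mind 1 of Example 2.4.
rules1 = [(frozenset("a"), "b"), (frozenset("b"), "c"), (frozenset("bc"), "d")]

K = frozenset("a")
for _ in range(4):
    K_next = one_step_expansion(rules1, K)
    print(sorted(K), "->", sorted(K_next))
    K = K_next
# ['a'] -> ['a','b'] -> ['a','b','c'] -> ['a','b','c','d'],
# after which {a, b, c, d} is a fixed point of Phi_1.
```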
The expansion rules E m and axioms A m themselv es are not teac hable: they represen t the learner’s fixed cognitiv e arc hitecture, sensory baseline, or dev elopmen tal stage ov er the timescale of a teac hing in teraction. A concept is strictly un teachable ( c / ∈ U m ) when this arc hitecture cannot bridge the gap from the axioms. If a learner even tually grasps a concept that w as structurally inaccessible to their earlier self, we mo del this not as a teaching even t, but as c o gnitive development : a transition into a new mind m ′ with ric her axioms A m ′ , ric her expansion rules E m ′ , or b oth. Our theory b ounds the fundamen tal limits of te aching a fixed architecture; the long-term development of the architecture itself is a separate pro cess. Prop osition 2.10 (Existence and characterization) . F or any mind m and any K ⊆ C : (i) cl m ( K ) exists and is a fixe d p oint of Φ m . (ii) cl m ( K ) = S ∞ n =0 Φ n m ( K ) , wher e Φ 0 m ( K ) = K and Φ n +1 m ( K ) = Φ m (Φ n m ( K )) . (iii) If C is finite, then cl m ( K ) = Φ N m ( K ) for some N ≤ | C | . The existence of a least fixed p oin t in Prop osition 2.10 follows from the Knaster-T arski fixed p oin t theorem [ T arski , 1955 ]; see also [ Aliprantis and Border , 8 2006 ] for a textb o ok treatment. W e give a self-con tained pro of for completeness in Section A . Prop osition 2.11 (Axiomatic characterization of understanding) . F or a given mind m , the set U m = cl m ( A m ) is the unique set U ⊆ C satisfying: (i) Axioms are understo o d: A m ⊆ U . (ii) Closure under expansion: if ( S , c ) ∈ E m and S ⊆ U , then c ∈ U . (iii) Minimalit y: U is the smal lest set satisfying (i) and (ii) . Prop ert y (i) of Prop osition 2.11 ensures that the axioms b elong to U m . Prop- ert y (ii) enforces closure under expansion: whenev er all prerequisites of a concept are already in the set, the concept itself must also b elong to the set. Man y subsets of C satisfy (i) and (ii); the entire concept space C is a trivial example. Prop erty (iii) remov es this am biguity by imp osing minimality: U m admits no prop er subset that b oth contains the axioms and is closed under the expansion rules. T ogether, the three prop erties determine U m uniquely . In this sense, understanding is completely determined by the axioms A m and the expansion rules E m , with no additional degrees of freedom. 2.1 Deriv ations and Equiv alence The closure cl m ( K ) tells us which concepts are reachable from K , but not how they are reached. A deriv ation mak es the “ho w” explicit: it is a ro oted tree whose no des represent rule applications and base concepts, showing step b y step why a concept lies in cl m ( K ) . By Lemma A.2 , every such tree is finite. Definition 2.12 (Deriv ation) . A derivation of c onc ept c fr om K ⊆ C in mind m is a well-founded ro oted tree whose no des are lab eled by concepts, satisfying: (i) Ev ery no de is either a b ase no de or a rule no de : • A b ase no de is a leaf (no children) lab eled by a concept in K . • A rule no de is lab eled b y a concept c ′ and has children in bijection with a set S suc h that ( S , c ′ ) ∈ E m , with each c hild lab eled b y the corresp onding elemen t of S . (ii) The ro ot is lab eled by c . W e write K ⊢ m c if such a deriv ation exists. Example 2.13 (Deriv ation trees for the tw o minds) . Con tin uing Example 2.4 , consider the deriv ation of d (m ultiplication) from the axiom set K = { a } . Figure 1 sho ws the deriv ation trees for b oth minds. 
Example 2.13 (Derivation trees for the two minds). Continuing Example 2.4, consider the derivation of d (multiplication) from the axiom set K = {a}. Figure 1 shows the derivation trees for both minds. In each tree, the leaves (bottom nodes, drawn as squares) are concepts from K, which represents the starting knowledge. Each internal node (drawn as a circle) is a concept derived by applying one expansion rule to its children (the nodes directly below it). The root (top node) is the concept being derived. Reading each tree bottom-up: Mind 1 derives a → b → c → d (addition before arrays); Mind 2 derives a → c → b → d (arrays before addition). The two derivation trees witness the same conclusion, namely that d ∈ cl_1({a}) ∩ cl_2({a}), but through different intermediate paths. This provides a concrete instance of mind-relativity.

[Figure 1: Derivation trees for d (multiplication) from K = {a} (counting) in the two minds of Example 2.4. Each tree is read bottom-up: leaves are concepts already known; each internal node is derived from its children by the expansion rule shown alongside. The root d is the concept being derived. Both trees witness that d belongs to the corresponding understanding closure of {a}, but through different intermediate paths.]

Derivations provide a constructive counterpart to the closure: if a concept belongs to cl_m(K), there must exist a finite chain of rule applications that produces it. The following theorem confirms that the two characterizations are equivalent, that is, nothing belongs to the closure without a derivation, and every derivation stays within the closure.

Theorem 2.14 (Closure-derivability equivalence). For any mind m, any set K ⊆ C, and any concept c ∈ C,

c ∈ cl_m(K) ⟺ K ⊢_m c.

The closure operator cl_m induced by a mind satisfies the usual closure axioms (extension, monotonicity, and idempotence). A closure operator that additionally satisfies a finitary property, namely that membership in the closure depends only on finitely many elements, is called algebraic (see Definition A.3).

Theorem 2.15 (Algebraic closure equivalence).

(i) For any mind m = (C, A_m, E_m), the closure operator cl_m : 2^C → 2^C is an algebraic closure operator on C.
(ii) Conversely, for any set X and any algebraic closure operator f : 2^X → 2^X, there exists a rule set E ⊆ 2^X_fin × X such that, writing Ψ_E(K) = K ∪ {c ∈ X : ∃ S ⊆ K such that (S, c) ∈ E}, one has, for every K ⊆ X,

f(K) = ⋂ {F ⊆ X : K ⊆ F and Ψ_E(F) = F}.

Theorem 2.15 shows that finitary expansion-rule systems and algebraic closure operators are equivalent ways of describing the same finitary consequence relation. In particular, every finitary expansion-rule system induces an algebraic closure operator, and conversely every algebraic closure operator on a set X admits at least one, generally non-unique, presentation by finitary expansion rules. Thus the rule-based component of a mind should be understood not as additional structure beyond closure, but as a presentation of an algebraic closure operator. Conceptually, this separates structure from presentation. The expansion rules describe one particular finite-premise decomposition of the underlying consequence relation, while the intrinsic object is the algebraic closure operator itself.
Concretely, let X be a nonempty set, let f : 2^X → 2^X be an algebraic closure operator, and let A ⊆ X be a chosen set of axioms. Choose any rule set E ⊆ 2^X_fin × X whose induced closure operator is f, as guaranteed by Theorem 2.15 (ii). Then m = (X, A, E) is a mind whose induced closure operator is f, and whose understanding is U_m = f(A). Thus specifying a mind amounts to specifying an algebraic closure operator together with an axiom set, while the rule formalism provides a finite-premise presentation of that closure structure.

2.2 Ordered and Unordered Information

In classical information theory, the information content of a signal is treated as a property of the source model, independent of the particular receiver. In teaching, however, the usefulness of information is fundamentally relative. The same explanation that substantially reduces uncertainty for a prepared learner may convey little or no usable information to a novice. This relativity arises because the ability to extract usable information depends on two internal factors: the learner's prerequisite structure E_m and the learner's acquired concept set K at the time of interaction. A concept that is within reach for one mind may be completely inaccessible to another, either because the two minds operate under different prerequisite rules, or because they share the same rules but begin from different acquired concept sets. Formally, one-step accessibility of a concept is determined by the expansion map: a concept c is reachable from the current acquired concept set K if and only if c ∈ Φ_m(K). Consequently, the information conveyed by a signal is not determined by the signal alone, but by its position with respect to the learner's mind. This relationship defines the effective channel through which teaching occurs.

Definition 2.16 (Ordered and unordered concept). Let m be a mind and let K ⊆ C be the set of concepts the learner currently knows. A concept c ∈ C is:

(i) Ordered for (m, K) if c ∈ Φ_m(K). Equivalently, either c ∈ K, or there exists a rule (S, c) ∈ E_m such that S ⊆ K.
(ii) Unordered for (m, K) if c ∉ Φ_m(K). Equivalently, c ∉ K and for every rule (S, c) ∈ E_m, at least one prerequisite in S is missing from K.
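Definition 2.16 amounts to a one-line predicate on top of the expansion map. A sketch (same illustrative encoding as before) showing which concepts are ordered for Mind 1 at the axiom state:

```python
def is_ordered(rules, K, c):
    """True iff c is ordered for (m, K), i.e. c is in Phi_m(K) (Definition 2.16)."""
    K = frozenset(K)
    return c in K or any(S <= K for (S, c2) in rules if c2 == c)

rules1 = [(frozenset("a"), "b"), (frozenset("b"), "c"), (frozenset("bc"), "d")]
print({c: is_ordered(rules1, "a", c) for c in "abcd"})
# {'a': True, 'b': True, 'c': False, 'd': False}: at K = {a}, only b is
# newly ordered; c and d stay unordered until their prerequisites arrive.
```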
At any given stage of the learning process, the set K represents the concepts actually acquired so far. It is not assumed to be closed under inference. The closure cl_m(K) represents the set of concepts that are in principle reachable from K under the learner's expansion rules. Accordingly, it need not coincide with the learner's current acquired set at a given moment. Later, when we model teaching dynamics (see Section 3), the evolving state K_t will represent the concepts the learner has acquired by time t, whereas cl_m(K_t) will describe the concepts that are potentially accessible from that state.

Remark 2.17. The distinction between ordered and unordered concepts concerns decodability at the present moment, not whether a signal can be stored for later use. A learner with memory could buffer a signal targeting a currently unordered concept; for example, a student might copy down a formula they do not yet understand. Once the prerequisite concepts enter K, the stored signal may become decodable retroactively. In the memoryless parsing model introduced in Definition 3.3, by contrast, a signal targeting an unordered concept is lost immediately. A natural extension would replace that parser with a delayed-parsing variant, in which raw signals are buffered and re-parsed whenever K expands. Such a model could lower the teaching-time lower bound, since information presented too early would no longer be wasted.

Definition 2.18 (Valid ordered curriculum). Let m be a mind and let K_0 ⊆ C be an initial knowledge set. A possibly empty finite sequence γ = ((S_i, c_i))_{i=1}^{L}, L ≥ 0, is a valid ordered curriculum starting from K_0 if:

(i) (S_i, c_i) ∈ E_m for each i = 1, . . . , L;
(ii) defining recursively

K_i = K_{i−1} ∪ {c_i}, i = 1, . . . , L,     (1)

one has S_i ⊆ K_{i−1} for every i = 1, . . . , L.

Definition 2.18 formalizes the idea that a curriculum must respect prerequisites at every step. The rule (S_i, c_i) can be used only when all concepts in S_i are already contained in the current set K_{i−1}. Thus the curriculum follows a prerequisite-respecting path, updating the set of concepts acquired by the learner one step at a time. Here K_i denotes the set of concepts acquired after the first i steps, so that K_0 ⊆ K_1 ⊆ · · · ⊆ K_L.

Theorem 2.19 (Ordering theorem). For any mind m and any target c* ∈ U_m, there exists a valid ordered curriculum γ = ((S_1, c_1), . . . , (S_L, c_L)), L ≥ 0, starting from A_m, such that, if (K_i)_{i=1,...,L} is constructed as in (1) with K_0 = A_m, then c* ∈ K_L.

Example 2.20 (Valid ordered curricula for the two minds). We illustrate Definition 2.18 using the two minds of Example 2.4, both starting from the initial concept set K_0 = {a}.

Mind 1 (algorithmic). A valid ordered curriculum for Mind 1 is

γ_1 = (r_1, r_2, r_3), r_1 = ({a}, b), r_2 = ({b}, c), r_3 = ({b, c}, d).

Writing c_1 = b, c_2 = c, c_3 = d, and defining K_0^{(1)} = {a}, K_i^{(1)} = K_{i−1}^{(1)} ∪ {c_i} (i = 1, 2, 3), we obtain

K_1^{(1)} = {a, b}, K_2^{(1)} = {a, b, c}, K_3^{(1)} = {a, b, c, d}.

Indeed, at each step the prerequisite set of the selected rule is contained in the current acquired concept set.

Mind 2 (visual). A valid ordered curriculum for Mind 2 is

γ_2 = (r′_1, r′_2, r′_3), r′_1 = ({a}, c), r′_2 = ({c}, b), r′_3 = ({b, c}, d).

Writing c′_1 = c, c′_2 = b, c′_3 = d, and defining K_0^{(2)} = {a}, K_i^{(2)} = K_{i−1}^{(2)} ∪ {c′_i} (i = 1, 2, 3), we obtain

K_1^{(2)} = {a, c}, K_2^{(2)} = {a, b, c}, K_3^{(2)} = {a, b, c, d}.

Again, each rule is applicable when used.

Thus the two minds admit different valid ordered curricula from the same starting set. In particular, their first steps must differ. For Mind 1 the only rule whose prerequisite set is contained in {a} is ({a}, b), whereas for Mind 2 the only such rule is ({a}, c). This suggests that a single common curriculum cannot in general respect the structural requirements of both minds simultaneously, foreshadowing the impossibility result of Section 5.2.
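Checking Definition 2.18 is mechanical. The sketch below (encoding as in the earlier snippets; illustrative only) validates the two curricula of Example 2.20 and confirms that neither mind accepts the other's curriculum, since already the first steps differ.

```python
def is_valid_curriculum(rules, K0, curriculum):
    """Definition 2.18: each step (S_i, c_i) must be a rule of the mind whose
    prerequisites lie in the current set; then K_i = K_{i-1} plus {c_i}."""
    K, rule_set = frozenset(K0), set(rules)
    for (S, c) in curriculum:
        if (S, c) not in rule_set or not S <= K:
            return False
        K |= {c}
    return True

rules1 = [(frozenset("a"), "b"), (frozenset("b"), "c"), (frozenset("bc"), "d")]
rules2 = [(frozenset("a"), "c"), (frozenset("c"), "b"), (frozenset("bc"), "d")]
gamma1, gamma2 = list(rules1), list(rules2)  # Example 2.20 applies each rule once, in order

assert is_valid_curriculum(rules1, "a", gamma1)      # Mind 1 accepts gamma_1
assert is_valid_curriculum(rules2, "a", gamma2)      # Mind 2 accepts gamma_2
assert not is_valid_curriculum(rules1, "a", gamma2)  # ...but not each other's
assert not is_valid_curriculum(rules2, "a", gamma1)
```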
Proposition 2.21 (Curricula stay inside the understanding horizon). Let m be a mind, and let γ = ((S_i, c_i))_{i=1}^{L} be a valid ordered curriculum starting from A_m. Let (K_i)_{i=1,...,L} be constructed as in (1) with K_0 = A_m. Then K_i ⊆ U_m for every i = 0, 1, . . . , L. In particular, if c* ∉ U_m, then no valid ordered curriculum starting from A_m can reach c*.

Proposition 2.21 draws a boundary around what any curriculum can achieve. If a concept does not belong to the understanding horizon U_m, then no sequence of rule applications, however long or carefully arranged, can produce it. The barrier is structural, not epistemic: it is not that the teacher lacks information or that the curriculum is poorly designed, but that the expansion rules of the learner do not connect the axioms to the target concept. In this sense, the understanding horizon U_m is the theoretical horizon of the mind m.

A concrete illustration is the attempt to convey the visual experience of the color purple to a learner who has been blind from birth. Here the target concept is not the word purple or its descriptive use, but the sensory concept associated with its visual appearance. Such a learner may understand many relational facts about color: that purple is classified between blue and red in certain systems, that particular objects are called purple by sighted speakers, or that light associated with purple occupies a certain range of wavelengths. But if the learner's mind contains no rule path from its existing concepts to that sensory target, then no curriculum, however long or ingeniously ordered, can reach it.

2.3 Reachable acquired concept sets

The closure operator cl_m identifies what is reachable in principle from the axiom set, but it does not describe the intermediate concept sets through which a learner may pass on the way to that horizon. This distinction is structurally important and closely related to a central idea in the literature on knowledge spaces, where one studies not only which concepts are ultimately attainable, but also which intermediate learning states are feasible along a learning process [Doignon and Falmagne, 1999, Korte and Lovász, 1983]. Closure is a global notion: if a concept lies in cl_m(K), then it is eventually reachable from K, but it need not already belong to the current acquired concept set K. In particular, closure alone does not record which subsets of U_m can arise by successive prerequisite-respecting acquisitions, one concept at a time.

For the structural theory of teaching, we therefore need a finer object than the understanding horizon alone. We introduce the family of reachable acquired concept sets: those subsets of U_m that can be built from the axioms by a finite sequence of locally valid acquisitions. This family is the natural state space for teaching dynamics. Later we show that, under a finite-horizon assumption, it has the combinatorial structure familiar from the knowledge-space literature: after shifting by the axiom core, it forms an antimatroid, equivalently, a learning space. Thus the framework does not take the feasible learning states of [Doignon and Falmagne, 1999] as primitive; rather, it derives them from the axioms and expansion rules of a mind.

Definition 2.22 (Reachable acquired concept sets). A set K ⊆ U_m is reachable if there exists a finite chain A_m = K_0 ⊂ K_1 ⊂ · · · ⊂ K_L = K such that for each i = 0, . . . , L − 1,

K_{i+1} = K_i ∪ {c_i}, c_i ∈ Φ_m(K_i) \ K_i.

Any such chain is called a witnessing chain for the reachability of K.

Assumption 2.23 (Finite understanding horizon). The understanding horizon U_m = cl_m(A_m) is finite.
Assumption 2.23 is imposed only to place the reachable family within the finite combinatorial framework of learning spaces and antimatroids. The definition of reachability itself does not require finiteness. Under this assumption, we define the reachable family of mind m as

K_m = {K ⊆ U_m : K is reachable from A_m}.

The reachable family K_m will later serve as the state space for the teaching dynamics in Section 3, so its internal structure is central to the theory. The next proposition shows that this family has three basic features. It has a distinguished minimum state, every non-minimal reachable state can be obtained from another reachable state by adding a single concept, and it is closed under unions. These properties are natural from the perspective of learning: one can build feasible states step by step, and compatible partial acquisitions can be combined. They also place the reachable family in close correspondence with the combinatorial objects studied in the literature on learning spaces [Doignon and Falmagne, 1999] and antimatroids [Korte and Lovász, 1983].

Proposition 2.24 (Structure of the reachable family). Under Assumption 2.23, the family K_m is finite and satisfies:

(i) A_m is the minimum element of the partially ordered set (K_m, ⊆);
(ii) for every K ∈ K_m with K ≠ A_m, there exists K′ ∈ K_m such that K′ ⊂ K and |K \ K′| = 1;
(iii) if K, K′ ∈ K_m, then K ∪ K′ ∈ K_m;
(iv) U_m is the maximum element of (K_m, ⊆);
(v) ordered by inclusion, (K_m, ⊆) is a finite join-semilattice, and for every K, K′ ∈ K_m the join is given by K ∨ K′ = K ∪ K′.

Properties (i) through (iii) identify the core combinatorial features of the reachable family: a distinguished minimum state, one-step accessibility, and union-closure. These are precisely the ingredients that connect the reachable family to the notions of learning space [Doignon and Falmagne, 1999] and antimatroid [Korte and Lovász, 1983]. To make the connection precise, we recall both concepts and their equivalence.

An antimatroid on a finite set E is a family F ⊆ 2^E satisfying: (i) ∅ ∈ F; (ii) for every nonempty S ∈ F, there exists x ∈ S such that S \ {x} ∈ F (accessibility); and (iii) F is union-closed. While matroids [Whitney, 1935] axiomatize independence structures in which feasibility is closed downward (every subset of a feasible set is feasible), antimatroids [Korte and Lovász, 1983] capture the complementary pattern: feasibility is closed upward under unions, modeling sequential construction under precedence constraints. Independently, Doignon and Falmagne [1999] arrived at the same mathematical structure from a different motivation: modeling the feasible knowledge states of a human learner. They called the resulting object a learning space [Doignon and Falmagne, 2015, Theorem 7], [Doignon and Falmagne, 2016], which is an antimatroid.

The standard definition of a learning space takes the empty set as the minimum element, modeling a learner who begins with no knowledge. In our setting the learner starts from the axiom set A_m, so we introduce a shifted variant that replaces ∅ with A.

Definition 2.25 (A-based learning space). Let U be a finite set and let A ⊆ U. A family F ⊆ 2^U is called an A-based learning space if:

(i) A ∈ F and every K ∈ F satisfies A ⊆ K;
(ii) for every K ∈ F with K ≠ A, there exists x ∈ K \ A such that K \ {x} ∈ F;
(iii) F is union-closed.

Equivalently, the shifted family F̂ = {K \ A : K ∈ F} ⊆ 2^{U \ A} is an antimatroid.
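The reachable family of Definition 2.22 can be enumerated by search, growing states one newly ordered concept at a time. The sketch below (encoding as before; illustrative only) does this for the small mind of Figure 2 below, with axioms {a} and rules {a} ⇒ b, {a} ⇒ c, {b, c} ⇒ d, and spot-checks the union-closure property of Proposition 2.24 (iii).

```python
def one_step(rules, K):
    return K | {c for (S, c) in rules if S <= K}

def reachable_family(rules, axioms):
    """Enumerate K_m (Definition 2.22): breadth-first search over states,
    adding one currently ordered but unacquired concept at a time."""
    start = frozenset(axioms)
    family, frontier = {start}, [start]
    while frontier:
        K = frontier.pop()
        for c in one_step(rules, K) - K:   # each addition extends a witnessing chain
            K_new = K | {c}
            if K_new not in family:
                family.add(K_new)
                frontier.append(K_new)
    return family

rules = [(frozenset("a"), "b"), (frozenset("a"), "c"), (frozenset("bc"), "d")]
fam = reachable_family(rules, "a")
print(sorted("".join(sorted(K)) for K in fam))
# ['a', 'ab', 'abc', 'abcd', 'ac']: note that {a, b, d} is absent
assert all(K | L in fam for K in fam for L in fam)  # union-closure, Prop 2.24 (iii)
```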
Corollary 2.26 (Shifted antimatroid structure). Under Assumption 2.23, the reachable family K_m is an A_m-based learning space. Equivalently, the shifted family K̂_m = {K \ A_m : K ∈ K_m} ⊆ 2^{U_m \ A_m} is an antimatroid.

The next theorem characterizes the reachable families generated by minds: they are precisely the A-based learning spaces.

Theorem 2.27 (Representation of reachable families). Let C be a finite set, let A ⊆ C, and let F ⊆ 2^C. The following are equivalent:

(i) F is an A-based learning space;
(ii) there exists a mind m = (C, A, E_m) whose reachable family satisfies K_m = F.

Moreover, when (i) holds, the mind m_F = (C, A, E_F) with rule set

E_F = {(S, c) : S ∈ F, c ∈ C \ S, S ∪ {c} ∈ F}

satisfies K_{m_F} = F.

Figure 2 illustrates the reachable family for a mind with axiom set A_m = {a} and expansion rules {a} ⇒ b, {a} ⇒ c, {b, c} ⇒ d. Starting from K = {a}, the learner can acquire b or c in either order, since both are individually unlocked by the axiom. However, d becomes reachable only once both b and c have been acquired, so {a, b, c} is the unique gateway to d. The set {a, b, d}, for instance, does not belong to K_m because the rule for d requires c, which is absent. The figure makes both accessibility and union-closure visible.

[Figure 2: The reachable family K_m for a mind with axiom set A_m = {a} and expansion rules {a} ⇒ b, {a} ⇒ c, {b, c} ⇒ d. The concept d becomes reachable only at {a, b, c}, where both prerequisites are present. Sets such as {a, b, d} are structurally unreachable.]

Theorem 2.27 characterizes the reachable families generated by minds as the A-based learning spaces. This has two consequences for the present work. First, the feasible knowledge states of a mind need not be postulated as a primitive; they are derived from axioms and expansion rules, and the resulting state space automatically inherits the rich combinatorial structure of an antimatroid. Second, the converse direction guarantees that the mind formalism is fully expressive: any learning space one might wish to study can be generated by a suitably chosen mind. Thus the structural and the generative viewpoints are equivalent. We note, however, that not every union-closed family above the axioms qualifies as an A-based learning space. Accessibility is an additional requirement. It rules out degenerate state spaces in which the learner cannot progress one concept at a time; see Corollary B.1.

3 Teaching and Learning Dynamics

Understanding characterizes which concepts are in principle accessible under a prerequisite structure. Teaching introduces a second challenge beyond accessibility: the learner must identify the teaching target. A signal about addition in a mathematics course, for example, may indicate that addition is itself the intended endpoint, or it may be an intermediate step on the way to multiplication. This is the identification component of teaching. It is here that intentionality enters. A teaching move is not merely the presentation of a concept; it is an action chosen in light of a target and interpreted by the learner as evidence about that target.
To represent this asymmetry, we model the target concept as a latent variable known to the teacher and unknown to the learner, and we represent the learner's evolving belief as a probability distribution over candidate targets.

The latent target need not be interpreted only as the teacher's intended endpoint. It may also be read as the higher-level concept that renders the currently acquired material globally coherent. On this interpretation, learning involves two coupled dimensions: the acquired concept set expands, while the learner simultaneously infers which larger target those concepts are organizing toward. A concept may therefore be acquired locally before its place in the larger conceptual graph is understood. For example, a learner may acquire many concepts from electromagnetism and electronics while still lacking the bridge concept that connects them to wireless communication. Once that target is identified, previously disconnected material becomes integrated as part of a single explanatory structure.

Teaching dynamics therefore involve both structural and epistemic progress. Structural progress is governed by the prerequisite structure: once the learner is at an acquired set from which a concept is ordered, and the appropriate signal is successfully parsed, that concept enters the learner's acquired concept set. Epistemic progress, by contrast, concerns the gradual resolution of uncertainty about the target. Because the learner does not know which target the teacher intends, each signal must play a dual role: it must be a valid instructional step in the prerequisite structure, and it must simultaneously provide evidence that distinguishes the intended target from the alternatives. From the learner's perspective, the observed signal is therefore a random variable whose distribution depends on both the unknown target and the teaching strategy. Each round can convey only a bounded amount of usable information about the latent target, and the total teaching time is governed by the rate at which this epistemic uncertainty is resolved. If the learner knew the target from the outset, the epistemic dimension would disappear and teaching would reduce to the purely structural problem of reaching a known target by a valid curriculum. We now make these ideas concrete by introducing a stochastic model of teaching.

3.1 A Stochastic Model of Teaching

Fix a probability space (Ω_0, F, P) on which all random variables below are defined. Let Ω ⊆ C be a finite set of target concepts. Let Θ : Ω_0 → Ω be an Ω-valued random variable representing the realized target concept, known to the teacher but unknown to the learner. The learner's goal is to identify Θ. Let Z be a finite set, called the teaching signal set, consisting of the raw signals the teacher can emit. Let ⊥ ∉ Z be an additional symbol representing a null observation produced when a signal cannot be parsed at the learner's current knowledge state. The learner observation set is Y = Z ∪ {⊥}.

Definition 3.1 (Signal target map). A signal target map is a function tgt : Z → C that assigns to each raw teaching signal z ∈ Z the concept tgt(z) ∈ C that the signal is intended to teach. We assume that every target concept is associated with at least one raw signal, that is, Ω ⊆ im(tgt).
For each concept c ∈ C, the fiber tgt^{−1}(c) = {z ∈ Z : tgt(z) = c} is the set of all raw signals designed to teach c, representing different explanations, examples, or phrasings of the same concept. Signals in the fiber tgt^{−1}(c) all target the same concept and therefore have the same structural effect on the learner's acquired concept set. However, they may still differ informationally: distinct signals in the fiber can encode different information about the latent target Θ.

Remark 3.2 (Fixed signal system and notation). The raw signal alphabet Z and the target map tgt are treated as fixed throughout a given teaching problem. Capacity quantities introduced later therefore depend not only on the mind m and the acquired concept set K, but also on this signal system (Z, tgt). When no ambiguity arises, we suppress this dependence in the notation.

The signal target map tgt and the latent target Θ play complementary but distinct roles. The random variable Θ ∈ Ω specifies what the learner must ultimately identify: the realized target concept. The map tgt specifies what each individual signal is about: a signal z with tgt(z) = c is designed to teach concept c, which may or may not equal Θ. In general, signals targeting prerequisite concepts may need to be presented before signals targeting Θ itself can become usable to the learner. Thus the teacher's eventual strategy has two degrees of freedom: which concept to target, and which particular encoding of that concept to use within the fiber tgt^{−1}(c). Consequently, a signal may carry information about the target even when it does not directly target the concept Θ.

We now introduce the parsing map ρ_m, which takes a raw teaching signal together with the learner's current knowledge set and either passes the signal through, when the prerequisites are satisfied, or collapses it to the null token ⊥ otherwise.

Definition 3.3 (Parsing map). A mind m is equipped with a parsing map ρ_m : Z × 2^C → Z ∪ {⊥}, where ⊥ is a null token indicating that the signal is unparseable. For a signal z ∈ Z with target c = tgt(z) and a knowledge set K ⊆ C:

(i) ρ_m(z, K) = z if c ∈ Φ_m(K), equivalently, if either c ∈ K already or there exists a rule (S, c) ∈ E_m with S ⊆ K;
(ii) ρ_m(z, K) = ⊥ if c ∉ Φ_m(K), equivalently, if c ∉ K and no rule for c has all its prerequisites in K.

The condition c ∈ Φ_m(K) is the ordered condition of Definition 2.16. A concept may have multiple prerequisite sets, and the signal is parseable if any one of them is satisfied.

Dynamics. We model teaching as a repeated interaction between a teacher and a learner unfolding over discrete rounds t = 0, 1, 2, . . . . The model uses a concept-level time scale: one round represents a single instructional interaction in which the teacher emits one raw signal, the learner observes its parsed version, and the learner's acquired concept set may be updated as a result. We take the learner's initial acquired concept set to be the axiom set of the mind: K_0 = A_m. For each t ≥ 0, the set K_t ⊆ C denotes the concepts acquired by the learner after the first t rounds of instruction. At round t + 1, the teacher emits a raw signal Z_{t+1} ∈ Z. Given the learner's current acquired concept set K_t, the learner observation is the parsed signal

Y_{t+1} = ρ_m(Z_{t+1}, K_t) ∈ Y.
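The parsing map of Definition 3.3 is again a small predicate on top of the ordered condition. In the sketch below the null token ⊥ is encoded as None (an illustrative choice), and the signal system has one raw signal per concept:

```python
def parse(rules, tgt, z, K):
    """rho_m(z, K) of Definition 3.3: pass z through iff tgt(z) is ordered
    for (m, K); otherwise collapse it to the null token (None here)."""
    c, K = tgt[z], frozenset(K)
    ordered = c in K or any(S <= K for (S, c2) in rules if c2 == c)
    return z if ordered else None

# Mind 1 of Example 2.4, with singleton fibers tgt^{-1}(c).
rules1 = [(frozenset("a"), "b"), (frozenset("b"), "c"), (frozenset("bc"), "d")]
tgt = {"z_b": "b", "z_c": "c", "z_d": "d"}
print(parse(rules1, tgt, "z_b", "a"))  # 'z_b': b is ordered at K = {a}
print(parse(rules1, tgt, "z_d", "a"))  # None:  d collapses to ⊥ at K = {a}
```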
Definition 3.4 (Concept-acquisition update rule). Given the parsed observation Y_{t+1} ∈ Y, the learner's acquired concept set evolves according to

K_{t+1} = K_t ∪ {tgt(Y_{t+1})}   if Y_{t+1} ∈ Z,
K_{t+1} = K_t                    if Y_{t+1} = ⊥.

Under this update rule, each round can add at most one newly acquired concept, namely the concept targeted by the parsed signal when parsing succeeds. Time is therefore measured in units of concept-level teaching opportunities. The rule in Definition 3.4 has two immediate consequences. First, acquisition is monotone: K_t ⊆ K_{t+1} for all t ≥ 0. Second, the set K_t records only concepts that have been explicitly acquired through parsed instruction; it is not automatically closed under the expansion rules. Thus a concept may already be reachable from K_t, in the sense that c ∈ Φ_m(K_t), without yet belonging to K_t itself. The learner acquires such a concept only at a later round in which it receives a parseable signal targeting c. Therefore, U_m = cl_m(A_m) describes what is in principle reachable from the axioms, whereas the process (K_t)_{t≥0} describes what has actually been acquired over time.

Lemma 3.5 (The instructional process stays inside the reachable family). For every t ≥ 0, one has K_t ∈ K_m almost surely.

Lemma 3.5 shows that the stochastic teaching process evolves within the reachable family K_m. Thus the family introduced in Section 2.3 not only describes structurally feasible knowledge states but also forms the natural state space for the instructional dynamics.

Definition 3.6 (Admissible teaching strategy). An admissible teaching strategy is a sequence of stochastic kernels

κ_{t+1}( · | θ, y_1, . . . , y_t) ∈ ∆(Z), t ≥ 0,

so that, conditional on the realized target Θ = θ and the parsed history (Y_1, . . . , Y_t) = (y_1, . . . , y_t), the teacher chooses the next raw signal Z_{t+1} according to κ_{t+1}.

Because the learner's epistemic objective is to identify the latent target Θ, it maintains at each time t a belief over the possible target concepts. This belief is updated from the parsed observations Y_1, . . . , Y_t, rather than from the raw teacher emissions Z_1, . . . , Z_t, which are not directly observed by the learner. Accordingly, define the learner's information filtration by

F_t = σ(Y_1, . . . , Y_t) ⊆ F, t ≥ 1,

and set F_0 = {∅, Ω_0}. Given a fixed prior π_0 and a fixed admissible teaching strategy, the learner's posterior at time t is the random probability vector π_t ∈ ∆(Ω) defined by

π_t(c) = P(Θ = c | F_t), c ∈ Ω.

The conditional probability is taken with respect to the probability law induced by the prior π_0 and the admissible teaching strategy. Thus the learner is modeled as Bayesian: its belief state at time t is the posterior distribution of the latent target given the parsed observation history.

Definition 3.7 (Learning state). A learning state at time t is a pair (K_t, π_t) where

(i) K_t ⊆ C is the learner's acquired concept set at time t;
(ii) π_t ∈ ∆(Ω) is the learner's posterior belief over target concepts.

Thus the learning state records both dimensions of progress in the teaching process: structural progress, captured by the acquired concept set K_t, and epistemic progress, captured by the posterior belief π_t about the latent target. The stochastic teaching dynamics therefore evolve on the product space K_m × ∆(Ω).
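A single round of the dynamics updates both coordinates of the learning state (K_t, π_t): the acquired set by Definition 3.4, and the belief by conditioning on the parsed observation. The sketch below does this for a deterministic strategy, represented as a map from each candidate target to its sequence of raw signals; the function names and the encoding are illustrative assumptions, not notation from the paper.

```python
def parse(rules, tgt, z, K):
    c = tgt[z]
    return z if (c in K or any(S <= K for (S, c2) in rules if c2 == c)) else None

def round_update(rules, tgt, strategy, t, theta, K, belief):
    """One teaching round on the product space K_m x Delta(Omega):
    emit Z_{t+1} = strategy[theta][t], parse it against K_t, apply the
    concept-acquisition rule (Definition 3.4), and condition the belief on
    Y_{t+1}. Under a deterministic strategy, a candidate target survives
    iff it would have produced the same parsed observation at state K_t."""
    y = parse(rules, tgt, strategy[theta][t], K)
    K_new = K | {tgt[y]} if y is not None else K
    consistent = {th for th, p in belief.items()
                  if p > 0 and parse(rules, tgt, strategy[th][t], K) == y}
    total = sum(belief[th] for th in consistent)
    belief_new = {th: (belief[th] / total if th in consistent else 0.0)
                  for th in belief}
    return K_new, belief_new
```

Iterating this update replays the interaction of Example 3.9 below; a simulation follows that example.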
Definition 3.8 (Completion). Teaching is complete at time τ if both

(i) target acquisition: Θ ∈ K_τ;

(ii) identification: π_τ(Θ) = 1.

The framework therefore distinguishes three related notions. First, the understanding horizon U_m = cl_m(A_m) is the set of concepts that are in principle reachable from the axioms under the expansion rules of the mind. Second, the time-indexed set K_t records which concepts have actually been acquired through instruction by time t. Third, the completion condition of Definition 3.8 formalizes successful teaching of the target: it requires both acquisition of the target, Θ ∈ K_τ, and identification of the target, π_τ(Θ) = 1. Acquisition without identification corresponds to having acquired a concept without yet knowing that it is the intended target. Identification without acquisition corresponds to knowing which concept is intended without yet having reached it. Completion requires both.

Example 3.9 (A full teaching interaction). Let C = {a, b, c, d} with the informal readings a = counting, b = addition, c = arrays, d = multiplication. Fix Mind 1 from Example 2.4, with axiom set A_m = {a} and expansion rules

\[ \{a\} \Rightarrow b, \qquad \{b\} \Rightarrow c, \qquad \{b, c\} \Rightarrow d. \]

Let Ω = {b, c, d}, let the prior be uniform on Ω, and let the teacher use the deterministic policy

\[
\begin{aligned}
\Theta = b &: \ (Z_1, Z_2, Z_3) = (z_b^{(1)}, z_b^{(1)}, z_b^{(1)}),\\
\Theta = c &: \ (Z_1, Z_2, Z_3) = (z_b^{(1)}, z_c^{(1)}, z_c^{(1)}),\\
\Theta = d &: \ (Z_1, Z_2, Z_3) = (z_b^{(1)}, z_c^{(1)}, z_d^{(1)}),
\end{aligned}
\]

where tgt(z_b^{(1)}) = b, tgt(z_c^{(1)}) = c, and tgt(z_d^{(1)}) = d. Suppose the realized target is Θ = d. The learner starts from K_0 = {a}, π_0(b) = π_0(c) = π_0(d) = 1/3.

At t = 0, the teacher emits Z_1 = z_b^{(1)}. Since b ∈ Φ_m({a}), the signal is parseable, so Y_1 = z_b^{(1)} and K_1 = {a, b}. Because the same first signal is prescribed under all three targets, the observation Y_1 = z_b^{(1)} does not yet distinguish among them, and therefore π_1 = π_0.

At t = 1, the teacher emits Z_2 = z_c^{(1)}. Since b has already been acquired, the concept c is now ordered, so the signal is parseable: Y_2 = z_c^{(1)} and K_2 = {a, b, c}. Under the stated policy, the history (Y_1, Y_2) = (z_b^{(1)}, z_c^{(1)}) is inconsistent with Θ = b. Hence the posterior assigns zero mass to b and splits mass equally between c and d: π_2(b) = 0, π_2(c) = π_2(d) = 1/2.

At t = 2, the teacher emits Z_3 = z_d^{(1)}. Since both b and c are now present, the concept d is ordered, so the signal is parseable: Y_3 = z_d^{(1)} and K_3 = {a, b, c, d}. Now the full observation history is consistent only with Θ = d, so π_3 = δ_d.

Thus teaching is complete at time τ = 3: the learner has both acquired the target, Θ = d ∈ K_3, and identified it, π_3(d) = 1. This example illustrates the distinction between structural acquisition, encoded by the process (K_t), and epistemic identification, encoded by the posterior process (π_t).
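Example 3.9 can be replayed mechanically. The sketch below reuses the parse/update helpers from the earlier snippet; the posterior step exploits the fact that the policy is deterministic and the prior uniform, so Bayes reduces to restricting the prior to the targets whose induced parsed history matches the observed one. All identifiers are our own illustrative choices.

```python
from fractions import Fraction

# Mind 1 of Example 3.9: axioms {a}; rules {a}⇒b, {b}⇒c, {b,c}⇒d.
rules1 = [(frozenset({"a"}), "b"), (frozenset({"b"}), "c"),
          (frozenset({"b", "c"}), "d")]
tgt1 = {"zb": "b", "zc": "c", "zd": "d"}
policy = {"b": ["zb", "zb", "zb"], "c": ["zb", "zc", "zc"], "d": ["zb", "zc", "zd"]}

def parsed_history(theta, T):
    """Parsed observations Y_1..Y_T and final K_T when the target is theta."""
    K, ys = frozenset({"a"}), []
    for t in range(T):
        y = parse(policy[theta][t], tgt1, rules1, K)
        K, ys = update(K, y, tgt1), ys + [y]
    return ys, K

for T in (1, 2, 3):  # realized target Θ = d
    ys, K = parsed_history("d", T)
    # Deterministic policy + uniform prior: posterior is uniform on the
    # targets whose induced parsed history matches the observed one.
    consistent = [c for c in "bcd" if parsed_history(c, T)[0] == ys]
    print(T, sorted(K), {c: Fraction(1, len(consistent)) for c in consistent})
# T=1: K={a,b}, uniform on {b,c,d}; T=2: K={a,b,c}, mass 1/2 on c and d;
# T=3: K={a,b,c,d}, posterior δ_d, i.e. completion at τ = 3.
```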
3.2 The Epistemic Arrow of Time

We now formalize the epistemic component of the teaching dynamics. The key question is how the learner's uncertainty about the latent target evolves as parsed observations accumulate over time. This motivates the term epistemic arrow of time: although particular observations may be uninformative, posterior uncertainty can only decrease in conditional expectation under Bayesian updating. The information-theoretic notions used below are standard; see, for example, [Cover and Thomas, 2006, §2].

Information-theoretic quantities. Let X and Y be discrete random variables on a probability space (Ω_0, F, P) taking values in finite or countable sets X and Y. We adopt the convention 0 log 0 = 0. The Shannon entropy of X is

\[ H(X) = -\sum_{x \in \mathcal{X}} \mathbb{P}(X = x) \log \mathbb{P}(X = x). \]

For a sub-σ-field G ⊆ F, define the pathwise conditional entropy of X given G by

\[ H(X \mid \mathcal{G}) = -\sum_{x \in \mathcal{X}} \mathbb{P}(X = x \mid \mathcal{G}) \log \mathbb{P}(X = x \mid \mathcal{G}). \]

Its expectation E[H(X | G)] is the usual conditional entropy. For brevity, we write H(X | Y) = E[H(X | σ(Y))]. The mutual information between X and Y is I(X; Y) = H(X) − H(X | Y). The conditional mutual information given G is

\[ I(X; Y \mid \mathcal{G}) = H(X \mid \mathcal{G}) - \mathbb{E}[H(X \mid \mathcal{G} \vee \sigma(Y)) \mid \mathcal{G}]. \]

In the teaching model, Θ is Ω-valued and the learner filtration is F_t = σ(Y_1, ..., Y_t). We define the epistemic entropy at time t by H_t = H(Θ | F_t). Since π_t(c) = P(Θ = c | F_t), this may be written as

\[ H_t = -\sum_{c \in \Omega} \pi_t(c) \log \pi_t(c) \quad \text{a.s.} \]

Thus H_t is the Shannon entropy of the learner posterior at time t.

Proposition 3.10 (Entropy drop equals information flow). The one-round expected reduction in epistemic entropy satisfies

\[ \mathbb{E}[H_t - H_{t+1} \mid \mathcal{F}_t] = I(\Theta; Y_{t+1} \mid \mathcal{F}_t). \]

Proposition 3.10 expresses a conservation principle: the expected reduction in posterior uncertainty about the target equals the conditional mutual information conveyed by the next parsed observation. In other words, expected learning progress in one round is precisely the amount of information that Y_{t+1} carries about the target Θ.

Theorem 3.11 (Epistemic arrow of time). The epistemic entropy process (H_t)_{t≥0} is a supermartingale:

\[ \mathbb{E}[H_{t+1} \mid \mathcal{F}_t] \leq H_t, \]

with equality if and only if Y_{t+1} is independent of Θ given F_t.

Theorem 3.11 formalizes the epistemic arrow of time: posterior uncertainty decreases in conditional expectation, although along particular sample paths it may increase after a realized observation. Equality holds when the next observation carries no information about the target.

Remark 3.12 (Bayesian modeling choice). By defining π_t(c) = P(Θ = c | F_t), we have adopted a Bayesian learner model: the learner belief is the true conditional distribution of the target given the parsed observation history. This is not the only possible choice, but it is natural here for three reasons. First, π_t uses all information contained in the observations and nothing else, so it is determined entirely by the prior π_0 and the filtration F_t. Second, because π_t is the conditional distribution of Θ given F_t, the epistemic entropy H_t coincides with the conditional entropy H(Θ | F_t). This makes mutual information the natural measure of learning progress: each new observation reduces posterior uncertainty by precisely I(Θ; Y_{t+1} | F_t) in conditional expectation. Third, the completion condition π_τ(Θ) = 1 then has a strong interpretation: the parsed observations identify the target, rather than the learner merely arriving at the correct answer by chance.
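As a numerical illustration of Proposition 3.10 and Theorem 3.11 (a sketch only, in the Example 3.9 setup, reusing parsed_history from the previous snippet), the expected epistemic entropy E[H_t] is non-increasing along the stated policy:

```python
from math import log2

def entropy_bits(p):
    """Shannon entropy in bits, with the convention 0·log 0 = 0."""
    return -sum(q * log2(q) for q in p if q > 0)

for T in range(4):
    # Group the three equally likely targets by the parsed history they induce.
    groups = {}
    for c in "bcd":
        groups.setdefault(tuple(parsed_history(c, T)[0]), []).append(c)
    # E[H_T]: each history occurs w.p. |group|/3 and carries a uniform posterior.
    eH = sum(len(g) / 3 * entropy_bits([1 / len(g)] * len(g))
             for g in groups.values())
    print(T, round(eH, 3))
# Prints 1.585, 1.585, 0.667, 0.0: E[H_t] is non-increasing, and each one-round
# drop equals the information conveyed by the new observation (Prop. 3.10).
```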
3.3 Prerequisites and the Relativity of Randomness

The epistemic arrow of time in Theorem 3.11 describes how uncertainty evolves once observations are received. It does not, however, determine what the learner actually observes. In the teaching model the learner does not observe the raw teacher signal Z_{t+1} directly; instead it receives the parsed observation Y_{t+1} = ρ_m(Z_{t+1}, K_t), where the parsing map depends on the learner's current acquired concept set. When the targeted concept is ordered, the signal passes through unchanged; when prerequisites are missing, the parser collapses the signal to the null token ⊥. The effective information channel from Θ to the learner is therefore state dependent. In particular, the same raw broadcast may transmit usable information to one learner while being erased for another. The next result formalizes this phenomenon. As throughout, conditional mutual-information expressions given U_{t+1} or U^c_{t+1} are understood on the event where the relevant conditioning probability is positive, and are taken to be 0 otherwise.

Theorem 3.13 (Relativity of randomness). Let C_{t+1} = tgt(Z_{t+1}) be the targeted concept, and define the unparseability event U_{t+1} = {C_{t+1} ∉ Φ_m(K_t)}. Assume that on parseable rounds the raw teacher signal is informative about the latent target: I(Θ; Z_{t+1} | F_t, U^c_{t+1}) > 0. Then the learner's per-round information transfer exhibits an eventwise dichotomy:

\[
I(\Theta; Y_{t+1} \mid \mathcal{F}_t, U_{t+1}) = 0 \ \text{(erasure)}, \qquad
I(\Theta; Y_{t+1} \mid \mathcal{F}_t, U^c_{t+1}) > 0 \ \text{(informative)}.
\]

Theorem 3.13 shows that the usable information in a teaching signal is state dependent. Under the parsing map ρ_m, if the targeted concept is unordered at K_t, then the parsed observation collapses to ⊥; by Theorem 3.13, the learner receives no further within-event discrimination from the raw signal on that event: conditional on unparseability, the parsed observation is the constant ⊥, although the occurrence of unparseability itself may still be informative about Θ. By contrast, on parseable rounds the same raw broadcast may transmit strictly positive information. In this precise sense, the informational status of a signal is relative to the learner's structural capacity to decode it.

This relativity is consistent with classical information theory. Randomness has always been observer dependent: a ciphertext appears as pure noise without the cryptographic key [Shannon, 1949], and conditional mutual information formalizes the dependence of information on what is known [Cover and Thomas, 2006]. What is distinctive here is the mechanism that generates this dependence: the learner's decoding power is governed by the combinatorial closure operator Φ_m, so prerequisite topology directly determines when the channel behaves as identity and when it behaves as erasure.

The notion of mind-relative randomness introduced earlier is related to, but distinct from, the combinatorial distinction between ordered and unordered concepts from Definition 2.16. The latter is a structural property of the targeted concept relative to the learner's acquired concept set, whereas the former is an epistemic property of the observation process relative to the latent target Θ. In the sharp parsing model, if the teacher targets a concept C_{t+1} = tgt(Z_{t+1}) that is unordered at the current acquired concept set, C_{t+1} ∉ Φ_m(K_t), then the parser maps every such raw signal to the same null observation: Y_{t+1} = ⊥ almost surely on that event. Thus all distinctions among those raw signals are erased at the learner end of the channel.
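The dichotomy of Theorem 3.13 can be exhibited on a two-target toy instance. In the sketch below, the mind, signal names, and teacher rule are hypothetical, chosen only to produce the two regimes; parse is reused from the Section 3.1 sketch.

```python
from collections import defaultdict
from math import log2

def mutual_information(joint):
    """I(Θ; Y) in bits from a joint law {(theta, y): prob}."""
    pt, py = defaultdict(float), defaultdict(float)
    for (t, y), p in joint.items():
        pt[t] += p
        py[y] += p
    return sum(p * log2(p / (pt[t] * py[y]))
               for (t, y), p in joint.items() if p > 0)

# Hypothetical mind: axioms {a}; rules {a}⇒b, {b}⇒d1, {b}⇒d2; Θ uniform on {d1,d2}.
rules2 = [(frozenset({"a"}), "b"), (frozenset({"b"}), "d1"), (frozenset({"b"}), "d2")]
tgt2 = {"z1": "d1", "z2": "d2"}

for K in (frozenset({"a"}), frozenset({"a", "b"})):
    # The teacher emits z1 under Θ = d1 and z2 under Θ = d2: a fully
    # revealing raw-signal law, before parsing.
    joint = {("d1", parse("z1", tgt2, rules2, K)): 0.5,
             ("d2", parse("z2", tgt2, rules2, K)): 0.5}
    print(sorted(K), mutual_information(joint))
# At {a} both signals collapse to ⊥, so I(Θ; Y) = 0 (erasure). At {a, b} the
# channel is the identity and I(Θ; Y) = 1 bit (informative).
```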
However, the appearance of ⊥ does not by itself imply mind-relative randomness. Even though the symbol ⊥ contains no internal distinctions, the event {Y_{t+1} = ⊥} may still convey information about the target Θ. In particular, if the teacher's targeting rule depends on Θ, then the probability that the teacher selects a concept outside Φ_m(K_t) may vary with Θ, and observing ⊥ can update the learner's posterior belief.

Conversely, an ordered round need not be informative. If C_{t+1} ∈ Φ_m(K_t), then the signal is parseable and Y_{t+1} = Z_{t+1}. But even in this case the parsed observation may still be mind-random if, conditional on the public history F_t, the teacher's policy induces the same distribution of Y_{t+1} under every possible target; equivalently, Θ ⊥⊥ Y_{t+1} | F_t. Thus parseability and informativeness are logically distinct: an unordered round may still be informative through the occurrence of erasure, while an ordered round may be uninformative if the parsed signal distribution does not depend on the target.

An immediate consequence of sharp parsing is that repeated rephrasings of the same unordered concept do not help. If c ∉ Φ_m(K_t), then every raw signal targeting c collapses to the null observation ⊥, regardless of how many distinct encodings or phrasings are available (see Corollary B.4). Thus, on that event, repetition and rephrasing do not reduce epistemic uncertainty about the target.

Combined with the prerequisite gating established in Theorem 3.13, these observations formalize a central thesis of the framework: whether a broadcast conveys usable information is not an intrinsic property of the signal itself, but of the interaction among the signal, the learner's current acquired concept set, and the teacher's policy.

Remark 3.14. The relativity of randomness established in Theorem 3.13 suggests a broader perspective in which randomness itself becomes observer dependent. The parsing map ρ_m determines, for each mind and acquired state, which signals are informative and which collapse to noise. In this sense, randomness is not an intrinsic property of a signal but a relation between the signal and the observer's structure of understanding.

4 Speed Limits of Teaching

We now derive the quantitative speed limits of the teaching model. Two obstructions coexist. The first is structural: the learner must acquire enough prerequisite concepts for the target to become reachable. The second is epistemic: the learner must resolve uncertainty about which target concept the teacher intends. The purpose of this section is to formalize both obstructions and combine them into a single lower bound on the expected completion time.

Fix a mind m = (C, A_m, E_m) and a finite target set Ω ⊆ U_m = cl_m(A_m). Thus every target under consideration lies in the learner understanding horizon.

4.1 Identification and state-dependent capacity

Recall that Θ : Ω_0 → Ω is the realized target concept and that the learner observes the parsed history F_t = σ(Y_1, ..., Y_t). The learner epistemic objective is to identify Θ from this history. We say that identification occurs at time t if π_t(Θ) = 1, equivalently, H(Θ | F_t) = 0. Since completion additionally requires target acquisition, identification is a strictly weaker requirement than full teaching completion.
Definition 4.1 (Identification stopping time). A random time τ_id is an identification stopping time if:

(i) τ_id is an (F_t)-stopping time;

(ii) P(τ_id < ∞) = 1;

(iii) Θ is F_{τ_id}-measurable.

Equivalently, H(Θ | F_{τ_id}) = 0 almost surely.

Because the parsing map ρ_m depends on the learner's current acquired concept set K_t, the effective learner-side channel is state dependent. At early stages many raw signals may collapse to ⊥, whereas later the same signals may pass through unchanged once the relevant prerequisites have been acquired. Let K_m be the reachable family introduced in Definition 2.22. For each reachable acquired concept set K ∈ K_m, define the ordered raw-signal set

\[ \mathcal{Z}_{\mathrm{ord}}(K) = \{ z \in \mathcal{Z} : \operatorname{tgt}(z) \in \Phi_m(K) \}. \]

Under sharp parsing, signals in Z_ord(K) pass through unchanged, while all other raw signals collapse to ⊥. This leads to the following learner-side capacity notion.

Definition 4.2 (State-dependent parsed entropy bound). For each K ∈ K_m, define

\[ C_m(K) = \sup \big\{ H\big(\rho_m(Z, K)\big) : Z \text{ is a } \mathcal{Z}\text{-valued random variable} \big\}. \]

The parsed entropy bound C_m(K) also depends on the signal system (Z, tgt). Throughout the paper this instructional interface is treated as fixed, and we therefore suppress this dependence in the notation. For a given interface, the variation of C_m(K) across acquired concept sets is endogenous to the learner state, whereas its numerical level is determined jointly by the mind m and the signal system (Z, tgt). Thus C_m(K) should be understood as a property of the pair (m, (Z, tgt)) evaluated at state K.

Thus C_m(K) is the largest Shannon entropy that a one-round parsed observation ρ_m(Z, K) can attain at the learner end of the channel when the acquired concept set is K, as the law of the raw input signal Z ranges over all Z-valued distributions.

Proposition 4.3 (Statewise one-round information bound). For every t ≥ 0,

\[ I(\Theta; Y_{t+1} \mid \mathcal{F}_t) \leq C_m(K_t) \quad \text{almost surely.} \]

Proposition 4.3 shows that the learner's per-round information gain about the target is bounded by the capacity C_m(K_t), which depends on the learner's acquired concept set at time t. As the learner acquires more concepts, the set of parseable signals grows, and the capacity may increase. The bound is therefore not static: structural progress expands the effective channel through which teaching occurs. This coupling between structural progress and informational capacity is the mechanism through which prerequisites govern the speed of teaching.

The next lemma shows that the learner-side channel can only improve as the learner acquires more concepts.

Lemma 4.4 (Monotonicity of the state-dependent bound). If K, K′ ∈ K_m satisfy K ⊆ K′, then C_m(K) ≤ C_m(K′).

Lemma 4.4 reflects a basic property of the parsing model: acquiring additional concepts cannot reduce the learner's ability to decode signals. When the acquired concept set grows, previously parseable signals remain parseable, and additional signals may become usable. Consequently the entropy of the parsed observation, and therefore the effective channel capacity, cannot decrease as the learner acquires additional concepts.
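Since the supremum in Definition 4.2 is attained by a uniform law on the parsed-observation range (this is proved as Lemma A.5 in the appendix), C_m(K) can be computed by enumeration. The sketch below does so for the mind used in Example 4.14 later in this section, anticipating the values derived there; entropies are in bits, and the instance definitions (rules3, tgt3, Z3) are reused by subsequent sketches.

```python
from math import log2

def capacity(Z, tgt_map, rules, K):
    """C_m(K) = log2 |{ρ_m(z, K) : z ∈ Z}|: log-size of the parsed range."""
    return log2(len({parse(z, tgt_map, rules, K) for z in Z}))

# Mind of Example 4.14: axioms {a}; rules {a}⇒b and {b}⇒d_j for j = 1..4.
rules3 = [(frozenset({"a"}), "b")] + \
         [(frozenset({"b"}), f"d{j}") for j in range(1, 5)]
tgt3 = {"zb": "b", **{f"z{j}": f"d{j}" for j in range(1, 5)}}
Z3 = list(tgt3)

print(capacity(Z3, tgt3, rules3, frozenset({"a"})))       # 1.0 bit   (= log 2)
print(capacity(Z3, tgt3, rules3, frozenset({"a", "b"})))  # ≈ 2.32 bits (= log 5)
```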
This monotonicity admits a stronger statistical interpretation. To formalize it, we use the Blackwell order on statistical experiments [Blackwell, 1953]. Informally, one experiment Blackwell-dominates another if the latter can be obtained from the former by garbling, that is, by post-processing through a stochastic map independent of the underlying state. Equivalently, the dominating experiment is at least as informative for every statistical decision problem.

Definition 4.5 (Blackwell domination). Let Ω be a finite state space, and let W : Ω → ∆(Y) and W′ : Ω → ∆(Y′) be two statistical experiments. We say that W Blackwell-dominates W′ if there exists a Markov kernel G : Y → ∆(Y′) such that for every ω ∈ Ω,

\[ W'(\cdot \mid \omega) = \sum_{y \in \mathcal{Y}} G(\cdot \mid y)\, W(y \mid \omega). \]

Equivalently, W′ is a garbling of W.

Theorem 4.6 (Blackwell order on acquired concept sets). Fix t ≥ 0 and a public history h_t = (y_1, ..., y_t) ∈ Y^t with P((Y_1, ..., Y_t) = h_t) > 0. For each K ∈ K_m, let W_{K,h_t} denote the statistical experiment from Θ to the parsed observation induced by the conditional raw-signal law

\[ \mathbb{P}\big( Z_{t+1} \in \cdot \mid \Theta = \omega,\ (Y_1, \dots, Y_t) = h_t \big), \qquad \omega \in \Omega. \]

If K ⊆ K′, then W_{K′,h_t} Blackwell-dominates W_{K,h_t}.

Theorem 4.6 holds for each realized public history h_t separately. Thus the ordering of acquired concept sets is pathwise rather than merely averaged: conditional on any history for which the next-round raw-signal law is defined, the parsed experiment induced by a larger acquired concept set Blackwell-dominates the parsed experiment induced by a smaller one. This theorem strengthens Lemma 4.4. The monotonicity of C_m(K) says that larger acquired concept sets permit weakly greater parsed entropy. Theorem 4.6 shows more: they induce uniformly more informative experiments in the sense of statistical decision theory. The universal-broadcast theorem of Theorem 5.6 will show that this dependence on the learner prerequisite structure cannot, in general, be eliminated by a common broadcast curriculum.

4.2 Structural and epistemic lower bounds

We now combine the structural and epistemic constraints of the model to derive a single lower bound on teaching time.

Definition 4.7 (Structural distance to a target concept). For c ∈ U_m, define

\[
L_m(c) = \min \Big\{ L \geq 0 : \exists\, K_0, \dots, K_L \in \mathcal{K}_m,\ u_0, \dots, u_{L-1} \in U_m \ \text{such that}\ K_0 = A_m,\ c \in K_L,\ K_{i+1} = K_i \cup \{u_i\},\ u_i \in \Phi_m(K_i) \setminus K_i,\ i = 0, \dots, L-1 \Big\}.
\]

The quantity L_m(c) measures the shortest prerequisite-respecting route from the axioms to a state containing c. It therefore gives the natural structural benchmark against which any completion time must be compared. The first fundamental constraint on teaching time is structural: the learner must traverse the prerequisite chain before the target can be acquired.

Proposition 4.8 (Structural barrier). Let τ be any completion time in the sense of Definition 3.8. Then τ ≥ L_m(Θ) almost surely. Consequently, E[τ] ≥ E[L_m(Θ)].

Proposition 4.8 is the purely geometric obstruction in the model. Regardless of how informative the signals are, the learner cannot complete teaching before traversing a prerequisite-respecting path to a state containing the realized target. Since each round adds at most one concept, the shortest such path gives an unavoidable lower bound on completion time.
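The structural distance L_m(c) can be computed by breadth-first search over reachable acquired concept sets, adding one concept of Φ_m(K) \ K per step. The following sketch is exponential-time in |C| in the worst case and is adequate only for small toy minds such as those in this paper's examples; it reuses one_step_expansion and the rules3 instance from the capacity sketch.

```python
from collections import deque

def structural_distance(rules, axioms, target):
    """L_m(c): length of the shortest prerequisite-respecting chain from the
    axioms to a reachable set containing the target (BFS over states)."""
    start = frozenset(axioms)
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        K, depth = queue.popleft()
        if target in K:
            return depth
        for c in sorted(one_step_expansion(rules, K) - K):
            nxt = K | {c}
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None  # target outside the understanding horizon U_m

print(structural_distance(rules3, {"a"}, "d1"))  # 2: first b, then d_1
```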
To control the epistemic obstruction, we next aggregate the information gained across all rounds up to identification τ_id. The point is that the per-round entropy drop identity from Proposition 3.10 telescopes over time.

Lemma 4.9 (Total information required for identification). Let τ_id be an identification stopping time. Then

\[ \mathbb{E}\left[ \sum_{t=0}^{\tau_{\mathrm{id}} - 1} I(\Theta; Y_{t+1} \mid \mathcal{F}_t) \right] = H(\Theta). \]

Lemma 4.9 says that identification must pay for the full initial uncertainty of the target: the cumulative conditional mutual information transmitted through the parsed observations up to identification equals the entropy of Θ. Thus the learner cannot identify the target until enough usable information has flowed through the learner-side channel to resolve all initial uncertainty.

The next step is to combine this accounting identity with the statewise capacity bound from Proposition 4.3. This converts total required information into a lower bound expressed in terms of the learner trajectory through the reachable family.

Proposition 4.10 (Trajectory information budget). Let τ_id be any identification stopping time. Then

\[ H(\Theta) \leq \mathbb{E}\left[ \sum_{t=0}^{\tau_{\mathrm{id}} - 1} C_m(K_t) \right]. \]

Proposition 4.10 is the dynamic information budget of the model. The total target uncertainty cannot exceed the cumulative parsed capacity along the states visited before identification. In this sense, a curriculum may need to spend rounds building the decoder before it can effectively use it: the states through which the learner passes determine the rate at which target information can be transmitted.

We define

\[ C^{\max}_m = \max_{K \in \mathcal{K}_m} C_m(K). \]

This maximum is well defined because, under Assumption 2.23, the reachable family K_m is finite by Proposition 2.24.

Theorem 4.11 (Global structural-information lower bound). Let τ be any completion time. Then

\[ \mathbb{E}[\tau] \geq \max\left\{ \mathbb{E}[L_m(\Theta)],\ \frac{H(\Theta)}{C^{\max}_m} \right\}. \]

Theorem 4.11 is the central speed law of the framework. Teaching is constrained simultaneously by prerequisite geometry and by information transmission. The lower bound takes the form of a maximum rather than an additive sum because structural progress may itself convey information about the target. Nevertheless both bottlenecks must be cleared.

Assumption 4.12 (Structural signal availability). For every concept u ∈ U_m, there exists a raw signal z ∈ Z such that tgt(z) = u.

Earlier we required only that every possible target concept admit a corresponding signal, that is, Ω ⊆ im(tgt). Assumption 4.12 is stronger: it requires signals for all concepts in the understanding horizon U_m, including intermediate prerequisites. This assumption ensures that the teacher can implement any valid ordered curriculum by emitting signals targeting the concepts that must be acquired along the path to the target.

Proposition 4.13 (Direct target signaling collapses the epistemic term in the baseline model). Let Ω_+ = {c ∈ Ω : π_0(c) > 0} be the support of the prior.

(i) If Ω ⊆ im(tgt), then H(Θ)/C^{max}_m ≤ 1.

(ii) Under Assumption 4.12, there exists an admissible teaching strategy with completion time τ satisfying τ ≤ L_m(Θ) + 1 almost surely, and hence E[τ] ≤ E[L_m(Θ)] + 1.

Proposition 4.13 clarifies the role of the information-theoretic layer in the baseline model. Once the learner has structurally reached the realized target, a single target-specific signal suffices for identification. Thus the dominant obstruction is typically structural: the learner must first acquire the prerequisites that make the target concept reachable.
The information-theoretic analysis nevertheless remains essential. It explains why target-specific instruction is ineffective before the relevant prerequisites are in place, and it provides a principled way to compare the informativeness of different acquired states through Blackwell dominance. In this view, structural progress builds the decoder, and information transmission becomes effective only after that decoder exists.

Example 4.14 (A common prerequisite can open a parseable identification channel). Consider the mind m = (C, A_m, E_m) with C = {a, b, d_1, d_2, d_3, d_4}, A_m = {a}, and expansion rules

\[ \{a\} \Rightarrow b, \qquad \{b\} \Rightarrow d_j, \quad j = 1, 2, 3, 4. \]

Thus b is a common prerequisite, and once b has been acquired, any of the four target concepts d_1, d_2, d_3, d_4 becomes reachable in one additional step. Let Ω = {d_1, d_2, d_3, d_4} with the uniform prior. Then H(Θ) = log 4 = 2. For each j = 1, 2, 3, 4, L_m(d_j) = 2, and therefore E[L_m(Θ)] = 2.

Let the raw signal alphabet be Z = {z_b, z_1, z_2, z_3, z_4}, with tgt(z_b) = b and tgt(z_j) = d_j for j = 1, 2, 3, 4. At the initial acquired concept set {a}, only b is ordered. Hence the parsed observation range is {z_b, ⊥}, so C_m({a}) = log 2. At the acquired concept set {a, b}, all five raw signals are parseable, so the parsed observation range is {z_b, z_1, z_2, z_3, z_4}, and therefore C_m({a, b}) = log 5. Thus acquiring the single prerequisite b enlarges the learner effective channel from 1 bit to log 5 bits per round.

Now suppose the teacher tries to identify the target immediately by sending Z_1 = z_j when Θ = d_j. At the raw-signal level this would reveal the target perfectly. But at the learner initial acquired concept set {a}, none of the targets d_j is ordered, so Y_1 = ρ_m(Z_1, {a}) = ⊥ almost surely. Hence I(Θ; Y_1 | F_0) = 0. Before the common prerequisite b is taught, target-specific instruction is pure erasure.

Consider instead the two-round strategy: Z_1 = z_b for every realization of Θ, and Z_2 = z_j if Θ = d_j. After round 1, the learner has acquired the prerequisite: K_1 = {a, b}. The posterior does not change, because the first signal is independent of Θ. At round 2, the signal z_j is parseable, so the learner observes Y_2 = z_j, acquires d_j, and identifies the target exactly. Thus τ = 2 almost surely.

The lower bound of Theorem 4.11 is therefore tight in this example. Since C^{max}_m = log 5, one obtains

\[ \mathbb{E}[\tau] \geq \max\left\{ \mathbb{E}[L_m(\Theta)],\ \frac{H(\Theta)}{C^{\max}_m} \right\} = \max\left\{ 2,\ \frac{2}{\log 5} \right\} = 2, \]

and the strategy above attains equality.

This example shows that an optimal teacher may rationally spend an entire round on structural preparation rather than on target-specific signaling, because target-specific signals are useless before the common prerequisite b has been acquired. In the baseline model, the information-theoretic term is not the binding lower bound here, since Proposition 4.13 implies that identification costs at most one additional round once the target is structurally reachable. The example nevertheless illustrates the central mechanism of the section: usable information is state dependent, and teaching may need to enlarge the learner parsed alphabet before target information can flow.
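Both sides of the speed law can be checked directly on this example. The sketch below reuses the helpers defined above; by the monotonicity of Lemma 4.4, C^max_m is attained already at {a, b} in this instance, since every raw signal is parseable there.

```python
# Verify Theorem 4.11 on Example 4.14 (helpers and instance defined above).
E_L = sum(structural_distance(rules3, {"a"}, f"d{j}") for j in range(1, 5)) / 4
C_max = capacity(Z3, tgt3, rules3, frozenset({"a", "b"}))  # max attained here
print(max(E_L, log2(4) / C_max))   # max(2, 2/log2(5)) = 2.0

# The two-round strategy attains the bound: first z_b, then z_j when Θ = d_j.
theta = "d3"
K = frozenset({"a"})
for z in ("zb", "z3"):
    K = update(K, parse(z, tgt3, rules3, K), tgt3)
print(theta in K)                  # True: acquisition (and identification) at τ = 2
```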
5 Structural Limits on Teaching

This section develops two consequences of the structural view of teaching. First, for a fixed learner mind, the prerequisite geometry creates threshold effects in finite-horizon teaching: below a critical time budget, completion is impossible for every strategy, while beyond that threshold success becomes feasible and, under mild assumptions, eventually likely. Second, for heterogeneous learners, structural incompatibilities generate an intrinsic inefficiency of universal broadcast curricula: a single common sequence of signals may be forced to pay separately for prerequisites that personalized teaching would handle individually.

Taken together, these results show that the limits of teaching are not merely informational. They are already encoded in the combinatorial structure of the learner prerequisite system. That structure determines both when teaching can begin to succeed and how costly common instruction becomes across different minds.

5.1 Structural thresholds in teaching

A central question in any teaching problem is: given a fixed time budget, what is the probability that teaching succeeds? The prerequisite structure of the learner determines the answer. Below a certain threshold, completion is impossible for every teaching strategy. Once the time horizon exceeds that threshold, completion is no longer ruled out a priori, and under mild assumptions the optimal fixed-horizon success probability converges to one as the horizon grows. The vanishing completion probability below the threshold is not an approximation but a direct consequence of the structural barrier. It also has an immediate implication for resource allocation: when training budgets are scarce, distributing time evenly across learners may produce no completed learners at all, whereas concentrating the same budget on fewer learners can yield strictly positive output.

Recall the stochastic teaching model from Section 3.1. The target concept Θ is drawn from a prior π_0 on Ω, known to both teacher and learner. By Definition 3.8, teaching is complete at the random time τ if both (i) the learner has acquired the target concept, Θ ∈ K_τ; and (ii) the learner has identified the target, π_τ(Θ) = 1. We therefore ask: if the teacher is given a budget of t rounds, what is the maximal probability of completing teaching within that budget? Define

\[ V(t) = \sup_{\text{admissible teaching strategies}} \mathbb{P}(\tau \leq t). \]

Thus V(t) is the optimal success probability achievable with a time budget of t rounds, computed under the prior on Θ. Recall also that for each target c ∈ Ω, the quantity L_m(c) denotes the structural distance from the axiom set A_m to a reachable acquired concept set containing c. Define

\[ L_{\min} = \min \{ L_m(c) : \pi_0(c) > 0 \}. \]

This is the smallest structural distance among targets that can arise under the prior. For expected completion time, the baseline model also admits the upper bound E[τ] ≤ E[L_m(Θ)] + 1 under Assumption 4.12 (Proposition 4.13). The fixed-horizon analysis below complements that statement by describing the threshold structure of success probabilities as a function of the time budget.

Proposition 5.1 (Zero completion below the structural threshold). For every t ∈ N,

\[ V(t) \leq \mathbb{P}\big( L_m(\Theta) \leq t \big). \]

In particular, V(t) = 0 for all t < L_min.

Proposition 5.1 shows that if the time budget is shorter than the structural depth of every possible target, then completion is impossible.
No teaching strategy can circumvent this obstruction, because the learner cannot be moved to a reachable acquired concept set containing the realized target in so few rounds. At the opposite extreme, if some admissible strategy completes teaching in finite expected time, then the optimal fixed-horizon success probability converges to one as the horizon grows.

Proposition 5.2 (Eventual success). If there exists an admissible teaching strategy such that E[τ] < ∞, then V(t) → 1 as t → ∞. More concretely, for any such strategy,

\[ V(t) \geq 1 - \frac{\mathbb{E}[\tau]}{t} \quad \text{for all } t \geq 1. \]

Together, Propositions 5.1 and 5.2 describe the qualitative shape of the fixed-horizon success function V(t): an initial region of structural impossibility, followed by a region in which success becomes increasingly likely as the time budget grows.

To make the allocation implications transparent, it is useful to consider the special case of a deterministic target. Let g ∈ U_m be fixed, and suppose that Θ = g almost surely. Then the prior is degenerate, so π_t = δ_g for all t. Hence identification is automatic, and completion reduces to target acquisition alone. Define the target-acquisition time of g by

\[ \tau_g = \inf \{ t \geq 0 : g \in K_t \}, \]

and define the fixed-horizon acquisition probability by

\[ V_g(t) = \sup_{\text{admissible teaching strategies}} \mathbb{P}(\tau_g \leq t). \]

Proposition 5.3 (Step function for deterministic targets). Assume that the parsing map is given by Definition 3.3 and that Assumption 4.12 holds. Then for every deterministic target g ∈ U_m,

\[
V_g(t) = \begin{cases} 0, & \text{if } t < L_m(g),\\ 1, & \text{otherwise.} \end{cases}
\]

Thus, for a deterministic target, the fixed-horizon acquisition probability is a step function at the structural distance L_m(g). Below that threshold acquisition is impossible; at and above it, acquisition can be achieved with certainty.

Remark 5.4. The threshold structure above contrasts with benchmark models of human-capital accumulation in which training is represented by a smooth production technology for human capital (e.g., [Ben-Porath, 1967, Becker, 1964]). In such models every marginal unit of investment yields a positive, though possibly diminishing, return. In the present framework, prerequisite-gated learning induces a threshold technology: a teaching signal has no effect until the learner prerequisite structure admits the target concept, after which additional signals become productive. The induced production technology is therefore non-concave.

This threshold structure has direct implications for the allocation of training resources. Consider a decision maker who must allocate a fixed instructional budget across learners, for example a firm training workers in a specific skill or an instructor allocating tutoring hours across students. The planner has a total budget of B instructional rounds and must decide how to distribute them across N learners; a toy computation of the resulting threshold effect follows, ahead of the formal statement.
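The computation below is illustrative only. It assumes the step-function acquisition probability established in Proposition 5.3, so that a learner completes if and only if it receives at least L = L_m(g) rounds; the specific numbers are hypothetical.

```python
def completed(budgets, L):
    """Completed learners under the step function of Proposition 5.3:
    a learner finishes iff it receives at least L = L_m(g) rounds."""
    return sum(1 for b in budgets if b >= L)

N, B, L = 10, 30, 5                 # illustrative numbers only
spread = [B // N] * N               # 3 rounds each: all below the threshold
k = min(N, B // L)                  # concentrate: L rounds to k learners
concentrated = [L] * k + [0] * (N - k)
print(completed(spread, L), completed(concentrated, L))  # 0 versus 6
```

The same total budget yields zero output when spread evenly and strictly positive output when concentrated, which is exactly the content of the next proposition.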
Proposition 5.5 (Allocation under structural thresholds). Assume that the parsing map is given by Definition 3.3 and that Assumption 4.12 holds. Fix a deterministic target g ∈ U_m with L = L_m(g) ≥ 1. Consider N identical learners and a total budget of B ∈ N instructional rounds.

(i) Any allocation that gives every learner fewer than L rounds yields zero completed learners.

(ii) There exists an allocation that gives L rounds to min{N, ⌊B/L⌋} learners and 0 rounds to the remaining learners, and under this allocation min{N, ⌊B/L⌋} learners complete.

In particular, if B < NL and the budget is spread so that every learner receives fewer than L rounds, then total output is zero, whereas the concentrated allocation in (ii) yields strictly positive output whenever B ≥ L.

Proposition 5.5 shows that evenly spreading a fixed training budget can waste the entire budget when every learner remains below the structural threshold. By contrast, concentrating the same budget on fewer learners allows those learners to cross the threshold and produce strictly positive output. The source of this effect is structural: for a deterministic target, additional training time has no effect until the prerequisite threshold L_m(g) is reached, at which point completion becomes possible. The zero-output region is therefore not imposed from outside the model but is a direct consequence of the learner prerequisite geometry.

For random targets, the step-function structure need not persist, because different targets may have different structural depths. What remains is the zero-completion phenomenon from Proposition 5.1: if every learner receives fewer than L_min = min{L_m(c) : π_0(c) > 0} rounds, then the completion probability is zero regardless of the teaching strategy. The qualitative allocation lesson therefore extends beyond the deterministic case: if the available budget is spread so thinly that every learner remains below the relevant structural threshold, no learner completes.

5.2 Limits of universal broadcast curricula

The preceding subsection concerned a single learner mind. We now turn to heterogeneous learners whose prerequisite structures differ. In that setting, a teacher restricted to a single broadcast curriculum cannot adapt instruction to individual minds. The next theorem shows that this restriction carries a structural cost: even when each learner can be taught efficiently by a personalized curriculum, any common broadcast may be forced to pay a linear penalty in the number of learner types.

Theorem 5.6 (Linear broadcast penalty for incompatible minds). Fix integers k ≥ 2 and L ≥ 2. Then one can construct

• a finite concept space C,
• a common axiom set A ⊆ C,
• a finite raw-signal alphabet Z together with a signal target map tgt : Z → C,
• minds m_1, ..., m_k on C with common axiom set A_{m_i} = A, i = 1, ..., k, but pairwise distinct rule sets E_{m_i},
• and a common deterministic target concept g ∈ C,

such that:

(i) for each i ∈ {1, ..., k}, there exists a valid ordered curriculum for m_i of length L whose final acquired concept set contains g;

(ii) if a common broadcast sequence Γ = (z_1, ..., z_T) ∈ Z^T is presented to all k minds, and if the induced acquired concept processes start from K^{(i)}_0 = A, i = 1, ..., k, and evolve according to

\[
K^{(i)}_{t+1} =
\begin{cases}
K^{(i)}_t \cup \{\operatorname{tgt}(z_{t+1})\}, & \text{if } \operatorname{tgt}(z_{t+1}) \in \Phi_{m_i}(K^{(i)}_t),\\
K^{(i)}_t, & \text{otherwise,}
\end{cases}
\qquad t = 0, \dots, T-1,
\]

then the condition g ∈ K^{(i)}_T for every i = 1, ..., k implies T ≥ k(L − 1) + 1;

(iii) there exists a common broadcast sequence of length k(L − 1) + 1 for which g ∈ K^{(i)}_{k(L-1)+1} for every i = 1, ..., k.
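One construction in the spirit of the theorem (a sketch; the construction in the appendix may differ in detail) gives each mind a private prerequisite chain of length L − 1 ending in the shared target g, and broadcasts all private chains followed by a single g-signal. Signals that advance one mind are unparseable by the others, so the broadcast length is exactly k(L − 1) + 1.

```python
def broadcast_run(rules_i, signals, tgt_map):
    """Acquired set of one mind after a common broadcast (rule of Thm 5.6(ii))."""
    K = frozenset({"a"})
    for z in signals:
        if tgt_map[z] in one_step_expansion(rules_i, K):
            K = K | {tgt_map[z]}
    return K

k, L = 3, 4
minds, tgt_map = [], {"zg": "g"}
for i in range(k):
    # Private chain of mind i: a ⇒ c_{i,1} ⇒ ... ⇒ c_{i,L-1} ⇒ g (L steps to g).
    chain = [f"c{i}_{j}" for j in range(1, L)]
    rules_i, prev = [], "a"
    for c in chain:
        rules_i.append((frozenset({prev}), c))
        prev = c
    rules_i.append((frozenset({prev}), "g"))
    minds.append(rules_i)
    tgt_map.update({f"z{c}": c for c in chain})

# Broadcast all private chains, then one shared g-signal: k(L-1)+1 rounds.
broadcast = [f"zc{i}_{j}" for i in range(k) for j in range(1, L)] + ["zg"]
print(len(broadcast))                                                  # 10
print(all("g" in broadcast_run(r, broadcast, tgt_map) for r in minds)) # True
```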
Theorem 5.6 is an existence result. For any prescribed number k of learner types and any prescribed personalized teaching length L, one can construct k minds sharing the same axiom set and the same deterministic target concept g, but having different prerequisite structures. For each mind, the target can be acquired in L personalized rounds. However, every common broadcast sequence that succeeds for all minds must have length at least k(L − 1) + 1.

The source of the penalty is purely structural. Each mind possesses a private prerequisite chain leading to the target concept, and signals that advance one mind along its chain are unparseable for the others. Consequently a universal broadcast cannot reuse prerequisite rounds across learner types; it must effectively pay for each private chain separately. This is what generates the linear dependence of the required broadcast length on the number of learner types.

Acknowledgments. The author thanks Teng Andrea Xu for helpful discussions.

Appendix

This document contains supplementary material for the paper A Mathematical Theory of Understanding.

A Proofs
  A.1 Proofs for Section 2
  A.2 Proofs for Section 3
  A.3 Proofs for Section 4
  A.4 Proofs for Section 5
B Additional results

A Proofs

This appendix collects the proofs omitted from the main text, organized by the section in which the corresponding result appears. Supplementary lemmas that are used in the proofs but not stated in the main text are included where they arise.

A.1 Proofs for Section 2

Proof of Lemma 2.7. Let c ∈ Φ_m(K). If c ∈ K, then c ∈ K′ ⊆ Φ_m(K′). Otherwise there exists (S, c) ∈ E_m with S ⊆ K. Since K ⊆ K′, one also has S ⊆ K′, hence c ∈ Φ_m(K′).

Lemma A.1 (Directed-union continuity). If (K_α)_{α∈A} is a nonempty directed family, then Φ_m(⋃_{α∈A} K_α) = ⋃_{α∈A} Φ_m(K_α).

Proof. The inclusion ⊇ follows from monotonicity of Φ_m. For the reverse inclusion, let c ∈ Φ_m(⋃_α K_α). If c ∈ ⋃_α K_α, then c ∈ Φ_m(K_α) for some α ∈ A. Otherwise there exists a finite set S ⊆ ⋃_α K_α with (S, c) ∈ E_m. For each s ∈ S, choose α_s such that s ∈ K_{α_s}. Because the family is directed and S is finite, there exists γ with ⋃_{s∈S} K_{α_s} ⊆ K_γ. Hence S ⊆ K_γ, so c ∈ Φ_m(K_γ).

Proof of Proposition 2.10. (i) We first show that the collection of fixed points of Φ_m containing K is non-empty. Consider the entire concept space C. By extensiveness, C ⊆ Φ_m(C). For the reverse inclusion: every expansion rule (S, c) ∈ E_m has c ∈ C by definition, so Φ_m(C) ⊆ C. Together, Φ_m(C) = C, and clearly K ⊆ C. So C is a fixed point containing K. By Definition 2.8, cl_m(K) = ⋂ {F ⊆ C : K ⊆ F, Φ_m(F) = F}. The intersection is over a non-empty collection, so cl_m(K) is well-defined. We now show it is itself a fixed point. For any fixed point F ⊇ K in the collection, cl_m(K) ⊆ F, so monotonicity gives Φ_m(cl_m(K)) ⊆ Φ_m(F) = F. Since this holds for every such F, we get Φ_m(cl_m(K)) ⊆ cl_m(K). Extensiveness gives the other direction: cl_m(K) ⊆ Φ_m(cl_m(K)). Together: Φ_m(cl_m(K)) = cl_m(K).

(ii) By extensiveness, the sequence K ⊆ Φ_m(K) ⊆ Φ²_m(K) ⊆ ··· is non-decreasing.
Let L = ⋃_{n=0}^∞ Φ^n_m(K). By Lemma A.1,

\[ \Phi_m(L) = \Phi_m\Big( \bigcup_{n=0}^{\infty} \Phi_m^n(K) \Big) = \bigcup_{n=0}^{\infty} \Phi_m^{n+1}(K) = L, \]

so L is a fixed point of Φ_m containing K. Since cl_m(K) is the least such fixed point, cl_m(K) ⊆ L. Conversely, we show by induction that Φ^n_m(K) ⊆ cl_m(K) for all n ≥ 0. Base case: Φ⁰_m(K) = K ⊆ cl_m(K) by definition. Inductive step: suppose Φ^n_m(K) ⊆ cl_m(K). Since cl_m(K) is a fixed point, Φ_m(cl_m(K)) = cl_m(K). Monotonicity then gives Φ^{n+1}_m(K) = Φ_m(Φ^n_m(K)) ⊆ Φ_m(cl_m(K)) = cl_m(K). Since Φ^n_m(K) ⊆ cl_m(K) for every n, we get L = ⋃_{n=0}^∞ Φ^n_m(K) ⊆ cl_m(K).

(iii) If C is finite, the chain K ⊆ Φ¹_m(K) ⊆ Φ²_m(K) ⊆ ··· is an increasing sequence of subsets of C. Whenever Φ^{n+1}_m(K) ≠ Φ^n_m(K), the inclusion is strict, so |Φ^{n+1}_m(K)| ≥ |Φ^n_m(K)| + 1. Since each set has at most |C| elements, strict growth can occur at most |C| − |K| ≤ |C| times. Therefore Φ^N_m(K) = Φ^{N+1}_m(K) for some N ≤ |C|, and the chain stabilizes: cl_m(K) = Φ^N_m(K).

Proof of Proposition 2.11. We first show that U_m = cl_m(A_m) satisfies (i) to (iii).

(i) By Proposition 2.10 (ii), cl_m(A_m) = ⋃_{n≥0} Φ^n_m(A_m). Since Φ⁰_m(A_m) = A_m, it follows that A_m ⊆ cl_m(A_m).

(ii) Let (S, c) ∈ E_m and suppose S ⊆ cl_m(A_m). By Proposition 2.10 (i), cl_m(A_m) is a fixed point of Φ_m, so Φ_m(cl_m(A_m)) = cl_m(A_m). Since (S, c) ∈ E_m and S ⊆ cl_m(A_m), the definition of Φ_m gives c ∈ Φ_m(cl_m(A_m)) = cl_m(A_m).

(iii) Let F ⊆ C satisfy (i) and (ii). Then A_m ⊆ F. Moreover, if c ∈ Φ_m(F), then either c ∈ F, or there exists (S, c) ∈ E_m with S ⊆ F, in which case (ii) gives c ∈ F. Hence Φ_m(F) ⊆ F. By extensiveness of Φ_m, we also have F ⊆ Φ_m(F). Therefore Φ_m(F) = F, so F is a fixed point of Φ_m containing A_m. Since cl_m(A_m) is the intersection of all such fixed points, we conclude that cl_m(A_m) ⊆ F. Thus cl_m(A_m) is the smallest set satisfying (i) and (ii).

Finally, suppose U and U′ both satisfy (i)-(iii). Since U′ satisfies (i) and (ii), the minimality property (iii) for U implies U ⊆ U′. By symmetry, U′ ⊆ U. Hence U = U′.

Proof of Theorem 2.14. The proof of Theorem 2.14 relies on the finiteness of derivation trees, which we establish first.

Lemma A.2 (Finiteness of derivations). Every derivation tree in the sense of Definition 2.12 is finite.

Proof of Lemma A.2. Assume for contradiction that the derivation tree is infinite. By Definition 2.12 (i), every node has finitely many children, since each prerequisite set is finite. Thus the tree is finitely branching. By König's lemma [Diestel, 2024, Lemma 8.1.2], every infinite finitely branching tree has an infinite descending path. This contradicts the well-foundedness requirement in Definition 2.12. Therefore the tree is finite.

Proof of Theorem 2.14. For (⇐), suppose K ⊢_m c. By Lemma A.2, the derivation tree is finite. We argue by induction on its height. If the height is 0, then either c ∈ K, or (∅, c) ∈ E_m; in either case c ∈ cl_m(K). For the induction step, if the root uses a rule (S, c) and each child label s ∈ S has a derivation of smaller height, then by the induction hypothesis S ⊆ cl_m(K), hence c ∈ Φ_m(cl_m(K)) = cl_m(K).

For (⇒), let D = {d ∈ C : K ⊢_m d}.
We show that D is a fixed point of Φ_m containing K. First, K ⊆ D: for any c ∈ K, the single-node tree with root labeled c is a valid derivation, so c ∈ D. Next, we show that Φ_m(D) ⊆ D. Let c ∈ Φ_m(D). If c ∉ D, then there exists (S, c) ∈ E_m with S ⊆ D. For each s ∈ S, choose a derivation of s from K and attach them below a new root labeled c. This gives a derivation of c, a contradiction. If S = ∅, the new root has no children and still forms a valid derivation. Thus c ∈ D, and Φ_m(D) ⊆ D. By extensiveness, D ⊆ Φ_m(D), so D is a fixed point containing K. Therefore cl_m(K) ⊆ D. By definition of D, this means that if c ∈ cl_m(K), then K ⊢_m c.

Proof of Theorem 2.15. We first recall the abstract definition of an algebraic closure operator.

Definition A.3 (Algebraic closure operator). Let X be a set. A map f : 2^X → 2^X is an algebraic closure operator if it satisfies extension, monotonicity, idempotence, and the finitary property: if c ∈ f(K), then c ∈ f(S) for some finite S ⊆ K.

Proof of Theorem 2.15. For (i), extension, monotonicity, and idempotence of cl_m follow from Proposition 2.10. Finitariness follows from Theorem 2.14 and Lemma A.2: if c ∈ cl_m(K), then there is a finite derivation tree using only finitely many base labels from K.

For (ii), define E = {(S, c) : S ⊆ X finite and c ∈ f(S) \ S}. Let g be the closure operator induced by E as in the theorem statement. We show g(K) = f(K) for every K ⊆ X. First, g(K) ⊆ f(K) because f(K) is a fixed point of Ψ_E containing K: if c ∈ Ψ_E(f(K)), then either c ∈ f(K) or else there exists (S, c) ∈ E with S ⊆ f(K), which implies c ∈ f(S) ⊆ f(f(K)) = f(K). Conversely, if c ∈ f(K), then by algebraicity there exists a finite S_0 ⊆ K such that c ∈ f(S_0). If c ∈ S_0, then c ∈ K ⊆ g(K). Otherwise (S_0, c) ∈ E, and since S_0 ⊆ K ⊆ g(K), the rule fires inside g(K), so c ∈ g(K).

Proof of Theorem 2.19. If c* ∈ A_m, the empty curriculum works. Assume therefore that c* ∉ A_m. Since c* ∈ U_m = cl_m(A_m), Theorem 2.14 implies that there exists a derivation tree of c* from A_m. By Lemma A.2, this derivation tree is finite. Let R be the set of all non-base rule nodes in this derivation tree. Form a directed graph on R by retaining the parent-child relation between rule nodes and orienting each edge from child to parent. Because the derivation tree is finite and well-founded, this directed graph is finite and acyclic. Hence it admits a topological ordering v_1, ..., v_L. For each i = 1, ..., L, let (S_i, c_i) be the expansion rule attached to the node v_i. Define

\[ K_0 = A_m, \qquad K_i = K_{i-1} \cup \{c_i\} \quad \text{for } i = 1, \dots, L. \]

We claim that γ = ((S_1, c_1), ..., (S_L, c_L)) is a valid ordered curriculum starting from A_m. Indeed, fix i ∈ {1, ..., L} and let s ∈ S_i. In the derivation tree, the child corresponding to s is either (i) a base node, in which case s ∈ A_m = K_0 ⊆ K_{i−1}; or (ii) a rule node. In that case this child must occur earlier than v_i in the topological order, say it is v_j with j < i. Its label is then c_j = s, so s ∈ K_j ⊆ K_{i−1}. Thus every prerequisite in S_i belongs to K_{i−1}, so S_i ⊆ K_{i−1}. Since also (S_i, c_i) ∈ E_m by construction, each step is valid. Therefore γ is a valid ordered curriculum.
Finally, the root of the derivation tree is a rule node labelled by c*. Hence it is one of the nodes v_1, ..., v_L, say v_r, and therefore c_r = c*. It follows that c* ∈ K_r ⊆ K_L. So the curriculum reaches a final acquired concept set containing c*.

Proof of Proposition 2.21. We argue by induction on i. For i = 0, K_0 = A_m ⊆ U_m by Proposition 2.11 (i). Now suppose K_{i−1} ⊆ U_m. Since γ is valid, (S_i, c_i) ∈ E_m and S_i ⊆ K_{i−1} ⊆ U_m. By Proposition 2.11 (ii), this implies c_i ∈ U_m. Hence K_i = K_{i−1} ∪ {c_i} ⊆ U_m. This proves the claim for all i. The final statement follows immediately.

Proof of Proposition 2.24. Because U_m is finite by Assumption 2.23, the power set 2^{U_m} is finite. Since K_m ⊆ 2^{U_m}, it follows that K_m is finite.

For (i), the trivial chain of length zero shows that A_m ∈ K_m. Moreover, every reachable set contains A_m, since every witnessing chain starts from A_m and only adds concepts. Thus A_m is the minimum element of (K_m, ⊆).

For (ii), let K ∈ K_m with K ≠ A_m. By definition of reachability, there exists a witnessing chain

\[ A_m = K_0 \subset K_1 \subset \cdots \subset K_L = K \]

such that K_{i+1} = K_i ∪ {c_i}, c_i ∈ Φ_m(K_i) \ K_i (i = 0, ..., L−1). Since K ≠ A_m, one has L ≥ 1. Then K_{L−1} ∈ K_m by Lemma B.2, K_{L−1} ⊂ K, and |K \ K_{L−1}| = 1. This proves (ii).

For (iii), let K, K′ ∈ K_m. Choose a witnessing chain for K′:

\[ A_m = K'_0 \subset K'_1 \subset \cdots \subset K'_s = K', \qquad K'_{i+1} = K'_i \cup \{c_i\}, \quad c_i \in \Phi_m(K'_i) \setminus K'_i. \]

For each i = 0, ..., s, define L_i = K ∪ K′_i. Then L_0 = K and L_s = K ∪ K′. Since K′_i ⊆ L_i, monotonicity of Φ_m gives Φ_m(K′_i) ⊆ Φ_m(L_i). Hence, whenever c_i ∉ L_i, one has c_i ∈ Φ_m(K′_i) ⊆ Φ_m(L_i), so L_{i+1} = L_i ∪ {c_i} is a valid extension. If instead c_i ∈ L_i, then L_{i+1} = L_i. Removing repeated sets from the sequence (L_i)_{i=0}^s yields a valid chain from K to K ∪ K′. Concatenating this chain with any witnessing chain from A_m to K shows that K ∪ K′ ∈ K_m. Thus K_m is union-closed.

For (iv), we first show that U_m ∈ K_m. Let K ∈ K_m with K ≠ U_m. Suppose, toward a contradiction, that Φ_m(K) = K. Then K is a fixed point of Φ_m containing A_m. Since U_m = cl_m(A_m) is the least fixed point containing A_m, it follows that U_m ⊆ K. But by definition of K_m, one also has K ⊆ U_m, hence K = U_m, a contradiction. Therefore Φ_m(K) \ K ≠ ∅. Choose any c ∈ Φ_m(K) \ K. Because K ⊆ U_m and U_m is a fixed point of Φ_m, monotonicity gives Φ_m(K) ⊆ Φ_m(U_m) = U_m, so in particular c ∈ U_m. Hence K ∪ {c} is again a reachable subset of U_m. Starting from A_m, repeat this step as long as Φ_m(K) \ K ≠ ∅. Because U_m is finite and each step strictly enlarges the set, the process terminates after finitely many steps at some reachable set F ⊆ U_m satisfying Φ_m(F) = F. Since F is a fixed point containing A_m, minimality of U_m = cl_m(A_m) implies U_m ⊆ F. As also F ⊆ U_m, we conclude that F = U_m. Therefore U_m ∈ K_m. Since every element of K_m is by definition a subset of U_m, it follows that U_m is the maximum element of (K_m, ⊆).

Finally, (v) follows from (iii). For any K, K′ ∈ K_m, the set K ∪ K′ belongs to K_m and is clearly an upper bound of K and K′. If M ∈ K_m is any other upper bound, so that K ⊆ M and K′ ⊆ M, then K ∪ K′ ⊆ M. Hence K ∪ K′ is the least upper bound.
Proof of Corollary 2.26. The result follows directly from Proposition 2.24 and Definition 2.25.

Proof of Theorem 2.27. We prove (ii) ⇒ (i) and (i) ⇒ (ii) separately.

(ii) ⇒ (i). Assume there exists a mind m = (C, A, E_m) such that K_m = F. By Corollary 2.26, the family K_m is an A-based learning space. Hence so is F.

(i) ⇒ (ii). Assume that F is an A-based learning space, and define m_F = (C, A, E_F) using the canonical rule set above. Let K_{m_F} denote the reachable family generated by this mind. We prove that K_{m_F} = F.

Step 1: K_{m_F} ⊆ F. Let K ∈ K_{m_F}. Choose a witnessing chain

\[ A = K_0 \subset K_1 \subset \cdots \subset K_L = K \]

such that K_{i+1} = K_i ∪ {c_i}, c_i ∈ Φ_{m_F}(K_i) \ K_i for i = 0, ..., L−1. We prove by induction on i that K_i ∈ F for all i. For i = 0, one has K_0 = A ∈ F. Now suppose K_i ∈ F. Since c_i ∈ Φ_{m_F}(K_i) \ K_i, there exists a rule (S, c_i) ∈ E_F with S ⊆ K_i. By definition of E_F, S ∈ F and S ∪ {c_i} ∈ F. Because K_i ∈ F and F is union-closed, K_i ∪ (S ∪ {c_i}) ∈ F. Since S ⊆ K_i, this simplifies to K_i ∪ {c_i} = K_{i+1} ∈ F. Thus every K_i lies in F, and in particular K ∈ F. Hence K_{m_F} ⊆ F.

Step 2: F ⊆ K_{m_F}. Let K ∈ F. If K = A, then K is reachable by the trivial chain. Assume now that K ≠ A. Since F is an A-based learning space, repeated application of accessibility yields a descending chain

\[ K = K_L \supset K_{L-1} \supset \cdots \supset K_0 = A \]

such that each K_i ∈ F and K_i = K_{i−1} ∪ {x_i} for i = 1, ..., L. Reverse the chain: A = K_0 ⊂ K_1 ⊂ ··· ⊂ K_L = K. For each i = 1, ..., L, both K_{i−1} and K_i = K_{i−1} ∪ {x_i} belong to F. Therefore, by the definition of E_F, (K_{i−1}, x_i) ∈ E_F. Hence x_i ∈ Φ_{m_F}(K_{i−1}) \ K_{i−1}, so every step in the chain is a valid reachable extension. Thus K is reachable from A, which shows that K ∈ K_{m_F}. Therefore F ⊆ K_{m_F}.

Combining the two inclusions gives K_{m_F} = F.

A.2 Proofs for Section 3

Proof of Lemma 3.5. We argue by induction on t. For t = 0, K_0 = A_m ∈ K_m by the trivial witnessing chain. Now suppose K_t ∈ K_m almost surely. If Y_{t+1} = ⊥, then by Definition 3.4, K_{t+1} = K_t, hence K_{t+1} ∈ K_m. If Y_{t+1} ∈ Z, define c_{t+1} = tgt(Y_{t+1}). Since the parser outputs a non-null signal, Definition 3.3 implies that c_{t+1} ∈ Φ_m(K_t). Hence either c_{t+1} ∈ K_t, in which case K_{t+1} = K_t, or else c_{t+1} ∈ Φ_m(K_t) \ K_t, in which case K_{t+1} = K_t ∪ {c_{t+1}} is a valid one-step reachable extension from K_t in the sense of Definition 2.22. Since K_t ∈ K_m, it follows that K_{t+1} ∈ K_m. This proves the claim.

Proof of Proposition 3.10. By the definition of conditional mutual information,

\[ I(\Theta; Y_{t+1} \mid \mathcal{F}_t) = H(\Theta \mid \mathcal{F}_t) - \mathbb{E}[H(\Theta \mid \mathcal{F}_t \vee \sigma(Y_{t+1})) \mid \mathcal{F}_t]. \]

Since F_{t+1} = F_t ∨ σ(Y_{t+1}) and H_t = H(Θ | F_t), this becomes

\[ I(\Theta; Y_{t+1} \mid \mathcal{F}_t) = H_t - \mathbb{E}[H_{t+1} \mid \mathcal{F}_t]. \]

Because H_t is F_t-measurable, H_t − E[H_{t+1} | F_t] = E[H_t − H_{t+1} | F_t].

Proof of Theorem 3.11. By Proposition 3.10, H_t − E[H_{t+1} | F_t] = I(Θ; Y_{t+1} | F_t) ≥ 0. Hence (H_t) is a supermartingale. Equality holds if and only if I(Θ; Y_{t+1} | F_t) = 0, which is equivalent to conditional independence of Θ and Y_{t+1} given F_t.

Proof of Theorem 3.13. The proof of Theorem 3.13 relies on the following lemma, which we establish first.

Lemma A.4 (Unparseability erases information). Let C_{t+1} = tgt(Z_{t+1}) and U_{t+1} = {C_{t+1} ∉ Φ_m(K_t)}.
Then:

(i) on U_{t+1} one has Y_{t+1} = ⊥ almost surely, and therefore

\[ I(\Theta; Y_{t+1} \mid \mathcal{F}_t, U_{t+1}) = 0, \qquad I(Z_{t+1}; Y_{t+1} \mid \mathcal{F}_t, U_{t+1}) = 0; \]

(ii) if P(U_{t+1} | F_t) = 1, then

\[ I(\Theta; Y_{t+1} \mid \mathcal{F}_t) = 0, \qquad I(Z_{t+1}; Y_{t+1} \mid \mathcal{F}_t) = 0. \]

Proof. On U_{t+1}, Definition 3.3 gives Y_{t+1} = ρ_m(Z_{t+1}, K_t) = ⊥ almost surely. Hence conditional on (F_t, U_{t+1}), the random variable Y_{t+1} is constant, so all the relevant conditional entropies are zero. This proves (i). If P(U_{t+1} | F_t) = 1, then U_{t+1} occurs almost surely conditional on F_t, so Y_{t+1} = ⊥ almost surely conditional on F_t. Again all relevant conditional entropies are zero, proving (ii).

Proof of Theorem 3.13. On U_{t+1}, Lemma A.4 gives I(Θ; Y_{t+1} | F_t, U_{t+1}) = 0. On U^c_{t+1}, Proposition B.3 yields I(Θ; Y_{t+1} | F_t, U^c_{t+1}) = I(Θ; Z_{t+1} | F_t, U^c_{t+1}) > 0.

A.3 Proofs for Section 4

Lemma A.5 (Explicit formula for the parsed entropy bound). Assume Z is finite. Then, for every K ∈ K_m,

\[
C_m(K) =
\begin{cases}
\log\big( |\mathcal{Z}_{\mathrm{ord}}(K)| + 1 \big), & \text{if } \mathcal{Z}_{\mathrm{ord}}(K) \subsetneq \mathcal{Z},\\
\log |\mathcal{Z}|, & \text{if } \mathcal{Z}_{\mathrm{ord}}(K) = \mathcal{Z}.
\end{cases}
\]

Proof of Lemma A.5. Fix K ∈ K_m and define the parsed observation range Y(K) = {ρ_m(z, K) : z ∈ Z} ⊆ Z ∪ {⊥}. For any Z-valued random variable Z, the random variable ρ_m(Z, K) takes values in Y(K) almost surely, so H(ρ_m(Z, K)) ≤ log |Y(K)| by [Cover and Thomas, 2006, p. 41]. Taking the supremum over all such Z gives C_m(K) ≤ log |Y(K)|.

For the reverse inequality, let M = |Y(K)|. For each y ∈ Y(K), choose some representative z_y ∈ Z such that ρ_m(z_y, K) = y. Define a Z-valued random variable Z by P(Z = z_y) = 1/M for each y ∈ Y(K), and P(Z = z) = 0 for all other z ∈ Z. Then ρ_m(Z, K) is uniform on Y(K), so H(ρ_m(Z, K)) = log |Y(K)|. Hence C_m(K) = log |Y(K)|.

Under the parsing map ρ_m, one has

\[
\rho_m(z, K) =
\begin{cases}
z, & \text{if } z \in \mathcal{Z}_{\mathrm{ord}}(K),\\
\bot, & \text{if } z \notin \mathcal{Z}_{\mathrm{ord}}(K).
\end{cases}
\]

If Z_ord(K) ⊊ Z, then Y(K) = Z_ord(K) ∪ {⊥}, so |Y(K)| = |Z_ord(K)| + 1. If instead Z_ord(K) = Z, then every raw signal is parseable, so Y(K) = Z. Substituting these two cases into C_m(K) = log |Y(K)| proves the claim.

Proof of Proposition 4.3. Because K_t is F_t-measurable, conditional on F_t the law of Y_{t+1} = ρ_m(Z_{t+1}, K_t) is obtained by passing the conditional law of Z_{t+1} through the fixed map z ↦ ρ_m(z, K_t). Therefore,

\[ I(\Theta; Y_{t+1} \mid \mathcal{F}_t) \leq H(Y_{t+1} \mid \mathcal{F}_t) \leq C_m(K_t) \quad \text{almost surely.} \]

Proof of Lemma 4.4. Assume K, K′ ∈ K_m with K ⊆ K′. By Lemma 2.7, Φ_m(K) ⊆ Φ_m(K′), hence Z_ord(K) ⊆ Z_ord(K′). Define g_{K,K′} : Z ∪ {⊥} → Z ∪ {⊥} by

\[
g_{K,K'}(y) =
\begin{cases}
y, & \text{if } y \in \mathcal{Z}_{\mathrm{ord}}(K),\\
\bot, & \text{otherwise.}
\end{cases}
\]

Then for every z ∈ Z, ρ_m(z, K) = g_{K,K′}(ρ_m(z, K′)). Indeed, if z ∈ Z_ord(K), then z is ordered at both sets and both sides equal z. If z ∉ Z_ord(K), then the left-hand side is ⊥; on the right-hand side, either ρ_m(z, K′) = ⊥, or else ρ_m(z, K′) = z and g_{K,K′}(z) = ⊥. Now let Z be any Z-valued random variable. Then ρ_m(Z, K) = g_{K,K′}(ρ_m(Z, K′)) almost surely. Thus ρ_m(Z, K) is a deterministic function of ρ_m(Z, K′). By the data processing inequality, H(ρ_m(Z, K)) ≤ H(ρ_m(Z, K′)). Taking suprema over all Z-valued random variables Z yields C_m(K) ≤ C_m(K′).
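The explicit formula of Lemma A.5 can be checked numerically against the enumeration-based capacity from the main-text sketches (an illustration only, on the Example 4.14 instance; capacity, one_step_expansion, rules3, tgt3, Z3, and log2 are as defined there):

```python
def capacity_formula(Z, tgt_map, rules, K):
    """Lemma A.5: log(|Z_ord(K)| + 1) when some signal is unparseable at K,
    and log |Z| when every signal is parseable."""
    n_ord = sum(tgt_map[z] in one_step_expansion(rules, K) for z in Z)
    return log2(n_ord + 1 if n_ord < len(Z) else len(Z))

for K in (frozenset({"a"}), frozenset({"a", "b"}), frozenset({"a", "b", "d1"})):
    assert capacity_formula(Z3, tgt3, rules3, K) == capacity(Z3, tgt3, rules3, K)
    print(sorted(K), capacity_formula(Z3, tgt3, rules3, K))
# 1.0, then log2(5) twice: the bound is non-decreasing along ⊆-chains,
# in line with Lemma 4.4.
```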
Proof of Theorem 4.6  Let $g_{K,K'}$ be the deterministic map constructed in the proof of Lemma 4.4. For every raw signal $z \in Z$, one has $\rho_m(z, K) = g_{K,K'}(\rho_m(z, K'))$. Therefore, conditional on the public history $h_t$,
$$\rho_m(Z_{t+1}, K) = g_{K,K'}\big(\rho_m(Z_{t+1}, K')\big)$$
almost surely. Hence, for every $\omega \in \Omega$ and every $y \in Z \cup \{\bot\}$,
$$W_{K,h_t}(y \mid \omega) = \sum_{y' \in Z \cup \{\bot\}} G_{K,K'}(y \mid y')\, W_{K',h_t}(y' \mid \omega),$$
where $G_{K,K'}(y \mid y') = \mathbf{1}\{g_{K,K'}(y') = y\}$ is the Markov kernel induced by $g_{K,K'}$. Thus $W_{K,h_t}$ is obtained from $W_{K',h_t}$ by post-processing through a Markov kernel independent of $\omega$. Therefore $W_{K,h_t}$ is a garbling of $W_{K',h_t}$, and $W_{K',h_t}$ Blackwell-dominates $W_{K,h_t}$.

Proof of Proposition 4.8  Fix a sample path. By the concept-acquisition update rule, each round adds at most one new concept to the learner's acquired concept set. If completion occurs at time $\tau$, then in particular $\Theta \in K_\tau$. Delete repeated sets from the sequence $K_0, K_1, \ldots, K_\tau$. The resulting strictly increasing sequence is of the form
$$A_m = K_{i_0} \subset K_{i_1} \subset \cdots \subset K_{i_r},$$
where each step adds one concept belonging to the one-step expansion of the previous set. Hence it is a prerequisite-respecting chain ending at a set containing $\Theta$. By definition of $L_m(\Theta)$, any such chain has length at least $L_m(\Theta)$. Since the number of strict acquisitions up to time $\tau$ is at most $\tau$, it follows that $\tau \ge L_m(\Theta)$ almost surely. Taking expectations completes the proof.

Proof of Lemma 4.9  By Proposition 3.10, $\mathbb{E}[H_t - H_{t+1} \mid \mathcal{F}_t] = I(\Theta; Y_{t+1} \mid \mathcal{F}_t)$. Multiplying by $\mathbf{1}\{\tau_{\mathrm{id}} > t\}$ and taking expectations gives
$$\mathbb{E}\big[\mathbf{1}\{\tau_{\mathrm{id}} > t\}(H_t - H_{t+1})\big] = \mathbb{E}\big[\mathbf{1}\{\tau_{\mathrm{id}} > t\}\, I(\Theta; Y_{t+1} \mid \mathcal{F}_t)\big].$$
Summing from $t = 0$ to $n - 1$ yields
$$\mathbb{E}\Bigg[\sum_{t=0}^{n-1} \mathbf{1}\{\tau_{\mathrm{id}} > t\}\, I(\Theta; Y_{t+1} \mid \mathcal{F}_t)\Bigg] = \mathbb{E}[H_0] - \mathbb{E}[H_{\tau_{\mathrm{id}} \wedge n}].$$
Since $\Theta$ is $\mathcal{F}_{\tau_{\mathrm{id}}}$-measurable, one has $H_{\tau_{\mathrm{id}}} = H(\Theta \mid \mathcal{F}_{\tau_{\mathrm{id}}}) = 0$ almost surely. Also, $0 \le H_t \le \log |\Omega|$ for all $t$. Let
$$S_n = \sum_{t=0}^{n-1} \mathbf{1}\{\tau_{\mathrm{id}} > t\}\, I(\Theta; Y_{t+1} \mid \mathcal{F}_t).$$
Because the summands are nonnegative, $S_n$ increases almost surely to $\sum_{t=0}^{\tau_{\mathrm{id}}-1} I(\Theta; Y_{t+1} \mid \mathcal{F}_t)$. Monotone convergence and bounded convergence therefore give
$$\mathbb{E}\Bigg[\sum_{t=0}^{\tau_{\mathrm{id}}-1} I(\Theta; Y_{t+1} \mid \mathcal{F}_t)\Bigg] = \mathbb{E}[H_0].$$
Since $\mathcal{F}_0$ is trivial, $\mathbb{E}[H_0] = H(\Theta)$.

Proof of Proposition 4.10  By Lemma 4.9,
$$H(\Theta) = \mathbb{E}\Bigg[\sum_{t=0}^{\tau_{\mathrm{id}}-1} I(\Theta; Y_{t+1} \mid \mathcal{F}_t)\Bigg].$$
By Proposition 4.3, $I(\Theta; Y_{t+1} \mid \mathcal{F}_t) \le C_m(K_t)$ almost surely for every $t$. Substituting this bound inside the sum yields the result.

Proof of Theorem 4.11  By Proposition 4.8, the structural bound follows: $\mathbb{E}[\tau] \ge \mathbb{E}[L_m(\Theta)]$. For the epistemic part, define the identification time $\tau_{\mathrm{id}} = \inf\{t \ge 0 : H(\Theta \mid \mathcal{F}_t) = 0\}$. Since $\{\tau_{\mathrm{id}} \le t\} = \{H(\Theta \mid \mathcal{F}_t) = 0\} \in \mathcal{F}_t$, $\tau_{\mathrm{id}}$ is an $(\mathcal{F}_t)$-stopping time. Moreover, if $\tau$ is a completion time then identification must already have occurred, so $\tau_{\mathrm{id}} \le \tau$ almost surely. By Proposition 4.10,
$$H(\Theta) \le \mathbb{E}\Bigg[\sum_{t=0}^{\tau_{\mathrm{id}}-1} C_m(K_t)\Bigg].$$
Since $K_t \in \mathcal{K}_m$ almost surely by Lemma 3.5, $C_m(K_t) \le C_m^{\max}$ almost surely. Therefore
$$H(\Theta) \le \mathbb{E}\Bigg[\sum_{t=0}^{\tau_{\mathrm{id}}-1} C_m(K_t)\Bigg] \le C_m^{\max}\, \mathbb{E}[\tau_{\mathrm{id}}] \le C_m^{\max}\, \mathbb{E}[\tau].$$
Rearranging yields $\mathbb{E}[\tau] \ge H(\Theta)/C_m^{\max}$. Combining the two bounds gives the theorem.
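Theorem 4.11 combines the structural and epistemic limits into $\mathbb{E}[\tau] \ge \max\{\mathbb{E}[L_m(\Theta)],\, H(\Theta)/C_m^{\max}\}$. A minimal numerical sketch, with a hypothetical prior and depth table chosen purely for illustration, shows which of the two limits binds:

```python
import math

def expected_completion_lower_bound(prior, depth, c_max):
    """Theorem 4.11 sketch: E[tau] >= max(E[L_m(Theta)], H(Theta)/C_max).
    `prior` maps each candidate target to its probability, `depth` to its
    structural distance L_m(.); entropies are in nats."""
    structural = sum(prior[c] * depth[c] for c in prior)
    entropy = -sum(p * math.log(p) for p in prior.values() if p > 0)
    return max(structural, entropy / c_max)

# Uniform prior over two targets at depths 3 and 5, with C_max = log 4:
# the structural limit (4 rounds) binds, not the epistemic one (0.5).
print(expected_completion_lower_bound(
    {"g1": 0.5, "g2": 0.5}, {"g1": 3, "g2": 5}, math.log(4)))  # 4.0
```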
Proof of Proposition 4.13  For each $c \in \Omega^+$, choose one raw signal $z_c \in Z$ such that $\mathrm{tgt}(z_c) = c$. Because $\mathrm{tgt}$ is a function, the signals $(z_c)_{c \in \Omega^+}$ are pairwise distinct. Since $\Omega \subseteq U_m$ and $U_m$ is a fixed point of $\Phi_m$, every target concept $c \in \Omega$ is ordered at $U_m$. Hence each $z_c$ is parseable at $U_m$, so the parsed observation range at $U_m$ contains at least the distinct symbols $\{z_c : c \in \Omega^+\}$. Therefore, by Lemma A.5,
$$C_m^{\max} \ge C_m(U_m) \ge \log |\Omega^+|.$$
Since entropy is bounded by the logarithm of the support size, $H(\Theta) \le \log |\Omega^+|$, which yields $H(\Theta) \le C_m^{\max}$. This proves (i).

For (ii), fix $c \in \Omega^+$. By definition of $L_m(c)$, there exists a valid ordered curriculum of length $L_m(c)$ from $A_m$ to a set containing $c$. Under Assumption 4.12, the teacher can implement that curriculum by sending one raw signal targeting each concept along the path. After $L_m(c)$ rounds, the learner has acquired $c$. In one additional round, the teacher sends the fixed representative signal $z_c$. Because $c \in K_{L_m(c)}$, the signal $z_c$ is parseable at that state. Since the strategy specifies a unique representative signal for each possible target, the learner identifies the realized target after observing $z_c$. Thus $\tau \le L_m(\Theta) + 1$ almost surely. Taking expectations completes the proof.

A.4 Proofs for Section 5

Proof of Proposition 5.1  By Proposition 4.8, every completion time $\tau$ satisfies $\tau \ge L_m(\Theta)$ almost surely. Hence $\{\tau \le t\} \subseteq \{L_m(\Theta) \le t\}$. Therefore, for every admissible teaching strategy,
$$\mathbb{P}(\tau \le t) \le \mathbb{P}\big(L_m(\Theta) \le t\big).$$
Taking the supremum over strategies yields the first claim. If $t < L_{\min}$, then $L_m(\Theta) > t$ almost surely under the prior, so $\mathbb{P}(L_m(\Theta) \le t) = 0$. Hence $V(t) = 0$.

Proof of Proposition 5.2  Fix an admissible teaching strategy with $\mathbb{E}[\tau] < \infty$. By Markov's inequality,
$$\mathbb{P}(\tau > t) \le \frac{\mathbb{E}[\tau]}{t},$$
and therefore $\mathbb{P}(\tau \le t) \ge 1 - \mathbb{E}[\tau]/t$. Since $V(t)$ is the supremum of $\mathbb{P}(\tau \le t)$ over all admissible strategies, it follows that $V(t) \ge 1 - \mathbb{E}[\tau]/t$. Letting $t \to \infty$ gives $V(t) \to 1$.

Proof of Proposition 5.3  By Proposition 4.8, every acquisition time $\tau_g$ satisfies $\tau_g \ge L_m(g)$ almost surely. Therefore, for every admissible strategy and every $t < L_m(g)$, $\mathbb{P}(\tau_g \le t) = 0$. Taking the supremum over strategies yields $V_g(t) = 0$ for $t < L_m(g)$. Now let $L = L_m(g)$. By definition of structural distance, there exists a witnessing chain
$$A_m = K_0 \subset K_1 \subset \cdots \subset K_L, \qquad g \in K_L,$$
such that $K_{i+1} = K_i \cup \{u_i\}$ with $u_i \in \Phi_m(K_i) \setminus K_i$ for $i = 0, \ldots, L-1$. By Assumption 4.12, for each $u_i$ there exists a raw signal $z_i \in Z$ such that $\mathrm{tgt}(z_i) = u_i$. Since $u_i \in \Phi_m(K_i)$, the signal $z_i$ is parseable at $K_i$. If the teacher sends $z_0, z_1, \ldots, z_{L-1}$ in sequence, the learner moves through the sets $K_0, K_1, \ldots, K_L$ and therefore acquires $g$ after $L$ rounds. Thus there exists an admissible strategy such that $\mathbb{P}(\tau_g \le L) = 1$. Hence $V_g(t) = 1$ for all $t \ge L_m(g)$.
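Proposition 5.3 says the optimal acquisition probability $V_g(t)$ jumps from $0$ to $1$ exactly at the structural distance $L_m(g)$. The sketch below, again using our illustrative rule encoding rather than anything from the paper, computes $L_m(g)$ by breadth-first search over acquired-concept sets and evaluates the resulting step function:

```python
from collections import deque

def structural_depth(goal, axioms, rules):
    """L_m(g): length of the shortest witnessing chain from the axioms to
    a state containing g (breadth-first search over acquired sets)."""
    start = frozenset(axioms)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        K, d = queue.popleft()
        if goal in K:
            return d
        for (S, c) in rules:
            if S <= K and c not in K and (K | {c}) not in seen:
                seen.add(K | {c})
                queue.append((K | {c}, d + 1))
    return None  # g is unreachable for this mind

def best_acquisition_probability(t, depth):
    """V_g(t) per Proposition 5.3: a step function of the horizon t."""
    return 0.0 if depth is None or t < depth else 1.0

rules = [(frozenset({"a"}), "p1"), (frozenset({"p1"}), "p2"),
         (frozenset({"p2"}), "g")]
d = structural_depth("g", {"a"}, rules)
print(d, [best_acquisition_probability(t, d) for t in range(5)])
# 3 [0.0, 0.0, 0.0, 1.0, 1.0]
```

This zero-one shape is what drives the non-concave returns to training effort exploited in Proposition 5.5.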
Proof of Proposition 5.5  For (i), if every learner receives fewer than $L$ rounds, then by Proposition 5.3 the acquisition probability of $g$ is zero for every learner. Hence no learner completes. For (ii), select $\min\{N, \lfloor B/L \rfloor\}$ learners and allocate $L$ rounds to each of them, allocating $0$ rounds to the remaining learners. This is feasible because $L \min\{N, \lfloor B/L \rfloor\} \le L \lfloor B/L \rfloor \le B$ and $\min\{N, \lfloor B/L \rfloor\} \le N$. By Proposition 5.3, each selected learner completes with probability one. The remaining learners receive $0$ rounds, and since $0 < L$, again by Proposition 5.3 they complete with probability zero. Hence the total number of completed learners is $\min\{N, \lfloor B/L \rfloor\}$.

Proof of Theorem 5.6  Let $a$ and $g$ be distinct concepts. For each $i \in \{1, \ldots, k\}$ and each $j \in \{1, \ldots, L-1\}$, let $p_{i,j}$ be pairwise distinct concepts, all also distinct from $a$ and $g$. Define
$$C = \{a, g\} \cup \{p_{i,j} : i = 1, \ldots, k,\; j = 1, \ldots, L-1\}, \qquad A = \{a\}.$$
For each $i \in \{1, \ldots, k\}$, define the rule set of mind $m_i$ by
$$E_{m_i} = \big\{(\{a\}, p_{i,1}),\, (\{p_{i,1}\}, p_{i,2}),\, \ldots,\, (\{p_{i,L-2}\}, p_{i,L-1}),\, (\{p_{i,L-1}\}, g)\big\}.$$
Thus mind $m_i$ has a private prerequisite chain
$$a \to p_{i,1} \to p_{i,2} \to \cdots \to p_{i,L-1} \to g,$$
and no concept $p_{i',j}$ with $i' \neq i$ is reachable in mind $m_i$. Choose raw signals $z_{i,j} \in Z$ (for $i = 1, \ldots, k$ and $j = 1, \ldots, L-1$) and $z_g \in Z$ with $\mathrm{tgt}(z_{i,j}) = p_{i,j}$ and $\mathrm{tgt}(z_g) = g$.

(i) Personalized acquisition in $L$ rounds. Fix $i$. The sequence $z_{i,1}, z_{i,2}, \ldots, z_{i,L-1}, z_g$ is a valid ordered curriculum of length $L$ for $m_i$: each signal becomes parseable once its predecessor on the private chain has been acquired, and the final signal acquires $g$.

(ii) Broadcast lower bound. Consider any common broadcast sequence $\Gamma = (z_1, \ldots, z_T)$ that acquires $g$ for every mind. Fix $i$. Before mind $m_i$ can acquire $g$, it must first acquire all $L - 1$ private prerequisite concepts $p_{i,1}, \ldots, p_{i,L-1}$. Moreover, if $i' \neq i$, then none of the concepts $p_{i,j}$ lies in $U_{m_{i'}}$. Hence a broadcast signal targeting $p_{i,j}$ can help at most mind $m_i$; it produces no acquisition for any other mind. It follows that at least $L - 1$ rounds must be devoted to the private prerequisites of each mind $i$. Summing over $i = 1, \ldots, k$, at least $k(L-1)$ rounds are required to make all minds ready for a signal targeting $g$. Finally, one additional round targeting $g$ is necessary, since $g \notin A$ and is acquired only when a signal with target $g$ is parseable. Hence $T \ge k(L-1) + 1$.

(iii) Tightness. Consider the broadcast sequence
$$z_{1,1}, \ldots, z_{1,L-1},\; z_{2,1}, \ldots, z_{2,L-1},\; \ldots,\; z_{k,1}, \ldots, z_{k,L-1},\; z_g.$$
During the block $z_{i,1}, \ldots, z_{i,L-1}$, only mind $m_i$ advances; all other minds ignore those signals. After the first $k(L-1)$ rounds, each mind $m_i$ has acquired $p_{i,L-1}$. The final signal $z_g$ is therefore parseable for every mind, so all of them acquire $g$ on the last round. Thus the lower bound is attained.
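The tight construction in part (iii) of Theorem 5.6 is easy to replay in code. The following toy simulation, written under the construction's premise that each private-chain signal is ignored by every other mind, counts the rounds consumed by the block curriculum and recovers the bound $k(L-1) + 1$:

```python
def tight_broadcast_length(k, L):
    """Theorem 5.6 sketch: replay the block curriculum from part (iii) for
    k minds whose private chains each have L - 1 prerequisites before g."""
    acquired = [0] * k      # acquired[i] = prerequisites mind i holds
    rounds = 0
    for i in range(k):      # block i: signals z_{i,1}, ..., z_{i,L-1}
        for _ in range(L - 1):
            acquired[i] += 1    # only mind i can parse these signals
            rounds += 1
    rounds += 1             # final signal z_g, now parseable by all minds
    assert all(a == L - 1 for a in acquired)
    return rounds

# k = 3 heterogeneous minds, depth L = 4: broadcast needs 10 rounds,
# versus L = 4 rounds per learner under personalized teaching (part (i)).
print(tight_broadcast_length(3, 4))  # 10 = 3 * (4 - 1) + 1
```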
B Additional results

This appendix collects supplementary results that are invoked in the proofs above but are not essential to the main narrative. It also records additional consequences of the framework that may be of independent interest.

Corollary B.1 (Not every union-closed family above the axioms is a learning space). The class of $A$-based learning spaces on $U$ is a strict subclass of the class of union-closed families $\mathcal{F} \subseteq 2^U$.

Proof. Every $A$-based learning space is, by definition, union-closed and lies above $A$, so only strictness needs to be shown. Let $U = \{a, b\}$, $A = \emptyset$, and $\mathcal{F} = \{\emptyset, \{a, b\}\}$. Then $\mathcal{F}$ is union-closed and contains $A$. However, it fails accessibility, since neither $\{a, b\} \setminus \{a\} = \{b\}$ nor $\{a, b\} \setminus \{b\} = \{a\}$ belongs to $\mathcal{F}$. Hence $\mathcal{F}$ is not an $A$-based learning space.

Lemma B.2 (Prefix closure of reachable acquired concept sets). If $K \in \mathcal{K}_m$ and $A_m = K_0 \subset K_1 \subset \cdots \subset K_L = K$ is a witnessing chain, then every intermediate set $K_i$ also belongs to $\mathcal{K}_m$.

Proof. Each $K_i$ is reachable from $A_m$ by truncating the witnessing chain at step $i$.

Proposition B.3 (Parseability preserves information). Let $U_{t+1} = \{\mathrm{tgt}(Z_{t+1}) \notin \Phi_m(K_t)\}$. Then
$$I(\Theta; Y_{t+1} \mid \mathcal{F}_t, U_{t+1}^c) = I(\Theta; Z_{t+1} \mid \mathcal{F}_t, U_{t+1}^c).$$
In particular, if the right-hand side is strictly positive, then so is the left-hand side.

Proof. On $U_{t+1}^c$, the parser acts as the identity, so $Y_{t+1} = Z_{t+1}$ almost surely. The identity of the conditional mutual informations follows immediately.

Corollary B.4 (Unlimited rephrasing can be useless under sharp parsing). Fix time $t$ and a mind $m$. Let $U_t(c) = \{c \notin \Phi_m(K_t)\}$. Let $(Z_{t+1}^{(j)})_{j \ge 1}$ be any family of $Z$-valued random variables such that $\mathrm{tgt}(Z_{t+1}^{(j)}) = c$ almost surely for every $j \ge 1$, and define $Y_{t+1}^{(j)} = \rho_m(Z_{t+1}^{(j)}, K_t)$. Then for every $j \ge 1$,
$$I(\Theta; Y_{t+1}^{(j)} \mid \mathcal{F}_t) = 0 \quad \text{almost surely on } U_t(c).$$

Proof. On $U_t(c)$, the targeted concept is not ordered, so $Y_{t+1}^{(j)} = \bot$ almost surely. Hence $Y_{t+1}^{(j)}$ is conditionally constant given $\mathcal{F}_t$, so the conditional mutual information is zero.
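Lemma A.4, Proposition B.3, and Corollary B.4 together say that parsing acts as an all-or-nothing filter on information. A small numerical check, using a hypothetical two-target prior and the string "null" standing in for $\bot$, computes the mutual information on both sides of the filter:

```python
import math

def mutual_information(joint):
    """I(X; Y) in nats for a finite joint pmf {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Two equally likely targets emit distinct raw signals. If neither signal
# is parseable, the learner observes only the null symbol and learns
# nothing (Lemma A.4 / Corollary B.4); if both parse, the full log 2 nats
# survive (Proposition B.3).
unparseable = {("t1", "null"): 0.5, ("t2", "null"): 0.5}
parseable = {("t1", "z1"): 0.5, ("t2", "z2"): 0.5}
print(mutual_information(unparseable), mutual_information(parseable))
# 0.0 0.6931...
```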
References

C. D. Aliprantis and K. C. Border. Infinite Dimensional Analysis: A Hitchhiker's Guide. Springer, 2006.

Gary S. Becker. Human Capital: A Theoretical and Empirical Analysis, with Special Reference to Education. University of Chicago Press, 1964.

Yoram Ben-Porath. The production of human capital and the life cycle of earnings. Journal of Political Economy, 75(4):352–365, 1967.

Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In International Conference on Machine Learning, pages 41–48, 2009.

Claude Berge. Hypergraphs: Combinatorics of Finite Sets, volume 45. Elsevier, 1984.

David Blackwell. Comparison of experiments. In Berkeley Symposium on Mathematical Statistics and Probability, pages 93–102. University of California Press, 1951.

David Blackwell. Equivalent comparisons of experiments. The Annals of Mathematical Statistics, 24(2):265–272, 1953.

Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley, 2006.

Flavio Cunha and James J. Heckman. The technology of skill formation. American Economic Review, 97(2):31–47, 2007.

Reinhard Diestel. Graph Theory, volume 173. Springer, 2024.

Jean-Paul Doignon and Jean-Claude Falmagne. Knowledge Spaces. Springer, 1999.

Jean-Paul Doignon and Jean-Claude Falmagne. Knowledge spaces and learning spaces. arXiv preprint, 2015.

Jean-Paul Doignon and Jean-Claude Falmagne. Knowledge spaces and learning spaces. In New Handbook of Mathematical Psychology, Volume 1: Foundations and Methodology, pages 274–321. Cambridge University Press, 2016.

Sally A. Goldman and Michael J. Kearns. On the complexity of teaching. Journal of Computer and System Sciences, 50(1):20–31, 1995.

Bernhard Korte and László Lovász. Structural properties of greedoids. Combinatorica, 3(3):359–374, 1983.

Bernhard Korte, László Lovász, and Rainer Schrader. Greedoids, volume 4 of Algorithms and Combinatorics. Springer-Verlag, 1991.

Claude E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.

Claude E. Shannon. Communication theory of secrecy systems. Bell System Technical Journal, 28(4):656–715, 1949.

Christopher A. Sims. Implications of rational inattention. Journal of Monetary Economics, 50(3):665–690, 2003.

Alfred Tarski. A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics, 5(2):285–309, 1955.

Hassler Whitney. On the abstract properties of linear dependence. American Journal of Mathematics, 57(3):509–533, 1935.

Xiaojin Zhu, Ji Liu, and Manuel Lopes. No learner left behind: On the complexity of teaching multiple learners simultaneously. In International Joint Conference on Artificial Intelligence, pages 3588–3594, 2017.

Xiaojin Zhu, Adish Singla, Sandra Zilles, and Anna N. Rafferty. An overview of machine teaching. arXiv preprint arXiv:1801.05927, 2018.
