Synchronization and Control in Intrinsic and Designed Computation: An Information-Theoretic Analysis of Competing Models of Stochastic Computation
We adapt tools from information theory to analyze how an observer comes to synchronize with the hidden states of a finitary, stationary stochastic process. We show that synchronization is determined by both the process's internal organization and by an observer's model of it. We analyze these components using the convergence of state-block and block-state entropies, comparing them to the previously known convergence properties of the Shannon block entropy. Along the way, we introduce a hierarchy of information quantifiers as derivatives and integrals of these entropies, which parallels a similar hierarchy introduced for block entropy. We also draw out the duality between synchronization properties and a process's controllability. The tools lead to a new classification of a process's alternative representations in terms of minimality, synchronizability, and unifilarity.
Authors: James P. Crutchfield, Christopher J. Ellison, Ryan G. James, John R. Mahoney
Santa Fe Institute Working Paper 10-07-XXX
arxiv.org:1007.XXXX [physics.gen-ph]

James P. Crutchfield,^{1,2,*} Christopher J. Ellison,^{1,†} Ryan G. James,^{1,‡} and John R. Mahoney^{1,§}

^1 Complexity Sciences Center and Physics Department, University of California at Davis, One Shields Avenue, Davis, CA 95616
^2 Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501

* chaos@ucdavis.edu
† cellison@cse.ucdavis.edu
‡ rgjames@ucdavis.edu
§ jrmahoney@ucdavis.edu

(Dated: May 27, 2022)

Keywords: controllability, synchronization information, stored information, entropy rate, statistical complexity, excess entropy, crypticity, information diagram, presentation, minimality, gauge information, oracular information

PACS numbers: 02.50.-r 89.70.+c 05.45.Tp 02.50.Ey 02.50.Ga

CONTENTS

I. Introduction
   A. Precis
   B. Synchronization and Control: Related Work
II. Block Entropy and Its Convergence Hierarchy
   A. Stationary Stochastic Processes
   B. Block Entropy
   C. Source Entropy Rate
   D. Excess Entropy
   E. Block Entropy Asymptotics
   F. The Convergence Hierarchy
III. Process Presentations
   A. The Causal State Representation
   B. General Presentations
IV. State-Block and Block-State Entropies
   A. Convergence Hierarchies
   B. Asymptotics
V. Synchronization
   A. Duality of Synchronization and Control
   B. Synchronizing to the ε-Machine
VI. Presentation Quantifiers
   A. Crypticity
   B. Oracular Information
   C. Gauge Information
   D. Synchronization Information
   E. Cryptic Order
   F. Oracular Order
   G. Gauge Order
   H. Synchronization Order
   I. Synchronization Time
VII. Classifying Presentations
   A. Case: Minimal Unifilar Presentation
   B. Case: Weakly Asymptotically Synchronizable Presentations
   C. Case: Unifilar Presentations
   D. Case: Nonunifilar Presentations
VIII. Conclusions
A. Notation Change for Total Predictability
B. State-Block Entropy Rate Estimate
C. Reducing the Presentation I-Diagram
Acknowledgments
References

Nonlinear dynamical systems store and generate information—they intrinsically compute. Real computing devices use nonlinearity to do the same, except that they are designed to compute—the information serves some utility or function determined by the designer.
Intuitively, useful computing devices must be constructed out of (physical, chemical, or biological) processes that have some minimum amount of intrinsic computational capability. However, the exact relationship between intrinsic and designed computation remains elusive. In fact, bridging intrinsic and designed computation requires solving a number of intermediate problems. One is to understand the diversity of intrinsic computation of which nonlinear dynamical systems are capable. Another is to determine if one can practically manipulate these systems in the service of functional information generation and storage.

Here, we address both of these problems from the perspective of information theory. We describe new information-processing characteristics of dynamical systems and the stochastic processes they generate. We focus particularly on two key aspects that impact design: synchronization and control. Synchronization concerns how we come to know the hidden states of a process through observations; control, how we can manipulate a process into a desired internal condition.

I. INTRODUCTION

Given a model of a stationary stochastic process, how much information must one extract from observations to exactly know which state the process is in? With this, an observer is said to be synchronized to the process. (For an introduction to the problem, see Ref. [1].) Given that one has designed a stochastic process, is there a series of inputs that reliably drives it to a desired internal condition? If so, the designed process is said to be controllable.

Synchronization and control are dual to each other: In synchronization, an observer attempts to predict the process's internal state from incomplete and indirect observations, typically starting with complete ignorance and hopefully ending with complete certainty. In control, one must extract from the design a series of manipulations, typically indirect, that will drive the process to a desired state or set of states. The duality is simply that the observer's measurements can be interpreted as the designer's control inputs.

Synchronization and control are key aspects of intrinsic and designed computation, both for detecting intrinsic computation in dynamical systems and for leveraging a dynamical system's intrinsic computation into useful computation. For the latter, the circuit designer attempts to build circuits, themselves dynamical systems, that synchronize to incoming signals.

For example, even the most mundane initial operation is essential: When power is first applied, a digital computer must predictably reach a stable and repeatable state, without necessarily being able to perform even small amounts of intelligent digital control or analysis of its changing environment. Without reliably reaching a stable condition—now a quite elaborate operation in modern microprocessors—no useful information processing can be initiated. The device is still a dynamical system, of course, but it fails at raising itself from that prosaic condition to the level of a computing device.

Once digital computing operations have commenced, similar concerns arise in the timing and control of information being loaded from memory into a register.
Not only must each data bus line synchronize properly or risk misconstruing the voltage level offered up by the wires, but this must happen simultaneously across a number of component devices—quite wide buses, of 128 and 256 lines, are not uncommon today.

Stepping back a bit, one must wonder what tools dynamical systems theory itself provides to analyze and design computation. Indeed, many of the properties often used to characterize and classify dynamical systems are time-asymptotic—the Kolmogorov-Sinai entropy or Shannon entropy rate, the spectrum of Lyapunov characteristic exponents, and the fractal and information dimensions (which rely on the asymptotic invariant measure) come to mind. However, real computing is not asymptotic. Individual logic gates, as dynamical systems, deliver their results on the short term. Indeed, the faster they do this, the better.

How can we bridge the gap between dynamical systems theory and the need to characterize the short-term properties of dynamical systems? A suggestive example is found in the analysis of escape rates [2], a property of transient, short-term behavior. Another answer is found in synchronization and controllability, as they too are properties of the short-term behavior of dynamical systems. We will show that there is a connection between these properties and the more typical asymptotic view of dynamical behavior: Synchronization and control are determined by the nature of convergence to the asymptotic—they are our subject.

Given the duality between synchronization and control, in the following we present results in terms of only one notion—synchronization. The results apply equally well to control, though with different interpretations.

A. Precis

Analyzing informational convergence properties is the main strategy we will use. However, as we will see, different properties converge differently from each other, either for a given process or as one looks across a family of processes. Moreover, for a given process we will consider a family of different representations of it. The result, while giving insight into informational properties and how representations can distort them, ends up being a rather elaborate classification scheme. To reduce the apparent complication, it will be helpful to give a detailed summary of the steps we employ in the development.

After describing related work, we review the use of Shannon block entropy and related quantities, analyzing their asymptotic behavior and aspects of convergence. We introduce a single framework—the convergence hierarchy—to call out the systematic nature of convergence properties.

We then take a short detour to introduce the range of possible descriptions a process can have, noting their defining properties. One, the ε-machine, plays a particularly central role, as it allows one to calculate all of a process's intrinsic properties. Other descriptions typically do not allow this to the same broad extent.

With a model in hand, one can start to discuss how one synchronizes to its states. When the model is the ε-machine, one can speak of synchronizing to the process itself. To do this, we analyze the convergence properties of two new entropies: the state-block entropy and the block-state entropy. We establish their general asymptotic properties, introducing convergence hierarchies of their own, paralleling that for the block entropy.
For finitary processes, the block entropy converges from below, while the new block-state entropy converges from above to the same asymptote. One benefit is that estimation methods can be improved through the use of bounds from above and below.

When we specialize to the ε-machine, we establish a direct connection between synchronization and how the block entropies converge. We provide an informational measure—synchronization information—that summarizes the total uncertainty encountered during synchronization. We relate this back to the transient information introduced previously, which derives only from the observed sequences, requiring neither a model nor a notion of state. Along the way, we discuss a process's Markov order—the scale at which "asymptotic" statistics set in—and its cryptic order—the length scale over which internal state information is spread. These scales control synchronization.

The development then, step by step, relaxes the ε-machine's defining properties in order to explore an increasingly wide range of models. A particular emphasis in this is to show how nonoptimal models bias estimates of a process's informational properties. Conversely, we learn how certain classes of models, some widely used in mathematical statistics and elsewhere, make strong assumptions and, in some cases, preclude the estimation of important process properties.

Starting with the class of minimal, optimally predictive models that synchronize (finitary ε-machines), we first relax the minimality assumption. We show that needless model elaborations—such as more, but redundant, states—can affect synchronization. We identify the class which still does synchronize. Then, we consider nonminimal unifilar, nonsynchronizing models. Finally, we relax the unifilarity assumption. At each stage, we see how the convergence properties of the various entropies change. These changes, in turn, induce a number of informational measures of what the models themselves contribute to a process's now largely-apparent information processing.

A key tool in the analysis takes advantage of the fact that the various multivariable information quantities form a signed measure [3]. Their visual display, a form of Venn diagram called an information diagram, brings some order to the notation and classification chaos.

B. Synchronization and Control: Related Work

Controlling dynamical systems and stochastic processes has an extensive history. For linear dynamical systems see, for example, Ref. [4]; for hidden Markov models see, for example, Ref. [5]. More recently, there has been much work on controlling nonlinear dynamical systems, a markedly more difficult problem in its full generality; see Refs. [6-8].

Synchronization, too, has been very broadly studied and for much longer, going back at least to Huygens [9]. It is also an important property of symbolic dynamical systems [10]. It has even become quite popularized of late, being elevated to a general principle of natural organization [11].

Here, we consider a form of synchronization that is, at least at this point, distinct from the dynamical kind. Moreover, we take a complementary, but distinct, approach—that of information theory—to address control and synchronization. This was introduced in Ref. [12], and several applications are given in Refs. [1, 13]. A roughly similar problem setting for synchronization is found in Ref. [14].
We note that the closely related topics of state estimation and control are addressed in information theory [15, 16], nonlinear dynamics [17-19], and Markov decision processes [20].

Adapting the present approach to continuous dynamical systems and stochastic processes remains a future effort. For the present, the closest connections will be found to the work cited above on hidden Markov models and symbolic dynamical systems.

II. BLOCK ENTROPY AND ITS CONVERGENCE HIERARCHY

It is an interesting fact, perhaps now intuitive, that to estimate even the randomness of an information source, one must also estimate its internal structure. Ref. [12] gives a review of this interdependence, and it serves as a starting point for our analysis of synchronization, which is a question about coming to know the source's states from observations. Indeed, if one has to make estimates of internal organization just to get at randomness, then one, in effect and without too much more effort, can also address issues of synchronization. This is the intimate relationship that we hope to establish.

We briefly review Ref. [12], largely to introduce notation and highlight the main ideas needed for synchronization. This review and our development of synchronization require the reader to be facile with information theory at the level of the first half of Ref. [21], with the signed information measures and information diagrams of Ref. [3], and with their uses in Refs. [22-24].

A. Stationary Stochastic Processes

The approach in Ref. [12] starts simply: Any stationary process, P, is a joint probability distribution Pr(X⃖, X⃗) over past and future observations. This distribution can be thought of as a communication channel with a specified input distribution, Pr(X⃖). It transmits information from the past X⃖ = ...X_{−3}X_{−2}X_{−1} to the future X⃗ = X_0 X_1 X_2 ... by storing it in the present. X_t is the random variable for the measurement outcome at time t; the lowercase x_t denotes a particular value. Throughout this work, we always use X⃖ and X⃗ in the limiting sense. That is, we work with length-L sequences or blocks of random variables, X_t^L = X_t X_{t+1} ⋯ X_{t+L−1}, and take the limit as L approaches infinity.

In the following, we consider only discrete measurement outcomes, x ∈ A = {1, 2, ..., k}, and stationary processes: Pr(X_t^L) = Pr(X_0^L), for all times t and block lengths L. Unlike some definitions of stationarity, this makes no assumptions about the process's internal starting conditions, as such knowledge obviates the very question of synchronization.

Such processes include those found in the field of stochastic processes, of course, but one also has in mind the symbolic dynamics of continuous-state continuous-time or continuous-state discrete-time dynamical systems on their invariant sets. The notions also apply equally well to one-dimensional spatial configurations of spin systems and of deterministic and probabilistic cellular automata, where one interprets the spatial coordinate as a "time".

B. Block Entropy

One measure of the diversity of length-L sequences generated by a process is its Shannon block entropy:

    H(L) ≡ H[X_0^L]    (1)
         = −Σ_{w ∈ A^L} Pr(w) log₂ Pr(w) ,    (2)

where w = x₀x₁...x_{L−1} is a word in the set A^L of length-L sequences. It has units of [bits] of information.
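To make Eq. (2) concrete, here is a minimal Python sketch (ours, not from the paper) that estimates H(L) from a finite sample by counting length-L words. The function name block_entropy and the fair-coin example are illustrative assumptions.

```python
from collections import Counter
from math import log2
import random

def block_entropy(seq, L):
    """Estimate the Shannon block entropy H(L), in bits, from a sample
    sequence by counting its length-L words (Eq. (2))."""
    if L == 0:
        return 0.0  # H(0) = 0: no measurement has been made
    words = [tuple(seq[i:i + L]) for i in range(len(seq) - L + 1)]
    n = len(words)
    return -sum((c / n) * log2(c / n) for c in Counter(words).values())

# Sanity check: a fair coin has H(L) = L bits.
random.seed(0)
coin = [random.randint(0, 1) for _ in range(100_000)]
print([round(block_entropy(coin, L), 2) for L in range(1, 5)])  # ~[1.0, 2.0, 3.0, 4.0]
```

Note that empirical estimates of this kind degrade quickly as L grows, since the number of possible words k^L soon outpaces the available data.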
One can think of the block entropy as a kind of transform that reduces a process's distribution over the (typically infinite) number of sequences to a function of a single variable L. In this view, Ref. [12] focused on a simple question: What properties of a process can be determined solely from its H(L)?

C. Source Entropy Rate

One of those properties, and historically the most widely used and technologically important, is Shannon's source entropy rate:

    h_µ = lim_{L→∞} H(L)/L .    (3)

The entropy rate is the irreducible unpredictability of a process's output—the intrinsic randomness left after one has extracted all of the correlational information from past observations. The difference between it and the alphabet size, log₂ |A| − h_µ, indicates how much the raw measurements can be compressed. More precisely, Shannon's First Theorem states that the output sequences x^L from an information source can be compressed, without error, to L h_µ bits [21]. Moreover, Shannon's Second Theorem gives operational meaning to the entropy rate [21]: A communication channel's capacity must be larger than h_µ for error-free transmission.

D. Excess Entropy

As noted, any process—chaotic dynamical system, spin chain, cellular automaton, to mention a few—can be considered a channel that communicates its past to its future. The messages to be transmitted in this way are the pasts which the process can generate. Thus, the "capacity" of this channel is not something that one optimizes, as done in Shannon's theory to engineer channels and construct error-free encodings. Rather, we think of it as how much of the process's channel is actually used.

A process's channel utilization is another property that can be determined from the block entropy. It is called the excess entropy and is defined, closely following Shannon's channel capacity definition, by:

    E = I[X⃖ ; X⃗] ,    (4)

where I[Y; Z] is the mutual information between random variables Y and Z. It has units of [bits] and tells one how much information the output shares with the input, and so measures how much information is transmitted through a, possibly noisy, channel.

E. Block Entropy Asymptotics

It has been known for quite some time now that the entropy rate and excess entropy control the asymptotic behavior (L → ∞) of a finitary process's block entropy. Specifically, it scales according to the linear asymptote:

    H(L) ∝ E + h_µ L .    (5)

Equivalently,

    E = lim_{L→∞} ( H(L) − L h_µ ) .    (6)

That is, E is the sublinear part of H(L). This gives important general insight into the block entropy's behavior. It is also quite practical: If H(L) actually meets the asymptote at some finite sequence length R, then the process is effectively an order-R Markov chain [12, 24]: Pr(X₀ | X⃖) = Pr(X₀ | X_{−R}^R). Interestingly, many finitary processes do not reach the asymptote at finite lengths and so cannot be recast as Markov chains of any order. Roughly speaking, they have various kinds of infinite-range correlation.
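As a small illustration (ours, not from the paper) of Eqs. (3) and (6), the sketch below forms the finite-L estimates h_µ(L) = H(L) − H(L−1) and E(L) = H(L) − L h_µ, reusing block_entropy() from the sketch above. The period-2 example is an assumption chosen because its limits are exactly known.

```python
def entropy_rate_estimate(seq, L):
    """h_mu(L) = H(L) - H(L-1): the length-L conditional-entropy estimate,
    which converges to h_mu from above (see Eq. (8) below)."""
    return block_entropy(seq, L) - block_entropy(seq, L - 1)

def excess_entropy_estimate(seq, L, h_mu):
    """E(L) = H(L) - L*h_mu: the sublinear part of the block entropy (Eq. (6))."""
    return block_entropy(seq, L) - L * h_mu

# The period-2 process ...010101... has h_mu = 0: it is perfectly
# predictable, yet E = 1 bit of phase information is carried across
# the channel from past to future.
period2 = [i % 2 for i in range(1000)]
print(round(entropy_rate_estimate(period2, 3), 6))         # -> ~0.0
print(round(excess_entropy_estimate(period2, 3, 0.0), 6))  # -> 1.0
```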
F. The Convergence Hierarchy

In this way, the study of how the block entropy converges, or does not, is a tool for classifying processes. Reference [12] showed that the entropy rate and excess entropy are merely two players in an infinite hierarchy that determines the shape of H(L). The central idea is to take L-derivatives and integrals of H(L).

To start, one has the block entropy difference:

    ΔH[X_0^L] ≡ H[X_0^L] − H[X_0^{L−1}] ,    (7)

where Δ is the discrete derivative with respect to block length L. It is easy to see that the right-hand side is the conditional entropy H[X_{L−1} | X_0^{L−1}] and that, in turn,

    h_µ = lim_{L→∞} H[X_{L−1} | X_0^{L−1}]    (8)
        = H[X₀ | X⃖₀] ,    (9)

recovering the entropy rate. It is often useful to directly refer to the length-L approximation to the entropy rate as h_µ(L) ≡ H[X_{L−1} | X_0^{L−1}]. Since h_µ(L) ≥ h_µ, it converges from above.

The excess entropy, for its part, controls the convergence speed, as it is the discrete integral:

    E = Σ_{L=1}^{∞} ( h_µ(L) − h_µ ) .    (10)

It requires only a few steps to see that this form is equivalent to that of Eq. (4).

Following a similar strategy, the discrete integral

    T = Σ_{L=0}^{∞} [ E + h_µ L − H(L) ]    (11)

measures how H(L) itself reaches its linear asymptote E + h_µ L. T is called the transient information, and it is implicated in determining the Markov order and, as we will show, synchronization.

The pattern should be clear now: At the lowest level, the transient information indicates how quickly the block entropy reaches its asymptote. Then, that asymptote grows at the rate h_µ and has y-intercept E. It might be helpful to refer to the graphical summary of block-entropy convergence and the associated information measures given in Ref. [12, Fig. 2]. Analogous diagrams will appear shortly.

All this can be compactly summarized by introducing two operators, a derivative and an integral, that operate on H(L). The derivative operator at the n-th level is:

    Δⁿ H[X_0^L] ≡ Δ^{n−1} H[X_0^L] − Δ^{n−1} H[X_0^{L−1}] ,    (12)

for L ≥ n = 1, 2, ...; and, for L ≥ n = 0,

    Δ⁰ H[X_0^L] ≡ H[X_0^L] .    (13)

The integral operator is:

    I_n ≡ Σ_{L=n}^{∞} ( Δⁿ H[X_0^L] − lim_{ℓ→∞} Δⁿ H[X_0^ℓ] ) ,    (14)

for n = 0, 1, 2, .... (This is a slight deviation from Ref. [12] when n = 2; see App. A.)

To make the connection with what we just discussed, in this notation we have:

    h_µ = lim_{L→∞} Δ¹ H[X_0^L] ,    (15)
    E = I₁ , and    (16)
    T = −I₀ .    (17)

Additionally, I₂ is a process's total predictability G, and Δ² H[X_0^L] is its predictability gain—the rate at which predictions improve by going to longer sequences.

The two operators, Δⁿ and I_n, define the entropy convergence hierarchy for a process, capturing those properties reflected in the process's block entropy. Given a process's specification, one attempts to calculate the hierarchy analytically; given data, to estimate it empirically. In addition to systematizing a process's informational properties, the hierarchy has a number of uses. For example, structural classes of processes can be distinguished by the n* at which the hierarchy becomes trivial; for example, when Δⁿ H[X_0^L] = 0 for n > n*. Other classifications turn on bounded I_{n*}. The finitary processes, for example, are defined by n* = 1: I₁ = E < ∞. Or, conversely, there are well-known processes for which some integrals diverge; they include the onset of chaos through period-doubling, where the excess entropy diverges. Reference [12, Sec. VII.A] introduces a classification of processes along these lines.
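The two operators translate directly into code. The sketch below (ours, not from the paper) computes Δⁿ H and the partial sums for I_n from a finite table of block-entropy values, using the last table entry as a stand-in for the L → ∞ limit. For n = 0 that stand-in equals the full linear asymptote only when h_µ = 0, which holds for the periodic example used.

```python
def delta(H, n):
    """n-th discrete derivative (Eq. (12)) of a table H = [H(0), H(1), ...]."""
    d = list(H)
    for _ in range(n):
        d = [d[L] - d[L - 1] for L in range(1, len(d))]
    return d

def integral(H, n):
    """Partial sum for I_n (Eq. (14)); the final table entry serves as a
    proxy for the n-th asymptote (valid here since h_mu = 0)."""
    d = delta(H, n)
    return sum(x - d[-1] for x in d)

# The period-2 process has H(0) = 0 and H(L) = 1 for L >= 1, so
# E = I_1 = 1 bit and T = -I_0 = 1 bit.
H_table = [0.0] + [1.0] * 10
print(integral(H_table, 1), -integral(H_table, 0))  # -> 1.0 1.0
```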
III. PROCESS PRESENTATIONS

A. The Causal State Representation

Prediction is closely allied to the view of a process as a communication channel: We wish to predict the future using information from the past. At root, a prediction is probabilistic, specified by a distribution of possible futures X⃗ given a particular past x⃖: Pr(X⃗ | x⃖). At a minimum, a good predictive model needs to capture all of the information shared between the past and future: E = I[X⃖; X⃗].

Consider now the goal of modeling—building a representation that allows not only good prediction but also expresses the mechanisms producing a system's behavior. To build a model of a structured process (a memoryful channel), computational mechanics [25] introduced an equivalence relation x⃖ ∼ x⃖′ that clusters all histories which give rise to the same prediction:

    ε(x⃖) = { x⃖′ : Pr(X⃗ | x⃖) = Pr(X⃗ | x⃖′) } .    (18)

In other words, for the purpose of forecasting the future, two different pasts are equivalent if they result in the same prediction. Applying this equivalence relation gives the process's causal states S = Pr(X⃖, X⃗)/∼, which partition the space X⃖ of pasts into sets that are predictively equivalent. The set of causal states [26] can be discrete, fractal, or continuous; see, e.g., Figs. 7, 8, 10, and 17 in Ref. [27].

State-to-state transitions are denoted by matrices T^{(x)}_{SS′} whose elements give the probability Pr(X = x, S′ | S) of transitioning from state S to the next state S′ on seeing measurement x. The resulting model, consisting of the causal states and transitions, is called the process's ε-machine. Given a process P, we denote its ε-machine by M(P).

Causal states have a Markovian property: they render the past and future statistically independent, shielding the future from the past [28]:

    Pr(X⃖, X⃗ | S) = Pr(X⃖ | S) Pr(X⃗ | S) .    (19)

Moreover, they are optimally predictive [25] in the sense that knowing which causal state a process is in is just as good as having the entire past: Pr(X⃗ | S) = Pr(X⃗ | X⃖). In other words, causal shielding is equivalent to the fact [28] that the causal states capture all of the information shared between past and future: I[S; X⃗] = E.

ε-Machines have an important structural property called unifilarity [25, 29]: From the start state, each symbol sequence corresponds to exactly one sequence of causal states [30]. The importance of unifilarity, as a property of any model, is reflected in the fact that representations without it, such as generic hidden Markov models, cannot be used to directly calculate important system properties—including the most basic, such as how random a process is. As a practical matter, unifilarity is easy to verify: For each state, each measurement symbol appears on at most one outgoing transition [31]. Thus, the signature of unifilarity is that, on knowing the current state S_t and measurement X_t, the uncertainty in the next state S_{t+1} vanishes: H[S_{t+1} | S_t, X_t] = 0.
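This practical test is easily mechanized. Below is a minimal sketch (ours, not from the paper) that checks unifilarity for a presentation stored as labeled transition matrices T[x][i][j] = Pr(X = x, next state j | current state i). The example encodes the standard two-state ε-machine of the Even Process, which is discussed later in Sec. V B; the encoding itself is our assumption.

```python
import numpy as np

def is_unifilar(T, tol=1e-12):
    """True iff each (state, symbol) pair has at most one successor,
    i.e., H[S_{t+1} | S_t, X_t] = 0."""
    return all(
        np.count_nonzero(row > tol) <= 1
        for Tx in T.values()
        for row in np.asarray(Tx)
    )

# Even Process epsilon-machine (states A = 0, B = 1): from A, symbol 0
# self-loops and symbol 1 goes to B, each with probability 1/2; from B,
# symbol 1 returns to A with probability 1.
T_even = {
    0: np.array([[0.5, 0.0], [0.0, 0.0]]),
    1: np.array([[0.0, 0.5], [1.0, 0.0]]),
}
print(is_unifilar(T_even))  # -> True
```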
Out of all optimally predictive models R̂—those for which I[R̂; X⃗] = E—the ε-machine captures the minimal amount of information that a process must store in order to communicate all of the excess entropy from the past to the future. This is the Shannon information contained in the causal states—the statistical complexity [28]: C_µ ≡ H[S] ≤ H[R̂]. It turns out that the statistical complexity upper-bounds the excess entropy [25, 29]: E ≤ C_µ. In short, E is the effective information transmission rate of the process, viewed as a channel, and C_µ is the memory stored in that channel.

Combined, these properties mean that the ε-machine is the basis against which modeling should be compared, since it captures all of a process's information at maximum representational efficiency.

Importantly, due to unifilarity, one can calculate the entropy rate directly from a process's ε-machine:

    h_µ = H[X | S]
        = −Σ_{S} Pr(S) Σ_{x} ( Σ_{S′} T^{(x)}_{SS′} ) log₂ ( Σ_{S′} T^{(x)}_{SS′} ) ,    (20)

where Pr(S) is the asymptotic probability of the causal states, obtained as the normalized principal eigenvector of the transition matrix T = Σ_{x} T^{(x)}.

A process's statistical complexity can also be calculated directly from its ε-machine:

    C_µ = H[S] = −Σ_{S} Pr(S) log₂ Pr(S) .    (21)

Thus, the ε-machine directly gives two important properties: a process's rate (h_µ) of producing information and the amount (C_µ) of historical information it stores in doing so. Moreover, Refs. [22, 23] showed how to calculate a process's excess entropy directly from the ε-machine.
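Here is a sketch (ours, not from the paper) of Eqs. (20) and (21), reusing T_even and numpy from the sketch above. The stationary distribution Pr(S) comes from the principal eigenvector of T = Σ_x T^{(x)}, and the entropy-rate formula is written for unifilar presentations, where Pr(x | S) is the row sum of T^{(x)}.

```python
def stationary(T):
    """Normalized principal (left) eigenvector of T = sum_x T^(x)."""
    M = sum(np.asarray(Tx) for Tx in T.values())
    vals, vecs = np.linalg.eig(M.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return pi / pi.sum()

def entropy_rate(T):
    """h_mu = H[X | S] for a unifilar presentation (Eq. (20))."""
    pi = stationary(T)
    h = 0.0
    for Tx in T.values():
        px = np.asarray(Tx).sum(axis=1)  # Pr(x | S), one entry per state
        h -= sum(pi[i] * px[i] * np.log2(px[i])
                 for i in range(len(pi)) if px[i] > 0)
    return h

def statistical_complexity(T):
    """C_mu = H[S] (Eq. (21))."""
    pi = stationary(T)
    return -sum(p * np.log2(p) for p in pi if p > 0)

# Even Process: Pr(A) = 2/3 and Pr(B) = 1/3, giving h_mu = 2/3 bit per
# symbol and C_mu = H(2/3, 1/3) ~ 0.918 bits.
print(round(entropy_rate(T_even), 4), round(statistical_complexity(T_even), 4))
```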
B. General Presentations

The ε-machine is only one possible description of a process. There are many alternatives: some larger, some smaller; some with the same prediction error, some with larger prediction error; some that are unifilar, some not; some that do an excellent job of capturing Pr(X⃖, X⃗), many (or most) doing only an approximate job; some allowing for the direct calculation of the process's properties, some precluding such calculations.

The ε-machine, compared to all other possible descriptions, is arguably the best. The results in the following, as an ancillary benefit, strengthen this conclusion considerably. However, it is important to keep in mind that, due to implementation constraints, intended use, or specified performance criteria, alternative models may be desirable and even preferred to the ε-machine. Reference [27], for example, compares the benefits and disadvantages of different kinds of nonunifilar models that are smaller than the ε-machine. We return to elaborate on this in Sec. VII D.

One refers to a process's possible descriptions as presentations [32]. Specifically, these are state-based models—using states and state transitions—that exactly describe Pr(X⃖, X⃗). That is, given a finitary process P, we consider the set of all presentations that generate the same process language: Pr(X^L), L = 1, 2, .... The set of P's presentations is the focus of our work here. That is, we do not address models that give only approximations to the process language.

We refer to these alternative models as rivals. A rival consists of a set R of states and state-to-state transitions T^{(s)}_{RR′} over the symbols s in the process's measurement alphabet A. There is an associated mapping η : x⃖ → R from pasts to rival states. When we refer to the rival's state as a random variable, we denote it R = η(X⃖). We use lowercase ρ when we refer to a particular realization: R = ρ, ρ ∈ R. Just as with the ε-machine, given a rival presentation, we can refer to the amount of information the rival states contain—this is the presentation state entropy H[R].

Above, we noted that a process's ε-machine is its minimal unifilar presentation. But how are the rivals related, if at all, to the ε-machine? To explore the organization of the space of rivals, in the following we relax the properties that make the ε-machine unique, working with presentations that are nonminimal unifilar and those that are not even unifilar. And so, we must distinguish several kinds of presentation. First, we extend unifilarity to presentations generally.

Definition 1. A presentation is unifilar if and only if H[R_{t+1} | R_t, X_t] = 0.

Second, we introduce the notion of reverse-time unifilarity.

Definition 2. A presentation is counifilar if and only if H[R_t | X_t, R_{t+1}] = 0.

Third, we will consider prescient presentations, those whose states are as good at predicting as the ε-machine's causal states [28, 29].

Definition 3. A presentation is prescient if and only if, for all pasts x⃖ ∈ X⃖:

    Pr(X⃗^L | R = η(x⃖)) = Pr(X⃗^L | S = ε(x⃖)) ,    (22)

for all L = 1, 2, 3, ....

We will also shortly discuss presentations to which one can or cannot synchronize—that are or are not controllable.

IV. STATE-BLOCK AND BLOCK-STATE ENTROPIES

Now we introduce two block entropies and discuss their properties; first, though, we recall some well-known results from information theory [21, Sec. 4.2].

For any stationary stochastic process, ΔH[X_0^L] is a nonincreasing sequence of nonnegative terms that converges, from above, to the entropy rate h_µ. There is a complementary result which provides an estimate of the entropy rate that converges from below. It is typically stated in terms of the Moore (state-output) type of hidden Markov model [21, Thm. 4.5.1], so we recast the theorem in terms of the Mealy (edge-output) type of hidden Markov model, used exclusively here.

Theorem 1. If R₀, R₁, ... form a stationary Markov chain and (X_i, R_{i+1}) = φ(R_i), then

    H[X_L | R₀, X_0^L] ≤ h_µ ≤ H[X_L | X_0^L] ,    (23)

for L = 0, 1, 2, ..., and

    H[X_∞ | R₀, X⃗₀] = h_µ .    (24)

Here φ need not be a deterministic mapping.

Appendix B provides the proof details. Henceforth, we refer to H[R₀, X_0^L] as the state-block entropy. We also define the block-state entropy to be H[X_0^L, R_L]. As with the state-block entropy, there is a corresponding convergence result.

Theorem 2. If R₀, R₁, ... form a stationary Markov chain and (X_i, R_{i+1}) = φ(R_i), then

    H[X_0^L, R_L] − H[X_0^{L−1}, R_{L−1}] ≤ h_µ ≤ H[X_L | X_0^L] ,    (25)

for L = 1, 2, 3, ..., and

    lim_{L→∞} ( H[X_0^L, R_L] − H[X_0^{L−1}, R_{L−1}] ) = h_µ .    (26)

Again, φ need not be a deterministic mapping.

Ref. [33] provides the proof of this theorem and discusses related results in the context of crypticity and cryptic order [24]. Note that both of these theorems hold for general presentations—not just ε-machines—and this fact serves as the motivation for our later generalizations.
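Definition 2 has the same practical signature as unifilarity, but in reverse time: each (symbol, next-state) pair must have at most one predecessor, i.e., at most one nonzero entry per column of each T^{(x)}. A minimal sketch (ours, not from the paper), reusing T_even and is_unifilar() from the sketches above:

```python
def is_counifilar(T, tol=1e-12):
    """True iff each (symbol, next-state) pair has at most one predecessor,
    i.e., H[R_t | X_t, R_{t+1}] = 0 (Definition 2)."""
    return all(
        np.count_nonzero(col > tol) <= 1
        for Tx in T.values()
        for col in np.asarray(Tx).T
    )

# The Even Process epsilon-machine happens to be counifilar as well:
# every symbol-labeled edge has a unique source state.
print(is_unifilar(T_even), is_counifilar(T_even))  # -> True True
```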
A. Convergence Hierarchies

Just as with the block entropy H[X_0^L], we will consider L-derivatives and integrals of the state-block and block-state entropies. At the first level,

    ΔH[R₀, X_0^L] ≡ H[R₀, X_0^L] − H[R₀, X_0^{L−1}] ,    (27)
    ΔH[X_0^L, R_L] ≡ H[X_0^L, R_L] − H[X_0^{L−1}, R_{L−1}] .    (28)

Higher-order derivatives are defined similarly to Eq. (12). As before, the n = 0 case is an identity operator; so, for example, Δ⁰ H[R₀, X_0^L] = H[R₀, X_0^L]. We already know—Thms. 1 and 2—that both of these quantities tend to h_µ in the large-L limit, ensuring that all higher-order derivatives tend to zero.

Now, consider the n-th state-block and block-state integrals:

    K_n = Σ_{L=n}^{∞} ( Δⁿ H[R₀, X_0^L] − lim_{ℓ→∞} Δⁿ H[R₀, X_0^ℓ] ) ,    (29)
    J_n = Σ_{L=n}^{∞} ( Δⁿ H[X_0^L, R_L] − lim_{ℓ→∞} Δⁿ H[X_0^ℓ, R_ℓ] ) .    (30)

Note that both K₀ ≥ 0 and J₀ ≥ 0 while, in contrast, I₀ ≤ 0. Also, K₁ ≤ 0 and J₁ ≤ 0 while I₁ ≥ 0. These differences are due to the fact that the block entropy is concave in L while the state-block and block-state entropies are convex.

Consider the partial sums of K₁—the state-block integral:

    K₁(L) = Σ_{ℓ=1}^{L} ( ΔH[R₀, X_0^ℓ] − h_µ )
          = H[R₀, X_0^L] − H[R₀, X_0^0] − L h_µ
          = H[X_0^L | R₀] − L h_µ .    (31)

Note that if the presentation is unifilar, then H[X_0^L | R₀] = L h_µ and K₁(L) = 0. Thus, unifilarity is a sufficient condition for K₁ = 0, but it is not a necessary one.

Now, consider the partial sums of J₁—the block-state integral:

    J₁(L) = Σ_{ℓ=1}^{L} ( ΔH[X_0^ℓ, R_ℓ] − h_µ )
          = H[X_0^L, R_L] − H[X_0^0, R₀] − L h_µ
          = H[X_0^L, R_L] − H[R_L] − L h_µ
          = H[X_0^L | R_L] − L h_µ ,    (32)

using stationarity in the form H[R₀] = H[R_L]. Similarly, if the presentation is counifilar, then it follows that H[X_0^L | R_L] = L h_µ and J₁(L) = 0. So counifilarity is a sufficient condition for J₁ = 0, but it is not a necessary one.

B. Asymptotics

Theorems 1 and 2 tell us that H[R₀, X_0^L] and H[X_0^L, R_L] are convex functions of L whose slopes limit to the entropy rate. This means that each curve converges to a linear asymptote; cf. Eq. (5):

    H[R₀, X_0^L] ∝ Y_SBE + h_µ L ,    (33)
    H[X_0^L, R_L] ∝ Y_BSE + h_µ L ,    (34)

where Y_SBE and Y_BSE are constants independent of L. The pictures one should have in mind for the growth of these entropies are those of Figs. 1, 5, 8, 11, and 14, which we will discuss in due course.

In fact, we will take this behavior as the definition of the linear asymptotes:

    Y_SBE ≡ lim_{L→∞} ( H[R₀, X_0^L] − h_µ L )
          = lim_{L→∞} ( H[R₀] + H[X_0^L | R₀] − h_µ L )
          = H[R₀] + K₁    (35)

and

    Y_BSE ≡ lim_{L→∞} ( H[X_0^L, R_L] − h_µ L )
          = lim_{L→∞} ( H[R_L] + H[X_0^L | R_L] − h_µ L )
          = H[R₀] + J₁ .    (36)

These tell us that K₁ and J₁ are not themselves the sublinear parts of the state-block and block-state entropies. This is in contrast to the corresponding result for the block entropy:

    Y_BE ≡ lim_{L→∞} ( H[X_0^L] − h_µ L )    (37)
         = lim_{L→∞} ( H[X_0^0] + H[X_0^L] − h_µ L )    (38)
         = H[X_0^0] + I₁ .    (39)

The term H[X_0^0] was dropped in the earlier partial-sum formulation—i.e., Eq. (10)—since it corresponds to no measurement being made and so is zero. It is reintroduced above, though, to complete the formal parallel to the state-block and block-state entropy cases.

The result for the block entropy was that the offset of the linear asymptote equaled I₁, the excess entropy. However, the argument just given establishes that, in fact, one should think of the first integrals as offsets from the initial values of their corresponding curves. Finally, recall that K₁ and J₁ are not greater than zero, so Y_SBE and Y_BSE are less than or equal to the presentation state entropy H[R₀].
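For a finite presentation, both joint entropies can be computed exactly by enumerating words. The sketch below (ours, not from the paper) reuses T_even, stationary(), and numpy from the sketches above, via Pr(R₀ = i, X_0^L = w) = π_i (T^{(w₀)} ⋯ T^{(w_{L−1})} 1)_i and Pr(X_0^L = w, R_L = j) = (π T^{(w₀)} ⋯ T^{(w_{L−1})})_j.

```python
from itertools import product

def H(ps):
    """Shannon entropy, in bits, of an iterable of probabilities."""
    return -sum(p * np.log2(p) for p in ps if p > 0)

def state_block_and_block_state(T, L):
    """Exact (H[R_0, X_0^L], H[X_0^L, R_L]) for a finite presentation."""
    pi = stationary(T)
    sb, bs = [], []
    for word in product(T.keys(), repeat=L):
        M = np.eye(len(pi))
        for x in word:
            M = M @ np.asarray(T[x])
        sb.extend(pi * M.sum(axis=1))  # Pr(R_0 = i, X_0^L = w)
        bs.extend(pi @ M)              # Pr(X_0^L = w, R_L = j)
    return H(sb), H(bs)

# Even Process: per Thms. 1 and 2, the slopes of both curves approach
# h_mu = 2/3 bit per symbol from below.
for L in range(1, 6):
    print(L, state_block_and_block_state(T_even, L))
```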
V. SYNCHRONIZATION

A. Duality of Synchronization and Control

Synchronization is a question about how an observer comes to know a process's (typically hidden) current internal state through observations. (Recall the picture introduced in Ref. [1].) As such, it requires a notion of state, either the process's causal state (using the ε-machine) or the state of some other presentation. In either case, we monitor the observer's uncertainty over the states R after having seen a series of measurements w = x₀x₁...x_{L−1}, using the conditional state entropy H[R | w]. When this vanishes, the observer is synchronized and we call w a synchronizing word.

During synchronization, the observer updates her answer to the question, "Which presentation states can be reached by sequence w?" When there is a unique answer, the observer is synchronized. If the eventual answer, though, is only a proper subset of presentation states, then 0 < H[R | w] ≤ H[R] and the observer can be said to be partially synchronized.

A formal treatment of synchronization appears in Refs. [34, 35], which define asymptotic synchronization as follows.

Definition 4. A presentation is weakly asymptotically synchronizing if and only if

    lim_{L→∞} H[R_L | X_0^L] = 0 .

While some processes can have synchronizing words, others have synchronizing blocks, where every word of a finite length R is a synchronizing word. Such processes are called Markov processes. The smallest such R is the Markov order [24, 36]. It turns out that the ε-machine presentation of a Markov process is exactly synchronizing [34]: for finite R,

    H[S₀ | X_0^L] = 0 , for L ≥ R .

If a process admits a presentation that is only weakly asymptotically synchronizing, though, then an observer will be in various conditions of state uncertainty until the limit L → ∞. Finitary ε-machines, as it turns out, are always weakly asymptotically synchronizing, and the state uncertainty vanishes exponentially fast [35]:

    Pr( H[S₀ | X_0^L] > 0 ) ∝ e^{−L} .

The controllability properties of a process and its models are analogous. However, now there is a designer who has built an implementation of a process. Starting from an unknown condition, the designer wishes to prepare the process in a particular state or set of states by imposing a sequence of inputs. Phrased this way, one sees that the implementation is, in effect, a presentation and the control sequence is none other than a synchronizing word. Due to this duality, we only discuss synchronization in the bulk of our development, returning at the end to briefly draw out interpretations of the results for controllability.

B. Synchronizing to the ε-Machine

We noted that the ε-machine directly gives two important information-theoretic properties—the entropy rate (h_µ) and the statistical complexity (C_µ)—and one (the excess entropy E) indirectly. The difference between C_µ and E was introduced as the crypticity [22, 23]:

    χ = C_µ − E ,    (40)

to describe how much of the internal state information (C_µ) is not locally present in observed sequences (E).
Synchronization, as we discussed, is a property of the recurrent portion of the ε-machine; since it is unifilar, if one knows its current state and follows transitions according to the word being considered, then one will always know the ε-machine's final state. However, it is also useful to consider the scenario in which one does not know the ε-machine's current state. Given no other information, the best estimate of the current state is a draw from the stationary state distribution Pr(S). Then, as each symbol is observed, one updates this belief distribution and estimates the next state from the updated distribution.

As noted above, H[S_L | X_0^L] converges to zero exponentially fast for all ε-machines with a finite number of recurrent causal states. At each L before that point, there is an uncertainty in the causal state of the ε-machine. If we add up the uncertainty at each length, then we have the synchronization information:

    S ≡ Σ_{L=0}^{∞} H[S_L | X_0^L]    (41)
      = Σ_{L=0}^{∞} ( H[X_0^L, S_L] − H[X_0^L] ) .    (42)

Importantly, the second line shows that the synchronization information can be visualized as the sum of all differences between the block-state and block entropy curves. Moreover, starting from Eq. (42) we find:

    S = Σ_{L=0}^{∞} ( H[X_0^L, S_L] − (E + L h_µ) ) − Σ_{L=0}^{∞} ( H[X_0^L] − (E + L h_µ) )    (43)
      = J₀ − I₀ .    (44)

We know that T = −I₀. Since J₀ is a separate, nonnegative information quantity, we conclude immediately that S ≥ T. This relationship is shown graphically in Fig. 1.

FIG. 1. Block entropy and block-state entropy growth for a generic finitary stationary process: It is easily seen that the synchronization information upper-bounds the transient information, T ≤ S, as T is a component of S. The Markov order R and cryptic order k are also shown in their proper relationship k ≤ R: R indicates where the block entropy H[X_0^L] meets the E + h_µ L asymptote and k, where the block-state entropy H[X_0^L, S_L] meets the same asymptote.

The cryptic order k, as defined in Ref. [24], can be interpreted as the length at which the block-state curve has converged to its asymptote E + h_µ L. Surprisingly, this is not the length at which an ε-machine can be considered synchronized, which is given by the Markov order R. Given its definition as the smallest value L for which H[S_L | X⃗₀] = 0, we see that the cryptic order can be interpreted as a measure of how far back in time the state sequence can be retrodicted from the distant future.

For example, the Even Process consists of all bi-infinite sequences that contain even-length stretches of 1s separated by at least a single 0; see Ref. [12]. This process cannot be considered synchronized at any finite length, because all the thus-far seen symbols may be 1s, and so one does not know if the latest symbol is a 1 at an even- or odd-valued location. In contrast, once a 0 has been seen, we know instantly the evenness or oddness of each preceding 1, making the cryptic order k = 0. Since the cryptic order k = 0 for the Even Process, one concludes that J₀ does not contribute to S and T = S.
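Equations (41)-(42) can be evaluated directly for a finite ε-machine. The sketch below (ours, not from the paper) accumulates H[S_L | X_0^L] = H[X_0^L, S_L] − H[X_0^L] up to a cutoff L_max, growing the word distributions one symbol at a time; the exponentially fast convergence noted above keeps the truncation error small. It reuses T_even, stationary(), H(), and numpy from the sketches above. Since the Even Process has k = 0, the result equals its transient information T.

```python
def sync_info(T, L_max=15):
    """Truncated synchronization information S = sum_L H[S_L | X_0^L]."""
    pi = stationary(T)
    dists = [pi]   # one vector per word w, with entries Pr(X_0^L = w, S_L = j)
    total = H(pi)  # L = 0 term: the full state uncertainty H[S_0]
    for _ in range(L_max):
        dists = [v @ np.asarray(Tx) for v in dists for Tx in T.values()]
        dists = [v for v in dists if v.sum() > 0]  # drop forbidden words
        joint = [p for v in dists for p in v]      # Pr(X_0^L = w, S_L = j)
        words = [v.sum() for v in dists]           # Pr(X_0^L = w)
        total += H(joint) - H(words)               # H[S_L | X_0^L], Eq. (42)
    return total

print(round(sync_info(T_even), 4))  # S, which equals T for the Even Process
```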
The two pieces—J₀ and I₀—comprising S are both finite, due to the exponentially fast convergence of the two block-entropy curves [35]. This shows that S consists of distinct information contributions drawn from different process features. Referring to Fig. 1, the lower piece, the transient information T, is information recorded due to an over-estimation of the entropy rate h_µ at block lengths L less than the Markov order R. This over-estimation is due, in effect, to L being shorter than the longest correlations in the data. In a complementary way, the upper portion J₀ can be viewed as the amount of state information that cannot be retrodicted, even given the infinite future.

The relative roles of the contributions to synchronization information can be seen clearly for one-dimensional range-R spin systems. Reference [12] claimed that for spin chains:

    S = T + (1/2) R (R + 1) h_µ ,    (45)

where R is the coupling range (Markov order) of the spin chain. This can be established rather directly, and understood for the first time, using the geometric convergence picture just introduced for S. First, Ref. [33] showed that for a spin chain H[X_0^L, S_L] is flat (zero slope) for 0 ≤ L ≤ R, after which it converges to its asymptote. Second, combining these, we have:

    J₀ = Σ_{L=0}^{R} ( H[X_0^L, S_L] − (E + L h_µ) )    (46)
       = Σ_{L=0}^{R} ( (E + R h_µ) − (E + L h_µ) )    (47)
       = Σ_{L=0}^{R} (R − L) h_µ    (48)
       = (1/2) R (R + 1) h_µ .    (49)

So, the amount of state information that cannot be retrodicted is quadratic in the Markov order.

Finally, H[X_0^L] and H[X_0^L, S_L] give lower and upper bounds on E, respectively: the first monotonically approaches E + L h_µ from below and the second monotonically approaches it from above. This way, given an ε-machine, it is simple to compute E to any accuracy required from the block entropies, which themselves can be efficiently estimated from the ε-machine. Similarly, since H[X_0^L] over-estimates the entropy rate while approaching from above and H[X_0^L, S_L] under-estimates it while approaching from below, one obtains an analogous pair of bounds on h_µ. This block-state technique for bounding the entropy rate, moreover, holds for any type of presentation of the process. (Cf. Ref. [21, Sec. 4.5].)
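The two-sided bound is easy to watch numerically. The sketch below (ours, not from the paper) reuses T_even, entropy_rate(), stationary(), H(), product, and state_block_and_block_state() from the sketches above. For the Even Process the bounds pinch toward E, which here equals C_µ ≈ 0.918 bits, consistent with the cryptic order k = 0 noted above (which forces χ = −J₁ = 0).

```python
def block_entropy_exact(T, L):
    """Exact H[X_0^L] for a finite presentation."""
    pi = stationary(T)
    ps = []
    for word in product(T.keys(), repeat=L):
        v = pi.copy()
        for x in word:
            v = v @ np.asarray(T[x])
        ps.append(v.sum())  # Pr(X_0^L = w)
    return H(ps)

h_mu = entropy_rate(T_even)
for L in range(1, 8):
    lower = block_entropy_exact(T_even, L) - L * h_mu             # -> E from below
    upper = state_block_and_block_state(T_even, L)[1] - L * h_mu  # -> E from above
    print(L, round(lower, 4), round(upper, 4))
```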
VI. PRESENTATION QUANTIFIERS

The development and results have focused, so far, on ε-machines and their information-theoretic properties. Due to the ε-machine's uniqueness, these were also properties of the corresponding processes themselves. Now, we relax the defining characteristics of ε-machines to consider generic presentations. Naturally, this destroys our ability to directly identify presentation properties with those of the process represented. A process's entropy rate (h_µ) and excess entropy (E) remain unchanged, however, since they are defined solely through its observables Pr(X⃖, X⃗). Widening our purview to generic presentations leads us to briefly introduce several new properties that capture information processing in presentations. Perhaps more distinctly, it also leads us to quantify the kinds of information in a presentation that are not characteristics of the process it represents. Section VII then provides more detailed expositions of their meaning and example processes to illustrate them.

A. Crypticity

The statistical complexity C_µ is the amount of information a process must store in order to generate future behavior. The crypticity χ is that part of C_µ not transmitted to the future: χ = C_µ − E. Roughly, it can be thought of as the irreducible overhead that arises from the process's causal structure. Reference [22] defined crypticity for ε-machines as χ = H[S₀ | X⃗₀]. Now, we generalize this to define crypticity for generic presentations.

Definition 5. The presentation crypticity χ(R) is the amount of state information shared with the past that is not transmitted to the future:

    χ ≡ I[X⃖₀ ; R₀ | X⃗₀] .    (50)

When the presentation states are causal states, this quantity reduces to the original definition—the process's crypticity. Furthermore, the crypticity is the difference between the presentation state entropy and the y-intercept of the block-state entropy curve, Eq. (34).

Theorem 3. The presentation crypticity χ(R) is the difference between the presentation state entropy H[R₀] and the sublinear part of the block-state entropy:

    χ = −J₁ .    (51)

Proof. Starting with the length-L approximation of the crypticity, we work our way to the L-th partial sum of −J₁ via a straightforward calculation:

    I[X_{−L}^L ; R₀ | X_0^L]
      = H[X_{−L}^L | X_0^L] − H[X_{−L}^L | R₀, X_0^L]    (52)
      = H[X_{−L}^L | X_0^L] − H[X_{−L}^L | R₀]    (53)
      = L h_µ − H[X_{−L}^L | R₀] + H[X_{−L}^L | X_0^L] − L h_µ    (54)
      = −J₁(L) + H[X_{−L}^L | X_0^L] − L h_µ    (55)
      = −J₁(L) + H[X_0^L | X_{−L}^L] − L h_µ    (56)
      = −J₁(L) + Σ_{j=0}^{L−1} H[X_j | X_0^j, X_{−L}^L] − L h_µ    (57)
      = −J₁(L) + Σ_{j=L}^{2L−1} H[X_j | X_0^j] − L h_µ .    (58)

Equation (53) follows because the states (in any hidden Markov model) shield the past from the future: the future is a function of the state. Equation (55) follows from the definition of J₁, and Eq. (56) from stationarity. Equation (57) follows from the chain rule for block entropies [21], and Eq. (58) from using stationarity again. Finally, we take the large-L limit. By definition, we have J₁(L) → J₁. The remaining difference converges to zero, due to a result in Ref. [35] that the conditional block entropies converge to the entropy rate faster than linearly in L.

B. Oracular Information

We now introduce a sibling of crypticity—the oracular information.

Definition 6. The oracular information is the amount of state information shared with the future that is not derived from the past:

    ζ ≡ I[R₀ ; X⃗₀ | X⃖₀] .    (59)

This new quantity is always zero for the ε-machine and is nonzero only for nonunifilar presentations. It is the difference between the presentation state entropy and the y-intercept of the state-block entropy curve, Eq. (33).

Theorem 4. The oracular information is the difference between the presentation state entropy H[R₀] and the sublinear part of the state-block entropy curve:

    ζ = −K₁ .    (60)

Proof. The proof proceeds almost identically to the corresponding result for crypticity. Namely,

    I[R₀ ; X_0^L | X_{−L}^L] = −K₁(L) + Σ_{j=L}^{2L−1} H[X_j | X_0^j] − L h_µ .    (61)

Then, taking the large-L limit proves the result.

In this sense, a positive oracular information indicates that there is a deficit in using only the rival states for prediction. More information—the oracular information—must be extracted from the presentation in order to perform optimal prediction.
C. Gauge Information

When moving away from the optimal representation afforded by a process's ε-machine, it is possible to encounter presentations containing state information that is not justified by the process's bi-infinite set of observables. We call this gauge information, to draw a parallel with the descriptional degrees of freedom that gauge theory addresses in physical systems [37].

Definition 7. The gauge information is the uncertainty in the presentation states given the entire past and future:

    ϕ ≡ H[R₀ | X⃖₀, X⃗₀] .    (62)

That is, to the extent there is uncertainty in the states even after the past and the future are known, the presentation contains state uncertainty above and beyond the process. Thus, there are components of the model that are not determined by the process; rather, they are the result of a choice of presentation.

Intuitively, gauge information can be related to the total state entropy, crypticity, oracular information, and excess entropy. Later, we will discuss information diagrams as a useful visualization tool; for now, we simply point out that one can "visually" verify the following theorem from Fig. 13.

Theorem 5. Gauge information is the difference between the state entropy and the sum of the crypticity, oracular information, and excess entropy:

    ϕ = H[R] − (χ + ζ + E) .    (63)

Proof. Since we are working with hidden Markov models, the future and past are conditionally independent given the current state. Thus, E ≡ I[X⃖; X⃗] = I[X⃖; R; X⃗]. Now, the proof proceeds as a simple verification:

    χ(L) + ζ(L) + E(L) = I[X⃖^L ; R | X⃗^L] + I[R ; X⃗^L | X⃖^L] + I[X⃖^L ; R ; X⃗^L]
                       = H[R] − H[R | X⃖^L, X⃗^L] .

So, finite-length approximations to the gauge information can be written as:

    H[R] − ( χ(L) + ζ(L) + E(L) ) = H[R | X⃖^L, X⃗^L] .

Taking the limit, we achieve our desired result.

D. Synchronization Information

As we noted, it is always possible to asymptotically synchronize to an ε-machine with a finite number of recurrent causal states. For some processes, synchronization can happen in finite time, while for others it happens only in the limit as the observation window tends to infinity. In either case, it is always true that H[S_∞ | X⃗] = 0.

When we generalize to presentations that differ from ε-machines, it is no longer true that one always synchronizes to the presentation states. In such cases, there is irreducible state uncertainty, even after observing an infinite number of symbols. This kind of state uncertainty cannot be reduced by past observations alone. Due to this, the synchronization information, as previously defined, diverges.

Definition 8. The presentation synchronization information is the total uncertainty in the presentation states:

    S ≡ Σ_{L=0}^{∞} H[R_L | X_0^L] .    (64)

We will show in Sec. VI H that this can be understood in terms of the gauge and oracular informations.

E. Cryptic Order

The cryptic order was defined in Ref. [24] as the minimum length k for which H[S_k | X⃗₀] = 0. Reference [36] shows that the cryptic order is a topological property of the irreducible sofic shift [32] describing the support of the ε-machine. However, we can understand the cryptic order geometrically as the length k_χ at which the block-state entropy H[X_0^L, S_L] reaches its asymptote; see Eq. (34).
It turns out that this concept generalizes directly to generic presentations.

Definition 9. The presentation cryptic order is the length k_χ at which the block-state entropy curve reaches its asymptote:

    k_χ ≡ min { L : H[X_0^L, R_L] = H[R₀] − χ + h_µ L } .    (65)

One would like to understand the cryptic order in terms of an explicit limit, as done for ε-machines, where the cryptic order is the minimum k for which H[S_k | X⃗₀] = 0. The obvious complication for presentations, in general, is that one might never synchronize to a particular state. However, it turns out that one can understand the presentation cryptic order in terms of one's uncertainty in the distribution over distributions of states—that is, the uncertainty in the distribution over mixed states [23, 38]. Specifically, we frame the generalized cryptic order in terms of synchronizing to distributions over presentation states. We outline the approach briefly; a detailed exposition will appear elsewhere [36].

As measurements are made, an observer's uncertainty in the state of the presentation varies. However, the pattern of variation becomes regular as more observations are made. The cryptic order, then, is understood as the number of distributions over presentation states that one cannot know with certainty from time t = 0, given the entire future. Said differently, the cryptic order is the time at which an observer becomes absolutely certain about the uncertainty in the presentation states.

F. Oracular Order

The oracular order definition parallels those of the cryptic and Markov orders.

Definition 10. The oracular order is the length k_ζ at which the state-block entropy curve reaches its asymptote:

    k_ζ ≡ min { L : H[R₀, X_0^L] = H[R₀] − ζ + h_µ L } .    (66)

It always vanishes for ε-machines. So, this new length scale is a property of the presentation only and not of the process generated by the presentation.

G. Gauge Order

The gauge order definition also parallels those of the cryptic, Markov, and oracular orders.

Definition 11. The gauge order is the length k_ϕ at which H[R₀ | X_{−L}^L, X_0^L] reaches its asymptote:

    k_ϕ ≡ min { L : H[R₀ | X_{−L}^L, X_0^L] = ϕ } .    (67)

Geometrically, we visualize the gauge order as the length at which the difference between two curves—H[X_{−L}^L, R₀, X_0^L] and H[X_{−L}^L, X_0^L]—becomes fixed at their asymptotic difference.

Theorem 6. The gauge order is the maximum of the Markov, cryptic, and oracular orders:

    k_ϕ = max { R, k_χ, k_ζ } .    (68)

Proof. The gauge information can be understood as the left-over state information after the excess entropy, crypticity, and oracular information [39] have been extracted:

    ϕ = H[R₀] − E − χ − ζ .    (69)

Thus, as soon as the observer reaches each of the Markov, cryptic, and oracular orders, the remaining state information exactly equals the gauge information.

It is important to note that, unlike the Markov, cryptic, and oracular orders, the gauge order does not indicate a scale at which an amount of information is contained. Rather, it is more nearly the opposite. The gauge order is the length scale beyond which there is no point in attempting to extract any more state information (even with an oracle), precisely because this remainder is the gauge information and, therefore, not correlated with the process language. It corresponds to what in physics one calls a gauge freedom.

H. Synchronization Order
H. Synchronization Order

As mentioned in Sec. V B, the length at which an observer has synchronized to an ε-machine is always R, the Markov order. Recall that any order-R Markov process has I[→X_R; ←X_0 | X_0^R] = 0. Synchronization to the ε-machine requires that H[S_L | X_0^L] = 0, and it is straightforward to see that this holds for L = R. As we generalize to non-ε-machine presentations, though, we must look beyond the Markov order to address the fact that one might only synchronize to distributions over presentation states.

Definition 12. The presentation synchronization order is the length k_S at which H[R_L | X_0^L] reaches its asymptote:

k_S ≡ min { L : H[R_L | X_0^L] = ϕ + ζ }.   (70)

The motivation for this definition is that the asymptote is simply the difference of the asymptotes for the block-state and block entropy curves. That is, the synchronization order can also be thought of as the length at which the state uncertainty equals the irreducible state uncertainty: ϕ + ζ = H[R_0 | ←X_0].

Now, we show that the synchronization order must occur at either the presentation cryptic order or the Markov order.

Theorem 7. The presentation synchronization order is the maximum of the Markov and presentation cryptic orders:

k_S = max { R, k_χ }.   (71)

Proof. When both the block-state and block entropy curves have reached their asymptotes, the observer will have extracted E + χ bits of state information. This leaves H[R_0] − E − χ = ϕ + ζ bits. This is exactly the irreducible state uncertainty: that which cannot be learned from the observables.

Note that for ε-machines E + χ = C_µ. So, when an observer has extracted all that can be learned about the process from the past observables, the observer has learned everything about the causal states.

When the synchronization order is finite, H[R_L | X_0^L] is fixed at the presentation's irreducible state uncertainty for all L > k_S. Then, it can be helpful to view the presentation synchronization information as consisting of two contributions:

S = Σ_{L=0}^{k_S − 1} H[R_L | X_0^L] + Σ_{L=k_S}^{∞} (ϕ + ζ).   (72)

When the synchronization order is not finite, it can be useful to interpret the synchronization information in a slightly different manner:

S = I_0 + J_0 + lim_{L→∞} (ϕ + ζ) L.   (73)

I. Synchronization Time

Reference [13] defined the synchronization time τ of a periodic process to be the average time needed to synchronize to the states. Let w = w_0 ⋯ w_{p−1} be a cyclic permutation of the word that is repeated by a periodic process having period p. It follows that

Pr(X_0^p = w) = 1/p,   (74)

since any cyclic permutation is just as likely as another. Now, while each permutation has the same probability, it is not true that each permutation is equally informative in terms of synchronization. For example, consider the process that repeats the word 00011 indefinitely. If an observer saw 01, then the observer would be synchronized. In contrast, the observer would not be synchronized if 00 had been observed instead. Reference [13] defined τ_w as the synchronization time of the cyclic permutations of w. Then,

τ = Σ_w τ_w Pr(X_0^p = w).   (75)
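As a quick illustration of Eq. (75), a small sketch of ours, not from the paper, computes τ for the period-5 example 00011: for each starting phase, τ_w is the shortest prefix length that occurs at a unique phase of the cycle.

```python
# tau for the period-5 process repeating 00011, via Eqs. (74)-(75).
# tau_w is the shortest prefix length that pins down the phase of the cycle.
word = "00011"
p = len(word)

def tau_w(phase):
    for k in range(1, p + 1):
        prefix = [word[(phase + i) % p] for i in range(k)]
        matches = sum(
            all(word[(s + i) % p] == prefix[i] for i in range(k))
            for s in range(p)
        )
        if matches == 1:        # the prefix occurs at a unique phase: synchronized
            return k
    return p

tau = sum(tau_w(s) for s in range(p)) / p   # each phase has probability 1/p
print(tau)   # 2.4 symbols
```

The phases whose observations begin 01, 11, or 10 synchronize after two symbols, while the two phases beginning 00 need a third, giving τ = 12/5 = 2.4.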
Since h_µ = 0 for all periodic processes,

Pr(X_0^p = w) = Pr(X_0^{τ_w} = w_0 ⋯ w_{τ_w − 1}).   (76)

Thus, we can rewrite τ suggestively as:

τ = Σ_w τ_w Pr(X_0^{τ_w} = w_0 ⋯ w_{τ_w − 1}).   (77)

Then, instead of summing over all cyclic permutations of w, we can just sum over the set L_sync of all minimal synchronizing words. (A word is a minimal synchronizing word if no proper prefix of the word is also synchronizing.) Now, we can extend τ to all finitary processes, not just periodic ones.

Definition 13. The process synchronization time is the average time required to synchronize to the ε-machine's recurrent causal states:

τ ≡ Σ_{w ∈ L_sync} |w| Pr(X_0^{|w|} = w).   (78)

Note that any order-R Markov process has τ ≤ R. The synchronization time gives an intuition for how long it takes to synchronize to a stochastic process.

As an example, recall the Even Process [12]. It has the property that there are arbitrarily long minimal synchronizing words. For example, 1^k 0 is a minimal synchronizing word for any k. Despite this fact, the synchronization time of the Even Process is τ = 10/3. On average, then, an observer is synchronized to the states of the ε-machine after observing between three and four symbols.

When considering more general presentations, it is not always the case that one can synchronize to the states, as τ can be infinite. Just as with the cryptic order, however, one can synchronize to distributions over the presentation states. This motivates the presentation synchronization time.

Definition 14. The presentation synchronization time is the average time required to synchronize to a recurrent distribution over presentation states.

We provide an intuitive definition here, leaving a more detailed discussion, where the notation is properly developed, for a sequel.
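To check the quoted value τ = 10/3, here is another small sketch of ours that enumerates the minimal synchronizing words of the Even Process by breadth-first search: a candidate word is pruned as soon as it synchronizes, so whatever synchronizes at length L had no synchronizing proper prefix, exactly the condition in Definition 13. The cutoff length is an arbitrary truncation.

```python
import numpy as np

# Even Process epsilon-machine: A emits 0 (1/2, -> A) or 1 (1/2, -> B); B emits 1 (-> A).
T = {0: np.array([[0.5, 0.0], [0.0, 0.0]]),
     1: np.array([[0.0, 0.5], [1.0, 0.0]])}
pi = np.array([2/3, 1/3])

def synced(alpha):
    return (alpha > 1e-12).sum() == 1   # posterior concentrated on one state

# Breadth-first over words, pruning any word whose proper prefix already
# synchronized; what survives to synchronize at length L is a minimal
# synchronizing word, contributing |w| * Pr(w) to tau as in Eq. (78).
tau, frontier = 0.0, [pi]
for L in range(1, 60):
    nxt = []
    for alpha in frontier:
        for x in (0, 1):
            a = alpha @ T[x]
            if a.sum() < 1e-15:
                continue                 # word not in the process language
            if synced(a):
                tau += L * a.sum()
            else:
                nxt.append(a)
    frontier = nxt
print(tau)   # converges to 10/3 ~ 3.3333
```

The partial sums climb to 10/3, matching the text; the surviving frontier at each length consists exactly of the all-1s words 1^k, mirroring the observation that 1^k 0 is minimal synchronizing for every k.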
VII. CLASSIFYING PRESENTATIONS

The ε-machine is frequently the preferred presentation of a process, especially when one is interested in understanding fundamental properties of the process itself. However, one might be interested in the properties of particular presentations of a process, and it would be helpful if there were a theory for them analogous to that developed for ε-machines.

To develop this, we establish a classification of a process's presentations. The classes are defined in terms of whether a presentation is nonunifilar, unifilar, weakly asymptotically synchronizable, or minimal unifilar. The result is shown in Fig. 2: the presentation classes form a nested hierarchy.

The most general type of presentation is nonunifilar, where we allow the possibility that H[R_1 | R_0, X_0] > 0. Then, unifilar presentations are the subset of nonunifilar presentations for which this quantity is exactly zero. In the unifilar class, there can be redundant states: states from which the future looks exactly the same and also states which have exactly the same histories mapping to them. When we move to weakly asymptotically synchronizable presentations, all redundant states are removed and the remaining states must induce a partition on the set of histories that is a refinement of the causal-state partition; cf. Ref. [29, Lemma 7]. Finally, minimal unifilar presentations are the ε-machines, whose partition of the pasts is the coarsest one possible.

[FIG. 2: The hierarchy of presentations of a finitary process, drawn as nested classes: Nonunifilar ⊃ Unifilar ⊃ Weakly Asymptotically Synchronizable ⊃ Minimal-Unifilar. The gray region marks the portion to which the ε-machine belongs.]

In this light, one might conclude that ε-machines are an overly restricted set of presentations. They are indeed a restricted set, but it is a restriction with purpose: The ε-machine is the unique minimal prescient presentation within the set of a process's presentations. Moreover, all of a process's properties can be determined from its ε-machine. These facts allow one to purposefully conflate properties of the ε-machine with properties of the process.

We will use an information diagram (I-diagram) [3] to analyze what happens as one relaxes the defining properties of the ε-machine presentation's random variables. With the ε-machine, we have the past ←X, the causal states S, and the future →X. As we move away from the ε-machine's causal states, we must consider in addition the rival states R.

In total, there are four random variables to consider. The full range of their possible information-theoretic relationships appears in the I-diagram of Fig. 3. However, Appendix C shows that 7 of the 15 atoms (elemental components of the multivariate information measure sigma algebra) vanish. This allows us to simplify other atoms dramatically. For example, the atom:

I[←X; S; R; →X]   (79)
  = I[←X; S; →X] − I[←X; S; →X | R]   (80)
  = I[←X; S; →X]   (81)
  = I[←X; →X] − I[←X; →X | S]   (82)
  = I[←X; →X] − ( I[←X; R; →X | S] + I[←X; →X | S, R] )   (83)
  = I[←X; →X],   (84)

where we made use of the atoms that vanish. Thus, the four-way mutual information simply reduces to the mutual information between the past and the future, the excess entropy:

I[←X; S; R; →X] = E.   (85)

[FIG. 3: The general four-variable information diagram involving ←X, S, R, and →X. The shaded light gray is the generalized crypticity χ; the yellow is the excess entropy E; the dark gray is the oracular information ζ; the hatched area is the gauge information ϕ. This is only a schematic diagram of the interrelationships; in particular, potentially infinite quantities, such as H[←X] and H[→X], are depicted with finite areas.]

Similar calculations reduce the other information measures in Fig. 3 correspondingly. We now consider these reductions in turn.
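The reductions above can be checked numerically. The sketch below, ours and not from the paper, builds the joint distribution of a length-L past, the causal state at time 0, and a length-L future for the Golden Mean ε-machine (so the rival states coincide with S) and confirms that I[←X^L; →X^L | S_0] ≈ 0; by Eq. (82) the co-information I[←X^L; S_0; →X^L] then equals I[←X^L; →X^L], which approaches E.

```python
import numpy as np
from itertools import product

p = 0.5
T = {0: np.array([[0.0, 1 - p], [0.0, 0.0]]),
     1: np.array([[p, 0.0], [1.0, 0.0]])}
pi = np.array([2/3, 1/3])

def Tw(word):
    M = np.eye(2)
    for x in word:
        M = M @ T[x]
    return M

L = 6
words = list(product((0, 1), repeat=L))
joint = np.zeros((len(words), 2, len(words)))   # Pr(past u, state s, future v)
for i, u in enumerate(words):
    past = pi @ Tw(u)                            # Pr(past = u, S_0 = s)
    for j, v in enumerate(words):
        joint[i, :, j] = past * Tw(v).sum(axis=1)  # times Pr(future = v | s)

def H(P):
    q = P[P > 1e-14]
    return float(-(q * np.log2(q)).sum())

I_uv = H(joint.sum(axis=(1, 2))) + H(joint.sum(axis=(0, 1))) - H(joint.sum(axis=1))
# I[U; V | S] = H[U, S] + H[S, V] - H[U, S, V] - H[S]
I_uv_given_s = (H(joint.sum(axis=2)) + H(joint.sum(axis=0))
                - H(joint) - H(joint.sum(axis=(0, 2))))
print(round(I_uv, 4), round(I_uv_given_s, 10))   # ~0.2516 and ~0
```

The conditional term vanishes (to numerical precision) because the joint distribution is built so that past and future are independent given the state, which is exactly the hidden-Markov property invoked in the proof of Theorem 5.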
These straight- forw ard requirements en tirely determine the form of the -mac hine I-diagram in Fig. 4. As we step through the space of presentation classes, we will see these relation- ships b ecome more complex. There are three quantities that require attention in this H [ ← − X ] H [ − → X ] C µ H [ R ] χ E FIG. 4. The information diagram for an -machine. The states of the presentation are causal states and induce a parti- tion on the past. The en tropy ov er the states, H [ R 0 ] = H [ S 0 ], defines the statistical complexity ( C µ ). The process crypticit y is the difference of the statistical complexity and the excess en trop y . figure. First, the state entrop y H [ R ] is equal to C µ —the statistical complexity . This particular state information is considered privileged as it is the state information asso- ciated with the -mac hine and so the pro cess. The excess en tropy E is the mutual information b et ween the past and future and is also exactly that information whic h 17 L [symbols] E C µ H [bits] H [ X L 0 ] H [ X L 0 , R L ] H [ R 0 , X L 0 ] E + Lh µ L [symbols] E C µ H [bits] H [ X L 0 ] H [ X L 0 , R L ] H [ R 0 , X L 0 ] E + Lh µ FIG. 5. Entrop y growth for a generic -machine. H [ X L 0 ] and H [ X L 0 , S L ] b oth conv erge to the same asymptote. H [ S 0 , X L 0 ] is linear. A B p | 1 1 − p | 0 1 | 1 FIG. 6. The -machine presen tation of the Golden Mean Pro- cess. the (causal) states con tain ab out the future. Lastly , the crypticit y χ is the amount of information “o v erhead” re- quired for prediction using the -mac hine. Generally , this o verhead is asso ciated with the presentation as well as the pro cess itself , due to the uniqueness of the -machine pre- sen tation. It is the irreducible memory asso ciated with the pro cess. At any time, the pro cess itself or a predictive mo del m ust keep trac k of C µ bits of state information, while only E bits of this information are correlated with the future. The entrop y growth plot, Fig. 5, is also simplified b y using causal states. In terms of our newly defined inte- grals: K n = 0 for all n and J 1 = H [ S ] − I 1 = χ . A simple example that illustrates all of these p oints is pro vided b y the Golden Mean Process and its -machine; see Fig. 6. When the probability p is chosen to b e 1 2 , the v alues of our information measures are C µ = log 2 (3) − 2 3 = 0 . 9183 bits, χ = 2 3 bits, and E = C µ − χ = 0 . 2516 bits. As w e explore alternate presentations, we will re- turn to this process as a common thread for explanation and intuition. B. Case: W eakly Asymptotically Synchronizable Presen tations Let’s relax the minimality constraint lea ving the -mac hines for presentations that are nonminimal unifi- lar and weakly asymptotically synchronizable. Again, H [ ← − X ] H [ − → X ] C µ H [ R ] χ E FIG. 7. The information diagram for a presentation that is weakly asymptotically synchronizable, but not necessarily minimal unifilar. The states still induce a partition on the in- finite past. The presentation crypticity χ ( R ) is the difference of the state en tropy H [ R ] ≥ C µ and the excess en tropy E . the states corresp ond to a partition of the infinite pasts, but since they are prescien t and not minimal unifilar, the partition m ust b e a refinement of the causal-state parti- tion [29]. The effect of this is benign as seen in b oth the I- diagram (Fig. 7) and the entrop y gro wth plot (Fig. 8). In Fig. 7, w e akly asymptotically synchronizabilit y ensures that H [ R| ← − X ] = 0. 
B. Case: Weakly Asymptotically Synchronizable Presentations

Let's relax the minimality constraint, leaving the ε-machines for presentations that are nonminimal unifilar and weakly asymptotically synchronizable. Again, the states correspond to a partition of the infinite pasts, but since they are prescient and not minimal unifilar, the partition must be a refinement of the causal-state partition [29].

The effect of this is benign, as seen in both the I-diagram (Fig. 7) and the entropy growth plot (Fig. 8). In Fig. 7, weak asymptotic synchronizability ensures that H[R | ←X] = 0. Demanding prescient states determines the form of the I-diagram. Figure 7 indicates that H[R] > H[S]. This is a consequence of R being a nontrivial refinement of S.

[FIG. 7: The information diagram for a presentation that is weakly asymptotically synchronizable, but not necessarily minimal unifilar. The states still induce a partition on the infinite past. The presentation crypticity χ(R) is the difference of the state entropy H[R] ≥ C_µ and the excess entropy E.]

Examining the entropy growth plot, the increased state information is reflected in the values of the block-state and state-block entropy curves at L = 0. Additionally, it is interesting to note what happens to the cryptic order. We generalized the definition of cryptic order to be the length at which the block-state entropy reaches its asymptote. Since the block-state entropy is nondecreasing, this suggests that it might be forced to reach its asymptote at a larger value of L than the cryptic order for the ε-machine presentation. We can see that this is in fact true by expanding a joint entropy in two ways. Note that we combine variables from two different presentations and expand H[X_0^L, S_L, R_L]:

H[X_0^L, R_L]
  = H[R_L | X_0^L, S_L] + H[X_0^L, S_L] − H[S_L | X_0^L, R_L]
  = H[R_L | X_0^L, S_L] + H[X_0^L, S_L]
  ≥ H[X_0^L, S_L].   (86)

In the above, we make use of the fact that R is a refinement of S (so S_L is a function of R_L and H[S_L | X_0^L, R_L] = 0) and that conditional entropies are nonnegative. This shows that the block-state curve for the nonminimal presentation lies on or above the curve for the ε-machine presentation. Since the block and block-state entropies share an asymptote, E + L h_µ, the nonminimal unifilar block-state entropy will reach its asymptote at a length greater than or equal to the process's cryptic order. More care will be required in the subsequent cases, as the relations among the entropy growth functions are more complicated.

[FIG. 8: Entropy growth for a weakly asymptotically synchronizing presentation. H[X_0^L] and H[X_0^L, R_L] both converge to the same asymptote; H[R_0, X_0^L] is linear; H[R] is larger than C_µ.]

To illustrate these class characteristics, consider the following three-state presentation of the Golden Mean Process in Fig. 9. The original causal-state partition, {A = ∗1, B = ∗0}, has become refined. (Here, ∗ denotes any allowed history.) We now have {A = ∗11, B = ∗0, C = ∗01}. It is straightforward to verify that H[R] = log_2(3) ≈ 1.585 bits. The excess entropy is unchanged, as it is a feature of the process language and not of the presentation. As illustrated in Fig. 7, the crypticity grows commensurately with H[R].

[FIG. 9: A weakly asymptotically synchronizable and nonminimal unifilar presentation of the Golden Mean Process: observing a 0 synchronizes the observer to state B.]

We have shown that for weakly asymptotically synchronizable presentations the presentation cryptic order generally will be larger than the cryptic order. It is interesting to note that it is also possible for the presentation cryptic order to surpass even the Markov order. Our three-state example (Fig. 9) is 2-cryptic, while the Markov order remains R = 1, as it too depends only on the process language.

Since the Markov order R bounds the domain of the I_0 integral and the presentation cryptic order k_χ bounds the domain of the J_0 integral, the domain of the synchronization information is bounded by max{R, k_χ}.
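The refinement's numbers can be checked directly. The wiring below is our reconstruction of Fig. 9 from its edge labels (A: 1|p → A, 0|1−p → B; B: 1|1 → C; C: 1|p → A, 0|1−p → B), so treat it as an assumption; under it, the sketch computes the stationary distribution, H[R], and the presentation crypticity χ(R) = H[R] − E.

```python
import numpy as np

# Assumed wiring of the three-state refinement (Fig. 9), with A = *11, B = *0,
# C = *01 and transition parameter p = 1/2.
p = 0.5
T = {0: np.array([[0, 1 - p, 0], [0, 0, 0], [0, 1 - p, 0]], dtype=float),
     1: np.array([[p, 0, 0], [0, 0, 1], [p, 0, 0]], dtype=float)}
M = T[0] + T[1]                        # state-to-state transition matrix

# stationary distribution: left eigenvector of M with eigenvalue 1
vals, vecs = np.linalg.eig(M.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi /= pi.sum()

H_R = float(-(pi * np.log2(pi)).sum())
E = np.log2(3) - 4/3                   # excess entropy of the Golden Mean Process
print(H_R, H_R - E)                    # log2(3) ~ 1.585 and chi(R) = 4/3
```

At p = 1/2 the stationary distribution is uniform over the three states, so H[R] = log_2(3) and χ(R) = 4/3 bits, the values that reappear in Table I.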
C. Case: Unifilar Presentations

Removing the requirement that a presentation be weakly asymptotically synchronizable, we no longer operate with (recurrent) states that correspond to a partition of the infinite pasts, but rather with a covering of the set of infinite pasts. That is, η(←x) can be multivalued, although for each ρ ∈ R, η^{−1}(ρ) is a set of pasts that is a subset of some causal state's set of pasts.

Every allowable infinite history induces at least one state in the presentation; this is the definition of an allowable infinite history. Additionally, any presentation that is not weakly asymptotically synchronizable must have a (positive measure) set of histories where each history induces more than one state.

Consider a unifilar presentation and an infinite history which induces only one state. Due to unifilarity, we can use this history to construct an infinite set of histories that are also synchronizing. We conjecture that this set of histories must have zero measure and, even stronger, that for finite-state unifilar presentations with a single recurrent component, there are no synchronizing histories.

This inability to synchronize, a product of the nontrivial covering, is represented as the information measure ϕ in Fig. 10. This information is not captured by the causal states. In fact, it is not even captured by the past (or the future). It also is not necessary for making predictions with the same power as the ε-machine. Like χ(R), ϕ is unnecessary for prediction. However, unlike χ(R), ϕ does not capture any structural property of the process. Instead, it represents degrees of freedom entirely decoupled (informationally) from the process language and prediction. For this reason, we called it the gauge information.

[FIG. 10: The information diagram for a presentation that is not weakly asymptotically synchronizable, but still unifilar. The states are prescient, but no longer induce a partition on the infinite past. Furthermore, the states contain information that the past does not contain. The presentation crypticity is the difference of the state entropy H[R_0] ≥ C_µ and the excess entropy E = I[←X_0; →X_0].]

The entropy growth plot of Fig. 11 has a new and significant feature representing the change in class. The asymptotes of the block entropy and the block-state entropy become nondegenerate. This has the effect of making the synchronization information diverge. Although this fact follows immediately from the definition of weak asymptotic synchronizability, it is instructive to see its geometric representation.

[FIG. 11: Entropy growth for a unifilar presentation that is not weakly asymptotically synchronizable. H[X_0^L] and H[X_0^L, R_L] converge to different asymptotes, E + L h_µ and H[R] − χ + L h_µ; H[R_0, X_0^L] is linear; H[R] is larger than C_µ.]

Since, from this point forward, the synchronization information is always infinite, we find it necessary to re-express what synchronization information means. Recalling Eq. (73), it can be written as the sum of a finite piece and the limit of a linear (in L) piece: S = I_0 + J_0 + lim_{L→∞} L ϕ, where the ζ term drops since ζ = 0 for unifilar presentations.
This rate of increase of the linear piece is exactly the gauge information.

It is also interesting to note that when this information is obtained, that is, when a constraint is imposed upon the descriptional degrees of freedom, unifilarity maintains synchronization as more data is produced. In this sense, acquiring gauge information is a "one-time" cost.

[FIG. 12: A unifilar, but not weakly asymptotically synchronizing, presentation of the Golden Mean Process.]

The Golden Mean Process presentation in Fig. 12 illustrates all of the features described above. It is straightforward to see that this presentation is not weakly asymptotically synchronizing. Any history, finite or infinite, induces exactly two states. This degeneracy is never broken, due to unifilarity. Rephrasing, the gauge information value, ϕ = 1 bit, derives from the fact that each infinite history induces one of two states with equal likelihood. This relies on the fact that there is no oracular information contribution (ζ = 0 bits, since the presentation is unifilar) to disentangle from the gauge information.
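The claimed behavior can be simulated. Since only the edge labels of Fig. 12 survive the extraction, the four-state wiring below is our assumed reconstruction (two Golden Mean copies sutured so that every history induces exactly two equally likely states); the sketch tracks H[R_L | X_0^L] and shows it pinned at ϕ + ζ = 1 bit, which is why the sum in Eq. (64) diverges here.

```python
import numpy as np
from itertools import product

# Assumed reconstruction of Fig. 12: A: 1|p->A, 0|1-p->B; B: 1|1->C;
# C: 1|p->C, 0|1-p->D; D: 1|1->A.  Unifilar, single recurrent component.
p = 0.5
T = {0: np.zeros((4, 4)), 1: np.zeros((4, 4))}
T[1][0, 0] = p; T[0][0, 1] = 1 - p
T[1][1, 2] = 1.0
T[1][2, 2] = p; T[0][2, 3] = 1 - p
T[1][3, 0] = 1.0
pi = np.array([1/3, 1/6, 1/3, 1/6])    # stationary state distribution

def H(d):
    d = d[d > 1e-12]
    return float(-(d * np.log2(d)).sum())

for L in (0, 1, 4, 8):
    u = 0.0
    for w in product((0, 1), repeat=L):
        a = pi.copy()
        for x in w:
            a = a @ T[x]
        if a.sum() > 1e-12:
            u += a.sum() * H(a / a.sum())
    print(L, round(u, 4))   # 1.9183 at L = 0, then exactly 1 bit for all L >= 1
```

Every observation leaves the posterior split evenly between two states ({A, C} or {B, D}), so the residual uncertainty never drops below 1 bit; under this reconstruction k_S = 1 and each further symbol adds a full bit to S.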
D. Case: Nonunifilar Presentations

Finally, we remove the requirement of unifilarity and examine the much larger, complementary space of nonunifilar presentations. Only one nonunifilar state must be present to change the class of the whole presentation. This ease of breaking unifilarity is why nonunifilar presentations form a much larger class.

Examining the I-diagram in Fig. 13, we notice one new feature: the oracular information ζ = I[R; →X | ←X] = I[R; →X | S] ≠ 0. The oracular information is a curious quantity and so deserves careful interpretation. It is the degree to which the presentation state reduces uncertainty in the future beyond that for which the past can account. One might think of this feature as "super-prescience". Not only is the information from the past being maximally utilized for prediction, but some additional information is also injected. We make several remarks about this.

[FIG. 13: The information diagram for a presentation that is not unifilar. The states are super-prescient, do not induce a partition on the past, and have information not contained in the past. The presentation crypticity is the difference of the state entropy H[R] and the excess entropy E. Note that the state entropy can also be smaller than the statistical complexity.]

It is well known that a process's nonunifilar presentations may be smaller than the corresponding ε-machine. This fact is sometimes cited [27] as providing evidence that the smaller nonunifilar presentation is the more "natural" one [40]. While it is true that the state information H[R] can be smaller than C_µ, and in fact often is, the I-diagram makes plain the fact that oracular information must be introduced to determine R and, thus, to make a super-prescient prediction. For this reason, unless one is transparent about allowing for oracular information, it is not appropriate to make a judgment about the naturalness of nonunifilar presentations.

Given that we do not have the luxury of access to an oracle, we might like to know how these presentations perform without this information. The nonoracular part of I[R; →X] is simply E. That is, without the oracular information, we predict just as we would with any other prescient presentation. However, the predictions are made using distributions over states rather than individual states. (The former are the mixed states of Ref. [23].) More importantly, as we continue to make predictions, the state distribution evolves through a series of distributions. These distributions are in 1-to-1 correspondence with the causal states of the ε-machine. And so, for a nonoracular user of a nonunifilar presentation to communicate her history-induced state to another requires the transmission of C_µ bits. The statistical complexity is inescapable as the (nonoracular) information storage of the process.

When discussing nonweakly asymptotically synchronizable, but unifilar, presentations, we indicated that the gauge information was a "one-time" cost. Now, we ask the same question of the two informations, gauge and oracular, that are not products of the past. Since we no longer have unifilarity, state uncertainty is dynamically reintroduced as synchronization is lost. That is, nonunifilar presentations are allowed to locally resynchronize following the introduction of state uncertainty. The net result is that over time synchronization is repeatedly lost and reacquired.

The entropy growth plot of Fig. 14 makes one last adjustment to acknowledge the change in class. For the first time, the state-block entropy is nonlinear. It approaches its asymptote from above and, moreover, the asymptote is independent of the block-state asymptote. The projection back onto the y-axis mirrors our final and most general I-diagram of Fig. 13. The left panel emphasizes that, in general, the crypticity χ(R) can be less than the oracular information ζ.

[FIG. 14: Entropy growth for a nonunifilar presentation. Left: H[X_0^L] and H[X_0^L, R_L] converge to different asymptotes; H[R_0, X_0^L] is not linear; and H[R] is larger than C_µ. Right: the same as on the left, but illustrating that χ can be less than ζ.]

[FIG. 15: A nonunifilar presentation of the Golden Mean Process.]

A nonunifilar presentation of the Golden Mean Process is shown in Fig. 15. All of the above-mentioned quantities are nonzero for this presentation: For p = 1/2, the crypticity χ(R) = 1/3 bits, the gauge information ϕ = 1 bit, and the oracular information ζ = 1/3 bits. The value of the gauge information (1 bit) is easy to understand. It indicates that the nonunifilar presentation is two copies of a unifilar presentation of the Golden Mean Process sutured together. All of history space is covered twice, and the choice of which component of the cover is visited is a fair coin flip. The crypticity and the oracular information (crypticity's time-reversed analog) are equal, due to the nonunifilar presentation respecting the time-reverse symmetry of the Golden Mean Process [23].
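The "lost and reacquired" dynamic is easy to see in simulation. The two-state machine below is an illustrative toy of our own, not the Fig. 15 presentation: symbol 0 is emitted only from state B and resynchronizes the observer, while a 1 emitted from state A may lead to either A or B and so reinjects state uncertainty. Tracking the observer's posterior along a sampled path shows the state entropy repeatedly collapsing to zero and jumping back up.

```python
import numpy as np

rng = np.random.default_rng(42)
# Toy nonunifilar machine (ours, for illustration only): T[x][i, j] is
# Pr(emit x, go to j | state i).  Symbol 1 from state A is nonunifilar.
T = {0: np.array([[0.0, 0.0], [0.5, 0.0]]),
     1: np.array([[0.5, 0.5], [0.0, 0.5]])}

def H(d):
    d = d[d > 1e-12]
    return float(-(d * np.log2(d)).sum())

state = 0
post = np.array([0.5, 0.5])             # observer's belief over hidden states
for t in range(20):
    # sample (symbol, next state) from the current hidden state
    flat = np.array([T[x][state, j] for x in (0, 1) for j in (0, 1)])
    k = rng.choice(4, p=flat)
    x, state = divmod(k, 2)
    post = post @ T[x]                   # Bayesian filtering update
    post /= post.sum()
    print(t, x, f"H[state | observations] = {H(post):.3f}")
```

A typical run prints entropy 0.000 immediately after each 0 and positive values during runs of 1s: synchronization is locally regained and then lost again, exactly the alternation described above.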
VIII. CONCLUSIONS

Our development started out discussing synchronization and control. The tools required to address these, the block-state and state-block entropies, quickly led to a substantially enlarged view of the space of competing models, the rival presentations, and a new collection of information measures that reflect their subtleties and differences.

As milestones along the way, we gave example presentations of the well-known Golden Mean Process that went from the ε-machine to a nonminimal, nonsynchronizing, nonunifilar presentation. Table I summarizes the quantitative results. It gives the entropy rate h_µ, statistical complexity C_µ, excess entropy E, and crypticity χ for the process itself. Immediately following, it compares the analogous measures for the range of presentations considered. In addition, the gauge information ϕ and the oracular information ζ, being properties of presentations, are added. Careful study of the table shows how the measures track the presentations' structural changes.

A few comments are in order about the tools the development required. The first were the block-state and state-block entropies, as noted. Analyzing their word-length convergence properties was the backbone of the approach, one directly paralleling the previously introduced entropy convergence hierarchy [12]. Another important tool was the I-diagram. While it is not necessary for establishing the final results, it is immensely helpful in organizing one's thinking and in managing the complications of multivariate information measures. Methodologically speaking, the principal subject was the four-variable (past, future, causal state, and presentation state) I-diagram with its sigma algebra of 15 atoms. Thus, the methodology of the development turned on just two tools: block entropy convergence and presentation information measures.

As for the concrete results, we showed that there are two mechanisms operating in processes that are hard to synchronize to, as measured by the synchronization information, which consists of two corresponding independent contributions. The first is the transient information, which reflects entropy-rate overestimates that occur at small block lengths. The second, new here, reflects the state information that is not retrodictable using the future. With these two contributions laid out, the general connection between synchronization and transient information, previously introduced in Ref. [12], became clear. We also pointed out that the synchronization information for nonsynchronizing presentations can diverge. This, in turn, called for a generalized definition of synchronization appropriate to all presentations.

We also generalized the process crypticity, beyond the domain of ε-machine optimal presentations, to describe the amount of presentation state information that is shared with the past but not transmitted to the future. As a sibling of the crypticity, we introduced a new information measure for generic presentations, the oracular information, which is the amount of state information shared with the future but not derivable from the past. Finally, to account for "components", either explicitly or implicitly included in a presentation, that are not justified by the process statistics, we introduced the gauge information, intentionally drawing a parallel to the concept of gauge degrees of freedom familiar from physics.

One immediate result was that the information measures allowed us to delineate the hierarchy of a process's presentations.
The hierarchy goes from the unique, minimal unifilar, optimal predictor (the ε-machine) to nonminimal unifilar, weakly asymptotically synchronizing presentations, and then to nonsynchronizing, unifilar presentations. We showed these are nested classes. Stepping outside to the nonunifilar presentations leaves one in a markedly larger class for which all of the information measures play a necessary role.

We trust that the presentation hierarchy makes the singular role of the ε-machine transparent. First, the ε-machine's minimality and uniqueness are those of the corresponding process. This cannot be said for alternative presentations. Second, there is a wide range of properties that can be efficiently calculated from the ε-machine, where alternative presentations may preclude this. One cannot calculate a process's stored information (C_µ) or information production rate (h_µ) from, for example, nonunifilar presentations. The latter must be converted, either directly or indirectly, to the process's ε-machine to calculate them.

TABLE I. Comparison of information measures for presentations of the Golden Mean Process with transition parameter p = 1/2.

Process        h_µ    C_µ              E                χ
Golden Mean    2/3    log_2(3) − 2/3   log_2(3) − 4/3   2/3

Presentation      H[X|R]   H[R]             I[R; →X]        χ(R)   ϕ   ζ
ε-Machine         h_µ      C_µ              E               χ      0   0
Synchronizable    h_µ      log_2(3)         E               4/3    0   0
Unifilar          h_µ      log_2(3) + 1/3   E               5/3    1   0
Nonunifilar       1/3      log_2(3) + 1/3   log_2(3) − 1    1/3    1   1/3

Nonetheless, as discussed at some length in Ref. [27], in varying circumstances (limited material, inference, or compute-time resources; ready access to sources of ideal randomness; noisy implementation substrates; and the like) the ε-machine may not be how an observer should model a process. A minimal nonunifilar presentation, which is necessarily more internally stochastic than the ε-machine [29], may be preferred because it has a smaller set of states.

Recalling the duality of synchronization and control, we close by noting that essentially all of the results here apply to the setting in which an agent attempts to steer a process into desired states. The efficiency with which the control signals achieve this is reflected in the analogue of block entropy convergence. The very possibility of control has its counterparts in an implementation hierarchy that mirrors the presentation hierarchy, but with controllability instead of synchronizability.

Appendix A: Notation Change for Total Predictability

The definition of I_n in Eq. (14), the total predictability, represents a minor change in notation from Ref. [12]. (We refer to the latter as RURO, abbreviating its title.) There, the minimum L was usually n, except for n = 2, when the minimum L value was L = 1 instead. One reason for the change in definition is that I_2 now does not depend on any assumption (prior) for the symbol entropy rate and depends only on asymptotic properties of the process. To make this explicit, note that the original definition of total predictability contained a boundary term:

G_RURO = Δ²H(1) + Σ_{L=2}^{∞} Δ²H(L),   (A1)

where

Δ²H(1) = h_µ(1) − h_µ(0) = H(1) − log_2 |A|.   (A2)

The logarithm term characterizes the entropy-rate estimate before any probabilities are considered. In the modified definition of total predictability, we drop the boundary term, giving:

G ≡ I_2 = Σ_{L=2}^{∞} Δ²H(L).   (A3)
The two quantities are related by:

G_RURO = G + Δ²H(1)   (A4)
       = G + H(1) − log_2 |A|.   (A5)

This affects relationships involving G. Previously, for example,

G_RURO = −R ≤ 0,   (A6)

where R is the total redundancy. Now,

G = −R − Δ²H(1)   (A7)
  = log_2 |A| − H(1) − R.   (A8)

Appendix B: State-Block Entropy Rate Estimate

In this section, we prove Thm. 1, which states that H[X_L | R_0, X_0^L] converges monotonically (nondecreasing) to the entropy rate.

Proof. First, we show that the differences of H[R_0, X_0^L] form a nondecreasing sequence:

H[X_{L−1} | R_0, X_0^{L−1}]   (B1)
  = H[X_L | R_1, X_1^L]   (B2)
  = H[X_L | R_0, X_0, R_1, X_1^L]   (B3)
  ≤ H[X_L | R_0, X_0, X_1^L]   (B4)
  = H[X_L | R_0, X_0^L].   (B5)

Next, we show this sequence is bounded and, thus, has a limit. For all k ≥ 0, we have:

H[X_L | R_0, X_0^L]   (B6)
  = H[X_L | X_{−k}^k, R_0, X_0^L]   (B7)
  ≤ H[X_L | X_{−k}^{L+k}]   (B8)
  = H[X_{L+k} | X_0^{L+k}].   (B9)

Since this holds for all k, it also holds in the limit as k tends to infinity, which is the definition of the entropy rate. Thus, H[X_L | R_0, X_0^L] is a nondecreasing sequence bounded above by h_µ.

Finally, we show that this bounded sequence converges to h_µ. To do this, we show that the difference H[X_L | X_0^L] − H[X_L | R_0, X_0^L] = I[R_0; X_L | X_0^L] converges to zero. Then, since the first term (the differences of the block entropies) is known to converge to the entropy rate, the claim is proved. We have:

H[R_0] ≥ lim_{L→∞} I[R_0; X_0^L, X_L]   (B10)
       = lim_{L→∞} Σ_{k=0}^{L} I[R_0; X_k | X_0^k].   (B11)

Since the sum is bounded, the terms must tend to zero.
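The monotone convergence is easy to watch numerically. The sketch below, ours and not the paper's, reuses the toy nonunifilar machine from the Sec. VII D illustration (for an ε-machine the sequence is constant at h_µ, which is less instructive) and prints the successive differences H[X_{L−1} | R_0, X_0^{L−1}] = H[R_0, X_0^L] − H[R_0, X_0^{L−1}], which should be nondecreasing and approach the entropy rate from below.

```python
import numpy as np
from itertools import product

# Toy nonunifilar machine from the earlier illustration.
T = {0: np.array([[0.0, 0.0], [0.5, 0.0]]),
     1: np.array([[0.5, 0.5], [0.0, 0.5]])}
pi = np.array([0.5, 0.5])              # stationary state distribution

def H(probs):
    q = np.asarray(probs, float)
    q = q[q > 1e-14]
    return float(-(q * np.log2(q)).sum())

def state_block(L):
    """H[R_0, X_0^L] by enumerating the initial state and the word."""
    probs = []
    for s in (0, 1):
        for w in product((0, 1), repeat=L):
            a = pi[s] * np.eye(2)[s]
            for x in w:
                a = a @ T[x]
            probs.append(a.sum())
    return H(probs)

prev = state_block(0)
for L in range(1, 12):
    cur = state_block(L)
    print(L - 1, round(cur - prev, 5))   # nondecreasing, -> h_mu from below
    prev = cur
```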
Appendix C: Reducing the Presentation I-Diagram

Proving that the various multivariate information measures vanish makes use of a few facts about states:

• H[S | ←X] = 0.
• I[←X; →X | S] = 0.
• I[←X; →X | R] = H[→X | R] − H[→X | R, ←X] = 0.

The last follows from limiting ourselves to states that actually generate the process. Thus, additional conditioning on the past cannot influence the future, as the current state alone determines the distribution over futures.

The following atoms vanish:

• H[S | ←X, R, →X]:
  H[S | ←X, R, →X] ≤ H[S | ←X] = 0.

• I[S; R | ←X, →X]:
  I[S; R | ←X, →X] = H[S | ←X, →X] − H[S | ←X, R, →X]
                   = H[S | ←X, →X] − 0
                   ≤ H[S | ←X] = 0.

• I[S; R; →X | ←X]:
  I[S; R; →X | ←X] = I[S; R | ←X] − I[S; R | ←X, →X]
                   = I[S; R | ←X] − 0
                   = H[S | ←X] − H[S | R, ←X]
                   = 0 − H[S | R, ←X].
  Finally, note that H[S | R, ←X] ≤ H[S | ←X] = 0.

• I[S; →X | ←X, R]:
  I[S; →X | ←X, R] = H[S | ←X, R] − H[S | ←X, →X, R]
                   = H[S | ←X, R] − 0
                   ≤ H[S | ←X] = 0.

• I[←X; →X | S, R]:
  I[←X; →X | S, R] = H[→X | S, R] − H[→X | S, R, ←X]
                   = H[→X | S, R] − H[→X | S, R] = 0.

• I[←X; R; →X | S]:
  I[←X; R; →X | S] = I[←X; →X | S] − I[←X; →X | S, R] = 0.

• I[←X; S; →X | R]:
  I[←X; S; →X | R] = I[←X; →X | R] − I[←X; →X | S, R] = 0.

The first four vanish due to the causal states being a function of the past. The last three vanish since any presentation that generates the process captures all the information shared between past and future.

ACKNOWLEDGMENTS

This work was partially supported by the DARPA Physical Intelligence Program. The authors thank Dave Feldman, Nick Travers, and Luke Grecki for helpful comments on the manuscript.

[1] J. P. Crutchfield and D. P. Feldman. Synchronizing to the environment: Information theoretic limits on agent learning. Adv. in Complex Systems, 4(2):251–264, 2001.
[2] S. Wiggins. Chaotic Transport in Dynamical Systems. Springer, New York, 1992.
[3] R. W. Yeung. A new outlook on Shannon's information measures. IEEE Trans. Info. Th., 37(3):466–474, 1991.
[4] J. Klamka. Controllability of Dynamical Systems. Springer, New York, 1991.
[5] R. J. Elliott, L. Aggoun, and J. B. Moore. Hidden Markov Models: Estimation and Control. Springer, New York, 1994.
[6] B. R. Andrievskii and A. L. Fradkov. Control of chaos: Methods and applications. I. Methods. Automation and Control, 64(5):673–713, 2004.
[7] B. R. Andrievskii and A. L. Fradkov. Control of chaos: Methods and applications. II. Applications. Automation and Control, 65(4):505–533, 2004.
[8] J. M. Gonzalez-Miranda. Synchronization and Control of Chaos: An Introduction for Scientists and Engineers. World Scientific, Singapore, 2004.
[9] A. Pikovsky, M. Rosenblum, and J. Kurths. Synchronization: A Universal Concept in Nonlinear Sciences. Cambridge Nonlinear Science Series. Cambridge University Press, New York, 2001.
[10] N. Jonoska. Sofic shifts with synchronizing presentations. Theo. Comp. Sci., 158:81–115, 1996.
[11] S. Strogatz. Sync: The Emerging Science of Spontaneous Order. Hyperion, New York, 2003.
[12] J. P. Crutchfield and D. P. Feldman. Regularities unseen, randomness observed: Levels of entropy convergence. CHAOS, 13(1):25–54, 2003.
[13] D. P. Feldman and J. P. Crutchfield. Synchronizing to periodicity: The transient information and synchronization time of periodic sequences. Advances in Complex Systems, 7(3-4):329–355, 2004.
[14] H. Marko. The bidirectional communication theory: A generalization of information theory. IEEE Trans. Comm., COM-21(12):1345–135, 1973.
[15] X. Feng, K. A. Loparo, and Y. Fang. Optimal state estimation for stochastic systems: An information theoretic approach. IEEE Trans. Auto. Control, 42(6):771–785, 1997.
[16] N. U. Ahmed. Linear and Nonlinear Filtering for Engineers and Scientists. World Scientific Publishers, Singapore, 1998.
[17] N. H. Packard, J. P. Crutchfield, J. D. Farmer, and R. S. Shaw. Geometry from a time series. Phys. Rev. Lett., 45:712, 1980.
[18] F. Takens. Detecting strange attractors in fluid turbulence. In D. A. Rand and L. S. Young, editors, Symposium on Dynamical Systems and Turbulence, volume 898, page 366, Berlin, 1981. Springer-Verlag.
[19] E. Ott, B. R. Hunt, I. Szunyogh, A. V. Zimin, E. J. Kostelich, M. Corazza, E. Kalnay, D. J. Patil, and J. A. Yorke. Estimating the state of large spatio-temporally chaotic systems. Physics Letters A, 330:365–370, 2004.
[20] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, New York, 2005.
[21] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, New York, second edition, 2006.
[22] J. P. Crutchfield, C. J. Ellison, and J. R. Mahoney. Time's barbed arrow: Irreversibility, crypticity, and stored information. Phys. Rev. Lett., 103(9):094101, 2009.
[23] C. J. Ellison, J. R. Mahoney, and J. P. Crutchfield. Prediction, retrodiction, and the amount of information stored in the present. J. Stat. Phys., 136(6):1005–1034, 2009.
[24] J. R. Mahoney, C. J. Ellison, and J. P. Crutchfield. Information accessibility and cryptic processes. J. Phys. A: Math. Theo., 42:362002, 2009.
[25] J. P. Crutchfield and K. Young. Inferring statistical complexity. Phys. Rev. Lett., 63:105–108, 1989.
[26] A process's causal states consist of both transient and recurrent states. To simplify the presentation, we henceforth refer only to recurrent causal states.
[27] J. P. Crutchfield. The calculi of emergence: Computation, dynamics, and induction. Physica D, 75:11–54, 1994.
[28] J. P. Crutchfield and C. R. Shalizi. Thermodynamic depth of causal states: Objective complexity via minimal representations. Phys. Rev. E, 59(1):275–283, 1999.
[29] C. R. Shalizi and J. P. Crutchfield. Computational mechanics: Pattern and prediction, structure and simplicity. J. Stat. Phys., 104:817–879, 2001.
[30] In the theory of computation, unifilarity is referred to as "determinism" [41].
[31] Specifically, each transition matrix T(x) has, at most, one nonzero component in each row.
[32] D. Lind and B. Marcus. An Introduction to Symbolic Dynamics and Coding. Cambridge University Press, New York, 1995.
[33] J. R. Mahoney, C. J. Ellison, Ryan G. James, and J. P. Crutchfield. In preparation, 2010. [cond-mat].
[34] N. Travers and J. P. Crutchfield. Exactly synchronizing to finite-state sources. 2010. SFI Working Paper 10-06-XXX; arxiv.org:10XX.XXXX [XXXX].
[35] N. Travers and J. P. Crutchfield. Asymptotically synchronizing to finite-state sources. 2010. SFI Working Paper 10-06-XXX; arxiv.org:10XX.XXXX [XXXX].
[36] Ryan G. James, J. R. Mahoney, C. J. Ellison, and J. P. Crutchfield. In preparation, 2010. [cond-mat].
[37] P. H. Frampton. Gauge Field Theories. Wiley-VCH Verlag, Weinheim, 2008.
[38] D. R. Upper. Theory and Algorithms for Hidden Markov Models and Generalized Hidden Markov Models. PhD thesis, University of California, Berkeley, 1997. Published by University Microfilms Intl, Ann Arbor, Michigan.
[39] Oracular information cannot be extracted from the past observables. This point will be discussed further in Sec. VII.
[40] Similar observations appeared recently; for example, see Ref. [42]. In a sequel we compare this to the earlier results of Refs. [27, 38].
[41] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading, 1979.
[42] W. Loehr and N. Ay. Non-sufficient memories that are sufficient for prediction. In J. Zhou, editor, Complex Sciences 2009, volume 4 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pages 265–276. Springer, New York, 2009.