Some Results on the Information Loss in Dynamical Systems


Authors: Bernhard C. Geiger, Gernot Kubin

Bernhard C. Geiger, Gernot Kubin
Signal Processing and Speech Communication Laboratory, Graz University of Technology, Austria
{geiger,gernot.kubin}@tugraz.at

Abstract—In this work we investigate the information loss in (nonlinear) dynamical input-output systems and provide some general results. In particular, we present an upper bound on the information loss rate, defined as the (non-negative) difference between the entropy rates of the jointly stationary stochastic processes at the input and output of the system. We further introduce a family of systems with vanishing information loss rate. It is shown that not only linear filters belong to that family, but, under certain circumstances, also finite-precision implementations of the latter, which typically consist of nonlinear elements.

I. INTRODUCTION

Transmission and processing of information is the primary concern in many fields of communications, signal processing, and machine learning. The typical impairments considered in these contexts are noise and interference, incomplete data sets, and coarse observations, eliciting both information-theoretic and energy-centered analyses. In contrast, the effect of deterministic input-output systems on the information content, i.e., the entropy rate, of a signal has not yet been thoroughly analyzed. Still, nonlinear dynamical systems, which are capable of changing information content, are omnipresent in communication systems in the roles of high-power amplifiers or frequency mixers. Another example is the energy detector, a low-complexity receiver architecture for wireless communications. To obtain a better understanding of the effects of these system components, an information-theoretic treatment is essential.
In this paper, we establish a framework for analyzing the effects of discrete-time dynamical systems with a finite-dimensional state vector on the entropy rate of a signal. While the analysis of continuous-valued stochastic processes is left for future work, here we focus on (jointly) stationary input and output processes taking values from countable alphabets. The data processing inequality (DPI, [1, pp. 35]) states that the entropy of a discrete random variable (RV) cannot increase by passing the RV through a static nonlinearity. It was shown that the same result holds for entropy rates of jointly stationary stochastic processes on finite alphabets, both for static nonlinearities [2] and general dynamical systems [3]. Continuous-valued processes passing through linear filters were already analyzed by Shannon in terms of differential entropy rates [4], [5], which in our opinion are not adequate measures of information loss, cf. Section V. The conditional entropy, used to characterize the information lost by passing a continuous RV through a static nonlinearity [6] or by multiplying two integers [7], appears to be more appropriate.

We start by defining the information loss rate in Section II and show that this quantity is equal to the difference between the entropy rates of the input and output processes. This choice establishes the DPI for dynamical systems in Section III, stating that the information loss rate is non-negative. This result is then complemented by an upper bound that can be evaluated easily. In Section IV we introduce a family of dynamical systems for which we show that the information loss rate vanishes. This family not only comprises a large class of stable linear filters (see Section V), but also their finite-precision counterparts, commonly used in digital signal processing.
Aside from the latter, Section VI discusses some other examples illustrating our theoretical results. This document is an extended version of a paper submitted to an IEEE conference.

II. PROBLEM STATEMENT & PRELIMINARIES

We consider a discrete-time regular two-sided stationary stochastic process $\mathbf{X}$ taking values from a countable set $\mathcal{X}$. Let $X_n$ denote the RV of the $n$-th sample and let $X_k^n = (X_k, X_{k+1}, \dots, X_n)$; thus $\mathbf{X} = X_{-\infty}^{\infty}$. For the actual value of $X_n$ we write $x_n$. We further consider another countable set $\mathcal{Y}$ which need not be identical to $\mathcal{X}$. Let $H(X_n)$ denote the zeroth-order entropy of $X_n$ and let $\bar{H}(\mathbf{X}) = \lim_{n\to\infty} \frac{1}{n} H(X_1^n)$ denote the entropy rate of $\mathbf{X}$. The restriction to countable sets ensures that entropies and entropy rates are well-defined. The following class of dynamical systems is treated in this work:

Definition 1 (Finite-Dimensional Dynamical System). Let $Y_n = f(X_{n-N}^n, Y_{n-M}^{n-1})$, $0 \leq M, N < \infty$, be the RV of the $n$-th output sample of a dynamical system with a finite-dimensional state vector subject to the input process $\mathbf{X}$. Here, $f\colon \mathcal{X}^{N+1} \times \mathcal{Y}^M \to \mathcal{Y}$ is a function such that the sequence of output samples, $Y_n$, constitutes a two-sided stochastic process $\mathbf{Y}$ jointly stationary with $\mathbf{X}$.

Definition 2 (Information Loss Rate). Let $\mathbf{X}$ and $\mathbf{Y}$ be jointly stationary processes on countable sets related as in Definition 1. The average information lost per sample is given by the conditional entropy rate

$\bar{H}(\mathbf{X}|\mathbf{Y}) = \lim_{n\to\infty} \frac{1}{n} H(X_1^n | Y_1^n)$. (1)

Characterizing the information loss as a conditional entropy rate is quite intuitive: the conditional entropy rate denotes the average number of bits per sample unknown about the input sequence after observing the output sequence, i.e., the average information lost per sample by passing the sequence through the system in question.
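As a quick numerical illustration of Definition 2 (a toy sketch, not part of the paper): for an iid input and a static map ($M = N = 0$), the conditional entropy rate reduces to the per-sample conditional entropy, which can be computed directly from the joint distribution. The alphabet and the map below are assumptions chosen only for illustration.

```python
import numpy as np

# iid input, uniform on {0,1,2,3}; static system Y = X mod 2 (M = N = 0)
alphabet = [0, 1, 2, 3]
p_x = {x: 0.25 for x in alphabet}
f = lambda x: x % 2

# joint distribution Pr(X = x, Y = y) induced by the deterministic map
p_xy = {(x, f(x)): p for x, p in p_x.items()}
p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0.0) + p

# H(X|Y) = H(X,Y) - H(Y); for a deterministic map, H(X,Y) = H(X)
H = lambda dist: -sum(p * np.log2(p) for p in dist.values() if p > 0)
loss = H(p_xy) - H(p_y)   # information lost per sample, in bits
print(loss)  # 1.0: two bits enter, one bit leaves
```

Here one bit per sample is lost, which also equals the difference of the entropy rates of input and output, anticipating Theorem 1.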
Before proceeding with the analysis, we introduce two lemmas:

Lemma 1. For any set of discrete RVs $Z_1^n$ and any function $f(Z_k, Z_l, \dots)$, $1 \leq k, l, \dots \leq n$, the following holds:

$H(Z_1^n, f(Z_k, Z_l, \dots)) = H(Z_1^n)$ (2)

Proof: See [1, Prob. 2.4].

Lemma 2. Let $\mathbf{X}$ and $\mathbf{Y}$ be jointly stationary stochastic processes on countable sets. Then, for $M < \infty$,

$\bar{H}(\mathbf{X}) = \lim_{n\to\infty} \frac{1}{n} H(X_1^n | Y_1^M) = \lim_{n\to\infty} \frac{1}{n} H(X_1^n, Y_1^M)$.

Proof: Clearly,

$H(X_1^n | Y_1^M) \leq H(X_1^n) \leq H(X_1^n, Y_1^M)$ (3)

for all $n$, and thus also in the limit. Now, since $H(X_1^n, Y_1^M) = H(X_1^n | Y_1^M) + H(Y_1^M)$ and since all involved entities are non-negative,

$\bar{H}(\mathbf{X}) \leq \lim_{n\to\infty} \frac{1}{n} H(X_1^n | Y_1^M) + \underbrace{\lim_{n\to\infty} \frac{1}{n} H(Y_1^M)}_{\to 0}$. (4)

Thus in the limit the upper and lower bounds are equal and the proof is completed.

Since the input and output alphabets of the dynamical systems can be countable, it may occur that the entropy of a single sample becomes infinite. Yet, by the maximum entropy property of the uniform distribution,

$H(Y_1^M) \leq M H(Y) \leq \lim_{|\mathcal{Y}|\to\infty} M \log|\mathcal{Y}|$ (5)

which approaches infinity at a slower rate than $\lim_{n\to\infty} n$. Thus the term on the right in (4) approaches zero even for processes $\mathbf{Y}$ with infinite zeroth-order entropy or infinite entropy rate.

III. INFORMATION LOSS RATE IN DYNAMICAL SYSTEMS

In this section, which comprises the main contribution of this work, we present some general results on the information loss rate induced by a system satisfying Definition 1. We start by proving a theorem which essentially states that the information loss rate is identical to the difference of entropy rates:

Theorem 1. Let $\mathbf{X}$ and $\mathbf{Y}$ be jointly stationary processes on countable sets related as in Definition 1.
Then, the information loss rate is given by the difference of entropy rates:

$\bar{H}(\mathbf{X}|\mathbf{Y}) = \bar{H}(\mathbf{X}) - \bar{H}(\mathbf{Y})$ (6)

Proof: While the proof for static functions (i.e., $M = N = 0$) is relatively simple [2], for dynamical systems we have to show that

$\bar{H}(\mathbf{X}|\mathbf{Y}) = \lim_{n\to\infty} \frac{1}{n}\left( H(X_1^n, Y_1^n) - H(Y_1^n) \right)$ (7)
$= \lim_{n\to\infty} \frac{1}{n} H(X_1^n) - \lim_{n\to\infty} \frac{1}{n} H(Y_1^n)$ (8)

i.e., that

$\lim_{n\to\infty} \frac{1}{n} H(X_1^n, Y_1^n) = \lim_{n\to\infty} \frac{1}{n} H(X_1^n)$. (9)

Consider that, for $n > \max\{M, N\}$,

$H(X_1^n, Y_1^n) = H(Y_n, X_1^n, Y_1^{n-1})$ (10)
$= H(f(X_{n-N}^n, Y_{n-M}^{n-1}), X_1^n, Y_1^{n-1})$ (11)
$\overset{(a)}{=} H(X_1^n, Y_1^{n-1})$ (12)

where $(a)$ is due to Lemma 1. By repeated application,

$H(X_1^n, Y_1^n) = H(X_1^n, Y_1^{\max\{M,N\}})$. (13)

Since this holds for all $n > \max\{M, N\}$, it also holds in the limit, and with Lemma 2 we obtain

$\lim_{n\to\infty} \frac{1}{n} H(X_1^n, Y_1^{\max\{M,N\}}) = \lim_{n\to\infty} \frac{1}{n} H(X_1^n)$ (14)

and thus

$\bar{H}(\mathbf{X}|\mathbf{Y}) = \bar{H}(\mathbf{X}) - \bar{H}(\mathbf{Y})$. (15)

This completes the proof.

The significance of this theorem lies in the fact that the information loss can be inferred by comparing the entropy rates of the input and output processes. Note that the same does not hold for differential entropy rates, as we will argue in Section V. By the non-negativity of the conditional entropy rate, the following corollary to Theorem 1 shows that the entropy rate of the system output cannot be larger than the entropy rate of the system input. This result, originally stated in [3] for finite alphabets, further justifies our intuitive definition of information loss:

Corollary 1 (DPI for Dynamical Systems). Let $\mathbf{X}$ and $\mathbf{Y}$ be jointly stationary processes on countable sets related as in Definition 1. Then, the entropy rate of the output process $\mathbf{Y}$ cannot be larger than the entropy rate of the input process $\mathbf{X}$, i.e.,

$\bar{H}(\mathbf{Y}) \leq \bar{H}(\mathbf{X})$. (16)
Generally, the computation of entropy rates is a non-trivial problem, where closed-form solutions exist only for simple processes (e.g., Markov chains). Since functions of stochastic processes rarely allow such a simplified treatment, the availability of bounds is of vital importance. We thus present an upper bound on the information loss rate which is simple to evaluate:

Theorem 2 (Upper Bound). Let $\mathbf{X}$ and $\mathbf{Y}$ be jointly stationary processes on countable sets related as in Definition 1. Then, the information loss rate is bounded by

$\bar{H}(\mathbf{X}|\mathbf{Y}) \leq \max_{(x,\theta) \in \mathcal{X}\times\mathcal{T}} \log |f_\theta^{-1}[f_\theta(x)]|$ (17)

where $\mathcal{T} = \mathcal{X}^N \times \mathcal{Y}^M$, $\theta \in \mathcal{T}$ are the possible values of the RV $\Theta_n = \{X_{n-N}^{n-1}, Y_{n-M}^{n-1}\}$, and $f_\theta^{-1}[\cdot]$ denotes the preimage under $f_\theta$, an instantiation of the function $f_{\Theta_n}(\cdot) = f(\cdot, \Theta_n)$.

Proof:

$\bar{H}(\mathbf{X}|\mathbf{Y}) = \lim_{n\to\infty} \frac{1}{n}\left( H(X_1^n, Y_1^n) - H(Y_1^n) \right)$ (18)
$\overset{(a)}{=} \lim_{n\to\infty} \frac{1}{n}\left( \sum_{i=1}^n H(X_i, Y_i | X_1^{i-1}, Y_1^{i-1}) - \sum_{i=1}^n H(Y_i | Y_1^{i-1}) \right)$ (19)
$\overset{(b)}{\leq} \lim_{n\to\infty} \frac{1}{n}\left( \sum_{i=1}^n H(X_i, Y_i | X_1^{i-1}, Y_1^{i-1}) - \sum_{i=1}^n H(Y_i | X_1^{i-1}, Y_1^{i-1}) \right)$ (20)
$= \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^n H(X_i | X_1^{i-1}, Y_1^i)$ (21)

where $(a)$ is due to the chain rule of entropy and $(b)$ is due to the fact that conditioning reduces entropy. The expression under the sum in (21) is a non-negative decreasing sequence in $i$ and thus has a limit. We use the Cesàro mean [1, Thm. 4.2.3] and obtain

$\bar{H}(\mathbf{X}|\mathbf{Y}) \leq \lim_{n\to\infty} H(X_n | X_1^{n-1}, Y_1^n)$ (22)
$\leq H(X_n | X_{n-N}^{n-1}, Y_{n-M}^n)$ (23)
$= H(X_n | Y_n, \Theta_n)$. (24)

We now replace $Y_n = f(X_n, \Theta_n) = f_{\Theta_n}(X_n)$, where we treat the collection of all previous RVs influencing $Y_n$ as a (random) parameter $\Theta_n$ of the function. This approach lets us interpret the dynamical system as a parameterized static system $f_{\Theta_n}\colon \mathcal{X} \to \mathcal{Y}$, where we let $\Theta_n$ take values $\theta$ from $\mathcal{T} = \mathcal{X}^N \times \mathcal{Y}^M$.
We thus continue

$\bar{H}(\mathbf{X}|\mathbf{Y}) \leq H(X_n | f_{\Theta_n}(X_n), \Theta_n)$ (25)
$= \sum_{(x,\theta)\in\mathcal{X}\times\mathcal{T}} H(X_n | f_\theta(x), \theta)\,\Pr(X_n = x, \Theta_n = \theta)$
$\overset{(c)}{\leq} \sum_{(x,\theta)\in\mathcal{X}\times\mathcal{T}} \log|f_\theta^{-1}[f_\theta(x)]|\,\Pr(X_n = x, \Theta_n = \theta)$
$\leq \max_{(x,\theta)\in\mathcal{X}\times\mathcal{T}} \log|f_\theta^{-1}[f_\theta(x)]|$ (26)

where $(c)$ is due to conditioning and the maximum entropy property of the uniform distribution over an alphabet with size equal to the cardinality of the preimage under $f_\theta$. Maximizing over all possible $x$ and parameter values $\theta$ completes the proof.

Fig. 1. Cascade of systems: $\mathbf{X}$ enters System 1, which outputs $\mathbf{Y}$; System 2 then outputs $\mathbf{Z}$. The associated loss rates are $\bar{H}(\mathbf{X}|\mathbf{Y})$, $\bar{H}(\mathbf{Y}|\mathbf{Z})$, and $\bar{H}(\mathbf{X}|\mathbf{Z})$.

This result can be interpreted as relating the information loss rate of a dynamical system to the information loss rate induced by a static function. In particular, we let the static function be parameterized by previous input and output values taking effect on $Y_n$ and upper bound the information loss rate by the maximum cardinality of the preimage under $f_\theta$. While this upper bound may be rather conservative, it is particularly simple to evaluate if the system function from Definition 1 is available. We will illustrate the use of this result in Section VI-C.

Finally, we present a result about the cascade of systems (see Fig. 1):

Theorem 3 (Cascading Systems). Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ be jointly stationary stochastic processes on countable sets $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$, respectively, where $\mathbf{Y}$ is generated by passing $\mathbf{X}$ through a system satisfying Definition 1, and $\mathbf{Z}$ is generated by passing $\mathbf{Y}$ through another such system. Then, the information loss rate induced by the cascade is

$\bar{H}(\mathbf{X}|\mathbf{Z}) = \bar{H}(\mathbf{X}|\mathbf{Y}) + \bar{H}(\mathbf{Y}|\mathbf{Z})$. (27)

Proof: By using Theorem 1, $\bar{H}(\mathbf{X}|\mathbf{Z})$ can be written as

$\bar{H}(\mathbf{X}|\mathbf{Z}) = \bar{H}(\mathbf{X}) - \bar{H}(\mathbf{Z})$ (28)
$= \bar{H}(\mathbf{X}) - \bar{H}(\mathbf{Y}) + \bar{H}(\mathbf{Y}) - \bar{H}(\mathbf{Z})$ (29)
$= \bar{H}(\mathbf{X}|\mathbf{Y}) + \bar{H}(\mathbf{Y}|\mathbf{Z})$. (30)
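The bound of Theorem 2 can be evaluated by brute force for small systems. A minimal sketch (the system and its binary alphabet are assumptions for illustration): for $Y_n = X_n X_{n-1}$, i.e., $N = 1$, $M = 0$, the parameter is $\theta = x_{n-1}$, and $\theta = 0$ collapses the entire input alphabet onto a single output value.

```python
import math
from itertools import product

# Toy system (an assumption for illustration): Y_n = X_n * X_{n-1},
# i.e., N = 1, M = 0, so the parameter theta is the previous input sample.
X = [0, 1]                      # input alphabet
T = [(t,) for t in X]           # parameter space T = X^N

def f(x, theta):
    return x * theta[0]

def preimage_size(x, theta):
    """Cardinality |f_theta^{-1}[f_theta(x)]| of the preimage under f_theta."""
    y = f(x, theta)
    return sum(1 for xp in X if f(xp, theta) == y)

# bound (17): maximum log-cardinality over all (x, theta) pairs
bound = max(math.log2(preimage_size(x, th)) for x, th in product(X, T))
print(bound)  # 1.0 bit: for theta = 0 both inputs map to 0
```

For the same system on an alphabet excluding 0, every preimage is a singleton and the bound evaluates to zero, which anticipates the partially invertible systems of Section IV.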
IV. PARTIALLY INVERTIBLE SYSTEMS

We now impose an additional restriction on the system function in Definition 1. This additional restriction defines a family of systems for which the information loss rate can be shown to vanish.

Definition 3 (Partially Invertible System). A system satisfying Definition 1 is partially invertible if there exists a function $f_{\mathrm{inv}}\colon \mathcal{X}^N \times \mathcal{Y}^{M+1} \to \mathcal{X}$ such that

$X_n = f_{\mathrm{inv}}(X_{n-N}^{n-1}, Y_{n-M}^n) = f_{\mathrm{inv}}(Y_n, \Theta_n) = f_{\Theta_n}^{-1}(Y_n)$. (31)

In other words, a system is partially invertible if its parameterized static function $f_{\Theta_n}$ is invertible for all possible parameter values $\theta \in \mathcal{T}$. We will now argue that for this class of systems the information loss rate vanishes. We start by showing that the total information loss for a finite-length input sequence $X_1^K$ after observing an output sequence $Y_1^K$ of the same length remains bounded independently of the sequence length:

Theorem 4. Let $X_1^K$ and $Y_1^K$, $K > \max\{M, N\}$, be two finite-length sequences of jointly stationary processes $\mathbf{X}$ and $\mathbf{Y}$ on countable sets $\mathcal{X}$ and $\mathcal{Y}$, respectively, where $\mathbf{Y}$ is generated by passing $\mathbf{X}$ through a partially invertible system. Then, the information loss becomes

$H(X_1^K | Y_1^K) = H(X_1^{\max\{M,N\}} | Y_1^K)$. (32)

Proof: We start by noticing that $H(X_1^K | Y_1^K) = H(X_1^K, Y_1^K) - H(Y_1^K)$ and

$H(X_1^K, Y_1^K) = H(X_K, X_1^{K-1}, Y_1^K)$ (33)
$= H(f_{\mathrm{inv}}(X_{K-N}^{K-1}, Y_{K-M}^K), X_1^{K-1}, Y_1^K)$
$\overset{(a)}{=} H(X_1^{K-1}, Y_1^K)$ (34)

where $(a)$ is due to Lemma 1. Repeating this step a number of times yields

$H(X_1^K, Y_1^K) = H(X_1^{\max\{M,N\}}, Y_1^K)$. (35)

Subtracting $H(Y_1^K)$ completes the proof.

Note that even though $f_{\Theta_n}$ is invertible for all parameter values $\theta$, this only means that $H(X_n | f_{\Theta_n}(X_n), \Theta_n) = 0$, while $H(X_n | f_{\Theta_n}(X_n)) \geq 0$.
This corresponds to the statement of Theorem 4, where for $n < \max\{M, N\}$, $\Theta_n$ has to be considered unknown. It is also important to note that $H(X_1^K | Y_1^K) \neq H(X_1^K) - H(Y_1^K)$. While the information loss rate is equal to the difference of entropy rates (cf. Theorem 1), it does not hold generally that the difference of joint entropies is equal to the joint conditional entropy. We will now make use of this result in proving that partially invertible systems have a vanishing information loss rate:

Corollary 2. Let $\mathbf{X}$ and $\mathbf{Y}$ be jointly stationary processes on countable sets related as in Theorem 4. Then, the information loss rate induced by passing the process $\mathbf{X}$ through the system vanishes, i.e.,

$\bar{H}(\mathbf{X}|\mathbf{Y}) = 0$. (36)

Proof: We provide two proofs for this corollary. For the first, note that irrespective of $\theta$ the inverse function $f_\theta^{-1}$ always exists by Definition 3. With Theorem 2 this immediately leads to $\bar{H}(\mathbf{X}|\mathbf{Y}) = 0$. For the second proof we note that Theorem 4 holds for all $K$, thus also in the limit. With Definition 2 we can therefore write the information loss rate as

$\bar{H}(\mathbf{X}|\mathbf{Y}) = \lim_{n\to\infty} \frac{1}{n} H(X_1^n | Y_1^n)$ (37)
$= \lim_{n\to\infty} \frac{1}{n} H(X_1^{\max\{M,N\}} | Y_1^n)$ (38)
$\leq \lim_{n\to\infty} \frac{1}{n} H(X_1^{\max\{M,N\}}) = 0$ (39)

by similar arguments as in the proof of Lemma 2.

An immediate consequence of this important corollary is that, except for the initial samples $X_1^{\max\{M,N\}}$ after starting the observation of $\mathbf{Y}$ (cf. Theorem 4), the remaining information of the input process can be recovered by observing the output process. Note that this does not necessarily mean that the input process can be reconstructed perfectly, even if reconstruction errors are allowed in the first $\max\{M, N\}$ samples. An illustrative example for this fact will be given in Section VI-B.
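A partially invertible system is easy to exercise numerically. A minimal sketch (the integer accumulator below is an assumed example, not from the paper): $Y_n = X_n + Y_{n-1}$ has $N = 0$, $M = 1$ and partial inverse $f_{\mathrm{inv}}(y_n, y_{n-1}) = y_n - y_{n-1}$, so everything except the first $\max\{M,N\} = 1$ sample is recovered from the output alone.

```python
import random

# Integer accumulator Y_n = X_n + Y_{n-1} (N = 0, M = 1); an assumed example.
random.seed(0)
x = [random.randint(-5, 5) for _ in range(20)]

# run the system; the initial state Y_0 is hidden from the observer
y_prev = random.randint(-5, 5)   # unknown initial condition Y_0
y = []
for xn in x:
    y_prev = xn + y_prev
    y.append(y_prev)

# reconstruction from the output alone via f_inv: all samples except the
# first max{M, N} = 1 are recovered exactly (cf. Theorem 4)
x_hat = [y[n] - y[n - 1] for n in range(1, len(y))]
print(x_hat == x[1:])  # True
```

Only $X_1$ stays uncertain, since it is entangled with the unobserved initial state $Y_0$; this matches the residual term $H(X_1^{\max\{M,N\}} \mid Y_1^K)$ in Theorem 4.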
V. THE CASE OF LINEAR FILTERS

It is interesting to note that an important subclass of discrete-time stable causal linear filters falls into the category of partially invertible systems, as long as the input and output alphabets are countable. An example where the latter condition is satisfied is given if the input process and the coefficients take values from the field of rational numbers. This subclass, powerful enough to cover most applications [8], comprises filters with a finite-dimensional state vector described by constant-coefficient difference equations:

$Y_n = \sum_{k=0}^{N} b_k X_{n-k} + \sum_{l=1}^{M} a_l Y_{n-l}$ (40)

As noted in [9], stability of the filter guarantees that for a stationary input process the output process is stationary and that Definition 1 applies. By rearranging the terms in (40) it can be verified that this subclass of linear systems satisfies the definition of partially invertible systems and, thus, has a vanishing information loss rate.

It is noteworthy that this property is independent of the minimum-phase property (cf. [10, pp. 280]) of linear filters, which ensures that the filter has a stable and causal inverse. Indeed, for filters which are not minimum-phase, the partial inverse function $f_{\mathrm{inv}}$ used in Definition 3 describes a causal, but unstable linear filter. As a consequence, to an arbitrary stationary stochastic input process, the inverse filter described by $f_{\mathrm{inv}}$ may respond with a non-stationary output process; however, the response to $\mathbf{Y}$ will be $\mathbf{X}$.

A signal space model may effectively illustrate these considerations: Let $\mathbb{X}^\infty$ and $\mathbb{Y}^\infty$ be the spaces of stationary input and output processes $\mathbf{X}$ and $\mathbf{Y}$, respectively, and let $F\{\cdot\}$ be the (linear) operator mapping each element of $\mathbb{X}^\infty$ to $\mathbb{Y}^\infty$. By restricting our attention to regular stochastic processes, i.e., processes which cannot have periodic components, the operator $F\{\cdot\}$ is injective.
As a consequence, for each element of $\mathbb{Y}^\infty$ there exists at most one element in $\mathbb{X}^\infty$ such that $\mathbf{Y} = F\{\mathbf{X}\}$. Note, however, that there are stationary stochastic processes in $\mathbb{Y}^\infty$ which are not images of elements in $\mathbb{X}^\infty$. Only if $F\{\cdot\}$ describes a stable, causal, minimum-phase system, i.e., has a stable and causal inverse, does $\mathbb{Y}^\infty$ contain only images of elements from $\mathbb{X}^\infty$.

This complements a result already introduced by Shannon [4], which states that the change in differential entropy rate caused by stable, causal linear filtering of continuous-valued stationary processes is independent of the process statistics. In particular, for a linear filter with frequency response $G(\mathrm{e}^{\jmath\theta})$ the differential entropy rate of the output is given by [5, pp. 663]

$\bar{h}(\mathbf{Y}) = \bar{h}(\mathbf{X}) + \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln|G(\mathrm{e}^{\jmath\theta})|\, d\theta$. (41)

It can be shown (see, e.g., [11]) that the integral above evaluates to $\ln|b_0| + \sum_{i\colon |z_i|>1} \ln|z_i|$, where $z_i$ are the zeros of the transfer function $G(z)$. For causal minimum-phase systems ($|z_i| < 1\ \forall i$) with $b_0 = 1$, the differential entropy rates of the input and output processes are equal. This result was recently verified in [12], which analyzed the invariance of entropy rates for all-pole filters. Scaling the transfer function of such a filter such that $b_0 \neq 1$ leads to $\bar{h}(\mathbf{X}) \neq \bar{h}(\mathbf{Y})$, despite the fact that by scaling no information is lost. Conversely, it is easily possible that $\bar{h}(\mathbf{X}) = \bar{h}(\mathbf{Y})$ for systems which destroy information. Therefore, we believe that differential entropies and differential entropy rates are not adequate measures of information loss. Future investigations will show whether alternative descriptions for continuous-valued processes yield more appropriate characterizations.
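The zero-based evaluation of the integral in (41) is easy to verify numerically. A quick sketch (the example filter $G(z) = 1 - 2z^{-1}$, with $b_0 = 1$ and a single zero at $z = 2$ outside the unit circle, is an assumption chosen for illustration): the closed form predicts $\ln|b_0| + \ln 2 = \ln 2$.

```python
import numpy as np

# Non-minimum-phase FIR example (assumed illustration): G(z) = 1 - 2 z^{-1},
# b0 = 1, single zero at z = 2, so the closed form gives ln(2).
theta = np.linspace(-np.pi, np.pi, 1 << 14, endpoint=False)
G = 1.0 - 2.0 * np.exp(-1j * theta)

# (1/(2 pi)) * integral of ln|G(e^{j theta})| d theta, approximated as the
# mean of the integrand over a uniform periodic grid
mean_log = np.log(np.abs(G)).mean()
print(abs(mean_log - np.log(2.0)) < 1e-9)  # True
```

Because the integrand is smooth and periodic, the trapezoidal/mean rule on a uniform grid converges extremely fast here; the numerical value matches $\ln 2 \approx 0.693$, confirming that this non-minimum-phase filter raises the differential entropy rate even though, as a partially invertible system, it loses no information.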
VI. OTHER EXAMPLES

While the case of linear filters is a particularly interesting one, the restriction to countable input and output alphabets suggests further examples illustrating the application of our theoretical results.

A. Example 1: Finite-Precision Linear Filters

The first example considers an extension to the subclass of discrete-time linear filters discussed in Section V. In many practical applications in digital signal processing, linear filters are implemented with finite-precision number representations only. We thus assume that both the input process and the filter coefficients take values from a finite set. For example, $\mathcal{X}$ may be a finite subset of the rational numbers $\mathbb{Q}$, closed under modulo-addition. Multiplying two values from that set, e.g., by multiplying an input sample with a filter coefficient, typically yields a result not representable in $\mathcal{X}$. As a consequence, after every multiplication a quantizer is necessary, essentially truncating the additional bits resulting from multiplication. Let the quantizer be described by a function $Q\colon \mathbb{R} \to \mathcal{X}$ with $Q(a + X_n) = Q(a) \oplus X_n$ if $X_n \in \mathcal{X}$, where $\oplus$ denotes modulo-addition (e.g., [10, pp. 373]). With this, (40) changes to

$Y_n = \bigoplus_{k=0}^{N} Q(b_k X_{n-k}) \oplus \bigoplus_{l=1}^{M} Q(a_l Y_{n-l})$ (42)

or

$Y_n = Q\left( \bigoplus_{k=0}^{N} b_k X_{n-k} \oplus \bigoplus_{l=1}^{M} a_l Y_{n-l} \right)$ (43)

depending on whether quantization is performed after multiplication or after accumulation (in the latter case, the intermediate results are represented in a larger set $\mathcal{X}'$). Note that due to modulo-addition the result $Y_n$ remains in $\mathcal{X}$.

Fig. 2. Discrete-Time Hammerstein System: $\mathbf{X}$ enters the static nonlinearity $g(\cdot)$, whose output $\mathbf{V}$ drives a linear filter producing $\mathbf{Y}$. For $g(\cdot) = (\cdot)^2$ and if the linear filter is a moving-average filter, this corresponds to a discretized model of the energy detector.

We will now focus on filters with $b_0 = 1$. For filters with infinite precision this can be done without loss of generality by considering a constant gain factor $b_0$ and by normalizing all $b_k$ coefficients. However, this gain normalization poses a restriction in the finite-precision case, since $b_k/b_0$ is not necessarily an element of $\mathcal{X}$. With $b_0 = 1$, (42) and (43) change to

$Y_n = X_n \oplus \left( \bigoplus_{k=1}^{N} Q(b_k X_{n-k}) \oplus \bigoplus_{l=1}^{M} Q(a_l Y_{n-l}) \right)$ (44)

and

$Y_n = X_n \oplus Q\left( \bigoplus_{k=1}^{N} b_k X_{n-k} \oplus \bigoplus_{l=1}^{M} a_l Y_{n-l} \right)$ (45)

by the property of the quantizer. From this it can be seen that either implementation is partially invertible (the terms in parentheses in (44) and (45) are both in $\mathcal{X}$, and modulo-addition has an inverse element). Consequently, even filters with nonlinear elements can be shown to preserve information under certain circumstances, despite the fact that the quantizer function is non-injective.

B. Example 2: Multiplying Consecutive Inputs

Another nonlinear system satisfying Definition 3 is given by the following input-output relationship:

$Y_n = X_n X_{n-1}$ (46)

The partial inverse in this case would be $X_n = Y_n / X_{n-1}$ if $X_{n-1} \neq 0$, while for $X_{n-1} = 0$ no such inverse exists. Therefore, this example represents a class of systems whose partial invertibility depends on the alphabet $\mathcal{X}$ of the stochastic process. If the process $\mathbf{X}$ is such that $\mathcal{X}$ does not contain the element 0, the partial inverse exists and we obtain for $X_n$, $n > 1$:

$X_n = \begin{cases} X_1 \prod_{k=1}^{(n-1)/2} \frac{Y_{2k+1}}{Y_{2k}}, & \text{for odd } n \\[2pt] \frac{Y_n}{X_1} \prod_{k=1}^{n/2-1} \frac{Y_{2k}}{Y_{2k+1}}, & \text{for even } n \end{cases}$ (47)

Indeed, since all $X_n$, $n > 1$, can be computed from $X_1$ and $Y_1^n$, we obtain $H(X_1^n | Y_1^n) = H(X_1 | Y_1^n)$, which is in perfect accordance with Theorem 4. Reconstruction of $\mathbf{X}$ is thus possible up to an unknown $X_1$. Note, however, that this unknown sample influences the whole reconstructed sequence, as shown in (47).
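The reconstruction up to $X_1$ is easy to check numerically via the sequential form of the partial inverse, $X_n = Y_n / X_{n-1}$ (a toy sketch; the alphabet $\{1, 2, 3\}$ and the sequence length are assumptions):

```python
import random

# Example: Y_n = X_n * X_{n-1} on an alphabet excluding 0 (assumed here)
random.seed(1)
x = [random.choice([1, 2, 3]) for _ in range(12)]
y = [x[n] * x[n - 1] for n in range(1, len(x))]   # outputs Y_2, ..., Y_K

# sequential reconstruction from X_1 and the outputs: X_n = Y_n / X_{n-1}
x_hat = [x[0]]                   # X_1 must be supplied (cf. Theorem 4)
for yn in y:
    x_hat.append(yn // x_hat[-1])
print(x_hat == x)  # True: everything after X_1 follows from Y alone
```

Supplying a wrong $X_1$ would corrupt every reconstructed sample, mirroring the dependence of (47) on $X_1$.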
Thus, even though the information loss rate vanishes, perfect reconstruction of any subsequence of $\mathbf{X}$ is impossible by observing the output process $\mathbf{Y}$ only.

C. Example 3: Hammerstein Systems

A final example considers a simple special case of a nonlinear dynamical system, namely, a cascade of a static nonlinearity and a linear filter [13]. Such a cascade, usually referred to as a Hammerstein system, is depicted in Fig. 2. A practical example of such a Hammerstein system is the energy detector, a popular low-complexity receiver architecture in wireless communications. In the discrete-time case the input-output relationship is given by

$Y_n = \sum_{k=0}^{N} b_k g(X_{n-k}) + \sum_{l=1}^{M} a_l Y_{n-l}$. (48)

As is easily seen, this system is partially invertible if and only if the function $g$ has an inverse. If $g$ is not invertible, we obtain in the light of Theorem 2:

$Y_n = f_{\Theta_n}(X_n) = b_0 g(X_n) + C_{\Theta_n}$ (49)

where $C_{\Theta_n}$ is a constant depending on the random parameter $\Theta_n$. With this and $f_\theta^{-1}[f_\theta(x)] = g^{-1}[g(x)]$ for all $x \in \mathcal{X}$, $\theta \in \mathcal{T}$, we obtain an upper bound on the information loss rate:

$\bar{H}(\mathbf{X}|\mathbf{Y}) \leq \max_{x\in\mathcal{X}} \log|g^{-1}[g(x)]|$ (50)

Interestingly, the structure of this system allows a simplified analysis: since the information loss rate of a cascade of systems is equal to the sum of the individual information loss rates (cf. Theorem 3), we can analyze both constituent systems separately. The linear filter was already shown to preserve full information, so any information loss will be caused by the static nonlinearity, i.e., $\bar{H}(\mathbf{X}|\mathbf{Y}) = \bar{H}(\mathbf{X}|\mathbf{V})$. This is in accordance with the observation that the Hammerstein system is partially invertible if the static nonlinearity is invertible. For static nonlinearities the analytic treatment of information loss is simple compared to dynamical systems.
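To make the bound (50) concrete, consider $g(x) = x^2$, as in the energy detector (a toy sketch; the symmetric alphabet below is an assumption): every output value has a preimage of at most two elements, so at most one bit per sample can be lost.

```python
import math

# Energy-detector-style nonlinearity g(x) = x^2 on a symmetric alphabet
# (the alphabet is an assumed illustration)
X = [-2, -1, 1, 2]
g = lambda x: x * x

# |g^{-1}[g(x)]| for each x, then the bound of (50)
preimage = {x: sum(1 for xp in X if g(xp) == g(x)) for x in X}
bound = max(math.log2(c) for c in preimage.values())
print(bound)  # 1.0 bit per sample: each square has exactly two roots in X
```

Removing the negative half of the alphabet makes $g$ invertible on $\mathcal{X}$, the preimages become singletons, and the bound drops to zero, consistent with partial invertibility.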
In particular, for an independent, identically distributed (iid) input process $\mathbf{X}$ the information loss rate can be shown to be equal to the zeroth-order conditional entropy, $H(X|V)$, while for a general stationary process this quantity acts as an upper bound [2]. The upper bound from Theorem 2 turns out to be even more general, since it also provides an upper bound on $H(X|V)$ in the case of an iid input process (cf. Theorem 4 in [6]). An in-depth analysis of the interplay between these bounds is the object of future work.

VII. CONCLUSION

In this work we have presented general results on the information loss of dynamical systems for stationary stochastic input and output processes on countable alphabets. Furthermore, we have extended the proof of the data processing inequality, stating that the entropy rate at the output of the system cannot be larger than the entropy rate at the input, and have derived an upper bound on the information loss rate. The additivity of information loss rates for cascaded systems could be shown, too. We have further identified a family of systems for which this upper bound is zero, i.e., for which the information loss rate vanishes. Not only linear filters belong to that family, but also their nonlinear counterparts common in finite-precision signal processing. Future research will extend these results to the case of continuous-valued stochastic processes and the application to common nonlinear systems, e.g., Volterra models.

ACKNOWLEDGMENTS

The authors gratefully acknowledge discussions with Sebastian Tschiatschek concerning mathematical notation, and his comments improving the quality of this manuscript.

REFERENCES

[1] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley Interscience, 2006.
[2] S. Watanabe and C. T. Abraham, "Loss and recovery of information by coarse observation of stochastic chain," Information and Control, vol. 3, pp. 248–278, 1960.
[3] M. S. Pinsker, Information und Informationsstabilität zufälliger Grössen und Prozesse, H. Grell, Ed. Berlin: Deutscher Verlag der Wissenschaften, 1963 (Information and information stability of random variables and processes, German translation).
[4] C. E. Shannon, "A mathematical theory of communication," Bell Systems Technical Journal, vol. 27, pp. 379–423, 623–656, Oct. 1948.
[5] A. Papoulis and S. U. Pillai, Probability, Random Variables and Stochastic Processes, 4th ed. New York, NY: McGraw Hill, 2002.
[6] B. C. Geiger, C. Feldbauer, and G. Kubin, "Information loss in static nonlinearities," arXiv:1102.4794 [cs.IT], 2011.
[7] N. Pippenger, "The average amount of information lost in multiplication," IEEE Trans. Inf. Theory, vol. 51, no. 2, pp. 684–687, Feb. 2005.
[8] P. S. R. Diniz, E. A. B. da Silva, and S. L. Netto, Digital Signal Processing: System Analysis and Design, 2nd ed. Cambridge: Cambridge University Press, 2010.
[9] A. Papoulis, "Maximum entropy and spectral estimation: A review," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-29, no. 6, pp. 1176–1186, Dec. 1981.
[10] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, 2nd ed. Upper Saddle River, NJ: Prentice Hall, Inc., 1999.
[11] S. Yu and P. G. Mehta, "The Kullback-Leibler rate pseudo-metric for comparing dynamical systems," IEEE Trans. Autom. Control, vol. 55, no. 7, pp. 1585–1598, Jul. 2010.
[12] M. Dumitrescu and G. Popovici, "Entropy invariance for autoregressive processes constructed by linear filtering," International Journal of Computer Mathematics, vol. 88, no. 4, pp. 864–880, Mar. 2011.
[13] Y. S. Shmaliy, Continuous-Time Systems. Dordrecht: Springer, 2007.
