Scanning and Sequential Decision Making for Multi-Dimensional Data - Part II: the Noisy Case


Authors: Asaf Cohen, Tsachy Weissman, Neri Merhav

October 27, 2018

Abstract

We consider the problem of sequential decision making on random fields corrupted by noise. In this scenario, the decision maker observes a noisy version of the data, yet is judged with respect to the clean data. In particular, we first consider the problem of sequentially scanning and filtering noisy random fields. In this case, the sequential filter is given the freedom to choose the path over which it traverses the random field (e.g., a noisy image or video sequence), thus it is natural to ask what the best achievable performance is, and how sensitive this performance is to the choice of the scan. We formally define the problem of scanning and filtering, derive a bound on the best achievable performance, and quantify the excess loss occurring when non-optimal scanners are used, compared to optimal scanning and filtering.

We then discuss the problem of sequential scanning and prediction of noisy random fields. This setting is a natural model for applications such as restoration and coding of noisy images. We formally define the problem of scanning and prediction of a noisy multidimensional array and relate the optimal performance to the clean scandictability defined by Merhav and Weissman. Moreover, bounds on the excess loss due to sub-optimal scans are derived, and a universal prediction algorithm is suggested.

∗ The material in this paper was presented in part at the IEEE International Symposium on Information Theory, Seattle, Washington, United States, July 2006, and accepted to the IEEE International Symposium on Information Theory, Nice, France, June 2007.
† Asaf Cohen and Neri Merhav are with the Department of Electrical Engineering, Technion - I.I.T., Haifa 32000, Israel. E-mails: {soofsoof@tx,merhav@ee}.technion.ac.il.
‡ Tsachy Weissman is with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA. E-mail: tsachy@stanford.edu.

This paper is the second part of a two-part paper. The first part dealt with sequential decision making on noiseless data arrays, namely, when the decision maker is judged with respect to the same data array it observes.

1 Introduction

Consider the problem of sequentially scanning and filtering (or predicting) a multidimensional noisy data array, while minimizing a given loss function. Particularly, at each time instant t, 1 ≤ t ≤ |B|, where |B| is the number of sites ("pixels") in the data array, the sequential decision maker chooses a site to be visited, denoted by Ψ_t. In the filtering scenario, it first observes the value at that site, and then gives an estimate of the underlying clean value. In the prediction scenario, it is required to give a prediction for that (clean) value before the actual observation is made. In both cases, both the location Ψ_t and the estimate or prediction may depend on the previously observed values - the values at sites Ψ_1 to Ψ_{t-1}. The goal is to minimize the cumulative loss after scanning the entire data array. Applications of this problem can be found in image and video processing, such as filtering or predictive coding.
In these applications, one wishes to either enhance or jointly enhance and code a given image. The motivation behind a prediction/compression-based approach is that the prediction error may consist mainly of the noise signal, while the clean signal is recovered by the predictor. For example, see [1]. It is clear that different scanning patterns of the image may result in different filtering or prediction errors; thus, it is natural to ask what the performance of the optimal scanning strategy is, and what the loss is when non-optimal strategies are used.

The problem of scanning multidimensional data arrays also arises in other areas of image processing, such as one-dimensional wavelet [2] or median [3] processing of images, where one seeks a space-filling curve which facilitates the one-dimensional signal processing of the multidimensional data. Other examples include digital halftoning [4], where a space-filling curve is sought in order to minimize the effect of false contours, and pattern recognition [5]. Yet more applications can be found in multidimensional data query [6] and indexing [7], where multidimensional data is stored on a one-dimensional storage device, hence a locality-preserving space-filling curve is sought in order to minimize the number of continuous read operations required to access a multidimensional object, and in rendering of three-dimensional graphics [8], [9].

An information theoretic discussion of the scanning problem was initiated by Lempel and Ziv in [10], where the Peano-Hilbert scan was shown to be optimal for compression of individual images. In [11], Merhav and Weissman formally defined a "scandictor", a scheme for sequentially scanning and predicting a multidimensional data array, as well as the "scandictability" of a random field, namely, the best achievable performance for scanning and prediction of a random field. Particular cases where this value can be computed and the optimal scanning order can be identified were discussed in that work. One of the main results of [11] is the fact that if a stochastic field can be represented autoregressively (under a specific scan Ψ) with a maximum-entropy innovation process, then it is optimally scandicted in the way it was created (i.e., by the specific scan Ψ and its corresponding optimal predictor). A more comprehensive survey can be found in [12] and [13].

In [12], the problem of universal scanning and prediction of noise-free multidimensional arrays was investigated. Although this problem is fundamentally different from its one-dimensional analogue (for example, one cannot compete successfully with any two scandictors on any individual image), a universal scanning and prediction algorithm which achieves the scandictability of any stationary random field was given, and the excess loss incurred when non-optimal scanning strategies are used was quantified. In [14], Weissman, Merhav and Somekh-Baruch, as well as Weissman and Merhav in [15] and [16], extended the problem of universal prediction to the case of a noisy environment. Namely, the predictor observes a noisy version of the sequence, yet it is judged with respect to the clean sequence.

In this paper, we extend the results of [11] and [12] to this noisy scenario. We formally define the problem of sequentially filtering or predicting a multidimensional data array.
First, we derive lower bounds on the best achievable performance. We then discuss the scenario where non-optimal scanning strategies are used. That is, we assume that, due to implementation constraints, for example, one cannot use the optimal scanner for a given data array, and is forced to use an arbitrary scanning order. In such a scenario, it is important to understand the excess loss incurred, compared to optimal scanning and filtering (or prediction). We derive upper bounds on this excess loss. Finally, we briefly mention how the results of [12] can be exploited in order to construct universal schemes for the noisy case as well. While many of the results for noisy scandiction are extendible from the noiseless case, similarly as results for noisy prediction were extended from results for noiseless prediction [15], the scanning and filtering problem poses new challenges and requires the use of new tools and techniques.

The paper is organized as follows. Section 2 includes a precise formulation of the problem. Section 3 includes the results on scanning and filtering of noisy data arrays, while Section 4 is devoted to the prediction scenario. In both sections, particular emphasis is given to the important cases of Gaussian random fields corrupted by Additive White Gaussian Noise (AWGN), under the squared error criterion, and binary random fields corrupted by a Binary Symmetric Channel (BSC), under the Hamming loss criterion. In particular, in Section 3.1, a new tool is used to derive a lower bound on the optimum scanning and filtering performance (Section 4.1 later shows how this tool can be used to strengthen the results of [11] in the noise-free scenario as well). Section 3.2 gives upper bounds on the excess loss in non-optimal scanning. In Section 3.2.1, the results of Duncan [17] as well as those of Guo, Shamai and Verdú [18] are used to derive the bounds when the noise is Gaussian, and Section 3.2.2 deals with the binary setting. Section 3.3 uses recent results by Weissman et al. [19] to describe how universal scanning and filtering algorithms can be constructed. In the noisy scandiction section, Section 4.1 relates the best achievable performance in this setting, as well as the achieving scandictors, to the clean scandictability of the noisy field. Section 4.2 introduces a universal scandiction algorithm, and Section 4.3 gives an upper bound on the excess loss. In both Section 3 and Section 4, the sub-sections describing the optimum performance, the excess loss bounds and the universal algorithms are not directly related and can be read independently. Finally, Section 5 contains some concluding remarks.

2 Problem Formulation

We start with a formal definition of the problem. Let A denote the alphabet, which is either discrete or the real line. Let N be the noisy observation alphabet. Let Ω = (A × N)^{Z²} be the observation space (the results can be extended to any finite dimension). A probability measure Q on Ω is stationary if it is invariant under translations τ_i, where for each ω ∈ Ω and i, j ∈ Z², τ_i(ω)_j = ω_{j+i} (namely, stationarity means shift invariance). Denote by M(Ω) and M_S(Ω) the sets of all probability measures on Ω and stationary probability measures on Ω, respectively.
Elements of M(Ω), random fields, will be denoted by upper case letters while elements of Ω, individual data arrays, will be denoted by the corresponding lower case. It will also be beneficial to refer to the clean and noisy random fields separately; that is, {X_t}_{t∈Z²} represents the clean signal and {Y_t}_{t∈Z²} represents the noisy observations, where for t ∈ Z², X_t is the random variable corresponding to X at site t.

Let V denote the set of all finite subsets of Z². For V ∈ V, denote by X^V the restriction of the data array X to V. Let R be the set of all rectangles of the form V = Z² ∩ ([m_1, m_2] × [n_1, n_2]). As a special case, denote by V_n the square {0, ..., n−1} × {0, ..., n−1}. For V ⊂ Z², let the interior radius of V be

    R(V) ≜ sup{r : ∃c s.t. B(c, r) ⊆ V},    (1)

where B(c, r) is a closed ball (under the l_1-norm) of radius r centered at c. Throughout, ln(·) will denote the natural logarithm.

Definition 1. A scanner-filter pair for a finite set of sites B ∈ V is the following pair (Ψ, F̃):

• The scan {Ψ_t}_{t=1}^{|B|} is a sequence of measurable mappings, Ψ_t : N^{t−1} → B, determining the site to be visited at time t, with the property that

    {Ψ_1, Ψ_2(y_{Ψ_1}), Ψ_3(y_{Ψ_1}, y_{Ψ_2}), ..., Ψ_{|B|}(y_{Ψ_1}, ..., y_{Ψ_{|B|−1}})} = B,  ∀ y ∈ N^B.    (2)

• {F̃_t}_{t=1}^{|B|} is a sequence of measurable filters, where for each t, F̃_t : N^t → D determines the reconstruction for the value at the site visited at time t, based on the current and previous observations, and D is the reconstruction alphabet.

Note that both the scanner Ψ and the filters {F̃_t} base their decisions only on the noisy observations. In the prediction scenario (i.e., noisy scandiction), we define F_t : N^{t−1} → D; that is, {F_t} represents measurable predictors, which have access only to previous observations. We allow randomized scanner-filter pairs, namely, pairs such that {Ψ_t}_{t=1}^{|B|} or {F̃_t}_{t=1}^{|B|} can be chosen randomly from some set of possible functions. It is also important to note that we consider only scanners for finite sets of sites, ones which can be viewed merely as a reordering of the sites in a finite set B.

The cumulative loss of a scanner-filter pair (Ψ, F̃) up to time t ≤ |B| is denoted by L_{(Ψ,F̃)}(x^B, y^B)_t,

    L_{(Ψ,F̃)}(x^B, y^B)_t = Σ_{i=1}^t l(x_{Ψ_i}, F̃_i(y_{Ψ_1}, ..., y_{Ψ_i})),    (3)

where l : A × D → [0, ∞) is the loss function. The sum of the instantaneous losses over the entire data array B, L_{(Ψ,F̃)}(x^B, y^B)_{|B|}, will be abbreviated as L_{(Ψ,F̃)}(x^B, y^B). For a given loss function l and a field Q ∈ M(Ω) restricted to B, define the best achievable scanning and filtering performance by

    Ũ(l, Q_B) = inf_{(Ψ,F̃)∈S(B)} E_{Q_B} (1/|B|) L_{(Ψ,F̃)}(X^B, Y^B),    (4)

where Q_B is the marginal probability measure restricted to B and S(B) is the set of all possible scanner-filter pairs for B. The best achievable performance for the field Q, Ũ(l, Q), is defined by

    Ũ(l, Q) = lim_{n→∞} Ũ(l, Q_{V_n}),    (5)

if this limit exists.
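As a concrete illustration of Definition 1 and the normalized cumulative loss (3)-(4), the following minimal Python sketch (our own; the function names, the raster scan and the "say-what-you-see" filter are placeholder choices, not constructions from the paper) evaluates a scanner-filter pair on a toy binary field observed through a BSC.

```python
import numpy as np

def raster_scan(shape):
    """A data-independent scan: visit the sites of an array row by row."""
    rows, cols = shape
    return [(i, j) for i in range(rows) for j in range(cols)]

def normalized_cumulative_loss(x, y, scan, filt, loss):
    """Normalized cumulative loss (3)-(4): at step t the filter sees the noisy
    values at the first t visited sites and reconstructs the clean value there."""
    total, observed = 0.0, []
    for site in scan:
        observed.append(y[site])      # the filter is causal in the scan order
        total += loss(x[site], filt(observed))
    return total / len(scan)

# Toy instance: binary field through a BSC(delta), Hamming loss, and the
# "say-what-you-see" filter (reconstruct with the current noisy symbol).
rng = np.random.default_rng(0)
n, delta = 32, 0.1
x = rng.integers(0, 2, size=(n, n))
y = np.where(rng.random((n, n)) < delta, 1 - x, x)   # BSC output
print(normalized_cumulative_loss(x, y, raster_scan((n, n)),
                                 lambda obs: obs[-1],
                                 lambda a, b: float(a != b)))
```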
In the prediction scenario, F_t is allowed to base its estimation only on y_{Ψ_1}, ..., y_{Ψ_{t−1}}, and we have

    L_{(Ψ,F)}(x^B, y^B) = Σ_{t=1}^{|B|} l(x_{Ψ_t}, F_t(y_{Ψ_1}, ..., y_{Ψ_{t−1}})),    (6)

    Ū(l, Q_B) = inf_{(Ψ,F)} E_{Q_B} (1/|B|) L_{(Ψ,F)}(X^B, Y^B),    (7)

and

    Ū(l, Q) = lim_{n→∞} Ū(l, Q_{V_n}),    (8)

if this limit exists.

The following proposition asserts that for any stationary random field both the limit in (5) and the limit in (8) exist.

Proposition 1. For any stationary field Q ∈ M_S(Ω) and for any sequence {B_n}, B_n ∈ R, satisfying R(B_n) → ∞, the limits in (5) and (8) exist and satisfy

    Ũ(l, Q) = lim_{n→∞} Ũ(l, Q_{B_n}) = inf_{Δ∈R} Ũ(l, Q_Δ),    (9)

    Ū(l, Q) = lim_{n→∞} Ū(l, Q_{B_n}) = inf_{Δ∈R} Ū(l, Q_Δ).    (10)

Since Ũ(l, Q_B) and Ū(l, Q_B) possess the sub-additivity property, e.g., for any V, V′ with V ∩ V′ = ∅, there exists a scanner-filter pair (Ψ, F̃) (or a scandictor (Ψ, F)) on V ∪ V′ such that

    E_Q L_{(Ψ,F̃)}(X^{V∪V′}, Y^{V∪V′}) ≤ |V| Ũ(l, Q_V) + |V′| Ũ(l, Q_{V′}),    (11)

the proof of Proposition 1 follows verbatim that of [11, Theorem 1].

3 Filtering of Noisy Data Arrays

In this section, we consider the scenario of scanning and filtering. In this case, a lower bound on the best achievable performance is derived. For the cases of Gaussian random fields corrupted by AWGN and binary-valued fields observed through a BSC, we derive bounds on the excess loss when a non-optimal scanner is used (with an optimal filter). Finally, we briefly discuss universal scanning and filtering.

3.1 A Lower Bound on the Best Achievable Scanning and Filtering Performance

We assume an invertible memoryless channel, meaning the channel input distribution of a single symbol is uniquely determined given the output distribution. As an example, a discrete memoryless channel with an invertible channel matrix can be kept in mind. See [20] for a discussion of the conditions on the channel matrix for the invertibility property to hold. Moreover, as will be elaborated on later, the result below applies to more general channels, including continuous ones. In the case of an invertible channel, we define the associated Bayes envelope by

    f_l(P) = min_{g(·)} E l(X, g(Y)),    (12)

where P is the distribution of the channel output Y. Define

    ζ(d) = max{H(P) : f_l(P) ≤ d},    (13)

and let ζ̄(·) be the upper concave (∩) envelope of ζ(·).

Theorem 2. Let Y^B be the output of an invertible memoryless channel whose input is X^B. Then, for any scanner-filter pair (Ψ, F̃) we have

    ζ̄((1/|B|) E_{Q_B} L_{(Ψ,F̃)}(X^B, Y^B)) ≥ (1/|B|) H(Y^B),    (14)

that is,

    ζ̄(Ũ(l, Q_B)) ≥ (1/|B|) H(Y^B).    (15)

Proof. We prove the above theorem for the discrete case. Yet, the derivations below apply to the continuous case as well, with summations replaced by the appropriate integrals and the entropy replaced by differential entropy.

Denote by Ψ(Y^B) the reordered output sequence, that is, {Y_{Ψ_1}, Y_{Ψ_2}, ..., Y_{Ψ_{|B|}}}.
We have,

    H(Y^B) = H(Ψ(Y^B))    (a)
           = Σ_{t=1}^{|B|} H(Y_{Ψ_t} | Y_{Ψ_1}^{Ψ_{t−1}})
           = Σ_{t=1}^{|B|} Σ_{y_{Ψ_1}^{Ψ_{t−1}}} H(Y_{Ψ_t} | Y_{Ψ_1}^{Ψ_{t−1}} = y_{Ψ_1}^{Ψ_{t−1}}) P(y_{Ψ_1}^{Ψ_{t−1}})
           ≤ Σ_{t=1}^{|B|} Σ_{y_{Ψ_1}^{Ψ_{t−1}}} ζ(E_{Q_B}{l(X_{Ψ_t}, F̃_t(y_{Ψ_1}^{Ψ_{t−1}}, Y_{Ψ_t})) | Y_{Ψ_1}^{Ψ_{t−1}} = y_{Ψ_1}^{Ψ_{t−1}}}) P(y_{Ψ_1}^{Ψ_{t−1}})    (b)
           ≤ Σ_{t=1}^{|B|} Σ_{y_{Ψ_1}^{Ψ_{t−1}}} ζ̄(E_{Q_B}{l(X_{Ψ_t}, F̃_t(y_{Ψ_1}^{Ψ_{t−1}}, Y_{Ψ_t})) | Y_{Ψ_1}^{Ψ_{t−1}} = y_{Ψ_1}^{Ψ_{t−1}}}) P(y_{Ψ_1}^{Ψ_{t−1}})    (c)
           ≤ Σ_{t=1}^{|B|} ζ̄(E_{Q_B} l(X_{Ψ_t}, F̃_t(Y_{Ψ_1}^{Ψ_t})))    (d)
           ≤ |B| ζ̄((1/|B|) E_{Q_B} L_{F̃}(Ψ(X^B), Ψ(Y^B)))    (e)
           = |B| ζ̄((1/|B|) E_{Q_B} L_{(Ψ,F̃)}(X^B, Y^B)).    (16)

The equality (a) holds since the reordering does not change the entropy of Y^B. While this is clear for a data-independent reordering, more caution is required when Ψ is a data-dependent scan. Yet, this can be proved using the chain rule, noting that conditioned on Y_{Ψ_1}^{Ψ_{t−1}}, the next site Ψ_t is fixed (this is similar to the proof of [12, Proposition 13]). The inequalities (b) and (c) follow from the definitions of ζ and ζ̄ respectively, and (d) and (e) follow from Jensen's inequality.

At this point, a few remarks are in order. Theorem 2 is the direct analogue of the lower bounds in [11] for the filtering scenario. Note, however, that it holds for any finite set of sites B. Furthermore, it applies to arbitrarily distributed random fields (even non-stationary fields), and to a wide family of loss functions. In fact, the only condition on l(·,·) is that the associated Bayes envelope f_l(P) is well defined. Note also that the lower bound on Ũ(l, Q) given in Theorem 2 results from the application of a single-letter function, ζ̄^{−1}(·), to the normalized entropy of the noisy field, (1/|B|) H(Y^B). That is, the memory in (X^B, Y^B) is reflected only in (1/|B|) H(Y^B).

The proof of Theorem 2 is general and direct; however, it lacks the insightful geometrical interpretation which led to the lower bound in [11]. Therein, Merhav and Weissman showed that the transformation from a data array to an error sequence (defined by a specific scandictor (Ψ, F)) is volume preserving. Thus, the least expected cumulative error is the radius of a sphere whose volume equals the volume of the set of all typical data arrays of the source. This happens when all the typical data arrays of the source map to a sphere in the "error vectors" space, and thus Merhav and Weissman were able to identify cases where the lower bound is tight. Currently, we cannot point out specific cases in which (15) is tight. Moreover, as the next two examples show, in the scanning and filtering scenario (unlike the scanning and prediction scenario we discuss in Section 4), ζ(d) may not be concave, and thus ζ(d) ≠ ζ̄(d). Note, in this context, that there is no natural time-sharing solution in this case, as there is no natural trade-off between two (or more) optimal points, and there is only one criterion to be minimized - the cumulative scanning and filtering loss (as opposed to rate versus distortion, for example).

3.1.1 Binary Input and BSC

To illustrate its use, we specialize Theorem 2 to the case of binary input through a BSC, i.e., the input random field X^{V_n} is binary, and Y^{V_n} is the output of a BSC whose input is X^{V_n} and whose crossover probability is δ < 1/2.
Note, however, that although the derivations below are specific to the binary alphabet and Hamming loss, they are easily extendible to an arbitrary finite alphabet and discrete memoryless channel with a channel transition matrix Π and loss function Λ(·,·).

To compute the lower bound on the best achievable scanning and filtering performance, we evaluate f_l(P) and ζ(d). By the definitions in (12) and (13), we consider the scalar problem of estimating a random variable X based on its noisy observation Y. Denote by p_Y the probability P(Y = 1) and by p_X the probability P(X = 1). The best achievable performance, f_{l_H}(p_Y), which clearly depends on δ and is hence denoted f_δ(p_Y), is given by

    f_δ(p_Y) = Σ_{x,y} P(x, y) l_H(x, g_opt(y))
             = Σ_y P(y) Σ_x P(x | y) l_H(x, g_opt(y))
             = Σ_y P(y) min_x P(x | y)    (a)
             = Σ_y min_x P(x, y)
             = min{p_X(1 − δ), δ(1 − p_X)} + min{p_X δ, (1 − δ)(1 − p_X)}
             = min{p_X, 1 − p_X, δ}
             = min{(p_Y − δ)/(1 − 2δ), (1 − p_Y − δ)/(1 − 2δ), δ},    (b)    (17)

where (a) results from the optimality of g_opt(y) and (b) results from the invertibility of the channel. Consequently,

    ζ(d) = max_p h_b(p)  s.t.  f_δ(p) ≤ d
         = { h_b(δ ∗ d),  d < δ
           { 1,           d ≥ δ,    (18)

where h_b(·) is the binary entropy function and δ ∗ d = d(1 − δ) + δ(1 − d). Note that since δ ∗ δ < 1/2 for 0 < δ < 1/2, there is a discontinuity at d = δ, hence ζ(d) is generally not concave and ζ̄(d) ≠ ζ(d) (although ζ̄(d) can be easily calculated). Figure 1 includes plots of both ζ(d) and ζ̄(d) for δ = 0.25. We also mention that d = δ is a realistic cumulative loss in non-trivial situations, as there are cases where "say-what-you-see" (and thus suffering a loss δ) is the best any filter can do [21]. Furthermore, note that ζ(d) is not the maximum entropy function γ(d) used in [11] to derive the lower bound on the scandictability. Finally, exact evaluation of the bound in Theorem 2 may be difficult in many cases, as the entropy (1/|B|) H(Y^B) may be hard to calculate, and only bounds on its value can be used (think, for example, of an input process which is a first-order Markov source: while the entropy rate of the input is known, the output is a hidden Markov process whose entropy rate is unknown in general). At the end of Section 3.2.2, we give a numerical example for the bound in Theorem 2 using a lower bound on the entropy rate.

Remark 1. Clearly, ζ(d) is interesting only in the region d ≤ δ, as any reasonable filter will have an expected normalized cumulative loss smaller than or equal to the channel crossover probability. However, due to the discontinuity at d = δ, ζ(d) is concave for d < δ but not for d ≤ δ. This is fortunate, as if ζ(d) were concave on d ≤ δ, Theorem 2 would have resulted in h_b(δ ∗ δ) as an upper bound on the entropy rate of any binary source corrupted by a BSC, which is erroneous (for example, it violates h_b(π ∗ δ) as a lower bound on the entropy rate of a first-order Markov source with transition probability π corrupted by a BSC with crossover probability δ).

[Figure 1: The function ζ(d), as it appears in (18), and its upper concave envelope ζ̄(d), both plotted for δ = 0.25. Both have analytic expressions; the plots are discrete only to better distinguish between them.]
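To make the discontinuity at d = δ concrete, the following short sketch (ours, not from the paper) evaluates f_δ(p) from (17) and ζ(d) from (18) for δ = 0.25. Entropy is taken in bits here, so that the plateau value 1 in (18) is the maximal binary entropy.

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def f_delta(p_Y, delta):
    """Scalar Bayes performance (17), as a function of the output law p_Y."""
    return min((p_Y - delta) / (1 - 2 * delta),
               (1 - p_Y - delta) / (1 - 2 * delta),
               delta)

def zeta(d, delta):
    """zeta(d) from (18), with delta * d = d(1 - delta) + delta(1 - d)."""
    return h2(d * (1 - delta) + delta * (1 - d)) if d < delta else 1.0

delta = 0.25
print(f_delta(0.4, delta))       # risk of the optimal scalar filter at p_Y = 0.4
# Jump at d = delta: zeta(delta-) = h2(delta * delta) < 1 = zeta(delta),
# so zeta is not concave and differs from its upper concave envelope.
print(zeta(delta - 1e-9, delta), zeta(delta, delta))
```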
3.1.2 Gaussian Channel

Consider now the case where Y^{V_n} is the output of an AWGN channel whose input is arbitrarily distributed. Assume the squared error loss. As the optimal filter is clearly the conditional expectation, ζ(d) in this case is given by

    ζ(d) = max{H(X + N) : Var(X | X + N) ≤ d},  N ~ N(0, σ_N²),  N ⊥ X.    (19)

Since H(X + N | X) = H(N) is fixed, this is similar to the classical Gaussian channel capacity problem, only now the input constraint is Var(X | X + N) ≤ d, which generally depends on the distribution of X rather than solely on its variance, and hence is not necessarily achieved by a Gaussian X. When the input is also restricted to be Gaussian, however, the optimization problem in (13) is trivial and ζ(d) can be easily calculated (note that in this case ζ(d) is valid only for bounding the performance of scanning and filtering of Gaussian fields corrupted by AWGN). Since the distributions depend only on the variance (assuming zero expectation), we have f_{l_s}(P) = f_{l_s}(σ_Y²), and, in fact,

    f_{l_s}(σ_Y²) = σ_N² σ_X² / (σ_N² + σ_X²) = σ_N² − σ_N⁴/σ_Y².    (20)

Hence,

    ζ(d) = max (1/2) ln(2πe σ_Y²)  s.t.  f_{l_s}(σ_Y²) ≤ d
         = { (1/2) ln(2πe σ_N⁴/(σ_N² − d)),  d < σ_N²
           { ∞,                              d ≥ σ_N².    (21)

Unlike the binary setting, here the cumulative loss d will be strictly smaller than σ_N² for any non-trivial setting and reasonable filter, as the error in symbol-by-symbol filtering is σ_N² σ_X²/(σ_X² + σ_N²) < σ_N². Yet, ζ(d) is convex (∪) for d < σ_N², and the chain of inequalities in (16) cannot be tight.
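A quick numerical confirmation (ours) of the convexity claim: ζ(d) from (21) has positive second differences on d < σ_N².

```python
import numpy as np

# zeta(d) from (21) for Gaussian input, AWGN and squared error: positive
# second differences on d < sigma2_N confirm it is convex there, so it
# differs from its upper concave envelope and (16) cannot be tight.
sigma2_N = 1.0
d = np.linspace(0.0, 0.9, 1000)
zeta = 0.5 * np.log(2 * np.pi * np.e * sigma2_N**2 / (sigma2_N - d))
print(bool(np.all(np.diff(zeta, 2) > 0)))   # True: convex on this range
```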
3.2 Bounds on the Excess Loss of Non-Optimal Scanners

Theorem 2 gives a lower bound on the optimum scanning and filtering performance. However, it is interesting to investigate the excess scanning and filtering loss when non-optimal scanners are used. Specifically, in this section we address the following question: suppose that, for practical reasons for example, one uses a non-optimal scanner, accompanied by the optimal filter for that scan. How large is the excess loss incurred by this scheme with respect to optimal scanning and filtering? We consider both the case of a Gaussian channel and squared error loss (with Gaussian or arbitrarily distributed input) and the case of a binary source passed through a BSC and Hamming loss. While the tools we use in order to construct such a bound for the binary case are similar to the ones used in [12], we develop a new set of tools and techniques for the Gaussian setting.

3.2.1 Gaussian Channel

We investigate the excess scanning and filtering loss when non-optimal scanners are used, for the case of arbitrarily distributed input corrupted by a Gaussian channel. We first focus attention on the case where the input is Gaussian as well, and then derive new results for the more general setting.

Similarly as in [12], the bound is achieved by bounding the absolute difference between the scanning and filtering performance of any two scans, Ψ1 and Ψ2, assuming both use their optimal filters. This bound, however, results from a relation between the performance of discrete-time filtering and continuous-time filtering, together with the fundamental result of Duncan [17] on the relation between mutual information and causal minimal mean square error estimation in a Gaussian channel. Namely, we use the mutual information in continuous time as a scan-invariant feature, and the actual value of the excess loss bound results from the difference between the discrete- and continuous-time filtering problems, as will be made precise below. From now on we assume the loss function is the squared error loss, l_s(·).

We start with several definitions. Let X be a Gaussian random variable, X ~ N(0, σ_X²). Consider the following two estimation problems:

• The scalar problem of estimating X based on Y = X + N, where N ~ N(0, σ_N²), independent of X.

• The continuous-time problem of causally estimating X_t ≡ X, t ∈ [0, 1], based on Y^t, which is an AWGN-corrupted version of X_t, the Gaussian noise having a spectral density level of σ_N².

To bound the sensitivity of the scanning and filtering performance, it is beneficial to consider the difference between the estimation errors in the above two problems, that is,

    ∫_0^1 Var(X_t | Y^t) dt − Var(X | Y),    (22)

where Y^t is the continuous-time signal {Y_{t′}}_{t′=0}^t. Clearly, Var(X | Y) = σ_X² σ_N²/(σ_X² + σ_N²). Since ∫_0^t Y_{t′} dt′ is a sufficient statistic in the estimation of X_t ≡ X, Var(X_t | Y^t) equals the squared error in estimating X based on X + Ñ, Ñ being a Gaussian random variable, independent of X, with zero mean and variance σ_N²/t. Thus,

    ∫_0^1 Var(X_t | Y^t) dt − Var(X | Y) = ∫_0^1 [σ_X² (σ_N²/t) / (σ_X² + σ_N²/t)] dt − σ_X² σ_N²/(σ_X² + σ_N²)
                                        = σ_N² ln(1 + σ_X²/σ_N²) − σ_X² σ_N²/(σ_X² + σ_N²)
                                        = σ_N² f(σ_X²/σ_N²),    (23)

where

    f(x) = ln(1 + x) − x/(x + 1).    (24)

The following is the main result in this sub-section.

Theorem 3. Let X^{V_n} be a Gaussian random field with a constant marginal distribution satisfying Var(X_i) = σ_X² < ∞ for all i ∈ V_n. Let Y_i = X_i + N_i, where N^{V_n} is a white Gaussian noise of variance σ_N², independent of X^{V_n}. Then, for any two scans Ψ1 and Ψ2, we have

    (1/n²) |E L_{(Ψ1,F̃_opt)}(X^{V_n}, Y^{V_n}) − E L_{(Ψ2,F̃_opt)}(X^{V_n}, Y^{V_n})| ≤ σ_N² f(σ_X²/σ_N²).    (25)

Theorem 3 bounds the absolute difference between the scanning and filtering performance of any two scanners, Ψ1 and Ψ2, assuming they use their optimal filters. Clearly, since the scanners are arbitrary, this result can also be interpreted as the difference in performance between any scan Ψ and the best achievable performance, Ũ(l, Q_{V_n}). Note that the bound value, σ_N² f(σ_X²/σ_N²), is a single-letter expression, which depends on the input field X^{V_n} and the noise N^{V_n} only through their variances. Namely, the bound does not depend on the memory in X^{V_n}.
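As a numerical sanity check (ours, not part of the paper), the sketch below verifies identity (23) by quadrature and evaluates the single-letter bound of Theorem 3; it also locates the maximum of the normalized bound f(SNR)/SNR, which is about 0.216, a value quoted later in this section.

```python
import numpy as np
from scipy.integrate import quad

def f(x):
    """f(x) = ln(1 + x) - x/(1 + x), from (24)."""
    return np.log1p(x) - x / (1.0 + x)

sigma2_X, sigma2_N = 1.0, 0.5
snr = sigma2_X / sigma2_N

# Identity (23): the causal continuous-time error, integrated over the unit
# interval, exceeds the one-shot error by exactly sigma2_N * f(SNR).
lhs, _ = quad(lambda t: sigma2_X * sigma2_N / (t * sigma2_X + sigma2_N), 0, 1)
lhs -= sigma2_X * sigma2_N / (sigma2_X + sigma2_N)
print(lhs, sigma2_N * f(snr))          # the two values agree

# Normalized bound Var(X_1) * f(SNR)/SNR of Theorem 3, maximized over SNR;
# the maximum is about 0.216, attained near SNR ~ 2.2.
snrs = np.linspace(1e-4, 50, 500001)
g = f(snrs) / snrs
print(g.max(), snrs[g.argmax()])
```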
Proof (Theorem 3). As mentioned earlier, the comparison between any two scans is made by bounding the normalized cumulative loss of any scan Ψ in terms of a scan-invariant entity, which is the mutual information. For simplicity, assume first that the scan Ψ is data-independent, namely, it is merely a reordering of the entries of Y^{V_n}. In this case, {X_{Ψ_i}}_{i=1}^{n²} is a discrete-time Gaussian vector. We construct from it a continuous-time process, {X_t^{(c)}}_{t∈[0,n²]}, where for any t ∈ [i−1, i), X_t^{(c)} = X_{Ψ_i}, i ∈ {1, 2, ..., n²}. That is, X_t^{(c)} is a piecewise-constant process, whose constant values at intervals of length 1 correspond to the original values of the discrete-time vector {X_{Ψ_i}}. Let {Y_{Ψ_i}} and {Y_t^{(c)}} be the AWGN-corrupted versions of {X_{Ψ_i}} and X_t^{(c)}, namely, Y_{Ψ_i} = X_{Ψ_i} + N_{Ψ_i} and Y_t^{(c)} is constructed according to

    dY_t^{(c)} = X_t^{(c)} dt + σ_N dW_t,  t ∈ [0, n²],    (26)

where W_t is a standard Brownian motion. Observe that the white Gaussian noise, σ_N dW_t, has a spectral density of level σ_N², similar to the variance of the discrete-time noise N^{V_n}. Since we switch from discrete time to continuous time, it is important to note that the noise value in the two problems is equivalent. That is, if the discrete-time field X^{V_n} is corrupted by noise of variance σ_N², then we wish the continuous-time white noise to have a spectrum such that the integral over an interval of length 1, whose integrand is the continuous output Y_t^{(c)} (and thus is a sufficient statistic in order to estimate the piecewise-constant input X_t in this interval), will be a random variable which is exactly X_{Ψ_i} + N_{Ψ_i}, N_{Ψ_i} having a variance of σ_N². We have,

    (1/n²) E_{Q_{V_n}} L_{(Ψ,F̃_opt)}(X^{V_n}, Y^{V_n})
      = (1/n²) Σ_{i=1}^{n²} Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_i})
      = (1/n²) Σ_{i=1}^{n²} [∫_0^1 Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}}, {Y_{t′}^{(c)}}_{t′∈[i−1,i−1+t]}) dt − σ_N² f(Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}})/σ_N²)]    (a)
      ≥ (1/n²) Σ_{i=1}^{n²} ∫_0^1 Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}}, {Y_{t′}^{(c)}}_{t′∈[i−1,i−1+t]}) dt − σ_N² f(Var(X_1)/σ_N²)    (b)
      = (2σ_N²/n²) I({X_t^{(c)}}_{t∈[0,n²]}; {Y_t^{(c)}}_{t∈[0,n²]}) − σ_N² f(Var(X_1)/σ_N²)    (c)
      = (2σ_N²/n²) I({X_{Ψ_i}}; {Y_{Ψ_i}}) − σ_N² f(Var(X_1)/σ_N²)
      = (2σ_N²/n²) I(X^{V_n}; Y^{V_n}) − σ_N² f(Var(X_1)/σ_N²).    (d)    (27)

The equality (a) follows from the application of (23) with X = X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}}, i.e., with X_{Ψ_i} distributed conditioned on Y_{Ψ_1}^{Ψ_{i−1}}. Note that conditioned on Y_{Ψ_1}^{Ψ_{i−1}}, X_{Ψ_i} is indeed Gaussian, and that (23) applies to any Gaussian X corrupted by Gaussian noise. The inequality (b) holds since Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}}) ≤ Var(X_1) and due to the increasing monotonicity of f; (c) holds since the resulting integral from 0 to n² is simply the minimal mean square error in filtering {Y_t^{(c)}} (as Y_{Ψ_i} is a sufficient statistic with respect to {Y_{t′}^{(c)}}_{t′∈[i−1,i−1+t]}), together with the application of Duncan's result [17, Theorem 3]. Finally, (d) holds since the mutual information is invariant to the reordering of the random variables.
To complete the proof of Theorem 3, simply note that since f(x) is non-negative for x > 0, by (a) above the normalized cumulative loss can be upper bounded as well, that is,

    (1/n²) E_{Q_{V_n}} L_{(Ψ,F̃_opt)}(X^{V_n}, Y^{V_n}) ≤ (1/n²) Σ_{i=1}^{n²} ∫_0^1 Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}}, {Y_{t′}^{(c)}}_{t′∈[i−1,i−1+t]}) dt,    (28)

hence, similarly as in the chain of inequalities leading to (27),

    (1/n²) E_{Q_{V_n}} L_{(Ψ,F̃_opt)}(X^{V_n}, Y^{V_n}) ≤ (2σ_N²/n²) I(X^{V_n}; Y^{V_n}).    (29)

In fact, equation (29) can be viewed as the scanning and filtering analogue of [18, eq. (156a)]. Now, if the scan Ψ is data-dependent, the above derivations apply, with the use of the smoothing property of conditional expectation. That is, conditioned on Y_{Ψ_1}^{Ψ_{i−1}}, the position Ψ_i is fixed (assuming deterministic scanners, though a random scanning order can be tackled with a similar method), relation (a) in (27) holds since it holds conditioned on Y_{Ψ_1}^{Ψ_{i−1}}, and relation (c) holds as the mutual information is invariant under data-dependent reordering as well. This is very similar to the methods used in the proof of [12, Proposition 13], where it was shown that the entropy of a vector is invariant to data-dependent reordering.

At this point, a few remarks are in order. A very simple bound, applicable to arbitrarily distributed fields and under squared error loss (yet interesting mainly in the Gaussian regime), results from noting that for any random variables X and Y = X + N,

    0 ≤ Var(X | Y) ≤ σ_N² σ_X²/(σ_X² + σ_N²).    (30)

Namely, simple symbol-by-symbol restoration results in a cumulative loss of at most σ_N² σ_X²/(σ_X² + σ_N²), and we have,

    (1/n) E L_{(Ψ,F̃_opt)}({X_i}_{i=1}^n, {Y_i}_{i=1}^n) = (1/n) Σ_i Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_i}) ≤ (1/n) Σ_i Var(X_{Ψ_i} | Y_{Ψ_i}) = σ_N² σ_X²/(σ_X² + σ_N²).    (31)

Thus, the excess loss in non-optimal scanning cannot be greater than that value, hence,

    (1/n²) |E L_{(Ψ1,F̃_opt)}(X^{V_n}, Y^{V_n}) − E L_{(Ψ2,F̃_opt)}(X^{V_n}, Y^{V_n})| ≤ σ_N² σ_X²/(σ_X² + σ_N²).    (32)

In the next sub-section, we derive a tighter bound than the bound in (32), applicable to arbitrarily distributed input fields. However, since this bound may be harder to evaluate, it is interesting to discuss the properties of (32) as well. Both the bound in Theorem 3 and the bound in (32) are of the form Var(X_1) g(SNR) for some g, where SNR = σ_X²/σ_N². This means that any bound obtained for a certain SNR applies to all values of Var(X_1) by rescaling. The bound in Theorem 3 has the form Var(X_1) f(SNR)/SNR, where f(·) was defined in (24), and we have

    lim_{SNR→0+} f(SNR)/SNR = lim_{SNR→∞} f(SNR)/SNR = 0,    (33)

[Figure 2: Bounds on the excess loss in scanning and filtering of Gaussian input corrupted by AWGN. The solid line is the bound given in Theorem 3; the dashed line is the bound given in (32).]

that is, the scan is inconsequential at very high or very low SNR. This is clear, as at high SNR the current observation is by far the most influential, and whatever previous observations are used is inconsequential.
For low SNR, the cumulative loss is high whatever the scan is. Unlike the bound in Theorem 3, (32) does not predict the correct behavior for SNR → 0+, and is mainly interesting in the high-SNR regime. The above observations are also evident in Figure 2, which includes both the bound given in Theorem 3, applicable to Gaussian fields, and (32), applicable to arbitrarily distributed fields. It is also evident that in the case of Gaussian fields, f(SNR)/SNR has a unique maximum of approximately 0.216; that is, the excess loss due to a suboptimal scan at any SNR is upper bounded by 0.216 Var(X_1).

Remark 2. It is clear from the proof of Theorem 3 that an upper bound on the expression in (22), valid for arbitrarily distributed input X, may yield an upper bound on the excess scanning and filtering loss which is also valid for arbitrarily distributed random fields. However, while the integral in (22) can be upper bounded by assuming a Gaussian X, Var(X | Y) has no non-trivial lower bound. In fact, in [22] it is shown that if X is the following binary random variable,

    X = { √((1−p)/p)   w.p. p,
        { −√(p/(1−p))  w.p. 1−p,    (34)

for which E X = 0 and E X² = 1, then we have

    Var(X | Y) ≤ (1/(2p(1−p))) e^{−(σ_X²/σ_N²)/(4p(1−p))},    (35)

which can be arbitrarily close to 0 for small enough p. Thus, the only lower bound on Var(X | Y) which is valid for any X with E X² < ∞, and depends only on σ_X² and σ_N², is 0 (and hence results in a bound weaker than Theorem 3 or (32)).

In the next two subsections, we derive new bounds on the excess loss, which are valid for more general input fields. First, we generalize the bound in Theorem 3. While the result may be complex to evaluate in its general form, we show that for binary input fields the bound admits a simple form. We then show that if the input alphabet is continuous, then a non-trivial bound on Var(X | Y) can be derived easily, which, in turn, results in a new bound on the excess loss.

A Generalization of Theorem 3. A generalization of Theorem 3 results from revisiting equality (a) of (27), which is simply the application of (23) with X = X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}}. While it is clear that an expression similar to that in (23) can be computed for non-Gaussian X, it is not clear that X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}} has the same distribution for every 1 ≤ i ≤ n² (unlike the Gaussian setting, where X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}} is always Gaussian). Nevertheless, using the definition below, one can generalize Theorem 3 to arbitrarily distributed inputs as follows. For any (X^{V_n}, Y^{V_n}), where Y^{V_n} is the AWGN-corrupted version of X^{V_n}, define

    f*(X^{V_n}, σ_N²) = max_{Ψ, 1≤i≤n²} [∫_0^1 Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}}, {Y_{t′}^{(c)}}_{t′∈[i−1,i−1+t]}) dt − Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_i})].    (36)

Theorem 4. Let X^{V_n} be an arbitrarily distributed random field, with a constant marginal distribution satisfying Var(X_i) = σ_X² < ∞ for all i ∈ V_n. Let Y_i = X_i + N_i, where N^{V_n} is a white Gaussian noise of variance σ_N², independent of X^{V_n}. Then, for any two scans Ψ1 and Ψ2, we have

    (1/n²) |E L_{(Ψ1,F̃_opt)}(X^{V_n}, Y^{V_n}) − E L_{(Ψ2,F̃_opt)}(X^{V_n}, Y^{V_n})| ≤ f*(X^{V_n}, σ_N²).    (37)

The proof of Theorem 4 is similar to that of Theorem 3, and appears in Appendix A.1.
Note that f*(X^{V_n}, σ_N²) is scan-independent, as it includes a maximization over all possible scans. At first sight, it seems as if this maximization may take the sting out of the excess loss bound. However, as the example below shows, at least for the interesting scenario of binary input, this is not the case.

First, however, a few more general remarks are in order. Since important insight can be gained when using the results of Guo, Shamai and Verdú [18], let us recall the setting used therein. In [18], one wishes to estimate X based on √SNR X + N, where N is a standard normal random variable. Denote by I(SNR) and mmse(SNR) the mutual information between X and √SNR X + N, and the minimal mean square error in estimating X based on √SNR X + N, respectively. Note that Var(X | Y) in our setting equals σ_X² mmse(σ_X²/σ_N²). Under these definitions,

    (d/dSNR) I(SNR) = (1/2) mmse(SNR),    (38)

or, equivalently,

    I(SNR) = (1/2) ∫_0^SNR mmse(γ) dγ.    (39)

Consequently, the result of Theorem 3 can be restated as

    (1/n²) |E L_{(Ψ1,F̃_opt)}(X^{V_n}, Y^{V_n}) − E L_{(Ψ2,F̃_opt)}(X^{V_n}, Y^{V_n})| ≤ 2σ_N² I(SNR) − σ_X² mmse(SNR),    (40)

where I(SNR) = (1/2) ln(1 + SNR) and mmse(SNR) = 1/(1 + SNR) are simply the mutual information and minimal mean square error of the scalar problem (hence, a single-letter expression) of estimating a Gaussian X based on √SNR X + N, where N is standard Gaussian. In fact, the bound in Theorem 4 will always have the form 2σ_N² I(SNR) − σ_X² mmse(SNR), for some X* whose distribution is the maximizing distribution in (36). The next example shows that this is indeed the case for binary input as well, and the resulting bound can be easily computed.

Example 1 (Binary input and AWGN). Consider the case where X^{V_n} is a binary random field, with a symmetric marginal distribution (that is, P(X_0 = σ_X) = P(X_0 = −σ_X) = 1/2). Note that the X_i's are not necessarily i.i.d., and any dependence between them is possible. Y^{V_n} is the AWGN-corrupted version of X^{V_n}. To evaluate the bound in Theorem 4, f*(X^{V_n}, σ_N²) should be calculated. However, for any scan Ψ and time i, X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}} is still a binary random variable, taking the values ±σ_X with probabilities (p, 1−p), for some 0 ≤ p ≤ 1/2. Hence,

    f*(X^{V_n}, σ_N²) ≤ max_{0≤p≤1/2} [∫_0^1 Var(X_t | Y^t) dt − Var(X | Y)],    (41)

where X is a binary random variable, taking the values ±σ_X with probabilities (p, 1−p), X_t ≡ X, Y = X + N and Y^t is the AWGN-corrupted version of X_t. The following result holds for any random variable X.

Claim 5. For any random variable X with Var(X) = σ_X² < ∞, the expression in (22) is monotonically increasing in σ_X².

Proof. We have,

    ∫_0^1 Var(X | Y^t) dt − Var(X | Y) = ∫_0^1 σ_X² mmse(σ_X² t/σ_N²) dt − σ_X² mmse(σ_X²/σ_N²)
                                       = σ_N² ∫_0^{σ_X²/σ_N²} mmse(γ) dγ − σ_X² mmse(σ_X²/σ_N²).    (42)

Thus,

    (d/dσ_X²) [∫_0^1 Var(X_t | Y^t) dt − Var(X | Y)] = −σ_X² (d/dσ_X²) mmse(σ_X²/σ_N²) = −2 SNR (d²/dSNR²) I(SNR) ≥ 0,    (43)

where the last inequality is by [18, Corollary 1].

Claim 5 simply states that the monotonicity of f(·) used in inequality (b) of (27) is not specific to Gaussian input, and holds for any X.
Thus, by Claim 5, the term in the braces of equation (41) is monotonically increasing in the variance of X, which is simply 4σ_X²(p − p²). Thus, it is maximized by p = 1/2, and we have

    f*(X^{V_n}, σ_N²) = 2σ_N² I(SNR) − σ_X² mmse(SNR),    (44)

where I(SNR) and mmse(SNR) are the mutual information and minimal mean square error in the estimation of X based on √SNR X + N, where X is binary symmetric and N is a standard normal. Since the conditional mean estimate in this problem is tanh(√SNR Y), we have [18]

    I(SNR) = SNR − (1/√(2π)) ∫_{−∞}^{∞} e^{−y²/2} ln cosh(SNR − √SNR y) dy,    (45)

and

    mmse(SNR) = 1 − (1/√(2π)) ∫_{−∞}^{∞} e^{−y²/2} tanh(SNR − √SNR y) dy,    (46)

so the bound can be computed numerically. The above bound is plotted in Figure 3. Similarly to the case of Gaussian input, it is insightful to compare this bound to a simple symbol-by-symbol filtering bound. That is, since for any binary X corrupted by AWGN of variance σ_N² we have 0 ≤ Var(X | Y) ≤ σ_X² mmse(SNR), it follows that

    (1/n²) |E L_{(Ψ1,F̃_opt)}(X^{V_n}, Y^{V_n}) − E L_{(Ψ2,F̃_opt)}(X^{V_n}, Y^{V_n})| ≤ σ_X² mmse(SNR),    (47)

where mmse(SNR) is given in (46). This is simply the analogue of (32) for the binary input setting.

[Figure 3: Bounds on the excess loss in scanning and filtering of binary input fields corrupted by AWGN. The solid line is the bound given in (44) (that is, Theorem 4), and the dashed line is the symbol-by-symbol bound given in (47).]
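The bound (44) and the symbol-by-symbol bound (47) are straightforward to evaluate by computing the Gaussian integrals (45) and (46) numerically; a minimal sketch (ours, using scipy quadrature) follows.

```python
import numpy as np
from scipy.integrate import quad

def gauss_expect(fun):
    """E[fun(Z)] for Z ~ N(0,1), by quadrature."""
    integrand = lambda y: np.exp(-y * y / 2) / np.sqrt(2 * np.pi) * fun(y)
    val, _ = quad(integrand, -12, 12)
    return val

def I_bin(snr):
    """Mutual information (45) for binary symmetric input, in nats."""
    return snr - gauss_expect(lambda y: np.log(np.cosh(snr - np.sqrt(snr) * y)))

def mmse_bin(snr):
    """MMSE (46); the conditional mean estimate is tanh(sqrt(snr) * Y)."""
    return 1.0 - gauss_expect(lambda y: np.tanh(snr - np.sqrt(snr) * y))

sigma2_X, sigma2_N = 1.0, 1.0          # unit-variance binary input, SNR = 1
snr = sigma2_X / sigma2_N
theorem4_bound = 2 * sigma2_N * I_bin(snr) - sigma2_X * mmse_bin(snr)   # (44)
symbol_bound = sigma2_X * mmse_bin(snr)                                 # (47)
print(theorem4_bound, symbol_bound)
```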
A Bound for Arbitrarily Distributed Continuous Input. In this sub-section, we derive an additional bound on the excess scanning and filtering loss under squared error. We assume, however, that the input random field X^{V_n} is over R^{V_n}, and that X_i | Y^{V_n} has a finite differential entropy for any i ∈ V_n (roughly speaking, this means that in the denoising problem of X_i, X_i | Y^{V_n} is a non-degenerate continuous random variable). Under the above assumptions, we derive an excess loss bound which is not only valid for non-Gaussian input, but also depends on the memory in the random field (X^{V_n}, Y^{V_n}). On the other hand, it is important to note that the bound below is mainly asymptotic, and may be much harder to evaluate compared to the bounds in Theorem 3 or (32).

By [23, Theorem 9.6.5], for any X, Y with a finite conditional differential entropy H(X | Y),

    Var(X | Y) ≥ (1/2πe) exp{2H(X | Y)}.    (48)

Thus,

    (1/n²) E L_{(Ψ,F̃_opt)}(X^{V_n}, Y^{V_n}) = (1/n²) Σ_{i=1}^{n²} Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_i})
      ≥ (1/n²) Σ_{i=1}^{n²} (1/2πe) exp{2H(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_i})}    (a)
      ≥ (1/n²) Σ_{i=1}^{n²} (1/2πe) exp{2H(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{n²}})}    (b)
      ≥ (1/2πe) exp{2 (1/n²) Σ_{i=1}^{n²} H(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{n²}})},    (c)    (49)

where (a) is by applying (48) with Y = Y_{Ψ_1}^{Ψ_i}, (b) is since conditioning reduces entropy, and (c) is by applying Jensen's inequality. The expression (1/n²) Σ_{i=1}^{n²} H(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{n²}}) equals (1/n²) Σ_{i=1}^{n²} H(X_{Ψ′_i} | Y_{Ψ′_1}^{Ψ′_{n²}}) for any two scanners Ψ and Ψ′, since equality holds even without the expectation implicit in the entropy function. Thus, it is scan-invariant. Define

    H⁺(X | Y) = liminf_{n→∞} (1/|V_n|) Σ_{i∈V_n} H(X_i | Y^{V_n}).    (50)

H⁺(X | Y) can be seen as the asymptotic normalized entropy in the denoising problem of {X} based on its noisy observations. Note that the entropies in (50) are differential. The following proposition gives a new upper bound on the excess scanning and filtering loss under squared error.

Proposition 6. Let X^{V_n} be an arbitrarily distributed continuous-valued random field with Var(X_i) = σ_X² for all i. Let Y_i = X_i + N_i, where N^{V_n} is a white noise of variance σ_N², independent of X^{V_n}. Assume that X_i | Y^{V_n} has a finite differential entropy for any i ∈ V_n. Then, for any two scans Ψ1 and Ψ2, we have

    liminf_{n→∞} (1/|V_n|) |E L_{(Ψ1,F̃_opt)}(X^{V_n}, Y^{V_n}) − E L_{(Ψ2,F̃_opt)}(X^{V_n}, Y^{V_n})| ≤ σ_N² σ_X²/(σ_X² + σ_N²) − (1/2πe) exp{2H⁺(X | Y)}.    (51)

Proof. The proof follows directly by applying the lower bound on the scanning and filtering performance given in (49) and the upper bound in (31).

The bound in Proposition 6 is always at least as tight as the bound in (32) (and thus tighter than the bound in Theorem 3 for high SNR). For example, if the estimation error of X_i given Y^{V_n} tends to zero as n increases (as in the case where X_i = X for all i), then exp{2H⁺(X | Y)} = 0. However, if X_i cannot be reconstructed completely from Y^{V_n}, then the bound may be tighter than (32). It is far from being a tight bound on the excess loss, though. In the extreme case where all X_i's are i.i.d., the excess loss bound in Proposition 6 is σ_N² σ_X²/(σ_X² + σ_N²) − Var(X_1 | Y_1) > 0 (for non-Gaussian X), while it is clear that all reasonable scanner-filter pairs perform the same. Finally, note that any lower bound on H⁺(X | Y) results in an upper bound on the scanning and filtering excess loss. For example, since

    H⁺(X | Y) ≥ H(X_0 | Y_{−k}^{k}, X_{−k}^{−1}, X_{1}^{k})    (52)

for any finite k, one can compute a simple upper bound on the excess loss, at least for a first-order Markov {X}.
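As a quick sanity check (ours, not the paper's) of Proposition 6: for an i.i.d. Gaussian field, H⁺(X | Y) reduces to the scalar conditional entropy h(X_0 | Y_0) = (1/2) ln(2πe Var(X | Y)), so the right-hand side of (51) collapses to exactly zero, consistent with all scans performing alike there.

```python
import numpy as np

# Right-hand side of (51) for an i.i.d. Gaussian field: H+(X|Y) equals
# 0.5 * ln(2*pi*e * Var(X|Y)), so the two terms cancel and the excess-loss
# bound is zero (up to floating point).
sigma2_X, sigma2_N = 1.0, 0.5
var_x_given_y = sigma2_N * sigma2_X / (sigma2_X + sigma2_N)
H_plus = 0.5 * np.log(2 * np.pi * np.e * var_x_given_y)
rhs = var_x_given_y - np.exp(2 * H_plus) / (2 * np.pi * np.e)
print(rhs)   # ~0
```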
3.2.2 Binary Input and BSC

Unlike the Gaussian setting discussed in Section 3.2.1, where the bound on the excess loss resulted from a continuous-time equality, with the mutual information serving as the scan-invariant feature, in the case of binary input and a BSC the entropy of the random field will play the key role, similar to [12]. As given in Section 3.1, the best achievable performance (in the scalar problem) is given by

    f_δ(p) = min{(p − δ)/(1 − 2δ), (1 − p − δ)/(1 − 2δ), δ},    (53)

where p is the probability that the channel output is 1 and δ is the channel crossover probability. Note that f_δ(p) is not the Bayes envelope associated with estimating X_t using Y_t under Hamming loss. However, as is clear from the derivations in (17), and as will be evident from the proof of the following theorem, f_δ(P(y_t | y^{t−1})) is the expectation of the Bayes envelope (associated with estimating X_t using Y_t under Hamming loss) with respect to the distribution P(y_t | y^{t−1}). Define

    ε_δ = min_{a,b} max_{δ≤p≤1/2} |a h_b(p) + b − f_δ(p)|.    (54)

The following is the main result in this sub-section.

Theorem 7. Let Y^B be the output of a BSC with crossover probability δ whose input is X^B. Then, for any scanner-filter pair (Ψ, F̃_opt), where F̃_opt is the optimal filter for the scan Ψ, we have

    |(1/|B|) E_{Q_B} L_{(Ψ,F̃_opt)}(X^B, Y^B) − Ũ(l_H, Q_B)| ≤ 2ε_δ.    (55)

Even without evaluating ε_δ explicitly, it is easy to see that the excess loss when using non-optimal scanners is quite small in this binary filtering scenario. For example, for δ = 0.1 and δ = 0.25 we have ε_δ < 0.035 and ε_δ < 0.03 respectively, yielding a maximal loss of 0.07 or even 0.06. Figure 4 includes the value of 2ε_δ as a function of δ. Similarly to Section 3.2.1, it is compared to a simple bound on the excess loss which results from simply bounding the Hamming loss of any filter by 0 from below and δ from above (namely, δ is the resulting bound on the excess loss). The values in Figure 4 should also be compared to 0.16, which is the bound on the excess loss in the clean prediction scenario [12], or even to larger values in the noisy prediction scenario, to be discussed in the next section. The fact that the filtering problem is less sensitive to the scanning order is quite clear, as the noisy observation of X_{Ψ_t} is available under any scan. Finally, it is not hard to show that in the limits of δ → 0 and δ → 1/2 (high and low SNR, respectively), we have ε_δ → 0, which is expected, as the scanning is inconsequential in these cases (note, however, that the singlet bound, δ, does not predict the correct behavior at low SNR).

[Figure 4: Bounds on the excess loss in scanning and filtering of binary random fields corrupted by a BSC. The solid line is the bound in Theorem 7 (2ε_δ), and the dashed line is the singlet bound (f_δ(1/2) = δ).]
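ε_δ in (54) is a Chebyshev (minimax) linear fit of f_δ(p) against h_b(p) over δ ≤ p ≤ 1/2, and can be solved as a small linear program on a grid of p values. The sketch below (ours) reproduces values consistent with those quoted above; entropies are in nats, though ε_δ is insensitive to the entropy base since a and b are free.

```python
import numpy as np
from scipy.optimize import linprog

def h_b(p):
    return -p * np.log(p) - (1 - p) * np.log(1 - p)   # binary entropy, nats

def f_delta(p, delta):
    # Bayes performance (53), restricted to delta <= p <= 1/2
    return np.minimum((p - delta) / (1 - 2 * delta), delta)

def eps_delta(delta, grid=2000):
    """Minimax fit (54): min over (a, b) of max_p |a*h_b(p) + b - f_delta(p)|,
    posed as an LP in the variables (a, b, t) with t the worst-case error."""
    p = np.linspace(delta + 1e-9, 0.5, grid)
    h, f = h_b(p), f_delta(p, delta)
    A = np.block([[h[:, None], np.ones((grid, 1)), -np.ones((grid, 1))],
                  [-h[:, None], -np.ones((grid, 1)), -np.ones((grid, 1))]])
    b_ub = np.concatenate([f, -f])
    res = linprog(c=[0, 0, 1], A_ub=A, b_ub=b_ub,
                  bounds=[(None, None), (None, None), (0, None)])
    return res.fun

for d in (0.1, 0.25):
    print(d, eps_delta(d))   # both near 0.03, consistent with the text
```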
Proof (Theorem 7). We first show that for any arbitrarily distributed binary n-tuple X^n and any 0 ≤ δ < 1/2,

    |a*_δ (1/n) H(Y^n) + b*_δ − (1/n) E L_opt^{l_H}(X^n, Y^n)| ≤ ε_δ,    (56)

where E L_opt^{l_H}(X^n, Y^n) is the expected cumulative Hamming loss in optimally filtering X^n based on Y^n, and a*_δ and b*_δ are the minimizers of ε_δ in (54). Indeed,

    |a*_δ (1/n) H(Y^n) + b*_δ − (1/n) E L_opt^{l_H}(X^n, Y^n)|
      = |(1/n) Σ_{t=1}^n Σ_{y^{t−1}} P(y^{t−1}) Σ_{x_t,y_t} [−a*_δ P(x_t, y_t | y^{t−1}) ln P(y_t | y^{t−1}) + b*_δ P(x_t, y_t | y^{t−1}) − P(x_t, y_t | y^{t−1}) l_H(x_t, F̃_opt(y^t))]|
      ≤ (1/n) Σ_{t=1}^n Σ_{y^{t−1}} P(y^{t−1}) |a*_δ h_b(P(y_t | y^{t−1})) + b*_δ − Σ_{x_t,y_t} P(x_t, y_t | y^{t−1}) l_H(x_t, F̃_opt(y^t))|
      ≤ (1/n) Σ_{t=1}^n Σ_{y^{t−1}} P(y^{t−1}) |a*_δ h_b(P(y_t | y^{t−1})) + b*_δ − Σ_{y_t} P(y_t | y^{t−1}) Σ_{x_t} P(x_t | y^t) l_H(x_t, F̃_opt(y^t))|.    (57)

Consider the summation Σ_{y_t} P(y_t | y^{t−1}) Σ_{x_t} P(x_t | y^t) l_H(x_t, F̃_opt(y^t)). As F̃ is optimal, the inner sum equals min{P(x_t = 0 | y^t), P(x_t = 1 | y^t)}. Thus, similarly to the derivations in (17), we have

    Σ_{y_t} P(y_t | y^{t−1}) Σ_{x_t} P(x_t | y^t) l_H(x_t, F̃_opt(y^t))
      = Σ_{y_t} P(y_t | y^{t−1}) min{P(x_t = 0 | y^t), P(x_t = 1 | y^t)}
      = min{(P(y_t = 0 | y^{t−1}) − δ)/(1 − 2δ), (P(y_t = 1 | y^{t−1}) − δ)/(1 − 2δ), δ}.    (58)

Let p = P(y_t = 0 | y^{t−1}). Note that min{p, 1 − p} ≥ δ. We have

    |a*_δ (1/n) H(Y^n) + b*_δ − (1/n) E L_opt^{l_H}(X^n, Y^n)|
      ≤ (1/n) Σ_{t=1}^n Σ_{y^{t−1}} P(y^{t−1}) |a*_δ h_b(p) + b*_δ − min{(p − δ)/(1 − 2δ), (1 − p − δ)/(1 − 2δ), δ}|
      ≤ (1/n) Σ_{t=1}^n Σ_{y^{t−1}} P(y^{t−1}) max_{δ≤p≤1/2} |a*_δ h_b(p) + b*_δ − f_δ(p)|
      = ε_δ,    (59)

which establishes (56). However, the same inequality can be proved for any reordering Ψ of the data (similar to the proof of [12, Proposition 13]); consequently,

    |a*_δ (1/|B|) H(Ψ(Y^B)) + b*_δ − (1/|B|) E_{Q_B} L_{(Ψ,F̃_opt)}(X^B, Y^B)| ≤ ε_δ.    (60)

Using (60), remembering that H(Y^B) = H(Ψ(Y^B)) for any Ψ, and applying the triangle inequality results in (55).

Note that analogous ideas were used by Verdú and Weissman to bound the absolute difference between the denoisability and erasure entropy [24].

Theorem 2 gives a lower bound on the best achievable scanning and filtering performance. Theorems 3, 4 and 7 give an upper bound on the maximal possible difference between the normalized cumulative loss of any two scanners (accompanied by the optimal filters), or any one scanner compared to the optimal scan. Although Theorem 2 is similar to the results of [11], even for the relatively simple examples of a Gaussian field through a Gaussian memoryless channel or a binary source through a binary symmetric channel we have no results which can parallel [11, Theorem 17] or [11, Corollary 21], i.e., give an example of an optimal scanner-filter pair for a certain scenario. However, as the next example shows, we can identify situations where scanning and filtering improves the filtering results, i.e., non-trivial scanning of the data results in strictly better restoration. Moreover, the example below illustrates the use of the results derived in this section.

Example 2 (One-Dimensional Binary Markov Source and the BSC). In this case, it is not too hard to construct a scheme in which non-trivial scanning improves the filtering performance. In [21], Ordentlich and Weissman study the optimality of symbol-by-symbol (singlet) filtering and decoding. That is, the regions (depending on the source and channel parameters) where a memoryless scheme to estimate X_i is optimal with respect to causal (filtering) or non-causal (denoising) non-memoryless schemes. Clearly, in the regions where singlet denoising is optimal (a fortiori singlet filtering), scanning cannot improve the filtering performance. However, consider the region where singlet filtering is optimal, yet singlet denoising is not. In this region, there exists k for which the estimation error in estimating X_i based on Y_{i−k}^{i+k} is strictly smaller than that based on Y_i (as the optimal filter is memoryless yet the optimal denoiser is not). Hence, a scanner which in the first pass scans k contiguous symbols, then skips one, etc., and in the second pass returns to fill in the holes, accompanied by singlet filtering in the first pass and non-memoryless estimation in the second, has strictly better filtering performance than the trivial scanner. A brute-force computation of the second-pass estimate is sketched below.
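The following sketch (ours; brute-force enumeration rather than the forward-backward recursion referenced in the sequel) computes the second-pass posterior P(x_i | y_{i−1}, y_i, y_{i+1}) for the binary symmetric Markov source and BSC of this example.

```python
import itertools

def posterior_x1(y, pi, delta):
    """P(x_1 = 1 | y_0, y_1, y_2) for a binary symmetric Markov source
    (transition probability pi) observed through a BSC(delta), by brute-force
    enumeration over the clean triple; this is the 'fill in the holes'
    estimate used in the second pass."""
    def chain_prob(x):
        p = 0.5                                   # symmetric stationary start
        for a, b in zip(x, x[1:]):                # Markov transitions
            p *= pi if a != b else 1 - pi
        for a, o in zip(x, y):                    # BSC emissions
            p *= delta if a != o else 1 - delta
        return p
    joint = {x: chain_prob(x) for x in itertools.product((0, 1), repeat=3)}
    num = sum(p for x, p in joint.items() if x[1] == 1)
    return num / sum(joint.values())

# With pi = delta = 0.1 and observations (0, 1, 0), the posterior of the
# middle symbol is below 1/2, so the two-sided estimate flips the observed
# symbol -- memory strictly helps, even though singlet filtering is optimal.
print(posterior_x1((0, 1, 0), 0.1, 0.1))   # ~0.30, so estimate x_1 = 0
```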
For a binary symmetric Markov source with transition probability $\pi \le \frac{1}{2}$, corrupted by a BSC with crossover probability $\delta$, [21, Corollary 3] asserts that singlet filtering (the "say-what-you-see" scheme in this case) is optimal if and only if
\[
\delta \le f(\pi) \triangleq \frac{1}{2}\left(1 - \sqrt{\max\{1 - 4\pi,\, 0\}}\right). \tag{61}
\]
Singlet denoising, on the other hand, is optimal if and only if
\[
\delta \le d(\pi) \triangleq \frac{1}{2}\left(1 - \sqrt{\max\left\{1 - 4\left(\frac{\pi}{1-\pi}\right)^2,\, 0\right\}}\right). \tag{62}
\]
Consider a scanner-filter pair which scans the data using an "odds-then-evens" scheme. On the odds, "say-what-you-see" filtering is used. On the evens, $Y_{i-1}^{i+1}$ is used in order to estimate $X_i$.²

² This is to have only a few simple steps in the forward-backward algorithm [25, Section 5], which is required to compute $P(x_i \mid y_{i-1}^{i+1})$. The generalization to $Y_{i-k}^{i+k}$ is straightforward.

[Figure 5: Where a simple (sub-optimal) "odds-then-evens" scan can improve on the trivial scanning order with the optimal filtering scheme. $\pi$ is the transition probability of the symmetric, first order, Markov source and $\delta$ is the channel crossover probability.]

The results are given in Figures 5 and 6. In Figure 5, the points marked with "x" are where the "odds-then-evens" scan improves on the trivial scan. The two curves are $f(\pi)$ and $d(\pi)$. Figure 6 shows the actual improvement made by the "odds-then-evens" scanning and filtering. For $\delta = \pi = 0.1$, for example, the "odds-then-evens" error rate is smaller than that of filtering with the trivial scan by $0.021$ (that is, $0.079$ compared to $0.1$). This value should be put alongside the upper bound on the excess loss given in Theorem 7, which is smaller than $0.07$ in this case. To evaluate the bound on the best achievable scanning and filtering performance given in Theorem 2 for this example (denoted, with a slight abuse of notation, by $\tilde U(\pi, \delta)$), we have
\[
\tilde U(\pi, \delta) \ge \bar\zeta^{-1}\left(\bar H(\pi, \delta)\right) \ge \bar\zeta^{-1}\left(h_b(\pi * \delta)\right), \tag{63}
\]
where $\bar H(\pi, \delta)$ is the entropy rate of the output, which is in turn lower bounded by $h_b(\pi * \delta)$. The resulting bound for $\pi = \delta = 0.1$ is approximately $0.04$. Thus, there exist non-trivial scanning and filtering schemes (i.e., lower bounds) whose improvement on the trivial scanning order is of the same order of magnitude as the upper bound in Theorem 7. To conclude, it is clear that there is a wide region where a non-trivial scanning order improves on the trivial scan, and that this region includes at least all the region between $f(\pi)$ and $d(\pi)$. Yet, it is not clear what the optimal scanner-filter pair is.

3.3 Universal Scanning and Filtering of Noisy Data Arrays

In [19], Weissman et al. mention that the problems involving sequential decision making on noisy data are not fundamentally different from their noiseless analogues, and in fact can be reduced to the noiseless setting using a properly modified loss function. Indeed, this property of the noisy setting was used throughout the literature, and in this work.
The problem of filtering a noisy data sequence is not different in this sense, and it is possible to construct a modified loss function such that the filtering problem is transformed into a prediction problem (with a few important exceptions to be discussed later). Such a modified loss function and a 'filtering-prediction transformation' are discussed in [19]. We briefly review this transformation and consider its use in universal filtering of noisy data arrays.

First, we slightly generalize our notion of a filter. For a random variable $U_t$ uniformly distributed on $[0, 1]$, let $\hat X_t(y^{t-1} y_t, U_t) \in A$ denote the output of the filter $\hat X$ at time $t$, after observing $y^t$. That is, the filter $\hat X$ also views an auxiliary random variable, on which it can base its output, $\hat X_t(y^{t-1} y_t, U_t)$. We also generalize the prediction space to $M(S)$, $S = \{s : N \mapsto A\}$; i.e., the prediction space is a distribution on the set of functions from the noisy observation alphabet $N$ to the clean signal alphabet $A$. We assume an invertible discrete memoryless channel. For each filter $\hat X$, the corresponding predictor is defined by
\[
F^{\hat X}_t(y^{t-1})[s] = P\left( \hat X_t(y^{t-1} y, U_t) = s(y) \ \ \forall y \in N \right). \tag{64}
\]

[Figure 6: The actual difference between the optimal filtering error rate and the "odds-then-evens" scanning and filtering error rate. $\pi$ is the transition probability of the symmetric, first order, Markov source and $\delta$ is the channel crossover probability. Only values for which $\delta < f(\pi)$ are shown.]

The analogous 'prediction-filtering transformation' is
\[
\hat X^F_t(y^t, u_t) = a_j \in A \quad \text{if} \quad \sum_{i=0}^{j-1} \sum_{s : s(y_t) = a_i} F_t(y^{t-1})[s] \le u_t < \sum_{i=0}^{j} \sum_{s : s(y_t) = a_i} F_t(y^{t-1})[s], \tag{65}
\]
where the subscript $i$ reflects some enumeration of $A$. Under the above definitions, [19, Theorem 4] states that for all $n$, $x^n \in A^n$ and any predictor $F$,
\[
E L_{\hat X^F}(x^n, Y^n) = E L'_F(Y^n), \tag{66}
\]
where $L_{\hat X^F}(x^n, Y^n)$ is the cumulative loss of the filter under the original loss function $l$ and $L'_F(Y^n)$ is the cumulative loss of the predictor under a modified loss function $l'$, which depends on the original loss $l$ and the channel crossover probabilities.

This result can be used for universal filtering, under invertible discrete memoryless channels, in the following way. For each finite set of filters, construct the corresponding set of predictors, then use the well known results in universal prediction in order to construct a universal predictor for that set. Finally, construct the universal filter using the "inverse" prediction-filtering transformation. Analogously, the results on universal finite-set scandiction given in [12] can be used to construct universal scanner-filter pairs. Note, however, that the modified loss function $l'$ may be much more complex to handle compared to the original one. For example, it may not be a function of the difference $x_t - F_t$, even if the original loss function is. Nevertheless, the results in [12] apply to any bounded loss function, and thus can be utilized.
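As an illustration of the transformations (64)-(65), the following Python sketch implements both directions for binary alphabets ($A = N = \{0, 1\}$). It is a minimal sketch under this binary assumption; `example_filter` and all other names are illustrative placeholders rather than an implementation from [19].

```python
# A minimal sketch of the filter <-> predictor correspondence of (64)-(65)
# for binary alphabets. A predictor's output is a distribution over the
# four maps s: {0,1} -> {0,1}, represented as tuples (s(0), s(1)).
import itertools
import random

A = [0, 1]
MAPS = list(itertools.product(A, repeat=2))

def filter_to_predictor(filter_fn, y_past, num_samples=10_000):
    """Estimate F_t(y^{t-1})[s] = P( Xhat_t(y^{t-1} y, U_t) = s(y) for all y ),
    as in (64), by Monte Carlo over the randomization variable U_t."""
    counts = {s: 0 for s in MAPS}
    for _ in range(num_samples):
        u = random.random()
        # The same realization of U_t is used for every possible observation y.
        outputs = tuple(filter_fn(y_past + [y], u) for y in A)
        counts[outputs] += 1
    return {s: c / num_samples for s, c in counts.items()}

def predictor_to_filter(F_dist, y_t, u_t):
    """The inverse transformation (65): output a_j when u_t falls between the
    cumulative masses of the maps s with s(y_t) = a_i."""
    cum = 0.0
    for a in A:
        cum += sum(p for s, p in F_dist.items() if s[y_t] == a)
        if u_t < cum:
            return a
    return A[-1]

# Example: a randomized "say-what-you-see with probability 0.9" filter.
def example_filter(y_seq, u):
    y = y_seq[-1]
    return y if u < 0.9 else 1 - y

F = filter_to_predictor(example_filter, y_past=[0, 1])
print(F)  # ~0.9 mass on the identity map (0, 1), ~0.1 on the flip map (1, 0)
x_hat = predictor_to_filter(F, y_t=1, u_t=random.random())
```

Note how a single realization of $U_t$ couples the filter outputs across all possible observations $y$; this coupling is what makes the map-valued prediction space $M(S)$ in (64) the natural object.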
4 Scandiction of Noisy Data Arrays

In this section, we consider a scenario similar to that of Section 3, only now, for each $t$, the datum $Y_{\Psi_t}$ is not available for the estimation of $X_{\Psi_t}$; namely, $F_t = F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}})$, as opposed to $\tilde F_t = \tilde F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_t})$ in the filtering scenario. We refer to this scenario as "noisy scandiction", analogous to the noisy prediction problems discussed in [14] and [15].

We first assume that the joint probability distribution of the underlying field and the noisy observations, $Q$, is known, and examine the settings of Gaussian fields under squared error loss and binary fields under Hamming loss. In these cases, we characterize the noisy scandictability and the achieving scandictors in terms of the "clean" scandictability of the noisy data. We then consider universal scandiction for the noisy setting, and show that it is indeed possible for a finite scandictor set and for the class of all stationary binary fields corrupted by binary noise. Finally, we derive bounds on the excess loss when non-optimal scanners are used (yet, with the optimal predictor for each scan).

4.1 Noisy Scandictability

Throughout this section, it will be beneficial to also consider the clean scandictability as defined in [11, Definition 2], that is, when the scandictor is judged with respect to the same random field it observes. Thus, for $(X, Y)$ governed by the probability measure $Q$, $Q_Y$ denotes the marginal measure of $\{Y\}$, and therefore $U(l, Q_Y)$ refers to the clean scandictability of $Y$, i.e.,
\[
L_{(\Psi,F)}(y^B) = \sum_{t=1}^{|B|} l\left(y_{\Psi_t}, F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}})\right), \tag{67}
\]
and
\[
U(l, Q_Y) = \lim_{n \to \infty} \inf_{(\Psi,F)} E_{Q_Y} \frac{1}{|B|} L_{(\Psi,F)}(Y^B). \tag{68}
\]
As mentioned earlier, in this section we relate the noisy scandictability, $\bar U(l, Q)$, to the clean scandictability of the noisy field, $U(l, Q_Y)$. This relation can be used to derive bounds on $\bar U(l, Q)$ using the bounds on $U(l, Q_Y)$ derived in [11]. However, this should be done carefully. For example, the lower and upper bounds given in [11, Theorem 9] are applicable only when $X$ has an autoregressive representation (with respect to some scandictor) with independent innovations. Unfortunately, $Y = X + N$ does not necessarily have such a representation, and the bounds do not apply to $Y$ in a straightforward manner.³ Yet, a simple generalization of the lower bound in [11], valid for arbitrarily distributed random fields, can be derived using the same method used in the proof of Theorem 2. To this end, we briefly describe this generalization. Let
\[
B(P) = \min_{\hat y} \sum_{y} l(y, \hat y) P(y), \tag{69}
\]
and further define
\[
\gamma(d) = \max\{H(P) : B(P) \le d\}. \tag{70}
\]
Similarly to Section 3.1, denote by $\bar\gamma(d)$ the upper concave envelope of $\gamma(d)$.

Corollary 8. For any random field $Y^B$ and any scandictor $(\Psi, F)$ for $Y^B$,
\[
\bar\gamma\left(\frac{1}{|B|} E_{Q_B} L_{(\Psi,F)}(Y^B)\right) \ge \frac{1}{|B|} H(Y^B). \tag{71}
\]

³ Note that the restriction to autoregressive fields is merely technical, i.e., it facilitates the proof of the lower bound in the sense that a weak AEP-like theorem is required. The essence of the lower bound, however, which is a volume preservation argument, is valid for non-autoregressive fields as well.

Proof. The proof is similar to that of Theorem 2.
We have
\begin{align}
H(Y^B) = H(\Psi(Y^B)) &= \sum_{t=1}^{|B|} H\left(Y_{\Psi_t} \mid Y^{\Psi_{t-1}}\right) \nonumber\\
&\le \sum_{t=1}^{|B|} \sum_{y^{\Psi_{t-1}}} \gamma\left( E_{Q_B}\left[ l\left(Y_{\Psi_t}, F_t(y^{\Psi_{t-1}})\right) \,\Big|\, Y^{\Psi_{t-1}} = y^{\Psi_{t-1}} \right] \right) P(y^{\Psi_{t-1}}) \nonumber\\
&\le |B|\, \bar\gamma\left( \frac{1}{|B|} E_{Q_B} L_{(\Psi,F)}(Y^B) \right). \tag{72}
\end{align}

The lower bound in Corollary 8 strengthens the bound in [11, Theorem 9], since it applies to general loss functions and arbitrarily distributed random fields, and is non-asymptotic. When $A = \mathbb{R}$ and the loss function is of the form $l(x, F) = \rho(x - F)$, where $\rho(z)$ is monotonically increasing for $z > 0$, monotonically decreasing for $z < 0$, satisfies $\rho(0) = 0$ and $\int e^{-s\rho(z)} dz < \infty$ for every $s > 0$, the above bound coincides with that of [11]. In that case, $\bar\gamma(d) = \gamma(d)$, which is in turn the one-sided Fenchel-Legendre transform of the log moment generating function associated with $\rho$ (see [11, Section III] for the details). For example, when $\rho(z) = z^2$, we have $\gamma(d) = \frac{1}{2}\ln(2\pi e d)$, $d > 0$, and $\gamma^{-1}(h) = \frac{1}{2\pi e} e^{2h}$, $h > 0$. Similar results can be derived for the binary alphabet; thus, when $\rho(z)$ is the Hamming loss function, $\gamma(d) = h_b(d)$.

We now turn to discuss the noisy scandictability, $\bar U(l, Q)$. The following lemma, proved in Appendix A.2, describes the noisy scandictability for any additive white noise channel model and the squared error loss function, $l_s(\cdot)$, in terms of the clean scandictability of $Y$, and gives the optimal scandictor.

Lemma 9. Let $\{(X_t, Y_t)\}_{t \in \mathbb{Z}^2}$ be a random field governed by a probability measure $Q$ such that $Y_t = X_t + N_t$, where $N_t$, $t \in \mathbb{Z}^2$, are i.i.d. random variables with $\mathrm{Var}(N_t) = \sigma^2_N < \infty$. Then
\[
\bar U(l_s, Q) = U(l_s, Q_Y) - \sigma^2_N. \tag{73}
\]
Furthermore, $\bar U(l_s, Q)$ is achieved by the scandictor which achieves $U(l_s, Q_Y)$.

Actually, Lemma 9 is only scarcely related to scanning. It merely states that in the prediction of a process based on its noisy observations, under the additive model stated above and squared error loss, the optimal predictor is one which disregards the noise and attempts to predict the next noisy outcome. Similar results for binary processes through a BSC were given in [16] and will be discussed later.

Finally, we mention that the method used in the proof of Lemma 9 is specific to the squared error loss function. For a general loss function, one can use conditional expectation in order to compute the noisy scandictability, under a modified loss function $\rho$. Specifically, for a random field $X$, denote by $\sigma(X^{V_n})$ the smallest sigma-algebra with respect to which $X^{V_n}$ is measurable. Let $\Psi_n$ denote a scanner for $V_n$ and denote by $\mathcal{F}^{\Psi_n}_t$ the information available to the scandictor at the $t$'th step, that is,
\[
\mathcal{F}^{\Psi_n}_t = \sigma\left( Y_{\Psi_1}, Y_{\Psi_2(Y_{\Psi_1})}, \ldots, Y_{\Psi_t(Y_{\Psi_1}^{\Psi_{t-1}})} \right). \tag{74}
\]
Note that the set of sites $\Psi_1, \Psi_2, \ldots, \Psi_t$ is itself random, yet for each $t$, $\Psi_t$ is $\mathcal{F}^{\Psi_n}_{t-1}$-measurable (if $\Psi$ is random, namely, it uses additional independent random variables, the definition of $\mathcal{F}^{\Psi_n}_t$ is altered accordingly). Hence, the filtration $\{\mathcal{F}^{\Psi_n}_t\}_{t=1}^{|V_n|}$ represents the knowledge gathered by the scandictor. We have
\begin{align}
E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \rho(F_t - Y_{\Psi_t}) &= E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} E_{Q_{B_n}}\left\{ \rho(F_t - X_{\Psi_t} - N_{\Psi_t}) \,\Big|\, \mathcal{F}^\Psi_{t-1}, \sigma(X_{\Psi_t}) \right\} \nonumber\\
&= E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \tilde\rho(F_t - X_{\Psi_t}), \tag{75}
\end{align}
for some $\tilde\rho$.
Thus, if $l(X_{\Psi_t}, F_t)$ is the required loss function in the noisy prediction problem of $\{X\}$, one has to seek a function $\rho(\cdot, \cdot)$ such that $\tilde\rho(x_{\Psi_t}, F_t) = l(x_{\Psi_t}, F_t)$ for all $x_{\Psi_t}$ and $F_t$. If such a function is found, then surely $E\rho(Y_{\Psi_t}, F_t) = E l(X_{\Psi_t}, F_t)$, and the optimal scandictor for the noisy prediction problem is the one which is optimal for the clean prediction problem of $\{Y\}$ under $\rho$. While this is simple for the squared error loss function and additive noise (choose $\rho(y - F) = (y - F)^2 - \sigma^2_N$), or Hamming loss and a BSC (choose $\rho(y, F) = \frac{l_H(y, F) - \delta}{1 - 2\delta}$), it is not always the case for a general loss function. It is also important to note that in the case of white noise considered in this paper, the condition on the modified loss function $\rho$ can be stated as a single-letter expression; namely, if $l(X, F)$ is the required loss function for the noisy scandiction problem, $\rho$ should satisfy $E\{\rho(Y, F) \mid \sigma(X)\} = l(X, F)$.

4.1.1 Gaussian Random Fields

Let both $X$ and $N$ be Gaussian random fields, where the components of $N$ are i.i.d. and independent of $X$. That is, $Y$ is the output of an AWGN channel with a Gaussian input $X$. In this scenario, similarly to the clean one, the noisy scandictability is known exactly and is given by a single-letter expression. Before we proceed, several definitions are required. For any $t \in \mathbb{Z}^2$ and $V \subseteq \mathbb{Z}^2$, denote by $\hat X_t(V)$ the best linear predictor of $X_t$ given $\{X_{t'}\}_{t' \in V}$. A subset $S \subseteq \mathbb{Z}^2$ is called a half plane if it is closed under addition and satisfies $S \cup (-S) = \mathbb{Z}^2$ and $S \cap (-S) = \{0\}$. For example, $S_{\mathrm{lex}} = \{(m, n) \in \mathbb{Z}^2 : [m > 0] \text{ or } [m = 0, n \ge 0]\}$ is a half plane. Let $X$ be a wide sense stationary random field and denote by $g$ the density function associated with the absolutely continuous component in the Lebesgue decomposition of its spectral measure. Then, for any half plane $S$, we have [26, Theorem 1]
\[
E\left( X_0 - \hat X_0(-S \setminus \{0\}) \right)^2 = \exp\left\{ \frac{1}{4\pi^2} \int_{[0, 2\pi)^2} \ln g(\lambda)\, d\lambda \right\} \triangleq \sigma^2_u(X). \tag{76}
\]
We can now state the following corollary regarding the noisy scandictability in the Gaussian regime under squared error loss, which is a direct application of Lemma 9 and the results of [11, Section IV].

Corollary 10. Let $\{(X_t, Y_t)\}_{t \in \mathbb{Z}^2}$ be a random field governed by a probability measure $Q$ such that $Y_t = X_t + N_t$, where $X$ is a stationary Gaussian random field and $N_t$, $t \in \mathbb{Z}^2$, is an AWGN, independent of $\{X_t\}_{t \in \mathbb{Z}^2}$. Then, the noisy scandictability of $Q$ under the squared error loss is given by
\[
\bar U(l_s, Q) = \sigma^2_u(Y) - \sigma^2_N. \tag{77}
\]
Furthermore, $\bar U(l_s, Q)$ is asymptotically achieved by a scandictor which scans $(X_t, Y_t)$ according to the total order defined by any half plane $S$ and applies the corresponding best linear predictor for the next outcome of $Y$.

For any stationary Gaussian process $X$, it has been shown by Kolmogorov (see, for example, [27]) that the entropy rate is given by
\[
\bar H_X = \frac{1}{2}\ln(2\pi e) + \frac{1}{4\pi} \int_{-\pi}^{\pi} \ln g(\lambda)\, d\lambda. \tag{78}
\]
Thus, using the one-dimensional analogue of (76), for a stationary Gaussian process $X$ we have
\[
\bar H_X = \frac{1}{2}\ln\left(2\pi e\, \sigma^2_u(X)\right). \tag{79}
\]
In fact, (79) applies to stationary Gaussian random fields as well.
Thus, we have
\[
\bar U(l_s, Q) = \sigma^2_u(Y) - \sigma^2_N = \frac{1}{2\pi e} e^{2\bar H_Y} - \frac{1}{2\pi e} e^{2 H_N}, \tag{80}
\]
where $\bar H_Y$ is the entropy rate of $Y$ and $H_N$ is the entropy of each $N_t$. From the entropy power inequality [23, pp. 496], we have
\[
\frac{1}{2\pi e} e^{2\bar H_Y} \ge \frac{1}{2\pi e} e^{2 H_N} + \frac{1}{2\pi e} e^{2\bar H_X}, \tag{81}
\]
thus, as expected, the noisy scandictability given in Corollary 10 (and (80)) is at least as large as the clean scandictability of $X$, that is, the scandictability with no noise at all. In most of the interesting cases, however, (81) is a strict inequality. In fact, as mentioned in [28], (81) is achieved with equality only when both $X$ and $N$ are Gaussian and have proportional spectra. Consequently, unless $X$ is white, Corollary 10 is non-trivial.

4.1.2 Binary Random Fields

In this case, the results of [14] and [16] shed light on the optimal scandictor. Therein, it was shown that for a binary prediction problem, i.e., where $\{X_t\}$ is a binary source passed through a BSC with crossover probability $\delta < \frac{1}{2}$ and $\{Y_t\}$ is the channel output, the more likely outcome for the clean bit is also the more likely outcome for the noisy bit. Thus, the optimal predictor in the Hamming sense for the next clean bit (based on the noisy observations) might as well use the same strategy as if it were trying to predict the next noisy bit. Consequently, the optimal scandictor in the noisy setting is the one which is optimal for $\{Y\}$, and the results of [11, Section V] apply. The following proposition relates the scandictability of a binary noise-corrupted process $\{Y\}$, judged with respect to the clean binary process $\{X\}$, to its clean scandictability.

Proposition 11. Let $\{(X_t, Y_t)\}_{t \in \mathbb{Z}^2}$ be a binary random field governed by a probability measure $Q$ such that $\{Y_t\}$ is the output of a binary memoryless symmetric channel with crossover probability $\delta$ and input $\{X_t\}$. Then,
\[
\bar U(l_H, Q) = \frac{U(l_H, Q_Y) - \delta}{1 - 2\delta}, \tag{82}
\]
where $l_H$ is the Hamming loss function. Furthermore, $\bar U(l_H, Q)$ is achieved by the scandictor which achieves $U(l_H, Q_Y)$.

Note that indeed $U(l_H, Q_Y) \ge \delta$, as $Y$ is the output of a BSC with crossover probability $\delta$.

Proof (Proposition 11). Let $\{B_n\}_{n \ge 1}$ be any sequence of elements of $V$ satisfying $R(B_n) \to \infty$. We have
\begin{align}
\bar U(l_H, Q_{B_n}) &= \inf_{(\Psi,F) \in \mathcal{S}(B_n)} E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} l_H\left(X_{\Psi_t}, F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}})\right) \nonumber\\
&= \inf_{(\Psi,F) \in \mathcal{S}(B_n)} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} P\left(F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}}) \ne X_{\Psi_t}\right), \tag{83}
\end{align}
and, analogously,
\[
U(l_H, Q_{Y,B_n}) = \inf_{(\Psi,F) \in \mathcal{S}(B_n)} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} P\left(F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}}) \ne Y_{\Psi_t}\right). \tag{84}
\]
Denoting by $Z_t$ the channel noise at time $t$, and abbreviating $F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}})$ by $F_t$, we have
\begin{align}
P(F_t \ne Y_{\Psi_t}) &= P(F_t \ne Y_{\Psi_t}, Z_{\Psi_t} = 1) + P(F_t \ne Y_{\Psi_t}, Z_{\Psi_t} = 0) \nonumber\\
&= P(F_t = X_{\Psi_t}, Z_{\Psi_t} = 1) + P(F_t \ne X_{\Psi_t}, Z_{\Psi_t} = 0) \nonumber\\
&= \left(1 - P(F_t \ne X_{\Psi_t})\right)\delta + P(F_t \ne X_{\Psi_t})(1 - \delta). \tag{85}
\end{align}
Namely, for $\delta < \frac{1}{2}$, the optimal strategy for predicting $Y_{\Psi_t}$ based on $Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}}$ and the optimal strategy for predicting $X_{\Psi_t}$ based on $Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}}$ are identical, and, in addition,
\[
P(F_t \ne X_{\Psi_t}) = \frac{P(F_t \ne Y_{\Psi_t}) - \delta}{1 - 2\delta}. \tag{86}
\]
Substituting (86) into (83) and taking $n \to \infty$ completes the proof.
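The identity (86) at the heart of the proof is easy to check empirically. The following Python sketch (an illustration, not part of the proof) simulates a binary symmetric Markov source through a BSC and measures the error rate of a predictor of the next bit against both the clean and the noisy data; the "repeat the previous noisy bit" predictor is an arbitrary choice, and the two printed values should agree up to Monte Carlo error.

```python
# Monte Carlo check of (85)-(86): for any predictor F_t of the next bit based
# only on the noisy past, P(F != X) = (P(F != Y) - delta)/(1 - 2*delta).
import random

def simulate(pi=0.1, delta=0.1, n=200_000, seed=0):
    rng = random.Random(seed)
    x = 0          # current clean bit of the symmetric Markov source
    prev_y = 0     # previous noisy observation
    err_x = err_y = 0
    for _ in range(n):
        x = x ^ (rng.random() < pi)       # Markov source transition
        y = x ^ (rng.random() < delta)    # BSC output
        f = prev_y                        # an arbitrary predictor F_t
        err_x += (f != x)
        err_y += (f != y)
        prev_y = y
    print(f"P(F != X)                         ~ {err_x / n:.4f}")
    print(f"(P(F != Y) - delta)/(1 - 2 delta) ~ {(err_y / n - delta) / (1 - 2 * delta):.4f}")

simulate()
```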
4.2 Universal Scandiction in the Noisy Scenario

Section 4.1 dealt with the actual value of the best achievable performance in the noisy scandiction scenario. However, it is also interesting to investigate the universal setting, in which one seeks a predictor which does not depend on the joint probability measure of $\{(X, Y)\}$, yet performs asymptotically as well as one matched to this measure. The problem of universal scandiction in the noiseless scenario was dealt with in [12]. Herein, we show that it is possible to construct universal scandictors in the noisy setting as well (similar to universal scanning and filtering in Section 3.3). First, we show that it is possible to compete successfully with any finite set of scandictors, and present a universal scandictor for this setting. We then show that, with a proper choice of a set of scandictors, it is possible to (universally) achieve $\bar U(l, Q)$, i.e., the noisy scandictability, for any spatially stationary random field $(X, Y)$.

At the basis of the results of [12] stands the exponential weighting algorithm, originally derived by Vovk in [29]. In [29], Vovk considered a general set of experts and introduced the exponential weighting algorithm in order to compete with the best expert in the set. In this algorithm, each expert is assigned a weight according to its past performance. By decreasing the weights of poorly performing experts, hence preferring the ones proved to perform well thus far, one is able to compete with the best expert, having neither any a priori knowledge of the input sequence nor of which expert will perform the best. It is clear that the essence of this algorithm is the use of the cumulative losses incurred by each expert to construct a probability measure on the experts, which is later used to choose an expert for the next action. However, when the clean data $X$ is not known to the sequential algorithm, it is impossible to calculate the cumulative losses of the experts precisely. Nevertheless, as Weissman and Merhav show in [15], using an unbiased estimate $\hat X_t(Y^t)$ of $X_t$ results in sufficiently accurate estimates of the cumulative losses of the experts, which in turn can be used by the exponential weighting algorithm. Hence, the framework derived in [12] can be used to suggest universal scandictors for the noisy setting as well.

Consider a random field $(X^B, Y^B)$ where $X$ is binary and $Y$ is either binary (e.g., the output of a BSC whose input is $X$) or real-valued (e.g., $X$ through a Gaussian noise channel). For a loss function $l : \{0,1\} \times [0,1] \to [0,\infty]$ we define, similarly to [15],
\[
l_0(\cdot) \triangleq l(0, \cdot) \quad \text{and} \quad l_1(\cdot) \triangleq l(1, \cdot). \tag{87}
\]
Assume $(\Psi, F)$ is a scandictor for $B$. Then, for any $t \le |B|$, we have
\begin{align}
L_{(\Psi,F)}(x^B, y^B)_t &= \sum_{i=1}^{t} l\left(F_i(y_{\Psi_1}^{\Psi_{i-1}}), x_{\Psi_i}\right) \nonumber\\
&= \sum_{i=1}^{t} \left[ (1 - x_{\Psi_i})\, l_0(F_i(y_{\Psi_1}^{\Psi_{i-1}})) + x_{\Psi_i}\, l_1(F_i(y_{\Psi_1}^{\Psi_{i-1}})) \right]. \tag{88}
\end{align}
Clearly, $L_{(\Psi,F)}(x^B, y^B)_t$ depends on $x^B$ and is not known to the sequential algorithm. Let $h(y_{\Psi_i})$ be an unbiased estimate of $x_{\Psi_i}$. For example, when $Y$ is the output of a BSC with input $X$, we may choose
\[
h(y_{\Psi_i}) = \frac{y_{\Psi_i} - \delta}{1 - 2\delta}. \tag{89}
\]
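The choice (89) is exactly what makes the estimated losses below unbiased: a short check confirms that $E[h(Y) \mid X = x] = x$ under a BSC with crossover probability $\delta < 1/2$.

```python
# Sanity check for (89): under a BSC(delta), h(Y) = (Y - delta)/(1 - 2*delta)
# satisfies E[h(Y) | X = x] = x for x in {0, 1}.
def expected_h(x, delta):
    # Y = x with probability 1 - delta, Y = 1 - x with probability delta
    h = lambda y: (y - delta) / (1 - 2 * delta)
    return (1 - delta) * h(x) + delta * h(1 - x)

for delta in (0.05, 0.1, 0.25):
    assert all(abs(expected_h(x, delta) - x) < 1e-12 for x in (0, 1))
```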
Define
\[
\hat L_{(\Psi,F)}(y^B)_t = \sum_{i=1}^{t} \left[ (1 - h(y_{\Psi_i}))\, l_0(F_i(y_{\Psi_1}^{\Psi_{i-1}})) + h(y_{\Psi_i})\, l_1(F_i(y_{\Psi_1}^{\Psi_{i-1}})) \right], \tag{90}
\]
and
\begin{align}
\Delta_{(\Psi,F)}(x^B, y^B)_t &\triangleq L_{(\Psi,F)}(x^B, y^B)_t - \hat L_{(\Psi,F)}(y^B)_t \nonumber\\
&= \sum_{i=1}^{t} (h(y_{\Psi_i}) - x_{\Psi_i})\, l_0(F_i(y_{\Psi_1}^{\Psi_{i-1}})) + \sum_{i=1}^{t} (x_{\Psi_i} - h(y_{\Psi_i}))\, l_1(F_i(y_{\Psi_1}^{\Psi_{i-1}})). \tag{91}
\end{align}
Similarly to [15], we assume that the noise field $N^B$ has independent components and that for each $i \in B$, $Y_i \in \sigma(N_i)$, i.e., the noise component at site $i$ affects the observation at that site alone. In Appendix A.3, we show that for any image $x^B$ and any scandictor $(\Psi, F)$ for $B$, $\left(\Delta_{(\Psi,F)}(x^B, y^B)_t, \mathcal{F}^\Psi_t\right)$ is a zero-mean martingale. As a result, for any scandictor $(\Psi, F)$, image $x^B$ and $t$, we have
\[
E L_{(\Psi,F)}(x^B, Y^B)_t = E \hat L_{(\Psi,F)}(Y^B)_t, \tag{92}
\]
namely, $\hat L_{(\Psi,F)}(Y^B)_t$ is an unbiased estimator of $L_{(\Psi,F)}(x^B, Y^B)_t$.

The universal algorithm for scanning and prediction in the noisy scenario will thus use $\hat L_{(\Psi,F)}(Y^B)_t$ instead of $L_{(\Psi,F)}(x^B, Y^B)_t$, which is unknown. More specifically, similarly to the algorithm proposed in [12], the algorithm divides the data array to be scandicted into blocks of size $m(n) \times m(n)$, then scans the data in a (fixed) block-wise order, where each block is scandicted using a scandictor chosen at random from the scandictor set according to the distribution $\hat P_i\left(j \mid \{\hat L_{j,i}\}_{j=1}^{\lambda}\right)$,
\[
\hat P_i\left(j \mid \{\hat L_{j,i}\}_{j=1}^{\lambda}\right) = \frac{e^{-\eta \hat L_{j,i}}}{\sum_{j=1}^{\lambda} e^{-\eta \hat L_{j,i}}}, \tag{93}
\]
where $\hat L_{j,i} = \sum_{m=0}^{i-1} \hat L_{(\Psi,F)_j}(y^m)$ is the estimated cumulative loss of the scandictor $(\Psi, F)_j$ after scandicting $i$ blocks of data, when $(\Psi, F)_j$ is restarted after each block, and $\lambda$ is the cardinality of the set of scandictors, $\mathcal{F}_m$.⁴ Note the subscript $m$ in $\mathcal{F}_m$: in order to scandict a data array of size $n \times n$, the universal algorithm discussed herein uses the scandictors with which it competes, but only on blocks of size $m \times m$. A sketch of the weighting step appears at the end of this subsection. The following proposition gives an upper bound on the redundancy of the algorithm when competing with a finite set of scandictors, each operating block-wise on the data array.

Proposition 12. Let $E L_{\mathrm{alg}}(x^{V_n}, Y^{V_n})$ be the expected (with respect to the noisy random field as well as the randomization in the algorithm) cumulative loss of the proposed algorithm on $Y^{V_n}$, when the underlying clean array is $x^{V_n}$ and the noisy field has independent components with $Y_i \in \sigma(N_i)$ for each $i \in V_n \subset \mathbb{Z}^2$. Let $E L_{\min}(x^{V_n}, Y^{V_n})$ denote the expected cumulative loss of the best scandictor in $\mathcal{F}_m$, operating block-wise on $Y^{V_n}$. Assume $|\mathcal{F}_m| = \lambda$. Then
\[
E L_{\mathrm{alg}}(x^{V_n}, Y^{V_n}) - E L_{\min}(x^{V_n}, Y^{V_n}) \le \frac{m(n)\,(n + m(n))\, \sqrt{\ln \lambda}\; l_{\max}}{\sqrt{2}}. \tag{94}
\]

⁴ To be consistent with the notation of [12], the same notation is used for both a filtration and a scandictor set. The difference should be clear from the context.

Proof. By (92) and [12, Proposition 3], for any $x^{V_n}$ we have
\begin{align}
E L_{\mathrm{alg}}(x^{V_n}, Y^{V_n}) - \min_{(\Psi,F) \in \mathcal{F}_m} E L_{(\Psi,F)}(x^{V_n}, Y^{V_n}) &= E \hat L_{\mathrm{alg}}(Y^{V_n}) - \min_{(\Psi,F) \in \mathcal{F}_m} E \hat L_{(\Psi,F)}(Y^{V_n}) \nonumber\\
&\le E \hat L_{\mathrm{alg}}(Y^{V_n}) - E \min_{(\Psi,F) \in \mathcal{F}_m} \hat L_{(\Psi,F)}(Y^{V_n}) \nonumber\\
&= E\left[ \hat L_{\mathrm{alg}}(Y^{V_n}) - \min_{(\Psi,F) \in \mathcal{F}_m} \hat L_{(\Psi,F)}(Y^{V_n}) \right] \nonumber\\
&\le \frac{m(n)\,(n + m(n))\, \sqrt{\ln \lambda}\; l_{\max}}{\sqrt{2}}. \tag{95}
\end{align}
Proposition 12 is the basis for the main result of this subsection: a universal scandictor which competes successfully with any finite set of scandictors in the noisy scenario.

Theorem 13. Let $(X, Y)$ be a stationary random field with a probability measure $Q$. Assume that for each $i \in \mathbb{Z}^2$, $Y_i$ is the output of a memoryless channel whose input is $X_i$. Let $\mathcal{F} = \{\mathcal{F}_n\}$ be an arbitrary sequence of scandictor sets, where $\mathcal{F}_n$ is a set of scandictors for $V_n$ and $|\mathcal{F}_n| = \lambda < \infty$ for all $n$. Then, there exists a sequence of scandictors $\{(\hat\Psi, \hat F)_n\}$, independent of $Q$, for which
\[
\liminf_{n \to \infty} E_{Q_{V_n}} E \frac{1}{|V_n|} L_{(\hat\Psi,\hat F)_n}(X^{V_n}, Y^{V_n}) \le \liminf_{n \to \infty} \min_{(\Psi,F) \in \mathcal{F}_n} E_{Q_{V_n}} \frac{1}{|V_n|} L_{(\Psi,F)}(X^{V_n}, Y^{V_n}) \tag{96}
\]
for any $Q \in M_S(\Omega)$, where the inner expectation on the l.h.s. of (96) is due to the possible randomization in $(\hat\Psi, \hat F)_n$.

The proof of Theorem 13 follows the proof of [12, Theorem 2] verbatim.

It is now possible to show the existence of a universal scandictor for any stationary random field in the noisy scandiction setting. Herein, we include only the setting where $X$ is binary and $Y$ is the output of a BSC. In this case, the scandictor is twofold-universal; namely, it does not depend on the channel crossover probability either. Extending the results to real-valued noise is possible using the methods introduced in [16] (although the universal predictor does depend on the channel characteristics) and will be discussed later.

Theorem 14. Let $X$ be a stationary random field over a finite alphabet $A$, with a probability measure $Q$. Let $Y$ be the output of a BSC whose input is $X$ and whose crossover probability is $\delta$. Let the prediction space $D$ be either finite or bounded (with $l(x, F)$ then being Lipschitz in its second argument). Then, there exists a sequence of scandictors $\{(\Psi, F)_n\}$, independent of $Q$ and of $\delta$, for which
\[
\lim_{n \to \infty} E_{Q_{V_n}} E \frac{1}{|V_n|} L_{(\Psi,F)_n}(X^{V_n}, Y^{V_n}) = \bar U(l, Q) \tag{97}
\]
for any $Q \in M_S(\Omega)$, where the inner expectation on the l.h.s. of (97) is due to the possible randomization in $(\Psi, F)_n$.

Similarly to [16, Section A] and the proof of [12, Theorem 6], in the case of binary input and binary-valued noise it is possible to take the set of scandictors with which we compete as the set of all possible scandictors for an $m(n) \times m(n)$ block. The proof thus follows directly from the proof of [12, Theorem 6]. As for continuous-valued observations, it is quite clear that the set of all possible scandictors for an $m(n) \times m(n)$ block is far too rich to compete with (note that this is because the number of predictors is too large). A complete discussion is available in [16, Section B]. However, Weissman and Merhav do offer a method for successfully achieving the Bayes envelope in this setting, by introducing a much smaller set of predictors which, on the one hand, includes the best $k$th order Markov predictor, yet on the other hand is not too rich, in the sense that the redundancy of the exponential weighting algorithm tends to zero when competing with an $\epsilon$-grid of it. Since presenting a universal scandictor for this scenario would mainly repeat the many details discussed in [16], we do not include it here.
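For concreteness, the weighting step (93) itself is straightforward to implement. The sketch below shows only this step; the block partition, the restarting of the scandictors, and the computation of the estimated losses (90) are abstracted away, and `choose_expert` is an illustrative name rather than an implementation from [12].

```python
# A minimal sketch of the exponential weighting rule (93), driven by the
# *estimated* cumulative block losses Lhat_{j,i} of (90), which are computed
# from the noisy data alone via the unbiased estimate h of (89).
import math
import random

def choose_expert(est_losses, eta, rng=random):
    """Sample an expert index j with probability proportional to
    exp(-eta * Lhat_j), as in (93)."""
    m = min(est_losses)  # subtract the minimum for numerical stability
    w = [math.exp(-eta * (L - m)) for L in est_losses]
    r = rng.random() * sum(w)
    acc = 0.0
    for j, wj in enumerate(w):
        acc += wj
        if r < acc:
            return j
    return len(w) - 1

# est_losses[j] plays the role of Lhat_{j,i}: the estimated loss of
# scandictor j summed over the i blocks scandicted so far.
j = choose_expert([3.2, 2.9, 4.1], eta=0.5)
```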
4.3 Bounds on the Excess Loss for Non-Optimal Scandictors

Analogously to the scanning and filtering setting discussed in Section 3, and the clean prediction setting discussed in [12], it is interesting to investigate the excess loss incurred when non-optimal scandictors are used in the noisy scandiction setting. Unlike the scanning and filtering setting, where the excess loss bounds were not straightforward extensions of the results in [12], in the noisy scandiction scenario this problem can be quite easily tackled using the results of [12] and modified loss functions.

We briefly state the results of [12] in this context. The scenario considered therein is that of predicting the next outcome of a binary source, with $D = [0, 1]$ as the prediction space. $\phi_\rho$ denotes the Bayes envelope associated with the loss function $\rho$, i.e.,
\[
\phi_\rho(p) = \min_{q \in [0, 1]} \left[ (1 - p)\rho(0, q) + p\rho(1, q) \right]. \tag{98}
\]
Similarly to (54), define
\[
\epsilon_\rho = \min_{\alpha, \beta} \max_{0 \le p \le 1} \left| \alpha h_b(p) + \beta - \phi_\rho(p) \right|. \tag{99}
\]
Note that although the definitions of $\phi_\rho(p)$ and $\epsilon_\rho$ refer to the binary scenario, the result below holds for larger alphabets, with $\epsilon_\rho$ defined as in (99), with the maximum ranging over the simplex of all distributions on the alphabet, and with $h(p)$ (replacing $h_b(p)$) and $\phi_\rho(p)$ denoting the entropy and the Bayes envelope of the distribution $p$, respectively. In [12], it is shown that if $X^B$ is an arbitrarily distributed binary random field, then, for any scan $\Psi$,
\[
\left| \alpha_\rho \frac{1}{|B|} H(X^B) + \beta_\rho - E_{Q_B} \frac{1}{|B|} L_{(\Psi,F_{\mathrm{opt}})}(X^B) \right| \le \epsilon_\rho, \tag{100}
\]
where $\alpha_\rho$ and $\beta_\rho$ are the achievers of the minimum in (99). As mentioned earlier, if $\rho(Y, F)$ is some loss function for the "clean" prediction problem of $\{Y\}$, the noisy process, then
\[
E\{\rho(Y, F) \mid \sigma(X)\} = \tilde\rho(X, F) \tag{101}
\]
for some $\tilde\rho$. Assuming a suitable $\rho$ is found (i.e., $\tilde\rho = l$), we have, for any scan $\Psi$,
\[
\left| \alpha_\rho \frac{1}{|B|} H(Y^B) + \beta_\rho - \frac{1}{|B|} E_{Q_B} L^l_{\Psi,F_{\mathrm{opt}}}(X^B, Y^B) \right| = \left| \alpha_\rho \frac{1}{|B|} H(Y^B) + \beta_\rho - \frac{1}{|B|} E_{Q_B} L^\rho_{\Psi,F_{\mathrm{opt}}}(Y^B) \right| \le \epsilon_\rho, \tag{102}
\]
where $\frac{1}{|B|} E_{Q_B} L^l_{\Psi,F_{\mathrm{opt}}}(X^B, Y^B)$ is the normalized expected cumulative loss in optimally predicting $X_{\Psi_t}$ based on $Y_{\Psi_1}^{\Psi_{t-1}}$ under the loss function $l$, $\frac{1}{|B|} E_{Q_B} L^\rho_{\Psi,F_{\mathrm{opt}}}(Y^B)$ is the normalized expected cumulative loss in optimally predicting $Y_{\Psi_t}$ based on $Y_{\Psi_1}^{\Psi_{t-1}}$ under the loss function $\rho$, and $\alpha_\rho$ and $\beta_\rho$ are the minimizers of $\epsilon_\rho$ as defined in (99). Hence, the following corollary applies.

Corollary 15. Let $X^B$ be an arbitrarily distributed binary field. Assume a white noise, and denote the noisy version of $X^B$ by $Y^B$. Let $D = [0, 1]$ be the prediction space and $l : \{0,1\} \times D \to \mathbb{R}$ be any loss function. Then, for any scan $\Psi$,
\[
\left| \frac{1}{|B|} E_{Q_B} L^l_{\Psi,F_{\mathrm{opt}}}(X^B, Y^B) - \bar U(l, Q_B) \right| \le 2\epsilon_\rho, \tag{103}
\]
where $\rho$ is a loss function such that $E\{\rho(Y, F) \mid \sigma(X)\} = l(X, F)$ for any $F$.

Example 3 (BSC and Hamming Loss). In the case of binary input, a BSC with crossover probability $\delta$, and Hamming loss $l_H(\cdot, \cdot)$, it is not hard to show that
\[
\rho(y, F) = \frac{l_H(y, F) - \delta}{1 - 2\delta}. \tag{104}
\]
Hence,
\[
\phi_\rho(p) = \frac{\phi_{l_H}(p) - \delta}{1 - 2\delta} \tag{105}
\]
and
\[
\epsilon_\rho = \frac{1}{1 - 2\delta}\, \epsilon_{l_H}, \tag{106}
\]
where $\epsilon_{l_H} = 0.08$, as mentioned in [12].
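The constant $\epsilon_{l_H}$ can be evaluated numerically from (99): for Hamming loss, $\phi_{l_H}(p) = \min\{p, 1-p\}$, and for each fixed $\alpha$ the best $\beta$ simply centers the error $\alpha h_b(p) - \phi_{l_H}(p)$, leaving a one-dimensional search over $\alpha$. The following sketch (with natural-logarithm entropy, as used throughout the paper) should reproduce a value near $0.08$.

```python
# Numeric evaluation of eps_{l_H} = min_{alpha,beta} max_p |alpha*h_b(p) +
# beta - phi(p)| with phi(p) = min(p, 1-p). For fixed alpha, the optimal beta
# centers d(p) = alpha*h_b(p) - phi(p), so max|...| = (max d - min d)/2.
import math

def h_b(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

ps = [i / 10000 for i in range(10001)]
hs = [h_b(p) for p in ps]
phi = [min(p, 1 - p) for p in ps]

def max_err(alpha):
    d = [alpha * h - f for h, f in zip(hs, phi)]
    return (max(d) - min(d)) / 2

best = min(max_err(a / 1000) for a in range(1001))
print(f"eps_lH ~= {best:.4f}")   # comes out near 0.08
```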
The above bound on the excess loss can also be computed directly, without using Corollary 15, as for any scan $\Psi$ the normalized cumulative prediction errors are given by
\[
\frac{1}{|B|} E_{Q_B} L^{l_H}_{\Psi,F}(X^B, Y^B) = \frac{1}{|B|} \sum_{t=1}^{|B|} P\left(F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}}) \ne X_{\Psi_t}\right) \tag{107}
\]
for the noisy scenario, and
\[
\frac{1}{|B|} E_{Q_B} L^{l_H}_{\Psi,F}(Y^B) = \frac{1}{|B|} \sum_{t=1}^{|B|} P\left(F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}}) \ne Y_{\Psi_t}\right) \tag{108}
\]
for the (clean) prediction of $Y^B$. Hence, using (86), for any scan $\Psi$ we have
\begin{align}
\left| \frac{1}{|B|} E_{Q_B} L^{l_H}_{\Psi,F_{\mathrm{opt}}}(X^B, Y^B) - \bar U(l_H, Q_B) \right| &= \left| \frac{\frac{1}{|B|} E_{Q_B} L^{l_H}_{\Psi,F_{\mathrm{opt}}}(Y^B) - \delta}{1 - 2\delta} - \frac{U(l_H, Q_{Y,B}) - \delta}{1 - 2\delta} \right| \nonumber\\
&= \frac{1}{1 - 2\delta} \left| \frac{1}{|B|} E_{Q_B} L^{l_H}_{\Psi,F_{\mathrm{opt}}}(Y^B) - U(l_H, Q_{Y,B}) \right| \le \frac{2\epsilon_{l_H}}{1 - 2\delta}. \tag{109}
\end{align}

Example 4 (Additive Noise and Squared Error). Let $Y^B$ be the output of an additive channel, with $\sigma^2_v$ denoting the noise variance, and let $l_s$ be the squared error loss function. In this case,
\[
E\Big\{ \underbrace{(Y_{\Psi_t} - F_t(y^{\Psi_{t-1}}))^2 - \sigma^2_v}_{\rho(Y_{\Psi_t},\, F_t(y^{\Psi_{t-1}}))} \,\Big|\, \sigma(X_{\Psi_t}) \Big\} = \underbrace{(X_{\Psi_t} - F_t(y^{\Psi_{t-1}}))^2}_{l_s(X_{\Psi_t},\, F_t(y^{\Psi_{t-1}}))}. \tag{110}
\]
Thus, Corollary 15 applies with $\rho(Y, \hat Y) = (Y - \hat Y)^2 - \sigma^2_v$, and clearly $\epsilon_\rho = \epsilon_{l_s}$.

Note that although Corollary 15 is stated for the binary alphabet, it is not hard to generalize its result to larger alphabets, as mentioned in [12, Section 4].

4.3.1 Excess Loss Bounds via the Continuous Time Mutual Information

The bound on the excess noisy scandiction loss given in Corollary 15 was derived using the results of [12] and modified loss functions. However, new bounds can also be derived using the same method which was used in the proof of Theorem 3, namely, the scan invariance of the mutual information and the relation to the continuous time problem. We briefly discuss how such a bound can be derived for noisy scandiction of Gaussian fields corrupted by Gaussian noise.

Using the notation of Section 3.2.1, we have
\[
\mathrm{Var}(X) - \int_0^1 \mathrm{Var}(X_t \mid Y^t)\, dt = \sigma^2_X - \sigma^2_N \ln\left(1 + \frac{\sigma^2_X}{\sigma^2_N}\right) = \sigma^2_N\, g\!\left(\frac{\sigma^2_X}{\sigma^2_N}\right), \tag{111}
\]
where
\[
g(x) = x - \ln(1 + x). \tag{112}
\]
Since $\sigma^2_X \ge \mathrm{Var}\left(X_{\Psi_i} \mid Y_{\Psi_1}^{\Psi_{i-1}}\right)$ and $g(x)$ is monotonically increasing for $x > 0$, derivations similar to (27) lead to
\[
\frac{1}{n^2} E L_{(\Psi,F_{\mathrm{opt}})}(X^{V_n}, Y^{V_n}) \le \sigma^2_N\, g\!\left(\frac{\sigma^2_X}{\sigma^2_N}\right) + \frac{1}{n^2}\, 2\sigma^2_N\, I(X^{V_n}; Y^{V_n}). \tag{113}
\]
On the other hand, since $g(x) \ge 0$ for $x \ge 0$, we have
\[
\frac{1}{n^2} E L_{(\Psi,F_{\mathrm{opt}})}(X^{V_n}, Y^{V_n}) \ge \frac{1}{n^2}\, 2\sigma^2_N\, I(X^{V_n}; Y^{V_n}), \tag{114}
\]
which can now be viewed as the scanning and prediction analogue of [18, eq. (156b)]. We thus have the following corollary.

Corollary 16. Let $X^{V_n}$ be a Gaussian random field with a constant marginal distribution satisfying $\mathrm{Var}(X_i) = \sigma^2_X < \infty$ for all $i \in V_n$. Let $Y_i = X_i + N_i$, where $N^{V_n}$ is a white Gaussian noise of variance $\sigma^2_N$, independent of $X^{V_n}$. Then, for any two scans $\Psi_1$ and $\Psi_2$ and their optimal predictors, we have
\[
\frac{1}{n^2} \left| E L_{(\Psi_1,F_{\mathrm{opt}})}(X^{V_n}, Y^{V_n}) - E L_{(\Psi_2,F_{\mathrm{opt}})}(X^{V_n}, Y^{V_n}) \right| \le \sigma^2_N\, g\!\left(\frac{\sigma^2_X}{\sigma^2_N}\right). \tag{115}
\]

Similarly to Theorem 3, the bound in Corollary 16 has the form $\sigma^2_X \frac{g(\mathrm{SNR})}{\mathrm{SNR}}$; namely, it scales with the variance of the input. As expected, in the limit of low SNR, $\frac{g(\mathrm{SNR})}{\mathrm{SNR}} \to 0$, since regardless of the scan, one is clueless about the underlying clean symbol.
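A few numeric values of $g(\mathrm{SNR})/\mathrm{SNR}$ make the two limits concrete (a tiny illustration of (112) and (115); the SNR values are arbitrary):

```python
# The scan-excess bound of Corollary 16 is sigma_N^2 * g(SNR), equivalently
# sigma_X^2 * g(SNR)/SNR, with g(x) = x - ln(1 + x).
import math

g = lambda x: x - math.log(1 + x)
for snr in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"SNR = {snr:6.2f} : g(SNR)/SNR = {g(snr) / snr:.4f}")
# g(SNR)/SNR -> 0 as SNR -> 0, and -> 1 as SNR -> infinity.
```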
In fact, it is not surprising that this behavior is common to both the filtering and the prediction scenarios. In the former, the bound value is given by (23), while in the latter it is given by (111). In both cases, the bound value is simply the difference between a continuous time filtering problem and a discrete time filtering (or prediction, in (111)) problem. It is not hard to see that this difference tends to $0$ as $\mathrm{SNR} \to 0^+$. In the limit of high SNR, $\frac{g(\mathrm{SNR})}{\mathrm{SNR}} \to 1$. Indeed, this limit corresponds to the noiseless scandiction scenario, where scanning is consequential [11].

5 Conclusion

We investigated problems of sequential filtering and prediction of noisy multidimensional data arrays. A bound on the best achievable scanning and filtering performance was derived, and the excess loss incurred when non-optimal scanners are used was quantified. In the prediction setting, a relation between the best achievable performance and the clean scandictability was given. In both the filtering and prediction scenarios, special emphasis was given to the cases of AWGN and squared error loss, and BSC and Hamming loss.

Due to their sequential nature, the problems discussed in this paper are strongly related to the filtering and prediction problems where reordering of the data is not allowed (or where there is only one natural order in which to scan the data), such as the robust filtering and universal prediction problems discussed in the current literature. However, the numerous scanning possibilities in the multidimensional setting add a multitude of new challenges. In fact, many interesting problems remain open. It is clear that identifying the optimal scanning methods for the widely used input and channel models discussed herein is required, as the implementation of universal algorithms might be too complex in realistic situations. Moreover, tighter upper bounds on the excess loss can be derived in order to better understand the trade-offs between non-trivial scanning methods and the overall performance. Finally, by [11], the trivial scan is optimal for scandiction of noise-free Gaussian random fields. By Corollary 10 herein, this is also the case in scandiction of Gaussian fields corrupted by Gaussian noise. Whether the same holds for scanning and filtering of Gaussian random fields corrupted by Gaussian noise remains unanswered.

A Appendixes

A.1 Proof of Theorem 4

The proof resembles the proof of Theorem 3. However, the derivations leading to the analogue of (27) are slightly different. For any input field $X^{V_n}$, we have
\begin{align}
\frac{1}{n^2} E_{Q_{V_n}} L_{(\Psi,\tilde F_{\mathrm{opt}})}(X^{V_n}, Y^{V_n}) &= \frac{1}{n^2} \sum_{i=1}^{n^2} \mathrm{Var}\left(X_{\Psi_i} \mid Y_{\Psi_1}^{\Psi_i}\right) \nonumber\\
&= \frac{1}{n^2} \sum_{i=1}^{n^2} \Bigg\{ \int_0^1 \mathrm{Var}\left(X_{\Psi_i} \mid Y_{\Psi_1}^{\Psi_{i-1}}, \{Y^{(c)}_t\}_{t \in [i-1,\, i-1+t]}\right) dt \nonumber\\
&\qquad - \left[ \int_0^1 \mathrm{Var}\left(X_{\Psi_i} \mid Y_{\Psi_1}^{\Psi_{i-1}}, \{Y^{(c)}_t\}_{t \in [i-1,\, i-1+t]}\right) dt - \mathrm{Var}\left(X_{\Psi_i} \mid Y_{\Psi_1}^{\Psi_i}\right) \right] \Bigg\} \nonumber\\
&\stackrel{(a)}{\ge} \frac{1}{n^2} \sum_{i=1}^{n^2} \int_0^1 \mathrm{Var}\left(X_{\Psi_i} \mid Y_{\Psi_1}^{\Psi_{i-1}}, \{Y^{(c)}_t\}_{t \in [i-1,\, i-1+t]}\right) dt - f^*\left(X^{V_n}, \sigma^2_N\right) \nonumber\\
&= \frac{1}{n^2}\, 2\sigma^2_N\, I(X^{V_n}; Y^{V_n}) - f^*\left(X^{V_n}, \sigma^2_N\right), \tag{116}
\end{align}
where (a) results from the definition of $f^*(\cdot)$. The rest of the proof follows similarly to the proof of Theorem 3, since for any $X^{V_n}$ and $\sigma^2_N$ it is clear that $f^*(X^{V_n}, \sigma^2_N)$ is non-negative.

A.2 Proof of Lemma 9

Without loss of generality, we assume $E N_t = 0$.
From Proposition 1, we have
\[
\bar U(l_s, Q) = \lim_{n \to \infty} \inf_{(\Psi,F) \in \mathcal{S}(B_n)} E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \left( X_{\Psi_t} - F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}}) \right)^2. \tag{117}
\]
However, since $Y_{\Psi_t} = X_{\Psi_t} + N_{\Psi_t}$, and $N_{\Psi_t}$ is independent of $N_{\Psi_{t'}}$, $t' \ne t$, and of all $\{X_t\}$, we have
\begin{align}
E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \left( X_{\Psi_t} - F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}}) \right)^2 &= E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \left( Y_{\Psi_t} - F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}}) - N_{\Psi_t} \right)^2 \nonumber\\
&= E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \Big[ \left( Y_{\Psi_t} - F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}}) \right)^2 \nonumber\\
&\qquad - 2 N_{\Psi_t}\left( X_{\Psi_t} + N_{\Psi_t} - F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}}) \right) + N^2_{\Psi_t} \Big] \nonumber\\
&= E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \left( Y_{\Psi_t} - F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}}) \right)^2 - \sigma^2_N. \tag{118}
\end{align}
That is,
\[
\bar U(l_s, Q) = \lim_{n \to \infty} \inf_{(\Psi,F) \in \mathcal{S}(B_n)} E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \left( Y_{\Psi_t} - F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}}) \right)^2 - \sigma^2_N, \tag{119}
\]
which completes the proof.

A.3 The Martingale Property of $\left(\Delta_{(\Psi,F)}(x^B, y^B)_t, \mathcal{F}^\Psi_t\right)$

The proof follows that of [15, Lemma 1]. However, notice that, due to the data-dependent scanning, $\mathcal{F}^\Psi_t$ is not generated by a fixed set of random variables, that is, over a fixed set of sites, but by a set of $t$ random variables which may be different for each instantiation of the random field (as, for each $t$, $\Psi_t$ depends on $Y_{\Psi_1}^{\Psi_{t-1}}$). Yet, the expectation is always with respect to the random variables seen so far. By (91),
\[
\Delta_{(\Psi,F)}(x^B, y^B)_t = \sum_{i=1}^{t} (h(y_{\Psi_i}) - x_{\Psi_i})\, l_0(F_i(y_{\Psi_1}^{\Psi_{i-1}})) + \sum_{i=1}^{t} (x_{\Psi_i} - h(y_{\Psi_i}))\, l_1(F_i(y_{\Psi_1}^{\Psi_{i-1}})). \tag{120}
\]
Defining
\[
m_t \triangleq \sum_{i=1}^{t} (h(Y_{\Psi_i}) - x_{\Psi_i})\, l_0(F_i(Y_{\Psi_1}^{\Psi_{i-1}})), \tag{121}
\]
we have
\begin{align}
E\left\{ m_{t+1} \mid \mathcal{F}^\Psi_t \right\} &= E\left\{ \sum_{i=1}^{t+1} (h(Y_{\Psi_i}) - x_{\Psi_i})\, l_0(F_i(Y_{\Psi_1}^{\Psi_{i-1}})) \,\Big|\, \mathcal{F}^\Psi_t \right\} \nonumber\\
&= E\left\{ \left(h(Y_{\Psi_{t+1}}) - x_{\Psi_{t+1}}\right) l_0(F_{t+1}(Y_{\Psi_1}^{\Psi_t})) \,\Big|\, \mathcal{F}^\Psi_t \right\} + E\left\{ \sum_{i=1}^{t} (h(Y_{\Psi_i}) - x_{\Psi_i})\, l_0(F_i(Y_{\Psi_1}^{\Psi_{i-1}})) \,\Big|\, \mathcal{F}^\Psi_t \right\} \nonumber\\
&= E\left\{ h(Y_{\Psi_{t+1}}) - x_{\Psi_{t+1}} \,\Big|\, \mathcal{F}^\Psi_t \right\} l_0(F_{t+1}(Y_{\Psi_1}^{\Psi_t})) + \sum_{i=1}^{t} (h(Y_{\Psi_i}) - x_{\Psi_i})\, l_0(F_i(Y_{\Psi_1}^{\Psi_{i-1}})) \nonumber\\
&= E\left\{ h(Y_{\Psi_{t+1}}) - x_{\Psi_{t+1}} \right\} l_0(F_{t+1}(Y_{\Psi_1}^{\Psi_t})) + m_t \nonumber\\
&= m_t, \tag{122}
\end{align}
where the third equality is since $Y_{\Psi_{t_0}}$ is $\mathcal{F}^\Psi_t$-measurable for any $t_0 \le t$, the fourth is since $h(Y_{\Psi_{t+1}}) - x_{\Psi_{t+1}}$ is independent of $\mathcal{F}^\Psi_t$, and the fifth is since $h(Y_{\Psi_{t+1}})$ is an unbiased estimate of $x_{\Psi_{t+1}}$. Hence, $(m_t, \mathcal{F}^\Psi_t)$ is a zero-mean martingale (note that $E m_1 = 0$). Analogously, $\sum_{i=1}^{t} (x_{\Psi_i} - h(y_{\Psi_i}))\, l_1(F_i(y_{\Psi_1}^{\Psi_{i-1}}))$ is also a zero-mean martingale with respect to $\mathcal{F}^\Psi_t$, which completes the proof.

References

[1] B. Natarajan, K. Konstantinides, and C. Herley, "Occam filters for stochastic sources with application to digital images," IEEE Trans. Signal Processing, vol. 46, no. 5, pp. 1434-1438, May 1998.

[2] C.-H. Lamarque and F. Robert, "Image analysis using space-filling curves and 1D wavelet bases," Pattern Recognition, vol. 29, no. 8, pp. 1309-1322, 1996.

[3] A. Krzyzak, E. Rafajlowicz, and E. Skubalska-Rafajlowicz, "Clipped median and space filling curves in image filtering," Nonlinear Analysis, vol. 47, pp. 303-314, 2001.

[4] L. Velho and J. M. Gomes, "Digital halftoning with space filling curves," Computer Graphics, vol. 25, no. 4, pp. 81-90, July 1991.
[5] E. Skubalska-Rafajlowicz, "Pattern recognition algorithms based on space-filling curves and orthogonal expansions," IEEE Trans. Inform. Theory, vol. 47, no. 5, pp. 1915-1927, July 2001.

[6] T. Asano, D. Ranjan, T. Roos, E. Welzl, and P. Widmayer, "Space-filling curves and their use in the design of geometric data structures," Theoretical Computer Science, vol. 181, pp. 3-15, 1997.

[7] B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz, "Analysis of the clustering properties of the Hilbert space-filling curve," IEEE Trans. Knowledge and Data Engineering, vol. 13, no. 1, pp. 124-141, January/February 2001.

[8] A. Bogomjakov and C. Gotsman, "Universal rendering sequences for transparent vertex caching of progressive meshes," Computer Graphics Forum, vol. 21, no. 2, pp. 137-148, 2002.

[9] R. Niedermeier, K. Reinhardt, and P. Sanders, "Towards optimal locality in mesh-indexings," Discrete Applied Mathematics, vol. 117, pp. 211-237, 2002.

[10] A. Lempel and J. Ziv, "Compression of two-dimensional data," IEEE Trans. Inform. Theory, vol. IT-32, no. 1, pp. 2-8, January 1986.

[11] N. Merhav and T. Weissman, "Scanning and prediction in multidimensional data arrays," IEEE Trans. Inform. Theory, vol. 49, no. 1, pp. 65-82, January 2003.

[12] A. Cohen, N. Merhav, and T. Weissman, "Scanning and sequential decision making for multi-dimensional data - part I: the noiseless case," to appear in IEEE Trans. Inform. Theory.

[13] A. Cohen, Topics in Scanning of Multidimensional Data, Ph.D. thesis, Technion - Israel Institute of Technology, 2007.

[14] T. Weissman, N. Merhav, and A. Somekh-Baruch, "Twofold universal prediction schemes for achieving the finite-state predictability of a noisy individual binary sequence," IEEE Trans. Inform. Theory, vol. 47, no. 5, pp. 1849-1866, July 2001.

[15] T. Weissman and N. Merhav, "Universal prediction of individual binary sequences in the presence of noise," IEEE Trans. Inform. Theory, vol. 47, pp. 2151-2173, September 2001.

[16] T. Weissman and N. Merhav, "Universal prediction of random binary sequences in a noisy environment," Ann. Appl. Prob., vol. 14, no. 1, pp. 54-89, February 2004.

[17] T. E. Duncan, "On calculation of mutual information," SIAM Journal of Applied Mathematics, vol. 19, pp. 215-220, July 1970.

[18] D. Guo, S. Shamai, and S. Verdú, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Trans. Inform. Theory, vol. 51, no. 4, pp. 1261-1282, April 2005.

[19] T. Weissman, E. Ordentlich, M. Weinberger, A. Somekh-Baruch, and N. Merhav, "Universal filtering via prediction," IEEE Trans. Inform. Theory, vol. 53, no. 4, pp. 1253-1264, April 2007.

[20] T. Weissman, E. Ordentlich, G. Seroussi, S. Verdú, and M. Weinberger, "Universal discrete denoising: known channel," IEEE Trans. Inform. Theory, vol. 51, no. 1, pp. 5-28, January 2005.

[21] E. Ordentlich and T. Weissman, "On the optimality of symbol-by-symbol filtering and denoising," IEEE Trans. Inform. Theory, vol. 52, no. 1, pp. 19-40, January 2006.

[22] D. Guo, Gaussian Channels: Information, Estimation and Multiuser Detection, Ph.D. thesis, Princeton University, 2004.

[23] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.

[24] S. Verdú and T. Weissman, "The information lost in erasures," submitted to IEEE Trans. Inform. Theory, 2007.
[25] Y. Ephraim and N. Merhav, "Hidden Markov processes," IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1518-1569, June 2002.

[26] H. Helson and D. Lowdenslager, "Prediction theory and Fourier series in several variables," Acta Math., vol. 99, pp. 165-202, 1958.

[27] A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 2nd edition, 1984.

[28] N. M. Blachman, "The convolution inequality for entropy powers," IEEE Trans. Inform. Theory, vol. IT-11, pp. 267-271, April 1965.

[29] V. G. Vovk, "Aggregating strategies," Proc. 3rd Annu. Workshop on Computational Learning Theory, San Mateo, CA, pp. 372-383, 1990.
