Scanning and Sequential Decision Making for Multi-Dimensional Data - Part II: the Noisy Case


Authors: Asaf Cohen, Tsachy Weissman, Neri Merhav

October 27, 2018

Abstract

We consider the problem of sequential decision making on random fields corrupted by noise. In this scenario, the decision maker observes a noisy version of the data, yet is judged with respect to the clean data. In particular, we first consider the problem of sequentially scanning and filtering noisy random fields. In this case, the sequential filter is given the freedom to choose the path over which it traverses the random field (e.g., a noisy image or video sequence), thus it is natural to ask what the best achievable performance is, and how sensitive this performance is to the choice of the scan. We formally define the problem of scanning and filtering, derive a bound on the best achievable performance, and quantify the excess loss occurring when non-optimal scanners are used, compared to optimal scanning and filtering.

We then discuss the problem of sequential scanning and prediction of noisy random fields. This setting is a natural model for applications such as restoration and coding of noisy images. We formally define the problem of scanning and prediction of a noisy multidimensional array and relate the optimal performance to the clean scandictability defined by Merhav and Weissman. Moreover, bounds on the excess loss due to sub-optimal scans are derived, and a universal prediction algorithm is suggested.

∗ The material in this paper was presented in part at the IEEE International Symposium on Information Theory, Seattle, Washington, United States, July 2006, and accepted to the IEEE International Symposium on Information Theory, Nice, France, June 2007.
† Asaf Cohen and Neri Merhav are with the Department of Electrical Engineering, Technion - I.I.T., Haifa 32000, Israel. E-mails: {soofsoof@tx,merhav@ee}.technion.ac.il.
‡ Tsachy Weissman is with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA. E-mail: tsachy@stanford.edu.

This paper is the second part of a two-part paper. The first part dealt with sequential decision making on noiseless data arrays, namely, when the decision maker is judged with respect to the same data array it observes.

1 Introduction

Consider the problem of sequentially scanning and filtering (or predicting) a multidimensional noisy data array, while minimizing a given loss function. Particularly, at each time instant t, 1 ≤ t ≤ |B|, where |B| is the number of sites ("pixels") in the data array, the sequential decision maker chooses a site to be visited, denoted by Ψ_t. In the filtering scenario, it first observes the value at that site, and then gives an estimate of the underlying clean value. In the prediction scenario, it is required to give a prediction for that (clean) value before the actual observation is made. In both cases, both the location Ψ_t and the estimate or prediction may depend on the previously observed values - the values at sites Ψ_1 to Ψ_{t-1}. The goal is to minimize the cumulative loss after scanning the entire data array. Applications of this problem can be found in image and video processing, such as filtering or predictive coding.
In these applications, one wishes to either enhance or jointly enhance and code a given image. The motivation behind a prediction/compression-based approach is that the prediction error may consist mainly of the noise signal, while the clean signal is recovered by the predictor. For example, see [1]. It is clear that different scanning patterns of the image may result in different filtering or prediction errors; thus, it is natural to ask what the performance of the optimal scanning strategy is, and what the loss is when non-optimal strategies are used.

The problem of scanning multidimensional data arrays also arises in other areas of image processing, such as one-dimensional wavelet [2] or median [3] processing of images, where one seeks a space-filling curve which facilitates the one-dimensional signal processing of the multidimensional data. Other examples include digital halftoning [4], where a space-filling curve is sought in order to minimize the effect of false contours, and pattern recognition [5]. Yet more applications can be found in multidimensional data query [6] and indexing [7], where multidimensional data is stored on a one-dimensional storage device, hence a locality-preserving space-filling curve is sought in order to minimize the number of continuous read operations required to access a multidimensional object, and in rendering of three-dimensional graphics [8], [9].

An information theoretic discussion of the scanning problem was initiated by Lempel and Ziv in [10], where the Peano-Hilbert scan was shown to be optimal for compression of individual images. In [11], Merhav and Weissman formally defined a "scandictor", a scheme for sequentially scanning and predicting a multidimensional data array, as well as the "scandictability" of a random field, namely, the best achievable performance for scanning and prediction of a random field. Particular cases where this value can be computed and the optimal scanning order can be identified were discussed in that work. One of the main results of [11] is the fact that if a stochastic field can be represented autoregressively (under a specific scan Ψ) with a maximum-entropy innovation process, then it is optimally scandicted in the way it was created (i.e., by the specific scan Ψ and its corresponding optimal predictor). A more comprehensive survey can be found in [12] and [13].

In [12], the problem of universal scanning and prediction of noise-free multidimensional arrays was investigated. Although this problem is fundamentally different from its one-dimensional analogue (for example, one cannot compete successfully with any two scandictors on any individual image), a universal scanning and prediction algorithm which achieves the scandictability of any stationary random field was given, and the excess loss incurred when non-optimal scanning strategies are used was quantified. In [14], Weissman, Merhav and Somekh-Baruch, as well as Weissman and Merhav in [15] and [16], extended the problem of universal prediction to the case of a noisy environment. Namely, the predictor observes a noisy version of the sequence, yet it is judged with respect to the clean sequence.

In this paper, we extend the results of [11] and [12] to this noisy scenario. We formally define the problem of sequentially filtering or predicting a multidimensional data array.
First, we derive lower bounds on the best achievable performance. We then discuss the scenario where non-optimal scanning strategies are used. That is, we assume that, due to implementation constraints, for example, one cannot use the optimal scanner for a given data array, and is forced to use an arbitrary scanning order. In such a scenario, it is important to understand the excess loss incurred, compared to optimal scanning and filtering (or prediction). We derive upper bounds on this excess loss. Finally, we briefly mention how the results of [12] can be exploited in order to construct universal schemes for the noisy case as well. While many of the results for noisy scandiction are extendible from the noiseless case, similarly as results for noisy prediction were extended from results for noiseless prediction [15], the scanning and filtering problem poses new challenges and requires the use of new tools and techniques.

The paper is organized as follows. Section 2 includes a precise formulation of the problem. Section 3 includes the results on scanning and filtering of noisy data arrays, while Section 4 is devoted to the prediction scenario. In both sections, particular emphasis is given to the important cases of Gaussian random fields corrupted by Additive White Gaussian Noise (AWGN), under the squared error criterion, and binary random fields corrupted by a Binary Symmetric Channel (BSC), under the Hamming loss criterion. In particular, in Section 3.1, a new tool is used to derive a lower bound on the optimum scanning and filtering performance (Section 4.1 later shows how this tool can be used to strengthen the results of [11] in the noise-free scenario as well). Section 3.2 gives upper bounds on the excess loss in non-optimal scanning. In Section 3.2.1, the results of Duncan [17] as well as those of Guo, Shamai and Verdú [18] are used to derive the bounds when the noise is Gaussian, and Section 3.2.2 deals with the binary setting. Section 3.3 uses recent results by Weissman et al. [19] to describe how universal scanning and filtering algorithms can be constructed. In the noisy scandiction section, Section 4.1 relates the best achievable performance in this setting, as well as the achieving scandictors, to the clean scandictability of the noisy field. Section 4.2 introduces a universal scandiction algorithm, and Section 4.3 gives an upper bound on the excess loss. In both Section 3 and Section 4, the sub-sections describing the optimum performance, the excess loss bounds and the universal algorithms are not directly related and can be read independently. Finally, Section 5 contains some concluding remarks.

2 Problem Formulation

We start with a formal definition of the problem. Let A denote the alphabet, which is either discrete or the real line. Let N be the noisy observation alphabet. Let Ω = (A × N)^{Z²} be the observation space (the results can be extended to any finite dimension). A probability measure Q on Ω is stationary if it is invariant under translations τ_i, where for each ω ∈ Ω and i, j ∈ Z², τ_i(ω)_j = ω_{j+i} (namely, stationarity means shift invariance). Denote by M(Ω) and M_S(Ω) the sets of all probability measures on Ω and stationary probability measures on Ω, respectively.
Elements of M(Ω), random fields, will be denoted by upper case letters while elements of Ω, individual data arrays, will be denoted by the corresponding lower case. It will also be beneficial to refer to the clean and noisy random fields separately; that is, {X_t}_{t∈Z²} represents the clean signal and {Y_t}_{t∈Z²} represents the noisy observations, where for t ∈ Z², X_t is the random variable corresponding to X at site t.

Let V denote the set of all finite subsets of Z². For V ∈ V, denote by X^V the restriction of the data array X to V. Let R be the set of all rectangles of the form V = Z² ∩ ([m_1, m_2] × [n_1, n_2]). As a special case, denote by V_n the square {0, ..., n−1} × {0, ..., n−1}. For V ⊂ Z², let the interior radius of V be

    R(V) ≜ sup{r : ∃c s.t. B(c, r) ⊆ V},    (1)

where B(c, r) is a closed ball (under the l_1-norm) of radius r centered at c. Throughout, ln(·) will denote the natural logarithm.

Definition 1. A scanner-filter pair for a finite set of sites B ∈ V is the following pair (Ψ, F̃):

• The scan {Ψ_t}_{t=1}^{|B|} is a sequence of measurable mappings, Ψ_t : N^{t−1} → B, determining the site to be visited at time t, with the property that

    {Ψ_1, Ψ_2(y_{Ψ_1}), Ψ_3(y_{Ψ_1}, y_{Ψ_2}), ..., Ψ_{|B|}(y_{Ψ_1}, ..., y_{Ψ_{|B|−1}})} = B,  ∀ y ∈ N^B.    (2)

• {F̃_t}_{t=1}^{|B|} is a sequence of measurable filters, where for each t, F̃_t : N^t → D determines the reconstruction for the value at the site visited at time t, based on the current and previous observations, and D is the reconstruction alphabet.

Note that both the scanner Ψ and the filters {F̃_t} base their decisions only on the noisy observations. In the prediction scenario (i.e., noisy scandiction), we define F_t : N^{t−1} → D; that is, {F_t} represents measurable predictors, which have access only to previous observations. We allow randomized scanner-filter pairs, namely, pairs such that {Ψ_t}_{t=1}^{|B|} or {F̃_t}_{t=1}^{|B|} can be chosen randomly from some set of possible functions. It is also important to note that we consider only scanners for finite sets of sites, ones which can be viewed merely as a reordering of the sites in a finite set B.

The cumulative loss of a scanner-filter pair (Ψ, F̃) up to time t ≤ |B| is denoted by L_{(Ψ,F̃)}(x^B, y^B)_t,

    L_{(Ψ,F̃)}(x^B, y^B)_t = Σ_{i=1}^t l(x_{Ψ_i}, F̃_i(y_{Ψ_1}, ..., y_{Ψ_i})),    (3)

where l : A × D → [0, ∞) is the loss function. The sum of the instantaneous losses over the entire data array B, L_{(Ψ,F̃)}(x^B, y^B)_{|B|}, will be abbreviated as L_{(Ψ,F̃)}(x^B, y^B). For a given loss function l and a field Q ∈ M(Ω) restricted to B, define the best achievable scanning and filtering performance by

    Ũ(l, Q_B) = inf_{(Ψ,F̃)∈S(B)} E_{Q_B} (1/|B|) L_{(Ψ,F̃)}(X^B, Y^B),    (4)

where Q_B is the marginal probability measure restricted to B and S(B) is the set of all possible scanner-filter pairs for B. The best achievable performance for the field Q, Ũ(l, Q), is defined by

    Ũ(l, Q) = lim_{n→∞} Ũ(l, Q_{V_n}),    (5)

if this limit exists.
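As a concrete illustration of Definition 1 and the normalized cumulative loss (3)-(4), the following minimal Python sketch (our own; the function names, the raster scan and the "say-what-you-see" filter are placeholder choices, not constructions from the paper) evaluates a scanner-filter pair on a toy binary field observed through a BSC.

```python
import numpy as np

def raster_scan(shape):
    """A data-independent scan: visit the sites of an array row by row."""
    rows, cols = shape
    return [(i, j) for i in range(rows) for j in range(cols)]

def normalized_cumulative_loss(x, y, scan, filt, loss):
    """Normalized cumulative loss (3)-(4): at step t the filter sees the noisy
    values at the first t visited sites and reconstructs the clean value there."""
    total, observed = 0.0, []
    for site in scan:
        observed.append(y[site])      # the filter is causal in the scan order
        total += loss(x[site], filt(observed))
    return total / len(scan)

# Toy instance: binary field through a BSC(delta), Hamming loss, and the
# "say-what-you-see" filter (reconstruct with the current noisy symbol).
rng = np.random.default_rng(0)
n, delta = 32, 0.1
x = rng.integers(0, 2, size=(n, n))
y = np.where(rng.random((n, n)) < delta, 1 - x, x)   # BSC output
print(normalized_cumulative_loss(x, y, raster_scan((n, n)),
                                 lambda obs: obs[-1],
                                 lambda a, b: float(a != b)))
```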
In the prediction scenario, F_t is allowed to base its estimation only on y_{Ψ_1}, ..., y_{Ψ_{t−1}}, and we have

    L_{(Ψ,F)}(x^B, y^B) = Σ_{t=1}^{|B|} l(x_{Ψ_t}, F_t(y_{Ψ_1}, ..., y_{Ψ_{t−1}})),    (6)

    Ū(l, Q_B) = inf_{(Ψ,F)} E_{Q_B} (1/|B|) L_{(Ψ,F)}(X^B, Y^B),    (7)

and

    Ū(l, Q) = lim_{n→∞} Ū(l, Q_{V_n}),    (8)

if this limit exists.

The following proposition asserts that for any stationary random field both the limit in (5) and the limit in (8) exist.

Proposition 1. For any stationary field Q ∈ M_S(Ω) and for any sequence {B_n}, B_n ∈ R, satisfying R(B_n) → ∞, the limits in (5) and (8) exist and satisfy

    Ũ(l, Q) = lim_{n→∞} Ũ(l, Q_{B_n}) = inf_{Δ∈R} Ũ(l, Q_Δ),    (9)

    Ū(l, Q) = lim_{n→∞} Ū(l, Q_{B_n}) = inf_{Δ∈R} Ū(l, Q_Δ).    (10)

Since Ũ(l, Q_B) and Ū(l, Q_B) possess the sub-additivity property, e.g., for any V, V′ with V ∩ V′ = ∅, there exists a scanner-filter pair (Ψ, F̃) (or a scandictor (Ψ, F)) on V ∪ V′ such that

    E_Q L_{(Ψ,F̃)}(X^{V∪V′}, Y^{V∪V′}) ≤ |V| Ũ(l, Q_V) + |V′| Ũ(l, Q_{V′}),    (11)

the proof of Proposition 1 follows verbatim that of [11, Theorem 1].

3 Filtering of Noisy Data Arrays

In this section, we consider the scenario of scanning and filtering. In this case, a lower bound on the best achievable performance is derived. For the cases of Gaussian random fields corrupted by AWGN and binary-valued fields observed through a BSC, we derive bounds on the excess loss when a non-optimal scanner is used (with an optimal filter). Finally, we briefly discuss universal scanning and filtering.

3.1 A Lower Bound on the Best Achievable Scanning and Filtering Performance

We assume an invertible memoryless channel, meaning the channel input distribution of a single symbol is uniquely determined given the output distribution. As an example, a discrete memoryless channel with an invertible channel matrix can be kept in mind. See [20] for a discussion of the conditions on the channel matrix for the invertibility property to hold. Moreover, as will be elaborated on later, the result below applies to more general channels, including continuous ones. In the case of an invertible channel, we define the associated Bayes envelope by

    f_l(P) = min_{g(·)} E l(X, g(Y)),    (12)

where P is the distribution of the channel output Y. Define

    ζ(d) = max{H(P) : f_l(P) ≤ d},    (13)

and let ζ̄(·) be the upper concave (∩) envelope of ζ(·).

Theorem 2. Let Y^B be the output of an invertible memoryless channel whose input is X^B. Then, for any scanner-filter pair (Ψ, F̃) we have

    ζ̄((1/|B|) E_{Q_B} L_{(Ψ,F̃)}(X^B, Y^B)) ≥ (1/|B|) H(Y^B),    (14)

that is,

    ζ̄(Ũ(l, Q_B)) ≥ (1/|B|) H(Y^B).    (15)

Proof. We prove the above theorem for the discrete case. Yet, the derivations below apply to the continuous case as well, with summations replaced by the appropriate integrals and the entropy replaced by differential entropy.

Denote by Ψ(Y^B) the reordered output sequence, that is, {Y_{Ψ_1}, Y_{Ψ_2}, ..., Y_{Ψ_{|B|}}}.
We have,

    H(Y^B) = H(Ψ(Y^B))    (a)
           = Σ_{t=1}^{|B|} H(Y_{Ψ_t} | Y_{Ψ_1}^{Ψ_{t−1}})
           = Σ_{t=1}^{|B|} Σ_{y_{Ψ_1}^{Ψ_{t−1}}} H(Y_{Ψ_t} | Y_{Ψ_1}^{Ψ_{t−1}} = y_{Ψ_1}^{Ψ_{t−1}}) P(y_{Ψ_1}^{Ψ_{t−1}})
           ≤ Σ_{t=1}^{|B|} Σ_{y_{Ψ_1}^{Ψ_{t−1}}} ζ(E_{Q_B}{l(X_{Ψ_t}, F̃_t(y_{Ψ_1}^{Ψ_{t−1}}, Y_{Ψ_t})) | Y_{Ψ_1}^{Ψ_{t−1}} = y_{Ψ_1}^{Ψ_{t−1}}}) P(y_{Ψ_1}^{Ψ_{t−1}})    (b)
           ≤ Σ_{t=1}^{|B|} Σ_{y_{Ψ_1}^{Ψ_{t−1}}} ζ̄(E_{Q_B}{l(X_{Ψ_t}, F̃_t(y_{Ψ_1}^{Ψ_{t−1}}, Y_{Ψ_t})) | Y_{Ψ_1}^{Ψ_{t−1}} = y_{Ψ_1}^{Ψ_{t−1}}}) P(y_{Ψ_1}^{Ψ_{t−1}})    (c)
           ≤ Σ_{t=1}^{|B|} ζ̄(E_{Q_B} l(X_{Ψ_t}, F̃_t(Y_{Ψ_1}^{Ψ_t})))    (d)
           ≤ |B| ζ̄((1/|B|) E_{Q_B} L_{F̃}(Ψ(X^B), Ψ(Y^B)))    (e)
           = |B| ζ̄((1/|B|) E_{Q_B} L_{(Ψ,F̃)}(X^B, Y^B)).    (16)

The equality (a) holds since the reordering does not change the entropy of Y^B. While this is clear for a data-independent reordering, more caution is required when Ψ is a data-dependent scan. Yet, this can be proved using the chain rule, noting that conditioned on Y_{Ψ_1}^{Ψ_{t−1}}, the next site Ψ_t is fixed (this is similar to the proof of [12, Proposition 13]). The inequalities (b) and (c) follow from the definitions of ζ and ζ̄ respectively, and (d) and (e) follow from Jensen's inequality.

At this point, a few remarks are in order. Theorem 2 is the direct analogue of the lower bounds in [11] for the filtering scenario. Note, however, that it holds for any finite set of sites B. Furthermore, it applies to arbitrarily distributed random fields (even non-stationary fields), and to a wide family of loss functions. In fact, the only condition on l(·,·) is that the associated Bayes envelope f_l(P) is well defined. Note also that the lower bound on Ũ(l, Q) given in Theorem 2 results from the application of a single-letter function, ζ̄^{−1}(·), to the normalized entropy of the noisy field, (1/|B|) H(Y^B). That is, the memory in (X^B, Y^B) is reflected only in (1/|B|) H(Y^B).

The proof of Theorem 2 is general and direct; however, it lacks the insightful geometrical interpretation which led to the lower bound in [11]. Therein, Merhav and Weissman showed that the transformation from a data array to an error sequence (defined by a specific scandictor (Ψ, F)) is volume preserving. Thus, the least expected cumulative error is the radius of a sphere whose volume equals the volume of the set of all typical data arrays of the source. This happens when all the typical data arrays of the source map to a sphere in the "error vectors" space, and thus Merhav and Weissman were able to identify cases where the lower bound is tight. Currently, we cannot point out specific cases in which (15) is tight. Moreover, as the next two examples show, in the scanning and filtering scenario (unlike the scanning and prediction scenario we discuss in Section 4), ζ(d) may not be concave, and thus ζ(d) ≠ ζ̄(d). Note, in this context, that there is no natural time-sharing solution in this case, as there is no natural trade-off between two (or more) optimal points, and there is only one criterion to be minimized - the cumulative scanning and filtering loss (as opposed to rate versus distortion, for example).

3.1.1 Binary Input and BSC

To illustrate its use, we specialize Theorem 2 to the case of binary input through a BSC, i.e., the input random field X^{V_n} is binary, and Y^{V_n} is the output of a BSC whose input is X^{V_n} and whose crossover probability is δ < 1/2.
Note, however, that although the derivations below are specific to the binary alphabet and Hamming loss, they are easily extendible to an arbitrary finite alphabet and discrete memoryless channel with a channel transition matrix Π and loss function Λ(·,·).

To compute the lower bound on the best achievable scanning and filtering performance, we evaluate f_l(P) and ζ(d). By the definitions in (12) and (13), we consider the scalar problem of estimating a random variable X based on its noisy observation Y. Denote by p_Y the probability P(Y = 1) and by p_X the probability P(X = 1). The best achievable performance, f_{l_H}(p_Y), which clearly depends on δ and is hence denoted f_δ(p_Y), is given by

    f_δ(p_Y) = Σ_{x,y} P(x, y) l_H(x, g_opt(y))
             = Σ_y P(y) Σ_x P(x | y) l_H(x, g_opt(y))
             = Σ_y P(y) min_x P(x | y)    (a)
             = Σ_y min_x P(x, y)
             = min{p_X(1 − δ), δ(1 − p_X)} + min{p_X δ, (1 − δ)(1 − p_X)}
             = min{p_X, 1 − p_X, δ}
             = min{(p_Y − δ)/(1 − 2δ), (1 − p_Y − δ)/(1 − 2δ), δ},    (b)    (17)

where (a) results from the optimality of g_opt(y) and (b) results from the invertibility of the channel. Consequently,

    ζ(d) = max_p h_b(p)  s.t.  f_δ(p) ≤ d
         = { h_b(δ ∗ d),  d < δ
           { 1,           d ≥ δ,    (18)

where h_b(·) is the binary entropy function and δ ∗ d = d(1 − δ) + δ(1 − d). Note that since δ ∗ δ < 1/2 for 0 < δ < 1/2, there is a discontinuity at d = δ, hence ζ(d) is generally not concave and ζ̄(d) ≠ ζ(d) (although ζ̄(d) can be easily calculated). Figure 1 includes plots of both ζ(d) and ζ̄(d) for δ = 0.25. We also mention that d = δ is a realistic cumulative loss in non-trivial situations, as there are cases where "say-what-you-see" (and thus suffering a loss δ) is the best any filter can do [21]. Furthermore, note that ζ(d) is not the maximum entropy function γ(d) used in [11] to derive the lower bound on the scandictability. Finally, exact evaluation of the bound in Theorem 2 may be difficult in many cases, as the entropy (1/|B|) H(Y^B) may be hard to calculate, and only bounds on its value can be used (think, for example, of an input process which is a first-order Markov source: while the entropy rate of the input is known, the output is a hidden Markov process whose entropy rate is unknown in general). At the end of Section 3.2.2, we give a numerical example for the bound in Theorem 2 using a lower bound on the entropy rate.

Remark 1. Clearly, ζ(d) is interesting only in the region d ≤ δ, as any reasonable filter will have an expected normalized cumulative loss smaller than or equal to the channel crossover probability. However, due to the discontinuity at d = δ, ζ(d) is concave for d < δ but not for d ≤ δ. This is fortunate, as if ζ(d) were concave on d ≤ δ, Theorem 2 would have resulted in h_b(δ ∗ δ) as an upper bound on the entropy rate of any binary source corrupted by a BSC, which is erroneous (for example, it violates h_b(π ∗ δ) as a lower bound on the entropy rate of a first-order Markov source with transition probability π corrupted by a BSC with crossover probability δ).

[Figure 1: The function ζ(d), as it appears in (18), and its upper concave envelope ζ̄(d), both plotted for δ = 0.25. Both have analytic expressions; the plots are discrete only to better distinguish between them.]
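To make the discontinuity at d = δ concrete, the following short sketch (ours, not from the paper) evaluates f_δ(p) from (17) and ζ(d) from (18) for δ = 0.25. Entropy is taken in bits here, so that the plateau value 1 in (18) is the maximal binary entropy.

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def f_delta(p_Y, delta):
    """Scalar Bayes performance (17), as a function of the output law p_Y."""
    return min((p_Y - delta) / (1 - 2 * delta),
               (1 - p_Y - delta) / (1 - 2 * delta),
               delta)

def zeta(d, delta):
    """zeta(d) from (18), with delta * d = d(1 - delta) + delta(1 - d)."""
    return h2(d * (1 - delta) + delta * (1 - d)) if d < delta else 1.0

delta = 0.25
print(f_delta(0.4, delta))       # risk of the optimal scalar filter at p_Y = 0.4
# Jump at d = delta: zeta(delta-) = h2(delta * delta) < 1 = zeta(delta),
# so zeta is not concave and differs from its upper concave envelope.
print(zeta(delta - 1e-9, delta), zeta(delta, delta))
```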
3.1.2 Gaussian Channel

Consider now the case where Y^{V_n} is the output of an AWGN channel whose input is arbitrarily distributed. Assume the squared error loss. As the optimal filter is clearly the conditional expectation, ζ(d) in this case is given by

    ζ(d) = max{H(X + N) : Var(X | X + N) ≤ d},  N ~ N(0, σ_N²),  N ⊥ X.    (19)

Since H(X + N | X) = H(N) is fixed, this is similar to the classical Gaussian channel capacity problem, only now the input constraint is Var(X | X + N) ≤ d, which generally depends on the distribution of X rather than solely on its variance, and hence is not necessarily achieved by a Gaussian X. When the input is also restricted to be Gaussian, however, the optimization problem in (13) is trivial and ζ(d) can be easily calculated (note that in this case ζ(d) is valid only for bounding the performance of scanning and filtering of Gaussian fields corrupted by AWGN). Since the distributions depend only on the variance (assuming zero expectation), we have f_{l_s}(P) = f_{l_s}(σ_Y²), and, in fact,

    f_{l_s}(σ_Y²) = σ_N² σ_X² / (σ_N² + σ_X²) = σ_N² − σ_N⁴/σ_Y².    (20)

Hence,

    ζ(d) = max (1/2) ln(2πe σ_Y²)  s.t.  f_{l_s}(σ_Y²) ≤ d
         = { (1/2) ln(2πe σ_N⁴/(σ_N² − d)),  d < σ_N²
           { ∞,                              d ≥ σ_N².    (21)

Unlike the binary setting, here the cumulative loss d will be strictly smaller than σ_N² for any non-trivial setting and reasonable filter, as the error in symbol-by-symbol filtering is σ_N² σ_X²/(σ_X² + σ_N²) < σ_N². Yet, ζ(d) is convex (∪) for d < σ_N², and the chain of inequalities in (16) cannot be tight.
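A quick numerical confirmation (ours) of the convexity claim: ζ(d) from (21) has positive second differences on d < σ_N².

```python
import numpy as np

# zeta(d) from (21) for Gaussian input, AWGN and squared error: positive
# second differences on d < sigma2_N confirm it is convex there, so it
# differs from its upper concave envelope and (16) cannot be tight.
sigma2_N = 1.0
d = np.linspace(0.0, 0.9, 1000)
zeta = 0.5 * np.log(2 * np.pi * np.e * sigma2_N**2 / (sigma2_N - d))
print(bool(np.all(np.diff(zeta, 2) > 0)))   # True: convex on this range
```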
3.2 Bounds on the Excess Loss of Non-Optimal Scanners

Theorem 2 gives a lower bound on the optimum scanning and filtering performance. However, it is interesting to investigate the excess scanning and filtering loss when non-optimal scanners are used. Specifically, in this section we address the following question: suppose that, for practical reasons for example, one uses a non-optimal scanner, accompanied by the optimal filter for that scan. How large is the excess loss incurred by this scheme with respect to optimal scanning and filtering? We consider both the case of a Gaussian channel and squared error loss (with Gaussian or arbitrarily distributed input) and the case of a binary source passed through a BSC and Hamming loss. While the tools we use in order to construct such a bound for the binary case are similar to the ones used in [12], we develop a new set of tools and techniques for the Gaussian setting.

3.2.1 Gaussian Channel

We investigate the excess scanning and filtering loss when non-optimal scanners are used, for the case of arbitrarily distributed input corrupted by a Gaussian channel. We first focus attention on the case where the input is Gaussian as well, and then derive new results for the more general setting.

Similarly as in [12], the bound is achieved by bounding the absolute difference between the scanning and filtering performance of any two scans, Ψ1 and Ψ2, assuming both use their optimal filters. This bound, however, results from a relation between the performance of discrete-time filtering and continuous-time filtering, together with the fundamental result of Duncan [17] on the relation between mutual information and causal minimal mean square error estimation in a Gaussian channel. Namely, we use the mutual information in continuous time as a scan-invariant feature, and the actual value of the excess loss bound results from the difference between the discrete- and continuous-time filtering problems, as will be made precise below. From now on we assume the loss function is the squared error loss, l_s(·).

We start with several definitions. Let X be a Gaussian random variable, X ~ N(0, σ_X²). Consider the following two estimation problems:

• The scalar problem of estimating X based on Y = X + N, where N ~ N(0, σ_N²), independent of X.

• The continuous-time problem of causally estimating X_t ≡ X, t ∈ [0, 1], based on Y^t, which is an AWGN-corrupted version of X_t, the Gaussian noise having a spectral density level of σ_N².

To bound the sensitivity of the scanning and filtering performance, it is beneficial to consider the difference between the estimation errors in the above two problems, that is,

    ∫_0^1 Var(X_t | Y^t) dt − Var(X | Y),    (22)

where Y^t is the continuous-time signal {Y_{t′}}_{t′=0}^t. Clearly, Var(X | Y) = σ_X² σ_N²/(σ_X² + σ_N²). Since ∫_0^t Y_{t′} dt′ is a sufficient statistic in the estimation of X_t ≡ X, Var(X_t | Y^t) equals the squared error in estimating X based on X + Ñ, Ñ being a Gaussian random variable, independent of X, with zero mean and variance σ_N²/t. Thus,

    ∫_0^1 Var(X_t | Y^t) dt − Var(X | Y) = ∫_0^1 [σ_X² (σ_N²/t) / (σ_X² + σ_N²/t)] dt − σ_X² σ_N²/(σ_X² + σ_N²)
                                        = σ_N² ln(1 + σ_X²/σ_N²) − σ_X² σ_N²/(σ_X² + σ_N²)
                                        = σ_N² f(σ_X²/σ_N²),    (23)

where

    f(x) = ln(1 + x) − x/(x + 1).    (24)

The following is the main result in this sub-section.

Theorem 3. Let X^{V_n} be a Gaussian random field with a constant marginal distribution satisfying Var(X_i) = σ_X² < ∞ for all i ∈ V_n. Let Y_i = X_i + N_i, where N^{V_n} is a white Gaussian noise of variance σ_N², independent of X^{V_n}. Then, for any two scans Ψ1 and Ψ2, we have

    (1/n²) |E L_{(Ψ1,F̃_opt)}(X^{V_n}, Y^{V_n}) − E L_{(Ψ2,F̃_opt)}(X^{V_n}, Y^{V_n})| ≤ σ_N² f(σ_X²/σ_N²).    (25)

Theorem 3 bounds the absolute difference between the scanning and filtering performance of any two scanners, Ψ1 and Ψ2, assuming they use their optimal filters. Clearly, since the scanners are arbitrary, this result can also be interpreted as the difference in performance between any scan Ψ and the best achievable performance, Ũ(l, Q_{V_n}). Note that the bound value, σ_N² f(σ_X²/σ_N²), is a single-letter expression, which depends on the input field X^{V_n} and the noise N^{V_n} only through their variances. Namely, the bound does not depend on the memory in X^{V_n}.
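As a numerical sanity check (ours, not part of the paper), the sketch below verifies identity (23) by quadrature and evaluates the single-letter bound of Theorem 3; it also locates the maximum of the normalized bound f(SNR)/SNR, which is about 0.216, a value quoted later in this section.

```python
import numpy as np
from scipy.integrate import quad

def f(x):
    """f(x) = ln(1 + x) - x/(1 + x), from (24)."""
    return np.log1p(x) - x / (1.0 + x)

sigma2_X, sigma2_N = 1.0, 0.5
snr = sigma2_X / sigma2_N

# Identity (23): the causal continuous-time error, integrated over the unit
# interval, exceeds the one-shot error by exactly sigma2_N * f(SNR).
lhs, _ = quad(lambda t: sigma2_X * sigma2_N / (t * sigma2_X + sigma2_N), 0, 1)
lhs -= sigma2_X * sigma2_N / (sigma2_X + sigma2_N)
print(lhs, sigma2_N * f(snr))          # the two values agree

# Normalized bound Var(X_1) * f(SNR)/SNR of Theorem 3, maximized over SNR;
# the maximum is about 0.216, attained near SNR ~ 2.2.
snrs = np.linspace(1e-4, 50, 500001)
g = f(snrs) / snrs
print(g.max(), snrs[g.argmax()])
```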
Proof (Theorem 3). As mentioned earlier, the comparison between any two scans is made by bounding the normalized cumulative loss of any scan Ψ in terms of a scan-invariant entity, which is the mutual information. For simplicity, assume first that the scan Ψ is data-independent, namely, it is merely a reordering of the entries of Y^{V_n}. In this case, {X_{Ψ_i}}_{i=1}^{n²} is a discrete-time Gaussian vector. We construct from it a continuous-time process, {X_t^{(c)}}_{t∈[0,n²]}, where for any t ∈ [i−1, i), X_t^{(c)} = X_{Ψ_i}, i ∈ {1, 2, ..., n²}. That is, X_t^{(c)} is a piecewise-constant process, whose constant values at intervals of length 1 correspond to the original values of the discrete-time vector {X_{Ψ_i}}. Let {Y_{Ψ_i}} and {Y_t^{(c)}} be the AWGN-corrupted versions of {X_{Ψ_i}} and X_t^{(c)}, namely, Y_{Ψ_i} = X_{Ψ_i} + N_{Ψ_i} and Y_t^{(c)} is constructed according to

    dY_t^{(c)} = X_t^{(c)} dt + σ_N dW_t,  t ∈ [0, n²],    (26)

where W_t is a standard Brownian motion. Observe that the white Gaussian noise, σ_N dW_t, has a spectral density of level σ_N², similar to the variance of the discrete-time noise N^{V_n}. Since we switch from discrete time to continuous time, it is important to note that the noise value in the two problems is equivalent. That is, if the discrete-time field X^{V_n} is corrupted by noise of variance σ_N², then we wish the continuous-time white noise to have a spectrum such that the integral over an interval of length 1, whose integrand is the continuous output Y_t^{(c)} (and thus is a sufficient statistic in order to estimate the piecewise-constant input X_t in this interval), will be a random variable which is exactly X_{Ψ_i} + N_{Ψ_i}, N_{Ψ_i} having a variance of σ_N². We have,

    (1/n²) E_{Q_{V_n}} L_{(Ψ,F̃_opt)}(X^{V_n}, Y^{V_n})
      = (1/n²) Σ_{i=1}^{n²} Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_i})
      = (1/n²) Σ_{i=1}^{n²} [∫_0^1 Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}}, {Y_{t′}^{(c)}}_{t′∈[i−1,i−1+t]}) dt − σ_N² f(Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}})/σ_N²)]    (a)
      ≥ (1/n²) Σ_{i=1}^{n²} ∫_0^1 Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}}, {Y_{t′}^{(c)}}_{t′∈[i−1,i−1+t]}) dt − σ_N² f(Var(X_1)/σ_N²)    (b)
      = (2σ_N²/n²) I({X_t^{(c)}}_{t∈[0,n²]}; {Y_t^{(c)}}_{t∈[0,n²]}) − σ_N² f(Var(X_1)/σ_N²)    (c)
      = (2σ_N²/n²) I({X_{Ψ_i}}; {Y_{Ψ_i}}) − σ_N² f(Var(X_1)/σ_N²)
      = (2σ_N²/n²) I(X^{V_n}; Y^{V_n}) − σ_N² f(Var(X_1)/σ_N²).    (d)    (27)

The equality (a) follows from the application of (23) with X = X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}}, i.e., with X_{Ψ_i} distributed conditioned on Y_{Ψ_1}^{Ψ_{i−1}}. Note that conditioned on Y_{Ψ_1}^{Ψ_{i−1}}, X_{Ψ_i} is indeed Gaussian, and that (23) applies to any Gaussian X corrupted by Gaussian noise. The inequality (b) holds since Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}}) ≤ Var(X_1) and due to the increasing monotonicity of f; (c) holds since the resulting integral from 0 to n² is simply the minimal mean square error in filtering {Y_t^{(c)}} (as Y_{Ψ_i} is a sufficient statistic with respect to {Y_{t′}^{(c)}}_{t′∈[i−1,i−1+t]}), together with the application of Duncan's result [17, Theorem 3]. Finally, (d) holds since the mutual information is invariant to the reordering of the random variables.
To complete the proof of Theorem 3, simply note that since f(x) is non-negative for x > 0, by (a) above the normalized cumulative loss can be upper bounded as well, that is,

    (1/n²) E_{Q_{V_n}} L_{(Ψ,F̃_opt)}(X^{V_n}, Y^{V_n}) ≤ (1/n²) Σ_{i=1}^{n²} ∫_0^1 Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}}, {Y_{t′}^{(c)}}_{t′∈[i−1,i−1+t]}) dt,    (28)

hence, similarly as in the chain of inequalities leading to (27),

    (1/n²) E_{Q_{V_n}} L_{(Ψ,F̃_opt)}(X^{V_n}, Y^{V_n}) ≤ (2σ_N²/n²) I(X^{V_n}; Y^{V_n}).    (29)

In fact, equation (29) can be viewed as the scanning and filtering analogue of [18, eq. (156a)]. Now, if the scan Ψ is data-dependent, the above derivations apply, with the use of the smoothing property of conditional expectation. That is, conditioned on Y_{Ψ_1}^{Ψ_{i−1}}, the position Ψ_i is fixed (assuming deterministic scanners, though a random scanning order can be tackled with a similar method), relation (a) in (27) holds since it holds conditioned on Y_{Ψ_1}^{Ψ_{i−1}}, and relation (c) holds as the mutual information is invariant under data-dependent reordering as well. This is very similar to the methods used in the proof of [12, Proposition 13], where it was shown that the entropy of a vector is invariant to data-dependent reordering.

At this point, a few remarks are in order. A very simple bound, applicable to arbitrarily distributed fields and under squared error loss (yet interesting mainly in the Gaussian regime), results from noting that for any random variables X and Y = X + N,

    0 ≤ Var(X | Y) ≤ σ_N² σ_X²/(σ_X² + σ_N²).    (30)

Namely, simple symbol-by-symbol restoration results in a cumulative loss of at most σ_N² σ_X²/(σ_X² + σ_N²), and we have,

    (1/n) E L_{(Ψ,F̃_opt)}({X_i}_{i=1}^n, {Y_i}_{i=1}^n) = (1/n) Σ_i Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_i}) ≤ (1/n) Σ_i Var(X_{Ψ_i} | Y_{Ψ_i}) = σ_N² σ_X²/(σ_X² + σ_N²).    (31)

Thus, the excess loss in non-optimal scanning cannot be greater than that value, hence,

    (1/n²) |E L_{(Ψ1,F̃_opt)}(X^{V_n}, Y^{V_n}) − E L_{(Ψ2,F̃_opt)}(X^{V_n}, Y^{V_n})| ≤ σ_N² σ_X²/(σ_X² + σ_N²).    (32)

In the next sub-section, we derive a tighter bound than the bound in (32), applicable to arbitrarily distributed input fields. However, since this bound may be harder to evaluate, it is interesting to discuss the properties of (32) as well. Both the bound in Theorem 3 and the bound in (32) are of the form Var(X_1) g(SNR) for some g, where SNR = σ_X²/σ_N². This means that any bound obtained for a certain SNR applies to all values of Var(X_1) by rescaling. The bound in Theorem 3 has the form Var(X_1) f(SNR)/SNR, where f(·) was defined in (24), and we have

    lim_{SNR→0+} f(SNR)/SNR = lim_{SNR→∞} f(SNR)/SNR = 0,    (33)

[Figure 2: Bounds on the excess loss in scanning and filtering of Gaussian input corrupted by AWGN. The solid line is the bound given in Theorem 3; the dashed line is the bound given in (32).]

that is, the scan is inconsequential at very high or very low SNR. This is clear, as at high SNR the current observation is by far the most influential, and whatever previous observations are used is inconsequential.
For low SNR, the cumulative loss is high whatever the scan is. Unlike the bound in Theorem 3, (32) does not predict the correct behavior for SNR → 0+, and is mainly interesting in the high-SNR regime. The above observations are also evident in Figure 2, which includes both the bound given in Theorem 3, applicable to Gaussian fields, and (32), applicable to arbitrarily distributed fields. It is also evident that in the case of Gaussian fields, f(SNR)/SNR has a unique maximum of approximately 0.216; that is, the excess loss due to a suboptimal scan at any SNR is upper bounded by 0.216 Var(X_1).

Remark 2. It is clear from the proof of Theorem 3 that an upper bound on the expression in (22), valid for arbitrarily distributed input X, may yield an upper bound on the excess scanning and filtering loss which is also valid for arbitrarily distributed random fields. However, while the integral in (22) can be upper bounded by assuming a Gaussian X, Var(X | Y) has no non-trivial lower bound. In fact, in [22] it is shown that if X is the following binary random variable,

    X = { √((1−p)/p)   w.p. p,
        { −√(p/(1−p))  w.p. 1−p,    (34)

for which E X = 0 and E X² = 1, then we have

    Var(X | Y) ≤ (1/(2p(1−p))) e^{−(σ_X²/σ_N²)/(4p(1−p))},    (35)

which can be arbitrarily close to 0 for small enough p. Thus, the only lower bound on Var(X | Y) which is valid for any X with E X² < ∞, and depends only on σ_X² and σ_N², is 0 (and hence results in a bound weaker than Theorem 3 or (32)).

In the next two subsections, we derive new bounds on the excess loss, which are valid for more general input fields. First, we generalize the bound in Theorem 3. While the result may be complex to evaluate in its general form, we show that for binary input fields the bound admits a simple form. We then show that if the input alphabet is continuous, then a non-trivial bound on Var(X | Y) can be derived easily, which, in turn, results in a new bound on the excess loss.

A Generalization of Theorem 3. A generalization of Theorem 3 results from revisiting equality (a) of (27), which is simply the application of (23) with X = X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}}. While it is clear that an expression similar to that in (23) can be computed for non-Gaussian X, it is not clear that X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}} has the same distribution for every 1 ≤ i ≤ n² (unlike the Gaussian setting, where X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}} is always Gaussian). Nevertheless, using the definition below, one can generalize Theorem 3 to arbitrarily distributed inputs as follows. For any (X^{V_n}, Y^{V_n}), where Y^{V_n} is the AWGN-corrupted version of X^{V_n}, define

    f*(X^{V_n}, σ_N²) = max_{Ψ, 1≤i≤n²} [∫_0^1 Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}}, {Y_{t′}^{(c)}}_{t′∈[i−1,i−1+t]}) dt − Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_i})].    (36)

Theorem 4. Let X^{V_n} be an arbitrarily distributed random field, with a constant marginal distribution satisfying Var(X_i) = σ_X² < ∞ for all i ∈ V_n. Let Y_i = X_i + N_i, where N^{V_n} is a white Gaussian noise of variance σ_N², independent of X^{V_n}. Then, for any two scans Ψ1 and Ψ2, we have

    (1/n²) |E L_{(Ψ1,F̃_opt)}(X^{V_n}, Y^{V_n}) − E L_{(Ψ2,F̃_opt)}(X^{V_n}, Y^{V_n})| ≤ f*(X^{V_n}, σ_N²).    (37)

The proof of Theorem 4 is similar to that of Theorem 3, and appears in Appendix A.1.
Note that f*(X^{V_n}, σ_N²) is scan-independent, as it includes a maximization over all possible scans. At first sight, it seems as if this maximization may take the sting out of the excess loss bound. However, as the example below shows, at least for the interesting scenario of binary input, this is not the case.

First, however, a few more general remarks are in order. Since important insight can be gained when using the results of Guo, Shamai and Verdú [18], let us recall the setting used therein. In [18], one wishes to estimate X based on √SNR X + N, where N is a standard normal random variable. Denote by I(SNR) and mmse(SNR) the mutual information between X and √SNR X + N, and the minimal mean square error in estimating X based on √SNR X + N, respectively. Note that Var(X | Y) in our setting equals σ_X² mmse(σ_X²/σ_N²). Under these definitions,

    (d/dSNR) I(SNR) = (1/2) mmse(SNR),    (38)

or, equivalently,

    I(SNR) = (1/2) ∫_0^SNR mmse(γ) dγ.    (39)

Consequently, the result of Theorem 3 can be restated as

    (1/n²) |E L_{(Ψ1,F̃_opt)}(X^{V_n}, Y^{V_n}) − E L_{(Ψ2,F̃_opt)}(X^{V_n}, Y^{V_n})| ≤ 2σ_N² I(SNR) − σ_X² mmse(SNR),    (40)

where I(SNR) = (1/2) ln(1 + SNR) and mmse(SNR) = 1/(1 + SNR) are simply the mutual information and minimal mean square error of the scalar problem (hence, a single-letter expression) of estimating a Gaussian X based on √SNR X + N, where N is standard Gaussian. In fact, the bound in Theorem 4 will always have the form 2σ_N² I(SNR) − σ_X² mmse(SNR), for some X* whose distribution is the maximizing distribution in (36). The next example shows that this is indeed the case for binary input as well, and the resulting bound can be easily computed.

Example 1 (Binary input and AWGN). Consider the case where X^{V_n} is a binary random field, with a symmetric marginal distribution (that is, P(X_0 = σ_X) = P(X_0 = −σ_X) = 1/2). Note that the X_i's are not necessarily i.i.d., and any dependence between them is possible. Y^{V_n} is the AWGN-corrupted version of X^{V_n}. To evaluate the bound in Theorem 4, f*(X^{V_n}, σ_N²) should be calculated. However, for any scan Ψ and time i, X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{i−1}} is still a binary random variable, taking the values ±σ_X with probabilities (p, 1−p), for some 0 ≤ p ≤ 1/2. Hence,

    f*(X^{V_n}, σ_N²) ≤ max_{0≤p≤1/2} [∫_0^1 Var(X_t | Y^t) dt − Var(X | Y)],    (41)

where X is a binary random variable, taking the values ±σ_X with probabilities (p, 1−p), X_t ≡ X, Y = X + N and Y^t is the AWGN-corrupted version of X_t. The following result holds for any random variable X.

Claim 5. For any random variable X with Var(X) = σ_X² < ∞, the expression in (22) is monotonically increasing in σ_X².

Proof. We have,

    ∫_0^1 Var(X | Y^t) dt − Var(X | Y) = ∫_0^1 σ_X² mmse(σ_X² t/σ_N²) dt − σ_X² mmse(σ_X²/σ_N²)
                                       = σ_N² ∫_0^{σ_X²/σ_N²} mmse(γ) dγ − σ_X² mmse(σ_X²/σ_N²).    (42)

Thus,

    (d/dσ_X²) [∫_0^1 Var(X_t | Y^t) dt − Var(X | Y)] = −σ_X² (d/dσ_X²) mmse(σ_X²/σ_N²) = −2 SNR (d²/dSNR²) I(SNR) ≥ 0,    (43)

where the last inequality is by [18, Corollary 1].

Claim 5 simply states that the monotonicity of f(·) used in inequality (b) of (27) is not specific to Gaussian input, and holds for any X.
Thus, by Claim 5, the term in the braces of equation (41) is monotonically increasing in the variance of X, which is simply 4σ_X²(p − p²). Thus, it is maximized by p = 1/2, and we have

    f*(X^{V_n}, σ_N²) = 2σ_N² I(SNR) − σ_X² mmse(SNR),    (44)

where I(SNR) and mmse(SNR) are the mutual information and minimal mean square error in the estimation of X based on √SNR X + N, where X is binary symmetric and N is a standard normal. Since the conditional mean estimate in this problem is tanh(√SNR Y), we have [18]

    I(SNR) = SNR − (1/√(2π)) ∫_{−∞}^{∞} e^{−y²/2} ln cosh(SNR − √SNR y) dy,    (45)

and

    mmse(SNR) = 1 − (1/√(2π)) ∫_{−∞}^{∞} e^{−y²/2} tanh(SNR − √SNR y) dy,    (46)

so the bound can be computed numerically. The above bound is plotted in Figure 3. Similarly to the case of Gaussian input, it is insightful to compare this bound to a simple symbol-by-symbol filtering bound. That is, since for any binary X corrupted by AWGN of variance σ_N² we have 0 ≤ Var(X | Y) ≤ σ_X² mmse(SNR), it follows that

    (1/n²) |E L_{(Ψ1,F̃_opt)}(X^{V_n}, Y^{V_n}) − E L_{(Ψ2,F̃_opt)}(X^{V_n}, Y^{V_n})| ≤ σ_X² mmse(SNR),    (47)

where mmse(SNR) is given in (46). This is simply the analogue of (32) for the binary input setting.

[Figure 3: Bounds on the excess loss in scanning and filtering of binary input fields corrupted by AWGN. The solid line is the bound given in (44) (that is, Theorem 4), and the dashed line is the symbol-by-symbol bound given in (47).]
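The bound (44) and the symbol-by-symbol bound (47) are straightforward to evaluate by computing the Gaussian integrals (45) and (46) numerically; a minimal sketch (ours, using scipy quadrature) follows.

```python
import numpy as np
from scipy.integrate import quad

def gauss_expect(fun):
    """E[fun(Z)] for Z ~ N(0,1), by quadrature."""
    integrand = lambda y: np.exp(-y * y / 2) / np.sqrt(2 * np.pi) * fun(y)
    val, _ = quad(integrand, -12, 12)
    return val

def I_bin(snr):
    """Mutual information (45) for binary symmetric input, in nats."""
    return snr - gauss_expect(lambda y: np.log(np.cosh(snr - np.sqrt(snr) * y)))

def mmse_bin(snr):
    """MMSE (46); the conditional mean estimate is tanh(sqrt(snr) * Y)."""
    return 1.0 - gauss_expect(lambda y: np.tanh(snr - np.sqrt(snr) * y))

sigma2_X, sigma2_N = 1.0, 1.0          # unit-variance binary input, SNR = 1
snr = sigma2_X / sigma2_N
theorem4_bound = 2 * sigma2_N * I_bin(snr) - sigma2_X * mmse_bin(snr)   # (44)
symbol_bound = sigma2_X * mmse_bin(snr)                                 # (47)
print(theorem4_bound, symbol_bound)
```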
A Bound for Arbitrarily Distributed Continuous Input. In this sub-section, we derive an additional bound on the excess scanning and filtering loss under squared error. We assume, however, that the input random field X^{V_n} is over R^{V_n}, and that X_i | Y^{V_n} has a finite differential entropy for any i ∈ V_n (roughly speaking, this means that in the denoising problem of X_i, X_i | Y^{V_n} is a non-degenerate continuous random variable). Under the above assumptions, we derive an excess loss bound which is not only valid for non-Gaussian input, but also depends on the memory in the random field (X^{V_n}, Y^{V_n}). On the other hand, it is important to note that the bound below is mainly asymptotic, and may be much harder to evaluate compared to the bounds in Theorem 3 or (32).

By [23, Theorem 9.6.5], for any X, Y with a finite conditional differential entropy H(X | Y),

    Var(X | Y) ≥ (1/2πe) exp{2H(X | Y)}.    (48)

Thus,

    (1/n²) E L_{(Ψ,F̃_opt)}(X^{V_n}, Y^{V_n}) = (1/n²) Σ_{i=1}^{n²} Var(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_i})
      ≥ (1/n²) Σ_{i=1}^{n²} (1/2πe) exp{2H(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_i})}    (a)
      ≥ (1/n²) Σ_{i=1}^{n²} (1/2πe) exp{2H(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{n²}})}    (b)
      ≥ (1/2πe) exp{2 (1/n²) Σ_{i=1}^{n²} H(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{n²}})},    (c)    (49)

where (a) is by applying (48) with Y = Y_{Ψ_1}^{Ψ_i}, (b) is since conditioning reduces entropy, and (c) is by applying Jensen's inequality. The expression (1/n²) Σ_{i=1}^{n²} H(X_{Ψ_i} | Y_{Ψ_1}^{Ψ_{n²}}) equals (1/n²) Σ_{i=1}^{n²} H(X_{Ψ′_i} | Y_{Ψ′_1}^{Ψ′_{n²}}) for any two scanners Ψ and Ψ′, since equality holds even without the expectation implicit in the entropy function. Thus, it is scan-invariant. Define

    H⁺(X | Y) = liminf_{n→∞} (1/|V_n|) Σ_{i∈V_n} H(X_i | Y^{V_n}).    (50)

H⁺(X | Y) can be seen as the asymptotic normalized entropy in the denoising problem of {X} based on its noisy observations. Note that the entropies in (50) are differential. The following proposition gives a new upper bound on the excess scanning and filtering loss under squared error.

Proposition 6. Let X^{V_n} be an arbitrarily distributed continuous-valued random field with Var(X_i) = σ_X² for all i. Let Y_i = X_i + N_i, where N^{V_n} is a white noise of variance σ_N², independent of X^{V_n}. Assume that X_i | Y^{V_n} has a finite differential entropy for any i ∈ V_n. Then, for any two scans Ψ1 and Ψ2, we have

    liminf_{n→∞} (1/|V_n|) |E L_{(Ψ1,F̃_opt)}(X^{V_n}, Y^{V_n}) − E L_{(Ψ2,F̃_opt)}(X^{V_n}, Y^{V_n})| ≤ σ_N² σ_X²/(σ_X² + σ_N²) − (1/2πe) exp{2H⁺(X | Y)}.    (51)

Proof. The proof follows directly by applying the lower bound on the scanning and filtering performance given in (49) and the upper bound in (31).

The bound in Proposition 6 is always at least as tight as the bound in (32) (and thus tighter than the bound in Theorem 3 for high SNR). For example, if the estimation error of X_i given Y^{V_n} tends to zero as n increases (as in the case where X_i = X for all i), then exp{2H⁺(X | Y)} = 0. However, if X_i cannot be reconstructed completely from Y^{V_n}, then the bound may be tighter than (32). It is far from being a tight bound on the excess loss, though. In the extreme case where all X_i's are i.i.d., the excess loss bound in Proposition 6 is σ_N² σ_X²/(σ_X² + σ_N²) − Var(X_1 | Y_1) > 0 (for non-Gaussian X), while it is clear that all reasonable scanner-filter pairs perform the same. Finally, note that any lower bound on H⁺(X | Y) results in an upper bound on the scanning and filtering excess loss. For example, since

    H⁺(X | Y) ≥ H(X_0 | Y_{−k}^{k}, X_{−k}^{−1}, X_{1}^{k})    (52)

for any finite k, one can compute a simple upper bound on the excess loss, at least for a first-order Markov {X}.
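As a quick sanity check (ours, not the paper's) of Proposition 6: for an i.i.d. Gaussian field, H⁺(X | Y) reduces to the scalar conditional entropy h(X_0 | Y_0) = (1/2) ln(2πe Var(X | Y)), so the right-hand side of (51) collapses to exactly zero, consistent with all scans performing alike there.

```python
import numpy as np

# Right-hand side of (51) for an i.i.d. Gaussian field: H+(X|Y) equals
# 0.5 * ln(2*pi*e * Var(X|Y)), so the two terms cancel and the excess-loss
# bound is zero (up to floating point).
sigma2_X, sigma2_N = 1.0, 0.5
var_x_given_y = sigma2_N * sigma2_X / (sigma2_X + sigma2_N)
H_plus = 0.5 * np.log(2 * np.pi * np.e * var_x_given_y)
rhs = var_x_given_y - np.exp(2 * H_plus) / (2 * np.pi * np.e)
print(rhs)   # ~0
```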
3.2.2 Binary Input and BSC

Unlike the Gaussian setting discussed in Section 3.2.1, where the bound on the excess loss resulted from a continuous-time equality, with the mutual information serving as the scan-invariant feature, in the case of binary input and a BSC the entropy of the random field will play the key role, similar to [12]. As given in Section 3.1, the best achievable performance (in the scalar problem) is given by

    f_δ(p) = min{(p − δ)/(1 − 2δ), (1 − p − δ)/(1 − 2δ), δ},    (53)

where p is the probability that the channel output is 1 and δ is the channel crossover probability. Note that f_δ(p) is not the Bayes envelope associated with estimating X_t using Y_t under Hamming loss. However, as is clear from the derivations in (17), and as will be evident from the proof of the following theorem, f_δ(P(y_t | y^{t−1})) is the expectation of the Bayes envelope (associated with estimating X_t using Y_t under Hamming loss) with respect to the distribution P(y_t | y^{t−1}). Define

    ε_δ = min_{a,b} max_{δ≤p≤1/2} |a h_b(p) + b − f_δ(p)|.    (54)

The following is the main result in this sub-section.

Theorem 7. Let Y^B be the output of a BSC with crossover probability δ whose input is X^B. Then, for any scanner-filter pair (Ψ, F̃_opt), where F̃_opt is the optimal filter for the scan Ψ, we have

    |(1/|B|) E_{Q_B} L_{(Ψ,F̃_opt)}(X^B, Y^B) − Ũ(l_H, Q_B)| ≤ 2ε_δ.    (55)

Even without evaluating ε_δ explicitly, it is easy to see that the excess loss when using non-optimal scanners is quite small in this binary filtering scenario. For example, for δ = 0.1 and δ = 0.25 we have ε_δ < 0.035 and ε_δ < 0.03 respectively, yielding a maximal loss of 0.07 or even 0.06. Figure 4 includes the value of 2ε_δ as a function of δ. Similarly to Section 3.2.1, it is compared to a simple bound on the excess loss which results from simply bounding the Hamming loss of any filter by 0 from below and δ from above (namely, δ is the resulting bound on the excess loss). The values in Figure 4 should also be compared to 0.16, which is the bound on the excess loss in the clean prediction scenario [12], or even to larger values in the noisy prediction scenario, to be discussed in the next section. The fact that the filtering problem is less sensitive to the scanning order is quite clear, as the noisy observation of X_{Ψ_t} is available under any scan. Finally, it is not hard to show that in the limits of δ → 0 and δ → 1/2 (high and low SNR, respectively), we have ε_δ → 0, which is expected, as the scanning is inconsequential in these cases (note, however, that the singlet bound, δ, does not predict the correct behavior at low SNR).

[Figure 4: Bounds on the excess loss in scanning and filtering of binary random fields corrupted by a BSC. The solid line is the bound in Theorem 7 (2ε_δ), and the dashed line is the singlet bound (f_δ(1/2) = δ).]
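ε_δ in (54) is a Chebyshev (minimax) linear fit of f_δ(p) against h_b(p) over δ ≤ p ≤ 1/2, and can be solved as a small linear program on a grid of p values. The sketch below (ours) reproduces values consistent with those quoted above; entropies are in nats, though ε_δ is insensitive to the entropy base since a and b are free.

```python
import numpy as np
from scipy.optimize import linprog

def h_b(p):
    return -p * np.log(p) - (1 - p) * np.log(1 - p)   # binary entropy, nats

def f_delta(p, delta):
    # Bayes performance (53), restricted to delta <= p <= 1/2
    return np.minimum((p - delta) / (1 - 2 * delta), delta)

def eps_delta(delta, grid=2000):
    """Minimax fit (54): min over (a, b) of max_p |a*h_b(p) + b - f_delta(p)|,
    posed as an LP in the variables (a, b, t) with t the worst-case error."""
    p = np.linspace(delta + 1e-9, 0.5, grid)
    h, f = h_b(p), f_delta(p, delta)
    A = np.block([[h[:, None], np.ones((grid, 1)), -np.ones((grid, 1))],
                  [-h[:, None], -np.ones((grid, 1)), -np.ones((grid, 1))]])
    b_ub = np.concatenate([f, -f])
    res = linprog(c=[0, 0, 1], A_ub=A, b_ub=b_ub,
                  bounds=[(None, None), (None, None), (0, None)])
    return res.fun

for d in (0.1, 0.25):
    print(d, eps_delta(d))   # both near 0.03, consistent with the text
```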
Proof (Theorem 7). We first show that for any arbitrarily distributed binary n-tuple X^n and any 0 ≤ δ < 1/2,

    |a*_δ (1/n) H(Y^n) + b*_δ − (1/n) E L_opt^{l_H}(X^n, Y^n)| ≤ ε_δ,    (56)

where E L_opt^{l_H}(X^n, Y^n) is the expected cumulative Hamming loss in optimally filtering X^n based on Y^n, and a*_δ and b*_δ are the minimizers of ε_δ in (54). Indeed,

    |a*_δ (1/n) H(Y^n) + b*_δ − (1/n) E L_opt^{l_H}(X^n, Y^n)|
      = |(1/n) Σ_{t=1}^n Σ_{y^{t−1}} P(y^{t−1}) Σ_{x_t,y_t} [−a*_δ P(x_t, y_t | y^{t−1}) ln P(y_t | y^{t−1}) + b*_δ P(x_t, y_t | y^{t−1}) − P(x_t, y_t | y^{t−1}) l_H(x_t, F̃_opt(y^t))]|
      ≤ (1/n) Σ_{t=1}^n Σ_{y^{t−1}} P(y^{t−1}) |a*_δ h_b(P(y_t | y^{t−1})) + b*_δ − Σ_{x_t,y_t} P(x_t, y_t | y^{t−1}) l_H(x_t, F̃_opt(y^t))|
      ≤ (1/n) Σ_{t=1}^n Σ_{y^{t−1}} P(y^{t−1}) |a*_δ h_b(P(y_t | y^{t−1})) + b*_δ − Σ_{y_t} P(y_t | y^{t−1}) Σ_{x_t} P(x_t | y^t) l_H(x_t, F̃_opt(y^t))|.    (57)

Consider the summation Σ_{y_t} P(y_t | y^{t−1}) Σ_{x_t} P(x_t | y^t) l_H(x_t, F̃_opt(y^t)). As F̃ is optimal, the inner sum equals min{P(x_t = 0 | y^t), P(x_t = 1 | y^t)}. Thus, similarly to the derivations in (17), we have

    Σ_{y_t} P(y_t | y^{t−1}) Σ_{x_t} P(x_t | y^t) l_H(x_t, F̃_opt(y^t))
      = Σ_{y_t} P(y_t | y^{t−1}) min{P(x_t = 0 | y^t), P(x_t = 1 | y^t)}
      = min{(P(y_t = 0 | y^{t−1}) − δ)/(1 − 2δ), (P(y_t = 1 | y^{t−1}) − δ)/(1 − 2δ), δ}.    (58)

Let p = P(y_t = 0 | y^{t−1}). Note that min{p, 1 − p} ≥ δ. We have

    |a*_δ (1/n) H(Y^n) + b*_δ − (1/n) E L_opt^{l_H}(X^n, Y^n)|
      ≤ (1/n) Σ_{t=1}^n Σ_{y^{t−1}} P(y^{t−1}) |a*_δ h_b(p) + b*_δ − min{(p − δ)/(1 − 2δ), (1 − p − δ)/(1 − 2δ), δ}|
      ≤ (1/n) Σ_{t=1}^n Σ_{y^{t−1}} P(y^{t−1}) max_{δ≤p≤1/2} |a*_δ h_b(p) + b*_δ − f_δ(p)|
      = ε_δ,    (59)

which establishes (56). However, the same inequality can be proved for any reordering Ψ of the data (similar to the proof of [12, Proposition 13]); consequently,

    |a*_δ (1/|B|) H(Ψ(Y^B)) + b*_δ − (1/|B|) E_{Q_B} L_{(Ψ,F̃_opt)}(X^B, Y^B)| ≤ ε_δ.    (60)

Using (60), remembering that H(Y^B) = H(Ψ(Y^B)) for any Ψ, and applying the triangle inequality results in (55).

Note that analogous ideas were used by Verdú and Weissman to bound the absolute difference between the denoisability and erasure entropy [24].

Theorem 2 gives a lower bound on the best achievable scanning and filtering performance. Theorems 3, 4 and 7 give an upper bound on the maximal possible difference between the normalized cumulative loss of any two scanners (accompanied by the optimal filters), or any one scanner compared to the optimal scan. Although Theorem 2 is similar to the results of [11], even for the relatively simple examples of a Gaussian field through a Gaussian memoryless channel or a binary source through a binary symmetric channel we have no results which can parallel [11, Theorem 17] or [11, Corollary 21], i.e., give an example of an optimal scanner-filter pair for a certain scenario. However, as the next example shows, we can identify situations where scanning and filtering improves the filtering results, i.e., non-trivial scanning of the data results in strictly better restoration. Moreover, the example below illustrates the use of the results derived in this section.

Example 2 (One-Dimensional Binary Markov Source and the BSC). In this case, it is not too hard to construct a scheme in which non-trivial scanning improves the filtering performance. In [21], Ordentlich and Weissman study the optimality of symbol-by-symbol (singlet) filtering and decoding. That is, the regions (depending on the source and channel parameters) where a memoryless scheme to estimate X_i is optimal with respect to causal (filtering) or non-causal (denoising) non-memoryless schemes. Clearly, in the regions where singlet denoising is optimal (a fortiori singlet filtering), scanning cannot improve the filtering performance. However, consider the region where singlet filtering is optimal, yet singlet denoising is not. In this region, there exists k for which the estimation error in estimating X_i based on Y_{i−k}^{i+k} is strictly smaller than that based on Y_i (as the optimal filter is memoryless yet the optimal denoiser is not). Hence, a scanner which in the first pass scans k contiguous symbols, then skips one, etc., and in the second pass returns to fill in the holes, accompanied by singlet filtering in the first pass and non-memoryless estimation in the second, has strictly better filtering performance than the trivial scanner. A brute-force computation of the second-pass estimate is sketched below.
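The following sketch (ours; brute-force enumeration rather than the forward-backward recursion referenced in the sequel) computes the second-pass posterior P(x_i | y_{i−1}, y_i, y_{i+1}) for the binary symmetric Markov source and BSC of this example.

```python
import itertools

def posterior_x1(y, pi, delta):
    """P(x_1 = 1 | y_0, y_1, y_2) for a binary symmetric Markov source
    (transition probability pi) observed through a BSC(delta), by brute-force
    enumeration over the clean triple; this is the 'fill in the holes'
    estimate used in the second pass."""
    def chain_prob(x):
        p = 0.5                                   # symmetric stationary start
        for a, b in zip(x, x[1:]):                # Markov transitions
            p *= pi if a != b else 1 - pi
        for a, o in zip(x, y):                    # BSC emissions
            p *= delta if a != o else 1 - delta
        return p
    joint = {x: chain_prob(x) for x in itertools.product((0, 1), repeat=3)}
    num = sum(p for x, p in joint.items() if x[1] == 1)
    return num / sum(joint.values())

# With pi = delta = 0.1 and observations (0, 1, 0), the posterior of the
# middle symbol is below 1/2, so the two-sided estimate flips the observed
# symbol -- memory strictly helps, even though singlet filtering is optimal.
print(posterior_x1((0, 1, 0), 0.1, 0.1))   # ~0.30, so estimate x_1 = 0
```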
For a binary symmetric Markov source with transition probability $\pi \le \frac{1}{2}$, corrupted by a BSC with crossover probability $\delta$, [21, Corollary 3] asserts that singlet filtering (the "say-what-you-see" scheme in this case) is optimal if and only if
\[
\delta \le f(\pi) \triangleq \frac{1}{2}\left(1 - \sqrt{\max\{1 - 4\pi,\, 0\}}\right). \tag{61}
\]
Singlet denoising, on the other hand, is optimal if and only if
\[
\delta \le d(\pi) \triangleq \frac{1}{2}\left(1 - \sqrt{\max\left\{1 - 4\left(\frac{\pi}{1-\pi}\right)^2,\, 0\right\}}\right). \tag{62}
\]
Consider a scanner-filter pair which scans the data using an "odds-then-evens" scheme. On the odds, "say-what-you-see" filtering is used. On the evens, $Y_{i-1}^{i+1}$ is used in order to estimate $X_i$.²

² This is to have only a few simple steps in the forward-backward algorithm [25, Section 5], which is required to compute $P(x_i \mid y_{i-1}^{i+1})$. The generalization to $Y_{i-k}^{i+k}$ is straightforward.

[Figure 5: Where a simple (sub-optimal) "odds-then-evens" scan can improve on the trivial scanning order with the optimal filtering scheme. $\pi$ is the transition probability of the symmetric, first order, Markov source and $\delta$ is the channel crossover probability.]

The results are given in Figures 5 and 6. In Figure 5, the points marked with "x" are where the "odds-then-evens" scan improves on the trivial scan. The two curves are $f(\pi)$ and $d(\pi)$. Figure 6 shows the actual improvement made by the "odds-then-evens" scanning and filtering. For $\delta = \pi = 0.1$, for example, the "odds-then-evens" error rate is smaller than that of filtering with the trivial scan by $0.021$ (that is, $0.079$ compared to $0.1$). This value should be put alongside the upper bound on the excess loss given in Theorem 7, which is smaller than $0.07$ in this case. To evaluate the bound on the best achievable scanning and filtering performance given in Theorem 2 for this example (denoted, with a slight abuse of notation, by $\tilde U(\pi, \delta)$), we have
\[
\tilde U(\pi, \delta) \ge \bar\zeta^{-1}\left(\bar H(\pi, \delta)\right) \ge \bar\zeta^{-1}\left(h_b(\pi * \delta)\right), \tag{63}
\]
where $\bar H(\pi, \delta)$ is the entropy rate of the output, which is in turn lower bounded by $h_b(\pi * \delta)$. The resulting bound for $\pi = \delta = 0.1$ is approximately $0.04$. Thus, there exist non-trivial scanning and filtering schemes (i.e., lower bounds) whose improvement on the trivial scanning order is of the same order of magnitude as the upper bound in Theorem 7. To conclude, it is clear that there is a wide region where a non-trivial scanning order improves on the trivial scan, and that this region includes at least all the region between $f(\pi)$ and $d(\pi)$. Yet, it is not clear what the optimal scanner-filter pair is.

3.3 Universal Scanning and Filtering of Noisy Data Arrays

In [19], Weissman et al. mention that the problems involving sequential decision making on noisy data are not fundamentally different from their noiseless analogues, and in fact can be reduced to the noiseless setting using a properly modified loss function. Indeed, this property of the noisy setting was used throughout the literature, and in this work.
The problem of filtering a noisy data sequence is not different in this sense, and it is possible to construct a modified loss function such that the filtering problem is transformed into a prediction problem (with a few important exceptions to be discussed later). Such a modified loss function and a 'filtering-prediction transformation' are discussed in [19]. We briefly review this transformation and consider its use in universal filtering of noisy data arrays.

First, we slightly generalize our notion of a filter. For a random variable $U_t$ uniformly distributed on $[0, 1]$, let $\hat X_t(y^{t-1} y_t, U_t) \in A$ denote the output of the filter $\hat X$ at time $t$, after observing $y^t$. That is, the filter $\hat X$ also views an auxiliary random variable, on which it can base its output, $\hat X_t(y^{t-1} y_t, U_t)$. We also generalize the prediction space to $M(S)$, $S = \{s : N \mapsto A\}$; i.e., the prediction space is a distribution on the set of functions from the noisy observation alphabet $N$ to the clean signal alphabet $A$. We assume an invertible discrete memoryless channel. For each filter $\hat X$, the corresponding predictor is defined by
\[
F^{\hat X}_t(y^{t-1})[s] = P\left( \hat X_t(y^{t-1} y, U_t) = s(y) \ \ \forall y \in N \right). \tag{64}
\]

[Figure 6: The actual difference between the optimal filtering error rate and the "odds-then-evens" scanning and filtering error rate. $\pi$ is the transition probability of the symmetric, first order, Markov source and $\delta$ is the channel crossover probability. Only values for which $\delta < f(\pi)$ are shown.]

The analogous 'prediction-filtering transformation' is
\[
\hat X^F_t(y^t, u_t) = a_j \in A \quad \text{if} \quad \sum_{i=0}^{j-1} \sum_{s : s(y_t) = a_i} F_t(y^{t-1})[s] \le u_t < \sum_{i=0}^{j} \sum_{s : s(y_t) = a_i} F_t(y^{t-1})[s], \tag{65}
\]
where the subscript $i$ reflects some enumeration of $A$. Under the above definitions, [19, Theorem 4] states that for all $n$, $x^n \in A^n$ and any predictor $F$,
\[
E L_{\hat X^F}(x^n, Y^n) = E L'_F(Y^n), \tag{66}
\]
where $L_{\hat X^F}(x^n, Y^n)$ is the cumulative loss of the filter under the original loss function $l$ and $L'_F(Y^n)$ is the cumulative loss of the predictor under a modified loss function $l'$, which depends on the original loss $l$ and the channel crossover probabilities.

This result can be used for universal filtering, under invertible discrete memoryless channels, in the following way. For each finite set of filters, construct the corresponding set of predictors, then use the well known results in universal prediction in order to construct a universal predictor for that set. Finally, construct the universal filter using the "inverse" prediction-filtering transformation. Analogously, the results on universal finite-set scandiction given in [12] can be used to construct universal scanner-filter pairs. Note, however, that the modified loss function $l'$ may be much more complex to handle compared to the original one. For example, it may not be a function of the difference $x_t - F_t$, even if the original loss function is. Nevertheless, the results in [12] apply to any bounded loss function, and thus can be utilized.
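As an illustration of the transformations (64)-(65), the following Python sketch implements both directions for binary alphabets ($A = N = \{0, 1\}$). It is a minimal sketch under this binary assumption; `example_filter` and all other names are illustrative placeholders rather than an implementation from [19].

```python
# A minimal sketch of the filter <-> predictor correspondence of (64)-(65)
# for binary alphabets. A predictor's output is a distribution over the
# four maps s: {0,1} -> {0,1}, represented as tuples (s(0), s(1)).
import itertools
import random

A = [0, 1]
MAPS = list(itertools.product(A, repeat=2))

def filter_to_predictor(filter_fn, y_past, num_samples=10_000):
    """Estimate F_t(y^{t-1})[s] = P( Xhat_t(y^{t-1} y, U_t) = s(y) for all y ),
    as in (64), by Monte Carlo over the randomization variable U_t."""
    counts = {s: 0 for s in MAPS}
    for _ in range(num_samples):
        u = random.random()
        # The same realization of U_t is used for every possible observation y.
        outputs = tuple(filter_fn(y_past + [y], u) for y in A)
        counts[outputs] += 1
    return {s: c / num_samples for s, c in counts.items()}

def predictor_to_filter(F_dist, y_t, u_t):
    """The inverse transformation (65): output a_j when u_t falls between the
    cumulative masses of the maps s with s(y_t) = a_i."""
    cum = 0.0
    for a in A:
        cum += sum(p for s, p in F_dist.items() if s[y_t] == a)
        if u_t < cum:
            return a
    return A[-1]

# Example: a randomized "say-what-you-see with probability 0.9" filter.
def example_filter(y_seq, u):
    y = y_seq[-1]
    return y if u < 0.9 else 1 - y

F = filter_to_predictor(example_filter, y_past=[0, 1])
print(F)  # ~0.9 mass on the identity map (0, 1), ~0.1 on the flip map (1, 0)
x_hat = predictor_to_filter(F, y_t=1, u_t=random.random())
```

Note how a single realization of $U_t$ couples the filter outputs across all possible observations $y$; this coupling is what makes the map-valued prediction space $M(S)$ in (64) the natural object.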
4 Scandiction of Noisy Data Arrays

In this section, we consider a scenario similar to that of Section 3, only now, for each $t$, the datum $Y_{\Psi_t}$ is not available for the estimation of $X_{\Psi_t}$; namely, $F_t = F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}})$, as opposed to $\tilde F_t = \tilde F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_t})$ in the filtering scenario. We refer to this scenario as "noisy scandiction", analogous to the noisy prediction problems discussed in [14] and [15].

We first assume that the joint probability distribution of the underlying field and the noisy observations, $Q$, is known, and examine the settings of Gaussian fields under squared error loss and binary fields under Hamming loss. In these cases, we characterize the noisy scandictability and the achieving scandictors in terms of the "clean" scandictability of the noisy data. We then consider universal scandiction for the noisy setting, and show that it is indeed possible for a finite scandictor set and for the class of all stationary binary fields corrupted by binary noise. Finally, we derive bounds on the excess loss when non-optimal scanners are used (yet, with the optimal predictor for each scan).

4.1 Noisy Scandictability

Throughout this section, it will be beneficial to also consider the clean scandictability as defined in [11, Definition 2], that is, when the scandictor is judged with respect to the same random field it observes. Thus, for $(X, Y)$ governed by the probability measure $Q$, $Q_Y$ denotes the marginal measure of $\{Y\}$, and therefore $U(l, Q_Y)$ refers to the clean scandictability of $Y$, i.e.,
\[
L_{(\Psi,F)}(y^B) = \sum_{t=1}^{|B|} l\left(y_{\Psi_t}, F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}})\right), \tag{67}
\]
and
\[
U(l, Q_Y) = \lim_{n \to \infty} \inf_{(\Psi,F)} E_{Q_Y} \frac{1}{|B|} L_{(\Psi,F)}(Y^B). \tag{68}
\]
As mentioned earlier, in this section we relate the noisy scandictability, $\bar U(l, Q)$, to the clean scandictability of the noisy field, $U(l, Q_Y)$. This relation can be used to derive bounds on $\bar U(l, Q)$ using the bounds on $U(l, Q_Y)$ derived in [11]. However, this should be done carefully. For example, the lower and upper bounds given in [11, Theorem 9] are applicable only when $X$ has an autoregressive representation (with respect to some scandictor) with independent innovations. Unfortunately, $Y = X + N$ does not necessarily have such a representation, and the bounds do not apply to $Y$ in a straightforward manner.³ Yet, a simple generalization of the lower bound in [11], valid for arbitrarily distributed random fields, can be derived using the same method used in the proof of Theorem 2. To this end, we briefly describe this generalization. Let
\[
B(P) = \min_{\hat y} \sum_{y} l(y, \hat y) P(y), \tag{69}
\]
and further define
\[
\gamma(d) = \max\{H(P) : B(P) \le d\}. \tag{70}
\]
Similarly to Section 3.1, denote by $\bar\gamma(d)$ the upper concave envelope of $\gamma(d)$.

Corollary 8. For any random field $Y^B$ and any scandictor $(\Psi, F)$ for $Y^B$,
\[
\bar\gamma\left(\frac{1}{|B|} E_{Q_B} L_{(\Psi,F)}(Y^B)\right) \ge \frac{1}{|B|} H(Y^B). \tag{71}
\]

³ Note that the restriction to autoregressive fields is merely technical, i.e., it facilitates the proof of the lower bound in the sense that a weak AEP-like theorem is required. The essence of the lower bound, however, which is a volume preservation argument, is valid for non-autoregressive fields as well.

Proof. The proof is similar to that of Theorem 2.
We have
\begin{align}
H(Y^B) = H(\Psi(Y^B)) &= \sum_{t=1}^{|B|} H\left(Y_{\Psi_t} \mid Y^{\Psi_{t-1}}\right) \nonumber\\
&\le \sum_{t=1}^{|B|} \sum_{y^{\Psi_{t-1}}} \gamma\left( E_{Q_B}\left[ l\left(Y_{\Psi_t}, F_t(y^{\Psi_{t-1}})\right) \,\Big|\, Y^{\Psi_{t-1}} = y^{\Psi_{t-1}} \right] \right) P(y^{\Psi_{t-1}}) \nonumber\\
&\le |B|\, \bar\gamma\left( \frac{1}{|B|} E_{Q_B} L_{(\Psi,F)}(Y^B) \right). \tag{72}
\end{align}

The lower bound in Corollary 8 strengthens the bound in [11, Theorem 9], since it applies to general loss functions and arbitrarily distributed random fields, and is non-asymptotic. When $A = \mathbb{R}$ and the loss function is of the form $l(x, F) = \rho(x - F)$, where $\rho(z)$ is monotonically increasing for $z > 0$, monotonically decreasing for $z < 0$, satisfies $\rho(0) = 0$ and $\int e^{-s\rho(z)} dz < \infty$ for every $s > 0$, the above bound coincides with that of [11]. In that case, $\bar\gamma(d) = \gamma(d)$, which is in turn the one-sided Fenchel-Legendre transform of the log moment generating function associated with $\rho$ (see [11, Section III] for the details). For example, when $\rho(z) = z^2$, we have $\gamma(d) = \frac{1}{2}\ln(2\pi e d)$, $d > 0$, and $\gamma^{-1}(h) = \frac{1}{2\pi e} e^{2h}$, $h > 0$. Similar results can be derived for the binary alphabet; thus, when $\rho(z)$ is the Hamming loss function, $\gamma(d) = h_b(d)$.

We now turn to discuss the noisy scandictability, $\bar U(l, Q)$. The following lemma, proved in Appendix A.2, describes the noisy scandictability for any additive white noise channel model and the squared error loss function, $l_s(\cdot)$, in terms of the clean scandictability of $Y$, and gives the optimal scandictor.

Lemma 9. Let $\{(X_t, Y_t)\}_{t \in \mathbb{Z}^2}$ be a random field governed by a probability measure $Q$ such that $Y_t = X_t + N_t$, where $N_t$, $t \in \mathbb{Z}^2$, are i.i.d. random variables with $\mathrm{Var}(N_t) = \sigma^2_N < \infty$. Then
\[
\bar U(l_s, Q) = U(l_s, Q_Y) - \sigma^2_N. \tag{73}
\]
Furthermore, $\bar U(l_s, Q)$ is achieved by the scandictor which achieves $U(l_s, Q_Y)$.

Actually, Lemma 9 is only scarcely related to scanning. It merely states that in the prediction of a process based on its noisy observations, under the additive model stated above and squared error loss, the optimal predictor is one which disregards the noise and attempts to predict the next noisy outcome. Similar results for binary processes through a BSC were given in [16] and will be discussed later.

Finally, we mention that the method used in the proof of Lemma 9 is specific to the squared error loss function. For a general loss function, one can use conditional expectation in order to compute the noisy scandictability, under a modified loss function $\rho$. Specifically, for a random field $X$, denote by $\sigma(X^{V_n})$ the smallest sigma-algebra with respect to which $X^{V_n}$ is measurable. Let $\Psi_n$ denote a scanner for $V_n$ and denote by $\mathcal{F}^{\Psi_n}_t$ the information available to the scandictor at the $t$'th step, that is,
\[
\mathcal{F}^{\Psi_n}_t = \sigma\left( Y_{\Psi_1}, Y_{\Psi_2(Y_{\Psi_1})}, \ldots, Y_{\Psi_t(Y_{\Psi_1}^{\Psi_{t-1}})} \right). \tag{74}
\]
Note that the set of sites $\Psi_1, \Psi_2, \ldots, \Psi_t$ is itself random, yet for each $t$, $\Psi_t$ is $\mathcal{F}^{\Psi_n}_{t-1}$-measurable (if $\Psi$ is random, namely, it uses additional independent random variables, the definition of $\mathcal{F}^{\Psi_n}_t$ is altered accordingly). Hence, the filtration $\{\mathcal{F}^{\Psi_n}_t\}_{t=1}^{|V_n|}$ represents the knowledge gathered by the scandictor. We have
\begin{align}
E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \rho(F_t - Y_{\Psi_t}) &= E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} E_{Q_{B_n}}\left\{ \rho(F_t - X_{\Psi_t} - N_{\Psi_t}) \,\Big|\, \mathcal{F}^\Psi_{t-1}, \sigma(X_{\Psi_t}) \right\} \nonumber\\
&= E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \tilde\rho(F_t - X_{\Psi_t}), \tag{75}
\end{align}
for some $\tilde\rho$.
Thus, if $l(X_{\Psi_t}, F_t)$ is the required loss function in the noisy prediction problem of $\{X\}$, one has to seek a function $\rho(\cdot, \cdot)$ such that $\tilde\rho(x_{\Psi_t}, F_t) = l(x_{\Psi_t}, F_t)$ for all $x_{\Psi_t}$ and $F_t$. If such a function is found, then surely $E\rho(Y_{\Psi_t}, F_t) = E l(X_{\Psi_t}, F_t)$, and the optimal scandictor for the noisy prediction problem is the one which is optimal for the clean prediction problem of $\{Y\}$ under $\rho$. While this is simple for the squared error loss function and additive noise (choose $\rho(y - F) = (y - F)^2 - \sigma^2_N$), or Hamming loss and a BSC (choose $\rho(y, F) = \frac{l_H(y, F) - \delta}{1 - 2\delta}$), it is not always the case for a general loss function. It is also important to note that in the case of white noise considered in this paper, the condition on the modified loss function $\rho$ can be stated as a single-letter expression; namely, if $l(X, F)$ is the required loss function for the noisy scandiction problem, $\rho$ should satisfy $E\{\rho(Y, F) \mid \sigma(X)\} = l(X, F)$.

4.1.1 Gaussian Random Fields

Let both $X$ and $N$ be Gaussian random fields, where the components of $N$ are i.i.d. and independent of $X$. That is, $Y$ is the output of an AWGN channel with a Gaussian input $X$. In this scenario, similarly to the clean one, the noisy scandictability is known exactly and is given by a single-letter expression. Before we proceed, several definitions are required. For any $t \in \mathbb{Z}^2$ and $V \subseteq \mathbb{Z}^2$, denote by $\hat X_t(V)$ the best linear predictor of $X_t$ given $\{X_{t'}\}_{t' \in V}$. A subset $S \subseteq \mathbb{Z}^2$ is called a half plane if it is closed under addition and satisfies $S \cup (-S) = \mathbb{Z}^2$ and $S \cap (-S) = \{0\}$. For example, $S_{\mathrm{lex}} = \{(m, n) \in \mathbb{Z}^2 : [m > 0] \text{ or } [m = 0, n \ge 0]\}$ is a half plane. Let $X$ be a wide sense stationary random field and denote by $g$ the density function associated with the absolutely continuous component in the Lebesgue decomposition of its spectral measure. Then, for any half plane $S$, we have [26, Theorem 1]
\[
E\left( X_0 - \hat X_0(-S \setminus \{0\}) \right)^2 = \exp\left\{ \frac{1}{4\pi^2} \int_{[0, 2\pi)^2} \ln g(\lambda)\, d\lambda \right\} \triangleq \sigma^2_u(X). \tag{76}
\]
We can now state the following corollary regarding the noisy scandictability in the Gaussian regime under squared error loss, which is a direct application of Lemma 9 and the results of [11, Section IV].

Corollary 10. Let $\{(X_t, Y_t)\}_{t \in \mathbb{Z}^2}$ be a random field governed by a probability measure $Q$ such that $Y_t = X_t + N_t$, where $X$ is a stationary Gaussian random field and $N_t$, $t \in \mathbb{Z}^2$, is an AWGN, independent of $\{X_t\}_{t \in \mathbb{Z}^2}$. Then, the noisy scandictability of $Q$ under the squared error loss is given by
\[
\bar U(l_s, Q) = \sigma^2_u(Y) - \sigma^2_N. \tag{77}
\]
Furthermore, $\bar U(l_s, Q)$ is asymptotically achieved by a scandictor which scans $(X_t, Y_t)$ according to the total order defined by any half plane $S$ and applies the corresponding best linear predictor for the next outcome of $Y$.

For any stationary Gaussian process $X$, it has been shown by Kolmogorov (see, for example, [27]) that the entropy rate is given by
\[
\bar H_X = \frac{1}{2}\ln(2\pi e) + \frac{1}{4\pi} \int_{-\pi}^{\pi} \ln g(\lambda)\, d\lambda. \tag{78}
\]
Thus, using the one-dimensional analogue of (76), for a stationary Gaussian process $X$ we have
\[
\bar H_X = \frac{1}{2}\ln\left(2\pi e\, \sigma^2_u(X)\right). \tag{79}
\]
In fact, (79) applies to stationary Gaussian random fields as well.
Thus, we have
\[
\bar U(l_s, Q) = \sigma^2_u(Y) - \sigma^2_N = \frac{1}{2\pi e} e^{2\bar H_Y} - \frac{1}{2\pi e} e^{2 H_N}, \tag{80}
\]
where $\bar H_Y$ is the entropy rate of $Y$ and $H_N$ is the entropy of each $N_t$. From the entropy power inequality [23, pp. 496], we have
\[
\frac{1}{2\pi e} e^{2\bar H_Y} \ge \frac{1}{2\pi e} e^{2 H_N} + \frac{1}{2\pi e} e^{2\bar H_X}, \tag{81}
\]
thus, as expected, the noisy scandictability given in Corollary 10 (and (80)) is at least as large as the clean scandictability of $X$, that is, the scandictability with no noise at all. In most of the interesting cases, however, (81) is a strict inequality. In fact, as mentioned in [28], (81) is achieved with equality only when both $X$ and $N$ are Gaussian and have proportional spectra. Consequently, unless $X$ is white, Corollary 10 is non-trivial.

4.1.2 Binary Random Fields

In this case, the results of [14] and [16] shed light on the optimal scandictor. Therein, it was shown that for a binary prediction problem, i.e., where $\{X_t\}$ is a binary source passed through a BSC with crossover probability $\delta < \frac{1}{2}$ and $\{Y_t\}$ is the channel output, the more likely outcome for the clean bit is also the more likely outcome for the noisy bit. Thus, the optimal predictor in the Hamming sense for the next clean bit (based on the noisy observations) might as well use the same strategy as if it were trying to predict the next noisy bit. Consequently, the optimal scandictor in the noisy setting is the one which is optimal for $\{Y\}$, and the results of [11, Section V] apply. The following proposition relates the scandictability of a binary noise-corrupted process $\{Y\}$, judged with respect to the clean binary process $\{X\}$, to its clean scandictability.

Proposition 11. Let $\{(X_t, Y_t)\}_{t \in \mathbb{Z}^2}$ be a binary random field governed by a probability measure $Q$ such that $\{Y_t\}$ is the output of a binary memoryless symmetric channel with crossover probability $\delta$ and input $\{X_t\}$. Then,
\[
\bar U(l_H, Q) = \frac{U(l_H, Q_Y) - \delta}{1 - 2\delta}, \tag{82}
\]
where $l_H$ is the Hamming loss function. Furthermore, $\bar U(l_H, Q)$ is achieved by the scandictor which achieves $U(l_H, Q_Y)$.

Note that indeed $U(l_H, Q_Y) \ge \delta$, as $Y$ is the output of a BSC with crossover probability $\delta$.

Proof (Proposition 11). Let $\{B_n\}_{n \ge 1}$ be any sequence of elements of $V$ satisfying $R(B_n) \to \infty$. We have
\begin{align}
\bar U(l_H, Q_{B_n}) &= \inf_{(\Psi,F) \in \mathcal{S}(B_n)} E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} l_H\left(X_{\Psi_t}, F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}})\right) \nonumber\\
&= \inf_{(\Psi,F) \in \mathcal{S}(B_n)} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} P\left(F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}}) \ne X_{\Psi_t}\right), \tag{83}
\end{align}
and, analogously,
\[
U(l_H, Q_{Y,B_n}) = \inf_{(\Psi,F) \in \mathcal{S}(B_n)} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} P\left(F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}}) \ne Y_{\Psi_t}\right). \tag{84}
\]
Denoting by $Z_t$ the channel noise at time $t$, and abbreviating $F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}})$ by $F_t$, we have
\begin{align}
P(F_t \ne Y_{\Psi_t}) &= P(F_t \ne Y_{\Psi_t}, Z_{\Psi_t} = 1) + P(F_t \ne Y_{\Psi_t}, Z_{\Psi_t} = 0) \nonumber\\
&= P(F_t = X_{\Psi_t}, Z_{\Psi_t} = 1) + P(F_t \ne X_{\Psi_t}, Z_{\Psi_t} = 0) \nonumber\\
&= \left(1 - P(F_t \ne X_{\Psi_t})\right)\delta + P(F_t \ne X_{\Psi_t})(1 - \delta). \tag{85}
\end{align}
Namely, for $\delta < \frac{1}{2}$, the optimal strategy for predicting $Y_{\Psi_t}$ based on $Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}}$ and the optimal strategy for predicting $X_{\Psi_t}$ based on $Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}}$ are identical, and, in addition,
\[
P(F_t \ne X_{\Psi_t}) = \frac{P(F_t \ne Y_{\Psi_t}) - \delta}{1 - 2\delta}. \tag{86}
\]
Substituting (86) into (83) and taking $n \to \infty$ completes the proof.
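The identity (86) at the heart of the proof is easy to check empirically. The following Python sketch (an illustration, not part of the proof) simulates a binary symmetric Markov source through a BSC and measures the error rate of a predictor of the next bit against both the clean and the noisy data; the "repeat the previous noisy bit" predictor is an arbitrary choice, and the two printed values should agree up to Monte Carlo error.

```python
# Monte Carlo check of (85)-(86): for any predictor F_t of the next bit based
# only on the noisy past, P(F != X) = (P(F != Y) - delta)/(1 - 2*delta).
import random

def simulate(pi=0.1, delta=0.1, n=200_000, seed=0):
    rng = random.Random(seed)
    x = 0          # current clean bit of the symmetric Markov source
    prev_y = 0     # previous noisy observation
    err_x = err_y = 0
    for _ in range(n):
        x = x ^ (rng.random() < pi)       # Markov source transition
        y = x ^ (rng.random() < delta)    # BSC output
        f = prev_y                        # an arbitrary predictor F_t
        err_x += (f != x)
        err_y += (f != y)
        prev_y = y
    print(f"P(F != X)                         ~ {err_x / n:.4f}")
    print(f"(P(F != Y) - delta)/(1 - 2 delta) ~ {(err_y / n - delta) / (1 - 2 * delta):.4f}")

simulate()
```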
4.2 Universal Scandiction in the Noisy Scenario

Section 4.1 dealt with the actual value of the best achievable performance in the noisy scandiction scenario. However, it is also interesting to investigate the universal setting, in which one seeks a predictor which does not depend on the joint probability measure of $\{(X, Y)\}$, yet performs asymptotically as well as one matched to this measure. The problem of universal scandiction in the noiseless scenario was dealt with in [12]. Herein, we show that it is possible to construct universal scandictors in the noisy setting as well (similar to universal scanning and filtering in Section 3.3). First, we show that it is possible to compete successfully with any finite set of scandictors, and present a universal scandictor for this setting. We then show that, with a proper choice of a set of scandictors, it is possible to (universally) achieve $\bar U(l, Q)$, i.e., the noisy scandictability, for any spatially stationary random field $(X, Y)$.

At the basis of the results of [12] stands the exponential weighting algorithm, originally derived by Vovk in [29]. In [29], Vovk considered a general set of experts and introduced the exponential weighting algorithm in order to compete with the best expert in the set. In this algorithm, each expert is assigned a weight according to its past performance. By decreasing the weights of poorly performing experts, hence preferring the ones proved to perform well thus far, one is able to compete with the best expert, having neither any a priori knowledge of the input sequence nor of which expert will perform the best. It is clear that the essence of this algorithm is the use of the cumulative losses incurred by each expert to construct a probability measure on the experts, which is later used to choose an expert for the next action. However, when the clean data $X$ is not known to the sequential algorithm, it is impossible to calculate the cumulative losses of the experts precisely. Nevertheless, as Weissman and Merhav show in [15], using an unbiased estimate $\hat X_t(Y^t)$ of $X_t$ results in sufficiently accurate estimates of the cumulative losses of the experts, which in turn can be used by the exponential weighting algorithm. Hence, the framework derived in [12] can be used to suggest universal scandictors for the noisy setting as well.

Consider a random field $(X^B, Y^B)$ where $X$ is binary and $Y$ is either binary (e.g., the output of a BSC whose input is $X$) or real-valued (e.g., $X$ through a Gaussian noise channel). For a loss function $l : \{0,1\} \times [0,1] \to [0,\infty]$ we define, similarly to [15],
\[
l_0(\cdot) \triangleq l(0, \cdot) \quad \text{and} \quad l_1(\cdot) \triangleq l(1, \cdot). \tag{87}
\]
Assume $(\Psi, F)$ is a scandictor for $B$. Then, for any $t \le |B|$, we have
\begin{align}
L_{(\Psi,F)}(x^B, y^B)_t &= \sum_{i=1}^{t} l\left(F_i(y_{\Psi_1}^{\Psi_{i-1}}), x_{\Psi_i}\right) \nonumber\\
&= \sum_{i=1}^{t} \left[ (1 - x_{\Psi_i})\, l_0(F_i(y_{\Psi_1}^{\Psi_{i-1}})) + x_{\Psi_i}\, l_1(F_i(y_{\Psi_1}^{\Psi_{i-1}})) \right]. \tag{88}
\end{align}
Clearly, $L_{(\Psi,F)}(x^B, y^B)_t$ depends on $x^B$ and is not known to the sequential algorithm. Let $h(y_{\Psi_i})$ be an unbiased estimate of $x_{\Psi_i}$. For example, when $Y$ is the output of a BSC with input $X$, we may choose
\[
h(y_{\Psi_i}) = \frac{y_{\Psi_i} - \delta}{1 - 2\delta}. \tag{89}
\]
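The choice (89) is exactly what makes the estimated losses below unbiased: a short check confirms that $E[h(Y) \mid X = x] = x$ under a BSC with crossover probability $\delta < 1/2$.

```python
# Sanity check for (89): under a BSC(delta), h(Y) = (Y - delta)/(1 - 2*delta)
# satisfies E[h(Y) | X = x] = x for x in {0, 1}.
def expected_h(x, delta):
    # Y = x with probability 1 - delta, Y = 1 - x with probability delta
    h = lambda y: (y - delta) / (1 - 2 * delta)
    return (1 - delta) * h(x) + delta * h(1 - x)

for delta in (0.05, 0.1, 0.25):
    assert all(abs(expected_h(x, delta) - x) < 1e-12 for x in (0, 1))
```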
Define
\[
\hat L_{(\Psi,F)}(y^B)_t = \sum_{i=1}^{t} \left[ (1 - h(y_{\Psi_i}))\, l_0(F_i(y_{\Psi_1}^{\Psi_{i-1}})) + h(y_{\Psi_i})\, l_1(F_i(y_{\Psi_1}^{\Psi_{i-1}})) \right], \tag{90}
\]
and
\begin{align}
\Delta_{(\Psi,F)}(x^B, y^B)_t &\triangleq L_{(\Psi,F)}(x^B, y^B)_t - \hat L_{(\Psi,F)}(y^B)_t \nonumber\\
&= \sum_{i=1}^{t} (h(y_{\Psi_i}) - x_{\Psi_i})\, l_0(F_i(y_{\Psi_1}^{\Psi_{i-1}})) + \sum_{i=1}^{t} (x_{\Psi_i} - h(y_{\Psi_i}))\, l_1(F_i(y_{\Psi_1}^{\Psi_{i-1}})). \tag{91}
\end{align}
Similarly to [15], we assume that the noise field $N^B$ has independent components and that for each $i \in B$, $Y_i \in \sigma(N_i)$, i.e., the noise component at site $i$ affects the observation at that site alone. In Appendix A.3, we show that for any image $x^B$ and any scandictor $(\Psi, F)$ for $B$, $\left(\Delta_{(\Psi,F)}(x^B, y^B)_t, \mathcal{F}^\Psi_t\right)$ is a zero-mean martingale. As a result, for any scandictor $(\Psi, F)$, image $x^B$ and $t$, we have
\[
E L_{(\Psi,F)}(x^B, Y^B)_t = E \hat L_{(\Psi,F)}(Y^B)_t, \tag{92}
\]
namely, $\hat L_{(\Psi,F)}(Y^B)_t$ is an unbiased estimator of $L_{(\Psi,F)}(x^B, Y^B)_t$.

The universal algorithm for scanning and prediction in the noisy scenario will thus use $\hat L_{(\Psi,F)}(Y^B)_t$ instead of $L_{(\Psi,F)}(x^B, Y^B)_t$, which is unknown. More specifically, similarly to the algorithm proposed in [12], the algorithm divides the data array to be scandicted into blocks of size $m(n) \times m(n)$, then scans the data in a (fixed) block-wise order, where each block is scandicted using a scandictor chosen at random from the scandictor set according to the distribution $\hat P_i\left(j \mid \{\hat L_{j,i}\}_{j=1}^{\lambda}\right)$,
\[
\hat P_i\left(j \mid \{\hat L_{j,i}\}_{j=1}^{\lambda}\right) = \frac{e^{-\eta \hat L_{j,i}}}{\sum_{j=1}^{\lambda} e^{-\eta \hat L_{j,i}}}, \tag{93}
\]
where $\hat L_{j,i} = \sum_{m=0}^{i-1} \hat L_{(\Psi,F)_j}(y^m)$ is the estimated cumulative loss of the scandictor $(\Psi, F)_j$ after scandicting $i$ blocks of data, when $(\Psi, F)_j$ is restarted after each block, and $\lambda$ is the cardinality of the set of scandictors, $\mathcal{F}_m$.⁴ Note the subscript $m$ in $\mathcal{F}_m$: in order to scandict a data array of size $n \times n$, the universal algorithm discussed herein uses the scandictors with which it competes, but only on blocks of size $m \times m$. A sketch of the weighting step appears at the end of this subsection. The following proposition gives an upper bound on the redundancy of the algorithm when competing with a finite set of scandictors, each operating block-wise on the data array.

Proposition 12. Let $E L_{\mathrm{alg}}(x^{V_n}, Y^{V_n})$ be the expected (with respect to the noisy random field as well as the randomization in the algorithm) cumulative loss of the proposed algorithm on $Y^{V_n}$, when the underlying clean array is $x^{V_n}$ and the noisy field has independent components with $Y_i \in \sigma(N_i)$ for each $i \in V_n \subset \mathbb{Z}^2$. Let $E L_{\min}(x^{V_n}, Y^{V_n})$ denote the expected cumulative loss of the best scandictor in $\mathcal{F}_m$, operating block-wise on $Y^{V_n}$. Assume $|\mathcal{F}_m| = \lambda$. Then
\[
E L_{\mathrm{alg}}(x^{V_n}, Y^{V_n}) - E L_{\min}(x^{V_n}, Y^{V_n}) \le \frac{m(n)\,(n + m(n))\, \sqrt{\ln \lambda}\; l_{\max}}{\sqrt{2}}. \tag{94}
\]

⁴ To be consistent with the notation of [12], the same notation is used for both a filtration and a scandictor set. The difference should be clear from the context.

Proof. By (92) and [12, Proposition 3], for any $x^{V_n}$ we have
\begin{align}
E L_{\mathrm{alg}}(x^{V_n}, Y^{V_n}) - \min_{(\Psi,F) \in \mathcal{F}_m} E L_{(\Psi,F)}(x^{V_n}, Y^{V_n}) &= E \hat L_{\mathrm{alg}}(Y^{V_n}) - \min_{(\Psi,F) \in \mathcal{F}_m} E \hat L_{(\Psi,F)}(Y^{V_n}) \nonumber\\
&\le E \hat L_{\mathrm{alg}}(Y^{V_n}) - E \min_{(\Psi,F) \in \mathcal{F}_m} \hat L_{(\Psi,F)}(Y^{V_n}) \nonumber\\
&= E\left[ \hat L_{\mathrm{alg}}(Y^{V_n}) - \min_{(\Psi,F) \in \mathcal{F}_m} \hat L_{(\Psi,F)}(Y^{V_n}) \right] \nonumber\\
&\le \frac{m(n)\,(n + m(n))\, \sqrt{\ln \lambda}\; l_{\max}}{\sqrt{2}}. \tag{95}
\end{align}
Proposition 12 is the basis for the main result of this subsection: a universal scandictor which competes successfully with any finite set of scandictors in the noisy scenario.

Theorem 13. Let $(X, Y)$ be a stationary random field with a probability measure $Q$. Assume that for each $i \in \mathbb{Z}^2$, $Y_i$ is the output of a memoryless channel whose input is $X_i$. Let $\mathcal{F} = \{\mathcal{F}_n\}$ be an arbitrary sequence of scandictor sets, where $\mathcal{F}_n$ is a set of scandictors for $V_n$ and $|\mathcal{F}_n| = \lambda < \infty$ for all $n$. Then, there exists a sequence of scandictors $\{(\hat\Psi, \hat F)_n\}$, independent of $Q$, for which
\[
\liminf_{n \to \infty} E_{Q_{V_n}} E \frac{1}{|V_n|} L_{(\hat\Psi,\hat F)_n}(X^{V_n}, Y^{V_n}) \le \liminf_{n \to \infty} \min_{(\Psi,F) \in \mathcal{F}_n} E_{Q_{V_n}} \frac{1}{|V_n|} L_{(\Psi,F)}(X^{V_n}, Y^{V_n}) \tag{96}
\]
for any $Q \in M_S(\Omega)$, where the inner expectation on the l.h.s. of (96) is due to the possible randomization in $(\hat\Psi, \hat F)_n$.

The proof of Theorem 13 follows the proof of [12, Theorem 2] verbatim.

It is now possible to show the existence of a universal scandictor for any stationary random field in the noisy scandiction setting. Herein, we include only the setting where $X$ is binary and $Y$ is the output of a BSC. In this case, the scandictor is twofold-universal; namely, it does not depend on the channel crossover probability either. Extending the results to real-valued noise is possible using the methods introduced in [16] (although the universal predictor does depend on the channel characteristics) and will be discussed later.

Theorem 14. Let $X$ be a stationary random field over a finite alphabet $A$, with a probability measure $Q$. Let $Y$ be the output of a BSC whose input is $X$ and whose crossover probability is $\delta$. Let the prediction space $D$ be either finite or bounded (with $l(x, F)$ then being Lipschitz in its second argument). Then, there exists a sequence of scandictors $\{(\Psi, F)_n\}$, independent of $Q$ and of $\delta$, for which
\[
\lim_{n \to \infty} E_{Q_{V_n}} E \frac{1}{|V_n|} L_{(\Psi,F)_n}(X^{V_n}, Y^{V_n}) = \bar U(l, Q) \tag{97}
\]
for any $Q \in M_S(\Omega)$, where the inner expectation on the l.h.s. of (97) is due to the possible randomization in $(\Psi, F)_n$.

Similarly to [16, Section A] and the proof of [12, Theorem 6], in the case of binary input and binary-valued noise it is possible to take the set of scandictors with which we compete as the set of all possible scandictors for an $m(n) \times m(n)$ block. The proof thus follows directly from the proof of [12, Theorem 6]. As for continuous-valued observations, it is quite clear that the set of all possible scandictors for an $m(n) \times m(n)$ block is far too rich to compete with (note that this is because the number of predictors is too large). A complete discussion is available in [16, Section B]. However, Weissman and Merhav do offer a method for successfully achieving the Bayes envelope in this setting, by introducing a much smaller set of predictors which, on the one hand, includes the best $k$th order Markov predictor, yet on the other hand is not too rich, in the sense that the redundancy of the exponential weighting algorithm tends to zero when competing with an $\epsilon$-grid of it. Since presenting a universal scandictor for this scenario would mainly repeat the many details discussed in [16], we do not include it here.
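For concreteness, the weighting step (93) itself is straightforward to implement. The sketch below shows only this step; the block partition, the restarting of the scandictors, and the computation of the estimated losses (90) are abstracted away, and `choose_expert` is an illustrative name rather than an implementation from [12].

```python
# A minimal sketch of the exponential weighting rule (93), driven by the
# *estimated* cumulative block losses Lhat_{j,i} of (90), which are computed
# from the noisy data alone via the unbiased estimate h of (89).
import math
import random

def choose_expert(est_losses, eta, rng=random):
    """Sample an expert index j with probability proportional to
    exp(-eta * Lhat_j), as in (93)."""
    m = min(est_losses)  # subtract the minimum for numerical stability
    w = [math.exp(-eta * (L - m)) for L in est_losses]
    r = rng.random() * sum(w)
    acc = 0.0
    for j, wj in enumerate(w):
        acc += wj
        if r < acc:
            return j
    return len(w) - 1

# est_losses[j] plays the role of Lhat_{j,i}: the estimated loss of
# scandictor j summed over the i blocks scandicted so far.
j = choose_expert([3.2, 2.9, 4.1], eta=0.5)
```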
4.3 Bounds on the Excess Loss for Non-Optimal Scandictors

Analogously to the scanning and filtering setting discussed in Section 3, and the clean prediction setting discussed in [12], it is interesting to investigate the excess loss incurred when non-optimal scandictors are used in the noisy scandiction setting. Unlike the scanning and filtering setting, where the excess loss bounds were not straightforward extensions of the results in [12], in the noisy scandiction scenario this problem can be quite easily tackled using the results of [12] and modified loss functions.

We briefly state the results of [12] in this context. The scenario considered therein is that of predicting the next outcome of a binary source, with $D = [0, 1]$ as the prediction space. $\phi_\rho$ denotes the Bayes envelope associated with the loss function $\rho$, i.e.,
\[
\phi_\rho(p) = \min_{q \in [0, 1]} \left[ (1 - p)\rho(0, q) + p\rho(1, q) \right]. \tag{98}
\]
Similarly to (54), define
\[
\epsilon_\rho = \min_{\alpha, \beta} \max_{0 \le p \le 1} \left| \alpha h_b(p) + \beta - \phi_\rho(p) \right|. \tag{99}
\]
Note that although the definitions of $\phi_\rho(p)$ and $\epsilon_\rho$ refer to the binary scenario, the result below holds for larger alphabets, with $\epsilon_\rho$ defined as in (99), with the maximum ranging over the simplex of all distributions on the alphabet, and with $h(p)$ (replacing $h_b(p)$) and $\phi_\rho(p)$ denoting the entropy and the Bayes envelope of the distribution $p$, respectively. In [12], it is shown that if $X^B$ is an arbitrarily distributed binary random field, then, for any scan $\Psi$,
\[
\left| \alpha_\rho \frac{1}{|B|} H(X^B) + \beta_\rho - E_{Q_B} \frac{1}{|B|} L_{(\Psi,F_{\mathrm{opt}})}(X^B) \right| \le \epsilon_\rho, \tag{100}
\]
where $\alpha_\rho$ and $\beta_\rho$ are the achievers of the minimum in (99). As mentioned earlier, if $\rho(Y, F)$ is some loss function for the "clean" prediction problem of $\{Y\}$, the noisy process, then
\[
E\{\rho(Y, F) \mid \sigma(X)\} = \tilde\rho(X, F) \tag{101}
\]
for some $\tilde\rho$. Assuming a suitable $\rho$ is found (i.e., $\tilde\rho = l$), we have, for any scan $\Psi$,
\[
\left| \alpha_\rho \frac{1}{|B|} H(Y^B) + \beta_\rho - \frac{1}{|B|} E_{Q_B} L^l_{\Psi,F_{\mathrm{opt}}}(X^B, Y^B) \right| = \left| \alpha_\rho \frac{1}{|B|} H(Y^B) + \beta_\rho - \frac{1}{|B|} E_{Q_B} L^\rho_{\Psi,F_{\mathrm{opt}}}(Y^B) \right| \le \epsilon_\rho, \tag{102}
\]
where $\frac{1}{|B|} E_{Q_B} L^l_{\Psi,F_{\mathrm{opt}}}(X^B, Y^B)$ is the normalized expected cumulative loss in optimally predicting $X_{\Psi_t}$ based on $Y_{\Psi_1}^{\Psi_{t-1}}$ under the loss function $l$, $\frac{1}{|B|} E_{Q_B} L^\rho_{\Psi,F_{\mathrm{opt}}}(Y^B)$ is the normalized expected cumulative loss in optimally predicting $Y_{\Psi_t}$ based on $Y_{\Psi_1}^{\Psi_{t-1}}$ under the loss function $\rho$, and $\alpha_\rho$ and $\beta_\rho$ are the minimizers of $\epsilon_\rho$ as defined in (99). Hence, the following corollary applies.

Corollary 15. Let $X^B$ be an arbitrarily distributed binary field. Assume a white noise, and denote the noisy version of $X^B$ by $Y^B$. Let $D = [0, 1]$ be the prediction space and $l : \{0,1\} \times D \to \mathbb{R}$ be any loss function. Then, for any scan $\Psi$,
\[
\left| \frac{1}{|B|} E_{Q_B} L^l_{\Psi,F_{\mathrm{opt}}}(X^B, Y^B) - \bar U(l, Q_B) \right| \le 2\epsilon_\rho, \tag{103}
\]
where $\rho$ is a loss function such that $E\{\rho(Y, F) \mid \sigma(X)\} = l(X, F)$ for any $F$.

Example 3 (BSC and Hamming Loss). In the case of binary input, a BSC with crossover probability $\delta$, and Hamming loss $l_H(\cdot, \cdot)$, it is not hard to show that
\[
\rho(y, F) = \frac{l_H(y, F) - \delta}{1 - 2\delta}. \tag{104}
\]
Hence,
\[
\phi_\rho(p) = \frac{\phi_{l_H}(p) - \delta}{1 - 2\delta} \tag{105}
\]
and
\[
\epsilon_\rho = \frac{1}{1 - 2\delta}\, \epsilon_{l_H}, \tag{106}
\]
where $\epsilon_{l_H} = 0.08$, as mentioned in [12].
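The constant $\epsilon_{l_H}$ can be evaluated numerically from (99): for Hamming loss, $\phi_{l_H}(p) = \min\{p, 1-p\}$, and for each fixed $\alpha$ the best $\beta$ simply centers the error $\alpha h_b(p) - \phi_{l_H}(p)$, leaving a one-dimensional search over $\alpha$. The following sketch (with natural-logarithm entropy, as used throughout the paper) should reproduce a value near $0.08$.

```python
# Numeric evaluation of eps_{l_H} = min_{alpha,beta} max_p |alpha*h_b(p) +
# beta - phi(p)| with phi(p) = min(p, 1-p). For fixed alpha, the optimal beta
# centers d(p) = alpha*h_b(p) - phi(p), so max|...| = (max d - min d)/2.
import math

def h_b(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

ps = [i / 10000 for i in range(10001)]
hs = [h_b(p) for p in ps]
phi = [min(p, 1 - p) for p in ps]

def max_err(alpha):
    d = [alpha * h - f for h, f in zip(hs, phi)]
    return (max(d) - min(d)) / 2

best = min(max_err(a / 1000) for a in range(1001))
print(f"eps_lH ~= {best:.4f}")   # comes out near 0.08
```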
The above bound on the excess loss can also be computed directly, without using Corollary 15, as for any scan $\Psi$ the normalized cumulative prediction errors are given by
\[
\frac{1}{|B|} E_{Q_B} L^{l_H}_{\Psi,F}(X^B, Y^B) = \frac{1}{|B|} \sum_{t=1}^{|B|} P\left(F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}}) \ne X_{\Psi_t}\right) \tag{107}
\]
for the noisy scenario, and
\[
\frac{1}{|B|} E_{Q_B} L^{l_H}_{\Psi,F}(Y^B) = \frac{1}{|B|} \sum_{t=1}^{|B|} P\left(F_t(Y_{\Psi_1}, \ldots, Y_{\Psi_{t-1}}) \ne Y_{\Psi_t}\right) \tag{108}
\]
for the (clean) prediction of $Y^B$. Hence, using (86), for any scan $\Psi$ we have
\begin{align}
\left| \frac{1}{|B|} E_{Q_B} L^{l_H}_{\Psi,F_{\mathrm{opt}}}(X^B, Y^B) - \bar U(l_H, Q_B) \right| &= \left| \frac{\frac{1}{|B|} E_{Q_B} L^{l_H}_{\Psi,F_{\mathrm{opt}}}(Y^B) - \delta}{1 - 2\delta} - \frac{U(l_H, Q_{Y,B}) - \delta}{1 - 2\delta} \right| \nonumber\\
&= \frac{1}{1 - 2\delta} \left| \frac{1}{|B|} E_{Q_B} L^{l_H}_{\Psi,F_{\mathrm{opt}}}(Y^B) - U(l_H, Q_{Y,B}) \right| \le \frac{2\epsilon_{l_H}}{1 - 2\delta}. \tag{109}
\end{align}

Example 4 (Additive Noise and Squared Error). Let $Y^B$ be the output of an additive channel, with $\sigma^2_v$ denoting the noise variance, and let $l_s$ be the squared error loss function. In this case,
\[
E\Big\{ \underbrace{(Y_{\Psi_t} - F_t(y^{\Psi_{t-1}}))^2 - \sigma^2_v}_{\rho(Y_{\Psi_t},\, F_t(y^{\Psi_{t-1}}))} \,\Big|\, \sigma(X_{\Psi_t}) \Big\} = \underbrace{(X_{\Psi_t} - F_t(y^{\Psi_{t-1}}))^2}_{l_s(X_{\Psi_t},\, F_t(y^{\Psi_{t-1}}))}. \tag{110}
\]
Thus, Corollary 15 applies with $\rho(Y, \hat Y) = (Y - \hat Y)^2 - \sigma^2_v$, and clearly $\epsilon_\rho = \epsilon_{l_s}$.

Note that although Corollary 15 is stated for the binary alphabet, it is not hard to generalize its result to larger alphabets, as mentioned in [12, Section 4].

4.3.1 Excess Loss Bounds via the Continuous Time Mutual Information

The bound on the excess noisy scandiction loss given in Corollary 15 was derived using the results of [12] and modified loss functions. However, new bounds can also be derived using the same method which was used in the proof of Theorem 3, namely, the scan invariance of the mutual information and the relation to the continuous time problem. We briefly discuss how such a bound can be derived for noisy scandiction of Gaussian fields corrupted by Gaussian noise.

Using the notation of Section 3.2.1, we have
\[
\mathrm{Var}(X) - \int_0^1 \mathrm{Var}(X_t \mid Y^t)\, dt = \sigma^2_X - \sigma^2_N \ln\left(1 + \frac{\sigma^2_X}{\sigma^2_N}\right) = \sigma^2_N\, g\!\left(\frac{\sigma^2_X}{\sigma^2_N}\right), \tag{111}
\]
where
\[
g(x) = x - \ln(1 + x). \tag{112}
\]
Since $\sigma^2_X \ge \mathrm{Var}\left(X_{\Psi_i} \mid Y_{\Psi_1}^{\Psi_{i-1}}\right)$ and $g(x)$ is monotonically increasing for $x > 0$, derivations similar to (27) lead to
\[
\frac{1}{n^2} E L_{(\Psi,F_{\mathrm{opt}})}(X^{V_n}, Y^{V_n}) \le \sigma^2_N\, g\!\left(\frac{\sigma^2_X}{\sigma^2_N}\right) + \frac{1}{n^2}\, 2\sigma^2_N\, I(X^{V_n}; Y^{V_n}). \tag{113}
\]
On the other hand, since $g(x) \ge 0$ for $x \ge 0$, we have
\[
\frac{1}{n^2} E L_{(\Psi,F_{\mathrm{opt}})}(X^{V_n}, Y^{V_n}) \ge \frac{1}{n^2}\, 2\sigma^2_N\, I(X^{V_n}; Y^{V_n}), \tag{114}
\]
which can now be viewed as the scanning and prediction analogue of [18, eq. (156b)]. We thus have the following corollary.

Corollary 16. Let $X^{V_n}$ be a Gaussian random field with a constant marginal distribution satisfying $\mathrm{Var}(X_i) = \sigma^2_X < \infty$ for all $i \in V_n$. Let $Y_i = X_i + N_i$, where $N^{V_n}$ is a white Gaussian noise of variance $\sigma^2_N$, independent of $X^{V_n}$. Then, for any two scans $\Psi_1$ and $\Psi_2$ and their optimal predictors, we have
\[
\frac{1}{n^2} \left| E L_{(\Psi_1,F_{\mathrm{opt}})}(X^{V_n}, Y^{V_n}) - E L_{(\Psi_2,F_{\mathrm{opt}})}(X^{V_n}, Y^{V_n}) \right| \le \sigma^2_N\, g\!\left(\frac{\sigma^2_X}{\sigma^2_N}\right). \tag{115}
\]

Similarly to Theorem 3, the bound in Corollary 16 has the form $\sigma^2_X \frac{g(\mathrm{SNR})}{\mathrm{SNR}}$; namely, it scales with the variance of the input. As expected, in the limit of low SNR, $\frac{g(\mathrm{SNR})}{\mathrm{SNR}} \to 0$, since regardless of the scan, one is clueless about the underlying clean symbol.
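A few numeric values of $g(\mathrm{SNR})/\mathrm{SNR}$ make the two limits concrete (a tiny illustration of (112) and (115); the SNR values are arbitrary):

```python
# The scan-excess bound of Corollary 16 is sigma_N^2 * g(SNR), equivalently
# sigma_X^2 * g(SNR)/SNR, with g(x) = x - ln(1 + x).
import math

g = lambda x: x - math.log(1 + x)
for snr in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"SNR = {snr:6.2f} : g(SNR)/SNR = {g(snr) / snr:.4f}")
# g(SNR)/SNR -> 0 as SNR -> 0, and -> 1 as SNR -> infinity.
```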
In fact, it is not surprising that this behavior is common to both the filtering and the prediction scenarios. In the former, the bound value is given by (23), while in the latter it is given by (111). In both cases, the bound value is simply the difference between a continuous time filtering problem and a discrete time filtering (or prediction, in (111)) problem. It is not hard to see that this difference tends to $0$ as $\mathrm{SNR} \to 0^+$. In the limit of high SNR, $\frac{g(\mathrm{SNR})}{\mathrm{SNR}} \to 1$. Indeed, this limit corresponds to the noiseless scandiction scenario, where scanning is consequential [11].

5 Conclusion

We investigated problems of sequential filtering and prediction of noisy multidimensional data arrays. A bound on the best achievable scanning and filtering performance was derived, and the excess loss incurred when non-optimal scanners are used was quantified. In the prediction setting, a relation between the best achievable performance and the clean scandictability was given. In both the filtering and prediction scenarios, special emphasis was given to the cases of AWGN and squared error loss, and BSC and Hamming loss.

Due to their sequential nature, the problems discussed in this paper are strongly related to the filtering and prediction problems where reordering of the data is not allowed (or where there is only one natural order in which to scan the data), such as the robust filtering and universal prediction problems discussed in the current literature. However, the numerous scanning possibilities in the multidimensional setting add a multitude of new challenges. In fact, many interesting problems remain open. It is clear that identifying the optimal scanning methods for the widely used input and channel models discussed herein is required, as the implementation of universal algorithms might be too complex in realistic situations. Moreover, tighter upper bounds on the excess loss can be derived in order to better understand the trade-offs between non-trivial scanning methods and the overall performance. Finally, by [11], the trivial scan is optimal for scandiction of noise-free Gaussian random fields. By Corollary 10 herein, this is also the case in scandiction of Gaussian fields corrupted by Gaussian noise. Whether the same holds for scanning and filtering of Gaussian random fields corrupted by Gaussian noise remains unanswered.

A Appendixes

A.1 Proof of Theorem 4

The proof resembles the proof of Theorem 3. However, the derivations leading to the analogue of (27) are slightly different. For any input field $X^{V_n}$, we have
\begin{align}
\frac{1}{n^2} E_{Q_{V_n}} L_{(\Psi,\tilde F_{\mathrm{opt}})}(X^{V_n}, Y^{V_n}) &= \frac{1}{n^2} \sum_{i=1}^{n^2} \mathrm{Var}\left(X_{\Psi_i} \mid Y_{\Psi_1}^{\Psi_i}\right) \nonumber\\
&= \frac{1}{n^2} \sum_{i=1}^{n^2} \Bigg\{ \int_0^1 \mathrm{Var}\left(X_{\Psi_i} \mid Y_{\Psi_1}^{\Psi_{i-1}}, \{Y^{(c)}_t\}_{t \in [i-1,\, i-1+t]}\right) dt \nonumber\\
&\qquad - \left[ \int_0^1 \mathrm{Var}\left(X_{\Psi_i} \mid Y_{\Psi_1}^{\Psi_{i-1}}, \{Y^{(c)}_t\}_{t \in [i-1,\, i-1+t]}\right) dt - \mathrm{Var}\left(X_{\Psi_i} \mid Y_{\Psi_1}^{\Psi_i}\right) \right] \Bigg\} \nonumber\\
&\stackrel{(a)}{\ge} \frac{1}{n^2} \sum_{i=1}^{n^2} \int_0^1 \mathrm{Var}\left(X_{\Psi_i} \mid Y_{\Psi_1}^{\Psi_{i-1}}, \{Y^{(c)}_t\}_{t \in [i-1,\, i-1+t]}\right) dt - f^*\left(X^{V_n}, \sigma^2_N\right) \nonumber\\
&= \frac{1}{n^2}\, 2\sigma^2_N\, I(X^{V_n}; Y^{V_n}) - f^*\left(X^{V_n}, \sigma^2_N\right), \tag{116}
\end{align}
where (a) results from the definition of $f^*(\cdot)$. The rest of the proof follows similarly to the proof of Theorem 3, since for any $X^{V_n}$ and $\sigma^2_N$ it is clear that $f^*(X^{V_n}, \sigma^2_N)$ is non-negative.

A.2 Proof of Lemma 9

Without loss of generality, we assume $E N_t = 0$.
From Proposition 1, we have
\[
\bar U(l_s, Q) = \lim_{n \to \infty} \inf_{(\Psi,F) \in \mathcal{S}(B_n)} E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \left( X_{\Psi_t} - F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}}) \right)^2. \tag{117}
\]
However, since $Y_{\Psi_t} = X_{\Psi_t} + N_{\Psi_t}$, and $N_{\Psi_t}$ is independent of $N_{\Psi_{t'}}$, $t' \ne t$, and of all $\{X_t\}$, we have
\begin{align}
E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \left( X_{\Psi_t} - F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}}) \right)^2 &= E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \left( Y_{\Psi_t} - F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}}) - N_{\Psi_t} \right)^2 \nonumber\\
&= E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \Big[ \left( Y_{\Psi_t} - F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}}) \right)^2 \nonumber\\
&\qquad - 2 N_{\Psi_t}\left( X_{\Psi_t} + N_{\Psi_t} - F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}}) \right) + N^2_{\Psi_t} \Big] \nonumber\\
&= E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \left( Y_{\Psi_t} - F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}}) \right)^2 - \sigma^2_N. \tag{118}
\end{align}
That is,
\[
\bar U(l_s, Q) = \lim_{n \to \infty} \inf_{(\Psi,F) \in \mathcal{S}(B_n)} E_{Q_{B_n}} \frac{1}{|B_n|} \sum_{t=1}^{|B_n|} \left( Y_{\Psi_t} - F_t(y_{\Psi_1}, \ldots, y_{\Psi_{t-1}}) \right)^2 - \sigma^2_N, \tag{119}
\]
which completes the proof.

A.3 The Martingale Property of $\left(\Delta_{(\Psi,F)}(x^B, y^B)_t, \mathcal{F}^\Psi_t\right)$

The proof follows that of [15, Lemma 1]. However, notice that, due to the data-dependent scanning, $\mathcal{F}^\Psi_t$ is not generated by a fixed set of random variables, that is, over a fixed set of sites, but by a set of $t$ random variables which may be different for each instantiation of the random field (as, for each $t$, $\Psi_t$ depends on $Y_{\Psi_1}^{\Psi_{t-1}}$). Yet, the expectation is always with respect to the random variables seen so far. By (91),
\[
\Delta_{(\Psi,F)}(x^B, y^B)_t = \sum_{i=1}^{t} (h(y_{\Psi_i}) - x_{\Psi_i})\, l_0(F_i(y_{\Psi_1}^{\Psi_{i-1}})) + \sum_{i=1}^{t} (x_{\Psi_i} - h(y_{\Psi_i}))\, l_1(F_i(y_{\Psi_1}^{\Psi_{i-1}})). \tag{120}
\]
Defining
\[
m_t \triangleq \sum_{i=1}^{t} (h(Y_{\Psi_i}) - x_{\Psi_i})\, l_0(F_i(Y_{\Psi_1}^{\Psi_{i-1}})), \tag{121}
\]
we have
\begin{align}
E\left\{ m_{t+1} \mid \mathcal{F}^\Psi_t \right\} &= E\left\{ \sum_{i=1}^{t+1} (h(Y_{\Psi_i}) - x_{\Psi_i})\, l_0(F_i(Y_{\Psi_1}^{\Psi_{i-1}})) \,\Big|\, \mathcal{F}^\Psi_t \right\} \nonumber\\
&= E\left\{ \left(h(Y_{\Psi_{t+1}}) - x_{\Psi_{t+1}}\right) l_0(F_{t+1}(Y_{\Psi_1}^{\Psi_t})) \,\Big|\, \mathcal{F}^\Psi_t \right\} + E\left\{ \sum_{i=1}^{t} (h(Y_{\Psi_i}) - x_{\Psi_i})\, l_0(F_i(Y_{\Psi_1}^{\Psi_{i-1}})) \,\Big|\, \mathcal{F}^\Psi_t \right\} \nonumber\\
&= E\left\{ h(Y_{\Psi_{t+1}}) - x_{\Psi_{t+1}} \,\Big|\, \mathcal{F}^\Psi_t \right\} l_0(F_{t+1}(Y_{\Psi_1}^{\Psi_t})) + \sum_{i=1}^{t} (h(Y_{\Psi_i}) - x_{\Psi_i})\, l_0(F_i(Y_{\Psi_1}^{\Psi_{i-1}})) \nonumber\\
&= E\left\{ h(Y_{\Psi_{t+1}}) - x_{\Psi_{t+1}} \right\} l_0(F_{t+1}(Y_{\Psi_1}^{\Psi_t})) + m_t \nonumber\\
&= m_t, \tag{122}
\end{align}
where the third equality is since $Y_{\Psi_{t_0}}$ is $\mathcal{F}^\Psi_t$-measurable for any $t_0 \le t$, the fourth is since $h(Y_{\Psi_{t+1}}) - x_{\Psi_{t+1}}$ is independent of $\mathcal{F}^\Psi_t$, and the fifth is since $h(Y_{\Psi_{t+1}})$ is an unbiased estimate of $x_{\Psi_{t+1}}$. Hence, $(m_t, \mathcal{F}^\Psi_t)$ is a zero-mean martingale (note that $E m_1 = 0$). Analogously, $\sum_{i=1}^{t} (x_{\Psi_i} - h(y_{\Psi_i}))\, l_1(F_i(y_{\Psi_1}^{\Psi_{i-1}}))$ is also a zero-mean martingale with respect to $\mathcal{F}^\Psi_t$, which completes the proof.

References

[1] B. Natarajan, K. Konstantinides, and C. Herley, "Occam filters for stochastic sources with application to digital images," IEEE Trans. Signal Processing, vol. 46, no. 5, pp. 1434-1438, May 1998.

[2] C.-H. Lamarque and F. Robert, "Image analysis using space-filling curves and 1D wavelet bases," Pattern Recognition, vol. 29, no. 8, pp. 1309-1322, 1996.

[3] A. Krzyzak, E. Rafajlowicz, and E. Skubalska-Rafajlowicz, "Clipped median and space filling curves in image filtering," Nonlinear Analysis, vol. 47, pp. 303-314, 2001.

[4] L. Velho and J. M. Gomes, "Digital halftoning with space filling curves," Computer Graphics, vol. 25, no. 4, pp. 81-90, July 1991.
[5] E. Skubalska-Rafajlowicz, "Pattern recognition algorithms based on space-filling curves and orthogonal expansions," IEEE Trans. Inform. Theory, vol. 47, no. 5, pp. 1915-1927, July 2001.

[6] T. Asano, D. Ranjan, T. Roos, E. Welzl, and P. Widmayer, "Space-filling curves and their use in the design of geometric data structures," Theoretical Computer Science, vol. 181, pp. 3-15, 1997.

[7] B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz, "Analysis of the clustering properties of the Hilbert space-filling curve," IEEE Trans. Knowledge and Data Engineering, vol. 13, no. 1, pp. 124-141, January/February 2001.

[8] A. Bogomjakov and C. Gotsman, "Universal rendering sequences for transparent vertex caching of progressive meshes," Computer Graphics Forum, vol. 21, no. 2, pp. 137-148, 2002.

[9] R. Niedermeier, K. Reinhardt, and P. Sanders, "Towards optimal locality in mesh-indexings," Discrete Applied Mathematics, vol. 117, pp. 211-237, 2002.

[10] A. Lempel and J. Ziv, "Compression of two-dimensional data," IEEE Trans. Inform. Theory, vol. IT-32, no. 1, pp. 2-8, January 1986.

[11] N. Merhav and T. Weissman, "Scanning and prediction in multidimensional data arrays," IEEE Trans. Inform. Theory, vol. 49, no. 1, pp. 65-82, January 2003.

[12] A. Cohen, N. Merhav, and T. Weissman, "Scanning and sequential decision making for multi-dimensional data - part I: the noiseless case," to appear in IEEE Trans. Inform. Theory.

[13] A. Cohen, Topics in Scanning of Multidimensional Data, Ph.D. thesis, Technion - Israel Institute of Technology, 2007.

[14] T. Weissman, N. Merhav, and A. Somekh-Baruch, "Twofold universal prediction schemes for achieving the finite-state predictability of a noisy individual binary sequence," IEEE Trans. Inform. Theory, vol. 47, no. 5, pp. 1849-1866, July 2001.

[15] T. Weissman and N. Merhav, "Universal prediction of individual binary sequences in the presence of noise," IEEE Trans. Inform. Theory, vol. 47, pp. 2151-2173, September 2001.

[16] T. Weissman and N. Merhav, "Universal prediction of random binary sequences in a noisy environment," Ann. Appl. Prob., vol. 14, no. 1, pp. 54-89, February 2004.

[17] T. E. Duncan, "On calculation of mutual information," SIAM Journal of Applied Mathematics, vol. 19, pp. 215-220, July 1970.

[18] D. Guo, S. Shamai, and S. Verdú, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Trans. Inform. Theory, vol. 51, no. 4, pp. 1261-1282, April 2005.

[19] T. Weissman, E. Ordentlich, M. Weinberger, A. Somekh-Baruch, and N. Merhav, "Universal filtering via prediction," IEEE Trans. Inform. Theory, vol. 53, no. 4, pp. 1253-1264, April 2007.

[20] T. Weissman, E. Ordentlich, G. Seroussi, S. Verdú, and M. Weinberger, "Universal discrete denoising: known channel," IEEE Trans. Inform. Theory, vol. 51, no. 1, pp. 5-28, January 2005.

[21] E. Ordentlich and T. Weissman, "On the optimality of symbol-by-symbol filtering and denoising," IEEE Trans. Inform. Theory, vol. 52, no. 1, pp. 19-40, January 2006.

[22] D. Guo, Gaussian Channels: Information, Estimation and Multiuser Detection, Ph.D. thesis, Princeton University, 2004.

[23] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.

[24] S. Verdú and T. Weissman, "The information lost in erasures," submitted to IEEE Trans. Inform. Theory, 2007.
[25] Y. Ephraim and N. Merhav, "Hidden Markov processes," IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1518-1569, June 2002.

[26] H. Helson and D. Lowdenslager, "Prediction theory and Fourier series in several variables," Acta Math., vol. 99, pp. 165-202, 1958.

[27] A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 2nd edition, 1984.

[28] N. M. Blachman, "The convolution inequality for entropy powers," IEEE Trans. Inform. Theory, vol. IT-11, pp. 267-271, April 1965.

[29] V. G. Vovk, "Aggregating strategies," Proc. 3rd Annu. Workshop on Computational Learning Theory, San Mateo, CA, pp. 372-383, 1990.
