A criterion for hypothesis testing for stationary processes

Given a finite-valued sample $X_1,...,X_n$ we wish to test whether it was generated by a stationary ergodic process belonging to a family $H_0$, or it was generated by a stationary ergodic process outside $H_0$. We require the Type I error of the tes…

Authors: Daniil Ryabko (INRIA Futurs, LIFL, INRIA Lille - Nord Europe)

Daniil Ryabko
INRIA Lille-Nord Europe, 40, Avenue Halley, 59650 Villeneuve d'Ascq, France
daniil@ryabko.net

Abstract

Given a finite-valued sample $X_1,\dots,X_n$ we wish to test whether it was generated by a stationary ergodic process belonging to a family $H_0$, or it was generated by a stationary ergodic process outside $H_0$. We require the Type I error of the test to be uniformly bounded, while the Type II error has to be made not more than a finite number of times with probability 1. For this notion of consistency we provide necessary and sufficient conditions on the family $H_0$ for the existence of a consistent test. This criterion is illustrated with applications to testing for membership in parametric families, generalizing some existing results. In addition, we analyze a stronger notion of consistency, which requires finite-sample guarantees on errors of both types, and provide some necessary and some sufficient conditions for the existence of a consistent test. We emphasize that no assumptions on the process distributions are made beyond stationarity and ergodicity.

Keywords: Hypothesis testing, stationary processes, ergodic processes, distributional distance.

1 Introduction

Given a sample $X_1,\dots,X_n$ (where the $X_i$ are from a finite alphabet $A$) that is known to be generated by a stationary ergodic process, we wish to decide whether it was generated by a distribution belonging to a certain family $H_0$, or by a stationary ergodic distribution that does not belong to $H_0$. Unlike most of the works on the subject, we do not assume that the $X_i$ are i.i.d., but only make the much weaker assumption that the distribution generating the sample is stationary ergodic.
A test is a function that takes a sample and an additional parameter $\alpha$ (the significance level), and gives a binary (possibly incorrect) answer: the sample was generated by a distribution from $H_0$, or by a stationary ergodic distribution not belonging to $H_0$. Here we are concerned with characterizing those families $H_0$ for which consistent tests exist.

We consider the following notion of consistency. Call a test consistent if, for any pre-specified level $\alpha \in (0,1)$, any sample size $n$ and any distribution in $H_0$, the probability of Type I error (the test says "not $H_0$") is not greater than $\alpha$, while for every stationary ergodic distribution from outside $H_0$ and every $\alpha$, Type II error (the test says $H_0$) is made only a finite number of times (as the sample size goes to infinity) with probability 1. This notion of consistency represents a classical statistical approach to the problem, and suits well situations where the hypothesis $H_0$ is considerably simpler than the alternative, for example when $H_0$ consists of just one distribution, or when it is some parametric family, or when it is the hypothesis of homogeneity or that of independence.

Prior work. There is a vast body of literature on hypothesis testing for i.i.d. (real- or discrete-valued) data (see e.g. [8]). In the context of discrete-valued i.i.d. data, the necessary and sufficient conditions for the existence of a consistent test are rather simple to obtain: there is a consistent test for $H_0$ (against "i.i.d. but not $H_0$") if and only if $H_0$ is closed, where the topology is that of the parameter space (probabilities of each symbol); see e.g. [4]. Consistency being easy to ensure, the prime concern for the case of i.i.d. data is optimality. There is, however, much less literature on hypothesis testing beyond i.i.d.
or parametric models, while the question of determining whether a consistent test exists (for different notions of consistency and different hypotheses) is much less trivial. For a weaker notion of consistency, namely, requiring that the test should stabilize on the correct answer for a.e. realization of the process (under either $H_0$ or the alternative), [7] constructs a consistent test for so-called constrained finite-state model classes (including finite-state Markov and hidden Markov processes), against the general alternative of stationary ergodic processes. For the same notion of consistency, [10] gives sufficient conditions on two families $H_0$ and $H_1$ that consist of stationary ergodic real-valued processes, under which a consistent continuous test exists, extending the results of [5] for i.i.d. data. The latter condition is that $H_0$ and $H_1$ are contained in disjoint $F_\sigma$ sets (countable unions of closed sets), with respect to the topology of weak convergence. For the notion of consistency that we consider, consistent tests for some specific hypotheses, but under the general alternative of stationary ergodic processes, have been proposed in [11, 12, 14], which address problems of testing identity, independence, estimating the order of a Markov process, and also the change point problem. Some impossibility results for testing hypotheses about stationary ergodic processes can be found in [9, 13].

The results. The aim of this work is to provide topological characterizations of the hypotheses for which consistent tests exist, for the case of stationary ergodic distributions. The obtained characterization is rather similar to those mentioned above for the case of i.i.d. data, but is with respect to the topology of distributional distance (or weak convergence).
The fact that necessary and sufficient conditions are obtained indicates that this topology is the right one to consider. A distributional distance between two process distributions is defined as a weighted sum of the absolute differences of the probabilities of all possible tuples $X \in A^*$, where $A$ is the alphabet and the weights are positive and have a finite sum. The main result is the following theorem (formalized in the next sections).

Theorem. There exists a consistent test for $H_0$ if and only if $H_0$ has probability 1 with respect to the ergodic decomposition of every distribution from the closure of $H_0$.

The test that we construct to establish this result is based on empirical estimates of distributional distance. For a given level $\alpha$, it takes the smallest $\varepsilon$-neighbourhood of the closure of $H_0$ that has probability not less than $1-\alpha$ with respect to every ergodic process in it, and outputs 0 if the sample falls into this neighbourhood, and 1 otherwise.

To illustrate the applicability of the main result, we show that the families of $k$-order Markov processes and $k$-state Hidden Markov processes (for any natural $k$) satisfy the conditions of the theorem, and therefore there exists a consistent test for membership in these families.

It should be emphasized that the results of this work concern what is possible in principle; finding an efficient testing procedure for each specific hypothesis for which we can demonstrate existence of a consistent test is a different problem.

2 Preliminaries

Let $A$ be a finite alphabet, and denote $A^*$ the set of words (or tuples) $\cup_{i=1}^{\infty} A^i$ and $A^\infty$ the set of all one-way infinite sequences. For a word $B \in A^*$ the symbol $|B|$ stands for the length of $B$. Distributions, or (stochastic) processes, are measures on the space $(A^\infty, \mathcal{F}_{A^\infty})$, where $\mathcal{F}_{A^\infty}$ is the Borel sigma-algebra of $A^\infty$.
Denote $\#(X,B)$ the number of occurrences of a word $B \in A^*$ in a word $X \in A^*$ and $\nu(X,B)$ its frequency:

$$\#(X,B) = \sum_{i=1}^{|X|-|B|+1} I_{\{(X_i,\dots,X_{i+|B|-1}) = B\}},$$

and

$$\nu(X,B) = \begin{cases} \frac{1}{|X|-|B|+1}\,\#(X,B) & \text{if } |X| \ge |B|, \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$

where $X = (X_1,\dots,X_{|X|})$. For example, $\nu(0001, 00) = 2/3$.

We use the abbreviation $X_{1..k}$ for $X_1,\dots,X_k$. A process $\rho$ is stationary if

$$\rho(X_{1..|B|} = B) = \rho(X_{t..t+|B|-1} = B)$$

for any $B \in A^*$ and $t \in \mathbb{N}$. Denote $\mathcal{S}$ the set of all stationary processes on $A^\infty$. A stationary process $\rho$ is called (stationary) ergodic if the frequency of occurrence of each word $B$ in a sequence $X_1, X_2, \dots$ generated by $\rho$ tends to its a priori (or limiting) probability a.s.: $\rho(\lim_{n\to\infty} \nu(X_{1..n}, B) = \rho(X_{1..|B|} = B)) = 1$. By virtue of the ergodic theorem (e.g. [3]), this definition can be shown to be equivalent to the standard definition of stationary ergodic processes (every shift-invariant set has measure 0 or 1; see e.g. [4]). Denote $\mathcal{E}$ the set of all stationary ergodic processes.

Definition 1 (distributional distance). The distributional distance is defined for a pair of processes $\rho_1, \rho_2$ as follows [6]:

$$d(\rho_1,\rho_2) = \sum_{k=1}^{\infty} w_k \left|\rho_1(X_{1..|B_k|} = B_k) - \rho_2(X_{1..|B_k|} = B_k)\right|,$$

where $w_k = 2^{-k}$ and $B_k$, $k \in \mathbb{N}$, range through the set $A^*$ of all words in length-lexicographical order (the weights and ordering are fixed for the sake of concreteness only).

It is easy to see that $d$ is a metric. Equipped with this metric, the space of all stochastic processes is separable and complete; moreover, it is compact. The set of stationary processes $\mathcal{S}$ is its convex closed subset (hence also compact). The set of all finite-memory stationary distributions is dense in $\mathcal{S}$.
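As a concrete illustration of definition (1) and of the length-lexicographic ordering used in Definition 1, the following Python sketch computes $\#(X,B)$ and $\nu(X,B)$ for words given as strings, and enumerates the words $B_1, B_2, \dots$ The function names `count_occurrences`, `frequency` and `words_length_lex`, and the representation of words as strings, are assumptions made for illustration only; they do not come from the paper.

```python
from fractions import Fraction
from itertools import count, islice, product


def count_occurrences(x, b):
    """#(X, B): number of (possibly overlapping) occurrences of word b in word x."""
    return sum(1 for i in range(len(x) - len(b) + 1) if x[i:i + len(b)] == b)


def frequency(x, b):
    """nu(X, B) as in (1); defined as 0 when b is longer than x."""
    if len(x) < len(b):
        return Fraction(0)
    return Fraction(count_occurrences(x, b), len(x) - len(b) + 1)


def words_length_lex(alphabet):
    """Yield all non-empty words over `alphabet` in length-lexicographic order."""
    for length in count(1):
        for letters in product(sorted(alphabet), repeat=length):
            yield "".join(letters)


# The example from the text: nu(0001, 00) = 2/3.
print(frequency("0001", "00"))  # 2/3

# B_1, ..., B_6 for the binary alphabet; B_k carries the weight w_k = 2^{-k}.
print(list(islice(words_length_lex("01"), 6)))
```

Exact rational arithmetic (`Fraction`) is used here only so that the example from the text is reproduced without rounding; floating point would serve equally well for larger samples.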
(Taking only those finite-memory distributions that have rational transition probabilities, we obtain a countable dense subset of $\mathcal{S}$.) The set $\mathcal{E}$ is not convex (a nontrivial mixture of distinct stationary ergodic distributions is always stationary but never ergodic) and is not closed (its closure is $\mathcal{S}$). We refer to [6] for more details and proofs of these facts.

When talking about closed and open subsets of $\mathcal{S}$ we assume the topology of $d$. Compactness of the set $\mathcal{S}$ is one of the main ingredients in the proofs of the main results. Another is that the distance $d$ can be consistently estimated, as Lemma 1 below shows.

Considering the Borel (with respect to the metric $d$) sigma-algebra $\mathcal{F}_{\mathcal{S}}$ on the set $\mathcal{S}$, we obtain a standard probability space $(\mathcal{S}, \mathcal{F}_{\mathcal{S}})$. An important tool that will be used in the analysis is the ergodic decomposition of stationary processes (see e.g. [6, 3]), which we recall here. Any stationary process can be expressed as a mixture of stationary ergodic processes; more formally, for any $\rho \in \mathcal{S}$ there is a measure $W_\rho$ on $(\mathcal{S}, \mathcal{F}_{\mathcal{S}})$ such that $W_\rho(\mathcal{E}) = 1$ and $\rho(B) = \int dW_\rho(\mu)\,\mu(B)$ for any $B \in \mathcal{F}_{A^\infty}$. The support of a stationary distribution $\rho$ is the minimal closed set $U \subset \mathcal{S}$ such that $W_\rho(U) = 1$.

A test is a function $\psi^\alpha : A^* \to \{0,1\}$ that takes as input a sample and a parameter $\alpha \in (0,1)$, and outputs a binary answer, where the answer 0 is interpreted as "the sample was generated by a distribution that belongs to $H_0$", and the answer 1 as "the sample was generated by a stationary ergodic distribution that does not belong to $H_0$." A test makes the Type I error if it says 1 while $H_0$ is true, and it makes the Type II error if it says 0 while $H_0$ is false.

Definition 2 (consistency).
Call a test $\psi^\alpha$, $\alpha \in (0,1)$, consistent as a test of $H_0$ against $H_1$ if:

(i) The probability of Type I error is always bounded by $\alpha$: $\rho\{X \in A^n : \psi^\alpha(X) = 1\} \le \alpha$ for every $\rho \in H_0$, every $n \in \mathbb{N}$ and every $\alpha \in (0,1)$, and

(ii) Type II error is made not more than a finite number of times with probability 1: $\rho(\lim_{n\to\infty} \psi^\alpha(X_{1..n}) = 1) = 1$ for every $\rho \in H_1$ and every $\alpha \in (0,1)$.

3 Main results

The test constructed below is based on empirical estimates of the distributional distance $d$:

$$\hat d(X_{1..n}, \rho) = \sum_{i=1}^{\infty} w_i \left|\nu(X_{1..n}, B_i) - \rho(B_i)\right|,$$

where $n \in \mathbb{N}$, $\rho \in \mathcal{S}$, $X_{1..n} \in A^n$. That is, $\hat d(X_{1..n}, \rho)$ measures the discrepancy between empirically estimated and theoretical probabilities. For a sample $X_{1..n} \in A^n$ and a hypothesis $H \subset \mathcal{E}$ define

$$\hat d(X_{1..n}, H) = \inf_{\rho \in H} \hat d(X_{1..n}, \rho).$$

Construct the test $\psi^\alpha_{H_0}$, $\alpha \in (0,1)$, as follows. For each $n \in \mathbb{N}$, $\delta > 0$ and $H \subset \mathcal{E}$ define the neighbourhood $b^n_\delta(H)$ of $n$-tuples around $H$ as

$$b^n_\delta(H) := \{X \in A^n : \hat d(X, H) \le \delta\}.$$

Moreover, let

$$\gamma_n(H, \theta) := \inf\{\delta : \inf_{\rho \in H} \rho(b^n_\delta(H)) \ge \theta\}$$

be the smallest radius of a neighbourhood around $H$ that has probability not less than $\theta$ with respect to every process in $H$, and let $C^n(H, \theta) := b^n_{\gamma_n(H,\theta)}(H)$ be a neighbourhood of this radius. Define

$$\psi^\alpha_{H_0}(X_{1..n}) := \begin{cases} 0 & \text{if } X_{1..n} \in C^n(\operatorname{cl} H_0 \cap \mathcal{E},\, 1-\alpha), \\ 1 & \text{otherwise.} \end{cases}$$

We will often omit the subscript $H_0$ from $\psi^\alpha_{H_0}$ when it can cause no confusion. The main result of this work is the following theorem, whose proof is given in Section 6.

Theorem 1. Let $H_0 \subset \mathcal{E}$. The following statements are equivalent:

(i) There exists a consistent test for $H_0$ against $\mathcal{E} \setminus H_0$.

(ii) The test $\psi^\alpha_{H_0}$ is consistent.

(iii) The set $H_0$ has probability 1 with respect to the ergodic decomposition of every $\rho$ in the closure of $H_0$: $W_\rho(H_0) = 1$ for each $\rho \in \operatorname{cl} H_0$.
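The empirical distance $\hat d$ is an infinite sum, so any computation has to truncate it. The following is a minimal Python sketch under stated assumptions: the sum is cut at words of length at most `max_len`, and the process $\rho$ is supplied as a function returning word probabilities (here through a hypothetical helper `iid_prob` for i.i.d. processes). Since each term is at most $w_i$, the neglected tail is bounded by the sum of the remaining weights, which the sketch returns alongside the estimate. The helper `distance_to_family` takes the minimum over a finite list of processes, as a stand-in for the infimum over $H$. The sketch does not compute the threshold $\gamma_n$ of the test $\psi^\alpha_{H_0}$, which requires the probabilities of neighbourhoods under every process in the hypothesis.

```python
from itertools import product


def frequency(x, b):
    """nu(X, B) from (1): relative frequency of word b in sample x."""
    if len(x) < len(b):
        return 0.0
    windows = len(x) - len(b) + 1
    hits = sum(1 for i in range(windows) if x[i:i + len(b)] == b)
    return hits / windows


def iid_prob(p):
    """Hypothetical helper: word probabilities of an i.i.d. process with symbol law p."""
    def prob(word):
        result = 1.0
        for symbol in word:
            result *= p[symbol]
        return result
    return prob


def empirical_distance(x, prob, alphabet, max_len):
    """Truncated d_hat(x, rho) and a bound on the neglected tail of the sum."""
    total, k = 0.0, 0
    for length in range(1, max_len + 1):
        for letters in product(sorted(alphabet), repeat=length):
            k += 1  # index of the word in length-lexicographic order
            word = "".join(letters)
            total += 2.0 ** (-k) * abs(frequency(x, word) - prob(word))
    tail_bound = 2.0 ** (-k)  # sum of w_i = 2^{-i} over all i > k
    return total, tail_bound


def distance_to_family(x, family, alphabet, max_len):
    """Truncated d_hat(x, H) for a finite list H of word-probability functions."""
    return min(empirical_distance(x, prob, alphabet, max_len)[0] for prob in family)


fair = iid_prob({"0": 0.5, "1": 0.5})
constant = iid_prob({"0": 1.0, "1": 0.0})

# A strictly alternating sample has fair single-symbol frequencies, but its
# pair frequencies differ from those of a fair coin, so the distance is positive.
print(empirical_distance("01" * 50, fair, "01", 2))

# An all-zero sample matches the constant process exactly.
print(distance_to_family("0" * 10, [fair, constant], "01", 3))
```

With weights $w_k = 2^{-k}$ the tail bound shrinks very quickly: for a binary alphabet and `max_len = 3` there are 14 words, so the truncation error is at most $2^{-14}$.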
4 Examples

The first simple illustration of Theorem 1 above is identity testing, or goodness of fit: testing whether a distribution generating the sample obeys a certain given law, versus it does not. Let $\rho \in \mathcal{E}$, $H_0 = \{\rho\}$. Since $H_0$ is closed (so that $\operatorname{cl} H_0 = \{\rho\}$ and, $\rho$ being ergodic, $W_\rho(H_0) = 1$), Theorem 1 implies that there is a consistent test for $H_0$. Identity testing is a classical problem of mathematical statistics, with solutions (e.g. based on Pearson's $\chi^2$ statistic) for i.i.d. data (e.g. [8]), and Markov chains [2]. For stationary ergodic processes, [12] gives a consistent test when $H_0$ has a finite and bounded memory, and [14] for the general case.

Another example is bounding the order of a Markov or a Hidden Markov process. Theorem 1 implies that for any given $k \in \mathbb{N}$ there is a consistent test of the hypothesis $\mathcal{M}_k$ = "the process is Markov of order not greater than $k$" (against $\mathcal{E} \setminus \mathcal{M}_k$). Moreover, there is a consistent test of $\mathcal{HM}_k$ = "the process is given by a Hidden Markov process with not more than $k$ states." Indeed, in both cases ($k$-order Markov, Hidden Markov with not more than $k$ states), the hypothesis $H_0$ is a parametric family, with a compact set of parameters, and a continuous function mapping parameters to processes (that is, to the space $\mathcal{S}$). The Weierstrass theorem then implies that the image of such a compact parameter set is closed (and compact). Moreover, in both cases $H_0$ is closed under taking ergodic decompositions. Thus, by Theorem 1, there exists a consistent test.

The problem of estimating the order of a (hidden) Markov process, based on a sample from it, was addressed in a number of works. In the context of hypothesis testing, consistent tests for $\mathcal{M}_k$ against $\mathcal{M}_t$ with $t > k$ were given in [1], see also [2].
For a weaker notion of consistency (the test has to stabilize on the correct answer eventually, with probability 1) the existence of a consistent test for $\mathcal{HM}_k$ was established in [7]. For the notion of consistency considered here, a consistent test for $\mathcal{M}_k$ was proposed in [11], while for the case of testing $\mathcal{HM}_k$ the result above is apparently new.

5 Uniform testing

Finally, let us consider a stronger notion of hypothesis testing, that requires uniform speed of convergence for errors of either type. A test $\varphi$ is called uniformly consistent if for every $\alpha$ there is an $n_\alpha \in \mathbb{N}$ such that for every $n \ge n_\alpha$ the probability of error on a sample of size $n$ is less than $\alpha$: $\rho(X \in A^n : \varphi(X) = i) < \alpha$ for every $\rho \in H_{1-i}$ and every $i \in \{0,1\}$.

For $H_0, H_1 \subset \mathcal{S}$, the uniform test $\varphi_{H_0,H_1}$ is constructed as follows. For each $n \in \mathbb{N}$ let

$$\varphi_{H_0,H_1}(X_{1..n}) := \begin{cases} 0 & \text{if } \hat d(X_{1..n}, \operatorname{cl} H_0 \cap \mathcal{E}) < \hat d(X_{1..n}, \operatorname{cl} H_1 \cap \mathcal{E}), \\ 1 & \text{otherwise.} \end{cases} \tag{2}$$

Theorem 2 (uniform testing). Let $H_0 \subset \mathcal{S}$ and $H_1 \subset \mathcal{S}$. If $W_\rho(H_i) = 1$ for every $\rho \in \operatorname{cl} H_i$ then the test $\varphi_{H_0,H_1}$ is uniformly consistent. Conversely, if there exists a uniformly consistent test for $H_0$ against $H_1$ then $W_\rho(H_{1-i}) = 0$ for any $\rho \in \operatorname{cl} H_i$.

The proof is given in the next section.

6 Proofs

The proofs of the main results will use the following lemmas.

Lemma 1 ($\hat d$ is consistent). Let $\rho, \xi \in \mathcal{E}$ and let a sample $X_{1..k}$ be generated by $\rho$. Then

$$\lim_{k\to\infty} \hat d(X_{1..k}, \xi) = d(\rho, \xi) \quad \rho\text{-a.s.}$$

The proof is based on the fact that the frequency of each word converges to its expectation. For each $\delta$ we can find a time by which the first $K(\delta)$ frequencies will have converged up to $\delta$, where $K(\delta)$ is such that the cumulative weight of the rest of the frequencies is smaller than $\delta$ too.

Proof. For any $\varepsilon > 0$ find an index $J$ such that $\sum_{i=J}^{\infty} w_i < \varepsilon/2$.
For each $j$ we have $\lim_{k\to\infty} \nu(X_{1..k}, B_j) = \rho(B_j)$ a.s., so that $|\nu(X_{1..k}, B_j) - \rho(B_j)| < \varepsilon/(2Jw_j)$ from some $k$ on; denote $K_j$ this $k$. Let $K = \max_{j \le J} K_j$; then for $k > K$ we have

$$\begin{aligned} |\hat d(X_{1..k}, \xi) - d(\rho, \xi)| &= \left|\sum_{i=1}^{\infty} w_i \Big(|\nu(X_{1..k}, B_i) - \xi(B_i)| - |\rho(B_i) - \xi(B_i)|\Big)\right| \\ &\le \sum_{i=1}^{\infty} w_i |\nu(X_{1..k}, B_i) - \rho(B_i)| \\ &\le \sum_{i=1}^{J} w_i |\nu(X_{1..k}, B_i) - \rho(B_i)| + \varepsilon/2 \\ &\le \sum_{i=1}^{J} w_i\, \varepsilon/(2Jw_i) + \varepsilon/2 = \varepsilon, \end{aligned}$$

which proves the statement.

Lemma 2 (smooth probabilities of deviation). Let $m > 2k > 1$, $\rho \in \mathcal{S}$, $H \subset \mathcal{S}$, and $\varepsilon > 0$. Then

$$\rho(\hat d(X_{1..m}, H) \ge \varepsilon) \le 2\varepsilon'^{-1} \rho(\hat d(X_{1..k}, H) \ge \varepsilon'), \tag{3}$$

where $\varepsilon' := \varepsilon - \frac{2k}{m-k+1} - t_k$, with $t_k$ being the sum of all the weights of tuples longer than $k$ in the definition of $d$: $t_k := \sum_{i : |B_i| > k} w_i$. Further,

$$\rho(\hat d(X_{1..m}, H) \le \varepsilon) \le 2\rho\left(\hat d(X_{1..k}, H) \le \frac{m}{m-k+1}\, 2\varepsilon + \frac{4k}{m-k+1}\right). \tag{4}$$

The meaning of this lemma is as follows. For any word $X_{1..m}$, if it is far away from (or close to) a given distribution $\mu$ (in the empirical distributional distance), then some of its shorter subwords $X_{i..i+k}$ are far from (close to) $\mu$ too. In other words, for a stationary distribution $\mu$, it cannot happen that a small sample is likely to be close to $\mu$, but a larger sample is likely to be far.

Proof. Let $B$ be a tuple such that $|B| < k$ and let $X_{1..m} \in A^m$ be any sample of size $m > 1$. The number of occurrences of $B$ in $X$ can be bounded by the number of occurrences of $B$ in subwords of $X$ of length $k$ as follows:

$$\#(X_{1..m}, B) \le \frac{1}{k-|B|+1} \sum_{i=1}^{m-k+1} \#(X_{i..i+k-1}, B) + 2k = \sum_{i=1}^{m-k+1} \nu(X_{i..i+k-1}, B) + 2k.$$

Indeed, summing over $i = 1..m-k$ the number of occurrences of $B$ in all $X_{i..i+k-1}$ we count each occurrence of $B$ exactly $k - |B| + 1$ times, except for those that occur in the first and last $k$ symbols.
Dividing by $m - |B| + 1$, and using the definition (1), we obtain

$$\nu(X_{1..m}, B) \le \frac{1}{m-|B|+1} \left(\sum_{i=1}^{m-k+1} \nu(X_{i..i+k-1}, B) + 2k\right). \tag{5}$$

Summing over all $B$, for any $\mu$, we get

$$\hat d(X_{1..m}, \mu) \le \frac{1}{m-k+1} \sum_{i=1}^{m-k+1} \hat d(X_{i..i+k-1}, \mu) + \frac{2k}{m-k+1} + t_k, \tag{6}$$

where in the right-hand side $t_k$ corresponds to all the summands in the left-hand side for which $|B| > k$, while for the rest of the summands we used $|B| \le k$. Since this holds for any $\mu$, we conclude that

$$\hat d(X_{1..m}, H) \le \frac{1}{m-k+1} \left(\sum_{i=1}^{m-k+1} \hat d(X_{i..i+k-1}, H)\right) + \frac{2k}{m-k+1} + t_k. \tag{7}$$

Note that $\hat d(X_{i..i+k-1}, H) \in [0,1]$. Therefore, for the average in the r.h.s. of (7) to be larger than $\varepsilon'$, at least $(\varepsilon'/2)(m-k+1)$ summands have to be larger than $\varepsilon'/2$. Using stationarity, we can conclude

$$\rho\left(\hat d(X_{1..k}, H) \ge \varepsilon'\right) \ge \frac{\varepsilon'}{2}\, \rho\left(\hat d(X_{1..m}, H) \ge \varepsilon\right),$$

proving (3). The second statement can be proven similarly; indeed, analogously to (5) we have

$$\begin{aligned} \nu(X_{1..m}, B) &\ge \frac{1}{m-|B|+1} \sum_{i=1}^{m-k+1} \nu(X_{i..i+k-1}, B) - \frac{2k}{m-|B|+1} \\ &\ge \frac{1}{m-k+1} \left(\frac{m-k+1}{m} \sum_{i=1}^{m-k+1} \nu(X_{i..i+k-1}, B)\right) - \frac{2k}{m}, \end{aligned}$$

where we have used $|B| \ge 1$. Summing over different $B$, we obtain (similar to (6))

$$\hat d(X_{1..m}, \mu) \ge \frac{1}{m-k+1} \sum_{i=1}^{m-k+1} \frac{m-k+1}{m}\, \hat d(X_{i..i+k-1}, \mu) - \frac{2k}{m} \tag{8}$$

(since the frequencies are non-negative, there is no $t_k$ term here). For the average in (8) to be smaller than $\varepsilon$, at least half of the summands must be smaller than $2\varepsilon$. Using stationarity of $\rho$, this implies (4).

Lemma 3. Let $\rho_k \in \mathcal{S}$, $k \in \mathbb{N}$, be a sequence of processes that converges to a process $\rho^*$. Then, for any $T \in A^*$ and $\varepsilon > 0$, if $\rho_k(T) > \varepsilon$ for infinitely many indices $k$, then $\rho^*(T) \ge \varepsilon$.

Proof. The statement follows from the fact that $\rho(T)$ is continuous as a function of $\rho$.

Proof of Theorem 1.
The implication (ii) ⇒ (i) is obvious. We will show (iii) ⇒ (ii) and (i) ⇒ (iii). To establish the former, we have to show that the family of tests $\psi^\alpha$ is consistent. By construction, for any $\rho \in \operatorname{cl} H_0 \cap \mathcal{E}$ we have $\rho(\psi^\alpha(X_{1..n}) = 1) \le \alpha$.

To prove the consistency of $\psi$, it remains to show that $\xi(\psi^\alpha(X_{1..n}) = 0) \to 0$ a.s. for any $\xi \in \mathcal{E} \setminus H_0$ and $\alpha > 0$. To do this, fix any $\xi \in \mathcal{E} \setminus H_0$ and let $\Delta := d(\xi, \operatorname{cl} H_0) := \inf_{\rho \in \operatorname{cl} H_0 \cap \mathcal{E}} d(\xi, \rho)$. Since $\operatorname{cl} H_0$ is closed, we have $\Delta > 0$. Suppose that there exists an $\alpha > 0$ such that, for infinitely many $n$, some samples from the $\Delta/2$-neighbourhood of $n$-samples around $\xi$ are sorted as $H_0$ by $\psi$, that is, $C^n(\operatorname{cl} H_0 \cap \mathcal{E}, 1-\alpha) \cap b^n_{\Delta/2}(\xi) \ne \emptyset$. Then for these $n$ we have $\gamma_n(\operatorname{cl} H_0 \cap \mathcal{E}, 1-\alpha) \ge \Delta/2$.

This means that there exists an increasing sequence $n_m$, $m \in \mathbb{N}$, and a sequence $\rho_m \in \operatorname{cl} H_0$, $m \in \mathbb{N}$, such that

$$\rho_m(\hat d(X_{1..n_m}, \operatorname{cl} H_0 \cap \mathcal{E}) > \Delta/2) > \alpha.$$

Using Lemma 2, (3) (with $\rho = \rho_m$, $m = n_m$, $k = n_k$, and $H = \operatorname{cl} H_0$), and taking $k$ large enough to have $t_{n_k} < \Delta/4$, for every $m$ large enough to have $\frac{2n_k}{n_m - n_k + 1} < \Delta/4$, we obtain

$$8\Delta^{-1} \rho_m\left(\hat d(X_{1..n_k}, \operatorname{cl} H_0) \ge \Delta/4\right) \ge \rho_m\left(\hat d(X_{1..n_m}, \operatorname{cl} H_0) \ge \Delta/2\right) > \alpha. \tag{9}$$

Thus,

$$\rho_m(b^{n_k}_{\Delta/4}(\operatorname{cl} H_0 \cap \mathcal{E})) < 1 - \alpha\Delta/8. \tag{10}$$

Since the set $\operatorname{cl} H_0$ is compact (as a closed subset of the compact set $\mathcal{S}$), we may assume (passing to a subsequence, if necessary) that $\rho_m$ converges to a certain $\rho^* \in \operatorname{cl} H_0$. Since (10) holds for infinitely many $m$, using Lemma 3 (with $T = b^{n_k}_{\Delta/4}(\operatorname{cl} H_0 \cap \mathcal{E})$) we conclude that

$$\rho^*(b^{n_k}_{\Delta/4}(\operatorname{cl} H_0 \cap \mathcal{E})) \le 1 - \Delta\alpha/8.$$

Since the latter inequality holds for infinitely many indices $k$ we also have

$$\rho^*(\limsup_{n\to\infty} \hat d(X_{1..n}, \operatorname{cl} H_0 \cap \mathcal{E}) > \Delta/4) > 0.$$
However, we must have $\rho^*(\lim_{n\to\infty} \hat d(X_{1..n}, \operatorname{cl} H_0 \cap \mathcal{E}) = 0) = 1$ for every $\rho^* \in \operatorname{cl} H_0$: indeed, for $\rho^* \in \operatorname{cl} H_0 \cap \mathcal{E}$ it follows from Lemma 1, and for $\rho^* \in \operatorname{cl} H_0 \setminus \mathcal{E}$ from Lemma 1, ergodic decomposition and the conditions of the theorem ($W_\rho(H_0) = 1$ for $\rho \in \operatorname{cl} H_0$).

This contradiction shows that for every $\alpha$ there are not more than finitely many $n$ for which $C^n(\operatorname{cl} H_0 \cap \mathcal{E}, 1-\alpha) \cap b^n_{\Delta/2}(\xi) \ne \emptyset$. To finish the proof of the implication, it remains to note that, as follows from Lemma 1,

$$\xi\{X_1, X_2, \dots : X_{1..n} \in b^n_{\Delta/2}(\xi) \text{ from some } n \text{ on}\} \ge \xi\left(\lim_{n\to\infty} \hat d(X_{1..n}, \xi) = 0\right) = 1.$$

To establish the implication (i) ⇒ (iii), we assume that there exists a consistent test $\psi$ for $H_0$, and we will show that $W_\rho(\mathcal{E} \setminus H_0) = 0$ for every $\rho \in \operatorname{cl} H_0$. Take $\rho \in \operatorname{cl} H_0$ and suppose that $W_\rho(\mathcal{E} \setminus H_0) = \delta > 0$. Writing $\psi^{\delta/2}_n$ for $\psi^{\delta/2}(X_{1..n})$, we have

$$\limsup_{n\to\infty} \int_{\mathcal{E} \setminus H_0} dW_\rho(\mu)\, \mu(\psi^{\delta/2}_n = 0) \le \int_{\mathcal{E} \setminus H_0} \limsup_{n\to\infty} dW_\rho(\mu)\, \mu(\psi^{\delta/2}_n = 0) = 0,$$

where the inequality follows from Fatou's lemma (the functions under the integral are all bounded by 1), and the equality from the consistency of $\psi$. Thus, from some $n$ on we will have $\int_{\mathcal{E} \setminus H_0} dW_\rho(\mu)\, \mu(\psi^{\delta/2}_n = 0) < \delta/4$, so that, since $W_\rho(H_0) = 1 - \delta$, we get $\rho(\psi^{\delta/2}_n = 0) < 1 - 3\delta/4$. For any set $T \subset A^n$ the function $\mu(T)$ is continuous as a function of $\mu$. In particular, this holds for the set $T := \{X_{1..n} : \psi^{\delta/2}_n(X_{1..n}) = 0\}$. Therefore, since $\rho \in \operatorname{cl} H_0$, for any $n$ large enough we can find a $\rho' \in H_0$ such that $\rho'(\psi^{\delta/2}_n = 0) < 1 - 3\delta/4$, which contradicts the consistency of $\psi$. Thus, $W_\rho(H_0) = 1$, and Theorem 1 is proven.

Proof of Theorem 2. To prove the first statement of the theorem, we will show that the test $\varphi_{H_0,H_1}$ is a uniformly consistent test for $\operatorname{cl} H_0 \cap \mathcal{E}$ against $\operatorname{cl} H_1 \cap \mathcal{E}$ (and hence for $H_0$ against $H_1$), under the conditions of the theorem.
Suppose that, on the contrary, for some $\alpha > 0$, for every $n' \in \mathbb{N}$ there is a process $\rho \in \operatorname{cl} H_0$ such that $\rho(\varphi(X_{1..n}) = 1) > \alpha$ for some $n > n'$. Define

$$\Delta := d(\operatorname{cl} H_0, \operatorname{cl} H_1) := \inf_{\rho_0 \in \operatorname{cl} H_0 \cap \mathcal{E},\ \rho_1 \in \operatorname{cl} H_1 \cap \mathcal{E}} d(\rho_0, \rho_1),$$

which is positive since $\operatorname{cl} H_0$ and $\operatorname{cl} H_1$ are closed and disjoint. We have

$$\begin{aligned} \alpha < \rho(\varphi(X_{1..n}) = 1) &\le \rho(\hat d(X_{1..n}, H_0) \ge \Delta/2 \text{ or } \hat d(X_{1..n}, H_1) < \Delta/2) \\ &\le \rho(\hat d(X_{1..n}, H_0) \ge \Delta/2) + \rho(\hat d(X_{1..n}, H_1) < \Delta/2). \end{aligned} \tag{11}$$

This implies that either $\rho(\hat d(X_{1..n}, \operatorname{cl} H_0) \ge \Delta/2) > \alpha/2$ or $\rho(\hat d(X_{1..n}, \operatorname{cl} H_1) < \Delta/2) > \alpha/2$, so that, by assumption, at least one of these inequalities holds for infinitely many $n \in \mathbb{N}$ for some sequence $\rho_n \in H_0$. Suppose that it is the first one, that is, there is an increasing sequence $n_i$, $i \in \mathbb{N}$, and a sequence $\rho_i \in \operatorname{cl} H_0$, $i \in \mathbb{N}$, such that

$$\rho_i(\hat d(X_{1..n_i}, \operatorname{cl} H_0) \ge \Delta/2) > \alpha/2 \quad \text{for all } i \in \mathbb{N}. \tag{12}$$

The set $\mathcal{S}$ is compact, hence so is its closed subset $\operatorname{cl} H_0$. Therefore, the sequence $\rho_i$, $i \in \mathbb{N}$, must contain a subsequence that converges to a certain process $\rho^* \in \operatorname{cl} H_0$. Passing to a subsequence if necessary, we may assume that this convergent subsequence is the sequence $\rho_i$, $i \in \mathbb{N}$, itself.

Using Lemma 2, (3) (with $\rho = \rho_{n_m}$, $m = n_m$, $k = n_k$, and $H = \operatorname{cl} H_0$), and taking $k$ large enough to have $t_{n_k} < \Delta/4$, for every $m$ large enough to have $\frac{2n_k}{n_m - n_k + 1} < \Delta/4$, we obtain

$$8\Delta^{-1} \rho_{n_m}\left(\hat d(X_{1..n_k}, \operatorname{cl} H_0) \ge \Delta/4\right) \ge \rho_{n_m}\left(\hat d(X_{1..n_m}, \operatorname{cl} H_0) \ge \Delta/2\right) > \alpha/2. \tag{13}$$

That is, we have shown that for any large enough index $n_k$ the inequality $\rho_{n_m}(\hat d(X_{1..n_k}, \operatorname{cl} H_0) \ge \Delta/4) > \Delta\alpha/16$ holds for infinitely many indices $n_m$. From this and Lemma 3 with $T = T_k := \{X : \hat d(X_{1..n_k}, \operatorname{cl} H_0) \ge \Delta/4\}$ we conclude that $\rho^*(T_k) > \Delta\alpha/16$. The latter holds for infinitely many $k$; that is, $\rho^*(\hat d(X_{1..n_k}, \operatorname{cl} H_0) \ge \Delta/4) > \Delta\alpha/16$ infinitely often.
Therefore,

$$\rho^*(\limsup_{n\to\infty} \hat d(X_{1..n}, \operatorname{cl} H_0) \ge \Delta/4) > 0.$$

However, we must have

$$\rho^*(\lim_{n\to\infty} \hat d(X_{1..n}, \operatorname{cl} H_0) = 0) = 1$$

for every $\rho^* \in \operatorname{cl} H_0$: indeed, for $\rho^* \in \operatorname{cl} H_0 \cap \mathcal{E}$ it follows from Lemma 1, and for $\rho^* \in \operatorname{cl} H_0 \setminus \mathcal{E}$ from Lemma 1, ergodic decomposition and the conditions of the theorem.

Thus, we have arrived at a contradiction that shows that $\rho_n(\hat d(X_{1..n}, \operatorname{cl} H_0) > \Delta/2) > \alpha/2$ cannot hold for infinitely many $n \in \mathbb{N}$ for any sequence of $\rho_n \in \operatorname{cl} H_0$. Analogously, we can show that $\rho_n(\hat d(X_{1..n}, \operatorname{cl} H_1) < \Delta/2) > \alpha/2$ cannot hold for infinitely many $n \in \mathbb{N}$ for any sequence of $\rho_n \in \operatorname{cl} H_0$. Indeed, using Lemma 2, equation (4), we can show that $\rho_{n_m}(\hat d(X_{1..n_m}, \operatorname{cl} H_1) \le \Delta/2) > \alpha/2$ for a large enough $n_m$ implies $\rho_{n_m}(\hat d(X_{1..n_k}, \operatorname{cl} H_1) \le 3\Delta/4) > \alpha/4$ for a smaller $n_k$. Therefore, if we assume that $\rho_n(\hat d(X_{1..n}, \operatorname{cl} H_1) < \Delta/2) > \alpha/4$ for infinitely many $n \in \mathbb{N}$ for some sequence of $\rho_n \in \operatorname{cl} H_0$, then we will also find a $\rho^*$ for which $\rho^*(\hat d(X_{1..n}, \operatorname{cl} H_1) \le 3\Delta/4) > \alpha/4$ for infinitely many $n$, which, using Lemma 1 and ergodic decomposition, can be shown to contradict the fact that $\rho^*(\lim_{n\to\infty} \hat d(X_{1..n}, \operatorname{cl} H_1) \ge \Delta) = 1$.

Thus, returning to (11), we have shown that from some $n$ on there is no $\rho \in \operatorname{cl} H_0$ for which $\rho(\varphi = 1) > \alpha$ holds true. The statement for $\rho \in \operatorname{cl} H_1$ can be proven analogously, thereby finishing the proof of the first statement.

To prove the second statement of the theorem, we assume that there exists a uniformly consistent test $\varphi$ for $H_0$ against $H_1$, and we will show that $W_\rho(H_{1-i}) = 0$ for every $\rho \in \operatorname{cl} H_i$. Indeed, let $\rho \in \operatorname{cl} H_0$, that is, suppose that there is a sequence $\xi_i \in H_0$, $i \in \mathbb{N}$, such that $\xi_i \to \rho$. Assume $W_\rho(H_1) = \delta > 0$ and take $\alpha := \delta/2$.
Since the test $\varphi$ is uniformly consistent, there is an $N \in \mathbb{N}$ such that for every $n > N$ we have

$$\rho(\varphi(X_{1..n}) = 0) \le \int_{H_1} \mu(\varphi(X_{1..n}) = 0)\, dW_\rho(\mu) + \int_{\mathcal{E} \setminus H_1} \mu(\varphi(X_{1..n}) = 0)\, dW_\rho(\mu) \le \delta\alpha + 1 - \delta \le 1 - \delta/2.$$

Recall that, for $T \in A^*$, $\mu(T)$ is a continuous function in $\mu$. In particular, this holds for the set $T = \{X \in A^n : \varphi(X) = 0\}$, for any given $n \in \mathbb{N}$. Therefore, for every $n > N$ and for every $i$ large enough, $\rho(\varphi(X_{1..n}) = 0) < 1 - \delta/2$ implies also $\xi_i(\varphi(X_{1..n}) = 0) < 1 - \delta/2$, which contradicts $\xi_i \in H_0$. This contradiction shows $W_\rho(H_1) = 0$ for every $\rho \in \operatorname{cl} H_0$. The case $\rho \in \operatorname{cl} H_1$ is analogous.

References

[1] T. Anderson, L. Goodman. Statistical inference about Markov chains. Ann. Math. Statist., Vol. 28(1), pp. 89-110, 1957.
[2] P. Billingsley. Statistical methods in Markov chains. Ann. Math. Statist., Vol. 32(1), pp. 12-40, 1961.
[3] P. Billingsley. Ergodic theory and information. Wiley, New York, 1965.
[4] I. Csiszár, P. Shields. Notes on Information Theory and Statistics: A tutorial. Foundations and Trends in Communications and Information Theory (1), pp. 1-111, 2004.
[5] A. Dembo, Y. Peres. A topological criterion for hypothesis testing. Ann. Math. Stat., Vol. 22, pp. 106-117, 1994.
[6] R. Gray. Probability, Random Processes, and Ergodic Properties. Springer Verlag, 1988.
[7] J.C. Kieffer. Strongly consistent code-based identification and order estimation for constrained finite-state model classes. IEEE Transactions on Information Theory, Vol. 39(3), pp. 893-902, 1993.
[8] E. Lehmann. Testing Statistical Hypotheses, 2nd edition. John Wiley & Sons, New York, 1986.
[9] G. Morvai, B. Weiss. On classifying processes. Bernoulli, vol. 11, no. 3, pp. 523-532, 2005.
[10] A. Nobel. Hypothesis testing for families of ergodic processes. Bernoulli, vol. 12(2), pp. 251-269, 2006.
[11] B. Ryabko, J.
Astola. Universal codes as a basis for nonparametric testing of serial independence for time series. Journal of Statistical Planning and Inference, Vol. 136(12), pp. 4119-4128, 2006.
[12] B. Ryabko, J. Astola, A. Gammerman. Application of Kolmogorov complexity and universal codes to identity testing and nonparametric testing of serial independence for time series. Theoretical Computer Science, v. 359, pp. 440-448, 2006.
[13] D. Ryabko. An impossibility result for process discrimination. In Proceedings of IEEE International Symposium on Information Theory (ISIT'09), pp. 1734-1738, Seoul, South Korea, 2009.
[14] D. Ryabko, B. Ryabko. On Hypotheses Testing for Ergodic Processes. In Proceedings of IEEE Information Theory Workshop (ITW'08), Porto, Portugal, pp. 281-283, 2008. See also http://arxiv.org/abs/0804.0510
