Limitations on intermittent forecasting
G. Morvai and B. Weiss: Limitations on intermittent forecasting. Appeared in: Statist. Probab. Lett. 72 (2005), no. 4, pp. 285–290.

Abstract. Bailey showed that the general pointwise forecasting for stationary and ergodic time series has a negative solution. However, it is known that for Markov chains the problem can be solved. Morvai showed that there is a stopping time sequence $\{\lambda_n\}$ such that $P(X_{\lambda_n+1}=1 \mid X_0,\dots,X_{\lambda_n})$ can be estimated from the samples $(X_0,\dots,X_{\lambda_n})$ in such a way that the difference between the conditional probability and the estimate vanishes along these stopping times for all stationary and ergodic binary time series. We will show that it is not possible to estimate the above conditional probability along a stopping time sequence for all stationary and ergodic binary time series in a pointwise sense such that, if the time series turns out to be a Markov chain, the predictor eventually predicts for all $n$.

Keywords: Nonparametric estimation, prediction theory, stationary and ergodic processes, finite order Markov chains

Mathematics Subject Classifications (2000): 62G05, 60G25, 60G10

1 Introduction and Statement of Results

Cover [2] posed the following fundamental problem concerning forecasting for stationary and ergodic binary time series $\{X_n\}_{n=-\infty}^{\infty}$. (Note that a stationary time series $\{X_n\}_{n=0}^{\infty}$ can be extended to a two-sided stationary time series $\{X_n\}_{n=-\infty}^{\infty}$.)

Problem 1 Is there an estimation scheme $f_n$ for the value $P(X_{n+1}=1 \mid X_0,X_1,\dots,X_n)$ such that $f_n$ depends solely on the data segment $(X_0,X_1,\dots,X_n)$ and
$$\lim_{n\to\infty} \left| f_n(X_0,X_1,\dots,X_n) - P(X_{n+1}=1 \mid X_0,X_1,\dots,X_n) \right| = 0$$
almost surely for all stationary and ergodic binary time series $\{X_n\}_{n=-\infty}^{\infty}$?
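For intuition about the Markov case discussed below: when the time series is a binary Markov chain of known order $k$, the conditional probability in Problem 1 depends only on the last $k$ bits, so it can be estimated by counting block frequencies. A minimal sketch of this idea (our own illustration, not from the paper; the function name and the fallback value for an unseen context are assumptions):

```python
# Illustrative sketch only: for a binary Markov chain of known order k,
# P(X_{n+1} = 1 | X_0, ..., X_n) depends only on the last k bits, so it
# can be estimated by empirical (block, next-bit) frequencies.
from collections import Counter

def estimate_next_prob(data, k):
    """Estimate P(next bit = 1 | last k bits) by counting (block, bit) pairs."""
    context = tuple(data[-k:])
    counts = Counter()
    for i in range(len(data) - k):
        counts[(tuple(data[i:i + k]), data[i + k])] += 1
    ones = counts[(context, 1)]
    total = ones + counts[(context, 0)]
    return ones / total if total else 0.5  # arbitrary fallback for an unseen context

# A deterministic alternating sequence is a Markov chain of order 1:
# a 1 is always followed by a 0, and the estimator recovers this exactly.
print(estimate_next_prob([0, 1] * 100, 1))  # 0.0: the last bit is 1, and 1 is always followed by 0
```

For ergodic chains of order $k$ these empirical frequencies converge almost surely, which is why the known-order case is easy; the whole difficulty treated in this paper is that the order (indeed, Markovianity itself) is not known.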
This problem was answered by Bailey [1] in a negative way, that is, he showed that there is no such scheme. (Also see Ryabko [10], Györfi, Morvai, Yakowitz [5] and Weiss [11].) Morvai [8] considered the following modification of Problem 1.

Problem 2 Are there a strictly increasing sequence of stopping times $\{\lambda_n\}$ and estimators $\{h_n(X_0,\dots,X_{\lambda_n})\}$ such that for all stationary ergodic binary time series $\{X_n\}$ the estimator $h_n$ is consistent at the stopping times $\lambda_n$, that is,
$$\lim_{n\to\infty} \left| h_n(X_0,\dots,X_{\lambda_n}) - P(X_{\lambda_n+1}=1 \mid X_0,\dots,X_{\lambda_n}) \right| = 0$$
almost surely?

Morvai [8] constructed a scheme that solves Problem 2. Unfortunately, his stopping times grow extremely rapidly, so that scheme is not practical at all.

Let $X^{*-}$ be the set of all one-sided binary sequences, that is,
$$X^{*-} = \{(\dots,x_{-1},x_0) : x_i \in \{0,1\} \text{ for all } -\infty < i \le 0\}.$$
Define the distance $d^*(\cdot,\cdot)$ on $X^{*-}$ as follows. Let
$$d^*((\dots,x_{-1},x_0),(\dots,y_{-1},y_0)) = \sum_{i=0}^{\infty} 2^{-i-1} |x_{-i} - y_{-i}|.$$

Definition The conditional probability $P(X_1=1 \mid \dots,X_{-1},X_0)$ is almost surely continuous if for some set $C \subseteq X^{*-}$ which has probability one, the conditional probability $P(X_1=1 \mid \dots,X_{-1},X_0)$ restricted to this set $C$ is continuous with respect to the metric $d^*(\cdot,\cdot)$.

The processes with almost surely continuous conditional probability generalize the processes for which it is actually continuous; these are essentially the Random Markov Processes of Kalikow [6], or the continuous g-measures studied by Mike Keane [7]. A more moderate growth (compared to Morvai [8]) was achieved by Morvai and Weiss [9], but the consistency was secured only for the subclass of all stationary and ergodic binary time series with almost surely continuous conditional probability $P(X_1=1 \mid \dots,X_{-1},X_0)$.
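The metric $d^*$ weights disagreements between two pasts geometrically, so two pasts are close exactly when they agree in a long recent block. A small sketch (our own illustration; truncating the sum to the coordinates at hand is an assumption, justified since the tail beyond $n$ coordinates contributes at most $2^{-n}$):

```python
# Sketch: d* compares two one-sided pasts coordinate by coordinate with
# geometrically decaying weights 2^{-i-1}.  Truncating to n coordinates
# changes the value by at most 2^{-n}, so finite truncations already
# determine the distance to that accuracy.

def d_star(x, y):
    """Approximate d* for pasts given as lists [x_0, x_{-1}, x_{-2}, ...]."""
    n = min(len(x), len(y))
    return sum(2 ** (-i - 1) * abs(x[i] - y[i]) for i in range(n))

# Two pasts agreeing in the two most recent bits but differing two steps back:
a = [1, 0, 1, 1]   # x_0 = 1, x_{-1} = 0, x_{-2} = 1, x_{-3} = 1
b = [1, 0, 0, 1]   # differs only in coordinate -2
print(d_star(a, b))  # 0.125 = 2^{-3}
```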
However, for the class of all stationary and ergodic Markov chains of some finite order, Problem 1 can be solved. Indeed, if the time series is a Markov chain of some finite order, we can estimate the order (e.g. as in Csiszár, Shields [3] and Csiszár [4]) and count frequencies of blocks with length equal to the order. Bailey showed that one cannot test for membership in this class.

It is conceivable that one can improve the result of Morvai [8] or Morvai and Weiss [9] so that if the process happens to be Markovian, then one eventually estimates at all times. Our purpose in this paper is to show that this is not possible. This puts some new restrictions on what can be achieved in estimating along stopping times.

Theorem 1 For any strictly increasing sequence of stopping times $\{\lambda_n\}$ such that for all stationary and ergodic binary Markov chains with arbitrary finite order, eventually $\lambda_{n+1} = \lambda_n + 1$, and for any sequence of estimators $\{h_n(X_0,\dots,X_{\lambda_n})\}$, there is a stationary and ergodic binary time series $\{X_n\}$ with almost surely continuous conditional probability $P(X_1=1 \mid \dots,X_{-1},X_0)$ such that
$$P\left( \limsup_{n\to\infty} \left| h_n(X_0,\dots,X_{\lambda_n}) - P(X_{\lambda_n+1}=1 \mid X_0,\dots,X_{\lambda_n}) \right| > 0 \right) > 0.$$

Remark: Bailey [1], among other things, proved that there is no sequence of functions $\{e_n(X_0^{n-1})\}$ which, for all stationary and ergodic time series, would eventually be 1 if the series turns out to be a Markov chain, and 0 otherwise. (That is, there is no test for the Markov property.) This result does not imply ours. On the other hand, our result implies Bailey's.
(Indeed, if there were a test for Markov chains in the above sense, we could apply the estimator in Morvai [8] or Morvai and Weiss [9] if the time series is not a Markov chain of some finite order; and if the time series is a Markov chain of some finite order, we could estimate the order of the Markov chain (e.g. as in Csiszár, Shields [3] or Csiszár [4]) and count frequencies of blocks with length equal to the order.)

Bailey [1] and Ryabko [10] proved less than our Theorem 1. They proved the nonexistence of the desired estimator when the estimator should work for all stationary and ergodic binary time series and when all $\lambda_n = n$, that is, when we always require good prediction.

2 Proof of Theorem 1

Proof: The proof mainly follows the footsteps of Ryabko [10] and Györfi, Morvai, Yakowitz [5], with alterations where necessary. For $m \le n$ let $X_m^n = (X_m,\dots,X_n)$.

First we define the same Markov chain as in Ryabko [10], which serves as the technical tool for the construction of our counterexample. Let the state space $S$ be the non-negative integers. From state 0 the process passes with certainty to state 1 and then to state 2 at the following epochs. From each state $s \ge 2$, the Markov chain passes either to state 0 or to state $s+1$ with equal probabilities 0.5. This construction yields a stationary and ergodic Markov chain $\{M_i\}$ with stationary distribution
$$P(M=0) = P(M=1) = \tfrac{1}{4} \quad\text{and}\quad P(M=i) = \tfrac{1}{2^i} \text{ for } i \ge 2.$$

Let $\psi_k$ denote the first occurrence time of state $2k$:
$$\psi_k = \min\{ i \ge 0 : M_i = 2k \}.$$
Note that if $M_0 = 0$ then $M_i \le 2k$ for $0 \le i \le \psi_k$.

For each $0 \le j < \infty$ we will define a binary-valued Markov chain $\{X_i^{(j)}\}$ with some finite order, which we denote as $X_i^{(j)} = f^{(j)}(M_i)$, where $f^{(j)}$ will be a $\{0,1\}$-valued function on the state space $S$.
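The auxiliary chain is easy to simulate, and a short run makes the stated stationary distribution plausible. A quick empirical sanity check (our own sketch, not part of the paper):

```python
# Empirical check of the auxiliary chain: from 0 the chain moves to 1 and
# then to 2 deterministically; from each s >= 2 it moves to 0 or to s + 1
# with probability 1/2 each.  Long-run state frequencies should approach
# P(M=0) = P(M=1) = 1/4 and P(M=i) = 2^{-i} for i >= 2.
import random
from collections import Counter

def step(s, rng):
    if s in (0, 1):
        return s + 1                      # deterministic moves 0 -> 1 -> 2
    return 0 if rng.random() < 0.5 else s + 1

rng = random.Random(0)
s, counts, n = 0, Counter(), 200_000
for _ in range(n):
    counts[s] += 1
    s = step(s, rng)

for state in range(5):
    print(state, counts[state] / n)       # roughly 0.25, 0.25, 0.25, 0.125, 0.0625
```

(Starting the simulation from state 0 rather than from the stationary distribution is an assumption of convenience; the chain is ergodic, so the long-run frequencies are unaffected.)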
We will also define a process $\{X_i\}$, which we denote as $X_i = f^{(\infty)}(M_i)$, where $f^{(\infty)}$ is also a binary-valued function on the state space $S$; the time series $\{X_i\}$ will serve as the stationary (non-Markov) unpredictable process.

For all $0 \le j \le \infty$, let $f^{(j)}(0) = 0$, $f^{(j)}(1) = 0$, and $f^{(j)}(s) = 1$ for all even states $s \ge 2$. Note that so far we have only defined $f^{(j)}$ partially; we will define the values for the remaining states later on. A feature of this definition of $f^{(j)}(\cdot)$ is that whenever $X_n^{(j)} = 0$, $X_{n+1}^{(j)} = 0$, $X_{n+2}^{(j)} = 1$, we know that $M_n = 0$, and vice versa.

Now observe that if for a certain $0 \le j \le \infty$ there is an index $K_j$ such that $f^{(j)}(i) = 1$ for all $i \ge K_j$, then the defined process $\{X_n^{(j)}\}$ is a binary Markov chain with order not greater than $K_j$. (Indeed, the probabilities $P(X_n^{(j)} = 1 \mid X_0^{(j)},\dots,X_{n-1}^{(j)})$ are determined by the last $K_j$ bits $(X_{n-K_j}^{(j)},\dots,X_{n-1}^{(j)})$. To see this, consider the following cases.

I. If for some $1 \le i \le K_j - 2$ we have $X_{n-i}^{(j)} = 1$ and $X_{n-1-i}^{(j)} = X_{n-2-i}^{(j)} = 0$, then we can detect that $M_{n-i} = 2$, $M_{n-1-i} = 1$ and $M_{n-2-i} = 0$, and the conditional probability does not depend on earlier values.

II. If there is no $1 \le i \le K_j - 2$ such that $X_{n-i}^{(j)} = 1$ and $X_{n-1-i}^{(j)} = X_{n-2-i}^{(j)} = 0$, we have three sub-cases.

II/1. If $X_{n-1}^{(j)} = 1$ then $M_{n-1} \ge K_j$. In this case the conditional probability is 0.5.

II/2. If $X_{n-2}^{(j)} = X_{n-1}^{(j)} = 0$ then $M_{n-1} = 1$ and the conditional probability is 1.

II/3. If $X_{n-2}^{(j)} = 1$ and $X_{n-1}^{(j)} = 0$ then $M_{n-1} = 0$, and so the conditional probability is 0.)

Now let $f^{(0)}(2k+1) = 1$ for all $k \ge 1$, so the function $f^{(0)}$ is fully defined. Since $f^{(0)}(i)$ is eventually 1, the defined process $\{X_i^{(0)}\}$ is a stationary ergodic binary Markov chain with some finite order.
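The key feature of the coding, that the output pattern $0,0,1$ starting at time $n$ occurs if and only if $M_n = 0$, can be checked on a simulated path. A sketch (our own illustration, using the fully defined $f^{(0)}$; the simulation helpers are ours, not from the paper):

```python
# Sketch: with f^{(0)} -- states 0 and 1 map to 0, all other states map
# to 1 -- the output window (0, 0, 1) starting at time n occurs iff
# M_n = 0: X_n = 0 forces M_n in {0, 1}, and M_n = 1 would force
# M_{n+1} = 2 and hence X_{n+1} = 1.
import random

def f0(s):
    return 0 if s in (0, 1) else 1        # f(0) = f(1) = 0, all other states -> 1

def step(s, rng):
    if s in (0, 1):
        return s + 1                      # deterministic moves 0 -> 1 -> 2
    return 0 if rng.random() < 0.5 else s + 1

rng = random.Random(1)
M = [0]
for _ in range(1000):
    M.append(step(M[-1], rng))
X = [f0(s) for s in M]

# Verify the equivalence at every time on the sample path.
for i in range(len(X) - 2):
    assert ((X[i], X[i + 1], X[i + 2]) == (0, 0, 1)) == (M[i] == 0)
print("pattern 001 <=> M_n = 0 on this path")
```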
For the function $f^{(j)}$ and index $2k$, if $f^{(j)}(i)$ is defined for all $0 \le i \le 2k$, then it is easy to see that if $M_0 = 0$ (that is, $f^{(j)}(M_0) = 0$, $f^{(j)}(M_1) = 0$, $f^{(j)}(M_2) = 1$), then $M_i \le 2k$ for $0 \le i \le \psi_k$ and the mapping
$$M_0^{\psi_k} \to (f^{(j)}(M_0),\dots,f^{(j)}(M_{\psi_k}))$$
is invertible. If we let $\lambda_n$ operate on the process $\{X_i^{(j)}\}$, define
$$A_j(k) = \{ M_0 = 0,\ \psi_k = \lambda_n(X_0^{(j)},X_1^{(j)},\dots) \text{ for some } n \}.$$
Thus, as soon as $f^{(j)}(i)$ is defined for all $0 \le i \le 2k$, the set $A_j(k)$ is also well defined; it is measurable with respect to $M_0^{\psi_k}$ and depends on the state $2k$ and on the index $j$, which selects the process $\{X_n^{(j)}\}$ on which the stopping times $\{\lambda_n\}$ operate.

Let $N_{-1} = 1$. Notice that $A_0(k)$ is well defined for all $k$. Now we define $f^{(j)}$ by induction. Assume that for $0 \le i \le j-1$ we have already defined a strictly increasing sequence of integers $N_{i-1}$ and functions $f^{(i)}$ which are eventually constant. Now we define $f^{(j)}$.

Since by assumption $\{X_n^{(j-1)}\}$ is a stationary and ergodic binary-valued Markov process with some finite order, eventually $\lambda_{n+1} = \lambda_n + 1$ on this process; hence on $\{M_0 = 0\}$ the stopping times eventually include every $\psi_k$, so $P(A_{j-1}(k)) \to P(M_0 = 0) = 1/4$ as $k \to \infty$, and there is an $N_{j-1} > N_{j-2}$ such that
$$P(A_{j-1}(N_{j-1})) > 1/8.$$
Now for each $j \le l \le \infty$ define $f^{(l)}(2m+1)$ on the segment $N_{j-2} \le m < N_{j-1}$ as follows:
$$f^{(l)}(2m+1) = f^{(j-1)}(2m+1).$$
Notice that now $A_j(N_{j-1})$ is well defined and coincides with $A_{j-1}(N_{j-1})$.

We will define $f^{(j)}(2N_{j-1}+1)$ maliciously. Let
$$B_j^+ = A_j(N_{j-1}) \cap \left\{ h_n(f^{(j)}(M_0),\dots,f^{(j)}(M_{\psi_{N_{j-1}}})) \ge \tfrac{1}{4} \right\}$$
and
$$B_j^- = A_j(N_{j-1}) \cap \left\{ h_n(f^{(j)}(M_0),\dots,f^{(j)}(M_{\psi_{N_{j-1}}})) < \tfrac{1}{4} \right\},$$
where $n$ is the (random) index with $\lambda_n = \psi_{N_{j-1}}$. Now notice that the sets $B_j^+$ and $B_j^-$ do not depend on the future values $f^{(j)}(2r+1)$ for $r \ge N_{j-1}$. One of the two sets $B_j^+$, $B_j^-$ has probability at least $1/16$.
Now we specify $f^{(j)}(2N_{j-1}+1)$. Let $f^{(j)}(2N_{j-1}+1) = 1$ and $I_j = B_j^-$ if $P(B_j^-) \ge P(B_j^+)$, and let $f^{(j)}(2N_{j-1}+1) = 0$ and $I_j = B_j^+$ if $P(B_j^-) < P(B_j^+)$.

Because of the construction of $\{M_i\}$, on the event $I_j$,
$$\begin{aligned}
P(X_{\psi_{N_{j-1}}+1}^{(j)} = 1 \mid X_0^{(j)},\dots,X_{\psi_{N_{j-1}}}^{(j)})
&= f^{(j)}(2N_{j-1}+1)\, P(X_{\psi_{N_{j-1}}+1}^{(j)} = f^{(j)}(2N_{j-1}+1) \mid X_0^{(j)},\dots,X_{\psi_{N_{j-1}}}^{(j)}) \\
&= f^{(j)}(2N_{j-1}+1)\, P(M_{\psi_{N_{j-1}}+1} = 2N_{j-1}+1 \mid M_0^{\psi_{N_{j-1}}}) \\
&= 0.5\, f^{(j)}(2N_{j-1}+1).
\end{aligned}$$
The difference between the estimate and the conditional probability is at least $\tfrac{1}{4}$ on the set $I_j$, and this event occurs with probability not less than $1/16$.

Now for all $m > N_{j-1}$ define $f^{(j)}(2m+1) = 1$. In this way, $\{X_i^{(j)}\}$ is also a stationary and ergodic binary-valued Markov chain. By induction, we have now defined all the functions $f^{(j)}$ for $0 \le j < \infty$. Since $f^{(\infty)}(m) = f^{(j)}(m) = f^{(j-1)}(m)$ for all $0 \le m \le 2N_{j-1}$, we have also defined $f^{(\infty)}$.

Finally, by Fatou's Lemma,
$$P\left( \limsup_{n\to\infty} \left\{ \left| h_n(X_0^{\lambda_n}) - P(X_{\lambda_n+1}=1 \mid X_0^{\lambda_n}) \right| \ge 1/4 \right\} \right) \ge P\left( \limsup_{j\to\infty} I_j \right) \ge \limsup_{j\to\infty} P(I_j) \ge \frac{1}{16}.$$

Concerning the conditional probability $P(X_1=1 \mid X_{-\infty}^0)$, observe that as soon as one finds the pattern '001' in the sequence $X_{-\infty}^0$, the conditional probability does not depend on earlier values. The probability of the occurrence of '001' in the past is one, since the original Markov chain is ergodic and our process is therefore also ergodic. Thus the conditional probabilities are almost surely continuous. The proof of Theorem 1 is complete.

References

[1] D. H. Bailey, Sequential Schemes for Classifying and Predicting Ergodic Processes. Ph.D. thesis, Stanford University, 1976.

[2] T. M. Cover, "Open problems in information theory," in 1975 IEEE Joint Workshop on Information Theory, pp. 35–36.
New York: IEEE Press, 1975.

[3] I. Csiszár and P. Shields, "The consistency of the BIC Markov order estimator," Annals of Statistics, vol. 28, pp. 1601–1619, 2000.

[4] I. Csiszár, "Large-scale typicality of Markov sample paths and the consistency of MDL order estimators," IEEE Transactions on Information Theory, vol. 48, pp. 1616–1628, 2002.

[5] L. Györfi, G. Morvai, and S. Yakowitz, "Limits to consistent on-line forecasting for ergodic time series," IEEE Transactions on Information Theory, vol. 44, pp. 886–892, 1998.

[6] S. Kalikow, "Random Markov processes and uniform martingales," Israel Journal of Mathematics, vol. 71, pp. 33–54, 1990.

[7] M. Keane, "Strongly mixing g-measures," Invent. Math., vol. 16, pp. 309–324, 1972.

[8] G. Morvai, "Guessing the output of a stationary binary time series," in Foundations of Statistical Inference (Eds. Y. Haitovsky, H. R. Lerche, Y. Ritov), Physica-Verlag, pp. 207–215, 2003.

[9] G. Morvai and B. Weiss, "Forecasting for stationary binary time series," Acta Applicandae Mathematicae, vol. 79, pp. 25–34, 2003.

[10] B. Ya. Ryabko, "Prediction of random sequences and universal coding," Problems of Inform. Trans., vol. 24, pp. 87–96, Apr.–June 1988.

[11] B. Weiss, Single Orbit Dynamics, American Mathematical Society, 2000.