INFERRING THE CONDITIONAL MEAN

Gusztáv Morvai and Benjamin Weiss

Abstract. Consider a stationary real-valued time series $\{X_n\}_{n=0}^{\infty}$ with a priori unknown distribution. The goal is to estimate the conditional expectation $E(X_{n+1}|X_0,\dots,X_n)$ based on the observations $(X_0,\dots,X_n)$ in a pointwise consistent way. It is well known that this is not possible at all values of $n$. We will estimate it along stopping times.

Appeared in: Theory Stoch. Process. 11 (2005), no. 1-2, 112–120.

1991 Mathematics Subject Classification: 62G05, 60G25, 60G10.
Key words and phrases: Nonparametric estimation, stationary processes.

Introduction and Statement of Results

Suppose the distribution of the real-valued stationary time series $\{X_n\}_{n=0}^{\infty}$ is not known a priori. The goal is to estimate the conditional expectation $E(X_{n+1}|X_0,\dots,X_n)$ from the data segment $X_0,\dots,X_n$ in such a way that the difference between the estimate and the conditional expectation tends to zero almost surely as the number of observations $n$ tends to infinity. This problem (for binary time series) was introduced in Cover (1975). When one is obliged to estimate for all $n$, Bailey (1976) and Ryabko (1988) proved the nonexistence of such a universal algorithm, even over the class of all stationary and ergodic binary time series. In a special case, for certain Gaussian processes, Schäfer (2002) constructed an algorithm which can estimate the conditional expectation at every time instant $n$. For further reading on related topics cf. Ornstein (1978), Algoet (1992), (1999), Morvai, Yakowitz and Algoet (1997), Morvai, Yakowitz and Györfi (1996), Györfi, Lugosi and Morvai (1999), Györfi and Lugosi (2002), Weiss (2000) and Györfi et al. (2002).

In this paper we do not require an estimate at every time instant $n$, but merely along a sequence of stopping times. That is, looking at the data segment $X_0,\dots,X_n$, our rule decides whether to estimate at this $n$ or not, but in any case we will estimate at infinitely many $n$. Algorithms of this kind were proposed for binary time series in Morvai (2003) and Morvai and Weiss (2003).

We will consider two-sided real-valued processes $\{X_n\}_{n=-\infty}^{\infty}$. A one-sided stationary time series $\{X_n\}_{n=0}^{\infty}$ can always be considered to be a two-sided stationary time series $\{X_n\}_{n=-\infty}^{\infty}$. Let $\Re$ be the set of all real numbers and let $\Re^{*-}$ denote the set of all one-sided sequences of real numbers, that is,
$\Re^{*-} = \{(\dots,x_{-1},x_0) : x_i \in \Re \text{ for all } -\infty < i \le 0\}$.
Define the metric $d^*(\cdot,\cdot)$ on $\Re^{*-}$ as
$d^*((\dots,x_{-1},x_0),(\dots,y_{-1},y_0)) = \sum_{i=0}^{\infty} 2^{-i-1} \frac{|x_{-i}-y_{-i}|}{1+|x_{-i}-y_{-i}|}.$

Definition. The conditional expectation $E(X_1|\dots,X_{-1},X_0)$ is almost surely continuous if for some set $B \subseteq \Re^{*-}$ which has probability one, the conditional expectation $E(X_1|\dots,X_{-1},X_0)$ restricted to this set $B$ is continuous with respect to the metric $d^*(\cdot,\cdot)$.

Now we introduce our algorithm. For notational convenience, let $X_m^n = (X_m,\dots,X_n)$, where $m \le n$. Define the nested sequence of partitions $\{P_k\}_{k=0}^{\infty}$ of the real line as follows. Let
$P_k = \{[i2^{-k},(i+1)2^{-k}) : i = 0,1,-1,2,-2,\dots\}.$
Let $x \to [x]_k$ denote the quantizer that assigns to any point $x \in \Re$ the unique interval in $P_k$ that contains $x$. Let $[X_m^n]_k = ([X_m]_k,\dots,[X_n]_k)$.

We define the stopping times $\{\lambda_n\}$ along which we will estimate. Set $\lambda_0 = 0$. For $n = 1,2,\dots$, define $\lambda_n$ recursively as
$\lambda_n = \lambda_{n-1} + \min\{t>0 : [X_t^{\lambda_{n-1}+t}]_n = [X_0^{\lambda_{n-1}}]_n\}. \qquad (1)$
Note that $\lambda_n \ge n$ and it is a stopping time on $[X_0^{\infty}]_n$.

Let $f_k : P_k \to \Re$ denote a function that assigns to any cell $A \in P_k$ a point in $A$. The $n$th estimate $m_n$ is defined as
$m_n = \frac{1}{n} \sum_{j=0}^{n-1} f_j([X_{\lambda_j+1}]_j). \qquad (2)$
Observe that $m_n$ depends solely on $[X_0^{\lambda_n}]_n$. This estimator can be viewed as a sampled version of the predictor in Morvai, Yakowitz and Györfi (1996), Weiss (2000), Algoet (1999) and Györfi et al. (2002).

Define the time series $\{\tilde X_n\}_{n=-\infty}^{0}$ as
$\tilde X_{-n} = \lim_{j\to\infty} X_{\lambda_j - n} \quad \text{for } n \ge 0, \qquad (3)$
where the limit exists since the intervals $\{[X_{\lambda_j - n}]_j\}_{j=n}^{\infty}$ are nested and their lengths tend to zero. Define the function $e : \Re^{*-} \to (-\infty,\infty)$ as
$e(x_{-\infty}^0) = E(X_1 | X_{-\infty}^0 = x_{-\infty}^0).$
We will prove the following theorem.

Theorem. Let $\{X_n\}$ be a real-valued stationary time series with $E(|X_0|^2) < \infty$. Then almost surely
$\lim_{n\to\infty} m_n = \lim_{n\to\infty} E(X_{\lambda_n+1} | [X_0^{\lambda_n}]_n) = e(\tilde X_{-\infty}^0)$
and
$\lim_{n\to\infty} \left( m_n - E(X_{\lambda_n+1} | [X_0^{\lambda_n}]_n) \right) = 0.$
Moreover, if in addition the conditional expectation $E(X_1|X_{-\infty}^0)$ is almost surely continuous, then almost surely
$\lim_{n\to\infty} \left( m_n - E(X_{\lambda_n+1} | X_0^{\lambda_n}) \right) = 0.$
Unfortunately, there is a stationary and ergodic Markov chain $\{X_n\}$ taking values from a countable subset of the unit interval such that
$P\left( \limsup_{n\to\infty} |m_n - E(X_{\lambda_n+1} | X_0^{\lambda_n})| > 0 \right) > 0.$

Remarks. Let $\{X_n\}$ be a real-valued stationary time series with $E(|X_0|^2) < \infty$. If the distribution of $X_0$ happens to concentrate on finitely many atoms, then $E(X_{\lambda_n+1}|[X_0^{\lambda_n}]_n) = E(X_{\lambda_n+1}|X_0^{\lambda_n})$ eventually, and so $|m_n - E(X_{\lambda_n+1}|X_0^{\lambda_n})| \to 0$ almost surely, without any continuity condition.
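To make the scheme concrete, here is a minimal Python sketch of the quantizer $[\cdot]_k$, the stopping times $\lambda_n$ of (1), and the estimates $m_n$ of (2) computed on a finite data segment. The function names are our own, taking $f_k$ to be the left end-point of a cell is just one admissible choice, and the sketch simply stops once the next stopping time would require data beyond the observed segment.

```python
import math

def quantize(x, k):
    """Cell index of x in the partition P_k = {[i*2^-k, (i+1)*2^-k)}."""
    return math.floor(x * 2 ** k)

def left_endpoint(cell, k):
    """One admissible choice of f_k: map a cell of P_k to its left end-point."""
    return cell * 2.0 ** (-k)

def estimates(xs):
    """Stopping times lambda_n of (1) and estimates m_n of (2) from a finite
    data segment xs = (X_0, ..., X_{N-1}).  Stops once the next stopping time
    would need data beyond the end of xs."""
    lambdas = [0]                                  # lambda_0 = 0
    while True:
        n, lam = len(lambdas), lambdas[-1]         # we are defining lambda_n
        pattern = [quantize(x, n) for x in xs[:lam + 1]]
        t = next((t for t in range(1, len(xs) - lam)
                  if [quantize(x, n) for x in xs[t:lam + t + 1]] == pattern),
                 None)                             # min{t>0: [X_t^{lam+t}]_n = [X_0^{lam}]_n}
        if t is None:                              # ran out of data
            break
        lambdas.append(lam + t)
    ms, acc = [], 0.0
    for j in range(len(lambdas) - 1):              # lambda_{j+1} >= lambda_j + 1,
        acc += left_endpoint(quantize(xs[lambdas[j] + 1], j), j)  # so X_{lambda_j+1} was observed
        ms.append(acc / (j + 1))                   # m_{j+1} of (2)
    return lambdas, ms
```

For the period-two sequence $0,1,0,1,\dots$ the stopping times come out as $\lambda_n = 2n$, and every estimate equals $1$, the value that always follows the sampled data segments.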
Let $\{X_n\}$ be a real-valued stationary time series with $E(|X_0|^2) < \infty$. If one knows in advance that the distribution of $X_0$ concentrates on finitely or countably infinitely many atoms, then one may omit the partition $P_k$, the quantizer $[\cdot]_k$ and the function $f_k(\cdot)$ entirely. That is, one may define $\lambda'_0 = 0$ and for $n = 1,2,\dots$ set
$\lambda'_n = \lambda'_{n-1} + \min\{t>0 : X_t^{\lambda'_{n-1}+t} = X_0^{\lambda'_{n-1}}\}$
and
$m'_n = \frac{1}{n} \sum_{j=0}^{n-1} X_{\lambda'_j+1}.$
Then $\lim_{n\to\infty} (m'_n - E(X_{\lambda'_n+1}|X_0^{\lambda'_n})) = 0$ almost surely, without any continuity condition. In particular, $m'_n$ works for the counterexample process in the third part of the Theorem. The counterexample Markov chain in the third part of the Theorem of course will not possess an almost surely continuous conditional expectation $E(X_1|X_{-\infty}^0)$.

From the proofs of Bailey (1976), Ryabko (1988) and Györfi, Morvai, Yakowitz (1998) it is clear that even for the class of all stationary and ergodic binary time series with almost surely continuous conditional expectation $E(X_1|X_{-\infty}^0)$, one cannot estimate $E(X_{n+1}|X_0^n)$ for all $n$ in a pointwise consistent way.

Proofs

It will be useful to define further processes $\{\hat X^{(k)}_n\}_{n=-\infty}^{\infty}$ for $k \ge 0$ as follows. Let
$\hat X^{(k)}_{-n} = X_{\lambda_k - n} \quad \text{for } -\infty < n < \infty. \qquad (4)$
For an arbitrary real-valued stationary time series $\{Y_n\}$, let $\hat\lambda_0(Y_{-\infty}^0) = 0$ and for $n \ge 1$ define
$\hat\lambda_n(Y_{-\infty}^0) = \hat\lambda_{n-1}(Y_{-\infty}^0) - \min\{t>0 : [Y_{\hat\lambda_{n-1}-t}^{-t}]_n = [Y_{\hat\lambda_{n-1}}^0]_n\}.$
Let $T$ denote the left shift operator, that is, $(Tx_{-\infty}^{\infty})_i = x_{i+1}$. It is easy to see that if $\lambda_n(x_{-\infty}^{\infty}) = l$ then $\hat\lambda_n(T^l x_{-\infty}^{\infty}) = -l$.

Proof of the Theorem.

Step 1. We show that for arbitrary $k \ge 0$, the time series $\{\hat X^{(k)}_n\}_{n=-\infty}^{\infty}$ and $\{X_n\}_{n=-\infty}^{\infty}$ have identical distributions. It is enough to show that for all $k \ge 0$, $m \ge n \ge 0$, and Borel sets $F \subseteq \Re^{n+1}$,
$P((\hat X^{(k)}_{m-n},\dots,\hat X^{(k)}_m) \in F) = P(X_{m-n}^m \in F).$
This is immediate by the stationarity of $\{X_n\}$ and by the fact that for all $k \ge 0$, $m \ge n \ge 0$, $l \ge 0$, $F \subseteq \Re^{n+1}$,
$T^l \{X_{\lambda_k+m-n}^{\lambda_k+m} \in F,\ \lambda_k = l\} = \{X_{m-n}^m \in F,\ \hat\lambda_k(X_{-\infty}^0) = -l\}.$

Step 2. We show that for $k \ge 0$, almost surely,
$\hat\lambda_k(\dots,\hat X^{(k)}_{-1},\hat X^{(k)}_0) = \hat\lambda_k(\tilde X_{-\infty}^0)$
and
$[\tilde X_{\hat\lambda_k(\tilde X_{-\infty}^0)}^0]_{k+1} = [\hat X^{(k)}_{\hat\lambda_k(\dots,\hat X^{(k)}_{-1},\hat X^{(k)}_0)},\dots,\hat X^{(k)}_0]_{k+1}.$
Since we are dealing with a nested sequence of partitions and $\hat\lambda_k$ depends solely on the $k$th quantized sequence, it is enough to prove that for any $i \ge 0$ and for all $j \ge i$, almost surely, $[\tilde X_{-i}]_{j+1} = [\hat X^{(j)}_{-i}]_{j+1}$. (Note that $\lambda_j(X_0^{\infty}) - j \ge 0$.) If $\tilde X_{-i} \notin [\hat X^{(j)}_{-i}]_{j+1}$ for some $j \ge i$, then this must happen at a right end-point of some interval in $\bigcup_{k=0}^{\infty} P_k$. By (3) and Step 1, we have
$1 - P(\tilde X_{-i} \in [\hat X^{(j)}_{-i}]_{j+1} \text{ for all } j \ge i) \le \sum_{k=i}^{\infty} \sum_{s=-\infty}^{\infty} P(\tilde X_{-i} = s2^{-k},\ \hat X^{(j)}_{-i} < \tilde X_{-i} \text{ for all } j \ge k) \le \sum_{k=i}^{\infty} \sum_{s=-\infty}^{\infty} \lim_{j\to\infty} P(s2^{-k} - 2^{-j} \le \hat X^{(j)}_{-i} < s2^{-k}) = \sum_{k=i}^{\infty} \sum_{s=-\infty}^{\infty} \lim_{j\to\infty} P(s2^{-k} - 2^{-j} \le X_{-i} < s2^{-k}) = 0.$

Step 3. We show that the distributions of $\{\tilde X_n\}_{n=-\infty}^0$ and $\{X_n\}_{n=-\infty}^0$ are the same. This is immediate from Step 1 and Step 2. The time series $\{\tilde X_n\}_{n=-\infty}^0$ is stationary, since $\{X_n\}_{n=-\infty}^0$ is stationary, and it can be extended to a two-sided time series $\{\tilde X_n\}_{n=-\infty}^{\infty}$. We will use this fact only for the purpose of defining the conditional expectation $E(\tilde X_1|\tilde X_{-\infty}^0)$.

Step 4. We prove the first part of the Theorem.
Consider
$m_n = \frac{1}{n}\sum_{j=0}^{n-1}\left( f_j([X_{\lambda_j+1}]_j) - E(f_j([X_{\lambda_j+1}]_j)|[X_0^{\lambda_j}]_j) \right) + \frac{1}{n}\sum_{j=0}^{n-1}\left( E(f_j([X_{\lambda_j+1}]_j)|[X_0^{\lambda_j}]_j) - E(X_{\lambda_j+1}|[X_0^{\lambda_j}]_j) \right) + \frac{1}{n}\sum_{j=0}^{n-1} E(X_{\lambda_j+1}|[X_0^{\lambda_j}]_j). \qquad (5)$
Observe that $\{\Gamma_j = f_j([X_{\lambda_j+1}]_j) - E(f_j([X_{\lambda_j+1}]_j)|[X_0^{\lambda_j}]_j)\}$ is a sequence of orthogonal random variables with $E\Gamma_j = 0$ and $E\Gamma_j^2 \le E|X_1|^2 + 2E|X_1| + 1$, since $E\Gamma_j^2 \le E|X_{\lambda_j+1}|^2 + 2E|X_{\lambda_j+1}| + 1$ and, by Step 1, $X_{\lambda_j+1}$ has the same distribution as $X_1$. Now by Theorem 3.2.2 in Révész (1968),
$\frac{1}{n}\sum_{j=0}^{n-1} \Gamma_j \to 0 \quad \text{almost surely.}$
The second term tends to zero since $|f_j([X_{\lambda_j+1}]_j) - X_{\lambda_j+1}| \le 2^{-j}$. Now we deal with the third term. By Step 2, Step 1 and Step 3,
$E(X_{\lambda_j+1}|[X_0^{\lambda_j}]_j) = E(\tilde X_1|[\tilde X_{\hat\lambda_j(\tilde X_{-\infty}^0)}^0]_j).$
The latter forms a martingale and, by Theorem 7.6.2 in Ash (1972), almost surely,
$E(X_{\lambda_j+1}|[X_0^{\lambda_j}]_j) = E(\tilde X_1|[\tilde X_{\hat\lambda_j(\tilde X_{-\infty}^0)}^0]_j) \to E(\tilde X_1|\tilde X_{-\infty}^0). \qquad (6)$
By (5) and (6), almost surely,
$\lim_{n\to\infty} m_n = E(\tilde X_1|\tilde X_{-\infty}^0). \qquad (7)$
Thus the first part of the Theorem is proved.

Step 5. We prove the second part of the Theorem.

By (7) it is enough to prove that almost surely $E(X_{\lambda_j+1}|X_0^{\lambda_j}) \to E(\tilde X_1|\tilde X_{-\infty}^0)$, provided that $E(X_1|X_{-\infty}^0)$ is almost surely continuous. By assumption, the function $e(\cdot)$ is continuous on a set $B \subseteq \Re^{*-}$ with $P(X_{-\infty}^0 \in B) = 1$. By Step 1 and Step 3,
$P(\tilde X_{-\infty}^0 \in B,\ (\dots,\hat X^{(j)}_{-1},\hat X^{(j)}_0) \in B \text{ for all } j \ge 0) = 1. \qquad (8)$
Let
$N_j(X_0^{\lambda_j}) = \{z_{-\infty}^0 \in \Re^{*-} : z_{-\lambda_j} \in [X_0]_j,\ \dots,\ z_0 \in [X_{\lambda_j}]_j\}.$
By (4), (8) and Step 2, almost surely, for all $j$,
$(\dots,\hat X^{(j)}_{-1},\hat X^{(j)}_0) \in N_j(X_0^{\lambda_j}) \cap B \quad \text{and} \quad \tilde X_{-\infty}^0 \in N_j(X_0^{\lambda_j}) \cap B. \qquad (9)$
Put
$\Theta_j(X_0^{\lambda_j}) = \sup_{y_{-\infty}^0,\, z_{-\infty}^0 \in N_j(X_0^{\lambda_j}) \cap B} |e(y_{-\infty}^0) - e(z_{-\infty}^0)|.$
Since $e(\cdot)$ is continuous on the set $B$ and by (9), almost surely,
$\lim_{j\to\infty} \Theta_j(X_0^{\lambda_j}) = 0. \qquad (10)$
By (9) and (10), almost surely,
$\limsup_{j\to\infty} \left| E(e(\tilde X_{-\infty}^0)|[X_0^{\lambda_j}]_j) - E(e(\dots,\hat X^{(j)}_{-1},\hat X^{(j)}_0)|X_0^{\lambda_j}) \right| \le \limsup_{j\to\infty} E\left( \left| E(e(\tilde X_{-\infty}^0)|[X_0^{\lambda_j}]_j) - e(\dots,\hat X^{(j)}_{-1},\hat X^{(j)}_0) \right| \,\Big|\, X_0^{\lambda_j} \right) \le \limsup_{j\to\infty} E(\Theta_j(X_0^{\lambda_j})|X_0^{\lambda_j}) = \limsup_{j\to\infty} \Theta_j(X_0^{\lambda_j}) = 0. \qquad (11)$
By Step 2,
$E(X_{\lambda_j+1}|X_0^{\lambda_j}) = E(e(\tilde X_{-\infty}^0)|[\tilde X_{\hat\lambda_j}^0]_j) - \left\{ E(e(\tilde X_{-\infty}^0)|[X_0^{\lambda_j}]_j) - E(e(\dots,\hat X^{(j)}_{-1},\hat X^{(j)}_0)|X_0^{\lambda_j}) \right\}.$
The first term tends to $e(\tilde X_{-\infty}^0)$ by the almost sure martingale convergence theorem (cf. Theorem 7.6.2 in Ash (1972)), since by Step 3, $E|e(\tilde X_{-\infty}^0)| \le E|\tilde X_1| = E|X_1| < \infty$. The second term tends to zero by (11). The proof of the second part of the Theorem is complete.

Step 6. We prove the third part of the Theorem.

First we define a Markov chain $\{M_n\}$ on the nonnegative integers which will serve as a technical tool for our counterexample process. Let the transition probabilities be as follows:
$P(M_1=0|M_0=0) = P(M_1=1|M_0=0) = P(M_1=0|M_0=1) = 2^{-1},$
and for $i = 2,3,\dots$, let
$P(M_1=i|M_0=1) = 2^{-i} \quad \text{and} \quad P(M_1=0|M_0=i) = 1.$
All other transitions happen with probability zero. Note that one can reach state 1 only from state 0. It is easy to see that the Markov chain just defined yields a stationary and ergodic time series with initial probabilities
$P(M_0=0) = \frac{4}{7}, \quad P(M_0=1) = \frac{2}{7}, \quad \text{and for } i = 2,3,\dots, \quad P(M_0=i) = \frac{1}{7}\cdot\frac{1}{2^{i-1}}.$
Our counterexample process $\{X_n\}$ will be a one-to-one function of the Markov chain $\{M_n\}$. Define the function $h : \{0,1,2,\dots\} \to \Re$ as $h(0)=0$, $h(1)=1$, and for $i \ge 2$ put $h(i) = 2^{-2^i}$. Let $X_n = h(M_n)$. Since $h(\cdot)$ is one-to-one, $\{X_n\}$ is also a Markov chain. Since $\{\tilde X_n\}$ has the same distribution as $\{X_n\}$, $\{\tilde X_n\}$ is also a Markov chain.

Let $A_n = \{h(i) : h(i) < 2^{-(n+1)},\ i = 0,1,2,\dots\}$. Note that $h(i) \in A_n$ if and only if $[h(i)]_{n+1} = [0]_{n+1}$. Define the event $H = \{\tilde X_0 = 0,\ X_0^1 = (0,1)\}$. Observe: if $X_1 = 1$ then $X_0 = 0$ (state 1 can be reached only from state 0). The event $\{\tilde X_0 = 0\}$ happens if and only if $X_{\lambda_n} \in A_n$ for all $n = 1,2,\dots$. Since $[h(0)]_1 = [h(i)]_1$ for $i \ge 2$ and, for all $k \ge 0$, $[h(1)]_k \ne [h(i)]_k$ provided $i \ne 1$, the event $\{\tilde X_{-1} = 1\}$ occurs if and only if $X_1 = 1$. It follows that
$H = \{X_0 = 0,\ X_1 = 1,\ X_{\lambda_n} \in A_n \text{ for } n = 1,2,\dots\} = \{\tilde X_{-2}^0 = (0,1,0)\}.$
Since the time series $\{\tilde X_n\}$ has the same distribution as $\{X_n\}$,
$P(H) = P(X_{-2}^0 = (0,1,0)) = \frac{4}{7}\cdot\frac12\cdot\frac12 = \frac17 > 0.$

It will be enough to show that $X_{\lambda_n} \in A_n - \{0\}$ happens infinitely often given the condition $H$, since if $X_{\lambda_n} \in A_n - \{0\}$ happens then $X_{\lambda_n+1} = 0$ and, by (7), on $H$,
$m_n \to E(\tilde X_1|\tilde X_0 = 0) = 0.5,$
and so
$P\left( \limsup_{n\to\infty} |m_n - E(X_{\lambda_n+1}|X_0^{\lambda_n})| = 0.5 \,\Big|\, H \right) = 1$
and $P(H) > 0$.

To prove that $\{X_{\lambda_n} \in A_n - \{0\}\}$ occurs infinitely often we need the following observation for repeated use: by the Markov property and the construction in (1), if $x_i \in A_i$ for $i = 1,2,\dots,j$, then for $j \ge 1$,
$P(X_{\lambda_j} = x_j | X_0^1 = (0,1),\ X_{\lambda_m} = x_m \text{ for } 1 \le m < j) = P(X_1 = x_j | X_0 = 1,\ X_1 \in A_{j-1}). \qquad (12)$
Indeed, for $j = 1$ this is trivial, since $X_1 = 1$ implies that $X_0 = 0$ and $\lambda_1 = 2$, while $X_0 = 1$ implies that $X_1 \in A_0$.
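Before continuing with the case $j \ge 2$, the stationarity claim for $\{M_n\}$ and the value $P(H) = 1/7$ can be checked numerically. The sketch below is our own, not part of the proof; it truncates the countable state space at an arbitrary cutoff $K$, so the balance equation for state 0 holds only up to the discarded tail mass.

```python
from fractions import Fraction as F

K = 40  # arbitrary truncation level for the countable state space (our choice)

def p(i, j):
    """Transition probability P(M_1 = j | M_0 = i) of the chain defined above."""
    if i == 0:
        return F(1, 2) if j in (0, 1) else F(0)
    if i == 1:
        if j == 0:
            return F(1, 2)
        return F(1, 2 ** j) if j >= 2 else F(0)
    return F(1) if j == 0 else F(0)     # every state i >= 2 drops back to 0

# the claimed initial (stationary) probabilities
pi = {0: F(4, 7), 1: F(2, 7)}
for i in range(2, K + 1):
    pi[i] = F(1, 7) / 2 ** (i - 1)

# pi P = pi on states 0..K: exact for j >= 1, and within the truncated
# tail mass (< 2^-K) for j = 0
for j in range(K + 1):
    total = sum(pi[i] * p(i, j) for i in range(K + 1))
    assert abs(float(total - pi[j])) < 2.0 ** (-K)

# P(H) = P(X_{-2}^0 = (0,1,0)) = pi_0 * P(0 -> 1) * P(1 -> 0) = 1/7
assert pi[0] * p(0, 1) * p(1, 0) == F(1, 7)
```

Exact rational arithmetic via `fractions.Fraction` keeps the balance checks free of floating-point noise except for the deliberate truncation at $K$.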
For $j \ge 2$ set $\psi_0^j = \lambda_{j-1} - 1$, and for $i \ge 1$ let the $\psi_i^j$ be the successive occurrences of the block $[X_0^{\lambda_{j-1}-1}]_j$ in the $j$th quantization, defined by
$\psi_i^j = \min\{t > \psi_{i-1}^j : [X_{t-\lambda_{j-1}+1}^t]_j = [X_{\psi_{i-1}^j - \lambda_{j-1}+1}^{\psi_{i-1}^j}]_j\}.$
These $\psi_i^j$ are stopping times for $i = 1,2,\dots$. Temporarily let $D_j$ denote the event $\{X_0^1 = (0,1),\ X_{\lambda_m} = x_m \text{ for } 1 \le m < j\}$. The way $\lambda_j$ is defined means that, on $D_j$, if $\lambda_j$ occurs at the $i$th repetition of $[X_0^{\lambda_{j-1}-1}]_j$, it is because $\psi_i^j < \lambda_j$ and $X_{\psi_i^j+1} \in A_{j-1}$. It follows that
$P(X_{\lambda_j} = x_j | D_j) = \sum_{i=1}^{\infty} P(X_{\psi_i^j+1} = x_j | X_{\psi_i^j+1} \in A_{j-1},\ \psi_i^j < \lambda_j,\ D_j)\, P(\psi_i^j + 1 = \lambda_j | D_j).$
Since $x_j \in A_j \subseteq A_{j-1}$, each expression $P(X_{\psi_i^j+1} = x_j | X_{\psi_i^j+1} \in A_{j-1},\ \psi_i^j < \lambda_j,\ D_j)$ can be written as
$P(X_{\psi_i^j+1} = x_j | X_{\psi_i^j+1} \in A_{j-1},\ \psi_i^j < \lambda_j,\ D_j) = \frac{P(X_{\psi_i^j+1} = x_j | \psi_i^j < \lambda_j,\ D_j)}{P(X_{\psi_i^j+1} \in A_{j-1} | \psi_i^j < \lambda_j,\ D_j)},$
and then, by decomposition according to the value $l$ of $\psi_i^j$, we get
$P(X_{\psi_i^j+1} = x_j | X_{\psi_i^j+1} \in A_{j-1},\ \psi_i^j < \lambda_j,\ D_j) = \sum_{l=1}^{\infty} \frac{P(X_{l+1} = x_j | \psi_i^j = l < \lambda_j,\ D_j)}{P(X_{l+1} \in A_{j-1} | \psi_i^j = l < \lambda_j,\ D_j)} \cdot \frac{P(\psi_i^j = l,\ X_{\psi_i^j+1} \in A_{j-1} | \psi_i^j < \lambda_j,\ D_j)}{P(X_{\psi_i^j+1} \in A_{j-1} | \psi_i^j < \lambda_j,\ D_j)}.$
Observe that $X_{\psi_i^j} = 1$ provided $X_1 = 1$, and the event $\{\psi_i^j < \lambda_j\}$ is measurable with respect to $\sigma([X_0^{\psi_i^j}]_j)$. Now by the Markov property we get
$P(X_{\psi_i^j+1} = x_j | X_{\psi_i^j+1} \in A_{j-1},\ \psi_i^j < \lambda_j,\ D_j) = \sum_{l=1}^{\infty} \frac{P(X_{l+1} = x_j | X_l = 1)}{P(X_{l+1} \in A_{j-1} | X_l = 1)} \cdot \frac{P(\psi_i^j = l,\ X_{\psi_i^j+1} \in A_{j-1} | \psi_i^j < \lambda_j,\ D_j)}{P(X_{\psi_i^j+1} \in A_{j-1} | \psi_i^j < \lambda_j,\ D_j)}.$
By stationarity and since $x_j \in A_j \subseteq A_{j-1}$,
$\frac{P(X_{l+1} = x_j | X_l = 1)}{P(X_{l+1} \in A_{j-1} | X_l = 1)} = P(X_1 = x_j | X_1 \in A_{j-1},\ X_0 = 1).$
Combining all this we get
$P(X_{\lambda_j} = x_j | D_j) = P(X_1 = x_j | X_1 \in A_{j-1},\ X_0 = 1) \cdot \sum_{i=1}^{\infty} P(\psi_i^j + 1 = \lambda_j | D_j) \sum_{l=1}^{\infty} \frac{P(\psi_i^j = l,\ X_{\psi_i^j+1} \in A_{j-1} | \psi_i^j < \lambda_j,\ D_j)}{P(X_{\psi_i^j+1} \in A_{j-1} | \psi_i^j < \lambda_j,\ D_j)} = P(X_1 = x_j | X_1 \in A_{j-1},\ X_0 = 1) \sum_{i=1}^{\infty} P(\psi_i^j + 1 = \lambda_j | D_j) = P(X_1 = x_j | X_1 \in A_{j-1},\ X_0 = 1),$
and we have proved (12).

In order to show that the events $\{X_{\lambda_n} \in A_n - \{0\}\}$ occur infinitely often, we prove that they have sufficiently large conditional probabilities and that they are conditionally independent given the condition $H$. First we calculate $P(X_{\lambda_n} \in A_n - \{0\} | H)$. For $n \ge 2$, by (12),
$P(X_{\lambda_n} \in A_n - \{0\} | H) = \frac{P(\{X_{\lambda_n} \in A_n - \{0\}\} \cap H)}{P(H)}$
$= \frac{P(X_{\lambda_n} \in A_n - \{0\} | X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < n)}{P(X_{\lambda_n} \in A_n | X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < n)} \cdot \prod_{m=n+1}^{\infty} \frac{P(X_{\lambda_m} \in A_m | X_0^1 = (0,1),\ X_{\lambda_n} \in A_n - \{0\},\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < m)}{P(X_{\lambda_m} \in A_m | X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < m)}$
$= \frac{P(X_{\lambda_n} \in A_n - \{0\} | X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < n)}{P(X_{\lambda_n} \in A_n | X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < n)}$
$\ge P(X_{\lambda_n} \in A_n - \{0\} | X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < n) = P(X_1 \in A_n,\ X_1 \ne 0 | X_0 = 1,\ X_1 \in A_{n-1}) \ge P(X_1 \in A_n,\ X_1 \ne 0 | X_0 = 1) = \sum_{i:\, h(i) \in A_n - \{0\}} \frac{1}{2^i} = \sum_{i > \log_2(n)} \frac{1}{2^i} \ge \frac{1}{n}.$
We have just proved that
$\sum_n P(X_{\lambda_n} \in A_n - \{0\} | H) \ge \sum_n \frac{1}{n} = \infty. \qquad (13)$

Now we will prove that for $n = 1,2,\dots$, the events $\{X_{\lambda_n} \in A_n - \{0\}\}$ are conditionally independent given $H$. Since
$P(X_{\lambda_i} \in A_i - \{0\} \text{ for } i = 1,2,\dots,k | H) = \sum_{x_1 \in A_1 - \{0\}} \cdots \sum_{x_k \in A_k - \{0\}} P(X_{\lambda_i} = x_i \text{ for } i = 1,2,\dots,k | H),$
it is enough to show that the events $\{X_{\lambda_i} = x_i\}$ are conditionally independent given the condition $H$, provided that $x_i \in A_i$. Let $x_i \in A_i$. Then by repeated use of (12),
$P(X_{\lambda_i} = x_i \text{ for } i = 1,2,\dots,k | H) = \frac{P(X_{\lambda_i} = x_i \text{ for } i = 1,2,\dots,k,\ H)}{P(H)}$
$= \prod_{m=1}^{k} \frac{P(X_{\lambda_m} = x_m | X_0^1 = (0,1),\ X_{\lambda_j} = x_j \text{ for } 1 \le j < m)}{P(X_{\lambda_m} \in A_m | X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < m)} \cdot \prod_{l=k+1}^{\infty} \frac{P(X_{\lambda_l} \in A_l | X_0^1 = (0,1),\ X_{\lambda_i} = x_i \text{ for } 1 \le i \le k \text{ and } X_{\lambda_j} \in A_j \text{ for } 1 \le j < l)}{P(X_{\lambda_l} \in A_l | X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < l)}$
$= \prod_{m=1}^{k} \frac{P(X_{\lambda_m} = x_m | X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < m)}{P(X_{\lambda_m} \in A_m | X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < m)}$
$= \prod_{m=1}^{k} \left( \frac{P(X_{\lambda_m} = x_m | X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < m)}{P(X_{\lambda_m} \in A_m | X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < m)} \cdot \prod_{l=m+1}^{\infty} \frac{P(X_{\lambda_l} \in A_l | X_0^1 = (0,1),\ X_{\lambda_m} = x_m,\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < l)}{P(X_{\lambda_l} \in A_l | X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < l)} \right)$
$= \prod_{i=1}^{k} \frac{P(X_{\lambda_i} = x_i,\ H)}{P(H)} = \prod_{i=1}^{k} P(X_{\lambda_i} = x_i | H).$
Now by (13) and the Borel–Cantelli lemma (cf. Lemma B in Rényi (1970), page 390), the events $\{X_{\lambda_n} \in A_n - \{0\}\}$ occur infinitely often, and the third part of the Theorem is proved. The proof of the Theorem is complete.

References

1. P. Algoet, Universal schemes for prediction, gambling and portfolio selection, Annals of Probability 20 (1992), 901–941.
2. P. Algoet, Universal schemes for learning the best nonlinear predictor given the infinite past and side information, IEEE Transactions on Information Theory 45 (1999), no. 4, 1165–1185.
3. R.B. Ash, Real Analysis and Probability, Academic Press, New York, 1972.
4. D.H. Bailey, Sequential Schemes for Classifying and Predicting Ergodic Processes, Ph.D. thesis, Stanford University, 1976.
5. T.M. Cover, Open problems in information theory, In: 1975 IEEE Joint Workshop on Information Theory, IEEE Press, New York, 1975, pp. 35–36.
6. L. Györfi, M. Kohler, A. Krzyzak, and H. Walk, A Distribution-Free Theory of Nonparametric Regression, Springer-Verlag, New York, 2002.
7. L. Györfi and G. Lugosi, Strategies for sequential prediction of stationary time series, In: Modeling Uncertainty: An Examination of Stochastic Theory, Methods, and Applications, M. Dror, P. L'Ecuyer, F. Szidarovszky (Eds.), Kluwer Academic Publishers, 2002, pp. 225–248.
8. L. Györfi, G. Lugosi and G. Morvai, A simple randomized algorithm for consistent sequential prediction of ergodic time series, IEEE Transactions on Information Theory 45 (1999), 2642–2650.
9. L. Györfi, G. Morvai, and S. Yakowitz, Limits to consistent on-line forecasting for ergodic time series, IEEE Transactions on Information Theory 44 (1998), no. 2, 886–892.
10. G. Morvai, Guessing the output of a stationary binary time series, In: Foundations of Statistical Inference, Y. Haitovsky, H.R. Lerche, Y. Ritov (Eds.), Physica-Verlag, 2003, pp. 205–213.
11. G. Morvai and B. Weiss, Forecasting for stationary binary time series, Acta Applicandae Mathematicae 79 (2003), no. 1-2, 25–34.
12. G. Morvai, S. Yakowitz, and P. Algoet, Weakly convergent nonparametric forecasting of stationary time series, IEEE Transactions on Information Theory 43 (1997), no. 2, 483–498.
13. G. Morvai, S. Yakowitz, and L. Györfi, Nonparametric inference for ergodic, stationary time series, Annals of Statistics 24 (1996), no. 1, 370–379.
14. D.S. Ornstein, Guessing the next output of a stationary process, Israel J. Math. 30 (1978), 292–296.
15. A. Rényi, Probability Theory, Akadémiai Kiadó, 1970.
16. P. Révész, The Laws of Large Numbers, Academic Press, 1968.
17. B.Ya. Ryabko, Prediction of random sequences and universal coding, Problems of Inform. Trans. (Problemy Peredachi Informatsii) 24 (1988), no. 2, 3–14.
18. D. Schäfer, Strongly consistent online forecasting of centered Gaussian processes, IEEE Transactions on Information Theory 48 (2002), no. 3, 791–799.
19. B. Weiss, Single Orbit Dynamics, American Mathematical Society, Providence, RI, 2000.

Gusztáv Morvai (Corresponding author. Tel.: 36-1-463-2867; fax: 36-1-463-3147.) Research Group for Informatics and Electronics of the Hungarian Academy of Sciences, Budapest, 1521 Goldmann György tér 3, Hungary
E-mail address: morvai@math.bme.hu

Benjamin Weiss (Tel.: 972-2-658-4388; fax: 972-2-563-0702.) Hebrew University of Jerusalem, Jerusalem 91904, Israel
E-mail address: weiss@math.huji.ac.il