Density estimation from an individual numerical sequence

This paper considers estimation of a univariate density from an individual numerical sequence. It is assumed that (i) the limiting relative frequencies of the numerical sequence are governed by an unknown density, and (ii) there is a known upper boun…

Authors: Andrew B. Nobel, Gusztav Morvai, Sanjeev R. Kulkarni

A. B. Nobel, G. Morvai, S. R. Kulkarni: Density estimation from an individual numerical sequence. IEEE Trans. Inform. Theory 44 (1998), no. 2, 537–541.

Abstract

This paper considers estimation of a univariate density from an individual numerical sequence. It is assumed that (i) the limiting relative frequencies of the numerical sequence are governed by an unknown density, and (ii) there is a known upper bound for the variation of the density on an increasing sequence of intervals. A simple estimation scheme is proposed, and is shown to be $L_1$ consistent when (i) and (ii) apply. In addition it is shown that there is no consistent estimation scheme for the set of individual sequences satisfying only condition (i).

Key words and phrases: Density estimation, individual sequences, bounded variation, ergodic processes.

1 Introduction

Estimation of a univariate density from a finite data set is an important problem in theoretical and applied statistics. In the most common setting, it is assumed that data are obtained from a stationary process $X_1, X_2, \ldots$ such that
$$\mathbb{P}\{X_i \in A\} = \int_A f \, dx \quad \text{for every Borel set } A \subseteq \mathbb{R},$$
i.e. the common distribution of the $X_i$ has density $f$, written $X_i \sim f$. For each $n \geq 1$ an estimate $\hat{f}_n$ of $f(\cdot)$ is produced from $X_1, \ldots, X_n$. The estimates $\{\hat{f}_n\}$ are said to be strongly $L_1$ consistent if $\int |\hat{f}_n - f| \, dx \to 0$ as $n \to \infty$ with probability one.

Common density estimation methods include histogram, kernel, nearest neighbor, orthogonal series, wavelet, spline, and likelihood based procedures. For an account of these methods, we refer the interested reader to the texts of Devroye and Györfi [4], Silverman [19], Scott [18], and Wand and Jones [20].

In establishing consistency and rates of convergence for estimation procedures like those above, many analyses assume that $X_1, X_2, \ldots$
are independent and identically distributed (i.i.d.), in which case the distribution of the process $\{X_i\}$ is completely specified by the marginal density $f$ of $X_1$. Complementing work for independent random variables, numerous results have also been obtained for stationary sequences exhibiting both short and long range dependence. Roussas [17] and Rosenblatt [16] studied the consistency and asymptotic normality of kernel density estimates from Markov processes. Similar results, under weaker conditions, were obtained by Yakowitz [21]. Györfi [5] showed that there is a simple kernel-based procedure $\Phi$ that is strongly $L_2$-consistent for every stationary ergodic process $\{X_i\}_{i=-\infty}^{\infty}$ such that (i) the conditional distribution of $X_1$ given $\{X_i : i \leq 0\}$ is absolutely continuous with probability one, and (ii) the corresponding conditional density $h$ satisfies $E \int |h(u)|^2 \, du < \infty$. For additional work in this area, see also Ahmad [2], Castellana and Leadbetter [3], Györfi and Masry [7], Hall and Hart [9], and the references contained therein.

With these positive results have come examples showing that density estimation from strongly dependent processes can be problematic. In a result attributed to Shields, it was shown by Györfi, Härdle, Sarda and Vieu [8] that there are histogram density estimates, consistent for every i.i.d. process, that fail for some stationary ergodic process. Györfi and Lugosi [6] established a similar result for ordinary kernel estimates. Extending these results, Adams and Nobel [1] have recently shown that there is no density estimation procedure that is consistent for every stationary ergodic process.

With a view to considering density estimation in a more general setting, one may eliminate stochastic assumptions.
Here we consider the estimation of an unknown density from an individual numerical sequence, which need not be the trajectory of a stationary stochastic process. We propose a simple estimation procedure that is applicable in a purely deterministic setting. This deterministic point of view is in line with recent work on individual sequences in information theory, statistics, and learning theory (cf. [22, 13, 12, 10]). Extending the techniques developed in this paper, Morvai, Kulkarni, and Nobel [14] consider the problem of regression estimation from individual sequences.

In many cases, results based on deterministic analyses can be applied to individual sample paths in a stochastic setting. Theorem 1 of this paper yields a positive result concerning density estimation from ergodic processes (see Corollary 1 below).

2 The Deterministic Setting

Let $f : \mathbb{R} \to \mathbb{R}$ be a univariate density function with associated probability measure $\mu_f(A) = \int_A f(x) \, dx$. An infinite sequence $\mathbf{x} = (x_1, x_2, \ldots)$ of numbers $x_i \in \mathbb{R}$ has limiting density $f$ if
$$\hat{\mu}_n(A) = \frac{1}{n} \sum_{i=1}^{n} I\{x_i \in A\} \to \mu_f(A) \tag{1}$$
for every interval $A \subseteq \mathbb{R}$. A sequence $\mathbf{x}$ having a limiting density will be called stationary. Let $\Omega(f)$ be the set of stationary sequences with limiting density $f$.

Note that stationarity concerns the limiting behavior of relative frequencies, which need not converge to their corresponding probabilities at any particular rate. Stationarity says nothing about the mechanism by which the individual sequence $\mathbf{x}$ is produced. In particular, the limiting relative frequencies of a stationary sequence $\mathbf{x}$ are unchanged if one prepends to $\mathbf{x}$ a prefix of any finite length.

The sample paths of ergodic processes provide one source of stationary sequences. The next proposition follows easily from Birkhoff's ergodic theorem.

Proposition 1 If $X_1, X_2, \ldots$
are stationary and ergodic with $X_i \sim f$, then $\mathbf{X} = (X_1, X_2, \ldots) \in \Omega(f)$ with probability one.

A univariate density estimation scheme is a countable collection $\Phi$ of Borel-measurable mappings $\phi_n : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}$, $n \geq 1$. Thus $\phi_n$ associates every vector $(x_1, \ldots, x_n) \in \mathbb{R}^n$ with a function $\phi_n(\cdot : x_1, \ldots, x_n)$, which is viewed as the estimate of an unknown density associated with the sequence $x_1, \ldots, x_n$. These estimates may take negative values, and they need not integrate to one. In particular, no regularity conditions are imposed on the behavior of $\phi_n$ as a function of its inputs. A scheme $\Phi$ is $L_1$ consistent for a collection $\Omega$ of stationary sequences if for each $\mathbf{x} \in \Omega$,
$$\int |\phi_n(x : x_1, \ldots, x_n) - f(x)| \, dx \to 0 \quad \text{as } n \to \infty,$$
where $f$ is the limiting density of $\mathbf{x}$. A scheme $\Phi$ is universal if it is $L_1$ consistent for the set $\Omega^*$ of all stationary sequences.

Note that, for i.i.d. data, a density estimation scheme is called universal if it is consistent for every marginal density $f$. The notion of universality defined above is considerably stronger, as there are no constraints apart from stationarity placed on the structure of the individual sequences. In what follows, when $\mathbf{x} = x_1, x_2, \ldots$ is fixed, $\phi_n(x : x_1, \ldots, x_n)$ will be denoted by $\phi_n(x)$.

Recall that the total variation of a real-valued function $h$ defined on an interval $[a, b) \subseteq \mathbb{R}$ is given by
$$V(h : a, b) = \sup \sum_{i=1}^{n} |h(t_i) - h(t_{i-1})|,$$
where the supremum is taken over all finite ordered sequences $a \leq t_0 < \cdots < t_n < b$. For each nondecreasing function $\alpha : \mathbb{Z}^+ \to (0, \infty)$ let $\mathcal{F}(\alpha)$ be the set of all densities $f$ on $\mathbb{R}$ such that $V(f : -i, i) < \alpha(i)$ for $i \geq 1$, and let
$$\Omega(\alpha) = \bigcup_{f \in \mathcal{F}(\alpha)} \Omega(f)$$
be the collection of all those stationary sequences having limiting densities in $\mathcal{F}(\alpha)$.
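To make the variation condition concrete, here is a minimal sketch of how one might approximate $V(f : -i, i)$ numerically and test the inequalities that define $\mathcal{F}(\alpha)$. The function names `total_variation` and `in_class`, the uniform grid, and the choice of the standard normal density are illustrative assumptions and do not come from the paper. A sum over a finite grid can never exceed the supremum in the definition, so the check is approximate and can only under-estimate the variation.

```python
import math

def total_variation(h, a, b, num_points=20001):
    """Approximate V(h : a, b) by summing |h(t_j) - h(t_{j-1})| over a uniform
    grid a = t_0 < t_1 < ... < b. This is a lower bound on the supremum."""
    step = (b - a) / num_points
    values = [h(a + j * step) for j in range(num_points)]  # last point is b - step < b
    return sum(abs(values[j] - values[j - 1]) for j in range(1, num_points))

def standard_normal_density(x):
    """Illustrative density; any other density could be substituted."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def in_class(f, alpha, max_i):
    """Check V(f : -i, i) < alpha(i) for i = 1..max_i (only finitely many i,
    which is an assumption made to keep the sketch computable)."""
    return all(total_variation(f, -i, i) < alpha(i) for i in range(1, max_i + 1))

if __name__ == "__main__":
    print(total_variation(standard_normal_density, -5, 5))
    print(in_class(standard_normal_density, lambda i: 1.0, 5))
```

For a unimodal density the variation on a wide interval is close to twice the peak height, which for the standard normal is $2/\sqrt{2\pi}$; the grid approximation above should land near that value.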
Given a function $\alpha(\cdot)$ as above, we propose a simple histogram based procedure that is consistent for $\Omega(\alpha)$. For each $k \geq 1$ let $\pi_k$ be the partition of $\mathbb{R}$ into dyadic intervals of the form
$$A_{k,j} = \left[ \frac{j}{2^k}, \frac{j+1}{2^k} \right)$$
with $j \in \mathbb{Z}$, and let $\pi_k[x]$ be the unique cell of $\pi_k$ containing $x$. Let $\{b_n\}$ be any sequence of positive integers tending to infinity. For each sequence of numbers $x_1, \ldots, x_n$ and each $k \geq 1$ define histogram density estimates
$$\hat{h}_{n,k}(x) = \frac{2^k}{n} \sum_{i=1}^{n} I\{x_i \in \pi_k[x]\}. \tag{2}$$
Our estimate is chosen from among the histograms $\hat{h}_{n,k}$ by selecting a suitable value of $k$. Find the partition index
$$k_n = \max \left\{ 1 \leq k \leq b_n : V(\hat{h}_{n,k} : -i, i) < 4\alpha(i) \text{ for } 1 \leq i \leq k \right\} \tag{3}$$
and define
$$\phi^*_n(x : x_1, \ldots, x_n) = \hat{h}_{n,k_n}(x). \tag{4}$$
If the conditions defining $k_n$ are not satisfied for any $1 \leq k \leq b_n$, then set $\phi^*_n \equiv 0$.

Theorem 1 Let $\alpha : \mathbb{Z}^+ \to (0, \infty)$ be a fixed, non-decreasing function. The estimation scheme $\Phi^* = \{\phi^*_n\}$ defined by (2)-(4) is $L_1$-consistent for $\Omega(\alpha)$. Thus for every stationary sequence $\mathbf{x}$ with limiting density $f \in \mathcal{F}(\alpha)$, $\int |\phi^*_n(x) - f(x)| \, dx \to 0$.

Corollary 1 Let $\alpha(\cdot)$ be fixed and let $\phi^*_n$ be defined by (2)-(4). For every stationary ergodic process $\{X_i\}$ such that $X_i \sim f$ with $f \in \mathcal{F}(\alpha)$,
$$\int |\phi^*_n(x : X_1, \ldots, X_n) - f(x)| \, dx \to 0$$
as $n \to \infty$ with probability one.

Example: Fix $\gamma > 0$, and consider the class of stationary ergodic processes $\{X_i\}$ such that $X_i \sim f$ with $V(f : -\infty, \infty) < 2\gamma$. This class includes, but is not limited to, processes having uniform, exponential, and normal marginal densities with arbitrary means, under the restriction that $\mathrm{Var}(X_i)$ is greater than $(12\gamma^2)^{-1}$, $\gamma^{-2}$, and $(2\pi\gamma^2)^{-1}$, respectively. By Corollary 1, applied with $\alpha(i) = 2\gamma$ for every $i$, there is a strongly consistent density estimation procedure $\Phi^*$ for this class of processes.
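The scheme defined by (2)-(4) can be sketched in a few lines. The following is a hedged illustration, not the authors' implementation: the function names (`cell_counts`, `histogram_variation`, `select_index`, `estimate`) are hypothetical, `alpha` is any non-decreasing positive function supplied by the caller, and `b_n` is passed in as a plain integer. The variation of the piecewise-constant histogram is computed as the sum of absolute jumps between adjacent cells inside $[-i, i)$, using the fact from (2) that $\hat{h}_{n,k}$ equals $2^k/n$ times the cell count.

```python
import math
from collections import Counter

def cell_counts(xs, k):
    """Count sample points in each dyadic cell A_{k,j} = [j/2^k, (j+1)/2^k)."""
    return Counter(math.floor(x * 2**k) for x in xs)

def histogram_variation(xs, k, i):
    """V(h_{n,k} : -i, i): sum of jumps between adjacent cells inside [-i, i)."""
    counts = cell_counts(xs, k)
    scale = 2**k / len(xs)             # h_{n,k} = scale * count on each cell
    lo, hi = -i * 2**k, i * 2**k - 1   # first and last cell index inside [-i, i)
    return scale * sum(abs(counts[j] - counts[j + 1]) for j in range(lo, hi))

def select_index(xs, alpha, b_n):
    """Largest k in 1..b_n with V(h_{n,k} : -i, i) < 4*alpha(i) for all
    1 <= i <= k, as in (3); None if no such k exists."""
    for k in range(b_n, 0, -1):
        if all(histogram_variation(xs, k, i) < 4 * alpha(i) for i in range(1, k + 1)):
            return k
    return None

def estimate(xs, alpha, b_n):
    """Return phi*_n as a Python function of x; identically zero if (3) fails."""
    k = select_index(xs, alpha, b_n)
    if k is None:
        return lambda x: 0.0
    counts, n = cell_counts(xs, k), len(xs)
    return lambda x: 2**k * counts[math.floor(x * 2**k)] / n

if __name__ == "__main__":
    # Deterministic, evenly spread points in [0, 1): an illustrative input only.
    xs = [(j + 0.5) / 1024 for j in range(1024)]
    phi = estimate(xs, lambda i: 1.0, 5)
    print(phi(0.3), phi(1.5))
```

This version recomputes the cell counts for every pair $(k, i)$, which keeps the code short at the cost of efficiency; caching the counts per $k$ would be a natural refinement.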
Remark: The variations used to define $\phi^*_n$ depend on the cumulative difference between the relative frequencies of adjacent cells. Since $\hat{h}_{n,k}$ takes the value $2^k \hat{\mu}_n(A_{k,j})$ on the cell $A_{k,j}$,
$$V(\hat{h}_{n,k} : -i, i) = 2^{k} \sum_{j=-i2^k}^{i2^k - 2} |\hat{\mu}_n(A_{k,j}) - \hat{\mu}_n(A_{k,j+1})|. \tag{5}$$
To find $\phi^*_n$, put $x_1, \ldots, x_n$ in increasing order, and then calculate $V(\hat{h}_{n,k} : -i, i)$ for each $k = 1, \ldots, b_n$ and each $i = 1, \ldots, k$ by scanning the ordered $x_i$ from left to right. This will require at most $O(n \log n + n b_n)$ operations.

In order to apply the procedure $\Phi^*$ described in (2)-(4), one must know before seeing $\mathbf{x}$ that the variation of its limiting density is less than a known constant on every interval of the form $[-i, i)$. The following result shows that this requirement cannot be materially weakened.

Theorem 2 Let $\mathcal{F}$ be the collection of densities $f$ supported on $[0, 1]$ for which $V(f : 0, 1)$ is finite. There is no $L_1$ consistent density estimation scheme for
$$\Omega = \bigcup_{f \in \mathcal{F}} \Omega(f).$$
In particular, there is no universal density estimation scheme for individual sequences.

If an upper bound on the variation of the unknown density $f$ were known, the scheme of Theorem 1 would provide consistent estimates of $f$. Given any density estimation scheme $\Phi = \{\phi_n\}$, the proof of Theorem 2 shows how one may construct a stationary sequence $\mathbf{x}$, depending on $\Phi$, for which $\phi_n(\cdot)$ fails to converge. A related argument is used by Adams and Nobel [1] to show that there is no universal density estimation scheme for stationary ergodic processes. As a universal density estimation scheme for individual sequences would, by virtue of Proposition 1, yield a universal scheme for ergodic processes, their result also implies Theorem 2.

The proof of Theorem 1 is given in the next section after several preliminary results. The proof of Theorem 2 is given in Section 4.
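Returning to the computational remark earlier in this section, the sort-then-scan idea can be sketched as follows. This is an illustrative assumption about how the scan might be organized, not code from the paper: the name `variation_by_scan` is hypothetical, the input is assumed to be already sorted, and the scale factor $2^k/n$ is taken from definition (2), under which $\hat{h}_{n,k}$ equals $2^k \hat{\mu}_n(A_{k,j})$ on $A_{k,j}$.

```python
from bisect import bisect_left

def variation_by_scan(sorted_xs, k, i):
    """Evaluate V(h_{n,k} : -i, i) with one left-to-right pass over sorted data.

    Assumes sorted_xs is in increasing order (e.g. the result of sorted(xs))."""
    n = len(sorted_xs)
    pos = bisect_left(sorted_xs, -i)        # skip points lying to the left of -i
    prev_count, jump_sum = None, 0
    for j in range(-i * 2**k, i * 2**k):    # cells A_{k,j} contained in [-i, i)
        right = (j + 1) / 2**k              # right endpoint of A_{k,j}
        count = 0
        while pos < n and sorted_xs[pos] < right:
            count += 1                      # point falls in the current cell
            pos += 1
        if prev_count is not None:
            jump_sum += abs(count - prev_count)
        prev_count = count
    return 2**k * jump_sum / n              # rescale counts to histogram heights

if __name__ == "__main__":
    data = sorted([0.7, 0.1, 0.8, 0.2, 0.6])
    print(variation_by_scan(data, 1, 1))
```

Each call touches every sample point in $[-i, i)$ once and every cell once, so after a single initial sort the remaining work per pair $(k, i)$ is linear in the number of points plus the number of cells.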
3 Proof of Theorem 1

Definition: For each partition $\pi$ of $\mathbb{R}$ into finite intervals and each $f \in L_1$ define
$$(f \circ \pi)(x) = \frac{1}{l(\pi[x])} \int_{\pi[x]} f(u) \, du,$$
where $l(A)$ denotes the length of an interval $A$. Note that $f \circ \pi$ is piecewise constant on the cells of $\pi$.

Lemma 1 Let $\pi_1, \pi_2, \ldots$ be the partitions used to define the estimates $\phi^*_n$. For each pair of integers $k, i \geq 1$,
$$V(f \circ \pi_k : -i, i) \leq 3 V(f : -i, i).$$
Moreover, if $\mathbf{x} \in \Omega(f)$ then
$$\lim_{n \to \infty} V(\hat{h}_{n,k} : -i, i) = V(f \circ \pi_k : -i, i).$$

Proof: For $f$ non-decreasing it is immediate that $V(f \circ \pi_k : -i, i) \leq V(f : -i, i)$. If $V(f : -i, i) = C < \infty$ then $f(x) = u(x) - v(x)$ where $u(\cdot)$ and $v(\cdot)$ are non-decreasing, $V(u : -i, i) \leq C$ and $V(v : -i, i) \leq 2C$ (cf. Kolmogorov and Fomin [11]). It follows from the definition that $f \circ \pi_k = u \circ \pi_k - v \circ \pi_k$, and since $u$ and $v$ are non-decreasing, so are $u \circ \pi_k$ and $v \circ \pi_k$. Therefore
$$V(f \circ \pi_k : -i, i) = V(u \circ \pi_k - v \circ \pi_k : -i, i) \leq V(u \circ \pi_k : -i, i) + V(v \circ \pi_k : -i, i) \leq V(u : -i, i) + V(v : -i, i) \leq 3C,$$
as the variation of a sum is at most the sum of the variations. To establish the second claim, note that as $n \to \infty$
$$V(\hat{h}_{n,k} : -i, i) = 2^{k} \sum_{j=-i2^k}^{i2^k - 2} |\hat{\mu}_n(A_{k,j}) - \hat{\mu}_n(A_{k,j+1})| \to 2^{k} \sum_{j=-i2^k}^{i2^k - 2} |\mu_f(A_{k,j}) - \mu_f(A_{k,j+1})| = V(f \circ \pi_k : -i, i). \quad \Box$$

Lemma 2 Let $\mathbf{x} \in \Omega(\alpha)$ with limiting density $f \in \mathcal{F}(\alpha)$. Then the partition index $k_n$ of the density estimate $\phi^*_n$ tends to infinity with $n$.

Proof: By Lemma 1, for arbitrary $K \geq 1$ and for all $i = 1, \ldots, K$,
$$\lim_{n \to \infty} V(\hat{h}_{n,K} : -i, i) = V(f \circ \pi_K : -i, i) \leq 3 V(f : -i, i) < 3\alpha(i).$$
Thus, since $b_n \to \infty$, the definition of $k_n$ gives $\liminf_{n \to \infty} k_n \geq K$. $\Box$

Proof of Theorem 1: Let $\mathbf{x} \in \Omega(\alpha)$ be a fixed stationary sequence with limiting density $f \in \mathcal{F}(\alpha)$.
For each $n \geq 1$ such that $k_n \geq 1$ define the error function
$$g_n(x) = \phi^*_n(x : x_1, \ldots, x_n) - f(x) = \hat{h}_{n,k_n}(x) - f(x),$$
and note that for all $1 \leq i \leq k_n$,
$$V(g_n : -i, i) \leq V(\phi^*_n : -i, i) + V(f : -i, i) < 5\alpha(i). \tag{6}$$
Fix $\epsilon > 0$. Select an integer $L \geq 1$ such that
$$\int_{|x| \geq L} f(x) \, dx \leq \epsilon \tag{7}$$
and define
$$\delta = \frac{\epsilon}{L}. \tag{8}$$
Finally, choose an integer $K \geq 1$ so large that
$$2^{-K} < \frac{\epsilon \delta}{\alpha(L)(50\alpha(L) + 5\delta)}. \tag{9}$$
As $\mathbf{x} \in \Omega(f)$ and the partitions $\pi_k$ are nested, there exists an integer $N = N(\mathbf{x}, \epsilon, f, \alpha)$ such that for $n \geq N$ one has $k_n \geq \max\{K, L\}$,
$$\left| \int_A g_n(x) \, dx \right| = |\hat{\mu}_n(A) - \mu_f(A)| < \frac{\delta}{2} \cdot 2^{-K} \tag{10}$$
for $A \in \pi_K$ with $A \subseteq [-L, L)$, and
$$|\hat{\mu}_n\{|x| \geq L\} - \mu_f\{|x| \geq L\}| \leq \epsilon. \tag{11}$$
For each $n$ let
$$H_n = \{x \in \mathbb{R} : |g_n(x)| > \delta\}$$
contain those points having large error, and let
$$\mathcal{H}_n = \{A \in \pi_K : A \cap H_n \neq \emptyset, \; A \subseteq [-L, L)\}.$$
Fix $n \geq N$ and consider a set $A \in \mathcal{H}_n$. By definition, there exists a point $x \in A$ such that $|g_n(x)| > \delta$. Assume for the moment that $g_n(x) > \delta$. It follows from (10) that there is a point $y \in A$ such that $g_n(y) < \delta/2$, and therefore
$$\sup_{x, y \in A} |g_n(x) - g_n(y)| > \delta/2. \tag{12}$$
As $k_n \geq L$ the variation of $g_n$ on $A$ is less than $5\alpha(L)$ by (6), so that for each $z \in A$,
$$g_n(z) \leq g_n(y) + 5\alpha(L) \leq \frac{\delta}{2} + 5\alpha(L),$$
and
$$g_n(z) \geq g_n(x) - 5\alpha(L) \geq \frac{\delta}{2} - 5\alpha(L).$$
Therefore,
$$\sup_{z \in A} |g_n(z)| \leq \frac{\delta}{2} + 5\alpha(L). \tag{13}$$
A similar argument in the case $g_n(x) < -\delta$ shows that both (12) and (13) hold for each $A \in \mathcal{H}_n$. It is immediate from (12) that
$$\frac{\delta}{2} |\mathcal{H}_n| \leq V(g_n : -L, L) < 5\alpha(L),$$
and consequently
$$|\mathcal{H}_n| < \frac{10\alpha(L)}{\delta}. \tag{14}$$
For each $n \geq N$ the integrated error between $\phi^*_n$ and $f$ may be decomposed as follows:
$$\int |\phi^*_n(x) - f(x)| \, dx \leq \sum_{A \in \mathcal{H}_n} \int_A |g_n(x)| \, dx + \sum_{A \notin \mathcal{H}_n, \, A \subseteq [-L, L)} \int_A |g_n(x)| \, dx + \int_{|x| \geq L} |g_n(x)| \, dx \triangleq \Theta_1 + \Theta_2 + \Theta_3.$$
Inequalities (13), (14) and (9) imply that
$$\Theta_1 \leq \sum_{A \in \mathcal{H}_n} \int_A \left( \frac{\delta}{2} + 5\alpha(L) \right) dx \leq \left( 5\alpha(L) + \frac{\delta}{2} \right) \frac{10\alpha(L)}{\delta} \, 2^{-K} \leq \epsilon,$$
and by virtue of (8),
$$\Theta_2 \leq \int_{[-L, L)} \delta \, dx = \delta \cdot 2L = 2\epsilon.$$
Finally, it follows from (7) and (11) that
$$\Theta_3 \leq \hat{\mu}_n\{|x| \geq L\} + \mu_f\{|x| \geq L\} \leq 3\epsilon.$$
Combining these three bounds shows that
$$\limsup_{n \to \infty} \int |\phi^*_n(x) - f(x)| \, dx \leq 6\epsilon,$$
and as $\epsilon$ was arbitrary, the desired $L_1$ convergence of $\phi^*_n$ to $f$ follows. $\Box$

4 Proof of Theorem 2

The following result can be established by a straightforward extension of the Glivenko-Cantelli Theorem, or by a bracketing argument (cf. Pollard [15]).

Lemma 3 Let $\mathcal{A}$ be the collection of all (finite and infinite) intervals in $\mathbb{R}$. If $\mathbf{x} \in \Omega(f)$ then
$$\sup_{A \in \mathcal{A}} |\hat{\mu}_n(A) - \mu_f(A)| \to 0.$$

Proof of Theorem 2: Consider the family $\mathcal{F}_0 = \{h_1, h_2, \ldots\} \subseteq \mathcal{F}$ of Rademacher densities where
$$h_k(x) = \begin{cases} 2 & \text{if } 2j \, 2^{-k} \leq x < (2j+1) \, 2^{-k} \text{ for some } 0 \leq j < 2^{k-1}, \\ 0 & \text{otherwise.} \end{cases}$$
Note that each $h_j$ is supported on $[0, 1]$ and that $\int |h_j(x) - h_k(x)| \, dx = 1$ whenever $j \neq k$. Let $\mu_k$ be the probability measure having density $h_k$, and for each finite sequence $u_1, \ldots, u_m \in [0, 1]$ let
$$\Delta_k(u_1, \ldots, u_m) = \sup_{A \in \mathcal{A}} \left| \frac{1}{m} \sum_{j=1}^{m} I_A(u_j) - \mu_k(A) \right|$$
measure the distance between $\mu_k$ and the empirical measure of $u_1, \ldots, u_m$.

We show that if $\Phi$ is consistent for $\mathcal{F}_0$ then there is a stationary sequence $\mathbf{x}^*$ whose limiting density is identically one on $[0, 1]$, but is such that $\phi_n(\cdot : x^*_1, \ldots, x^*_n)$ fails to have a limit in $L_1$. For each $k \geq 1$ select a sequence $\mathbf{x}^{(k)} = (x^{(k)}_1, x^{(k)}_2, \ldots) \in \Omega(h_k)$ (e.g. a typical sample sequence from an i.i.d. process with density $h_k$), and define
$$m_k = \min \left\{ M : \sup_{m \geq M} \Delta_k(x^{(k)}_1, \ldots, x^{(k)}_m) \leq \frac{1}{k+1} \right\}.$$
Lemma 3 ensures that $m_k$ exists and is finite.

Fix any procedure $\Phi = \{\phi_1, \phi_2, \ldots\}$ that is consistent for $\mathcal{F}_0$ and consider the infinite sequence $\mathbf{x}^{(1)}$. As $h_1 \in \mathcal{F}_0$,
$$\int |\phi_n(x : x^{(1)}_1, \ldots, x^{(1)}_n) - h_1(x)| \, dx \to 0$$
as $n \to \infty$. Therefore there is an integer $n_1 \geq m_2$ and a corresponding initial segment $y^{(1)} = x^{(1)}_1, \ldots, x^{(1)}_{n_1}$ of $\mathbf{x}^{(1)}$ such that
$$\int |\phi_{n_1}(x : y^{(1)}) - h_1(x)| \, dx \leq \frac{1}{4} \quad \text{and} \quad \Delta_1(y^{(1)}) \leq \frac{1}{2}.$$
Now suppose that one has constructed a sequence $y^{(k)}$ of finite length $n_k$ from initial segments of $\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(k)}$ such that
$$\int |\phi_{n_k}(x : y^{(k)}) - h_k(x)| \, dx \leq 1/4, \tag{15}$$
$$\Delta_k(y^{(k)}) \leq (k+1)^{-1}, \tag{16}$$
and
$$n_k \geq k \cdot m_{k+1}. \tag{17}$$
As $y^{(k)}$ is finite, the concatenation $y^{(k)} \cdot \mathbf{x}^{(k+1)}$ is contained in $\Omega(h_{k+1})$. It follows from the consistency of $\Phi$ and Lemma 3 that when $n$ is large enough each initial segment
$$y^{(k+1)} = y^{(k)} \cdot (x^{(k+1)}_1, \ldots, x^{(k+1)}_{n - n_k})$$
of $y^{(k)} \cdot \mathbf{x}^{(k+1)}$ satisfies (15) and (16) with $k$ replaced by $k+1$. Select $n_{k+1} > n_k$ so large that the same is true of (17). As $y^{(k+1)}$ is a proper extension of $y^{(k)}$, repeating the above process indefinitely yields an infinite sequence $\mathbf{x}^*$.

By construction, the functions $\phi_n(\cdot) = \phi_n(\cdot : x^*_1, \ldots, x^*_n)$ do not converge in $L_1$. Indeed, it follows from (15) and the triangle inequality that $\int |\phi_{n_k} - \phi_{n_l}| \, dx \geq 1/2$ whenever $k \neq l$.

It remains to show that the limiting density of $\mathbf{x}^*$ is uniform on $[0, 1]$. To this end, fix $k \geq 1$ and let $A \subseteq [0, 1]$ be an interval of length $l(A)$. It is easily verified that
$$|\mu_k(A) - l(A)| \leq 2^{-k+1} \leq \frac{1}{k}. \tag{18}$$
Let $\hat{\mu}_n(A)$ be the empirical distribution of $A$ under $x^*_1, \ldots, x^*_n$, and for each $1 \leq r \leq n_{k+1} - n_k$ define
$$\hat{\mu}'_{r,k}(A) = \frac{1}{r} \sum_{j=n_k+1}^{n_k+r} I_A(x^*_j).$$
It follows from the equation
$$\hat{\mu}_{n_k+r}(A) = \frac{n_k}{n_k+r} \cdot \hat{\mu}_{n_k}(A) + \frac{r}{n_k+r} \cdot \hat{\mu}'_{r,k}(A)$$
that the difference
$$|\hat{\mu}_{n_k+r}(A) - l(A)| \leq \frac{n_k}{n_k+r} \cdot |\hat{\mu}_{n_k}(A) - l(A)| + \frac{r}{n_k+r} \cdot |\hat{\mu}'_{r,k}(A) - l(A)| \triangleq I + II.$$
By virtue of (16) and (18),
$$I \leq |\hat{\mu}_{n_k}(A) - \mu_k(A)| + |l(A) - \mu_k(A)| \leq \frac{1}{k+1} + \frac{1}{k}.$$
If $n_{k+1} - n_k \geq r \geq m_{k+1}$ then
$$\Delta_{k+1}(x^*_{n_k+1}, \ldots, x^*_{n_k+r}) = \Delta_{k+1}(x^{(k+1)}_1, \ldots, x^{(k+1)}_r) \leq \frac{1}{k+2}$$
and therefore
$$II \leq |\hat{\mu}'_{r,k}(A) - \mu_{k+1}(A)| + |\mu_{k+1}(A) - l(A)| \leq \frac{1}{k+2} + \frac{1}{k+1}.$$
On the other hand, if $1 \leq r < m_{k+1}$ then (17) implies that
$$II \leq \frac{2r}{n_k + r} \leq \frac{2r}{kr + r} = \frac{2}{k+1}.$$
These bounds ensure that
$$\max\{|\hat{\mu}_n(A) - l(A)| : n_k < n \leq n_{k+1}\} \leq \frac{4}{k},$$
and consequently
$$\lim_{n \to \infty} |\hat{\mu}_n(A) - l(A)| = 0.$$
As $A \in \mathcal{A}$ was arbitrary, $\mathbf{x}^*$ is stationary with limiting density $f(x) = 1$ on $[0, 1]$. $\Box$

Acknowledgments

The authors wish to thank László Györfi for his helpful comments and suggestions.

References

[1] T.M. Adams and A.B. Nobel. On density estimation from ergodic processes. To appear in Ann. Probab., 1997.
[2] I.A. Ahmad. Strong consistency of density estimation by orthogonal series methods for dependent variables with applications. Ann. Inst. Statist. Math., 31:279-288, 1979.
[3] J.V. Castellana and M.R. Leadbetter. On smoothed probability density estimation for stationary processes. Stoch. Proc. Appl., 21:179-193, 1986.
[4] L. Devroye and L. Györfi. Nonparametric Density Estimation: the L1-view. John Wiley, New York, 1985.
[5] L. Györfi. Strongly consistent density estimate from ergodic sample. J. Multivariate Analysis, 11:81-84, 1981.
[6] L. Györfi and G. Lugosi.
Kernel density estimation from ergodic sample is not universally consistent. Comput. Stat. Data Anal., 14:437-442, 1992.
[7] L. Györfi and E. Masry. The L1 and L2 strong consistency of recursive kernel density estimation from dependent samples. IEEE Trans. Inform. Theory, 36:531-539, 1990.
[8] L. Györfi, W. Härdle, P. Sarda, and P. Vieu. Nonparametric Curve Estimation from Time Series. Springer-Verlag, Berlin, 1989.
[9] P. Hall and J.D. Hart. Convergence rates in density estimation for data from infinite-order moving average processes. Probab. Th. Rel. Fields, 87:253-274, 1990.
[10] D. Haussler, J. Kivinen, and M. Warmuth. Tight worst-case loss bounds for predicting with expert advice. Proc. European Conference on Computational Learning Theory, 1994.
[11] A.N. Kolmogorov and S.V. Fomin. Introductory Real Analysis. Dover, Mineola, 1970.
[12] S.R. Kulkarni and S.E. Posner. Rates of convergence for nearest neighbor estimation under arbitrary sampling. IEEE Trans. on Information Theory, IT-41:1028-1039, 1995.
[13] N. Merhav, M. Feder, and M. Gutman. Universal prediction of individual sequences. IEEE Trans. on Information Theory, IT-38:1258-1270, 1992.
[14] G. Morvai, S. Kulkarni, and A.B. Nobel. Regression estimation from an individual sequence. Statistics, 33 (1999), no. 2, 99–118.
[15] D. Pollard. Convergence of Stochastic Processes. Springer-Verlag, New York, 1984.
[16] M. Rosenblatt. Density estimates and Markov sequences. In Nonparametric Techniques in Statistical Inference, M. Puri editor, pages 199-213. Cambridge Univ. Press, London, 1970.
[17] G. Roussas. Nonparametric estimation in Markov processes. Ann. Inst. Statist. Math., 21:73-87, 1967.
[18] D.W. Scott. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, 1992.
[19] B.W.
Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall, London, 1986.
[20] M.P. Wand and M.C. Jones. Kernel Smoothing. Chapman and Hall, London, 1995.
[21] S. Yakowitz. Nonparametric density and regression estimation for Markov sequences without mixing assumptions. J. Multivar. Analysis, 30:124-136, 1989.
[22] J. Ziv. Coding theorems for individual sequences. IEEE Trans. on Information Theory, IT-24:405-412, 1978.
