A quantile-copula approach to conditional density estimation
We present a new non-parametric estimator of the conditional density of the kernel type. It is based on an efficient transformation of the data by quantile transform. By use of the copula representation, it turns out to have a remarkable product form…
Authors: ** Olivier P. Faugeras (Université Paris‑Sud, LSTA) **
Olivier P. Faugeras
L.S.T.A, Université Paris 6, 175, rue du Chevaleret, 75013 Paris, France
Tel: +(33) 1 44 27 85 62, Fax: +(33) 1 44 27 33 42
Email address: olivier.faugeras@gmail.com (Olivier P. Faugeras).
Preprint submitted to Elsevier, November 1, 2018

Abstract
We present a new nonparametric estimator of the conditional density of the kernel type. It is based on an efficient transformation of the data by quantile transform. By use of the copula representation, it turns out to have a remarkable product form. We study its asymptotic properties and compare its bias and variance to competitors based on nonparametric regression. A comparative numerical simulation is provided.

Key words: conditional density, kernel estimation, copula, quantile transform, nonparametric regression
1991 MSC: 62G07, 62M20, 62M10

1 Introduction

1.1 Motivation

Let $((X_i, Y_i);\ i = 1, \ldots, n)$ be an independent identically distributed sample from real-valued random variables $(X, Y)$ sitting on a given probability space. For predicting the response $Y$ of the input variable $X$ at a given location $x$, it is of great interest to estimate not only the conditional mean or regression function $E(Y \mid X = x)$, but the full conditional density $f(y \mid x)$. Indeed, estimating the conditional density is much more informative, since it allows one not only to recalculate the conditional expected value $E(Y \mid X)$ and the conditional variance from the density, but also to provide the general shape of the conditional density. This is especially important for multi-modal or skewed densities, which often arise from nonlinear or non-Gaussian phenomena, where the expected value might be nowhere near a mode, i.e. the most likely value to appear.
Moreover, for situations in which confidence intervals are preferred to point estimates, the estimated conditional density is an object of obvious interest.

1.2 Estimation by kernel smoothing

A natural approach to estimate the conditional density $f(y \mid x)$ of $Y$ given $X = x$ would be to exploit the identity
$$f(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)} \qquad (1)$$
where $f_{XY}$ and $f_X$ denote the joint density of $(X, Y)$ and the density of $X$, respectively. By introducing Parzen-Rosenblatt kernel estimators of these densities, namely
$$\hat f_{n,XY}(x, y) := \frac{1}{n} \sum_{i=1}^{n} K'_{h'}(X_i - x)\, K_h(Y_i - y), \qquad \hat f_{n,X}(x) := \frac{1}{n} \sum_{i=1}^{n} K'_{h'}(X_i - x),$$
where $K_h(\cdot) = (1/h)\, K(\cdot / h)$ and $K'_{h'}(\cdot) = (1/h')\, K'(\cdot / h')$ are (rescaled) kernels with their associated sequences of bandwidths $h = h_n$ and $h' = h'_n$ going to zero as $n \to \infty$, one can construct the quotient
$$\hat f^{R}_n(y \mid x) := \frac{\hat f_{n,XY}(x, y)}{\hat f_{n,X}(x)}$$
and obtain an estimator of the conditional density. Such an estimator was first studied by Rosenblatt [26], and more recently by Hyndman et al. [17], who slightly improved on Rosenblatt's kernel based estimator.

1.3 Estimation by regression techniques

As pointed out by numerous authors, see e.g. Fan and Yao [7], chapter 6, this approach is equivalent to the one arising from considering this conditional density estimation problem in a regression framework. Indeed, let $F(y \mid x)$ be the cumulative conditional distribution function of $Y$ given $X = x$. It stems from the fact that
$$E\left(1_{|Y - y| \le h} \mid X = x\right) = F(y + h \mid x) - F(y - h \mid x) \approx 2h\, f(y \mid x)$$
as $h \to 0$, that, if one replaces the expectation in the above expression by its empirical counterpart, one can apply the usual local averaging methods and perform a regression estimation on the synthetic data $\left((1/2h)\, 1_{|Y_i - y| \le h};\ i = 1, \ldots, n\right)$.
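The local averaging idea just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the Gaussian kernel used in the $x$ direction and the bandwidth arguments `h_x`, `h_y` are choices made here, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel (an illustrative choice for the x direction)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def local_average_conditional_density(x, y, X, Y, h_x, h_y):
    """Nadaraya-Watson average of the synthetic data (1/2h) 1{|Y_i - y| <= h}.

    X, Y are the observed samples; h_x and h_y are hypothetical bandwidths
    for the x and y directions respectively.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Kernel weights measuring how close each X_i is to the location x.
    weights = gaussian_kernel((X - x) / h_x) / h_x
    # Synthetic responses: rescaled indicators of |Y_i - y| <= h_y.
    synthetic = (np.abs(Y - y) <= h_y) / (2.0 * h_y)
    # Ratio form: weighted average of the synthetic responses.
    return np.sum(weights * synthetic) / np.sum(weights)
```

The division in the last line is the random denominator that the estimator proposed in this paper is designed to avoid.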
By a Bochner type theorem, one can even replace the transformed data by its smoothed version
$$Y'_i := K_h(Y_i - y) := \frac{1}{h} K\left(\frac{Y_i - y}{h}\right).$$
In particular, the popular Nadaraya-Watson regression estimator
$$\hat f^{NW}_n(y \mid x) := \frac{\sum_{i=1}^{n} Y'_i\, K'_{h'}(X_i - x)}{\sum_{i=1}^{n} K'_{h'}(X_i - x)}$$
reduces to the same estimator of the conditional density of the double kernel type as before,
$$\hat f^{NW}_n(y \mid x) = \frac{\sum_{i=1}^{n} K_h(Y_i - y)\, K'_{h'}(X_i - x)}{\sum_{i=1}^{n} K'_{h'}(X_i - x)} = \hat f^{R}_n(y \mid x).$$
Taking advantage of this regression formulation, Fan, Yao and Tong [8] proposed a conditional density estimator which generalizes the kernel one by use of local polynomial techniques. In particular, it allows one to tackle the bias issues of kernel smoothing. However, and unlike the former, it is no longer guaranteed to take positive values nor to integrate to 1 with respect to $y$. With these issues in mind, Hyndman and Yao [18] built on local polynomial techniques and suggested two improved methods, the first one based on locally fitting a log-linear model and the second one on constrained local polynomial modeling. An overview can be found in Fan and Yao [7] (chapters 6 and 10). Very recently, Györfi and Kohler [15] studied a partitioning type estimate and its properties in total variation norm, and Lacour [20] a projection-type estimate for Markov chains.

1.4 A product shaped estimator

However, these two equivalent approaches suffer from several drawbacks: first, by its form as a quotient of two estimators, the probabilistic behavior of the Nadaraya-Watson estimator (or its local polynomial counterpart) is tricky to study. It is usually dealt with by centering both numerator and denominator at their expectations and linearizing the inverse, see e.g. [7] or [1] for details.
Second, at a conceptual level, one could argue that implementing regression estimation techniques in this setting is, in a sense, unnatural: estimating a density, even if it is a conditional one, should resort to density estimation techniques only. Finally, practical implementations of these estimators can lead to numerical instability when the denominator is close to zero.

To remedy these problems, we propose an estimator which builds on the idea of using synthetic data, i.e. a representation of the data more adapted to the problem than the original one. By transforming the data by quantile transforms and making use of the copula function, the estimator turns out to have a remarkable product form
$$\hat f_n(y \mid x) = \hat f_Y(y)\, \hat c_n(F_n(x), G_n(y))$$
where $\hat f_Y$, $\hat c_n$, $F_n(x)$, $G_n(y)$ are estimators of the density $f_Y$ of $Y$, the copula density $c$, the c.d.f. $F$ of $X$ and the c.d.f. $G$ of $Y$ respectively (see the next section for definitions). Its study then turns out to be particularly simple: it reduces to the ones already done on nonparametric density estimation.

The rest of the paper is organized as follows: in section 2, we introduce the quantile transform and the copula representation which leads to the definition of our estimator. In section 3, the main asymptotic results are established; they are compared in section 4 to those of other competitors. Proofs are mainly based on a series of auxiliary lemmas which are given in section 5.

2 Presentation of the estimator

For the sake of simplicity and clarity of exposition, we limit ourselves to unidimensional real valued input variables $X$. However, all the results of this article can be easily extended to the multivariate case.

2.1 The quantile transform

The idea of transforming the data is not new. It has been used to improve the range of applicability and performance of classical estimation techniques, e.g. to deal with skewed data, heavy tails, or restrictions on the support (see e.g.
Devroye and Lugosi [6], chapter 14 and the references therein, and also Van der Vaart [35], chapter 3.2 for the related topic of variance stabilizing transformations in a parametric context). In order to make inference on $Y$ from $X$, a natural question which then arises is what the best transformation is, if this question makes sense. As one can note from the above references, the best transformation is closely linked to the distribution of the underlying data. We will see below that, for our problem, the natural candidate is the quantile transform.

The quantile transform is a well-known probabilistic trick which is used to reduce proofs, e.g. in empirical process theory, for arbitrary real valued random variables $X$ to ones for random variables $U$ uniformly distributed on the interval $[0, 1]$. It is based on the following well-known fact: whenever $F$ is continuous, the random variable $U = F(X)$ is uniformly distributed on $(0, 1)$ and conversely, when $F$ is arbitrary, if $U$ is a uniformly distributed random variable on $(0, 1)$, then $X$ is equal in law to $F^{-1}(U)$, where $F^{-1} = Q$ is the generalized inverse or quantile function of $X$ (see e.g. [28], chapter 1).

As a consequence, given a sample $(X_1, \ldots, X_n)$ of random variables with common continuous c.d.f. $F$ sitting on a probability space $(\Omega, \mathcal{A}, P)$, one can always enlarge this probability space to carry a sequence $(U_1, \ldots, U_n)$ of uniform $(0, 1)$ random variables such that $U_i = F(X_i)$, that is to say to construct a pseudo-sample with a prescribed uniform marginal distribution.

2.2 The copula representation

Formally, a copula is a bi-(or multi)variate distribution function whose marginal distribution functions are uniform on the interval $[0, 1]$.
Indeed, Sklar [29] proved the following fundamental result:

Theorem 2.1 For any bivariate cumulative distribution function $F_{X,Y}$ on $\mathbb{R}^2$, with marginal cumulative distribution functions $F$ of $X$ and $G$ of $Y$, there exists some function $C : [0, 1]^2 \to [0, 1]$, called the dependence or copula function, such that
$$F_{X,Y}(x, y) = C(F(x), G(y)), \qquad -\infty \le x, y \le +\infty. \qquad (2)$$
If $F$ and $G$ are continuous, this representation is unique with respect to $(F, G)$. The copula function $C$ is itself a cumulative distribution function on $[0, 1]^2$ with uniform marginals.

This theorem gives a representation of the bivariate c.d.f. as a function of each univariate c.d.f. In other words, the copula function captures the dependence structure among the components $X$ and $Y$ of the vector $(X, Y)$, irrespective of the marginal distributions $F$ and $G$. Simply put, it allows one to deal with the randomness of the dependence structure and the randomness of the marginals separately.

Copulas appear to be naturally linked with the quantile transform, as formula (2) entails that $C(u, v) = F_{X,Y}(F^{-1}(u), G^{-1}(v))$. For more details regarding copulas and their properties, one can consult for example the book of Joe [19]. Copulas have witnessed a renewed interest in statistics, especially in finance, since the pioneering work of Deheuvels [4], who introduced the empirical copula process. Weak convergence of the empirical copula process was investigated by Deheuvels [5], Van der Vaart and Wellner [36], and Fermanian, Radulovic and Wegkamp [11]. For the estimation of the copula density, refer to Gijbels and Mielniczuk [13], Fermanian [9] and Fermanian and Scaillet [10].

From now on, we assume that the copula function $C(u, v)$ has a density $c(u, v)$ with respect to the Lebesgue measure on $[0, 1]^2$ and that $F$ and $G$ are strictly increasing and differentiable with densities $f$ and $g$.
$C(u, v)$ and $c(u, v)$ are then the cumulative distribution function (c.d.f.) and density respectively of the transformed variables $(U, V) = (F(X), G(Y))$. By differentiating formula (2), we get for the joint density
$$f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y} = f(x)\, g(y)\, c(F(x), G(y))$$
where $c(u, v) := \frac{\partial^2 C(u, v)}{\partial u\, \partial v}$ is the above mentioned copula density. Eventually, we can obtain the following explicit formula for the conditional density:
$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f(x)} = g(y)\, c(F(x), G(y)). \qquad (3)$$

2.3 Construction of the estimator

Starting from the previously stated product type formula (3), a natural plug-in approach to build an estimator of the conditional density is to use
• a Parzen-Rosenblatt kernel type nonparametric estimator of the marginal density $g$ of $Y$,
$$\hat g_n(y) := \frac{1}{n h_n} \sum_{i=1}^{n} K_0\left(\frac{y - Y_i}{h_n}\right);$$
• the empirical distribution functions $F_n(x)$ and $G_n(y)$ for $F(x)$ and $G(y)$ respectively,
$$F_n(x) := \frac{1}{n} \sum_{j=1}^{n} 1_{X_j \le x} \quad \text{and} \quad G_n(y) := \frac{1}{n} \sum_{j=1}^{n} 1_{Y_j \le y}.$$

Concerning the copula density $c(u, v)$, we noted that $c(u, v)$ is the joint density of the transformed variables $(U, V) = (F(X), G(Y))$. Therefore, $c(u, v)$ can be estimated by the bivariate Parzen-Rosenblatt kernel type nonparametric density (pseudo) estimator
$$c_n(u, v) := \frac{1}{n a_n b_n} \sum_{i=1}^{n} K\left(\frac{u - U_i}{a_n}, \frac{v - V_i}{b_n}\right) \qquad (4)$$
where $K$ is a bivariate kernel and $a_n$, $b_n$ its associated bandwidths. For simplicity, we restrict ourselves to product kernels, i.e. $K(u, v) = K_1(u)\, K_2(v)$, with the same bandwidths $a_n = b_n$.

Nonetheless, since $F$ and $G$ are unknown, the random variables $(U_i, V_i)$ are not observable, i.e. $c_n$ is not a true statistic. Therefore, we approximate the pseudo-sample $(U_i, V_i)$, $i = 1, \ldots, n$, by its empirical counterpart $(F_n(X_i), G_n(Y_i))$, $i = 1, \ldots, n$.
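The substitution just described, namely formula (4) with a product kernel, a common bandwidth, and $(U_i, V_i)$ replaced by $(F_n(X_i), G_n(Y_i))$, can be sketched as follows. The function names and the choice of the Epanechnikov kernel for both $K_1$ and $K_2$ are assumptions made for this illustration only.

```python
import numpy as np

def empirical_cdf_at_sample(sample):
    """Return F_n(X_i) = (1/n) #{j : X_j <= X_i} for every sample point."""
    sample = np.asarray(sample, dtype=float)
    sorted_sample = np.sort(sample)
    # side="right" counts the observations that are <= each sample value.
    return np.searchsorted(sorted_sample, sample, side="right") / sample.size

def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1] (illustrative choice)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def copula_density_estimate(u, v, X, Y, a):
    """Kernel estimate of the copula density at (u, v) from pseudo-observations.

    `a` is a hypothetical common bandwidth for both coordinates.
    """
    U_hat = empirical_cdf_at_sample(X)  # approximates U_i = F(X_i)
    V_hat = empirical_cdf_at_sample(Y)  # approximates V_i = G(Y_i)
    n = U_hat.size
    kernel_products = epanechnikov((u - U_hat) / a) * epanechnikov((v - V_hat) / a)
    return np.sum(kernel_products) / (n * a**2)
```

For independent $X$ and $Y$ the copula density is identically 1 on $[0, 1]^2$, so away from the boundary the estimate should fluctuate around 1.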
We therefore obtain a genuine estimator of $c(u, v)$,
$$\hat c_n(u, v) := \frac{1}{n a_n^2} \sum_{i=1}^{n} K_1\left(\frac{u - F_n(X_i)}{a_n}\right) K_2\left(\frac{v - G_n(Y_i)}{a_n}\right). \qquad (5)$$
Eventually, the conditional density estimator is written as
$$\hat f_n(y \mid x) := \left[\frac{1}{n h_n} \sum_{i=1}^{n} K_0\left(\frac{y - Y_i}{h_n}\right)\right] \cdot \left[\frac{1}{n a_n^2} \sum_{i=1}^{n} K_1\left(\frac{F_n(x) - F_n(X_i)}{a_n}\right) K_2\left(\frac{G_n(y) - G_n(Y_i)}{a_n}\right)\right]$$
or, in a more compact form,
$$\hat f_n(y \mid x) := \hat g_n(y)\, \hat c_n(F_n(x), G_n(y)). \qquad (6)$$

Remark 1 To our knowledge, the estimator studied in this paper has never been proposed in the literature. However, some connections can be made with the nearest neighbor estimator proposed by Stute [32], [33] and [34] for the conditional cumulative distribution function, and with the Gasser and Müller [12] and Priestley and Chao [24] estimators in the context of regression estimation. Indeed, these estimators tackle the issue of having a random denominator by first transforming the design $X_1, \ldots, X_n$ to a uniform (random) one. This results in assigning the surfaces under the kernel function instead of its heights as weights. Contrary to our estimator, they do not transform the data in both directions $X$ and $Y$.

3 Asymptotic results

3.1 Notations and assumptions

We note the $i$th moment of a generic kernel (possibly multivariate) $K$ as $m_i(K) := \int u^i K(u)\, du$, and the $L^p$ norm of a function $h$ by $\|h\|_p := \left(\int |h|^p\right)^{1/p}$. We use the sign $\simeq$ to denote the order of the bandwidths, i.e. $h_n \simeq u_n$ means that $h_n = c_n u_n$ with $c_n \to c > 0$. The supports of the density functions $f$ and $c$ are noted $\mathrm{supp}(f) = \{x \in \mathbb{R};\ f(x) > 0\}$ and $\mathrm{supp}(c) = \{(u, v) \in \mathbb{R}^2;\ c(u, v) > 0\}$, respectively.

For stating our results, we will have to make some regularity assumptions on the kernels and the densities which, although far from being minimal, are somehow customary in kernel density estimation (see subsection 5.2 for discussions and details).
Let $x$ and $y$ be two fixed points in the interior of $\mathrm{supp}(f)$ and $\mathrm{supp}(g)$ respectively. In the remainder of this paper, we will always suppose that
i) the c.d.f. $F$ of $X$ and $G$ of $Y$ are strictly increasing and differentiable;
ii) the densities $g$ and $c$ are twice differentiable with continuous bounded second derivatives on their support.

Moreover, we assume that the kernels $K_0$ and $K$ satisfy the following:
(i) $K$ and $K_0$ are of bounded support and of bounded variation;
(ii) $0 \le K \le C$ and $0 \le K_0 \le C$ for some constant $C$;
(iii) $K$ and $K_0$ are first order kernels: $m_0(K) = 1$, $m_1(K) = 0$ and $m_2(K) < +\infty$, and the same for $K_0$.

In addition, in order to approximate $\hat c_n$ by $c_n$, we will impose the slightly more stringent assumption on the bivariate kernel $K$ that it is twice differentiable with bounded second partial derivatives.

3.2 Weak and strong consistency of the estimator

We have the following pointwise weak consistency theorem:

Theorem 3.1 Let the regularity conditions on the densities and kernels be satisfied. If $h_n$ and $a_n$ tend to zero as $n \to \infty$ in such a way that $n h_n \to \infty$ and $n a_n^2 \to \infty$, then
$$\hat f_n(y \mid x) = f(y \mid x) + O_P\left(\frac{1}{\sqrt{n h_n}} + h_n^2 + \frac{1}{\sqrt{n a_n^2}} + a_n^2\right).$$

Proof. Recall from (4) and (5) that $c_n$ and $\hat c_n$ are estimators of the copula density $c$ based respectively on the unobservable pseudo-data $(F(X_i), G(Y_i))$ and on their approximations $(F_n(X_i), G_n(Y_i))$.
The main ingredient of the proof follows from the decomposition
$$\begin{aligned}
\hat f_n(y \mid x) - f(y \mid x) &= \hat g_n(y)\, \hat c_n(F_n(x), G_n(y)) - g(y)\, c(F(x), G(y)) \\
&= [\hat g_n(y) - g(y)]\, \hat c_n(F_n(x), G_n(y)) + g(y)\, [\hat c_n(F_n(x), G_n(y)) - c(F(x), G(y))] \\
&:= D_1 + D_2.
\end{aligned}$$
We proceed one step further in the decomposition of each term, by first centering at fixed locations:
$$\begin{aligned}
D_1 &= [\hat g_n(y) - g(y)]\, [\hat c_n(F_n(x), G_n(y)) - \hat c_n(F(x), G(y))] \\
&\quad + [\hat g_n(y) - g(y)]\, [\hat c_n(F(x), G(y)) - c_n(F(x), G(y))] \\
&\quad + [\hat g_n(y) - g(y)]\, [c_n(F(x), G(y)) - c(F(x), G(y))] \\
&\quad + [\hat g_n(y) - g(y)]\, c(F(x), G(y))
\end{aligned} \qquad (7)$$
$$\begin{aligned}
D_2 &= g(y)\, [\hat c_n(F_n(x), G_n(y)) - \hat c_n(F(x), G(y))] \\
&\quad + g(y)\, [\hat c_n(F(x), G(y)) - c_n(F(x), G(y))] \\
&\quad + g(y)\, [c_n(F(x), G(y)) - c(F(x), G(y))]
\end{aligned} \qquad (8)$$
Convergence results for the kernel density estimators of section 5.2 entail that
$$\hat g_n(y) - g(y) = O_P\left(h_n^2 + 1/\sqrt{n h_n}\right), \qquad c_n(F(x), G(y)) - c(F(x), G(y)) = O_P\left(a_n^2 + 1/\sqrt{n a_n^2}\right)$$
by lemmas 5.2 and 5.3 respectively. Approximation lemmas 5.4 and 5.5 of sections 5.4 and 5.5 entail that
$$\hat c_n(F(x), G(y)) - c_n(F(x), G(y)) = o_P\left(a_n^2 + 1/\sqrt{n a_n^2}\right), \qquad \hat c_n(F_n(x), G_n(y)) - \hat c_n(F(x), G(y)) = o_P\left(a_n^2 + 1/\sqrt{n a_n^2}\right).$$
We therefore obtain that
$$D_1 = O_P\left(h_n^2 + 1/\sqrt{n h_n}\right) O_P\left(a_n^2 + 1/\sqrt{n a_n^2}\right) + O_P\left(h_n^2 + 1/\sqrt{n h_n}\right), \qquad D_2 = o_P\left(a_n^2 + 1/\sqrt{n a_n^2}\right) + O_P\left(a_n^2 + 1/\sqrt{n a_n^2}\right),$$
and the condition $a_n \to 0$, $h_n \to 0$, $n a_n^2 \to +\infty$, $n h_n \to +\infty$ entails the convergence of the estimator. ✷

Remark 2 As a corollary, we get the rate of convergence by choosing the bandwidths which balance the bias and variance trade-off: for an optimal choice of $h_n \simeq n^{-1/5}$ and $a_n \simeq n^{-1/6}$, we get $\hat f_n(y \mid x) = f(y \mid x) + O_P(n^{-1/3})$.
Therefore, our estimator is rate optimal in the sense that it reaches the minimax rate $n^{-1/3}$ of convergence, according to Stone [30].

Almost sure results can be proved in the same way: we have the following strong consistency result.

Theorem 3.2 Let the regularity conditions on the densities and kernels be satisfied. If in addition $n h_n / (\ln \ln n) \to \infty$ and $n a_n^2 / (\ln \ln n) \to \infty$, then
$$\hat f_n(y \mid x) = f(y \mid x) + O_{a.s.}\left(a_n^2 + \sqrt{\frac{\ln \ln n}{n a_n^2}} + h_n^2 + \sqrt{\frac{\ln \ln n}{n h_n}}\right).$$

Proof. It follows the same lines as the preceding theorem, but uses the almost sure consistency results for the kernel density estimators of lemmas 5.2 and 5.3 and those of the approximation lemmas 5.4 and 5.5. It is therefore omitted. ✷

Remark 3 For $h_n \simeq (\ln \ln n / n)^{1/5}$ and $a_n \simeq (\ln \ln n / n)^{1/6}$, which is the optimal trade-off between the bias and the stochastic term, one gets the optimal rate $(\ln \ln n / n)^{1/3}$.

3.3 Convergence in distribution

Theorem 3.3 Let the regularity conditions on the densities and kernels be satisfied. Then $h_n \to 0$, $a_n \to 0$, $n h_n \to \infty$ and $n a_n^2 \to \infty$ entail
$$\sqrt{n a_n^2}\, \left(\hat f_n(y \mid x) - f(y \mid x)\right) \overset{d}{\rightsquigarrow} N\left(0,\ g(y)\, f(y \mid x)\, \|K\|_2^2\right).$$
For $h_n \simeq n^{-1/5}$ and $a_n \simeq n^{-1/6}$ one gets the usual rate $n^{-1/3}$.

Proof. With the conditions on the bandwidths, all the terms in the previous decompositions (7) and (8) are negligible compared to $(n a_n^2)^{-1/2}$ except $c_n(F(x), G(y)) - c(F(x), G(y))$, which is asymptotically normal by the result of section 5, lemma 5.3:
$$\sqrt{n a_n^2}\, g(y)\, [c_n(F(x), G(y)) - c(F(x), G(y))] \overset{d}{\rightsquigarrow} N\left(0,\ g^2(y)\, c(F(x), G(y))\, \|K\|_2^2\right).$$
An application of Slutsky's lemma yields the desired result. ✷

For a vector $(y_1, \ldots, y_m)$, one can get a multidimensional version of the convergence in distribution (fidi convergence):

Corollary 3.4 With the same assumptions, for $(y_1, \ldots, y_m)$ in the interior of $\mathrm{supp}(g)$ such that $g(y_i)\, f(y_i \mid x) \neq 0$,
$$\left(\sqrt{n a_n^2}\ \frac{\hat f_n(y_i \mid x) - f(y_i \mid x)}{\sqrt{g(y_i)\, f(y_i \mid x)}\ \|K\|_2},\ i = 1, \ldots, m\right) \overset{d}{\rightsquigarrow} N^{(m)}$$
where $N^{(m)}$ is the standard $m$-variate centered normal distribution with identity variance matrix.

Proof. It simply follows from the use of the Cramér-Wold device and is therefore omitted. For details, see e.g. [1], theorem 2.3. ✷

3.4 Asymptotic bias, variance and mean square error

The asymptotic bias is calculated in the following proposition.

Proposition 3.5 With the assumptions of Theorem 3.1, we have
$$B_0 := E\left(\hat f_n(y \mid x)\right) - f(y \mid x) = g(y)\, B_K(c, x, y)\, \frac{a_n^2}{2} + o(a_n^2)$$
with
$$B_K(c, x, y) := m_2(K_1)\, \frac{\partial^2 c(F(x), G(y))}{\partial u^2} + m_2(K_2)\, \frac{\partial^2 c(F(x), G(y))}{\partial v^2}.$$

Proof. (Sketch). By taking expectations in the decompositions (7) and (8),
$$E D_1 = c(F(x), G(y))\, E[\hat g_n(y) - g(y)] + R_1, \qquad E D_2 = g(y)\, E[c_n(F(x), G(y)) - c(F(x), G(y))] + R_2,$$
where we have isolated the biases of $\hat g_n$ and $c_n$ and where $R_1$ and $R_2$ stand for the remaining terms. With the assumptions on the bandwidths, and after derivations made tedious by the transformation of the data by the empirical margins (see Fermanian [9], theorem 1, for such a calculation), the terms in $R_2$ are negligible compared to the bias of $c_n$. The bias of $c_n$, which is simply the bias of a bivariate kernel density estimator, is of order $a_n^2$. Similarly, by bounding the product terms in $D_1$ with the Cauchy-Schwarz inequality, routine analysis shows that the terms in $R_1$ are negligible compared to the bias of $\hat g_n$, which is of order $h_n^2$. Since $h_n^2$ is itself negligible compared to $a_n^2$, the main term in the decomposition is $g(y)\, E\left(c_n(F(x), G(y)) - c(F(x), G(y))\right)$. Plugging in the expression of the bias given in lemma 5.3 yields the desired result.
✷

The asymptotic variance has already been derived in theorem 3.3:
$$V_0 := \mathrm{Var}\left(\hat f_n(y \mid x)\right) = \frac{1}{n a_n^2}\, g(y)\, f(y \mid x)\, \|K\|_2^2 + o\left(\frac{1}{n a_n^2}\right).$$
Together with the computation of the asymptotic bias, we get the asymptotic mean squared error as a corollary:

Corollary 3.6 With the previous assumptions, the Asymptotic Mean Squared Error (AMSE) $E_0$ at $(x, y)$ is
$$E_0 := B_0^2 + V_0 = a_n^4\, \frac{g^2(y)\, (B_K(c, x, y))^2}{4} + \frac{g(y)\, f(y \mid x)\, \|K\|_2^2}{n a_n^2} + o\left(a_n^4 + \frac{1}{n a_n^2}\right)$$
which gives, for the choice of the usual bandwidths mentioned above,
$$E_0 = n^{-2/3}\, g^2(y) \left(\frac{B_K^2(c, x, y)}{4} + c(F(x), G(y))\, \|K\|_2^2\right) + o(n^{-2/3}).$$

4 Comparison with other estimators

4.1 Presentation of alternative estimators

For convenience, we recall below the definitions of other estimators of the conditional density encountered in the literature and summarize their bias and variance properties. We will note the bias of the $i$th estimator $\hat f^{(i)}_n(y \mid x)$ by $B_i$ and its variance by $V_i$.

(1) Double kernel estimator: as defined in the introduction section of our paper by the following ratio,
$$\hat f^{(1)}_n(y \mid x) := \frac{\frac{1}{n} \sum_{i=1}^{n} K'_{h_1}(X_i - x)\, K_{h_2}(Y_i - y)}{\frac{1}{n} \sum_{i=1}^{n} K'_{h_1}(X_i - x)},$$
where $h_1$ and $h_2$ are the bandwidths. One then has, see e.g. [17],
• Bias:
$$B_1 = \frac{h_1^2\, m_2(K)}{2} \left[2\, \frac{f'(x)}{f(x)}\, \frac{\partial f(y \mid x)}{\partial x} + \frac{\partial^2 f(y \mid x)}{\partial x^2} + \left(\frac{h_2}{h_1}\right)^2 \frac{\partial^2 f(y \mid x)}{\partial y^2}\right] + o\left(h_1^2 + h_2^2\right)$$
• Variance:
$$V_1 = \frac{\|K\|_2^2\, f(y \mid x)}{n h_1 h_2\, f(x)} \left[\|K\|_2^2 - h_2\, f(y \mid x)\right] + o\left(\frac{1}{n h_1 h_2}\right)$$

(2) Local polynomial estimator: Set
$$R(\theta, x, y) := \sum_{i=1}^{n} \left(K_{h_2}(Y_i - y) - \sum_{j=0}^{r} \theta_j (X_i - x)^j\right)^2 K'_{h_1}(X_i - x),$$
then the local polynomial estimator is defined as $\hat f^{(2)}_n(y \mid x) := \hat\theta_0$, where $\hat\theta_{xy} := (\hat\theta_0, \hat\theta_1, \ldots, \hat\theta_r)$ is the value of $\theta$ which minimizes $R(\theta, x, y)$.
This local polynomial estimator, although it has a better bias than the kernel one, is no longer restricted to be non-negative and does not integrate to 1, except in the special case $r = 0$. From results of [8], we get for the local linear estimator (see also [7], p. 256),
• Bias:
$$B_2 = \frac{h_1^2\, m_2(K')}{2}\, \frac{\partial^2 f(y \mid x)}{\partial x^2} + \frac{h_2^2\, m_2(K)}{2}\, \frac{\partial^2 f(y \mid x)}{\partial y^2} + o(h_1^2 + h_2^2)$$
• Variance:
$$V_2 = \frac{\|K\|_2^2\, \|K'\|_2^2\, f(y \mid x)}{n h_1 h_2\, f(x)} + o\left(\frac{1}{n h_1 h_2}\right)$$

(3) Local parametric estimator: As in [18] and [7], set
$$R_1(\theta, x, y) := \sum_{i=1}^{n} \left(K_{h_2}(Y_i - y) - A(X_i - x, \theta)\right)^2 K'_{h_1}(X_i - x)$$
where $A(X_i - x, \theta) = l\left(\sum_{j=0}^{r} \theta_j (X_i - x)^j\right)$ and $l(\cdot)$ is a monotonic function mapping $\mathbb{R} \to \mathbb{R}_+$, e.g. $l(u) = \exp(u)$. Then $\hat f^{(3)}_n(y \mid x) := A(0, \hat\theta) = l(\hat\theta_0)$.
• Bias:
$$B_3 = h_1^2\, \eta(K') \left(\frac{\partial^2 f(y \mid x)}{\partial x^2} - \frac{\partial^2 A(0, \theta_{xy})}{\partial x^2}\right) + \frac{h_2^2\, m_2(K)}{2}\, \frac{\partial^2 f(y \mid x)}{\partial y^2} + o(h_1^2 + h_2^2)$$
• Variance:
$$V_3 = \frac{\tau(K, K')^2\, f(y \mid x)}{n h_1 h_2\, f(x)} + o\left(\frac{1}{n h_1 h_2}\right)$$
where $\eta$ and $\tau$ are kernel dependent constants.

(4) Constrained local polynomial estimator: A simple device to force the local polynomial estimator to be positive is to set $\theta_0 = \exp(\alpha)$ in the definition of the criterion $R(\theta, x, y)$ to be minimized. The constrained local polynomial estimator $\hat f^{(4)}_n(y \mid x)$ is then defined analogously to the local polynomial estimator $\hat f^{(2)}_n(y \mid x)$. We have, as in [18] and [7]:
• Bias:
$$B_4 := \frac{h_1^2\, m_2(K')}{2}\, \frac{\partial^2 f(y \mid x)}{\partial x^2} + \frac{h_2^2\, m_2(K)}{2}\, \frac{\partial^2 f(y \mid x)}{\partial y^2} + o(h_1^2 + h_2^2)$$
• Variance:
$$V_4 = \frac{\|K\|_2^2\, f(y \mid x)}{n h_1 h_2\, f(x)} + o\left(\frac{1}{n h_1 h_2}\right)$$

4.2 Asymptotic bias and variance comparison

All estimators have (hopefully) the same orders $n^{-1/3}$ and $n^{-2/3}$ in their asymptotic bias and variance terms, for the usual choice of bandwidths. The main difference lies in the constant terms, which depend on unknown densities.
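To see where these common orders come from, one can balance a squared bias of order $a^4$ against a variance of order $1/(n a^2)$. The short calculation below is a sketch under simplifying assumptions: a single bandwidth $a$ (for the competitors, $h_1 = h_2 = a$), and generic positive constants $A$ and $B$ that are introduced here purely for illustration and stand for the density dependent factors.

```latex
\begin{aligned}
\mathrm{AMSE}(a) &= A\,a^{4} + \frac{B}{n a^{2}}, \\
\frac{d}{da}\,\mathrm{AMSE}(a) &= 4A\,a^{3} - \frac{2B}{n a^{3}} = 0
\;\Longrightarrow\;
a_{\ast} = \left(\frac{B}{2A}\right)^{1/6} n^{-1/6}, \\
\mathrm{AMSE}(a_{\ast}) &= 3 \cdot 2^{-2/3}\, A^{1/3} B^{2/3}\, n^{-2/3}.
\end{aligned}
```

With $a_{\ast} \simeq n^{-1/6}$ the bias is of order $a_{\ast}^2 \simeq n^{-1/3}$ and the variance of order $1/(n a_{\ast}^2) \simeq n^{-2/3}$, so only the constants $A$ and $B$ distinguish the estimators. For the quantile-copula estimator, Corollary 3.6 corresponds to $A = g^2(y)\, B_K^2(c, x, y)/4$ and $B = g(y)\, f(y \mid x)\, \|K\|_2^2$.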
Bias: Contrary to all the alternative estimators, whose bias involves derivatives of the full conditional density, one can note that our estimator's bias only involves the density of $Y$ and the derivatives of the copula density. To make things more explicit, the terms involved, e.g. in the local polynomial estimator, are written as the sum of the derivatives of the conditional density,
$$h_n^{-2} B_2 \approx \frac{\partial^2 f(y \mid x)}{\partial x^2} + \frac{\partial^2 f(y \mid x)}{\partial y^2},$$
that is to say,
$$h_n^{-2} B_2 \approx f'(x)\, g(y)\, \frac{\partial c(F(x), G(y))}{\partial u} + f^2(x)\, g(y)\, \frac{\partial^2 c(F(x), G(y))}{\partial u^2} + 2 g'(y)\, g(y)\, \frac{\partial c(F(x), G(y))}{\partial v} + g^3(y)\, \frac{\partial^2 c(F(x), G(y))}{\partial v^2},$$
whereas our $(g(y)/2)\, B_K(c, x, y)$ term, modulo the constants involved by the kernel, is written as
$$a_n^{-2} B_0 \approx g(y) \left(\frac{\partial^2 c(F(x), G(y))}{\partial u^2} + \frac{\partial^2 c(F(x), G(y))}{\partial v^2}\right).$$
It then becomes clear that we have a simpler expression, with fewer unknown terms, than is the case for the competitors, which do involve the density $f$ of $X$ and its derivative $f'$, and the derivative $g'$ of the density of $Y$.

In a fixed bandwidth and asymptotic context, it seems difficult to compare further. Nonetheless, we believe this feature of our estimator would be practically relevant when it comes to choosing the bandwidths. Indeed, bandwidth selection is usually performed by minimizing local or global asymptotic error criteria such as the Asymptotic Mean Square Error (AMSE) or the Asymptotic Mean Integrated Square Error (AMISE), in which unknown terms have to be estimated. Since in our approach the asymptotic bias and variance involve fewer unknown terms, we expect that a higher accuracy could be obtained in this pre-estimation stage.
Moreover, by having managed to separate the estimation problem of the marginal from that of the copula density, we could use known optimal data-dependent bandwidth selection procedures for density estimation, such as cross validation, separately for the density of $Y$ and for the copula density.

Remark 4 Since the copula density $c$ has compact support $[0, 1]^2$, our estimator may suffer from bias issues on the boundaries, i.e. in the tails of $X$ and $Y$. To correct these issues, one could apply one of the several known techniques to reduce the bias of the kernel estimator on the edges (see e.g. [7], chapter 5.5: boundary kernels, reflection, transformation and local polynomial fitting). In the tail of the distribution of $X$, this bias issue in the copula density estimator is balanced by the improved variance, as shown below.

Variance: The variance of our estimator involves the product of the density $g(y)$ of $Y$ by the conditional density $f(y \mid x)$,
$$n a_n^2\, V_0 \approx g(y)\, f(y \mid x) = g^2(y)\, c(F(x), G(y)),$$
whereas competitors involve the ratio of $f(y \mid x)$ by the density $f(x)$ of $X$,
$$\frac{f(y \mid x)}{f(x)} = \frac{g(y)}{f(x)}\, c(F(x), G(y)).$$
It is a remarkable feature of the estimator we propose that its variance does not involve $f(x)$ directly, as is the case for the competitors, but only its contribution to $Y$, through the copula density. This reflects the ability, announced in the introduction, of the copula representation to effectively separate the randomness pertaining to $Y$ alone from the dependence structure of $(X, Y)$. Moreover, our estimator does not suffer from the unstable nature of the competitors which, due to their intrinsic ratio structure, get an explosive variance for small values of the density $f(x)$, making conditional estimation difficult, e.g. in the tail of the distribution of $X$.
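The absence of a random denominator is easy to see in a direct implementation of the product form (6). The following is a minimal sketch, not the author's code: the function names, the use of the Epanechnikov kernel for $K_0$, $K_1$ and $K_2$, and the bandwidth arguments `h` and `a` are assumptions made for this illustration.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1] (illustrative choice)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def quantile_copula_density(x, y, X, Y, h, a):
    """Sketch of g_hat(y) * c_hat(F_n(x), G_n(y)) for scalar x and y.

    h is a hypothetical bandwidth for the marginal density of Y,
    a a hypothetical common bandwidth for the copula density.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.size
    # Kernel estimate of the marginal density g of Y at y.
    g_hat = np.sum(epanechnikov((y - Y) / h)) / (n * h)
    # Empirical c.d.f.s evaluated at the point (x, y).
    F_x = np.mean(X <= x)
    G_y = np.mean(Y <= y)
    # Pseudo-observations F_n(X_i) and G_n(Y_i).
    U_hat = np.searchsorted(np.sort(X), X, side="right") / n
    V_hat = np.searchsorted(np.sort(Y), Y, side="right") / n
    # Product-kernel estimate of the copula density at (F_n(x), G_n(y)).
    c_hat = np.sum(
        epanechnikov((F_x - U_hat) / a) * epanechnikov((G_y - V_hat) / a)
    ) / (n * a**2)
    # Product form: no division by an estimated density of X.
    return g_hat * c_hat
```

Because the two factors are multiplied, the function returns a finite, non-negative value even when $x$ lies beyond the range of the observed $X_i$, where a ratio-type estimator with a compactly supported kernel would divide by zero.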
Remark 5 To make estimators comparable, we have restricted ourselves to so-called fixed bandwidth estimators, i.e. nonparametric estimators where the bandwidths are of the generic form $h_n = b n^{\alpha}$ or $h_n = b (\ln n / n)^{\alpha}$, with $\alpha$ and $b$ real numbers. Improved behavior for all the preceding estimators can be obtained with data-dependent bandwidths, where $h_n = H_n(X_1, \ldots, X_n, x)$ can be a function of the location and of the data.

4.3 Finite sample numerical simulation

4.3.1 Practical implementation of the estimator

Although the proposed estimator seems to compare favorably asymptotically, some pitfalls linked to the copula density estimation may show up in the practical implementation:

Infinities at the corners: many copula densities exhibit infinite values at their corners. Therefore, to prevent $(F_n(X_i), G_n(Y_i))$ from being equal to $(1, 1)$, we change the empirical distribution functions $F_n$ and $G_n$ to $\frac{n}{n+1} F_n$ and $\frac{n}{n+1} G_n$ respectively.

Boundary bias: since the copula density has compact support $[0, 1]^2$, the kernel method of estimation may suffer from boundary bias. To alleviate this issue, we suggest using boundary-corrected kernels such as the beta kernels $K_{x,b}(t) = \beta_{x/b + 1,\ (1 - x)/b + 1}(t)$, where $\beta_{a,b}(t)$ denotes the pdf of a Beta($a$, $b$) distribution, advocated by Chen [2] and used e.g. by [14] for estimating loss distributions. The modified copula density pseudo estimator is thus defined as $c_n(u, v) = n^{-1} \sum_{i=1}^{n} K_{u, a_n}(U_i)\, K_{v, a_n}(V_i)$.

Bandwidth selection: the performance of nonparametric estimators depends crucially on the bandwidths. For conditional density estimation, bandwidth selection is a more delicate matter than for density estimation, due to the multidimensional nature of the problem. Moreover, for ratio-type estimators, the difficulty is increased by the local dependence of the bandwidth $h_y$ on $h_x$ implied by conditioning near $x$.
For the copula estimator, a supplemental issue comes from the fact that the pseudo-data $F(X_i), G(Y_i)$ are not directly accessible. Inspection of the AMISE of the copula-based estimator suggests that we can separate the choice of the bandwidth $h$ for $\hat g(y)$ from the choice of the bandwidth $a_n$ for the copula density estimator $\hat c_n$. A rationale for a data-dependent method is to select $h$ on the $Y_i$ data alone (e.g. by cross-validation or plug-in), separately from the $a_n$ of the copula density $c$, which is based on the approximate data $F_n(X_i), G_n(Y_i)$. However, such a bandwidth selection would require deeper analysis, and we leave for further research a detailed study of a practical data-dependent method for bandwidth selection of the copula-quantile estimator, together with a global and local comparison of the estimators at their respective optimal bandwidths.

4.3.2 Model and comparison results

We simulated a sample of $n = 100$ variables $(X_i, Y_i)$ from the following model: $X$ and $Y$ are marginally distributed as $N(0, 1)$ and linked via the Frank copula

$$C(u, v, \theta) = \frac{\ln\left[ (\theta + \theta^{u+v} - \theta^{u} - \theta^{v}) / (\theta - 1) \right]}{\ln \theta}$$

with parameter $\theta = 100$. To get a first picture, we restricted ourselves to simple rule-of-thumb methods, fixed for all $x, y$ and based on the Normal reference rule. For the selection of $a_n$ of the copula density estimator, we applied Scott's Rule on the data $F_n(X_i)$. We used Epanechnikov kernels for $\hat g(y)$ and the other estimators. We plotted the conditional density along with its estimations on the domain $x \in [-5, 5]$ and $y \in [-3, 3]$ in figure 1. A comparison plot at $x = 2$ is shown in figure 2.

Figure 1. 3D plots. From left to right, top to bottom: true density, quantile-copula estimator, double kernel, local polynomial (clipped).

Figure 2.
Comparison at $x = 2$: conditional density = thick curve, quantile-copula = continuous line, double kernel = dotted curve, local polynomial = dashed curve.

4.3.3 Clipping and estimation in the tails

As mentioned earlier, since the performance of the estimators depends on the performance of the bandwidth selection method, it is delicate to give a conclusive answer. However, we would like to illustrate at least one case where the proposed estimator clearly outperforms its competitors. Indeed, one major issue of the alternative estimators, already mentioned, is their numerical explosion when the estimated density $\hat f(x)$ is close to zero. In particular, if the kernel is of compact support, the denominator is zero for the $x$ whose distance from the closest $X_i$ exceeds half the bandwidth times the length of the support, thereby allowing estimation only on a closed subset of $X$ included in $[\min X_i, \max X_i]$. This is one of the reasons why simulation studies are often performed either with a marginal $X$ density of bounded support and/or with a Gaussian kernel. Note that the problem remains with a Gaussian kernel, since the estimated density can quickly become lower than the machine precision. To prevent this numerical explosion, the definition of the conditional density estimators has to be modified either by

$$\hat f(y|x) = \begin{cases} \dfrac{\hat f_{XY}(x, y)}{\hat f_X(x)} & \text{if } \hat f_X(x) > c \\[2mm] \hat a(y) & \text{if } \hat f_X(x) = 0 \end{cases}$$

or by

$$\hat f(y|x) = \frac{\hat f_{XY}(x, y)}{\max\{\hat f(x), c\}},$$

where $c > 0$ is an arbitrary amount of clipping, and $\hat a(\cdot)$ is an arbitrary density estimator (usually chosen to be zero or $\hat g(y)$). An illustration of these issues clearly appears in figure 1. The unclipped version of the double kernel estimator is unable to estimate the conditional density for $|x|$ roughly $> 3$, and the clipped version of the local polynomial estimator with $c = 0.00001$ and $\hat a(y) = \hat g(y)$ gives a wrong estimation in the tails, reflecting the arbitrary choices in the clipping decision. On the contrary, the quantile-copula estimator is surprisingly able to estimate the conditional density $f(y|x)$ at locations $x$ where there is no data, i.e. in the tails of the distribution of $X$. An explanation of this apparently paradoxical phenomenon comes from the fact that the estimator is partially based on the ranks of $X_i$ and $Y_i$. Therefore, it can recover hidden information on the density of $X$ from the ordering of the pairs $(X_i, Y_i)$. See Hoff [16] for a detailed explanation. We believe that this feature might be of potential interest for applications, e.g. in statistical inference of extreme values and rare events.

Discussion

The quantile transform and the use of the copula formula have thus turned the conditional density formula (1) of the ratio type into the product one (3). This formula was the backbone of our article, where this product form appeared to be especially appealing for statistical estimation: consistency and limit results were obtained by simple combination of the previously known ones on (unconditional) density estimation. The estimator obtained shows interesting asymptotic bias and variance properties compared to competitors. Although its finite sample implementation does not yet give a clear and conclusive picture, it already yields some promising results, e.g. for estimation in the tails of $X$, where the proposed estimator does not suffer from clipping issues.

5 Appendix: auxiliary results

In this section, we gather some preliminary results which we will need as basic tools for the proofs of section 3. In subsection 5.1, we recall classical results about the convergence of the Kolmogorov-Smirnov statistic. Next, we give a brief overview of kernel density estimation and apply these results to the estimators $\hat g_n$ (section 5.2) and $c_n$ (section 5.3).
Finally, we need two approximation lemmas of $\hat c_n$ by $c_n$ in sections 5.4 and 5.5.

5.1 Approximation of the pseudo-variables $F(X_i)$ by their estimates $F_n(X_i)$

For $(X_i, i = 1, \ldots, n)$ an i.i.d. sample of a real random variable $X$ with common c.d.f. $F$, the Kolmogorov-Smirnov statistic is defined as $D_n := \|F_n - F\|_\infty$. Glivenko-Cantelli, Kolmogorov and Smirnov, Chung, Donsker among others have studied its convergence properties in increasing generality (see [28] and [36] for recent accounts). For our purpose, we only need to formulate these results in the following rough form:

Lemma 5.1 For an i.i.d. sample from a continuous c.d.f. $F$,

$$\|F_n - F\|_\infty = O_{a.s.}\left( \sqrt{\frac{\ln \ln n}{n}} \right) \tag{9}$$

$$\|F_n - F\|_\infty = O_P\left( \frac{1}{\sqrt{n}} \right). \tag{10}$$

Since $F$ is unknown, the random variables $U_i = F(X_i)$ are not observed. As a consequence of the preceding lemma 5.1, one can naturally approximate these variables by the statistics $F_n(X_i)$. Indeed,

$$|F(X_i) - F_n(X_i)| \le \sup_{x \in \mathbb{R}} |F(x) - F_n(x)| = \|F_n - F\|_\infty \quad \text{a.s.}$$

Thus, $|F(X_i) - F_n(X_i)|$ is no more than an $O_{a.s.}((\ln \ln n / n)^{1/2})$ or an $O_P(n^{-1/2})$. These rates of approximation appear to be faster than those of statistical estimation of densities, as is shown in the next subsection.

5.2 Convergence of the kernel density estimator $\hat g_n$

We recall below some classical results about the convergence of the Parzen-Rosenblatt kernel non-parametric estimator $\hat f_n$ of a $d$-variate density. Since its inception by Rosenblatt [25] and Parzen [22], it has been studied by a great many authors. See e.g. Scott [27], Prakasa Rao [23], Nadaraya [21] for details. See also Bosq [1], chapter 2. It is well known that the bias of the kernel density estimator depends on the degree of smoothness of the underlying density, measured by its number of derivatives or its Lipschitz order.
In order to get the convergence of the bias to zero, it suffices to assume that the density is continuous (see [22]). To get further information on the rate of convergence of the estimator, it is necessary to make further assumptions. Moreover, for kernel functions with unbounded support, the rate of convergence also depends on the tail behavior of the kernel (see Stute [31]). Therefore, for clarity of exposition and simplicity of notation, we will make the customary assumptions that the density is twice differentiable and that the kernel is of bounded support. We then have the following results:

• Bias: With the previous assumptions, for $x$ in the interior of $\mathrm{supp}(f)$, $h_n \to 0$ and $n h_n^d \to \infty$ entail that

$$E \hat f_n(x) = f(x) + \frac{h_n^2}{2} \int_{\mathbb{R}^d} \sum_{1 \le i,j \le d} \frac{\partial^2 f(x)}{\partial x_i \partial x_j}\, z_i z_j K(z)\, dz + o(h_n^2).$$

With the multivariate kernel $K$ as a product of $d$ order one kernels $K_i$, the above sum reduces to the diagonal terms:

$$E \hat f_n(x) = f(x) + \frac{h_n^2}{2} \sum_{1 \le i \le d} m_2(K_i) \frac{\partial^2 f(x)}{\partial x_i^2} + o(h_n^2).$$

• Variance: with the same assumptions,

$$\mathrm{Var}\left[ \hat f_n(x) \right] = \frac{f(x)}{n h_n^d} \|K\|_2^2 + o\left( \frac{1}{n h_n^d} \right).$$

• Pointwise asymptotic normality: under the previous conditions,

$$\sqrt{n h_n^d} \left( \hat f_n(x) - E \hat f_n(x) \right) \stackrel{d}{\rightsquigarrow} N\left( 0, f(x) \|K\|_2^2 \right).$$

For a choice of the bandwidth as $h_n \simeq n^{-1/(d+4)}$, which realizes the optimal trade-off between the bias and variance, one gets the rate $n^{-2/(d+4)}$, which is the optimal speed of convergence in the minimax sense in the class of density functions with bounded second derivatives, according to [30].

• Pointwise almost sure convergence: if moreover $n h_n^d / (\ln \ln n) \to \infty$ (see [3]), we have that

$$\hat f_n(x) - E \hat f_n(x) = O_{a.s.}\left( \sqrt{\frac{\ln \ln n}{n h_n^d}} \right).$$

For a choice of the bandwidth as $h_n \simeq ((\ln \ln n)/n)^{1/(d+4)}$, we get the rate of convergence $((\ln \ln n)/n)^{2/(d+4)}$:

$$\hat f_n(x) - f(x) = O_{a.s.}\left( \left( \frac{\ln \ln n}{n} \right)^{2/(d+4)} \right).$$
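The bandwidth order quoted in the bullets above can be recovered by balancing squared bias against variance. The following is a short worked computation under the stated assumptions, where $C_1$ and $C_2$ are generic positive constants introduced here only for illustration (they stand for the squared-bias and variance constants and are not notation from the text):

```latex
\begin{align*}
\mathrm{MSE}(h) &\approx C_1 h^{4} + \frac{C_2}{n h^{d}}, \\
\frac{\partial}{\partial h}\mathrm{MSE}(h) = 0
  &\iff 4 C_1 h^{3} = \frac{d\, C_2}{n h^{d+1}}
  \iff h^{d+4} = \frac{d\, C_2}{4 C_1 n}, \\
h_n \simeq n^{-1/(d+4)}
  &\implies \mathrm{MSE}(h_n) \simeq n^{-4/(d+4)},
  \qquad \sqrt{\mathrm{MSE}(h_n)} \simeq n^{-2/(d+4)}.
\end{align*}
```

For $d = 1$ this gives the rate $n^{-2/5}$, and for $d = 2$ it gives $n^{-1/3}$.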
Applied to our case ($d = 1$), we can summarize these results for further reference in the following lemma for the estimator $\hat g_n$ of the density $g$ of $Y$:

Lemma 5.2 With the previous assumptions, for a point $y$ in the interior of the support of $g$, and a bandwidth chosen such that $h_n \simeq n^{-1/5}$, we have

$$|\hat g_n(y) - g(y)| = O_P(n^{-2/5})$$

$$n^{2/5} \left[ \hat g_n(y) - g(y) \right] \stackrel{d}{\rightsquigarrow} N\left( 0, g(y) \|K_0\|_2^2 \right).$$

With the same assumptions, but for a bandwidth choice of $h_n \simeq (\ln \ln n / n)^{1/5}$,

$$\hat g_n(y) - g(y) = O_{a.s.}\left( \left( \frac{\ln \ln n}{n} \right)^{2/5} \right). \tag{11}$$

5.3 Convergence of $c_n(u, v)$

As mentioned before, the assumptions that $F$ and $G$ be differentiable and strictly increasing entail that $c$ is the density of the transformed variables $(U, V) := (F(X), G(Y))$. Therefore, once one is convinced that $c_n(u, v)$ is simply the kernel density estimator of the bivariate density $c(u, v)$ of the pseudo-variables $(U, V)$, one directly obtains its convergence properties by applying the results of the preceding subsection with $d = 2$:

Lemma 5.3 For a choice of $a_n \simeq n^{-1/6}$, for every $(u, v) \in (0, 1)^2$, results similar to those of lemma 5.2 hold for $\hat c_n$ with a rate of convergence of $n^{-1/3}$ and $(\ln \ln n / n)^{1/3}$ respectively.

5.4 An approximation lemma of $\hat c_n(u, v)$ by $c_n(u, v)$

The lemma of this section gives the rate of approximation of the kernel copula density estimator $\hat c_n(u, v)$ computed on the real data $(F_n(X_i), G_n(Y_i))$ by its analogue $c_n(u, v)$ computed on the pseudo-data $(U_i, V_i) := (F(X_i), G(Y_i))$. A similar result, but with a different proof, has been obtained in Fermanian [9], theorem 1.

Lemma 5.4 Let $(u, v) \in (0, 1)^2$. If the kernel $K(u, v) = K_1(u) K_2(v)$ is twice differentiable with bounded second derivatives, then

$$|\hat c_n(u, v) - c_n(u, v)| = o_P\left( a_n^2 + \frac{1}{\sqrt{n a_n^2}} \right)$$

$$|\hat c_n(u, v) - c_n(u, v)| = o_{a.s.}\left( \sqrt{\frac{\ln \ln n}{n a_n^2}} \right)$$

Proof.
We denote by $\|\cdot\|$ a norm for vectors. Set

$$\Delta := \hat c_n(u, v) - c_n(u, v) = \frac{1}{n a_n^2} \sum_{i=1}^{n} \Delta_{i,n}(u, v)$$

with

$$\Delta_{i,n}(u, v) := K\left( \frac{u - F_n(X_i)}{a_n}, \frac{v - G_n(Y_i)}{a_n} \right) - K\left( \frac{u - F(X_i)}{a_n}, \frac{v - G(Y_i)}{a_n} \right)$$

and define

$$Z_{i,n} := \begin{pmatrix} F(X_i) - F_n(X_i) \\ G(Y_i) - G_n(Y_i) \end{pmatrix}.$$

As mentioned in section 5.1, $|F_n(X_i) - F(X_i)| \le \|F_n - F\|_\infty$ and $|G_n(Y_i) - G(Y_i)| \le \|G_n - G\|_\infty$ a.s. for every $i = 1, \ldots, n$. Lemma 5.1 thus entails that the norm of $Z_{i,n}$ is bounded independently of $i$ and such that

$$\|Z_{i,n}\| = O_P(1/\sqrt{n}), \quad i = 1, \ldots, n \tag{12}$$

$$\|Z_{i,n}\| = O_{a.s.}\left( \sqrt{\ln \ln n / n} \right), \quad i = 1, \ldots, n \tag{13}$$

Now, for every fixed $(u, v) \in [0, 1]^2$, since the kernel $K$ is twice differentiable, there exist, by Taylor expansion, random variables $\tilde U_{i,n}$ and $\tilde V_{i,n}$ such that, almost surely,

$$\Delta = \frac{1}{n a_n^3} \sum_{i=1}^{n} Z_{i,n}^T \nabla K\left( \frac{u - F(X_i)}{a_n}, \frac{v - G(Y_i)}{a_n} \right) + \frac{1}{2 n a_n^4} \sum_{i=1}^{n} Z_{i,n}^T \nabla^2 K\left( \frac{u - \tilde U_{i,n}}{a_n}, \frac{v - \tilde V_{i,n}}{a_n} \right) Z_{i,n} := \Delta_1 + \Delta_2$$

where $Z_{i,n}^T$ denotes the transpose of the vector $Z_{i,n}$, and $\nabla K$ and $\nabla^2 K$ the gradient and the Hessian respectively of the multivariate kernel function $K$,

$$\nabla K = \begin{pmatrix} \dfrac{\partial K}{\partial u} \\[2mm] \dfrac{\partial K}{\partial v} \end{pmatrix}, \qquad \nabla^2 K = \begin{pmatrix} \dfrac{\partial^2 K}{\partial u^2} & \dfrac{\partial^2 K}{\partial u \partial v} \\[2mm] \dfrac{\partial^2 K}{\partial u \partial v} & \dfrac{\partial^2 K}{\partial v^2} \end{pmatrix}.$$

Negligibility of $\Delta_2$: By the boundedness assumption on the second-order derivatives of the kernel, and equations 12 and 13,

$$\Delta_2 = O_P\left( \frac{1}{n a_n^4} \right) \quad \text{and} \quad \Delta_2 = O_{a.s.}\left( \frac{\ln \ln n}{n a_n^4} \right).$$

Negligibility of $\Delta_1$: By centering at expectations,

$$\Delta_1 = \frac{1}{n a_n^3} \sum_{i=1}^{n} Z_{i,n}^T \left( \nabla K\left( \frac{u - F(X_i)}{a_n}, \ldots \right) - E \nabla K\left( \frac{u - F(X_i)}{a_n}, \ldots \right) \right) + \frac{1}{n a_n^3} \sum_{i=1}^{n} Z_{i,n}^T\, E \nabla K\left( \frac{u - F(X_i)}{a_n}, \frac{v - G(Y_i)}{a_n} \right) := \Delta_{11} + \Delta_{12}$$

Negligibility of $\Delta_{12}$: Bias results on the bivariate gradient kernel estimator (see Scott [27], chapter 6) entail that

$$E \nabla K\left( \frac{u - F(X_i)}{a_n}, \frac{v - G(Y_i)}{a_n} \right) = a_n^3 \nabla c(u, v) + O(a_n^5).$$

The Cauchy-Schwarz inequality yields that

$$|\Delta_{12}| \le \frac{n \|Z_{i,n}\|}{n a_n^3} \left\| E \nabla K\left( \frac{u - F(X_i)}{a_n}, \frac{v - G(Y_i)}{a_n} \right) \right\|.$$

In turn, with equations 12 and 13,

$$\Delta_{12} = O_P(1/\sqrt{n}) \quad \text{and} \quad \Delta_{12} = O_{a.s.}\left( \sqrt{\ln \ln n / n} \right).$$

Negligibility of $\Delta_{11}$: Set

$$A_i = \nabla K\left( \frac{u - F(X_i)}{a_n}, \ldots \right) - E \nabla K\left( \frac{u - F(X_i)}{a_n}, \ldots \right).$$

Then,

$$|\Delta_{11}| \le \frac{\|Z_n\|}{n a_n^3} \sum_{i=1}^{n} \|A_i\|.$$

The boundedness assumption on the derivative of the kernel implies that $\|A_i\| \le 2C$ a.s. We apply the Hoeffding inequality for independent, centered, bounded by $M$, but non identically distributed random variables $(\eta_j)$ (e.g. see [1]),

$$P\left( \sum_{j=1}^{n} \eta_j > t \right) \le \exp\left( -\frac{t^2}{2 n M^2} \right). \tag{14}$$

Here, for every $\epsilon > 0$, with $M = 2C$, $\eta_i = \|A_i\| - E\|A_i\|$, $t = \epsilon n^{1/2} (\ln \ln n)^{1/2}$, we get that

$$P\left( \sum_{i=1}^{n} (\|A_i\| - E\|A_i\|) > \epsilon \sqrt{n \ln \ln n} \right) \le \exp\left( -\frac{\epsilon^2 \ln \ln n}{4 M^2} \right) = \frac{1}{(\ln n)^{\delta}}$$

with a $\delta > 0$ and where the r.h.s. goes to zero as $n \to \infty$. Therefore, $\sum_{i=1}^{n} (\|A_i\| - E\|A_i\|) = O_P(\sqrt{n \ln \ln n})$.

For the almost sure negligibility, we get similarly by inequality 14 that, for every $\epsilon > 0$, with $t = \epsilon n^{(1+\delta)/2}$ and $\delta > 0$,

$$P\left( \sum_{i=1}^{n} (\|A_i\| - E\|A_i\|) > \epsilon n^{(1+\delta)/2} \right) \le \exp\left( -\frac{\epsilon^2 n^{\delta}}{4 M^2} \right)$$

and the series on the r.h.s. is convergent. In turn, the Borel-Cantelli lemma implies that $\sum_{i=1}^{n} (\|A_i\| - E\|A_i\|) = O_{a.s.}(n^{(1+\delta)/2})$.

It remains to evaluate $E\|A_i\|$. First, we have that $E\|A_i\| \le 2 E\|\nabla K((u - F(X_i))/a_n, \ldots)\|$. Second, since $K$ is differentiable and of product form $K(u, v) = K_1(u) K_2(v)$, each sub-kernel is of bounded variation and can be written as a difference of two monotone increasing functions. For example, set $K_1 = K_1^a - K_1^b$ and define $K^* := (K_1^a + K_1^b) K_2$.
We have

$$\left| \frac{\partial K}{\partial u} \right| \le \left( |(K_1^a)'| + |(K_1^b)'| \right) K_2 = \left( (K_1^a)' + (K_1^b)' \right) K_2 := \frac{\partial K^*}{\partial u}$$

where the equality follows from the positivity of the derivatives. As a consequence,

$$E \left| \frac{\partial K}{\partial u}\left( (u - F(X_i))/a_n, \ldots \right) \right| \le E\, \frac{\partial K^*}{\partial u}\left( (u - F(X_i))/a_n, \ldots \right)$$

and similarly for the other partial derivative. The r.h.s. of the previous inequality is, after an integration by parts, of order $a_n^3$ by the results on the kernel estimator of the gradient of the density (see Scott [27], chapter 6). Therefore, $\sum_{i=1}^{n} E\|A_i\| = O(n a_n^3)$. Collecting all elements, we finally obtain that

$$\Delta_{11} = O_P\left( \frac{\sqrt{n \ln \ln n} + n a_n^3}{\sqrt{n}\, n a_n^3} \right) = O_P\left( \frac{\sqrt{\ln \ln n}}{n a_n^3} + \frac{1}{\sqrt{n}} \right) = o_P\left( \frac{1}{\sqrt{n a_n^2}} \right),$$

$$\Delta_{11} = O_{a.s.}\left( \frac{n^{(1+\delta)/2} + n a_n^3}{n a_n^3} \sqrt{\frac{\ln \ln n}{n}} \right) = O_{a.s.}\left( \sqrt{\frac{\ln \ln n}{n a_n^2}}\, \frac{1}{n^{(1-\delta)/2} a_n^2} + \sqrt{\frac{\ln \ln n}{n}} \right) = o_{a.s.}\left( \sqrt{\frac{\ln \ln n}{n a_n^2}} \right)$$

for $\delta$ small enough ($< 1/3$ for $a_n \simeq n^{-1/6}$). $\Box$

5.5 An approximation lemma for $\hat c_n(F_n(x), G_n(y))$ by $\hat c_n(F(x), G(y))$

The lemma of this subsection gives the rate of deviation of the kernel copula density estimator $\hat c_n$ from a varying location $(F_n(x), G_n(y))$ to a fixed location $(F(x), G(y))$.

Lemma 5.5 With the same assumptions as in the preceding lemma, we have

$$\hat c_n(F_n(x), G_n(y)) - \hat c_n(F(x), G(y)) = o_P\left( a_n^2 + \frac{1}{\sqrt{n a_n^2}} \right)$$

$$\hat c_n(F_n(x), G_n(y)) - \hat c_n(F(x), G(y)) = O_{a.s.}\left( \sqrt{\frac{\ln \ln n}{n}} \right)$$

Proof. We proceed similarly as in the preceding lemma. Set

$$\Delta_n(x, y) := \hat c_n(F_n(x), G_n(y)) - \hat c_n(F(x), G(y)) = \frac{1}{n a_n^2} \sum_{i=1}^{n} \Delta'_{i,n}(x, y) \tag{15}$$

with

$$\Delta'_{i,n}(x, y) := K\left( \frac{F_n(x) - F_n(X_i)}{a_n}, \frac{G_n(y) - G_n(Y_i)}{a_n} \right) - K\left( \frac{F(x) - F_n(X_i)}{a_n}, \frac{G(y) - G_n(Y_i)}{a_n} \right)$$
and define

$$Z_n(x, y) := \begin{pmatrix} F_n(x) - F(x) \\ G_n(y) - G(y) \end{pmatrix}.$$

We first express $\Delta'_{i,n}(x, y)$ at a fixed location $(F(x), G(y))$ by a Taylor expansion and by bounding uniformly the second order terms,

$$\Delta'_{i,n}(x, y) = Z_n^T(x, y)\, \frac{\nabla K}{a_n}\left( \frac{F(x) - F_n(X_i)}{a_n}, \frac{G(y) - G_n(Y_i)}{a_n} \right) + \frac{\|Z_n\|_\infty^2}{a_n^2} R_1 \tag{16}$$

where $R_1$ is uniformly bounded almost surely: $R_1 = O_{a.s.}(1)$. We then go from the data $(F_n(X_i), G_n(Y_i))$ to the pseudo-data $(F(X_i), G(Y_i))$, which are fixed with respect to $n$. By a second Taylor expansion,

$$\frac{\nabla K}{a_n}\left( \frac{F(x) - F_n(X_i)}{a_n}, \frac{G(y) - G_n(Y_i)}{a_n} \right) = \frac{\nabla K}{a_n}\left( \frac{F(x) - F(X_i)}{a_n}, \frac{G(y) - G(Y_i)}{a_n} \right) + Z_{i,n}^T\, \frac{\nabla^2 K}{2 a_n^2}\left( \frac{F(x) - F(X_i)}{a_n}, \frac{G(y) - G(Y_i)}{a_n} \right) + \frac{\|Z_n\|_\infty}{a_n^2} R_2 \tag{17}$$

where $R_2 = o_{a.s.}(1)$ uniformly in $i$, $x$ and $y$. Therefore, plugging 16 and 17 in 15, we get

$$\Delta_n(x, y) = \frac{Z_n^T(x, y)}{n a_n^2} \sum_{i=1}^{n} \frac{\nabla K}{a_n}\left( \frac{F(x) - F(X_i)}{a_n}, \frac{G(y) - G(Y_i)}{a_n} \right) + \frac{Z_n^T(x, y)}{n a_n^2} \sum_{i=1}^{n} Z_{i,n}^T\, \frac{\nabla^2 K}{2 a_n^2}\left( \frac{F(x) - F(X_i)}{a_n}, \frac{G(y) - G(Y_i)}{a_n} \right) + R_3 \frac{\|Z_n\|_\infty^2}{a_n^4}$$

with the remainder term $R_3 = O_{a.s.}(1)$ uniformly. As before, the properties of the kernel (derivative) density estimator (see Scott [27], chapter 6) entail that

$$\frac{1}{n a_n^3} \sum_{i=1}^{n} \nabla K\left( \frac{F(x) - F(X_i)}{a_n}, \frac{G(y) - G(Y_i)}{a_n} \right) = O_P\left( a_n^2 + \frac{1}{\sqrt{n a_n^4}} \right).$$

Therefore, using 12 and bounding uniformly the Hessian, 15 becomes

$$\Delta_n(x, y) = O_P\left( a_n^2 \|Z_n\|_\infty + \frac{\|Z_n\|_\infty}{\sqrt{n a_n^4}} \right) + O_P\left( \frac{\|Z_n\|_\infty^2}{a_n^4} \right) = o_P\left( a_n^2 + \frac{1}{\sqrt{n a_n^2}} \right).$$

Similarly, one gets with 13 and the strong consistency of the estimator of the gradient of the density that

$$\Delta_n(x, y) = O_{a.s.}\left( \sqrt{\frac{\ln \ln n}{n}} \right). \quad \Box$$

References

[1] D. Bosq. Nonparametric statistics for stochastic processes, volume 110 of Lecture Notes in Statistics. Springer-Verlag, New York, second edition, 1998. Estimation and prediction.
[2] S. X. Chen. Beta kernel estimators for density functions.
Comput. Statist. Data Anal., 31(2):131-145, 1999.
[3] P. Deheuvels. Conditions nécessaires et suffisantes de convergence ponctuelle presque sûre et uniforme presque sûre des estimateurs de la densité. C. R. Acad. Sci. Paris Sér. A, 278:1217-1220, 1974.
[4] P. Deheuvels. La fonction de dépendance empirique et ses propriétés. Un test non paramétrique d'indépendance. Acad. Roy. Belg. Bull. Cl. Sci. (5), 65(6):274-292, 1979.
[5] P. Deheuvels. A Kolmogorov-Smirnov type test for independence and multivariate samples. Rev. Roumaine Math. Pures Appl., 26(2):213-226, 1981.
[6] L. Devroye and G. Lugosi. Combinatorial methods in density estimation. Springer Series in Statistics. Springer-Verlag, New York, 2001.
[7] J. Fan and Q. Yao. Nonlinear time series. Springer Series in Statistics. Springer-Verlag, New York, second edition, 2005. Nonparametric and parametric methods.
[8] J. Fan, Q. Yao, and H. Tong. Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika, 83(1):189-206, 1996.
[9] J.-D. Fermanian. Goodness-of-fit tests for copulas. J. Multivariate Anal., 95(1):119-152, 2005.
[10] J.-D. Fermanian and O. Scaillet. Nonparametric estimation of copulas for time series. Journal of Risk, 5(4):25-54, 2003.
[11] J.-D. Fermanian, D. Radulović, and M. Wegkamp. Weak convergence of empirical copula processes. Bernoulli, 10(5):847-860, 2004.
[12] T. Gasser and H.-G. Müller. Kernel estimation of regression functions. In Smoothing techniques for curve estimation (Proc. Workshop, Heidelberg, 1979), volume 757 of Lecture Notes in Math., pages 23-68. Springer, Berlin, 1979.
[13] I. Gijbels and J. Mielniczuk. Estimating the density of a copula function. Comm. Statist. Theory Methods, 19(2):445-464, 1990.
[14] J. Gustafsson, M. Hagmann, J.P. Nielsen, and O. Scaillet. Local transformation kernel density estimation of loss distributions. Forthcoming in Journal of Business and Economic Statistics, 2007.
[15] L. Györfi and M. Kohler. Nonparametric estimation of conditional distributions. IEEE Trans. Inform. Theory, 53(5):1872-1879, 2007.
[16] P. D. Hoff. Extending the rank likelihood for semiparametric copula estimation. Annals Appl. Stats., 1(1):265-283, 2007.
[17] R. J. Hyndman, D. M. Bashtannyk, and G. K. Grunwald. Estimating and visualizing conditional densities. J. Comput. Graph. Statist., 5(4):315-336, 1996.
[18] R. J. Hyndman and Q. Yao. Nonparametric estimation and symmetry tests for conditional density functions. J. Nonparametr. Stat., 14(3):259-278, 2002.
[19] H. Joe. Multivariate models and dependence concepts, volume 73 of Monographs on Statistics and Applied Probability. Chapman & Hall, London, 1997.
[20] C. Lacour. Adaptive estimation of the transition density of a Markov chain. Ann. Inst. H. Poincaré Probab. Statist., 43(5):571-597, 2007.
[21] È. A. Nadaraya. Nonparametric estimation of probability densities and regression curves, volume 20 of Mathematics and its Applications (Soviet Series). Kluwer Academic Publishers Group, Dordrecht, 1989. Translated from the Russian by Samuel Kotz.
[22] E. Parzen. On estimation of a probability density function and mode. Ann. Math. Statist., 33:1065-1076, 1962.
[23] B. L. S. Prakasa Rao. Nonparametric functional estimation. Probability and Mathematical Statistics. Academic Press Inc. [Harcourt Brace Jovanovich Publishers], New York, 1983.
[24] M. B. Priestley and M. T. Chao. Non-parametric function fitting. J. Roy. Statist. Soc. Ser. B, 34:385-392, 1972.
[25] M. Rosenblatt. Remarks on some nonparametric estimates of a density function. Ann. Math. Statist., 27:832-837, 1956.
[26] M. Rosenblatt. Conditional probability density and regression estimators. In Multivariate Analysis, II (Proc. Second Internat. Sympos., Dayton, Ohio, 1968), pages 25-31. Academic Press, New York, 1969.
[27] D. W. Scott. Multivariate density estimation. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. John Wiley & Sons Inc., New York, 1992. Theory, practice, and visualization, A Wiley-Interscience Publication.
[28] G. R. Shorack and J. A. Wellner. Empirical processes with applications to statistics. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, 1986.
[29] M. Sklar. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris, 8:229-231, 1959.
[30] C. J. Stone. Optimal rates of convergence for nonparametric estimators. Ann. Statist., 8(6):1348-1360, 1980.
[31] W. Stute. A law of the logarithm for kernel density estimators. Ann. Probab., 10(2):414-422, 1982.
[32] W. Stute. Asymptotic normality of nearest neighbor regression function estimates. Ann. Statist., 12(3):917-926, 1984.
[33] W. Stute. Conditional empirical processes. Ann. Statist., 14(2):638-647, 1986.
[34] W. Stute. On almost sure convergence of conditional empirical distribution functions. Ann. Probab., 14(3):891-901, 1986.
[35] A. W. van der Vaart. Asymptotic statistics, volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.
[36] A. W. van der Vaart and J. A. Wellner. Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York, 1996. With applications to statistics.