A quantile-copula approach to conditional density estimation

We present a new non-parametric estimator of the conditional density of the kernel type. It is based on an efficient transformation of the data by quantile transform. By use of the copula representation, it turns out to have a remarkable product form…

Authors: ** Olivier P. Faugeras (Université Paris‑Sud, LSTA) **

Olivier P. Faugeras
L.S.T.A, Université Paris 6, 175, rue du Chevaleret, 75013 Paris, France
Tel: +(33) 1 44 27 85 62, Fax: +(33) 1 44 27 33 42
Email address: olivier.faugeras@gmail.com (Olivier P. Faugeras)
Preprint submitted to Elsevier, November 1, 2018

Abstract

We present a new non-parametric estimator of the conditional density of the kernel type. It is based on an efficient transformation of the data by quantile transform. By use of the copula representation, it turns out to have a remarkable product form. We study its asymptotic properties and compare its bias and variance to competitors based on nonparametric regression. A comparative numerical simulation is provided.

Key words: conditional density, kernel estimation, copula, quantile transform, nonparametric regression
1991 MSC: 62G007, 62M20, 62M10

1 Introduction

1.1 Motivation

Let $((X_i, Y_i);\ i = 1, \dots, n)$ be an independent identically distributed sample from real-valued random variables $(X, Y)$ sitting on a given probability space. For predicting the response $Y$ of the input variable $X$ at a given location $x$, it is of great interest to estimate not only the conditional mean or regression function $E(Y \mid X = x)$, but the full conditional density $f(y \mid x)$. Indeed, estimating the conditional density is much more informative, since it allows one not only to recalculate the conditional expected value $E(Y \mid X)$ and the conditional variance from the density, but also to provide the general shape of the conditional density. This is especially important for multi-modal or skewed densities, which often arise from nonlinear or non-Gaussian phenomena, where the expected value might be nowhere near a mode, i.e. the most likely value to appear.
Moreover, for situations in which confidence intervals are preferred to point estimates, the estimated conditional density is an object of obvious interest.

1.2 Estimation by kernel smoothing

A natural approach to estimate the conditional density $f(y \mid x)$ of $Y$ given $X = x$ would be to exploit the identity
\[
f(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)} \tag{1}
\]
where $f_{XY}$ and $f_X$ denote the joint density of $(X, Y)$ and the density of $X$, respectively. By introducing Parzen-Rosenblatt kernel estimators of these densities, namely
\[
\hat f_{n,XY}(x, y) := \frac{1}{n} \sum_{i=1}^{n} K'_{h'}(X_i - x)\, K_h(Y_i - y), \qquad
\hat f_{n,X}(x) := \frac{1}{n} \sum_{i=1}^{n} K'_{h'}(X_i - x),
\]
where $K_h(\cdot) = (1/h) K(\cdot/h)$ and $K'_{h'}(\cdot) = (1/h') K'(\cdot/h')$ are (rescaled) kernels with their associated sequences of bandwidths $h = h_n$ and $h' = h'_n$ going to zero as $n \to \infty$, one can construct the quotient
\[
\hat f^{R}_n(y \mid x) := \frac{\hat f_{n,XY}(x, y)}{\hat f_{n,X}(x)}
\]
and obtain an estimator of the conditional density. Such an estimator was first studied by Rosenblatt [26], and more recently by Hyndman et al. [17], who slightly improved on Rosenblatt's kernel based estimator.

1.3 Estimation by regression techniques

As pointed out by numerous authors, see e.g. Fan and Yao [7] chapter 6, this approach is equivalent to the one arising from considering this conditional density estimation problem in a regression framework. Indeed, let $F(y \mid x)$ be the cumulative conditional distribution function of $Y$ given $X = x$. It stems from the fact that
\[
E\left( 1_{|Y - y| \le h} \mid X = x \right) = F(y + h \mid x) - F(y - h \mid x) \approx 2h\, f(y \mid x)
\]
as $h \to 0$, that, if one replaces the expectation in the above expression by its empirical counterpart, one can apply the usual local averaging methods and perform a regression estimation on the synthetic data $\left( (1/2h)\, 1_{|Y_i - y| \le h};\ i = 1, \dots, n \right)$.
By a Bochner type theorem, one can even replace the transformed data by its smoothed version
\[
Y'_i := K_h(Y_i - y) := \frac{1}{h} K\left( \frac{Y_i - y}{h} \right).
\]
In particular, the popular Nadaraya-Watson regression estimator
\[
\hat f^{NW}_n(y \mid x) := \frac{\sum_{i=1}^{n} Y'_i\, K'_{h'}(X_i - x)}{\sum_{i=1}^{n} K'_{h'}(X_i - x)}
\]
reduces to the same estimator of the conditional density of the double kernel type as before,
\[
\hat f^{NW}_n(y \mid x) = \frac{\sum_{i=1}^{n} K_h(Y_i - y)\, K'_{h'}(X_i - x)}{\sum_{i=1}^{n} K'_{h'}(X_i - x)} = \hat f^{R}_n(y \mid x).
\]
Taking advantage of this regression formulation, Fan, Yao and Tong [8] proposed a conditional density estimator which generalizes the kernel one by use of local polynomial techniques. In particular, it allows one to tackle the bias issues of kernel smoothing. However, and unlike the former, it is no longer guaranteed to take positive values nor to integrate to 1 with respect to $y$. With these issues in mind, Hyndman and Yao [18] built on local polynomial techniques and suggested two improved methods, the first one based on locally fitting a log-linear model and the second one on constrained local polynomial modeling. An overview can be found in Fan and Yao [7] (chapters 6 and 10). Very recently, Györfi and Kohler [15] studied a partitioning type estimate and its properties in total variation norm, and Lacour [20] a projection-type estimate for Markov chains.

1.4 A product shaped estimator

However, these two equivalent approaches suffer from several drawbacks: first, by its form as a quotient of two estimators, the probabilistic behavior of the Nadaraya-Watson estimator (or its local polynomial counterpart) is tricky to study. It is usually dealt with by centering both numerator and denominator at their expectations and linearizing the inverse, see e.g. [7], or [1] for details.
Seond, at a oneptual lev el, one ould argue that implemen ting regression estimation te hniques in this setting is, in a sense, unnatural: estimating a densit y , ev en if it is a onditional one, should resort to densit y estimation te hniques only . Finally , pratial implemen tations of these estimators an lead to n umerial instabilit y when the denominator is lose to zero. 3 T o remedy these problems, w e prop ose an estimator whi h builds on the idea of using syn theti data, i.e. a represen tation of the data more adapted to the problem than the original one. By transforming the data b y quan tile trans- forms and making use of the opula funtion, the estimator turns out to ha v e a remark able pr o dut form ˆ f n ( y | x ) = ˆ f Y ( y ) ˆ c n ( F n ( x ) , G n ( y )) where ˆ f Y , ˆ c n , F n ( x ) , G n ( y ) are estimators of the densit y f Y of Y , the opula densit y c , the .d.f. F of X and G of Y resp etiv ely (see next setion b elo w for denitions). Its study then rev eals to b e partiularly simple: it redues to the ones already done on nonparametri densit y estimation. The rest of the pap er is organized as follo ws: in setion 2, w e in tro due the quan tile transform and the opula represen tation whi h leads to the denition of our estimator. In setion 3, the main asymptoti results are established and ompared in setion 4 to those of other omp etitors. Pro ofs are mainly based on a series of auxiliary lemmas whi h are giv en in setion 5. 2 Presen tation of the estimator F or sak e of simpliit y and larit y of exp osition, w e limit ourselv es to unidi- mensional real v alued input v ariables X . Ho w ev er, all the results of this artile an b e easily extended to the m ultiv ariate ase. 2.1 The quantile tr ansform The idea of transforming the data is not new. It has b een used to impro v e the range of appliabilit y and p erformane of lassial estimation te hniques, e.g. 
to deal with sk ew ed data, hea vy tails, or restritions on the supp ort (see e.g. Devro y e and Lugosi [ 6 ℄  hapter 14 and the referenes therein, and also V an der V aart [ 35 ℄  hapter 3.2 for the related topi of v ariane stabilizing transformations in a parametri on text). In order to mak e inferene on Y from X , a natural question whi h then arises is, what is the b est transformation, if this question has a sense. As one an note from the ab o v e referenes, the b est transformation is v ery link ed to the distribution of the underlying data. W e will see b elo w that, for our problem, the natural andidate is the quan tile transform. The quan tile transform is a w ell-kno wn probabilisti tri k whi h is used to redue pro ofs, e.g. in empirial pro ess theory , for arbitrary real v alued ran- dom v ariables X to ones for random v ariables U uniformly distributed on the 4 in terv al [0 , 1] . It is based on the follo wing w ell-kno wn fat that whenev er F is on tin uous, the random v ariable U = F ( X ) is uniformly distributed on (0 , 1) and that on v ersely , when F is arbitrary , if U is a uniformly distributed ran- dom v ariable on (0 , 1) , X is equal in la w to F − 1 ( U ) , where F − 1 = Q is the generalized in v erse or quan tile funtion of X . (See e.g. [ 28 ℄,  hapter 1). As a onsequene, giv en a sample ( X 1 , . . . , X n ) of random v ariables with om- mon on tin uous .d.f. F sitting on a probabilit y spae (Ω , A , P ) , one an al- w a ys enlarge this probabilit y spae to arry a sequene ( U 1 , . . . , U n ) of uniform (0 , 1) random v ariables su h that U i = F ( X i ) , that is to sa y to onstrut a pseudo-sample with a pr esrib e d uniform marginal distribution. 2.2 The  opula r epr esentation F ormally , a opula is a bi-(or m ulti)v ariate distribution funtion whose margi- nal distribution funtions are uniform on the in terv al [0 , 1] . 
Indeed, Sklar [29] proved the following fundamental result:

Theorem 2.1 For any bivariate cumulative distribution function $F_{X,Y}$ on $\mathbb{R}^2$, with marginal cumulative distribution functions $F$ of $X$ and $G$ of $Y$, there exists some function $C : [0, 1]^2 \to [0, 1]$, called the dependence or copula function, such that
\[
F_{X,Y}(x, y) = C(F(x), G(y)), \qquad -\infty \le x, y \le +\infty. \tag{2}
\]
If $F$ and $G$ are continuous, this representation is unique with respect to $(F, G)$. The copula function $C$ is itself a cumulative distribution function on $[0, 1]^2$ with uniform marginals.

This theorem gives a representation of the bivariate c.d.f. as a function of each univariate c.d.f. In other words, the copula function captures the dependence structure among the components $X$ and $Y$ of the vector $(X, Y)$, irrespective of the marginal distributions $F$ and $G$. Simply put, it allows one to deal with the randomness of the dependence structure and the randomness of the marginals separately.

Copulas appear to be naturally linked with the quantile transform, as formula (2) entails that $C(u, v) = F_{X,Y}(F^{-1}(u), G^{-1}(v))$. For more details regarding copulas and their properties, one can consult for example the book of Joe [19]. Copulas have witnessed a renewed interest in statistics, especially in finance, since the pioneering work of Deheuvels [4], who introduced the empirical copula process. Weak convergence of the empirical copula process was investigated by Deheuvels [5], Van der Vaart and Wellner [36], and Fermanian, Radulovic and Wegkamp [11]. For the estimation of the copula density, refer to Gijbels and Mielniczuk [13], Fermanian [9] and Fermanian and Scaillet [10].
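The statement that the copula captures the dependence structure irrespective of the marginals can be checked numerically on the empirical quantile transforms. The following is only an illustrative sketch: the helper name `pseudo_obs`, the simulated Gaussian pair and the two monotone transformations are assumptions made for the example and do not come from the paper.

```python
import numpy as np

def pseudo_obs(z):
    """Empirical quantile transform F_n(z_i) = rank(z_i) / n (sample without ties)."""
    z = np.asarray(z)
    ranks = np.argsort(np.argsort(z)) + 1  # ranks 1..n
    return ranks / len(z)

# Hypothetical dependent sample, used only for illustration.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = 0.8 * x + 0.6 * rng.standard_normal(500)

# Pseudo-observations of the original sample.
u, v = pseudo_obs(x), pseudo_obs(y)

# Strictly increasing transformations change the marginals of X and Y ...
u2, v2 = pseudo_obs(np.exp(x)), pseudo_obs(y ** 3)

# ... but leave the pseudo-observations, hence any copula estimate built
# on them, unchanged.
print(np.array_equal(u, u2), np.array_equal(v, v2))
```

Since ranks are invariant under strictly increasing maps, any estimator that depends on the data only through $(F_n(X_i), G_n(Y_i))$ inherits this invariance.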
From now on, we assume that the copula function $C(u, v)$ has a density $c(u, v)$ with respect to the Lebesgue measure on $[0, 1]^2$ and that $F$ and $G$ are strictly increasing and differentiable with densities $f$ and $g$. $C(u, v)$ and $c(u, v)$ are then the cumulative distribution function (c.d.f.) and density respectively of the transformed variables $(U, V) = (F(X), G(Y))$. By differentiating formula (2), we get for the joint density
\[
f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y} = f(x)\, g(y)\, c(F(x), G(y))
\]
where $c(u, v) := \dfrac{\partial^2 C(u, v)}{\partial u\, \partial v}$ is the above mentioned copula density. Eventually, we can obtain the following explicit formula for the conditional density
\[
f_{Y|X}(x, y) = \frac{f_{XY}(x, y)}{f(x)} = g(y)\, c(F(x), G(y)). \tag{3}
\]

2.3 Construction of the estimator

Starting from the previously stated product type formula (3), a natural plug-in approach to build an estimator of the conditional density is to use

• a Parzen-Rosenblatt kernel type nonparametric estimator of the marginal density $g$ of $Y$,
\[
\hat g_n(y) := \frac{1}{n h_n} \sum_{i=1}^{n} K_0\left( \frac{y - Y_i}{h_n} \right);
\]
• the empirical distribution functions $F_n(x)$ and $G_n(y)$ for $F(x)$ and $G(y)$ respectively,
\[
F_n(x) := \frac{1}{n} \sum_{j=1}^{n} 1_{X_j \le x} \quad \text{and} \quad G_n(y) := \frac{1}{n} \sum_{j=1}^{n} 1_{Y_j \le y}.
\]

Concerning the copula density $c(u, v)$, we noted that $c(u, v)$ is the joint density of the transformed variables $(U, V) = (F(X), G(Y))$. Therefore, $c(u, v)$ can be estimated by the bivariate Parzen-Rosenblatt kernel type nonparametric density (pseudo) estimator,
\[
c_n(u, v) := \frac{1}{n a_n b_n} \sum_{i=1}^{n} K\left( \frac{u - U_i}{a_n}, \frac{v - V_i}{b_n} \right) \tag{4}
\]
where $K$ is a bivariate kernel and $a_n$, $b_n$ its associated bandwidths. For simplicity, we restrict ourselves to product kernels, i.e. $K(u, v) = K_1(u) K_2(v)$, with the same bandwidths $a_n = b_n$.
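The building blocks just listed can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: the Epanechnikov kernel is used for $K_0$, $K_1$ and $K_2$, the function names are hypothetical, and the pseudo-sample $(U_i, V_i)$ is simulated directly as independent uniforms (so the true copula density is 1), which is only possible here because the margins are known by construction.

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel 0.75 * (1 - t^2) on [-1, 1], zero elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)

def g_hat(y, Y, h):
    """Kernel estimate of the marginal density g at the point y."""
    return np.mean(epanechnikov((y - Y) / h)) / h

def ecdf(x, X):
    """Empirical distribution function F_n(x) = (1/n) * #{j : X_j <= x}."""
    return np.mean(X <= x)

def c_pseudo(u, v, U, V, a):
    """Pseudo-estimator (4) with a product kernel and a_n = b_n = a."""
    return np.mean(epanechnikov((u - U) / a) * epanechnikov((v - V) / a)) / a ** 2

# Hypothetical pseudo-sample with known uniform margins and independent
# components: the copula density equals 1 on the unit square.
rng = np.random.default_rng(1)
U = rng.uniform(size=20000)
V = rng.uniform(size=20000)
c_mid = c_pseudo(0.5, 0.5, U, V, a=0.1)
print(c_mid)  # expected to be close to 1 at an interior point
```

The estimate is evaluated at an interior point on purpose; near the edges of the unit square a kernel with bandwidth `a` loses mass outside $[0, 1]^2$.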
Nonetheless, since $F$ and $G$ are unknown, the random variables $(U_i, V_i)$ are not observable, i.e. $c_n$ is not a true statistic. Therefore, we approximate the pseudo-sample $(U_i, V_i)$, $i = 1, \dots, n$, by its empirical counterpart $(F_n(X_i), G_n(Y_i))$, $i = 1, \dots, n$. We therefore obtain a genuine estimator of $c(u, v)$,
\[
\hat c_n(u, v) := \frac{1}{n a_n^2} \sum_{i=1}^{n} K_1\left( \frac{u - F_n(X_i)}{a_n} \right) K_2\left( \frac{v - G_n(Y_i)}{a_n} \right). \tag{5}
\]
Eventually, the conditional density estimator is written as
\[
\hat f_n(y \mid x) := \left[ \frac{1}{n h_n} \sum_{i=1}^{n} K_0\left( \frac{y - Y_i}{h_n} \right) \right] \cdot \left[ \frac{1}{n a_n^2} \sum_{i=1}^{n} K_1\left( \frac{F_n(x) - F_n(X_i)}{a_n} \right) K_2\left( \frac{G_n(y) - G_n(Y_i)}{a_n} \right) \right]
\]
or, in a more compact form,
\[
\hat f_n(y \mid x) := \hat g_n(y)\, \hat c_n(F_n(x), G_n(y)). \tag{6}
\]

Remark 1 To our knowledge, the estimator studied in this paper has never been proposed in the literature. However, some connections can be made with the nearest neighbor one proposed by Stute [32], [33] and [34] for the conditional cumulative distribution function, and with the Gasser and Müller [12] and Priestley and Chao [24] ones in the context of regression estimation. Indeed, these estimators tackle the issue of having a random denominator by first transforming the design $X_1, \dots, X_n$ to a uniform (random) one. This results in assigning the surfaces under the kernel function instead of its heights as weights. Contrary to our estimator, they do not make transformations of the data in both directions $X$ and $Y$.

3 Asymptotic results

3.1 Notations and assumptions

We note the $i$th moment of a generic kernel (possibly multivariate) $K$ as $m_i(K) := \int u^i K(u)\, du$, and the $L_p$ norm of a function $h$ by $\|h\|_p := \left( \int |h|^p \right)^{1/p}$. We use the sign $\simeq$ to denote the order of the bandwidths, i.e. $h_n \simeq u_n$ means that $h_n = c_n u_n$ with $c_n \to c > 0$.
The supports of the density functions $f$ and $c$ are noted as $\mathrm{supp}(f) = \{ x \in \mathbb{R};\ f(x) > 0 \}$ and $\mathrm{supp}(c) = \{ (u, v) \in \mathbb{R}^2;\ c(u, v) > 0 \}$, respectively.

For stating our results, we will have to make some regularity assumptions on the kernels and the densities which, although far from being minimal, are somehow customary in kernel density estimation (see subsection 5.2 for discussions and details). Let $x$ and $y$ be two fixed points in the interior of $\mathrm{supp}(f)$ and $\mathrm{supp}(g)$ respectively. In the remainder of this paper, we will always suppose that
i) the c.d.f. $F$ of $X$ and $G$ of $Y$ are strictly increasing and differentiable;
ii) the densities $g$ and $c$ are twice differentiable with continuous bounded second derivatives on their support.

Moreover, we assume that the kernels $K_0$ and $K$ satisfy the following:
(i) $K$ and $K_0$ are of bounded support and of bounded variation;
(ii) $0 \le K \le C$ and $0 \le K_0 \le C$ for some constant $C$;
(iii) $K$ and $K_0$ are first order kernels: $m_0(K) = 1$, $m_1(K) = 0$ and $m_2(K) < +\infty$, and the same for $K_0$.

In addition, in order to approximate $\hat c_n$ by $c_n$, we will impose the slightly more stringent assumption on the bivariate kernel $K$ that it is twice differentiable with bounded second partial derivatives.

3.2 Weak and strong consistency of the estimator

We have the following pointwise weak consistency theorem:

Theorem 3.1 Let the regularity conditions on the densities and kernels be satisfied. If $h_n$ and $a_n$ tend to zero as $n \to \infty$ in such a way that $n h_n \to \infty$ and $n a_n^2 \to \infty$, then
\[
\hat f_n(y \mid x) = f(y \mid x) + O_P\left( \frac{1}{\sqrt{n h_n}} + h_n^2 + \frac{1}{\sqrt{n a_n^2}} + a_n^2 \right).
\]

Proof. Recall from (4) and (5) that $c_n$ and $\hat c_n$ are estimators of the copula density $c$ based respectively on the unobservable pseudo-data $(F(X_i), G(Y_i))$ and their approximations $(F_n(X_i), G_n(Y_i))$.
The main ingredient of the proof follows from the decomposition:
\[
\begin{aligned}
\hat f_n(y \mid x) - f(y \mid x) &= \hat g_n(y)\, \hat c_n(F_n(x), G_n(y)) - g(y)\, c(F(x), G(y)) \\
&= [\hat g_n(y) - g(y)]\, \hat c_n(F_n(x), G_n(y)) + g(y)\, [\hat c_n(F_n(x), G_n(y)) - c(F(x), G(y))] \\
&:= D_1 + D_2 .
\end{aligned}
\]
We proceed one step further in the decomposition of each term, by first centering at fixed locations,
\[
\begin{aligned}
D_1 ={}& [\hat g_n(y) - g(y)]\, [\hat c_n(F_n(x), G_n(y)) - \hat c_n(F(x), G(y))] \\
&+ [\hat g_n(y) - g(y)]\, [\hat c_n(F(x), G(y)) - c_n(F(x), G(y))] \\
&+ [\hat g_n(y) - g(y)]\, [c_n(F(x), G(y)) - c(F(x), G(y))] \\
&+ [\hat g_n(y) - g(y)]\, c(F(x), G(y))
\end{aligned} \tag{7}
\]
\[
\begin{aligned}
D_2 ={}& g(y)\, [\hat c_n(F_n(x), G_n(y)) - \hat c_n(F(x), G(y))] \\
&+ g(y)\, [\hat c_n(F(x), G(y)) - c_n(F(x), G(y))] \\
&+ g(y)\, [c_n(F(x), G(y)) - c(F(x), G(y))]
\end{aligned} \tag{8}
\]
Convergence results for the kernel density estimators of section 5.2 entail that
\[
\hat g_n(y) - g(y) = O_P\left( h_n^2 + 1/\sqrt{n h_n} \right), \qquad
c_n(F(x), G(y)) - c(F(x), G(y)) = O_P\left( a_n^2 + 1/\sqrt{n a_n^2} \right)
\]
by lemmas 5.2 and 5.3 respectively. Approximation lemmas 5.4 and 5.5 of sections 5.4 and 5.5 entail that
\[
\hat c_n(F(x), G(y)) - c_n(F(x), G(y)) = o_P\left( a_n^2 + 1/\sqrt{n a_n^2} \right), \qquad
\hat c_n(F_n(x), G_n(y)) - \hat c_n(F(x), G(y)) = o_P\left( a_n^2 + 1/\sqrt{n a_n^2} \right).
\]
We therefore obtain that
\[
\begin{aligned}
D_1 &= O_P\left( h_n^2 + 1/\sqrt{n h_n} \right) O_P\left( a_n^2 + 1/\sqrt{n a_n^2} \right) + O_P\left( h_n^2 + 1/\sqrt{n h_n} \right), \\
D_2 &= o_P\left( a_n^2 + 1/\sqrt{n a_n^2} \right) + O_P\left( a_n^2 + 1/\sqrt{n a_n^2} \right),
\end{aligned}
\]
and the condition $a_n \to 0$, $h_n \to 0$, $n a_n^2 \to +\infty$, $n h_n \to +\infty$ entails the convergence of the estimator.
□

Remark 2 As a corollary, we get the rate of convergence by choosing the bandwidths which balance the bias and variance trade-off: for an optimal choice of $h_n \simeq n^{-1/5}$ and $a_n \simeq n^{-1/6}$, we get
\[
\hat f_n(y \mid x) = f(y \mid x) + O_P(n^{-1/3}).
\]
Therefore, our estimator is rate optimal in the sense that it reaches the minimax rate $n^{-1/3}$ of convergence, according to Stone [30].

Almost sure results can be proved in the same way: we have the following strong consistency result.

Theorem 3.2 Let the regularity conditions on the densities and kernels be satisfied. If in addition $n h_n / (\ln \ln n) \to \infty$ and $n a_n^2 / (\ln \ln n) \to \infty$, then
\[
\hat f_n(y \mid x) = f(y \mid x) + O_{a.s.}\left( a_n^2 + \sqrt{\frac{\ln \ln n}{n a_n^2}} + h_n^2 + \sqrt{\frac{\ln \ln n}{n h_n}} \right).
\]

Proof. It follows the same lines as the preceding theorem, but uses the a.s. results of the consistency of the kernel density estimators of lemmas 5.2 and 5.3 and of the approximation lemmas 5.4 and 5.5. It is therefore omitted. □

Remark 3 For $h_n \simeq (\ln \ln n / n)^{1/5}$ and $a_n \simeq (\ln \ln n / n)^{1/6}$, which is the optimal trade-off between the bias and the stochastic term, one gets the optimal rate $(\ln \ln n / n)^{1/3}$.

3.3 Convergence in distribution

Theorem 3.3 Let the regularity conditions on the densities and kernels be satisfied. Then $h_n \to 0$, $a_n \to 0$, $n h_n \to \infty$ and $n a_n^2 \to \infty$ entail
\[
\sqrt{n a_n^2}\, \left( \hat f_n(y \mid x) - f(y \mid x) \right) \overset{d}{\rightsquigarrow} N\left( 0,\ g(y)\, f(y \mid x)\, \|K\|_2^2 \right).
\]
For $h_n \simeq n^{-1/5}$, $a_n \simeq n^{-1/6}$ one gets the usual rate $n^{-1/3}$.

Proof. With the conditions on the bandwidths, all the terms in the previous decompositions (7) and (8) are negligible compared to $(n a_n^2)^{-1/2}$ except $c_n(F(x), G(y)) - c(F(x), G(y))$, which is asymptotically normal by the result of section 5, lemma 5.3:
\[
\sqrt{n a_n^2}\, g(y)\, [c_n(F(x), G(y)) - c(F(x), G(y))] \overset{d}{\rightsquigarrow} N\left( 0,\ g^2(y)\, c(F(x), G(y))\, \|K\|_2^2 \right).
\]
An appliation of Slutsky's lemma yields the desired result. ✷ F or a v etor ( y 1 , . . . , y d ) , one an get a m ultidimensional v ersion of the on- v ergene in distribution (di on v ergene): Corollary 3.4 With the same assumptions, for ( y 1 , . . . , y d ) in the interior of supp ( g ) suh that g ( y i ) f ( y i | x ) 6 = 0 , q na 2 n     ˆ f n ( y i | x ) − f ( y i | x ) q g ( y i ) f ( y i | x ) k K k 2   , i = 1 , ..., m   d ❀ N ( m ) wher e N ( m ) is the standar d m -variate  enter e d normal distribution with iden- tity varian e matrix. 10 Pro of. It simply follo ws from the use of the Cramér-W old devie and is there- fore omitted. F or details, see e.g. [ 1 ℄, theorem 2.3. ✷ 3.4 Asymptoti Bias, V arian e and Me an squar e err or The asymptoti bias is alulated in the follo wing prop osition. Prop osition 3.5 With the assumptions of The or em 3.1 , we have B 0 := E ( ˆ f n ( y | x )) − f ( y | x ) = g ( y ) B K ( c, x, y ) a 2 n 2 + o ( a 2 n ) with B K ( c, x, y ) := m 2 ( K 1 ) ∂ 2 c ( F ( x ) ,G ( y )) ∂ u 2 + m 2 ( K 2 ) ∂ 2 c ( F ( x ) ,G ( y )) ∂ v 2 . Pro of. (Sk et h). By taking exp etation in the deomp osition 7 and 8 , E D 1 = c ( F ( x ) , G ( y )) E [ ˆ g n ( y ) − g ( y )] + R 1 E D 2 = g ( y ) E ([ c n ( F ( x ) , G ( y )) − c ( F ( x ) , G ( y ))]) + R 2 where w e made app ear the bias of ˆ g n and c n and where R 1 and R 2 stand for the remaining terms. With the assumptions on the bandwidths and deriv ations made tedious b y the transformation of the data b y the empirial margins, (see F ermanian [ 9 ℄ theorem 1 for su h a alulation), the terms in R 2 are negligible ompared to the bias of c n . The bias of c n , whi h is simply the bias of a biv ariate k ernel densit y estimator, is of order a 2 n . Similarly , b y b ounding the pro dut terms in D 1 b y Cau h y-S h w arz inequalit y , routine analysis sho w that the terms in R 1 are negligible ompared to the bias of ˆ g n , whi h is of order h 2 n . 
Sine h 2 n is itself negligible to a 2 n , the main term in the deomp osition is g ( y ) E ( c n ( F ( x ) , G ( y )) − C ( F ( x ) , G ( y ))) . Plugging the expression of the bias giv en in lemma 5.3 , yields the desired result. ✷ The asymptoti v ariane has already b een deriv ed in theorem 3.3 , V 0 := V ar ( ˆ f ( y | x )) = 1 / ( na 2 n ) g ( y ) f ( y | x ) | | K | | 2 2 + o (1 / ( na 2 n )) . T ogether with the omputation of the asymptoti bias, w e get the asymptoti mean squared error as a orollary: Corollary 3.6 With the pr evious assumptions, the Asymptoti Me an Squar e d Err or (AMSE) E 0 at ( x, y ) is E 0 := B 2 0 + V 0 = a 4 n g 2 ( y ) ( B k ( c, x, y )) 2 4 + g ( y ) f ( y | x ) | | K || 2 2 na 2 n + o a 4 n + 1 na 2 n ! 11 whih gives, for the hoi e of the usual b andwidths mentione d ab ove, E 0 = n − 2 / 3 g 2 ( y ) B 2 K ( c, x, y ) 4 + c ( F ( x ) , G ( y )) || K || 2 2 ! + o ( n − 2 / 3 ) . 4 Comparison with other estimators 4.1 Pr esentation of alternative estimators F or on v eniene, w e reall b elo w the denition of other estimators of the on- ditional densit y enoun tered in the literature and summarize their bias and v ariane prop erties. W e will note the bias of the ith estimator ˆ f i n ( y | x ) b y E i and its v ariane b y V i . (1) Double k ernel estimator : as dened in the in tro dution setion of our pap er b y the follo wing ratio, ˆ f (1) n ( y | x ) := 1 n n P i =1 K ′ h 1 ( X i − x ) K h 2 ( Y i − y ) 1 n n P i =1 K ′ h 1 ( X i − x ) . where h 1 and h 2 are the bandwidths. One then ha v e, see e.g. [ 17 ℄, • Bias: B 1 = h 2 1 m 2 ( K ) 2   2 f ′ ( x ) f ( x ) ∂ f ( y | x ) ∂ x + ∂ 2 f ( y | x ) ∂ x 2 + h 2 h 1 ! 
2 ∂ 2 f ( y | x ) ∂ y 2   + o  h 2 1 + h 2 2  • V ariane: V 1 = k K k 2 2 f ( y | x ) nh 1 h 2 f ( x )  k K k 2 2 − h 2 f ( y | x )  + o  1 nh 1 h 2  (2) Lo al p olynomial estimator : Set R ( θ , x, y ) := n X i =1  K h 2 ( Y i − y ) − X r j =0 θ j ( X i − x ) j  2 K ′ h 1 ( X i − x ) , then the lo al p olynomial estimator is dened as ˆ f (2) n ( y | x ) := ˆ θ 0 , where ˆ θ xy := ( ˆ θ 0 , ˆ θ 1 , . . . , ˆ θ r ) is the v alue of θ whi h minimizes R ( θ , x, y ) . This lo al p olynomial estimator, although it has a sup erior bias than 12 the k ernel one, is no longer restrited to b e non-negativ e and do es not in tegrate to 1, exept in the sp eial ase r = 0 . F rom results of [ 8 ℄, w e get for the lo al linear estimator (see also [ 7 ℄ p. 256), • Bias: B 2 = h 2 1 m 2 ( K ′ ) 2 ∂ 2 f ( y | x ) ∂ x 2 + h 2 2 m 2 ( K ) 2 ∂ 2 f ( y | x ) ∂ y 2 + o ( h 2 1 + h 2 2 ) • V ariane: V 2 = || K || 2 2 || K ′ || 2 2 f ( y | x ) nh 1 h 2 f ( x ) + o  1 nh 1 h 2  (3) Lo al parametri estimator : As in [ 18 ℄ and [ 7 ℄, set R 1 ( θ , x, y ) := n X i =1 ( K h 2 ( Y i − y ) − A ( X i − x, θ )) 2 K ′ h 1 ( X i − x ) where A ( x, θ ) = l  P r j =0 θ j ( X i − x ) j  and l ( . ) is a monotoni funtion mapping R 7→ R + , e.g. l ( u ) = exp( u ) . Then, ˆ f (3) n ( y | x ) := A (0 , ˆ θ ) = l ( ˆ θ 0 ) . • Bias: B 3 = h 2 1 η ( K ′ ) ∂ 2 f ( y | x ) ∂ x 2 − ∂ 2 A (0 , θ xy ) ∂ x 2 ! + h 2 2 m 2 ( K ) 2 ∂ 2 f ( y | x ) ∂ y 2 + o ( h 2 1 + h 2 2 ) • V ariane: V 3 = τ ( K , K ′ ) 2 f ( y | x ) nh 1 h 2 f ( x ) + o  1 nh 1 h 2  where η and τ are k ernel dep enden t onstan ts. (4) Constrained lo al p olynomial estimator : A simple devie to fore the lo al p olynomial estimator to b e p ositiv e is to set θ 0 = exp( α ) in the denition of R 0 to b e minimized. The onstrained lo al p olynomial estimator ˆ f 4 n ( y | x ) is then dened analogously as the lo al p olynomial estimator ˆ f 2 n ( y | x ) . 
We have, as in [18] and [7]:
• Bias:
\[
B_4 := \frac{h_1^2\, m_2(K')}{2}\, \frac{\partial^2 f(y \mid x)}{\partial x^2} + \frac{h_2^2\, m_2(K)}{2}\, \frac{\partial^2 f(y \mid x)}{\partial y^2} + o(h_1^2 + h_2^2)
\]
• Variance:
\[
V_4 = \frac{\|K\|_2^2\, f(y \mid x)}{n h_1 h_2\, f(x)} + o\left( \frac{1}{n h_1 h_2} \right)
\]

4.2 Asymptotic Bias and Variance comparison

All estimators have (hopefully) the same orders $n^{-1/3}$ and $n^{-2/3}$ in their asymptotic bias and variance terms, for the usual choice of bandwidths. The main difference lies in the constant terms, which depend on unknown densities.

Bias: Contrary to all the alternative estimators, whose bias involves derivatives of the full conditional density, one can note that our estimator's bias only involves the density of $Y$ and the derivatives of the copula density. To make things more explicit, the terms involved, e.g. in the local polynomial estimator, are the sum of the derivatives of the conditional density,
\[
h_n^{-2} B_2 \approx \frac{\partial^2 f(y \mid x)}{\partial x^2} + \frac{\partial^2 f(y \mid x)}{\partial y^2},
\]
that is to say,
\[
h_n^{-2} B_2 \approx f'(x)\, g(y)\, \frac{\partial c(F(x), G(y))}{\partial u} + f^2(x)\, g(y)\, \frac{\partial^2 c(F(x), G(y))}{\partial u^2} + 2\, g'(y)\, g(y)\, \frac{\partial c(F(x), G(y))}{\partial v} + g^3(y)\, \frac{\partial^2 c(F(x), G(y))}{\partial v^2},
\]
whereas our $(g(y)/2)\, B_K(c, x, y)$ term, modulo the constants involved by the kernel, is written as
\[
a_n^{-2} B_0 \approx g(y) \left( \frac{\partial^2 c(F(x), G(y))}{\partial u^2} + \frac{\partial^2 c(F(x), G(y))}{\partial v^2} \right).
\]
It then becomes clear that we have a simpler expression, with fewer unknown terms than is the case for competitors, which do involve the density $f$ of $X$, its derivative $f'$, and the derivative $g'$ of the density of $Y$.

In a fixed bandwidth and asymptotic context, it seems difficult to compare further. Nonetheless, we believe this feature of our estimator would be practically relevant when it comes to choosing the bandwidths.
Indeed, bandwidth selection is usually performed by minimizing local or global asymptotic error criteria such as the Asymptotic Mean Square Error (AMSE) or the Asymptotic Mean Integrated Square Error (AMISE), in which unknown terms have to be estimated. Since in our approach the asymptotic bias and variance involve fewer unknown terms, we expect that a higher accuracy could be obtained in this pre-estimation stage. Moreover, by having managed to separate the estimation problem of the marginal from that of the copula density, we could use known optimal data-dependent bandwidth selection procedures for density estimation, such as cross validation, separately for the density of $Y$ and for the copula density.

Remark 4 Since the copula density $c$ has a compact support $[0, 1]^2$, our estimator may suffer from bias issues on the boundaries, i.e. in the tails of $X$ and $Y$. To correct these issues, one could apply one of the several known techniques to reduce the bias of the kernel estimator on the edges (see e.g. [7] chapter 5.5: boundary kernels, reflection, transformation and local polynomial fitting). In the tail of the distribution of $X$, this bias issue in the copula density estimator is balanced by the improved variance, as shown below.

Variance: The variance of our estimator involves the product of the density $g(y)$ of $Y$ by the conditional density $f(y \mid x)$,
\[
n a_n^2\, V_0 \approx g(y)\, f(y \mid x) = g^2(y)\, c(F(x), G(y)),
\]
whereas competitors involve the ratio of $f(y \mid x)$ by the density $f(x)$ of $X$,
\[
\frac{f(y \mid x)}{f(x)} = \frac{g(y)}{f(x)}\, c(F(x), G(y)).
\]
It is a remarkable feature of the estimator we propose that its variance does not involve $f(x)$ directly, as is the case for the competitors, but only its contribution to $Y$, through the copula density.
This reets the abilit y announed in the in tro dution of the opula represen tation to ha v e eetiv ely separated the randomness p ertaining to Y alone, from the dep endene struture of ( X , Y ) . Moreo v er, our estimator also do es not suer from the unstable nature of om- p etitors who, due to their in trinsi ratio struture, get an explosiv e v ariane for small v alue of the densit y f ( x ) , making onditional estimation diult, e.g. in the tail of the distribution of X . Remark 5 T o make estimators  omp ar able, we have r estrite d ourselves to so- al le d xe d b andwidths estimators, i.e. nonp ar ametri estimators wher e the b andwidths ar e of the generi form h n = bn α or h n = b (ln n/n ) α with α and b r e al numb ers. Impr ove d b ehavior for al l the pr e  e ding estimators  an b e obtaine d with data-dep endent b andwidths wher e h n = H n ( X 1 , . . . , X n , x )  an b e funtions of the lo  ation and of the data. 4.3 Finite sample numeri al simulation 4.3.1 Pr ati al implementation of the estimator Although the prop osed estimator seems to ompare fa v orably asymptotially , some pitfalls link ed to the opula densit y estimation ma y sho w up in the pratial implemen tation: Innities at the orners: man y opula densities exhibit innite v alues at their orners. Therefore, to a v oid that ( F n ( X i ) , G n ( Y i )) b e equal to (1 , 1) , w e  hange the empirial distribution funtions F n and G n to n/ ( n + 1) F n and n/ ( n + 1) G n resp etiv ely . 15 Boundary bias: sine the opula densit y is of ompat supp ort [0 , 1] 2 , the k ernel metho d of estimation ma y suer from b oundary bias. T o alleviate this issue, w e suggest to use b oundary-orreted k ernels su h as the b eta k ernels K x,b ( t ) = β x/b +1 , (1 − x ) /b +1 ( t ) , where β a,b ( t ) denotes the p df of a Beta(a,b) dis- tribution, adv o ated b y Chen [ 2 ℄, and used e.g. b y [ 14 ℄ for estimating loss distributions. 
The mo died opula densit y pseudo estimator is th us dened as c n ( u, v ) = n − 1 P n i =1 K u,a n ( U i ) K v,a n ( V i ) . Bandwidth seletion: p erformane of nonparametri estimators dep ends ruially on the bandwidths. F or onditional densit y , bandwidth seletion is a more deliate matter than for densit y estimation due to the m ultidimensional nature of the problem. Moreo v er, for ratio-t yp e estimators, the diult y is inreased b y the lo al dep endene of the bandwidths h y on h x implied b y on- ditioning near x . F or the opula estimator, a supplemen tal issue omes from the fat that the pseudo-data F ( X i ) , G ( Y i ) is not diretly aessible. Insp e- tion of the AMISE of the opula-based estimator suggest w e an separate the bandwidth  hoie of h for ˆ g ( y ) from the bandwidth  hoie of a n the opula densit y estimator ˆ c n . A rationale for a data-dep enden t metho d is to separately selet h on the Y i data alone (e.g. b y ross-v alidation or plug-in), from the a n of the opula densit y c based on the appro ximate data F n ( X i ) , G n ( Y i ) . Ho w- ev er, su h a bandwidth seletion w ould require deep er analysis and w e lea v e a detailed study of a pratial data-dep enden t metho d for bandwidth seletion of the opula-quan tile estimator, together with a global and lo al omparison of the estimators at their resp etiv e optimal bandwidths for further resear h. 4.3.2 Mo del and  omp arison r esults W e sim ulated a sample of n = 100 v ariables ( X i , Y i ) , from the follo wing mo del: X , Y is marginally distributed as N (0 , 1) and link ed via F rank Copula . C ( u, v , θ ) = ln[( θ + θ u + v − θ u − θ v ) / ( θ − 1 )] ln θ with parameter θ = 100 . W e restrited ourselv es to simple, xed for all x, y , rule-of-th um b metho ds based on Normal referene rule to get a rst piture. F or the seletion of a n of the opula densit y estimator, w e applied Sott's Rule on the data F n ( X i ) . 
We used Epanechnikov kernels for $\hat g(y)$ and the other estimators. We plotted the conditional density along with its estimates on the domain $x \in [-5, 5]$ and $y \in [-3, 3]$ in figure 1. A comparison plot at $x = 2$ is shown in figure 2.

Figure 1. 3D plots. From left to right, top to bottom: true density, quantile-copula estimator, double kernel, local polynomial (clipped).

Figure 2. Comparison at $x = 2$ (horizontal axis: $y$): conditional density = thick curve, quantile-copula = continuous line, double kernel = dotted curve, local polynomial = dashed curve.

4.3.3 Clipping and estimation in the tails

As mentioned earlier, since the performance of the estimators depends on the performance of the bandwidth selection method, it is delicate to give a conclusive answer. However, we would like to illustrate at least one case where the proposed estimator clearly outperforms its competitors. Indeed, one major issue of the alternative estimators, already mentioned, is their numerical explosion when the estimated density $\hat f(x)$ is close to zero. In particular, if the kernel is of compact support, the denominator is zero for every $x$ whose distance from the closest $X_i$ exceeds half the bandwidth times the length of the support, thereby allowing estimation only on a closed set close to the sample range $[\min_i X_i, \max_i X_i]$. This is one of the reasons why simulation studies are often performed with a marginal density of $X$ of bounded support and/or with a Gaussian kernel. Note that the problem remains with a Gaussian kernel, since the estimated density can quickly become lower than the machine precision.
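The vanishing denominator described above can be reproduced numerically. The Python sketch below evaluates a kernel estimate of the density of $X$ outside the sample range, once with an Epanechnikov kernel (support $[-1, 1]$, so the estimate is exactly zero beyond distance $h$ from the closest $X_i$) and once with a Gaussian kernel (where the value underflows). The bandwidth `h = 0.4`, the seed and the evaluation points are hypothetical values chosen for illustration, not those of the simulation study.

```python
import numpy as np


def kde(x, data, h, kernel):
    """Kernel density estimate at a single point x with bandwidth h."""
    z = (x - data) / h
    if kernel == "epanechnikov":
        k = np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z ** 2), 0.0)
    else:  # Gaussian kernel
        k = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return float(np.mean(k) / h)


rng = np.random.default_rng(1)  # arbitrary seed
X = rng.standard_normal(100)
h = 0.4                         # hypothetical fixed bandwidth

# Compact support: zero as soon as x is farther than h from every X_i.
f_epa = kde(X.max() + h + 0.1, X, h, "epanechnikov")

# Gaussian kernel: positive in theory, but tiny, then exactly 0.0 in floating point.
f_small = kde(X.max() + 3.0, X, h, "gaussian")
f_gauss = kde(X.max() + 20.0, X, h, "gaussian")

print(f_epa, f_small, f_gauss)
```

Any ratio-type estimator dividing by these values is either undefined or dominated by rounding error at such locations.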
To prevent this numerical explosion, the definition of the conditional density estimators has to be modified either by

$$\hat f(y|x) = \begin{cases} \dfrac{\hat f_{XY}(x, y)}{\hat f_X(x)} & \text{if } \hat f_X(x) > c \\ \hat a(y) & \text{if } \hat f_X(x) = 0 \end{cases}$$

or by

$$\hat f(y|x) = \frac{\hat f_{XY}(x, y)}{\max\{\hat f(x), c\}},$$

where $c > 0$ is an arbitrary amount of clipping and $\hat a(\cdot)$ is an arbitrary density estimator (usually chosen to be zero or $\hat g(y)$). An illustration of these issues clearly appears in figure 1. The unclipped version of the double kernel estimator is unable to estimate the conditional density for $|x|$ roughly larger than 3, and the clipped version of the local polynomial estimator with $c = 0.00001$ and $\hat a(y) = \hat g(y)$ gives a wrong estimate in the tails, reflecting the arbitrary choices made in the clipping decision. By contrast, the quantile-copula estimator is surprisingly able to estimate the conditional density $f(y|x)$ at locations $x$ where there is no data, i.e. in the tails of the distribution of $X$. An explanation of this apparently paradoxical phenomenon comes from the fact that the estimator is partially based on the ranks of the $X_i$ and $Y_i$. Therefore, it can recover hidden information on the density of $X$ from the ordering of the pairs $(X_i, Y_i)$. See Hoff [16] for a detailed explanation. We believe that this feature might be of potential interest for applications, e.g. in statistical inference of extreme values and rare events.

Discussion

The quantile transform and the use of the copula formula have thus turned the conditional density formula (1) of the ratio type into the product one (3). This formula was the backbone of our article, where this product form appeared to be especially appealing for statistical estimation: consistency and limit results were obtained by simple combination of the previously known ones on (unconditional) density estimation.
The estimator obtained shows interesting asymptotic bias and variance properties compared to competitors. Although its finite sample implementation does not yet give a clear and conclusive picture, it already yields some promising results, e.g. for estimation in the tails of $X$, where the proposed estimator does not suffer from clipping issues.

5 Appendix: auxiliary results

In this section, we gather some preliminary results which we will need as basic tools for the proofs of section 3. In subsection 5.1, we recall classical results about the convergence of the Kolmogorov-Smirnov statistic. Next, we give a brief overview of kernel density estimation and apply these results to the estimators $\hat g_n$ (section 5.2) and $c_n$ (section 5.3). Finally, we need two approximation lemmas of $\hat c_n$ by $c_n$, given in sections 5.4 and 5.5.

5.1 Approximation of the pseudo-variables $F(X_i)$ by their estimates $F_n(X_i)$

For $(X_i, i = 1, \ldots, n)$ an i.i.d. sample of a real random variable $X$ with common c.d.f. $F$, the Kolmogorov-Smirnov statistic is defined as $D_n := \|F_n - F\|_\infty$. Glivenko-Cantelli, Kolmogorov and Smirnov, Chung, and Donsker, among others, have studied its convergence properties in increasing generality (see [28] and [36] for recent accounts). For our purpose, we only need to formulate these results in the following rough form:

Lemma 5.1 For an i.i.d. sample from a continuous c.d.f. $F$,

$$\|F_n - F\|_\infty = O_{a.s.}\left( \sqrt{\frac{\ln\ln n}{n}} \right) \tag{9}$$

$$\|F_n - F\|_\infty = O_P\left( \frac{1}{\sqrt{n}} \right). \tag{10}$$

Since $F$ is unknown, the random variables $U_i = F(X_i)$ are not observed. As a consequence of the preceding lemma 5.1, one can naturally approximate these variables by the statistics $F_n(X_i)$. Indeed,

$$|F(X_i) - F_n(X_i)| \le \sup_{x \in \mathbb{R}} |F(x) - F_n(x)| = \|F_n - F\|_\infty \quad a.s.$$

Thus, $|F(X_i) - F_n(X_i)|$ is no more than an $O_{a.s.}((\ln\ln n / n)^{1/2})$ or an $O_P(n^{-1/2})$.
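The bound above can be checked on simulated data. The following Python sketch computes the Kolmogorov-Smirnov statistic $D_n$ for a standard normal sample and verifies that every deviation $|F(X_i) - F_n(X_i)|$ is dominated by it. The sample size, the seed and the helper name `normal_cdf` are assumptions made for the example only.

```python
import math

import numpy as np


def normal_cdf(x):
    """Standard normal c.d.f. F, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


rng = np.random.default_rng(2)  # arbitrary seed
n = 1000
x_sorted = np.sort(rng.standard_normal(n))
F = np.array([normal_cdf(t) for t in x_sorted])  # unobserved pseudo-variables F(X_(i))
i = np.arange(1, n + 1)                          # F_n(X_(i)) = i / n at the order statistics

# The supremum of |F_n - F| is attained at a jump of F_n, from the right or from the left.
D_n = max(np.max(i / n - F), np.max(F - (i - 1) / n))

# Largest error made when F(X_i) is replaced by F_n(X_i).
max_dev = np.max(np.abs(F - i / n))

print(D_n, max_dev, math.sqrt(n) * D_n)
```

The last printed value, $\sqrt{n}\, D_n$, stays of order one when the script is rerun with larger `n`, in line with the $O_P(n^{-1/2})$ rate (10).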
These rates of approximation appear to be faster than those of statistical estimation of densities, as is shown in the next subsection.

5.2 Convergence of the kernel density estimator $\hat g_n$

We recall below some classical results about the convergence of the Parzen-Rosenblatt kernel nonparametric estimator $\hat f_n$ of a $d$-variate density. Since its inception by Rosenblatt [25] and Parzen [22], it has been studied by a great many authors. See e.g. Scott [27], Prakasa Rao [23], Nadaraya [21] for details. See also Bosq [1] chapter 2.

It is well known that the bias of the kernel density estimator depends on the degree of smoothness of the underlying density, measured by its number of derivatives or its Lipschitz order. In order to get the convergence of the bias to zero, it suffices to assume that the density is continuous (see [22]). To get further information on the rate of convergence of the estimator, it is necessary to make further assumptions. Moreover, for kernel functions with unbounded support, the rate of convergence also depends on the tail behavior of the kernel (see Stute [31]). Therefore, for clarity of exposition and simplicity of notation, we will make the customary assumptions that the density is twice differentiable and that the kernel is of bounded support. We then have the following results:

• Bias: With the previous assumptions, for $x$ in the interior of $\mathrm{supp}(f)$, $h_n \to 0$ and $n h_n^d \to \infty$ entail that

$$E \hat f_n(x) = f(x) + \frac{h_n^2}{2} \int_{\mathbb{R}^d} \sum_{1 \le i,j \le d} \frac{\partial^2 f(x)}{\partial x_i \partial x_j}\, z_i z_j K(z)\, dz + o(h_n^2).$$

With the multivariate kernel $K$ taken as a product of $d$ order one kernels $K_i$, the above sum reduces to the diagonal terms,

$$E \hat f_n(x) = f(x) + \frac{h_n^2}{2} \sum_{1 \le i \le d} m_2(K_i)\, \frac{\partial^2 f(x)}{\partial x_i^2} + o(h_n^2),$$

with $m_2(K_i) := \int z^2 K_i(z)\, dz$ the second moment of $K_i$.

• Variance: with the same assumptions,

$$\mathrm{Var}\left[ \hat f_n(x) \right] = \frac{f(x)}{n h_n^d} \|K\|_2^2 + o\left( \frac{1}{n h_n^d} \right).$$
• Pointwise asymptotic normality: under the previous conditions,

$$\sqrt{n h_n^d} \left( \hat f_n(x) - E \hat f_n(x) \right) \stackrel{d}{\rightsquigarrow} N\left(0, f(x) \|K\|_2^2\right).$$

For a choice of the bandwidth as $h_n \simeq n^{-1/(d+4)}$, which realizes the optimal trade-off between the bias and the variance, one gets the rate $n^{-2/(d+4)}$, which is the optimal speed of convergence in the minimax sense in the class of density functions with bounded second derivatives, according to [30].

• Pointwise almost sure convergence: if moreover $n h_n^d / (\ln\ln n) \to \infty$ (see [3]), we have that

$$\hat f_n(x) - E \hat f_n(x) = O_{a.s.}\left( \sqrt{\frac{\ln\ln n}{n h_n^d}} \right).$$

For a choice of the bandwidth as $h_n \simeq ((\ln\ln n)/n)^{1/(d+4)}$, we get the rate of convergence $((\ln\ln n)/n)^{2/(d+4)}$:

$$\hat f_n(x) - f(x) = O_{a.s.}\left( \left( \frac{\ln\ln n}{n} \right)^{2/(d+4)} \right).$$

Applied to our case ($d = 1$), we can summarize these results for further reference in the following lemma for the estimator $\hat g_n$ of the density $g$ of $Y$:

Lemma 5.2 With the previous assumptions, for a point $y$ in the interior of the support of $g$, and a bandwidth chosen such that $h_n \simeq n^{-1/5}$, we have

$$|\hat g_n(y) - g(y)| = O_P(n^{-2/5})$$

$$n^{2/5} \left[ \hat g_n(y) - g(y) \right] \stackrel{d}{\rightsquigarrow} N\left(0, g(y) \|K_0\|_2^2\right).$$

With the same assumptions, but for a bandwidth choice of $h_n \simeq (\ln\ln n / n)^{1/5}$,

$$\hat g_n(y) - g(y) = O_{a.s.}\left( \left( \frac{\ln\ln n}{n} \right)^{2/5} \right). \tag{11}$$

5.3 Convergence of $c_n(u, v)$

As mentioned before, the assumptions that $F$ and $G$ be differentiable and strictly increasing entail that $c$ is the density of the transformed variables $(U, V) := (F(X), G(Y))$.
Therefore, one one on vines oneself that c n ( u, v ) is simply the k ernel densit y estimator of the biv ariate densit y c ( u, v ) of the pseudo-v ariables ( U, V ) , one diretly dra ws its on v ergene prop erties b y ap- plying the results of the preeding subsetion with d = 2 : Lemma 5.3 F or a hoi e of a n ≃ n − 1 / 6 , for every ( u, v ) ∈ (0 , 1) 2 , similar r esults of those of lemma 5.2 hold for ˆ c n with a r ate of  onver gen e of n − 1 / 3 and (ln ln n/n ) 1 / 3 r esp e tively. 21 5.4 A n appr oximation lemma of ˆ c n ( u, v ) by c n ( u, v ) The lemma of this setion giv es the rate of appro ximation of the k ernel opula densit y estimator ˆ c n ( u, v ) omputed on the real data ( F n ( X i ) , G n ( Y i )) b y its analogue c n ( u, v ) omputed on the pseudo-data ( U i , V i ) := ( F ( X i ) , G ( Y i )) . A similar result, but with a dieren t pro of, has b een obtained in F ermanian [ 9 ℄ theorem 1. Lemma 5.4 L et ( u, v ) ∈ (0 , 1) 2 . If the kernel K ( u, v ) = K 1 ( u ) K 2 ( v ) is twi e dier entiable with b ounde d se  ond derivatives, then | ˆ c n ( u, v ) − c n ( u, v ) | = o P ( a 2 n + 1 / q na 2 n ) | ˆ c n ( u, v ) − c n ( u, v ) | = o a.s. s ln ln n na 2 n ! Pro of. W e note || . || a norm for v etors. Set ∆ := ˆ c n ( u, v ) − c n ( u, v ) = 1 na 2 n n P i =1 ∆ i,n ( u, v ) with ∆ i,n ( u, v ) := K u − F n ( X i ) a n , v − G n ( Y i ) a n ! − K u − F ( X i ) a n , v − G ( Y i ) a n ! and dene Z i,n :=    F ( X i ) − F n ( X i ) G ( Y i ) − G n ( Y i )    . As men tioned in setion 5.1 , | F n ( X i ) − F ( X i ) | ≤ || F n − F || ∞ and | G n ( Y i ) − G ( Y i ) | ≤ || G n − G || ∞ a.s. for ev ery i = 1 , . . . , n . Lemma 5.1 th us en tails that the norm of Z i,n is indep enden t of i and su h that || Z i,n || = O P (1 / √ n ) , i = 1 , . . . , n (12) || Z i,n || = O a.s. ( q ln ln n/n ) , i = 1 , . . . 
, n (13) No w, for ev ery xed ( u, v ) ∈ [0 , 1] 2 , sine the k ernel K is t wie dieren tiable, there exists, b y T a ylor expansion, random v ariables ˜ U i,n and ˜ V i,n su h that, almost surely , ∆ = 1 na 3 n n X i =1 Z T i,n ∇ K u − F ( X i ) a n , v − G ( Y i ) a n ! + 1 2 na 4 n n X i =1 Z T i,n ∇ 2 K u − ˜ U i,n a n , v − ˜ V i,n a n ! Z i,n := ∆ 1 + ∆ 2 22 where Z T i,n denotes the transp ose of the v etor Z i,n and ∇ K and ∇ 2 K the gradien t and the Hessian resp etiv ely of the m ultiv ariate k ernel funtion K ∇ K =    ∂ K ∂ u ∂ K ∂ v    , ∇ 2 K =    ∂ 2 K ∂ u 2 ∂ 2 K ∂ u∂ v ∂ 2 K ∂ u∂ v ∂ 2 K ∂ v 2    Ne gligibility of ∆ 2 : By the b oundedness assumption on the seond-order deriv a- tiv es of the k ernel, and equations 12 and 13 , ∆ 2 = O P 1 na 4 n ! and ∆ 2 = O a.s. ln ln n na 4 n ! . Ne gligibility of ∆ 1 : By en tering at exp etations, ∆ 1 = 1 na 3 n n X i =1 Z T i,n ∇ K u − F ( X i ) a n , . . . ! − E ∇ K u − F ( X i ) a n , . . . !! + 1 na 3 n n X i =1 Z T i,n E ∇ K u − F ( X i ) a n , v − G ( Y i ) a n ! := ∆ 11 + ∆ 12 Ne gligibility of ∆ 12 : Bias results on the biv ariate gradien t k ernel estimator (See Sott [ 27 ℄  hapter 6) en tail that E ∇ K u − F ( X i ) a n , v − G ( Y i ) a n ! = a 3 n ∇ c ( u, v ) + O ( a 5 n ) Cau h y-S h w arz inequalit y yields that | ∆ 12 | ≤ n || Z i,n || na 3 n      E ∇ K u − F ( X i ) a n , v − G ( Y i ) a n !      In turn, with equations 12 and 13 , ∆ 12 = O P (1 / √ n ) and ∆ 12 = O a.s ( q ln ln n/n ) . Ne gligibility of ∆ 11 : Set A i = ∇ K  u − F ( X i ) a n , . . .  − E ∇ K  u − F ( X i ) a n , . . .  . Then, | ∆ 11 | ≤ || Z n || na 3 n n X i =1 || A i || Boundedness assumption on the deriv ativ e of the k ernel imply that || A i || ≤ 2 C a.s. W e apply Ho eding inequalit y for indep enden t, en tered, b ounded b y M , but non iden tially distributed random v ariables ( η j ) (e.g. see [ 1 ℄), P ( n X j =1 η j > t ) ≤ exp − t 2 2 nM 2 ! . 
(14) 23 Here, for ev ery ǫ > 0 , with M = 2 C , η i = || A i || − E || A i || , t = ǫn 1 / 2 (ln ln n ) 1 / 2 , w e get that P  X n i =1 ( || A i || − E || A i || ) > ǫ √ n ln ln n  6 exp − ǫ 2 ln ln n 4 M 2 ! = 1 (ln n ) δ with a δ > 0 and where the r.h.s. go es to zero as n → ∞ . Therefore, P n i =1 ( || A i || − E || A i || ) = O P ( √ n ln ln n ) . F or the almost sure negligibilit y , w e get similarly b y inequalit y 14 that, for ev ery ǫ > 0 , with t = ǫn (1+ δ ) / 2 and δ > 0 , P  X n i =1 ( || A i || − E || A i || ) > ǫn (1+ δ ) / 2  6 exp − ǫ 2 n δ 4 M 2 ! and the series on the r.h.s is on v ergen t. In turn, the Borell-Can telli lemma imply that P n i =1 ( || A i || − E || A i || ) = O a.s. ( n (1+ δ ) / 2 ) . It remains to ev aluate E | | A i || . First, w e ha v e that E | | A i || ≤ 2 E ||∇ K (( u − F ( X i )) /a n , . . . ) || . Seond, sine K is dieren tiable and of pro dut form K ( u, v ) = K 1 ( u ) K 2 ( v ) , ea h sub-k ernel is of b ounded v ariations and an b e written as a dierene of t w o monotone inreasing funtions. F or example, set K 1 = K a 1 − K b 1 and dene K ∗ := ( K a 1 + K b 1 ) K 2 . W e ha v e,      ∂ K ∂ u      6  | ( K a 1 ) ′ | + | ( K b 1 ) ′ |  K 2 = (( K a 1 ) ′ + ( K b 1 ) ′ ) K 2 := ∂ K ∗ ∂ u where the equalit y pro eeds from the p ositivit y of the deriv ativ es. As a on- sequene, E      ∂ K ∂ u (( u − F ( X i )) /a n , . . . )      ≤ E ∂ K ∗ ∂ u (( u − F ( X i )) /a n , . . . ) and similarly for the other partial deriv ativ e. The r.h.s. of the previous inequal- it y is, after an in tegration b y parts, of order a 3 n b y the results on the k ernel estimator of the gradien t of the densit y (See Sott [ 27 ℄  hapter 6). Therefore, P n i =1 E | | A i || = O ( na 3 n ) . Reolleting all elemen ts, w e ev en tually obtain that ∆ 11 = O P √ n ln ln n + na 3 n √ nna 3 n ! = O P √ ln ln n na 3 n + 1 √ n ! = o P   1 q na 2 n   . ∆ 11 = O a.s. 
  n (1+ δ ) / 2 + na 3 n na 3 n s ln ln n n   = O a.s.   s ln ln n na 2 n 1 n (1 − δ ) / 2 a 2 n + s ln ln n n   = o a.s. s ln ln n na 2 n ! 24 for δ small enough ( < 1 / 3 for a n ≃ n − 1 / 6 ). ✷ 5.5 A n appr oximation lemma for ˆ c n ( F n ( x ) , G n ( y )) by ˆ c n ( F ( x ) , G ( y )) The lemma of this subsetion giv es the rate of deviation of the k ernel opula densit y estimator ˆ c n from a v arying lo ation ( F n ( x ) , G n ( y )) to a xed lo ation ( F ( x ) , G ( y )) . Lemma 5.5 With the same assumptions as in the pr e  e ding lemma, we have ˆ c n ( F n ( x ) , G n ( y )) − ˆ c n ( F ( x ) , G ( y )) = o P   a 2 n + 1 q na 2 n   ˆ c n ( F n ( x ) , G n ( y )) − ˆ c n ( F ( x ) , G ( y )) = O a.s.   s ln ln n n   Pro of. W e pro eed similarly as in the preeding lemma. Set ∆ n ( x, y ) := ˆ c n ( F n ( x ) , G n ( y )) − ˆ c n ( F ( x ) , G ( y )) = 1 na 2 n n X i =1 ∆ ′ i,n ( x, y ) (15) with ∆ ′ i,n ( x, y ) := K F n ( x ) − F n ( X i ) a n , G n ( y ) − G n ( Y i ) a n ! − K F ( x ) − F n ( X i ) a n , G ( y ) − G n ( Y i ) a n ! and dene Z n ( x, y ) :=    F n ( x ) − F ( x ) G n ( y ) − G ( y )    W e rst express ∆ ′ i,n ( x, y ) at a xed lo ation ( F ( x ) , G ( y )) b y a T a ylor expan- sion and b y b ounding uniformly the seond order terms, ∆ ′ i,n ( x, y ) = Z T n ( x, y ) ∇ K a n F ( x ) − F n ( X i ) a n , G ( y ) − G n ( Y i ) a n ! + || Z n || 2 ∞ a 2 n R 1 (16) where R 1 is uniformly b ounded almost surely: R 1 = O a.s. (1) . W e then go from the data ( F n ( X i ) , G n ( Y i )) to the pseudo but xed w.r.t. n data ( F ( X i ) , G ( Y i )) . 25 By a seond T a ylor expansion, ∇ K a n F ( x ) − F n ( X i ) a n , G ( y ) − G n ( Y i ) a n ! = ∇ K a n F ( x ) − F ( X i ) a n , G ( y ) − G ( Y i ) a n ! + Z T i,n ∇ 2 K 2 a 2 n F ( x ) − F ( X i ) a n , G ( y ) − G ( Y i ) a n ! + || Z n || ∞ a 2 n R 2 . (17) where R 2 = o a.s. (1) uniformly in i , x and y . 
Therefore, plugging (16) and (17) into (15), we get

$$\Delta_n(x, y) = \frac{Z_n^T(x, y)}{n a_n^2} \sum_{i=1}^{n} \frac{\nabla K}{a_n}\left( \frac{F(x) - F(X_i)}{a_n}, \frac{G(y) - G(Y_i)}{a_n} \right) + \frac{Z_n^T(x, y)}{n a_n^2} \sum_{i=1}^{n} Z_{i,n}^T\, \frac{\nabla^2 K}{2 a_n^2}\left( \frac{F(x) - F(X_i)}{a_n}, \frac{G(y) - G(Y_i)}{a_n} \right) + R_3\, \frac{\|Z_n\|_\infty^2}{a_n^4},$$

with the remainder term $R_3 = O_{a.s.}(1)$ uniformly. As before, the properties of the kernel estimator of the derivative of the density (see Scott [27] chapter 6) entail that

$$\frac{1}{n a_n^3} \sum_{i=1}^{n} \nabla K\left( \frac{F(x) - F(X_i)}{a_n}, \frac{G(y) - G(Y_i)}{a_n} \right) = O_P\left( a_n^2 + \frac{1}{\sqrt{n a_n^4}} \right).$$

Therefore, using (12) and bounding uniformly the Hessian, (15) becomes

$$\Delta_n(x, y) = O_P\left( a_n^2 \|Z_n\|_\infty + \frac{\|Z_n\|_\infty}{\sqrt{n a_n^4}} \right) + O_P\left( \frac{\|Z_n\|_\infty^2}{a_n^4} \right) = o_P\left( a_n^2 + \frac{1}{\sqrt{n a_n^2}} \right).$$

Similarly, one gets with (13) and the strong consistency of the estimator of the gradient of the density that

$$\Delta_n(x, y) = O_{a.s.}\left( \sqrt{\frac{\ln\ln n}{n}} \right). \qquad ∎$$

References

[1] D. Bosq. Nonparametric statistics for stochastic processes, volume 110 of Lecture Notes in Statistics. Springer-Verlag, New York, second edition, 1998. Estimation and prediction.
[2] S. X. Chen. Beta kernel estimators for density functions. Comput. Statist. Data Anal., 31(2):131-145, 1999.
[3] P. Deheuvels. Conditions nécessaires et suffisantes de convergence ponctuelle presque sûre et uniforme presque sûre des estimateurs de la densité. C. R. Acad. Sci. Paris Sér. A, 278:1217-1220, 1974.
[4] P. Deheuvels. La fonction de dépendance empirique et ses propriétés. Un test non paramétrique d'indépendance. Acad. Roy. Belg. Bull. Cl. Sci. (5), 65(6):274-292, 1979.
[5] P. Deheuvels. A Kolmogorov-Smirnov type test for independence and multivariate samples. Rev. Roumaine Math. Pures Appl., 26(2):213-226, 1981.
[6] L. Devroye and G. Lugosi. Combinatorial methods in density estimation. Springer Series in Statistics. Springer-Verlag, New York, 2001.
[7] J. Fan and Q. Yao. Nonlinear time series. Springer Series in Statistics. Springer-Verlag, New York, second edition, 2005. Nonparametric and parametric methods.
[8] J. Fan, Q. Yao, and H. Tong. Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika, 83(1):189-206, 1996.
[9] J.-D. Fermanian. Goodness-of-fit tests for copulas. J. Multivariate Anal., 95(1):119-152, 2005.
[10] J.-D. Fermanian and O. Scaillet. Nonparametric estimation of copulas for time series. Journal of Risk, 5(4):25-54, 2003.
[11] J.-D. Fermanian, D. Radulović, and M. Wegkamp. Weak convergence of empirical copula processes. Bernoulli, 10(5):847-860, 2004.
[12] T. Gasser and H.-G. Müller. Kernel estimation of regression functions. In Smoothing techniques for curve estimation (Proc. Workshop, Heidelberg, 1979), volume 757 of Lecture Notes in Math., pages 23-68. Springer, Berlin, 1979.
[13] I. Gijbels and J. Mielniczuk. Estimating the density of a copula function. Comm. Statist. Theory Methods, 19(2):445-464, 1990.
[14] J. Gustafsson, M. Hagmann, J. P. Nielsen, and O. Scaillet. Local transformation kernel density estimation of loss distributions. Forthcoming in Journal of Business and Economic Statistics, 2007.
[15] L. Györfi and M. Kohler. Nonparametric estimation of conditional distributions. IEEE Trans. Inform. Theory, 53(5):1872-1879, 2007.
[16] P. D. Hoff. Extending the rank likelihood for semiparametric copula estimation. Annals Appl. Stats., 1(1):265-283, 2007.
[17] R. J. Hyndman, D. M. Bashtannyk, and G. K. Grunwald. Estimating and visualizing conditional densities. J. Comput. Graph. Statist., 5(4):315-336, 1996.
[18] R. J. Hyndman and Q. Yao. Nonparametric estimation and symmetry tests for conditional density functions. J. Nonparametr. Stat., 14(3):259-278, 2002.
[19] H. Joe. Multivariate models and dependence concepts, volume 73 of Monographs on Statistics and Applied Probability. Chapman & Hall, London, 1997.
[20] C. Lacour. Adaptive estimation of the transition density of a Markov chain. Ann. Inst. H. Poincaré Probab. Statist., 43(5):571-597, 2007.
[21] È. A. Nadaraya. Nonparametric estimation of probability densities and regression curves, volume 20 of Mathematics and its Applications (Soviet Series). Kluwer Academic Publishers Group, Dordrecht, 1989. Translated from the Russian by Samuel Kotz.
[22] E. Parzen. On estimation of a probability density function and mode. Ann. Math. Statist., 33:1065-1076, 1962.
[23] B. L. S. Prakasa Rao. Nonparametric functional estimation. Probability and Mathematical Statistics. Academic Press Inc. [Harcourt Brace Jovanovich Publishers], New York, 1983.
[24] M. B. Priestley and M. T. Chao. Non-parametric function fitting. J. Roy. Statist. Soc. Ser. B, 34:385-392, 1972.
[25] M. Rosenblatt. Remarks on some nonparametric estimates of a density function. Ann. Math. Statist., 27:832-837, 1956.
[26] M. Rosenblatt. Conditional probability density and regression estimators. In Multivariate Analysis, II (Proc. Second Internat. Sympos., Dayton, Ohio, 1968), pages 25-31. Academic Press, New York, 1969.
[27] D. W. Scott. Multivariate density estimation. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. John Wiley & Sons Inc., New York, 1992. Theory, practice, and visualization, A Wiley-Interscience Publication.
[28] G. R. Shorack and J. A. Wellner. Empirical processes with applications to statistics. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, 1986.
[29] M. Sklar. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris, 8:229-231, 1959.
[30] C. J. Stone. Optimal rates of convergence for nonparametric estimators. Ann. Statist., 8(6):1348-1360, 1980.
[31] W. Stute. A law of the logarithm for kernel density estimators. Ann. Probab., 10(2):414-422, 1982.
[32] W. Stute. Asymptotic normality of nearest neighbor regression function estimates. Ann. Statist., 12(3):917-926, 1984.
[33] W. Stute. Conditional empirical processes. Ann. Statist., 14(2):638-647, 1986.
[34] W. Stute. On almost sure convergence of conditional empirical distribution functions. Ann. Probab., 14(3):891-901, 1986.
[35] A. W. van der Vaart. Asymptotic statistics, volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.
[36] A. W. van der Vaart and J. A. Wellner. Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York, 1996. With applications to statistics.
