A quantile-copula approach to conditional density estimation

A quan tile-opula approa h to onditional densit y estimation. Olivier P . F augeras L.S.T.A, Université Paris 6 175, rue du Chevaler et, 75013 Paris, F r an e T el:+(33) 1 44 27 85 62 F ax:+(33) 1 44 27 33 42 Abstrat W e presen t a new non-parametri estimator of the onditional densit y of the k ernel t yp e. It is based on an eien t transformation of the data b y quan tile transform. By use of the opula represen tation, it turns out to ha v e a remark able pro dut form. W e study its asymptoti prop erties and ompare its bias and v ariane to omp etitors based on nonparametri regression. A omparativ e n umerial sim ulation is pro vided. Key wor ds: onditional densit y, k ernel estimation, opula, quan tile transform, nonparametri regression, 1991 MSC: 62G007, 62M20, 62M10 1 In tro dution 1.1 Motivation Let (( X i , Y i ); i = 1 , . . . , n ) b e an indep enden t iden tially distributed sample from real-v alued random v ariables ( X , Y ) sitting on a giv en probabilit y spae. F or prediting the resp onse Y of the input v ariable X at a giv en lo ation x , it is of great in terest of estimating not only the onditional mean or r e gr ession funtion E ( Y | X = x ) , but the full  onditional density f ( y | x ) . Indeed, estimat- ing the onditional densit y is m u h more informativ e, sine it allo ws not only to realulate the onditional exp eted v alue E ( Y | X ) and onditional v ariane Email addr ess: olivier.faugerasgmail.om (Olivier P . F augeras ). Preprin t submitted to Elsevier No v em b er 1, 2018 from the densit y , but also to pro vide the general shap e of the onditional den- sit y . This is esp eially imp ortan t for m ulti-mo dal or sk ew ed densities, whi h often arise from nonlinear or non-Gaussian phenomenas, where the exp eted v alue migh t b e no where near a mo de, i.e. the most lik ely v alue to app ear. 
Moreover, for situations in which confidence intervals are preferred to point estimates, the estimated conditional density is an object of obvious interest.

1.2 Estimation by kernel smoothing

A natural approach to estimate the conditional density $f(y|x)$ of $Y$ given $X = x$ would be to exploit the identity
\[ f(y|x) = \frac{f_{XY}(x, y)}{f_X(x)} \tag{1} \]
where $f_{XY}$ and $f_X$ denote the joint density of $(X, Y)$ and the density of $X$, respectively. By introducing Parzen-Rosenblatt kernel estimators of these densities, namely
\[ \hat f_{n,XY}(x, y) := \frac{1}{n} \sum_{i=1}^{n} K'_{h'}(X_i - x)\, K_h(Y_i - y), \qquad \hat f_{n,X}(x) := \frac{1}{n} \sum_{i=1}^{n} K'_{h'}(X_i - x), \]
where $K_h(\cdot) = (1/h) K(\cdot/h)$ and $K'_{h'}(\cdot) = (1/h') K'(\cdot/h')$ are (rescaled) kernels with their associated sequences of bandwidths $h = h_n$ and $h' = h'_n$ going to zero as $n \to \infty$, one can construct the quotient
\[ \hat f^{R}_n(y|x) := \frac{\hat f_{n,XY}(x, y)}{\hat f_{n,X}(x)} \]
and obtain an estimator of the conditional density. Such an estimator was first studied by Rosenblatt [26], and more recently by Hyndman et al. [17], who slightly improved on Rosenblatt's kernel based estimator.

1.3 Estimation by regression techniques

As pointed out by numerous authors, see e.g. Fan and Yao [7] chapter 6, this approach is equivalent to the one arising from considering this conditional density estimation problem in a regression framework. Indeed, let $F(y|x)$ be the cumulative conditional distribution function of $Y$ given $X = x$. It stems from the fact that
\[ E\left( 1_{|Y - y| \le h} \mid X = x \right) = F(y + h \mid x) - F(y - h \mid x) \approx 2h\, f(y|x) \]
as $h \to 0$, that, if one replaces the expectation in the above expression by its empirical counterpart, one can apply the usual local averaging methods and perform a regression estimation on the synthetic data $\left( (1/2h)\, 1_{|Y_i - y| \le h};\ i = 1, \ldots, n \right)$.
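The local-averaging idea just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `epanechnikov` and `local_average_density` are hypothetical, the Epanechnikov kernel is an arbitrary choice for $K'$, and NumPy is assumed to be available. The code averages the synthetic responses $(1/2h)\, 1_{|Y_i - y| \le h}$ with kernel weights in $x$, which coincides with the quotient estimator of Section 1.2 when $K$ is the uniform kernel on $[-1, 1]$.

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel, supported on [-1, 1] (an illustrative choice for K')."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

def local_average_density(x, y, xs, ys, h, h_prime):
    """Kernel-weighted average of the synthetic data 1{|Y_i - y| <= h} / (2h) around x."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    synthetic = (np.abs(ys - y) <= h) / (2.0 * h)        # synthetic responses
    weights = epanechnikov((xs - x) / h_prime) / h_prime  # K'_{h'}(X_i - x)
    denom = weights.sum()
    if denom == 0.0:
        return float("nan")  # no X_i within h_prime of x: the quotient is undefined
    return float((weights * synthetic).sum() / denom)

# Usage on simulated independent standard normal data (so f(y|x) is the N(0,1) density).
rng = np.random.default_rng(0)
xs_demo = rng.standard_normal(20000)
ys_demo = rng.standard_normal(20000)
estimate_at_origin = local_average_density(0.0, 0.0, xs_demo, ys_demo, h=0.3, h_prime=0.3)
```

The explicit `nan` branch makes visible the zero-denominator case that any quotient-type estimator has to handle.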
By a Bochner type theorem, one can even replace the transformed data by its smoothed version
\[ Y'_i := K_h(Y_i - y) := \frac{1}{h} K\left( \frac{Y_i - y}{h} \right). \]
In particular, the popular Nadaraya-Watson regression estimator
\[ \hat f^{NW}_n(y|x) := \frac{\sum_{i=1}^{n} Y'_i\, K'_{h'}(X_i - x)}{\sum_{i=1}^{n} K'_{h'}(X_i - x)} \]
reduces to the same estimator of the conditional density of the double kernel type as before,
\[ \hat f^{NW}_n(y|x) = \frac{\sum_{i=1}^{n} K_h(Y_i - y)\, K'_{h'}(X_i - x)}{\sum_{i=1}^{n} K'_{h'}(X_i - x)} = \hat f^{R}_n(y|x). \]

Taking advantage of this regression formulation, Fan, Yao and Tong [8] proposed a conditional density estimator which generalizes the kernel one by use of local polynomial techniques. In particular, it makes it possible to tackle the bias issues of kernel smoothing. However, and unlike the former, it is no longer guaranteed to take positive values nor to integrate to 1 with respect to $y$. With these issues in mind, Hyndman and Yao [18] built on local polynomial techniques and suggested two improved methods, the first one based on locally fitting a log-linear model and the second one on constrained local polynomial modeling. An overview can be found in Fan and Yao [7] (chapters 6 and 10). Very recently, Györfi and Kohler [15] studied a partitioning type estimate and its properties in total variation norm, and Lacour [20] a projection-type estimate for Markov chains.

1.4 A product shaped estimator

However, these two equivalent approaches suffer from several drawbacks: first, by its form as a quotient of two estimators, the probabilistic behavior of the Nadaraya-Watson estimator (or its local polynomial counterpart) is tricky to study. It is usually dealt with by centering both numerator and denominator at their expectations and linearizing the inverse, see e.g. [7], or [1] for details.
Seond, at a oneptual lev el, one ould argue that implemen ting regression estimation te hniques in this setting is, in a sense, unnatural: estimating a densit y , ev en if it is a onditional one, should resort to densit y estimation te hniques only . Finally , pratial implemen tations of these estimators an lead to n umerial instabilit y when the denominator is lose to zero. 3 T o remedy these problems, w e prop ose an estimator whi h builds on the idea of using syn theti data, i.e. a represen tation of the data more adapted to the problem than the original one. By transforming the data b y quan tile trans- forms and making use of the opula funtion, the estimator turns out to ha v e a remark able pr o dut form ˆ f n ( y | x ) = ˆ f Y ( y ) ˆ c n ( F n ( x ) , G n ( y )) where ˆ f Y , ˆ c n , F n ( x ) , G n ( y ) are estimators of the densit y f Y of Y , the opula densit y c , the .d.f. F of X and G of Y resp etiv ely (see next setion b elo w for denitions). Its study then rev eals to b e partiularly simple: it redues to the ones already done on nonparametri densit y estimation. The rest of the pap er is organized as follo ws: in setion 2, w e in tro due the quan tile transform and the opula represen tation whi h leads to the denition of our estimator. In setion 3, the main asymptoti results are established and ompared in setion 4 to those of other omp etitors. Pro ofs are mainly based on a series of auxiliary lemmas whi h are giv en in setion 5. 2 Presen tation of the estimator F or sak e of simpliit y and larit y of exp osition, w e limit ourselv es to unidi- mensional real v alued input v ariables X . Ho w ev er, all the results of this artile an b e easily extended to the m ultiv ariate ase. 2.1 The quantile tr ansform The idea of transforming the data is not new. It has b een used to impro v e the range of appliabilit y and p erformane of lassial estimation te hniques, e.g. 
to deal with skewed data, heavy tails, or restrictions on the support (see e.g. Devroye and Lugosi [6] chapter 14 and the references therein, and also Van der Vaart [35] chapter 3.2 for the related topic of variance stabilizing transformations in a parametric context). In order to make inference on $Y$ from $X$, a natural question which then arises is what the best transformation is, if this question makes sense. As one can note from the above references, the best transformation is closely linked to the distribution of the underlying data. We will see below that, for our problem, the natural candidate is the quantile transform.

The quantile transform is a well-known probabilistic trick which is used to reduce proofs, e.g. in empirical process theory, for arbitrary real valued random variables $X$ to ones for random variables $U$ uniformly distributed on the interval $[0, 1]$. It is based on the following well-known fact: whenever $F$ is continuous, the random variable $U = F(X)$ is uniformly distributed on $(0, 1)$ and conversely, when $F$ is arbitrary, if $U$ is a uniformly distributed random variable on $(0, 1)$, $X$ is equal in law to $F^{-1}(U)$, where $F^{-1} = Q$ is the generalized inverse or quantile function of $X$ (see e.g. [28], chapter 1). As a consequence, given a sample $(X_1, \ldots, X_n)$ of random variables with common continuous c.d.f. $F$ sitting on a probability space $(\Omega, \mathcal{A}, P)$, one can always enlarge this probability space to carry a sequence $(U_1, \ldots, U_n)$ of uniform $(0, 1)$ random variables such that $U_i = F(X_i)$, that is to say to construct a pseudo-sample with a prescribed uniform marginal distribution.

2.2 The copula representation

Formally, a copula is a bi-(or multi)variate distribution function whose marginal distribution functions are uniform on the interval $[0, 1]$.
Indeed, Sklar [29] proved the following fundamental result:

Theorem 2.1 For any bivariate cumulative distribution function $F_{X,Y}$ on $\mathbb{R}^2$, with marginal cumulative distribution functions $F$ of $X$ and $G$ of $Y$, there exists some function $C : [0, 1]^2 \to [0, 1]$, called the dependence or copula function, such that
\[ F_{X,Y}(x, y) = C(F(x), G(y)), \qquad -\infty \le x, y \le +\infty. \tag{2} \]
If $F$ and $G$ are continuous, this representation is unique with respect to $(F, G)$. The copula function $C$ is itself a cumulative distribution function on $[0, 1]^2$ with uniform marginals.

This theorem gives a representation of the bivariate c.d.f. as a function of each univariate c.d.f. In other words, the copula function captures the dependence structure among the components $X$ and $Y$ of the vector $(X, Y)$, irrespective of the marginal distributions $F$ and $G$. Simply put, it allows one to deal with the randomness of the dependence structure and the randomness of the marginals separately.

Copulas appear to be naturally linked with the quantile transform, as formula (2) entails that $C(u, v) = F_{X,Y}(F^{-1}(u), G^{-1}(v))$. For more details regarding copulas and their properties, one can consult for example the book of Joe [19]. Copulas have witnessed a renewed interest in statistics, especially in finance, since the pioneering work of Deheuvels [4], who introduced the empirical copula process. Weak convergence of the empirical copula process was investigated by Deheuvels [5], Van der Vaart and Wellner [36], and Fermanian, Radulović and Wegkamp [11]. For the estimation of the copula density, refer to Gijbels and Mielniczuk [13], Fermanian [9] and Fermanian and Scaillet [10].
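As a small illustration of the quantile transform of Section 2.1, the following sketch checks numerically that $U = F(X)$ behaves like a uniform sample and that $F^{-1}(U)$ recovers $X$. The standard exponential distribution, the helper names `exp_cdf` and `exp_quantile`, and the sample size are assumptions chosen only for this example; NumPy is assumed to be available.

```python
import numpy as np

def exp_cdf(x):
    """C.d.f. F of the standard exponential law (continuous, strictly increasing on x > 0)."""
    return -np.expm1(-np.asarray(x, dtype=float))   # 1 - exp(-x), computed stably

def exp_quantile(u):
    """Quantile function Q = F^{-1} of the standard exponential law."""
    return -np.log1p(-np.asarray(u, dtype=float))   # -log(1 - u)

rng = np.random.default_rng(0)
x = rng.exponential(size=10_000)

u = exp_cdf(x)            # pseudo-sample U_i = F(X_i), uniform on (0, 1)
x_back = exp_quantile(u)  # F^{-1}(U_i) gives back X_i

# Distance between the sorted pseudo-sample and the uniform grid i/n
# (a Kolmogorov-type discrepancy, small when U is close to uniform).
n = u.size
discrepancy = np.max(np.abs(np.sort(u) - np.arange(1, n + 1) / n))
```

In practice $F$ is unknown, which is why the paper replaces it by the empirical distribution function when building the estimator.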
From now on, we assume that the copula function $C(u, v)$ has a density $c(u, v)$ with respect to the Lebesgue measure on $[0, 1]^2$ and that $F$ and $G$ are strictly increasing and differentiable with densities $f$ and $g$. $C(u, v)$ and $c(u, v)$ are then the cumulative distribution function (c.d.f.) and density respectively of the transformed variables $(U, V) = (F(X), G(Y))$. By differentiating formula (2), we get for the joint density,
\[ f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y} = f(x)\, g(y)\, c(F(x), G(y)) \]
where $c(u, v) := \frac{\partial^2 C(u, v)}{\partial u\, \partial v}$ is the above mentioned copula density. Finally, we can obtain the following explicit formula of the conditional density
\[ f_{Y|X}(x, y) = \frac{f_{XY}(x, y)}{f(x)} = g(y)\, c(F(x), G(y)). \tag{3} \]

2.3 Construction of the estimator

Starting from the previously stated product type formula (3), a natural plug-in approach to build an estimator of the conditional density is to use
• a Parzen-Rosenblatt kernel type nonparametric estimator of the marginal density $g$ of $Y$,
\[ \hat g_n(y) := \frac{1}{n h_n} \sum_{i=1}^{n} K_0\left( \frac{y - Y_i}{h_n} \right) \]
• the empirical distribution functions $F_n(x)$ and $G_n(y)$ for $F(x)$ and $G(y)$ respectively,
\[ F_n(x) := \frac{1}{n} \sum_{j=1}^{n} 1_{X_j \le x} \quad \text{and} \quad G_n(y) := \frac{1}{n} \sum_{j=1}^{n} 1_{Y_j \le y}. \]

Concerning the copula density $c(u, v)$, we noted that $c(u, v)$ is the joint density of the transformed variables $(U, V) = (F(X), G(Y))$. Therefore, $c(u, v)$ can be estimated by the bivariate Parzen-Rosenblatt kernel type nonparametric density (pseudo) estimator,
\[ c_n(u, v) := \frac{1}{n a_n b_n} \sum_{i=1}^{n} K\left( \frac{u - U_i}{a_n}, \frac{v - V_i}{b_n} \right) \tag{4} \]
where $K$ is a bivariate kernel and $a_n$, $b_n$ its associated bandwidths. For simplicity, we restrict ourselves to product kernels, i.e. $K(u, v) = K_1(u) K_2(v)$, with the same bandwidths $a_n = b_n$.
6 Nonetheless, sine F and G are unkno wn, the random v ariables ( U i , V i ) are not observ able, i.e. c n is not a true statisti. Therefore, w e appro ximate the pseudo- sample ( U i , V i ) , i = 1 , . . . , n b y its empirial oun terpart ( F n ( X i ) , G n ( Y i )) , i = 1 , . . . , n . W e therefore obtain a gen uine estimator of c ( u, v ) ˆ c n ( u, v ) := 1 na 2 n n X i =1 K 1 u − F n ( X i ) a n ! K 2 v − G n ( Y i ) a n ! . (5) Ev en tually , the onditional densit y estimator is written as ˆ f n ( y | x ) := " 1 nh n n X i =1 K 0  y − Y i h n  # . " 1 na 2 n n X i =1 K 1 F n ( x ) − F n ( X i ) a n ! K 2 F n ( y ) − G n ( Y i ) a n !# or, under a more ompat form, ˆ f n ( y | x ) := ˆ g n ( y ) ˆ c n ( F n ( x ) , G n ( y )) . (6) Remark 1 T o our know le dge, the estimator studie d in this p ap er has never b e en pr op ose d in the liter atur e. However, some  onne tions  an b e made with the ne ar est neighb or one pr op ose d by Stute [ 32 ℄, [ 33 ℄ and [ 34 ℄ for  onditional umulative distribution funtion and the Gasser and Mül ler [ 12 ℄ and Priestley and Chao [ 24 ℄ one in the  ontext of r e gr ession estimation. Inde e d, these esti- mators takle the issue of having a r andom denominator by rst tr ansforming the design X 1 , . . . , X n to a uniform (r andom) one. This r esult in assigning the surfa es under the kernel funtion inste ad of its heights as weights. Con- tr ary to our estimator, they do not make tr ansformations of the data in b oth dir e tions X and Y . 3 Asymptoti results 3.1 Notations and assumptions W e note the ith momen t of a generi k ernel (p ossibly m ultiv ariate) K as m i ( K ) := R u i K ( u ) du , and the L p norm of a funtion h b y || h || p := R h p . W e use the sign ≃ to denote the order of the bandwidths, i.e. h n ≃ u n means that h n = c n u n with c n → c > 0 . 
The supports of the density functions $f$ and $c$ are denoted by $\mathrm{supp}(f) = \{ x \in \mathbb{R};\ f(x) > 0 \}$ and $\mathrm{supp}(c) = \{ (u, v) \in \mathbb{R}^2;\ c(u, v) > 0 \}$, respectively.

For stating our results, we will have to make some regularity assumptions on the kernels and the densities which, although far from being minimal, are somehow customary in kernel density estimation (see subsection 5.2 for discussions and details).

Let $x$ and $y$ be two fixed points in the interior of $\mathrm{supp}(f)$ and $\mathrm{supp}(g)$ respectively. In the remainder of this paper, we will always suppose that
i) the c.d.f. $F$ of $X$ and $G$ of $Y$ are strictly increasing and differentiable;
ii) the densities $g$ and $c$ are twice differentiable with continuous bounded second derivatives on their support.

Moreover, we assume that the kernels $K_0$ and $K$ satisfy the following:
(i) $K$ and $K_0$ are of bounded support and of bounded variation;
(ii) $0 \le K \le C$ and $0 \le K_0 \le C$ for some constant $C$;
(iii) $K$ and $K_0$ are first order kernels: $m_0(K) = 1$, $m_1(K) = 0$ and $m_2(K) < +\infty$, and the same for $K_0$.

In addition, in order to approximate $\hat c_n$ by $c_n$, we will impose the slightly more stringent assumption on the bivariate kernel $K$ that it is twice differentiable with bounded second partial derivatives.

3.2 Weak and strong consistency of the estimator

We have the following pointwise weak consistency theorem:

Theorem 3.1 Let the regularity conditions on the densities and kernels be satisfied. If $h_n$ and $a_n$ tend to zero as $n \to \infty$ in such a way that $n h_n \to \infty$, $n a_n^2 \to \infty$, then
\[ \hat f_n(y|x) = f(y|x) + O_P\left( \frac{1}{\sqrt{n h_n}} + h_n^2 + \frac{1}{\sqrt{n a_n^2}} + a_n^2 \right). \]

Proof. Recall from (4) and (5) that $c_n$ and $\hat c_n$ are estimators of the copula density $c$ based respectively on the unobservable pseudo-data $(F(X_i), G(Y_i))$ and on their approximations $(F_n(X_i), G_n(Y_i))$.
The main ingredient of the proof follows from the decomposition:
\begin{align*}
\hat f_n(y|x) - f(y|x) &= \hat g_n(y)\, \hat c_n(F_n(x), G_n(y)) - g(y)\, c(F(x), G(y)) \\
&= [\hat g_n(y) - g(y)]\, \hat c_n(F_n(x), G_n(y)) + g(y)\, [\hat c_n(F_n(x), G_n(y)) - c(F(x), G(y))] \\
&:= D_1 + D_2.
\end{align*}
We proceed one step further in the decomposition of each term, by first centering at fixed locations,
\begin{align*}
D_1 ={}& [\hat g_n(y) - g(y)]\, [\hat c_n(F_n(x), G_n(y)) - \hat c_n(F(x), G(y))] \\
&+ [\hat g_n(y) - g(y)]\, [\hat c_n(F(x), G(y)) - c_n(F(x), G(y))] \\
&+ [\hat g_n(y) - g(y)]\, [c_n(F(x), G(y)) - c(F(x), G(y))] \\
&+ [\hat g_n(y) - g(y)]\, c(F(x), G(y)) \tag{7} \\
D_2 ={}& g(y)\, [\hat c_n(F_n(x), G_n(y)) - \hat c_n(F(x), G(y))] \\
&+ g(y)\, [\hat c_n(F(x), G(y)) - c_n(F(x), G(y))] \\
&+ g(y)\, [c_n(F(x), G(y)) - c(F(x), G(y))] \tag{8}
\end{align*}
Convergence results for the kernel density estimators of section 5.2 entail that
\begin{align*}
\hat g_n(y) - g(y) &= O_P\left( h_n^2 + 1/\sqrt{n h_n} \right) \\
c_n(F(x), G(y)) - c(F(x), G(y)) &= O_P\left( a_n^2 + 1/\sqrt{n a_n^2} \right)
\end{align*}
by lemmas 5.2 and 5.3 respectively. Approximation lemmas 5.4 and 5.5 of sections 5.4 and 5.5 entail that
\begin{align*}
\hat c_n(F(x), G(y)) - c_n(F(x), G(y)) &= o_P\left( a_n^2 + 1/\sqrt{n a_n^2} \right) \\
\hat c_n(F_n(x), G_n(y)) - \hat c_n(F(x), G(y)) &= o_P\left( a_n^2 + 1/\sqrt{n a_n^2} \right).
\end{align*}
We therefore obtain that
\begin{align*}
D_1 &= O_P\left( h_n^2 + 1/\sqrt{n h_n} \right) O_P\left( a_n^2 + 1/\sqrt{n a_n^2} \right) + O_P\left( h_n^2 + 1/\sqrt{n h_n} \right) \\
D_2 &= o_P\left( a_n^2 + 1/\sqrt{n a_n^2} \right) + O_P\left( a_n^2 + 1/\sqrt{n a_n^2} \right)
\end{align*}
and the condition $a_n \to 0$, $h_n \to 0$, $n a_n^2 \to +\infty$, $n h_n \to +\infty$ entails the convergence of the estimator.
□

Remark 2 As a corollary, we get the rate of convergence by choosing the bandwidths which balance the bias and variance trade-off: for an optimal choice of $h_n \simeq n^{-1/5}$ and $a_n \simeq n^{-1/6}$, we get
\[ \hat f_n(y|x) = f(y|x) + O_P(n^{-1/3}). \]
Therefore, our estimator is rate optimal in the sense that it reaches the minimax rate $n^{-1/3}$ of convergence, according to Stone [30].

Almost sure results can be proved in the same way: we have the following strong consistency result.

Theorem 3.2 Let the regularity conditions on the densities and kernels be satisfied. If in addition $n h_n / (\ln \ln n) \to \infty$ and $n a_n^2 / (\ln \ln n) \to \infty$, then
\[ \hat f_n(y|x) = f(y|x) + O_{a.s.}\left( a_n^2 + \sqrt{\frac{\ln \ln n}{n a_n^2}} + h_n^2 + \sqrt{\frac{\ln \ln n}{n h_n}} \right). \]

Proof. It follows the same lines as the preceding theorem, but uses the a.s. results of the consistency of the kernel density estimators of lemmas 5.2 and 5.3 and of the approximation lemmas 5.4 and 5.5. It is therefore omitted. □

Remark 3 For $h_n \simeq (\ln \ln n / n)^{1/5}$ and $a_n \simeq (\ln \ln n / n)^{1/6}$, which is the optimal trade-off between the bias and the stochastic term, one gets the optimal rate $(\ln \ln n / n)^{1/3}$.

3.3 Convergence in distribution

Theorem 3.3 Let the regularity conditions on the densities and kernels be satisfied. Then $h_n \to 0$, $a_n \to 0$, $n h_n \to \infty$ and $n a_n^2 \to \infty$ entail
\[ \sqrt{n a_n^2}\, \left( \hat f_n(y|x) - f(y|x) \right) \overset{d}{\rightsquigarrow} N\left( 0,\ g(y)\, f(y|x)\, \|K\|_2^2 \right). \]
For $h_n \simeq n^{-1/5}$, $a_n \simeq n^{-1/6}$ one gets the usual rate $n^{-1/3}$.

Proof. With the conditions on the bandwidths, all the terms in the previous decompositions (7) and (8) are negligible compared to $(n a_n^2)^{-1/2}$ except $c_n(F(x), G(y)) - c(F(x), G(y))$, which is asymptotically normal by the result of section 5, lemma 5.3:
\[ \sqrt{n a_n^2}\, g(y)\, [c_n(F(x), G(y)) - c(F(x), G(y))] \overset{d}{\rightsquigarrow} N\left( 0,\ g^2(y)\, c(F(x), G(y))\, \|K\|_2^2 \right). \]
An appliation of Slutsky's lemma yields the desired result. ✷ F or a v etor ( y 1 , . . . , y d ) , one an get a m ultidimensional v ersion of the on- v ergene in distribution (di on v ergene): Corollary 3.4 With the same assumptions, for ( y 1 , . . . , y d ) in the interior of supp ( g ) suh that g ( y i ) f ( y i | x ) 6 = 0 , q na 2 n     ˆ f n ( y i | x ) − f ( y i | x ) q g ( y i ) f ( y i | x ) k K k 2   , i = 1 , ..., m   d ❀ N ( m ) wher e N ( m ) is the standar d m -variate  enter e d normal distribution with iden- tity varian e matrix. 10 Pro of. It simply follo ws from the use of the Cramér-W old devie and is there- fore omitted. F or details, see e.g. [ 1 ℄, theorem 2.3. ✷ 3.4 Asymptoti Bias, V arian e and Me an squar e err or The asymptoti bias is alulated in the follo wing prop osition. Prop osition 3.5 With the assumptions of The or em 3.1 , we have B 0 := E ( ˆ f n ( y | x )) − f ( y | x ) = g ( y ) B K ( c, x, y ) a 2 n 2 + o ( a 2 n ) with B K ( c, x, y ) := m 2 ( K 1 ) ∂ 2 c ( F ( x ) ,G ( y )) ∂ u 2 + m 2 ( K 2 ) ∂ 2 c ( F ( x ) ,G ( y )) ∂ v 2 . Pro of. (Sk et h). By taking exp etation in the deomp osition 7 and 8 , E D 1 = c ( F ( x ) , G ( y )) E [ ˆ g n ( y ) − g ( y )] + R 1 E D 2 = g ( y ) E ([ c n ( F ( x ) , G ( y )) − c ( F ( x ) , G ( y ))]) + R 2 where w e made app ear the bias of ˆ g n and c n and where R 1 and R 2 stand for the remaining terms. With the assumptions on the bandwidths and deriv ations made tedious b y the transformation of the data b y the empirial margins, (see F ermanian [ 9 ℄ theorem 1 for su h a alulation), the terms in R 2 are negligible ompared to the bias of c n . The bias of c n , whi h is simply the bias of a biv ariate k ernel densit y estimator, is of order a 2 n . Similarly , b y b ounding the pro dut terms in D 1 b y Cau h y-S h w arz inequalit y , routine analysis sho w that the terms in R 1 are negligible ompared to the bias of ˆ g n , whi h is of order h 2 n . 
Sine h 2 n is itself negligible to a 2 n , the main term in the deomp osition is g ( y ) E ( c n ( F ( x ) , G ( y )) − C ( F ( x ) , G ( y ))) . Plugging the expression of the bias giv en in lemma 5.3 , yields the desired result. ✷ The asymptoti v ariane has already b een deriv ed in theorem 3.3 , V 0 := V ar ( ˆ f ( y | x )) = 1 / ( na 2 n ) g ( y ) f ( y | x ) | | K | | 2 2 + o (1 / ( na 2 n )) . T ogether with the omputation of the asymptoti bias, w e get the asymptoti mean squared error as a orollary: Corollary 3.6 With the pr evious assumptions, the Asymptoti Me an Squar e d Err or (AMSE) E 0 at ( x, y ) is E 0 := B 2 0 + V 0 = a 4 n g 2 ( y ) ( B k ( c, x, y )) 2 4 + g ( y ) f ( y | x ) | | K || 2 2 na 2 n + o a 4 n + 1 na 2 n ! 11 whih gives, for the hoi e of the usual b andwidths mentione d ab ove, E 0 = n − 2 / 3 g 2 ( y ) B 2 K ( c, x, y ) 4 + c ( F ( x ) , G ( y )) || K || 2 2 ! + o ( n − 2 / 3 ) . 4 Comparison with other estimators 4.1 Pr esentation of alternative estimators F or on v eniene, w e reall b elo w the denition of other estimators of the on- ditional densit y enoun tered in the literature and summarize their bias and v ariane prop erties. W e will note the bias of the ith estimator ˆ f i n ( y | x ) b y E i and its v ariane b y V i . (1) Double k ernel estimator : as dened in the in tro dution setion of our pap er b y the follo wing ratio, ˆ f (1) n ( y | x ) := 1 n n P i =1 K ′ h 1 ( X i − x ) K h 2 ( Y i − y ) 1 n n P i =1 K ′ h 1 ( X i − x ) . where h 1 and h 2 are the bandwidths. One then ha v e, see e.g. [ 17 ℄, • Bias: B 1 = h 2 1 m 2 ( K ) 2   2 f ′ ( x ) f ( x ) ∂ f ( y | x ) ∂ x + ∂ 2 f ( y | x ) ∂ x 2 + h 2 h 1 ! 
2 ∂ 2 f ( y | x ) ∂ y 2   + o  h 2 1 + h 2 2  • V ariane: V 1 = k K k 2 2 f ( y | x ) nh 1 h 2 f ( x )  k K k 2 2 − h 2 f ( y | x )  + o  1 nh 1 h 2  (2) Lo al p olynomial estimator : Set R ( θ , x, y ) := n X i =1  K h 2 ( Y i − y ) − X r j =0 θ j ( X i − x ) j  2 K ′ h 1 ( X i − x ) , then the lo al p olynomial estimator is dened as ˆ f (2) n ( y | x ) := ˆ θ 0 , where ˆ θ xy := ( ˆ θ 0 , ˆ θ 1 , . . . , ˆ θ r ) is the v alue of θ whi h minimizes R ( θ , x, y ) . This lo al p olynomial estimator, although it has a sup erior bias than 12 the k ernel one, is no longer restrited to b e non-negativ e and do es not in tegrate to 1, exept in the sp eial ase r = 0 . F rom results of [ 8 ℄, w e get for the lo al linear estimator (see also [ 7 ℄ p. 256), • Bias: B 2 = h 2 1 m 2 ( K ′ ) 2 ∂ 2 f ( y | x ) ∂ x 2 + h 2 2 m 2 ( K ) 2 ∂ 2 f ( y | x ) ∂ y 2 + o ( h 2 1 + h 2 2 ) • V ariane: V 2 = || K || 2 2 || K ′ || 2 2 f ( y | x ) nh 1 h 2 f ( x ) + o  1 nh 1 h 2  (3) Lo al parametri estimator : As in [ 18 ℄ and [ 7 ℄, set R 1 ( θ , x, y ) := n X i =1 ( K h 2 ( Y i − y ) − A ( X i − x, θ )) 2 K ′ h 1 ( X i − x ) where A ( x, θ ) = l  P r j =0 θ j ( X i − x ) j  and l ( . ) is a monotoni funtion mapping R 7→ R + , e.g. l ( u ) = exp( u ) . Then, ˆ f (3) n ( y | x ) := A (0 , ˆ θ ) = l ( ˆ θ 0 ) . • Bias: B 3 = h 2 1 η ( K ′ ) ∂ 2 f ( y | x ) ∂ x 2 − ∂ 2 A (0 , θ xy ) ∂ x 2 ! + h 2 2 m 2 ( K ) 2 ∂ 2 f ( y | x ) ∂ y 2 + o ( h 2 1 + h 2 2 ) • V ariane: V 3 = τ ( K , K ′ ) 2 f ( y | x ) nh 1 h 2 f ( x ) + o  1 nh 1 h 2  where η and τ are k ernel dep enden t onstan ts. (4) Constrained lo al p olynomial estimator : A simple devie to fore the lo al p olynomial estimator to b e p ositiv e is to set θ 0 = exp( α ) in the denition of R 0 to b e minimized. The onstrained lo al p olynomial estimator ˆ f 4 n ( y | x ) is then dened analogously as the lo al p olynomial estimator ˆ f 2 n ( y | x ) . 
We have, as in [18] and [7]:
• Bias:
\[ B_4 := \frac{h_1^2\, m_2(K')}{2}\, \frac{\partial^2 f(y|x)}{\partial x^2} + \frac{h_2^2\, m_2(K)}{2}\, \frac{\partial^2 f(y|x)}{\partial y^2} + o(h_1^2 + h_2^2) \]
• Variance:
\[ V_4 = \frac{\|K\|_2^2\, f(y|x)}{n h_1 h_2\, f(x)} + o\left( \frac{1}{n h_1 h_2} \right) \]

4.2 Asymptotic Bias and Variance comparison

All estimators have (hopefully) the same order $n^{-1/3}$ and $n^{-2/3}$ in their asymptotic bias and variance terms, for the usual bandwidths choice. The main difference lies in the constant terms which depend on unknown densities.

Bias: Contrary to all the alternative estimators whose bias involves derivatives of the full conditional density, one can note that our estimator's bias only involves the density of $Y$ and the derivatives of the copula density. To make things more explicit, the terms involved, e.g. in the local polynomial estimator, write themselves as the sum of the derivatives of the conditional density,
\[ h_n^{-2} B_2 \approx \frac{\partial^2 f(y|x)}{\partial x^2} + \frac{\partial^2 f(y|x)}{\partial y^2} \]
that is to say, by differentiating $f(y|x) = g(y)\, c(F(x), G(y))$ twice in $x$ and twice in $y$,
\begin{align*}
h_n^{-2} B_2 \approx{}& f'(x)\, g(y)\, \frac{\partial c(F(x), G(y))}{\partial u} + f^2(x)\, g(y)\, \frac{\partial^2 c(F(x), G(y))}{\partial u^2} \\
&+ g''(y)\, c(F(x), G(y)) + 3\, g'(y)\, g(y)\, \frac{\partial c(F(x), G(y))}{\partial v} + g^3(y)\, \frac{\partial^2 c(F(x), G(y))}{\partial v^2}
\end{align*}
whereas our $(g(y)/2)\, B_K(c, x, y)$ term, modulo the constants involved by the kernel, is written as
\[ a_n^{-2} B_0 \approx g(y) \left( \frac{\partial^2 c(F(x), G(y))}{\partial u^2} + \frac{\partial^2 c(F(x), G(y))}{\partial v^2} \right). \]
It then becomes clear that we have a simpler expression, with fewer unknown terms than is the case for competitors, which do involve the density $f$ of $X$ and its derivative $f'$ as well as the derivatives $g'$ and $g''$ of the density of $Y$.

In a fixed bandwidth and asymptotic context, it seems difficult to compare further. Nonetheless, we believe this feature of our estimator would be practically relevant when it comes to choosing the bandwidths.
Indeed, bandwidth seletion is usually p erformed b y minimizing lo al or global asymptoti error riteria su h as Asymptoti Mean Square Error (AMSE) or Asymptoti Mean In tegrated Square Error (AMISE), in whi h unkno wn terms ha v e to b e esti- mated. Sine in our approa h, the asymptoti bias and v ariane in v olv e less unkno wn terms, w e exp et that a higher auray ould b e obtained in this pre-estimation stage. Moreo v er, b y ha ving managed to separate the estimation problem of the marginal from the opula densit y , w e ould use kno wn optimal data-dep enden t bandwidths seletion pro edures for densit y estimation su h as ross v alidation, separately for the densit y of Y and for the opula densit y . Remark 4 Sin e the  opula density c has a  omp at supp ort [0 , 1] 2 , our esti- mator may suer fr om bias issues on the b oundaries, i.e. in the tails of X and 14 Y . T o  orr e t these issues, one  ould apply one of the sever al known te hniques to r e du e the bias of the kernel estimator on the e dges (se e e.g [ 7 ℄ hapter 5.5, b oundary kernels, r ee tion, tr ansformation and lo  al p olynomial tting). In the tail of the distribution of X , this bias issue in the  opula density estimator is b alan e d by the impr ove d varian e, as shown b elow. V ariane : The v ariane of our estimator in v olv es a pro dut of the densit y g ( y ) of Y b y the onditional densit y f ( y | x ) , na 2 n V 0 ≈ g ( y ) f ( y | x ) = g 2 ( y ) c ( F ( x ) , G ( y ) whereas omp etitors in v olv e the ratio of f ( y | x ) b y the densit y f ( x ) of X f ( y | x ) f ( x ) = g ( y ) f ( x ) c ( F ( x ) , G ( y )) . It is a remark able feature of the estimator w e prop ose, that its v ariane do es not in v olv e diretly f ( x ) , as is the ase for the omp etitors, but only its on tri- bution to Y , through the opula densit y . 
This reets the abilit y announed in the in tro dution of the opula represen tation to ha v e eetiv ely separated the randomness p ertaining to Y alone, from the dep endene struture of ( X , Y ) . Moreo v er, our estimator also do es not suer from the unstable nature of om- p etitors who, due to their in trinsi ratio struture, get an explosiv e v ariane for small v alue of the densit y f ( x ) , making onditional estimation diult, e.g. in the tail of the distribution of X . Remark 5 T o make estimators  omp ar able, we have r estrite d ourselves to so- al le d xe d b andwidths estimators, i.e. nonp ar ametri estimators wher e the b andwidths ar e of the generi form h n = bn α or h n = b (ln n/n ) α with α and b r e al numb ers. Impr ove d b ehavior for al l the pr e  e ding estimators  an b e obtaine d with data-dep endent b andwidths wher e h n = H n ( X 1 , . . . , X n , x )  an b e funtions of the lo  ation and of the data. 4.3 Finite sample numeri al simulation 4.3.1 Pr ati al implementation of the estimator Although the prop osed estimator seems to ompare fa v orably asymptotially , some pitfalls link ed to the opula densit y estimation ma y sho w up in the pratial implemen tation: Innities at the orners: man y opula densities exhibit innite v alues at their orners. Therefore, to a v oid that ( F n ( X i ) , G n ( Y i )) b e equal to (1 , 1) , w e  hange the empirial distribution funtions F n and G n to n/ ( n + 1) F n and n/ ( n + 1) G n resp etiv ely . 15 Boundary bias: sine the opula densit y is of ompat supp ort [0 , 1] 2 , the k ernel metho d of estimation ma y suer from b oundary bias. T o alleviate this issue, w e suggest to use b oundary-orreted k ernels su h as the b eta k ernels K x,b ( t ) = β x/b +1 , (1 − x ) /b +1 ( t ) , where β a,b ( t ) denotes the p df of a Beta(a,b) dis- tribution, adv o ated b y Chen [ 2 ℄, and used e.g. b y [ 14 ℄ for estimating loss distributions. 
The modified copula density pseudo-estimator is thus defined as $c_n(u, v) = n^{-1} \sum_{i=1}^{n} K_{u,a_n}(U_i) K_{v,a_n}(V_i)$.

Bandwidth selection: the performance of nonparametric estimators depends crucially on the bandwidths. For conditional density, bandwidth selection is a more delicate matter than for density estimation, due to the multidimensional nature of the problem. Moreover, for ratio-type estimators, the difficulty is increased by the local dependence of the bandwidth $h_y$ on $h_x$ implied by conditioning near $x$. For the copula estimator, an additional issue comes from the fact that the pseudo-data $F(X_i), G(Y_i)$ are not directly accessible. Inspection of the AMISE of the copula-based estimator suggests that we can separate the choice of the bandwidth $h$ for $\hat g(y)$ from the choice of the bandwidth $a_n$ of the copula density estimator $\hat c_n$. A rationale for a data-dependent method is to select $h$ on the $Y_i$ data alone (e.g. by cross-validation or plug-in), separately from the $a_n$ of the copula density $c$, which is based on the approximate data $F_n(X_i), G_n(Y_i)$. However, such a bandwidth selection would require deeper analysis, and we leave for further research a detailed study of a practical data-dependent method for bandwidth selection of the copula-quantile estimator, together with a global and local comparison of the estimators at their respective optimal bandwidths.

4.3.2 Model and comparison results

We simulated a sample of $n = 100$ variables $(X_i, Y_i)$ from the following model: $X$ and $Y$ are marginally distributed as $N(0, 1)$ and linked via the Frank copula

$$C(u, v, \theta) = \frac{\ln\left[ (\theta + \theta^{u+v} - \theta^{u} - \theta^{v}) / (\theta - 1) \right]}{\ln \theta}$$

with parameter $\theta = 100$. To get a first picture, we restricted ourselves to simple rule-of-thumb methods, fixed for all $x, y$, based on the Normal reference rule. For the selection of $a_n$ of the copula density estimator, we applied Scott's rule on the data $F_n(X_i)$.
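The practical steps described above (rescaled empirical distribution functions, beta kernels, and the simulated Frank model) can be sketched as follows. This is a minimal illustration, not the code used for the reported simulations: it assumes the product form $f(y|x) = g(y)\, c(F(x), G(y))$ appearing in the variance discussion, all function names are hypothetical, and the bandwidths `h` and `a` are placeholder values rather than the rule-of-thumb choices mentioned in the text.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def sample_frank_normal(n, theta, rng):
    """Draw n pairs with N(0,1) margins linked by the Frank copula C(u, v, theta)."""
    eps = 1e-12  # keep uniforms strictly inside (0, 1) before applying inv_cdf
    pairs = []
    for _ in range(n):
        u = min(max(rng.random(), eps), 1.0 - eps)
        w = rng.random()
        # Conditional inversion: solve dC/du(u, v) = w for v in closed form.
        t = w * (theta - 1.0) / (theta ** u - w * (theta ** u - 1.0))
        v = min(max(math.log1p(t) / math.log(theta), eps), 1.0 - eps)
        pairs.append((STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)))
    return pairs

def pseudo_obs(values):
    """Rescaled empirical c.d.f. n/(n+1) * F_n at the sample points: rank / (n + 1)."""
    n = len(values)
    out = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=values.__getitem__), start=1):
        out[i] = rank / (n + 1)
    return out

def rescaled_ecdf(values, x):
    """n/(n+1) * F_n(x) = #{X_i <= x} / (n + 1) at an arbitrary location x."""
    return sum(1 for xi in values if xi <= x) / (len(values) + 1)

def beta_kernel(x, b, t):
    """Beta kernel K_{x,b}(t): pdf of Beta(x/b + 1, (1 - x)/b + 1) at t in (0, 1)."""
    p, q = x / b + 1.0, (1.0 - x) / b + 1.0
    log_pdf = (math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
               + (p - 1.0) * math.log(t) + (q - 1.0) * math.log(1.0 - t))
    return math.exp(log_pdf)

def copula_density(u, v, us, vs, a):
    """c_n(u, v) = n^{-1} sum_i K_{u,a}(U_i) K_{v,a}(V_i), computed on pseudo-data."""
    total = sum(beta_kernel(u, a, ui) * beta_kernel(v, a, vi) for ui, vi in zip(us, vs))
    return total / len(us)

def epanechnikov_kde(y, ys, h):
    """Kernel estimate of the marginal density g(y) with an Epanechnikov kernel."""
    zs = ((y - yi) / h for yi in ys)
    return sum(0.75 * (1.0 - z * z) for z in zs if abs(z) <= 1.0) / (len(ys) * h)

def quantile_copula_density(x, y, xs, ys, h, a):
    """Product-form estimate g_hat(y) * c_hat(F_n(x), G_n(y)) (assumed form)."""
    us, vs = pseudo_obs(xs), pseudo_obs(ys)
    u, v = rescaled_ecdf(xs, x), rescaled_ecdf(ys, y)
    return epanechnikov_kde(y, ys, h) * copula_density(u, v, us, vs, a)

rng = random.Random(0)
data = sample_frank_normal(100, 100.0, rng)
xs, ys = [p[0] for p in data], [p[1] for p in data]
# h and a below are placeholder bandwidths chosen only to make the sketch run.
print(quantile_copula_density(2.0, -1.0, xs, ys, h=0.8, a=0.1))
```

Because the pseudo-observations are ranks divided by $n+1$, they stay strictly inside $(0,1)$, so the beta kernels are always evaluated away from the corners where the copula density may be infinite.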
We used Epanechnikov kernels for $\hat g(y)$ and the other estimators. We plotted the conditional density along with its estimates on the domain $x \in [-5, 5]$ and $y \in [-3, 3]$ in figure 1. A comparison plot at $x = 2$ is shown in figure 2.

Figure 1. 3D plots. From left to right, top to bottom: true density, quantile-copula estimator, double kernel, local polynomial (clipped).

Figure 2. Comparison at x=2: conditional density = thick curve, quantile-copula = continuous line, double kernel = dotted curve, local polynomial = dashed curve.

4.3.3 Clipping and estimation in the tails

As mentioned earlier, since the performance of the estimators depends on the performance of the bandwidth selection method, it is delicate to give a conclusive answer. However, we would like to illustrate at least one case where the proposed estimator clearly outperforms its competitors. Indeed, one major issue of alternative estimators, already mentioned, is their numerical explosion when the estimated density $\hat f(x)$ is close to zero. In particular, if the kernel has compact support, the denominator is zero for every $x$ whose distance from the closest $X_i$ exceeds half the bandwidth times the length of the support, thereby allowing estimation only on a closed subset of the domain of $X$ included in $[\min X_i, \max X_i]$. This is one of the reasons why simulation studies are often performed with a marginal $X$ density of bounded support and/or with a Gaussian kernel. Note that the problem remains with a Gaussian kernel, since the estimated density can quickly become lower than the machine precision.
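The vanishing denominator described above is easy to reproduce. The sketch below uses a small hypothetical sample of $X$ and illustrative bandwidths (neither is taken from the simulation study) to evaluate the marginal kernel estimate $\hat f(x)$ far from the data, with a compactly supported kernel and with a Gaussian one.

```python
import math

def kde(x, data, h, kernel):
    """Parzen-Rosenblatt estimate of the marginal density of X at x."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)

def epanechnikov(z):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * (1.0 - z * z) if abs(z) <= 1.0 else 0.0

def gaussian(z):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

xs = [-1.2, -0.5, 0.0, 0.3, 0.9, 1.4]  # hypothetical sample of X

print(kde(0.0, xs, 0.5, epanechnikov))  # positive inside the data range
print(kde(5.0, xs, 0.5, epanechnikov))  # exactly 0.0: no X_i within one bandwidth of x
print(kde(5.0, xs, 0.05, gaussian))     # also 0.0: exp() underflows in double precision
```

Dividing a joint density estimate by either of the last two values is undefined, which is the numerical explosion affecting ratio-type estimators.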
To prevent this numerical explosion, the definition of the conditional density estimators has to be modified, either by

$$\hat f(y|x) = \begin{cases} \dfrac{\hat f_{XY}(x, y)}{\hat f_X(x)} & \text{if } \hat f_X(x) > c \\[2ex] \hat a(y) & \text{if } \hat f_X(x) = 0 \end{cases}$$

or by

$$\hat f(y|x) = \frac{\hat f_{XY}(x, y)}{\max\{\hat f(x), c\}}$$

where $c > 0$ is an arbitrary amount of clipping, and $\hat a(\cdot)$ is an arbitrary density estimator (usually chosen to be zero or $\hat g(y)$).

An illustration of these issues clearly appears in figure 1. The unclipped version of the double kernel estimator is unable to estimate the conditional density for $|x|$ roughly greater than 3, and the clipped version of the local polynomial estimator with $c = 0.00001$ and $\hat a(y) = \hat g(y)$ gives a wrong estimation in the tails, reflecting the arbitrary choices in the clipping decision. On the contrary, the quantile-copula estimator is surprisingly able to estimate the conditional density $f(y|x)$ at locations $x$ where there is no data, i.e. in the tails of the distribution of $X$. An explanation of this apparently paradoxical phenomenon comes from the fact that the estimator is partially based on the ranks of $X_i$ and $Y_i$. Therefore, it can recover hidden information on the density of $X$ from the ordering of the pairs $(X_i, Y_i)$. See Hoff [16] for a detailed explanation. We believe that this feature might be of potential interest for applications, e.g. in statistical inference of extreme values and rare events.

Discussion

The quantile transform and the use of the copula formula have thus turned the conditional density formula (1) of the ratio type into the product one (3). This formula was the backbone of our article, where this product form appeared to be especially appealing for statistical estimation: consistency and limit results were obtained by simple combination of the previously known ones on (unconditional) density estimation.
The estimator obtained shows interesting asymptotic bias and variance properties compared to competitors. Although its finite sample implementation does not yet give a clear and conclusive picture, it already yields some promising results, e.g. for estimation in the tails of $X$, where the proposed estimator does not suffer from clipping issues.

5 Appendix: auxiliary results

In this section, we gather some preliminary results which we will need as basic tools for the demonstrations of section 3. In subsection 5.1, we recall classical results about the convergence of the Kolmogorov-Smirnov statistic. Next, we give a brief overview of kernel density estimation and apply these results to the estimators $\hat g_n$ (section 5.2) and $c_n$ (section 5.3). Finally, we need two approximation lemmas of $\hat c_n$ by $c_n$ in sections 5.4 and 5.5.

5.1 Approximation of the pseudo-variables $F(X_i)$ by their estimates $F_n(X_i)$

For $(X_i, i = 1, \ldots, n)$ an i.i.d. sample of a real random variable $X$ with common c.d.f. $F$, the Kolmogorov-Smirnov statistic is defined as $D_n := \|F_n - F\|_\infty$. Glivenko-Cantelli, Kolmogorov and Smirnov, Chung, Donsker among others have studied its convergence properties in increasing generality (see [28] and [36] for recent accounts). For our purpose, we only need to formulate these results in the following rough form:

Lemma 5.1 For an i.i.d. sample from a continuous c.d.f. $F$,

$$\|F_n - F\|_\infty = O_{a.s.}\left( \sqrt{\frac{\ln \ln n}{n}} \right) \tag{9}$$

$$\|F_n - F\|_\infty = O_P\left( \frac{1}{\sqrt{n}} \right). \tag{10}$$

Since $F$ is unknown, the random variables $U_i = F(X_i)$ are not observed. As a consequence of lemma 5.1, one can naturally approximate these variables by the statistics $F_n(X_i)$. Indeed,

$$|F(X_i) - F_n(X_i)| \le \sup_{x \in \mathbb{R}} |F(x) - F_n(x)| = \|F_n - F\|_\infty \quad \text{a.s.}$$

Thus, $|F(X_i) - F_n(X_i)|$ is no more than an $O_P(n^{-1/2})$ or an $O_{a.s.}((\ln \ln n / n)^{1/2})$.
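As a quick numerical illustration of lemma 5.1, the sketch below computes $D_n$ for simulated samples. It assumes, purely for convenience, that $F$ is the uniform c.d.f. on $[0, 1]$ so that it is known in closed form; the function names are hypothetical.

```python
import math
import random

def ks_statistic(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)| for a continuous c.d.f., evaluated at the jumps of F_n."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max(i / n - cdf(x), cdf(x) - (i - 1) / n)
        for i, x in enumerate(xs, start=1)
    )

def uniform_cdf(x):
    """C.d.f. of the uniform distribution on [0, 1] (illustrative choice of F)."""
    return min(max(x, 0.0), 1.0)

rng = random.Random(1)
for n in (100, 1000, 10000):
    sample = [rng.random() for _ in range(n)]
    d_n = ks_statistic(sample, uniform_cdf)
    # By (10), sqrt(n) * D_n should remain bounded in probability as n grows.
    print(n, d_n, math.sqrt(n) * d_n)
```

Since $F_n(X_{(i)}) = i/n$ at the $i$-th order statistic, each pointwise error $|F(X_i) - F_n(X_i)|$ is one of the terms entering the maximum, which is the bound used in the text.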
These rates of approximation appear to be faster than those of statistical estimation of densities, as is shown in the next subsection.

5.2 Convergence of the kernel density estimator $\hat g_n$

We recall below some classical results about the convergence of the Parzen-Rosenblatt kernel nonparametric estimator $\hat f_n$ of a $d$-variate density. Since its inception by Rosenblatt [25] and Parzen [22], it has been studied by a great many authors. See e.g. Scott [27], Prakasa Rao [23], Nadaraya [21] for details. See also Bosq [1] chapter 2.

It is well known that the bias of the kernel density estimator depends on the degree of smoothness of the underlying density, measured by its number of derivatives or its Lipschitz order. In order to get the convergence of the bias to zero, it suffices to assume that the density is continuous (see [22]). To get further information on the rate of convergence of the estimator, it is necessary to make further assumptions. Moreover, for kernel functions with unbounded support, the rate of convergence also depends on the tail behavior of the kernel (see Stute [31]). Therefore, for clarity of exposition and simplicity of notation, we will make the customary assumptions that the density is twice differentiable and that the kernel is of bounded support. We then have the following results:

• Bias: With the previous assumptions, for $x$ in the interior of $\mathrm{supp}(f)$, $h_n \to 0$ and $n h_n^d \to \infty$ entail that

$$E \hat f_n(x) = f(x) + \frac{h_n^2}{2} \int_{\mathbb{R}^d} \sum_{1 \le i,j \le d} \frac{\partial^2 f(x)}{\partial x_i \partial x_j} z_i z_j K(z)\, dz + o(h_n^2).$$

With the multivariate kernel $K$ as a product of $d$ order one kernels $K_i$, the above sum reduces to the diagonal terms:

$$E \hat f_n(x) = f(x) + \frac{h_n^2}{2} \sum_{1 \le i \le d} m_2(K_i) \frac{\partial^2 f(x)}{\partial x_i^2} + o(h_n^2).$$

• Variance: with the same assumptions,

$$\mathrm{Var}\left[ \hat f_n(x) \right] = \frac{f(x)}{n h_n^d} \|K\|_2^2 + o\left( \frac{1}{n h_n^d} \right).$$
• Pointwise asymptotic normality: under the previous conditions,

$$\sqrt{n h_n^d} \left( \hat f_n(x) - E \hat f_n(x) \right) \overset{d}{\rightsquigarrow} N\left(0, f(x) \|K\|_2^2\right).$$

For a choice of the bandwidth as $h_n \simeq n^{-1/(d+4)}$, which realizes the optimal trade-off between the bias and variance, one gets the rate $n^{-2/(d+4)}$, which is the optimal speed of convergence in the minimax sense in the class of density functions with bounded second derivatives, according to [30].

• Pointwise almost sure convergence: if moreover $n h_n^d / (\ln \ln n) \to \infty$ (see [3]), we have that

$$\hat f_n(x) - E \hat f_n(x) = O_{a.s.}\left( \sqrt{\frac{\ln \ln n}{n h_n^d}} \right).$$

For a choice of the bandwidth as $h_n \simeq ((\ln \ln n)/n)^{1/(d+4)}$, we get the rate of convergence $((\ln \ln n)/n)^{2/(d+4)}$:

$$\hat f_n(x) - f(x) = O_{a.s.}\left( \left( \frac{\ln \ln n}{n} \right)^{2/(d+4)} \right).$$

Applied to our case ($d = 1$), we can summarize these results for further reference in the following lemma for the estimator $\hat g_n$ of the density $g$ of $Y$:

Lemma 5.2 With the previous assumptions, for a point $y$ in the interior of the support of $g$, and a bandwidth chosen such that $h_n \simeq n^{-1/5}$, we have

$$|\hat g_n(y) - g(y)| = O_P(n^{-2/5})$$

$$n^{2/5} \left[ \hat g_n(y) - g(y) \right] \overset{d}{\rightsquigarrow} N\left(0, g(y) \|K_0\|_2^2\right).$$

With the same assumptions, but for a bandwidth choice of $h_n \simeq (\ln \ln n / n)^{1/5}$,

$$\hat g_n(y) - g(y) = O_{a.s.}\left( \left( \frac{\ln \ln n}{n} \right)^{2/5} \right). \tag{11}$$

5.3 Convergence of $c_n(u, v)$

As mentioned before, the assumptions that $F$ and $G$ be differentiable and strictly increasing entail that $c$ is the density of the transformed variables $(U, V) := (F(X), G(Y))$.
Therefore, once one convinces oneself that $c_n(u, v)$ is simply the kernel density estimator of the bivariate density $c(u, v)$ of the pseudo-variables $(U, V)$, one directly draws its convergence properties by applying the results of the preceding subsection with $d = 2$:

Lemma 5.3 For a choice of $a_n \simeq n^{-1/6}$, for every $(u, v) \in (0, 1)^2$, results similar to those of lemma 5.2 hold for $\hat c_n$ with a rate of convergence of $n^{-1/3}$ and $(\ln \ln n / n)^{1/3}$ respectively.

5.4 An approximation lemma of $\hat c_n(u, v)$ by $c_n(u, v)$

The lemma of this section gives the rate of approximation of the kernel copula density estimator $\hat c_n(u, v)$ computed on the real data $(F_n(X_i), G_n(Y_i))$ by its analogue $c_n(u, v)$ computed on the pseudo-data $(U_i, V_i) := (F(X_i), G(Y_i))$. A similar result, but with a different proof, has been obtained in Fermanian [9] theorem 1.

Lemma 5.4 Let $(u, v) \in (0, 1)^2$. If the kernel $K(u, v) = K_1(u) K_2(v)$ is twice differentiable with bounded second derivatives, then

$$|\hat c_n(u, v) - c_n(u, v)| = o_P\left( a_n^2 + 1/\sqrt{n a_n^2} \right)$$

$$|\hat c_n(u, v) - c_n(u, v)| = o_{a.s.}\left( \sqrt{\frac{\ln \ln n}{n a_n^2}} \right)$$

Proof. We denote by $\|\cdot\|$ a norm for vectors. Set

$$\Delta := \hat c_n(u, v) - c_n(u, v) = \frac{1}{n a_n^2} \sum_{i=1}^{n} \Delta_{i,n}(u, v)$$

with

$$\Delta_{i,n}(u, v) := K\left( \frac{u - F_n(X_i)}{a_n}, \frac{v - G_n(Y_i)}{a_n} \right) - K\left( \frac{u - F(X_i)}{a_n}, \frac{v - G(Y_i)}{a_n} \right)$$

and define

$$Z_{i,n} := \begin{pmatrix} F(X_i) - F_n(X_i) \\ G(Y_i) - G_n(Y_i) \end{pmatrix}.$$

As mentioned in section 5.1, $|F_n(X_i) - F(X_i)| \le \|F_n - F\|_\infty$ and $|G_n(Y_i) - G(Y_i)| \le \|G_n - G\|_\infty$ a.s. for every $i = 1, \ldots, n$. Lemma 5.1 thus entails that the norm of $Z_{i,n}$ is bounded uniformly in $i$ and such that

$$\|Z_{i,n}\| = O_P(1/\sqrt{n}), \quad i = 1, \ldots, n \tag{12}$$

$$\|Z_{i,n}\| = O_{a.s.}\left( \sqrt{\ln \ln n / n} \right), \quad i = 1, \ldots, n \tag{13}$$

Now, for every fixed $(u, v) \in [0, 1]^2$, since the kernel $K$ is twice differentiable, there exist, by Taylor expansion, random variables $\tilde U_{i,n}$ and $\tilde V_{i,n}$ such that, almost surely,

$$\Delta = \frac{1}{n a_n^3} \sum_{i=1}^{n} Z_{i,n}^T \nabla K\left( \frac{u - F(X_i)}{a_n}, \frac{v - G(Y_i)}{a_n} \right) + \frac{1}{2 n a_n^4} \sum_{i=1}^{n} Z_{i,n}^T \nabla^2 K\left( \frac{u - \tilde U_{i,n}}{a_n}, \frac{v - \tilde V_{i,n}}{a_n} \right) Z_{i,n} := \Delta_1 + \Delta_2$$

where $Z_{i,n}^T$ denotes the transpose of the vector $Z_{i,n}$, and $\nabla K$ and $\nabla^2 K$ the gradient and the Hessian respectively of the multivariate kernel function $K$:

$$\nabla K = \begin{pmatrix} \dfrac{\partial K}{\partial u} \\[1.5ex] \dfrac{\partial K}{\partial v} \end{pmatrix}, \qquad \nabla^2 K = \begin{pmatrix} \dfrac{\partial^2 K}{\partial u^2} & \dfrac{\partial^2 K}{\partial u \partial v} \\[1.5ex] \dfrac{\partial^2 K}{\partial u \partial v} & \dfrac{\partial^2 K}{\partial v^2} \end{pmatrix}$$

Negligibility of $\Delta_2$: By the boundedness assumption on the second-order derivatives of the kernel, and equations (12) and (13),

$$\Delta_2 = O_P\left( \frac{1}{n a_n^4} \right) \quad \text{and} \quad \Delta_2 = O_{a.s.}\left( \frac{\ln \ln n}{n a_n^4} \right).$$

Negligibility of $\Delta_1$: By centering at expectations,

$$\Delta_1 = \frac{1}{n a_n^3} \sum_{i=1}^{n} Z_{i,n}^T \left( \nabla K\left( \frac{u - F(X_i)}{a_n}, \ldots \right) - E \nabla K\left( \frac{u - F(X_i)}{a_n}, \ldots \right) \right) + \frac{1}{n a_n^3} \sum_{i=1}^{n} Z_{i,n}^T\, E \nabla K\left( \frac{u - F(X_i)}{a_n}, \frac{v - G(Y_i)}{a_n} \right) := \Delta_{11} + \Delta_{12}$$

Negligibility of $\Delta_{12}$: Bias results on the bivariate gradient kernel estimator (see Scott [27] chapter 6) entail that

$$E \nabla K\left( \frac{u - F(X_i)}{a_n}, \frac{v - G(Y_i)}{a_n} \right) = a_n^3 \nabla c(u, v) + O(a_n^5).$$

The Cauchy-Schwarz inequality yields

$$|\Delta_{12}| \le \frac{n \|Z_{i,n}\|}{n a_n^3} \left\| E \nabla K\left( \frac{u - F(X_i)}{a_n}, \frac{v - G(Y_i)}{a_n} \right) \right\|.$$

In turn, with equations (12) and (13), $\Delta_{12} = O_P(1/\sqrt{n})$ and $\Delta_{12} = O_{a.s.}(\sqrt{\ln \ln n / n})$.

Negligibility of $\Delta_{11}$: Set $A_i = \nabla K\left( \frac{u - F(X_i)}{a_n}, \ldots \right) - E \nabla K\left( \frac{u - F(X_i)}{a_n}, \ldots \right)$. Then,

$$|\Delta_{11}| \le \frac{\|Z_n\|}{n a_n^3} \sum_{i=1}^{n} \|A_i\|.$$

The boundedness assumption on the derivative of the kernel implies that $\|A_i\| \le 2C$ a.s. We apply the Hoeffding inequality for independent, centered, bounded by $M$, but not identically distributed random variables $(\eta_j)$ (e.g. see [1]),

$$P\left( \sum_{j=1}^{n} \eta_j > t \right) \le \exp\left( -\frac{t^2}{2 n M^2} \right). \tag{14}$$

Here, for every $\epsilon > 0$, with $M = 2C$, $\eta_i = \|A_i\| - E\|A_i\|$ and $t = \epsilon n^{1/2} (\ln \ln n)^{1/2}$, we get that

$$P\left( \sum_{i=1}^{n} (\|A_i\| - E\|A_i\|) > \epsilon \sqrt{n \ln \ln n} \right) \le \exp\left( -\frac{\epsilon^2 \ln \ln n}{4 M^2} \right) = \frac{1}{(\ln n)^{\delta}}$$

with a $\delta > 0$ and where the r.h.s. goes to zero as $n \to \infty$. Therefore, $\sum_{i=1}^{n} (\|A_i\| - E\|A_i\|) = O_P(\sqrt{n \ln \ln n})$.

For the almost sure negligibility, we get similarly by inequality (14) that, for every $\epsilon > 0$, with $t = \epsilon n^{(1+\delta)/2}$ and $\delta > 0$,

$$P\left( \sum_{i=1}^{n} (\|A_i\| - E\|A_i\|) > \epsilon n^{(1+\delta)/2} \right) \le \exp\left( -\frac{\epsilon^2 n^{\delta}}{4 M^2} \right)$$

and the series on the r.h.s. is convergent. In turn, the Borel-Cantelli lemma implies that $\sum_{i=1}^{n} (\|A_i\| - E\|A_i\|) = O_{a.s.}(n^{(1+\delta)/2})$.

It remains to evaluate $E\|A_i\|$. First, we have that $E\|A_i\| \le 2 E\|\nabla K((u - F(X_i))/a_n, \ldots)\|$. Second, since $K$ is differentiable and of product form $K(u, v) = K_1(u) K_2(v)$, each sub-kernel is of bounded variation and can be written as a difference of two monotone increasing functions. For example, set $K_1 = K_1^a - K_1^b$ and define $K^* := (K_1^a + K_1^b) K_2$. We have

$$\left| \frac{\partial K}{\partial u} \right| \le \left( |(K_1^a)'| + |(K_1^b)'| \right) K_2 = \left( (K_1^a)' + (K_1^b)' \right) K_2 := \frac{\partial K^*}{\partial u}$$

where the equality proceeds from the positivity of the derivatives. As a consequence,

$$E \left| \frac{\partial K}{\partial u}\left( (u - F(X_i))/a_n, \ldots \right) \right| \le E\, \frac{\partial K^*}{\partial u}\left( (u - F(X_i))/a_n, \ldots \right)$$

and similarly for the other partial derivative. The r.h.s. of the previous inequality is, after an integration by parts, of order $a_n^3$ by the results on the kernel estimator of the gradient of the density (see Scott [27] chapter 6). Therefore, $\sum_{i=1}^{n} E\|A_i\| = O(n a_n^3)$. Recollecting all elements, we finally obtain that

$$\Delta_{11} = O_P\left( \frac{\sqrt{n \ln \ln n} + n a_n^3}{\sqrt{n}\, n a_n^3} \right) = O_P\left( \frac{\sqrt{\ln \ln n}}{n a_n^3} + \frac{1}{\sqrt{n}} \right) = o_P\left( \frac{1}{\sqrt{n a_n^2}} \right),$$

$$\Delta_{11} = O_{a.s.}\left( \frac{n^{(1+\delta)/2} + n a_n^3}{n a_n^3} \sqrt{\frac{\ln \ln n}{n}} \right) = O_{a.s.}\left( \sqrt{\frac{\ln \ln n}{n a_n^2}}\, \frac{1}{n^{(1-\delta)/2} a_n^2} + \sqrt{\frac{\ln \ln n}{n}} \right) = o_{a.s.}\left( \sqrt{\frac{\ln \ln n}{n a_n^2}} \right)$$

for $\delta$ small enough ($< 1/3$ for $a_n \simeq n^{-1/6}$). □

5.5 An approximation lemma for $\hat c_n(F_n(x), G_n(y))$ by $\hat c_n(F(x), G(y))$

The lemma of this subsection gives the rate of deviation of the kernel copula density estimator $\hat c_n$ from a varying location $(F_n(x), G_n(y))$ to a fixed location $(F(x), G(y))$.

Lemma 5.5 With the same assumptions as in the preceding lemma, we have

$$\hat c_n(F_n(x), G_n(y)) - \hat c_n(F(x), G(y)) = o_P\left( a_n^2 + \frac{1}{\sqrt{n a_n^2}} \right)$$

$$\hat c_n(F_n(x), G_n(y)) - \hat c_n(F(x), G(y)) = O_{a.s.}\left( \sqrt{\frac{\ln \ln n}{n}} \right)$$

Proof. We proceed similarly as in the preceding lemma. Set

$$\Delta_n(x, y) := \hat c_n(F_n(x), G_n(y)) - \hat c_n(F(x), G(y)) = \frac{1}{n a_n^2} \sum_{i=1}^{n} \Delta'_{i,n}(x, y) \tag{15}$$

with

$$\Delta'_{i,n}(x, y) := K\left( \frac{F_n(x) - F_n(X_i)}{a_n}, \frac{G_n(y) - G_n(Y_i)}{a_n} \right) - K\left( \frac{F(x) - F_n(X_i)}{a_n}, \frac{G(y) - G_n(Y_i)}{a_n} \right)$$

and define

$$Z_n(x, y) := \begin{pmatrix} F_n(x) - F(x) \\ G_n(y) - G(y) \end{pmatrix}.$$

We first express $\Delta'_{i,n}(x, y)$ at a fixed location $(F(x), G(y))$ by a Taylor expansion and by bounding uniformly the second-order terms,

$$\Delta'_{i,n}(x, y) = Z_n^T(x, y)\, \frac{\nabla K}{a_n}\left( \frac{F(x) - F_n(X_i)}{a_n}, \frac{G(y) - G_n(Y_i)}{a_n} \right) + \frac{\|Z_n\|_\infty^2}{a_n^2} R_1 \tag{16}$$

where $R_1$ is uniformly bounded almost surely: $R_1 = O_{a.s.}(1)$. We then go from the data $(F_n(X_i), G_n(Y_i))$ to the pseudo-data $(F(X_i), G(Y_i))$, which are fixed w.r.t. $n$. By a second Taylor expansion,

$$\frac{\nabla K}{a_n}\left( \frac{F(x) - F_n(X_i)}{a_n}, \frac{G(y) - G_n(Y_i)}{a_n} \right) = \frac{\nabla K}{a_n}\left( \frac{F(x) - F(X_i)}{a_n}, \frac{G(y) - G(Y_i)}{a_n} \right) + Z_{i,n}^T\, \frac{\nabla^2 K}{2 a_n^2}\left( \frac{F(x) - F(X_i)}{a_n}, \frac{G(y) - G(Y_i)}{a_n} \right) + \frac{\|Z_n\|_\infty}{a_n^2} R_2 \tag{17}$$

where $R_2 = o_{a.s.}(1)$ uniformly in $i$, $x$ and $y$.
Therefore, plugging (16) and (17) in (15), we get

$$\Delta_n(x, y) = \frac{Z_n^T(x, y)}{n a_n^2} \sum_{i=1}^{n} \frac{\nabla K}{a_n}\left( \frac{F(x) - F(X_i)}{a_n}, \frac{G(y) - G(Y_i)}{a_n} \right) + \frac{Z_n^T(x, y)}{n a_n^2} \sum_{i=1}^{n} Z_{i,n}^T\, \frac{\nabla^2 K}{2 a_n^2}\left( \frac{F(x) - F(X_i)}{a_n}, \frac{G(y) - G(Y_i)}{a_n} \right) + R_3 \frac{\|Z_n\|_\infty^2}{a_n^4}$$

with the remainder term $R_3 = O_{a.s.}(1)$ uniformly. As before, the properties of the kernel (derivative) density estimator (see Scott [27] chapter 6) entail that

$$\frac{1}{n a_n^3} \sum_{i=1}^{n} \nabla K\left( \frac{F(x) - F(X_i)}{a_n}, \frac{G(y) - G(Y_i)}{a_n} \right) = O_P\left( a_n^2 + 1/\sqrt{n a_n^4} \right).$$

Therefore, using (12) and bounding uniformly the Hessian, (15) becomes

$$\Delta_n(x, y) = O_P\left( a_n^2 \|Z_n\|_\infty + \frac{\|Z_n\|_\infty}{\sqrt{n a_n^4}} \right) + O_P\left( \frac{\|Z_n\|_\infty^2}{a_n^4} \right) = o_P\left( a_n^2 + \frac{1}{\sqrt{n a_n^2}} \right).$$

Similarly, one gets with (13) and the strong consistency of the estimator of the gradient of the density that $\Delta_n(x, y) = O_{a.s.}\left( \sqrt{\ln \ln n / n} \right)$. □

References

[1] D. Bosq. Nonparametric statistics for stochastic processes, volume 110 of Lecture Notes in Statistics. Springer-Verlag, New York, second edition, 1998. Estimation and prediction.
[2] S. X. Chen. Beta kernel estimators for density functions. Comput. Statist. Data Anal., 31(2):131–145, 1999.
[3] P. Deheuvels. Conditions nécessaires et suffisantes de convergence ponctuelle presque sûre et uniforme presque sûre des estimateurs de la densité. C. R. Acad. Sci. Paris Sér. A, 278:1217–1220, 1974.
[4] P. Deheuvels. La fonction de dépendance empirique et ses propriétés. Un test non paramétrique d'indépendance. Acad. Roy. Belg. Bull. Cl. Sci. (5), 65(6):274–292, 1979.
[5] P. Deheuvels. A Kolmogorov-Smirnov type test for independence and multivariate samples. Rev. Roumaine Math. Pures Appl., 26(2):213–226, 1981.
[6] L. Devroye and G. Lugosi. Combinatorial methods in density estimation. Springer Series in Statistics. Springer-Verlag, New York, 2001.
[7] J. Fan and Q. Yao. Nonlinear time series. Springer Series in Statistics. Springer-Verlag, New York, second edition, 2005. Nonparametric and parametric methods.
[8] J. Fan, Q. Yao, and H. Tong. Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika, 83(1):189–206, 1996.
[9] J.-D. Fermanian. Goodness-of-fit tests for copulas. J. Multivariate Anal., 95(1):119–152, 2005.
[10] J.-D. Fermanian and O. Scaillet. Nonparametric estimation of copulas for time series. Journal of Risk, 5(4):25–54, 2003.
[11] J.-D. Fermanian, D. Radulović, and M. Wegkamp. Weak convergence of empirical copula processes. Bernoulli, 10(5):847–860, 2004.
[12] T. Gasser and H.-G. Müller. Kernel estimation of regression functions. In Smoothing techniques for curve estimation (Proc. Workshop, Heidelberg, 1979), volume 757 of Lecture Notes in Math., pages 23–68. Springer, Berlin, 1979.
[13] I. Gijbels and J. Mielniczuk. Estimating the density of a copula function. Comm. Statist. Theory Methods, 19(2):445–464, 1990.
[14] J. Gustafsson, M. Hagmann, J. P. Nielsen, and O. Scaillet. Local transformation kernel density estimation of loss distributions. Forthcoming in Journal of Business and Economic Statistics, 2007.
[15] L. Györfi and M. Kohler. Nonparametric estimation of conditional distributions. IEEE Trans. Inform. Theory, 53(5):1872–1879, 2007.
[16] P. D. Hoff. Extending the rank likelihood for semiparametric copula estimation. Annals Appl. Stats., 1(1):265–283, 2007.
[17] R. J. Hyndman, D. M. Bashtannyk, and G. K. Grunwald. Estimating and visualizing conditional densities. J. Comput. Graph. Statist., 5(4):315–336, 1996.
[18] R. J. Hyndman and Q. Yao. Nonparametric estimation and symmetry tests for conditional density functions. J. Nonparametr. Stat., 14(3):259–278, 2002.
[19] H. Joe. Multivariate models and dependence concepts, volume 73 of Monographs on Statistics and Applied Probability. Chapman & Hall, London, 1997.
[20] C. Lacour. Adaptive estimation of the transition density of a Markov chain. Ann. Inst. H. Poincaré Probab. Statist., 43(5):571–597, 2007.
[21] È. A. Nadaraya. Nonparametric estimation of probability densities and regression curves, volume 20 of Mathematics and its Applications (Soviet Series). Kluwer Academic Publishers Group, Dordrecht, 1989. Translated from the Russian by Samuel Kotz.
[22] E. Parzen. On estimation of a probability density function and mode. Ann. Math. Statist., 33:1065–1076, 1962.
[23] B. L. S. Prakasa Rao. Nonparametric functional estimation. Probability and Mathematical Statistics. Academic Press Inc. [Harcourt Brace Jovanovich Publishers], New York, 1983.
[24] M. B. Priestley and M. T. Chao. Non-parametric function fitting. J. Roy. Statist. Soc. Ser. B, 34:385–392, 1972.
[25] M. Rosenblatt. Remarks on some nonparametric estimates of a density function. Ann. Math. Statist., 27:832–837, 1956.
[26] M. Rosenblatt. Conditional probability density and regression estimators. In Multivariate Analysis, II (Proc. Second Internat. Sympos., Dayton, Ohio, 1968), pages 25–31. Academic Press, New York, 1969.
[27] D. W. Scott. Multivariate density estimation. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. John Wiley & Sons Inc., New York, 1992. Theory, practice, and visualization, A Wiley-Interscience Publication.
[28] G. R. Shorack and J. A. Wellner. Empirical processes with applications to statistics. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, 1986.
[29] M. Sklar. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris, 8:229–231, 1959.
[30] C. J. Stone. Optimal rates of convergence for nonparametric estimators. Ann. Statist., 8(6):1348–1360, 1980.
[31] W. Stute. A law of the logarithm for kernel density estimators. Ann. Probab., 10(2):414–422, 1982.
[32] W. Stute. Asymptotic normality of nearest neighbor regression function estimates. Ann. Statist., 12(3):917–926, 1984.
[33] W. Stute. Conditional empirical processes. Ann. Statist., 14(2):638–647, 1986.
[34] W. Stute. On almost sure convergence of conditional empirical distribution functions. Ann. Probab., 14(3):891–901, 1986.
[35] A. W. van der Vaart. Asymptotic statistics, volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.
[36] A. W. van der Vaart and J. A. Wellner. Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York, 1996. With applications to statistics.
