Discussion of: Brownian distance covariance

The Annals of Applied Statistics 2009, Vol. 3, No. 4, 1266–1269. DOI: 10.1214/09-AOAS312A. Main article DOI: 10.1214/09-AOAS312. © Institute of Mathematical Statistics, 2009.

By Peter J. Bickel$^1$ and Ying Xu$^1$

University of California, Berkeley

Székely and Rizzo present a new interesting measure of correlation. The idea of using $\int |\varphi_n(u, v) - \varphi_n^{(1)}(u)\varphi_n^{(2)}(v)|^2 \, d\mu(u, v)$, where $\varphi_n$, $\varphi_n^{(1)}$, $\varphi_n^{(2)}$ are the empirical characteristic functions of a sample $(X_i, Y_i)$, $i = 1, \ldots, n$, of independent copies of $X$ and $Y$, is not so novel. A. Feuerverger considered such measures in a series of papers [4]. Aiyou Chen and I have actually analyzed such a measure for estimation in [3] in connection with ICA. However, the choice of $\mu(\cdot, \cdot)$ which makes the measure scale free, the extension to $X \in R^p$, $Y \in R^q$ and its identification with the Brownian distance covariance is new, surprising and interesting.

There are three other measures available, for general $p$, $q$:

1. The canonical correlation $\rho$ between $X$ and $Y$.
2. The rank correlation $r$ (for $p = q = 1$) and its canonical correlation generalization.
3. The Rényi correlation $R$.

All vanish along with the Brownian distance (BD) correlation in the case of independence and all are scale free. The Brownian distance and Rényi covariance are the only ones which vanish iff $X$ and $Y$ are independent. However, the three classical measures also give a characterization of total dependence. If $|\rho| = 1$, $X$ and $Y$ must be linearly related; if $|r| = 1$, $Y$ must be a monotone function of $X$; and if $R = 1$, then either there exist nontrivial functions $f$ and $g$ such that $P(f(X) = g(Y)) = 1$ or at least there is a sequence of such nontrivial functions $f_n$, $g_n$ of variance 1 such that $E(f_n(X) - g_n(Y))^2 \to 0$.
In this respect, by Theorem 4 of Székely and Rizzo, for the common $p = q = 1$ case, BD correlation does not differ from Pearson correlation.

$^1$ Supported in part by NSF Grant DMS-09-06808.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Statistics, 2009, Vol. 3, No. 4, 1266–1269. This reprint differs from the original in pagination and typographic detail.

Although we found the examples varied and interesting and the computation of p-values for the BD covariance effective, we are not convinced that the comparison with the rank and Pearson correlations is quite fair, and think a comparison to $R$ is illuminating. Intuitively, the closer the form of observed dependence is to that exhibited for the extremal value of the statistic, the more power one should expect. Example 1 has $Y$ as a distinctly nonmonotone function of $X$ plus noise, a situation where we would expect the rank correlation to be weak and, similarly, the other examples correspond to nonlinear relationships between $X$ and $Y$ in which we would expect the Pearson correlation to perform badly. In general, for goodness of fit, it is important to have statistics with power in directions which are plausible departures; see Bickel, Ritov and Stoker [1].

Ying Xu is studying, in the context of high dimensional data, a version of empirical Rényi correlation different from that of Breiman and Friedman [2]. Let $f_1, f_2, \ldots$ be an orthonormal basis of $L_2(P_X)$ and $g_1, g_2, \ldots$ an orthonormal basis of $L_2(P_Y)$, where $L_2(P_X)$ is the Hilbert space of functions $f$ such that $E f^2(X) < \infty$ and similarly for $L_2(P_Y)$. Let the $(K, L)$ approximate Rényi correlation be defined as

$$\max\left\{ \operatorname{corr}\left( \sum_{k=1}^{K} \alpha_k f_k(X), \ \sum_{l=1}^{L} \beta_l g_l(Y) \right) \right\},$$

where corr is Pearson correlation.
This is seen to be the canonical correlation of $f(X)$ and $g(Y)$, where $f \equiv (f_1, \ldots, f_K)^T$, $g \equiv (g_1, \ldots, g_L)^T$, and is easily calculated as a generalized eigenvalue problem. The empirical $(K, L)$ correlation is just the solution of the corresponding empirical problem where the variance covariance matrices $\operatorname{Var} f(X) \equiv E[f_c(X) f_c^T(X)]$, where $f_c(X) \equiv f(X) - E f(X)$, $\operatorname{Var} g(Y)$ and $\operatorname{Cov}(f(X), g(Y))$ are replaced by their empirical counterparts. For $K, L \to \infty$, the $(K, L)$ correlation tends to the Rényi correlation,

$$R \equiv \max\{\operatorname{corr}(f(X), g(Y)) : f \in L_2(P_X),\ g \in L_2(P_Y)\}.$$

For the empirical $(K, L)$ correlation, $K$ and $L$ have to be chosen in a data determined way, although evidently each $K$, $L$ pair provides a test statistic. An even more important choice is that of the $f_k$ and $g_l$ (which need not be orthonormal but need only have a linear span dense in their corresponding Hilbert spaces). We compare the performance of these test statistics in the first of the Székely–Rizzo examples in the next section.

1. Comparison on data example. Here we will investigate the performance of the standard ACE estimate of the Rényi correlation and a version of $(K, L)$ correlation in the first of the Székely–Rizzo examples.

Fig. 1.

Table 1

                                 K = 2, L = 2    K = 3, L = 4    K = 5, L = 5
Estimated (K, L) correlation     0.8160803       0.9170764       0.977163
p-value                          0.002           0.002           ≤ 0.001

Breiman and Friedman [2] provided an algorithm, known as alternating conditional expectations (ACE), for estimating the transformations $f_0$, $g_0$ and $R$ itself. The estimated Rényi correlation is very close to 1 (0.9992669) in this case, as expected since $Y$ is a function of $X$ plus some noise. Figure 1 shows the original relationship between $X$ and $Y$ on the left and the relationship between the estimated transformations $\hat{f}$ and $\hat{g}$ on the right.
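The empirical $(K, L)$ correlation described above can be sketched in a few lines. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function name `kl_correlation` is hypothetical, the inputs are assumed to be matrices whose columns hold the basis functions evaluated at the sample points, and the largest canonical correlation is obtained from a QR/SVD formulation, which gives the same value as the generalized eigenvalue problem when the centered matrices have full column rank.

```python
import numpy as np

def kl_correlation(F, G):
    """Largest empirical canonical correlation between the columns of F and G.

    F is an (n, K) array with entries f_k(X_i); G is an (n, L) array with
    entries g_l(Y_i). Both are assumed to have full column rank after
    centering (an assumption of this sketch).
    """
    F = np.asarray(F, dtype=float)
    G = np.asarray(G, dtype=float)
    # Center each basis function at its sample mean.
    Fc = F - F.mean(axis=0)
    Gc = G - G.mean(axis=0)
    # Orthonormal bases for the two column spaces.
    Qf, _ = np.linalg.qr(Fc)
    Qg, _ = np.linalg.qr(Gc)
    # Singular values of Qf^T Qg are the canonical correlations.
    s = np.linalg.svd(Qf.T @ Qg, compute_uv=False)
    return float(min(s[0], 1.0))  # guard against rounding above 1

# Illustrative use on synthetic data with a monomial basis (both assumptions).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x**2 + 0.1 * rng.normal(size=500)
F = np.column_stack([x, x**2])
G = np.column_stack([y, y**2])
r_kl = kl_correlation(F, G)
```

With $K = L = 1$ the function reduces to the absolute value of the Pearson correlation, which gives a quick sanity check on any implementation.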
Having computed $\hat{R}$, the estimate of $R$, we compute its significance under the null hypothesis of independence using the permutation distribution just as Székely and Rizzo did. The p-value is $\le 0.001$, which is extremely small as it should be.

Next, we compute the empirical $(K, L)$ correlation. Given that the proposed nonlinear model is

$$y = \frac{\beta_1}{\beta_2} \exp\left\{ -\frac{(x - \beta_3)^2}{2\beta_2^2} \right\} + \varepsilon,$$

we chose, as an orthonormal basis with respect to the Lebesgue measure, one built from the Hermite polynomials, defined as

$$H_n(x) = (-1)^n e^{x^2/2} \frac{d^n}{dx^n} e^{-x^2/2},$$

for both $X$ and $Y$. We take $f_k(x) = g_k(x) = e^{-x^2/4} H_k(x)$.

Table 1 gives the computation results for different combinations of $K$ and $L$. As before, the p-value is computed by a permutation test, based on 999 replicates. The value, not surprisingly, is close to $\hat{R}$, for $K = L = 5$.

REFERENCES

[1] Bickel, P. J., Ritov, Y. and Stoker, T. M. (2006). Tailor-made tests for goodness of fit to semiparametric hypotheses. Ann. Statist. 34 721–741. MR2281882
[2] Breiman, L. and Friedman, J. H. (1985). Estimating optimal transformations for multiple regression and correlation. J. Amer. Statist. Assoc. 80 580–598. MR0803258
[3] Chen, A. and Bickel, P. J. (2005). Consistent independent component analysis and prewhitening. IEEE Trans. Signal Process. 10 3625–3632. MR2239886
[4] Feuerverger, A. and Mureika, R. A. (1977). The empirical characteristic function and its applications. Ann. Statist. 5 88–97. MR0428584

Department of Statistics
367 Evans Hall
Berkeley, California 94710-3860
USA
E-mail: bickel@stat.berkeley.edu
yingxu@stat.berkeley.edu
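As a rough sketch of how a permutation p-value for a Hermite-based $(K, L)$ correlation might be computed, consider the following. Several choices here are assumptions rather than details given in the text: the helper names are hypothetical, the basis index is taken to run over $k = 1, \ldots, K$, the data are standardized before the basis is evaluated, the p-value uses the $(1 + \text{count})/(B + 1)$ convention (consistent with a smallest attainable value of 0.001 for $B = 999$), and the data are synthetic rather than those of the Székely–Rizzo example. NumPy's `hermevander` evaluates the probabilists' Hermite polynomials, which match the definition of $H_n$ above.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander

def hermite_features(x, K):
    """Columns exp(-x^2/4) * H_k(x) for k = 1..K (index range is an assumption)."""
    x = np.asarray(x, dtype=float)
    V = hermevander(x, K)[:, 1:]  # drop H_0, keep H_1..H_K
    return V * np.exp(-x**2 / 4.0)[:, None]

def top_canonical_corr(F, G):
    """Largest canonical correlation via QR and SVD (assumes full column rank)."""
    Qf, _ = np.linalg.qr(F - F.mean(axis=0))
    Qg, _ = np.linalg.qr(G - G.mean(axis=0))
    s = np.linalg.svd(Qf.T @ Qg, compute_uv=False)
    return float(min(s[0], 1.0))

def permutation_pvalue(x, y, K, L, n_perm=999, seed=0):
    """Observed (K, L) correlation and its permutation p-value under independence."""
    rng = np.random.default_rng(seed)
    # Standardizing is an assumption of this sketch; it keeps exp(-x^2/4) well scaled.
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    F = hermite_features(zx, K)
    G = hermite_features(zy, L)
    observed = top_canonical_corr(F, G)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(zy))
        # Permuting the rows of G is the same as permuting the Y sample.
        if top_canonical_corr(F, G[perm]) >= observed:
            count += 1
    return observed, (1 + count) / (n_perm + 1)

# Synthetic example: Y is a nonmonotone function of X plus noise.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = np.exp(-x**2 / 2) + 0.1 * rng.normal(size=200)
r_hat, p_value = permutation_pvalue(x, y, K=2, L=2)
```

Because the permutations only reorder the rows of the $Y$ features, the basis functions need to be evaluated once, which keeps the 999 replicates cheap.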
