Discussion of "Treelets—An adaptive multi-scale basis for sparse unordered data" [arXiv:0707.0481]

Author: **Robert Tibshirani (Stanford University)**

The Annals of Applied Statistics 2008, Vol. 2, No. 2, 482–483. DOI: 10.1214/07-AOAS137D. Main article DOI: 10.1214/07-AOAS137. © Institute of Mathematical Statistics, 2008.

This is a very interesting paper on an important topic: the problem of extracting features in an unsupervised way from a dataset. There is growing evidence that unsupervised feature extraction can provide an effective set of features for supervised learning: see, for example, the interesting recent work on learning algorithms for Boltzmann machines [Hinton, Osindero and Teh (2006)]. The ideas in this paper are exciting: treelets are a neat construction that combines clustering and wavelets, and they are simple enough to be theoretically tractable. The connection to the latent variable model is also interesting: this kind of model is also the basis of supervised principal components, a method that I co-developed recently [Bair et al. (2006)] for regression and survival analysis in the p > N setting.

I have no practical experience with treelets, so my remaining comments will be brief and mostly in the form of questions for the authors. A much simpler approach to this problem would be to hierarchically cluster the predictors, and then take the average at every internal node of the dendrogram. Let's call this the "simple averaging" method. As noted by the authors, this has already been proposed in the literature, for example, in the "Tree-harvesting" procedure. In this approach we keep all of the original predictors and all of the internal node averages, and so end up with an over-complete basis of 2p basis functions. How are treelets different from simple averaging?
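The "simple averaging" construction just described is easy to make concrete. The sketch below is my own illustration, not code from the treelets paper: it clusters the predictors with average linkage and appends the mean of the predictors under every internal node of the dendrogram, so the p original columns plus p − 1 internal-node averages give the over-complete basis of roughly 2p features.

```python
# Minimal sketch of the "simple averaging" method (my own illustration):
# hierarchically cluster the predictors, then average the predictors
# under every internal node of the dendrogram.
import numpy as np
from scipy.cluster.hierarchy import linkage

def simple_averaging_basis(X):
    """X: (n, p) data matrix. Returns an (n, 2p - 1) feature matrix:
    the p original predictors plus one average per internal node."""
    n, p = X.shape
    # Cluster the *columns* (predictors) with average linkage.
    Z = linkage(X.T, method="average", metric="euclidean")
    # members[k] = indices of the original predictors under node k.
    members = {i: [i] for i in range(p)}
    features = [X[:, j] for j in range(p)]
    for step, (a, b, _, _) in enumerate(Z):
        node = p + step
        members[node] = members[int(a)] + members[int(b)]
        features.append(X[:, members[node]].mean(axis=1))
    return np.column_stack(features)

X = np.random.default_rng(0).normal(size=(30, 8))
B = simple_averaging_basis(X)
print(B.shape)  # (30, 15): 8 predictors + 7 internal-node averages
```

Note that the root node contains every predictor, so the last column is simply the grand average of all p variables.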
Treelets do an orthogonalization after each node merge, but does this change the clustering in a material way? What advantage is there to the orthogonal basis delivered by treelets? After all, it looks like the resulting linear combinations of variables are not uncorrelated. Does the simple averaging method perform as well as treelets in the kind of examples in the paper? Do the authors' theorems apply to the simple averaging method as well, or are treelets uniquely good in their estimation of the components of a latent variable model?

The contrast between treelets and simple averaging is analogous to the contrast between wavelets and basis pursuit [Chen, Donoho and Saunders (1998)]. The former is an orthogonal basis while the latter is over-complete; when fitting is done with an L1 (lasso) penalty, the over-complete basis can provide a very good predictive model.

One small point: hierarchical clustering is usually done with average linkage between pairs of predictors. A variation, commonly used in genomics and sometimes called Eisen clustering (since it is implemented in Eisen's Cluster program), uses instead the distance (or correlation) between centroids. The treelet construction looks more like Eisen clustering. The point is that one could apply Eisen clustering, and then simply average the predictors in every internal node.

Received December 2007; revised December 2007.

REFERENCES

Bair, E., Hastie, T., Paul, D. and Tibshirani, R. (2006). Prediction by supervised principal components. J. Amer. Statist. Assoc. 101 119–137. MR2252436

Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33–61. MR1639094

Hinton, G., Osindero, S. and Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Comput. 18 1527–1554. MR2224485

Departments of Health Research & Policy, and Statistics
Stanford University
Stanford, California 94305
USA
E-mail: tibs@stanford.edu
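The closing point, that average linkage and centroid ("Eisen") linkage are different linkage rules and so can merge predictors in a different order, can be illustrated with a small sketch. This is my own example, assuming scipy's `average` and `centroid` linkage methods stand in for the two rules discussed:

```python
# Compare average linkage with centroid ("Eisen"-style) linkage on the
# same predictors; the two rules can produce different merge orders,
# and hence different internal-node averages. My own illustration.
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 6))                    # 50 samples, 6 predictors
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=50)   # make columns 0 and 3 nearly identical

avg = linkage(X.T, method="average")    # pairwise average linkage
cen = linkage(X.T, method="centroid")   # distance between cluster centroids

# Columns 0 and 1 of each row give the pair of nodes merged at that step.
print("average linkage merges:\n", avg[:, :2].astype(int))
print("centroid linkage merges:\n", cen[:, :2].astype(int))
```

Both rules agree on the first merge here (the nearly identical pair 0 and 3), but on less clear-cut data the later merges, and therefore the node averages, can diverge between the two linkages.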
