Restricted Eigenvalue Conditions on Subgaussian Random Matrices

Shuheng Zhou
Seminar für Statistik, Department of Mathematics, ETH Zürich, CH-8092, Switzerland

December 20, 2009

Abstract

It is natural to ask: what kinds of matrices satisfy the Restricted Eigenvalue (RE) condition? In this paper, we associate the RE condition (Bickel-Ritov-Tsybakov 09) with the complexity of a subset of the sphere in $\R^p$, where $p$ is the dimensionality of the data, and show that a class of random matrices with independent rows, but not necessarily independent columns, satisfies the RE condition when the sample size is above a certain lower bound. Here we explicitly introduce an additional covariance structure to the class of random matrices that are by now known to satisfy the Restricted Isometry Property as defined in Candès and Tao 05 (and hence the RE condition), in order to compose a broader class of random matrices for which the RE condition holds. In this case, tools from geometric functional analysis for characterizing the intrinsic low-dimensional structures associated with the RE condition have been crucial in analyzing the sample complexity and understanding its statistical implications for high dimensional data.

Keywords. High dimensional data, Statistical estimation, $\ell_1$ minimization, Sparsity, Lasso, Dantzig selector, Restricted Isometry Property, Restricted Eigenvalue conditions, Subgaussian random matrices

1 Introduction

In a typical high dimensional setting, the number of variables $p$ is much larger than the number of observations $n$. This challenging setting appears in linear regression, signal recovery, covariance selection in graphical modeling, and sparse approximations. In this paper, we consider recovering $\beta \in \R^p$ in the following linear model:
$$Y = X\beta + \epsilon, \quad (1.1)$$
where $X$ is an $n \times p$ design matrix, $Y$ is a vector of noisy observations, and $\epsilon$ is the noise term.
The design matrix is treated as either fixed or random. We assume throughout this paper that $p \ge n$ (i.e. high-dimensional) and $\epsilon \sim N(0, \sigma^2 I_n)$. Throughout this paper, we assume that the columns of $X$ have $\ell_2$ norms of the order of $\sqrt{n}$, which holds with overwhelming probability when $X$ is a random design of the kind that we shall consider.

The restricted eigenvalue (RE) conditions as formalized by Bickel et al. (2009) are among the weakest, and hence the most general, conditions in the literature imposed on the Gram matrix in order to guarantee nice statistical properties for the Lasso and the Dantzig selector; for example, under this condition, they derived bounds on the $\ell_2$ prediction loss and on the $\ell_p$ loss, where $1 \le p \le 2$, for estimating the parameters for both the Lasso and the Dantzig selector in both linear regression and nonparametric regression models. From now on, we refer to their conditions in general as the RE condition.

Before we elaborate upon the RE condition, we need some notation and some more definitions to put this condition in perspective. Consider the linear regression model in (1.1). For a chosen penalization parameter $\lambda_n \ge 0$, regularized estimation with the $\ell_1$-norm penalty, also known as the Lasso (Tibshirani, 1996) or the Basis Pursuit (Chen et al., 1998), refers to the following convex optimization problem
$$\hat\beta = \arg\min_{\beta} \frac{1}{2n}\|Y - X\beta\|_2^2 + \lambda_n\|\beta\|_1, \quad (1.2)$$
where the scaling factor $1/(2n)$ is chosen for convenience. The Dantzig selector (Candès and Tao, 2007), for a given $\lambda_n \ge 0$, is defined as
$$(DS) \quad \arg\min_{\hat\beta \in \R^p} \|\hat\beta\|_1 \quad \text{subject to} \quad \left\|\frac{1}{n}X^T(Y - X\hat\beta)\right\|_\infty \le \lambda_n. \quad (1.3)$$
For an integer $1 \le s \le p/2$, we refer to a vector $\beta \in \R^p$ with at most $s$ non-zero entries as an $s$-sparse vector. Let $\beta_T \in \R^{|T|}$ be the subvector of $\beta \in \R^p$ confined to $T$.
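The Lasso objective in (1.2) can be minimized by plain proximal gradient descent (ISTA), since the $\ell_1$ penalty has a closed-form proximal operator (soft-thresholding). The following NumPy sketch is an illustration only, not the estimator analysis of this paper; the solver, step size, and toy data are assumptions for the example.

```python
import numpy as np

def lasso_ista(X, Y, lam, n_iter=500):
    """Minimize (1/(2n))||Y - X b||_2^2 + lam * ||b||_1 as in (1.2)
    by proximal gradient descent (ISTA); a minimal sketch."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n          # Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - Y) / n           # gradient of (1/(2n))||Y - Xb||_2^2
        z = b - grad / L
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return b

rng = np.random.default_rng(0)
n, p, s = 100, 30, 3
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 5.0                                 # an s-sparse target
Y = X @ beta + 0.1 * rng.standard_normal(n)
b_hat = lasso_ista(X, Y, lam=0.1)
print(np.linalg.norm(b_hat - beta, 2))
```

With a well-conditioned random design and a small noise level, the $\ell_2$ estimation error is small, in line with the rates recalled in Section 3.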
One of the common properties of the Lasso and the Dantzig selector is the following: for an appropriately chosen $\lambda_n$, for the vector $\upsilon := \hat\beta - \beta$, where $\beta$ is an $s$-sparse vector and $\hat\beta$ is the solution from either the Lasso or the Dantzig selector, it holds with high probability (cf. Section C) that
$$\|\upsilon_{I^c}\|_1 \le k_0 \|\upsilon_I\|_1, \quad (1.4)$$
where $I \subset \{1, \ldots, p\}$, $|I| \le s$, is the support of $\beta$, $k_0 = 1$ for the Dantzig selector, and $k_0 = 3$ for the Lasso; see Bickel et al. (2009) and Candès and Tao (2007) in case the columns of $X$ have $\ell_2$ norm $\sqrt{n}$. We use $\upsilon_{T_0}$ to always represent the subvector of $\upsilon \in \R^p$ confined to $T_0$, which corresponds to the locations of the $s$ largest coefficients of $\upsilon$ in absolute value; then (1.4) implies that (see Proposition 1.4)
$$\|\upsilon_{T_0^c}\|_1 \le k_0 \|\upsilon_{T_0}\|_1. \quad (1.5)$$
We are now ready to introduce the Restricted Eigenvalue assumption that is formalized in Bickel et al. (2009). In Section 3, for the purpose of completeness, we show the convergence rates on $\ell_p$ for $p = 1, 2$ for both the Lasso and the Dantzig selector under this condition.

Assumption 1.1. (Restricted Eigenvalue assumption $RE(s, k_0, X)$ (Bickel et al., 2009)) For some integer $1 \le s \le p$ and a positive number $k_0$, the following holds:
$$\frac{1}{K(s, k_0, X)} := \min_{J_0 \subseteq \{1,\ldots,p\},\, |J_0| \le s} \;\; \min_{\upsilon \ne 0,\, \|\upsilon_{J_0^c}\|_1 \le k_0\|\upsilon_{J_0}\|_1} \frac{\|X\upsilon\|_2}{\sqrt{n}\,\|\upsilon_{J_0}\|_2} > 0. \quad (1.6)$$

(Footnote 1: We note that the authors have defined two such conditions, which we show to be equivalent except for the constant defined within each definition; see Proposition A.1 and Proposition A.2 in Section A.3 for details.)

Definition 1.1. Throughout this paper, we say that a vector $\upsilon \in \R^p$ is admissible to (1.6), or equivalently to (1.12), for a given $k_0 > 0$ as defined therein, if $\upsilon \ne 0$ and for some $J_0 \subseteq \{1, \ldots, p\}$ such that $|J_0| \le s$, it holds that $\|\upsilon_{J_0^c}\|_1 \le k_0 \|\upsilon_{J_0}\|_1$.
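The implication from (1.4) to (1.5) can be sanity-checked numerically: if the cone constraint holds on the support $I$, it must also hold with respect to the top-$s$ coordinates $T_0$, since $T_0$ maximizes the $\ell_1$ mass among all size-$s$ subsets. The sketch below, with randomly generated cone vectors (an illustrative construction, not from the paper), should find zero violations.

```python
import numpy as np

rng = np.random.default_rng(8)
p, s, k0 = 20, 3, 3.0
violations = 0
for _ in range(1000):
    # Build upsilon satisfying (1.4) on a random support I with |I| = s.
    I = rng.choice(p, size=s, replace=False)
    upsilon = np.zeros(p)
    upsilon[I] = 5.0 * rng.standard_normal(s)
    rest = rng.standard_normal(p)
    rest[I] = 0.0
    budget = rng.uniform() * k0 * np.sum(np.abs(upsilon[I]))   # ell_1 budget off the support
    rest *= budget / (np.sum(np.abs(rest)) + 1e-12)
    upsilon += rest
    # Check (1.5): the cone constraint w.r.t. T0, the top-s coordinates.
    T0 = np.argsort(-np.abs(upsilon))[:s]
    mask = np.ones(p, dtype=bool)
    mask[T0] = False
    if np.sum(np.abs(upsilon[mask])) > k0 * np.sum(np.abs(upsilon[T0])) + 1e-9:
        violations += 1
print(violations)   # (1.4) implies (1.5), so this is 0
```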
Now it is clear that if $\upsilon$ is admissible to (1.6), or equivalently to (1.12), then (1.5) holds (cf. Proposition 1.4). If $RE(s, k_0, X)$ is satisfied with $k_0 \ge 1$, then the square submatrices of size $\le 2s$ of $X^T X / n$ are necessarily positive definite (see Bickel et al. (2009)). We note the "universality" of this condition, as it is not tailored to any particular set $J_0$. We also note that, given such a universality condition, it is sufficient to check whether, for all $\upsilon \ne 0$ that are admissible to (1.6) and for $K(s, k_0, X) > 0$, the following inequality
$$\frac{\|X\upsilon\|_2}{\sqrt{n}} \ge \frac{\|\upsilon_{T_0}\|_2}{K(s, k_0, X)} > 0 \quad (1.7)$$
holds, where $T_0$ corresponds to the locations of the $s$ largest coefficients of $\upsilon$ in absolute value, as (1.7) is both necessary and sufficient to guarantee that (1.6) holds; see Proposition 1.4 for details.

A special class of design matrices that satisfy the RE condition are random design matrices. This is shown in a large body of work in the high dimensional setting, for example (Candès et al., 2006; Candès and Tao, 2005, 2007; Baraniuk et al., 2008; Mendelson et al., 2008; Adamczak et al., 2009), which shows that a uniform uncertainty principle (UUP, a condition that is stronger than the RE condition, see Bickel et al. (2009)) holds for "generic" or random design matrices for very significant values of $s$; roughly speaking, the UUP holds when the $2s$-restricted isometry constant $\theta_{2s}$ is small, which we now define. Let $X_T$, where $T \subset \{1, \ldots, p\}$, be the $n \times |T|$ submatrix obtained by extracting the columns of $X$ indexed by $T$.

Definition 1.2. (Candès and Tao, 2005) For each integer $s = 1, 2, \ldots$, the $s$-restricted isometry constant $\theta_s$ of $X$ is the smallest quantity such that
$$(1 - \theta_s)\|c\|_2^2 \le \|X_T c\|_2^2 / n \le (1 + \theta_s)\|c\|_2^2, \quad (1.8)$$
for all $T \subset \{1, \ldots, p\}$ with $|T| \le s$ and all coefficient sequences $(c_j)_{j \in T}$.
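For a very small $p$, the constant $\theta_s$ of Definition 1.2 can be computed exactly by enumerating all column subsets: on each subset $T$, the tightest constants in (1.8) are the extreme eigenvalues of the Gram matrix $X_T^T X_T / n$. The brute-force sketch below is exponential in $p$ and is meant only as an illustration of the definition.

```python
import numpy as np
from itertools import combinations

def restricted_isometry_constant(X, s):
    """Brute-force theta_s from (1.8): the smallest theta such that
    (1-theta)||c||^2 <= ||X_T c||^2/n <= (1+theta)||c||^2 for all |T| <= s.
    Exponential in p -- illustration only."""
    n, p = X.shape
    theta = 0.0
    for size in range(1, s + 1):
        for T in combinations(range(p), size):
            G = X[:, T].T @ X[:, T] / n          # Gram matrix of the submatrix
            eig = np.linalg.eigvalsh(G)
            theta = max(theta, abs(eig[0] - 1.0), abs(eig[-1] - 1.0))
    return theta

rng = np.random.default_rng(1)
n, p = 200, 8
X = rng.standard_normal((n, p))                   # Gaussian random ensemble
theta2 = restricted_isometry_constant(X, s=2)
print(theta2)
```

For a Gaussian ensemble with $n$ much larger than $s \log p$, the computed $\theta_2$ is small, consistent with the UUP results cited above.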
It is well known that for a random matrix the UUP holds for $s = O(n/\log(p/n))$ with i.i.d. Gaussian random variables (that is, the Gaussian random ensemble, subject to normalization of columns), the Bernoulli ensemble, and in general the subgaussian ensembles (Baraniuk et al., 2008; Mendelson et al., 2008) (cf. Theorem 2.5). Recently, it was shown (Adamczak et al., 2009) that the UUP holds for $s = O(n/\log^2(p/n))$ when $X$ is a random matrix composed of columns that are independent isotropic vectors with log-concave densities. Hence this setup only requires $\Theta(\log(p/n))$ or $\Theta(\log^2(p/n))$ observations per nonzero value in $\beta$, where $\Theta$ hides a very small constant, when $n$ is a non-negligible fraction of $p$, in order to perform accurate statistical estimation; we call this level of sparsity the linear sparsity.

The main purpose of this paper is to extend the family of random matrices from the i.i.d. subgaussian ensemble $\Psi$ (cf. (1.10)), which is now well known to satisfy the UUP condition, and hence the RE condition, under linear sparsity, to a larger family of random matrices $X := \Psi\Sigma^{1/2}$, where $\Sigma$ is assumed to behave sufficiently nicely in the sense that it satisfies certain restricted eigenvalue conditions to be defined in Section 1.1. Thus we have explicitly introduced the additional covariance structure $\Sigma$ to the columns of $\Psi$ in generating $X$. In Theorem 1.6, we show that $X$ satisfies the RE condition with overwhelming probability once we have $n \ge C s \log(cp/s)$, where $c$ is an absolute constant and $C$ depends on the restricted eigenvalues of $\Sigma$ (cf. (1.19)), when $\Sigma$ satisfies the restricted eigenvalue assumption to be specified in Section 1.1.
We believe such results can be extended to other cases: for example, when $X$ is the composition of a random Fourier ensemble, or randomly sampled rows of orthonormal matrices; see for example Candès and Tao (2006, 2007). Finally, we show rate of convergence results for the Lasso and the Dantzig selector given such random matrices. Although such results are almost entirely known, we provide a complete analysis for a self-contained presentation. Given these rates of convergence (cf. Theorem 3.1 and Theorem 3.2), one can exploit thresholding algorithms to adjust the bias and get rid of excessive variables selected by an initial estimator relying on $\ell_1$-regularized minimization functions, for example the Lasso or the Dantzig selector; under the UUP or the RE type of conditions, such procedures are shown to select a sparse model which contains the set of variables in $\beta$ that are significant in absolute value; in addition, one can then conduct an ordinary least squares regression on such a sparse model to obtain a final estimator whose bias is significantly reduced compared to the initial estimators. Such algorithms are proposed and analyzed in a series of papers, for example Candès and Tao (2007); Meinshausen and Yu (2009); Wasserman and Roeder (2009); Zhou (2009).

1.1 Restricted eigenvalue assumption for a random design

We will define the family of random matrices that we consider and the restricted eigenvalue assumption that we impose on such a random design. We need some more definitions.

Definition 1.3. Let $Y$ be a random vector in $\R^p$; $Y$ is called isotropic if for every $y \in \R^p$, $\E|\langle Y, y\rangle|^2 = \|y\|_2^2$, and is $\psi_2$ with a constant $\alpha$ if for every $y \in \R^p$,
$$\|\langle Y, y\rangle\|_{\psi_2} := \inf\{t : \E\exp(\langle Y, y\rangle^2/t^2) \le 2\} \le \alpha\|y\|_2. \quad (1.9)$$

The important examples of isotropic, subgaussian vectors are the Gaussian random vector $Y = (h_1, \ldots
, h_p)$, where the $h_i$, for all $i$, are independent $N(0, 1)$ random variables, and the random vector $Y = (\varepsilon_1, \ldots, \varepsilon_p)$, where the $\varepsilon_i$, for all $i$, are independent, symmetric $\pm 1$ Bernoulli random variables. A subgaussian or $\psi_2$ operator is a random operator $\Gamma : \R^p \to \R^n$ of the form
$$\Gamma = \sum_{i=1}^n \langle \Psi_i, \cdot\rangle e_i, \quad (1.10)$$
where $e_1, \ldots, e_n$ is the canonical basis of $\R^n$ and $\Psi_1, \ldots, \Psi_n$ are independent copies of an isotropic $\psi_2$ vector $\Psi_0$ on $\R^p$. Note that throughout this paper, $\Gamma$ is represented by a random matrix $\Psi$ whose rows are $\Psi_1, \ldots, \Psi_n$. Throughout this paper, we consider a random design matrix $X$ that is generated as follows:
$$X := \Psi\Sigma^{1/2}, \quad \text{where we assume } \Sigma_{jj} = 1, \ \forall j = 1, \ldots, p, \quad (1.11)$$
and $\Psi$ is a random matrix whose rows $\Psi_1, \ldots, \Psi_n$ are independent copies of an isotropic $\psi_2$ vector $\Psi_0$ on $\R^p$ as in Definition 1.3. For a random design $X$ as in (1.11), we make the following assumption on $\Sigma$. A slightly stronger condition was originally defined in Zhou et al. (2009) in the context of Gaussian graphical modeling.

Assumption 1.2. (Restricted eigenvalue condition $RE(s, k_0, \Sigma)$) Suppose $\Sigma_{jj} = 1, \forall j = 1, \ldots, p$, and for some integer $1 \le s \le p$ and a positive number $k_0$, the following condition holds:
$$\frac{1}{K(s, k_0, \Sigma)} := \min_{J_0 \subseteq \{1,\ldots,p\},\, |J_0| \le s} \;\; \min_{\upsilon \ne 0,\, \|\upsilon_{J_0^c}\|_1 \le k_0\|\upsilon_{J_0}\|_1} \frac{\|\Sigma^{1/2}\upsilon\|_2}{\|\upsilon_{J_0}\|_2} > 0. \quad (1.12)$$

We note that, similar to the case in Assumption 1.1, it is sufficient to check whether, for $\upsilon \ne 0$ that is admissible to (1.12) and for $K(s, k_0, \Sigma) > 0$, the following inequality
$$\|\Sigma^{1/2}\upsilon\|_2 \ge \frac{\|\upsilon_{T_0}\|_2}{K(s, k_0, \Sigma)} > 0 \quad (1.13)$$
holds, where $T_0$ corresponds to the locations of the $s$ largest coefficients of $\upsilon$ in absolute value. Formally, we have

Proposition 1.4. Let $1 \le s \le p/2$ be an integer and $k_0 > 0$.
Suppose $\delta \ne 0$ is admissible to (1.12), or equivalently to (1.6), in the sense of Definition 1.1; then
$$\|\delta_{T_0^c}\|_1 \le k_0\|\delta_{T_0}\|_1. \quad (1.14)$$
Hence (1.13) is both necessary and sufficient to guarantee that (1.12) holds. Similarly, (1.7) is a necessary and sufficient condition for (1.6) to hold. Moreover, suppose that $\Sigma$ satisfies Assumption 1.2; then for $\delta$ that is admissible to (1.12), we have
$$\|\Sigma^{1/2}\delta\|_2 \ge \frac{\|\delta_{J_0}\|_2}{K(s, k_0, \Sigma)} > 0.$$

We now define
$$\sqrt{\rho_{\min}(m)} := \min_{\|t\|_2 = 1,\, |\mathrm{supp}(t)| \le m} \|\Sigma^{1/2}t\|_2, \quad (1.15)$$
$$\sqrt{\rho_{\max}(m)} := \max_{\|t\|_2 = 1,\, |\mathrm{supp}(t)| \le m} \|\Sigma^{1/2}t\|_2, \quad (1.16)$$
where we assume that $\sqrt{\rho_{\max}(m)}$ is a constant for $m \le p/2$. If $RE(s, k_0, \Sigma)$ is satisfied with $k_0 \ge 1$, then the square submatrices of size $\le 2s$ of $\Sigma$ are necessarily positive definite (see Bickel et al. (2009)); hence throughout this paper, we also assume that
$$\rho_{\min}(2s) > 0. \quad (1.17)$$

Note that when $\Psi$ is a Gaussian random matrix with i.i.d. $N(0, 1)$ random variables, $X$ as in (1.11) corresponds to a random matrix with independent rows, each of which follows a multivariate normal distribution $N(0, \Sigma)$:
$$X \text{ has i.i.d. rows} \sim N(0, \Sigma), \quad \text{where we assume } \Sigma_{jj} = 1, \ \forall j = 1, \ldots, p. \quad (1.18)$$

Finally, we need the following notation. For a set $V \subset \R^p$, we let $\mathrm{conv}\,V$ denote the convex hull of $V$. For a finite set $Y$, the cardinality is denoted by $|Y|$. Let $B_2^p$ and $S^{p-1}$ be the unit Euclidean ball and the unit sphere respectively.

1.2 The main theorem

Throughout this section, we assume that $\Sigma$ satisfies (1.12) and (1.16) for $m = s$. We assume $k_0 > 0$, and it is understood to be the same quantity throughout our discussion. Let us define
$$\bar{C} = 3(2 + k_0)K(s, k_0, \Sigma)\sqrt{\rho_{\max}(s)}, \quad (1.19)$$
where $k_0 > 0$ is understood to be the same as in (1.20).
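Since $\|\Sigma^{1/2}t\|_2^2 = t^T\Sigma t$, the quantities $\rho_{\min}(m)$ and $\rho_{\max}(m)$ in (1.15)-(1.16) are exactly the extreme eigenvalues of the principal submatrices of $\Sigma$ of size $\le m$. For tiny $p$ they can be computed by enumeration; the AR(1)-type $\Sigma$ below is an illustrative assumption, not from the paper.

```python
import numpy as np
from itertools import combinations

def sparse_eigs(Sigma, m):
    """Brute-force rho_min(m) and rho_max(m) from (1.15)-(1.16): extreme
    eigenvalues over all principal submatrices of Sigma of size <= m."""
    p = Sigma.shape[0]
    lo, hi = np.inf, 0.0
    for size in range(1, m + 1):
        for S in combinations(range(p), size):
            eig = np.linalg.eigvalsh(Sigma[np.ix_(S, S)])
            lo, hi = min(lo, eig[0]), max(hi, eig[-1])
    return lo, hi

p = 6
# Toy correlation matrix with unit diagonal: Sigma_ij = 0.5^|i-j|.
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
rho_min, rho_max = sparse_eigs(Sigma, m=2)
print(rho_min, rho_max)   # for this Sigma: 1 -/+ 0.5 from adjacent pairs
```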
Our main result in Theorem 1.6 roughly says that for a random matrix $X := \Psi\Sigma^{1/2}$, which is the product of a random subgaussian ensemble $\Psi$ and a fixed positive semi-definite matrix $\Sigma^{1/2}$, the RE condition will be satisfied with overwhelming probability, given $n$ that is sufficiently large (cf. (1.21)). Before introducing the theorem formally, we define the class of vectors $E_s$, for a particular integer $1 \le s \le p/2$, that are relevant to the RE Assumptions 1.1 and 1.2. For any given subset $J_0 \subset \{1, \ldots, p\}$ such that $|J_0| \le s$, we consider the set of vectors $\delta$ such that
$$\|\delta_{J_0^c}\|_1 \le k_0\|\delta_{J_0}\|_1 \quad (1.20)$$
holds for some $k_0 > 0$, subject to a normalization condition such that $\Sigma^{1/2}\delta \in S^{p-1}$; we then define the set $E'_s$ as the union of all vectors that satisfy the cone constraint as in (1.20) with respect to any index set $J_0 \subset \{1, \ldots, p\}$ such that $|J_0| \le s$:
$$E'_s = \left\{\delta : \|\Sigma^{1/2}\delta\|_2 = 1 \text{ s.t. } \exists J_0 \subseteq \{1, \ldots, p\} \text{ s.t. } |J_0| \le s \text{ and (1.20) holds}\right\}.$$
We now define an even broader set: let $\delta_{T_0}$ be the subvector of $\delta$ confined to the locations of its $s$ largest coefficients:
$$E_s = \left\{\delta : \|\Sigma^{1/2}\delta\|_2 = 1 \text{ s.t. } \|\delta_{T_0^c}\|_1 \le k_0\|\delta_{T_0}\|_1 \text{ holds}\right\}.$$

Remark 1.5. It is clear from Proposition 1.4 that $E'_s \subset E_s$ for the same $k_0 > 0$.

Theorem 1.6 is the main contribution of this paper.

Theorem 1.6. Set $1 \le n \le p$, $0 < \theta < 1$, and $s \le p/2$. Let $\Psi_0$ be an isotropic $\psi_2$ random vector on $\R^p$ with constant $\alpha$ as in Definition 1.3 and let $\Psi_1, \ldots, \Psi_n$ be independent copies of $\Psi_0$. Let $\Psi$ be a random matrix in $\R^{n \times p}$ whose rows are $\Psi_1, \ldots, \Psi_n$. Let $\Sigma$ satisfy (1.12) and (1.16).
If $n$ satisfies, for $\bar{C}$ as defined in (1.19),
$$n > \frac{c'\alpha^4}{\theta^2}\max\left(\bar{C}^2 s\log(5ep/s),\ 9\log p\right), \quad (1.21)$$
then with probability at least $1 - 2\exp(-\bar{c}\theta^2 n/\alpha^4)$, we have for all $\delta \in E_s$,
$$1 - \theta \le \frac{\|\Psi\Sigma^{1/2}\delta\|_2}{\sqrt{n}} \le 1 + \theta, \quad \text{and} \quad (1.22)$$
$$\forall \rho_i, \quad 1 - \theta \le \frac{\|\Psi\rho_i\|_2}{\sqrt{n}} \le 1 + \theta, \quad (1.23)$$
where $\rho_1, \ldots, \rho_p$ are the column vectors of $\Sigma^{1/2}$, and $c', \bar{c} > 0$ are absolute constants.

We now state some immediate consequences of Theorem 1.6. Consider the random design $X = \Psi\Sigma^{1/2}$ as defined in Theorem 1.6. It is clear that when all columns of $X$ have Euclidean norm close to $\sqrt{n}$, as guaranteed by (1.23) for $0 < \theta < 1$ that is small, it makes sense to discuss the RE condition in the form of (1.6). We now define the following event $\mathcal{R}$ on a random design $X$, which provides an upper bound on $K(s, k_0, X)$ for a given $k_0 > 0$, when $X$ satisfies Assumption $RE(s, k_0, X)$:
$$\mathcal{R}(\theta) := \left\{X : RE(s, k_0, X) \text{ holds with } 0 < K(s, k_0, X) \le \frac{K(s, k_0, \Sigma)}{1 - \theta}\right\}. \quad (1.24)$$
Under Assumption 1.2, we consider the set of vectors $u := \Sigma^{1/2}\delta$, where $\delta \ne 0$ is admissible to (1.12), and show a uniform bound on the concentration of each individual random variable of the form $\|\Gamma u\|_2^2 := \|X\delta\|_2^2$ around its mean. By Proposition 1.4, we have $\|u\|_2 = \|\Sigma^{1/2}\delta\|_2 > 0$. We can now apply (1.22) to each $(\delta/\|\Sigma^{1/2}\delta\|_2) \ne 0$, which belongs to $E'_s$ and hence to $E_s$ (see Remark 1.5), and conclude that
$$0 < (1 - \theta)\|\Sigma^{1/2}\delta\|_2 \le \frac{\|X\delta\|_2}{\sqrt{n}} \le (1 + \theta)\|\Sigma^{1/2}\delta\|_2 \quad (1.25)$$
holds for all $\delta \ne 0$ that is admissible to (1.12), with probability at least $1 - 2\exp(-\bar{c}\theta^2 n/\alpha^4)$. Now the lower bound in (1.25) implies that
$$\frac{\|X\delta\|_2}{\sqrt{n}} \ge (1 - \theta)\|\Sigma^{1/2}\delta\|_2 \ge (1 - \theta)\frac{\|\delta_{T_0}\|_2}{K(s, k_0, \Sigma)} > 0, \quad (1.26)$$
where $T_0$ corresponds to the locations of the $s$ largest coefficients of $\delta$ in absolute value.
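The two-sided bound (1.25) is easy to observe empirically: generate $X = \Psi\Sigma^{1/2}$ with a Rademacher $\Psi$ (the symmetric $\pm 1$ Bernoulli design of Definition 1.3), pick a $\delta$ in the cone (1.20), and compare $\|X\delta\|_2/\sqrt{n}$ with $\|\Sigma^{1/2}\delta\|_2$. The AR(1)-type $\Sigma$ and the particular $\delta$ below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, s, k0 = 2000, 8, 2, 1.0

# Toy correlation matrix with unit diagonal (Sigma_jj = 1), for illustration.
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
w, V = np.linalg.eigh(Sigma)
Sigma_half = V @ np.diag(np.sqrt(w)) @ V.T     # symmetric square root of Sigma

# Psi: rows are i.i.d. copies of an isotropic psi_2 vector (Rademacher here).
Psi = rng.choice([-1.0, 1.0], size=(n, p))
X = Psi @ Sigma_half                            # the design (1.11)

# A vector delta inside the cone (1.20): top-s coordinates dominate.
delta = np.zeros(p)
delta[:s] = 1.0
delta[s:] = 0.9 * k0 * s / (p - s)              # tail ell_1 mass strictly below k0 * s

ratio = np.linalg.norm(X @ delta) / (np.sqrt(n) * np.linalg.norm(Sigma_half @ delta))
print(ratio)    # (1.25) predicts a value in [1 - theta, 1 + theta]
```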
Hence (1.23) and the event $\mathcal{R}(\theta)$ hold simultaneously, with probability at least $1 - 2\exp(-\bar{c}\theta^2 n/\alpha^4)$, given (1.13) and Proposition 1.4, so long as $n$ satisfies (1.21).

Remark 1.7. It is clear that this result generalizes the notion of the restricted isometry property (RIP) introduced in Candès and Tao (2005). In particular, when $\Sigma = I$ and $\delta$ is $s$-sparse, (1.8) holds for $X$ with $\theta_s = \theta$, given (1.25).

2 Proof of Theorem 1.6

In this section, we first state a definition and then two lemmas in Section 2.1, from which we show the proof of Theorem 1.6 in Section 2.2. We shall identify the basis with the canonical basis $\{e_1, e_2, \ldots, e_p\}$ of $\R^p$, where $e_i = (0, \ldots, 0, 1, 0, \ldots, 0)$, and it is to be understood that the $1$ appears in the $i$th position and $0$ appears elsewhere.

Definition 2.1. For a subset $V \subset \R^p$, we let
$$\ell_*(V) = \E\sup_{t \in V}\sum_{i=1}^p g_i t_i, \quad (2.1)$$
where $t = (t_i)_{i=1}^p \in \R^p$ and $g_1, \ldots, g_p$ are independent $N(0, 1)$ Gaussian random variables.

2.1 The complexity measures

The subset $\Upsilon$ that is relevant to our result is the subset of the sphere $S^{p-1}$ onto which the linear function $\Sigma^{1/2} : E_s \to \R^p$ maps $\delta \in E_s$:
$$\Upsilon := \Sigma^{1/2}(E_s) = \{v \in \R^p : v = \Sigma^{1/2}\delta \text{ for some } \delta \in E_s\}. \quad (2.2)$$
We now show a bound on the functional $\ell_*(\Upsilon)$, for which we crucially exploit the cone property of vectors in $E_s$, the RE condition on $\Sigma$, and the bound on $\rho_{\max}(s)$. Lemma 2.2 is one of the main technical contributions of this paper.

Lemma 2.2. (Complexity of a subset of $S^{p-1}$) Let $\Sigma$ satisfy (1.12) and (1.16). Let $h_1, \ldots, h_p$ be independent $N(0, 1)$ random variables. Let $1 \le s \le p/2$ be an integer. Then
$$\ell_*(\Upsilon) := \E\sup_{y \in \Upsilon}\sum_{i=1}^p h_i y_i = \E\sup_{\delta \in E_s}\langle h, \Sigma^{1/2}\delta\rangle \le \bar{C}\sqrt{s\log(cp/s)}, \quad (2.3)$$
where $\bar{C}$ is defined in (1.19) and $c = 5e$.

Remark 2.3.
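The functional $\ell_*(V)$ in (2.1) (the Gaussian mean width of $V$) can be approximated by Monte Carlo for a finite set $V$, which makes Definition 2.1 concrete. In the sketch below (an illustration under the stated toy choice of $V$), taking $V$ to be the canonical basis recovers the familiar fact that the expected maximum of $p$ standard normals is of order $\sqrt{2\log p}$.

```python
import numpy as np

def ell_star_mc(V, n_mc=2000, seed=4):
    """Monte Carlo estimate of ell_*(V) = E sup_{t in V} <g, t>, g ~ N(0, I_p),
    as in (2.1). V is an array whose rows are the points of a finite set."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n_mc, V.shape[1]))
    return np.mean(np.max(G @ V.T, axis=1))     # sup over the finite set, per draw

p = 50
V = np.eye(p)                                    # canonical basis, a subset of S^{p-1}
est = ell_star_mc(V)
print(est, np.sqrt(2 * np.log(p)))               # estimate vs. the sqrt(2 log p) scale
```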
We will also show this in our self-contained proof for the zero-mean Gaussian random ensemble with covariance matrix $\Sigma$, where exactly such a complexity measure is used; see Section D. There we also give explicit constants.

Now let $\Sigma^{1/2} := (\rho_{ij})$ and let $\rho_1, \ldots, \rho_p$ denote its $p$ column vectors. By the definition of $\Sigma = (\Sigma^{1/2})^2$, it holds that $\|\rho_i\|_2^2 = \sum_{j=1}^p \rho_{ij}^2 = \Sigma_{ii} = 1$ for all $i = 1, \ldots, p$. Thus we have the following.

Lemma 2.4. Let $\Phi = \{\rho_1, \ldots, \rho_p\}$ be the subset of vectors in $S^{p-1}$ that correspond to the columns of $\Sigma^{1/2}$. It holds that $\ell_*(\Phi) \le 3\sqrt{\log p}$.

2.2 Proof of Theorem 1.6

The key idea in proving Theorem 1.6 is to apply the powerful Theorem 2.5, as shown in Mendelson et al. (2007, 2008) (Corollary 2.7 and Theorem 2.1 respectively), to the subset $\Upsilon$ of the sphere $S^{p-1}$ as defined in (2.2). As explained in Mendelson et al. (2008), in the context of Theorem 2.5, the functional $\ell_*(\Upsilon)$ is the complexity measure of the set $\Upsilon$, which measures the extent to which probabilistic bounds on the concentration of each individual random variable of the form $\|\Gamma v\|_2^2$ around its mean can be combined to form a bound that holds uniformly for all $v \in \Upsilon$.

Theorem 2.5. (Mendelson et al., 2007, 2008) Set $1 \le n \le p$ and $0 < \theta < 1$. Let $\Psi$ be an isotropic $\psi_2$ random vector on $\R^p$ with constant $\alpha$, and let $\Psi_1, \ldots, \Psi_n$ be independent copies of $\Psi$. Let $\Gamma$ be as defined in (1.10) and let $V \subset S^{p-1}$. If $n$ satisfies
$$n > \frac{c'\alpha^4}{\theta^2}\ell_*(V)^2, \quad (2.4)$$
then with probability at least $1 - \exp(-\bar{c}\theta^2 n/\alpha^4)$, for all $v \in V$, we have
$$1 - \theta \le \|\Gamma v\|_2/\sqrt{n} \le 1 + \theta, \quad (2.5)$$
where $c', \bar{c} > 0$ are absolute constants.

It is clear that (1.22) follows immediately from Theorem 2.5 by taking $V = \Upsilon$, given Lemma 2.2.
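Both facts used around Lemma 2.4 are easy to check numerically: the columns $\rho_i$ of the symmetric square root $\Sigma^{1/2}$ have unit norm whenever $\Sigma_{ii} = 1$, and a Monte Carlo estimate of $\ell_*(\Phi)$ stays below $3\sqrt{\log p}$. The particular $\Sigma$ below is an illustrative assumption.

```python
import numpy as np

p = 40
# Toy correlation matrix with unit diagonal.
Sigma = 0.3 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
w, V = np.linalg.eigh(Sigma)
Sigma_half = V @ np.diag(np.sqrt(w)) @ V.T       # symmetric square root

# Columns rho_i of Sigma^{1/2} satisfy ||rho_i||_2^2 = Sigma_ii = 1.
col_norms = np.linalg.norm(Sigma_half, axis=0)
print(np.max(np.abs(col_norms - 1.0)))

# Monte Carlo check of Lemma 2.4: ell_*(Phi) <= 3 sqrt(log p).
rng = np.random.default_rng(5)
G = rng.standard_normal((2000, p))
ell_phi = np.mean(np.max(G @ Sigma_half, axis=1))  # column j gives <g, rho_j>
print(ell_phi, 3 * np.sqrt(np.log(p)))
```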
In fact, we can now finish proving Theorem 1.6 by applying Theorem 2.5 twice, with $V = \Upsilon$ and $V = \Phi$ respectively: the lower bound on $n$ is obtained by applying the upper bounds on $\ell_*(\Upsilon)$ as given in Lemma 2.2 and on $\ell_*(\Phi)$ as in Lemma 2.4. We then apply the union bound to bound the probability of the bad events in which (2.5) does not hold for some $v \in \Upsilon$ or some $v \in \Phi$ respectively.

3 $\ell_p$ convergence for the Lasso and the Dantzig selector

Throughout this section, we assume that $0 < \theta < 1$, and that $c', \bar{c} > 0$ are absolute constants. Conditioned on the random design as in (1.11) satisfying the properties guaranteed in Theorem 1.6, we proceed to treat $X$ as a deterministic design, for which both the RE condition as described in (1.24) and the condition $\mathcal{F}(\theta)$ defined below hold:
$$\mathcal{F}(\theta) := \left\{X : \forall j = 1, \ldots, p, \ 1 - \theta \le \frac{\|X_j\|_2}{\sqrt{n}} \le 1 + \theta\right\}, \quad (3.1)$$
where $X_1, \ldots, X_p$ are the column vectors of $X$. Formally, we consider the set $\mathcal{X} \ni X$ of random designs that satisfy both conditions $\mathcal{R}(\theta)$ and $\mathcal{F}(\theta)$, for some $0 < \theta < 1$. By Theorem 1.6, we have, for $n$ satisfying the lower bound in (1.21),
$$\P(\mathcal{X}) := \P(\mathcal{R}(\theta) \cap \mathcal{F}(\theta)) \ge 1 - 2\exp(-\bar{c}\theta^2 n/\alpha^4).$$
It is clear that on $\mathcal{X}$, Assumption 1.2 holds for $\Sigma$. We now bound the correlation between the noise and the covariates of $X$ for $X \in \mathcal{X}$, where we also define a constant $\lambda_{\sigma,a,p}$ which is used throughout the rest of this paper. For each $a \ge 0$ and for $X \in \mathcal{F}(\theta)$, let
$$\mathcal{T}_a := \left\{\epsilon : \left\|\frac{X^T\epsilon}{n}\right\|_\infty \le (1 + \theta)\lambda_{\sigma,a,p}, \text{ where } X \in \mathcal{F}(\theta)\right\}, \quad \text{for } 0 < \theta < 1, \quad (3.2)$$
where $\lambda_{\sigma,a,p} = \sigma\sqrt{1 + a}\sqrt{(2\log p)/n}$, with $a \ge 0$; we have (cf. Proposition C.1)
$$\P(\mathcal{T}_a) \ge 1 - \left(\sqrt{\pi\log p}\, p^a\right)^{-1}. \quad (3.3)$$
In fact, for such a bound to hold, we only need $\frac{\|X_j\|_2}{\sqrt{n}} \le 1 + \theta, \forall j$, to hold in $\mathcal{F}(\theta)$. We note that the constants in the theorems are not optimized.

Theorem 3.1. (Estimation for the Lasso) Set $1 \le n \le p$, $0 < \theta < 1$, and $a > 0$. Let $s < p/2$.
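The event $\mathcal{T}_a$ in (3.2) can be simulated: with a Gaussian design and Gaussian noise, the maximal correlation $\|X^T\epsilon/n\|_\infty$ falls below $(1+\theta)\lambda_{\sigma,a,p}$ in the vast majority of trials, in line with (3.3). The sketch below redraws both $X$ and $\epsilon$ each trial, which is a simplifying assumption relative to the fixed-design event.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, sigma, a, theta = 200, 100, 1.0, 0.5, 0.1
lam = sigma * np.sqrt(1 + a) * np.sqrt(2 * np.log(p) / n)   # lambda_{sigma,a,p}

trials, hits = 200, 0
for _ in range(trials):
    X = rng.standard_normal((n, p))             # columns have norm close to sqrt(n)
    eps = sigma * rng.standard_normal(n)        # eps ~ N(0, sigma^2 I_n)
    if np.max(np.abs(X.T @ eps / n)) <= (1 + theta) * lam:
        hits += 1
print(hits / trials)    # empirical frequency of T_a; (3.3) predicts a value near 1
```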
Consider the linear model in (1.1) with random design $X := \Psi\Sigma^{1/2}$, where $\Psi_{n \times p}$ is a subgaussian random matrix as defined in Theorem 1.6 and $\Sigma$ satisfies (1.12) and (1.16). Let $\hat\beta$ be an optimal solution to the Lasso as in (1.2) with $\lambda_n \ge 2(1 + \theta)\lambda_{\sigma,a,p}$. Suppose that $n$ satisfies, for $\bar{C}$ as in (1.19),
$$n > \frac{c'\alpha^4}{\theta^2}\max\left(\bar{C}^2 s\log(5ep/s),\ 9\log p\right). \quad (3.4)$$
Then with probability at least $\P(\mathcal{X} \cap \mathcal{T}_a) \ge 1 - 2\exp(-\bar{c}\theta^2 n/\alpha^4) - \P(\mathcal{T}_a^c)$, we have, for $B \le 4K^2(s, 3, \Sigma)/(1 - \theta)^2$ and $k_0 = 3$,
$$\|\hat\beta - \beta\|_2 \le 2B\lambda_n\sqrt{s}, \quad \text{and} \quad \|\hat\beta - \beta\|_1 \le B\lambda_n s. \quad (3.5)$$

Theorem 3.2. (Estimation for the Dantzig selector) Set $1 \le n \le p$, $0 < \theta < 1$, and $a > 0$. Let $s < p/2$. Consider the linear model in (1.1) with random design $X := \Psi\Sigma^{1/2}$, where $\Psi_{n \times p}$ is a subgaussian random matrix as defined in Theorem 1.6 and $\Sigma$ satisfies (1.12) and (1.16). Let $\hat\beta$ be an optimal solution to the Dantzig selector as in (1.3), where $\lambda_n \ge (1 + \theta)\lambda_{\sigma,a,p}$. Suppose that $n$ satisfies, for $\bar{C}$ as in (1.19),
$$n > \frac{c'\alpha^4}{\theta^2}\max\left(\bar{C}^2 s\log(5ep/s),\ 9\log p\right). \quad (3.6)$$
Then with probability at least $\P(\mathcal{X} \cap \mathcal{T}_a) \ge 1 - 2\exp(-\bar{c}\theta^2 n/\alpha^4) - \P(\mathcal{T}_a^c)$, we have, for $B \le 4K^2(s, 1, \Sigma)/(1 - \theta)^2$ and $k_0 = 1$,
$$\|\hat\beta - \beta\|_2 \le 3B\lambda_n\sqrt{s}, \quad \text{and} \quad \|\hat\beta - \beta\|_1 \le 2B\lambda_n s. \quad (3.7)$$
Proofs are given in Section C.

Acknowledgments. Research is supported by the Swiss National Science Foundation (SNF) Grant 20PA21-120050/1. The author is extremely grateful to Guillaume Lecué for his careful reading of the manuscript and for his many constructive and insightful comments that have led to a significant improvement of the presentation of this paper. The author would also like to thank Olivier Guédon, Alain Pajor, and Larry Wasserman for helpful conversations, and Roman Vershynin for providing a reference.
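Theorem 3.1 predicts an $\ell_2$ error of order $\lambda_n\sqrt{s}$ for the Lasso on the designs considered here. The end-to-end simulation below, a sketch under illustrative assumptions (a Rademacher $\Psi$, an AR(1)-type $\Sigma$, an ISTA solver not taken from the paper), exhibits this scale.

```python
import numpy as np

rng = np.random.default_rng(9)
n, p, s, sigma, theta, a = 400, 50, 3, 0.5, 0.1, 0.5

# Design X = Psi Sigma^{1/2}: Rademacher Psi, AR(1)-type correlation Sigma.
Sigma = 0.4 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
w, V = np.linalg.eigh(Sigma)
X = rng.choice([-1.0, 1.0], size=(n, p)) @ (V @ np.diag(np.sqrt(w)) @ V.T)

beta = np.zeros(p)
beta[:s] = 3.0                                  # s-sparse target
Y = X @ beta + sigma * rng.standard_normal(n)

# lambda_n = 2 (1 + theta) lambda_{sigma,a,p}, the smallest allowed in Theorem 3.1.
lam = 2 * (1 + theta) * sigma * np.sqrt(1 + a) * np.sqrt(2 * np.log(p) / n)

# Solve the Lasso (1.2) by proximal gradient descent (ISTA).
L = np.linalg.norm(X, 2) ** 2 / n
b = np.zeros(p)
for _ in range(2000):
    z = b - (X.T @ (X @ b - Y) / n) / L
    b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)

err2 = np.linalg.norm(b - beta)
print(err2, lam * np.sqrt(s))   # Theorem 3.1 predicts err2 = O(lam * sqrt(s))
```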
A Some preliminary propositions

In this section, we first prove Proposition 1.4 in Section A.1, which is used throughout the rest of the paper. We then present a simple decomposition for vectors $\delta \in E_s$ and show some immediate implications, which we shall need in the proofs of Lemma 2.2, Theorem 3.1, and Theorem 3.2.

A.1 Proof of Proposition 1.4

For each $\delta$ that is admissible to (1.12), there exists a subset of indices $J_0 \subseteq \{1, \ldots, p\}$ such that both $|J_0| \le s$ and $\|\delta_{J_0^c}\|_1 \le k_0\|\delta_{J_0}\|_1$ hold. This immediately implies that (1.14) holds for $k_0 > 0$:
$$\|\delta_{T_0^c}\|_1 = \|\delta\|_1 - \|\delta_{T_0}\|_1 \le \|\delta\|_1 - \|\delta_{J_0}\|_1 = \|\delta_{J_0^c}\|_1 \le k_0\|\delta_{J_0}\|_1 \le k_0\|\delta_{T_0}\|_1,$$
due to the maximality of $\|\delta_{T_0}\|_1$ among all $\|\delta_{J_0}\|_1$ for $J_0 \subseteq \{1, \ldots, p\}$ such that $|J_0| \le s$. This immediately implies that $E'_s \subset E_s$.

We now show that (1.13) is a necessary and sufficient condition for (1.12) to hold; the same argument applies to the RE conditions on $X$. Suppose (1.13) holds for $\delta \ne 0$; we have, for all $J_0 \subseteq \{1, \ldots, p\}$ such that $|J_0| \le s$ and $\|\delta_{J_0^c}\|_1 \le k_0\|\delta_{J_0}\|_1$,
$$\|\Sigma^{1/2}\delta\|_2 \ge \frac{\|\delta_{T_0}\|_2}{K(s, k_0, \Sigma)} \ge \frac{\|\delta_{J_0}\|_2}{K(s, k_0, \Sigma)} > 0, \quad (A.1)$$
where the last inequality is due to the fact that $\|\delta_{J_0}\|_2 > 0$; indeed, suppose otherwise that $\|\delta_{J_0}\|_2 = 0$; then $\|\delta_{J_0^c}\|_1 \le k_0\|\delta_{J_0}\|_1 = 0$ would imply that $\delta = 0$, which is a contradiction. Conversely, suppose that (1.12) holds; then (1.13) must also hold, given that $T_0$ satisfies (1.14) with $|T_0| = s$ and $\delta_{T_0} \ne 0$. Finally, the "moreover" part holds given Assumption 1.2, in view of (A.1).

A.2 Decomposing a vector in $E_s$

For each $\delta \in E_s$, we decompose $\delta$ into a set of vectors $\delta_{T_0}, \delta_{T_1}, \delta_{T_2}, \ldots
, \delta_{T_K}$ such that $T_0$ corresponds to the locations of the $s$ largest coefficients of $\delta$ in absolute value, $T_1$ corresponds to the locations of the $s$ largest coefficients of $\delta_{T_0^c}$ in absolute value, $T_2$ corresponds to the locations of the next $s$ largest coefficients of $\delta_{T_0^c}$ in absolute value, and so on. Hence we have $T_0^c = \bigcup_{k=1}^K T_k$, where $K \ge 1$, $|T_k| = s$ for all $k = 1, \ldots, K - 1$, and $|T_K| \le s$. Now for each $j \ge 1$, we have
$$\|\delta_{T_j}\|_2 \le \sqrt{s}\,\|\delta_{T_j}\|_\infty \le \frac{1}{\sqrt{s}}\|\delta_{T_{j-1}}\|_1,$$
where the vector norm $\|\cdot\|_\infty$ represents the largest entry of the vector in absolute value, and hence
$$\sum_{k \ge 1}\|\delta_{T_k}\|_2 \le s^{-1/2}\left(\|\delta_{T_0}\|_1 + \|\delta_{T_1}\|_1 + \|\delta_{T_2}\|_1 + \ldots\right) \le s^{-1/2}\left(\|\delta_{T_0}\|_1 + \|\delta_{T_0^c}\|_1\right) = s^{-1/2}\|\delta\|_1 \quad (A.2)$$
$$\le s^{-1/2}(k_0 + 1)\|\delta_{T_0}\|_1 \le (k_0 + 1)\|\delta_{T_0}\|_2, \quad (A.3)$$
where for (A.3) we have used the fact that for all $\delta \in E_s$,
$$\|\delta_{T_0^c}\|_1 \le k_0\|\delta_{T_0}\|_1 \quad (A.4)$$
holds. Indeed, for $\delta$ such that (A.4) holds, we have by (A.2) and (A.3)
$$\|\delta\|_2 \le \|\delta_{T_0}\|_2 + \sum_{j \ge 1}\|\delta_{T_j}\|_2 \le \|\delta_{T_0}\|_2 + s^{-1/2}\|\delta\|_1 \quad (A.5)$$
$$\le (k_0 + 2)\|\delta_{T_0}\|_2. \quad (A.6)$$

A.3 On the equivalence of two RE conditions

To introduce the second RE assumption of Bickel et al. (2009), we need some more notation. For an integer $s$ such that $1 \le s \le p/2$, a vector $\upsilon \in \R^p$, and a set of indices $J_0 \subseteq \{1, \ldots, p\}$ with $|J_0| \le s$, denote by $J_1$ the subset of $\{1, \ldots, p\}$ corresponding to the $s$ largest in absolute value coordinates of $\upsilon$ outside of $J_0$, and define $J_{01} := J_0 \cup J_1$.

Assumption A.1. (Restricted eigenvalue assumption $RE(s, s, k_0, X)$ (Bickel et al., 2009)) Consider a fixed design. For some integer $1 \le s \le p/2$ and a positive number $k_0$, the following condition holds:
$$\frac{1}{K(s, s, k_0, X)} := \min_{J_0 \subseteq \{1,\ldots,p\},\, |J_0| \le s} \;\; \min_{\upsilon \ne 0,\, \|\upsilon_{J_0^c}\|_1 \le k_0\|\upsilon_{J_0}\|_1} \frac{\|X\upsilon\|_2}{\sqrt{n}\,\|\upsilon_{J_{01}}\|_2} > 0. \quad (A.7)$$

Assumption A.2.
(Restricted eigenvalue assumption $RE(s, s, k_0, \Sigma)$) For some integer $1 \le s \le p/2$ and a positive number $k_0$, the following condition holds:
$$\frac{1}{K(s, s, k_0, \Sigma)} := \min_{J_0 \subseteq \{1,\ldots,p\},\, |J_0| \le s} \;\; \min_{\upsilon \ne 0,\, \|\upsilon_{J_0^c}\|_1 \le k_0\|\upsilon_{J_0}\|_1} \frac{\|\Sigma^{1/2}\upsilon\|_2}{\|\upsilon_{J_{01}}\|_2} > 0. \quad (A.8)$$

Proposition A.1. For some integer $1 \le s \le p/2$, and for the same $k_0 > 0$, the two sets of RE conditions are equivalent up to a constant factor of $\sqrt{2}$ of each other:
$$\frac{K(s, s, k_0, \Sigma)}{\sqrt{2}} \le K(s, k_0, \Sigma) \le K(s, s, k_0, \Sigma);$$
similarly, we have
$$\frac{K(s, s, k_0, X)}{\sqrt{2}} \le K(s, k_0, X) \le K(s, s, k_0, X).$$

Proof. It is obvious that for the same $k_0 > 0$, (A.8) implies that the condition as in Assumption 1.2 holds with $K(s, k_0, \Sigma) \le K(s, s, k_0, \Sigma)$. Now, for the other direction, suppose that $RE(s, k_0, \Sigma)$ holds with $K(s, k_0, \Sigma) > 0$. Then for all $\upsilon \ne 0$ that are admissible to (1.12), we have by Proposition 1.4,
$$\|\Sigma^{1/2}\upsilon\|_2 \ge \frac{\|\upsilon_{T_0}\|_2}{K(s, k_0, \Sigma)} > 0, \quad (A.9)$$
where $T_0$ corresponds to the locations of the $s$ largest coefficients of $\upsilon$ in absolute value. Now for any $J_0 \subseteq \{1, \ldots, p\}$ such that $|J_0| \le s$ and $\|\upsilon_{J_0^c}\|_1 \le k_0\|\upsilon_{J_0}\|_1$ holds, we have by (1.12),
$$\|\Sigma^{1/2}\upsilon\|_2 \ge \frac{\|\upsilon_{J_0}\|_2}{K(s, k_0, \Sigma)} > 0. \quad (A.10)$$
Now it is clear that $J_1 \subset T_0 \cup T_1$, and we have, for all $\upsilon \ne 0$ that are admissible to (1.12),
$$0 < \|\upsilon_{J_{01}}\|_2^2 = \|\upsilon_{J_0}\|_2^2 + \|\upsilon_{J_1}\|_2^2 \quad (A.11)$$
$$\le \|\upsilon_{J_0}\|_2^2 + \|\upsilon_{T_0}\|_2^2 \quad (A.12)$$
$$\le 2K^2(s, k_0, \Sigma)\,\|\Sigma^{1/2}\upsilon\|_2^2, \quad (A.13)$$
which immediately implies that for all $\upsilon \ne 0$ that are admissible to (1.12),
$$\frac{\|\Sigma^{1/2}\upsilon\|_2}{\|\upsilon_{J_{01}}\|_2} \ge \frac{1}{\sqrt{2}\,K(s, k_0, \Sigma)} > 0.$$
Thus we have that the $RE(s, s, k_0, \Sigma)$ condition holds with $K(s, s, k_0, \Sigma) \le \sqrt{2}\,K(s, k_0, \Sigma)$. The other set of inequalities follows exactly the same line of argument.

We now introduce the last assumption, for which we need some more notation.
For integers $s, m$ such that $1 \le s \le p/2$ and $m \ge s$, $s + m \le p$, a vector $\delta \in \mathbb{R}^p$, and a set of indices $J_0 \subseteq \{1,\ldots,p\}$ with $|J_0| \le s$, denote by $J_m$ the subset of $\{1,\ldots,p\}$ corresponding to the $m$ largest in absolute value coordinates of $\delta$ outside of $J_0$, and define $J_{0m} := J_0 \cup J_m$.

Assumption A.3. Restricted eigenvalue assumption $RE(s,m,k_0,X)$ (Bickel et al., 2009). For some integer $1 \le s \le p/2$, $m \ge s$, $s+m \le p$, and a positive number $k_0$, the following condition holds:
\[ \frac{1}{K(s,m,k_0,X)} := \min_{J_0 \subseteq \{1,\ldots,p\},\, |J_0|\le s} \;\; \min_{\upsilon\ne 0,\; \|\upsilon_{J_0^c}\|_1 \le k_0\|\upsilon_{J_0}\|_1} \;\; \frac{\|X\upsilon\|_2}{\sqrt{n}\,\|\upsilon_{J_{0m}}\|_2} > 0. \tag{A.14} \]

Proposition A.2. For some integer $1 \le s \le p/2$, $m \ge s$, $s+m \le p$, and some positive number $k_0$, we have
\[ \frac{K(s,m,k_0,X)}{\sqrt{2+k_0^2}} \le K(s,k_0,X) \le K(s,m,k_0,X). \]

Proof. It is clear that $K(s,k_0,X) \le K(s,m,k_0,X)$ for $m \ge s$. Now suppose that $RE(s,k_0,X)$ holds; we continue from (A.10). We divide $J_m$ into $J_1, J_2, \ldots$, such that $J_1$ corresponds to the locations of the $s$ largest coefficients of $\upsilon_{J_0^c}$ in absolute values, $J_2$ corresponds to the locations of the next $s$ largest coefficients of $\upsilon_{J_0^c}$ in absolute values, and so on. We first bound $\|\upsilon_{J_{01}^c}\|_2^2$, following essentially the same argument as in Candès and Tao (2007): observe that the $k$-th largest absolute value of $\upsilon_{J_0^c}$ obeys $|\upsilon_{J_0^c}|_{(k)} \le \|\upsilon_{J_0^c}\|_1/k$. Thus we have for $\upsilon$ that is admissible to (A.14),
\[ \|\upsilon_{J_{01}^c}\|_2^2 \le \|\upsilon_{J_0^c}\|_1^2 \sum_{k \ge s+1} \frac{1}{k^2} \le s^{-1}\|\upsilon_{J_0^c}\|_1^2 \le s^{-1}k_0^2\|\upsilon_{J_0}\|_1^2 \le k_0^2 \|\upsilon_{J_0}\|_2^2. \]
It is clear that $\|\upsilon_{J_1}\|_2 \le \|\upsilon_{T_0}\|_2$, and
\[ 0 < \|\upsilon_{J_{01}}\|_2^2 \le \|\upsilon_{J_{0m}}\|_2^2 \le \|\upsilon_{J_0}\|_2^2 + \|\upsilon_{J_1}\|_2^2 + \|\upsilon_{J_{01}^c}\|_2^2 \le \|\upsilon_{J_0}\|_2^2 + \|\upsilon_{J_1}\|_2^2 + k_0^2\|\upsilon_{J_0}\|_2^2 \le (1+k_0^2)\|\upsilon_{J_0}\|_2^2 + \|\upsilon_{T_0}\|_2^2 \le (2+k_0^2)\,K^2(s,k_0,X)\,\|X\upsilon\|_2^2/n, \]
which immediately implies that for all $\upsilon \ne 0$ that are admissible to (A.14),
\[ \frac{\|X\upsilon\|_2}{\sqrt{n}\,\|\upsilon_{J_{0m}}\|_2} \ge \frac{1}{\sqrt{2+k_0^2}\,K(s,k_0,X)} > 0. \]
Thus the $RE(s,m,k_0,X)$ condition holds with $K(s,m,k_0,X) \le \sqrt{2+k_0^2}\,K(s,k_0,X)$.

B Results on the complexity measures

In this section, in preparation for proving Lemma 2.2 and Lemma 2.4, we first state some well-known definitions and preliminary results on certain complexity measures of a set $V$ (see Mendelson et al. (2008), for example); we also provide a new result in Lemma B.6.

Definition B.1. Given a subset $U \subset \mathbb{R}^p$ and a number $\varepsilon > 0$, an $\varepsilon$-net $\Pi$ of $U$ with respect to the Euclidean metric is a subset of points of $U$ such that the $\varepsilon$-balls centered at $\Pi$ cover $U$:
\[ U \subset \bigcup_{x \in \Pi} (x + \varepsilon B_2^p), \]
where $A + B := \{a + b : a \in A,\ b \in B\}$ is the Minkowski sum of the sets $A$ and $B$. The covering number $N(U,\varepsilon)$ is the smallest cardinality of an $\varepsilon$-net of $U$.

Now it is well known that there exists an absolute constant $c_1 > 0$ such that for every finite subset $\Pi \subset B_2^p$,
\[ \ell^*(\mathrm{conv}\,\Pi) = \ell^*(\Pi) \le c_1 \sqrt{\log |\Pi|}. \tag{B.1} \]
The main goal of the rest of this section is to provide a bound on a variation of the complexity measure $\ell^*(V)$, which we denote by $\widetilde{\ell}^*(V)$ throughout this paper, essentially by exploiting a bound similar to (B.1) (cf. Lemma B.6). Given a set $V \subset \mathbb{R}^p$, we also need to measure $\ell^*(W)$, where $W$ is the subset of $\mathbb{R}^p$ onto which the linear map $\Sigma^{1/2} : V \to \mathbb{R}^p$ carries $t \in V$:
\[ W := \Sigma^{1/2}(V) = \{w \in \mathbb{R}^p : w = \Sigma^{1/2}t \text{ for some } t \in V\}. \]
We denote this new measure by $\widetilde{\ell}^*(V)$. Formally,

Definition B.2.
For a subset $V \subset \mathbb{R}^p$, we define
\[ \widetilde{\ell}^*(V) := \ell^*(\Sigma^{1/2}(V)) := \mathbb{E}\sup_{t\in V}\langle t, \Sigma^{1/2}h\rangle = \mathbb{E}\sup_{t\in V}\sum_{i=1}^p g_i t_i, \tag{B.2} \]
where $t = (t_i)_{i=1}^p \in \mathbb{R}^p$, $h = (h_i)_{i=1}^p \in \mathbb{R}^p$ is a random vector with independent $N(0,1)$ random variables, and $g = \Sigma^{1/2}h$ is a random vector with dependent Gaussian random variables. We prove a bound on this measure in Lemma B.6 after we present some existing results. The subsets to which we would like to apply (2.1) and (B.2) are the sets consisting of sparse vectors: let $S^{p-1}$ be the unit sphere in $\mathbb{R}^p$, and for $1 \le m \le p$,
\[ U_m := \{x \in S^{p-1} : |\mathrm{supp}(x)| \le m\}. \tag{B.3} \]
We shall also consider the analogous subset of the Euclidean ball,
\[ \widetilde{U}_m := \{x \in B_2^p : |\mathrm{supp}(x)| \le m\}. \tag{B.4} \]
The sets $U_m$ and $\widetilde{U}_m$ are unions of the unit spheres, and unit balls, respectively, supported on $m$-dimensional coordinate subspaces of $\mathbb{R}^p$. The following three lemmas are well known and mostly standard; see Mendelson et al. (2008) and Ledoux and Talagrand (1991), for example.

Lemma B.3 (Mendelson et al. (2008, Lemma 2.2)). Given $m \ge 1$ and $\varepsilon > 0$, there exists an $\varepsilon$-cover $\Pi \subset B_2^m$ of $B_2^m$ with respect to the Euclidean metric such that $B_2^m \subset (1-\varepsilon)^{-1}\,\mathrm{conv}\,\Pi$ and $|\Pi| \le (1 + 2/\varepsilon)^m$. Similarly, there exists an $\varepsilon$-cover $\Pi' \subset S^{m-1}$ of the sphere $S^{m-1}$ such that $|\Pi'| \le (1+2/\varepsilon)^m$.

Lemma B.4 (Mendelson et al. (2008, Lemma 2.3)). For every $0 < \varepsilon \le 1/2$ and every $1 \le m \le p$, there is a set $\Pi \subset B_2^p$ which is an $\varepsilon$-cover of $\widetilde{U}_m$ such that $\widetilde{U}_m \subset 2\,\mathrm{conv}\,\Pi$, where
\[ |\Pi| \le \left(\frac{5}{2\varepsilon}\right)^m \binom{p}{m}. \tag{B.5} \]
Moreover, there exists an $\varepsilon$-cover $\Pi' \subset S^{p-1}$ of $U_m$ with cardinality at most $\left(\frac{5}{2\varepsilon}\right)^m\binom{p}{m}$.

Proof. Consider all subsets $T \subset \{1,\ldots,p\}$ with $|T| = m$; it is clear that the required sets $\Pi$ and $\Pi'$ in Lemma B.4 can be obtained as unions of the corresponding sets supported on the coordinates from $T$.
By Lemma B.3, the cardinalities of these sets are at most $(5/(2\varepsilon))^m\binom{p}{m}$.

Lemma B.5 (Ledoux and Talagrand, 1991). Let $X = (X_1,\ldots,X_N)$ be a Gaussian vector in $\mathbb{R}^N$. Then
\[ \mathbb{E}\max_{i=1,\ldots,N}|X_i| \le 3\sqrt{\log N}\,\max_{i=1,\ldots,N}\sqrt{\mathbb{E}X_i^2}. \]

We now prove the key lemma that we need for Lemma D.2. The main point of the proof follows the idea from Mendelson et al. (2008): if $U_m \subset 2\,\mathrm{conv}\,\Pi_m$ for $\Pi_m \subset B_2^p$, and there is reasonable control of the cardinality of $\Pi_m$ and of $\rho_{\max}(m)$ for $\Sigma$, then $\widetilde{\ell}^*(V)$ is bounded from above.

Lemma B.6. Let $\Pi_m$ be a $1/2$-cover of $\widetilde{U}_m$ provided by Lemma B.4. Then for $1 \le m < p/2$ and $c = 5e$, it holds that for $V = U_m$,
\[ \widetilde{\ell}^*(U_m) \le \widetilde{\ell}^*(2\,\mathrm{conv}\,\Pi_m) = 2\,\widetilde{\ell}^*(\Pi_m), \quad \text{where} \tag{B.6} \]
\[ \widetilde{\ell}^*(\Pi_m) \le 3\sqrt{m\log(cp/m)}\,\sqrt{\rho_{\max}(m)}. \tag{B.7} \]

Proof. The first inequality follows from the definition of $\widetilde{\ell}^*$ and the fact that $V = U_m \subset \widetilde{U}_m \subset 2\,\mathrm{conv}\,\Pi_m$. The second equality in (B.6) holds due to convexity, which guarantees that
\[ \sup_{y \in \mathrm{conv}\,\Pi_m}\langle y, \Sigma^{1/2}h\rangle = \sup_{y\in\Pi_m}\langle y, \Sigma^{1/2}h\rangle, \]
and hence $\widetilde{\ell}^*(2\,\mathrm{conv}\,\Pi_m) = 2\,\widetilde{\ell}^*(\mathrm{conv}\,\Pi_m) = 2\,\widetilde{\ell}^*(\Pi_m)$. Thus we have for $c = 5e$,
\[ \widetilde{\ell}^*(\mathrm{conv}\,\Pi_m) = \widetilde{\ell}^*(\Pi_m) := \mathbb{E}\sup_{t\in\Pi_m}\langle t, \Sigma^{1/2}h\rangle \le 3\sqrt{\log|\Pi_m|}\,\sup_{t\in\Pi_m}\sqrt{\mathbb{E}\langle t, \Sigma^{1/2}h\rangle^2} \le 3\sqrt{m\log(5ep/m)}\,\sup_{t\in\Pi_m}\|\Sigma^{1/2}t\|_2 \le 3\sqrt{m\log(cp/m)}\,\sqrt{\rho_{\max}(m)}, \]
where we have used Lemma B.5, (B.5), and the bound $\binom{p}{m} \le (ep/m)^m$, which is valid for $m < p/2$, together with the fact that $\mathbb{E}\langle t, \Sigma^{1/2}h\rangle^2 = \mathbb{E}\langle h, \Sigma^{1/2}t\rangle^2 = \|\Sigma^{1/2}t\|_2^2$.

B.1 Proof of Lemma 2.2

It is clear that for all $y \in \Upsilon$, $y = \Sigma^{1/2}\delta$ for some $\delta \in E_s$; hence all equalities in (2.3) hold. We hence focus on bounding the last term. For each $\delta \in E_s$, we decompose $\delta$ into a set of vectors $\delta_{T_0}, \delta_{T_1}, \delta_{T_2}, \ldots, \delta_{T_K}$ as in Section A.2. By Proposition 1.4, we have $\|\delta_{T_0^c}\|_1 \le k_0\|\delta_{T_0}\|_1$. For each index set $T \subset \{1, \ldots$
$, p\}$, we let $\delta_T$ represent its $0$-extended version $\delta'$ in $\mathbb{R}^p$, such that $\delta'_{T^c} = 0$ and $\delta'_T = \delta_T$. For $\delta_T = 0$, it is understood that $\frac{\delta_T}{\|\delta_T\|_2} := 0$ below. Thus we have for all $\delta \in E_s$ and all $h \in \mathbb{R}^p$,
\[ \langle h, \Sigma^{1/2}\delta\rangle = \langle h, \Sigma^{1/2}\delta_{T_0}\rangle + \sum_{k\ge 1}\langle h, \Sigma^{1/2}\delta_{T_k}\rangle = \|\delta_{T_0}\|_2\Big\langle \tfrac{\delta_{T_0}}{\|\delta_{T_0}\|_2}, \Sigma^{1/2}h\Big\rangle + \sum_{k\ge1}\|\delta_{T_k}\|_2\Big\langle \tfrac{\delta_{T_k}}{\|\delta_{T_k}\|_2}, \Sigma^{1/2}h\Big\rangle \le \Big(\|\delta_{T_0}\|_2 + \sum_{k\ge1}\|\delta_{T_k}\|_2\Big)\sup_{t\in U_s}\langle t, \Sigma^{1/2}h\rangle \le (k_0+2)\,K(s,k_0,\Sigma)\sup_{t\in U_s}\langle h, \Sigma^{1/2}t\rangle, \tag{B.8} \]
where we have used the following bounds (B.9) and (B.10): by Assumption 1.2 and by the construction of the corresponding sets $T_0, T_1, \ldots$, we have for all $\delta \in E_s$,
\[ \|\delta_{T_0}\|_2 \le K(s,k_0,\Sigma)\,\|\Sigma^{1/2}\delta\|_2 = K(s,k_0,\Sigma), \tag{B.9} \]
\[ \sum_{k\ge1}\|\delta_{T_k}\|_2 \le (k_0+1)\|\delta_{T_0}\|_2 \le (k_0+1)K(s,k_0,\Sigma), \tag{B.10} \]
where we used the bound in (A.3). Thus we have by (B.8) and Lemma B.6,
\[ \mathbb{E}\sup_{\delta\in E_s}\langle h, \Sigma^{1/2}\delta\rangle \le (2+k_0)\,K(s,k_0,\Sigma)\,\mathbb{E}\sup_{t\in U_s}\langle h, \Sigma^{1/2}t\rangle \le (2+k_0)\,K(s,k_0,\Sigma)\,\widetilde{\ell}^*(U_s) \le 3(2+k_0)\,K(s,k_0,\Sigma)\sqrt{s\log(cp/s)}\sqrt{\rho_{\max}(s)} =: \bar{C}\sqrt{s\log(cp/s)}, \]
where $\bar{C}$ is as defined in (1.19) and $c = 5e$. This proves Lemma 2.2.

B.2 Proof of Lemma 2.4

Let $h_1,\ldots,h_p$ be independent $N(0,1)$ Gaussian random variables. We have by Lemma B.5,
\[ \ell^*(\Phi) := \mathbb{E}\max_{i=1,\ldots,p}\Big|\sum_{j=1}^p \rho_{ij}h_j\Big| \le 3\sqrt{\log p}\,\max_{i=1,\ldots,p}\sqrt{\mathbb{E}\Big(\sum_{j=1}^p \rho_{ij}h_j\Big)^2} = 3\sqrt{\log p}\,\max_{i=1,\ldots,p}\sqrt{\sum_{j=1}^p \rho_{ij}^2\,\mathbb{E}h_j^2} = 3\sqrt{\log p}\,\max_{i=1,\ldots,p}\sqrt{\Sigma_{ii}} = 3\sqrt{\log p}, \]
where we used the fact that $\Sigma_{ii} = 1$ for all $i$ and $\sigma(h_j) = 1$ for all $j$.

C Proofs for Theorems in Section 3

Throughout this section, let $0 < \theta < 1$.
Proving both Theorem 3.1 and Theorem 3.2 involves first showing that the optimal solutions to both the Lasso and the Dantzig selector satisfy the cone constraint as in (1.4) for $I = \mathrm{supp}(\beta)$, for some $k_0 > 0$. Indeed, it holds that $k_0 = 1$ for the Dantzig selector when $\lambda_n \ge (1+\theta)\lambda_{\sigma,a,p}$, and $k_0 = 3$ for the Lasso when $\lambda_n \ge 2(1+\theta)\lambda_{\sigma,a,p}$ (cf. Lemma C.2 and (C.14)). These have been shown before, for example, in Bickel et al. (2009) and in Candès and Tao (2007); we include proofs of Lemma C.2 and (C.14) for completeness. We then state two propositions, for the Lasso estimator and the Dantzig selector respectively, under $\mathcal{T}_a$, where $a > 0$ and $0 < \theta < 1$. We first bound the probability of $\mathcal{T}_a^c$.

C.1 Bounding $\mathcal{T}_a^c$

Lemma C.1. For a fixed design $X$ with $\max_j \|X_j\|_2 \le (1+\theta)\sqrt{n}$, where $0 < \theta < 1$, we have for $\mathcal{T}_a$ as defined in (3.2), where $a > 0$,
\[ \mathbb{P}(\mathcal{T}_a^c) \le \big(\sqrt{\pi\log p}\,p^a\big)^{-1}. \]

Proof. Define random variables $Y_j = \frac{1}{n}\sum_{i=1}^n \epsilon_i X_{i,j}$. Note that $\max_{1\le j\le p}|Y_j| = \|X^T\epsilon/n\|_\infty$. We have $\mathbb{E}(Y_j) = 0$ and $\mathrm{Var}(Y_j) = \|X_j\|_2^2\,\sigma^2/n^2 \le (1+\theta)^2\sigma^2/n$. Let $c_0 = 1+\theta$. Obviously, $Y_j$ has its tail probability dominated by that of $Z \sim N\big(0, \tfrac{c_0^2\sigma^2}{n}\big)$:
\[ \mathbb{P}(|Y_j| \ge t) \le \mathbb{P}(|Z| \ge t) \le \frac{2c_0\sigma}{\sqrt{2\pi n}\,t}\exp\Big(-\frac{nt^2}{2c_0^2\sigma^2}\Big). \]
We can now apply the union bound to obtain
\[ \mathbb{P}\Big(\max_{1\le j\le p}|Y_j| \ge t\Big) \le p\,\frac{\sqrt{2}\,c_0\sigma}{\sqrt{\pi n}\,t}\exp\Big(-\frac{nt^2}{2c_0^2\sigma^2}\Big) = \exp\Big(-\frac{nt^2}{2c_0^2\sigma^2} - \log\frac{t\sqrt{\pi n}}{\sqrt{2}\,c_0\sigma} + \log p\Big). \]
By choosing $t = c_0\sigma\sqrt{1+a}\,\sqrt{2\log p/n}$, the right-hand side is bounded by $(\sqrt{\pi\log p}\,p^a)^{-1}$ for $a \ge 0$.

C.2 Proof of Theorem 3.1

Let $\widehat{\beta}$ be an optimal solution to the Lasso as in (1.2). Let $S := \mathrm{supp}(\beta)$ and $\upsilon = \widehat{\beta} - \beta$. We first show Lemma C.2; we then apply condition $RE(s,k_0,X)$ to $\upsilon$ with $k_0 = 3$ under $\mathcal{T}_a$ to show Proposition C.3. Theorem 3.1 follows immediately from Proposition C.3.

Lemma C.2 (Bickel et al.
(2009)). Under condition $\mathcal{T}_a$ as defined in (3.2), $\|\upsilon_{S^c}\|_1 \le 3\|\upsilon_S\|_1$ for $\lambda_n \ge 2(1+\theta)\lambda_{\sigma,a,p}$ for the Lasso.

Proof. By the optimality of $\widehat{\beta}$, we have
\[ \lambda_n\|\beta\|_1 - \lambda_n\|\widehat{\beta}\|_1 \ge \frac{1}{2n}\|Y - X\widehat{\beta}\|_2^2 - \frac{1}{2n}\|Y - X\beta\|_2^2 \ge \frac{1}{2n}\|X\upsilon\|_2^2 - \frac{\upsilon^T X^T\epsilon}{n}. \]
Hence under condition $\mathcal{T}_a$ as in (3.2), we have for $\lambda_n \ge 2(1+\theta)\lambda_{\sigma,a,p}$,
\[ \|X\upsilon\|_2^2/n \le 2\lambda_n\|\beta\|_1 - 2\lambda_n\|\widehat{\beta}\|_1 + 2\Big\|\frac{X^T\epsilon}{n}\Big\|_\infty\|\upsilon\|_1 \le \lambda_n\big(2\|\beta\|_1 - 2\|\widehat{\beta}\|_1 + \|\upsilon\|_1\big), \tag{C.1} \]
where by the triangle inequality and $\beta_{S^c} = 0$, we have
\[ 0 \le 2\|\beta\|_1 - 2\|\widehat{\beta}\|_1 + \|\upsilon\|_1 = 2\|\beta_S\|_1 - 2\|\widehat{\beta}_S\|_1 - 2\|\upsilon_{S^c}\|_1 + \|\upsilon_S\|_1 + \|\upsilon_{S^c}\|_1 \le 3\|\upsilon_S\|_1 - \|\upsilon_{S^c}\|_1. \tag{C.2} \]
Thus Lemma C.2 holds.

We now show Proposition C.3, where except for the $\ell_2$-convergence rate as in (C.5), all bounds have essentially been shown in Bickel et al. (2009) (as Theorem 7.2) under Assumption $RE(s,3,X)$. The bound on $\|\upsilon\|_2$, which as far as the author is aware is new, is indeed also implied by Theorem 7.2 in Bickel et al. (2009) given Proposition A.1 as derived in this paper. We note that the same remark holds for Proposition C.5; see Bickel et al. (2009, Theorem 7.1).

Proposition C.3 ($\ell_p$-loss for the Lasso). Suppose that $RE(s,3,X)$ holds. Let $Y = X\beta + \epsilon$, for $\epsilon$ being i.i.d. $N(0,\sigma^2)$ and $\|X_j\|_2 \le (1+\theta)\sqrt{n}$. Let $\widehat{\beta}$ be an optimal solution to (1.2) with $\lambda_n \ge 2(1+\theta)\lambda_{\sigma,a,p}$, where $a \ge 0$. Let $\upsilon = \widehat{\beta} - \beta$. Then on condition $\mathcal{T}_a$ as in (3.2), the following hold with $B_0 = 4K^2(s,3,X)$:
\[ \|\upsilon_S\|_2 \le B_0\lambda_n\sqrt{s}, \tag{C.3} \]
\[ \|\upsilon\|_1 \le B_0\lambda_n s, \quad\text{where } \|\upsilon_{S^c}\|_1 \le 3\|\upsilon_S\|_1, \tag{C.4} \]
\[ \|\upsilon\|_2 \le 2B_0\lambda_n\sqrt{s}. \tag{C.5} \]

Proof. The first part of this proof follows that of Bickel et al. (2009).
Now under condition $\mathcal{T}_a$, by (C.1) and (C.2),
\[ \|X\upsilon\|_2^2/n + \lambda_n\|\upsilon\|_1 \le \lambda_n\big(3\|\upsilon_S\|_1 - \|\upsilon_{S^c}\|_1 + \|\upsilon_S\|_1 + \|\upsilon_{S^c}\|_1\big) = 4\lambda_n\|\upsilon_S\|_1 \le 4\lambda_n\sqrt{s}\,\|\upsilon_S\|_2 \tag{C.6} \]
\[ \le 4\lambda_n\sqrt{s}\,K(s,3,X)\,\|X\upsilon\|_2/\sqrt{n} \tag{C.7} \]
\[ \le 4K^2(s,3,X)\lambda_n^2 s + \|X\upsilon\|_2^2/n, \tag{C.8} \]
where (C.7) holds by the definition of $RE(s,3,X)$. Thus we have by (C.8) that
\[ \|\upsilon_S\|_1 \le \|\upsilon\|_1 \le 4K^2(s,3,X)\lambda_n s, \tag{C.9} \]
which implies that (C.4) holds with $B_0 = 4K^2(s,3,X)$. Now by $RE(s,3,X)$ and (C.6), we have
\[ \|\upsilon_S\|_2^2 \le K^2(s,3,X)\,\|X\upsilon\|_2^2/n \le 4K^2(s,3,X)\lambda_n\sqrt{s}\,\|\upsilon_S\|_2, \tag{C.10} \]
which immediately implies that (C.3) holds. Finally, we have by (A.5), (C.9), (1.14) and the $RE(s,3,X)$ condition,
\[ \|\upsilon\|_2 \le \|\upsilon_{T_0}\|_2 + s^{-1/2}\|\upsilon\|_1 \le K(s,3,X)\,\|X\upsilon\|_2/\sqrt{n} + 4K^2(s,3,X)\lambda_n\sqrt{s} \tag{C.11} \]
\[ \le K(s,3,X)\sqrt{4\lambda_n\|\upsilon_S\|_1} + 4K^2(s,3,X)\lambda_n\sqrt{s} \tag{C.12} \]
\[ \le 8\lambda_n K^2(s,3,X)\sqrt{s}, \tag{C.13} \]
where in (C.11) we crucially exploit the universality of the RE condition; in (C.12) we use the bound in (C.6); and in (C.13) we use (C.9).

C.3 Proof of Theorem 3.2

Let $\widehat{\beta}$ be an optimal solution to the Dantzig selector as in (1.3). Let $S := \mathrm{supp}(\beta)$ and $\upsilon = \widehat{\beta} - \beta$. We first show Lemma C.4; we then apply condition $RE(s,k_0,X)$ to $\upsilon$ with $k_0 = 1$ under $\mathcal{T}_a$ to show Proposition C.5. Theorem 3.2 follows immediately from Proposition C.5.

Lemma C.4 (Candès and Tao (2007)). Under condition $\mathcal{T}_a$, $\|\upsilon_{S^c}\|_1 \le \|\upsilon_S\|_1$ for $\lambda_n \ge (1+\theta)\lambda_{\sigma,a,p}$, where $a \ge 0$ and $0 < \theta < 1$, for the Dantzig selector.

Proof. Clearly the true vector $\beta$ is feasible to (1.3), as
\[ \Big\|\frac{1}{n}X^T(Y - X\beta)\Big\|_\infty = \Big\|\frac{1}{n}X^T\epsilon\Big\|_\infty \le (1+\theta)\lambda_{\sigma,a,p} \le \lambda_n; \]
hence by the optimality of $\widehat{\beta}$, $\|\widehat{\beta}\|_1 \le \|\beta\|_1$. Hence it holds for $\upsilon = \widehat{\beta} - \beta$ that
\[ \|\beta\|_1 - \|\upsilon_S\|_1 + \|\upsilon_{S^c}\|_1 \le \|\beta + \upsilon\|_1 = \|\widehat{\beta}\|_1 \le \|\beta\|_1, \tag{C.14} \]
and hence $\upsilon$ obeys the cone constraint as desired.
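As an illustration (ours, not from the paper), the Lasso cone constraint of Lemma C.2 is easy to observe empirically. The sketch below fits the Lasso objective (1.2) with a plain coordinate-descent solver written for this purpose (the function `lasso_cd` and all parameter values are our own choices), sets $\lambda_n = 2\|X^T\epsilon/n\|_\infty$ so that the requirement of Lemma C.2 holds by construction, and checks the $k_0 = 3$ cone constraint:

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, sweeps=500):
    """Coordinate descent for (1/(2n))||y - X b||_2^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                      # residual y - X b
    for _ in range(sweeps):
        for j in range(p):
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new)
            b[j] = new
    return b

rng = np.random.default_rng(1)
n, p, s, sigma = 200, 50, 5, 1.0
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 3.0
eps = sigma * rng.standard_normal(n)
y = X @ beta + eps
# choose lam so that lam >= 2 ||X^T eps / n||_inf, as required by Lemma C.2
lam = 2.0 * np.abs(X.T @ eps / n).max()
v = lasso_cd(X, y, lam) - beta
S, Sc = np.arange(s), np.arange(s, p)
# cone constraint with k0 = 3: ||v_{S^c}||_1 <= 3 ||v_S||_1
assert np.abs(v[Sc]).sum() <= 3 * np.abs(v[S]).sum() + 1e-3
```

Once $\lambda_n \ge 2\|X^T\epsilon/n\|_\infty$, the derivation (C.1)–(C.2) guarantees the final assertion deterministically at the exact minimizer, so the check should pass for any seed, up to solver tolerance.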
Proposition C.5 ($\ell_p$-loss for the Dantzig selector). Suppose that $RE(s,1,X)$ holds. Let $Y = X\beta + \epsilon$, for $\epsilon$ being i.i.d. $N(0,\sigma^2)$ and $\|X_j\|_2 \le (1+\theta)\sqrt{n}$. Let $\widehat{\beta}$ be an optimal solution to (1.3) with $\lambda_n \ge (1+\theta)\lambda_{\sigma,a,p}$, where $a \ge 0$ and $0 < \theta < 1$. Then on condition $\mathcal{T}_a$ as in (3.2), the following hold with $B_1 = 4K^2(s,1,X)$:
\[ \|\upsilon_S\|_2 \le B_1\lambda_n\sqrt{s}, \tag{C.15} \]
\[ \|\upsilon\|_1 \le 2B_1\lambda_n s, \quad\text{where } \|\upsilon_{S^c}\|_1 \le \|\upsilon_S\|_1, \tag{C.16} \]
\[ \|\upsilon\|_2 \le 3B_1\lambda_n\sqrt{s}. \tag{C.17} \]

Remark C.6. See the comments preceding Proposition C.3.

Proof of Proposition C.5. Our proof follows that of Bickel et al. (2009). Let $\widehat{\beta}$ be an optimal solution to (1.3). Let $\upsilon = \widehat{\beta} - \beta$, and let $\mathcal{T}_a$ hold for $a > 0$ and $0 < \theta < 1$. By the constraint of (1.3), we have
\[ \Big\|\frac{1}{n}X^T X\upsilon\Big\|_\infty \le \Big\|\frac{1}{n}X^T(Y - X\widehat{\beta})\Big\|_\infty + \Big\|\frac{1}{n}X^T\epsilon\Big\|_\infty \le 2\lambda_n, \]
and hence by Lemma C.4, we have
\[ \|X\upsilon\|_2^2/n = \frac{\upsilon^T X^T X\upsilon}{n} \le \Big\|\frac{1}{n}X^T X\upsilon\Big\|_\infty\|\upsilon\|_1 \le 2\lambda_n\|\upsilon\|_1 \le 4\lambda_n\|\upsilon_S\|_1 \le 4\lambda_n\sqrt{s}\,\|\upsilon_S\|_2. \tag{C.18} \]
We now apply condition $RE(s,k_0,X)$ to $\upsilon$ with $k_0 = 1$ to obtain
\[ \|\upsilon_S\|_2^2 \le K^2(s,1,X)\,\|X\upsilon\|_2^2/n \le 4K^2(s,1,X)\lambda_n\sqrt{s}\,\|\upsilon_S\|_2, \tag{C.19} \]
which immediately implies that (C.15) holds. Hence (C.16) holds with $B_1 = 4K^2(s,1,X)$, given (C.19) and
\[ \|\upsilon_{S^c}\|_1 \le \|\upsilon_S\|_1 \le 4K^2(s,1,X)\lambda_n s. \tag{C.20} \]
Finally, we have by (A.5), (C.20), (1.14) and the $RE(s,1,X)$ condition,
\[ \|\upsilon\|_2 \le \|\upsilon_{T_0}\|_2 + s^{-1/2}\|\upsilon\|_1 \le K(s,1,X)\,\|X\upsilon\|_2/\sqrt{n} + 8K^2(s,1,X)\lambda_n\sqrt{s} \tag{C.21} \]
\[ \le K(s,1,X)\sqrt{4\lambda_n\|\upsilon_S\|_1} + 8K^2(s,1,X)\lambda_n\sqrt{s} \tag{C.22} \]
\[ \le 12\lambda_n K^2(s,1,X)\sqrt{s}, \tag{C.23} \]
where in (C.21) we crucially exploit the universality of the RE condition; in (C.22) we use the bounds in (C.20) and (C.18); and in (C.23) we use (C.20) again.
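To close this section with a numerical aside (ours, not from the paper), the tail bound of Lemma C.1 can be probed by simulation. The threshold `t` below follows the choice made in the proof of Lemma C.1; all other parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma, theta, a = 100, 500, 1.0, 0.1, 1.0
c0 = 1 + theta
# t = c0 * sigma * sqrt(1+a) * sqrt(2 log p / n), as in the proof of Lemma C.1
t = c0 * sigma * np.sqrt(2 * (1 + a) * np.log(p) / n)

trials, exceed = 300, 0
for _ in range(trials):
    X = rng.standard_normal((n, p))
    # normalize columns to ||X_j||_2 = sqrt(n) <= (1 + theta) sqrt(n)
    X *= np.sqrt(n) / np.linalg.norm(X, axis=0)
    eps = sigma * rng.standard_normal(n)
    if np.abs(X.T @ eps / n).max() >= t:
        exceed += 1

emp = exceed / trials                                     # empirical exceedance rate
bound = 1.0 / (np.sqrt(np.pi * np.log(p)) * p ** a)       # Lemma C.1 bound
# crude Monte Carlo check: the empirical rate should not exceed the bound
# by more than sampling error (the bound is ~5e-4 here, so exceedances are rare)
assert emp <= bound + 0.05
```

With these values the threshold sits more than five standard deviations out for each coordinate, so the event essentially never occurs over 300 trials, consistent with the bound.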
D A fundamental proof for the Gaussian random design

In this section, we state a theorem for the Gaussian random design, following a more fundamental proof given by Raskutti et al. (2009) (cf. Proposition 1 therein). We apply their method and provide a tighter bound on the sample size that is required in order for $X$ to satisfy the RE condition, where $X$ is composed of independent rows drawn as multivariate Gaussian vectors from $N(0,\Sigma)$ as in (1.18). We note that both the upper and lower bounds in Theorem D.1 are obtained in a way that is quite similar to how the largest and smallest singular values of a Gaussian random matrix are upper and lower bounded, respectively; see, for example, Davidson and Szarek (2001). The improvement over the results in Raskutti et al. (2009) comes from the tighter bound on $\ell^*(\Upsilon)$ as developed in Lemma 2.2. Formally, we have the following.

Theorem D.1. Let $1 \le n \le p$ and $0 < \theta < 1$. Consider a random design $X$ as in (1.11), where $\Sigma$ satisfies (1.12) and (1.16). Suppose $s < p/2$ and, for $\bar{C}$ as in (1.19),
\[ n > \frac{1}{\theta^2}\Big(\bar{C}\sqrt{s\log(5ep/s)} + \sqrt{2d\log p}\Big)^2 \tag{D.1} \]
for $d > 0$. Then we have with probability at least $1 - 4/p^d$ that
\[ (1 - \theta - o(1))\,\|\Sigma^{1/2}\delta\|_2 \le \|X\delta\|_2/\sqrt{n} \le (1+\theta)\,\|\Sigma^{1/2}\delta\|_2 \tag{D.2} \]
holds for all $\delta \ne 0$ that are admissible to (1.12), that is, there exists some $J_0 \subseteq \{1,\ldots,p\}$ such that $|J_0| \le s$ and $\|\delta_{J_0^c}\|_1 \le k_0\|\delta_{J_0}\|_1$, where $k_0 > 0$.

Proof. We only provide a sketch here; see Raskutti et al. (2009) for details. Using Slepian's lemma and its extension by Gordon (1985), the following inequalities have been derived by Raskutti et al. (2009) (cf. the proof of Proposition 1 therein):
\[ \mathbb{E}\inf_{\delta\in E_s}\|X\delta\|_2 \ge \mathbb{E}\|g\|_2 - \mathbb{E}\sup_{\delta\in E_s}\langle h, \Sigma^{1/2}\delta\rangle, \]
\[ \mathbb{E}\sup_{\delta\in E_s}\|X\delta\|_2 \le \sqrt{n} + \mathbb{E}\sup_{\delta\in E_s}\langle h, \Sigma^{1/2}\delta\rangle, \]
where $g$ and $h$ are random vectors with i.i.d. Gaussian $N(0,1)$ elements in $\mathbb{R}^n$ and $\mathbb{R}^p$, respectively.
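As a numerical aside, the conclusion (D.2) toward which these inequalities are driving is easy to observe in simulation. The sketch below (ours; the diagonal $\Sigma$, the cone-sampling scheme, and all parameter values are arbitrary choices) draws random vectors from the cone and checks the two-sided bound for a single random design:

```python
import numpy as np

rng = np.random.default_rng(3)
p, s, k0, theta = 100, 4, 1.0, 0.3
Sigma_half = np.diag(0.5 + rng.random(p))   # an arbitrary diagonal Sigma^{1/2}
n = 4000                                    # comfortably large for this small (s, p)
A = rng.standard_normal((n, p))             # i.i.d. N(0,1) entries
X = A @ Sigma_half                          # rows ~ N(0, Sigma)

ratios = []
for _ in range(200):
    # sample a vector from the cone: ||delta_{J0^c}||_1 <= k0 ||delta_{J0}||_1
    delta = np.zeros(p)
    J0 = rng.choice(p, size=s, replace=False)
    delta[J0] = rng.standard_normal(s)
    rest = np.setdiff1d(np.arange(p), J0)
    z = rng.standard_normal(p - s)
    delta[rest] = z * k0 * np.abs(delta[J0]).sum() / np.abs(z).sum()
    ratios.append(np.linalg.norm(X @ delta) / np.sqrt(n)
                  / np.linalg.norm(Sigma_half @ delta))

ratios = np.array(ratios)
# (D.2): the ratio ||X d||_2 / (sqrt(n) ||Sigma^{1/2} d||_2) is sandwiched
# between 1 - theta (ignoring the o(1) term) and 1 + theta
assert 1 - theta < ratios.min() <= ratios.max() < 1 + theta
```

For a fixed $\delta$, the ratio concentrates around $1$ at rate $O(1/\sqrt{n})$, so at this sample size the sandwich holds with a wide margin; the content of Theorem D.1 is that it holds uniformly over the entire cone.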
Now Lemma D.2 follows immediately after we plug in the bound from Lemma 2.2 on $\ell^*(\Upsilon) := \mathbb{E}\sup_{\delta\in E_s}\langle h, \Sigma^{1/2}\delta\rangle$.

Lemma D.2. Suppose $\Sigma$ satisfies Assumption 1.2. Then for $\bar{C}$ as in Theorem D.1, we have
\[ \mathbb{E}\inf_{\delta\in E_s}\|X\delta\|_2 \ge \sqrt{n} - o(\sqrt{n}) - \bar{C}\sqrt{s\log(5ep/s)}, \tag{D.3} \]
\[ \mathbb{E}\sup_{\delta\in E_s}\|X\delta\|_2 \le \sqrt{n} + \bar{C}\sqrt{s\log(5ep/s)}. \tag{D.4} \]

We then apply the concentration-of-measure inequality to $\inf_{\delta\in E_s}\|X\delta\|_2$, for which it is well known that the $1$-Lipschitz condition holds for $\inf_{\delta\in E_s}\|X\delta\|_2 = \inf_{\delta\in E_s}\|A\Sigma^{1/2}\delta\|_2$, where $A$ is a matrix with i.i.d. standard normal random variables in $\mathbb{R}^{n\times p}$. Recall that a function $f : X \to Y$ is called $1$-Lipschitz if for all $x, y \in X$, $d_Y(f(x), f(y)) \le d_X(x,y)$.

Proposition D.3. View the Gaussian random matrix $A$ as a canonical Gaussian vector in $\mathbb{R}^{np}$. Let $f(A) := \inf_{\delta\in E_s}\|A\Sigma^{1/2}\delta\|_2$ and $f'(A) := \sup_{\delta\in E_s}\|A\Sigma^{1/2}\delta\|_2$ be two functions of $A$ from $\mathbb{R}^{np}$ to $\mathbb{R}$. Then $f, f' : \mathbb{R}^{np} \to \mathbb{R}$ are $1$-Lipschitz:
\[ |f(A) - f(B)| \le \|A - B\|_2 \le \|A - B\|_F, \qquad |f'(A) - f'(B)| \le \|A - B\|_2 \le \|A - B\|_F. \]
Finally, we apply the concentration of measure in Gauss space to obtain, for $t > 0$,
\[ \mathbb{P}(|f(A) - \mathbb{E}f(A)| > t) \le 2\exp(-t^2/2), \quad\text{and} \tag{D.5} \]
\[ \mathbb{P}(|f'(A) - \mathbb{E}f'(A)| > t) \le 2\exp(-t^2/2). \tag{D.6} \]
Now it is clear that with probability at least $1 - 4/p^d$, where $d > 0$, we have for $X = A\Sigma^{1/2}$,
\[ \inf_{\delta\in E_s}\|A\Sigma^{1/2}\delta\|_2 =: f(A) \ge \mathbb{E}\inf_{\delta\in E_s}\|A\Sigma^{1/2}\delta\|_2 - \sqrt{2d\log p} \ge \sqrt{n} - o(\sqrt{n}) - \bar{C}\sqrt{s\log(5ep/s)} - \sqrt{2d\log p}, \]
which we denote as event $\mathcal{F}$, and
\[ \sup_{\delta\in E_s}\|A\Sigma^{1/2}\delta\|_2 =: f'(A) \le \mathbb{E}\sup_{\delta\in E_s}\|A\Sigma^{1/2}\delta\|_2 + \sqrt{2d\log p} \le \sqrt{n} + \bar{C}\sqrt{s\log(5ep/s)} + \sqrt{2d\log p}, \]
which we denote as event $\mathcal{F}'$. Now it is clear that (D.2) holds on $\mathcal{F}\cap\mathcal{F}'$, given (D.1).

References

Adamczak, R., Litvak, A. E., Pajor, A. and Tomczak-Jaegermann, N. (2009).
Restricted isometry property of matrices with independent columns and neighborly polytopes by random sampling. arXiv:0904.4723v1.

Baraniuk, R. G., Davenport, M., DeVore, R. A. and Wakin, M. B. (2008). A simple proof of the restricted isometry property for random matrices. Constructive Approximation 28 253–263.

Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics 37 1705–1732.

Candès, E., Romberg, J. and Tao, T. (2006). Stable signal recovery from incomplete and inaccurate measurements. Communications in Pure and Applied Mathematics 59 1207–1223.

Candès, E. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Info. Theory 51 4203–4215.

Candès, E. and Tao, T. (2006). Near optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Info. Theory 52 5406–5425.

Candès, E. and Tao, T. (2007). The Dantzig selector: statistical estimation when p is much larger than n. Annals of Statistics 35 2313–2351.

Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM Journal on Scientific and Statistical Computing 20 33–61.

Davidson, K. R. and Szarek, S. (2001). Local operator theory, random matrices and Banach spaces. Handbook on the Geometry of Banach Spaces 1 317–366.

Gordon, Y. (1985). Some inequalities for Gaussian processes and applications. Israel Journal of Mathematics 50 265–289.

Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Springer.

Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Annals of Statistics 37 246–270.
Mendelson, S., Pajor, A. and Tomczak-Jaegermann, N. (2007). Reconstruction and subgaussian operators in asymptotic geometric analysis. Geometric and Functional Analysis 17 1248–1282.

Mendelson, S., Pajor, A. and Tomczak-Jaegermann, N. (2008). Uniform uncertainty principle for Bernoulli and subgaussian ensembles. Constructive Approximation 28 277–289.

Raskutti, G., Wainwright, M. and Yu, B. (2009). Minimax rates of estimation for high-dimensional linear regression over ℓq-balls. In Allerton Conference on Control, Communication and Computing. Longer version in arXiv:0910.2042v1.

Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.

Wasserman, L. and Roeder, K. (2009). High dimensional variable selection. The Annals of Statistics 37 2178–2201.

Zhou, S. (2009). Thresholding procedures for high dimensional variable selection and statistical estimation. In Advances in Neural Information Processing Systems 22. MIT Press.

Zhou, S., van de Geer, S. and Bühlmann, P. (2009). Adaptive Lasso for high dimensional regression and Gaussian graphical modeling. arXiv:0903.2515.