Compressed Sensing and Matrix Completion with Constant Proportion of Corruptions

Xiaodong Li
Department of Mathematics, Stanford University, Stanford, CA 94305

Abstract

In this paper we improve existing results in the field of compressed sensing and matrix completion when sampled data may be grossly corrupted. We introduce three new theorems. 1) In compressed sensing, we show that if the m × n sensing matrix has independent Gaussian entries, then one can recover a sparse signal x exactly by tractable ℓ1 minimization even if a positive fraction of the measurements are arbitrarily corrupted, provided the number of nonzero entries in x is O(m/(log(n/m) + 1)). 2) In the very general sensing model introduced in [7], and assuming a positive fraction of corrupted measurements, exact recovery still holds if the signal now has O(m/log^2 n) nonzero entries. 3) Finally, we prove that one can recover an n × n low-rank matrix from m corrupted sampled entries by tractable optimization provided the rank is on the order of O(m/(n log^2 n)); again, this holds when there is a positive fraction of corrupted samples.

Keywords. Compressed Sensing, Matrix Completion, Robust PCA, Convex Optimization, Restricted Isometry Property, Golfing Scheme.

1 Introduction

1.1 Introduction to Compressed Sensing with Corruptions

Compressed sensing (CS) has been well studied in recent years [9, 19]. This novel theory asserts that a sparse or approximately sparse signal x ∈ R^n can be acquired by taking just a few non-adaptive linear measurements. This fact has numerous consequences which are being explored in a number of fields of applied science and engineering. In CS, the acquisition procedure is often represented as y = Ax, where A ∈ R^{m×n} is called the sensing matrix and y ∈ R^m is the vector of measurements or observations.
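As a concrete illustration of the ℓ1-minimization recovery at the heart of CS, here is a minimal numerical sketch (our own toy example, not from the paper; the sizes and the LP reformulation are illustrative choices): basis pursuit, min ||x||_1 subject to Ax = y, becomes a linear program after splitting x into positive and negative parts.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, s = 40, 50, 3                      # toy sizes: measurements, dimension, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)   # iid Gaussian sensing matrix
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
y = A @ x_true

# min ||x||_1  s.t.  Ax = y, as an LP in (x_plus, x_minus) with x = x_plus - x_minus
c = np.ones(2 * n)                       # objective: sum of both nonnegative parts
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
x_hat = res.x[:n] - res.x[n:]            # recovered signal
```

With enough Gaussian measurements relative to the sparsity, the LP typically returns x exactly, which is the phenomenon the theory above quantifies.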
It is now well established that the solution x̂ to the optimization problem

min_{x̃} ||x̃||_1 subject to Ax̃ = y, (1.1)

is guaranteed to be the original signal x with high probability, provided x is sufficiently sparse and A obeys certain conditions. A typical result is this: if A has iid Gaussian entries, then exact recovery occurs provided ||x||_0 ≤ Cm/(log(n/m) + 1) [10, 18, 37] for some positive numerical constant C > 0. Here is another example: if A is a matrix with rows randomly selected from the DFT matrix, the condition becomes ||x||_0 ≤ Cm/log n [9].

This paper discusses a natural generalization of CS, which we shall refer to as compressed sensing with corruptions. We assume that some entries of the data vector y are totally corrupted but we have absolutely no idea which entries are unreliable. We still want to recover the original signal efficiently and accurately. Formally, we have the mathematical model

y = Ax + f = [A, I][x; f], (1.2)

where x ∈ R^n and f ∈ R^m. The number of nonzero coefficients in x is ||x||_0, and similarly for f. As in the above model, A is an m × n sensing matrix, usually sampled from a probability distribution. The problem of recovering x (and hence f) from y has recently been studied in the literature in connection with some interesting applications. We discuss a few of them.

• Clipping. Signal clipping frequently appears because of nonlinearities in the acquisition device [27, 38]. Here, one typically measures g(Ax) rather than Ax, where g is a nonlinear map. Letting f = g(Ax) − Ax, we thus observe y = Ax + f. Nonlinearities usually occur at large amplitudes, so that for those components with small amplitudes we have f = g(Ax) − Ax = 0. This means that f is sparse and, therefore, our model is appropriate.
Just as before, locating the portion of the data vector that has been clipped may be difficult because of additional noise.

• CS for networked data. In a sensor network, different sensors collect measurements of the same signal x independently (each measures z_i = ⟨a_i, x⟩) and send the outcome to a center hub for analysis [23, 30]. Setting the a_i as the row vectors of A, this is just z = Ax. However, typically some sensors will fail to send the measurements correctly, and will sometimes report totally meaningless measurements. Therefore, we collect y = Ax + f, where f models recording errors.

There have been several theoretical papers investigating exact recovery for CS with corruptions [28-30, 38, 40], and all of them consider the following recovery procedure in the noiseless case:

min_{x̃, f̃} ||x̃||_1 + λ(m, n)||f̃||_1 subject to Ax̃ + f̃ = [A, I][x̃; f̃] = y. (1.3)

We will compare them with our results in Section 1.4.

1.2 Introduction to Matrix Completion with Corruptions

Matrix completion (MC) bears some similarity with CS. Here, the goal is to recover a low-rank matrix L ∈ R^{n×n} from a small fraction of linear measurements. For simplicity, we suppose the matrix is square as above (the general case is similar). The standard model is that we observe P_O(L), where O ⊂ [n] × [n] := {1, ..., n} × {1, ..., n} and

(P_O(L))_{ij} = L_{ij} if (i, j) ∈ O, and 0 otherwise.

The problem is to recover the original matrix L, and there have been many papers studying this problem in recent years; see [8, 12, 21, 26, 33], for example. Here one minimizes the nuclear norm (the sum of all the singular values [20]) to recover the original low-rank matrix. We discuss below an improved result due to Gross [21] (with a slight difference).
Define O ∼ Ber(ρ) for some 0 < ρ < 1 to mean that the indicators 1_{(i,j)∈O} are iid Bernoulli random variables with parameter ρ. Then the solution to

min_{L̃} ||L̃||_* subject to P_O(L̃) = P_O(L), (1.4)

is guaranteed to be exactly L with high probability, provided ρ ≥ C_ρ µr log^2 n / n. Here, C_ρ is a positive numerical constant, r is the rank of L, and µ is an incoherence parameter introduced in [8] which depends only on L.

This paper is concerned with the situation in which some entries may have been corrupted. Therefore, our model is that we observe

P_O(L) + S, (1.5)

where O and L are the same as before and S ∈ R^{n×n} is supported on Ω ⊂ O. Just as in CS, this model has broad applicability. For example, Wu et al. used this model in photometric stereo [42]. This problem has also been introduced in [4] and is related to recent work on separating a low-rank component from a sparse component [4, 13, 14, 24, 43]. A typical result is that the solution (L̂, Ŝ) to

min_{L̃, S̃} ||L̃||_* + λ(m, n)||S̃||_1 subject to P_O(L̃) + S̃ = P_O(L) + S, (1.6)

is guaranteed to be the true pair (L, S) with high probability under some assumptions about L, O, S [4, 16]. We will compare them with our result in Section 1.4.

1.3 Main Results

This section introduces three models and three corresponding recovery results. The proofs of these results are deferred to Section 2 for Theorem 1.1, Section 3 for Theorem 1.2 and Section 4 for Theorem 1.3.

1.3.1 CS with iid Matrices [Model 1]

Theorem 1.1 Suppose that A is an m × n (m < n) random matrix whose entries are iid Gaussian variables with mean 0 and variance 1/m, the signal to acquire is x ∈ R^n, and our observation is y = Ax + f + w, where f, w ∈ R^m and ||w||_2 ≤ ε.
Then, choosing λ(n, m) = 1/√(log(n/m) + 1), the solution (x̂, f̂) to

min_{x̃, f̃} ||x̃||_1 + λ||f̃||_1 subject to ||(Ax̃ + f̃) − y||_2 ≤ ε (1.7)

satisfies ||x̂ − x||_2 + ||f̂ − f||_2 ≤ Kε with probability at least 1 − C exp(−cm). This holds universally; that is to say, for all vectors x and f obeying ||x||_0 ≤ αm/(log(n/m) + 1) and ||f||_0 ≤ αm. Here α, C, c and K are numerical constants.

In the above statement, the matrix A is random; everything else is deterministic. The reader will notice that the number of nonzero entries is on the same order as that needed for recovery from clean data [3, 10, 19, 37], while the condition on f implies that one can tolerate a constant fraction of possibly adversarial errors. Moreover, our convex optimization is related to the LASSO [35] and Basis Pursuit [15].

1.3.2 CS with General Sensing Matrices [Model 2]

In this model, m < n and

A = (1/√m) [a_1^*; ...; a_m^*],

where a_1, ..., a_m are iid copies of a random vector a whose distribution obeys the following two properties: 1) E aa^* = I; 2) ||a||_∞ ≤ √µ. This model was introduced in [7] and includes many of the stochastic models used in the literature. Examples include partial DFT matrices, matrices with iid entries, certain random convolutions [34], and so on. In this model, we assume that x and f in (1.2) have fixed supports, denoted by T and B, with cardinalities |T| = s and |B| = m_b. In the remainder of the paper, x_T is the restriction of x to indices in T and f_B is the restriction of f to B. Our main assumption here concerns the sign sequences: the sign sequences of x_T and f_B are independent of each other, and each is a sequence of symmetric iid ±1 variables.
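To make properties 1) and 2) concrete, the following sketch (our own toy construction, not from the paper) builds sensing vectors from the rows of a random orthogonal matrix and checks the isotropy condition E aa^* = I and the incoherence bound ||a||_∞ ≤ √µ numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
# A random orthogonal matrix: its rows form an orthonormal basis of R^n
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
rows = np.sqrt(n) * Q                    # candidate sensing vectors a_i = sqrt(n) * q_i

# Isotropy: averaging a_i a_i^T over a uniformly chosen row gives exactly the identity,
# since sum_i q_i q_i^T = Q^T Q = I for an orthogonal Q
second_moment = sum(np.outer(a, a) for a in rows) / n

# Incoherence parameter in the sense n * max_ij |A_ij|^2 <= mu (here with equality)
mu = n * np.max(np.abs(Q)) ** 2
```

Since every row of Q has unit norm, µ is always at least 1, matching the remark later in the paper that µ ≥ 1; for very flat orthogonal matrices (e.g. the DFT) µ is a constant.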
Theorem 1.2 For the model above, the solution (x̂, f̂) to (1.3), with λ(n, m) = 1/√(log n), is exact with probability at least 1 − Cn^{−3}, provided that s ≤ αm/(µ log^2 n) and m_b ≤ βm/µ. Here C, α and β are numerical constants.

Above, x and f have fixed supports and random signs. However, by a recent de-randomization technique first introduced in [4], exact recovery with random supports and fixed signs would also hold. We will explain this de-randomization technique in the proof of Theorem 1.3. In some specific models, such as independent rows from the DFT matrix, µ can be a numerical constant, which implies the proportion of corruptions is also a constant. An open problem is whether Theorem 1.2 still holds in the case where x and f have both fixed supports and fixed signs. Another open problem is whether the result would hold under more general conditions on A, as in [6], in the case where x has both random support and random signs.

We emphasize that the sparsity condition ||x||_0 ≤ Cm/(µ log^2 n) is a little stronger than the optimal result available in the noise-free literature [7, 9], namely ||x||_0 ≤ Cm/(µ log n). The extra logarithmic factor appears to be important in the proof, which we will explain in Section 3, and a third open problem is whether or not it is possible to remove this factor.

Here we do not give a sensitivity analysis for the recovery procedure as in Model 1. Actually, by applying a method similar to that introduced in [7] to our argument in Section 3, a very good error bound can be obtained in the noisy case. However, there is little technical novelty in it, and it would make our paper very long. Therefore we decided to discuss only the noiseless case and focus on the sampling rate and the corruption ratio.
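For intuition, the recovery program (1.3) can be handed to any LP solver after the usual positive/negative splitting. The sketch below is our own illustration, not the paper's code: for simplicity it uses a Gaussian sensing matrix and the weight λ = 1/√(log n) from Theorem 1.2, with toy sizes.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
m, n = 50, 60
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n); x_true[[3, 17]] = [1.5, -2.0]          # sparse signal
f_true = np.zeros(m); f_true[[0, 7, 31]] = [5.0, -4.0, 3.0]  # gross corruptions
y = A @ x_true + f_true

lam = 1.0 / np.sqrt(np.log(n))            # weight as in Theorem 1.2
# min ||x||_1 + lam * ||f||_1  s.t.  Ax + f = y, with all variables split >= 0
c = np.concatenate([np.ones(2 * n), lam * np.ones(2 * m)])
A_eq = np.hstack([A, -A, np.eye(m), -np.eye(m)])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n + 2 * m))
x_hat = res.x[:n] - res.x[n:2 * n]
f_hat = res.x[2 * n:2 * n + m] - res.x[2 * n + m:]
```

The solver simultaneously identifies the corrupted measurements (the support of f̂) and recovers x, which is exactly the behavior the theorem guarantees in the stated regime.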
1.3.3 MC from Corrupted Entries [Model 3]

We assume L is of rank r and write its reduced SVD as L = UΣV^*, where U, V ∈ R^{n×r} and Σ ∈ R^{r×r}. Let µ be the smallest quantity such that for all 1 ≤ i ≤ n,

||UU^* e_i||_2^2 ≤ µr/n, ||VV^* e_i||_2^2 ≤ µr/n, and ||UV^*||_∞ ≤ √(µr)/n.

This model is the same as that originally introduced in [8], and later used in [4, 12, 16, 21, 32]. We observe P_O(L) + S, where O ⊂ [n] × [n] and S is supported on Ω ⊂ O. Here we assume that O, Ω, S satisfy the following model:

Model 3.1:
1. Fix an n by n matrix K whose entries are either 1 or −1.
2. Define O ∼ Ber(ρ) for a constant ρ satisfying 0 < ρ < 1/2; specifically, the 1_{(i,j)∈O} are iid Bernoulli random variables with parameter ρ.
3. Conditioning on (i, j) ∈ O, assume that the events (i, j) ∈ Ω are independent with P((i, j) ∈ Ω | (i, j) ∈ O) = s. This implies that Ω ∼ Ber(ρs).
4. Define Γ := O \ Ω. Then we have Γ ∼ Ber(ρ(1 − s)).
5. Let S be supported on Ω, with sgn(S) := P_Ω(K).

Theorem 1.3 Under Model 3.1, suppose ρ > C_ρ µr log^2 n / n and s ≤ C_s. Moreover, suppose λ := 1/√(ρn log n), and denote by (L̂, Ŝ) the optimal solution to problem (1.6). Then we have (L̂, Ŝ) = (L, S) with probability at least 1 − Cn^{−3} for some numerical constant C, provided the numerical constant C_s is sufficiently small and C_ρ is sufficiently large.

In this model O is available, while Ω, Γ and S are not known explicitly from the observation P_O(L) + S. By the assumption O ∼ Ber(ρ), we can use |O|/n^2 to approximate ρ. From the following proof we can see that λ is not required to be exactly 1/√(ρn log n) for exact recovery. The power of our result is that one can recover a low-rank matrix from a nearly minimal number of samples even when a constant proportion of these samples has been corrupted.
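The sampling mechanism of Model 3.1 is easy to simulate. The following sketch (toy parameters of our own choosing) checks numerically that O, Ω and Γ have the stated Bernoulli rates and that Γ = O \ Ω partitions the observed set.

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho, s = 500, 0.3, 0.2          # toy sizes: matrix side, sampling rate, corruption rate

O = rng.random((n, n)) < rho       # O ~ Ber(rho): each entry observed independently
corrupt = rng.random((n, n)) < s   # corruption coin flipped per entry, used only on O
Omega = O & corrupt                # corrupted observed entries, Omega ~ Ber(rho * s)
Gamma = O & ~corrupt               # clean observed entries, Gamma = O \ Omega ~ Ber(rho*(1-s))

rho_hat = O.mean()                 # |O| / n^2, the estimate of rho mentioned above
```

Note the model's key feature: the corruption decision is made independently on each observed entry, so the clean set Γ is itself a Bernoulli sample, which the proof exploits.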
We discuss only the noiseless case for this model. Actually, by a method similar to [6], a suboptimal estimation error bound can be obtained by a slight modification of our argument. However, it is of little technical interest and falls short of the optimal result when n is large. There are other suboptimal results for matrix completion with noise, such as [1], but the error bound is not tight when the additional noise is small. We want to focus on the noiseless case in this paper and leave the problem with noise for future work.

The values of λ above are chosen for the theoretical guarantee of exact recovery in Theorems 1.1, 1.2 and 1.3. In practice, λ is usually chosen by cross validation.

1.4 Comparison with Existing Results, Related Work and Our Contribution

In this section we compare Theorems 1.1, 1.2 and 1.3 with existing results in the literature.

We begin with Model 1. In [40], Wright and Ma discussed a model where the sensing matrix A has independent columns with common mean µ and normal perturbations with variance σ^2/m. They chose λ(m, n) = 1, and proved that (x̂, f̂) = (x, f) with high probability provided ||x||_0 ≤ C_1(σ, n/m) m, ||f||_0 ≤ C_2(σ, n/m) m and f has random signs. Here C_1(σ, n/m) is much smaller than C/(log(n/m) + 1). We note that since the authors of [40] considered a different model, motivated by [41], it may not be directly comparable with ours. However, for our motivation of CS with corruptions, we assume A follows a symmetric distribution and obtain a better sampling rate.

A bit later, Laska et al. [28] and Li et al. [29] also studied this problem. Setting λ(m, n) = 1, both papers establish that for Gaussian (or sub-Gaussian) sensing matrices A, if m > C(||x||_0 + ||f||_0) log((n + m)/(||x||_0 + ||f||_0)), then the recovery is exact.
This follows from the fact that [A, I] obeys a restricted isometry property known to guarantee exact recovery of sparse vectors via ℓ1 minimization. Furthermore, the sparsity requirement on x is the same as that found in the standard CS literature, namely ||x||_0 ≤ Cm/(log(n/m) + 1). However, the result does not allow a positive fraction of corruptions. For example, if m = √n, we have ||f||_0/m ≤ 2/log n, which goes to zero as n goes to infinity.

As for Model 2, an interesting piece of work [30] (and later [31] on the noisy case) appeared during the preparation of this paper. These papers discuss models in which A is formed by selecting rows from an orthogonal matrix with low incoherence parameter µ, which is the minimum value such that n|A_{ij}|^2 ≤ µ for any i, j. The main result states that selecting λ = √(n/(Cµm log n)) gives exact recovery under the following assumptions: 1) the rows of A are chosen from an orthogonal matrix uniformly at random; 2) x is a random signal with independent signs, equally likely to be ±1; 3) the support of f is chosen uniformly at random. (By the de-randomization technique introduced in [4] and used in [30], it would have been sufficient to assume that the signs of f are independent and take on the values ±1 with equal probability.) Finally, the sparsity conditions require m ≥ Cµ^2 ||x||_0 log^2 n and ||f||_0 ≤ Cm, which are nearly optimal, for the best known sparsity condition when f = 0 is m ≥ Cµ ||x||_0 log n. In other words, the result is optimal up to an extra factor of µ log n; the sparsity condition on f is of course nearly optimal. However, the model for A does not include some models frequently discussed in the literature, such as subsampled tight or continuous frames.
Against this background, a recent paper of Candès and Plan [7] considers a very general framework, which includes many common models in the literature. Theorem 1.2 in our paper is similar to Theorem 1 in [30]. It assumes similar sparsity conditions, but is based on the much broader and more applicable model introduced in [7]. Notice that we require m ≥ Cµ ||x||_0 log^2 n, whereas [30] requires m ≥ Cµ^2 ||x||_0 log^2 n. Therefore, we improve the condition by a factor of µ, which is always at least 1 and can be as large as n. However, our result imposes ||f||_0 ≤ Cm/µ, which is worse than ||f||_0 ≤ γm by the same factor. In [30], the parameter λ depends upon µ, while our λ is only a function of m and n. This is why the results differ, and we prefer a value of λ that does not depend on µ because in some applications an accurate estimate of µ may be difficult to obtain. In addition, we use different proof techniques, in which the clever golfing scheme of [21] is exploited.

Sparse approximation is another underdetermined linear-system problem, where the dictionary matrix A is usually assumed to be deterministic. Readers interested in this problem (which usually requires stronger sparsity conditions) may also want to study the recent paper [38] by Studer et al. There, the authors introduce a more general problem of the form y = Ax + Bf, and analyze the performance of ℓ1-recovery techniques using ideas which have been popularized under the name of generalized uncertainty principles in the basis pursuit and sparse approximation literature.

As for Model 3, Theorem 1.3 is a significant extension of the results presented in [4], in which the authors have the stringent requirement ρ = 0.1.
In a very recent and independent work [16], the authors consider a model where both O and Ω are unions of stochastic and deterministic subsets, while we only assume the stochastic model. We recommend interested readers to read that paper for the details. However, considering only their results on stochastic O and Ω, a direct comparison shows that the number of samples we need is smaller than that in this reference; the difference is several logarithmic factors. Actually, the requirement on ρ in our paper is optimal even for clean data in the MC literature. Finally, we want to emphasize that the random support assumption is essential in Theorem 1.3 when the rank is large. Examples can be found in [24].

We wish to close our introduction with a few words concerning the techniques of proof we shall use. The proof of Theorem 1.1 is based on the concept of restricted isometry, which is a standard technique in the CS literature. However, our argument involves a generalization of the restricted isometry concept. The proofs of Theorems 1.2 and 1.3 are based on the golfing scheme, an elegant technique pioneered by David Gross [21], and later used in [4, 7, 32] to construct dual certificates. Our proof leverages results from [4]. However, we contribute novel elements by finding an appropriate way to phrase sufficient optimality conditions which are amenable to the golfing scheme. Details are presented in the following sections.

2 A Proof of Theorem 1.1

In the proof of Theorem 1.1, we will use the notation P_T x. Here x is a k-dimensional vector, T is a subset of {1, ..., k}, and we also use T to denote the subspace of all k-dimensional vectors supported on T. Then P_T x is the projection of x onto the subspace T, which keeps the values of x on the support T and sets the other elements to zero. In this section we use the floor-function notation ⌊·⌋ for the integer part of a real number.

First we generalize the concept of the restricted isometry property (RIP) [11] for the convenience of proving our theorem:

Definition 2.1 For any matrix Φ ∈ R^{l×(n+m)}, define the RIP constant δ_{s1,s2} as the infimum value of δ such that

(1 − δ)(||x||_2^2 + ||f||_2^2) ≤ ||Φ[x; f]||_2^2 ≤ (1 + δ)(||x||_2^2 + ||f||_2^2)

holds for any x ∈ R^n with |supp(x)| ≤ s1 and f ∈ R^m with |supp(f)| ≤ s2.

Lemma 2.2 For any x1, x2 ∈ R^n and f1, f2 ∈ R^m such that supp(x1) ∩ supp(x2) = ∅, |supp(x1)| + |supp(x2)| ≤ s1, supp(f1) ∩ supp(f2) = ∅ and |supp(f1)| + |supp(f2)| ≤ s2, we have

|⟨Φ[x1; f1], Φ[x2; f2]⟩| ≤ δ_{s1,s2} √(||x1||_2^2 + ||f1||_2^2) √(||x2||_2^2 + ||f2||_2^2).

Proof First, suppose ||x1||_2^2 + ||f1||_2^2 = ||x2||_2^2 + ||f2||_2^2 = 1. By the definition of δ_{s1,s2}, we have

2(1 − δ_{s1,s2}) ≤ ⟨Φ[x1 + x2; f1 + f2], Φ[x1 + x2; f1 + f2]⟩ ≤ 2(1 + δ_{s1,s2}),

and

2(1 − δ_{s1,s2}) ≤ ⟨Φ[x1 − x2; f1 − f2], Φ[x1 − x2; f1 − f2]⟩ ≤ 2(1 + δ_{s1,s2}).

By the above inequalities, we have |⟨Φ[x1; f1], Φ[x2; f2]⟩| ≤ δ_{s1,s2}, and hence by homogeneity,

|⟨Φ[x1; f1], Φ[x2; f2]⟩| ≤ δ_{s1,s2} √(||x1||_2^2 + ||f1||_2^2) √(||x2||_2^2 + ||f2||_2^2)

without the norm assumption.

Lemma 2.3 Suppose Φ ∈ R^{l×(n+m)} has RIP constant δ_{2s1,2s2} < 1/18 (s1, s2 > 0), and λ lies between (1/2)√(s1/s2) and 2√(s1/s2). Then for any x ∈ R^n with |supp(x)| ≤ s1, any f ∈ R^m with |supp(f)| ≤ s2, and any w ∈ R^m with ||w||_2 ≤ ε, the solution (x̂, f̂) to the optimization problem (1.7) satisfies

||x̂ − x||_2 + ||f̂ − f||_2 ≤ (4√(13 + 13δ_{2s1,2s2}) / (1 − 9δ_{2s1,2s2})) ε.

Proof Set Δx = x̂ − x and Δf = f̂ − f. Then by (1.7) we have

||Φ[Δx; Δf]||_2 = ||(Φ[x̂; f̂] − y) + w||_2 ≤ ||Φ[x̂; f̂] − y||_2 + ||w||_2 ≤ 2ε.
It is easy to check that the original (x, f) satisfies the inequality constraint in (1.7), so we have

||x + Δx||_1 + λ||f + Δf||_1 ≤ ||x||_1 + λ||f||_1. (2.1)

Then it suffices to show ||Δx||_2 + ||Δf||_2 ≤ (4√(13 + 13δ_{2s1,2s2}) / (1 − 9δ_{2s1,2s2})) ε.

Choose T_0 with |T_0| = s1 such that supp(x) ⊂ T_0. Write T_0^c = T_1 ∪ ... ∪ T_l, where |T_1| = ... = |T_{l−1}| = s1 and |T_l| ≤ s1; moreover, T_1 contains the indices of the s1 largest (in absolute value) coefficients of P_{T_0^c}Δx, T_2 contains the indices of the s1 largest coefficients of P_{(T_0∪T_1)^c}Δx, and so on. Similarly, define V_0 such that supp(f) ⊂ V_0 and |V_0| = s2, and divide V_0^c = V_1 ∪ ... ∪ V_k in the same way. By this setup, we easily have

Σ_{j≥2} ||P_{T_j}Δx||_2 ≤ s1^{−1/2} ||P_{T_0^c}Δx||_1, (2.2)

and

Σ_{j≥2} ||P_{V_j}Δf||_2 ≤ s2^{−1/2} ||P_{V_0^c}Δf||_1. (2.3)

On the other hand, by the assumptions supp(x) ⊂ T_0 and supp(f) ⊂ V_0, we have

||x + Δx||_1 = ||P_{T_0}x + P_{T_0}Δx||_1 + ||P_{T_0^c}Δx||_1 ≥ ||x||_1 − ||P_{T_0}Δx||_1 + ||P_{T_0^c}Δx||_1, (2.4)

and similarly,

||f + Δf||_1 ≥ ||f||_1 − ||P_{V_0}Δf||_1 + ||P_{V_0^c}Δf||_1. (2.5)

By inequalities (2.1), (2.4) and (2.5), we have

||P_{T_0^c}Δx||_1 + λ||P_{V_0^c}Δf||_1 ≤ ||P_{T_0}Δx||_1 + λ||P_{V_0}Δf||_1. (2.6)

By the definition of δ_{2s1,2s2}, the fact ||Φ[Δx; Δf]||_2 ≤ 2ε, and Lemma 2.2, we have

(1 − δ_{2s1,2s2}) (||P_{T_0}Δx + P_{T_1}Δx||_2^2 + ||P_{V_0}Δf + P_{V_1}Δf||_2^2)
≤ ||Φ[P_{T_0}Δx + P_{T_1}Δx; P_{V_0}Δf + P_{V_1}Δf]||_2^2
= ⟨Φ[P_{T_0}Δx + P_{T_1}Δx; P_{V_0}Δf + P_{V_1}Δf], Φ[Δx; Δf] − Φ[P_{T_2}Δx + ... + P_{T_l}Δx; P_{V_2}Δf + ... + P_{V_k}Δf]⟩
≤ −⟨Φ[P_{T_0}Δx + P_{T_1}Δx; P_{V_0}Δf + P_{V_1}Δf], Φ[P_{T_2}Δx + ... + P_{T_l}Δx; P_{V_2}Δf + ... + P_{V_k}Δf]⟩ + 2ε ||Φ[P_{T_0}Δx + P_{T_1}Δx; P_{V_0}Δf + P_{V_1}Δf]||_2
≤ δ_{2s1,2s2} (||[P_{T_0}Δx; P_{V_0}Δf]||_2 + ||[P_{T_1}Δx; P_{V_1}Δf]||_2)(Σ_{j≥2}||P_{T_j}Δx||_2 + Σ_{j≥2}||P_{V_j}Δf||_2) + 2ε √(1 + δ_{2s1,2s2}) √(||P_{T_0}Δx||_2^2 + ||P_{T_1}Δx||_2^2 + ||P_{V_0}Δf||_2^2 + ||P_{V_1}Δf||_2^2).

Moreover, since

Σ_{j≥2}||P_{T_j}Δx||_2 + Σ_{j≥2}||P_{V_j}Δf||_2
≤ s1^{−1/2}||P_{T_0^c}Δx||_1 + s2^{−1/2}||P_{V_0^c}Δf||_1 (by (2.2) and (2.3))
≤ 2 s1^{−1/2}(||P_{T_0^c}Δx||_1 + λ||P_{V_0^c}Δf||_1) (by λ > (1/2)√(s1/s2))
≤ 2 s1^{−1/2}(||P_{T_0}Δx||_1 + λ||P_{V_0}Δf||_1) (by (2.6))
≤ 2 s1^{−1/2}(s1^{1/2}||P_{T_0}Δx||_2 + λ s2^{1/2}||P_{V_0}Δf||_2) (by the Cauchy-Schwarz inequality)
≤ 4||P_{T_0}Δx||_2 + 4||P_{V_0}Δf||_2 (by λ < 2√(s1/s2)),

we have

(||[P_{T_0}Δx; P_{V_0}Δf]||_2 + ||[P_{T_1}Δx; P_{V_1}Δf]||_2)(Σ_{j≥2}||P_{T_j}Δx||_2 + Σ_{j≥2}||P_{V_j}Δf||_2)
≤ 8(||P_{T_0}Δx||_2^2 + ||P_{T_1}Δx||_2^2 + ||P_{V_0}Δf||_2^2 + ||P_{V_1}Δf||_2^2).

Therefore, since δ_{2s1,2s2} < 1/9, we have

√(||P_{T_0}Δx||_2^2 + ||P_{T_1}Δx||_2^2 + ||P_{V_0}Δf||_2^2 + ||P_{V_1}Δf||_2^2) ≤ 2ε √(1 + δ_{2s1,2s2}) / (1 − 9δ_{2s1,2s2}).

Since Σ_{j≥2}||P_{T_j}Δx||_2 + Σ_{j≥2}||P_{V_j}Δf||_2 ≤ 4||P_{T_0}Δx||_2 + 4||P_{V_0}Δf||_2, we have

||Δx||_2 + ||Δf||_2 ≤ 5(||P_{T_0}Δx||_2 + ||P_{V_0}Δf||_2) + (||P_{T_1}Δx||_2 + ||P_{V_1}Δf||_2)
≤ √52 √(||P_{T_0}Δx||_2^2 + ||P_{T_1}Δx||_2^2 + ||P_{V_0}Δf||_2^2 + ||P_{V_1}Δf||_2^2)
≤ (4√(13 + 13δ_{2s1,2s2}) / (1 − 9δ_{2s1,2s2})) ε.

We now cite a well-known result from the CS literature, e.g. Theorem 5.2 of [3].

Lemma 2.4 Suppose A is a random matrix as defined in Model 1. Then for any 0 < δ < 1, there exist c_1(δ), c_2(δ) > 0 such that, with probability at least 1 − 2 exp(−c_2(δ)m),

(1 − δ)||x||_2^2 ≤ ||Ax||_2^2 ≤ (1 + δ)||x||_2^2

holds universally for any x with |supp(x)| ≤ c_1(δ) m/(log(n/m) + 1).
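Lemma 2.4 can be sanity-checked numerically: for a Gaussian matrix with entry variance 1/m, ||Ax||_2^2 concentrates around ||x||_2^2 for sparse unit vectors. The Monte Carlo sketch below (our own toy check of pointwise concentration; it does not verify the uniform, over-all-supports statement of the lemma) illustrates this.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, s = 200, 400, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)   # iid N(0, 1/m) entries, as in Model 1

# For random s-sparse unit vectors x, ||Ax||_2^2 should stay close to ||x||_2^2 = 1
ratios = []
for _ in range(100):
    x = np.zeros(n)
    idx = rng.choice(n, s, replace=False)
    x[idx] = rng.standard_normal(s)
    x /= np.linalg.norm(x)
    ratios.append(np.linalg.norm(A @ x) ** 2)
```

The fluctuation is on the order of √(1/m) per vector; the lemma's content is that, for s up to c_1(δ)m/(log(n/m)+1), the deviation stays below δ simultaneously over all supports.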
Also, we cite a well-known result bounding the largest singular value of a random matrix, e.g. [17] and [39].

Lemma 2.5 Let B be an m × n matrix whose entries are independent standard normal random variables. Then for every t ≥ 0, with probability at least 1 − 2 exp(−t^2/2), one has ||B||_{2,2} ≤ √m + √n + t.

We now prove Theorem 1.1:

Proof Suppose α, δ are two constants independent of m and n, whose values will be specified later. Set s1 = ⌊αm/(log(n/m) + 1)⌋ and s2 = ⌊αm⌋. We want to bound the RIP constant δ_{2s1,2s2} of the m × (n + m) matrix Φ = [A, I] when α is sufficiently small. For any T with |T| = 2s1 and V with |V| = 2s2, and any x with supp(x) ⊂ T, any f with supp(f) ⊂ V, we have

||[A, I][x; f]||_2^2 = ||Ax + f||_2^2 = ||Ax||_2^2 + ||f||_2^2 + 2⟨P_V A P_T x, f⟩.

By Lemma 2.4, assuming α ≤ c_1(δ), with probability at least 1 − 2 exp(−c_2(δ)m),

(1 − δ)||x||_2^2 ≤ ||Ax||_2^2 ≤ (1 + δ)||x||_2^2 (2.7)

holds universally for all such T and x. Now fix T and V; we want to bound ||P_V A P_T||_{2,2}. By Lemma 2.5, we have

||P_V A P_T||_{2,2} ≤ (1/√m)(√(2s1) + √(2s2) + √(δ^2 m)) ≤ 2√(2α) + δ (2.8)

with probability at least 1 − 2 exp(−δ^2 m/2). Then, by a union bound, with probability at least 1 − 2 (n choose 2s1)(m choose 2s2) exp(−δ^2 m/2), inequality (2.8) holds universally for all T with |T| = 2s1 and V with |V| = 2s2. Since 2s1 ≤ 2αm/(log(n/m) + 1), we have 2s1 log(en/(2s1)) ≤ α_1 m, where α_1 depends only on α and α_1 → 0 as α → 0, and hence (n choose 2s1) ≤ (en/(2s1))^{2s1} ≤ exp(α_1 m). Similarly, because 2s2 ≤ 2αm, we have 2s2 log(em/(2s2)) ≤ α_2 m, where α_2 depends only on α and α_2 → 0 as α → 0, and hence (m choose 2s2) ≤ (em/(2s2))^{2s2} ≤ exp(α_2 m).
Therefore, inequality (2.8) holds universally for all such T and V with probability at least 1 − 2 exp(−(δ^2/2 − α_1 − α_2)m). Combined with (2.7), we have that

(1 − δ)||x||_2^2 + ||f||_2^2 − (2√(2α) + δ)||x||_2 ||f||_2 ≤ ||[A, I][x; f]||_2^2 ≤ (1 + δ)||x||_2^2 + ||f||_2^2 + (2√(2α) + δ)||x||_2 ||f||_2

holds universally for all such T, V, x and f with probability at least 1 − 2 exp(−c_2(δ)m) − 2 exp(−(δ^2/2 − α_1 − α_2)m). By choosing an appropriate δ and taking α sufficiently small, we have δ_{2s1,2s2} < 1/9 with probability at least 1 − Ce^{−cm}. Moreover, under the assumption αm/(log(n/m) + 1) ≥ 1, we have s1 = ⌊αm/(log(n/m) + 1)⌋ > 0, s2 = ⌊αm⌋ > 0 and

(1/2)√(s1/s2) < 1/√(log(n/m) + 1) < 2√(s1/s2).

Then Theorem 1.1 follows as a direct corollary of Lemma 2.3.

3 A Proof of Theorem 1.2

In this section we will encounter several absolute constants. Instead of denoting them by C_1, C_2, ..., we just use C; i.e., the value of C may change from line to line. Also, we will use the phrase "with high probability" to mean with probability at least 1 − Cn^{−c}, where C > 0 is a numerical constant and c = 3, 4, or 5 depending on the context.

We will use a good deal of notation for sub-matrices and sub-vectors. Suppose A ∈ R^{m×n}, P ⊂ [m] := {1, ..., m}, Q ⊂ [n] and i ∈ [n]. We denote by A_{P,:} the sub-matrix of A with row indices in P, by A_{:,Q} the sub-matrix of A with column indices in Q, and by A_{P,Q} the sub-matrix of A with row indices in P and column indices in Q. Moreover, we denote by A_{P,i} the sub-matrix of A with row indices in P and column i, which is actually a column vector. The term "vector" means column vector in this section, and all row vectors are denoted by the adjoint of a vector, such as a^* for a vector a. Suppose a is a vector and T a subset of indices.
Then we denote by a_T the restriction of a to T, i.e., the vector consisting of the elements of a with indices in T. For any vector v, we use v{i} to denote the i-th element of v.

3.1 Supporting Lemmas

To prove Theorem 1.2 we need some supporting lemmas. Because our model of the sensing matrix A is the same as in [7], we cite some lemmas from it directly.

Lemma 3.1 (Lemma 2.1 of [7]) Suppose A is as defined in Model 2. Let T ⊂ [n] be a fixed set of cardinality s. Then for δ > 0,

P(||A_{:,T}^* A_{:,T} − I||_{2,2} ≥ δ) ≤ 2s exp(−(m/(µs)) · δ^2/(2(1 + δ/3))).

In particular, ||A_{:,T}^* A_{:,T} − I||_{2,2} ≤ 1/2 with high probability provided s ≤ γm/(µ log n), and ||A_{:,T}^* A_{:,T} − I||_{2,2} ≤ 1/(2√(log n)) with high probability provided s ≤ γm/(µ log^2 n), where γ is some absolute constant.

This lemma was proved in [7] by the matrix Bernstein inequality, first introduced in [2]. A deep generalization is given in [25].

Lemma 3.2 (Lemma 2.4 of [7]) Suppose A is as defined in Model 2. Fix T ⊂ [n] with |T| = s and v ∈ R^s. Then ||A_{:,T^c}^* A_{:,T} v||_∞ ≤ (1/(20√s))||v||_2 with high probability provided s ≤ γm/(µ log n), where γ is some absolute constant.

Lemma 3.3 (Lemma 2.5 of [7]) Suppose A is as defined in Model 2. Fix T ⊂ [n] with |T| = s. Then max_{i∈T^c} ||A_{:,T}^* A_{:,i}||_2 ≤ 1 with high probability provided s ≤ γm/(µ log n), where γ is some absolute constant.

3.2 A Proof of Theorem 1.2

In this part we give a complete proof of Theorem 1.2 using a powerful technique called the "golfing scheme", introduced by David Gross in [21] and used later in [4] and [7]. Under the assumptions of Model 2, we additionally assume s ≤ αm/(µ log^2 n) and m_b ≤ βm/µ, where α and β are numerical constants whose values will be specified later. First we give two useful inequalities.
By replacing $A$ with $\sqrt{\frac{m}{m-m_b}}A_{B^c,T}$ in Lemma 3.1 and Lemma 3.2, we have
$$\left\|\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,T} - I\right\|_{2,2} \le 1/2 \quad (3.1)$$
and
$$\max_{i\in T^c}\left\|\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,i}\right\|_2 \le 1 \quad (3.2)$$
with high probability provided $s \le \gamma\frac{m-m_b}{\mu\log n}$. Since $s\le\alpha\frac{m}{\mu\log^2 n}$ and $m_b\le\beta\frac{m}{\mu}$, both (3.1) and (3.2) hold with high probability provided $\alpha$ and $\beta$ are sufficiently small. We assume (3.1) and (3.2) hold throughout this section.

First we prove that the solution $(\hat x, \hat f)$ of (1.3) equals $(x, f)$ if we can find an appropriate dual vector $q_{B^c}$ satisfying the following requirement. This is actually an "inexact dual vector" of the optimization problem (1.3). The idea was first given explicitly in [22] and [21], and is related to [5]. We give a result similar to [7].

Lemma 3.4 (Inexact Duality) Suppose there exists a vector $q_{B^c}\in\mathbb{R}^{m-m_b}$ satisfying
$$\|v_T - \mathrm{sgn}(x_T)\|_2 \le \lambda/4,\qquad \|v_{T^c}\|_\infty \le 1/4 \qquad\text{and}\qquad \|q_{B^c}\|_\infty\le\lambda/4, \quad (3.3)$$
where
$$v = A_{B^c,:}^*\, q_{B^c} + A_{B,:}^*\,\lambda\,\mathrm{sgn}(f_B). \quad (3.4)$$
Then the solution $(\hat x,\hat f)$ of (1.3) equals $(x,f)$ provided $\beta$ is sufficiently small and $\lambda < \frac32$.

Proof. Set $h = \hat x - x$. By $x_{T^c} = 0$ we have
$$h_{T^c} = \hat x_{T^c}. \quad (3.5)$$
By $f_{B^c} = 0$ and $Ax + f = A\hat x + \hat f$, we have $Ah = f - \hat f$ and
$$A_{B^c,:}h = (f-\hat f)_{B^c} = -\hat f_{B^c}. \quad (3.6)$$
Then we have the following inequality:
$$\begin{aligned}
\|\hat x\|_1 + \lambda\|\hat f\|_1 &= \langle\hat x_T, \mathrm{sgn}(\hat x_T)\rangle + \|\hat x_{T^c}\|_1 + \lambda\left(\langle\hat f_B,\mathrm{sgn}(\hat f_B)\rangle + \|\hat f_{B^c}\|_1\right)\\
&\ge \langle\hat x_T,\mathrm{sgn}(x_T)\rangle + \|\hat x_{T^c}\|_1 + \lambda\left(\langle\hat f_B,\mathrm{sgn}(f_B)\rangle + \|\hat f_{B^c}\|_1\right)\\
&= \langle x_T + h_T, \mathrm{sgn}(x_T)\rangle + \|h_{T^c}\|_1 + \lambda\left(\langle f_B - A_{B,:}h, \mathrm{sgn}(f_B)\rangle + \|A_{B^c,:}h\|_1\right) &&\text{by (3.5), (3.6)}\\
&= \|x\|_1 + \lambda\|f\|_1 + \|h_{T^c}\|_1 + \lambda\|A_{B^c,:}h\|_1 + \langle h_T,\mathrm{sgn}(x_T)\rangle - \lambda\langle A_{B,:}h,\mathrm{sgn}(f_B)\rangle.
\end{aligned}$$
Since $\|\hat x\|_1 + \lambda\|\hat f\|_1 \le \|x\|_1 + \lambda\|f\|_1$, we have
$$\|h_{T^c}\|_1 + \lambda\|A_{B^c,:}h\|_1 + \langle h_T,\mathrm{sgn}(x_T)\rangle - \lambda\langle A_{B,:}h,\mathrm{sgn}(f_B)\rangle \le 0.$$
(3.7)

By (3.4), we have
$$\langle h_T, v_T\rangle + \langle h_{T^c}, v_{T^c}\rangle = \langle h, v\rangle = \langle h, A_{B^c,:}^* q_{B^c} + A_{B,:}^*\lambda\,\mathrm{sgn}(f_B)\rangle = \langle A_{B^c,:}h,\, q_{B^c}\rangle + \lambda\langle A_{B,:}h,\, \mathrm{sgn}(f_B)\rangle,$$
and then by (3.3),
$$\begin{aligned}
\langle h_T,\mathrm{sgn}(x_T)\rangle - \lambda\langle A_{B,:}h,\mathrm{sgn}(f_B)\rangle &= \langle h_T, \mathrm{sgn}(x_T)-v_T\rangle + \langle A_{B^c,:}h,\, q_{B^c}\rangle - \langle h_{T^c}, v_{T^c}\rangle\\
&\ge -\frac{\lambda}{4}\|h_T\|_2 - \frac{\lambda}{4}\|A_{B^c,:}h\|_1 - \frac14\|h_{T^c}\|_1.
\end{aligned}$$
Combining this with (3.7), we have
$$-\frac{\lambda}{4}\|h_T\|_2 + \frac{3\lambda}{4}\|A_{B^c,:}h\|_1 + \frac34\|h_{T^c}\|_1 \le 0. \quad (3.8)$$
By (3.1), we have $\left\|\sqrt{\frac{m}{m-m_b}}A_{B^c,T}\right\|_{2,2} \le \sqrt{3/2}$, and the smallest singular value of $\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,T}$ is at least $\frac12$. Therefore,
$$\begin{aligned}
\|h_T\|_2 &\le 2\left\|\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,T} h_T\right\|_2\\
&\le 2\left\|\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,T^c} h_{T^c}\right\|_2 + 2\left\|\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,:} h\right\|_2\\
&\le 2\left\|\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,T^c} h_{T^c}\right\|_2 + \sqrt6\,\sqrt{\frac{m}{m-m_b}}\,\|A_{B^c,:}h\|_2\\
&\le 2\sum_{i\in T^c}\left\|\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,i}\right\|_2 |h\{i\}| + \sqrt6\,\sqrt{\frac{m}{m-m_b}}\,\|A_{B^c,:}h\|_2 &&\text{by the triangle inequality}\\
&\le 2\|h_{T^c}\|_1 + \sqrt6\,\sqrt{\frac{m}{m-m_b}}\,\|A_{B^c,:}h\|_1 &&\text{by (3.2)}.
\end{aligned}$$
Plugging this into (3.8), we have
$$\left(\frac34 - \frac{\lambda}{2}\right)\|h_{T^c}\|_1 + \left(\frac34 - \frac{\sqrt6}{4}\sqrt{\frac{m}{m-m_b}}\right)\lambda\|A_{B^c,:}h\|_1 \le 0.$$
We know $\frac34 - \frac{\sqrt6}{4}\sqrt{\frac{m}{m-m_b}} > 0$ when $\beta$ is sufficiently small. Moreover, by the assumption $\lambda < \frac32$, we have $h_{T^c} = 0$ and $A_{B^c,:}h = 0$. Since $A_{B^c,:}h = A_{B^c,T}h_T + A_{B^c,T^c}h_{T^c}$, we have $A_{B^c,T}h_T = 0$. Inequality (3.1) implies that $A_{B^c,T}$ is injective, so $h_T = 0$ and $h = h_T + h_{T^c} = 0$, which implies $(\hat x,\hat f) = (x,f)$.

Now let us construct a vector $q_{B^c}$ satisfying requirement (3.3) by choosing an appropriate $\lambda$.

Proof (of Theorem 1.2). Set $\lambda = \frac{1}{\sqrt{\log n}}$. It suffices to construct a $q_{B^c}$ satisfying (3.3). Denoting $u = A_{B^c,:}^*\, q_{B^c}$, we only need to construct a $q_{B^c}$ satisfying
$$\|u_T + \lambda A_{B,T}^*\mathrm{sgn}(f_B) - \mathrm{sgn}(x_T)\|_2 \le \frac{\lambda}{4},\qquad \|u_{T^c}\|_\infty \le \frac18,\qquad \|\lambda A_{B,:}^*\mathrm{sgn}(f_B)\|_\infty \le \frac18,\qquad \|q_{B^c}\|_\infty \le \frac{\lambda}{4}.$$
Now let us construct our $q_{B^c}$ by the golfing scheme.
First we have to write $A_{B^c,:}$ as a block matrix. We divide $B^c$ into $l = \lfloor\log_2 n + 1\rfloor = \left\lfloor\frac{\log n}{\log 2} + 1\right\rfloor$ disjoint subsets: $B^c = G_1\cup\cdots\cup G_l$, where $|G_i| = m_i$. Then we have $\sum_{i=1}^l m_i = m - m_b$ and
$$A_{B^c,:} = \begin{bmatrix} A_{G_1,:}\\ \vdots\\ A_{G_l,:}\end{bmatrix}.$$
We want to mention that the partition of $B^c$ is deterministic, not depending on $A$, so $A_{G_1,:},\ldots,A_{G_l,:}$ are independent. Noticing $m_b \le \beta\frac{m}{\mu} \le \beta m$, by letting $\beta$ be sufficiently small we can require
$$\frac{m}{m_1}\le C,\qquad \frac{m}{m_2}\le C,\qquad \frac{m}{m_k}\le C\log n \quad\text{for } k = 3,\ldots,l$$
for some absolute constant $C$. Since $s\le\alpha\frac{m}{\mu\log^2 n}$, we have
$$s\le \alpha C\frac{m_1}{\mu\log^2 n},\qquad s\le\alpha C\frac{m_2}{\mu\log^2 n},\qquad s\le\alpha C\frac{m_k}{\mu\log n}\quad\text{for } k=3,\ldots,l. \quad (3.9)$$
Then by Lemma 3.1, replacing $A$ with $\sqrt{\frac{m}{m_j}}A_{G_j,T}$, we have the following inequalities:
$$\left\|\frac{m}{m_j}A_{G_j,T}^* A_{G_j,T} - I\right\|_{2,2} \le \frac{1}{2\sqrt{\log n}}\quad\text{for } j = 1,2; \quad (3.10)$$
$$\left\|\frac{m}{m_j}A_{G_j,T}^* A_{G_j,T} - I\right\|_{2,2} \le \frac12\quad\text{for } j = 3,\ldots,l; \quad (3.11)$$
with high probability provided $\alpha$ is sufficiently small.

Now let us give an explicit construction of $q_{B^c}$. Define
$$p_0 = \mathrm{sgn}(x_T) - \lambda A_{B,T}^*\mathrm{sgn}(f_B) \quad (3.12)$$
and
$$p_i = \left(I - \frac{m}{m_i}A_{G_i,T}^* A_{G_i,T}\right)p_{i-1} = \left(I - \frac{m}{m_i}A_{G_i,T}^* A_{G_i,T}\right)\cdots\left(I - \frac{m}{m_1}A_{G_1,T}^* A_{G_1,T}\right)p_0 \quad (3.13)$$
for $i = 1,\ldots,l$, and construct
$$q_{B^c} = \begin{bmatrix} \frac{m}{m_1}A_{G_1,T}\,p_0\\ \vdots\\ \frac{m}{m_l}A_{G_l,T}\,p_{l-1}\end{bmatrix}. \quad (3.14)$$
Then by $u = A_{B^c,:}^*\, q_{B^c}$, we have
$$u = A_{B^c,:}^*\begin{bmatrix}\frac{m}{m_1}A_{G_1,T}\,p_0\\ \vdots\\ \frac{m}{m_l}A_{G_l,T}\,p_{l-1}\end{bmatrix} = \sum_{i=1}^l \frac{m}{m_i}A_{G_i,:}^* A_{G_i,T}\,p_{i-1}. \quad (3.15)$$
We now bound the $\ell_2$ norm of $p_i$. By (3.10), (3.11) and (3.13), we have
$$\|p_1\|_2 \le \frac{1}{2\sqrt{\log n}}\|p_0\|_2, \quad (3.16)$$
$$\|p_2\|_2 \le \frac{1}{4\log n}\|p_0\|_2, \quad (3.17)$$
$$\|p_j\|_2 \le \frac{1}{\log n}\left(\frac12\right)^j\|p_0\|_2\quad\text{for } j = 3,\ldots,l. \quad (3.18)$$
Now we will prove that our constructed $q_{B^c}$ satisfies the desired requirements.

The proof of $\|\lambda A_{B,:}^*\mathrm{sgn}(f_B)\|_\infty \le \frac18$. By Hoeffding's inequality, for any $i = 1,\ldots,n$, we have
$$\mathbb{P}\left(\left|A_{B,i}^*\,\mathrm{sgn}(f_B)\right| \ge t\right) \le 2\exp\left(-\frac{2t^2}{4\|A_{B,i}\|_2^2}\right).$$
By choosing $t = C\sqrt{\log n}\,\|A_{B,i}\|_2$ ($C$ is some absolute constant), with high probability we have
$$\left|\lambda A_{B,i}^*\,\mathrm{sgn}(f_B)\right| \le \lambda C\sqrt{\log n}\,\|A_{B,i}\|_2 \le C\sqrt{\frac{\mu m_b}{m}} \le \sqrt\beta \le \frac18,$$
provided $\beta$ is sufficiently small, and this implies $\|\lambda A_{B,:}^*\mathrm{sgn}(f_B)\|_\infty \le \frac18$.

The proof of $\|u_T + \lambda A_{B,T}^*\mathrm{sgn}(f_B) - \mathrm{sgn}(x_T)\|_2 \le \frac{\lambda}{4}$. By (3.15) and (3.13), we have
$$u_T = \sum_{i=1}^l \frac{m}{m_i}A_{G_i,T}^* A_{G_i,T}\,p_{i-1} = \sum_{i=1}^l (p_{i-1} - p_i) = p_0 - p_l.$$
Then by (3.12) we have
$$\|u_T + \lambda A_{B,T}^*\mathrm{sgn}(f_B) - \mathrm{sgn}(x_T)\|_2 = \|u_T - p_0\|_2 = \|p_l\|_2.$$
Since $\|\lambda A_{B,:}^*\mathrm{sgn}(f_B)\|_\infty \le 1/8$, we have $\|\lambda A_{B,T}^*\mathrm{sgn}(f_B)\|_2 \le \frac18\sqrt s$, which implies
$$\|p_0\|_2 = \|\lambda A_{B,T}^*\mathrm{sgn}(f_B) - \mathrm{sgn}(x_T)\|_2 \le \frac98\sqrt s. \quad (3.19)$$
Then by (3.18) and $l = \lfloor\log_2 n + 1\rfloor$, we have
$$\|p_l\|_2 \le \frac{1}{\log n}\left(\frac12\right)^l\cdot\frac98\sqrt s \le \frac{1}{\log n}\cdot\frac1n\cdot\frac98\sqrt{\frac{\alpha m}{\mu\log^2 n}} \le \frac{1}{4\sqrt{\log n}} = \frac{\lambda}{4},$$
provided $\alpha$ is sufficiently small.

The proof of $\|u_{T^c}\|_\infty \le 1/8$. By (3.15), we have
$$u_{T^c} = \sum_{i=1}^l\frac{m}{m_i}A_{G_i,T^c}^* A_{G_i,T}\,p_{i-1}.$$
Recall that $A_{G_1,:},\ldots,A_{G_l,:}$ are independent, so by the construction of $p_{i-1}$ we know $A_{G_i,:}$ and $p_{i-1}$ are independent. Replacing $A$ with $\sqrt{\frac{m}{m_i}}A_{G_i,:}$ in Lemma 3.2, and by the sparsity condition (3.9), we have
$$\left\|\sum_{i=1}^l\frac{m}{m_i}A_{G_i,T^c}^* A_{G_i,T}\,p_{i-1}\right\|_\infty \le \sum_{i=1}^l\frac{1}{20}\frac{1}{\sqrt s}\|p_{i-1}\|_2$$
with high probability, provided $\alpha$ is sufficiently small. By (3.16), (3.17), (3.18) and (3.19), we have
$$\|u_{T^c}\|_\infty \le \sum_{i=1}^l\frac{1}{20}\frac{1}{\sqrt s}\|p_{i-1}\|_2 \le \frac{1}{20}\frac{1}{\sqrt s}\cdot 2\|p_0\|_2 < \frac18.$$

The proof of $\|q_{B^c}\|_\infty \le \frac{\lambda}{4}$. For $k = 1,\ldots,l$, we denote
$$A_{G_k,:} = \frac{1}{\sqrt m}\begin{bmatrix} a_{k1}^*\\ \vdots\\ a_{km_k}^*\end{bmatrix},\qquad A_{B,:} = \frac{1}{\sqrt m}\begin{bmatrix} \tilde a_1^*\\ \vdots\\ \tilde a_{m_b}^*\end{bmatrix}.$$
By (3.13), (3.14) and (3.12), it suffices to show that for any $1\le k\le l$ and $1\le j\le m_k$,
$$\left|\frac{\sqrt m}{m_k}(a_{kj})_T^*\left(I - \frac{m}{m_{k-1}}A_{G_{k-1},T}^* A_{G_{k-1},T}\right)\cdots\left(I - \frac{m}{m_1}A_{G_1,T}^* A_{G_1,T}\right)\left(\mathrm{sgn}(x_T) - \lambda A_{B,T}^*\mathrm{sgn}(f_B)\right)\right| \le \frac{\lambda}{4}.$$
Set
$$w = \left(I - \frac{m}{m_1}A_{G_1,T}^* A_{G_1,T}\right)\cdots\left(I - \frac{m}{m_{k-1}}A_{G_{k-1},T}^* A_{G_{k-1},T}\right)(a_{kj})_T. \quad (3.20)$$
Then it suffices to prove
$$\left|\frac{\sqrt m}{m_k}\,w^*\left(\mathrm{sgn}(x_T) - \lambda A_{B,T}^*\mathrm{sgn}(f_B)\right)\right| \le \frac{\lambda}{4}.$$
Since $w$ and $\mathrm{sgn}(x_T)$ are independent, by Hoeffding's inequality, conditioning on $w$, we have
$$\mathbb{P}\left(|w^*\mathrm{sgn}(x_T)| \ge t\right) \le 2\exp\left(-\frac{2t^2}{4\|w\|_2^2}\right)$$
for any $t > 0$. Then with high probability we have
$$|w^*\mathrm{sgn}(x_T)| \le C\sqrt{\log n}\,\|w\|_2 \quad (3.21)$$
for some absolute constant $C$. Setting $z = \mathrm{sgn}(f_B)$, we have
$$w^* A_{B,T}^*\,\mathrm{sgn}(f_B) = \frac{1}{\sqrt m}\sum_{i=1}^{m_b}\left[(\tilde a_i)_T^*\, w\right] z\{i\}.$$
Since $w$, $A_{B,T}$ and $z$ are independent, conditioning on $w$ we have
$$\mathbb{E}\left\{[(\tilde a_i)_T^* w]\,z\{i\}\right\} = \mathbb{E}\{(\tilde a_i)_T^* w\}\,\mathbb{E}\{z\{i\}\} = 0,$$
$$\left|[(\tilde a_i)_T^* w]\,z\{i\}\right| \le \|w\|_2\|(\tilde a_i)_T\|_2 \le \sqrt{s\mu}\,\|w\|_2 \le \sqrt{\frac{\alpha m}{\log^2 n}}\,\|w\|_2,$$
and
$$\mathbb{E}\left\{\left|[(\tilde a_i)_T^* w]\,z\{i\}\right|^2\right\} = \mathbb{E}\left\{[w^*(\tilde a_i)_T][(\tilde a_i)_T^* w]\right\} = w^*\,\mathbb{E}\left\{(\tilde a_i)_T(\tilde a_i)_T^*\right\}w = \|w\|_2^2.$$
By Bernstein's inequality, we have
$$\mathbb{P}\left(\left|w^* A_{B,T}^*\,\mathrm{sgn}(f_B)\right| \ge \frac{t}{\sqrt m}\right) \le 2\exp\left(-\frac{t^2/2}{m_b\|w\|_2^2 + \sqrt{\frac{\alpha m}{\log^2 n}}\,\|w\|_2\, t/3}\right).$$
By choosing some numerical constant $C$ and $t = C\sqrt{m\log n}\,\|w\|_2$, we have
$$\left|w^* A_{B,T}^*\,\mathrm{sgn}(f_B)\right| \le C\sqrt{\log n}\,\|w\|_2 \quad (3.22)$$
with high probability, provided $\alpha$ is sufficiently small. By (3.21) and (3.22), we have
$$\left|\frac{\sqrt m}{m_k}\,w^*\left(\mathrm{sgn}(x_T) - \lambda A_{B,T}^*\mathrm{sgn}(f_B)\right)\right| \le \frac{\sqrt m}{m_k}\,C\sqrt{\log n}\,\|w\|_2 \quad (3.23)$$
for some numerical constant $C$.

When $k \ge 3$, by (3.20), (3.10) and (3.11), we have $\|w\|_2 \le \left(\frac12\right)^{k-1}\frac{1}{\log n}\sqrt{\mu s} \le \frac{\sqrt{\alpha m}}{\log^2 n}$. Recalling $\frac{m}{m_k} \le C\log n$, by (3.23) we have
$$\left|\frac{\sqrt m}{m_k}\,w^*\left(\mathrm{sgn}(x_T) - \lambda A_{B,T}^*\mathrm{sgn}(f_B)\right)\right| \le C\frac{m}{m_k}\sqrt\alpha\,(\log n)^{-3/2} \le \frac{\lambda}{4},$$
provided $\alpha$ is sufficiently small.

When $k\le 2$, by (3.20) and (3.10), we have $\|w\|_2 \le \sqrt{\mu s} \le \frac{\sqrt{\alpha m}}{\log n}$. Recalling $\frac{m}{m_k}\le C$, by (3.23) we have
$$\left|\frac{\sqrt m}{m_k}\,w^*\left(\mathrm{sgn}(x_T) - \lambda A_{B,T}^*\mathrm{sgn}(f_B)\right)\right| \le C\frac{m}{m_k}\sqrt\alpha\,(\log n)^{-1/2} \le \frac{\lambda}{4},$$
provided $\alpha$ is sufficiently small.
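As a side note, the geometric decay (3.16)–(3.18) that drives the golfing scheme is easy to observe numerically. The following sketch is our own illustration, not part of the proof: it uses Gaussian measurement vectors with $\mathbb{E}[aa^*] = I$ as a stand-in for Model 2, partitions the rows into equal blocks (the parameter values are arbitrary), and iterates (3.13) on a generic unit vector in place of $p_0$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s, l = 4000, 100, 10, 8                  # measurements, dimension, sparsity, blocks
A = rng.standard_normal((m, n)) / np.sqrt(m)   # rows a_i^*/sqrt(m) with E[a a^*] = I
T = np.arange(s)                               # support of x (first s coordinates, WLOG)

# Deterministic partition of the rows into l disjoint blocks G_1, ..., G_l.
blocks = np.array_split(np.arange(m), l)

p = rng.standard_normal(s)                     # generic stand-in for p_0
p /= np.linalg.norm(p)
norms = [1.0]
for G in blocks:
    AGT = A[G][:, T]                           # block A_{G_i, T}
    # golfing step (3.13): p_i = (I - (m/m_i) A_{G_i,T}^* A_{G_i,T}) p_{i-1}
    p = p - (m / len(G)) * (AGT.T @ (AGT @ p))
    norms.append(np.linalg.norm(p))

print(norms)                                   # geometrically decaying sequence
```

Each step contracts the norm by roughly $\|\frac{m}{m_i}A_{G_i,T}^* A_{G_i,T} - I\|_{2,2}$, which Lemma 3.1 keeps below $1/2$, so after $l$ blocks the residual is far below the tolerance $\lambda/4$ used in the proof.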
Here we would like to compare our golfing scheme with that in [7]. There are mainly two differences. One is that we have an extra term $\lambda A_{B,:}^*\mathrm{sgn}(f_B)$ in the dual vector. To obtain the inequality $\|v_{T^c}\|_\infty \le 1/4$, we propose to bound $\|u_{T^c}\|_\infty$ and $\|\lambda A_{B,:}^*\mathrm{sgn}(f_B)\|_\infty$ separately, and this leads to the extra log factor compared with [7]. Moreover, by using the golfing scheme to construct the dual vector, we need to bound the term $\|q_{B^c}\|_\infty$, which is not necessary in [7]. This inevitably incurs the random-sign assumption on the signal.

4 A Proof of Theorem 1.3

In this section, the capital letters $X$, $Y$, etc. represent matrices, and the symbols in script font $\mathcal I$, $\mathcal P_T$, etc. represent linear operators from a matrix space to a matrix space. Moreover, for any $\Omega_0\subset[n]\times[n]$, the operator $\mathcal P_{\Omega_0}$ keeps the entries of $M$ on the support $\Omega_0$ and changes the other entries to zeros. For any $n\times n$ matrix $A$, denote by $\|A\|_F$, $\|A\|$, $\|A\|_\infty$ and $\|A\|_*$ respectively the Frobenius norm, the operator norm (the largest singular value), the largest magnitude of all elements, and the nuclear norm (the sum of all singular values). Similarly to Section 3, instead of denoting absolute constants by $C_1, C_2, \ldots$, we just use $C$, whose value may change from line to line. Also, we will use the phrase "with high probability" to mean with probability at least $1 - Cn^{-c}$, where $C > 0$ is a numerical constant and $c = 3, 4$, or $5$ depending on the context.

4.1 A model equivalent to Model 3.1

Model 3.1 is natural and was used in [4], but we will use the following equivalent model for the convenience of the proof:

Model 3.2:
1. Fix an $n\times n$ matrix $K$, whose entries are either $1$ or $-1$.
2. Define two independent random subsets of $[n]\times[n]$: $\Gamma'\sim\mathrm{Ber}((1-2s)\rho)$ and $\Omega'\sim\mathrm{Ber}\left(\frac{2s\rho}{1-\rho+2s\rho}\right)$. Moreover, let $O := \Gamma'\cup\Omega'$, which thus satisfies $O\sim\mathrm{Ber}(\rho)$.
3.
Define an $n\times n$ random matrix $W$ with independent entries $W_{ij}$ satisfying $\mathbb{P}(W_{ij}=1) = \mathbb{P}(W_{ij}=-1) = \frac12$.
4. Define $\Omega''\subset\Omega'$: $\Omega'' := \{(i,j): (i,j)\in\Omega',\ W_{ij} = K_{ij}\}$.
5. Define $\Omega := \Omega''\setminus\Gamma'$, and $\Gamma := O\setminus\Omega$.
6. Let $S$ satisfy $\mathrm{sgn}(S) := \mathcal P_\Omega(K)$.

Obviously, in both Model 3.1 and Model 3.2 the whole setting is deterministic once we fix $(O,\Omega)$. Therefore, the probability of $(\hat L,\hat S) = (L,S)$ is determined by the joint distribution of $(O,\Omega)$. It is not difficult to prove that the joint distributions of $(O,\Omega)$ in the two models are the same. Indeed, in Model 3.1, the pairs $(1_{\{(i,j)\in O\}}, 1_{\{(i,j)\in\Omega\}})$ are iid random vectors with the probability distribution $\mathbb{P}(1_{\{(i,j)\in O\}}=1) = \rho$, $\mathbb{P}(1_{\{(i,j)\in\Omega\}}=1\mid 1_{\{(i,j)\in O\}}=1) = s$, and $\mathbb{P}(1_{\{(i,j)\in\Omega\}}=1\mid 1_{\{(i,j)\in O\}}=0) = 0$. In Model 3.2, we have
$$\left(1_{\{(i,j)\in O\}},\ 1_{\{(i,j)\in\Omega\}}\right) = \left(\max\left(1_{\{(i,j)\in\Gamma'\}},\ 1_{\{(i,j)\in\Omega'\}}\right),\ 1_{\{(i,j)\in\Omega'\}}\,1_{\{W_{ij}=K_{ij}\}}\,1_{\{(i,j)\in\Gamma'^c\}}\right).$$
This implies that the $(1_{\{(i,j)\in O\}}, 1_{\{(i,j)\in\Omega\}})$ are independent random vectors. Moreover, it is easy to calculate that $\mathbb{P}(1_{\{(i,j)\in O\}}=1) = \rho$, $\mathbb{P}(1_{\{(i,j)\in\Omega\}}=1) = s\rho$, and $\mathbb{P}(1_{\{(i,j)\in\Omega\}}=1,\ 1_{\{(i,j)\in O\}}=0) = 0$. Then we have
$$\mathbb{P}\left(1_{\{(i,j)\in\Omega\}}=1\mid 1_{\{(i,j)\in O\}}=1\right) = \mathbb{P}\left(1_{\{(i,j)\in\Omega\}}=1,\ 1_{\{(i,j)\in O\}}=1\right)/\mathbb{P}\left(1_{\{(i,j)\in O\}}=1\right) = s,$$
and
$$\mathbb{P}\left(1_{\{(i,j)\in\Omega\}}=1\mid 1_{\{(i,j)\in O\}}=0\right) = \mathbb{P}\left(1_{\{(i,j)\in\Omega\}}=1,\ 1_{\{(i,j)\in O\}}=0\right)/\mathbb{P}\left(1_{\{(i,j)\in O\}}=0\right) = 0.$$
Notice that although $(1_{\{(i,j)\in O\}}, 1_{\{(i,j)\in\Omega\}})$ depends on $K$, its distribution does not. By the above we know that $(O,\Omega)$ has the same distribution in both models. Therefore, in the following we will use Model 3.2 instead. The advantage of using Model 3.2 is that we can utilize $\Gamma'$, $\Omega'$, $W$, etc. as auxiliaries.
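The marginal computations above are easy to confirm by simulation. The sketch below is our own illustration with arbitrary parameter values: it draws $\Gamma'$, $\Omega'$ and $W$ as in Model 3.2 and checks that the empirical frequencies of $O$ and $\Omega$ match $\rho$ and $s\rho$.

```python
import numpy as np

# Monte Carlo sanity check that Model 3.2 reproduces the marginals of
# Model 3.1: P((i,j) in O) = rho and P((i,j) in Omega) = s * rho.
rng = np.random.default_rng(1)
n, rho, s = 400, 0.5, 0.1
K = np.where(rng.random((n, n)) < 0.5, 1, -1)        # fixed sign matrix

Gamma_p = rng.random((n, n)) < (1 - 2 * s) * rho                      # Gamma' ~ Ber((1-2s)rho)
Omega_p = rng.random((n, n)) < 2 * s * rho / (1 - rho + 2 * s * rho)  # Omega'
W = np.where(rng.random((n, n)) < 0.5, 1, -1)        # independent Rademacher signs

O = Gamma_p | Omega_p                                # O = Gamma' ∪ Omega' ~ Ber(rho)
Omega = Omega_p & (W == K) & ~Gamma_p                # Omega = Omega'' \ Gamma'

print(O.mean(), Omega.mean())                        # near rho = 0.5 and s*rho = 0.05
```

Since each cell of $\Omega'$ survives independently with probability $\frac12\cdot\frac{2s\rho}{1-\rho+2s\rho}\cdot(1-(1-2s)\rho) = s\rho$, the empirical frequency of $\Omega$ concentrates around $s\rho$ regardless of $K$, matching the calculation above.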
In the next subsection we prove some supporting lemmas which are useful for the proof of the main theorem.

4.2 Supporting lemmas

Define $T := \{UX^* + YV^*:\ X, Y\in\mathbb{R}^{n\times r}\}$, a subspace of $\mathbb{R}^{n\times n}$. Then the orthogonal projectors $\mathcal P_T$ and $\mathcal P_{T^\perp}$ in $\mathbb{R}^{n\times n}$ satisfy
$$\mathcal P_T X = UU^* X + XVV^* - UU^* XVV^*\qquad\text{and}\qquad \mathcal P_{T^\perp}X = (I - UU^*)X(I - VV^*)$$
for any $X\in\mathbb{R}^{n\times n}$. This implies $\|\mathcal P_{T^\perp}X\| \le \|X\|$ for any $X$. Recalling the incoherence conditions: for any $i\in\{1,\ldots,n\}$, $\|UU^* e_i\|^2 \le \frac{\mu r}{n}$ and $\|VV^* e_i\|^2 \le \frac{\mu r}{n}$, we have $\|\mathcal P_T(e_ie_j^*)\|_\infty \le \frac{2\mu r}{n}$ and $\|\mathcal P_T(e_ie_j^*)\|_F \le \sqrt{\frac{2\mu r}{n}}$ [8, 12].

Lemma 4.1 (Theorem 4.1 of [8]) Suppose $\Omega_0\sim\mathrm{Ber}(\rho_0)$. Then with high probability,
$$\|\mathcal P_T - \rho_0^{-1}\mathcal P_T\mathcal P_{\Omega_0}\mathcal P_T\| \le \epsilon,$$
provided that $\rho_0 \ge C_0\,\epsilon^{-2}\frac{\mu r\log n}{n}$ for some numerical constant $C_0 > 0$.

The original idea of the proof of this theorem is due to [36].

Lemma 4.2 (Theorem 3.1 of [4]) Suppose $Z\in\mathrm{Range}(\mathcal P_T)$ is a fixed matrix, $\Omega_0\sim\mathrm{Ber}(\rho_0)$, and $\epsilon\le1$ is an arbitrary constant. Then with high probability,
$$\|(\mathcal I - \rho_0^{-1}\mathcal P_T\mathcal P_{\Omega_0})Z\|_\infty \le \epsilon\|Z\|_\infty,$$
provided that $\rho_0 \ge C_0\,\epsilon^{-2}\frac{\mu r\log n}{n}$ for some numerical constant $C_0 > 0$.

Lemma 4.3 (Theorem 6.3 of [8]) Suppose $Z$ is a fixed matrix, and $\Omega_0\sim\mathrm{Ber}(\rho_0)$. Then with high probability,
$$\|(\rho_0\mathcal I - \mathcal P_{\Omega_0})Z\| \le C_0'\sqrt{np\log n}\,\|Z\|_\infty,$$
provided that $\rho_0\le p$ and $p\ge C_0\frac{\log n}{n}$ for some numerical constants $C_0 > 0$ and $C_0' > 0$.

Notice that we only have $\rho_0 = p$ in Theorem 6.3 of [8]. By a very slight modification of the proof (specifically, the proof of Lemma 6.2 there) we obtain the version with $\rho_0\le p$ stated above.

4.3 A proof of Theorem 1.3

By Lemma 4.1, we have
$$\left\|\frac{1}{(1-2s)\rho}\mathcal P_T\mathcal P_{\Gamma'}\mathcal P_T - \mathcal P_T\right\| \le \frac12\qquad\text{and}\qquad \left\|\frac{1}{\sqrt{(1-2s)\rho}}\mathcal P_T\mathcal P_{\Gamma'}\right\| \le \sqrt{3/2}$$
with high probability provided $C_\rho$ is sufficiently large and $C_s$ is sufficiently small. We will assume both inequalities hold throughout the rest of the paper.
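Before proceeding, the projector formulas from Section 4.2 can be sanity-checked numerically. The following sketch is our own illustration (with arbitrary $n$ and $r$): it verifies that $\mathcal P_T$ is idempotent, that $\mathcal P_T + \mathcal P_{T^\perp} = \mathcal I$, and that $\mathcal P_{T^\perp}$ does not increase the operator norm, as claimed above.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 50, 3
U, _ = np.linalg.qr(rng.standard_normal((n, r)))    # orthonormal column spaces
V, _ = np.linalg.qr(rng.standard_normal((n, r)))

def P_T(X):
    # P_T X = UU*X + XVV* - UU*XVV*
    return U @ U.T @ X + X @ V @ V.T - U @ U.T @ X @ V @ V.T

def P_Tperp(X):
    # P_{T^perp} X = (I - UU*) X (I - VV*)
    I = np.eye(n)
    return (I - U @ U.T) @ X @ (I - V @ V.T)

X = rng.standard_normal((n, n))
print(np.allclose(P_T(P_T(X)), P_T(X)))             # idempotent: P_T^2 = P_T
print(np.allclose(P_T(X) + P_Tperp(X), X))          # orthogonal decomposition of X
print(np.linalg.norm(P_Tperp(X), 2) <= np.linalg.norm(X, 2))  # norm not increased
```

The last check reflects the fact that $X \mapsto (I-UU^*)X(I-VV^*)$ multiplies by orthogonal projections on both sides, each of operator norm at most one.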
Theorem 4.4 If there exists an $n\times n$ matrix $Y$ obeying
$$\|\mathcal P_T Y + \mathcal P_T(\lambda\mathcal P_{O\setminus\Gamma'}W - UV^*)\|_F \le \frac{\lambda}{n^2},\qquad \|\mathcal P_{T^\perp}Y + \mathcal P_{T^\perp}(\lambda\mathcal P_{O\setminus\Gamma'}W)\| \le \frac14,\qquad \mathcal P_{\Gamma'^c}Y = 0,\qquad \|\mathcal P_{\Gamma'}Y\|_\infty \le \frac{\lambda}{4}, \quad (4.1)$$
where $\lambda = \frac{1}{\sqrt{n\rho}\,\log n}$, then the solution $(\hat L,\hat S)$ to (1.6) satisfies $(\hat L,\hat S) = (L,S)$.

Proof. Set $H = \hat L - L$. The condition $\mathcal P_O(L) + S = \mathcal P_O(\hat L) + \hat S$ implies $\mathcal P_O(H) = S - \hat S$. Then $\hat S$ is supported on $O$ because $S$ is supported on $\Omega\subset O$. By considering the subgradient of the nuclear norm at $L$, we have
$$\|\hat L\|_* \ge \|L\|_* + \langle\mathcal P_T H,\, UV^*\rangle + \|\mathcal P_{T^\perp}H\|_*.$$
By the definition of $(\hat L,\hat S)$, we have $\|\hat L\|_* + \lambda\|\hat S\|_1 \le \|L\|_* + \lambda\|S\|_1$. By the two inequalities above, we have
$$\lambda\|S\|_1 - \lambda\|\hat S\|_1 \ge \langle\mathcal P_T(H),\, UV^*\rangle + \|\mathcal P_{T^\perp}H\|_*,$$
which implies
$$\lambda\|S\|_1 - \lambda\|\mathcal P_{O\setminus\Gamma'}(\hat S)\|_1 \ge \langle H,\, UV^*\rangle + \|\mathcal P_{T^\perp}(H)\|_* + \lambda\|\mathcal P_{\Gamma'}(\hat S)\|_1.$$
On the other hand,
$$\|\mathcal P_{O\setminus\Gamma'}\hat S\|_1 = \|S + \mathcal P_{O\setminus\Gamma'}(-H)\|_1 \ge \|S\|_1 + \langle\mathrm{sgn}(S),\, \mathcal P_\Omega(-H)\rangle + \|\mathcal P_{O\setminus(\Gamma'\cup\Omega)}(-H)\|_1 \ge \|S\|_1 + \langle\mathcal P_{O\setminus\Gamma'}(W),\, -H\rangle.$$
By the two inequalities above and the fact $\mathcal P_{\Gamma'}\hat S = \mathcal P_{\Gamma'}(\hat S - S) = -\mathcal P_{\Gamma'}H$, we have
$$\|\mathcal P_{T^\perp}(H)\|_* + \lambda\|\mathcal P_{\Gamma'}(H)\|_1 \le \langle H,\, \lambda\mathcal P_{O\setminus\Gamma'}(W) - UV^*\rangle. \quad (4.2)$$
By the assumptions on $Y$, we have
$$\begin{aligned}
\langle H,\, \lambda\mathcal P_{O\setminus\Gamma'}(W) - UV^*\rangle &= \langle H,\, Y + \lambda\mathcal P_{O\setminus\Gamma'}(W) - UV^*\rangle - \langle H,\, Y\rangle\\
&= \langle\mathcal P_T(H),\, \mathcal P_T(Y + \lambda\mathcal P_{O\setminus\Gamma'}(W) - UV^*)\rangle + \langle\mathcal P_{T^\perp}(H),\, \mathcal P_{T^\perp}(Y + \lambda\mathcal P_{O\setminus\Gamma'}(W))\rangle\\
&\qquad - \langle\mathcal P_{\Gamma'}(H),\, \mathcal P_{\Gamma'}(Y)\rangle - \langle\mathcal P_{\Gamma'^c}(H),\, \mathcal P_{\Gamma'^c}(Y)\rangle\\
&\le \frac{\lambda}{n^2}\|\mathcal P_T(H)\|_F + \frac14\|\mathcal P_{T^\perp}(H)\|_* + \frac{\lambda}{4}\|\mathcal P_{\Gamma'}(H)\|_1.
\end{aligned}$$
By inequality (4.2),
$$\frac34\|\mathcal P_{T^\perp}(H)\|_* + \frac{3\lambda}{4}\|\mathcal P_{\Gamma'}(H)\|_1 \le \frac{\lambda}{n^2}\|\mathcal P_T(H)\|_F. \quad (4.3)$$
Recall that we assume $\left\|\frac{1}{(1-2s)\rho}\mathcal P_T\mathcal P_{\Gamma'}\mathcal P_T - \mathcal P_T\right\| \le \frac12$ and $\left\|\frac{1}{\sqrt{(1-2s)\rho}}\mathcal P_T\mathcal P_{\Gamma'}\right\| \le \sqrt{3/2}$ throughout.
Then
$$\begin{aligned}
\|\mathcal P_T(H)\|_F &\le 2\left\|\frac{1}{(1-2s)\rho}\mathcal P_T\mathcal P_{\Gamma'}\mathcal P_T(H)\right\|_F\\
&\le 2\left\|\frac{1}{(1-2s)\rho}\mathcal P_T\mathcal P_{\Gamma'}\mathcal P_{T^\perp}(H)\right\|_F + 2\left\|\frac{1}{(1-2s)\rho}\mathcal P_T\mathcal P_{\Gamma'}(H)\right\|_F\\
&\le \sqrt{\frac{6}{(1-2s)\rho}}\,\|\mathcal P_{T^\perp}H\|_F + \sqrt{\frac{6}{(1-2s)\rho}}\,\|\mathcal P_{\Gamma'}H\|_F.
\end{aligned}$$
By inequality (4.3), we have
$$\left(\frac34 - \frac{\lambda}{n^2}\sqrt{\frac{6}{(1-2s)\rho}}\right)\|\mathcal P_{T^\perp}(H)\|_F + \left(\frac{3\lambda}{4} - \frac{\lambda}{n^2}\sqrt{\frac{6}{(1-2s)\rho}}\right)\|\mathcal P_{\Gamma'}H\|_F \le 0.$$
Then $\mathcal P_{T^\perp}(H) = \mathcal P_{\Gamma'}H = 0$, which implies $\mathcal P_{\Gamma'}\mathcal P_T(H) = 0$. Since $\mathcal P_{\Gamma'}\mathcal P_T$ is injective on $T$ (because $\left\|\frac{1}{(1-2s)\rho}\mathcal P_T\mathcal P_{\Gamma'}\mathcal P_T - \mathcal P_T\right\| \le \frac12$), we have $\mathcal P_T(H) = 0$. Then we have $H = 0$.

Suppose we can construct $Y$ and $\tilde Y$ satisfying
$$\|\mathcal P_T Y + \mathcal P_T(\lambda\mathcal P_{\Omega'}W - UV^*)\|_F \le \frac{\lambda}{2n^2},\qquad \|\mathcal P_{T^\perp}Y + \mathcal P_{T^\perp}(\lambda\mathcal P_{\Omega'}W)\| \le \frac14,\qquad \mathcal P_{\Gamma'^c}Y = 0,\qquad \|\mathcal P_{\Gamma'}Y\|_\infty \le \frac{\lambda}{4}, \quad (4.4)$$
and
$$\|\mathcal P_T\tilde Y + \mathcal P_T(\lambda(2\mathcal P_{\Omega'\setminus\Gamma'}(W) - \mathcal P_{\Omega'}W) - UV^*)\|_F \le \frac{\lambda}{2n^2},\qquad \|\mathcal P_{T^\perp}\tilde Y + \mathcal P_{T^\perp}(\lambda(2\mathcal P_{\Omega'\setminus\Gamma'}(W) - \mathcal P_{\Omega'}W))\| \le \frac14,\qquad \mathcal P_{\Gamma'^c}\tilde Y = 0,\qquad \|\mathcal P_{\Gamma'}\tilde Y\|_\infty \le \frac{\lambda}{4}. \quad (4.5)$$
Then $Y = (Y + \tilde Y)/2$ will satisfy (4.1). By the assumptions in Model 3.2, $(\Gamma', \mathcal P_{\Omega'}W)$ and $(\Gamma', 2\mathcal P_{\Omega'\setminus\Gamma'}(W) - \mathcal P_{\Omega'}W)$ have the same distribution. Therefore, if we can construct $Y$ satisfying (4.4) with high probability, we can also construct $\tilde Y$ satisfying (4.5) with high probability. Hence, to prove Theorem 1.3, we only need to prove that there exists $Y$ satisfying (4.4) with high probability:

Proof (of Theorem 1.3). Notice that $\Gamma'\sim\mathrm{Ber}((1-2s)\rho)$. Suppose that $q$ satisfies
$$1 - (1-2s)\rho = \left(1 - \frac{(1-2s)\rho}{6}\right)^2(1-q)^{l-2},$$
where $l = \lfloor 5\log n + 1\rfloor$. This implies that $q \ge C\rho/\log n$. Define $q_1 = q_2 = (1-2s)\rho/6$ and $q_3 = \cdots = q_l = q$. Then in distribution we can write $\Gamma' = \Gamma_1\cup\cdots\cup\Gamma_l$, where $\Gamma_j\sim\mathrm{Ber}(q_j)$ independently. Construct
$$Z_0 = \mathcal P_T(UV^* - \lambda\mathcal P_{\Omega'}W),\qquad Z_j = \left(\mathcal P_T - \frac{1}{q_j}\mathcal P_T\mathcal P_{\Gamma_j}\mathcal P_T\right)Z_{j-1}\ \text{for } j = 1,\ldots,l,\qquad Y = \sum_{j=1}^l\frac{1}{q_j}\mathcal P_{\Gamma_j}Z_{j-1}.$$
Then by Lemma 4.1, we have $\|Z_j\|_F \le \frac12\|Z_{j-1}\|_F$ for $j = 1,\ldots,l$
with high probability provided $C_\rho$ is large enough and $C_s$ is small enough. Then $\|Z_j\|_F \le \left(\frac12\right)^j\|Z_0\|_F$. By the construction of $Z_j$, we know that $Z_j\in\mathrm{Range}(\mathcal P_T)$ and $Z_j = \left(\mathcal I - \frac{1}{q_j}\mathcal P_T\mathcal P_{\Gamma_j}\right)Z_{j-1}$. Then similarly, by Lemma 4.2, we have
$$\|Z_1\|_\infty \le \frac{1}{2\sqrt{\log n}}\|Z_0\|_\infty,$$
and
$$\|Z_j\|_\infty \le \frac{1}{2^j\log n}\|Z_0\|_\infty\quad\text{for } j = 2,\ldots,l$$
with high probability provided $C_\rho$ is large enough and $C_s$ is small enough. Also, by Lemma 4.3 we have
$$\left\|\left(\mathcal I - \frac{1}{q_j}\mathcal P_{\Gamma_j}\right)Z_{j-1}\right\| \le C\sqrt{\frac{n\log n}{q_j}}\,\|Z_{j-1}\|_\infty\quad\text{for } j = 1,\ldots,l$$
with high probability provided $C_\rho$ is large enough and $C_s$ is small enough.

We first bound $\|Z_0\|_F$ and $\|Z_0\|_\infty$. Obviously $\|Z_0\|_\infty \le \|UV^*\|_\infty + \lambda\|\mathcal P_T\mathcal P_{\Omega'}(W)\|_\infty$. Recall that for any $i, j\in[n]$, we have $\|\mathcal P_T(e_ie_j^*)\|_\infty \le \frac{2\mu r}{n}$ and $\|\mathcal P_T(e_ie_j^*)\|_F \le \sqrt{\frac{2\mu r}{n}}$. Moreover, the entries of $\mathcal P_{\Omega'}(W)$ are iid random variables with the distribution
$$(\mathcal P_{\Omega'}(W))_{ij} = \begin{cases} 1 & \text{with probability } \frac{s\rho}{1-\rho+2s\rho},\\[2pt] 0 & \text{with probability } \frac{1-\rho}{1-\rho+2s\rho},\\[2pt] -1 & \text{with probability } \frac{s\rho}{1-\rho+2s\rho}.\end{cases}$$
Then by Bernstein's inequality, we have
$$\mathbb{P}\left(\left|\langle\mathcal P_T(\mathcal P_{\Omega'}(W)),\, e_ie_j^*\rangle\right| \ge t\right) = \mathbb{P}\left(\left|\langle\mathcal P_{\Omega'}(W),\, \mathcal P_T(e_ie_j^*)\rangle\right| \ge t\right) \le 2\exp\left(-\frac{t^2/2}{\sum\mathbb{E}X_k^2 + Mt/3}\right),$$
where
$$\sum\mathbb{E}X_k^2 = \frac{2s\rho}{1-\rho+2s\rho}\|\mathcal P_T e_ie_j^*\|_F^2 \le C\rho s\frac{\mu r}{n},\qquad M = \|\mathcal P_T e_ie_j^*\|_\infty \le \frac{2\mu r}{n}.$$
Then with high probability we have $\|\mathcal P_T\mathcal P_{\Omega'}(W)\|_\infty \le C\sqrt{\frac{\rho\mu r\log n}{n}}$ (here $C\sqrt{\frac{\rho\mu r\log n}{n}} \ge C\sqrt{C_\rho\frac{\mu r\log^2 n}{n}\cdot\frac{\mu r\log n}{n}} > C\sqrt{C_\rho}\,M\log n$, so the variance term dominates). Then by $\|UV^*\|_\infty \le \frac{\sqrt{\mu r}}{n}$ we have $\|Z_0\|_\infty \le C\frac{\sqrt{\mu r}}{n}$, which implies $\|Z_0\|_F \le n\|Z_0\|_\infty \le C\sqrt{\mu r}$.

Now we want to prove that $Y$ satisfies (4.4) with high probability. Obviously $\mathcal P_{\Gamma'^c}Y = 0$. It suffices to prove
$$\|\mathcal P_T Y + \mathcal P_T(\lambda\mathcal P_{\Omega'}(W) - UV^*)\|_F \le \frac{\lambda}{2n^2},\qquad \|\mathcal P_{T^\perp}Y\| \le \frac18,\qquad \|\mathcal P_{T^\perp}(\lambda\mathcal P_{\Omega'}(W))\| \le \frac18,\qquad \|\mathcal P_{\Gamma'}Y\|_\infty \le \frac{\lambda}{4}.$$
(4.6)

First,
$$\begin{aligned}
\|\mathcal P_T Y + \mathcal P_T(\lambda\mathcal P_{\Omega'}(W) - UV^*)\|_F &= \Big\|Z_0 - \sum_{j=1}^l\frac{1}{q_j}\mathcal P_T\mathcal P_{\Gamma_j}Z_{j-1}\Big\|_F = \Big\|\mathcal P_T Z_0 - \sum_{j=1}^l\frac{1}{q_j}\mathcal P_T\mathcal P_{\Gamma_j}\mathcal P_T Z_{j-1}\Big\|_F\\
&= \Big\|\Big(\mathcal P_T - \frac{1}{q_1}\mathcal P_T\mathcal P_{\Gamma_1}\mathcal P_T\Big)Z_0 - \sum_{j=2}^l\frac{1}{q_j}\mathcal P_T\mathcal P_{\Gamma_j}\mathcal P_T Z_{j-1}\Big\|_F\\
&= \Big\|\mathcal P_T Z_1 - \sum_{j=2}^l\frac{1}{q_j}\mathcal P_T\mathcal P_{\Gamma_j}\mathcal P_T Z_{j-1}\Big\|_F = \cdots = \|Z_l\|_F\\
&\le C\Big(\frac12\Big)^l\sqrt{\mu r} \le \frac{\lambda}{2n^2}.
\end{aligned}$$
Second,
$$\begin{aligned}
\|\mathcal P_{T^\perp}Y\| &= \Big\|\mathcal P_{T^\perp}\sum_{j=1}^l\frac{1}{q_j}\mathcal P_{\Gamma_j}Z_{j-1}\Big\| \le \sum_{j=1}^l\Big\|\frac{1}{q_j}\mathcal P_{T^\perp}\mathcal P_{\Gamma_j}Z_{j-1}\Big\| = \sum_{j=1}^l\Big\|\mathcal P_{T^\perp}\Big(\frac{1}{q_j}\mathcal P_{\Gamma_j}Z_{j-1} - Z_{j-1}\Big)\Big\|\\
&\le \sum_{j=1}^l\Big\|\frac{1}{q_j}\mathcal P_{\Gamma_j}Z_{j-1} - Z_{j-1}\Big\| \le \sum_{j=1}^l C\sqrt{\frac{n\log n}{q_j}}\,\|Z_{j-1}\|_\infty\\
&\le C\sqrt{n\log n}\Big(\sum_{j=3}^l\frac{1}{2^{j-1}\log n\sqrt{q_j}} + \frac{1}{2\sqrt{\log n}\sqrt{q_2}} + \frac{1}{\sqrt{q_1}}\Big)\|Z_0\|_\infty\\
&\le C\frac{\sqrt{n\mu r\log n}}{n\sqrt\rho} \le \frac{1}{8\sqrt{\log n}} \le \frac18,
\end{aligned}$$
provided $C_\rho$ is sufficiently large. (Here the third equality uses $\mathcal P_{T^\perp}Z_{j-1} = 0$, since $Z_{j-1}\in\mathrm{Range}(\mathcal P_T)$.)

Third, we have $\|\lambda\mathcal P_{T^\perp}\mathcal P_{\Omega'}(W)\| \le \lambda\|\mathcal P_{\Omega'}(W)\|$. Notice that $(W_{ij})$ is an independent Rademacher sequence independent of $\Omega'$. By Lemma 4.3, we have
$$\Big\|\frac{2s\rho}{1-\rho+2s\rho}W - \mathcal P_{\Omega'}(W)\Big\| \le C_0'\sqrt{np\log n}\,\|W\|_\infty$$
with high probability provided $\frac{2s\rho}{1-\rho+2s\rho}\le p$ and $p\ge C_0\frac{\log n}{n}$. By Theorem 3.9 of [39], we have $\|W\| \le C_1\sqrt n$ with high probability. Therefore,
$$\|\mathcal P_{\Omega'}(W)\| \le C_0'\sqrt{np\log n} + C_1\sqrt n\,\frac{2s\rho}{1-\rho+2s\rho}.$$
By choosing $p = C_2\rho$ for some appropriate constant $C_2$, we have
$$\|\mathcal P_{\Omega'}(W)\| \le \frac{\sqrt{n\rho}\,\log n}{8},$$
provided $C_\rho$ is large enough and $C_s$ is small enough.

Fourth,
$$\|\mathcal P_{\Gamma'}Y\|_\infty = \Big\|\mathcal P_{\Gamma'}\sum_j\frac{1}{q_j}\mathcal P_{\Gamma_j}Z_{j-1}\Big\|_\infty \le \sum_j\frac{1}{q_j}\|Z_{j-1}\|_\infty \le \Big(\sum_{j=3}^l\frac{1}{q_j}\frac{1}{2^{j-1}\log n} + \frac{1}{q_2}\frac{1}{2\sqrt{\log n}} + \frac{1}{q_1}\Big)\|Z_0\|_\infty \le C\frac{\sqrt{\mu r}}{n\rho} \le \frac{\lambda}{4},$$
provided $C_\rho$ is sufficiently large.

Notice that in [4] the authors used a very similar golfing scheme. To compare the two methods: we use here a golfing scheme with non-uniform block sizes to achieve a result with fewer log factors. Moreover, unlike in [4], where both the golfing scheme and a least-squares method were used to construct two parts of the dual matrix, here we only use the golfing scheme.
Actually, the method of constructing the dual matrix in [4] cannot be applied directly to our problem when $\rho = O(r\log^2 n/n)$.

Acknowledgements

I am grateful to my Ph.D. advisor, Emmanuel Candès, for his encouragement and his help in preparing this manuscript.

References

[1] A. Agarwal, S. Negahban, and M. Wainwright. Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. In Proc. 28th Inter. Conf. Mach. Learn. (ICML), pages 1129–1136, 2011.
[2] R. Ahlswede and A. Winter. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, 2002.
[3] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3):253–263, 2008.
[4] E. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(3), 2011.
[5] E. Candès and Y. Plan. Matrix completion with noise. Proceedings of the IEEE, 2009.
[6] E. Candès and Y. Plan. Near-ideal model selection by ℓ1 minimization. Ann. Statist., 37(5A):2145–2177, 2009.
[7] E. Candès and Y. Plan. A probabilistic and RIPless theory of compressed sensing. IEEE Transactions on Information Theory, 57(11):7235–7254, 2011.
[8] E. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 2009.
[9] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, 2006.
[10] E. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.
[11] E. Candès and T. Tao. Decoding by linear programming.
IEEE Trans. Inform. Theory, 51(12), 2005.
[12] E. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory, 56(5):2053–2080, 2010.
[13] V. Chandrasekaran, S. Sanghavi, P. Parrilo, and A. Willsky. Sparse and low-rank matrix decompositions. In 15th IFAC Symposium on System Identification (SYSID), 2009.
[14] V. Chandrasekaran, S. Sanghavi, P. Parrilo, and A. Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM J. on Optimization, 21(2):572–596, 2011.
[15] S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., 20(1):33–61, 1998.
[16] Y. Chen, A. Jalali, S. Sanghavi, and C. Caramanis. Low-rank matrix recovery from errors and erasures. ISIT, 2011.
[17] K. Davidson and S. Szarek. Local operator theory, random matrices and Banach spaces. Handbook of the Geometry of Banach Spaces, I(8):317–366, 2001.
[18] D. Donoho. For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. Communications on Pure and Applied Mathematics, 59(6):797–829, 2006.
[19] D. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, 2006.
[20] M. Fazel. Matrix rank minimization with applications. Ph.D. Thesis, 2002.
[21] D. Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. on Information Theory, 57(3):1548–1566, 2011.
[22] D. Gross, Y.-K. Liu, S. Flammia, S. Becker, and J. Eisert. Quantum state tomography via compressed sensing. Physical Review Letters, 105(15), 2010.
[23] J. Haupt, W. Bajwa, M. Rabbat, and R. Nowak. Compressed sensing for networked data. Signal Processing Magazine, IEEE, 25(2):92–101, 2008.
[24] D. Hsu, S. Kakade, and T. Zhang. Robust matrix decomposition with sparse corruptions.
Information Theory, IEEE Transactions on, 57(11):7221–7234, 2011.
[25] J. Tropp. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., 2011.
[26] R. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Trans. Inform. Theory, 56(6):2980–2998, 2010.
[27] J. Laska, P. Boufounos, M. Davenport, and R. Baraniuk. Democracy in action: Quantization, saturation, and compressive sensing. Applied and Computational Harmonic Analysis, 31(3):429–443, 2011.
[28] J. Laska, M. Davenport, and R. Baraniuk. Exact signal recovery from sparsely corrupted measurements through the pursuit of justice. Asilomar Conference on Signals, Systems and Computers, 2009.
[29] Z. Li, F. Wu, and J. Wright. On the systematic measurement matrix for compressed sensing in the presence of gross errors. Data Compression Conference, pages 356–365, 2010.
[30] N. Nguyen and T. Tran. Exact recoverability from dense corrupted observations via ℓ1 minimization. Preprint, 2011.
[31] N. Nguyen, N. Nasrabadi, and T. Tran. Robust lasso with missing and grossly corrupted observations. Preprint, 2011.
[32] B. Recht. A simpler approach to matrix completion. Journal of Machine Learning Research, 12:3413–3430, 2011.
[33] B. Recht, M. Fazel, and P. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 2010.
[34] J. Romberg. Compressive sensing by random convolution. SIAM J. Imaging Sciences, 2(4):1098–1128, 2009.
[35] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. B., 58(1):267–288, 1996.
[36] M. Rudelson. Random vectors in the isotropic position. J. of Functional Analysis, 164(1):60–72, 1999.
[37] M. Rudelson and R. Vershynin. On sparse reconstruction from Fourier and Gaussian measurements.
Communications on Pure and Applied Mathematics, 61(8):1025–1045, 2008.
[38] C. Studer, P. Kuppinger, G. Pope, and H. Bölcskei. Recovery of sparsely corrupted signals. Preprint, 2011.
[39] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing, Theory and Applications, ed. Y. Eldar and G. Kutyniok, Cambridge University Press, pages 210–268, 2012.
[40] J. Wright and Y. Ma. Dense error correction via ℓ1-minimization. IEEE Transactions on Information Theory, 56(7):3540–3560, 2010.
[41] J. Wright, A. Y. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell., 31(2):210–227, 2009.
[42] L. Wu, A. Ganesh, B. Shi, Y. Matsushita, Y. Wang, and Y. Ma. Robust photometric stereo via low-rank matrix completion and recovery. Proceedings of the 10th Asian Conference on Computer Vision, Part III, 2010.
[43] H. Xu, C. Caramanis, and S. Sanghavi. Robust PCA via outlier pursuit. In Adv. Neural Infor. Proc. Sys. (NIPS), pages 2496–2504, 2010.