Compressed Sensing and Matrix Completion with Constant Proportion of Corruptions

Xiaodong Li
Department of Mathematics, Stanford University, Stanford, CA 94305

Abstract

In this paper we improve existing results in the field of compressed sensing and matrix completion when sampled data may be grossly corrupted. We introduce three new theorems. 1) In compressed sensing, we show that if the m × n sensing matrix has independent Gaussian entries, then one can recover a sparse signal x exactly by tractable ℓ1 minimization even if a positive fraction of the measurements are arbitrarily corrupted, provided the number of nonzero entries in x is O(m/(log(n/m) + 1)). 2) In the very general sensing model introduced in [7], and assuming a positive fraction of corrupted measurements, exact recovery still holds if the signal now has O(m/log^2 n) nonzero entries. 3) Finally, we prove that one can recover an n × n low-rank matrix from m corrupted sampled entries by tractable optimization provided the rank is on the order of O(m/(n log^2 n)); again, this holds when there is a positive fraction of corrupted samples.

Keywords. Compressed Sensing, Matrix Completion, Robust PCA, Convex Optimization, Restricted Isometry Property, Golfing Scheme.

1 Introduction

1.1 Introduction to Compressed Sensing with Corruptions

Compressed sensing (CS) has been well studied in recent years [9, 19]. This novel theory asserts that a sparse or approximately sparse signal x ∈ R^n can be acquired by taking just a few non-adaptive linear measurements. This fact has numerous consequences which are being explored in a number of fields of applied science and engineering. In CS, the acquisition procedure is often represented as y = Ax, where A ∈ R^{m×n} is called the sensing matrix and y ∈ R^m is the vector of measurements or observations.
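As a concrete illustration of the ℓ1-minimization recovery at the heart of CS, here is a minimal numerical sketch (our own toy example, not from the paper; the sizes and the LP reformulation are illustrative choices): basis pursuit, min ||x||_1 subject to Ax = y, becomes a linear program after splitting x into positive and negative parts.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, s = 40, 50, 3                      # toy sizes: measurements, dimension, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)   # iid Gaussian sensing matrix
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
y = A @ x_true

# min ||x||_1  s.t.  Ax = y, as an LP in (x_plus, x_minus) with x = x_plus - x_minus
c = np.ones(2 * n)                       # objective: sum of both nonnegative parts
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
x_hat = res.x[:n] - res.x[n:]            # recovered signal
```

With enough Gaussian measurements relative to the sparsity, the LP typically returns x exactly, which is the phenomenon the theory above quantifies.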
It is now well established that the solution x̂ to the optimization problem

min_{x̃} ||x̃||_1 subject to Ax̃ = y, (1.1)

is guaranteed to be the original signal x with high probability, provided x is sufficiently sparse and A obeys certain conditions. A typical result is this: if A has iid Gaussian entries, then exact recovery occurs provided ||x||_0 ≤ Cm/(log(n/m) + 1) [10, 18, 37] for some positive numerical constant C > 0. Here is another example: if A is a matrix with rows randomly selected from the DFT matrix, the condition becomes ||x||_0 ≤ Cm/log n [9].

This paper discusses a natural generalization of CS, which we shall refer to as compressed sensing with corruptions. We assume that some entries of the data vector y are totally corrupted but we have absolutely no idea which entries are unreliable. We still want to recover the original signal efficiently and accurately. Formally, we have the mathematical model

y = Ax + f = [A, I][x; f], (1.2)

where x ∈ R^n and f ∈ R^m. The number of nonzero coefficients in x is ||x||_0, and similarly for f. As in the above model, A is an m × n sensing matrix, usually sampled from a probability distribution. The problem of recovering x (and hence f) from y has recently been studied in the literature in connection with some interesting applications. We discuss a few of them.

• Clipping. Signal clipping frequently appears because of nonlinearities in the acquisition device [27, 38]. Here, one typically measures g(Ax) rather than Ax, where g is a nonlinear map. Letting f = g(Ax) − Ax, we thus observe y = Ax + f. Nonlinearities usually occur at large amplitudes, so that for those components with small amplitudes we have f = g(Ax) − Ax = 0. This means that f is sparse and, therefore, our model is appropriate.
Just as before, locating the portion of the data vector that has been clipped may be difficult because of additional noise.

• CS for networked data. In a sensor network, different sensors collect measurements of the same signal x independently (each measures z_i = ⟨a_i, x⟩) and send the outcome to a center hub for analysis [23, 30]. Setting the a_i as the row vectors of A, this is just z = Ax. However, typically some sensors will fail to send the measurements correctly, and will sometimes report totally meaningless measurements. Therefore, we collect y = Ax + f, where f models recording errors.

There have been several theoretical papers investigating exact recovery for CS with corruptions [28-30, 38, 40], and all of them consider the following recovery procedure in the noiseless case:

min_{x̃, f̃} ||x̃||_1 + λ(m, n)||f̃||_1 subject to Ax̃ + f̃ = [A, I][x̃; f̃] = y. (1.3)

We will compare them with our results in Section 1.4.

1.2 Introduction to Matrix Completion with Corruptions

Matrix completion (MC) bears some similarity with CS. Here, the goal is to recover a low-rank matrix L ∈ R^{n×n} from a small fraction of linear measurements. For simplicity, we suppose the matrix is square as above (the general case is similar). The standard model is that we observe P_O(L), where O ⊂ [n] × [n] := {1, ..., n} × {1, ..., n} and

(P_O(L))_{ij} = L_{ij} if (i, j) ∈ O, and 0 otherwise.

The problem is to recover the original matrix L, and there have been many papers studying this problem in recent years; see [8, 12, 21, 26, 33], for example. Here one minimizes the nuclear norm (the sum of all the singular values [20]) to recover the original low-rank matrix. We discuss below an improved result due to Gross [21] (with a slight difference).
Define O ∼ Ber(ρ) for some 0 < ρ < 1 to mean that the indicators 1_{(i,j)∈O} are iid Bernoulli random variables with parameter ρ. Then the solution to

min_{L̃} ||L̃||_* subject to P_O(L̃) = P_O(L), (1.4)

is guaranteed to be exactly L with high probability, provided ρ ≥ C_ρ µr log^2 n / n. Here, C_ρ is a positive numerical constant, r is the rank of L, and µ is an incoherence parameter introduced in [8] which depends only on L.

This paper is concerned with the situation in which some entries may have been corrupted. Therefore, our model is that we observe

P_O(L) + S, (1.5)

where O and L are the same as before and S ∈ R^{n×n} is supported on Ω ⊂ O. Just as in CS, this model has broad applicability. For example, Wu et al. used this model in photometric stereo [42]. This problem has also been introduced in [4] and is related to recent work on separating a low-rank component from a sparse component [4, 13, 14, 24, 43]. A typical result is that the solution (L̂, Ŝ) to

min_{L̃, S̃} ||L̃||_* + λ(m, n)||S̃||_1 subject to P_O(L̃) + S̃ = P_O(L) + S, (1.6)

is guaranteed to be the true pair (L, S) with high probability under some assumptions about L, O, S [4, 16]. We will compare them with our result in Section 1.4.

1.3 Main Results

This section introduces three models and three corresponding recovery results. The proofs of these results are deferred to Section 2 for Theorem 1.1, Section 3 for Theorem 1.2 and Section 4 for Theorem 1.3.

1.3.1 CS with iid Matrices [Model 1]

Theorem 1.1 Suppose that A is an m × n (m < n) random matrix whose entries are iid Gaussian variables with mean 0 and variance 1/m, the signal to acquire is x ∈ R^n, and our observation is y = Ax + f + w, where f, w ∈ R^m and ||w||_2 ≤ ε.
Then, choosing λ(n, m) = 1/√(log(n/m) + 1), the solution (x̂, f̂) to

min_{x̃, f̃} ||x̃||_1 + λ||f̃||_1 subject to ||(Ax̃ + f̃) − y||_2 ≤ ε (1.7)

satisfies ||x̂ − x||_2 + ||f̂ − f||_2 ≤ Kε with probability at least 1 − C exp(−cm). This holds universally; that is to say, for all vectors x and f obeying ||x||_0 ≤ αm/(log(n/m) + 1) and ||f||_0 ≤ αm. Here α, C, c and K are numerical constants.

In the above statement, the matrix A is random; everything else is deterministic. The reader will notice that the number of nonzero entries is on the same order as that needed for recovery from clean data [3, 10, 19, 37], while the condition on f implies that one can tolerate a constant fraction of possibly adversarial errors. Moreover, our convex optimization is related to the LASSO [35] and Basis Pursuit [15].

1.3.2 CS with General Sensing Matrices [Model 2]

In this model, m < n and

A = (1/√m) [a_1^*; ...; a_m^*],

where a_1, ..., a_m are iid copies of a random vector a whose distribution obeys the following two properties: 1) E aa^* = I; 2) ||a||_∞ ≤ √µ. This model was introduced in [7] and includes many of the stochastic models used in the literature. Examples include partial DFT matrices, matrices with iid entries, certain random convolutions [34], and so on. In this model, we assume that x and f in (1.2) have fixed supports, denoted by T and B, with cardinalities |T| = s and |B| = m_b. In the remainder of the paper, x_T is the restriction of x to indices in T and f_B is the restriction of f to B. Our main assumption here concerns the sign sequences: the sign sequences of x_T and f_B are independent of each other, and each is a sequence of symmetric iid ±1 variables.
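To make properties 1) and 2) concrete, the following sketch (our own toy construction, not from the paper) builds sensing vectors from the rows of a random orthogonal matrix and checks the isotropy condition E aa^* = I and the incoherence bound ||a||_∞ ≤ √µ numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
# A random orthogonal matrix: its rows form an orthonormal basis of R^n
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
rows = np.sqrt(n) * Q                    # candidate sensing vectors a_i = sqrt(n) * q_i

# Isotropy: averaging a_i a_i^T over a uniformly chosen row gives exactly the identity,
# since sum_i q_i q_i^T = Q^T Q = I for an orthogonal Q
second_moment = sum(np.outer(a, a) for a in rows) / n

# Incoherence parameter in the sense n * max_ij |A_ij|^2 <= mu (here with equality)
mu = n * np.max(np.abs(Q)) ** 2
```

Since every row of Q has unit norm, µ is always at least 1, matching the remark later in the paper that µ ≥ 1; for very flat orthogonal matrices (e.g. the DFT) µ is a constant.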
Theorem 1.2 For the model above, the solution (x̂, f̂) to (1.3), with λ(n, m) = 1/√(log n), is exact with probability at least 1 − Cn^{−3}, provided that s ≤ αm/(µ log^2 n) and m_b ≤ βm/µ. Here C, α and β are numerical constants.

Above, x and f have fixed supports and random signs. However, by a recent de-randomization technique first introduced in [4], exact recovery with random supports and fixed signs would also hold. We will explain this de-randomization technique in the proof of Theorem 1.3. In some specific models, such as independent rows from the DFT matrix, µ can be a numerical constant, which implies the proportion of corruptions is also a constant. An open problem is whether Theorem 1.2 still holds in the case where x and f have both fixed supports and fixed signs. Another open problem is whether the result would hold under more general conditions on A, as in [6], in the case where x has both random support and random signs.

We emphasize that the sparsity condition ||x||_0 ≤ Cm/(µ log^2 n) is a little stronger than the optimal result available in the noise-free literature [7, 9], namely ||x||_0 ≤ Cm/(µ log n). The extra logarithmic factor appears to be important in the proof, which we will explain in Section 3, and a third open problem is whether or not it is possible to remove this factor.

Here we do not give a sensitivity analysis for the recovery procedure as in Model 1. Actually, by applying a method similar to that introduced in [7] to our argument in Section 3, a very good error bound can be obtained in the noisy case. However, there is little technical novelty in it, and it would make our paper very long. Therefore we decided to discuss only the noiseless case and focus on the sampling rate and the corruption ratio.
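For intuition, the recovery program (1.3) can be handed to any LP solver after the usual positive/negative splitting. The sketch below is our own illustration, not the paper's code: for simplicity it uses a Gaussian sensing matrix and the weight λ = 1/√(log n) from Theorem 1.2, with toy sizes.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
m, n = 50, 60
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n); x_true[[3, 17]] = [1.5, -2.0]          # sparse signal
f_true = np.zeros(m); f_true[[0, 7, 31]] = [5.0, -4.0, 3.0]  # gross corruptions
y = A @ x_true + f_true

lam = 1.0 / np.sqrt(np.log(n))            # weight as in Theorem 1.2
# min ||x||_1 + lam * ||f||_1  s.t.  Ax + f = y, with all variables split >= 0
c = np.concatenate([np.ones(2 * n), lam * np.ones(2 * m)])
A_eq = np.hstack([A, -A, np.eye(m), -np.eye(m)])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n + 2 * m))
x_hat = res.x[:n] - res.x[n:2 * n]
f_hat = res.x[2 * n:2 * n + m] - res.x[2 * n + m:]
```

The solver simultaneously identifies the corrupted measurements (the support of f̂) and recovers x, which is exactly the behavior the theorem guarantees in the stated regime.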
1.3.3 MC from Corrupted Entries [Model 3]

We assume L is of rank r and write its reduced SVD as L = UΣV^*, where U, V ∈ R^{n×r} and Σ ∈ R^{r×r}. Let µ be the smallest quantity such that for all 1 ≤ i ≤ n,

||UU^* e_i||_2^2 ≤ µr/n, ||VV^* e_i||_2^2 ≤ µr/n, and ||UV^*||_∞ ≤ √(µr)/n.

This model is the same as that originally introduced in [8], and later used in [4, 12, 16, 21, 32]. We observe P_O(L) + S, where O ⊂ [n] × [n] and S is supported on Ω ⊂ O. Here we assume that O, Ω, S satisfy the following model:

Model 3.1:
1. Fix an n by n matrix K whose entries are either 1 or −1.
2. Define O ∼ Ber(ρ) for a constant ρ satisfying 0 < ρ < 1/2; specifically, the 1_{(i,j)∈O} are iid Bernoulli random variables with parameter ρ.
3. Conditioning on (i, j) ∈ O, assume that the events (i, j) ∈ Ω are independent with P((i, j) ∈ Ω | (i, j) ∈ O) = s. This implies that Ω ∼ Ber(ρs).
4. Define Γ := O \ Ω. Then we have Γ ∼ Ber(ρ(1 − s)).
5. Let S be supported on Ω, with sgn(S) := P_Ω(K).

Theorem 1.3 Under Model 3.1, suppose ρ > C_ρ µr log^2 n / n and s ≤ C_s. Moreover, suppose λ := 1/√(ρn log n), and denote by (L̂, Ŝ) the optimal solution to problem (1.6). Then we have (L̂, Ŝ) = (L, S) with probability at least 1 − Cn^{−3} for some numerical constant C, provided the numerical constant C_s is sufficiently small and C_ρ is sufficiently large.

In this model O is available, while Ω, Γ and S are not known explicitly from the observation P_O(L) + S. By the assumption O ∼ Ber(ρ), we can use |O|/n^2 to approximate ρ. From the following proof we can see that λ is not required to be exactly 1/√(ρn log n) for exact recovery. The power of our result is that one can recover a low-rank matrix from a nearly minimal number of samples even when a constant proportion of these samples has been corrupted.
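The sampling mechanism of Model 3.1 is easy to simulate. The following sketch (toy parameters of our own choosing) checks numerically that O, Ω and Γ have the stated Bernoulli rates and that Γ = O \ Ω partitions the observed set.

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho, s = 500, 0.3, 0.2          # toy sizes: matrix side, sampling rate, corruption rate

O = rng.random((n, n)) < rho       # O ~ Ber(rho): each entry observed independently
corrupt = rng.random((n, n)) < s   # corruption coin flipped per entry, used only on O
Omega = O & corrupt                # corrupted observed entries, Omega ~ Ber(rho * s)
Gamma = O & ~corrupt               # clean observed entries, Gamma = O \ Omega ~ Ber(rho*(1-s))

rho_hat = O.mean()                 # |O| / n^2, the estimate of rho mentioned above
```

Note the model's key feature: the corruption decision is made independently on each observed entry, so the clean set Γ is itself a Bernoulli sample, which the proof exploits.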
We discuss only the noiseless case for this model. Actually, by a method similar to [6], a suboptimal estimation error bound can be obtained by a slight modification of our argument. However, it is of little technical interest and falls short of the optimal result when n is large. There are other suboptimal results for matrix completion with noise, such as [1], but the error bound is not tight when the additional noise is small. We want to focus on the noiseless case in this paper and leave the problem with noise for future work.

The values of λ above are chosen for the theoretical guarantee of exact recovery in Theorems 1.1, 1.2 and 1.3. In practice, λ is usually chosen by cross validation.

1.4 Comparison with Existing Results, Related Work and Our Contribution

In this section we compare Theorems 1.1, 1.2 and 1.3 with existing results in the literature.

We begin with Model 1. In [40], Wright and Ma discussed a model where the sensing matrix A has independent columns with common mean µ and normal perturbations with variance σ^2/m. They chose λ(m, n) = 1, and proved that (x̂, f̂) = (x, f) with high probability provided ||x||_0 ≤ C_1(σ, n/m) m, ||f||_0 ≤ C_2(σ, n/m) m and f has random signs. Here C_1(σ, n/m) is much smaller than C/(log(n/m) + 1). We note that since the authors of [40] considered a different model, motivated by [41], it may not be directly comparable with ours. However, for our motivation of CS with corruptions, we assume A follows a symmetric distribution and obtain a better sampling rate.

A bit later, Laska et al. [28] and Li et al. [29] also studied this problem. Setting λ(m, n) = 1, both papers establish that for Gaussian (or sub-Gaussian) sensing matrices A, if m > C(||x||_0 + ||f||_0) log((n + m)/(||x||_0 + ||f||_0)), then the recovery is exact.
This follows from the fact that [A, I] obeys a restricted isometry property known to guarantee exact recovery of sparse vectors via ℓ1 minimization. Furthermore, the sparsity requirement on x is the same as that found in the standard CS literature, namely ||x||_0 ≤ Cm/(log(n/m) + 1). However, the result does not allow a positive fraction of corruptions. For example, if m = √n, we have ||f||_0/m ≤ 2/log n, which goes to zero as n goes to infinity.

As for Model 2, an interesting piece of work [30] (and later [31] on the noisy case) appeared during the preparation of this paper. These papers discuss models in which A is formed by selecting rows from an orthogonal matrix with low incoherence parameter µ, which is the minimum value such that n|A_{ij}|^2 ≤ µ for any i, j. The main result states that selecting λ = √(n/(Cµm log n)) gives exact recovery under the following assumptions: 1) the rows of A are chosen from an orthogonal matrix uniformly at random; 2) x is a random signal with independent signs, equally likely to be ±1; 3) the support of f is chosen uniformly at random. (By the de-randomization technique introduced in [4] and used in [30], it would have been sufficient to assume that the signs of f are independent and take on the values ±1 with equal probability.) Finally, the sparsity conditions require m ≥ Cµ^2 ||x||_0 log^2 n and ||f||_0 ≤ Cm, which are nearly optimal, for the best known sparsity condition when f = 0 is m ≥ Cµ ||x||_0 log n. In other words, the result is optimal up to an extra factor of µ log n; the sparsity condition on f is of course nearly optimal. However, the model for A does not include some models frequently discussed in the literature, such as subsampled tight or continuous frames.
Against this background, a recent paper of Candès and Plan [7] considers a very general framework, which includes many common models in the literature. Theorem 1.2 in our paper is similar to Theorem 1 in [30]. It assumes similar sparsity conditions, but is based on the much broader and more applicable model introduced in [7]. Notice that we require m ≥ Cµ ||x||_0 log^2 n, whereas [30] requires m ≥ Cµ^2 ||x||_0 log^2 n. Therefore, we improve the condition by a factor of µ, which is always at least 1 and can be as large as n. However, our result imposes ||f||_0 ≤ Cm/µ, which is worse than ||f||_0 ≤ γm by the same factor. In [30], the parameter λ depends upon µ, while our λ is only a function of m and n. This is why the results differ, and we prefer a value of λ that does not depend on µ because in some applications an accurate estimate of µ may be difficult to obtain. In addition, we use different proof techniques, in which the clever golfing scheme of [21] is exploited.

Sparse approximation is another underdetermined linear-system problem, where the dictionary matrix A is usually assumed to be deterministic. Readers interested in this problem (which usually requires stronger sparsity conditions) may also want to study the recent paper [38] by Studer et al. There, the authors introduce a more general problem of the form y = Ax + Bf, and analyze the performance of ℓ1-recovery techniques using ideas which have been popularized under the name of generalized uncertainty principles in the basis pursuit and sparse approximation literature.

As for Model 3, Theorem 1.3 is a significant extension of the results presented in [4], in which the authors have the stringent requirement ρ = 0.1.
In a very recent and independent work [16], the authors consider a model where both O and Ω are unions of stochastic and deterministic subsets, while we only assume the stochastic model. We recommend interested readers to read that paper for the details. However, considering only their results on stochastic O and Ω, a direct comparison shows that the number of samples we need is smaller than that in this reference; the difference is several logarithmic factors. Actually, the requirement on ρ in our paper is optimal even for clean data in the MC literature. Finally, we want to emphasize that the random support assumption is essential in Theorem 1.3 when the rank is large. Examples can be found in [24].

We wish to close our introduction with a few words concerning the techniques of proof we shall use. The proof of Theorem 1.1 is based on the concept of restricted isometry, which is a standard technique in the CS literature. However, our argument involves a generalization of the restricted isometry concept. The proofs of Theorems 1.2 and 1.3 are based on the golfing scheme, an elegant technique pioneered by David Gross [21], and later used in [4, 7, 32] to construct dual certificates. Our proof leverages results from [4]. However, we contribute novel elements by finding an appropriate way to phrase sufficient optimality conditions which are amenable to the golfing scheme. Details are presented in the following sections.

2 A Proof of Theorem 1.1

In the proof of Theorem 1.1, we will use the notation P_T x. Here x is a k-dimensional vector, T is a subset of {1, ..., k}, and we also use T to denote the subspace of all k-dimensional vectors supported on T. Then P_T x is the projection of x onto the subspace T, which keeps the values of x on the support T and sets the other elements to zero. In this section we use the floor-function notation ⌊·⌋ for the integer part of a real number.

First we generalize the concept of the restricted isometry property (RIP) [11] for the convenience of proving our theorem:

Definition 2.1 For any matrix Φ ∈ R^{l×(n+m)}, define the RIP constant δ_{s1,s2} as the infimum value of δ such that

(1 − δ)(||x||_2^2 + ||f||_2^2) ≤ ||Φ[x; f]||_2^2 ≤ (1 + δ)(||x||_2^2 + ||f||_2^2)

holds for any x ∈ R^n with |supp(x)| ≤ s1 and f ∈ R^m with |supp(f)| ≤ s2.

Lemma 2.2 For any x1, x2 ∈ R^n and f1, f2 ∈ R^m such that supp(x1) ∩ supp(x2) = ∅, |supp(x1)| + |supp(x2)| ≤ s1, supp(f1) ∩ supp(f2) = ∅ and |supp(f1)| + |supp(f2)| ≤ s2, we have

|⟨Φ[x1; f1], Φ[x2; f2]⟩| ≤ δ_{s1,s2} √(||x1||_2^2 + ||f1||_2^2) √(||x2||_2^2 + ||f2||_2^2).

Proof First, suppose ||x1||_2^2 + ||f1||_2^2 = ||x2||_2^2 + ||f2||_2^2 = 1. By the definition of δ_{s1,s2}, we have

2(1 − δ_{s1,s2}) ≤ ⟨Φ[x1 + x2; f1 + f2], Φ[x1 + x2; f1 + f2]⟩ ≤ 2(1 + δ_{s1,s2}),

and

2(1 − δ_{s1,s2}) ≤ ⟨Φ[x1 − x2; f1 − f2], Φ[x1 − x2; f1 − f2]⟩ ≤ 2(1 + δ_{s1,s2}).

By the above inequalities, we have |⟨Φ[x1; f1], Φ[x2; f2]⟩| ≤ δ_{s1,s2}, and hence by homogeneity,

|⟨Φ[x1; f1], Φ[x2; f2]⟩| ≤ δ_{s1,s2} √(||x1||_2^2 + ||f1||_2^2) √(||x2||_2^2 + ||f2||_2^2)

without the norm assumption.

Lemma 2.3 Suppose Φ ∈ R^{l×(n+m)} has RIP constant δ_{2s1,2s2} < 1/18 (s1, s2 > 0), and λ lies between (1/2)√(s1/s2) and 2√(s1/s2). Then for any x ∈ R^n with |supp(x)| ≤ s1, any f ∈ R^m with |supp(f)| ≤ s2, and any w ∈ R^m with ||w||_2 ≤ ε, the solution (x̂, f̂) to the optimization problem (1.7) satisfies

||x̂ − x||_2 + ||f̂ − f||_2 ≤ (4√(13 + 13δ_{2s1,2s2}) / (1 − 9δ_{2s1,2s2})) ε.

Proof Set Δx = x̂ − x and Δf = f̂ − f. Then by (1.7) we have

||Φ[Δx; Δf]||_2 = ||(Φ[x̂; f̂] − y) + w||_2 ≤ ||Φ[x̂; f̂] − y||_2 + ||w||_2 ≤ 2ε.
It is easy to check that the original (x, f) satisfies the inequality constraint in (1.7), so we have

||x + Δx||_1 + λ||f + Δf||_1 ≤ ||x||_1 + λ||f||_1. (2.1)

Then it suffices to show ||Δx||_2 + ||Δf||_2 ≤ (4√(13 + 13δ_{2s1,2s2}) / (1 − 9δ_{2s1,2s2})) ε.

Choose T_0 with |T_0| = s1 such that supp(x) ⊂ T_0. Write T_0^c = T_1 ∪ ... ∪ T_l, where |T_1| = ... = |T_{l−1}| = s1 and |T_l| ≤ s1; moreover, T_1 contains the indices of the s1 largest (in absolute value) coefficients of P_{T_0^c}Δx, T_2 contains the indices of the s1 largest coefficients of P_{(T_0∪T_1)^c}Δx, and so on. Similarly, define V_0 such that supp(f) ⊂ V_0 and |V_0| = s2, and divide V_0^c = V_1 ∪ ... ∪ V_k in the same way. By this setup, we easily have

Σ_{j≥2} ||P_{T_j}Δx||_2 ≤ s1^{−1/2} ||P_{T_0^c}Δx||_1, (2.2)

and

Σ_{j≥2} ||P_{V_j}Δf||_2 ≤ s2^{−1/2} ||P_{V_0^c}Δf||_1. (2.3)

On the other hand, by the assumptions supp(x) ⊂ T_0 and supp(f) ⊂ V_0, we have

||x + Δx||_1 = ||P_{T_0}x + P_{T_0}Δx||_1 + ||P_{T_0^c}Δx||_1 ≥ ||x||_1 − ||P_{T_0}Δx||_1 + ||P_{T_0^c}Δx||_1, (2.4)

and similarly,

||f + Δf||_1 ≥ ||f||_1 − ||P_{V_0}Δf||_1 + ||P_{V_0^c}Δf||_1. (2.5)

By inequalities (2.1), (2.4) and (2.5), we have

||P_{T_0^c}Δx||_1 + λ||P_{V_0^c}Δf||_1 ≤ ||P_{T_0}Δx||_1 + λ||P_{V_0}Δf||_1. (2.6)

By the definition of δ_{2s1,2s2}, the fact ||Φ[Δx; Δf]||_2 ≤ 2ε, and Lemma 2.2, we have

(1 − δ_{2s1,2s2}) (||P_{T_0}Δx + P_{T_1}Δx||_2^2 + ||P_{V_0}Δf + P_{V_1}Δf||_2^2)
≤ ||Φ[P_{T_0}Δx + P_{T_1}Δx; P_{V_0}Δf + P_{V_1}Δf]||_2^2
= ⟨Φ[P_{T_0}Δx + P_{T_1}Δx; P_{V_0}Δf + P_{V_1}Δf], Φ[Δx; Δf] − Φ[P_{T_2}Δx + ... + P_{T_l}Δx; P_{V_2}Δf + ... + P_{V_k}Δf]⟩
≤ −⟨Φ[P_{T_0}Δx + P_{T_1}Δx; P_{V_0}Δf + P_{V_1}Δf], Φ[P_{T_2}Δx + ... + P_{T_l}Δx; P_{V_2}Δf + ... + P_{V_k}Δf]⟩ + 2ε ||Φ[P_{T_0}Δx + P_{T_1}Δx; P_{V_0}Δf + P_{V_1}Δf]||_2
≤ δ_{2s1,2s2} (||[P_{T_0}Δx; P_{V_0}Δf]||_2 + ||[P_{T_1}Δx; P_{V_1}Δf]||_2)(Σ_{j≥2}||P_{T_j}Δx||_2 + Σ_{j≥2}||P_{V_j}Δf||_2) + 2ε √(1 + δ_{2s1,2s2}) √(||P_{T_0}Δx||_2^2 + ||P_{T_1}Δx||_2^2 + ||P_{V_0}Δf||_2^2 + ||P_{V_1}Δf||_2^2).

Moreover, since

Σ_{j≥2}||P_{T_j}Δx||_2 + Σ_{j≥2}||P_{V_j}Δf||_2
≤ s1^{−1/2}||P_{T_0^c}Δx||_1 + s2^{−1/2}||P_{V_0^c}Δf||_1 (by (2.2) and (2.3))
≤ 2 s1^{−1/2}(||P_{T_0^c}Δx||_1 + λ||P_{V_0^c}Δf||_1) (by λ > (1/2)√(s1/s2))
≤ 2 s1^{−1/2}(||P_{T_0}Δx||_1 + λ||P_{V_0}Δf||_1) (by (2.6))
≤ 2 s1^{−1/2}(s1^{1/2}||P_{T_0}Δx||_2 + λ s2^{1/2}||P_{V_0}Δf||_2) (by the Cauchy-Schwarz inequality)
≤ 4||P_{T_0}Δx||_2 + 4||P_{V_0}Δf||_2 (by λ < 2√(s1/s2)),

we have

(||[P_{T_0}Δx; P_{V_0}Δf]||_2 + ||[P_{T_1}Δx; P_{V_1}Δf]||_2)(Σ_{j≥2}||P_{T_j}Δx||_2 + Σ_{j≥2}||P_{V_j}Δf||_2)
≤ 8(||P_{T_0}Δx||_2^2 + ||P_{T_1}Δx||_2^2 + ||P_{V_0}Δf||_2^2 + ||P_{V_1}Δf||_2^2).

Therefore, since δ_{2s1,2s2} < 1/9, we have

√(||P_{T_0}Δx||_2^2 + ||P_{T_1}Δx||_2^2 + ||P_{V_0}Δf||_2^2 + ||P_{V_1}Δf||_2^2) ≤ 2ε √(1 + δ_{2s1,2s2}) / (1 − 9δ_{2s1,2s2}).

Since Σ_{j≥2}||P_{T_j}Δx||_2 + Σ_{j≥2}||P_{V_j}Δf||_2 ≤ 4||P_{T_0}Δx||_2 + 4||P_{V_0}Δf||_2, we have

||Δx||_2 + ||Δf||_2 ≤ 5(||P_{T_0}Δx||_2 + ||P_{V_0}Δf||_2) + (||P_{T_1}Δx||_2 + ||P_{V_1}Δf||_2)
≤ √52 √(||P_{T_0}Δx||_2^2 + ||P_{T_1}Δx||_2^2 + ||P_{V_0}Δf||_2^2 + ||P_{V_1}Δf||_2^2)
≤ (4√(13 + 13δ_{2s1,2s2}) / (1 − 9δ_{2s1,2s2})) ε.

We now cite a well-known result from the CS literature, e.g. Theorem 5.2 of [3].

Lemma 2.4 Suppose A is a random matrix as defined in Model 1. Then for any 0 < δ < 1, there exist c_1(δ), c_2(δ) > 0 such that, with probability at least 1 − 2 exp(−c_2(δ)m),

(1 − δ)||x||_2^2 ≤ ||Ax||_2^2 ≤ (1 + δ)||x||_2^2

holds universally for any x with |supp(x)| ≤ c_1(δ) m/(log(n/m) + 1).
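Lemma 2.4 can be sanity-checked numerically: for a Gaussian matrix with entry variance 1/m, ||Ax||_2^2 concentrates around ||x||_2^2 for sparse unit vectors. The Monte Carlo sketch below (our own toy check of pointwise concentration; it does not verify the uniform, over-all-supports statement of the lemma) illustrates this.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, s = 200, 400, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)   # iid N(0, 1/m) entries, as in Model 1

# For random s-sparse unit vectors x, ||Ax||_2^2 should stay close to ||x||_2^2 = 1
ratios = []
for _ in range(100):
    x = np.zeros(n)
    idx = rng.choice(n, s, replace=False)
    x[idx] = rng.standard_normal(s)
    x /= np.linalg.norm(x)
    ratios.append(np.linalg.norm(A @ x) ** 2)
```

The fluctuation is on the order of √(1/m) per vector; the lemma's content is that, for s up to c_1(δ)m/(log(n/m)+1), the deviation stays below δ simultaneously over all supports.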
Also, we cite a well-known result bounding the largest singular value of a random matrix, e.g. [17] and [39].

Lemma 2.5 Let B be an m × n matrix whose entries are independent standard normal random variables. Then for every t ≥ 0, with probability at least 1 − 2 exp(−t^2/2), one has ||B||_{2,2} ≤ √m + √n + t.

We now prove Theorem 1.1:

Proof Suppose α, δ are two constants independent of m and n, whose values will be specified later. Set s1 = ⌊αm/(log(n/m) + 1)⌋ and s2 = ⌊αm⌋. We want to bound the RIP constant δ_{2s1,2s2} of the m × (n + m) matrix Φ = [A, I] when α is sufficiently small. For any T with |T| = 2s1 and V with |V| = 2s2, and any x with supp(x) ⊂ T, any f with supp(f) ⊂ V, we have

||[A, I][x; f]||_2^2 = ||Ax + f||_2^2 = ||Ax||_2^2 + ||f||_2^2 + 2⟨P_V A P_T x, f⟩.

By Lemma 2.4, assuming α ≤ c_1(δ), with probability at least 1 − 2 exp(−c_2(δ)m),

(1 − δ)||x||_2^2 ≤ ||Ax||_2^2 ≤ (1 + δ)||x||_2^2 (2.7)

holds universally for all such T and x. Now fix T and V; we want to bound ||P_V A P_T||_{2,2}. By Lemma 2.5, we have

||P_V A P_T||_{2,2} ≤ (1/√m)(√(2s1) + √(2s2) + √(δ^2 m)) ≤ 2√(2α) + δ (2.8)

with probability at least 1 − 2 exp(−δ^2 m/2). Then, by a union bound, with probability at least 1 − 2 (n choose 2s1)(m choose 2s2) exp(−δ^2 m/2), inequality (2.8) holds universally for all T with |T| = 2s1 and V with |V| = 2s2. Since 2s1 ≤ 2αm/(log(n/m) + 1), we have 2s1 log(en/(2s1)) ≤ α_1 m, where α_1 depends only on α and α_1 → 0 as α → 0, and hence (n choose 2s1) ≤ (en/(2s1))^{2s1} ≤ exp(α_1 m). Similarly, because 2s2 ≤ 2αm, we have 2s2 log(em/(2s2)) ≤ α_2 m, where α_2 depends only on α and α_2 → 0 as α → 0, and hence (m choose 2s2) ≤ (em/(2s2))^{2s2} ≤ exp(α_2 m).
Therefore, inequality (2.8) holds universally for all such T and V with probability at least 1 − 2 exp(−(δ^2/2 − α_1 − α_2)m). Combined with (2.7), we have that

(1 − δ)||x||_2^2 + ||f||_2^2 − (2√(2α) + δ)||x||_2 ||f||_2 ≤ ||[A, I][x; f]||_2^2 ≤ (1 + δ)||x||_2^2 + ||f||_2^2 + (2√(2α) + δ)||x||_2 ||f||_2

holds universally for all such T, V, x and f with probability at least 1 − 2 exp(−c_2(δ)m) − 2 exp(−(δ^2/2 − α_1 − α_2)m). By choosing an appropriate δ and taking α sufficiently small, we have δ_{2s1,2s2} < 1/9 with probability at least 1 − Ce^{−cm}. Moreover, under the assumption αm/(log(n/m) + 1) ≥ 1, we have s1 = ⌊αm/(log(n/m) + 1)⌋ > 0, s2 = ⌊αm⌋ > 0 and

(1/2)√(s1/s2) < 1/√(log(n/m) + 1) < 2√(s1/s2).

Then Theorem 1.1 follows as a direct corollary of Lemma 2.3.

3 A Proof of Theorem 1.2

In this section we will encounter several absolute constants. Instead of denoting them by C_1, C_2, ..., we just use C; i.e., the value of C may change from line to line. Also, we will use the phrase "with high probability" to mean with probability at least 1 − Cn^{−c}, where C > 0 is a numerical constant and c = 3, 4, or 5 depending on the context.

We will use a good deal of notation for sub-matrices and sub-vectors. Suppose A ∈ R^{m×n}, P ⊂ [m] := {1, ..., m}, Q ⊂ [n] and i ∈ [n]. We denote by A_{P,:} the sub-matrix of A with row indices in P, by A_{:,Q} the sub-matrix of A with column indices in Q, and by A_{P,Q} the sub-matrix of A with row indices in P and column indices in Q. Moreover, we denote by A_{P,i} the sub-matrix of A with row indices in P and column i, which is actually a column vector. The term "vector" means column vector in this section, and all row vectors are denoted by the adjoint of a vector, such as a^* for a vector a. Suppose a is a vector and T a subset of indices.
Then we denote by a_T the restriction of a to T, i.e., the vector consisting of the elements of a with indices in T. For any vector v, we use v{i} to denote the i-th element of v.

3.1 Supporting Lemmas

To prove Theorem 1.2 we need some supporting lemmas. Because our model of the sensing matrix A is the same as in [7], we cite some lemmas from it directly.

Lemma 3.1 (Lemma 2.1 of [7]) Suppose A is as defined in Model 2. Let T ⊂ [n] be a fixed set of cardinality s. Then for δ > 0,

P(||A_{:,T}^* A_{:,T} − I||_{2,2} ≥ δ) ≤ 2s exp(−(m/(µs)) · δ^2/(2(1 + δ/3))).

In particular, ||A_{:,T}^* A_{:,T} − I||_{2,2} ≤ 1/2 with high probability provided s ≤ γm/(µ log n), and ||A_{:,T}^* A_{:,T} − I||_{2,2} ≤ 1/(2√(log n)) with high probability provided s ≤ γm/(µ log^2 n), where γ is some absolute constant.

This lemma was proved in [7] by the matrix Bernstein inequality, first introduced in [2]. A deep generalization is given in [25].

Lemma 3.2 (Lemma 2.4 of [7]) Suppose A is as defined in Model 2. Fix T ⊂ [n] with |T| = s and v ∈ R^s. Then ||A_{:,T^c}^* A_{:,T} v||_∞ ≤ (1/(20√s))||v||_2 with high probability provided s ≤ γm/(µ log n), where γ is some absolute constant.

Lemma 3.3 (Lemma 2.5 of [7]) Suppose A is as defined in Model 2. Fix T ⊂ [n] with |T| = s. Then max_{i∈T^c} ||A_{:,T}^* A_{:,i}||_2 ≤ 1 with high probability provided s ≤ γm/(µ log n), where γ is some absolute constant.

3.2 A Proof of Theorem 1.2

In this part we give a complete proof of Theorem 1.2 using a powerful technique called the "golfing scheme", introduced by David Gross in [21] and used later in [4] and [7]. Under the assumptions of Model 2, we additionally assume s ≤ αm/(µ log^2 n) and m_b ≤ βm/µ, where α and β are numerical constants whose values will be specified later. First we give two useful inequalities.
By replacing $A$ with $\sqrt{\frac{m}{m-m_b}}A_{B^c,T}$ in Lemma 3.1 and Lemma 3.2, we have
$$\left\|\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,T} - I\right\|_{2,2} \le 1/2 \quad (3.1)$$
and
$$\max_{i\in T^c}\left\|\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,i}\right\|_2 \le 1 \quad (3.2)$$
with high probability provided $s \le \gamma\frac{m-m_b}{\mu\log n}$. Since $s\le\alpha\frac{m}{\mu\log^2 n}$ and $m_b\le\beta\frac{m}{\mu}$, both (3.1) and (3.2) hold with high probability provided $\alpha$ and $\beta$ are sufficiently small. We assume (3.1) and (3.2) hold throughout this section.

First we prove that the solution $(\hat x, \hat f)$ of (1.3) equals $(x, f)$ if we can find an appropriate dual vector $q_{B^c}$ satisfying the following requirement. This is actually an "inexact dual vector" of the optimization problem (1.3). The idea was first given explicitly in [22] and [21], and is related to [5]. We give a result similar to [7].

Lemma 3.4 (Inexact Duality) Suppose there exists a vector $q_{B^c}\in\mathbb{R}^{m-m_b}$ satisfying
$$\|v_T - \mathrm{sgn}(x_T)\|_2 \le \lambda/4,\qquad \|v_{T^c}\|_\infty \le 1/4 \qquad\text{and}\qquad \|q_{B^c}\|_\infty\le\lambda/4, \quad (3.3)$$
where
$$v = A_{B^c,:}^*\, q_{B^c} + A_{B,:}^*\,\lambda\,\mathrm{sgn}(f_B). \quad (3.4)$$
Then the solution $(\hat x,\hat f)$ of (1.3) equals $(x,f)$ provided $\beta$ is sufficiently small and $\lambda < \frac32$.

Proof. Set $h = \hat x - x$. By $x_{T^c} = 0$ we have
$$h_{T^c} = \hat x_{T^c}. \quad (3.5)$$
By $f_{B^c} = 0$ and $Ax + f = A\hat x + \hat f$, we have $Ah = f - \hat f$ and
$$A_{B^c,:}h = (f-\hat f)_{B^c} = -\hat f_{B^c}. \quad (3.6)$$
Then we have the following inequality:
$$\begin{aligned}
\|\hat x\|_1 + \lambda\|\hat f\|_1 &= \langle\hat x_T, \mathrm{sgn}(\hat x_T)\rangle + \|\hat x_{T^c}\|_1 + \lambda\left(\langle\hat f_B,\mathrm{sgn}(\hat f_B)\rangle + \|\hat f_{B^c}\|_1\right)\\
&\ge \langle\hat x_T,\mathrm{sgn}(x_T)\rangle + \|\hat x_{T^c}\|_1 + \lambda\left(\langle\hat f_B,\mathrm{sgn}(f_B)\rangle + \|\hat f_{B^c}\|_1\right)\\
&= \langle x_T + h_T, \mathrm{sgn}(x_T)\rangle + \|h_{T^c}\|_1 + \lambda\left(\langle f_B - A_{B,:}h, \mathrm{sgn}(f_B)\rangle + \|A_{B^c,:}h\|_1\right) &&\text{by (3.5), (3.6)}\\
&= \|x\|_1 + \lambda\|f\|_1 + \|h_{T^c}\|_1 + \lambda\|A_{B^c,:}h\|_1 + \langle h_T,\mathrm{sgn}(x_T)\rangle - \lambda\langle A_{B,:}h,\mathrm{sgn}(f_B)\rangle.
\end{aligned}$$
Since $\|\hat x\|_1 + \lambda\|\hat f\|_1 \le \|x\|_1 + \lambda\|f\|_1$, we have
$$\|h_{T^c}\|_1 + \lambda\|A_{B^c,:}h\|_1 + \langle h_T,\mathrm{sgn}(x_T)\rangle - \lambda\langle A_{B,:}h,\mathrm{sgn}(f_B)\rangle \le 0.$$
(3.7)

By (3.4), we have
$$\langle h_T, v_T\rangle + \langle h_{T^c}, v_{T^c}\rangle = \langle h, v\rangle = \langle h, A_{B^c,:}^* q_{B^c} + A_{B,:}^*\lambda\,\mathrm{sgn}(f_B)\rangle = \langle A_{B^c,:}h,\, q_{B^c}\rangle + \lambda\langle A_{B,:}h,\, \mathrm{sgn}(f_B)\rangle,$$
and then by (3.3),
$$\begin{aligned}
\langle h_T,\mathrm{sgn}(x_T)\rangle - \lambda\langle A_{B,:}h,\mathrm{sgn}(f_B)\rangle &= \langle h_T, \mathrm{sgn}(x_T)-v_T\rangle + \langle A_{B^c,:}h,\, q_{B^c}\rangle - \langle h_{T^c}, v_{T^c}\rangle\\
&\ge -\frac{\lambda}{4}\|h_T\|_2 - \frac{\lambda}{4}\|A_{B^c,:}h\|_1 - \frac14\|h_{T^c}\|_1.
\end{aligned}$$
Combining this with (3.7), we have
$$-\frac{\lambda}{4}\|h_T\|_2 + \frac{3\lambda}{4}\|A_{B^c,:}h\|_1 + \frac34\|h_{T^c}\|_1 \le 0. \quad (3.8)$$
By (3.1), we have $\left\|\sqrt{\frac{m}{m-m_b}}A_{B^c,T}\right\|_{2,2} \le \sqrt{3/2}$, and the smallest singular value of $\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,T}$ is at least $\frac12$. Therefore,
$$\begin{aligned}
\|h_T\|_2 &\le 2\left\|\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,T} h_T\right\|_2\\
&\le 2\left\|\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,T^c} h_{T^c}\right\|_2 + 2\left\|\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,:} h\right\|_2\\
&\le 2\left\|\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,T^c} h_{T^c}\right\|_2 + \sqrt6\,\sqrt{\frac{m}{m-m_b}}\,\|A_{B^c,:}h\|_2\\
&\le 2\sum_{i\in T^c}\left\|\frac{m}{m-m_b}A_{B^c,T}^* A_{B^c,i}\right\|_2 |h\{i\}| + \sqrt6\,\sqrt{\frac{m}{m-m_b}}\,\|A_{B^c,:}h\|_2 &&\text{by the triangle inequality}\\
&\le 2\|h_{T^c}\|_1 + \sqrt6\,\sqrt{\frac{m}{m-m_b}}\,\|A_{B^c,:}h\|_1 &&\text{by (3.2)}.
\end{aligned}$$
Plugging this into (3.8), we have
$$\left(\frac34 - \frac{\lambda}{2}\right)\|h_{T^c}\|_1 + \left(\frac34 - \frac{\sqrt6}{4}\sqrt{\frac{m}{m-m_b}}\right)\lambda\|A_{B^c,:}h\|_1 \le 0.$$
We know $\frac34 - \frac{\sqrt6}{4}\sqrt{\frac{m}{m-m_b}} > 0$ when $\beta$ is sufficiently small. Moreover, by the assumption $\lambda < \frac32$, we have $h_{T^c} = 0$ and $A_{B^c,:}h = 0$. Since $A_{B^c,:}h = A_{B^c,T}h_T + A_{B^c,T^c}h_{T^c}$, we have $A_{B^c,T}h_T = 0$. Inequality (3.1) implies that $A_{B^c,T}$ is injective, so $h_T = 0$ and $h = h_T + h_{T^c} = 0$, which implies $(\hat x,\hat f) = (x,f)$.

Now let us construct a vector $q_{B^c}$ satisfying requirement (3.3) by choosing an appropriate $\lambda$.

Proof (of Theorem 1.2). Set $\lambda = \frac{1}{\sqrt{\log n}}$. It suffices to construct a $q_{B^c}$ satisfying (3.3). Denoting $u = A_{B^c,:}^*\, q_{B^c}$, we only need to construct a $q_{B^c}$ satisfying
$$\|u_T + \lambda A_{B,T}^*\mathrm{sgn}(f_B) - \mathrm{sgn}(x_T)\|_2 \le \frac{\lambda}{4},\qquad \|u_{T^c}\|_\infty \le \frac18,\qquad \|\lambda A_{B,:}^*\mathrm{sgn}(f_B)\|_\infty \le \frac18,\qquad \|q_{B^c}\|_\infty \le \frac{\lambda}{4}.$$
Now let us construct our $q_{B^c}$ by the golfing scheme.
First we have to write $A_{B^c,:}$ as a block matrix. We divide $B^c$ into $l = \lfloor\log_2 n + 1\rfloor = \left\lfloor\frac{\log n}{\log 2} + 1\right\rfloor$ disjoint subsets: $B^c = G_1\cup\cdots\cup G_l$, where $|G_i| = m_i$. Then we have $\sum_{i=1}^l m_i = m - m_b$ and
$$A_{B^c,:} = \begin{bmatrix} A_{G_1,:}\\ \vdots\\ A_{G_l,:}\end{bmatrix}.$$
We want to mention that the partition of $B^c$ is deterministic, not depending on $A$, so $A_{G_1,:},\ldots,A_{G_l,:}$ are independent. Noticing $m_b \le \beta\frac{m}{\mu} \le \beta m$, by letting $\beta$ be sufficiently small we can require
$$\frac{m}{m_1}\le C,\qquad \frac{m}{m_2}\le C,\qquad \frac{m}{m_k}\le C\log n \quad\text{for } k = 3,\ldots,l$$
for some absolute constant $C$. Since $s\le\alpha\frac{m}{\mu\log^2 n}$, we have
$$s\le \alpha C\frac{m_1}{\mu\log^2 n},\qquad s\le\alpha C\frac{m_2}{\mu\log^2 n},\qquad s\le\alpha C\frac{m_k}{\mu\log n}\quad\text{for } k=3,\ldots,l. \quad (3.9)$$
Then by Lemma 3.1, replacing $A$ with $\sqrt{\frac{m}{m_j}}A_{G_j,T}$, we have the following inequalities:
$$\left\|\frac{m}{m_j}A_{G_j,T}^* A_{G_j,T} - I\right\|_{2,2} \le \frac{1}{2\sqrt{\log n}}\quad\text{for } j = 1,2; \quad (3.10)$$
$$\left\|\frac{m}{m_j}A_{G_j,T}^* A_{G_j,T} - I\right\|_{2,2} \le \frac12\quad\text{for } j = 3,\ldots,l; \quad (3.11)$$
with high probability provided $\alpha$ is sufficiently small.

Now let us give an explicit construction of $q_{B^c}$. Define
$$p_0 = \mathrm{sgn}(x_T) - \lambda A_{B,T}^*\mathrm{sgn}(f_B) \quad (3.12)$$
and
$$p_i = \left(I - \frac{m}{m_i}A_{G_i,T}^* A_{G_i,T}\right)p_{i-1} = \left(I - \frac{m}{m_i}A_{G_i,T}^* A_{G_i,T}\right)\cdots\left(I - \frac{m}{m_1}A_{G_1,T}^* A_{G_1,T}\right)p_0 \quad (3.13)$$
for $i = 1,\ldots,l$, and construct
$$q_{B^c} = \begin{bmatrix} \frac{m}{m_1}A_{G_1,T}\,p_0\\ \vdots\\ \frac{m}{m_l}A_{G_l,T}\,p_{l-1}\end{bmatrix}. \quad (3.14)$$
Then by $u = A_{B^c,:}^*\, q_{B^c}$, we have
$$u = A_{B^c,:}^*\begin{bmatrix}\frac{m}{m_1}A_{G_1,T}\,p_0\\ \vdots\\ \frac{m}{m_l}A_{G_l,T}\,p_{l-1}\end{bmatrix} = \sum_{i=1}^l \frac{m}{m_i}A_{G_i,:}^* A_{G_i,T}\,p_{i-1}. \quad (3.15)$$
We now bound the $\ell_2$ norm of $p_i$. By (3.10), (3.11) and (3.13), we have
$$\|p_1\|_2 \le \frac{1}{2\sqrt{\log n}}\|p_0\|_2, \quad (3.16)$$
$$\|p_2\|_2 \le \frac{1}{4\log n}\|p_0\|_2, \quad (3.17)$$
$$\|p_j\|_2 \le \frac{1}{\log n}\left(\frac12\right)^j\|p_0\|_2\quad\text{for } j = 3,\ldots,l. \quad (3.18)$$
Now we will prove that our constructed $q_{B^c}$ satisfies the desired requirements.

The proof of $\|\lambda A_{B,:}^*\mathrm{sgn}(f_B)\|_\infty \le \frac18$. By Hoeffding's inequality, for any $i = 1,\ldots,n$, we have
$$\mathbb{P}\left(\left|A_{B,i}^*\,\mathrm{sgn}(f_B)\right| \ge t\right) \le 2\exp\left(-\frac{2t^2}{4\|A_{B,i}\|_2^2}\right).$$
By choosing $t = C\sqrt{\log n}\,\|A_{B,i}\|_2$ ($C$ is some absolute constant), with high probability we have
$$\left|\lambda A_{B,i}^*\,\mathrm{sgn}(f_B)\right| \le \lambda C\sqrt{\log n}\,\|A_{B,i}\|_2 \le C\sqrt{\frac{\mu m_b}{m}} \le \sqrt\beta \le \frac18,$$
provided $\beta$ is sufficiently small, and this implies $\|\lambda A_{B,:}^*\mathrm{sgn}(f_B)\|_\infty \le \frac18$.

The proof of $\|u_T + \lambda A_{B,T}^*\mathrm{sgn}(f_B) - \mathrm{sgn}(x_T)\|_2 \le \frac{\lambda}{4}$. By (3.15) and (3.13), we have
$$u_T = \sum_{i=1}^l \frac{m}{m_i}A_{G_i,T}^* A_{G_i,T}\,p_{i-1} = \sum_{i=1}^l (p_{i-1} - p_i) = p_0 - p_l.$$
Then by (3.12) we have
$$\|u_T + \lambda A_{B,T}^*\mathrm{sgn}(f_B) - \mathrm{sgn}(x_T)\|_2 = \|u_T - p_0\|_2 = \|p_l\|_2.$$
Since $\|\lambda A_{B,:}^*\mathrm{sgn}(f_B)\|_\infty \le 1/8$, we have $\|\lambda A_{B,T}^*\mathrm{sgn}(f_B)\|_2 \le \frac18\sqrt s$, which implies
$$\|p_0\|_2 = \|\lambda A_{B,T}^*\mathrm{sgn}(f_B) - \mathrm{sgn}(x_T)\|_2 \le \frac98\sqrt s. \quad (3.19)$$
Then by (3.18) and $l = \lfloor\log_2 n + 1\rfloor$, we have
$$\|p_l\|_2 \le \frac{1}{\log n}\left(\frac12\right)^l\cdot\frac98\sqrt s \le \frac{1}{\log n}\cdot\frac1n\cdot\frac98\sqrt{\frac{\alpha m}{\mu\log^2 n}} \le \frac{1}{4\sqrt{\log n}} = \frac{\lambda}{4},$$
provided $\alpha$ is sufficiently small.

The proof of $\|u_{T^c}\|_\infty \le 1/8$. By (3.15), we have
$$u_{T^c} = \sum_{i=1}^l\frac{m}{m_i}A_{G_i,T^c}^* A_{G_i,T}\,p_{i-1}.$$
Recall that $A_{G_1,:},\ldots,A_{G_l,:}$ are independent, so by the construction of $p_{i-1}$ we know $A_{G_i,:}$ and $p_{i-1}$ are independent. Replacing $A$ with $\sqrt{\frac{m}{m_i}}A_{G_i,:}$ in Lemma 3.2, and by the sparsity condition (3.9), we have
$$\left\|\sum_{i=1}^l\frac{m}{m_i}A_{G_i,T^c}^* A_{G_i,T}\,p_{i-1}\right\|_\infty \le \sum_{i=1}^l\frac{1}{20}\frac{1}{\sqrt s}\|p_{i-1}\|_2$$
with high probability, provided $\alpha$ is sufficiently small. By (3.16), (3.17), (3.18) and (3.19), we have
$$\|u_{T^c}\|_\infty \le \sum_{i=1}^l\frac{1}{20}\frac{1}{\sqrt s}\|p_{i-1}\|_2 \le \frac{1}{20}\frac{1}{\sqrt s}\cdot 2\|p_0\|_2 < \frac18.$$

The proof of $\|q_{B^c}\|_\infty \le \frac{\lambda}{4}$. For $k = 1,\ldots,l$, we denote
$$A_{G_k,:} = \frac{1}{\sqrt m}\begin{bmatrix} a_{k1}^*\\ \vdots\\ a_{km_k}^*\end{bmatrix},\qquad A_{B,:} = \frac{1}{\sqrt m}\begin{bmatrix} \tilde a_1^*\\ \vdots\\ \tilde a_{m_b}^*\end{bmatrix}.$$
By (3.13), (3.14) and (3.12), it suffices to show that for any $1\le k\le l$ and $1\le j\le m_k$,
$$\left|\frac{\sqrt m}{m_k}(a_{kj})_T^*\left(I - \frac{m}{m_{k-1}}A_{G_{k-1},T}^* A_{G_{k-1},T}\right)\cdots\left(I - \frac{m}{m_1}A_{G_1,T}^* A_{G_1,T}\right)\left(\mathrm{sgn}(x_T) - \lambda A_{B,T}^*\mathrm{sgn}(f_B)\right)\right| \le \frac{\lambda}{4}.$$
Set
$$w = \left(I - \frac{m}{m_1}A_{G_1,T}^* A_{G_1,T}\right)\cdots\left(I - \frac{m}{m_{k-1}}A_{G_{k-1},T}^* A_{G_{k-1},T}\right)(a_{kj})_T. \quad (3.20)$$
Then it suffices to prove
$$\left|\frac{\sqrt m}{m_k}\,w^*\left(\mathrm{sgn}(x_T) - \lambda A_{B,T}^*\mathrm{sgn}(f_B)\right)\right| \le \frac{\lambda}{4}.$$
Since $w$ and $\mathrm{sgn}(x_T)$ are independent, by Hoeffding's inequality, conditioning on $w$, we have
$$\mathbb{P}\left(|w^*\mathrm{sgn}(x_T)| \ge t\right) \le 2\exp\left(-\frac{2t^2}{4\|w\|_2^2}\right)$$
for any $t > 0$. Then with high probability we have
$$|w^*\mathrm{sgn}(x_T)| \le C\sqrt{\log n}\,\|w\|_2 \quad (3.21)$$
for some absolute constant $C$. Setting $z = \mathrm{sgn}(f_B)$, we have
$$w^* A_{B,T}^*\,\mathrm{sgn}(f_B) = \frac{1}{\sqrt m}\sum_{i=1}^{m_b}\left[(\tilde a_i)_T^*\, w\right] z\{i\}.$$
Since $w$, $A_{B,T}$ and $z$ are independent, conditioning on $w$ we have
$$\mathbb{E}\left\{[(\tilde a_i)_T^* w]\,z\{i\}\right\} = \mathbb{E}\{(\tilde a_i)_T^* w\}\,\mathbb{E}\{z\{i\}\} = 0,$$
$$\left|[(\tilde a_i)_T^* w]\,z\{i\}\right| \le \|w\|_2\|(\tilde a_i)_T\|_2 \le \sqrt{s\mu}\,\|w\|_2 \le \sqrt{\frac{\alpha m}{\log^2 n}}\,\|w\|_2,$$
and
$$\mathbb{E}\left\{\left|[(\tilde a_i)_T^* w]\,z\{i\}\right|^2\right\} = \mathbb{E}\left\{[w^*(\tilde a_i)_T][(\tilde a_i)_T^* w]\right\} = w^*\,\mathbb{E}\left\{(\tilde a_i)_T(\tilde a_i)_T^*\right\}w = \|w\|_2^2.$$
By Bernstein's inequality, we have
$$\mathbb{P}\left(\left|w^* A_{B,T}^*\,\mathrm{sgn}(f_B)\right| \ge \frac{t}{\sqrt m}\right) \le 2\exp\left(-\frac{t^2/2}{m_b\|w\|_2^2 + \sqrt{\frac{\alpha m}{\log^2 n}}\,\|w\|_2\, t/3}\right).$$
By choosing some numerical constant $C$ and $t = C\sqrt{m\log n}\,\|w\|_2$, we have
$$\left|w^* A_{B,T}^*\,\mathrm{sgn}(f_B)\right| \le C\sqrt{\log n}\,\|w\|_2 \quad (3.22)$$
with high probability, provided $\alpha$ is sufficiently small. By (3.21) and (3.22), we have
$$\left|\frac{\sqrt m}{m_k}\,w^*\left(\mathrm{sgn}(x_T) - \lambda A_{B,T}^*\mathrm{sgn}(f_B)\right)\right| \le \frac{\sqrt m}{m_k}\,C\sqrt{\log n}\,\|w\|_2 \quad (3.23)$$
for some numerical constant $C$.

When $k \ge 3$, by (3.20), (3.10) and (3.11), we have $\|w\|_2 \le \left(\frac12\right)^{k-1}\frac{1}{\log n}\sqrt{\mu s} \le \frac{\sqrt{\alpha m}}{\log^2 n}$. Recalling $\frac{m}{m_k} \le C\log n$, by (3.23) we have
$$\left|\frac{\sqrt m}{m_k}\,w^*\left(\mathrm{sgn}(x_T) - \lambda A_{B,T}^*\mathrm{sgn}(f_B)\right)\right| \le C\frac{m}{m_k}\sqrt\alpha\,(\log n)^{-3/2} \le \frac{\lambda}{4},$$
provided $\alpha$ is sufficiently small.

When $k\le 2$, by (3.20) and (3.10), we have $\|w\|_2 \le \sqrt{\mu s} \le \frac{\sqrt{\alpha m}}{\log n}$. Recalling $\frac{m}{m_k}\le C$, by (3.23) we have
$$\left|\frac{\sqrt m}{m_k}\,w^*\left(\mathrm{sgn}(x_T) - \lambda A_{B,T}^*\mathrm{sgn}(f_B)\right)\right| \le C\frac{m}{m_k}\sqrt\alpha\,(\log n)^{-1/2} \le \frac{\lambda}{4},$$
provided $\alpha$ is sufficiently small.
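As a side note, the geometric decay (3.16)–(3.18) that drives the golfing scheme is easy to observe numerically. The following sketch is our own illustration, not part of the proof: it uses Gaussian measurement vectors with $\mathbb{E}[aa^*] = I$ as a stand-in for Model 2, partitions the rows into equal blocks (the parameter values are arbitrary), and iterates (3.13) on a generic unit vector in place of $p_0$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s, l = 4000, 100, 10, 8                  # measurements, dimension, sparsity, blocks
A = rng.standard_normal((m, n)) / np.sqrt(m)   # rows a_i^*/sqrt(m) with E[a a^*] = I
T = np.arange(s)                               # support of x (first s coordinates, WLOG)

# Deterministic partition of the rows into l disjoint blocks G_1, ..., G_l.
blocks = np.array_split(np.arange(m), l)

p = rng.standard_normal(s)                     # generic stand-in for p_0
p /= np.linalg.norm(p)
norms = [1.0]
for G in blocks:
    AGT = A[G][:, T]                           # block A_{G_i, T}
    # golfing step (3.13): p_i = (I - (m/m_i) A_{G_i,T}^* A_{G_i,T}) p_{i-1}
    p = p - (m / len(G)) * (AGT.T @ (AGT @ p))
    norms.append(np.linalg.norm(p))

print(norms)                                   # geometrically decaying sequence
```

Each step contracts the norm by roughly $\|\frac{m}{m_i}A_{G_i,T}^* A_{G_i,T} - I\|_{2,2}$, which Lemma 3.1 keeps below $1/2$, so after $l$ blocks the residual is far below the tolerance $\lambda/4$ used in the proof.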
Here we would like to compare our golfing scheme with that in [7]. There are mainly two differences. One is that we have an extra term $\lambda A_{B,:}^*\mathrm{sgn}(f_B)$ in the dual vector. To obtain the inequality $\|v_{T^c}\|_\infty \le 1/4$, we propose to bound $\|u_{T^c}\|_\infty$ and $\|\lambda A_{B,:}^*\mathrm{sgn}(f_B)\|_\infty$ separately, and this leads to the extra log factor compared with [7]. Moreover, by using the golfing scheme to construct the dual vector, we need to bound the term $\|q_{B^c}\|_\infty$, which is not necessary in [7]. This inevitably incurs the random-sign assumption on the signal.

4 A Proof of Theorem 1.3

In this section, the capital letters $X$, $Y$, etc. represent matrices, and the symbols in script font $\mathcal I$, $\mathcal P_T$, etc. represent linear operators from a matrix space to a matrix space. Moreover, for any $\Omega_0\subset[n]\times[n]$, the operator $\mathcal P_{\Omega_0}$ keeps the entries of $M$ on the support $\Omega_0$ and changes the other entries to zeros. For any $n\times n$ matrix $A$, denote by $\|A\|_F$, $\|A\|$, $\|A\|_\infty$ and $\|A\|_*$ respectively the Frobenius norm, the operator norm (the largest singular value), the largest magnitude of all elements, and the nuclear norm (the sum of all singular values). Similarly to Section 3, instead of denoting absolute constants by $C_1, C_2, \ldots$, we just use $C$, whose value may change from line to line. Also, we will use the phrase "with high probability" to mean with probability at least $1 - Cn^{-c}$, where $C > 0$ is a numerical constant and $c = 3, 4$, or $5$ depending on the context.

4.1 A model equivalent to Model 3.1

Model 3.1 is natural and was used in [4], but we will use the following equivalent model for the convenience of the proof:

Model 3.2:
1. Fix an $n\times n$ matrix $K$, whose entries are either $1$ or $-1$.
2. Define two independent random subsets of $[n]\times[n]$: $\Gamma'\sim\mathrm{Ber}((1-2s)\rho)$ and $\Omega'\sim\mathrm{Ber}\left(\frac{2s\rho}{1-\rho+2s\rho}\right)$. Moreover, let $O := \Gamma'\cup\Omega'$, which thus satisfies $O\sim\mathrm{Ber}(\rho)$.
3.
Define an $n\times n$ random matrix $W$ with independent entries $W_{ij}$ satisfying $\mathbb{P}(W_{ij}=1) = \mathbb{P}(W_{ij}=-1) = \frac12$.
4. Define $\Omega''\subset\Omega'$: $\Omega'' := \{(i,j): (i,j)\in\Omega',\ W_{ij} = K_{ij}\}$.
5. Define $\Omega := \Omega''\setminus\Gamma'$, and $\Gamma := O\setminus\Omega$.
6. Let $S$ satisfy $\mathrm{sgn}(S) := \mathcal P_\Omega(K)$.

Obviously, in both Model 3.1 and Model 3.2 the whole setting is deterministic once we fix $(O,\Omega)$. Therefore, the probability of $(\hat L,\hat S) = (L,S)$ is determined by the joint distribution of $(O,\Omega)$. It is not difficult to prove that the joint distributions of $(O,\Omega)$ in the two models are the same. Indeed, in Model 3.1, the pairs $(1_{\{(i,j)\in O\}}, 1_{\{(i,j)\in\Omega\}})$ are iid random vectors with the probability distribution $\mathbb{P}(1_{\{(i,j)\in O\}}=1) = \rho$, $\mathbb{P}(1_{\{(i,j)\in\Omega\}}=1\mid 1_{\{(i,j)\in O\}}=1) = s$, and $\mathbb{P}(1_{\{(i,j)\in\Omega\}}=1\mid 1_{\{(i,j)\in O\}}=0) = 0$. In Model 3.2, we have
$$\left(1_{\{(i,j)\in O\}},\ 1_{\{(i,j)\in\Omega\}}\right) = \left(\max\left(1_{\{(i,j)\in\Gamma'\}},\ 1_{\{(i,j)\in\Omega'\}}\right),\ 1_{\{(i,j)\in\Omega'\}}\,1_{\{W_{ij}=K_{ij}\}}\,1_{\{(i,j)\in\Gamma'^c\}}\right).$$
This implies that the $(1_{\{(i,j)\in O\}}, 1_{\{(i,j)\in\Omega\}})$ are independent random vectors. Moreover, it is easy to calculate that $\mathbb{P}(1_{\{(i,j)\in O\}}=1) = \rho$, $\mathbb{P}(1_{\{(i,j)\in\Omega\}}=1) = s\rho$, and $\mathbb{P}(1_{\{(i,j)\in\Omega\}}=1,\ 1_{\{(i,j)\in O\}}=0) = 0$. Then we have
$$\mathbb{P}\left(1_{\{(i,j)\in\Omega\}}=1\mid 1_{\{(i,j)\in O\}}=1\right) = \mathbb{P}\left(1_{\{(i,j)\in\Omega\}}=1,\ 1_{\{(i,j)\in O\}}=1\right)/\mathbb{P}\left(1_{\{(i,j)\in O\}}=1\right) = s,$$
and
$$\mathbb{P}\left(1_{\{(i,j)\in\Omega\}}=1\mid 1_{\{(i,j)\in O\}}=0\right) = \mathbb{P}\left(1_{\{(i,j)\in\Omega\}}=1,\ 1_{\{(i,j)\in O\}}=0\right)/\mathbb{P}\left(1_{\{(i,j)\in O\}}=0\right) = 0.$$
Notice that although $(1_{\{(i,j)\in O\}}, 1_{\{(i,j)\in\Omega\}})$ depends on $K$, its distribution does not. By the above we know that $(O,\Omega)$ has the same distribution in both models. Therefore, in the following we will use Model 3.2 instead. The advantage of using Model 3.2 is that we can utilize $\Gamma'$, $\Omega'$, $W$, etc. as auxiliaries.
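The marginal computations above are easy to confirm by simulation. The sketch below is our own illustration with arbitrary parameter values: it draws $\Gamma'$, $\Omega'$ and $W$ as in Model 3.2 and checks that the empirical frequencies of $O$ and $\Omega$ match $\rho$ and $s\rho$.

```python
import numpy as np

# Monte Carlo sanity check that Model 3.2 reproduces the marginals of
# Model 3.1: P((i,j) in O) = rho and P((i,j) in Omega) = s * rho.
rng = np.random.default_rng(1)
n, rho, s = 400, 0.5, 0.1
K = np.where(rng.random((n, n)) < 0.5, 1, -1)        # fixed sign matrix

Gamma_p = rng.random((n, n)) < (1 - 2 * s) * rho                      # Gamma' ~ Ber((1-2s)rho)
Omega_p = rng.random((n, n)) < 2 * s * rho / (1 - rho + 2 * s * rho)  # Omega'
W = np.where(rng.random((n, n)) < 0.5, 1, -1)        # independent Rademacher signs

O = Gamma_p | Omega_p                                # O = Gamma' ∪ Omega' ~ Ber(rho)
Omega = Omega_p & (W == K) & ~Gamma_p                # Omega = Omega'' \ Gamma'

print(O.mean(), Omega.mean())                        # near rho = 0.5 and s*rho = 0.05
```

Since each cell of $\Omega'$ survives independently with probability $\frac12\cdot\frac{2s\rho}{1-\rho+2s\rho}\cdot(1-(1-2s)\rho) = s\rho$, the empirical frequency of $\Omega$ concentrates around $s\rho$ regardless of $K$, matching the calculation above.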
In the next subsection we prove some supporting lemmas which are useful for the proof of the main theorem.

4.2 Supporting lemmas

Define $T := \{UX^* + YV^*:\ X, Y\in\mathbb{R}^{n\times r}\}$, a subspace of $\mathbb{R}^{n\times n}$. Then the orthogonal projectors $\mathcal P_T$ and $\mathcal P_{T^\perp}$ in $\mathbb{R}^{n\times n}$ satisfy
$$\mathcal P_T X = UU^* X + XVV^* - UU^* XVV^*\qquad\text{and}\qquad \mathcal P_{T^\perp}X = (I - UU^*)X(I - VV^*)$$
for any $X\in\mathbb{R}^{n\times n}$. This implies $\|\mathcal P_{T^\perp}X\| \le \|X\|$ for any $X$. Recalling the incoherence conditions: for any $i\in\{1,\ldots,n\}$, $\|UU^* e_i\|^2 \le \frac{\mu r}{n}$ and $\|VV^* e_i\|^2 \le \frac{\mu r}{n}$, we have $\|\mathcal P_T(e_ie_j^*)\|_\infty \le \frac{2\mu r}{n}$ and $\|\mathcal P_T(e_ie_j^*)\|_F \le \sqrt{\frac{2\mu r}{n}}$ [8, 12].

Lemma 4.1 (Theorem 4.1 of [8]) Suppose $\Omega_0\sim\mathrm{Ber}(\rho_0)$. Then with high probability,
$$\|\mathcal P_T - \rho_0^{-1}\mathcal P_T\mathcal P_{\Omega_0}\mathcal P_T\| \le \epsilon,$$
provided that $\rho_0 \ge C_0\,\epsilon^{-2}\frac{\mu r\log n}{n}$ for some numerical constant $C_0 > 0$.

The original idea of the proof of this theorem is due to [36].

Lemma 4.2 (Theorem 3.1 of [4]) Suppose $Z\in\mathrm{Range}(\mathcal P_T)$ is a fixed matrix, $\Omega_0\sim\mathrm{Ber}(\rho_0)$, and $\epsilon\le1$ is an arbitrary constant. Then with high probability,
$$\|(\mathcal I - \rho_0^{-1}\mathcal P_T\mathcal P_{\Omega_0})Z\|_\infty \le \epsilon\|Z\|_\infty,$$
provided that $\rho_0 \ge C_0\,\epsilon^{-2}\frac{\mu r\log n}{n}$ for some numerical constant $C_0 > 0$.

Lemma 4.3 (Theorem 6.3 of [8]) Suppose $Z$ is a fixed matrix, and $\Omega_0\sim\mathrm{Ber}(\rho_0)$. Then with high probability,
$$\|(\rho_0\mathcal I - \mathcal P_{\Omega_0})Z\| \le C_0'\sqrt{np\log n}\,\|Z\|_\infty,$$
provided that $\rho_0\le p$ and $p\ge C_0\frac{\log n}{n}$ for some numerical constants $C_0 > 0$ and $C_0' > 0$.

Notice that we only have $\rho_0 = p$ in Theorem 6.3 of [8]. By a very slight modification of the proof (specifically, the proof of Lemma 6.2 there) we obtain the version with $\rho_0\le p$ stated above.

4.3 A proof of Theorem 1.3

By Lemma 4.1, we have
$$\left\|\frac{1}{(1-2s)\rho}\mathcal P_T\mathcal P_{\Gamma'}\mathcal P_T - \mathcal P_T\right\| \le \frac12\qquad\text{and}\qquad \left\|\frac{1}{\sqrt{(1-2s)\rho}}\mathcal P_T\mathcal P_{\Gamma'}\right\| \le \sqrt{3/2}$$
with high probability provided $C_\rho$ is sufficiently large and $C_s$ is sufficiently small. We will assume both inequalities hold throughout the rest of the paper.
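Before proceeding, the projector formulas from Section 4.2 can be sanity-checked numerically. The following sketch is our own illustration (with arbitrary $n$ and $r$): it verifies that $\mathcal P_T$ is idempotent, that $\mathcal P_T + \mathcal P_{T^\perp} = \mathcal I$, and that $\mathcal P_{T^\perp}$ does not increase the operator norm, as claimed above.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 50, 3
U, _ = np.linalg.qr(rng.standard_normal((n, r)))    # orthonormal column spaces
V, _ = np.linalg.qr(rng.standard_normal((n, r)))

def P_T(X):
    # P_T X = UU*X + XVV* - UU*XVV*
    return U @ U.T @ X + X @ V @ V.T - U @ U.T @ X @ V @ V.T

def P_Tperp(X):
    # P_{T^perp} X = (I - UU*) X (I - VV*)
    I = np.eye(n)
    return (I - U @ U.T) @ X @ (I - V @ V.T)

X = rng.standard_normal((n, n))
print(np.allclose(P_T(P_T(X)), P_T(X)))             # idempotent: P_T^2 = P_T
print(np.allclose(P_T(X) + P_Tperp(X), X))          # orthogonal decomposition of X
print(np.linalg.norm(P_Tperp(X), 2) <= np.linalg.norm(X, 2))  # norm not increased
```

The last check reflects the fact that $X \mapsto (I-UU^*)X(I-VV^*)$ multiplies by orthogonal projections on both sides, each of operator norm at most one.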
Theorem 4.4 If there exists an $n\times n$ matrix $Y$ obeying
$$\|\mathcal P_T Y + \mathcal P_T(\lambda\mathcal P_{O\setminus\Gamma'}W - UV^*)\|_F \le \frac{\lambda}{n^2},\qquad \|\mathcal P_{T^\perp}Y + \mathcal P_{T^\perp}(\lambda\mathcal P_{O\setminus\Gamma'}W)\| \le \frac14,\qquad \mathcal P_{\Gamma'^c}Y = 0,\qquad \|\mathcal P_{\Gamma'}Y\|_\infty \le \frac{\lambda}{4}, \quad (4.1)$$
where $\lambda = \frac{1}{\sqrt{n\rho}\,\log n}$, then the solution $(\hat L,\hat S)$ to (1.6) satisfies $(\hat L,\hat S) = (L,S)$.

Proof. Set $H = \hat L - L$. The condition $\mathcal P_O(L) + S = \mathcal P_O(\hat L) + \hat S$ implies $\mathcal P_O(H) = S - \hat S$. Then $\hat S$ is supported on $O$ because $S$ is supported on $\Omega\subset O$. By considering the subgradient of the nuclear norm at $L$, we have
$$\|\hat L\|_* \ge \|L\|_* + \langle\mathcal P_T H,\, UV^*\rangle + \|\mathcal P_{T^\perp}H\|_*.$$
By the definition of $(\hat L,\hat S)$, we have $\|\hat L\|_* + \lambda\|\hat S\|_1 \le \|L\|_* + \lambda\|S\|_1$. By the two inequalities above, we have
$$\lambda\|S\|_1 - \lambda\|\hat S\|_1 \ge \langle\mathcal P_T(H),\, UV^*\rangle + \|\mathcal P_{T^\perp}H\|_*,$$
which implies
$$\lambda\|S\|_1 - \lambda\|\mathcal P_{O\setminus\Gamma'}(\hat S)\|_1 \ge \langle H,\, UV^*\rangle + \|\mathcal P_{T^\perp}(H)\|_* + \lambda\|\mathcal P_{\Gamma'}(\hat S)\|_1.$$
On the other hand,
$$\|\mathcal P_{O\setminus\Gamma'}\hat S\|_1 = \|S + \mathcal P_{O\setminus\Gamma'}(-H)\|_1 \ge \|S\|_1 + \langle\mathrm{sgn}(S),\, \mathcal P_\Omega(-H)\rangle + \|\mathcal P_{O\setminus(\Gamma'\cup\Omega)}(-H)\|_1 \ge \|S\|_1 + \langle\mathcal P_{O\setminus\Gamma'}(W),\, -H\rangle.$$
By the two inequalities above and the fact $\mathcal P_{\Gamma'}\hat S = \mathcal P_{\Gamma'}(\hat S - S) = -\mathcal P_{\Gamma'}H$, we have
$$\|\mathcal P_{T^\perp}(H)\|_* + \lambda\|\mathcal P_{\Gamma'}(H)\|_1 \le \langle H,\, \lambda\mathcal P_{O\setminus\Gamma'}(W) - UV^*\rangle. \quad (4.2)$$
By the assumptions on $Y$, we have
$$\begin{aligned}
\langle H,\, \lambda\mathcal P_{O\setminus\Gamma'}(W) - UV^*\rangle &= \langle H,\, Y + \lambda\mathcal P_{O\setminus\Gamma'}(W) - UV^*\rangle - \langle H,\, Y\rangle\\
&= \langle\mathcal P_T(H),\, \mathcal P_T(Y + \lambda\mathcal P_{O\setminus\Gamma'}(W) - UV^*)\rangle + \langle\mathcal P_{T^\perp}(H),\, \mathcal P_{T^\perp}(Y + \lambda\mathcal P_{O\setminus\Gamma'}(W))\rangle\\
&\qquad - \langle\mathcal P_{\Gamma'}(H),\, \mathcal P_{\Gamma'}(Y)\rangle - \langle\mathcal P_{\Gamma'^c}(H),\, \mathcal P_{\Gamma'^c}(Y)\rangle\\
&\le \frac{\lambda}{n^2}\|\mathcal P_T(H)\|_F + \frac14\|\mathcal P_{T^\perp}(H)\|_* + \frac{\lambda}{4}\|\mathcal P_{\Gamma'}(H)\|_1.
\end{aligned}$$
By inequality (4.2),
$$\frac34\|\mathcal P_{T^\perp}(H)\|_* + \frac{3\lambda}{4}\|\mathcal P_{\Gamma'}(H)\|_1 \le \frac{\lambda}{n^2}\|\mathcal P_T(H)\|_F. \quad (4.3)$$
Recall that we assume $\left\|\frac{1}{(1-2s)\rho}\mathcal P_T\mathcal P_{\Gamma'}\mathcal P_T - \mathcal P_T\right\| \le \frac12$ and $\left\|\frac{1}{\sqrt{(1-2s)\rho}}\mathcal P_T\mathcal P_{\Gamma'}\right\| \le \sqrt{3/2}$ throughout.
Then
$$\begin{aligned}
\|\mathcal P_T(H)\|_F &\le 2\left\|\frac{1}{(1-2s)\rho}\mathcal P_T\mathcal P_{\Gamma'}\mathcal P_T(H)\right\|_F\\
&\le 2\left\|\frac{1}{(1-2s)\rho}\mathcal P_T\mathcal P_{\Gamma'}\mathcal P_{T^\perp}(H)\right\|_F + 2\left\|\frac{1}{(1-2s)\rho}\mathcal P_T\mathcal P_{\Gamma'}(H)\right\|_F\\
&\le \sqrt{\frac{6}{(1-2s)\rho}}\,\|\mathcal P_{T^\perp}H\|_F + \sqrt{\frac{6}{(1-2s)\rho}}\,\|\mathcal P_{\Gamma'}H\|_F.
\end{aligned}$$
By inequality (4.3), we have
$$\left(\frac34 - \frac{\lambda}{n^2}\sqrt{\frac{6}{(1-2s)\rho}}\right)\|\mathcal P_{T^\perp}(H)\|_F + \left(\frac{3\lambda}{4} - \frac{\lambda}{n^2}\sqrt{\frac{6}{(1-2s)\rho}}\right)\|\mathcal P_{\Gamma'}H\|_F \le 0.$$
Then $\mathcal P_{T^\perp}(H) = \mathcal P_{\Gamma'}H = 0$, which implies $\mathcal P_{\Gamma'}\mathcal P_T(H) = 0$. Since $\mathcal P_{\Gamma'}\mathcal P_T$ is injective on $T$ (because $\left\|\frac{1}{(1-2s)\rho}\mathcal P_T\mathcal P_{\Gamma'}\mathcal P_T - \mathcal P_T\right\| \le \frac12$), we have $\mathcal P_T(H) = 0$. Then we have $H = 0$.

Suppose we can construct $Y$ and $\tilde Y$ satisfying
$$\|\mathcal P_T Y + \mathcal P_T(\lambda\mathcal P_{\Omega'}W - UV^*)\|_F \le \frac{\lambda}{2n^2},\qquad \|\mathcal P_{T^\perp}Y + \mathcal P_{T^\perp}(\lambda\mathcal P_{\Omega'}W)\| \le \frac14,\qquad \mathcal P_{\Gamma'^c}Y = 0,\qquad \|\mathcal P_{\Gamma'}Y\|_\infty \le \frac{\lambda}{4}, \quad (4.4)$$
and
$$\|\mathcal P_T\tilde Y + \mathcal P_T(\lambda(2\mathcal P_{\Omega'\setminus\Gamma'}(W) - \mathcal P_{\Omega'}W) - UV^*)\|_F \le \frac{\lambda}{2n^2},\qquad \|\mathcal P_{T^\perp}\tilde Y + \mathcal P_{T^\perp}(\lambda(2\mathcal P_{\Omega'\setminus\Gamma'}(W) - \mathcal P_{\Omega'}W))\| \le \frac14,\qquad \mathcal P_{\Gamma'^c}\tilde Y = 0,\qquad \|\mathcal P_{\Gamma'}\tilde Y\|_\infty \le \frac{\lambda}{4}. \quad (4.5)$$
Then $Y = (Y + \tilde Y)/2$ will satisfy (4.1). By the assumptions in Model 3.2, $(\Gamma', \mathcal P_{\Omega'}W)$ and $(\Gamma', 2\mathcal P_{\Omega'\setminus\Gamma'}(W) - \mathcal P_{\Omega'}W)$ have the same distribution. Therefore, if we can construct $Y$ satisfying (4.4) with high probability, we can also construct $\tilde Y$ satisfying (4.5) with high probability. Hence, to prove Theorem 1.3, we only need to prove that there exists $Y$ satisfying (4.4) with high probability:

Proof (of Theorem 1.3). Notice that $\Gamma'\sim\mathrm{Ber}((1-2s)\rho)$. Suppose that $q$ satisfies
$$1 - (1-2s)\rho = \left(1 - \frac{(1-2s)\rho}{6}\right)^2(1-q)^{l-2},$$
where $l = \lfloor 5\log n + 1\rfloor$. This implies that $q \ge C\rho/\log n$. Define $q_1 = q_2 = (1-2s)\rho/6$ and $q_3 = \cdots = q_l = q$. Then in distribution we can write $\Gamma' = \Gamma_1\cup\cdots\cup\Gamma_l$, where $\Gamma_j\sim\mathrm{Ber}(q_j)$ independently. Construct
$$Z_0 = \mathcal P_T(UV^* - \lambda\mathcal P_{\Omega'}W),\qquad Z_j = \left(\mathcal P_T - \frac{1}{q_j}\mathcal P_T\mathcal P_{\Gamma_j}\mathcal P_T\right)Z_{j-1}\ \text{for } j = 1,\ldots,l,\qquad Y = \sum_{j=1}^l\frac{1}{q_j}\mathcal P_{\Gamma_j}Z_{j-1}.$$
Then by Lemma 4.1, we have $\|Z_j\|_F \le \frac12\|Z_{j-1}\|_F$ for $j = 1,\ldots,l$
with high probability provided $C_\rho$ is large enough and $C_s$ is small enough. Then $\|Z_j\|_F \le \left(\frac12\right)^j\|Z_0\|_F$. By the construction of $Z_j$, we know that $Z_j\in\mathrm{Range}(\mathcal P_T)$ and $Z_j = \left(\mathcal I - \frac{1}{q_j}\mathcal P_T\mathcal P_{\Gamma_j}\right)Z_{j-1}$. Then similarly, by Lemma 4.2, we have
$$\|Z_1\|_\infty \le \frac{1}{2\sqrt{\log n}}\|Z_0\|_\infty,$$
and
$$\|Z_j\|_\infty \le \frac{1}{2^j\log n}\|Z_0\|_\infty\quad\text{for } j = 2,\ldots,l$$
with high probability provided $C_\rho$ is large enough and $C_s$ is small enough. Also, by Lemma 4.3 we have
$$\left\|\left(\mathcal I - \frac{1}{q_j}\mathcal P_{\Gamma_j}\right)Z_{j-1}\right\| \le C\sqrt{\frac{n\log n}{q_j}}\,\|Z_{j-1}\|_\infty\quad\text{for } j = 1,\ldots,l$$
with high probability provided $C_\rho$ is large enough and $C_s$ is small enough.

We first bound $\|Z_0\|_F$ and $\|Z_0\|_\infty$. Obviously $\|Z_0\|_\infty \le \|UV^*\|_\infty + \lambda\|\mathcal P_T\mathcal P_{\Omega'}(W)\|_\infty$. Recall that for any $i, j\in[n]$, we have $\|\mathcal P_T(e_ie_j^*)\|_\infty \le \frac{2\mu r}{n}$ and $\|\mathcal P_T(e_ie_j^*)\|_F \le \sqrt{\frac{2\mu r}{n}}$. Moreover, the entries of $\mathcal P_{\Omega'}(W)$ are iid random variables with the distribution
$$(\mathcal P_{\Omega'}(W))_{ij} = \begin{cases} 1 & \text{with probability } \frac{s\rho}{1-\rho+2s\rho},\\[2pt] 0 & \text{with probability } \frac{1-\rho}{1-\rho+2s\rho},\\[2pt] -1 & \text{with probability } \frac{s\rho}{1-\rho+2s\rho}.\end{cases}$$
Then by Bernstein's inequality, we have
$$\mathbb{P}\left(\left|\langle\mathcal P_T(\mathcal P_{\Omega'}(W)),\, e_ie_j^*\rangle\right| \ge t\right) = \mathbb{P}\left(\left|\langle\mathcal P_{\Omega'}(W),\, \mathcal P_T(e_ie_j^*)\rangle\right| \ge t\right) \le 2\exp\left(-\frac{t^2/2}{\sum\mathbb{E}X_k^2 + Mt/3}\right),$$
where
$$\sum\mathbb{E}X_k^2 = \frac{2s\rho}{1-\rho+2s\rho}\|\mathcal P_T e_ie_j^*\|_F^2 \le C\rho s\frac{\mu r}{n},\qquad M = \|\mathcal P_T e_ie_j^*\|_\infty \le \frac{2\mu r}{n}.$$
Then with high probability we have $\|\mathcal P_T\mathcal P_{\Omega'}(W)\|_\infty \le C\sqrt{\frac{\rho\mu r\log n}{n}}$ (here $C\sqrt{\frac{\rho\mu r\log n}{n}} \ge C\sqrt{C_\rho\frac{\mu r\log^2 n}{n}\cdot\frac{\mu r\log n}{n}} > C\sqrt{C_\rho}\,M\log n$, so the variance term dominates). Then by $\|UV^*\|_\infty \le \frac{\sqrt{\mu r}}{n}$ we have $\|Z_0\|_\infty \le C\frac{\sqrt{\mu r}}{n}$, which implies $\|Z_0\|_F \le n\|Z_0\|_\infty \le C\sqrt{\mu r}$.

Now we want to prove that $Y$ satisfies (4.4) with high probability. Obviously $\mathcal P_{\Gamma'^c}Y = 0$. It suffices to prove
$$\|\mathcal P_T Y + \mathcal P_T(\lambda\mathcal P_{\Omega'}(W) - UV^*)\|_F \le \frac{\lambda}{2n^2},\qquad \|\mathcal P_{T^\perp}Y\| \le \frac18,\qquad \|\mathcal P_{T^\perp}(\lambda\mathcal P_{\Omega'}(W))\| \le \frac18,\qquad \|\mathcal P_{\Gamma'}Y\|_\infty \le \frac{\lambda}{4}.$$
(4.6)

First,
$$\begin{aligned}
\|\mathcal P_T Y + \mathcal P_T(\lambda\mathcal P_{\Omega'}(W) - UV^*)\|_F &= \Big\|Z_0 - \sum_{j=1}^l\frac{1}{q_j}\mathcal P_T\mathcal P_{\Gamma_j}Z_{j-1}\Big\|_F = \Big\|\mathcal P_T Z_0 - \sum_{j=1}^l\frac{1}{q_j}\mathcal P_T\mathcal P_{\Gamma_j}\mathcal P_T Z_{j-1}\Big\|_F\\
&= \Big\|\Big(\mathcal P_T - \frac{1}{q_1}\mathcal P_T\mathcal P_{\Gamma_1}\mathcal P_T\Big)Z_0 - \sum_{j=2}^l\frac{1}{q_j}\mathcal P_T\mathcal P_{\Gamma_j}\mathcal P_T Z_{j-1}\Big\|_F\\
&= \Big\|\mathcal P_T Z_1 - \sum_{j=2}^l\frac{1}{q_j}\mathcal P_T\mathcal P_{\Gamma_j}\mathcal P_T Z_{j-1}\Big\|_F = \cdots = \|Z_l\|_F\\
&\le C\Big(\frac12\Big)^l\sqrt{\mu r} \le \frac{\lambda}{2n^2}.
\end{aligned}$$
Second,
$$\begin{aligned}
\|\mathcal P_{T^\perp}Y\| &= \Big\|\mathcal P_{T^\perp}\sum_{j=1}^l\frac{1}{q_j}\mathcal P_{\Gamma_j}Z_{j-1}\Big\| \le \sum_{j=1}^l\Big\|\frac{1}{q_j}\mathcal P_{T^\perp}\mathcal P_{\Gamma_j}Z_{j-1}\Big\| = \sum_{j=1}^l\Big\|\mathcal P_{T^\perp}\Big(\frac{1}{q_j}\mathcal P_{\Gamma_j}Z_{j-1} - Z_{j-1}\Big)\Big\|\\
&\le \sum_{j=1}^l\Big\|\frac{1}{q_j}\mathcal P_{\Gamma_j}Z_{j-1} - Z_{j-1}\Big\| \le \sum_{j=1}^l C\sqrt{\frac{n\log n}{q_j}}\,\|Z_{j-1}\|_\infty\\
&\le C\sqrt{n\log n}\Big(\sum_{j=3}^l\frac{1}{2^{j-1}\log n\sqrt{q_j}} + \frac{1}{2\sqrt{\log n}\sqrt{q_2}} + \frac{1}{\sqrt{q_1}}\Big)\|Z_0\|_\infty\\
&\le C\frac{\sqrt{n\mu r\log n}}{n\sqrt\rho} \le \frac{1}{8\sqrt{\log n}} \le \frac18,
\end{aligned}$$
provided $C_\rho$ is sufficiently large. (Here the third equality uses $\mathcal P_{T^\perp}Z_{j-1} = 0$, since $Z_{j-1}\in\mathrm{Range}(\mathcal P_T)$.)

Third, we have $\|\lambda\mathcal P_{T^\perp}\mathcal P_{\Omega'}(W)\| \le \lambda\|\mathcal P_{\Omega'}(W)\|$. Notice that $(W_{ij})$ is an independent Rademacher sequence independent of $\Omega'$. By Lemma 4.3, we have
$$\Big\|\frac{2s\rho}{1-\rho+2s\rho}W - \mathcal P_{\Omega'}(W)\Big\| \le C_0'\sqrt{np\log n}\,\|W\|_\infty$$
with high probability provided $\frac{2s\rho}{1-\rho+2s\rho}\le p$ and $p\ge C_0\frac{\log n}{n}$. By Theorem 3.9 of [39], we have $\|W\| \le C_1\sqrt n$ with high probability. Therefore,
$$\|\mathcal P_{\Omega'}(W)\| \le C_0'\sqrt{np\log n} + C_1\sqrt n\,\frac{2s\rho}{1-\rho+2s\rho}.$$
By choosing $p = C_2\rho$ for some appropriate constant $C_2$, we have
$$\|\mathcal P_{\Omega'}(W)\| \le \frac{\sqrt{n\rho}\,\log n}{8},$$
provided $C_\rho$ is large enough and $C_s$ is small enough.

Fourth,
$$\|\mathcal P_{\Gamma'}Y\|_\infty = \Big\|\mathcal P_{\Gamma'}\sum_j\frac{1}{q_j}\mathcal P_{\Gamma_j}Z_{j-1}\Big\|_\infty \le \sum_j\frac{1}{q_j}\|Z_{j-1}\|_\infty \le \Big(\sum_{j=3}^l\frac{1}{q_j}\frac{1}{2^{j-1}\log n} + \frac{1}{q_2}\frac{1}{2\sqrt{\log n}} + \frac{1}{q_1}\Big)\|Z_0\|_\infty \le C\frac{\sqrt{\mu r}}{n\rho} \le \frac{\lambda}{4},$$
provided $C_\rho$ is sufficiently large.

Notice that in [4] the authors used a very similar golfing scheme. To compare the two methods: we use here a golfing scheme with non-uniform block sizes to achieve a result with fewer log factors. Moreover, unlike in [4], where both the golfing scheme and a least-squares method were used to construct two parts of the dual matrix, here we only use the golfing scheme.
Actually, the method of constructing the dual matrix in [4] cannot be applied directly to our problem when $\rho = O(r\log^2 n/n)$.

Acknowledgements

I am grateful to my Ph.D. advisor, Emmanuel Candès, for his encouragement and his help in preparing this manuscript.

References

[1] A. Agarwal, S. Negahban, and M. Wainwright. Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. In Proc. 28th Inter. Conf. Mach. Learn. (ICML), pages 1129–1136, 2011.
[2] R. Ahlswede and A. Winter. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, 2002.
[3] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3):253–263, 2008.
[4] E. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(3), 2011.
[5] E. Candès and Y. Plan. Matrix completion with noise. Proceedings of the IEEE, 2009.
[6] E. Candès and Y. Plan. Near-ideal model selection by ℓ1 minimization. Ann. Statist., 37(5A):2145–2177, 2009.
[7] E. Candès and Y. Plan. A probabilistic and RIPless theory of compressed sensing. IEEE Transactions on Information Theory, 57(11):7235–7254, 2011.
[8] E. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 2009.
[9] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, 2006.
[10] E. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.
[11] E. Candès and T. Tao. Decoding by linear programming.
IEEE Trans. Inform. Theory, 51(12), 2005.
[12] E. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory, 56(5):2053–2080, 2010.
[13] V. Chandrasekaran, S. Sanghavi, P. Parrilo, and A. Willsky. Sparse and low-rank matrix decompositions. In 15th IFAC Symposium on System Identification (SYSID), 2009.
[14] V. Chandrasekaran, S. Sanghavi, P. Parrilo, and A. Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM J. on Optimization, 21(2):572–596, 2011.
[15] S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., 20(1):33–61, 1998.
[16] Y. Chen, A. Jalali, S. Sanghavi, and C. Caramanis. Low-rank matrix recovery from errors and erasures. ISIT, 2011.
[17] K. Davidson and S. Szarek. Local operator theory, random matrices and Banach spaces. Handbook of the Geometry of Banach Spaces, I(8):317–366, 2001.
[18] D. Donoho. For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. Communications on Pure and Applied Mathematics, 59(6):797–829, 2006.
[19] D. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, 2006.
[20] M. Fazel. Matrix rank minimization with applications. Ph.D. Thesis, 2002.
[21] D. Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. on Information Theory, 57(3):1548–1566, 2011.
[22] D. Gross, Y.-K. Liu, S. Flammia, S. Becker, and J. Eisert. Quantum state tomography via compressed sensing. Physical Review Letters, 105(15), 2010.
[23] J. Haupt, W. Bajwa, M. Rabbat, and R. Nowak. Compressed sensing for networked data. Signal Processing Magazine, IEEE, 25(2):92–101, 2008.
[24] D. Hsu, S. Kakade, and T. Zhang. Robust matrix decomposition with sparse corruptions.
Information Theory, IEEE Transactions on, 57(11):7221–7234, 2011.
[25] J. Tropp. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., 2011.
[26] R. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Trans. Inform. Theory, 56(6):2980–2998, 2010.
[27] J. Laska, P. Boufounos, M. Davenport, and R. Baraniuk. Democracy in action: Quantization, saturation, and compressive sensing. Applied and Computational Harmonic Analysis, 31(3):429–443, 2011.
[28] J. Laska, M. Davenport, and R. Baraniuk. Exact signal recovery from sparsely corrupted measurements through the pursuit of justice. Asilomar Conference on Signals, Systems and Computers, 2009.
[29] Z. Li, F. Wu, and J. Wright. On the systematic measurement matrix for compressed sensing in the presence of gross errors. Data Compression Conference, pages 356–365, 2010.
[30] N. Nguyen and T. Tran. Exact recoverability from dense corrupted observations via ℓ1 minimization. Preprint, 2011.
[31] N. Nguyen, N. Nasrabadi, and T. Tran. Robust lasso with missing and grossly corrupted observations. Preprint, 2011.
[32] B. Recht. A simpler approach to matrix completion. Journal of Machine Learning Research, 12:3413–3430, 2011.
[33] B. Recht, M. Fazel, and P. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 2010.
[34] J. Romberg. Compressive sensing by random convolution. SIAM J. Imaging Sciences, 2(4):1098–1128, 2009.
[35] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. B., 58(1):267–288, 1996.
[36] M. Rudelson. Random vectors in the isotropic position. J. of Functional Analysis, 164(1):60–72, 1999.
[37] M. Rudelson and R. Vershynin. On sparse reconstruction from Fourier and Gaussian measurements.
Communications on Pure and Applied Mathematics, 61(8):1025–1045, 2008.
[38] C. Studer, P. Kuppinger, G. Pope, and H. Bölcskei. Recovery of sparsely corrupted signals. Preprint, 2011.
[39] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing, Theory and Applications, ed. Y. Eldar and G. Kutyniok, Cambridge University Press, pages 210–268, 2012.
[40] J. Wright and Y. Ma. Dense error correction via ℓ1-minimization. IEEE Transactions on Information Theory, 56(7):3540–3560, 2010.
[41] J. Wright, A. Y. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell., 31(2):210–227, 2009.
[42] L. Wu, A. Ganesh, B. Shi, Y. Matsushita, Y. Wang, and Y. Ma. Robust photometric stereo via low-rank matrix completion and recovery. Proceedings of the 10th Asian Conference on Computer Vision, Part III, 2010.
[43] H. Xu, C. Caramanis, and S. Sanghavi. Robust PCA via outlier pursuit. In Adv. Neural Infor. Proc. Sys. (NIPS), pages 2496–2504, 2010.