Low-rank Matrix Recovery from Errors and Erasures

Yudong Chen, Ali Jalali, Sujay Sanghavi and Constantine Caramanis
Department of Electrical and Computer Engineering
The University of Texas at Austin, Austin, TX 78712 USA
Email: (ydchen, alij, sanghavi and caramanis)@mail.utexas.edu

Abstract

This paper considers the recovery of a low-rank matrix from an observed version that simultaneously contains both (a) erasures: most entries are not observed, and (b) errors: values at a constant fraction of (unknown) locations are arbitrarily corrupted. We provide a new unified performance guarantee on when the natural convex relaxation of minimizing rank plus support succeeds in exact recovery. Our result allows for the simultaneous presence of random and deterministic components in both the error and erasure patterns. On the one hand, corollaries obtained by specializing this single result in different ways recover (up to poly-log factors) all the existing works in matrix completion, and sparse and low-rank matrix recovery. On the other hand, our results also provide the first guarantees for (a) recovery when we observe a vanishing fraction of entries of a corrupted matrix, and (b) deterministic matrix completion.

I. INTRODUCTION

Low-rank matrices play a central role in large-scale data analysis and dimensionality reduction. They arise in a variety of application areas, among them Principal Component Analysis (PCA), Multi-dimensional Scaling (MDS), Spectral Clustering and related methods, ranking and collaborative filtering. In all these problems, low-rank structure is used either to approximate a general matrix, or to correct for corrupted or missing data. This paper considers the recovery of a low-rank matrix in the simultaneous presence of (a) erasures: most elements are not observed, and (b) errors: among the ones that are observed, a significant fraction at unknown locations are grossly/maliciously corrupted.
It is now well recognized that the standard, popular approach to low-rank matrix recovery using SVD as a first step fails spectacularly in this setting [1]. Low-rank matrix completion, which considers only random erasures ([2], [3]), will also fail with even just a few maliciously corrupted entries. In light of this, several recent works have studied an alternate approach based on the natural convex relaxation of minimizing rank plus support. One approach [4], [5] provides deterministic/worst-case guarantees for the fully observed setting (i.e., only errors). Another avenue [6], [7] provides probabilistic guarantees for the case when the supports of the error and erasure patterns are chosen uniformly at random. Our work provides (often order-wise) stronger guarantees on the performance of this convex formulation, as compared to all of these papers.

We present one main result and two other theorems. Our main result, Theorem 1, is a unified performance guarantee that allows for the simultaneous presence of both errors and erasures, and deterministic and random support patterns for each. In order/scaling terms, this single result recovers as corollaries all the existing results on low-rank matrix completion [2], [3], worst-case error patterns [4], and random error and erasure patterns [6], [7], up to logarithmic factors; we provide detailed comparisons in Section II. More significantly, our result goes beyond the existing literature by providing the first guarantees for random support patterns in the case when the fraction of entries observed vanishes as $n$ (the size of the matrix) grows, an important regime in many applications, including collaborative filtering. In particular, we show that exact recovery is possible with as few as $\Theta(n\,\mathrm{polylog}(n))$ observed entries, even when a constant fraction of these entries are errors.
Theorem 2 is also a unified guarantee, but with the additional assumption that the signs of the error matrix are equally likely to be positive or negative. We are now able to show that it is possible to recover the low-rank matrix even when almost all entries are corrupted. Again, our results go beyond the existing work [6] on this case, because we allow for a vanishing fraction of observations. Theorem 3 concentrates on the deterministic/worst-case analysis, providing the first guarantees when there are both errors and erasures. Its specialization to the erasures-only case provides the first deterministic guarantees for low-rank matrix completion (where existing work [2], [3] has concentrated on randomly located observations). Specialization to the errors-only case provides an order improvement over the previous deterministic results in [4], and matches the scaling of [5] but with a simpler proof. Besides improving on known guarantees, all our results involve several technical innovations beyond existing proofs. Several of these innovations may be of interest in their own right, for other related high-dimensional problems.

II. MAIN CONTRIBUTIONS

A. Setup

The problem: Suppose the matrix $C \in \mathbb{R}^{n_1 \times n_2}$ is the sum of an underlying low-rank matrix $B^* \in \mathbb{R}^{n_1 \times n_2}$ and a sparse "errors" matrix $A^* \in \mathbb{R}^{n_1 \times n_2}$. Neither the number, the locations, nor the values of the non-zero entries of $A^*$ are known a priori; indeed, by "sparse" we just mean that $A^*$ has at least a constant fraction of its entries equal to 0; it is allowed to have a significant fraction of its entries be non-zero as well. We consider the following problem: suppose we only observe a subset $\Phi \subseteq [n_1] \times [n_2]$ of the entries of $C$; the remaining entries are erased. When and how can we exactly recover $B^*$ (and, by simple implication, the entries of $A^*$ that are in $\Phi$)?
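To make the setup concrete, here is a small self-contained sketch (our illustration, not code from the paper; the size $n$, rank $r$, and the fractions $p_0$ and $\tau$ below are hypothetical choices) that generates an instance of the errors-and-erasures observation model in NumPy:

```python
import numpy as np

# Hypothetical instance of the model: C = B* + A*, observed only on Phi.
rng = np.random.default_rng(0)
n, r = 50, 2          # matrix size and rank (assumed values)
p0, tau = 0.5, 0.1    # observation and corruption probabilities (assumed)

B_star = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # low rank
Phi = rng.random((n, n)) < p0                 # random observed set
Omega = Phi & (rng.random((n, n)) < tau)      # corrupted entries within Phi
A_star = np.zeros((n, n))
A_star[Omega] = 10.0 * rng.standard_normal(Omega.sum())  # gross errors
C = B_star + A_star
P_Phi_C = np.where(Phi, C, 0.0)   # P_Phi(C): erased entries set to zero
```

A recovery algorithm sees only `P_Phi_C` and `Phi`; the pair $(\mathcal{P}_\Phi(A^*), B^*)$ is the ground truth that the convex program of the next subsection aims to return.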
The Algorithm: In this paper we are interested in the performance of the following convex program:
$$(\hat{A}, \hat{B}) = \arg\min_{A,B} \; \gamma \|A\|_1 + \|B\|_* \quad \text{s.t.} \quad \mathcal{P}_\Phi(A + B) = \mathcal{P}_\Phi(C), \qquad (1)$$
where the notation is that for any matrix $M$, $\|M\|_* = \sum_i \sigma_i(M)$ is the nuclear norm, defined to be the sum of the singular values of the matrix, $\|M\|_1 = \sum_{i,j} |m_{ij}|$ is the elementwise $\ell_1$ norm, and $\mathcal{P}_\Phi(M)$ is the matrix obtained by setting the entries of $M$ that are outside the observed set $\Phi$ to zero. Intuitively, the nuclear norm acts as a convex surrogate for the rank of a matrix [8], and the $\ell_1$ norm as a convex surrogate for its sparsity. Here $\gamma$ is a parameter that trades off between these two elements of the cost function, and our results below specify how it should be chosen. As noted earlier, this program has appeared previously in [7], [4].

Incoherence: We are interested in characterizing when the optimum of (1) recovers the underlying (observed) truth, i.e., when $(\mathcal{P}_\Phi(\hat{A}), \hat{B}) = (\mathcal{P}_\Phi(A^*), B^*)$. Clearly, not all low-rank matrices $B^*$ can be recovered exactly; in particular, if $B^*$ is both low-rank and sparse, it would be impossible to unambiguously identify it from an added sparse matrix. To prevent such a scenario, we follow the approach taken in the recent work [4], [7], [2], [3], [9] and define incoherence parameters for $B^*$. Suppose the matrix $B^*$ with rank $r \le \min(n_1, n_2)$ has singular value decomposition $U \Sigma V^\top$, where $U \in \mathbb{R}^{n_1 \times r}$, $V \in \mathbb{R}^{n_2 \times r}$ and $\Sigma \in \mathbb{R}^{r \times r}$. We say a given matrix $B^*$ is $\mu$-incoherent for some $\mu \in \left[1, \frac{\max(n_1, n_2)}{r}\right]$ if
$$\max_i \|U^\top e_i\| \le \sqrt{\frac{\mu r}{n_1}}, \qquad \max_j \|V^\top e_j\| \le \sqrt{\frac{\mu r}{n_2}}, \qquad \|U V^\top\|_\infty \le \sqrt{\frac{\mu r}{n_1 n_2}},$$
where the $e_i$'s are standard basis vectors of the appropriate length, and $\|\cdot\|$ denotes the vector $2$-norm. Notice that all our results in the following subsections depend only on the product of $\mu$ and $r$.
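Practical solvers for (1) (e.g., proximal or augmented Lagrangian schemes) are built from the proximal operators of the two surrogate norms. The following minimal NumPy sketch is our own illustration of these two building blocks, not code from the paper: elementwise soft-thresholding for $\gamma\|\cdot\|_1$ and singular value thresholding for $\|\cdot\|_*$.

```python
import numpy as np

def soft_threshold(M, t):
    # prox of t * ||.||_1: shrink each entry toward zero by t
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

def sv_threshold(M, t):
    # prox of t * ||.||_*: shrink each singular value toward zero by t
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt
```

Soft-thresholding drives the $A$-iterates toward sparsity and singular value thresholding drives the $B$-iterates toward low rank, which is exactly the surrogate role the two norms play in (1).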
B. Unified Guarantee

Our first main result is a unified guarantee that allows for the simultaneous presence of random and adversarial patterns, for both errors and erasures. As mentioned in the introduction, this recovers all existing results in matrix completion, and sparse and low-rank matrix decomposition, up to constants or log factors. We now define three bounding quantities: $p_0$, $\tau$ and $d$. Let $\Phi_d$ be any (i.e., deterministic) set of observed entries, and additionally let $\Phi_r$ be a randomly chosen set such that each entry is in $\Phi_r$ with probability at least $p_0$. Thus, the overall set of observed entries is $\Phi = \Phi_r \cap \Phi_d$, the intersection of the two sets. Let $\Omega = \Omega_r \cup \Omega_d$ be the support of $A^*$, again composed of the union of a deterministic component $\Omega_d$ and a random component $\Omega_r$, generated by having each entry be in $\Omega_r$ independently with probability at most $\tau$. Finally, consider the union $\Phi_d^c \cup \Omega_d$ of all deterministic errors and erasures, and let $d$ be an upper bound on the maximum number of entries this set has in any row, or in any column.

Theorem 1 (Unified Guarantee). Set $n = \min\{n_1, n_2\}$. There exist universal constants $C$, $\rho_r$, $\rho_s$ and $\rho_d$, each independent of $n$, $\mu$ and $r$, such that, with probability greater than $1 - C n^{-10}$, the unique optimal solution of (1) with tradeoff parameter $\gamma = \frac{1}{32\sqrt{p_0 (d+1) n}}$ is equal to $(\mathcal{P}_\Phi(A^*), B^*)$, provided that
$$p_0 \ge \rho_r \frac{\mu r \log^6 n}{n}, \qquad \tau \le \rho_s, \qquad d \le \rho_d \frac{n}{\mu r} \cdot \frac{p_0^2}{\log^4 n}.$$

Remark. (a) The conclusion of the theorem holds for a range of values of $\gamma$. We have chosen one of these valid values. (b) Note that the above theorem treats errors and erasures differently. Treating erasures as errors by filling the missing entries with random $\pm 1$ values and applying Theorem 2 leads to a weaker result, in particular, $p_0 = \Omega\left(\sqrt{\frac{\mu r \log^6 n}{n}}\right)$.

Comparison with previous work. Recovery from deterministic errors was first studied in [4], [10], which stipulate $d = O\left(\sqrt{\frac{n}{\mu r}}\right)$.
Our theorem improves this bound to $d = O\left(\frac{n}{\mu r \log^4 n}\right)$. In Section II-D, we provide a more refined analysis for the deterministic case, which gives $d = O\left(\frac{n}{\mu r}\right)$. As this manuscript was being prepared, we learned of an independent investigation of the deterministic case [5], which gives similar guarantees. Our results also handle the case of partial observations, which has not been discussed before [4], [10], [5]. Randomly located errors and erasures have been studied in [7]. Their guarantees require that $\tau = O(1)$ and $p_0 = \Omega(1)$. Our theorem provides stronger results, allowing $p_0$ to be vanishingly small, in particular $\Theta\left(\frac{\mu r \log^6 n}{n}\right)$ when there is no additional deterministic component (i.e., $d = 0$). After the publication of the conference version of this paper, we learned about [11]. They also deal with random errors and erasures, but under a different observation model (sampling with replacement), and have scaling results comparable to ours. Previous work in low-rank matrix completion deals with the case when there are no errors or deterministic erasures (i.e., $d, \tau = 0$). For this problem, our theorem matches the best existing bound $p_0 = O\left(\frac{\mu r \log^2 n}{n}\right)$ [3], [9], [12] up to logarithmic factors. Our theorem also provides the first guarantee for deterministic matrix completion under potentially adversarial erasures. One prominent feature of our guarantees is that we allow adversarial and random erasures/errors to exist simultaneously. To the best of our knowledge, this is the first such result in low-rank matrix recovery/robust PCA.

C. Improved Guarantee for Errors with Random Sign

If we further assume that the errors in the entries in $\Omega_r \setminus \Omega_d$ have random signs, then one can recover from an overwhelming fraction of corruptions.

Theorem 2 (Improved Guarantee for Errors with Random Sign).
Under the same setup as Theorem 1, further assume that the signs of $A^*$ in $\Omega_r \setminus \Omega_d$ are symmetric $\pm 1$ Bernoulli random variables, independent of all others. Then there exist absolute constants $C$, $\rho_r$ and $\rho_d$, independent of $n$, $\mu$ and $r$, such that, with probability at least $1 - C n^{-10}$, the unique optimal solution of (1) with tradeoff parameter $\gamma = \frac{1}{32\sqrt{p_0 (d+1) n}}$ is equal to $(\mathcal{P}_\Phi(A^*), B^*)$, provided that
$$p_0 (1-\tau)^2 \ge \rho_r \frac{\mu r \log^6 n}{n}, \qquad d \le \rho_d \frac{n}{\mu r} \cdot \frac{p_0^2 (1-\tau)^2}{\log^4 n}.$$

Remark. Note that $\tau$ may be arbitrarily close to 1 for large $n$. One interesting observation is that $p_0$ can approach zero faster than $1 - \tau$; this agrees with the intuition that correcting erasures with known locations is easier than correcting errors with unknown locations.

Comparison with previous work. Dense errors with random locations and signs were considered in [6]. They show that $\tau$ can be a constant arbitrarily close to 1, provided that all entries are observed and $n$ is sufficiently large. Our theorem provides stronger results by again requiring only a vanishingly small fraction of entries to be observed, in particular $p_0 = \Theta\left(\frac{\log^4 n}{n}\right)$. Moreover, Theorem 2 gives an explicit scaling between $\tau$ and $n$, namely $\tau = O\left(1 - \sqrt{\frac{\log^4 n}{n}}\right)$, with $\gamma$ independent of the usually unknown quantity $\tau$. In contrast, [6] requires $\tau \le f(n)$ for some unknown function $f(\cdot)$ and uses a $\tau$-dependent $\gamma$.

D. Improved Deterministic Guarantee

Our second main result deals with the case where the errors and erasures are arbitrary. As discussed in [4], for exact recovery the error matrix $A^*$ needs to be not only sparse but also "spread out", i.e., it must not have any row or column with too many non-zero entries. The same holds for the unobserved entries.
Correspondingly, we require the following: (i) there are at most $d$ errors and erasures in each row/column, and (ii) $\|M\| \le \eta\, d\, \|M\|_\infty$ for any matrix $M$ that is supported on the set of corrupted and unobserved entries; here $\|M\| = \sigma_{\max}(M)$ is the largest singular value of $M$, and $\|M\|_\infty = \max_{i,j} |M_{i,j}|$ is the maximum magnitude of the elements of the matrix. Note that by [4, Proposition 3], we can always take $\eta \le 1$. Also, let
$$\alpha = \sqrt{\frac{\mu r d}{n_1}} + \sqrt{\frac{\mu r d}{n_2}} + \sqrt{\frac{\mu r d}{\max(n_1, n_2)}}.$$

Theorem 3 (Improved Deterministic Guarantee). For tradeoff parameter $\gamma \in \left[\frac{1}{1-2\alpha}\sqrt{\frac{\mu r}{n_1 n_2}},\; \frac{1-\alpha}{\eta d} - \sqrt{\frac{\mu r}{n_1 n_2}}\right]$, suppose
$$\sqrt{\frac{\mu r d}{\min(n_1, n_2)}} \left(1 + \sqrt{\frac{\min(n_1, n_2)}{\max(n_1, n_2)}} + \eta \sqrt{\frac{d}{\max(n_1, n_2)}}\right) \le \frac{1}{2}.$$
Then the solution to problem (1) is unique and equal to $(\mathcal{P}_\Phi(A^*), B^*)$.

Remark. (a) Notice that we have $\sqrt{d}$ in our bound while [4] has $d$ in theirs. This improvement is achieved by a different construction of the dual certificate, presented in this paper. (b) If $\eta d \sqrt{\frac{\mu r}{\min(n_1, n_2)}} \le \frac{1}{6}$ (the condition provided for exact recovery in [4]) is satisfied, then the condition of Theorem 3 is satisfied as well. This shows that our result is an improvement over the result in [4], in the sense that it guarantees the recovery of a larger set of matrices $A^*$ and $B^*$. Moreover, this bound implies that $n$ (for square matrices) need only scale with $d r$, which is another improvement compared to the $d^2 r$ scaling in [4]. (c) We construct the dual certificate by the method of least squares (first used in [2] in a different setting) with tighter bounding. This theorem provides the same scaling in $d$, $r$ and $n$ as the recent manuscript [5]. However, our assumptions are closer to existing ones in the matrix completion and sparse and low-rank decomposition papers [2], [3], [4], [7].

III. PROOF OF THEOREMS 1 AND 2

In this section we prove our unified guarantees.
The main roadmap is along the same lines as those in the low-rank matrix recovery literature [2], [7], [9]; it consists of providing a dual matrix $Q$ that certifies the optimality of $(\mathcal{P}_\Phi(A^*), B^*)$ for the convex program (1). In spite of this high-level similarity, challenges arise because of the denseness of the erasures/errors as well as the simultaneous presence of deterministic and random components. This requires a number of innovative intermediate results and a new construction of the dual certificate $Q$. We will point out how our analysis departs from previous works when we construct the dual certificate in Section III-D.

Before proceeding, we need to introduce some additional notation. Define the support of $A^*$ as $\Omega = \{(i,j) : A^*_{i,j} \ne 0\}$. Let $\Gamma = \Phi \setminus \Omega$ be the set of entries that are observed and clean; then $\Gamma^c$ is the set of entries that are corrupted or unobserved. Also, let $\Gamma_r = \Phi_r \setminus \Omega_r$ be the set of random observed clean entries, and $\Gamma_d$ the set of deterministic observed clean entries, so that $\Gamma = \Gamma_r \cap \Gamma_d$. The projections $\mathcal{P}_\Gamma$, $\mathcal{P}_{\Gamma^c}$, $\mathcal{P}_{\Gamma_r}$ and $\mathcal{P}_{\Gamma_r^c}$ are defined similarly to $\mathcal{P}_\Phi$. Set $E^* := \mathcal{P}_\Phi(\mathrm{sgn}(A^*))$, where $\mathrm{sgn}(\cdot)$ is the element-wise signum function. For an entry set $\Omega_0$, we write $\Omega_0 \sim \mathrm{Ber}(p)$ if $\Omega_0$ contains each entry with probability $p$, independent of all others; therefore $\Phi_r \sim \mathrm{Ber}(p_0)$, $\Omega_r \sim \mathrm{Ber}(\tau)$, and $\Gamma_r \sim \mathrm{Ber}(p_0(1-\tau))$. We also define the subspace $T$ spanned by all matrices that share either the same column space or the same row space as $B^*$:
$$T = \left\{ U X^\top + Y V^\top : X \in \mathbb{R}^{n_2 \times r},\; Y \in \mathbb{R}^{n_1 \times r} \right\}.$$
For any matrix $M \in \mathbb{R}^{n_1 \times n_2}$, its orthogonal projection onto the space $T$ is given by
$$\mathcal{P}_T(M) = U U^\top M + M V V^\top - U U^\top M V V^\top.$$
We also define the projection onto $T^\perp$, the orthogonal complement of $T$, as follows:
$$\mathcal{P}_{T^\perp}(M) = M - \mathcal{P}_T(M).$$
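The projections just defined are easy to check numerically. The sketch below (our illustration; the dimensions are hypothetical) implements $\mathcal{P}_T$ and $\mathcal{P}_{T^\perp}$ for orthonormal factors $U$, $V$; the accompanying checks confirm the identities used throughout the proof: $\mathcal{P}_T$ is an idempotent orthogonal projection that fixes every matrix of the form $U X^\top + Y V^\top$.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 8, 6, 2
U, _ = np.linalg.qr(rng.standard_normal((n1, r)))  # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((n2, r)))

def P_T(M):
    # P_T(M) = U U^T M + M V V^T - U U^T M V V^T
    return U @ (U.T @ M) + (M @ V) @ V.T - U @ (U.T @ M @ V) @ V.T

def P_Tperp(M):
    # projection onto the orthogonal complement of T
    return M - P_T(M)

M = rng.standard_normal((n1, n2))
X, Y = rng.standard_normal((n2, r)), rng.standard_normal((n1, r))
```

With these in hand, the operator $\mathcal{P}_T \mathcal{P}_\Gamma \mathcal{P}_T$ appearing in Step 2 is just `P_T` composed with masking by the clean observed set and `P_T` again.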
In the sequel, we use $C$, $C'$ and $C''$ to denote unspecified positive constants, which may differ from place to place; by with high probability we mean with probability at least $1 - C \min\{n_1, n_2\}^{-10}$. For simplicity, we only prove the case of square matrices ($n_1 = n_2 = n$). All the proofs extend to the general case by replacing $n$ with $\min\{n_1, n_2\}$. The proof has five steps. We elaborate on each of these steps in the next five sub-sections.

A. Step 1: Sign Pattern Derandomization

Following [7], the first step is to observe that it suffices to prove Theorem 2, which assumes randomly signed errors in $\Omega_r \setminus \Omega_d$. The guarantee under arbitrarily signed errors in Theorem 1 follows automatically from Theorem 2 using a derandomization and elimination argument. This is given in the following lemma, which is a straightforward generalization of [7, Theorems 2.2 and 2.3].

Lemma 1. Suppose $B^*$ obeys the conditions of Theorem 1. If the convex program (1) recovers $B^*$ with high probability in the model where $\Omega_r \sim \mathrm{Ber}(2\tau)$ and the signs of $A^*$ in $\Omega_r \setminus \Omega_d$ are random, then it also recovers $B^*$ with at least the same probability in the model where $\Omega_r \sim \mathrm{Ber}(\tau)$ and the signs are arbitrarily fixed.

The basic idea of the proof is that, as long as $\tau$ is not too large, a fixed-sign error matrix $\mathcal{P}_{\Omega_r \setminus \Omega_d}(A^*)$ can be viewed as the trimmed version of a randomly signed $\mathcal{P}_{\bar{\Omega}_r \setminus \Omega_d}(\bar{A}^*)$ with half of its entries set to zero; moreover, successful recovery under $A^*$ is guaranteed by that under $\bar{A}^*$, as the latter is a harder problem. We refer the reader to [7, Theorems 2.2 and 2.3] for the rigorous proof of this argument. Proceeding under the random-sign assumption makes it easier to construct the dual certificate $Q$. The next four steps are thus devoted to the proof of Theorem 2.

B.
Step 2: Invertibility under Corruptions and Erasures

A necessary condition for exact recovery is that the set of uncorrupted and un-erased entries $\Gamma = \Gamma_r \cap \Gamma_d$ should uniquely identify matrices in the set $T$, so we need to show that the operator $\mathcal{P}_T \mathcal{P}_\Gamma \mathcal{P}_T$ is invertible on $T$. This step is quite standard in the literature on low-rank matrix completion and decomposition, but in our case it requires a different proof. In fact, invertibility follows from the following stronger result.

Lemma 2. Suppose $\Omega_0$ is a set of indices obeying $\Omega_0 \sim \mathrm{Ber}(p)$, and $\Gamma_d$ satisfies $d \le \rho_d \frac{n}{\mu r}$. Then with high probability, we have
$$\left\| p^{-1} \mathcal{P}_T \mathcal{P}_{\Omega_0 \cap \Gamma_d} \mathcal{P}_T - \mathcal{P}_T \right\| \le \frac{1}{3},$$
provided $p \ge C \frac{\mu r \log n}{n}$.

Invertibility follows from specializing $\Omega_0 = \Gamma_r$. The lemma is stated in terms of a generic entry set $\Omega_0$ because it is invoked again elsewhere. Notice that this lemma is a generalization of [2, Theorem 4.1], as $\Omega_0 \cap \Gamma_d$ involves both random and deterministic components. The proof is new, utilizing the properties of both components, and is given in the appendix.

C. Step 3: Sufficient Conditions for Optimality

The next step is to use convex analysis to write down the first-order sub-gradient sufficient condition for $(\mathcal{P}_\Phi(A^*), B^*)$ to be the unique solution to (1). This is given in the following lemma. Recall that we have defined $E^* := \mathcal{P}_\Phi(\mathrm{sgn}(A^*))$.

Lemma 3. Suppose $\gamma$, $p_0$, $\tau$ and $d$ satisfy the conditions of Theorem 2. Then with high probability, $(\mathcal{P}_\Phi(A^*), B^*)$ is the unique solution to (1) if there is a dual certificate $Q = \gamma E^* + W$ obeying
$$
\begin{aligned}
&(a)\quad \left\| \mathcal{P}_T W - (U V^\top - \gamma \mathcal{P}_T E^*) \right\|_F \le \frac{\gamma}{\sqrt{n}}, \\
&(b)\quad \mathcal{P}_{\Gamma^c} W = 0, \\
&(c)\quad \|\mathcal{P}_\Gamma W\|_\infty < \frac{\gamma}{2}, \qquad (2) \\
&(d)\quad \|\mathcal{P}_{T^\perp} W\| < \frac{1}{4}, \\
&(e)\quad \|\gamma\, \mathcal{P}_{T^\perp} E^*\| < \frac{1}{4}.
\end{aligned}
$$

Proof: Observe that the conditions in the lemma imply $\mathcal{P}_{\Phi^c}(Q) = 0$, $\left\| \mathcal{P}_T(Q) - U V^\top \right\|_F \le \frac{\gamma}{\sqrt{n}}$, $\|\mathcal{P}_{T^\perp}(Q)\| < \frac{1}{2}$, $\mathcal{P}_\Omega(Q) = \gamma E^*$, and $\|\mathcal{P}_\Gamma(Q)\|_\infty < \frac{\gamma}{2}$.
Consider another feasible solution $(\mathcal{P}_\Phi(A^*) + \Delta_2, B^* + \Delta_1)$ with $\Delta_1 \ne 0$, $\Delta_2 \ne 0$, and $\mathcal{P}_\Phi(\Delta_1 + \Delta_2) = 0$. Take $G_0 \in T^\perp$ and $F_0$ supported on $\Gamma$ such that $\|G_0\| = 1$, $\|F_0\|_\infty = 1$, $\langle G_0, \Delta_1 \rangle = \|\mathcal{P}_{T^\perp} \Delta_1\|_*$ and $\langle F_0, \Delta_2 \rangle = \|\mathcal{P}_\Gamma \Delta_2\|_1$; such $G_0$ and $F_0$ exist due to the duality between $\|\cdot\|_*$ and $\|\cdot\|$, and that between $\|\cdot\|_1$ and $\|\cdot\|_\infty$. We then have
$$
\begin{aligned}
&\|B^* + \Delta_1\|_* + \gamma \|\mathcal{P}_\Phi(A^*) + \Delta_2\|_1 - \|B^*\|_* - \gamma \|\mathcal{P}_\Phi(A^*)\|_1 \\
&\ge \left\langle U V^\top + G_0, \Delta_1 \right\rangle + \gamma \left\langle E^* + F_0, \Delta_2 \right\rangle \\
&= \left\langle U V^\top + G_0 - Q, \Delta_1 \right\rangle + \left\langle \gamma E^* + \gamma F_0 - Q, \Delta_2 \right\rangle \\
&= \left\langle G_0 - \mathcal{P}_{T^\perp}(Q) - \left( \mathcal{P}_T(Q) - U V^\top \right), \Delta_1 \right\rangle + \left\langle \gamma F_0 - \mathcal{P}_\Gamma(Q), \Delta_2 \right\rangle \\
&\ge \|\mathcal{P}_{T^\perp} \Delta_1\|_* \left(1 - \|\mathcal{P}_{T^\perp}(Q)\|\right) - \left\| \mathcal{P}_T(Q) - U V^\top \right\|_F \|\mathcal{P}_T \Delta_1\|_F + \|\mathcal{P}_\Gamma \Delta_2\|_1 \left(\gamma - \|\mathcal{P}_\Gamma(Q)\|_\infty\right) \qquad (3) \\
&\ge \frac{1}{2} \|\mathcal{P}_{T^\perp} \Delta_1\|_* - \frac{\gamma}{\sqrt{n}} \|\mathcal{P}_T \Delta_1\|_F + \frac{\gamma}{2} \|\mathcal{P}_\Gamma \Delta_2\|_1;
\end{aligned}
$$
here we use the sub-gradients of $\|\cdot\|_*$ and $\|\cdot\|_1$ in the first inequality, and the Cauchy-Schwarz inequality in (3). We need to upper-bound $\|\mathcal{P}_T \Delta_1\|_F$. Notice that w.h.p.
$$
\begin{aligned}
\|\mathcal{P}_\Gamma \mathcal{P}_T \Delta_1\|_F^2 &= \langle \mathcal{P}_T \Delta_1, \mathcal{P}_T \mathcal{P}_\Gamma \mathcal{P}_T \Delta_1 \rangle \\
&= \langle \mathcal{P}_T \Delta_1, \mathcal{P}_T \mathcal{P}_\Gamma \mathcal{P}_T \Delta_1 - p_0(1-\tau) \mathcal{P}_T \Delta_1 \rangle + p_0(1-\tau) \|\mathcal{P}_T \Delta_1\|_F^2 \\
&\ge p_0(1-\tau) \|\mathcal{P}_T \Delta_1\|_F^2 - \frac{1}{2} p_0(1-\tau) \|\mathcal{P}_T \Delta_1\|_F^2 = \frac{1}{2} p_0(1-\tau) \|\mathcal{P}_T \Delta_1\|_F^2;
\end{aligned}
$$
here in the inequality we use Lemma 2 with $\Omega_0 = \Gamma_r$ and $p = p_0(1-\tau)$. It follows that
$$
\begin{aligned}
\|\mathcal{P}_\Gamma \Delta_2\|_1 \ge \|\mathcal{P}_\Gamma \Delta_2\|_F = \|\mathcal{P}_\Gamma \Delta_1\|_F &= \|\mathcal{P}_\Gamma \mathcal{P}_T \Delta_1 + \mathcal{P}_\Gamma \mathcal{P}_{T^\perp} \Delta_1\|_F \\
&\ge \|\mathcal{P}_\Gamma \mathcal{P}_T \Delta_1\|_F - \|\mathcal{P}_\Gamma \mathcal{P}_{T^\perp} \Delta_1\|_F \\
&\ge \sqrt{\frac{p_0(1-\tau)}{2}}\, \|\mathcal{P}_T \Delta_1\|_F - \|\mathcal{P}_{T^\perp} \Delta_1\|_F \\
&\ge \sqrt{\frac{4}{n}}\, \|\mathcal{P}_T \Delta_1\|_F - \|\mathcal{P}_{T^\perp} \Delta_1\|_*,
\end{aligned}
$$
where the last inequality holds under the assumptions of Theorem 2. Substituting back into (3), we obtain
$$\|B^* + \Delta_1\|_* + \gamma \|\mathcal{P}_\Phi(A^*) + \Delta_2\|_1 - \|B^*\|_* - \gamma \|\mathcal{P}_\Phi(A^*)\|_1 \ge \left(\frac{1}{2} - \frac{\gamma}{2}\right) \|\mathcal{P}_{T^\perp} \Delta_1\|_* + \left(\frac{\gamma}{2} - \frac{\gamma}{2}\right) \|\mathcal{P}_\Gamma \Delta_2\|_1 \ge 0,$$
where we use $\gamma < 1$. We claim that the above inequality is strict. Suppose it is not; then we must have $\mathcal{P}_{T^\perp} \Delta_1 = \mathcal{P}_\Gamma \Delta_2 = 0$.
But under the assumptions of Theorem 2, $\mathcal{P}_T \mathcal{P}_\Gamma \mathcal{P}_T$ is invertible by Lemma 2, and thus $\Gamma^\perp \cap T = \{0\}$ (no nonzero matrix in $T$ is supported entirely off $\Gamma$), which contradicts $\Delta_1 \ne 0$ and $\Delta_2 \ne 0$.

D. Step 4: Construction of the Dual Certificate

We need to show the existence of a matrix $W$ obeying the conditions (2) in Lemma 3. We will construct $W$ using a variation of the so-called Golfing Scheme [7], [9]. Here we briefly explain the idea. Consider the left-hand side of condition (a) in (2) as the "error" of approximating $U V^\top - \gamma \mathcal{P}_T E^*$ by $\mathcal{P}_T W$; we want this error to be small. First observe that the choice $W = U V^\top - \gamma \mathcal{P}_T E^*$ satisfies (a) strictly but violates (b). To enforce (b), one might consider sampling according to $\Gamma$, the set of observed clean entries, and define $W_1 = (p_0(1-\tau))^{-1} \mathcal{P}_\Gamma \left( U V^\top - \gamma \mathcal{P}_T E^* \right)$. With the choice $W = W_1$, (b) is satisfied, and one expects the error in (a) to be small as well, because its expectation equals $-\mathcal{P}_T \mathcal{P}_{\Gamma_d^c} \left( U V^\top - \gamma \mathcal{P}_T E^* \right)$, which is small as long as $\mathcal{P}_{\Gamma_d^c}$ is a contraction. This intuition is largely true, except that the error is still not small enough. To correct this bias, it is natural to compensate by subtracting the remaining error from $W_1$, and then sample again. Indeed, if one sets $W_2 = W_1 - (p_0(1-\tau))^{-1} \mathcal{P}_\Gamma \left( \mathcal{P}_T W_1 - (U V^\top - \gamma \mathcal{P}_T E^*) \right)$, then $W = W_2$ still satisfies (b), and the error in (a) becomes smaller. By repeating this "correct and sample" procedure, the error in fact decreases geometrically fast. This is almost exactly how we are going to construct $Q$; the only modification is that, for technical reasons, we need to decompose the observed clean entry set $\Gamma$ into independent batches and sample according to a different batch at each step.
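The "correct and sample" procedure described above can be simulated directly. The following sketch is our own toy illustration, not the paper's code: the sizes, the batch probability `q` and the number of batches `k0` are hypothetical, and we take $\gamma = 0$ (so only the $U V^\top$ part is certified, and $\Gamma_d$ is ignored). It runs the recursion $W_k = W_{k-1} + \mathcal{R}_{\Gamma^{(k)}}\left(U V^\top - \mathcal{P}_T W_{k-1}\right)$ and records the approximation error $\|U V^\top - \mathcal{P}_T W_k\|_F$, which decays rapidly across batches.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, q, k0 = 40, 2, 0.8, 10   # hypothetical sizes and batch parameters

U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))

def P_T(M):
    return U @ (U.T @ M) + (M @ V) @ V.T - U @ (U.T @ M @ V) @ V.T

target = U @ V.T                 # gamma = 0: certify the U V^T part only
W = np.zeros((n, n))
errors = []
for k in range(k0):
    batch = rng.random((n, n)) < q       # k-th batch, Gamma^(k) ~ Ber(q)
    D = target - P_T(W)                  # current golfing error D_{k-1}
    W = W + np.where(batch, D, 0.0) / q  # add R_{Gamma^(k)}(D_{k-1})
    errors.append(np.linalg.norm(target - P_T(W)))
```

Each pass adds mass to $W$ only on the sampled batch, so the analogue of condition (b) holds by construction, while the error in condition (a) shrinks geometrically, mirroring the analysis in Step 5.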
To this end, we think of $\Omega_r^c \sim \mathrm{Ber}(1-\tau)$ as $\cup_{1 \le k \le k_0} \Omega^{(k)}$ and $\Phi_r \sim \mathrm{Ber}(p_0)$ as $\cup_{1 \le k \le k_0} \Phi^{(k)}$, where the sets $\Omega^{(k)} \sim \mathrm{Ber}(q_1)$ and $\Phi^{(k)} \sim \mathrm{Ber}(q_2)$ are independent; here $k_0$ is taken to be $\lceil 4 \log n \rceil$, and $q_1$, $q_2$ obey $1 - \tau = 1 - (1 - q_1)^{k_0}$ and $p_0 = 1 - (1 - q_2)^{k_0}$. Observe that $q_1 \ge (1-\tau)/k_0$ and $q_2 \ge p_0/k_0$. One can verify that $\Omega_r$ and $\Phi_r$ have the same distributions as before. Define $\Gamma^{(k)} = \Omega^{(k)} \cap \Phi^{(k)}$, which can be considered as the $k$-th batch of (random) observed clean entries; we then have $\Gamma^{(k)} \sim \mathrm{Ber}(q)$ with
$$q := q_1 q_2 \ge \frac{p_0 (1-\tau)}{k_0^2} \ge C \frac{\mu r \log n}{n},$$
where $C$ may be made arbitrarily large by selecting $\rho_r$ sufficiently large. Define the operator $\mathcal{R}_{\Gamma^{(k)}} : \mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$ as
$$\mathcal{R}_{\Gamma^{(k)}}(M) \triangleq q^{-1} \mathcal{P}_{\Gamma^{(k)} \cap \Gamma_d}(M) = \sum_{(i,j) \in \Gamma^{(k)} \cap \Gamma_d} q^{-1} M_{i,j}\, e_i e_j^\top,$$
which is simply the (properly scaled) projection onto the $k$-th batch of observed clean entries. The matrix $W$ is then constructed as $W = W_{k_0}$, where $W_{k_0}$ is defined recursively by $W_0 := 0$ and
$$W_k := W_{k-1} + \mathcal{R}_{\Gamma^{(k)}} \left( U V^\top - \gamma \mathcal{P}_T E^* - \mathcal{P}_T W_{k-1} \right), \qquad k = 1, 2, \ldots, k_0.$$
The previous work [7] also applies the Golfing Scheme, but only to the part of the dual certificate that involves $U V^\top$; for the part that involves $E^*$, they use the method of least squares. We utilize the Golfing Scheme for both parts of the certificate. Difficulties arise due to the dependence between $E^*$ and the $\Gamma^{(k)}$'s, and a new analysis is needed for the validation of the certificate. This crucial difference allows us to go beyond [7] and handle a vanishing fraction of observations and/or clean entries.

E. Step 5: Validity of the Dual Certificate

It remains to show that $Q$ satisfies all the constraints in the optimality condition (2) simultaneously. The equality (b) is immediate from the construction of $Q$ and $W$.
To prove the inequalities, observe that if we denote the $k$-th step error by $D_k := U V^\top - \gamma \mathcal{P}_T E^* - \mathcal{P}_T W_k$, then $D_k$ satisfies the recursion
$$D_k = U V^\top - \gamma \mathcal{P}_T E^* - \mathcal{P}_T W_k = \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k)}} \mathcal{P}_T \right) \left( U V^\top - \gamma \mathcal{P}_T E^* - \mathcal{P}_T W_{k-1} \right) = \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k)}} \mathcal{P}_T \right) D_{k-1}, \qquad (4)$$
and $W_{k_0}$ can be expressed as
$$W_{k_0} = \sum_{k=1}^{k_0} \mathcal{R}_{\Gamma^{(k)}} D_{k-1}. \qquad (5)$$
We are now ready to prove that $W = W_{k_0}$ satisfies the four inequalities in (2) under our assumptions. The proof uses Lemmas 11-15 in the Appendix.

Inequality (a): Bounding $\left\| \mathcal{P}_T W - (U V^\top - \gamma \mathcal{P}_T E^*) \right\|_F$. Thanks to (4), we have the following geometric convergence:
$$
\begin{aligned}
\left\| \mathcal{P}_T W - (U V^\top - \gamma \mathcal{P}_T E^*) \right\|_F = \|D_{k_0}\|_F &= \left\| \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k_0)}} \mathcal{P}_T \right) \cdots \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(1)}} \mathcal{P}_T \right) D_0 \right\|_F \\
&\le \left( \prod_{k=1}^{k_0} \left\| \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k)}} \mathcal{P}_T \right\| \right) \left\| U V^\top - \gamma \mathcal{P}_T E^* \right\|_F \\
&\stackrel{(i)}{\le} e^{-k_0} \left( \left\| U V^\top \right\|_F + \gamma \|\mathcal{P}_T E^*\|_F \right) \stackrel{(ii)}{\le} n^{-4} (n + \gamma n) \stackrel{(iii)}{\le} \frac{\gamma}{\sqrt{n}};
\end{aligned}
$$
here (i) uses Lemma 2, (ii) uses $\|\mathcal{P}_T E^*\|_F \le \|E^*\|_F \le n$, and (iii) is due to our choice of $\gamma$. This proves inequality (a) in (2).

Inequality (c): Bounding $\|\mathcal{P}_\Gamma W\|_\infty$. We write
$$\prod_{i=1}^{k} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T \right) = \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k)}} \mathcal{P}_T \right) \cdots \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(1)}} \mathcal{P}_T \right),$$
where the order of multiplication is important. Then we have
$$
\begin{aligned}
\|\mathcal{P}_\Gamma W\|_\infty = \|W_{k_0}\|_\infty &\stackrel{(i)}{\le} \sum_{k=1}^{k_0} \left\| \mathcal{R}_{\Gamma^{(k)}} D_{k-1} \right\|_\infty \le q^{-1} \sum_{k=1}^{k_0} \|D_{k-1}\|_\infty \stackrel{(ii)}{=} q^{-1} \sum_{k=1}^{k_0} \left\| \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T \right) D_0 \right\|_\infty \\
&\le q^{-1} \sum_{k=1}^{k_0} \left\| \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T \right) \left( U V^\top - \gamma \mathcal{P}_T \mathcal{P}_{\Omega_d} E^* \right) \right\|_\infty + q^{-1} \sum_{k=1}^{k_0} \left\| \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T \right) \left( -\gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^* \right) \right\|_\infty;
\end{aligned}
$$
here (i) uses (5) and (ii) uses (4). We bound the above two terms separately.
The first term is bounded as
$$
\begin{aligned}
q^{-1} \sum_{k=1}^{k_0} \left\| \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T \right) \left( U V^\top - \gamma \mathcal{P}_T \mathcal{P}_{\Omega_d} E^* \right) \right\|_\infty &\stackrel{(i)}{\le} q^{-1} \sum_{k=1}^{k_0} \left( \frac{1}{2} \right)^{k-1} \left\| U V^\top - \gamma \mathcal{P}_T \mathcal{P}_{\Omega_d} E^* \right\|_\infty \qquad (6) \\
&\le \frac{C k_0^2}{p_0 (1-\tau)} \left\| U V^\top - \gamma \mathcal{P}_T \mathcal{P}_{\Omega_d} E^* \right\|_\infty \\
&\stackrel{(ii)}{\le} \frac{C k_0^2}{p_0 (1-\tau)} \left( \sqrt{\frac{\mu r}{n^2}} + \gamma \alpha \right) \stackrel{(iii)}{\le} \frac{1}{4} \gamma; \qquad (7)
\end{aligned}
$$
here (i) uses the second part of Lemma 13 with $\Omega_0 = \Gamma^{(k)}$ and $\epsilon_3 = \frac{1}{4}$, as well as the fact that $\alpha \le \frac{1}{4}$ under the assumptions of Theorem 2, (ii) uses the incoherence assumptions and Lemma 14, and (iii) holds under the assumptions of Theorem 2.

For the second term, we cannot use the above argument, because $E^* = \mathcal{P}_\Phi(\mathrm{sgn}(A^*))$ is not independent of the $\Gamma^{(i)}$'s, and thus Lemma 13 does not apply. Instead, we need to utilize the random signs of $E^*$ (a similar argument appeared in [7]). Consider the $k$-th term in the sum. We have
$$
\begin{aligned}
q^{-1} \left\| \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T \right) \left( \gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^* \right) \right\|_\infty &= \gamma q^{-1} \max_{a,b} \left\langle e_a e_b^\top, \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T \right) \left( \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^* \right) \right\rangle \\
&= \gamma q^{-1} \max_{a,b} \left\langle \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k-i)}} \mathcal{P}_T \right) e_a e_b^\top, \mathcal{P}_{\Phi \cap (\Omega_r \setminus \Omega_d)}(\mathrm{sgn}(A^*)) \right\rangle;
\end{aligned}
$$
here in the last equality we use the self-adjointness of the operators. Conditioned on $\Phi$, $\Omega$, and the $\Gamma^{(i)}$'s, $\mathcal{P}_{\Phi \cap (\Omega_r \setminus \Omega_d)}(\mathrm{sgn}(A^*))$ has i.i.d. symmetric $\pm 1$ entries, so Hoeffding's inequality gives
$$
\begin{aligned}
&\mathbb{P} \left( \gamma q^{-1} \left\langle \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k-i)}} \mathcal{P}_T \right) e_a e_b^\top, \mathcal{P}_{\Phi \cap (\Omega_r \setminus \Omega_d)}(\mathrm{sgn}(A^*)) \right\rangle > t \;\middle|\; \Phi, \Omega, \Gamma^{(i)}\text{'s} \right) \\
&\le 2 \exp \left( - \frac{2 t^2}{\gamma^2 q^{-2} \left\| \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k-i)}} \mathcal{P}_T \right) e_a e_b^\top \right\|_F^2} \right) \\
&\le 2 \exp \left( - \frac{2 t^2}{\gamma^2 q^{-2} \prod_{i=1}^{k-1} \left\| \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k-i)}} \mathcal{P}_T \right\|^2 \left\| \mathcal{P}_T(e_a e_b^\top) \right\|_F^2} \right) \le 2 \exp \left( - \frac{t^2}{\gamma^2 q^{-2} \prod_{i=1}^{k-1} \left\| \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k-i)}} \mathcal{P}_T \right\|^2 \cdot \frac{2 \mu r}{n}} \right); \qquad (8)
\end{aligned}
$$
here the last inequality uses $\left\| \mathcal{P}_T(e_a e_b^\top) \right\|_F^2 \le \frac{2 \mu r}{n}$, which follows from the incoherence assumptions. Conditioned on the event $\mathcal{G}_k$ that $\left\| \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k-i)}} \mathcal{P}_T \right\| \le \frac{1}{2}$ for $i = 1, \ldots,$
$k-1$, we can integrate out the conditions in (8) and obtain
$$\mathbb{P} \left( \gamma q^{-1} \left\langle \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k-i)}} \mathcal{P}_T \right) e_a e_b^\top, \mathcal{P}_{\Phi \cap (\Omega_r \setminus \Omega_d)}(\mathrm{sgn}(A^*)) \right\rangle > t \;\middle|\; \mathcal{G}_k \right) \le 2 \exp \left( - \frac{t^2}{\gamma^2 q^{-2} \left( \frac{1}{2} \right)^{2(k-1)} \cdot \frac{2 \mu r}{n}} \right).$$
By Lemma 2, we know that the event $\mathcal{G}_k$ holds with high probability. Choosing $t = C \left( \frac{1}{2} \right)^{k-1} \gamma \sqrt{\frac{\mu r \log n}{q^2 n}}$ with $C$ sufficiently large and using a union bound (there are only polynomially many different $(a, b)$), we conclude that
$$q^{-1} \left\| \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T \right) \left( \gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^* \right) \right\|_\infty \le C \left( \frac{1}{2} \right)^{k-1} \gamma \sqrt{\frac{\mu r \log n}{q^2 n}} \le \left( \frac{1}{2} \right)^k \cdot \frac{1}{4} \gamma$$
with high probability; here the second inequality holds because $q \ge C' \frac{\mu r \log n}{n}$ by our choice. Summing over $k$, it follows that
$$\sum_{k=1}^{k_0} q^{-1} \left\| \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T \right) \left( \gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^* \right) \right\|_\infty \le \frac{1}{4} \gamma. \qquad (9)$$
Combining (7) and (9) proves inequality (c) in (2).

Inequality (d): Bounding $\|\mathcal{P}_{T^\perp} W\|$. We have
$$
\begin{aligned}
\|\mathcal{P}_{T^\perp} W_{k_0}\| &\stackrel{(i)}{\le} \sum_{k=1}^{k_0} \left\| \mathcal{P}_{T^\perp} \mathcal{R}_{\Gamma^{(k)}} D_{k-1} \right\| \stackrel{(ii)}{=} \sum_{k=1}^{k_0} \left\| \mathcal{P}_{T^\perp} \left( \mathcal{R}_{\Gamma^{(k)}} D_{k-1} - D_{k-1} \right) \right\| \stackrel{(iii)}{\le} \sum_{k=1}^{k_0} \left\| \left( \mathcal{R}_{\Gamma^{(k)}} - \mathcal{I} \right) \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T \right) D_0 \right\| \\
&\le \sum_{k=1}^{k_0} \left\| \left( \mathcal{R}_{\Gamma^{(k)}} - \mathcal{I} \right) \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T \right) \left( U V^\top - \gamma \mathcal{P}_T \mathcal{P}_{\Omega_d} E^* \right) \right\| + \sum_{k=1}^{k_0} \left\| \left( \mathcal{R}_{\Gamma^{(k)}} - \mathcal{I} \right) \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T \right) \left( -\gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^* \right) \right\|; \qquad (10)
\end{aligned}
$$
here (i) uses (5), (ii) uses $D_k \in T$, and (iii) uses (4). We bound the above two terms separately. The first term is bounded as
$$
\begin{aligned}
\sum_{k=1}^{k_0} \left\| \left( \mathcal{R}_{\Gamma^{(k)}} - \mathcal{I} \right) \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T \right) \left( U V^\top - \gamma \mathcal{P}_T \mathcal{P}_{\Omega_d} E^* \right) \right\| &\stackrel{(i)}{\le} C \left( \sqrt{\frac{n \log n}{q}} + d \right) \sum_{k=1}^{k_0} \left\| \prod_{i=1}^{k-1} \left( \mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T \right) \left( U V^\top - \gamma \mathcal{P}_T \mathcal{P}_{\Omega_d} E^* \right) \right\|_\infty \\
&\stackrel{(ii)}{\le} 2 C \left( \sqrt{\frac{n \log n}{q}} + d \right) \left\| U V^\top - \gamma \mathcal{P}_T \mathcal{P}_{\Omega_d} E^* \right\|_\infty \\
&\stackrel{(iii)}{\le} 2 C \left( \sqrt{\frac{n \log n}{q}} + d \right) \left( \sqrt{\frac{\mu r}{n^2}} + \gamma \alpha \right) \stackrel{(iv)}{\le} \frac{1}{8}; \qquad (11)
\end{aligned}
$$
here (i) uses the second part of Lemma 12 with $\Omega_0 = \Gamma^{(k)}$, (ii) uses (6), (iii) uses the incoherence assumptions and Lemma 14, and (iv) holds under the assumptions of Theorem 2.
For the second term in (10), the above argument fails due to the dependence between $\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*$ and the $\Gamma^{(i)}$'s. Again we rely on the random signs of $\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*=\mathcal{P}_{\Phi\cap(\Omega_r\setminus\Omega_d)}\operatorname{sgn}(A^*)$, but the situation is more complicated here, as we need an $\epsilon$-net argument to bound the operator norms. The key idea is to observe that, although independence does not hold, conditional independence does: the $\Gamma^{(i)}$'s and $E^*$ are independent conditioned on $\Omega$. This is because $\operatorname{supp}(E^*)\subseteq\Omega$ is a random subset of the corrupted entries, while the $\Gamma^{(i)}\subseteq\Omega^c$ are random subsets of the uncorrupted entries. To isolate this independence, we telescope the operators in the second term of (10). For $k=1,\dots,k_0$, define the operators
\[
\mathcal{A}_k=\mathcal{P}_T-\mathcal{P}_T\mathcal{R}_{\Omega^{(k)}}\mathcal{P}_T, \qquad
\mathcal{S}_k=\mathcal{P}_T\mathcal{R}_{\Omega^{(k)}}\mathcal{P}_T-\mathcal{P}_T\mathcal{R}_{\Gamma^{(k)}}\mathcal{P}_T,
\]
\[
\mathcal{B}_k=\mathcal{R}_{\Omega^{(k)}}-\mathcal{I}, \qquad
\mathcal{T}_k=\mathcal{R}_{\Gamma^{(k)}}-\mathcal{R}_{\Omega^{(k)}}.
\]
Observe that $\mathcal{P}_T-\mathcal{P}_T\mathcal{R}_{\Gamma^{(k)}}\mathcal{P}_T=\mathcal{A}_k+\mathcal{S}_k$ and $\mathcal{R}_{\Gamma^{(k)}}-\mathcal{I}=\mathcal{B}_k+\mathcal{T}_k$. The reason for this decomposition is that, conditioned on $\Omega$, the $\mathcal{T}_k$'s and $\mathcal{S}_i$'s are independent of $E^*$. Thus a term that involves only $\mathcal{T}_k$ and the $\mathcal{S}_k$'s (we call it a Type-1 term) can be bounded in a similar way as the first term in (10), using Lemmas 12 and 13. The other terms, which involve not only $\mathcal{T}_k$ and the $\mathcal{S}_k$'s but also the $\mathcal{A}_i$'s and/or $\mathcal{B}_k$ (dubbed Type-2 terms), are bounded using the random signs of $E^*$. (It turns out that bounding the Type-1 term using the random signs yields a bound that is not strong enough, so we need to distinguish the two cases.)

Now for the details. Consider the $k$-th summand of the second term in (10). Using the above definitions, we have
\[
\big(\mathcal{R}_{\Gamma^{(k)}}-\mathcal{I}\big)\prod_{i=1}^{k-1}\big(\mathcal{P}_T-\mathcal{P}_T\mathcal{R}_{\Gamma^{(i)}}\mathcal{P}_T\big)\big(-\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big)
= (\mathcal{B}_k+\mathcal{T}_k)\prod_{i=1}^{k-1}(\mathcal{A}_i+\mathcal{S}_i)\big(-\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big). \quad (12)
\]
Expanding the product and sums in this equation results in a sum of $2^k=\mathrm{poly}(n)$ terms, since $k\le k_0=O(\log n)$.
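To make the bookkeeping concrete, here is the expansion (12) written out for the smallest nontrivial case $k=2$ (an illustration we add for clarity, not part of the original argument):

```latex
\[
(\mathcal{B}_2+\mathcal{T}_2)(\mathcal{A}_1+\mathcal{S}_1)
\big(-\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big)
= \underbrace{\mathcal{T}_2\mathcal{S}_1(\cdot)}_{\text{Type-1}}
+ \underbrace{\mathcal{T}_2\mathcal{A}_1(\cdot)
+ \mathcal{B}_2\mathcal{S}_1(\cdot)
+ \mathcal{B}_2\mathcal{A}_1(\cdot)}_{\text{Type-2}},
\]
```

where $(\cdot)$ abbreviates $-\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*$. Exactly one of the $2^2=4$ terms avoids every $\mathcal{A}_i$ and $\mathcal{B}_k$, and the remaining $2^k-1=3$ terms are handled by the sign argument.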
Among them there is one Type-1 term,
\[
\mathcal{T}_k\mathcal{S}_1\mathcal{S}_2\cdots\mathcal{S}_{k-1}\big(\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big), \quad (13)
\]
and $2^k-1$ Type-2 terms, such as
\[
\mathcal{T}_k\mathcal{A}_1\mathcal{S}_2\mathcal{S}_3\cdots\mathcal{A}_{k-2}\mathcal{S}_{k-1}\big(\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big), \qquad
\mathcal{B}_k\mathcal{S}_1\mathcal{A}_2\mathcal{S}_3\cdots\mathcal{S}_{k-2}\mathcal{A}_{k-1}\big(\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big).
\]
We first bound the Type-1 term. Conditioned on $\Omega$, we have
\[
\big\|\mathcal{T}_k\mathcal{S}_1\mathcal{S}_2\cdots\mathcal{S}_{k-1}\big(\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big)\big\|
= \Big\|\Big(\frac{1}{q_1q_2}\mathcal{P}_{\Phi^{(k)}\cap(\Omega^{(k)}\cap\Gamma_d)}-\frac{1}{q_1}\mathcal{P}_{\Omega^{(k)}\cap\Gamma_d}\Big)\prod_{i=1}^{k-1}\Big(\frac{1}{q_1q_2}\mathcal{P}_T\mathcal{P}_{\Phi^{(i)}\cap(\Omega^{(i)}\cap\Gamma_d)}\mathcal{P}_T-\frac{1}{q_1}\mathcal{P}_T\mathcal{P}_{\Omega^{(i)}\cap\Gamma_d}\mathcal{P}_T\Big)\big(\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big)\Big\|
\]
\[
\overset{(i)}{\le} C\,\frac{1}{q_1}\sqrt{\frac{n\log n}{q_2}}\,\Big(\frac12\Big)^{k-1}\big\|\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big\|_\infty
\overset{(ii)}{\le} C'\sqrt{\frac{n\log n}{q_2q_1^2}}\,\Big(\frac12\Big)^{k}\gamma\sqrt{\frac{\mu r}{n}\,p_0\log n}
\overset{(iii)}{\le} \frac{1}{16}\cdot\frac{1}{2^k};
\]
here in (i) we apply the first part of Lemma 12 with $\Omega_0=\Phi^{(k)}$ and $\Gamma_0=\Omega^{(k)}\cap\Gamma_d$, as well as the first part of Lemma 13 with $\Omega_0=\Phi^{(i)}$, $\Gamma_0=\Omega^{(i)}\cap\Gamma_d$ and $\epsilon_3=\frac12 q_1$; (ii) uses Lemma 15; and (iii) holds under the assumptions of Theorem 2.

We next bound the remaining $2^k-1$ Type-2 terms. To this end, we first collect five useful inequalities. Because $\Omega^{(i)}\sim\mathrm{Ber}(q_1)$, the second part of Lemma 11 with $\Omega_0=\Omega^{(i)}$ and $\epsilon_1=C\sqrt{\frac{\mu r\log n}{nq_1}}$ gives that w.h.p.
\[
\|\mathcal{A}_i\| = \|\mathcal{P}_T-\mathcal{P}_T\mathcal{R}_{\Omega^{(i)}}\mathcal{P}_T\|
\le C\sqrt{\frac{\mu r\log n}{nq_1}}+C\sqrt{\frac{\mu rd}{n}}
\le C'\sqrt{\frac{p_0(1-\tau)}{\log^3 n}}. \quad (14)
\]
The first part of Lemma 11 with $\Omega_0=\Omega^{(k)}$ and $\Gamma_0=\Gamma_d$ shows that w.h.p.
\[
\|\mathcal{P}_T\mathcal{B}_k\| = \Big\|\frac{1}{q_1}\mathcal{P}_T\mathcal{P}_{\Omega^{(k)}\cap\Gamma_d}-\mathcal{P}_T\Big\|
\le \frac{1}{q_1}\big\|\mathcal{P}_T\mathcal{P}_{\Omega^{(k)}\cap\Gamma_d}\big\|+\|\mathcal{P}_T\|
= \frac{1}{q_1}\sqrt{\big\|\mathcal{P}_T\mathcal{P}_{\Omega^{(k)}\cap\Gamma_d}\mathcal{P}_T\big\|}+1
\]
\[
\le \frac{1}{q_1}\sqrt{q_1\Big\|\frac{1}{q_1}\mathcal{P}_T\mathcal{P}_{\Omega^{(k)}\cap\Gamma_d}\mathcal{P}_T-\mathcal{P}_T\mathcal{P}_{\Gamma_d}\mathcal{P}_T\Big\|+q_1\|\mathcal{P}_T\mathcal{P}_{\Gamma_d}\mathcal{P}_T\|}+1
\le C\sqrt{\frac{1}{q_1}} \le C'\sqrt{\frac{\log n}{1-\tau}}. \quad (15)
\]
Similarly, we have w.h.p.
\[
\|\mathcal{P}_T\mathcal{T}_k\| = \Big\|\frac{1}{q_1q_2}\mathcal{P}_T\mathcal{P}_{\Phi^{(k)}\cap\Omega^{(k)}}-\frac{1}{q_1}\mathcal{P}_T\mathcal{P}_{\Omega^{(k)}}\Big\|
\le \Big\|\frac{1}{q_1q_2}\mathcal{P}_T\mathcal{P}_{\Phi^{(k)}\cap\Omega^{(k)}}-\mathcal{P}_T\Big\|+\Big\|\mathcal{P}_T-\frac{1}{q_1}\mathcal{P}_T\mathcal{P}_{\Omega^{(k)}}\Big\|
\le C\sqrt{\frac{1}{q_1q_2}}+C\sqrt{\frac{1}{q_1}}
\le C'\sqrt{\frac{\log^2 n}{p_0(1-\tau)}}. \quad (16)
\]
Applying the first part of Lemma 11 twice, with (1) $\Omega_0=\Omega^{(k)}$, $\Gamma_0=\Gamma_d$, $\epsilon_1=C\sqrt{\frac{\mu r\log n}{nq_1}}$ and (2) $\Omega_0=\Phi^{(k)}\cap\Omega^{(k)}$, $\Gamma_0=\Gamma_d$, $\epsilon_1=C\sqrt{\frac{\mu r\log n}{nq_1q_2}}$, gives w.h.p.
\[
\|\mathcal{S}_k\| = \Big\|\frac{1}{q_1}\mathcal{P}_T\mathcal{P}_{\Omega^{(k)}\cap\Gamma_d}\mathcal{P}_T-\frac{1}{q_1q_2}\mathcal{P}_T\mathcal{P}_{\Phi^{(k)}\cap(\Omega^{(k)}\cap\Gamma_d)}\mathcal{P}_T\Big\|
\le \Big\|\frac{1}{q_1}\mathcal{P}_T\mathcal{P}_{\Omega^{(k)}\cap\Gamma_d}\mathcal{P}_T-\mathcal{P}_T\mathcal{P}_{\Gamma_d}\mathcal{P}_T\Big\|+\Big\|\mathcal{P}_T\mathcal{P}_{\Gamma_d}\mathcal{P}_T-\frac{1}{q_1q_2}\mathcal{P}_T\mathcal{P}_{(\Phi^{(k)}\cap\Omega^{(k)})\cap\Gamma_d}\mathcal{P}_T\Big\|
\]
\[
\le C\sqrt{\frac{\mu r\log n}{nq_1}}+C\sqrt{\frac{\mu r\log n}{nq_1q_2}}
\le C'\sqrt{\frac{\mu r\log^3 n}{np_0(1-\tau)}} \le \frac14. \quad (17)
\]
Finally, since $\Phi\cap(\Omega_r\setminus\Omega_d)\subseteq\Phi\subseteq\Phi_r$, we apply the first part of Lemma 11 with $\Omega_0=\Phi_r$, $\Gamma_0=[n]\times[n]$ and $\epsilon_1=\frac12$ to obtain w.h.p.
\[
\big\|\mathcal{P}_{\Phi\cap(\Omega_r\setminus\Omega_d)}\mathcal{P}_T\big\| \le \|\mathcal{P}_{\Phi_r}\mathcal{P}_T\| = \sqrt{\|\mathcal{P}_T\mathcal{P}_{\Phi_r}\mathcal{P}_T\|}
= \sqrt{p_0\Big\|\frac{1}{p_0}\mathcal{P}_T\mathcal{P}_{\Phi_r}\mathcal{P}_T-\mathcal{P}_T+\mathcal{P}_T\Big\|}
\le \sqrt{p_0\Big(\Big\|\frac{1}{p_0}\mathcal{P}_T\mathcal{P}_{\Phi_r}\mathcal{P}_T-\mathcal{P}_T\Big\|+1\Big)}
\le \sqrt{2p_0}. \quad (18)
\]
Now consider one of the Type-2 terms $\mathcal{X}\big(\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big)$, say $\mathcal{X}=\mathcal{T}_k\mathcal{S}_1\mathcal{S}_2\cdots\mathcal{S}_{k-2}\mathcal{A}_{k-1}$. Let $\mathcal{X}^*$ be the adjoint of $\mathcal{X}$. The last five inequalities (14)-(18) yield w.h.p.
\[
\big\|\mathcal{P}_{\Phi\cap(\Omega_r\setminus\Omega_d)}\mathcal{P}_T\mathcal{X}^*\big\|
= \big\|\mathcal{P}_{\Phi\cap(\Omega_r\setminus\Omega_d)}\mathcal{P}_T\mathcal{A}_{k-1}\mathcal{S}_{k-2}\cdots\mathcal{S}_1\mathcal{P}_T\mathcal{T}_k\big\|
\le C\sqrt{p_0}\cdot\sqrt{\frac{p_0(1-\tau)}{\log^3 n}}\cdot\Big(\frac14\Big)^{k-2}\cdot\sqrt{\frac{\log^2 n}{p_0(1-\tau)}}
\le C'\sqrt{p_0}\,\Big(\frac14\Big)^{k}. \quad (19)
\]
It is not hard to check that this inequality also holds for the $\mathcal{X}$'s associated with the other Type-2 terms, except for the term $(\mathcal{R}_{\Omega^{(1)}}-\mathcal{I})(-\gamma\mathcal{P}_TE^*)$, which is discussed later.

We are now ready to bound the operator norm of a Type-2 term using a standard $\epsilon$-net argument. Let $S^{n-1}$ be the unit sphere in $\mathbb{R}^n$, and let $N$ be a $1/2$-net of $S^{n-1}$ of size at most $6^n$. The definition and the Lipschitz property of the operator norm give
\[
\big\|\mathcal{X}\big(\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big)\big\|
= \sup_{x,y\in S^{n-1}}\big\langle xy^\top,\mathcal{X}\big(\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big)\big\rangle
\le 4\sup_{x,y\in N}\big\langle xy^\top,\mathcal{X}\big(\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big)\big\rangle.
\]
For a fixed pair $(x,y)\in N\times N$, we have
\[
\big\langle xy^\top,\mathcal{X}\big(\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big)\big\rangle
= \gamma\big\langle\mathcal{P}_{\Phi\cap(\Omega_r\setminus\Omega_d)}\mathcal{P}_T\mathcal{X}^*(xy^\top),\operatorname{sgn}(A^*)\big\rangle.
\]
We condition on the event that (19) holds. Because $\operatorname{sgn}(A^*)$ has i.i.d. symmetric $\pm1$ entries, Hoeffding's inequality gives
\[
\mathbb{P}\Big[\gamma\big\langle\mathcal{P}_{\Phi\cap(\Omega_r\setminus\Omega_d)}\mathcal{P}_T\mathcal{X}^*(xy^\top),\operatorname{sgn}(A^*)\big\rangle \ge \frac{C}{4^k}\Big]
\le 2\exp\left(-\frac{2C^2\,4^{-2k}}{\gamma^2\big\|\mathcal{P}_{\Phi\cap(\Omega_r\setminus\Omega_d)}\mathcal{P}_T\mathcal{X}^*(xy^\top)\big\|_F^2}\right)
\]
\[
\le 2\exp\left(-\frac{2C^2\,4^{-2k}\cdot32^2\,p_0(d+1)n\log n}{\big\|\mathcal{P}_{\Phi\cap(\Omega_r\setminus\Omega_d)}\mathcal{P}_T\mathcal{X}^*\big\|^2}\right)
\le 2\exp(-C_0n)
\]
for some constant $C_0$ that can be made large; here the second step substitutes the value of $\gamma$ and uses $\|xy^\top\|_F=1$, and the last step uses (19). This probability is exponentially small, so we can apply a union bound over the $6^{2n}$ pairs $(x,y)$ in the net $N\times N$ and conclude that w.h.p.
\[
\big\|\mathcal{X}\big(\gamma\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big)\big\| \le \frac{C}{4^k} = C\cdot\frac{1}{2^k}\cdot\frac{1}{2^k}.
\]
For the exceptional term $(\mathcal{R}_{\Omega^{(1)}}-\mathcal{I})(-\gamma\mathcal{P}_TE^*)$, a similar bound holds, as follows. The proof can be found in the Appendix.

Lemma 4. Under the assumptions of Theorem 2, the following holds with high probability:
\[
\big\|(\mathcal{R}_{\Omega^{(1)}}-\mathcal{I})(-\gamma\mathcal{P}_TE^*)\big\| \le \frac{1}{32}.
\]

Summing over all $2^k-1=\mathrm{poly}(n)$ Type-2 terms and combining with the bound for the Type-1 term (13), it follows that the right-hand side of (12) is bounded by $\frac{1}{8\cdot2^k}$. Summing over $k=1,2,\dots,k_0$ bounds the second term in (10) by $\frac18$, which, together with the bound (11) for the first term, completes the proof of inequality (d) in (2).

Inequality (e): Bounding $\|\mathcal{P}_{T^\perp}\gamma E^*\|$. A standard argument about the norm of a matrix with i.i.d. entries [13] and [4, Proposition 3] give
\[
\|\mathcal{P}_{T^\perp}\gamma E^*\| \le \gamma\big(\|\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\|+\|\mathcal{P}_{\Omega_d}E^*\|\big)
\le \frac{1}{32\sqrt{p_0(d+1)n\log n}}\cdot\big(4\sqrt{np_0\tau}+d\big).
\]
Under the assumptions of Theorem 2, the right-hand side is no larger than $\frac14$. Therefore, inequality (e) in (2) holds. This completes the proof of Theorem 2. As mentioned in Section III-A, Theorem 1 also follows.

IV. PROOF OF THEOREM 3

The proof is along the lines of that in [4] and has three steps: (a) writing down a sufficient optimality condition, stated in terms of a dual certificate, for $(\mathcal{P}_\Phi(A^*),B^*)$ to be the optimum of the convex program (1); (b) constructing a particular candidate dual certificate; and (c) showing that under the imposed conditions this candidate does indeed certify that $(\mathcal{P}_\Phi(A^*),B^*)$ is the optimum. Part (b) is the "art" in this method; different ways to devise dual certificates can yield different sufficient conditions for exact recovery. Indeed, this is the main difference between this paper and [4].

1) Optimality conditions: For the sake of completeness, we restate here a first-order sufficient condition that guarantees $(\mathcal{P}_\Phi(A^*),B^*)$ to be the optimum of (1). The reader is referred to [4] for a proof.

Lemma 5 (A Sufficient Optimality Condition [4]). The pair $(\mathcal{P}_\Phi(A^*),B^*)$ is the unique optimal solution of (1) if
(a) $\Gamma^c\cap T=\{0\}$;
(b) there exists a dual matrix $Q\in\mathbb{R}^{n_1\times n_2}$ satisfying $\mathcal{P}_{\Phi^c}(Q)=0$ and
\[
\mathcal{P}_T(Q)=UV^\top, \qquad \|\mathcal{P}_{T^\perp}(Q)\|<1, \qquad
\mathcal{P}_{\Gamma^c}(Q)=\gamma\,\mathcal{P}_\Phi(\operatorname{sgn}(A^*)), \qquad \|\mathcal{P}_\Gamma(Q)\|_\infty<\gamma. \quad (20)
\]

Lemma 5 provides a first-order sufficient condition for $(\mathcal{P}_\Phi(A^*),B^*)$ to be the optimum of (1). Condition (a) in the lemma guarantees that sparse matrices and low-rank matrices can be distinguished without ambiguity: no matrix other than the zero matrix can be both sparse and low-rank. The following lemma gives a sufficient guarantee for condition (a). We construct the dual matrix $Q$ in the next subsection and prove condition (b) afterwards.

Lemma 6. If $\alpha<1$, then $\Gamma^c\cap T=\{0\}$.

Proof: It is clear that $0\in\Gamma^c\cap T$. To obtain a contradiction, assume that there exists a non-zero matrix $M\in\Gamma^c\cap T$.
By idempotency of orthogonal projections, we have $M=\mathcal{P}_{\Gamma^c}(M)=\mathcal{P}_T(\mathcal{P}_{\Gamma^c}(M))$, and hence
\[
\|\mathcal{P}_T(\mathcal{P}_{\Gamma^c}(M))\|_\infty
= \big\|UU^\top\mathcal{P}_{\Gamma^c}(M)+\mathcal{P}_{\Gamma^c}(M)VV^\top-UU^\top\mathcal{P}_{\Gamma^c}(M)VV^\top\big\|_\infty
\]
\[
\le \big\|UU^\top\mathcal{P}_{\Gamma^c}(M)\big\|_\infty+\big\|\mathcal{P}_{\Gamma^c}(M)VV^\top\big\|_\infty+\big\|UU^\top\mathcal{P}_{\Gamma^c}(M)VV^\top\big\|_\infty
\]
\[
\le \max_i\|UU^\top e_i\|\max_j\|\mathcal{P}_{\Gamma^c}(M)e_j\|
+ \max_j\|e_j^\top\mathcal{P}_{\Gamma^c}(M)\|\max_i\|VV^\top e_i\|
+ \max_j\|e_j^\top UU^\top\|\,\|\mathcal{P}_{\Gamma^c}(M)\|\max_i\|VV^\top e_i\|
\]
\[
\le \max_i\|UU^\top e_i\|\,\sqrt{d}\,\|\mathcal{P}_{\Gamma^c}(M)\|_\infty
+ \sqrt{d}\,\|\mathcal{P}_{\Gamma^c}(M)\|_\infty\max_i\|VV^\top e_i\|
+ \max_i\|UU^\top e_i\|\,d\,\|\mathcal{P}_{\Gamma^c}(M)\|_\infty\max_i\|VV^\top e_i\|
\]
\[
\le \alpha\,\|\mathcal{P}_{\Gamma^c}(M)\|_\infty = \alpha\,\|\mathcal{P}_T(\mathcal{P}_{\Gamma^c}(M))\|_\infty. \quad (21)
\]
Here we used the fact that $\sqrt{\frac{\mu rd}{n_1}}\sqrt{\frac{\mu rd}{n_2}}\le\sqrt{\frac{\mu rd}{\max(n_1,n_2)}}$, since both factors do not exceed 1 by assumption. Since $\alpha<1$, it follows that $\|M\|_\infty=0$, or equivalently $M=0$. This is a contradiction.

2) Dual Certificate: We now describe our main innovation, a new way to construct the candidate dual certificate $Q$, which is different from the constructions in [4]. We construct $Q$ as the minimum-norm solution to the equality constraints in Lemma 5. As a first step, consider two matrices $Q_a$ and $Q_b$ defined as follows: with $M^*=\gamma\operatorname{sgn}(A^*)$ and $N^*=UV^\top$, let
\[
Q_a = M^*-\mathcal{P}_T(M^*)+\mathcal{P}_{\Gamma^c}(\mathcal{P}_T(M^*))-\mathcal{P}_T(\mathcal{P}_{\Gamma^c}(\mathcal{P}_T(M^*)))+\cdots
\]
\[
Q_b = N^*-\mathcal{P}_{\Gamma^c}(N^*)+\mathcal{P}_T(\mathcal{P}_{\Gamma^c}(N^*))-\mathcal{P}_{\Gamma^c}(\mathcal{P}_T(\mathcal{P}_{\Gamma^c}(N^*)))+\cdots
\]
Lemma 7 below establishes that $Q_a$ and $Q_b$ as described above are well defined, i.e., it establishes that the infinite summations converge, under the conditions of the theorem. Note that when this is the case, we have
\[
\mathcal{P}_T(Q_b)=UV^\top, \qquad \mathcal{P}_T(Q_a)=0, \qquad
\mathcal{P}_{\Gamma^c}(Q_a)=\gamma\,\mathcal{P}_\Phi(\operatorname{sgn}(A^*)), \qquad \mathcal{P}_{\Gamma^c}(Q_b)=0. \quad (22)
\]
From (22), it is clear that $Q=Q_a+Q_b$ satisfies the equality conditions in (20), and also $\mathcal{P}_{\Phi^c}(Q)=0$. In the next subsection, we will show that the inequality conditions are also satisfied under the assumptions of Theorem 3.

Lemma 7. If $\alpha<1$, then $Q_a$ and $Q_b$ exist, i.e., the sums converge.
Pr oof: For any matrix W ∈ R n 1 × n 2 , let S W = W + P T ( P Γ c ( W )) + P T ( P Γ c ( P T ( P Γ c ( W )))) + · · · . It suf fices to show that S W con ver ges for all W since Q a = M ∗ − P Γ S P T ( M ∗ ) and Q b = S N ∗ −P Γ c ( N ∗ ) . Notice that kP T ( P Γ c ( W )) k ∞ ≤ α kP Γ c ( W ) k ∞ ≤ α k W k ∞ as shown in (21) and hence S W geometrically con ver ges. 3) Certification: Considering Q = Q a + Q b as a candidate for dual matrix, we need to sho w the conditions in (20) are satisfied under the conditions of the theorem. As we showed in the pre vious subsection, the equality conditions are satisfied by construction of Q a and Q b . T o prov e the inequality conditions, we first bound the projection of Q into orthogonal complement spaces in next lemma. Lemma 8. If α < 1 , then kP Γ ( Q ) k ∞ ≤ 1 1 − α r µr n 1 n 2 + αγ kP T ⊥ ( Q ) k ≤ η d 1 − α r µr n 1 n 2 + γ . 17 Pr oof: Using the definition of S W for an y matrix W ∈ R n 1 × n 2 , we get k S W k ∞ ≤ 1 1 − α k W k ∞ , because of the geometrical con ver gence. Thus, we ha ve kP Γ ( Q ) k ∞ = kP Γ S N ∗ −P T ( M ∗ ) k ∞ ≤ k S N ∗ −P T ( M ∗ ) k ∞ ≤ 1 1 − α k N ∗ − P T ( M ∗ ) k ∞ ≤ 1 1 − α ( k N ∗ k ∞ + kP T ( M ∗ ) k ∞ ) ≤ 1 1 − α ( k N ∗ k ∞ + α k M ∗ k ∞ ) ≤ 1 1 − α r µr n 1 n 2 + αγ . In the last inequality we use the incoherence assumptions for sparse and lo w-rank matrix. By orthonor- mality of U and V , we hav e k I − U U > k ≤ 1 and k I − V V > k ≤ 1 . Hence, kP T ⊥ ( Q ) k = kP T ⊥ M ∗ − P Γ c S N ∗ −P T ( M ∗ ) k = k I − U U > M ∗ − P Γ c S N ∗ −P T ( M ∗ ) I − V V > k ≤ k M ∗ − P Γ c S N ∗ −P T ( M ∗ ) k ≤ η d k M ∗ − P Γ c S N ∗ −P T ( M ∗ ) k ∞ ≤ η d k M ∗ k ∞ + k S N ∗ −P T ( M ∗ ) k ∞ ≤ η d γ + 1 1 − α r µr n 1 n 2 + αγ ≤ η d 1 − α r µr n 1 n 2 + γ . Here, again we are using the incoherence assumptions on the sparse and lo w-rank matrix. This concludes the proof of the lemma. 
Finally, to satisfy (20), we require
\[
\|\mathcal{P}_{T^\perp}(Q)\| \le \frac{\eta_d}{1-\alpha}\left(\sqrt{\frac{\mu r}{n_1n_2}}+\gamma\right) < 1, \qquad
\|\mathcal{P}_\Gamma(Q)\|_\infty \le \frac{1}{1-\alpha}\left(\sqrt{\frac{\mu r}{n_1n_2}}+\alpha\gamma\right) < \gamma.
\]
Combining these two inequalities, we get
\[
\frac{1}{1-2\alpha}\sqrt{\frac{\mu r}{n_1n_2}} < \gamma < \frac{1-\alpha}{\eta_d}-\sqrt{\frac{\mu r}{n_1n_2}},
\]
as stated in the assumptions of the theorem.

V. EXPERIMENTS

In this section, we illustrate the power of our method via simulation results. These results show that the behavior of the algorithm agrees with the theoretical results. We investigate how the algorithm performs as the size of the low-rank matrix grows; in other words, we examine how the requirements for the success of our algorithm change with the size of the matrix. The simulations show that the conditions become less restrictive as $n$ increases. We run three experiments, as follows.

(1) Minimum required observation probability: We generate a rank-two matrix ($r=2$) of size $n$ by multiplying a random $n\times2$ matrix and a random $2\times n$ matrix, and then corrupt the entries randomly with probability $\tau=0.1$, without any adversarial noise ($d=0$). The entries of the corrupted matrix are observed independently with probability $p_0$. We then solve (1) using the method in [14]. Success is declared if we recover the low-rank matrix with a relative error less than $10^{-6}$ measured in Frobenius norm. The experiment is repeated 10 times and we count the frequency of success. For any fixed $n$, if we start from $p_0=1$ and decrease $p_0$, at some point the frequency of success jumps from one to zero, i.e., we observe a phase transition. In Fig. 1, we plot the $p_0$ at which the phase transition happens versus the size of the matrix. This experiment shows that the phase-transition $p_0$ goes to zero as $n$ increases, as predicted by the theorem.

Fig. 1. For a rank-two matrix of size $n$, with probability of corruption $\tau=0.1$ and no adversarial noise ($d=0$), we plot the minimum probability of observation $p_0$ required for successful recovery of the low-rank matrix as $n$ gets larger.

(2) Maximum tolerable corruption probability: Similarly as before, we generate a rank-two matrix ($r=2$) of size $n$, with observation probability $p_0=0.9$ and without any adversarial noise ($d=0$). For any fixed $n$, if we start from $\tau=0$ and increase $\tau$, at some point the frequency of success jumps from one to zero. Fig. 2 illustrates how the phase-transition $\tau$ changes as the size of the matrix increases. This experiment shows that a higher probability of corruption can be tolerated as the size of the matrix increases, as predicted by the theorem.

Fig. 2. For a rank-two matrix of size $n$, with probability of observation $p_0=0.9$ and no adversarial noise ($d=0$), we plot the maximum probability of corruption $\tau$ tolerable for successful recovery of the low-rank matrix as $n$ gets larger.

(3) Maximum tolerable adversarial/deterministic noise: Similarly as before, we generate a rank-two matrix ($r=2$) of size $n$, with observation probability $p_0=0.5$ and corruption probability $\tau=0.1$. We add the adversarial noise in the form of a $d\times d$ block of 1's lying on the diagonal of the original matrix. Notice that this is potentially a hard case for recovering the low-rank matrix, since all the adversarial corruptions occur in a burst, as opposed to being spread over the matrix (as Bernoulli corruptions are). We find the maximum possible $d$ at which the frequency of success goes from 1 to 0 (the phase transition). In Fig. 3, we plot this phase-transition $d$ versus the size of the matrix; as the deterministic theorem predicts, it grows linearly in $n$.

Fig. 3. For a rank-two matrix of size $n$, with probability of observation $p_0=0.5$ and probability of corruption $\tau=0.1$, and with adversarial/deterministic noise in the form of a $d\times d$ block of 1's lying on the diagonal of the matrix, we plot the maximum size $d$ of the adversarial noise tolerable for successful recovery of the low-rank matrix as $n$ gets larger.

REFERENCES

[1] P. Huber, Robust Statistics. Wiley, New York, 1981.
[2] E. J. Candes and B. Recht, "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, vol. 9, pp. 717-772, 2009.
[3] E. J. Candes and T. Tao, "The power of convex relaxation: Near-optimal matrix completion," IEEE Transactions on Information Theory, 2009.
[4] V. Chandrasekaran, S. Sanghavi, P. Parrilo, and A. S. Willsky, "Rank-sparsity incoherence for matrix decomposition," SIAM Journal on Optimization, to appear, 2010.
[5] D. Hsu, S. Kakade, and T. Zhang, "Robust matrix decomposition with outliers," available at arXiv:1011.1518, 2010.
[6] A. Ganesh, J. Wright, X. Li, E. Candes, and Y. Ma, "Dense error correction for low-rank matrices via principal component pursuit," in IEEE International Symposium on Information Theory (ISIT), 2010.
[7] E. J. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" available at http://www-stat.stanford.edu/~candes/papers/RobustPCA.pdf, 2009.
[8] B. Recht, M. Fazel, and P. Parrilo, "Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization," 2009.
[9] D. Gross, "Recovering low-rank matrices from few coefficients in any basis," available at arXiv:0910.1879v4, 2009.
[10] V. Chandrasekaran, S. Sanghavi, P. Parrilo, and A. S. Willsky, "Sparse and low-rank matrix decompositions," in 15th IFAC Symposium on System Identification (SYSID), 2009.
[11] X. Li, "Compressed sensing and matrix completion with constant proportion of corruptions," arXiv preprint arXiv:1104.1041, 2011.
[12] B. Recht, "A simpler approach to matrix completion," arXiv preprint, 2009.
[13] R. Vershynin, "Introduction to the non-asymptotic analysis of random matrices," arXiv preprint arXiv:1011.3027, 2010.
[14] Z. Lin, M. Chen, L. Wu, and Y. Ma, "The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices," UIUC Technical Report UILU-ENG-09-2215, 2009.
[15] J. Tropp, "User-friendly tail bounds for sums of random matrices," arXiv preprint, 2010.

APPENDIX

Here we provide several technical lemmas that are needed in the proof of the unified guarantees. We first state the non-commutative Bernstein inequality, which is useful in the sequel. The version presented below was first proved in [12], [9] and later sharpened in [15].

Lemma 9 ([15, Remark 6.3]). Consider a finite sequence $\{Z_k\}$ of independent, random $n_1\times n_2$ matrices that satisfy $\mathbb{E}Z_k=0$ and $\|Z_k\|\le D$ almost surely. Let $\sigma^2=\max\big\{\big\|\sum_k\mathbb{E}Z_kZ_k^\top\big\|,\big\|\sum_k\mathbb{E}Z_k^\top Z_k\big\|\big\}$. Then for all $t\ge0$ we have
\[
\mathbb{P}\Big[\Big\|\sum_k Z_k\Big\|\ge t\Big] \le (n_1+n_2)\exp\left(\frac{-t^2}{2\sigma^2+\frac23Dt}\right) \quad (23)
\]
\[
\le \begin{cases}
(n_1+n_2)\exp\big(-\frac{3t^2}{8\sigma^2}\big), & t\le\sigma^2/D,\\
(n_1+n_2)\exp\big(-\frac{3t}{8D}\big), & t\ge\sigma^2/D.
\end{cases} \quad (24)
\]

W.l.o.g. we only consider the case $n_1=n_2=n$. Recall that we have defined
\[
\alpha = \sqrt{\frac{\mu rd}{n_1}}+\sqrt{\frac{\mu rd}{n_2}}+\sqrt{\frac{\mu rd}{\max\{n_1,n_2\}}} = 3\sqrt{\frac{\mu rd}{n}}.
\]
Under the assumptions of Theorem 2, $\alpha$ is a sufficiently small constant bounded away from 1. We will make use of the following estimate,
\[
\big\|\mathcal{P}_T(e_ie_j^\top)\big\|_F^2 \le \frac{2\mu r}{n}, \quad \forall i,j,
\]
which follows from the incoherence assumptions on $U$ and $V$. We start with the proof of Lemma 2, for which we need one simple lemma about the deterministic set $\Gamma_d^c$.

Lemma 10. For any matrix $Z\in T$, we have $\big\|\mathcal{P}_{\Gamma_d^c}(Z)\big\|_F\le\alpha\|Z\|_F$.

Proof: Since $Z\in T$, we can write $Z=UX^\top+U_\perp YV^\top$ for some $X,Y\in\mathbb{R}^{n\times r}$. For $1\le j\le n$, incoherence of $B^*$ gives
\[
\big\|UX^\top e_j\big\|_\infty = \max_i\big|e_i^\top UX^\top e_j\big| \le \sqrt{\frac{\mu r}{n}}\,\big\|X^\top e_j\big\|_2.
\]
Therefore, we have
\[
\big\|\mathcal{P}_{\Gamma_d^c}(UX^\top)e_j\big\|_2 \le \sqrt{d}\,\big\|UX^\top e_j\big\|_\infty \le \alpha\,\big\|X^\top e_j\big\|_2.
\]
It follows that
\[
\big\|\mathcal{P}_{\Gamma_d^c}(UX^\top)\big\|_F^2 = \sum_j\big\|\mathcal{P}_{\Gamma_d^c}(UX^\top)e_j\big\|_2^2 \le \sum_j\alpha^2\big\|X^\top e_j\big\|_2^2 = \alpha^2\big\|X^\top\big\|_F^2.
\]
Similarly, we have $\big\|\mathcal{P}_{\Gamma_d^c}(U_\perp YV^\top)\big\|_F^2\le\alpha^2\|Y\|_F^2$. The lemma then follows from the triangle inequality and $\|Z\|_F^2=\|X\|_F^2+\|Y\|_F^2$.

We now turn to the proof of Lemma 2. In fact, we will prove a slightly more general result, as follows.

Lemma 11. Suppose $\Omega_0$ is a set of indices obeying $\Omega_0\sim\mathrm{Ber}(p)$, and $\Gamma_0$ is a fixed set of indices.
1) For any $\beta>1$, we have
\[
\big\|p^{-1}\mathcal{P}_T\mathcal{P}_{\Omega_0\cap\Gamma_0}\mathcal{P}_T-\mathcal{P}_T\mathcal{P}_{\Gamma_0}\mathcal{P}_T\big\| \le \epsilon_1
\]
with probability at least $1-2n^{2-2\beta}$, provided $1>\epsilon_1\ge\sqrt{\frac{32\beta\mu r\log n}{3np}}$.
2) If in addition $\Gamma_0=\Gamma_d$, where $\Gamma_d$ satisfies the assumptions of Theorem 2, then
\[
\big\|p^{-1}\mathcal{P}_T\mathcal{P}_{\Omega_0\cap\Gamma_d}\mathcal{P}_T-\mathcal{P}_T\big\| \le \epsilon_1+\alpha
\]
with the same probability.

Proof: We will use Lemma 9 to bound the operator norm of the random component $p^{-1}\mathcal{P}_T\mathcal{P}_{\Omega_0\cap\Gamma_0}\mathcal{P}_T-\mathcal{P}_T\mathcal{P}_{\Gamma_0}\mathcal{P}_T$. To this end, we need to write the random component as a sum of zero-mean, independent random operators, and then show that each of them is bounded almost surely and that their sum has small second moment.

Now for the details. For $(i,j)\in\Gamma_0$, define the indicator random variables $\delta_{ij}=\mathbb{1}_{\{(i,j)\in\Omega_0\cap\Gamma_0\}}$; so $\delta_{ij}$ equals one with probability $p$ and zero otherwise, and is independent of all others. For any $Z\in T$, observe that $Z_{i,j}=\langle e_ie_j^\top,Z\rangle$ for $(i,j)\in\Gamma_0$, and thus
\[
p^{-1}\mathcal{P}_T\mathcal{P}_{\Omega_0\cap\Gamma_0}\mathcal{P}_TZ-\mathcal{P}_T\mathcal{P}_{\Gamma_0}\mathcal{P}_TZ
= \sum_{(i,j)\in\Gamma_0}\big(p^{-1}\delta_{ij}-1\big)\big\langle e_ie_j^\top,Z\big\rangle\,\mathcal{P}_T(e_ie_j^\top)
=: \sum_{(i,j)\in\Gamma_0}\mathcal{S}_{ij}(Z).
\]
Here $\mathcal{S}_{ij}:\mathbb{R}^{n\times n}\mapsto\mathbb{R}^{n\times n}$ is a self-adjoint random operator with $\mathbb{E}[\mathcal{S}_{ij}]=0$. To use the non-commutative Bernstein inequality, we need to bound $\|\mathcal{S}_{ij}\|$ and $\big\|\mathbb{E}\big[\big(\sum_{(i,j)\in\Gamma_0}\mathcal{S}_{ij}\big)^2\big]\big\|$.
To this end, we have
\[
\|\mathcal{S}_{ij}\| = \sup_{\|Z\|_F=1}\big\|(p^{-1}\delta_{ij}-1)\big\langle\mathcal{P}_T(e_ie_j^\top),Z\big\rangle\,\mathcal{P}_T(e_ie_j^\top)\big\|_F
\le \sup_{\|Z\|_F=1}p^{-1}\big\|\mathcal{P}_T(e_ie_j^\top)\big\|_F^2\,\|Z\|_F \le \frac{2\mu r}{np}.
\]
On the other hand, for any $Z\in T$ we have
\[
\mathcal{S}_{ij}^2(Z) = (p^{-1}\delta_{ij}-1)^2\,Z_{i,j}\,\big\langle\mathcal{P}_T(e_ie_j^\top),e_ie_j^\top\big\rangle\,\mathcal{P}_T(e_ie_j^\top).
\]
Therefore,
\[
\Big\|\mathbb{E}\sum_{(i,j)\in\Gamma_0}\mathcal{S}_{ij}^2(Z)\Big\|_F
= (p^{-1}-1)\Big\|\sum_{(i,j)\in\Gamma_0}\big\|\mathcal{P}_T(e_ie_j^\top)\big\|_F^2\,Z_{i,j}\,\mathcal{P}_T(e_ie_j^\top)\Big\|_F
\le (p^{-1}-1)\Big\|\sum_{(i,j)\in\Gamma_0}\big\|\mathcal{P}_T(e_ie_j^\top)\big\|_F^2\,Z_{i,j}\,e_ie_j^\top\Big\|_F
\]
\[
\le (p^{-1}-1)\,\frac{2\mu r}{n}\,\Big\|\sum_{(i,j)\in\Gamma_0}Z_{i,j}\,e_ie_j^\top\Big\|_F
= (p^{-1}-1)\,\frac{2\mu r}{n}\,\big\|\mathcal{P}_{\Gamma_0}(Z)\big\|_F
\le (p^{-1}-1)\,\frac{2\mu r}{n}\,\|Z\|_F,
\]
which means $\big\|\mathbb{E}\big[\sum_{(i,j)\in\Gamma_0}\mathcal{S}_{ij}^2\big]\big\|\le\frac{2\mu r}{np}$. When $\epsilon_1\ge\max\Big\{\sqrt{\frac{32\beta\mu r\log n}{3np}},\frac{32\beta\mu r\log n}{3np}\Big\}$, we apply Lemma 9 and obtain
\[
\mathbb{P}\Big[\Big\|\sum\mathcal{S}_{ij}\Big\|\ge\epsilon_1\Big] \le 2n^{2-2\beta}.
\]
Therefore, $\big\|p^{-1}\mathcal{P}_T\mathcal{P}_{\Omega_0\cap\Gamma_0}\mathcal{P}_T-\mathcal{P}_T\mathcal{P}_{\Gamma_0}\mathcal{P}_T\big\|<\epsilon_1$ w.h.p., which proves the first part of the lemma. On the other hand, when $\Gamma_0=\Gamma_d$, Lemma 10 gives
\[
\big\|\mathcal{P}_T\mathcal{P}_{\Gamma_d}\mathcal{P}_T-\mathcal{P}_T\big\|
= \max_{Z:\|Z\|_F=1}\big\|(\mathcal{P}_T\mathcal{P}_{\Gamma_d}\mathcal{P}_T-\mathcal{P}_T)Z\big\|_F
\le \max_{Z:\|Z\|_F=1}\alpha\,\|\mathcal{P}_TZ\|_F \le \alpha.
\]
The second part of the lemma then follows from the triangle inequality.

The next three lemmas bound the norms of certain random matrices. Their proofs follow the same spirit as that of Lemma 11, by decomposing the random component into a sum of independent, bounded variables with small second moments, and then invoking Lemma 9. The following lemma is a generalization of [2, Theorem 6.3].

Lemma 12. Suppose $\Omega_0$ is a set of indices obeying $\Omega_0\sim\mathrm{Ber}(p)$, $\Gamma_0$ is a fixed set of indices, and $Z$ is a fixed $n\times n$ matrix.
1) For any $\beta>1$, we have
\[
\Big\|\frac1p\mathcal{P}_{\Omega_0\cap\Gamma_0}Z-\mathcal{P}_{\Gamma_0}Z\Big\| \le \sqrt{\frac{8\beta n\log n}{3p}}\,\|\mathcal{P}_{\Gamma_0}Z\|_\infty
\]
with probability at least $1-2n^{1-\beta}$, provided $p\ge\frac{8\beta\log n}{3n}$.
2) If in addition $\Gamma_0=\Gamma_d$, where $\Gamma_d$ satisfies the assumptions of Theorem 2, we have
\[
\Big\|\frac1p\mathcal{P}_{\Omega_0}Z-Z\Big\| \le \left(\sqrt{\frac{8\beta n\log n}{3p}}+d\right)\|Z\|_\infty
\]
with the same probability.
Pr oof: For ( i, j ) ∈ Γ 0 define the random variable δ ij = 1 { ( i,j ) ∈ Ω 0 } . Notice that 1 p P Ω 0 ∩ Γ 0 Z − P Γ 0 Z = X ( i,j ) ∈ Γ 0 ( p − 1 δ ij − 1) Z i,j e i e > j , X ( i,j ) ∈ Γ 0 Ξ ij . Here Ξ ij ∈ R n × n satisfies E [Ξ ij ] = 0 , k Ξ ij k ≤ p − 1 kP Γ 0 Z k ∞ and E X ( i,j ) ∈ Γ 0 Ξ ij Ξ > ij = p − 1 − 1 X ( i,j ) ∈ Γ 0 Z 2 i,j e i e > i ≤ p − 1 − 1 diag X (1 ,j ) ∈ Γ 0 Z 2 1 ,j , . . . , X ( n,j ) ∈ Γ 0 Z 2 n,j ≤ p − 1 − 1 n kP Γ 0 Z k 2 ∞ ≤ p − 1 n kP Γ 0 Z k 2 ∞ . A similar calculation yields E h P ( i,j ) ∈ Γ 0 Ξ > ij Ξ ij i ≤ p − 1 n kP Γ 0 Z k 2 ∞ When p ≥ 8 β log n 3 n , we apply Lemma 9 and obtain P X ( i,j ) ∈ Γ 0 Ξ ij ≥ s 8 β n log n 3 p kP Γ 0 Z k ∞ ≤ 2 n exp − 3 8 · 8 β n log n 3 p kP Γ 0 Z k 2 ∞ n p kP Γ 0 Z k 2 ∞ ! ≤ 2 n 1 − β . Therefore, 1 p P Ω 0 ∩ Γ 0 Z − P Γ 0 Z ≤ q 8 β n log n 3 p kP Γ 0 Z k ∞ w .h.p., which pro ves the first part of the lemma. On the other hand, when Γ 0 = Γ d , [4, Proposition 3] giv es kP Γ d Z − Z k = P Γ c d Z ≤ d P Γ c d Z ∞ . The second part of the lemma then follo ws from the triangle inequality . The following lemma is a generalization of [7, Lemma 3.1]. 24 Lemma 13. Suppose Ω 0 is a set of indices obe ying Ω 0 ∼ Ber ( p ) , Γ 0 is a fixed set of indices, and Z is a fixed n × n matrix in T . 1) F or any β > 1 and 3 < 1 , we have 1 p P T P Ω 0 ∩ Γ 0 P T Z − P T P Γ 0 P T Z ∞ ≤ 3 k Z k ∞ with pr obability at least 1 − 2 n 2 − 2 β pr ovided p ≥ 32 β µr log n 3 n 2 3 . 2) If in addition, Γ 0 = Γ d , wher e Γ d satisfies the assumptions in Theorem 2, we have 1 p P T P Ω 0 P T Z − Z ∞ ≤ ( 3 + α ) k Z k ∞ with the same pr obability . Pr oof: For ( i, j ) ∈ Γ 0 , set δ ij = 1 { ( i,j ) ∈ Ω 0 } . Fix ( a, b ) ∈ [ n ] × [ n ] . Notice that 1 p P T P Ω 0 ∩ Γ 0 P T Z − P T P Γ 0 P T Z a,b = X ( i,j ) ∈ Γ 0 ( p − 1 δ ij − 1) Z i,j P T ( e i e > j ) , e a e > b , X ( i,j ) ∈ Γ 0 ξ ij where E [ ξ ij ] = 0 . 
For $(i,j)\in\Gamma_0$, we have
\[
|\xi_{ij}| \le p^{-1}\big\|\mathcal{P}_T(e_ie_j^\top)\big\|_F\big\|\mathcal{P}_T(e_ae_b^\top)\big\|_F\,|Z_{i,j}| \le \frac{2\mu r}{np}\,\|\mathcal{P}_{\Gamma_0}Z\|_\infty.
\]
The second moment is bounded by
\[
\mathbb{E}\sum_{(i,j)\in\Gamma_0}\xi_{ij}^2
= \sum_{(i,j)\in\Gamma_0}\mathbb{E}\big[(p^{-1}\delta_{ij}-1)^2\big]\,\big\langle\mathcal{P}_T(e_ie_j^\top),e_ae_b^\top\big\rangle^2\,Z_{i,j}^2
\le (p^{-1}-1)\,\|\mathcal{P}_{\Gamma_0}Z\|_\infty^2\sum_{(i,j)\in\Gamma_0}\big\langle e_ie_j^\top,\mathcal{P}_T(e_ae_b^\top)\big\rangle^2
\]
\[
= (p^{-1}-1)\,\|\mathcal{P}_{\Gamma_0}Z\|_\infty^2\,\big\|\mathcal{P}_{\Gamma_0}\mathcal{P}_T(e_ae_b^\top)\big\|_F^2
\le (p^{-1}-1)\,\frac{2\mu r}{n}\,\|\mathcal{P}_{\Gamma_0}Z\|_\infty^2
\le \frac{2\mu r}{np}\,\|\mathcal{P}_{\Gamma_0}Z\|_\infty^2.
\]
When $p\ge\frac{32\beta\mu r\log n}{3n\epsilon_3^2}$ and $\epsilon_3<1$, we apply Lemma 9 and obtain
\[
\mathbb{P}\left[\Big|\Big(\frac1p\mathcal{P}_T\mathcal{P}_{\Omega_0\cap\Gamma_0}\mathcal{P}_TZ-\mathcal{P}_T\mathcal{P}_{\Gamma_0}\mathcal{P}_TZ\Big)_{a,b}\Big|\ge\epsilon_3\|\mathcal{P}_{\Gamma_0}Z\|_\infty\right]
\le 2\exp\left(-\frac{3\epsilon_3^2\,\|\mathcal{P}_{\Gamma_0}Z\|_\infty^2}{8\cdot\frac{2\mu r}{np}\|\mathcal{P}_{\Gamma_0}Z\|_\infty^2}\right) \le 2n^{-2\beta}.
\]
A union bound then yields
\[
\Big\|\frac1p\mathcal{P}_T\mathcal{P}_{\Omega_0\cap\Gamma_0}\mathcal{P}_TZ-\mathcal{P}_T\mathcal{P}_{\Gamma_0}\mathcal{P}_TZ\Big\|_\infty \le \epsilon_3\,\|\mathcal{P}_{\Gamma_0}Z\|_\infty
\]
with high probability, which proves the first part of the lemma. On the other hand, when $\Gamma_0=\Gamma_d$, by (21) we have $\|\mathcal{P}_T\mathcal{P}_{\Gamma_d}\mathcal{P}_TZ-Z\|_\infty=\|\mathcal{P}_T\mathcal{P}_{\Gamma_d^c}Z\|_\infty\le\alpha\|Z\|_\infty$. The second part of the lemma then follows from the triangle inequality.

The next two lemmas bound $\|\mathcal{P}_TE^*\|_\infty$.

Lemma 14. Under the assumptions of Theorem 2, we have $\|\mathcal{P}_T\mathcal{P}_{\Omega_d}E^*\|_\infty\le\alpha$.

Proof: By assumption, $\Omega_d$ contains at most $d$ entries in each row/column, so repeating the proof of Lemma 6 yields the desired bound.

Lemma 15. Under the assumptions of Theorem 2 and conditioned on $\Omega_r$, we have
\[
\big\|\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*\big\|_\infty \le C\sqrt{\frac{\mu r}{n}\,p_0\log n}
\]
with high probability, for some constant $C>0$.

Proof: Set $E=\mathcal{P}_{\Omega_r\setminus\Omega_d}E^*$, and observe that each entry of $E$ in $(\Omega_r\cap\Phi_d)\setminus\Omega_d$ is non-zero with probability $p_0$ and has a random sign, independently of the other entries. Since
\[
\|\mathcal{P}_TE\|_\infty = \|\mathcal{P}_UE+\mathcal{P}_VE-\mathcal{P}_U\mathcal{P}_VE\|_\infty
\le \big\|UU^\top E\big\|_\infty+\big\|EVV^\top\big\|_\infty+\big\|UU^\top EVV^\top\big\|_\infty,
\]
it suffices to bound these three terms. From the incoherence property of $U$, we know
\[
\big\|UU^\top\big\|_\infty = \max_{i,j}\big|e_i^\top UU^\top e_j\big| \le \frac{\mu r}{n},
\qquad \big\|e_i^\top UU^\top\big\|_2^2 \le \frac{\mu r}{n}, \quad \forall i.
\]
Now we bound $\|UU^\top E\|_\infty$. For simplicity, we focus on the $(1,1)$ entry of $UU^\top E$ and denote it by $X$. Set $s^\top=e_1^\top UU^\top$.
Observe that $X=\sum_{i:(i,1)\in(\Omega_r\cap\Phi_d)\setminus\Omega_d}s_iE_{i,1}$, where the $E_{i,1}$'s are i.i.d., with
\[
\mathbb{E}[s_iE_{i,1}]=0, \qquad |s_iE_{i,1}|\le|s_i|\le\frac{\mu r}{n} \ \text{a.s.}, \qquad
\operatorname{Var}(X)=\sum_{i:(i,1)\in(\Omega_r\cap\Phi_d)\setminus\Omega_d}s_i^2\,p_0 \le \frac{\mu r}{n}\,p_0.
\]
The standard Bernstein inequality (24) thus gives
\[
\mathbb{P}[|X|>t] \le 2\exp\left(-\frac{t^2}{2\frac{\mu r}{n}p_0+\frac{2\mu r}{3n}t}\right).
\]
Under the assumptions of Theorem 2, we can choose $t=C\max\big\{\frac{\mu r}{n}\log n,\sqrt{\frac{\mu r}{n}p_0\log n}\big\}$ for some $C$ sufficiently large and apply the union bound to obtain
\[
\big\|UU^\top E\big\|_\infty \le C\max\left\{\frac{\mu r}{n}\log n,\ \sqrt{\frac{\mu r}{n}p_0\log n}\right\} \quad \text{w.h.p.}
\]
Similarly, $\|EVV^\top\|_\infty$ is also bounded by the right-hand side of the above equation. Finally, denote $w:=VV^\top e_1$ and observe that
\[
\big(UU^\top EVV^\top\big)_{1,1} = \sum_{(i,j)\in(\Omega_r\cap\Phi_d)\setminus\Omega_d}s_iw_jE_{i,j}.
\]
Then a similar application of the Bernstein inequality and the union bound gives
\[
\big\|UU^\top EVV^\top\big\|_\infty \le C'\max\left\{\frac{\mu^2r^2}{n^2}\log n,\ \frac{\mu r}{n}\sqrt{p_0}\log n\right\} \quad \text{w.h.p.}
\]
The lemma follows from observing that $\frac{\mu r}{n}\le1$ and $p_0\ge\frac{\mu r\log n}{n}$ under the assumptions of Theorem 2.

Finally, we prove Lemma 4.

Proof (of Lemma 4): Recall that by definition $\Omega_r^c=\bigcup_{k=1}^{k_0}\Omega^{(k)}$, so we have
\[
\lambda\Big(\frac{1}{q_1}\mathcal{P}_{\Omega^{(1)}\cap\Gamma_d}-\mathcal{I}\Big)\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}(E^*)
= \lambda(\mathcal{P}_{\Gamma_d}-\mathcal{I})\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}(E^*)
+ \lambda\Big(\frac{1}{q_1}\mathcal{P}_{\Omega^{(1)}}-\mathcal{I}\Big)\mathcal{P}_{\Gamma_d}\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}(E^*)
\]
\[
= \lambda(\mathcal{P}_{\Gamma_d}-\mathcal{I})\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}(E^*)
+ \lambda\Big(\frac{1}{q_1}\mathcal{P}_{\Omega^{(1)}}-\mathcal{I}\Big)\mathcal{P}_{\Gamma_d}\mathcal{P}_T\mathcal{P}_{(\Omega^{(1)})^c\cap(\Omega^{(2)})^c\cap\cdots\cap(\Omega^{(k_0)})^c\setminus\Omega_d}(E^*)
\]
\[
=: \lambda(\mathcal{P}_{\Gamma_d}-\mathcal{I})\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}(E^*)
+ \lambda\Big(\frac{1}{q_1}\mathcal{P}_{\Omega^{(1)}}-\mathcal{I}\Big)\mathcal{P}_{\Gamma_d}\mathcal{P}_T\mathcal{P}_{(\Omega^{(1)})^c}E,
\]
where $E$ is a matrix with independent random-signed entries supported on $\Phi\cap(\Omega^{(2)})^c\cap\cdots\cap(\Omega^{(k_0)})^c\setminus\Omega_d$. The operator norm of the first term is bounded using [4, Proposition 3] and Lemma 15 as
\[
d\,\lambda\,\big\|\mathcal{P}_T\mathcal{P}_{\Omega_r\setminus\Omega_d}(E^*)\big\|_\infty
\le d\cdot C\lambda\sqrt{\frac{\mu r}{n}\,p_0\log n} \le C'.
\]
Let $\delta_{ab}=\mathbb{1}_{\{(a,b)\in\Omega^{(1)}\}}$; then the second term can be decomposed as
\[
\lambda\Big(\frac{1}{q_1}\mathcal{P}_{\Omega^{(1)}}-\mathcal{I}\Big)\mathcal{P}_{\Gamma_d}\mathcal{P}_T\mathcal{P}_{(\Omega^{(1)})^c}E
= \sum_{a,b,a',b'}\lambda\Big(\frac{1}{q_1}\delta_{ab}-1\Big)(1-\delta_{a'b'})\,E_{a',b'}\,\big\langle\mathcal{P}_{\Gamma_d}\mathcal{P}_T(e_{a'}e_{b'}^\top),e_ae_b^\top\big\rangle\,e_ae_b^\top
= \sum_{(a',b')=(a,b)}+\sum_{(a',b')\ne(a,b)}.
\]
We bound the operator norms of the above two terms separately. The diagonal term is bounded as
\[
\Big\|\sum_{(a',b')=(a,b)}\Big\|
= \Big\|\sum_{a,b}\lambda(\delta_{ab}-1)\,X_{a,b}\,e_ae_b^\top\Big\|
\le \Big\|\sum_{a,b}\lambda(\delta_{ab}-q_1)\,X_{a,b}\,e_ae_b^\top\Big\|+\Big\|\sum_{a,b}\lambda(q_1-1)\,X_{a,b}\,e_ae_b^\top\Big\|
= q_1\lambda\Big\|\Big(\frac{1}{q_1}\mathcal{P}_{\Omega^{(1)}}-\mathcal{I}\Big)X\Big\|+\Big\|\sum_{a,b}\lambda(q_1-1)\,X_{a,b}\,e_ae_b^\top\Big\|,
\]
where $X_{a,b}=E_{a,b}\big\langle\mathcal{P}_{\Gamma_d}\mathcal{P}_T(e_ae_b^\top),e_ae_b^\top\big\rangle$. The first part of Lemma 12 with $\Omega_0=\Omega^{(1)}$ and $\Gamma_0=[n]\times[n]$ bounds the first term by
\[
q_1\lambda\,C\sqrt{\frac{n\log n}{q_1}}\,\|X\|_\infty \le q_1\lambda\,C\sqrt{\frac{n\log n}{q_1}}\cdot\frac{2\mu r}{n} \le C'.
\]
We then apply [2, Lemma 6.4] and a standard bound on the operator norm of a random matrix to bound the second term by
\[
\lambda(1-q_1)\,\frac{2\mu r}{n}\,\|E\| \le \lambda\,\frac{2\mu r}{n}\cdot\sqrt{np_0}\log n \le C.
\]
The off-diagonal term can be expressed as
\[
\sum_{(a',b')\ne(a,b)}
= \sum_{(a',b')\ne(a,b)}\lambda\Big(\frac{1}{q_1}\delta_{a'b'}-1\Big)(1-\delta_{ab})\,E_{ab}\,\big\langle\mathcal{P}_{\Gamma_d}\mathcal{P}_T(e_ae_b^\top),e_{a'}e_{b'}^\top\big\rangle\,e_{a'}e_{b'}^\top
\]
\[
= \frac{\lambda}{q_1}\sum_{(a',b')\ne(a,b)}(\delta_{a'b'}-q_1)(q_1-\delta_{ab})\,E_{ab}\,\big\langle\mathcal{P}_{\Gamma_d}\mathcal{P}_T(e_ae_b^\top),e_{a'}e_{b'}^\top\big\rangle\,e_{a'}e_{b'}^\top
+ \frac{\lambda}{q_1}\sum_{(a',b')\ne(a,b)}(\delta_{a'b'}-q_1)(1-q_1)\,E_{ab}\,\big\langle\mathcal{P}_{\Gamma_d}\mathcal{P}_T(e_ae_b^\top),e_{a'}e_{b'}^\top\big\rangle\,e_{a'}e_{b'}^\top.
\]
The operator norm of the first term can be bounded using the decoupling argument in [2]. In particular, we can repeat the proof of [2, Lemma 6.7] with $p=q_1$, $\xi_{ab}=\delta_{ab}-q_1$, and $\big\|\mathcal{P}_{\Gamma_d}\mathcal{P}_T(e_ae_b^\top)\big\|_F^2\le\frac{2\mu r}{n}$ to bound the first term by $C'\lambda\sqrt{\mu r}\log n\,\|E\|_\infty\le C$.
Let $H_{a'b'}=\sum_{(a,b):(a,b)\ne(a',b')}E_{a,b}\big\langle\mathcal{P}_{\Gamma_d}\mathcal{P}_T(e_ae_b^\top),e_{a'}e_{b'}^\top\big\rangle$; then the second term can be bounded as
\[
\Big\|\frac{\lambda(1-q_1)}{q_1}\sum_{(a',b')\ne(a,b)}(\delta_{a'b'}-q_1)\,E_{a,b}\,\big\langle\mathcal{P}_{\Gamma_d}\mathcal{P}_T(e_ae_b^\top),e_{a'}e_{b'}^\top\big\rangle\,e_{a'}e_{b'}^\top\Big\|
\le \lambda\Big\|\sum_{a',b'}\Big(\frac{1}{q_1}\delta_{a'b'}-1\Big)\Big(\sum_{(a,b):(a,b)\ne(a',b')}E_{a,b}\big\langle\mathcal{P}_{\Gamma_d}\mathcal{P}_T(e_ae_b^\top),e_{a'}e_{b'}^\top\big\rangle\Big)e_{a'}e_{b'}^\top\Big\|
\]
\[
= \lambda\Big\|\Big(\frac{1}{q_1}\mathcal{P}_{\Omega^{(1)}}-\mathcal{I}\Big)H\Big\|
\le \lambda\,C\sqrt{\frac{n\log n}{q_1}}\,\|H\|_\infty, \quad (25)
\]
where we use the first part of Lemma 12 with $\Omega_0=\Omega^{(1)}$ and $\Gamma_0=[n]\times[n]$ in the last inequality. Further observe that
\[
H_{a',b'} = \Big(\sum_{a,b}E_{a,b}\big\langle\mathcal{P}_{\Gamma_d}\mathcal{P}_T(e_ae_b^\top),e_{a'}e_{b'}^\top\big\rangle\Big) - E_{a',b'}\big\langle\mathcal{P}_{\Gamma_d}\mathcal{P}_T(e_{a'}e_{b'}^\top),e_{a'}e_{b'}^\top\big\rangle
= \big(\mathcal{P}_{\Gamma_d}\mathcal{P}_TE\big)_{a',b'} - E_{a',b'}\big\langle\mathcal{P}_{\Gamma_d}\mathcal{P}_T(e_{a'}e_{b'}^\top),e_{a'}e_{b'}^\top\big\rangle,
\]
so we have
\[
\|H\|_\infty \le \|\mathcal{P}_TE\|_\infty + \|E\|_\infty\,\big\|\mathcal{P}_T(e_{a'}e_{b'}^\top)\big\|_F^2
\le C\sqrt{\frac{\mu r}{n}\,p_0\log n} + \frac{2\mu r}{n}
\le C'\sqrt{\frac{\mu r}{n}\,p_0\log n},
\]
where we use Lemma 15. It follows that the right-hand side of (25) is bounded by
\[
\lambda\sqrt{\frac{n\log n}{q_1}}\cdot C'\sqrt{\frac{\mu r}{n}\,p_0\log n} \le C''.
\]
This completes the proof of the lemma.
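The sparse-plus-low-rank recovery evaluated in the experiments of Section V can be prototyped in a few lines. The sketch below is a simplified inexact augmented Lagrangian iteration in the spirit of [14] for the fully observed case ($p_0=1$); the step-size heuristic, growth factor, iteration count, and $\lambda=1/\sqrt{n}$ are our own illustrative choices, not the tuned solver used in the paper's experiments.

```python
import numpy as np

def pcp(D, lam, iters=300):
    """Split D into low-rank L plus sparse S by alternating
    singular-value thresholding and entrywise soft-thresholding
    (an inexact augmented Lagrangian scheme in the spirit of [14])."""
    mu = 0.25 * D.size / np.abs(D).sum()  # heuristic initial step size
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    Y = np.zeros_like(D)
    for _ in range(iters):
        # L-step: singular-value thresholding of D - S + Y/mu at level 1/mu
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-step: soft-threshold the residual entrywise at level lam/mu
        R = D - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        # dual ascent on the constraint D = L + S, then grow mu
        Y = Y + mu * (D - L - S)
        mu = min(mu * 1.05, 1e7)
    return L, S

rng = np.random.default_rng(1)
n, r, tau = 60, 2, 0.1
B = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))      # rank-2 target
A = (rng.random((n, n)) < tau) * rng.standard_normal((n, n)) * 10.0  # sparse errors
L, S = pcp(B + A, lam=1.0 / np.sqrt(n))
rel_err = np.linalg.norm(L - B) / np.linalg.norm(B)
print(f"relative recovery error: {rel_err:.2e}")
assert rel_err < 1e-2
```

Repeating this over decreasing observation probabilities (with a masked variant of the constraint) reproduces the phase-transition behavior plotted in Figs. 1-3.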