Universal low-rank matrix recovery from Pauli measurements

Yi-Kai Liu
Applied and Computational Mathematics Division
National Institute of Standards and Technology
Gaithersburg, MD, USA
yi-kai.liu@nist.gov

Abstract

We study the problem of reconstructing an unknown matrix M of rank r and dimension d using O(rd poly log d) Pauli measurements. This has applications in quantum state tomography, and is a non-commutative analogue of a well-known problem in compressed sensing: recovering a sparse vector from a few of its Fourier coefficients. We show that almost all sets of O(rd log^6 d) Pauli measurements satisfy the rank-r restricted isometry property (RIP). This implies that M can be recovered from a fixed ("universal") set of Pauli measurements, using nuclear-norm minimization (e.g., the matrix Lasso), with nearly-optimal bounds on the error. A similar result holds for any class of measurements that use an orthonormal operator basis whose elements have small operator norm. Our proof uses Dudley's inequality for Gaussian processes, together with bounds on covering numbers obtained via entropy duality.

1 Introduction

Low-rank matrix recovery is the following problem: let M be some unknown matrix of dimension d and rank r ≪ d, and let A_1, A_2, ..., A_m be a set of measurement matrices; then can one reconstruct M from its inner products tr(M* A_1), tr(M* A_2), ..., tr(M* A_m)? This problem has many applications in machine learning [1, 2], e.g., collaborative filtering (the Netflix problem). Remarkably, it turns out that for many useful choices of measurement matrices, low-rank matrix recovery is possible, and can even be done efficiently.
For example, when the A_i are Gaussian random matrices, then it is known that m = O(rd) measurements are sufficient to uniquely determine M, and furthermore, M can be reconstructed by solving a convex program (minimizing the nuclear norm) [3, 4, 5]. Another example is the "matrix completion" problem, where the measurements return a random subset of matrix elements of M; in this case, m = O(rd poly log d) measurements suffice, provided that M satisfies some "incoherence" conditions [6, 7, 8, 9, 10].

The focus of this paper is on a different class of measurements, known as Pauli measurements. Here, the A_i are randomly chosen elements of the Pauli basis, a particular orthonormal basis of C^{d×d}. The Pauli basis is a non-commutative analogue of the Fourier basis in C^d; thus, low-rank matrix recovery using Pauli measurements can be viewed as a generalization of the idea of compressed sensing of sparse vectors using their Fourier coefficients [11, 12]. In addition, this problem has applications in quantum state tomography, the task of learning an unknown quantum state by performing measurements [13]. This is because most quantum states of physical interest are accurately described by density matrices that have low rank; and Pauli measurements are especially easy to carry out in an experiment (due to the tensor product structure of the Pauli basis).

In this paper we show stronger results on low-rank matrix recovery from Pauli measurements. Previously [13, 8], it was known that, for every rank-r matrix M ∈ C^{d×d}, almost all choices of m = O(rd poly log d) random Pauli measurements will lead to successful recovery of M. Here we show a stronger statement: there is a fixed ("universal") set of m = O(rd poly log d) Pauli measurements, such that for all rank-r matrices M ∈ C^{d×d}, we have successful recovery.¹
We do this by showing that the random Pauli sampling operator obeys the "restricted isometry property" (RIP). Intuitively, RIP says that the sampling operator is an approximate isometry, acting on the set of all low-rank matrices. In geometric terms, it says that the sampling operator embeds the manifold of low-rank matrices into O(rd poly log d) dimensions, with low distortion in the 2-norm. RIP for low-rank matrices is a very strong property, and prior to this work, it was only known to hold for very unstructured types of random measurements, such as Gaussian measurements [3], which are unsuitable for most applications. RIP was known to fail in the matrix completion case, and whether it held for Pauli measurements was an open question.

Once we have established RIP for Pauli measurements, we can use known results [3, 4, 5] to show low-rank matrix recovery from a universal set of Pauli measurements. In particular, using [5], we can get nearly-optimal universal bounds on the error of the reconstructed density matrix, when the data are noisy; and we can even get bounds on the recovery of arbitrary (not necessarily low-rank) matrices. These RIP-based bounds are qualitatively stronger than those obtained using "dual certificates" [14] (though the latter technique is applicable in some situations where RIP fails).

In the context of quantum state tomography, this implies that, given a quantum state that consists of a low-rank component M_r plus a residual full-rank component M_c, we can reconstruct M_r up to an error that is not much larger than M_c. In particular, let ‖·‖_* denote the nuclear norm, and let ‖·‖_F denote the Frobenius norm. Then the error can be bounded in the nuclear norm by O(‖M_c‖_*) (assuming noiseless data), and it can be bounded in the Frobenius norm by O(‖M_c‖_F poly log d) (which holds even with noisy data²).
This shows that our reconstruction is nearly as good as the best rank-r approximation to M (which is given by the truncated SVD). In addition, a completely arbitrary quantum state can be reconstructed up to an error of O(1/√r) in Frobenius norm. Lastly, the RIP gives some insight into the optimal design of tomography experiments, in particular, the tradeoff between the number of measurement settings (which is essentially m), and the number of repetitions of the experiment at each setting (which determines the statistical noise that enters the data) [15].

These results can be generalized beyond the class of Pauli measurements. Essentially, one can replace the Pauli basis with any orthonormal basis of C^{d×d} that is incoherent, i.e., whose elements have small operator norm (of order O(1/√d), say); a similar generalization was noted in the earlier results of [8]. Also, our proof shows that the RIP actually holds in a slightly stronger sense: it holds not just for all rank-r matrices, but for all matrices X that satisfy ‖X‖_* ≤ √r ‖X‖_F.

To prove this result, we combine a number of techniques that have appeared elsewhere. RIP results were previously known for Gaussian measurements and some of their close relatives [3]. Also, restricted strong convexity (RSC), a similar but somewhat weaker property, was recently shown in the context of the matrix completion problem (with additional "non-spikiness" conditions) [10]. These results follow from covering arguments (i.e., using a concentration inequality to upper-bound the failure probability on each individual low-rank matrix X, and then taking the union bound over all such X). Showing RIP for Pauli measurements seems to be more delicate, however. Pauli measurements have more structure and less randomness, so the concentration of measure phenomena are weaker, and the union bound no longer gives the desired result.
Instead, one must take into account the favorable correlations between the behavior of the sampling operator on different matrices — intuitively, if two low-rank matrices M and M′ have overlapping supports, then good behavior on M is positively correlated with good behavior on M′. This can be done by transforming the problem into a Gaussian process, and using Dudley's entropy bound. This is the same approach used in classical compressed sensing, to show RIP for Fourier measurements [12, 11]. The key difference is that in our case, the Gaussian process is indexed by low-rank matrices, rather than sparse vectors. To bound the correlations in this process, one then needs to bound the covering numbers of the nuclear-norm ball (of matrices), rather than the ℓ_1 ball (of vectors). This requires a different technique, using entropy duality, which is due to Guédon et al [16]. (See also the related work in [17].)

As a side note, we remark that matrix recovery can sometimes fail because there exist large sets of up to d Pauli matrices that all commute, i.e., they have a simultaneous eigenbasis φ_1, ..., φ_d. (These φ_i are of interest in quantum information — they are called stabilizer states [18].) If one were to measure such a set of Paulis, one would gain complete knowledge about the diagonal elements of the unknown matrix M in the φ_i basis, but one would learn nothing about the off-diagonal elements. This is reminiscent of the difficulties that arise in matrix completion. However, in our case, these pathological cases turn out to be rare, since it is unlikely that a random subset of Pauli matrices will all commute.

¹ Note that in the universal result, m is slightly larger, by a factor of poly log d.
² However, this bound is not universal.
Finally, we note that there is a large body of related work on estimating a low-rank matrix by solving a regularized convex program; see, e.g., [19, 20].

This paper is organized as follows. In section 2, we state our results precisely, and discuss some specific applications to quantum state tomography. In section 3 we prove the RIP for Pauli matrices, and in section 4 we discuss some directions for future work. Some technical details appear in sections A and B.

Notation: For vectors, ‖·‖_2 denotes the ℓ_2 norm. For matrices, ‖·‖_p denotes the Schatten p-norm, ‖X‖_p = (Σ_i σ_i(X)^p)^{1/p}, where σ_i(X) are the singular values of X. In particular, ‖·‖_* = ‖·‖_1 is the trace or nuclear norm, ‖·‖_F = ‖·‖_2 is the Frobenius norm, and ‖·‖ = ‖·‖_∞ is the operator norm. Finally, for matrices, A* is the adjoint of A, and (·,·) is the Hilbert-Schmidt inner product, (A, B) = tr(A*B). Calligraphic letters denote superoperators acting on matrices. Also, |A)(A| is the superoperator that maps every matrix X ∈ C^{d×d} to the matrix A tr(A*X).

2 Our Results

We will consider the following approach to low-rank matrix recovery. Let M ∈ C^{d×d} be an unknown matrix of rank at most r. Let W_1, ..., W_{d^2} be an orthonormal basis for C^{d×d}, with respect to the inner product (A, B) = tr(A*B). We choose m basis elements, S_1, ..., S_m, iid uniformly at random from {W_1, ..., W_{d^2}} ("sampling with replacement"). We then observe the coefficients (S_i, M). From this data, we want to reconstruct M.

For this to be possible, the measurement matrices W_i must be "incoherent" with respect to M. Roughly speaking, this means that the inner products (W_i, M) must be small. Formally, we say that the basis W_1, ..., W_{d^2} is incoherent if the W_i all have small operator norm,

    ‖W_i‖ ≤ K/√d,   (1)

where K is a constant.³ (This assumption was also used in [8].)
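For concreteness, the incoherence condition (1) can be checked numerically for the Pauli basis used throughout this paper. The following sketch (an illustration, not part of the paper's argument; the function name `pauli_basis` and the choice n = 3 are ours) builds the normalized n-qubit Pauli matrices with NumPy, and verifies both orthonormality and ‖W_i‖ = 1/√d, i.e., K = 1.

```python
import itertools
import numpy as np

# The four single-qubit Pauli matrices.
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_basis(n):
    """Return the d^2 normalized Pauli matrices (P_1 x ... x P_n)/sqrt(d), d = 2^n."""
    d = 2 ** n
    basis = []
    for labels in itertools.product("IXYZ", repeat=n):
        W = np.array([[1.0]], dtype=complex)
        for s in labels:
            W = np.kron(W, PAULIS[s])
        basis.append(W / np.sqrt(d))
    return basis

n = 3
d = 2 ** n
basis = pauli_basis(n)

# Orthonormality: (W_i, W_j) = tr(W_i* W_j) = delta_{ij}.
G = np.array([[np.trace(Wi.conj().T @ Wj) for Wj in basis] for Wi in basis])
assert np.allclose(G, np.eye(d * d))

# Incoherence (1): every element has operator norm exactly K/sqrt(d) with K = 1.
norms = [np.linalg.norm(W, ord=2) for W in basis]
assert np.allclose(norms, 1 / np.sqrt(d))
```

Each tensor product of unitary Pauli matrices has operator norm 1, so after the 1/√d normalization the bound (1) is saturated with K = 1.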
Before proceeding further, let us sketch the connection between this problem and quantum state tomography. Consider a system of n qubits, with Hilbert space dimension d = 2^n. We want to learn the state of the system, which is described by a density matrix ρ ∈ C^{d×d}; ρ is positive semidefinite, has trace 1, and has rank r ≪ d when the state is nearly pure. There is a class of convenient (and experimentally feasible) measurements, which are described by Pauli matrices (also called Pauli observables). These are matrices of the form P_1 ⊗ ··· ⊗ P_n, where ⊗ denotes the tensor product (Kronecker product), and each P_i is a 2×2 matrix chosen from the following four possibilities:

    I = (1 0; 0 1),   σ_x = (0 1; 1 0),   σ_y = (0 −i; i 0),   σ_z = (1 0; 0 −1).   (2)

One can estimate expectation values of Pauli observables, which are given by (ρ, (P_1 ⊗ ··· ⊗ P_n)). This is a special case of the above measurement model, where the measurement matrices W_i are the (scaled) Pauli observables (P_1 ⊗ ··· ⊗ P_n)/√d, and they are incoherent with ‖W_i‖ ≤ K/√d, K = 1.

Now we return to our discussion of the general problem. We choose S_1, ..., S_m iid uniformly at random from {W_1, ..., W_{d^2}}, and we define the sampling operator A : C^{d×d} → C^m as

    (A(X))_i = (d/√m) tr(S_i* X),   i = 1, ..., m.   (3)

The normalization is chosen so that E A*A = I. (Note that A*A = (d^2/m) Σ_{j=1}^m |S_j)(S_j|.) We assume we are given the data y = A(M) + z, where z ∈ C^m is some (unknown) noise contribution. We will construct an estimator M̂ by minimizing the nuclear norm, subject to the constraints specified by y. (Note that one can view the nuclear norm as a convex relaxation of the rank function — thus these estimators can be computed efficiently.)

³ Note that ‖W_i‖ is the maximum inner product between W_i and any rank-1 matrix M (normalized so that ‖M‖_F = 1).
One approach is the matrix Dantzig selector:

    M̂ = arg min_X ‖X‖_*   such that   ‖A*(y − A(X))‖ ≤ λ.   (4)

Alternatively, one can solve a regularized least-squares problem, also called the matrix Lasso:

    M̂ = arg min_X (1/2)‖A(X) − y‖_2^2 + µ‖X‖_*.   (5)

Here, the parameters λ and µ are set according to the strength of the noise component z (we will discuss this later). We will be interested in bounding the error of these estimators. To do this, we will show that the sampling operator A satisfies the restricted isometry property (RIP).

2.1 RIP for Pauli Measurements

Fix some constant 0 ≤ δ < 1. Fix d, and some set U ⊂ C^{d×d}. We say that A satisfies the restricted isometry property (RIP) over U if, for all X ∈ U, we have

    (1 − δ)‖X‖_F ≤ ‖A(X)‖_2 ≤ (1 + δ)‖X‖_F.   (6)

(Here, ‖A(X)‖_2 denotes the ℓ_2 norm of a vector, while ‖X‖_F denotes the Frobenius norm of a matrix.) When U is the set of all X ∈ C^{d×d} with rank r, this is precisely the notion of RIP studied in [3, 5]. We will show that Pauli measurements satisfy the RIP over a slightly larger set (the set of all X ∈ C^{d×d} such that ‖X‖_* ≤ √r ‖X‖_F), provided the number of measurements m is at least Ω(rd poly log d). This result generalizes to measurements in any basis with small operator norm.

Theorem 2.1  Fix some constant 0 ≤ δ < 1. Let {W_1, ..., W_{d^2}} be an orthonormal basis for C^{d×d} that is incoherent in the sense of (1). Let m = CK^2 · rd log^6 d, for some constant C that depends only on δ, C = O(1/δ^2). Let A be defined as in (3). Then, with high probability (over the choice of S_1, ..., S_m), A satisfies the RIP over the set of all X ∈ C^{d×d} such that ‖X‖_* ≤ √r ‖X‖_F. Furthermore, the failure probability is exponentially small in δ^2 C.

We will prove this theorem in section 3.
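The matrix Lasso (5) can be solved by standard proximal methods. The sketch below (an illustration under our own choices, not the paper's implementation: the problem sizes n = 3, r = 1, m = 150, the regularization µ, and the plain ISTA iteration are all assumptions) builds the Pauli sampling operator (3) as an explicit matrix, generates noiseless data from a random rank-1 "density matrix", and runs proximal gradient descent with singular-value soft-thresholding, which is the proximal operator of the nuclear norm.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Normalized Pauli basis for n qubits (d = 2^n). ---
I2 = np.eye(2, dtype=complex)
X2 = np.array([[0, 1], [1, 0]], dtype=complex)
Y2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z2 = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_basis(n):
    mats = [np.array([[1.0]], dtype=complex)]
    for _ in range(n):
        mats = [np.kron(W, P) for W in mats for P in (I2, X2, Y2, Z2)]
    return [W / np.sqrt(2 ** n) for W in mats]

n, r, m = 3, 1, 150
d = 2 ** n
basis = pauli_basis(n)

# Sampling operator (3) as an explicit m x d^2 matrix acting on vec(X).
idx = rng.integers(0, d * d, size=m)               # sampling with replacement
A = np.stack([(d / np.sqrt(m)) * basis[i].conj().reshape(-1) for i in idx])

# Unknown rank-r density-matrix-like M, and noiseless data y = A(M).
Gm = rng.normal(size=(d, r)) + 1j * rng.normal(size=(d, r))
M = Gm @ Gm.conj().T
M /= np.trace(M).real                              # trace 1, positive semidefinite
y = A @ M.reshape(-1)

def svt(X, tau):
    """Singular-value soft thresholding: the prox of tau * nuclear norm."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vh

# Proximal gradient (ISTA) on (5): 0.5 * ||A(X) - y||^2 + mu * ||X||_*.
mu = 1e-3
L = np.linalg.norm(A, ord=2) ** 2                  # Lipschitz constant of the gradient
t = 1.0 / L
Xk = np.zeros((d, d), dtype=complex)
for _ in range(1000):
    grad = (A.conj().T @ (A @ Xk.reshape(-1) - y)).reshape(d, d)
    Xk = svt(Xk - t * grad, t * mu)

rel_err = np.linalg.norm(Xk - M) / np.linalg.norm(M)
print("relative Frobenius error:", rel_err)
```

With m well above rd, the iterate typically lands close to M; in practice one would use an accelerated solver and tune µ to the noise level, as discussed below.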
In the remainder of this section, we discuss its applications to low-rank matrix recovery, and quantum state tomography in particular.

2.2 Applications

By combining Theorem 2.1 with previous results [3, 4, 5], we immediately obtain bounds on the accuracy of the matrix Dantzig selector (4) and the matrix Lasso (5). In particular, for the first time we can show universal recovery of low-rank matrices via Pauli measurements, and near-optimal bounds on the accuracy of the reconstruction when the data is noisy [5]. (Similar results hold for measurements in any incoherent operator basis.) These RIP-based results improve on the earlier results based on dual certificates [13, 8, 14]. See [3, 4, 5] for details.

Here we will sketch a couple of these results that are of particular interest for quantum state tomography. In this setting, M is the density matrix describing the state of a quantum mechanical object, and A(M) is a vector of Pauli expectation values for the state M. (M has some additional properties: it is positive semidefinite, and has trace 1; thus A(M) is a real vector.)

There are two main issues that arise. First, M is not precisely low-rank. In many situations, the ideal state has low rank (for instance, a pure state has rank 1); however, for the actual state observed in an experiment, the density matrix M is full-rank with decaying eigenvalues. Typically, we will be interested in obtaining a good low-rank approximation to M, ignoring the tail of the spectrum.

Secondly, the measurements of A(M) are inherently noisy. We do not observe A(M) directly; rather, we estimate each entry (A(M))_i by preparing many copies of the state M, measuring the Pauli observable S_i on each copy, and averaging the results. Thus, we observe y_i = (A(M))_i + z_i, where z_i is binomially distributed.
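The binomial noise model just described is easy to simulate. The sketch below (illustrative only; the two-qubit observable σ_z ⊗ σ_z and the slightly mixed test state are our own choices) estimates a single Pauli expectation value tr(Pρ) by averaging ±1 measurement outcomes over many copies; the statistical error shrinks like 1/√N, which is what the Gaussian approximation in the next paragraph captures.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-qubit example: P = sigma_z (x) sigma_z, and a slightly mixed, nearly pure state rho.
Z = np.array([[1, 0], [0, -1]], dtype=complex)
P = np.kron(Z, Z)

psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
rho = 0.9 * np.outer(psi, psi.conj()) + 0.1 * np.eye(4) / 4   # full-rank, decaying spectrum

true_val = np.trace(P @ rho).real      # tr(P rho), lies in [-1, 1]
p_plus = (1 + true_val) / 2            # probability of measuring outcome +1

for N in (100, 10_000, 1_000_000):
    outcomes = rng.choice([+1.0, -1.0], size=N, p=[p_plus, 1 - p_plus])
    est = outcomes.mean()              # z_i = est - true_val is (shifted, scaled) binomial
    print(N, abs(est - true_val))      # error decays roughly like 1/sqrt(N)
```

Each Pauli observable has eigenvalues ±1, so the empirical mean of the outcomes is an unbiased estimate of tr(Pρ), with binomial fluctuations of size about 1/√N.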
When the number of experiments being averaged is large, z_i can be approximated by Gaussian noise. We will be interested in getting an estimate of M that is stable with respect to this noise. (We remark that one can also reduce the statistical noise by performing more repetitions of each experiment. This suggests the possibility of a tradeoff between the accuracy of estimating each parameter, and the number of parameters one chooses to measure overall. This will be discussed elsewhere [15].)

We would like to reconstruct M up to a small error in the nuclear or Frobenius norm. Let M̂ be our estimate. Bounding the error in nuclear norm implies that, for any measurement allowed by quantum mechanics, the probability of distinguishing the state M̂ from M is small. Bounding the error in Frobenius norm implies that the difference M̂ − M is highly "mixed" (and thus does not contribute to the coherent or "quantum" behavior of the system).

We now sketch a few results from [4, 5] that apply to this situation. Write M = M_r + M_c, where M_r is a rank-r approximation to M, corresponding to the r largest singular values of M, and M_c is the residual part of M (the "tail" of M). Ideally, our goal is to estimate M up to an error that is not much larger than M_c. First, we can bound the error in nuclear norm (assuming the data has no noise):

Proposition 2.2 (Theorem 5 from [4])  Let A : C^{d×d} → C^m be the random Pauli sampling operator, with m = C rd log^6 d, for some absolute constant C. Then, with high probability over the choice of A, the following holds:

Let M be any matrix in C^{d×d}, and write M = M_r + M_c, as described above. Say we observe y = A(M), with no noise. Let M̂ be the Dantzig selector (4) with λ = 0. Then

    ‖M̂ − M‖_* ≤ C′_0 ‖M_c‖_*,   (7)

where C′_0 is an absolute constant.
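The split M = M_r + M_c used in Proposition 2.2 is just a truncated SVD. A minimal sketch (with an arbitrary synthetic spectrum of our choosing, for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

d, r = 16, 3

# A full-rank Hermitian "density matrix" with a geometrically decaying spectrum.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
spec = 0.5 ** np.arange(d)
spec /= spec.sum()                      # normalize to trace 1
M = Q @ np.diag(spec) @ Q.conj().T

# Best rank-r approximation M_r (truncated SVD) and the residual tail M_c.
U, s, Vh = np.linalg.svd(M)
M_r = U[:, :r] @ np.diag(s[:r]) @ Vh[:r, :]
M_c = M - M_r

print("||M_c||_* =", s[r:].sum())       # nuclear norm of the tail
print("||M_c||_F =", np.linalg.norm(M_c))
```

Since M_c keeps exactly the singular values s[r:], its nuclear norm is their sum; for a trace-1 density matrix this is at most 1 − r/d, a fact used in the discussion of equation (8) below.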
We can also bound the error in Frobenius norm, allowing for noisy data:

Proposition 2.3 (Lemma 3.2 from [5])  Assume the same set-up as above, but say we observe y = A(M) + z, where z ∼ N(0, σ^2 I). Let M̂ be the Dantzig selector (4) with λ = 8√d σ, or the Lasso (5) with µ = 16√d σ. Then, with high probability over the noise z,

    ‖M̂ − M‖_F ≤ C_0 √(rd) σ + C_1 ‖M_c‖_*/√r,   (8)

where C_0 and C_1 are absolute constants.

This bounds the error of M̂ in terms of the noise strength σ and the size of the tail M_c. It is universal: one sampling operator A works for all matrices M. While this bound may seem unnatural because it mixes different norms, it can be quite useful. When M actually is low-rank (with rank r), then M_c = 0, and the bound (8) becomes particularly simple. The dependence on the noise strength σ is known to be nearly minimax-optimal [5]. Furthermore, when some of the singular values of M fall below the "noise level" √d σ, one can show a tighter bound, with a nearly-optimal bias-variance tradeoff; see Theorem 2.7 in [5] for details.

On the other hand, when M is full-rank, then the error of M̂ depends on the behavior of the tail M_c. We will consider a couple of cases. First, suppose we do not assume anything about M, besides the fact that it is a density matrix for a quantum state. Then ‖M‖_* = 1, hence ‖M_c‖_* ≤ 1 − r/d (since the r largest eigenvalues of a trace-1 positive semidefinite matrix must sum to at least r/d), and we can use (8) to get

    ‖M̂ − M‖_F ≤ C_0 √(rd) σ + C_1/√r.

Thus, even for arbitrary (not necessarily low-rank) quantum states, the estimator M̂ gives nontrivial results. The O(1/√r) term can be interpreted as the penalty for only measuring an incomplete subset of the Pauli observables. Finally, consider the case where M is full-rank, but we do know that the tail M_c is small. If we know that M_c is small in nuclear norm, then we can use equation (8).
However, if we know that M_c is small in Frobenius norm, one can give a different bound, using ideas from [5], as follows.

Proposition 2.4  Let M be any matrix in C^{d×d}, with singular values σ_1(M) ≥ ··· ≥ σ_d(M). Choose a random Pauli sampling operator A : C^{d×d} → C^m, with m = C rd log^6 d, for some absolute constant C. Say we observe y = A(M) + z, where z ∼ N(0, σ^2 I). Let M̂ be the Dantzig selector (4) with λ = 16√d σ, or the Lasso (5) with µ = 32√d σ. Then, with high probability over the choice of A and the noise z,

    ‖M̂ − M‖_F^2 ≤ C_0 Σ_{i=1}^r min(σ_i^2(M), dσ^2) + C_2 (log^6 d) Σ_{i=r+1}^d σ_i^2(M),   (9)

where C_0 and C_2 are absolute constants.

This bound can be interpreted as follows. The first term expresses the bias-variance tradeoff for estimating M_r, while the second term depends on the Frobenius norm of M_c. (Note that the log^6 d factor may not be tight.) In particular, this implies:

    ‖M̂ − M‖_F ≤ √C_0 √(rd) σ + √C_2 (log^3 d) ‖M_c‖_F.

This can be compared with equation (8) (involving ‖M_c‖_*). This bound will be better when ‖M_c‖_F ≪ ‖M_c‖_*, i.e., when the tail M_c has slowly-decaying eigenvalues (in physical terms, it is highly mixed). Proposition 2.4 is an adaptation of Theorem 2.8 in [5]. We sketch the proof in section B. Note that this bound is not universal: it shows that for all matrices M, a random choice of the sampling operator A is likely to work.

3 Proof of the RIP for Pauli Measurements

We now prove Theorem 2.1. The general approach involving Dudley's entropy bound is similar to [12], while the technical part of the proof (bounding certain covering numbers) uses ideas from [16]. We summarize the argument here; the details are given in section A.

3.1 Overview

Let U_2 = {X ∈ C^{d×d} | ‖X‖_F ≤ 1, ‖X‖_* ≤ √r ‖X‖_F}.
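The set U_2 and the RIP condition (6) can be probed numerically. The sketch below is illustrative only (it samples random rank-r matrices in U_2 rather than taking a true supremum, and the sizes n = 3, r = 2, m = 200 are our own choices): it draws a random Pauli sampling operator and records how far ‖A(X)‖_2 deviates from ‖X‖_F on random unit-Frobenius rank-r matrices, which lie in U_2 because ‖X‖_* ≤ √(rank X) ‖X‖_F.

```python
import numpy as np

rng = np.random.default_rng(3)

# Normalized Pauli basis for n qubits.
I2 = np.eye(2, dtype=complex)
X2 = np.array([[0, 1], [1, 0]], dtype=complex)
Y2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z2 = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_basis(n):
    mats = [np.array([[1.0]], dtype=complex)]
    for _ in range(n):
        mats = [np.kron(W, P) for W in mats for P in (I2, X2, Y2, Z2)]
    return [W / np.sqrt(2 ** n) for W in mats]

n, r, m = 3, 2, 200
d = 2 ** n
basis = pauli_basis(n)

# Random Pauli sampling operator (3), sampling with replacement.
idx = rng.integers(0, d * d, size=m)
A = np.stack([(d / np.sqrt(m)) * basis[i].conj().reshape(-1) for i in idx])

devs = []
for _ in range(200):
    G = rng.normal(size=(d, r)) + 1j * rng.normal(size=(d, r))
    H = rng.normal(size=(d, r)) + 1j * rng.normal(size=(d, r))
    X = G @ H.conj().T
    X /= np.linalg.norm(X)                                  # ||X||_F = 1
    assert np.linalg.norm(X, 'nuc') <= np.sqrt(r) + 1e-9    # membership in U_2
    devs.append(abs(np.linalg.norm(A @ X.reshape(-1)) - 1.0))

print("largest observed deviation of ||A(X)||_2 from ||X||_F:", max(devs))
```

Small observed deviations are consistent with (6) holding for a small δ, though of course a Monte-Carlo check over finitely many X is evidence, not a proof of the supremum bound established below.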
Let B be the set of all self-adjoint linear operators from C^{d×d} to C^{d×d}, and define the following norm on B:

    ‖M‖_(r) = sup_{X ∈ U_2} |(X, M X)|.   (10)

(Suppose r ≥ 2, which is sufficient for our purposes. It is straightforward to show that ‖·‖_(r) is a norm, and that B is a Banach space with respect to this norm.) Then let us define

    ε_r(A) = ‖A*A − I‖_(r).   (11)

By an elementary argument, in order to prove RIP, it suffices to show that ε_r(A) < 2δ − δ^2. We will proceed as follows: we will first bound E ε_r(A), then show that ε_r(A) is concentrated around its mean.

Using a standard symmetrization argument, we have that

    E ε_r(A) ≤ 2 E ‖ (d^2/m) Σ_{j=1}^m ε_j |S_j)(S_j| ‖_(r),

where the ε_j are Rademacher (iid ±1) random variables. Here the round ket notation |S_j) means we view the matrix S_j as an element of the vector space C^{d^2} with Hilbert-Schmidt inner product; the round bra (S_j| denotes the adjoint element in the (dual) vector space.

Now we use the following lemma, which we will prove later. This bounds the expected magnitude in (r)-norm of a Rademacher sum of a fixed collection of operators V_1, ..., V_m that have small operator norm.

Lemma 3.1  Let m ≤ d^2. Fix some V_1, ..., V_m ∈ C^{d×d} that have uniformly bounded operator norm, ‖V_i‖ ≤ K (for all i). Let ε_1, ..., ε_m be iid uniform ±1 random variables. Then

    E_ε ‖ Σ_{i=1}^m ε_i |V_i)(V_i| ‖_(r) ≤ C_5 · ‖ Σ_{i=1}^m |V_i)(V_i| ‖_(r)^{1/2},   (12)

where C_5 = √r · C_4 K log^{5/2} d log^{1/2} m and C_4 is some universal constant.

After some algebra, one gets that E ε_r(A) ≤ 2(E ε_r(A) + 1)^{1/2} · C_5 · √(d/m), where C_5 = √r · C_4 K log^3 d. By finding the roots of this quadratic equation, we get the following bound on E ε_r(A). Let λ ≥ 1. Assume that m ≥ λd(2C_5)^2 = λ · 4C_4^2 · rd · K^2 log^6 d. Then we have the desired result:

    E ε_r(A) ≤ 1/λ + 1/√λ.   (13)
It remains to show that ε_r(A) is concentrated around its expectation. For this we use a concentration inequality from [22] for sums of independent symmetric random variables that take values in some Banach space. See section A for details.

3.2 Proof of Lemma 3.1 (bounding a Rademacher sum in (r)-norm)

Let L_0 = E_ε ‖Σ_{i=1}^m ε_i |V_i)(V_i|‖_(r); this is the quantity we want to bound. Using a standard comparison principle, we can replace the ±1 random variables ε_i with iid N(0, 1) Gaussian random variables g_i; then we get

    L_0 ≤ √(π/2) E_g sup_{X ∈ U_2} |G(X)|,   G(X) = Σ_{i=1}^m g_i |(V_i, X)|^2.   (14)

The random variables G(X) (indexed by X ∈ U_2) form a Gaussian process, and L_0 is upper-bounded by the expected supremum of this process. Using the fact that G(0) = 0 and G(·) is symmetric, and Dudley's inequality (Theorem 11.17 in [22]), we have

    L_0 ≤ √(2π) E_g sup_{X ∈ U_2} G(X) ≤ 24 √(2π) ∫_0^∞ log^{1/2} N(U_2, d_G, ε) dε,   (15)

where N(U_2, d_G, ε) is a covering number (the number of balls in C^{d×d} of radius ε in the metric d_G that are needed to cover the set U_2), and the metric d_G is given by

    d_G(X, Y) = (E[(G(X) − G(Y))^2])^{1/2}.   (16)

Define a new norm (actually a semi-norm) ‖·‖_X on C^{d×d}, as follows:

    ‖M‖_X = max_{i=1,...,m} |(V_i, M)|.   (17)

We use this to upper-bound the metric d_G. An elementary calculation shows that d_G(X, Y) ≤ 2R ‖X − Y‖_X, where R = ‖Σ_{i=1}^m |V_i)(V_i|‖_(r)^{1/2}. This lets us upper-bound the covering numbers in d_G with covering numbers in ‖·‖_X:

    N(U_2, d_G, ε) ≤ N(U_2, ‖·‖_X, ε/(2R)) = N((1/√r) U_2, ‖·‖_X, ε/(2R√r)).   (18)

We will now bound these covering numbers. First, we introduce some notation: let ‖·‖_p denote the Schatten p-norm on C^{d×d}, and let B_p be the unit ball in this norm. Also, let B_X be the unit ball in the ‖·‖_X norm. Observe that

    (1/√r) U_2 ⊆ B_1 ⊆ K · B_X.
(The second inclusion follows because ‖M‖_X ≤ max_{i=1,...,m} ‖V_i‖ ‖M‖_* ≤ K ‖M‖_*.) This gives a simple bound on the covering numbers:

    N((1/√r) U_2, ‖·‖_X, ε) ≤ N(B_1, ‖·‖_X, ε) ≤ N(K · B_X, ‖·‖_X, ε).   (19)

This is 1 when ε ≥ K. So, in Dudley's inequality, we can restrict the integral to the interval [0, K]. When ε is small, we will use the following simple bound (equation (5.7) in [23]):

    N(K · B_X, ‖·‖_X, ε) ≤ (1 + 2K/ε)^{2d^2}.   (20)

When ε is large, we will use a more sophisticated bound based on Maurey's empirical method and entropy duality, which is due to [16] (see also [17]):

    N(B_1, ‖·‖_X, ε) ≤ exp(C_1^2 K^2 ε^{−2} log^3 d log m),   for some constant C_1.   (21)

We defer the proof of (21) to the next section. Using (20) and (21), we can bound the integral in Dudley's inequality. We get

    L_0 ≤ C_4 R √r K log^{5/2} d log^{1/2} m,   (22)

where C_4 is some universal constant. This proves the lemma.

3.3 Proof of Equation (21) (covering numbers of the nuclear-norm ball)

Our result will follow easily from a bound on covering numbers introduced in [16] (where it appears as Lemma 1):

Lemma 3.2  Let E be a Banach space, having modulus of convexity of power type 2 with constant λ(E). Let E* be the dual space, and let T_2(E*) denote its type 2 constant. Let B_E denote the unit ball in E. Let V_1, ..., V_m ∈ E*, such that ‖V_j‖_{E*} ≤ K (for all j). Define the norm on E,

    ‖M‖_X = max_{j=1,...,m} |(V_j, M)|.   (23)

Then, for any ε > 0,

    ε log^{1/2} N(B_E, ‖·‖_X, ε) ≤ C_2 λ(E)^2 T_2(E*) K log^{1/2} m,   (24)

where C_2 is some universal constant.

The proof uses entropy duality to reduce the problem to bounding the "dual" covering number. The basic idea is as follows. Let ℓ_p^m denote the complex vector space C^m with the ℓ_p norm. Consider the map S : ℓ_1^m → E* that takes the j'th coordinate vector to V_j.
Let N(S) denote the number of balls in E* needed to cover the image (under the map S) of the unit ball in ℓ_1^m. We can bound N(S) using Maurey's empirical method. Also define the dual map S* : E → ℓ_∞^m, and the associated dual covering number N(S*). Then N(B_E, ‖·‖_X, ε) is related to N(S*). Finally, N(S) and N(S*) are related via entropy duality inequalities. See [16] for details.

We will apply this lemma as follows, using the same approach as [17]. Let S_p denote the Banach space consisting of all matrices in C^{d×d} with the Schatten p-norm. Intuitively, we want to set E = S_1 and E* = S_∞, but this won't work because λ(S_1) is infinite. Instead, we let E = S_p, p = (log d)/(log d − 1), and E* = S_q, q = log d. Note that ‖M‖_p ≤ ‖M‖_*, hence B_1 ⊆ B_p and

    ε log^{1/2} N(B_1, ‖·‖_X, ε) ≤ ε log^{1/2} N(B_p, ‖·‖_X, ε).   (25)

Also, we have λ(E) ≤ 1/√(p − 1) = √(log d − 1) and T_2(E*) ≤ λ(E) ≤ √(log d − 1) (see the Appendix in [17]). Note that ‖M‖_q ≤ e‖M‖, thus we have ‖V_j‖_q ≤ eK (for all j). Then, using the lemma, we have

    ε log^{1/2} N(B_p, ‖·‖_X, ε) ≤ C_2 (log^{3/2} d)(eK) log^{1/2} m,   (26)

which proves the claim.

4 Outlook

We have shown that random Pauli measurements obey the restricted isometry property (RIP), which implies strong error bounds for low-rank matrix recovery. The key technical tool was a bound on covering numbers of the nuclear norm ball, due to Guédon et al [16].

An interesting question is whether this method can be applied to other problems, such as matrix completion, or constructing embeddings of low-dimensional manifolds into linear spaces with slightly higher dimension. For matrix completion, one can compare with the work of Negahban and Wainwright [10], where the sampling operator satisfies restricted strong convexity (RSC) over a certain set of "non-spiky" low-rank matrices.
For manifold embeddings, one could try to generalize the results of [24], which use the sparse-vector RIP to construct Johnson-Lindenstrauss metric embeddings.

There are also many questions pertaining to low-rank quantum state tomography. For example, how does the matrix Lasso compare to the traditional approach using maximum likelihood estimation? Also, there are several variations on the basic tomography problem, and alternative notions of sparsity (e.g., elementwise sparsity in a known basis) [25], which have not been fully explored.

Acknowledgements: Thanks to David Gross, Yaniv Plan, Emmanuel Candès, Stephen Jordan, and the anonymous reviewers, for helpful suggestions. Parts of this work were done at the University of California, Berkeley, and supported by NIST grant number 60NANB10D262. This paper is a contribution of the National Institute of Standards and Technology, and is not subject to U.S. copyright.

References

[1] M. Fazel. Matrix Rank Minimization with Applications. PhD thesis, Stanford, 2002.
[2] N. Srebro. Learning with Matrix Factorizations. PhD thesis, MIT, 2004.
[3] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
[4] M. Fazel, E. Candes, B. Recht, and P. Parrilo. Compressed sensing and robust recovery of low rank matrices. In 42nd Asilomar Conference on Signals, Systems and Computers, pages 1043–1047, 2008.
[5] E. J. Candes and Y. Plan. Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements. 2009.
[6] E. J. Candes and B. Recht. Exact matrix completion via convex optimization. Found. of Comput. Math., 9:717–772.
[7] E. J. Candes and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory, 56(5):2053–2080, 2009.
[8] D. Gross.
Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, to appear. arXiv:0910.1879, 2010.
[9] B. Recht. A simpler approach to matrix completion. J. Machine Learning Research (to appear), 2010.
[10] S. Negahban and M. J. Wainwright. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118, 2010.
[11] E. J. Candès and T. Tao. Near-optimal signal recovery from random projections: universal encoding strategies. IEEE Trans. Inform. Theory, 52:5406–5425, 2006.
[12] M. Rudelson and R. Vershynin. On sparse reconstruction from Fourier and Gaussian measurements. Commun. Pure and Applied Math., 61:1025–1045, 2008.
[13] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert. Quantum state tomography via compressed sensing. Phys. Rev. Lett., 105(15):150401, Oct 2010. arXiv:0909.3304.
[14] E. J. Candès and Y. Plan. Matrix completion with noise. Proc. IEEE, 98(6):925–936, 2010.
[15] B. Brown, S. Flammia, D. Gross, and Y.-K. Liu. In preparation, 2011.
[16] O. Guédon, S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann. Majorizing measures and proportional subsets of bounded orthonormal systems. Rev. Mat. Iberoamericana, 24(3):1075–1095, 2008.
[17] G. Aubrun. On almost randomizing channels with a short Kraus decomposition. Commun. Math. Phys., 288:1103–1116, 2009.
[18] M. A. Nielsen and I. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2001.
[19] A. Rohde and A. Tsybakov. Estimation of high-dimensional low-rank matrices. arXiv:0912.5338, 2009.
[20] V. Koltchinskii, K. Lounici, and A. B. Tsybakov. Nuclear norm penalization and optimal rates for noisy low rank matrix completion. arXiv:1011.6256, 2010.
[21] Y.-K. Liu. Universal low-rank matrix recovery from Pauli measurements. arXiv:1103.2816, 2011.
[22] M. Ledoux and M. Talagrand.
Probability in Banach Spaces. Springer, 1991.
[23] G. Pisier. The Volume of Convex Bodies and Banach Space Geometry. Cambridge, 1999.
[24] F. Krahmer and R. Ward. New and improved Johnson-Lindenstrauss embeddings via the restricted isometry property. SIAM J. Math. Anal., 43(3):1269–1281, 2011.
[25] A. Shabani, R. L. Kosut, M. Mohseni, H. Rabitz, M. A. Broome, M. P. Almeida, A. Fedrizzi, and A. G. White. Efficient measurement of quantum dynamics via compressive sensing. Phys. Rev. Lett., 106(10):100401, 2011.
[26] P. Wojtaszczyk. Stability and instance optimality for Gaussian measurements in compressed sensing. Found. Comput. Math., 10(1):1–13, 2010.

Universal low-rank matrix recovery from Pauli measurements: Supplementary material

A Proof of the RIP for Pauli Measurements

A.1 Overview

We now prove Theorem 2.1. In this section we give an overview; proofs of the technical claims are deferred to later sections. The general approach involving Dudley's entropy bound is similar to [12], while the technical part of the proof (bounding certain covering numbers) uses ideas from [16].

Recall the definition of the restricted isometry property, with constant 0 ≤ δ < 1. Let

U = { X ∈ C^{d×d} | ‖X‖_* ≤ √r ‖X‖_F }.   (27)

Let us define

U_2 = { X ∈ C^{d×d} | ‖X‖_F ≤ 1, ‖X‖_* ≤ √r ‖X‖_F },   (28)

ε_r(A) = sup_{X ∈ U_2} |(X, (A^*A − I)X)|.   (29)

Also, define ε = 2δ − δ². We claim that, to show RIP, it suffices to show ε_r(A) < ε. To see this, note that the RIP condition is equivalent to the statement

for all X ∈ U,  (1−δ)² (X, X) ≤ (X, A^*A X) ≤ (1+δ)² (X, X),   (30)

which is equivalent to

for all X ∈ U,  (−2δ + δ²)(X, X) ≤ (X, (A^*A − I)X) ≤ (2δ + δ²)(X, X),   (31)

which is implied by

for all X ∈ U_2,  |(X, (A^*A − I)X)| ≤ min{2δ + δ², 2δ − δ²} = 2δ − δ².   (32)

Thus our goal is to show ε_r(A) < ε.
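As a concrete illustration of the objects involved (a numerical sketch with assumed parameters, not part of the paper), the following snippet builds the orthonormal Pauli basis S_j = W_j/√d for two qubits, checks that using the full basis gives A^*A = I exactly, and shows that for any proper subset of distinct Paulis the unrestricted operator norm of A^*A − I equals 1 (since A^*A annihilates the unsampled Paulis). This is why ε_r(A) takes the sup only over the low-rank set U_2, rather than over all matrices.

```python
import itertools
import numpy as np

# Single-qubit Pauli matrices.
I2 = np.eye(2, dtype=complex)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PY = np.array([[0, -1j], [1j, 0]], dtype=complex)
PZ = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_basis(n):
    """All d^2 normalized Paulis W/sqrt(d) on n qubits (d = 2^n); these are
    orthonormal under the Hilbert-Schmidt inner product (S_i, S_j) = tr(S_i^* S_j)."""
    d = 2 ** n
    basis = []
    for ops in itertools.product([I2, PX, PY, PZ], repeat=n):
        W = np.array([[1.0 + 0j]])
        for op in ops:
            W = np.kron(W, op)
        basis.append(W / np.sqrt(d))
    return basis

n = 2
d = 2 ** n
S = pauli_basis(n)

def AstarA(chosen):
    """A^*A = sum_j (d^2/m) |S_j)(S_j| as a d^2 x d^2 matrix, in the round-ket
    notation of the appendix (matrices viewed as vectors in C^{d^2})."""
    m = len(chosen)
    vecs = [Sj.reshape(-1) for Sj in chosen]
    return sum((d * d / m) * np.outer(v, v.conj()) for v in vecs)

# Full basis (m = d^2): A^*A = I exactly, so eps_r(A) = 0.
full_err = np.linalg.norm(AstarA(S) - np.eye(d * d), 2)

# Proper subset (m = 12 < d^2 = 16): ||A^*A - I||_op = 1, because A^*A
# annihilates the 4 unsampled Paulis; only the restricted sup over U_2 can be small.
sub_err = np.linalg.norm(AstarA(S[:12]) - np.eye(d * d), 2)
print(full_err, sub_err)
```

The subset's operator-norm deviation of exactly 1 makes the point of the appendix concrete: the unrestricted norm of A^*A − I is never small for m < d², so the whole argument must work with the restricted norm over U_2.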
(Note that for δ in the range [0, 1], we have ε ≥ δ.)

Let B be the set of all self-adjoint linear operators from C^{d×d} to C^{d×d}, and define the following norm on B:

‖M‖_(r) = sup_{X ∈ U_2} |(X, M X)|.   (33)

Suppose that r ≥ 2 (this will suffice for our purposes, since RIP with r = 2 implies RIP with r = 1). We claim that ‖·‖_(r) is a norm, and that B is a Banach space with respect to this norm. To show these claims, we will consider the Frobenius norm ‖·‖_F on B, which is defined by viewing each element of B as a "matrix" acting on "vectors" that are elements of C^{d×d}. Then we will bound ‖·‖_(r) in terms of ‖·‖_F.

More precisely, let e_a (a ∈ {0, 1, ..., d−1}) be the standard basis vectors in C^d, and let E_ab = e_a e_b^* (a, b ∈ {0, 1, ..., d−1}) be the standard basis vectors in C^{d×d}. Then the Frobenius norm on B can be written as

‖M‖_F = [ Σ_{abcd} |(E_cd, M(E_ab))|² ]^{1/2}.   (34)

We claim that, for all M ∈ B,

‖M‖_(r) ≥ (1/(3√2 d²)) ‖M‖_F.   (35)

To see this, suppose that ‖M‖_F ≥ μ; then there must exist a, b, c, d ∈ {0, 1, ..., d−1} such that |(E_cd, M(E_ab))| ≥ μ/d². If E_ab = E_cd, then we have ‖M‖_(r) ≥ μ/d². Otherwise, we have (E_ab, E_cd) = 0. Now at least one of the following must be true:

|Re (E_cd, M(E_ab))| ≥ μ/(√2 d²)  (case 1),   (36)

|Im (E_cd, M(E_ab))| ≥ μ/(√2 d²)  (case 2).   (37)

In case 1, let X = (1/√2)(E_ab + E_cd), and write

Re (E_cd, M(E_ab)) = (X, M X) − ½ (E_ab, M(E_ab)) − ½ (E_cd, M(E_cd)).   (38)

One of the three terms on the right-hand side must have absolute value at least μ/(3√2 d²). Since X, E_ab and E_cd are in U_2, it follows that ‖M‖_(r) ≥ μ/(3√2 d²).

In case 2, let X = (1/√2)(E_ab + iE_cd), and write

Im (E_cd, M(E_ab)) = (X, M X) − ½ (E_ab, M(E_ab)) − ½ (E_cd, M(E_cd)).   (39)

By a similar argument, we get that ‖M‖_(r) ≥ μ/(3√2 d²). This shows (35).
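The decompositions (38) and (39) are exact algebraic identities for any self-adjoint M, and are easy to verify numerically. The sketch below (an illustrative check, with d = 3 and a random Hermitian superoperator, both arbitrary choices) recovers the real and imaginary parts of the off-diagonal entry (E_cd, M(E_ab)) from the three quadratic forms on the right-hand side.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# A random self-adjoint operator M on C^{d x d}, represented as a
# Hermitian d^2 x d^2 matrix acting on vectorized matrices.
A = rng.normal(size=(d * d, d * d)) + 1j * rng.normal(size=(d * d, d * d))
M = (A + A.conj().T) / 2

def pair(P, Q):
    """(P, M(Q)) with the Hilbert-Schmidt inner product (conjugate-linear in P)."""
    return np.vdot(P.reshape(-1), M @ Q.reshape(-1))

def E(a, b):
    out = np.zeros((d, d), dtype=complex)
    out[a, b] = 1.0
    return out

a, b, c, e = 0, 1, 2, 0      # indices chosen so that E_ab and E_ce are orthogonal
Eab, Ece = E(a, b), E(c, e)

# Case 1: X = (E_ab + E_ce)/sqrt(2) recovers the real part, as in (38).
X1 = (Eab + Ece) / np.sqrt(2)
gap_re = abs(pair(Ece, Eab).real
             - (pair(X1, X1).real - 0.5 * pair(Eab, Eab).real - 0.5 * pair(Ece, Ece).real))

# Case 2: X = (E_ab + i E_ce)/sqrt(2) recovers the imaginary part, as in (39).
X2 = (Eab + 1j * Ece) / np.sqrt(2)
gap_im = abs(pair(Ece, Eab).imag
             - (pair(X2, X2).real - 0.5 * pair(Eab, Eab).real - 0.5 * pair(Ece, Ece).real))

print(gap_re, gap_im)  # both ~0
```

Note that both X1 and X2 have Frobenius norm 1 and nuclear norm at most √2, which is why the argument needs r ≥ 2 for them to lie in U_2.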
In addition, it is straightforward to see that

‖M‖_(r) ≤ sup_{X: ‖X‖_F ≤ 1} |(X, M X)| ≤ ‖M‖_op ≤ ‖M‖_F.   (40)

Finally, using (35) and (40), we see that ‖·‖_(r) is a norm, and B is a Banach space with respect to ‖·‖_(r). (This follows since these same properties already hold for ‖·‖_F.) In particular, ‖·‖_(r) is nondegenerate (‖M‖_(r) = 0 implies M = 0), and B is complete with respect to ‖·‖_(r).

Returning to our main proof, we can now write ε_r(A) = ‖A^*A − I‖_(r). The strategy of the proof will be to first bound E ε_r(A), then show that ε_r(A) is concentrated around its mean. We claim that

E ε_r(A) ≤ 2 E ‖ Σ_{j=1}^m ε_j (d²/m) |S_j)(S_j| ‖_(r),   (41)

where the ε_j are Rademacher (iid ±1) random variables. Here the round-ket notation |S_j) means we view the matrix S_j as an element of the vector space C^{d²} with Hilbert-Schmidt inner product; the round bra (S_j| denotes the adjoint element in the (dual) vector space.

The above bound follows from a standard symmetrization argument: write A^*A − I = Σ_{j=1}^m X_j where X_j = (d²/m)|S_j)(S_j| − I/m, then let X'_j be independent copies of the random variables X_j, and use equation (2.5) and Lemma 6.3 in [22] to write:

E ε_r(A) = E ‖Σ_j X_j‖_(r) ≤ E ‖Σ_j (X_j − X'_j)‖_(r) = E ‖Σ_j ε_j (X_j − X'_j)‖_(r)
         = E ‖Σ_j ε_j (d²/m)(|S_j)(S_j| − |S'_j)(S'_j|)‖_(r) ≤ 2 E ‖Σ_j ε_j (d²/m)|S_j)(S_j|‖_(r).   (42)

Now we use the following lemma, which we will prove later. This bounds the expected magnitude in (r)-norm of a Rademacher sum of a fixed collection of operators V_1, ..., V_m that have small operator norm.

Lemma A.1 (restatement of Lemma 3.1) Let m ≤ d². Fix some V_1, ..., V_m ∈ C^{d×d} that have uniformly bounded operator norm, ‖V_i‖ ≤ K (for all i). Let ε_1, ..., ε_m be iid uniform ±1 random variables.
Then

E_ε ‖Σ_{i=1}^m ε_i |V_i)(V_i|‖_(r) ≤ C_5 · ‖Σ_{i=1}^m |V_i)(V_i|‖_(r)^{1/2},   (43)

where C_5 = √r · C_4 K log^{5/2} d log^{1/2} m and C_4 is some universal constant.

We apply the lemma as follows. Let Ω = {S_1, ..., S_m} be the multiset of all the measurement operators that appear in the sampling operator A. Then we have

E ε_r(A) ≤ 2 E_Ω E_ε ‖Σ_{J ∈ Ω} ε_J |√d J)(√d J|‖_(r) · (d/m).   (44)

Using the lemma on the set of operators √d J (J ∈ Ω), we get

E ε_r(A) ≤ 2 E_Ω [ C_5 · ‖Σ_{J ∈ Ω} |√d J)(√d J|‖_(r)^{1/2} ] · (d/m)
         ≤ 2 [ E_Ω ‖Σ_{J ∈ Ω} |√d J)(√d J|‖_(r) ]^{1/2} · C_5 · (d/m)
         = 2 [ E ‖A^*A‖_(r) ]^{1/2} · C_5 · √(d/m)
         ≤ 2 ( E ε_r(A) + 1 )^{1/2} · C_5 · √(d/m),   (45)

where C_5 = √r · C_4 K log³ d (here we use m ≤ d² to absorb the factor log^{1/2} m ≤ √2 log^{1/2} d into the constant C_4). To make the notation more concise, define E_0 = E ε_r(A) and C_0 = 2 C_5 √(d/m). Then, squaring both sides and rearranging, we have

E_0² − C_0² E_0 − C_0² ≤ 0.   (46)

This quadratic equation has two roots, which are given by α_± = ½(C_0² ± C_0 √(C_0² + 4)), and we know that E_0 is bounded by

α_− ≤ 0 ≤ E_0 ≤ α_+.   (47)

Also, we can simplify the bound by writing α_+ ≤ ½(C_0² + C_0(C_0 + 2)) = C_0² + C_0.

Now we use the fact that m is large. Let λ ≥ 1 (we will choose a precise value for λ later). Assume that

m ≥ λ d (2C_5)² = λ · 4C_4² · dr · K² log⁶ d.   (48)

Then C_0 ≤ 1/√λ, and α_+ ≤ 1/λ + 1/√λ, and we have the desired result:

E ε_r(A) ≤ 1/λ + 1/√λ.   (49)

It remains to show that ε_r(A) is concentrated around its expectation. We will use a concentration inequality from [22] for sums of independent symmetric random variables that take values in some Banach space. Define X = Σ_{j=1}^m X_j where X_j = (d²/m)|S_j)(S_j| − I/m; then we have A^*A − I = X and ε_r(A) = ‖X‖_(r). We showed above that E ‖X‖_(r) ≤ 1/λ + 1/√λ. In addition, we can bound each X_j as follows, using the fact that, for X ∈ U_2, |(S_j, X)| ≤ ‖S_j‖ ‖X‖_* ≤ (K/√d) √r ‖X‖_F ≤ (K/√d) √r:
‖X_j‖_(r) = sup_{X ∈ U_2} | (d²/m) |(S_j, X)|² − (1/m) ‖X‖_F² | ≤ (drK² + 1)/m ≤ 1/(λ · 4C_4²).   (50)

We use a standard symmetrization argument: let X'_j denote an independent copy of X_j, and define Y_j = X_j − X'_j, which is symmetric (−Y_j has the same distribution as Y_j). Also define Y = Σ_{j=1}^m Y_j = X − X'. Using the triangle inequality, we have

E ‖Y‖_(r) ≤ 2 E ‖X‖_(r) ≤ 2(1/λ + 1/√λ),   (51)

‖Y_j‖_(r) ≤ 2 ‖X_j‖_(r) ≤ 1/(λ · 2C_4²).   (52)

Using equation (6.1) in [22], we have, for any u ≥ 0,

Pr[ ‖X‖_(r) > 2(1/λ + 1/√λ) + u ] ≤ Pr[ ‖X‖_(r) > 2 E ‖X‖_(r) + u ] ≤ 2 Pr[ ‖Y‖_(r) > u ].   (53)

We will use the following concentration inequality of Ledoux and Talagrand [22]. This is a special case of Theorem 6.17 in [22], where we set s = Rℓ and use equation (6.19) in [22]. This is the same bound used in [12].

Theorem A.2 Let Y_1, ..., Y_m be independent symmetric random variables taking values in some Banach space. Assume that ‖Y_j‖ ≤ R for all j. Let Y = Σ_{j=1}^m Y_j. Then, for any integer ℓ ≥ q, and any t > 0,

Pr[ ‖Y‖ ≥ 8q E ‖Y‖ + 2Rℓ + t E ‖Y‖ ] ≤ (C_7/q)^ℓ + 2 exp(−t²/(256 q)),   (54)

where C_7 is some universal constant.

Now set q = ⌈eC_7⌉. Introduce a new parameter s ≥ √q + 1, and set ℓ = ⌊s²⌋ and t = s. We get that the failure probability is exponentially small in s:

Pr[ ‖Y‖_(r) ≥ (8q + s) E ‖Y‖_(r) + 2Rs² ] ≤ e^{−s²+1} + 2e^{−s²/(256q)}.   (55)

Then, using (51), (52) and (53), we get

Pr[ ‖X‖_(r) ≥ (1 + 8q + s) · 2(1/λ + 1/√λ) + s²/(λC_4²) ] ≤ 2[ e^{−s²+1} + 2e^{−s²/(256q)} ].   (56)

Now let λ ≥ (1 + 8q)² · 256/ε² (note that λ ≥ 1, as required). Then set s = ε√λ/16 (note that s ≥ 1 + 8q ≥ √q + 1, as required). Then we can write

(1 + 8q + s) · 2(1/λ + 1/√λ) + s²/(C_4² λ) ≤ 8s/√λ + s²/(C_4² λ) = ε/2 + ε²/(256 C_4²) ≤ ε.
(57)

Plugging into the previous inequality, we have

Pr[ ‖X‖_(r) ≥ ε ] ≤ e^{−Ω(s²)} = e^{−Ω(ε²λ)}.   (58)

Therefore, we have ε_r(A) ≤ ε, with a failure probability that decreases exponentially in λ. This completes the proof.

A.2 Proof of Lemma 3.1 (bounding a Rademacher sum in (r)-norm)

Let L_0 = E_ε ‖Σ_{i=1}^m ε_i |V_i)(V_i|‖_(r); this is the quantity we want to bound. We can upper-bound it by replacing the ±1 random variables ε_1, ..., ε_m with iid N(0, 1) Gaussian random variables g_1, ..., g_m (see Lemma 4.5 and equation (4.8) in [22]); then we get

L_0 ≤ √(π/2) E_g ‖Σ_{i=1}^m g_i |V_i)(V_i|‖_(r).   (59)

Using the definition of the norm ‖·‖_(r) (equation (33)), we have

L_0 ≤ √(π/2) E_g sup_{X ∈ U_2} |G(X)|,  where G(X) = Σ_{i=1}^m g_i |(V_i, X)|².   (60)

The random variables G(X) (indexed by X ∈ U_2) form a Gaussian process, and L_0 is upper-bounded by the expected supremum of this process. In particular, using the fact that G(0) = 0 and G(·) is symmetric (see [22], p. 298), we have

L_0 ≤ √(π/2) E_g sup_{X ∈ U_2} |G(X) − G(0)| ≤ √(π/2) E_g sup_{X,Y ∈ U_2} |G(X) − G(Y)|
    = √(π/2) E_g sup_{X,Y ∈ U_2} (G(X) − G(Y)) = √(2π) E_g sup_{X ∈ U_2} G(X).   (61)

Using Dudley's inequality (Theorem 11.17 in [22]), we have

L_0 ≤ 24 √(2π) ∫_0^∞ log^{1/2} N(U_2, d_G, ε) dε,   (62)

where N(U_2, d_G, ε) is a covering number (the number of balls in C^{d×d} of radius ε in the metric d_G that are needed to cover the set U_2), and the metric d_G is given by

d_G(X, Y) = [ E (G(X) − G(Y))² ]^{1/2}.   (63)

We can simplify the metric d_G, using the fact that E[g_i g_j] = 1 when i = j and 0 otherwise:

d_G(X, Y) = [ E ( Σ_{i=1}^m g_i (|(V_i, X)|² − |(V_i, Y)|²) )² ]^{1/2} = [ Σ_{i=1}^m ( |(V_i, X)|² − |(V_i, Y)|² )² ]^{1/2}.   (64)

Define a new norm (actually a semi-norm) ‖·‖_X on C^{d×d}, as follows:

‖M‖_X = max_{i=1,...,m} |(V_i, M)|.
(65)

Note that⁴

| |(V_i, X)|² − |(V_i, Y)|² | ≤ ( |(V_i, X)| + |(V_i, Y)| ) · |(V_i, X) − (V_i, Y)| ≤ ( |(V_i, X)| + |(V_i, Y)| ) · ‖X − Y‖_X.   (66)

This lets us give a simpler upper bound on the metric d_G:

d_G(X, Y) ≤ [ Σ_{i=1}^m ( |(V_i, X)| + |(V_i, Y)| )² · ‖X − Y‖_X² ]^{1/2}
          ≤ [ ( Σ_{i=1}^m |(V_i, X)|² )^{1/2} + ( Σ_{i=1}^m |(V_i, Y)|² )^{1/2} ] · ‖X − Y‖_X
          ≤ 2 sup_{X ∈ U_2} ( Σ_{i=1}^m |(V_i, X)|² )^{1/2} · ‖X − Y‖_X
          = 2 ‖Σ_{i=1}^m |V_i)(V_i|‖_(r)^{1/2} · ‖X − Y‖_X.   (67)

Note that the last step holds for all X, Y ∈ U_2. To simplify the notation, let R = ‖Σ_{i=1}^m |V_i)(V_i|‖_(r)^{1/2}; then we have d_G(X, Y) ≤ 2R ‖X − Y‖_X. This lets us upper-bound the covering numbers in d_G with covering numbers in ‖·‖_X:

N(U_2, d_G, ε) ≤ N(U_2, ‖·‖_X, ε/(2R)) = N( (1/√r) U_2, ‖·‖_X, ε/(2R√r) ).   (68)

Plugging into (62) and changing variables, we get

L_0 ≤ 48 √(2π) R √r ∫_0^∞ log^{1/2} N( (1/√r) U_2, ‖·‖_X, ε ) dε.   (69)

We will now bound these covering numbers. First, we introduce some notation: let ‖·‖_p denote the Schatten p-norm on C^{d×d}, and let B_p be the unit ball in this norm. Also, let B_X be the unit ball in the ‖·‖_X norm. Observe that

(1/√r) U_2 ⊆ B_1 ⊆ K · B_X.   (70)

(The second inclusion follows because ‖M‖_X ≤ max_{i=1,...,m} ‖V_i‖ ‖M‖_* ≤ K ‖M‖_*.) This gives a simple bound on the covering numbers:

N( (1/√r) U_2, ‖·‖_X, ε ) ≤ N(B_1, ‖·‖_X, ε) ≤ N(K · B_X, ‖·‖_X, ε).   (71)

This equals 1 when ε ≥ K. So, in equation (69), we can restrict the integral to the interval [0, K].

When ε is small, we will use the following simple bound (equation (5.7) in [23]); this is equation (20):

N(K · B_X, ‖·‖_X, ε) ≤ (1 + 2K/ε)^{2d²}.   (72)

⁴ Note that, for any complex numbers a and b, |a|² − |b|² = ½(ā + b̄)(a − b) + ½(a + b)(ā − b̄), whose absolute value is at most |a + b| · |a − b|.
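Footnote 4's elementary identity, which drives the bound (66), can be verified numerically; the following is a trivial sanity check over random complex pairs (not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Footnote 4: |a|^2 - |b|^2 = (1/2)(conj(a)+conj(b))(a-b) + (1/2)(a+b)(conj(a)-conj(b)),
# and hence | |a|^2 - |b|^2 | <= |a+b| * |a-b|.
max_identity_err = 0.0
max_violation = 0.0
for _ in range(10000):
    a = complex(rng.normal(), rng.normal())
    b = complex(rng.normal(), rng.normal())
    lhs = abs(a) ** 2 - abs(b) ** 2
    rhs = (0.5 * (a.conjugate() + b.conjugate()) * (a - b)
           + 0.5 * (a + b) * (a.conjugate() - b.conjugate()))
    max_identity_err = max(max_identity_err, abs(lhs - rhs))
    max_violation = max(max_violation, abs(lhs) - abs(a + b) * abs(a - b))
print(max_identity_err, max_violation)  # identity holds; inequality never violated
```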
When ε is large, we will use a more sophisticated bound based on Maurey's empirical method and entropy duality, which is due to [16] (see also [17]); this is equation (21):

N(B_1, ‖·‖_X, ε) ≤ exp( C_1² (K²/ε²) log³ d · log m ),  for some constant C_1.   (73)

We defer the proof of (21) to the next section. Here, we proceed to bound the integral in (69). Let A = K/d. For the integral over [0, A], we write

L_1 := ∫_0^A log^{1/2} N( (1/√r) U_2, ‖·‖_X, ε ) dε ≤ ∫_0^A √2 d log^{1/2}(1 + 2K/ε) dε
     ≤ √2 d ∫_0^A ( 1 + log(1 + 2K/ε) ) dε = √2 d · A + √2 d · L'_1,   (74)

where

L'_1 := ∫_0^A log(1 + 2K/ε) dε = ∫_{1/A}^∞ log(1 + 2Ky) dy/y² ≤ ∫_{1/A}^∞ log((A + 2K)y) dy/y²
      = ∫_{1/A}^∞ log(A + 2K) dy/y² + ∫_{1/A}^∞ log y · dy/y².   (75)

Integrating by parts, we get

L'_1 ≤ A log(A + 2K) + A log(1/A) + A = A log(1 + 2K/A) + A,   (76)

and substituting back in,

L_1 ≤ √2 d A ( 2 + log(1 + 2K/A) ) = √2 K ( 2 + log(1 + 2d) ).   (77)

For the integral over [A, K], we write

L_2 := ∫_A^K log^{1/2} N( (1/√r) U_2, ‖·‖_X, ε ) dε ≤ ∫_A^K C_1 (K/ε) log^{3/2} d log^{1/2} m dε
     = C_1 K log^{3/2} d log^{1/2} m · log(K/A) = C_1 K log^{5/2} d log^{1/2} m.   (78)

Finally, substituting into (69), we get

L_0 ≤ 48 √(2π) R √r (L_1 + L_2) ≤ C_4 R √r K log^{5/2} d log^{1/2} m,   (79)

where C_4 is some universal constant. This proves the lemma.

A.3 Proof of Equation (21) (covering numbers of the nuclear-norm ball)

Our result will follow easily from a bound on covering numbers introduced in [16] (where it appears as Lemma 1):

Lemma A.3 Let E be a Banach space, having modulus of convexity of power type 2 with constant λ(E). Let E^* be the dual space, and let T_2(E^*) denote its type-2 constant. Let B_E denote the unit ball in E. Let V_1, ..., V_m ∈ E^*, such that ‖V_j‖_{E^*} ≤ K (for all j). Define the norm on E,

‖M‖_X = max_{j=1,...,m} |(V_j, M)|.
(80)

Then, for any ε > 0,

ε log^{1/2} N(B_E, ‖·‖_X, ε) ≤ C_2 λ(E)² T_2(E^*) K log^{1/2} m,   (81)

where C_2 is some universal constant.

The proof uses entropy duality to reduce the problem to bounding the "dual" covering number. The basic idea is as follows. Let ℓ_p^m denote the complex vector space C^m with the ℓ_p norm. Consider the map S: ℓ_1^m → E^* that takes the j'th coordinate vector to V_j. Let N(S) denote the number of balls in E^* needed to cover the image (under the map S) of the unit ball in ℓ_1^m. We can bound N(S) using Maurey's empirical method. Also define the dual map S^*: E → ℓ_∞^m, and the associated dual covering number N(S^*). Then N(B_E, ‖·‖_X, ε) is related to N(S^*). Finally, N(S) and N(S^*) are related via entropy duality inequalities. See [16] for details.

We will apply this lemma as follows, using the same approach as [17]. Let S_p denote the Banach space consisting of all matrices in C^{d×d} with the Schatten p-norm. Intuitively, we want to set E = S_1 and E^* = S_∞, but this won't work because λ(S_1) is infinite. Instead, we let E = S_p, p = (log d)/(log d − 1), and E^* = S_q, q = log d. Note that ‖M‖_p ≤ ‖M‖_*, hence B_1 ⊆ B_p and

ε log^{1/2} N(B_1, ‖·‖_X, ε) ≤ ε log^{1/2} N(B_p, ‖·‖_X, ε).   (82)

Also, we have λ(E) ≤ 1/√(p−1) = √(log d − 1) and T_2(E^*) ≤ λ(E) ≤ √(log d − 1) (see the Appendix in [17]). Note that ‖M‖_q ≤ e‖M‖, thus we have ‖V_j‖_q ≤ eK (for all j). Then, using the lemma, we have

ε log^{1/2} N(B_p, ‖·‖_X, ε) ≤ C_2 log^{3/2} d · (eK) log^{1/2} m,   (83)

which proves the claim.

B Proof of Proposition 2.4 (recovery of a full-rank matrix)

In this section we will sketch the proof of Proposition 2.4. We use the same argument as Theorem 2.8 in [5], adapted for Pauli (rather than Gaussian) measurements.
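The Schatten-norm facts used in the proof of (21) above, namely ‖M‖_p ≤ ‖M‖_* for p > 1 and ‖M‖_q ≤ d^{1/q} ‖M‖ = e ‖M‖ when q = log d, are easy to confirm numerically. The following is an illustrative sketch (the dimension d = 64 and the random test matrix are arbitrary choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
q = np.log(d)          # E* = S_q with q = log d (about 4.16 here)
p = q / (q - 1.0)      # E = S_p, the Hoelder conjugate exponent

def schatten(M, r):
    """Schatten r-norm: the l_r norm of the singular values."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum(s ** r) ** (1.0 / r))

M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
nuclear = schatten(M, 1.0)
operator = np.linalg.norm(M, 2)

# Schatten norms are non-increasing in the exponent, so ||M||_p <= ||M||_* for p >= 1,
# which is the inclusion B_1 within B_p used in (82).
print(schatten(M, p) <= nuclear)           # True
# ||M||_q <= d^(1/q) ||M||, and d^(1/q) = e exactly when q = log d.
print(schatten(M, q) <= np.e * operator)   # True
```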
A crucial ingredient is the NNQ ("nuclear-norm quotient") property of a sampling operator A, which was introduced in [5] and is analogous to the LQ ("ℓ_1-quotient") property in compressed sensing [26]. We say that a sampling operator A: C^{d×d} → C^m satisfies the NNQ(α) property if

A(B_1) ⊇ α B_2,   (84)

where B_1 is the unit ball of the nuclear norm in C^{d×d}, and B_2 is the unit ball of the ℓ_2 (Euclidean) norm in C^m.

It is easy to see that the Pauli sampling operator A defined in (3) satisfies NNQ(α) with α = √(d/m). (Without loss of generality, suppose that the Pauli matrices S_1, ..., S_m used to construct A are all distinct. Let α = √(d/m) and choose any y ∈ αB_2. Let X = (√m/d) Σ_{i=1}^m y_i S_i, so we have A(X) = y. Observe that ‖X‖_* ≤ √d ‖X‖_F = √(m/d) ‖y‖_2 ≤ 1, as desired.) We remark that this value of α is probably not optimal; if one could prove that A satisfies NNQ(α) with larger α, it would improve the bound in Proposition 2.4.

We will need one more property of A. We want the following to hold: for any fixed matrix M ∈ C^{d×d} (which is not necessarily low-rank), almost all random choices of A will satisfy

‖A(M)‖_2² ≤ 1.5 ‖M‖_F².   (85)

(Note that this inequality is required to hold only for this one particular matrix M.) In our case (random Pauli measurements), it is easy to check that A obeys this property as well.

The proof of Theorem 2.8 in [5] actually implies the following more general statement, about low-rank matrix recovery when A satisfies both RIP and NNQ:

Theorem B.1 Let M be any matrix in C^{d×d}, and let σ_1(M) ≥ σ_2(M) ≥ ··· ≥ σ_d(M) ≥ 0 be its singular values. Write M = M_r + M_c, where M_r contains the r largest singular values of M. Also write M = M_0 + M_e, where M_0 contains only those singular values of M that exceed λ = 16√d σ.
Suppose the sampling operator A: C^{d×d} → C^m satisfies RIP (for rank-r matrices in C^{d×d}), and NNQ(α) with α = μ√(d/m). Furthermore, suppose that A satisfies ‖A(M_c)‖_2² ≤ 1.5 ‖M_c‖_F² and ‖A(M_e)‖_2² ≤ 1.5 ‖M_e‖_F².

Say we observe y = A(M) + z, where z ∼ N(0, σ²I). Let M̂ be the Dantzig selector (4) with λ = 16√d σ, or the Lasso (5) with μ = 32√d σ. Then, with high probability over the choice of A and the noise z,

‖M̂ − M‖_F² ≤ C_0 Σ_{i=1}^r min(σ_i²(M), dσ²) + ( C_1 + C_2 m/(μ² r d) ) Σ_{i=r+1}^d σ_i²(M),   (86)

where C_0, C_1 and C_2 are absolute constants.

To prove Theorem B.1, one follows the proof of Theorem 2.8 in [5]. There is a slight modification to Lemma 3.10 in [5]: one gets the more general bound,

‖M̂ − M‖_F ≤ C_0 λ√r + ( C_1 + (C_2/μ) √(m/(rd)) ) ( ‖A(M_c)‖_2 + ‖M_c‖_F ).   (87)

Combining Theorem B.1 with the preceding facts gives us Proposition 2.4.