Multireference Alignment using Semidefinite Programming

The multireference alignment problem consists of estimating a signal from multiple noisy shifted observations. Inspired by existing Unique-Games approximation algorithms, we provide a semidefinite program (SDP) based relaxation which approximates the…

Authors: Afonso S. B, eira, Moses Charikar

Multireference Alignment using Semidefinite Programming
Multireference Alignmen t using Semidefinite Programming Afonso S. Bandeira ∗ Moses Charik ar † Amit Singer ‡ Andy Zh u § August 27, 2013 Abstract The m ultireference alignmen t problem consists of estimating a signal from multiple noisy shifted observ ations. Inspired by existing Unique-Games appro ximation algorithms, w e provide a semidefi- nite program (SDP) based relaxation whic h approximates the maxim um likelihoo d estimator (MLE) for the m ultireference alignment problem. Although we sho w that the MLE problem is Unique- Games hard to appro ximate within any constant, we observ e that our p oly-time approximation algorithm for the MLE app ears to perform quite well in t ypical instances, outp erforming existing metho ds. In an attempt to explain this b ehavior w e pro vide stability guaran tees for our SDP under a random noise model on the observ ations. This case is more challenging to analyze than traditional semi-random instances of Unique-Games: the noise model is on vertices of a graph and translates in to dependent noise on the edges. In terestingly , w e sho w that if certain positivity constraints in the SDP are dropped, its solution becomes equiv alen t to p erforming phase correlation, a p opular metho d used for pairwise alignment in imaging applications. Finally , we sho w ho w symmetry re- duction techniques from matrix representation theory can simplify the analysis and computation of the SDP , greatly decreasing its computational cost. Keyw ords: Multireference Alignment, Semidefinite Relaxation, Phase Correlation, Unique-Games ∗ Program in Applied and Computational Mathematics (P A CM), Princeton Universit y , Princeton, NJ 08544, USA ( ajsb@math.princeton.edu ). † Departmen t of Computer Science, Princeton Universit y , Princeton, NJ 08544, USA ( moses@cs.princeton.edu ). ‡ Departmen t of Mathematics and P A CM, Princeton Universit y , Princeton, NJ 08544, USA ( amits@math.princeton.edu ). § Departmen t of Mathematics, Princeton Universit y , Princeton, NJ 08544, USA ( azhu@math.princeton.edu ). 1 1 In tro duction The multireference alignment problem is the following: supp ose there is a template vector x ∈ R L , from which are sampled N cyclically-shifted copies with white additiv e Gaussian noise y i = R l i x + ξ i ∈ R L , ξ i = ( ξ ik ) L k =1 ∼ N (0 , σ 2 I L ) i.i.d . (1) for i = 1 , 2 , . . . , N . R l is the index cyclic shift op erator ( x 1 , . . . , x L ) 7→ ( x 1 − ` , . . . , x L − ` ) and σ is a parameter controlling the signal-to-noise (SNR) ratio. Both the template x and the shifts l 1 , . . . , l n are unkno wn. F urthermore, no mo del is presumed a-priori for their distribution. F rom this mo del we w ould like to deduce an accurate estimate for x (up to a global cyclic shift). Such an estimate can b e obtained b y first estimating the shifts, and then a veraging the unshifted observ ations. F or this reason w e will fo cus on the problem of estimating the shifts l 1 , . . . , l n . This problem has a v ast list of applications. Alignmen t is directly used in structural biology [Dia92] [TS12]; radar [ZvdHGG03] [PZAF05]; crystalline simulations [SSK13]; and image registration in a num ber of imp ortan t contexts, such as in geology , medicine, and paleon tology [DM98] [FZB02]. V arious metho ds to solve this problem are used in these comm unities (see App endix A). A na ¨ ıv e approach to estimate the shifts in (1) w ould b e to fix one of the observ ations, sa y y i , as a reference template and align ev ery other y j with it by the shift ρ ij minimizing their distance ρ ij = argmin l ∈ Z L k y j − R l y i k 2 . (2) This solution w orks well at a high signal-to-noise ratio (SNR), but p erforms p o orly at low SNR. A more demo cratic approach w ould b e to calculate all of the pairwise relativ e shift estimates ρ ij b efore attempting to recov er the shifts { l i } . This can b e done b y solving the minimization problem min l 1 ,...,l N ∈ Z L N X i,j =1    e 2 π ıl i /L − e 2 π ıρ ij /L e 2 π ıl j /L    2 . (3) This problem is known as angular synchronization [Sin11, BSS12] and the solution can b e approximated via a SDP-based relaxation. The issue with attempting to solve the alignment problem using either (2) or (3) is that one is ev aluating the performance of a given c hoice of l i , l j for a pair ( i, j ) b y how far l i − l j is from ρ ij , but not taking into account the cost asso ciated with other p ossible relative shifts. Relating R − l i y i and R − l i y j , for example, w ould tak e in to accoun t information ab out all p ossible shifts instead of just the b est one. The quasi maxim um lik eliho o d estimator (Section 2) attempts to do exactly that b y solving the minimization problem: min l 1 ,...,l N ∈ Z L N X i,j =1   R − l i y i − R − l j y j   2 . (4) Finding the MLE (4) is a non-trivial computational task because the parameter space is of exp o- nen tial size, and the likelihoo d function is non-con vex. While one can apply optimization metho ds suc h as gradien t descen t, simulated annealing, and expe ctation-maximization (EM), these are only guaran teed to find lo cal minima of (4), but not the global minimum. In this pap er we tak e a different approac h and prop ose a semidefinite relaxation for the quasi maxim um lik eliho o d problem (4). This particular SDP is inspired by an appro ximation algorithm designed to solve the Unique Games problem [CMM06] (Section 3). 2 Con vex relaxations of hard combinatorial problems hav e seen man y successes in applied mathe- matics. They b ecame particularly p opular in the last decade with the in tro duction of Compressed Sensing, in the seminal work of Donoho, Candes, T ao, and others [CR T06, Don06]. This idea has since b een applied to a v ast list of problems. Semidefinite programming (SDP) has served as a con- v ex surrogate for problems arising in applications such as lo w-rank matrix completion [CR09], phase retriev al [CSV11], Robust PCA [LMTZ12], m ultiple-input m ultiple-output (MIMO) c hannel detec- tion [MCS10], and many others. In many of these applications the same phenomenon is presen t: for t ypical instances, solving the conv ex problem is often equiv alent to solving the original combinatorial problem [ALMT13]. Con vex relaxations (and, in particular SDP based relaxations) also pla y a cen tral role in the design of appro ximation algorithms in theoretical computer science. Almost tw o decades ago, Goemans and Williamson [GW95] proposed a SDP based approximation algorithm for the MAX-CUT problem with appro ximation ratio α GW ≈ 0 . 878. That is, for any instance of the problem, the computed solution is guaranteed to provide p erformance (in this case, a cut) at least α GW of the optim um. Man y semidefinite relaxations hav e since b een prop osed as appro ximation algorithms for a long list of NP- hard problems [WS11]. In order to better understand the theoretical limitations of approximation algorithms, substantial w ork has b een done to establish limits on the appro ximation ratios achiev able b y p oly-time algo- rithms for certain NP -hard problems (hardness of approximation). The Unique Games conjecture b y Khot [Kho02] is cen tral to many recen t developmen ts: F or δ , ε > 0, it is imp ossible for a p olynomial- time algorithm to distinguish betw een δ -satisfiable and (1 − ε )-satisfiable Unique-Games instances. A Unique-Games instance consists of a graph along with a p erm utations for eac h edge. The problem is to c ho ose the best assignmen t of labels to eac h v ertex suc h that as man y of the edge permutations are sat- isfied. The v alidit y of the UGC would imply the optimalit y of certain p oly-time approximation ratios, in particular the Go emans-Williamson constan t α GW for the MAX-CUT problem [GW95, KKMO07]. The b est kno wn p olynomial time approximation to the unique games problem [CMM06] is based on an SDP relaxation of a formulation that uses indicator v ariables. It is quite different from SDPs normally used in applications (suc h as those describ ed ab o ve). In particular, the v ariable matrix has size N L × N L and Ω  N 2 L 2  constrain ts. W e adapt this SDP to approximate the quasi maximum lik eliho o d problem (4). As w e sho w that it is Unique-Games hard to appro ximate (4) within an y constan t (see Section 2) it is hop eless to aim for go o d guaran tees for general instances. How ev er worst case analysis is often too p essimistic and not indicative of p erformance observed in practice. In fact, under the random noise mo del we ha ve for the observ ations, numerical sim ulations suggest that the SDP relaxation p erforms remark ably well, seeming to outperform existing metho ds. In an attempt to explain this phenomenon w e show that our SDP is stable at high SNR lev els and that it is tight at extremely high SNR levels. By stabilit y , w e mean that with high probability the solution to the SDP does not deviate m uch from the true solution (see Theorem 4.2). The stability for this SDP is particularly in teresting as it is more c hallenging to analyze than random instances of Unique-Games [AKK + 08, KT07], since our noise mo del is on the vertices, which translates into dep enden t noise on the edges. Still, these results fall short of prop erly explaining the remark able p erformance that w e see in sim ulations and more researc h is needed tow ards understanding the typical b eha vior of this SDP . In order to simplify the SDP we also study a version with few er constraints. Interestingly , this w eaker SDP can be solved explicitly and is equiv alent to the pairwise alignmen t method called phase correlation [HG84]. This metho d do es not take in to account information b etw een all pairs of mea- suremen ts, whic h suggests that the full complexity of the Unique-Games SDP [CMM06] is needed to 3 obtain a go o d approximation to (4). The fact that a global shift do es not affect the solution to (4) creates symmetries in our SDP relaxation. In fact, w e leverage suc h structure by using symmetry reduction techniques from matrix represen tation theory to simplify the analysis and computation of the SDP , greatly decreasing its com- putational cost. This is quite useful giv en the high computational cost of semidefinite programming. Con tributions: Our main contribution is applying tec hniques from theoretical Computer Science to a problem in applied Math. W e introduce an Unique Games st yle SDP relaxation for the alignmen t problem that is nov el for the applied Math comm unity . F rom the theoretical Computer Scie nce point of view, w e in tro duce a new problem that has a similar fla v or to the Unique Games problem – in fact w e show that the worst case version is at least as hard as Unique Games. W e in tro duce a natural a verage case v ersion of this alignment problem - aligning several shifted copies of a signal corrupted b y indep enden t Gaussian noise. Existing analyses of semi-random mo dels of Unique Games do not seem to apply to this problem. W e show that for sufficiently high SNR, the SDP solution is close to an integer solution – this is a first step to establishing a signal recov ery result which we lea ve as an op en problem. W e b elieve that future inv estigations into this problem will yield interesting insigh ts in to the Unique Games SDP and on dealing with correlated noise in a verage case analysis. 2 Quasi Maxim um Lik eliho o d Estimator The log likelihoo d function for the mo del (1) is L ( x, l 1 , . . . , l N | y 1 , . . . , y N ) = N 2 log(2 π ) − 1 2 σ X i ∈ [ N ] k R − l i y i − x k 2 . (5) Maximizing L is equiv alent to minimizing the sum of squared residuals P k R − l i y i − x k 2 . Fixing the l i ’s, the minimal v alue of L o ccurs at the av erage x = 1 N P N i =1 R − l i y i . Making the tame assumption that k x k 2 is estimable (the norm is shift-inv ariant), maximizing (5) is th us equiv alen t to maximizing the sum of the inner pro ducts  R − l i y i , R − l j y j  across all pairs ( i, j ). Thus w e consider the estimator b ` = argmax ` ∈ Z N L X i,j ∈ [ N ] h R − l i y i , R − l j y j i (6) Unfortunately , the searc h space for this optimization problem has exponential size. Indeed, assuming no model for the v ectors { y i } , it is NP -hard to find the shifts whic h maximize (6), or even estimate it within a close constant factor. Theorem 2.1 It is NP -har d (under r andomize d r e ductions) to find a set of lab els appr oximating (6) within 16 / 17 + ε of its optimum value. It is UG -har d (under r andomize d r e ductions) to appr oximate (6) within any c onstant factor. Pro of. (outline) W e giv e a randomized reduction from the class of Γ -MAX-2LIN ( q ) instances consisting of a set of 2 v ariable linear equations of the form x i − x j ≡ c ij (mo d q ), with the goal of choosing an assignment for the v ariables whic h maximizes the num b er of satisfied equations. W e construct a v ector y k for every v ariable x k suc h that shifts of y k corresp ond to an assignment to x k . W e pick a random vector z ij corresp onding to a constrain t on v ariables x i , x j and place a cop y of z ij at sp ecific lo cations in y i and y j . Shifts of y i , y j corresp onding to satisfying assignmen ts of the constrain t results in a sup erposition of the copies of z ij . W e choose parameters so that the only non-trivial contributions 4 to the ob jectiv e function (6) come from suc h sup erp osition. The v alue of the ob jective is (within small error) a scaled version of the n umber of constraints of the Γ -MAX-2LIN ( q ) instance satisfied b y the assignmen t corresp onding to the shifts. Thus hardness results for Γ -MAX-2LIN ( q ) directly translate to hardness results for the alignmen t problem. The details are given in Appendix C. The discrete optimization problem (6) may b e formulated using indicator v ariables as an integer programming problem argmax { u ik } N X i,j =1 X k,l ∈ Z L u ik u j l h R − k y i , R − l y j i , (7) where u ik ∈ { 0 , 1 } is the indicator v ariable u ik = δ { l i ≡ k } . View U ik ; j l = u ik u j l as an en try of the Gram matrix U ∈ R N L × N L , and C ik ; j l = h R − k y i , R − l y j i as an entry of the data Gram matrix C ∈ R N L × N L . (7) can b e written as the maximization problem tr( C U ) sub ject to rank( U ) = 1, and an y other constraints needed to enforce that U is a Gram matrix of indicator v ariables. This suggests the p ossibilit y of a sp ectral rounding algorithm, where w e try to determine the indicator v ariables b y examining the top eigenv ectors of C . Lemma 2.2 The data Gr am matrix C with entries C ik ; j l = h R − k y i , R − l y j i satisfies: 1. C  0 and has r ank L , with non-zer o eigenvalues λ k = L P N i =1 |F ( Y i , k ) | 2 . 2. Ther e is a unitary matrix P for which P C P ∗ = diag( C 0 , . . . , C L − 1 ) is blo ck diagonal, wher e e ach C k ∈ C N × N is a r ank 1 matrix. Pro of. Refer to Lemma B.1. F or random signals x , Lemma 2.2 indicates that the sp ectral gap b etw een the top L eigen v alues and the remaining eigen v alues of C will b e large, on the order of min k P i |F ( Y i , k ) | 2 = Ω( LN ). Hence w e ma y w an t to try and recov er a solution to 7 b y examining the eigenv ectors asso ciated with the top eigen v alues in the sp ectrum of C . In particular, in the noiseless case with σ = 0, the indicator vector 1 { ik : k ≡ l i } ∈ R N L lies in the span of the top L eigenv ectors of C . Unfortunately , this sp ectral gap will not b e apparen t for a large class of signals. As long as a single p o wer sp ectra |F ( x, k ) | 2 of x is near zero, the corresp onding eigenv alue λ k will separate less from the small eigenv alues of C , and hence the space of the top L eigen vectors of C will contain less relev an t information. F or this reason, it would only b e meaningful for us to characterize the SNR by the spectral gap min k |F ( x, k ) | 2 . F urthermore, our simulations suggest that reco very from a sp ectral relaxation p erforms worse than the semidefinite relaxation we are ab out to prop ose. 3 Semidefinite relaxation The quadratic form constraints X k,l ∈ Z L u ik u j l = 1 , i, j ∈ [ N ] u ik u il = 0 , i ∈ [ N ] , k 6 = l ∈ Z L u ik u j l ≥ 0 , i, j ∈ [ N ] , k , l ∈ Z L . 5 on u ik ∈ R enforce that the u ik ’s are indicator v ariables (up to global sign, which cannot b e fixed b y quadratic constraints). The global shift am biguity of the problem guarantees the existence of L differen t solutions. In fact, we will attempt to find a candidate solution in the span of lifted versions of these. This can b e more succin tly written as an optimization problem for a rank L matrix V : max V tr( C V ) (8) sub ject to C , V ∈ R N L × N L , C ik ; j l = h R − k y i , R − l y j i X k,l V ik ; j l = 1 , V ik ; il = 0 , V ≥ 0 for k 6 = l , V  0 . Of the imp osed constraints on V , the only non-con vex constraint is that of rank deficiency , which ob- structs the use of con v ex programming tec hniques. Removing this rank constrain t yields a semidefinite program. This SDP is extremely similar to (and motiv ated by) SDPs commonly used to approximate solutions to certain constrain t satisfaction problems ( CSP s), notably Unique-Games instances. An Unique-Games instance consists of a graph G = ([ N ] , E ), a lab el set Z L , and a set of p ermutations π ij : Z L → Z L . The problem is to c ho ose the b est assignmen t of lab els to eac h vertex such that as man y of the p ermutations ( π ij ) ( i,j ) ∈ E are satisfied. Γ -MAX-2LIN( L ) is sp ecial case where the p erm utations π ij are cyclic. In our notation, the SDP studied for Unique-Games is usually of the form max V 1 2 tr( C V ) (9) sub ject to C , V ∈ R N L × N L , C ik ; j l = δ { l = π ij ( k ) } , X k V ik ; ik = 1 , V ik ; il = 0 for k 6 = l , V ≥ 0 , V  0 . This formulation attempts to count the num b er of satisfied edge constrain ts for an Unique-Games instance. One can treat the SDP (8) as an instance of Γ -MAX-2LIN( L ) on a weigh ted complete graph, with each cyclic p erm utation weigh ted b y C ik ; j l = h R − l i y i , R − l j y j i . In this context, the matrix C is dubb ed the label-extended adjacency matrix [Kol10]. Th us the significant b o dy of literature conducted on Unique-Games may b e useful in understanding (8). Another common feature of the aligmen t SDP with Γ -MAX-2LIN( L ) instances is that the assigned lab els ma y b e chosen upto cyclic symmetry . This induces a block circulant symmetry in the semidefinite program (see Lemma D.5). F or example, this symmetry will allow us to presume that V ik ; ik = 1 /L from the constraint P k V ik ; ik = 1. A ma jor feature of (8) which do es not feature in the study of Unique-Games is the structure of the data co efficient matrix C . While Unique-Games sp ecifies constrain ts on edges of a graph (there are N 2 pieces of information), the alignment problem only sp ecifies information on its vertices ( N pieces of information). This do es assist our understanding of the semidefinite program, since it enables us to apply more symmetry conditions, but it also significantly complicates our analysis. An in terpretation of the constrain t V ≥ 0 is that it enforces triangle inequalit y constraints k v ik − 0 k + k 0 − v j l k ≥ k v ik − v j l k [CMM06]. As we see in the next section, as well as from empirical results, the constrain t V ≥ 0 causes the SDP solution matrix to stabilize more around in tegral instances. In terestingly , without this p ositivity constrain t, the SDP may b e solved in closed form, and is effectiv ely equiv alen t to pairwise phase correlation. Theorem 3.1 L et V b e a solution of (9) without the p ositivity c onstr aints. Then, V has r ank L , c orr esp onding to one eigensp ac e of eigenvalue N/L . This eigensp ac e c ontains the ve ctor v phase ∈ R N L 6 satisfying n v phase il o l = F ∗  F ( y 1 , l ) F ∗ ( y i , l ) |F ( y 1 , l ) F ∗ ( y i , l ) |  l which is the c onc atenation of phase c orr elation ve ctors (se e App endix A) b etwe en y 1 and y i . Pro of. By symmetry , assume without loss of generality that V is blo ck circulan t (see Lemma D.5). Let P b e the unitary matrix defined in Lemma 2.2 and e  kl L  denote the classical F ourier basis function e ( x ) = e 2 π ix . Then, P V P ∗ = diag ( V 0 , . . . , V L − 1 ) where V k = P L − 1 l =0 e  kl L  V l ∈ C N × N is p ositiv e semidefinite. The SDP constraints V ∈ R N L × N L , V  0 , V ik ; il = 0 for k 6 = l , X k,l V ik ; j l = 1 are resp ectiv ely equiv alent to the F ourier side constrain ts V k = V L − k , V k  0 , ( V k ) ii = 1 /L, ( V 0 ) ij = 1 /L. (10) Since V k  0, the absolute v alue of each entry is bounded b y the maxim um of its diagonal en tries, so eac h entry has magnitude at most 1 /L . Hence tr( C V ) = X k ∈ Z L tr( C k V k ) ≤ 1 L X k ∈ Z L X ij ∈ [ N ] | ( C k ) ij | , (11) with equality o ccuring when V k has the en tries of C k , normalized to magnitude 1 /L . Then ( V k ) ij = F ( y i ,l ) F ∗ ( y j ,l ) |F ( y i ,l ) F ∗ ( y j ,l ) | , and a basis of the non-trivial eigenspace of V is giv en by W := P ∗ (( V k ) i, 1 ) i,k =  e ( − kl L )( V k ) i, 1  ik ; l Note the selection of the observ ation 1 is arbitrary . The vector v phase ∈ R N L giv en by v phase ik := ( W 1 N × 1 ) ik = " F ∗  F ( y 1 , l ) F ∗ ( y i , l ) |F ( y 1 , l ) F ∗ ( y i , l ) |  L − 1 l =0 # k lies in the image of W and it is easy to see that the L v ectors W phase ∈ R N L × L generated b y circulating the N blo c ks of L en tries of v phase are linearly indep endent. Indeed, the top L × L blo c k of W phase is the identit y matrix I L . F rom the SDP solution, one must round back to a solution in the original search space. There is a significan t b o dy of literature on the topic of rounding the solutions to v arious SDPs for Unique Games. The analysis and guaran tees for these rounding sc hemes are in terms of the num ber of constraints satisfied b y the solution pro duced and do not immediately give a result ab out signal recov ery in our setting. In their study of semi-random instances of Unique-Games , the authors of [KMM11] giv e a rounding technique that uses b oth SDP and LP solutions when the SDP is known to b e somewhat sparse – this is similar to a condition we obtain in the following stability section. Exploiting these ideas to establish an exact signal recov ery guaran tee is an in teresting op en problem. 7 4 Stabilit y F or simplicit y , without loss of generality align the ground truth shifts of the observ ations, so that y i = x + ξ i , where ξ i ∼ N (0 , σ 2 I L ) i.i.d. The ground truth in tegral instance for the SDP will b e (upto blo c k cyclic symmetry) the indicator matrix V int ∈ R N L × N L , defined as V int ik ; j l = δ k = l /L . If there is a set of lab els ` = ( l i ) i whic h are pairwise optimal, in the sense that h R − l i y i , R − l j y j i = max k h y i , R − k y j i for all pairs ( i, j ), the SDP (8) will produce the integral instance. While this may be lik ely in a v ery high signal-to-noise setting, it is a rather stringen t condition. W e relax this condition and show that the SDP solution m ust still resem ble the integral instance at a reasonably high SNR. T he exact definition for the SNR will b e deferred for later, but will b e characterized in terms of the gap ∆ = k x k 2 − max l 6 =0 h x, R l x i . (12) F or any V ∈ R N L × N L lie in the SDP feasibilit y region, we can characterize the distance of V from the integral instance by the differences D ij = X k 6 = l ∈ Z L V ik ; j l = 1 − X k ∈ Z L V ik ; j k ∈ [0 , 1] . D ij is a measure of how m uch the SDP would w eight shift preferences other than the ground truth. When all D ij = 0 we obtain the integral instance. F or con venience, we make the definitions ξ 0 = 2 x and η ij = 2 max l |h R l ξ i , ξ j i| ≥ 0 for i, j = 0 , 1 , . . . , N . The follo wing lemma demonstrates that the η ij ’s can b e treated as worst-case b ounds on the noise terms of the en tries of the data Gram matrix C . Lemma 4.1 If tr( C V ) ≥ tr( C V int ) , then P i 6 = j (∆ − η ij − η j 0 ) D ij ≤ 0 . Pro of. W e can find a blo c k circulant matrix which attains the same SDP ob jectiv e v alue as V , so without loss of generalit y presume V is blo ck circulan t. Hence P k V ik ; j 0 = 1 /L and D ij = 1 − LV i 0; j 0 . Expanding, 0 ≥ tr( C V int ) − tr( C V ) = X i,j X k h R − k y i , y j i ( δ k =0 − LV ik ; j 0 ) ≥ X i,j  h y i , y j i (1 − LV i 0; j 0 ) − max l 6 =0 h R l y i , y j i X k 6 =0 LV ik ; j 0  = X i,j  h y i , y j i − max l 6 =0 h R l y i , y j i  D ij . The second inequality requires the SDP constrain t V ≥ 0. The p essimistic b ound h y i , y j i − max l 6 =0 h R l y i , y j i ≥ k x k 2 + h x, ξ i i + h x, ξ j i + h ξ i , ξ j i − max l 6 =0 h x, R l x i − max l 6 =0 h R l x, ξ i i − max l 6 =0 h R l x, ξ j i − max l 6 =0 h R l ξ i , ξ j i ≥ ∆ − ( η i 0 / 2 + η j 0 / 2 + η ij ) , and rearrangement gives the desired inequality . 8 Theorem 4.2 With pr ob ability 1 − e − N + o ( N ) , the solution to the SDP satisfies X i,j D ij ≤ ( k x k + σ 2 √ L ) · 12 log eL ∆ · N 2 . Pro of. F or sufficiently high SNR, we would exp ect that the inequality in Lemma 4.1 would fail to hold. Indeed, by Lemma E.2, the inequality X i 6 = j ( η ij + η j 0 ) D ij ≤ O (log L ) ·  2 k x k + 1 N X i k ξ i k  N 2 , (13) holds with probabilit y at least 1 − e − N + O (log N ) . It arises from tail b ounds on the sum of sligh tly dep enden t random v ariables, and is independent of the structure of the SDP (as opp osed to Lemma 4.1). Combined with Lemma 4.1, w e can obtain a guarantee on the deviation b etw een the SDP solution and the integral instance. The full pro of ma y b e found follo wing Theorem E.3. Theorem 4.2 indicates that at a sufficien tly high SNR, the UG-based SDP will produce a matrix V , of whic h eac h L × L blo c k will hav e high small v alues outside of the main diagonal. A rounding sc heme w ould likely in terpret this as the identit y shift b eing the optimal shift. This motiv ates us to choose a definition for the SNR to b e along the lines of S N R = ∆ / [( k x k + σ 2 √ L ) log L ] (for reference, for random signals x , w e w ould exp ect that ∆ = L − O ( √ L )). This c haracterization of SNR is significan tly more lenient than that for sp ectral relaxation, in Section 2. F or example, a sum of a few sin usoids will ha ve several small p ow er sp ectra, but if the least common multiple of their p erio ds do es not divide L , then ∆ will still b e large. Note furthermore that we may b e more flexible in our definition for ∆: for example, supp ose there is a set of shifts ` ∗ (including the identit y elemen t) of size | ` ∗ | = O (1) for which max l ∈ ` ∗ h x, R l x i > ∆ + min l / ∈ ` ∗ h x, R l x i . The ab ov e results may b e mo dified to describ e the concen tration of the SDP solution on the en tries of the shifts in ` ∗ (not just along the iden tity shift) for each pair of observ ations y i , y j . This result ma y b e strengthened in other manners. F or constants 0 < δ, ε  1, we can attain a tigh ter concentration condition of the form X D ij ≥ (1 + δ ) 2 N q X D 2 ij /S N R + 2 εN 2 (14) instead of the current condition P D ij ≥ N 2 /S N R . The argument is based on the analysis of the SDP for adversarial semi-random Unique-Games instances by [KMM11]. How ev er, the pro of must b e more n uanced in our case, due to correlations in the noise mo del of C , caused b y the smaller source of randomness av ailable to us. Hence we omit the full details, but provide a sketc h b elow. An y SDP feasible matrix V can alternatively b e represented as V ik ; j l = h v ik , v j l i /L for a set of unit v ectors v ik ∈ R N L . With this notation, 2 D ij = 2(1 − LV i 0; j 0 ) = k v i 0 − v j 0 k 2 . The space of these unit v ectors can b e appro ximately discretized by a random pro jection due to the Johnson-Lindenstrauss lemma. Precisely , Lemma E.4 pro vides a set of unit vectors N of size | N | = exp  O  δ − 2 log 2 (1 /ε )  suc h that the set of unit vectors { v i 0 } N i =1 can b e approximated under a random pro jection ϕ by an N -tuple in N . Sp ecifically , if D ϕ ij = k ϕ ( v i 0 ) − ϕ ( v j 0 ) k 2 / 2, then the D ij ’s lie within an α J L -ball of D ϕ ij , this ball b eing defined as { D : α − 1 J L ( D ϕ ) ≤ D ≤ α J L ( D ϕ ) } for α J L ( D ) = (1 + δ ) D + ε . Instead of finding a global tail b ound in Lemma E.2, we can deriv e a lo cal tail b ound for each α J L - ball of D ϕ ij ’s, each tail b ound holding with probability at least 1 − 2 N e − t 2 N . F or 0 < δ, ε = ε 0  1 constan ts, the num b er of N -tuples of v ectors in N is of size exp( O ( N )). With a sufficien tly large 9 constan t t , the lo cal tail b ound ma y b e union b ounded across all balls of vectors in N N , and thus will hold for all SDP-feasible V . With some care, this argument would yield a concen tration condition of the form (14). 5 Numerical Results Figure 1: Av erages of errors of several alignmen t metho ds across 500 iterations, with parameters σ = 1 , L = 5. W e implemen ted several baseline metho ds discussed th us far, and plotted their av erage error p erformance across 500 iterations in Figure 1. F or eac h iteration, w e c hose a signal x randomly from the distribution N (0 , I L ), as w ell as N i.i.d noise vectors ξ i ∼ N (0 , σ 2 I L ), and apply eac h of our metho ds. These simulations confirm our intuition that the UG-based SDP p erforms b etter than other b enc hmark methods. In particular, they suggest that the UG-based SDP is highly stable around in tegral instances. The implementation using bispectrum-like inv ariants is discussed in App endix A. F or each of the other pro cedures, we construct a N L × L matrix W recording alignment preferences. F or cross- or phase correlation, let W ik ; l b e the ( k − l )th entry of the cross- or phase correlation vector b etw een y 1 and y i . F or the sp ectral rounding off the Gram matrix C , and the solution of the semidefinite program V , let W b e the top L eigen vectors of the resp ectiv e matrix. The shifts are read off this matrix, and the un-shifted y i are av eraged to pro duce an estimate for x . The first plot (a) sho ws the difference b et ween this estimate and x . An imp ortan t problem not very elab orated in this pap er is how to determine the b est shifts from W . A natural metho d is to iden tify an indicator vector 1 { ik : l i ≡ k } lying close to the column span of W , for some lab elling ` = ( l i ) i . Curren tly , our pro cedure is to apply a linear transformation to W suc h that the top L × L blo ck of W is the identit y matrix I L (although there is no reason to b eliev e this rounding is robust, it is sufficien t for the purp ose of having an easy b enc hmark b etw een multiple metho ds). Then, for each i , the maximal entry in the first column of W giv es the c hoice of shift for y i . The second plot shows the distance this indicator v ector lies from the column span of W . Let W ⊥ ∈ R N L − L × N L b e the matrix whose rows form the orthogonal complemen t of the ro ws of W T , and u ∈ R N L denote our indicator vector. Then u should b e a sparse v ector near the nullspace of W ⊥ . If we knew one of the non-zero co ordinates of u , sa y co ordinate i , we could recov er the rest of u 10 as a sparse solution to W ⊥ − i u − i = − W ⊥ i , where W ⊥ − i is the submatrix obtained from W ⊥ b y removing the column W ⊥ i . The problem can then b e solv ed by ` 1 -minimization, as suggested b y the theory of sparse recov ery . Although normally , we would b e required to run this pro cedure for ev ery co ordinate (to make sure we pick an active one) the symmetries in W imply the existence of L suc h indicator v ariables and that an y co ordinate we pick will b e active in one of them. 6 Generalizations and F uture W ork It is worth noting that the discrete multireference alignment problem naturally generalizes from shifts o ver Z L to actions of finite groups G ov er finite spaces S . In this case, the analogue of phase correlation [Lon10] is defined in terms of the generalized F ourier transform F G,S ( · , ρ ) : C S → C d ρ × d ρ , where ρ : G → C d ρ × d ρ are irreducible matrix represen tations. The Unique Games SDP may also b e generalized in a natural manner, in rough analogy with SDP v arian ts studied for Γ- MAX-2LIN instances. In the case of an ab elian group G = S , the proof of Theorem 3.1 follows in the same fashion, and hence the alignmen t UG-based SDP without p ositivity constraints will remain equiv alent to phase correlation. The symmetry of the SDP in this case can b e described naturally by the theory of C ∗ -matrix algebras (refer to App endix D). Symmetries in semidefinite programs can b e written with resp ect to linear combinations of { 0 , 1 } matrices, which form a basis of a ∗ -matrix algebra. Under sufficiently nice conditions, this basis can b e blo ck diagonalized, for example by W edderburn’s decomp osition or b y regular ∗ -represen tations [Mur07]. Hence, the ∗ -matrix algebra can b e represented by a low er- dimensional blo c k diagonalized v ersion. F rom a computational p ersp ectiv e, this can mak e highly symmetric SDPs m uch more amenable to sparse SDP solvers [DdK11]. In our case, diagonalization b y the blo ck DFT allo ws the SDP to b e rewritten as an optimization problem across L PSD matrices of size N × N instead of one PSD matrix of size N L × N L . It would b e interesting to see how the p erformance of the SDP c hanges when we study the alignment problem across more difficult groups, esp ecially non-ab elian ones, which arise in sev eral applications. The numerical simulations in Section 5 suggest that the UG-based SDP achiev es exact recov ery with high probability for sufficiently high SNR. That is, the resulting SDP matrix is in tegral, so by solving the SDP we are indeed obtaining the solution to the quasi-ML estimator. Indeed, as the SNR decreases, there appears to b e a phase transition during which the SDP almost alwa ys recov ers an in tegral solution. Our analysis of the stability of the semidefinite program do es not fully explain this phenomenon. The authors b elieve this to b e an interesting direction for future w ork, esp ecially since guaran tees of exact reco very are attainable in high SNR settings for a few semidefinite relaxations, for example for the MIMO problem [MCS10]. Another imp ortant question is to understand the sample complexity of our approac h to the align- men t problem. Since the ob jectiv e is to recov er the underlying signal x , having a larger num b er of observ ations N should give us b etter reco very . The question can b e formalized as: for a given v alue of L and σ , how large do es N need to b e in order to hav e reasonably accurate recov ery? The sample complexit y of metho ds like the bisp ectral inv ariants would b e exp ected to require N = Ω( σ 2 L 2 log L ) observ ations. W e would hypothesize on the strength of our numerical results that the UG-based SDP requires fewer observ ations for meaningful recov ery , and establishing this is an interesting op en problem. Along with expanding the domain of the alignment problem, it would b e interesting to attempt the style of analysis discussed in this pap er for other maximum likelihoo d problems. Maxim um likeli- ho o d estimators pla y an imp ortant role in many estimation problems, but often (as in our problem) 11 computing or approximating the MLE is a c hallenging problem and semidefinite programming could pro vide a tractable alternativ e in an a verage case setting. Ac kno wledgemen ts The authors thank Y utong Chen for v aluable assistance with the implementation of our algorithm. A. S. Bandeira w as supported b y AF OSR Gran t No. F A9550-12-1-0317. M. Charik ar was supp orted by NSF grants CCF 0832797, AF 0916218 and AF 1218687. A. Singer was partially supp orted by Award Num b er F A9550-12-1-0317 and F A9550-13-1-0076 from AF OSR, by Aw ard Num b er R01GM090200 from the NIGMS, and by Award Number L TR DTD 06-05-2012 from the Simons F oundation. Parts of this work hav e app eared in A. Zh u’s undergraduate thesis at Princeton Universit y . References [AKK + 08] S. Arora, S. A. Khot, A. Kolla, D. Steurer, M. T ulsiani, and N. K. Vishnoi. Unique games on expanding constrain t graphs are easy: extended abstract. In Pr o c e e dings of the 40th annual ACM symp osium on The ory of c omputing , STOC ’08, pages 21–28, New Y ork, NY, USA, 2008. A CM. [ALMT13] D. Amelunxen, M. Lotz, M. B. McCoy , and J. A. T ropp. Living on the edge: A geometric theory of phase transitions in conv ex optimization. Available online at [cs.IT] , 2013. [BSS12] A. S. Bandeira, A. Singer, and D. Spielman. A Cheeger inequality for the graph con- nection laplacian. available online , 2012. [CMM06] M. Charik ar, K. Mak aryc hev, and Y. Mak aryc hev. Near-optimal algorithms for unique games. Pr o c e e dings of the 38th A CM Symp osium on The ory of Computing , 2006. [CR09] E. Cands and B. Rech t. Exact matrix completion via con vex optimization. F oundations of Computational Mathematics , 9(6):717–772, 2009. [CR T06] E. J. Cand ` es, J. Rom b erg, and T. T ao. Stable signal recov ery from incomplete and inaccurate measurements. Comm. Pur e Appl. Math. , 59:1207–1223, 2006. [CSV11] E. J. Candes, T. Strohmer, and V. V oroninski. Phaselift: exact and stable signal reco very from magnitude measurements via conv ex programming. Communic ations on Pur e and Applie d Mathematics , 2011. [DdK11] C. Dobre and E. de Klerk. Semidefinite programming approaches for structured combi- natorial optimization problems, 2011. [Dia92] R. Diamond. On the m ultiple sim ultaneous sup erp osition of molecular structures by rigid b o dy transformations. Pr otein Scienc e , 1(10):12791287, Octob er 1992. [DM98] I. Dryden and K. Mardia. Statistic al shap e analysis . Wiley series in probability and statistics. Wiley , Chichester [u.a.], 1998. [Don06] D. L. Donoho. Compressed sensing. IEEE T r ans. Inform. The ory , 52:1289–1306, 2006. 12 [FZB02] H. F oro osh, J. B. Zerubia, and M. Bertho d. Extension of phase correlation to subpixel registration. IEEE T r ansactions on Image Pr o c essing , 11(3):188–200, 2002. [GW95] M. X. Goemans and D. P . Williamson. Improv ed apprximation algorithms for maxim um cut and satisfiabilit y problems using semidefine programming. Journal of the A sso ciation for Computing Machinery , 42:1115–1145, 1995. [HG84] J. L. Horner and P . D. Gianino. Phase-only matc hed filtering. Appl. Opt. , 23(6):812–816, Mar 1984. [KDM95] P . Kosir, R. DeW all, and R. Mitchell. A m ultiple measurement approac h for feature alignmen t. In A er osp ac e and Ele ctr onics Confer enc e, 1995. NAECON 1995., Pr o c e e dings of the IEEE 1995 National , volume 1, pages 94–101 vol.1, 1995. [Kho02] S. Khot. On the p o wer of unique 2-prov er 1-round games. Pr o c e e dings of the 34th Annual A CM Symp osium on The ory of Computing , pages 767–775, 2002. [KI93] R. Kak arala and G. J. Iv erson. Uniqueness of results for m ultiple correlations of perio dic functions. J. Opt. So c. A m. A , 10(7):1517–1528, Jul 1993. [KKMO07] S. Khot, G. Kindler, E. Mossel, and R. O’Donnell. Optimal inapproximabilit y results for max-cut and other 2-v ariable csps? SIAM J. Comput. , 37(1):319–357, 2007. [KMM11] A. Kolla, K. Mak aryc hev, and Y. Mak aryc hev. Ho w to pla y unique games against a semi-random adversary: Study of semi-random mo dels of unique games. F oundations of Computer Scienc e, IEEE Annual Symp osium on , pages 443–452, 2011. [Kol10] A. Kolla. Sp ectral algorithms for unique games. 25th Annual IEEE Confer enc e on Computational Complexity , pages 122–130, 2010. [KT07] A. Kolla and M. T ulsiani. Playing random and expanding unique games. Unpublished, 2007. [LM00] B. Laurent and P . Massart. Adaptive estimation of a quadratic functional b y model selection. A nn. Statist. , 2000. [LMTZ12] G. Lerman, M. B. McCo y , J. A. T ropp, and T. Zhang. Robust computation of linear mo dels, or how to find a needle in a haystac k. CoRR , abs/1202.4044, 2012. [Lon10] A. Loneland. Non-comm utative harmonic analysis: Generalization of phase correlation to the euclidean motion group. Master’s thesis, Universit y of Bergen, June 2010. [LX C04] W. Ligong, L. Xueliang, and H. C. Eigen v alues of a sp ecial kind of symmetric block circulan t matrices. Applie d Math J. Chinese University Series B , 19(1):17–26, 2004. [Mau03] A. Maurer. A b ound on the deviation probabilit y for sums of non-negativ e random v ariables. J. Ine qualities in Pur e and Applie d Mathematics , 4(1):15, 2003. [MCS10] A. Man-Cho So. Probabilistic analysis of the semidefinite relaxation detector in digital comm unications. In Pr o c e e dings of the Twenty-First Annual A CM-SIAM Symp osium on Discr ete Algorithms , SODA ’10, pages 698–711, Philadelphia, P A, USA, 2010. So ciety for Industrial and Applied Mathematics. 13 [Mur07] K. Murota. A n umerical algorithm for blo ck-diagonal decomposition of matrix *-algebras with application to semidefinite programming. Jap an Journal of Industrial and Applie d Mathematics , 27(1), 2007. [PZAF05] R. G. Pita, M. R. Zurera, P . J. Amores, and F. L. F erreras. Using m ultilay er perceptrons to align high range resolution radar signals. In W. Duc h, J. Kacprzyk, E. Oja, and S. Zadrozny , editors, Artificial Neur al Networks: F ormal Mo dels and Their Applic ations - ICANN 2005 , v olume 3697 of L e ctur e Notes in Computer Scienc e , pages 911–916. Springer Berlin Heidelb erg, 2005. [SG92] B. M. Sadler and G. B. Giannakis. Shift- and rotation-inv ariant ob ject reconstruction using the bisp ectrum. J. Opt. So c. Am. A , 9(1):57–69, Jan 1992. [Sin11] A. Singer. Angular sync hronization by eigen vectors and semidefinite programming. Ap- plie d and Computational Harmonic Analysis , 30(1):20 – 36, 2011. [SSK13] B. Sonday , A. Singer, and I. G. Kevrekidis. Noisy dynamic simulations in the presence of symmetry: Data alignmen t and mo del reduction. Computers & Mathematics with Applic ations , 65(10):1535 – 1557, 2013. [TS12] D. L. Theobald and P . A. Steindel. Optimal simultaneous sup erp ositioning of multiple structures with missing data. Bioinformatics , 28(15):1972–1979, 2012. [WS11] D. P . Williamson and D. B. Shmo ys. The Design of Appr oximation Algorithms . Cam- bridge Universit y Press, New Y ork, NY, USA, 1st edition, 2011. [ZvdHGG03] J. Zw art, R. v an der Heiden, S. Gelsema, and F. Gro en. F ast translation inv ariant classification of hrr range profiles in a zero phase representation. R adar, Sonar and Navigation, IEE Pr o c e e dings - , 150(6):411–418, 2003. A Phase correlation and the Bisp ectrum Man y scientific fields hav e isolated discussions and solutions for the alignmen t problem, o ccasionally with slight context-specific adjustmen ts. Suc h metho ds include iterative template aligning [KDM95], zero phase represen tations [ZvdHGG03], angular synchronization [SSK13], and mac hine learning [PZAF05]. Unfortunately , most of these are either do not fairly w eight each observ ation, do not mak e full use of the av ailable information, or do not hav e rigorous p erformance discussion. W e outline just a couple of v ery well-studied alignment metho ds in this section. In the case of a pair of noisily shifted v ectors ( N = 2), sev eral long-studied metho ds assign scores to each p ossible shift b etw een the vectors, and estimate the shift as the one with the highest score. The most natural is to tak e the inner pro duct b etw een each p ossible shift of the v ectors y i , y j , which giv es the cross-correlation v ector v cross l = h y i , R − l y j i , with maximal entry b l = argmax v cross l . Another frequen tly used in practice is the phase correlation v ector v phase = F ∗  F ( y i , k ) F ∗ ( y j , k ) |F ( y i , k ) F ∗ ( y j , k ) |  L − 1 k =0 . (15) F or human-friendly images, phase correlation tends to b e more app ealing, since it tends to ha ve a single sharp p eak. The relation b etw een cross- and phase correlation can b e seen from the conv olution theorem: 14 Lemma A.1 Convolution the or em: {h y i , R − l y j ) i} L − 1 l =0 = √ L · F {F ( y i , k ) F ∗ ( y j , k ) } L − 1 k =0 . Pro of. Let M l denote the modulation operator ( M l x ) k = x k e ( k l/L ), defined such that F R l x = M l F x . By Parsev al’s Theorem, {h y i , R − l y j i} l = {hF y i , M − l F y j i} l =  X k ∈ Z L e  − kl L  F ( y i , k ) F ∗ ( y j , k )  l = √ L · F {F ( y i , k ) F ∗ ( y j , k ) } k . (16) Cross- and phase correlation can b e used directly for the problem of m ultireference alignment, say b y aligning all of the observ ations against the first one. How ever, this is certainly not a robust metho d. Another relev an t notion for characterizing alignments are the moment sp ectra. These are prop erties of a signal which are inv ariant under shifts. The k th pow er sp ectrum of a real signal x ∈ R L is defined b y |F ( x, k ) | 2 = F ( x, k ) F ∗ ( x, k ). This can b e extended to the d -th order bispectrum (or moment sp ectra) b ( k 1 , k 2 , . . . , k d ) = F ( x, k 1 + k 2 + · · · + k d ) F ∗ ( x, k 1 ) F ∗ ( x, k 2 ) · · · F ∗ ( x, k d ); (17) its shift in v ariance can b e seen since it is the F ourier transform of the d -th order autocorrelation function a ( k 1 , k 2 , . . . , k d ) = L − 1 X l =0 x l x l − k 1 x l − k 2 · · · x l − k d , (18) b y the Wiener-Khinchin theorem. Ho w muc h information do these inv ariants capture ab out the signal x ? Sadler and Giannakis giv e iterative and least squares algorithms in [SG92] for reconstructing the F ourier phases from full knowledge of the bisp ectrum, and Kak arala in [KI93] show the uniqueness of real function given all of its higher order moment sp ectra. These arguments generally tend to b e of a more num ber-theoretic fla vor and may b e tedious in practice. A simple sp ecial case o ccurs when w e tak e k 1 = k 2 = . . . = k d = 1 for d = 1 , . . . , L . This giv es us the set of bisp ectral inv ariants F ( x, d ) F ∗ ( x, 1) d . This quan tity , as a shift inv ariant, can b e consisten tly estimated from our observ ations y i . With an estimate for the phase of the first F ourier co efficien t F ( x, 1) (for example we can sample ov er a discretization of the unit circle), we can recov er the remaining F ourier phases F ( x, d ) and thus the signal x . B Maxim um Lik eliho o d Lemma B.1 The data Gr am matrix C with entries C ik ; j l = h R − k y i , R − l y j i satisfies the fol lowing pr op erties: 1. C  0 and has r ank L , with non-zer o eigenvalues λ k = L P N i =1 |F ( y i , k ) | 2 . 2. Ther e is a unitary matrix P for which P C P ∗ = diag( C 0 , . . . , C L − 1 ) is blo ck diagonal, wher e e ach C k ∈ C N × N . 15 Pro of. C has Cholesky decomp osition C = ( R y )( R y ) T ∈ R LN × LN where R y =     . . . − R − k y i − . . .     i,k , so C  0 has rank L . Since h R − k y i , R − l y j i = h R − k + m y i , R − l + m y j i , C is comp osed of N × N circulant blo c ks of size L × L . After permuting the ro ws and columns of C , one ma y write C as a blo c k circulant matrix P T C P =      C 0 C 1 · · · C L − 1 C L − 1 C 0 · · · C L − 2 . . . . . . . . . . . . C 1 C 2 · · · C 0      where ( C k ) ij = h y i , R − k y j i , and P is the appropriate p erm utation matrix. P T C P is blo ck diagonalized by the blo ck Discrete F ourier T ransform matrix DF T L ⊗ I N [LX C04], where ⊗ denotes the Kroneck er pro duct. P = ( D F T L ⊗ I N ) P T ∈ C N L × N L is thus a unitary transformation suc h that P C P ∗ = diag( C 0 , . . . , C L − 1 ); C k ∈ C N × N (19) is blo c k diagonal. These blo ck diagonals are comp onent wise Discrete F ourier T ransforms of the “vec- tor” ( C 0 , C 1 , . . . , C L − 1 ): C k = L − 1 X l =0 e  kl L  C l = √ L · [ F ∗ ( C 0 , C 1 , . . . , C L − 1 )] k =  L ( Y i ) k ( Y ∗ j ) k  ij b y Lemma A.1. Note also that each C k is Hermitian and rank 1 ( C is of rank L , and each of the blo c ks C k has p ositiv e rank). The unique nonzero eigenv alue λ k of C k is given by λ k = tr( C k ) = L N X i =1 | ( Y i ) k | 2 . (20) C NP-hardness of Quasi MLE Theorem C.1 Consider the pr oblem ALIGN ( y 1 , . . . , y N ) : for ve ctors y 1 , . . . , y N ∈ R L , find the shifts ` = ( l 1 , . . . , l N ) which maximize A ( l 1 , . . . , l N ) = X i,j ∈ [ N ] h R − l i y i , R − l i y j i . L et A ∗ = max ` A ( ` ) . It is NP -har d (under r andomize d r e ductions) to esimate A ∗ within 16 17 + ε of its true value. It is UG -har d (under r andomize d r e ductions) to estimate A ∗ within any c onstant factor. 16 W e demonstrate this b y a p oly-time approximation preserving reduction from a sp ecial class of MAX-2LIN ( q ) instances called Γ- MAX-2LIN ( q ). Consider a (connected) MAX-2LIN ( q ) instance where eac h constraint has the form x i − x j ≡ c ij (mo d q ), of which at most ρ ∗ are satisfiable. Representing eac h v ariable as a v ertex of a graph and eac h constraint as an edge, the instance is asso ciated with a connected graph G = ( V ( G ) , E ( G )) , where V ( G ) = [ N ] and | E ( G ) | = M . Let p oly ( M ) b e the space of integer functions which are b ounded by p olynomial order, i.e. f ∈ p oly ( M ) iff there are constants C, k such that f ( M ) ≤ C M k for all M ≥ 1. W e sa y that an even t o ccurs w.h.p if it o ccurs with probability 1 −  ( q , G ), where 1  ( q ,G ) / ∈ p oly ( q · | E ( G ) | ). Notice under this definition that if p oly ( q M ) ev ents o ccur w.h.p, then by an union b ound the even t that all o ccur will also b e w.h.p. Construct a parameter s = s ( q , M ) ∈ p oly ( q M ). W e take the v ectors y 1 , . . . , y N to b e of length L = q M s . The indices of the vector y i can b e expressed in mixed radix notation by elements of the tuple ( Z q , E ( G ) , [ s ]). F or example, w e w ould asso ciate the tuple index ( x, e k , t ) of y i b y the index x · q M + e k · M + t , where e k is the k th edge in some enumeration of E ( G ). Note that a shift by c · qM tak es this tuple index to ( x, e k , t ) + c · q M → ( x + c, e k , t ). F or each edge constraint x i − x j ≡ c ij , let z ij b e a vector uniformly at random chosen from {± 1 } s . Assign the en tries (0 , { i, j } , · ) of y i to z ij , and the en tries ( c ij , { i, j } , · ) of y j to z ij . Assign all of the remaining entries of the y i ’s to i.i.d Rademacher random v ariables ( {± 1 } with probability 1 / 2). In tuitively , a relativ e shift of c ij · q M betw een y i and y j will pro duce a large inner pro duct due to the o verlapping of the z ij ’s, while any other shift b et ween them would pro duce low inner pro ducts. Lemma C.2 Supp ose γ ∈ p oly ( q M ) . Consider two r andom ve ctors y 1 , y 2 of length γ whose entries ar e i.i.d sample d fr om the R ademacher distribution. W.h.p, for any 0 <   1 , the inner pr o duct of every p ossible shift R ` y 1 of y 1 with y 2 is b ounde d in absolute value by √ γ · m  . Pro of. By indep endence, each inner pro duct is the sum of γ indep enden t Bernoulli random v ariables whic h take v alues ± 1 with probabilit y 1 / 2. Ho effding’s inequality indicates Pr( h R ` y 1 , y 2 i ≥ √ γ · ( q M )  ) ≤ 2 exp {− ( q M )  / 2 } , so 1 Pr( h R ` y 1 ,y 2 i≥ √ γ ( q M )  ) / ∈ poly ( q M ). Union b ounding ov er all γ = p oly ( q M ) such inner pro ducts, w.h.p all of the inner pro ducts are simultaneously b ounded by √ γ · ( q M )  . W e sa y an edge { i, j } ∈ E ( G ) is c ij -satisfied under a lab elling ` if l i − l j ≡ c ij q M (mo d L ). F rom Lemma C.2, we observe that w.h.p, for any choices of shifts l i , l j , |h R − l i y i , R − l j y j i − s · δ ( { i, j } ∈ E ( G ) is c ij -satisfied) | < ( q M )  √ L. It follows that if the lab elling ` induces exactly k c ij -satisfied edges, w.h.p   A ( ` ) − 2 k s   < k ( q M )  √ L +  N 2 − k  ( q M )  √ L ≤ q  M 2+  √ L. (21) Theorem C.3 F r om a given lab el ling ` , w.h.p one may in p olynomial time c onstruct ( x 1 , . . . , x N ) ∈ Z N q satisfying at le ast  A ( ` ) − q  M 2+  √ L  / (2 s ) e dge-c onstr aints x i − x j ≡ c ij . Conversely, w.h.p ther e is a lab el ling ` max w.h.p satisfying A ( ` max ) > 2 s · ρ ∗ M − q  M 2+  √ L . 17 Pro of. Consider the subgraph H of G with vertex set V ( G ) and edge set comprising all edges c ij - satisfied under ` . F or each connected comp onen t of H , arbitrarily choose a vertex i of the comp onent to ha ve x i = 0. F ollow a spanning tree of eac h connected component and assign each neighbor j b y x j ≡ x i − c ij (mo d q ). The first implication of the lemma follo ws immediately by applying (21). Con versely , construct the lab elling ` max b y setting l max i = x i · q M . This lab elling induces at least ρ ∗ M c ij -satisfied edges, and (21) completes the lemma. Corollary C.4 T ake s > q 2 M 4 . If ` max satisfies A ( ` max ) ≥ ( α + δ ) A ∗ , then w.h.p, fr om ` max one may c onstruct ( x 1 , . . . , x N ) ∈ Z N q in p oly-time satisfying ( α + δ 0 ) ρ ∗ fr action of Z q -MAX-2LIN c on- str aints, for some δ 0 > 0 . Pro of. Let ρ b e the fraction of v ariable assignments satisfied by the construction of Theorem C.3 for the lab elling ` max . W.h.p, ρ > A ( ` max ) − q  M 2+ ε √ L 2 sM > ( α + δ ) A ∗ − q 1 / 2+  M 5 / 2+  s 1 / 2 2 sM > ( α + δ ) ρ ∗ − q 1 / 2+  M 3 / 2+  s 1 / 2 . Assuming the Unique Games conjecture, it is NP-hard to appro ximate Γ- MAX-2LIN ( q ) to any constan t factor [KKMO07]. Hence it is UG-hard (under randomized reductions) to appro ximate ALIGN within any constant factor. MAX-CUT is a sp ecial case of Γ- MAX-2LIN ( q ). Since it is NP -hard to approximate MAX-CUT within 16 17 + ε , it is NP -hard (under randomized reductions) to approximate ALIGN within 16 17 + ε . D ∗ -matrix algebras In this app endix section, w e briefly sketc h the ideas b ehind symmetry reduction in semidefinite pro- grams. Most of the details are deferred to other sources. F or example, [DdK11] is a go o d reference on this topic. While the application of symmetry to the SDPs we are in terested in is relatively straight- forw ard and can b e explained without reference to ∗ -matrix algebras, the notation and ideas b ehind the theory of ∗ -matrix algebras mak e it esp ecially clear to express. Definition D.1 A c omplex (or r e al) matrix ∗ -algebr a A is a set of n × n matric es in C n × n (or R n × n ) such that for any X , Y ∈ A , αX + β Y , X ∗ , X Y ∈ A for al l α , β ∈ C ( or R ) . A c oher ent c onfigur ation is a set of matric es { A 1 , . . . , A d } ⊂ { 0 , 1 } n × n satisfying: • Some subset of the A i ’s sum to I n . • P d i =1 A i = 1 n × n . • A T i ∈ A . • A i A j is a line ar c ombination of A 1 , . . . , A d . The sp an of the A i ’s is a ∗ -matrix algebr a A . If the A i ’s c ommute and c ontain the identity matrix I n , then this c onfigur ation is c al le d an asso ciation scheme, and A is also known as a Bose-Mesner algebr a. T r e ating A as a matrix ve ctor sp ac e, a c oher ent c onfigur ation forms a b asis for A . 18 F or example, C n × n is a matrix ∗ -algebra. Another commonly considered matrix ∗ -algebra is the space of G -inv ariant matrices. Let S be a finite set, and let G b e a finite group acting on S . The space of G -in v arian t matrices are the matrices A ∈ C | S |×| S | for whic h P T g AP g = A , where P g is the p ermutation matrix asso ciated with g ∈ G . The space of G -in v arian t matrices forms a matrix ∗ -algebra generated b y the coherent configuration { P g } g ∈ G . In particular, the basis is closed under m ultiplication, since P g 1 P g 2 = P g 2 ◦ g 1 . Definition D.2 A ∗ -homomorphism φ is a line ar mapping of ∗ -matrix algebr as φ : A → B which additional ly satisfies φ ( A ∗ ) = φ ( A ) ∗ , φ ( AB ) = φ ( A ) φ ( B ) , and φ ( I A ) = I B if the identity matrix I A ∈ A . Lemma D.3 If φ is an inje ctive ∗ -homomorphism φ : A → B , then A ∈ A and φ ( A ) shar e the same set of eigenvalues (ignoring multiplicity), and A  0 ⇐ ⇒ φ ( A )  0 . T o lev erage symmetry in semidefinite programs, it will be useful to tak e the images of the ob jec- tiv e semidefinite matrix under dimension-reducing ∗ -homomorphisms. Two such ∗ -homomorphisms whic h ma y b e systematically constructed are given b y Artin-W edderburn’s Theorem and regular ∗ - represen tations [Mur07]. Consider a primal semidefinite problem of the form minimize tr( C 0 V ) (22) sub ject to V  0 and tr( C i V ) = b i , where V is the semidefinite matrix to optimize ov er, and C i ∈ R n × n are data matrix constraints and b = ( b 1 , . . . , b m ) are giv en, for i ∈ { 1 , 2 , . . . , m } . Supp ose the data matrices C i are contained in a lo w-dimensional complex matrix ∗ -algebra A S D P . Theorem D.4 Supp ose the data matric es { C i } m i =0 ⊂ R n × n ar e r e al symmetric. L et A S D P b e a c omplex ∗ -algebr a with a r e al ortho gonal b asis c ontaining al l of the C i ’s. Then 1. ther e is an optimal solution V to the primal SDP pr oblem c ontaine d in A S D P 2. R e ( V ) ∈ A S D P is also an optimal solution to the primal SDP pr oblem. Pro of. Refer to ([DdK11], Corollary 2.5.2). A sp ecial case of this is the result applies for the ∗ -algebra of G -inv ariant matrices. Lemma D.5 In the semidefinite pr o gr am (22), supp ose the c onstr aint matric es C 0 , C 1 , . . . , C m ar e G -invariant. Then ther e is a solution V to the SDP which is also G -invariant. Pro of. If V 0 is a solution of (22), then the matrix V ∈ C n × n giv en by V = R G ( V ) = 1 | G | X g ∈ G P T g V 0 P g is G -inv arian t, feasible, and has the same ob jectiv e v alue as V 0 . The av eraging map R G is known as the R eynolds op er ator . 19 E Stabilit y Lemma E.1 E [ η ij | ξ i ] ≤ µ η k ξ i k and E [ η 2 ij | ξ i ] ≤ σ η k ξ i k 2 , wher e µ η = 2 σ (log L +1) and σ 2 η = 4 σ 2 (log 2 L + 1) . Pro of. T ak e ζ ∼ N (0 , I L ), and c ho ose a unit vector z ∈ R L . Define η ( z ) = max l η l ( z ), where the distribution of η l ( z ) = |h R l z , ζ i| is indep enden t of the c hoice of z . Union b ounding, Pr ( η ( z ) ≥ t ) ≤ X l ∈ Z L Pr  η l ( z ) ≥ t 2  ≤ X l Pr  |h e l , ζ i| ≥ t 2  = X l Pr  | ζ l | ≥ t 2  ≤ min  1 , 2 L t · e − t 2 / 2 √ 2 π  . where e l ∈ R L is a standard unit basis v ector. Thus, for T > 1, µ η = E η ( z ) = Z ∞ 0 Pr ( η ( z ) ≥ t ) dt ≤ T + 2 L √ 2 π Z ∞ T te − t 2 / 2 dt ≤ T + Le − T and σ 2 η = E η ( z ) 2 = Z ∞ 0 2 t Pr ( η ( z ) ≥ t ) dt ≤ T 2 + L √ 2 π Z ∞ T te − t 2 / 2 dt ≤ T 2 + Le − T . The lemma follows by taking T = log L , since η ij = 2 σ k ξ i k · η  ξ i / k ξ i k  conditional on ξ i . Lemma E.2 L et t > 0 . With pr ob ability at le ast 1 − 2 N e − t 2 N , X i 6 = j ( η ij + η j 0 ) D ij ≤ ( µ η + σ η t )  2 k x k + 1 N X i k ξ i k  N 2 . (23) holds for al l D ij ∈ [0 , 1] . Pro of. Define the random v ariables Z ij = ( η ij + η j 0 ) D ij ≤ η ij + η j 0 . F or fixed i , the η ij ’s are indep enden t p ositive random v ariables. The one-sided tail bound for indep enden t p ositive random v ariables by Maurer [Mau03] giv es Pr  X j ( η ij + η j 0 ) − X j E ( η ij + η j 0 ) ≥ t i s X j E ( η 2 ij + η 2 j 0 )  ≤ 2 e − t 2 i . Cho ose t i = t √ N . By union b ound, with probabilit y at least 1 − 2 N e − t 2 N , X i,j Z ij ≤ X i µ η (2 k x k + k ξ i k ) N + X i t q N 2 σ 2 η (4 k x k 2 + k ξ i k 2 ) ≤ ( µ η + σ η t )  2 k x k + 1 N X i k ξ i k  N 2 . Theorem E.3 With pr ob ability 1 − e − N + o ( N ) , the solution to the SDP satisfies X i,j D ij ≤ ( k x k + σ 2 √ L ) · 12 log eL ∆ · N 2 . 20 Pro of. Let V b e a solution matrix to the SDP , so that tr( C V ) ≥ tr( C V int ). It follows from Lemma 4.1 that ∆ X ij D ij ≤ X ij ( η ij + η j 0 ) D ij . The χ 2 -tail b ound of Laurent-Massart [LM00] states Pr  1 σ 2 X i k ξ i k 2 ≤ N L + 2 √ N Lt + 2 t  ≥ 1 − exp {− t } . (24) When t = N L this gives X i k ξ i k ≤ s N X i k ξ i k 2 ≤ σ N √ 5 L. with probability at least 1 − e − N L . Union b ounding with the tail b ound of Lemma E.2, and applying Lemma E.1, ∆ X ij D ij ≤ ( µ η + σ η )  2 k x k + 1 N X i k ξ i k  N 2 ≤ 12 log eL ( k x k + σ √ L ) N 2 with probability at least 1 − e − N + o ( N ) . Lemma E.4 L et 0 < δ, ε, ε 0  1 b e smal l c onstants, and define α J L ( x ) = (1 + δ ) x + ε . Ther e is a set of unit ve ctors N of size at most exp  O  δ − 2 log(1 /ε ) log(1 /ε 0 )  such that for any set of unit ve ctors { v i } , ther e is a map ϕ : { v i } → N satisfying the ine quality α − 1 J L  k v i − v j k 2  ≤ k ϕ ( v i ) − ϕ ( v j ) k 2 ≤ α J L  k v i − v j k 2  for at le ast 1 − ε 0 fr action of the p airs ( i, j ) ∈ [ N ] × [ N ] . Pro of. This lemma app ears frequen tly in SDP literature, and in particular is used to analyze adv ersar- ial semi-random Unique-Games instances in [KMM11]. Notice that the size of the set N is indep endent of the num b er of vectors in the set { v i } . Construct a ε/ 32-net N of the unit h yp ersphere in a O ( δ 2 log(1 /ε 0 ))-dimensional space L . By the (strong version of the) Johnson-Lindenstrauss lemma, there is a randomized mapping ϕ 0 : { v i } → L satisfying (1 − δ / 2) k v i − v j k 2 ≤ k ϕ 0 ( v i ) − ϕ 0 ( v j ) k 2 ≤ (1 + δ / 2) k v i − v j k 2 with probability at least 1 − ε 0 . Define ϕ ( v i ) to b e the closest p oin t to ϕ 0 ( v i ) in N , and observ e that (1 − δ / 2) x − ε ≥ x − ε 1 + δ = α − 1 J L ( x ) , (1 + δ / 2) x + ε ≤ α J L ( x ) for all x > 0, so ϕ satisfies the conditions of the lemma. 21

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment