Convex Total Least Squares

Dmitry Malioutov (dmalioutov@us.ibm.com)
IBM Research, 1101 Kitchawan Road, Yorktown Heights, NY 10598 USA

Nikolai Slavov (nslavov@alum.mit.edu)
Departments of Physics and Biology, MIT, 77 Massachusetts Avenue, Cambridge, MA 02139, USA

Abstract

We study the total least squares (TLS) problem that generalizes least squares regression by allowing measurement errors in both dependent and independent variables. TLS is widely used in applied fields including computer vision, system identification and econometrics. The special case when all dependent and independent variables have the same level of uncorrelated Gaussian noise, known as ordinary TLS, can be solved by singular value decomposition (SVD). However, SVD cannot solve many important practical TLS problems with realistic noise structure, such as having varying measurement noise, known structure on the errors, or large outliers requiring robust error-norms. To solve such problems, we develop convex relaxation approaches for a general class of structured TLS (STLS). We show both theoretically and experimentally that while the plain nuclear norm relaxation incurs large approximation errors for STLS, the re-weighted nuclear norm approach is very effective, and achieves better accuracy on challenging STLS problems than popular non-convex solvers. We describe a fast solution based on augmented Lagrangian formulation, and apply our approach to an important class of biological problems that use population average measurements to infer cell-type and physiological-state specific expression levels that are very hard to measure directly.

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).

1. Introduction

Total least squares is a powerful generalization of ordinary least squares (LS) which allows errors in the measured explanatory variables (Golub & Van Loan, 1980). It has become an indispensable tool in a variety of disciplines including chemometrics, system identification, astronomy, computer vision, and econometrics (Markovsky & Van Huffel, 2007). Consider a least squares problem $y \approx X\beta$, where we would like to find coefficients $\beta$ to best predict the target vector $y$ based on measured variables $X$. The usual assumption is that $X$ is known exactly, and that the errors come from i.i.d. additive Gaussian noise $n$: $y = X\beta + n$. The LS problem has a simple closed-form solution, obtained by minimizing $\|y - X\beta\|_2^2$ with respect to $\beta$. In many applications not only $y$ but also $X$ is known only approximately, $X = X_0 + E_x$, where $X_0$ are the uncorrupted values, and $E_x$ are the unknown errors in observed variables. The total least squares (TLS) formulation, or errors-in-variables regression, tries to jointly minimize errors in $y$ and in $X$ ($\ell_2$-norm of $n$ and Frobenius norm of $E_x$):

$$\min_{n, E_x, \beta} \; \|n\|_2^2 + \|E_x\|_F^2 \quad \text{where} \quad y = (X - E_x)\beta + n \tag{1}$$

While the optimization problem in this form is not convex, it can in fact be reformulated as finding the closest rank-deficient matrix to a given matrix, and solved in closed form via the singular value decomposition (SVD) (Golub & Van Loan, 1980).

Many errors-in-variables problems of practical interest have additional information: for example, a subset of the entries in $X$ may be known exactly, we may know different entries with varying accuracy, and in general $X$ may exhibit a certain structure, e.g. block-diagonal, Toeplitz, or Hankel in system identification literature (Markovsky et al., 2005). Furthermore, it is often important to use an error-norm robust to outliers, e.g. Huber loss or $\ell_1$-loss.
Unfortunately, with rare exceptions (footnote 1), none of these problems allow an efficient solution, and the state-of-the-art approach is to solve them by local optimization methods (Markovsky & Usevich, 2014; Zhu et al., 2011; Srebro & Jaakkola, 2003). The only available guarantee is typically the ability to reach a stationary point of the non-convex objective.

Footnote 1: A closed form solution exists when subsets of columns are fully known; a Fourier transform based approach can handle block-circulant errors $E_x$ (Beck & Ben-Tal, 2005).

In this paper we propose a principled formulation for STLS based on convex relaxations of matrix rank. Our approach uses the re-weighted nuclear norm relaxation (Fazel et al., 2001) and is highly flexible: it can handle very general linear structure on errors, including arbitrary weights (changing noise for different entries), patterns of observed and unobserved errors, Toeplitz and Hankel structures, and even norms other than the Frobenius norm. The nuclear norm relaxation has been successfully used for a range of machine learning problems involving rank constraints, including low-rank matrix completion, low-order system approximation, and robust PCA (Cai et al., 2010; Chandrasekaran et al., 2011). The STLS problem is conceptually different in that we do not seek low-rank solutions, but on the contrary nearly full-rank solutions. We show both theoretically and experimentally that while the plain nuclear norm formulation incurs large approximation errors, these can be dramatically improved by using the re-weighted nuclear norm. We suggest fast first-order methods based on augmented Lagrangian multipliers (Bertsekas, 1982) to compute the STLS solution. As part of ALM we derive new updates for the re-weighted nuclear norm based on solving a Sylvester equation, which can also be used for many other machine learning tasks relying on matrix rank, including matrix completion and robust PCA.

As a case study of our approach to STLS we consider an important application in biology, quantification of cellular heterogeneity (Slavov & Botstein, 2011). We develop a new representation for the problem as a large structured linear system, and extend it to handle noise by a structured TLS problem with block-diagonal error structure. Experiments demonstrate the effectiveness of STLS in recovering physiological-state specific expression levels from aggregate measurements.

1.1. Total Least Squares

We first review the solution of ordinary TLS problems. We simplify the notation from (1): combining our noisy data $X$ and $y$ into one matrix, $\bar{A} \triangleq [X \;\; -y]$, and the errors into $E \triangleq [E_x \;\; -n]$, we have

$$\min \|E\|_F^2 \quad \text{where} \quad (\bar{A} - E) \begin{bmatrix} \beta \\ 1 \end{bmatrix} = 0. \tag{2}$$

The matrix $\bar{A}$ is in general full-rank, and a solution can be obtained by finding a rank-deficient matrix closest to $\bar{A}$ in terms of the Frobenius norm. This finds the smallest errors $E_x$ and $n$ such that $y - n$ is in the range space of $X - E_x$. The closest rank-deficient matrix is simply obtained by computing the SVD, $\bar{A} = U S V^T$, and setting the smallest singular value to zero.

Structured TLS problems (Markovsky & Van Huffel, 2007) allow more realistic errors $E_x$: subsets of measurements that may be known exactly; weights reflecting different measurement noise for each entry; and linear structure of the errors $E_x$, such as Toeplitz, that is crucial in deconvolution problems in signal processing. Unfortunately, the SVD does not apply to any of these more general versions of TLS (Srebro & Jaakkola, 2003; Markovsky & Van Huffel, 2007).
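The ordinary TLS recipe reviewed in Section 1.1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function name `tls_svd` and the synthetic data are hypothetical, and the sketch assumes the last component of the smallest right singular vector is nonzero so that the null vector can be scaled to $[\beta; 1]$.

```python
import numpy as np

def tls_svd(X, y):
    """Ordinary TLS sketch: zero the smallest singular value of A_bar = [X, -y]."""
    A_bar = np.column_stack([X, -y])
    U, s, Vt = np.linalg.svd(A_bar, full_matrices=False)
    v = Vt[-1]                       # right singular vector of the smallest singular value
    if abs(v[-1]) < 1e-12:
        raise ValueError("no TLS solution: last component of the null vector is zero")
    beta = v[:-1] / v[-1]            # rescale so the null vector reads [beta; 1]
    # Rank-one correction: A_bar - E is the closest rank-deficient matrix to A_bar.
    E = s[-1] * np.outer(U[:, -1], v)
    return beta, E

# Synthetic example (assumed data): equal noise levels on X and y.
rng = np.random.default_rng(0)
X0 = rng.standard_normal((50, 3))
beta_true = np.array([1.0, -2.0, 0.5])
X = X0 + 0.01 * rng.standard_normal(X0.shape)
y = X0 @ beta_true + 0.01 * rng.standard_normal(50)
beta_hat, E = tls_svd(X, y)
```

The returned `E` has squared Frobenius norm equal to the square of the smallest singular value of $\bar{A}$, which is the quantity Section 4 later uses as the reference error.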
Existing solutions to structured TLS problems formulate a non-convex optimization problem and attempt to solve it by local optimization (Markovsky & Usevich, 2014), which suffers from local optima and a lack of guarantees on accuracy. We follow a different route and use a convex relaxation for the STLS problem.

2. STLS via a nuclear norm relaxation

The STLS problem in a general form can be described as follows (Markovsky & Van Huffel, 2007). Using the notation in Section 1.1, suppose our observed matrix $\bar{A}$ is $M \times N$ with full column rank. We aim to find a nearby rank-deficient matrix $A$, $\operatorname{rank}(A) \le N - 1$, where the errors $E$ have a certain linear structure:

$$\min \|W \odot E\|_F^2 \quad \text{where} \quad \operatorname{rank}(A) \le N - 1, \;\; A = \bar{A} - E, \;\; \text{and} \;\; \mathcal{L}(E) = b \tag{3}$$

The key components here are the linear equalities that $E$ has to satisfy, $\mathcal{L}(E) = b$. This notation represents a set of linear constraints $\operatorname{tr}(L_i^T E) = b_i$, for $i = 1, \ldots, J$. In our application to cell heterogeneity quantification these constraints correspond to knowing certain entries of $A$ exactly, i.e. $E_{ij} = 0$ for some subset of entries, while other entries vary freely. One may require other linear structure such as Toeplitz or Hankel. We also allow an element-wise weighting $W \odot E$, with $W_{i,j} \ge 0$, on the errors, as some observations may be measured with higher accuracy than others. Finally, while we focus on the Frobenius norm of the error, any other convex error metric, for example mean absolute error or robust Huber loss, could be used instead. The main difficulty in the formulation is posed by the non-convex rank constraint. The STLS problem is a special case of the structured low-rank approximation problem, where the rank is exactly $N - 1$ (Markovsky & Usevich, 2014). Next, we propose a tractable formulation for STLS based on convex relaxations of matrix rank.
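Two of the constraint families named in (3), fixed entries $E_{ij} = 0$ and Toeplitz structure, can be pictured as orthogonal projections onto linear subspaces of matrices. The sketch below is illustrative only: the helper names `project_fixed_entries` and `project_toeplitz` and the boolean-mask encoding of known entries are assumptions, not notation from the paper.

```python
import numpy as np

def project_fixed_entries(E, known_mask):
    """Project onto {E : E[i, j] = 0 wherever known_mask[i, j] is True}."""
    P = np.array(E, dtype=float)     # copy so the input is left untouched
    P[known_mask] = 0.0
    return P

def project_toeplitz(E):
    """Project onto Toeplitz matrices by averaging along each diagonal."""
    E = np.asarray(E, dtype=float)
    m, n = E.shape
    rows, cols = np.indices((m, n))
    offsets = cols - rows            # constant along every diagonal
    P = np.empty_like(E)
    for k in range(-(m - 1), n):
        sel = offsets == k
        P[sel] = E[sel].mean()
    return P

# Small usage example on assumed random data.
rng = np.random.default_rng(0)
E = rng.standard_normal((4, 5))
mask = np.zeros((4, 5), dtype=bool)
mask[:, 0] = True                    # pretend the first column is known exactly
E_fixed = project_fixed_entries(E, mask)
E_toep = project_toeplitz(E)
```

Both maps are idempotent, so applying them after an unconstrained update returns the nearest matrix (in Frobenius norm) that satisfies the corresponding constraint.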
We start by formulating the nuclear-norm relaxation for TLS and then improve upon it by using the re-weighted nuclear norm. The nuclear norm $\|A\|_*$ is a popular relaxation used to convexify rank constraints (Cai et al., 2010), and it is defined as the sum of the singular values of the matrix $A$, i.e. $\|A\|_* = \sum_i \sigma_i(A)$. It can be viewed as the $\ell_1$-norm of the singular value spectrum (footnote 2), favoring few non-zero singular values, i.e., matrices with low rank. Our initial nuclear norm relaxation for the STLS problem is:

$$\min \|A\|_* + \alpha \|W \odot E\|_F^2 \quad \text{such that} \quad A = \bar{A} - E, \;\; \text{and} \;\; \mathcal{L}(E) = b \tag{4}$$

The parameter $\alpha$ balances error residuals vs. the nuclear norm (proxy for rank). We choose the largest $\alpha$, i.e. the smallest nuclear norm penalty, that still produces $\operatorname{rank}(A) \le N - 1$. This can be achieved by a simple binary search over $\alpha$. In contrast to matrix completion and robust PCA, the STLS problem aims to find almost fully dense solutions with rank $N - 1$, so it requires different analysis tools. We present theoretical analysis specifically for the STLS problem in Section 4. Next, we describe the re-weighted nuclear norm, which, as we show in Section 4, is better suited for the STLS problem than the plain nuclear norm.

2.1. Reweighted nuclear norm and the log-determinant heuristic for rank

A very effective improvement of the nuclear norm comes from re-weighting it (Fazel et al., 2001; Mohan & Fazel, 2010) based on the log-determinant heuristic for rank. To motivate it, we first describe a closely related approach in the vector case (where instead of searching for low-rank matrices one would like to find sparse vectors). Suppose that we seek a sparse solution to a general convex optimization problem. A popular approach penalizes the $\ell_1$-norm of the solution $x$, $\|x\|_1 = \sum_i |x_i|$, to encourage sparse solutions.
A dramatic improvement in finding sparse signals can be obtained simply by using the weighted $\ell_1$-norm, i.e. $\sum_i w_i |x_i|$ with suitable positive weights $w_i$ (Candes et al., 2008), instead of a plain $\ell_1$-norm. Ideally the weights would be based on the unknown signal, to provide a closer approximation to sparsity ($\ell_0$-norm) by penalizing large elements less than small ones. A practical solution first solves a problem involving the unweighted $\ell_1$-norm, and uses the solution $\hat{x}$ to define the weights $w_i = \frac{1}{\delta + |\hat{x}_i|}$, with $\delta$ a small positive constant. This iterative approach can be seen as an iterative local linearization of the concave log-penalty for sparsity, $\sum_i \log(\delta + |x_i|)$ (Fazel et al., 2001; Candes et al., 2008). In both empirical and emerging theoretical studies (Needell, 2009; Khajehnejad et al., 2009), re-weighting the $\ell_1$-norm has been shown to provide a tighter relaxation of sparsity.

Footnote 2: For diagonal matrices $A$ the nuclear norm is exactly equivalent to the $\ell_1$-norm of the diagonal elements.

In a similar way, the re-weighted nuclear norm tries to penalize large singular values less than small ones by introducing positive weights. There is an analogous direct connection to the iterative linearization for the concave log-det relaxation of rank (Mohan & Fazel, 2010). Recall that the problem of minimizing the nuclear norm subject to convex set constraints $\mathcal{C}$,

$$\min \|A\|_* \quad \text{such that} \quad A \in \mathcal{C}, \tag{5}$$

has a semi-definite programming (SDP) representation (Fazel et al., 2001). Introducing auxiliary symmetric p.s.d. matrix variables $Y, Z \succeq 0$, we rewrite it as:

$$\min_{A, Y, Z} \; \operatorname{tr}(Y) + \operatorname{tr}(Z) \quad \text{s.t.} \quad \begin{bmatrix} Y & A \\ A^T & Z \end{bmatrix} \succeq 0, \;\; A \in \mathcal{C} \tag{6}$$

Instead of using the convex nuclear norm relaxation, it has been suggested to use the concave log-det approximation to rank:

$$\min_{A, Y, Z} \; \log\det(Y + \delta I) + \log\det(Z + \delta I) \quad \text{s.t.} \quad \begin{bmatrix} Y & A \\ A^T & Z \end{bmatrix} \succeq 0, \;\; A \in \mathcal{C} \tag{7}$$

Here $I$ is the identity matrix and $\delta$ is a small positive constant. The log-det relaxation provides a closer approximation to rank than the nuclear norm, but it is more challenging to optimize. By iteratively linearizing this objective one obtains a sequence of weighted nuclear-norm problems (Mohan & Fazel, 2010):

$$\min_{A, Y, Z} \; \operatorname{tr}\big((Y^k + \delta I)^{-1} Y\big) + \operatorname{tr}\big((Z^k + \delta I)^{-1} Z\big) \quad \text{s.t.} \quad \begin{bmatrix} Y & A \\ A^T & Z \end{bmatrix} \succeq 0, \;\; A \in \mathcal{C} \tag{8}$$

where $Y^k, Z^k$ are obtained from the previous iteration, and $Y^0, Z^0$ are initialized as $I$. Let $W_1^k = (Y^k + \delta I)^{-1/2}$ and $W_2^k = (Z^k + \delta I)^{-1/2}$; then the problem is equivalent to a weighted nuclear norm optimization in each iteration $k$:

$$\min_{A} \; \|W_1^k A W_2^k\|_* \quad \text{s.t.} \quad A \in \mathcal{C} \tag{9}$$

The re-weighted nuclear norm approach iteratively solves convex weighted nuclear norm problems in (9):

Re-weighted nuclear norm algorithm:
Initialize: $k = 0$, $W_1^0 = W_2^0 = I$.
(1) Solve the weighted NN problem in (9) to get $A^{k+1}$.
(2) Compute the SVD: $W_1^k A^{k+1} W_2^k = U \Sigma V^T$, and set $Y^{k+1} = (W_1^k)^{-1} U \Sigma U^T (W_1^k)^{-1}$ and $Z^{k+1} = (W_2^k)^{-1} V \Sigma V^T (W_2^k)^{-1}$.
(3) Set $W_1^{k+1} = (Y^{k+1} + \delta I)^{-1/2}$ and $W_2^{k+1} = (Z^{k+1} + \delta I)^{-1/2}$.

There are various ways to solve the plain and weighted nuclear norm STLS formulations, including interior-point methods (Toh et al., 1999) and iterative thresholding (Cai et al., 2010). In the next section we focus on augmented Lagrangian methods (ALM) (Bertsekas, 1982), which allow fast convergence without using computationally expensive second-order information.

3. Fast computation via ALM

While the weighted nuclear norm problem in (9) can be solved via an interior point method, it is computationally expensive even for modest size data because of the need to compute Hessians.
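Steps (2) and (3) of the re-weighting algorithm in Section 2.1 amount to one SVD and two inverse matrix square roots. The following is a hedged sketch of that weight update only (step (1), the weighted nuclear norm solve, is not shown); the function names `inv_sqrt_psd` and `update_weights` are hypothetical, and a thin SVD with $M \ge N$ is assumed.

```python
import numpy as np

def inv_sqrt_psd(S):
    """Inverse square root of a symmetric positive definite matrix via eigh."""
    w, Q = np.linalg.eigh(S)
    return (Q / np.sqrt(w)) @ Q.T    # Q diag(w^{-1/2}) Q^T

def update_weights(A_next, W1, W2, delta):
    """Steps (2)-(3): rebuild Y, Z from the SVD of W1 A W2, then form new weights."""
    U, s, Vt = np.linalg.svd(W1 @ A_next @ W2, full_matrices=False)
    W1_inv, W2_inv = np.linalg.inv(W1), np.linalg.inv(W2)
    Y = W1_inv @ (U * s) @ U.T @ W1_inv          # (W1)^-1 U Sigma U^T (W1)^-1
    Z = W2_inv @ (Vt.T * s) @ Vt @ W2_inv        # (W2)^-1 V Sigma V^T (W2)^-1
    M, N = A_next.shape
    W1_new = inv_sqrt_psd(Y + delta * np.eye(M))
    W2_new = inv_sqrt_psd(Z + delta * np.eye(N))
    return W1_new, W2_new

# First re-weighting pass on assumed random data, starting from identity weights.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
delta = 0.1
W1_new, W2_new = update_weights(A, np.eye(6), np.eye(4), delta)
```

Starting from identity weights, `W2_new` has eigenvalues $1/\sqrt{\sigma_i(A) + \delta}$, so directions with large singular values receive small weights, which is the matrix analogue of the re-weighted $\ell_1$ weights $w_i = 1/(\delta + |\hat{x}_i|)$.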
We develop an effective first-order approach for STLS based on the augmented Lagrangian multiplier (ALM) method (Bertsekas, 1982; Lin et al., 2010). Consider a general equality constrained optimization problem:

$$\min_x f(x) \quad \text{such that} \quad h(x) = 0. \tag{10}$$

ALM first defines an augmented Lagrangian function:

$$L(x, \lambda, \mu) = f(x) + \lambda^T h(x) + \frac{\mu}{2} \|h(x)\|_2^2 \tag{11}$$

The augmented Lagrangian method alternates optimization over $x$ with updates of $\lambda$ for an increasing sequence of $\mu_k$. The motivation is that if either $\lambda$ is near the optimal dual solution for (10), or $\mu$ is large enough, then the solution to (11) approaches the global minimum of (10). When $f$ and $h$ are both continuously differentiable, if $\mu_k$ is an increasing sequence, the solution converges Q-linearly to the optimal one (Bertsekas, 1982). The work of Lin et al. (2010) extended the analysis to allow objective functions involving nuclear-norm terms. The ALM method iterates the following steps:

Augmented Lagrangian Multiplier method
(1) $x_{k+1} = \arg\min_x L(x, \lambda_k, \mu_k)$
(2) $\lambda_{k+1} = \lambda_k + \mu_k h(x_{k+1})$
(3) Update $\mu_k \to \mu_{k+1}$ (we use $\mu_k = a^k$ with $a > 1$).

Next, we derive an ALM algorithm for nuclear-norm STLS and extend it to use reweighted nuclear norms based on a solution of a Sylvester equation.

3.1. ALM for nuclear-norm STLS

We would like to solve the problem:

$$\min \|A\|_* + \alpha \|E\|_F^2 \quad \text{such that} \quad \bar{A} = A + E, \;\; \text{and} \;\; \mathcal{L}(E) = b \tag{12}$$

To view it as (10) we have $f(x) = \|A\|_* + \alpha \|E\|_F^2$ and $h(x) = \{\bar{A} - A - E, \; \mathcal{L}(E) - b\}$. Using $\Lambda$ as our matrix Lagrangian multiplier, the augmented Lagrangian is:

$$\min_{E : \mathcal{L}(E) = b} \; \|A\|_* + \alpha \|E\|_F^2 + \operatorname{tr}\big(\Lambda^T (\bar{A} - A - E)\big) + \frac{\mu}{2} \|\bar{A} - A - E\|_F^2. \tag{13}$$

Instead of a full optimization over $x = (E, A)$, we use coordinate descent, which alternates optimizing over each matrix variable holding the other fixed.
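The three generic ALM steps listed above can be tried on a toy problem where every step has a closed form. The sketch below is not the STLS solver: it assumes the objective $f(x) = \tfrac{1}{2}\|x - c\|_2^2$ with a single linear constraint $a^T x = b$, and the names `alm_projection`, `rho` (the growth factor written $a$ in step (3)) and `iters` are illustrative choices.

```python
import numpy as np

def alm_projection(c, a, b, rho=1.5, iters=20):
    """ALM sketch for: min 0.5 * ||x - c||^2  subject to  a^T x = b."""
    n = c.size
    lam, mu = 0.0, 1.0                       # mu_k = rho**k, starting at k = 0
    x = c.copy()
    for _ in range(iters):
        # Step (1): minimize the augmented Lagrangian over x.
        # Stationarity gives (I + mu a a^T) x = c - lam a + mu b a.
        x = np.linalg.solve(np.eye(n) + mu * np.outer(a, a),
                            c - lam * a + mu * b * a)
        # Step (2): multiplier update with the constraint residual h(x) = a^T x - b.
        lam += mu * (a @ x - b)
        # Step (3): increase the penalty parameter.
        mu *= rho
    return x, lam

c = np.array([3.0, -1.0, 2.0])
a = np.array([1.0, 2.0, -1.0])
b = 0.5
x_alm, lam_alm = alm_projection(c, a, b)
# Closed-form answer for comparison: projection of c onto the hyperplane a^T x = b.
x_star = c - a * (a @ c - b) / (a @ a)
```

For this quadratic toy problem the iterates approach `x_star` quickly; in the STLS setting the $x$-step is replaced by the alternating updates derived in Sections 3.1 and 3.2.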
We do not wait for the coordinate descent to converge at each ALM step, but rather update $\Lambda$ and $\mu$ after a single iteration, following the inexact ALM algorithm in Lin et al. (2010) (footnote 3). Finally, instead of relaxing the constraint $\mathcal{L}(E) = b$, we keep the constrained form, and follow each step by a projection (Bertsekas, 1982).

The minimum of (13) over $A$ is obtained by the singular value thresholding operation (Cai et al., 2010):

$$A^{k+1} = S_{\mu_k^{-1}}\big(\bar{A} - E^k + \mu_k^{-1} \Lambda^k\big) \tag{14}$$

where $S_\gamma(Z)$ soft-thresholds the singular values of $Z = U S V^T$, i.e. $\tilde{S} = \max(S - \gamma, 0)$, to obtain $\hat{Z} = U \tilde{S} V^T$. The minimum of (13) over $E$ is obtained by setting the gradient with respect to $E$ to zero, followed by a projection (footnote 4) onto the affine space defined by $\mathcal{L}(E) = b$:

$$\tilde{E}^{k+1} = \frac{1}{2\alpha + \mu_k} \big(\Lambda^k + \mu_k (\bar{A} - A^{k+1})\big) \quad \text{and} \quad E^{k+1} = \Pi_{E : \mathcal{L}(E) = b} \, \tilde{E}^{k+1} \tag{15}$$

Footnote 3: This is closely related to the popular alternating direction method of multipliers (Boyd et al., 2011).

Footnote 4: For many constraints of interest this projection is highly efficient: when the constraint fixes some entries $E_{ij} = 0$, projection simply re-sets these entries to zero. Projection onto Toeplitz structure simply takes an average along each diagonal, etc.

3.2. ALM for re-weighted nuclear-norm STLS

To use the log-determinant heuristic, i.e., the re-weighted nuclear norm approach, we need to solve the weighted nuclear norm subproblems:

$$\min \|W_1 A W_2\|_* + \alpha \|E\|_F^2 \quad \text{where} \quad \bar{A} = A + E, \;\; \text{and} \;\; \mathcal{L}(E) = b \tag{16}$$

There is no known analytic thresholding solution for the weighted nuclear norm, so instead we follow Liu et al. (2010) to create a new variable $D = W_1 A W_2$ and add this definition as an additional linear constraint:

$$\min \|D\|_* + \alpha \|E\|_F^2 \quad \text{where} \quad \bar{A} = A + E, \;\; D = W_1 A W_2, \;\; \text{and} \;\; \mathcal{L}(E) = b \tag{17}$$

Now we have two Lagrangian multipliers $\Lambda_1$ and $\Lambda_2$, and the augmented Lagrangian is

$$\min_{E : \mathcal{L}(E) = b} \; \|D\|_* + \alpha \|E\|_F^2 + \operatorname{tr}\big(\Lambda_1^T (\bar{A} - A - E)\big) + \operatorname{tr}\big(\Lambda_2^T (D - W_1 A W_2)\big) + \frac{\mu}{2} \|\bar{A} - A - E\|_F^2 + \frac{\mu}{2} \|D - W_1 A W_2\|_F^2 \tag{18}$$

Algorithm 1: ALM for weighted NN-STLS
Input: $\bar{A}$, $W_1$, $W_2$, $\alpha$
repeat
• Update $D$ via soft-thresholding: $D^{k+1} = S_{\mu_k^{-1}}\big(W_1 A W_2 - \tfrac{1}{\mu_k} \Lambda_2^k\big)$.
• Update $E$ as in (15).
• Solve the Sylvester system for $A$ in (19).
• Update $\Lambda_1^{k+1} = \Lambda_1^k + \mu_k (\bar{A} - A - E)$, $\Lambda_2^{k+1} = \Lambda_2^k + \mu_k (D - W_1 A W_2)$ and $\mu_k \to \mu_{k+1}$.
until convergence

We again follow an ALM strategy, optimizing over $D$, $E$, $A$ separately, followed by updates of $\Lambda_1$, $\Lambda_2$ and $\mu$. Note that Deng et al. (2012) considered a strategy for minimizing re-weighted nuclear norms for matrix completion, but instead of using exact minimization over $A$, they took a step in the gradient direction. We derive the exact update, which turns out to be very efficient via a Sylvester equation formulation. The updates over $D$ and over $E$ look similar to the un-reweighted case. Taking a derivative with respect to $A$ we obtain a linear system of equations in an unusual form:

$$-\Lambda_1 - W_1 \Lambda_2 W_2 - \mu (\bar{A} - A - E) - \mu W_1 (D - W_1 A W_2) W_2 = 0.$$

Rewriting it, we obtain:

$$A + W_1^2 A W_2^2 = \frac{1}{\mu_k} (\Lambda_1 + W_1 \Lambda_2 W_2) + (\bar{A} - E) + W_1 D W_2 \tag{19}$$

We can see that it is in the form of the Sylvester equation arising in discrete Lyapunov systems (Kailath, 1980):

$$A + B_1 A B_2 = C \tag{20}$$

where $A$ is the unknown, and $B_1$, $B_2$, $C$ are coefficient matrices. An efficient solution is described in Bartels & Stewart (1972). These ALM steps for reweighted nuclear norm STLS are summarized in Algorithm 1. To obtain the full algorithm for STLS, we combine the above algorithm with steps of re-weighting the nuclear norm and the binary search over $\alpha$ as described in Section 2.1.
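The linear system (20) can be solved in several ways. The sketch below does not implement the Bartels & Stewart method cited above; it swaps in a simpler eigendecomposition approach that relies on an extra assumption, namely that $B_1 = W_1^2$ and $B_2 = W_2^2$ are symmetric positive semi-definite (true when $W_1$, $W_2$ are symmetric, as the weights of Section 2.1 are). The function name `solve_weighted_sylvester` is hypothetical.

```python
import numpy as np

def solve_weighted_sylvester(B1, B2, C):
    """Solve A + B1 A B2 = C for symmetric PSD B1 (M x M) and B2 (N x N).

    With B1 = P diag(p) P^T and B2 = Q diag(q) Q^T, the rotated unknown
    A_t = P^T A Q satisfies A_t * (1 + p q^T) = P^T C Q entry by entry.
    """
    p, P = np.linalg.eigh(B1)
    q, Q = np.linalg.eigh(B2)
    C_t = P.T @ C @ Q
    A_t = C_t / (1.0 + np.outer(p, q))   # denominators are >= 1 when p, q >= 0
    return P @ A_t @ Q.T

# Usage on assumed random symmetric positive definite weights.
rng = np.random.default_rng(0)
G1 = rng.standard_normal((5, 5))
G2 = rng.standard_normal((3, 3))
W1 = G1 @ G1.T / 5 + 0.1 * np.eye(5)
W2 = G2 @ G2.T / 3 + 0.1 * np.eye(3)
C = rng.standard_normal((5, 3))
B1, B2 = W1 @ W1, W2 @ W2
A_sol = solve_weighted_sylvester(B1, B2, C)
```

Because $W_1$ and $W_2$ stay fixed within one run of Algorithm 1, the two eigendecompositions could be computed once and reused across ALM iterations, leaving only matrix products per $A$-update.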
We use it for experiments in Section 5. A faster algorithm that avoids the need for a binary search will be presented in a future publication.

4. Accuracy analysis for STLS

In the context of matrix completion and robust PCA, the nuclear norm relaxation has strong theoretical accuracy guarantees (Recht et al., 2010; Chandrasekaran et al., 2011). We now study accuracy guarantees for the STLS problem via the nuclear norm and the reweighted nuclear norm approaches. The analysis is conducted in the plain TLS setting, where the optimal solution is available via the SVD, and it gives valuable insight into the accuracy of our approach for the much harder STLS problem. In particular, we quantify the dramatic benefit of using reweighting. In this section we study a simplification of our STLS algorithm, where we set the regularization parameter $\alpha$ once and do not update it through the iterations. The full adaptive approach from Section 2.1 is analyzed in the addendum to this paper, where we show that it can in fact recover the exact SVD solution for plain TLS.

We first consider the problem $\min \|A - \bar{A}\|_F^2$ such that $\operatorname{rank}(A) \le N - 1$. For the exact solution via the SVD, the minimum approximation error is simply the square of the last singular value, $\mathrm{Err}_{SVD} = \|\hat{A}_{SVD} - \bar{A}\|_F^2 = \sigma_N^2$. The nuclear-norm approximation will have a higher error. We solve $\min \|A - \bar{A}\|_F^2 + \alpha \|A\|_*$ for the smallest choice of $\alpha$ that makes $A$ rank-deficient. A closed form solution for $A$ is the soft-thresholding operation with $\alpha = \sigma_N$. It subtracts $\alpha$ from all the singular values, making the error $\mathrm{Err}_{nn} = N \sigma_N^2$. While it is bounded, this is a substantial increase from the SVD solution. Using the log-det heuristic, we obtain much tighter accuracy guarantees even when we fix $\alpha$ and do not update it during re-weighting. Let $a_i = \frac{\sigma_i}{\sigma_N}$, the ratio of the $i$-th and the smallest singular values.
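The two error levels quoted above for plain TLS, $\sigma_N^2$ for the SVD solution and $N\sigma_N^2$ when $\sigma_N$ is subtracted from every singular value, can be checked numerically. This is a small sanity check on an assumed random matrix, not an experiment from the paper; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A_bar = rng.standard_normal((8, 5))              # generic matrix with full column rank
U, s, Vt = np.linalg.svd(A_bar, full_matrices=False)
N = s.size

# SVD solution: zero only the smallest singular value.
s_svd = s.copy()
s_svd[-1] = 0.0
A_svd = (U * s_svd) @ Vt

# Nuclear-norm style solution: soft-threshold all singular values by sigma_N.
A_nn = (U * np.maximum(s - s[-1], 0.0)) @ Vt

err_svd = np.linalg.norm(A_svd - A_bar, "fro") ** 2
err_nn = np.linalg.norm(A_nn - A_bar, "fro") ** 2

# Ratios of the two errors to sigma_N^2.
print(err_svd / s[-1] ** 2, err_nn / s[-1] ** 2)
```

Both approximations are rank-deficient, but the soft-thresholded one pays the $\sigma_N^2$ cost once per singular value, which is where the factor $N$ comes from.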
In the appendix, using 'log-thresholding', we derive that $\mathrm{Err}_{\text{rw-nn}} \approx \sigma_N^2 \big(1 + \tfrac{1}{2} \sum_{i}$ [...] $1$, or power-law decay $\sigma_i = \sigma_N (N - i + 1)^p$. The approximation errors are $\mathrm{Err}_{\exp} = \sigma_N^2 \big(1 + \tfrac{1}{2} \sum_{i}$ [...] $0$. This is a separable problem with a closed form solution for each coordinate (footnote 6) (contrast this with the soft-thresholding operation):

$$x_i = \begin{cases} \frac{1}{2}\Big((y_i - \delta) + \sqrt{(y_i - \delta)^2 - 4(\alpha - y_i \delta)}\Big), & y_i > 2\sqrt{\alpha} \\ \frac{1}{2}\Big((y_i + \delta) - \sqrt{(y_i + \delta)^2 - 4(\alpha + y_i \delta)}\Big), & y_i < -2\sqrt{\alpha} \\ 0, & \text{otherwise} \end{cases} \tag{26}$$

Footnote 5: Taking the SVD $A = U S V^T$ we have $\|U S V^T\|_F^2 = \|S\|_F^2$ and $\|U S V^T\|_* = \|S\|_*$ since $U$, $V$ are unitary.

Footnote 6: For $\delta$ small enough, the global minimum is always at 0, but if $y > 2\sqrt{\alpha}$ there is also a local minimum with a large domain of attraction between 0 and $y$. Iterative linearization methods with small enough step size starting at $y$ will converge to this local minimum.

Figure 4. STLS infers accurate fractions of cells in different physiological phases from measurements of population-average gene expression across growth rate. [Plot: fraction of cells in the HOC phase and the LOC phase versus growth rate, $h^{-1}$.]

Assuming that $\delta$ is negligible, we have:

$$x_i \approx \begin{cases} \frac{1}{2}\big(y_i + \sqrt{y_i^2 - 4\alpha}\big), & \text{if } y_i > 2\sqrt{\alpha} \\ \frac{1}{2}\big(y_i - \sqrt{y_i^2 - 4\alpha}\big), & \text{if } y_i < -2\sqrt{\alpha} \\ 0, & \text{otherwise,} \end{cases} \tag{27}$$

and we choose $\alpha$ to annihilate the smallest entry in $x$, i.e. $\alpha = \frac{1}{4} \min_i y_i^2$. Sorting the entries in $|y|$ in increasing order, with $y_0 = y_{\min}$, and defining $a_i = \frac{|y_i|}{|y_0|}$, we have $a_i \ge 1$, and the error in approximating the $i$-th entry, for $i > 0$, is

$$\mathrm{Err}_i = |x_i - y_i|^2 = \frac{y_0^2}{2} \Big(a_i - \sqrt{a_i^2 - 1}\Big)^2 \le \frac{y_0^2}{2 a_i^2}. \tag{28}$$

Also, by our choice of $\alpha$, we have $\mathrm{Err}_0 = y_0^2$ for $i = 0$. The approximation error quickly decreases for larger entries. In contrast, for $\ell_1$ soft-thresholding, the errors of approximating large entries are as bad as the ones for small entries.
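The contrast between log-thresholding and $\ell_1$ soft-thresholding can be made concrete with a short numerical sketch of (27). The code assumes $\delta$ is negligible, as in (27), and uses a hand-picked vector whose smallest entry is exactly 1 so that the threshold $2\sqrt{\alpha}$ is hit without rounding issues; the function names `log_threshold` and `soft_threshold` are illustrative.

```python
import numpy as np

def log_threshold(y, alpha):
    """Entry-wise rule (27): the delta -> 0 limit of the log-thresholding (26)."""
    y = np.asarray(y, dtype=float)
    x = np.zeros_like(y)
    big = np.abs(y) > 2.0 * np.sqrt(alpha)
    root = np.sqrt(y[big] ** 2 - 4.0 * alpha)
    x[big] = 0.5 * (y[big] + np.sign(y[big]) * root)   # covers both signs of y
    return x

def soft_threshold(y, gamma):
    """Standard l1 soft-thresholding, for comparison."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.maximum(np.abs(y) - gamma, 0.0)

y = np.array([1.0, 2.0, -3.0, 5.0])        # already sorted by magnitude, y_0 = 1
alpha = 0.25 * np.min(y ** 2)              # annihilates the smallest entry
x_log = log_threshold(y, alpha)
x_soft = soft_threshold(y, np.abs(y).min())

err_log = (x_log - y) ** 2
err_soft = (x_soft - y) ** 2
a = np.abs(y) / np.abs(y).min()
bound = y.min() ** 2 / (2.0 * a ** 2)      # right-hand side of (28), with y_0 = 1
```

In this example the log-thresholding errors shrink as $|y_i|$ grows and stay below the bound in (28), while every soft-thresholding error equals $y_0^2$, matching the qualitative claim in the text.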
This analysis extends directly to the log-det heuristic for relaxing matrix rank.

7. Conclusions

We considered a convex relaxation for a very rich class of structured TLS problems, and provided theoretical guarantees. We also developed an efficient first-order augmented Lagrangian multipliers algorithm for reweighted nuclear norm STLS, which can be applied beyond TLS to matrix completion and robust PCA problems. We applied STLS to quantifying cellular heterogeneity from population average measurements. In future work we will study STLS with sparse and group sparse solutions, and explore connections to robust LS (El Ghaoui & Lebret, 1997).

References

Bartels, R. H. and Stewart, G. W. Solution of the matrix equation AX + XB = C. Communications of the ACM, 15(9):820–826, 1972.

Beck, A. and Ben-Tal, A. A global solution for the structured total least squares problem with block circulant matrices. SIAM Journal on Matrix Analysis and Applic., 27(1):238–255, 2005.

Bertsekas, D. P. Constrained Optim. and Lagrange Multiplier Methods. Academic Press, 1982.

Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.

Cai, J., Candes, E. J., and Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optim., 20(4):1956–1982, 2010.

Candes, E. J., Wakin, M. B., and Boyd, S. P. Enhancing sparsity by reweighted l1 minimization. J. of Fourier Analysis and Applic., 14(5):877–905, 2008.

Chandrasekaran, V., Sanghavi, S., Parrilo, P. A., and Willsky, A. S. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optim., 21(2), 2011.

Deng, Y., Dai, Q., Liu, R., Zhang, Z., and Hu, S. Low-rank structure learning via log-sum heuristic recovery. arXiv preprint arXiv:1012.1919, 2012.
El Ghaoui, L. and Lebret, H. Robust solutions to least-squares problems with uncertain data. SIAM J. on Matrix Analysis and Applic., 18(4):1035–1064, 1997.

Fazel, M., Hindi, H., and Boyd, S. P. A rank minimization heuristic with application to minimum order system approximation. In IEEE American Control Conference, 2001.

Golub, G. H. and Van Loan, C. F. An analysis of the total least squares problem. SIAM Journal on Numerical Analysis, 17(6):883–893, 1980.

Kailath, T. Linear systems. Prentice-Hall, 1980.

Khajehnejad, A., Xu, W., Avestimehr, S., and Hassibi, B. Weighted l1 minimization for sparse recovery with prior information. In IEEE Int. Symposium on Inf. Theory, 2009, pp. 483–487, 2009.

Lin, Z., Chen, M., and Ma, Y. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv preprint arXiv:1009.5055, 2010.

Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., and Ma, Y. Robust recovery of subspace structures by low-rank representation. arXiv preprint arXiv:1010.2955, 2010.

Markovsky, I. and Usevich, K. Software for weighted structured low-rank approximation. J. Comput. Appl. Math., 256:278–292, 2014.

Markovsky, I. and Van Huffel, S. Overview of total least-squares methods. Signal processing, 87(10):2283–2302, 2007.

Markovsky, I., Willems, J. C., Van Huffel, S., De Moor, B., and Pintelon, R. Application of structured total least squares for system identification and model reduction. Automatic Control, IEEE Trans. on, 50(10):1490–1500, 2005.

Mohan, K. and Fazel, M. Reweighted nuclear norm minimization with application to system identification. In American Control Conference, 2010.

Needell, D. Noisy signal recovery via iterative reweighted l1-minimization. In Forty-Third Asilomar Conference on Signals, Systems and Computers, 2009, pp. 113–117. IEEE, 2009.

Recht, B., Fazel, M., and Parrilo, P. A. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.

Slavov, N. and Botstein, D. Coupling among growth rate response, metabolic cycle, and cell division cycle in yeast. Molecular bio. of the cell, 22(12), 2011.

Slavov, N., Macinskas, J., Caudy, A., and Botstein, D. Metabolic cycling without cell division cycling in respiring yeast. Proceedings of the National Academy of Sciences, 108(47), 2011.

Slavov, Nikolai, Airoldi, Edoardo M., van Oudenaarden, Alexander, and Botstein, David. A conserved cell growth cycle can account for the environmental stress responses of divergent eukaryotes. Molecular Biology of the Cell, 23(10):1986–1997, 2012.

Srebro, N. and Jaakkola, T. Weighted low-rank approximations. In Int. Conf. Machine Learning (ICML), 2003.

Toh, K. C., Todd, M. J., and Tütüncü, R. H. SDPT3 – a Matlab software package for semidefinite programming, version 1.3. Optim. Method. Softw., 11(1–4):545–581, 1999.

Zhu, H., Giannakis, G. B., and Leus, G. Weighted and structured sparse total least-squares for perturbed compressive sampling. In IEEE Int. Conf. Acoustics, Speech and Signal Proc., 2011.
