Manopt, a Matlab toolbox for optimization on manifolds


Authors: Nicolas Boumal, Bamdev Mishra, P.-A. Absil, Rodolphe Sepulchre

August 26, 2013

Abstract. Optimization on manifolds is a rapidly developing branch of nonlinear optimization. Its focus is on problems where the smooth geometry of the search space can be leveraged to design efficient numerical algorithms. In particular, optimization on manifolds is well-suited to deal with rank and orthogonality constraints. Such structured constraints appear pervasively in machine learning applications, including low-rank matrix completion, sensor network localization, camera network registration, independent component analysis, metric learning, dimensionality reduction and so on. The Manopt toolbox, available at www.manopt.org, is a user-friendly, documented piece of software dedicated to simplifying experimentation with state-of-the-art Riemannian optimization algorithms. We aim particularly at reaching practitioners outside our field.

Keywords: Riemannian optimization, nonlinear programming, nonconvex, orthogonality constraints, rank constraints, optimization with symmetries, rotation matrices.

1 Introduction

Optimization on manifolds, or Riemannian optimization, is a fast-growing research topic in the field of nonlinear optimization. Its purpose is to provide efficient numerical algorithms to solve optimization problems of the form

    min_{x ∈ M} f(x),        (1)

where the search space M is a smooth space: a differentiable manifold which can be endowed with a Riemannian structure. In a nutshell, this means M can be linearized locally at each point x as a tangent space T_x M, and an inner product ⟨·, ·⟩_x which smoothly depends on x is available on T_x M. A number of smooth search spaces arise often in applications:

• The oblique manifold M = {X ∈ R^{n×m} : diag(X^⊤X) = 1_m} is a product of spheres.
That is, X ∈ M if each column of X has unit 2-norm in R^n. Absil & Gallivan (2006) show how independent component analysis can be cast on this manifold as non-orthogonal joint diagonalization. When furthermore it is only the product Y = X^⊤X which matters, matrices of the form QX are equivalent for all orthogonal Q.

∗ Corresponding author: nicolasboumal@gmail.com
† Department of Mathematical Engineering, Université catholique de Louvain, Louvain-la-Neuve, Belgium.
‡ Department of Electrical Engineering and Computer Science, Université de Liège, Liège, Belgium.

Quotienting out this equivalence relation yields the fixed-rank elliptope M = {Y ∈ R^{m×m} : Y = Y^⊤ ⪰ 0, rank(Y) = n, diag(Y) = 1_m}. See the example below for an application to the max-cut problem. The packing problem on the sphere, where one wishes to place m points on the unit sphere in R^n such that the two closest points are as far apart as possible (Dirr et al., 2007), is another example of an optimization problem on the fixed-rank elliptope. Grubisic & Pietersz (2007) optimize over this set to produce low-rank approximations of covariance matrices.

• The (compact) Stiefel manifold is the Riemannian submanifold of orthonormal matrices, M = {X ∈ R^{n×m} : X^⊤X = I_m}. Amari (1999) and Theis et al. (2009) formulate versions of independent component analysis with dimensionality reduction as optimization over the Stiefel manifold.

• The Grassmann manifold is the manifold M = {col(X) : X ∈ R^{n×m}_*}, where R^{n×m}_* is the set of full-rank matrices in R^{n×m} and col(X) denotes the subspace spanned by the columns of X. That is, col(X) ∈ M is a subspace of R^n of dimension m.
Among other things, optimization over the Grassmann manifold proves useful in low-rank matrix completion, where it is observed that if one knows the column space spanned by the sought matrix, then completing the matrix according to a least-squares criterion is easy (Keshavan et al., 2010; Boumal & Absil, 2011; Balzano et al., 2010).

• The special orthogonal group M = {X ∈ R^{n×n} : X^⊤X = I_n and det(X) = 1} is the group of rotations, typically considered as a Riemannian submanifold of R^{n×n}. Optimization problems involving rotation matrices notably occur in robotics and computer vision, when estimating the attitude of vehicles or the pose of cameras (Tron & Vidal, 2009; Boumal et al., 2013).

• The set of fixed-rank matrices M = {X ∈ R^{n×m} : rank(X) = k} admits a number of different Riemannian structures. Vandereycken (2013) proposes an embedded geometry for M and exploits Riemannian optimization on that manifold to address the low-rank matrix completion problem. Shalit et al. (2012) use the same geometry to address similarity learning. Mishra et al. (2012) cover a number of quotient geometries for M and similarly address low-rank matrix completion problems.

• The set of symmetric, positive semidefinite, fixed-rank matrices is also a manifold, M = {X ∈ R^{n×n} : X = X^⊤ ⪰ 0, rank(X) = k}. Meyer et al. (2011) exploit this to propose low-rank algorithms for metric learning. This space is tightly related to the space of Euclidean distance matrices X such that X_ij is the squared distance between two fixed points x_i, x_j ∈ R^k. Mishra et al. (2011) leverage this geometry to formulate efficient low-rank algorithms for Euclidean distance matrix completion.
• The fixed-rank spectrahedron M = {X ∈ R^{n×n} : X = X^⊤ ⪰ 0, trace(X) = 1, rank(X) = k} is, without the rank constraint, a convex set which can be used to solve relaxed (lifted) formulations of the sparse PCA problem. Journée et al. (2010) show how optimizing over the fixed-rank spectrahedron can lead to efficient algorithms for sparse PCA.

The rich geometry of Riemannian manifolds M makes it possible to define gradients and Hessians of cost functions f, as well as systematic procedures (called retractions) to move on the manifold starting at a point x, along a specified tangent direction at x. Those are sufficient ingredients to generalize standard nonlinear optimization methods such as gradient descent, conjugate gradients, quasi-Newton, trust regions, etc.

In a recent monograph, Absil et al. (2008) lay down a mature framework to analyze problems of the form (1) when f is a smooth function, with a strong emphasis on building a theory that leads to efficient numerical algorithms. In particular, they describe the necessary ingredients to design first- and second-order algorithms on Riemannian manifolds in general. These algorithms come with convergence guarantees essentially matching those of the Euclidean counterparts they generalize. For example, the Riemannian trust-region method is known to converge globally toward critical points and to converge locally quadratically when the Hessian of f is available. In many respects, this theory subsumes well-known results from an earlier paper by Edelman et al. (1998), which focused on problems of the form (1) with M either the set of orthonormal matrices (the Stiefel manifold) or the set of linear subspaces (the Grassmann manifold).
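The two ingredients just mentioned, Riemannian gradients and retractions, are enough to run gradient descent on a manifold. As a minimal illustration (a NumPy sketch in our own notation, not Manopt's Matlab API), consider minimizing f(x) = x^⊤Ax over the unit sphere: the Riemannian gradient is the Euclidean gradient projected onto the tangent space, and the retraction simply renormalizes.

```python
import numpy as np

def sphere_retract(x, v):
    # Metric-projection retraction on the sphere: map x + v back to unit norm.
    y = x + v
    return y / np.linalg.norm(y)

def riemannian_gradient_descent(A, x0, step=0.1, iters=500):
    # Minimize f(x) = x' A x over the unit sphere in R^n.
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2 * A @ x                     # Euclidean gradient of f
        rgrad = egrad - (x @ egrad) * x       # project onto tangent space T_x M
        x = sphere_retract(x, -step * rgrad)  # step along -rgrad, then retract
    return x

# For a symmetric A, the minimizer is an eigenvector of the smallest eigenvalue.
A = np.diag([3.0, 2.0, 1.0])
x = riemannian_gradient_descent(A, np.ones(3))
```

The fixed step size is only for illustration; the solvers discussed in this paper use proper line searches or trust regions instead.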
The maturity of the theory of smooth Riemannian optimization, its widespread applicability and its excellent track record performance-wise prompted us to build the Manopt toolbox: a user-friendly piece of software to help researchers and practitioners experiment with these tools. Code and documentation are available at www.manopt.org.

2 Architecture and features of Manopt

The toolbox architecture is based on a separation of the manifolds, the solvers and the problem descriptions. For basic use, one only needs to pick a manifold from the library, describe the cost function (and possibly its derivatives) on this manifold and pass it on to a solver. Accompanying tools help the user in common tasks such as numerically checking whether the cost function agrees with its derivatives up to the appropriate order, approximating the Hessian based on the gradient of the cost, etc.

Manifolds in Manopt are represented as structures and are obtained by calling a factory. The manifold descriptions include projections on tangent spaces, retractions, helpers to convert Euclidean derivatives (gradient and Hessian) to Riemannian derivatives, etc. All the manifolds mentioned in the introduction work out of the box, and more can be added (shape space (Ring & Wirth, 2012), low-rank tensors (Kressner et al., 2013), etc.). Cartesian products of known manifolds are supported too.

Solvers are functions in Manopt that implement generic Riemannian minimization algorithms. Solvers log standard information at each iteration and comply with standard stopping criteria. Extra information can be logged via callbacks and, similarly, user-defined stopping criteria are allowed. Currently, Riemannian trust-regions (based on (Absil et al., 2007)) and conjugate-gradients are implemented (with preconditioning), as well as steepest-descent and a couple of derivative-free schemes.
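The numerical check of agreement between a cost and its claimed derivatives rests on a Taylor argument: if grad f is correct, then f(x + tv) − f(x) − t⟨grad f(x), v⟩ must decay like O(t²) as t shrinks. Manopt's own tooling is in Matlab; the following is a hedged Python sketch of that idea only, with names of our choosing.

```python
import numpy as np

def taylor_gradient_check(f, grad, x, v, ts):
    """Return |f(x + t v) - f(x) - t <grad f(x), v>| for each step size t.

    If grad is the true gradient of f, these residuals shrink like O(t^2):
    a tenfold decrease in t gives roughly a hundredfold decrease in error.
    """
    gx = grad(x)
    return [abs(f(x + t * v) - f(x) - t * np.dot(gx, v)) for t in ts]

# Example: f(x) = x' x with gradient 2x, checked along a fixed direction v.
f = lambda x: x @ x
grad = lambda x: 2 * x
x = np.array([1.0, -2.0, 0.5])
v = np.array([0.3, 0.1, -0.2])
errs = taylor_gradient_check(f, grad, x, v, [1e-2, 1e-3])
```

Plotting the residual against t on a log-log scale and reading off a slope of 2 is the usual way such a check is presented.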
More solvers can be added, with an outlook toward Riemannian BFGS (Ring & Wirth, 2012), stochastic gradients (Bonnabel, 2013), nonsmooth subgradient schemes (Dirr et al., 2007), etc.

An optimization problem in Manopt is represented as a problem structure. The latter includes a field which contains a structure describing a manifold, as obtained from a factory. Additionally, the problem structure hosts function handles for the cost function f and (possibly) its derivatives. An abstraction layer at the interface between the solvers and the problem description offers great flexibility in the cost function description. As the needs grow during the life cycle of the toolbox and new ways of describing f become necessary (subdifferentials, partial gradients, etc.), it will be sufficient to update this interface.

Computing f(x) typically produces intermediate results which can be reused in order to compute the derivatives of f at x. To prevent redundant computations, Manopt incorporates an (optional) caching system, which becomes useful when transitioning from a proof-of-concept draft of the algorithm to a convincing implementation.

3 Example: the maximum cut problem

Given an undirected graph with n nodes and weights w_ij ≥ 0 on the edges, such that W ∈ R^{n×n} is the weighted adjacency matrix and D ∈ R^{n×n} is the diagonal degree matrix with D_ii = Σ_j w_ij, the graph Laplacian is the positive semidefinite matrix L = D − W. The max-cut problem consists in building a partition s ∈ {+1, −1}^n of the nodes in two classes such that (1/4) s^⊤ L s = Σ_{i<j : s_i ≠ s_j} w_ij, the total weight of the edges crossing the partition, is maximized.
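The identity (1/4) s^⊤ L s = cut weight is easy to verify numerically. Below is a minimal NumPy sketch on a small graph of our own choosing (not an example from the paper): the quadratic form on the ±1 labeling recovers exactly the weight of the edges crossing the partition.

```python
import numpy as np

# Small weighted graph: symmetric weighted adjacency matrix W,
# diagonal degree matrix D with D_ii = sum_j w_ij, Laplacian L = D - W.
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
D = np.diag(W.sum(axis=1))
L = D - W  # positive semidefinite

# Partition: nodes 1 and 2 on one side, node 3 on the other.
s = np.array([1.0, 1.0, -1.0])

# Quadratic form: equals the total weight of crossing edges,
# here w13 + w23 = 1 + 3 = 4.
cut_value = 0.25 * s @ L @ s   # = 4.0
```

The relaxation discussed in the text replaces the vector s by a low-rank matrix variable constrained to the fixed-rank elliptope introduced in Section 1.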
