Manopt, a Matlab toolbox for optimization on manifolds


Authors: Nicolas Boumal, Bamdev Mishra, P.-A. Absil, Rodolphe Sepulchre

August 26, 2013

Abstract. Optimization on manifolds is a rapidly developing branch of nonlinear optimization. Its focus is on problems where the smooth geometry of the search space can be leveraged to design efficient numerical algorithms. In particular, optimization on manifolds is well-suited to deal with rank and orthogonality constraints. Such structured constraints appear pervasively in machine learning applications, including low-rank matrix completion, sensor network localization, camera network registration, independent component analysis, metric learning, dimensionality reduction and so on. The Manopt toolbox, available at www.manopt.org, is a user-friendly, documented piece of software dedicated to simplifying experimentation with state-of-the-art Riemannian optimization algorithms. We aim particularly at reaching practitioners outside our field.

Keywords: Riemannian optimization, nonlinear programming, nonconvex, orthogonality constraints, rank constraints, optimization with symmetries, rotation matrices.

1 Introduction

Optimization on manifolds, or Riemannian optimization, is a fast-growing research topic in the field of nonlinear optimization. Its purpose is to provide efficient numerical algorithms to solve optimization problems of the form

    min_{x ∈ M} f(x),        (1)

where the search space M is a smooth space: a differentiable manifold which can be endowed with a Riemannian structure. In a nutshell, this means M can be linearized locally at each point x as a tangent space T_x M, and an inner product ⟨·, ·⟩_x which smoothly depends on x is available on T_x M. A number of smooth search spaces arise often in applications:

• The oblique manifold M = {X ∈ R^{n×m} : diag(X^⊤X) = 1_m} is a product of spheres.
That is, X ∈ M if each column of X has unit 2-norm in R^n. Absil & Gallivan (2006) show how independent component analysis can be cast on this manifold as non-orthogonal joint diagonalization. When furthermore it is only the product Y = X^⊤X which matters, matrices of the form QX are equivalent for all orthogonal Q.

∗ Corresponding author: nicolasboumal@gmail.com
† Department of Mathematical Engineering, Université catholique de Louvain, Louvain-la-Neuve, Belgium.
‡ Department of Electrical Engineering and Computer Science, Université de Liège, Liège, Belgium.

Quotienting out this equivalence relation yields the fixed-rank elliptope M = {Y ∈ R^{m×m} : Y = Y^⊤ ⪰ 0, rank(Y) = n, diag(Y) = 1_m}. See the example below for an application to the max-cut problem. The packing problem on the sphere, where one wishes to place m points on the unit sphere in R^n such that the two closest points are as far apart as possible (Dirr et al., 2007), is another example of an optimization problem on the fixed-rank elliptope. Grubisic & Pietersz (2007) optimize over this set to produce low-rank approximations of covariance matrices.

• The (compact) Stiefel manifold is the Riemannian submanifold of orthonormal matrices, M = {X ∈ R^{n×m} : X^⊤X = I_m}. Amari (1999) and Theis et al. (2009) formulate versions of independent component analysis with dimensionality reduction as optimization over the Stiefel manifold.

• The Grassmann manifold is the manifold M = {col(X) : X ∈ R^{n×m}_*}, where R^{n×m}_* is the set of full-rank matrices in R^{n×m} and col(X) denotes the subspace spanned by the columns of X. That is, col(X) ∈ M is a subspace of R^n of dimension m.
Among other things, optimization over the Grassmann manifold proves useful in low-rank matrix completion, where it is observed that if one knows the column space spanned by the sought matrix, then completing the matrix according to a least-squares criterion is easy (Keshavan et al., 2010; Boumal & Absil, 2011; Balzano et al., 2010).

• The special orthogonal group M = {X ∈ R^{n×n} : X^⊤X = I_n and det(X) = 1} is the group of rotations, typically considered as a Riemannian submanifold of R^{n×n}. Optimization problems involving rotation matrices notably occur in robotics and computer vision, when estimating the attitude of vehicles or the pose of cameras (Tron & Vidal, 2009; Boumal et al., 2013).

• The set of fixed-rank matrices M = {X ∈ R^{n×m} : rank(X) = k} admits a number of different Riemannian structures. Vandereycken (2013) proposes an embedded geometry for M and exploits Riemannian optimization on that manifold to address the low-rank matrix completion problem. Shalit et al. (2012) use the same geometry to address similarity learning. Mishra et al. (2012) cover a number of quotient geometries for M and similarly address low-rank matrix completion problems.

• The set of symmetric, positive semidefinite, fixed-rank matrices is also a manifold, M = {X ∈ R^{n×n} : X = X^⊤ ⪰ 0, rank(X) = k}. Meyer et al. (2011) exploit this to propose low-rank algorithms for metric learning. This space is tightly related to the space of Euclidean distance matrices X such that X_ij is the squared distance between two fixed points x_i, x_j ∈ R^k. Mishra et al. (2011) leverage this geometry to formulate efficient low-rank algorithms for Euclidean distance matrix completion.
• The fixed-rank spectrahedron M = {X ∈ R^{n×n} : X = X^⊤ ⪰ 0, trace(X) = 1, rank(X) = k} is, without the rank constraint, a convex set which can be used to solve relaxed (lifted) formulations of the sparse PCA problem. Journée et al. (2010) show how optimizing over the fixed-rank spectrahedron can lead to efficient algorithms for sparse PCA.

The rich geometry of Riemannian manifolds M makes it possible to define gradients and Hessians of cost functions f, as well as systematic procedures (called retractions) to move on the manifold starting at a point x, along a specified tangent direction at x. Those are sufficient ingredients to generalize standard nonlinear optimization methods such as gradient descent, conjugate gradients, quasi-Newton, trust regions, etc.

In a recent monograph, Absil et al. (2008) lay down a mature framework to analyze problems of the form (1) when f is a smooth function, with a strong emphasis on building a theory that leads to efficient numerical algorithms. In particular, they describe the necessary ingredients to design first- and second-order algorithms on Riemannian manifolds in general. These algorithms come with convergence guarantees essentially matching those of the Euclidean counterparts they generalize. For example, the Riemannian trust-region method is known to converge globally toward critical points and to converge locally quadratically when the Hessian of f is available. In many respects, this theory subsumes well-known results from an earlier paper by Edelman et al. (1998), which focused on problems of the form (1) with M either the set of orthonormal matrices (the Stiefel manifold) or the set of linear subspaces (the Grassmann manifold).
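The two ingredients just mentioned, Riemannian gradients and retractions, are enough to run gradient descent on a manifold. As a minimal illustration (a NumPy sketch in our own notation, not Manopt's Matlab API), consider minimizing f(x) = x^⊤Ax over the unit sphere: the Riemannian gradient is the Euclidean gradient projected onto the tangent space, and the retraction simply renormalizes.

```python
import numpy as np

def sphere_retract(x, v):
    # Metric-projection retraction on the sphere: map x + v back to unit norm.
    y = x + v
    return y / np.linalg.norm(y)

def riemannian_gradient_descent(A, x0, step=0.1, iters=500):
    # Minimize f(x) = x' A x over the unit sphere in R^n.
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2 * A @ x                     # Euclidean gradient of f
        rgrad = egrad - (x @ egrad) * x       # project onto tangent space T_x M
        x = sphere_retract(x, -step * rgrad)  # step along -rgrad, then retract
    return x

# For a symmetric A, the minimizer is an eigenvector of the smallest eigenvalue.
A = np.diag([3.0, 2.0, 1.0])
x = riemannian_gradient_descent(A, np.ones(3))
```

The fixed step size is only for illustration; the solvers discussed in this paper use proper line searches or trust regions instead.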
The maturity of the theory of smooth Riemannian optimization, its widespread applicability and its excellent track record performance-wise prompted us to build the Manopt toolbox: a user-friendly piece of software to help researchers and practitioners experiment with these tools. Code and documentation are available at www.manopt.org.

2 Architecture and features of Manopt

The toolbox architecture is based on a separation of the manifolds, the solvers and the problem descriptions. For basic use, one only needs to pick a manifold from the library, describe the cost function (and possibly its derivatives) on this manifold and pass it on to a solver. Accompanying tools help the user in common tasks such as numerically checking whether the cost function agrees with its derivatives up to the appropriate order, approximating the Hessian based on the gradient of the cost, etc.

Manifolds in Manopt are represented as structures and are obtained by calling a factory. The manifold descriptions include projections on tangent spaces, retractions, helpers to convert Euclidean derivatives (gradient and Hessian) to Riemannian derivatives, etc. All the manifolds mentioned in the introduction work out of the box, and more can be added (shape space (Ring & Wirth, 2012), low-rank tensors (Kressner et al., 2013), etc.). Cartesian products of known manifolds are supported too.

Solvers are functions in Manopt that implement generic Riemannian minimization algorithms. Solvers log standard information at each iteration and comply with standard stopping criteria. Extra information can be logged via callbacks and, similarly, user-defined stopping criteria are allowed. Currently, Riemannian trust-regions (based on (Absil et al., 2007)) and conjugate-gradients are implemented (with preconditioning), as well as steepest-descent and a couple of derivative-free schemes.
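The numerical check of agreement between a cost and its claimed derivatives rests on a Taylor argument: if grad f is correct, then f(x + tv) − f(x) − t⟨grad f(x), v⟩ must decay like O(t²) as t shrinks. Manopt's own tooling is in Matlab; the following is a hedged Python sketch of that idea only, with names of our choosing.

```python
import numpy as np

def taylor_gradient_check(f, grad, x, v, ts):
    """Return |f(x + t v) - f(x) - t <grad f(x), v>| for each step size t.

    If grad is the true gradient of f, these residuals shrink like O(t^2):
    a tenfold decrease in t gives roughly a hundredfold decrease in error.
    """
    gx = grad(x)
    return [abs(f(x + t * v) - f(x) - t * np.dot(gx, v)) for t in ts]

# Example: f(x) = x' x with gradient 2x, checked along a fixed direction v.
f = lambda x: x @ x
grad = lambda x: 2 * x
x = np.array([1.0, -2.0, 0.5])
v = np.array([0.3, 0.1, -0.2])
errs = taylor_gradient_check(f, grad, x, v, [1e-2, 1e-3])
```

Plotting the residual against t on a log-log scale and reading off a slope of 2 is the usual way such a check is presented.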
More solvers can be added, with an outlook toward Riemannian BFGS (Ring & Wirth, 2012), stochastic gradients (Bonnabel, 2013), nonsmooth subgradient schemes (Dirr et al., 2007), etc.

An optimization problem in Manopt is represented as a problem structure. The latter includes a field which contains a structure describing a manifold, as obtained from a factory. Additionally, the problem structure hosts function handles for the cost function f and (possibly) its derivatives. An abstraction layer at the interface between the solvers and the problem description offers great flexibility in the cost function description. As the needs grow during the life cycle of the toolbox and new ways of describing f become necessary (subdifferentials, partial gradients, etc.), it will be sufficient to update this interface.

Computing f(x) typically produces intermediate results which can be reused in order to compute the derivatives of f at x. To prevent redundant computations, Manopt incorporates an (optional) caching system, which becomes useful when transitioning from a proof-of-concept draft of the algorithm to a convincing implementation.

3 Example: the maximum cut problem

Given an undirected graph with n nodes and weights w_ij ≥ 0 on the edges, such that W ∈ R^{n×n} is the weighted adjacency matrix and D ∈ R^{n×n} is the diagonal degree matrix with D_ii = Σ_j w_ij, the graph Laplacian is the positive semidefinite matrix L = D − W. The max-cut problem consists in building a partition s ∈ {+1, −1}^n of the nodes in two classes such that (1/4) s^⊤ L s = Σ_{i<j : s_i ≠ s_j} w_ij, the total weight of the edges crossing the partition, is maximized.
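The identity (1/4) s^⊤ L s = cut weight is easy to verify numerically. Below is a minimal NumPy sketch on a small graph of our own choosing (not an example from the paper): the quadratic form on the ±1 labeling recovers exactly the weight of the edges crossing the partition.

```python
import numpy as np

# Small weighted graph: symmetric weighted adjacency matrix W,
# diagonal degree matrix D with D_ii = sum_j w_ij, Laplacian L = D - W.
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
D = np.diag(W.sum(axis=1))
L = D - W  # positive semidefinite

# Partition: nodes 1 and 2 on one side, node 3 on the other.
s = np.array([1.0, 1.0, -1.0])

# Quadratic form: equals the total weight of crossing edges,
# here w13 + w23 = 1 + 3 = 4.
cut_value = 0.25 * s @ L @ s   # = 4.0
```

The relaxation discussed in the text replaces the vector s by a low-rank matrix variable constrained to the fixed-rank elliptope introduced in Section 1.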
