Low-Rank Modeling and Its Applications in Image Analysis


Authors: Xiaowei Zhou, Can Yang, Hongyu Zhao, Weichuan Yu

Xiaowei Zhou, The Hong Kong University of Science and Technology
Can Yang, Hong Kong Baptist University
Hongyu Zhao, Yale University
Weichuan Yu, The Hong Kong University of Science and Technology

Low-rank modeling generally refers to a class of methods that solve problems by representing variables of interest as low-rank matrices. It has achieved great success in various fields including computer vision, data mining, signal processing and bioinformatics. Recently, much progress has been made in theories, algorithms and applications of low-rank modeling, such as exact low-rank matrix recovery via convex programming and matrix completion applied to collaborative filtering. These advances have brought more and more attention to this topic. In this paper, we review the recent advances of low-rank modeling, the state-of-the-art algorithms, and related applications in image analysis. We first give an overview of the concept of low-rank modeling and the challenging problems in this area. Then, we summarize the models and algorithms for low-rank matrix recovery and illustrate their advantages and limitations with numerical experiments. Next, we introduce a few applications of low-rank modeling in the context of image analysis. Finally, we conclude the paper with some discussions.

1. INTRODUCTION

In many research fields the data to be analyzed often have high dimensionality, which brings great challenges to data analysis. Some examples include images in computer vision, documents in natural language processing, customers' records in recommender systems, and genomics data in bioinformatics. Fortunately, high-dimensional data often lie in a low-dimensional subspace.
Mathematically, if we represent each data point by a vector $d_i \in \mathbb{R}^m$ and denote the entire dataset by a big matrix $D = [d_1, \cdots, d_n]$, the low-dimensionality assumption can be translated into the following low-rank assumption: $\mathrm{rank}(D) \ll \min(m, n)$. A typical example in computer vision is Lambertian reflectance, where $d_1, \cdots, d_n$ correspond to a set of images of a convex Lambertian surface under various lighting conditions [Basri and Jacobs 2003]. Another example is from signal processing, where $d_i$ represents a vector of signal intensities received by an antenna array at time point $i$. Interested readers are referred to [Markovsky 2012, Chapter 1.3] for more low-rank examples.

In the real world, the raw data can hardly be perfectly low-rank due to the existence of noise. Therefore, the following model is more faithful to real situations:

    D = X + E,    (1)

where $X$ corresponds to a low-rank component and $E$ corresponds to the noise or error in measurements. Recovering the low-rank structure from noisy data becomes the central task in many problems.

A conventional approach to finding the low-rank approximation is to solve the following optimization problem:

    \min_X \|D - X\|_F^2, \quad \text{s.t.} \quad \mathrm{rank}(X) \le r,    (2)

where $\|Y\|_F = \sqrt{\sum_{ij} Y_{ij}^2}$ denotes the Frobenius norm of a matrix.^1 Solving this minimization problem can be interpreted as seeking the optimal rank-$r$ estimate of $D$ in a least-squares sense. According to the matrix approximation theorem [Eckart and Young 1936], the solution to (2) is given analytically by the singular value decomposition (SVD):

    X^* = \sum_{i=1}^{r} \sigma_i u_i v_i^T,    (3)

where $\{u_i\}$, $\{v_i\}$, and $\{\sigma_i\}$ for $i = 1, \cdots, r$ correspond to the left singular vectors, right singular vectors, and singular values of $D$, respectively.
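The truncated SVD in (2)-(3) is directly available in standard numerical libraries. The following minimal NumPy sketch (the helper name `best_rank_r` and the test matrices are ours, for illustration only) computes the optimal rank-$r$ approximation of a noisy low-rank matrix:

```python
import numpy as np

def best_rank_r(D, r):
    """Best rank-r approximation of D in Frobenius norm (Eckart-Young),
    obtained by truncating the SVD to the top r singular triplets, as in (3)."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U[:, :r] * s[:r] @ Vt[:r, :]

# A noisy low-rank matrix: rank-2 signal plus small Gaussian noise, as in (1).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
D = X + 0.01 * rng.standard_normal((50, 40))

X_hat = best_rank_r(D, 2)
print(np.linalg.norm(D - X_hat))  # residual is roughly the noise level
```

By the Eckart-Young theorem, no rank-2 matrix fits $D$ better than `X_hat` in Frobenius norm, so the residual is at most the norm of the added noise.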
The vectors $u_1, \cdots, u_r$ also form an orthogonal basis of an $r$-dimensional subspace that can best embed the data. This procedure corresponds to Principal Component Analysis (PCA) [Jolliffe 2002] in statistics.

While PCA is one of the most popular tools for data analysis because of its analytical solution and its provable optimality under certain assumptions, it cannot handle some difficulties in real applications. Consider the following two common examples.

Recovery from a few entries. In many applications, we would like to recover a matrix from only a small number of observed entries. A typical example is that, when building recommender systems, we hope to make predictions of customers' preferences based on the information collected so far. The NetFlix problem [Koren et al. 2009] is a famous instance. The data is a big matrix $D$ with each entry $D_{ij} \in \{1, \cdots, 5\}$ recording the rating of customer $i$ for movie $j$. There are around 480K customers and 18K movies in the dataset, but only 1.2% of the entries have values, since each customer only rated about 200 movies on average. The problem is how to predict the ratings that have not been made yet based on the current observations. A popular solution is to assume that the rating matrix is low-rank. This assumption is based on the fact that a subgroup of customers are likely to share a similar taste, so their ratings of the movies will be highly correlated. Consequently, the rank of the rating matrix will be bounded by the number of subgroups formed by the customers. Therefore, the problem turns into recovering a low-rank matrix from a few entries. This problem is often called Matrix Completion (MC).

Recovery from gross errors. In some other applications, we have to recover a low-dimensional subspace from corrupted data.
For example, the face images of a person may include glasses or shadows that occlude the true appearance. The classical PCA assumes independently and identically distributed (i.i.d.) Gaussian noise and adopts the sum of squared differences as the loss function, as shown in (2). Since least-squares fitting is sensitive to outliers, the classical PCA can be easily corrupted by these gross errors. For example, the reconstructed face images would include artifacts caused by the glasses or shadows in the input images [De La Torre and Black 2003]. Recovering a subspace or low-rank matrix robustly in the presence of outliers has become a popularly-studied issue. This problem is often called Robust Principal Component Analysis (RPCA).

In recent years, many new techniques for low-rank matrix recovery have been proposed. In the following, we will introduce some representative works.^1 Basically, they can be divided into two categories based on their approaches to modeling the low-rank prior. The first approach is to minimize the rank of the unknown matrix subject to some constraints. The rank minimization is often achieved by convex relaxation. We call these methods rank minimization methods. The second approach is to factorize the unknown matrix as a product of two factor matrices. The rank of the unknown matrix is then upper bounded by the ranks of the factor matrices. We call these methods matrix factorization methods.

^1 In this paper, a matrix is denoted by a capital letter, e.g., $Y$. An element and a column of $Y$ are denoted by $Y_{ij}$ and $y_i$, respectively.

The rest of this paper is organized as follows.^2 In Section 2, we will review the rank minimization methods for low-rank matrix recovery. We shall introduce some typical models as well as the corresponding optimization algorithms to solve these models.
In Section 3, we will introduce matrix factorization methods for low-rank matrix recovery. In Section 4, we will use synthesized experiments to illustrate the performance of the discussed methods. In Section 5, we will give a brief review of the applications of low-rank modeling in image analysis. Finally, we will conclude the paper with discussions in Section 6.

2. RANK MINIMIZATION

A direct approach to recovering a low-rank matrix is to minimize the rank of the matrix under certain constraints that make the estimated matrix consistent with the original data. However, the rank minimization problem is combinatorial and known to be NP-hard [Fazel 2002]. Therefore, convex relaxation is often used to make the minimization tractable. The most popular choice is to replace rank with the nuclear norm, which is defined as

    \|X\|_* = \sum_{i=1}^{r} \sigma_i,    (4)

where $\sigma_1, \sigma_2, \cdots, \sigma_r$ are the singular values of $X$ and $r$ is the rank of $X$.

The advantages of using the nuclear norm relaxation are mainly twofold. Firstly, the nuclear norm is convex. Hence, it is feasible to compute the global optimum of the relaxed problem efficiently. Secondly, the nuclear norm is proven to be the tightest convex surrogate of rank [Fazel 2002]; that is, the nuclear norm is the best approximation to the rank operator among all convex functions. Moreover, the analogy between using the nuclear norm for low-rank matrix recovery and using the $\ell_1$-norm for sparse signal recovery has been well established [Recht et al. 2010], and the exact recovery property has been proven for some low-rank models using the nuclear norm [Recht et al. 2010; Candès and Recht 2009; Candès et al. 2011]. In the following, we will first introduce the convex models, summarize the optimization algorithms, and finally introduce some nonconvex relaxation methods briefly.

2.1. Matrix Completion

In matrix completion, missing values in a matrix are estimated given the observed values $\{D_{ij} \mid ij \in \Omega\}$, where $\Omega$ denotes the set of observed entries. As discussed previously, the common assumption is that the matrix should be low-rank. To solve the problem, the following optimization problem is often considered:

    \min_X \mathrm{rank}(X), \quad \text{s.t.} \quad \mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(D),    (5)

where $\mathcal{P}_\Omega(X)$ denotes the operation of projecting the matrix $X$ onto the space of all matrices with nonzero elements restricted to $\Omega$, i.e., $\mathcal{P}_\Omega(X)$ has the same values as $X$ for the entries in $\Omega$ and zero values for the entries outside $\Omega$. The equality constraint in (5) says that the estimated values should coincide with the existing data.^2

^2 A conference version of this paper appeared in Proceedings of SPIE Defense, Security, and Sensing 2013 [Zhou and Yu 2013].

As we discussed earlier, replacing rank with the nuclear norm can make the problem tractable. In some recent works [Candès and Recht 2009; Cai et al. 2010], the following convex problem is solved:

    \min_X \|X\|_*, \quad \text{s.t.} \quad \mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(D).    (6)

Candès and Recht [2009] theoretically proved that the solution of (6) can exactly recover the low-rank matrix with high probability if the underlying low-rank matrix satisfies the incoherence condition and the locations of the observed entries $\Omega$ are uniformly distributed with $|\Omega| \ge C\, n^{1.2}\, r \log n$, where $|\Omega|$ is the number of observed entries, $C$ is a positive constant, $n$ is the matrix size, and $r$ is the rank. Here the incoherence condition is used to mathematically characterize the difficulty of recovering the underlying low-rank matrix from a small number of sampled entries. Informally, it says that the singular vectors of the underlying low-rank matrix should be sufficiently "spread out" and uncorrelated with the standard basis.
An extreme example is an underlying low-rank matrix that takes the value 1 in its $(i, j)$-th entry and 0 elsewhere. This matrix can be recovered only if the $(i, j)$-th entry is actually sampled. This result has been further strengthened to $|\Omega| \ge C\, nr\, \mathrm{poly}(\log n)$ by imposing the strong incoherence condition [Candès and Tao 2010; Gross 2011].

In real applications, the observed entries may be noisy, and the equality constraint in (6) will be too strict, resulting in over-fitting [Mazumder et al. 2010]. Therefore, the following relaxed form of (6) is often considered for matrix completion with noise [Candès and Plan 2010; Mazumder et al. 2010]:

    \min_X \frac{1}{2} \|\mathcal{P}_\Omega(D) - \mathcal{P}_\Omega(X)\|_F^2 + \lambda \|X\|_*,    (7)

where the parameter $\lambda$ controls the rank of $X$, and the selection of $\lambda$ should depend on the noise level [Candès and Plan 2010].

2.2. Robust Principal Component Analysis

Convex programming has also been used to solve RPCA. A popular method, named sparse and low-rank decomposition [Candès et al. 2011], involves the decomposition of a matrix $D$ into a sum of a low-rank component $X$ and a sparse component $E$ by minimizing the rank of $X$ and the cardinality of $E$ simultaneously. The surprising message is that, under some mild assumptions, the low-rank matrix can be exactly recovered by the following convex program, named Principal Component Pursuit (PCP) [Candès et al. 2011]:

    \min_{X,E} \|X\|_* + \lambda \|E\|_1, \quad \text{s.t.} \quad X + E = D.    (8)

Here, the nuclear norm $\|X\|_*$ and the $\ell_1$-norm $\|E\|_1$ are the convex surrogates of rank and cardinality, respectively. Candès et al. [2011] and Chandrasekaran et al. [2011] analyzed the conditions for exact recovery.
Briefly speaking, it has been proven that the underlying low-rank matrix $X^*$ and the underlying sparse matrix $E^*$ can be exactly recovered with high probability if $X^*$ satisfies the incoherence condition and the nonzero entries of $E^*$ are sufficiently sparse with a random spatial distribution. Moreover, a theoretical choice of the parameter $\lambda$ is provided to make exact recovery most likely [Candès et al. 2011].

The basic model in (8) was extended to handle additional scenarios, such as Stable PCP, which considers Gaussian noise [Zhou et al. 2010], outlier pursuit, which incorporates group sparsity [Xu et al. 2012a], and matrix recovery from compressive measurements [Wright et al. 2012]. In Stable PCP [Zhou et al. 2010], the equality constraint in (8) is relaxed to $\|X + E - D\|_F \le \sigma$ to allow for the existence of Gaussian noise. In implementation, the following problem is solved:

    \min_{X,E} \|X\|_* + \lambda \|E\|_1 + \frac{\mu}{2} \|X + E - D\|_F^2,    (9)

where $\mu$ is a parameter determined by the noise level.

2.3. Optimization Algorithms

The following theorem ([Cai et al. 2010, Theorem 2.1]) serves as an important building block in nuclear norm minimization algorithms:

THEOREM 2.1. The solution to the following problem

    \min_X \frac{1}{2} \|Z - X\|_F^2 + \lambda \|X\|_*    (10)

is given by $X^* = \mathcal{D}_\lambda(Z)$, where

    \mathcal{D}_\lambda(Z) = \sum_{i=1}^{r} (\sigma_i - \lambda)_+ u_i v_i^T,    (11)

$r$ is the rank of $Z$, $(x)_+ = \max(x, 0)$, and $\{u_i\}$, $\{v_i\}$ and $\{\sigma_i\}$ are the left singular vectors, right singular vectors and singular values of $Z$, respectively.

$\mathcal{D}_\lambda$ is named the singular value thresholding (SVT) operator [Cai et al. 2010]. Based on Theorem 2.1, various algorithms have been developed for specific problems.
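The SVT operator in (11) is only a few lines of code. Below is a minimal NumPy sketch (the function name `svt` and the test matrix are ours): it shrinks each singular value by $\lambda$ and discards those that become nonpositive.

```python
import numpy as np

def svt(Z, lam):
    """Singular value thresholding: the closed-form solution of (10),
    i.e. the prox operator of lam * nuclear norm, as given in (11)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U * np.maximum(s - lam, 0.0) @ Vt

rng = np.random.default_rng(0)
Z = rng.standard_normal((8, 6))
X = svt(Z, lam=1.0)
# Each singular value of X is the corresponding singular value of Z shrunk by 1,
# and singular values below 1 are zeroed out, reducing the rank.
print(np.linalg.svd(X, compute_uv=False))
```

Because small singular values are zeroed, the output typically has lower rank than the input, which is exactly why this operator appears inside nuclear-norm minimization loops.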
Two of the most popular techniques are the Proximal Gradient (PG) method [Moreau 1965] and the Augmented Lagrangian Method (ALM) [Bertsekas 1999], both of which are applicable to a variety of convex problems. The PG method is very useful for solving norm-regularized maximum-likelihood problems such as the model in (7), whose energy function comprises a differentiable loss and a nonsmooth regularizer. Moreover, the PG method is often combined with the Nesterov method to accelerate convergence [Nesterov 2007; Beck and Teboulle 2009]. Examples using the PG method include [Ji and Ye 2009; Mazumder et al. 2010; Toh and Yun 2010], etc. The ALM method is closely related to the Alternating Direction Method of Multipliers (ADMM) [Boyd 2010]. It provides a powerful framework to solve convex problems with equality constraints such as MC in (6) and PCP in (8). The algorithms used in [Lin et al. 2010; Candès et al. 2011] belong to this class. The details of PG and ALM are given in the following subsections.

2.3.1. Proximal Gradient. In sparse learning problems, the following optimization problem is often considered:

    \min_X f(X) + \lambda R(X),    (12)

where $f(X)$ usually denotes a differentiable loss function and $R(X)$ corresponds to a convex regularizer, which might be nonsmooth. For example, matrix completion with noise in (7) uses $f(X) = \|\mathcal{P}_\Omega(X - D)\|_F^2$ and $R(X) = \|X\|_*$.

If $f(X)$ is simply the sum of squared differences between $X$ and a given matrix, the problem in (12) is named the proximal problem [Moreau 1965], which can be solved analytically for many types of $R(X)$. For example, if $R(X)$ is the nuclear norm, the problem can be solved analytically based on Theorem 2.1. When (12) is not a standard proximal problem, the Proximal Gradient (PG) method [Moreau 1965] is usually used.
In PG, a quadratic approximation to $f(X)$ is made around the previous estimate $X'$ in each iteration. Define

    Q_\mu(X, X') = f(X') + \langle \nabla f(X'), X - X' \rangle + \frac{\mu}{2} \|X - X'\|_F^2 + \lambda R(X)
                 = \frac{\mu}{2} \left\| X - \left[ X' - \tfrac{1}{\mu} \nabla f(X') \right] \right\|_F^2 + \lambda R(X) + \text{const.},    (13)

where $\langle \cdot, \cdot \rangle$ denotes the inner product and $\mu$ is a constant. It can be proven [Beck and Teboulle 2009] that, if $f(X)$ is differentiable and convex with a Lipschitz continuous gradient satisfying

    \|\nabla f(X_1) - \nabla f(X_2)\|_F \le \mu \|X_1 - X_2\|_F,    (14)

then (12) can be solved by repeatedly updating $X$ via

    X_{k+1} = \arg\min_X Q_\mu(X, X_k) = \arg\min_X \frac{1}{2} \left\| X - \left[ X_k - \tfrac{1}{\mu} \nabla f(X_k) \right] \right\|_F^2 + \frac{\lambda}{\mu} R(X)    (15)

with a convergence rate of $O(1/k)$. It is easy to see that (15) is simply the proximal problem, which is often convenient to solve.

The Accelerated Proximal Gradient (APG) method uses the Nesterov method [Nesterov 1983] to accelerate the convergence of PG. Instead of making the quadratic approximation around $X_k$, APG makes the approximation at another point $Y_k$, which is a linear combination of $X_k$ and $X_{k-1}$. This modification gives a convergence rate of $O(1/k^2)$. Please refer to [Nesterov 2007; Beck and Teboulle 2009] for details. The APG method is summarized in Algorithm 1.

Algorithm 1: Accelerated Proximal Gradient (APG)
1. Initialize: $X_0 = X_{-1} \in \mathbb{R}^{m \times n}$, $t_0 = t_{-1} = 1$
2. repeat
3.   $Y_k = X_k + \frac{t_{k-1} - 1}{t_k} (X_k - X_{k-1})$
4.   $X_{k+1} = \arg\min_X Q_\mu(X, Y_k)$
5.   $t_{k+1} = \frac{1 + \sqrt{1 + 4 t_k^2}}{2}$
6. until convergence

The PG and APG methods have been intensively used to solve the matrix completion problem in (7), where the update in (15) is solved via SVT. For example, the SOFT-IMPUTE algorithm in [Mazumder et al. 2010] solves (7) by iteratively updating $X$ via

    X_{k+1} = \mathcal{D}_\lambda\big(\mathcal{P}_\Omega(D) + \mathcal{P}_{\Omega^\perp}(X_k)\big) = \mathcal{D}_\lambda\big(X_k - [\mathcal{P}_\Omega(X_k) - \mathcal{P}_\Omega(D)]\big),    (16)

where $\mathcal{P}_{\Omega^\perp}$ denotes the complementary projection such that $\mathcal{P}_\Omega(X_k) + \mathcal{P}_{\Omega^\perp}(X_k) = X_k$. It is straightforward to see that (16) is equivalent to (15) with $\mu = 1$. Hence, the SOFT-IMPUTE algorithm can be interpreted as PG with a fixed step length. The FPCA algorithm introduced in [Ma et al. 2011] is also based on PG, with a continuation technique to accelerate convergence. Ji and Ye [2009] and Toh and Yun [2010] also proposed different implementations of APG for matrix completion. Tomioka et al. [2010] proposed a Dual Augmented Lagrangian algorithm for matrix completion, which achieves super-linear convergence. It can be interpreted as a proximal method with the descending directions computed from the augmented Lagrangian of the dual problem.

2.3.2. Augmented Lagrangian Method. The Augmented Lagrangian Method (ALM) [Bertsekas 1999] is a classical tool for minimizing a convex function with equality constraints. We will use PCP in (8) as an example to introduce this method. To remove the constraint $X + E = D$, a multiplier $Y$ is introduced, and the augmented Lagrangian of (8) reads

    L_\mu(X, E, Y) = \|X\|_* + \lambda \|E\|_1 + \langle Y, D - X - E \rangle + \frac{\mu}{2} \|D - X - E\|_F^2.    (17)

ALM alternates between the following two steps:

    (X_{k+1}, E_{k+1}) = \arg\min_{X,E} L_\mu(X, E, Y_k),    (18)
    Y_{k+1} = Y_k + \mu (D - X_{k+1} - E_{k+1}),    (19)

which are named primal minimization and dual ascent, respectively. For PCP, the primal minimization in (18) is difficult over $X$ and $E$ simultaneously.
But if we fix one variable and minimize over the other, the marginal optimization over $X$ (or $E$) turns into the nuclear norm (or $\ell_1$-norm) regularized proximal problem, which can be efficiently solved by SVT (or soft-thresholding). Then, the $X$-step and $E$-step are repeated until convergence to solve (18).

A more efficient way is to update the primal variables $X$ and $E$ for only one iteration, instead of exactly solving (18), before updating the dual variable $Y$ [Lin et al. 2010; Candès et al. 2011]. This is named the Inexact Augmented Lagrangian Method (IALM), a special case of the Alternating Direction Method of Multipliers (ADMM) [Boyd 2010]. The method is summarized in Algorithm 2. It can be proven that the sequences $\{X_k\}$ and $\{E_k\}$ converge to an optimal solution of (8) [Lin et al. 2010; Boyd 2010].

Algorithm 2: Inexact Augmented Lagrangian Method (IALM) for PCP
1. Initialize: $E_0 = Y_0 = 0$
2. repeat
3.   $X_{k+1} = \arg\min_X L_\mu(X, E_k, Y_k)$
4.   $E_{k+1} = \arg\min_E L_\mu(X_{k+1}, E, Y_k)$
5.   $Y_{k+1} = Y_k + \mu (D - X_{k+1} - E_{k+1})$
6. until convergence

ALM can also be applied to matrix completion in (6). In [Lin et al. 2010], the equality constraint $\mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(D)$ is replaced by $X = D + E$ and $\mathcal{P}_\Omega(E) = 0$. The new constraint is equivalent to the original one, but the projection operator on $X$ has been removed. Then, ALM is applied. In this way, minimizing the augmented Lagrangian over $X$ turns into a proximal problem, which can be solved by SVT. ALM was also applied to solve the nonnegative matrix factorization problem for matrix completion [Xu et al. 2012b].

2.4. Nonconvex Rank Minimization

Recently, a few works have used nonconvex functions instead of the nuclear norm as surrogates of rank for rank minimization. A typical choice is the Schatten-$p$ norm of the singular values:

    f_p(X) = \left( \sum_{i=1}^{r} \sigma_i^p \right)^{1/p},    (20)

where $\sigma_1, \cdots, \sigma_r$ are the singular values of $X$. When $p \to 0$, $f_p(X) \to \mathrm{rank}(X)$, and the minimization is intractable. When $p = 1$, $f_1(X)$ turns out to be the nuclear norm, which is the tightest convex surrogate. To bridge the gap, the nonconvex cases with $0 < p < 1$ were considered in recent literature [Majumdar and Ward 2011; Mohan and Fazel 2012; Marjanovic and Solo 2012; Nie et al. 2012; Li et al. 2014]. A theoretical analysis of the recovery properties can be found in [Zhang et al. 2013a]. Besides the Schatten-$p$ norm, other nonconvex surrogate functions for rank minimization were also studied in [Lu et al. 2014].

3. MATRIX FACTORIZATION

Instead of minimizing rank, another approach to modeling the low-rank property is matrix factorization. Matrix factorization intends to decompose $X \in \mathbb{R}^{m \times n}$ as a product of two factor matrices, $X = AB^T$, where $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{n \times r}$. Using matrix factorization to model a low-rank matrix is based on the fact that

    \mathrm{rank}(AB^T) \le \min(\mathrm{rank}(A), \mathrm{rank}(B)).    (21)

Therefore, if $r$ is small, $X$ has a small rank. The problem of recovering a low-rank matrix can thus be converted into estimating the two factor matrices $A$ and $B$. In this paper, we will discuss representative matrix-factorization methods in the context of low-rank matrix recovery. Notice that not all matrix factorization methods aim to recover a low-rank matrix. For example, the outputs of nonnegative matrix factorization [Lee and Seung 1999] or dictionary learning [Tosic and Frossard 2011] are not necessarily low-rank. We will not discuss these methods here. For a summary of matrix factorization methods, please refer to [Singh and Gordon 2008].

3.1. Matrix Factorization with Missing Values

Basically, the factorization-based methods for matrix completion aim to solve the following optimization problem:

    \min_{A,B} \frac{1}{2} \|\mathcal{P}_\Omega(D) - \mathcal{P}_\Omega(AB^T)\|_F^2.    (22)

A straightforward approach to solving (22) is to minimize the function over $A$ or $B$ alternately while fixing the other. Each subproblem of estimating $A$ or $B$ turns into a least-squares problem, which admits a closed-form solution. Algorithms of this type have been extensively studied in many works, such as the early computer vision literature [Shum et al. 1995; Vidal and Hartley 2004] and the recent matrix recovery literature [Haldar and Hernando 2009; Tang and Nehorai 2011; Jain et al. 2013]. The matrix completion solver LMaFit [Wen et al. 2012] also adopted the alternating strategy to solve the following equivalent form of (22):

    \min_{A,B,Z} \frac{1}{2} \|Z - AB^T\|_F^2, \quad \text{s.t.} \quad \mathcal{P}_\Omega(Z) = \mathcal{P}_\Omega(D),    (23)

where $Z$ is an auxiliary variable. Each step of updating $A$, $B$ or $Z$ can be solved very efficiently. Additionally, LMaFit integrates a nonlinear successive over-relaxation scheme to accelerate the convergence of the alternation.

While the formulation in (22) is nonconvex, the empirical results in many works have demonstrated that alternating minimization performs both accurately and efficiently compared to convex methods [Haldar and Hernando 2009; Keshavan et al. 2009; Tang and Nehorai 2011]. Meanwhile, the theoretical analysis in [Jain et al. 2013] showed that alternating minimization can succeed under conditions similar to the existing conditions given in [Candès and Recht 2009], which were introduced in Section 2.1. Lower bounds on the recovery error of alternating minimization for matrix completion were analyzed in [Tang and Nehorai 2011].

In the computer vision literature, many works adopted higher-order algorithms instead of alternating least squares to solve (22) for faster convergence and better precision. For example, Buchanan and Fitzgibbon [2005] developed a Damped Newton algorithm to solve the problem.
The variables $A$ and $B$ are updated based on the Newton algorithm with a damping factor. However, such methods cannot handle large-scale problems due to the infeasibility of computing the Hessian matrix over a large number of variables. To interpolate between alternating least squares and the Newton algorithm, some works proposed hybrid algorithms. In the Wiberg algorithm [Okatani and Deguchi 2007], $A$ is updated via least squares while $B$ is updated by a Gauss-Newton step in each iteration. Later, the Wiberg algorithm was extended to a damped version to achieve better convergence [Okatani et al. 2011]. Chen [2008] proposed an algorithm similar to the Wiberg algorithm; the difference is that $B$ is updated via the Levenberg-Marquardt algorithm and constrained in $\{B \mid B^T B = I\}$. Interested readers can refer to [Okatani et al. 2011] for a more detailed introduction to the factorization methods in computer vision.

When the observation is highly incomplete, the problem in (22) is likely to be ill-posed, which is a common case in collaborative filtering. A popular approach to addressing this issue is to penalize the squared Frobenius norms of the factor matrices:

    \min_{A,B} \frac{1}{2} \|\mathcal{P}_\Omega(D) - \mathcal{P}_\Omega(AB^T)\|_F^2 + \frac{\lambda}{2} \left( \|A\|_F^2 + \|B\|_F^2 \right).    (24)

This method is named Maximum Margin Matrix Factorization (MMMF) [Srebro et al. 2005]. The idea is similar to using the squared $\ell_2$-norm in ridge regression to improve the stability of parameter estimation. Moreover, the following equality is established in [Srebro et al. 2005]:

    \|X\|_* = \min_{A,B:\, X = AB^T} \frac{1}{2} \left( \|A\|_F^2 + \|B\|_F^2 \right),    (25)

which indicates the equivalence between the MMMF in (24) and the nuclear norm minimization in (7). This equivalence was also studied in [Mazumder et al. 2010; Wang et al. 2012; Cabral et al. 2013].
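As an illustration of (24), here is a minimal NumPy sketch of one standard way to minimize it: alternating ridge regressions, where each row of $A$ (then of $B$) is refit over its observed entries with the factor held fixed. The function name `mmmf_als`, the hyperparameters and the synthetic data are ours, not from the paper.

```python
import numpy as np

def mmmf_als(D, M, r, lam=1e-3, iters=50, seed=0):
    """Alternating ridge regression for the MMMF objective (24).
    M is a boolean mask of observed entries; with B fixed, row i of A solves
    a small r-dimensional ridge problem over the observed columns of row i,
    and symmetrically for B."""
    rng = np.random.default_rng(seed)
    m, n = D.shape
    A = rng.standard_normal((m, r))
    B = rng.standard_normal((n, r))
    reg = lam * np.eye(r)
    for _ in range(iters):
        for i in range(m):
            Bi = B[M[i]]                 # factor rows of observed columns in row i
            A[i] = np.linalg.solve(Bi.T @ Bi + reg, Bi.T @ D[i, M[i]])
        for j in range(n):
            Aj = A[M[:, j]]
            B[j] = np.linalg.solve(Aj.T @ Aj + reg, Aj.T @ D[M[:, j], j])
    return A, B

# Recover a rank-2 matrix from roughly half of its entries.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
M = rng.random((30, 20)) < 0.5
A, B = mmmf_als(X * M, M, r=2)
print(np.linalg.norm(A @ B.T - X) / np.linalg.norm(X))  # relative recovery error
```

Note that each subproblem is only $r \times r$, which is why this style of solver scales to the very large, very sparse matrices typical of collaborative filtering.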
To solve the optimization in (24), either gradient-based algorithms [Rennie and Srebro 2005] or alternating minimization can be used [Wang et al. 2012; Cabral et al. 2013].

3.2. Riemannian Optimization

Another widely-used regularization strategy in low-rank matrix factorization is to constrain the search space and optimize over manifolds. Keshavan et al. [2010a] proposed to solve the following matrix completion problem:

    \min_{A, \Sigma, B} \frac{1}{2} \|\mathcal{P}_\Omega(D) - \mathcal{P}_\Omega(A \Sigma B^T)\|_F^2, \quad \text{s.t.} \quad A \in \mathrm{Gr}(r, m), \ B \in \mathrm{Gr}(r, n), \ \Sigma \in \mathbb{R}^{r \times r},    (26)

where $\mathrm{Gr}(r, n)$ denotes the set of $r$-dimensional subspaces in $\mathbb{R}^n$, which forms a Riemannian manifold named the Grassmannian. Keshavan et al. [2010a] proposed an algorithm named OptSpace to iteratively estimate the factor matrices, where $A$ and $B$ are updated by gradient descent over the Grassmannian, while $\Sigma$ is updated by least squares. Similar to the theoretical results for the convex method [Candès and Plan 2010], Keshavan et al. [2010b] provided a performance guarantee for OptSpace under an appropriate incoherence condition. Building upon the same model, Ngo and Saad [2012] proposed a scaled-gradient procedure to accelerate the convergence of the algorithm. Dai et al. [2011] proposed an algorithm named SET to solve the following two-factor model:

    \min_{A,B} \frac{1}{2} \|\mathcal{P}_\Omega(D) - \mathcal{P}_\Omega(AB^T)\|_F^2, \quad \text{s.t.} \quad A \in \mathrm{Gr}(r, m), \ B \in \mathbb{R}^{n \times r}.    (27)

SET updates $A$ over the Grassmannian and estimates $B$ by least squares. Based on the same model, Boumal and Absil [2011] developed an algorithm named RTRMC that optimizes the cost function by a second-order Riemannian trust-region method to achieve faster convergence.

Mishra et al. [2013b] proposed a framework of optimization over Riemannian quotient manifolds for low-rank matrix factorization.
They investigated thre e type s of ma- trix factorization: the full-rank factorization, the po lar factorization and the subspace- projection factorization, w hich are re lated to the mode ls in (22), (26) and ( 27), re spec- tively . T o take in to acco u nt the inv ariance ove r a class of equivalent solution s, they explore d the u nderlying qu o tient nature of the se arch spaces and de signed a class of gradient-based and trust-region algorithm s over the quotient search space s . The y con- cluded thro ugh ex periments that the three factorization models with differe nt Riemm- nanian structure s were almost equiv alent in terms o f computational complex ity and perform e d f avorably compared to the previo us m e thods such as LMaFit an d O ptSpace. More related w orks in clude [Meyer et al. 2011; Mishra e t al. 2012, 2013a; Absil e t al. 2013], etc. Instead of exploring the geometries o f search spaces of factor matrices , V andereycken [2013] and Shalit et al. [ 2012] pro posed to directly optimize a function over the set of fi x ed-rank matrices: min X f ( X ) = 1 2 kP Ω ( D ) − P Ω ( X ) k 2 2 , s .t. X ∈ M r . (28) Here M r denotes the set of rank- r matrices in R m × n , which fo rms a smooth mani- fold. V andereycken [2013] de veloped a conj ugate gradient descent algorithm named LRGeomCG to efficiently optimize any smooth function over M r , while Shalit et al. [2012] d esigned an online algo rithm to solve large-scale proble ms . Numerical expe ri- ments [V andere y cken 2013; Mishra e t al. 2013b] showed that LRGeom CG perfo rmed comparably with the quotient-space me tho ds for matrix c o mpletion. Recently , a manifo ld -optimization toolbox named Manopt has been developed [Boumal et al. 2014], prov iding a lot of re ady-to-use algorithms to solve optimization problems over various manifolds, such as Grassmannians and the fixe d-rank mani- folds. 3.3. 
Robust Matrix Factorization

Robust matrix factorization is a method for handling outliers in data, and can be regarded as the factorization approach towards RPCA. As mentioned before, the sensitivity of traditional methods to outliers is due to the squared loss used in (22), which penalizes large errors too heavily, resulting in biased fitting. To address this issue, a typical approach is to use a more robust loss function:

\[
\min_{A,B} \; \sum_{ij} \rho\!\left(D_{ij} - [A B^T]_{ij}\right), \tag{29}
\]

where [AB^T]_{ij} denotes the (i, j) entry of AB^T and ρ is a robust loss function. For example, the Geman-McClure function, defined as ρ(x) = x² / (2(1 + x²)), is adopted in [De La Torre and Black 2003]. To solve the optimization problem, alternating minimization is carried out, where A and B are updated iteratively by solving robust linear regression via iteratively reweighted least squares (IRLS). A similar idea of reweighting data based on robust estimators is used in [Aanæs et al. 2002]. Ke and Kanade [2005] adopted the ℓ1 penalty and solved the problem by alternating ℓ1 minimization. More examples of using the ℓ1-norm for robust matrix factorization include [Kwak 2008; Eriksson and Van Den Hengel 2010; Zheng et al. 2012; Shu et al. 2014]. Kwak [2008] proposed to maximize the ℓ1-norm of the projection of a data point onto the unknown principal directions instead of minimizing the residual in (29). Eriksson and Van Den Hengel [2010] generalized the Wiberg algorithm [Okatani and Deguchi 2007] to handle the ℓ1 case. Zheng et al. [2012] proposed to add a nuclear-norm regularizer to improve the convergence and solved the optimization by ALM. Shu et al. [2014] presented an efficient algorithm using group-sparsity regularization and established the equivalence between the proposed method and rank minimization.
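A minimal sketch of the IRLS approach to (29) with the Geman-McClure loss is given below; the function and variable names are ours, not taken from the cited papers, and this is only an illustration of the alternating reweighted least-squares idea, not a faithful reimplementation of any published method.

```python
import numpy as np

def robust_factorize(D, r, iters=50, eps=1e-9):
    """Alternating IRLS sketch for the robust loss in (29) with the
    Geman-McClure function rho(x) = x^2 / (2(1 + x^2)); the weights
    w = rho'(x)/x = 1/(1 + x^2)^2 downweight entries with large residuals."""
    m, n = D.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, r))
    B = rng.standard_normal((n, r))
    for _ in range(iters):
        W = 1.0 / (1.0 + (D - A @ B.T) ** 2) ** 2   # per-entry IRLS weights
        for i in range(m):                           # weighted LS for each row of A
            G = B.T * W[i]
            A[i] = np.linalg.solve(G @ B + eps * np.eye(r), G @ D[i])
        for j in range(n):                           # weighted LS for each row of B
            G = A.T * W[:, j]
            B[j] = np.linalg.solve(G @ A + eps * np.eye(r), G @ D[:, j])
    return A, B
```

Because the weights decay like 1/x⁴, entries corrupted by gross outliers contribute almost nothing to the weighted normal equations, which is exactly how the reweighting counteracts the bias of the plain squared loss.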
Many other works address the problem from a probabilistic point of view, modeling non-Gaussian noise to improve robustness. We will discuss them in Section 3.5.

3.4. Online and Parallel Algorithms

The demand for online processing of streaming data such as video has motivated the development of many incremental algorithms for subspace tracking in past decades [Bunch and Nielsen 1978; Comon and Golub 1990; Edelman et al. 1998]. In more recent works such as [Brand 2002; Balzano et al. 2010; He et al. 2012], online algorithms for incomplete or corrupted data were developed. In GROUSE [Balzano et al. 2010], for example, the following cost function is minimized over a subspace A each time a new data point d_t arrives:

\[
F^{(t)}(A) = \min_{b_t} \left\|P_{\Omega_t}(d_t - A b_t)\right\|_2^2, \tag{30}
\]

where Ω_t denotes the set of observed entries in d_t. In each iteration, b_t is first solved via least squares, and A is then updated via gradient descent over the Grassmannian. The algorithm GRASTA introduced in [He et al. 2012] extends GROUSE to handle outliers for robust estimation by replacing the squared loss in (30) with the ℓ1-norm. To solve the ℓ1-minimization problem in each iteration, GRASTA uses the ADMM framework. Shalit et al. [2012] developed an online algorithm by optimization over the low-rank manifold, which has been discussed in Section 3.2.

To improve the scalability of matrix factorization algorithms, some parallel frameworks have been proposed, which perform matrix factorization on parallel computing architectures to handle extremely large datasets. Recht and Ré [2013] proposed an algorithm named JELLYFISH for large-scale matrix completion. It minimizes the energy function in (24) via incremental gradient descent, i.e., the variables are updated following an approximate gradient constructed from a sampling of matrix entries.
Moreover, JELLYFISH adopts a block matrix-partitioning scheme with a special sampling order that allows a parallel implementation on multiple cores, achieving a speed-up nearly proportional to the number of cores. Gemulla et al. [2011] adopted similar strategies for stochastic and parallel implementation. Mackey et al. [2011] introduced a divide-and-conquer framework for matrix factorization. The framework first divides a large input matrix into smaller submatrices (e.g., by selecting columns and rows), factorizes the submatrices in parallel using existing matrix factorization algorithms, and finally combines the solutions of the subproblems using random matrix approximation techniques. They provided a theoretical analysis of the recovery probability of the parallelized algorithm compared to its batch version.

3.5. Probabilistic Matrix Factorization

In this subsection, we briefly introduce a class of methods that treat low-rank matrix factorization from a probabilistic point of view.

3.5.1. Probabilistic PCA. Probabilistic PCA (PPCA) [Tipping and Bishop 1999] is a latent variable model which formulates classical PCA [Jolliffe 2002; Hotelling 1933] in a probabilistic framework. Let d_i ∈ R^m be the i-th observed data point and b_i ∈ R^r be the i-th latent variable in the latent space. PPCA assumes that d_i is linearly related to b_i by a matrix A ∈ R^{m×r}:

\[
d_i = A b_i + e_i, \tag{31}
\]

where e_i ∈ R^m denotes random noise following the Gaussian distribution N(0, β^{-1} I), with β denoting the precision. Then, the likelihood of model (31) can be written as

\[
\Pr(d_i \mid A, b_i, \beta) = \mathcal{N}(d_i \mid A b_i, \beta^{-1} I). \tag{32}
\]

To obtain the marginalized likelihood of A, we need to integrate out b_i, and a prior distribution of b_i has to be specified. PPCA adopts a zero-mean, unit-covariance Gaussian distribution as the prior.
After some derivation, the marginal likelihood reads

\[
\Pr(d_i \mid A, \beta) = \int \Pr(d_i \mid A, b_i, \beta) \Pr(b_i) \, db_i = \mathcal{N}(d_i \mid 0, A A^T + \beta^{-1} I). \tag{33}
\]

By further assuming the independence of data points, the likelihood of the full data is given by

\[
\Pr(D \mid A, \beta) = \prod_{i=1}^{n} \Pr(d_i \mid A, \beta). \tag{34}
\]

Finally, the matrix A can be estimated by maximizing the above likelihood function. The maximum likelihood estimate (MLE) can be obtained analytically as

\[
\hat{A} = U_r (\Sigma_r - \beta^{-1} I)^{1/2} Q, \tag{35}
\]

where the columns of U_r ∈ R^{m×r} are the r eigenvectors of the sample covariance matrix \frac{1}{n} D D^T corresponding to the r largest eigenvalues λ_1, ..., λ_r, Σ_r is a diagonal matrix with diagonal elements λ_1, ..., λ_r, and Q is an arbitrary r × r orthogonal matrix. As a result, the column space of Â is identical to the subspace spanned by the principal components derived from classical PCA. The MLE of the noise variance is given by

\[
\beta^{-1} = \frac{1}{m - r} \sum_{i=r+1}^{m} \lambda_i, \tag{36}
\]

which can be interpreted as the average variance in the discarded dimensions.

An advantage of the probabilistic formulation of PCA is that it allows r to be chosen automatically via a Bayesian approach [Bishop 1999]. Consider an independent Gaussian prior over each column of A:

\[
\Pr(A \mid \gamma) = \prod_{i=1}^{r} \left(\frac{\gamma_i}{2\pi}\right)^{m/2} \exp\!\left(-\frac{1}{2} \gamma_i \, a_i^T a_i\right), \tag{37}
\]

where a_i ∈ R^m is the i-th column of A. Notice that the hyperparameter γ_i controls the inverse variance of a_i. When γ_i converges to a large value during inference, the variance of a_i is small, and consequently the dimension spanned by a_i is automatically switched off. An empirical Bayesian approach can be used to find γ, which is also known as automatic relevance determination (ARD) in the machine learning literature [MacKay 1995; Bishop et al. 2006].
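The closed-form estimates (35)-(36) can be sketched as follows, assuming zero-mean data points stored in the columns of D and taking Q = I; the function name is ours:

```python
import numpy as np

def ppca_mle(D, r):
    """Closed-form PPCA MLE of (35)-(36): eigendecompose the sample
    covariance, keep the top-r eigenpairs for A-hat, and average the
    discarded eigenvalues to estimate the noise variance beta^{-1}."""
    m, n = D.shape
    S = (D @ D.T) / n                        # sample covariance of the columns
    lam, U = np.linalg.eigh(S)               # eigh returns ascending order
    lam, U = lam[::-1], U[:, ::-1]           # sort descending
    noise_var = lam[r:].mean()               # MLE of beta^{-1}, eq. (36)
    A_hat = U[:, :r] * np.sqrt(np.maximum(lam[:r] - noise_var, 0.0))
    return A_hat, noise_var
```

Note that the column space of the returned matrix coincides with the classical PCA subspace, as stated above; only the column scaling differs from plain PCA.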
The well-known sparse learning model, the relevance vector machine (RVM) [Tipping 2001], adopts a similar technique to optimize the hyperparameters. The estimation of the latent variables b_i can be treated similarly by integrating out A, which is known as dual PPCA [Lawrence 2005].

3.5.2. Probabilistic Matrix Completion and RPCA. Probabilistic matrix factorization methods generally consider the following model to describe the observed data:

\[
D = A B^T + E. \tag{38}
\]

The probabilistic framework naturally handles missing values in matrix completion by only considering the likelihood over the observed entries:

\[
\Pr(D \mid \Theta) = \prod_{ij \in \Omega} \Pr(D_{ij} \mid \Theta), \tag{39}
\]

where Θ is the set of parameters and Ω is the set of observed entries. A representative work is Probabilistic Matrix Factorization (PMF) [Salakhutdinov and Mnih 2008b]. It assumes the following priors:

\[
A_{ij} \sim \mathcal{N}(0, \gamma_a^{-1}), \quad B_{ij} \sim \mathcal{N}(0, \gamma_b^{-1}), \quad E_{ij} \sim \mathcal{N}(0, \beta^{-1}), \tag{40}
\]

where γ_a, γ_b and β are hyperparameters. Treating the hyperparameters as fixed values, we have the following posterior probability of A and B:

\[
\Pr(A, B \mid D, \gamma_a, \gamma_b, \beta) \propto \Pr(D \mid A, B, \beta) \Pr(A \mid \gamma_a) \Pr(B \mid \gamma_b) = \prod_{ij \in \Omega} \mathcal{N}(D_{ij} \mid [A B^T]_{ij}, \beta^{-1}) \prod_{ij} \mathcal{N}(A_{ij} \mid 0, \gamma_a^{-1}) \prod_{ij} \mathcal{N}(B_{ij} \mid 0, \gamma_b^{-1}). \tag{41}
\]

After a simple derivation, the maximum a posteriori (MAP) estimate of model (41) turns out to be the solution of (24) with λ = γ_a/β = γ_b/β. This gives a probabilistic interpretation to MMMF: the regularization in MMMF corresponds to imposing Gaussian priors on A and B. The advantage of probabilistic modeling is that the regularization parameters γ_a, γ_b and β need not be predefined; they can be determined automatically by treating them as variables, introducing priors on them, and estimating them from the data [Salakhutdinov and Mnih 2008b].
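The correspondence between the MAP estimate of (41) and the regularized objective in (24) can be made concrete with a small gradient-descent sketch (the step size, iteration count and names are illustrative choices of ours):

```python
import numpy as np

def pmf_map(D, mask, r, lam=0.01, lr=0.05, iters=3000, seed=0):
    """Gradient descent on the negative log-posterior of PMF (41):
    squared loss on the observed entries plus Gaussian (ridge) priors
    on the factors, i.e. the regularized objective in (24)."""
    m, n = D.shape
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((m, r))
    B = 0.1 * rng.standard_normal((n, r))
    for _ in range(iters):
        R = mask * (A @ B.T - D)             # residual on observed entries only
        A_new = A - lr * (R @ B + lam * A)   # gradient of loss + lam*||A||_F^2
        B = B - lr * (R.T @ A + lam * B)
        A = A_new
    return A, B
```

Setting lam = γ/β makes the two views coincide; the probabilistic view simply reinterprets the ridge penalties as Gaussian priors on the factor entries.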
Later, Salakhutdinov and Mnih [2008a] proposed a fully Bayesian method for PMF and solved the model by Markov chain Monte Carlo (MCMC) sampling.

For RPCA, the prior on E needs to be changed. For instance, Wang et al. [2012] proposed Probabilistic Robust Matrix Factorization (PRMF), which uses a Laplacian prior to model the error:

\[
\Pr(E_{ij} \mid \beta) = \frac{\beta}{2} \exp(-\beta |E_{ij}|). \tag{42}
\]

Compared to the Gaussian prior, the Laplacian prior encourages E to be sparse while allowing E_{ij} to have a large magnitude. The MAP estimate is given by

\[
\min_{A,B} \; \|D - A B^T\|_1 + \lambda \|A\|_F^2 + \lambda \|B\|_F^2. \tag{43}
\]

Notice that the Laplacian prior on the error term induces the ℓ1 penalty on the residual. One can see the connection between PRMF and PCP by (25).

Babacan et al. [2012] proposed a Bayesian method to solve RPCA. It assumes the model D = AB^T + E + Z, where E is a sparse matrix used to model outliers and Z is a dense matrix used to model noise. Supposing Z_{ij} is i.i.d. Gaussian noise following N(0, β^{-1}), the conditional probability of the observation is given by

\[
\Pr(D \mid A, B, E, \beta) = \mathcal{N}(D \mid A B^T + E, \beta^{-1}) \propto \exp\!\left(-\frac{\beta}{2} \left\|D - A B^T - E\right\|_F^2\right). \tag{44}
\]

Moreover, the following priors are assumed:

\[
\Pr(E \mid \alpha) = \prod_{ij} \mathcal{N}(E_{ij} \mid 0, \alpha_{ij}^{-1}), \quad \Pr(A \mid \gamma) = \prod_{i} \mathcal{N}(a_i \mid 0, \gamma_i^{-1} I), \quad \Pr(B \mid \gamma) = \prod_{i} \mathcal{N}(b_i \mid 0, \gamma_i^{-1} I). \tag{45}
\]

Instead of using fixed values, the hyperparameters α_{ij} and γ_i are further modeled using Gamma priors. Finally, a variational algorithm is used to estimate A and B as well as the hyperparameters. Other probabilistic methods for robust matrix factorization include [Lakshminarayanan et al. 2011; Ding et al. 2011; Wang and Yeung 2013; Meng and De la Torre 2013].

In probabilistic matrix factorization, the number of columns of A and B does not necessarily determine the rank of AB^T; it only serves as an upper bound.
During inference, the true rank is determined automatically [Babacan et al. 2012]. The hierarchical modeling with hyperparameters plays an important role in this automatic rank determination: some γ_i in (45) converge to extremely large values during inference, driving the corresponding columns close to zero. This data-driven "switch-off" of columns determines the final rank of AB^T.

3.6. Projection-Based Methods

Another category of methods estimates a low-rank matrix under an explicit rank constraint. To keep the constraint satisfied during optimization, these methods often use a greedy strategy that projects intermediate results onto the feasible set of the rank constraint. While conceptually these methods use an explicit rank constraint, numerically they implement the low-rank projection step using factorization methods.

In [Jain et al. 2010], the following problem is considered:

\[
\min_{X} \; f(X) = \frac{1}{2}\left\|\mathcal{A}(X) - b\right\|_2^2, \quad \text{s.t. } \mathrm{rank}(X) \le r, \tag{46}
\]

where \mathcal{A} is a linear operator on X. Notice that the matrix completion problem is a special case of (46). An algorithm named Singular Value Projection (SVP) was proposed in [Jain et al. 2010] to solve (46). It uses a projected gradient descent scheme that alternates between updating X via gradient descent and projecting the intermediate result onto the set of rank-r matrices. By the matrix approximation theorem in (3), the projection is done by computing the SVD of the matrix and keeping the r largest singular values. This procedure resembles the proximal gradient method for nuclear norm minimization with SVT replaced by the low-rank projection: soft thresholding of singular values is applied in SVT, while hard thresholding of singular values is used in SVP.
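The SVP iteration just described can be sketched for the matrix completion special case of (46) as follows (unit step size, so each gradient step re-imposes the observed entries; the names are ours):

```python
import numpy as np

def svp_complete(D, mask, r, iters=500, lr=1.0):
    """Singular Value Projection sketch for completion: gradient step on
    0.5 * ||P_Omega(X - D)||_F^2, then hard-threshold the singular values,
    keeping only the r largest (the rank-r projection of eq. (3))."""
    X = np.zeros_like(D)
    for _ in range(iters):
        X = X - lr * mask * (X - D)            # gradient step on observed entries
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :r] * s[:r]) @ Vt[:r]        # hard thresholding / rank-r projection
    return X
```

Replacing the hard truncation of s by soft shrinkage max(s − t, 0) would turn this loop into the SVT-style proximal iteration, which makes the soft-versus-hard contrast in the text concrete.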
Therefore, SVP is analogous to the iterative hard thresholding algorithm in sparse coding [Blumensath and Davies 2009]. The ADMiRA algorithm introduced in [Lee and Bresler 2010] also solves the problem in (46). Instead of using hard thresholding as in SVP, ADMiRA extends the CoSaMP algorithm [Needell and Tropp 2009] in compressive sensing to the matrix case. A matching pursuit-like scheme [Mallat and Zhang 1993] is used to stepwise select the basis vectors that reconstruct the column space of X, which minimizes the function in (46). The SpaRCS algorithm proposed in [Waters et al. 2011] can be regarded as a counterpart of ADMiRA for the robust matrix factorization problem.

The GoDec algorithm proposed in [Zhou and Tao 2011] uses iterative hard thresholding to solve the following nonconvex formulation of RPCA:

\[
\min_{X,E} \; \|D - X - E\|_F^2, \quad \text{s.t. } \mathrm{rank}(X) \le r, \; \|E\|_0 \le k. \tag{47}
\]

To minimize (47), GoDec alternates between the low-rank projection to estimate X and hard thresholding to estimate E. To avoid the computation of the SVD, GoDec uses a bilateral random projection scheme to compute the low-rank projection.

4. NUMERICAL COMPARISON OF ALGORITHMS

A huge number of solvers have been developed for low-rank matrix recovery in the past decade. Table I gives an inexhaustive list of them. In this section, we numerically illustrate their characteristics. As this paper does not aim at a comprehensive comparison, we only test a few solvers on synthesized datasets. Note that the performance of an algorithm often depends on many factors, such as problem size, rank of the underlying low-rank matrix, distribution of its singular values, density of missing entries or outliers, noise level, and even the shape of the matrix. The results presented here only aim to provide a brief demonstration under a few typical settings, which are far from complete.
For more detailed comparisons, we refer the readers to the experiment sections of the algorithm papers such as [Keshavan et al. 2009; Okatani et al. 2011; Mishra et al. 2013b] and the report [Michenková 2011]. The code to produce the results presented in this paper is publicly available at https://sites.google.com/site/lowrankmodeling/. We welcome readers to modify the code and test the algorithms on more applications.

Table I. Representative low-rank matrix recovery algorithms. This list is inexhaustive; it only includes several representative algorithms in each category. (MC = matrix completion.)

| Category | Algorithm & reference | Problem | Main techniques |
|---|---|---|---|
| Rank minimization | SVT [Cai et al. 2010] | MC | Proximal gradient (PG) |
| | FPCA [Ma et al. 2011] | MC | PG, approximate SVD |
| | SOFT-IMPUTE [Mazumder et al. 2010] | MC | PG, warm-start |
| | APG [Ji and Ye 2009; Toh and Yun 2010] | MC | Accelerated PG |
| | PCP [Candès et al. 2011] | RPCA | Augmented Lagrangian |
| | SPCP [Zhou et al. 2010] | RPCA | Accelerated PG |
| | ALM [Lin et al. 2010] | Both | Augmented Lagrangian |
| Matrix factorization | MMMF [Rennie and Srebro 2005] | MC | Gradient descent |
| | PMF [Salakhutdinov and Mnih 2008b] | MC | Gradient descent |
| | LMaFit [Wen et al. 2012] | MC | Alternating |
| | OptSpace [Keshavan et al. 2010a] | MC | Grassmannian |
| | SET [Dai et al. 2011] | MC | Grassmannian |
| | LRGeomCG [Vandereycken 2013] | MC | Riemannian |
| | GROUSE [Balzano et al. 2010] | MC | Online algorithm |
| | JELLYFISH [Recht and Ré 2013] | MC | Stochastic & parallel |
| | ADMiRA [Lee and Bresler 2010] | MC | Matching pursuit |
| | SVP [Jain et al. 2010] | MC | Hard thresholding |
| | GRASTA [He et al. 2012] | RPCA | Online algorithm |
| | GoDec [Zhou and Tao 2011] | RPCA | Hard thresholding |
| | PRMF [Wang et al. 2012] | RPCA | EM algorithm |
| | VB [Babacan et al. 2012] | Both | Variational Bayes |
We evaluate the accuracy of an algorithm by the relative distance, defined as

\[
\text{relative distance} = \frac{\|\hat{X} - X^*\|_F}{\|X^*\|_F}, \tag{48}
\]

where X̂ and X* represent the algorithm estimate and the true low-rank matrix, respectively. As the results depend on stopping conditions and different algorithms adopt different stopping criteria, we do not compare the algorithms merely in terms of the relative error of the final estimates. Instead, we plot the relative error of each algorithm as a function of time, to see how quickly and how closely the algorithm estimate approaches the ground truth as the algorithm runs. Note that the curves do not aim to show the convergence rates of the algorithms, since the relative distance here is calculated against the ground truth instead of the stationary point of the cost function. Each curve is averaged over 5 randomly generated instances with the same problem setting.

The datasets are synthesized in the following way. The low-rank component is an m × m matrix generated by X* = AB^T, where both A and B are m × r random matrices with r ≪ m, and A_{ij} and B_{ij} are independently sampled from the normal distribution N(0, 1). Then, X* is normalized so that ||X*||_F^2 = m^2. For matrix completion, the locations of missing values are defined by a binary matrix with entries independently sampled from the Binomial distribution B(1, ρ). For RPCA, the locations of outlier entries are simulated in the same way, with their values independently sampled from the uniform distribution U(−10, 10).

For matrix completion, we have tested two convex solvers: the ALM algorithm [Lin et al. 2010] solving model (6) for noiseless cases, and the APG algorithm [Toh and Yun 2010] solving model (7) for noisy cases. Also, we have tested the following matrix factorization methods: LMaFit [Wen et al.
2012], OptSpace [Keshavan et al. 2010a], GROUSE [Balzano et al. 2010], LRGeomCG [Vandereycken 2013] and VB [Babacan et al. 2012]. For RPCA, we have tested two convex models: PCP [Candès et al. 2011] in (8) for noiseless cases, and SPCP [Zhou et al. 2010] in (9) for noisy cases. In addition, we have tested GoDec [Zhou and Tao 2011], PRMF [Wang et al. 2012] and VB [Babacan et al. 2012] for comparison.

The convex programs were implemented by the authors in MATLAB. For ALM, we implemented the inexact version and adopted the varying penalty parameter scheme provided by Boyd [2010]. For APG, we integrated the adaptive restart technique introduced in [O'Donoghue and Candes 2012], which practically improves the convergence of APG. In the original works, partial SVD (computing only the first several singular values) is often used to accelerate the computation for large-scale problems [Toh and Yun 2010; Lin et al. 2010]. In our implementation, we simply used the built-in SVD function of MATLAB, since we observed that its efficiency was comparable to or even better than a partial SVD solver (e.g., PROPACK) when m ≤ 1000. SPCP was solved by APG in the original paper [Zhou et al. 2010]. Here, we implemented a block coordinate descent (BCD) algorithm to solve SPCP, i.e., we alternately updated X by SVT and E by soft thresholding until convergence. In our experience, the efficiency of BCD is at least comparable to APG for the model in (9), while it is much simpler to implement. For the other algorithms, we used the MATLAB packages downloaded from the authors' websites. We followed the default parameter settings in the original papers and tuned some of them to get reasonably good results for specific problems. Some algorithms require the rank, the noise level or the number of outlier entries as input.
To simplify the comparison, we provided their true values to help parameter tuning. We set the initial guess to be the rank-r approximation of the input matrix computed by SVD for all algorithms.

4.1. Matrix Completion

In the matrix completion literature, the over-sampling ratio (OS) is widely used to quantify the difficulty of a problem. For an m × n matrix with rank r, the OS is defined as

\[
\mathrm{OS} = \frac{|\Omega|}{(m + n - r)\, r}, \tag{49}
\]

where |Ω| denotes the number of observed entries and (m + n − r) r is the underlying degree of freedom of a rank-r matrix. OS ≥ 1 is required to recover the matrix, and the smaller the OS, the more difficult the recovery.

Figure 1(a) shows a basic case, where a sufficient number of entries are observed (OS = 6) and no noise exists. The values of all curves decrease below 10^{-6}, which indicates that all of the algorithms recover the underlying matrix with high precision. The convex algorithm ALM is comparatively slower, since it needs to perform an SVD computation in each iteration. Notice that the convex model in (6) is parameter-free, yet it recovers the low-rank matrix accurately. The matrix factorization methods are generally accurate and fast. The manifold-based algorithm LRGeomCG is the fastest, followed by LMaFit. The online algorithm GROUSE also converges to an accurate solution after passing over the data multiple times. The probabilistic method (VB) shows competitive performance while requiring no parameter tuning.

The problem setting in Figure 1(b) is similar to that in Figure 1(a) except that the rank is increased from 20 to 50. All curves in Figure 1(b) have shapes similar to those in Figure 1(a). The recovery accuracy of all algorithms remains high, as OS is unchanged.
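For concreteness, the data-synthesis protocol, the metric in (48), and the over-sampling ratio (49) can be sketched as follows (a small-scale stand-in for the 1000 × 1000 experiments; function names are ours):

```python
import numpy as np

def make_problem(m=100, r=5, rho=0.2, seed=0):
    """Synthesize X* = AB^T with i.i.d. N(0,1) factors, normalized so
    ||X*||_F^2 = m^2; `hidden` marks entries drawn i.i.d. Bernoulli(rho)
    (missing locations for completion, outlier locations for RPCA)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, r)) @ rng.standard_normal((r, m))
    X *= m / np.linalg.norm(X, 'fro')
    hidden = rng.random((m, m)) < rho
    return X, hidden

def relative_distance(X_hat, X_star):
    """Accuracy metric of eq. (48)."""
    return np.linalg.norm(X_hat - X_star, 'fro') / np.linalg.norm(X_star, 'fro')

def over_sampling_ratio(m, n, r, n_observed):
    """OS of eq. (49): observed entries over the (m + n - r) r degrees of freedom."""
    return n_observed / ((m + n - r) * r)
```

For instance, in the setting of Figure 1(a) (m = n = 1000, r = 20), OS = 6 corresponds to |Ω| = 6 × 1980 × 20 = 237,600 observed entries, i.e. roughly 24% of the matrix.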
However, the curves of the factorization-based methods more or less shift to the right, indicating an increase in computational time. This is attributed to the fact that the dimensions of the variables in factorization-based methods directly depend on the predefined rank.

[Figure 1 here: four log-log panels of relative distance versus time for ALM/APG, LMaFit, OptSpace, LRGeomCG, VB and GROUSE, with settings (a) OS = 6, r = 20, σ = 0; (b) OS = 6, r = 50, σ = 0; (c) OS = 6, r = 50, σ = 0.1; (d) OS = 3, r = 50, σ = 0.]

Fig. 1. Comparison of algorithms for matrix completion. Each curve represents the relative distance between an algorithm estimate and the true low-rank matrix as a function of time on a log10/log10 scale. The problem size is fixed as 1000 × 1000. OS, r and σ denote the over-sampling ratio, the true rank and the noise level, respectively.

Figure 1(c) shows a case where Gaussian noise with σ = 0.1 is added. The curves of all methods converge to values larger than zero due to the presence of random noise. All of the factorization-based methods achieve similar accuracy, while the relative error of the convex algorithm APG is higher. This is because convex relaxation, while guaranteeing optimality in optimization, may introduce bias into the model. Specifically, SVT is used to solve the convex model (7), which shrinks the singular values of the recovered matrix while removing noise components. Consequently, the values of the recovered matrix shrink towards zero, especially when the noise level is large.
To compensate for the bias, some postprocessing techniques could be used, which have proven effective [Mazumder et al. 2010].

Figure 1(d) shows a case where the sampling rate is decreased to OS = 3. Compared to Figure 1(b) with OS = 6, the curves shift to the right, indicating an increase in computational time; the convergence rates of the algorithms are thus influenced by the over-sampling ratio. Nevertheless, all of the methods still obtain accurate recovery at this relatively low over-sampling ratio.

[Figure 2 here: four log-log panels of relative distance versus time for PCP/SPCP, PRMF, GoDec and VB, with settings (a) ρ = 0.1, r = 20, σ = 0; (b) ρ = 0.1, r = 50, σ = 0; (c) ρ = 0.1, r = 50, σ = 0.1; (d) ρ = 0.3, r = 50, σ = 0.]

Fig. 2. Comparison of algorithms for RPCA. Each curve represents the relative distance between an algorithm estimate and the true low-rank matrix as a function of time on a log10/log10 scale. The problem size is fixed to be 1000 × 1000. ρ, r and σ denote the proportion of outlier entries, the true rank and the noise level, respectively.

4.2. RPCA

The problem settings for RPCA are similar to those for matrix completion. The proportion of outliers is defined as ρ = |Ω|/(mn); a larger ρ indicates more outliers, which makes the recovery more difficult. The results are shown in Figure 2. The convex program PCP achieves very high accuracy in noiseless cases without knowing the true rank.
For the noisy case, its stable version SPCP is tested; it does not obtain the best accuracy due to the shrinkage effect of nuclear norm minimization mentioned in the previous subsection. The probabilistic method PRMF achieves performance similar to PCP. Another probabilistic method, VB, using variational Bayesian inference, achieves the best overall performance in terms of both speed and accuracy, and it requires no parameter tuning. The projection-based method GoDec performs very well in the first two cases, but its performance drops in the more difficult cases of Figure 2(c) and (d), where the curves flatten at high values.

5. APPLICATIONS IN IMAGE ANALYSIS

Many objects of interest in image analysis can be modeled as low-rank matrices, such as the images of a convex Lambertian surface under various illuminations [Basri and Jacobs 2003], dynamic textures changing periodically [Doretto et al. 2003], active contours with similar shapes [Blake and Isard 2000], and multiple feature tracks on a rigid moving object [Vidal and Hartley 2004]. Intuitively, the low-dimensional subspace models the common patterns underlying the data. Hence, recovering the low-rank structure is critical to many applications such as background subtraction, face recognition, and segmentation.

Fig. 3. Using RPCA to remove shadows and specularities in face images. The rows from top to bottom correspond to the original images, the low-rank components and the sparse components, respectively. We applied the PCP algorithm [Candès et al. 2011] on a set of 64 face images of a person and selected 4 images to show the results. Note that the displayed intensities in each image have been scaled to [0, 255]. Face images courtesy of the Extended Yale Face Database B [Georghiades et al. 2001; Lee et al. 2005].
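Several of the applications in this section apply the PCP model (8). A compact sketch of its augmented-Lagrangian solution follows; the λ and μ heuristics and all names are illustrative choices (λ = 1/√max(m, n) is the weight suggested in the PCP paper), not the authors' exact implementation.

```python
import numpy as np

def pcp_admm(D, lam=None, mu=None, iters=300):
    """Augmented-Lagrangian sketch for PCP: alternate singular-value
    soft-thresholding (SVT) for the low-rank part X and entrywise
    soft-thresholding for the sparse part E, then update the dual Y."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # weight suggested in the PCP paper
    if mu is None:
        mu = 0.25 * m * n / np.abs(D).sum()   # common penalty heuristic (assumption)
    X, E, Y = np.zeros_like(D), np.zeros_like(D), np.zeros_like(D)
    for _ in range(iters):
        # SVT step: soft-threshold the singular values at 1/mu
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        X = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # entrywise soft-thresholding at lam/mu for the sparse part
        R = D - X + Y / mu
        E = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        Y = Y + mu * (D - X - E)              # dual (Lagrange multiplier) update
    return X, E
```

On the face and background examples below, the columns of D would be vectorized images; X collects the shared low-rank structure and E the sparse deviations (shadows, specularities, foreground pixels).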
Below, we introduce some typical applications based on the models discussed in the previous sections.

5.1. Face Recognition

The concept of low dimensionality has been used in face recognition for decades, since the work of Sirovich and Kirby [1987]. PCA was applied to a set of face images to construct a face space, so that each face image can be characterized by a low-dimensional vector [Sirovich and Kirby 1987; Kirby and Sirovich 1990]. Later, Turk and Pentland [1991] introduced the "eigenface" method for face recognition. The basic steps of using eigenfaces for face recognition are: (1) generate N eigenfaces by computing the first N eigenvectors of the matrix composed of a set of training images; (2) calculate the weight vector of an input image by projecting the image onto the space spanned by the N eigenfaces; and (3) determine whether the input image is a face image and, if so, which person it belongs to, according to the projection error and the weight vector. This is the earliest example of using low-rank modeling for face recognition.

Face images in real datasets are usually corrupted by various artifacts such as shadows, specularities and occlusions, which cannot be handled by classical PCA. Therefore, many approaches based on RPCA were proposed to process face images [De La Torre and Black 2003; Candès et al. 2011; Chen et al. 2012]. As illustrated in Figure 3, the local defects in face images can be removed as the sparse component, while a correct description of the person's face is obtained from the low-rank component. This procedure can improve the characterization of faces and boost the performance of recognition algorithms [Chen et al. 2012].

Fig. 4. Using RPCA for background subtraction.
The rows from top to bottom correspond to the input images, the low-rank components (background) and the sparse components (foreground), respectively. We applied the PCP algorithm [Candès et al. 2011] on the subway station dataset from [Li et al. 2004]. The dataset (used with permission of Liyuan Li) is a surveillance video of a subway station including moving escalators in the background. We selected 200 frames to perform background subtraction and 4 frames to show the results in the figure.

5.2. Background Subtraction

Background subtraction involves modeling the background in a video and detecting the objects that stand out from the background. Similar to eigenfaces, PCA has been applied to model the background since the work on "eigenbackground subtraction" [Oliver et al. 2000]. The basic idea is that the underlying background images of a video captured by a static camera should be unchanged except for illumination variation. Therefore, the matrix composed of vectorized background images can be naturally modeled as a low-rank matrix. However, traditional methods require a set of training images without foreground objects to generate a clean background model. To estimate a background model in the presence of foreground objects, RPCA is desired [De La Torre and Black 2003]. As illustrated in [Candès et al. 2011], the PCP algorithm can recover the background images in the low-rank component and identify the foreground objects in the sparse component. Figure 4 gives an illustration. Notice that the background includes three moving escalators, which are clearly reconstructed in the low-rank component. This shows the appealing capability of low-rank modeling for background subtraction.
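The PCA step shared by eigenfaces (Section 5.1) and eigenbackground subtraction can be sketched as follows; the columns of `data` are vectorized, same-size images, and the function names are ours:

```python
import numpy as np

def pca_basis(data, N):
    """Return the mean image and the first N principal directions of a
    set of vectorized images stored as the columns of `data`."""
    mean = data.mean(axis=1)
    U, s, _ = np.linalg.svd(data - mean[:, None], full_matrices=False)
    return mean, U[:, :N]

def project(x, mean, basis):
    """Weight vector of an image in the learned space, and the residual
    norm used to decide whether the image fits the model."""
    w = basis.T @ (x - mean)
    resid = np.linalg.norm((x - mean) - basis @ w)
    return w, resid
```

A large residual from `project` flags an image that does not fit the learned subspace, which is the detection criterion in both applications (a non-face image, or a frame containing foreground objects). The RPCA-based methods discussed in the text remove the need for clean, foreground-free training data that this classical pipeline assumes.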
To achieve better accuracy for object detection, the spatial contiguity of foreground pixels can be modeled and integrated into RPCA by using Markov random fields or other smoothing techniques [Zhou et al. 2013b; Gao et al. 2012; Wang and Yeung 2013]. Similarly, the RPCA framework can be used to segment the point trajectories in a video into two groups, which correspond to background and foreground, respectively [Cui et al. 2012]. The segmentation is based on the fact that the background motion caused by camera motion should lie in a low-dimensional subspace.

5.3. Clustering and Classification

Low-Rank Representation (LRR) is a well-known method for subspace clustering. In subspace clustering, the data points are assumed to be embedded in several low-dimensional subspaces, and the task is to find these subspaces and the membership of each data point in them. A popular method is spectral clustering, where the clustering is achieved by partitioning a graph whose edge weights represent the affinities between pairs of data points. In LRR, each data point is represented by a linear combination of its neighbors within the same subspace, and the coefficient matrix X is estimated by

\min_{X, E} \|X\|_* + \lambda \|E\|_{2,1}, \quad \text{s.t. } D = DX + E,   (50)

where \|E\|_{2,1} = \sum_j \sqrt{\sum_i E_{ij}^2} encourages column-wise sparsity on the outlier term E. It can be shown that, if the data points in D are drawn from several orthogonal subspaces, the X derived from (50) will be block-diagonal [Liu et al. 2010]. Intrinsically, X identifies the affinity between data points, and its block-diagonal structure indicates clusters in the data. Thus, X provides a favorable affinity matrix for spectral clustering. A similar idea has also been applied to image segmentation [Cheng et al. 2011].

Another application of low-rank representation is dictionary learning for image classification.
In dictionary learning, the primary task is to construct a dictionary Φ = [φ_1, ⋯, φ_n] such that the input signals can be represented by sparse linear combinations of dictionary atoms. That is, D = ΦX + E, where X should be sparse. In [Zhang et al. 2012b, 2013b,c], it has been argued that X should also be low-rank in order to learn a discriminative dictionary. The intuition is that, if the constructed dictionary Φ is discriminative, the signals in D with the same label should be represented by the same set of atoms in Φ. Consequently, the coefficient matrix X should be a block-diagonal matrix if the columns are ordered by class labels. To impose such a structural constraint, the sparsity and the rank of X are minimized simultaneously.

5.4. Image Alignment and Rectification

Image alignment refers to the problem of transforming different images into the same coordinate system. Peng et al. [2012] proposed to solve the problem by rank minimization, based on the assumption that a batch of aligned images should form a low-rank matrix. The transformation parameters τ were estimated by solving

\min_{\tau, X, E} \|X\|_* + \lambda \|E\|_1, \quad \text{s.t. } X + E = D \circ \tau,   (51)

where each column of D corresponds to an image to be aligned and D ∘ τ denotes the images after transformation. The sparse component E models local differences among images. Similarly, Zhang et al. [2012c] used the model in (51) to generate transform-invariant low-rank textures (TILT). The difference from [Peng et al. 2012] is that D in TILT represents a single image instead of an image sequence. The assumption of TILT is that the rectified images of textures such as characters, bar codes and urban scenes are usually symmetric patterns and consequently form low-rank matrices. The reconstructed low-rank texture can be further used in many applications such as camera calibration, 3D reconstruction and character recognition.
5.5. Structure and Motion

Low-rank matrix factorization has been widely used to analyze the tracks of feature points in a video since the seminal work by Tomasi and Kanade [1992]. The key observation is that a measurement matrix composed of feature tracks will be rank-limited, where the rank depends on the type of camera model (e.g., affine or perspective) and the complexity of object motion (e.g., rigid or nonrigid). For example, under the weak-perspective model, the 2D image coordinates x ∈ R^2 and the 3D position X ∈ R^3 of a feature point are related by

x = M \begin{bmatrix} X \\ 1 \end{bmatrix},   (52)

where M ∈ R^{2×4} is an affine motion matrix. If a set of feature points on a rigid object is tracked across many frames, we have

\underbrace{\begin{bmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{bmatrix}}_{P \in \mathbb{R}^{2m \times n}} = \underbrace{\begin{bmatrix} M_1 \\ \vdots \\ M_m \end{bmatrix}}_{M \in \mathbb{R}^{2m \times 4}} \underbrace{\begin{bmatrix} X_1 & \cdots & X_n \\ 1 & \cdots & 1 \end{bmatrix}}_{S \in \mathbb{R}^{4 \times n}},   (53)

where x_{ij} denotes the 2D image coordinates of point j in frame i, M_i is the affine motion matrix for frame i, and X_j is the 3D coordinates of point j. P, M and S are often called the measurement matrix, motion matrix and structure matrix, respectively [Tomasi and Kanade 1992]. Since the smallest dimension of M is 4, we have rank(P) ≤ 4. In addition, it is possible to recover the structure and motion matrices from the low-rank factorization of the measurement matrix in (53), which solves the problem of structure from motion (SFM) [Tomasi and Kanade 1992]. For nonrigid SFM, the object shape changes from frame to frame, and (53) is only valid for each frame separately with m = 1, which is an underdetermined system. To make the problem well posed, prior knowledge on the shapes is required.
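The rank-4 structure of (53) can be checked, and the factorization computed, with a truncated SVD. Below is a small synthetic sketch; note that the recovered factors match the true motion and structure only up to an invertible 4×4 ambiguity, which Tomasi and Kanade resolve with additional metric constraints.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 30                          # frames, feature points

# Ground-truth structure S (4 x n): 3D points in homogeneous coordinates.
X = rng.standard_normal((3, n))
S = np.vstack([X, np.ones((1, n))])

# Ground-truth motion M (2m x 4): one 2x4 affine camera per frame.
M = rng.standard_normal((2 * m, 4))

# Measurement matrix P (2m x n) stacks the projected 2D tracks.
P = M @ S
assert np.linalg.matrix_rank(P) == 4   # rank(P) <= 4, as in (53)

# Rank-4 factorization via truncated SVD: P = M_hat @ S_hat.
U, s, Vt = np.linalg.svd(P, full_matrices=False)
M_hat = U[:, :4] * s[:4]
S_hat = Vt[:4, :]

# The factorization reproduces P; rel_err is numerically zero here.
rel_err = np.linalg.norm(M_hat @ S_hat - P) / np.linalg.norm(P)
```

Any invertible Q gives an equally valid pair (M_hat Q, Q^{-1} S_hat), which is why the metric upgrade step is needed in practice.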
The low-rank prior is widely adopted by many state-of-the-art methods for nonrigid SFM; it assumes that the shapes to be reconstructed are linear combinations of a limited number of basis shapes. Exemplar works include [Bregler et al. 2000; Xiao et al. 2006; Angst et al. 2011; Dai et al. 2012].

Given that the measurement matrix itself is low-rank, the low-rank constraint can also be used to help track feature points and find correspondences across video frames [Torresani and Bregler 2002; Irani 2002; Garg et al. 2013]. Another related application is motion segmentation [Vidal and Ma 2004; Rao et al. 2010; Vidal 2011], where the feature tracks come from multiple moving objects instead of a single rigid object. The task is to segment the feature tracks into different groups such that each group of tracks belongs to a single moving object. As discussed above, each group of tracks should form a rank-4 subspace. Therefore, the motion segmentation problem can be formulated as subspace clustering, i.e., dividing the feature tracks into multiple clusters with each cluster forming a low-dimensional subspace. For a more detailed introduction to subspace clustering, please refer to [Vidal 2011].

5.6. Restoration and Denoising

A popular application of matrix completion is image restoration. In many scenarios, it is desired to reconstruct the lost or corrupted parts of an image, which might be caused, for example, by texts or logos superposed on the image. This process is called image restoration or inpainting [Bertalmio et al. 2000]. As a natural image is approximately low-rank [Zhang et al. 2012a], the problem of restoring the corrupted pixels can be formulated as a matrix completion problem. Figure 5 illustrates the use of matrix completion to restore an image from randomly sampled pixels or a text-occluded image. Liang et al.
[2012] used a more sophisticated model, where the texture to be recovered is modeled as both low-rank and sparse in a certain transformed domain. Moreover, they assumed that the corrupted regions might be unknown and used a sparse error term to model and detect them.

Fig. 5. Matrix completion for image restoration. (a) The input image with 50% missing pixels and the recovered image. (b) The input image corrupted by text and the recovered image. The SOFT-IMPUTE algorithm [Mazumder et al. 2010] was applied. Original image from http://en.wikipedia.org/wiki/File:Lenna.png.

To alleviate the shrinkage of signals, some works adopted nonconvex methods for rank minimization instead of using the nuclear norm [Zhang et al. 2012a; Gu et al. 2014]. Ji et al. [2010] proposed a method for video restoration. The unreliable pixels in the video are first detected and labeled as missing. Then, the image patches are grouped such that the patches in each group share a similar underlying structure and approximately form a low-rank matrix. Finally, matrix completion is carried out on each patch group to restore the images.

Similarly, the low-rank assumption is often used to model the coherence of multiple images for noise removal in medical image analysis. In denoising of magnetic resonance (MR) images, for example, an image sequence usually consists of multiple echo images [Bydder and Du 2006], frames of dynamic imaging [Nguyen et al. 2011] or different diffusion-weighted images [Lam et al. 2012]. Although the images are different, the desired signals in these images are supposed to be correlated and consequently can be reconstructed from a few significant principal components. The remaining components correspond to random noise and are removed. Recently, Candes et al.
[2013] used the SVT operator in (11) instead of classical PCA to achieve more robust results for image denoising. More importantly, it has been shown that the optimal threshold for SVT can be derived theoretically based on Stein's unbiased risk estimate [Candes et al. 2013], which brings great convenience to practical applications.

5.7. Image Segmentation

The active shape model [Cootes et al. 1995] was proposed to increase the robustness of deformable models for image segmentation. It constructs a statistical shape space from a large set of given shapes and constrains the candidate shape to lie in this space. In the active shape model, a candidate shape is represented as

C(w) = \bar{C} + \Phi w,   (54)

where \bar{C} denotes the mean shape, Φ is a matrix consisting of vectors describing shape variations in the training data, and w is a vector of coefficients representing the candidate shape in the shape space. Then, w is determined by fitting the parametric curve in (54) to the features in the image. Since the number of columns of Φ is often small, the candidate shape is confined to a low-dimensional space. Therefore, the active shape model intrinsically places a low-rank assumption on the population of shapes. Moreover, \bar{C} and Φ are derived by applying PCA to the set of training shapes. The active shape model was later extended to the active appearance model to make use of both shape and appearance information [Cootes et al. 2001]. More exemplar methods building upon the active shape model for image segmentation include [Leventon et al. 2000; Tsai et al. 2001; Cremers 2006; Zhu et al. 2010]. An alternative approach to making use of the low-rank assumption for image segmentation is to impose a group-similarity constraint on multiple shapes by nuclear norm minimization [Zhou et al. 2013a], which does not require training shapes.

5.8. Medical Image Reconstruction

Image reconstruction based on low-rank modeling has drawn much attention in the medical imaging community. The idea is to exploit the temporal coherence in dynamic imaging to reduce the required number of samples. In MR imaging, for example, Liang [2007] proposed the concept of partial separability (PS) to model a spatial-temporal MR image ρ(x, t) as

\rho(x, t) = \sum_{\ell=1}^{L} \phi_\ell(x) v_\ell(t),   (55)

where φ_ℓ(x) and v_ℓ(t) for ℓ = 1, ⋯, L are spatial and temporal components, respectively, and L is the order of the model. Correspondingly, any sample in the (k, t)-space can be expressed as c(k, t) = \sum_{\ell=1}^{L} u_\ell(k) v_\ell(t), where u_ℓ(k) is the Fourier transform of φ_ℓ(x). Using matrix notation, we have

C = UV,   (56)

where C_{ij} = c(k_i, t_j), U_{iℓ} = u_ℓ(k_i) and V_{ℓj} = v_ℓ(t_j). Since the images are temporally coherent, L can be very small, which gives a low-rank model of the coefficients C in the (k, t)-space. Hence, a small number of samples is sufficient to estimate C and reconstruct the image sequence. For example, u_ℓ(k) and v_ℓ(t) for ℓ = 1, ⋯, L can be computed by fully sampling L columns and rows of the (k, t)-space [Liang 2007]. Some works used other sampling schemes and solved the reconstruction problem with matrix recovery algorithms [Haldar and Liang 2010, 2011].

The basic PS model can be further extended to integrate other sparsity properties in specific domains. For example, the image intensity φ_ℓ(x) often has a sparse representation in wavelets or a limited total variation [Lustig et al. 2008; Lingala et al. 2011; Majumdar and Ward 2012]. Meanwhile, the temporal component v_ℓ(t) is usually periodic or bandlimited, which results in sparsity in the Fourier domain [Zhao et al. 2010, 2012].
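To see why fully sampling L rows and L columns of the (k, t)-space suffices for an exactly rank-L matrix, consider the following cross-approximation sketch. The sampling indices and helper names here are ours, chosen for illustration only; practical reconstruction must also handle noise and other sampling patterns.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 3                              # model order: rank of C = U V
n_k, n_t = 50, 40                  # number of k-space locations and time points

# Simulate a rank-L (k, t)-space matrix C = U V, as in (56).
U = rng.standard_normal((n_k, L))
V = rng.standard_normal((L, n_t))
C = U @ V

# Fully sample L rows and L columns (hypothetical index choices).
rows = [0, 1, 2]
cols = [0, 1, 2]
R = C[rows, :]                     # L x n_t: spans the row space (temporal part)
Cc = C[:, cols]                    # n_k x L: spans the column space (spatial part)
W = C[np.ix_(rows, cols)]          # L x L intersection of the samples

# Cross (CUR-style) reconstruction: exact whenever W has full rank L,
# since Cc @ W^{-1} @ R = U V_c (U_r V_c)^{-1} U_r V = U V = C.
C_hat = Cc @ np.linalg.solve(W, R)

rel_err = np.linalg.norm(C_hat - C) / np.linalg.norm(C)
```

With noisy or undersampled data, the same idea is carried out by the matrix recovery algorithms cited above rather than by a direct solve.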
The low-rank property is modeled as regionally dependent and exploited locally in [Christodoulou et al. 2012; Trzasko 2013]. Instead of using the PS model, some works [Lingala et al. 2011; Majumdar and Ward 2012] proposed to reconstruct data by nuclear norm minimization. Otazo et al. [2013] used an RPCA model to separate background and dynamic components in imaging. Lingala and Jacob [2013] developed a dictionary learning-based framework for dynamic imaging. Low-rank modeling has also been applied to reconstruct multi-channel data [Shin et al. 2013], single k-space data [Haldar 2014] and imaging data from other modalities such as computed tomography (CT) [Cai et al. 2014; Gao et al. 2011] and positron emission tomography (PET) [Rahmim et al. 2009].

5.9. More Applications

Recent examples of low-rank modeling-based applications also include object tracking [Ross et al. 2008; Zhang et al. 2012b], saliency detection [Shen and Wu 2012], correspondence estimation [Zeng et al. 2012; Chen et al. 2014], façade parsing [Yang et al. 2012], model fusion [Ye et al. 2012; Pan et al. 2013], and depth image enhancement [Shu et al. 2014], to name a few.

6. DISCUSSIONS

In this paper, we have introduced the concept of low-rank modeling and reviewed some representative low-rank models, algorithms and applications in image analysis. For additional reading on theories, algorithms and applications, the readers are referred to online documents such as the Matrix Factorization Jungle^3 and the Sparse and Low-rank Approximation Wiki^4, which are updated on a regular basis.

The convex programming-based methods for low-rank matrix recovery generally achieve a stable performance under a wide range of scenarios due to the global optimality in optimization.
In noiseless cases, convex programs such as (5) and (8) can exactly recover the underlying low-rank matrix with theoretical guarantees [Candès and Recht 2009; Candès et al. 2011]. In noisy cases, nuclear norm minimization may shrink true signals while suppressing noise. To compensate for this shrinkage effect, some postprocessing steps may be used [Mazumder et al. 2010], while other works tried to alleviate the issue by going beyond the nuclear norm and using nonconvex relaxation techniques [Mohan and Fazel 2012; Zhang et al. 2012a]. A limitation of convex methods is the requirement of repeated SVD computation, which is time-consuming and unaffordable in large-scale problems. While many efforts have been made towards fast SVD computation, such as partial SVD [Lin et al. 2010], approximate SVD [Ma et al. 2011] or performing SVT without SVD [Cai and Osher 2010], computational efficiency is still an issue in many real applications.

The factorization-based methods are widely used in real applications (e.g., building recommender systems [Koren et al. 2009]), mostly due to their computational convenience. Inference of a large matrix is reduced to estimation of two smaller factor matrices. Moreover, the cost function of matrix factorization is often decomposable as a sum of separate functions over data points or variables. Therefore, it is convenient to develop online algorithms for real-time processing and to design distributed algorithms for solving large-scale problems. A limitation of matrix factorization is that the underlying rank needs to be predefined in many models. While rank estimation techniques have been proposed in some works, such as [Keshavan et al. 2009] and [Wen et al. 2012], rank estimation is still a challenging problem, especially in noisy cases.

The probabilistic methods have shown great potential in both simulation [Babacan et al.
2012] and real applications [Wang and Yeung 2013]. Moreover, the probabilistic models with a Bayesian treatment are often parameter-free, which is important in real applications.

The applications of low-rank modeling are based on the fact that linear correlation often exists among data. Such prior knowledge can be used for many purposes, such as extracting common patterns, removing random noise and reducing sampling rates in imaging. The recent advances in sparse learning and optimization provide powerful frameworks and techniques to conveniently model the low-rank property of data and develop efficient algorithms. We expect to see more applications of low-rank modeling in the near future.

ACKNOWLEDGMENTS

The authors thank Michael Klein for carefully proofreading the manuscript.

^3 URL: https://sites.google.com/site/igorcarron2/matrixfactorizations. Accessed: 25 May 2014.
^4 URL: http://ugcs.caltech.edu/~srbecker/wiki/Main_Page. Accessed: 25 May 2014.

REFERENCES

H. Aanæs, R. Fisker, K. Astrom, and J. M. Carstensen. Robust factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(9):1215–1225, 2002.
P.-A. Absil, L. Amodei, and G. Meyer. Two Newton methods on the manifold of fixed-rank matrices endowed with Riemannian quotient geometries. Computational Statistics, pages 1–22, 2013.
R. Angst, C. Zach, and M. Pollefeys. The generalized trace-norm and its application to structure-from-motion problems. In Proceedings of the International Conference on Computer Vision, 2011.
S. D. Babacan, M. Luessi, R. Molina, and A. K. Katsaggelos. Sparse Bayesian methods for low-rank matrix estimation. IEEE Transactions on Signal Processing, 60(8):3964–3977, 2012.
L. Balzano, R. Nowak, and B. Recht. Online identification and tracking of subspaces from highly incomplete information.
In Proceedings of the Annual Allerton Conference on Communication, Control, and Computing, 2010.
R. Basri and D. Jacobs. Lambertian reflectance and linear subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(2):218–233, 2003.
A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester. Image inpainting. In Proceedings of the Annual Conference on Computer Graphics and Interactive Techniques, 2000.
D. Bertsekas. Nonlinear programming. Athena Scientific, 1999.
C. M. Bishop. Bayesian PCA. In Advances in Neural Information Processing Systems, pages 382–388, 1999.
C. M. Bishop et al. Pattern recognition and machine learning. Springer New York, 2006.
A. Blake and M. Isard. Active contours. Springer, 2000.
T. Blumensath and M. E. Davies. Iterative hard thresholding for compressed sensing. Applied and Computational Harmonic Analysis, 27(3):265–274, 2009.
N. Boumal and P.-A. Absil. RTRMC: A Riemannian trust-region method for low-rank matrix completion. In Advances in Neural Information Processing Systems, pages 406–414, 2011.
N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 15:1455–1459, 2014.
S. Boyd. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2010.
M. Brand. Incremental singular value decomposition of uncertain data with missing values. In Proceedings of the European Conference on Computer Vision, 2002.
C. Bregler, A. Hertzmann, and H. Biermann. Recovering non-rigid 3D shape from image streams.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2000.
A. M. Buchanan and A. W. Fitzgibbon. Damped Newton algorithms for matrix factorization with missing data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005.
J. R. Bunch and C. P. Nielsen. Updating the singular value decomposition. Numerische Mathematik, 31(2):111–129, 1978.
M. Bydder and J. Du. Noise reduction in multiple-echo data sets using singular value decomposition. Magnetic Resonance Imaging, 24(7):849–856, 2006.
R. Cabral, F. De la Torre, J. P. Costeira, and A. Bernardino. Unifying nuclear norm and bilinear factorization approaches for low-rank matrix decomposition. In Proceedings of the International Conference on Computer Vision, 2013.
J. Cai and S. Osher. Fast singular value thresholding without singular value decomposition. UCLA CAM Report, 5, 2010.
J. Cai, E. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20:1956, 2010.
J. Cai, X. Jia, H. Gao, S. Jiang, Z. Shen, and H. Zhao. Cine cone beam CT reconstruction using low-rank matrix factorization: algorithm and a proof-of-principle study. IEEE Transactions on Medical Imaging, 33(8):1581–1591, 2014.
E. Candes and Y. Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):925–936, 2010.
E. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
E. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(3):11, 2011.
E. Candes, C. Sing-Long, and J. Trzasko. Unbiased risk estimates for singular value thresholding and spectral estimators. IEEE Transactions on Signal Processing, 61(19):4643–4657, 2013.
E. J. Candès and T. Tao.
The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.
V. Chandrasekaran, S. Sanghavi, P. Parrilo, and A. Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21(2):572–596, 2011.
C. Chen, C. Wei, and Y. Wang. Low-rank matrix recovery with structural incoherence for robust face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.
P. Chen. Optimization algorithms on subspaces: Revisiting missing data problem in low-rank matrix. International Journal of Computer Vision, 80(1):125–142, 2008.
Y. Chen, L. Guibas, and Q. Huang. Near-optimal joint object matching via convex relaxation. In Proceedings of the International Conference on Machine Learning, 2014.
B. Cheng, G. Liu, J. Wang, Z. Huang, and S. Yan. Multi-task low-rank affinity pursuit for image segmentation. In Proceedings of the International Conference on Computer Vision, 2011.
A. Christodoulou, S. Babacan, and Z. Liang. Accelerating cardiovascular imaging by exploiting regional low-rank structure via group sparsity. In Proceedings of the IEEE International Symposium on Biomedical Imaging, 2012.
P. Comon and G. H. Golub. Tracking a few extreme singular values and vectors in signal processing. Proceedings of the IEEE, 78(8):1327–1343, 1990.
T. Cootes, C. Taylor, D. Cooper, and J. Graham. Active shape models – their training and application. Computer Vision and Image Understanding, 61(1):38–59, 1995.
T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681–685, 2001.
D. Cremers. Dynamical statistical shape priors for level set-based tracking.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(8):1262–1273, 2006.
X. Cui, J. Huang, S. Zhang, and D. N. Metaxas. Background subtraction using low rank and group sparsity constraints. In Proceedings of the European Conference on Computer Vision, 2012.
W. Dai, O. Milenkovic, and E. Kerman. Subspace evolution and transfer (SET) for low-rank matrix completion. IEEE Transactions on Signal Processing, 59(7):3120–3132, 2011.
Y. Dai, H. Li, and M. He. A simple prior-free method for non-rigid structure-from-motion factorization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.
F. De La Torre and M. Black. A framework for robust subspace learning. International Journal of Computer Vision, 54(1):117–142, 2003.
X. Ding, L. He, and L. Carin. Bayesian robust principal component analysis. IEEE Transactions on Image Processing, 20(12):3419–3430, 2011.
G. Doretto, A. Chiuso, Y. Wu, and S. Soatto. Dynamic textures. International Journal of Computer Vision, 51(2):91–109, 2003.
C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.
A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
A. Eriksson and A. Van Den Hengel. Efficient computation of robust low-rank matrix approximations in the presence of missing data using the L1 norm. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010.
M. Fazel. Matrix rank minimization with applications. PhD thesis, Stanford University, 2002.
H. Gao, J. Cai, Z. Shen, and H. Zhao. Robust principal component analysis-based four-dimensional computed tomography.
Physics in Medicine and Biology, 56(11):3181, 2011.
Z. Gao, L. Cheong, and M. Shan. Block-sparse RPCA for consistent foreground detection. In Proceedings of the European Conference on Computer Vision, 2012.
R. Garg, A. Roussos, and L. Agapito. A variational approach to video registration with subspace constraints. International Journal of Computer Vision, 104(3):286–314, 2013.
R. Gemulla, E. Nijkamp, P. J. Haas, and Y. Sismanis. Large-scale matrix factorization with distributed stochastic gradient descent. In Proceedings of the International Conference on Knowledge Discovery and Data Mining, 2011.
A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):643–660, 2001.
D. Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Transactions on Information Theory, 57(3):1548–1566, 2011.
S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
J. Haldar. Low-rank modeling of local k-space neighborhoods (LORAKS) for constrained MRI. IEEE Transactions on Medical Imaging, 33(3):668–681, March 2014.
J. Haldar and Z. Liang. Spatiotemporal imaging with partially separable functions: A matrix recovery approach. In Proceedings of the IEEE International Symposium on Biomedical Imaging, 2010.
J. Haldar and Z. Liang. Low-rank approximations for dynamic imaging. In Proceedings of the IEEE International Symposium on Biomedical Imaging, 2011.
J. P. Haldar and D. Hernando. Rank-constrained solutions to linear matrix equations using PowerFactorization. IEEE Signal Processing Letters, 16(7):584–587, 2009.
J. He, L.
Balzano, and A. Szlam. Incremental gradient on the Grassmannian for online foreground and background separation in subsampled video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.
H. Hotelling. Analysis of a complex of statistical variables into principal components. The Journal of Educational Psychology, pages 498–520, 1933.
M. Irani. Multi-frame correspondence estimation using subspace constraints. International Journal of Computer Vision, 48(3):173–194, 2002.
P. Jain, R. Meka, and I. S. Dhillon. Guaranteed rank minimization via singular value projection. In Advances in Neural Information Processing Systems, 2010.
P. Jain, P. Netrapalli, and S. Sanghavi. Low-rank matrix completion using alternating minimization. In Proceedings of the Annual ACM Symposium on Theory of Computing, 2013.
H. Ji, C. Liu, Z. Shen, and Y. Xu. Robust video denoising using low rank matrix completion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010.
S. Ji and J. Ye. An accelerated gradient method for trace norm minimization. In Proceedings of the International Conference on Machine Learning, 2009.
I. T. Jolliffe. Principal component analysis. Springer Verlag, 2002.
Q. Ke and T. Kanade. Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005.
R. H. Keshavan, A. Montanari, and S. Oh. Low-rank matrix completion with noisy observations: a quantitative comparison. In Proceedings of the Annual Allerton Conference on Communication, Control, and Computing, 2009.
R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Transactions on Information Theory, 56(6):2980–2998, 2010a.
R. H. Keshavan, A.
Montanari, and S. Oh. Matrix completion from noisy entries. Journal of Machine Learning Research, 11:2057–2078, 2010b.
M. Kirby and L. Sirovich. Application of the Karhunen-Loeve procedure for the characterization of human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1):103–108, 1990.
Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.
N. Kwak. Principal component analysis based on L1-norm maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(9):1672–1680, 2008.
B. Lakshminarayanan, G. Bouchard, and C. Archambeau. Robust Bayesian matrix factorisation. In International Conference on Artificial Intelligence and Statistics, pages 425–433, 2011.
F. Lam, S. Babacan, J. Haldar, N. Schuff, and Z. Liang. Denoising diffusion-weighted MR magnitude image sequences using low rank and edge constraints. In Proceedings of the IEEE International Symposium on Biomedical Imaging, 2012.
N. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. The Journal of Machine Learning Research, 6:1783–1816, 2005.
D. Lee and H. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788, 1999.
K. Lee and Y. Bresler. ADMiRA: Atomic decomposition for minimum rank approximation. IEEE Transactions on Information Theory, 56(9):4402–4416, 2010.
K. Lee, J. Ho, and D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5):684–698, 2005.
M. Leventon, W. Grimson, and O. Faugeras. Statistical shape influence in geodesic active contours. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2000.
L. Li, W. Huang, I. Gu, and Q. Tian.
Statistical mo deling of co mplex backgrounds for foregro und o bject de tection. IEEE T ransa c ti ons I m age Proces s i ng , 13(11):1459– 1472, 2004. Y . Li, Y . Zhang, and Z. Huang. A rew eighted nuclear no rm minimization algorithm f or low rank matrix reco v ery . Journal o f Computation a l and Appl ied Mathematics , 263: 338–350, 2014. Low-Rank Mode ling and Its Applications in Image Analysis 31 X. Liang, X. Ren, Z. Zhang, and Y . Ma. Rep airing sparse low- r ank texture. In Pro ceed- ings o f the Europe a n Conferen ce on Computer V ision , 2012. Z. Liang. Spatiotemporal imaging with partially sep arable func tio n s . In Proceedin gs of the IEEE Internatio n al Sympos i um on Biomed ical Imaging , 2007. Z. Lin, M. Chen, and Y . Ma. The augme nted Lagrang e multiplier method for ex act recove r y o f co rrupted low - rank matrices. arXiv p r eprint a rXiv:1009.5055 , 2010. S . Ling ala, Y . Hu, E. D iBella, and M. Jacob. Acce lerated dynamic MRI explo iting sparsity and low- rank structure: kt SLR. IEEE T ran sactions o n Medi cal Im aging , 30(5):1042–1054, 2011. S . G. Lingala and M. J acob. Blind c ompressive sen sing dy namic MRI. IEEE transac- tions on medical imagi ng , 32(6):1132–1145, 2013. G . Liu, Z. Lin, and Y . Y u. Ro bust subspace se g mentation by low -rank r e presentation. In Proceedings of the Internati onal Con ference on Machine Learning , 2010. C . Lu, J . T ang, S . Y . Y an, and Z. Lin. Generalized nonc onvex no nsmooth low-rank min- imization. In Proceeding s of the IEEE Con ference on Co mputer V isi on a nd P attern Recogni tion , 2014. M. Lustig , D. D onoho, J . Santos, and J . P auly . Compre ssed sensing MRI. IEEE Signal Processing Magazine , 25(2):72–82, 2008. S . Ma, D. Goldfarb, and L. Chen. Fixed po int and bre gman ite r ative methods for matrix rank minimization. Mathem a tical Programming , 128(1-2):321–353, 2011. D . J . MacKay . 
Probable netwo rks and plausib le predictions-a review of practical Bayesian me thods for supervised neu r al networ ks. Network: Computatio n in Neural Systems , 6(3):469–505, 1995. L. W . Mackey , M. I. Jordan, and A. T alwalkar . Divide-and - conquer matrix factorization. In Advances in Neural Informatio n Processing Systems , 2011. A. Majum dar and R. W ard. Exploiting rank deficie ncy and transform domain sparsity for MR image reconstruction. Magnetic Reson ance Imagi ng , 30(1):9–18, 2012. A. Majumdar and R. K . W ard. An algorithm for sparse MRI re construction by schatten p-norm minimization. Magnetic resonance imagi ng , 29( 3) :408–417, 2011. S . G. Mallat and Z. Zh ang . Matching pursuits with time - frequency diction aries . IEEE T ransactio ns o n Sign al Proc essing , 41(12):3397–3415, 1993. G . Marjano vic and V . Solo. On lq optimization and matrix completion. IEEE T ra n sac- tions on Signal Processi ng , 60(11):5714–5724, 2012. I. Markovsky . Low rank approx imation: alg orithms , i m plementation, ap plications . Springer , 2012. R. Mazumder , T . Hastie, and R. Tibshirani. Spectral regu larization algorithms for learning large incomple te matrices. The J o urnal of Machine Learnin g Re s earch , 11: 2287–2322, 2010. D . Meng and F . De la T orre. Robust matrix factorization with unknown noise. In Proceedings o f the Internatio nal Conferenc e on Computer V ision , 2013. G . Meyer , S. Bonn abel, and R. Sepulchre. Linear regr ession under fixe d-rank con- straints: a riemannian approach. In Proceed ings of the Internation al Conferen ce on Machine Learnin g , 2011. M. Michenkov ´ a. Nu merical algo rithms f or lo w -rank matrix completion problems, 2011. B . Mishra, K . A. Apu roop, and R. Sepulchre. A rieman nian geometry fo r low-rank matrix c o mpletion. ar Xiv preprint a rXiv:1211.1550 , 2012. B . Mishra, G. Meye r , F . Bach, and R. Sepulchre. Low-rank o ptimization with trace norm penalty . 
SIAM J o urnal on Op timization , 23(4):2124–2149, 2013a. B . Mishra, G . Mey er , S . Bonnabel, and R. Sepulchre. Fixed-rank matrix factoriza- tions and r ie mannian low-rank optimization. Co mputational Statistics , 29:591–621, 2013b. 32 X. Zhou et al. K. Moh an and M. F azel. I te rative rew eighted algo rithms for m atrix rank minimiza- tion. T h e Journal of Machine Learnin g Research , 13(1):3441–3473, 2012. J . Moreau. Proximit ´ e et d ualit ´ e dans un espace hilbertien . Bulletin de la Soci ´ et ´ e Math ´ ematique de F ra nce , 93(2):273–299, 1965. D . Needell and J . A. Tropp. Cosamp: Iterative signal reco very from incomp lete and inaccurate samples. Applied and Computation al Harmonic Analysis , 26(3):301–321, 2009. Y . Nesterov . A method of solving a co nvex p rogramming pro blem with conver g ence rate 0(1 /k 2 ) . In Sov . Math., Dokl. , volum e 27, pag e s 372–376, 1983. Y . Nestero v . Gradient m ethods for minimizing comp osite o bjective function. T e chni- cal repo rt, Un iversit ´ e catholique de Louv ain, Ce nter for O perations Research and Econome trics (CORE), 2007. T . Ngo and Y . Saad. Scaled gradients on grassmann manifolds for m atrix co m pletion. In Advances in Neural In formation Processing Systems , 2012. H. Nguy en, X. P e ng, M. Do, and Z. Liang. Spatiotemporal denoising of MR spectro- scopic imaging data by lo w-rank appro ximations . In Proceeding s of the IEEE Inter - national Symposi u m on Biomedi cal Imaging , 2011. F . Nie, H. W ang, X. Cai, H. Huang, and C . Ding. Robust matrix c o mpletion via joint schatten p -norm and lp-norm minimization. In Proceed ings of the I EEE Interna- tional Conference on Da ta Mining , 2012. B . O’D onoghue and E. Candes. Adaptive re start f o r accelerated grad ient schemes. F ound ations of Co m putational Mathema ti c s , page s 1–18, 2012. T . Okatani and K. De guchi. O n the Wiberg algorithm fo r matrix factorization in the presence of missing comp onents. 
Internatio nal Journal of Computer V i sion , 72(3) : 329–337, 2007. T . Okatani, T . Y oshida, an d K. Deg uchi. Efficient algor ithm f or low-rank matrix factor - ization with missing co mponents and pe r formance comparison of latest algorithms. In Proceedings of the I n ternational Con ference on Com puter V isi on , 2011. N . Oliver , B. Rosario, an d A. P entland. A Bayesian computer vision system for mo del- ing human interactions. IEEE T ran s actions on P attern Analysis and Machine I n tel- ligence , 22(8):831–843, 2000. R. Otazo, D . K. Sodickson, and E. J . C an d ` es. Low-rank+ sparse (l+s) reco nstruction for accelerated dynamic MRI with sep e ration of background an d dynamic compone nts . In Proceedings of the SPIE O p tical Engi neering + App l ications , 2013. Y . P an, H. Lai, C . Liu, and S . Y an. A div ide-and-conqu e r method fo r scalable low-rank latent matrix pu rsuit. In Proceeding s of the IEE E Conf erence on Computer V i sion and P attern R ecognition , 2013. Y . P en g, A. Ganesh, J . Wright, W . Xu, and Y . Ma. RASL: Robust alignme n t by sparse and lo w-rank deco mposition fo r linearly corre lated images. IEEE T ra nsactions o n P attern Analysi s and Machine Intelli gence , 34(11):2233–2246, 2012. A. Rahmim, J . T ang, and H. Zaidi. F our-dimensional (4D) image reco nstruction strate- gies in d ynamic PET: beyond conv entional in d ependent fr ame reconstruction. Med- ical physi c s , 36(8):3654–3670, 2009. S . Rao, R. Tron , R. Vidal, and Y . Ma. Motion segmentation in the presence o f ou tlyin g, incomplete, or c o rrupted trajectories. IEEE T ra nsactions o n P attern Ana lysis and Machine Intell i gence , 32(10):1832–1845, 2010. B . Recht and C . R ´ e. P arallel stochastic g radient algorithms for large-scale matrix completion. Mathem a tical Programming Computation , 5(2):201–226, 2013. B . Recht, M. F azel, and P . P arrilo. G uaranteed minimum- rank so lu tio n s of linear ma- trix equations via nuclear norm minimization. 
SIAM Revi e w , 52(3):471–501, 2010. J . Rennie and N . Srebro. F ast maximum margin matrix factorization for co llabora- tive prediction. In Proceeding s of the Internationa l Co nference on Machine l earning , Low-Rank Mode ling and Its Applications in Image Analysis 33 2005. D . A. Ross, J . Lim, R.-S . Lin, and M.-H. Y ang. I n cremental learning for robust visual tracking . Internatio nal J o u rnal of Com p uter V isi o n , 77(1-3):125–141, 2008. R. Salakhutdinov and A. Mnih. Bayesian p robabilistic matrix factorization u sing markov chain mo nte carlo. In Proceedings of the Intern ational Co nference on Ma- chine Learnin g , 2008a. R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In Advances in Neural Information Proce ssing Systems , 2008b. U . Shalit, D. W e inshall, and G . Ch e chik. Online le arning in the e mbedded manifold of low-rank m atrice s . The Journal of Machine Learnin g Research , 13:429–458, 2012. X. Shen and Y . W u. A unifie d app r oach to salient object detec tio n via lo w rank matrix recove r y . In Proceedi ngs of th e IEEE Conferenc e on Computer V isi on and P attern Recogni tion , 2012. P . J . Shin, P . E. Larson, M. A. Ohliger , M. Elad, J . M. P auly , D. B . Vigner on, and M. Lustig . Calibrationless parallel imaging reco nstruction based on structured low- rank matrix comple tion. Magnetic Resonan c e in Medicine , 2013. X. Shu, F . P orikli, and N . Ahuja. Ro bust o rthonormal subspace learning : Efficient recove r y of corrupted low-rank matrices. In Pro ceedings of the IEEE Confere n ce on Computer V is i on a nd P attern Recogniti o n , 2014. H.-Y . Shu m , K. Ikeuchi, and R. Re ddy . Principal com p onent analysis with missing data and its application to poly h edral object mo deling. IEEE T ransacti o ns on P a ttern Analysis an d Machine Intell igence , 17(9):854–867, 1995. A. P . Singh and G . J . Gordo n. A unifie d v iew o f matrix factorization mode ls . 
In Pro- ceedings of Machine Learni ng an d Kn owledge Dis c overy in Databases , 2008. L. Sirovich and M. K irby . Low-dime nsional proced ure for the characterization of hu- man faces. Journal of the Optical Society of America. A, Optics and image science , 4 (3):519–524, 1987. N . Srebro, J . Re nnie, and T . Jaak ko la. M ax imum-margin matrix factorization. Ad- vances in Neural Informa ti on Processing Systems , 17(5):1329–1336, 2005. G . T ang and A. Nehorai. Low e r boun ds on the mean-square d error of low- rank matrix reconstruction. IEEE T ransactio n s o n Signal Pro cessing , 59(10):4559–4571, 2011. M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. T h e Jour - nal of Machine Learni ng R esearch , 1:211–244, 2001. M. E. Tipping and C . M. Bishop. Probabilistic principal compone nt analysis . J ourn al of the Roya l Statistic a l Soci ety: Series B (Statistica l Method ology) , 61(3):611–622, 1999. K. T oh and S . Y un. An acce lerated proximal gradient algorithm fo r n uclear norm regularized linear le ast squares p r oblems . P ac ific J ou rnal of Optimi zation , 6(615- 640):15, 2010. C . T omasi and T . Kanade. Shape and motion from image streams und er orthograph y: a factorization m ethod. In ternational Journal of Computer V isi on , 9(2):137–154, 1992. R. T omioka, T . Suzuki, M. Sugiyama, and H. Kashima. A fast augmente d lagrangian algorithm fo r learning low- rank matrices . In Pro ceedings of the Interna tional Con- ference on Machine Learn ing , 2010. L. T orresani and C. Bregler . Space- time tracking . In Pro c eedings of the Europea n Conference o n Computer V is i on , 2002. I. T o sic and P . Frossard. Dictionary learn in g. IEEE Signal Proc essing Maga zin e , 28 (2):27–38, 2011. J . D. Trzasko. Exploiting local low-rank structure in higher-dimensional MRI applica- tions . I n Proceed ings of the SPIE Optical Engineeri ng+ Applicati o ns , 2013. A. Tsai, A. Y ezzi Jr , W . W ells III, C. T empany , D. Tucker , A. 
F an, W . Grimson , and A. Willsky . Model-based c u rve evolu tio n technique for image segm entation. I n Pro- 34 X. Zhou et al. ceedings of the IEEE Confere n ce on Computer V ision and P attern Recog nition , 2001. M. A. Turk and A. P . P entland. F ace reco gnition using e igenfaces. In Proceedin gs of the IEEE Conferenc e on Comp uter V ision and P attern R ecognition , 1991. B . V andere ycken. Low-rank matrix com pletion by rie mannian o ptimization. SIAM J ou rnal on Opti m ization , 23(2):1214–1236, 2013. R. Vidal. Subspace clustering. IEEE Signa l Proc e s sing Magzine , 28(2):52 –68, 2011. R. Vidal and R. Hartley . Motion seg m entation with missing d ata using PowerF actor- ization and GPCA. In Proceedings of the IEEE Conferen ce on Computer V isi o n and P attern R ecognition , 2004. R. Vidal an d Y . Ma. A unified alge braic app roach to 2-D and 3-D m otion segmentation. In Proceedings of the Eu ropean Co nference on Co mputer V isio n , 2004. N . W ang and D.-Y . Y e ung. Bayesian robust m atrix factorization for image and video processing. In Proc eedings of the In ternational Conference on Computer V ision , 2013. N . W ang, T . Y ao , J . W ang, and D.-Y . Y eung. A pro babilistic approach to robust matrix factorization. In Proceedings of Europea n confere n ce on Computer V isio n , 2012. A. E. W aters , A. C. Sankaranarayanan, and R. Baraniuk. Sparcs: Recov ering low- rank and sparse matrices fro m co mpressive measureme nts . In Advance s i n N eural Information Proc e ssing Systems , 2011. Z. W en, W . Yin, and Y . Zhang. Solving a low-r ank f actorization mode l for matrix com- pletion by a no nlinear successive ove r -relaxation algorithm. Mathematical Program- ming Computati on , 4(4):333–361, 2012. J . Wright, A. Ganesh, K. Min, and Y . Ma. Compressive prin c ipal compo nent pursuit. In Proceedings of the I E EE Intern ational Sy m posium o n Informa tion Theory , 2012. J . Xiao, J . Chai, and T . Kanade. 
A closed-fo rm solution to n o n-rigid shape and mo tion recove r y . Inter n ational Journal of Comp uter V isio n , 67(2):233–246, 2006. H. Xu, C. Caramanis, and S . Sanghavi. Ro bust PCA v ia outlier pursuit. IEEE T rans- actions on Informatio n Theory , 58(5):3047 –3064, 2012a. Y . Xu, W . Yin, Z. W en, and Y . Zh an g . An alternating d irection algorithm for matrix completion with nonne gative factors . F rontiers of Mathematics i n Chin a , 7(2):365– 384, 2012b. C . Y ang , T . Han, L. Quan, and C . T ai. P arsing fac ¸ ade with rank-one approximation. In Proceed ings of the IEEE Conferenc e o n Computer V ision an d P attern R ecognition , 2012. G . Y e, D. Liu, I . Jhu o, and S. Ch an g . Robust late fusion with rank minimization. In Proceedings of the IEE E Con ference on Co mputer V isio n and P attern Recogn i tion , 2012. Z. Ze ng, T .-H. Chan, K. Jia, and D . Xu. Finding correspo n dence f rom multiple images via sparse and low-ran k de composition. In Proceedin g s of the Euro pean Co nference on Computer V ision , 2012. D . Zhang, C . Hangzhou, Y . Hu, J . Y e, X. Li, and X. He. Matrix com pletion by truncated nuclear no rm reg ularization. In Proceedi ngs of the IEEE Conferen ce on Computer V i sion and P attern R ecognition , 2012a. M. Zhang, Z.- H. Huang, and Y . Z h ang . Restricted p-isometry prop erties of no nconvex matrix recov e ry . IEEE T rans a ctions on Inform ation Theory , 59(7):4316–4323, 2013a. T . Zhang, B. Ghane m, S . Liu, and N . Ahu ja. Low -rank sparse learning for ro bust visual tracking . In Proceed ings of the Eu ropean Co nference o n Com p uter V isi o n , 2012b. T . Zhang, B . G hanem, S . Liu, C . Xu, and N . Ahu j a. Low-rank sparse co ding fo r image classi fi cation. In Proceedings of the Internation al Confere nce on Computer V ision , 2013b. Y . Zhang, Z. Jiang, and L. S . Davis . Le arn ing structured low- rank representations for image classification. 
In Proceeding s o f the IEEE Con ference on Comp uter V isi on and P attern R ecognition , 2013c. Low-Rank Mode ling and Its Applications in Image Analysis 35 Z. Zhang, A. Ganesh, X. Liang, and Y . Ma. TIL T: Transfor m invariant low- rank tex- tures. Internation al J our n al of Compu ter V isio n , 99:1–24, 2012c. B . Zhao, J . Haldar , C. Brinegar , and Z. Liang . Low rank matrix recove ry f or real-time cardiac MRI. In Proceeding s of the I EEE Internati onal Symposiu m o n Biomedic al Imaging , 2010. B . Z h ao, J . Hal d ar , A. Christodoulo u, and Z.-P . Liang. Image re c o nstruction f rom highly undersampled (k,t)-space data with jo int partial separability and sparsity constraints . IEEE T r ansactions on Medical Im aging , 31(9):1809 –1820, 2012. Y . Zheng, G. Liu, S. Sugimoto, S . Y an, and M. Okutomi. Practical low-rank matrix approximation unde r ro bust l1-n orm. In Pro ceedings of the IEEE Conference on Computer V is i on a nd P attern Recogniti o n , 2012. T . Zhou and D. T ao. Gode c: Random ized low -rank & sparse matrix deco m position in noisy case. I n Proceed ings of the Interna tional Co nference on Machine Learning , 2011. X. Zh ou and W . Y u . Low -rank mode ling and its applications in medical image analysis . In Proceedings of SPIE Def e nse , Securi ty , an d Sensing , 2013. X. Zhou, X. Huan g , J . Du n can, and W . Y u . Active co ntours with group similarity . In Proceedings of the IEE E Con ference on Computer V ision and P attern R ecognition , 2013a. X. Zhou , C . Y ang, and W . Y u. Moving object dete ction by detecting contiguous ou tliers in the low- rank rep resentation. IEEE T ra nsactions on P attern Analysis an d Machine Intelligen ce , 35(3):597–610, 2013b. . Z. Zhou , X. Li, J . Wright, E. Cand e s , and Y . Ma. Stable principal co mponent pursuit. In Proceedings of the IEEE In tern ational Sympo sium on Information Theory , 2010. Y . Zhu, X. P ap ademetris , A. Sinusas , and J . D uncan. 
Seg mentation of the left ven tricl e from cardiac MR imag es using a subject-specific dynamical mod el. IEEE Transac- tions on Medical Imag i ng , 29(3):669–687, 2010.
