Exact Solutions in Structured Low-Rank Approximation
Structured low-rank approximation is the problem of minimizing a weighted Frobenius distance to a given matrix among all matrices of fixed rank in a linear space of matrices. We study exact solutions to this problem by way of computational algebraic …
Authors: Giorgio Ottaviani, Pierre-Jean Spaenlehauer, Bernd Sturmfels
GIORGIO OTTAVIANI†, PIERRE-JEAN SPAENLEHAUER‡, AND BERND STURMFELS§

Abstract. Structured low-rank approximation is the problem of minimizing a weighted Frobenius distance to a given matrix among all matrices of fixed rank in a linear space of matrices. We study the critical points of this optimization problem using algebraic geometry. A particular focus lies on Hankel matrices, Sylvester matrices and generic linear spaces.

1. Introduction. Low-rank approximation in linear algebra refers to the following optimization problem:
$$\text{minimize } \|X - U\|_\Lambda^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} \lambda_{ij}(x_{ij} - u_{ij})^2 \quad \text{subject to } \operatorname{rank}(X) \le r. \tag{1.1}$$
Here, we are given a real data matrix $U = (u_{ij})$ of format $m \times n$, and we wish to find a matrix $X = (x_{ij})$ of rank at most $r$ that is closest to $U$ in a weighted Frobenius norm. The entries of the weight matrix $\Lambda = (\lambda_{ij})$ are positive reals. If $m \le n$ and the weight matrix $\Lambda$ is the all-one matrix $\mathbf{1}$, then the solution to (1.1) is given by the singular value decomposition $U = T_1 \cdot \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_m) \cdot T_2$. Here $T_1, T_2$ are orthogonal matrices, and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_m$ are the singular values of $U$. By the Eckart–Young Theorem, the matrix of rank $\le r$ closest to $U$ equals
$$U^* = T_1 \cdot \operatorname{diag}(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0) \cdot T_2. \tag{1.2}$$
For general weights $\Lambda$, the situation is more complicated, as seen in the studies [17, 22, 24]. In particular, there can be many local minima. We discuss a small instance in Example 2.

In structured low-rank approximation [5, 18], we are also given a linear subspace $\mathcal{L} \subset \mathbb{R}^{m\times n}$, typically containing the matrix $U$. We consider the restricted problem:
$$\text{minimize } \|X - U\|_\Lambda^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} \lambda_{ij}(x_{ij} - u_{ij})^2 \quad \text{subject to } X \in \mathcal{L} \text{ and } \operatorname{rank}(X) \le r. \tag{1.3}$$
A best-case scenario for $\Lambda = \mathbf{1}$ is this: if $U$ lies in $\mathcal{L}$ then so does $U^*$.
This happens for some subspaces $\mathcal{L}$, including symmetric and circulant matrices, but most subspaces $\mathcal{L}$ do not enjoy this property (cf. [5]). Our problem is difficult even for $\Lambda = \mathbf{1}$. Most practitioners use local methods to solve (1.3). These methods return a local minimum. There are many heuristics for ensuring that a local minimum is in fact a global minimum, but there is never a guarantee that this has been accomplished. Another approach is to set up sum of squares relaxations, which are then solved with semidefinite programming (cf. [2]). These SOS methods furnish certificates of global optimality whenever the relaxation is exact. While this does happen in many instances, there is no a-priori guarantee either.

† Università di Firenze, viale Morgagni 67A, 50134 Firenze, Italy, ottavian@math.unifi.it
‡ CARAMEL project, Inria Nancy Grand-Est; Université de Lorraine; CNRS, LORIA, Nancy, France, pierre-jean.spaenlehauer@inria.fr
§ University of California, Berkeley, CA 94720-3840, USA, bernd@berkeley.edu

How then can one reliably find all global optima to a polynomial optimization problem such as (1.3)? Aside from interval arithmetic and domain decomposition techniques, the only sure method we are aware of is to list and examine all the critical points. Algorithms that identify the critical points, notably Gröbner bases [8] and numerical algebraic geometry [1], find all solutions over the complex numbers and sort out the real solutions after the fact. The number of complex critical points is an intrinsic invariant of an optimization problem, and it is a good indicator of the running time needed to solve that problem exactly.
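As a minimal illustration of "list all critical points, then compare", here is a Python sketch (not the authors' Maple/FGb or Macaulay2 code) for the smallest unweighted case $m = n = 2$, $r = 1$, $\Lambda = \mathbf{1}$. It assumes a symmetric data matrix so that the singular value decomposition can be written in closed form; the helper names and the matrix `U` are illustrative choices. The critical points are $\mu\, q q^T$ for the two eigenpairs $(\mu, q)$ of $U$, consistent with the value 2 for $n = 2$, $s = 0$ in the $\Lambda = \mathbf{1}$ block of Table 2.1. Evaluating the objective at both and keeping the smaller value gives the Eckart–Young solution (1.2), which a crude scan over all rank-one matrices confirms numerically.

```python
import math

def sq_dist(X, U):
    """Unweighted squared Frobenius distance ||X - U||^2 (Lambda = 1 in (1.1))."""
    return sum((x - u) ** 2 for rx, ru in zip(X, U) for x, u in zip(rx, ru))

def rank_one_critical_points(a, b, c):
    """Critical points of ||X - U||^2 over rank-one 2x2 matrices X, for the
    symmetric data U = [[a, b], [b, c]] with b != 0 (so the eigenvalues differ).

    For symmetric U the SVD comes from the eigendecomposition, and the critical
    points are mu * q q^T, one for each eigenpair (mu, q)."""
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    points = []
    for mu in (half_trace + radius, half_trace - radius):
        vx, vy = b, mu - a                    # solves (U - mu I) v = 0
        norm = math.hypot(vx, vy)
        q = (vx / norm, vy / norm)
        points.append([[mu * q[i] * q[j] for j in range(2)] for i in range(2)])
    return points

def brute_force_min(U, steps=20000):
    """Scan unit vectors v; for fixed v the closest rank-one X = (U v) v^T has
    squared distance ||U||^2 - ||U v||^2."""
    total = sq_dist(U, [[0.0, 0.0], [0.0, 0.0]])
    best = math.inf
    for k in range(steps):
        theta = math.pi * k / steps
        v = (math.cos(theta), math.sin(theta))
        uv = [U[i][0] * v[0] + U[i][1] * v[1] for i in range(2)]
        best = min(best, total - (uv[0] ** 2 + uv[1] ** 2))
    return best

U = [[2.0, 1.0], [1.0, 2.0]]                  # illustrative symmetric data
crit = rank_one_critical_points(2.0, 1.0, 2.0)
values = [sq_dist(X, U) for X in crit]        # approximately [1.0, 9.0]
global_min = min(values)                      # keep the better critical point
```

Here the global minimum $1$ equals $\sigma_2^2$, the square of the discarded singular value, as (1.2) predicts. Only the two critical values are compared; no local search is involved.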
The study of such algebraic degrees is an active area of research, and well-developed results are now available for semidefinite programming [21] and maximum likelihood estimation [4]. The present paper applies this philosophy to structured low-rank approximation.

A general degree theory for closest points on algebraic varieties was introduced by Draisma et al. in [6]. Following their approach, our primary task is to compute the number of complex critical points of (1.3). Thus, we seek to find the Euclidean distance degree (ED degree) of
$$\mathcal{L}_{\le r} := \{X \in \mathcal{L} : \operatorname{rank}(X) \le r\}.$$
This determinantal variety is always regarded as a subvariety of the matrix space $\mathbb{R}^{m\times n}$, and we use the $\Lambda$-weighted Euclidean distance coming from $\mathbb{R}^{m\times n}$. We write $\mathrm{EDdegree}_\Lambda(\mathcal{L}_{\le r})$ for the $\Lambda$-weighted Euclidean distance degree of the variety $\mathcal{L}_{\le r}$. Thus $\mathrm{EDdegree}_\Lambda(\mathcal{L}_{\le r})$ is the number of complex critical points of the problem (1.3) for generic data matrices $U$. The importance of keeping track of the weights $\Lambda$ was highlighted in [6, Example 3.2], for the seemingly harmless situation when $\mathcal{L}$ is the subspace of all symmetric matrices in $\mathbb{R}^{n\times n}$. Our initial focus lies on the unit ED degree, when $\Lambda = \mathbf{1}$ is the all-one matrix, and on the generic ED degree, denoted $\mathrm{EDdegree}_{\mathrm{gen}}(\mathcal{L}_{\le r})$, when the weight matrix $\Lambda$ is generic. Choosing generic weights $\lambda_{ij}$ ensures that the variety $\mathcal{L}_{\le r}$ meets the isotropic quadric transversally, and hence allows us to apply formulas from intersection theory such as [6, Theorem 7.7].

This paper is organized as follows. In Section 2 we offer a computational study of our optimization problem (1.3) when the subspace $\mathcal{L}$ is generic of codimension $c$. Two cases are to be distinguished: either $\mathcal{L}$ is a vector space, defined by $c$ homogeneous linear equations in the matrix entries, or $\mathcal{L}$ is an affine space, defined by $c$ inhomogeneous linear equations.
We refer to these as the linear case and the affine case, respectively. We present Gröbner basis methods for computing all complex critical points, and we report on their performance. From the complex critical points, one identifies all real critical points and all local minima. In Section 3 we derive some explicit formulas for $\mathrm{EDdegree}_{\mathrm{gen}}(\mathcal{L}_{\le r})$ when $\mathcal{L}$ is generic. We cover the four cases that arise by pairing the affine case and the linear case with either unit weights or generic weights. Here we use techniques from algebraic geometry, including Chern classes and the analysis of singularities. In Section 4, we shift gears and focus on special matrices, namely Hankel matrices and Sylvester matrices. Those spaces $\mathcal{L}$ arise naturally from symmetric tensor decompositions and approximate GCD computations. These applications require the use of certain specific weight matrices $\Lambda$ other than $\mathbf{1}$.

We close the introduction with two examples that illustrate the concepts above.

Example 1. Let $m = n = 3$ and let $\mathcal{L} \subset \mathbb{R}^{3\times 3}$ be the 5-dimensional space of Hankel matrices:
$$X = \begin{pmatrix} x_0 & x_1 & x_2 \\ x_1 & x_2 & x_3 \\ x_2 & x_3 & x_4 \end{pmatrix}, \quad U = \begin{pmatrix} u_0 & u_1 & u_2 \\ u_1 & u_2 & u_3 \\ u_2 & u_3 & u_4 \end{pmatrix} \quad \text{and} \quad \Lambda = \begin{pmatrix} \lambda_0 & \lambda_1 & \lambda_2 \\ \lambda_1 & \lambda_2 & \lambda_3 \\ \lambda_2 & \lambda_3 & \lambda_4 \end{pmatrix}.$$
Our goal in (1.3) is to solve the following constrained optimization problem for $r = 1, 2$:
$$\text{minimize } \lambda_0(x_0-u_0)^2 + 2\lambda_1(x_1-u_1)^2 + 3\lambda_2(x_2-u_2)^2 + 2\lambda_3(x_3-u_3)^2 + \lambda_4(x_4-u_4)^2 \quad \text{subject to } \operatorname{rank}(X) \le r.$$
This can be stated as an unconstrained optimization problem. For instance, for rank $r = 1$, we get a one-to-one parametrization of $\mathcal{L}_{\le 1}$ by setting $x_i = s t^i$, and we seek to minimize
$$\lambda_0(s-u_0)^2 + 2\lambda_1(st-u_1)^2 + 3\lambda_2(st^2-u_2)^2 + 2\lambda_3(st^3-u_3)^2 + \lambda_4(st^4-u_4)^2.$$
The ED degree is the number of critical points with $t \neq 0$.
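The reduction of Example 1 to five coordinates can be checked mechanically. The Python sketch below builds $3\times 3$ Hankel matrices and confirms that the $\Lambda$-weighted Frobenius distance (1.1) equals the five-term objective with multiplicities $1, 2, 3, 2, 1$, and that $x_k = s t^k$ gives a matrix of rank at most one. It also checks, for the weights $\Omega$ and $\Theta$ defined just below, that the multiplicities turn $\Omega$ into the plain Euclidean metric on $\mathbb{R}^5$ and $\Theta$ into the weights $1, 4, 6, 4, 1$, which count how often each $x_k$ occurs among the 16 entries of a symmetric $2\times2\times2\times2$ tensor. The helper names and the numbers in `u`, `lam` and the parameters are hypothetical, chosen only for illustration; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import product

MULT = (1, 2, 3, 2, 1)  # how often x_k occurs in a 3x3 Hankel matrix

def hankel(h):
    """3x3 Hankel matrix whose (i, j) entry is h[i + j]."""
    return [[h[i + j] for j in range(3)] for i in range(3)]

def weighted_sq_dist(X, U, Lam):
    """||X - U||_Lambda^2 = sum_ij Lam_ij (X_ij - U_ij)^2, as in (1.1)."""
    return sum(Lam[i][j] * (X[i][j] - U[i][j]) ** 2
               for i, j in product(range(3), repeat=2))

def hankel_objective(x, u, lam):
    """Objective of Example 1 in the coordinates x_0, ..., x_4."""
    return sum(MULT[k] * lam[k] * (x[k] - u[k]) ** 2 for k in range(5))

def rank_one_point(s, t):
    """The parametrization x_k = s * t**k of rank-one Hankel matrices."""
    return [s * t ** k for k in range(5)]

def rank_at_most_one(X):
    """True if every 2x2 minor of the 3x3 matrix X vanishes."""
    return all(X[i][j] * X[k][l] == X[i][l] * X[k][j]
               for i, k in product(range(3), repeat=2)
               for j, l in product(range(3), repeat=2))

# Hypothetical data, weights and parameters, for illustration only.
u = [Fraction(v) for v in (3, -1, 4, 1, -5)]
lam = [Fraction(v) for v in (2, 7, 1, 8, 3)]
x = rank_one_point(Fraction(2), Fraction(-3, 2))
X = hankel(x)

# The weights Omega and Theta of Example 1, in Hankel coordinates.
omega = [Fraction(1), Fraction(1, 2), Fraction(1, 3), Fraction(1, 2), Fraction(1)]
theta = [Fraction(v) for v in (1, 2, 2, 2, 1)]
```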
We consider three weights:
$$\mathbf{1} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}, \quad \Omega = \begin{pmatrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/2 \\ 1/3 & 1/2 & 1 \end{pmatrix}, \quad \Theta = \begin{pmatrix} 1 & 2 & 2 \\ 2 & 2 & 2 \\ 2 & 2 & 1 \end{pmatrix}.$$
Here $\Omega$ gives the usual Euclidean metric when $\mathcal{L}$ is identified with $\mathbb{R}^5$, and $\Theta$ arises from identifying $\mathcal{L}$ with symmetric $2\times2\times2\times2$-tensors, as in Section 4. We compute
$$\mathrm{EDdegree}_{\mathbf{1}}(\mathcal{L}_{\le 1}) = 6, \quad \mathrm{EDdegree}_{\Omega}(\mathcal{L}_{\le 1}) = 10, \quad \mathrm{EDdegree}_{\Theta}(\mathcal{L}_{\le 1}) = 4,$$
$$\mathrm{EDdegree}_{\mathbf{1}}(\mathcal{L}_{\le 2}) = 9, \quad \mathrm{EDdegree}_{\Omega}(\mathcal{L}_{\le 2}) = 13, \quad \mathrm{EDdegree}_{\Theta}(\mathcal{L}_{\le 2}) = 7.$$
In both cases, $\Omega$ exhibits the generic behavior: $\mathrm{EDdegree}_{\mathrm{gen}}(\mathcal{L}_{\le r}) = \mathrm{EDdegree}_{\Omega}(\mathcal{L}_{\le r})$. See Sections 3 and 4 for larger Hankel matrices and formulas for their ED degrees. ♦

Example 2. Let $m = n = 3$, $r = 1$, but now take $\mathcal{L} = \mathbb{R}^{3\times 3}$, so this is just the weighted rank-one approximation problem for $3\times 3$-matrices. We know from [6, Example 7.10] that $\mathrm{EDdegree}_{\mathrm{gen}}(\mathcal{L}_{\le 1}) = 39$. We take a circulant data matrix and a circulant weight matrix:
$$U = \begin{pmatrix} -59 & 11 & 59 \\ 11 & 59 & -59 \\ 59 & -59 & 11 \end{pmatrix} \quad \text{and} \quad \Lambda = \begin{pmatrix} 9 & 6 & 1 \\ 6 & 1 & 9 \\ 1 & 9 & 6 \end{pmatrix}.$$
This instance has 39 critical points. Of these, 19 are real, and 7 are local minima:
$$\begin{pmatrix} 0.0826 & 2.7921 & -1.5452 \\ 2.7921 & 94.3235 & -52.2007 \\ -1.5452 & -52.2007 & 28.8890 \end{pmatrix}, \quad \begin{pmatrix} -52.2007 & 28.8890 & -1.5452 \\ 2.7921 & -1.5452 & 0.0826 \\ 94.3235 & -52.2007 & 2.7921 \end{pmatrix}, \quad \begin{pmatrix} -52.2007 & 2.7921 & 94.3235 \\ 28.8890 & -1.5452 & -52.2007 \\ -1.5452 & 0.0826 & 2.7921 \end{pmatrix},$$
$$\begin{pmatrix} -29.8794 & 36.2165 & -27.2599 \\ -32.7508 & 39.6968 & -29.8794 \\ 39.6968 & -48.1160 & 36.2165 \end{pmatrix}, \quad \begin{pmatrix} -48.1160 & 36.2165 & 39.6968 \\ 36.2165 & -27.2599 & -29.8794 \\ 39.6968 & -29.8794 & -32.7508 \end{pmatrix}, \quad \begin{pmatrix} -29.8794 & -32.7508 & 39.6968 \\ 36.2165 & 39.6968 & -48.1160 \\ -27.2599 & -29.8794 & 36.2165 \end{pmatrix},$$
$$\begin{pmatrix} -25.375 & -25.375 & -25.375 \\ -25.375 & -25.375 & -25.375 \\ -25.375 & -25.375 & -25.375 \end{pmatrix}.$$
The first three are the global minima.
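One of these critical points can be checked by hand: the last matrix listed is constant. At $X = c\,J$, with $J$ the all-one matrix and $c \neq 0$, the tangent space to the rank-one matrices is $\{v\mathbf{1}^T + \mathbf{1}w^T\}$, so $X$ is critical exactly when the gradient $2\,\Lambda * (X - U)$ has zero row sums and zero column sums. For the circulant data of Example 2, every row and column of $\Lambda$ sums to $16$ and every row and column of $\Lambda * U$ sums to $-406$, which forces $c = -406/16 = -203/8 = -25.375$. The Python sketch below (the function name is hypothetical) confirms this in exact arithmetic; it verifies criticality only, not the local-minimum property.

```python
from fractions import Fraction

# Data of Example 2.
U = [[-59, 11, 59], [11, 59, -59], [59, -59, 11]]
LAM = [[9, 6, 1], [6, 1, 9], [1, 9, 6]]

def constant_is_critical(c):
    """Is X = c * J (J the all-one 3x3 matrix, c != 0) a critical point of
    sum_ij LAM_ij (x_ij - u_ij)^2 on the variety of rank-one 3x3 matrices?

    The tangent space at c * J is {v 1^T + 1 w^T}.  The gradient is
    2 * LAM * (X - U) (Hadamard product); it is orthogonal to that tangent
    space exactly when all its row sums and column sums vanish."""
    G = [[LAM[i][j] * (c - U[i][j]) for j in range(3)] for i in range(3)]
    rows_vanish = all(sum(G[i]) == 0 for i in range(3))
    cols_vanish = all(sum(G[i][j] for i in range(3)) == 0 for j in range(3))
    return rows_vanish and cols_vanish

# Every row and column of LAM sums to 16 and of LAM * U to -406, so each
# row/column condition reads 16 c + 406 = 0, i.e. c is a weighted mean of U.
weighted_sum = sum(LAM[i][j] * U[i][j] for i in range(3) for j in range(3))
total_weight = sum(sum(row) for row in LAM)
c_star = Fraction(weighted_sum, total_weight)   # -1218/48 = -203/8
```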
The last matrix is the local minimum where the objective function has the largest value: note that each entry equals $-203/8$. The entries of the first six matrices are algebraic numbers of degree 10 over $\mathbb{Q}$. For instance, the two upper left entries $0.0826$ and $-48.1160$ are among the four real roots of the irreducible polynomial
$$\begin{aligned} &164466028468224\,x^{10} + 27858648335954688\,x^9 + 1602205386689376672\,x^8 + 7285836260028875412\,x^7 \\ &- 2198728936046680414272\,x^6 - 14854532690380098143152\,x^5 + 2688673091228371095762316\,x^4 \\ &+ 44612094455115888622678587\,x^3 - 41350080445712457319337106\,x^2 + 27039129499043116889674775\,x \\ &- 1977632463563766878765625. \end{aligned}$$
Thus, the critical ideal in $\mathbb{Q}[x_{11}, x_{12}, \ldots, x_{33}]$ is not prime. It is the intersection of six maximal ideals. Their degrees over $\mathbb{Q}$ are $1, 2, 6, 10, 10, 10$, for a total of $39 = \mathrm{EDdegree}_{\mathrm{gen}}(\mathcal{L}_{\le 1})$. ♦

William Rey [22] reports on numerical experiments with the optimization problem (1.1), and he asks whether the number of local minima is bounded above by $\min(m, n)$. Our Example 2 gives a negative answer: the number of local minima can exceed $\min(m, n)$ (here $7 > 3$). This result highlights the value of our exact algebraic methods for practitioners of optimization.

2. Gröbner Bases. The critical points of the low-rank approximation problem (1.3) can be computed as the solution set of a system of polynomial equations. In this section we derive these equations, and we demonstrate how to solve a range of instances using current Gröbner basis techniques. Here, our emphasis lies on the case when $\mathcal{L}$ is a generic subspace, either linear or affine.

Starting with the linear case, let $\{L_1, L_2, \ldots, L_s\}$ be a basis of $\mathcal{L}^\perp$, the space of linear forms on $\mathbb{R}^{m\times n}$ that vanish on $\mathcal{L}$. Thus $\operatorname{codim}(\mathcal{L}) = s$, each derivative $\partial L_k/\partial x_{ij}$ is a constant, and $\mathcal{L} = \{X \in \mathbb{R}^{m\times n} : L_1(X) = \cdots = L_s(X) = 0\}$.
The case when $\mathcal{L}$ is an affine space can be treated with the same notation if we take each $L_i$ to be a linear form plus a constant.

The following implicit formulation of the critical equations is a variation on [6, (2.1)]. We begin with the case $m = n = r + 1$. Let $D \in \mathbb{Z}[x_{11}, \ldots, x_{nn}]$ denote the determinant of the $n \times n$-matrix $X = (x_{ij})$. Given a data matrix $U = (u_{ij}) \in \mathbb{R}^{n\times n}$, the critical points of $\sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_{ij}(x_{ij} - u_{ij})^2$ on the determinantal hypersurface $\mathcal{L}_{\le n-1} = \{X \in \mathcal{L} : D(X) = 0\}$ satisfy the following conditions, where the matrix in the rank condition has $s + 2$ rows and $n^2$ columns:
$$D(X) = 0, \quad L_1(X) = \cdots = L_s(X) = 0, \quad \operatorname{rank}\begin{pmatrix} \partial D/\partial x_{11} & \cdots & \partial D/\partial x_{nn} \\ \partial L_1/\partial x_{11} & \cdots & \partial L_1/\partial x_{nn} \\ \vdots & & \vdots \\ \partial L_s/\partial x_{11} & \cdots & \partial L_s/\partial x_{nn} \\ \lambda_{11}(x_{11} - u_{11}) & \cdots & \lambda_{nn}(x_{nn} - u_{nn}) \end{pmatrix} \le s + 1.$$
Any singular point of $\mathcal{L}_{\le n-1}$ also satisfies these conditions. The rank condition on the Jacobian matrix can be modeled by introducing Lagrange multipliers $z_0, z_1, \ldots, z_s$. These are new variables. We now consider the following polynomial system in $n^2 + s + 1$ variables:
$$D(X) = 0, \quad L_1(X) = \cdots = L_s(X) = 0, \quad \begin{pmatrix} z_0 & \cdots & z_s & 1 \end{pmatrix} \cdot \begin{pmatrix} \partial D/\partial x_{11} & \cdots & \partial D/\partial x_{nn} \\ \partial L_1/\partial x_{11} & \cdots & \partial L_1/\partial x_{nn} \\ \vdots & & \vdots \\ \partial L_s/\partial x_{11} & \cdots & \partial L_s/\partial x_{nn} \\ \lambda_{11}(x_{11} - u_{11}) & \cdots & \lambda_{nn}(x_{nn} - u_{nn}) \end{pmatrix} = \begin{pmatrix} 0 & \cdots & 0 \end{pmatrix}. \tag{2.1}$$
Table 2.1 shows the number of complex solutions to these equations. These numbers are obtained from the formulas in Section 3. We verified them using Gröbner bases. We observe that Table 2.1 has the following remarkable properties:
• There is a shift between the ED degrees of affine and linear sections for $s \ge (n-1)^2$. This phenomenon will be explained in Proposition 3.1.
• For $\Lambda$ general, the third block of columns (linear entries) is constant for $s \le n(n-2)$ and the fourth one (affine entries) is constant for $s \le n(n-2) + 1$. This is explained in Corollaries 3.2 and 3.5.
• The differences between the first and the third block of columns (both with linear entries) equal those between the second and the fourth one (both with affine entries). This gap is expressed (conjecturally) by formula (3.6).

We prove the correctness of the formulation (2.1) and then discuss our computations.

Proposition 2.1. For a generic linear (or affine) space $\mathcal{L}$ of codimension $s$ and for a generic data matrix $U = (u_{ij})$ in $\mathcal{L}$, the solutions $(X, z)$ of the polynomial system (2.1) correspond to the critical points $X$ of the optimization problem (1.3) for square matrices of corank one.

Proof. We prove this for linear spaces $\mathcal{L}$. The argument is similar when $\mathcal{L}$ is an affine space. Any solution of the system (2.1) corresponds to a point of $\mathcal{L}$ where the Jacobian matrix of $(D, L_1, \ldots, L_s, \|X - U\|_\Lambda^2)$ has a rank defect. There are two types of such points: the critical points of the distance function and singular points on the determinantal variety. Hence it suffices to prove that no point in the singular locus corresponds to a solution of (2.1). The matrix $U = (u_{ij})_{1 \le i,j \le n}$ was assumed to be generic, so it has rank $n$ since $\mathcal{L}$ is also generic. If $X$ is a singular point of the linear section of the variety defined by $D(X) = L_1(X) = \cdots = L_s(X) = 0$, then there exists $(y_0, y_1, \ldots, y_s)$ with $y_0 \neq 0$ such that
$$\begin{pmatrix} y_0 & y_1 & \cdots & y_s \end{pmatrix} \cdot \begin{pmatrix} \partial D/\partial x_{11} & \cdots & \partial D/\partial x_{nn} \\ \partial L_1/\partial x_{11} & \cdots & \partial L_1/\partial x_{nn} \\ \vdots & & \vdots \\ \partial L_s/\partial x_{11} & \cdots & \partial L_s/\partial x_{nn} \end{pmatrix} = \begin{pmatrix} 0 & \cdots & 0 \end{pmatrix}.$$
Let us assume by contradiction that $X$ extends to a solution $(X, z)$ of (2.1).
Then
$$\begin{pmatrix} 0 & z_1 - \dfrac{y_1 z_0}{y_0} & \cdots & z_s - \dfrac{y_s z_0}{y_0} & 1 \end{pmatrix} \cdot \begin{pmatrix} \partial D/\partial x_{11} & \cdots & \partial D/\partial x_{nn} \\ \partial L_1/\partial x_{11} & \cdots & \partial L_1/\partial x_{nn} \\ \vdots & & \vdots \\ \partial L_s/\partial x_{11} & \cdots & \partial L_s/\partial x_{nn} \\ \lambda_{11}(x_{11} - u_{11}) & \cdots & \lambda_{nn}(x_{nn} - u_{nn}) \end{pmatrix} = \begin{pmatrix} 0 & \cdots & 0 \end{pmatrix}.$$
This means that $X - U$ belongs to $\mathcal{L}$ and $\Lambda * (X - U)$ belongs to $\mathcal{L}^\perp$. Here $*$ denotes the Hadamard (coordinatewise) product of two matrices. Hence the scalar product of $X - U$ and $\Lambda * (X - U)$ is zero. Since all coordinates live in $\mathbb{R}$, these conditions imply $\|X - U\|_\Lambda^2 = 0$, and hence $X = U$. We get a contradiction since $U$ has full rank, whereas $D(X) = 0$.

Λ = 1: linear entries (left block), affine entries (right block)
  s   n=2   n=3   n=4    n=5       n=2   n=3   n=4    n=5
  0     2     3     4      5         2     3     4      5
  1     4    15    28     45         6    15    28     45
  2     2    31    92    205         4    31    92    205
  3     0    39   188    605         2    39   188    605
  4          33   260   1221              39   260   1221
  5          21   284   1805              33   284   1805
  6           9   284   2125              21   284   2125
  7           3   284   2205               9   284   2205
  8           0   284   2205               3   284   2205
  9                264   2205                   284   2205
 10                204   2205                   264   2205
 11                120   2205                   204   2205
 12                 52   2205                   120   2205
 13                 16   2205                    52   2205
 14                  4   2205                    16   2205
 15                  0   2205                     4   2205

Λ generic: linear entries (left block), affine entries (right block)
  s   n=2   n=3   n=4    n=5       n=2   n=3   n=4    n=5
  0     6    39   284   2205         6    39   284   2205
  1     4    39   284   2205         6    39   284   2205
  2     2    39   284   2205         4    39   284   2205
  3     0    39   284   2205         2    39   284   2205
  4          33   284   2205              39   284   2205
  5          21   284   2205              33   284   2205
  6           9   284   2205              21   284   2205
  7           3   284   2205               9   284   2205
  8           0   284   2205               3   284   2205
  9                264   2205                   284   2205
 10                204   2205                   264   2205
 11                120   2205                   204   2205
 12                 52   2205                   120   2205
 13                 16   2205                    52   2205
 14                  4   2205                    16   2205
 15                  0   2205                     4   2205

Table 2.1. The ED degree for the determinant of an $n\times n$-matrix with linear or affine entries.

The values of $\mathrm{EDdegree}_\Lambda(\mathcal{L}_{\le n-1})$ in Table 2.1 can be verified computationally with the formulation (2.1).
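For the smallest case $n = 2$, $s = 0$, the conditions behind (2.1) can be written out explicitly: $D = x_{11}x_{22} - x_{12}x_{21}$ and $\nabla D = (x_{22}, -x_{21}, -x_{12}, x_{11})$. The Python sketch below (all function and variable names are hypothetical) tests both the rank condition and the Lagrange form (2.1) for $U = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ and $\Lambda = \mathbf{1}$ in exact arithmetic. The two critical points $X_1, X_2$ come from the eigenpairs of $U$, matching the entry 2 for $n = 2$, $s = 0$ in the $\Lambda = \mathbf{1}$ block of Table 2.1. The zero matrix, which is the singular point of $\{D = 0\}$, passes the rank test but not the Lagrange system: there $\nabla D = 0$, so the fixed coefficient 1 would force $\Lambda * (X - U) = 0$, i.e. $X = U$. This is the mechanism used in the proof of Proposition 2.1.

```python
from fractions import Fraction
from itertools import combinations

def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def grad_det2(X):
    """Gradient of D = x11*x22 - x12*x21, ordered (x11, x12, x21, x22)."""
    return [X[1][1], -X[1][0], -X[0][1], X[0][0]]

def weighted_residual(X, U, Lam):
    """Last row of the matrix in (2.1): lambda_ij * (x_ij - u_ij), same order."""
    return [Lam[i][j] * (X[i][j] - U[i][j]) for i in range(2) for j in range(2)]

def rank_condition(X, U, Lam):
    """D(X) = 0 and the 2 x 4 matrix [grad D; Lam*(X - U)] has rank <= s + 1 = 1."""
    g, h = grad_det2(X), weighted_residual(X, U, Lam)
    return det2(X) == 0 and all(g[a] * h[b] - g[b] * h[a] == 0
                                for a, b in combinations(range(4), 2))

def lagrange_system(X, U, Lam):
    """System (2.1) with n = 2, s = 0: D(X) = 0 and z0 * grad D + Lam*(X - U) = 0
    for some scalar z0 (the coefficient of the last row is fixed to 1)."""
    if det2(X) != 0:
        return False
    g, h = grad_det2(X), weighted_residual(X, U, Lam)
    k = next((k for k in range(4) if g[k] != 0), None)
    if k is None:                       # grad D = 0: X is a singular point
        return all(v == 0 for v in h)
    z0 = -Fraction(h[k]) / g[k]         # the only possible multiplier
    return all(z0 * g[a] + h[a] == 0 for a in range(4))

F = Fraction
U = [[F(2), F(1)], [F(1), F(2)]]
ONES = [[1, 1], [1, 1]]                           # Lambda = 1
X1 = [[F(3, 2), F(3, 2)], [F(3, 2), F(3, 2)]]     # eigenvalue-3 part of U
X2 = [[F(1, 2), F(-1, 2)], [F(-1, 2), F(1, 2)]]   # eigenvalue-1 part of U
X3 = [[F(2), F(1)], [F(2), F(1)]]                 # rank one, but not critical
ZERO = [[F(0), F(0)], [F(0), F(0)]]               # singular point of {D = 0}
```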
We used the implementation of Faugère's Gröbner basis algorithm F5 [8] in the maple package FGb. Computing Gröbner bases for (2.1) was fairly easy for $n \le 4$, but difficult already for $n = 5$. For each of the cases in Table 2.1, we computed the ED degree by running FGb over the finite field with 65521 elements. However, due to substantial coefficient growth, this did not work over the field $\mathbb{Q}$ of rational numbers. Hence, to actually compute all critical points over $\mathbb{C}$ and hence all local minima over $\mathbb{R}$, even for $n = 4$, a better formulation was required. In what follows we shall present two such improved formulations.

Duality plays a key role in the computation of the critical points of the Euclidean distance and was investigated in [6, §5]. In what follows, we compute the critical points of the weighted Euclidean distance of the determinant by using this duality. In the following statement we are using the standing hypothesis that all $\lambda_{ij}$ are non-zero.

Proposition 2.2. Let $U$ be a generic $m \times n$ matrix with $m \le n$, let $\Lambda$ be a weight matrix, and fix an integer $r \le \min(m, n)$. Then there is a bijection between the critical points of
(1) $Q(X) = \sum_{i,j} \lambda_{ij}(x_{ij} - u_{ij})^2$ on the variety $\mathbb{C}^{m\times n}_{\le m-r}$ of corank $r$ matrices $X$, and
(2) $Q_{\mathrm{dual}}(Y) = \sum_{i,j} (y_{ij} - \lambda_{ij}u_{ij})^2/\lambda_{ij}$ on the variety $\mathbb{C}^{m\times n}_{\le r}$ of rank $r$ matrices $Y$.
For each critical point $X$ of (1), the corresponding critical point $Y$ of (2) equals $Y = \Lambda * U - \Lambda * X$, where $*$ denotes the Hadamard product. In particular, if $U$ has real entries, then the bijection interchanges the real critical points of (1) and of (2).

Proof. The critical points of (1) correspond to matrices $X$ such that the Hadamard product $\Lambda * (U - X)$ is perpendicular to the tangent space at $X$ of the variety $\mathbb{C}^{m\times n}_{\le m-r}$ of corank $r$ matrices. Recall, e.g.,
from [6, §5], that the dual variety to $\mathbb{C}^{m\times n}_{\le m-r}$ is the variety $\mathbb{C}^{m\times n}_{\le r}$ of rank $r$ matrices. Hence, the critical points in (1) can be found by solving the linear equation $Y = \Lambda * (U - X)$ on the conormal variety. That conormal variety is the set of all pairs $(X, Y)$ such that $X \in \mathbb{C}^{m\times n}_{\le m-r}$, $Y \in \mathbb{C}^{m\times n}_{\le r}$, $X^t \cdot Y = 0$, and $X \cdot Y^t = 0$. We can now express $X$ in terms of $Y$ and the parameters by writing $X = \Lambda^{*-1} * (\Lambda * U - Y)$, where $\Lambda^{*-1}$ denotes the Hadamard (coordinatewise) inverse of the weight matrix $\Lambda$. Using biduality, this means that $X = \Lambda^{*-1} * (\Lambda * U - Y)$ is perpendicular to the tangent space at $Y$ of the variety $\mathbb{C}^{m\times n}_{\le r}$. This is equivalent to the statement that $Y$ is a critical point of (2) on $\mathbb{C}^{m\times n}_{\le r}$.

In both Propositions 2.1 and 2.2, it is assumed that the given matrix $U$ is generic. Here the term generic is meant in the usual sense of algebraic geometry: $U$ lies in the complement of an algebraic hypersurface. In particular, that complement is dense in $\mathbb{R}^{m\times n}$, so $U$ will be generic with probability one when drawn from a probability distribution that has a density on $\mathbb{R}^{m\times n}$ (such as a Gaussian distribution). However, an exact characterization of genericity is difficult. The polynomial that defines the aforementioned hypersurface is the ED discriminant. As can be seen in [6, §7], this is a very large polynomial of high degree, and we will rarely be able to identify it in an explicit way.

Proposition 2.2 shows that weighted low-rank approximation can be solved by the dual problem. We focus now on the corank 1 case (whose dual problem is rank 1 approximation). For this, we use the parametrization of $n \times n$ matrices of rank 1 by
$$(t_1, \ldots, t_n, z_1, \ldots, z_{n-1}) \mapsto \begin{pmatrix} t_1 & t_1 z_1 & \cdots & t_1 z_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ t_n & t_n z_1 & \cdots & t_n z_{n-1} \end{pmatrix}. \tag{2.2}$$

Remark 1. This parametrization is not surjective: the rank 1 matrices whose first column is zero are missing.
This is not an issue when $U$ and $\mathcal{L}$ are generic, since in that case all critical points are in the image of the parametrization. However, for specific $U$ or $\mathcal{L}$, if some of the critical points are missing, they can be computed by choosing $n$ such parametrizations whose ranges cover all rank 1 matrices. This multiplies the computation time by $n$. Our a priori computation of the ED degree is also useful to overcome these difficulties. Suppose the expected number of critical points is known. Then, as soon as the parametrizations tried so far for the given data $(U, \mathcal{L})$ have produced that many distinct critical points, the user is guaranteed that all critical points have been found.

The parametrization (2.2) expresses the dual problem (for corank one) as an unconstrained optimization problem in $2n - 1$ variables:
$$\text{Maximize } Q_{\mathrm{dual}} = \sum_{1 \le i,j \le n} \frac{1}{\lambda_{ij}}(y_{ij} - \lambda_{ij}u_{ij})^2, \quad \text{where } y_{i1} = t_i \text{ and } y_{ij} = t_i z_{j-1}. \tag{2.3}$$
Here, "maximize" is used in an unconventional way: what we seek is the critical point furthest from $\Lambda * U$ in the metric of $Q_{\mathrm{dual}}$. That critical point need not be a local maximum; see e.g. [6, Figure 4]. We compute the critical points for (2.3) by applying Gröbner bases to the equations $\partial Q_{\mathrm{dual}}/\partial t_i = \partial Q_{\mathrm{dual}}/\partial z_j = 0$ for $i \in \{1, \ldots, n\}$ and $j \in \{1, \ldots, n-1\}$. The critical points of the primal problem are recovered from the formula $Y = \Lambda * (U - X)$, that is, $X = U - \Lambda^{*-1} * Y$. This concludes our discussion of square matrices of rank 1 or corank 1.

We next consider the general case of rectangular matrices of format $m \times n$ with general linear or affine entries. We assume $r \le m \le n$ and $s \le mn$. Let $M$ be a complex $m \times n$-matrix of rank $r$. Then $M$ is a smooth point in the variety $\mathbb{C}^{m\times n}_{\le r}$ of matrices of rank $\le r$. Let $\mathrm{Ker}_L(M)$ and $\mathrm{Ker}_R(M)$ denote the left and right kernels of $M$, respectively.
The normal space of $\mathbb{C}^{m\times n}_{\le r}$ at $M$ has dimension $(m-r)(n-r)$, and it equals $\mathrm{Ker}_L(M) \otimes \mathrm{Ker}_R(M) \subset \mathbb{C}^{m\times n}$ [11, Chapter 6]. Its orthogonal complement is the tangent space at $M$, which has dimension $rm + rn - r^2$. In order to construct a polynomial system whose solutions are the critical points of $X \mapsto \|X - U\|_\Lambda^2$ on the smooth locus of $\mathcal{L}_{\le r}$, we introduce two matrices of unknowns, of formats $m \times (m-r)$ and $n \times (n-r)$:
$$Y = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \\ y_{1,1} & \cdots & y_{1,m-r} \\ \vdots & & \vdots \\ y_{r,1} & \cdots & y_{r,m-r} \end{pmatrix} \quad \text{and} \quad Z = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \\ z_{1,1} & \cdots & z_{1,n-r} \\ \vdots & & \vdots \\ z_{r,1} & \cdots & z_{r,n-r} \end{pmatrix}.$$
For $i \in \{1, \ldots, m-r\}$ and $j \in \{1, \ldots, n-r\}$, let $N^{((m-r)(j-1)+i)}$ be the rank 1 matrix which is the product of the $i$th column of $Y$ and the $j$th row of $Z^\intercal$. We consider
$$Y^\intercal \cdot X = 0, \quad X \cdot Z = 0, \quad L_1(X) = \cdots = L_s(X) = 0,$$
$$\begin{pmatrix} w_1 & \cdots & w_{(m-r)(n-r)+s} & 1 \end{pmatrix} \cdot \begin{pmatrix} N^{(1)}_{11} & \cdots & N^{(1)}_{mn} \\ \vdots & & \vdots \\ N^{((m-r)(n-r))}_{11} & \cdots & N^{((m-r)(n-r))}_{mn} \\ \partial L_1/\partial x_{11} & \cdots & \partial L_1/\partial x_{mn} \\ \vdots & & \vdots \\ \partial L_s/\partial x_{11} & \cdots & \partial L_s/\partial x_{mn} \\ \lambda_{11}(x_{11} - u_{11}) & \cdots & \lambda_{mn}(x_{mn} - u_{mn}) \end{pmatrix} = 0. \tag{2.4}$$
The rank condition on the matrix in (2.4) comes from the fact that $M \in \mathcal{L}_{\le r}$ is a critical point if the gradient of the distance function at $M$ belongs to the normal space of $\mathcal{L}_{\le r}$ at $M$. The first $(m-r)(n-r) + s$ rows of the matrix span the normal space of $\mathcal{L}_{\le r}$ at a smooth point. This formulation avoids saturating by the singular locus, which is often too costly.

Proposition 2.3. For a generic affine space $\mathcal{L}$ of codimension $s$ and a generic matrix $U$ in $\mathcal{L}$, the polynomial system (2.4) has finitely many complex solutions, which correspond to the critical points of the weighted Euclidean distance function on the smooth locus of $\mathcal{L}_{\le r}$.

Proof. This is derived from [6, Lemma 2.1]. It is analogous to Proposition 2.1.
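The statements about the normal space can be checked on a small chart. The Python sketch below takes $m = 3$, $n = 4$, $r = 2$, builds $Y$ and $Z$ with identity blocks on top as in (2.4), and constructs a rank-$r$ matrix $X = AB^T$ with $Y^\intercal X = 0$ and $XZ = 0$. It then checks that every $N^{(k)}$ is Frobenius-orthogonal to the tangent space $\{AP + QB^T\}$ at $X$, and that the two spaces have dimensions $(m-r)(n-r)$ and $rm + rn - r^2$, adding up to $mn$. The helper names and the chart values `Yb`, `Zb` are hypothetical; all arithmetic is exact over $\mathbb{Q}$.

```python
from fractions import Fraction

def identity(k):
    return [[Fraction(int(i == j)) for j in range(k)] for i in range(k)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def outer(a, b):
    return [[x * y for y in b] for x in a]

def flatten(A):
    return [v for row in A for v in row]

def frob(A, B):
    """Frobenius inner product sum_ij A_ij * B_ij."""
    return sum(x * y for x, y in zip(flatten(A), flatten(B)))

def unit(size, idx):
    return [Fraction(int(t == idx)) for t in range(size)]

def rank(vectors):
    """Rank over Q of a list of equal-length vectors (Gaussian elimination)."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    rk = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                f = rows[i][col] / rows[rk][col]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def chart(Yb, Zb, r):
    """Y = [I; Yb] (m x (m-r)) and Z = [I; Zb] (n x (n-r)) as in (2.4), plus a rank-r
    X = A B^T with A = [-Yb^T; I_r], B = [-Zb^T; I_r], so that Y^T X = 0 and X Z = 0."""
    Yb = [[Fraction(v) for v in row] for row in Yb]
    Zb = [[Fraction(v) for v in row] for row in Zb]
    Y = identity(len(Yb[0])) + Yb
    Z = identity(len(Zb[0])) + Zb
    A = [[-v for v in row] for row in transpose(Yb)] + identity(r)
    B = [[-v for v in row] for row in transpose(Zb)] + identity(r)
    return Y, Z, A, B, matmul(A, transpose(B))

m, n, r = 3, 4, 2
Y, Z, A, B, X = chart([[1], [-2]], [[3, 0], [1, -1]], r)   # hypothetical chart values

# N^((m-r)(j-1)+i): column i of Y times row j of Z^T (j outer loop, i inner loop).
normal = [outer([row[i] for row in Y], [row[j] for row in Z])
          for j in range(n - r) for i in range(m - r)]
# The tangent space at X = A B^T is {A P + Q B^T}, spanned by these rank-one matrices.
tangent = ([outer([row[k] for row in A], unit(n, l)) for k in range(r) for l in range(n)]
           + [outer(unit(m, i), [row[k] for row in B]) for i in range(m) for k in range(r)])
```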
As in the corank 1 case, for special data $(U, \mathcal{L})$ some critical points may be missed because our formulation computes only the critical points in a dense open subset of $\mathcal{L}_{\le r}$. However, the same fix as in Remark 1 works here. We can redo the computations in any of the $\binom{n}{r}\binom{m}{r}$ charts corresponding to the invertibility of pairs of square submatrices of $Y$ and $Z$.

We next discuss our computational experience with Gröbner bases. In Table 2.2, we compare the efficiency of the different approaches on a specific problem: computing the weighted rank 3 approximation of a $4 \times 4$ matrix. The experimental setting is the following: we consider a $4 \times 4$ matrix $U$ with integer entries picked uniformly at random in $\{-100, \ldots, 100\}$ and a random weight matrix $\Lambda$ with positive integer entries chosen at random in $\{1, \ldots, 20\}$. By Table 2.1, the generic ED degree is 284 and the ED degree for $\Lambda = \mathbf{1}$ is 4. We report in Table 2.2 the timings for computing a lexicographical Gröbner basis with the maple package FGb [8]. Once a Gröbner basis is known, isolation techniques may be used to obtain the real roots. The maple package fgbrs provides implementations of such methods.

                          Determinant    Parametric    Normal space    Normal space
                          primal (2.1)   dual (2.3)    primal (2.4)    dual (2.4)
Λ generic, GF(65521)      5s             1.3s          6s              8.6s
Λ generic, over Q         > 1 day        891s          1327s           927s
Λ = 1, over Q             0.3s           0.2s          0.4s            0.5s

Table 2.2. Symbolic computation of the weighted rank 3 approximations of a 4×4 matrix.

We examine three scenarios. In the first row, the computation is performed over a finite field. This gives information about the algebraic difficulty of the problem: there is no coefficient growth, and the timings indicate the number of arithmetic operations in Gröbner bases algorithms. However, finding local minima requires computing over $\mathbb{Q}$.
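The role of the finite-field row can be illustrated with a much simpler computation than a Gröbner basis. The Python sketch below is only an analogy, not what FGb does: it computes one determinant by Gaussian elimination twice, once over $\mathbb{Q}$ with exact fractions and once in $\mathbb{F}_{65521}$, where every stored value is a residue below $65521$. The modular run recovers the exact answer modulo $p$ using bounded-size arithmetic, while the rational run must carry integers and fractions of unbounded size. The function names are hypothetical, and the Vandermonde test matrix is an arbitrary choice whose determinant $1!\,2!\cdots 7!$ is known in closed form.

```python
from fractions import Fraction

P = 65521  # the prime used for the GF(65521) runs in the text

def det_rational(M):
    """Determinant over Q by Gaussian elimination with exact fractions."""
    A = [[Fraction(v) for v in row] for row in M]
    n, det = len(A), Fraction(1)
    for c in range(n):
        piv = next((i for i in range(c, n) if A[i][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            A[c], A[piv] = A[piv], A[c]
            det = -det
        det *= A[c][c]
        for i in range(c + 1, n):
            f = A[i][c] / A[c][c]
            A[i] = [x - f * y for x, y in zip(A[i], A[c])]
    return det

def det_mod_p(M, p=P):
    """Gaussian elimination in GF(p); every stored value is a residue in [0, p)."""
    A = [[v % p for v in row] for row in M]
    n, det = len(A), 1
    for c in range(n):
        piv = next((i for i in range(c, n) if A[i][c] != 0), None)
        if piv is None:
            return 0
        if piv != c:
            A[c], A[piv] = A[piv], A[c]
            det = -det % p
        det = det * A[c][c] % p
        inv = pow(A[c][c], p - 2, p)        # inverse by Fermat's little theorem
        for i in range(c + 1, n):
            f = A[i][c] * inv % p
            A[i] = [(x - f * y) % p for x, y in zip(A[i], A[c])]
    return det

# Vandermonde matrix on the nodes 1..8; its determinant is 1! * 2! * ... * 7!.
V = [[x ** j for j in range(8)] for x in range(1, 9)]
d_Q = det_rational(V)
d_p = det_mod_p(V)
```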
In rows 2 and 3 of Table 2.2, we compare the case of generic weights with the unweighted case (1.2) that corresponds to the singular value decomposition ($\Lambda = \mathbf{1}$). The dual problem is easiest to solve, in particular with the unconstrained formulation (2.3). Note that, for $s \ge 1$, such an unconstrained formulation is not available, since $\mathcal{L}_{\le r}$ is generally not a unirational variety.

In Table 2.3, we report on some Gröbner basis computations with the maple package FGb for $\Lambda = \mathbf{1}$. Here we used the formulation (2.4). Each entry lists the ED degree first, followed by the time, measured in seconds, for computing the graded reverse lexicographic Gröbner basis. The first timing is obtained by performing the computation over the finite field GF(65521); the second one is obtained by computing over the field of rationals $\mathbb{Q}$. The symbol "–" means that we did not obtain the Gröbner basis after seven days of computation.

  s    (m,n,r) = (4,4,2)     (3,4,2)               (3,5,2)
  0    4 / 0.42s / 1.8s      3 / 0.2s / 0.3s       3 / 0.3s / 0.5s
  1    54 / 1.93s / 744s     15 / 0.3s / 7.4s      15 / 0.5s / 16s
  2    230 / 52.7s / –       43 / 1.2s / 132s      43 / 2.1s / 400s
  3    582 / 349.2s / –      71 / 1.5s / 1120s     87 / 7.1s / 6038s
  4    998 / 1474s / –       83 / 2.2s / 2696s     127 / 16s / 59091s
  5    1250 / 2739s / –      83 / 2.3s / 4846s     143 / 20s / 160094s
  6    1250 / 2961s / –      83 / 2.1s / 5764s     143 / 20s / 68164s
  7    1074 / 1816s / –      73 / 2.2s / 4570s     143 / 20s / 99208s
  8    818 / 821s / –        49 / 1.0s / 1619s     143 / 20s / 163532s
  9    532 / 349s / –        22 / 0.8s / 350s      128 / 18s / 263586s
 10    276 / 92s / –         6 / 0.3s / 6.4s       88 / 13s / 67460s
 11    100 / 42s / 450988s                         40 / 1.9s / 4568s
 12    20 / 1.4s / 1970s                           10 / 0.8s / 114s

Table 2.3. Symbolic computations for affine sections of determinantal varieties with $\Lambda = \mathbf{1}$.

An important observation in Table 2.3 is the correlation between the reported running times and the values of $\mathrm{EDdegree}_{\mathbf{1}}$. The former tell us how many arithmetic operations are needed to find a Gröbner basis. This suggests that the ED degree is an accurate measure for the complexity of solving low-rank approximation problems with symbolic algorithms, and it serves as a key motivation for computing ED degrees using advanced tools from algebraic geometry. This will be carried out in the next section, both for $\Lambda$ generic and for $\Lambda = \mathbf{1}$. In particular, we shall arrive at theoretical explanations for the ED degrees in Tables 2.1 and 2.3.

3. Algebraic Geometry. The study of ED degrees for algebraic varieties was started in [6]. This section builds on and further develops the geometric theory in that paper. We focus on the low rank approximation problem (1.3), and we derive general formulas for the ED degrees in Tables 2.1 and 2.3. We recall that an affine variety $X \subset \mathbb{C}^{N+1}$ is an affine cone if $x \in X$ implies $tx \in X$ for every $t \in \mathbb{C}$. The variety of $m \times n$-matrices of rank $\le r$ is an affine cone.
If X ⊂ C N +1 is an a ffine cone, then the co rresp onding pro jective v ar ie ty P X ⊂ P N is well de fined. The ED degr ee of P X is the ED degree of its affine cone X . The following prop osition explains the shift betw een the third a nd fourth column of T able 2.1. Mor e generally , it shows that we can r estrict the analysis to linear s e ctions, s ince the E D degree (for gener ic weigh ts) in the affine cas e can be deduced from the linear ca se. Proposition 3.1. L et X ⊂ C N +1 b e an affine c one, let A s (r esp. L s ) b e a generic affine (r esp. line ar) subsp ac e of c o dimension s ≥ 1 in C N +1 . Then EDdegree gen ( X ∩ A s ) = E Ddegree gen ( X ∩ L s − 1 ) . (3.1) Pr o of . Let X ⊂ P N +1 be the pro jective closure o f X . F rom [6, The o rem 6.1 1], we hav e EDdegree gen ( X ) = EDdegree gen ( X ), since the tr a nsversality assumptions in that r esult are satisfied for genera l weights. F rom the equality X ∩ A s = X ∩ L s , we EXACT STRUCTURED LO W-RANK APPRO XIMA TION 11 conclude EDdegree gen ( X ∩ A s ) = EDdegree gen ( X ∩ L s ) = EDdegree gen ( P X ∩ L s − 1 ) = EDdegree gen ( X ∩ L s − 1 ). Here, the sec o nd equality follows from P X = X ∩ L 1 . Consider a pro jective v ariety X embedded in P N with a generic sys tem of co ordi- nates. It w as shown in [6, Theo rem 5.4] that EDdeg ree gen ( X ) is the sum of the degre es of the po lar classe s δ i ( X ). Here, δ i ( X ) denotes the deg ree of the p olar class of X in dimension i , as in [13]. Moreov er, if L s is a generic linear subspa ce o f co dimensio n s in P N then δ i ( X ∩ L s ) = δ i + s ( X ) by [6, Corolla ry 6.4]. W e ca ll s - th se ctional ED de gr e e o f X the num ber EDdegree gen ( X ∩ L s ). W e denote by X ∗ the dual v ar iety o f X , as in [6, § 5], and already see n in the pr o of of Prop os itio n 2.2. Corollar y 3.2. The s -th sectiona l ED degree of X is expr ess e d in terms of p olar classes as EDdegree gen ( X ∩ L s ) = X ℓ ≥ s δ ℓ ( X ) . 
If $s \le \operatorname{codim}(X^*) - 1$, then $X$ and $X \cap L_s$ have the same generic ED degree.

Proof. This follows from results in Sections 5 and 6 of [6]. In order to compute $\mathrm{EDdegree}_{\mathrm{gen}}(X \cap L_s)$, we have to sum $\delta_i(X)$ for $i \ge s$. However, it is known that $\delta_i(X) = 0$ if $i \le \operatorname{codim}(X^*) - 2$.

A special role in [6] is played by the isotropic quadric $Q = V(x_0^2 + x_1^2 + \cdots + x_N^2)$ in $\mathbb{P}^N$. If $X$ is smooth and transversal to $Q$, then [6, Theorem 5.8] gives an explicit formula for the ED degree in terms of the Chern classes $c_i(X)$ of $X$. A thorough treatment of Chern classes can be found in [10]. Readers interested mainly in the applications in this paper may consult the basics provided in [6]. By combining [6, Theorem 5.8] with Corollary 3.2, we obtain the following result.

Theorem 3.3. Let $X \subset \mathbb{P}^N$ be a smooth projective variety of dimension $M$, and assume that $X$ is transversal to the isotropic quadric $Q$. Then the $s$-th sectional ED degree of $X$ equals
$$\mathrm{EDdegree}_{\mathrm{gen}}(X \cap L_s) = \sum_{\ell=s}^{M} \sum_{k=\ell}^{M} (-1)^{M-k} \binom{k+1}{\ell+1} \deg\big(c_{M-k}(X)\big).$$

Proof. The inner sum is the polar class $\delta_\ell(X)$; see the proof of [6, Thm. 5.8].

We now apply Theorem 3.3 to the situation when $M = m+n-2$, $N = mn-1$, and $X = \mathbb{P}^{m-1} \times \mathbb{P}^{n-1}$ is the Segre variety of $m \times n$ matrices of rank 1 in $\mathbb{P}^N$. The Chern polynomial of the tangent bundle of $X$ in the Chow ring $A^*(X) = \mathbb{Z}[s,t]/\langle s^m, t^n \rangle$ equals $(1+s)^m (1+t)^n$. By [13, page 150], this implies
$$\delta_\ell(X) = \sum_{k=\ell}^{m+n-2} (-1)^{m+n-k} \binom{k+1}{\ell+1} V_k, \tag{3.3}$$
where $V_k = \deg(c_{M-k}(X))$ is the coefficient of $s^{m-1} t^{n-1}$ in the expansion of $(1+s)^m (1+t)^n (s+t)^k$. Toric geometers may view $V_k$ as the sum of the normalized volumes of all $k$-dimensional faces of the polytope $\Delta_{m-1} \times \Delta_{n-1}$; see [6, Cor. 5.11].
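Formula (3.3) and the tail sums of Corollary 3.2 are straightforward to evaluate by machine. The following Python sketch is our own illustration, not code from the paper, and the function names (`segre_volumes`, `segre_polar_classes`, `ed_rank_one`, `ed_corank_one`) are hypothetical. It extracts $V_k$ as a coefficient of $(1+s)^m(1+t)^n(s+t)^k$ and applies (3.3). It then forms the sectional sums for rank 1 and corank 1 matrices. The corank 1 case uses the duality $\delta_\ell(X) = \delta_{mn-2-\ell}(X^*)$ from the proof of Theorem 3.4. The sketch assumes $m \le n$ and uses exact integer arithmetic.

```python
from math import comb


def segre_volumes(m, n):
    """V_k for k = 0..m+n-2: coefficient of s^(m-1) t^(n-1) in (1+s)^m (1+t)^n (s+t)^k."""
    vols = []
    for k in range(m + n - 1):
        total = 0
        for a in range(k + 1):              # the term C(k, a) s^a t^(k-a) of (s+t)^k
            i, j = m - 1 - a, n - 1 - (k - a)
            if i >= 0 and j >= 0:
                total += comb(k, a) * comb(m, i) * comb(n, j)
        vols.append(total)
    return vols


def segre_polar_classes(m, n):
    """delta_0(X), ..., delta_M(X) for the Segre variety X = P^(m-1) x P^(n-1), by (3.3)."""
    M = m + n - 2
    V = segre_volumes(m, n)
    return [sum((-1) ** (M - k) * comb(k + 1, l + 1) * V[k] for k in range(l, M + 1))
            for l in range(M + 1)]


def ed_rank_one(m, n, s):
    """Sectional ED degree of rank <= 1 matrices: delta_s + ... + delta_{m+n-2}."""
    return sum(segre_polar_classes(m, n)[s:])


def ed_corank_one(m, n, s):
    """Sectional ED degree of rank <= m-1 matrices: delta_0 + ... + delta_{mn-2-s}."""
    top = m * n - 2 - s
    return sum(segre_polar_classes(m, n)[:max(top + 1, 0)])


if __name__ == "__main__":
    print(segre_volumes(3, 3))                       # the V_s row for 3 x 3 matrices
    print(segre_polar_classes(3, 3))                 # the delta_s(X) row
    print([ed_rank_one(3, 3, s) for s in range(8)])
    print([ed_corank_one(3, 3, s) for s in range(8)])
```

For $m = n = 3$ the sketch reproduces the rows of Example 3. For $3 \times 4$ and $3 \times 5$ matrices, the corank 1 values agree with the Macaulay2 output of `ED(3,4,2,s)` and `ED(3,5,2,s)` in Example 4. For $2 \times N$ matrices with $N = 2, 3, 4$, the total is $4N - 2$, in line with Corollary 4.3.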
The following result explains the ED degrees in the third column of Table 2.1, and it allows us to determine this column for any desired values of $m$, $n$ and $s$.

Theorem 3.4. Let $m \le n$ and let $L$ be a generic linear subspace of codimension $s$ in $\mathbb{R}^{m \times n}$. For matrices of rank 1 or corank 1, the generic ED degree is given by
$$\begin{aligned} \mathrm{EDdegree}_{\mathrm{gen}}(L_{\le 1}) &= \delta_s(X) + \delta_{s+1}(X) + \cdots + \delta_{m+n-2}(X),\\ \mathrm{EDdegree}_{\mathrm{gen}}(L_{\le m-1}) &= \delta_0(X) + \delta_1(X) + \cdots + \delta_{mn-2-s}(X), \end{aligned} \tag{3.4}$$
where $\delta_\ell(X)$ may be computed from (3.3).

Proof. The dual in $\mathbb{P}^{mn-1}$ to the Segre variety $X = \mathbb{P}^{m-1} \times \mathbb{P}^{n-1}$ is the variety $X^*$ of matrices of rank $\le m-1$. By [13, Theorem 2.3], we have $\delta_\ell(X) = \delta_{mn-2-\ell}(X^*)$ for all $\ell$. With this duality of polar classes, the result follows from Corollary 3.2 and [6, Theorem 5.4].

Example 3. Fix $m = n = 3$. For matrices of rank 1, formulas (3.3) and (3.4) give

s = codim(L)               0   1   2   3   4   5   6   7
V_s                        9  18  24  18   6   0   0   0
δ_s(X)                     3   6  12  12   6   0   0   0
EDdegree_gen(L_≤1)        39  36  30  18   6   0   0   0

Duality for polar classes yields the formulas for $3 \times 3$ matrices of rank $r = 2$ in $L$:

s = codim(L)               0   1   2   3   4   5   6   7
δ_s(X*) = δ_{7−s}(X)       0   0   0   6  12  12   6   3
EDdegree_gen(L_≤2)        39  39  39  39  33  21   9   3

This is our theoretical derivation of the third column in Table 2.1 for $n = 3$ and generic $\Lambda$. ♦

Writing down closed formulas for intermediate values of $r$ is more difficult, because it involves some Schubert calculus. However, $\mathrm{EDdegree}_{\mathrm{gen}}(L_{\le r})$ can be conveniently computed with the following script in Macaulay2 [12]. It is a slight generalization of the script in [6, Example 7.10]:

    loadPackage "Schubert2"
    -- ED(m,n,r,s): ED degree of the m x n matrices of rank <= r in general coordinates,
    -- cut with a generic linear space of codimension s in P^(mn-1)
    ED = (m,n,r,s) -> (
        G = flagBundle({r,m-r});        -- Grassmannian, as a flag bundle with ranks r, m-r
        (S,Q) = G.Bundles;              -- its tautological bundles of ranks r and m-r
        X = projectiveBundle(S^n);      -- projective bundle resolving the rank <= r locus
        (sx,qx) = X.Bundles;
        d = dim X;
        T = tangentBundle X;
        -- sum over i = s, ..., mn-2 of the polar classes delta_i, from Chern classes of T
        sum(toList(s..m*n-2), i -> sum(toList(i..d), j ->
            (-1)^(d-j) * binomial(j+1,i+1) *
            integral(chern(d-j,T) * (chern(1,dual(sx)))^j))))

The function ED(m,n,r,s) computes the ED degree of the variety of $m \times n$ matrices of rank $\le r$, in general coordinates, cut with a generic linear space of codimension $s$ in $\mathbb{P}^{mn-1}$. For $s = 0$ this is precisely the function displayed in [6, Example 7.10].

Example 4. The boldface ED degrees in Table 2.3 were computed for unit weights $\Lambda = \mathbf{1}$. To find the analogous numbers for generic weights $\Lambda$, we run our Macaulay2 code as follows:

    apply(12, s -> ED(4,4,2,s))
    {1350, 1350, 1350, 1350, 1330, 1250, 1074, 818, 532, 276, 100, 20}
    apply(12, s -> ED(3,4,2,s))
    {83, 83, 83, 83, 83, 83, 73, 49, 22, 6, 0, 0}
    apply(12, s -> ED(3,5,2,s))
    {143, 143, 143, 143, 143, 143, 143, 143, 128, 88, 40, 10}

At this point, we wish to reiterate the main thesis of this paper. Knowing the ED degree ahead of time is useful for practitioners who seek to find and certify the global minimum in the optimization problem (1.3), and to bound the number of local minima. The following example illustrates this for one of the numbers 83 in the output of Example 4.

Example 5. We here solve the generic weighted structured low-rank approximation problem over the reals with parameters $m = 3$, $n = 4$, $r = 2$ and $s = 2$. Consider the instance
$$U = \begin{pmatrix} -9 & 4 & 9 & -10\\ 10 & 6 & 1 & -9\\ 10 & 5 & 7 & 6 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} 8 & 6 & 8 & 2\\ 1 & 8 & 7 & 9\\ 7 & 2 & 4 & 6 \end{pmatrix},$$
$$\begin{aligned} L_1(X) &= -10x_{11} + 4x_{12} + 6x_{13} + 8x_{14} + 4x_{21} - 9x_{22} + x_{23} - 10x_{31} - 10x_{32} - 8x_{33} + 2x_{34} - 1,\\ L_2(X) &= 2x_{11} + 7x_{12} + 3x_{13} - 7x_{14} - 4x_{21} - 6x_{22} - 7x_{23} + 5x_{24} + 8x_{31} + 2x_{33} + 3x_{34} - 1. \end{aligned}$$
We wish to find the matrix $X$ of rank at most 2 that satisfies the affine constraints $L_1(X) = L_2(X) = 0$ and is nearest to $U$.
Using Gröbner basis computations and real root isolation techniques via the Maple packages FGb and fgbrs, we find that the weighted distance function has 83 complex critical points. This matches the theoretical value ED(3,4,2,2) = 83 provided in Example 4, so we are guaranteed that there are no further critical points. Seven of them are real, and we obtain the following certified numerical approximations:
$$\begin{pmatrix} 0.764 & -1.457 & 2.436 & 1.870\\ 0.753 & -0.0154 & 0.030 & -7.437\\ 2.020 & -4.371 & 7.308 & 8.330 \end{pmatrix} \quad \begin{pmatrix} -8.0341 & 4.127 & 9.055 & 5.364\\ 16.936 & 2.930 & -1.330 & -4.220\\ 9.429 & 7.525 & 8.258 & 1.242 \end{pmatrix} \quad \begin{pmatrix} -8.215 & 5.033 & 9.965 & 1.647\\ 16.848 & 4.259 & 0.423 & -3.669\\ 9.070 & 6.218 & 5.842 & -2.054 \end{pmatrix}$$
$$\begin{pmatrix} -8.586 & -1.743 & 1.591 & 2.436\\ 11.191 & 2.985 & -4.232 & -7.159\\ 10.351 & 0.292 & 3.567 & 7.185 \end{pmatrix} \quad \begin{pmatrix} -4.853 & 4.081 & 6.301 & -6.349\\ -6.067 & 5.029 & 8.600 & -8.251\\ 2.616 & -2.455 & -0.878 & 2.327 \end{pmatrix}$$
$$\begin{pmatrix} -2.308 & -4.584 & 3.566 & -5.484\\ -0.205 & -2.210 & 0.668 & -3.178\\ -2.276 & 0.983 & 2.444 & 2.810 \end{pmatrix} \quad \begin{pmatrix} -9.664 & 2.805 & 7.113 & -10.754\\ 14.942 & 6.520 & 3.149 & -8.783\\ 8.344 & 0.615 & -2.185 & 2.177 \end{pmatrix}$$
The last matrix is the closest critical point on the manifold of rank 2 matrices satisfying $L_1 = L_2 = 0$. This computation takes 1002 seconds, and the most time-consuming step is the computation of the Gröbner basis. In order to certify that the global minimum is among these matrices, we also solve the same low-rank approximation problem for rank 1 matrices. Using the same method, this yields 11 rank 1 matrices with real entries in 79 seconds. None of them is closer to $U$ than the best rank 2 approximation. Consequently, the global minimum of the weighted distance is attained at the last matrix in the above list.
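The final step of Example 5 can be double-checked from the rounded values printed above. The Python sketch below is our own check, not the authors' FGb/fgbrs computation, and its names are hypothetical. It evaluates the weighted distance $\|X - U\|_\Lambda^2$ from (1.1) at the seven real critical points. It also evaluates the two affine constraints at the closest point. Because the printed entries are rounded to about three decimals, the constraint residuals are only approximately zero.

```python
# Data of Example 5: U, the weights Lambda, and the seven real critical points (rounded).
U = [[-9, 4, 9, -10], [10, 6, 1, -9], [10, 5, 7, 6]]
LAMBDA = [[8, 6, 8, 2], [1, 8, 7, 9], [7, 2, 4, 6]]

# Coefficient matrices of the affine constraints L_1 and L_2 (both have constant term -1).
C1 = [[-10, 4, 6, 8], [4, -9, 1, 0], [-10, -10, -8, 2]]
C2 = [[2, 7, 3, -7], [-4, -6, -7, 5], [8, 0, 2, 3]]

CRITICAL = [
    [[0.764, -1.457, 2.436, 1.870], [0.753, -0.0154, 0.030, -7.437], [2.020, -4.371, 7.308, 8.330]],
    [[-8.0341, 4.127, 9.055, 5.364], [16.936, 2.930, -1.330, -4.220], [9.429, 7.525, 8.258, 1.242]],
    [[-8.215, 5.033, 9.965, 1.647], [16.848, 4.259, 0.423, -3.669], [9.070, 6.218, 5.842, -2.054]],
    [[-8.586, -1.743, 1.591, 2.436], [11.191, 2.985, -4.232, -7.159], [10.351, 0.292, 3.567, 7.185]],
    [[-4.853, 4.081, 6.301, -6.349], [-6.067, 5.029, 8.600, -8.251], [2.616, -2.455, -0.878, 2.327]],
    [[-2.308, -4.584, 3.566, -5.484], [-0.205, -2.210, 0.668, -3.178], [-2.276, 0.983, 2.444, 2.810]],
    [[-9.664, 2.805, 7.113, -10.754], [14.942, 6.520, 3.149, -8.783], [8.344, 0.615, -2.185, 2.177]],
]


def weighted_distance(X, Y=U, weights=LAMBDA):
    """sum_ij lambda_ij (x_ij - y_ij)^2, the weighted Frobenius distance of (1.1)."""
    return sum(w * (x - y) ** 2
               for xr, yr, wr in zip(X, Y, weights)
               for x, y, w in zip(xr, yr, wr))


def affine_constraint(C, X):
    """Evaluate sum_ij c_ij x_ij - 1."""
    return sum(c * x for cr, xr in zip(C, X) for c, x in zip(cr, xr)) - 1


distances = [weighted_distance(X) for X in CRITICAL]
best = min(range(len(CRITICAL)), key=lambda idx: distances[idx])

if __name__ == "__main__":
    for idx, dist in enumerate(distances, start=1):
        print(idx, round(dist, 1))
    X = CRITICAL[best]
    print("closest:", best + 1, affine_constraint(C1, X), affine_constraint(C2, X))
```

The seventh matrix gives the smallest weighted distance, about 584. Its two constraint residuals are of the order of $10^{-2}$, which is consistent with the rounding of the printed entries.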
For comparison purposes, keep the same constraints $L_1, L_2$ and the same data matrix $U$, but take the Frobenius distance (i.e., $\Lambda = \mathbf{1}$ is the all-one matrix). Then the number of complex critical points is 43, and five of them are real. Here, it takes only 27 seconds to find the global minimizer. These computations were performed on an Intel Xeon E7540/2.00GHz. ♦

In Table 2.1 and Example 4 we observed that the sectional ED degree for generic $\Lambda$ does not depend on $s = \operatorname{codim}(L)$, provided $s$ is small. The following corollary explains this.

Corollary 3.5. For a generic linear subspace $L$ of codimension $s < r(r+n-m)$,
$$\mathrm{EDdegree}_{\mathrm{gen}}(L_{\le r}) = \mathrm{EDdegree}_{\mathrm{gen}}(\mathbb{C}^{m \times n}_{\le r}).$$

Proof. Let $X$ be the variety of matrices of rank $\le r$. Its dual $X^*$ is the variety of matrices of rank $\le m-r$, and it has codimension $\operatorname{codim}(X^*) = (r+n-m)r$. This implies $\delta_\ell(X) = 0$ for $\ell < (r+n-m)r - 1$. The assertion follows from Corollary 3.2.

Corollary 3.5 can be stated informally as follows. In the setting of generic weights and generic linear spaces of matrices with sufficiently high dimension, the algebraic complexity of structured low-rank approximation agrees with that of ordinary low-rank approximation.

Shifting gears, we now consider the case of unit weights $\Lambda = \mathbf{1}$. Thus, we fix $Q = V(\sum x_{ij}^2)$ as the isotropic quadric in $\mathbb{P}^{mn-1}$. Let $X = \mathbb{P}^{m-1} \times \mathbb{P}^{n-1}$ denote the Segre variety of $m \times n$ matrices of rank 1 in $\mathbb{P}^{mn-1}$, and let $Z = \operatorname{Sing}(X \cap Q)$ denote the non-transversal locus of the intersection of $X$ with $Q$. The dual variety $X^*$ consists of all matrices of rank $\le m-1$ in $\mathbb{P}^{mn-1}$.
We conjecture that the following formula (with $m = n$) holds for the gap between the third and the first columns of Table 2.1, and likewise between the fourth and the second:
$$\mathrm{EDdegree}_{\mathrm{gen}}(X^* \cap L_s) - \mathrm{EDdegree}_{\mathbf{1}}(X^* \cap L_s) = \mathrm{EDdegree}_{\mathrm{gen}}(Z \cap L_s). \tag{3.5}$$
To compute the right-hand side, and to test this conjecture, we use the following lemma.

Lemma 3.6. The locus where $Q$ meets $X = \mathbb{P}^{m-1} \times \mathbb{P}^{n-1}$ non-transversally in $\mathbb{P}^{mn-1}$ is the product $Z = Q_{m-2} \times Q_{n-2}$, where $Q_{i-2}$ denotes a general quadratic hypersurface in $\mathbb{P}^{i-1}$.

Proof. On $X$ we have $\sum_{i,j} (a_i b_j)^2 = (\sum_i a_i^2)(\sum_j b_j^2)$. Hence the Segre variety $X$ meets $Q$ in the union of two irreducible components, $\mathbb{P}^{m-1} \times Q_{n-2}$ and $Q_{m-2} \times \mathbb{P}^{n-1}$. The non-transversality locus is the intersection of these components.

Example 6. Let $m = n = 3$, so $X$ and $X^*$ represent $3 \times 3$ matrices of rank 1 and rank $\le 2$, respectively. Here $Z = Q_1 \times Q_1$ corresponds to the Segre quadric $\mathbb{P}^1 \times \mathbb{P}^1$, embedded in $\mathbb{P}^8$ with the line bundle $\mathcal{O}(2,2)$. This is a toric surface whose polygon $P$ is twice a unit square. The facial volumes as in [6, Corollary 5.11] are $V_0 = 4$, $V_1 = 8$ and $V_2 = 8$, and hence
$$\delta_0(Z) = 4 - 2 \cdot 8 + 3 \cdot 8 = 12, \qquad \delta_1(Z) = -8 + 3 \cdot 8 = 16, \qquad \delta_2(Z) = 8.$$
We fill this into a table and, using Corollary 3.2, we compute the sectional ED degree:

s                           0   1   2   3   4   5   6   7
δ_s(Z)                     12  16   8   0   0   0   0   0
EDdegree_gen(Z ∩ L_s)      36  24   8   0   0   0   0   0
EDdegree_gen(X* ∩ L_s)     39  39  39  39  33  21   9   3
EDdegree_1(X* ∩ L_s)        3  15  31  39  33  21   9   3

The last two rows are taken from Table 2.1, and they confirm the formula (3.5). ♦

Combining Lemma 3.6, Corollary 3.2 and the proof of [6, Theorem 5.8], and abbreviating $W_j = \deg\big(c_{m+n-4-j}(Q_{m-2} \times Q_{n-2})\big)$, the right-hand side of (3.5) can be expressed as
$$\sum_{i=s}^{m+n-4} \sum_{j=i}^{m+n-4} (-1)^{m+n-4-j} \binom{j+1}{i+1} W_j. \tag{3.6}$$
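Here is a minimal Python sketch of the right-hand side (3.6). It relies only on the generating function for $W_j$ recorded in the next paragraph. The sketch expands $(1+t)^m/(1+2t)$ as a power series, extracts $W_j$, and evaluates the double sum. The function names are hypothetical, and the code is our own illustration rather than the authors' implementation.

```python
from math import comb


def quotient_series(m, length):
    """First `length` coefficients of the power series (1+t)^m / (1+2t)."""
    return [sum(comb(m, l) * (-2) ** (i - l) for l in range(i + 1)) for i in range(length)]


def w_numbers(m, n):
    """W_j, j = 0..m+n-4: coefficient of t^(m-2) s^(n-2) in 4(1+t)^m(1+s)^n(t+s)^j/((1+2t)(1+2s))."""
    ft, fs = quotient_series(m, m - 1), quotient_series(n, n - 1)
    W = []
    for j in range(m + n - 3):
        total = 0
        for a in range(j + 1):              # the term C(j, a) t^a s^(j-a) of (t+s)^j
            i, k = m - 2 - a, n - 2 - (j - a)
            if i >= 0 and k >= 0:
                total += comb(j, a) * ft[i] * fs[k]
        W.append(4 * total)
    return W


def ed_gap(m, n, s):
    """Formula (3.6): EDdegree_gen(Z cap L_s) for Z = Q_{m-2} x Q_{n-2}."""
    D = m + n - 4
    W = w_numbers(m, n)
    return sum((-1) ** (D - j) * comb(j + 1, i + 1) * W[j]
               for i in range(s, D + 1) for j in range(i, D + 1))


if __name__ == "__main__":
    print(w_numbers(3, 3))                    # the facial volumes of Example 6
    print([ed_gap(3, 3, s) for s in range(8)])
```

For $m = n = 3$ the sketch gives $W = (4, 8, 8)$, which matches the facial volumes of Example 6. It also gives the row $36, 24, 8, 0, \dots$. Subtracting this row from the generic row of Example 6 yields the unit-weight row, as conjectured in (3.5).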
Moreover, $W_j$ is equal to the coefficient of $t^{m-2} s^{n-2}$ in the rational generating function
$$\frac{4(1+t)^m (1+s)^n}{(1+2t)(1+2s)}\,(t+s)^j.$$
This computation allows us to extend Table 2.1 to any desired values of $m$, $n$ and $s$.

Changing topics, we now consider the case when $L$ is the space of Hankel matrices. The computation of low-rank approximations of Hankel matrices will be our topic in Section 4. Here we focus on algebraic geometry and formulas for the generic ED degree. Set $d = p+q-2$ and let $X_{d,r}$ denote the variety of $p \times q$ Hankel matrices of rank $\le r$; see (4.1) for examples. This variety lives in the projective space $\mathbb{P}^d = \mathbb{P}(S^d\mathbb{C}^2)$, whose points represent binary forms of degree $d$. Thus $X_{d,1}$ is the rational normal curve of degree $d$, and $X_{d,r}$ is the $r$-th secant variety of this curve. We have $\dim(X_{d,r}) = 2r-1$ for $r+1 \le \min(p,q)$.

Theorem 3.7. Let $d = p+q-2$ and $r+1 \le \min(p,q)$. The generic ED degree of the variety $X_{d,r}$ of $p \times q$ Hankel matrices of rank $\le r$ in $\mathbb{P}^d$ equals
$$\mathrm{EDdegree}_{\mathrm{gen}}(X_{d,r}) = \sum_{i=0}^{r} \binom{d+1-r}{i} \binom{d-r-i}{r-i} 2^{r-i}. \tag{3.7}$$

Proof. The sum in (3.7) is the coefficient of $z^r$ in the generating function
$$\frac{(1+z)^{d+1-r}}{(1-2z)^{d-2r+1}}, \tag{3.8}$$
since the coefficient of $z^{r-i}$ in $(1-2z)^{-(d-2r+1)}$ is $\binom{d-r-i}{r-i} 2^{r-i}$. The conormal variety of $X_{d,r}$ is the closure $N_{X_{d,r}}$ of the set
$$\{(f,g) \mid \operatorname{rank}(f) = r \text{ and } g \text{ is tangent to } X_{d,r} \text{ at } f\} \subset \mathbb{P}(S^d\mathbb{C}^2) \times \mathbb{P}(S^d(\mathbb{C}^2)^*).$$
The homology class of $N_{X_{d,r}}$ is given by a binary form. We will show that the sum $\sum_i \delta_i(X_{d,r})$ of its coefficients is the asserted coefficient of (3.8). By [6, (5.3)], this proves the claim. Let $p_1, p_2$ be the two projections. The images of the conormal variety are $p_1(N_{X_{d,r}}) = X_{d,r}$ and $p_2(N_{X_{d,r}}) = X_{d,r}^*$. We desingularize $X_{d,r}$ by considering $\mathrm{Sym}^r(\mathbb{P}^1) \simeq \mathbb{P}^r$.
The desingularization map is given by the scheme-theoretic intersection of the rational normal curve of degree $r$ with a hyperplane. A point in $\mathbb{P}^r$, identified with a hyperplane, gives $r$ points on $X_{d,1} \simeq \mathbb{P}^1$. Their linear span in $\mathbb{P}^d$ defines a rank $r$ bundle on $\mathbb{P}^r$, known as the Schwarzenberger bundle [7, §6]. This is the kernel of the bundle map $\mathcal{O}^{d+1} \to \mathcal{O}(1)^{d+1-r}$. In the same way, we desingularize the conormal variety $N_{X_{d,r}}$ by the fiber product over $\mathbb{P}^r$ of two bundles. The first is the projectivization of the Schwarzenberger bundle
$$E_{d,r} = \ker\big(\mathcal{O}^{d+1} \to \mathcal{O}(1)^{d+1-r}\big),$$
and the second is the projective bundle of $\mathcal{O}(2)^{d-2r+1}$. Exactly as in the proof of [3, Proposition 4.1], the degrees of the polar classes of $X_{d,r}$ are
$$\delta_{r+i-1}(X_{d,r}) = \int_{\mathbb{P}^r} s_i(E_{d,r})\, s_{r-i}\big(\mathcal{O}(2)^{d-2r+1}\big).$$
The total Segre class of $E_{d,r}$ is $(1+z)^{d+1-r}$. The total Segre class of $\mathcal{O}(2)^{d-2r+1}$ is $1/(1-2z)^{d-2r+1}$. By multiplying them we obtain the degree sum of the polar classes, thus proving (3.8).

Corollary 3.8. The generic ED degree of the hypersurface $X_{2r,r}$ defined by the Hankel determinant of format $(r+1) \times (r+1)$ is equal to
$$\frac{3^{r+1}-1}{2} = \text{the coefficient of } z^r \text{ in } \frac{(1+z)^{r+1}}{1-2z}. \tag{3.9}$$

This corollary means that the ED degree of the $(r+1) \times (r+1)$ Hankel determinant agrees with the ED degree of the general symmetric $(r+1) \times (r+1)$ determinant. By ED duality [6, Theorem 5.2], this is also the ED degree of the second Veronese embedding of $\mathbb{P}^r$; see [6, Example 5.6]. If we consider Hankel matrices of fixed rank $r$, then we obtain polynomiality.

Corollary 3.9. For fixed $r$, the generic ED degree of $X_{d,r}$ is a polynomial of degree $r$ in $d$.
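The sum (3.7) is a finite sum of binomial coefficients, so Corollaries 3.8 and 3.9 can be spot-checked numerically. The following Python sketch is our own, and its names are hypothetical. It evaluates (3.7) and compares the result with the closed form $(3^{r+1}-1)/2$. It also compares the result with the four explicit polynomials listed right after Corollary 3.9.

```python
from fractions import Fraction
from math import comb


def ed_hankel(d, r):
    """Generic ED degree of X_{d,r}, the p x q Hankel matrices of rank <= r, by (3.7)."""
    if 2 * r > d:
        raise ValueError("Theorem 3.7 needs r + 1 <= min(p, q), which forces 2r <= d")
    return sum(comb(d + 1 - r, i) * comb(d - r - i, r - i) * 2 ** (r - i)
               for i in range(r + 1))


# The explicit polynomials listed after Corollary 3.9, keyed by the rank r.
POLYNOMIALS = {
    1: lambda d: Fraction(3 * d - 2),
    2: lambda d: Fraction(9 * d**2 - 39 * d + 38, 2),
    3: lambda d: Fraction(9 * d**3 - 99 * d**2 + 348 * d - 388, 2),
    4: lambda d: Fraction(27 * d**4 - 558 * d**3 + 4221 * d**2 - 13818 * d + 16472, 8),
}

if __name__ == "__main__":
    # Hankel matrices H_n of order n correspond to d = n - 1.
    for n in range(3, 10):
        print(n, [ed_hankel(n - 1, r) for r in range(1, (n - 1) // 2 + 1)])
```

With $d = n - 1$, the function reproduces the chart for $\Lambda = \Omega_n$ in Table 4.1; for example, `ed_hankel(8, 4)` returns 121.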
For example, we find the following explicit polynomials when the rank $r$ is small:
$$\begin{aligned} \mathrm{EDdegree}_{\mathrm{gen}}(X_{d,1}) &= 3d - 2,\\ \mathrm{EDdegree}_{\mathrm{gen}}(X_{d,2}) &= (9d^2 - 39d + 38)/2,\\ \mathrm{EDdegree}_{\mathrm{gen}}(X_{d,3}) &= (9d^3 - 99d^2 + 348d - 388)/2,\\ \mathrm{EDdegree}_{\mathrm{gen}}(X_{d,4}) &= (27d^4 - 558d^3 + 4221d^2 - 13818d + 16472)/8. \end{aligned}$$
The values of these polynomials are the entries in the left columns of Table 4.1 below.

4. Hankel and Sylvester Matrices. In this section we study the weighted low-rank approximation problem for matrices with a special structure, given by equating some matrix entries and setting others to zero. One such family consists of the Hurwitz matrices in [6, Theorem 3.6]. We here discuss Hankel matrices, then catalecticants, and finally Sylvester matrices. The corresponding applications are low-rank approximation of symmetric tensors and approximate greatest common divisors.

The Hankel matrix $H[p,q]$ of format $p \times q$ has the entry $x_{i+j-1}$ in row $i$ and column $j$. So, the total number of unknowns is $n = p+q-1$. We are most interested in the case when this matrix is square or almost square. The Hankel matrix of order $n$ is $H[(n+1)/2, (n+1)/2]$ if $n$ is odd, and it is $H[n/2, (n+2)/2]$ if $n$ is even. We denote this matrix by $H_n$. For instance,
$$H_5 = \begin{pmatrix} x_1 & x_2 & x_3\\ x_2 & x_3 & x_4\\ x_3 & x_4 & x_5 \end{pmatrix} \quad\text{and}\quad H_6 = \begin{pmatrix} x_1 & x_2 & x_3 & x_4\\ x_2 & x_3 & x_4 & x_5\\ x_3 & x_4 & x_5 & x_6 \end{pmatrix}. \tag{4.1}$$
For approximations by low-rank Hankel matrices, we consider three natural weights:
• the matrix $\Omega_n$ has entry $1/\min(i+j-1,\, n-i-j+2)$ in row $i$ and column $j$;
• the matrix $\mathbf{1}_n$ has all entries equal to 1;
• the matrix $\Theta_n$ has entry $\binom{n-1}{i+j-2}/\min(i+j-1,\, n-i-j+2)$ in row $i$ and column $j$.
We encountered these matrices for $n = 5$ in Example 1. For $n = 6$ we have
$$\Omega_6 = \begin{pmatrix} 1 & 1/2 & 1/3 & 1/3\\ 1/2 & 1/3 & 1/3 & 1/2\\ 1/3 & 1/3 & 1/2 & 1 \end{pmatrix} \quad\text{and}\quad \Theta_6 = \begin{pmatrix} 1 & 5/2 & 10/3 & 10/3\\ 5/2 & 10/3 & 10/3 & 5/2\\ 10/3 & 10/3 & 5/2 & 1 \end{pmatrix}.$$
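One way to see why $\Omega_n$ and $\Theta_n$ are the natural weights is to add up, for each unknown $x_k$, the weights of all cells of $H_n$ that contain it. The unknown $x_k$ occurs $\min(k, n+1-k)$ times in $H_n$. So the $\Omega_n$-weights of its cells sum to 1, and the $\Theta_n$-weights sum to $\binom{n-1}{k-1}$. The Python sketch below is our own check, and its names are hypothetical. It builds both weight matrices with exact fractions and verifies this bookkeeping.

```python
from fractions import Fraction
from math import comb


def hankel_shape(n):
    """Format (p, q) of the Hankel matrix H_n of order n (square or almost square)."""
    if n % 2 == 1:
        return (n + 1) // 2, (n + 1) // 2
    return n // 2, (n + 2) // 2


def omega(n):
    """Omega_n: entry 1 / min(i+j-1, n-i-j+2) in row i, column j (1-based)."""
    p, q = hankel_shape(n)
    return [[Fraction(1, min(i + j - 1, n - i - j + 2)) for j in range(1, q + 1)]
            for i in range(1, p + 1)]


def theta(n):
    """Theta_n: entry binom(n-1, i+j-2) / min(i+j-1, n-i-j+2)."""
    p, q = hankel_shape(n)
    return [[Fraction(comb(n - 1, i + j - 2), min(i + j - 1, n - i - j + 2))
             for j in range(1, q + 1)] for i in range(1, p + 1)]


def weight_per_unknown(W):
    """Total weight of the cells holding x_k, for k = 1..n (cell (i, j) holds x_{i+j-1})."""
    totals = {}
    for i, row in enumerate(W, start=1):
        for j, w in enumerate(row, start=1):
            totals[i + j - 1] = totals.get(i + j - 1, 0) + w
    return [totals[k] for k in sorted(totals)]


if __name__ == "__main__":
    print(weight_per_unknown(omega(6)))   # all ones
    print(weight_per_unknown(theta(6)))   # binomial coefficients binom(5, k-1)
```

On Hankel matrices this means $\sum_{i,j} (\Omega_n)_{ij}(x_{i+j-1}-u_{i+j-1})^2 = \sum_k (x_k-u_k)^2$. The analogous $\Theta_n$-sum weights each term $(x_k-u_k)^2$ by $\binom{n-1}{k-1}$.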
The weights $\Omega_n$ represent the usual Euclidean distance in $\mathbb{R}^n$, the unit weights $\mathbf{1}_n$ give the Frobenius distance in the ambient matrix space, and the weights $\Theta_n$ give the natural metric in the space of symmetric $2 \times 2 \times \cdots \times 2$ tensors. Such a tensor corresponds to a binary form
$$F(s,t) = \sum_{i=1}^{n} \binom{n-1}{i-1}\, x_i\, s^{n-i} t^{i-1}.$$
The Hankel matrix $H_n$ has rank 1 if and only if $F(s,t)$ is the $(n-1)$st power of a linear form. More generally, if $F(s,t)$ is the sum of $r$ powers of linear forms, then $H_n$ has rank $\le r$. As we saw in §3, this locus corresponds to the $r$-th secant variety of the rational normal curve in $\mathbb{P}^{n-1}$. Various ED degrees for our three weight matrices are displayed in Table 4.1.

            Λ = Ω_n                    Λ = 1_n                    Λ = Θ_n
n \ r      1     2     3     4        1     2     3     4        1     2     3     4
3          4                          2                          2
4          7                          7                          3
5         10    13                    6     9                    4     7
6         13    34                   13    34                    5    16
7         16    64    40             10    38    34              6    28    20
8         19   103   142             19   103   142              7    43    62
9         22   151   334   121       14   103   246   113        8    61   134    53

Table 4.1: Weighted ED degrees for Hankel matrices of order n and rank r.

The entries in the leftmost chart of Table 4.1 come from Theorem 3.7. Indeed, the variety of Hankel matrices $H_n$ of rank $\le r$ is precisely the secant variety $X_{n-1,r}$ we discussed in Section 3. The weight matrix $\Lambda = \Omega_n$ exhibits the generic ED degree for that variety. The columns on the left of Table 4.1 are the values of the polynomials in Corollary 3.9, and the diagonal entries $4, 13, 40, 121, \ldots$ are given by Corollary 3.8. All ED degrees in Table 4.1 were verified using Gröbner basis computations over GF(65521) with the Maple package FGb [8]. The running times are closely tied to the values of the ED degrees, and they are similar to those reported in Table 2.3.
Gröbner bases over $\mathbb{Q}$ can also be computed fairly easily whenever the ED degree is below 100, and in those cases we can locate all real critical points using fgbrs. However, for larger instances, exact symbolic solving over $\mathbb{Q}$ becomes a considerable challenge due to the growth in coefficient size.

Hankel matrices of rank $r$ correspond to symmetric $2 \times 2 \times \cdots \times 2$ tensors of tensor rank $r$, and these can be represented by binary forms that are sums of $r$ powers of linear forms. That is the point of the geometric discussion in Section 3. This interpretation extends to symmetric tensors of arbitrary format, with the rational normal curve replaced by the Veronese variety. For a general study of low-rank approximation of symmetric tensors, see Friedland and Stawiska [9]. In general, there is no straightforward representation of low-rank tensors by low-rank matrices with special structure. However, there are some exceptions, notably for rank $r = 2$ tensors, by the results of Raicu [20] and others in the recent tensor literature. We refer to Landsberg's book [16], especially Chapters 3, 7 and 10. The resulting generalized Hankel matrices are known as catalecticants in the commutative algebra literature, or as moment matrices in the optimization literature.

We now present a case study that arose from a particular application in biomedical imaging. We consider the following catalecticant matrix of format $6 \times 6$:
$$X = \begin{pmatrix} x_{400} & x_{310} & x_{301} & x_{220} & x_{211} & x_{202}\\ x_{310} & x_{220} & x_{211} & x_{130} & x_{121} & x_{112}\\ x_{301} & x_{211} & x_{202} & x_{121} & x_{112} & x_{103}\\ x_{220} & x_{130} & x_{121} & x_{040} & x_{031} & x_{022}\\ x_{211} & x_{121} & x_{112} & x_{031} & x_{022} & x_{013}\\ x_{202} & x_{112} & x_{103} & x_{022} & x_{013} & x_{004} \end{pmatrix}$$
The fifteen unknown entries are the coefficients of a ternary quartic
$$\begin{aligned} F(s,t,u) = {}& x_{400}s^4 + x_{040}t^4 + x_{004}u^4 + 6x_{220}s^2t^2 + 6x_{202}s^2u^2 + 6x_{022}t^2u^2\\ &+ 4x_{310}s^3t + 4x_{301}s^3u + 4x_{130}st^3 + 4x_{031}t^3u + 4x_{103}su^3 + 4x_{013}tu^3\\ &+ 12x_{211}s^2tu + 12x_{121}st^2u + 12x_{112}stu^2. \end{aligned}$$
The table $(x_{ijk})$ can be regarded as a symmetric tensor of format $3 \times 3 \times 3 \times 3$. The coefficients in $F(s,t,u)$ indicate the multiplicity with which the 15 unknowns occur among the $3^4 = 81$ coordinates of that tensor. To model the invariant metric in the tensor space $\mathbb{R}^{3 \times 3 \times 3 \times 3}$ in our matrix representation, we use the weight matrix
$$\Theta = \begin{pmatrix} 1 & 2 & 2 & 2 & 3 & 2\\ 2 & 2 & 3 & 2 & 3 & 3\\ 2 & 3 & 2 & 3 & 3 & 2\\ 2 & 2 & 3 & 1 & 2 & 2\\ 3 & 3 & 3 & 2 & 2 & 2\\ 2 & 3 & 2 & 2 & 2 & 1 \end{pmatrix}.$$
Each entry of $\Theta$ is the multiplicity of the corresponding unknown in $F(s,t,u)$ divided by the number of cells of $X$ in which that unknown occurs. The problem is to approximate a given catalecticant matrix $U = (u_{ijk})$ by a rank 2 matrix with respect to $\Theta$. The expected number of critical points is as follows.

Proposition 4.1. Let $L$ be the 15-dimensional subspace of catalecticants $X$ in $\mathbb{R}^{6 \times 6}$. Then $\mathrm{EDdegree}_\Theta(L_{\le 2}) = 195$ and $\mathrm{EDdegree}_{\mathrm{gen}}(L_{\le 2}) = 1813$.

The proof is a computation, as explained below. We first discuss an application.

Example 7. We consider the following symmetric $3 \times 3 \times 3 \times 3$ tensor:

u_400 = 0.1023        u_220 = 0.0039        u_310 = −0.002        u_103 = 0.0196        u_211 = −0.00032569
u_040 = 0.0197        u_202 = 0.0407        u_301 = 0.0581        u_031 = 0.0029        u_121 = −0.0012
u_004 = 0.1869        u_022 = −0.00017418   u_130 = 0.0107        u_013 = −0.0021       u_112 = −0.0011

This tensor was given to us by Thomas Schultz, who heads the Visualization and Medical Image Analysis Group at the University of Bonn. It represents a fiber distribution function, estimated from diffusion Magnetic Resonance Imaging. See [23] for more information. ♦

We present an algebraic formulation of our problem which was found to be suitable for symbolic computation.
Introducing six unknowns $a, b, c, d, e, f$, we parametrize the 6-dimensional variety of symmetric $3 \times 3 \times 3 \times 3$ tensors of rank 2 by the ternary quartics
$$\tilde F(s,t,u) = a\,(s + bt + cu)^4 + d\,(s + et + fu)^4.$$
Just like in the discussion in Remark 1 and after Proposition 2.3, the image of this parametrization is a dense open subset of the symmetric $3 \times 3 \times 3 \times 3$ tensors of rank 2. Covering all rank 2 tensors can be achieved with three parametrizations as above. Written out explicitly, this parametrization takes the form
$$\begin{array}{lllll} x_{400} = a + d & x_{220} = ab^2 + de^2 & x_{310} = ab + de & x_{103} = ac^3 + df^3 & x_{211} = abc + def\\ x_{040} = ab^4 + de^4 & x_{202} = ac^2 + df^2 & x_{301} = ac + df & x_{031} = ab^3c + de^3f & x_{121} = ab^2c + de^2f\\ x_{004} = ac^4 + df^4 & x_{022} = ab^2c^2 + de^2f^2 & x_{130} = ab^3 + de^3 & x_{013} = abc^3 + def^3 & x_{112} = abc^2 + def^2 \end{array}$$
Note that our parametrization is 2 to 1: every rank 2 catalecticant $X$ has two preimages, which are related by swapping the vectors $(a,b,c)$ and $(d,e,f)$. The fiber jumps in dimension over the singular locus, which consists of matrices $X$ of rank 1. Their preimage in parameter space is given by the ideal $\langle ad \rangle \cap \langle b-e,\, c-f \rangle$. The chosen weight matrix $\Theta$ now specifies the following unconstrained optimization problem. We seek to find the minimum in $\mathbb{R}^6$ of
$$\begin{aligned} G(a,b,c,d,e,f) = {}& (u_{400}-a-d)^2 + (u_{040}-ab^4-de^4)^2 + (u_{004}-ac^4-df^4)^2\\ &+ 6(u_{220}-ab^2-de^2)^2 + 6(u_{202}-ac^2-df^2)^2 + 6(u_{022}-ab^2c^2-de^2f^2)^2\\ &+ 4(u_{310}-ab-de)^2 + 4(u_{301}-ac-df)^2 + 4(u_{130}-ab^3-de^3)^2\\ &+ 4(u_{103}-ac^3-df^3)^2 + 4(u_{031}-ab^3c-de^3f)^2 + 4(u_{013}-abc^3-def^3)^2\\ &+ 12(u_{112}-abc^2-def^2)^2 + 12(u_{211}-abc-def)^2 + 12(u_{121}-ab^2c-de^2f)^2. \end{aligned}$$
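The weight matrix $\Theta$ and the objective $G$ fit together in a simple way. Each entry of $\Theta$ is the multiplicity of the corresponding unknown in $F(s,t,u)$ divided by the number of cells of $X$ in which that unknown occurs. Consequently, the $\Theta$-weighted distance between catalecticants equals $G$. The Python sketch below is our own check of these two facts, with hypothetical names. It uses exact rational arithmetic at an arbitrarily chosen test point.

```python
from fractions import Fraction
from itertools import product
from math import factorial

# Degree-2 exponent vectors in (s, t, u); they index the rows and columns of the catalecticant X.
QUADRICS = [(2, 0, 0), (1, 1, 0), (1, 0, 1), (0, 2, 0), (0, 1, 1), (0, 0, 2)]
# Degree-4 exponent vectors (i, j, k): the 15 unknowns x_ijk.
QUARTICS = [(i, j, 4 - i - j) for i in range(5) for j in range(5 - i)]


def cell(r, c):
    """Exponent of the unknown x_ijk sitting in row r, column c of X."""
    return tuple(p + q for p, q in zip(QUADRICS[r], QUADRICS[c]))


def multiplicity(e):
    """Coefficient of x_e in F(s,t,u): the multinomial coefficient 4!/(i! j! k!)."""
    return factorial(4) // (factorial(e[0]) * factorial(e[1]) * factorial(e[2]))


def theta_matrix():
    """Theta_rc = multiplicity of the unknown in F / number of cells of X where it occurs."""
    count = {}
    for r, c in product(range(6), repeat=2):
        count[cell(r, c)] = count.get(cell(r, c), 0) + 1
    return [[multiplicity(cell(r, c)) // count[cell(r, c)] for c in range(6)] for r in range(6)]


def rank2_entries(a, b, c, d, e, f):
    """The parametrization x_ijk = a b^j c^k + d e^j f^k of rank 2 quartics."""
    return {ex: a * b**ex[1] * c**ex[2] + d * e**ex[1] * f**ex[2] for ex in QUARTICS}


def G(params, u):
    """The objective G(a,...,f): sum of multiplicity * (u_ijk - x_ijk)^2 over the 15 unknowns."""
    x = rank2_entries(*params)
    return sum(multiplicity(ex) * (u[ex] - x[ex]) ** 2 for ex in QUARTICS)


def theta_distance(params, u):
    """Theta-weighted squared distance between the 6x6 catalecticants of u and of x."""
    x, theta = rank2_entries(*params), theta_matrix()
    return sum(theta[r][c] * (u[cell(r, c)] - x[cell(r, c)]) ** 2
               for r, c in product(range(6), repeat=2))


if __name__ == "__main__":
    for row in theta_matrix():
        print(row)
```

Swapping $(a,b,c)$ and $(d,e,f)$ leaves $G$ unchanged, which is the symmetry behind the 2-to-1 remark above.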
The set of complex critical points is the zero locus of the ideal
$$I = \Big\langle \frac{\partial G}{\partial a}, \frac{\partial G}{\partial b}, \frac{\partial G}{\partial c}, \frac{\partial G}{\partial d}, \frac{\partial G}{\partial e}, \frac{\partial G}{\partial f} \Big\rangle : \big(\langle ad \rangle \cap \langle b-e,\, c-f \rangle\big)^{\infty}.$$
For applications, we are interested in the real points in this variety.

Computational proof of Proposition 4.1. As argued in [6, §2], the ideal $I$ is radical and zero-dimensional when the $u_{ijk}$ are generic rational numbers. The number of solutions is the degree of $I$, and we found this to be $390 = 2 \cdot 195$. This is twice the ED degree of $L_{\le 2}$ with respect to $\Lambda = \Theta$. For this computation we used the FGb library in Maple. To avoid the swelling of rational coefficients, we computed Gröbner bases over the finite field GF(65521). The data $u_{ijk}$ were chosen uniformly at random in this field, and we saturated only by $\langle ad(b-e) \rangle$. The computation took 90 seconds and returned 390 critical points of $G$. We then performed the same computation with the coefficients $1, 6, 4, 12$ in $G(a,b,c,d,e,f)$ replaced by random field elements. This gives $3626 = 2 \cdot 1813$ critical points, and hence $\mathrm{EDdegree}_{\mathrm{gen}}(L_{\le 2}) = 1813$.

Example 8. We return to the particular data set in Example 7. Using the above parametrization, the best rank 2 approximation can be obtained by solving a polynomial system. This can be achieved by symbolic or numerical methods. A numerical computation conducted by Jose Rodriguez with the software Bertini indicates that, for Thomas Schultz's data, precisely 9 of the 195 critical points are real. These correspond to 2 local minima and 7 saddle points of the Euclidean distance function. The precomputation with generic data took 2 hours on 40 AMD Opteron 6276/2.3GHz cores. The computation with the numerical data in Example 7 then took 1 minute.
These results were also computed by symbolic methods. A Gröbner basis computation conducted by Jean-Charles Faugère and Mohab Safey El Din with the software FGb returned an algebraic parametrization of the 195 complex critical points by the roots of a univariate polynomial of degree 195. This polynomial has 9 real roots, and two of them correspond to the two local minima. The integer coefficients of this univariate polynomial have 11000 digits on average. This computation used the above formulation as an unconstrained optimization problem, and it took 11 minutes on a 2.6 GHz Intel Core i7. In general, for symbolic methods, unconstrained formulations seem to be better than the general implicit formulation in Proposition 2.3; see the comparison of timings in Table 2.3. However, most instances of (1.3) do not admit an unconstrained formulation, because $L_{\le r}$ is usually not unirational. ♦

Our last topic in this section is the study of Sylvester matrices. We consider two arbitrary polynomials $F$ and $G$ in one variable $t$. Suppose their degrees are $m$ and $n$ with $m \le n$, so
$$F(t) = \sum_{i=0}^{m} a_i t^i \quad\text{and}\quad G(t) = \sum_{j=0}^{n} b_j t^j.$$
Fix $k$ with $1 \le k \le m$. The $k$-th Sylvester matrix of the pair $(F,G)$ equals
$$\mathrm{Syl}_k(F,G) = \begin{pmatrix} a_0 & 0 & \cdots & 0 & b_0 & 0 & \cdots & 0\\ \vdots & a_0 & \ddots & \vdots & \vdots & b_0 & \ddots & \vdots\\ a_m & \vdots & \ddots & 0 & b_n & \vdots & \ddots & 0\\ 0 & a_m & \ddots & a_0 & 0 & b_n & \ddots & b_0\\ \vdots & \ddots & \ddots & \vdots & \vdots & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & a_m & 0 & 0 & \cdots & b_n \end{pmatrix}$$
This matrix has $n+k$ rows and $n-m+2k$ columns. It is square for $k = m$, and it has more rows than columns for $k < m$. The maximal minors have size $n-m+2k$, and they all vanish when $\mathrm{Syl}_k(F,G)$ has a non-zero vector in its kernel. Such a vector corresponds to a polynomial of degree $m-k+1$ that is a common factor of $F$ and $G$.
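To make the column structure of $\mathrm{Syl}_k(F,G)$ concrete, here is a small Python sketch. It is our own illustration, and the function names `sylvester` and `rank` are hypothetical. The first $n-m+k$ columns hold the coefficients of $t^i F(t)$, and the last $k$ columns hold those of $t^j G(t)$, with the constant coefficient in the top row. The rank is computed exactly over $\mathbb{Q}$ by Gaussian elimination.

```python
from fractions import Fraction


def sylvester(a, b, k):
    """k-th Sylvester matrix of F = sum a_i t^i (degree m) and G = sum b_j t^j (degree n)."""
    m, n = len(a) - 1, len(b) - 1
    rows = n + k
    cols = []
    for shift in range(n - m + k):           # coefficient columns of t^shift * F
        cols.append([a[r - shift] if 0 <= r - shift <= m else 0 for r in range(rows)])
    for shift in range(k):                   # coefficient columns of t^shift * G
        cols.append([b[r - shift] if 0 <= r - shift <= n else 0 for r in range(rows)])
    return [[col[r] for col in cols] for r in range(rows)]


def rank(matrix):
    """Rank over Q by Gauss-Jordan elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


if __name__ == "__main__":
    # F = (t-1)(t-2) and G = (t-1)(t^2+1) share the factor t-1.
    S = sylvester([2, -3, 1], [-1, 1, -1, 1], 2)
    print(len(S), len(S[0]), rank(S))
```

For $F = (t-1)(t-2)$ and $G = (t-1)(t^2+1)$, the square matrix $\mathrm{Syl}_2$ has rank 4, reflecting the common factor $t-1$. The pair $F = t^2+1$, $G = t^3+2$ gives full rank 5. For $k = 1$ and $G = (t-1)(t-2)(t+5)$, the $4 \times 3$ matrix $\mathrm{Syl}_1$ has rank 2, reflecting the common factor of degree $m-k+1 = 2$.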
The approximate gcd problem in computer algebra [14, 15] aims to approximate a given pair $(F,G)$ by a nearby pair $(F^*,G^*)$ whose Sylvester matrix $\mathrm{Syl}_k(F^*,G^*)$ has linearly dependent columns. Writing $L$ for the subspace of Sylvester matrices, this is precisely our ED problem for $L_{\le n-m+2k-1}$. The following theorem furnishes a formula for $\mathrm{EDdegree}_{\mathrm{gen}}(L_{\le n-m+2k-1})$.

Theorem 4.2. For the variety of pairs $(F,G)$ of univariate polynomials of degrees $(m,n)$ with a common factor of degree $m-k+1$, the generic ED degree equals that of the Segre variety of $(m-k+2) \times (n-m+2k)$ matrices of rank 1. It is given by setting $s = 0$ in (3.4). Using the Macaulay2 function ED in Example 4, we can write this ED degree as
$$\mathrm{EDdegree}_{\mathrm{gen}}(L_{\le n-m+2k-1}) = \mathtt{ED}(m-k+2,\; n-m+2k,\; 1,\; 0).$$

Proof. A natural desingularization is given by multiplying with the desired common factor:
$$\mathbb{P}^{m-k+1} \times \mathbb{P}^{n-m+2k-1} \to L_{\le n-m+2k-1}, \qquad [A(t), (B(t), C(t))] \mapsto [A(t)B(t),\, A(t)C(t)]. \tag{4.2}$$
Here $A(t), B(t), C(t)$ are polynomials of degrees $m-k+1$, $k-1$, $n-m+k-1$, respectively. The map (4.2) lifts to a linear projection map from the Segre embedding of $\mathbb{P}^{m-k+1} \times \mathbb{P}^{n-m+2k-1}$. Work of Piene [19, §4] implies that the degrees of polar loci can be computed on that Segre variety. By Corollary 3.2, the ED degree is a sum of degrees of these. The result follows.

For $m = k$, when the Sylvester matrix is square, Theorem 4.2 refers to $2 \times (n+m)$ matrices of rank 1. Similarly to [6, Example 5.12], their ED degree is $4(m+n)-2$.

Corollary 4.3. The generic ED degree of the Sylvester determinant $\mathrm{Syl}_m$ equals $4(m+n)-2$.

We consider three natural choices of weight matrices for the low-rank approximation of Sylvester matrices.
As before in Table 4.1, we write $\Omega_{m,n}$ for the weight matrix that represents the Euclidean distance on $\mathbb{R}^{m+n+2}$. It is the matrix which has the same pattern as $\mathrm{Syl}_k$, with $a_i$ and $b_j$ replaced respectively by $1/(n-m+k)$ and $1/k$. We also write $\Theta_{m,n}$ for the weight matrix of the rotation-invariant quadratic form: $a_i$ is replaced by $1/\big((n-m+k)\binom{m}{i}\big)$ and $b_j$ is replaced by $1/\big(k\binom{n}{j}\big)$. In Table 4.2 we present the ED degrees for these choices of weights. The left table shows the generic behavior predicted by Theorem 4.2. At present, we do not know a general formula for the entries of the two tables on the right side, but we are hopeful that an approach like (3.5) will lead to such formulas. Along the rightmost margins, where the matrix $\mathrm{Syl}_m$ is square, the formula seems to be $\mathrm{EDdegree}_\Theta(L_{\le n+k-1}) = 2n$.

          Λ generic             Λ = Ω_{m,n}            Λ = Θ_{m,n}
(m,n)\k     1    2    3    4       1    2    3    4       1    2    3    4
(2,2)      10   14                 2    6                 2    4
(2,3)      39   18                23   18                19    6
(2,4)      83   22                75   22                29    8
(2,5)     143   26               119   18                61   10
(3,3)      14   83   22            2   19   10            2   19    6
(3,4)      83  143   26           35   95   26           41   53    8
(3,5)     284  219   30          188  203   26          106   81   10
(4,4)      18  284  219   30       2   36   59   14       2   50   45    8
(4,5)     143  676  311   34      47  276  215   34      71  256  101   10

Table 4.2: Weighted ED degrees for Sylvester matrices Syl_k(F, G).

Acknowledgements. We thank the following colleagues for their help with this project: Jean-Charles Faugère, William Rey, Ragni Piene, Jose Rodriguez, Mohab Safey El Din, Éric Schost, and Thomas Schultz. Giorgio Ottaviani is a member of GNSAGA-INDAM. Pierre-Jean Spaenlehauer and Bernd Sturmfels were hosted by the Max-Planck Institute für Mathematik in Bonn, Germany.
Bernd Sturmfels was also supported by the NSF (DMS-0968882).

REFERENCES

[1] D. Bates, J. Hauenstein, A. Sommese and C. Wampler, Numerically Solving Polynomial Systems with Bertini, SIAM, 2013.
[2] G. Blekherman, P. Parrilo and R. Thomas, Semidefinite Optimization and Convex Algebraic Geometry, MOS-SIAM Series on Optimization 13, SIAM, Philadelphia, 2013.
[3] H.-C. G. von Bothmer and K. Ranestad, A general formula for the algebraic degree in semidefinite programming, Bull. London Math. Soc. 41 (2009) 193–197.
[4] F. Catanese, S. Hoşten, A. Khetan and B. Sturmfels, The maximum likelihood degree, American J. Math. 128 (2006) 671–697.
[5] M. Chu, R. Funderlic and R. Plemmons, Structured low rank approximation, Linear Algebra Appl. 366 (2003) 157–172.
[6] J. Draisma, E. Horobeţ, G. Ottaviani, B. Sturmfels and R. Thomas, The Euclidean distance degree of an algebraic variety, arXiv:1309.0049.
[7] I. Dolgachev and M. Kapranov, Arrangement of hyperplanes and vector bundles on $\mathbb{P}^n$, Duke Math. J. 71 (1993) 633–664.
[8] J.-C. Faugère, A new efficient algorithm for computing Gröbner bases without reduction to zero (F5), in Proceedings of ISSAC 2002, 75–83. FGb library available at http://www-polsys.lip6.fr/~jcf/Software/FGb/.
[9] S. Friedland and M. Stawiska, Best approximation on semi-algebraic sets and k-border rank approximation of symmetric tensors, arXiv:1311.1561.
[10] W. Fulton, Intersection Theory, Springer, Berlin, 1998.
[11] M. Golubitsky and V. Guillemin, Stable Mappings and their Singularities, Springer-Verlag, New York, 1974.
[12] D.R. Grayson and M.E. Stillman, Macaulay2, a Software System for Research in Algebraic Geometry. Available at http://www.math.uiuc.edu/Macaulay2/.
[13] A. Holme, The geometric and numerical properties of duality in projective algebraic geometry, Manuscripta Math. 61 (1988) 145–162.
[14] N.K. Karmarkar and Y.N. Lakshman, On approximate GCDs of univariate polynomials, J. Symbolic Comput. 26 (1998) 653–666.
[15] E. Kaltofen, Z. Yang and L. Zhi, Structured low rank approximation of a Sylvester matrix, in: D. Wang, L. Zhi (Eds.): Symbolic-Numeric Computation, Trends in Mathematics, Birkhäuser, 2007, pp. 69–83.
[16] J.M. Landsberg, Tensors: Geometry and Applications, Graduate Studies in Mathematics 128, American Math. Society, Providence, 2012.
[17] J.H. Manton, R. Mahony and Y. Hua, The geometry of weighted low-rank approximation, IEEE Transactions on Signal Processing 51 (2003) 500–514.
[18] I. Markovsky, Structured low-rank approximation and its applications, Automatica 44 (2008), no. 4, 891–909.
[19] R. Piene, Polar classes of singular varieties, Ann. Sci. École Norm. Sup. (4) 11 (1978) 247–276.
[20] C. Raicu, Secant varieties of Segre-Veronese varieties, Algebra and Number Theory 6 (2012) 1817–1868.
[21] K. Ranestad, Algebraic degree in semidefinite and polynomial optimization, in: J.-B. Lasserre and M. Anjos (Eds.): Handbook on Semidefinite, Conic and Polynomial Optimization, Springer, 2012, pp. 61–75.
[22] W. Rey, On weighted low-rank approximation, arXiv:1302.0360.
[23] T. Schultz, A. Fuster, A. Ghosh, R. Deriche, L. Florack and L.-H. Lim, Higher-order tensors in diffusion imaging, in: Visualization and Processing of Tensors and Higher Order Descriptors for Multi-Valued Data, Springer, 2013.
[24] N. Srebro and T. Jaakkola, Weighted low-rank approximations, International Conference on Machine Learning (2003) 720–727.