Atomic Gradient Flows: Gradient Flows on Sparse Representations



Christian Amend∗, Marcello Carioni∗, Konstantinos Zemas†

Abstract

One of the most popular approaches for solving total variation-regularized optimization problems in the space of measures are Particle Gradient Flows (PGFs). These restrict the problem to linear combinations of Dirac deltas and then perform a Euclidean gradient flow in the weights and positions, significantly reducing the computational cost while still decreasing the energy. In this work, we generalize PGFs to convex optimization problems in arbitrary Banach spaces, which we call Atomic Gradient Flows (AGFs). To this end, the crucial ingredient turns out to be the right notion of particles, or atoms, chosen here as the extremal points of the unit ball of the regularizer. This choice is motivated by the Krein–Milman theorem, which ensures that minimizers can be approximated by linear combinations of extremal points or, as we call them, sparse representations. We investigate metric gradient flows of the optimization problem when restricted to such sparse representations, for which we define a suitable discretized functional that we show to be consistent with the original problem by means of Γ-convergence. We prove that the resulting evolution of the latter is well-defined using a minimizing movement scheme, and we establish conditions ensuring λ-convexity and uniqueness of the flow. These conditions crucially depend on the geometric properties of the set of extremal points as a metric space. Then, using Choquet's theorem, we lift the problem into the Wasserstein space on weights and extremal points, and consider Wasserstein gradient flows in this lifted setting. As observed for PGFs, this lifted perspective is essential for understanding stability and convergence properties of AGFs.
Our main result is that the lifting of the AGF evolution is again a metric gradient flow in the Wasserstein space, verifying the consistency of the approach with respect to a Wasserstein-type dynamic. Finally, we illustrate the applicability of AGFs to several relevant infinite-dimensional problems, including optimization of functions of bounded variation and curves of measures regularized by Optimal Transport-type penalties.

2020 Mathematics Subject Classification: 47J35, 49J27, 49J52, 46A55, 28A33, 47A52

∗ Department of Applied Mathematics, University of Twente, 7500AE Enschede, The Netherlands (christian.amend@utwente.nl, m.c.carioni@utwente.nl)
† Institute for Applied Mathematics, University of Bonn, Endenicher Allee 60, 53115 Bonn, Germany (zemas@iam.uni-bonn.de)

Contents

1 Introduction 2
2 Setting and Preliminaries 6
3 Formulation of the Atomic Gradient Flow 9
  3.1 Discretization of the functional J and consistency 9
  3.2 Definition of AGFs through minimizing movements 15
  3.3 λ-Convexity and curves of maximal slope 16
  3.4 NPC of the extremal points and uniqueness of the flow 21
4 The lifted problem 21
  4.1 Lifting the problem in Wasserstein space 22
  4.2 Minimizing movements for the lifted problem 24
  4.3 λ-Convexity for the lifted problem 26
5 Relating the minimizing movements 29
  5.1 The one-particle case (n = 1) 30
  5.2 The many-particle case 32
  5.3 Semi-Discrete Optimal Transport on compact metric spaces 39
6 Examples 42
  6.1 Total variation regularization in the space of measures 43
  6.2 One-dimensional BV functions 43
  6.3 Benamou–Brenier dynamical formulation of optimal transport 45
  6.4 Further examples 46
A Appendix 50
  A.1 Minimizing movements and curves of maximal slope 50
  A.2 Proof of Lemma 5.3 52

1 Introduction

Solving convex optimization problems posed in infinite-dimensional spaces has long been a central challenge in optimization. Over the years, a wide range of algorithms with rigorous convergence schemes have been devised in order to efficiently face this challenge, developing theory and techniques that are specifically adapted to infinite-dimensional settings. On top of that, infinite-dimensional modeling has surged in popularity in fields that are traditionally more application-driven, such as data science, inverse problems and imaging.

In this context, variational problems posed over spaces of measures have attracted growing attention, with a particularly effective approach being the so-called Particle Gradient Flows (PGFs). Roughly speaking, given an ambient space $X$, PGFs solve a convex optimization problem in the space of measures $\mathcal{M}(X)$ of the form
\[
  \inf_{\mu \in \mathcal{M}(X)} F\Big( \int_X \varphi \, d\mu \Big) + \|\mu\|_{TV}, \tag{1.1}
\]
by restricting the search to empirical (or so-called sparse) measures of the form
\[
  \mu_{\mathrm{sparse}} := \sum_{j=1}^{n} (c^j)^2 \, \delta_{x^j}, \quad \text{where } (c^j, x^j) \in \mathbb{R}_+ \times X.
\]
The resulting optimization problem is then solved by performing a gradient flow with respect to the particle weights and positions $(c^j, x^j)$, driven by the energy (1.1).
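As a concrete numerical illustration (ours, not from the paper), the following minimal sketch runs a PGF for a toy instance of (1.1) on $X = \mathbb{R}$, under assumed choices: a hypothetical feature map $\varphi(x) = (\cos x, \sin x)$, quadratic fidelity $F(y) = \tfrac12 \|y - b\|^2$, and a positive sparse measure $\mu = \sum_j (c^j)^2 \delta_{x^j}$, whose TV norm is $\sum_j (c^j)^2$:

```python
import numpy as np

# Toy Particle Gradient Flow for (1.1) on X = R (illustrative assumptions):
# phi(x) = (cos x, sin x), F(y) = 0.5*|y - b|^2, mu = sum_j c_j^2 delta_{x_j},
# so that E(c, x) = F(sum_j c_j^2 phi(x_j)) + sum_j c_j^2 (= TV norm term).
# Explicit Euler steps on the weights c and positions x decrease the energy.

def phi(x):                                   # (2, n): features of all particles
    return np.stack([np.cos(x), np.sin(x)])

def dphi(x):                                  # derivative of the feature map
    return np.stack([-np.sin(x), np.cos(x)])

def energy(c, x, b):
    y = phi(x) @ c**2
    return 0.5 * np.sum((y - b)**2) + np.sum(c**2)

def gradients(c, x, b):
    r = phi(x) @ c**2 - b                     # residual, i.e. F'(K mu)
    gc = 2.0 * c * (phi(x).T @ r + 1.0)       # dE/dc_j = 2 c_j (<phi(x_j), r> + 1)
    gx = c**2 * (dphi(x).T @ r)               # dE/dx_j = c_j^2 <phi'(x_j), r>
    return gc, gx

rng = np.random.default_rng(0)
n, b = 5, np.array([0.3, 0.4])
c = rng.uniform(0.1, 0.5, n)                  # initial weights
x = rng.uniform(0.0, 2.0 * np.pi, n)          # initial positions
tau = 0.01                                    # explicit Euler step size
E0 = energy(c, x, b)
for _ in range(1000):
    gc, gx = gradients(c, x, b)
    c, x = c - tau * gc, x - tau * gx

assert energy(c, x, b) < E0                   # the flow decreases the energy
```

This is exactly the dynamic that AGFs generalize: the Diracs $\delta_{x^j}$ play the role of the atoms, and the flow acts jointly on weights and atoms.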
Empirically, PGFs exhibit strong performance provided that the number of particles $n$ is large enough. Several works have established theoretical results such as convergence to global minimizers under suitable assumptions, by relating the particle dynamics to the Wasserstein gradient flow of the original functional [23, 22, 49, 32]. Moreover, variants in the space of measures have been considered [48, 21, 24, 7] and applications to machine learning have been explored [41, 1, 43, 27].

Generally, PGFs belong to the class of sparse optimization methods in infinite-dimensional spaces; one operates by directly optimizing over sparse measures, thus taking advantage of the sparse structure of the problem. Due to their simplicity, it is natural to ask whether approaches similar to PGFs can be applied to more general infinite-dimensional optimization problems, where it is not even clear what the sparse objects should be; this is the guiding question of this work.

Our goal is to extend the PGF approach beyond measure spaces. In particular, in our setting we consider composite convex minimization problems posed on a Banach space $\mathcal{M}$ of the form
\[
  \inf_{u \in \mathcal{M}} J(u), \quad \text{where } J(u) := F(Ku) + \mathcal{R}(u), \tag{1.2}
\]
where $F$ is a convex fidelity term, $K : \mathcal{M} \to Y$ is a linear operator mapping into a Hilbert space, and $\mathcal{R}$ is a convex, 1-homogeneous regularizer; cf. Section 2 for details.

A key insight underlying our approach comes from recent developments in infinite-dimensional sparsity and representation theory. Sparsity is traditionally understood as the possibility to represent solutions using only a small number of simple building blocks (atoms). In the recent works [10, 11], it has been shown that for optimization problems of the form (1.2), the natural atoms are given by the extremal points of the unit ball of the regularizer, $\mathrm{Ext}(B)$, where
\[
  B := \{ u \in \mathcal{M} : \mathcal{R}(u) \leqslant 1 \}.
\]
In particular, when the operator $K$ maps into a finite-dimensional space, representer theorems guarantee the existence of solutions that can be expressed as finite linear combinations of such extremal points. Motivated by this theory, we focus on sparse ansatzes of the form
\[
  u_{\mathrm{sparse}} := \sum_{j=1}^{n} (c^j)^2 u^j, \quad \text{where } (c^j, u^j) \in \mathbb{R}_+ \times \widetilde{B}, \tag{1.3}
\]
and $\widetilde{B} := \overline{\mathrm{Ext}(B)}^{\,*}$ is the weak*-closure of the set of extremal points. This representation serves as the foundation for a generalized particle-based optimization framework, applicable to a wide class of convex problems.

Crucially, on the compact set $\widetilde{B}$, the weak*-topology of $\mathcal{M}$ can be metrized by a metric $d_*$. Thanks to this, we can take advantage of the well-developed theory of gradient flows in metric spaces [4] in order to construct the gradient flow of $(\mathbf{c}, \mathbf{u}) := (c^j, u^j)_{j=1}^n \in \mathbb{R}_+^n \times \widetilde{B}^n$ according to the energy in (1.2) restricted to sparse elements of the form (1.3), that is,
\[
  J_n(\mathbf{c}, \mathbf{u}) := J\bigg( \frac{1}{n} \sum_{j=1}^{n} (c^j)^2 u^j \bigg). \tag{1.4}
\]
We name such general optimization schemes Atomic Gradient Flows (in short, AGFs) to indicate that the gradient flow is performed at the level of the atoms. More precisely, we construct AGFs through a minimizing movement approach, whose well-posedness will be based on the assumptions for the optimization problem (1.2); cf. again Section 2 for the precise setup. As detailed also in Subsection 6.1, our work directly generalizes PGFs, since for $\mathcal{R} := \|\cdot\|_{TV}$ and $\mathcal{M} := \mathcal{M}(X)$, the extremal points are $\mathrm{Ext}(B) = \{\pm \delta_x : x \in X\}$.

Importantly, the motivation for the definition of AGFs is that the discretized functional $J_n$ provides a variational approximation of the target functional $J$ in the sense of Γ-convergence. As a consequence, minimizers of $J_n$ converge, as $n \to +\infty$, to minimizers of $J$, at least under suitable boundedness assumptions on the weights.
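To make the role of the extremal points concrete in the prototypical case $\mathcal{R} = \|\cdot\|_{TV}$, $\mathcal{M} = \mathcal{M}(X)$, the following self-contained sketch (our illustration, not code from the paper) checks numerically that the TV norm of a positive combination of Diracs is additive in the weights, while signed combinations of the atoms $\pm\delta_x$ can lose mass through cancellation:

```python
import math

# TV norm of a discrete signed measure sum_j w_j * delta_{x_j}: atoms at the
# same position are merged first, then absolute values of merged weights summed.
def tv_norm(weights, positions):
    merged = {}
    for w, x in zip(weights, positions):
        merged[x] = merged.get(x, 0.0) + w
    return sum(abs(w) for w in merged.values())

# Positive weights: the TV norm is additive in the weights (no loss of mass).
w, xs = [0.5, 1.5, 2.0], [0.1, 0.4, 0.4]
assert math.isclose(tv_norm(w, xs), sum(w))

# A signed combination of +delta_x and -delta_x cancels: mass is lost.
assert math.isclose(tv_norm([1.0, -1.0], [0.3, 0.3]), 0.0, abs_tol=1e-12)
```

The cancellation in the second check is one reason the evolution is later restricted to a subset of atoms such as $\{\delta_x : x \in X\}$, on which the no-loss-of-mass identity holds.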
This provides a theoretical justification for studying the gradient flow of $J_n$: for large $n$, the sparse dynamics of $J_n$ can be viewed as a tractable proxy for approximating minimizers of the infinite-dimensional problem (1.2).

After introducing the main setup and recalling standard convexity and derivative notions in metric spaces in Section 2, in Section 3 we introduce the discretized functional and first justify the consistency of our approach using Γ-convergence; cf. Subsection 3.1. In Subsection 3.2 we establish the well-posedness of AGFs via a minimizing movement scheme, and then turn to the investigation of qualitative properties of the flow. In particular, in Subsections 3.3–3.4 we identify conditions on the linear operator $K$ and on the interaction between the regularizer and the geometry of extremal points which ensure that $J_n$ is λ-convex for some $\lambda \in \mathbb{R}$. This property is fundamental in the theory of gradient flows in metric spaces, as it guarantees that minimizing movements are curves of maximal slope with respect to the local slope $|\partial J_n|$. We then address uniqueness of the minimizing movements defining AGFs, under the additional assumption that the set of extremal points is non-positively curved (NPC). The NPC condition is a classical geometric assumption in the theory of gradient flows in metric spaces and plays a central role in establishing contraction and regularity properties [37].

In Section 4 we introduce and analyze a suitable lifting of the functional $J$ to the space $\mathcal{P}_2(\mathbb{R}_+ \times \widetilde{B})$ of probability measures on $\mathbb{R}_+ \times \widetilde{B}$, with the aim of studying its dynamics as a Wasserstein gradient flow.
This construction is motivated by the observation that, as shown in [23], interpreting PGFs as Wasserstein gradient flows in the space of probability measures over weights and positions is crucial for understanding both their dynamical behavior and their convergence properties. In our more general setting, however, such an interpretation is not immediate, since the underlying variational problem is formulated in a Banach space and is therefore not directly expressible in terms of probability measures. Instead, one can rely on Choquet's theorem [40], which ensures that every $u \in B$ can be represented as the weak barycenter of a positive measure $\mu \in \mathcal{M}_+(\widetilde{B})$ supported on the set of extremal points, i.e.,
\[
  u = \int_{\widetilde{B}} v \, d\mu(v).
\]
This representation provides the bridge between the original Banach-space formulation and its lifted counterpart in Wasserstein space. Thanks to this identification, and further factorizing the mass of the measure $\mu \in \mathcal{M}_+(\widetilde{B})$ in the domain, following [14], one can define the lifted problem
\[
  \min_{\nu \in \mathcal{P}_2(\mathbb{R}_+ \times \widetilde{B})} \mathcal{J}(\nu), \quad \text{where } \mathcal{J}(\nu) := F\bigg( \int_{\mathbb{R}_+ \times \widetilde{B}} c^2 K u \, d\nu(c, u) \bigg) + \int_{\mathbb{R}_+ \times \widetilde{B}} c^2 \, d\nu(c, u). \tag{1.5}
\]
We advocate that the convergence properties of AGFs can be understood by analyzing the metric gradient flow of the lifted functional $\mathcal{J}$. A first crucial observation supporting this perspective is that the minimization problems (1.5) and (1.2) are equivalent, thanks to the surjectivity of the Choquet representation. This equivalence ensures that the lifted formulation faithfully captures the original variational problem. After establishing this equivalence and deriving the fundamental properties of the lifted functional in Subsection 4.1, in the rest of the section we study its minimizing movement scheme.
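The identification between sparse representations and lifted measures can be made tangible: in the TV prototype (atoms $\delta_x$), evaluating the lifted functional (1.5) at the empirical measure $\nu = \tfrac1n \sum_j \delta_{(c^j, u^j)}$ reproduces the discretized energy (1.4) by construction. The sketch below (our illustration, with an assumed Gaussian forward operator and data) makes this identity explicit:

```python
import numpy as np

# TV prototype with assumed measurements: K delta_x = (k_1(x), ..., k_m(x)),
# k_i(x) = exp(-(x - t_i)^2), F(y) = 0.5*|y - b|^2, R = TV.
t = np.linspace(0.0, 1.0, 4)
b = np.array([0.1, 0.2, 0.3, 0.4])

def K_delta(x):                               # forward operator on the atom delta_x
    return np.exp(-(x - t)**2)

def F(y):
    return 0.5 * np.sum((y - b)**2)

def J_n(c, x):                                # discretized energy (1.4)
    n = len(c)
    y = sum(cj**2 * K_delta(xj) for cj, xj in zip(c, x)) / n
    return F(y) + np.sum(np.square(c)) / n

def J_lifted(pairs):                          # lifted functional (1.5) evaluated
    n = len(pairs)                            # at an empirical measure on R+ x B
    y = sum(cj**2 * K_delta(xj) for cj, xj in pairs) / n
    mass = sum(cj**2 for cj, _ in pairs) / n
    return F(y) + mass

c = np.array([0.6, 1.1, 0.9])
x = np.array([0.2, 0.5, 0.8])
# The lifted energy of the empirical measure equals J_n(c, x):
assert np.isclose(J_n(c, x), J_lifted(list(zip(c, x))))
```

The identity is tautological for empirical measures; the content of the lifted perspective is that (1.5) also makes sense for diffuse $\nu$, which is what enables the Wasserstein analysis.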
Our setting differs substantially from that of [23]: in contrast to problems where the extremal points possess additional properties, here we must consider a gradient flow on the metric space $\mathbb{R}_+ \times \widetilde{B}$, which in general lacks any differential structure. As a consequence, classical tools based on continuity equations and differential calculus are not available, which requires a fully metric approach and significantly complicates the analysis. We prove well-posedness of the minimizing movement scheme for $\mathcal{J}$ and establish its λ-convexity under the same assumptions that guarantee λ-convexity of $J_n$. In particular, under these conditions, every minimizing movement is a curve of maximal slope with respect to the metric slope $|\partial \mathcal{J}|$.

Finally, in Section 5, which contains the main result of this paper, we establish a precise link between the AGF of $J_n$ and the corresponding metric gradient flow of $\mathcal{J}$. Specifically, we show that if $(\mathbf{c}_t, \mathbf{u}_t)$ is a curve of maximal slope for $J_n$ with respect to $|\partial J_n|$, then the curve of empirical measures
\[
  t \mapsto \frac{1}{n} \sum_{j=1}^{n} \delta_{(c^j_t, u^j_t)} \tag{1.6}
\]
is a curve of maximal slope for $\mathcal{J}$ with respect to $|\partial \mathcal{J}|$. The significance of this result is that it shows that the metric gradient flow of $\mathcal{J}$ recovers (at a discrete level) the dynamics of the initial AGF, and thus sets the necessary basis for analyzing convergence of the AGF to minimizers of $\mathcal{J}$ as $n \to +\infty$; this analysis is left to future work. The proof relies on a careful comparison between the metric slopes of $J_n$ and $\mathcal{J}$, using techniques from semi-discrete optimal transport [5, 38] generalized to metric spaces. These tools allow us to localize the slope estimates around each Dirac mass appearing in the empirical representation (1.6).

We conclude the paper by presenting in Section 6 several natural examples illustrating the applicability of AGFs to different variational problems.
For each example, we provide the characterization of $\mathrm{Ext}(B)$ and analyze the metric induced by the weak*-topology on $\widetilde{B}$. We investigate whether this metric space is NPC, which in turn would imply uniqueness of the AGFs, and we examine the structure of the lifting in each case. More precisely, we cover the following examples: optimization problems in the space of measures, recovering the framework of [23]; one-dimensional BV functions regularized by their total variation seminorm [11, 18]; and dynamic problems in the space of time-dependent measures regularized by the Benamou–Brenier energy [13, 12, 31, 20]. Moreover, we briefly mention how AGFs could be applied to optimization problems regularized with KR-norms [18, 8], scalar differential operators [11, 46] and higher-dimensional BV functions [3].

2 Setting and Preliminaries

Throughout this paper we consider minimization problems of the form
\[
  \inf_{u \in \mathcal{M}} J(u), \quad \text{where } J(u) := F(Ku) + \mathcal{R}(u), \tag{2.1}
\]
where $\mathcal{M}$ is the topological dual of a separable Banach space $C$. The norm on $C$ will be denoted by $\|\cdot\|_C$ and the duality pairing between $u \in \mathcal{M}$ and $p \in C$ by $\langle u, p \rangle$. The space $\mathcal{M}$ is a Banach space when equipped with the canonical dual norm
\[
  \|u\|_{\mathcal{M}} := \sup \{ \langle u, p \rangle : \|p\|_C \leqslant 1 \}. \tag{2.2}
\]
The forward operator $K : \mathcal{M} \to Y$ is a linear operator mapping into a Hilbert space $Y$. The inner product and induced norm on $Y$ will be denoted by $(\cdot, \cdot)_Y$ and $\|\cdot\|_Y$ respectively. We make the following standard assumptions on $F$, $K$ and $\mathcal{R}$.

(A1) The forward operator $K : \mathcal{M} \to Y$ is linear and weak*-to-weak continuous from $\mathcal{M}$ into a Hilbert space $Y$.

(A2) The fidelity term $F : Y \to \mathbb{R}$ is bounded from below, convex, and twice Fréchet differentiable on $Y$.

(A3) The regularizer $\mathcal{R} : \mathcal{M} \to [0, +\infty]$ is convex and positively one-homogeneous, i.e.,
\[
  \mathcal{R}(\lambda u) = \lambda \mathcal{R}(u) \quad \forall \lambda \geqslant 0,\ u \in \mathcal{M}, \tag{2.3}
\]
and is also weak*-lower semicontinuous.
(A4) For every $\alpha \geqslant 0$, the sublevel set
\[
  S^-_{\alpha}(\mathcal{R}) := \{ u \in \mathcal{M} : \mathcal{R}(u) \leqslant \alpha \} \tag{2.4}
\]
is weak*-compact.

(A5) The forward operator $K : \mathcal{M} \to Y$ is sequentially weak*-to-strong continuous on the domain of $\mathcal{R}$, defined by
\[
  \mathrm{Dom}(\mathcal{R}) := \{ u \in \mathcal{M} : \mathcal{R}(u) < +\infty \}. \tag{2.5}
\]

Note that, due to (A1), there exists a linear and continuous operator $K_* : Y \to C$, the adjoint operator of $K$, which satisfies
\[
  \langle u, K_* y \rangle = (Ku, y)_Y \quad \forall u \in \mathcal{M},\ y \in Y; \tag{2.6}
\]
see, for example, [16, Remark 3.2]. Moreover, the existence of the pre-adjoint $K_*$ implies the strong-to-strong continuity of $K$ on $\mathcal{M}$. Note further that (A4) implies that the sublevel sets are weak*-closed and norm bounded. It is immediate to verify (see also [14, Proposition 2.3]) that, under the above assumptions, the existence of a minimizer to (2.1) is guaranteed. In what follows, we will denote the unit ball of $\mathcal{R}$ by
\[
  B := S^-_1(\mathcal{R}) = \{ u \in \mathcal{M} : \mathcal{R}(u) \leqslant 1 \}. \tag{2.7}
\]

Definition 2.1. An element $u \in B \subset \mathcal{M}$ is called an extremal point of $B$ if there exists no choice of $u_1, u_2 \in B$ with $u_1 \neq u_2$ and $s \in (0, 1)$ such that $u = (1 - s) u_1 + s u_2$. The set of all extremal points of $B$ is denoted by $\mathrm{Ext}(B)$. We also set
\[
  \widetilde{B} := \overline{\mathrm{Ext}(B)}^{\,*}. \tag{2.8}
\]
A consequence of the Krein–Milman theorem [35] is that $B$ is the weak*-closure of the convex envelope of $\widetilde{B}$.

Remark 2.2. Note that by (A3) and (A4), $\widetilde{B}$ is weak*-compact and non-empty. Since the predual space $C$ is separable, there exists a metric $d_{\widetilde{B}}$ metrizing the weak*-convergence on $\widetilde{B}$, i.e., for all sequences $(u_k)_{k \in \mathbb{N}} \subset \widetilde{B}$ and $u \in \widetilde{B}$ we have:
\[
  u_k \overset{*}{\rightharpoonup} u \text{ as } k \to \infty \iff \lim_{k \to \infty} d_{\widetilde{B}}(u_k, u) = 0. \tag{2.9}
\]
In particular, $(\widetilde{B}, d_{\widetilde{B}})$ is a compact metric space and thus separable. Moreover, compactness of $(\widetilde{B}, d_{\widetilde{B}})$ guarantees the existence of $u_1, u_2 \in \widetilde{B}$ such that
\[
  d_{\widetilde{B}}(u_1, u_2) = \sup \big\{ d_{\widetilde{B}}(u, v) : u, v \in \widetilde{B} \big\} < +\infty. \tag{2.10}
\]

We next recall some of the necessary terminology for metric spaces. In what follows, given a metric space $(X, d)$, a curve $\gamma : [0, 1] \to X$ is always understood to be continuous with respect to the metric $d$. Of course, the interval of parametrization $[0, 1]$ can also be replaced with any subinterval of $[0, +\infty)$.

Definition 2.3 (Geodesic space). Let $(X, d)$ be a metric space. For $x, y \in X$, a curve $\gamma : [0, 1] \to X$ is called a geodesic between $\gamma_0 := x$ and $\gamma_1 := y$ if
\[
  d(\gamma_s, \gamma_t) = |s - t| \, d(\gamma_0, \gamma_1) \quad \forall s, t \in [0, 1]. \tag{2.11}
\]
Then $(X, d)$ is called geodesically complete iff for every $x, y \in X$ there exists a geodesic $\gamma$ between $x$ and $y$. The space of all such constant-speed geodesics in $X$ will be denoted by $\mathrm{Geo}(X)$. Note that, for simplicity, we only use constant-speed geodesics parametrized on the unit interval.

Definition 2.4 (Convexity and λ-convexity along curves). A functional $F : X \to (-\infty, +\infty]$ is said to be convex along a curve $\gamma : [0, 1] \to X$, $\gamma_t := \gamma(t)$, iff
\[
  F(\gamma_t) \leqslant (1 - t) F(\gamma_0) + t F(\gamma_1) \quad \forall t \in [0, 1], \tag{2.12}
\]
and, given $\lambda \in \mathbb{R}$, $F$ is said to be λ-convex along $\gamma$ iff
\[
  F(\gamma_t) \leqslant (1 - t) F(\gamma_0) + t F(\gamma_1) - \frac{\lambda}{2} t (1 - t) \, d^2(\gamma_0, \gamma_1) \quad \forall t \in [0, 1]. \tag{2.13}
\]

For metric gradient flows, properties of the metric itself are crucial for well-posedness, for instance in establishing uniqueness as in Subsection 3.4. Moreover, it is well known (cf. [37]) that $(X, d)$ being a so-called space of global non-positive curvature (NPC) in the sense of Alexandrov implies regularity of the gradient flows. Recall that there are various equivalent conditions for a geodesic space $(X, d)$ to be NPC (for more details see for instance [6, 17]).

Definition 2.5 (NPC space).
A geodesic metric space $(X, d)$ is a space of (global) non-positive curvature iff for every triplet of points $\gamma_0, \gamma_1, w \in X$ and every geodesic $\gamma : [0, 1] \to X$ between $\gamma_0$ and $\gamma_1$, the following inequality holds:
\[
  d^2(\gamma_t, w) \leqslant (1 - t) \, d^2(\gamma_0, w) + t \, d^2(\gamma_1, w) - t (1 - t) \, d^2(\gamma_0, \gamma_1) \quad \forall t \in [0, 1]. \tag{2.14}
\]

Next, we collect some basic definitions of differentiability of curves and functionals on metric spaces, which we will use throughout the sequel, and refer the reader to [4, Chapter 1] for more details.

Definition 2.6 (Absolutely continuous curves). Let $(X, d)$ be a complete metric space and $p \in [1, +\infty]$. We say that a curve $\gamma : [0, 1] \to X$ belongs to the space of $p$-absolutely continuous curves $AC^p([0, 1]; X)$ iff there exists $m \in L^p(0, 1)$ such that
\[
  d(\gamma_s, \gamma_t) \leqslant \int_s^t m(r) \, dr \quad \forall\, 0 \leqslant s \leqslant t \leqslant 1, \tag{2.15}
\]
and we define $AC^p([a, b]; X)$ analogously for all $a, b \in [0, +\infty)$ with $a < b$. In the case $p = 1$, the above definition reduces to that of absolutely continuous curves, and we denote the corresponding space simply by $AC([a, b]; X)$. In addition, we set
\[
  AC([0, \infty); X) := \bigcap_{n=1}^{\infty} AC([0, n]; X), \quad AC_{\mathrm{loc}}([0, \infty); X) := \{ \gamma : \gamma \in AC([a, b]; X) \ \forall\, 0 \leqslant a < b < \infty \}.
\]

Definition 2.7 (Metric derivative). Let $(X, d)$ be a complete metric space. For any $\gamma \in AC([0, 1]; X)$, we define the metric derivative of $\gamma$ at $t \in (0, 1)$ as
\[
  |\gamma'|(t) := \lim_{h \to 0} \frac{d(\gamma(t + h), \gamma(t))}{|h|}. \tag{2.16}
\]
The above limit exists for $\mathcal{L}^1$-a.e. $t \in (0, 1)$, $|\gamma'| \in L^1(0, 1)$, and $|\gamma'|$ is exactly the minimal function $m \in L^1(0, 1)$ satisfying the inequality (2.15).

We next turn to the standard definitions of differentials for functionals defined on metric spaces.

Definition 2.8 (Local slope). Let $F : X \to (-\infty, +\infty]$ have proper effective domain, i.e.,
\[
  \mathrm{Dom}(F) := \{ u \in X : F(u) < +\infty \} \neq \emptyset.
\]
We define the local slope of $F$ at a point $u \in \mathrm{Dom}(F)$ as
\[
  |\partial F|(u) := \limsup_{w \to u} \frac{(F(u) - F(w))^+}{d(u, w)}. \tag{2.17}
\]

Under suitable assumptions, the local slope is indeed a kind of metric gradient for the functional, in the following sense.

Definition 2.9 (Strong upper gradient). A function $g : X \to [0, +\infty]$ is a strong upper gradient for $F : X \to (-\infty, +\infty]$ iff for every $\gamma \in AC([0, 1]; X)$, the function $g \circ \gamma$ is Borel and the following inequality holds:
\[
  |F(\gamma(t)) - F(\gamma(s))| \leqslant \int_s^t g(\gamma(r)) \, |\gamma'|(r) \, dr \quad \forall\, 0 < s \leqslant t < 1. \tag{2.18}
\]
In particular, if $(g \circ \gamma) \, |\gamma'| \in L^1(0, 1)$, then $F \circ \gamma \in AC([0, 1]; X)$ and
\[
  |(F \circ \gamma)'|(t) \leqslant g(\gamma(t)) \, |\gamma'|(t) \quad \text{for } \mathcal{L}^1\text{-a.e. } t \in (0, 1). \tag{2.19}
\]

We next recall the concept of curves of maximal slope, which is a suitable generalization of the standard notion of gradient flows to the metric setting.

Definition 2.10 (Curves of maximal slope). A curve $\gamma \in AC_{\mathrm{loc}}([0, +\infty); X)$ is said to be a curve of maximal slope for a functional $F : X \to (-\infty, +\infty]$ with respect to its strong upper gradient $g$, iff the function $F \circ \gamma : [0, \infty) \to (-\infty, \infty]$ is $\mathcal{L}^1$-a.e. equal to a non-increasing map $\varphi$, and
\[
  \varphi'(t) \leqslant -\frac{1}{2} |\gamma'|^2(t) - \frac{1}{2} g^2(\gamma(t)) \quad \text{for } \mathcal{L}^1\text{-a.e. } t > 0. \tag{2.20}
\]

3 Formulation of the Atomic Gradient Flow

In this section we present a discretization approach to problem (2.1), which we show to be consistent with the original problem by means of Γ-convergence. To approximate minimizers of (2.1), we then formalize the evolution of the discretized functional through Atomic Gradient Flows (in short, AGFs) by using a minimizing movement approach.

3.1 Discretization of the functional J and consistency

We start by setting the necessary notation to define the restriction of the optimization functional to sparse representations.
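As a hands-on illustration of the minimizing movement approach just mentioned, the following sketch (our construction, with assumed data and kernel) runs the scheme for a single particle in the TV prototype: each step minimizes $J_1(c, x) + d_\Omega((c,x),(c_k,x_k))^2/(2\tau)$ by brute-force grid search, where $d_\Omega$ is the product metric with the assumption $d_B(\delta_x, \delta_y) = |x - y|$ on Diracs over a compact interval:

```python
import numpy as np

# Minimizing movement for one particle (n = 1), TV prototype with an assumed
# Gaussian kernel.  Each step solves, by brute-force grid search,
#   (c_{k+1}, x_{k+1}) in argmin J_1(c, x) + d_Omega((c,x),(c_k,x_k))^2 / (2 tau),
# with d_Omega((c,x),(c',x'))^2 = |c - c'|^2 + |x - x'|^2.

t = np.linspace(0.0, 1.0, 5)
b = 0.8 * np.exp(-(0.6 - t)**2)               # data generated by a Dirac at 0.6

def J1(c, x):                                 # J_1(c, x) = F(c^2 K delta_x) + c^2
    y = c**2 * np.exp(-(x - t)**2)
    return 0.5 * np.sum((y - b)**2) + c**2

cs = np.linspace(0.0, 1.5, 61)                # search grid for the weight c
xs = np.linspace(0.0, 1.0, 101)               # search grid for the position x
C, X = np.meshgrid(cs, xs, indexing="ij")
Jgrid = (0.5 * np.sum((C[..., None]**2 * np.exp(-(X[..., None] - t)**2)
                       - b)**2, axis=-1) + C**2)

tau = 0.1                                     # time step of the scheme
c, x = 1.2, 0.1                               # initial particle
E0 = J1(c, x)
for _ in range(200):                          # implicit (JKO-type) steps
    step = Jgrid + ((C - c)**2 + (X - x)**2) / (2.0 * tau)
    i, j = np.unravel_index(np.argmin(step), step.shape)
    c, x = C[i, j], X[i, j]

assert J1(c, x) <= E0                         # each step decreases the energy
```

The monotone decrease is the standard minimizing-movement estimate: the minimizer of the step functional is at least as good as staying put. The grid search merely stands in for the exact proximal step, for illustration only.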
In what follows, we always adopt the convention that components of vectors are denoted by superscripts, while subscripts typically denote a time parameter.

Definition 3.1. Recalling the definitions of $\widetilde{B}$ in (2.8) and the functional $J$ in (2.1), let us fix $L > 0$, a closed and geodesically complete subset $B \subset \widetilde{B}$ (cf. Definition 2.3), and a metric $d_B$ metrizing the weak*-convergence on $B$. Then, for $n \in \mathbb{N}$, we set
\[
  \Omega_L^n := [0, L]^n \times B^n \quad \text{and} \quad \widetilde{\Omega}_L^n := [0, L]^n \times \widetilde{B}^n. \tag{3.1}
\]
We endow $\Omega_L^n$ with the distance
\[
  d_n((\mathbf{c}, \mathbf{u}), (\tilde{\mathbf{c}}, \tilde{\mathbf{u}})) := \bigg( \frac{1}{n} \sum_{j=1}^{n} \big( |c^j - \tilde{c}^j|^2 + d_B^2(u^j, \tilde{u}^j) \big) \bigg)^{1/2}, \tag{3.2}
\]
and analogously for $\widetilde{\Omega}_L^n$ with $d_{\widetilde{B}}$ in place of $d_B$ in (3.2). For $n = 1$ we write $\Omega_L := \Omega_L^1$ and $d_\Omega := d_1$, i.e.,
\[
  d_\Omega((c, u), (\tilde{c}, \tilde{u})) = \big( |c - \tilde{c}|^2 + d_B^2(u, \tilde{u}) \big)^{1/2}. \tag{3.3}
\]

It is well known that the properties of $(B, d_B)$ propagate to $(\Omega_L^n, d_n)$, as the following lemma shows.

Lemma 3.2. The spaces $\Omega_L$ and $\Omega_L^n$, $n > 1$, are both compact, separable and geodesic metric spaces. Moreover, a curve $(\mathbf{c}_t, \mathbf{u}_t) : [0, 1] \to \Omega_L^n$, with $(\mathbf{c}_t, \mathbf{u}_t) = ((c^1_t, u^1_t), \ldots, (c^n_t, u^n_t))$, is a geodesic in $(\Omega_L^n, d_n)$ iff for every $j \in \{1, \ldots, n\}$ the coordinate curve $(c^j_t, u^j_t) : [0, 1] \to \Omega_L$ is a geodesic in $(\Omega_L, d_\Omega)$, or, equivalently, iff $c^j_t : [0, 1] \to [0, L]$ and $u^j_t : [0, 1] \to B$ are geodesics in their respective spaces.

Proof. That $\Omega_L$ and $\Omega_L^n$ are compact and separable metric spaces is a simple consequence of equipping them with the $\ell^2$-metric, cf. (3.2). The existence of geodesics and geodesic completeness follows from the product-geodesic property in, e.g., [17, Proposition 5.3]. Since we endowed $\Omega_L^n$ with the $\ell^2$-metric $d_n$, $(\mathbf{c}_t, \mathbf{u}_t)$ is a unit-speed geodesic in $\Omega_L^n$ iff, for every $j \in \{1, \ldots, n\}$, each coordinate $(c^j_t, u^j_t)$ is a unit-speed geodesic in $(\Omega_L, d_\Omega)$, cf. (3.3). As $d_\Omega$ is the $\ell^2$-metric on $[0, L] \times B$ and both spaces in the product are geodesic spaces, $(c^j_t, u^j_t)$ is a unit-speed geodesic in $\Omega_L$ iff $c^j_t$ and $u^j_t$ are unit-speed geodesics in $[0, L]$ and $B$, respectively.

Note that above, and also in what follows, with a slight abuse of notation we have reordered the components of a vector $(\mathbf{c}, \mathbf{u}) \in \Omega_L^n$ into an $n$-component vector whose entries are pairs of the form $(c, u) \in \Omega_L$, with the initial enumeration being taken into account, hence identifying
\[
  (\mathbf{c}, \mathbf{u}) := \big( (c^1, \ldots, c^n), (u^1, \ldots, u^n) \big) = \big( (c^1, u^1), \ldots, (c^n, u^n) \big) \in \Omega_L^n. \tag{3.4}
\]
Adopting this notation throughout, we next introduce a discretized version of the initial functional $J$ of (2.1), by restricting it to sparse representations.

Definition 3.3 (Discretization by restriction to sparse representations). We define the functional $J_n : \Omega_L^n \to \mathbb{R}$ as
\[
  J_n(\mathbf{c}, \mathbf{u}) := J\bigg( \frac{1}{n} \sum_{j=1}^{n} (c^j)^2 u^j \bigg) \quad \forall (\mathbf{c}, \mathbf{u}) \in \Omega_L^n. \tag{3.5}
\]
One defines $J_n$ on $\widetilde{\Omega}_L^n$ in the same way.

In general, by (A3) and the fact that $u^j \in \widetilde{B} = \overline{\mathrm{Ext}(\{\mathcal{R}(u) \leqslant 1\})}^{\,*}$ for all $j \in \{1, \ldots, n\}$ (which in particular implies that either $\mathcal{R}(u^j) = 1$ or $u^j = 0$), one has
\[
  \mathcal{R}\bigg( \sum_{j=1}^{n} \alpha^j u^j \bigg) \leqslant \sum_{j=1}^{n} \alpha^j, \quad \text{where } \alpha^j \geqslant 0 \ \forall j \in \{1, \ldots, n\}. \tag{3.6}
\]
In addition to (A3)–(A5), it is convenient to impose the following extra condition on the regularizer $\mathcal{R}$, called the no-loss-of-mass condition.

(A6) The regularizer $\mathcal{R}$ has no loss of mass on $B \subset \widetilde{B}$, i.e., for every $n \in \mathbb{N}$, $(\alpha^j)_{j=1}^n \subset \mathbb{R}_+$ and $(u^j)_{j=1}^n \subset B$, it holds that
\[
  \mathcal{R}\bigg( \sum_{j=1}^{n} \alpha^j u^j \bigg) = \sum_{j=1}^{n} \alpha^j \mathcal{R}(u^j). \tag{3.7}
\]

The following assumption will also be necessary for some of the subsequent results, for which we will mention it explicitly.

(A7) The closed, geodesically complete set $B \subset \widetilde{B}$ does not contain 0, i.e., $0 \notin B$.

Remark 3.4.
We make the following remarks about the previous definitions and assumptions.

(i) The setting described above for the AGF evolution is natural since, firstly, the canonical identification with elements of the lifted Wasserstein-2 space of probability measures (see Section 4.1 for the respective definitions), given by
\[
  (\Omega_L^n, d_n) \ni (\mathbf{c}, \mathbf{u}) \mapsto \frac{1}{n} \sum_{j=1}^{n} \delta_{(c^j, u^j)} \in (\mathcal{P}_2(\Omega_L), W_2),
\]
is 1-Lipschitz, as
\[
  W_2\bigg( \frac{1}{n} \sum_{j=1}^{n} \delta_{(c^j, u^j)}, \frac{1}{n} \sum_{j=1}^{n} \delta_{(\tilde{c}^j, \tilde{u}^j)} \bigg) \leqslant \bigg( \frac{1}{n} \sum_{j=1}^{n} d_\Omega^2\big( (c^j, u^j), (\tilde{c}^j, \tilde{u}^j) \big) \bigg)^{1/2}.
\]
Secondly, using the Krein–Milman theorem, we prove in Proposition 3.6 that on $\widetilde{B}$, $J_n \to J$ as $n \to \infty$ in the sense of Γ-convergence. Afterwards, we restrict to a geodesically connected subset $B$ of $\widetilde{B}$, which is sensible in the context of gradient flows, and analogous to replacing $\{\pm \delta_x : x \in X\}$ by $\{\delta_x : x \in X\}$ for the optimization problem (1.1).

(ii) In particular, with the choice $\mathcal{R}(\cdot) = \|\cdot\|_{TV}$, the latter being the total variation norm in the space of measures, the no-loss-of-mass property (3.7) holds.

(iii) It is important to note that many of our results can be proved in the more general setting of unbounded weights in $\mathbb{R}_+$ and without assuming the no-loss-of-mass property. In those cases, we simply state the result for, e.g., $\Omega_\infty := \mathbb{R}_+ \times B$ instead of $\Omega_L$. For ease of presentation, however, unless specifically mentioned, we assume both (A6) and (A7) throughout the manuscript.

Due to the compactness of $\Omega_L^n$ and our constitutive assumptions, one can easily prove that $J_n$ is lower semicontinuous and thus admits a minimizer in $\Omega_L^n$ by the direct method in the Calculus of Variations.

Lemma 3.5 (Existence of minimizers for the discretized problem). Under assumptions (A1)–(A5), $J_n$ is lower semicontinuous on $\widetilde{\Omega}_L^n$, and $\inf_{\widetilde{\Omega}_L^n} J_n$ admits a minimizer. In addition, the same holds on $\Omega_L^n$.

Proof.
Let $((\mathbf{c}_\ell, \mathbf{u}_\ell))_{\ell \in \mathbb{N}} \subset \widetilde{\Omega}_L^n$ be a sequence such that $(\mathbf{c}_\ell, \mathbf{u}_\ell) \overset{d_n}{\longrightarrow} (\mathbf{c}, \mathbf{u}) \in \widetilde{\Omega}_L^n$ as $\ell \to \infty$. In particular, since $d_{\widetilde{B}}$ metrizes the weak*-convergence on $\widetilde{B}$, recalling (2.9) and Definition 3.1, we have, componentwise,
\[
  (c^j_\ell)^2 u^j_\ell \overset{*}{\rightharpoonup} (c^j)^2 u^j \text{ as } \ell \to \infty, \quad \forall j \in \{1, \ldots, n\}.
\]
Therefore, denoting
\[
  x_\ell := \frac{1}{n} \sum_{j=1}^{n} (c^j_\ell)^2 u^j_\ell \quad \text{and} \quad x := \frac{1}{n} \sum_{j=1}^{n} (c^j)^2 u^j, \tag{3.8}
\]
it holds that $x_\ell \overset{*}{\rightharpoonup} x$ as $\ell \to \infty$. In view of (A1), (3.8) implies that $K x_\ell \rightharpoonup K x$ in $Y$, and by (A2), $F(Kx) \leqslant \liminf_{\ell \to \infty} F(K x_\ell)$. Similarly, (A3) combined again with (3.8) and the superadditivity of the liminf yield in total
\[
  J_n(\mathbf{c}, \mathbf{u}) \leqslant \liminf_{\ell \to \infty} J_n(\mathbf{c}_\ell, \mathbf{u}_\ell), \tag{3.9}
\]
i.e., $J_n$ is lower semicontinuous. The existence of minimizers for $J_n$ then follows from the compactness of $\widetilde{\Omega}_L^n$ (since $\widetilde{B}$ is weak*-compact) as a standard application of the direct method in the Calculus of Variations. For $\Omega_L^n$, the same arguments can be repeated verbatim.

The strategy of optimizing $J_n$ as a substitute for $J$ is based on the intuition that the variational problem associated with $J_n$ provides a consistent approximation of the original problem. More precisely, as $n$ increases, one anticipates that minimizers of $J_n$ will converge, in an appropriate sense, to minimizers of $J$. We will formalize such a result using a Γ-convergence approach; cf. [26] for a detailed treatment. Recalling the notation (3.4), we first note that one needs to work with unbounded weights and the full set $\widetilde{B}$ here in order to compare $J_n$ with $J$ on the domain of $\mathcal{R}$. Thus, we extend $J_n$ to $v \in \mathrm{Dom}(\mathcal{R})$ by setting
\[
  J_n(v) := \begin{cases} J\Big( \frac{1}{n} \sum_{j=1}^{n} (c^j)^2 u^j \Big), & \text{if } v = \frac{1}{n} \sum_{j=1}^{n} (c^j)^2 u^j \text{ for some } (\mathbf{c}, \mathbf{u}) \in \mathbb{R}_+^n \times \widetilde{B}^n, \\ +\infty, & \text{otherwise.} \end{cases} \tag{3.10}
\]

Proposition 3.6 (Γ-convergence result). Assume (A1)–(A5).
With respect to the weak* topology on $\mathrm{Dom}(\mathcal{R})$, we have that $J_n \xrightarrow{\Gamma} J$ as $n\to\infty$, i.e.:

• (Γ-liminf inequality): For every $(v_n)_{n\in\mathbb{N}}\subset\mathrm{Dom}(\mathcal{R})$ with $v_n \overset{*}{\rightharpoonup} v\in\mathrm{Dom}(\mathcal{R})$, it holds that $J(v) \leq \liminf_{n\to\infty} J_n(v_n)$.

• (Γ-limsup inequality): For each $v\in\mathrm{Dom}(\mathcal{R})$, there exists a recovery sequence $(v_n)_{n\in\mathbb{N}}\subset\mathrm{Dom}(\mathcal{R})$ with $v_n \overset{*}{\rightharpoonup} v\in\mathrm{Dom}(\mathcal{R})$ such that $\limsup_{n\to\infty} J_n(v_n) \leq J(v)$. Actually, for every $n\in\mathbb{N}$, $v_n$ is given by some $(\tilde c^j_n)_{j=1}^n \subset [0,\sqrt{n\mathcal{R}(v)}]$ and $(\tilde u^j_n)_{j=1}^n \subset \tilde{\mathcal{B}}$ such that
\[
v_n = \frac{1}{n}\sum_{j=1}^n (\tilde c^j_n)^2 \tilde u^j_n, \quad\text{and}\quad \frac{1}{n}\sum_{j=1}^n (\tilde c^j_n)^2 \leq \mathcal{R}(v).
\]

Proof. (Γ-liminf) Let $(v_n)_{n\in\mathbb{N}}\subset\mathrm{Dom}(\mathcal{R})$ be such that $v_n \overset{*}{\rightharpoonup} v\in\mathrm{Dom}(\mathcal{R})$ as $n\to\infty$. If $\liminf_{n\to\infty} J_n(v_n) = +\infty$, there is nothing to prove. Otherwise, we may without restriction assume that $\sup_{n\in\mathbb{N}} J_n(v_n) < \infty$. By the definition of $J_n$ on $\mathrm{Dom}(\mathcal{R})$, this implies that each $v_n$ admits a representation as in (3.10), and $J_n(v_n) = J(v_n)$. Since $J$ is weak* lower semicontinuous and $v_n \overset{*}{\rightharpoonup} v$ as $n\to\infty$, we get
\[
J(v) \leq \liminf_{n\to\infty} J(v_n) \leq \liminf_{n\to\infty} J_n(v_n),
\]
which is the desired Γ-liminf inequality.

(Γ-limsup) Take $v\in\mathrm{Dom}(\mathcal{R})$, for which we may suppose that $\mathcal{R}(v)\neq 0$. Indeed, otherwise $v = 0$, cf. (A3), and the recovery sequence is the trivial one. We claim that for $n\in\mathbb{N}$ there exist $(\beta^j_n)_{j=1}^n\subset\mathbb{R}_+$, $(u^j_n)_{j=1}^n\subset\tilde{\mathcal{B}}$ such that
\[
v_n := \frac{1}{n}\sum_{j=1}^n (\beta^j_n)^2 u^j_n \overset{*}{\rightharpoonup} v, \quad\text{and}\quad \frac{1}{n}\sum_{j=1}^n (\beta^j_n)^2 = \mathcal{R}(v). \tag{3.11}
\]
Indeed, by the Krein–Milman theorem, there exists a sequence of convex combinations of extremal points $(u^j_n)_{j=1}^{k_n}\subset\tilde{\mathcal{B}}$, with $k_n\in\mathbb{N}$, such that
\[
\sum_{j=1}^{k_n} \tilde\alpha^j_n u^j_n \overset{*}{\rightharpoonup} \frac{v}{\mathcal{R}(v)}, \quad\text{with } \tilde\alpha^j_n \geq 0 \text{ and } \sum_{j=1}^{k_n} \tilde\alpha^j_n = 1.
\]
Note that by adding extremal points with weights equal to zero we may assume that $k_n$ is strictly increasing in $n$, and thus $k_n\to+\infty$.
Moreover, by a rescaling argument (multiplying both sides of the above with $\mathcal{R}(v)$), we can find weights $\alpha^j_n \geq 0$ such that
\[
\frac{1}{k_n}\sum_{j=1}^{k_n} (\alpha^j_n)^2 u^j_n \overset{*}{\rightharpoonup} v, \quad\text{with } \alpha^j_n \geq 0, \quad\text{and}\quad \frac{1}{k_n}\sum_{j=1}^{k_n} (\alpha^j_n)^2 = \mathcal{R}(v);
\]
in particular, $\alpha^j_n := \sqrt{k_n\,\mathcal{R}(v)\,\tilde\alpha^j_n}$. Define now the following sequence. Set $k_0 := 0$, and define
\[
v_n := \begin{cases} 0, & \text{if } n \leq k_1, \\ \frac{1}{k_\ell}\sum_{j=1}^{k_\ell} (\alpha^j_\ell)^2 u^j_\ell, & \text{if } k_\ell < n \leq k_{\ell+1}. \end{cases}
\]
Then clearly still $v_n \overset{*}{\rightharpoonup} v$ as $n\to\infty$, and for $n \geq k_1$ it holds, for any extremal point $\bar u\in\tilde{\mathcal{B}}$, that
\[
v_n = \frac{1}{k_\ell}\sum_{j=1}^{k_\ell} (\alpha^j_\ell)^2 u^j_\ell = \frac{1}{k_\ell}\sum_{j=1}^{k_\ell} (\alpha^j_\ell)^2 u^j_\ell + \sum_{j=k_\ell+1}^{n} 0\cdot\bar u = \frac{1}{n}\sum_{j=1}^{k_\ell}\Big(\sqrt{\tfrac{n}{k_\ell}}\,\alpha^j_\ell\Big)^2 u^j_\ell + \sum_{j=k_\ell+1}^{n} 0\cdot\bar u = \frac{1}{n}\sum_{j=1}^n (\beta^j_n)^2 \tilde u^j_n,
\]
for a suitable choice of $\beta^j_n \geq 0$ and $(\tilde u^j_n)_{j=1}^n\subset\tilde{\mathcal{B}}$ with $\frac{1}{n}\sum_{j=1}^n (\beta^j_n)^2 = \mathcal{R}(v)$, which proves (3.11).

It remains to show that the constructed sequence in (3.11) is a recovery sequence. By assumption (A5), the weak*-to-strong continuity of $v\mapsto\mathcal{F}(Kv)$ holds, implying that
\[
\mathcal{F}(Kv) = \lim_{n\to\infty}\mathcal{F}\Big(\frac{1}{n}\sum_{j=1}^n (\beta^j_n)^2 K u^j_n\Big).
\]
By Jensen's inequality and (A3), it holds that
\[
\mathcal{R}\Big(\frac{1}{n}\sum_{j=1}^n (\beta^j_n)^2 u^j_n\Big) \leq \frac{1}{n}\sum_{j=1}^n (\beta^j_n)^2 \mathcal{R}(u^j_n) \leq \frac{1}{n}\sum_{j=1}^n (\beta^j_n)^2 = \mathcal{R}(v).
\]
Therefore, it follows by the weak* lower semicontinuity of $\mathcal{R}$ that
\[
\mathcal{R}(v) \leq \liminf_{n\to\infty}\mathcal{R}\Big(\frac{1}{n}\sum_{j=1}^n (\beta^j_n)^2 u^j_n\Big) \leq \mathcal{R}(v).
\]
As a consequence, equality must hold, and thus
\[
\mathcal{R}(v) = \lim_{n\to\infty}\mathcal{R}\Big(\frac{1}{n}\sum_{j=1}^n (\beta^j_n)^2 u^j_n\Big), \quad\text{so also}\quad J(v) = \lim_{n\to\infty} J\Big(\frac{1}{n}\sum_{j=1}^n (\beta^j_n)^2 u^j_n\Big),
\]
as desired. ∎

By the coercivity of $J$ and general properties of Γ-convergence, every sequence of minimizers of $J_n$ of the form $v_n := \frac{1}{n}\sum_{j=1}^n (c^j_n)^2 u^j_n$ admits a weak*-convergent subsequence, and every weak* limit minimizes $J$.

Proposition 3.7 (Convergence of minimizers). Assume (A1)–(A5).
Then there exists $L > 0$ such that for every sequence of minimizers
\[
(\mathbf{c}_n,\mathbf{u}_n) \in \arg\min_{(\mathbf{c},\mathbf{u})\in\tilde\Omega^n_{\sqrt{nL}}} J_n(\mathbf{c},\mathbf{u}),
\]
the linear combination $\frac{1}{n}\sum_{j=1}^n (c^j_n)^2 u^j_n$ admits a convergent subsequence. Moreover, for every such convergent subsequence it holds that
\[
\lim_{k\to\infty} \frac{1}{n_k}\sum_{j=1}^{n_k} (c^j_{n_k})^2 u^j_{n_k} = v_0 \in \arg\min_{u\in\mathcal{M}} J(u).
\]

Proof. Let us consider a minimizer $v^*$ of $J$ in $\mathcal{M}$ and choose $L > \mathcal{R}(v^*)$. Choosing $(\mathbf{c}_n,\mathbf{u}_n)_{n\in\mathbb{N}}$ as in the statement, we obtain
\[
\mathcal{R}\Big(\frac{1}{n}\sum_{j=1}^n (c^j_n)^2 u^j_n\Big) = J_n(\mathbf{c}_n,\mathbf{u}_n) - \mathcal{F}\Big(\frac{1}{n}\sum_{j=1}^n (c^j_n)^2 K u^j_n\Big) \leq \min_{(\mathbf{c},\mathbf{u})\in\Omega^n_{\sqrt{nL}}} J_n(\mathbf{c},\mathbf{u}) - \inf_{v\in\mathcal{M}}\mathcal{F}(v).
\]
The right-hand side above remains bounded, since for every $n\in\mathbb{N}$,
\[
\min_{(\mathbf{c},\mathbf{u})\in\tilde\Omega^{n+1}_{\sqrt{(n+1)L}}} J_{n+1}(\mathbf{c},\mathbf{u}) \leq \min_{(\mathbf{c},\mathbf{u})\in\tilde\Omega^n_{\sqrt{nL}}} J_n(\mathbf{c},\mathbf{u}),
\]
as one can show that all elements in $\tilde\Omega^n_{\sqrt{nL}}$ are valid competitors for the problem in $\tilde\Omega^{n+1}_{\sqrt{(n+1)L}}$, by adding an extremal point with zero weight and rescaling. Therefore, there exists $C > 0$ such that
\[
\frac{1}{n}\sum_{j=1}^n (c^j_n)^2 u^j_n \in \{v : \mathcal{R}(v) \leq C\},
\]
which is weak*-compact by (A4). Hence, there exist a subsequence $(n_k)_{k\in\mathbb{N}}$ and $v_0\in\mathcal{M}$ such that
\[
\frac{1}{n_k}\sum_{j=1}^{n_k} (c^j_{n_k})^2 u^j_{n_k} \overset{*}{\rightharpoonup} v_0.
\]
It remains to prove that $v_0\in\arg\min_{v\in\mathcal{M}} J(v)$. Since $v^*\in\mathrm{Dom}(\mathcal{R})$, by the Γ-convergence result of Proposition 3.6 there exists a recovery sequence $(\tilde{\mathbf{c}}_n,\tilde{\mathbf{u}}_n)\in\tilde\Omega^n_{\sqrt{nL}}$ such that
\[
\frac{1}{n}\sum_{j=1}^n (\tilde c^j_n)^2 \tilde u^j_n \overset{*}{\rightharpoonup} v^*, \quad \frac{1}{n}\sum_{j=1}^n (\tilde c^j_n)^2 \leq \mathcal{R}(v^*), \quad\text{and}\quad J_n\Big(\frac{1}{n}\sum_{j=1}^n (\tilde c^j_n)^2 \tilde u^j_n\Big) \to J(v^*) \text{ as } n\to\infty.
\]
In particular, $(\tilde{\mathbf{c}}_n,\tilde{\mathbf{u}}_n)$ is an admissible competitor for $J_n$ for every $n\in\mathbb{N}$, so that $J_n(\mathbf{c}_n,\mathbf{u}_n) \leq J_n(\tilde{\mathbf{c}}_n,\tilde{\mathbf{u}}_n)$, which, together with the Γ-liminf inequality, implies
\[
J(v^*) \geq \limsup_{k\to\infty} J_{n_k}\Big(\frac{1}{n_k}\sum_{j=1}^{n_k} (\tilde c^j_{n_k})^2 \tilde u^j_{n_k}\Big) \geq \liminf_{k\to\infty} J_{n_k}\Big(\frac{1}{n_k}\sum_{j=1}^{n_k} (c^j_{n_k})^2 u^j_{n_k}\Big)
\]
\[
\geq J(v_0),
\]
i.e., $v_0\in\arg\min_{v\in\mathcal{M}} J(v)$, which completes the proof. ∎

3.2 Definition of AGFs through minimizing movements

Atomic gradient flows (AGFs) are defined as gradient flows in the metric space $\Omega^n_L$ of the functional $J_n$. In this subsection, we show the existence of such gradient flows using a minimizing movements approach, also called the JKO scheme, cf. [34]. The minimizing movements setup is also briefly recalled in Section A.1. Here, considering again $\Omega^n_L$ makes sense to enforce geodesic completeness and compactness, which we will both require in the proofs. We first define the approximation scheme.

Definition 3.8 (Approximation scheme). Let $\tau > 0$ and let $(\mathbf{c}_0,\mathbf{u}_0)\in\Omega^n_L$ be a given initial datum. Set $(\mathbf{c}^0_\tau,\mathbf{u}^0_\tau) := (\mathbf{c}_0,\mathbf{u}_0)$ and for every $k\in\mathbb{N}$ choose iteratively
\[
(\mathbf{c}^{k+1}_\tau,\mathbf{u}^{k+1}_\tau) \in \arg\min_{\Omega^n_L} G^k_{n,\tau}, \tag{3.12}
\]
where the $\tau$-discretized energy $G^k_{n,\tau} : \Omega^n_L\to\mathbb{R}$ is defined by
\[
G^k_{n,\tau}(\mathbf{c},\mathbf{u}) := J_n(\mathbf{c},\mathbf{u}) + \frac{1}{2\tau}\, d_n^2\big((\mathbf{c},\mathbf{u}),(\mathbf{c}^k_\tau,\mathbf{u}^k_\tau)\big) \quad \forall (\mathbf{c},\mathbf{u})\in\Omega^n_L. \tag{3.13}
\]
Note that the existence of minimizers of $G^k_{n,\tau}$ follows directly from the compactness of $\Omega^n_L$ and the lower semicontinuity of $J_n$ stated in Lemma 3.5.

The existence of minimizing movements in our setting can now be obtained by a direct application of [4, Proposition 2.2.3]. The necessary lower semicontinuity, coercivity and compactness properties follow from Theorem 3.5, the fact that $J_n$ is bounded from below, and the compactness of $\Omega^n_L$.

Theorem 3.9 (Minimizing movement scheme as limit path for τ → 0). Let $\tau > 0$ and an initial datum $(\mathbf{c}_0,\mathbf{u}_0)\in\mathrm{Dom}(J_n)$. Define $(\mathbf{c}_\tau,\mathbf{u}_\tau) : [0,+\infty)\to\Omega^n_L$ by
\[
(\mathbf{c}_\tau,\mathbf{u}_\tau)(0) := (\mathbf{c}_0,\mathbf{u}_0), \quad\text{and}\quad (\mathbf{c}_\tau,\mathbf{u}_\tau)(t) := (\mathbf{c}^k_\tau,\mathbf{u}^k_\tau) \ \text{for } t\in(k\tau,(k+1)\tau] \text{ and } k\in\mathbb{N}, \tag{3.14}
\]
where $(\mathbf{c}^k_\tau,\mathbf{u}^k_\tau)$ is as in (3.12).
There exist a sequence $(\tau_\ell)_{\ell\in\mathbb{N}}\subset(0,1)$ with $\tau_\ell\to 0$ as $\ell\to\infty$, and a curve $(\mathbf{c},\mathbf{u})\in AC^2_{loc}([0,+\infty);\Omega^n_L)$ such that for all $t\in[0,+\infty)$,
\[
d_n\big((\mathbf{c}_{\tau_\ell,t},\mathbf{u}_{\tau_\ell,t}),(\mathbf{c}_t,\mathbf{u}_t)\big) \to 0 \quad\text{as } \ell\to+\infty. \tag{3.15}
\]
In particular, we have $(\mathbf{c},\mathbf{u})(0^+) = (\mathbf{c}_0,\mathbf{u}_0)$, and we call $(\mathbf{c},\mathbf{u})$ a minimizing movement; see also Theorem A.3.

Remark 3.10. Note that in the comprehensive treatment of the theory of minimizing movements, detailed in [4, Chapter 2], what we simply call minimizing movements are called Generalized Minimizing Movements, denoted by GMM($G, u_0$) therein. However, we have decided to simplify the treatment of minimizing movements here for brevity.

3.3 λ-Convexity and curves of maximal slope

Next, we study the λ-convexity of $J_n$ and, whenever this property holds, we derive finer properties of the minimizing movements associated with the AGF.

Remark 3.11 (Scalar and geodesic λ-convexity). We first note the following connection between scalar and geodesic λ-convexity. Let $(X,d)$ be a geodesic metric space, $f : X\to\mathbb{R}$, and $\gamma : [0,1]\to X$ a geodesic. Suppose that $f\circ\gamma : [0,1]\to\mathbb{R}$ is $\alpha(\gamma)$-convex for some $\alpha(\gamma)\in\mathbb{R}$, i.e., the map $t\mapsto f(\gamma_t) - \frac{\alpha(\gamma)}{2}t^2$ is convex. Equivalently, for every $t\in[0,1]$ it holds that
\[
f(\gamma_t) \leq (1-t)f(\gamma_0) + t f(\gamma_1) - \frac{\alpha(\gamma)}{2}\, t(1-t).
\]
Comparing the last inequality with the definition of geodesic λ-convexity of $f$, cf. (2.13), one immediately sees that for scalar $\alpha(\gamma)$-convexity along every geodesic $\gamma$ to imply λ-convexity of $f$, $\alpha(\gamma)$ needs to be of the form $\alpha(\gamma) = \lambda\, d^2(\gamma_0,\gamma_1)$.

To prove the λ-convexity of $J_n$ for $\lambda\in\mathbb{R}$, we will need an important assumption that asks for a compatibility between the forward operator $K$ and the geodesic structure of $\mathcal{B}$. A consequence of the λ-convexity will then be that the local slope is a strong upper gradient for $J_n$.
(A8) There exists a constant $C := C(K,\mathcal{B}) > 0$ such that for $m\in\{1,2\}$,
\[
\sup_{t\in[0,1]}\Big\|\frac{d^m}{dt^m}K(u_t)\Big\|_Y \leq C\, d^m_{\mathcal{B}}(u_0,u_1) \tag{3.16}
\]
for all geodesics $u_t : [0,1]\to\mathcal{B}$.

Remark 3.12. Note that (3.16) with $m=1$ implies the existence of $C > 0$ such that
\[
\|Ku - Kv\|_Y \leq C\, d_{\mathcal{B}}(u,v) \quad\text{for } u,v\in\mathcal{B}. \tag{3.17}
\]
Such a Lipschitz-like property of $K$ on $\mathcal{B}$ is inherently of metric nature and can replace (A8) in the subsequent Theorem 5.2, if one were to consider curves of maximal slope with regard to so-called weak upper gradients (cf. [4, Definition 1.2.2]). However, since (3.17) alone is not enough to ensure, e.g., λ-convexity of $J_n$, we stick to (A8) to simplify the presentation.

In the following, by $C > 0$ we denote a generic constant that depends only on the data (e.g., the forward operator $K$, the fidelity term $\mathcal{F}$ and the regularizer $\mathcal{R}$) and whose value is allowed to vary from line to line. The dependence of a constant on a particular parameter will be denoted by a subscript. Moreover, we remark that all derivatives of the forward operator $K$ are intended in a weak sense, i.e., tested against elements of the Hilbert space $Y$. We also recall the definitions of $\Omega^n_L$ and $J_n$, i.e., (3.1) and (3.5).

Proposition 3.13. Assume (A1)–(A8). Then $J_n$ is $\lambda(L)$-convex on $\Omega^n_L$ for some constant $\lambda(L)\in\mathbb{R}$.

Proof. Let $(\mathbf{c}_t,\mathbf{u}_t) : [0,1]\to\Omega^n_L$ be a geodesic, cf. (3.4). By Lemma 3.2, for every $j\in\{1,\dots,n\}$, the components $c^j_t = (1-t)c^j_0 + t c^j_1$ and $u^j_t$ are also geodesics in their respective target spaces, and by the continuity of $K$, (3.16) also holds for $m = 0$. Recalling (A6)–(A7), we denote
\[
R_n(t) := \mathcal{R}\Big(\frac{1}{n}\sum_{j=1}^n (c^j_t)^2 u^j_t\Big) = \frac{1}{n}\sum_{j=1}^n (c^j_t)^2.
\]
Then, by the Cauchy–Schwarz inequality and (3.2), we get
\[
\frac{d^2}{dt^2}R_n(t) = \frac{2}{n}\sum_{j=1}^n |c^j_1 - c^j_0|^2 \leq 2\, d_n^2\big((\mathbf{c}_0,\mathbf{u}_0),(\mathbf{c}_1,\mathbf{u}_1)\big).
\]
(3.18)

Next, we compute the derivatives of $K_n : [0,1]\to Y$, defined as
\[
K_n(t) := \frac{1}{n}\sum_{j=1}^n K\big((c^j_t)^2 u^j_t\big) = \frac{1}{n}\sum_{j=1}^n (c^j_t)^2 K(u^j_t), \tag{3.19}
\]
namely
\[
\frac{d}{dt}K_n(t) = \frac{1}{n}\sum_{j=1}^n c^j_t\Big(2(c^j_1 - c^j_0)K(u^j_t) + c^j_t\,\frac{d}{dt}K(u^j_t)\Big),
\]
\[
\frac{d^2}{dt^2}K_n(t) = \frac{1}{n}\sum_{j=1}^n \Big[2(c^j_1 - c^j_0)^2 K(u^j_t) + 4 c^j_t (c^j_1 - c^j_0)\,\frac{d}{dt}K(u^j_t) + (c^j_t)^2\,\frac{d^2}{dt^2}K(u^j_t)\Big].
\]
Using Jensen's inequality, the fact that $0 \leq c^j_t \leq L$ for every $t\in[0,1]$ and $j\in\{1,\dots,n\}$, (A8), and the fact that $K$ is even strong-to-strong continuous (cf. (A1) and [16, Remark 3.2]), we estimate
\[
\sup_{t\in[0,1]}\Big\|\frac{d}{dt}K_n(t)\Big\|_Y \leq C_{n,L}\sum_{j=1}^n\big(|c^j_1 - c^j_0| + d_{\mathcal{B}}(u^j_0,u^j_1)\big) \leq C_{n,L}\Big(\sum_{j=1}^n\big(|c^j_1 - c^j_0|^2 + d^2_{\mathcal{B}}(u^j_0,u^j_1)\big)\Big)^{1/2} = C_{n,L}\, d_n\big((\mathbf{c}_0,\mathbf{u}_0),(\mathbf{c}_1,\mathbf{u}_1)\big), \tag{3.20}
\]
and analogously,
\[
\sup_{t\in[0,1]}\Big\|\frac{d^2}{dt^2}K_n(t)\Big\|_Y \leq C_{n,L}\sum_{j=1}^n\big((c^j_1 - c^j_0)^2 + (c^j_1 - c^j_0)\, d_{\mathcal{B}}(u^j_0,u^j_1) + d^2_{\mathcal{B}}(u^j_0,u^j_1)\big) \leq C_{n,L}\sum_{j=1}^n\big(|c^j_1 - c^j_0|^2 + d^2_{\mathcal{B}}(u^j_0,u^j_1)\big) = C_{n,L}\, d_n^2\big((\mathbf{c}_0,\mathbf{u}_0),(\mathbf{c}_1,\mathbf{u}_1)\big). \tag{3.21}
\]
Finally, by the chain rule, we compute
\[
\frac{d^2}{dt^2}J_n(\mathbf{c}_t,\mathbf{u}_t) = \frac{d}{dt}\Big(\nabla\mathcal{F}(K_n(t)),\ \frac{d}{dt}K_n(t)\Big)_Y + \frac{d^2}{dt^2}R_n(t) = \nabla^2\mathcal{F}(K_n(t))\Big[\frac{d}{dt}K_n(t),\ \frac{d}{dt}K_n(t)\Big]_Y + \Big(\nabla\mathcal{F}(K_n(t)),\ \frac{d^2}{dt^2}K_n(t)\Big)_Y + \frac{d^2}{dt^2}R_n(t),
\]
which together with (3.18), (3.20) and (3.21) yields
\[
\sup_{t\in[0,1]}\Big|\frac{d^2}{dt^2}J_n(\mathbf{c}_t,\mathbf{u}_t)\Big| \leq 2 C_{n,L}\|\mathcal{F}\|_{C^2}\, d_n^2\big((\mathbf{c}_0,\mathbf{u}_0),(\mathbf{c}_1,\mathbf{u}_1)\big) + 2\, d_n^2\big((\mathbf{c}_0,\mathbf{u}_0),(\mathbf{c}_1,\mathbf{u}_1)\big) \leq C_{n,L,\mathcal{F}}\, d_n^2\big((\mathbf{c}_0,\mathbf{u}_0),(\mathbf{c}_1,\mathbf{u}_1)\big).
\]
Hence, there exists $\lambda := \lambda(n,L,K,\mathcal{B},\mathcal{F}) < 0$ such that $t\mapsto J_n(\mathbf{c}_t,\mathbf{u}_t)$ is $\lambda\, d_n^2((\mathbf{c}_0,\mathbf{u}_0),(\mathbf{c}_1,\mathbf{u}_1))$-convex on $[0,1]$, implying that $J_n : \Omega^n_L\to\mathbb{R}_+$ is λ-convex along all unit-speed geodesics $(\mathbf{c}_t,\mathbf{u}_t)$. ∎
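As a sanity check, the identity (3.18) for the second derivative of $R_n$ along the linear geodesic $c_t = (1-t)c_0 + t c_1$ can be verified by finite differences; a minimal sketch with arbitrary toy weight vectors (the numbers are illustrative, not from the paper):

```python
# Finite-difference check of (3.18): along c_t = (1-t) c_0 + t c_1 one has
# d^2/dt^2 R_n(t) = (2/n) * sum_j (c1_j - c0_j)^2. Toy weight vectors below.
c0 = [0.5, 1.2, 0.3]
c1 = [1.0, 0.4, 0.9]
n = len(c0)

def R_n(t):
    # R_n(t) = (1/n) * sum_j (c_t^j)^2, cf. (A6)-(A7)
    return sum(((1 - t) * a + t * b) ** 2 for a, b in zip(c0, c1)) / n

h, t = 1e-3, 0.3
second_diff = (R_n(t + h) - 2 * R_n(t) + R_n(t - h)) / h ** 2
exact = 2 * sum((b - a) ** 2 for a, b in zip(c0, c1)) / n
assert abs(second_diff - exact) < 1e-7
```

Since $R_n$ is a quadratic polynomial in $t$, the central second difference reproduces the exact value up to rounding, consistent with the Cauchy–Schwarz bound by $2\, d_n^2$ used in the proof.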
Example 3.14 (λ-convexity for convolutions of measures). In this example, we show that (A8) is fulfilled for standard convolution operators on the space of measures. Similar reasoning applies also in more general settings. Let $\mathbb{T}$ be the unit circle, which we implicitly identify with $\mathbb{R}/\mathbb{Z}$, and let us consider the linear operator $K : \mathcal{M}(\mathbb{T})\to L^2(\mathbb{T})$, where
\[
K\mu(s) := \int_{\mathbb{T}} k(s - x)\, d\mu(x) \quad \forall \mu\in\mathcal{M}(\mathbb{T}), \tag{3.22}
\]
with a suitable convolution kernel $k\in C^2(\mathbb{T})$. Here, $\mathcal{M}(\mathbb{T})$ is the space of Radon measures on $\mathbb{T}$. Choosing as regularizer the total variation of measures, $\mathcal{R}(\cdot) := \|\cdot\|_{TV}$, it holds that $\tilde{\mathcal{B}} = \{\pm\delta_x : x\in\mathbb{T}\}$. Then take as the weakly*-closed, geodesically complete set $\mathcal{B} := \{\delta_x : x\in\mathbb{T}\}$, and as $d_{\mathcal{B}}$ the standard 2-Wasserstein distance $W_2$, which metrizes the weak* convergence on $\tilde{\mathcal{B}}$. Note that the unit-speed $W_2$-geodesic joining $\delta_{x_0}$ to $\delta_{x_1}$ is $t\mapsto\delta_{x_t}$, where $x_t = (1-t)x_0 + t x_1$, since we implicitly identified $\mathbb{T}$ with $\mathbb{R}/\mathbb{Z}$. Along this geodesic we have, for every $s\in\mathbb{T}$,
\[
K\delta_{x_t}(s) = \int_{\mathbb{T}} k(s - x)\, d\delta_{x_t}(x) = k(s - x_t),
\]
which is twice differentiable in $t$. In particular, by the chain rule, we obtain
\[
\frac{d}{dt}K\delta_{x_t}(s) = k'(s - x_t)(x_0 - x_1), \qquad \frac{d^2}{dt^2}K\delta_{x_t}(s) = k''(s - x_t)(x_0 - x_1)^2.
\]
Thus, the map $t\mapsto K\delta_{x_t}$ satisfies (3.16).

Example 3.15 (Necessity of local convexity). In this example we show that Proposition 3.13 cannot, in general, be improved to a global λ-convexity statement on $\mathbb{R}_+\times\mathcal{B}$. Let $\mathcal{M} := \mathcal{M}([0,1])$ and $\mathcal{R}(\cdot) := \|\cdot\|_{TV}$, and set again $\mathcal{B} := \{\delta_x : x\in[0,1]\}$. We define the linear map $K : \mathcal{M}\to\mathbb{R}$ as
\[
K\mu := \int_0^1 y\, d\mu(y),
\]
and consider $\mathcal{F}(y) := y^2$ and $n = 1$. If $(c,\mu)\in\mathbb{R}_+\times\mathcal{B}$, with $\mu = \delta_x$ for some $x\in[0,1]$, then
\[
J_1(c,\mu) = \mathcal{F}\big(K(c^2\delta_x)\big) + c^2 = \mathcal{F}(c^2 x) + c^2 = (c^2 x)^2 + c^2 = c^4 x^2 + c^2.
\]
(3.23)

With the $\ell^2$-product metric, geodesics in $\mathbb{R}_+\times\mathcal{B}$ have the form
\[
(c_t, u_t) = \big((1-t)c_0 + t c_1,\ \delta_{(1-t)x_0 + t x_1}\big).
\]
Along such geodesics we may view $J_1$ as the function $f(c,x) := c^4 x^2 + c^2$ on $\mathbb{R}_+\times[0,1]$. Its Hessian at $(c,x)$ becomes
\[
\nabla^2 f(c,x) = \begin{pmatrix} 12 c^2 x^2 + 2 & 8 c^3 x \\ 8 c^3 x & 2 c^4 \end{pmatrix}, \tag{3.24}
\]
so that
\[
\det\nabla^2 f(c,x) = \big(12 c^2 x^2 + 2\big)\cdot 2 c^4 - (8 c^3 x)^2 = 4 c^4\big(1 - 10 c^2 x^2\big). \tag{3.25}
\]
Hence, for every fixed $x\in(0,1]$ and all sufficiently large $c$, one has $\det\nabla^2 f(c,x) < 0$, so $\nabla^2 f(c,x)$ has a negative eigenvalue. In particular, if $\lambda_{\min}(c,x)$ denotes the minimal eigenvalue of $\nabla^2 f(c,x)$, then $\lambda_{\min}(c,x)\to-\infty$ as $c\to\infty$. Therefore, $J_1$ is only locally semiconvex on $\mathbb{R}_+\times\mathcal{B}$, i.e., semiconvex on $\Omega_L$ for all $L > 0$.

Proposition 3.13 yields as a first consequence that the local slope is a strong upper gradient according to Definition 2.9, even with non-compact weights, i.e., on
\[
\Omega^n_\infty := [0,\infty)^n \times \mathcal{B}^n. \tag{3.26}
\]
Recall from Theorem 2.8 that the slope is a local concept; hence there is no need to write, e.g., $\big|\partial J_n|_{\Omega^n_L}\big|$ instead of just always $|\partial J_n|$.

Lemma 3.16. Assume (A1)–(A5) and that the forward operator $K : \mathcal{M}\to Y$ fulfills (A8). Then, the local slope $|\partial J_n|$ is a strong upper gradient for $J_n$ on $\Omega^n_\infty$.

Proof. By Proposition 3.13, $J_n$ is $\lambda(L)$-geodesically convex, hence by [4, Corollary 2.4.10] the metric slope $|\partial J_n|$ is a strong upper gradient for $J_n$ on $\Omega^n_L$. Now, let $v\in AC([0,1];\Omega^n_\infty)$ and take $L > 0$ large enough so that $v_t\in\Omega^n_L$ for every $t\in[0,1]$. Then, for all $0\leq s\leq t\leq 1$, locality of the metric slope gives
\[
|J_n(v_t) - J_n(v_s)| \leq \int_s^t |\partial J_n|(v_r)\, |v'_r|\, dr,
\]
since $|\partial J_n|$ is a strong upper gradient for $J_n$ on $\Omega^n_L$. Thus $|\partial J_n|$ is also a strong upper gradient for $J_n$ on $\Omega^n_\infty$. ∎
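The determinant formula in Example 3.15 can be cross-checked numerically; a minimal finite-difference sketch for $f(c,x) = c^4 x^2 + c^2$ (the sample points are arbitrary):

```python
# Finite-difference check of the Hessian determinant in Example 3.15:
# for f(c, x) = c^4 x^2 + c^2 one has det Hess f = 4 c^4 (1 - 10 c^2 x^2),
# which turns negative for large c, ruling out global semiconvexity.
def f(c, x):
    return c ** 4 * x ** 2 + c ** 2

def hess_det(c, x, h=1e-4):
    # central differences for f_cc, f_xx and the mixed derivative f_cx
    fcc = (f(c + h, x) - 2 * f(c, x) + f(c - h, x)) / h ** 2
    fxx = (f(c, x + h) - 2 * f(c, x) + f(c, x - h)) / h ** 2
    fcx = (f(c + h, x + h) - f(c + h, x - h)
           - f(c - h, x + h) + f(c - h, x - h)) / (4 * h ** 2)
    return fcc * fxx - fcx ** 2

c, x = 2.0, 0.8
exact = 4 * c ** 4 * (1 - 10 * c ** 2 * x ** 2)
assert abs(hess_det(c, x) - exact) / abs(exact) < 1e-4
assert hess_det(c, x) < 0   # a negative eigenvalue appears for large c
```

At $(c,x) = (2, 0.8)$ the determinant is negative, matching (3.25) and the loss of semiconvexity for large weights, while for small $c$ (e.g. $(0.5, 0.1)$) it stays positive.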
Lemma 3.16 yields the existence of curves of maximal slope, again without the need for compactness; here we recall Theorem 3.9 and (3.26).

Theorem 3.17. Assume (A1)–(A8). Then every minimizing movement $(\mathbf{c}_t,\mathbf{u}_t)$ in $\Omega^n_\infty$ is a curve of maximal slope for $J_n$ with regard to $|\partial J_n|$. In addition, minimizing movements in $\Omega^n_L$ are also curves of maximal slope.

Proof. Let $(\mathbf{c}_0,\mathbf{u}_0)\in\Omega^n_\infty$, and let $(\mathbf{c}_t,\mathbf{u}_t)$ be a minimizing movement for $J_n$ in $\Omega^n_\infty$. First, assume that there exists $L > 0$ such that $c^j_t\in[0,L]$ for all $j\in\{1,\dots,n\}$ and $t\geq 0$. By Theorem 3.9, there exist discrete solutions $(\mathbf{c}_{\tau_\ell,t},\mathbf{u}_{\tau_\ell,t})\in\Omega^n_\infty$ such that
\[
d_n\big((\mathbf{c}_{\tau_\ell,t},\mathbf{u}_{\tau_\ell,t}),(\mathbf{c}_t,\mathbf{u}_t)\big) \to 0 \quad\text{as } \ell\to+\infty. \tag{3.27}
\]
Since $c^j_t\leq L$, for $\ell$ big enough, $(\mathbf{c}_{\tau_\ell,t},\mathbf{u}_{\tau_\ell,t})\in\Omega^n_{\tilde L}$ for $\tilde L > L$. This implies that the limit is a minimizing movement in $\Omega^n_{\tilde L}$. Now, using [4, Corollary 2.4.11] together with Proposition 3.13 gives that $(\mathbf{c}_t,\mathbf{u}_t)$ is also a curve of maximal slope for $J_n|_{\Omega^n_{\tilde L}}$ with regard to $\big|\partial J_n|_{\Omega^n_{\tilde L}}\big|$. Since all of the involved terms are local, $(\mathbf{c}_t,\mathbf{u}_t)$ is a curve of maximal slope for $J_n$ with regard to $|\partial J_n|$.

So the only thing left to prove is that for each minimizing movement $(\mathbf{c}_t,\mathbf{u}_t)$ in $\Omega^n_\infty$, there exists some $L > 0$ such that $c^j_t\in[0,L]$ for all $j\in\{1,\dots,n\}$ and $t\geq 0$. By (A6)–(A7), and as $\mathcal{F}$ is bounded from below by (A2), it holds that
\[
\frac{1}{n}\sum_{j=1}^n (c^j_t)^2 = \mathcal{R}\Big(\frac{1}{n}\sum_{j=1}^n (c^j_t)^2 u^j_t\Big) = J_n(\mathbf{c}_t,\mathbf{u}_t) - \mathcal{F}\Big(\frac{1}{n}\sum_{j=1}^n (c^j_t)^2 K u^j_t\Big) \leq J_n(\mathbf{c}_t,\mathbf{u}_t) + C(\mathcal{F}). \tag{3.28}
\]
As for every $\tau > 0$, (3.12) and (3.13) yield
\[
J_n(\mathbf{c}^1_\tau,\mathbf{u}^1_\tau) \leq J_n(\mathbf{c}^1_\tau,\mathbf{u}^1_\tau) + \frac{1}{2\tau}\, d_n^2\big((\mathbf{c}^1_\tau,\mathbf{u}^1_\tau),(\mathbf{c}_0,\mathbf{u}_0)\big) \leq J_n(\mathbf{c}_0,\mathbf{u}_0),
\]
and thus iteratively, for all $k\in\mathbb{N}$,
\[
J_n(\mathbf{c}^k_\tau,\mathbf{u}^k_\tau) \leq J_n(\mathbf{c}^{k-1}_\tau,\mathbf{u}^{k-1}_\tau) \leq J_n(\mathbf{c}_0,\mathbf{u}_0).
\]
(3.29)

In view of (3.14), (3.27) (taking the limit ℓ → ∞) and the lower semicontinuity of $J_n$ with respect to $d_n$, we infer that $J_n(\mathbf{c}_t,\mathbf{u}_t) \leq J_n(\mathbf{c}_0,\mathbf{u}_0)$, which together with (3.28) and the fact that $n$ is fixed implies that the $c^j_t$ indeed stay uniformly bounded for all $j\in\{1,\dots,n\}$ and $t\geq 0$. ∎

3.4 NPC of the extremal points and uniqueness of the flow

We now turn to the issue of uniqueness of the AGF with respect to the functional $J_n$ introduced in (3.5). Here, we show that if the metric space $\mathcal{B}$ is non-positively curved (NPC), cf. Definition 2.5, we can ensure such a uniqueness property. To this end, in Section A.1 we recall a local uniqueness result, see Theorem A.4, which we extend below to a global one. We remind the reader that uniqueness of gradient flows on NPC metric spaces was extensively studied in [37], and, among other works, this analysis was further refined in [4, Chapter 4].

First, it is well known that $\mathcal{B}$ being NPC implies that $\Omega^n_L$ is as well. This is essentially a consequence of Theorem 3.2 and of choosing the $\ell^2$-metric on the product space $\Omega^n_L$.

Lemma 3.18 (NPC of product space). If $(\mathcal{B}, d_{\mathcal{B}})$ is a space of NPC according to Theorem 2.5, then so is $(\Omega^n_L, d_n)$.

We are then ready to show uniqueness, provided that $(\mathcal{B}, d_{\mathcal{B}})$ is NPC. Uniqueness also comes with contraction estimates between minimizing movements with different initial points.

Theorem 3.19. Assume (A1)–(A8) and that $(\mathcal{B}, d_{\mathcal{B}})$ is NPC. Then, for every initial point $(\mathbf{c}_0,\mathbf{u}_0)\in\Omega^n_\infty$, there exists a unique minimizing movement for $J_n$. Moreover, for every other initial point $(\tilde{\mathbf{c}}_0,\tilde{\mathbf{u}}_0)\in\Omega^n_\infty$ and corresponding minimizing movement $(\tilde{\mathbf{c}}_t,\tilde{\mathbf{u}}_t)$, the following contraction estimate holds:
\[
d_n\big((\mathbf{c}_t,\mathbf{u}_t),(\tilde{\mathbf{c}}_t,\tilde{\mathbf{u}}_t)\big) \leq e^{-\lambda t}\, d_n\big((\mathbf{c}_0,\mathbf{u}_0),(\tilde{\mathbf{c}}_0,\tilde{\mathbf{u}}_0)\big) \quad\text{for } \mathcal{L}^1\text{-a.e. } t > 0, \tag{3.30}
\]
for a $\lambda\in\mathbb{R}$ depending only on $|\mathbf{c}_0|$ and $|\tilde{\mathbf{c}}_0|$.

Proof.
Again by (A6)–(A7), and as $\mathcal{F}$ is bounded from below by a constant $-C(\mathcal{F})$, it holds that
\[
J_n(\mathbf{c},\mathbf{u}) = \mathcal{F}\Big(\frac{1}{n}\sum_{j=1}^n (c^j)^2 K(u^j)\Big) + \frac{1}{n}\sum_{j=1}^n (c^j)^2 \geq \frac{1}{n}\sum_{j=1}^n (c^j)^2 - C(\mathcal{F}), \quad\text{i.e.,}\quad \frac{1}{n}\sum_{j=1}^n (c^j)^2 \leq J_n(\mathbf{c},\mathbf{u}) + C(\mathcal{F}).
\]
Assume that $(\mathbf{c}_t,\mathbf{u}_t)$ is a minimizing movement with initial point $(\mathbf{c}_0,\mathbf{u}_0)$. Then, by the previous estimate and (3.29), we have $(\mathbf{c}_t,\mathbf{u}_t)\in\Omega^n_L$ for some $L = L(n,(\mathbf{c}_0,\mathbf{u}_0))$ large enough. Since $J_n$ is λ-convex on $\Omega^n_L$ by Proposition 3.13, minimizing movements in $\Omega^n_L$ exist and fulfill the contractivity property (3.30) by Theorem A.4, cf. (A.7). ∎

4 The lifted problem

To further our analysis, we follow a lifting approach to the Wasserstein space, detailed in [14]. This leads to a comparison of Atomic Gradient Flows in $\Omega^n_L$ versus metric gradient flows in $\mathcal{P}_2(\Omega_L)$ in Section 5, which can be thought of as similar in spirit to the investigation of Chizat and Bach in [23]. A key difference here is that lifted particles are Dirac deltas supported on elements of $\Omega_L$. In particular, for $TV$-regularized problems, where extremal points are Dirac deltas, lifted particles would be elements of the form $\delta_{\delta_x}$. However, it is clear that this representation of lifted particles is isometric to $\delta_x$, so that the lifted formulation remains fully consistent with the perspective of Chizat and Bach (more details will be given in Section 6). Note that to keep the lifted problem equivalent to the original problem (2.1) on $\mathcal{M}$, (only) in this section one has to work again with the full set of extremal points $\tilde{\mathcal{B}}$ of (2.8).

Definition 4.1 (Lifting in the space of positive measures). We consider the lifting of (2.1) to the space of positive measures on $\tilde{\mathcal{B}}$ as follows:
\[
\inf_{\mu\in M^+(\tilde{\mathcal{B}})} j(\mu), \quad\text{where}\quad j(\mu) := \mathcal{F}(\mathbf{K}\mu) + \|\mu\|_{TV}, \tag{4.1}
\]
where $M^+(\tilde{\mathcal{B}})$ is the cone of positive (Radon) measures on $\tilde{\mathcal{B}}$.
The forward operator $\mathbf{K} : M^+(\tilde{\mathcal{B}})\to Y$ is set as
\[
\mathbf{K}\mu := K\mathcal{I}(\mu), \tag{4.2}
\]
where the representative $\mathcal{I}(\mu)\in\mathcal{M}$ is defined via the following version of Choquet's theorem.

Proposition 4.2 (Version of Choquet's theorem, [14, Proposition 5.2] & [40, page 14]). Every measure $\mu\in M^+(\tilde{\mathcal{B}})$ defines the linear action of some $\mathcal{I}(\mu)\in\mathrm{Dom}(\mathcal{R})$ via duality, namely,
\[
\langle \mathcal{I}(\mu), p\rangle = \int_{\tilde{\mathcal{B}}} \langle v, p\rangle\, d\mu(v) \quad \forall p\in\mathcal{C}. \tag{4.3}
\]
Furthermore, the map $\mathcal{I} : M^+(\tilde{\mathcal{B}})\to\mathrm{Dom}(\mathcal{R})\subset\mathcal{M}$ is a linear surjection, and $\mathcal{I}(\mu)$ is called the (weak) barycenter of $\mu$.

For $\alpha > 0$, we also write $M^+_\alpha(\tilde{\mathcal{B}}) := \{\mu\in M^+(\tilde{\mathcal{B}}) : \|\mu\|_{TV}\leq\alpha\}$. We can now state the equivalence between the optimization of the lifted problem and the original one.

Proposition 4.3. By [14, page 4 and Theorem 5.4], we have the equivalence
\[
\min_{u\in\mathcal{M}} J(u) = \min_{\mu\in M^+(\tilde{\mathcal{B}})} j(\mu). \tag{4.4}
\]
In addition, there exists an $\alpha = \alpha(J) > 0$ such that
\[
\min_{u\in\mathcal{M}} J(u) = \min_{\mu\in M^+_\alpha(\tilde{\mathcal{B}})} j(\mu). \tag{4.5}
\]

Proof. The statement in (4.4) is already proved in [14], while for (4.5) it suffices to consider a minimizer $u^*\in\arg\min_{u\in\mathcal{M}} J(u)$ and $\alpha > \mathcal{R}(u^*)$. ∎

4.1 Lifting the problem in Wasserstein space

Recalling (3.1) and the subsequent notation, let $\tilde\Omega_L := [0,L]\times\tilde{\mathcal{B}}$ for a fixed $L > \sqrt{\alpha}$, with $\alpha > 0$ as in Theorem 4.3. Denote by $\mathcal{P}_2(\tilde\Omega_L)$ the space of probability measures on $\tilde\Omega_L$ with finite second moment, endowed with the Wasserstein-2 metric. In particular, since the metric $d_{\tilde{\mathcal{B}}}$ is bounded, we can write
\[
\mathcal{P}_2(\tilde\Omega_L) := \big\{\nu\in M^+(\tilde\Omega_L) : \nu(\tilde\Omega_L) = 1\big\}. \tag{4.6}
\]
For every $\nu_1,\nu_2\in\mathcal{P}_2(\tilde\Omega_L)$, their Wasserstein 2-distance is defined as
\[
W_2(\nu_1,\nu_2) := \sqrt{\inf_{\gamma\in\Gamma_{\nu_1,\nu_2}}\int_{\tilde\Omega_L\times\tilde\Omega_L} d_\Omega^2(\omega_1,\omega_2)\, d\gamma(\omega_1,\omega_2)}, \tag{4.7}
\]
where
\[
\Gamma_{\nu_1,\nu_2} := \{\gamma\in\mathcal{P}_2(\tilde\Omega_L\times\tilde\Omega_L) : (\pi^1)_\#(\gamma) = \nu_1,\ (\pi^2)_\#(\gamma) = \nu_2\}.
\]
(4.8)

In (4.8), for $i = 1,2$, we have denoted by $\pi^i : \tilde\Omega_L\times\tilde\Omega_L\to\tilde\Omega_L$ the $i$-th coordinate projection. As a reminder, cf. (3.3), the distance $d_\Omega : \tilde\Omega_L\times\tilde\Omega_L\to\mathbb{R}_+$ is defined as follows: for $\omega_i := (c_i,u_i)\in\tilde\Omega_L$, we set
\[
d_\Omega(\omega_1,\omega_2) := \big(|c_1 - c_2|^2 + d^2_{\tilde{\mathcal{B}}}(u_1,u_2)\big)^{1/2}. \tag{4.9}
\]
The infimum in (4.7) is actually a minimum, cf. [4, Section 7.1], and the set of all corresponding minimizers will be denoted by $\Gamma_0(\nu_1,\nu_2)$. We define the homogeneous projection operator $\pi : \mathcal{P}_2(\tilde\Omega_L)\to M^+(\tilde{\mathcal{B}})$ via
\[
\int_{\tilde{\mathcal{B}}} \psi(u)\, d[\pi\nu](u) := \int_{\tilde\Omega_L} c^2\psi(u)\, d\nu(c,u) \quad \forall \nu\in\mathcal{P}_2(\tilde\Omega_L),\ \psi\in C(\tilde{\mathcal{B}}). \tag{4.10}
\]
Note that the homogeneous projection is a typical tool for addressing the unbalanced formulation of Optimal Transport and is also one way to define the Hellinger–Kantorovich distance [36]. Thanks to this homogeneous projection, we can transform the problem on the right-hand side of (4.1) into a minimization problem in $\mathcal{P}_2(\tilde\Omega_L)$ in the following way.

Definition 4.4 (Lifting in the space of probability measures). We consider the problem
\[
\inf_{\nu\in\mathcal{P}_2(\tilde\Omega_L)} \mathcal{J}(\nu), \quad\text{where}\quad \mathcal{J}(\nu) := j(\pi\nu). \tag{4.11}
\]
Then, the functional $\mathcal{J}$ can be written more explicitly as
\[
\mathcal{J}(\nu) = \mathcal{F}(\mathbf{K}[\pi\nu]) + \int_{\tilde\Omega_L} c^2\, d\nu(c,u).
\]
Using (4.2), (4.3) and (4.10), we can also check that
\[
\mathcal{F}(\mathbf{K}[\pi\nu]) = \mathcal{F}(K\mathcal{I}[\pi\nu]) = \mathcal{F}\Big(\int_{\tilde\Omega_L} c^2 Ku\, d\nu(c,u)\Big),
\]
so that in total
\[
\mathcal{J}(\nu) = \mathcal{F}\Big(\int_{\tilde\Omega_L} c^2 Ku\, d\nu(c,u)\Big) + \int_{\tilde\Omega_L} c^2\, d\nu(c,u). \tag{4.12}
\]
Remark 4.5. The definitions above can be made analogously with $\mathcal{B}$ instead of $\tilde{\mathcal{B}}$, but as mentioned before, the equivalence of the problems that we show in the following only holds with $\tilde{\mathcal{B}}$ in general.

Next, we show that problems (4.1) and (4.11) are equivalent. We first need to prove that the projection operator $\pi$ is surjective.

Lemma 4.6.
For every $\alpha > 0$ and $L > \sqrt{\alpha}$, we have that the homogeneous projection operator $\pi : \mathcal{P}_2(\tilde\Omega_L)\to M^+_\alpha(\tilde{\mathcal{B}})$ is surjective.

Proof. Consider a measure $\mu\in M^+_\alpha(\tilde{\mathcal{B}})$. Note that if $\mu = 0$, then $\pi\nu_0 = \mu$ for every $\nu_0$ of the form $\nu_0 = \delta_0\otimes\delta_u$, for any $u\in\tilde{\mathcal{B}}$. Therefore, in what follows we suppose that $\mu(\tilde{\mathcal{B}}) > 0$. Define now
\[
\nu_\mu := \delta_{\sqrt{\mu(\tilde{\mathcal{B}})}} \otimes \frac{\mu}{\mu(\tilde{\mathcal{B}})} \in M^+(\tilde\Omega_L), \tag{4.13}
\]
which is well defined since $L > \sqrt{\alpha}$. Note that for every test function $\varphi\in C(\tilde\Omega_L)$, we have
\[
\nu_\mu(\varphi) := \int_{\tilde\Omega_L} \varphi(c,u)\, d\delta_{\sqrt{\mu(\tilde{\mathcal{B}})}}(c)\, d\Big(\frac{\mu}{\mu(\tilde{\mathcal{B}})}\Big)(u) = \int_{\tilde{\mathcal{B}}} \varphi\big(\sqrt{\mu(\tilde{\mathcal{B}})}, u\big)\, d\Big(\frac{\mu}{\mu(\tilde{\mathcal{B}})}\Big)(u).
\]
Testing the above definition with the constant function $\varphi\equiv 1$, it is clear that the measure $\nu_\mu$ defined in (4.13) is a probability measure. Finally, let us verify that $\pi\nu_\mu = \mu$. For every $\psi\in C(\tilde{\mathcal{B}})$ it holds that
\[
\pi\nu_\mu(\psi) = \int_{\tilde{\mathcal{B}}} \psi(u)\, d[\pi\nu_\mu](u) = \int_{\tilde\Omega_L} c^2\psi(u)\, d\nu_\mu(c,u) = \int_{[0,L]\times\tilde{\mathcal{B}}} c^2\psi(u)\, d\delta_{\sqrt{\mu(\tilde{\mathcal{B}})}}(c)\, d\Big(\frac{\mu}{\mu(\tilde{\mathcal{B}})}\Big)(u) = \int_{\tilde{\mathcal{B}}} \psi(u)\, d\mu(u) = \mu(\psi),
\]
showing that $\pi\nu_\mu = \mu$. ∎

Proposition 4.7. For every $\alpha > 0$ and $L > \sqrt{\alpha}$, it holds that
\[
\min_{\nu\in\mathcal{P}_2(\tilde\Omega_L)} \mathcal{J}(\nu) = \min_{\mu\in M^+_\alpha(\tilde{\mathcal{B}})} j(\mu). \tag{4.14}
\]
Moreover, if $\nu\in\mathcal{P}_2(\tilde\Omega_L)$ minimizes $\mathcal{J}$, then $\pi\nu\in M^+_\alpha(\tilde{\mathcal{B}})$ minimizes $j$. Conversely, if $\mu\in M^+_\alpha(\tilde{\mathcal{B}})$ minimizes $j$, then every $\nu\in\mathcal{P}_2(\tilde\Omega_L)$ such that $\pi\nu = \mu$ minimizes $\mathcal{J}$.

Proof. The existence of minimizers in both problems follows from the structural assumptions (A1)–(A6) and a standard application of the direct method in the Calculus of Variations. The equivalence of the two minimization problems follows immediately from the surjectivity of $\pi$ and the definition of $\mathcal{J}$ in (4.11). Indeed, let $\nu\in\mathcal{P}_2(\tilde\Omega_L)$ be a minimizer of $\mathcal{J}$. Then, for every $\mu\in M^+_\alpha(\tilde{\mathcal{B}})$, using Lemma 4.6, there exists $\nu_\mu\in\mathcal{P}_2(\tilde\Omega_L)$ such that $\pi\nu_\mu = \mu$, so that by (4.11) we have
\[
j(\mu) = j(\pi\nu_\mu) = \mathcal{J}(\nu_\mu) \geq \mathcal{J}(\nu) = j(\pi\nu),
\]
i.e.
, $\pi\nu\in M^+_\alpha(\tilde{\mathcal{B}})$ is a minimizer for $j$. The converse follows analogously, showing (4.14). ∎

4.2 Minimizing movements for the lifted problem

Following the previous discussion and the approach in Subsection 3.1, we fix again a closed (hence compact), geodesically complete subset $\mathcal{B}\subset\tilde{\mathcal{B}}$, and recalling (3.1)–(3.3), we consider $\Omega_L = [0,L]\times\mathcal{B}$ and the lifted functional $\mathcal{J}$ on $\mathcal{P}_2(\Omega_L)$, which is itself a compact and geodesically connected subset of $\mathcal{P}_2(\tilde\Omega_L)$. We recall that, having restricted to $\Omega_L$, the functional $\mathcal{J}$ is defined as
\[
\mathcal{J}(\nu) := \mathcal{F}\Big(\int_{\Omega_L} c^2 Ku\, d\nu(c,u)\Big) + \int_{\Omega_L} c^2\, d\nu(c,u). \tag{4.15}
\]
Again, we use the JKO scheme to show existence of minimizing movements. We note that, similarly as in Subsection 3.1, we can only hope to recover an approximate minimum of $J$ with $\mathcal{J}$ on $\mathrm{conv}\,\mathcal{B}$, but here one needs to restrict the support of our measures to $\mathcal{B}$ to obtain existence of, e.g., curves of maximal slope.

Definition 4.8 (Approximation scheme). Let $\tau > 0$ and let $\nu_0\in\mathcal{P}_2(\Omega_L)$ be a given initialization. Set $\nu^0_\tau := \nu_0$, and choose (iteratively) for all $k\in\mathbb{N}_0$,
\[
\nu^{k+1}_\tau \in \arg\min_{\nu\in\mathcal{P}_2(\Omega_L)}\Big\{\mathcal{J}(\nu) + \frac{1}{2\tau}\, W_2^2(\nu,\nu^k_\tau)\Big\}. \tag{4.16}
\]
Define then the piecewise constant curve $\nu_\tau : [0,+\infty)\to\mathcal{P}_2(\Omega_L)$ by $\nu^0_\tau = \nu_0$ and, for every $k\in\mathbb{N}_0$,
\[
\nu_{\tau,t} := \nu^k_\tau \quad\text{for } t\in((k-1)\tau, k\tau]. \tag{4.17}
\]
The next theorem shows the existence of minimizing movements for the lifted problem.

Theorem 4.9 (Minimizing movements as limit path for τ → 0). With $\nu_{\tau,t}$ as in (4.17), there exist $(\tau_\ell)_{\ell\in\mathbb{N}}\subset(0,1)$ with $\tau_\ell\searrow 0$, and a curve $\nu_t\in AC^2_{loc}([0,+\infty);\mathcal{P}_2(\Omega_L))$ such that for every $t\in[0,+\infty)$ it holds that
\[
W_2(\nu_{\tau_\ell,t}, \nu_t) \to 0 \quad\text{as } \ell\to\infty.
\]
In particular, $\nu_t$ is a minimizing movement in the sense of Theorem A.3.

Proof. We first check that $\mathcal{J} : \mathcal{P}_2(\Omega_L)\to\mathbb{R}$ is lower semicontinuous with respect to narrow convergence.
Indeed, if $(\nu_m)_{m\in\mathbb{N}}\subset\mathcal{P}_2(\Omega_L)$ converges narrowly to $\nu\in\mathcal{P}_2(\Omega_L)$, and we assume without restriction that
\[
\liminf_{m\to\infty}\mathcal{J}(\nu_m) = \lim_{m\to\infty}\mathcal{J}(\nu_m) =: M < +\infty, \tag{4.18}
\]
then
\[
\liminf_{m\to\infty}\int_{\Omega_L} c^2\, d\nu_m(c,u) \geq \int_{\Omega_L} c^2\, d\nu(c,u),
\]
thanks to [4, Formula 5.1.15]. Since $\mathcal{F}$ is continuous and convex, cf. (A2), it is also weakly lower semicontinuous. Therefore, it is enough to prove that
\[
\lim_{m\to\infty}\int_{\Omega_L} c^2 Ku\, d\nu_m(c,u) = \int_{\Omega_L} c^2 Ku\, d\nu(c,u)
\]
with respect to the weak convergence in $Y$. To this end, given $y\in Y$, it holds that
\[
\Big(\int_{\Omega_L} c^2 Ku\, d\nu_m(c,u),\ y\Big)_Y = \int_{\Omega_L} c^2 (Ku, y)_Y\, d\nu_m(c,u). \tag{4.19}
\]
Note that $c^2|(Ku,y)_Y|$ is uniformly integrable with respect to $(\nu_m)_m$. Indeed,
\[
\int_{\Omega_L} c^2 |(Ku,y)_Y|\, d\nu_m \leq \Big(\|K\|\,\|y\|_Y \sup_{u\in\mathcal{B}}\|u\|_{\mathcal{M}}\Big)\, \mathcal{J}(\nu_m),
\]
and the right-hand side of the last inequality is uniformly bounded in $m$ thanks to (4.18). Therefore, applying [4, Lemma 5.1.7] to (4.19), we conclude the lower semicontinuity of $\mathcal{J}$ with respect to narrow convergence. Finally, since $W_2$ metrizes the narrow convergence in $\mathcal{P}_2(\Omega_L)$, we can apply [4, Proposition 2.2.3] to conclude the proof. ∎

4.3 λ-Convexity for the lifted problem

Analogously to Subsection 3.3, we turn to the study of convexity properties of the functional $\mathcal{J}$ along geodesics in the Wasserstein space $\mathcal{P}_2(\Omega_L)$. In particular, we address the question whether, for a geodesic $\gamma_t\in\mathcal{P}_2(\Omega_L)$, the function
\[
t\mapsto\mathcal{J}(\gamma_t) := \mathcal{F}\Big(\int_{\Omega_L} c^2 Ku\, d\gamma_t(c,u)\Big) + \int_{\Omega_L} c^2\, d\gamma_t(c,u)
\]
is λ-convex, i.e., fulfills (2.13). In the following, we denote by $C([0,1];\Omega_L)$ the separable and complete metric space of continuous curves in $\Omega_L$, endowed with the metric of uniform convergence induced by $d_\Omega$, cf. (3.3). Define then the evaluation map $e_t : C([0,1];\Omega_L)\to\Omega_L$ as
\[
e_t(v) := v(t) \quad\text{for every } t\in[0,1].
\]
Note also that by [2, Theorem 10.6], geodesics in $\mathcal{P}_2(\Omega_L)$ exist, as $\Omega_L$ is a geodesic space.
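For intuition about the convexity of the mass term $\nu\mapsto\int c^2\, d\nu$ along such geodesics, consider the simplest case of empirical measures of two linearly moving particles; a toy numerical sketch (the particle data is arbitrary, chosen so that the identity pairing is the $W_2$-optimal one):

```python
# Toy illustration: along a Wasserstein geodesic of empirical measures
# gamma_t = (1/2)(delta_{p_t} + delta_{q_t}), with p_t = (1-t) p_0 + t p_1
# (and likewise q_t), the mass term M(t) = int c^2 d(gamma_t) satisfies
# M''(t) = 2 * mean((c_1 - c_0)^2) <= 2 * W_2^2(gamma_0, gamma_1).
# Particles are (c, x) pairs; values below are illustrative only.
p0, p1 = (0.5, 0.1), (0.6, 0.2)
q0, q1 = (1.2, 0.9), (1.1, 0.8)

def M(t):
    c_p = (1 - t) * p0[0] + t * p1[0]
    c_q = (1 - t) * q0[0] + t * q1[0]
    return (c_p ** 2 + c_q ** 2) / 2

h, t = 1e-3, 0.5
second = (M(t + h) - 2 * M(t) + M(t - h)) / h ** 2
exact = (p1[0] - p0[0]) ** 2 + (q1[0] - q0[0]) ** 2   # 2 * mean over 2 particles
W2_sq = (((p1[0] - p0[0]) ** 2 + (p1[1] - p0[1]) ** 2)
         + ((q1[0] - q0[0]) ** 2 + (q1[1] - q0[1]) ** 2)) / 2
assert abs(second - exact) < 1e-8   # M is quadratic in t
assert exact <= 2 * W2_sq
```

The bound on $M''$ by twice the squared Wasserstein distance is the discrete counterpart of the estimate derived in the proof below; it holds here because the coupling used is optimal for this data.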
To prove the $\lambda$-convexity, we first recall the metric dynamical formulation of optimal transport, which is a major ingredient in the proof. For every $\mu_0, \mu_1 \in \mathcal P_2(\Omega_L)$, consider the minimization problem
$$ \min\Big\{ \int_{C([0,1];\Omega_L)} \int_0^1 |\gamma_t'|^2 \, dt \, d\eta(\gamma) \;:\; \eta \in \mathcal P(C([0,1];\Omega_L)), \ (e_0,e_1)_\# \eta = (\mu_0,\mu_1) \Big\}. \qquad (4.20) $$
We call any such admissible $\eta$ a dynamic transport plan, the collection of which is denoted by $\mathrm{DTP}(\mu_0,\mu_1)$, and any optimal one will be called an optimal dynamic geodesic plan. We denote by $\mathrm{OptGeo}(\mu_0,\mu_1)$ the collection of all optimal geodesic plans from $\mu_0$ to $\mu_1$, each of which is supported on $\mathrm{Geo}(\Omega_L)$ (see [2, Theorem 9.13]). Using [2, Theorem 9.13] and Lemma 3.2, it holds that
$$ W_2^2(\mu_0,\mu_1) = \min_{\eta \in \mathrm{DTP}(\mu_0,\mu_1)} \int_{\mathrm{Geo}(\Omega_L)} \int_0^1 |(c_t,u_t)'|^2 \, dt \, d\eta(c,u). \qquad (4.21) $$
In addition, [2, Theorem 10.6] guarantees that for any geodesic $\gamma_t \in \mathcal P_2(\Omega_L)$ there exists an $\eta \in \mathcal P(C([0,1];\Omega_L))$ with $\eta \in \mathrm{OptGeo}(\gamma_0,\gamma_1)$ and $\operatorname{spt}\eta \subset \mathrm{Geo}(\Omega_L)$ such that
$$ \gamma_t = (e_t)_\# \eta. \qquad (4.22) $$
Thus, fixing any geodesic $\gamma_t \in \mathcal P_2(\Omega_L)$, we can write $\mathcal J(\gamma_t)$ as
$$ \mathcal J(\gamma_t) = \mathcal F\Big( \int_{\Omega_L} c^2 Ku \, d[(e_t)_\#\eta](c,u) \Big) + \int_{\Omega_L} c^2 \, d[(e_t)_\#\eta](c,u) = \mathcal F\Big( \int_{\mathrm{Geo}(\Omega_L)} c_t^2 K u_t \, d\eta(c,u) \Big) + \int_{\mathrm{Geo}(\Omega_L)} c_t^2 \, d\eta(c,u). \qquad (4.23) $$

Proposition 4.10. Assume (A1)-(A8). Then for every $L > 0$ there exists a $\lambda(L) \leq 0$ such that $\mathcal J$ is $\lambda$-convex along geodesics on $\mathcal P_2(\Omega_L)$.

Proof. Note that for every geodesic $(c_t,u_t) \in \mathrm{Geo}(\Omega_L)$, by Theorem 3.2 we have that $c_t \in \mathrm{Geo}(\mathbb R_+)$ and $u_t \in \mathrm{Geo}(\mathcal B)$. Thus $c_t = (1-t)c_0 + t c_1$. Therefore,
$$ \frac{d}{dt} c_t^2 = 2t(c_1-c_0)^2 + 2(c_1-c_0)c_0 \leq 2\big( |c_1-c_0|^2 + |c_1-c_0|\, c_0 \big), \quad \text{and} \quad \frac{d^2}{dt^2} c_t^2 = 2(c_1-c_0)^2. \qquad (4.24) $$
Let now $\gamma_t \in \mathcal P_2(\Omega_L)$ be a geodesic. Thanks to [2, Theorem 10.6] there exists $\eta \in \mathcal P(\mathrm{Geo}(\Omega_L))$ with $\gamma_t = (e_t)_\#\eta$ and $\eta \in \mathrm{OptGeo}(\gamma_0,\gamma_1)$.
Therefore, by the Dominated Convergence Theorem,
$$ \frac{d}{dt} \int_{\Omega_L} c^2 \, d\gamma_t = \lim_{h\to 0} \frac{1}{h}\Big( \int_{\Omega_L} c^2 \, d\gamma_{t+h} - \int_{\Omega_L} c^2 \, d\gamma_t \Big) = \lim_{h\to 0} \frac{1}{h}\Big( \int_{\Omega_L} c^2 \, d(e_{t+h})_\#\eta - \int_{\Omega_L} c^2 \, d(e_t)_\#\eta \Big) = \lim_{h\to 0} \int_{\mathrm{Geo}(\Omega_L)} \frac{c_{t+h}^2 - c_t^2}{h} \, d\eta = \int_{\mathrm{Geo}(\Omega_L)} \frac{d}{dt}(c_t^2) \, d\eta, $$
and similarly $\frac{d^2}{dt^2} \int_{\Omega_L} c^2 \, d\gamma_t = \int_{\mathrm{Geo}(\Omega_L)} \frac{d^2}{dt^2}(c_t^2) \, d\eta$. Recalling (2.16) for geodesics, (4.21) and (4.24), we can further estimate
$$ \frac{d^2}{dt^2} \int_{\Omega_L} c^2 \, d\gamma_t = \int_{\mathrm{Geo}(\Omega_L)} 2(c_1-c_0)^2 \, d\eta \leq 2 \int_{\mathrm{Geo}(\Omega_L)} \big( |c_1-c_0|^2 + d^2_{\mathcal B}(u_1,u_0) \big) d\eta = 2 \int_{\mathrm{Geo}(\Omega_L)} \int_0^1 |(c_t,u_t)'|^2 \, dt \, d\eta = 2 W_2^2(\gamma_0,\gamma_1). \qquad (4.25) $$
Moreover, by a similar application of the Dominated Convergence Theorem, we have
$$ \frac{d}{dt} \int_{\Omega_L} c^2 K u \, d\gamma_t = \int_{\mathrm{Geo}(\Omega_L)} \Big( \frac{d}{dt}(c_t^2) K u_t + c_t^2 \frac{d}{dt} K u_t \Big) d\eta \qquad (4.26) $$
and
$$ \frac{d^2}{dt^2} \int_{\Omega_L} c^2 K u \, d\gamma_t = \int_{\mathrm{Geo}(\Omega_L)} \Big( \frac{d^2}{dt^2}(c_t^2) K u_t + 2 \frac{d}{dt}(c_t^2) \frac{d}{dt} K u_t + c_t^2 \frac{d^2}{dt^2} K u_t \Big) d\eta. \qquad (4.27) $$
Arguing as in the proof of Theorem 3.13, using (4.26), Jensen's inequality for the probability measure $\eta$ on $\mathrm{Geo}(\Omega_L)$, (4.24), (3.16), the Cauchy-Schwarz inequality, and for a constant $C := C(K, \mathcal B, L) > 0$ that is allowed to vary from line to line, we estimate
$$ \Big\| \frac{d}{dt} \int_{\Omega_L} c^2 K u \, d\gamma_t \Big\|_Y^2 \leq \int_{\mathrm{Geo}(\Omega_L)} \Big| \frac{d}{dt} c_t^2 \Big|^2 \|K u_t\|_Y^2 + 2 \Big| \frac{d}{dt} c_t^2 \Big| |c_t^2| \|K u_t\|_Y \Big\| \frac{d}{dt} K u_t \Big\|_Y + |c_t^2|^2 \Big\| \frac{d}{dt} K u_t \Big\|_Y^2 \, d\eta $$
$$ \leq C \Big( \int_{\mathrm{Geo}(\Omega_L)} \Big| \frac{d}{dt} c_t^2 \Big|^2 \|K u_t\|_Y^2 \, d\eta + \int_{\mathrm{Geo}(\Omega_L)} |c_t^2|^2 \Big\| \frac{d}{dt} K u_t \Big\|_Y^2 \, d\eta \Big) $$
$$ \leq C \int_{\mathrm{Geo}(\Omega_L)} \Big( \big( |c_1-c_0|^2 + |c_1-c_0|\, c_0 \big)^2 + (|c_0| + |c_1-c_0|)^4 \, d^2_{\mathcal B}(u_0,u_1) \Big) d\eta \leq C \int_{\mathrm{Geo}(\Omega_L)} \big( |c_1-c_0|^2 + d^2_{\mathcal B}(u_0,u_1) \big) d\eta = C W_2^2(\gamma_0,\gamma_1), \qquad (4.28) $$
where in the last equality we used again (4.21).
Analogously, using this time (4.27) and again (3.16), we estimate
$$ \int_{\mathrm{Geo}(\Omega_L)} c_t^2 \Big\| \frac{d^2}{dt^2} K u_t \Big\|_Y d\eta \leq C \int_{\mathrm{Geo}(\Omega_L)} (|c_1-c_0| + c_0)^2 \, d^2_{\mathcal B}(u_0,u_1) \, d\eta \leq C \int_{\mathrm{Geo}(\Omega_L)} \big( |c_1-c_0|^2 + d^2_{\mathcal B}(u_0,u_1) \big) d\eta = C W_2^2(\gamma_0,\gamma_1), $$
which directly leads to
$$ \Big\| \frac{d^2}{dt^2} \int_{\Omega_L} c^2 K u \, d\gamma_t \Big\|_Y \leq C W_2^2(\gamma_0,\gamma_1). \qquad (4.29) $$
Finally, considering the $\mathbb R$-valued map $[0,1] \ni t \mapsto \mathcal J(\gamma_t)$, and setting for brevity
$$ \mathcal K(t) := \int_{\mathrm{Geo}(\Omega_L)} c_t^2 K u_t \, d\eta, $$
by (4.23), (4.25), (4.28), and (4.29), we get
$$ \frac{d^2}{dt^2} \mathcal J(\gamma_t) = \frac{d^2}{dt^2} \mathcal F(\mathcal K(t)) + \frac{d^2}{dt^2} \int_{\mathrm{Geo}(\Omega_L)} c_t^2 \, d\eta \leq \frac{d}{dt} \Big( \nabla\mathcal F(\mathcal K(t)), \frac{d}{dt}\mathcal K(t) \Big)_Y + 2 W_2^2(\gamma_0,\gamma_1) $$
$$ = \nabla^2\mathcal F(\mathcal K(t))\Big[ \frac{d}{dt}\mathcal K(t), \frac{d}{dt}\mathcal K(t) \Big] + \Big( \nabla\mathcal F(\mathcal K(t)), \frac{d^2}{dt^2}\mathcal K(t) \Big)_Y + 2 W_2^2(\gamma_0,\gamma_1) \leq \|\mathcal F\|_{C^2} \Big( \Big\| \frac{d}{dt}\mathcal K(t) \Big\|_Y^2 + \Big\| \frac{d^2}{dt^2}\mathcal K(t) \Big\|_Y \Big) + 2 W_2^2(\gamma_0,\gamma_1) \leq C W_2^2(\gamma_0,\gamma_1), $$
where the constant $C > 0$ in the last line depends also on $\mathcal F$. Thus, the map $t \mapsto \mathcal J(\gamma_t)$ is $\tilde\lambda\, W_2^2(\gamma_0,\gamma_1)$-convex for some $\tilde\lambda := \tilde\lambda(\mathcal F, K, \mathcal B, L) \leq 0$. This exactly implies that $\mathcal J$ is $\tilde\lambda$-convex along geodesics in $\mathcal P_2(\Omega_L)$.

Theorem 4.11. Assume (A1)-(A8). Then $|\partial\mathcal J|$ is a strong upper gradient for $\mathcal J$ on $\mathcal P_2(\Omega_L)$. In addition, every minimizing movement $\mu_t$ is a curve of maximal slope for $\mathcal J$ with regard to $|\partial\mathcal J|$.

Proof. First, [4, Corollary 2.4.10] implies that for all $L > 0$, $|\partial\mathcal J|$ is a strong upper gradient for $\mathcal J$, since the latter is $\lambda$-convex by Proposition 4.10 and lower semicontinuous. Next, let $\mu_0 \in \mathrm{Dom}(\mathcal J)$ and $\mu_t$ a minimizing movement for $\mathcal J$, which exists by Theorem 4.9. Then, since $\mathcal J$ is $\lambda$-convex on $\mathcal P_2(\Omega_L)$, [4, Corollary 2.4.11] implies that $\mu_t$ is a curve of maximal slope for $\mathcal J$ with regard to $|\partial\mathcal J|$.

Remark 4.12. Note that here one cannot obtain uniqueness of the curves of maximal slope via standard methods, contrary to the setting for the non-lifted functional $J_n$.
This is because, in general, $\mathcal P_2(X)$ is not NPC, as geodesics can be non-unique. For $X$ being a Hilbert space, the uniqueness in $\mathcal P_2(X)$ still holds for lower semicontinuous functionals which are $\lambda$-convex along geodesics (see [4, Theorem 11.1.4]), but follows from a variational characterization of the so-called Wasserstein subdifferential. If $X$ is just a metric space such as in our setting, this characterization and even the Wasserstein subdifferential are not available.

5 Relating the minimizing movements

In this section we aim to relate the minimizing movements defined in Subsections 3.2 and 4.2. In particular, we will show that gradient flows for $J_n$ induce, through lifting, gradient flows for the lifted functional $\mathcal J$. The precise statement is given below in Theorem 5.1. We remark that it is a generalization of [23, Proposition B.1]. However, since we are here dealing with measures defined on general metric spaces, we cannot rely on the definition of Wasserstein gradient flow through the continuity equation, thus requiring a substantially different approach for the proof. Throughout this section, we will always assume (A1)-(A8), i.e., the standard assumptions, the no loss of mass condition on the regularizer (3.7), that $0 \notin \mathcal B$, and the compatibility condition between $K$ and the metric in $\mathcal B$ (see also Remark 3.12). For the next theorem, we again recall (3.1), (3.5), and (4.15).

Theorem 5.1. Suppose that (A1)-(A8) hold, and let $(c_t, u_t) \in AC([0,1], \Omega_L^n)$ be a curve of maximal slope for $J_n$ with respect to the strong upper gradient $|\partial J_n|$. Then,
$$ \mu_t := \frac{1}{n} \sum_{j=1}^n \delta_{(c_t^j, u_t^j)} \in \mathcal P_2(\Omega_L) \qquad (5.1) $$
is a curve of maximal slope for $\mathcal J$ with regard to the strong upper gradient $|\partial\mathcal J|$.

We first show that Theorem 5.1 is a consequence of the slope equality in Theorem 5.2, whose proof will be postponed to Subsections 5.1 and 5.2.

Theorem 5.2.
Suppose that (A1)-(A8) hold and let $n \geq 1$ and $\mu_t$ be as in (5.1). Then,
$$ |\partial\mathcal J|(\mu_t) = |\partial J_n|(c_t, u_t), \qquad (5.2) $$
where $L > 0$ is arbitrary but fixed.

Proof of Theorem 5.1. Note that $|\partial J_n|$ and $|\partial\mathcal J|$ are strong upper gradients for $J_n$ and $\mathcal J$, respectively, by Theorem 4.11 and Lemma 3.16. Recalling Definitions 2.9 and 2.10, we need to show that
$$ 2 \frac{d}{dt} \mathcal J(\mu_t) \leq -|\mu_t'|^2 - |\partial\mathcal J|^2(\mu_t) \quad \text{for } \mathcal L^1\text{-a.e. } t \in [0,1], \qquad (5.3) $$
using the fact that $(c_t,u_t)$ is already a curve of maximal slope for $J_n$, so it fulfills
$$ 2 \frac{d}{dt} J_n(c_t,u_t) \leq -|(c_t,u_t)'|^2 - |\partial J_n|^2(c_t,u_t) \quad \text{for } \mathcal L^1\text{-a.e. } t \in [0,1]. \qquad (5.4) $$
For this, note first that by (5.1), (4.15), (A1) and (3.5),
$$ \mathcal J(\mu_t) = \mathcal F\Big( \frac{1}{n} \sum_{j=1}^n (c_t^j)^2 K u_t^j \Big) + \frac{1}{n} \sum_{j=1}^n (c_t^j)^2 = J_n(c_t,u_t), $$
therefore it holds that $\frac{d}{dt}\mathcal J(\mu_t) = \frac{d}{dt} J_n(c_t,u_t)$. Also, using (2.16) on $(\mathcal P_2(\Omega_L), W_2)$ and (4.7) with the admissible transport plan
$$ \gamma = \frac{1}{n} \sum_{j=1}^n \delta_{\big((c_t^j,u_t^j),\,(c_s^j,u_s^j)\big)}, $$
as well as (4.9), (3.2) and (2.16) on $(\Omega_L^n, d_n)$, we obtain
$$ |\mu_t'| = \lim_{s\to t} \frac{W_2(\mu_s,\mu_t)}{|s-t|} \leq \lim_{s\to t} \frac{d_n((c_s,u_s),(c_t,u_t))}{|s-t|} = |(c_t,u_t)'|. $$
Thus, using (5.4), the above inequality and (5.2) already gives
$$ 2 \frac{d}{dt} \mathcal J(\mu_t) = 2 \frac{d}{dt} J_n(c_t,u_t) \leq -|(c_t,u_t)'|^2 - |\partial J_n|^2(c_t,u_t) \leq -|\mu_t'|^2 - |\partial J_n|^2(c_t,u_t) = -|\mu_t'|^2 - |\partial\mathcal J|^2(\mu_t) \quad \text{for } \mathcal L^1\text{-a.e. } t \in [0,1], $$
which shows (5.3) and concludes the proof of the theorem.

Hence, the rest of the section is devoted to the proof of the equality of the corresponding metric slopes of the functionals $J_n$ and $\mathcal J$.

5.1 The one-particle case ($n = 1$)

In the following, we prove Theorem 5.2 first in the case $n = 1$ by a simple localization argument, as the proof is significantly simpler than for $n > 1$.
For this, we need the following lemma, the proof of which we defer to Section A.2 for better readability.

Lemma 5.3. Suppose that (A1)-(A8) hold. Let $n = 1$ and $\mu_t = \delta_{(c_t,u_t)}$. For $\mathcal L^1$-a.e. $t \in [0,1]$, it holds that
$$ \lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{\big| \int_{\Omega_L} \mathcal F(c^2 K u) \, d\nu - \mathcal F\big( \int_{\Omega_L} c^2 K u \, d\nu \big) \big|}{W_2(\nu,\mu_t)} = 0. \qquad (5.5) $$
Then we can show the slope equality for $n = 1$.

Proposition 5.4. In the setting of Theorem 5.1 and Lemma 5.3, one has $|\partial\mathcal J|(\mu_t) = |\partial J_1|(c_t,u_t)$.

Proof. Note that by the definitions (2.17), (3.5), and (4.15),
$$ |\partial\mathcal J|(\mu_t) = \limsup_{\tilde\mu \overset{*}{\rightharpoonup} \mu_t} \frac{(\mathcal J(\mu_t) - \mathcal J(\tilde\mu))^+}{W_2(\mu_t,\tilde\mu)} \geq \limsup_{(\tilde c,\tilde u) \to (c_t,u_t)} \frac{(\mathcal J(\delta_{(c_t,u_t)}) - \mathcal J(\delta_{(\tilde c,\tilde u)}))^+}{W_2(\delta_{(c_t,u_t)}, \delta_{(\tilde c,\tilde u)})} = \limsup_{(\tilde c,\tilde u) \to (c_t,u_t)} \frac{(J_1(c_t,u_t) - J_1(\tilde c,\tilde u))^+}{d_\Omega((c_t,u_t),(\tilde c,\tilde u))} = |\partial J_1|(c_t,u_t), $$
i.e., the inequality $|\partial\mathcal J|(\mu_t) \geq |\partial J_1|(c_t,u_t)$ is a direct consequence of the definitions. For the reverse inequality, let $\nu \in \mathcal P_2(\Omega_L)$. By (4.15), and since $R(u) = 1$, we can estimate
$$ \mathcal J(\mu_t) - \mathcal J(\nu) = J_1(c_t,u_t) - \int_{\Omega_L} J_1(c,u) \, d\nu + \int_{\Omega_L} J_1(c,u) \, d\nu - \mathcal F\Big( \int_{\Omega_L} c^2 K u \, d\nu \Big) - \int_{\Omega_L} c^2 \, d\nu \leq \int_{\Omega_L} (J_1(c_t,u_t) - J_1(c,u))^+ \, d\nu + \Big| \int_{\Omega_L} \mathcal F(c^2 K u) \, d\nu - \mathcal F\Big( \int_{\Omega_L} c^2 K u \, d\nu \Big) \Big|. \qquad (5.6) $$
Since
$$ |\partial J_1|(c_t,u_t) = \limsup_{(c,u)\to(c_t,u_t)} \frac{(J_1(c_t,u_t) - J_1(c,u))^+}{d_\Omega((c_t,u_t),(c,u))}, $$
for every $\varepsilon > 0$ there exists $\delta > 0$ such that if $d_\Omega((c_t,u_t),(c,u)) \leq \delta$, then
$$ \frac{(J_1(c_t,u_t) - J_1(c,u))^+}{d_\Omega((c_t,u_t),(c,u))} \leq |\partial J_1|(c_t,u_t) + \varepsilon. $$
Note also that, thanks to [4, Theorem 5.3.1] and Jensen's inequality, for $\pi \in \Gamma_0(\nu,\mu_t)$ (cf.
the notation after (4.9)),
$$ W_2(\nu,\mu_t) = \Big( \int_{\Omega_L} \int_{\Omega_L} d_\Omega^2((c,u),(\tilde c,\tilde u)) \, d\pi((c,u),(\tilde c,\tilde u)) \Big)^{1/2} = \Big( \int_{\Omega_L} d_\Omega^2((c_t,u_t),(c,u)) \, d\nu(c,u) \Big)^{1/2} \geq \int_{\Omega_L} d_\Omega((c_t,u_t),(c,u)) \, d\nu(c,u). $$
Therefore, denoting for brevity
$$ B_{\delta,t} := B_\delta((c_t,u_t)), \qquad (5.7) $$
the latter denoting the $\delta$-ball in $\Omega_L$ centered at $(c_t,u_t)$, we can easily estimate
$$ \frac{\int_{B_{\delta,t}} d_\Omega((c_t,u_t),(c,u)) \, d\nu}{W_2(\nu,\mu_t)} \leq \frac{\int_{\Omega_L} d_\Omega((c_t,u_t),(c,u)) \, d\nu}{W_2(\nu,\mu_t)} \leq 1. \qquad (5.8) $$
By using (5.8) we can therefore decompose
$$ \int_{\Omega_L} \frac{(J_1(c_t,u_t) - J_1(c,u))^+}{W_2(\nu,\mu_t)} \, d\nu = \int_{B_{\delta,t}} \frac{(J_1(c_t,u_t) - J_1(c,u))^+}{W_2(\nu,\mu_t)} \, d\nu + \int_{\Omega_L \setminus B_{\delta,t}} \frac{(J_1(c_t,u_t) - J_1(c,u))^+}{W_2(\nu,\mu_t)} \, d\nu $$
$$ \leq (|\partial J_1|(c_t,u_t) + \varepsilon) \frac{\int_{B_{\delta,t}} d_\Omega((c_t,u_t),(c,u)) \, d\nu}{W_2(\nu,\mu_t)} + \int_{\Omega_L \setminus B_{\delta,t}} \frac{(J_1(c_t,u_t) - J_1(c,u))^+}{W_2(\nu,\mu_t)} \, d\nu \leq |\partial J_1|(c_t,u_t) + \varepsilon + 2\Big( \sup_{(c,u)\in\Omega_L} J_1(c,u) \Big) \frac{\nu(\Omega_L \setminus B_{\delta,t})}{W_2(\nu,\mu_t)}. \qquad (5.9) $$
In addition, for every $\delta > 0$ it holds that
$$ \lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{\nu(\Omega_L \setminus B_{\delta,t})}{W_2(\nu,\mu_t)} = 0. \qquad (5.10) $$
Indeed,
$$ W_2^2(\nu,\mu_t) \geq \int_{\Omega_L \setminus B_{\delta,t}} d_\Omega^2((c_t,u_t),(c,u)) \, d\nu \geq \delta^2 \nu(\Omega_L \setminus B_{\delta,t}) \implies \nu(\Omega_L \setminus B_{\delta,t}) \leq \frac{W_2^2(\nu,\mu_t)}{\delta^2}, \qquad (5.11) $$
from which (5.10) follows. Therefore, if $(\nu_k)_{k\in\mathbb N} \subset \mathcal P_2(\Omega_L)$, with $\nu_k \overset{*}{\rightharpoonup} \mu_t$ as $k \to \infty$, is a sequence attaining the limsup in the definition of $|\partial\mathcal J|(\mu_t)$, we can combine (5.6), (5.5), (5.9) and (5.10) to obtain
$$ |\partial\mathcal J|(\mu_t) = \limsup_{k\to\infty} \frac{(\mathcal J(\mu_t) - \mathcal J(\nu_k))^+}{W_2(\nu_k,\mu_t)} \leq |\partial J_1|(c_t,u_t) + \varepsilon + 2\Big( \sup_{(c,u)\in\Omega_L} J_1(c,u) \Big) \limsup_{k\to\infty} \frac{\nu_k(\Omega_L \setminus B_{\delta,t})}{W_2(\nu_k,\mu_t)} \leq |\partial J_1|(c_t,u_t) + \varepsilon, $$
and since $\varepsilon$ is arbitrary, this concludes the proof of the inequality $|\partial\mathcal J|(\mu_t) \leq |\partial J_1|(c_t,u_t)$, and thus of the proposition.
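The Chebyshev-type bound (5.11), which controls how much mass an approximating measure can leave outside a $\delta$-ball around the particle, admits a quick numeric check. The sketch below is illustrative only: it takes $\mu_t$ to be a Dirac on the real line, so that the coupling with any $\nu$ is unique and $W_2^2(\nu,\mu_t)$ is simply the second moment of $\nu$ around the particle.

```python
x0 = 0.0                                   # the single particle: mu_t = delta_{x0}
atoms = [0.01, -0.02, 0.015, 0.5, -0.7]    # nu = uniform empirical measure
w = 1.0 / len(atoms)

# against a Dirac mass the transport plan is unique, so W2^2(nu, mu_t)
# equals the second moment of nu around x0:
w2_sq = sum(w * (a - x0) ** 2 for a in atoms)

for delta in [0.1, 0.3, 0.6]:
    # nu(Omega \ B_delta): mass at distance >= delta from the particle
    tail = sum(w for a in atoms if abs(a - x0) >= delta)
    # the bound (5.11): nu(Omega \ B_delta) <= W2^2(nu, mu_t) / delta^2
    assert tail <= w2_sq / delta ** 2 + 1e-12
```

Since $W_2(\nu,\mu_t) \to 0$ along a weak* converging sequence, the bound forces the tail-mass ratio in (5.10) to vanish.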
5.2 The many-particle case

In this subsection we generalize Proposition 5.4 to $n$ particles, hence dealing now with the case $(\mathbf c_t,\mathbf u_t) \in \Omega_L^n$. To this end, one needs to generalize known facts about semi-discrete optimal transport to the setting of metric spaces, which for the convenience of the reader we defer to Subsection 5.3. Throughout this section, we will always assume that the particles are distinct (see Remark 5.6) and we take $\delta > 0$ small enough such that
$$ \text{the balls } B_\delta((c_t^j,u_t^j)) \text{ are pairwise disjoint}. \qquad (5.12) $$
For brevity, with a slight abuse of notation in this subsection, for fixed $(\mathbf c_t,\mathbf u_t) \in \Omega_L^n$, we will also write
(i) $\mathbf y_t := (\mathbf c_t,\mathbf u_t) \in \Omega_L^n$, and $y_t^j := (c_t^j,u_t^j) \in \Omega_L$ for $j \in \{1,\dots,n\}$,
(ii) $\mathbf x := (\mathbf c,\mathbf u) \in \Omega_L^n$, and $x := (c,u) \in \Omega_L$,
(iii) $Y := \{y_t^1,\dots,y_t^n\} = \{(c_t^1,u_t^1),\dots,(c_t^n,u_t^n)\} \subset \Omega_L$,
(iv) $B_\delta(y_t^j) := B_\delta((c_t^j,u_t^j)) \subset \Omega_L$, and $B_\delta(\mathbf y_t) := B_\delta((\mathbf c_t,\mathbf u_t)) \subset \Omega_L^n$, (5.13)
where in the last shorthand notation we actually intend
$$ B_\delta(\mathbf y_t) := B_\delta(\mathbf c_t,\mathbf u_t) := B_\delta(c_t^1,u_t^1) \times \cdots \times B_\delta(c_t^n,u_t^n). \qquad (5.14) $$
To show Theorem 5.2, we first need to establish some auxiliary lemmata. For this purpose, we recall once again the definition of the measure $\mu_t$ in (5.1).

Lemma 5.5 (Mass correction). Suppose that (A1)-(A8) hold. With the notation of (5.13), let $\delta > 0$ be such that (5.12) holds. Then,
$$ \limsup_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{J_n(\mathbf c_t,\mathbf u_t) - \mathcal J(\nu)}{W_2(\nu,\mu_t)} = \limsup_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{J_n(\mathbf c_t,\mathbf u_t) - \mathcal J(\tilde\nu)}{W_2(\nu,\mu_t)}, \qquad (5.15) $$
with $\tilde\nu$ being defined as
$$ \tilde\nu := \frac{1}{n} \sum_{j=1}^n \frac{\nu|_{B_\delta(y_t^j)}}{\nu(B_\delta(y_t^j))}. \qquad (5.16) $$
Remark 5.6. Before giving the proof of Lemma 5.5, we mention that without loss of generality we additionally assume the particles to be distinct, i.e., $\delta_{y^i} \neq \delta_{y^j}$ for $i \neq j$.
In case this does not hold, all of the statements still follow, but one needs to account for multiplicities. For instance, with approximating measures $\nu$ as in (5.15)-(5.16), we have
$$ \nu|_{B_\delta(y_t^j)} \overset{*}{\rightharpoonup} \frac{k_j}{n} \delta_{y_t^j}, \quad \text{where } k_j := \#\{ i : y_t^i = y_t^j \}. $$
In view of the definitions of the Laguerre cells and the corresponding dual maximizers in Subsection 5.3, we would then have for all optimal couplings $\pi \in \Gamma_0(\nu,\mu_t)$ that $\frac{k_j}{n} = \pi(A_j \times \{y_t^j\})$, allowing one to directly generalize the proofs.

Proof. It is enough to show that
$$ \lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{|\mathcal J(\nu) - \mathcal J(\tilde\nu)|}{W_2(\nu,\mu_t)} = 0. \qquad (5.17) $$
For this purpose, set $\nu_R := \nu - \tilde\nu$. By the definition of $\mathcal J$ in (4.15), (5.16), the triangle inequality and (A2), we have
$$ |\mathcal J(\nu) - \mathcal J(\tilde\nu)| \leq \Big| \mathcal F\Big( \int_{\Omega_L} c^2 Ku \, d(\tilde\nu + \nu_R) \Big) - \mathcal F\Big( \int_{\Omega_L} c^2 Ku \, d\tilde\nu \Big) \Big| + \Big| \int_{\Omega_L} c^2 \, d(\tilde\nu + \nu_R) - \int_{\Omega_L} c^2 \, d\tilde\nu \Big| $$
$$ \leq \sup_{y \in A_\nu} \|\nabla\mathcal F(y)\| \cdot \Big| \int_{\Omega_L} c^2 Ku \, d(\tilde\nu + \nu_R) - \int_{\Omega_L} c^2 Ku \, d\tilde\nu \Big| + \Big| \int_{\Omega_L} c^2 \, d\nu_R \Big| \leq \Big( \sup_{y \in A_\nu} \|\nabla\mathcal F(y)\| \cdot \sup_{(c,u)\in\Omega_L} \|c^2 Ku\|_Y + L^2 \Big) |\nu_R|(\Omega_L), $$
where $A_\nu \subset Y$ is a compact set containing both $\int_{\Omega_L} c^2 Ku \, d\nu$ and $\int_{\Omega_L} c^2 Ku \, d\tilde\nu$ for all $\nu$ in the fixed sequence $\nu \overset{*}{\rightharpoonup} \mu_t$, so that $\mathcal F$ is Lipschitz on it, since it is globally twice Fréchet differentiable. Note that $A_\nu$ is compact since the map $\mu \mapsto \int_{\Omega_L} c^2 Ku \, d\mu$ is continuous and $K$ is weak*-to-strong continuous, see (A5). In view of the above chain of inequalities, in order to show (5.17) we are only left with verifying that
$$ \lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{|\nu_R|(\Omega_L)}{W_2(\nu,\mu_t)} = 0. \qquad (5.18) $$
For this, notice first that by the definition of $\tilde\nu$ in (5.16) and (5.12), setting
$$ B_\delta := \bigcup_{j=1}^n B_\delta(y_t^j) \quad \text{and} \quad C_\delta := \Omega_L \setminus B_\delta, \qquad (5.19) $$
we can further decompose
$$ \nu_R = \nu|_{C_\delta} + (\nu|_{B_\delta} - \tilde\nu) = \nu|_{C_\delta} + \sum_{j=1}^n \nu|_{B_\delta(y_t^j)} \cdot \Big( 1 - \frac{\nu(B_\delta(y_t^j))^{-1}}{n} \Big). \qquad (5.20) $$
As $d_\Omega^2(z, y_t^j) \geq \delta^2$ for all $y_t^j \in Y$ and $z \in C_\delta$ (cf.
(5.13)), for an optimal coupling $\pi \in \Gamma_0(\nu,\mu_t)$ we estimate
$$ \nu(C_\delta) = \pi(C_\delta \times Y) \leq \frac{1}{\delta^2} \int_{C_\delta \times Y} d_\Omega^2(z,y) \, d\pi(z,y) \leq \frac{W_2^2(\nu,\mu_t)}{\delta^2}, \quad \text{so that} \quad \lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{\nu(C_\delta)}{W_2(\nu,\mu_t)} = 0. \qquad (5.21) $$
Therefore, in view of (5.20), for the verification of (5.18) one only has to consider
$$ \nu_R' := \sum_{j=1}^n \nu|_{B_\delta(y_t^j)} \cdot \Big( 1 - \frac{\nu(B_\delta(y_t^j))^{-1}}{n} \Big). $$
As
$$ |\nu_R'|(\Omega_L) = \sum_{j=1}^n \nu(B_\delta(y_t^j)) \cdot \Big| 1 - \frac{\nu(B_\delta(y_t^j))^{-1}}{n} \Big| = \sum_{j=1}^n \Big| \nu(B_\delta(y_t^j)) - \frac{1}{n} \Big|, $$
it is sufficient to show that for every fixed $j \in \{1,\dots,n\}$,
$$ \lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{\big| \nu(B_\delta(y_t^j)) - \frac{1}{n} \big|}{W_2(\nu,\mu_t)} = 0. \qquad (5.22) $$
To this end, take again an optimal coupling $\pi \in \Gamma_0(\nu,\mu_t)$. Then, for $Y_j := Y \setminus \{y_t^j\}$, by Lemma 5.11 one has
$$ \nu(B_\delta(y_t^j)) = \nu(B_\delta(y_t^j) \cap A_j) + \nu(B_\delta(y_t^j) \cap A_j^c) = \pi((B_\delta(y_t^j) \cap A_j) \times \{y_t^j\}) + \pi((B_\delta(y_t^j) \cap A_j^c) \times Y_j). $$
By (5.1) and Lemma 5.11, we have that $\frac{1}{n} = \pi(A_j \times \{y_t^j\})$, hence
$$ \Big| \nu(B_\delta(y_t^j)) - \frac{1}{n} \Big| = \big| \pi((B_\delta^c(y_t^j) \cap A_j) \times \{y_t^j\}) - \pi((B_\delta(y_t^j) \cap A_j^c) \times Y_j) \big| \leq \int_{(B_\delta^c(y_t^j) \cap A_j) \times \{y_t^j\}} 1 \, d\pi + \int_{(B_\delta(y_t^j) \cap A_j^c) \times Y_j} 1 \, d\pi $$
$$ \leq \frac{1}{\delta^2} \int_{(B_\delta^c(y_t^j) \cap A_j) \times \{y_t^j\}} d_\Omega^2(x,y) \, d\pi(x,y) + \frac{1}{\delta^2} \int_{(B_\delta(y_t^j) \cap A_j^c) \times Y_j} d_\Omega^2(x,y) \, d\pi(x,y) \leq \frac{W_2^2(\nu,\mu_t)}{\delta^2} + \sum_{i \neq j} \frac{1}{\delta^2} \int_{(B_\delta(y_t^j) \cap A_j^c) \times \{y_t^i\}} d_\Omega^2(x,y) \, d\pi(x,y) \leq C \frac{W_2^2(\nu,\mu_t)}{\delta^2}, $$
where we have used that for all $i \neq j$, $B_\delta(y_t^j) \subset B_\delta^c(y_t^i)$, cf. (5.12). The above inequality implies (5.22), which together with (5.21) implies the desired convergence (5.18).

From now on we set
$$ \nu_j := \frac{\nu|_{B_\delta(y_t^j)}}{\nu(B_\delta(y_t^j))}, \qquad \tilde\nu := \frac{1}{n} \sum_{j=1}^n \nu_j, \qquad \nu^n := \nu_1 \otimes \cdots \otimes \nu_n, \qquad (5.23) $$
so that by (5.16), $\nu = \frac{1}{n} \sum_{j=1}^n \nu_j + \nu_R$.
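The proof above splits the mass of $\nu$ along the Laguerre cells $A_j$ from Subsection 5.3. Numerically, the dual maximizer $\psi$ defining these cells can be found by gradient ascent on the concave dual problem (5.37), whose partial derivative in $\psi_j$ is $\alpha_j - \nu(A_j)$. The 1D sketch below is illustrative only: $\nu$ is discretized as a uniform measure on a grid of $[0,1]$, and the step size and iteration count are ad hoc choices, not tuned parameters from the paper.

```python
# grid discretization of nu = uniform measure on [0, 1]
N = 400
xs = [(i + 0.5) / N for i in range(N)]

ys = [0.1, 0.5, 0.8]        # atoms y_j of the discrete measure mu
alpha = [0.2, 0.5, 0.3]     # weights alpha_j, summing to 1
psi = [0.0] * len(ys)       # dual variables psi_j of (5.37)

def cell_of(x, psi):
    """Index j of the Laguerre cell A_j containing x, cf. (5.38)(i)."""
    return min(range(len(ys)), key=lambda j: (x - ys[j]) ** 2 - psi[j])

def cell_masses(psi):
    """nu(A_j) for each j, computed on the grid."""
    mass = [0.0] * len(ys)
    for x in xs:
        mass[cell_of(x, psi)] += 1.0 / N
    return mass

# gradient ascent on the concave dual (5.37): at the maximizer each
# Laguerre cell carries exactly the mass alpha_j of its atom.
for _ in range(500):
    mass = cell_masses(psi)
    psi = [p + 0.2 * (a - m) for p, a, m in zip(psi, alpha, mass)]

assert all(abs(m - a) <= 0.01 for m, a in zip(cell_masses(psi), alpha))
```

This is the semi-discrete analogue of the identity $\pi(A_j \times \{y_t^j\}) = \alpha_j$ used in the estimate above (with $\alpha_j = 1/n$ there).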
Next, we also need an estimate on $\int_{B_\delta(\mathbf y_t)} d_n(\mathbf x, \mathbf y_t) \, d\nu^n(\mathbf x)$, where we recall the notation in (5.13). Note that we also used a similar argument for the one-particle case in Proposition 5.4; however, since in that case there is only one Laguerre cell, the following estimate was trivial for $n = 1$.

Lemma 5.7 (Quantitative estimate). Suppose that (A1)-(A8) hold, and let $\delta > 0$ be small enough such that also (5.12) holds. Then,
$$ \int_{B_\delta(\mathbf y_t)} \frac{d_n(\mathbf x, \mathbf y_t)}{W_2(\nu,\mu_t)} \, d\nu^n(\mathbf x) \leq \alpha_\nu, \quad \text{with} \quad \lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \alpha_\nu = 1. \qquad (5.24) $$
Proof. First of all, note that by (5.14) and (5.23) one has $\nu^n(B_\delta(\mathbf y_t)) = 1$. Then, by Jensen's inequality, the choice of $\delta > 0$, (3.2), the fact that $\nu_j(B_\delta(y_t^j)) = 1$ for all $j \in \{1,\dots,n\}$, and for $\pi \in \Gamma_0(\nu,\mu_t)$, we estimate
$$ \Big( \int_{B_\delta(\mathbf y_t)} d_n(\mathbf x, \mathbf y_t) \, d\nu^n(\mathbf x) \Big)^2 \leq \int_{B_\delta(\mathbf y_t)} d_n^2(\mathbf x, \mathbf y_t) \, d\nu^n(\mathbf x) = \int_{\times_{i=1}^n B_\delta(y_t^i)} \frac{1}{n} \sum_{j=1}^n d_\Omega^2(x_j, y_t^j) \, d\nu_1(x_1) \dots d\nu_n(x_n) $$
$$ = \frac{1}{n} \sum_{j=1}^n \int_{B_\delta(y_t^j)} d_\Omega^2(x, y_t^j) \, d\nu_j(x) \Big[ \prod_{i\neq j} \int_{B_\delta(y_t^i)} d\nu_i(x_i) \Big] = \frac{1}{n} \sum_{j=1}^n \frac{1}{\nu(B_\delta(y_t^j))} \int_{B_\delta(y_t^j)} d_\Omega^2(x, y_t^j) \, d\nu(x) \leq \frac{1}{n \cdot \min_{i\in\{1,\dots,n\}} \nu(B_\delta(y_t^i))} \sum_{j=1}^n \int_{B_\delta(y_t^j) \times Y} d_\Omega^2(x, y_t^j) \, d\pi(x,\tilde y). $$
Now for every $j \in \{1,\dots,n\}$ and every $x \in B_\delta(y_t^j)$, by assumption it holds that $d_\Omega^2(x, y_t^j) \leq d_\Omega^2(x, \tilde y)$ for all $\tilde y \in Y$, and thus
$$ \int_{B_\delta(y_t^j) \times Y} d_\Omega^2(x, y_t^j) \, d\pi(x,\tilde y) \leq \int_{B_\delta(y_t^j) \times Y} d_\Omega^2(x, \tilde y) \, d\pi(x,\tilde y). $$
Therefore, setting
$$ \alpha_\nu := \frac{1}{n \cdot \min_{i\in\{1,\dots,n\}} \nu(B_\delta(y_t^i))}, $$
and recalling the notation in (5.19), the previous estimates imply that
$$ \Big( \int_{B_\delta(\mathbf y_t)} d_n(\mathbf x, \mathbf y_t) \, d\nu^n(\mathbf x) \Big)^2 \leq \alpha_\nu \int_{B_\delta \times Y} d_\Omega^2(x,y) \, d\pi(x,y), $$
and since $\pi \in \Gamma_0(\nu,\mu_t)$, we arrive at
$$ \Big( \int_{B_\delta(\mathbf y_t)} d_n(\mathbf x, \mathbf y_t) \, d\nu^n(\mathbf x) \Big)^2 \leq \alpha_\nu W_2^2(\nu,\mu_t), $$
and $\alpha_\nu \to 1$ as $\nu \overset{*}{\rightharpoonup} \mu_t$, which concludes the proof of the lemma.

Lastly, we need to generalize Lemma 5.3, with the proof also proceeding similarly to Section A.2.

Lemma 5.8. In the setting of Lemma 5.7 and recalling (5.23), again for $\mathcal L^1$-a.e. $t \in [0,1]$ we have that
$$ \lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{\big| \int_{\Omega_L^n} J_n(\mathbf c,\mathbf u) \, d\nu^n - \mathcal J(\tilde\nu) \big|}{W_2(\nu,\mu_t)} = 0. \qquad (5.25) $$
Proof. First, recalling (3.5), (4.15) and (A6)-(A7), one has that
$$ \int_{\Omega_L^n} R\Big( \frac{1}{n} \sum_{j=1}^n (c^j)^2 u^j \Big) d\nu^n(\mathbf c,\mathbf u) = \frac{1}{n} \sum_{j=1}^n \int_{\Omega_L^n} (c^j)^2 \, d\nu^n(\mathbf c,\mathbf u) = \frac{1}{n} \sum_{j=1}^n \int_{\Omega_L} c^2 \, d\nu_j(c,u) = \int_{\Omega_L} c^2 \, d\tilde\nu(c,u). $$
Therefore,
$$ \int_{\Omega_L^n} J_n(\mathbf c,\mathbf u) \, d\nu^n(\mathbf c,\mathbf u) - \mathcal J(\tilde\nu) = \int_{\Omega_L^n} \mathcal F\Big( \frac{1}{n} \sum_{j=1}^n (c^j)^2 K u^j \Big) d\nu^n(\mathbf c,\mathbf u) - \mathcal F\Big( \int_{\Omega_L} c^2 K u \, d\tilde\nu(c,u) \Big) $$
$$ = \underbrace{\int_{\Omega_L^n} \mathcal F\Big( \frac{1}{n} \sum_{j=1}^n (c^j)^2 K u^j \Big) d\nu^n - \mathcal F\Big( \frac{1}{n} \sum_{j=1}^n (c_t^j)^2 K u_t^j \Big)}_{=: I_{\mathcal F}(\nu,\mu_t)} + \underbrace{\mathcal F\Big( \frac{1}{n} \sum_{j=1}^n (c_t^j)^2 K u_t^j \Big) - \mathcal F\Big( \int_{\Omega_L} c^2 K u \, d\tilde\nu \Big)}_{=: II_{\mathcal F}(\nu,\mu_t)}.
(5.26)

Then, since $\mathcal F$ is (twice) Fréchet differentiable and $\nu^n$ is a probability measure, it follows that
$$ I_{\mathcal F}(\nu,\mu_t) = \underbrace{\int_{\Omega_L^n} \Big( \nabla\mathcal F\Big( \frac{1}{n} \sum_{j=1}^n (c_t^j)^2 K u_t^j \Big), \frac{1}{n} \sum_{j=1}^n \big( (c^j)^2 K u^j - (c_t^j)^2 K u_t^j \big) \Big)_Y d\nu^n}_{=: I^1_{\mathcal F}(\nu,\mu_t)} + \int_{\Omega_L^n} g_I\Big( \frac{1}{n} \sum_{j=1}^n \big( (c^j)^2 K u^j - (c_t^j)^2 K u_t^j \big) \Big) d\nu^n, \qquad (5.27) $$
and
$$ II_{\mathcal F}(\nu,\mu_t) = \underbrace{\Big( \nabla\mathcal F\Big( \int_{\Omega_L} c^2 K u \, d\tilde\nu \Big), \frac{1}{n} \sum_{j=1}^n (c_t^j)^2 K u_t^j - \int_{\Omega_L} c^2 K u \, d\tilde\nu \Big)_Y}_{=: II^1_{\mathcal F}(\nu,\mu_t)} + g_{II}\Big( \frac{1}{n} \sum_{j=1}^n (c_t^j)^2 K u_t^j - \int_{\Omega_L} c^2 K u \, d\tilde\nu \Big), \qquad (5.28) $$
for functions $g_I$ and $g_{II}$ satisfying the same limiting behaviour at $0$ as in (A.12). Now, by linearity and (5.23),
$$ I^1_{\mathcal F}(\nu,\mu_t) = \Big( \nabla\mathcal F\Big( \frac{1}{n} \sum_{j=1}^n (c_t^j)^2 K u_t^j \Big), \int_{\Omega_L^n} \frac{1}{n} \sum_{j=1}^n \big( (c^j)^2 K u^j - (c_t^j)^2 K u_t^j \big) d\nu^n \Big)_Y = \Big( \nabla\mathcal F\Big( \frac{1}{n} \sum_{j=1}^n (c_t^j)^2 K u_t^j \Big), \frac{1}{n} \sum_{j=1}^n \int_{\Omega_L} \big( c^2 K u - (c_t^j)^2 K u_t^j \big) d\nu_j \Big)_Y, $$
and similarly,
$$ II^1_{\mathcal F}(\nu,\mu_t) = \Big( \nabla\mathcal F\Big( \int_{\Omega_L} c^2 K u \, d\tilde\nu \Big), \frac{1}{n} \sum_{j=1}^n \int_{\Omega_L} \big( (c_t^j)^2 K u_t^j - c^2 K u \big) d\nu_j \Big)_Y. $$
Note that by (A8), recalling Remark 3.12, and by an application of Lemma 5.7, it holds that
$$ \frac{1}{n} \sum_{j=1}^n \int_{\Omega_L} \big\| (c_t^j)^2 K u_t^j - c^2 K u \big\|_Y \, d\nu_j \leq \frac{C}{n} \sum_{j=1}^n \int_{\Omega_L} d_\Omega((c,u),(c_t^j,u_t^j)) \, d\nu_j \leq C \int_{B_\delta(\mathbf c_t,\mathbf u_t)} d_n((\mathbf c,\mathbf u),(\mathbf c_t,\mathbf u_t)) \, d\nu^n(\mathbf c,\mathbf u) \leq C \alpha_\nu W_2(\nu,\mu_t). \qquad (5.29) $$
Therefore, since $\mathcal F$ is twice differentiable, and by (5.29), we obtain
$$ I^1_{\mathcal F}(\nu,\mu_t) + II^1_{\mathcal F}(\nu,\mu_t) = \Big( \nabla\mathcal F\Big( \frac{1}{n} \sum_{j=1}^n (c_t^j)^2 K u_t^j \Big) - \nabla\mathcal F\Big( \int_{\Omega_L} c^2 K u \, d\tilde\nu \Big), \frac{1}{n} \sum_{j=1}^n \int_{\Omega_L} \big( c^2 K u - (c_t^j)^2 K u_t^j \big) d\nu_j \Big)_Y $$
$$ \leq \|\mathcal F\|_{C^2} \Big\| \frac{1}{n} \sum_{j=1}^n (c_t^j)^2 K u_t^j - \int_{\Omega_L} c^2 K u \, d\tilde\nu \Big\|_Y \cdot \Big\| \frac{1}{n} \sum_{j=1}^n \int_{\Omega_L} \big( (c_t^j)^2 K u_t^j - c^2 K u \big) d\nu_j \Big\|_Y = \|\mathcal F\|_{C^2} \Big\| \frac{1}{n} \sum_{j=1}^n \int_{\Omega_L} \big( (c_t^j)^2 K u_t^j - c^2 K u \big) d\nu_j \Big\|_Y^2 \leq C_{\mathcal F}\, \alpha_\nu^2 W_2^2(\nu,\mu_t).
So again, we only have to deal with the remainder terms, for which, as in (A.13), we claim first that
$$ \lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{\big| \int_{\Omega_L^n} g_I\big( \frac{1}{n} \sum_{j=1}^n ( (c^j)^2 K u^j - (c_t^j)^2 K u_t^j ) \big) d\nu^n \big|}{W_2(\nu,\mu_t)} = 0. \qquad (5.30) $$
Indeed, by definition and the continuity of $(\mathbf c,\mathbf u) \mapsto \frac{1}{n} \sum_{j=1}^n (c^j)^2 K u^j$, for every $\varepsilon > 0$ there exists a $\delta > 0$ such that if $(\mathbf c,\mathbf u) \in B_\delta(\mathbf c_t,\mathbf u_t) \subset \Omega_L^n$, then
$$ \Big| g_I\Big( \frac{1}{n} \sum_{j=1}^n \big( (c^j)^2 K u^j - (c_t^j)^2 K u_t^j \big) \Big) \Big| \leq \varepsilon \Big\| \frac{1}{n} \sum_{j=1}^n \big( (c^j)^2 K u^j - (c_t^j)^2 K u_t^j \big) \Big\|_Y \leq \frac{\varepsilon}{n} \sum_{j=1}^n \big\| (c^j)^2 K u^j - (c_t^j)^2 K u_t^j \big\|_Y. $$
Then, using further (A8) and (5.29),
$$ \Big| \int_{\Omega_L^n} g_I\Big( \frac{1}{n} \sum_{j=1}^n \big( (c^j)^2 K u^j - (c_t^j)^2 K u_t^j \big) \Big) d\nu^n \Big| \leq \int_{B_\delta(\mathbf c_t,\mathbf u_t)} \Big| g_I\Big( \frac{1}{n} \sum_{j=1}^n \big( (c^j)^2 K u^j - (c_t^j)^2 K u_t^j \big) \Big) \Big| d\nu^n \leq \frac{C\varepsilon}{n} \sum_{j=1}^n \int_{\Omega_L} d_\Omega((c,u),(c_t^j,u_t^j)) \, d\nu_j \leq C\varepsilon\, \alpha_\nu W_2(\nu,\mu_t). $$
Since $\lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \alpha_\nu = 1$, cf. (5.24), and $\varepsilon > 0$ was arbitrary, the last estimate directly implies (5.30). Lastly, we analogously show that
$$ \lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{\big| g_{II}\big( \frac{1}{n} \sum_{j=1}^n (c_t^j)^2 K u_t^j - \int_{\Omega_L} c^2 K u \, d\tilde\nu \big) \big|}{W_2(\nu,\mu_t)} = 0, \qquad (5.31) $$
which also follows by (5.29). Indeed, again, for every $\varepsilon > 0$, if $\nu$ is sufficiently close to $\mu_t$ in the weak*-topology, then
$$ \Big| g_{II}\Big( \frac{1}{n} \sum_{j=1}^n (c_t^j)^2 K u_t^j - \int_{\Omega_L} c^2 K u \, d\tilde\nu \Big) \Big| \leq \varepsilon \Big\| \frac{1}{n} \sum_{j=1}^n (c_t^j)^2 K u_t^j - \int_{\Omega_L} c^2 K u \, d\tilde\nu \Big\|_Y \leq C\varepsilon\, W_2(\nu,\mu_t), $$
which again, by the arbitrariness of $\varepsilon > 0$, yields (5.31) and concludes the proof of the lemma.

We are now ready to prove the equality of slopes, which implies that the lifting of our AGF is a curve of maximal slope in $\mathcal P_2(\Omega_L)$.

Theorem 5.2. Suppose that (A1)-(A8) hold, and let $n > 1$ and $\mu_t = \frac{1}{n} \sum_{j=1}^n \delta_{(c_t^j,u_t^j)}$.
For all $L > 0$ it holds that $|\partial\mathcal J|(\mu_t) = |\partial J_n|(c_t,u_t)$.

Proof. As in the proof of Proposition 5.4, the inequality $|\partial\mathcal J|(\mu_t) \geq |\partial J_n|(c_t,u_t)$ is a direct consequence of the definitions. For the reverse inequality, in the framework of the subsequent Subsection 5.3, let us choose a dual maximizer $\psi$ and thus Laguerre cells $(A_j)_{j\in\{1,\dots,n\}}$, cf. (5.34)-(5.39). By definition of the slope in (2.17), for every $\varepsilon > 0$ there exists $\delta' > 0$ such that for all $(\mathbf c,\mathbf u) \in \Omega_L^n$ with $d_n((\mathbf c_t,\mathbf u_t),(\mathbf c,\mathbf u)) \leq \delta'$, one has
$$ \frac{(J_n(\mathbf c_t,\mathbf u_t) - J_n(\mathbf c,\mathbf u))^+}{d_n((\mathbf c_t,\mathbf u_t),(\mathbf c,\mathbf u))} \leq |\partial J_n|(\mathbf c_t,\mathbf u_t) + \varepsilon. \qquad (5.32) $$
Let then $0 < \delta < \delta'$ be such that the balls $(B_\delta(c_t^j,u_t^j))_{j=1}^n$ are pairwise disjoint, as in (5.12), and
$$ B_\delta := \times_{j=1}^n B_\delta(c_t^j,u_t^j) \subset B_{\delta'}(\mathbf c_t,\mathbf u_t) \subset \Omega_L^n. $$
Let now $\nu \overset{*}{\rightharpoonup} \mu_t$ and define $\tilde\nu$ as in (5.16) and $\nu_R := \nu - \tilde\nu$. Recalling also the notation in (5.23), Lemma 5.5 in particular implies that
$$ \limsup_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{(J_n(\mathbf c_t,\mathbf u_t) - \mathcal J(\nu))^+}{W_2(\nu,\mu_t)} = \limsup_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{(J_n(\mathbf c_t,\mathbf u_t) - \mathcal J(\tilde\nu))^+}{W_2(\nu,\mu_t)}, \qquad (5.33) $$
so that it is enough to consider $\tilde\nu$ instead of $\nu$. By the triangle inequality we can simply estimate
$$ \big| J_n(\mathbf c_t,\mathbf u_t) - \mathcal J(\tilde\nu) \big| = \Big| \int_{\Omega_L^n} J_n(\mathbf c_t,\mathbf u_t) \, d\nu^n(\mathbf c,\mathbf u) - \mathcal J(\tilde\nu) \Big| \leq \Big| \int_{\Omega_L^n} \big( J_n(\mathbf c_t,\mathbf u_t) - J_n(\mathbf c,\mathbf u) \big) d\nu^n(\mathbf c,\mathbf u) \Big| + \Big| \int_{\Omega_L^n} J_n(\mathbf c,\mathbf u) \, d\nu^n(\mathbf c,\mathbf u) - \mathcal J(\tilde\nu) \Big| $$
$$ = \Big| \int_{B_\delta(\mathbf c_t,\mathbf u_t)} \big( J_n(\mathbf c_t,\mathbf u_t) - J_n(\mathbf c,\mathbf u) \big) d\nu^n(\mathbf c,\mathbf u) \Big| + \Big| \int_{\Omega_L^n} J_n(\mathbf c,\mathbf u) \, d\nu^n(\mathbf c,\mathbf u) - \mathcal J(\tilde\nu) \Big|. $$
By Lemma 5.8, it now suffices to consider the first term on the right hand side of the above line for the slopes.
Then, combining (5.32), (5.33), (5.25), and Lemma 5.7 gives
$$ |\partial\mathcal J|(\mu_t) = \limsup_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{(\mathcal J(\mu_t) - \mathcal J(\nu))^+}{W_2(\nu,\mu_t)} \leq \limsup_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{\int_{B_\delta(\mathbf c_t,\mathbf u_t)} (J_n(\mathbf c_t,\mathbf u_t) - J_n(\mathbf c,\mathbf u))^+ \, d\nu^n(\mathbf c,\mathbf u)}{W_2(\nu,\mu_t)} $$
$$ \leq \limsup_{\nu \overset{*}{\rightharpoonup} \mu_t} \big( |\partial J_n|(\mathbf c_t,\mathbf u_t) + \varepsilon \big) \int_{B_\delta(\mathbf c_t,\mathbf u_t)} \frac{d_n((\mathbf c_t,\mathbf u_t),(\mathbf c,\mathbf u))}{W_2(\nu,\mu_t)} \, d\nu^n(\mathbf c,\mathbf u) \leq \big( |\partial J_n|(\mathbf c_t,\mathbf u_t) + \varepsilon \big) \limsup_{\nu \overset{*}{\rightharpoonup} \mu_t} \alpha_\nu = |\partial J_n|(\mathbf c_t,\mathbf u_t) + \varepsilon, $$
and as this holds for all $\varepsilon > 0$, also the inequality $|\partial\mathcal J|(\mu_t) \leq |\partial J_n|(c_t,u_t)$ follows.

5.3 Semi-discrete optimal transport on compact metric spaces

Here we recall some standard facts about semi-discrete optimal transport that can be found in [38] and in the classical treatise [5], or more recently in [30]. Afterwards, we use them to generalize the necessary statements to our setting of a general compact metric space. Let $(X,d)$ be a metric space, $\nu \in \mathcal P_2(X)$, and consider an atomic measure
$$ \mu := \sum_{j=1}^n \alpha_j \delta_{y_j}, \quad \text{with } \alpha_j > 0, \ \sum_{j=1}^n \alpha_j = 1, \text{ and } y_i \neq y_j \text{ for } i \neq j. \qquad (5.34) $$
Set also $Y := \{y_1,\dots,y_n\}$ and, for every bounded Lipschitz function $\psi : X \to \mathbb R$, let us set $\psi_j := \psi(y_j)$. Then the Kantorovich duality gives
$$ W_2^2(\nu,\mu) = \sup_{\varphi(x) + \psi(y) \leq d^2(x,y)} \int_X \varphi(x) \, d\nu(x) + \sum_{j=1}^n \alpha_j \psi_j. \qquad (5.35) $$
Thus, one can consider the above maximization problem to be posed between $X$ and the discrete space $Y$. Then by, e.g., [47, Theorem 5.10], the supremum above is attained at a couple $(\psi^c, \psi)$, where
$$ \psi^c(x) := \min_{j\in\{1,\dots,n\}} \big( d^2(x,y_j) - \psi_j \big). \qquad (5.36) $$
Hence,
$$ W_2^2(\nu,\mu) = \sup_{(\psi_1,\dots,\psi_n) \in \mathbb R^n} \int_X \psi^c(x) \, d\nu(x) + \sum_{j=1}^n \alpha_j \psi_j. \qquad (5.37) $$
This duality motivates defining the Laguerre cells $A_j$ and the tie sets $\Sigma_j$ as follows.

Definition 5.9 (Laguerre cells). Let $(X,d)$ be a metric space, $\nu \in \mathcal P_2(X)$, $\mu$ as in (5.34), and $\psi \in \mathbb R^n$ a dual maximizer of (5.37). For every j ∈ {1, ...
, n}, one defines the Laguerre cell $A_j$ and the tie sets (or boundaries of the Laguerre cells) $\Sigma_j$ as
(i) $A_j := \{ x \in X : d^2(x,y_j) - \psi_j \leq d^2(x,y_i) - \psi_i \text{ for } i \neq j \}$,
(ii) $\Sigma_j := \bigcup_{i\neq j} (A_j \cap A_i)$. (5.38)
Define also the interior of the Laguerre cells as $A_j \setminus \Sigma_j$, and the tie set $\Sigma$ as
$$ \Sigma := \bigcup_{j=1}^n \Sigma_j. \qquad (5.39) $$
A simple statement about the interior of the Laguerre cells follows directly from the definition.

Lemma 5.10. For every $j \in \{1,\dots,n\}$, it holds that
$$ A_j \setminus \Sigma_j = \{ x \in X : d^2(x,y_j) - \psi_j < d^2(x,y_i) - \psi_i \text{ for } i \neq j \}. $$
Thus, for every $i \in \{1,\dots,n\}$ and $x \in A_i$, it holds that
$$ \psi^c(x) = \min_{j\in\{1,\dots,n\}} \big( d^2(x,y_j) - \psi_j \big) = d^2(x,y_i) - \psi_i, \qquad (5.40) $$
and in the interior, if $x \in A_i \setminus \Sigma_i$, by Lemma 5.10,
$$ \psi^c(x) < d^2(x,y_j) - \psi_j \quad \text{for all } j \neq i. \qquad (5.41) $$
This can be combined with known facts on Kantorovich duality to derive the desired properties.

Lemma 5.11 (Properties of semi-discrete optimal couplings). In the above setting, consider an optimal coupling $\pi \in \Gamma_0(\nu,\mu)$. Then in fact $\pi \in \mathcal P_2(X \times Y)$, and moreover, for a choice of dual optimizer $\psi \in \mathbb R^n$ and Laguerre cells $(A_j)_{j\in\{1,\dots,n\}}$, it holds that
(i) $\operatorname{spt}\pi \cap (A_j \times Y) \subset A_j \times \{y_j\}$,
(ii) $\operatorname{spt}\pi \cap (A_j^c \times Y) \subset A_j^c \times (Y \setminus \{y_j\})$, (5.42)
and $\alpha_j = \pi(A_j \times \{y_j\})$.

Proof. First of all, note that cylindrical sets generate the Borel $\sigma$-algebra in $X \times X$, and for any such $A \times B \subset X \times X$ with $B \subset Y^c$, by (4.8) and (5.34) one has $0 \leq \pi(A \times B) \leq \pi(X \times B) = \mu(B) = 0$. Therefore, it is immediate that $\operatorname{spt}\pi \subset X \times Y$ and $\pi \in \mathcal P_2(X \times Y)$. For the second part, [47, Theorem 5.10] implies that
$$ \operatorname{spt}\pi \subset \{ (x,y) \in X \times Y : \psi^c(x) + \psi(y) = d^2(x,y) \}. \qquad (5.43) $$
Now, by definition of the Laguerre cells, for $j \in \{1,\dots,n\}$ and $x \in A_j^c$, there must exist an $i_0 \neq j$ such that $d^2(x,y_j) - \psi_j > d^2(x,y_{i_0}) - \psi_{i_0}$.
But then, by (5.41), also
\[
\psi^c(x) < d^2(x, y_j) - \psi_j \implies \psi^c(x) + \psi(y_j) < d^2(x, y_j) \quad \forall (x, y_j) \in A_j^c \times \{y_j\}.
\]
Therefore, by (5.43), $\pi(A_j^c \times \{y_j\}) = 0$, which implies (5.42), and moreover, by (5.34) we have
\[
\alpha_j = \mu(\{y_j\}) = \pi(X \times \{y_j\}) = \pi(A_j \times \{y_j\}) + \pi(A_j^c \times \{y_j\}) = \pi(A_j \times \{y_j\}),
\]
which concludes the proof.

In the next lemma we discuss properties of optimal couplings when restricted to the interior of the Laguerre cells.

Lemma 5.12. For $\pi \in \Gamma_0(\nu, \mu)$, $j \in \{1, \dots, n\}$, and $L_j := (A_j \setminus \Sigma_j) \times X$, it holds that $\pi|_{L_j} = \nu|_{A_j \setminus \Sigma_j} \otimes \delta_{y_j}$, so that
\[
\int_{L_j} d^2(x, y)\, d\pi(x, y) = \int_{A_j \setminus \Sigma_j} d^2(x, y_j)\, d\nu(x).
\]
Proof. For every $j \in \{1, \dots, n\}$, one has by Lemma 5.11 that
\[
L_j \cap \operatorname{spt} \pi \subset (A_j \setminus \Sigma_j) \times \{y_j\}. \tag{5.44}
\]
By disintegrating $\pi|_{L_j}$ with respect to $\nu|_{A_j \setminus \Sigma_j}$, we have that $\pi|_{L_j} = \nu|_{A_j \setminus \Sigma_j} \otimes \mu_x$, where $x \in A_j \setminus \Sigma_j$. From (5.44) it follows that
\[
\operatorname{spt} \mu_x \subset \{y_j\} \quad \text{for } \nu\text{-a.e. } x \in A_j \setminus \Sigma_j. \tag{5.45}
\]
Note now that for every Borel subset $D \subset A_j \setminus \Sigma_j$, we have
\[
\nu(D) = \pi(D \times X) = \int_D \mu_x(X)\, d\nu(x),
\]
implying that $\mu_x$ is a probability measure for $\nu$-a.e. $x \in A_j \setminus \Sigma_j$. Therefore, in view also of (5.45), it holds that $\mu_x = \delta_{y_j}$, concluding the proof.

Thanks to the previous lemma, we can give a structural statement about the Wasserstein distance between $\nu$ and $\mu$ using the semi-discrete theory, for which we also recall the notation in (5.39).

Proposition 5.13. Let $(X, d)$ be a metric space, $\mu$ as in (5.34), $\nu \in \mathcal{P}_2(X)$ and $\psi \in \mathbb{R}^n$ a corresponding dual maximizer. For an optimal coupling $\pi \in \Gamma_0(\nu, \mu)$, one has
\[
W_2^2(\nu, \mu) = \sum_{j=1}^{n} \int_{A_j \setminus \Sigma_j} d^2(x, y_j)\, d\nu(x) + \int_{\Sigma \times Y} d^2(x, y)\, d\pi(x, y).
\]
Proof.
The statement follows from Lemma 5.12, since, recalling (5.39),
\[
W_2^2(\nu, \mu) = \int_{X \times Y} d^2(x, y)\, d\pi(x, y) = \sum_{j=1}^{n} \int_{(A_j \setminus \Sigma_j) \times Y} d^2(x, y)\, d\pi(x, y) + \int_{\Sigma \times Y} d^2(x, y)\, d\pi(x, y)
\]
\[
= \sum_{j=1}^{n} \int_{A_j \setminus \Sigma_j} d^2(x, y_j)\, d\nu(x) + \int_{\Sigma \times Y} d^2(x, y)\, d\pi(x, y).
\]

6 Examples

In this section we consider concrete examples that fall into our general framework.

Finite dimensional settings. As a preliminary observation, we note that on $\mathbb{R}^n$, the sublevel sets of $R(u) := \|u\|_1$ and $R(u) := \|u\|_\infty$ possess isolated extremal points. Consequently, the only geodesically connected admissible sets $B$ reduce to singletons, and thus AGFs are not meaningful in this setting. In contrast, for $R(u) := \|u\|_p$ with $1 < p < \infty$, the set of extremal points coincides with the boundary of the $p$-balls. In this case, AGFs are essentially equivalent to Riemannian gradient flows on $\partial\{u : \|u\|_p \leq 1\}$.

Strictly convex Banach spaces. In case $M$ is a strictly convex Banach space and $R(u) := \|u\|_M$, it is immediate to check that the extremal points of the unit ball of $R$ coincide with $\partial\{u : \|u\|_M \leq 1\}$. Therefore, if $M$ admits a separable predual, then AGFs are metric gradient flows in the weak*-closure of $\partial\{u : \|u\|_M \leq 1\}$ endowed with any metric metrizing the weak*-topology.

6.1 Total variation regularization in the space of measures

As mentioned in the introduction, AGFs applied to optimization problems in the space of measures regularized with the total variation penalization recover the PGF setting analyzed in [23, 22]. Let $X \subset \mathbb{R}^d$ be a convex, compact set and set $C := C(X)$. In this case $M$ is the space of finite signed Radon measures on $X$. Choosing as regularizer $R := \|\cdot\|_{TV}$, one can prove that $\tilde{B} = \{\pm\delta_x : x \in X\}$. As a weakly*-closed, geodesically connected subset of $\tilde{B}$ one can choose for example $B_+ := \{\delta_x : x \in X\}$ or $B_- := \{-\delta_x : x \in X\}$.
Note that the Wasserstein distance $W_2$ metrizes the narrow convergence on $B_\pm$. Moreover, the spaces $(B_\pm, W_2)$ are flat, and both are isometric to $(B_+, W_2)$. As a consequence, uniqueness for the minimizing movement scheme and contraction estimates follow from the application of Theorem 3.19.

Now, we can examine concretely what the AGF framework defined in Section 3 looks like in this particular scenario for the choice of $B_+$; the discussion can be repeated verbatim for $B_-$. The discretized functional $J_n$ can be written as
\[
J_n\big(((c_j), (\delta_{x_j}))_{j=1}^n\big) = \mathcal{F}\Big(\frac{1}{n} \sum_{j=1}^{n} (c_j)^2 K\delta_{x_j}\Big) + \frac{1}{n} \sum_{j=1}^{n} (c_j)^2, \tag{6.1}
\]
recovering the discrete problem in [23, Equation (2)]. Moreover, since $B_+$ is isometric to $X$, the AGF of $J_n$, defined as a metric gradient flow in $\Omega_L^n$, is equivalent to the finite dimensional gradient flow of the functional
\[
((c_j), (x_j))_{j=1}^n \mapsto J_n\big(((c_j), (\delta_{x_j}))_{j=1}^n\big),
\]
which is precisely how PGFs are defined. The lifted functional defined as in (4.15), for $\nu \in \mathcal{P}(\Omega_L)$, is written as
\[
J(\nu) = \mathcal{F}\Big(\int_{\Omega_L} c^2 K\delta_x\, d\nu(c, \delta_x)\Big) + \int_{\Omega_L} c^2\, d\nu(c, \delta_x). \tag{6.2}
\]
Note that $\mathcal{P}([0, L] \times B_+)$ is isometric to $\mathcal{P}([0, L] \times X)$ through the map $\mathcal{T}(\nu) := T_\# \nu$, where $T(c, \delta_x) := (c, x)$. Indeed, $\mathcal{T}$ is an isometry since, due to the fact that $T$ is a bijective isometry, we have that
\[
W_2(\mathcal{T}(\nu_1), \mathcal{T}(\nu_2)) = W_2(T_\# \nu_1, T_\# \nu_2) = W_2(\nu_1, \nu_2).
\]
Therefore, the metric gradient flow of the lifted functional (6.2) is equivalent to a Wasserstein gradient flow of
\[
\nu \mapsto \mathcal{F}\Big(\int_{[0,L] \times X} c^2 K\delta_x\, d\nu(c, x)\Big) + \int_{[0,L] \times X} c^2\, d\nu(c, x), \tag{6.3}
\]
i.e., precisely to the lifted dynamics of PGFs.

6.2 One-dimensional BV functions

Following [18] (see also [44]), we consider $M := L^\infty((0, 1))$ and $Y := L^2((0, 1))$. We recall that $L^\infty((0, 1))$ is a Banach space whose predual is $C := L^1((0, 1))$, which is a separable space.
To enforce zero boundary conditions, for $0 < \varepsilon < \frac{1}{2}$ we define the set
\[
D_\varepsilon := \{u \in L^\infty((0, 1)) : u(x) = 0 \text{ for a.e. } x \in (0, \varepsilon) \cup (1 - \varepsilon, 1)\}.
\]
We denote by
\[
|Du|((0, 1)) := \sup\Big\{\int_0^1 u(x) \operatorname{div} \varphi(x)\, dx : \varphi \in C_c^1((0, 1)),\ \|\varphi\|_\infty \leq 1\Big\}
\]
the BV-seminorm of a function $u \in L^\infty((0, 1))$, and we choose the regularizer
\[
R(u) := \begin{cases} |Du|((0, 1)) & \text{if } u \in BV((0, 1)) \cap D_\varepsilon, \\ +\infty & \text{otherwise}. \end{cases}
\]
It is straightforward that $R$ satisfies (A4) and (A5) in Section 2. By [18, Proposition 6.11] one can also characterize the weak*-closure of the extremal points set as
\[
\tilde{B} = \Big\{\frac{\sigma}{2} \mathbf{1}_{[a,b]} : a, b \in [\varepsilon, 1 - \varepsilon],\ a \leq b \text{ and } \sigma \in \{-1, 1\}\Big\}, \tag{6.4}
\]
where $\mathbf{1}_A$ denotes the indicator function of the set $A$. As a weakly*-closed, geodesically connected subset of $\tilde{B}$, we can choose either
\[
B_+ = \Big\{\frac{1}{2} \mathbf{1}_{[a,b]} : a, b \in [\varepsilon, 1 - \varepsilon],\ a \leq b\Big\} \quad \text{or} \quad B_- = \Big\{-\frac{1}{2} \mathbf{1}_{[a,b]} : a, b \in [\varepsilon, 1 - \varepsilon],\ a \leq b\Big\}.
\]
In $B_\pm$, the weak*-convergence is equivalent to the convergence of the endpoints of the indicator function, i.e., $\frac{\sigma}{2} \mathbf{1}_{[a_k, b_k]} \overset{*}{\rightharpoonup} \frac{\sigma}{2} \mathbf{1}_{[a,b]}$ for $\sigma \in \{-1, 1\}$, if and only if $a_k \to a$ and $b_k \to b$ as $k \to \infty$. In particular, by choosing
\[
d_B\Big(\frac{\sigma}{2} \mathbf{1}_{[a,b]}, \frac{\sigma}{2} \mathbf{1}_{[a',b']}\Big) := \sqrt{|a - a'|^2 + |b - b'|^2},
\]
$d_B$ metrizes the weak*-convergence in $B_\pm$. With this choice of the metric, both $B_+$ and $B_-$ are isometric to
\[
Q_\varepsilon := \{(a, b) \in [\varepsilon, 1 - \varepsilon]^2 : a \leq b\} \subset \mathbb{R}^2,
\]
and thus are flat and NPC. As a consequence, uniqueness for the AGF and the contraction estimates follow again from Theorem 3.19. The discretized functional $J_n$ can be written via
\[
J_n\big(((c_j), (a_j, b_j))_{j=1}^n\big) = \mathcal{F}\Big(\frac{1}{2n} \sum_{j=1}^{n} (c_j)^2 K\mathbf{1}_{[a_j, b_j]}\Big) + \frac{1}{n} \sum_{j=1}^{n} (c_j)^2, \tag{6.5}
\]
and the lifted functional is defined as in (4.15), for $\nu \in \mathcal{P}(\Omega_L) = \mathcal{P}([0, L] \times B_+)$, via
\[
J(\nu) := \mathcal{F}\Big(\int_{\Omega_L} c^2 K\mathbf{1}_{[a,b]}\, d\nu(c, \mathbf{1}_{[a,b]})\Big) + \int_{\Omega_L} c^2\, d\nu(c, \mathbf{1}_{[a,b]}). \tag{6.6}
\]
Reasoning similarly to Subsection 6.1, the space $\mathcal{P}([0, L] \times B_+)$ is isometric to $\mathcal{P}([0, L] \times Q_\varepsilon)$. Therefore, the gradient flow of the lifted functional is equivalent to a Wasserstein gradient flow in $\mathcal{P}([0, L] \times Q_\varepsilon)$ of
\[
\nu \mapsto \mathcal{F}\Big(\int_{[0,L] \times Q_\varepsilon} c^2 K\mathbf{1}_{[a,b]}\, d\nu(c, (a, b))\Big) + \int_{[0,L] \times Q_\varepsilon} c^2\, d\nu(c, (a, b)). \tag{6.7}
\]

6.3 Benamou-Brenier dynamical formulation of optimal transport

Introduced in [9], the Benamou-Brenier formula allows one to compute an optimal transport between two probability measures $\rho_0$ and $\rho_1$ on a closed, bounded domain $X \subset \mathbb{R}^d$ through the minimization of the kinetic energy
\[
(\rho, v) \mapsto \frac{1}{2} \int_0^1 \int_X |v_t(x)|^2\, d\rho_t(x)\, dt, \tag{6.8}
\]
where $\rho_t \in \mathcal{M}^+(X)$ is a time-dependent probability measure interpolating between $\rho_0$ and $\rho_1$, $v_t : [0, 1] \times X \to \mathbb{R}^d$ is a vector field, and the pair $(\rho_t, v_t)$ satisfies the continuity equation $\partial_t \rho + \operatorname{div}(\rho v_t) = 0$ in the sense of distributions. One can reformulate the Benamou-Brenier energy as a convex functional on the space of Borel measures $M := \mathcal{M}^+([0, 1] \times X) \times \mathcal{M}([0, 1] \times X; \mathbb{R}^d)$ via
\[
B(\rho, m) := \begin{cases} \frac{1}{2} \int_0^1 \int_X \Big|\frac{dm}{d\rho}\Big|^2\, d\rho(t, x) & \text{if } \rho \geq 0,\ m \ll \rho, \\ +\infty & \text{otherwise}. \end{cases} \tag{6.9}
\]
In this case, the continuity equation becomes the linear constraint $\partial_t \rho + \operatorname{div} m = 0$. In [42, 12, 15] the Benamou-Brenier energy was used as a convex regularizer to solve dynamic inverse problems, and its sparsity properties were subsequently investigated [13, 12, 31, 20]. This was done by considering the functional
\[
R_{\alpha,\beta}(\rho, m) := \beta B(\rho, m) + \alpha \|\rho\|_{TV}, \quad \text{subject to } \partial_t \rho + \operatorname{div} m = 0, \tag{6.10}
\]
defined for all $(\rho, m) \in M$ and $\alpha > 0$, $\beta > 0$. Here, AGFs apply directly to composite optimization problems regularized with $R := R_{\alpha,\beta}$, since one can show that $R_{\alpha,\beta}$ satisfies (A4)-(A5).
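For intuition, the value of $R_{\alpha,\beta}$ on a single particle curve can be computed directly: for $\rho = c\, dt \otimes \delta_{\gamma(t)}$ and $m = c\, \dot\gamma(t)\, dt \otimes \delta_{\gamma(t)}$ one has $B(\rho, m) = \frac{c}{2}\int_0^1 |\dot\gamma(t)|^2\, dt$ and $\|\rho\|_{TV} = c$, so $R_{\alpha,\beta}$ is linear in the mass $c$. A minimal numerical sketch of this computation follows; the finite-difference discretization and all names are illustrative, not taken from the paper.

```python
import numpy as np

def R_alpha_beta(gamma, c, alpha, beta):
    # gamma: (N, d) array of samples of a curve on a uniform grid of [0, 1]
    # carrying constant mass c; the Benamou-Brenier part reduces to
    # B = (c / 2) * int_0^1 |gamma'(t)|^2 dt, and ||rho||_TV = c.
    dt = 1.0 / (len(gamma) - 1)
    speed2 = np.sum(np.diff(gamma, axis=0) ** 2, axis=1) / dt ** 2
    kinetic = 0.5 * c * np.sum(speed2) * dt
    return beta * kinetic + alpha * c

def normalizing_mass(gamma, alpha, beta):
    # the mass c for which R_alpha_beta(gamma, c) = 1; R is linear in c
    return 1.0 / R_alpha_beta(gamma, 1.0, alpha, beta)

ts = np.linspace(0.0, 1.0, 101)
gamma = np.stack([ts, 2.0 * ts], axis=1)  # straight line, |gamma'|^2 = 5
c = normalizing_mass(gamma, alpha=1.0, beta=1.0)
```

For this straight line with $\alpha = \beta = 1$, one gets $R = \frac{5}{2}c + c$, so the normalizing mass is $c = 1/3.5$.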
Moreover, as proved in [13], it is possible to characterize the extremal points of
\[
B_{\alpha,\beta} := \{(\rho, m) \in M : R_{\alpha,\beta}(\rho, m) \leq 1, \text{ subject to } \partial_t \rho + \operatorname{div} m = 0\}.
\]
Theorem 6.1 (Theorem 6, [13]). Let $\alpha > 0$, $\beta > 0$. It holds that
\[
\operatorname{Ext}(B_{\alpha,\beta}) = \{(\rho, m) : \rho = C_\gamma\, dt \otimes \delta_{\gamma(t)},\ m = \dot\gamma(t)\, C_\gamma\, dt \otimes \delta_{\gamma(t)}\} \cup \{(0, 0)\}, \tag{6.11}
\]
where $\gamma : [0, 1] \to X$ is a curve in $H^1([0, 1]; X)$ and $C_\gamma := \big(\frac{\beta}{2} \int_0^1 |\dot\gamma(t)|^2\, dt + \alpha\big)^{-1}$.

To define AGFs, one can consider the weak*-closed set $B \subset \tilde{B}$ defined as
\[
B := \{(\rho, m) : \rho = b\, dt \otimes \delta_{\gamma(t)},\ m = \dot\gamma(t)\, b\, dt \otimes \delta_{\gamma(t)}, \text{ for } b \in [0, C_\gamma]\}. \tag{6.12}
\]
Moreover, the metric $d((\rho_1, m_1), (\rho_2, m_2)) = \|\gamma_1 - \gamma_2\|_{H^1([0,1]; X)}$ metrizes the weak* convergence on $B$ on curves of bounded speed ($b > 0$). Therefore, since $H^1([0, 1]; X)$ is a Hilbert space and any Hilbert space is NPC, uniqueness and contractivity estimates for AGFs again follow from the NPC framework. Here, the functional $J_n$ can be written (on curves of bounded speed) as
\[
J_n\big(((c_j), (\gamma_j))_{j=1}^n\big) = \mathcal{F}\Big(\frac{1}{n} \sum_{j=1}^{n} (c_j)^2 K\big(C_{\gamma_j}\, dt \otimes \delta_{\gamma_j(t)},\ C_{\gamma_j} \dot\gamma_j(t)\, dt \otimes \delta_{\gamma_j(t)}\big)\Big) + \frac{1}{n} \sum_{j=1}^{n} (c_j)^2,
\]
and AGFs correspond to the $([0, L] \times H^1([0, 1]; X))^n$ gradient flow of this functional. The lifting can be written, similarly to the previous sections, as a gradient flow in $\mathcal{P}_2([0, L] \times H^1([0, 1]; X))$. Interestingly, such an algorithm has been implemented (without mathematical investigation) as an acceleration step in [12].

6.4 Further examples

In this section we list, in less detail, several further examples that can be handled by AGFs. We refer the reader to the references for more insights about extremality properties of the given problems.

Kantorovich-Rubinstein norms and Wasserstein distances on balanced signed measures.
As shown in the recent papers [19, 8, 18], Kantorovich-Rubinstein norms can be used as regularizers for inverse problems and mean-field neural network training. The KR-norm can be defined as follows, cf. [8]:
\[
\|\mu\|_{KR} := |\mu(X)| + \sup\Big\{\int_X f(z)\, d\mu(z) : f(e) = 0,\ L(f) \leq 1\Big\}, \tag{6.13}
\]
where $\mu \in \mathcal{M}^1(X)$ is a signed measure with finite first moment, $e \in X$ is a base point, and $L(f)$ is the Lipschitz constant of $f$. In the case of balanced measures, i.e., when $\mu^+(X) = \mu^-(X)$, the KR norm is simply the $W_1$-distance between the positive and the negative part of $\mu$. Alternatively, unbalanced measures can be included through infimal convolution approaches, cf. [19]. As shown in [8] (see also [18] and [19] for corresponding and similar results), the extremal points of the ball of the KR-regularizer can be characterized. In case $X$ is a compact set, a simple consequence of the previously mentioned results yields that the extremal points of
\[
\{\mu \text{ balanced} : \|\mu\|_{KR} + \|\mu\|_{TV} \leq 1\}
\]
are dipoles of the form
\[
\mu_{\mathrm{dip}} = \frac{\delta_x - \delta_y}{d(x, y) + 1}.
\]
Therefore, AGFs applied to optimization problems on balanced signed measures regularized with KR-norms are equivalent to Euclidean gradient flows in $([0, L] \times X \times X)^n$. Note that such algorithms have also been implemented (without mathematical investigation) in [8, Section 8].

PDE-regularized optimization problems and splines. Optimization problems regularized with the total variation norm of a scalar differential operator $\mathcal{L}$ in $\mathbb{R}^d$, i.e., $R(u) := \|\mathcal{L}u\|_{TV}$, also fit the AGF framework. Indeed, the extremal points of the ball of $R$ can be characterized (up to elements in the null-space of $\mathcal{L}$) as fundamental solutions translated by $x \in \mathbb{R}^d$, cf. [11, 46, 45]. Therefore, depending on the regularity of the fundamental solution, AGFs applied to such problems are Euclidean gradient flows in $([0, L] \times \mathbb{R}^d)^n$.

BV functions in higher dimensions.
Optimization problems on BV functions in $\mathbb{R}^d$ regularized with the BV-seminorm can also be studied through AGFs. The extremal points of the unit ball of the BV-seminorm have been shown to be indicator functions of simple sets, cf. [11, 3, 33, 25]. However, in this case, it is challenging to characterize the weak* distance on $B$ and the properties of the resulting metric space [28, 29].

References

[1] Emmanuel Abbe, Enric Boix Adsera, and Theodor Misiakiewicz. "The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks". In: Conference on Learning Theory. PMLR. 2022, pp. 4782–4887.
[2] Luigi Ambrosio, Elia Brué, and Daniele Semola. Lectures on Optimal Transport. Vol. 130. Springer, 2021.
[3] Luigi Ambrosio, Vicent Caselles, Simon Masnou, and Jean-Michel Morel. "Connected components of sets of finite perimeter and applications to image processing". In: Journal of the European Mathematical Society 3.1 (2001), pp. 39–92.
[4] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient Flows: In Metric Spaces and in the Space of Probability Measures. Springer, 2005.
[5] Franz Aurenhammer, Friedrich Hoffmann, and Boris Aronov. "Minkowski-type theorems and least-squares clustering". In: Algorithmica 20.1 (1998), pp. 61–76.
[6] Miroslav Bačák. Convex Analysis and Optimization in Hadamard Spaces. Vol. 22. Walter de Gruyter GmbH & Co KG, 2014.
[7] Raphaël Barboni, Gabriel Peyré, and François-Xavier Vialard. "Understanding the training of infinitely deep and wide resnets with conditional optimal transport". In: Communications on Pure and Applied Mathematics 78.11 (2025), pp. 2149–2205.
[8] Francesca Bartolucci, Marcello Carioni, José A. Iglesias, Yury Korolev, Emanuele Naldi, and Stefano Vigogna. "A Lipschitz spaces view of infinitely wide shallow neural networks".
In: arXiv preprint arXiv:2410.14591 (2024).
[9] Jean-David Benamou and Yann Brenier. "A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem". In: Numerische Mathematik 84.3 (2000), pp. 375–393.
[10] Claire Boyer, Antonin Chambolle, Yohann De Castro, Vincent Duval, Frédéric de Gournay, and Pierre Weiss. "On representer theorems and convex regularization". In: SIAM Journal on Optimization 29.2 (2019), pp. 1260–1281.
[11] Kristian Bredies and Marcello Carioni. "Sparsity of solutions for variational inverse problems with finite-dimensional data". In: Calculus of Variations and Partial Differential Equations 59.1 (2020), p. 14.
[12] Kristian Bredies, Marcello Carioni, Silvio Fanzon, and Francisco Romero. "A generalized conditional gradient method for dynamic inverse problems with optimal transport regularization". In: Foundations of Computational Mathematics 23.3 (2023), pp. 833–898.
[13] Kristian Bredies, Marcello Carioni, Silvio Fanzon, and Francisco Romero. "On the extremal points of the ball of the Benamou–Brenier energy". In: Bulletin of the London Mathematical Society 53.5 (2021), pp. 1436–1452.
[14] Kristian Bredies, Marcello Carioni, Silvio Fanzon, and Daniel Walter. "Asymptotic linear convergence of fully-corrective generalized conditional gradient methods". In: Mathematical Programming 205.1 (2024), pp. 135–202.
[15] Kristian Bredies and Silvio Fanzon. "An optimal transport approach for solving dynamic inverse problems in spaces of measures". In: ESAIM: Mathematical Modelling and Numerical Analysis 54.6 (2020), pp. 2351–2382.
[16] Kristian Bredies and Hanna Katriina Pikkarainen. "Inverse problems in spaces of measures". In: ESAIM: Control, Optimisation and Calculus of Variations 19.1 (2013), pp. 190–218.
[17] Martin R. Bridson and André Haefliger. Metric Spaces of Non-Positive Curvature. Vol. 319. Springer Science & Business Media, 2013.
[18] Marcello Carioni and Leonardo Del Grande. "A general theory for exact sparse representation recovery in convex optimization". In: arXiv preprint arXiv:2311.08072 (2023).
[19] Marcello Carioni, José A. Iglesias, and Daniel Walter. "Extremal points and sparse optimization for generalized Kantorovich–Rubinstein norms". In: Foundations of Computational Mathematics 25.1 (2025), pp. 103–144.
[20] Marcello Carioni and Julius Lohmann. "Sparsity for dynamic inverse problems on Wasserstein curves with bounded variation". In: Inverse Problems 41.11 (2025), p. 115018.
[21] Lénaïc Chizat. "Convergence rates of gradient methods for convex optimization in the space of measures". In: Open Journal of Mathematical Optimization 3 (2022), pp. 1–19.
[22] Lénaïc Chizat. "Sparse optimization on measures with over-parameterized gradient descent". In: Mathematical Programming 194.1 (2022), pp. 487–532.
[23] Lénaïc Chizat and Francis Bach. "On the global convergence of gradient descent for over-parameterized models using optimal transport". In: Advances in Neural Information Processing Systems 31 (2018).
[24] Lénaïc Chizat, Maria Colombo, Roberto Colombo, and Xavier Fernández-Real. "Quantitative Convergence of Wasserstein Gradient Flows of Kernel Mean Discrepancies". In: arXiv preprint arXiv:2603.01977 (2026).
[25] Giacomo Cristinelli, José A. Iglesias, and Daniel Walter. "Conditional gradients for total variation regularization with PDE constraints: a graph cuts approach". In: Computational Optimization and Applications 93.1 (2026), pp. 209–265.
[26] Gianni Dal Maso. An Introduction to Γ-Convergence. Birkhäuser Boston, 1993.
[27] Léo Dana, Francis Bach, and Loucas Pillaud-Vivien. "Convergence of Shallow ReLU Networks on Weakly Interacting Data". In: arXiv preprint arXiv:2502.16977 (2025).
[28] Yohann De Castro, Vincent Duval, and Romain Petit.
"Exact recovery of the support of piecewise constant images via total variation regularization". In: Inverse Problems 40.10 (2024), p. 105012.
[29] Yohann De Castro, Vincent Duval, and Romain Petit. "Towards off-the-grid algorithms for total variation regularized inverse problems". In: International Conference on Scale Space and Variational Methods in Computer Vision. Springer. 2021, pp. 553–564.
[30] Luca Dieci and Daniyar Omarov. "Solving semi-discrete optimal transport problems: star shapedness and Newton's method". In: Numerical Algorithms 99.2 (2025), pp. 949–1004.
[31] Vincent Duval and Robert Tovey. "Dynamical programming for off-the-grid dynamic inverse problems". In: ESAIM: Control, Optimisation and Calculus of Variations 30 (2024), p. 7.
[32] Xavier Fernández-Real and Alessio Figalli. "The continuous formulation of shallow neural networks as Wasserstein-type gradient flows". In: Analysis at Large: Dedicated to the Life and Work of Jean Bourgain. Springer, 2022, pp. 29–57.
[33] Wendell H. Fleming. "Functions with generalized gradient and generalized surfaces". In: Annali di Matematica Pura ed Applicata 44.1 (1957), pp. 93–103.
[34] Richard Jordan, David Kinderlehrer, and Felix Otto. "The variational formulation of the Fokker–Planck equation". In: SIAM Journal on Mathematical Analysis 29.1 (1998), pp. 1–17.
[35] Mark Krein and David Milman. "On extreme points of regular convex sets". In: Studia Mathematica 9 (1940), pp. 133–138.
[36] Matthias Liero, Alexander Mielke, and Giuseppe Savaré. "Optimal entropy-transport problems and a new Hellinger–Kantorovich distance between positive measures". In: Inventiones Mathematicae 211.3 (2018), pp. 969–1117.
[37] Uwe Mayer. "Gradient flows on nonpositively curved metric spaces and harmonic maps". In: Communications in Analysis and Geometry 6 (1998), pp. 199–254.
[38] Quentin Mérigot.
"A multiscale approach to optimal transport". In: Computer Graphics Forum. Vol. 30. 5. Wiley Online Library. 2011, pp. 1583–1592.
[39] Matteo Muratori and Giuseppe Savaré. "Gradient flows and evolution variational inequalities in metric spaces. I: Structural properties". In: Journal of Functional Analysis 278.4 (2020), p. 108347.
[40] Robert Phelps. Lectures on Choquet's Theorem. 2nd ed. Lecture Notes in Mathematics (vol. 1757). Berlin: Springer, 2001.
[41] Grant M. Rotskoff and Eric Vanden-Eijnden. "Neural networks as interacting particle systems: Asymptotic convexity of the loss landscape and universal scaling of the approximation error". In: stat 1050 (2018), p. 22.
[42] Bernhard Schmitzer, Klaus P. Schäfers, and Benedikt Wirth. "Dynamic cell imaging in PET with optimal transport regularization". In: IEEE Transactions on Medical Imaging 39.5 (2019), pp. 1626–1635.
[43] Justin Sirignano and Konstantinos Spiliopoulos. "Mean field analysis of neural networks: A central limit theorem". In: Stochastic Processes and their Applications 130.3 (2020), pp. 1820–1852.
[44] Philip Trautmann and Daniel Walter. "A fast primal-dual-active-jump method for minimization in $BV((0, T); \mathbb{R}^d)$". In: Optimization 73.6 (2024), pp. 1851–1895.
[45] Michael Unser and Julien Fageot. "Native Banach spaces for splines and variational inverse problems". In: arXiv preprint arXiv:1904.10818 (2019).
[46] Michael Unser, Julien Fageot, and John Paul Ward. "Splines are universal solutions of linear inverse problems with generalized TV regularization". In: SIAM Review 59.4 (2017), pp. 769–793.
[47] Cédric Villani. Optimal Transport. Old and New. 1st ed. Grundlehren der mathematischen Wissenschaften. Berlin: Springer, 2008.
[48] Guillaume Wang and Lénaïc Chizat. "An exponentially converging particle method for the mixed Nash equilibrium of continuous games". In: arXiv preprint arXiv:2211.01280 (2022).
[49] Stephan Wojtowytsch. "On the convergence of gradient descent training for two-layer ReLU networks in the mean field regime". In: arXiv preprint arXiv:2005.13530 (2020).

A Appendix

A.1 Minimizing movements and curves of maximal slope

In order to formulate the concept of gradient flows in our setting, we have chosen (generalized) minimizing movements (see Section 3 and Section 4). For completeness we recall the classical setup from [4, Chapter 2].

Definition A.1 (Discrete Scheme). Let $(\mathcal{X}, d)$ be a complete metric space and $F : \mathcal{X} \to (-\infty, +\infty]$. The discrete scheme is defined as follows. Given a partition of the time interval $[0, +\infty)$ by a sequence of time steps $\tau := (\tau_n)_{n \in \mathbb{N}} \subset (0, +\infty)$ with $|\tau| := \sup_{n \in \mathbb{N}} \tau_n < +\infty$, let us set
\[
P_\tau := \{0 =: t_\tau^0 < t_\tau^1 < \dots < t_\tau^n < \dots\}, \quad I_\tau^n := (t_\tau^{n-1}, t_\tau^n], \quad \tau_n =: t_\tau^n - t_\tau^{n-1}, \quad \lim_{n \to \infty} t_\tau^n = \sum_{k=1}^{\infty} \tau_k = +\infty, \tag{A.1}
\]
and consider the functionals
\[
G(\tau_n, u_\tau^{n-1}; v) := F(v) + \frac{1}{2\tau_n} d^2(v, u_\tau^{n-1}). \tag{A.2}
\]
Given an admissible initialization $u_\tau^0 \in \operatorname{Dom}(F)$, we define successively $(u_\tau^n)_{n \in \mathbb{N}} \subset \mathcal{X}$ with the property that
\[
u_\tau^n \in \operatorname*{arg\,min}_{v \in \mathcal{X}} G(\tau_n, u_\tau^{n-1}; v) \quad \forall n \geq 1. \tag{A.3}
\]
One then defines accordingly the piecewise constant interpolation
\[
u_\tau(t) := \begin{cases} u_\tau^0 & \text{if } t = 0, \\ u_\tau^n & \text{if } t \in (t_\tau^{n-1}, t_\tau^n]\ \forall n \geq 1. \end{cases} \tag{A.4}
\]
We call $u_\tau$ of (A.4) a discrete solution corresponding to the partition $P_\tau$ of (A.1).

Remark A.2. The operator which provides all the solutions to the minimization problem (A.3) (given $u_\tau^{n-1}$) is generically multi-valued, and is often called the resolvent operator. For a general $\tau > 0$ and $u \in \mathcal{X}$ it is defined via
\[
J_\tau(u) := \operatorname*{arg\,min}_{v \in \mathcal{X}} G(\tau, u; v), \quad \text{i.e.,} \quad u_\tau \in J_\tau(u) \iff G(\tau, u; u_\tau) \leq G(\tau, u; v) \quad \forall v \in \mathcal{X}.
\]
Thus, Definition A.1 can equivalently be phrased as follows: $u_\tau$ is a discrete solution iff $u_\tau^n \in J_{\tau_n}(u_\tau^{n-1})$ for every $n \geq 1$.
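On $\mathcal{X} = \mathbb{R}$ with a quadratic energy, the resolvent of the scheme above has a closed form, and iterating it recovers the exponential decay of the gradient flow as $|\tau| \to 0$. A minimal sketch follows; the quadratic energy and all step-size values are illustrative choices, not taken from the paper.

```python
import math

# For F(v) = lam * v**2 / 2 on the real line, each step of the scheme
# solves argmin_v F(v) + (v - u_prev)**2 / (2 * tau), which has the
# closed form u_prev / (1 + lam * tau) (the resolvent of Remark A.2).

def resolvent(u_prev, tau, lam):
    return u_prev / (1.0 + lam * tau)

def discrete_solution(u0, tau, lam, T):
    # uniform partition with round(T / tau) steps of size tau
    u = u0
    for _ in range(round(T / tau)):
        u = resolvent(u, tau, lam)
    return u

u0, lam, T = 1.0, 2.0, 1.0
coarse = discrete_solution(u0, tau=0.1, lam=lam, T=T)
fine = discrete_solution(u0, tau=0.001, lam=lam, T=T)
exact = u0 * math.exp(-lam * T)   # gradient flow u' = -lam * u at time T
```

As the step size shrinks, the piecewise constant interpolation converges to the exact flow $u_0 e^{-\lambda t}$, with the finer partition giving the smaller error.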
Definition A.3 (Minimizing Movements). Let $(\mathcal{X}, d)$ be a complete metric space, $F : \mathcal{X} \to (-\infty, +\infty]$ and $G$ be defined as in (A.2). We call a curve $x_t : [0, +\infty) \to \mathcal{X}$ a generalized minimizing movement for $G$ starting from $x_0 \in \operatorname{Dom}(F)$ iff there exist a sequence of partitions $(\tau_k)_{k \in \mathbb{N}}$ with $|\tau_k| \to 0$ as $k \to \infty$ and a sequence of discrete solutions $(x_{\tau_k})_{k \in \mathbb{N}}$ as in (A.4) such that
\[
\text{(i)}\ \lim_{k \to \infty} F(x_{\tau_k}^0) = F(x_0), \quad \limsup_{k \to \infty} d(x_{\tau_k}^0, x_0) < +\infty, \qquad \text{(ii)}\ x_{\tau_k, t} \to x_t \text{ for all } t \geq 0, \text{ as } k \to +\infty. \tag{A.5}
\]
More precisely, in [4, Definition 2.0.6] such $x_t$ are called generalized minimizing movements, and one writes $x_t \in \operatorname{GMM}(G, x_0)$. A related concept is that of minimizing movements, defined as follows. A curve $x_t : [0, +\infty) \to \mathcal{X}$ is a minimizing movement for $G$ starting from $x_0 \in \operatorname{Dom}(F)$, and one writes $x_t \in \operatorname{MM}(G, x_0)$, iff for every partition $\tau := (\tau_n)_{n \in \mathbb{N}}$ there exists a discrete solution $x_\tau$ as in (A.3)-(A.4) such that
\[
\text{(i)}\ \lim_{|\tau| \downarrow 0} F(x_\tau^0) = F(x_0), \quad \limsup_{|\tau| \downarrow 0} d(x_\tau^0, x_0) < +\infty, \qquad \text{(ii)}\ x_{\tau, t} \to x_t \text{ for all } t \geq 0, \text{ as } |\tau| \to 0. \tag{A.6}
\]
We also recall and combine some standard results on uniqueness of gradient flows on metric spaces of non-positive curvature in the following, using as a reference [4, Chapter 4].

Theorem A.4. Let $(\mathcal{X}, d)$ be a complete metric space of non-positive curvature, cf. Theorem 2.5, and let $F : \mathcal{X} \to (-\infty, +\infty]$ be $\lambda$-convex for some $\lambda \in \mathbb{R}$. Then for every initial point $x_0 \in \mathcal{X}$, there exists a unique $x_t \in \operatorname{GMM}(G, x_0)$ that is locally Lipschitz and fulfills the following contractivity property: for every $x_0, \tilde{x}_0 \in \mathcal{X}$,
\[
d(x_t, \tilde{x}_t) \leq e^{-\lambda t}\, d(x_0, \tilde{x}_0) \quad \text{for } \mathcal{L}^1\text{-a.e. } t > 0.
\]
(A.7)

Given $x_0 \in \mathcal{X}$, the unique corresponding $x_t \in \operatorname{GMM}(G, x_0)$ is also the unique solution to the following evolution variational inequality $(\mathrm{EVI})_\lambda$:
\[
\frac{1}{2} \frac{d}{dt} d^2(x_t, y) + \frac{\lambda}{2} d^2(x_t, y) + F(x_t) \leq F(y) \quad \forall y \in \operatorname{Dom}(F), \tag{A.8}
\]
and is a curve of maximal slope from $x_0$ for $F$, hence fulfills the inequality
\[
2 \frac{d}{dt} F(x_t) \leq -|x_t'|^2 - |\partial F|^2(x_t). \tag{A.9}
\]
In addition, among all absolutely continuous curves $\tilde{x}_t$ starting from $x_0$, $x_t$ is the unique one that fulfills (A.9), and therefore it is also the unique one with this property among all curves of maximal slope from $x_0$ with respect to $F$.

Proof. Since $F$ is $\lambda$-convex and $(\mathcal{X}, d)$ is NPC, by [4, Remark 4.0.2] it follows that $v \mapsto G(\tau, w; v)$ is $(\tau^{-1} + \lambda)$-convex (see [4, Assumption 4.0.1]). Therefore, existence, uniqueness and regularity of $u \in \operatorname{MM}(G, x_0)$, as well as (A.7), follow from [4, Theorem 4.0.4]. To prove that $\operatorname{MM}(G, x_0) = \operatorname{GMM}(G, x_0)$, note that the solutions to the discrete scheme (A.2) are unique for $|\tau|$ small enough, as then [4, Theorem 4.1.2] is applicable because $\lambda |\tau| > -1$. By Definition A.3, one has that for all partitions $\tau$ the unique discrete solution $x_\tau$ of (A.4) satisfies $x_{\tau, t} \to x_t$ for $|\tau| \to 0$. Now, if $\tilde{x}_t \in \operatorname{GMM}(G, x_0)$, by the definition of generalized minimizing movements, $\tilde{x}_t$ is obtained by a specific choice of partitions $(\tau_k)_{k \in \mathbb{N}}$ and discrete solutions $(x_{\tau_k})_{k \in \mathbb{N}} \subset \mathcal{X}$, for which $x_{\tau_k, t} \to \tilde{x}_t$ as $k \to \infty$. But then it must also hold that $x_{\tau_k, t} \to x_t$ as $k \to \infty$, hence $x_t = \tilde{x}_t$, showing the uniqueness. The uniqueness among all curves of maximal slope follows from the existence of $(\mathrm{EVI})_\lambda$ solutions, cf. [39, Theorem 4.2].

A.2 Proof of Lemma 5.3

Note that thanks to (A8) and recalling Remark 3.12, there exists $C > 0$ such that for every $(c', u'), (\tilde{c}, \tilde{u}) \in \Omega_L$, it holds that
\[
\|(c')^2 K u' - \tilde{c}^2 K \tilde{u}\|_Y \leq C\, d_\Omega((c', u'), (\tilde{c}, \tilde{u})).
\]
(A.10)

Then, since $\mathcal{F}$ is Fréchet-differentiable and $\nu \in \mathcal{P}_2(\Omega_L)$, one has that
\[
\int_{\Omega_L} \mathcal{F}(c^2 K u)\, d\nu - \mathcal{F}\Big(\int_{\Omega_L} c^2 K u\, d\nu\Big)
= \int_{\Omega_L} \Big[\mathcal{F}(c^2 K u) - \mathcal{F}(c_t^2 K u_t) + \mathcal{F}(c_t^2 K u_t) - \mathcal{F}\Big(\int_{\Omega_L} c^2 K u\, d\nu\Big)\Big]\, d\nu
\]
\[
= \int_{\Omega_L} \Big[\big(\nabla\mathcal{F}(c_t^2 K u_t), c^2 K u - c_t^2 K u_t\big)_Y + g_1(c^2 K u - c_t^2 K u_t)\Big]\, d\nu
+ \Big(\nabla\mathcal{F}\Big(\int_{\Omega_L} c^2 K u\, d\nu\Big), c_t^2 K u_t - \int_{\Omega_L} c^2 K u\, d\nu\Big)_Y
+ g_2\Big(c_t^2 K u_t - \int_{\Omega_L} c^2 K u\, d\nu\Big), \tag{A.11}
\]
for functions $g_i : Y \to \mathbb{R}$, $i = 1, 2$, with
\[
\lim_{y \to 0} \frac{|g_i(y)|}{\|y\|_Y} = 0 \quad \text{for } i = 1, 2. \tag{A.12}
\]
In view of (A.11) and using that $\mathcal{F} \in C^2$ (cf. (A2)), the Cauchy–Schwarz inequality, (A.10), Jensen's inequality, and (4.7)-(4.9), we first estimate
\[
\int_{\Omega_L} \big(\nabla\mathcal{F}(c_t^2 K u_t), c^2 K u - c_t^2 K u_t\big)_Y\, d\nu + \Big(\nabla\mathcal{F}\Big(\int_{\Omega_L} c^2 K u\, d\nu\Big), c_t^2 K u_t - \int_{\Omega_L} c^2 K u\, d\nu\Big)_Y
\]
\[
= \int_{\Omega_L} \Big(\nabla\mathcal{F}(c_t^2 K u_t) - \nabla\mathcal{F}\Big(\int_{\Omega_L} c^2 K u\, d\nu\Big), c^2 K u - c_t^2 K u_t\Big)_Y\, d\nu
\leq \|\mathcal{F}\|_{C^2} \Big\|\int_{\Omega_L} (c^2 K u - c_t^2 K u_t)\, d\nu\Big\|_Y^2
\]
\[
\leq \|\mathcal{F}\|_{C^2} \Big(\int_{\Omega_L} d_\Omega((c, u), (c_t, u_t))\, d\nu\Big)^2
\leq \|\mathcal{F}\|_{C^2} \int_{\Omega_L} d_\Omega^2((c, u), (c_t, u_t))\, d\nu
\leq \|\mathcal{F}\|_{C^2}\, W_2^2(\nu, \mu_t).
\]
Therefore, in order to prove (5.5), it suffices to deal with the $g_1$-, $g_2$-terms in (A.11), i.e., to prove that
\[
\lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{\big|\int_{\Omega_L} g_1(c^2 K u - c_t^2 K u_t)\, d\nu\big|}{W_2(\nu, \mu_t)}
= \lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{\big|g_2\big(c_t^2 K u_t - \int_{\Omega_L} c^2 K u\, d\nu\big)\big|}{W_2(\nu, \mu_t)} = 0. \tag{A.13}
\]
Indeed, for the first limit, by (A.10) and (A.12), and recalling the notation in (5.7), for every $\varepsilon > 0$ there exists $\delta > 0$ such that if $(c, u) \in B_{\delta, t}$, then
\[
|g_1(c^2 K u - c_t^2 K u_t)| \leq \varepsilon\, \|c^2 K u - c_t^2 K u_t\|_Y,
\]
and once again by (A.10),
\[
\int_{\Omega_L} \|c^2 K u - c_t^2 K u_t\|_Y\, d\nu \leq C \int_{\Omega_L} d_\Omega((c, u), (c_t, u_t))\, d\nu \leq C\, W_2(\nu, \mu_t). \tag{A.14}
\]
Since by (4.7)-(4.9) we have $W_2^2(\nu, \mu_t) \geq \delta^2 \nu(\Omega_L \setminus B_{\delta, t})$ (cf. also (5.11)), we estimate
\[
\Big|\int_{\Omega_L} g_1(c^2 K u - c_t^2 K u_t)\, d\nu\Big|
\leq \int_{B_{\delta, t}} |g_1(c^2 K u - c_t^2 K u_t)|\, d\nu + \int_{\Omega_L \setminus B_{\delta, t}} |g_1(c^2 K u - c_t^2 K u_t)|\, d\nu
\]
\[
\leq \varepsilon \int_{B_{\delta, t}} \|c^2 K u - c_t^2 K u_t\|_Y\, d\nu + C\, \nu(\Omega_L \setminus B_{\delta, t})
\leq \varepsilon\, W_2(\nu, \mu_t) + \frac{C}{\delta^2}\, W_2^2(\nu, \mu_t).
\]
Sending $\nu \overset{*}{\rightharpoonup} \mu_t$ here gives
\[
\lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{\big|\int_{\Omega_L} g_1(c^2 K u - c_t^2 K u_t)\, d\nu\big|}{W_2(\nu, \mu_t)} \leq \varepsilon,
\]
and since $\varepsilon > 0$ was arbitrary, the first part of (A.13) follows. For the second one, by (A.14) it holds that
\[
\Big\|c_t^2 K u_t - \int_{\Omega_L} c^2 K u\, d\nu\Big\|_Y \leq C\, W_2(\nu, \mu_t) \to 0 \quad \text{as } \nu \overset{*}{\rightharpoonup} \mu_t, \tag{A.15}
\]
and therefore, using (A.15) and also (A.12), we deduce that
\[
\lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{\big|g_2\big(c_t^2 K u_t - \int_{\Omega_L} c^2 K u\, d\nu\big)\big|}{W_2(\nu, \mu_t)}
\leq C \lim_{\nu \overset{*}{\rightharpoonup} \mu_t} \frac{\big|g_2\big(c_t^2 K u_t - \int_{\Omega_L} c^2 K u\, d\nu\big)\big|}{\big\|c_t^2 K u_t - \int_{\Omega_L} c^2 K u\, d\nu\big\|_Y} = 0,
\]
which concludes the proof.
