An optimization problem on the sphere

We prove existence and uniqueness of the minimizer for the average geodesic distance to the points of a geodesically convex set on the sphere. This implies a corresponding existence and uniqueness result for an optimal algorithm for halfspace learning, when data and target functions are drawn from the uniform distribution.

Authors: Andreas Maurer

Adalbertstr. 55, D-80799 München
November 1, 2021

1 Introduction

Let $S^{n-1}$ be the unit sphere in $\mathbb{R}^n$ with normalized uniform measure $\sigma$ and geodesic metric $\rho$, and let $K$ be a proper convex cone with nonempty interior in $\mathbb{R}^n$. We will show that the function $\psi_K : S^{n-1} \to \mathbb{R}$ defined by
$$\psi_K(w) = \int_{K \cap S^{n-1}} \rho(w, y)\, d\sigma(y)$$
attains its global minimum at a unique point on $S^{n-1}$. While existence of the minimum is straightforward, uniqueness seems surprisingly difficult to prove.

A similar problem has been considered in [2] and [1]. In these works the intention is to define a centroid, so integration is replaced by finite summation and $\rho(w, y)$ is replaced by $\rho(w, y)^2$. Since the problem is rather obvious, it appears likely that a proof of the above result exists somewhere in the literature and we just haven't been able to find it.

2 Optimal halfspace learning

Our motivation to consider this problem arises in learning theory. Specifically, we consider an experiment where

1. A unit vector $u$ is drawn at random from $\sigma$ and kept concealed from the learner.

2. A sample $\mathbf{x} = (x_1, \dots, x_m) \in (S^{n-1})^m$ is generated in $m$ independent random trials of $\sigma$.

3. A label vector $\mathbf{y} = u(\mathbf{x}) \in \{-1, 1\}^m$ is generated according to the rule $y_i = \operatorname{sign}(\langle u, x_i \rangle)$, where $\langle \cdot, \cdot \rangle$ is the euclidean inner product and $\operatorname{sign}(t) = 1$ if $t > 0$ and $-1$ if $t < 0$. The sign of $0$ is irrelevant, because it corresponds to events of probability zero.
4. The labeled sample $(\mathbf{x}, \mathbf{y}) = (\mathbf{x}, u(\mathbf{x}))$ is supplied to the learner.

5. The learner produces a hypothesis $f(\mathbf{x}, \mathbf{y}) \in S^{n-1}$ according to some learning rule $f : (S^{n-1})^m \times \{-1, 1\}^m \to S^{n-1}$.

6. An unlabeled test point $x \in S^{n-1}$ is drawn at random from $\sigma$ and presented to the learner, who produces the label $y = \operatorname{sign}(\langle f(\mathbf{x}, \mathbf{y}), x \rangle)$.

7. If $\operatorname{sign}(\langle u, x \rangle) = y$ the learner is rewarded one unit, otherwise a penalty of one unit is incurred.

We now ask the following question: which learning rule $f$ will give the highest average reward over a very large number of independent repetitions of this experiment? Evidently the optimal learning rule has to minimize the functional
$$\Omega(f) = E_{u \sim \sigma} E_{\mathbf{x} \sim \sigma^m} \Pr_{x \sim \sigma} \left\{ \operatorname{sign}(\langle f(\mathbf{x}, u(\mathbf{x})), x \rangle) \neq \operatorname{sign}(\langle u, x \rangle) \right\}.$$
Now a simple geometric argument shows that for any $v, u \in S^{n-1}$ we have
$$\Pr_{x \sim \sigma} \left\{ \operatorname{sign}(\langle v, x \rangle) \neq \operatorname{sign}(\langle u, x \rangle) \right\} = \rho(v, u)/\pi,$$
relating the misclassification probability to the geodesic distance. For a labeled sample $(\mathbf{x}, \mathbf{y}) \in (S^{n-1})^m \times \{-1, 1\}^m$ we denote
$$C(\mathbf{x}, \mathbf{y}) = \left\{ u \in S^{n-1} : u(\mathbf{x}) = \mathbf{y} \right\}.$$
$C(\mathbf{x}, \mathbf{y})$ is thus the set of all hypotheses consistent with the labeled sample $(\mathbf{x}, \mathbf{y})$. Observe that, given $\mathbf{x}$ and $u$, there is exactly one $\mathbf{y}$ such that $\mathbf{y} = u(\mathbf{x})$, that is $u \in C(\mathbf{x}, \mathbf{y})$. We also have $C(\mathbf{x}, \mathbf{y}) = K(\mathbf{x}, \mathbf{y}) \cap S^{n-1}$, where $K(\mathbf{x}, \mathbf{y})$ is the closed convex cone
$$K(\mathbf{x}, \mathbf{y}) = \left\{ v \in \mathbb{R}^n : \langle v, y_i x_i \rangle \geq 0, \ \forall\, 1 \leq i \leq m \right\}.$$
We therefore obtain
$$\begin{aligned}
\Omega(f) &= \pi^{-1} E_{u \sim \sigma} E_{\mathbf{x} \sim \sigma^m}\, \rho(f(\mathbf{x}, u(\mathbf{x})), u) \\
&= \pi^{-1} E_{\mathbf{x} \sim \sigma^m} \sum_{\mathbf{y} \in \{-1,1\}^m} E_{u \sim \sigma}\, \rho(f(\mathbf{x}, u(\mathbf{x})), u)\, 1_{C(\mathbf{x}, \mathbf{y})}(u) \\
&= \pi^{-1} E_{\mathbf{x} \sim \sigma^m} \sum_{\mathbf{y} \in \{-1,1\}^m} E_{u \sim \sigma}\, \rho(f(\mathbf{x}, \mathbf{y}), u)\, 1_{C(\mathbf{x}, \mathbf{y})}(u) \\
&= \pi^{-1} E_{\mathbf{x} \sim \sigma^m} \sum_{\mathbf{y} \in \{-1,1\}^m} \psi_{K(\mathbf{x}, \mathbf{y})}(f(\mathbf{x}, \mathbf{y})).
\end{aligned}$$
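The relation between the disagreement probability and the geodesic distance is easy to check by simulation. Below is a minimal Monte Carlo sketch (NumPy assumed; the dimension, seed and sample size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 200_000

def random_unit(size, n, rng):
    """Uniform points on S^{n-1}: normalize standard Gaussian vectors."""
    g = rng.standard_normal((size, n))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

u, v = random_unit(2, n, rng)
rho = np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))  # geodesic distance rho(v, u)

# fraction of random test points x on which the two halfspaces disagree
x = random_unit(trials, n, rng)
disagree = np.mean(np.sign(x @ u) != np.sign(x @ v))

print(disagree, rho / np.pi)  # the two values agree up to Monte Carlo error
```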
If $K(\mathbf{x}, \mathbf{y})$ has empty interior then the corresponding summand vanishes, so we can assume that $K(\mathbf{x}, \mathbf{y})$ has nonempty interior. Clearly $-y_i x_i \notin K(\mathbf{x}, \mathbf{y})$ for all example points, so $K(\mathbf{x}, \mathbf{y})$ is a proper cone. Our result therefore applies and asserts the existence of a unique minimizer $f^*(\mathbf{x}, \mathbf{y})$ of the function $\psi_{K(\mathbf{x}, \mathbf{y})}$. The map $f^* : (\mathbf{x}, \mathbf{y}) \mapsto f^*(\mathbf{x}, \mathbf{y})$ is then the unique optimal learning algorithm.

The map $f^*$ also has the symmetry property $f^*(V\mathbf{x}, \mathbf{y}) = V f^*(\mathbf{x}, \mathbf{y})$ for any unitary $V$ on $\mathbb{R}^n$. This is so because $\psi_{K(V\mathbf{x}, \mathbf{y})}(w) = \psi_{K(\mathbf{x}, \mathbf{y})}(V^{-1} w)$, as is easily verified. We will also show that the solution $f^*(\mathbf{x}, \mathbf{y})$ must lie in the cone
$$\left\{ \sum_{i=1}^m \alpha_i y_i x_i : \alpha_i \geq 0 \right\}$$
and that $\psi_{K(\mathbf{x}, \mathbf{y})}$ has no other local minima.

3 Proof of the main result

Notation 1. $\rho(\cdot, \cdot)$ is the geodesic distance and $\sigma$ the Haar measure on $S^{n-1}$. For $A \subseteq \mathbb{R}^n$ we denote $A_1 = \{ x \in A : \|x\| = 1 \} = A \cap S^{n-1}$. 'Cone' will always mean 'convex cone'. For $A \subseteq \mathbb{R}^n$ we denote
$$\hat{A} = \{ x : \langle x, v \rangle \geq 0, \ \forall v \in A \}.$$
This is always a closed convex set. A proper cone $K$ is one which is contained in some closed halfspace. For a set $A$ the indicator function of $A$ will be denoted by $1_A$.

Lemma 2. Let $K$ be a closed cone.
(i) If $w \notin K$ then there is a unit vector $z \in \mathbb{R}^n$ such that $\langle z, w \rangle < 0$ and $\langle z, y \rangle \geq 0$ for all $y \in K$.
(ii) $\hat{\hat{K}} = K$.
(iii) Suppose that $K$ is proper and has nonempty interior, $w \in S^{n-1}$, $w \notin \hat{K} \cup (-\hat{K})$ and $\epsilon > 0$. Then there exists $z$ with $\|z\| = 1$ such that $-\epsilon < \langle z, w \rangle < 0$ and $\langle z, y \rangle > 0$ for all $y \in \hat{K} \setminus \{0\}$.

Proof. (i) Let $B$ be an open ball containing $w$ such that $K \cap B = \emptyset$. Define $O = \{ \lambda x : x \in B, \lambda > 0 \}$. Then $K$ and $O$ are nonempty disjoint convex sets and $O$ is open. By the Hahn-Banach theorem ([4], Theorem 3.4) there are $\gamma \in \mathbb{R}$ and $z \in \mathbb{R}^n$ such that
$$\langle z, x \rangle < \gamma \leq \langle z, y \rangle, \quad \forall x \in O,\ y \in K.$$
Choosing $y = 0 \in K$ gives $\gamma \leq 0$; letting $\lambda \to 0$ in $\langle z, \lambda w \rangle < \gamma$ shows $\gamma \geq 0$, so that $\gamma = 0$. The normalization is trivial.

(ii) Trivially $K \subseteq \hat{\hat{K}}$. On the other hand, if $w \notin K$ let $z$ be the vector from part (i). Then $z \in \hat{K}$ but $\langle w, z \rangle < 0$, so that $w \notin \hat{\hat{K}}$.

(iii) Since $w \notin \hat{K}$ there exists $x_1 \in K$ such that $\langle w, x_1 \rangle < 0$. Since the interior of $K$ is nonempty, $K$ is the closure of its interior (Theorem 6.3 in [3]), so we can assume $x_1 \in \operatorname{int}(K)$. Similarly, since $w \notin (-\hat{K})$ we have $-w \notin \hat{K}$, so there is $x_2 \in \operatorname{int}(K)$ with $\langle -w, x_2 \rangle < 0$, that is $\langle w, x_2 \rangle > 0$. Since the interior of $K$ is convex it contains the segment $[x_1, x_2]$, so by continuity of $\langle w, \cdot \rangle$ there is some $x_0 \in \operatorname{int}(K)$ with $\langle w, x_0 \rangle = 0$. Since $K$ is a proper cone, $0 \notin \operatorname{int}(K)$, and we can assume that $\|x_0\| = 1$. Let $c > 0$ be such that $x' \in K$ whenever $\|x_0 - x'\| \leq c$. We define
$$z = (1 - \eta)^{1/2} x_0 - \eta^{1/2} w, \quad \text{where } 0 < \eta < \min\left\{ \frac{c^2}{1 + c^2},\ \epsilon^2 \right\}.$$
Since $\langle w, x_0 \rangle = 0$ it is clear that $z$ is a unit vector. Also $\langle w, z \rangle = -\eta^{1/2} > -\epsilon$, and for any $y \in \hat{K}_1$ we have $x_0 - c y \in K$, so $\langle y, x_0 - c y \rangle \geq 0$ and
$$\langle y, z \rangle = (1 - \eta)^{1/2} \left( \langle y, x_0 - c y \rangle + c \langle y, y \rangle \right) - \eta^{1/2} \langle y, w \rangle \geq (1 - \eta)^{1/2} c - \eta^{1/2} > 0.$$

Theorem 3. Let $K \subset \mathbb{R}^n$ be a closed proper cone with nonempty interior, let $g : [0, \pi] \to \mathbb{R}$ be continuous, and let the function $\psi : S^{n-1} \to \mathbb{R}$ be defined by
$$\psi(w) = \int_{K_1} g(\rho(w, y)) \, d\sigma(y).$$
(i) $\psi$ attains its global minimum on $S^{n-1}$.
(ii) If $g$ is increasing then every local minimum of $\psi$ must lie in $\hat{K} \cup (-\hat{K})$ and every global minimum of $\psi$ must lie in $K \cap \hat{K}$.
(iii) If $g$ is increasing and convex in $[0, \pi/2]$ then the global minimum of $\psi$ is unique and corresponds to the only local minimum outside $-\hat{K}$.
(iv) If $g$ is increasing, convex in $[0, \pi/2]$ and concave in $[\pi/2, \pi]$, then the global minimum of $\psi$ is unique and corresponds to its only local minimum on $S^{n-1}$.

Proof. (i) is immediate, since $S^{n-1}$ is compact and $\psi$ is continuous.

(ii) Fix $w \in S^{n-1}$, $w \notin \hat{K} \cup (-\hat{K})$. We will first show that there can be no local minimum of $\psi$ at $w$. Let $\epsilon > 0$ be arbitrary and choose $z$ as in Lemma 2 (iii). The functional $z$ divides the sphere $S^{n-1}$ into two open hemispheres
$$L = \{ u : \langle z, u \rangle < 0 \} \quad \text{and} \quad R = \{ u : \langle z, u \rangle > 0 \},$$
and an equator of $\sigma$-measure zero. Note that $w \in L$ and $\hat{K}_1 \subseteq R$. We can write
$$c = \min_{y \in \hat{K}_1} \langle y, z \rangle > 0,$$
since $\hat{K}_1$ is compact and $y \mapsto \langle y, z \rangle$ is continuous. With $V$ we denote the reflection operator which exchanges points of $L$ and $R$,
$$V x = -\langle x, z \rangle z + (x - \langle x, z \rangle z).$$
$V$ is easily verified to be an isometry with $V^2 = I$. Suppose now that $u \in R$ and $V u \in K$. We claim that $u$ is in the interior of $K$. Indeed, if $u' \in \mathbb{R}^n$ satisfies $\|u - u'\| < 2 \langle u, z \rangle c$, then for all $y \in \hat{K}_1$ we have
$$\langle u', y \rangle = \langle u, y \rangle - \langle u - u', y \rangle \geq \langle u, y \rangle - 2 \langle u, z \rangle c \geq \langle u, y \rangle - 2 \langle u, z \rangle \langle z, y \rangle = \langle V u, y \rangle \geq 0,$$
so $u' \in \hat{\hat{K}} = K$ by part (ii) of the lemma. This establishes the claim and shows that $V(K) \cap R$ is contained in the interior of $K$. It follows that
$$\forall u \in R, \quad 1_K(u) \geq 1_K(V u). \quad (1)$$
Also $V(K) \cap R$ is relatively closed in $R$, while $\operatorname{int}(K) \cap R$ is open in $R$. Since $R$ is connected they can only coincide if $V(K) \cap R = R$. But this is impossible, since then
$$L \cup R = V(V(K) \cap R) \cup (V(K) \cap R) = (K \cap L) \cup (V(K) \cap R) \subseteq (K \cap L) \cup \operatorname{int}(K) \subseteq K,$$
and $K$ is assumed to be a proper cone. So $V(K) \cap R$ is a proper subset of $\operatorname{int}(K) \cap R$. The inequality (1) is therefore strict on the nonempty open set $(\operatorname{int}(K) \cap R) \setminus (V(K) \cap R)$.
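The reflection above simplifies to the Householder map $Vx = x - 2\langle x, z \rangle z$. A short numerical sketch (NumPy assumed; dimension and seed are arbitrary) of the two properties claimed for it, isometry and $V^2 = I$:

```python
import numpy as np

rng = np.random.default_rng(1)

# a unit normal z and the reflection V x = x - 2 <x, z> z
z = rng.standard_normal(4)
z /= np.linalg.norm(z)

def V(x):
    """Reflection exchanging the half-spaces <z, .> < 0 and <z, .> > 0."""
    return x - 2.0 * np.dot(x, z) * z

x = rng.standard_normal(4)
is_isometry = np.isclose(np.linalg.norm(V(x)), np.linalg.norm(x))
is_involution = np.allclose(V(V(x)), x)
print(is_isometry, is_involution)  # True True
```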
Using the isometry and unipotence of $V$ we now obtain
$$\begin{aligned}
\psi(w) - \psi(V w) &= \int_R \left( g(\rho(w, u)) - g(\rho(V w, u)) \right) 1_K(u) \, d\sigma(u) + \int_L \left( g(\rho(w, u)) - g(\rho(V w, u)) \right) 1_K(u) \, d\sigma(u) \\
&= \int_R \left( g(\rho(w, u)) - g(\rho(V w, u)) \right) \left( 1_K(u) - 1_K(V u) \right) d\sigma(u) > 0.
\end{aligned}$$
The inequality holds because the first factor $g(\rho(w, u)) - g(\rho(V w, u))$ in the last integral is always positive for $u \in R$, since $g$ is increasing and $\rho$ is increasing in the euclidean distance. The second factor is nonnegative and positive on a set of positive measure. Since $\|w - V w\| = 2 |\langle w, z \rangle| < 2\epsilon$ and $\epsilon > 0$ was arbitrary, we see that every neighborhood of $w$ contains a point where $\psi$ has a smaller value than at $w$. We conclude that $w$ cannot be a local minimum of $\psi$, which proves the first assertion of (ii).

If $w \notin K$, choose $z$ as in part (i) of the lemma and let $W$ be the isometry $W x = -\langle x, z \rangle z + (x - \langle x, z \rangle z)$. Then for all $u \in K$ we have $g(\rho(w, u)) > g(\rho(W w, u))$, so $\psi(w) > \psi(W w)$ and $w$ cannot be a global minimizer of $\psi$. So every global minimizer must be in $K \cap (\hat{K} \cup (-\hat{K}))$. Since $K_1 \cap (-\hat{K}_1)$ is obviously empty, the second assertion of (ii) follows.

(iii) Now let $w_1, w_2 \in \hat{K}_1$ with $w_1 \neq w_2$. Connect them with a geodesic in $\hat{K}_1$ and let $w^* \in \hat{K}_1$ be the midpoint of this geodesic, so that $\rho(w_1, w^*) = \rho(w^*, w_2) = \rho(w_1, w_2)/2 \leq \pi/2$. We define a map $U$ by
$$U x = \langle x, w^* \rangle w^* - (x - \langle x, w^* \rangle w^*).$$
Geometrically $U$ is the reflection on the one-dimensional subspace generated by $w^*$. Note that $w_2 = U w_1$, that $\rho(u, U u) = 2 \rho(u, w^*)$ if $\rho(u, w^*) \leq \pi/2$, and that $\rho(u, U u) = 2\pi - 2\rho(u, w^*)$ if $\rho(u, w^*) \geq \pi/2$. Take any $u \in K_1$. Since $w^* \in \hat{K}_1$ we have $\rho(u, w^*) \leq \pi/2$, whence $\rho(u, U u) = 2 \rho(u, w^*)$.
All four points $w_1, w_2, u$ and $U u$ have at most distance $\pi/2$ from $w^*$ and therefore lie, together with $w^*$, on a common hemisphere. By the triangle inequality
$$2 \rho(u, w^*) = \rho(u, U u) \leq \rho(u, w_1) + \rho(w_1, U u) = \rho(u, w_1) + \rho(U w_1, U U u) = \rho(u, w_1) + \rho(w_2, u).$$
If $u$ does not lie on the geodesic through $w_1$ and $w_2$ and not at distance $\pi/2$ from $w^*$, strict inequality holds, and since $K_1$ has nonempty interior, strict inequality holds on an open subset of $K_1$. If $g$ is increasing and convex in $[0, \pi/2]$, then dividing by $2$, applying $g$ and integrating over $K_1$ we get
$$\psi(w^*) < (1/2) \left( \psi(w_1) + \psi(w_2) \right).$$
It follows that there can be at most one point in $\hat{K}_1$ where the gradient of $\psi$ vanishes, and this point, if it exists, must correspond to a local minimum. By (ii) this is the unique global minimum and the only local minimum outside $-\hat{K}$, which establishes (iii).

(iv) If $w_1, w_2 \in -\hat{K}_1$ and $w^* \in -\hat{K}_1$ is their midpoint, then for $u \in K$ we obtain, using $\rho(w_i, u) = \pi - \rho(-w_i, u)$ and a reasoning analogous to the above,
$$\rho(u, w^*) \geq (1/2) \left( \rho(u, w_1) + \rho(u, w_2) \right),$$
the inequality being again strict on a set of positive measure and preserved under application of a function $g$ which is increasing and concave in $[\pi/2, \pi]$, so that
$$\psi(w^*) > (1/2) \left( \psi(w_1) + \psi(w_2) \right).$$
It again follows that there can be at most one point in $-\hat{K}_1$ where the gradient of $\psi$ vanishes, and this point must now correspond to a local maximum. We conclude that $\psi$ has a unique local minimum, which lies in $\hat{K}_1$.

Remark. An example of a function as in (iii) is $g(t) = t^2$, in which case the minimizer is the spherical mass centroid considered in [2] and [1]. An example of a function as in (iv) is of course the identity function, in which case we obtain the result stated in the introduction.
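The strict midpoint inequality at the heart of part (iii) can also be observed numerically. A minimal Monte Carlo sketch (NumPy assumed), taking $K$ to be the positive octant in $\mathbb{R}^3$, which is self-dual ($\hat{K} = K$), $g$ the identity, and $w_1, w_2$ two coordinate axes in $\hat{K}_1$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo sample of K_1 = K ∩ S^2 for K the positive octant
y = rng.standard_normal((80_000, 3))
y /= np.linalg.norm(y, axis=1, keepdims=True)
y = y[(y >= 0).all(axis=1)]  # roughly 1/8 of the points survive

def psi(w):
    """Average geodesic distance from w to the sampled points of K_1."""
    return np.arccos(np.clip(y @ w, -1.0, 1.0)).mean()

w1 = np.array([1.0, 0.0, 0.0])
w2 = np.array([0.0, 1.0, 0.0])
mid = (w1 + w2) / np.linalg.norm(w1 + w2)  # geodesic midpoint, lies in K-hat_1

lhs, rhs = psi(mid), 0.5 * (psi(w1) + psi(w2))
print(lhs < rhs)  # True: psi(w*) < (psi(w1) + psi(w2)) / 2
```

Since $2\rho(u, w^*) \leq \rho(u, w_1) + \rho(u, w_2)$ holds pointwise on $K_1$, with strictness off the geodesic, every sampled term favors the midpoint, so the estimate is stable even at moderate sample sizes.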
We could also set $g(t) = 2(1 - \cos t)$, in which case the function reads
$$\psi(w) = \int_{K_1} \| w - y \|^2 \, d\sigma(y).$$
In this case uniqueness of the minimum can be established with much simpler methods.

Acknowledgement. The author is grateful to Andreas Argyriou, Massimiliano Pontil and Erhard Seiler for many encouraging discussions.

References

[1] S. R. Buss and J. P. Fillmore. Spherical averages and applications to spherical splines and interpolation. ACM Trans. Graph., 20(2):95-126, 2001.

[2] G. A. Gal'perin. A concept of the mass center of a system of material points in the constant curvature spaces. Comm. Math. Phys., 154(1):63-84, 1993.

[3] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970.

[4] W. Rudin. Functional Analysis. McGraw-Hill, 1974.
