Self Organizing Map algorithm and distortion measure

We study the statistical meaning of the minimization of the distortion measure and the relation between the equilibrium points of the SOM algorithm and the minima of the distortion measure. If we assume that the observations and the map lie in a compact Euc…

Authors: Joseph Rynkiewicz (CES, Samos)

Self Organizing Map algorithm and distortion measure Joseph Rynkiewiz SAMOS/MA TISSE, Univ ersité de P aris-I, 90, rue de T olbia 75013 P aris, F rane, T el. and F ax : (+33) 144078705 joseph.rynkiewizuniv-paris1.fr 1 Self Organizing Map algorithm and distortion measure Abstrat W e study the statistial meaning of the minimization of distortion measure and the relation b et w een the equilibrium p oin ts of the SOM algorithm and the minima of distortion measure. If w e assume that the observ ations and the map lie in an ompat Eulidean spae, w e pro v e the strong onsisteny of the map whi h almost minimizes the empirial distortion. Moreo v er, after alulating the deriv ativ es of the theoretial distortion measure, w e sho w that the p oin ts minimizing this measure and the equilibria of the K ohonen map do not mat h in general. W e illustrate, with a simple example, ho w this o urs. k eyw ords Distortion measure, asymptoti on v ergene, onsisteny , Self Organizing Map, empiri- al pro esses, Gliv enk o-Can telli lass, uniform la w of large n um b ers, general neigh b orho o d funtion. SOM and distortion measure 3 1 In tro dution The distortion or distortion measure, is ertainly the most p opular riterion for assessing the qualit y of the lassiation of a K ohonen map (see K ohonen [8℄). This measure yields an assessmen t of mo del prop erties with resp et to the data and o v eromes the absene of ost funtion in the SOM algorithm. Moreo v er, the SOM algorithm has b een pro v en to b e an appro ximation for the gradien t of distortion measure (see Graep el et al.[6 ℄). Although the K ohonen map is pro v en to on v erge sometimes on equilibria p oin ts, when the n um b er of observ ations tends to innit y , the learning dynami annot b e desrib ed b y a gradien t desen t of distortion measure for an innite n um b er of observ ations (see for example Erwin et al. [2℄). 
Moreover, Kohonen [9] has shown in some examples for the one-dimensional grid that the model vectors produced by the SOM algorithm do not exactly coincide with the optimum of the distortion measure. This property seems paradoxical: on the one hand, SOM seems to minimize the distortion for a finite number of observations, but this behavior no longer holds in the limit, i.e. for an infinite number of observations.

In this paper we investigate the relationship between the SOM and the distortion measure. First, we prove the strong consistency of the estimator minimizing the empirical distortion. More precisely, we prove that the maps almost minimizing the empirical distortion measure converge almost surely to the set of maps minimizing the theoretical distortion measure. Second, we calculate the derivatives of the theoretical distortion and deduce from this calculation that the points minimizing the theoretical distortion generally differ from the equilibrium points of the SOM, whatever the dimension of the grid. Finally, we illustrate, with a simple example, why an apparent contradiction between the discrete and the continuous case occurs.

2 Distortion measure
We assume in the sequel that the observations $\omega$ are independent and identically distributed (i.i.d.) and of dimension $d$. We assume that the observations lie in a compact space; therefore, without loss of generality, they lie in the compact space $[0,1]^d$. We also assume that these observations follow the probability law $P$, which has a density with respect to the Lebesgue measure of $\mathbb{R}^d$, and that this density is bounded by a constant $B$. In the sequel we call centroid a vector of $[0,1]^d$ representing a class of observations $\omega$. We adopt the notation of Cottrell et al. [1].
Denition 2.1 F or e ∈ N ∗ , e ≤ d , we  onsider a set of units indexe d by I ⊂ Z e with the neighb or- ho o d funtion Λ fr om I − I := { i − j, i, j ∈ I } to [0 , 1] satisfying Λ ( k ) = Λ ( − k ) and Λ (0) = 1 , note that suh neighb orho o d funtion  an b e disr ete or  ontinuous. Denition 2.2 Note k . k the Eulide an norm, let D δ I :=  x : = ( x i ) i ∈ I ∈  [0 , 1] d  I , suh that k x i − x j k ≥ δ if i 6 = j  b e the set of  entr oids x i sep ar ate d by, at le ast, δ . Denition 2.3 if x : = ( x i ) i ∈ I is the set of units, the V or onoi tessel lation ( C i ( x )) i ∈ I is dene d by C i ( x ) := n ω ∈ [0 , 1] d |k x i − ω k < k x k − ω k if k 6 = i o In  ase of e quality we assign ω ∈ C i ( x ) thanks to the lexi o gr aphi al or der. Conversely, the index of the V or onoi tessel lation for an observation ω wil l b e dene d by C − 1 x ( ω ) = i ∈ I , if and only if ω ∈ C i ( x ) SOM and distortion measure 5 Denition 2.4 distortion me asur es the quality of a quanti ation with r esp e t to the neighb orho o d strutur e. It is dene d as fol lows: • Distortion for the disr ete  ase (empiri al distortion): W e assume that the observations ar e in a nite set Ω = { ω 1 , · · · , ω n } and ar e uniformly distribute d on this set. Then, distortion me asur e is V n ( x ) = 1 2 n X i ∈ I X ω ∈ C i ( x )   X j ∈ I Λ ( i − j ) k x j − ω k 2   • Distortion for the  ontinuous  ase (the or eti al distortion): L et us assume that P is the distri- bution funtion of the observations. The the or eti al distortion me asur e is V ( x ) = 1 2 X i,j ∈ I Λ ( i − j ) Z C i ( x ) k x j − ω k 2 dP As mentione d b efor e the distribution P has a density with r esp e t to the L eb esgue me asur e b ounde d by a  onstant B > 0 . The distortion measure is w ell kno wn to b e not on tin uous with resp et to the en troids ( x i ) i ∈ I for the disrete ase. 
Indeed, if an observation lies exactly on a hyperplane separating two centroids, shifting one of these centroids causes a jump in the distortion. So the distortion is not continuous and, in general, a map that realizes the minimum of the empirical distortion does not exist. However, if we consider sequences of maps $x_n$ such that $V_n(x_n)$ is sufficiently close to its infimum, we will show that such sequences converge almost surely to the set of maps that reach the minimum of the theoretical distortion measure $V(x)$.

3 Consistency of the almost minimum of distortion
This demonstration is an extended version of Rynkiewicz [11]. It follows the same line as Pollard [10]: we first show a uniform law of large numbers and then deduce the strong consistency property.

3.1 Uniform law of large numbers
Let the family of functions be
$$\mathcal{G} := \left\{ g_x(\omega) := \sum_{j\in I}\Lambda\left(C_x^{-1}(\omega) - j\right)\|x_j - \omega\|^2 \ \text{ for } x \in D^\delta_I \right\} \tag{1}$$
In order to show the uniform law of large numbers, we have to prove that
$$\sup_{x\in D^\delta_I}\left|\int g_x(\omega)\,dP_n(\omega) - \int g_x(\omega)\,dP(\omega)\right| \xrightarrow[n\to\infty]{a.s.} 0 \tag{2}$$
where $P_n$ is the empirical measure of the first $n$ observations. This is the required property since, for every probability measure $Q$ on $[0,1]^d$,
$$\int g_x(\omega)\,dQ(\omega) = \int\sum_{j\in I}\Lambda\left(C_x^{-1}(\omega)-j\right)\|x_j-\omega\|^2\,dQ(\omega) = \sum_{i,j\in I}\Lambda(i-j)\int_{C_i(x)}\|x_j-\omega\|^2\,dQ(\omega) \tag{3}$$
so that $\int g_x\,dP_n = 2V_n(x)$ and $\int g_x\,dP = 2V(x)$.

Now, a sufficient condition for (2) is the following (see Gaenssler and Stute [5]): for all $\varepsilon > 0$ and all $x^0 \in D^\delta_I$, there exists a neighborhood $S(x^0)$ of $x^0$ such that
$$\int g_{x^0}(\omega)\,dP(\omega) - \varepsilon < \int\left(\inf_{x\in S(x^0)} g_x(\omega)\right)dP(\omega) \le \int\left(\sup_{x\in S(x^0)} g_x(\omega)\right)dP(\omega) < \int g_{x^0}(\omega)\,dP(\omega) + \varepsilon \tag{4}$$
First we prove the following result, using a technique similar to the proof of Lemma 11 of Fort and Pagès [3].

Lemma 3.1 Let $x \in D^\delta_I$ and let $\lambda$ be the Lebesgue measure on $[0,1]^d$. Denote by $E^c$ the complement of a set $E$ in $[0,1]^d$ and by $|I|$ the cardinality of the set $I$. For $0 < \alpha < \frac{\delta}{2}$, let
$$U^\alpha_i(x) = \left\{\omega \in [0,1]^d \ /\ \exists y \in D^\delta_I,\ x_j = y_j \text{ if } j \neq i,\ \|x_i - y_i\| < \alpha \text{ and } \omega \in C_i^c(y) \cap C_i(x)\right\}$$
be the set of $\omega$ changing Voronoi cells when the centroid $x_i$ moves a distance of at most $\alpha$. Then
$$\sup_{x\in D^\delta_I}\lambda\left(U^\alpha_i(x)\right) < (|I|-1)\left(\frac{2\alpha}{\delta} + \alpha\right)\left(\sqrt{2}\right)^{d-1} \tag{5}$$

Proof. Let $x$ and $y \in D^\delta_I$ satisfy the assumptions of Lemma 3.1 and let $j \neq i \in I$. In order to prove the inequality, we have to bound the measure of the $\omega$ belonging to the cells $C_i(x)$ and $C_j(y)$ simultaneously, since $(C_i(y))^c = \bigcup_{j\in I,\, j\neq i} C_j(y)$. Denote by $(z \mid t)$ the inner product between $z$ and $t$, and let $\vec n^{ij}_x := \frac{x_j - x_i}{\|x_j - x_i\|}$. The parameter vector $x + \gamma_1\vec n^{ij}_x$ is the vector with all components equal to those of $x$ except component $i$, which is equal to $x_i + \gamma_1\vec n^{ij}_x$. Since $\|y_i - x_i\| < \alpha$, we have $\left(y_i - x_i \mid \vec n^{ij}_x\right) = \gamma_1$ with $|\gamma_1| \le \alpha < \frac{\delta}{2}$. As the Lebesgue measure (of $\mathbb{R}^{d-1}$) of every plane section of $[0,1]^d$ is bounded by $\left(\sqrt{2}\right)^{d-1}$, when the centroid $x_i$ moves by $\gamma_1\vec n^{ij}_x$ the separating hyperplane moves by $\frac{|\gamma_1|}{2}$, so the Lebesgue measure of the $\omega$ changing Voronoi cells is bounded by $\frac{|\gamma_1|}{2}\left(\sqrt{2}\right)^{d-1}$, and
$$\lambda\left(C_j\left(x + \gamma_1\vec n^{ij}_x\right) \cap C_i(x)\right) < \alpha\left(\sqrt{2}\right)^{d-1} \tag{6}$$
Moreover, we note that $x + \gamma_1\vec n^{ij}_x$ belongs to $D^{\delta/2}_I$. On the other hand, let $y_i - x_i - \gamma_1\vec n^{ij}_x := \gamma_2\vec\tau^{ij}_x$, with $\|\vec\tau^{ij}_x\| = 1$, be the component of the movement of $x_i$ to $y_i$ orthogonal to $\vec n^{ij}_x$, i.e. such that $\left(\vec n^{ij}_x \mid \vec\tau^{ij}_x\right) = 0$.
As it is sho wn in gure (1), in dimension 2, for all x ′ = x + γ 1 − → n ij x ∈ D δ 2 I , the Leb esgue measure of ω  hanging of V oronoi ells for a mo v emen t of en troid x ′ i , of γ 2 − → τ ij x is b ounded b y 2 α δ  √ 2  d − 1 . SOM and distortion measure 8 Therefore, w e ha v e λ  C j  x + γ 1 − → n ij x + γ 2 − → τ ij x  ∩ C i ( x )  < α  √ 2  d − 1 + 2 α δ  √ 2  d − 1 (7) Figure 1: hat hed area < 2 γ 2 δ < √ 2 × 2 α δ γ 2 γ 2 0 1 2 2 δ δ/2 /2 < x’ x j i As this inequalit y is indep enden t of x , nally w e get: sup x ∈ D δ I λ  C j  x + γ 1 − → n ij x + γ 2 − → τ ij x  ∩ C i ( x )  <  α + 2 α δ   √ 2  d − 1 (8) then sup x ∈ D δ I λ ( U α i ( x )) < ( | I | − 1)  α + 2 α δ   √ 2  d − 1  No w onsider x 0 ∈ D δ I and S ( x 0 ) a neigh b orho o d of x 0 inluded in a sphere of radius α . Let W ( x 0 ) b e the set of ω remaining in their V oronoi ells when x 0 go to an y x ∈ S ( x 0 ) . F or all SOM and distortion measure 9 ω ∈ W ( x 0 ) w e ha v e inf x ∈ S ( x 0 ) g x ( ω ) ≥ g x 0 ( ω ) − P j ∈ I Λ  C − 1 x 0 ( ω ) − j   k x 0 j − ω k 2 − inf x ∈ S ( x 0 ) k x 0 j − ω k 2  ≥ g x 0 j ( ω ) − P j ∈ I  k x 0 j − ω k 2 − inf x ∈ S ( x 0 ) k x 0 j − ω k 2  (9) F or all ω ∈ [0 , 1] d , for a small enough α , w e ha v e  k x 0 j − ω k 2 − inf x ∈ S ( x 0 ) k x j − ω k 2  < ε 2 B | I | so Z W ( x 0 ) X j ∈ I  k x 0 j − ω k 2 − inf x ∈ S ( x 0 ) k x j − ω k 2  dP ( ω ) < ε 2 and Z W ( x 0 )  g x 0 ( ω ) − inf x ∈ S ( x 0 ) g x ( ω )  < ε 2 (10) No w, let W ( x 0 ) c b e the set of ω  hanging of V oronoi ells when the en troids go from x 0 to x ∈ S x 0 . F or all ω ∈ W ( x 0 ) c there exist t w o dieren t indies i and j su h that ω ∈ C i ( x 0 ) and ω ∈ C j ( x ) . 
Let us dene a sequene x k , k ∈ { 0 , · · · , k I |} , b y sequen tially  hanging the omp onen ts of x 0 in to the omp onen ts of x su h that x | I | = x ( x k is the set of in termediate ongurations to transform x 0 in x ), then there exists a momen t l ∈ { 0 , · · · , | I | − 1 } , su h that ω ∈ C i ( x l ) and ω / ∈ C i ( x l +1 ) . Indeed, if it w ere not the ase, y ou ould nd a sequene x k , k ∈ { 0 , · · · , k I |} , with x | I | = x su h that ω ∈ C i ( x | I | ) = C i ( x ) , whi h w ould b e a on tradition. So W ( x 0 ) c is inluded in the set of ω whi h  hange of V oronoi set when w e  hange sequen tially the omp onen ts of x 0 b y the omp onen ts of x . If α < δ 4 , then when the omp onen ts x 0 i of x 0 are mo ving sequen tially from x 0 to x i of x , ea h in termediate onguration sta ys in D δ 2 I . Sine, for all i ∈ I , k x i − ω k 2 is b ounded b y 1 on [0 , 1] d , the lemma 3.1, assure that Z W ( x 0 ) c g x ( ω ) dP ( ω ) < B | I | ( | I | − 1))  4 α δ + α   √ 2  d − 1 (11) SOM and distortion measure 10 Finally , if w e  ho ose a small enough α su h that B | I | ( | I | − 1))  4 α δ + α   √ 2  d − 1 < ε 2 , w e get Z D δ I g x 0 ( ω ) dP ( ω ) − ε < Z D δ I  inf x ∈ S ( x 0 ) g x ( ω )  dP ( ω ) (12) Exatly in the same w a y , for a small enough α , w e get Z D δ I sup x ∈ S ( x 0 ) g x ( ω ) ! dP ( ω ) < Z D δ I g x 0 ( ω ) dP ( ω ) + ε (13) Therefore, the suien t ondition for the uniform la w of large n um b ers is true. 3.2 Consisteny W e w an t to sho w the onsisteny of the pro edure in v olving  ho osing maps ( x n ) n ∈ N ∗ whi h almost minimizes the empirial distortions ( V n ( x )) n ∈ N ∗ in D δ I . Let ¯ χ β n := ( x ∈ D δ I su h that V n ( x ) < inf x ∈ D δ I V n ( x ) + 1 β ( n ) ) (14) b e the set of estimators that almost minimize the empirial distortion, with β ( n ) b eing a stritly p ositiv e funtion, su h that lim n → + ∞ β ( n ) = ∞ . 
Let $\bar\chi = \arg\min_{x\in D^\delta_I} V(x)$ be the set of maps minimizing the theoretical distortion, possibly reduced to a single map. It is easy to verify that the function $x \mapsto V(x)$ is continuous on $D^\delta_I$; since $D^\delta_I$ is compact, for every neighborhood $N$ of $\bar\chi$ there exists $\eta(N) > 0$ such that
$$\forall x \in D^\delta_I \setminus N,\quad V(x) > \min_{x\in D^\delta_I} V(x) + \eta(N) \tag{15}$$
To show the strong consistency, it is enough to prove that for every neighborhood $N$ of $\bar\chi$ we have
$$\lim_{n\to\infty}\bar\chi^\beta_n \overset{a.s.}{\subset} N \iff \lim_{n\to\infty} V\left(\bar\chi^\beta_n\right) - V(\bar\chi) \overset{a.s.}{\le} \eta(N) \tag{16}$$
with $V(E) - V(F) := \sup\{V(x) - V(y) \text{ for } x \in E \text{ and } y \in F\}$. By definition $V_n\left(\bar\chi^\beta_n\right) \overset{a.s.}{\le} V_n(\bar\chi) + \frac{1}{\beta(n)}$, and the uniform law of large numbers yields $\lim_{n\to\infty} V_n(\bar\chi) - V(\bar\chi) \overset{a.s.}{=} 0$; we then get $\lim_{n\to\infty} V_n\left(\bar\chi^\beta_n\right) \overset{a.s.}{\le} V(\bar\chi) + \frac{\eta(N)}{2}$. Moreover, we have $\lim_{n\to\infty} V\left(\bar\chi^\beta_n\right) - V_n\left(\bar\chi^\beta_n\right) \overset{a.s.}{=} 0$ and
$$\lim_{n\to\infty} V\left(\bar\chi^\beta_n\right) - \frac{\eta(N)}{2} \overset{a.s.}{<} \lim_{n\to\infty} V_n\left(\bar\chi^\beta_n\right) \overset{a.s.}{\le} V(\bar\chi) + \frac{\eta(N)}{2} \tag{17}$$
and finally $\lim_{n\to\infty} V\left(\bar\chi^\beta_n\right) - V(\bar\chi) \overset{a.s.}{\le} \eta(N)$. This proves the strong consistency of the maps which almost minimize the empirical distortion.

4 Differences between the SOM algorithm and the distortion measure
Using the result of the previous section, we can investigate the differences between the minima of the empirical distortion and the equilibria of the SOM algorithm. Namely, if these equilibria were maps almost minimizing the empirical distortion criterion, they would converge, as the number of observations increases, to the minimum of the theoretical distortion measure; we will show that this is not generally the case. In the next subsection we compute the gradient of the function $V(x)$ and show that, even in multidimensional cases, the equilibria of the SOM algorithm and the minima of $V(x)$ do not match. These results generalize the results of Kohonen [9] obtained for unidimensional cases.
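The mismatch announced here can be previewed numerically before the general computation. The sketch below assumes the three-unit string of Section 4.2 with $\Lambda(0) = \Lambda(\pm 1) = 1$, $\Lambda(\pm 2) = 0$ and $P$ uniform on $[0, 1]$. It iterates a batch SOM update, whose fixed points satisfy the equilibrium condition (27) stated below, and then differentiates the closed-form $V$ by central finite differences at that equilibrium; the starting map and the step $h = 10^{-6}$ are arbitrary choices.

```python
def lam(k):
    # Assumed neighborhood of the three-unit string (consistent with (29)).
    return 1.0 if abs(k) <= 1 else 0.0


def cells(x):
    # Voronoi cells [a, b) of increasingly sorted 1-D centroids on [0, 1].
    edges = [0.0] + [(x[k] + x[k + 1]) / 2 for k in range(len(x) - 1)] + [1.0]
    return list(zip(edges[:-1], edges[1:]))


def V(x):
    # Theoretical distortion under the uniform law on [0, 1] (closed form).
    return 0.5 * sum(lam(i - j) * ((b - x[j]) ** 3 - (a - x[j]) ** 3) / 3
                     for i, (a, b) in enumerate(cells(x))
                     for j in range(len(x)))


def batch_som_step(x):
    # x_i <- sum_k lam(i-k) int_{C_k} w dw / sum_k lam(i-k) |C_k|.
    # A fixed point satisfies sum_k lam(i-k) int_{C_k} (x_i - w) dw = 0, i.e. (27).
    cs = cells(x)
    return [sum(lam(i - k) * (b * b - a * a) / 2 for k, (a, b) in enumerate(cs))
            / sum(lam(i - k) * (b - a) for k, (a, b) in enumerate(cs))
            for i in range(len(x))]


def grad_V(x, h=1e-6):
    # Central finite differences of the closed-form V.
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((V(up) - V(down)) / (2 * h))
    return grad


x = [0.2, 0.5, 0.8]
for _ in range(100):
    x = batch_som_step(x)
print("SOM equilibrium:", [round(v, 6) for v in x])
print("gradient of V there:", [round(g, 6) for g in grad_V(x)])
```

Starting from $(0.2, 0.5, 0.8)$, the iteration settles at $(0.3, 0.5, 0.7)$, where the gradient of $V$ is approximately $(-0.0225, 0, 0.0225)$: the SOM equilibrium is not a critical point of $V$. These values agree with the extra terms of (30): $-\frac{1}{4}(0.7 - 0.4)^2 = -0.0225$ for $x_1$ and $\frac{1}{4}(0.3 - 0.6)^2 = 0.0225$ for $x_3$, the two contributions cancelling for $x_2$.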
4.1 Differentiability of V(x)
Let us now write
$$D_I := \left\{ x = \left(x_i = (x^1_i, \dots, x^d_i)\right)_{i\in I} \in \left([0,1]^d\right)^I \ :\ \forall k \in \{1, \dots, d\},\ \left|x^k_i - x^k_j\right| > 0 \text{ if } i \neq j \right\} \tag{18}$$
For $i$ and $j \in I$, denote by $\vec n^{ij}_x$ the vector $\frac{x_j - x_i}{\|x_j - x_i\|}$ and let
$$M^{ij}_x := \left\{ u \in \mathbb{R}^d \ /\ \left\langle u - \frac{x_i + x_j}{2},\ x_i - x_j \right\rangle = 0 \right\} \tag{19}$$
be the mediator hyperplane. Denote by $\lambda^{ij}_x$ the Lebesgue measure on $M^{ij}_x$. Fort and Pagès [3] have shown the following lemma.

Lemma 4.1 Let $\varphi$ be a real-valued continuous function on $[0,1]^d$. For $x \in D_I$, let $\Phi_i(x) := \int_{C_i(x)}\varphi(\omega)\,d\omega$, and denote by $(e_1, \dots, e_d)$ the canonical basis of $\mathbb{R}^d$. The function $\Phi_i$ is continuously differentiable on $D_I$ and, for all $i \neq j$ and $l \in \{1, \dots, d\}$,
$$\frac{\partial\Phi_i}{\partial x^l_j}(x) = \int_{\bar C_i(x)\cap\bar C_j(x)}\varphi(\omega)\left(\frac{1}{2}\left\langle\vec n^{ij}_x, e_l\right\rangle + \frac{1}{\|x_j - x_i\|}\left\langle\frac{x_i + x_j}{2} - \omega,\ e_l\right\rangle\right)\lambda^{ij}_x(d\omega) \tag{20}$$
Moreover, if we write
$$\frac{\partial\Phi_i}{\partial x_j}(x) := \begin{pmatrix}\frac{\partial\Phi_i}{\partial x^1_j}(x)\\ \vdots\\ \frac{\partial\Phi_i}{\partial x^d_j}(x)\end{pmatrix},$$
then, since $\sum_{j\in I}\Phi_j(x) = \int_{[0,1]^d}\varphi(\omega)\,d\omega$ does not depend on $x$,
$$\frac{\partial\Phi_i}{\partial x_i}(x) = -\sum_{j\in I,\ j\neq i}\frac{\partial\Phi_j}{\partial x_i}(x) \tag{21}$$
We then deduce the following theorem.

Theorem 4.2 If $P(d\omega) = f(\omega)\,d\omega$, where $f$ is continuous on $[0,1]^d$, then $V$ is continuously differentiable on $D_I$ and we have
$$\frac{\partial V}{\partial x_i}(x) = \sum_{k\in I}\Lambda(i-k)\int_{C_k(x)}(x_i - \omega)\,P(d\omega) + \frac{1}{2}\sum_{j\in I}\sum_{k\in I,\ k\neq i}\left(\Lambda(k-j) - \Lambda(i-j)\right)\int_{\bar C_k(x)\cap\bar C_i(x)}\|x_j - \omega\|^2\left(\frac{1}{2}\vec n^{ki}_x + \frac{1}{\|x_k - x_i\|}\left(\frac{x_i + x_k}{2} - \omega\right)\right)f(\omega)\,\lambda^{ki}_x(d\omega) \tag{22}$$
where
$$\frac{\partial V}{\partial x_i}(x) = \begin{pmatrix}\frac{\partial V}{\partial x^1_i}(x)\\ \vdots\\ \frac{\partial V}{\partial x^d_i}(x)\end{pmatrix}.$$

Proof. As the function $V(x)$ is continuous on $D_I$, we only have to show that the partial derivatives exist and are continuous. Denote by $h^l_i \in \mathbb{R}^{|I|\times d}$ the vector with all components zero except the component corresponding to $x^l_i$, which is equal to $h > 0$.
Then
$$\begin{aligned}\frac{V(x + h^l_i) - V(x)}{h} ={}& \frac{1}{2h}\sum_{\substack{k,j\in I\\ k,j\neq i}}\Lambda(k-j)\left(\int_{C_k(x+h^l_i)}\|x_j-\omega\|^2P(d\omega) - \int_{C_k(x)}\|x_j-\omega\|^2P(d\omega)\right)\\ &+ \frac{1}{2h}\sum_{\substack{j\in I\\ j\neq i}}\Lambda(i-j)\left(\int_{C_i(x+h^l_i)}\|x_j-\omega\|^2P(d\omega) - \int_{C_i(x)}\|x_j-\omega\|^2P(d\omega)\right)\\ &+ \frac{1}{2h}\sum_{\substack{k\in I\\ k\neq i}}\Lambda(k-i)\left(\int_{C_k(x+h^l_i)}\|x_i + he_l - \omega\|^2P(d\omega) - \int_{C_k(x)}\|x_i-\omega\|^2P(d\omega)\right)\\ &+ \frac{1}{2h}\left(\int_{C_i(x+h^l_i)}\|x_i + he_l - \omega\|^2P(d\omega) - \int_{C_i(x)}\|x_i-\omega\|^2P(d\omega)\right)\end{aligned} \tag{23}$$
where the first two lines concern the distances to the centroids other than $x_i$ and the last two lines the distances to $x_i$ itself. Now, writing for brevity
$$\psi^{ki}_l(\omega) := \frac{1}{2}\left\langle\vec n^{ki}_x, e_l\right\rangle + \frac{1}{\|x_i - x_k\|}\left\langle\frac{x_k + x_i}{2} - \omega,\ e_l\right\rangle,$$
and applying Lemma 4.1 with $\varphi(\omega) = \|x_j - \omega\|^2 f(\omega)$ to the first two lines of the sum (using (21) for the second line), we get
$$\begin{aligned}\lim_{h\to 0}\frac{V(x + h^l_i) - V(x)}{h} ={}& \frac{1}{2}\sum_{\substack{k,j\in I\\ k,j\neq i}}\Lambda(k-j)\int_{\bar C_k(x)\cap\bar C_i(x)}\|x_j-\omega\|^2\psi^{ki}_l(\omega)f(\omega)\,\lambda^{ki}_x(d\omega)\\ &- \frac{1}{2}\sum_{\substack{k,j\in I\\ k,j\neq i}}\Lambda(i-j)\int_{\bar C_k(x)\cap\bar C_i(x)}\|x_j-\omega\|^2\psi^{ki}_l(\omega)f(\omega)\,\lambda^{ki}_x(d\omega)\\ &+ \lim_{h\to 0}\frac{1}{2h}\sum_{\substack{k\in I\\ k\neq i}}\Lambda(k-i)\left(\int_{C_k(x+h^l_i)}\left(\|x_i-\omega\|^2 + 2h(x^l_i - \omega^l) + o(h)\right)P(d\omega) - \int_{C_k(x)}\|x_i-\omega\|^2P(d\omega)\right)\\ &+ \lim_{h\to 0}\frac{1}{2h}\left(\int_{C_i(x+h^l_i)}\left(\|x_i-\omega\|^2 + 2h(x^l_i - \omega^l) + o(h)\right)P(d\omega) - \int_{C_i(x)}\|x_i-\omega\|^2P(d\omega)\right)\end{aligned} \tag{24}$$
Then, by applying Lemma 4.1 to the last two lines, we get
$$\begin{aligned}\lim_{h\to 0}\frac{V(x + h^l_i) - V(x)}{h} ={}& \frac{1}{2}\sum_{\substack{k,j\in I\\ k,j\neq i}}\left(\Lambda(k-j) - \Lambda(i-j)\right)\int_{\bar C_k(x)\cap\bar C_i(x)}\|x_j-\omega\|^2\psi^{ki}_l(\omega)f(\omega)\,\lambda^{ki}_x(d\omega)\\ &+ \frac{1}{2}\sum_{\substack{k\in I\\ k\neq i}}\Lambda(k-i)\int_{\bar C_k(x)\cap\bar C_i(x)}\|x_i-\omega\|^2\psi^{ki}_l(\omega)f(\omega)\,\lambda^{ki}_x(d\omega)\\ &- \frac{1}{2}\sum_{\substack{k\in I\\ k\neq i}}\int_{\bar C_k(x)\cap\bar C_i(x)}\|x_i-\omega\|^2\psi^{ki}_l(\omega)f(\omega)\,\lambda^{ki}_x(d\omega)\\ &+ \sum_{k\in I}\Lambda(k-i)\int_{C_k(x)}(x^l_i - \omega^l)\,P(d\omega)\end{aligned} \tag{25}$$
Finally, since $\Lambda(0) = 1$, the second and third lines are exactly the terms $j = i$ of the first line, so
$$\lim_{h\to 0}\frac{V(x + h^l_i) - V(x)}{h} = \frac{\partial V}{\partial x^l_i}(x) = \frac{1}{2}\sum_{j\in I}\sum_{\substack{k\in I\\ k\neq i}}\left(\Lambda(k-j) - \Lambda(i-j)\right)\int_{\bar C_k(x)\cap\bar C_i(x)}\|x_j-\omega\|^2\psi^{ki}_l(\omega)f(\omega)\,\lambda^{ki}_x(d\omega) + \sum_{k\in I}\Lambda(k-i)\int_{C_k(x)}(x^l_i - \omega^l)\,P(d\omega) \tag{26}$$
which is the $l$-th component of (22). $\square$

If we assume that the minimum of the distortion measure is reached in the interior of $D_I$ (i.e. that no centroids collapse), we deduce from the previous results that it does not match the equilibrium of the Kohonen algorithm. Indeed, a point $x^* := (x^*_i)_{i\in I}$ which is asymptotically stable for the Kohonen algorithm satisfies, for all $i \in I$,
$$\sum_{k\in I}\Lambda(i-k)\int_{C_k(x^*)}(x^*_i - \omega)\,P(d\omega) = 0 \tag{27}$$
This equation is also valid for the batch algorithm (see Fort, Cottrell and Letrémy [4]). It can match a minimum of the limit distortion only if
$$\frac{1}{2}\sum_{j\in I}\sum_{k\in I,\ k\neq i}\left(\Lambda(k-j) - \Lambda(i-j)\right)\int_{\bar C_k(x^*)\cap\bar C_i(x^*)}\|x^*_j - \omega\|^2\left(\frac{1}{2}\vec n^{ki}_{x^*} + \frac{1}{\|x^*_k - x^*_i\|}\left(\frac{x^*_i + x^*_k}{2} - \omega\right)\right)f(\omega)\,\lambda^{ki}_{x^*}(d\omega) = 0 \tag{28}$$
but, in general, this term is not zero.

4.2 Example of a Kohonen string with 3 centroids
The previous section has shown that the minimum of the distortion measure does not match the equilibrium of the Kohonen algorithm. We illustrate this with a simple example. The classical explanation (see Kohonen [7]) of local potential minimization by the Kohonen algorithm is far from satisfactory. Actually, it seems that the minima of the distortion measure always occur at a discontinuity point, where the function is not differentiable.
To illustrate this, let a Kohonen string lie on the segment [0, 1] (see Figure 2), with a discrete neighborhood in which each unit is a neighbor of itself and of its adjacent units, i.e. $\Lambda(0) = \Lambda(\pm 1) = 1$ and $\Lambda(\pm 2) = 0$, as equations (29) and (30) reflect.

[Figure 2: Kohonen string on the segment [0, 1] with three centroids $x_1$, $x_2$, $x_3$.]

4.2.1 The theoretical difference
The equilibria of the SOM algorithm are reached at points $x$ satisfying
$$\begin{cases}\int_{C_1(x)}(x_1 - \omega)P(d\omega) + \int_{C_2(x)}(x_1 - \omega)P(d\omega) = 0\\ \int_{C_1(x)}(x_2 - \omega)P(d\omega) + \int_{C_2(x)}(x_2 - \omega)P(d\omega) + \int_{C_3(x)}(x_2 - \omega)P(d\omega) = 0\\ \int_{C_2(x)}(x_3 - \omega)P(d\omega) + \int_{C_3(x)}(x_3 - \omega)P(d\omega) = 0\end{cases} \tag{29}$$
but the minima of the distortion are reached at points $x$ satisfying
$$\begin{cases}\frac{\partial V}{\partial x_1}(x) = \int_{C_1(x)}(x_1 - \omega)P(d\omega) + \int_{C_2(x)}(x_1 - \omega)P(d\omega) - \frac{1}{4}\left(x_3 - \frac{x_1 + x_2}{2}\right)^2 f\left(\frac{x_1 + x_2}{2}\right) = 0\\ \frac{\partial V}{\partial x_2}(x) = \int_{C_1(x)}(x_2 - \omega)P(d\omega) + \int_{C_2(x)}(x_2 - \omega)P(d\omega) + \int_{C_3(x)}(x_2 - \omega)P(d\omega) - \frac{1}{4}\left(x_3 - \frac{x_1 + x_2}{2}\right)^2 f\left(\frac{x_1 + x_2}{2}\right) + \frac{1}{4}\left(x_1 - \frac{x_3 + x_2}{2}\right)^2 f\left(\frac{x_3 + x_2}{2}\right) = 0\\ \frac{\partial V}{\partial x_3}(x) = \int_{C_2(x)}(x_3 - \omega)P(d\omega) + \int_{C_3(x)}(x_3 - \omega)P(d\omega) + \frac{1}{4}\left(x_1 - \frac{x_2 + x_3}{2}\right)^2 f\left(\frac{x_2 + x_3}{2}\right) = 0\end{cases} \tag{30}$$
If we assume, for example, that the density of the observations is uniform on $[0, 1]$, i.e. $f(x) = 1$ if $x \in [0, 1]$, then these two sets of points have no point in common. Indeed, at a common point the integral terms vanish by (29), so the first and third equations of (30) reduce to
$$x_3 - \frac{x_1 + x_2}{2} = 0, \qquad x_1 - \frac{x_2 + x_3}{2} = 0 \tag{31}$$
Therefore $x_1 = x_2 = x_3$, but this point is clearly not an equilibrium of the Kohonen map.

4.2.2 Illustration of the behavior of the distortion measure
We will see that if one draws data from a uniform distribution on the segment [0, 1] and then computes the minimum of the distortion, this minimum always lies at a discontinuity point. The more observations one has, the more discontinuities there are, but the overall function looks more and more regular.
This is not surprising, sine w e kno w that the limit is deriv able. The metho d of sim ulation Sine w e ha v e no n umerial algorithm to ompute the exat min- im um of v ariane, w e pro eed b y exhaustiv e resear h based on a disretization of the spae of the en troids. T o a v oid to o m u h omputation, 0 . 001 is  hosen as the disretization step. The follo wing gures are obtained in the follo wing w a y: 1. Sim ulate n data ( ω 1 , · · · , ω n ) ,  hosen with a uniform la w on [0 , 1] . 2. Sear h exhaustiv ely , on the disretization of D I , the string whi h minimizes the distortion. 3. F or the b est string ( x ∗ 1 , x ∗ 2 , x ∗ 3 ) , the graphial represen tations are obtained in the follo wing w a y: • 3D Represen tation: w e k eep one en troid in the triplet ( x ∗ 1 , x ∗ 2 , x ∗ 3 ), then w e mo v e the other around a small neigh b orho o d of its optimal p osition. The lev el z is the extended v ariane m ultiplied b y the n um b er of observ ations n . • 2D Represen tation: w e k eep t w o en troids in the triplet ( x ∗ 1 , x ∗ 2 , x ∗ 3 ), then w e mo v e the last one around a small neigh b orho o d of its optimal p osition. The lev el z is the extended v ariane m ultiplied b y the n um b er of observ ations n . SOM and distortion measure 18 The follo wing gures sho w the results obtained for a n um b er of observ ations n v arying from 10 , 100 and 1000 . W e notie that, ev en for a small n um b er of observ ations, the minima are alw a ys on dison tin uit y p oin ts. 
[Figure 3: Distortion measure for 10 observations. Three 3D panels (level z over the planes (x2, x3), (x1, x3) and (x1, x2)) and three 2D panels (z against x1, x2 and x3).]

[Figure 4: Distortion measure for 100 observations; same panel layout as Figure 3.]

[Figure 5: Distortion measure for 1000 observations; same panel layout as Figure 3.]

5 Conclusion
For a finite number of observations, the Kohonen algorithm was supposed to give an approximation of the minimum of the distortion measure; but if that were the case, why would the equilibrium points of the algorithm differ from the theoretical minimum of the distortion? Moreover, we have shown that if we choose maps that almost minimize the empirical distortion, then these maps must converge to the set of maps which minimize the theoretical distortion. But, by calculating the derivative of the theoretical distortion, we have shown that the equilibria of the Kohonen map cannot minimize this distortion in general. We illustrate this fact with an example where the minimum is always reached at discontinuity points. This fact shows that the local differentiability of the distortion measure is not an important property and is not a satisfactory explanation for the behavior of the Kohonen algorithm when the number of observations is finite.

References
[1] Cottrell, M., Fort, J.C. and Pagès, G. (1998). Theoretical aspects of the SOM algorithm. Neurocomputing, 21, 119-138.
[2] Erwin, E., Obermayer, K. and Schulten, K. (1992). Self-Organizing Maps: Ordering, Convergence properties and Energy Functions. Biological Cybernetics, 67, 47-55.
[3] Fort, J.C. and Pagès, G. (1995). On the a.s. convergence of the Kohonen algorithm with a general neighborhood function. The Annals of Applied Probability, 5(4), 1177-1216.
[4] Fort, J.C., Letrémy, P. and Cottrell, M. (2002). Advantages and drawbacks of the batch Kohonen algorithm. In M. Verleysen (ed.), Proceedings of ESANN 2002 (pp. 223-230). Brussels: Di Facto.
[5] Gaenssler, P. and Stute, W. (1979). Empirical processes: A survey of results for independent and identically distributed random variables. The Annals of Probability, 7(2), 193-243.
[6] Graepel, T., Burger, M. and Obermayer, K. (1997). Phase transition in stochastic self-organizing maps. Physical Review E, 56, 3876-3890.
[7] Kohonen, T. (1995). Self-Organizing Maps. Springer Series in Information Sciences, 30. New York: Springer-Verlag.
[8] Kohonen, T. (1991). Artificial neural networks, 2. Amsterdam: North Holland.
[9] Kohonen, T. (1999). Comparison of SOM point densities based on different criteria. Neural Computation, 11, 2081-2095.
[10] Pollard, D. (1981). Strong consistency of k-means clustering. The Annals of Statistics, 9(1), 135-140.
[11] Rynkiewicz, J. (2005). Consistency of a least extended variance estimator (in French). Comptes rendus de l'Académie des Sciences, I(345), 133-136.
