Self Organizing Map algorithm and distortion measure

Joseph Rynkiewicz
SAMOS/MATISSE, Université de Paris-I, 90, rue de Tolbiac, 75013 Paris, France
Tel. and Fax: (+33) 144078705
joseph.rynkiewicz@univ-paris1.fr

Abstract

We study the statistical meaning of the minimization of the distortion measure and the relation between the equilibrium points of the SOM algorithm and the minima of the distortion measure. If we assume that the observations and the map lie in a compact Euclidean space, we prove the strong consistency of the map which almost minimizes the empirical distortion. Moreover, after calculating the derivatives of the theoretical distortion measure, we show that the points minimizing this measure and the equilibria of the Kohonen map do not match in general. We illustrate, with a simple example, how this occurs.

Keywords: Distortion measure, asymptotic convergence, consistency, Self Organizing Map, empirical processes, Glivenko-Cantelli class, uniform law of large numbers, general neighborhood function.

1 Introduction

The distortion, or distortion measure, is certainly the most popular criterion for assessing the quality of the classification of a Kohonen map (see Kohonen [8]). This measure yields an assessment of model properties with respect to the data and overcomes the absence of a cost function in the SOM algorithm. Moreover, the SOM algorithm has been proven to be an approximation for the gradient of the distortion measure (see Graepel et al. [6]). Although the Kohonen map is proven to converge sometimes on equilibrium points when the number of observations tends to infinity, the learning dynamic cannot be described by a gradient descent of the distortion measure for an infinite number of observations (see for example Erwin et al. [2]).
Moreover, Kohonen [9] has shown, in some examples for the one-dimensional grid, that the model vector produced by the SOM algorithm does not exactly coincide with the optimum of the distortion measure. This property seems paradoxical: on the one hand the SOM seems to minimize the distortion for a finite number of observations, but on the other hand this behavior no longer holds in the limit, i.e. for an infinite number of observations.

In this paper we investigate the relationship between the SOM and the distortion measure. Firstly, we prove the strong consistency of the estimator minimizing the empirical distortion. More precisely, we prove that the maps almost minimizing the empirical distortion measure converge almost surely to the set of maps minimizing the theoretical distortion measure. Secondly, we calculate the derivatives of the theoretical distortion and deduce from this calculation that the points minimizing the theoretical distortion generally differ from the equilibrium points of the SOM, whatever the dimension of the grid. Finally, we illustrate, with a simple example, why an apparent contradiction between the discrete and the continuous case occurs.

2 Distortion measure

We assume in the sequel that the observations $\omega$ are independent and identically distributed (i.i.d.) and are of dimension $d$. We assume that the observations lie in a compact space; therefore, without loss of generality, they lie in the compact space $[0,1]^d$. We also assume that these observations follow the probability law $P$, which has a density with respect to the Lebesgue measure of $\mathbb{R}^d$, and that this density is bounded by a constant $B$. In the sequel we call centroid a vector of $[0,1]^d$ representing a class of observations $\omega$. We adopt the notation of Cottrell et al. [1].
Denition 2.1 F or e ∈ N ∗ , e ≤ d , we  onsider a set of units indexe d by I ⊂ Z e with the neighb or- ho o d funtion Λ fr om I − I := { i − j, i, j ∈ I } to [0 , 1] satisfying Λ ( k ) = Λ ( − k ) and Λ (0) = 1 , note that suh neighb orho o d funtion  an b e disr ete or  ontinuous. Denition 2.2 Note k . k the Eulide an norm, let D δ I :=  x : = ( x i ) i ∈ I ∈  [0 , 1] d  I , suh that k x i − x j k ≥ δ if i 6 = j  b e the set of  entr oids x i sep ar ate d by, at le ast, δ . Denition 2.3 if x : = ( x i ) i ∈ I is the set of units, the V or onoi tessel lation ( C i ( x )) i ∈ I is dene d by C i ( x ) := n ω ∈ [0 , 1] d |k x i − ω k < k x k − ω k if k 6 = i o In  ase of e quality we assign ω ∈ C i ( x ) thanks to the lexi o gr aphi al or der. Conversely, the index of the V or onoi tessel lation for an observation ω wil l b e dene d by C − 1 x ( ω ) = i ∈ I , if and only if ω ∈ C i ( x ) SOM and distortion measure 5 Denition 2.4 distortion me asur es the quality of a quanti ation with r esp e t to the neighb orho o d strutur e. It is dene d as fol lows: • Distortion for the disr ete  ase (empiri al distortion): W e assume that the observations ar e in a nite set Ω = { ω 1 , · · · , ω n } and ar e uniformly distribute d on this set. Then, distortion me asur e is V n ( x ) = 1 2 n X i ∈ I X ω ∈ C i ( x )   X j ∈ I Λ ( i − j ) k x j − ω k 2   • Distortion for the  ontinuous  ase (the or eti al distortion): L et us assume that P is the distri- bution funtion of the observations. The the or eti al distortion me asur e is V ( x ) = 1 2 X i,j ∈ I Λ ( i − j ) Z C i ( x ) k x j − ω k 2 dP As mentione d b efor e the distribution P has a density with r esp e t to the L eb esgue me asur e b ounde d by a  onstant B > 0 . The distortion measure is w ell kno wn to b e not on tin uous with resp et to the en troids ( x i ) i ∈ I for the disrete ase. 
This discontinuity arises as follows: if an observation lies exactly on a hyperplane separating two centroids, shifting one of the centroids implies a jump in the distortion. So the distortion is not continuous and, in general, a map which realizes the minimum of the empirical distortion does not exist. However, if we consider sequences of maps $x_n$ such that the distortion $V_n(x_n)$ is sufficiently close to its minimum, then we will show that such sequences of maps converge almost surely to the set of maps which reach the minimum of the theoretical distortion measure $V(x)$.

3 Consistency of the almost minimum of distortion

This demonstration is an extended version of Rynkiewicz [11]. It follows the same line as Pollard [10], so we first show a uniform law of large numbers and then deduce the strong consistency property.

3.1 Uniform law of large numbers

Let the family of functions be
$$\mathcal{G} := \left\{ g_x(\omega) := \sum_{j\in I} \Lambda\left(C_x^{-1}(\omega) - j\right) \|x_j - \omega\|^2 \ \text{ for } x \in D_I^\delta \right\} \tag{1}$$
In order to show the uniform law of large numbers, we have to prove that
$$\sup_{x\in D_I^\delta} \left| \int g_x(\omega)\, dP_n(\omega) - \int g_x(\omega)\, dP(\omega) \right| \xrightarrow[n\to\infty]{a.s.} 0 \tag{2}$$
where $P_n$ denotes the empirical measure of the observations $\omega_1, \dots, \omega_n$, since, for every probability measure $Q$ on $[0,1]^d$,
$$\int g_x(\omega)\, dQ(\omega) = \int \sum_{j\in I} \Lambda\left(C_x^{-1}(\omega) - j\right) \|x_j - \omega\|^2 \, dQ(\omega) = \sum_{i,j\in I} \Lambda(i-j) \int_{C_i(x)} \|x_j - \omega\|^2 \, dQ(\omega) \tag{3}$$
which is twice the distortion computed under $Q$.

Now, a sufficient condition to verify equation (2) is the following (see Gaenssler and Stute [5]): for all $\varepsilon > 0$ and all $x^0 \in D_I^\delta$, a neighborhood $S(x^0)$ of $x^0$ exists such that
$$\int g_{x^0}(\omega)\, dP(\omega) - \varepsilon < \int \left( \inf_{x\in S(x^0)} g_x(\omega) \right) dP(\omega) \le \int \left( \sup_{x\in S(x^0)} g_x(\omega) \right) dP(\omega) < \int g_{x^0}(\omega)\, dP(\omega) + \varepsilon \tag{4}$$

First we prove the following result, using a technique similar to the proof of lemma 11 of Fort and Pagès [3].

Lemma 3.1 Let $x \in D_I^\delta$ and let $\lambda$ be the Lebesgue measure on $[0,1]^d$. Let $E^c$ denote the complement of a set $E$ in $[0,1]^d$ and $|I|$ the cardinal of the set $I$. For $0 < \alpha < \frac{\delta}{2}$, let
$$U_i^\alpha(x) = \left\{ \omega \in [0,1]^d \ \middle|\ \exists y \in D_I^\delta,\ x_j = y_j \text{ if } j \ne i,\ \|x_i - y_i\| < \alpha \text{ and } \omega \in C_i^c(y) \cap C_i(x) \right\}$$
be the set of $\omega$ changing Voronoi cell when the centroid $x_i$ moves a distance of at most $\alpha$. Then
$$\sup_{x\in D_I^\delta} \lambda\left(U_i^\alpha(x)\right) < (|I| - 1) \left( \frac{2\alpha}{\delta} + \alpha \right) \left(\sqrt{2}\right)^{d-1} \tag{5}$$

Proof Let $x$ and $y \in D_I^\delta$ satisfy the assumptions of lemma 3.1 and let $j \ne i \in I$. In order to prove the inequality, we have to bound the measure of the $\omega$ belonging to the cells $C_i(x)$ and $C_j(y)$ simultaneously, since $(C_i(y))^c = \bigcup_{j\in I,\, j\ne i} C_j(y)$. Let $(z|t)$ denote the inner product between $z$ and $t$, and let $\vec{n}^{ij}_x := \frac{x_j - x_i}{\|x_j - x_i\|}$. The parameter vector $x + \gamma_1 \vec{n}^{ij}_x$ is the vector with all components equal to those of $x$ except the component $i$, which is equal to $x_i + \gamma_1 \vec{n}^{ij}_x$.

Since $\|y_i - x_i\| < \alpha$, we have $\left(y_i - x_i \,\middle|\, \vec{n}^{ij}_x\right) = \gamma_1$ with $|\gamma_1| \le \alpha < \frac{\delta}{2}$. As the Lebesgue measure (of $\mathbb{R}^{d-1}$) of all plane sections of $[0,1]^d$ is bounded by $\left(\sqrt{2}\right)^{d-1}$, when the centroid $x_i$ moves by $\gamma_1 \vec{n}^{ij}_x$, the Lebesgue measure of the $\omega$ changing Voronoi cell is bounded by $\frac{|\gamma_1|}{2} \left(\sqrt{2}\right)^{d-1}$, so
$$\lambda\left( C_j\left(x + \gamma_1 \vec{n}^{ij}_x\right) \cap C_i(x) \right) < \alpha \left(\sqrt{2}\right)^{d-1} \tag{6}$$
Moreover, we note that $x + \gamma_1 \vec{n}^{ij}_x$ belongs to $D_I^{\delta/2}$.

On the other hand, let $y_i - x_i - \gamma_1 \vec{n}^{ij}_x := \gamma_2 \vec{\tau}^{ij}_x$, with $\|\vec{\tau}^{ij}_x\| = 1$, be the component of the movement of $x_i$ to $y_i$ that is orthogonal to $\vec{n}^{ij}_x$, i.e. such that $\left(\vec{n}^{ij}_x \,\middle|\, \vec{\tau}^{ij}_x\right) = 0$.
As shown in figure 1, in dimension 2, for all $x' = x + \gamma_1 \vec{n}^{ij}_x \in D_I^{\delta/2}$, the Lebesgue measure of the $\omega$ changing Voronoi cell for a movement of the centroid $x'_i$ by $\gamma_2 \vec{\tau}^{ij}_x$ is bounded by $\frac{2\alpha}{\delta} \left(\sqrt{2}\right)^{d-1}$. Therefore, we have
$$\lambda\left( C_j\left(x + \gamma_1 \vec{n}^{ij}_x + \gamma_2 \vec{\tau}^{ij}_x\right) \cap C_i(x) \right) < \alpha \left(\sqrt{2}\right)^{d-1} + \frac{2\alpha}{\delta} \left(\sqrt{2}\right)^{d-1} \tag{7}$$

Figure 1: hatched area $< \frac{2\gamma_2}{\delta} < \sqrt{2} \times \frac{2\alpha}{\delta}$

As this inequality is independent of $x$, we finally get
$$\sup_{x\in D_I^\delta} \lambda\left( C_j\left(x + \gamma_1 \vec{n}^{ij}_x + \gamma_2 \vec{\tau}^{ij}_x\right) \cap C_i(x) \right) < \left( \alpha + \frac{2\alpha}{\delta} \right) \left(\sqrt{2}\right)^{d-1} \tag{8}$$
and then
$$\sup_{x\in D_I^\delta} \lambda\left(U_i^\alpha(x)\right) < (|I| - 1) \left( \alpha + \frac{2\alpha}{\delta} \right) \left(\sqrt{2}\right)^{d-1} \qquad \blacksquare$$

Now consider $x^0 \in D_I^\delta$ and $S(x^0)$ a neighborhood of $x^0$ included in a sphere of radius $\alpha$. Let $W(x^0)$ be the set of $\omega$ remaining in their Voronoi cells when $x^0$ goes to any $x \in S(x^0)$. For all $\omega \in W(x^0)$ we have
$$\inf_{x\in S(x^0)} g_x(\omega) \ge g_{x^0}(\omega) - \sum_{j\in I} \Lambda\left(C_{x^0}^{-1}(\omega) - j\right) \left( \|x^0_j - \omega\|^2 - \inf_{x\in S(x^0)} \|x_j - \omega\|^2 \right) \ge g_{x^0}(\omega) - \sum_{j\in I} \left( \|x^0_j - \omega\|^2 - \inf_{x\in S(x^0)} \|x_j - \omega\|^2 \right) \tag{9}$$
For all $\omega \in [0,1]^d$ and for a small enough $\alpha$, we have
$$\|x^0_j - \omega\|^2 - \inf_{x\in S(x^0)} \|x_j - \omega\|^2 < \frac{\varepsilon}{2B|I|}$$
so
$$\int_{W(x^0)} \sum_{j\in I} \left( \|x^0_j - \omega\|^2 - \inf_{x\in S(x^0)} \|x_j - \omega\|^2 \right) dP(\omega) < \frac{\varepsilon}{2}$$
and
$$\int_{W(x^0)} \left( g_{x^0}(\omega) - \inf_{x\in S(x^0)} g_x(\omega) \right) dP(\omega) < \frac{\varepsilon}{2} \tag{10}$$

Now, let $W(x^0)^c$ be the set of $\omega$ changing Voronoi cell when the centroids go from $x^0$ to $x \in S(x^0)$. For all $\omega \in W(x^0)^c$ there exist two different indices $i$ and $j$ such that $\omega \in C_i(x^0)$ and $\omega \in C_j(x)$.
Let us define a sequence $x^k$, $k \in \{0, \dots, |I|\}$, by sequentially changing the components of $x^0$ into the components of $x$, such that $x^{|I|} = x$ (the $x^k$ are the intermediate configurations that transform $x^0$ into $x$). Then there exists a step $l \in \{0, \dots, |I| - 1\}$ such that $\omega \in C_i(x^l)$ and $\omega \notin C_i(x^{l+1})$. Indeed, if this were not the case, we could find a sequence $x^k$, $k \in \{0, \dots, |I|\}$, with $x^{|I|} = x$ such that $\omega \in C_i(x^{|I|}) = C_i(x)$, which would be a contradiction. So $W(x^0)^c$ is included in the set of $\omega$ which change Voronoi cell when we sequentially replace the components of $x^0$ by the components of $x$.

If $\alpha < \frac{\delta}{4}$, then when the components $x^0_i$ of $x^0$ are moved sequentially to the components $x_i$ of $x$, each intermediate configuration stays in $D_I^{\delta/2}$. Since, for all $i \in I$, $\|x_i - \omega\|^2$ is bounded by 1 on $[0,1]^d$, lemma 3.1 ensures that
$$\int_{W(x^0)^c} g_x(\omega)\, dP(\omega) < B\, |I| (|I| - 1) \left( \frac{4\alpha}{\delta} + \alpha \right) \left(\sqrt{2}\right)^{d-1} \tag{11}$$
Finally, if we choose a small enough $\alpha$ such that $B\, |I| (|I| - 1) \left( \frac{4\alpha}{\delta} + \alpha \right) \left(\sqrt{2}\right)^{d-1} < \frac{\varepsilon}{2}$, we get
$$\int g_{x^0}(\omega)\, dP(\omega) - \varepsilon < \int \left( \inf_{x\in S(x^0)} g_x(\omega) \right) dP(\omega) \tag{12}$$
In exactly the same way, for a small enough $\alpha$, we get
$$\int \left( \sup_{x\in S(x^0)} g_x(\omega) \right) dP(\omega) < \int g_{x^0}(\omega)\, dP(\omega) + \varepsilon \tag{13}$$
Therefore, the sufficient condition for the uniform law of large numbers holds.

3.2 Consistency

We want to show the consistency of the procedure that consists in choosing maps $(x_n)_{n\in\mathbb{N}^*}$ which almost minimize the empirical distortions $(V_n(x))_{n\in\mathbb{N}^*}$ in $D_I^\delta$. Let
$$\bar{\chi}_n^\beta := \left\{ x \in D_I^\delta \text{ such that } V_n(x) < \inf_{x\in D_I^\delta} V_n(x) + \frac{1}{\beta(n)} \right\} \tag{14}$$
be the set of estimators that almost minimize the empirical distortion, with $\beta(n)$ a strictly positive function such that $\lim_{n\to+\infty} \beta(n) = \infty$.
Let $\bar{\chi} = \arg\min_{x\in D_I^\delta} V(x)$ be the set of maps minimizing the theoretical distortion, possibly reduced to one map. It is easy to verify that the function $x \mapsto V(x)$ is continuous on $D_I^\delta$, so for every neighborhood $\mathcal{N}$ of $\bar{\chi}$, there exists $\eta(\mathcal{N}) > 0$ such that
$$\forall x \in D_I^\delta \setminus \mathcal{N}, \quad V(x) > \min_{x\in D_I^\delta} V(x) + \eta(\mathcal{N}) \tag{15}$$
To show the strong consistency, it is enough to prove that for all neighborhoods $\mathcal{N}$ of $\bar{\chi}$ we have
$$\lim_{n\to\infty} \bar{\chi}_n^\beta \overset{a.s.}{\subset} \mathcal{N} \iff \lim_{n\to\infty} V\left(\bar{\chi}_n^\beta\right) - V(\bar{\chi}) \overset{a.s.}{\le} \eta(\mathcal{N}) \tag{16}$$
with $V(E) - V(F) := \sup\{ V(x) - V(y) \text{ for } x \in E \text{ and } y \in F \}$.

By definition $V_n\left(\bar{\chi}_n^\beta\right) \overset{a.s.}{\le} V_n(\bar{\chi}) + \frac{1}{\beta(n)}$, and the uniform law of large numbers yields $\lim_{n\to\infty} V_n(\bar{\chi}) - V(\bar{\chi}) \overset{a.s.}{=} 0$, so we get $\lim_{n\to\infty} V_n\left(\bar{\chi}_n^\beta\right) \overset{a.s.}{\le} V(\bar{\chi}) + \frac{\eta(\mathcal{N})}{2}$. Moreover, we have $\lim_{n\to\infty} V\left(\bar{\chi}_n^\beta\right) - V_n\left(\bar{\chi}_n^\beta\right) \overset{a.s.}{=} 0$ and
$$\lim_{n\to\infty} V\left(\bar{\chi}_n^\beta\right) - \frac{\eta(\mathcal{N})}{2} \overset{a.s.}{<} \lim_{n\to\infty} V_n\left(\bar{\chi}_n^\beta\right) \overset{a.s.}{\le} V(\bar{\chi}) + \frac{\eta(\mathcal{N})}{2} \tag{17}$$
Finally, $\lim_{n\to\infty} V\left(\bar{\chi}_n^\beta\right) - V(\bar{\chi}) \overset{a.s.}{\le} \eta(\mathcal{N})$, which proves the strong consistency of the maps which almost minimize the empirical distortion.

4 Differences between the SOM algorithm and distortion measure

Using the result of the previous section we can investigate the differences between the minima of the empirical distortion and the equilibria of the SOM algorithm. Namely, if these equilibria were maps almost minimizing the empirical distortion criterion, they would converge, as the number of observations increases, to the minimum of the theoretical distortion measure; but we will show that this is generally not the case. In the next section we compute the gradient of the function $V(x)$ and show that, even in multidimensional cases, the equilibria of the SOM algorithm and the minima of $V(x)$ do not match. These results generalize the results of Kohonen [9] obtained for unidimensional cases.
4.1 Derivability of $V(x)$

Let us now write
$$D_I := \left\{ x := \left( x_i = \left(x_i^1, \dots, x_i^d\right) \right)_{i\in I} \in \left([0,1]^d\right)^I \ \middle|\ \forall k \in \{1, \dots, d\},\ \left| x_i^k - x_j^k \right| > 0 \text{ if } i \ne j \right\} \tag{18}$$
For $i$ and $j \in I$, let $\vec{n}^{ij}_x$ denote the vector $\frac{x_j - x_i}{\|x_j - x_i\|}$ and let
$$M^{ij}_x := \left\{ u \in \mathbb{R}^d \ \middle|\ \left\langle u - \frac{x_i + x_j}{2},\ x_i - x_j \right\rangle = 0 \right\} \tag{19}$$
be the mediator hyperplane. Let $\lambda^{ij}_x(\omega)$ denote the Lebesgue measure on $M^{ij}_x$. Fort and Pagès [3] have shown the following lemma:

Lemma 4.1 Let $\phi$ be an $\mathbb{R}$-valued continuous function on $[0,1]^d$. For $x \in D_I$, let $\Phi_i(x) := \int_{C_i(x)} \phi(\omega)\, d\omega$. Let also $(e_1, \dots, e_d)$ denote the canonical basis of $\mathbb{R}^d$. The function $\Phi_i$ is continuously derivable on $D_I$ and, for all $i \ne j$ and $l \in \{1, \dots, d\}$,
$$\frac{\partial \Phi_i}{\partial x_j^l}(x) = \int_{\bar{C}_i(x) \cap \bar{C}_j(x)} \phi(\omega) \left\{ \frac{1}{2} \left\langle \vec{n}^{ij}_x, e_l \right\rangle + \frac{1}{\|x_j - x_i\|} \left\langle \frac{x_i + x_j}{2} - \omega,\ e_l \right\rangle \right\} \lambda^{ij}_x(\omega)\, d\omega \tag{20}$$
Moreover, if we write
$$\frac{\partial \Phi_i}{\partial x_j}(x) := \begin{pmatrix} \frac{\partial \Phi_i}{\partial x_j^1}(x) \\ \vdots \\ \frac{\partial \Phi_i}{\partial x_j^d}(x) \end{pmatrix}$$
then
$$\frac{\partial \Phi_i}{\partial x_i}(x) = - \sum_{j\in I,\, j\ne i} \frac{\partial \Phi_i}{\partial x_j}(x) \tag{21}$$

Then, we deduce the theorem:

Theorem 4.2 If $P(d\omega) = f(\omega)\, d\omega$, where $f$ is continuous on $[0,1]^d$, then $V$ is continuously derivable on $D_I$ and we have
$$\frac{\partial V}{\partial x_i}(x) = \sum_{k\in I} \Lambda(i-k) \int_{C_k(x)} (x_i - \omega)\, P(d\omega) + \frac{1}{2} \sum_{j\in I} \sum_{k\in I,\, k\ne i} \left( \Lambda(k-j) - \Lambda(i-j) \right) \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_j - \omega\|^2 \left\{ \frac{1}{2} \vec{n}^{ki}_x + \frac{1}{\|x_k - x_i\|} \left( \frac{x_i + x_k}{2} - \omega \right) \right\} f(\omega)\, \lambda^{ki}_x\, d\omega \tag{22}$$
where
$$\frac{\partial V}{\partial x_i}(x) = \begin{pmatrix} \frac{\partial V}{\partial x_i^1}(x) \\ \vdots \\ \frac{\partial V}{\partial x_i^d}(x) \end{pmatrix}$$

Proof As the function $V(x)$ is continuous on $D_I$, we only have to show that the partial derivatives exist and are continuous. Let $h_i^l \in \mathbb{R}^{|I| \times d}$ denote the vector with all components null except the component corresponding to $x_i^l$, which is $h > 0$.
Then
$$\begin{aligned}
\frac{V(x + h_i^l) - V(x)}{h} ={}& \frac{\frac{1}{2} \sum_{k,j\in I,\ k,j\ne i} \Lambda(k-j) \int_{C_k(x + h_i^l)} \|x_j - \omega\|^2 P(d\omega) - \frac{1}{2} \sum_{k,j\in I,\ k,j\ne i} \Lambda(k-j) \int_{C_k(x)} \|x_j - \omega\|^2 P(d\omega)}{h} \\
&+ \frac{\frac{1}{2} \sum_{j\in I,\ j\ne i} \Lambda(i-j) \int_{C_i(x + h_i^l)} \|x_j - \omega\|^2 P(d\omega) - \frac{1}{2} \sum_{j\in I,\ j\ne i} \Lambda(i-j) \int_{C_i(x)} \|x_j - \omega\|^2 P(d\omega)}{h} \\
&+ \frac{1}{2} \sum_{k\in I,\ k\ne i} \Lambda(k-i)\, \frac{\int_{C_k(x + h_i^l)} \|x_i + h_i^l - \omega\|^2 P(d\omega) - \int_{C_k(x)} \|x_i - \omega\|^2 P(d\omega)}{h} \\
&+ \frac{1}{2}\, \frac{\int_{C_i(x + h_i^l)} \|x_i + h_i^l - \omega\|^2 P(d\omega) - \int_{C_i(x)} \|x_i - \omega\|^2 P(d\omega)}{h}
\end{aligned} \tag{23}$$
where the first two lines of the sum concern centroids different from $x_i$ and the last two lines the variation involving $x_i$. Now, by applying lemma 4.1 (with $\phi(\omega) = \|x_j - \omega\|^2 f(\omega)$) to the first two lines of the sum, we get:
$$\begin{aligned}
\lim_{h\to 0} \frac{V(x + h_i^l) - V(x)}{h} ={}& \frac{1}{2} \sum_{k,j\in I,\ k,j\ne i} \Lambda(k-j) \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_j - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}^{ki}_x, e_l \right\rangle + \frac{1}{\|x_i - x_k\|} \left\langle \frac{x_k + x_i}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda^{ki}_x(\omega)\, d\omega \\
&- \frac{1}{2} \sum_{k,j\in I,\ k,j\ne i} \Lambda(i-j) \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_j - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}^{ki}_x, e_l \right\rangle + \frac{1}{\|x_i - x_k\|} \left\langle \frac{x_k + x_i}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda^{ki}_x(\omega)\, d\omega \\
&+ \lim_{h\to 0} \frac{1}{2} \sum_{k\in I,\ k\ne i} \Lambda(k-i)\, \frac{\int_{C_k(x + h_i^l)} \left( \|x_i - \omega\|^2 + 2h\left(x_i^l - \omega^l\right) + o(h) \right) P(d\omega) - \int_{C_k(x)} \|x_i - \omega\|^2 P(d\omega)}{h} \\
&+ \lim_{h\to 0} \frac{1}{2}\, \frac{\int_{C_i(x + h_i^l)} \left( \|x_i - \omega\|^2 + 2h\left(x_i^l - \omega^l\right) + o(h) \right) P(d\omega) - \int_{C_i(x)} \|x_i - \omega\|^2 P(d\omega)}{h}
\end{aligned} \tag{24}$$
Then, by applying lemma 4.1 to the last two lines, we get:
$$\begin{aligned}
\lim_{h\to 0} \frac{V(x + h_i^l) - V(x)}{h} ={}& \frac{1}{2} \sum_{k,j\in I,\ k,j\ne i} \left( \Lambda(k-j) - \Lambda(i-j) \right) \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_j - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}^{ki}_x, e_l \right\rangle + \frac{1}{\|x_i - x_k\|} \left\langle \frac{x_k + x_i}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda^{ki}_x(\omega)\, d\omega \\
&+ \frac{1}{2} \sum_{k\in I,\ k\ne i} \Lambda(k-i) \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_i - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}^{ki}_x, e_l \right\rangle + \frac{1}{\|x_i - x_k\|} \left\langle \frac{x_k + x_i}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda^{ki}_x(\omega)\, d\omega \\
&- \frac{1}{2} \sum_{k\in I,\ k\ne i} \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_i - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}^{ki}_x, e_l \right\rangle + \frac{1}{\|x_i - x_k\|} \left\langle \frac{x_k + x_i}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda^{ki}_x(\omega)\, d\omega \\
&+ \sum_{k\in I} \Lambda(k-i) \int_{C_k(x)} \left( x_i^l - \omega^l \right) P(d\omega)
\end{aligned} \tag{25}$$
Finally,
$$\begin{aligned}
\lim_{h\to 0} \frac{V(x + h_i^l) - V(x)}{h} = \frac{\partial V}{\partial x_i^l}(x) ={}& \frac{1}{2} \sum_{k,j\in I,\ k\ne i} \left( \Lambda(k-j) - \Lambda(i-j) \right) \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_j - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}^{ki}_x, e_l \right\rangle + \frac{1}{\|x_i - x_k\|} \left\langle \frac{x_k + x_i}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda^{ki}_x(\omega)\, d\omega \\
&+ \sum_{k\in I} \Lambda(k-i) \int_{C_k(x)} \left( x_i^l - \omega^l \right) P(d\omega) \qquad \blacksquare
\end{aligned} \tag{26}$$

If we assume that the minimum of the distortion measure is reached in the interior of $D_I$ (i.e. that no centroids collapse), we deduce from the previous results that it does not match the equilibrium of the Kohonen algorithm. Indeed, a point $x^* := (x_i^*)_{i\in I}$ that is asymptotically stable for the Kohonen algorithm verifies, for all $i \in I$:
$$\sum_{k\in I} \Lambda(i-k) \int_{C_k(x^*)} \left( x_i^* - \omega \right) P(d\omega) = 0 \tag{27}$$
This equation is valid even for the batch algorithm (see Fort, Cottrell and Letrémy [4]). Such a point can match a minimum of the limit distortion only if, at $x = x^*$,
$$\frac{1}{2} \sum_{j\in I} \sum_{k\in I,\, k\ne i} \left( \Lambda(k-j) - \Lambda(i-j) \right) \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_j - \omega\|^2 \left\{ \frac{1}{2} \vec{n}^{ki}_x + \frac{1}{\|x_k - x_i\|} \left( \frac{x_i + x_k}{2} - \omega \right) \right\} f(\omega)\, \lambda^{ki}_x\, d\omega = 0 \tag{28}$$
but, in general, this term is not null.

4.2 Example of a Kohonen string with 3 centroids

The previous section has shown that the minimum of the distortion measure does not match the equilibrium of the Kohonen algorithm. We will illustrate this with a simple example. The classical explanation (see Kohonen [7]) of local potential minimization by the Kohonen algorithm is far from being satisfactory. Actually, it seems that the minima of the distortion measure always occur on a discontinuity point, where the function is not derivable.
To illustrate this, let a Kohonen string of three centroids lie on the segment $[0,1]$ (see figure 2), with a discrete neighborhood.

Figure 2: Kohonen string (centroids $X_1$, $X_2$, $X_3$ on the segment $[0,1]$)

4.2.1 The theoretical difference

The equilibrium of the SOM algorithm is reached on points $x$ verifying
$$\begin{cases}
\int_{C_1(x)} (x_1 - \omega)\, P(d\omega) + \int_{C_2(x)} (x_1 - \omega)\, P(d\omega) = 0 \\
\int_{C_1(x)} (x_2 - \omega)\, P(d\omega) + \int_{C_2(x)} (x_2 - \omega)\, P(d\omega) + \int_{C_3(x)} (x_2 - \omega)\, P(d\omega) = 0 \\
\int_{C_2(x)} (x_3 - \omega)\, P(d\omega) + \int_{C_3(x)} (x_3 - \omega)\, P(d\omega) = 0
\end{cases} \tag{29}$$
but the minima of the distortion are reached on points $x$ verifying
$$\begin{cases}
\frac{\partial V}{\partial x_1}(x) = \int_{C_1(x)} (x_1 - \omega)\, P(d\omega) + \int_{C_2(x)} (x_1 - \omega)\, P(d\omega) - \frac{1}{4} \left\| x_3 - \frac{x_1 + x_2}{2} \right\|^2 f\left( \frac{x_1 + x_2}{2} \right) = 0 \\
\frac{\partial V}{\partial x_2}(x) = \int_{C_1(x)} (x_2 - \omega)\, P(d\omega) + \int_{C_2(x)} (x_2 - \omega)\, P(d\omega) + \int_{C_3(x)} (x_2 - \omega)\, P(d\omega) - \frac{1}{4} \left\| x_3 - \frac{x_1 + x_2}{2} \right\|^2 f\left( \frac{x_1 + x_2}{2} \right) + \frac{1}{4} \left\| x_1 - \frac{x_3 + x_2}{2} \right\|^2 f\left( \frac{x_3 + x_2}{2} \right) = 0 \\
\frac{\partial V}{\partial x_3}(x) = \int_{C_2(x)} (x_3 - \omega)\, P(d\omega) + \int_{C_3(x)} (x_3 - \omega)\, P(d\omega) + \frac{1}{4} \left\| x_1 - \frac{x_2 + x_3}{2} \right\|^2 f\left( \frac{x_2 + x_3}{2} \right) = 0
\end{cases} \tag{30}$$
If we assume, for example, that the density of the observations is uniform $\mathcal{U}_{[0;1]}$, i.e. $f(x) = 1$ if $x \in [0;1]$, then these two sets of points have no point in common. Indeed, a point belonging to both sets would have to satisfy
$$\begin{cases}
x_3 - \frac{x_1 + x_2}{2} = 0 \\
x_1 - \frac{x_2 + x_3}{2} = 0
\end{cases} \tag{31}$$
Therefore $x_1 = x_2 = x_3$, but this point is clearly not an equilibrium of the Kohonen map.

4.2.2 Illustration of the behavior of distortion measure

We will see that if one draws data with a uniform distribution on the segment $[0,1]$ and then computes the minimum of the distortion, this minimum is always on a discontinuity point. The more observations one has, the more discontinuities there are, but the global function looks more and more regular.
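As a numerical check of the theoretical difference of section 4.2.1, the following sketch works out the uniform case $f = 1$ on $[0,1]$. It is only an illustration under stated assumptions: the function names are hypothetical, the centroids are assumed ordered ($x_1 < x_2 < x_3$), and the neighborhood is taken as $\Lambda(0) = \Lambda(\pm 1) = 1$, $\Lambda(\pm 2) = 0$, as read off equations (29). It iterates equations (29) to a fixed point, then evaluates at that point the extra boundary terms of equations (30) and a finite-difference derivative of the closed-form theoretical distortion.

```python
def second_moment(c, lo, hi):
    """Closed form of the integral of (c - w)^2 over [lo, hi] (uniform density)."""
    return ((hi - c) ** 3 - (lo - c) ** 3) / 3.0


def theoretical_distortion(x1, x2, x3):
    """V(x) for f = 1 on [0, 1], assuming x1 < x2 < x3."""
    a = (x1 + x2) / 2.0  # boundary between C_1 and C_2
    b = (x2 + x3) / 2.0  # boundary between C_2 and C_3
    # x1 is paid over C_1 and C_2, x2 over all cells, x3 over C_2 and C_3
    return 0.5 * (second_moment(x1, 0.0, b)
                  + second_moment(x2, 0.0, 1.0)
                  + second_moment(x3, a, 1.0))


def som_equilibrium(iterations=100):
    """Fixed-point iteration on equations (29): each centroid becomes the mean
    of the uniform law restricted to the union of its neighboring cells."""
    x1, x2, x3 = 0.2, 0.5, 0.8  # arbitrary ordered starting point
    for _ in range(iterations):
        a, b = (x1 + x2) / 2.0, (x2 + x3) / 2.0
        x1, x2, x3 = b / 2.0, 0.5, (a + 1.0) / 2.0
    return x1, x2, x3


def extra_terms(x1, x2, x3):
    """Boundary terms that equations (30) add to equations (29), with f = 1."""
    a, b = (x1 + x2) / 2.0, (x2 + x3) / 2.0
    t1 = -0.25 * (x3 - a) ** 2
    t3 = 0.25 * (x1 - b) ** 2
    return t1, t1 + t3, t3


x_star = som_equilibrium()
h = 1e-6
fd = (theoretical_distortion(x_star[0] + h, x_star[1], x_star[2])
      - theoretical_distortion(x_star[0] - h, x_star[1], x_star[2])) / (2 * h)
print(x_star)                # close to (0.3, 0.5, 0.7)
print(extra_terms(*x_star))  # close to (-0.0225, 0.0, 0.0225)
print(fd)                    # finite-difference dV/dx1, close to -0.0225
```

Under these assumptions the SOM equilibrium is approximately $(0.3, 0.5, 0.7)$, and the gradient of $V$ there is approximately $(-0.0225, 0, 0.0225)$ rather than zero, which is consistent with the claim that the two sets of points do not coincide.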
That the empirical distortion looks increasingly regular is not surprising, since we know that the limit is derivable.

The method of simulation

Since we have no numerical algorithm to compute the exact minimum of the extended variance, we proceed by exhaustive search based on a discretization of the space of the centroids. To avoid too much computation, 0.001 is chosen as the discretization step. The following figures are obtained in the following way:

1. Simulate $n$ data $(\omega_1, \dots, \omega_n)$, chosen with a uniform law on $[0,1]$.
2. Search exhaustively, on the discretization of $D_I$, for the string which minimizes the distortion.
3. For the best string $(x_1^*, x_2^*, x_3^*)$, the graphical representations are obtained in the following way:
   • 3D representation: we keep one centroid of the triplet $(x_1^*, x_2^*, x_3^*)$ fixed, then we move the two others in a small neighborhood of their optimal positions. The level $z$ is the extended variance multiplied by the number of observations $n$.
   • 2D representation: we keep two centroids of the triplet $(x_1^*, x_2^*, x_3^*)$ fixed, then we move the last one in a small neighborhood of its optimal position. The level $z$ is the extended variance multiplied by the number of observations $n$.

The following figures show the results obtained for a number of observations $n$ equal to 10, 100 and 1000. We notice that, even for a small number of observations, the minima are always on discontinuity points.
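The simulation steps above can be sketched as follows. This is a hedged illustration, not the code used for the figures: the function names, the random seed and the neighborhood (a unit and its immediate neighbors) are assumptions, and the grid is much coarser than the step 0.001 quoted above (21 points instead of 1001) so that a pure Python exhaustive search stays fast.

```python
import itertools
import random


def distortion(x, data):
    """Empirical distortion of a 3-unit string whose neighborhood is assumed to
    cover a unit and its immediate neighbors."""
    total = 0.0
    for w in data:
        i = min(range(3), key=lambda k: (abs(x[k] - w), k))  # winning unit
        total += sum((x[j] - w) ** 2 for j in range(3) if abs(i - j) <= 1)
    return total / (2 * len(data))


def exhaustive_search(data, cells=20):
    """Try every ordered triplet x1 < x2 < x3 on a regular grid of [0, 1]."""
    grid = [k / cells for k in range(cells + 1)]
    return min(itertools.combinations(grid, 3), key=lambda x: distortion(x, data))


random.seed(0)                                   # arbitrary seed, for repeatability
data = [random.random() for _ in range(100)]     # step 1: uniform sample on [0, 1]
best = exhaustive_search(data)                   # step 2: grid search
print(best, len(data) * distortion(best, data))  # level z = n * distortion
```

Refining the grid (a larger `cells`) brings the search closer to the procedure described above, at a cost that grows with the cube of the number of grid points.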
Figure 3: Distortion measure for 10 observations (3D panels: $z$ over $(x_2, x_3)$, $(x_1, x_3)$ and $(x_1, x_2)$; 2D panels: $z$ against $x_1$, $x_2$ and $x_3$)

Figure 4: Distortion measure for 100 observations (3D panels: $z$ over $(x_2, x_3)$, $(x_1, x_3)$ and $(x_1, x_2)$; 2D panels: $z$ against $x_1$, $x_2$ and $x_3$)

Figure 5: Distortion measure for 1000 observations (3D panels: $z$ over $(x_2, x_3)$, $(x_1, x_3)$ and $(x_1, x_2)$; 2D panels: $z$ against $x_1$, $x_2$ and $x_3$)

5 Conclusion

For a finite number of observations, the Kohonen algorithm was supposed to give an approximation of the minimum of the distortion measure; but if this were the case, why could the equilibrium points of the algorithm differ from the theoretical minimum of the distortion? Moreover, we have shown that if we choose maps that almost minimize the empirical distortion, then these maps have to converge to the set of maps which minimize the theoretical distortion. But, by calculating the derivative of the theoretical distortion, we have shown that the equilibria of the Kohonen map cannot minimize this distortion in general. We illustrate this fact with an example where the minimum is always reached on discontinuity points. This fact proves that the local derivability of the distortion measure is not an important property and is not a satisfactory explanation for the behavior of the Kohonen algorithm when the number of observations is finite.

References

[1] Cottrell, M., Fort, J.C. and Pagès, G. (1998). Theoretical aspects of the SOM algorithm. Neurocomputing, 21, 119-138.
[2] Erwin, E., Obermayer, K. and Schulten, K. (1992). Self-Organizing Maps: Ordering, Convergence properties and Energy Functions. Biological Cybernetics, 67, 47-55.
[3] Fort, J.C. and Pagès, G. (1995). On the a.s. convergence of the Kohonen algorithm with a general neighborhood function. The Annals of Applied Probability, 5(4), 1177-1216.
[4] Fort, J.C., Letrémy, P. and Cottrell, M. (2002). Advantages and drawbacks of the batch Kohonen algorithm. In M. Verleysen (ed.), Proceedings of ESANN 2002 (pp. 223-230). Brussels: Di Facto.
[5] Gaenssler, P. and Stute, W. (1979). Empirical processes: A survey of results for independent and identically distributed random variables. The Annals of Probability, 7(2), 193-243.
[6] Graepel, T., Burger, M. and Obermayer, K. (1997). Phase transition in stochastic self organizing maps. Physical Review E, 56, 3876-3890.
[7] Kohonen, T. (1995). Self-Organizing Maps. Springer Series in Information Sciences, 30. New York: Springer-Verlag.
[8] Kohonen, T. (1991). Artificial neural networks, 2. Amsterdam: North Holland.
[9] Kohonen, T. (1999). Comparison of SOM point densities based on different criteria. Neural Computation, 11, 2081-2095.
[10] Pollard, D. (1981). Strong consistency of k-means clustering. The Annals of Statistics, 9(1), 135-140.
[11] Rynkiewicz, J. (2005). Consistency of a least extended variance estimator (in French). Comptes rendus de l'Académie des Sciences, I(345), 133-136.
