Self Organizing Map algorithm and distortion measure
We study the statistical meaning of the minimization of distortion measure and the relation between the equilibrium points of the SOM algorithm and the minima of distortion measure. If we assume that the observations and the map lie in a compact Euc…
Authors: Joseph Rynkiewicz (CES, Samos)
Joseph Rynkiewicz
SAMOS/MATISSE, Université de Paris-I, 90, rue de Tolbiac, 75013 Paris, France. Tel. and Fax: (+33) 144078705. joseph.rynkiewicz@univ-paris1.fr

Abstract

We study the statistical meaning of the minimization of the distortion measure and the relation between the equilibrium points of the SOM algorithm and the minima of the distortion measure. If we assume that the observations and the map lie in a compact Euclidean space, we prove the strong consistency of the map which almost minimizes the empirical distortion. Moreover, after calculating the derivatives of the theoretical distortion measure, we show that the points minimizing this measure and the equilibria of the Kohonen map do not match in general. We illustrate, with a simple example, how this occurs.

Keywords: Distortion measure, asymptotic convergence, consistency, Self Organizing Map, empirical processes, Glivenko-Cantelli class, uniform law of large numbers, general neighborhood function.

1 Introduction

The distortion, or distortion measure, is certainly the most popular criterion for assessing the quality of the classification of a Kohonen map (see Kohonen [8]). This measure yields an assessment of model properties with respect to the data and overcomes the absence of a cost function in the SOM algorithm. Moreover, the SOM algorithm has been proven to be an approximation of the gradient of the distortion measure (see Graepel et al. [6]). Although the Kohonen map is proven to converge, in some cases, to equilibrium points when the number of observations tends to infinity, the learning dynamic cannot be described by a gradient descent of the distortion measure for an infinite number of observations (see for example Erwin et al. [2]).
Moreover, Kohonen [9] has shown in some examples for the one-dimensional grid that the model vector produced by the SOM algorithm does not exactly coincide with the optimum of the distortion measure. This property seems paradoxical: on the one hand, the SOM seems to minimize the distortion for a finite number of observations, but on the other hand this behavior no longer holds in the limit, i.e. for an infinite number of observations.

In this paper we investigate the relationship between the SOM and the distortion measure. Firstly, we prove the strong consistency of the estimator minimizing the empirical distortion. More precisely, we prove that the maps almost minimizing the empirical distortion measure converge almost surely to the set of maps minimizing the theoretical distortion measure. Secondly, we calculate the derivatives of the theoretical distortion and deduce from this calculation that the points minimizing the theoretical distortion generally differ from the equilibrium points of the SOM, whatever the dimension of the grid. Finally, we illustrate, with a simple example, why an apparent contradiction between the discrete and the continuous case occurs.

2 Distortion measure

We assume in the sequel that the observations $\omega$ are independent and identically distributed (i.i.d.) and are of dimension $d$. We assume that the observations lie in a compact space; therefore, without loss of generality, they lie in the compact space $[0,1]^d$. We also assume that these observations follow the probability law $P$ having a density with respect to the Lebesgue measure of $\mathbb{R}^d$; this density is assumed to be bounded by a constant $B$. In the sequel we call centroid a vector of $[0,1]^d$ representing a class of observations $\omega$. We adopt in the sequel the notation of Cottrell et al. [1].
Definition 2.1 For $e \in \mathbb{N}^*$, $e \leq d$, we consider a set of units indexed by $I \subset \mathbb{Z}^e$ with the neighborhood function $\Lambda$ from $I - I := \{i - j,\ i, j \in I\}$ to $[0,1]$ satisfying $\Lambda(k) = \Lambda(-k)$ and $\Lambda(0) = 1$. Note that such a neighborhood function can be discrete or continuous.

Definition 2.2 Denote by $\|\cdot\|$ the Euclidean norm. Let
$$D_I^\delta := \left\{ x := (x_i)_{i \in I} \in \left([0,1]^d\right)^I \text{ such that } \|x_i - x_j\| \geq \delta \text{ if } i \neq j \right\}$$
be the set of centroids $x_i$ separated by at least $\delta$.

Definition 2.3 If $x := (x_i)_{i \in I}$ is the set of units, the Voronoi tessellation $(C_i(x))_{i \in I}$ is defined by
$$C_i(x) := \left\{ \omega \in [0,1]^d \ \middle|\ \|x_i - \omega\| < \|x_k - \omega\| \text{ if } k \neq i \right\}$$
In case of equality, $\omega$ is assigned to a cell $C_i(x)$ according to the lexicographical order. Conversely, the index of the Voronoi cell of an observation $\omega$ is defined by $C_x^{-1}(\omega) = i \in I$ if and only if $\omega \in C_i(x)$.

Definition 2.4 The distortion measures the quality of a quantization with respect to the neighborhood structure. It is defined as follows:

• Distortion for the discrete case (empirical distortion): we assume that the observations are in a finite set $\Omega = \{\omega_1, \cdots, \omega_n\}$ and are uniformly distributed on this set. Then the distortion measure is
$$V_n(x) = \frac{1}{2n} \sum_{i \in I} \sum_{\omega \in C_i(x)} \sum_{j \in I} \Lambda(i - j)\, \|x_j - \omega\|^2$$

• Distortion for the continuous case (theoretical distortion): let us assume that $P$ is the distribution of the observations. The theoretical distortion measure is
$$V(x) = \frac{1}{2} \sum_{i, j \in I} \Lambda(i - j) \int_{C_i(x)} \|x_j - \omega\|^2\, dP(\omega)$$
As mentioned before, the distribution $P$ has a density with respect to the Lebesgue measure bounded by a constant $B > 0$.

The distortion measure is well known not to be continuous with respect to the centroids $(x_i)_{i \in I}$ in the discrete case. Indeed, if an observation lies exactly on a hyperplane separating two centroids, shifting one of the centroids will imply a jump for the distortion.
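The empirical distortion of Definition 2.4, and the jump just described, can be sketched in a few lines of Python. This is a minimal illustration and not code from the paper: the function name `empirical_distortion`, the matrix encoding `lam[i, j]` for $\Lambda(i - j)$, and the three-unit string used in the example are assumptions made here. Ties are sent to the lowest index by `argmin`, which stands in for the lexicographical rule.

```python
import numpy as np


def empirical_distortion(x, omega, lam):
    """V_n(x) = 1/(2n) * sum_i sum_{w in C_i(x)} sum_j Lambda(i-j) ||x_j - w||^2.

    x     : (|I|, d) array of centroids
    omega : (n, d) array of observations
    lam   : (|I|, |I|) array with lam[i, j] = Lambda(i - j)  (assumed encoding)
    """
    x = np.asarray(x, dtype=float)
    omega = np.asarray(omega, dtype=float)
    # Squared distances between every observation and every centroid: shape (n, |I|).
    d2 = ((omega[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    # Index of the Voronoi cell of each observation (ties go to the lowest index).
    winners = d2.argmin(axis=1)
    # Row `winners[t]` of lam weights the distances of observation t to all centroids.
    return float((lam[winners] * d2).sum() / (2 * len(omega)))


# Illustrative 3-unit string: each unit is a neighbor of itself and of adjacent units.
lam = np.array([[1.0, 1.0, 0.0],
                [1.0, 1.0, 1.0],
                [0.0, 1.0, 1.0]])
omega = np.array([[0.3]])          # a single observation near the cell boundary
eps = 1e-6

# Observation falls in C_1 when x_1 is slightly to the right of 0.2 ...
v_right = empirical_distortion([[0.2 + eps], [0.4], [0.8]], omega, lam)
# ... and in C_2 when x_1 is slightly to the left of 0.2.
v_left = empirical_distortion([[0.2 - eps], [0.4], [0.8]], omega, lam)
print(v_right, v_left)
```

In this example an arbitrarily small move of $x_1$ changes the cell of the observation, and the distortion jumps by roughly $\frac{1}{2}\left(\Lambda(1) - \Lambda(2)\right)\|x_3 - \omega\|^2 = 0.125$, since the term involving $x_3$ enters the sum only when the observation belongs to the second cell.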
So, the distortion is not continuous and, in general, a map which realizes the minimum of the empirical distortion does not exist. However, if we consider sequences of maps $x^n$ such that the distortion $V_n(x^n)$ is sufficiently close to its minimum, then we will show that such sequences $x^n$ converge almost surely to the set of maps which reach the minimum of the theoretical distortion measure $V(x)$.

3 Consistency of the almost minimum of distortion

This demonstration is an extended version of Rynkiewicz [11]. It follows the same line as Pollard [10], so we first show a uniform law of large numbers and then deduce the strong consistency property.

3.1 Uniform law of large numbers

Let the family of functions be
$$G := \left\{ g_x(\omega) := \sum_{j \in I} \Lambda\left(C_x^{-1}(\omega) - j\right) \|x_j - \omega\|^2 \ \text{ for } x \in D_I^\delta \right\} \tag{1}$$
In order to show the uniform law of large numbers, we have to prove that, $P_n$ denoting the empirical measure of the observations,
$$\sup_{x \in D_I^\delta} \left| \int g_x(\omega)\, dP_n(\omega) - \int g_x(\omega)\, dP(\omega) \right| \xrightarrow[n \to \infty]{a.s.} 0 \tag{2}$$
since, for every probability measure $Q$ on $[0,1]^d$:
$$\int g_x(\omega)\, dQ(\omega) = \int \sum_{j \in I} \Lambda\left(C_x^{-1}(\omega) - j\right) \|x_j - \omega\|^2\, dQ(\omega) = \sum_{i, j \in I} \Lambda(i - j) \int_{C_i(x)} \|x_j - \omega\|^2\, dQ(\omega) \tag{3}$$
which is twice the distortion computed under $Q$ (that is, $2V_n(x)$ for $Q = P_n$ and $2V(x)$ for $Q = P$).

Now, a sufficient condition to verify equation (2) is the following (see Gaenssler and Stute [5]): $\forall \varepsilon > 0$, $\forall x^0 \in D_I^\delta$, a neighborhood $S(x^0)$ of $x^0$ exists such that
$$\int g_{x^0}(\omega)\, dP(\omega) - \varepsilon < \int \left( \inf_{x \in S(x^0)} g_x(\omega) \right) dP(\omega) \leq \int \left( \sup_{x \in S(x^0)} g_x(\omega) \right) dP(\omega) < \int g_{x^0}(\omega)\, dP(\omega) + \varepsilon \tag{4}$$

First we prove the following result, using a technique similar to the proof of Lemma 11 of Fort and Pagès [3].

Lemma 3.1 Let $x \in D_I^\delta$ and let $\lambda$ be the Lebesgue measure on $[0,1]^d$. Denote by $E^c$ the complement of a set $E$ in $[0,1]^d$ and by $|I|$ the cardinality of the set $I$.
For $0 < \alpha < \frac{\delta}{2}$, let
$$U_i^\alpha(x) = \left\{ \omega \in [0,1]^d \ /\ \exists y \in D_I^\delta,\ x_j = y_j \text{ if } j \neq i,\ \|x_i - y_i\| < \alpha \text{ and } \omega \in C_i^c(y) \cap C_i(x) \right\}$$
be the set of $\omega$ that change Voronoi cell when the centroid $x_i$ moves a distance of at most $\alpha$. Then
$$\sup_{x \in D_I^\delta} \lambda\left(U_i^\alpha(x)\right) < (|I| - 1) \left( \frac{2\alpha}{\delta} + \alpha \right) \left(\sqrt{2}\right)^{d-1} \tag{5}$$

Proof. Let $x$ and $y \in D_I^\delta$ satisfy the assumption of Lemma 3.1 and let $j \neq i \in I$. In order to prove the inequality, we have to bound the measure of the $\omega$ belonging to the cells $C_i(x)$ and $C_j(y)$ simultaneously, since $(C_i(y))^c = \bigcup_{j \in I,\, j \neq i} C_j(y)$.

Denote by $(z|t)$ the inner product between $z$ and $t$, and let $\vec{n}_x^{ij} := \frac{x_j - x_i}{\|x_j - x_i\|}$. The parameter vector $x + \gamma_1 \vec{n}_x^{ij}$ is the vector with all components equal to those of $x$, except the component $i$, which is equal to $x_i + \gamma_1 \vec{n}_x^{ij}$. Since $\|y_i - x_i\| < \alpha$, we have $\left(y_i - x_i \,\middle|\, \vec{n}_x^{ij}\right) = \gamma_1$ with $|\gamma_1| \leq \alpha < \frac{\delta}{2}$. As the Lebesgue measure (of $\mathbb{R}^{d-1}$) of all plane sections of $[0,1]^d$ is bounded by $\left(\sqrt{2}\right)^{d-1}$, when the centroid $x_i$ moves by $\gamma_1 \vec{n}_x^{ij}$, the Lebesgue measure of the $\omega$ changing Voronoi cell is bounded by $\frac{|\gamma_1|}{2} \left(\sqrt{2}\right)^{d-1}$, so
$$\lambda\left( C_j\left(x + \gamma_1 \vec{n}_x^{ij}\right) \cap C_i(x) \right) < \alpha \left(\sqrt{2}\right)^{d-1} \tag{6}$$
Moreover, we note that $x + \gamma_1 \vec{n}_x^{ij}$ belongs to $D_I^{\delta/2}$.

On the other hand, let $y_i - x_i - \gamma_1 \vec{n}_x^{ij} := \gamma_2 \vec{\tau}_x^{ij}$, with $\|\vec{\tau}_x^{ij}\| = 1$, be the component of the movement from $x_i$ to $y_i$ that is orthogonal to $\vec{n}_x^{ij}$, i.e. such that $\left(\vec{n}_x^{ij} \,\middle|\, \vec{\tau}_x^{ij}\right) = 0$. As shown in Figure 1 in dimension 2, for all $x' = x + \gamma_1 \vec{n}_x^{ij} \in D_I^{\delta/2}$, the Lebesgue measure of the $\omega$ changing Voronoi cell when the centroid $x'_i$ moves by $\gamma_2 \vec{\tau}_x^{ij}$ is bounded by $\frac{2\alpha}{\delta} \left(\sqrt{2}\right)^{d-1}$.
Therefore, we have
$$\lambda\left( C_j\left(x + \gamma_1 \vec{n}_x^{ij} + \gamma_2 \vec{\tau}_x^{ij}\right) \cap C_i(x) \right) < \alpha \left(\sqrt{2}\right)^{d-1} + \frac{2\alpha}{\delta} \left(\sqrt{2}\right)^{d-1} \tag{7}$$

[Figure 1: hatched area $< \frac{2\gamma_2}{\delta} < \sqrt{2} \times \frac{2\alpha}{\delta}$.]

As this inequality is independent of $x$, we finally get
$$\sup_{x \in D_I^\delta} \lambda\left( C_j\left(x + \gamma_1 \vec{n}_x^{ij} + \gamma_2 \vec{\tau}_x^{ij}\right) \cap C_i(x) \right) < \left( \alpha + \frac{2\alpha}{\delta} \right) \left(\sqrt{2}\right)^{d-1} \tag{8}$$
and then
$$\sup_{x \in D_I^\delta} \lambda\left(U_i^\alpha(x)\right) < (|I| - 1) \left( \alpha + \frac{2\alpha}{\delta} \right) \left(\sqrt{2}\right)^{d-1}$$

Now consider $x^0 \in D_I^\delta$ and $S(x^0)$ a neighborhood of $x^0$ included in a sphere of radius $\alpha$. Let $W(x^0)$ be the set of $\omega$ remaining in their Voronoi cells when $x^0$ goes to any $x \in S(x^0)$. For all $\omega \in W(x^0)$ we have
$$\begin{aligned}
\inf_{x \in S(x^0)} g_x(\omega) &\geq g_{x^0}(\omega) - \sum_{j \in I} \Lambda\left(C_{x^0}^{-1}(\omega) - j\right) \left( \|x_j^0 - \omega\|^2 - \inf_{x \in S(x^0)} \|x_j - \omega\|^2 \right) \\
&\geq g_{x^0}(\omega) - \sum_{j \in I} \left( \|x_j^0 - \omega\|^2 - \inf_{x \in S(x^0)} \|x_j - \omega\|^2 \right)
\end{aligned} \tag{9}$$
For all $\omega \in [0,1]^d$ and for a small enough $\alpha$, we have
$$\|x_j^0 - \omega\|^2 - \inf_{x \in S(x^0)} \|x_j - \omega\|^2 < \frac{\varepsilon}{2B|I|}$$
so
$$\int_{W(x^0)} \sum_{j \in I} \left( \|x_j^0 - \omega\|^2 - \inf_{x \in S(x^0)} \|x_j - \omega\|^2 \right) dP(\omega) < \frac{\varepsilon}{2}$$
and
$$\int_{W(x^0)} \left( g_{x^0}(\omega) - \inf_{x \in S(x^0)} g_x(\omega) \right) dP(\omega) < \frac{\varepsilon}{2} \tag{10}$$

Now, let $W(x^0)^c$ be the set of $\omega$ changing Voronoi cell when the centroids go from $x^0$ to $x \in S(x^0)$. For all $\omega \in W(x^0)^c$ there exist two different indices $i$ and $j$ such that $\omega \in C_i(x^0)$ and $\omega \in C_j(x)$. Let us define a sequence $x^k$, $k \in \{0, \cdots, |I|\}$, by sequentially changing the components of $x^0$ into the components of $x$, so that $x^{|I|} = x$ (the $x^k$ are the intermediate configurations transforming $x^0$ into $x$). Then there exists a moment $l \in \{0, \cdots, |I| - 1\}$ such that $\omega \in C_i(x^l)$ and $\omega \notin C_i(x^{l+1})$. Indeed, if this were not the case, one could find a sequence $x^k$, $k \in \{0, \cdots, |I|\}$, with $x^{|I|} = x$ such that $\omega \in C_i(x^{|I|}) = C_i(x)$, which would be a contradiction.
So $W(x^0)^c$ is included in the set of $\omega$ which change Voronoi cell when we sequentially replace the components of $x^0$ by the components of $x$. If $\alpha < \frac{\delta}{4}$, then when the components $x_i^0$ of $x^0$ move sequentially to the components $x_i$ of $x$, each intermediate configuration stays in $D_I^{\delta/2}$. Since, for all $i \in I$, $\|x_i - \omega\|^2$ is bounded by 1 on $[0,1]^d$, Lemma 3.1 ensures that
$$\int_{W(x^0)^c} g_x(\omega)\, dP(\omega) < B\, |I| (|I| - 1) \left( \frac{4\alpha}{\delta} + \alpha \right) \left(\sqrt{2}\right)^{d-1} \tag{11}$$
Finally, if we choose a small enough $\alpha$ such that $B\, |I| (|I| - 1) \left( \frac{4\alpha}{\delta} + \alpha \right) \left(\sqrt{2}\right)^{d-1} < \frac{\varepsilon}{2}$, we get
$$\int_{[0,1]^d} g_{x^0}(\omega)\, dP(\omega) - \varepsilon < \int_{[0,1]^d} \left( \inf_{x \in S(x^0)} g_x(\omega) \right) dP(\omega) \tag{12}$$
In exactly the same way, for a small enough $\alpha$, we get
$$\int_{[0,1]^d} \left( \sup_{x \in S(x^0)} g_x(\omega) \right) dP(\omega) < \int_{[0,1]^d} g_{x^0}(\omega)\, dP(\omega) + \varepsilon \tag{13}$$
Therefore, the sufficient condition for the uniform law of large numbers holds.

3.2 Consistency

We want to show the consistency of the procedure which consists in choosing maps $(x^n)_{n \in \mathbb{N}^*}$ that almost minimize the empirical distortions $(V_n(x))_{n \in \mathbb{N}^*}$ in $D_I^\delta$. Let
$$\bar{\chi}_n^\beta := \left\{ x \in D_I^\delta \text{ such that } V_n(x) < \inf_{y \in D_I^\delta} V_n(y) + \frac{1}{\beta(n)} \right\} \tag{14}$$
be the set of estimators that almost minimize the empirical distortion, with $\beta(n)$ being a strictly positive function such that $\lim_{n \to +\infty} \beta(n) = \infty$. Let $\bar{\chi} = \arg\min_{x \in D_I^\delta} V(x)$ be the set of maps minimizing the theoretical distortion, possibly reduced to one map. It is easy to verify that the function $x \longmapsto V(x)$ is continuous on $D_I^\delta$, so for every neighborhood $\mathcal{N}$ of $\bar{\chi}$, there exists $\eta(\mathcal{N}) > 0$ such that
$$\forall x \in D_I^\delta \setminus \mathcal{N},\quad V(x) > \min_{y \in D_I^\delta} V(y) + \eta(\mathcal{N}) \tag{15}$$
To show the strong consistency, it is enough to prove that for all neighborhoods $\mathcal{N}$ of $\bar{\chi}$ we have
$$\lim_{n \to \infty} \bar{\chi}_n^\beta \overset{a.s.}{\subset} \mathcal{N} \iff \lim_{n \to \infty} V\left(\bar{\chi}_n^\beta\right) - V(\bar{\chi}) \overset{a.s.}{\leq} \eta(\mathcal{N}) \tag{16}$$
with $V(E) - V(F) := \sup \{ V(x) - V(y) \text{ for } x \in E \text{ and } y \in F \}$.
By definition, $V_n\left(\bar{\chi}_n^\beta\right) \overset{a.s.}{\leq} V_n(\bar{\chi}) + \frac{1}{\beta(n)}$, and the uniform law of large numbers yields $\lim_{n \to \infty} V_n(\bar{\chi}) - V(\bar{\chi}) \overset{a.s.}{=} 0$; we then get $\lim_{n \to \infty} V_n\left(\bar{\chi}_n^\beta\right) \overset{a.s.}{\leq} V(\bar{\chi}) + \frac{\eta(\mathcal{N})}{2}$. Moreover, we have $\lim_{n \to \infty} V\left(\bar{\chi}_n^\beta\right) - V_n\left(\bar{\chi}_n^\beta\right) \overset{a.s.}{=} 0$ and
$$\lim_{n \to \infty} V\left(\bar{\chi}_n^\beta\right) - \frac{\eta(\mathcal{N})}{2} \overset{a.s.}{<} \lim_{n \to \infty} V_n\left(\bar{\chi}_n^\beta\right) \overset{a.s.}{\leq} V(\bar{\chi}) + \frac{\eta(\mathcal{N})}{2} \tag{17}$$
Finally, $\lim_{n \to \infty} V\left(\bar{\chi}_n^\beta\right) - V(\bar{\chi}) \overset{a.s.}{\leq} \eta(\mathcal{N})$, which proves the strong consistency of the maps that almost minimize the empirical distortion.

4 Differences between the SOM algorithm and distortion measure

Using the result of the previous section, we can investigate the differences between the minima of the empirical distortion and the equilibria of the SOM algorithm. Namely, if these equilibria were maps almost minimizing the empirical distortion criterion, they would converge, as the number of observations increases, to the minimum of the theoretical distortion measure; we will show that this is not generally the case. In the next section we compute the gradient of the function $V(x)$ and show that, even in multidimensional cases, the equilibria of the SOM algorithm and the minima of $V(x)$ do not match. These results generalize the results of Kohonen [9] obtained for unidimensional cases.

4.1 Derivability of V(x)

Let us now write
$$D_I := \left\{ x = \left( x_i = \left(x_i^1, \cdots, x_i^d\right) \right)_{i \in I} \in \left([0,1]^d\right)^I \ \middle|\ \forall k \in \{1, \cdots, d\},\ \left|x_i^k - x_j^k\right| > 0 \text{ if } i \neq j \right\} \tag{18}$$
For $i$ and $j \in I$, denote by $\vec{n}_x^{ij}$ the vector $\frac{x_j - x_i}{\|x_j - x_i\|}$ and let
$$M_x^{ij} := \left\{ u \in \mathbb{R}^d \ /\ \left\langle u - \frac{x_i + x_j}{2},\ x_i - x_j \right\rangle = 0 \right\} \tag{19}$$
be the perpendicular bisector hyperplane of $x_i$ and $x_j$. Let us denote by $\lambda_x^{ij}(\omega)$ the Lebesgue measure on $M_x^{ij}$. Fort and Pagès [3] have shown the following lemma:

Lemma 4.1 Let $\phi$ be an $\mathbb{R}$-valued continuous function on $[0,1]^d$. For $x \in D_I$, let $\Phi_i(x) := \int_{C_i(x)} \phi(\omega)\, d\omega$. We also denote by $(e_1, \cdots, e_d)$ the canonical basis of $\mathbb{R}^d$.
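The function $\Phi_i$ is continuously derivable on $D_I$ and, $\forall i \neq j$, $l \in \{1, \cdots, d\}$,
$$\frac{\partial \Phi_i}{\partial x_j^l}(x) = \int_{\bar{C}_i(x) \cap \bar{C}_j(x)} \phi(\omega) \left\{ \frac{1}{2} \left\langle \vec{n}_x^{ij}, e_l \right\rangle + \frac{1}{\|x_j - x_i\|} \times \left\langle \frac{x_i + x_j}{2} - \omega,\ e_l \right\rangle \right\} \lambda_x^{ij}(\omega)\, d\omega \tag{20}$$
Moreover, if we write $\frac{\partial \Phi_i}{\partial x_j}(x) := \left( \frac{\partial \Phi_i}{\partial x_j^1}(x), \cdots, \frac{\partial \Phi_i}{\partial x_j^d}(x) \right)^T$, then
$$\frac{\partial \Phi_i}{\partial x_i}(x) = - \sum_{j \in I,\, j \neq i} \frac{\partial \Phi_i}{\partial x_j}(x) \tag{21}$$

Then, we deduce the theorem:

Theorem 4.2 If $P(d\omega) = f(\omega)\, d\omega$, where $f$ is continuous on $[0,1]^d$, then $V$ is continuously derivable on $D_I$ and we have
$$\begin{aligned}
\frac{\partial V}{\partial x_i}(x) ={}& \sum_{k \in I} \Lambda(i - k) \int_{C_k(x)} (x_i - \omega)\, P(d\omega) \\
&+ \frac{1}{2} \sum_{j \in I} \sum_{k \in I,\, k \neq i} \left( \Lambda(k - j) - \Lambda(i - j) \right) \\
&\quad \times \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_j - \omega\|^2 \left( \frac{1}{2} \vec{n}_x^{ki} + \frac{1}{\|x_k - x_i\|} \times \left( \frac{x_i + x_k}{2} - \omega \right) \right) f(\omega)\, \lambda_x^{ki}(\omega)\, d\omega
\end{aligned} \tag{22}$$
where $\frac{\partial V}{\partial x_i}(x) = \left( \frac{\partial V}{\partial x_i^1}(x), \cdots, \frac{\partial V}{\partial x_i^d}(x) \right)^T$.

Proof. As the function $V(x)$ is continuous on $D_I$, we only have to show that the partial derivatives exist and are continuous. Denote by $h_i^l \in \mathbb{R}^{|I| \times d}$ the vector with all components null except the component corresponding to $x_i^l$, which is $h > 0$, so that the centroid $x_i$ becomes $x_i + h e_l$. Then
$$\begin{aligned}
\frac{V(x + h_i^l) - V(x)}{h} ={}& \frac{1}{h} \left[ \frac{1}{2} \sum_{k, j \in I,\ k, j \neq i} \Lambda(k - j) \int_{C_k(x + h_i^l)} \|x_j - \omega\|^2 P(d\omega) - \frac{1}{2} \sum_{k, j \in I,\ k, j \neq i} \Lambda(k - j) \int_{C_k(x)} \|x_j - \omega\|^2 P(d\omega) \right] \\
&+ \frac{1}{h} \left[ \frac{1}{2} \sum_{j \in I,\ j \neq i} \Lambda(i - j) \int_{C_i(x + h_i^l)} \|x_j - \omega\|^2 P(d\omega) - \frac{1}{2} \sum_{j \in I,\ j \neq i} \Lambda(i - j) \int_{C_i(x)} \|x_j - \omega\|^2 P(d\omega) \right] \\
&+ \frac{1}{2} \sum_{k \in I,\ k \neq i} \Lambda(k - i)\, \frac{1}{h} \left[ \int_{C_k(x + h_i^l)} \|x_i + h e_l - \omega\|^2 P(d\omega) - \int_{C_k(x)} \|x_i - \omega\|^2 P(d\omega) \right] \\
&+ \frac{1}{2}\, \frac{1}{h} \left[ \int_{C_i(x + h_i^l)} \|x_i + h e_l - \omega\|^2 P(d\omega) - \int_{C_i(x)} \|x_i - \omega\|^2 P(d\omega) \right]
\end{aligned} \tag{23}$$
where the first two lines of the sum concern centroids different from $x_i$ and the last two lines the variation involving $x_i$.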
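Now, by applying Lemma 4.1 to the first two lines of the sum, with $\omega^l$ denoting the $l$-th coordinate of $\omega$, we get:
$$\begin{aligned}
\lim_{h \to 0} \frac{V(x + h_i^l) - V(x)}{h} ={}& \frac{1}{2} \sum_{k, j \in I,\ k, j \neq i} \Lambda(k - j) \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_j - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}_x^{ki}, e_l \right\rangle + \frac{1}{\|x_i - x_k\|} \times \left\langle \frac{x_k + x_i}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda_x^{ki}(\omega)\, d\omega \\
&- \frac{1}{2} \sum_{k, j \in I,\ k, j \neq i} \Lambda(i - j) \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_j - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}_x^{ki}, e_l \right\rangle + \frac{1}{\|x_i - x_k\|} \times \left\langle \frac{x_k + x_i}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda_x^{ki}(\omega)\, d\omega \\
&+ \lim_{h \to 0} \frac{1}{2} \sum_{k \in I,\ k \neq i} \Lambda(k - i)\, \frac{1}{h} \left[ \int_{C_k(x + h_i^l)} \left( \|x_i - \omega\|^2 + 2h\left(x_i^l - \omega^l\right) + o(h) \right) P(d\omega) - \int_{C_k(x)} \|x_i - \omega\|^2 P(d\omega) \right] \\
&+ \lim_{h \to 0} \frac{1}{2}\, \frac{1}{h} \left[ \int_{C_i(x + h_i^l)} \left( \|x_i - \omega\|^2 + 2h\left(x_i^l - \omega^l\right) + o(h) \right) P(d\omega) - \int_{C_i(x)} \|x_i - \omega\|^2 P(d\omega) \right]
\end{aligned} \tag{24}$$
Then, by applying Lemma 4.1 to the last two lines, we get:
$$\begin{aligned}
\lim_{h \to 0} \frac{V(x + h_i^l) - V(x)}{h} ={}& \frac{1}{2} \sum_{k, j \in I,\ k, j \neq i} \left( \Lambda(k - j) - \Lambda(i - j) \right) \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_j - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}_x^{ki}, e_l \right\rangle + \frac{1}{\|x_i - x_k\|} \times \left\langle \frac{x_k + x_i}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda_x^{ki}(\omega)\, d\omega \\
&+ \frac{1}{2} \sum_{k \in I,\ k \neq i} \Lambda(k - i) \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_i - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}_x^{ki}, e_l \right\rangle + \frac{1}{\|x_i - x_k\|} \times \left\langle \frac{x_k + x_i}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda_x^{ki}(\omega)\, d\omega \\
&- \frac{1}{2} \sum_{k \in I,\ k \neq i} \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_i - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}_x^{ki}, e_l \right\rangle + \frac{1}{\|x_i - x_k\|} \times \left\langle \frac{x_k + x_i}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda_x^{ki}(\omega)\, d\omega \\
&+ \sum_{k \in I} \Lambda(k - i) \int_{C_k(x)} \left(x_i^l - \omega^l\right) P(d\omega)
\end{aligned} \tag{25}$$
Finally,
$$\begin{aligned}
\lim_{h \to 0} \frac{V(x + h_i^l) - V(x)}{h} = \frac{\partial V}{\partial x_i^l}(x) ={}& \frac{1}{2} \sum_{k, j \in I,\ k \neq i} \left( \Lambda(k - j) - \Lambda(i - j) \right) \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_j - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}_x^{ki}, e_l \right\rangle + \frac{1}{\|x_i - x_k\|} \times \left\langle \frac{x_k + x_i}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda_x^{ki}(\omega)\, d\omega \\
&+ \sum_{k \in I} \Lambda(k - i) \int_{C_k(x)} \left(x_i^l - \omega^l\right) P(d\omega)
\end{aligned} \tag{26}$$

If we assume that the minimum of the distortion measure is reached in the interior of $D_I$ (i.e. that no centroids collapse), we deduce from the previous results that it does not match the equilibrium of the Kohonen algorithm. Indeed, a point $x^* := (x_i^*)_{i \in I}$ that is asymptotically stable for the Kohonen algorithm will verify, for all $i \in I$:
$$\sum_{k \in I} \Lambda(i - k) \int_{C_k(x^*)} \left(x_i^* - \omega\right) P(d\omega) = 0 \tag{27}$$
This equation is valid even for the batch algorithm (see Fort, Cottrell and Letrémy [4]). Such a point can match a minimum of the limit distortion only if
$$\frac{1}{2} \sum_{j \in I} \sum_{k \in I,\, k \neq i} \left( \Lambda(k - j) - \Lambda(i - j) \right) \times \int_{\bar{C}_k(x) \cap \bar{C}_i(x)} \|x_j - \omega\|^2 \left( \frac{1}{2} \vec{n}_x^{ki} + \frac{1}{\|x_k - x_i\|} \times \left( \frac{x_i + x_k}{2} - \omega \right) \right) f(\omega)\, \lambda_x^{ki}(\omega)\, d\omega = 0 \tag{28}$$
but, in general, this term is not null.

4.2 Example of a Kohonen string with 3 centroids

The previous section has shown that the minimum of the distortion measure does not match the equilibrium of the Kohonen algorithm. We will illustrate this with a simple example. The classical explanation (see Kohonen [7]) of local potential minimization by the Kohonen algorithm is far from being satisfactory. Actually, it seems that the minima of the distortion measure always occur on a discontinuity point, where the function is not derivable.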
To illustrate this, let a Kohonen string be on the segment $[0,1]$ (see Figure 2), with a discrete neighborhood. In the equations of Section 4.2.1, each unit is a neighbor, with weight 1, of itself and of its adjacent units only.

[Figure 2: Kohonen string with three centroids $x_1$, $x_2$, $x_3$ on the segment $[0,1]$.]

4.2.1 The theoretical difference

The equilibrium of the SOM algorithm is reached on points $x$ verifying
$$\begin{aligned}
\int_{C_1(x)} (x_1 - \omega)\, P(d\omega) + \int_{C_2(x)} (x_1 - \omega)\, P(d\omega) &= 0 \\
\int_{C_1(x)} (x_2 - \omega)\, P(d\omega) + \int_{C_2(x)} (x_2 - \omega)\, P(d\omega) + \int_{C_3(x)} (x_2 - \omega)\, P(d\omega) &= 0 \\
\int_{C_2(x)} (x_3 - \omega)\, P(d\omega) + \int_{C_3(x)} (x_3 - \omega)\, P(d\omega) &= 0
\end{aligned} \tag{29}$$
but the minima of the distortion are reached on points $x$ verifying
$$\begin{aligned}
\frac{\partial V}{\partial x_1}(x) ={}& \int_{C_1(x)} (x_1 - \omega)\, P(d\omega) + \int_{C_2(x)} (x_1 - \omega)\, P(d\omega) - \frac{1}{4} \left( x_3 - \frac{x_1 + x_2}{2} \right)^2 f\left( \frac{x_1 + x_2}{2} \right) = 0 \\
\frac{\partial V}{\partial x_2}(x) ={}& \int_{C_1(x)} (x_2 - \omega)\, P(d\omega) + \int_{C_2(x)} (x_2 - \omega)\, P(d\omega) + \int_{C_3(x)} (x_2 - \omega)\, P(d\omega) \\
&- \frac{1}{4} \left( x_3 - \frac{x_1 + x_2}{2} \right)^2 f\left( \frac{x_1 + x_2}{2} \right) + \frac{1}{4} \left( x_1 - \frac{x_3 + x_2}{2} \right)^2 f\left( \frac{x_3 + x_2}{2} \right) = 0 \\
\frac{\partial V}{\partial x_3}(x) ={}& \int_{C_2(x)} (x_3 - \omega)\, P(d\omega) + \int_{C_3(x)} (x_3 - \omega)\, P(d\omega) + \frac{1}{4} \left( x_1 - \frac{x_2 + x_3}{2} \right)^2 f\left( \frac{x_2 + x_3}{2} \right) = 0
\end{aligned} \tag{30}$$
If we assume, for example, that the density of the observations is uniform $U_{[0,1]}$, i.e. $f(x) = 1$ if $x \in [0,1]$, then these two sets of points have no point in common. Indeed, if a point belonged to both sets, then
$$x_3 - \frac{x_1 + x_2}{2} = 0, \qquad x_1 - \frac{x_2 + x_3}{2} = 0 \tag{31}$$
Therefore $x_1 = x_2 = x_3$, but this point is clearly not an equilibrium of the Kohonen map.

4.2.2 Illustration of the behavior of distortion measure

We will see that if one draws data with a uniform distribution on the segment $[0,1]$ and then computes the minimum of the distortion, this minimum is always on a discontinuity point. The more observations one has, the more discontinuities there are, but the global function looks more and more regular. This is not surprising, since we know that the limit is derivable.
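Returning to the theoretical difference of Section 4.2.1, the gap between conditions (29) and (30) can be checked numerically for the uniform density. The sketch below is an illustration under stated assumptions, not the author's code: the neighborhood values $\Lambda(0) = \Lambda(1) = 1$, $\Lambda(2) = 0$ are read off equations (29), all integrals are written in closed form for $f = 1$, and the function names and the batch-style fixed-point iteration used to locate the SOM equilibrium are choices made here for convenience.

```python
# Assumed neighborhood of the 3-unit string, read off equations (29).
LAMBDA = (1.0, 1.0, 0.0)


def cells(x):
    """Voronoi cells [a, b] on [0, 1] for ordered centroids x1 < x2 < x3."""
    x1, x2, x3 = x
    m12, m23 = (x1 + x2) / 2, (x2 + x3) / 2
    return [(0.0, m12), (m12, m23), (m23, 1.0)]


def distortion(x):
    """Theoretical distortion V(x) for the uniform density on [0, 1]."""
    total = 0.0
    for i, (a, b) in enumerate(cells(x)):
        for j in range(3):
            # Closed form of the integral of (x_j - w)^2 over [a, b].
            total += LAMBDA[abs(i - j)] * ((b - x[j]) ** 3 - (a - x[j]) ** 3) / 3
    return total / 2


def som_residual(x):
    """Left-hand sides of (29): sum_k Lambda(i-k) * integral over C_k of (x_i - w)."""
    c = cells(x)
    return [
        sum(LAMBDA[abs(i - k)] * (x[i] * (b - a) - (b * b - a * a) / 2)
            for k, (a, b) in enumerate(c))
        for i in range(3)
    ]


def distortion_gradient(x):
    """Left-hand sides of (30): SOM residual plus the cell-boundary terms (f = 1)."""
    x1, x2, x3 = x
    left = -0.25 * (x3 - (x1 + x2) / 2) ** 2
    right = 0.25 * (x1 - (x2 + x3) / 2) ** 2
    r = som_residual(x)
    return [r[0] + left, r[1] + left + right, r[2] + right]


def som_equilibrium(x=(0.2, 0.5, 0.8), n_iter=200):
    """Fixed-point iteration on (29): x_i <- neighborhood-weighted mean of w."""
    x = list(x)
    for _ in range(n_iter):
        c = cells(x)
        x = [
            sum(LAMBDA[abs(i - k)] * (b * b - a * a) / 2 for k, (a, b) in enumerate(c))
            / sum(LAMBDA[abs(i - k)] * (b - a) for k, (a, b) in enumerate(c))
            for i in range(3)
        ]
    return x


x_som = som_equilibrium()
grad_at_som = distortion_gradient(x_som)
print("SOM equilibrium:", x_som)
print("gradient of V there:", grad_at_som)
```

Under these assumptions the iteration settles at approximately $(0.3, 0.5, 0.7)$, where the left-hand sides of (29) vanish while the gradient of $V$ is approximately $(-0.0225, 0, 0.0225)$. The SOM equilibrium is therefore not a critical point of the theoretical distortion in this example, in line with the argument based on (31).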
The method of simulation. Since we have no numerical algorithm to compute the exact minimum of the distortion (the extended variance), we proceed by exhaustive search based on a discretization of the space of the centroids. To avoid too much computation, 0.001 is chosen as the discretization step. The following figures are obtained in the following way:

1. Simulate $n$ data $(\omega_1, \cdots, \omega_n)$, drawn from a uniform law on $[0,1]$.
2. Search exhaustively, on the discretization of $D_I$, for the string which minimizes the distortion.
3. For the best string $(x_1^*, x_2^*, x_3^*)$, the graphical representations are obtained in the following way:
 • 3D representation: we keep one centroid of the triplet $(x_1^*, x_2^*, x_3^*)$ fixed, then we move the other two around a small neighborhood of their optimal positions. The level $z$ is the extended variance multiplied by the number of observations $n$.
 • 2D representation: we keep two centroids of the triplet $(x_1^*, x_2^*, x_3^*)$ fixed, then we move the last one around a small neighborhood of its optimal position. The level $z$ is the extended variance multiplied by the number of observations $n$.

The following figures show the results obtained for a number of observations $n$ equal to 10, 100 and 1000. We notice that, even for a small number of observations, the minima are always on discontinuity points.
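The simulation steps above can be sketched as follows. This is a hedged illustration rather than the code used for the figures: the discretization step is coarsened to 0.02 (instead of 0.001) so that the exhaustive search stays cheap, the random seed and the function names are arbitrary, and the neighborhood matrix is the one assumed for the three-unit string (each unit a neighbor of itself and of its adjacent units).

```python
import itertools

import numpy as np

# Assumed neighborhood matrix of the 3-unit string: LAMBDA[i, j] = Lambda(i - j).
LAMBDA = np.array([[1.0, 1.0, 0.0],
                   [1.0, 1.0, 1.0],
                   [0.0, 1.0, 1.0]])


def string_distortion(x, omega):
    """Empirical distortion V_n(x) of a 3-centroid string for 1-D data."""
    x = np.asarray(x, dtype=float)
    d2 = (omega[:, None] - x[None, :]) ** 2      # squared distances, shape (n, 3)
    winners = d2.argmin(axis=1)                  # Voronoi cell of each observation
    return float((LAMBDA[winners] * d2).sum() / (2 * len(omega)))


def exhaustive_minimum(omega, step=0.02):
    """Step 2: search every ordered triple x1 < x2 < x3 on a regular grid."""
    grid = np.arange(0.0, 1.0 + step / 2, step)
    best_x, best_v = None, np.inf
    for x in itertools.combinations(grid, 3):
        v = string_distortion(x, omega)
        if v < best_v:
            best_x, best_v = x, v
    return best_x, best_v


def profile(best_x, omega, index, half_width=0.05, num=201):
    """2D representation: n * V_n when only centroid `index` moves around its optimum."""
    offsets = np.linspace(-half_width, half_width, num)
    values = []
    for t in offsets:
        x = list(best_x)
        x[index] += t
        values.append(len(omega) * string_distortion(x, omega))
    return best_x[index] + offsets, np.array(values)


# Step 1: simulate n observations, uniform on [0, 1] (seed chosen arbitrarily).
rng = np.random.default_rng(0)
omega = rng.uniform(0.0, 1.0, size=100)

best_x, best_v = exhaustive_minimum(omega)
xs, zs = profile(best_x, omega, index=0)
print("best string on the grid:", best_x, "n * V_n =", len(omega) * best_v)
```

Plotting `zs` against `xs` gives a one-centroid profile of the kind described for the 2D representation. Because the cell assignment of each observation changes abruptly as a centroid moves, such a profile is expected to be piecewise smooth with jumps, and a finer step brings the grid minimum closer to the true one at a cubic cost in the number of grid points.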
[Figure 3: Distortion measure for 10 observations. 3D panels: $z$ against $(x_2, x_3)$, $(x_1, x_3)$ and $(x_1, x_2)$; 2D panels: $z$ against $x_1$, $x_2$ and $x_3$.]

[Figure 4: Distortion measure for 100 observations. 3D panels: $z$ against $(x_2, x_3)$, $(x_1, x_3)$ and $(x_1, x_2)$; 2D panels: $z$ against $x_1$, $x_2$ and $x_3$.]

[Figure 5: Distortion measure for 1000 observations. 3D panels: $z$ against $(x_2, x_3)$, $(x_1, x_3)$ and $(x_1, x_2)$; 2D panels: $z$ against $x_1$, $x_2$ and $x_3$.]

5 Conclusion

For a finite number of observations, the Kohonen algorithm was supposed to give an approximation of the minimum of the distortion measure; but if this were the case, why can the equilibrium points of the algorithm be different from the theoretical minimum of distortion? Moreover, we have shown that if we choose maps that almost minimize the empirical distortion, then these maps have to converge to the set of maps which minimize the theoretical distortion. But, by calculating the derivative of the theoretical distortion, we have shown that the equilibria of the Kohonen map cannot minimize this distortion in general. We illustrate this fact with an example where the minimum is always reached on discontinuity points. This fact shows that the local derivability of the distortion measure is not an important property and is not a satisfactory explanation for the behavior of the Kohonen algorithm when the number of observations is finite.

References

[1] Cottrell, M., Fort, J.C. and Pagès, G. (1998). Theoretical aspects of the SOM algorithm. Neurocomputing, 21. 119-138.
[2] Erwin, E., Obermayer, K. and Schulten, K. (1992). Self-Organizing Maps: Ordering, Convergence properties and Energy Functions. Biological Cybernetics, 67. 47-55.
[3] Fort, J.C. and Pagès, G. (1995). On the a.s. convergence of the Kohonen algorithm with a general neighborhood function. The Annals of Applied Probability, 5(4). 1177-1216.
[4] Fort, J.C., Letrémy, P. and Cottrell, M. (2002). Advantages and drawbacks of the batch Kohonen algorithm. In M. Verleysen (ed.), Proceedings of ESANN 2002 (pp. 223-230). Brussels: Di Facto.
[5] Gaenssler, P. and Stute, W. (1979). Empirical processes: A survey of results for independent and identically distributed random variables. The Annals of Probability, 7(2). 193-243.
[6] Graepel, T., Burger, M. and Obermayer, K. (1997). Phase transition in stochastic self organizing maps. Physical Review, E(56). 3876-3890.
[7] Kohonen, T. (1995). Self-Organizing Maps. Springer Series in Information Sciences, 30. New York: Springer-Verlag.
[8] Kohonen, T. (1991). Artificial neural networks, 2. Amsterdam: North Holland.
[9] Kohonen, T. (1999). Comparison of SOM point densities based on different criteria. Neural Computation, 11. 2081-2095.
[10] Pollard, D. (1981). Strong consistency of k-means clustering. The Annals of Statistics, 9(1). 135-140.
[11] Rynkiewicz, J. (2005). Consistency of a least extended variance estimator (in French). Comptes rendus de l'Académie des Sciences, I(345). 133-136.