Learning by random walks in the weight space of the Ising perceptron
Haiping Huang¹ and Haijun Zhou¹,²

¹ Key Laboratory of Frontiers in Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China
² Kavli Institute for Theoretical Physics China, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China

(Dated: November 19, 2018)

Several variants of a stochastic local search process for constructing the synaptic weights of an Ising perceptron are studied. In this process, binary patterns are sequentially presented to the Ising perceptron and are then learned as the synaptic weight configuration is modified through a chain of single- or double-weight flips within the compatible weight configuration space of the earlier learned patterns. This process is able to reach a storage capacity of α ≈ 0.63 for pattern length N = 101 and α ≈ 0.41 for N = 1001. If in addition a relearning process is exploited, the learning performance is further improved to a storage capacity of α ≈ 0.80 for N = 101 and α ≈ 0.42 for N = 1001. We found that, for a given learning task, the solutions constructed by the random walk learning process are separated by a typical Hamming distance, which decreases with the constraint density α of the learning task; at a fixed value of α, the width of the Hamming distance distribution decreases with N.

Keywords: neuronal networks (theory), disordered systems (theory), stochastic search, analysis of algorithms

I. INTRODUCTION

A single-layered feed-forward network of neurons, referred to as a perceptron, is an elementary building block of complex neural networks. It is also one of the basic structures for learning and memory [1]. In a perceptron, N input neurons (units) are connected to a single output unit by synapses of continuous or discrete-valued synaptic weights. The learning task is to set the weight values for these N synapses such that an extensive number M = αN of input patterns are correctly classified (see Fig. 1a). The parameter α ≡ M/N is called the constraint density. An assignment of these weights is referred to as a solution if the perceptron correctly classifies all the input patterns with this weight assignment. Compared with perceptrons with real-valued synaptic weights, Ising perceptrons, whose synaptic weights are binary, are much simpler for large-scale electronic implementations and more robust against noise. An Ising perceptron is also relevant in real neural systems, as the synaptic weight between two neurons actually takes bounded values and has a limited number of synaptic states [2, 3]. On the other hand, training a real-valued perceptron is easy (e.g., the Minover algorithm [4] and the AdaTron algorithm [5]), but training an Ising perceptron is known to be an NP-complete problem [6]. Given αN input patterns, the computation time needed to find a solution may grow exponentially with the number of weights N in the worst case. A complete enumeration of all 2^N possible weight states is only feasible for small systems up to N = 25 [7–10]. In recent years, research on efficient heuristic algorithms has been rather active [6, 11–17].
If the number M of input patterns is too large, a perceptron will be unable to correctly classify all of them, no matter how the synaptic weights are modified. This is a phase transition phenomenon of the solution space of the perceptron. In the case that the M input binary patterns are sampled uniformly and randomly from the set of all binary patterns, the maximal value α_s of the constraint density α, the storage capacity at which a solution still exists, has been calculated by statistical physics methods. For the continuous perceptron subject to the spherical constraint, Gardner and Derrida found that α_s = 2 [18, 19]. In the thermodynamic limit of N → ∞, the continuous perceptron therefore cannot correctly classify more than 2N random input patterns. When the synaptic weights are restricted to binary values, α_s was predicted to be 0.83 by Krauth and Mézard using the first-step replica-symmetry-broken spin-glass theory [20]. This prediction was confirmed by numerical simulations of small-size systems (plus an extrapolation to large N) [7, 9, 21]. The theoretically predicted storage capacity α_s represents the upper limit of the constraint density α achievable by any learning strategy.

As the constraint density α increases, it is expected that the solution space of the Ising perceptron breaks into a huge number of disjoint ergodic components [22]. Solutions from different components are significantly different. One can define a connected component of the weight space as a cluster of solutions in which any two solutions are connected by a path of successive single-weight flips [23, 24]. These solution clusters are separated by weight configurations that only correctly classify a subset of the input patterns. These partial solutions act as dynamical traps for local search algorithms and make the learning task hard.

An adaptive genetic algorithm was suggested by Köhler in 1990, which could reach α ≃ 0.7 for systems of N = 255 [11]. Simulated annealing techniques were used by Horner [22], but critical slowing down of the search process was observed, due to the very rugged energy landscape of the problem. Simulated annealing was also used to study the statistical structure of the energy landscape of the Ising perceptron; the analysis of the distribution of distances between global minima obtained by simulated annealing for small α indicated that the distance distribution becomes a delta function in the thermodynamic limit [25]. Taking advantage of the fact that efficient algorithms exist for the real-valued perceptron, an alternative approach was to clip the trained real-valued weights of the continuous perceptron into binary values [13, 26–29]. Not all synaptic weights can be correctly specified by clipping, however, and for those uncertain weights, complete enumeration was then adopted. A message-passing algorithm was developed by Braunstein and Zecchina for the Ising perceptron [15], which was able to reach α ≃ 0.7 for N ≥ 1000. The efficiency of this belief-propagation algorithm was later conjectured to be due to the existence of a sub-exponential number of large solution clusters in the weight space [24]. An on-line learning algorithm inspired by this belief-propagation algorithm was also studied [16], in which hidden discrete internal states are added to the synaptic weights.
In real neural systems, the microscopic mechanism of perceptronal learning is the Hebbian rule of synaptic modification (spike-timing-dependent synaptic plasticity may be exploited; see, e.g., Refs. [30, 31]). The learning processes in biological perceptronal systems are expected to be much simpler than the various sophisticated learning processes of artificial perceptrons. Two other important aspects of biological perceptron systems are (i) the patterns to be classified are usually read into the system in a sequential order, so they are learned one by one, and (ii) when a new pattern is being learned, there are biological mechanisms which reactivate old learned patterns; such recalling processes help to prevent old patterns from being forgotten as new patterns are learned (see, e.g., the experimental investigation of Ref. [32]). Motivated by these biological considerations, we investigate in this paper a simple sequential learning mechanism, namely synaptic-weight space random walking. In this random walking mechanism, the αN patterns are introduced into the system in a randomly permuted sequential order, and a random walk of single- or double-weight flips is performed until each newly added pattern is correctly classified (learned). The previously learned patterns are not allowed to be misclassified in later stages of the learning process. We perform extensive numerical simulations on several variants of this simple sequential local learning rule and find that this mechanism performs well on systems of N ∼ 10³ neurons or less.

FIG. 1: (Color online) Sketch of the Ising perceptron and the single-weight random walking process in the corresponding weight space. (a) N input units (open circles) feed directly to a single output unit (solid circle). A binary input pattern (ξ^μ_1, ξ^μ_2, ..., ξ^μ_N) of length N is mapped through a sign function to a binary output σ^μ, i.e., σ^μ = sgn( Σ_{i=1}^N J_i ξ^μ_i ). The set of N binary synaptic weights {J_i} is regarded as a solution of the perceptron problem if the output σ^μ = σ^μ_0 for each of the M = αN input patterns μ ∈ [1, M], where σ^μ_0 is a preset binary value. (b) A solution space random walking path (indicated by arrows). An open circle represents a configuration that satisfies the first m+1 input patterns, while a black circle and a gray circle represent, respectively, a configuration that satisfies the first m and the first m−1 input patterns. An edge between two configurations means that these two configurations are related by a single-weight flip.

The paper is organized as follows. The Ising perceptron learning problem is defined in more detail in Sec. II. Several strategies of learning by random walks are presented in Sec. III. In Sec. IV, an experimental study of the learning algorithms is carried out; the overlap distribution of solutions as well as the performances of different local search algorithms are reported. A summary and discussion are given in Sec. V. Sequential random walk search algorithms were recently investigated in various combinatorial satisfaction problems (see, e.g., Refs. [33–35]).
The present work adds evidence that the solution space random walking mechanism, although very simple and easy to implement, is able to solve many nontrivial instances of a given complex learning or constraint satisfaction problem.

II. THE RANDOM CLASSIFICATION PROBLEM

For the Ising perceptron depicted schematically in Fig. 1a, N input units are connected to a single output unit by N synapses of weight J_i = ±1 (i = 1, 2, ..., N). The perceptron tries to learn M = αN associations {ξ^μ, σ^μ_0} (μ = 1, 2, ..., M), where ξ^μ ≡ (ξ^μ_1, ξ^μ_2, ..., ξ^μ_N) is an input pattern with ξ^μ_i = ±1, and σ^μ_0 = ±1 is the desired classification of input pattern μ. Given the input pattern ξ^μ, the actual output σ^μ of the perceptron is

    σ^μ = sgn( Σ_{i=1}^N J_i ξ^μ_i ).   (1)

The perceptron can modify its synaptic weight configuration {J_i} ≡ (J_1, J_2, ..., J_N) to achieve complete classification, i.e., σ^μ = σ^μ_0 for each of the M input patterns. The solution space of the Ising perceptron is composed of all the weight configurations {J_i} that satisfy σ^μ_0 Σ_i J_i ξ^μ_i > 0 for μ = 1, 2, ..., M. For the random Ising perceptron problem studied in this paper, each of the M input binary patterns ξ^μ is sampled uniformly and randomly from the set of all 2^N binary patterns of length N, and the classification σ^μ_0 is equal to ±1 with equal probability. For N sufficiently large, the solution space of such a model system is non-empty as long as α < 0.83 [20]. To construct such a solution configuration {J_i}, however, is quite a non-trivial task. A more stringent learning problem is to find a weight configuration {J_i} such that, for each input pattern ξ^μ,

    σ^μ_0 ( Σ_i J_i ξ^μ_i ) / √N ≥ κ,   (2)

where κ > 0 is a preset parameter [20]. The most efficient way of solving this constraint satisfaction problem appears to be the message-passing algorithm of Refs. [15, 16].

One can perform a gauge transform ξ^μ_i → ξ^μ_i σ^μ_0 on each input pattern. Under this gauge transform, each desired output becomes σ^μ_0 = 1. Without loss of generality, in the remaining part of this paper we will assume σ^μ_0 = 1 for every input pattern μ. Considering the case of N being odd, we define the stability field of a pattern μ as

    h^μ = Σ_{i=1}^N J_i ξ^μ_i.   (3)

To ensure the local stability of input pattern μ under changes of the weight configuration {J_i}, in analogy to Eq. (2), we introduce a stability parameter Δ ≥ 1 and require that h^μ ≥ Δ for each μ. Input patterns with h^μ ≥ 3 are stable against a single-weight flip. For the single-weight flipping processes of the next section, the input patterns with h^μ = 1 are referred to as barely learned patterns, as these patterns may become misclassified after the weight configuration makes a single flip. Similarly, for the double-weight flipping process of the next section, the input patterns with h^μ = 1 or h^μ = 3 are referred to as barely learned patterns.

III. LEARNING BY RANDOM WALKS

Random walk processes were used in a series of works [33, 34, 36–39] to find solutions of constraint satisfaction problems. They were also used as tools to study the solution space structure of these constraint satisfaction problems [33, 34, 40]. Various local search strategies have been developed to improve the performance of random walk stochastic searching [41–43].
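To make the definitions of Sec. II concrete, the following minimal NumPy sketch computes the stability fields of Eq. (3) and picks out the barely learned patterns. The function and variable names are ours, not from the paper; patterns are assumed already gauge-transformed, so pattern μ is correctly classified exactly when h^μ > 0.

```python
import numpy as np

def stability_fields(J, patterns):
    """h^mu = sum_i J_i * xi^mu_i for every stored pattern, Eq. (3)."""
    return patterns @ J

rng = np.random.default_rng(0)
N, M = 101, 40                       # N odd, so every h^mu is an odd integer
J = rng.choice([-1, 1], size=N)      # a random Ising weight configuration
patterns = rng.choice([-1, 1], size=(M, N))

h = stability_fields(J, patterns)
learned = h > 0                      # correctly classified patterns
barely_swf = h == 1                  # "barely learned" w.r.t. single-weight flips
barely_dwf = (h == 1) | (h == 3)     # "barely learned" w.r.t. double-weight flips
```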
The random walk learning strategies of this work follow the SEQSAT algorithm of Ref. [34]. An initial weight configuration (J^(0)_1, J^(0)_2, ..., J^(0)_N) is randomly generated at time t = 0. The first pattern ξ^1 is applied to the Ising perceptron. If this pattern is correctly classified under the initial weight configuration (i.e., h^1 > 0), then the second pattern ξ^2 is applied; otherwise the weight configuration is adjusted by a sequence of elementary local changes until ξ^1 is correctly classified. The algorithm then proceeds with the second pattern ξ^2, the third pattern ξ^3, etc., in a sequential order.

An elementary local change of the weight configuration is achieved either by a single-weight flip (SWF) or by a double-weight flip (DWF). Suppose at time t the weight configuration is {J^(t)} ≡ (J^(t)_1, J^(t)_2, ..., J^(t)_N), and suppose this configuration correctly classifies the first m input patterns ξ^μ (μ = 1, ..., m) but not the (m+1)-th pattern ξ^{m+1}. The configuration {J_i} will keep wandering in the solution space of the first m patterns until a configuration that correctly classifies ξ^{m+1} is reached (see Fig. 1b).

In the SWF protocol, a set A(t) of allowed single-weight flips is constructed based on the current configuration {J^(t)} and the m learned patterns. A(t) includes all integer positions j ∈ [1, N] with the property that the single-weight flip J^(t)_j → −J^(t)_j does not cause any barely learned pattern μ ∈ [1, m] (whose h^μ = 1) to be misclassified. At time t′ = t + 1/N an integer position j is chosen uniformly and randomly from the set A(t), and the weight configuration is changed to {J^(t′)} such that J^(t′)_i = J^(t)_i if i ≠ j and J^(t′)_j = −J^(t)_j. Obviously the new configuration {J^(t′)} also satisfies all of the first m patterns.

The DWF protocol is very similar to the SWF protocol, with the only difference being that the allowed set A(t) at time t contains ordered pairs of integer positions (i, j) with i < j. This set of ordered pairs can also be easily constructed. If, with respect to the configuration {J^(t)}, there are no barely learned patterns (whose stability field h^μ = 1 or 3) among the first m learned patterns, then A(t) contains all the N(N−1)/2 ordered pairs of integers (i, j) with 1 ≤ i < j ≤ N. Otherwise, randomly choose a barely learned pattern, say m₁ ∈ [1, m], and for each integer i ∈ [1, N] with the property that J^(t)_i ξ^{m₁}_i < 0, do the following: (1) if J^(t)_i ξ^μ_i < 0 for all the other barely learned patterns μ, add all the ordered pairs (i, j) with j ∈ [i+1, N] into the set A(t); (2) otherwise, add into the set A(t) all the ordered pairs (i, j) such that the integer j ∈ [i+1, N] satisfies J^(t)_j ξ^μ_j < 0 for all those barely learned patterns μ ∈ [1, m] with J^(t)_i ξ^μ_i > 0.

The waiting time Δt_{m+1} for satisfying the (m+1)-th pattern is defined as the total elapsed time from first satisfying the m-th pattern to first satisfying the (m+1)-th pattern, and the total time T_{m+1} of satisfying the first m+1 patterns is simply T_{m+1} = Σ_{μ=1}^{m+1} Δt_μ. One time unit corresponds to N elementary local changes of the weight configuration.
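The SWF protocol is compact enough to sketch directly. A single flip of J_j changes h^μ by −2 J_j ξ^μ_j, so a pattern with h^μ = 1 is lost exactly when J_j ξ^μ_j = +1; the allowed set A(t) therefore consists of the positions j with J_j ξ^μ_j = −1 for every barely learned pattern μ. The sketch below (helper names are ours; `max_steps` plays the role of Δt_max · N elementary flips) implements this; the DWF allowed set of pairs is built analogously and is omitted here.

```python
import numpy as np

def swf_allowed_set(J, learned):
    """Positions j whose single flip keeps all m learned patterns satisfied.

    learned: m x N array of the already-satisfied (gauge-transformed)
    patterns.  Only patterns with h^mu = 1 can be lost by a single flip."""
    h = learned @ J
    barely = learned[h == 1]
    if barely.size == 0:
        return np.arange(J.size)            # every single flip is allowed
    safe = np.all(barely * J < 0, axis=0)   # J_j*xi^mu_j = -1 for all barely mu
    return np.flatnonzero(safe)

def swf_learn_next(J, learned, target, rng, max_steps):
    """Random walk of single-weight flips inside the solution space of
    `learned` until `target` is satisfied (a sketch of the SWF protocol).
    Modifies J in place; returns False on isolation or timeout."""
    steps = 0
    while target @ J <= 0:                  # h^{m+1} <= 0: not yet learned
        allowed = swf_allowed_set(J, learned)
        if allowed.size == 0 or steps >= max_steps:
            return False                    # isolated configuration or timeout
        j = rng.choice(allowed)
        J[j] = -J[j]                        # stays inside the solution space
        steps += 1
    return True
```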
The random walk searching process stops if all the M input patterns have been correctly classified, or if the last visited weight configuration becomes an isolated point (i.e., the set A(t) becomes empty after a new pattern is included into the set of learned patterns), or if the last waiting time Δt_{m+1} exceeds a preset maximal time value Δt_max, which is set to Δt_max = 1000 in the present work.

The SWF and DWF random walk processes mentioned above are very simple to implement, and they do not overcome any barriers in the energy landscape of the perceptron learning problem. However, as we demonstrate in the next section, their performances are quite remarkable for problem instances with pattern length N ≤ 10³.

The SWF process, as a local search algorithm, will get stuck in one of the enormous number of metastable states when all the weights become frozen (here we identify a synaptic weight as being frozen if flipping its value causes at least one of the learned patterns to be misclassified), at a constraint density value much smaller than the theoretical threshold value of 0.83. The DWF process will also get jammed if the weight configuration becomes frozen with respect to all double-weight flips. To further improve the achievable storage capacity of the SWF and DWF learning processes, a simple relearning strategy is added to the random walk searching. The basic idea of the relearning strategy is: if some learned patterns are strongly hindering the learning of new patterns, we first ignore them and proceed to learn a number of new patterns; after that, we learn the ignored patterns again and hope they can all be correctly classified. In the present work, we implement the relearning strategy in the following way (see also the sketch below). Suppose that when the m-th input pattern is presented to the Ising perceptron, the SWF or the DWF process is unable to learn it within the maximal waiting time Δt_max. We then remove all the k barely learned patterns μ ∈ [1, m−1] with h^μ = 1 from the list of learned patterns, and proceed to learn the patterns μ ∈ [m, m+k−1] in a sequential manner (stage 1). If the SWF or the DWF process succeeds in learning these k patterns, we then return to learn the k previously removed patterns again in a sequential manner (stage 2). If this relearning succeeds, we proceed with the patterns with index μ ≥ m+k. If the attempt fails either at stage 1 or at stage 2, we stop the whole random walk learning process or start another trial by removing all the learned patterns. In practice, we find that the relearning process has a high probability of succeeding in both stage 1 and stage 2 if α is not too large and the pattern length is of order 10³ or less.
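The following sketch of the relearning bookkeeping reuses `swf_learn_next` from the previous listing. It is our simplification under stated assumptions: the `pending` counter tracks stages 1 and 2, and the optional restart-from-scratch branch mentioned above is omitted.

```python
import numpy as np

def learn_with_relearning(patterns, N, rng, max_steps):
    """Sequential SWF learning with the relearning heuristic (a sketch).
    Returns the number of patterns classified when the process stops."""
    J = rng.choice([-1, 1], size=N)
    learned, queue = [], list(patterns)
    pending = 0                          # patterns remaining in stages 1 and 2
    while queue:
        target = queue.pop(0)
        L = np.asarray(learned).reshape(-1, N)
        if swf_learn_next(J, L, target, rng, max_steps):
            learned.append(target)
            pending = max(pending - 1, 0)
            continue
        if pending > 0:                  # failure inside stage 1 or 2: give up
            break
        h = L @ J
        set_aside = [p for p, hp in zip(learned, h) if hp == 1]
        learned = [p for p, hp in zip(learned, h) if hp != 1]
        k = len(set_aside)
        if k == 0:                       # nothing to set aside: simply stuck
            break
        # stage 1: the failed pattern plus the next k-1 fresh patterns;
        # stage 2: the k set-aside patterns again; then the rest of the queue.
        queue = [target] + queue[:k - 1] + set_aside + queue[k - 1:]
        pending = 2 * k
    return len(learned)
```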
IV. RESULTS

Figure 2 shows the simulation results for several random walk learning strategies. For each learning strategy and each pattern length N, sets of random input patterns (ξ^1, ξ^2, ..., ξ^M), each pattern of length N, are generated. The random walk learning strategy is then applied to each set of patterns until it stops, at which point we record the number m of correctly classified patterns and calculate the achieved storage capacity α = m/N. The mean values of α are reported in Fig. 2. It appears that the storage capacity of all four learning strategies decreases with N roughly as a power law α ∝ N^(−γ).

At each value of N, the SWF strategy has the worst performance, while the DWF strategy with relearning has the best performance. The SWF strategy is able to reach a storage capacity of α ≈ 0.36 for systems of N = 101 and α ≈ 0.17 for systems of N = 1001. These values are much less than the theoretical storage capacity of α ≈ 0.83. However, the DWF strategy performs much better, with a capacity of α ≈ 0.63 for N = 101 and α ≈ 0.41 for N = 1001. In real neural systems, perceptronal learning of elementary patterns probably does not involve too many neuronal cells, and a value of N ∼ 10² might be common. For perceptronal systems with N ∼ 10²–10³, the SWF and DWF strategies can be regarded as efficient. If relearning is introduced into the random walk learning strategies, the performance can be further improved. For the DWF strategy with relearning, we find that the storage capacity is α ≈ 0.80 for N = 101 and α ≈ 0.42 for N = 1001. Relearning is indeed a biologically relevant strategy in the perceptronal learning of real neural systems [32, 44]. As a comparison, for problem instances of pattern length N = 1001, the belief-propagation-inspired learning strategy of Baldassi and coauthors [16] achieves α ≈ 0.47 when the number K of internal states of their algorithm is set to K = 40. This storage capacity decreases to α ≈ 0.36 at K = 20 and to α ≈ 0.10 at K = 10.

FIG. 2: (Color online) Comparison of the performances of several random walk search strategies. The achieved storage capacity α, averaged over many independent runs (100 for the smallest N and 10 for the largest N), is shown as a function of the pattern length N. The solid lines are power-law fits of the form α ∝ N^(−γ), with γ = 0.302, 0.347, 0.198, 0.241 for SWF, SWF with relearning, DWF, and DWF with relearning, respectively.

For the same set of input patterns (ξ^1, ξ^2, ..., ξ^m), different runs of the SWF strategy or the DWF strategy lead to different solution configurations. The similarity between two solutions can be measured by an overlap value q defined by

    q = (1/N) Σ_{i=1}^N J_i J′_i,   (4)

where (J_1, ..., J_N) and (J′_1, ..., J′_N) are two solutions. The reduced Hamming distance d_H between two solutions is related to the overlap q by d_H = (1 − q)/2. The typical value of the overlap at constraint density α ∼ 0.83 is predicted to be q ≈ 0.56 according to the replica-symmetric calculation [20], suggesting that solutions are still far away from each other (with a reduced Hamming distance d_H ≈ 0.22) as α approaches the theoretical storage capacity α_s.

Figure 3 shows the histogram P(d_H) of reduced Hamming distances d_H between different solutions found by the DWF strategy for a single problem instance with constraint density α and pattern length N. Different pattern lengths of N = 101, 501, 1001 are used, and 100 different solutions are constructed by repeatedly running the DWF process. Other problem instances show similar properties.
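A minimal sketch of the overlap of Eq. (4) and the corresponding reduced Hamming distance; the function names and the `run_dwf` placeholder are ours, not from the paper.

```python
import numpy as np

def overlap(J1, J2):
    """Overlap q = (1/N) sum_i J_i J'_i of Eq. (4), for J, J' in {-1,+1}^N."""
    return float(np.mean(J1 * J2))

def reduced_hamming_distance(J1, J2):
    """d_H = (1 - q)/2: the fraction of weights on which J and J' differ."""
    return (1.0 - overlap(J1, J2)) / 2.0

# A Fig. 3-style histogram would collect all pairwise distances over repeated
# runs (run_dwf stands for one full DWF learning run returning a solution):
# solutions = [run_dwf(patterns) for _ in range(100)]
# d = [reduced_hamming_distance(a, b)
#      for i, a in enumerate(solutions) for b in solutions[i + 1:]]
```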
We notice from Fig. 3 that, at the same value of α, the histograms P(d_H) for different N are peaked at almost the same d_H value, but the width of P(d_H) decreases as N is enlarged. Such a behavior was observed earlier in Ref. [25] on a slightly modified Ising perceptron problem. The solutions obtained by the DWF strategy therefore have a typical level of similarity. Figure 3 also demonstrates that, as the constraint density increases, the histograms P(d_H) shift to smaller d_H values, suggesting that the level of similarity between the DWF-constructed solutions increases with α. At α = 0.693 the typical reduced Hamming distance is d_H ≈ 0.224, compatible with the mean-field predictions [20]. Similar results are obtained for solutions found by the SWF strategy.

FIG. 3: (Color online) Histograms of reduced Hamming distances between solutions found by DWF on a single problem instance of M input patterns of length N. 100 solutions are constructed for each of the five instances with (M, N) = (45, 101), (70, 101), (125, 501), (225, 501), (250, 1001), respectively. The solid lines are Gaussian fits to the histograms.

In all our simulations, we do not observe double or multiple peaks in the histogram P(d_H). The results of these and our other numerical simulations (not shown) are consistent with the proposal that, for a given problem instance, the solutions obtained by the random walking strategies are members of the same (large) solution cluster of the solution space [24, 25, 45]. Unlike the random K-satisfiability problem, the random Q-coloring problem, or some locked constraint satisfaction problems [46–48], the solution space organization of the Ising perceptron problem is still not very clear. Kabashima and co-authors [24] suggested that for α < 0.83 the solution space of the Ising perceptron problem is equally dominated by exponentially many clusters of vanishing entropy and a sub-exponential number of large clusters. Our simulation results are compatible with this proposal, but more work needs to be done to clarify the solution space structure of the random Ising perceptron problem.

The total time T_{αN} used by the DWF strategy to correctly classify the first αN patterns of a problem instance with N = 1001 is shown in Fig. 4 as a function of α. The learning time grows almost linearly with α for α < 0.4. As the constraint density α becomes large, different solution communities are expected to form in the solution space [47]. Then, as α further increases to a certain larger value, the time needed for the random walk process to escape from a solution community may exceed the preset maximal waiting time of Δt_max = 1000, and the DWF process will then stop. The achieved storage capacity α can be increased to some extent if we make Δt_max larger, but the search process will become more and more viscous as the solution space of the problem becomes more and more heterogeneous and complex [34]. We do not attempt to calculate the jamming point of the random walk searching processes.

FIG. 4: (Color online) The learning time T_{αN} as a function of α for three problem instances of N = 1001.
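As a closing note on the results section, the power-law form α ∝ N^(−γ) used for the solid lines of Fig. 2 can be fitted by ordinary least squares on a log-log scale. The sketch below uses only the two DWF capacities quoted in the text, so the exponent it prints differs slightly from the paper's γ = 0.198, which was fitted over the full data set of Fig. 2.

```python
import numpy as np

# alpha ~ N**(-gamma)  =>  log(alpha) = log(c) - gamma * log(N)
N_vals = np.array([101.0, 1001.0])      # pattern lengths quoted for DWF
alpha_vals = np.array([0.63, 0.41])     # corresponding DWF capacities

slope, _intercept = np.polyfit(np.log(N_vals), np.log(alpha_vals), 1)
print(f"fitted exponent gamma ~ {-slope:.3f}")   # ~ 0.19 from these two points
```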
V. DISCUSSION

We proposed several stochastic learning strategies for the Ising perceptron problem based on the idea of solution space random walking [34]. Our simulation results in Fig. 2 demonstrated that the DWF strategy is able to correctly classify ≥ 0.4N random input patterns of length N for N ≤ 1001. If a simple relearning strategy is added to the DWF strategy, the learning performance is further improved. The learning time of the DWF strategy grows roughly linearly with the number of input patterns. This work suggests that learning by local and random changes of synaptic weights is efficient for perceptronal systems with N ≈ 10²–10³ neurons. These local sequential learning strategies may be exploited in some biological perceptronal systems. In real neuronal systems, the number N of neurons involved in an elementary pattern classification task may be of the order of N ∼ 10¹–10³.

The solutions obtained by the DWF strategy for a given perceptronal learning task are separated by a typical Hamming distance, which decreases as the number of input patterns increases (Fig. 3). However, solutions are still far away from each other even near the critical capacity. We suspect that, for the problem instances studied in this paper, either the solution space of the problem is ergodic as a whole, or the solutions reached by the DWF strategies all belong to the same solution cluster of the solution space.

In our random walking setting, once all weights are frozen, particularly for SWF, the current pattern with a negative stability field can no longer be learned, since the current weight configuration is isolated in the weight space (we denote this weight configuration as a completely frozen solution). Fortunately, DWF is able to go on even if all single weights are frozen, since flipping certain pairs of weights may still be permitted from a configuration in which no single weight is allowed to be flipped. If such flippable pairs of weights do not exist, DWF will get trapped, and the configuration is isolated once again. Actually, as the constraint density α increases, many such isolated solutions will show up, and SWF or DWF, working by single- or double-weight flips, is not capable of crossing the energy barriers separating the isolated solutions from the connected ones. This limitation can be bypassed to some extent using the relearning strategy, which helps the search escape from these small clusters and keeps SWF or DWF exploring the large cluster composed of exponentially many solutions. For small α, the replica-symmetric ansatz is believed to give a good description of the solution space of the Ising perceptron [25]. Close to α_s, point-like clusters form and searching for the compatible weights becomes more difficult [48]. It is desirable to have a theoretical understanding of the structural evolution of the solution space of the random Ising perceptron problem. How the dynamics of stochastic local search algorithms is influenced by the solution space structure of the random Ising perceptron is an important open issue.

Another interesting problem is the generalization problem, where the input-output associations are no longer uncorrelated but the desired outputs are given by a teacher perceptron [17, 49–51]. The student perceptron tries to learn the rule provided by the teacher.
After a sufficient number of examples are presented to the student perceptron, the student's weights should match those of the teacher; the network then undergoes a first-order transition from poor to perfect generalization [49, 50]. It is worthwhile to extend the current random walk strategies to the generalization problem in Ising perceptrons.

Acknowledgments

This work was partially supported by the National Science Foundation of China (Grant numbers 10774150 and 10834014) and the China 973-Program (Grant number 2007CB935903).

[1] A. Engel and C. Van den Broeck. Statistical Mechanics of Learning. Cambridge University Press, Cambridge, England, 2001.
[2] C. C. H. Petersen, R. C. Malenka, R. A. Nicoll, and J. J. Hopfield. Proc. Natl. Acad. Sci. USA, 95:4732, 1998.
[3] D. H. O'Connor, G. M. Wittenberg, and S. S.-H. Wang. Proc. Natl. Acad. Sci. USA, 102:9679, 2005.
[4] W. Krauth and M. Mézard. J. Phys. A, 20:L745, 1987.
[5] J. K. Anlauf and M. Biehl. Europhys. Lett., 10:687, 1989.
[6] W. Senn and S. Fusi. Phys. Rev. E, 71:061907, 2005.
[7] W. Krauth and M. Opper. J. Phys. A, 22:L519, 1989.
[8] H. Gutfreund and Y. Stein. J. Phys. A, 23:2613, 1990.
[9] B. Derrida, R. B. Griffiths, and A. Prügel-Bennett. J. Phys. A, 24:4907, 1991.
[10] I. Kocher and R. Monasson. J. Phys. A, 25:367, 1992.
[11] H. M. Köhler. J. Phys. A, 23:L1265, 1990.
[12] H. Köhler, S. Diederich, W. Kinzel, and M. Opper. Z. Phys. B, 78:333, 1990.
[13] L. Reimers, M. Bouten, and B. Van Rompaey. J. Phys. A, 29:6247, 1996.
[14] G. Milde and S. Kobe. J. Phys. A, 30:2349, 1997.
[15] A. Braunstein and R. Zecchina. Phys. Rev. Lett., 96:030201, 2006.
[16] C. Baldassi, A. Braunstein, N. Brunel, and R. Zecchina. Proc. Natl. Acad. Sci. USA, 104:11079, 2007.
[17] C. Baldassi. J. Stat. Phys., 136:902, 2009.
[18] E. Gardner. J. Phys. A, 21:257, 1988.
[19] E. Gardner and B. Derrida. J. Phys. A, 21:271, 1988.
[20] W. Krauth and M. Mézard. J. Phys. (France), 50:3057, 1989.
[21] E. Gardner and B. Derrida. J. Phys. A, 22:1983, 1989.
[22] H. Horner. Z. Phys. B, 86:291, 1992.
[23] J. Ardelius and L. Zdeborová. Phys. Rev. E, 78:040101(R), 2008.
[24] T. Obuchi and Y. Kabashima. J. Stat. Mech., P12014, 2009.
[25] J. F. Fontanari and R. Köberle. J. Phys. (France), 51:1403, 1990.
[26] M. Bouten, L. Reimers, and B. Van Rompaey. Phys. Rev. E, 58:2378, 1998.
[27] R. W. Penney and D. Sherrington. J. Phys. A, 26:6173, 1993.
[28] R. W. Penney and D. Sherrington. J. Phys. A, 26:3995, 1993.
[29] D. Malzahn. Phys. Rev. E, 61:6261, 2000.
[30] R. Kempter, W. Gerstner, and J. Leo van Hemmen. Phys. Rev. E, 59:4498, 1999.
[31] P. D'Souza, S.-C. Liu, and R. H. R. Hahnloser. Proc. Natl. Acad. Sci. USA, 107:4722–4727, 2010.
[32] B. A. Kuhl, A. T. Shah, S. DuBrow, and A. D. Wagner. Nature Neurosci., 13:501–506, 2010.
[33] J. Ardelius, E. Aurell, and S. Krishnamurthy. J. Stat. Mech., P10012, 2007.
[34] H. Zhou. Eur. Phys. J. B, 73:617, 2010.
[35] H. Zhou and H. Ma. Phys. Rev. E, 80:066108, 2009.
[36] S. Cocco, R. Monasson, A. Montanari, and G. Semerjian. arXiv:cs.CC/0302003, 2003.
[37] G. Semerjian and R. Monasson. Phys. Rev. E, 67:066103, 2003.
[38] W. Barthel, A. K. Hartmann, and M. Weigt. Phys. Rev. E, 67:066104, 2003.
[39] F. Altarelli, R. Monasson, G. Semerjian, and F. Zamponi. 2008.
[40] F. Krzakala and J. Kurchan. Phys. Rev. E, 76:021122, 2007.
[41] S. Seitz, M. Alava, and P. Orponen. J. Stat. Mech., P06006, 2005.
[42] J. Ardelius and E. Aurell. Phys. Rev. E, 74:037702, 2006.
[43] M. Alava, J. Ardelius, E. Aurell, P. Kaski, S. Krishnamurthy, P. Orponen, and S. Seitz. Proc. Natl. Acad. Sci. USA, 105:15253, 2008.
[44] S. Fusi and L. F. Abbott. Nature Neurosci., 10:485–493, 2007.
[45] G. Biroli, R. Monasson, and M. Weigt. Eur. Phys. J. B, 14:551, 2000.
[46] F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, and L. Zdeborová. Proc. Natl. Acad. Sci. USA, 104:10318–10323, 2007.
[47] H. Zhou. 2009. [International Journal of Modern Physics B (in press)].
[48] L. Zdeborová and M. Mézard. J. Stat. Mech., P12004, 2008.
[49] G. Györgyi. Phys. Rev. A, 41:7097, 1990.
[50] H. Sompolinsky, N. Tishby, and H. S. Seung. Phys. Rev. Lett., 65:1683, 1990.
[51] H. Horner. Z. Phys. B, 87:371, 1992.