Learning by random walks in the weight space of the Ising perceptron
Haiping Huang¹ and Haijun Zhou¹,²

¹ Key Laboratory of Frontiers in Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China
² Kavli Institute for Theoretical Physics China, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China

(Dated: November 19, 2018)

Several variants of a stochastic local search process for constructing the synaptic weights of an Ising perceptron are studied. In this process, binary patterns are sequentially presented to the Ising perceptron and are then learned as the synaptic weight configuration is modified through a chain of single- or double-weight flips within the compatible weight configuration space of the earlier learned patterns. This process is able to reach a storage capacity of α ≈ 0.63 for pattern length N = 101 and α ≈ 0.41 for N = 1001. If in addition a relearning process is exploited, the learning performance is further improved to a storage capacity of α ≈ 0.80 for N = 101 and α ≈ 0.42 for N = 1001. We found that, for a given learning task, the solutions constructed by the random walk learning process are separated by a typical Hamming distance, which decreases with the constraint density α of the learning task; at a fixed value of α, the width of the Hamming distance distribution decreases with N.

Keywords: neuronal networks (theory), disordered systems (theory), stochastic search, analysis of algorithms

I. INTRODUCTION

A single-layered feed-forward network of neurons, referred to as a perceptron, is an elementary building block of complex neural networks. It is also one of the basic structures for learning and memory [1]. In a perceptron, N input neurons (units) are connected to a single output unit by synapses of continuous or discrete-valued synaptic weights. The learning task is to set the weight values for these N synapses such that an extensive number M = αN of input patterns are correctly classified (see Fig. 1a). The parameter α ≡ M/N is called the constraint density. An assignment of these weights is referred to as a solution if the perceptron correctly classifies all the input patterns with this weight assignment. Compared with perceptrons with real-valued synaptic weights, Ising perceptrons, whose synaptic weights are binary, are much simpler for large-scale electronic implementations and more robust against noise. An Ising perceptron is also relevant in real neural systems, as the synaptic weight between two neurons actually takes bounded values and has a limited number of synaptic states [2, 3]. On the other hand, training a real-valued perceptron is easy (e.g., the Minover algorithm [4] and the AdaTron algorithm [5]), but training an Ising perceptron is known to be an NP-complete problem [6]. Given αN input patterns, the computation time needed to find a solution may grow exponentially with the number of weights N in the worst case. A complete enumeration of all 2^N possible weight states is only feasible for small systems up to N = 25 [7–10]. In recent years, research on efficient heuristic algorithms has been rather active [6, 11–17].
If the number M of input patterns is too large, a perceptron will be unable to correctly classify all of them, no matter how the synaptic weights are modified. This is a phase transition phenomenon of the solution space of the perceptron. In the case that the M input binary patterns are sampled uniformly and randomly from the set of all binary patterns, the maximal value α_s of the constraint density α, the storage capacity at which a solution still exists, has been calculated by statistical physics methods. For the continuous perceptron subject to the spherical constraint, Gardner and Derrida found that α_s = 2 [18, 19]. In the thermodynamic limit of N → ∞, the continuous perceptron therefore cannot correctly classify more than 2N random input patterns. When the synaptic weights are restricted to binary values, α_s was predicted to be 0.83 by Krauth and Mézard using the first-step replica-symmetry-broken spin-glass theory [20]. This prediction was confirmed by numerical simulations of small-size systems (plus an extrapolation to large N) [7, 9, 21]. The theoretically predicted storage capacity α_s represents the upper limit of the constraint density α achievable by any learning strategy.

As the constraint density α increases, it is expected that the solution space of the Ising perceptron breaks into a huge number of disjoint ergodic components [22]. Solutions from different components are significantly different. One can define a connected component of the weight space as a cluster of solutions in which any two solutions are connected by a path of successive single-weight flips [23, 24]. These solution clusters are separated by weight configurations that only correctly classify a subset of the input patterns. These partial solutions act as dynamical traps for local search algorithms and make the learning task hard.

An adaptive genetic algorithm was suggested by Köhler in 1990, which could reach α ≃ 0.7 for systems of N = 255 [11]. Simulated annealing techniques were used by Horner [22], but critical slowing down of the search process was observed, due to the very rugged energy landscape of the problem. Simulated annealing was also used to study the statistical structure of the energy landscape of the Ising perceptron; the analysis of the distribution of distances between global minima obtained by simulated annealing for small α indicated that the distance distribution becomes a delta function in the thermodynamic limit [25]. Taking advantage of the fact that efficient algorithms exist for the real-valued perceptron, an alternative approach was to clip the trained real-valued weights of the continuous perceptron into binary values [13, 26–29]. Not all synaptic weights can be correctly specified by clipping, however, and for those uncertain weights, complete enumeration was then adopted. A message-passing algorithm was developed by Braunstein and Zecchina for the Ising perceptron [15], which was able to reach α ≃ 0.7 for N ≥ 1000. The efficiency of this belief-propagation algorithm was later conjectured to be due to the existence of a sub-exponential number of large solution clusters in the weight space [24]. An on-line learning algorithm inspired by this belief-propagation algorithm was also studied [16], in which hidden discrete internal states are added to the synaptic weights.
In real neural systems, the microscopic mechanism of perceptronal learning is the Hebbian rule of synaptic modification (spike-timing-dependent synaptic plasticity may be exploited; see, e.g., Refs. [30, 31]). The learning processes in biological perceptronal systems are expected to be much simpler than the various sophisticated learning processes of artificial perceptrons. Two other important aspects of biological perceptron systems are (i) the patterns to be classified are usually read into the system in a sequential order, so they are learned one by one, and (ii) when a new pattern is being learned, there are biological mechanisms which reactivate old learned patterns; such recalling processes help to prevent old patterns from being forgotten as new patterns are learned (see, e.g., the experimental investigation of Ref. [32]). Motivated by these biological considerations, we investigate in this paper a simple sequential learning mechanism, namely synaptic-weight space random walking. In this random walking mechanism, the αN patterns are introduced into the system in a randomly permuted sequential order, and a random walk of single- or double-weight flips is performed until each newly added pattern is correctly classified (learned). The previously learned patterns are not allowed to be misclassified in later stages of the learning process. We perform extensive numerical simulations on several variants of this simple sequential local learning rule and find that this mechanism performs well on systems of N ∼ 10³ neurons or less.

FIG. 1: (Color online) Sketch of the Ising perceptron and the single-weight random walking process in the corresponding weight space. (a) N input units (open circles) feed directly to a single output unit (solid circle). A binary input pattern (ξ^μ_1, ξ^μ_2, ..., ξ^μ_N) of length N is mapped through a sign function to a binary output σ^μ, i.e., σ^μ = sgn( Σ_{i=1}^N J_i ξ^μ_i ). The set of N binary synaptic weights {J_i} is regarded as a solution of the perceptron problem if the output σ^μ = σ^μ_0 for each of the M = αN input patterns μ ∈ [1, M], where σ^μ_0 is a preset binary value. (b) A solution space random walking path (indicated by arrows). An open circle represents a configuration that satisfies the first m+1 input patterns, while a black circle and a gray circle represent, respectively, a configuration that satisfies the first m and the first m−1 input patterns. An edge between two configurations means that these two configurations are related by a single-weight flip.

The paper is organized as follows. The Ising perceptron learning problem is defined in more detail in Sec. II. Several strategies of learning by random walks are presented in Sec. III. In Sec. IV, an experimental study of the learning algorithms is carried out; the overlap distribution of solutions as well as the performances of different local search algorithms are reported. A summary and discussion are given in Sec. V. Sequential random walk search algorithms were recently investigated in various combinatorial satisfaction problems (see, e.g., Refs. [33–35]).
The present work adds evidence that the solution space random walking mechanism, although very simple and easy to implement, is able to solve many nontrivial instances of a given complex learning or constraint satisfaction problem.

II. THE RANDOM CLASSIFICATION PROBLEM

For the Ising perceptron depicted schematically in Fig. 1a, N input units are connected to a single output unit by N synapses of weight J_i = ±1 (i = 1, 2, ..., N). The perceptron tries to learn M = αN associations {ξ^μ, σ^μ_0} (μ = 1, 2, ..., M), where ξ^μ ≡ (ξ^μ_1, ξ^μ_2, ..., ξ^μ_N) is an input pattern with ξ^μ_i = ±1, and σ^μ_0 = ±1 is the desired classification of input pattern μ. Given the input pattern ξ^μ, the actual output σ^μ of the perceptron is

    σ^μ = sgn( Σ_{i=1}^N J_i ξ^μ_i ).   (1)

The perceptron can modify its synaptic weight configuration {J_i} ≡ (J_1, J_2, ..., J_N) to achieve complete classification, i.e., σ^μ = σ^μ_0 for each of the M input patterns. The solution space of the Ising perceptron is composed of all the weight configurations {J_i} that satisfy σ^μ_0 Σ_i J_i ξ^μ_i > 0 for μ = 1, 2, ..., M. For the random Ising perceptron problem studied in this paper, each of the M input binary patterns ξ^μ is sampled uniformly and randomly from the set of all 2^N binary patterns of length N, and the classification σ^μ_0 is equal to ±1 with equal probability. For N sufficiently large, the solution space of such a model system is non-empty as long as α < 0.83 [20]. To construct such a solution configuration {J_i}, however, is quite a non-trivial task. A more stringent learning problem is to find a weight configuration {J_i} such that, for each input pattern ξ^μ,

    σ^μ_0 ( Σ_i J_i ξ^μ_i ) / √N ≥ κ,   (2)

where κ > 0 is a preset parameter [20]. The most efficient way of solving this constraint satisfaction problem appears to be the message-passing algorithm of Refs. [15, 16].

One can perform a gauge transform ξ^μ_i → ξ^μ_i σ^μ_0 on each input pattern. Under this gauge transform, each desired output becomes σ^μ_0 = 1. Without loss of generality, in the remaining part of this paper we will assume σ^μ_0 = 1 for every input pattern μ. Considering the case of N being odd, we define the stability field of a pattern μ as

    h^μ = Σ_{i=1}^N J_i ξ^μ_i.   (3)

To ensure the local stability of input pattern μ under changes of the weight configuration {J_i}, in analogy to Eq. (2), we introduce a stability parameter Δ ≥ 1 and require that h^μ ≥ Δ for each μ. Input patterns with h^μ ≥ 3 are stable against a single-weight flip. For the single-weight flipping processes of the next section, the input patterns with h^μ = 1 are referred to as barely learned patterns, as these patterns may become misclassified after the weight configuration makes a single flip. Similarly, for the double-weight flipping process of the next section, the input patterns with h^μ = 1 or h^μ = 3 are referred to as barely learned patterns.

III. LEARNING BY RANDOM WALKS

Random walk processes were used in a series of works [33, 34, 36–39] to find solutions of constraint satisfaction problems. They were also used as tools to study the solution space structure of these constraint satisfaction problems [33, 34, 40]. Various local search strategies have been developed to improve the performance of random walk stochastic searching [41–43].
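To make the definitions of Sec. II concrete, the following minimal NumPy sketch computes the stability fields of Eq. (3) and picks out the barely learned patterns. The function and variable names are ours, not from the paper; patterns are assumed already gauge-transformed, so pattern μ is correctly classified exactly when h^μ > 0.

```python
import numpy as np

def stability_fields(J, patterns):
    """h^mu = sum_i J_i * xi^mu_i for every stored pattern, Eq. (3)."""
    return patterns @ J

rng = np.random.default_rng(0)
N, M = 101, 40                       # N odd, so every h^mu is an odd integer
J = rng.choice([-1, 1], size=N)      # a random Ising weight configuration
patterns = rng.choice([-1, 1], size=(M, N))

h = stability_fields(J, patterns)
learned = h > 0                      # correctly classified patterns
barely_swf = h == 1                  # "barely learned" w.r.t. single-weight flips
barely_dwf = (h == 1) | (h == 3)     # "barely learned" w.r.t. double-weight flips
```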
The random walk learning strategies of this work follow the SEQSAT algorithm of Ref. [34]. An initial weight configuration (J^(0)_1, J^(0)_2, ..., J^(0)_N) is randomly generated at time t = 0. The first pattern ξ^1 is applied to the Ising perceptron. If this pattern is correctly classified under the initial weight configuration (i.e., h^1 > 0), then the second pattern ξ^2 is applied; otherwise the weight configuration is adjusted by a sequence of elementary local changes until ξ^1 is correctly classified. The algorithm then proceeds with the second pattern ξ^2, the third pattern ξ^3, etc., in a sequential order.

An elementary local change of the weight configuration is achieved either by a single-weight flip (SWF) or by a double-weight flip (DWF). Suppose at time t the weight configuration is {J^(t)} ≡ (J^(t)_1, J^(t)_2, ..., J^(t)_N), and suppose this configuration correctly classifies the first m input patterns ξ^μ (μ = 1, ..., m) but not the (m+1)-th pattern ξ^{m+1}. The configuration {J_i} will keep wandering in the solution space of the first m patterns until a configuration that correctly classifies ξ^{m+1} is reached (see Fig. 1b).

In the SWF protocol, a set A(t) of allowed single-weight flips is constructed based on the current configuration {J^(t)} and the m learned patterns. A(t) includes all integer positions j ∈ [1, N] with the property that the single-weight flip J^(t)_j → −J^(t)_j does not cause any barely learned pattern μ ∈ [1, m] (whose h^μ = 1) to be misclassified. At time t′ = t + 1/N an integer position j is chosen uniformly and randomly from the set A(t), and the weight configuration is changed to {J^(t′)} such that J^(t′)_i = J^(t)_i if i ≠ j and J^(t′)_j = −J^(t)_j. Obviously the new configuration {J^(t′)} also satisfies all of the first m patterns.

The DWF protocol is very similar to the SWF protocol, with the only difference being that the allowed set A(t) at time t contains ordered pairs of integer positions (i, j) with i < j. This set of ordered pairs can also be easily constructed. If, with respect to the configuration {J^(t)}, there are no barely learned patterns (whose stability field h^μ = 1 or 3) among the first m learned patterns, then A(t) contains all the N(N−1)/2 ordered pairs of integers (i, j) with 1 ≤ i < j ≤ N. Otherwise, randomly choose a barely learned pattern, say m₁ ∈ [1, m], and for each integer i ∈ [1, N] with the property that J^(t)_i ξ^{m₁}_i < 0, do the following: (1) if J^(t)_i ξ^μ_i < 0 for all the other barely learned patterns μ, add all the ordered pairs (i, j) with j ∈ [i+1, N] into the set A(t); (2) otherwise, add into the set A(t) all the ordered pairs (i, j) such that the integer j ∈ [i+1, N] satisfies J^(t)_j ξ^μ_j < 0 for all those barely learned patterns μ ∈ [1, m] with J^(t)_i ξ^μ_i > 0.

The waiting time Δt_{m+1} for satisfying the (m+1)-th pattern is defined as the total elapsed time from first satisfying the m-th pattern to first satisfying the (m+1)-th pattern, and the total time T_{m+1} of satisfying the first m+1 patterns is simply T_{m+1} = Σ_{μ=1}^{m+1} Δt_μ. One time unit corresponds to N elementary local changes of the weight configuration.
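The SWF protocol is compact enough to sketch directly. A single flip of J_j changes h^μ by −2 J_j ξ^μ_j, so a pattern with h^μ = 1 is lost exactly when J_j ξ^μ_j = +1; the allowed set A(t) therefore consists of the positions j with J_j ξ^μ_j = −1 for every barely learned pattern μ. The sketch below (helper names are ours; `max_steps` plays the role of Δt_max · N elementary flips) implements this; the DWF allowed set of pairs is built analogously and is omitted here.

```python
import numpy as np

def swf_allowed_set(J, learned):
    """Positions j whose single flip keeps all m learned patterns satisfied.

    learned: m x N array of the already-satisfied (gauge-transformed)
    patterns.  Only patterns with h^mu = 1 can be lost by a single flip."""
    h = learned @ J
    barely = learned[h == 1]
    if barely.size == 0:
        return np.arange(J.size)            # every single flip is allowed
    safe = np.all(barely * J < 0, axis=0)   # J_j*xi^mu_j = -1 for all barely mu
    return np.flatnonzero(safe)

def swf_learn_next(J, learned, target, rng, max_steps):
    """Random walk of single-weight flips inside the solution space of
    `learned` until `target` is satisfied (a sketch of the SWF protocol).
    Modifies J in place; returns False on isolation or timeout."""
    steps = 0
    while target @ J <= 0:                  # h^{m+1} <= 0: not yet learned
        allowed = swf_allowed_set(J, learned)
        if allowed.size == 0 or steps >= max_steps:
            return False                    # isolated configuration or timeout
        j = rng.choice(allowed)
        J[j] = -J[j]                        # stays inside the solution space
        steps += 1
    return True
```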
The random walk searching process stops if all the M input patterns have been correctly classified, or if the last visited weight configuration becomes an isolated point (i.e., the set A(t) becomes empty after a new pattern is included into the set of learned patterns), or if the last waiting time Δt_{m+1} exceeds a preset maximal time value Δt_max, which is set to Δt_max = 1000 in the present work.

The SWF and DWF random walk processes mentioned above are very simple to implement, and they do not overcome any barriers in the energy landscape of the perceptron learning problem. However, as we demonstrate in the next section, their performances are quite remarkable for problem instances with pattern length N ≤ 10³.

The SWF process, as a local search algorithm, will get stuck in one of the enormous number of metastable states when all the weights become frozen (here we identify a synaptic weight as being frozen if flipping its value causes at least one of the learned patterns to be misclassified), at a constraint density value much smaller than the theoretical threshold value of 0.83. The DWF process will also get jammed if the weight configuration becomes frozen with respect to all double-weight flips. To further improve the achievable storage capacity of the SWF and DWF learning processes, a simple relearning strategy is added to the random walk searching. The basic idea of the relearning strategy is: if some learned patterns are strongly hindering the learning of new patterns, we first ignore them and proceed to learn a number of new patterns; after that, we learn the ignored patterns again and hope they can all be correctly classified. In the present work, we implement the relearning strategy in the following way (see also the sketch below). Suppose that when the m-th input pattern is presented to the Ising perceptron, the SWF or the DWF process is unable to learn it within the maximal waiting time Δt_max. We then remove all the k barely learned patterns μ ∈ [1, m−1] with h^μ = 1 from the list of learned patterns, and proceed to learn the patterns μ ∈ [m, m+k−1] in a sequential manner (stage 1). If the SWF or the DWF process succeeds in learning these k patterns, we then return to learn the k previously removed patterns again in a sequential manner (stage 2). If this relearning succeeds, we proceed with the patterns with index μ ≥ m+k. If the attempt fails either at stage 1 or at stage 2, we stop the whole random walk learning process or start another trial by removing all the learned patterns. In practice, we find that the relearning process has a high probability of succeeding in both stage 1 and stage 2 if α is not too large and the pattern length is of order 10³ or less.
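The following sketch of the relearning bookkeeping reuses `swf_learn_next` from the previous listing. It is our simplification under stated assumptions: the `pending` counter tracks stages 1 and 2, and the optional restart-from-scratch branch mentioned above is omitted.

```python
import numpy as np

def learn_with_relearning(patterns, N, rng, max_steps):
    """Sequential SWF learning with the relearning heuristic (a sketch).
    Returns the number of patterns classified when the process stops."""
    J = rng.choice([-1, 1], size=N)
    learned, queue = [], list(patterns)
    pending = 0                          # patterns remaining in stages 1 and 2
    while queue:
        target = queue.pop(0)
        L = np.asarray(learned).reshape(-1, N)
        if swf_learn_next(J, L, target, rng, max_steps):
            learned.append(target)
            pending = max(pending - 1, 0)
            continue
        if pending > 0:                  # failure inside stage 1 or 2: give up
            break
        h = L @ J
        set_aside = [p for p, hp in zip(learned, h) if hp == 1]
        learned = [p for p, hp in zip(learned, h) if hp != 1]
        k = len(set_aside)
        if k == 0:                       # nothing to set aside: simply stuck
            break
        # stage 1: the failed pattern plus the next k-1 fresh patterns;
        # stage 2: the k set-aside patterns again; then the rest of the queue.
        queue = [target] + queue[:k - 1] + set_aside + queue[k - 1:]
        pending = 2 * k
    return len(learned)
```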
IV. RESULTS

Figure 2 shows the simulation results for several random walk learning strategies. For each learning strategy and each pattern length N, sets of random input patterns (ξ^1, ξ^2, ..., ξ^M), each pattern of length N, are generated. The random walk learning strategy is then applied to each set of patterns until it stops, at which point we record the number m of correctly classified patterns and calculate the achieved storage capacity α = m/N. The mean values of α are reported in Fig. 2. It appears that the storage capacity of all four learning strategies decreases with N roughly as a power law α ∝ N^(−γ).

At each value of N, the SWF strategy has the worst performance, while the DWF strategy with relearning has the best performance. The SWF strategy is able to reach a storage capacity of α ≈ 0.36 for systems of N = 101 and α ≈ 0.17 for systems of N = 1001. These values are much less than the theoretical storage capacity of α ≈ 0.83. However, the DWF strategy performs much better, with a capacity of α ≈ 0.63 for N = 101 and α ≈ 0.41 for N = 1001. In real neural systems, perceptronal learning of elementary patterns probably does not involve too many neuronal cells, and a value of N ∼ 10² might be common. For perceptronal systems with N ∼ 10²–10³, the SWF and DWF strategies can be regarded as efficient. If relearning is introduced into the random walk learning strategies, the performance can be further improved. For the DWF strategy with relearning, we find that the storage capacity is α ≈ 0.80 for N = 101 and α ≈ 0.42 for N = 1001. Relearning is indeed a biologically relevant strategy in the perceptronal learning of real neural systems [32, 44]. As a comparison, for problem instances of pattern length N = 1001, the belief-propagation-inspired learning strategy of Baldassi and coauthors [16] achieves α ≈ 0.47 when the number K of internal states of their algorithm is set to K = 40. This storage capacity decreases to α ≈ 0.36 at K = 20 and to α ≈ 0.10 at K = 10.

FIG. 2: (Color online) Comparison of the performances of several random walk search strategies. The achieved storage capacity α, averaged over many independent runs (100 for the smallest N and 10 for the largest N), is shown as a function of the pattern length N. The solid lines are power-law fits of the form α ∝ N^(−γ), with γ = 0.302, 0.347, 0.198, 0.241 for SWF, SWF with relearning, DWF, and DWF with relearning, respectively.

For the same set of input patterns (ξ^1, ξ^2, ..., ξ^m), different runs of the SWF strategy or the DWF strategy lead to different solution configurations. The similarity between two solutions can be measured by an overlap value q defined by

    q = (1/N) Σ_{i=1}^N J_i J′_i,   (4)

where (J_1, ..., J_N) and (J′_1, ..., J′_N) are two solutions. The reduced Hamming distance d_H between two solutions is related to the overlap q by d_H = (1 − q)/2. The typical value of the overlap at constraint density α ∼ 0.83 is predicted to be q ≈ 0.56 according to the replica-symmetric calculation [20], suggesting that solutions are still far away from each other (with a reduced Hamming distance d_H ≈ 0.22) as α approaches the theoretical storage capacity α_s.

Figure 3 shows the histogram P(d_H) of reduced Hamming distances d_H between different solutions found by the DWF strategy for a single problem instance with constraint density α and pattern length N. Different pattern lengths of N = 101, 501, 1001 are used, and 100 different solutions are constructed by repeatedly running the DWF process. Other problem instances show similar properties.
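A minimal sketch of the overlap of Eq. (4) and the corresponding reduced Hamming distance; the function names and the `run_dwf` placeholder are ours, not from the paper.

```python
import numpy as np

def overlap(J1, J2):
    """Overlap q = (1/N) sum_i J_i J'_i of Eq. (4), for J, J' in {-1,+1}^N."""
    return float(np.mean(J1 * J2))

def reduced_hamming_distance(J1, J2):
    """d_H = (1 - q)/2: the fraction of weights on which J and J' differ."""
    return (1.0 - overlap(J1, J2)) / 2.0

# A Fig. 3-style histogram would collect all pairwise distances over repeated
# runs (run_dwf stands for one full DWF learning run returning a solution):
# solutions = [run_dwf(patterns) for _ in range(100)]
# d = [reduced_hamming_distance(a, b)
#      for i, a in enumerate(solutions) for b in solutions[i + 1:]]
```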
We notice from Fig. 3 that, at the same value of α, the histograms P(d_H) for different N are peaked at almost the same d_H value, but the width of P(d_H) decreases as N is enlarged. Such a behavior was observed earlier in Ref. [25] on a slightly modified Ising perceptron problem. The solutions obtained by the DWF strategy therefore have a typical level of similarity. Figure 3 also demonstrates that, as the constraint density increases, the histograms P(d_H) shift to smaller d_H values, suggesting that the level of similarity between the DWF-constructed solutions increases with α. At α = 0.693 the typical reduced Hamming distance is d_H ≈ 0.224, compatible with the mean-field predictions [20]. Similar results are obtained for solutions found by the SWF strategy.

FIG. 3: (Color online) Histograms of reduced Hamming distances between solutions found by DWF on a single problem instance of M input patterns of length N. 100 solutions are constructed for each of the five instances with (M, N) = (45, 101), (70, 101), (125, 501), (225, 501), (250, 1001), respectively. The solid lines are Gaussian fits to the histograms.

In all our simulations, we do not observe double or multiple peaks in the histogram P(d_H). The results of these and our other numerical simulations (not shown) are consistent with the proposal that, for a given problem instance, the solutions obtained by the random walking strategies are members of the same (large) solution cluster of the solution space [24, 25, 45]. Unlike the random K-satisfiability problem, the random Q-coloring problem, or some locked constraint satisfaction problems [46–48], the solution space organization of the Ising perceptron problem is still not very clear. Kabashima and co-authors [24] suggested that for α < 0.83 the solution space of the Ising perceptron problem is equally dominated by exponentially many clusters of vanishing entropy and a sub-exponential number of large clusters. Our simulation results are compatible with this proposal, but more work needs to be done to clarify the solution space structure of the random Ising perceptron problem.

The total time T_{αN} used by the DWF strategy to correctly classify the first αN patterns of a problem instance with N = 1001 is shown in Fig. 4 as a function of α. The learning time grows almost linearly with α for α < 0.4. As the constraint density α becomes large, different solution communities are expected to form in the solution space [47]. Then, as α further increases to a certain larger value, the time needed for the random walk process to escape from a solution community may exceed the preset maximal waiting time of Δt_max = 1000, and the DWF process will then stop. The achieved storage capacity α can be increased to some extent if we make Δt_max larger, but the search process will become more and more viscous as the solution space of the problem becomes more and more heterogeneous and complex [34]. We do not attempt to calculate the jamming point of the random walk searching processes.

FIG. 4: (Color online) The learning time T_{αN} as a function of α for three problem instances of N = 1001.
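As a closing note on the results section, the power-law form α ∝ N^(−γ) used for the solid lines of Fig. 2 can be fitted by ordinary least squares on a log-log scale. The sketch below uses only the two DWF capacities quoted in the text, so the exponent it prints differs slightly from the paper's γ = 0.198, which was fitted over the full data set of Fig. 2.

```python
import numpy as np

# alpha ~ N**(-gamma)  =>  log(alpha) = log(c) - gamma * log(N)
N_vals = np.array([101.0, 1001.0])      # pattern lengths quoted for DWF
alpha_vals = np.array([0.63, 0.41])     # corresponding DWF capacities

slope, _intercept = np.polyfit(np.log(N_vals), np.log(alpha_vals), 1)
print(f"fitted exponent gamma ~ {-slope:.3f}")   # ~ 0.19 from these two points
```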
V. DISCUSSION

We proposed several stochastic learning strategies for the Ising perceptron problem based on the idea of solution space random walking [34]. Our simulation results in Fig. 2 demonstrated that the DWF strategy is able to correctly classify ≥ 0.4N random input patterns of length N for N ≤ 1001. If a simple relearning strategy is added to the DWF strategy, the learning performance is further improved. The learning time of the DWF strategy grows roughly linearly with the number of input patterns. This work suggests that learning by local and random changes of synaptic weights is efficient for perceptronal systems with N ≈ 10²–10³ neurons. These local sequential learning strategies may be exploited in some biological perceptronal systems. In real neuronal systems, the number N of neurons involved in an elementary pattern classification task may be of the order of N ∼ 10¹–10³.

The solutions obtained by the DWF strategy for a given perceptronal learning task are separated by a typical Hamming distance, which decreases as the number of input patterns increases (Fig. 3). However, solutions are still far away from each other even near the critical capacity. We suspect that, for the problem instances studied in this paper, either the solution space of the problem is ergodic as a whole, or the solutions reached by the DWF strategies all belong to the same solution cluster of the solution space.

In our random walking setting, once all weights are frozen, particularly for SWF, the current pattern with a negative stability field can no longer be learned, since the current weight configuration is isolated in the weight space (we denote this weight configuration as a completely frozen solution). Fortunately, DWF is able to go on even if all single weights are frozen, since flipping certain pairs of weights may still be permitted from a configuration in which no single weight is allowed to be flipped. If such flippable pairs of weights do not exist, DWF will get trapped, and the configuration is isolated once again. Actually, as the constraint density α increases, many such isolated solutions will show up, and SWF or DWF, working by single- or double-weight flips, is not capable of crossing the energy barriers separating the isolated solutions from the connected ones. This limitation can be bypassed to some extent using the relearning strategy, which helps the search escape from these small clusters and keeps SWF or DWF exploring the large cluster composed of exponentially many solutions. For small α, the replica-symmetric ansatz is believed to give a good description of the solution space of the Ising perceptron [25]. Close to α_s, point-like clusters form and searching for the compatible weights becomes more difficult [48]. It is desirable to have a theoretical understanding of the structural evolution of the solution space of the random Ising perceptron problem. How the dynamics of stochastic local search algorithms is influenced by the solution space structure of the random Ising perceptron is an important open issue.

Another interesting problem is the generalization problem, where the input-output associations are no longer uncorrelated but the desired outputs are given by a teacher perceptron [17, 49–51]. The student perceptron tries to learn the rule provided by the teacher.
After a sufficient number of examples are presented to the student perceptron, the student's weights should match those of the teacher; the network then undergoes a first-order transition from poor to perfect generalization [49, 50]. It is worthwhile to extend the current random walk strategies to the generalization problem in Ising perceptrons.

Acknowledgments

This work was partially supported by the National Science Foundation of China (Grant numbers 10774150 and 10834014) and the China 973-Program (Grant number 2007CB935903).

[1] A. Engel and C. Van den Broeck. Statistical Mechanics of Learning. Cambridge University Press, Cambridge, England, 2001.
[2] C. C. H. Petersen, R. C. Malenka, R. A. Nicoll, and J. J. Hopfield. Proc. Natl. Acad. Sci. USA, 95:4732, 1998.
[3] D. H. O'Connor, G. M. Wittenberg, and S. S.-H. Wang. Proc. Natl. Acad. Sci. USA, 102:9679, 2005.
[4] W. Krauth and M. Mézard. J. Phys. A, 20:L745, 1987.
[5] J. K. Anlauf and M. Biehl. Europhys. Lett., 10:687, 1989.
[6] W. Senn and S. Fusi. Phys. Rev. E, 71:061907, 2005.
[7] W. Krauth and M. Opper. J. Phys. A, 22:L519, 1989.
[8] H. Gutfreund and Y. Stein. J. Phys. A, 23:2613, 1990.
[9] B. Derrida, R. B. Griffiths, and A. Prügel-Bennett. J. Phys. A, 24:4907, 1991.
[10] I. Kocher and R. Monasson. J. Phys. A, 25:367, 1992.
[11] H. M. Köhler. J. Phys. A, 23:L1265, 1990.
[12] H. Köhler, S. Diederich, W. Kinzel, and M. Opper. Z. Phys. B, 78:333, 1990.
[13] L. Reimers, M. Bouten, and B. Van Rompaey. J. Phys. A, 29:6247, 1996.
[14] G. Milde and S. Kobe. J. Phys. A, 30:2349, 1997.
[15] A. Braunstein and R. Zecchina. Phys. Rev. Lett., 96:030201, 2006.
[16] C. Baldassi, A. Braunstein, N. Brunel, and R. Zecchina. Proc. Natl. Acad. Sci. USA, 104:11079, 2007.
[17] C. Baldassi. J. Stat. Phys., 136:902, 2009.
[18] E. Gardner. J. Phys. A, 21:257, 1988.
[19] E. Gardner and B. Derrida. J. Phys. A, 21:271, 1988.
[20] W. Krauth and M. Mézard. J. Phys. (France), 50:3057, 1989.
[21] E. Gardner and B. Derrida. J. Phys. A, 22:1983, 1989.
[22] H. Horner. Z. Phys. B, 86:291, 1992.
[23] J. Ardelius and L. Zdeborová. Phys. Rev. E, 78:040101(R), 2008.
[24] T. Obuchi and Y. Kabashima. J. Stat. Mech., P12014, 2009.
[25] J. F. Fontanari and R. Köberle. J. Phys. (France), 51:1403, 1990.
[26] M. Bouten, L. Reimers, and B. Van Rompaey. Phys. Rev. E, 58:2378, 1998.
[27] R. W. Penney and D. Sherrington. J. Phys. A, 26:6173, 1993.
[28] R. W. Penney and D. Sherrington. J. Phys. A, 26:3995, 1993.
[29] D. Malzahn. Phys. Rev. E, 61:6261, 2000.
[30] R. Kempter, W. Gerstner, and J. Leo van Hemmen. Phys. Rev. E, 59:4498, 1999.
[31] P. D'Souza, S.-C. Liu, and R. H. R. Hahnloser. Proc. Natl. Acad. Sci. USA, 107:4722–4727, 2010.
[32] B. A. Kuhl, A. T. Shah, S. DuBrow, and A. D. Wagner. Nature Neurosci., 13:501–506, 2010.
[33] J. Ardelius, E. Aurell, and S. Krishnamurthy. J. Stat. Mech., P10012, 2007.
[34] H. Zhou. Eur. Phys. J. B, 73:617, 2010.
[35] H. Zhou and H. Ma. Phys. Rev. E, 80:066108, 2009.
[36] S. Cocco, R. Monasson, A. Montanari, and G. Semerjian. arXiv:cs.CC/0302003, 2003.
[37] G. Semerjian and R. Monasson. Phys. Rev. E, 67:066103, 2003.
[38] W. Barthel, A. K. Hartmann, and M. Weigt. Phys. Rev. E, 67:066104, 2003.
[39] F. Altarelli, R. Monasson, G. Semerjian, and F. Zamponi. 2008.
[40] F. Krzakala and J. Kurchan. Phys. Rev. E, 76:021122, 2007.
[41] S. Seitz, M. Alava, and P. Orponen. J. Stat. Mech., P06006, 2005.
[42] J. Ardelius and E. Aurell. Phys. Rev. E, 74:037702, 2006.
[43] M. Alava, J. Ardelius, E. Aurell, P. Kaski, S. Krishnamurthy, P. Orponen, and S. Seitz. Proc. Natl. Acad. Sci. USA, 105:15253, 2008.
[44] S. Fusi and L. F. Abbott. Nature Neurosci., 10:485–493, 2007.
[45] G. Biroli, R. Monasson, and M. Weigt. Eur. Phys. J. B, 14:551, 2000.
[46] F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, and L. Zdeborová. Proc. Natl. Acad. Sci. USA, 104:10318–10323, 2007.
[47] H. Zhou. 2009. [International Journal of Modern Physics B (in press)].
[48] L. Zdeborová and M. Mézard. J. Stat. Mech., P12004, 2008.
[49] G. Györgyi. Phys. Rev. A, 41:7097, 1990.
[50] H. Sompolinsky, N. Tishby, and H. S. Seung. Phys. Rev. Lett., 65:1683, 1990.
[51] H. Horner. Z. Phys. B, 87:371, 1992.