Unsupervised Learning in Neuromemristive Systems

Cory Merkel and Dhireesha Kudithipudi
Department of Computer Engineering
Rochester Institute of Technology
Rochester, New York 14623-5603
Email: {cem1103,dxkeec}@rit.edu

Abstract: Neuromemristive systems (NMSs) currently represent the most promising platform to achieve energy-efficient neuro-inspired computation. However, since the research field is less than a decade old, there are still countless algorithms and design paradigms to be explored within these systems. One particular domain that remains to be fully investigated within NMSs is unsupervised learning. In this work, we explore the design of an NMS for unsupervised clustering, which is a critical element of several machine learning algorithms. Using a simple memristor crossbar architecture and learning rule, we are able to achieve performance which is on par with MATLAB's k-means clustering.

I. INTRODUCTION

The present research explores multiple aspects of unsupervised learning in a class of neurologically-inspired computer architectures referred to as neuromemristive systems (NMSs). Although they are closely related, NMSs differ from neuromorphic systems (pioneered by Mead in the late 1980s [1]) in two respects. First, they are designed using a mixture of CMOS and memristor technologies, which affords levels of connectivity and plasticity that are not achievable in neuromorphic systems. The second distinction, which is more subtle but equally important, is that NMSs focus on abstraction, rather than mimicking, of neurological processes. This abstraction generally improves the efficiency of the resulting hardware implementations. NMS research and development took off rapidly in the late 2000s, coinciding with a growing interest in two-terminal memristors, which will be described briefly in Section II for unfamiliar readers.
The utility of these systems has been demonstrated in various application domains, especially image processing and analysis; see [2] for a review. Learning in these systems has primarily been supervised. Although there are some examples of unsupervised learning in spike-based systems [3], it is relatively unexplored in non-spiking NMSs. One example where unsupervised learning in non-spiking networks has been demonstrated is [4], where principal component analysis (PCA) is used to reduce the dimensionality of images. However, the authors do not discuss any circuit for implementing the PCA algorithm in hardware. In this research, we propose a non-spiking NMS design for unsupervised clustering. The NMS is tested on samples from the MNIST database of handwritten digits. To the best of our knowledge, this is the first work that explores general aspects of clustering in these systems. Given its ability to reduce data dimensionality and aid in classification, we believe that clustering is a key primitive for future NMSs. Furthermore, we believe this work will advance the state of unsupervised learning in NMSs and help others do the same.

II. BRIEF DESCRIPTION OF MEMRISTORS

Technically, a memristor can be defined as any two-terminal device with a state-dependent Ohm's law [5]. More concretely, a memristor is a thin film (I) sandwiched between a top (M) and bottom (M) electrode. The stack is referred to as a metal-insulator-metal (MIM) structure because the film material is nominally insulating. That is, in its stoichiometric crystalline form it has a large band gap and not enough free carriers to conduct. The film is made conductive by introducing defects in the crystalline structure, either through fabrication, by applying an electric field, or both. Defects may be interstitial metallic ions which are oxidized at one electrode and then drift to the other, where they are reduced.
Defects may also be vacancies, such as oxygen vacancies in a TiO2 film. In addition, defects may be changes in polarization, such as those in ferroelectric films, or even just changes in crystallinity, as in phase change memory. In some films, the defect profile can be gradually adjusted by applying electric fields for short durations, yielding incremental modifications to the film's overall conductance. In other films, only two conductance states can be reached. Moreover, there is usually a minimum amount of energy required to effect change in the film's defect profile. This often translates to a threshold voltage which must be applied across the film to change its conductance. Given the constant evolution of memristor technology, it makes little sense to design an NMS around any specific memristor device parameters. Instead, we assume devices will have these general characteristics: (1) a large minimum resistance value (e.g., in the kΩ range), (2) a large OFF/ON resistance ratio (at least 10^3), (3) high endurance (the ability to switch many times before failing), (4) high retention (non-volatility), and (5) incremental conductance states that can be reached by applying bipolar voltage pulses above a particular threshold voltage. All of these properties have been demonstrated in various devices; see [6] for a review.

III. CLUSTERING ALGORITHM DESIGN

Clustering algorithms uncover structure in a set of $m$ unlabeled input vectors $\{u^{(p)}\}$ by identifying $M$ groups, or clusters, of vectors that are similar in some way. In one common approach, each cluster is represented by its centroid, so the clustering algorithm is reduced to finding each of the $M$ centroids. This can be achieved through a simple competitive learning algorithm: initialize $M$ vectors $w_i$ by assigning them to randomly chosen input vectors; these will be referred to as weight vectors. Then, for each input vector, move the closest weight vector a little closer to it. After several iterations, the algorithm should converge with the weight vectors lying at (or close to) the centroids.

Algorithm 1. Proposed clustering algorithm.
1: Map inputs to hypercube vertices.
2: Initialize weight vectors to random input vectors.
3: for epoch = 1 : N_epochs do
4:     for p = 1 : m do
5:         $d^*_{i,p} = w_i \cdot u^{(p)}$, for all $i = 1, 2, \ldots, M$
6:         $x_i = 1$ if $d^*_{i,p} = \max_k d^*_{k,p}$, and $x_i = 0$ otherwise, for all $i = 1, 2, \ldots, M$
7:         $\Delta w_{i,j} = \alpha x_i u^{(p)}_j$, for all $i = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, N$
8:     end for
9: end for

Of course, several parameters must be defined, including a distance metric for measuring closeness. The most obvious choice is the ℓ2-norm. However, computing it is expensive in hardware because it requires units for calculating squares and square roots. In addition, as we will discuss later, a high-density memristor circuit called a crossbar can easily compute dot products between input and weight vectors. Therefore, a dot product is preferred as the distance metric. For example, if all of the vectors are normalized ($\|u^{(p)}\| = \|w_i\| = 1$), then $\|w_i - u^{(p)}\|^2 = 2 - 2\, w_i \cdot u^{(p)}$, so the closest weight vector $w_{i^*}$ to $u^{(p)}$ is the one with the largest dot product: $w_{i^*} \cdot u^{(p)} > w_i \cdot u^{(p)}$ for all $w_i \neq w_{i^*}$. However, the constraint $\|u^{(p)}\| = \|w_i\| = 1$ creates a large overhead, because every input vector has to be normalized and every weight vector has to be re-normalized each time it is updated.

We propose the following solution: map each input vector to a vertex of a hypercube centered about the origin, $u^{(p)} \in \{-1, 1\}^N$, where $N$ is the dimensionality of the input space. Now, $w_i \cdot u^{(p)}$ yields a scalar value $d^*_{i,p}$ between $-N$ and $+N$ (provided each weight component lies in $[-1, 1]$). Moreover, this scalar value can be linearly transformed to a distance $d_{i,p}$, which is the ℓ1-norm, or Manhattan distance, between the weight vector and the input:

$$ d_{i,p} \equiv N - d^*_{i,p} = \sum_{j=1}^{N} \lvert w_{i,j} - u^{(p)}_j \rvert \qquad (1) $$

The second equality holds term by term when $-1 \le w_{i,j} \le 1$: if $u^{(p)}_j = 1$, then $\lvert w_{i,j} - 1 \rvert = 1 - w_{i,j}$, and if $u^{(p)}_j = -1$, then $\lvert w_{i,j} + 1 \rvert = 1 + w_{i,j}$. In both cases the term equals $1 - w_{i,j} u^{(p)}_j$, and summing over $j$ gives $N - d^*_{i,p}$. Using this distance metric, we never need to re-normalize the weight vectors. Furthermore, mapping input vectors to hypercube vertices can usually be accomplished by thresholding. For example, grayscale images can be mapped by assigning -1 to pixel values from 0 to 127 and +1 to pixel values from 128 to 255.

Algorithm 1 summarizes the procedure. The first two lines are initialization steps. Within the double for loop, $x_i$ is 1 when $i$ corresponds to the index of the closest weight vector (called the winner) and 0 otherwise. Then, the weight components of the winner are moved closer to the current input vector using a Hebbian update rule. The prefactor $\alpha$, which is called the learning rate, determines how far the weight vectors move each time they win. Notice that this algorithm is completely unsupervised: no labeled input vectors are needed.

Fig. 1. Block diagram of proposed NMS for unsupervised clustering. [Diagram: inputs u_1, u_2, ..., u_N; memristor crossbar distance calculation; WTA; weight update; outputs x_1, x_2, ..., x_M.]

Fig. 2. Crossbar and summing amplifier circuit for computing the distance between the input and a weight vector. [Circuit: memristor crossbar for w_i with inputs, summing amplifier, and train_en write connections.]

IV. NMS HARDWARE DESIGN

The unsupervised clustering algorithm discussed in Section III can be implemented efficiently in an NMS by representing weight vectors as memristor conductances. A block diagram of the proposed design is shown in Figure 1. The inputs, which are represented as positive and negative currents, are fed through $M$ crossbar circuits. Together with a non-inverting summing amplifier (represented as a circle), each crossbar computes the distance between the current input and the weight vector represented by its memristors' conductances. The configuration of the crossbar and summing amplifier is shown in Figure 2.
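The dot products each crossbar computes feed the winner-take-all and update steps of Algorithm 1. As a software-only illustration of that algorithm (a sketch under stated assumptions, not the authors' code), the Python below maps grayscale values to hypercube vertices, runs the competitive Hebbian loop, and exposes the ℓ1 distance of Eq. (1). The function names are hypothetical, inputs are visited in a fixed order, and ties go to the lowest index. Each weight component is also clipped to [-1, 1]; this clipping is our assumption, chosen because N - w·u equals the ℓ1 distance only when every |w_ij| ≤ 1.

```python
import random


def to_hypercube(pixels, threshold=128):
    """Map 8-bit grayscale values to a hypercube vertex: 0..127 -> -1, 128..255 -> +1."""
    return [1 if p >= threshold else -1 for p in pixels]


def dot(w, u):
    """Dot product d*_{i,p} = w_i . u^(p), the quantity a crossbar computes."""
    return sum(wj * uj for wj, uj in zip(w, u))


def l1_distance(w, u):
    """Manhattan distance sum_j |w_j - u_j|; equals N - dot(w, u) when all |w_j| <= 1."""
    return sum(abs(wj - uj) for wj, uj in zip(w, u))


def cluster(inputs, num_clusters, epochs, alpha, seed=0):
    """Competitive learning in the spirit of Algorithm 1 (illustrative sketch).

    inputs: list of vectors with components in {-1, +1}.
    Assumption: weight components are clipped to [-1, 1] after every update.
    """
    rng = random.Random(seed)
    # Line 2: initialize the weight vectors to randomly chosen input vectors
    weights = [[float(x) for x in u] for u in rng.sample(inputs, num_clusters)]
    for _ in range(epochs):                                  # line 3
        for u in inputs:                                     # line 4
            scores = [dot(w, u) for w in weights]            # line 5
            # Line 6: the winner has the largest dot product (ties go to the lowest index)
            winner = max(range(num_clusters), key=scores.__getitem__)
            # Line 7: Hebbian update Delta w_ij = alpha * x_i * u_j; only the winner has x_i = 1
            w = weights[winner]
            for j, uj in enumerate(u):
                w[j] = max(-1.0, min(1.0, w[j] + alpha * uj))
    return weights


def assign(weights, u):
    """Index of the weight vector nearest to u under the l1 distance of Eq. (1)."""
    return min(range(len(weights)), key=lambda i: l1_distance(weights[i], u))


# Usage: two noise-free 8-dimensional prototypes, five copies of each
p1 = [1] * 8
p2 = [1, 1, 1, 1, -1, -1, -1, -1]
centroids = cluster([p1] * 5 + [p2] * 5, num_clusters=2, epochs=20, alpha=0.1)
print(assign(centroids, p1), assign(centroids, p2))
```

With the clipping in place, a weight component that keeps winning inputs with the same sign saturates at ±1, so on this toy data the two weight vectors settle on the two prototypes regardless of which inputs were drawn at initialization.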
Memristors in the top row inhibit, or contribute a negative component to the output, while memristors in the bottom row excite, or contribute a positive component to the output. Therefore, each crossbar column represents one component of one weight vector $w_i$, which can be positive or negative. If we assume that the op amp has a high open-loop gain and the wire resistances are small, then

$$ v_{d^*_{i,p}} = \sum_{j=1}^{N} i_{u^{(p)}_j} R \left( \frac{G_2 - G_1}{G_1 + G_2} \right)_{i,j} \qquad (2) $$

where $G_1$ and $G_2$ are the conductances of the top and bottom memristors in each column, respectively. Since both conductances are positive, the ratio in parentheses, which plays the role of the weight component $w_{i,j}$, always lies between -1 and +1. The output of the circuit is a voltage representation of the distance between the current input and the weight vector represented by the crossbar. The weight vectors are modified by connecting them to write voltages $v_{w_{i,j}}$ using a training enable signal train_en. The write voltages are determined by the value of $\Delta w_{i,j}$ in line 7 of Algorithm 1. Specifically, if $\Delta w_{i,j}$ is negative, then $v_{w_{i,j}}$ will be a negative voltage beyond the memristor's negative write threshold, and if $\Delta w_{i,j}$ is positive, then $v_{w_{i,j}}$ will be a positive voltage above the memristor's write threshold. Otherwise, the write voltage is zero.

Fig. 3. 10 cluster centroids found in a set of 1000 MNIST images using the proposed NMS.

So far, we have only discussed the memristor crossbar and distance calculation parts of Figure 1 (line 5 in Algorithm 1). The winner-take-all circuit (line 6 in Algorithm 1) can be implemented in a number of ways; in this work, we used the current-mode design described in [7]. Finally, the weight update (line 7 in Algorithm 1) can be computed using simple combinational logic circuits.

V. CLUSTERING MNIST IMAGES

One exciting application of the proposed hardware is automatically identifying clusters in sets of images. We took 1000 images ($m$ = 1000) from the MNIST handwritten digit dataset and clustered them using a behavioral model of the NMS described in the last section.
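The behavioral model mentioned above is not spelled out in the text. The Python sketch below is one hypothetical way to write such a model at the device level; it is not the authors' model. Each weight is held as a conductance pair whose ratio (G2 - G1)/(G1 + G2) matches Eq. (2). The conductance bounds G_MIN and G_MAX, the per-pulse step G_STEP, the input current i_in, the value of the resistance R of Eq. (2) (named r here), and a write scheme in which one pulse moves G1 and G2 in opposite directions are all assumptions. Because each nonzero Δw_ij is applied as a single fixed pulse, the learning rate α is represented only implicitly, by G_STEP.

```python
G_MIN = 1e-6   # hypothetical minimum conductance (1 MOhm), in siemens
G_MAX = 1e-3   # hypothetical maximum conductance (1 kOhm); OFF/ON ratio of 10^3
G_STEP = 5e-5  # hypothetical conductance change caused by one above-threshold write pulse


def clamp(g):
    """Keep a conductance within the device limits."""
    return max(G_MIN, min(G_MAX, g))


class WeightCrossbar:
    """Behavioral model of the crossbar holding one weight vector w_i.

    Column j holds G1 (top row, inhibitory) and G2 (bottom row, excitatory);
    the effective weight is w_ij = (G2 - G1) / (G1 + G2), the ratio in Eq. (2).
    """

    def __init__(self, u):
        # Initialize from an input vertex: +1 -> (G1, G2) = (G_MIN, G_MAX), -1 -> (G_MAX, G_MIN)
        self.g1 = [G_MIN if uj > 0 else G_MAX for uj in u]
        self.g2 = [G_MAX if uj > 0 else G_MIN for uj in u]

    def weights(self):
        return [(b - a) / (a + b) for a, b in zip(self.g1, self.g2)]

    def output_voltage(self, u, r, i_in):
        # Eq. (2) with input currents of +i_in or -i_in and an ideal summing amplifier
        return sum(i_in * uj * r * wj for uj, wj in zip(u, self.weights()))

    def apply_pulses(self, signs):
        # Hypothetical write scheme: a positive pulse raises G2 and lowers G1,
        # a negative pulse does the opposite, and a zero entry leaves the column alone
        for j, s in enumerate(signs):
            self.g2[j] = clamp(self.g2[j] + G_STEP * s)
            self.g1[j] = clamp(self.g1[j] - G_STEP * s)


def train_step(crossbars, u, r=1e3, i_in=1e-6):
    """One inner-loop iteration of Algorithm 1 on the behavioral model; returns the winner."""
    volts = [xb.output_voltage(u, r, i_in) for xb in crossbars]    # line 5
    winner = max(range(len(crossbars)), key=volts.__getitem__)     # line 6 (WTA)
    # Line 7: sign(Delta w_ij) = sign(u_j) for the winner and 0 for all other crossbars,
    # so only the winner receives write pulses (train_en asserted)
    crossbars[winner].apply_pulses(u)
    return winner


xbs = [WeightCrossbar([1, 1, 1, 1]), WeightCrossbar([-1, -1, -1, -1])]
print(train_step(xbs, [1, 1, 1, -1]))  # prints 0
```

Modeling saturation at G_MIN and G_MAX keeps every effective weight strictly inside [-1, 1], which is the range Eq. (1) needs, without any explicit clipping step.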
Each image was originally 20 × 20 grayscale pixels ($N$ = 400). The images were mapped to hypercube vertices using the thresholding approach discussed earlier. In addition, we used 10 clusters ($M$ = 10), 500 training epochs (N_epochs = 500), and $\alpha$ = 0.005. The results are shown in Figure 3, where we have plotted the weight vectors representing the centroid of each cluster. Figure 4 shows the cost versus the training epoch, where the cost is defined as

$$ J = \sum_{p=1}^{m} \min_{i} d_{i,p} \qquad (3) $$

We see that the cost function for the proposed NMS approaches that of MATLAB's built-in k-means clustering after 500 epochs.

Fig. 4. Cost function versus epoch while clustering MNIST images using the proposed NMS. [Plot: cost J (×10^5, axis from 1.3 to 1.8) versus epoch (0 to 500); curves labeled MATLAB and Proposed.]

VI. CONCLUSIONS

The goal of this work was to explore both algorithmic and hardware design aspects of unsupervised learning in NMSs. To that end, we proposed a clustering algorithm that maps inputs to vertices of a hypercube and then iteratively finds the clusters' centroids using a Hebbian learning rule. We argue (although we have not proven) that the proposed algorithm can be implemented more efficiently in an NMS than algorithms that use either the ℓ2-norm or cosine similarity as a distance function. The algorithm was implemented in a custom NMS design that leverages crossbar circuits to compute the distance between inputs and weight vectors. To test our design, we clustered 1000 MNIST images and found the results to be consistent with MATLAB's k-means clustering implementation.

REFERENCES

[1] C. Mead, “Neuromorphic electronic systems,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1629–1636, 1990.
[2] D. Kudithipudi, C. Merkel, M. Soltiz, G. S. Rose, and R. Pino, “Design of neuromorphic architectures with memristors,” in Network Science and Cybersecurity, R. Pino, Ed. Springer, 2014, pp. 93–103.
[3] S. Yu, B. Gao, Z. Fang, H. Yu, J. Kang, and H.-S. P. Wong, “A low energy oxide-based electronic synaptic device for neuromorphic visual systems with tolerance to device variation,” Advanced Materials, vol. 25, no. 12, pp. 1774–1779, Mar. 2013.
[4] S. Choi, P. Sheridan, and W. D. Lu, “Data clustering using memristor networks,” Scientific Reports, vol. 5, p. 10492, Jan. 2015.
[5] L. Chua, “Resistance switching memories are memristors,” Applied Physics A, vol. 102, no. 4, pp. 765–783, Jan. 2011.
[6] D. Kuzum, S. Yu, and H.-S. P. Wong, “Synaptic electronics: materials, devices and applications,” Nanotechnology, vol. 24, no. 38, p. 382001, Sep. 2013.
[7] J. Lazzaro, S. Ryckebusch, M. A. Mahowald, and C. A. Mead, “Winner-take-all networks of O(N) complexity,” in Advances in Neural Information Processing Systems, 1988, pp. 703–711.