Algorithm-hardware co-design of neuromorphic networks with dual memory pathways
Spiking neural networks excel at event-driven sensing, yet maintaining task-relevant context over long timescales, both algorithmically and in hardware, while respecting tight energy and memory budgets remains a core challenge in the field. We address this challenge through a novel algorithm-hardware co-design effort. At the algorithm level, inspired by the fast-slow cortical organization of the brain, we introduce a neural network with an explicit slow memory pathway that, combined with fast spiking activity, forms a dual memory pathway (DMP) architecture in which each layer maintains a compact low-dimensional state that summarizes recent activity and modulates spiking dynamics. This explicit memory stabilizes learning while preserving event-driven sparsity, achieving competitive accuracy on long-sequence benchmarks with 40-60% fewer parameters than equivalent state-of-the-art spiking neural networks. At the hardware level, we introduce a near-memory-compute architecture that fully leverages the advantages of the DMP architecture by retaining its compact shared state while optimizing dataflow across heterogeneous sparse-spike and dense-memory pathways. Experimental results demonstrate more than a 4x increase in throughput and over a 5x improvement in energy efficiency compared with state-of-the-art implementations. Together, these contributions demonstrate that biological principles can guide functional abstractions that are both algorithmically effective and hardware-efficient, establishing a scalable co-design paradigm for real-time neuromorphic computation and learning.
💡 Research Summary
The paper tackles a fundamental limitation of spiking neural networks (SNNs): the inability to retain task‑relevant context over long timescales without incurring prohibitive energy and memory costs. Drawing inspiration from the brain’s fast‑slow cortical organization, the authors propose a dual‑memory‑pathway (DMP) architecture that couples a conventional fast spiking pathway with an explicit, low‑dimensional slow memory state at each layer. Concretely, each layer maintains a vector m ∈ ℝ^d (with d ≪ N, where N is the number of spiking neurons). This memory evolves according to well‑conditioned linear dynamics (e.g., m_{t+1} = α m_t + β f_spike(x_t)) and is fed back as an additional bias current to the spiking neurons (I_t = W_x x_t + W_m m_t). The fast pathway processes instantaneous events, while the slow pathway compresses recent activity into a compact state that provides long‑range temporal context. Because d is only 5–10 % of the hidden width, the extra parameter and computational overhead is O(d), dramatically smaller than the O(N²) cost of recurrent SNNs or the deep buffers required by delay‑based SNNs.
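The layer dynamics above can be sketched in a few lines of NumPy. This is a minimal software model built from the two equations quoted in the summary; the LIF membrane dynamics (decay `tau`, threshold `v_th`, reset), the spike-to-memory projection `W_f`, and all parameter values are illustrative assumptions, not the paper’s exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, D = 128, 8, 64          # spiking neurons, slow-memory dim (d << N), input dim

W_x = rng.standard_normal((N, D)) * 0.1   # fast input weights
W_m = rng.standard_normal((N, d)) * 0.1   # slow-memory feedback weights
W_f = rng.standard_normal((d, N)) * 0.1   # spike -> memory projection (stands in for f_spike)
alpha, beta = 0.95, 0.05                  # slow-memory decay and input gain (assumed)
tau, v_th = 0.9, 1.0                      # LIF membrane decay and threshold (assumed)

v = np.zeros(N)   # fast pathway: membrane potentials
m = np.zeros(d)   # slow pathway: compact memory state

def dmp_step(x, v, m):
    """One time step: fast LIF dynamics biased by the slow memory state."""
    I = W_x @ x + W_m @ m             # I_t = W_x x_t + W_m m_t
    v = tau * v + I                   # leaky integration
    s = (v >= v_th).astype(float)     # emit spikes
    v = v * (1.0 - s)                 # reset neurons that fired
    m = alpha * m + beta * (W_f @ s)  # m_{t+1} = alpha m_t + beta f_spike(...)
    return s, v, m

for _ in range(10):
    x = rng.standard_normal(D)
    s, v, m = dmp_step(x, v, m)
```

Note that the per-step memory update touches only the d-dimensional state, which is the source of the O(d) overhead the summary contrasts with O(N²) recurrence.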
To exploit this algorithmic structure, the authors design a near‑memory compute accelerator. The shared slow state is stored in a small SRAM block located close to the compute units, allowing rapid read‑modify‑write cycles without moving large volumes of data across the chip. Sparse spike streams are processed by a dedicated sparse‑spike engine using an input‑stationary dataflow, while the dense memory updates use an output‑stationary flow. This heterogeneous dataflow fuses the two pathways in a pipelined fashion, achieving high arithmetic intensity and minimizing off‑chip bandwidth. Post‑layout simulations in 22FDX (22 nm FD‑SOI) technology show more than a 4× increase in throughput and over a 5× improvement in energy efficiency compared with state‑of‑the‑art recurrent or delay‑based SNN accelerators, while also reducing silicon area.
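The two dataflows can be contrasted with a small software model. This is a sketch of the idea only, not the accelerator’s actual microarchitecture: the input-stationary pass touches weights only for inputs that actually fired, while the output-stationary pass reads, updates, and writes each memory element exactly once.

```python
import numpy as np

def sparse_input_stationary(spike_idx, W):
    """Event-driven pass: accumulate one weight column per active input.
    Work scales with the number of spikes, not the input width."""
    out = np.zeros(W.shape[0])
    for j in spike_idx:          # iterate only over inputs that fired
        out += W[:, j]           # column reuse: the active input stays "stationary"
    return out

def dense_output_stationary(m, contrib, alpha, beta):
    """Dense memory update: each output element does one read-modify-write,
    so its partial result never leaves the local register/SRAM slot."""
    for i in range(m.shape[0]):
        m[i] = alpha * m[i] + beta * contrib[i]
    return m

# Sanity check against a plain dense matrix-vector product.
rng = np.random.default_rng(2)
W = rng.standard_normal((6, 10))
idx = [1, 4, 7]                          # indices of neurons that spiked
x = np.zeros(10)
x[idx] = 1.0
y = sparse_input_stationary(idx, W)      # equals W @ x for binary spikes
```

In hardware terms, the first loop maps naturally onto an event queue feeding the sparse-spike engine, and the second onto per-element accumulators beside the slow-state SRAM; the Python loops merely mirror that access pattern.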
Empirically, the DMP‑SNN is evaluated on four temporally demanding benchmarks: Permuted Sequential MNIST (PS‑MNIST), Sequential MNIST (S‑MNIST), Spiking Heidelberg Digits (SHD), and Spiking Speech Commands (SSC). Across all datasets, DMP‑SNN matches or exceeds the accuracy of strong baselines (recurrent SNNs, delay SNNs, LSNN, GLIF, etc.) while using 40–60 % fewer parameters. For vision tasks (PS‑MNIST, S‑MNIST) the model reaches 95.5 %–99.3 % accuracy with a memory dimension as low as 5 % of the hidden width; for auditory event streams (SHD, SSC) it attains 89 %–92 % accuracy, comparable to delay‑based models that require extensive programmable delay lines. Ablation studies reveal that the fast pathway is indispensable—removing it collapses performance to chance—while the slow memory alone provides the long‑range context. Moreover, the frequency of memory updates can be co‑tuned with task characteristics: vision sequences tolerate coarse updates (dilation up to 10) without loss, whereas irregular auditory streams need finer updates, a property that directly translates into lower switching activity and memory traffic for the hardware.
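The update-frequency trade-off described above (coarse "dilated" memory updates for vision, fine updates for irregular audio) can be sketched as follows. This is an assumed realization of dilation: spike contributions are accumulated cheaply every step, but the costly state commit happens only once per `dilation` steps, which is what reduces switching activity and memory traffic. All names and values are illustrative.

```python
import numpy as np

def run_with_dilation(spikes, W_f, alpha=0.95, beta=0.05, dilation=10):
    """Run the slow-memory pathway, committing m only every `dilation` steps."""
    d = W_f.shape[0]
    m = np.zeros(d)       # slow memory state
    acc = np.zeros(d)     # cheap running accumulator between commits
    commits = 0
    for t, s in enumerate(spikes):
        acc += W_f @ s                   # per-step accumulation (low cost)
        if (t + 1) % dilation == 0:      # state read-modify-write at 1/dilation rate
            m = alpha * m + beta * acc
            acc[:] = 0.0
            commits += 1
    return m, commits

rng = np.random.default_rng(1)
T, N, d = 100, 64, 4
spikes = (rng.random((T, N)) < 0.05).astype(float)   # sparse binary spike raster
W_f = rng.standard_normal((d, N)) * 0.1
m, commits = run_with_dilation(spikes, W_f, dilation=10)
```

With T = 100 and dilation = 10, the memory is written only 10 times instead of 100, a 10× cut in slow-state traffic at the cost of coarser temporal resolution.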
The authors also explore the interaction between the explicit slow memory and learnable axonal delays. Adding a modest delay budget (θ = 5) to a small memory (d = 5) yields a 2 % accuracy boost, and the learned delay distribution shifts toward shorter delays, indicating that the memory state partially substitutes for long delays. Consequently, the hardware can avoid deep delay buffers and the associated timing metadata, further reducing area and power.
In summary, the work demonstrates that a biologically motivated dual‑memory abstraction can simultaneously address algorithmic challenges (stable long‑range temporal integration with few parameters) and hardware constraints (low memory footprint, high compute density, and energy efficiency). By co‑designing the network architecture and the accelerator, the authors establish a scalable paradigm for real‑time neuromorphic computation that bridges the gap between brain‑like temporal processing and practical, low‑power silicon implementations.