HiAER-Spike Software-Hardware Reconfigurable Platform for Event-Driven Neuromorphic Computing at Scale

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

In this work, we present HiAER-Spike, a modular, reconfigurable, event-driven neuromorphic computing platform designed to execute large spiking neural networks with up to 160 million neurons and 40 billion synapses - roughly twice the neurons of a mouse brain at faster than real time. This system, assembled at the UC San Diego Supercomputer Center, comprises a co-designed hard- and software stack that is optimized for run-time massively parallel processing and hierarchical address-event routing (HiAER) of spikes while promoting memory-efficient network storage and execution. The architecture efficiently handles both sparse connectivity and sparse activity for robust and low-latency event-driven inference for both edge and cloud computing. A Python programming interface to HiAER-Spike, agnostic to hardware-level detail, shields the user from complexity in the configuration and execution of general spiking neural networks with minimal constraints in topology. The system is made easily available over a web portal for use by the wider community. In the following, we provide an overview of the hard- and software stack, explain the underlying design principles, demonstrate some of the system’s capabilities and solicit feedback from the broader neuromorphic community. Examples are shown demonstrating HiAER-Spike’s capabilities for event-driven vision on benchmark CIFAR-10, DVS event-based gesture, MNIST, and Pong tasks.


💡 Research Summary

The paper introduces HiAER‑Spike, a modular, reconfigurable, event‑driven neuromorphic computing platform designed to run spiking neural networks (SNNs) at unprecedented scale. Built on a six‑node cluster at the UC San Diego Supercomputer Center, the system integrates 40 Alpha Data ADM‑PCIE‑9H7 FPGA boards (Xilinx XCVU37P), each equipped with 8 GB of high‑bandwidth memory (HBM) delivering 460 GB/s of memory bandwidth, 32 parallel SNN cores, and extensive on‑chip URAM/BRAM resources. Together with two 32‑core AMD EPYC CPUs per node, 1 TB of shared DRAM, and 29 TB of SSD storage, the hardware can theoretically host up to 160 million neurons and 40 billion synapses, more than twice the neuron count of a mouse brain, while operating faster than real time.
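The headline capacity decomposes evenly across the 40 FPGA boards, consistent with the roughly 4 million neurons and 1 billion synapses per unit cited in the experiments below. A quick sanity check, assuming capacity scales linearly with board count:

```python
# Back-of-envelope capacity check for the 40-board cluster
# (assumes neuron/synapse capacity scales linearly with board count).
BOARDS = 40
NEURONS_PER_BOARD = 4_000_000        # ~4 M neurons per FPGA board
SYNAPSES_PER_BOARD = 1_000_000_000   # ~1 B synapses per FPGA board

total_neurons = BOARDS * NEURONS_PER_BOARD    # 160 million
total_synapses = BOARDS * SYNAPSES_PER_BOARD  # 40 billion

print(f"{total_neurons:,} neurons, {total_synapses:,} synapses")
```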

The core architectural concept separates “grey matter” (local dense interconnects inside each core) from “white matter” (global sparse interconnects across cores, FPGA boards, and servers). Grey matter implements sequentially updated leaky integrate‑and‑fire (LIF) or binary (ANN) neurons whose membrane states reside in URAM; spike events are routed through synaptic lookup tables stored in HBM. White matter uses a hierarchical multicast bus, termed HiAER (Hierarchical Address‑Event Routing), to propagate spikes across cores, boards, and servers with millisecond‑scale timing resolution. This hierarchy enables efficient handling of both sparse connectivity and sparse activity, essential for low‑latency inference and for supporting time‑dependent learning rules such as spike‑timing‑dependent plasticity (STDP).
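The grey‑matter update described above can be sketched as a vectorized LIF step. This is a minimal illustration only: the leak factor and threshold are placeholder values, not the hardware's actual parameters.

```python
import numpy as np

# Minimal sketch of a sequentially updated LIF "grey matter" core.
# Leak and threshold are illustrative placeholders, not hardware values.
def lif_step(v, input_current, leak=0.9, threshold=1.0):
    """One membrane update: leak, integrate, fire, reset."""
    v = leak * v + input_current   # leaky integration of incoming current
    spikes = v >= threshold        # event generation (address-events to route)
    v = np.where(spikes, 0.0, v)   # reset membrane of fired neurons
    return v, spikes

v = np.zeros(4)
v, spikes = lif_step(v, np.array([0.5, 1.2, 0.0, 1.0]))
# neurons 1 and 3 cross threshold and emit spike events
```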

Network storage is optimized for sparsity by employing adjacency‑list representations in HBM rather than dense crossbar matrices. Each neuron or axon pointer stores a base address and a row count, dramatically reducing address overhead. The compiler aligns pointers and synaptic weights to the same HBM slot, maximizes packing density, and respects HBM’s alignment constraints. Spike processing proceeds in two phases: (1) firing neurons and external axons enqueue their outgoing pointer lists; (2) the system fetches the corresponding synaptic weights, updates post‑synaptic membrane potentials, and generates new spikes. On‑chip BRAM/URAM holds frequently accessed spike registers and membrane potentials, while HBM handles the bulk of the sparse lookup, yielding significant energy and latency gains.
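The two‑phase, pointer‑based delivery can be illustrated with a small sketch. The flat arrays standing in for HBM and the field names are assumptions for illustration, not the actual memory map:

```python
import numpy as np

# Adjacency-list layout sketch: each presynaptic source stores only a
# (base_address, row_count) pointer into flat synapse arrays (our stand-in
# for HBM), instead of a row in a dense connectivity matrix.
targets = np.array([1, 2, 2, 0, 1])            # post-synaptic neuron ids
weights = np.array([0.4, 0.6, 0.3, 0.5, 0.2])  # matching synaptic weights
pointers = {0: (0, 2), 1: (2, 1), 2: (3, 2)}   # neuron -> (base, count)

def deliver_spikes(fired, potentials):
    # Phase 1: firing neurons enqueue their outgoing pointer lists.
    queue = [pointers[n] for n in fired]
    # Phase 2: fetch weights and accumulate into post-synaptic potentials.
    for base, count in queue:
        for i in range(base, base + count):
            potentials[targets[i]] += weights[i]
    return potentials

p = deliver_spikes([0], np.zeros(3))  # neuron 0 fires into targets 1 and 2
```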

Software access is provided through the open‑source hs_api library, written in C++ and Python. Users define neuron models via LIF_neuron or ANN_neuron classes, construct axon and neuron dictionaries, and specify output monitors. The API abstracts away hardware details, allowing the same code to run either as a local software simulation or on the remote HiAER‑Spike hardware via the Neuroscience Gateway (NSG) web portal. Network compilation automatically maps the user‑defined topology onto the hierarchical memory layout, programs the FPGA over PCIe 3.0, and handles synaptic weight updates for learning algorithms. The system thus offers a seamless development workflow from PyTorch training to FPGA deployment.
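A workflow of roughly this shape is implied by the description. The dictionary layouts below follow the text, but the actual hs_api class names and signatures may differ, so this is a self‑contained plain‑Python sketch with a tiny local simulation step standing in for the hardware run:

```python
# Illustrative sketch only: dictionary shapes follow the description of
# hs_api (axon and neuron dictionaries plus output monitors), but real
# hs_api classes/signatures may differ.
axons = {"a0": [("n0", 1.0)]}                   # external input -> (target, weight)
connections = {"n0": [("n1", 0.5)], "n1": []}   # neuron -> outgoing synapses
outputs = ["n1"]                                # neurons whose spikes are monitored

def step(active_axons, potentials, threshold=0.75):
    """One local 'software simulation' timestep (placeholder threshold)."""
    # Inject external axon events.
    for a in active_axons:
        for target, w in axons[a]:
            potentials[target] = potentials.get(target, 0.0) + w
    # Fire neurons above threshold and propagate their synapses.
    fired = [n for n, v in potentials.items() if v >= threshold]
    for n in fired:
        for target, w in connections[n]:
            potentials[target] = potentials.get(target, 0.0) + w
        potentials[n] = 0.0  # reset after firing
    return potentials, fired

pot, fired = step(["a0"], {})  # a0 drives n0 over threshold; n0 excites n1
```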

Experimental validation uses a single core (≈4 M neurons, 1 B synapses) to run benchmark tasks: CIFAR‑10 classification, DVS‑based gesture recognition, MNIST digit recognition, and a Pong control task. Networks such as multilayer perceptrons, LeNet‑5, and spiking CNNs trained in PyTorch are converted to the HiAER‑Spike format and executed at or above real‑time speeds. These results demonstrate the platform’s capability to support diverse SNN topologies, binary and LIF neurons, stochastic noise injection, and learning rules.
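The conversion step from trained PyTorch weights to the platform's sparse format can be sketched as stripping zero entries of a dense weight matrix into the adjacency‑dictionary form. The helper name and naming scheme are hypothetical, and NumPy stands in for PyTorch tensors:

```python
import numpy as np

# Hypothetical conversion helper (not the actual compiler API): turn a dense
# trained weight matrix into the sparse neuron-dictionary format, dropping
# zero-weight synapses so only real connections are stored.
def dense_to_dict(W, pre_prefix="pre", post_prefix="post"):
    net = {}
    for i, row in enumerate(W):
        net[f"{pre_prefix}{i}"] = [(f"{post_prefix}{j}", float(w))
                                   for j, w in enumerate(row) if w != 0.0]
    return net

W = np.array([[0.0, 0.7],   # pre0 connects only to post1
              [0.3, 0.0]])  # pre1 connects only to post0
net = dense_to_dict(W)
```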

Compared with existing large‑scale neuromorphic systems—SpiNNaker (ARM‑based chips with custom routing), Intel Loihi 2 (ASIC with micro‑code programmable neuron cores), BrainScaleS‑2 (mixed‑signal analog/digital chips), and IBM NorthPole (custom ASIC)—HiAER‑Spike stands out for its public accessibility, rapid reconfigurability via FPGA bitstreams, and tight hardware‑software co‑design. The authors emphasize that community feedback can be incorporated quickly, guiding future ASIC designs that will inherit the lessons learned from this FPGA prototype.

In summary, HiAER‑Spike delivers a scalable, memory‑efficient, low‑latency neuromorphic platform with a user‑friendly Python API and cloud‑based access. Its hierarchical address‑event routing, adjacency‑list memory layout, and FPGA‑based flexibility enable execution of SNNs far larger than previously possible on publicly available hardware, positioning it as a pivotal resource for both neuromorphic research and emerging event‑driven AI applications.

