Scalable Construction of Spiking Neural Networks using up to thousands of GPUs
Diverse scientific and engineering research areas deal with discrete, time-stamped changes in large systems of interacting delay differential equations. Simulating such complex systems at scale on high-performance computing clusters demands efficient management of communication and memory. Inspired by the human cerebral cortex – a sparsely connected network of $\mathcal{O}(10^{10})$ neurons, each forming $\mathcal{O}(10^{3})$–$\mathcal{O}(10^{4})$ synapses and communicating via short electrical pulses called spikes – we study the simulation of large-scale spiking neural networks for computational neuroscience research. This work presents a novel network construction method for multi-GPU clusters and upcoming exascale supercomputers using the Message Passing Interface (MPI), where each process builds its local connectivity and prepares the data structures for efficient spike exchange across the cluster during state propagation. We demonstrate scaling performance of two cortical models using point-to-point and collective communication, respectively.
💡 Research Summary
This paper presents a novel and scalable method for constructing large-scale spiking neural networks (SNNs) on multi-GPU clusters and upcoming exascale supercomputers. The primary challenge in distributed SNN simulations lies in efficiently managing memory and communication, especially during the network construction phase, which has traditionally been a bottleneck. The authors address this by introducing an “onboard” network construction algorithm that operates directly within GPU memory, eliminating the need for costly data transfers from the CPU (“offboard” method).
The core innovation is a communication scheme using the Message Passing Interface (MPI) where each MPI process (typically managing one GPU) independently builds the local connectivity for its assigned neurons without inter-process communication during construction. To handle remote connections (synapses linking neurons on different processes), the method employs the concept of “proxy neurons” and prepares contiguous communication map data structures in GPU memory. These maps enable efficient routing and delivery of spikes during the subsequent simulation (state propagation) phase. The design supports both point-to-point and collective MPI communication, allowing the choice to be tailored to the specific network model’s characteristics.
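The proxy-neuron idea can be illustrated with a small, self-contained sketch. This is not the authors' code (which runs in GPU memory and uses MPI); it is a plain-Python illustration of the routing logic, and the names `build_local_connectivity`, `recv_map`, and `proxy_index` are hypothetical. The assumed layout is a uniform block partition of global neuron ids across ranks, with each synapse stored on the rank that owns its target neuron.

```python
# Conceptual sketch (NOT the authors' implementation) of per-rank
# connectivity construction with "proxy neurons" for remote sources.
# Assumes neurons are block-partitioned: rank r owns global ids
# [r * neurons_per_rank, (r + 1) * neurons_per_rank).

def build_local_connectivity(rank, n_ranks, neurons_per_rank, connections):
    """Keep only synapses whose target neuron is local to this rank.
    A remote source neuron gets a locally indexed proxy, and recv_map
    records, per source rank, which remote neurons must deliver spikes
    here during state propagation."""
    local_lo = rank * neurons_per_rank
    local_hi = local_lo + neurons_per_rank

    proxy_index = {}                             # global source id -> local proxy id
    recv_map = {r: [] for r in range(n_ranks)}   # source rank -> global ids to receive
    local_synapses = []                          # (local/proxy source id, local target id)

    for src, tgt in connections:
        if not (local_lo <= tgt < local_hi):
            continue                             # stored on the target's rank instead
        if local_lo <= src < local_hi:
            local_synapses.append((src - local_lo, tgt - local_lo))
        else:
            if src not in proxy_index:
                # proxies are appended after the real local neurons
                proxy_index[src] = neurons_per_rank + len(proxy_index)
                recv_map[src // neurons_per_rank].append(src)
            local_synapses.append((proxy_index[src], tgt - local_lo))
    return local_synapses, recv_map
```

Because every rank can evaluate this over its own slice of the connectivity independently, no inter-process communication is needed during construction; the `recv_map` structures are what later drive point-to-point or collective spike exchange.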
Performance is demonstrated using two biologically inspired cortical models. First, the Multi-Area Model (MAM) of the macaque visual cortex, featuring a complex hierarchical architecture with heterogeneous communication patterns, is simulated using point-to-point communication on 32 NVIDIA V100 GPUs. The onboard method achieves a more than 10x speedup in network construction time (from ~686s to ~55.5s) compared to the offboard approach, while maintaining comparable simulation performance.
Second, a scalable balanced random network model with homogeneous load is used to test weak scaling using collective MPI communication on up to 1024 NVIDIA A100 GPUs (256 nodes of the Leonardo Booster system). The results show efficient scaling, with each GPU handling 225,000 neurons and 2.53 billion synapses. The study extrapolates that the proposed data structures and methods could enable the simulation of networks with up to $2\times10^{10}$ neurons and $10^{14}$ synapses on exascale systems like the planned JUPITER supercomputer.
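A quick back-of-the-envelope check ties the per-GPU figures to the extrapolated brain-scale target. The arithmetic below uses only numbers stated above; the variable names are our own.

```python
# Sanity-check the reported weak-scaling and extrapolation figures.
neurons_per_gpu = 225_000
synapses_per_gpu = 2.53e9
gpus = 1024

total_neurons = gpus * neurons_per_gpu      # 230.4 million on 1024 GPUs
total_synapses = gpus * synapses_per_gpu    # ~2.59e12 on 1024 GPUs

# Extrapolated target: 2e10 neurons, 1e14 synapses.
synapses_per_neuron = 1e14 / 2e10           # 5000, within the O(10^3)-O(10^4)
                                            # range quoted for cortical neurons
```

Note that the implied 5,000 synapses per neuron is consistent with the $\mathcal{O}(10^{3})$–$\mathcal{O}(10^{4})$ connectivity quoted in the abstract.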
In summary, this work provides a significant advancement towards brain-scale SNN simulations by solving key bottlenecks related to in-memory network construction and optimized spike communication across thousands of GPUs, paving the way for more detailed and larger-scale computational neuroscience research.