CUDAEASY - a GPU Accelerated Cosmological Lattice Program

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

This paper presents, to the author’s knowledge, the first graphics processing unit (GPU) accelerated program that solves the evolution of interacting scalar fields in an expanding universe. We present the implementation in NVIDIA’s Compute Unified Device Architecture (CUDA) and compare its performance to that of similar programs in chaotic inflation models. We report speedups of between one and two orders of magnitude, depending on the hardware and software used, while achieving small errors in single precision. Simulations that used to take roughly one day can now be done in hours, and this difference is expected to increase in the future. The program has been written in the spirit of LATTICEEASY, and users of that program should find it relatively easy to start using CUDAEASY in lattice simulations. The program is available at http://www.physics.utu.fi/theory/particlecosmology/cudaeasy/ under the GNU General Public License.


💡 Research Summary

The paper introduces CUDAEASY, which is, to the author’s knowledge, the first GPU‑accelerated program for solving the dynamics of interacting scalar fields in an expanding Friedmann‑Robertson‑Walker universe. Building on the well‑known LATTICEEASY and DEFROST codes, the author re‑implements the core leap‑frog (staggered) time‑integration scheme in NVIDIA’s CUDA framework while retaining the same physical model and input format.
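To make the leap‑frog (staggered) idea concrete, the following is a minimal sketch for a single degree of freedom, not the lattice field equations and not code taken from CUDAEASY. The function name `leapfrog`, its parameters, and the harmonic‑oscillator test case are all assumptions chosen for illustration. The velocity is stored half a time step behind the position, so each update uses a centred difference and the scheme is second‑order accurate.

```python
def leapfrog(x0, v0, accel, dt, steps):
    """Integrate x'' = accel(x) with a staggered leap-frog scheme.

    Hypothetical helper for illustration only: x lives on integer time
    steps, the velocity on half-integer steps.
    """
    x = x0
    # Shift the initial velocity back by half a step so it is staggered.
    v_half = v0 - 0.5 * dt * accel(x)
    for _ in range(steps):
        v_half += dt * accel(x)  # v(t - dt/2) -> v(t + dt/2)
        x += dt * v_half         # x(t)        -> x(t + dt)
    # Bring the velocity forward by half a step to the same time as x.
    v = v_half + 0.5 * dt * accel(x)
    return x, v


# Toy example: unit-frequency harmonic oscillator, x'' = -x.
x_end, v_end = leapfrog(1.0, 0.0, lambda x: -x, dt=0.01, steps=628)
energy = 0.5 * (x_end ** 2 + v_end ** 2)
```

After roughly one period (628 steps of 0.01) the oscillator returns close to its starting point and the energy stays close to its initial value of 0.5, which is the bounded‑error behaviour that makes leap‑frog attractive for long lattice runs. The scheme needs no first‑order time‑derivative (friction) term in the equation, which is why a variable rescaling that removes such a term pairs naturally with it.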

Key technical contributions include: (1) a rescaling of fields, coordinates and time (variables A, B, r, s) that eliminates first‑order time derivatives, allowing a stable second‑order leap‑frog update; (2) the use of a 26‑point isotropic stencil (second‑order accurate, fourth‑order isotropic) for the Laplacian, which is more accurate than the six‑point stencil used in LATTICEEASY; (3) a tiled implementation along the z‑axis that stores three layers of lattice data (down, middle, up) in shared memory, drastically reducing global‑memory traffic to roughly 1/21 of a naïve approach; (4) placement of constant coefficients (stencil weights, scale‑factor‑dependent terms, potential parameters) in constant memory to lower register pressure; (5) a hybrid CPU‑GPU workflow where the heavy field‑updates are performed on the GPU, while the evolution of the scale factor a(t) and the computation of volume‑averaged quantities (energy density, pressure, gradient averages) are done on the host CPU.
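The 26‑point stencil in item (2) can be sketched in a few lines. The weights below (centre −128/30, face 14/30, edge 3/30, corner 1/30) are one commonly quoted isotropic choice; that CUDAEASY uses exactly these coefficients is an assumption, and the helper names `laplacian_at` and `make_field` are hypothetical. The sketch runs on the CPU with periodic boundaries and does not attempt to reproduce the shared‑memory tiling of item (3).

```python
from itertools import product

# Weight by number of non-zero offsets: 0 = centre, 1 = face (6 points),
# 2 = edge (12 points), 3 = corner (8 points). Assumed coefficients.
WEIGHTS = {0: -128.0 / 30.0, 1: 14.0 / 30.0, 2: 3.0 / 30.0, 3: 1.0 / 30.0}


def laplacian_at(field, i, j, k, dx=1.0):
    """Discrete Laplacian at (i, j, k) on a periodic n x n x n lattice,
    using the centre point and its 26 neighbours."""
    n = len(field)
    total = 0.0
    for di, dj, dk in product((-1, 0, 1), repeat=3):
        kind = abs(di) + abs(dj) + abs(dk)
        total += WEIGHTS[kind] * field[(i + di) % n][(j + dj) % n][(k + dk) % n]
    return total / (dx * dx)


def make_field(n, func):
    """Build an n^3 nested-list lattice from func(x, y, z)."""
    return [[[func(x, y, z) for z in range(n)] for y in range(n)]
            for x in range(n)]


# Quadratic test field: the exact Laplacian is 2 + 4 + 6 = 12.
field = make_field(8, lambda x, y, z: x * x + 2 * y * y + 3 * z * z)
value = laplacian_at(field, 4, 4, 4)
```

The weights sum to zero (6·14 + 12·3 + 8·1 = 128), so a constant field gives a vanishing Laplacian, and the stencil reproduces the Laplacian of a quadratic exactly at interior points. On a GPU, each lattice site needs the same 27 reads, and neighbouring sites share most of them, which is the motivation for staging layers of the lattice in shared memory.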

Performance tests on NVIDIA GT200‑class and modern Pascal/Volta GPUs show speed‑ups ranging from an order of magnitude to two orders of magnitude compared with the original CPU‑only LATTICEEASY, depending on lattice size (e.g., 30× faster for 128³, >80× for 256³). All calculations are carried out in single precision; nevertheless, energy conservation and physical observables remain within acceptable error bounds (<0.1 %). Memory consumption is about 4 GB per lattice for single‑precision data, allowing simulations up to 512³ on current GPUs.

The code is released under the GNU GPL at http://www.physics.utu.fi/theory/particlecosmology/cudaeasy/ and preserves LATTICEEASY’s input syntax to ease adoption. The author also discusses plans to port the implementation to OpenCL, which would make the program hardware‑agnostic and runnable on AMD GPUs.

In summary, CUDAEASY demonstrates that the inherently parallel nature of lattice field theory calculations can be exploited effectively on modern GPUs, delivering dramatic reductions in wall‑clock time while maintaining scientific accuracy. This enables researchers to explore larger parameter spaces, higher‑resolution lattices, and more complex multi‑field models in early‑universe cosmology that were previously computationally prohibitive.

