Chasing Elusive Memory Bugs in GPU Programs

Notice: This research summary and analysis were generated automatically using AI. For full accuracy, please refer to the original arXiv source.

Memory safety bugs, such as out-of-bound (OOB) accesses in GPU programs, can compromise the security and reliability of GPU-accelerated software. We report the existence of input-dependent OOBs in the wild that manifest only under specific inputs. All existing tools for detecting OOBs in GPU programs rely on runtime techniques that require an OOB to manifest during execution, so input-dependent OOBs elude them. We also discover intra-allocation OOBs, which arise when a memory allocation is logically partitioned into multiple data structures; existing techniques are oblivious to the possibility of such OOBs. We make a key observation: the presence (or absence) of semantic relations between the program variables that determine allocation sizes (CPU code) and those that compute offsets into memory allocations (GPU code) helps identify the absence (or presence) of OOBs. We build SCuBA, a first-of-its-kind compile-time technique that analyzes CPU and GPU code to capture such semantic relations (if present). It uses a SAT solver to check whether an OOB access is possible under any input, given the captured relations expressed as constraints. It further analyzes GPU code to track the logical partitioning of memory allocations to detect intra-allocation OOBs. Whereas NVIDIA's Compute Sanitizer misses 45 elusive memory bugs across 20 programs, SCuBA misses none and raises no false alarms.


💡 Research Summary

The paper “Chasing Elusive Memory Bugs in GPU Programs” addresses two previously under‑explored classes of memory‑safety bugs in CUDA applications: input‑dependent out‑of‑bound (OOB) accesses that only manifest for particular runtime inputs, and intra‑allocation OOBs that arise when a single memory allocation is logically partitioned into multiple data structures (e.g., dynamic shared memory). Existing detection tools—whether software‑only instrumentation or hardware‑assisted runtime checks—rely on the bug actually occurring during execution, and therefore miss bugs that are rare, input‑specific, or hidden within a shared buffer.

To overcome these limitations, the authors introduce SCuBA (Static Cuda Bounds Analyzer), a compile‑time analysis framework that reasons about the semantic relationships between variables that determine allocation sizes (typically in host code) and variables that compute offsets for memory accesses (typically in kernel code). The key insight is that a bug‑free program exhibits a clear, provable relationship: every offset computed from variables such as threadIdx, blockDim, and the problem size stays within the bound set by the allocation‑size variables. If such a relationship can be extracted, the program is safe; if it cannot, an OOB may be possible.
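As a minimal illustration of an input-dependent OOB (a hypothetical example, not taken from the paper), consider a host that allocates `n` elements while each thread indexes the buffer with an input-derived stride; the Python sketch below simulates the per-thread offset arithmetic, and the bug surfaces only for strides larger than one:

```python
# Hypothetical sketch of an input-dependent OOB: the host allocates
# `n` elements, but the (simulated) kernel indexes with an
# input-derived stride. The access is in bounds for stride == 1,
# yet overflows for larger strides -- a bug that only manifests
# under specific inputs, so runtime tools miss it unless such an
# input happens to be tested.

def launch_kernel(n, num_threads, stride):
    """Simulate each thread's access; return the offsets that overflow."""
    buf_size = n                      # elements allocated on the host
    bad = []
    for tid in range(num_threads):    # threadIdx.x analogue
        offset = tid * stride         # offset computed in "GPU code"
        if offset >= buf_size:
            bad.append(offset)
    return bad

print(launch_kernel(8, 8, 1))  # stride 1: every offset is in bounds
print(launch_kernel(8, 8, 2))  # stride 2: upper threads run past the buffer
```

The safe relationship here would be `stride * (num_threads - 1) < n`; when no such relation between the allocation size and the offset arithmetic can be established, an OOB is possible for some input.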

SCuBA works in three main phases. First, it parses the host code to extract the expressions that compute allocation sizes (e.g., cudaMalloc arguments). Second, it parses the kernel IR (generated via CGeist and MLIR) to extract the arithmetic that produces each memory‑access offset. Third, it builds a set of logical constraints capturing the relationships among all relevant variables (including constants, input parameters, and launch‑configuration dimensions). These constraints are fed to a SAT solver (Google OR‑Tools). If the solver proves that no input can make an offset exceed its allocation size (i.e., the OOB constraints are unsatisfiable), the access is safe; if the solver instead finds a satisfying assignment, that assignment is a concrete input witnessing a potential OOB.
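The shape of that solver query can be sketched as follows. This is a hedged model, not the paper's implementation: the paper delegates the check to Google OR-Tools, whereas here exhaustive enumeration over small bounded domains stands in for constraint solving, and all names (`oob_possible`, `alloc_size_expr`, `offset_expr`) are hypothetical:

```python
# Hedged sketch of the bounds check SCuBA delegates to a solver.
# Exhaustive enumeration over small, explicitly bounded domains
# stands in for SAT solving here.

from itertools import product

def oob_possible(alloc_size_expr, offset_expr, domains):
    """Return a witness input making offset >= allocation size, else None.

    alloc_size_expr / offset_expr: functions of an input dict, mirroring
    the size expressions extracted from host code and the offset
    arithmetic extracted from kernel IR.
    """
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        env = dict(zip(names, values))
        if offset_expr(env) >= alloc_size_expr(env):
            return env            # satisfiable: a concrete OOB witness
    return None                   # unsatisfiable: provably in bounds

# Safe: offset tid never reaches the allocation size n.
safe = oob_possible(lambda e: e["n"], lambda e: e["tid"],
                    {"n": [8], "tid": range(8)})
# Buggy: offset tid + k exceeds n for some inputs.
buggy = oob_possible(lambda e: e["n"], lambda e: e["tid"] + e["k"],
                     {"n": [4], "tid": range(4), "k": range(3)})
print("safe:", safe)     # None -> no input triggers an OOB
print("buggy:", buggy)   # witness input that triggers the OOB
```

A real solver performs this search symbolically over unbounded integer variables, which is what lets the analysis cover all inputs rather than a sampled domain.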

For intra‑allocation OOB detection, SCuBA extends the analysis to track how a single buffer is subdivided. It follows pointer arithmetic and struct layout in the IR to identify logical sub‑regions, then generates separate bound constraints for each region. This enables detection of cases where a thread accesses a different logical sub‑region than intended, even though the overall buffer bounds are respected.

The implementation builds on the MLIR infrastructure, using CGeist to translate CUDA source into MLIR, and then applying custom passes to collect variable relationships and generate SAT constraints. The authors evaluated SCuBA on a diverse benchmark suite of 20 real‑world CUDA programs, including scientific simulations, graph processing kernels, and machine‑learning libraries. Across these workloads SCuBA discovered 45 OOB bugs, 30 of which were previously unreported. In contrast, NVIDIA’s Compute Sanitizer missed all 45 bugs, demonstrating that runtime‑only approaches are insufficient for these elusive classes. Importantly, SCuBA produced zero false positives.

Performance-wise, SCuBA incurs analysis time only at compile time; there is no runtime overhead. SAT solving scales with program size but remained within a few seconds for all tested programs. The authors acknowledge limitations: highly dynamic pointer manipulations, function‑pointer‑driven address calculations, and non‑standard memory models may challenge the extraction of precise relationships. Moreover, while a satisfiable constraint system shows that some input can drive an access out of bounds, the analysis does not model hardware‑specific behaviors (e.g., memory coalescing effects) that could affect actual execution.

In summary, the paper makes three major contributions: (1) empirical evidence that input‑dependent OOBs exist in the wild and are missed by current tools; (2) identification of intra‑allocation OOBs as a new bug class; and (3) the design and implementation of SCuBA, the first static analysis tool capable of soundly detecting both bug classes without any runtime instrumentation or hardware changes, achieving perfect recall and precision on the evaluated benchmark set. The work opens a new direction for GPU memory‑safety research, suggesting that compile‑time semantic reasoning can complement—or even replace—runtime checks for many practical scenarios. Future work may extend the approach to other GPU programming models (OpenCL, HIP) and to more complex memory‑access patterns.

