The ArborX library: version 2.0


This paper provides an overview of the 2.0 release of the ArborX library, a performance-portable geometric search library based on Kokkos. We describe the major changes in ArborX 2.0, including a new interface that supports a wider range of user problems, new search data structures (brute force, distributed), support for user functions executed on the results (callbacks), and an expanded set of supported algorithms (ray tracing, clustering).


💡 Research Summary

The ArborX library is a performance‑portable geometric search framework built on top of Kokkos. Version 2.0, released on April 16, 2025, represents a major redesign that addresses the limitations of the original API (v1) and adds a suite of new capabilities. The minimum software requirements have been modernized to C++20, CMake 3.22, and Kokkos 4.5, and the library now supports AMD and Intel GPUs through the Kokkos HIP and SYCL back‑ends.

The most visible change is the new API (v2). The BVH class is now templated on four parameters: the Kokkos memory space, the user‑provided value type, an IndexableGetter functor that extracts a geometric primitive from each value, and the bounding‑volume type. This makes it possible to store arbitrary user objects (points, k‑DOPs, triangles, rays, tetrahedra, segments, etc.) and to work in any dimensionality from 1 to 10. Both the constructor and the query functions now require an explicit Kokkos execution space, allowing multiple searches to run concurrently on the same device or alongside user kernels, and enabling fine‑grained stream control.
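A minimal construction-and-query sketch of this v2-style interface, assuming the class and helper names as described in the paper (`BoundingVolumeHierarchy`, `attach_indices`); the exact headers and signatures may differ from the released 2.0 API:

```cpp
#include <ArborX.hpp>
#include <Kokkos_Core.hpp>

using ExecutionSpace = Kokkos::DefaultExecutionSpace;
using MemorySpace = ExecutionSpace::memory_space;

template <typename Predicates>
void build_and_query(ExecutionSpace const &exec,
                     Kokkos::View<ArborX::Point<3> *, MemorySpace> points,
                     Predicates const &predicates)
{
  // The tree stores the user values themselves; attach_indices pairs each
  // point with its index so results can refer back to the original data.
  // An IndexableGetter (defaulted here) extracts the geometry from a value.
  ArborX::BoundingVolumeHierarchy bvh(
      exec, ArborX::Experimental::attach_indices(points));

  Kokkos::View<int *, MemorySpace> offsets("offsets", 0);
  Kokkos::View<typename decltype(bvh)::value_type *, MemorySpace> values(
      "values", 0);

  // Both construction and query take an explicit execution space, so
  // independent searches can run concurrently on separate streams.
  bvh.query(exec, predicates, values, offsets);
}
```

Passing the execution space explicitly is what lets a user overlap several searches, or a search and their own kernels, on the same device.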

A callback mechanism is introduced to replace the old “indices + offsets” result storage. Two kinds of callbacks are supported: a pure callback that performs an operation on each match without storing anything, and a callback‑with‑output that can write arbitrary results to a user‑provided view. Callbacks can signal early termination, enabling use‑cases such as “collect at most N matches per query” or “stop traversal once a distance threshold is reached.” Queries can carry user‑defined payloads via helper functions like attach_indices, and the payload is accessed inside the callback with getData().
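A hedged sketch of a pure callback that caps the number of matches per query, combining the payload mechanism (`attach_indices`, `getData`) with early termination; the traversal-control enumerator names follow earlier experimental releases and may differ in 2.0:

```cpp
#include <ArborX.hpp>
#include <Kokkos_Core.hpp>

using ExecutionSpace = Kokkos::DefaultExecutionSpace;
using MemorySpace = ExecutionSpace::memory_space;

// A pure callback: invoked once per (query, match) pair; it stores nothing
// through the query interface itself and updates a user view directly.
struct CountCallback
{
  Kokkos::View<int *, MemorySpace> counts;
  int max_per_query;

  template <typename Predicate, typename Value>
  KOKKOS_FUNCTION auto operator()(Predicate const &predicate,
                                  Value const &) const
  {
    // getData() retrieves the payload attached with attach_indices.
    int const i = ArborX::getData(predicate);
    int const n = Kokkos::atomic_fetch_add(&counts(i), 1) + 1;
    // Early termination: stop traversal once N matches were collected.
    return n >= max_per_query
               ? ArborX::CallbackTreeTraversalControl::early_exit
               : ArborX::CallbackTreeTraversalControl::normal_continuation;
  }
};

template <typename Tree, typename Predicates>
void count_neighbors(ExecutionSpace const &exec, Tree const &tree,
                     Predicates const &predicates,
                     Kokkos::View<int *, MemorySpace> counts)
{
  tree.query(exec, ArborX::Experimental::attach_indices(predicates),
             CountCallback{counts, /*max_per_query=*/10});
}
```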

ArborX 2.0 also adds a distributed search index (DistributedTree). The constructor takes an MPI communicator, builds a local BVH on each rank, and then constructs a top‑level tree that summarizes the local bounding boxes. During a search, each rank sends only the queries that need data from other ranks, receives the matching results, and executes callbacks on the rank that owns the data. This design dramatically reduces communication volume because the actual computation (e.g., interpolation, reduction) can be performed where the data resides. When compiled with the ARBORX_ENABLE_GPU_AWARE_MPI flag, the library can keep data in GPU memory throughout the MPI exchange, avoiding host‑device copies.
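The flow above might look roughly as follows (a sketch only: the header name and constructor argument order are assumptions, and keeping buffers resident on the GPU requires the ARBORX_ENABLE_GPU_AWARE_MPI build option mentioned above):

```cpp
#include <ArborX_DistributedTree.hpp>
#include <Kokkos_Core.hpp>
#include <mpi.h>

using ExecutionSpace = Kokkos::DefaultExecutionSpace;
using MemorySpace = ExecutionSpace::memory_space;

template <typename Predicates, typename Callback>
void distributed_query(
    MPI_Comm comm, ExecutionSpace const &exec,
    Kokkos::View<ArborX::Point<3> *, MemorySpace> local_points,
    Predicates const &predicates, Callback const &callback)
{
  // Each rank builds a local BVH over its own points; a top-level tree over
  // the per-rank bounding boxes decides where queries must be forwarded.
  ArborX::DistributedTree tree(comm, exec, local_points);

  // Only queries that may match remote data are communicated; the callback
  // runs on the rank that owns the matching values, so reductions and
  // interpolation happen where the data lives.
  tree.query(exec, predicates, callback);
}
```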

In addition to the classic BVH, new data structures are provided: a brute‑force search structure for small problem sizes and the distributed tree for large‑scale runs. These structures share the same unified API, making it easy for developers to switch between them based on problem characteristics.
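Because the structures share one interface, switching between them is essentially a type change; a sketch, with class names as described in the paper:

```cpp
#include <ArborX.hpp>
#include <Kokkos_Core.hpp>

using ExecutionSpace = Kokkos::DefaultExecutionSpace;

template <typename Points, typename Predicates, typename Callback>
void search(ExecutionSpace const &exec, Points const &points,
            Predicates const &predicates, Callback const &callback,
            bool small_problem)
{
  if (small_problem)
  {
    // An O(n*m) scan: for small inputs, cheaper than building a tree.
    ArborX::BruteForce brute(exec, points);
    brute.query(exec, predicates, callback);
  }
  else
  {
    // Hierarchical index for large problem sizes; same query call.
    ArborX::BoundingVolumeHierarchy bvh(exec, points);
    bvh.query(exec, predicates, callback);
  }
}
```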

The release expands the algorithmic repertoire. Ray‑tracing primitives are now part of the library, allowing fast ray‑intersection queries against the BVH. Two clustering algorithms are included: DBSCAN (with two variants, FDBSCAN for very sparse data and FDBSCAN‑DenseBox for data with dense regions) and a GPU‑optimized Euclidean Minimum Spanning Tree (EMST) implementation that serves as a building block for HDBSCAN*. Both clustering methods are designed to run efficiently on CPUs and GPUs and have already been used in large‑scale cosmology simulations.
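A sketch of calling the DBSCAN entry point described above; the free-function name and argument order follow ArborX's existing clustering interface, but the exact 2.0 signature (and the mechanism for selecting the FDBSCAN vs. FDBSCAN-DenseBox variant) may differ:

```cpp
#include <ArborX_DBSCAN.hpp>
#include <Kokkos_Core.hpp>

using ExecutionSpace = Kokkos::DefaultExecutionSpace;
using MemorySpace = ExecutionSpace::memory_space;

Kokkos::View<int *, MemorySpace>
cluster(ExecutionSpace const &exec,
        Kokkos::View<ArborX::Point<3> *, MemorySpace> points)
{
  float const eps = 0.1f;      // neighborhood radius
  int const core_min_size = 5; // DBSCAN's minPts parameter

  // Returns one cluster label per point (noise points get a sentinel
  // label). The FDBSCAN / FDBSCAN-DenseBox variants mentioned above are
  // selectable through an optional parameters argument.
  return ArborX::dbscan(exec, points, eps, core_min_size);
}
```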

Overall, ArborX 2.0 delivers a flexible, extensible, and high‑performance solution for geometric search problems in modern heterogeneous HPC environments. By breaking backward compatibility, the developers gained the ability to support arbitrary user data types, execution‑space‑aware parallelism, in‑place callback processing, and scalable distributed searches—all while maintaining the original library’s focus on performance portability. The paper serves both as a reference for citing the library and as a practical guide for developers looking to integrate advanced geometric queries, clustering, or ray‑tracing into their scientific codes.

