lrux: Fast low-rank updates of determinants and Pfaffians in JAX
We present lrux, a JAX-based software package for fast low-rank updates of determinants and Pfaffians, targeting the dominant computational bottleneck in various quantum Monte Carlo (QMC) algorithms. The package implements efficient low-rank updates that reduce the cost of successive wavefunction evaluations from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2k)$ when the update rank $k$ is smaller than the dimension $n$ of matrices. Both determinant and Pfaffian updates are supported, together with delayed-update strategies that trade floating-point operations for reduced memory traffic on modern accelerators. lrux natively integrates with JAX transformations such as JIT compilation, vectorization, and automatic differentiation, and supports both real and complex data types. Benchmarks on GPUs demonstrate up to $1000\times$ speedup at large matrix sizes. lrux enables scalable, high-performance evaluation of antisymmetric wavefunctions and is designed as a drop-in component for a wide range of QMC workflows. lrux is available at https://github.com/ChenAo-Phys/lrux.
💡 Research Summary
The paper introduces lrux, a JAX‑based library that dramatically speeds up the evaluation of determinants and Pfaffians—two central quantities in many quantum Monte Carlo (QMC) methods—by exploiting low‑rank updates. In typical QMC algorithms the many‑electron wavefunction is represented either as a Slater determinant or as a Pfaffian of a skew‑symmetric matrix. Evaluating these objects naïvely costs O(n³) operations for an n‑electron system, which quickly becomes the dominant computational bottleneck as system size grows.
Mathematical foundation
For a sequence of matrices {Aₜ} that differ only by a rank‑k modification, the authors write
Aₜ = Aₜ₋₁ + vₜ uₜᵀ, with uₜ, vₜ ∈ ℝ^{n×k} (or ℂ).
Using the matrix determinant lemma, the ratio of successive determinants is
rₜ = det(Aₜ)/det(Aₜ₋₁) = det(Iₖ + uₜᵀ Aₜ₋₁⁻¹ vₜ),
which requires only the determinant of a small k×k matrix and products with n×k matrices, i.e. O(n²k) work once Aₜ₋₁⁻¹ is available. The inverse itself can be updated with the Sherman‑Morrison‑Woodbury formula:
Aₜ⁻¹ = Aₜ₋₁⁻¹ – Aₜ₋₁⁻¹ vₜ Rₜ⁻¹ uₜᵀ Aₜ₋₁⁻¹, Rₜ = Iₖ + uₜᵀ Aₜ₋₁⁻¹ vₜ.
Thus, a QMC simulation that repeatedly proposes single‑electron moves (k=1) can compute determinant ratios and maintain the inverse with only O(n²) operations per move, instead of O(n³).
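The two formulas above can be checked numerically with a few lines of plain JAX. The following is a minimal sketch, not the lrux API: the helper name `det_ratio_and_inverse`, the test sizes, and the random test matrix are all assumptions made for illustration, and double precision is enabled only to allow a tight comparison against a direct O(n³) evaluation.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # double precision for a tight comparison


def det_ratio_and_inverse(Ainv, u, v):
    """Hypothetical helper (not the lrux API).

    Ainv: (n, n) inverse of A_{t-1}; u, v: (n, k) factors of A_t = A_{t-1} + v u^T.
    Returns det(A_t) / det(A_{t-1}) and the updated inverse A_t^{-1}.
    """
    k = u.shape[1]
    a = Ainv @ v                                  # A_{t-1}^{-1} v, shape (n, k)
    R = jnp.eye(k, dtype=Ainv.dtype) + u.T @ a    # R = I_k + u^T A_{t-1}^{-1} v
    ratio = jnp.linalg.det(R)                     # matrix determinant lemma
    # Sherman-Morrison-Woodbury: A^{-1} - A^{-1} v R^{-1} u^T A^{-1}
    new_Ainv = Ainv - a @ jnp.linalg.solve(R, u.T @ Ainv)
    return ratio, new_Ainv


n, k = 6, 2
key_A, key_u, key_v = jax.random.split(jax.random.PRNGKey(0), 3)
A = jax.random.normal(key_A, (n, n)) + n * jnp.eye(n)  # well-conditioned test matrix
u = jax.random.normal(key_u, (n, k))
v = jax.random.normal(key_v, (n, k))

ratio, new_Ainv = jax.jit(det_ratio_and_inverse)(jnp.linalg.inv(A), u, v)

# Reference values from direct O(n^3) evaluation of the updated matrix.
A_new = A + v @ u.T
ratio_ref = jnp.linalg.det(A_new) / jnp.linalg.det(A)
```

Solving with Rₜ instead of forming Rₜ⁻¹ explicitly is a design choice of this sketch; for k=1 both reduce to a scalar division.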
When k is small but many moves are performed in succession, memory bandwidth on modern GPUs can dominate the runtime, because every update still applies an n×k by k×n correction to the full n×n inverse. To mitigate this, lrux implements a delayed‑update scheme. It accumulates the auxiliary quantities aₜ = Aₜ₋₁⁻¹ vₜ and bₜ = (Aₜ₋₁⁻¹)ᵀ uₜ (Rₜ⁻¹)ᵀ, so that Aₜ⁻¹ = Aₜ₋₁⁻¹ – aₜ bₜᵀ, and postpones the explicit rank‑k correction to the inverse until after τ steps. The accumulated correction is then applied in a single O(n²k + τ n k²) operation, which reduces memory traffic. The delay depth τ is a tunable parameter; the authors suggest τ ≈ n/(10 k) as a rule of thumb, but the optimal value depends on hardware and matrix size.
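The delayed scheme can be sketched in plain JAX as follows. This is an illustrative sketch under stated assumptions, not the lrux implementation: the helper `delayed_step`, the zero-padded buffers `a_buf` and `b_buf`, and the test sizes are hypothetical, and bₜ is chosen so that Aₜ⁻¹ = Aₜ₋₁⁻¹ − aₜ bₜᵀ. The inverse from the last flush is left untouched during the τ steps, and the buffered corrections are applied afterwards in one matrix product.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # double precision for a tight comparison


def delayed_step(A0inv, a_buf, b_buf, u, v, t):
    """One delayed rank-k step (hypothetical helper, not the lrux API).

    A0inv is the inverse at the last flush. a_buf and b_buf are (n, tau*k)
    buffers whose unused columns are zero, so that
    A_{t-1}^{-1} = A0inv - a_buf @ b_buf.T holds at every step.
    """
    k = u.shape[1]
    a = A0inv @ v - a_buf @ (b_buf.T @ v)       # a_t = A_{t-1}^{-1} v_t
    c = A0inv.T @ u - b_buf @ (a_buf.T @ u)     # (A_{t-1}^{-1})^T u_t
    R = jnp.eye(k, dtype=A0inv.dtype) + u.T @ a  # R_t = I_k + u_t^T A_{t-1}^{-1} v_t
    b = jnp.linalg.solve(R, c.T).T              # b_t^T = R_t^{-1} u_t^T A_{t-1}^{-1}
    a_buf = a_buf.at[:, t * k:(t + 1) * k].set(a)
    b_buf = b_buf.at[:, t * k:(t + 1) * k].set(b)
    return jnp.linalg.det(R), a_buf, b_buf


n, k, tau = 8, 2, 3
keys = jax.random.split(jax.random.PRNGKey(1), 2 * tau + 1)
A = jax.random.normal(keys[0], (n, n)) + n * jnp.eye(n)  # well-conditioned test matrix
A0inv = jnp.linalg.inv(A)
a_buf = jnp.zeros((n, tau * k))
b_buf = jnp.zeros((n, tau * k))

A_ref = A       # reference matrix, updated explicitly for checking only
det_prod = 1.0  # running product of determinant ratios
for t in range(tau):
    u = jax.random.normal(keys[2 * t + 1], (n, k))
    v = jax.random.normal(keys[2 * t + 2], (n, k))
    ratio, a_buf, b_buf = delayed_step(A0inv, a_buf, b_buf, u, v, t)
    det_prod = det_prod * ratio
    A_ref = A_ref + v @ u.T

# Flush: apply all tau buffered rank-k corrections in one matrix product.
Ainv = A0inv - a_buf @ b_buf.T
```

The Python loop with a concrete step index `t` keeps the sketch short; a jitted version would need the buffer writes expressed with a traced index, for example through `jax.lax.dynamic_update_slice`.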
Pfaffian updates
For paired‑fermion wavefunctions the relevant matrix is skew‑symmetric, and its Pfaffian replaces the determinant. The authors show that a low‑rank change can be written as
Aₜ = Aₜ₋₁ – uₜ J uₜᵀ, J =