Subspace-constrained randomized coordinate descent for linear systems with good low-rank matrix approximations
The randomized coordinate descent (RCD) method is a classical algorithm with simple, lightweight iterations that is widely used for various optimization problems, including the solution of positive semidefinite linear systems. As a linear solver, RCD is particularly effective when the matrix is well-conditioned; however, its convergence rate deteriorates rapidly in the presence of large spectral outliers. In this paper, we introduce the subspace-constrained randomized coordinate descent (SC-RCD) method, in which the dynamics of RCD are restricted to an affine subspace corresponding to a column Nyström approximation, efficiently computed using the recently analyzed RPCholesky algorithm. We prove that SC-RCD converges at a rate that is unaffected by large spectral outliers, making it an effective and memory-efficient solver for large-scale, dense linear systems with rapidly decaying spectra, such as those encountered in kernel ridge regression. Experimental validation and comparisons with related solvers based on coordinate descent and the conjugate gradient method demonstrate the efficiency of SC-RCD. Our theoretical results are derived by developing a more general subspace-constrained framework for the sketch-and-project method. This framework, which may be of independent interest, generalizes popular algorithms such as randomized Kaczmarz and coordinate descent, and provides a flexible, implicit preconditioning strategy for a variety of iterative solvers.
💡 Research Summary
The paper introduces Subspace‑Constrained Randomized Coordinate Descent (SC‑RCD), a lightweight iterative solver for large dense positive‑semidefinite (PSD) linear systems of the form Ax = b. Classical Randomized Coordinate Descent (RCD) enjoys cheap per‑iteration cost because each update touches only a single column (or a small block) of A, but its convergence rate depends on the full spectrum of A. In particular, large outlying eigenvalues (spectral outliers) dramatically slow down convergence, making RCD unsuitable for ill‑conditioned kernel matrices that arise in kernel ridge regression (KRR) and similar applications.
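For context, the classical single-coordinate RCD update described here can be sketched as follows. This is our own minimal illustration, not code from the paper; the function name `rcd` and its parameters are ours.

```python
import numpy as np

def rcd(A, b, num_iters=5000, seed=0):
    """Single-coordinate RCD for a PSD system Ax = b (illustrative sketch).

    Coordinate i is sampled with probability A[i, i] / tr(A); each update
    zeroes the i-th entry of the residual and touches one column of A.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    p = np.diag(A) / np.trace(A)       # diagonal sampling distribution
    x = np.zeros(n)
    r = -b.astype(float)               # residual r = A x - b, with x = 0
    for _ in range(num_iters):
        i = rng.choice(n, p=p)
        alpha = r[i] / A[i, i]         # exact line search along e_i
        x[i] -= alpha
        r -= alpha * A[:, i]           # O(n) residual maintenance
    return x
```

On a well-conditioned matrix this converges quickly; it is precisely when tr(A) is inflated by a few large eigenvalues that the effective rate 1 − λ_min(A)/tr(A) degrades.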
SC‑RCD mitigates this problem by first computing a low‑rank Nyström approximation of A using the RPCholesky algorithm. RPCholesky adaptively samples a set S of d pivot indices via diagonal‑based probabilities, builds a partial Cholesky factor F ∈ ℝ^{n×d}, and returns the rank‑d approximation A⟨S⟩ = FFᵀ. Theoretical guarantees (Theorem 1.1) ensure that the expected trace‑norm error of this approximation is at most (1 + δ) times the optimal rank‑r error, provided d satisfies a modest condition involving r, δ, and the tail energy η_r = (∑_{i>r} λ_i(A)) / (∑_{i=1}^n λ_i(A)).
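The RPCholesky loop described above can be sketched as follows; this is the simple non-blocked variant, with an interface of our choosing:

```python
import numpy as np

def rpcholesky(A, d, seed=0):
    """Randomly pivoted partial Cholesky (simple, non-blocked sketch).

    Returns pivot indices S and a factor F in R^{n x d} such that
    A<S> = F F^T is the column Nystrom approximation. Each pivot is
    drawn with probability proportional to the diagonal of the
    current residual matrix A - F F^T.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    F = np.zeros((n, d))
    diag = np.diag(A).astype(float)    # residual diagonal, kept up to date
    S = []
    for k in range(d):
        s = rng.choice(n, p=diag / diag.sum())
        S.append(s)
        g = A[:, s] - F[:, :k] @ F[s, :k]   # residual column at the pivot
        F[:, k] = g / np.sqrt(g[s])
        diag = np.clip(diag - F[:, k] ** 2, 0.0, None)  # guard rounding
    return np.array(S), F
```

Only d columns of A are ever read, so the factorization costs O(nd²) time and O(nd) memory, which is what makes the preprocessing step affordable for large dense matrices.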
Having obtained A⟨S⟩, SC‑RCD restricts all iterates to the affine subspace defined by the exact constraints on the pivot rows: { x | A_{S,:} x = b_S }. The initial point x₀ is computed by solving this small d‑by‑d system, which costs only O(d³) and can be done with triangular solves because F_{S,:} is lower‑triangular. The algorithm then proceeds exactly like block RCD, but with a crucial twist: at iteration k the residual r_k = Ax_k − b is projected onto the residual matrix A∘ = A − A⟨S⟩. A block J of ℓ coordinates (chosen i.i.d. with probability proportional to the diagonal of A∘) is sampled, and the update solves a small ℓ‑by‑ℓ linear system α_k = (A∘_{J,J})† r_{k,J}, β_k = C_{:,J} α_k, where C = (F_{S,:})† Fᵀ ∈ ℝ^{d×n} is pre‑computed. The coordinates in J are updated by subtracting α_k, while the pivot coordinates S are simultaneously adjusted by adding β_k to preserve the subspace constraint. The residual vector is updated as r_{k+1} = r_k − A∘_{:,J} α_k.
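The iteration just described can be sketched as below. This is one possible reading of the summary, not the paper's reference implementation: the function `sc_rcd`, the without-replacement block sampling, and the computation of β via a small solve against F_{S,:}ᵀ (standing in for the pre-computed matrix C) are our assumptions.

```python
import numpy as np

def sc_rcd(A, b, S, F, num_iters=20000, block_size=3, seed=0):
    """Sketch of the SC-RCD iteration (our reading of the description).

    S, F come from RPCholesky, so A<S> = F F^T and the residual matrix
    A_res = A - F F^T has (numerically) zero rows and columns at the
    pivots S. Every iterate satisfies the constraint A[S, :] x = b[S].
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    A_res = A - F @ F.T                    # residual matrix A∘
    FS = F[S, :]                           # d x d, invertible
    x = np.zeros(n)
    x[S] = np.linalg.solve(A[np.ix_(S, S)], b[S])  # initial d x d solve
    r = A @ x - b                          # full residual vector
    d_res = np.clip(np.diag(A_res), 0.0, None)
    p = d_res / d_res.sum()                # sampling ∝ diag(A∘); zero on S
    for _ in range(num_iters):
        # block of coordinates (drawn without replacement for simplicity)
        J = rng.choice(n, size=block_size, replace=False, p=p)
        alpha = np.linalg.pinv(A_res[np.ix_(J, J)]) @ r[J]
        # beta plays the role of C_{:,J} alpha: it satisfies
        # F[S].T @ beta = F[J].T @ alpha, which keeps the iterate on the
        # constraint subspace and makes the cheap residual update exact
        beta = np.linalg.solve(FS.T, F[J, :].T @ alpha)
        x[J] -= alpha
        x[S] += beta
        r -= A_res[:, J] @ alpha           # residual update via A∘ only
    return x
```

Note that only A∘ enters the sampling probabilities and the residual update, which is exactly why the large eigenvalues captured by A⟨S⟩ drop out of the dynamics.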
Because the dynamics are confined to the subspace, the convergence analysis depends only on the spectrum of A∘, not on the original matrix A. Theorem 1.3 shows that, with high probability, tr(A∘) is bounded by (1 + δ) times the tail sum ∑_{i>r} λ_i(A). Consequently, the expected A‑norm error decays geometrically, E‖x_k − x⋆‖²_A ≤ ρᵏ ‖x₀ − x⋆‖²_A, with a rate ρ < 1 governed by the spectrum of A∘ alone, so large outlying eigenvalues of A no longer slow the iteration.