Worth Their Weight: Randomized and Regularized Block Kaczmarz Algorithms without Preprocessing

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Due to the ever-growing amounts of data leveraged for machine learning and scientific computing, it is increasingly important to develop algorithms that sample only a small portion of the data at a time. In the case of linear least-squares, the randomized block Kaczmarz method (RBK) is an appealing example of such an algorithm, but its convergence is only understood under sampling distributions that require potentially prohibitively expensive preprocessing steps. To address this limitation, we analyze RBK when the data is sampled uniformly, showing that its iterates converge in a Monte Carlo sense to a $\textit{weighted}$ least-squares solution. Unfortunately, for general problems the bias of the weighted least-squares solution and the variance of the iterates can become arbitrarily large. We show that these quantities can be rigorously controlled by incorporating regularization into the RBK iterations, yielding the regularized algorithm ReBlocK. Numerical experiments, including examples arising from natural gradient optimization, demonstrate that ReBlocK can outperform both RBK and minibatch stochastic gradient descent for inconsistent problems with rapidly decaying singular values.


💡 Research Summary

The paper addresses the challenge of solving large-scale linear least-squares problems when only a small subset of rows can be accessed at any time. Classical randomized Kaczmarz (RK) updates a single row per iteration, while the randomized block Kaczmarz (RBK) uses a block of $k$ rows and the Moore–Penrose pseudoinverse of the sampled block. Existing convergence analyses of RBK rely on expensive preprocessing steps that either partition the matrix into well-conditioned blocks or apply an incoherence transform. Such preprocessing is infeasible for truly massive or "semi-infinite" problems where the data cannot be scanned even once.
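A single RBK step as described above can be sketched in a few lines of NumPy: the pseudoinverse of the sampled block $A_S$ projects the current iterate onto the solution set of the block equations $A_S x = b_S$. The function name and interface below are illustrative, not from the paper.

```python
import numpy as np

def rbk_update(x, A, b, rows):
    """One randomized block Kaczmarz (RBK) step: project x onto the
    solution set of the sampled block equations A_S x = b_S using the
    Moore-Penrose pseudoinverse of A_S."""
    A_S = A[rows]
    b_S = b[rows]
    return x + np.linalg.pinv(A_S) @ (b_S - A_S @ x)
```

For a consistent system, the updated iterate satisfies the sampled block's equations exactly, which is the defining property of the Kaczmarz projection.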

The authors therefore study RBK under the simplest possible sampling scheme: uniform selection of a size-$k$ subset of rows at each iteration (RBK-U). Introducing a generic "mass" matrix $M(A_S)$ for a sampled block $A_S$, they rewrite the iteration as
$$x_{t+1} = \bigl(I - P(S_t)\bigr)\,x_t + A^\top W(S_t)\,b,$$
with $P(S) = A_S^\top M(A_S) A_S$ and $W(S) = I_S^\top M(A_S) I_S$. Taking expectations yields the matrices $P = \mathbb{E}[P(S)]$ and $W = \mathbb{E}[W(S)]$, which characterize the behavior of the iteration in expectation.
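The abstract states that ReBlocK controls the bias and variance of the uniform-sampling iteration by adding regularization to the RBK updates. The sketch below illustrates one natural way this could look: replacing the pseudoinverse $(A_S A_S^\top)^+$ with the regularized inverse $(A_S A_S^\top + \lambda I)^{-1}$, combined with uniform block sampling and a running average of the iterates (matching the "Monte Carlo sense" convergence). The parameter `lam` and the exact scaling are assumptions for illustration; the paper's precise choice of $M(A_S)$ may differ.

```python
import numpy as np

def reblock_sketch(A, b, k, lam, n_iters, rng):
    """Uniform block sampling with a regularized, ReBlocK-style update.
    Using (A_S A_S^T + lam*I)^{-1} in place of the pseudoinverse bounds
    the per-step operator, which controls the variance of the iterates.
    Returns the running average of the iterates (Monte Carlo estimate)."""
    m, n = A.shape
    x = np.zeros(n)
    avg = np.zeros(n)
    for t in range(n_iters):
        rows = rng.choice(m, size=k, replace=False)  # uniform sampling (RBK-U)
        A_S, b_S = A[rows], b[rows]
        G = A_S @ A_S.T + lam * np.eye(k)            # regularized Gram matrix
        x = x + A_S.T @ np.linalg.solve(G, b_S - A_S @ x)
        avg += (x - avg) / (t + 1)                   # running average of iterates
    return avg
```

On a consistent, well-conditioned system with small `lam`, the averaged iterates approach the least-squares solution; for inconsistent problems the average converges to a weighted least-squares solution whose bias the regularization keeps controlled.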

