High-Probability Polynomial-Time Complexity of Restarted PDHG for Linear Programming

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

The restarted primal-dual hybrid gradient method (rPDHG) is a first-order method that has recently received significant attention for its computational effectiveness in solving linear programming (LP) problems. Despite its impressive practical performance, the theoretical iteration bounds for rPDHG can be exponentially poor. To shrink this gap between theory and practice, we show that rPDHG achieves polynomial-time complexity in a high-probability sense, under assumptions on the probability distribution from which the data instance is generated. We consider not only Gaussian distribution models but also sub-Gaussian distribution models. For standard-form LP instances with $m$ linear constraints and $n$ decision variables, we prove that rPDHG iterates settle on the optimal basis in $\widetilde{O}\left(\tfrac{n^{2.5}m^{0.5}}{\delta}\right)$ iterations, followed by $O\left(\tfrac{n^{0.5}m^{0.5}}{\delta}\ln\big(\tfrac{1}{\varepsilon}\big)\right)$ iterations to compute an $\varepsilon$-optimal solution. These bounds hold with probability at least $1-\delta$ for $\delta$ that is not exponentially small. The first-stage bound further improves to $\widetilde{O}\left(\tfrac{n^{2.5}}{\delta}\right)$ in the Gaussian distribution model. Experimental results confirm the tail behavior and the polynomial-time dependence of the iteration counts on the problem dimensions. As an application of our probabilistic analysis, we explore how the disparity among the components of the optimal solution bears on the performance of rPDHG, and we provide guidelines for generating challenging LP test instances.


💡 Research Summary

Linear programming (LP) remains a cornerstone of optimization, traditionally solved by the simplex method or interior‑point algorithms. While these classic methods enjoy strong theoretical guarantees, they rely heavily on matrix factorizations, which become a bottleneck for large‑scale, sparse problems and are ill‑suited for modern parallel hardware such as GPUs. The restarted primal‑dual hybrid gradient (rPDHG) method has emerged as a first‑order alternative that requires only matrix‑vector products, scales well with sparsity, and has already been incorporated into several commercial solvers. Despite impressive empirical performance, existing worst‑case analyses of rPDHG yield iteration bounds that depend on problem‑specific condition measures (e.g., Hoffman constants) and can be exponential in the worst case, leaving a substantial gap between theory and practice.

The paper addresses this gap by adopting a probabilistic viewpoint. It asks whether, under reasonable distributional assumptions on the input data, rPDHG can be shown to run in polynomial time with high probability (i.e., with probability at least 1 − δ for a non‑exponentially small δ). Moreover, it seeks to extend the analysis beyond the classic Gaussian model to a broader class of sub‑Gaussian distributions, and to extract new insights about the algorithm’s behavior from this probabilistic lens.

Probabilistic model.
The authors build on Todd’s classic random‑LP model. The constraint matrix $A\in\mathbb{R}^{m\times n}$ (with $m\le n$) is generated entry‑wise from a sub‑Gaussian distribution (Gaussian is a special case). A feasible primal solution $\hat x$ and a feasible dual pair $(\hat y,\hat s)$ are first sampled, with $\hat x$ and $\hat s$ supported on complementary index sets; then the right‑hand side $b$ and objective vector $c$ are constructed as $b=A\hat x$ and $c=A^{\top}\hat y+\hat s$. Consequently $(\hat x,\hat s)$ satisfies complementary slackness and is an optimal primal‑dual pair for the generated LP. This construction guarantees feasibility and optimality while preserving independence between the random matrix $A$ and the optimal solution vectors.
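As a concrete illustration, the construction above can be sketched in a few lines of NumPy. The function name `generate_random_lp` and the particular way the supports of $\hat x$ and $\hat s$ are split into basic and non-basic index sets are illustrative assumptions (Gaussian special case shown), not the paper's exact sampling scheme:

```python
import numpy as np

def generate_random_lp(m, n, seed=None):
    """Sketch of a Todd-style random LP:  min c^T x  s.t.  Ax = b, x >= 0,
    whose optimal primal-dual pair is known by construction."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))          # entry-wise Gaussian matrix

    # Sample x_hat and s_hat with complementary (disjoint) supports so
    # that complementary slackness holds by construction.
    basis = rng.choice(n, size=m, replace=False)
    nonbasis = np.setdiff1d(np.arange(n), basis)
    x_hat = np.zeros(n)
    x_hat[basis] = rng.uniform(0.5, 1.5, size=m)          # basic vars > 0
    y_hat = rng.standard_normal(m)
    s_hat = np.zeros(n)
    s_hat[nonbasis] = rng.uniform(0.5, 1.5, size=n - m)   # nonbasic slacks > 0

    b = A @ x_hat                  # primal feasibility by construction
    c = A.T @ y_hat + s_hat        # dual feasibility by construction
    return A, b, c, x_hat, y_hat, s_hat
```

By construction the returned instance is primal and dual feasible, and `x_hat * s_hat == 0` entry-wise, certifying optimality of the sampled pair.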

Algorithmic background.
rPDHG iteratively updates the primal variables $x$ and dual variables $y$ (with dual slack $s=c-A^{\top}y$) via gradient steps on a saddle‑point formulation of the LP, periodically restarting from a weighted average of the iterates. Empirically, the method exhibits a two‑stage behavior: (i) an “identification” stage where iterates settle on the optimal basis (the set of basic columns of $A$), and (ii) a “local convergence” stage where, once the basis is fixed, the algorithm converges linearly to an $\varepsilon$-optimal solution.
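A minimal sketch of such a loop may help fix ideas. The fixed-frequency restart rule, the step sizes, and the name `rpdhg` are simplified assumptions for illustration; the paper's method uses its own (adaptive) choices:

```python
import numpy as np

def rpdhg(A, b, c, iters=2000, restart_every=200):
    """Sketch of PDHG with periodic restarts at the iterate average,
    applied to the saddle point  min_{x>=0} max_y  c^T x + y^T (b - Ax)."""
    m, n = A.shape
    eta = 1.0 / np.linalg.norm(A, 2)       # ensures tau*sigma*||A||^2 <= 1
    tau, sigma = eta, eta
    x, y = np.zeros(n), np.zeros(m)
    x_sum, y_sum, k = np.zeros(n), np.zeros(m), 0
    for t in range(iters):
        x_new = np.maximum(0.0, x - tau * (c - A.T @ y))  # projected primal step
        y = y + sigma * (b - A @ (2 * x_new - x))         # dual step, extrapolated
        x = x_new
        x_sum += x; y_sum += y; k += 1
        if (t + 1) % restart_every == 0:                  # restart at the average
            x, y = x_sum / k, y_sum / k
            x_sum, y_sum, k = np.zeros(n), np.zeros(m), 0
    return x, y
```

Each iteration costs only two matrix-vector products with $A$ (plus one with $A^{\top}$), which is what makes the method attractive for large sparse problems and GPU execution.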

Main theoretical contributions.

  1. High‑probability iteration bounds for sub‑Gaussian data (Theorem 3.1).
    For any failure probability $\delta\in(0,1)$, with probability at least $1-\delta$ the identification stage finishes after $\widetilde{O}\left(\tfrac{n^{2.5}m^{0.5}}{\delta}\right)$ iterations, after which $O\left(\tfrac{n^{0.5}m^{0.5}}{\delta}\ln\big(\tfrac{1}{\varepsilon}\big)\right)$ additional iterations suffice to reach an $\varepsilon$-optimal solution.
