SPABA: A Single-Loop and Probabilistic Stochastic Bilevel Algorithm Achieving Optimal Sample Complexity
While stochastic bilevel optimization methods have been extensively studied for addressing large-scale nested optimization problems in machine learning, it remains an open question whether the optimal complexity bounds for solving bilevel optimization are the same as those in single-level optimization. Our main result resolves this question: SPABA, an adaptation of the PAGE method for nonconvex optimization in (Li et al., 2021) to the bilevel setting, can achieve optimal sample complexity in both the finite-sum and expectation settings. We show the optimality of SPABA by proving that there is no gap in complexity analysis between stochastic bilevel and single-level optimization when implementing PAGE. Notably, as indicated by the results of (Dagréou et al., 2022), there might exist a gap in complexity analysis when implementing other stochastic gradient estimators, like SGD and SAGA. In addition to SPABA, we propose several other single-loop stochastic bilevel algorithms that either match or improve the state-of-the-art sample complexity results, leveraging our convergence rate and complexity analysis. Numerical experiments demonstrate the superior practical performance of the proposed methods.
💡 Research Summary
This paper addresses a fundamental open problem in stochastic bilevel optimization: whether the optimal sample‑complexity bounds for bilevel problems can match those of single‑level nonconvex optimization. The authors answer affirmatively by introducing SPABA (Single‑Loop Probabilistic Stochastic Bilevel Algorithm), an adaptation of the PAGE variance‑reduction technique to the bilevel setting.
The key technical device is the “decoupling” framework originally proposed by Arbel & Mairal (2022) and Dagréou et al. (2022). By introducing an auxiliary variable z, the hypergradient ∇H(x) is tracked through three update directions:
- Dₓ(x,y,z) = ∇₁f(x,y) − ∇²₂₁g(x,y)z (upper‑level direction),
- D_y(x,y,z) = ∇₂g(x,y) (lower‑level gradient), and
- D_z(x,y,z) = ∇²₂₂g(x,y)z − ∇₂f(x,y) (residual of the linear system defining z).
Each component corresponds to a strongly convex subproblem, allowing the use of simple stochastic gradient steps without explicit Hessian inversion or exact lower‑level solutions.
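As a sanity check, these directions can be instantiated on a toy quadratic bilevel problem (the choices of f, g, and A below are illustrative, not from the paper). At the exact lower‑level solution y* and linear‑system solution z*, D_y and D_z vanish and D_x recovers the true hypergradient:

```python
import numpy as np

d = 3
A = 2.0 * np.eye(d)  # lower-level Hessian (SPD, so g is strongly convex in y)

# Illustrative toy problem (not from the paper):
#   f(x, y) = 0.5 * ||y - x||^2        (upper level)
#   g(x, y) = 0.5 * y^T A y - x^T y    (lower level)
grad1_f  = lambda x, y: x - y          # gradient of f in x
grad2_f  = lambda x, y: y - x          # gradient of f in y
grad2_g  = lambda x, y: A @ y - x      # gradient of g in y
hess22_g = lambda x, y: A              # Hessian of g in y
hess21_g = lambda x, y: -np.eye(d)    # cross derivative of grad2_g in x

def directions(x, y, z):
    Dx = grad1_f(x, y) - hess21_g(x, y) @ z   # upper-level direction
    Dy = grad2_g(x, y)                        # lower-level gradient
    Dz = hess22_g(x, y) @ z - grad2_f(x, y)   # linear-system residual
    return Dx, Dy, Dz
```

With A = 2I one gets y* = x/2 and z* = −x/4, and both D_x at (y*, z*) and the analytic hypergradient reduce to x/4; no Hessian inversion is ever performed inside `directions`.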
SPABA operates in a single loop: at iteration k it samples mini‑batches for f and g, constructs biased but controlled estimators vₓ, v_y, v_z of the three directions using the PAGE estimator, and updates x, y, z simultaneously with step sizes αₖ, βₖ, γₖ. PAGE occasionally (with probability p, roughly 1/√B for large‑batch size B) computes a full‑batch gradient and otherwise applies a recursive variance‑reduced update; this yields an estimator whose mean‑squared error is O(ε) while keeping the expected per‑iteration cost O(1). The authors also provide a unified convergence analysis that accommodates biased estimators such as PAGE and STORM, extending beyond the usual unbiased‑gradient literature.
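For intuition, here is a minimal single‑level sketch of the PAGE estimator of (Li et al., 2021); the step size, batch size, and switching rule p = b/(n+b) are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

def page_sgd(grad_i, n, x0, lr=0.1, b=8, steps=200, seed=0):
    """Single-level PAGE sketch: with probability p refresh the full-batch
    gradient; otherwise update the previous estimator recursively using only
    a small mini-batch of gradient differences."""
    rng = np.random.default_rng(seed)
    p = b / (n + b)  # switching probability (illustrative choice)
    x = x0.copy()
    v = np.mean([grad_i(x, i) for i in range(n)], axis=0)  # initial full gradient
    for _ in range(steps):
        x_new = x - lr * v
        if rng.random() < p:
            # occasional full-batch refresh
            v = np.mean([grad_i(x_new, i) for i in range(n)], axis=0)
        else:
            # recursive variance-reduced update on a mini-batch
            idx = rng.choice(n, size=b, replace=False)
            v = v + np.mean([grad_i(x_new, i) - grad_i(x, i) for i in idx],
                            axis=0)
        x = x_new
    return x
```

Most iterations cost only O(b) gradient evaluations, yet the recursive correction keeps v tracking the full gradient; SPABA applies this estimator simultaneously to all three bilevel directions.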
Complexity results are established for two settings:
- Expectation (infinite‑data) setting – Assuming bounded variance and mean‑squared smoothness of the stochastic gradients, SPABA attains an ε‑stationary point (E‖∇H(x)‖² ≤ ε) with O(ε⁻³⁄²) stochastic gradient and Hessian‑vector‑product evaluations. This matches the known lower bound for nonconvex stochastic single‑level optimization and is therefore optimal.
- Finite‑sum setting – When f and g are finite averages over n and m samples, SPABA achieves a sample complexity of O((n+m)¹⁄² ε⁻¹). This improves upon prior SAGA‑based bilevel methods, which required O((n+m)²⁄³ ε⁻¹) or incurred an extra log(1/ε) factor.
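To make the finite‑sum improvement concrete, a back‑of‑the‑envelope comparison (the dataset size and accuracy below are hypothetical, chosen only to illustrate the gap):

```python
# Hypothetical problem size and target accuracy (not from the paper):
N = 10**6                      # n + m, total number of samples
eps = 1e-4                     # target stationarity E||grad H(x)||^2 <= eps
spaba_cost = N**0.5 / eps      # SPABA: O((n+m)^(1/2) * eps^(-1))
saba_cost = N**(2 / 3) / eps   # SABA:  O((n+m)^(2/3) * eps^(-1))
print(saba_cost / spaba_cost)  # ratio = N^(1/6), i.e. 10 for N = 10^6
```

For a million samples the exponent change alone saves an order of magnitude in oracle calls, and the gap N¹⁄⁶ widens with the dataset size.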
In addition to SPABA, the paper proposes MA‑SABA, a momentum‑enhanced variant that combines the SAGA estimator with a moving‑average (momentum) term on the x‑update. MA‑SABA reaches O((n+m)²⁄³ ε⁻¹) complexity without requiring higher‑order smoothness, removing an assumption that earlier algorithms such as SRBA relied on.
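The momentum device amounts to a moving average over stochastic directions; a one‑line sketch (theta and the interface are illustrative, not the paper's exact update):

```python
import numpy as np

def momentum_update(v_prev, d_new, theta=0.2):
    """Moving-average momentum: v_k = (1 - theta) * v_{k-1} + theta * d_k.
    Averaging a noisy direction stream damps its variance across iterations."""
    return (1.0 - theta) * v_prev + theta * d_new
```

Because each fresh direction enters with weight theta, the variance of the averaged direction shrinks relative to a single stochastic sample, which is what lets MA‑SABA avoid the higher‑order smoothness assumption.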
The theoretical contributions are complemented by extensive experiments on standard bilevel tasks: hyperparameter tuning, meta‑learning, and neural architecture search. Across all benchmarks, SPABA and MA‑SABA converge faster (fewer epochs and fewer stochastic oracle calls) and achieve lower final loss compared with state‑of‑the‑art baselines such as SOBA, MA‑SOBA, SRBA, and deterministic AID methods.
Overall, the paper delivers four major advances: (i) a truly single‑loop bilevel algorithm, (ii) optimal sample complexity for both expectation and finite‑sum regimes, (iii) a unified analysis framework for biased variance‑reduced estimators, and (iv) practical algorithms that outperform existing methods on real‑world problems. The work convincingly shows that, when equipped with modern variance‑reduction techniques like PAGE, stochastic bilevel optimization can be as statistically efficient as its single‑level counterpart, thereby resolving a long‑standing open question in the field.