Two-stage stochastic algorithm for solving large-scale (non)-convex separable optimization problems under affine constraints


We consider nonsmooth optimization problems under affine constraints, where the objective consists of the average of the component functions of a large number $N$ of agents, and we only assume access to the Fenchel conjugate of the component functions. The algorithm of choice for solving such problems is the dual subgradient method, also known as dual decomposition, which requires $O(\frac{1}{ε^2})$ iterations to reach $ε$-optimality in the convex case. However, each iteration requires computing the Fenchel conjugate of each of the $N$ agents, leading to a complexity $O(\frac{N}{ε^2})$ which might be prohibitive in practical applications. To overcome this, we propose a two-stage algorithm, combining a stochastic subgradient algorithm on the dual problem, followed by a block-coordinate Frank-Wolfe algorithm to obtain primal solutions. The resulting algorithm requires only $O(\frac{1}{ε^2} + \frac{N}{ε^{2/3}})$ calls to Fenchel conjugates to obtain an $ε$-optimal primal solution in expectation in the convex case. We extend our results to nonconvex component functions and show that our method still applies and gets (almost) the same convergence rate, this time only to an approximate primal solution recovering the classical duality gap bounds usually obtained using the Shapley-Folkman theorem.


💡 Research Summary

The paper addresses large‑scale separable optimization problems with affine constraints, where the objective is the average of N component functions h_i(x_i). Each component is proper, convex (or possibly non‑convex in later sections), lower‑semicontinuous, and defined on a compact domain X_i. The only oracle assumed is a minimization oracle (O1) that, given a dual vector λ and a scalar γ≥0, returns
x_i^(γ,λ) = arg min_{x_i∈X_i} γ h_i(x_i) + λᵀA_i x_i.
When γ=0, this oracle is equivalent to computing a subgradient of the Fenchel conjugate h_i^*.
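For intuition, the oracle O1 can be sketched for the toy case where each domain X_i is a finite set of candidate points, so the minimization reduces to enumeration (a hypothetical setup chosen for illustration; the function names `oracle` and `h_i` are ours, not the paper's):

```python
import numpy as np

def oracle(points, A_i, lam, gamma, h_i):
    """Oracle O1 for a component whose domain X_i is the finite set `points`:
    returns argmin over x_i in X_i of gamma * h_i(x_i) + lam @ (A_i @ x_i)."""
    vals = [gamma * h_i(x) + lam @ (A_i @ x) for x in points]
    return points[int(np.argmin(vals))]
```

With γ=0 the objective degenerates to the linear form λᵀA_i x_i, matching the subgradient-of-the-conjugate interpretation above.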

Traditional approach.
The standard method is the dual subgradient algorithm (also called dual decomposition). One forms the dual function
d(λ) = −λᵀb + (1/N)∑_{i=1}^N inf_{x_i∈X_i} [h_i(x_i) + λᵀA_i x_i],
which is concave in λ, and maximizes it by supergradient ascent; each iteration evaluates all N decoupled subproblems via oracle O1 with γ=1, which is the source of the O(N/ε²) total complexity noted above.
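The dual decomposition loop can be sketched as follows on a toy instance where each X_i is a finite set of points, so each agent's subproblem is solved by enumeration (an illustrative setup of our own; `dual_subgradient`, the step-size rule, and the data layout are assumptions, not the paper's experiments):

```python
import numpy as np

def dual_subgradient(agents, b, steps=500, alpha0=1.0):
    """Dual decomposition sketch. `agents` is a list of (points, A_i, h_i),
    where `points` enumerates the finite domain X_i. Maximizes the concave dual
    d(lam) = -lam @ b + (1/N) * sum_i inf_{x in X_i} [h_i(x) + lam @ (A_i @ x)]
    by supergradient ascent with a 1/sqrt(k) step size."""
    N = len(agents)
    lam = np.zeros_like(b, dtype=float)
    for k in range(steps):
        # Each agent solves its decoupled subproblem (oracle O1 with gamma=1).
        xs = [min(pts, key=lambda x: h(x) + lam @ (A @ x))
              for pts, A, h in agents]
        # A supergradient of d at lam is the average constraint residual.
        g = sum(A @ x for (pts, A, h), x in zip(agents, xs)) / N - b
        lam = lam + (alpha0 / np.sqrt(k + 1)) * g
    return lam, xs
```

Note that every iteration touches all N agents, which is exactly the per-iteration cost the paper's two-stage scheme aims to reduce.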

