Alternating Gradient-Type Algorithm for Bilevel Optimization with Inexact Lower-Level Solutions via Moreau Envelope-based Reformulation

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

In this paper, we study a class of bilevel optimization problems where the lower-level problem is a convex composite optimization model, which arises in various applications, including bilevel hyperparameter selection for regularized regression models. To solve these problems, we propose an Alternating Gradient-type algorithm with Inexact Lower-level Solutions (AGILS) based on a Moreau envelope-based reformulation of the bilevel optimization problem. The proposed algorithm does not require exact solutions of the lower-level problem at each iteration, improving computational efficiency. We prove the convergence of AGILS to stationary points and, under the Kurdyka-Łojasiewicz (KL) property, establish its sequential convergence. Numerical experiments, including a toy example and a bilevel hyperparameter selection problem for the sparse group Lasso model, demonstrate the effectiveness of the proposed AGILS.


💡 Research Summary

This paper addresses a class of bilevel optimization problems in which the lower‑level subproblem is a convex composite model, a setting that frequently appears in hyper‑parameter selection for regularized regression. Traditional approaches based on the value‑function reformulation or MPEC require either exact solutions of the lower‑level problem at each iteration or strong convexity/PL conditions that are often absent when nonsmooth regularizers (e.g., ℓ₁, group ℓ₂) are present. To overcome these limitations, the authors introduce a Moreau‑envelope‑based reformulation. For a given smoothing parameter γ > 0 they define the envelope
$$v_{\gamma}(x,y)=\inf_{\theta\in Y}\Bigl\{\phi(x,\theta)+\frac{1}{2\gamma}\,\|\theta-y\|^{2}\Bigr\}$$
and consider the relaxed constraint $\phi(x,y)-v_{\gamma}(x,y)\le\varepsilon$. This constraint is equivalent to the original bilevel feasibility when ε is sufficiently small, yet it does not require computing the exact lower‑level optimizer.
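For intuition, the envelope can be evaluated numerically and compared against a known closed form. The sketch below is an illustrative assumption, not the paper's setup: it takes φ(x, θ) = |θ| with the x‑dependence suppressed, whose Moreau envelope is the Huber function and whose proximal point is given by soft‑thresholding.

```python
from scipy.optimize import minimize_scalar

def moreau_envelope_numeric(y, gamma):
    # v_gamma(y) = inf_theta { |theta| + (1/(2*gamma)) * (theta - y)**2 }
    res = minimize_scalar(lambda t: abs(t) + (t - y) ** 2 / (2 * gamma))
    return res.fun

def moreau_envelope_closed(y, gamma):
    # Known closed form for phi = |.| (the Huber function):
    # y**2/(2*gamma) if |y| <= gamma, else |y| - gamma/2
    return y ** 2 / (2 * gamma) if abs(y) <= gamma else abs(y) - gamma / 2

gamma, y = 0.5, 2.0
v_num = moreau_envelope_numeric(y, gamma)
v_cl = moreau_envelope_closed(y, gamma)   # 1.75 exactly
gap = abs(y) - v_cl                       # phi(y) - v_gamma(y) = 0.25
```

At y = 2.0 with γ = 0.5 the gap φ(y) − v_γ(y) = 0.25 is strictly positive, so this point would fail the relaxed constraint for any ε < 0.25.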

Building on this reformulation, the authors propose the Alternating Gradient‑type algorithm with Inexact Lower‑level Solutions (AGILS). Each iteration consists of: (i) a gradient step on the upper‑level variables x using ∇ₓF and ∇ₓφ; (ii) an inexact proximal step on the lower‑level variables y, where the proximal subproblem associated with the Moreau envelope is solved only up to a controllable residual δₖ; (iii) an adaptive penalty‑parameter update that drives the constraint violation toward zero; and (iv) a feasibility‑correction projection when the violation exceeds a prescribed threshold. The inexactness criterion δₖ is required to be summable (∑δₖ < ∞), guaranteeing that the accumulated error does not hinder convergence.
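The four steps above can be sketched on a toy quadratic instance. Everything below is a schematic illustration under assumed objectives, F(x, y) = (x − 1)² + (y − 1)² and φ(x, y) = (y − x)², with hand‑picked constants, not the authors' implementation; the feasibility‑correction step (iv) is omitted for brevity.

```python
def agils_toy(iters=200, alpha=0.1):
    # Toy upper level F(x, y) = (x - 1)**2 + (y - 1)**2;
    # toy lower level phi(x, y) = (y - x)**2, so argmin_y phi(x, .) = x.
    # The bilevel solution is therefore (x, y) = (1, 1).
    x, y, lam = 0.0, 0.0, 1.0
    for k in range(1, iters + 1):
        delta_k = 1.0 / k ** 2                  # summable inexactness tolerance
        # (i) gradient step on x: grad_x F plus grad_x of the penalty term
        grad_x = 2 * (x - 1) + lam * 2 * (x - y)
        x -= alpha * grad_x
        # (ii) inexact lower-level solve: iterate until residual <= delta_k
        while abs(2 * (y - x)) > delta_k:
            y -= 0.25 * 2 * (y - x)
        # (iii) mild adaptive penalty growth; (iv) correction step omitted
        lam = min(1.01 * lam, 10.0)
    return x, y

x, y = agils_toy()   # converges near the bilevel solution (1, 1)
```

Because the inner residual tolerance δₖ = 1/k² is summable, the accumulated inexactness stays bounded and the outer iterates settle at the bilevel solution of the toy problem.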

The convergence analysis introduces a merit function
$$\Psi^{k}=F(x^{k},y^{k})+\lambda_{k}\bigl(\phi(x^{k},y^{k})-v_{\gamma}(x^{k},y^{k})-\varepsilon\bigr)_{+}$$
and shows that, under mild Lipschitz assumptions on F, f, and g, and with step sizes satisfying the standard diminishing‑step conditions (∑αₖ = ∞, ∑αₖ² < ∞), the sequence {(xᵏ,yᵏ)} is bounded and every limit point satisfies the KKT conditions of the ε‑approximate problem $(\text{VP})_{\varepsilon}^{\gamma}$. Because ∇v_γ is not globally Lipschitz, the proof relies on subdifferential calculus and a careful handling of the alternating updates.
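As a quick sanity check on the two step-size conditions (again illustrative, using αₖ = 1/k rather than any schedule from the paper), the harmonic partial sums grow without bound while the squared sums stay below π²/6:

```python
import math

# alpha_k = 1/k satisfies sum alpha_k = inf and sum alpha_k**2 < inf:
N = 10 ** 6
s1 = sum(1.0 / k for k in range(1, N + 1))       # ~ log(N) + 0.577, diverges
s2 = sum(1.0 / k ** 2 for k in range(1, N + 1))  # converges to pi**2 / 6
```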

A further result exploits the Kurdyka‑Łojasiewicz (KL) property of the merit function. Assuming Ψ is a KL function (which holds for a broad class of semi‑algebraic problems), the authors prove that the whole sequence of iterates converges to a single stationary point and derive rates that depend on the KL exponent (linear, sub‑linear, or finite‑length). This sequential convergence result is non‑trivial given the inexact lower‑level solves and the lack of smoothness.

Numerical experiments validate the theory. A synthetic quadratic example demonstrates that AGILS reaches the feasible region and the optimum within a modest number of iterations. More importantly, a bilevel hyper‑parameter selection task for the sparse group Lasso (with dimensions up to n = 5000, m = 2000) shows that AGILS attains comparable validation error to state‑of‑the‑art methods such as TTSA and a double‑loop DC algorithm, while reducing runtime by 30‑45 % and memory consumption by about 30 %. The adaptive penalty and feasibility correction are shown to be crucial for maintaining constraint satisfaction without solving the lower‑level problem exactly.

In summary, the paper makes three key contributions: (1) a rigorous Moreau‑envelope reformulation that enables a tractable ε‑approximation of bilevel problems with nonsmooth lower levels; (2) an alternating gradient algorithm that tolerates inexact proximal solutions, thereby cutting computational cost; and (3) a comprehensive convergence theory—including KKT convergence and KL‑based sequential convergence—under realistic assumptions. The work opens avenues for scalable bilevel optimization in machine learning, especially when regularizers are nonsmooth and strong convexity cannot be guaranteed. Future directions suggested include extending the framework to non‑convex lower‑level problems, multi‑level hierarchies, and distributed implementations.

