Unrolled Graph Neural Networks for Constrained Optimization
In this paper, we unroll the dynamics of the dual ascent (DA) algorithm in two coupled graph neural networks (GNNs) to solve constrained optimization problems. The two networks interact with each other at the layer level to find a saddle point of the Lagrangian. The primal GNN finds a stationary point for a given dual multiplier, while the dual network iteratively refines its estimates to reach an optimal solution. We force the primal and dual networks to mirror the dynamics of the DA algorithm by imposing descent and ascent constraints. We propose a joint training scheme that alternates between updating the primal and dual networks. Our numerical experiments demonstrate that our approach yields near-optimal near-feasible solutions and generalizes well to out-of-distribution (OOD) problems.
💡 Research Summary
The paper introduces a novel learning‑to‑optimize framework that unrolls the dynamics of the dual‑ascent (DA) algorithm into two interacting graph neural networks (GNNs) for solving constrained optimization problems. Traditional algorithm‑unrolling approaches have focused mainly on unconstrained settings or have limited themselves to learning a few scalar hyper‑parameters of primal‑dual methods. In contrast, this work models both the primal minimization step and the dual ascent step as separate GNNs—named the primal network Φ_P and the dual network Φ_D—allowing them to exchange information at every layer.
Problem formulation
Given a scalar objective f₀(x;z) and m inequality constraints f(x;z) ≤ 0, the Lagrangian is L(x,λ;z)=f₀(x;z)+λᵀf(x;z). The dual problem seeks λ ≥ 0 that maximizes the minimum of L over x. The classical DA algorithm iterates two steps: (i) solve the primal sub‑problem for a fixed λ_l, obtaining x⁎(λ_l)=argmin_x L(x,λ_l;z), and (ii) update the multiplier via a projected gradient‑ascent step λ_{l+1}=Π_{ℝ₊}(λ_l+η f(x⁎(λ_l);z)).
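The two DA steps above can be sketched on a toy problem where the primal sub‑problem has a closed form. The quadratic objective, constraint matrix, and step size below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Toy instance of min ||x - z||^2  s.t.  A x - b <= 0.
# Setting grad_x L = 2(x - z) + A^T lam = 0 gives the closed-form
# primal minimizer x*(lam) = z - A^T lam / 2.
z = np.array([2.0, 2.0])
A = np.array([[1.0, 1.0]])   # single constraint: x1 + x2 <= 1
b = np.array([1.0])
eta = 0.5                    # dual ascent step size (assumed)

lam = np.zeros(1)
for _ in range(100):
    x = z - A.T @ lam / 2.0                          # (i) primal minimization
    lam = np.maximum(0.0, lam + eta * (A @ x - b))   # (ii) projected ascent

print(x, lam)  # x -> [0.5, 0.5], lam -> [3.0]
```

Here the dual iterates converge geometrically to λ⁎ = 3, and x⁎(λ⁎) = (0.5, 0.5) is the projection of z onto the feasible half‑space.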
Network architecture
- Primal GNN (Φ_P): Consists of K unrolled layers. Each layer contains T graph‑convolution sub‑layers (filter + tanh nonlinearity) followed by a read‑out, with residual connections that mimic gradient updates. Input to layer k is the previous estimate x̂_{k−1}, the current multiplier λ, and the problem data z; the output is x̂_k=Φ_k^P(x̂_{k−1},λ,z). The network is trained to approximate the mapping λ → x⁎(λ).
- Dual GNN (Φ_D): Consists of L unrolled layers. Layer l receives the previous multiplier λ_{l−1} and the primal estimate x_{l−1}=Φ_P(λ_{l−1},z). A ReLU is applied to its output, λ_l = ReLU(·), enforcing the non‑negativity of the multipliers. Each dual forward pass triggers L forward passes of the primal network, establishing a deep, bidirectional coupling.
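The layer‑level coupling of the two networks can be sketched as follows. This is a minimal numpy sketch under several assumptions: scalar node features, first‑order graph filters (T = 1 sub‑layer per unrolled layer), and random weights; the paper's actual Φ_P and Φ_D use T graph‑convolution sub‑layers plus a read‑out per layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def gnn_layer(S, X, W):
    """One graph-convolution sub-layer: first-order graph filter + tanh.
    W is a pair (W0, W1) of feature-mixing matrices (illustrative)."""
    return np.tanh(X @ W[0] + S @ X @ W[1])

def primal_net(S, x, lam, z, weights):
    """Phi_P: K unrolled layers; residual connections mimic
    gradient-descent updates on the Lagrangian."""
    for W in weights:                                  # K unrolled layers
        feat = np.concatenate([x, lam, z], axis=1)     # layer sees (x, lam, z)
        x = x + gnn_layer(S, feat, W)[:, :x.shape[1]]  # residual update
    return x

def dual_net(S, lam, z, p_weights, d_weights):
    """Phi_D: L unrolled layers; each one runs a full primal forward
    pass, then updates the multipliers with a ReLU (here via
    np.maximum) to keep lam >= 0."""
    x = np.zeros((S.shape[0], 1))
    for W in d_weights:                                # L unrolled layers
        x = primal_net(S, x, lam, z, p_weights)        # inner primal pass
        feat = np.concatenate([x, lam, z], axis=1)
        lam = np.maximum(0.0, lam + gnn_layer(S, feat, W)[:, :1])
    return x, lam

# Demo on a random symmetric graph with n = 5 nodes.
n = 5
S = (rng.random((n, n)) < 0.4).astype(float)
S = (S + S.T) / 2.0
z = rng.standard_normal((n, 1))
p_w = [(0.1 * rng.standard_normal((3, 1)), 0.1 * rng.standard_normal((3, 1)))
       for _ in range(3)]   # K = 3 primal layers
d_w = [(0.1 * rng.standard_normal((3, 1)), 0.1 * rng.standard_normal((3, 1)))
       for _ in range(2)]   # L = 2 dual layers
x, lam = dual_net(S, np.zeros((n, 1)), z, p_w, d_w)
```

Note how the nesting reproduces the coupling described above: one call to `dual_net` makes L calls to `primal_net`, each of which runs K layers.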
Descent/ascent constraints
Unrolled networks often lose the monotonic behavior of the underlying algorithm, leading to instability and poor out‑of‑distribution (OOD) performance. The authors impose explicit layer‑wise constraints:
- Primal descent: For each k, the expected norm of the Lagrangian gradient w.r.t. x must not increase, i.e., E[‖∇ₓL(x̂_k,λ;z)‖] ≤ E[‖∇ₓL(x̂_{k−1},λ;z)‖].
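One way such a layer‑wise descent constraint could be monitored or penalized during training is with a hinge term on consecutive gradient norms. The sketch below uses the same illustrative problem f₀(x;z)=‖x−z‖², f(x;z)=Ax−b (an assumption, not the paper's problem class) and checks the constraint per sample rather than in expectation:

```python
import numpy as np

def lagrangian_grad_norm(x, lam, z, A, b):
    """||grad_x L(x, lam)|| for the illustrative problem
    f0(x; z) = ||x - z||^2, f(x; z) = A x - b."""
    return np.linalg.norm(2.0 * (x - z) + A.T @ lam)

def descent_penalty(layer_outputs, lam, z, A, b):
    """Hinge penalty on the layer-wise primal-descent constraint:
    positive whenever a layer's Lagrangian gradient norm exceeds the
    previous layer's. A sketch of how the constraint could be enforced;
    the paper states it in expectation over the problem distribution."""
    norms = [lagrangian_grad_norm(x, lam, z, A, b) for x in layer_outputs]
    return sum(max(0.0, later - earlier)
               for earlier, later in zip(norms, norms[1:]))

# Demo: layer outputs that approach the minimizer satisfy the constraint.
z, A, b = np.array([2.0, 2.0]), np.array([[1.0, 1.0]]), np.array([1.0])
lam = np.array([3.0])
good = [np.array([2.0, 2.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])]
penalty = descent_penalty(good, lam, z, A, b)  # 0.0: norms are decreasing
```

Reversing the sequence of layer outputs makes the gradient norms increase, so the same function returns a strictly positive penalty.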