Convex and Network Flow Optimization for Structured Sparsity
We consider a class of learning problems regularized by a structured sparsity-inducing norm defined as the sum of l_2- or l_infinity-norms over groups of variables. Whereas much effort has been put in developing fast optimization techniques when the groups are disjoint or embedded in a hierarchy, we address here the case of general overlapping groups. To this end, we present two different strategies: On the one hand, we show that the proximal operator associated with a sum of l_infinity-norms can be computed exactly in polynomial time by solving a quadratic min-cost flow problem, allowing the use of accelerated proximal gradient methods. On the other hand, we use proximal splitting techniques, and address an equivalent formulation with non-overlapping groups, but in higher dimension and with additional constraints. We propose efficient and scalable algorithms exploiting these two strategies, which are significantly faster than alternative approaches. We illustrate these methods with several problems such as CUR matrix factorization, multi-task learning of tree-structured dictionaries, background subtraction in video sequences, image denoising with wavelets, and topographic dictionary learning of natural image patches.
💡 Research Summary
This paper addresses the challenging problem of learning with structured sparsity when the groups of variables overlap arbitrarily. The authors focus on regularizers that are sums of ℓ₂- or ℓ∞-norms over a collection of groups G, i.e., Ω(w)=∑_{g∈G}η_g‖w_g‖ where the norm inside each group can be either ℓ₂ or ℓ∞. While previous work has provided fast algorithms for disjoint groups or hierarchical (tree‑structured) groups, the general overlapping case remained computationally demanding.
The first major contribution is an exact, polynomial‑time algorithm for the proximal operator of a sum of ℓ∞‑norms with overlapping groups. By reformulating the proximal subproblem as a quadratic minimum‑cost flow problem, the authors map each variable and each group to nodes and arcs in a flow network. The capacity constraints encode the ℓ∞‑norm bounds, while the quadratic term appears as edge costs. Standard min‑cost flow solvers (e.g., push‑relabel, cycle‑cancelling) then compute the proximal step in O(poly(|V|,|E|)) time. This enables the use of accelerated proximal gradient methods (FISTA) with a guaranteed O(1/k²) convergence rate. Moreover, they show that the dual norm of Ω can be evaluated with the same flow machinery, allowing tight duality‑gap certificates during optimization.
The second contribution is a proximal‑splitting framework that eliminates overlaps by lifting the problem to a higher‑dimensional space. Introducing auxiliary variables z_g for each group, the original variable w is constrained by a linear equality A z = w, where A simply aggregates the group components. The lifted problem separates into independent proximal subproblems for each group (each involving only a simple ℓ₂ or ℓ∞ proximal operator) together with a quadratic consensus term. The authors solve this using the Alternating Direction Method of Multipliers (ADMM) and, alternatively, Douglas‑Rachford splitting. Although the dimensionality increases, the subproblems are cheap and highly parallelizable, making the approach scalable to large‑scale data.
Both algorithms are complemented by efficient implementations, careful handling of numerical stability, and extensive empirical evaluation. The paper demonstrates the practical impact on several domains:
- CUR matrix factorization – By imposing structured sparsity on the selection matrices, the authors obtain more accurate low‑rank approximations with fewer sampled rows/columns.
- Video background subtraction – Overlapping spatio‑temporal groups capture continuity of background pixels; the proposed methods separate moving foreground from static background efficiently.
- Tree‑structured dictionary learning – Using ℓ∞‑norm groups aligned with a hierarchy yields dictionaries whose atoms respect the tree, improving interpretability and classification performance.
- Wavelet image denoising – Multi‑scale overlapping groups in the wavelet domain lead to superior PSNR compared with classic total‑variation or ℓ₁ wavelet shrinkage.
- Topographic dictionary learning for natural patches – Overlapping groups enforce locality and smoothness in the learned atoms, producing more natural visual structures.
Across all experiments, the flow‑based proximal gradient method outperforms prior subgradient, interior‑point, and smoothing‑based approaches by factors ranging from 5× to 30× in runtime, while using substantially less memory. The ADMM‑based splitting method shows comparable speed and excellent scalability, especially when parallel resources are available.
In summary, the paper makes three core theoretical contributions: (i) an exact polynomial‑time proximal operator for overlapping ℓ∞‑norm sums via min‑cost flow, (ii) an efficient dual‑norm computation enabling tight duality‑gap monitoring, and (iii) a versatile proximal‑splitting formulation that removes overlaps at the cost of a higher‑dimensional consensus problem. These contributions are validated on a broad set of real‑world tasks, establishing new state‑of‑the‑art performance for structured sparsity with arbitrary overlapping groups.
Comments & Academic Discussion
Loading comments...
Leave a Comment