Decentralized Non-convex Stochastic Optimization with Heterogeneous Variance
Decentralized optimization is critical for solving large-scale machine learning problems over distributed networks, where multiple nodes collaborate through local communication. In practice, the variances of stochastic gradient estimators often differ across nodes, yet their impact on algorithm design and complexity remains unclear. To address this issue, we propose D-NSS, a decentralized algorithm with node-specific sampling, and establish a sample complexity bound that depends on the arithmetic mean of the local standard deviations, which is tighter than the bounds of existing methods that depend on the worst-case or quadratic mean. We further derive a matching sample complexity lower bound under heterogeneous variance, thereby proving the optimality of this dependence. Moreover, we extend the framework with a variance reduction technique and develop D-NSS-VR, which under the mean-squared smoothness assumption attains an improved sample complexity bound while preserving the arithmetic-mean dependence. Finally, numerical experiments validate the theoretical results and demonstrate the effectiveness of the proposed algorithms.
💡 Research Summary
The paper tackles decentralized non‑convex stochastic optimization in settings where each node’s stochastic gradient estimator exhibits a distinct variance σ_i², a scenario common in federated or distributed learning with heterogeneous data. Existing decentralized methods typically assume uniform variance across nodes or bound performance by the worst‑case variance σ_max, leading to suboptimal sample complexity when variances differ widely.
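To make the gap between these dependences concrete, the following sketch compares the three noise measures on a hypothetical set of per-node standard deviations (the values are illustrative, not from the paper); by the power-mean inequality, the arithmetic mean is never larger than the quadratic mean, which is never larger than the maximum:

```python
import numpy as np

# Hypothetical heterogeneous per-node standard deviations: one "noisy" node
# dominates the worst case while most nodes are quiet.
sigma = np.array([1.0, 1.0, 1.0, 10.0])

sigma_max = sigma.max()                    # worst-case dependence
quad_mean = np.sqrt(np.mean(sigma ** 2))   # quadratic (root-mean-square) dependence
arith_mean = sigma.mean()                  # arithmetic-mean dependence (D-NSS)

# Power-mean inequality: arithmetic mean <= quadratic mean <= max.
assert arith_mean <= quad_mean <= sigma_max
print(f"arithmetic {arith_mean:.2f} <= quadratic {quad_mean:.2f} <= max {sigma_max:.2f}")
```

With one outlier node, the arithmetic mean (3.25) is well below both the quadratic mean (≈5.07) and the worst case (10), which is exactly the regime where an arithmetic-mean sample complexity is strictly better.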
The authors first formalize the problem as minimizing f(x) = (1/m)∑_{i=1}^m f_i(x), where each local function f_i is defined by a local data distribution D_i and its stochastic gradient g_i(x;ξ_i) satisfies E[g_i(x;ξ_i)] = ∇f_i(x) and E‖g_i(x;ξ_i) − ∇f_i(x)‖² ≤ σ_i², with the variance levels σ_i² allowed to differ across nodes.
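The node-specific-sampling idea can be illustrated with a toy decentralized run. All names and values below are assumptions for illustration (random local quadratics, a ring gossip matrix, decentralized SGD as the base method), not the paper's exact D-NSS algorithm; the point of the sketch is the allocation rule b_i ∝ σ_i, under which the variance of the network-averaged gradient scales with the squared arithmetic mean of the σ_i divided by the total sample budget, rather than with the worst case:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative setup (assumed, not from the paper) ---
m, d = 4, 5                                   # nodes, problem dimension
A = [rng.standard_normal((d, d)) for _ in range(m)]
Q = [a.T @ a / d + np.eye(d) for a in A]      # local quadratics f_i(x) = 0.5 x^T Q_i x
sigma = np.array([0.5, 0.5, 0.5, 5.0])        # heterogeneous gradient-noise levels

def noisy_grad(i, x, batch):
    """Mini-batch stochastic gradient: mean Q_i x, per-coordinate std sigma_i / sqrt(batch)."""
    return Q[i] @ x + rng.standard_normal(x.shape) * sigma[i] / np.sqrt(batch)

# Node-specific sampling heuristic: batch sizes proportional to sigma_i, so a
# total budget B yields averaged-gradient variance (mean sigma)^2 / B.
budget = 40
b = np.maximum(1, np.round(budget * sigma / sigma.sum())).astype(int)

# Doubly stochastic gossip matrix for a ring of m nodes.
W = np.zeros((m, m))
for i in range(m):
    W[i, i] = 0.5
    W[i, (i - 1) % m] = 0.25
    W[i, (i + 1) % m] = 0.25

X = rng.standard_normal((m, d))               # one local iterate per node
eta = 0.05
for _ in range(300):
    G = np.stack([noisy_grad(i, X[i], b[i]) for i in range(m)])
    X = W @ X - eta * G                       # gossip averaging + local gradient step

x_bar = X.mean(axis=0)
grad_norm = np.linalg.norm(np.mean([Q[i] @ x_bar for i in range(m)], axis=0))
print(f"batch sizes: {b}, final averaged-gradient norm: {grad_norm:.3f}")
```

The allocation gives the noisy node a larger batch (here 31 of 40 samples), which is what drives the arithmetic-mean dependence: with b_i = B·σ_i/∑_j σ_j, the variance of the averaged gradient is (1/m²)∑_i σ_i²/b_i = (∑_i σ_i)²/(m²B).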