Mixing Times and Privacy Analysis for the Projected Langevin Algorithm under a Modulus of Continuity

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original paper on arXiv.

We study the mixing time of the projected Langevin algorithm (LA) and the privacy curve of noisy Stochastic Gradient Descent (SGD), beyond nonexpansive iterations. Specifically, we derive new mixing time bounds for the projected LA which are, in some important cases, dimension-free and poly-logarithmic in the accuracy, closely matching the existing results in the smooth convex case. Additionally, we establish new upper bounds for the privacy curve of the subsampled noisy SGD algorithm. These bounds show a crucial dependence on the regularity of gradients, and are useful for a wide range of convex losses beyond the smooth case. Our analysis relies on a suitable extension of the Privacy Amplification by Iteration (PABI) framework (Feldman et al., 2018; Altschuler and Talwar, 2022, 2023) to noisy iterations whose gradient map is not necessarily nonexpansive. This extension is achieved by designing an optimization problem which accounts for the best possible Rényi divergence bound obtained by an application of PABI, where the tractability of the problem is crucially related to the modulus of continuity of the associated gradient mapping. We show that, in several interesting cases – namely the nonsmooth convex, weakly smooth, and (strongly) dissipative cases – this optimization problem can be solved exactly and explicitly, yielding the tightest possible PABI-based bounds.


💡 Research Summary

The paper studies two intertwined problems: (i) the mixing time of the Projected Langevin Algorithm (PLA) and (ii) the differential‑privacy (DP) guarantees of noisy Stochastic Gradient Descent (SGD) when the underlying gradient map is not necessarily non‑expansive. The authors extend the Privacy Amplification by Iteration (PABI) framework—originally developed for smooth convex functions with Lipschitz gradients—to a much broader class of convex objectives by quantifying the regularity of the iteration map through a modulus of continuity φ.

A modulus of continuity φ is a non‑decreasing function satisfying ‖Φ(x)‑Φ(y)‖ ≤ φ(‖x‑y‖) for all x, y, where Φ denotes the deterministic part of the update (e.g., x‑η∇f(x)). The key technical contribution is to show that when φ has the specific square‑root form φ(δ)=√(c δ² + h) (with c, h ≥ 0), the otherwise non‑convex optimization problem that arises in PABI—optimizing a sequence of “shifts’’ that trade off Wasserstein distance against Rényi divergence—admits a unique closed‑form solution. This solution yields the tightest possible PABI‑based bound for a given φ.
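As a purely illustrative sketch (the function names below are our own, not the paper's), the square-root modulus class can be written in a few lines of Python, with `c` and `h` the constants from the text; the two special cases shown preview the settings listed next:

```python
import math

def sqrt_modulus(c, h):
    """Modulus of continuity phi(delta) = sqrt(c * delta**2 + h), with c, h >= 0."""
    def phi(delta):
        return math.sqrt(c * delta ** 2 + h)
    return phi

# Linear special case (h = 0): phi(delta) = sqrt(c) * delta, i.e. the
# update map Phi is sqrt(c)-Lipschitz (nonexpansive iff c <= 1).
phi_lin = sqrt_modulus(0.81, 0.0)   # c = L**2 with L = 0.9
print(phi_lin(2.0))                 # ~1.8

# Offset case (h > 0): phi(0) = sqrt(h) > 0, so the underlying map Phi
# need not be continuous (the nonsmooth convex setting).
phi_off = sqrt_modulus(1.0, 0.04)
print(phi_off(0.0))                 # ~0.2
```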

The paper identifies four important settings where φ takes this form:

  1. Convex potentials with L‑Lipschitz gradients (smooth case): φ(δ)=L δ (c=L², h=0).
  2. (p, M)‑weakly smooth potentials (p‑Hölder‑continuous gradients): the raw bound φ(δ)=M δᵖ is not itself of square‑root form, but it can be dominated by √(c δ² + h) with suitable constants c, h > 0 (e.g. via Young's inequality).
  3. Strongly dissipative + β‑smooth potentials: φ(δ)=√((1‑2ηκ+η²β²) δ² + η²σ²), where η is the step size, κ and β are the dissipativity and smoothness constants, and σ² is the additive slack in the dissipativity condition.
  4. Convex‑Lipschitz but non‑differentiable potentials: here h>0, so φ(0)=√h>0, reflecting the possible discontinuity of the subgradient map.

For each case the authors compute the constants c and h, plug them into Theorem 3.4, and obtain explicit Rényi‑divergence bounds for the last iterate of PLA when started from two different initial points. Table 1 in the paper summarises these bounds, which to the best of the authors’ knowledge are new.
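To build intuition for the shift optimization behind these bounds (a simplified sketch under a linear modulus φ(δ)=λδ, not a restatement of the paper's Theorem 3.4): PABI propagates a residual s_t = λ s_{t−1} − a_t from s_0 = z (the initial ∞‑Wasserstein distance) down to s_T = 0, and the resulting Rényi divergence bound is proportional to the minimal Σ a_t², which Cauchy–Schwarz gives in closed form with geometrically weighted shifts:

```python
def min_shift_energy(lam, z, T):
    """Minimal sum of squared shifts sum_t a_t**2 subject to the residual
    recursion s_t = lam * s_{t-1} - a_t with s_0 = z and s_T = 0.
    Cauchy-Schwarz places the optimum at a_t proportional to lam**(T - t),
    giving lam**(2T) * z**2 / sum_{k=0}^{T-1} lam**(2k)."""
    weight = sum(lam ** (2 * k) for k in range(T))
    return (lam ** (2 * T)) * z ** 2 / weight

# Nonexpansive case (lam = 1): recovers the classic z**2 / T PABI rate.
print(min_shift_energy(1.0, 1.0, 10))   # 0.1

# Contractive case (lam < 1): the quantity decays geometrically in T.
print(min_shift_energy(0.9, 1.0, 10) > min_shift_energy(0.9, 1.0, 20))   # True
```

With λ = 1 this recovers the familiar z²/T rate for nonexpansive maps; with λ < 1 the geometric decay in T is the mechanism behind the logarithmic mixing times discussed below.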

Mixing‑time results. By iteratively applying the optimized PABI bound, the authors convert a bound on the ∞‑Wasserstein distance between two trajectories into a bound on the Rényi divergence of the final states. For convex‑Lipschitz and (p, M)‑weakly smooth potentials they prove that, for a step size η satisfying a mild polylogarithmic condition in the diameter D of the feasible set, the total‑variation mixing time satisfies

 T_mix,TV(ε) ≤ ⌈log₂(1/ε)⌉·(D²/η)·Θ(p,M),

where Θ(p,M) grows polynomially in M and depends on p in a way that interpolates between the nonsmooth (p=0) and smooth (p=1) regimes. This bound is dimension‑free and only polylogarithmic in the desired accuracy.

For strongly dissipative + β‑smooth potentials, assuming the contraction factor c = 1‑2ηκ+η²β² < 1, they obtain a logarithmic mixing time

 T_mix,TV(ε) = O(log(1/ε)).
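The logarithmic rate follows from a back-of-the-envelope calculation (illustrative constants only, not the paper's): once the optimized bound decays like a constant times c^T for a contraction factor c < 1, the first T driving it below ε is O(log(1/ε)):

```python
import math

def steps_to_accuracy(c, start, eps):
    """Smallest T with start * c**T <= eps for a contraction factor c < 1,
    i.e. T = ceil(log(start / eps) / log(1 / c)) -- hence O(log(1/eps))."""
    return max(0, math.ceil(math.log(start / eps) / math.log(1.0 / c)))

print(steps_to_accuracy(0.9, 1.0, 1e-3))   # 66
print(steps_to_accuracy(0.9, 1.0, 1e-6))   # 132: doubling the digits doubles T
```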

