DP-λCGD: Efficient Noise Correlation for Differentially Private Model Training
Differentially private stochastic gradient descent (DP-SGD) is the gold standard for training machine learning models with formal differential privacy guarantees. Several recent extensions improve its accuracy by introducing correlated noise across training iterations. Matrix factorization mechanisms are a prominent example, but they correlate noise across many iterations and require storing previously added noise vectors, leading to substantial memory overhead in some settings. In this work, we propose a new noise correlation strategy that correlates noise only with the immediately preceding iteration and cancels a controlled portion of it. Our method relies on noise regeneration using a pseudorandom noise generator, eliminating the need to store past noise. As a result, it requires no additional memory beyond standard DP-SGD. We show that the computational overhead is minimal and empirically demonstrate improved accuracy over DP-SGD.
💡 Research Summary
Differentially private stochastic gradient descent (DP‑SGD) is the de‑facto method for training machine learning models with formal (ε,δ)‑differential privacy guarantees. Recent work has shown that correlating the Gaussian noise added at each iteration can reduce the cumulative noise variance and thus improve model utility. The most prominent class of such techniques is matrix‑factorization mechanisms, which design a lower‑triangular strategy matrix C (or its inverse C⁻¹) to induce a specific linear correlation among noise vectors. While powerful, these mechanisms require storing all previously added noise vectors because the correlation spans many past iterations. For modern deep networks with millions of parameters, this storage can consume tens of gigabytes of memory, making the approach impractical in many settings.
The paper introduces DP‑λCGD, a novel noise‑correlation scheme that eliminates any additional memory overhead while preserving the utility benefits of correlated noise. The key idea is to restrict the correlation to only the immediately preceding iteration. Concretely, the authors define a Toeplitz lower‑triangular strategy matrix C_λ whose entries are (C_λ)_{ij}=λ^{i‑j} for i≥j and zero otherwise. Its inverse C_λ⁻¹ has a particularly simple structure: ones on the diagonal, –λ on the first sub‑diagonal, and zeros elsewhere. Applying C_λ⁻¹ to a noise matrix Z corresponds to the update rule
ẑ_t = z_t – λ·z_{t‑1},
where z_t is freshly sampled Gaussian noise and z_{t‑1} is the noise from the previous step. This operation can be performed without storing any past noise vectors because the noise is generated by a pseudorandom number generator (PRNG). By saving the PRNG state at each iteration, the algorithm can roll back to the previous state, deterministically regenerate z_{t‑1}, apply the cancellation, and then advance the PRNG to produce a new z_t. The only extra computation is the regeneration of one noise vector per iteration, which the authors empirically show to be negligible (≈1 % of total training time).
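The rollback-and-regenerate loop can be sketched in a few lines of numpy. This is an illustrative implementation of the idea described above, not the paper's code: the function name and parameters are our own, and numpy's settable `bit_generator.state` stands in for whatever PRNG the authors use.

```python
import numpy as np

def correlated_noise_stream(sigma, dim, steps, lam, seed=0):
    """Yield hat_z_t = z_t - lam * z_{t-1}, storing only a PRNG state.

    Illustrative sketch of the PRNG-rollback idea: instead of keeping the
    previous noise vector z_{t-1} in memory, keep the (constant-size) PRNG
    state that produced it and regenerate z_{t-1} on demand.
    """
    rng = np.random.default_rng(seed)
    state_before_prev = None  # PRNG state just before z_{t-1} was drawn
    for t in range(steps):
        if t == 0:
            state_before_prev = rng.bit_generator.state
            yield sigma * rng.standard_normal(dim)  # no predecessor at t = 0
        else:
            rng.bit_generator.state = state_before_prev   # roll back
            z_prev = sigma * rng.standard_normal(dim)     # regenerate z_{t-1}
            state_before_prev = rng.bit_generator.state   # save state before z_t
            z_t = sigma * rng.standard_normal(dim)        # fresh z_t
            yield z_t - lam * z_prev
```

Note that the saved PRNG state is a small constant-size object, independent of the model dimension, which is exactly why the scheme adds no memory overhead beyond standard DP‑SGD; the price is one extra noise-vector generation per step.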
Privacy analysis follows the standard DP‑SGD framework. The sensitivity of the transformed gradients depends on the chosen λ through the matrix C_λ, and the required Gaussian noise multiplier σ is computed using a “Balls‑in‑Bins” subsampling scheme together with its dedicated privacy accountant (BnB). This accounts for the privacy‑amplification effect of random batch allocation, allowing the same (ε,δ) guarantee with a smaller σ than naïve DP‑SGD.
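The dependence of the sensitivity on λ can be made concrete with a small numpy sketch. The function name and the single-participation convention (sensitivity = maximum column ℓ₂ norm of C) are our assumptions for illustration, following the standard matrix-mechanism analysis rather than the paper's Balls‑in‑Bins accounting:

```python
import numpy as np

def strategy_matrix(lam: float, n: int) -> np.ndarray:
    """Toeplitz lower-triangular C_lambda: (C_lambda)_{ij} = lam**(i-j) for i >= j."""
    i, j = np.indices((n, n))
    return np.where(i >= j, lam ** np.maximum(i - j, 0), 0.0)

lam, n = 0.5, 6
C = strategy_matrix(lam, n)

# The inverse is bidiagonal: ones on the diagonal, -lam on the first sub-diagonal.
assert np.allclose(np.linalg.inv(C), np.eye(n) - lam * np.eye(n, k=-1))

# Max column l2-norm of C_lambda, attained by the first column; under the usual
# single-participation matrix-mechanism analysis this is the l2 sensitivity
# that scales the Gaussian noise multiplier. (Illustrative convention only.)
sens = np.linalg.norm(C, axis=0).max()
assert np.isclose(sens, np.sqrt((1 - lam ** (2 * n)) / (1 - lam ** 2)))
```

The closed form √((1−λ²ⁿ)/(1−λ²)) makes the trade-off visible: larger λ cancels more of the previous step's noise but inflates the sensitivity, which is why λ must be optimized rather than set to 1.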
The authors provide theoretical bounds for two common error metrics used to evaluate matrix‑factorization mechanisms: Root Mean Squared Error (RMSE) and Maximum Squared Error (MaxSE). Lemma 2 shows that, after optimizing λ∈