Efficient privacy loss accounting for subsampling and random allocation
We consider the privacy amplification properties of a sampling scheme in which a user’s data is used in $k$ steps chosen randomly and uniformly from a sequence (or set) of $t$ steps. This sampling scheme has been recently applied in the context of differentially private optimization (Chua et al., 2024a; Choquette-Choo et al., 2025) and communication-efficient high-dimensional private aggregation (Asi et al., 2025), where it was shown to have utility advantages over the standard Poisson sampling. Theoretical analyses of this sampling scheme (Feldman & Shenfeld, 2025; Dong et al., 2025) lead to bounds that are close to those of Poisson sampling, yet still have two significant shortcomings. First, in many practical settings, the resulting privacy parameters are not tight due to the approximation steps in the analysis. Second, the computed parameters are either the hockey-stick or Rényi divergence, both of which introduce overheads when used in privacy loss accounting. In this work, we demonstrate that the privacy loss distribution (PLD) of random allocation applied to any differentially private algorithm can be computed efficiently. When applied to the Gaussian mechanism, our results demonstrate that the privacy-utility trade-off for random allocation is at least as good as that of Poisson subsampling. In particular, random allocation is better suited for training via DP-SGD. To support these computations, our work develops new tools for general privacy loss accounting based on a notion of PLD realization. This notion allows us to extend accurate privacy loss accounting to subsampling, which previously required manual, noise-mechanism-specific analysis.
💡 Research Summary
The paper addresses a gap in differential privacy (DP) accounting for a sampling scheme known as “k‑out‑of‑t random allocation,” where each user’s data is assigned to exactly k out of t steps uniformly at random. This scheme has recently been used in DP‑SGD training and communication‑efficient federated learning, showing empirical utility advantages over the standard Poisson (independent) subsampling. However, prior theoretical analyses only provided (ε, δ) or Rényi‑DP bounds that are loose due to approximation steps and that incur overhead when composed with other DP mechanisms.
The authors propose a fundamentally different approach: compute the privacy‑loss distribution (PLD) of random allocation directly. The PLD, the distribution of the log‑likelihood ratio ln(P(x)/Q(x)) for a pair of output distributions (P, Q) with x drawn from P, is the canonical representation used by modern DP accounting libraries because it composes losslessly and can be converted to any DP notion without additional loss.
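To make the PLD concrete, the following sketch computes it for the Gaussian mechanism (the running example of the paper). For the dominating pair P = N(1, σ²), Q = N(0, σ²), the loss L(x) = ln(P(x)/Q(x)) = (2x − 1)/(2σ²) is linear in x, and δ(ε) = E_{x∼P}[(1 − e^{ε−L(x)})₊]. The function names are illustrative, not from the paper's code; the analytic formula is the standard Gaussian-mechanism hockey-stick divergence.

```python
import math
import random

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_delta(eps, sigma):
    # Analytic hockey-stick divergence for P = N(1, sigma^2) vs Q = N(0, sigma^2)
    # (sensitivity 1): delta(eps) = Phi(1/(2s) - eps*s) - e^eps * Phi(-1/(2s) - eps*s).
    return (norm_cdf(0.5 / sigma - eps * sigma)
            - math.exp(eps) * norm_cdf(-0.5 / sigma - eps * sigma))

def mc_delta(eps, sigma, n=200_000, seed=0):
    # Monte Carlo check: delta(eps) = E_{x~P}[(1 - e^{eps - L(x)})_+]
    # with privacy loss L(x) = (2x - 1) / (2 sigma^2).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(1.0, sigma)
        loss = (2.0 * x - 1.0) / (2.0 * sigma * sigma)
        total += max(0.0, 1.0 - math.exp(eps - loss))
    return total / n

print(gaussian_delta(1.0, 1.0))  # ~0.1269
print(mc_delta(1.0, 1.0))        # should agree within Monte Carlo error
```

The same expectation-based conversion from a PLD to (ε, δ) is what makes the PLD representation lossless: any δ(ε) query is answered directly from the loss distribution.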
Key technical contributions:
- Exact PLD for 1‑out‑of‑t allocation – By constructing a dominating pair of distributions (Q_t, P̄_t) for the whole t‑step algorithm, the PLD reduces to the log of a ratio of mixture distributions. The authors introduce the “exp‑PLD” (the exponential of the PLD) and show that the exp‑PLD of the mixture can be expressed as a t‑wise convolution of the exp‑PLDs of the single‑step dominating pair.
- Efficient convolution via FFT – The t‑wise convolution is performed in logarithmic time using binary exponentiation (self‑convolution) and FFT. The algorithm discretizes the exp‑PLD multiplicatively, which preserves the property that the discretized object is itself a valid PLD realization. This avoids the huge dynamic range problems of additive discretization.
- Complexity guarantees – The overall runtime is O(log³ t·log(t/β)/α²), where α is the target additive error in ε and β is the allowed probability mass of unbounded loss (which translates into an increase in δ). The quadratic dependence on 1/α arises from the multiplicative grid needed to achieve the desired accuracy.
- Extension to general k – Using the reduction from Feldman & Shenfeld (2025), which shows that k‑out‑of‑t allocation is at least as private as k compositions of 1‑out‑of‑⌊t/k⌋ allocation, the same algorithm yields PLDs for arbitrary k with identical complexity.
- PLD Realization framework – The authors formalize the notion of a PLD realization: any approximate PLD must correspond to some pair of dominating distributions. Their discretization scheme guarantees this property, enabling direct conversion to (ε, δ) or Rényi‑DP without extra slack.
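The FFT-based self-convolution in the second bullet can be illustrated in miniature. Note the caveat: the paper discretizes the exp-PLD multiplicatively to preserve the PLD-realization property, whereas the sketch below uses a plain additive loss grid (as in existing PLD accountants) purely to show how exponentiation in the frequency domain composes a discretized PLD t times in one shot. All names are illustrative; the single-step Gaussian PLD is N(μ, 2μ) with μ = 1/(2σ²), so the t-fold composition should have mean t·μ.

```python
import numpy as np

def gaussian_pld_pmf(sigma, grid, h):
    # Discretized single-step Gaussian PLD: L ~ N(mu, 2*mu), mu = 1/(2 sigma^2),
    # represented as a pmf on an additive grid with spacing h.
    mu = 1.0 / (2.0 * sigma ** 2)
    pdf = np.exp(-(grid - mu) ** 2 / (4.0 * mu)) / np.sqrt(4.0 * np.pi * mu)
    pmf = pdf * h
    return pmf / pmf.sum()

def self_convolve(pmf, t):
    # t-fold convolution of a pmf via FFT: transform once, raise to the t-th
    # power in the frequency domain, transform back (exponentiation replaces
    # t-1 explicit convolutions).
    n = len(pmf)
    out_len = t * (n - 1) + 1
    size = 1
    while size < out_len:
        size *= 2
    f = np.fft.rfft(pmf, size)
    out = np.fft.irfft(f ** t, size)[:out_len]
    out = np.clip(out, 0.0, None)   # remove tiny negative FFT round-off
    return out / out.sum()

sigma, t, h = 1.0, 16, 0.01
grid = np.arange(-6.0, 6.0 + h, h)          # loss grid for one step
pmf_t = self_convolve(gaussian_pld_pmf(sigma, grid, h), t)
grid_t = t * grid[0] + h * np.arange(len(pmf_t))
mean_t = float(np.sum(grid_t * pmf_t))
print(mean_t)  # should be close to t / (2 sigma^2) = 8.0
```

Binary exponentiation of the pmf (repeated squaring of self-convolutions) achieves the same effect when one wants to re-truncate the grid between squarings, which is what keeps the paper's algorithm polylogarithmic in t.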
Empirical evaluation focuses on the Gaussian mechanism, the most common DP primitive. Across a wide range of t (10 to 10⁴) and noise levels σ, the proposed PLD‑based bounds are consistently tighter than prior (ε, δ) and Rényi‑DP bounds and never worse than those of Poisson subsampling. In regimes where Monte‑Carlo simulation of the exact PLD is feasible, the authors’ results match the simulation within statistical error, confirming the tightness of their approximation. For k > 1, the method closes the roughly 20% gap in ε observed in earlier analyses.
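A Monte Carlo check of the kind mentioned above is straightforward for 1-out-of-t allocation of the Gaussian mechanism: the dominating pair is Q (t i.i.d. N(0, σ²) coordinates) versus the uniform mixture P̄ that shifts one coordinate by the sensitivity, and the loss is the log of a ratio of mixtures, ln((1/t) Σᵢ exp((2xᵢ − 1)/(2σ²))). This is a hedged sketch (the function name is hypothetical), not the paper's estimator.

```python
import math
import random

def allocation_delta_mc(eps, sigma, t, n=100_000, seed=1):
    # Monte Carlo estimate of delta(eps) for 1-out-of-t random allocation of a
    # sensitivity-1 Gaussian step: sample x from the mixture P (one uniformly
    # chosen coordinate shifted by 1), evaluate the mixture log-likelihood
    # ratio against the product Q, and average the hockey-stick integrand.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.gauss(0.0, sigma) for _ in range(t)]
        x[rng.randrange(t)] += 1.0  # draw from the mixture P
        # loss = ln(P(x)/Q(x)) = ln((1/t) * sum_i exp((2 x_i - 1)/(2 sigma^2)))
        s = sum(math.exp((2.0 * xi - 1.0) / (2.0 * sigma * sigma)) for xi in x)
        loss = math.log(s / t)
        total += max(0.0, 1.0 - math.exp(eps - loss))
    return total / n

# t = 1 recovers the plain Gaussian mechanism; larger t should amplify privacy.
print(allocation_delta_mc(1.0, 1.0, 1))  # ~0.127 for sigma = 1, eps = 1
print(allocation_delta_mc(1.0, 1.0, 8))  # noticeably smaller
```

Such simulations are only feasible for moderate t and δ, which is exactly why the paper's deterministic PLD computation matters in practice.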
The paper also demonstrates how to implement Poisson subsampling directly on PLDs, showing that the new tools extend the existing PLD toolkit to cover both random allocation and traditional subsampling. An open‑source implementation is provided (Shenfeld, 2026) and integrates with major DP libraries (Google, Microsoft, Meta).
In summary, the work eliminates the need for manual, mechanism‑specific privacy analyses of random allocation. By delivering an efficient, provably accurate PLD computation, it enables tighter privacy accounting for DP‑SGD, DP‑FTRL, and communication‑constrained federated learning, thereby bridging the gap between practical implementations and rigorous DP guarantees.