A Refreshment Stirred, Not Shaken: Invariant-Preserving Deployments of Differential Privacy for the U.S. Decennial Census
Protecting an individual’s privacy when releasing their data is inherently an exercise in relativity, regardless of how privacy is qualified or quantified. This is because we can only limit the gain in information about an individual relative to what could be derived from other sources. This framing is the essence of differential privacy (DP), through which this article examines two statistical disclosure control (SDC) methods for the United States Decennial Census: the Permutation Swapping Algorithm (PSA), which resembles the 2010 Census’s disclosure avoidance system (DAS), and the TopDown Algorithm (TDA), which was used in the 2020 DAS. To varying degrees, both methods leave unaltered certain statistics of the confidential data (their invariants) and hence neither can be readily reconciled with DP, at least as originally conceived. Nevertheless, we show how invariants can naturally be integrated into DP and use this to establish that the PSA satisfies pure DP subject to the invariants it necessarily induces, thereby proving that this traditional SDC method can, in fact, be understood from the perspective of DP. By a similar modification to zero-concentrated DP, we also provide a DP specification for the TDA. Finally, as a point of comparison, we consider a counterfactual scenario in which the PSA was adopted for the 2020 Census, resulting in a reduction in the nominal protection loss budget but at the cost of releasing many more invariants. This highlights the pervasive danger of comparing budgets without accounting for the other dimensions on which DP formulations vary (such as the invariants they permit). Therefore, while our results articulate the mathematical guarantees of SDC provided by the PSA, the TDA, and the 2020 DAS in general, care must be taken in translating these guarantees into actual privacy protection—just as is the case for any DP deployment.
💡 Research Summary
The paper revisits two statistical disclosure control (SDC) methods for the U.S. Decennial Census—the Permutation Swapping Algorithm (PSA), which resembles the swapping mechanism behind the 2010 Census, and the TopDown Algorithm (TDA), which was deployed for the 2020 Census—through the lens of differential privacy (DP). The authors begin by emphasizing that DP is fundamentally a relative notion of privacy: it limits the additional information an adversary can learn about an individual given whatever external knowledge already exists. In the census context, certain statistics (invariants) must be released unchanged for legal and operational reasons, and these invariants lie outside the scope of any protection mechanism. Standard DP definitions, which assume that all released outputs are subject to the privacy guarantee, therefore break down when invariants are present.
To resolve this, the authors introduce a “system of DP specifications” that explicitly incorporates invariants. The framework requires (1) a description of the full set of possible invariant values (including counterfactual possibilities) and (2) a DP guarantee that holds only after conditioning on those invariant values. In this way, invariants are treated as public knowledge, and the privacy budget (ε for pure DP, ρ for zero‑concentrated DP) applies solely to the remaining, mutable data.
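In schematic form, such an invariant-conditional guarantee can be written as follows (our rendering for illustration; the notation is not necessarily the paper's):

```latex
% Let C(x) denote the invariant statistics of dataset x.
% A mechanism M satisfies \varepsilon-DP subject to the invariants C if,
% for all neighboring datasets x, x' with C(x) = C(x'),
% and all measurable output sets S:
\Pr[M(x) \in S] \;\le\; e^{\varepsilon}\, \Pr[M(x') \in S].
% The DP inequality is imposed only within each equivalence class of
% datasets sharing the same invariant values; datasets with different
% invariants are, by design, distinguishable from the release.
```

The same conditioning device applies to zero-concentrated DP by restricting the Rényi-divergence bound to neighboring datasets with equal invariants.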
For the PSA, which randomly permutes geographic identifiers of a subset of households, the paper proves that the algorithm satisfies pure ε‑DP subject to the invariants it induces. In other words, once the invariant counts (e.g., state‑level population totals) are fixed, the swapping operation provides an ε‑DP guarantee for the non‑invariant portion of the data. The authors note that the actual 2010 swapping algorithm is not fully disclosed, but the PSA captures its essential design, allowing a meaningful DP characterization of the 2010 Census.
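The core operation can be illustrated with a toy sketch. The function below is hypothetical and deliberately simplified—the Census Bureau's actual swapping implementation is not public—but it shows the basic idea: among a randomly selected subset of households, geographic identifiers are permuted within groups that agree on the attributes being held invariant.

```python
import random

def toy_swap(households, match_key, swap_rate=0.1, rng=None):
    """Toy permutation swap (illustrative only, not the 2010 DAS):
    randomly permute geographic codes among selected households that
    share the same value of `match_key` (the matched, invariant
    attribute, e.g. household size)."""
    rng = rng or random.Random()
    out = [dict(h) for h in households]
    # Independently flag each household for swapping, then group the
    # flagged households by their invariant matching attribute.
    groups = {}
    for i, h in enumerate(out):
        if rng.random() < swap_rate:
            groups.setdefault(h[match_key], []).append(i)
    # Within each group, apply a uniformly random permutation of the
    # geographic identifiers; all non-geographic attributes stay put.
    for idxs in groups.values():
        geos = [out[i]["geo"] for i in idxs]
        rng.shuffle(geos)
        for i, g in zip(idxs, geos):
            out[i]["geo"] = g
    return out
```

Because swapping only permutes geographies within matched groups, any tabulation of the matched attribute by geography that pools over the whole population—along with all purely non-geographic statistics—is left invariant, which is exactly the structure the paper's conditional DP analysis exploits.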
The TDA is more complex: it first adds noise calibrated to zero‑concentrated DP (z‑CDP) to the full set of tabulations, then enforces the invariant statistics exactly through a constrained optimization (post‑processing) step. Prior work analyzed only the noise‑addition step. The authors model the whole pipeline as a single mechanism and define an invariant‑conditional z‑CDP specification. They show that, after conditioning on the invariants, the mechanism still satisfies z‑CDP with the original ρ parameter, meaning the TDA as a whole preserves the intended DP guarantee despite the invariant‑preserving post‑processing.
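A heavily simplified sketch of this two-phase structure on a single flat histogram (the real TDA uses the discrete Gaussian and a hierarchical constrained optimization over many geographic levels, neither of which is reproduced here):

```python
import random

def noisy_counts_with_invariant_total(counts, rho, rng=None):
    """Toy two-phase mechanism in the spirit of the TDA (illustrative,
    not the 2020 DAS): (1) add Gaussian noise with scale calibrated to
    rho-zCDP for sensitivity-1 counts; (2) post-process so the total
    matches the confidential total exactly, via the closed-form L2
    projection onto the sum constraint."""
    rng = rng or random.Random()
    # zCDP for the Gaussian mechanism: rho = 1 / (2 * sigma^2).
    sigma = (1.0 / (2.0 * rho)) ** 0.5
    noisy = [c + rng.gauss(0.0, sigma) for c in counts]
    # Restore the invariant: shift every cell equally so the cells sum
    # to the true total. As pure post-processing of the noisy output
    # plus the invariant, this consumes no additional budget.
    shift = (sum(counts) - sum(noisy)) / len(noisy)
    return [v + shift for v in noisy]
```

The projection step is deterministic given the noisy counts and the invariant, which is why the privacy analysis reduces to the noise-addition phase once the invariants are conditioned on.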
A key contribution is a counterfactual analysis in which the PSA is hypothetically applied to the 2020 Census. The authors demonstrate a trade‑off: the counterfactual PSA deployment would carry a lower nominal ε (superficially suggesting stronger protection) but would also release many more invariants, which substantially erodes practical privacy. Conversely, the TDA's smaller set of invariants yields stronger effective protection even with a larger nominal ε or ρ. This illustrates why comparing privacy budgets alone is misleading when invariants differ across implementations.
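A concrete instance of the comparability problem, even before invariants enter the picture: a z‑CDP budget ρ and a pure‑DP budget ε are not directly interchangeable. One standard conversion (due to Bun and Steinke, 2016) maps ρ‑zCDP to (ε, δ)‑DP via ε = ρ + 2√(ρ ln(1/δ)), so the "same" ρ corresponds to very different ε values depending on the chosen δ. The numbers below are illustrative, not the 2020 DAS's actual budget.

```python
import math

def zcdp_to_approx_dp(rho, delta):
    """Standard conversion from rho-zCDP to (epsilon, delta)-DP
    (Bun & Steinke 2016): epsilon = rho + 2*sqrt(rho * ln(1/delta))."""
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))

# The same rho yields different epsilons as delta varies — one reason
# raw budget comparisons across DP variants (let alone across invariant
# sets) can mislead.
```

Invariants add a further, harder-to-quantify dimension on top of this: two deployments with identical (ε, δ) can offer very different practical protection if one releases far more exact statistics than the other.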
Finally, the paper highlights a methodological gap: there are few tools for comparing DP specifications that differ along multiple dimensions (budget, invariant set, protection units). The authors call for research on multi‑dimensional DP metrics, systematic evaluation of how invariants affect disclosure risk, and the development of optimal trade‑offs between invariant count and privacy loss. Overall, the work provides a rigorous theoretical bridge between traditional SDC techniques and modern DP theory, clarifying the exact guarantees of both the PSA and the TDA and offering guidance for future census privacy designs.