Multi-user Pufferfish Privacy
This paper studies how to achieve individual indistinguishability via pufferfish privacy for aggregated queries in a multi-user system. Each user is assumed to report the realization of a random variable. We study how to calibrate the Laplace noise added to the query answer to attain pufferfish privacy when a user changes his/her reported data value, leaves the system, or is replaced by another user with different randomness. Sufficient conditions for attaining statistical indistinguishability are derived for all scenarios on four sets of secret pairs, using the existing Kantorovich method (the Wasserstein metric of order $1$). These results can be applied to attain indistinguishability when a certain class of users is added to or removed from tabular data. It is revealed that attaining indistinguishability of an individual's data depends only on the statistics of that user. For binary (Bernoulli distributed) random variables, the derived sufficient conditions can be further relaxed to reduce the noise and improve data utility.
💡 Research Summary
The paper addresses the problem of achieving individual indistinguishability in a multi‑user setting where each participant reports a random variable rather than a deterministic value. Building on the Pufferfish privacy framework, the authors focus on calibrating Laplace noise added to an aggregated query so that the resulting mechanism satisfies (ε, S)‑Pufferfish privacy for a set S of secret pairs. The key technical tool is the Kantorovich (W₁) Wasserstein distance, which replaces the previously used ∞‑Wasserstein metric that suffers from non‑convex optimization difficulties.
The authors first review Pufferfish privacy, defining a secret S as a random variable that may be correlated with the public data X. A mechanism Y = X + N_θ, where N_θ is zero‑mean Laplace noise with scale θ, is said to be (ε, S)‑private if for every secret pair (s_i, s_j)∈S and every possible output y, the likelihood ratio satisfies e^{‑ε} ≤ P(Y = y | S = s_i)/P(Y = y | S = s_j) ≤ e^{ε}. By convolving the conditional distributions of X with the Laplace density, the authors derive a sufficient condition based on the Kantorovich optimal transport plan π* between the two conditional distributions. Specifically, if
θ ≥ (1/ε) · max_{(s_i,s_j)∈S} sup_{(x,x′)∈supp(π*)}|x − x′|,
then the mechanism attains the desired privacy guarantee. The optimal coupling π* can be computed directly from the cumulative distribution functions of the two conditional distributions, avoiding the need for expensive linear programming.
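The CDF-based construction of π* can be sketched numerically. The snippet below is an illustrative sketch, not the paper's code: the function name, the discrete-support setting, and the example distributions are all assumptions. It computes sup |x − x′| over the support of the comonotone (quantile-matching) coupling, which is the W₁-optimal coupling for distributions on the real line, and from it a sufficient Laplace scale θ for one secret pair:

```python
import numpy as np

def max_coupling_displacement(values, p, q, grid=100_000):
    """sup |x - x'| over the support of the comonotone (CDF-based)
    coupling pi*, which is W1-optimal for distributions on the line.
    Evaluated on a quantile grid; exact for discrete supports once the
    grid is finer than the smallest probability mass."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    Fp = np.cumsum(np.asarray(p, dtype=float)[order])
    Fq = np.cumsum(np.asarray(q, dtype=float)[order])
    Fp[-1] = Fq[-1] = 1.0                      # guard against rounding drift
    u = (np.arange(grid) + 0.5) / grid         # interior quantile levels
    qp = v[np.searchsorted(Fp, u)]             # quantile function of p
    qq = v[np.searchsorted(Fq, u)]             # quantile function of q
    return float(np.max(np.abs(qp - qq)))

# Illustrative conditional distributions of X given the secret pair (s_i, s_j):
values = [0.0, 1.0, 2.0]
p_i = [0.5, 0.3, 0.2]
p_j = [0.2, 0.3, 0.5]
eps = 1.0
theta = max_coupling_displacement(values, p_i, p_j) / eps  # sufficient scale
```

Note that θ here covers this one secret pair only; the paper's condition takes the maximum over all pairs in S before adding the Laplace noise N_θ.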
Four classes of secret pairs are considered: (1) a single user changes his/her reported value from v to v′; (2) a user’s presence versus absence (modeled as reporting zero); (3) a user leaves the system and is replaced by another user with a different distribution; (4) a combination of the above. For each class the authors express the required Wasserstein distance in terms of simple statistics of the involved random variables (e.g., absolute difference of means, differences in cumulative distribution functions). In the special case where the random variables are binary (Bernoulli), the distance reduces to |p_i − p_j|, allowing a relaxed bound on θ that is substantially smaller than the generic bound.
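The Bernoulli collapse to |p_i − p_j| can be checked with a few lines; this is a minimal numerical sketch (the support {0, 1} and the probabilities 0.7, 0.4 are illustrative, not from the paper):

```python
import numpy as np

def w1_discrete(values, p, q):
    """Kantorovich W1 distance on a shared sorted support: the integral
    of the CDF gap |F_p - F_q| between consecutive support points."""
    v = np.asarray(values, dtype=float)
    gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]
    return float(np.sum(gap * np.diff(v)))

# X_i ~ Bernoulli(0.7), X_j ~ Bernoulli(0.4): the distance is |p_i - p_j|.
p_i, p_j = 0.7, 0.4
d = w1_discrete([0, 1], [1 - p_i, p_i], [1 - p_j, p_j])
# d equals |p_i - p_j| = 0.3 up to floating point
```

The single CDF gap on a unit-length support is exactly |p_i − p_j|, which is why the binary case admits a noise scale that can be much smaller than one driven by the full support width.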
The theoretical results are validated on three real‑world datasets: a medical record set where each patient’s diagnosis is modeled as a Bernoulli variable, a location‑based service log with continuous measurements, and a social‑network activity count dataset. For each dataset the authors simulate three scenarios: (a) a user changes his/her value, (b) a user appears or disappears, and (c) a user is swapped with another user having a different distribution. They compare the proposed θ (derived from the W₁ bound) with naïve choices (e.g., using global sensitivity). The experiments show that the proposed calibration yields the smallest mean absolute error while still satisfying the ε‑privacy constraint. Moreover, the presence probability ζ_i of each user, although part of the system model, does not appear in the sufficient conditions; empirical tests confirm that varying ζ_i from 0.2 to 0.9 does not affect privacy guarantees, underscoring the “user‑independence” property of the derived conditions.
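The utility gap between the W₁-calibrated scale and a global-sensitivity baseline can be illustrated with a toy simulation. All numbers below are synthetic stand-ins, not the paper's datasets; the scenario sketched is a single user flipping a bounded report:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1.0

# One secret pair: user k reports v or v', with the domain bounded by [0, 10].
v, v_prime = 3.0, 7.0
theta_w1 = abs(v - v_prime) / eps   # bound driven by user k's statistics only
theta_global = 10.0 / eps           # naive scale from the full domain range

# Mean absolute error of the noisy answer: E|Laplace(0, theta)| = theta.
trials = 100_000
mae_w1 = np.mean(np.abs(rng.laplace(0.0, theta_w1, trials)))
mae_global = np.mean(np.abs(rng.laplace(0.0, theta_global, trials)))
# mae_w1 sits well below mae_global while both scales satisfy eps-privacy
# for this secret pair
```

Since the expected absolute value of Laplace noise equals its scale, any reduction in θ translates directly into a proportional reduction in mean absolute error, which is the effect the experiments report.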
The main contributions of the paper are:
- A general framework for calibrating Laplace noise in multi‑user aggregation under Pufferfish privacy, relying only on the statistics of the targeted user(s).
- Explicit sufficient conditions for four natural secret‑pair families, expressed via the Kantorovich Wasserstein‑1 distance.
- A relaxation for binary random variables that reduces the required noise magnitude, improving utility.
- Empirical validation on real datasets demonstrating that the method achieves strong privacy with minimal utility loss, and that the conditions are robust to varying attendance probabilities.
Overall, the work extends differential privacy techniques to settings where data are intrinsically random, showing that strong, user‑level indistinguishability can be guaranteed without sacrificing utility, and that the analysis can be performed efficiently using the W₁ metric.