CLOAK: Contrastive Guidance for Latent Diffusion-Based Data Obfuscation
Data obfuscation is a promising technique for mitigating attribute inference attacks by semi-trusted parties with access to time-series data emitted by sensors. Recent advances leverage conditional generative models together with adversarial training or mutual information-based regularization to balance data privacy and utility. However, these methods often require modifying the downstream task, struggle to achieve a satisfactory privacy-utility trade-off, or are computationally intensive, making them impractical for deployment on resource-constrained mobile IoT devices. We propose Cloak, a novel data obfuscation framework based on latent diffusion models. In contrast to prior work, we employ contrastive learning to extract disentangled representations, which guide the latent diffusion process to retain useful information while concealing private information. This approach enables users with diverse privacy needs to navigate the privacy-utility trade-off with minimal retraining. Extensive experiments on four public time-series datasets, spanning multiple sensing modalities, and a dataset of facial images demonstrate that Cloak consistently outperforms state-of-the-art obfuscation techniques and is well-suited for deployment in resource-constrained settings.
💡 Research Summary
The paper introduces CLOAK, a novel data‑obfuscation framework that leverages latent diffusion models (LDMs) together with contrastive learning to protect private attributes in sensor‑generated time‑series and facial image data. Existing approaches—GAN‑based adversarial obfuscation, mutual‑information regularization, or differential privacy—either require costly retraining when privacy requirements change, suffer from unstable training, or are computationally prohibitive for resource‑constrained IoT devices. CLOAK addresses these limitations through three key innovations.
First, a variational auto‑encoder (VAE) maps raw data x into a low‑dimensional latent space z. The diffusion process operates on z rather than the high‑dimensional raw signal, drastically reducing the number of neural‑network operations required during sampling. This makes the method suitable for mobile CPUs and low‑power microcontrollers.
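To make the latent-space idea concrete, here is a minimal sketch of the encode-then-sample step using a toy linear encoder in numpy. The input size (a flattened 300‑dimensional sensor window) and the latent size of 16 are illustrative assumptions, not the paper's actual architecture or dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w_mu, w_logvar):
    # Toy linear VAE encoder: predict latent mean and log-variance.
    return x @ w_mu, x @ w_logvar

def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps, the reparameterization trick.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# One 3-axis accelerometer window at 100 Hz, flattened to 300 dims (illustrative sizes).
x = rng.standard_normal((1, 300))
latent_dim = 16  # hypothetical; the paper's latent size may differ
w_mu = rng.standard_normal((300, latent_dim)) * 0.01
w_logvar = rng.standard_normal((300, latent_dim)) * 0.01

mu, logvar = encode(x, w_mu, w_logvar)
z = reparameterize(mu, logvar, rng)
print(z.shape)  # (1, 16): each diffusion step now denoises 16 values instead of 300
```

The compute saving is exactly this dimensionality gap: every denoising step of the LDM operates on z, so the per-step cost scales with the latent size rather than the raw window length.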
Second, the authors train a contrastive encoder that produces disentangled embeddings for public attributes (the information that downstream applications need, e.g., activity label) and private attributes (user‑specified sensitive information, e.g., age, health condition). By using an InfoNCE‑style contrastive loss, the encoder learns representations in which the two attribute spaces are orthogonal, mitigating the entanglement problem that plagues many privacy‑preserving schemes.
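A minimal sketch of the InfoNCE objective the summary refers to: each anchor embedding is scored against a batch of candidates, where column i is the positive for row i and every other column acts as a negative. This is the generic contrastive loss; the paper's exact formulation (temperature, augmentations, and how public/private views are paired) may differ:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: row i's positive is column i; all other columns are negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                      # (B, B) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # -log p(positive | anchor)

rng = np.random.default_rng(1)
emb = rng.standard_normal((8, 32))
aligned = info_nce(emb, emb + 0.01 * rng.standard_normal((8, 32)))
shuffled = info_nce(emb, rng.standard_normal((8, 32)))
print(aligned < shuffled)  # True: matched pairs incur a much lower loss
```

Minimizing this loss pulls embeddings of samples sharing a public attribute together while pushing others apart, which is what lets the encoder carve out an attribute-specific subspace.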
Third, the paper proposes two guidance mechanisms that are applied during the reverse diffusion (sampling) step:
- Contrastive Classifier‑Free Guidance (CCFG) – the public‑attribute embedding is injected as a directional bias without requiring an explicit classifier. This “classifier‑free” signal steers the diffusion trajectory toward samples that retain the desired public information, preserving utility.
- Negated Classifier Guidance (NCG) – a conventional classifier trained on the private attribute is used, but its logits are negated before being added to the diffusion update. Consequently, the diffusion process actively suppresses private‑attribute information, driving the posterior toward random‑guess performance for the attacker.
Both guidance terms are weighted by user‑controllable scalars λ_pub and λ_priv. Adjusting these scalars lets a user trade privacy for utility without retraining the entire model; only the guidance weights need to be changed at inference time.
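The two guidance terms above can be sketched as a single adjustment to the noise prediction at each reverse-diffusion step. The sign conventions and the omission of timestep-dependent scaling (e.g., by √(1−ᾱ_t)) are simplifying assumptions here; the paper's exact update rule may differ:

```python
import numpy as np

def guided_noise(eps_uncond, eps_pub, grad_priv, lam_pub, lam_priv):
    """Combine CCFG and NCG into one denoising direction (signs assumed).

    eps_uncond : unconditional noise prediction from the LDM
    eps_pub    : noise prediction conditioned on the public-attribute embedding
    grad_priv  : gradient of the private classifier's log-likelihood w.r.t. z_t
    """
    # CCFG: classifier-free guidance toward the public attribute.
    eps = eps_uncond + lam_pub * (eps_pub - eps_uncond)
    # NCG: ordinary classifier guidance would SUBTRACT this gradient term to
    # steer toward the private label; negating it steers the sample away.
    eps = eps + lam_priv * grad_priv
    return eps

rng = np.random.default_rng(2)
e_u, e_p, g = (rng.standard_normal(16) for _ in range(3))
# lam_pub = 1, lam_priv = 0 recovers plain public-conditional sampling.
print(np.allclose(guided_noise(e_u, e_p, g, 1.0, 0.0), e_p))  # True
```

Because λ_pub and λ_priv enter only at this sampling step, a user can dial either term up or down at inference time without touching the trained networks, which is the retraining-free trade-off mechanism the summary describes.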
The training pipeline proceeds as follows: (1) pre‑train the VAE on the raw dataset; (2) train the LDM on the latent codes z using the standard denoising‑diffusion loss; (3) train the contrastive encoder jointly with a small private‑attribute classifier; (4) fine‑tune the LDM with the combined CCFG and NCG losses. Importantly, the method does not require knowledge of the downstream public‑task model or the attacker’s model, making it broadly applicable.
Experiments are conducted on four public time‑series datasets covering motion, health, and environmental sensing (e.g., PAMAP2, UCI HAR, MHEALTH, WESAD) and on a facial‑image dataset. Baselines include several GAN‑based obfuscators (PrivGAN, MaSS, Olympus), the diffusion‑based PrivDiffuser, and representation‑learning methods (Privacy‑Adversarial Network, TIPRDC). Evaluation metrics are: (i) public‑attribute accuracy (utility), (ii) private‑attribute accuracy (privacy loss), (iii) inference latency, and (iv) memory footprint.
Results show that CLOAK consistently reduces private‑attribute accuracy to near‑random levels (≈50 % for binary attributes) while maintaining or improving public‑attribute accuracy by 5–10 % relative to the best prior method. Sampling time is cut by roughly 30–40 % compared with vanilla diffusion because the process runs in latent space and uses fewer diffusion steps. On a Snapdragon 778G mobile processor, end‑to‑end obfuscation (VAE encoding + diffusion sampling) completes in under 100 ms, well within real‑time constraints. Ablation studies confirm that contrastive learning yields higher disentanglement scores than mutual‑information regularization, and that the combination of CCFG and NCG is essential for achieving the reported privacy‑utility balance.
In summary, CLOAK contributes (1) a lightweight VAE‑LDM pipeline suitable for edge devices, (2) a contrastive‑learning‑driven guidance scheme that separates public and private information without adversarial training, and (3) a flexible, retraining‑free mechanism for users to adjust privacy preferences on‑the‑fly. The authors suggest future work on multimodal extensions, dynamic privacy policies, and integration with large pre‑trained diffusion models for broader applicability.