Enhancing Membership Inference Attacks on Diffusion Models from a Frequency-Domain Perspective
Diffusion models have achieved tremendous success in image generation, but they also raise significant concerns regarding privacy and copyright. Membership Inference Attacks (MIAs) are designed to ascertain whether specific data were utilized during a model’s training phase. As current MIAs for diffusion models typically exploit the model’s image prediction ability, we formalize them into a unified general paradigm that computes a membership score for membership identification. Under this paradigm, we empirically find that existing attacks overlook an inherent deficiency in how diffusion models process high-frequency information. As a consequence, member data with more high-frequency content tend to be misclassified as hold-out data, while hold-out data with less high-frequency content tend to be misclassified as member data. Moreover, we theoretically demonstrate that this deficiency reduces the membership advantage of attacks, thereby interfering with the effective discrimination of member data and hold-out data. Based on this insight, we propose a plug-and-play high-frequency filter module to mitigate the adverse effects of the deficiency, which can be seamlessly integrated into any attack within this general paradigm without additional time costs. Extensive experiments corroborate that this module significantly improves the performance of baseline attacks across different datasets and models.
💡 Research Summary
Diffusion models have become the de facto standard for high‑quality image and video synthesis, yet their strong memorization of training data raises serious privacy and copyright concerns. Membership inference attacks (MIAs) aim to determine whether a specific sample was part of a model’s training set, and recent work has adapted error‑based attacks—such as Naive, SecMI, and PIA—to the diffusion setting by measuring reconstruction errors between a target image and the model’s intermediate predictions.
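The error-based paradigm the paper formalizes can be sketched in a few lines: score a sample by its reconstruction error against the model's prediction and compare against a threshold, with lower error suggesting the sample was seen in training. The function names and the plain L2 metric below are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def membership_score(x, x_pred):
    """Membership score as reconstruction error (squared L2 distance).

    x, x_pred: arrays of the same shape -- the target image and the
    model's prediction of it at some diffusion timestep.
    Lower scores indicate the sample is more likely a training member.
    """
    return float(np.sum((x - x_pred) ** 2))

def infer_membership(score, tau):
    """Threshold-based decision: flag as member if the error is below tau."""
    return score < tau
```

Concrete attacks in this family differ mainly in how `x_pred` is obtained (e.g., a single denoising step in Naive, deterministic reverse-then-forward steps in SecMI) and in the distance metric used.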
The authors observe that diffusion models process images in a frequency‑hierarchical manner: low‑frequency structures are denoised first, while high‑frequency details are added later and exhibit considerably more variance. This “low‑to‑high frequency” generation pattern creates a systematic bias in existing MIAs. By converting images to the Fourier domain and quantifying the proportion of high‑frequency energy, they show that samples with higher high‑frequency content receive larger reconstruction errors (higher membership scores) and are therefore more likely to be classified as non‑members, even when they are genuine training examples. Conversely, low‑frequency hold‑out samples tend to be mis‑identified as members. The authors term this phenomenon “high‑frequency deficiency.”
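The quantification step described above — converting an image to the Fourier domain and measuring the proportion of high-frequency energy — can be sketched as follows. The radius-based split of the centered spectrum is one natural reading of the procedure; the function name and exact energy definition are assumptions for illustration.

```python
import numpy as np

def high_freq_energy_ratio(img, r):
    """Fraction of spectral energy lying outside a radius-r low-frequency disk.

    img: 2-D grayscale array; r: cutoff radius around the center of the
    shifted (DC-centered) 2-D Fourier spectrum.
    """
    F = np.fft.fftshift(np.fft.fft2(img))      # center the DC component
    energy = np.abs(F) ** 2                    # per-frequency energy
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low = energy[dist <= r].sum()              # energy inside the disk
    return float(1.0 - low / energy.sum())
```

A constant image concentrates all energy at the DC component and thus scores near zero; images rich in fine texture score higher, which is exactly the population the authors find is biased toward "non-member" predictions.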
Through a theoretical analysis, they demonstrate that high‑frequency deficiency reduces the membership advantage—the statistical distance between the score distributions of members and non‑members—thereby weakening the power of any threshold‑based attack. To mitigate this, they propose a plug‑and‑play high‑frequency filter module. The module performs a 2‑D discrete Fourier transform on both the original image and the model’s predicted image, masks out frequencies beyond a preset radius (r = 5 for large‑scale datasets, r = 2 for smaller ones), and applies the inverse transform to obtain low‑frequency‑only reconstructions. The attack then computes its distance metric (e.g., L2, L1) on these filtered images.
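The filter module described above — forward 2-D DFT, a circular mask of preset radius r, inverse DFT, then the attack's usual distance on the filtered images — can be sketched as below. This is a minimal reconstruction from the paper's description, not the authors' released code; the helper names are hypothetical.

```python
import numpy as np

def low_pass(img, r):
    """Keep only frequencies within radius r of the centered 2-D spectrum."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= r ** 2
    # Zero out high frequencies, undo the shift, and return to pixel space.
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

def filtered_score(x, x_pred, r):
    """Membership score (squared L2) on low-frequency-only reconstructions."""
    return float(np.sum((low_pass(x, r) - low_pass(x_pred, r)) ** 2))
```

Because both the image and the prediction pass through the same mask, the score no longer penalizes samples merely for containing high-frequency detail that the model reconstructs poorly for members and non-members alike.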
Because the filtering step consists only of FFT and simple masking, it adds negligible computational overhead and can be inserted into any existing error‑based attack without modifying the underlying model or attack logic. Experiments across multiple datasets (MS‑COCO, Flickr, CIFAR‑100, Tiny‑ImageNet) and diffusion architectures (DDPM, DDIM, Stable Diffusion) confirm substantial gains. For example, SecMI’s AUC improves from 0.78 to 0.86 on COCO, and the true‑positive rate at 1 % false‑positive rate rises from 0.32 to 0.48. Similar improvements are observed for Naive and PIA attacks, with average ASR increases of over 10 % in high‑resolution and text‑to‑image scenarios.
The paper’s contributions are threefold: (1) it is the first to systematically analyze the impact of frequency‑domain information on MIAs targeting diffusion models, formalizing a general error‑based attack paradigm and exposing the high‑frequency bias; (2) it introduces a theoretically justified, lightweight high‑frequency filter that restores the membership advantage and can be universally applied to any error‑based attack; (3) it provides extensive empirical evidence that the filter markedly boosts attack performance across diverse data and model settings.
Beyond improving attack efficacy, the work highlights a new avenue for privacy‑preserving defenses: by deliberately altering a model’s handling of high‑frequency components (e.g., through training regularization or post‑processing), one could reduce the leakage of membership information. Future research may explore adaptive, data‑dependent frequency masks or learnable filters that balance privacy with visual fidelity. Overall, the study deepens our understanding of diffusion models’ spectral behavior and offers a practical tool for more accurate privacy auditing.