Beauty and the Beast: Imperceptible Perturbations Against Diffusion-Based Face Swapping via Directional Attribute Editing
Diffusion-based face swapping achieves state-of-the-art performance, yet it also amplifies the potential harm of malicious face swapping, which can infringe on portrait rights or damage personal reputations. This has spurred the development of proactive defense methods. However, existing approaches face a core trade-off: large perturbations distort facial structures, while small ones weaken protection effectiveness. To address this, we propose FaceDefense, an enhanced proactive defense framework against diffusion-based face swapping. Our method introduces a new diffusion loss to strengthen the defensive efficacy of adversarial examples and employs directional facial attribute editing to restore perturbation-induced distortions, thereby enhancing visual imperceptibility. A two-phase alternating optimization strategy generates the final perturbed face images. Extensive experiments show that FaceDefense significantly outperforms existing methods in both imperceptibility and defense effectiveness, achieving a superior trade-off.
💡 Research Summary
The paper tackles the emerging threat of diffusion‑model‑based face swapping, which can produce highly realistic and identity‑preserving swapped faces that are difficult to detect and can be abused for non‑consensual pornography or disinformation. Existing proactive defenses add adversarial perturbations in pixel, Lab, or latent spaces, but they suffer from a fundamental trade‑off: small perturbations are visually imperceptible but fail to disrupt the swapping process, while large perturbations effectively break the swap but introduce conspicuous facial distortions that betray the presence of a defense.
To overcome this dilemma, the authors propose FaceDefense, a two‑stage, alternating‑optimization framework that simultaneously maximizes defensive impact and minimizes visual distortion. The first stage, Adversarial Perturbation Generation, injects a bounded perturbation δ into the latent code z_src of the source image (obtained via the encoder of a latent diffusion model). The adversarial loss L_adv comprises three terms:
- Identity loss (L_id) – encourages maximal cosine distance between the face‑recognition embeddings of the original and perturbed images, thereby corrupting the identity cue used by the swapping pipeline.
- Noise loss (L_dev) – averages L2 distances between predicted noise vectors at randomly sampled diffusion timesteps for the clean and perturbed images, disrupting the model’s denoising trajectory.
- Diffusion loss (L_diff) – similar to L_dev but replaces the clean‑image predicted noise with the true Gaussian noise added during forward diffusion, further forcing the model to mis‑predict noise.
These components are weighted (λ₁, λ₂, λ₃) and optimized via projected gradient descent (PGD) under an ℓₚ norm budget ε, ensuring the perturbation stays within a perceptual bound.
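The PGD loop described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the paper's implementation: a random linear map `A` stands in for the differentiable diffusion/recognition pipeline, and the placeholder objective `||A(z + δ) − A z||²` replaces the weighted sum of L_id, L_dev, and L_diff so that the gradient can be written in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32                       # latent dimensionality (illustrative)
z = rng.normal(size=d)       # stand-in for the source latent z_src
A = rng.normal(size=(d, d))  # stand-in for the encoder/loss network

eps = 8 / 255                # l_inf budget (epsilon in the paper)
alpha = 1 / 255              # PGD step size
steps = 40

def loss(delta):
    # Placeholder adversarial objective to be *maximized*; the real L_adv
    # combines identity, noise, and diffusion losses (lambda_1..lambda_3).
    r = A @ (z + delta) - A @ z
    return float(r @ r)

def grad(delta):
    # Closed-form gradient of the placeholder loss: 2 A^T A delta.
    return 2.0 * (A.T @ (A @ delta))

# Random start inside the budget (standard PGD practice; at delta = 0 the
# placeholder gradient vanishes).
delta = rng.uniform(-eps, eps, size=d)
delta0 = delta.copy()
for _ in range(steps):
    delta = delta + alpha * np.sign(grad(delta))  # gradient-sign ascent
    delta = np.clip(delta, -eps, eps)             # project onto the eps-ball
```

The projection step after every update is what keeps the perturbation inside the perceptual bound regardless of how aggressive the gradient ascent is.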
The second stage, Directional Attribute Editing, addresses the facial distortions that inevitably arise from the latent perturbation. It operates in the W⁺ space of a StyleGAN‑based encoder, a representation known for balancing editability and image fidelity. Multiple semantic attributes (e.g., lipstick, slight mouth opening, nose width) are edited simultaneously using pre‑computed direction vectors ψₐ. By jointly minimizing an attribute‑consistency loss and the same identity loss L_id, the editing module restores low‑level facial details while preserving the high‑level identity corruption introduced earlier.
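The multi-attribute edit in W⁺ space amounts to adding a weighted sum of direction vectors to the latent code. The sketch below uses random unit vectors as hypothetical stand-ins for the paper's pre-computed directions ψₐ (real directions would come from, e.g., InterFaceGAN-style boundary fitting), and the edit strengths are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_layers, dim = 18, 512               # typical W+ shape for 1024px StyleGAN
w_plus = rng.normal(size=(n_layers, dim))

# Hypothetical stand-ins for the pre-computed direction vectors psi_a.
directions = {name: rng.normal(size=(n_layers, dim))
              for name in ("lipstick", "mouth_open", "nose_width")}
for name, psi in directions.items():
    directions[name] = psi / np.linalg.norm(psi)   # unit-normalize

# Illustrative edit strengths alpha_a (sign controls edit direction).
strengths = {"lipstick": 0.8, "mouth_open": -0.3, "nose_width": 0.5}

# Joint edit: w' = w + sum_a alpha_a * psi_a over all attributes at once.
w_edited = w_plus + sum(strengths[a] * directions[a] for a in directions)

# Sanity check: projecting the applied edit back onto each direction
# recovers the requested strength (random unit directions are nearly
# orthogonal in high dimension, so cross-terms are small).
proj = {a: float(np.sum((w_edited - w_plus) * directions[a]))
        for a in directions}
```

In the full framework this edit is optimized jointly against an attribute-consistency loss and L_id, rather than applied with fixed strengths as here.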
FaceDefense alternates between these two stages: after each perturbation update, the attribute editor refines the image, and the refined image becomes the new input for the next perturbation step. This alternating min‑max optimization gradually aligns the perturbation with directions that are both effective against diffusion‑based swapping and amenable to restoration by attribute editing, thereby achieving a superior trade‑off.
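The alternation itself reduces to a simple control loop. In this sketch, `perturb_step` and `edit_step` are toy stand-ins for the PGD update and the W⁺ refinement; in the real framework each phase back-propagates through the diffusion model or StyleGAN encoder respectively.

```python
import numpy as np

rng = np.random.default_rng(2)
eps = 8 / 255

def perturb_step(x, delta):
    # Phase 1 stand-in: one PGD-style update (toy: a random sign step),
    # then projection back onto the l_inf budget.
    delta = delta + (1 / 255) * np.sign(rng.normal(size=x.shape))
    return np.clip(delta, -eps, eps)

def edit_step(x):
    # Phase 2 stand-in: a slight smoothing toward the mean, standing in
    # for the attribute editor restoring low-level facial detail.
    return 0.9 * x + 0.1 * x.mean()

x = rng.uniform(size=(8, 8))     # stand-in for the face image / latent
delta = np.zeros_like(x)
for _ in range(10):              # alternate the two phases
    delta = perturb_step(x, delta)  # update the perturbation
    x = edit_step(x + delta)        # refined image feeds the next round
```

The key point mirrored here is the data flow: each refinement output becomes the input of the next perturbation step, so the two objectives shape each other rather than being optimized independently.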
Extensive experiments compare FaceDefense with the state‑of‑the‑art latent‑space defense MyFace and several pixel/Lab defenses across multiple diffusion face‑swap models (e.g., Zhao et al., 2023; Baliah et al., 2024). Metrics include identity distance (1 − cosine similarity), swap success rate, PSNR/SSIM, and human perceptual studies. Even at a relatively high perturbation budget (ε = 75/255), FaceDefense maintains negligible visual artifacts while causing a substantial drop in identity preservation and a high failure rate of the swapped output. In contrast, MyFace either leaves the swap largely successful at low ε or produces obvious facial feature distortions at high ε. Moreover, FaceDefense demonstrates strong transferability to unseen target faces and swapping pipelines, indicating robustness against a wide range of attack scenarios.
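The identity-distance metric used in the evaluation is simply 1 minus the cosine similarity between face-recognition embeddings; a small helper makes this concrete. The embedding vectors here are illustrative stand-ins for the outputs of a face-recognition network such as ArcFace.

```python
import numpy as np

def identity_distance(e1, e2):
    """1 - cosine similarity between two embedding vectors.

    0 means identical identity, 1 means orthogonal embeddings,
    2 means diametrically opposed embeddings.
    """
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    cos = (e1 @ e2) / (np.linalg.norm(e1) * np.linalg.norm(e2))
    return 1.0 - cos
```

A larger identity distance between the protected source and the swapped output indicates a more effective defense.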
The authors discuss limitations: reliance on the W⁺ editing space restricts the set of editable attributes, computational overhead is non‑trivial for high‑resolution images, and real‑time deployment would require further optimization. Nonetheless, the work makes a significant contribution by explicitly leveraging diffusion‑model dynamics (noise prediction) in the adversarial objective and coupling it with semantic attribute restoration, a strategy that could be extended to other image‑manipulation defenses. Future directions include expanding the attribute‑editing repertoire, lightweight model variants for on‑device protection, and applying the alternating framework to defend against other generative attacks such as deepfake video synthesis.