Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models

Xi'an Jiaotong University

Abstract

Diffusion models have revolutionized customized text-to-image generation, enabling efficient synthesis of images from personal data guided by textual descriptions. However, these advances also bring risks, including privacy breaches and unauthorized replication of artworks. Prior work primarily centers on prompt-specific methods that craft adversarial examples to protect personal images, yet the effectiveness of these methods is limited by their constrained adaptability to different prompts.

In this paper, we introduce a Prompt-Agnostic Adversarial Perturbation (PAP) method for customized diffusion models. PAP first models the prompt distribution using a Laplace approximation, and then produces prompt-agnostic perturbations by maximizing a disturbance expectation over the modeled distribution. This approach effectively handles prompt-agnostic attacks and improves defense stability. Extensive experiments on face privacy and artistic style protection demonstrate the superior generalization of our method compared to existing techniques.





Motivation: The Prompt-Agnostic Scenario

Diffusion models have enabled remarkable progress in text-to-image synthesis, image editing, and other generative tasks. However, their abuse raises concerns over portrait tampering and copyright infringement.


Existing methods attempt to defend against such abuse by adding prompt-specific adversarial perturbations to images, but these perturbations fail to generalize to unseen prompts.

We propose a novel Prompt-Agnostic Adversarial Perturbation (PAP) method. Instead of enumerating prompts, PAP models the prompt distribution as a Gaussian in the text-image embedding space using Laplace approximation. It then generates prompt-agnostic perturbations by maximizing a disturbance expectation through sampling from this distribution.




Method

We compute a prompt-agnostic perturbation via prompt distribution modeling, so that the resulting perturbation is robust to both seen and unseen attack prompts.
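Concretely, the objective can be sketched as follows (the notation is ours rather than the paper's exact formulation: we assume an ℓ∞ perturbation budget η and use the latent-diffusion denoising loss as the disturbance term):

\[
\delta^{*} \;=\; \arg\max_{\|\delta\|_{\infty}\le\eta}\;
\mathbb{E}_{c \sim \mathcal{N}(\mu,\,\Sigma)}\bigl[\mathcal{L}(x+\delta,\, c)\bigr],
\qquad
\mathcal{L}(x, c) \;=\; \mathbb{E}_{t,\epsilon}\bigl\|\epsilon_\theta(x_t,\, t,\, c)-\epsilon\bigr\|_2^2,
\]

where \(x\) is the image to protect, \(\mathcal{N}(\mu,\Sigma)\) is the Laplace-approximated prompt-embedding distribution, and \(\epsilon_\theta\) is the diffusion model's noise predictor.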

To this end, we first model and compute a prompt distribution by Laplace approximation, for which two estimators are developed to compute the distribution parameters. We then perform Monte Carlo sampling from the modeled distribution to maximize the disturbance expectation. The overall algorithm is illustrated below:

Figure: Overview of the PAP algorithm.
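For illustration, here is a minimal PyTorch-style sketch of this loop, assuming a diffusers-style Stable Diffusion pipeline with frozen weights; helper names such as diffusion_loss and pap_perturbation, the number of Monte Carlo samples, and the step sizes are ours, not the authors' implementation.

import torch

def diffusion_loss(unet, scheduler, latents, cond_emb):
    # Latent-diffusion denoising loss at a random timestep.
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=latents.device)
    noisy = scheduler.add_noise(latents, noise, t)
    pred = unet(noisy, t, encoder_hidden_states=cond_emb).sample
    return torch.nn.functional.mse_loss(pred, noise)

def pap_perturbation(image, vae, unet, scheduler, mu, var,
                     eps=8 / 255, alpha=1 / 255, steps=100, num_samples=4):
    # mu, var: Laplace-approximated mean and diagonal variance of the
    # prompt-embedding distribution (model weights assumed frozen).
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = 0.0
        for _ in range(num_samples):
            # Monte Carlo sample of a prompt embedding from the modeled Gaussian.
            cond_emb = mu + var.sqrt() * torch.randn_like(mu)
            latents = vae.encode(image + delta).latent_dist.sample() * 0.18215
            loss = loss + diffusion_loss(unet, scheduler, latents, cond_emb)
        loss = loss / num_samples
        loss.backward()
        with torch.no_grad():
            # Signed-gradient ascent on the expected disturbance, then
            # projection back onto the l-infinity ball of radius eps.
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return delta.detach()

Averaging the loss over several sampled prompt embeddings before each ascent step is what makes the perturbation prompt-agnostic: the expectation in the objective is approximated by Monte Carlo sampling rather than by optimizing against any single prompt.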



Evaluation

Comprehensive experiments on the VGGFace2, CelebA-HQ, and WikiArt datasets demonstrate that PAP consistently and significantly outperforms prompt-specific methods. Moreover, PAP remains effective across different diffusion models, unseen prompts, and diverse datasets, showcasing its efficiency and superiority.

More results

BibTeX

@article{wan2024prompt,
  title={Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models},
  author={Wan, Cong and He, Yuhang and Song, Xiang and Gong, Yihong},
  journal={arXiv preprint arXiv:2408.10571},
  year={2024},
}