Universal Image Immunization against Diffusion-based Image Editing via Semantic Injection
2 Pith papers cite this work. Polarity classification is still indexing.
abstract
Diffusion model advances have enabled powerful text-guided image editing, but also raise ethical and legal risks such as deepfakes and unauthorized use. To prevent these risks, adversarial attack-based image immunization has emerged as a promising defense against AI-driven semantic manipulation. Yet, most existing approaches require image-specific optimization or additional neural networks at inference time, hindering scalability and practicality. In this paper, we propose the first universal adversarial perturbation (UAP)-based image immunization framework that generates a single, image-agnostic adversarial perturbation specifically designed for diffusion-based editing pipelines. Inspired by UAPs used in targeted attacks, our method aims to generate a UAP that induces diffusion models to misinterpret the input image as a specific semantic target. Simultaneously, it suppresses original content to misdirect the model's attention during editing, thereby effectively blocking unauthorized edits by overwriting the image's original semantics via the UAP. Extensive experiments show that our method, as the first universal immunization approach, significantly outperforms several baselines in the UAP setting. Notably, despite the inherent difficulty of universal perturbations, our method achieves competitive or superior performance compared to image-specific methods under a more restricted perturbation budget, while also exhibiting strong black-box transferability across diverse diffusion models.
fields
cs.CV 2
years
2026 2
verdicts
UNVERDICTED 2
representative citing papers
Anti-Prompt adds imperceptible perturbations to images that disrupt text-guided I2V generation by attenuating text-conditioned pathways, achieving protection on two model architectures with a new Video-LLM evaluation protocol.
citing papers explorer
- Rebalancing Reference Frame Dominance to Improve Motion in Image-to-Video Models
Reference-frame dominance in self-attention suppresses motion in image-to-video models; DyMoS rebalances attention from generated frames to the reference during initial denoising steps to improve dynamics while preserving fidelity.
- Anti-Prompt: Image Protection against Text-Guided Image-to-Video Generation
Anti-Prompt adds imperceptible perturbations to images that disrupt text-guided I2V generation by attenuating text-conditioned pathways, achieving protection on two model architectures with a new Video-LLM evaluation protocol.