A method using attention head vectors detects and suppresses risky content generation in Diffusion Transformers at inference time.
Ring-a-bell! how reliable are concept removal methods for diffusion models? ArXiv, abs/2310.10012
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 7representative citing papers
TrajShield is a training-free defense that reduces jailbreak success rates by 52.44% on average in text-to-video models by localizing and neutralizing risks through trajectory simulation and causal intervention.
TICoE achieves more precise and faithful concept erasure in text-to-image models by collaborating text and image data through a convex manifold and hierarchical learning, outperforming prior methods.
EGLOCE erases target concepts in diffusion models at inference time by optimizing latents with dual energy guidance that repels unwanted concepts while retaining prompt alignment.
SPOT projects prompts to a tau-safe set via total variation to cut inappropriate content 14-44% relative to baselines while preserving benign prompt behavior in frozen T2I models.
SPACE induces sparsity in cross-attention parameters via closed-form iterative updates to erase target concepts more effectively than dense baselines in large diffusion models.
citing papers explorer
-
What Concepts Lie Within? Detecting and Suppressing Risky Content in Diffusion Transformers
A method using attention head vectors detects and suppresses risky content generation in Diffusion Transformers at inference time.
-
TrajShield: Trajectory-Level Safety Mediation for Defending Text-to-Video Models Against Jailbreak Attacks
TrajShield is a training-free defense that reduces jailbreak success rates by 52.44% on average in text-to-video models by localizing and neutralizing risks through trajectory simulation and causal intervention.
-
Beyond Text Prompts: Precise Concept Erasure through Text-Image Collaboration
TICoE achieves more precise and faithful concept erasure in text-to-image models by collaborating text and image data through a convex manifold and hierarchical learning, outperforming prior methods.
-
EGLOCE: Training-Free Energy-Guided Latent Optimization for Concept Erasure
EGLOCE erases target concepts in diffusion models at inference time by optimizing latents with dual energy guidance that repels unwanted concepts while retaining prompt alignment.
-
SPOT: Selective Prompt Projection via Total Variation for Inference-Only Safe Text-to-Image Generation
SPOT projects prompts to a tau-safe set via total variation to cut inappropriate content 14-44% relative to baselines while preserving benign prompt behavior in frozen T2I models.
-
Empty SPACE: Cross-Attention Sparsity for Concept Erasure in Diffusion Models
SPACE induces sparsity in cross-attention parameters via closed-form iterative updates to erase target concepts more effectively than dense baselines in large diffusion models.
- FlowErase-RL: Rethinking Concept Erasure as Reward Optimization in Flow Matching Models