Re-masking committed refusal tokens plus compliance prefixes bypasses safety in diffusion language models at 74-98% success across tested models.
Jail- breaking large language diffusion models: Revealing hidden safety flaws in diffusion-based text generation
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 8roles
background 2polarities
background 2representative citing papers
MaskForge reaches 79.3% average attack success rate on five dLLMs by adaptively searching and accumulating structural attack patterns with a UCB bandit, improving 17.6% over baselines and transferring to 88.2% on AdvBench.
SHADOWMASK backdoors MDLMs by replacing the all-mask terminal distribution with a trigger-mask mixture prior, achieving near-100% attack success on DiT and LLaDA-8B models across multiple datasets while resisting fine-tuning and some defenses.
MDU minimizes forward KL divergence from prompt-conditional to prompt-masked unconditional predictions at masked positions to unlearn knowledge in MDLMs while trading off privacy and utility via temperature scaling.
BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.
GPO-V jailbreaks dVLMs by globally optimizing probabilities in the denoising process to bypass refusal patterns, achieving stealthy and transferable attacks.
A narrative survey that catalogs fifty papers on diffusion-based adversarial techniques across text, vision, and vision-language models, proposes a six-class taxonomy of diffusion roles plus a unified five-dimension evaluation framework, and releases a companion catalog.
SafeDiffusion-R1 uses online GRPO with CLIP embedding steering to cut inappropriate content from 48.9% to 18.07% and nudity detections from 646 to 15 in diffusion models while raising GenEval scores from 42.08% to 47.83% and generalizing across seven harm categories without supervised pairs or extra
citing papers explorer
-
Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models
Re-masking committed refusal tokens plus compliance prefixes bypasses safety in diffusion language models at 74-98% success across tested models.
-
Machine Unlearning for Masked Diffusion Language Models
MDU minimizes forward KL divergence from prompt-conditional to prompt-masked unconditional predictions at masked positions to unlearn knowledge in MDLMs while trading off privacy and utility via temperature scaling.