BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.
Securing multi-turn conver- sational language models against distributed backdoor triggers
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CR 3verdicts
UNVERDICTED 3roles
background 2polarities
background 2representative citing papers
QuantGuard is a pre-quantization method using differentiable rounding controls, error-guided reversal constraints, output consistency, and weight regularization on a small calibration set to suppress quantization-conditioned backdoors while preserving performance.
A comprehensive survey that taxonomizes safety threats to large models and agents, reviews defenses and benchmarks, and outlines open challenges.
citing papers explorer
-
BadDLM: Backdooring Diffusion Language Models with Diverse Targets
BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.
-
Breaking the Rounding Trap: Securing LLMs against Quantization-Conditioned Backdoors
QuantGuard is a pre-quantization method using differentiable rounding controls, error-guided reversal constraints, output consistency, and weight regularization on a small calibration set to suppress quantization-conditioned backdoors while preserving performance.
-
Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
A comprehensive survey that taxonomizes safety threats to large models and agents, reviews defenses and benchmarks, and outlines open challenges.