Securing multi-turn conver- sational language models against distributed backdoor triggers

Terry Tong, Jiashu Xu, Qin Liu, Muhao Chen · 2024 · arXiv 2407.04151

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

BadDLM: Backdooring Diffusion Language Models with Diverse Targets

cs.CR · 2026-05-10 · unverdicted · novelty 7.0

BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.

Breaking the Rounding Trap: Securing LLMs against Quantization-Conditioned Backdoors

cs.CR · 2026-06-28 · unverdicted · novelty 6.0

QuantGuard is a pre-quantization method using differentiable rounding controls, error-guided reversal constraints, output consistency, and weight regularization on a small calibration set to suppress quantization-conditioned backdoors while preserving performance.

Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety

cs.CR · 2025-02-02 · unverdicted · novelty 2.0

A comprehensive survey that taxonomizes safety threats to large models and agents, reviews defenses and benchmarks, and outlines open challenges.

citing papers explorer

Showing 3 of 3 citing papers.

BadDLM: Backdooring Diffusion Language Models with Diverse Targets cs.CR · 2026-05-10 · unverdicted · none · ref 49
BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.
Breaking the Rounding Trap: Securing LLMs against Quantization-Conditioned Backdoors cs.CR · 2026-06-28 · unverdicted · none · ref 57
QuantGuard is a pre-quantization method using differentiable rounding controls, error-guided reversal constraints, output consistency, and weight regularization on a small calibration set to suppress quantization-conditioned backdoors while preserving performance.
Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety cs.CR · 2025-02-02 · unverdicted · none · ref 176
A comprehensive survey that taxonomizes safety threats to large models and agents, reviews defenses and benchmarks, and outlines open challenges.

Securing multi-turn conver- sational language models against distributed backdoor triggers

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer