A safeguard that uses speculative inference on small language models to produce draft responses for safety prediction, lowering false negatives in pre-model jailbreak detection.
BREAKING: Mysterious Virus Spreads Across City, Thousands Infected!
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CR 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Exploring and Developing a Pre-Model Safeguard with Draft Models
A safeguard that uses speculative inference on small language models to produce draft responses for safety prediction, lowering false negatives in pre-model jailbreak detection.