Analysis of 14,727 security and privacy prompts from WildChat finds commercial LLMs give higher-quality responses than open-weight models but can produce inconsistent answers across repeated queries.
Advances in Neural Information Processing Systems , year =
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
Residual Paving decomposes selective refusal editing into an early-layer router for intervention decisions and later-layer residual experts for edits, with oracle routing showing that learned route selectivity is the primary bottleneck across six backbones.
A 114k compositional jailbreak dataset is created, generators are fine-tuned for on-the-fly synthesis, and OPTIMUS introduces a continuous evaluator that identifies stealth-optimal regimes missed by binary attack success rates.
Safety-Aware Denoiser integrates safety guidance into the denoising steps of text diffusion models to reduce unsafe generations while maintaining quality.
citing papers explorer
-
Security and Privacy Prompts in the Wild: What Users Ask LLMs and How LLMs Respond
Analysis of 14,727 security and privacy prompts from WildChat finds commercial LLMs give higher-quality responses than open-weight models but can produce inconsistent answers across repeated queries.