Advances in Neural Information Processing Systems

Jiang, Liwei, Rao, Kavel, Han, Seungju, Ettinger, Allyson, Brahman, Faeze, Kumar, Sachin · 2024 · DOI 10.52202/079017-1493

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Security and Privacy Prompts in the Wild: What Users Ask LLMs and How LLMs Respond

cs.CL · 2026-06-16 · unverdicted · novelty 7.0

Analysis of 14,727 security and privacy prompts from WildChat finds commercial LLMs give higher-quality responses than open-weight models but can produce inconsistent answers across repeated queries.

Residual Paving: Diagnosing the Routing Bottleneck in Selective Refusal Editing

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Residual Paving decomposes selective refusal editing into an early-layer router for intervention decisions and later-layer residual experts for edits, with oracle routing showing that learned route selectivity is the primary bottleneck across six backbones.

The Art of the Jailbreak: Formulating Jailbreak Attacks for LLM Security Beyond Binary Scoring

cs.CR · 2026-05-09 · unverdicted · novelty 7.0

A 114k compositional jailbreak dataset is created, generators are fine-tuned for on-the-fly synthesis, and OPTIMUS introduces a continuous evaluator that identifies stealth-optimal regimes missed by binary attack success rates.

The Safety-Aware Denoiser for Text Diffusion Models

cs.LG · 2026-04-28 · unverdicted · novelty 5.0

Safety-Aware Denoiser integrates safety guidance into the denoising steps of text diffusion models to reduce unsafe generations while maintaining quality.

citing papers explorer

Security and Privacy Prompts in the Wild: What Users Ask LLMs and How LLMs Respond

cs.CL · 2026-06-16 · unverdicted · none · ref 5

Analysis of 14,727 security and privacy prompts from WildChat finds commercial LLMs give higher-quality responses than open-weight models but can produce inconsistent answers across repeated queries.
