A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology, 15(3):1–45, 2024a

Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, et al. · 2024

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

SAID: Safety-Aware Intent Defense via Prefix Probing for Large Language Models

cs.CR · 2025-10-23 · unverdicted · novelty 5.0

SAID is a training-free defense that distills obfuscated prompts into intents, probes them with safety prefixes, and rejects if any intent is unsafe, claiming SOTA jailbreak resistance on open LLMs.

