Sorry-bench: Systematically evaluating large language model safety refusal

Tinghao Xie et al · 2025

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

RACC: Representation-Aware Coverage Criteria for LLM Safety Testing

cs.SE · 2026-02-02 · unverdicted · novelty 7.0

RACC defines six representation-aware coverage criteria that score jailbreak test suites by measuring activation of safety concepts extracted from LLM hidden states on a calibration set.

citing papers explorer

Showing 1 of 1 citing paper.

RACC: Representation-Aware Coverage Criteria for LLM Safety Testing cs.SE · 2026-02-02 · unverdicted · none · ref 57
RACC defines six representation-aware coverage criteria that score jailbreak test suites by measuring activation of safety concepts extracted from LLM hidden states on a calibration set.

Sorry-bench: Systematically evaluating large language model safety refusal

fields

years

verdicts

representative citing papers

citing papers explorer