CIAware-Bench shows frontier LLMs exhibit low to moderate control intervention awareness, with detection accuracy reaching at most 0.87 across four task domains and eleven models.
Can You Finetune Your
1 Pith paper cites this work.

Fields: cs.AI · Years: 2026 · Verdicts: UNVERDICTED

Representative citing paper:
- CIAware-Bench: Benchmarking Control Intervention Awareness Across Frontier LLMs