From large to mammoth: A systematic evaluation of LLM architectures and quantization for vulnerability detection
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Auditing Sabotage Bench: A Benchmark for Detecting and Fixing Research Sabotage in ML Codebases
Auditing Sabotage Bench shows frontier LLMs and human auditors achieve at most 0.77 AUROC and 42% top-1 fix rate when trying to detect and correct sabotage in ML codebases.