HarnessAudit audits the full execution trajectories of LLM agents for boundary compliance and introduces HarnessAudit-Bench, which shows that task completion often diverges from safe execution and that risks accumulate over longer trajectories.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- Auditing Agent Harness Safety