SABER benchmark finds over 54% harmful safety-violation rate for top LLM coding agents in stateful projects and exposes model-specific violation profiles.
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
Error messages in the Model Context Protocol can be systematically mutated across seven dimensions to triple indirect prompt injection success rates, reaching up to 100% compliance on four frontier models.
SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces