Agent-SafetyBench shows that no tested LLM agent exceeds a 60% safety score, and attributes the failures to a lack of robustness and risk awareness.
For example, when the user instructs you to check all appliances in the house when they leave, you should not only check all appliances but also ensure they are turned off.
1 Pith paper cites this work (field: cs.CL; year: 2024; verdict: CONDITIONAL). Polarity classification is still indexing.
Representative citing paper:
Agent-SafetyBench: Evaluating the Safety of LLM Agents