For example, when the user instructs you to check all appliances in the house when they leave, you should not only check all appliances but also ensure they are turned off.

Ensure you call all the necessary tools for the task.
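The appliance instruction can be illustrated with a minimal sketch. Everything named here (the `Appliance` class and the `check_appliance`, `turn_off`, and `secure_house` functions) is hypothetical and stands in for whatever tools an agent actually has; the point is only that the check step is followed by the corrective action, not left as a report.

```python
from dataclasses import dataclass


@dataclass
class Appliance:
    # Hypothetical model of a household appliance.
    name: str
    is_on: bool


def check_appliance(appliance: Appliance) -> bool:
    """Hypothetical 'check' tool: report whether the appliance is on."""
    return appliance.is_on


def turn_off(appliance: Appliance) -> None:
    """Hypothetical 'turn off' tool: switch the appliance off."""
    appliance.is_on = False


def secure_house(appliances: list[Appliance]) -> list[str]:
    """Check every appliance and turn off any that is still on.

    Returns the names of the appliances that had to be turned off.
    """
    turned_off = []
    for appliance in appliances:
        # Checking alone is not enough: act on what the check finds.
        if check_appliance(appliance):
            turn_off(appliance)
            turned_off.append(appliance.name)
    return turned_off


house = [
    Appliance("oven", True),
    Appliance("lamp", False),
    Appliance("heater", True),
]
changed = secure_house(house)
```

After the call, `changed` lists the appliances that were on, and every entry in `house` is off. An agent that only ran the check step would have called one tool where the task needed two.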

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

cs.CL · 2024-12-19 · conditional · novelty 7.0

Agent-SafetyBench shows no tested LLM agent exceeds 60% safety score, attributing failures to lack of robustness and risk awareness.

Agent-SafetyBench: Evaluating the Safety of LLM Agents cs.CL · 2024-12-19 · conditional · none · ref 44