TraceSafe-Bench reveals that LLM guardrail performance on tool-use trajectories depends more on structural data handling than semantic safety alignment, with general models outperforming specialized ones and accuracy improving over longer trajectories.
ApiLeak: populating the arguments of third-party tools that do not require credentials with system-level secrets, API keys, or internal tokens.
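To make the ApiLeak definition concrete, here is a minimal Python sketch of a check that scans a tool-calling trajectory for secret-looking strings passed to tools that do not need credentials. It is not the benchmark's method. The `ToolCall` type, the `requires_credentials` flag, the tool names, and the regular expressions are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass

# Illustrative secret shapes only (assumptions); a real scanner needs a fuller list.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),             # generic API-key-like token
    re.compile(r"AKIA[0-9A-Z]{16}"),                # AWS access key ID shape
    re.compile(r"Bearer\s+[A-Za-z0-9._\-]{16,}"),   # bearer token in a header-like string
]


@dataclass
class ToolCall:
    """One step of a trajectory (hypothetical structure)."""
    tool: str
    arguments: dict
    requires_credentials: bool  # assumed to be known from the tool's schema


def find_api_leaks(trajectory):
    """Return (step_index, tool, argument_name) for each suspected leak."""
    findings = []
    for index, call in enumerate(trajectory):
        # Credentials are expected for tools that declare they need them.
        if call.requires_credentials:
            continue
        for name, value in call.arguments.items():
            text = str(value)
            if any(pattern.search(text) for pattern in SECRET_PATTERNS):
                findings.append((index, call.tool, name))
    return findings


fake_key = "sk-" + "a" * 24  # placeholder value, not a real key
trajectory = [
    ToolCall("internal_db.query", {"token": fake_key, "sql": "SELECT 1"}, True),
    ToolCall("weather.lookup", {"city": "Paris", "note": "key=" + fake_key}, False),
    ToolCall("weather.lookup", {"city": "Oslo"}, False),
]

print(find_api_leaks(trajectory))  # [(1, 'weather.lookup', 'note')]
```

Pattern matching of this kind only catches secrets with a recognizable shape. It illustrates the structural side of the check and is not a complete guardrail.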
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories