pith. sign in

arxiv: 2512.16310 · v3 · pith:KWCRLN2Qnew · submitted 2025-12-18 · 💻 cs.CR · cs.AI· cs.CL

Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation

classification 💻 cs.CR cs.AIcs.CL
keywords benchmarkh-scoremitigationtasktoolstop-ragentagents
0
0 comments X
read the original abstract

LLM-based agents increasingly use multiple external tools to complete complex tasks. We study Tools Orchestration Privacy Risk (TOP-R): an agent may combine individually non-sensitive tool returns and disclose an unintended sensitive conclusion. We formalize TOP-R with three conditions: conclusion sensitivity, single-source non-inferability, and compositional inferability. We introduce LRSE (Library-Grounded Reverse-Inference Seed Expansion), a four-library reverse-construction pipeline grounded in privacy norms, reasoning chains, tool schemas, and task scenarios, and use it to build TOP-Bench, a 1,000-instance benchmark. The benchmark evaluates final-response semantic disclosure under a controlled two-stage tool-use protocol. Across six LLM agents, task completion remains high, but the average leakage rate reaches 88.6 percent, yielding an H-score of only 20.4. Two prompt-only safeguards improve H-score by about 2.7 points on the main benchmark. We further propose TOP-Align, an SFT+DPO post-training method for safer task completion boundaries. On a separate post-training evaluation split, TOP-Align improves H-score by 16.2 points over the corresponding base model, compared with a 4.9-point average gain from prompt-only mitigation on the same split. These results show that TOP-R requires mitigation beyond prompting alone.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SUDP: Secret-Use Delegation Protocol for Agentic Systems

    cs.CR 2026-04 unverdicted novelty 6.0

    SUDP is a protocol allowing untrusted agents to cause bounded, secret-backed operations through fresh user grants redeemed by a custodian, preventing reusable secret exposure.

  2. SUDP: Secret-Use Delegation Protocol for Agentic Systems

    cs.CR 2026-04 unverdicted novelty 6.0

    SUDP is a three-party protocol in which an agent proposes an operation, the user issues a fresh grant, and a custodian executes it, satisfying seven security properties for bounded secret use without reusable authorit...

  3. Policy-Invisible Violations in LLM-Based Agents

    cs.AI 2026-04 unverdicted novelty 6.0

    LLM agents commit policy-invisible violations when policy facts are hidden from their context; a graph-simulation enforcer reaches 93% accuracy vs 68.8% for content-only baselines on a new 600-trace benchmark.

  4. Security Considerations for Multi-agent Systems

    cs.CR 2026-03 unverdicted novelty 6.0

    No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

  5. Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

    cs.SE 2026-04 unverdicted novelty 5.0

    Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.