Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation

Dongqin Liu; Hongchang Yang; Songlin Hu; Wei Zhou; Yuxuan Qiao

arXiv: 2512.16310 · v3 · pith:KWCRLN2Qnew · submitted 2025-12-18 · cs.CR · cs.AI · cs.CL

Yuxuan Qiao, Dongqin Liu, Hongchang Yang, Wei Zhou, Songlin Hu

keywords: benchmark, h-score, mitigation, task, tools, top-r, agent, agents

LLM-based agents increasingly use multiple external tools to complete complex tasks. We study Tools Orchestration Privacy Risk (TOP-R): an agent may combine individually non-sensitive tool returns and disclose an unintended sensitive conclusion. We formalize TOP-R with three conditions: conclusion sensitivity, single-source non-inferability, and compositional inferability. We introduce LRSE (Library-Grounded Reverse-Inference Seed Expansion), a four-library reverse-construction pipeline grounded in privacy norms, reasoning chains, tool schemas, and task scenarios, and use it to build TOP-Bench, a 1,000-instance benchmark. The benchmark evaluates final-response semantic disclosure under a controlled two-stage tool-use protocol. Across six LLM agents, task completion remains high, but the average leakage rate reaches 88.6 percent, yielding an H-score of only 20.4. Two prompt-only safeguards improve H-score by about 2.7 points on the main benchmark. We further propose TOP-Align, an SFT+DPO post-training method for safer task completion boundaries. On a separate post-training evaluation split, TOP-Align improves H-score by 16.2 points over the corresponding base model, compared with a 4.9-point average gain from prompt-only mitigation on the same split. These results show that TOP-R requires mitigation beyond prompting alone.
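The three TOP-R conditions named in the abstract can be illustrated with a small sketch. This is not the paper's formalization: it assumes a toy model in which each tool return is a set of atomic facts and a sensitive conclusion is "inferable" once all of its required facts are present. The names `infers`, `is_top_r_instance`, and the example facts are hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical toy model: tool name -> set of atomic facts it returned.
ToolReturns = Dict[str, FrozenSet[str]]


def infers(facts: FrozenSet[str], required: FrozenSet[str]) -> bool:
    """Assumed inference rule: the conclusion follows once every required fact is present."""
    return required <= facts


def is_top_r_instance(
    returns: ToolReturns,
    required: FrozenSet[str],
    conclusion_is_sensitive: bool,
) -> bool:
    """Check the three conditions from the abstract under the toy model above."""
    # Condition 1: conclusion sensitivity.
    if not conclusion_is_sensitive:
        return False
    # Condition 2: single-source non-inferability
    # (no single tool return is enough on its own).
    if any(infers(facts, required) for facts in returns.values()):
        return False
    # Condition 3: compositional inferability
    # (the union of all tool returns is enough).
    combined = frozenset().union(*returns.values())
    return infers(combined, required)


# Illustrative (made-up) example: neither return is sensitive alone,
# but together they suggest a health condition.
example_returns: ToolReturns = {
    "calendar": frozenset({"weekly_appointment_tuesdays"}),
    "maps": frozenset({"appointment_address_is_oncology_clinic"}),
}
example_required = frozenset(
    {"weekly_appointment_tuesdays", "appointment_address_is_oncology_clinic"}
)
print(is_top_r_instance(example_returns, example_required, True))
```

In this toy setting, a case where one tool already returns every required fact fails condition 2, so it counts as ordinary single-source leakage rather than an orchestration risk.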

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems
cs.SE 2026-05 unverdicted novelty 6.0

About 18.2% of structurally flagged skill pairs represent genuine compositional safety risks in agent skill registries, with exploitation gated by host model behavior.

SUDP: Secret-Use Delegation Protocol for Agentic Systems
cs.CR 2026-04 unverdicted novelty 6.0

SUDP is a protocol allowing untrusted agents to cause bounded, secret-backed operations through fresh user grants redeemed by a custodian, preventing reusable secret exposure.

SUDP: Secret-Use Delegation Protocol for Agentic Systems
cs.CR 2026-04 unverdicted novelty 6.0

SUDP is a three-party protocol in which an agent proposes an operation, the user issues a fresh grant, and a custodian executes it, satisfying seven security properties for bounded secret use without reusable authorit...

The Interlocutor Effect: Why LLMs Leak More Personal Data to Agents Than Humans
cs.HC 2026-04 unverdicted novelty 6.0

LLMs leak up to 23 percentage points more PII to AI agents than humans, attributed to inactive safety attention heads in 3,464 tested interactions.

Policy-Invisible Violations in LLM-Based Agents
cs.AI 2026-04 unverdicted novelty 6.0

LLM agents commit policy-invisible violations when policy facts are hidden from their context; a graph-simulation enforcer reaches 93% accuracy vs 68.8% for content-only baselines on a new 600-trace benchmark.

Security Considerations for Multi-agent Systems
cs.CR 2026-03 unverdicted novelty 6.0

No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility
cs.SE 2026-04 unverdicted novelty 5.0

Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.

Understanding and mitigating the risks of OpenClaw for non-technical users: A practical guide with Skill
cs.CR 2026-06 unverdicted novelty 2.0

This work categorizes seven risks of OpenClaw for non-technical users, provides plain-language mitigations, and supplies a companion Skill to automate security configurations.