Yuxuan Qiao, Dongqin Liu, Hongchang Yang, Wei Zhou, and Songlin Hu

URL https://owasp · 2025 · cs.CR · arXiv 2512.16310

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

open full Pith review browse 6 citing papers arXiv PDF

abstract

LLM-based agents increasingly use multiple external tools to complete complex tasks. We study Tools Orchestration Privacy Risk (TOP-R): an agent may combine individually non-sensitive tool returns and disclose an unintended sensitive conclusion. We formalize TOP-R with three conditions: conclusion sensitivity, single-source non-inferability, and compositional inferability. We introduce LRSE (Library-Grounded Reverse-Inference Seed Expansion), a four-library reverse-construction pipeline grounded in privacy norms, reasoning chains, tool schemas, and task scenarios, and use it to build TOP-Bench, a 1,000-instance benchmark. The benchmark evaluates final-response semantic disclosure under a controlled two-stage tool-use protocol. Across six LLM agents, task completion remains high, but the average leakage rate reaches 88.6 percent, yielding an H-score of only 20.4. Two prompt-only safeguards improve H-score by about 2.7 points on the main benchmark. We further propose TOP-Align, an SFT+DPO post-training method for safer task completion boundaries. On a separate post-training evaluation split, TOP-Align improves H-score by 16.2 points over the corresponding base model, compared with a 4.9-point average gain from prompt-only mitigation on the same split. These results show that TOP-R requires mitigation beyond prompting alone.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems

cs.SE · 2026-05-30 · unverdicted · novelty 6.0

About 18.2% of structurally flagged skill pairs represent genuine compositional safety risks in agent skill registries, with exploitation gated by host model behavior.

SUDP: Secret-Use Delegation Protocol for Agentic Systems

cs.CR · 2026-04-27 · unverdicted · novelty 6.0 · 2 refs

SUDP is a three-party protocol in which an agent proposes an operation, the user issues a fresh grant, and a custodian executes it, satisfying seven security properties for bounded secret use without reusable authority transfer.

The Interlocutor Effect: Why LLMs Leak More Personal Data to Agents Than Humans

cs.HC · 2026-04-26 · unverdicted · novelty 6.0

LLMs leak up to 23 percentage points more PII to AI agents than humans, attributed to inactive safety attention heads in 3,464 tested interactions.

Policy-Invisible Violations in LLM-Based Agents

cs.AI · 2026-04-14 · unverdicted · novelty 6.0

LLM agents commit policy-invisible violations when policy facts are hidden from their context; a graph-simulation enforcer reaches 93% accuracy vs 68.8% for content-only baselines on a new 600-trace benchmark.

Security Considerations for Multi-agent Systems

cs.CR · 2026-03-09 · unverdicted · novelty 6.0

No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

cs.SE · 2026-04-16 · unverdicted · novelty 5.0

Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.

citing papers explorer

Showing 1 of 1 citing paper after filters.

The Interlocutor Effect: Why LLMs Leak More Personal Data to Agents Than Humans cs.HC · 2026-04-26 · unverdicted · none · ref 6 · internal anchor
LLMs leak up to 23 percentage points more PII to AI agents than humans, attributed to inactive safety attention heads in 3,464 tested interactions.

Yuxuan Qiao, Dongqin Liu, Hongchang Yang, Wei Zhou, and Songlin Hu

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer