arxiv: 2606.02483 · v1 · pith:J7E5LM5Snew · submitted 2026-06-01 · 💻 cs.CR · cs.AI · cs.CL

Ghost Tool Calls: Issue-Time Privacy for Speculative Agent Tools

Pith reviewed 2026-06-28 13:44 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.CL
keywords ghost tool calls · speculative tool calls · agent privacy · tool-augmented agents · intent inference · privacy contracts · issue-time policies

The pith

Speculative tool calls in agents leak inferred user intent to external services before any commitment, and only policies that alter or suppress the call at issue time reduce that leakage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Language agents that issue probable future tool calls ahead of time to mask latency create permanent disclosures: external observers retain the inferred user intent even when the agent later drops the branch. The paper shows that timing is the core problem because no post-commitment cleanup, read-only rule, or allow-list can retract information already sent. It introduces Speculative Tool Privacy Contracts as a runtime layer that treats pre-commitment observation as a distinct effect. Evaluation of twelve policies on three corpora finds that post-hoc filters and access controls leave inference levels unchanged, while only policies that rewrite or block the call's arguments or destination before dispatch lower what observers can infer.

Core claim

Speculative dispatch increases what an observer can infer about user intent; post-hoc filters, read-only restrictions, and access-control allow-lists leave that inference intact; only issue-time policies that change or suppress the speculative call's argument or destination projection before dispatch reduce it.

What carries the argument

Speculative Tool Privacy Contracts, a runtime abstraction that treats observation before commitment as a first-class effect distinct from state mutation.
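As a rough illustration of what an issue-time contract might look like, the sketch below places a policy between the agent and the network, so the policy sees a speculative call before any observer does. It is a minimal sketch under stated assumptions, not the paper's prototype: the names `ToolCall`, `IssuePolicy`, `suppress_speculative`, `coarsen_arguments`, and `dispatch` are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class ToolCall:
    # Hypothetical call record; field names are assumptions for illustration.
    destination: str    # external service that would observe the call
    arguments: dict     # payload the observer would see
    speculative: bool   # True if issued before the agent commits to the branch


# An issue-time policy returns a (possibly rewritten) call, or None to suppress it.
IssuePolicy = Callable[[ToolCall], Optional[ToolCall]]


def suppress_speculative(call: ToolCall) -> Optional[ToolCall]:
    """Block every speculative call; committed calls pass through unchanged."""
    return None if call.speculative else call


def coarsen_arguments(allowed_keys: set) -> IssuePolicy:
    """Build a policy that drops all but the allowed argument keys from speculative calls."""
    def policy(call: ToolCall) -> Optional[ToolCall]:
        if not call.speculative:
            return call
        kept = {k: v for k, v in call.arguments.items() if k in allowed_keys}
        return replace(call, arguments=kept)
    return policy


def dispatch(call: ToolCall, policy: IssuePolicy, send: Callable[[ToolCall], None]) -> bool:
    """Apply the policy before sending; return True only if something left the runtime."""
    projected = policy(call)
    if projected is None:
        return False    # nothing was disclosed to the destination
    send(projected)     # the observer sees only the projected call
    return True
```

The design point is that `policy` runs strictly before `send`, so whatever the destination records is the projected call rather than the original one.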

If this is right

  • Commit-time or later privacy mechanisms cannot undo disclosures already received by external services.
  • Agent runtimes must enforce changes to speculative calls before they are sent if inference reduction is required.
  • Read-only restrictions and allow-lists do not address the leakage because the call itself is the disclosure.
  • Any speculative execution path that reaches an external service creates an unrecoverable record of the projected intent.
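The first and last points above can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not the paper's runtime: `Observer` and `AgentRuntime` are hypothetical classes, and the query string is invented. It shows that cleanup on branch abandonment only touches the agent's local state, while the external record stays in place.

```python
class Observer:
    """Stand-in for an external service that records every request it receives."""

    def __init__(self):
        self.log = []

    def receive(self, arguments: dict) -> dict:
        self.log.append(dict(arguments))  # the disclosure happens here, at issue time
        return {"status": "ok"}


class AgentRuntime:
    """Toy runtime that speculates on a branch and may later abandon it."""

    def __init__(self, observer: Observer):
        self.observer = observer
        self.pending = {}  # branch id -> locally cached speculative result

    def speculate(self, branch: str, arguments: dict) -> None:
        self.pending[branch] = self.observer.receive(arguments)

    def abandon(self, branch: str) -> None:
        # Commit-time cleanup: removes local state only, cannot reach the observer's log.
        self.pending.pop(branch, None)


observer = Observer()
runtime = AgentRuntime(observer)
runtime.speculate("b1", {"query": "oncology clinics near me"})
runtime.abandon("b1")
# runtime.pending is now empty, but observer.log still holds the speculative query.
```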

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Runtime designers may need to expose policy hooks at the exact moment a speculative call is formed rather than after dispatch.
  • Similar leakage patterns could appear in other latency-hiding techniques that project future actions to external parties.
  • Testing whether inference reductions hold for new tool sets or different user populations would directly test generalizability.

Load-bearing premise

The three evaluation corpora capture typical user intents and tool patterns so that the measured reductions in inference apply more broadly.

What would settle it

A measurement showing that an external observer's accuracy at inferring user intent from a ghost call drops when only a post-commitment filter or access-control allow-list is applied would falsify the central claim.
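One way to picture such a measurement is a replay that compares observer accuracy across policies on the same traces. The sketch below is a toy under stated assumptions: the traces, the `guess_intent` observer, and the policy functions are all hypothetical and are not the paper's corpora, metric, or policies.

```python
from typing import Callable, Optional

Policy = Callable[[dict], Optional[dict]]


def observer_accuracy(traces, policy: Policy, guess_intent) -> float:
    """Fraction of traces where the observer's guess matches the true intent."""
    correct = 0
    for true_intent, ghost_args in traces:
        seen = policy(ghost_args)  # what actually leaves the runtime, or None
        guess = guess_intent(seen) if seen is not None else None
        correct += int(guess == true_intent)
    return correct / len(traces)


def no_policy(args: dict) -> Optional[dict]:
    return args  # plain speculative dispatch


def post_hoc_filter(args: dict) -> Optional[dict]:
    # Modeled as running after dispatch, so the observer already holds the full call.
    return args


def issue_time_suppress(args: dict) -> Optional[dict]:
    return None  # the call never leaves the runtime


def guess_intent(seen: dict):
    # Toy observer: reads the intent straight off an argument field.
    return seen.get("topic")


# Hypothetical (true intent, ghost-call arguments) pairs.
traces = [("medical", {"topic": "medical"}), ("travel", {"topic": "travel"})]

baseline = observer_accuracy(traces, no_policy, guess_intent)
post_hoc = observer_accuracy(traces, post_hoc_filter, guess_intent)
issue_time = observer_accuracy(traces, issue_time_suppress, guess_intent)
# The central claim would be falsified by data where post_hoc < baseline.
```

In this toy, `post_hoc` equals `baseline` by construction because the filter is modeled as acting after dispatch; a real test would need an observer model and traces that do not bake in that outcome.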

Figures

Figures reproduced from arXiv: 2606.02483 by Akhil Arora, Bardia Mohammadi, Lars Klein, Laurent Bindschaedler.

Figure 1: Motivating trace: user-facing two-turn dia…
Figure 2: Paired-replay leakage by policy on 150 tasks and three seeds (…

Original abstract

Tool-augmented language agents speculatively issue likely future tool calls to hide latency, but those calls leak inferred user intent to external services before the agent commits to the branch. Every external observer that received the call retains the disclosure after the agent abandons the branch. Timing is the issue, not authorization: no commit-time cleanup, read-only restriction, or access-control allow-list unsends what an observer already holds. We call these invocations ghost tool calls and propose Speculative Tool Privacy Contracts, a runtime abstraction that treats observation before commitment as a first-class effect, distinct from state mutation. We implement the contracts in a prototype runtime and evaluate twelve policies across three corpora. Speculative dispatch increases what an observer can infer about user intent; post-hoc filters, read-only restrictions, and access-control allow-lists leave that inference intact; only issue-time policies that change or suppress the speculative call's argument or destination projection before dispatch reduce it.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript claims that speculative tool calls in tool-augmented language agents create 'ghost tool calls' that leak inferred user intent to external services before commitment, and that this is a timing issue unaddressed by commit-time cleanups, read-only restrictions, or access-control lists. It introduces Speculative Tool Privacy Contracts as a runtime abstraction treating pre-commitment observation as a first-class effect, implements them in a prototype, and evaluates twelve policies across three corpora to conclude that only issue-time policies changing or suppressing the call's argument or destination projection reduce inference, while post-hoc filters, read-only, and ACL policies leave it intact.

Significance. If the empirical distinction holds, the work identifies a timing-based privacy risk in speculative agent execution that standard authorization mechanisms cannot mitigate and motivates new runtime abstractions. The prototype runtime implementation is a concrete strength that demonstrates feasibility of the contracts. The policy classification could guide agent platform design, though its scope is limited by the evaluation's dependence on the three corpora.

major comments (1)
  1. [Evaluation section] The central claim that 'only issue-time policies that change or suppress the speculative call's argument or destination projection before dispatch reduce it' while other policies leave inference intact rests on the comparison across three corpora. The manuscript supplies no details on inference measurement methods, corpus construction, or statistical controls. This prevents assessment of whether the data supports the policy distinction, and the representativeness of the corpora for typical user intents and tool usage patterns (e.g., multi-turn planning or sensitive domains) remains unaddressed, limiting generalization beyond the tested cases.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting the need for greater transparency in the evaluation methodology. We address the concern point-by-point below and will incorporate the requested details in the revised manuscript.

  1. Referee: [Evaluation section] The central claim that 'only issue-time policies that change or suppress the speculative call's argument or destination projection before dispatch reduce it' while other policies leave inference intact rests on the comparison across three corpora. The manuscript supplies no details on inference measurement methods, corpus construction, or statistical controls. This prevents assessment of whether the data supports the policy distinction, and the representativeness of the corpora for typical user intents and tool usage patterns (e.g., multi-turn planning or sensitive domains) remains unaddressed, limiting generalization beyond the tested cases.

    Authors: We agree that the Evaluation section would be strengthened by explicit descriptions of the inference measurement methods (including the models and metrics used to quantify intent leakage), the construction process for the three corpora (sources, sizes, selection criteria, and coverage of multi-turn interactions), and any statistical controls or significance testing applied to the policy comparisons. We will add a dedicated subsection detailing these elements. We will also expand the discussion of limitations to address representativeness, explicitly noting the corpora’s coverage of common planning patterns while acknowledging that generalization to highly sensitive domains would require additional validation. These additions will allow readers to better assess the empirical support for the policy distinction.

    Revision: yes

Circularity Check

0 steps flagged

No circularity in empirical policy evaluation

The paper's central claim rests on an empirical evaluation of twelve policies across three corpora, measuring inference reductions from speculative tool calls. No derivation chain reduces to self-definition, fitted parameters presented as predictions, or load-bearing self-citations; the result is obtained from direct experimental comparison rather than presupposed by the inputs or prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the domain assumption that external observers retain disclosures after branch abandonment and that the evaluation corpora allow reliable measurement of intent inference. No free parameters are mentioned. Two invented entities are introduced without independent evidence outside the paper.

axioms (1)
  • [domain assumption] Observation before commitment is a first-class effect distinct from state mutation
    Stated explicitly as the basis for treating issue-time privacy separately.
invented entities (2)
  • Ghost tool calls (no independent evidence)
    purpose: To label speculative invocations that leak intent even after abandonment
    New term coined in the paper to frame the privacy issue.
  • Speculative Tool Privacy Contracts (no independent evidence)
    purpose: Runtime abstraction enforcing issue-time privacy effects
    Proposed mechanism to address the identified problem.
