Ghost Tool Calls: Issue-Time Privacy for Speculative Agent Tools
Pith reviewed 2026-06-28 13:44 UTC · model grok-4.3
The pith
Speculative tool calls in agents leak inferred user intent to external services before any commitment, and only policies that alter or suppress the call at issue time reduce that leakage.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Speculative dispatch increases what an observer can infer about user intent; post-hoc filters, read-only restrictions, and access-control allow-lists leave that inference intact; only issue-time policies that change or suppress the speculative call's argument or destination projection before dispatch reduce it.
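The policy distinction can be made concrete with a toy model in which an observer's knowledge is just the (destination, argument) pair it receives. This is a minimal sketch under that assumption. `ToolCall`, the five policy functions, the allow-list, and the example destinations are hypothetical illustrations, not the paper's runtime or its twelve policies.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple

@dataclass(frozen=True)
class ToolCall:
    destination: str  # external service that receives the request
    argument: str     # payload that service sees
    read_only: bool   # True if the tool does not mutate remote state

# What an external observer holds after dispatch: the (destination, argument)
# projection of the call, or None if nothing was sent.
Projection = Optional[Tuple[str, str]]
Policy = Callable[[ToolCall], Optional[ToolCall]]

def dispatch(call: Optional[ToolCall]) -> Projection:
    """Send the call (if any) and return what the observer now holds."""
    return None if call is None else (call.destination, call.argument)

ALLOW_LIST = {"flights.example", "hotels.example"}  # hypothetical destinations

def post_hoc_filter(call: ToolCall) -> Optional[ToolCall]:
    # Filters the *result* after commit; the request itself goes out unchanged.
    return call

def read_only_restriction(call: ToolCall) -> Optional[ToolCall]:
    # Blocks only calls that would mutate remote state.
    return call if call.read_only else None

def allow_list(call: ToolCall) -> Optional[ToolCall]:
    # Blocks only destinations outside the allow-list.
    return call if call.destination in ALLOW_LIST else None

def suppress(call: ToolCall) -> Optional[ToolCall]:
    # Issue-time: do not send the speculative call at all.
    return None

def generalize(call: ToolCall) -> Optional[ToolCall]:
    # Issue-time: coarsen the argument (keep only its first token) before sending.
    return replace(call, argument=call.argument.split()[0])

speculative = ToolCall("hotels.example", "Boston oncology center", read_only=True)
baseline = dispatch(speculative)

for name, policy in [
    ("post_hoc_filter", post_hoc_filter),
    ("read_only_restriction", read_only_restriction),
    ("allow_list", allow_list),
    ("suppress", suppress),
    ("generalize", generalize),
]:
    seen = dispatch(policy(speculative))
    print(name, seen, "unchanged" if seen == baseline else "changed")
```

Because the example call is read-only and addressed to an allowed destination, the first three policies let it through and the observer holds exactly the baseline projection; only `suppress` and `generalize`, which act before dispatch, change what leaves the runtime.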
What carries the argument
Speculative Tool Privacy Contracts, a runtime abstraction that treats observation before commitment as a first-class effect distinct from state mutation.
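One way to picture "observation as a first-class effect" is to give each tool an effect set in which disclosure to an external party is tracked separately from mutation. The sketch below assumes a tool can be summarized by two booleans; `Effect`, `ToolSpec`, and both gate functions are hypothetical names, not the contracts' actual interface.

```python
from dataclasses import dataclass
from enum import Flag

class Effect(Flag):
    NONE = 0
    OBSERVE = 1  # the request crosses the trust boundary and is seen by a third party
    MUTATE = 2   # the request changes remote state

@dataclass(frozen=True)
class ToolSpec:
    name: str
    external: bool  # hypothetical attribute: does the request leave the runtime?
    mutates: bool   # hypothetical attribute: does the tool change remote state?

def effects(tool: ToolSpec) -> Effect:
    """Collect the effects a call to this tool would have."""
    e = Effect.NONE
    if tool.external:
        e |= Effect.OBSERVE
    if tool.mutates:
        e |= Effect.MUTATE
    return e

def mutation_only_gate(tool: ToolSpec) -> bool:
    """Latency-hiding rule: any call without side effects may be speculated."""
    return Effect.MUTATE not in effects(tool)

def contract_gate(tool: ToolSpec) -> bool:
    """Observation-aware rule: speculate with no issue-time policy only if the
    call has no effect at all; external calls must pass through a policy first."""
    return effects(tool) == Effect.NONE

search = ToolSpec("web_search", external=True, mutates=False)
book = ToolSpec("book_flight", external=True, mutates=True)
calc = ToolSpec("local_calculator", external=False, mutates=False)

for tool in (search, book, calc):
    print(tool.name, effects(tool), mutation_only_gate(tool), contract_gate(tool))
```

A gate that only looks at `MUTATE` lets the read-only `web_search` call through speculatively, while the observation-aware gate flags it because the request itself reaches a third party.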
If this is right
- Commit-time or later privacy mechanisms cannot undo disclosures already received by external services.
- Agent runtimes must enforce changes to speculative calls before they are sent if inference reduction is required.
- Read-only restrictions and allow-lists do not address the leakage because the call itself is the disclosure.
- Any speculative execution path that reaches an external service creates an unrecoverable record of the projected intent.
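The irreversibility point can be illustrated with a toy model, assuming the external service keeps a request log that the agent runtime cannot edit. `ExternalService`, `Branch`, `speculate`, and `abandon` are hypothetical; the sketch only shows where the disclosure is recorded.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExternalService:
    """Stand-in for a third-party API that keeps its own request log."""
    name: str
    received: List[str] = field(default_factory=list)

    def handle(self, argument: str) -> str:
        self.received.append(argument)  # the disclosure happens here, at issue time
        return f"result for {argument!r}"

@dataclass
class Branch:
    """Agent-local state for one speculative branch."""
    pending_results: List[str] = field(default_factory=list)

def speculate(branch: Branch, service: ExternalService, argument: str) -> None:
    branch.pending_results.append(service.handle(argument))

def abandon(branch: Branch) -> None:
    """Commit-time cleanup: it can only discard state the agent owns."""
    branch.pending_results.clear()

maps = ExternalService("maps.example")
branch = Branch()
speculate(branch, maps, "route to urgent care clinic")
abandon(branch)
print(branch.pending_results)  # []
print(maps.received)           # ['route to urgent care clinic']
```

Abandoning the branch empties the agent's local results, but the service's log still holds the speculative argument; no cleanup on the agent side can reach it.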
Where Pith is reading between the lines
- Runtime designers may need to expose policy hooks at the exact moment a speculative call is formed rather than after dispatch.
- Similar leakage patterns could appear in other latency-hiding techniques that project future actions to external parties.
- Testing whether inference reductions hold for new tool sets or different user populations would directly test generalizability.
Load-bearing premise
The three evaluation corpora capture typical user intents and tool patterns so that the measured reductions in inference apply more broadly.
What would settle it
A measurement showing that, once a ghost call has been received, an external observer's accuracy at inferring user intent is lower when only a post-commitment filter or access-control allow-list is applied than when no policy is applied would falsify the central claim.
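The falsification test can be operationalized with a simple observer model. The referee notes below that the manuscript does not describe its inference measurement, so this sketch assumes its own estimator: the observer predicts the most frequent intent for each projection it has seen, and accuracy is the fraction of items recovered. The data, policy functions, and estimator are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import List, Optional, Tuple

Projection = Optional[Tuple[str, str]]  # (destination, argument), or None if nothing was sent

def observer_accuracy(samples: List[Tuple[str, Projection]]) -> float:
    """Best accuracy any observer that maps each projection to a single intent can
    reach on these samples (majority intent per projection, fit on the same data)."""
    by_projection = defaultdict(Counter)
    for intent, proj in samples:
        by_projection[proj][intent] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_projection.values())
    return correct / len(samples)

def no_policy(proj: Projection) -> Projection:
    return proj

def post_hoc_filter(proj: Projection) -> Projection:
    return proj  # acts on results after commit; the sent request is unchanged

def suppress(proj: Projection) -> Projection:
    return None  # issue-time: nothing reaches the observer

toy = [  # hypothetical (true intent, speculative call projection) pairs
    ("travel", ("flights.example", "BOS-SFO")),
    ("travel", ("flights.example", "BOS-SEA")),
    ("medical", ("maps.example", "urgent care")),
    ("medical", ("maps.example", "pharmacy")),
]

for name, policy in [("no_policy", no_policy), ("post_hoc_filter", post_hoc_filter), ("suppress", suppress)]:
    observed = [(intent, policy(proj)) for intent, proj in toy]
    print(f"{name}: {observer_accuracy(observed):.2f}")
```

On this toy data the script prints 1.00, 1.00, and 0.50: the post-commitment filter leaves the observer's accuracy where it was, while suppression drops it to the base rate. A post-commitment policy that lowered this number on real corpora is the kind of result that would contradict the claim.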
Original abstract
Tool-augmented language agents speculatively issue likely future tool calls to hide latency, but those calls leak inferred user intent to external services before the agent commits to the branch. Every external observer that received the call retains the disclosure after the agent abandons the branch. Timing is the issue, not authorization: no commit-time cleanup, read-only restriction, or access-control allow-list unsends what an observer already holds. We call these invocations ghost tool calls and propose Speculative Tool Privacy Contracts, a runtime abstraction that treats observation before commitment as a first-class effect, distinct from state mutation. We implement the contracts in a prototype runtime and evaluate twelve policies across three corpora. Speculative dispatch increases what an observer can infer about user intent; post-hoc filters, read-only restrictions, and access-control allow-lists leave that inference intact; only issue-time policies that change or suppress the speculative call's argument or destination projection before dispatch reduce it.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that speculative tool calls in tool-augmented language agents create 'ghost tool calls' that leak inferred user intent to external services before commitment, and that this is a timing issue unaddressed by commit-time cleanups, read-only restrictions, or access-control lists. It introduces Speculative Tool Privacy Contracts as a runtime abstraction treating pre-commitment observation as a first-class effect, implements them in a prototype, and evaluates twelve policies across three corpora to conclude that only issue-time policies changing or suppressing the call's argument or destination projection reduce inference, while post-hoc filters, read-only, and ACL policies leave it intact.
Significance. If the empirical distinction holds, the work identifies a timing-based privacy risk in speculative agent execution that standard authorization mechanisms cannot mitigate and motivates new runtime abstractions. The prototype runtime implementation is a concrete strength that demonstrates feasibility of the contracts. The policy classification could guide agent platform design, though its scope is limited by the evaluation's dependence on the three corpora.
major comments (1)
- [Evaluation section] The central claim that 'only issue-time policies that change or suppress the speculative call's argument or destination projection before dispatch reduce it', while other policies leave inference intact, rests on the comparison across three corpora. The manuscript supplies no details on inference measurement methods, corpus construction, or statistical controls. This prevents assessment of whether the data support the policy distinction, and the representativeness of the corpora for typical user intents and tool-usage patterns (e.g., multi-turn planning or sensitive domains) remains unaddressed, limiting generalization beyond the tested cases.
Simulated Author's Rebuttal
We thank the referee for highlighting the need for greater transparency in the evaluation methodology. We address the concern point-by-point below and will incorporate the requested details in the revised manuscript.
Point-by-point responses
Referee: [Evaluation section] The central claim that 'only issue-time policies that change or suppress the speculative call's argument or destination projection before dispatch reduce it', while other policies leave inference intact, rests on the comparison across three corpora. The manuscript supplies no details on inference measurement methods, corpus construction, or statistical controls. This prevents assessment of whether the data support the policy distinction, and the representativeness of the corpora for typical user intents and tool-usage patterns (e.g., multi-turn planning or sensitive domains) remains unaddressed, limiting generalization beyond the tested cases.
Authors: We agree that the Evaluation section would be strengthened by explicit descriptions of the inference measurement methods (including the models and metrics used to quantify intent leakage), the construction process for the three corpora (sources, sizes, selection criteria, and coverage of multi-turn interactions), and any statistical controls or significance testing applied to the policy comparisons. We will add a dedicated subsection detailing these elements. We will also expand the discussion of limitations to address representativeness, explicitly noting the corpora’s coverage of common planning patterns while acknowledging that generalization to highly sensitive domains would require additional validation. These additions will allow readers to better assess the empirical support for the policy distinction. revision: yes
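As one illustration of the significance testing the rebuttal promises, two policies evaluated on the same items could be compared with an exact McNemar test on per-item observer correctness. This is a standard paired test offered under the assumption that each corpus item yields a binary correct or incorrect inference; it is not a description of what the authors did, and the outcome vectors below are made up.

```python
from math import comb
from typing import Sequence

def mcnemar_exact(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for paired binary outcomes.

    correct_a[i] / correct_b[i]: whether the observer inferred item i's intent
    correctly under policy A / policy B (same items, same order)."""
    if len(correct_a) != len(correct_b):
        raise ValueError("outcomes must be paired item by item")
    # Discordant pairs: only items where the two policies disagree carry information.
    a_only = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    b_only = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = a_only + b_only
    if n == 0:
        return 1.0
    # Under H0 each discordant pair is equally likely to go either way: Binomial(n, 1/2).
    tail = sum(comb(n, k) for k in range(min(a_only, b_only) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical per-item observer correctness on 10 items.
post_hoc = [True] * 10
suppressed = [True] * 2 + [False] * 8
print(mcnemar_exact(post_hoc, suppressed))  # 0.0078125
```

Only items on which the two policies disagree enter the test, which suits policy comparisons where many items are unaffected by either policy.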
Circularity Check
No circularity in empirical policy evaluation
The paper's central claim rests on an empirical evaluation of twelve policies across three corpora, measuring inference reductions from speculative tool calls. No derivation chain reduces to self-definition, fitted parameters presented as predictions, or load-bearing self-citations; the result is obtained from direct experimental comparison rather than presupposed by the inputs or prior author work.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: observation before commitment is a first-class effect distinct from state mutation.
invented entities (2)
- Ghost tool calls: no independent evidence
- Speculative Tool Privacy Contracts: no independent evidence