Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs
Function calling, also known as tool use, is a core capability of modern LLM agents but is typically constrained by synchronous execution semantics. Under these semantics, LLM decoding is blocked until each function call completes, which increases end-to-end latency. In this work, we introduce AsyncFC, a pure execution-layer framework that decouples LLM decoding from function execution, enabling overlap between model decoding and function execution as well as inter-function parallelism when dependencies permit. AsyncFC layers over existing models and unmodified function implementations, requiring no fine-tuning and no changes to the standard synchronous function-calling protocol. Across standard function-calling benchmarks and adapted software engineering benchmarks, AsyncFC significantly reduces end-to-end task completion time while preserving task accuracy. Furthermore, these results reveal that LLMs possess a native capability to reason over symbolic futures that represent unresolved execution results, enabling an asynchronous paradigm for model-tool interaction.
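The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of future-based asynchronous calling, not AsyncFC itself. Everything in it is an assumption made for illustration: the `FutureRegistry` class, the `<future_N>` handle format, and the three toy tools are hypothetical names. It shows a call returning a symbolic handle immediately, independent calls running in parallel, and a dependent call waiting only on the handles it takes as arguments.

```python
import asyncio
import time


class FutureRegistry:
    """Hypothetical helper: maps symbolic handles to running asyncio tasks."""

    def __init__(self):
        self._tasks = {}
        self._counter = 0

    def submit(self, func, *args):
        """Start func without blocking and return a symbolic handle at once."""
        self._counter += 1
        handle = f"<future_{self._counter}>"  # assumed handle format
        self._tasks[handle] = asyncio.create_task(self._run(func, args))
        return handle

    async def _run(self, func, args):
        # Arguments that are handles of earlier calls are awaited first,
        # so a dependent call waits only on the results it actually needs.
        resolved = [
            await self._tasks[a] if isinstance(a, str) and a in self._tasks else a
            for a in args
        ]
        # The synchronous tool is left unmodified and runs in a worker thread.
        return await asyncio.to_thread(func, *resolved)

    async def resolve(self, handle):
        """Block until the result behind a handle is available."""
        return await self._tasks[handle]


# Toy synchronous tools (hypothetical); sleep stands in for I/O latency.
def fetch_weather(city):
    time.sleep(0.2)
    return f"sunny in {city}"


def fetch_time(city):
    time.sleep(0.2)
    return f"12:00 in {city}"


def summarize(weather, local_time):
    return f"{weather}; {local_time}"


async def demo():
    registry = FutureRegistry()
    start = time.perf_counter()
    w = registry.submit(fetch_weather, "Oslo")  # returns a handle immediately
    t = registry.submit(fetch_time, "Oslo")     # runs in parallel with the first
    s = registry.submit(summarize, w, t)        # depends on both handles
    await asyncio.sleep(0.2)                    # stand-in for continued decoding
    result = await registry.resolve(s)
    return result, time.perf_counter() - start


def run_demo():
    return asyncio.run(demo())
```

Calling `run_demo()` returns the summary string together with the elapsed time. In this toy setup the two fetches and the simulated decoding overlap, so the total should be well under the sum of the individual delays that a fully blocking loop would incur.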
Forward citations
Cited by 1 Pith paper
- Ghost Tool Calls: Issue-Time Privacy for Speculative Agent Tools
Ghost tool calls from speculative dispatch create persistent intent leaks that only issue-time policies changing or suppressing call arguments or destinations can reduce, per evaluations of twelve policies on three corpora.