
arXiv: 2605.15077 · v1 · pith:MQTZIXSInew · submitted 2026-05-14 · 💻 cs.CL · cs.AI · cs.LG

Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs

classification: cs.CL · cs.AI · cs.LG
keywords: function, execution, asyncfc, decoding, asynchronous, benchmarks, calling, capability

Function calling, also known as tool use, is a core capability of modern LLM agents but is typically constrained by synchronous execution semantics. Under these semantics, LLM decoding is blocked until each function call completes, which increases end-to-end latency. In this work, we introduce AsyncFC, a pure execution-layer framework that decouples LLM decoding from function execution, enabling overlap between model decoding and function execution as well as inter-function parallelism when dependencies permit. AsyncFC layers over existing models and unmodified function implementations, requiring no fine-tuning or changes to the standard synchronous function-calling protocol. Across standard function-calling benchmarks and adapted software engineering benchmarks, AsyncFC significantly reduces end-to-end task completion time while preserving task accuracy. Furthermore, these results reveal that LLMs possess a native capability to reason over symbolic futures that represent unresolved execution results, enabling an asynchronous paradigm for model-tool interaction.
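
The general idea of a symbolic future can be illustrated with a small sketch. This is not the AsyncFC implementation, which the abstract does not describe in detail; it is a minimal illustration using Python's `asyncio`, under the assumption that each function call returns a placeholder handle immediately and that later calls may pass such handles as arguments. The names `FutureTable`, `submit`, `resolve`, the `$f0`-style handles, and the three toy tools are all hypothetical.

```python
import asyncio
import time


# Hypothetical tools; asyncio.sleep stands in for real I/O latency.
async def get_weather(city: str) -> str:
    await asyncio.sleep(0.2)
    return f"sunny in {city}"


async def get_population(city: str) -> int:
    await asyncio.sleep(0.2)
    return 1000


async def summarize(weather: str, population: int) -> str:
    await asyncio.sleep(0.1)
    return f"{weather}; population {population}"


class FutureTable:
    """Maps symbolic handles such as "$f0" to background tasks (illustrative only)."""

    def __init__(self) -> None:
        self._tasks: dict[str, asyncio.Task] = {}

    def submit(self, fn, *args) -> str:
        # Return a handle at once; the call itself runs in the background.
        handle = f"$f{len(self._tasks)}"
        self._tasks[handle] = asyncio.create_task(self._run(fn, args))
        return handle

    async def _run(self, fn, args):
        # Arguments that are known handles are awaited first, so a dependent
        # call starts only after its inputs have resolved.
        resolved = []
        for arg in args:
            if isinstance(arg, str) and arg in self._tasks:
                resolved.append(await self._tasks[arg])
            else:
                resolved.append(arg)
        return await fn(*resolved)

    async def resolve(self, handle: str):
        return await self._tasks[handle]


async def run_demo():
    table = FutureTable()
    start = time.perf_counter()
    w = table.submit(get_weather, "Oslo")      # independent call
    p = table.submit(get_population, "Oslo")   # runs in parallel with the first
    s = table.submit(summarize, w, p)          # depends on both futures
    await asyncio.sleep(0.1)                   # stand-in for overlapping decoding
    result = await table.resolve(s)
    elapsed = time.perf_counter() - start
    return [w, p, s], result, elapsed


if __name__ == "__main__":
    handles, result, elapsed = asyncio.run(run_demo())
    print(handles, result, f"{elapsed:.2f}s")
```

In this toy setup the two independent calls overlap with each other and with the simulated decoding step, so the total wall-clock time is close to the longest dependency chain rather than the sum of all delays. How a real model is prompted to reason over such handles is outside the scope of this sketch.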
