Asynchronous Reasoning: Training-Free Interactive Thinking LLMs

2025 · cs.LG · arXiv 2512.10931

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Many state-of-the-art LLMs are trained to think before giving their answer. Reasoning can greatly improve language model capabilities, but it also makes them less interactive: given a new input, a model must stop thinking before it can respond. Real-world use cases such as voice-based or embodied assistants require an LLM agent to respond and adapt to additional information in real time, which is incompatible with sequential interactions. In contrast, humans can listen, think, and act asynchronously: we begin thinking about the problem while reading it and continue thinking while formulating the answer. In this work, we augment LLMs capable of reasoning to operate in a similar way without additional training. Our method uses the properties of positional embeddings to enable LLMs built for sequential generation to simultaneously think, listen, and write outputs. We evaluate our approach on math, commonsense, and safety reasoning: it allows models to generate accurate thinking-augmented answers while reducing time to first non-thinking token from minutes to $\le 5$s and the overall delays by up to $12\times$.

representative citing papers

Speculative Interaction Agents: Building Real-Time Agents with Asynchronous I/O and Speculative Tool Calling

cs.LG · 2026-05-13 · unverdicted · novelty 6.0 · 2 refs

Speculative Interaction Agents achieve 1.3-2.2x speedups for real-time tool-calling agents via async I/O decoupling and speculative calls, with clock-based training for small edge models.

citing papers explorer

Showing 1 of 1 citing paper.

Speculative Interaction Agents: Building Real-Time Agents with Asynchronous I/O and Speculative Tool Calling

cs.LG · 2026-05-13 · unverdicted · none · ref 14 · 2 links · internal anchor

Speculative Interaction Agents achieve 1.3-2.2x speedups for real-time tool-calling agents via async I/O decoupling and speculative calls, with clock-based training for small edge models.
