Asynchronous Reasoning: Training-Free Interactive Thinking LLMs
Many state-of-the-art LLMs are trained to think before giving their answer. Reasoning can greatly improve language model capabilities, but it also makes them less interactive: given a new input, a model must stop thinking before it can respond. Real-world use cases such as voice-based or embodied assistants require an LLM agent to respond and adapt to additional information in real time, which is incompatible with sequential interactions. In contrast, humans can listen, think, and act asynchronously: we begin thinking about the problem while reading it and continue thinking while formulating the answer. In this work, we augment LLMs capable of reasoning to operate in a similar way without additional training. Our method uses the properties of positional embeddings to enable LLMs built for sequential generation to simultaneously think, listen, and write outputs. We evaluate our approach on math, commonsense, and safety reasoning: it allows models to generate accurate thinking-augmented answers while reducing time to first non-thinking token from minutes to ${\le}$ 5s and the overall delays by up to $12{\times}$.
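The mechanism the abstract hints at can be illustrated with a minimal sketch. This is not the paper's code: it only assumes, hypothetically, that two token streams (thinking and answer) can each keep their own contiguous position ids, so a model built for sequential generation sees each stream as an unbroken sequence even though tokens are produced interleaved in wall-clock time. All names (`interleave_streams`, the sample tokens) are illustrative.

```python
# Hypothetical sketch (not the paper's implementation): round-robin two
# token streams while assigning each stream its own monotonically
# increasing position counter. The interleaved generation order models
# "thinking while answering"; the per-stream counters model the use of
# positional embeddings to keep each stream internally sequential.

def interleave_streams(thinking, answer, start_positions=(0, 0)):
    """Interleave two token streams; each keeps a private position counter."""
    positions = list(start_positions)
    timeline = []  # (stream_name, token, position_id) in generation order
    iters = [iter(thinking), iter(answer)]
    names = ["think", "answer"]
    exhausted = [False, False]
    while not all(exhausted):
        for idx in range(2):
            if exhausted[idx]:
                continue
            tok = next(iters[idx], None)
            if tok is None:
                exhausted[idx] = True
                continue
            timeline.append((names[idx], tok, positions[idx]))
            positions[idx] += 1
    return timeline

timeline = interleave_streams(["let", "me", "check"], ["the", "sum", "is", "4"])
think_pos = [p for name, _, p in timeline if name == "think"]
answer_pos = [p for name, _, p in timeline if name == "answer"]
# Each stream sees contiguous positions despite interleaved generation,
# which is why time-to-first-answer-token can drop: answer tokens start
# at position 0 of their own stream instead of waiting for thinking to end.
```

Under this toy assumption, a real system would additionally need the attention mask and KV cache to keep the streams consistent; the sketch only shows the position-id bookkeeping.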
Forward citations
Cited by 2 Pith papers
- Speculative Interaction Agents: Building Real-Time Agents with Asynchronous I/O and Speculative Tool Calling
  Asynchronous I/O and speculative tool calling cut latency in tool-calling LLM agents by 1.3-2.2x with only minor accuracy loss on cloud and edge models.
- Speculative Interaction Agents: Building Real-Time Agents with Asynchronous I/O and Speculative Tool Calling
  Speculative Interaction Agents achieve 1.3-2.2x speedups for real-time tool-calling agents via async I/O decoupling and speculative calls, with clock-based training for small edge models.