
arxiv: 2605.23668 · v1 · pith:WYMIQPYH · submitted 2026-05-22 · 💻 cs.CL · cs.AI

OnePred: Next-Query Prediction via Recursive Intent Memory in Multi-Turn Conversations

Pith reviewed 2026-05-25 04:13 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords next-query prediction, multi-turn dialogue, intent memory, recursive compression, proactive conversation, reinforcement learning, NQP-Bench, token efficiency

The pith

OnePred predicts the next user query using only a recursively updated intent memory instead of the full conversation history.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that next-query prediction in multi-turn dialogues can be done accurately without re-reading raw history, by maintaining a compact memory that tracks the user's evolving intent trajectory. This matters because conversational LLMs process millions of dialogues daily yet remain reactive, and full-history inputs cause linearly growing token costs while truncation loses cross-turn context. OnePred keeps per-turn cost independent of conversation length through recursive memory updates, and the model is trained with a two-stage reinforcement learning process that first teaches what to predict and then what to compress into an intent chain. On the new NQP-Bench, which spans three subsets, the approach uses up to 22 times fewer tokens per turn than full-history inputs while outperforming baselines, with larger gains on longer conversations.

Core claim

Accurate next-query prediction does not require re-reading raw dialogue history; it suffices to track the user's evolving intent trajectory across topics, unresolved needs, and interest shifts by maintaining a recursively updated memory as the sole cross-turn context. The memory is shaped into a prediction-oriented intent chain through a two-stage reinforcement learning pipeline that first teaches what to predict and then what to compress. This bounds per-turn token consumption independently of conversation length and improves prediction quality over full-history and truncated baselines on the introduced NQP-Bench.
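The bounded-cost claim can be stated as a worked comparison. The symbols here are introduced for illustration and are not given on this page: assume every query is at most L_q tokens, every response at most L_r tokens, and the memory is capped at B tokens. The model input at turn t is then:

```latex
\begin{aligned}
C^{\text{full}}_t &= \sum_{i=1}^{t} \bigl(|q_i| + |r_i|\bigr) \;\le\; t\,(L_q + L_r) = O(t), \\
C^{\text{mem}}_t  &= |m_{t-1}| + |q_t| + |r_t| \;\le\; B + L_q + L_r = O(1).
\end{aligned}
```

The full-history input grows with t while the memory input stays under a constant, so the gap between them widens as conversations get longer.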

What carries the argument

Recursive Intent Memory: a compressed, recursively updated representation of the user's intent trajectory that serves as the only cross-turn context for prediction.
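The loop below is a minimal sketch of this recursion, following the Figure 2 caption: at each turn the model sees only the previous memory m_{t-1} and the current pair (q_t, r_t). The `update` and `predict` callables are hypothetical stand-ins for LLM calls; this page does not say whether OnePred writes the memory and the prediction in one generation or two. The toy functions are placeholders, not the trained model.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces standing in for LLM calls (not OnePred's actual API).
UpdateFn = Callable[[str, str, str], str]  # (m_{t-1}, q_t, r_t) -> m_t
PredictFn = Callable[[str], str]           # m_t -> predicted q_{t+1}


def run_recursive_memory(
    turns: List[Tuple[str, str]],
    update: UpdateFn,
    predict: PredictFn,
    initial_memory: str = "",
) -> Tuple[List[str], List[str]]:
    """Return (predictions, memories), one entry per turn.

    At each step only the previous memory and the current (query, response)
    pair reach the model; earlier raw turns are never re-read.
    """
    memory = initial_memory
    predictions: List[str] = []
    memories: List[str] = []
    for query, response in turns:
        memory = update(memory, query, response)  # m_t = f(m_{t-1}, q_t, r_t)
        memories.append(memory)
        predictions.append(predict(memory))       # guess q_{t+1} from m_t only
    return predictions, memories


# Toy stand-ins: keep the last `max_items` queries as the "intent chain".
# A trained model would write a learned summary; this toy ignores responses.
def toy_update(memory: str, query: str, response: str, max_items: int = 3) -> str:
    items = [x for x in memory.split(" | ") if x] + [query]
    return " | ".join(items[-max_items:])


def toy_predict(memory: str) -> str:
    return "follow-up on: " + memory.split(" | ")[-1]


turns = [("q1", "r1"), ("q2", "r2"), ("q3", "r3"), ("q4", "r4")]
predictions, memories = run_recursive_memory(turns, toy_update, toy_predict)
```

Capping the toy memory at a fixed number of items is what keeps each step's input bounded; in the paper that role is played by the learned compression, not a hard-coded window.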

If this is right

  • Per-turn token consumption remains bounded regardless of how many turns the conversation reaches.
  • Prediction quality improves over full-history inputs especially as conversation length grows.
  • The two-stage RL process produces a memory that functions as a standalone intent chain for downstream prediction.
  • Proactive interaction becomes feasible in production systems without linear compute scaling.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same recursive memory could support other proactive tasks such as suggesting follow-up actions or detecting intent shifts in real time.
  • Deploying this approach might lower latency in live chat systems by eliminating the need to re-process growing histories on each turn.
  • The method could transfer to non-conversational multi-turn settings like sequential decision making where only a compact state summary is retained.

Load-bearing premise

Accurate next-query prediction does not require re-reading raw history and can instead rely solely on a recursively updated memory that tracks the user's evolving intent trajectory across topics, unresolved needs, and interest shifts.

What would settle it

A direct comparison on conversations longer than those in NQP-Bench in which the memory-based model shows lower next-query accuracy than a full-history baseline while using far fewer tokens.
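One way to operationalize this test is to bucket evaluation conversations by turn count and compare the two systems' mean scores per bucket. The sketch assumes per-conversation quality scores already exist; the record layout, bucket size, and scoring scale are illustrative assumptions, not the paper's protocol, and the toy numbers are not results.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# (num_turns, memory_model_score, full_history_score); the scoring scheme is assumed.
Record = Tuple[int, float, float]


def gap_by_length(records: List[Record], bucket_size: int = 4) -> Dict[int, float]:
    """Mean (memory - full-history) score per turn-count bucket.

    Keys are bucket lower edges (0, 4, 8, ...). A negative mean in the longest
    buckets is the pattern that would count against the memory-only claim.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for num_turns, mem_score, full_score in records:
        key = (num_turns // bucket_size) * bucket_size
        buckets[key].append(mem_score - full_score)
    return {key: mean(diffs) for key, diffs in sorted(buckets.items())}


def refuting_buckets(gaps: Dict[int, float]) -> List[int]:
    """Buckets where the memory model scores below the full-history baseline."""
    return [key for key, gap in gaps.items() if gap < 0]


# Toy scores for illustration only, not results from the paper.
toy = [(2, 3, 3), (3, 4, 2), (9, 4, 3), (10, 2, 4)]
gaps = gap_by_length(toy)
```

Reporting per-bucket gaps rather than a single average matters here, because the claim under test is specifically about longer conversations.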

Figures

Figures reproduced from arXiv: 2605.23668 by Bowen Zhang, Da Zhu, Guanjun Jiang, Jiangwang Chen, Jiazheng Kang, Xiao Yang, Zixin Song.

Figure 1
Figure 1: Overview of the NQP-Bench construction pipeline. The process integrates heuristic filtering, a two-stage LLM cascade for rigorous predictability screening, and a retroactive truncation mining strategy to salvage high-quality conversation prefixes from the DROP pool. …ity; and NQP-Share, derived from ShareChat (Yan et al., 2025) for cross-source generalization evaluation. NQP-Bench targets context-grounde…
Figure 2
Figure 2: Overview of OnePred. Top: the recursive intent memory mechanism. At each turn, the model receives only the previous memory m_{t-1} and the current observation (q_t, r_t), and outputs an updated memory m_t. Bottom: the two-stage RL training pipeline. Stage 1 (Full-History RL) treats the entire conversation as a single-step input and directly optimizes prediction. Stage 2 (Agentic Memory RL) trains the model to pr…
Figure 3
Figure 3 quantifies this difference on NQP-Wild. Our method uses roughly 650 tokens per turn regardless of conversation length. Full-history starts at ∼2,500 tokens for a 2-turn dialogue and grows to over 14,000 tokens by turn 14, reaching a 13× gap at turn 8 and 22× at turn 14. This gap remains important even with KV caching. Although KV caching avoids redundant prefill computation for previously seen tokens, ea…
Figure 4
Figure 4: Performance by dialogue length on NQP-Wild.
Figure 5
Figure 5: Sunburst charts detailing the hierarchical intent taxonomy and their distributions. The datasets demonstrate…
Figure 7
Figure 7: Proportion of intent transfer paradigms. Over…
Figure 8
Figure 8: Distribution of sample difficulty. Nearly half…
Figure 9
Figure 9: Qualitative example from NQP-Wild. Left: abbreviated user queries from a 13-turn conversation spanning…
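The ratios quoted in the Figure 3 text can be checked with simple arithmetic. The sketch takes the approximate numbers stated there and assumes, purely for illustration, that full-history cost grows linearly between the two reported points; the paper's actual per-turn curve may differ.

```python
# Approximate values quoted in the Figure 3 text (NQP-Wild).
MEMORY_TOKENS_PER_TURN = 650   # "roughly 650 tokens per turn"
FULL_AT_TURN_2 = 2_500         # "~2,500 tokens for a 2-turn dialogue"
FULL_AT_TURN_14 = 14_000       # "over 14,000 tokens by turn 14" (so a lower bound)


def full_history_tokens(turn: int) -> float:
    """Assumed linear growth between the two reported points (illustrative only)."""
    slope = (FULL_AT_TURN_14 - FULL_AT_TURN_2) / (14 - 2)  # about 958 tokens per turn
    return FULL_AT_TURN_2 + slope * (turn - 2)


def token_ratio(turn: int) -> float:
    """Full-history input size divided by the roughly constant memory input size."""
    return full_history_tokens(turn) / MEMORY_TOKENS_PER_TURN


for turn in (2, 8, 14):
    print(f"turn {turn:2d}: ~{full_history_tokens(turn):,.0f} tokens, {token_ratio(turn):.1f}x")
```

Under this linear assumption the sketch gives about 12.7× at turn 8 and 21.5× at turn 14, consistent with the rounded 13× and 22× reported for those turns.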
Original abstract

Although large language model (LLM) conversational systems process millions of multi-turn dialogues daily, they remain fundamentally reactive: they respond only after the user types a query. A key step toward proactive interaction is next-query prediction, which anticipates the user's subsequent query based solely on the preceding dialogue. Progress on this task is hindered by the lack of dedicated benchmarks and a fundamental efficiency-quality trade-off: naively concatenating full dialogue history incurs linearly growing token consumption, while truncating to the latest turn discards crucial cross-turn context. Our key insight is that accurate prediction does not require re-reading raw history; it suffices to track the user's evolving intent trajectory across topics, unresolved needs, and interest shifts. We propose OnePred, which maintains a recursively updated memory as its sole cross-turn context, bounding the per-turn cost independently of conversation length. We train the model via a two-stage reinforcement learning pipeline that first teaches what to predict, then what to compress, shaping the memory into a prediction-oriented intent chain. To establish a rigorous testbed, we introduce NQP-Bench, spanning three diverse subsets. Experiments demonstrate that OnePred reduces per-turn token consumption by up to 22× compared to full-history inputs while consistently exceeding all baselines in prediction quality, with larger gains on longer conversations. Our code is publicly available at https://github.com/ZBWpro/OnePred.
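A minimal sketch of how training samples might differ between the two stages, following the Figure 2 caption: Stage 1 feeds the entire conversation as one single-step input, while Stage 2 has the model carry memory across steps. The `update` callable, the prompt template, and the token-overlap F1 reward are hypothetical placeholders; this page does not specify OnePred's reward, its RL algorithm, or exactly which Stage 2 outputs are rewarded (here the final target is simply attached to the rollout).

```python
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]                      # (user query, assistant response)
UpdateFn = Callable[[str, str, str], str]   # hypothetical memory-update call


def stage1_sample(history: List[Turn], next_query: str) -> Dict[str, str]:
    """Stage 1 (full-history RL): the whole prefix is one single-step prompt."""
    prompt = "\n".join(f"User: {q}\nAssistant: {r}" for q, r in history)
    return {"prompt": prompt, "target": next_query}


def stage2_rollout(history: List[Turn], next_query: str, update: UpdateFn) -> Dict[str, object]:
    """Stage 2 (agentic memory RL): the prefix is folded into memory turn by turn.

    Each step's model input is (m_{t-1}, q_t, r_t). In this sketch the
    prediction to be scored is assumed to come from the final memory alone.
    """
    memory = ""
    steps = []
    for q, r in history:
        steps.append((memory, q, r))  # what the model sees at this step
        memory = update(memory, q, r)
    return {"steps": steps, "final_memory": memory, "target": next_query}


def overlap_f1(pred: str, gold: str) -> float:
    """Placeholder reward: whitespace-token F1 between prediction and gold query."""
    p, g = pred.lower().split(), gold.lower().split()
    common = sum(min(p.count(w), g.count(w)) for w in set(p))
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


history = [("hi", "hello"), ("book a flight to Oslo", "Sure, which date?")]
s1 = stage1_sample(history, "pick a window seat")
s2 = stage2_rollout(history, "pick a window seat", lambda m, q, r: (m + " " + q).strip())
```

The contrast is the point: the Stage 1 prompt grows with the conversation, whereas every Stage 2 step carries only the current memory and one turn.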

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes OnePred for next-query prediction in multi-turn LLM conversations. It maintains a recursively updated intent memory as the sole cross-turn context to track evolving user intent, topics, unresolved needs, and shifts. A two-stage RL pipeline first teaches prediction then compression. The work introduces NQP-Bench with three subsets and reports up to 22× per-turn token reduction versus full-history baselines while exceeding all baselines in prediction quality, with gains increasing on longer conversations. Code is released publicly.

Significance. If the empirical claims hold, the bounded recursive memory approach would address a core efficiency-quality trade-off in conversational systems, enabling proactive next-query prediction without linear token growth. Public code and the new benchmark are positive contributions that could support follow-on work.

minor comments (3)
  1. The abstract states performance gains and a 22× token reduction but provides no experimental details, baseline descriptions, statistical significance, or verification of the recursive update mechanism; the full manuscript should include these in the experiments section to allow verification.
  2. Clarify the exact form of the recursive update rule and how the two-stage RL shapes the memory into a 'prediction-oriented intent chain' (mentioned in abstract); a concrete pseudocode or equation would improve reproducibility.
  3. The NQP-Bench description is high-level; add dataset statistics, construction details, and evaluation metrics in a dedicated section or table.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work on OnePred and the recommendation for minor revision. No specific major comments appear in the provided report, so we have no individual points to address.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The abstract and described method present a self-contained experimental claim: a bounded recursive memory is proposed and trained via a two-stage RL pipeline to enable next-query prediction, with quality gains measured against full-history and truncated baselines on the introduced NQP-Bench. No equations, derivations, or load-bearing steps are visible that reduce by construction to fitted parameters, self-definitions, or self-citation chains. The key assumption is explicitly positioned as the hypothesis under test rather than presupposed, and results are reported as external validation. This is the normal case of an independent empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no concrete free parameters, axioms, or invented entities can be extracted without the full manuscript and its equations or experimental sections.

pith-pipeline@v0.9.0 · 5795 in / 1086 out tokens · 18454 ms · 2026-05-25T04:13:39.562616+00:00 · methodology


Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 6 internal anchors

  [1] WildChat: 1M ChatGPT Interaction Logs in the Wild. Proceedings of the International Conference on Learning Representations (ICLR).
  [2] ShareGPT: Share your wildest ChatGPT conversations with one click. 2023.
  [3] ShareChat: A Dataset of Chatbot Conversations in the Wild. arXiv preprint arXiv:2512.17843.
  [4] LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset. Proceedings of the International Conference on Learning Representations (ICLR).
  [5] Learning to Attend, Copy, and Generate for Session-Based Query Suggestion. Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM).
  [6] Context Attentive Document Ranking and Query Suggestion. Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval.
  [7] The Ubuntu Dialogue Corpus. Proceedings of the Annual SIGdial Meeting on Discourse and Dialogue.
  [8] Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
  [10] VERL: An Extensible Framework for Post-Training of Large Language Models. arXiv preprint.
  [11] Qwen3 Technical Report. arXiv preprint.
  [12] Query recommendation using query logs in search engines. International Conference on Extending Database Technology, 2004.
  [13] Context-aware query suggestion by mining click-through and session data. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
  [14] A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management.
  [15] Leading conversational search by suggesting useful questions. Proceedings of The Web Conference 2020.
  [16] CTR-guided generative query suggestion in conversational search. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track.
  [17] From clicks to preference: A multi-stage alignment framework for generative query suggestion in conversational system. Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1.
  [18] OneSug: The unified end-to-end generative framework for e-commerce query suggestion. Proceedings of the AAAI Conference on Artificial Intelligence.
  [19] LLM-Driven Preference Data Synthesis for Proactive Prediction of the Next User Utterance in Human-Machine Dialogue. arXiv preprint arXiv:2601.09713.
  [20] Proactive human-machine conversation with explicit conversation goal. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.
  [21] A survey on proactive dialogue systems: Problems, methods, and prospects. arXiv preprint arXiv:2305.02750.
  [22] Interpersonal memory matters: A new task for proactive dialogue utilizing conversational history. Proceedings of the 29th Conference on Computational Natural Language Learning.
  [23] Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics.
  [24] Augmenting language models with long-term memory. Advances in Neural Information Processing Systems.
  [25] MemAgent: Reshaping long-context LLM with multi-conv RL-based memory agent. 2025. URL https://arxiv.org/abs/2507…
  [26] MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents. arXiv preprint arXiv:2506.15841.
  [27] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv preprint arXiv:2402.03300.
  [28] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv preprint arXiv:2501.12948.
  [29] LocalSUG: Geography-Aware LLM for Query Suggestion in Local-Life Services. arXiv preprint arXiv:2603.04946.
  [30] A technique for the measurement of attitudes. Archives of Psychology.
  [31] GLM-5: From Vibe Coding to Agentic Engineering. arXiv preprint arXiv:2602.15763.
  [32] Proactive conversational AI: A comprehensive survey of advancements and opportunities. ACM Transactions on Information Systems, 2025.
  [33] Gemini 3.1 Pro Model Card. 2026.
  [34] The Claude Model Card. 2026.
  [35] GPT-5.5 System Card. 2026.
  [36] Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities. arXiv preprint arXiv:2507.06261.