LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG
Pith reviewed 2026-05-08 10:22 UTC · model grok-4.3
The pith
LatentRAG moves multi-step reasoning and retrieval into continuous latent space to cut agentic RAG latency by roughly 90 percent while matching explicit methods on complex questions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By producing latent tokens for thoughts and subqueries directly from hidden states in a single forward pass, aligning the LLM with dense retrieval models in latent space, and adding parallel latent decoding to natural language, LatentRAG performs the multi-step retrieval and reasoning of agentic RAG without autoregressive generation of lengthy intermediate text.
What carries the argument
Latent tokens extracted from hidden states in one forward pass, aligned with a dense retriever and optionally decoded back to natural language.
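The mechanism can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual architecture: the number of latent slots, the slot-attention pooling, and the projection into the retriever's space are all assumptions standing in for unspecified details.

```python
import numpy as np

# Sketch of the latent-token step: instead of decoding thoughts and
# subqueries token-by-token, pool the final hidden states into k latent
# vectors in a single pass, then project them into the retriever's space.
rng = np.random.default_rng(0)

d_model, d_retriever, k_latent = 64, 32, 3

# Final-layer hidden states for a 10-token prompt (random stand-ins).
hidden = rng.standard_normal((10, d_model))

# One learned query vector per latent slot attends over the hidden states.
slot_queries = rng.standard_normal((k_latent, d_model))
scores = slot_queries @ hidden.T                       # (k, seq)
scores -= scores.max(axis=-1, keepdims=True)           # numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
latent_tokens = weights @ hidden                       # (k, d_model)

# A projection maps each latent token into the retriever's embedding space,
# so documents can be scored against it directly, with no text generated.
proj = rng.standard_normal((d_model, d_retriever))
latent_subqueries = latent_tokens @ proj               # (k, d_retriever)
print(latent_subqueries.shape)
```

The point of the sketch is the shape of the computation: everything after the prompt's forward pass is a handful of matrix multiplies, which is why the method can avoid autoregressive decoding entirely.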
If this is right
- Agentic RAG can retain multi-step search behavior while operating at speeds close to single-step RAG.
- Retrieval can be performed directly over continuous latent representations of subqueries rather than discrete text.
- Joint training of the generator and retriever becomes possible because gradients flow through the latent alignment.
- Interpretability is preserved by the optional decoding of latent tokens into readable intermediate steps.
- The same latent-space shift can be applied to other iterative LLM tasks that currently rely on explicit token generation.
Where Pith is reading between the lines
- Longer or more deeply nested reasoning chains could be supported without a linear increase in latency.
- Tool-use or planning loops that now require many autoregressive steps might be accelerated by analogous latent representations.
- If latent tokens prove sufficient for retrieval, explicit natural-language intermediates may become optional outputs rather than required steps in efficiency-sensitive deployments.
Load-bearing premise
Latent tokens taken from hidden states can faithfully carry the semantic content of natural-language thoughts and subqueries so that retrieval and end-to-end optimization remain effective.
What would settle it
A side-by-side evaluation in which LatentRAG accuracy drops measurably below explicit agentic RAG on questions that require several distinct reasoning hops, or in which measured end-to-end latency fails to show a reduction near 90 percent.
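The ~90 percent figure is at least arithmetically plausible. A back-of-envelope latency model with illustrative numbers (the hop count, token budget, and per-pass latency below are assumptions, not the paper's measurements):

```python
# Back-of-envelope latency model: autoregressive agentic RAG pays one
# forward pass per generated intermediate token; a latent approach pays
# roughly one pass per hop. All numbers are illustrative assumptions.
hops = 3
tokens_per_hop = 60          # thoughts + subquery tokens, assumed
pass_ms = 25                 # per-forward-pass latency, assumed
retrieval_ms = 15            # per retrieval call, shared by both methods

explicit_ms = hops * (tokens_per_hop * pass_ms + retrieval_ms)
latent_ms = hops * (1 * pass_ms + retrieval_ms)

reduction = 1 - latent_ms / explicit_ms
print(f"explicit {explicit_ms} ms, latent {latent_ms} ms, "
      f"reduction {reduction:.0%}")
```

Under these assumptions the generation cost dominates, so collapsing each hop's text into one forward pass yields a reduction in the reported range; the open empirical question is whether accuracy survives the collapse.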
Original abstract
Single-step retrieval-augmented generation (RAG) provides an efficient way to incorporate external information for simple question answering tasks but struggles with complex questions. Agentic RAG extends this paradigm by replacing single-step retrieval with a multi-step process, in which the large language model (LLM) acts as a search agent that generates intermediate thoughts and subqueries to iteratively interact with the retrieval system. This iterative process incurs substantial latency due to the autoregressive generation of lengthy thoughts and subqueries. To address this limitation, we propose LatentRAG, a novel framework that shifts both reasoning and retrieval from discrete language space to continuous latent space. Unlike existing explicit methods that generate natural language thoughts or subqueries token-by-token, LatentRAG produces latent tokens for thoughts and subqueries directly from the hidden states in a single forward pass. We align LLMs with dense retrieval models in the latent space, enabling retrieval over latent subquery tokens and supporting end-to-end joint optimization. To improve transparency and encourage semantically meaningful latent representations, we incorporate a parallel latent decoding mechanism that translates latent tokens back into natural language. Extensive experiments on seven benchmark datasets show that LatentRAG achieves performance comparable to explicit agentic RAG methods while reducing inference latency by approximately 90%, substantially narrowing the latency gap with traditional single-step RAG.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes LatentRAG, a framework that moves agentic RAG reasoning and retrieval into continuous latent space: latent tokens for thoughts and subqueries are generated directly from hidden states in a single forward pass, aligned with dense retrieval models for joint optimization, and decoded in parallel to natural language for interpretability. It reports performance comparable to explicit multi-step agentic RAG methods across seven benchmarks while cutting inference latency by ~90%.
Significance. If the empirical results hold under rigorous controls, the work would meaningfully narrow the efficiency gap between single-step RAG and adaptive agentic methods, enabling more practical deployment of complex multi-hop QA. The attempt at end-to-end latent alignment and the parallel decoding mechanism for transparency are constructive ideas that could influence future hybrid latent-explicit systems.
major comments (3)
- [Abstract] The central empirical claim of 'comparable performance' and 'approximately 90%' latency reduction is presented without naming the seven benchmarks, the explicit agentic baselines, the latency measurement protocol (wall-clock time, tokens generated, hardware), error bars, or statistical significance tests. These omissions make it impossible to evaluate whether the latency gain preserves the adaptivity that agentic RAG is designed to provide.
- [Abstract] The architecture performs retrieval over latent subquery tokens produced in one forward pass, yet retrieval outputs never re-enter the model to condition subsequent latent tokens. This removes the iterative feedback loop that the paper itself identifies as the source of agentic RAG success on complex questions; no ablation or analysis is supplied showing that pre-encoded latent branches suffice when retrieval results would normally alter the reasoning path.
- [Abstract] The weakest assumption—that hidden-state latents can faithfully encode the semantic content of natural-language thoughts and subqueries sufficiently for effective retrieval—receives no direct validation (e.g., retrieval recall@K on latent vs. explicit subqueries, or human evaluation of decoded thoughts). Without such evidence the end-to-end optimization claim rests on an untested substitution.
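The validation the third comment asks for is cheap to run. A sketch of the recall@K comparison, with random stand-in embeddings (a real check would use the trained retriever, the model's latent subqueries, and a labeled corpus):

```python
import numpy as np

# Recall@K harness: score the same corpus against (a) explicit subqueries
# encoded by the retriever and (b) latent subquery vectors, and compare how
# often the gold passage lands in the top K. Embeddings are stand-ins.
rng = np.random.default_rng(1)

def recall_at_k(query_vecs, doc_vecs, gold_ids, k=5):
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    topk = np.argsort(-(q @ d.T), axis=1)[:, :k]   # cosine top-K per query
    hits = [gold in row for gold, row in zip(gold_ids, topk)]
    return sum(hits) / len(hits)

docs = rng.standard_normal((100, 32))
gold = np.arange(10)
explicit_q = docs[gold] + 0.1 * rng.standard_normal((10, 32))  # near-gold
latent_q = docs[gold] + 0.5 * rng.standard_normal((10, 32))    # noisier

print(recall_at_k(explicit_q, docs, gold),
      recall_at_k(latent_q, docs, gold))
```

If latent recall@K tracks explicit recall@K across K, the substitution the paper relies on is directly supported; a persistent gap would localize the failure to the latent encoding rather than to downstream generation.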
minor comments (1)
- [Abstract] The abstract states 'align LLMs with dense retrieval models in the latent space' but does not specify the alignment loss, temperature, or projection layers; these details belong in the main text even if summarized here.
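One plausible form of the unspecified alignment objective, given the paper's citation of contrastive pre-training work, is an InfoNCE-style loss between latent subqueries and the retriever's embeddings of the matching explicit subqueries. The loss form, temperature, and in-batch negatives below are assumptions, not details confirmed by the abstract:

```python
import numpy as np

# InfoNCE-style alignment sketch: pull each latent subquery toward the
# retriever's embedding of its matching explicit subquery, against in-batch
# negatives. Temperature and normalization are assumed, not from the paper.
rng = np.random.default_rng(2)

def info_nce(latent, text, temperature=0.05):
    z = latent / np.linalg.norm(latent, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    logits = (z @ t.T) / temperature                 # (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # i-th latent ↔ i-th text

text_emb = rng.standard_normal((8, 32))
aligned = text_emb + 0.05 * rng.standard_normal((8, 32))
misaligned = rng.standard_normal((8, 32))

print(info_nce(aligned, text_emb), info_nce(misaligned, text_emb))
```

Because this objective is differentiable in both the latent vectors and the text embeddings, it is also the natural carrier for the joint generator-retriever optimization the abstract claims.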
Simulated Author's Rebuttal
We are grateful to the referee for the constructive comments on our paper. We address each of the major comments point by point below, and we will revise the manuscript accordingly to improve clarity and provide additional analyses.
Point-by-point responses
- Referee: [Abstract] The central empirical claim of 'comparable performance' and 'approximately 90%' latency reduction is presented without naming the seven benchmarks, the explicit agentic baselines, the latency measurement protocol (wall-clock time, tokens generated, hardware), error bars, or statistical significance tests. These omissions make it impossible to evaluate whether the latency gain preserves the adaptivity that agentic RAG is designed to provide.
  Authors: We agree that the abstract should include more specific details to facilitate evaluation of the claims. In the revised version, we will name the seven benchmarks, list the explicit agentic baselines, describe the latency measurement protocol (wall-clock time, tokens generated, hardware), and report error bars along with statistical significance tests. These updates will help demonstrate that the reported latency reduction maintains the adaptivity of agentic RAG. revision: yes
- Referee: [Abstract] The architecture performs retrieval over latent subquery tokens produced in one forward pass, yet retrieval outputs never re-enter the model to condition subsequent latent tokens. This removes the iterative feedback loop that the paper itself identifies as the source of agentic RAG success on complex questions; no ablation or analysis is supplied showing that pre-encoded latent branches suffice when retrieval results would normally alter the reasoning path.
  Authors: LatentRAG generates latent tokens for thoughts and subqueries in one forward pass to achieve efficiency, with retrieval performed over these latents in parallel. This design avoids the latency of iterative autoregressive generation while aiming to capture multi-step reasoning through parallel latent branches. We note that the manuscript does not provide an ablation on iterative feedback. We will add an ablation study in the revision comparing the current approach to one that incorporates retrieval results into subsequent latent token generation, to show the conditions under which pre-encoded branches are sufficient. revision: yes
- Referee: [Abstract] The weakest assumption—that hidden-state latents can faithfully encode the semantic content of natural-language thoughts and subqueries sufficiently for effective retrieval—receives no direct validation (e.g., retrieval recall@K on latent vs. explicit subqueries, or human evaluation of decoded thoughts). Without such evidence the end-to-end optimization claim rests on an untested substitution.
  Authors: We recognize that direct validation of the semantic fidelity of the latent tokens would strengthen the paper. The parallel latent decoding is provided for interpretability, but we agree it does not constitute quantitative validation such as recall@K or human evaluation; the end-to-end results provide only indirect support. In the revised manuscript, we will include retrieval recall@K comparisons between latent and explicit subqueries as well as human evaluations or detailed qualitative analysis of the decoded thoughts. revision: yes
Circularity Check
No circularity in derivation chain
Full rationale
The paper presents LatentRAG as an empirical framework that generates latent tokens from hidden states in one forward pass, aligns them with dense retrieval, and evaluates via experiments on seven benchmarks. No equations, predictions, or self-citations are shown that reduce performance gains or the core mechanism to quantities defined by the inputs themselves. Claims rest on external benchmark comparisons rather than any algebraic identity or fitted-parameter renaming.
Axiom & Free-Parameter Ledger
invented entities (2)
- latent tokens for thoughts and subqueries (no independent evidence)
- parallel latent decoding mechanism (no independent evidence)