Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning
Pith reviewed 2026-05-09 14:37 UTC · model grok-4.3
The pith
Verbal annotations that spell out logical connections between queries and retrieved documents improve how LLMs reason over search results.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper introduces Verbal Annotations, analytic narratives that explicitly articulate the logical connection between a search query and retrieved contexts. It demonstrates that a Verbal Reranker supplying these annotations alongside relevance scores enables a Generator to perform more effective iterative retrieval and reasoning, yielding state-of-the-art performance on complex Question Answering benchmarks.
What carries the argument
The Verbal Reranker, an agent component that returns relevance scores and Verbal Annotations to guide the Generator's reasoning and answering process.
If this is right
- Verbal Annotations substantially enhance the LLM's ability to generate accurate, contextually-grounded responses.
- The Verbal-R3 framework achieves state-of-the-art performance on complex Question Answering benchmarks.
- Relevance-guided test-time scaling efficiently allocates test-time compute for effective trajectory expansion.
- Iterative retrieval and reasoning guided by the reranker integrates retrieved information better than raw text injection (a minimal sketch of this loop follows below).
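To make that loop concrete, here is a minimal sketch of the Generator and Verbal Reranker interaction as this review describes it. Every name here (`search`, `verbal_rerank`, `generator_loop`, the stopping threshold) is a hypothetical stand-in, not the paper's actual interface; read it as an illustration under those assumptions, not an implementation.

```python
from dataclasses import dataclass

@dataclass
class Annotated:
    doc: str
    score: float     # reranker relevance score in [0, 1]
    annotation: str  # Verbal Annotation: the query-passage logical link

def search(query: str) -> list[str]:
    # Stub retriever; a real system would query BM25 or a dense index.
    return [f"passage retrieved for: {query}"]

def verbal_rerank(query: str, docs: list[str]) -> list[Annotated]:
    # Stub reranker: alongside a score, it spells out *why* each passage
    # bears on the query, rather than returning raw text alone.
    return [
        Annotated(d, 0.95, f"Relevant because it addresses '{query}' directly.")
        for d in docs
    ]

def generator_loop(question: str, max_rounds: int = 3) -> str:
    evidence: list[Annotated] = []
    query = question
    for _ in range(max_rounds):
        ranked = verbal_rerank(query, search(query))
        evidence.extend(ranked)
        # The Generator reasons over annotations plus scores, then either
        # stops to answer or issues a refined follow-up query.
        if max(a.score for a in ranked) >= 0.9:  # toy stopping rule
            break
        query = f"follow-up query refining: {question}"
    context = "\n".join(f"[{a.score:.2f}] {a.annotation}" for a in evidence)
    return f"Answer grounded in:\n{context}"

if __name__ == "__main__":
    print(generator_loop("Who directed the film that won Best Picture in 1995?"))
```

The point of the sketch is the data flow: the Generator never sees raw passages alone, only passages paired with an explicit statement of their logical bearing on the current query.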
Where Pith is reading between the lines
- This approach suggests that making the connection between retrieval and reasoning explicit can compensate for weaknesses in how LLMs process long contexts.
- The method could be tested on tasks like multi-hop reasoning or fact verification where logical links are critical.
- Scaling the reranker itself might further reduce reliance on the base LLM's internal knowledge.
Load-bearing premise
That verbal annotations substantially enhance the LLM's ability to generate accurate, contextually-grounded responses.
What would settle it
Running the Generator component with and without the Verbal Reranker's annotations on the same complex QA benchmarks and checking whether the performance gap disappears.
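A hedged sketch of that settling experiment: hold the Generator, retriever, and benchmark fixed and toggle only the annotations. `generate_answer`, the dataset fields, and the exact-match scoring are placeholders we invented for illustration, not the paper's evaluation code.

```python
from typing import Optional

def generate_answer(question: str, docs: list[str],
                    annotations: Optional[list[str]] = None) -> str:
    # Stub Generator: a real run would prompt the same LLM with either
    # raw passages (control) or passages plus annotations (treatment).
    context = docs if annotations is None else [
        f"{d}\nWhy relevant: {a}" for d, a in zip(docs, annotations)
    ]
    return f"answer derived from {len(context)} context items"

def exact_match(pred: str, gold: str) -> bool:
    return pred.strip().lower() == gold.strip().lower()

def ablation(dataset: list[dict]) -> dict[str, float]:
    hits = {"with_annotations": 0, "without_annotations": 0}
    for ex in dataset:
        treated = generate_answer(ex["q"], ex["docs"], ex["annotations"])
        control = generate_answer(ex["q"], ex["docs"])
        hits["with_annotations"] += exact_match(treated, ex["gold"])
        hits["without_annotations"] += exact_match(control, ex["gold"])
    n = len(dataset)
    return {k: v / n for k, v in hits.items()}

if __name__ == "__main__":
    toy = [{"q": "demo?", "docs": ["passage"], "annotations": ["link"],
            "gold": "answer derived from 1 context items"}]
    # The stubs tie by construction; a real LLM run is where a gap would show.
    print(ablation(toy))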
Original abstract
The conventional Retrieval-Augmented Generation (RAG) paradigm of injecting raw retrieved texts into the Large Language Model (LLM)'s context often results in suboptimal integration of retrieved information. This paper proposes to bridge retrieval results and the LLM's reasoning ability through Verbal Annotations, analytic narratives that explicitly articulate the logical connection between a search query and retrieved contexts. Our empirical investigation reveals the potential of Verbal Annotations to substantially enhance the LLM's ability to generate accurate, contextually-grounded responses. Motivated by this finding, we introduce Verbal-R3, a novel agentic RAG framework that consists of a Generator and a Verbal Reranker. The Generator performs iterative retrieval and reasoning, while the Verbal Reranker returns relevance scores and Verbal Annotations to guide the reasoning and answering process of the Generator. The inference process of Verbal-R3 is further refined through relevance-guided test-time scaling, which efficiently allocates test-time compute for effective trajectory expansion. Verbal-R3 achieves state-of-the-art performance on complex Question Answering benchmarks, validating the effectiveness of the proposed framework.
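The abstract's "relevance-guided test-time scaling" suggests spending the trajectory-expansion budget where the reranker's relevance scores are highest. The proportional rule below is our assumption for illustration; the paper's exact allocation policy may differ.

```python
import heapq

def allocate_expansions(trajectories: list[tuple[str, float]],
                        budget: int) -> dict[str, int]:
    # trajectories: (trajectory_id, relevance_score) pairs.
    # Returns how many expansions each trajectory receives out of `budget`.
    total = sum(score for _, score in trajectories) or 1.0
    alloc = {tid: int(budget * score / total) for tid, score in trajectories}
    # Hand any rounding remainder to the highest-scoring trajectories.
    remainder = budget - sum(alloc.values())
    for tid, _ in heapq.nlargest(remainder, trajectories, key=lambda t: t[1]):
        alloc[tid] += 1
    return alloc

if __name__ == "__main__":
    print(allocate_expansions([("t1", 0.9), ("t2", 0.5), ("t3", 0.1)], budget=8))
    # {'t1': 5, 't2': 3, 't3': 0}
```

The design choice being illustrated: low-relevance trajectories receive little or no expansion, so compute concentrates on branches the reranker already believes are well grounded.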
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Verbal-R3, an agentic RAG framework that bridges retrieval and LLM reasoning via Verbal Annotations—analytic narratives articulating logical connections between queries and retrieved contexts. The system comprises a Generator performing iterative retrieval and reasoning, a Verbal Reranker supplying relevance scores and annotations to guide the Generator, and relevance-guided test-time scaling for efficient trajectory expansion. It reports state-of-the-art empirical performance on complex QA benchmarks.
Significance. If the reported gains hold under the described controls and ablations, the work offers a practical advance in RAG by making the integration of retrieved evidence explicit and verbalized rather than raw. The agentic loop plus test-time scaling combination is a concrete engineering contribution that could improve grounding without excessive compute. The manuscript follows standard empirical practices for the domain and shows no internal inconsistencies in its argument structure or framework description.
Minor comments (3)
- Abstract: the claim of SOTA performance would be more informative if the specific benchmarks and the magnitude of improvements over the strongest baselines were named explicitly rather than left as a general assertion.
- §3 (Framework): the definition and generation process for Verbal Annotations is described only at a high level; a concise pseudocode or template example would improve reproducibility and clarify how the annotations differ from standard chain-of-thought outputs (see the illustrative template after this list).
- §4 (Experiments): while ablations are mentioned, the statistical significance of the reported gains and the exact number of runs or variance estimates are not detailed in the summary tables; adding these would strengthen the empirical claims.
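To make that request concrete, here is one way such a template could look. This is our construction, not one taken from the manuscript; the field names and wording are assumptions.

```python
# Illustrative Verbal Annotation template (hypothetical, not the paper's).
ANNOTATION_TEMPLATE = """\
Query: {query}
Passage: {passage_id}
Relevance: {score:.2f}
Logical link: This passage {relation} the query because {reason}.
Remaining gap: {gap}"""

example = ANNOTATION_TEMPLATE.format(
    query="When was the director of Jaws born?",
    passage_id="wiki/Steven_Spielberg",
    score=0.92,
    relation="partially answers",
    reason="it gives Spielberg's birth date, once he is identified as "
           "the director of Jaws",
    gap="a passage confirming that Spielberg directed Jaws",
)
print(example)
```

Unlike free-form chain-of-thought, each annotation of this shape is anchored to a specific passage, carries the reranker's score, and names the evidence still missing, which is what would make it usable as a retrieval signal in the next iteration.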
Simulated Author's Rebuttal
We thank the referee for the positive summary of Verbal-R3, the assessment of its significance as a practical advance in RAG, and the recommendation for minor revision. The referee's description of the framework (Generator, Verbal Reranker, relevance-guided test-time scaling) and empirical results aligns with the manuscript.
Circularity Check
No significant circularity
Full rationale
The paper introduces an empirical agentic RAG framework (Verbal-R3) consisting of a Generator and a Verbal Reranker that uses verbal annotations to connect retrieval and reasoning. All claims rest on experimental results, ablations, and benchmark performance rather than on a derivation chain, equations, fitted parameters renamed as predictions, or load-bearing self-citation. No load-bearing step reduces to its own inputs by construction; the work is validated against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: verbal annotations can substantially enhance LLM reasoning over retrieved contexts.
Invented entities (2)
- Verbal Annotations (no independent evidence)
- Verbal Reranker (no independent evidence)