pith. machine review for the scientific record.

arxiv: 2605.00063 · v1 · submitted 2026-04-30 · 💻 cs.IR · cs.AI


A Survey of Reasoning-Intensive Retrieval: Progress and Challenges


Pith reviewed 2026-05-09 20:49 UTC · model grok-4.3

classification 💻 cs.IR cs.AI
keywords Reasoning-Intensive Retrieval · Information Retrieval · Benchmarks · Taxonomy · Retrieval Pipeline · Large Language Models · Rerankers · Challenges

The pith

Reasoning-intensive retrieval organizes around benchmarks grouped by domain and modality, and a taxonomy of how reasoning enters the retrieval pipeline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey targets Reasoning-Intensive Retrieval, where relevance depends on latent inferential links between query and evidence rather than direct semantic match. It systematizes existing benchmarks according to knowledge domains and modalities to map the current landscape. A taxonomy is introduced that groups methods by the stage and manner in which reasoning is added to the retrieval process. Trade-offs, practical uses, open challenges, and future directions are outlined to give the field a shared reference point.

Core claim

The paper claims that Reasoning-Intensive Retrieval efforts, which incorporate large-language-model reasoning into retrieval to handle inferential relevance, can be made coherent by grouping benchmarks by domain and modality and by classifying methods according to where and how reasoning is inserted into the pipeline, thereby supplying a usable roadmap.

What carries the argument

The structured taxonomy that places methods into categories based on where and how reasoning is integrated into the retrieval pipeline.
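The taxonomy's organizing idea, reasoning inserted at the canonical pipeline stages of query rewriting, first-stage retrieval, and reranking, can be sketched as a toy pipeline. The hook names, the lexical overlap scorer, and the example documents below are illustrative assumptions, not the survey's notation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Canonical stages where the taxonomy places reasoning: query rewriting,
# first-stage retrieval, and reranking. Hook names and the toy lexical
# scorer are hypothetical stand-ins, not the survey's notation.
@dataclass
class RIRPipeline:
    rewrite: Optional[Callable[[str], str]] = None        # reasoning at the query stage
    score: Callable[[str, str], float] = None             # first-stage retriever
    rerank: Optional[Callable[[str, list], list]] = None  # reasoning at the rerank stage

    def run(self, query: str, corpus: list, k: int = 3) -> list:
        q = self.rewrite(query) if self.rewrite else query
        ranked = sorted(corpus, key=lambda d: self.score(q, d), reverse=True)
        top = ranked[:k]
        return self.rerank(q, top) if self.rerank else top

def overlap(q: str, d: str) -> float:
    """Toy lexical scorer standing in for BM25 or a dense retriever."""
    return len(set(q.lower().split()) & set(d.lower().split()))

# A query-stage reasoning hook: expand the query toward its latent target,
# e.g. mapping an algorithm description toward code (cf. the paper's Figure 1).
pipe = RIRPipeline(rewrite=lambda q: q + " algorithm code", score=overlap)

docs = [
    "symbolic code for a sorting algorithm",
    "history of public libraries",
    "weekly code reviews",
]
print(pipe.run("sorting routine", docs, k=1))
# → ['symbolic code for a sorting algorithm']
```

Swapping these hooks for LLM calls yields the branches the survey describes: a rewrite-only method, a reasoning-aware first-stage retriever (`score`), or a test-time reasoning reranker (`rerank`), which is why the stage of insertion is a natural axis for classification.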

If this is right

  • Developers can select or design methods according to the stage of reasoning integration that best matches their accuracy and efficiency needs.
  • Benchmark creators can target gaps in specific domains or modalities identified by the systematization.
  • Comparisons across papers become possible once methods share the same taxonomy labels.
  • Research priorities can focus on the challenges and directions the survey lists as most pressing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The taxonomy may need new branches once methods begin combining reasoning at multiple pipeline stages simultaneously.
  • Extending the same grouping to non-text modalities such as images or tables could expose whether current categories generalize.
  • If the taxonomy proves stable, it could serve as the basis for standardized evaluation protocols that measure inferential reasoning quality directly.

Load-bearing premise

That the authors' selection and categorization of the literature is sufficiently complete and unbiased to serve as a reliable roadmap for the field.

What would settle it

A substantial new benchmark or method that cannot be placed in any of the taxonomy categories or domain-modality groups would show the roadmap is incomplete.

Figures

Figures reproduced from arXiv: 2605.00063 by Siyue Zhang, Tingyu Song, Yilun Zhao, Yiyang Wei.

Figure 1. Top: an example of reasoning-intensive retrieval, where a query and its supporting document are connected through an implicit multi-hop reasoning chain. Bottom: overview of the retrieval pipeline and representative techniques, detailed in Section 4.
Figure 2. Taxonomy of Reasoning-Intensive Retrieval (RIR).
Original abstract

Reasoning-Intensive Retrieval (RIR) targets retrieval settings where relevance is mediated by latent inferential links between a query and supporting evidence, rather than semantic similarity. Motivated by the emergent reasoning abilities of Large Language Models (LLMs), recent work integrates these capabilities into the IR field, spanning the entire pipeline from benchmarks to retrievers and rerankers. Despite this progress, the field lacks a systematic framework to organize current efforts and articulate a clear path forward. To provide a clear roadmap for this rapidly growing yet fragmented area, this survey (1) systematizes existing RIR benchmarks by knowledge domains and modalities, providing a detailed analysis of the current landscape; (2) introduces a structured taxonomy that categorizes methods based on where and how reasoning is integrated into the retrieval pipeline, alongside an analysis of their trade-offs and practical applications; and (3) summarizes challenges and future directions to guide research in this evolving field.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom-and-free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript surveys Reasoning-Intensive Retrieval (RIR), where relevance depends on latent inferential links rather than direct semantic similarity, often leveraging LLMs. It (1) systematizes existing benchmarks by knowledge domains and modalities, (2) introduces a taxonomy categorizing methods by the location and manner of reasoning integration in the retrieval pipeline, and (3) summarizes challenges and future directions to guide the field.

Significance. If the benchmark systematization proves representative and the taxonomy is shown to be both natural and consistently applied, the survey could serve as a useful organizing framework for an emerging, fragmented subfield. It would help researchers identify gaps in reasoning-enhanced IR and accelerate work on LLM-augmented retrievers and rerankers.

major comments (2)
  1. [§2] §2 (Benchmarks): No literature search protocol, queried databases, date cutoff, or explicit inclusion/exclusion criteria are described for selecting the surveyed benchmarks. This omission directly undermines the claim that the systematization by domains and modalities provides a reliable landscape analysis.
  2. [§3] §3 (Taxonomy): The taxonomy categories (e.g., reasoning at query rewriting, retrieval, or reranking stages) are presented without justification of how they were derived, how boundary cases were handled, or any validation (such as application to a held-out set of papers). This makes it difficult to evaluate whether the taxonomy reflects genuine divisions or post-hoc grouping, which is load-bearing for the roadmap value asserted in the abstract.
minor comments (2)
  1. [Abstract] The abstract would be clearer if it included one concrete example distinguishing RIR from standard semantic retrieval.
  2. A summary table listing all discussed benchmarks with columns for domain, modality, size, and reasoning requirements would improve accessibility and allow readers to quickly assess coverage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments, which help clarify how to improve the rigor of our survey. We address each major comment below and will revise the manuscript to incorporate the suggested enhancements.

Point-by-point responses
  1. Referee: [§2] §2 (Benchmarks): No literature search protocol, queried databases, date cutoff, or explicit inclusion/exclusion criteria are described for selecting the surveyed benchmarks. This omission directly undermines the claim that the systematization by domains and modalities provides a reliable landscape analysis.

    Authors: We agree that an explicit literature search protocol would strengthen the reproducibility and credibility of the benchmark systematization. While the benchmarks were identified through a comprehensive review of recent publications in top IR venues, arXiv, and related workshops (covering works up to early 2024), the original manuscript did not document the process in detail. In the revised version, we will add a dedicated subsection at the start of §2 describing the search strategy, including queried sources (Google Scholar, arXiv, ACL Anthology, SIGIR/TOIS proceedings), date cutoff, and inclusion/exclusion criteria (e.g., focus on tasks requiring multi-hop or latent inference rather than direct semantic match). This will make the landscape analysis more transparent without altering the core categorization. revision: yes

  2. Referee: [§3] §3 (Taxonomy): The taxonomy categories (e.g., reasoning at query rewriting, retrieval, or reranking stages) are presented without justification of how they were derived, how boundary cases were handled, or any validation (such as application to a held-out set of papers). This makes it difficult to evaluate whether the taxonomy reflects genuine divisions or post-hoc grouping, which is load-bearing for the roadmap value asserted in the abstract.

    Authors: The taxonomy was constructed by mapping reasoning integration points onto the canonical stages of the retrieval pipeline (query formulation, initial retrieval, and reranking), which follows naturally from standard IR system architectures and the ways LLMs are currently applied in the literature. Boundary cases (e.g., hybrid methods) were resolved by primary stage of reasoning application. We acknowledge that the manuscript presents the taxonomy without sufficient methodological justification or illustrative validation. In revision, we will expand the opening of §3 with a new paragraph explaining the derivation rationale, provide explicit examples of boundary-case handling, and include a short validation table applying the taxonomy to a representative sample of papers (including some not used in the initial development) to demonstrate consistency. This will clarify that the categories capture genuine pipeline distinctions rather than arbitrary groupings. revision: yes

Circularity Check

0 steps flagged

No circularity in organizational survey

Full rationale

This survey paper organizes existing RIR literature by domains/modalities and introduces a taxonomy of reasoning integration points in the retrieval pipeline. No derivations, equations, predictions, fitted parameters, or self-referential reductions appear anywhere in the text. The central claims are descriptive and classificatory rather than derived from prior results within the paper; completeness of coverage is presented as an external literature-review task, not as a constructed output. The work is therefore self-contained as an organizational contribution with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a literature survey with no mathematical derivations, fitted parameters, or new postulated entities. All content rests on the authors' reading and categorization of prior publications.

pith-pipeline@v0.9.0 · 5462 in / 974 out tokens · 29455 ms · 2026-05-09T20:49:40.029407+00:00 · methodology

