Beyond Relevance: On the Relationship Between Retrieval and RAG Information Coverage
Pith review · 2026-05-15 13:06 UTC · model grok-4.3 · Recognition: 1 Lean theorem link
The pith
Retrieval metrics based on information coverage reliably predict how complete the final answers are in retrieval-augmented generation systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Coverage-based retrieval metrics serve as reliable early indicators of nugget coverage in RAG-generated responses. Strong correlations appear at both topic and system levels across the TREC NeuCLIR 2024, TREC RAG 2024, and WikiVideo benchmarks. The relationship strengthens when retrieval objectives align with generation goals, while complex iterative RAG pipelines can partially decouple generation quality from retrieval effectiveness.
What carries the argument
Coverage-based retrieval metrics that quantify how much of the target information is captured in the retrieved documents rather than relying on relevance ranking alone.
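The review does not reproduce the paper's metric code, but the idea can be sketched. In this illustrative stand-in (the function name and the substring matcher are my own; real nugget matching is typically done by an LLM judge), a coverage-based retrieval metric is the fraction of gold nuggets found in at least one retrieved document:

```python
def nugget_coverage(retrieved_docs, gold_nuggets,
                    match=lambda n, d: n.lower() in d.lower()):
    """Fraction of gold nuggets covered by at least one retrieved document.

    Illustrative sketch only: substring matching stands in for the
    LLM-based nugget matching used by frameworks like Auto-ARGUE.
    """
    if not gold_nuggets:
        return 0.0
    covered = sum(
        1 for nugget in gold_nuggets
        if any(match(nugget, doc) for doc in retrieved_docs)
    )
    return covered / len(gold_nuggets)

docs = ["The treaty was signed in 1998 in Belfast.",
        "Talks were mediated by the US envoy."]
nuggets = ["signed in 1998", "mediated by the US envoy",
           "ratified by referendum"]
print(nugget_coverage(docs, nuggets))  # prints 0.6666666666666666 (2 of 3 nuggets)
```

Unlike relevance-ranking metrics such as nDCG, which can saturate once enough relevant documents are ranked highly, this scores the retrieved set for the distinct pieces of target information it jointly contains.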
If this is right
- Retrieval metrics can serve as practical proxies for RAG performance without requiring full response generation.
- Alignment between retrieval objectives and generation goals increases the predictive strength of coverage metrics.
- Iterative RAG pipelines can reduce the impact of weaker initial retrieval on final answer quality.
- The observed correlations hold across both text and multimodal settings and multiple evaluation frameworks.
Where Pith is reading between the lines
- RAG developers could screen candidate retrievers using coverage metrics before integrating them into full systems.
- Retrieval methods might be redesigned with explicit coverage targets matched to the intended generation task.
- Similar coverage relationships could be tested in other retrieval-augmented settings such as long-form summarization.
Load-bearing premise
The chosen benchmarks and evaluation frameworks produce coverage measures that generalize beyond the tested RAG pipelines and domains.
What would settle it
A new benchmark or domain in which coverage-based retrieval metrics show no or weak correlation with the nugget coverage achieved in generated RAG responses.
Original abstract
Retrieval-augmented generation (RAG) systems combine document retrieval with a generative model to address complex information seeking tasks like report generation. While the relationship between retrieval quality and generation effectiveness seems intuitive, it has not been systematically studied. We investigate whether upstream retrieval metrics can serve as reliable early indicators of the final generated response's information coverage. Through experiments across two text RAG benchmarks (TREC NeuCLIR 2024 and TREC RAG 2024) and one multimodal benchmark (WikiVideo), we analyze 15 text retrieval stacks and 10 multimodal retrieval stacks across four RAG pipelines and multiple evaluation frameworks (Auto-ARGUE and MiRAGE). Our findings demonstrate strong correlations between coverage-based retrieval metrics and nugget coverage in generated responses at both topic and system levels. This relationship holds most strongly when retrieval objectives align with generation goals, though more complex iterative RAG pipelines can partially decouple generation quality from retrieval effectiveness. These findings provide empirical support for using retrieval metrics as proxies for RAG performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper investigates whether upstream retrieval metrics can serve as reliable proxies for information coverage in RAG-generated responses. Using TREC NeuCLIR 2024, TREC RAG 2024, and WikiVideo benchmarks, it evaluates 15 text and 10 multimodal retrieval stacks across four RAG pipelines with Auto-ARGUE and MiRAGE frameworks, reporting strong correlations between coverage-based retrieval metrics and nugget coverage at topic and system levels. The relationship is strongest when retrieval objectives align with generation goals, though iterative pipelines may partially decouple the two.
Significance. If the reported correlations are robust after appropriate statistical controls and stratification, the work supplies empirical grounding for treating retrieval effectiveness as an early indicator of RAG performance. This could streamline evaluation pipelines and clarify when retrieval quality directly translates to generation coverage, while highlighting the moderating role of pipeline complexity.
major comments (3)
- [§5] §5 (Results): The headline claim of 'strong correlations' at topic and system levels is presented in aggregate without reported correlation coefficients, p-values, confidence intervals, or effect sizes. The abstract supplies no statistical details, and the absence of these quantities prevents assessment of whether the correlations are practically meaningful or driven by outliers.
- [§5.3] §5.3 (Pipeline analysis): The manuscript notes that 'more complex iterative RAG pipelines can partially decouple generation quality from retrieval effectiveness' but does not quantify this decoupling via interaction terms, separate correlation tables by pipeline type, or moderation analysis. If the 4 pipelines include both simple and iterative variants without stratification, the aggregate coefficients may be inflated by aligned cases and fail to support the general proxy claim.
- [§4.2] §4.2 (Experimental setup): No explicit rules for data exclusion, handling of multimodal vs. text differences, or controls for topic difficulty are described. This leaves open whether the observed correlations generalize or are confounded by benchmark-specific properties of TREC NeuCLIR, TREC RAG, and WikiVideo.
minor comments (2)
- [Table 1] Table 1 and Figure 2: Axis labels and legend entries use inconsistent abbreviations for retrieval stacks; expand or standardize for readability.
- [§2] Related work section: The discussion of prior RAG evaluation frameworks (e.g., ARGUE, MiRAGE) would benefit from explicit comparison of their nugget coverage definitions to avoid potential circularity in metric choice.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate the suggested improvements in statistical reporting, pipeline stratification, and experimental controls.
Point-by-point responses
- Referee: [§5] §5 (Results): The headline claim of 'strong correlations' at topic and system levels is presented in aggregate without reported correlation coefficients, p-values, confidence intervals, or effect sizes. The abstract supplies no statistical details, and the absence of these quantities prevents assessment of whether the correlations are practically meaningful or driven by outliers.
Authors: We agree that the current presentation lacks the necessary statistical details. In the revised manuscript we will report Pearson and Spearman correlation coefficients, associated p-values, 95% confidence intervals, and effect sizes (including r-squared) for both topic-level and system-level analyses. These statistics will be provided in aggregate as well as stratified by benchmark and pipeline type to allow assessment of practical significance and potential outlier effects. revision: yes
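As a sketch of the promised reporting (all topic-level numbers below are invented for illustration, not taken from the paper), Pearson and Spearman coefficients with a percentile-bootstrap 95% confidence interval can be computed with the standard library alone:

```python
import random
from statistics import mean

def pearson(xs, ys):
    # Pearson's r: covariance normalized by the product of standard deviations.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def spearman(xs, ys):
    # Spearman's rho: Pearson's r computed on ranks (ties ignored for brevity).
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0.0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(xs), ranks(ys))

def bootstrap_ci(xs, ys, stat, n_boot=2000, alpha=0.05, seed=0):
    # Percentile bootstrap over topics for any correlation statistic.
    rng = random.Random(seed)
    n = len(xs)
    vals = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        vals.append(stat([xs[i] for i in idx], [ys[i] for i in idx]))
    vals.sort()
    return vals[int(alpha / 2 * n_boot)], vals[int((1 - alpha / 2) * n_boot)]

# Invented topic-level scores: retrieval coverage vs. generated nugget coverage.
ret = [0.21, 0.35, 0.40, 0.48, 0.55, 0.61, 0.68, 0.74, 0.80, 0.88]
gen = [0.18, 0.30, 0.42, 0.41, 0.50, 0.58, 0.60, 0.71, 0.69, 0.85]
r = pearson(ret, gen)
lo, hi = bootstrap_ci(ret, gen, pearson)
print(f"Pearson r = {r:.3f}, 95% CI [{lo:.3f}, {hi:.3f}], r^2 = {r*r:.3f}")
print(f"Spearman rho = {spearman(ret, gen):.3f}")
```

In practice one would use scipy.stats for p-values and tie handling; the point here is only that reporting r, a CI, and r-squared per level (topic, system) is cheap.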
- Referee: [§5.3] §5.3 (Pipeline analysis): The manuscript notes that 'more complex iterative RAG pipelines can partially decouple generation quality from retrieval effectiveness' but does not quantify this decoupling via interaction terms, separate correlation tables by pipeline type, or moderation analysis. If the 4 pipelines include both simple and iterative variants without stratification, the aggregate coefficients may be inflated by aligned cases and fail to support the general proxy claim.
Authors: We acknowledge the value of quantifying the decoupling effect. The revision will add separate correlation tables for simple versus iterative pipelines, include interaction terms in regression models to test moderation by pipeline complexity, and present a dedicated moderation analysis. This will clarify the conditions under which retrieval metrics remain reliable proxies and prevent over-generalization from aggregate results. revision: yes
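A minimal sketch of the proposed moderation analysis, on simulated data (effect sizes invented: here iterative pipelines are made to halve the retrieval-to-generation slope). With a binary pipeline dummy, the interaction coefficient in a pooled regression equals the difference between the per-group OLS slopes, so fitting each stratum separately suffices:

```python
import random
from statistics import mean

def slope(xs, ys):
    # Ordinary least-squares slope of y on x.
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

rng = random.Random(0)

def simulate(n, base_slope):
    # Generation coverage as a noisy linear function of retrieval coverage.
    xs = [rng.uniform(0.2, 0.9) for _ in range(n)]
    ys = [0.1 + base_slope * x + rng.gauss(0, 0.03) for x in xs]
    return xs, ys

simple_x, simple_y = simulate(20, 0.8)   # simple pipeline: strong coupling
iter_x, iter_y = simulate(20, 0.4)       # iterative pipeline: weaker coupling

b_simple = slope(simple_x, simple_y)
b_iter = slope(iter_x, iter_y)
interaction = b_iter - b_simple  # the dummy-interaction coefficient
print(f"slope (simple): {b_simple:.2f}, slope (iterative): {b_iter:.2f}, "
      f"interaction: {interaction:.2f}")
```

A clearly negative interaction term is exactly the quantified "decoupling" the referee asks for; stratified correlation tables report the same contrast in correlation units.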
- Referee: [§4.2] §4.2 (Experimental setup): No explicit rules for data exclusion, handling of multimodal vs. text differences, or controls for topic difficulty are described. This leaves open whether the observed correlations generalize or are confounded by benchmark-specific properties of TREC NeuCLIR, TREC RAG, and WikiVideo.
Authors: We will expand §4.2 to specify data exclusion rules (e.g., minimum nugget count per topic), detail the separate processing and normalization steps for multimodal versus text benchmarks, and incorporate controls for topic difficulty through stratification by topic features and inclusion of topic as a random effect in statistical models. These additions will strengthen claims of generalizability across the three benchmarks. revision: yes
Circularity Check
No circularity: purely empirical correlation analysis on external benchmarks
full rationale
The paper reports measured correlations between retrieval coverage metrics and generated response nugget coverage across TREC NeuCLIR 2024, TREC RAG 2024, and WikiVideo benchmarks using 15 text + 10 multimodal stacks and four RAG pipelines. No equations, fitted parameters, or derivations are defined in terms of the target quantities; all results are direct statistical observations from independent evaluation frameworks (Auto-ARGUE, MiRAGE). The analysis contains no self-definitional steps, no predictions that reduce to fitted inputs, and no load-bearing self-citations that substitute for external verification. The central claim is therefore an empirical finding rather than a constructed equivalence.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Nugget coverage measured by Auto-ARGUE and MiRAGE is a valid proxy for information coverage in generated responses.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tagged unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Linked passage: "strong correlations between coverage-based retrieval metrics and nugget coverage in generated responses at both topic and system levels"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.