LLM-Oriented Information Retrieval: A Denoising-First Perspective

Cehao Yang; Fanpu Cao; Hao Liu; Hui Xiong; Liang Sun; Lu Dai; Ziyang Rao

arXiv: 2605.00505 · v2 · pith:RY2TXKHFnew · submitted 2026-05-01 · cs.IR · cs.AI · cs.CL

Pith reviewed 2026-05-21 00:14 UTC · model grok-4.3

classification cs.IR · cs.AI · cs.CL

keywords information retrieval · large language models · denoising · retrieval-augmented generation · signal-to-noise optimization · hallucinations · context engineering · agentic search

The pith

Denoising to maximize evidence density and verifiability is the new primary bottleneck in information retrieval for LLMs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that modern information retrieval has shifted from serving human users to serving large language models through systems like retrieval-augmented generation. Because LLMs have limited attention and are prone to being misled by irrelevant or incorrect information, leading to hallucinations, the focus must move to denoising: increasing the density of usable, verifiable evidence in what gets fed to the model. This perspective frames IR challenges in four stages progressing from inaccessible information to unverifiable information. The authors organize existing techniques into a taxonomy across the retrieval pipeline and discuss applications in areas such as coding agents and multimodal understanding. If this view holds, future IR research will center on signal-to-noise optimization rather than traditional relevance matching.

Core claim

The central claim is that denoising, defined as maximizing usable evidence density and verifiability within a context window, is becoming the primary bottleneck across the full information access pipeline for LLM-oriented information retrieval. This is conceptualized through a four-stage framework of challenges: inaccessible, undiscoverable, misaligned, and unverifiable. The work also supplies a pipeline-organized taxonomy of signal-to-noise optimization techniques spanning indexing, retrieval, context engineering, verification, and agentic workflow, along with examples from domains reliant on retrieval.
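As a rough illustration of the "usable evidence density" idea, the sketch below computes the share of context tokens that come from passages that are both supporting and verifiable. The `Passage` fields, the whitespace tokenizer, and the density formula itself are assumptions made for illustration; the paper as summarized here does not give a formal definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    text: str
    is_evidence: bool  # hypothetical label: does the passage support the answer?
    verifiable: bool   # hypothetical label: can it be traced to a source?


def token_count(text: str) -> int:
    # Whitespace splitting is a stand-in for a real tokenizer.
    return len(text.split())


def evidence_density(context: list[Passage]) -> float:
    """Fraction of context tokens that belong to verifiable evidence passages."""
    total = sum(token_count(p.text) for p in context)
    if total == 0:
        return 0.0
    usable = sum(
        token_count(p.text) for p in context if p.is_evidence and p.verifiable
    )
    return usable / total


# Toy context: one usable passage, one irrelevant ad, one off-topic fact.
context = [
    Passage("Paris is the capital of France", True, True),
    Passage("Buy cheap flights to Paris today", False, False),
    Passage("France borders Spain and Italy among others", False, True),
]

# A trivial "denoiser" that keeps only verifiable evidence.
denoised = [p for p in context if p.is_evidence and p.verifiable]

print(evidence_density(context))   # 6 usable tokens out of 19
print(evidence_density(denoised))  # every remaining token is usable
```

In practice the two labels would have to come from a reranker, an attribution check, or human annotation; the point of the sketch is only that density rises when non-evidence tokens are removed from a fixed window.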

What carries the argument

The four-stage framework that traces IR challenges from inaccessible to undiscoverable to misaligned to unverifiable, and the accompanying taxonomy of denoising techniques organized by pipeline stage.

If this is right

Retrieval systems must incorporate denoising steps at every stage from indexing to final output.
Context engineering becomes essential to pack more verifiable evidence into limited windows.
Verification mechanisms will be needed to combat hallucinations caused by noise.
Applications in lifelong assistants and deep research will benefit from higher evidence density.
Agentic workflows will require integrated signal-to-noise optimization.
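The context-engineering point above can be sketched as a budgeted packing step. This is a minimal greedy heuristic (rank by utility per token, skip anything that does not fit), not a technique taken from the paper; the `Candidate` type, the `utility` score, and the whitespace token count are all hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    text: str
    utility: float  # hypothetical usefulness score, e.g. from a reranker or verifier


def cost(candidate: Candidate) -> int:
    # Whitespace token count as a stand-in for a real tokenizer.
    return len(candidate.text.split())


def pack_context(candidates: list[Candidate], budget_tokens: int) -> list[Candidate]:
    """Greedily fill the window with the highest utility-per-token passages."""
    ranked = sorted(
        candidates,
        key=lambda c: c.utility / max(cost(c), 1),
        reverse=True,
    )
    chosen: list[Candidate] = []
    used = 0
    for c in ranked:
        # Drop passages with no positive utility (treated as noise) or that overflow.
        if c.utility <= 0 or used + cost(c) > budget_tokens:
            continue
        chosen.append(c)
        used += cost(c)
    return chosen


candidates = [
    Candidate("alpha beta gamma delta", 2.0),
    Candidate("one two", 1.5),
    Candidate("noise noise noise", 0.0),
    Candidate("x y z w v u", 1.2),
]
packed = pack_context(candidates, budget_tokens=6)
print([c.text for c in packed])
```

Greedy packing is not optimal in general (it is a knapsack heuristic), but it makes the trade-off explicit: each passage has to justify the tokens it occupies.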

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This shift may require new evaluation metrics that measure noise impact on LLM reasoning rather than human relevance judgments.
It could connect to broader problems in AI safety by reducing hallucination risks through better retrieval.
Future work might test whether denoising-first designs outperform traditional IR in agentic search tasks.
Extensions could include multimodal denoising for vision-language models.

Load-bearing premise

The load-bearing premise is that the vulnerabilities of LLMs to noise and their limited attention represent a fundamental change that makes denoising the central focus, distinct from earlier challenges in human-oriented information retrieval.

What would settle it

A direct test would be to measure whether removing or reducing noise in retrieved contexts leads to measurable reductions in hallucination rates and improvements in reasoning accuracy for LLMs in RAG setups, compared to standard retrieval without denoising emphasis.
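One way such a test could be scored is a paired comparison over the same question set, run once with standard retrieval and once with a denoising step. The sketch below assumes each answer has already been judged as hallucinated or not; the function names are hypothetical and the flag values are placeholders, not results from the paper.

```python
def hallucination_rate(flags):
    """Share of answers flagged as hallucinated; 0.0 for an empty run."""
    return sum(flags) / len(flags) if flags else 0.0


def paired_comparison(baseline, denoised):
    """Compare two runs over the same questions, aligned by position."""
    if len(baseline) != len(denoised):
        raise ValueError("both runs must cover the same questions")
    # Questions where denoising removed a hallucination, and where it introduced one.
    fixed = sum(1 for b, d in zip(baseline, denoised) if b and not d)
    broken = sum(1 for b, d in zip(baseline, denoised) if d and not b)
    return {
        "baseline_rate": hallucination_rate(baseline),
        "denoised_rate": hallucination_rate(denoised),
        "rate_delta": hallucination_rate(denoised) - hallucination_rate(baseline),
        "fixed_by_denoising": fixed,
        "broken_by_denoising": broken,
    }


# Placeholder judgments (True = hallucinated), purely illustrative.
baseline_flags = [True, True, False, True, False, False, True, False]
denoised_flags = [False, True, False, False, False, True, False, False]

report = paired_comparison(baseline_flags, denoised_flags)
print(report)
```

Reporting the discordant pairs alongside the rates matters because denoising can also remove needed evidence; a lower aggregate rate could hide questions that got worse.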

Figures

Figures reproduced from arXiv: 2605.00505 by Cehao Yang, Fanpu Cao, Hao Liu, Hui Xiong, Liang Sun, Lu Dai, Ziyang Rao.

**Figure 1.** Challenge shifts in the history of IR.

**Figure 2.** Empirical validation of the denoising-first perspective…

**Figure 3.** A multi-level denoising taxonomy aligned with the five-stage Section 3 pipeline: Controlled Indexing (§3.1), Robust…

Original abstract

Modern information retrieval (IR) is no longer consumed primarily by humans but increasingly by large language models (LLMs) via retrieval-augmented generation (RAG) and agentic search. Unlike human users, LLMs are constrained by limited attention budgets and are uniquely vulnerable to noise; misleading or irrelevant information is no longer just a nuisance, but a direct cause of hallucinations and reasoning failures. In this perspective paper, we argue that denoising (maximizing usable evidence density and verifiability within a context window) is becoming the primary bottleneck across the full information access pipeline. We conceptualize this paradigm shift through a four-stage framework of IR challenges: from inaccessible to undiscoverable, to misaligned, and finally to unverifiable. Furthermore, we provide a pipeline-organized taxonomy of signal-to-noise optimization techniques, spanning indexing, retrieval, context engineering, verification, and agentic workflow. We also present research works on information denoising in domains that rely heavily on retrieval such as lifelong assistant, coding agent, deep research, and multimodal understanding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This perspective reframes IR for LLMs around denoising but provides no data showing noise is the main bottleneck over other factors.

The main contribution here is a four-stage framework that moves from inaccessible information to unverifiable evidence, plus a taxonomy of denoising techniques across indexing, retrieval, context engineering, verification, and agentic workflows. It pulls together existing RAG and denoising work into one view and flags applications in coding agents and research assistants. That organization is the clearest contribution: it gives a clean way to sort signal-to-noise ideas that were already scattered across the literature the authors cite. No new methods or results appear, which matches the abstract's forward-looking tone.

The argument that denoising has become the primary bottleneck rests on the idea that LLMs' attention limits and noise sensitivity create a paradigm shift. Nothing in the paper quantifies how much noise actually drives failures compared with prompt sensitivity, base-model limits, or retrieval quality. Without experiments, error breakdowns, or even simple comparisons, the claim remains an assumption. The four-stage framing is useful for discussion but does not demonstrate that a denoising-first approach is necessary.

Readers working on RAG pipelines or agent design might find the taxonomy handy for structuring their own thinking or literature reviews. It is not the place to look for reproducible findings or formal derivations.

A serious editor could send this to peer review as a perspective piece, with the expectation that referees would ask for some grounding of the bottleneck claim or clearer links to measurable improvements in the cited domains.

Referee Report

2 major / 2 minor

Summary. This perspective paper claims that modern information retrieval, increasingly consumed by LLMs through RAG and agentic search rather than humans, has shifted such that denoising—maximizing usable evidence density and verifiability within context windows—is now the primary bottleneck, owing to LLMs' limited attention budgets and vulnerability to noise-induced hallucinations. It conceptualizes the shift via a four-stage framework of IR challenges (inaccessible to undiscoverable to misaligned to unverifiable) and offers a pipeline-organized taxonomy of signal-to-noise optimization techniques spanning indexing, retrieval, context engineering, verification, and agentic workflows, while surveying relevant work in domains such as lifelong assistants, coding agents, deep research, and multimodal understanding.

Significance. If the perspective is borne out, the manuscript could usefully reorient IR research priorities toward denoising strategies that improve reliability in LLM-augmented pipelines, providing an organizing framework and taxonomy that researchers could use to systematize work on noise mitigation across the full access stack.

major comments (2)

[Abstract] Abstract: the central assertion that denoising 'is becoming the primary bottleneck across the full information access pipeline' rests on a conceptual argument without any comparative quantification, ablation studies, or failure-mode analysis showing that noise accounts for more LLM failures than other factors such as base-model reasoning limits or prompt sensitivity; this primacy assumption is load-bearing for the claimed paradigm shift.
[Four-stage framework] Four-stage framework: the progression to the 'unverifiable' stage treats noise as the dominant cause of unverifiability in LLM contexts, yet the framework supplies no concrete test, reference, or counter-example analysis distinguishing noise effects from inherent LLM attention or reasoning constraints, leaving the necessity of a denoising-first approach as an unverified hypothesis rather than a demonstrated necessity.

minor comments (2)

[Taxonomy] The taxonomy of signal-to-noise techniques would be strengthened by explicit citations or brief descriptions of representative methods for each pipeline stage, turning the taxonomy into a more immediately usable reference.
The domain-specific examples (lifelong assistant, coding agent, etc.) are listed at a high level; adding even one or two concrete performance deltas or failure cases from the cited works would help ground the discussion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments. As this is a perspective paper, we provide clarifications on the conceptual nature of our arguments and indicate revisions to address the concerns.

Referee: [Abstract] Abstract: the central assertion that denoising 'is becoming the primary bottleneck across the full information access pipeline' rests on a conceptual argument without any comparative quantification, ablation studies, or failure-mode analysis showing that noise accounts for more LLM failures than other factors such as base-model reasoning limits or prompt sensitivity; this primacy assumption is load-bearing for the claimed paradigm shift.

Authors: We acknowledge that our paper does not present new quantitative comparisons or ablations, which is consistent with its role as a perspective piece rather than an empirical study. The assertion draws from the fundamental properties of LLMs, including their limited context windows and proneness to hallucinations from noisy inputs, as opposed to human users. We support this with references to existing research on RAG and LLM failures. In revision, we will update the abstract to more explicitly position the denoising-first perspective as a hypothesis for the community to explore, and include additional discussion on how this differs from other bottlenecks like model reasoning limits. This is a partial revision focused on improving clarity and scope.
revision: partial

Referee: [Four-stage framework] Four-stage framework: the progression to the 'unverifiable' stage treats noise as the dominant cause of unverifiability in LLM contexts, yet the framework supplies no concrete test, reference, or counter-example analysis distinguishing noise effects from inherent LLM attention or reasoning constraints, leaving the necessity of a denoising-first approach as an unverified hypothesis rather than a demonstrated necessity.

Authors: The four-stage framework is designed to conceptualize the shifting challenges in IR as consumption moves to LLMs. The unverifiable stage emphasizes that even when information is accessible and aligned, noise can prevent effective verification and lead to unreliable outputs. We do not claim noise is the only factor but argue it becomes primary due to LLMs' sensitivity. The taxonomy section surveys techniques that address this. To respond to this comment, we will incorporate a brief analysis with references and potential counter-examples in the framework description to better delineate noise from other constraints. This revision will strengthen the presentation of the hypothesis.
revision: partial

Circularity Check

0 steps flagged

No circularity: conceptual perspective with no derivations or reductions

This perspective paper advances a denoising-first view of LLM-oriented IR through a four-stage conceptual framework (inaccessible to unverifiable) and a pipeline taxonomy of signal-to-noise techniques. It contains no equations, fitted parameters, predictions, or mathematical derivations that could reduce to inputs by construction. Claims rest on observed LLM attention limits and noise vulnerabilities rather than self-citations, ansatzes, or renamed empirical patterns; the argument is forward-looking and self-contained without load-bearing loops.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The perspective rests on domain assumptions about LLM behavior without new supporting measurements or independent validation.

axioms (1)

domain assumption: LLMs are constrained by limited attention budgets and are uniquely vulnerable to noise, where misleading information directly causes hallucinations and reasoning failures
Stated as the core premise motivating the paradigm shift in the abstract.

invented entities (1)

Four-stage framework of IR challenges (inaccessible, undiscoverable, misaligned, unverifiable) · no independent evidence
purpose: To conceptualize the progression of problems in LLM-oriented retrieval
Introduced in the abstract as the lens for the denoising perspective; no independent evidence provided.

pith-pipeline@v0.9.0 · 5721 in / 1184 out tokens · 46270 ms · 2026-05-21T00:14:08.601669+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear.

denoising—maximizing usable evidence density and verifiability within a context window—is becoming the primary bottleneck across the full information access pipeline... four-stage framework of IR challenges: from inaccessible to undiscoverable, to misaligned, and finally to unverifiable

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear.

taxonomy of signal-to-noise optimization techniques, spanning indexing, retrieval, context engineering, verification, and agentic workflow

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Venue (from the paper's running header): SIGIR ’26, July 20–24, 2026, Melbourne, VIC, Australia.

work page arXiv 2025
[72]

Zhengjun Huang, Zhoujin Tian, Qintian Guo, Fangyuan Zhang, Yingli Zhou, Di Jiang, Zeying Xie, and Xiaofang Zhou. 2025. LiCoMemory: Light- weight and Cognitive Agentic Memory for Efficient Long-Term Reasoning. arXiv:2511.01448 [cs.IR]

work page arXiv 2025
[73]

Daniel Huwiler, Kurt Stockinger, and Jonathan Fürst. 2025. VersionRAG: Version- Aware Retrieval-Augmented Generation for Evolving Documents.arXiv preprint arXiv:2510.08109(2025)

work page arXiv 2025
[74]

Taeho Hwang, Soyeong Jeong, Sukmin Cho, SeungYoon Han, and Jong C Park

work page
[75]

InProceedings of the 3rd Workshop on Knowledge Augmented Methods for NLP

DSLR: Document refinement with sentence-level re-ranking and recon- struction to enhance retrieval-augmented generation. InProceedings of the 3rd Workshop on Knowledge Augmented Methods for NLP. 73–92

work page
[76]

Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bo- janowski, Armand Joulin, and Edouard Grave. [n. d.]. Unsupervised Dense Information Retrieval with Contrastive Learning.Transactions on Machine Learning Research([n. d.])

work page
[77]

Gautier Izacard and Edouard Grave. 2021. Leveraging passage retrieval with generative models for open domain question answering. InProceedings of the 16th conference of the european chapter of the association for computational linguistics: main volume. 874–880

work page 2021
[78]

Abhinav Java, Ashmit Khandelwal, Sukruta Midigeshi, Aaron Halfaker, Amit Deshpande, Navin Goyal, Ankur Gupta, Nagarajan Natarajan, and Amit Sharma

work page
[79]

Characterizing deep research: A benchmark and formal definition.arXiv preprint arXiv:2508.04183(2025)

work page arXiv 2025
[80]

Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, and Jong C Park

work page


[1] Amro Abbas, Kushal Tirumala, Dániel Simig, Surya Ganguli, and Ari S Morcos. 2024. SemDeDup: Data-efficient Learning at Web-scale through Semantic Deduplication. In ICLR Workshop on Multimodal Representation Learning.

[2] Shubham Agarwal, Gaurav Sahu, Abhay Puri, Issam H Laradji, Krishnamurthy DJ Dvijotham, Jason Stanley, Laurent Charlin, and Christopher Pal. 2024. Litllm: A toolkit for scientific literature review. arXiv preprint arXiv:2402.01788 (2024).

[3] Chen Amiraz, Florin Cuconasu, Simone Filice, and Zohar Karnin. 2025. The distracting effect: Understanding irrelevant passages in rag. arXiv preprint arXiv:2505.06914 (2025).

[4] Antonis Antoniades, Albert Örwall, Kexun Zhang, Yuxi Xie, Anirudh Goyal, and William Yang Wang. 2025. SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net. https://openreview.net/forum?id=G7sIFXugTX

[5] Akari Asai, Jacqueline He, Rulin Shao, Weijia Shi, Amanpreet Singh, Joseph Chee Chang, Kyle Lo, Luca Soldaini, Sergey Feldman, Mike D’Arcy, et al. 2026. Synthesizing scientific literature with retrieval-augmented language models. Nature (2026), 1–7.

[6] Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. [n. d.]. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. In The Twelfth International Conference on Learning Representations.

[7] Yushi Bai, Xin Lv, Jiajie Zhang, Hongchang Lyu, Jiankai Tang, Zhidian Huang, Zhengxiao Du, Xiao Liu, Aohan Zeng, Lei Hou, et al. 2024. Longbench: A bilingual, multitask benchmark for long context understanding. In Proceedings of the 62nd annual meeting of the association for computational linguistics (volume 1: Long papers). 3119–3137.

[8] Sylvio Barbon Junior, Paolo Ceravolo, Sven Groppe, Mustafa Jarrar, Samira Maghool, Florence Sèdes, Soror Sahri, and Maurice Van Keulen. 2024. Are large language models the new interface for data pipelines? In Proceedings of the International Workshop on Big Data in Emergent Distributed Environments. 1–6.

[9] Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual web search engine. Computer networks and ISDN systems 30, 1-7 (1998), 107–117.

[10] Andrei Z Broder. 1997. On the resemblance and containment of documents. In Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No. 97TB100171). IEEE, 21–29.

[11] Sebastian Bruch, Siyu Gai, and Amir Ingber. 2023. An analysis of fusion functions for hybrid retrieval. ACM Transactions on Information Systems 42, 1 (2023), 1–35.

[12] Vannevar Bush et al. 1945. As we may think. The atlantic monthly 176, 1 (1945), 101–108.

[13] Adam Byerly and Daniel Khashabi. 2026. GOLD PANNING: Strategic Context Shuffling for Needle-in-Haystack Reasoning. arXiv:2510.09770 [cs.CL] https://arxiv.org/abs/2510.09770

[14] Jaime G. Carbonell and Jade Goldstein. 1998. The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries. In SIGIR ’98: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 24-28 1998, Melbourne, Australia, W. Bruce Croft, Alistair Moffat, C. J. van R... doi:10.1145/290941.291025

[15] Chi-Min Chan, Chunpu Xu, Ruibin Yuan, Hongyin Luo, Wei Xue, Yike Guo, and Jie Fu. [n. d.]. RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation. In First Conference on Language Modeling.

[16] Jiawei Chen, Hongyu Lin, Xianpei Han, and Le Sun. 2024. Benchmarking large language models in retrieval-augmented generation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 17754–17762.

[17] Jianlyu Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. 2024. M3-embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation. In Findings of the association for computational linguistics: ACL 2024. 2318–2335.

[19] Sizhe Chen, Julien Piet, Chawin Sitawarin, and David Wagner. 2025. StruQ: Defending against prompt injection with structured queries. In 34th USENIX Security Symposium (USENIX Security 25). 2383–2400.

[20] Tao Chen, Mingyang Zhang, Jing Lu, Michael Bendersky, and Marc Najork. Out-of-domain semantics to the rescue! zero-shot hybrid retrieval models. In European Conference on Information Retrieval. Springer, 95–110.

[22] Xiaoyang Chen, Ben He, Hongyu Lin, Xianpei Han, Tianshu Wang, Boxi Cao, Le Sun, and Yingfei Sun. 2024. Spiral of Silence: How is Large Language Model Killing Information Retrieval?—A Case Study on Open Domain Question Answering. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 14930–14951.

[23] Zehui Chen, Kuikun Liu, Qiuchen Wang, Jiangning Liu, Wenwei Zhang, Kai Chen, and Feng Zhao. [n. d.]. MindSearch: Mimicking Human Minds Elicits Deep AI Searcher. In The Thirteenth International Conference on Learning Representations.

[24] Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. 2024. Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases. Advances in Neural Information Processing Systems 37, 130185–130213.

[25] Xin Cheng, Xun Wang, Xingxing Zhang, Tao Ge, Si-Qing Chen, Furu Wei, Huishuai Zhang, and Dongyan Zhao. 2024. xrag: Extreme context compression for retrieval-augmented generation with one token. Advances in Neural Information Processing Systems 37, 109487–109516.

[26] Nadezhda Chirkova, Thibault Formal, Vassilina Nikoulina, and Stéphane Clinchant. [n. d.]. Provence: efficient and robust context pruning for retrieval-augmented generation. In The Thirteenth International Conference on Learning Representations.

[27] Content Credentials. 2025. C2PA Technical Specification v2.2.

[28] W Bruce Croft, Donald Metzler, Trevor Strohman, et al. 2010. Search engines: Information retrieval in practice. Vol. 520. Addison-Wesley Reading.

[29] Florin Cuconasu, Giovanni Trappolini, Federico Siciliano, Simone Filice, Cesare Campagnano, Yoelle Maarek, Nicola Tonellotto, and Fabrizio Silvestri. 2024. The Power of Noise: Redefining Retrieval for RAG Systems. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 719–729.

[30] Lu Dai, Hao Liu, and Hui Xiong. 2024. Improve Dense Passage Retrieval with Entailment Tuning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 11375–11387.

[31] Lu Dai, Yijie Xu, Jinhui Ye, Hao Liu, and Hui Xiong. 2025. Seper: Measure retrieval utility through the lens of semantic perplexity reduction. arXiv preprint arXiv:2503.01478 (2025).

[32] Sunhao Dai, Weihao Liu, Yuqi Zhou, Liang Pang, Rongju Ruan, Gang Wang, Zhenhua Dong, Jun Xu, and Ji-Rong Wen. 2024. Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration. In Findings of the Association for Computational Linguistics ACL 2024. 7052–7074.

[33] Sumanth Dathathri, Abigail See, Sumedh Ghaisas, Po-Sen Huang, Rob McAdam, Johannes Welbl, Vandana Bachani, Alex Kaskasoli, Robert Stanforth, Tatiana Matejovicova, et al. 2024. Scalable watermarking for identifying large language model outputs. Nature 634, 8035 (2024), 818–823.

[34] Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, and Jason E Weston. 2024. Chain-of-Verification Reduces Hallucination in Large Language Models. In ICLR 2024 Workshop on Reliable and Responsible Foundation Models.

[35] Laxman Dhulipala, Majid Hadian, Rajesh Jayaram, Jason Lee, and Vahab Mirrokni. 2024. Muvera: Multi-vector retrieval via fixed dimensional encoding. Advances in Neural Information Processing Systems 37, 101042–101073.

[36] Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Hantian Ding, Ming Tan, Nihal Jain, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, and Bing Xiang. 2023. CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion. In NeurIPS. arXiv:2310.11248 [cs.LG] http://arxiv.org/abs/2310.11248v2

[37] Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. 2025. Memory Injection Attacks on LLM Agents via Query-Only Interaction. arXiv:2503.03704 [cs.LG]

[38] Mingxuan Du, Benfeng Xu, Chiwei Zhu, Xiaorui Wang, and Zhendong Mao. 2025. DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents. arXiv preprint arXiv:2506.11763 (2025).

[40] Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. 2024. From local to global: A graph rag approach to query-focused summarization. arXiv preprint arXiv:2404.16130 (2024).

[41] Shahul ES, Jithin James, Luis Espinosa Anke, and Steven Schockaert. 2023. RAGAs: Automated Evaluation of Retrieval Augmented Generation. Conference of the European Chapter of the Association for Computational Linguistics abs/2309.15217 (2023).

[42] Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani, Gautier Viaud, Celine Hudelot, and Pierre Colombo. [n. d.]. ColPali: Efficient Document Retrieval with Vision Language Models. In The Thirteenth International Conference on Learning Representations.

[43] Thibault Formal, Benjamin Piwowarski, and Stéphane Clinchant. 2021. SPLADE: Sparse lexical and expansion model for first stage ranking. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2288–2292.

[44] Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Caifeng Shan, Ran He, and Xing Sun. 2025. Video-mme: The first-ever comprehensive evaluation benchmark of multi-modal llms in video analysi...

[45] Yisong Fu, Zezhi Shao, Chengqing Yu, Yujie Li, Zhulin An, Cheems Wang, Yongjun Xu, and Fei Wang. 2025. Selective Learning for Deep Time Series Forecasting. In The Thirty-ninth Annual Conference on Neural Information Processing Systems. https://openreview.net/forum?id=kgzRy6nD6D

[46] Jiyang Gao, Chen Sun, Zhenheng Yang, and Ram Nevatia. 2017. Tall: Temporal activity localization via language query. In Proceedings of the IEEE international conference on computer vision. 5267–5275.

SIGIR ’26, July 20–24, 2026, Melbourne, VIC, Australia. Lu Dai et al.

[47] Luyu Gao, Xueguang Ma, Jimmy Lin, and Jamie Callan. 2023. Precise Zero-Shot Dense Retrieval without Relevance Labels. In Annual Meeting of the Association for Computational Linguistics (ACL). 1762–1777.

[48] Tianyu Gao, Howard Yen, Jiatong Yu, and Danqi Chen. 2023. Enabling large language models to generate text with citations. arXiv preprint arXiv:2305.14627 (2023).

[49] Tao Ge, Hu Jing, Lei Wang, Xun Wang, Si-Qing Chen, and Furu Wei. [n. d.]. In-context Autoencoder for Context Compression in a Large Language Model. In The Twelfth International Conference on Learning Representations.

[50] Sebastian Gehrmann, Hendrik Strobelt, and Alexander M Rush. 2019. GLTR: Statistical Detection and Visualization of Generated Text. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 111–116.

[51] Gregory Hok Tjoan Go, Khang Ly, Anders Søgaard, Amin Tabatabaei, Maarten de Rijke, and Xinyi Chen. 2025. LiRA: A Multi-Agent Framework for Reliable and Readable Literature Review Generation. arXiv preprint arXiv:2510.05138 (2025).

[52] Alon Gorenshtein, Kamel Shihada, Moran Sorka, Dvir Aran, and Shahar Shelly. 2025. LITERAS: Biomedical literature review and citation retrieval agents. Computers in Biology and Medicine 192 (2025), 110363.

[54] Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM workshop on artificial intelligence and security. 79–90.

[55] Yongxin Guo, Jingyu Liu, Mingda Li, Dingxin Cheng, Xiaoying Tang, Dianbo Sui, Qingbin Liu, Xi Chen, and Kevin Zhao. 2025. Vtg-llm: Integrating timestamp knowledge into video llms for enhanced video temporal grounding. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 3302–3310.

[56] Yucan Guo, Miao Su, Saiping Guan, Zihao Sun, Xiaolong Jin, Jiafeng Guo, and Xueqi Cheng. 2025. RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning. arXiv preprint arXiv:2512.09487 (2025).

[57] Zirui Guo, Lianghao Xia, Yanhua Yu, Tu Ao, and Chao Huang. 2024. Lightrag: Simple and fast retrieval-augmented generation. arXiv preprint arXiv:2410.05779 (2024).

[58] Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, and Yu Su. 2024. HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models. In NeurIPS. arXiv:2405.14831 [cs.CL]

[60] Rujun Han, Yanfei Chen, Zoey CuiZhu, Lesly Miculicich, Guan Sun, Yuanjun Bi, Weiming Wen, Hui Wan, Chunfeng Wen, Solène Maître, et al. 2025. Deep researcher with test-time diffusion. arXiv preprint arXiv:2507.16075 (2025).

[61] Sungwon Han, Seungeon Lee, Meeyoung Cha, Sercan O Arik, and Jinsung Yoon. Retrieval Augmented Time Series Forecasting. In Forty-second International Conference on Machine Learning.

[63] Bowei He, Minda Hu, Zenan Xu, Hongru Wang, Licheng Zong, Yankai Chen, Chen Ma, Xue Liu, Pluto Zhou, and Irwin King. 2026. Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration. arXiv preprint arXiv:2602.03647 (2026).

[64] Xiaoxin He, Yijun Tian, Yifei Sun, Nitesh Chawla, Thomas Laurent, Yann LeCun, Xavier Bresson, and Bryan Hooi. 2024. G-retriever: Retrieval-augmented generation for textual graph understanding and question answering. Advances in Neural Information Processing Systems 37 (2024), 132876–132907.

[65] Yufang Hou, Alessandra Pascale, Javier Carnerero-Cano, Tigran Tchrakian, Radu Marinescu, Elizabeth Daly, Inkit Padhi, and Prasanna Sattigeri. 2024. Wikicontradict: A benchmark for evaluating llms on real-world knowledge conflicts from wikipedia. Advances in Neural Information Processing Systems 37 (2024), 109701–109747.

[66] Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, et al. 2024. Found in the middle: Calibrating positional attention bias improves long context utilization. In Findings of the Association for Computational Linguistics: ACL 2024. 14982–14995.

[67] Tiansheng Hu, Yilun Zhao, Canyu Zhang, Arman Cohan, and Chen Zhao. 2026. SAGE: Benchmarking and Improving Retrieval for Deep Research Agents. arXiv preprint arXiv:2602.05975 (2026).

[68] Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, and Liang Zhao. 2024. GRAG: Graph Retrieval-Augmented Generation. arXiv preprint arXiv:2405.16506 (2024).

[69] Chao-Wei Huang, Chen-Yu Hsu, Tsu-Yuan Hsu, Chen-An Li, and Yun-Nung Chen. 2023. CONVERSER: Few-shot Conversational Dense Retrieval with Synthetic Data Generation. In SIGdial Meetings (SIGDIAL). 381–387.

[70] Haoyu Huang, Yongfeng Huang, Junjie Yang, Zhenyu Pan, Yongchao Chen, Kaixiong Ma, Hongzhi Chen, and Jiawei Cheng. 2025. Retrieval-Augmented Generation with Hierarchical Knowledge. In Findings of the Association for Computational Linguistics: EMNLP 2025.

[71] Yuxuan Huang, Yihang Chen, Haozheng Zhang, Kang Li, Huichi Zhou, Meng Fang, Linyi Yang, Xiaoguang Li, Lifeng Shang, Songcen Xu, et al. 2025. Deep research agents: A systematic examination and roadmap. arXiv preprint arXiv:2506.18096 (2025).

[72] Zhengjun Huang, Zhoujin Tian, Qintian Guo, Fangyuan Zhang, Yingli Zhou, Di Jiang, Zeying Xie, and Xiaofang Zhou. 2025. LiCoMemory: Lightweight and Cognitive Agentic Memory for Efficient Long-Term Reasoning. arXiv:2511.01448 [cs.IR]

[73] Daniel Huwiler, Kurt Stockinger, and Jonathan Fürst. 2025. VersionRAG: Version-Aware Retrieval-Augmented Generation for Evolving Documents. arXiv preprint arXiv:2510.08109 (2025).

[74] Taeho Hwang, Soyeong Jeong, Sukmin Cho, SeungYoon Han, and Jong C Park. DSLR: Document refinement with sentence-level re-ranking and reconstruction to enhance retrieval-augmented generation. In Proceedings of the 3rd Workshop on Knowledge Augmented Methods for NLP. 73–92.

[76] Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave. [n. d.]. Unsupervised Dense Information Retrieval with Contrastive Learning. Transactions on Machine Learning Research ([n. d.]).

[77] Gautier Izacard and Edouard Grave. 2021. Leveraging passage retrieval with generative models for open domain question answering. In Proceedings of the 16th conference of the european chapter of the association for computational linguistics: main volume. 874–880.

[78] Abhinav Java, Ashmit Khandelwal, Sukruta Midigeshi, Aaron Halfaker, Amit Deshpande, Navin Goyal, Ankur Gupta, Nagarajan Natarajan, and Amit Sharma. 2025. Characterizing deep research: A benchmark and formal definition. arXiv preprint arXiv:2508.04183 (2025).

[80] Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, and Jong C Park