pith. machine review for the scientific record.

arxiv: 2605.00505 · v1 · submitted 2026-05-01 · 💻 cs.IR · cs.AI · cs.CL

Recognition: unknown

LLM-Oriented Information Retrieval: A Denoising-First Perspective

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 18:50 UTC · model grok-4.3

classification 💻 cs.IR · cs.AI · cs.CL
keywords information retrieval · large language models · denoising · retrieval-augmented generation · signal-to-noise · hallucinations · context window · verification

The pith

Denoising to maximize evidence density and verifiability becomes the central task in information retrieval for large language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that large language models, when consuming retrieved information through retrieval-augmented generation, face a new bottleneck because their limited attention makes noise a direct cause of hallucinations and reasoning failures. This matters if true because traditional relevance methods no longer suffice; the system must instead prioritize cleaning and verifying evidence inside fixed context windows. The authors trace IR challenges through four stages from inaccessible information to unverifiable evidence and organize signal-to-noise techniques into a pipeline taxonomy that covers indexing, retrieval, context engineering, verification, and agentic workflows. They show applications in retrieval-heavy domains such as coding agents and multimodal understanding. A sympathetic reader would see this as a call to redesign the full information access pipeline around machine consumption limits rather than human reading habits.

Core claim

The central claim is that denoising—maximizing usable evidence density and verifiability within a context window—is becoming the primary bottleneck across the full information access pipeline. The authors conceptualize the paradigm shift via a four-stage framework of challenges running from inaccessible to undiscoverable, to misaligned, and finally to unverifiable. They supply a pipeline-organized taxonomy of signal-to-noise optimization methods and review concrete work in domains that depend on retrieval such as lifelong assistants, coding agents, deep research, and multimodal understanding.
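The pipeline stages the taxonomy names can be made concrete with a toy sketch. Everything below is illustrative scaffolding, not code from the paper: each stage is a trivial lexical stand-in for the real techniques (semantic deduplication, dense retrieval, prompt compression, claim verification) the survey covers.

```python
# Toy sketch of a denoising-first pipeline. Stage names follow the
# review's taxonomy; the implementations are deliberately simplistic.

def index(corpus):
    # Controlled indexing: drop near-duplicate documents up front.
    seen, kept = set(), []
    for doc in corpus:
        key = " ".join(doc.lower().split())
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept

def retrieve(query, docs, k=3):
    # Robust-retrieval stand-in: rank by term overlap with the query.
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def engineer_context(docs, budget=50):
    # Context engineering: pack top-ranked evidence into a word budget,
    # maximizing evidence density inside a fixed "context window".
    packed, used = [], 0
    for d in docs:
        n = len(d.split())
        if used + n <= budget:
            packed.append(d)
            used += n
    return packed

def verify(docs, query):
    # Verification stand-in: keep passages sharing >= 2 query terms.
    q = set(query.lower().split())
    return [d for d in docs if len(q & set(d.lower().split())) >= 2]

def denoising_pipeline(query, corpus):
    # An agentic workflow would loop these stages; one pass shown here.
    return verify(engineer_context(retrieve(query, index(corpus))), query)
```

On a four-document corpus with one duplicate and one off-topic passage, only the passages that actually overlap the query survive the final verification stage.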

What carries the argument

The four-stage framework that maps IR challenges from inaccessible information through undiscoverable, misaligned, and unverifiable stages, with denoising as the mechanism that raises usable evidence density and verifiability inside limited context windows.

If this is right

  • Relevance ranking by itself becomes insufficient to support reliable LLM performance in retrieval-augmented generation.
  • Indexing, retrieval, context engineering, and verification stages must all incorporate explicit signal-to-noise optimization.
  • Domains such as coding agents and deep research require new techniques that ensure evidence remains verifiable inside context windows.
  • Agentic workflows gain from treating denoising as a core, pipeline-wide activity rather than an optional post-processing step.
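The first bullet can be made concrete with a toy reranker that scores evidence density (matching terms per token) instead of raw match count, so a short dense passage beats a long noisy one with the same number of hits. The density metric is an illustrative assumption, not a method from the paper.

```python
# Illustrative signal-to-noise reranking: score passages by evidence
# density rather than raw relevance hits. Hypothetical heuristic only.

def density(passage, query_terms):
    # Fraction of the passage's tokens that match the query.
    words = passage.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in query_terms)
    return hits / len(words)

def rerank_by_density(passages, query):
    terms = set(query.lower().split())
    return sorted(passages, key=lambda p: density(p, terms), reverse=True)
```

Two passages with identical match counts then rank by how much noise surrounds the matches, which is the point of moving beyond relevance alone.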

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Evaluation benchmarks for LLM-oriented IR could shift from measuring relevance alone to measuring downstream effects on hallucination rates and reasoning accuracy.
  • Agentic systems might standardize iterative denoising loops that repeatedly filter and re-verify evidence before final generation.
  • If the shift holds, separate IR stacks may emerge for human users who tolerate noise and machine users who do not.
  • Multimodal and lifelong-assistant settings could test whether the same density-and-verifiability goals apply when evidence spans text, code, and images.
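The iterative-denoising speculation in the second bullet might be sketched as a loop that drops the weakest passage and re-verifies the remainder until everything passes. The verification check here (a term-overlap threshold) is a deliberately crude stand-in for a real verifier.

```python
# Hypothetical iterative denoise-and-reverify loop; names and the
# overlap-based verifier are illustrative, not from the paper.

def verify_passage(passage, query_terms):
    # Stand-in verifier: "verified" means >= 2 shared query terms.
    return len(query_terms & set(passage.lower().split())) >= 2

def iterative_denoise(passages, query, max_rounds=10):
    terms = set(query.lower().split())
    pool = list(passages)
    for _ in range(max_rounds):
        failing = [p for p in pool if not verify_passage(p, terms)]
        if not failing:
            return pool  # every surviving passage verified
        # Drop the passage with the least query overlap, then re-check.
        pool.remove(min(failing,
                        key=lambda p: len(terms & set(p.lower().split()))))
    return pool
```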

Load-bearing premise

That the limited attention budgets and noise vulnerability of LLMs create a fundamental paradigm shift in IR that requires an entirely new denoising-first framework rather than extensions of existing relevance techniques.

What would settle it

A controlled comparison in which standard relevance-ranked retrieval, without extra denoising steps, produces hallucination rates and reasoning success in RAG systems that match those achieved by dedicated signal-to-noise methods.
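A minimal scaffold for such a comparison could look like the following. The hallucination judge (gold-fact substring matching) and both pipelines are placeholders for real systems; the paper's thesis predicts the denoising arm produces the lower rate, and parity would undermine it.

```python
# Hypothetical evaluation scaffold: run the same questions through a
# relevance-only pipeline and a denoising pipeline, one shared judge.

def hallucination_rate(answers, gold_facts):
    # Crude judge: an answer "hallucinates" if it omits the gold fact.
    misses = sum(1 for a, g in zip(answers, gold_facts)
                 if g.lower() not in a.lower())
    return misses / len(answers)

def compare(questions, gold_facts, relevance_pipeline, denoise_pipeline):
    base = hallucination_rate([relevance_pipeline(q) for q in questions],
                              gold_facts)
    deno = hallucination_rate([denoise_pipeline(q) for q in questions],
                              gold_facts)
    return {"relevance_only": base, "denoising": deno}
```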

Figures

Figures reproduced from arXiv: 2605.00505 by Cehao Yang, Fanpu Cao, Hao Liu, Hui Xiong, Liang Sun, Lu Dai, Ziyang Rao.

Figure 1
Figure 1. Challenge shifts in the history of IR. view at source ↗
Figure 2
Figure 2. Empirical validation of the denoising-first perspective. view at source ↗
Figure 3
Figure 3. A multi-level denoising taxonomy aligned with the five-stage Section 3 pipeline: Controlled Indexing (§3.1), Robust… view at source ↗
Original abstract

Modern information retrieval (IR) is no longer consumed primarily by humans but increasingly by large language models (LLMs) via retrieval-augmented generation (RAG) and agentic search. Unlike human users, LLMs are constrained by limited attention budgets and are uniquely vulnerable to noise; misleading or irrelevant information is no longer just a nuisance, but a direct cause of hallucinations and reasoning failures. In this perspective paper, we argue that denoising (maximizing usable evidence density and verifiability within a context window) is becoming the primary bottleneck across the full information access pipeline. We conceptualize this paradigm shift through a four-stage framework of IR challenges: from inaccessible to undiscoverable, to misaligned, and finally to unverifiable. Furthermore, we provide a pipeline-organized taxonomy of signal-to-noise optimization techniques, spanning indexing, retrieval, context engineering, verification, and agentic workflow. We also present research works on information denoising in domains that rely heavily on retrieval such as lifelong assistant, coding agent, deep research, and multimodal understanding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper argues that in LLM-oriented information retrieval via RAG and agentic search, denoising—maximizing usable evidence density and verifiability within context windows—is becoming the primary bottleneck across the information access pipeline. It introduces a four-stage framework (inaccessible to undiscoverable to misaligned to unverifiable) and a pipeline-organized taxonomy of signal-to-noise techniques spanning indexing, retrieval, context engineering, verification, and agentic workflows, with examples from domains such as lifelong assistants, coding agents, deep research, and multimodal understanding.

Significance. If the perspective holds, it could usefully reorient IR research toward LLM-specific denoising priorities, organizing existing RAG mitigations into a coherent taxonomy and highlighting applications in retrieval-heavy domains. The absence of empirical validation, derivations, or comparative analysis limits immediate impact, but the framework provides a conceptual lens that could stimulate targeted follow-up work.

major comments (3)
  1. [Abstract] The claim that LLMs' limited attention budgets and noise vulnerability create a fundamental paradigm shift requiring a denoising-first framework (rather than incremental extensions of relevance/quality techniques) is asserted without evidence or analysis distinguishing it from classic IR problems.
  2. [Four-stage framework] The progression from inaccessible to undiscoverable, misaligned, and unverifiable maps directly onto traditional recall, precision, and credibility issues; the manuscript provides no demonstration that LLM attention limits introduce failure modes not addressable by refining existing filtering and verification methods.
  3. [Taxonomy] The pipeline-organized taxonomy of signal-to-noise methods (indexing through agentic workflows) largely recategorizes known RAG mitigations such as reranking and context compression without comparative analysis showing why denoising has become primary over other bottlenecks like coverage or latency.
minor comments (1)
  1. [Taxonomy] The manuscript would benefit from explicit pointers to prior surveys on RAG noise mitigation to clarify the incremental contribution of the proposed taxonomy.
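For readers who want the mitigations the report names pinned down, context compression can be sketched in a few lines: keep only the sentences that overlap the query, shrinking the prompt before generation. This sentence-filtering heuristic is illustrative only and far cruder than real prompt-compression systems.

```python
# Illustrative context compression: retain the top fraction of
# sentences by query-term overlap. A toy stand-in, not a real method.

def compress_context(passages, query, keep_ratio=0.5):
    terms = set(query.lower().split())
    sentences = [s.strip() for p in passages
                 for s in p.split(".") if s.strip()]
    ranked = sorted(sentences,
                    key=lambda s: len(terms & set(s.lower().split())),
                    reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return ". ".join(ranked[:k]) + "."
```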

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our perspective paper. We address each major comment below, providing clarifications and indicating planned revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The claim that LLMs' limited attention budgets and noise vulnerability create a fundamental paradigm shift requiring a denoising-first framework (rather than incremental extensions of relevance/quality techniques) is asserted without evidence or analysis distinguishing it from classic IR problems.

    Authors: As this is a perspective paper, the argument is conceptual and draws on observed trends in the literature. We differentiate from classic IR by emphasizing that LLMs lack the human ability to selectively attend and ignore noise within a fixed context window, leading to direct impacts on generation quality. We will revise the abstract and introduction to include specific citations and brief analysis of studies demonstrating LLM vulnerability to noise beyond traditional relevance measures. revision: partial

  2. Referee: [Four-stage framework] The progression from inaccessible to undiscoverable, misaligned, and unverifiable maps directly onto traditional recall, precision, and credibility issues; the manuscript provides no demonstration that LLM attention limits introduce failure modes not addressable by refining existing filtering and verification methods.

    Authors: While there is overlap with traditional issues, the framework highlights how LLM attention constraints create sequential dependencies where failure at earlier stages (e.g., undiscoverable due to noise) cannot be mitigated by later verification. We will add illustrative examples and references in the framework section to demonstrate these LLM-specific failure modes. revision: partial

  3. Referee: [Taxonomy] The pipeline-organized taxonomy of signal-to-noise methods (indexing through agentic workflows) largely recategorizes known RAG mitigations such as reranking and context compression without comparative analysis showing why denoising has become primary over other bottlenecks like coverage or latency.

    Authors: The taxonomy reorganizes techniques to underscore denoising as the central challenge in LLM consumption. We will enhance the taxonomy section with a discussion on why denoising is primary, supported by references to recent RAG surveys that identify noise and verifiability as key remaining issues after improvements in retrieval coverage and efficiency. revision: partial

Circularity Check

0 steps flagged

No circularity: conceptual taxonomy organizes existing techniques without self-referential reduction

Full rationale

The paper is a perspective piece that proposes a four-stage framework and taxonomy of signal-to-noise techniques drawn from standard IR and LLM literature. No equations, fitted parameters, or derivations are present that could reduce by construction to the paper's own inputs. The central claim is an argumentative reframing of attention limits and noise vulnerability as a primary bottleneck, supported by references to prior work rather than self-citation chains or uniqueness theorems imported from the authors. The taxonomy spans indexing through agentic workflows by recategorizing known methods (reranking, compression, verification) under a new lens, but this is explicit organization rather than a mathematical or definitional loop. The derivation chain is self-contained as a high-level synthesis with no load-bearing steps that equate outputs to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper relies on domain assumptions about LLM behavior and introduces a conceptual framework without free parameters, new physical entities, or independent evidence for invented constructs.

axioms (1)
  • Domain assumption: LLMs have limited attention budgets and are uniquely vulnerable to noise in retrieved contexts, causing hallucinations and reasoning failures.
    Invoked in the abstract as the foundation for declaring denoising the primary bottleneck.
invented entities (1)
  • Four-stage framework (inaccessible → undiscoverable → misaligned → unverifiable): no independent evidence
    purpose: To conceptualize the progression of IR challenges for LLMs
    Newly introduced organizational structure without independent falsifiable evidence outside the paper.

pith-pipeline@v0.9.0 · 5490 in / 1319 out tokens · 55379 ms · 2026-05-09T18:50:00.916148+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

240 extracted references · 72 canonical work pages · 13 internal anchors
