
arxiv: 2407.13193 · v4 · pith:OQ274GJUnew · submitted 2024-07-18 · 💻 cs.CL

Retrieval-Augmented Generation for Natural Language Processing: A Survey

Pith reviewed 2026-05-23 23:04 UTC · model grok-4.3

classification 💻 cs.CL
keywords retrieval-augmented generation · large language models · natural language processing · survey · retrieval fusion · taxonomy · evaluation methodologies

The pith

Retrieval-augmented generation fuses external knowledge with large language models using four distinct fusion approaches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey examines how retrieval-augmented generation helps large language models overcome issues like hallucinations and lack of up-to-date knowledge. It concentrates on retrievers and the ways they integrate with generation processes. The authors present a new classification of these integration methods into query-based, logits-based, latent, and parametric fusion. Structured comparisons highlight differences in how accessible and efficient each approach is for various applications. The review also covers how RAG is applied to different natural language processing tasks and points to ongoing challenges in deployment.

Core claim

The paper introduces a novel taxonomy of retrieval fusions with four categories (query-based, logits-based, latent, and parametric fusion) and provides structured comparisons across accessibility, efficiency, and use cases. It examines RAG applications across diverse NLP tasks, discusses evaluation methodologies and benchmark limitations, and analyzes training paradigms with and without knowledge base updates. Finally, it explores industrial deployment considerations and identifies emerging challenges and future directions, including security, efficiency, and graph-based retrieval.

What carries the argument

The taxonomy of retrieval fusions that categorizes methods as query-based, logits-based, latent, or parametric.
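To make two of the four categories concrete, the following is a minimal sketch, not the survey's own formulation. It assumes that query-based fusion means placing retrieved text into the generator's input, and that logits-based fusion means mixing the generator's next-token distribution with one derived from retrieved neighbours. The function names, the prompt template, and the mixing weight `lam` are all hypothetical.

```python
def query_based_fusion(query, retrieved_passages):
    """Assumed query-based fusion: retrieved text is concatenated with the query."""
    context = "\n".join(
        f"[{i + 1}] {passage}" for i, passage in enumerate(retrieved_passages)
    )
    # Hypothetical prompt template; any layout that exposes the passages would do.
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def logits_based_fusion(lm_probs, retrieval_probs, lam=0.25):
    """Assumed logits-based fusion: interpolate two next-token distributions.

    lm_probs and retrieval_probs map tokens to probabilities; lam is the
    (hypothetical) weight given to the retrieval-derived distribution.
    """
    vocab = set(lm_probs) | set(retrieval_probs)
    return {
        token: (1 - lam) * lm_probs.get(token, 0.0)
        + lam * retrieval_probs.get(token, 0.0)
        for token in vocab
    }


prompt = query_based_fusion(
    "Who wrote Hamlet?",
    ["Hamlet is a tragedy by William Shakespeare."],
)
fused = logits_based_fusion({"Paris": 0.6, "Lyon": 0.4}, {"Paris": 1.0}, lam=0.5)
```

The contrast is where the retrieved evidence enters: the first function changes only the input and so works with any black-box generator, while the second needs access to the generator's output distribution. Latent and parametric fusion are omitted here because they require access to model internals.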

If this is right

  • Different fusion types suit different NLP tasks based on their accessibility and efficiency profiles.
  • Comparisons across the categories can guide choices between methods that modify queries, adjust outputs, work in latent spaces, or update parameters.
  • Training paradigms differ when knowledge bases are updated versus when they remain fixed.
  • Evaluation methodologies must account for benchmark limitations when testing RAG systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The taxonomy could support creation of new hybrid fusion methods that draw from multiple categories.
  • Emphasis on graph-based retrieval points to possible future extensions of the categories to handle structured data.
  • Industrial deployment analysis suggests efficiency metrics will shape practical adoption of specific RAG variants.

Load-bearing premise

The selected literature and proposed taxonomy comprehensively cover the RAG field without significant omissions or selection bias in the reviewed papers.

What would settle it

Discovery of a widely used retrieval fusion technique that does not fit into any of the four categories of query-based, logits-based, latent, or parametric fusion.

Figures

Figures reproduced from arXiv: 2407.13193 by Can Chen, Chun Jason Xue, Haolun Wu, Lianming Huang, Nan Guan, Shangyu Wu, Tei-Wei Kuo, Xue Liu, Ye Yuan, Ying Xiong, Yufei Cui.

Figure 1: The overview of retrieval-augmented generation for natural language processing. The inputs as queries are fed into …
Figure 2: Two stages of using the retriever.
Figure 3: The categories of fusion methods in RAG.
Figure 4: Different RAG training strategies with/without datastore update.
Original abstract

Large language models (LLMs) have achieved strong empirical performance in various fields, benefiting from their huge amount of parameters that store knowledge. However, LLMs still suffer from several key issues, such as hallucination problems, knowledge update issues, and lacking domain-specific expertise. The appearance of retrieval-augmented generation (RAG), which leverages an external knowledge base to augment LLMs, mitigates these limitations. This paper presents a systematic review of RAG techniques for natural language processing (NLP), with a focus on retrievers and retrieval fusions. We introduce a novel taxonomy of retrieval fusions, such as query-based, logits-based, latent, and parametric fusion, and provide structured comparisons across accessibility, efficiency, and use cases. The paper further examines RAG applications across diverse NLP tasks, discusses evaluation methodologies and benchmark limitations, and analyzes training paradigms with and without knowledge base updates. Finally, we explore industrial deployment considerations and identify emerging challenges and future directions, including security, efficiency, and graph-based retrieval.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript is a survey on retrieval-augmented generation (RAG) for NLP. It reviews retrievers and retrieval fusion techniques, introduces a novel taxonomy classifying fusions as query-based, logits-based, latent, and parametric, and supplies structured comparisons on accessibility, efficiency, and use cases. The paper additionally covers RAG applications across NLP tasks, evaluation methodologies and benchmark limitations, training paradigms (with/without KB updates), industrial deployment, and emerging challenges including security, efficiency, and graph-based retrieval.

Significance. If the taxonomy is comprehensive and the reviewed literature representative, the work supplies a useful organizing framework for an active area, highlighting how external retrieval mitigates LLM hallucinations and knowledge staleness. The cross-dimensional comparisons and discussion of training/industrial aspects could aid both researchers selecting fusion strategies and practitioners deploying RAG systems.

major comments (2)
  1. [Introduction / taxonomy introduction section] The abstract and introduction assert a 'systematic review' and 'novel taxonomy' of retrieval fusions, yet no section describes the paper-selection protocol (databases, search strings, date range, inclusion criteria, or coverage audit). Without this, the claim that the four fusion categories plus comparisons are exhaustive cannot be verified and risks selection bias.
  2. [Section presenting taxonomy and comparisons] The structured comparisons of the four fusion types on accessibility, efficiency, and use cases rest on the reviewed papers; if any hybrid or unclassified fusion mechanisms from the 2023–2024 literature are omitted, the comparative tables or discussion become incomplete. The manuscript should either demonstrate exhaustive coverage or qualify the scope of the comparisons.
minor comments (2)
  1. Ensure every work cited in support of the taxonomy or comparisons appears in the reference list with consistent formatting and DOIs where available.
  2. [Comparisons subsection] Clarify whether the 'structured comparisons' are qualitative summaries or include any quantitative meta-analysis of reported metrics across papers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation of minor revision. The comments highlight important aspects of transparency in our survey methodology and the scope of the taxonomy. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Introduction / taxonomy introduction section] The abstract and introduction assert a 'systematic review' and 'novel taxonomy' of retrieval fusions, yet no section describes the paper-selection protocol (databases, search strings, date range, inclusion criteria, or coverage audit). Without this, the claim that the four fusion categories plus comparisons are exhaustive cannot be verified and risks selection bias.

    Authors: We acknowledge that the manuscript does not contain an explicit section detailing the literature selection protocol. Our review was conducted by surveying prominent papers on RAG and fusion techniques from sources such as arXiv, ACL Anthology, and major NLP conferences up to early 2024, but this process was not formally documented. To address the concern, we will add a dedicated paragraph in the introduction (or a new 'Scope and Methodology' subsection) describing the search strategy, key terms used, date range, and inclusion focus on retrievers and fusion methods. This addition will clarify the basis for the taxonomy without altering the core claims. revision: yes

  2. Referee: [Section presenting taxonomy and comparisons] The structured comparisons of the four fusion types on accessibility, efficiency, and use cases rest on the reviewed papers; if any hybrid or unclassified fusion mechanisms from the 2023–2024 literature are omitted, the comparative tables or discussion become incomplete. The manuscript should either demonstrate exhaustive coverage or qualify the scope of the comparisons.

    Authors: The taxonomy organizes fusion approaches into four primary categories based on the dominant mechanisms identified across the literature, with the comparisons derived from representative papers in each category. Hybrids are noted where they align with multiple categories. To respond to this point, we will perform a targeted check of additional 2023–2024 papers for any mechanisms that do not fit the taxonomy. If omissions are identified, we will either extend the taxonomy discussion or add explicit qualification of the comparisons' scope (e.g., 'reflecting the primary paradigms in the surveyed works'). This will be reflected in an updated version of the relevant section and tables. revision: partial

Circularity Check

0 steps flagged

No circularity: survey compiles literature without derivations or self-referential reductions


This is a survey paper with no equations, fitted parameters, or derivation chain. The central contribution is a proposed taxonomy of retrieval fusions (query-based, logits-based, latent, parametric) and structured comparisons, drawn from reviewed prior work. No step reduces by construction to its own inputs, no self-citation is load-bearing for a mathematical claim, and no uniqueness theorem or ansatz is invoked. The work is self-contained as an organizational review; literature selection is an explicit methodological choice rather than a hidden circular fit.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey of existing literature, the paper introduces no new free parameters, axioms, or invented entities.



Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Retrieval as a Decision: Training-Free Adaptive Gating for Efficient RAG

    cs.CL 2025-11 conditional novelty 7.0

    TARG uses uncertainty scores from a short no-context draft to gate retrieval in RAG, matching Always-RAG accuracy while cutting retrievals by 70-90% on QA benchmarks.

  2. Beyond Semantic Relevance: Counterfactual Risk Minimization for Robust Retrieval-Augmented Generation

    cs.CL 2026-05 unverdicted novelty 6.0

    CoRM-RAG uses a cognitive perturbation protocol to simulate biases and trains an Evidence Critic to retrieve documents that support correct decisions even under adversarial query changes.

  3. When Model Editing Meets Service Evolution: A Knowledge-Update Perspective for Service Recommendation

    cs.SE 2026-04 unverdicted novelty 6.0

    EVOREC integrates locate-then-edit model editing with FA-constrained decoding to improve LLM-based service recommendation under evolution, reporting 25.9% average relative gain in Recall@5 over baselines and 22.3% ove...

  4. From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling

    math.OC 2026-04 unverdicted novelty 6.0

    Agora-Opt uses decentralized debate among LLM agent teams plus a read-write memory bank to produce more accurate optimization models from text than prior LLM methods.

  5. EHRAG: Bridging Semantic Gaps in Lightweight GraphRAG via Hybrid Hypergraph Construction and Retrieval

    cs.AI 2026-04 unverdicted novelty 6.0

    EHRAG constructs structural hyperedges from sentence co-occurrence and semantic hyperedges from entity embedding clusters, then applies hybrid diffusion plus topic-aware PPR to retrieve top-k documents, outperforming ...

  6. In-depth Analysis of Graph-based RAG in a Unified Framework

    cs.IR 2025-03 unverdicted novelty 6.0

    A unified framework and large-scale comparison of graph-based RAG methods on QA tasks yields new high-performing variants obtained by recombining existing components.

  7. ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation

    cs.IR 2025-02 unverdicted novelty 6.0

    ArchRAG proposes attributed-community hierarchical indexing and LLM clustering to improve accuracy and lower token usage in graph-based retrieval-augmented generation.

  8. EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records

    cs.IR 2026-05 unverdicted novelty 5.0

    EHR-RAGp is a retrieval-augmented EHR foundation model that employs prototype-guided retrieval to dynamically integrate relevant historical patient context, outperforming prior models on clinical prediction tasks.

  9. Adaptive Query Routing: A Tier-Based Framework for Hybrid Retrieval Across Financial, Legal, and Medical Documents

    cs.IR 2026-04 conditional novelty 5.0

    Tree reasoning outperforms vector search on complex document queries but a hybrid approach balances results across tiers, with validation showing an 11.7-point gap on real finance documents.

  10. Plasma GraphRAG: Physics-Grounded Parameter Selection for Gyrokinetic Simulations

    physics.plasm-ph 2026-04 unverdicted novelty 5.0

    Plasma GraphRAG automates physics-grounded parameter selection for gyrokinetic simulations via a domain-specific knowledge graph and LLMs, reporting over 10% better quality and up to 25% fewer hallucinations than stan...
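Entry 1 in the list above describes gating retrieval on an uncertainty score computed from a short no-context draft. The sketch below illustrates that general idea only. It assumes mean token entropy as the uncertainty score and a fixed threshold, neither of which is confirmed by the summary, and all names are hypothetical.

```python
import math


def mean_token_entropy(draft_distributions):
    """Average Shannon entropy (in nats) over per-token probability lists."""
    entropies = [
        -sum(p * math.log(p) for p in dist if p > 0)
        for dist in draft_distributions
    ]
    return sum(entropies) / len(entropies)


def should_retrieve(draft_distributions, threshold=0.5):
    """Trigger retrieval only when the draft looks uncertain.

    The entropy score and the threshold value are illustrative assumptions.
    """
    return mean_token_entropy(draft_distributions) > threshold


# Toy per-token distributions from a hypothetical no-context draft.
confident_draft = [[0.99, 0.01], [0.98, 0.02]]
uncertain_draft = [[0.5, 0.5], [0.4, 0.6]]

skip_retrieval = not should_retrieve(confident_draft)
do_retrieval = should_retrieve(uncertain_draft)
```

A gate of this kind trades a cheap draft pass for fewer retrieval calls; the threshold would need tuning per model and task.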

Reference graph

Works this paper leans on

204 extracted references · 204 canonical work pages · cited by 10 Pith papers · 24 internal anchors

  1. [1]

    Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, and Hanie Sedghi. 2022. Exploring the Limits of Large Scale Pre-training. In The Tenth International Conference on Learning Representations (ICLR)

  2. [2]

    Vaibhav Adlakha, Parishad BehnamGhader, Xing Han Lu, Nicholas Meade, and Siva Reddy. 2024. Evaluating Correctness and Faithfulness of Instruction- Following Models for Question Answering. Trans. Assoc. Comput. Linguistics 12 (2024), 681–699

  3. [3]

    Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, and Sumit Sanghai. 2023. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP) , Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association...

  4. [4]

    Diab, and Marjan Ghazvininejad

    Badr AlKhamissi, Millicent Li, Asli Celikyilmaz, Mona T. Diab, and Marjan Ghazvininejad. 2022. A Review on Language Models as Knowledge Bases.CoRR abs/2204.06031 (2022)

  5. [5]

    Gemini: A Family of Highly Capable Multimodal Models

    Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Mil- lican, David Silver, Slav Petrov, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy P. Lillicrap, An- geliki Lazaridou, Orhan Firat, James Molloy, Michae...

  6. [6]

    Adnan Arefeen, Biplob Debnath, and Srimat Chakradhar

    Md. Adnan Arefeen, Biplob Debnath, and Srimat Chakradhar. 2023. LeanCon- text: Cost-Efficient Domain-Specific Question Answering Using LLMs. CoRR abs/2309.00841 (2023)

  7. [7]

    Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi

  8. [8]

    In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024

    Self-RAG: Learning to Retrieve, Generate, and Critique through Self- Reflection. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024 . OpenReview.net

  9. [9]

    Jinheon Baek, Alham Fikri Aji, and Amir Saffari. 2023. Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answer- ing. CoRR abs/2306.04136 (2023)

  10. [10]

    Lyu, and Irwin King

    Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jin Jin, Xin Jiang, Qun Liu, Michael R. Lyu, and Irwin King. 2021. BinaryBERT: Pushing the Limit of BERT Quantization. In Proceedings of the 59th Annual Meeting of the Associa- tion for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL/IJCNLP) . Association...

  11. [11]

    Amanda Bertsch, Uri Alon, Graham Neubig, and Matthew R. Gormley. 2023. Unlimiformer: Long-Range Transformers with Unlimited Length Input. In Ad- vances in Neural Information Processing Systems 36 (NeurIPS)

  12. [12]

    Rae, Erich Elsen, and Laurent Sifre

    Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Ruther- ford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cas- sirer, Andy Brock, Michela Paganini, Geoffrey Irving,...

  13. [13]

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Ka- plan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litw...

  14. [14]

    Deng Cai, Yan Wang, Huayang Li, Wai Lam, and Lemao Liu. 2021. Neural Machine Translation with Monolingual Translation Memory. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL/IJCNLP). 7307–7318

  15. [15]

    Junying Chen, Qingcai Chen, Dongfang Li, and Yutao Huang. 2022. SeDR: Segment Representation Learning for Long Documents Dense Retrieval. CoRR abs/2211.10841 (2022)

  16. [16]

    Jiawei Chen, Hongyu Lin, Xianpei Han, and Le Sun. 2024. Benchmarking Large Language Models in Retrieval-Augmented Generation. In Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligen...

  17. [17]

    Xiang Chen, Lei Li, Ningyu Zhang, Xiaozhuan Liang, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, and Huajun Chen. 2022. Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning. In Advances in Neural Information Processing Systems 35 (NeurIPS)

  18. [18]

    Xin Cheng, Yankai Lin, Xiuying Chen, Dongyan Zhao, and Rui Yan. 2023. Decouple knowledge from paramters for plug-and-play language modeling. In Findings of the Association for Computational Linguistics (ACL) . 14288–14308

  19. [19]

    Xin Cheng, Di Luo, Xiuying Chen, Lemao Liu, Dongyan Zhao, and Rui Yan

  20. [20]

    In Advances in Neural Information Processing Systems 36 (NeurIPS)

    Lift Yourself Up: Retrieval-augmented Text Generation with Self-Memory. In Advances in Neural Information Processing Systems 36 (NeurIPS)

  21. [21]

    Alexis Chevalier, Alexander Wettig, Anirudh Ajith, and Danqi Chen. 2023. Adapting Language Models to Compress Contexts. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). 3829– 3846

  22. [22]

    Yung-Sung Chuang, Wei Fang, Shang-Wen Li, Wen-tau Yih, and James R. Glass

  23. [23]

    In Findings of the Association for Computational Linguistics (ACL)

    Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Ques- tion Answering. In Findings of the Association for Computational Linguistics (ACL). Association for Computational Linguistics, 12131–12147

  24. [24]

    Le, and Christopher D

    Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning

  25. [25]

    In 8th International Conference on Learning Representations (ICLR)

    ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In 8th International Conference on Learning Representations (ICLR) . OpenReview.net

  26. [26]

    Pierre Colombo, Telmo Pessoa Pires, Malik Boudiaf, Dominic Culver, Rui Melo, Caio Corro, André F. T. Martins, Fabrizio Esposito, Vera Lúcia Raposo, Sofia Morgado, and Michael Desa. 2024. SaulLM-7B: A pioneering Large Language Model for Law. CoRR abs/2403.03883 (2024)

  27. [27]

    Yufei Cui, Ziquan Liu, Yixin Chen, Yuchen Lu, Xinyue Yu, Xue (Steve) Liu, Tei-Wei Kuo, Miguel Rodrigues, Chun Jason Xue, and Antoni B. Chan. 2023. Retrieval-Augmented Multiple Instance Learning. In Advances in Neural Infor- mation Processing Systems 36 (NeurIPS)

  28. [28]

    Yuhan Dai, Zhirui Zhang, Qiuzhi Liu, Qu Cui, Weihua Li, Yichao Du, and Tong Xu. 2023. Simple and Scalable Nearest Neighbor Machine Translation. In The Eleventh International Conference on Learning Representations (ICLR)

  29. [29]

    Costa-jussà

    David Dale, Elena Voita, Loïc Barrault, and Marta R. Costa-jussà. 2023. Detecting and Mitigating Hallucinations in Machine Translation: Model Internal Workings Alone Do Well, Sentence Similarity Even Better. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL) . Association for Computational Linguistics, 36–50

  30. [30]

    Michiel de Jong, Yury Zemlyanskiy, Joshua Ainslie, Nicholas FitzGerald, Sumit Sanghai, Fei Sha, and William W. Cohen. 2023. FiDO: Fusion-in-Decoder opti- mized for stronger performance and faster inference. In Findings of the Associa- tion for Computational Linguistics (ACL) . 11534–11547

  31. [31]

    Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Joshua Ainslie, Sumit Sanghai, Fei Sha, and William W. Cohen. 2023. Pre-computed memory or on- the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute. In Proceedings of the 40th International Conference on Machine Learning (ICML). 7329–7342

  32. [32]

    Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Fei Sha, and William W. Cohen. 2022. Mention Memory: incorporating textual knowledge into Trans- formers through entity mention attention. InThe Tenth International Conference on Learning Representations (ICLR)

  33. [33]

    Hiroyuki Deguchi, Taro Watanabe, Yusuke Matsui, Masao Utiyama, Hideki Tanaka, and Eiichiro Sumita. 2023. Subset Retrieval Nearest Neighbor Machine Translation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL). 174–189

  34. [34]

    Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. QLoRA: Efficient Finetuning of Quantized LLMs. In Advances in Neural Infor- mation Processing Systems 36 (NeurIPS)

  35. [35]

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Under- standing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). 4171–4186

  36. [36]

    Zixiang Ding, Guoqing Jiang, Shuai Zhang, Lin Guo, and Wei Lin. 2023. SKD- BERT: Compressing BERT via Stochastic Knowledge Distillation. In Thirty- Seventh AAAI Conference on Artificial Intelligence (AAAI) . AAAI Press, 7414– 7422

  37. [37]

    Ehsan Doostmohammadi, Tobias Norlund, Marco Kuhlmann, and Richard Johansson. 2023. Surface-Based Retrieval Reduces Perplexity of Retrieval- Augmented Language Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL) . 521–529. 15

  38. [38]

    Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2024. The Faiss library. CoRR abs/2401.08281 (2024)

  39. [39]

    explosion. 2016. Spacy. https://spacy.io/

  40. [40]

    Fabbri, Patrick Ng, Zhiguo Wang, Ramesh Nallapati, and Bing Xiang

    Alexander R. Fabbri, Patrick Ng, Zhiguo Wang, Ramesh Nallapati, and Bing Xiang. 2020. Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) . 4508–4513

  41. [41]

    Facebook. 2013. RocksDB. https://github.com/facebook/rocksdb

  42. [42]

    Angela Fan and Claire Gardent. 2022. Generating Full Length Wikipedia Bi- ographies: The Impact of Gender Bias on the Retrieval-Based Generation of Women Biographies. CoRR abs/2204.05879 (2022)

  43. [43]

    Angela Fan, Claire Gardent, Chloé Braud, and Antoine Bordes. 2021. Augment- ing Transformers with KNN-Based Composite Memory for Dialog.Trans. Assoc. Comput. Linguistics 9 (2021), 82–99

  44. [44]

    Thibault Févry, Livio Baldini Soares, Nicholas FitzGerald, Eunsol Choi, and Tom Kwiatkowski. 2020. Entities as Experts: Sparse Memory Access with Entity Supervision. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) . 4937–4951

  45. [45]

    Robert Friel, Masha Belyi, and Atindriyo Sanyal. 2024. Ragbench: Explain- able benchmark for retrieval-augmented generation systems. arXiv preprint arXiv:2407.11005 (2024)

  46. [46]

    Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, and Marianne Winslett. 2021. Compressing Large-Scale Transformer-Based Models: A Case Study on BERT. Trans. Assoc. Comput. Linguistics 9 (2021), 1061–1080

  47. [47]

    Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. SimCSE: Simple Contrastive Learning of Sentence Embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) . Association for Computational Linguistics, 6894–6910

  48. [48]

    Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang, and Haofen Wang. 2023. Retrieval-Augmented Generation for Large Language Models: A Survey. CoRR abs/2312.10997 (2023)

  49. [49]

    Glass, Xueqing Wu, Ankita Rajaram Naik, Gaetano Rossiello, and Alfio Gliozzo

    Michael R. Glass, Xueqing Wu, Ankita Rajaram Naik, Gaetano Rossiello, and Alfio Gliozzo. 2023. Retrieval-Based Transformer for Table Augmentation. In Findings of the Association for Computational Linguistics (ACL) . 5635–5648

  50. [50]

    Hongyu Gong, Yelong Shen, Dian Yu, Jianshu Chen, and Dong Yu. 2020. Recur- rent Chunking Mechanisms for Long-Text Machine Reading Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics, 6751–6761

  51. [51]

    Asela Gunawardana and Guy Shani. 2009. A Survey of Accuracy Evaluation Metrics of Recommendation Tasks. J. Mach. Learn. Res. 10 (2009), 2935–2962

  52. [52]

    Daya Guo, Duyu Tang, Nan Duan, Ming Zhou, and Jian Yin. 2019. Coupling Retrieval and Meta-Learning for Context-Dependent Semantic Parsing. In Pro- ceedings of the 57th Conference of the Association for Computational Linguistics (ACL). 855–866

  53. [53]

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al . 2025. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948 (2025)

  54. [54]

    Rentong Guo, Xiaofan Luan, Long Xiang, Xiao Yan, Xiaomeng Yi, Jigao Luo, Qianya Cheng, Weizhi Xu, Jiarui Luo, Frank Liu, Zhenshan Cao, Yanliang Qiao, Ting Wang, Bo Tang, and Charles Xie. 2022. Manu: A Cloud Native Vector Database Management System. Proc. VLDB Endow. 15, 12 (2022), 3548–3561

  55. [55]

    Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating Large-Scale Inference with Anisotropic Vector Quantization. In Proceedings of the 37th International Conference on Machine Learning (ICML) (Proceedings of Machine Learning Research) , Vol. 119. PMLR, 3887–3896

  56. [56]

    Zhicheng Guo, Sijie Cheng, Yile Wang, Peng Li, and Yang Liu. 2023. Prompt- Guided Retrieval Augmentation for Non-Knowledge-Intensive Tasks. In Find- ings of the Association for Computational Linguistics (ACL) . 10896–10912

  57. [57]

    Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang

  58. [58]

    In Proceedings of the 37th International Conference on Machine Learning (ICML) (Proceedings of Machine Learning Research), Vol

    Retrieval Augmented Language Model Pre-Training. In Proceedings of the 37th International Conference on Machine Learning (ICML) (Proceedings of Machine Learning Research), Vol. 119. 3929–3938

  59. [59]

    David Harris and Sarah Harris. 2010. Digital design and computer architecture . Morgan Kaufmann

  60. [60]

    Zellig S Harris. 1954. Distributional structure. Word 10, 2-3 (1954), 146–162

  61. [61]

    Kostas Hatalis, Despina Christou, Joshua Myers, Steven Jones, Keith Lambert, Adam Amos-Binks, Zohreh Dannenhauer, and Dustin Dannenhauer. 2023. Mem- ory Matters: The Need to Improve Long-Term Memory in LLM-Agents. In Proceedings of the AAAI Symposium Series , Vol. 2. 277–280

  62. [62]

    Qiyuan He, Yizhong Wang, and Wenya Wang. 2024. Can Language Models Act as Knowledge Bases at Scale? CoRR abs/2402.14273 (2024)

  63. [63]

    Training Compute-Optimal Large Language Models

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre...

  64. [64]

    Sebastian Hofstätter, Jiecao Chen, Karthik Raman, and Hamed Zamani. 2023. FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) . ACM, 1437–1447

  65. [65]

    Nabil Hossain, Marjan Ghazvininejad, and Luke Zettlemoyer. 2020. Simple and Effective Retrieve-Edit-Rerank Text Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL). 2532–2538

  66. [66]

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In The Tenth International Conference on Learning Representations (ICLR)

  67. [67]

    Xuming Hu. 2023. Multimodal Named Entity Recognition and Relation Extraction with Retrieval-Augmented Strategy. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). ACM, 3488

  68. [68]

    Yucheng Hu and Yuxing Lu. 2024. RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing. CoRR abs/2404.19543 (2024)

  69. [69]

    Jie Huang, Wei Ping, Peng Xu, Mohammad Shoeybi, Kevin Chen-Chuan Chang, and Bryan Catanzaro. 2023. RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models. CoRR abs/2308.07922 (2023)

  70. [70]

    Qiushi Huang, Shuai Fu, Xubo Liu, Wenwu Wang, Tom Ko, Yu Zhang, and Lilian H. Y. Tang. 2023. Learning Retrieval Augmentation for Personalized Dialogue Generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2523–2540

  71. [71]

    Qiang Huang and Anthony K. H. Tung. 2023. Lightweight-Yet-Efficient: Revitalizing Ball-Tree for Point-to-Hyperplane Nearest Neighbor Search. In 39th IEEE International Conference on Data Engineering (ICDE). IEEE, 436–449

  72. [72]

    Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Hao Wang, Defu Lian, Yasheng Wang, Ruiming Tang, and Enhong Chen. 2024. Understanding the planning of LLM agents: A survey. CoRR abs/2402.02716 (2024)

  73. [73]

    Yangsibo Huang, Daogao Liu, Zexuan Zhong, Weijia Shi, and Yin Tat Lee. 2023. kNN-Adapter: Efficient Domain Adaptation for Black-Box Language Models. CoRR abs/2302.10879 (2023)

  74. [74]

    Yulong Hui, Yao Lu, and Huanchen Zhang. 2024. UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-World Document Analysis. In Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024, Amir Globersons, Lester Mackey, Dani...

  75. [75]

    Shonosuke Ishiwatari, Jingtao Yao, Shujie Liu, Mu Li, Ming Zhou, Naoki Yoshinaga, Masaru Kitsuregawa, and Weijia Jia. 2017. Chunk-based Decoder for Neural Machine Translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics, 1901–1912

  76. [76]

    Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave. 2022. Unsupervised Dense Information Retrieval with Contrastive Learning. Trans. Mach. Learn. Res. 2022 (2022)

  77. [77]

    Gautier Izacard and Edouard Grave. 2021. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL). 874–880

  78. [78]

    Gautier Izacard, Patrick S. H. Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, and Edouard Grave. 2023. Atlas: Few-shot Learning with Retrieval Augmented Language Models. J. Mach. Learn. Res. 24 (2023), 251:1–251:43

  79. [79]

    Hervé Jégou, Matthijs Douze, and Cordelia Schmid. 2011. Product Quantization for Nearest Neighbor Search. IEEE Trans. Pattern Anal. Mach. Intell. 33, 1 (2011), 117–128

  80. [80]

    Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of Hallucination in Natural Language Generation. ACM Comput. Surv. 55, 12 (2023), 248:1–248:38
