Retrieval-Augmented Generation for Natural Language Processing: A Survey

Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue

arxiv: 2407.13193 · v4 · pith:OQ274GJU · submitted 2024-07-18 · cs.CL

Reviewed by Pith · 2026-05-23 23:04 UTC · grok-4.3 · pith:OQ274GJU

classification cs.CL

keywords retrieval-augmented generation · large language models · natural language processing · survey · retrieval fusion · taxonomy · evaluation methodologies

The pith

Retrieval-augmented generation fuses external knowledge with large language models using four distinct fusion approaches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey examines how retrieval-augmented generation helps large language models overcome issues like hallucinations and lack of up-to-date knowledge. It concentrates on retrievers and the ways they integrate with generation processes. The authors present a new classification of these integration methods into query-based, logits-based, latent, and parametric fusion. Structured comparisons highlight differences in how accessible and efficient each approach is for various applications. The review also covers how RAG is applied to different natural language processing tasks and points to ongoing challenges in deployment.

Core claim

The paper introduces a novel taxonomy of retrieval fusions with four categories (query-based, logits-based, latent, and parametric fusion) and provides structured comparisons across accessibility, efficiency, and use cases. It examines RAG applications across diverse NLP tasks, discusses evaluation methodologies and benchmark limitations, and analyzes training paradigms with and without knowledge base updates. Finally, it explores industrial deployment considerations and identifies emerging challenges and future directions, including security, efficiency, and graph-based retrieval.

What carries the argument

The taxonomy of retrieval fusions that categorizes methods as query-based, logits-based, latent, or parametric.
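
As a rough illustration of two of the four categories, the sketch below contrasts query-based fusion (retrieved text is added to the generator's input) with logits-based fusion (retrieval-derived token probabilities are mixed into the generator's output distribution). It is a minimal sketch rather than the survey's own formulation: the function names, the prompt template, and the weight `lam` are hypothetical, and linear interpolation is assumed here as one simple way to combine two distributions.

```python
def query_based_fusion(query, passages):
    """Build a generator input that places retrieved passages before the query.

    The prompt template is an illustrative assumption, not taken from the survey.
    """
    # Number the passages so the generator can tell them apart.
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return f"{context}\n\nQuestion: {query}\nAnswer:"


def logits_based_fusion(lm_probs, retrieval_probs, lam=0.25):
    """Linearly interpolate two next-token distributions (dicts: token -> probability).

    `lam` is a hypothetical mixing weight: 0 ignores retrieval, 1 ignores the model.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    vocab = set(lm_probs) | set(retrieval_probs)
    return {
        tok: (1.0 - lam) * lm_probs.get(tok, 0.0) + lam * retrieval_probs.get(tok, 0.0)
        for tok in vocab
    }


if __name__ == "__main__":
    # Query-based: the generator itself is unchanged, only its input grows.
    print(query_based_fusion("Who wrote Hamlet?", ["Hamlet is a tragedy by William Shakespeare."]))
    # Logits-based: the generator's output distribution is adjusted after the fact.
    print(logits_based_fusion({"Paris": 0.6, "Lyon": 0.4}, {"Paris": 1.0}, lam=0.5))
```

The contrast hints at why the categories differ in accessibility: the first function needs only the ability to edit a prompt, while the second needs access to the generator's token-level probabilities.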

If this is right

Different fusion types suit different NLP tasks based on their accessibility and efficiency profiles.
Comparisons across the categories can guide choices between methods that modify queries, adjust outputs, work in latent spaces, or update parameters.
Training paradigms differ when knowledge bases are updated versus when they remain fixed.
Evaluation methodologies must account for benchmark limitations when testing RAG systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The taxonomy could support creation of new hybrid fusion methods that draw from multiple categories.
Emphasis on graph-based retrieval points to possible future extensions of the categories to handle structured data.
Industrial deployment analysis suggests efficiency metrics will shape practical adoption of specific RAG variants.

Load-bearing premise

The selected literature and proposed taxonomy comprehensively cover the RAG field without significant omissions or selection bias in the reviewed papers.

What would settle it

Discovery of a widely used retrieval fusion technique that does not fit into any of the four categories of query-based, logits-based, latent, or parametric fusion.

Figures

Figures reproduced from arXiv: 2407.13193 by Can Chen, Chun Jason Xue, Haolun Wu, Lianming Huang, Nan Guan, Shangyu Wu, Tei-Wei Kuo, Xue Liu, Ye Yuan, Ying Xiong, Yufei Cui.

**Figure 1.** The overview of retrieval-augmented generation for natural language processing. The inputs as queries are fed into …

**Figure 2.** Two stages of using the retriever.

**Figure 3.** The categories of fusion methods in RAG.

**Figure 4.** Different RAG training strategies with/without datastore update.

Original abstract

Large language models (LLMs) have achieved strong empirical performance in various fields, benefiting from their huge amount of parameters that store knowledge. However, LLMs still suffer from several key issues, such as hallucination problems, knowledge update issues, and lacking domain-specific expertise. The appearance of retrieval-augmented generation (RAG), which leverages an external knowledge base to augment LLMs, mitigates these limitations. This paper presents a systematic review of RAG techniques for natural language processing (NLP), with a focus on retrievers and retrieval fusions. We introduce a novel taxonomy of retrieval fusions, such as query-based, logits-based, latent, and parametric fusion, and provide structured comparisons across accessibility, efficiency, and use cases. The paper further examines RAG applications across diverse NLP tasks, discusses evaluation methodologies and benchmark limitations, and analyzes training paradigms with and without knowledge base updates. Finally, we explore industrial deployment considerations and identify emerging challenges and future directions, including security, efficiency, and graph-based retrieval.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This RAG survey's main value is its taxonomy of fusion types with practical comparisons, but coverage depends on unstated paper selection.

The central thing here is a survey that groups retrieval fusion methods into query-based, logits-based, latent, and parametric categories, then lines up comparisons on accessibility, efficiency, and use cases. That structure goes beyond just listing papers and could actually help someone decide which approach fits a given setup. The paper also covers applications across NLP tasks, evaluation benchmarks and their limits, training with or without knowledge base updates, industrial issues, and open problems like security and graph retrieval. Those sections pull together a lot of prior work in one place. The comparisons look grounded in the reviewed literature and seem aimed at real decisions rather than abstract theory. The soft spot is the taxonomy's completeness. The abstract gives no details on how papers were chosen or whether hybrid or edge-case fusions were systematically checked, so any gaps in the sample would make the accessibility and efficiency tables less reliable. That's a standard survey risk but directly affects the claim of structured comparisons. This is for NLP researchers or practitioners who need an organized map of RAG options rather than a new algorithm. Someone building or tuning a system would get usable pointers from the taxonomy and comparisons. It deserves peer review because the organization is useful even if the coverage needs tightening in revision.

Referee Report

2 major / 2 minor

Summary. The manuscript is a survey on retrieval-augmented generation (RAG) for NLP. It reviews retrievers and retrieval fusion techniques, introduces a novel taxonomy classifying fusions as query-based, logits-based, latent, and parametric, and supplies structured comparisons on accessibility, efficiency, and use cases. The paper additionally covers RAG applications across NLP tasks, evaluation methodologies and benchmark limitations, training paradigms (with/without KB updates), industrial deployment, and emerging challenges including security, efficiency, and graph-based retrieval.

Significance. If the taxonomy is comprehensive and the reviewed literature representative, the work supplies a useful organizing framework for an active area, highlighting how external retrieval mitigates LLM hallucinations and knowledge staleness. The cross-dimensional comparisons and discussion of training/industrial aspects could aid both researchers selecting fusion strategies and practitioners deploying RAG systems.

major comments (2)

[Introduction / taxonomy introduction section] The abstract and introduction assert a 'systematic review' and 'novel taxonomy' of retrieval fusions, yet no section describes the paper-selection protocol (databases, search strings, date range, inclusion criteria, or coverage audit). Without this, the claim that the four fusion categories plus comparisons are exhaustive cannot be verified and risks selection bias.
[Section presenting taxonomy and comparisons] The structured comparisons of the four fusion types on accessibility, efficiency, and use cases rest on the reviewed papers; if any hybrid or unclassified fusion mechanisms from the 2023–2024 literature are omitted, the comparative tables or discussion become incomplete. The manuscript should either demonstrate exhaustive coverage or qualify the scope of the comparisons.

minor comments (2)

Ensure every work cited in support of the taxonomy or comparisons appears in the reference list with consistent formatting and DOIs where available.
[Comparisons subsection] Clarify whether the 'structured comparisons' are qualitative summaries or include any quantitative meta-analysis of reported metrics across papers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation of minor revision. The comments highlight important aspects of transparency in our survey methodology and the scope of the taxonomy. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Referee: [Introduction / taxonomy introduction section] The abstract and introduction assert a 'systematic review' and 'novel taxonomy' of retrieval fusions, yet no section describes the paper-selection protocol (databases, search strings, date range, inclusion criteria, or coverage audit). Without this, the claim that the four fusion categories plus comparisons are exhaustive cannot be verified and risks selection bias.

Authors: We acknowledge that the manuscript does not contain an explicit section detailing the literature selection protocol. Our review was conducted by surveying prominent papers on RAG and fusion techniques from sources such as arXiv, ACL Anthology, and major NLP conferences up to early 2024, but this process was not formally documented. To address the concern, we will add a dedicated paragraph in the introduction (or a new 'Scope and Methodology' subsection) describing the search strategy, key terms used, date range, and inclusion focus on retrievers and fusion methods. This addition will clarify the basis for the taxonomy without altering the core claims.

revision: yes

Referee: [Section presenting taxonomy and comparisons] The structured comparisons of the four fusion types on accessibility, efficiency, and use cases rest on the reviewed papers; if any hybrid or unclassified fusion mechanisms from the 2023–2024 literature are omitted, the comparative tables or discussion become incomplete. The manuscript should either demonstrate exhaustive coverage or qualify the scope of the comparisons.

Authors: The taxonomy organizes fusion approaches into four primary categories based on the dominant mechanisms identified across the literature, with the comparisons derived from representative papers in each category. Hybrids are noted where they align with multiple categories. To respond to this point, we will perform a targeted check of additional 2023–2024 papers for any mechanisms that do not fit the taxonomy. If omissions are identified, we will either extend the taxonomy discussion or add explicit qualification of the comparisons' scope (e.g., 'reflecting the primary paradigms in the surveyed works'). This will be reflected in an updated version of the relevant section and tables.

revision: partial

Circularity Check

0 steps flagged

No circularity: survey compiles literature without derivations or self-referential reductions

This is a survey paper with no equations, fitted parameters, or derivation chain. The central contribution is a proposed taxonomy of retrieval fusions (query-based, logits-based, latent, parametric) and structured comparisons, drawn from reviewed prior work. No step reduces by construction to its own inputs, no self-citation is load-bearing for a mathematical claim, and no uniqueness theorem or ansatz is invoked. The work is self-contained as an organizational review; literature selection is an explicit methodological choice rather than a hidden circular fit.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey of existing literature, the paper introduces no new free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5731 in / 1112 out tokens · 23193 ms · 2026-05-23T23:04:31.544204+00:00 · methodology

Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation
cs.CL · 2026-05 · unverdicted · novelty 8.0

DiscourseFlip is a graph-guided attack allocating limited poisoning budget to induce targeted opinion shifts over semantic query networks in black-box RAG.

Retrieval as a Decision: Training-Free Adaptive Gating for Efficient RAG
cs.CL · 2025-11 · conditional · novelty 7.0

TARG uses uncertainty scores from a short no-context draft to gate retrieval in RAG, matching Always-RAG accuracy while cutting retrievals by 70-90% on QA benchmarks.

Factorize to Generalize: Retrieval-Guided Invariant-Dynamic Decomposition for Time Series Forecasting
cs.LG · 2026-05 · unverdicted · novelty 6.0

The proposed framework decomposes retrieval-augmented representations into invariant and dynamic components to improve robustness in zero-shot time series forecasting under distribution shifts.

Beyond Semantic Relevance: Counterfactual Risk Minimization for Robust Retrieval-Augmented Generation
cs.CL · 2026-05 · unverdicted · novelty 6.0

CoRM-RAG uses a cognitive perturbation protocol to simulate biases and trains an Evidence Critic to retrieve documents that support correct decisions even under adversarial query changes.

When Model Editing Meets Service Evolution: A Knowledge-Update Perspective for Service Recommendation
cs.SE · 2026-04 · unverdicted · novelty 6.0

EVOREC integrates locate-then-edit model editing with FA-constrained decoding to improve LLM-based service recommendation under evolution, reporting 25.9% average relative gain in Recall@5 over baselines and 22.3% ove...

From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling
math.OC · 2026-04 · unverdicted · novelty 6.0

Agora-Opt uses decentralized debate among LLM agent teams plus a read-write memory bank to produce more accurate optimization models from text than prior LLM methods.

EHRAG: Bridging Semantic Gaps in Lightweight GraphRAG via Hybrid Hypergraph Construction and Retrieval
cs.AI · 2026-04 · unverdicted · novelty 6.0

EHRAG constructs structural hyperedges from sentence co-occurrence and semantic hyperedges from entity embedding clusters, then applies hybrid diffusion plus topic-aware PPR to retrieve top-k documents, outperforming ...

In-depth Analysis of Graph-based RAG in a Unified Framework
cs.IR · 2025-03 · unverdicted · novelty 6.0

A unified framework and large-scale comparison of graph-based RAG methods on QA tasks yields new high-performing variants obtained by recombining existing components.

ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation
cs.IR · 2025-02 · unverdicted · novelty 6.0

ArchRAG proposes attributed-community hierarchical indexing and LLM clustering to improve accuracy and lower token usage in graph-based retrieval-augmented generation.

AGE: Adaptive-masking for Graph Embedding in Graph Retrieval-Augmented Generation
cs.IR · 2026-06 · unverdicted · novelty 5.0

AGE applies adaptive masking via a learnable sampler in Transformer-based SSL to align graph and text embeddings, yielding higher accuracy on four GraphQA benchmarks for non-parametric GraphRAG.

EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records
cs.IR · 2026-05 · unverdicted · novelty 5.0

EHR-RAGp is a retrieval-augmented EHR foundation model that employs prototype-guided retrieval to dynamically integrate relevant historical patient context, outperforming prior models on clinical prediction tasks.

Adaptive Query Routing: A Tier-Based Framework for Hybrid Retrieval Across Financial, Legal, and Medical Documents
cs.IR · 2026-04 · conditional · novelty 5.0

Tree reasoning outperforms vector search on complex document queries but a hybrid approach balances results across tiers, with validation showing an 11.7-point gap on real finance documents.

Plasma GraphRAG: Physics-Grounded Parameter Selection for Gyrokinetic Simulations
physics.plasm-ph · 2026-04 · unverdicted · novelty 5.0

Plasma GraphRAG automates physics-grounded parameter selection for gyrokinetic simulations via a domain-specific knowledge graph and LLMs, reporting over 10% better quality and up to 25% fewer hallucinations than stan...

LLMs in the Real World: Evaluating "AI" in Emergency Contexts
cs.CY · 2026-05 · unverdicted · novelty 2.0

AI researchers should take greater responsibility for publicly explaining the limitations of their technologies to prevent misuse in high-stakes applications such as emergency translation services.

Reference graph

Works this paper leans on

204 extracted references · 204 canonical work pages · cited by 14 Pith papers · 24 internal anchors

[1]

Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, and Hanie Sedghi. 2022. Exploring the Limits of Large Scale Pre-training. In The Tenth International Conference on Learning Representations (ICLR)

work page 2022
[2]

Vaibhav Adlakha, Parishad BehnamGhader, Xing Han Lu, Nicholas Meade, and Siva Reddy. 2024. Evaluating Correctness and Faithfulness of Instruction- Following Models for Question Answering. Trans. Assoc. Comput. Linguistics 12 (2024), 681–699

work page 2024
[3]

Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, and Sumit Sanghai. 2023. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP) , Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association...

work page 2023
[4]

A review on language models as knowledge bases

Badr AlKhamissi, Millicent Li, Asli Celikyilmaz, Mona T. Diab, and Marjan Ghazvininejad. 2022. A Review on Language Models as Knowledge Bases.CoRR abs/2204.06031 (2022)

work page arXiv 2022
[5]

Gemini: A Family of Highly Capable Multimodal Models

Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Mil- lican, David Silver, Slav Petrov, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy P. Lillicrap, An- geliki Lazaridou, Orhan Firat, James Molloy, Michae...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[6]

Adnan Arefeen, Biplob Debnath, and Srimat Chakradhar

Md. Adnan Arefeen, Biplob Debnath, and Srimat Chakradhar. 2023. LeanCon- text: Cost-Efficient Domain-Specific Question Answering Using LLMs. CoRR abs/2309.00841 (2023)

work page arXiv 2023
[7]

Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi

work page
[8]

In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024

Self-RAG: Learning to Retrieve, Generate, and Critique through Self- Reflection. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024 . OpenReview.net

work page 2024
[9]

Jinheon Baek, Alham Fikri Aji, and Amir Saffari. 2023. Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answer- ing. CoRR abs/2306.04136 (2023)

work page arXiv 2023
[10]

Lyu, and Irwin King

Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jin Jin, Xin Jiang, Qun Liu, Michael R. Lyu, and Irwin King. 2021. BinaryBERT: Pushing the Limit of BERT Quantization. In Proceedings of the 59th Annual Meeting of the Associa- tion for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL/IJCNLP) . Association...

work page 2021
[11]

Amanda Bertsch, Uri Alon, Graham Neubig, and Matthew R. Gormley. 2023. Unlimiformer: Long-Range Transformers with Unlimited Length Input. In Ad- vances in Neural Information Processing Systems 36 (NeurIPS)

work page 2023
[12]

Rae, Erich Elsen, and Laurent Sifre

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Ruther- ford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cas- sirer, Andy Brock, Michela Paganini, Geoffrey Irving,...

work page 2022
[13]

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Ka- plan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litw...

work page 2020
[14]

Deng Cai, Yan Wang, Huayang Li, Wai Lam, and Lemao Liu. 2021. Neural Machine Translation with Monolingual Translation Memory. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL/IJCNLP). 7307–7318

work page 2021
[15]

Junying Chen, Qingcai Chen, Dongfang Li, and Yutao Huang. 2022. SeDR: Segment Representation Learning for Long Documents Dense Retrieval. CoRR abs/2211.10841 (2022)

work page arXiv 2022
[16]

Jiawei Chen, Hongyu Lin, Xianpei Han, and Le Sun. 2024. Benchmarking Large Language Models in Retrieval-Augmented Generation. In Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligen...

work page 2024
[17]

Xiang Chen, Lei Li, Ningyu Zhang, Xiaozhuan Liang, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, and Huajun Chen. 2022. Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning. In Advances in Neural Information Processing Systems 35 (NeurIPS)

work page 2022
[18]

Xin Cheng, Yankai Lin, Xiuying Chen, Dongyan Zhao, and Rui Yan. 2023. Decouple knowledge from paramters for plug-and-play language modeling. In Findings of the Association for Computational Linguistics (ACL) . 14288–14308

work page 2023
[19]

Xin Cheng, Di Luo, Xiuying Chen, Lemao Liu, Dongyan Zhao, and Rui Yan

work page
[20]

In Advances in Neural Information Processing Systems 36 (NeurIPS)

Lift Yourself Up: Retrieval-augmented Text Generation with Self-Memory. In Advances in Neural Information Processing Systems 36 (NeurIPS)

work page
[21]

Alexis Chevalier, Alexander Wettig, Anirudh Ajith, and Danqi Chen. 2023. Adapting Language Models to Compress Contexts. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). 3829– 3846

work page 2023
[22]

Yung-Sung Chuang, Wei Fang, Shang-Wen Li, Wen-tau Yih, and James R. Glass

work page
[23]

In Findings of the Association for Computational Linguistics (ACL)

Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Ques- tion Answering. In Findings of the Association for Computational Linguistics (ACL). Association for Computational Linguistics, 12131–12147

work page
[24]

Le, and Christopher D

Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning

work page
[25]

In 8th International Conference on Learning Representations (ICLR)

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In 8th International Conference on Learning Representations (ICLR) . OpenReview.net

work page
[26]

Pierre Colombo, Telmo Pessoa Pires, Malik Boudiaf, Dominic Culver, Rui Melo, Caio Corro, André F. T. Martins, Fabrizio Esposito, Vera Lúcia Raposo, Sofia Morgado, and Michael Desa. 2024. SaulLM-7B: A pioneering Large Language Model for Law. CoRR abs/2403.03883 (2024)

work page arXiv 2024
[27]

Yufei Cui, Ziquan Liu, Yixin Chen, Yuchen Lu, Xinyue Yu, Xue (Steve) Liu, Tei-Wei Kuo, Miguel Rodrigues, Chun Jason Xue, and Antoni B. Chan. 2023. Retrieval-Augmented Multiple Instance Learning. In Advances in Neural Infor- mation Processing Systems 36 (NeurIPS)

work page 2023
[28]

Yuhan Dai, Zhirui Zhang, Qiuzhi Liu, Qu Cui, Weihua Li, Yichao Du, and Tong Xu. 2023. Simple and Scalable Nearest Neighbor Machine Translation. In The Eleventh International Conference on Learning Representations (ICLR)

work page 2023
[29]

Costa-jussà

David Dale, Elena Voita, Loïc Barrault, and Marta R. Costa-jussà. 2023. Detecting and Mitigating Hallucinations in Machine Translation: Model Internal Workings Alone Do Well, Sentence Similarity Even Better. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL) . Association for Computational Linguistics, 36–50

work page 2023
[30]

Michiel de Jong, Yury Zemlyanskiy, Joshua Ainslie, Nicholas FitzGerald, Sumit Sanghai, Fei Sha, and William W. Cohen. 2023. FiDO: Fusion-in-Decoder opti- mized for stronger performance and faster inference. In Findings of the Associa- tion for Computational Linguistics (ACL) . 11534–11547

work page 2023
[31]

Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Joshua Ainslie, Sumit Sanghai, Fei Sha, and William W. Cohen. 2023. Pre-computed memory or on- the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute. In Proceedings of the 40th International Conference on Machine Learning (ICML). 7329–7342

work page 2023
[32]

Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Fei Sha, and William W. Cohen. 2022. Mention Memory: incorporating textual knowledge into Trans- formers through entity mention attention. InThe Tenth International Conference on Learning Representations (ICLR)

work page 2022
[33]

Hiroyuki Deguchi, Taro Watanabe, Yusuke Matsui, Masao Utiyama, Hideki Tanaka, and Eiichiro Sumita. 2023. Subset Retrieval Nearest Neighbor Machine Translation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL). 174–189

work page 2023
[34]

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. QLoRA: Efficient Finetuning of Quantized LLMs. In Advances in Neural Infor- mation Processing Systems 36 (NeurIPS)

work page 2023
[35]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Under- standing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). 4171–4186

work page 2019
[36]

Zixiang Ding, Guoqing Jiang, Shuai Zhang, Lin Guo, and Wei Lin. 2023. SKD- BERT: Compressing BERT via Stochastic Knowledge Distillation. In Thirty- Seventh AAAI Conference on Artificial Intelligence (AAAI) . AAAI Press, 7414– 7422

work page 2023
[37]

Ehsan Doostmohammadi, Tobias Norlund, Marco Kuhlmann, and Richard Johansson. 2023. Surface-Based Retrieval Reduces Perplexity of Retrieval- Augmented Language Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL) . 521–529. 15

work page 2023
[38]

Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2024. The Faiss library. CoRR abs/2401.08281 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[39]

explosion. 2016. Spacy. https://spacy.io/

work page 2016
[40]

Fabbri, Patrick Ng, Zhiguo Wang, Ramesh Nallapati, and Bing Xiang

Alexander R. Fabbri, Patrick Ng, Zhiguo Wang, Ramesh Nallapati, and Bing Xiang. 2020. Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) . 4508–4513

work page 2020
[41]

Facebook. 2013. RocksDB. https://github.com/facebook/rocksdb

work page 2013
[42]

Angela Fan and Claire Gardent. 2022. Generating Full Length Wikipedia Bi- ographies: The Impact of Gender Bias on the Retrieval-Based Generation of Women Biographies. CoRR abs/2204.05879 (2022)

work page arXiv 2022
[43]

Angela Fan, Claire Gardent, Chloé Braud, and Antoine Bordes. 2021. Augment- ing Transformers with KNN-Based Composite Memory for Dialog.Trans. Assoc. Comput. Linguistics 9 (2021), 82–99

work page 2021
[44]

Thibault Févry, Livio Baldini Soares, Nicholas FitzGerald, Eunsol Choi, and Tom Kwiatkowski. 2020. Entities as Experts: Sparse Memory Access with Entity Supervision. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) . 4937–4951

work page 2020
[45]

Robert Friel, Masha Belyi, and Atindriyo Sanyal. 2024. Ragbench: Explain- able benchmark for retrieval-augmented generation systems. arXiv preprint arXiv:2407.11005 (2024)

work page Pith review arXiv 2024
[46]

Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, and Marianne Winslett. 2021. Compressing Large-Scale Transformer-Based Models: A Case Study on BERT. Trans. Assoc. Comput. Linguistics 9 (2021), 1061–1080

work page 2021
[47]

Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. SimCSE: Simple Contrastive Learning of Sentence Embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) . Association for Computational Linguistics, 6894–6910

work page 2021
[48]

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang, and Haofen Wang. 2023. Retrieval-Augmented Generation for Large Language Models: A Survey. CoRR abs/2312.10997 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[49]

Glass, Xueqing Wu, Ankita Rajaram Naik, Gaetano Rossiello, and Alfio Gliozzo

Michael R. Glass, Xueqing Wu, Ankita Rajaram Naik, Gaetano Rossiello, and Alfio Gliozzo. 2023. Retrieval-Based Transformer for Table Augmentation. In Findings of the Association for Computational Linguistics (ACL) . 5635–5648

work page 2023
[50]

Hongyu Gong, Yelong Shen, Dian Yu, Jianshu Chen, and Dong Yu. 2020. Recur- rent Chunking Mechanisms for Long-Text Machine Reading Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics, 6751–6761

[51]

Asela Gunawardana and Guy Shani. 2009. A Survey of Accuracy Evaluation Metrics of Recommendation Tasks. J. Mach. Learn. Res. 10 (2009), 2935–2962

[52]

Daya Guo, Duyu Tang, Nan Duan, Ming Zhou, and Jian Yin. 2019. Coupling Retrieval and Meta-Learning for Context-Dependent Semantic Parsing. In Proceedings of the 57th Conference of the Association for Computational Linguistics (ACL). 855–866

[53]

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. 2025. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948 (2025)

[54]

Rentong Guo, Xiaofan Luan, Long Xiang, Xiao Yan, Xiaomeng Yi, Jigao Luo, Qianya Cheng, Weizhi Xu, Jiarui Luo, Frank Liu, Zhenshan Cao, Yanliang Qiao, Ting Wang, Bo Tang, and Charles Xie. 2022. Manu: A Cloud Native Vector Database Management System. Proc. VLDB Endow. 15, 12 (2022), 3548–3561

[55]

Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating Large-Scale Inference with Anisotropic Vector Quantization. In Proceedings of the 37th International Conference on Machine Learning (ICML) (Proceedings of Machine Learning Research), Vol. 119. PMLR, 3887–3896

[56]

Zhicheng Guo, Sijie Cheng, Yile Wang, Peng Li, and Yang Liu. 2023. Prompt-Guided Retrieval Augmentation for Non-Knowledge-Intensive Tasks. In Findings of the Association for Computational Linguistics (ACL). 10896–10912

[57]

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. Retrieval Augmented Language Model Pre-Training. In Proceedings of the 37th International Conference on Machine Learning (ICML) (Proceedings of Machine Learning Research), Vol. 119. 3929–3938

[59]

David Harris and Sarah Harris. 2010. Digital design and computer architecture. Morgan Kaufmann

[60]

Zellig S Harris. 1954. Distributional structure. Word 10, 2-3 (1954), 146–162

[61]

Kostas Hatalis, Despina Christou, Joshua Myers, Steven Jones, Keith Lambert, Adam Amos-Binks, Zohreh Dannenhauer, and Dustin Dannenhauer. 2023. Memory Matters: The Need to Improve Long-Term Memory in LLM-Agents. In Proceedings of the AAAI Symposium Series, Vol. 2. 277–280

[62]

Qiyuan He, Yizhong Wang, and Wenya Wang. 2024. Can Language Models Act as Knowledge Bases at Scale? CoRR abs/2402.14273 (2024)

[63]

Training Compute-Optimal Large Language Models

Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre...

[64]

Sebastian Hofstätter, Jiecao Chen, Karthik Raman, and Hamed Zamani. 2023. FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). ACM, 1437–1447

[65]

Nabil Hossain, Marjan Ghazvininejad, and Luke Zettlemoyer. 2020. Simple and Effective Retrieve-Edit-Rerank Text Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL). 2532–2538

[66]

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In The Tenth International Conference on Learning Representations (ICLR)

[67]

Xuming Hu. 2023. Multimodal Named Entity Recognition and Relation Extraction with Retrieval-Augmented Strategy. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). ACM, 3488

[68]

Yucheng Hu and Yuxing Lu. 2024. RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing. CoRR abs/2404.19543 (2024)

[69]

Jie Huang, Wei Ping, Peng Xu, Mohammad Shoeybi, Kevin Chen-Chuan Chang, and Bryan Catanzaro. 2023. RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models. CoRR abs/2308.07922 (2023)

[70]

Qiushi Huang, Shuai Fu, Xubo Liu, Wenwu Wang, Tom Ko, Yu Zhang, and Lilian H. Y. Tang. 2023. Learning Retrieval Augmentation for Personalized Dialogue Generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2523–2540

[71]

Qiang Huang and Anthony K. H. Tung. 2023. Lightweight-Yet-Efficient: Revitalizing Ball-Tree for Point-to-Hyperplane Nearest Neighbor Search. In 39th IEEE International Conference on Data Engineering (ICDE). IEEE, 436–449

[72]

Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Hao Wang, Defu Lian, Yasheng Wang, Ruiming Tang, and Enhong Chen. 2024. Understanding the planning of LLM agents: A survey. CoRR abs/2402.02716 (2024)

[73]

Yangsibo Huang, Daogao Liu, Zexuan Zhong, Weijia Shi, and Yin Tat Lee. 2023. kNN-Adapter: Efficient Domain Adaptation for Black-Box Language Models. CoRR abs/2302.10879 (2023)

[74]

Yulong Hui, Yao Lu, and Huanchen Zhang. 2024. UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-World Document Analysis. In Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024, Amir Globersons, Lester Mackey, Dani...

[75]

Shonosuke Ishiwatari, Jingtao Yao, Shujie Liu, Mu Li, Ming Zhou, Naoki Yoshinaga, Masaru Kitsuregawa, and Weijia Jia. 2017. Chunk-based Decoder for Neural Machine Translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics, 1901–1912

[76]

Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave. 2022. Unsupervised Dense Information Retrieval with Contrastive Learning. Trans. Mach. Learn. Res. 2022 (2022)

[77]

Gautier Izacard and Edouard Grave. 2021. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL). 874–880

[78]

Gautier Izacard, Patrick S. H. Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, and Edouard Grave. 2023. Atlas: Few-shot Learning with Retrieval Augmented Language Models. J. Mach. Learn. Res. 24 (2023), 251:1–251:43

[79]

Hervé Jégou, Matthijs Douze, and Cordelia Schmid. 2011. Product Quantization for Nearest Neighbor Search. IEEE Trans. Pattern Anal. Mach. Intell. 33, 1 (2011), 117–128

[80]

Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of Hallucination in Natural Language Generation. ACM Comput. Surv. 55, 12 (2023), 248:1–248:38
