Retrieval-Augmented Generation for Large Language Models: A Survey

Haofen Wang; Jiawei Sun; Jinliu Pan; Kangxiang Jia; Meng Wang; Xinyu Gao; Yi Dai; Yunfan Gao; Yun Xiong; Yuxi Bi

arxiv: 2312.10997 · v5 · submitted 2023-12-18 · 💻 cs.CL · cs.AI


Pith reviewed 2026-05-24 05:08 UTC · model grok-4.3


keywords Retrieval-Augmented Generation · Large Language Models · RAG Survey · Naive RAG · Advanced RAG · Modular RAG · Knowledge Integration · Hallucination


The pith

Retrieval-Augmented Generation merges external databases with large language models to cut hallucinations and keep knowledge current.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys how RAG addresses core LLM shortcomings such as generating false statements, relying on outdated facts, and producing untraceable outputs. It groups existing systems into three paradigms called Naive RAG, Advanced RAG, and Modular RAG. The review then dissects every approach into its retrieval, generation, and augmentation pieces and catalogs current methods for each piece. This organization shows how external sources can be combined with a model's own knowledge to support knowledge-heavy tasks.

Core claim

RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases. This comprehensive review examines the progression of RAG paradigms encompassing the Naive RAG, the Advanced RAG, and the Modular RAG. It scrutinizes the tripartite foundation of RAG frameworks which includes the retrieval, the generation and the augmentation techniques and highlights the state-of-the-art technologies in each component. The paper also introduces an up-to-date evaluation framework and benchmark while delineating current challenges and prospective research avenues.

What carries the argument

The tripartite categorization of RAG into Naive RAG, Advanced RAG, and Modular RAG together with the division of each system into retrieval, generation, and augmentation components.
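
The retrieval, augmentation, and generation split can be made concrete with a toy pipeline. What follows is a minimal sketch, not the survey's implementation: it assumes fixed-size word chunks for indexing, bag-of-words cosine similarity in place of a learned embedding model, and a hypothetical `generate` stub in place of a real LLM call. All function names are illustrative.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 12) -> list[str]:
    """Indexing: split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase bag-of-words counts (assumption, not a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Retrieval: return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Augmentation: place the retrieved chunks in front of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\nContext:\n{joined}\nQuestion: {query}"


def generate(prompt: str) -> str:
    """Generation: hypothetical stand-in for an LLM; echoes the top-ranked context line."""
    for line in prompt.splitlines():
        if line.startswith("- "):
            return line[2:]
    return "I don't know."


def naive_rag(query: str, documents: list[str], k: int = 2) -> str:
    chunks = [c for d in documents for c in chunk(d)]
    return generate(build_prompt(query, retrieve(query, chunks, k)))


documents = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Mount Everest is the highest mountain above sea level on Earth.",
]
answer = naive_rag("Where is the Eiffel Tower located?", documents)
print(answer)
```

Keeping the three stages as separate functions mirrors the component view described above: a different retriever, prompt template, or generator can be swapped in without touching the other two.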

If this is right

RAG improves accuracy and credibility of outputs on knowledge-intensive tasks.
RAG supports continuous knowledge updates without retraining the underlying model.
RAG enables straightforward addition of domain-specific information.
The introduced evaluation framework and benchmarks allow systematic comparison of different RAG implementations.
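
The point about knowledge updates without retraining can be illustrated with a small sketch. It rests on stated assumptions: `KeywordIndex` is a hypothetical in-memory store that ranks passages by shared-word count, `frozen_model` is a stand-in for an LLM whose weights never change, and the "Foo library" facts are invented purely for the example.

```python
import re
from typing import Optional


def tokens(text: str) -> set:
    """Lowercase word set used for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KeywordIndex:
    """Toy external knowledge store (illustrative, not a real vector database)."""

    def __init__(self) -> None:
        self.passages = []

    def add(self, passage: str) -> None:
        # Updating knowledge is just appending to the store.
        self.passages.append(passage)

    def search(self, query: str) -> Optional[str]:
        # Return the passage sharing the most words with the query, if any.
        q = tokens(query)
        best, best_score = None, 0
        for passage in self.passages:
            score = len(q & tokens(passage))
            if score > best_score:
                best, best_score = passage, score
        return best


def frozen_model(query: str, evidence: Optional[str]) -> str:
    """Hypothetical LLM stand-in: answers from the evidence it is handed."""
    return evidence if evidence is not None else "unknown"


index = KeywordIndex()
index.add("Release 1.0 of the hypothetical Foo library shipped in 2022.")
query = "What is the latest release of the Foo library?"
before = frozen_model(query, index.search(query))

# New knowledge arrives: only the index changes, the model function does not.
index.add("Release 2.0 of the hypothetical Foo library is the latest release, shipped in 2024.")
after = frozen_model(query, index.search(query))
print(before)
print(after)
```

The answer changes between the two calls even though `frozen_model` is untouched, which is the property the bullet describes.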

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The three-paradigm map could serve as a checklist for engineers choosing which RAG variant to deploy for a given task.
Documented challenges may prompt hybrid systems that combine elements from more than one paradigm.
Widespread use of the survey's structure would make it easier to track which component improvements actually move performance.

Load-bearing premise

The chosen division of all RAG work into Naive, Advanced, and Modular paradigms plus the split into retrieval, generation, and augmentation components forms a complete and non-overlapping framework.

What would settle it

Publication of a new RAG system that cannot be placed in any of the three paradigms or that requires a fourth component outside retrieval, generation, and augmentation.

Figures

Figures reproduced from arXiv: 2312.10997 by Haofen Wang, Jiawei Sun, Jinliu Pan, Kangxiang Jia, Meng Wang, Xinyu Gao, Yi Dai, Yunfan Gao, Yun Xiong, Yuxi Bi.

**Figure 1.** Technology tree of RAG research. The stages of involving RAG mainly include pre-training, fine-tuning, and inference. With the emergence of LLMs, …

**Figure 2.** A representative instance of the RAG process applied to question answering. It mainly consists of 3 steps. 1) Indexing. Documents are split into chunks, …

**Figure 3.** Comparison between the three paradigms of RAG. (Left) Naive RAG mainly consists of three parts: indexing, retrieval and generation. (Middle) …

**Figure 4.** RAG compared with other model optimization methods in the aspects of “External Knowledge Required” and “Model Adaption Required”. Prompt …

**Figure 5.** In addition to the most common once retrieval, RAG also includes three types of retrieval augmentation processes. (left) Iterative retrieval involves …

**Figure 6.** Summary of RAG ecosystem initial learning curve. 3) Specialization - optimizing RAG to better serve production environments. The mutual growth of RAG models and their technology stacks is evident; technological advancements continuously establish new standards for existing infrastructure. In turn, enhancements to the technology stack drive the development of RAG capabilities. RAG toolkits are converging in…

Original abstract

Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the generation, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases. This comprehensive review paper offers a detailed examination of the progression of RAG paradigms, encompassing the Naive RAG, the Advanced RAG, and the Modular RAG. It meticulously scrutinizes the tripartite foundation of RAG frameworks, which includes the retrieval, the generation and the augmentation techniques. The paper highlights the state-of-the-art technologies embedded in each of these critical components, providing a profound understanding of the advancements in RAG systems. Furthermore, this paper introduces up-to-date evaluation framework and benchmark. At the end, this article delineates the challenges currently faced and points out prospective avenues for research and development.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This survey organizes RAG work into Naive/Advanced/Modular paradigms plus retrieval/generation/augmentation components, but adds no new methods or results.


This survey organizes existing RAG literature into three paradigms (Naive RAG, Advanced RAG, and Modular RAG) and breaks the systems down into retrieval, generation, and augmentation pieces. It also covers evaluation frameworks, benchmarks, current challenges, and some future directions. The abstract frames the taxonomy as a lens for looking at prior work rather than a complete partition. That structure is the main thing the paper contributes.

Readers who want a quick map of how different RAG approaches relate can use the sections on each component to see which techniques are being tried and where the cited papers sit. The coverage of evaluation methods and open problems gives a practical snapshot of what people are measuring and where gaps remain.

The paper does not claim or deliver any new derivation, experiment, or system. Its value is entirely in the grouping and the pointers to earlier papers. The usual survey risks apply: the chosen categories may overlap in practice, some relevant work could be left out, and the fast pace of LLM research means parts of the review will date quickly. Nothing in the abstract suggests the authors tested the taxonomy for exhaustiveness or ran a systematic search protocol that would let a reader verify coverage.

This paper is for newcomers to RAG or for people who need an organized reference list rather than for specialists looking for novel analysis. It is worth sending to peer review because a clear survey can still save time for the community even when the framework is one reasonable way among others and carries no original technical claim.

Referee Report

0 major / 2 minor

Summary. This survey paper reviews Retrieval-Augmented Generation (RAG) methods for Large Language Models, organizing the literature into three paradigms (Naive RAG, Advanced RAG, Modular RAG) and dissecting the core components of retrieval, generation, and augmentation. It additionally surveys evaluation frameworks and benchmarks, identifies current challenges, and outlines future research directions.

Significance. If the proposed taxonomy functions as a useful organizing lens rather than a claimed exhaustive partition, the survey could help researchers map the RAG literature by highlighting component-level advances and evaluation practices. The absence of original derivations or empirical claims means its value rests on the clarity and coverage of the organizational framework.

minor comments (2)

Abstract: the phrasing 'the retrieval, the generation and the augmentation techniques' is awkward and should be revised to 'retrieval, generation, and augmentation techniques' for readability.
Abstract: the sentence 'RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases' repeats ideas already stated in the preceding sentences; consider condensing.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript and for recommending minor revision. The assessment that the taxonomy can serve as a useful organizing lens aligns with our intent.

Circularity Check

0 steps flagged

No circularity: descriptive survey with no derivations


This paper is a literature survey whose contribution is an organizational taxonomy (Naive/Advanced/Modular RAG plus retrieval/generation/augmentation split) and a review of prior work. No equations, derivations, fitted parameters, or load-bearing self-citations appear in the abstract or described structure. The framework is explicitly presented as a lens for examining existing publications rather than a result derived from data or prior claims within the paper itself. Therefore the derivation chain is empty and the circularity score is 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey the paper introduces no free parameters, axioms, or invented entities; it aggregates and categorizes existing published work.

pith-pipeline@v0.9.0 · 5749 in / 974 out tokens · 23412 ms · 2026-05-24T05:08:46.485471+00:00 · methodology


Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Knowledge Poisoning Attacks on Medical Multi-Modal Retrieval-Augmented Generation
cs.CR 2026-05 unverdicted novelty 8.0

M³Att poisons medical multimodal RAG by pairing covert textual misinformation with query-agnostic visual perturbations that increase retrieval of the bad content, causing LLMs to generate clinically plausible but inco...
Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
cs.CR 2026-05 unverdicted novelty 8.0

Trojan Hippo attacks on LLM agent memory achieve 85-100% success rates in data exfiltration across four memory backends even after 100 benign sessions, while evaluated defenses reduce success rates but impose varying ...
Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing
cs.CR 2026-04 unverdicted novelty 8.0

The first SoK on LLM-based AutoPT frameworks provides a six-dimension taxonomy of agent designs and a unified empirical benchmark evaluating 15 frameworks via over 10 billion tokens and 1,500 manually reviewed logs.
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments
cs.HC 2024-05 conditional novelty 8.0

AgentClinic is a multimodal agent benchmark demonstrating that LLM diagnostic accuracy on MedQA drops to below one-tenth in sequential clinical simulations, with Claude-3.5 leading and large tool-use differences acros...
UniPPTBench: A Unified Benchmark for Presentation Generation Across Diverse Input Settings
cs.CV 2026-05 conditional novelty 7.0

The paper presents UniPPTBench and UniPPTEval, a unified benchmark and scenario-aware evaluation framework for presentation generation from vague prompts, long documents, multimodal documents, and multi-source inputs.
Utility-Oriented Visual Evidence Selection for Multimodal Retrieval-Augmented Generation
cs.CL 2026-05 unverdicted novelty 7.0

Evidence utility is defined as information gain on the model's output distribution, with ranking by gain on a latent helpfulness variable shown equivalent to answer-space utility under mild assumptions, enabling a tra...
A Hybrid Framework for Natural Language Querying of IFC Models with Relational and Graph Representations
cs.CL 2026-05 unverdicted novelty 7.0

IfcLLM combines relational and graph representations of IFC models with iterative LLM reasoning to deliver 93.3-100% first-attempt accuracy on natural language queries across three test models.
Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation
cs.AI 2026-05 unverdicted novelty 7.0

PyRAG turns multi-hop reasoning into executable Python code over retrieval tools for explicit, verifiable step-by-step RAG.
DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning
cs.CL 2026-05 unverdicted novelty 7.0

DeepRefine refines agent-compiled knowledge bases via multi-turn abductive diagnosis and RL training with a GBD reward, yielding consistent downstream task gains.
EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium
cs.AI 2026-05 unverdicted novelty 7.0

EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to ad...
SEM-RAG: Structure-Preserving Multimodal Graph Compilation and Entropy-Guided Retrieval for Telecommunication Standards
eess.SP 2026-05 unverdicted novelty 7.0

SEM-RAG compiles telecommunication standards into structure-preserving graphs and uses entropy-guided retrieval to reach 94.1% accuracy on TeleQnA and 93.8% on ORAN-Bench-13K while reducing indexing token usage compar...
MANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agents
cs.CL 2026-05 unverdicted novelty 7.0

MANTRA automatically synthesizes SMT-validated compliance benchmarks for LLM agents from natural language manuals and tool schemas, producing 285 tasks across 6 domains with minimal human effort.
LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG
cs.CL 2026-05 unverdicted novelty 7.0

LatentRAG performs agentic RAG by generating latent tokens for thoughts and subqueries in one forward pass, matching explicit methods' accuracy on seven benchmarks while reducing latency by ~90%.
Stateful Agent Backdoor
cs.CR 2026-05 unverdicted novelty 7.0

A stateful backdoor for LLM agents, modeled as a Mealy machine with a decomposition framework, enables incremental malicious actions across sessions and achieves 80-95% attack success rate on four models.
Privacy Without Losing Place: A Paradigm for Private Retrieval in Spatial RAGs
cs.CR 2026-05 unverdicted novelty 7.0

PAS encodes locations via relative anchors and bins to deliver roughly 370-400m adversarial error in spatial RAG while retaining over half the baseline retrieval performance and keeping generation quality robust.
Telegraph English: Semantic Prompt Compression via Structured Symbolic Rewriting
cs.CL 2026-05 unverdicted novelty 7.0

Telegraph English compresses prompts via structured symbolic rewriting into atomic facts, achieving roughly 50% token reduction with 99.1% key-fact accuracy on LongBench-v2 and outperforming token-deletion baselines a...
E-MIA: Exam-Style Black-Box Membership Inference Attacks against RAG Systems
cs.CR 2026-05 unverdicted novelty 7.0

E-MIA converts document details into four types of exam questions and aggregates the RAG's answers into a membership score that separates member and non-member documents better than prior similarity-based or probe-bas...
ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost?
cs.CL 2026-05 unverdicted novelty 7.0

Personalized LLM-generated plain language summaries improve lay readers' comprehension and quality ratings but increase risks of reinforcing biases and introducing hallucinations compared to static expert summaries.
TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data
cs.AI 2026-04 unverdicted novelty 7.0

TADI shows that domain-specialized tools orchestrated by an LLM over dual structured and semantic databases can convert heterogeneous wellsite data into evidence-grounded drilling intelligence, with tool design matter...
When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models
cs.IR 2026-04 unverdicted novelty 7.0

ReaLM-Retrieve uses step-level uncertainty to trigger retrievals during reasoning, achieving 10.1% better F1 scores and 47% fewer calls on multi-hop QA benchmarks.
RepoDoc: A Knowledge Graph-Based Framework to Automatic Documentation Generation and Incremental Updates
cs.SE 2026-04 unverdicted novelty 7.0

RepoDoc uses a repository knowledge graph with module clustering and semantic impact propagation to generate more complete documentation 3x faster with 85% fewer tokens and handle incremental updates 73% faster than p...
Context-Augmented Code Generation: How Product Context Improves AI Coding Agent Decision Compliance by 49%
cs.SE 2026-04 unverdicted novelty 7.0

Adding product context retrieval to AI coding agents raises decision compliance from 46% to 95% on a new benchmark of 8 tasks with 41 weighted decision points.
XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation
cs.AI 2026-04 unverdicted novelty 7.0

XGRAG uses graph perturbations to quantify component contributions in GraphRAG and achieves 14.81% better explanation quality than text-based baselines on QA datasets, with correlations to graph centrality.
Similar Users-Augmented Interest Network
cs.IR 2026-04 unverdicted novelty 7.0

SUIN improves CTR prediction by augmenting target user sequences with similar users' behaviors via embedding-based retrieval, user-specific position encoding, and user-aware target attention.
Uncertainty Propagation in LLM-Based Systems
cs.SE 2026-04 unverdicted novelty 7.0

This paper introduces a systems-level conceptual framing and a three-level taxonomy (intra-model, system-level, socio-technical) for uncertainty propagation in compound LLM applications, along with engineering insight...
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
cs.CR 2026-04 unverdicted novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
Participatory provenance as representational auditing for AI-mediated public consultation
cs.AI 2026-04 unverdicted novelty 7.0

Participatory provenance auditing of Canada's AI strategy consultation shows official AI summaries exclude 15-17% of participants more than random baselines, with 33-88% exclusion for dissent clusters.
Learning When Not to Decide: A Framework for Overcoming Factual Presumptuousness in AI Adjudication
cs.AI 2026-04 unverdicted novelty 7.0

A new structured prompting method (SPEC) helps AI detect insufficient evidence in adjudication tasks and defer decisions appropriately, reaching 89% accuracy on a benchmark varying information completeness from Colora...
ArbGraph: Conflict-Aware Evidence Arbitration for Reliable Long-Form Retrieval-Augmented Generation
cs.CL 2026-04 unverdicted novelty 7.0

ArbGraph resolves conflicts in RAG evidence by constructing a conflict-aware graph of atomic claims and applying intensity-driven iterative arbitration to suppress unreliable claims prior to generation.
STRIDE: Strategic Iterative Decision-Making for Retrieval-Augmented Multi-Hop Question Answering
cs.AI 2026-04 unverdicted novelty 7.0

STRIDE uses a meta-planner for entity-agnostic reasoning skeletons and a supervisor for dependency-aware execution to improve retrieval-augmented multi-hop QA.
Skill-RAG: Failure-State-Aware Retrieval Augmentation via Hidden-State Probing and Skill Routing
cs.CL 2026-04 unverdicted novelty 7.0

Skill-RAG detects retrieval failure states from hidden representations and routes to one of four corrective skills to raise accuracy on persistent hard cases in open-domain QA and reasoning benchmarks.
ASTRA: Enhancing Multi-Subject Generation with Retrieval-Augmented Pose Guidance and Disentangled Position Embedding
cs.CV 2026-04 unverdicted novelty 7.0

ASTRA disentangles subject identity from pose structure in diffusion transformers via retrieval-augmented pose guidance, asymmetric EURoPE embeddings, and a DSM adapter to improve multi-subject generation.
MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning
cs.CL 2026-04 unverdicted novelty 7.0

MM-Doc-R1 combines an agentic workflow with Similarity-based Policy Optimization (SPO) to achieve 10.4% higher performance than prior baselines on long-document visual question answering.
Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size
cs.CL 2026-04 unverdicted novelty 7.0

Contextual entrainment decreases for semantic contexts but increases for non-semantic ones as LLMs scale, following power-law trends with 4x better resistance to misinformation but 2x more copying of arbitrary tokens.
Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method
cs.CL 2026-04 unverdicted novelty 7.0

ConflictQA benchmark shows LLMs fail to resolve conflicts between text and KG evidence and often default to one source, motivating the XoT explanation-based reasoning method.
VISOR: Agentic Visual Retrieval-Augmented Generation via Iterative Search and Over-horizon Reasoning
cs.CV 2026-04 unverdicted novelty 7.0

VISOR is a unified agentic VRAG framework with Evidence Space structuring, visual action evaluation/correction, and dynamic sliding-window trajectories trained via GRPO-based RL that achieves SOTA performance on long-...
Decoupling Vector Data and Index Storage for Space Efficiency
cs.DB 2026-04 unverdicted novelty 7.0

DecoupleVS decouples vector data and index storage in ANNS systems to cut storage space by up to 58.7% with competitive search and update performance.
ROZA Graphs: Self-Improving Near-Deterministic RAG through Evidence-Centric Feedback
cs.AI 2026-04 unverdicted novelty 7.0

ROZA graphs enable self-improving RAG by storing evidence-specific reasoning chains, yielding up to 10.6pp accuracy gains and 46% lower cost through graph traversal feedback.
An End-to-End Approach for Fixing Concurrency Bugs via SHB-Based Context Extractor
cs.SE 2026-04 unverdicted novelty 7.0

ConFixAgent repairs diverse concurrency bugs end-to-end by using Static Happens-Before graphs to extract relevant code context for LLMs, outperforming prior tools in benchmarks.
Can You Trust the Vectors in Your Vector Database? Black-Hole Attack from Embedding Space Defects
cs.CR 2026-04 unverdicted novelty 7.0

Injecting a few malicious vectors near the centroid exploits centrality-driven hubness in high-dimensional embeddings, causing them to dominate top-k retrievals in up to 99.85% of cases.
Architecture Without Architects: How AI Coding Agents Shape Software Architecture
cs.SE 2026-04 unverdicted novelty 7.0

AI coding agents perform vibe architecting by making prompt-driven architectural choices that produce structurally different systems for identical tasks.
Unified and Efficient Approach for Multi-Vector Similarity Search
cs.DB 2026-04 unverdicted novelty 7.0

MV-HNSW is the first native hierarchical graph index for multi-vector data, achieving over 90% recall with up to 14x lower search latency than prior filter-and-refine approaches across seven datasets.
AnnoRetrieve: Efficient Structured Retrieval for Unstructured Document Analysis
cs.IR 2026-04 unverdicted novelty 7.0

AnnoRetrieve uses auto-generated structured schemas and queries to retrieve information from unstructured documents more efficiently and accurately than embedding-based methods.
From PDF to RAG-Ready: Evaluating Document Conversion Frameworks for Domain-Specific Question Answering
cs.IR 2026-03 unverdicted novelty 7.0

Docling with hierarchical splitting reaches 94.1% RAG accuracy on domain documents, beating naive PDF loading but trailing manual Markdown curation at 97.1%.
PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments
cs.AI 2026-03 unverdicted novelty 7.0

PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.
GraphScout: Empowering Large Language Models with Intrinsic Exploration Ability for Agentic Graph Reasoning
cs.AI 2026-03 unverdicted novelty 7.0

GraphScout trains LLMs to autonomously synthesize structured training data from knowledge graphs via flexible exploration tools, enabling a 4B model to outperform larger LLMs by 16.7% on average with fewer inference t...
PhysMem: Scaling Test-Time Memory for Embodied Physical Reasoning
cs.RO 2026-02 unverdicted novelty 7.0

PhysMem enables VLM-based robot planners to learn and verify physical properties through test-time interaction and hypothesis testing, raising success on a brick insertion task from 23% to 76%.
AtomicRAG: Atom-Entity Graphs for Retrieval-Augmented Generation
cs.IR 2026-02 unverdicted novelty 7.0

AtomicRAG replaces chunk-based and triple-based GraphRAG with atom-entity graphs that store facts as atomic units and use personalized PageRank plus relevance filtering to achieve higher retrieval accuracy and reasoni...
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning
cs.CV 2026-01 unverdicted novelty 7.0

VideoDR is a new benchmark for open-web video deep research that tests multimodal models on cross-frame visual anchor extraction, interactive retrieval, and multi-hop reasoning over joint video-web evidence.
In Line with Context: Repository-Level Code Generation via Context Inlining
cs.SE 2026-01 unverdicted novelty 7.0

InlineCoder reframes repository-level code generation as function-level coding by using a draft anchor to inline the target function into its call graph for upstream usage and downstream dependency context.
VLegal-Bench: Cognitively Grounded Benchmark for Vietnamese Legal Reasoning of Large Language Models
cs.CL 2025-12 conditional novelty 7.0

VLegal-Bench supplies 10,450 expert-validated samples for evaluating LLMs on Vietnamese legal questions, retrieval, multi-step reasoning, and scenario solving.
MCP vs RAG vs NLWeb vs HTML: A Comparison of the Effectiveness and Efficiency of Different Agent Interfaces to the Web (Technical Report)
cs.CL 2025-11 accept novelty 7.0

RAG, MCP, and NLWeb interfaces let LLM web agents achieve higher F1 scores (0.75-0.77 vs 0.67) and much lower token usage and runtime than HTML in controlled e-commerce tasks.
MIST: A Co-Design Framework for Heterogeneous, Multi-Stage LLM Inference
cs.AR 2025-04 unverdicted novelty 7.0

MIST is a new simulator for heterogeneous multi-stage LLM inference that combines hardware traces with analytical models to explore configuration trade-offs in hybrid CPU-accelerator systems.
Ex-GraphRAG: Interpretable Evidence Routing for Graph-Augmented LLMs
cs.LG 2026-05 unverdicted novelty 6.0

Ex-GraphRAG replaces GNN encoders with M-GNAN for exact node-level decomposition in graph-augmented LLMs, matching black-box performance on STaRK-Prime while exposing semantic-structural mismatches that degrade multi-...
Memory Grafting: Scaling Language Model Pre-training via Offline Conditional Memory
cs.CL 2026-05 unverdicted novelty 6.0

Memory Grafting improves language-model benchmarks by grafting offline hidden-state memory from a larger model into a recipient model using n-gram lookups and lightweight adapters, outperforming MoE and vanilla Engram...
EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design
cs.AI 2026-05 unverdicted novelty 6.0

EngiAI is a multi-agent framework unifying topology optimization, retrieval, HPC orchestration, and manufacturing control, with benchmarks showing proprietary LLMs at 96-97% task completion on Beams2D and lower perfor...
DocOS: Towards Proactive Document-Guided Actions in GUI Agents
cs.AI 2026-05 unverdicted novelty 6.0

Introduces DocOS benchmark to test GUI agents on proactively locating, comprehending, and executing instructions from online documentation in interactive web settings.
uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs
cs.CR 2026-05 unverdicted novelty 6.0

uGen is the first retrieval-augmented multi-agent LLM framework for generating functionally correct microarchitectural attack PoCs, reporting up to 100% success on Spectre-v1 and 80% on Prime+Probe at low cost.
Jobs' AI Exposure Should Be Measured from Evidence, Not Model Priors
cs.IR 2026-05 conditional novelty 6.0

The authors propose a retrieval-augmented framework that grounds AI exposure labels for 18,796 O*NET occupation-task pairs in retrieved news and academic abstracts, outperforming zero-shot prompting in 72% of disagree...
Why Retrieval-Augmented Generation Fails: A Graph Perspective
cs.CL 2026-05 unverdicted novelty 6.0

Attribution graphs reveal that RAG failures arise from shallow fragmented evidence flow in LLMs, enabling topology-based detection and targeted interventions that reinforce question-guided routing.

Reference graph

Works this paper leans on

178 extracted references · 178 canonical work pages · cited by 254 Pith papers · 38 internal anchors

[1]

Large language models struggle to learn long-tail knowledge,

N. Kandpal, H. Deng, A. Roberts, E. Wallace, and C. Raffel, “Large language models struggle to learn long-tail knowledge,” in Interna- tional Conference on Machine Learning . PMLR, 2023, pp. 15 696– 15 707

work page 2023
[2]

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

Y . Zhang, Y . Li, L. Cui, D. Cai, L. Liu, T. Fu, X. Huang, E. Zhao, Y . Zhang, Y . Chenet al., “Siren’s song in the ai ocean: A survey on hal- lucination in large language models,” arXiv preprint arXiv:2309.01219, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[3]

Gar-meets-rag paradigm for zero-shot information re- trieval,

D. Arora, A. Kini, S. R. Chowdhury, N. Natarajan, G. Sinha, and A. Sharma, “Gar-meets-rag paradigm for zero-shot information re- trieval,” arXiv preprint arXiv:2310.20158 , 2023

work page arXiv 2023
[4]

Retrieval-augmented generation for knowledge-intensive nlp tasks

P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel et al., “Retrieval-augmented generation for knowledge-intensive nlp tasks,” Advances in Neural Information Processing Systems, vol. 33, pp. 9459–9474, 2020.
[5]

Improving language models by retrieving from trillions of tokens

S. Borgeaud, A. Mensch, J. Hoffmann, T. Cai, E. Rutherford, K. Millican, G. B. Van Den Driessche, J.-B. Lespiau, B. Damoc, A. Clark et al., “Improving language models by retrieving from trillions of tokens,” in International Conference on Machine Learning. PMLR, 2022, pp. 2206–2240.
[6]

Training language models to follow instructions with human feedback

L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray et al., “Training language models to follow instructions with human feedback,” Advances in Neural Information Processing Systems, vol. 35, pp. 27730–27744, 2022.
[7]

Query rewriting for retrieval-augmented large language models

X. Ma, Y. Gong, P. He, H. Zhao, and N. Duan, “Query rewriting for retrieval-augmented large language models,” arXiv preprint arXiv:2305.14283, 2023.
[8]

Advanced rag techniques: an illustrated overview

I. Ilin, “Advanced rag techniques: an illustrated overview,” https://pub.towardsai.net/advanced-rag-techniques-an-illustrated-overview-04d193d8fec6, 2023.
[9]

Large language model based long-tail query rewriting in taobao search

W. Peng, G. Li, Y. Jiang, Z. Wang, D. Ou, X. Zeng, E. Chen et al., “Large language model based long-tail query rewriting in taobao search,” arXiv preprint arXiv:2311.03758, 2023.
[10]

Take a step back: Evoking reasoning via abstraction in large language models

H. S. Zheng, S. Mishra, X. Chen, H.-T. Cheng, E. H. Chi, Q. V. Le, and D. Zhou, “Take a step back: Evoking reasoning via abstraction in large language models,” arXiv preprint arXiv:2310.06117, 2023.
[11]

Precise zero-shot dense retrieval without relevance labels

L. Gao, X. Ma, J. Lin, and J. Callan, “Precise zero-shot dense retrieval without relevance labels,” arXiv preprint arXiv:2212.10496, 2022.
[12]

Enhancing rag pipelines in haystack: Introducing diversityranker and lostinthemiddleranker

V. Blagojevi, “Enhancing rag pipelines in haystack: Introducing diversityranker and lostinthemiddleranker,” https://towardsdatascience.com/enhancing-rag-pipelines-in-haystack-45f14e2bc9f5, 2023.
[13]

Generate rather than retrieve: Large language models are strong context generators

W. Yu, D. Iter, S. Wang, Y. Xu, M. Ju, S. Sanyal, C. Zhu, M. Zeng, and M. Jiang, “Generate rather than retrieve: Large language models are strong context generators,” arXiv preprint arXiv:2209.10063, 2022.
[14]

Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy

Z. Shao, Y. Gong, Y. Shen, M. Huang, N. Duan, and W. Chen, “Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy,” arXiv preprint arXiv:2305.15294, 2023.
[15]

Knowledgpt: Enhancing large language models with retrieval and storage access on knowledge bases

X. Wang, Q. Yang, Y. Qiu, J. Liang, Q. He, Z. Gu, Y. Xiao, and W. Wang, “Knowledgpt: Enhancing large language models with retrieval and storage access on knowledge bases,” arXiv preprint arXiv:2308.11761, 2023.
[16]

Forget rag, the future is rag-fusion

A. H. Raudaschl, “Forget rag, the future is rag-fusion,” https://towardsdatascience.com/forget-rag-the-future-is-rag-fusion-1147298d8ad1, 2023.
[17]

Lift yourself up: Retrieval-augmented text generation with self memory

X. Cheng, D. Luo, X. Chen, L. Liu, D. Zhao, and R. Yan, “Lift yourself up: Retrieval-augmented text generation with self memory,” arXiv preprint arXiv:2305.02437, 2023.
[18]

Training data is more valuable than you think: A simple and effective method by retrieving from training data

S. Wang, Y. Xu, Y. Fang, Y. Liu, S. Sun, R. Xu, C. Zhu, and M. Zeng, “Training data is more valuable than you think: A simple and effective method by retrieving from training data,” arXiv preprint arXiv:2203.08773, 2022.
[19]

From classification to generation: Insights into crosslingual retrieval augmented icl

X. Li, E. Nie, and S. Liang, “From classification to generation: Insights into crosslingual retrieval augmented icl,” arXiv preprint arXiv:2311.06595, 2023.
[20]

Uprise: Universal prompt retrieval for improving zero-shot evaluation

D. Cheng, S. Huang, J. Bi, Y. Zhan, J. Liu, Y. Wang, H. Sun, F. Wei, D. Deng, and Q. Zhang, “Uprise: Universal prompt retrieval for improving zero-shot evaluation,” arXiv preprint arXiv:2303.08518, 2023.
[21]

Promptagator: Few-shot dense retrieval from 8 examples

Z. Dai, V. Y. Zhao, J. Ma, Y. Luan, J. Ni, J. Lu, A. Bakalov, K. Guu, K. B. Hall, and M.-W. Chang, “Promptagator: Few-shot dense retrieval from 8 examples,” arXiv preprint arXiv:2209.11755, 2022.
[22]

Recitation-augmented language models

Z. Sun, X. Wang, Y. Tay, Y. Yang, and D. Zhou, “Recitation-augmented language models,” arXiv preprint arXiv:2210.01296, 2022.
[23]

Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive nlp

O. Khattab, K. Santhanam, X. L. Li, D. Hall, P. Liang, C. Potts, and M. Zaharia, “Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive nlp,” arXiv preprint arXiv:2212.14024, 2022.
[24]

Active retrieval augmented generation

Z. Jiang, F. F. Xu, L. Gao, Z. Sun, Q. Liu, J. Dwivedi-Yu, Y. Yang, J. Callan, and G. Neubig, “Active retrieval augmented generation,” arXiv preprint arXiv:2305.06983, 2023.
[25]

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

A. Asai, Z. Wu, Y. Wang, A. Sil, and H. Hajishirzi, “Self-rag: Learning to retrieve, generate, and critique through self-reflection,” arXiv preprint arXiv:2310.11511, 2023.
[26]

Bridging the preference gap between retrievers and llms

Z. Ke, W. Kong, C. Li, M. Zhang, Q. Mei, and M. Bendersky, “Bridging the preference gap between retrievers and llms,” arXiv preprint arXiv:2401.06954, 2024.
[27]

Ra-dit: Retrieval-augmented dual instruction tuning

X. V. Lin, X. Chen, M. Chen, W. Shi, M. Lomeli, R. James, P. Rodriguez, J. Kahn, G. Szilvasy, M. Lewis et al., “Ra-dit: Retrieval-augmented dual instruction tuning,” arXiv preprint arXiv:2310.01352, 2023.
[28]

Fine-tuning or retrieval? comparing knowledge injection in llms

O. Ovadia, M. Brief, M. Mishaeli, and O. Elisha, “Fine-tuning or retrieval? comparing knowledge injection in llms,” arXiv preprint arXiv:2312.05934, 2023.
[29]

Copy is all you need

T. Lan, D. Cai, Y. Wang, H. Huang, and X.-L. Mao, “Copy is all you need,” in The Eleventh International Conference on Learning Representations, 2022.
[30]

Dense x retrieval: What retrieval granularity should we use?

T. Chen, H. Wang, S. Chen, W. Yu, K. Ma, X. Zhao, D. Yu, and H. Zhang, “Dense x retrieval: What retrieval granularity should we use?” arXiv preprint arXiv:2312.06648, 2023.
[31]

Divide & conquer for entailment-aware multi-hop evidence retrieval

F. Luo and M. Surdeanu, “Divide & conquer for entailment-aware multi-hop evidence retrieval,” arXiv preprint arXiv:2311.02616, 2023.
[32]

Diversify question generation with retrieval-augmented style transfer

Q. Gou, Z. Xia, B. Yu, H. Yu, F. Huang, Y. Li, and N. Cam-Tu, “Diversify question generation with retrieval-augmented style transfer,” arXiv preprint arXiv:2310.14503, 2023.
[33]

Prompt-guided retrieval augmentation for non-knowledge-intensive tasks

Z. Guo, S. Cheng, Y. Wang, P. Li, and Y. Liu, “Prompt-guided retrieval augmentation for non-knowledge-intensive tasks,” arXiv preprint arXiv:2305.17653, 2023.
[34]

Learning to filter context for retrieval-augmented generation

Z. Wang, J. Araki, Z. Jiang, M. R. Parvez, and G. Neubig, “Learning to filter context for retrieval-augmented generation,” arXiv preprint arXiv:2311.08377, 2023.
[35]

Retrieval-augmented data augmentation for low-resource domain tasks

M. Seo, J. Baek, J. Thorne, and S. J. Hwang, “Retrieval-augmented data augmentation for low-resource domain tasks,” arXiv preprint arXiv:2402.13482, 2024.
[37]

Retrieval-augmented generative question answering for event argument extraction

X. Du and H. Ji, “Retrieval-augmented generative question answering for event argument extraction,” arXiv preprint arXiv:2211.07067, 2022.
[38]

Learning to retrieve in-context examples for large language models

L. Wang, N. Yang, and F. Wei, “Learning to retrieve in-context examples for large language models,” arXiv preprint arXiv:2307.07164, 2023.
[39]

Recommender systems with generative retrieval

S. Rajput, N. Mehta, A. Singh, R. H. Keshavan, T. Vu, L. Heldt, L. Hong, Y. Tay, V. Q. Tran, J. Samost et al., “Recommender systems with generative retrieval,” arXiv preprint arXiv:2305.05065, 2023.
[40]

Language models as semantic indexers

B. Jin, H. Zeng, G. Wang, X. Chen, T. Wei, R. Li, Z. Wang, Z. Li, Y. Li, H. Lu et al., “Language models as semantic indexers,” arXiv preprint arXiv:2310.07815, 2023.
[41]

Context tuning for retrieval augmented generation

R. Anantha, T. Bethi, D. Vodianik, and S. Chappidi, “Context tuning for retrieval augmented generation,” arXiv preprint arXiv:2312.05708, 2023.
[42]

Atlas: Few-shot Learning with Retrieval Augmented Language Models

G. Izacard, P. Lewis, M. Lomeli, L. Hosseini, F. Petroni, T. Schick, J. Dwivedi-Yu, A. Joulin, S. Riedel, and E. Grave, “Few-shot learning with retrieval augmented language models,” arXiv preprint arXiv:2208.03299, 2022.
[43]

Raven: In-context learning with retrieval augmented encoder-decoder language models

J. Huang, W. Ping, P. Xu, M. Shoeybi, K. C.-C. Chang, and B. Catanzaro, “Raven: In-context learning with retrieval augmented encoder-decoder language models,” arXiv preprint arXiv:2308.07922, 2023.
[44]

Shall we pretrain autoregressive language models with retrieval? a comprehensive study

B. Wang, W. Ping, P. Xu, L. McAfee, Z. Liu, M. Shoeybi, Y. Dong, O. Kuchaiev, B. Li, C. Xiao et al., “Shall we pretrain autoregressive language models with retrieval? a comprehensive study,” arXiv preprint arXiv:2304.06762, 2023.
[45]

Instructretro: Instruction tuning post retrieval-augmented pretraining

B. Wang, W. Ping, L. McAfee, P. Xu, B. Li, M. Shoeybi, and B. Catanzaro, “Instructretro: Instruction tuning post retrieval-augmented pretraining,” arXiv preprint arXiv:2310.07713, 2023.
[46]

Improving the domain adaptation of retrieval augmented generation (rag) models for open domain question answering

S. Siriwardhana, R. Weerasekera, E. Wen, T. Kaluarachchi, R. Rana, and S. Nanayakkara, “Improving the domain adaptation of retrieval augmented generation (rag) models for open domain question answering,” Transactions of the Association for Computational Linguistics, vol. 11, pp. 1–17, 2023.
[47]

Augmentation-adapted retriever improves generalization of language models as generic plug-in

Z. Yu, C. Xiong, S. Yu, and Z. Liu, “Augmentation-adapted retriever improves generalization of language models as generic plug-in,” arXiv preprint arXiv:2305.17331, 2023.
[49]

Understanding retrieval augmentation for long-form question answering

H.-T. Chen, F. Xu, S. A. Arora, and E. Choi, “Understanding retrieval augmentation for long-form question answering,” arXiv preprint arXiv:2310.12150, 2023.
[50]

Chain-of-note: Enhancing robustness in retrieval-augmented language models

W. Yu, H. Zhang, X. Pan, K. Ma, H. Wang, and D. Yu, “Chain-of-note: Enhancing robustness in retrieval-augmented language models,” arXiv preprint arXiv:2311.09210, 2023.
[51]

Search-in-the-chain: Towards accurate, credible and traceable large language models for knowledge-intensive tasks

S. Xu, L. Pang, H. Shen, X. Cheng, and T.-S. Chua, “Search-in-the-chain: Towards accurate, credible and traceable large language models for knowledge-intensive tasks,” CoRR, vol. abs/2304.14732, 2023.
[52]

Optimizing retrieval-augmented reader models via token elimination

M. Berchansky, P. Izsak, A. Caciularu, I. Dagan, and M. Wasserblat, “Optimizing retrieval-augmented reader models via token elimination,” arXiv preprint arXiv:2310.13682, 2023.
[53]

Paperqa: Retrieval-augmented generative agent for scientific research

J. Lála, O. O’Donoghue, A. Shtedritski, S. Cox, S. G. Rodriques, and A. D. White, “Paperqa: Retrieval-augmented generative agent for scientific research,” arXiv preprint arXiv:2312.07559, 2023.
[54]

The power of noise: Redefining retrieval for rag systems

F. Cuconasu, G. Trappolini, F. Siciliano, S. Filice, C. Campagnano, Y. Maarek, N. Tonellotto, and F. Silvestri, “The power of noise: Redefining retrieval for rag systems,” arXiv preprint arXiv:2401.14887, 2024.
[55]

Iag: Induction-augmented generation framework for answering reasoning questions

Z. Zhang, X. Zhang, Y. Ren, S. Shi, M. Han, Y. Wu, R. Lai, and Z. Cao, “Iag: Induction-augmented generation framework for answering reasoning questions,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 1–14.
[56]

Nomiracl: Knowing when you don’t know for robust multilingual retrieval-augmented generation

N. Thakur, L. Bonifacio, X. Zhang, O. Ogundepo, E. Kamalloo, D. Alfonso-Hermelo, X. Li, Q. Liu, B. Chen, M. Rezagholizadeh et al., “Nomiracl: Knowing when you don’t know for robust multilingual retrieval-augmented generation,” arXiv preprint arXiv:2312.11361, 2023.
[57]

Tree of clarifications: Answering ambiguous questions with retrieval-augmented large language models

G. Kim, S. Kim, B. Jeon, J. Park, and J. Kang, “Tree of clarifications: Answering ambiguous questions with retrieval-augmented large language models,” arXiv preprint arXiv:2310.14696, 2023.
[58]

Self-knowledge guided retrieval augmentation for large language models

Y. Wang, P. Li, M. Sun, and Y. Liu, “Self-knowledge guided retrieval augmentation for large language models,” arXiv preprint arXiv:2310.05002, 2023.
[59]

Retrieval-generation synergy augmented large language models

Z. Feng, X. Feng, D. Zhao, M. Yang, and B. Qin, “Retrieval-generation synergy augmented large language models,” arXiv preprint arXiv:2310.05149, 2023.
[61]

Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

H. Trivedi, N. Balasubramanian, T. Khot, and A. Sabharwal, “Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions,” arXiv preprint arXiv:2212.10509, 2022.
[62]

Investigating the factual knowledge boundary of large language models with retrieval augmentation

R. Ren, Y. Wang, Y. Qu, W. X. Zhao, J. Liu, H. Tian, H. Wu, J.-R. Wen, and H. Wang, “Investigating the factual knowledge boundary of large language models with retrieval augmentation,” arXiv preprint arXiv:2307.11019, 2023.
[63]

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

P. Sarthi, S. Abdullah, A. Tuli, S. Khanna, A. Goldie, and C. D. Manning, “Raptor: Recursive abstractive processing for tree-organized retrieval,” arXiv preprint arXiv:2401.18059, 2024.
[64]

In-context retrieval-augmented language models

O. Ram, Y. Levine, I. Dalmedigos, D. Muhlgay, A. Shashua, K. Leyton-Brown, and Y. Shoham, “In-context retrieval-augmented language models,” arXiv preprint arXiv:2302.00083, 2023.
[65]

Retrieve-and-sample: Document-level event argument extraction via hybrid retrieval augmentation

Y. Ren, Y. Cao, P. Guo, F. Fang, W. Ma, and Z. Lin, “Retrieve-and-sample: Document-level event argument extraction via hybrid retrieval augmentation,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 293–306.
[66]

Zemi: Learning zero-shot semi-parametric language models from multiple tasks

Z. Wang, X. Pan, D. Yu, D. Yu, J. Chen, and H. Ji, “Zemi: Learning zero-shot semi-parametric language models from multiple tasks,” arXiv preprint arXiv:2210.00185, 2022.
[67]

Corrective Retrieval Augmented Generation

S.-Q. Yan, J.-C. Gu, Y. Zhu, and Z.-H. Ling, “Corrective retrieval augmented generation,” arXiv preprint arXiv:2401.15884, 2024.
[68]

1-pager: One pass answer generation and evidence retrieval

P. Jain, L. B. Soares, and T. Kwiatkowski, “1-pager: One pass answer generation and evidence retrieval,” arXiv preprint arXiv:2310.16568, 2023.
[69]

Prca: Fitting black-box large language models for retrieval question answering via pluggable reward-driven contextual adapter

H. Yang, Z. Li, Y. Zhang, J. Wang, N. Cheng, M. Li, and J. Xiao, “Prca: Fitting black-box large language models for retrieval question answering via pluggable reward-driven contextual adapter,” arXiv preprint arXiv:2310.18347, 2023.
[70]

Open-source large language models are strong zero-shot query likelihood models for document ranking

S. Zhuang, B. Liu, B. Koopman, and G. Zuccon, “Open-source large language models are strong zero-shot query likelihood models for document ranking,” arXiv preprint arXiv:2310.13243, 2023.
[71]

Recomp: Improving retrieval-augmented lms with compression and selective augmentation

F. Xu, W. Shi, and E. Choi, “Recomp: Improving retrieval-augmented lms with compression and selective augmentation,” arXiv preprint arXiv:2310.04408, 2023.
[72]

REPLUG: Retrieval-Augmented Black-Box Language Models

W. Shi, S. Min, M. Yasunaga, M. Seo, R. James, M. Lewis, L. Zettlemoyer, and W.-t. Yih, “Replug: Retrieval-augmented black-box language models,” arXiv preprint arXiv:2301.12652, 2023.
[73]

Enhancing llm intelligence with arm-rag: Auxiliary rationale memory for retrieval augmented generation

E. Melz, “Enhancing llm intelligence with arm-rag: Auxiliary rationale memory for retrieval augmented generation,” arXiv preprint arXiv:2311.04177, 2023.
[74]

Unims-rag: A unified multi-source retrieval-augmented generation for personalized dialogue systems

H. Wang, W. Huang, Y. Deng, R. Wang, Z. Wang, Y. Wang, F. Mi, J. Z. Pan, and K.-F. Wong, “Unims-rag: A unified multi-source retrieval-augmented generation for personalized dialogue systems,” arXiv preprint arXiv:2401.13256, 2024.
[75]

Augmented large language models with parametric knowledge guiding

Z. Luo, C. Xu, P. Zhao, X. Geng, C. Tao, J. Ma, Q. Lin, and D. Jiang, “Augmented large language models with parametric knowledge guiding,” arXiv preprint arXiv:2305.04757, 2023.
[76]

Structure-aware language model pretraining improves dense retrieval on structured data

X. Li, Z. Liu, C. Xiong, S. Yu, Y. Gu, Z. Liu, and G. Yu, “Structure-aware language model pretraining improves dense retrieval on structured data,” arXiv preprint arXiv:2305.19912, 2023.
[77]

Knowledge graph-augmented language models for knowledge-grounded dialogue generation

M. Kang, J. M. Kwak, J. Baek, and S. J. Hwang, “Knowledge graph-augmented language models for knowledge-grounded dialogue generation,” arXiv preprint arXiv:2305.18846, 2023.
[78]

Retrieval-generation alignment for end-to-end task-oriented dialogue system

W. Shen, Y. Gao, C. Huang, F. Wan, X. Quan, and W. Bi, “Retrieval-generation alignment for end-to-end task-oriented dialogue system,” arXiv preprint arXiv:2310.08877, 2023.
[79]

Dual-feedback knowledge retrieval for task-oriented dialogue systems

T. Shi, L. Li, Z. Lin, T. Yang, X. Quan, and Q. Wang, “Dual-feedback knowledge retrieval for task-oriented dialogue systems,” arXiv preprint arXiv:2310.14528, 2023.
[80]

Fabula: Intelligence report generation using retrieval-augmented narrative construction

P. Ranade and A. Joshi, “Fabula: Intelligence report generation using retrieval-augmented narrative construction,” arXiv preprint arXiv:2310.13848, 2023.
[81]

Think and retrieval: A hypothesis knowledge graph enhanced medical large language models

X. Jiang, R. Zhang, Y. Xu, R. Qiu, Y. Fang, Z. Wang, J. Tang, H. Ding, X. Chu, J. Zhao et al., “Think and retrieval: A hypothesis knowledge graph enhanced medical large language models,” arXiv preprint arXiv:2312.15883, 2023.
[82]

Knowledge-augmented language model verification

J. Baek, S. Jeong, M. Kang, J. C. Park, and S. J. Hwang, “Knowledge-augmented language model verification,” arXiv preprint arXiv:2310.12836, 2023.
[83]

Reasoning on graphs: Faithful and interpretable large language model reasoning

L. Luo, Y.-F. Li, G. Haffari, and S. Pan, “Reasoning on graphs: Faithful and interpretable large language model reasoning,” arXiv preprint arXiv:2310.01061, 2023.

Showing first 80 references.

[1] [1]

Large language models struggle to learn long-tail knowledge,

N. Kandpal, H. Deng, A. Roberts, E. Wallace, and C. Raffel, “Large language models struggle to learn long-tail knowledge,” in Interna- tional Conference on Machine Learning . PMLR, 2023, pp. 15 696– 15 707

work page 2023

[2] [2]

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

Y . Zhang, Y . Li, L. Cui, D. Cai, L. Liu, T. Fu, X. Huang, E. Zhao, Y . Zhang, Y . Chenet al., “Siren’s song in the ai ocean: A survey on hal- lucination in large language models,” arXiv preprint arXiv:2309.01219, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[3] [3]

Gar-meets-rag paradigm for zero-shot information re- trieval,

D. Arora, A. Kini, S. R. Chowdhury, N. Natarajan, G. Sinha, and A. Sharma, “Gar-meets-rag paradigm for zero-shot information re- trieval,” arXiv preprint arXiv:2310.20158 , 2023

work page arXiv 2023

[4] [4]

Retrieval- augmented generation for knowledge-intensive nlp tasks,

P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K ¨uttler, M. Lewis, W.-t. Yih, T. Rockt ¨aschel et al. , “Retrieval- augmented generation for knowledge-intensive nlp tasks,” Advances in Neural Information Processing Systems, vol. 33, pp. 9459–9474, 2020

work page 2020

[5] [5]

Improving language models by retrieving from trillions of tokens,

S. Borgeaud, A. Mensch, J. Hoffmann, T. Cai, E. Rutherford, K. Milli- can, G. B. Van Den Driessche, J.-B. Lespiau, B. Damoc, A. Clarket al., “Improving language models by retrieving from trillions of tokens,” in International conference on machine learning . PMLR, 2022, pp. 2206–2240

work page 2022

[6] [6]

Training language models to follow instructions with human feedback,

L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray et al. , “Training language models to follow instructions with human feedback,” Advances in neural information processing systems , vol. 35, pp. 27 730–27 744, 2022

work page 2022

[7] [7]

Query rewrit- ing for retrieval-augmented large language models,

X. Ma, Y . Gong, P. He, H. Zhao, and N. Duan, “Query rewrit- ing for retrieval-augmented large language models,” arXiv preprint arXiv:2305.14283, 2023

work page arXiv 2023

[8] [8]

Advanced rag techniques: an il- lustrated overview,

I. ILIN, “Advanced rag techniques: an il- lustrated overview,” https://pub.towardsai.net/ advanced-rag-techniques-an-illustrated-overview-04d193d8fec6, 2023

work page 2023

[9] [9]

Large language model based long-tail query rewriting in taobao search,

W. Peng, G. Li, Y . Jiang, Z. Wang, D. Ou, X. Zeng, E. Chen et al. , “Large language model based long-tail query rewriting in taobao search,” arXiv preprint arXiv:2311.03758 , 2023

work page arXiv 2023

[10] [10]

Take a step back: Evoking reasoning via abstraction in large language models,

H. S. Zheng, S. Mishra, X. Chen, H.-T. Cheng, E. H. Chi, Q. V . Le, and D. Zhou, “Take a step back: Evoking reasoning via abstraction in large language models,” arXiv preprint arXiv:2310.06117 , 2023

work page arXiv 2023

[11] [11]

Precise zero-shot dense retrieval without relevance labels,

L. Gao, X. Ma, J. Lin, and J. Callan, “Precise zero-shot dense retrieval without relevance labels,” arXiv preprint arXiv:2212.10496 , 2022

work page arXiv 2022

[12] [12]

Enhancing rag pipelines in haystack: Introducing diver- sityranker and lostinthemiddleranker,

V . Blagojevi, “Enhancing rag pipelines in haystack: Introducing diver- sityranker and lostinthemiddleranker,” https://towardsdatascience.com/ enhancing-rag-pipelines-in-haystack-45f14e2bc9f5, 2023

work page 2023

[13] [13]

Generate rather than retrieve: Large language models are strong context generators,

W. Yu, D. Iter, S. Wang, Y . Xu, M. Ju, S. Sanyal, C. Zhu, M. Zeng, and M. Jiang, “Generate rather than retrieve: Large language models are strong context generators,” arXiv preprint arXiv:2209.10063, 2022

work page arXiv 2022

[14] [14]

Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy,

Z. Shao, Y . Gong, Y . Shen, M. Huang, N. Duan, and W. Chen, “Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy,” arXiv preprint arXiv:2305.15294 , 2023

work page arXiv 2023

[15] [15]

Knowledgpt: Enhancing large language models with retrieval and storage access on knowledge bases,

X. Wang, Q. Yang, Y . Qiu, J. Liang, Q. He, Z. Gu, Y . Xiao, and W. Wang, “Knowledgpt: Enhancing large language models with retrieval and storage access on knowledge bases,” arXiv preprint arXiv:2308.11761, 2023

work page arXiv 2023

[16] [16]

Forget rag, the future is rag-fusion,

A. H. Raudaschl, “Forget rag, the future is rag-fusion,” https://towardsdatascience.com/ forget-rag-the-future-is-rag-fusion-1147298d8ad1, 2023

work page 2023

[17] [17]

Lift yourself up: Retrieval-augmented text generation with self memory,

X. Cheng, D. Luo, X. Chen, L. Liu, D. Zhao, and R. Yan, “Lift yourself up: Retrieval-augmented text generation with self memory,” arXiv preprint arXiv:2305.02437 , 2023

work page arXiv 2023

[18] [18]

Training data is more valuable than you think: A simple and effective method by retrieving from training data,

S. Wang, Y . Xu, Y . Fang, Y . Liu, S. Sun, R. Xu, C. Zhu, and M. Zeng, “Training data is more valuable than you think: A simple and effective method by retrieving from training data,” arXiv preprint arXiv:2203.08773, 2022

work page arXiv 2022

[19] [19]

From classification to generation: Insights into crosslingual retrieval augmented icl,

X. Li, E. Nie, and S. Liang, “From classification to generation: Insights into crosslingual retrieval augmented icl,” arXiv preprint arXiv:2311.06595, 2023

work page arXiv 2023

[20] [20]

Uprise: Universal prompt retrieval for improving zero-shot evaluation,

D. Cheng, S. Huang, J. Bi, Y . Zhan, J. Liu, Y . Wang, H. Sun, F. Wei, D. Deng, and Q. Zhang, “Uprise: Universal prompt retrieval for improving zero-shot evaluation,” arXiv preprint arXiv:2303.08518, 2023

work page arXiv 2023

[21] [21]

Promptagator: Few-shot dense retrieval from 8 examples,

Z. Dai, V . Y . Zhao, J. Ma, Y . Luan, J. Ni, J. Lu, A. Bakalov, K. Guu, K. B. Hall, and M.-W. Chang, “Promptagator: Few-shot dense retrieval from 8 examples,” arXiv preprint arXiv:2209.11755 , 2022

work page arXiv 2022

[22] [22]

Recitation-augmented language models,

Z. Sun, X. Wang, Y . Tay, Y . Yang, and D. Zhou, “Recitation-augmented language models,” arXiv preprint arXiv:2210.01296 , 2022

work page arXiv 2022

[23] [23]

Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive nlp,

O. Khattab, K. Santhanam, X. L. Li, D. Hall, P. Liang, C. Potts, and M. Zaharia, “Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive nlp,” arXiv preprint arXiv:2212.14024, 2022

work page arXiv 2022

[24] [24]

Active retrieval augmented generation,

Z. Jiang, F. F. Xu, L. Gao, Z. Sun, Q. Liu, J. Dwivedi-Yu, Y . Yang, J. Callan, and G. Neubig, “Active retrieval augmented generation,” arXiv preprint arXiv:2305.06983 , 2023

work page arXiv 2023

[25] [25]

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

A. Asai, Z. Wu, Y . Wang, A. Sil, and H. Hajishirzi, “Self-rag: Learning to retrieve, generate, and critique through self-reflection,” arXiv preprint arXiv:2310.11511 , 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[26] [26]

Bridging the preference gap between retrievers and llms,

Z. Ke, W. Kong, C. Li, M. Zhang, Q. Mei, and M. Bendersky, “Bridging the preference gap between retrievers and llms,” arXiv preprint arXiv:2401.06954, 2024

work page arXiv 2024

[27] [27]

Ra-dit: Retrieval- augmented dual instruction tuning,

X. V . Lin, X. Chen, M. Chen, W. Shi, M. Lomeli, R. James, P. Ro- driguez, J. Kahn, G. Szilvasy, M. Lewis et al. , “Ra-dit: Retrieval- augmented dual instruction tuning,” arXiv preprint arXiv:2310.01352 , 2023

work page arXiv 2023

[28] [28]

Fine-tuning or retrieval? comparing knowledge injection in llms,

O. Ovadia, M. Brief, M. Mishaeli, and O. Elisha, “Fine-tuning or retrieval? comparing knowledge injection in llms,” arXiv preprint arXiv:2312.05934, 2023

work page arXiv 2023

[29] [29]

Copy is all you need,

T. Lan, D. Cai, Y . Wang, H. Huang, and X.-L. Mao, “Copy is all you need,” in The Eleventh International Conference on Learning Representations, 2022

work page 2022

[30] [30]

Dense x retrieval: What retrieval granularity should we use?

T. Chen, H. Wang, S. Chen, W. Yu, K. Ma, X. Zhao, D. Yu, and H. Zhang, “Dense x retrieval: What retrieval granularity should we use?” arXiv preprint arXiv:2312.06648 , 2023

work page arXiv 2023

[31] [31]

Divide & conquer for entailment-aware multi-hop evidence retrieval,

F. Luo and M. Surdeanu, “Divide & conquer for entailment-aware multi-hop evidence retrieval,” arXiv preprint arXiv:2311.02616 , 2023

work page arXiv 2023

[32] [32]

Diversify question generation with retrieval-augmented style transfer,

Q. Gou, Z. Xia, B. Yu, H. Yu, F. Huang, Y . Li, and N. Cam-Tu, “Diversify question generation with retrieval-augmented style transfer,” arXiv preprint arXiv:2310.14503 , 2023

work page arXiv 2023

[33] [33]

Prompt-guided re- trieval augmentation for non-knowledge-intensive tasks,

Z. Guo, S. Cheng, Y . Wang, P. Li, and Y . Liu, “Prompt-guided re- trieval augmentation for non-knowledge-intensive tasks,”arXiv preprint arXiv:2305.17653, 2023

work page arXiv 2023

[34] [34]

Learning to filter context for retrieval-augmented generation,

Z. Wang, J. Araki, Z. Jiang, M. R. Parvez, and G. Neubig, “Learning to filter context for retrieval-augmented generation,” arXiv preprint arXiv:2311.08377, 2023


[35] [35]

Retrieval-augmented data augmentation for low-resource domain tasks,

M. Seo, J. Baek, J. Thorne, and S. J. Hwang, “Retrieval-augmented data augmentation for low-resource domain tasks,” arXiv preprint arXiv:2402.13482, 2024


[36] [37]

Retrieval-augmented generative question answering for event argument extraction,

X. Du and H. Ji, “Retrieval-augmented generative question answering for event argument extraction,” arXiv preprint arXiv:2211.07067, 2022

[37] [38]

Learning to retrieve in-context examples for large language models,

L. Wang, N. Yang, and F. Wei, “Learning to retrieve in-context examples for large language models,” arXiv preprint arXiv:2307.07164, 2023

[38] [39]

Recommender systems with generative retrieval,

S. Rajput, N. Mehta, A. Singh, R. H. Keshavan, T. Vu, L. Heldt, L. Hong, Y. Tay, V. Q. Tran, J. Samost et al., “Recommender systems with generative retrieval,” arXiv preprint arXiv:2305.05065, 2023

[39] [40]

Language models as semantic indexers,

B. Jin, H. Zeng, G. Wang, X. Chen, T. Wei, R. Li, Z. Wang, Z. Li, Y. Li, H. Lu et al., “Language models as semantic indexers,” arXiv preprint arXiv:2310.07815, 2023

[40] [41]

Context tuning for retrieval augmented generation,

R. Anantha, T. Bethi, D. Vodianik, and S. Chappidi, “Context tuning for retrieval augmented generation,” arXiv preprint arXiv:2312.05708, 2023

[41] [42]

Atlas: Few-shot Learning with Retrieval Augmented Language Models

G. Izacard, P. Lewis, M. Lomeli, L. Hosseini, F. Petroni, T. Schick, J. Dwivedi-Yu, A. Joulin, S. Riedel, and E. Grave, “Few-shot learning with retrieval augmented language models,” arXiv preprint arXiv:2208.03299, 2022


[42] [43]

Raven: In-context learning with retrieval augmented encoder-decoder language models,

J. Huang, W. Ping, P. Xu, M. Shoeybi, K. C.-C. Chang, and B. Catanzaro, “Raven: In-context learning with retrieval augmented encoder-decoder language models,” arXiv preprint arXiv:2308.07922, 2023

[43] [44]

Shall we pretrain autoregressive language models with retrieval? a comprehensive study,

B. Wang, W. Ping, P. Xu, L. McAfee, Z. Liu, M. Shoeybi, Y. Dong, O. Kuchaiev, B. Li, C. Xiao et al., “Shall we pretrain autoregressive language models with retrieval? a comprehensive study,” arXiv preprint arXiv:2304.06762, 2023

[44] [45]

Instructretro: Instruction tuning post retrieval-augmented pretraining,

B. Wang, W. Ping, L. McAfee, P. Xu, B. Li, M. Shoeybi, and B. Catanzaro, “Instructretro: Instruction tuning post retrieval-augmented pretraining,” arXiv preprint arXiv:2310.07713, 2023

[45] [46]

Improving the domain adaptation of retrieval augmented generation (rag) models for open domain question answering,

S. Siriwardhana, R. Weerasekera, E. Wen, T. Kaluarachchi, R. Rana, and S. Nanayakkara, “Improving the domain adaptation of retrieval augmented generation (rag) models for open domain question answering,” Transactions of the Association for Computational Linguistics, vol. 11, pp. 1–17, 2023

[46] [47]

Augmentation-adapted retriever improves generalization of language models as generic plug-in,

Z. Yu, C. Xiong, S. Yu, and Z. Liu, “Augmentation-adapted retriever improves generalization of language models as generic plug-in,” arXiv preprint arXiv:2305.17331, 2023


[47] [49]

Understanding retrieval augmentation for long-form question answering,

H.-T. Chen, F. Xu, S. A. Arora, and E. Choi, “Understanding retrieval augmentation for long-form question answering,” arXiv preprint arXiv:2310.12150, 2023

[48] [50]

Chain-of-note: Enhancing robustness in retrieval-augmented language models,

W. Yu, H. Zhang, X. Pan, K. Ma, H. Wang, and D. Yu, “Chain-of-note: Enhancing robustness in retrieval-augmented language models,” arXiv preprint arXiv:2311.09210, 2023


[49] [51]

Search-in-the-chain: Towards accurate, credible and traceable large language models for knowledge-intensive tasks,

S. Xu, L. Pang, H. Shen, X. Cheng, and T.-S. Chua, “Search-in-the-chain: Towards accurate, credible and traceable large language models for knowledge-intensive tasks,” CoRR, vol. abs/2304.14732, 2023

[50] [52]

Optimizing retrieval-augmented reader models via token elimination,

M. Berchansky, P. Izsak, A. Caciularu, I. Dagan, and M. Wasserblat, “Optimizing retrieval-augmented reader models via token elimination,” arXiv preprint arXiv:2310.13682, 2023

[51] [53]

Paperqa: Retrieval-augmented generative agent for scientific research,

J. Lála, O. O’Donoghue, A. Shtedritski, S. Cox, S. G. Rodriques, and A. D. White, “Paperqa: Retrieval-augmented generative agent for scientific research,” arXiv preprint arXiv:2312.07559, 2023

[52] [54]

The power of noise: Redefining retrieval for rag systems,

F. Cuconasu, G. Trappolini, F. Siciliano, S. Filice, C. Campagnano, Y. Maarek, N. Tonellotto, and F. Silvestri, “The power of noise: Redefining retrieval for rag systems,” arXiv preprint arXiv:2401.14887, 2024

[53] [55]

Iag: Induction-augmented generation framework for answering reasoning questions,

Z. Zhang, X. Zhang, Y. Ren, S. Shi, M. Han, Y. Wu, R. Lai, and Z. Cao, “Iag: Induction-augmented generation framework for answering reasoning questions,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 1–14

[54] [56]

Nomiracl: Knowing when you don’t know for robust multilingual retrieval-augmented generation,

N. Thakur, L. Bonifacio, X. Zhang, O. Ogundepo, E. Kamalloo, D. Alfonso-Hermelo, X. Li, Q. Liu, B. Chen, M. Rezagholizadeh et al., “Nomiracl: Knowing when you don’t know for robust multilingual retrieval-augmented generation,” arXiv preprint arXiv:2312.11361, 2023

[55] [57]

Tree of clarifications: Answering ambiguous questions with retrieval-augmented large language models,

G. Kim, S. Kim, B. Jeon, J. Park, and J. Kang, “Tree of clarifications: Answering ambiguous questions with retrieval-augmented large language models,” arXiv preprint arXiv:2310.14696, 2023

[56] [58]

Self-knowledge guided retrieval augmentation for large language models,

Y. Wang, P. Li, M. Sun, and Y. Liu, “Self-knowledge guided retrieval augmentation for large language models,” arXiv preprint arXiv:2310.05002, 2023

[57] [59]

Retrieval-generation synergy augmented large language models,

Z. Feng, X. Feng, D. Zhao, M. Yang, and B. Qin, “Retrieval-generation synergy augmented large language models,” arXiv preprint arXiv:2310.05149, 2023

[58] [61]

Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

H. Trivedi, N. Balasubramanian, T. Khot, and A. Sabharwal, “Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions,” arXiv preprint arXiv:2212.10509, 2022

[59] [62]

Investigating the factual knowledge boundary of large language models with retrieval augmentation,

R. Ren, Y. Wang, Y. Qu, W. X. Zhao, J. Liu, H. Tian, H. Wu, J.-R. Wen, and H. Wang, “Investigating the factual knowledge boundary of large language models with retrieval augmentation,” arXiv preprint arXiv:2307.11019, 2023

[60] [63]

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

P. Sarthi, S. Abdullah, A. Tuli, S. Khanna, A. Goldie, and C. D. Manning, “Raptor: Recursive abstractive processing for tree-organized retrieval,” arXiv preprint arXiv:2401.18059, 2024

[61] [64]

In-context retrieval-augmented language models

O. Ram, Y. Levine, I. Dalmedigos, D. Muhlgay, A. Shashua, K. Leyton-Brown, and Y. Shoham, “In-context retrieval-augmented language models,” arXiv preprint arXiv:2302.00083, 2023

[62] [65]

Retrieve-and-sample: Document-level event argument extraction via hybrid retrieval augmentation,

Y. Ren, Y. Cao, P. Guo, F. Fang, W. Ma, and Z. Lin, “Retrieve-and-sample: Document-level event argument extraction via hybrid retrieval augmentation,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 293–306

[63] [66]

Zemi: Learning zero-shot semi-parametric language models from multiple tasks,

Z. Wang, X. Pan, D. Yu, D. Yu, J. Chen, and H. Ji, “Zemi: Learning zero-shot semi-parametric language models from multiple tasks,” arXiv preprint arXiv:2210.00185, 2022


[64] [67]

Corrective Retrieval Augmented Generation

S.-Q. Yan, J.-C. Gu, Y. Zhu, and Z.-H. Ling, “Corrective retrieval augmented generation,” arXiv preprint arXiv:2401.15884, 2024

[65] [68]

1-pager: One pass answer generation and evidence retrieval,

P. Jain, L. B. Soares, and T. Kwiatkowski, “1-pager: One pass answer generation and evidence retrieval,” arXiv preprint arXiv:2310.16568, 2023

[66] [69]

Prca: Fitting black-box large language models for retrieval question answering via pluggable reward-driven contextual adapter,

H. Yang, Z. Li, Y. Zhang, J. Wang, N. Cheng, M. Li, and J. Xiao, “Prca: Fitting black-box large language models for retrieval question answering via pluggable reward-driven contextual adapter,” arXiv preprint arXiv:2310.18347, 2023

[67] [70]

Open-source large language models are strong zero-shot query likelihood models for document ranking,

S. Zhuang, B. Liu, B. Koopman, and G. Zuccon, “Open-source large language models are strong zero-shot query likelihood models for document ranking,” arXiv preprint arXiv:2310.13243, 2023

[68] [71]

Recomp: Improving retrieval-augmented lms with compression and selective augmentation,

F. Xu, W. Shi, and E. Choi, “Recomp: Improving retrieval-augmented lms with compression and selective augmentation,” arXiv preprint arXiv:2310.04408, 2023


[69] [72]

REPLUG: Retrieval-Augmented Black-Box Language Models

W. Shi, S. Min, M. Yasunaga, M. Seo, R. James, M. Lewis, L. Zettlemoyer, and W.-t. Yih, “Replug: Retrieval-augmented black-box language models,” arXiv preprint arXiv:2301.12652, 2023

[70] [73]

Enhancing llm intelligence with arm-rag: Auxiliary rationale memory for retrieval augmented generation,

E. Melz, “Enhancing llm intelligence with arm-rag: Auxiliary rationale memory for retrieval augmented generation,” arXiv preprint arXiv:2311.04177, 2023

[71] [74]

Unims-rag: A unified multi-source retrieval-augmented generation for personalized dialogue systems,

H. Wang, W. Huang, Y. Deng, R. Wang, Z. Wang, Y. Wang, F. Mi, J. Z. Pan, and K.-F. Wong, “Unims-rag: A unified multi-source retrieval-augmented generation for personalized dialogue systems,” arXiv preprint arXiv:2401.13256, 2024

[72] [75]

Augmented large language models with parametric knowledge guiding,

Z. Luo, C. Xu, P. Zhao, X. Geng, C. Tao, J. Ma, Q. Lin, and D. Jiang, “Augmented large language models with parametric knowledge guiding,” arXiv preprint arXiv:2305.04757, 2023

[73] [76]

Structure-aware language model pretraining improves dense retrieval on structured data,

X. Li, Z. Liu, C. Xiong, S. Yu, Y. Gu, Z. Liu, and G. Yu, “Structure-aware language model pretraining improves dense retrieval on structured data,” arXiv preprint arXiv:2305.19912, 2023

[74] [77]

Knowledge graph-augmented language models for knowledge-grounded dialogue generation,

M. Kang, J. M. Kwak, J. Baek, and S. J. Hwang, “Knowledge graph-augmented language models for knowledge-grounded dialogue generation,” arXiv preprint arXiv:2305.18846, 2023

[75] [78]

Retrieval-generation alignment for end-to-end task-oriented dialogue system,

W. Shen, Y. Gao, C. Huang, F. Wan, X. Quan, and W. Bi, “Retrieval-generation alignment for end-to-end task-oriented dialogue system,” arXiv preprint arXiv:2310.08877, 2023

[76] [79]

Dual-feedback knowledge retrieval for task-oriented dialogue systems,

T. Shi, L. Li, Z. Lin, T. Yang, X. Quan, and Q. Wang, “Dual-feedback knowledge retrieval for task-oriented dialogue systems,” arXiv preprint arXiv:2310.14528, 2023


[77] [80]

Fabula: Intelligence report generation using retrieval-augmented narrative construction,

P. Ranade and A. Joshi, “Fabula: Intelligence report generation using retrieval-augmented narrative construction,” arXiv preprint arXiv:2310.13848, 2023


[78] [81]

Think and retrieval: A hypothesis knowledge graph enhanced medical large language models,

X. Jiang, R. Zhang, Y. Xu, R. Qiu, Y. Fang, Z. Wang, J. Tang, H. Ding, X. Chu, J. Zhao et al., “Think and retrieval: A hypothesis knowledge graph enhanced medical large language models,” arXiv preprint arXiv:2312.15883, 2023

[79] [82]

Knowledge-augmented language model verification,

J. Baek, S. Jeong, M. Kang, J. C. Park, and S. J. Hwang, “Knowledge-augmented language model verification,” arXiv preprint arXiv:2310.12836, 2023


[80] [83]

Reasoning on graphs: Faithful and interpretable large language model reasoning,

L. Luo, Y.-F. Li, G. Haffari, and S. Pan, “Reasoning on graphs: Faithful and interpretable large language model reasoning,” arXiv preprint arXiv:2310.01061, 2023