hub Canonical reference

Corrective Retrieval Augmented Generation

Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, Zhen-Hua Ling · 2024 · cs.CL · arXiv 2401.15884

Canonical reference. 86% of citing Pith papers cite this work as background.

44 Pith papers citing it

Background 86% of classified citations

open full Pith review browse 44 citing papers arXiv PDF

abstract

Large language models (LLMs) inevitably exhibit hallucinations since the accuracy of generated texts cannot be secured solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practicable complement to LLMs, it relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong. To this end, we propose the Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation. Specifically, a lightweight retrieval evaluator is designed to assess the overall quality of retrieved documents for a query, returning a confidence degree based on which different knowledge retrieval actions can be triggered. Since retrieval from static and limited corpora can only return sub-optimal documents, large-scale web searches are utilized as an extension for augmenting the retrieval results. Besides, a decompose-then-recompose algorithm is designed for retrieved documents to selectively focus on key information and filter out irrelevant information in them. CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches. Experiments on four datasets covering short- and long-form generation tasks show that CRAG can significantly improve the performance of RAG-based approaches.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 6 method 1

citation-polarity summary

background 6 use method 1

representative citing papers

X-SYNTH: Beyond Retrieval -- Enterprise Context Synthesis from Observed Digital Human Attention

cs.AI · 2026-05-15 · unverdicted · novelty 7.0 · 2 refs

X-SYNTH synthesizes enterprise context from digital human attention using Digital Twin Signatures and seven attention filters, raising true lead rate from 9.5% to 61.9% while cutting false lead rate to 18.8%.

EvolveMem:Self-Evolving Memory Architecture via AutoResearch for LLM Agents

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

EvolveMem enables autonomous self-evolution of LLM memory retrieval configurations via LLM diagnosis and safeguards, delivering 25.7% gains over strong baselines on LoCoMo and 18.9% on MemBench with positive cross-benchmark transfer.

Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection

cs.CL · 2026-05-11 · unverdicted · novelty 7.0 · 2 refs

Pre-Route elicits LLMs' latent routing skills via structured prompts on metadata to proactively choose RAG or long-context, outperforming reactive baselines on cost-effectiveness.

The Context Gathering Decision Process: A POMDP Framework for Agentic Search

cs.AI · 2026-05-07 · accept · novelty 7.0

Framing LLM agent loops as a Context Gathering Decision Process POMDP yields a predicate-based belief state that boosts multi-hop reasoning up to 11.4% and an exhaustion gate that cuts token use up to 39% with no performance loss.

SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.

MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents

cs.MA · 2026-05-05 · unverdicted · novelty 7.0

MemFlow routes queries by intent to tiered memory operations, nearly doubling accuracy of a 1.7B SLM on long-horizon benchmarks compared to full-context baselines.

AdaGATE: Adaptive Gap-Aware Token-Efficient Evidence Assembly for Multi-Hop Retrieval-Augmented Generation

cs.CL · 2026-05-04 · unverdicted · novelty 7.0

AdaGATE improves evidence F1 scores on HotpotQA for multi-hop RAG under clean, redundant, and noisy conditions by framing selection as gap-aware token-constrained repair, outperforming baselines while using 2.6x fewer tokens.

HaS: Accelerating RAG through Homology-Aware Speculative Retrieval

cs.IR · 2026-04-22 · unverdicted · novelty 7.0

HaS accelerates RAG retrieval via homology-aware speculative retrieval and homologous query re-identification validation, cutting latency 24-37% with 1-2% accuracy drop on tested datasets.

ArbGraph: Conflict-Aware Evidence Arbitration for Reliable Long-Form Retrieval-Augmented Generation

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

ArbGraph resolves conflicts in RAG evidence by constructing a conflict-aware graph of atomic claims and applying intensity-driven iterative arbitration to suppress unreliable claims prior to generation.

Skill-RAG: Failure-State-Aware Retrieval Augmentation via Hidden-State Probing and Skill Routing

cs.CL · 2026-04-17 · unverdicted · novelty 7.0

Skill-RAG detects retrieval failure states from hidden representations and routes to one of four corrective skills to raise accuracy on persistent hard cases in open-domain QA and reasoning benchmarks.

IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning

cs.AI · 2026-04-16 · unverdicted · novelty 7.0

IG-Search computes step-level information gain rewards from policy probabilities to improve credit assignment in RL training for search-augmented QA, yielding 1.6-point gains over trajectory-level baselines on multi-hop tasks.

Credo: Declarative Control of LLM Pipelines via Beliefs and Policies

cs.AI · 2026-04-15 · unverdicted · novelty 7.0

Credo proposes representing LLM agent state as beliefs and regulating pipeline behavior with declarative policies stored in a database for adaptive, auditable control.

REGREACT: Self-Correcting Multi-Agent Pipelines for Structured Regulatory Information Extraction

cs.MA · 2026-04-13 · unverdicted · novelty 7.0

RegReAct deploys self-correcting multi-agent pipelines across seven stages to extract hierarchical compliance criteria from regulatory texts, outperforming single-pass GPT-4o on EU Taxonomy documents.

Retrieval Augmented Conversational Recommendation with Reinforcement Learning

cs.IR · 2026-04-06 · unverdicted · novelty 7.0

RAR retrieves candidate items from a 300k-movie corpus then uses LLM generation with RL feedback to produce context-aware recommendations that outperform baselines on benchmarks.

BELIEF: Structured Evidence Modeling and Uncertainty-Aware Fusion for Biomedical Question Answering

cs.CL · 2026-05-17 · unverdicted · novelty 6.0

BELIEF improves closed-set biomedical QA by converting documents to structured evidence objects and fusing D-S symbolic belief estimation with LLM inference through reliability-aware arbitration.

PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning

cs.AI · 2026-05-10 · unverdicted · novelty 6.0 · 2 refs

PiCA uses pivot-based potential rewards derived from historical sub-queries to supply trajectory-aware step guidance in agentic RL, delivering 15% gains on QA benchmarks for 3B/7B models.

Agentic Retrieval-Augmented Generation for Financial Document Question Answering

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9.32 points.

CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation

cs.CL · 2026-05-06 · unverdicted · novelty 6.0

CAR reranks documents in RAG by promoting those that increase generator confidence (via answer consistency sampling) and demoting those that decrease it, yielding NDCG@5 gains on BEIR datasets that correlate with F1 improvements.

EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

EviMem improves accuracy on temporal and multi-hop questions in long-term conversational memory by iteratively diagnosing and filling evidence gaps, achieving 81.6% and 85.2% judge accuracy on LoCoMo at 4.5x lower latency than MIRIX.

Faithfulness-QA: A Counterfactual Entity Substitution Dataset for Training Context-Faithful RAG Models

cs.CL · 2026-04-28 · accept · novelty 6.0

Faithfulness-QA is a 99k-sample dataset created via counterfactual entity substitution on existing QA benchmarks to train and evaluate context-faithful RAG models.

Agentic Multi-Source Grounding for Enhanced Query Intent Understanding: A DoorDash Case Study

cs.AI · 2026-03-02 · unverdicted · novelty 6.0

An agentic multi-source grounding system for marketplace query intent achieves 90.7% accuracy on long-tail queries at DoorDash by combining catalog grounding, web search, and deterministic disambiguation, outperforming baselines by up to 13pp.

Towards Effective Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval

cs.CV · 2025-12-09 · unverdicted · novelty 6.0

OneClip-RAG enables MLLMs to handle long videos via one-shot clip retrieval and unified chunking-retrieval, delivering performance gains like matching GPT-5 level on MLVU with high efficiency on standard GPUs.

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

cs.AI · 2025-03-25 · unverdicted · novelty 6.0

ReSearch trains LLMs via RL to integrate search operations into reasoning steps, achieving strong generalization across benchmarks and eliciting reflection and self-correction without supervised reasoning data.

Retrieval-Augmented Generation for Natural Language Processing: A Survey

cs.CL · 2024-07-18 · accept · novelty 6.0

The survey organizes RAG methods via a taxonomy of query-based, logits-based, latent, and parametric fusion with comparisons on accessibility, efficiency, applications, and challenges.

citing papers explorer

Showing 15 of 15 citing papers after filters.

X-SYNTH: Beyond Retrieval -- Enterprise Context Synthesis from Observed Digital Human Attention cs.AI · 2026-05-15 · unverdicted · none · ref 74 · 2 links · internal anchor
X-SYNTH synthesizes enterprise context from digital human attention using Digital Twin Signatures and seven attention filters, raising true lead rate from 9.5% to 61.9% while cutting false lead rate to 18.8%.
The Context Gathering Decision Process: A POMDP Framework for Agentic Search cs.AI · 2026-05-07 · accept · none · ref 40 · internal anchor
Framing LLM agent loops as a Context Gathering Decision Process POMDP yields a predicate-based belief state that boosts multi-hop reasoning up to 11.4% and an exhaustion gate that cuts token use up to 39% with no performance loss.
IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning cs.AI · 2026-04-16 · unverdicted · none · ref 38 · internal anchor
IG-Search computes step-level information gain rewards from policy probabilities to improve credit assignment in RL training for search-augmented QA, yielding 1.6-point gains over trajectory-level baselines on multi-hop tasks.
Credo: Declarative Control of LLM Pipelines via Beliefs and Policies cs.AI · 2026-04-15 · unverdicted · none · ref 11 · internal anchor
Credo proposes representing LLM agent state as beliefs and regulating pipeline behavior with declarative policies stored in a database for adaptive, auditable control.
PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning cs.AI · 2026-05-10 · unverdicted · none · ref 45 · 2 links · internal anchor
PiCA uses pivot-based potential rewards derived from historical sub-queries to supply trajectory-aware step guidance in agentic RL, delivering 15% gains on QA benchmarks for 3B/7B models.
Agentic Retrieval-Augmented Generation for Financial Document Question Answering cs.AI · 2026-05-06 · unverdicted · none · ref 41 · internal anchor
FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9.32 points.
Agentic Multi-Source Grounding for Enhanced Query Intent Understanding: A DoorDash Case Study cs.AI · 2026-03-02 · unverdicted · none · ref 25 · internal anchor
An agentic multi-source grounding system for marketplace query intent achieves 90.7% accuracy on long-tail queries at DoorDash by combining catalog grounding, web search, and deterministic disambiguation, outperforming baselines by up to 13pp.
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning cs.AI · 2025-03-25 · unverdicted · none · ref 38 · internal anchor
ReSearch trains LLMs via RL to integrate search operations into reasoning steps, achieving strong generalization across benchmarks and eliciting reflection and self-correction without supervised reasoning data.
AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases cs.AI · 2026-05-07 · unverdicted · none · ref 31 · internal anchor
AgenticRAG equips an LLM with iterative retrieval and navigation tools, delivering 49.6% recall@1 on BRIGHT, 0.96 factuality on WixQA, and 92% correctness on FinanceBench.
Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization cs.AI · 2026-05-02 · unverdicted · none · ref 20 · 2 links · internal anchor
SCM-GRPO grounds multi-hop fact verification in structural causal models and applies GRPO reinforcement learning to optimize reasoning chain length, outperforming baselines on HoVer and EX-FEVER.
FinGround: Detecting and Grounding Financial Hallucinations via Atomic Claim Verification cs.AI · 2026-04-26 · unverdicted · none · ref 7 · internal anchor
FinGround reduces financial hallucinations by 68% over baselines in retrieval-equalized tests through atomic claim verification and grounding, with an 8B model retaining 91.4% F1 at low cost.
A Control Architecture for Training-Free Memory Use cs.AI · 2026-04-20 · unverdicted · none · ref 26 · internal anchor
A training-free control architecture with uncertainty-based routing, confidence-selective acceptance, and evidence-based memory governance improves arithmetic reasoning by +7 points on SVAMP and ASDiv benchmarks.
Token Economics for LLM Agents: A Dual-View Study from Computing and Economics cs.AI · 2026-05-09 · unverdicted · none · ref 77 · internal anchor
The paper delivers a unified survey of token economics for LLM agents, conceptualizing tokens as production factors, exchange mediums, and units of account across micro, meso, macro, and security dimensions using established economic theories.
An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration cs.AI · 2026-05-05 · unverdicted · none · ref 8 · 2 links · internal anchor
Experience-RAG Skill is a reusable agent skill that selects retrieval strategies via experience memory, achieving 0.8924 nDCG@10 on BeIR/nq, hotpotqa, and scifact while outperforming fixed retriever baselines.
Adaptive ToR: Complexity-Aware Tree-Based Retrieval for Pareto-Optimal Multi-Intent NLU cs.AI · 2026-04-27 · unverdicted · none · ref 5 · internal anchor
Adaptive ToR uses a query complexity classifier to route multi-intent queries to either fast single-step or deeper hierarchical retrieval, improving accuracy by 9.7% and cutting latency by 37.6% on NLU benchmarks.

Corrective Retrieval Augmented Generation

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer