Corrective Retrieval Augmented Generation

Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, Zhen-Hua Ling · 2024 · cs.CL · arXiv 2401.15884

Canonical reference. 86% of citing Pith papers cite this work as background.

53 Pith papers citing it

Background 86% of classified citations

abstract

Large language models (LLMs) inevitably exhibit hallucinations since the accuracy of generated texts cannot be secured solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practicable complement to LLMs, it relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong. To this end, we propose the Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation. Specifically, a lightweight retrieval evaluator is designed to assess the overall quality of retrieved documents for a query, returning a confidence degree based on which different knowledge retrieval actions can be triggered. Since retrieval from static and limited corpora can only return sub-optimal documents, large-scale web searches are utilized as an extension for augmenting the retrieval results. Besides, a decompose-then-recompose algorithm is designed for retrieved documents to selectively focus on key information and filter out irrelevant information in them. CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches. Experiments on four datasets covering short- and long-form generation tasks show that CRAG can significantly improve the performance of RAG-based approaches.
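
The control flow the abstract describes (score the retrieved documents, pick a retrieval action from the confidence, then filter the chosen documents piece by piece) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the action names, the `upper`/`lower`/`keep_threshold` values, the sentence-level splitting, the word-overlap scorer, and the stubbed web search are all hypothetical stand-ins for the trained evaluator and real search described in the paper.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

Scorer = Callable[[str, str], float]   # (query, text) -> relevance in [0, 1]
Searcher = Callable[[str], List[str]]  # query -> documents from a web search


@dataclass
class Decision:
    action: str            # hypothetical label for the triggered action
    knowledge: List[str]   # filtered strips handed to the generator


def split_into_strips(doc: str) -> List[str]:
    # Decompose: naive sentence split (an assumption made for illustration).
    return [s.strip() for s in doc.split(".") if s.strip()]


def refine(query: str, docs: List[str], score: Scorer, keep: float) -> List[str]:
    # Recompose: keep only strips the scorer rates as relevant, in order.
    strips = [s for d in docs for s in split_into_strips(d)]
    return [s for s in strips if score(query, s) >= keep]


def corrective_retrieve(query: str, docs: List[str], score: Scorer,
                        web_search: Searcher, upper: float = 0.6,
                        lower: float = 0.2, keep_threshold: float = 0.3) -> Decision:
    # Confidence degree: here, the best document score (assumed aggregation).
    confidence = max((score(query, d) for d in docs), default=0.0)
    if confidence >= upper:
        action, sources = "use_retrieved", docs
    elif confidence < lower:
        action, sources = "web_search", web_search(query)
    else:
        action, sources = "combine", docs + web_search(query)
    return Decision(action, refine(query, sources, score, keep_threshold))


def overlap_score(query: str, text: str) -> float:
    # Toy scorer: fraction of query words that appear in the text.
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def fake_web_search(query: str) -> List[str]:
    # Stub standing in for a real web search call.
    return ["France's capital is Paris."]


decision = corrective_retrieve(
    "capital of France",
    ["Paris is the capital of France. It has many museums."],
    overlap_score,
    fake_web_search,
)
print(decision.action, decision.knowledge)
```

Because the evaluator and the search function are passed in as plain callables, the same routing step could sit in front of different generators, which is one way to read the abstract's "plug-and-play" claim.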

citation-role summary

background: 6 · method: 1

citation-polarity summary

background: 6 · use method: 1

representative citing papers

Beyond the Reranker: Do RAG Retrieval Enhancements Help Once a Strong Reranker Is Present?

cs.IR · 2026-06-14 · conditional · novelty 7.0

On heterogeneous document collections, only query expansion and a newly introduced per-source calibrated corrector (SSCC) deliver reliable gains beyond a strong cross-encoder reranker; other common retrieval enhancements do not.

X-SYNTH: Beyond Retrieval -- Enterprise Context Synthesis from Observed Digital Human Attention

cs.AI · 2026-05-15 · unverdicted · novelty 7.0 · 2 refs

X-SYNTH synthesizes enterprise context from digital human attention using Digital Twin Signatures and seven attention filters, raising true lead rate from 9.5% to 61.9% while cutting false lead rate to 18.8%.

EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

EvolveMem enables autonomous self-evolution of LLM memory retrieval configurations via LLM diagnosis and safeguards, delivering 25.7% gains over strong baselines on LoCoMo and 18.9% on MemBench with positive cross-benchmark transfer.

Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection

cs.CL · 2026-05-11 · unverdicted · novelty 7.0 · 2 refs

Pre-Route elicits LLMs' latent routing skills via structured prompts on metadata to proactively choose RAG or long-context, outperforming reactive baselines on cost-effectiveness.

The Context Gathering Decision Process: A POMDP Framework for Agentic Search

cs.AI · 2026-05-07 · accept · novelty 7.0

Framing LLM agent loops as a Context Gathering Decision Process POMDP yields a predicate-based belief state that boosts multi-hop reasoning up to 11.4% and an exhaustion gate that cuts token use up to 39% with no performance loss.

SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.

MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents

cs.MA · 2026-05-05 · unverdicted · novelty 7.0

MemFlow routes queries by intent to tiered memory operations, nearly doubling accuracy of a 1.7B SLM on long-horizon benchmarks compared to full-context baselines.

AdaGATE: Adaptive Gap-Aware Token-Efficient Evidence Assembly for Multi-Hop Retrieval-Augmented Generation

cs.CL · 2026-05-04 · unverdicted · novelty 7.0

AdaGATE improves evidence F1 scores on HotpotQA for multi-hop RAG under clean, redundant, and noisy conditions by framing selection as gap-aware token-constrained repair, outperforming baselines while using 2.6x fewer tokens.

HaS: Accelerating RAG through Homology-Aware Speculative Retrieval

cs.IR · 2026-04-22 · unverdicted · novelty 7.0

HaS accelerates RAG retrieval via homology-aware speculative retrieval and homologous query re-identification validation, cutting latency 24-37% with 1-2% accuracy drop on tested datasets.

ArbGraph: Conflict-Aware Evidence Arbitration for Reliable Long-Form Retrieval-Augmented Generation

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

ArbGraph resolves conflicts in RAG evidence by constructing a conflict-aware graph of atomic claims and applying intensity-driven iterative arbitration to suppress unreliable claims prior to generation.

IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning

cs.AI · 2026-04-16 · unverdicted · novelty 7.0

IG-Search computes step-level information gain rewards from policy probabilities to improve credit assignment in RL training for search-augmented QA, yielding 1.6-point gains over trajectory-level baselines on multi-hop tasks.

Credo: Declarative Control of LLM Pipelines via Beliefs and Policies

cs.AI · 2026-04-15 · unverdicted · novelty 7.0

Credo proposes representing LLM agent state as beliefs and regulating pipeline behavior with declarative policies stored in a database for adaptive, auditable control.

REGREACT: Self-Correcting Multi-Agent Pipelines for Structured Regulatory Information Extraction

cs.MA · 2026-04-13 · unverdicted · novelty 7.0

RegReAct deploys self-correcting multi-agent pipelines across seven stages to extract hierarchical compliance criteria from regulatory texts, outperforming single-pass GPT-4o on EU Taxonomy documents.

Retrieval Augmented Conversational Recommendation with Reinforcement Learning

cs.IR · 2026-04-06 · unverdicted · novelty 7.0

RAR retrieves candidate items from a 300k-movie corpus then uses LLM generation with RL feedback to produce context-aware recommendations that outperform baselines on benchmarks.

Only Ask What You Don't Know: Grounded Delta Planning for Efficient Multi-step RAG

cs.CL · 2026-06-21 · unverdicted · novelty 6.0

GDP-RAG targets only information deltas in multi-hop RAG through preliminary grounding, gap-conditioned prompts, and skeletal trajectories, reaching 60.63% accuracy at 0.51 cost-of-pass on HotpotQA, 2WikiMultiHopQA, and MuSiQue.

Navigating Unreliable Parametric and Contextual Knowledge: Explicit Knowledge Conflict Resolution for LLM Inference

cs.AI · 2026-06-18 · unverdicted · novelty 6.0

MACR adaptively assesses LLM confidence via semantic entropy then applies inductive multi-agent reasoning with rule-induction, conflict-analysis, and resolution agents to handle unreliable parametric and contextual knowledge.

REVEAL: Reference-Grounded Reasoning for Multimodal Manipulation Detection

cs.CV · 2026-05-27 · unverdicted · novelty 6.0

REVEAL reformulates multimodal manipulation detection as reference-grounded verification using a 170K-pair authentic library, difference-aware fusion, and task-decoupled MoE for joint detection and localization with training-free domain adaptation.

MemCog: From Memory-as-Tool to Memory-as-Cognition in Conversational Agents

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

MemCog introduces a Memory-as-Cognition paradigm with Navigable Memory Store, Cross-Dimensional Navigation Interface, and Proactive Reasoning Protocol, claiming SOTA results on LoCoMo, LongMemEval, and a new ProactiveMemBench.

BELIEF: Structured Evidence Modeling and Uncertainty-Aware Fusion for Biomedical Question Answering

cs.CL · 2026-05-17 · unverdicted · novelty 6.0

BELIEF improves closed-set biomedical QA by converting documents to structured evidence objects and fusing D-S symbolic belief estimation with LLM inference through reliability-aware arbitration.

Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict

cs.CL · 2026-05-14 · unverdicted · novelty 6.0 · 2 refs

Introduces CDD to diagnose context compliance in RAG under knowledge conflicts, reporting measurable compliance, cross-model accuracy transfer without causal coupling transfer, and robustness gains on Epi-Scale and TruthfulQA benchmarks.

PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning

cs.AI · 2026-05-10 · unverdicted · novelty 6.0 · 2 refs

PiCA uses pivot-based potential rewards derived from historical sub-queries to supply trajectory-aware step guidance in agentic RL, delivering 15% gains on QA benchmarks for 3B/7B models.

Agentic Retrieval-Augmented Generation for Financial Document Question Answering

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9.32 points.

CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation

cs.CL · 2026-05-06 · unverdicted · novelty 6.0

CAR reranks documents in RAG by promoting those that increase generator confidence (via answer consistency sampling) and demoting those that decrease it, yielding NDCG@5 gains on BEIR datasets that correlate with F1 improvements.

EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

EviMem improves accuracy on temporal and multi-hop questions in long-term conversational memory by iteratively diagnosing and filling evidence gaps, achieving 81.6% and 85.2% judge accuracy on LoCoMo at 4.5x lower latency than MIRIX.

citing papers explorer

Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection
cs.CL · 2026-05-11 · unverdicted · none · ref 3 · 2 links · internal anchor
Pre-Route elicits LLMs' latent routing skills via structured prompts on metadata to proactively choose RAG or long-context, outperforming reactive baselines on cost-effectiveness.

SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States
cs.CL · 2026-05-06 · unverdicted · none · ref 79 · internal anchor
SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.

AdaGATE: Adaptive Gap-Aware Token-Efficient Evidence Assembly for Multi-Hop Retrieval-Augmented Generation
cs.CL · 2026-05-04 · unverdicted · none · ref 28 · internal anchor
AdaGATE improves evidence F1 scores on HotpotQA for multi-hop RAG under clean, redundant, and noisy conditions by framing selection as gap-aware token-constrained repair, outperforming baselines while using 2.6x fewer tokens.

ArbGraph: Conflict-Aware Evidence Arbitration for Reliable Long-Form Retrieval-Augmented Generation
cs.CL · 2026-04-20 · unverdicted · none · ref 42 · internal anchor
ArbGraph resolves conflicts in RAG evidence by constructing a conflict-aware graph of atomic claims and applying intensity-driven iterative arbitration to suppress unreliable claims prior to generation.

Only Ask What You Don't Know: Grounded Delta Planning for Efficient Multi-step RAG
cs.CL · 2026-06-21 · unverdicted · none · ref 90 · internal anchor
GDP-RAG targets only information deltas in multi-hop RAG through preliminary grounding, gap-conditioned prompts, and skeletal trajectories, reaching 60.63% accuracy at 0.51 cost-of-pass on HotpotQA, 2WikiMultiHopQA, and MuSiQue.

BELIEF: Structured Evidence Modeling and Uncertainty-Aware Fusion for Biomedical Question Answering
cs.CL · 2026-05-17 · unverdicted · none · ref 25 · internal anchor
BELIEF improves closed-set biomedical QA by converting documents to structured evidence objects and fusing D-S symbolic belief estimation with LLM inference through reliability-aware arbitration.

Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict
cs.CL · 2026-05-14 · unverdicted · none · ref 17 · 2 links · internal anchor
Introduces CDD to diagnose context compliance in RAG under knowledge conflicts, reporting measurable compliance, cross-model accuracy transfer without causal coupling transfer, and robustness gains on Epi-Scale and TruthfulQA benchmarks.

CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation
cs.CL · 2026-05-06 · unverdicted · none · ref 18 · internal anchor
CAR reranks documents in RAG by promoting those that increase generator confidence (via answer consistency sampling) and demoting those that decrease it, yielding NDCG@5 gains on BEIR datasets that correlate with F1 improvements.

Faithfulness-QA: A Counterfactual Entity Substitution Dataset for Training Context-Faithful RAG Models
cs.CL · 2026-04-28 · accept · none · ref 14 · internal anchor
Faithfulness-QA is a 99k-sample dataset created via counterfactual entity substitution on existing QA benchmarks to train and evaluate context-faithful RAG models.

KbSD: Knowledge Boundary aware Self-Distillation for Behavioral Calibration in Agentic Search
cs.CL · 2026-06-29 · unverdicted · none · ref 21 · internal anchor
KbSD uses a same-size hint-augmented teacher and quadrant-adaptive KL objectives to deliver dense supervision for calibrated behavior across knowledge states in agentic search.

CRITIC-R1: Learning Structured Critics for Retrieval-Augmented Generation
cs.CL · 2026-05-28 · unverdicted · none · ref 5 · internal anchor
CRITIC-R1 learns structured RAG critics via GRPO RL with Conservative Judgement Alignment and Diagnostic Quality Alignment rewards, reporting gains on five QA benchmarks.

Vector RAG vs LLM-Compiled Wiki: A Preregistered Comparison on a Small Multi-Domain Research
cs.CL · 2026-05-18 · conditional · none · ref 19 · internal anchor
A preregistered comparison on 24 papers found that an LLM-compiled wiki outperformed vector RAG on cross-document synthesis and citation accuracy but used more query tokens, with no system best across all metrics.

ConflictRAG: Detecting and Resolving Knowledge Conflicts in Retrieval Augmented Generation
cs.CL · 2026-05-17 · unverdicted · none · ref 7 · 2 links · internal anchor
ConflictRAG introduces a conflict-aware RAG pipeline with two-stage detection (MLP + selective LLM), Entropy-TOPSIS credibility assessment, and a new CARS metric, reporting 88.7% F1 and 5.3-6.1% gains on benchmarks.

STEM: Structure-Tracing Evidence Mining for Knowledge Graphs-Driven Retrieval-Augmented Generation
cs.CL · 2026-04-24 · unverdicted · none · ref 2 · 2 links · internal anchor
STEM reframes multi-hop KGQA as schema-guided graph search with semantic-to-structural projection and Triple-GNN guidance, claiming SOTA accuracy and evidence completeness on multi-hop benchmarks.

SEMA-RAG: A Self-Evolving Multi-Agent Retrieval-Augmented Generation Framework for Medical Reasoning
cs.CL · 2026-05-16 · unverdicted · none · ref 4 · 2 links · internal anchor
SEMA-RAG is a three-agent self-evolving RAG system that reports an average 6.46-point accuracy gain over the strongest baseline across five medical QA benchmarks and five LLM backbones.

Skill-RAG: Failure-State-Aware Retrieval Augmentation via Hidden-State Probing and Skill Routing
cs.CL · 2026-04-17 · unreviewed · ref 26 · internal anchor
