Chain-of-verification reduces hallucination in large language models

2024 · DOI 10.18653/v1/2024.findings-acl.212

13 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background: 3

citation-polarity summary

background: 2 · unclear: 1

representative citing papers

Trust but Verify: Prover-Verifier Deliberation for Selective LLM Prediction

cs.AI · 2026-05-24 · unverdicted · novelty 7.0

Prover-verifier deliberation yields a high-confidence subset of LLM answers with ~30pp higher precision than the complement on GPQA Diamond by using defender-challenger dialogues.

KG-Guard: Graph-Based Hallucination Detection for Knowledge Base Question Answering

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

KG-Guard augments knowledge graphs with a virtual question node and uses a graph encoder plus MLP to classify LLM-proposed answers as hallucinations or not, reporting superior F1 scores and downstream improvements on three benchmarks.

Argus: Evidence Assembly for Scalable Deep Research Agents

cs.CL · 2026-05-15 · unverdicted · novelty 6.0 · 2 refs

Argus coordinates a Navigator and multiple Searchers via an evidence graph for deep research, reporting average gains of 5.5 points with one Searcher and 12.7 points with eight parallel Searchers across eight benchmarks, reaching 86.2 on BrowseComp with 64 Searchers.

Weighted Rules under the Stable Model Semantics

cs.AI · 2026-05-10 · unverdicted · novelty 6.0

Weighted rules extend stable model semantics to support probabilistic reasoning, model ranking, and statistical inference in answer set programs.

Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments

cs.CL · 2026-05-05 · unverdicted · novelty 6.0

LaaB improves LLM hallucination detection by mapping self-judgment labels back into neural feature space and using mutual learning under logical consistency constraints between responses and meta-judgments.

Think Through Uncertainty: Improving Long-Form Generation Factuality via Reasoning Calibration

cs.CL · 2026-04-13 · unverdicted · novelty 6.0

CURE trains LLMs to reason about uncertainty at the claim level via a structured protocol and multi-stage calibration, improving factual accuracy by up to 39.9% on biography generation while boosting calibration metrics.

Narrix: Remixing Narrative Strategies from Examples for Story Writing

cs.HC · 2026-04-08 · unverdicted · novelty 6.0

Narrix helps novices identify and reuse narrative strategies from examples through visualization and strategy-steered generation, improving retention, confidence, and adaptation over chat interfaces in a 12-person study.

Corrective Retrieval Augmented Generation

cs.CL · 2024-01-29 · unverdicted · novelty 6.0

CRAG improves RAG robustness via a retrieval quality evaluator that triggers web augmentation and a decompose-recompose filter to focus on relevant information, yielding better results on short- and long-form generation tasks.

ConsisGuard: Aligning Safety Deliberation with Policy Enforcement in LLM Guardrails

cs.CL · 2026-05-29 · unverdicted · novelty 5.0

ConsisGuard is a consistency-aware framework that applies Policy-to-Decision Trajectory Distillation and Functional Coupling Alignment to improve policy execution consistency in reasoning-based LLM guardrails on harmfulness detection tasks.

LCC-LLM: Leveraging Code-Centric Large Language Models for Malware Attribution

cs.CR · 2026-05-07 · unverdicted · novelty 5.0

LCC-LLM creates a code-centric dataset and RAG-based LLM framework that reaches 0.634 average semantic similarity on 43 malware tasks and 10/10 pass rate in real-world case studies.

Learning from AVA: Early Lessons from a Curated and Trustworthy Generative AI for Policy and Development Research

cs.HC · 2026-04-20 · unverdicted · novelty 5.0

AVA is a specialized GenAI platform for development policy research that provides verifiable syntheses from World Bank reports and is associated with 2.4-3.9 hours of weekly time savings in a large-scale user evaluation.

Align Documents to Questions: Question-Oriented Document Rewriting for Retrieval-Augmented Generation

cs.CL · 2026-04-19 · unverdicted · novelty 5.0

QREAM rewrites documents to question-focused style using iterative ICL and distilled FT models, boosting RAG performance by up to 8% relative improvement.

Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization

cs.AI · 2026-05-02 · unverdicted · novelty 4.0

An SCM-GRPO framework grounds multi-hop reasoning in structural dependency graphs and optimizes chain length via rule-based RL, outperforming baselines on HoVer and EX-FEVER.

citing papers explorer

Showing 13 of 13 citing papers.

Trust but Verify: Prover-Verifier Deliberation for Selective LLM Prediction · cs.AI · 2026-05-24 · unverdicted · none · ref 5
KG-Guard: Graph-Based Hallucination Detection for Knowledge Base Question Answering · cs.LG · 2026-05-29 · unverdicted · none · ref 40
Argus: Evidence Assembly for Scalable Deep Research Agents · cs.CL · 2026-05-15 · unverdicted · none · ref 25 · 2 links
Weighted Rules under the Stable Model Semantics · cs.AI · 2026-05-10 · unverdicted · none · ref 36
Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments · cs.CL · 2026-05-05 · unverdicted · none · ref 92
Think Through Uncertainty: Improving Long-Form Generation Factuality via Reasoning Calibration · cs.CL · 2026-04-13 · unverdicted · none · ref 1
Narrix: Remixing Narrative Strategies from Examples for Story Writing · cs.HC · 2026-04-08 · unverdicted · none · ref 22
Corrective Retrieval Augmented Generation · cs.CL · 2024-01-29 · unverdicted · none · ref 7
ConsisGuard: Aligning Safety Deliberation with Policy Enforcement in LLM Guardrails · cs.CL · 2026-05-29 · unverdicted · none · ref 9
LCC-LLM: Leveraging Code-Centric Large Language Models for Malware Attribution · cs.CR · 2026-05-07 · unverdicted · none · ref 57
Learning from AVA: Early Lessons from a Curated and Trustworthy Generative AI for Policy and Development Research · cs.HC · 2026-04-20 · unverdicted · none · ref 20
Align Documents to Questions: Question-Oriented Document Rewriting for Retrieval-Augmented Generation · cs.CL · 2026-04-19 · unverdicted · none · ref 130
Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization · cs.AI · 2026-05-02 · unverdicted · none · ref 7
