hub

Chain-of-Verification Reduces Hallucination in Large Language Models

Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston · 2024 · DOI 10.18653/v1/2024.findings-acl.212

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

open at publisher browse 10 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 3

citation-polarity summary

background 2 unclear 1

representative citing papers

Argus: Evidence Assembly for Scalable Deep Research Agents

cs.CL · 2026-05-15 · unverdicted · novelty 6.0 · 2 refs

Argus coordinates a Navigator and multiple Searchers via an evidence graph for deep research, reporting average gains of 5.5 points with one Searcher and 12.7 points with eight parallel Searchers across eight benchmarks, reaching 86.2 on BrowseComp with 64 Searchers.

Weighted Rules under the Stable Model Semantics

cs.AI · 2026-05-10 · unverdicted · novelty 6.0

Weighted rules extend stable model semantics to support probabilistic reasoning, model ranking, and statistical inference in answer set programs.

Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments

cs.CL · 2026-05-05 · unverdicted · novelty 6.0

LaaB improves LLM hallucination detection by mapping self-judgment labels back into neural feature space and using mutual learning under logical consistency constraints between responses and meta-judgments.

Think Through Uncertainty: Improving Long-Form Generation Factuality via Reasoning Calibration

cs.CL · 2026-04-13 · unverdicted · novelty 6.0

CURE trains LLMs to reason about uncertainty at the claim level via a structured protocol and multi-stage calibration, improving factual accuracy by up to 39.9% on biography generation while boosting calibration metrics.

Narrix: Remixing Narrative Strategies from Examples for Story Writing

cs.HC · 2026-04-08 · unverdicted · novelty 6.0

Narrix helps novices identify and reuse narrative strategies from examples through visualization and strategy-steered generation, improving retention, confidence, and adaptation over chat interfaces in a 12-person study.

Corrective Retrieval Augmented Generation

cs.CL · 2024-01-29 · unverdicted · novelty 6.0

CRAG improves RAG robustness via a retrieval quality evaluator that triggers web augmentation and a decompose-recompose filter to focus on relevant information, yielding better results on short- and long-form generation tasks.

LCC-LLM: Leveraging Code-Centric Large Language Models for Malware Attribution

cs.CR · 2026-05-07 · unverdicted · novelty 5.0

LCC-LLM creates a code-centric dataset and RAG-based LLM framework that reaches 0.634 average semantic similarity on 43 malware tasks and 10/10 pass rate in real-world case studies.

Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization

cs.AI · 2026-05-02 · unverdicted · novelty 5.0

SCM-GRPO grounds multi-hop fact verification in structural causal models and applies GRPO reinforcement learning to optimize reasoning chain length, outperforming baselines on HoVer and EX-FEVER.

Learning from AVA: Early Lessons from a Curated and Trustworthy Generative AI for Policy and Development Research

cs.HC · 2026-04-20 · unverdicted · novelty 5.0

AVA is a specialized GenAI platform for development policy research that provides verifiable syntheses from World Bank reports and is associated with 2.4-3.9 hours of weekly time savings in a large-scale user evaluation.

Align Documents to Questions: Question-Oriented Document Rewriting for Retrieval-Augmented Generation

cs.CL · 2026-04-19 · unverdicted · novelty 5.0

QREAM rewrites documents to question-focused style using iterative ICL and distilled FT models, boosting RAG performance by up to 8% relative improvement.

citing papers explorer

Showing 10 of 10 citing papers.

Argus: Evidence Assembly for Scalable Deep Research Agents cs.CL · 2026-05-15 · unverdicted · none · ref 25 · 2 links
Argus coordinates a Navigator and multiple Searchers via an evidence graph for deep research, reporting average gains of 5.5 points with one Searcher and 12.7 points with eight parallel Searchers across eight benchmarks, reaching 86.2 on BrowseComp with 64 Searchers.
Weighted Rules under the Stable Model Semantics cs.AI · 2026-05-10 · unverdicted · none · ref 36
Weighted rules extend stable model semantics to support probabilistic reasoning, model ranking, and statistical inference in answer set programs.
Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments cs.CL · 2026-05-05 · unverdicted · none · ref 92
LaaB improves LLM hallucination detection by mapping self-judgment labels back into neural feature space and using mutual learning under logical consistency constraints between responses and meta-judgments.
Think Through Uncertainty: Improving Long-Form Generation Factuality via Reasoning Calibration cs.CL · 2026-04-13 · unverdicted · none · ref 1
CURE trains LLMs to reason about uncertainty at the claim level via a structured protocol and multi-stage calibration, improving factual accuracy by up to 39.9% on biography generation while boosting calibration metrics.
Narrix: Remixing Narrative Strategies from Examples for Story Writing cs.HC · 2026-04-08 · unverdicted · none · ref 22
Narrix helps novices identify and reuse narrative strategies from examples through visualization and strategy-steered generation, improving retention, confidence, and adaptation over chat interfaces in a 12-person study.
Corrective Retrieval Augmented Generation cs.CL · 2024-01-29 · unverdicted · none · ref 7
CRAG improves RAG robustness via a retrieval quality evaluator that triggers web augmentation and a decompose-recompose filter to focus on relevant information, yielding better results on short- and long-form generation tasks.
LCC-LLM: Leveraging Code-Centric Large Language Models for Malware Attribution cs.CR · 2026-05-07 · unverdicted · none · ref 57
LCC-LLM creates a code-centric dataset and RAG-based LLM framework that reaches 0.634 average semantic similarity on 43 malware tasks and 10/10 pass rate in real-world case studies.
Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization cs.AI · 2026-05-02 · unverdicted · none · ref 7
SCM-GRPO grounds multi-hop fact verification in structural causal models and applies GRPO reinforcement learning to optimize reasoning chain length, outperforming baselines on HoVer and EX-FEVER.
Learning from AVA: Early Lessons from a Curated and Trustworthy Generative AI for Policy and Development Research cs.HC · 2026-04-20 · unverdicted · none · ref 20
AVA is a specialized GenAI platform for development policy research that provides verifiable syntheses from World Bank reports and is associated with 2.4-3.9 hours of weekly time savings in a large-scale user evaluation.
Align Documents to Questions: Question-Oriented Document Rewriting for Retrieval-Augmented Generation cs.CL · 2026-04-19 · unverdicted · none · ref 130
QREAM rewrites documents to question-focused style using iterative ICL and distilled FT models, boosting RAG performance by up to 8% relative improvement.

Chain-of-Verification Reduces Hallucination in Large Language Models

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer