On Faithfulness and Factuality in Abstractive Summarization

Joshua Maynez, Shashi Narayan, Bernd Bohnet, Ryan McDonald · 2020 · DOI 10.18653/v1/2020.acl-main.173

20 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

cs.CL · 2022-01-28 · accept · novelty 9.0

Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.

Knowledge Editing in Masked Diffusion Language Models

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

Locate-then-edit succeeds at the same early-to-mid MLP locations in masked diffusion models as in autoregressive models, but requires optimization over intermediate partial-mask states to handle multi-token targets.

Causal Stories from Sensor Traces: Auditing Epistemic Overreach in LLM-Generated Personal Sensing Explanations

cs.HC · 2026-05-09 · accept · novelty 7.0

LLMs routinely produce unsupported causal stories for personal sensing anomalies, and richer evidence or constrained prompts do not reliably eliminate this epistemic overreach.

MÖVE: A Holistic LLM Benchmark for the German Public Sector

cs.CL · 2026-06-11 · unverdicted · novelty 6.0

MÖVE presents a new German-language benchmark evaluating 39 LLMs on performance and governance criteria using ten public-administration datasets.

Detect, Remask, Repair: Diffusion Editing for Faithful Summarization of Evolving Contexts

cs.CL · 2026-06-11 · unverdicted · novelty 6.0

Diffusion-based localized editing framework for faithful summarization of evolving contexts, introducing the StreamSum benchmark and showing tradeoffs in faithfulness, speed, and preservation.

From 'May' to 'Is': Certainty Distortion in Language Model Rewriting

cs.CL · 2026-06-06 · unverdicted · novelty 6.0

LMs systematically inflate expressed certainty during rewriting, affecting up to 75% of outputs with a 1.5-2x bias toward increasing rather than decreasing certainty, and the effect compounds over iterations.

LLM Self-Recognition: Steering and Retrieving Activation Signatures

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

Steering LLM residual streams with random sparse vectors creates detectable self-recognition fingerprints that enable over 98% accurate attribution of generated text to specific models without degrading output quality.

Boosting Self-Consistency with Ranking

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.

From Articles to Premises: Building PrimeFacts, an Extraction Methodology and Resource for Fact-Checking Evidence

cs.CL · 2026-05-07 · unverdicted · novelty 6.0

PrimeFacts extracts decontextualized premises from fact-check articles, raising evidence retrieval MRR by up to 30% and verdict prediction Macro-F1 by 10-20 points over baselines.

SCURank: Ranking Multiple Candidate Summaries with Summary Content Units for Enhanced Summarization

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

SCURank ranks multiple summary candidates with Summary Content Units to outperform ROUGE and LLM-based methods in summarization distillation.

Stress Testing Factual Consistency Metrics for Long-Document Summarization

cs.CL · 2025-11-10 · unverdicted · novelty 6.0

Short-form factual consistency metrics produce inconsistent scores on semantically equivalent long-document summaries and lose reliability on information-dense claims.

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

cs.CL · 2023-03-15 · unverdicted · novelty 6.0

SelfCheckGPT detects hallucinations by checking consistency across multiple sampled responses from black-box LLMs on WikiBio biography generation tasks.

Enhancing Factuality through Consensus and Consistency in Summarization Using Minimum Bayes Risk Decoding

cs.CL · 2026-05-28 · unverdicted · novelty 5.0

ConSUM reranks candidate summaries using MBR consensus and source-consistency metrics to improve factuality over standard generation or reranking baselines.

ECPO: Evidence-Coupled Policy Optimization for Evidence-Certified Candidate Ranking

cs.AI · 2026-05-21 · unverdicted · novelty 5.0

ECPO is a listwise policy optimization method that couples ranking utility with span-level evidence certificate validity and a deterministic verifier reward on MAVEN-ERE and RAMS datasets.

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

cs.CL · 2023-11-09 · unverdicted · novelty 5.0

The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.

Hybrid Adversarial Defence for Natural Language Understanding Tasks

cs.CL · 2026-06-03 · unverdicted · novelty 4.0

Hybrid entropy-uncertainty-geometric defence improves clean accuracy by up to 43% and adversarial robustness by up to 65% on NLU and security benchmarks.

Making Knowledge Accessible: Divergent Readability-Accuracy Strategies of Mistral and QWen in Biomedical Text Simplification

cs.CL · 2025-11-07 · unverdicted · novelty 4.0

Mistral uses careful lexical simplification to raise readability while keeping BERTScore at 0.91 comparable to humans, whereas QWen improves readability but shows a disconnect with its 0.89 BERTScore in biomedical text simplification.

Agent AI: Surveying the Horizons of Multimodal Interaction

cs.AI · 2024-01-07 · unverdicted · novelty 4.0

The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.

A Survey of Hallucination in Large Foundation Models

cs.AI · 2023-09-12 · accept · novelty 3.0

A survey classifying hallucination phenomena specific to large foundation models, establishing evaluation criteria, examining mitigation strategies, and discussing future directions.

Optimizing Abstractive Summarization With Fine-Tuned PEGASUS

cs.CL · 2026-06-24 · unverdicted · novelty 2.0

Fine-tuned PEGASUS achieves state-of-the-art ROUGE scores on the XL-Sum English corpus, with gains of 4.04% ROUGE-1, 15.25% ROUGE-2, and 3.39% ROUGE-L over the mT5 baseline.

citing papers explorer

Showing 4 of 4 citing papers after filters.

LLM Self-Recognition: Steering and Retrieving Activation Signatures

cs.AI · 2026-06-04 · unverdicted · none · ref 60

Steering LLM residual streams with random sparse vectors creates detectable self-recognition fingerprints that enable over 98% accurate attribution of generated text to specific models without degrading output quality.

ECPO: Evidence-Coupled Policy Optimization for Evidence-Certified Candidate Ranking

cs.AI · 2026-05-21 · unverdicted · none · ref 8

ECPO is a listwise policy optimization method that couples ranking utility with span-level evidence certificate validity and a deterministic verifier reward on MAVEN-ERE and RAMS datasets.

Agent AI: Surveying the Horizons of Multimodal Interaction

cs.AI · 2024-01-07 · unverdicted · none · ref 188

The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.

A Survey of Hallucination in Large Foundation Models

cs.AI · 2023-09-12 · accept · none · ref 32

A survey classifying hallucination phenomena specific to large foundation models, establishing evaluation criteria, examining mitigation strategies, and discussing future directions.
