Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

On Faithfulness and Factuality in Abstractive Summarization

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

LinAlg-Bench: A Forensic Benchmark Revealing Structural Failure Modes in LLM Mathematical Reasoning

cs.AI · 2026-05-15 · unverdicted · novelty 7.0

LinAlg-Bench shows LLMs switch from execution errors to computational abandonment and structured fabrication at 4x4 matrix scale, indicating a working memory limit rather than knowledge gaps.

Design and Report Benchmarks for Knowledge Work

cs.AI · 2026-05-22 · unverdicted · novelty 6.0

Proposes a three-step benchmark design method (define work activity, specify tested setting, score work product) derived from work studies and O*NET, demonstrated via three case analyses.

Learning to Control Summaries with Score Ranking

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

A score-ranking loss enables controllable summarization by aligning outputs to evaluation scores, matching SOTA performance with dimension-specific control on LLaMA, Qwen, and Mistral.

citing papers explorer

Showing 3 of 3 citing papers.

LinAlg-Bench: A Forensic Benchmark Revealing Structural Failure Modes in LLM Mathematical Reasoning

cs.AI · 2026-05-15 · unverdicted · none · ref 16

Design and Report Benchmarks for Knowledge Work

cs.AI · 2026-05-22 · unverdicted · none · ref 101

Learning to Control Summaries with Score Ranking

cs.CL · 2026-04-19 · unverdicted · none · ref 42
