OARelatedWork: A Large-Scale Dataset of Related Work Sections with Full-texts from Open Access Sources

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it
abstract

This paper introduces OARelatedWork: a dataset for related work generation from open-access sources. It is the first large-scale multi-document summarization dataset for related work generation, containing whole related work sections and full texts of cited papers. Its validation and test splits are constructed so that every cited paper is available in full text, enabling controlled evaluation of full-text related work generation. The dataset includes 94 450 papers and 5 824 689 unique referenced papers from multiple domains. With OARelatedWork, we aim to shift the field from generating parts of related work sections from abstracts only to generating entire related work sections from all available content. We (i) benchmark a wide spectrum of models, highlighting that synthesizing massive full-text contexts remains a challenge even for modern Large Language Models (LLMs): under our statement-level judge, GPT-4o-mini's evidence-grounded True rate drops from 92.9% with abstracts to 83.8% with full texts. We (ii) empirically analyze human writing behavior through a human evaluation over 40 papers and 408 factual statements, revealing that authors frequently introduce abstractive claims ungrounded in localized source texts; consequently, advanced LLMs actually surpass human baselines in strict, evidence-grounded factuality. Finally, we (iii) conduct a fine-grained meta-evaluation, revealing that standard reference-based metrics are inadequate for evaluating such long-form structured outputs, and introduce a robust statement-level evaluation framework to address this gap.

fields

cs.DL 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

RWGBench: Evaluating Scholarly Positioning in Related Work Generation

cs.DL · 2026-05-30 · unverdicted · novelty 7.0

RWGBench is a citation-centric benchmark for related work generation built from 40k CS papers and a 100-paper test set, with multi-dimensional metrics that better match human expert judgment than standard similarity scores.

citing papers explorer

Showing 1 of 1 citing paper.

  • RWGBench: Evaluating Scholarly Positioning in Related Work Generation cs.DL · 2026-05-30 · unverdicted · none · ref 7 · internal anchor
