A new 263k TTCW-annotated story dataset shows non-reasoning fine-tuning of Qwen3 models outperforms reasoning-supervised fine-tuning for fixed-format long-form literary review generation.
What Matters in Evaluating Book-Length Stories? A Systematic Study of Long Story Evaluation
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
When Reasoning Supervision Hurts: TTCW-Based Long-Form Literary Review Generation
A new 263k TTCW-annotated story dataset shows non-reasoning fine-tuning of Qwen3 models outperforms reasoning-supervised fine-tuning for fixed-format long-form literary review generation.