Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
citing papers explorer
- StoryAlign: Evaluating and Training Reward Models for Story Generation
  StoryReward, trained on a new 100k story preference dataset, achieves state-of-the-art performance on the newly introduced StoryRMB benchmark for aligning LLM stories with human preferences.
- From Coarse to Fine: Benchmarking and Reward Modeling for Writing-Centric Generation Tasks
  WEval and WRL introduce fine-grained benchmarking and requirement-selective sample construction for training writing reward models, yielding substantial gains on writing benchmarks with strong generalization.
- Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation
  Small LMs reach 77.1% accuracy at comparative forecasting of research idea success on benchmarks after supervised fine-tuning, with RLVR yielding interpretable reasoning at 71.35%.
- Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation
  CoNL lets LLMs self-improve on non-verifiable tasks by rewarding critiques that produce better solutions in multi-agent conversations, jointly optimizing generation and judging without external feedback.