Title resolution pending

Writing-zero: Bridge the gap between nonverifiable tasks, verifiable rewards · 2025 · arXiv 2506.00103

4 Pith papers cite this work. Polarity classification is still indexing.


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

StoryAlign: Evaluating and Training Reward Models for Story Generation

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

StoryReward, trained on a new 100k story preference dataset, sets state-of-the-art performance on the introduced StoryRMB benchmark for aligning LLM stories with human preferences.

From Coarse to Fine: Benchmarking and Reward Modeling for Writing-Centric Generation Tasks

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

WEval and WRL introduce fine-grained benchmarking and requirement-selective sample construction for training writing reward models, yielding substantial gains on writing benchmarks with strong generalization.

Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation

cs.LG · 2026-04-06 · unverdicted · novelty 6.0

Small LMs reach 77.1% accuracy at comparative forecasting of research idea success on benchmarks after supervised fine-tuning, with RLVR yielding interpretable reasoning at 71.35%.

Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation

cs.CL · 2026-01-29 · unverdicted · novelty 6.0

CoNL lets LLMs self-improve on non-verifiable tasks by rewarding critiques that produce better solutions in multi-agent conversations, jointly optimizing generation and judging without external feedback.

citing papers explorer


StoryAlign: Evaluating and Training Reward Models for Story Generation · cs.CL · 2026-05-06 · unverdicted · none · ref 14
From Coarse to Fine: Benchmarking and Reward Modeling for Writing-Centric Generation Tasks · cs.CL · 2026-04-30 · unverdicted · none · ref 1
Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation · cs.LG · 2026-04-06 · unverdicted · none · ref 3
Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation · cs.CL · 2026-01-29 · unverdicted · none · ref 11
