Reinforcement learning on pre-training data. arXiv preprint arXiv:2509.19249

2025 · arXiv 2509.19249

6 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Likelihood scoring for continuations of mathematical text: a self-supervised benchmark with tests for shortcut vulnerabilities

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Presents a likelihood-based benchmark for equation-suffix prediction in technical papers with controls to detect shortcut vulnerabilities in model forecasts.

LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

cs.CV · 2026-03-24 · unverdicted · novelty 7.0

KITScenes LongTail supplies multimodal driving data and multilingual expert reasoning traces to benchmark models on rare scenarios beyond basic safety metrics.

RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

Experiments indicate RL applied early in pre-training often matches full SFT-then-RL performance, targeted data composition outweighs scale for RL success, and averaging RL and SFT objectives outperforms sequential or single methods.

From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

PreRL applies reward-driven updates to P(y) in pre-train space, uses Negative Sample Reinforcement to prune bad reasoning paths and boost reflection, and combines with standard RL in Dual Space RL to outperform baselines on reasoning tasks.

ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning

cs.CL · 2026-05-22 · unverdicted · novelty 5.0 · 2 refs

ARES generates 100K rubric-annotated QA instances from raw documents and demonstrates superior rubric-based RL performance over baselines on open-ended benchmarks.

A Survey of Reinforcement Learning for Large Reasoning Models

cs.CL · 2025-09-10 · accept · novelty 3.0

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

citing papers explorer

Showing 5 of 5 citing papers after filters.

Likelihood scoring for continuations of mathematical text: a self-supervised benchmark with tests for shortcut vulnerabilities

cs.LG · 2026-05-11 · unverdicted · none · ref 15

Presents a likelihood-based benchmark for equation-suffix prediction in technical papers with controls to detect shortcut vulnerabilities in model forecasts.

LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

cs.CV · 2026-03-24 · unverdicted · none · ref 44

KITScenes LongTail supplies multimodal driving data and multilingual expert reasoning traces to benchmark models on rare scenarios beyond basic safety metrics.

RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training

cs.LG · 2026-06-02 · unverdicted · none · ref 1

Experiments indicate RL applied early in pre-training often matches full SFT-then-RL performance, targeted data composition outweighs scale for RL success, and averaging RL and SFT objectives outperforms sequential or single methods.

From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space

cs.LG · 2026-04-15 · unverdicted · none · ref 29

PreRL applies reward-driven updates to P(y) in pre-train space, uses Negative Sample Reinforcement to prune bad reasoning paths and boost reflection, and combines with standard RL in Dual Space RL to outperform baselines on reasoning tasks.

ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning

cs.CL · 2026-05-22 · unverdicted · none · ref 10 · 2 links

ARES generates 100K rubric-annotated QA instances from raw documents and demonstrates superior rubric-based RL performance over baselines on open-ended benchmarks.
