Presents a likelihood-based benchmark for equation-suffix prediction in technical papers with controls to detect shortcut vulnerabilities in model forecasts.
Reinforcement learning on pre-training data
5 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
KITScenes LongTail supplies multimodal driving data and multilingual expert reasoning traces to benchmark models on rare scenarios beyond basic safety metrics.
PreRL applies reward-driven updates to P(y) in pre-train space, uses Negative Sample Reinforcement to prune bad reasoning paths and boost reflection, and combines with standard RL in Dual Space RL to outperform baselines on reasoning tasks.
A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.
citing papers explorer
-
Likelihood scoring for continuations of mathematical text: a self-supervised benchmark with tests for shortcut vulnerabilities
Presents a likelihood-based benchmark for equation-suffix prediction in technical papers with controls to detect shortcut vulnerabilities in model forecasts.
-
LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset
KITScenes LongTail supplies multimodal driving data and multilingual expert reasoning traces to benchmark models on rare scenarios beyond basic safety metrics.
-
From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space
PreRL applies reward-driven updates to P(y) in pre-train space, uses Negative Sample Reinforcement to prune bad reasoning paths and boost reflection, and combines with standard RL in Dual Space RL to outperform baselines on reasoning tasks.
-
A Survey of Reinforcement Learning for Large Reasoning Models
A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.
- ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning