The teacher selects a question based on its current utility estimates and sends it to the student

Sample Request:The student requests a new problem q from the teacher

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning

cs.LG · 2026-02-16 · unverdicted · novelty 5.0

A teacher-driven sampling method selects appropriately difficult questions for student models in GRPO-based RL to improve reasoning performance under fixed compute on OpenMathReasoning.

citing papers explorer

Showing 1 of 1 citing paper.

Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning cs.LG · 2026-02-16 · unverdicted · none · ref 37
A teacher-driven sampling method selects appropriately difficult questions for student models in GRPO-based RL to improve reasoning performance under fixed compute on OpenMathReasoning.

The teacher selects a question based on its current utility estimates and sends it to the student

fields

years

verdicts

representative citing papers

citing papers explorer