Abstral: Augmenting LLMs’ reasoning by reinforcing abstract thinking. arXiv preprint arXiv:2406.11228, 2024

Gao, S · 2024 · arXiv 2406.11228

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning

cs.LG · 2026-02-16 · unverdicted · novelty 5.0

A teacher-driven sampling method selects appropriately difficult questions for student models in GRPO-based RL to improve reasoning performance under fixed compute on OpenMathReasoning.

citing papers explorer

Showing 1 of 1 citing paper.

Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning · cs.LG · 2026-02-16 · unverdicted · none · ref 8
