Saturn: SAT-based reinforcement learning to unleash LLMs reasoning

Huanyu Liu, Ge Li, Jia Li, Hao Zhu, Kechi Zhang, Yihong Dong · 2025 · arXiv 2505.16368

3 Pith papers cite this work.

representative citing papers

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

cs.AI · 2026-05-07 · unverdicted · novelty 6.0 · 3 refs

RL training compute for logical reasoning follows a power law in horizon depth, with an exponent that rises with logical expressiveness; training on richer logics yields better downstream transfer.

SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning

cs.AI · 2026-01-08 · unverdicted · novelty 6.0

SCALER creates adaptive synthetic environments for RL-based training of LLM reasoning; it outperforms fixed-dataset baselines and shows more stable long-term progress.

EvoCoT: Overcoming the Exploration Bottleneck in Reinforcement Learning

cs.LG · 2025-08-11 · unverdicted · novelty 6.0

EvoCoT uses self-generated and verified CoT trajectories in a two-stage curriculum to let LLMs learn from initially unsolved hard problems in RLVR settings.

citing papers explorer

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key · cs.AI · 2026-05-07 · unverdicted · none · ref 80 · 3 links
SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning · cs.AI · 2026-01-08 · unverdicted · none · ref 24
EvoCoT: Overcoming the Exploration Bottleneck in Reinforcement Learning · cs.LG · 2025-08-11 · unverdicted · none · ref 9