RL grokking recipe: How does RL unlock and transfer new algorithms in LLMs?

Yiyou Sun, Yuhan Cao, Pohao Huang, Haoyue Bai, Hannaneh Hajishirzi, Nouha Dziri, Dawn Song · 2025 · arXiv 2509.21016

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

ADR generates novel verifiable code tasks via atomic decomposition and recombination, outperforming heuristic baselines in originality, difficulty, and downstream RLVR gains across coding domains.

VeriGate: Verifier-Gated Step-Level Supervision for GRPO

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

VeriGate adds verifier-gated step-level supervision to GRPO via cumulated PRM rewards and group-normalized token advantages, raising accuracy by 20% and 12% on 1.5B and 7B models, respectively, on MATH and six benchmarks.

citing papers explorer


Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination · cs.CL · 2026-05-29 · unverdicted · none · ref 12
VeriGate: Verifier-Gated Step-Level Supervision for GRPO · cs.LG · 2026-05-28 · unverdicted · none · ref 15
