pith.

RL grokking recipe: How does RL unlock and transfer new algorithms in LLMs? arXiv preprint arXiv:2509.21016.

2 Pith papers cite this work. Polarity classification of these citations is still being indexed.


fields: cs.CL (1), cs.LG (1)
years: 2026 (2)
verdicts: UNVERDICTED (2)


representative citing papers

VeriGate: Verifier-Gated Step-Level Supervision for GRPO

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

VeriGate adds verifier-gated step-level supervision to GRPO using cumulative PRM rewards and group-normalized token advantages, raising accuracy by 20% on a 1.5B model and by 12% on a 7B model, evaluated on MATH and six benchmarks.
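The summary packs several mechanisms into one line. The minimal Python sketch below shows one way group-normalized, verifier-gated step advantages could fit together. It is an illustration under stated assumptions, not VeriGate's implementation: all function names are hypothetical, "cumulative" is read as a reward-to-go sum of PRM step scores, the gate (zeroing the PRM signal when the verifier rejects the final answer) is an assumed rule, and normalization pools every step in the group, in the spirit of GRPO's group-relative advantages.

```python
from statistics import mean, pstdev


def reward_to_go(step_rewards):
    """Cumulative PRM reward from each step to the end of the rollout.
    Assumption: 'cumulative' is read as a suffix (reward-to-go) sum."""
    out, total = [], 0.0
    for r in reversed(step_rewards):
        total += r
        out.append(total)
    return out[::-1]


def gated_step_returns(step_rewards, verifier_ok):
    """Hypothetical gate: keep the PRM step signal only if the verifier
    accepts the rollout's final answer; otherwise every step gets 0."""
    if verifier_ok:
        return reward_to_go(step_rewards)
    return [0.0] * len(step_rewards)


def group_normalized_advantages(group, eps=1e-6):
    """Standardize per-step returns across all steps of all rollouts in a group.

    group: list of (step_rewards, verifier_ok) pairs, one per sampled rollout.
    Returns one list of per-step advantages per rollout; each token would
    inherit the advantage of the step it belongs to.
    """
    returns = [gated_step_returns(steps, ok) for steps, ok in group]
    flat = [x for rollout in returns for x in rollout]
    mu, sigma = mean(flat), pstdev(flat)
    # eps guards against division by zero when every return is identical.
    return [[(x - mu) / (sigma + eps) for x in rollout] for rollout in returns]
```

For example, a group with one accepted rollout scored [0.5, 0.5] by the PRM and one rejected three-step rollout yields advantages of about 1.75 and 0.5 for the accepted steps and about -0.75 for each rejected step, so rejected reasoning is pushed below the group mean.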
