ExpThink reduces average CoT response length by up to 77% while improving accuracy on math benchmarks via experience-guided reward shaping and difficulty-adaptive advantage in RL.
ProRL v2: Scaling LLM reinforcement learning with prolonged training.NVIDIA Technical Blog
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1roles
background 1polarities
background 1representative citing papers
citing papers explorer
-
ExpThink: Experience-Guided Reinforcement Learning for Adaptive Chain-of-Thought Compression
ExpThink reduces average CoT response length by up to 77% while improving accuracy on math benchmarks via experience-guided reward shaping and difficulty-adaptive advantage in RL.