ReCode is a new RL framework that combines contrastive reasoning-process reward learning with consistency-gated GRPO to improve code generation, yielding a 16.1% gain for a 7B model and bringing it to GPT-4-Turbo-level performance on benchmarks.
One Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.SE (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
ReCode: Reinforcing Code Generation with Reasoning-Process Rewards