pith. sign in

Title resolution pending

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.AI 1

years

2025 1

verdicts

ACCEPT 1

representative citing papers

Spurious Rewards: Rethinking Training Signals in RLVR

cs.AI · 2025-06-12 · accept · novelty 8.0

Spurious rewards in RLVR can produce large gains in mathematical reasoning for certain language models via GRPO's clipping bias amplifying pretraining behaviors like code reasoning.

citing papers explorer

Showing 1 of 1 citing paper.

  • Spurious Rewards: Rethinking Training Signals in RLVR cs.AI · 2025-06-12 · accept · none · ref 14

    Spurious rewards in RLVR can produce large gains in mathematical reasoning for certain language models via GRPO's clipping bias amplifying pretraining behaviors like code reasoning.