Omni-MATH: A universal olympiad level mathematic benchmark for large language models
2 Pith papers cite this work.
Citing papers by year: 2026 (2).
Representative citing papers:
- CODA: Difficulty-Aware Compute Allocation for Adaptive Reasoning
CODA uses rollout-based difficulty signals to drive two gates that penalize verbosity on easy instances and promote deliberation on hard ones, cutting token use by more than 60% on simple tasks while maintaining accuracy.
- Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective