SCRIBE introduces skill-conditioned rewards with intermediate behavioral evaluation to reduce noise in training tool-augmented agents, raising AIME25 accuracy from 43.3% to 63.3% on a Qwen3-4B model.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models