IH-GRPO introduces implicit hierarchical control via a surrogate loss to decouple tool invocation from execution in LLMs, reporting 1.87-2.53% gains on math reasoning benchmarks.
If you need to execute a block immediately, append `<tool_call>` right after the code block.
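The instruction above implies a simple output protocol. The model may write code freely, but only a block immediately followed by `<tool_call>` is sent for execution. The Python sketch below shows one way a harness might separate the two cases. The regex, the hypothetical `split_blocks` helper, and the decision to tolerate whitespace between the closing fence and the marker are illustrative assumptions, not the paper's implementation.

```python
from __future__ import annotations

import re

# Assumption: code blocks are fenced with three backticks plus an optional
# language tag, and "<tool_call>" must follow the closing fence with only
# whitespace in between. Both details are hypothetical.
FENCE = "`" * 3  # built at runtime so this example nests safely in Markdown
TOOL_CALL = "<tool_call>"

BLOCK_RE = re.compile(
    re.escape(FENCE) + r"[\w+-]*\n(.*?)" + re.escape(FENCE)
    + r"(\s*" + re.escape(TOOL_CALL) + r")?",
    re.DOTALL,
)


def split_blocks(text: str) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (blocks to execute now, blocks not invoked)."""
    execute: list[str] = []
    deferred: list[str] = []
    for match in BLOCK_RE.finditer(text):
        code = match.group(1)
        if match.group(2) is not None:
            # The marker directly follows this block: invoke the tool.
            execute.append(code)
        else:
            # Code was written but not invoked; the harness leaves it alone.
            deferred.append(code)
    return execute, deferred
```

Treating invocation as an explicit marker, rather than running every block the model emits, is one plausible way to realize the separation of tool invocation from execution that IH-GRPO targets.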
1 Pith paper cites this work (field: cs.CL; year: 2026; verdict: unverdicted).

Representative citing paper:
Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning