Reasoning through exploration: A reinforcement learning framework for robust function calling. arXiv preprint arXiv:2508.05118, 2025

Bingguang Hao, Maolin Wang, Zengzhuang Xu, Yicheng Chen, Cunyin Peng, Jinjie GU, Chenyi Zhuang · 2025 · arXiv 2508.05118

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models

cs.AI · 2026-01-07 · unverdicted · novelty 7.0

SCRIBE introduces skill-conditioned rewards with intermediate behavioral evaluation to reduce noise in training tool-augmented agents, raising AIME25 accuracy from 43.3% to 63.3% on a Qwen3-4B model.

Leveraging Error Diversity in Group Rollouts for Reinforcement Learning

cs.LG · 2026-05-17 · unverdicted · novelty 6.0

EDAS modulates advantage signals in RLVR to penalize repeated errors more and rare errors less, yielding consistent gains on math benchmarks when added to existing methods.

ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL

cs.DC · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

ROSE is a system for cooperative elasticity that co-locates serving and rollout models on shared GPUs, delivering 1.3-3.3x higher end-to-end throughput than fixed-resource baselines while preserving serving SLOs.

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

cs.AI · 2025-09-02 · accept · novelty 6.0

Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.

Data-Driven Function Calling Improvements in Large Language Model for Online Financial QA

cs.IR · 2026-04-07 · unverdicted · novelty 3.0

A pipeline of dataset construction from prior work, AugFC parameter augmentation, and two-step LLM training improves function calling for financial APIs and is running in production.

citing papers explorer

SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models cs.AI · 2026-01-07 · unverdicted · none · ref 2
Leveraging Error Diversity in Group Rollouts for Reinforcement Learning cs.LG · 2026-05-17 · unverdicted · none · ref 11
ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL cs.DC · 2026-05-07 · unverdicted · none · ref 24 · 2 links
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey cs.AI · 2025-09-02 · accept · none · ref 110
Data-Driven Function Calling Improvements in Large Language Model for Online Financial QA cs.IR · 2026-04-07 · unverdicted · none · ref 9
