hub Mixed citations

Torl: Scaling tool-integrated rl, 2025 b

Torl: Scaling tool-integrated rl , author= · 2025 · arXiv 2503.23383

Mixed citation behavior. Most common role is background (67%).

17 Pith papers citing it

Background 67% of classified citations

read on arXiv browse 17 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 7 baseline 1 other 1

citation-polarity summary

background 6 unclear 2 baseline 1

representative citing papers

Harnessing LLM Agents with Skill Programs

cs.AI · 2026-05-18 · conditional · novelty 6.0

HASP upgrades textual skills into executable Program Functions that intervene in LLM agent loops at inference, post-training, or self-evolution, delivering 25% gains over ReAct and 30.4% over Search-R1 on reasoning benchmarks.

ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox

cs.AI · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

ComplexMCP benchmark shows top LLM agents achieve under 60% success on dynamic interdependent tool tasks versus 90% for humans, due to tool retrieval saturation, over-confidence, and strategic defeatism.

PruneTIR: Inference-Time Tool Call Pruning for Effective yet Efficient Tool-Integrated Reasoning

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

PruneTIR prunes erroneous tool-call trajectories during LLM inference via three trigger-based components to raise Pass@1 accuracy and efficiency while shortening context.

SOD: Step-wise On-policy Distillation for Small Language Model Agents

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.

To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling

cs.AI · 2026-05-01 · unverdicted · novelty 6.0

LLMs often misalign their self-perceived need for tools with true need and utility, but lightweight estimators trained on hidden states can improve tool-calling decisions and task performance across multiple models and tasks.

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

Agent-World autonomously synthesizes verifiable real-world tasks and uses continuous self-evolution to train 8B and 14B agents that outperform proprietary models on 23 benchmarks.

When to Trust Tools? Adaptive Tool Trust Calibration For Tool-Integrated Math Reasoning

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

ATTC reduces 'Tool Ignored' errors in tool-integrated reasoning by adaptively trusting tool results according to generated code confidence, yielding 4.1-7.5% gains across models and datasets.

CharTool: Tool-Integrated Visual Reasoning for Chart Understanding

cs.AI · 2026-04-03 · unverdicted · novelty 6.0

CharTool equips MLLMs with cropping and code tools plus agentic RL on DuoChart data to raise chart-reasoning accuracy by up to 9.78 percent on benchmarks.

Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation

cs.CV · 2026-02-17 · unverdicted · novelty 6.0

MARL-Rad trains region-specific and global agents with reinforcement learning on clinical rewards to produce more accurate radiology reports than prior methods on MIMIC-CXR and IU X-ray datasets.

The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination

cs.LG · 2025-10-27 · conditional · novelty 6.0

Strengthening LLM reasoning through RL, SFT, or chain-of-thought prompting increases tool hallucination rates on SimpleToolHalluBench, with a reliability-capability trade-off observed across mitigation attempts.

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

cs.AI · 2025-09-02 · accept · novelty 6.0

Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.

WebThinker: Empowering Large Reasoning Models with Deep Research Capability

cs.CL · 2025-04-30 · unverdicted · novelty 6.0

WebThinker equips large reasoning models with autonomous web exploration and interleaved reasoning-drafting via a Deep Web Explorer and RL-based DPO training, yielding gains on GPQA, GAIA, and report-generation benchmarks.

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

cs.CL · 2025-04-15 · unverdicted · novelty 6.0

ReTool uses outcome-driven RL to train 32B LLMs to dynamically use code tools during reasoning, reaching 72.5% accuracy on AIME and surpassing o1-preview.

E3-TIR: Enhanced Experience Exploitation for Tool-Integrated Reasoning

cs.AI · 2026-04-10 · unverdicted · novelty 5.0

E3-TIR integrates expert prefixes, guided branches, and self-exploration via mix policy optimization to deliver 6% better tool-use performance with under 10% of the usual synthetic data and 1.46x ROI.

Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions

cs.CV · 2025-09-23 · unverdicted · novelty 5.0

Structured reflection makes error diagnosis and repair an explicit trainable step that improves reliability and reduces redundant calls in tool-using LLM agents.

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

cs.AI · 2025-09-02 · conditional · novelty 5.0

UI-TARS-2 reaches 88.2 on Online-Mind2Web, 47.5 on OSWorld, 50.6 on WindowsAgentArena, and 73.3 on AndroidWorld while attaining 59.8 mean normalized score on a 15-game suite through multi-turn RL and scalable data generation.

A Survey of Reinforcement Learning for Large Reasoning Models

cs.CL · 2025-09-10 · accept · novelty 3.0

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

citing papers explorer

Showing 17 of 17 citing papers.

Harnessing LLM Agents with Skill Programs cs.AI · 2026-05-18 · conditional · none · ref 17
HASP upgrades textual skills into executable Program Functions that intervene in LLM agent loops at inference, post-training, or self-evolution, delivering 25% gains over ReAct and 30.4% over Search-R1 on reasoning benchmarks.
ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox cs.AI · 2026-05-11 · unverdicted · none · ref 4 · 2 links
ComplexMCP benchmark shows top LLM agents achieve under 60% success on dynamic interdependent tool tasks versus 90% for humans, due to tool retrieval saturation, over-confidence, and strategic defeatism.
PruneTIR: Inference-Time Tool Call Pruning for Effective yet Efficient Tool-Integrated Reasoning cs.CL · 2026-05-11 · unverdicted · none · ref 31
PruneTIR prunes erroneous tool-call trajectories during LLM inference via three trigger-based components to raise Pass@1 accuracy and efficiency while shortening context.
SOD: Step-wise On-policy Distillation for Small Language Model Agents cs.CL · 2026-05-08 · unverdicted · none · ref 17
SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.
To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling cs.AI · 2026-05-01 · unverdicted · none · ref 26
LLMs often misalign their self-perceived need for tools with true need and utility, but lightweight estimators trained on hidden states can improve tool-calling decisions and task performance across multiple models and tasks.
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence cs.AI · 2026-04-20 · unverdicted · none · ref 51
Agent-World autonomously synthesizes verifiable real-world tasks and uses continuous self-evolution to train 8B and 14B agents that outperform proprietary models on 23 benchmarks.
When to Trust Tools? Adaptive Tool Trust Calibration For Tool-Integrated Math Reasoning cs.CL · 2026-04-09 · unverdicted · none · ref 12
ATTC reduces 'Tool Ignored' errors in tool-integrated reasoning by adaptively trusting tool results according to generated code confidence, yielding 4.1-7.5% gains across models and datasets.
CharTool: Tool-Integrated Visual Reasoning for Chart Understanding cs.AI · 2026-04-03 · unverdicted · none · ref 25
CharTool equips MLLMs with cropping and code tools plus agentic RL on DuoChart data to raise chart-reasoning accuracy by up to 9.78 percent on benchmarks.
Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation cs.CV · 2026-02-17 · unverdicted · none · ref 60
MARL-Rad trains region-specific and global agents with reinforcement learning on clinical rewards to produce more accurate radiology reports than prior methods on MIMIC-CXR and IU X-ray datasets.
The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination cs.LG · 2025-10-27 · conditional · none · ref 7
Strengthening LLM reasoning through RL, SFT, or chain-of-thought prompting increases tool hallucination rates on SimpleToolHalluBench, with a reliability-capability trade-off observed across mitigation attempts.
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey cs.AI · 2025-09-02 · accept · none · ref 109
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.
WebThinker: Empowering Large Reasoning Models with Deep Research Capability cs.CL · 2025-04-30 · unverdicted · none · ref 27
WebThinker equips large reasoning models with autonomous web exploration and interleaved reasoning-drafting via a Deep Web Explorer and RL-based DPO training, yielding gains on GPQA, GAIA, and report-generation benchmarks.
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs cs.CL · 2025-04-15 · unverdicted · none · ref 8
ReTool uses outcome-driven RL to train 32B LLMs to dynamically use code tools during reasoning, reaching 72.5% accuracy on AIME and surpassing o1-preview.
E3-TIR: Enhanced Experience Exploitation for Tool-Integrated Reasoning cs.AI · 2026-04-10 · unverdicted · none · ref 19
E3-TIR integrates expert prefixes, guided branches, and self-exploration via mix policy optimization to deliver 6% better tool-use performance with under 10% of the usual synthetic data and 1.46x ROI.
Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions cs.CV · 2025-09-23 · unverdicted · none · ref 12
Structured reflection makes error diagnosis and repair an explicit trainable step that improves reliability and reduces redundant calls in tool-using LLM agents.
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning cs.AI · 2025-09-02 · conditional · none · ref 34
UI-TARS-2 reaches 88.2 on Online-Mind2Web, 47.5 on OSWorld, 50.6 on WindowsAgentArena, and 73.3 on AndroidWorld while attaining 59.8 mean normalized score on a 15-game suite through multi-turn RL and scalable data generation.
A Survey of Reinforcement Learning for Large Reasoning Models cs.CL · 2025-09-10 · accept · none · ref 290
A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

Torl: Scaling tool-integrated rl, 2025 b

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer