FunReason: Enhancing function calling via self-refinement and data refinement

Bingguang Hao et al · 2025 · arXiv 2505.20192

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

representative citing papers

RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement

cs.LG · 2026-05-10 · unverdicted · novelty 7.0 · 3 refs

RubricRefine is a training-free pre-execution method that creates rubrics to score and fix inter-tool contract violations in agent code, reaching 0.86 average on M3ToolEval across seven models with zero executions and lower latency.

R2IF: Aligning Reasoning with Decisions via Composite Rewards for Interpretable LLM Function Calling

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

R2IF improves LLM function-calling accuracy by up to 34.62% on BFCL using a composite reward system with CER and SMV components optimized via GRPO, while increasing interpretability through positive CoT effectiveness.

Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions

cs.CV · 2025-09-23 · unverdicted · novelty 5.0

Structured reflection makes error diagnosis and repair an explicit trainable step that improves reliability and reduces redundant calls in tool-using LLM agents.

Data-Driven Function Calling Improvements in Large Language Model for Online Financial QA

cs.IR · 2026-04-07 · unverdicted · novelty 3.0

A pipeline of dataset construction from prior work, AugFC parameter augmentation, and two-step LLM training improves function calling for financial APIs and is running in production.

citing papers explorer

Showing 4 of 4 citing papers.

RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement cs.LG · 2026-05-10 · unverdicted · none · ref 19 · 3 links
RubricRefine is a training-free pre-execution method that creates rubrics to score and fix inter-tool contract violations in agent code, reaching 0.86 average on M3ToolEval across seven models with zero executions and lower latency.
R2IF: Aligning Reasoning with Decisions via Composite Rewards for Interpretable LLM Function Calling cs.LG · 2026-04-22 · unverdicted · none · ref 34
R2IF improves LLM function-calling accuracy by up to 34.62% on BFCL using a composite reward system with CER and SMV components optimized via GRPO, while increasing interpretability through positive CoT effectiveness.
Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions cs.CV · 2025-09-23 · unverdicted · none · ref 19
Structured reflection makes error diagnosis and repair an explicit trainable step that improves reliability and reduces redundant calls in tool-using LLM agents.
Data-Driven Function Calling Improvements in Large Language Model for Online Financial QA cs.IR · 2026-04-07 · unverdicted · none · ref 10
A pipeline of dataset construction from prior work, AugFC parameter augmentation, and two-step LLM training improves function calling for financial APIs and is running in production.

FunReason: Enhancing function calling via self-refinement and data refinement

fields

years

verdicts

representative citing papers

citing papers explorer