Evoflux applies evolutionary search at inference time to repair executable tool workflows for compact agents, outperforming SFT and SFT+DPO on held-out MCP-Bench tasks with live servers and 250 tools.
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
4 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background 1
Citation-polarity summary: background 1
Representative citing papers
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.
WebAggregator generates synthetic training data using exploration and logic-proposal steps to fine-tune 32B models that match or exceed GPT-4.1 and Claude-3.7-Sonnet on GAIA and related benchmarks by prioritizing compositional reasoning.
StepGuard framework with DDPO and CANR claims SOTA navigation and answer accuracy on web benchmarks by switching policies and triggering reflection on low-confidence steps.
citing papers explorer
- WebAggregator: Enhancing Compositional Reasoning Capabilities of Deep Research Agent Foundation Models