Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

2026 · cs.AI · arXiv 2604.18292

10 Pith papers cite this work. Polarity classification is still indexing.

abstract

Large language models are increasingly expected to serve as general-purpose agents that interact with external, stateful tool environments. The Model Context Protocol (MCP) and broader agent skills offer a unified interface for connecting agents with scalable real-world services, but training robust agents remains limited by the lack of realistic environments and principled mechanisms for life-long learning. In this paper, we present Agent-World, a self-evolving training arena for advancing general agent intelligence through scalable environments. Agent-World has two main components: (1) Agentic Environment-Task Discovery, which autonomously explores topic-aligned databases and executable tool ecosystems from thousands of real-world environment themes and synthesizes verifiable tasks with controllable difficulty; and (2) Continuous Self-Evolving Agent Training, which combines multi-environment reinforcement learning with a self-evolving agent arena that automatically identifies capability gaps through dynamic task synthesis and drives targeted learning, enabling the co-evolution of agent policies and environments. Across 23 challenging agent benchmarks, Agent-World-8B and 14B consistently outperform strong proprietary models and environment scaling baselines. Further analyses reveal scaling trends in relation to environment diversity and self-evolution rounds, offering insights for building general agent intelligence.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

cs.AI · 2026-05-10 · accept · novelty 8.0 · 2 refs

SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.

OpenFinGym: A Verifiable Multi-Task Gym Environment for Evaluating Quant Agents

cs.AI · 2026-06-24 · unverdicted · novelty 7.0

OpenFinGym is a multi-task verifiable gym environment for quant-finance agents with automated task construction from publications, containerised runtime, paper trading engine, and support for SFT/RL training.

Terminal-World: Scaling Terminal-Agent Environments via Agent Skills

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

Terminal-World is a skill-based synthesis pipeline that generates 5,723 training environments and produces Terminal-World-32B which outperforms baselines on Terminal-Bench 2.0 using only 1.2% of the data.

Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

cs.SE · 2026-05-18 · conditional · novelty 7.0

The paper presents OverEager-Gen, a 500-scenario benchmark showing that removing consent declarations from prompts increases overeager actions by 11.9-17.2 percentage points across models, with agent framework choice dominating base-model effects.

Learning Agentic Policy from Action Guidance

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

ActGuide-RL uses human action data as plan-style guidance in mixed-policy RL to overcome exploration barriers in LLM agents, matching SFT+RL performance on search benchmarks without cold-start training.

Qwen-AgentWorld: Language World Models for General Agents

cs.CL · 2026-06-23 · unverdicted · novelty 6.0

The Qwen-AgentWorld models are language world models that simulate multi-domain agent environments and boost general agent capabilities via decoupled RL simulation and unified foundation model training.

Managing Procedural Memory in LLM Agents: Control, Adaptation, and Evaluation

cs.AI · 2026-06-22 · unverdicted · novelty 6.0

AFTER benchmark shows single refinement improves LLM agent performance by 3.7-6.7 points and multi-model procedural skills reach 73.1% cross-model accuracy on 382 tasks.

PhoneBuddy: Training Open Models for Agentic Phone Use

cs.CL · 2026-06-22 · unverdicted · novelty 6.0

PhoneBuddy combines real-app and mock-app RL after shared SFT, raising real-phone task success from 36.67% to 45.33% and AndroidWorld from 60.3% to 83.2%.

PhoneWorld: Scaling Phone-Use Agent Environments

cs.CL · 2026-05-28 · unverdicted · novelty 6.0

PhoneWorld is a pipeline that converts real mobile trajectories into scalable controllable environments, yielding large gains on four benchmarks when used to supplement training data.

Scalable Environments Drive Generalizable Agents

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

Generalizable agents require environment scaling via diverse executable rule-sets, distinguished from trajectory and task scaling in a new taxonomy.
