
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

Large language models are increasingly expected to serve as general-purpose agents that interact with external, stateful tool environments. The Model Context Protocol (MCP) and broader agent skills offer a unified interface for connecting agents with scalable real-world services, but training robust agents remains limited by the lack of realistic environments and principled mechanisms for life-long learning. In this paper, we present Agent-World, a self-evolving training arena for advancing general agent intelligence through scalable environments. Agent-World has two main components: (1) Agentic Environment-Task Discovery, which autonomously explores topic-aligned databases and executable tool ecosystems from thousands of real-world environment themes and synthesizes verifiable tasks with controllable difficulty; and (2) Continuous Self-Evolving Agent Training, which combines multi-environment reinforcement learning with a self-evolving agent arena that automatically identifies capability gaps through dynamic task synthesis and drives targeted learning, enabling the co-evolution of agent policies and environments. Across 23 challenging agent benchmarks, Agent-World-8B and 14B consistently outperform strong proprietary models and environment scaling baselines. Further analyses reveal scaling trends in relation to environment diversity and self-evolution rounds, offering insights for building general agent intelligence.

citation-role summary

background 3

citation-polarity summary

years

2026 5

roles

background 2

polarities

background 2

representative citing papers

Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

cs.SE · 2026-05-18 · conditional · novelty 7.0

The paper presents OverEager-Gen, a 500-scenario benchmark showing that removing consent declarations from prompts increases overeager actions by 11.9-17.2 percentage points across models, with agent framework choice dominating base-model effects.

Learning Agentic Policy from Action Guidance

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

ActGuide-RL uses human action data as plan-style guidance in mixed-policy RL to overcome exploration barriers in LLM agents, matching SFT+RL performance on search benchmarks without cold-start training.

Scalable Environments Drive Generalizable Agents

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

Generalizable agents require environment scaling via diverse executable rule-sets, distinguished from trajectory and task scaling in a new taxonomy.

citing papers explorer

Showing 5 of 5 citing papers.

  • SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning cs.AI · 2026-05-10 · accept · none · ref 19 · 2 links · internal anchor

    SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.

  • Terminal-World: Scaling Terminal-Agent Environments via Agent Skills cs.CL · 2026-05-20 · unverdicted · none · ref 8 · internal anchor

    Terminal-World is a skill-based synthesis pipeline that generates 5,723 training environments and produces Terminal-World-32B, which outperforms baselines on Terminal-Bench 2.0 using only 1.2% of the data.

  • Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks cs.SE · 2026-05-18 · conditional · none · ref 4 · internal anchor

    The paper presents OverEager-Gen, a 500-scenario benchmark showing that removing consent declarations from prompts increases overeager actions by 11.9-17.2 percentage points across models, with agent framework choice dominating base-model effects.

  • Learning Agentic Policy from Action Guidance cs.CL · 2026-05-12 · unverdicted · none · ref 15 · internal anchor

    ActGuide-RL uses human action data as plan-style guidance in mixed-policy RL to overcome exploration barriers in LLM agents, matching SFT+RL performance on search benchmarks without cold-start training.

  • Scalable Environments Drive Generalizable Agents cs.AI · 2026-05-18 · unverdicted · none · ref 6 · internal anchor

    Generalizable agents require environment scaling via diverse executable rule-sets, distinguished from trajectory and task scaling in a new taxonomy.