Scaling synthetic task generation for agents via exploration

Ram Ramrakhya, Andrew Szot, Omar Attia, Yuhao Yang, Anh Nguyen, Bogdan Mazoure, Zhe Gan, Harsh Agrawal, Alexander Toshev · 2025 · arXiv 2509.25047

4 Pith papers cite this work.

representative citing papers

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

cs.AI · 2026-05-25 · conditional · novelty 7.0

CUA-Gym generates 32,112 verified RLVR tuples across 110 mock environments, enabling trained models to reach 62.1% and 72.6% on OSWorld-Verified while transferring to WebArena.

Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

EvoEnv lets a single policy synthesize, validate, and use Python environments with durable solve-verify asymmetry to improve reasoning performance on Qwen3-4B-Thinking from 72.4 to 74.8 while fixed-data baselines decline.

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

cs.AI · 2026-04-30 · unverdicted · novelty 5.0

The paper delivers the first comprehensive overview of RL for GUI agents, organizing methods into offline, online, and hybrid strategies while analyzing trends in rewards, efficiency, and deliberation to outline a future roadmap.

UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics

cs.LG · 2026-02-11 · unverdicted · novelty 5.0

UI-Oceanus shows that continual pre-training on forward-dynamics predictions from synthetic GUI exploration improves agent success rates by 7% offline and 16.8% online, with gains that scale with data volume.
