LLMs as scalable, general-purpose simulators for evolving digital agent training, 2025

Yiming Wang, Da Yin, Yuedong Cui, Ruichen Zheng, Zhiqian Li, Zongyu Lin, Di Wu, Xueqing Wu, Chenchen Ye, Yu Zhou, Kai-Wei Chang · 2025 · arXiv 2510.14969

2 Pith papers cite this work.

representative citing papers

Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis

cs.AI · 2026-05-14 · unverdicted · novelty 6.0 · ref 39

EvoEnv lets a single policy synthesize, validate, and use Python environments that maintain a durable solve-verify asymmetry, improving the reasoning performance of Qwen3-4B-Thinking from 72.4 to 74.8 while fixed-data baselines decline.

Internalizing the Future: A Unified Agentic Training Paradigm for World Model Planning

cs.AI · 2026-06-25 · unverdicted · novelty 5.0 · ref 8

A three-stage training pipeline internalizes world-model simulation and success estimation in LLM agents for improved planning on search and math tasks.