DynaWeb: Model-Based Reinforcement Learning of Web Agents
Recognition: 1 Lean theorem link
Pith reviewed 2026-05-16 09:31 UTC · model grok-4.3
The pith
DynaWeb trains web agents by learning a world model that simulates page responses to actions for efficient reinforcement learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper introduces DynaWeb, a novel MBRL framework that trains a web world model to predict naturalistic page representations given agent actions. This world model serves as a synthetic web environment in which an agent policy can generate vast quantities of rollout trajectories for efficient online reinforcement learning; real expert trajectories are randomly interleaved with these rollouts to improve stability and sample efficiency, yielding significant performance gains on WebArena and WebVoyager.
What carries the argument
The web world model that predicts naturalistic page representations to support simulated policy rollouts and imagination-based training.
If this is right
- Web agent training requires far fewer live internet interactions, lowering cost and risk.
- The quantity of training trajectories can be scaled arbitrarily through simulation.
- Interleaving expert data stabilizes learning and improves sample efficiency.
- The same framework delivers measurable gains to existing state-of-the-art open-source models.
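The training scheme the review summarizes is Dyna-style: on-policy rollouts inside a learned world model, randomly interleaved with real expert trajectories. A minimal sketch of that loop, assuming a hypothetical interface (`policy.act`, `policy.update`, `world_model.reset`, `world_model.step` are illustrative names, not the paper's API):

```python
import random

def train_dyna_style(policy, world_model, expert_trajectories,
                     num_updates=1000, expert_prob=0.3, horizon=8):
    """Hypothetical sketch of DynaWeb-style training: simulated rollouts
    from a learned web world model, interleaved with real expert data.
    All class and method names here are illustrative assumptions."""
    for _ in range(num_updates):
        if random.random() < expert_prob:
            # Replay a real expert trajectory for stability.
            trajectory = random.choice(expert_trajectories)
        else:
            # "Dream": roll the policy out inside the world model.
            state = world_model.reset()
            trajectory = []
            for _ in range(horizon):
                action = policy.act(state)
                # The world model predicts the next page representation.
                state, reward, done = world_model.step(state, action)
                trajectory.append((state, action, reward))
                if done:
                    break
        # Any online RL update (e.g., a policy-gradient step) fits here.
        policy.update(trajectory)
    return policy
```

The design point is that the expensive live-web interaction sits only behind the world model's training data; the RL loop itself touches no real pages.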
Where Pith is reading between the lines
- Similar learned world models could support efficient training in other interactive digital environments such as desktop applications or mobile UIs.
- Higher-fidelity page prediction might further reduce any remaining sim-to-real gap.
- The method provides a practical route toward safer, lower-cost development of autonomous web assistants.
Load-bearing premise
The learned world model produces page representations realistic enough that policies trained inside the simulation transfer to real web environments without large distribution shift.
What would settle it
Agents trained using only DynaWeb-generated rollouts perform no better than, or worse than, agents trained exclusively on real trajectories when evaluated on the WebArena or WebVoyager benchmarks.
Figures
Original abstract
The development of autonomous web agents, powered by Large Language Models (LLMs) and reinforcement learning (RL), represents a significant step towards general-purpose AI assistants. However, training these agents is severely hampered by the challenges of interacting with the live internet, which is inefficient, costly, and fraught with risks. Model-based reinforcement learning (MBRL) offers a promising solution by learning a world model of the environment to enable simulated interaction. This paper introduces DynaWeb, a novel MBRL framework that trains web agents through interacting with a web world model trained to predict naturalistic web page representations given agent actions. This model serves as a synthetic web environment where an agent policy can dream by generating vast quantities of rollout action trajectories for efficient online reinforcement learning. Beyond free policy rollouts, DynaWeb incorporates real expert trajectories from training data, which are randomly interleaved with on-policy rollouts during training to improve stability and sample efficiency. Experiments conducted on the challenging WebArena and WebVoyager benchmarks demonstrate that DynaWeb consistently and significantly improves the performance of state-of-the-art open-source web agent models. Our findings establish the viability of training web agents through imagination, offering a scalable and efficient way to scale up online agentic RL.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces DynaWeb, a model-based RL framework for web agents in which a learned world model predicts naturalistic page representations from actions to support simulated policy rollouts; these rollouts are randomly interleaved with real expert trajectories during training to stabilize learning, and the resulting agents are shown to outperform prior open-source models on the WebArena and WebVoyager benchmarks.
Significance. If the central empirical claim holds, the work demonstrates a practical route to scaling online RL for web agents by replacing costly live-environment interactions with imagination-based training while mitigating distribution shift through expert-data interleaving. This addresses a key bottleneck in agentic RL and could generalize to other high-cost interaction domains.
major comments (2)
- [Abstract] The headline claim that DynaWeb 'consistently and significantly improves' performance on WebArena and WebVoyager is presented without reported error bars, statistical significance tests, or an ablation isolating the MBRL component from the expert-trajectory interleaving; this information is load-bearing for the central claim that the world-model rollouts are responsible for the gains.
- [World-model section] The description of the training objective and simulation supplies no quantitative validation of simulation fidelity, such as next-state prediction error on held-out real trajectories, KL divergence or other distributional metrics between simulated and real page representations, or an ablation of pure-simulated versus mixed versus pure-expert training; without these checks the transfer assumption remains unverified and the reported improvements could be driven primarily by the expert data.
minor comments (1)
- [Abstract] The informal phrase 'dream by generating vast quantities of rollout action trajectories' could be replaced by a more precise term such as 'simulate' to maintain technical tone.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We agree that additional statistical rigor and world-model validation would strengthen the central claims. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Abstract] The headline claim that DynaWeb 'consistently and significantly improves' performance on WebArena and WebVoyager is presented without reported error bars, statistical significance tests, or an ablation isolating the MBRL component from the expert-trajectory interleaving; this information is load-bearing for the central claim that the world-model rollouts are responsible for the gains.
  Authors: We agree that the abstract claim would benefit from explicit statistical support. In the revised manuscript we will report mean performance with standard deviations across multiple random seeds for all main results, include pairwise statistical significance tests (e.g., paired t-tests or Wilcoxon tests) against baselines, and add a dedicated ablation table that isolates the contribution of model-based rollouts from expert-trajectory interleaving. These additions will be placed in the Experiments section and referenced from the abstract.
  revision: yes
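As one hedged illustration of the per-task comparison the rebuttal promises: a paired sign-flip permutation test on matched benchmark scores needs no distributional assumptions, unlike a t-test. The function name and inputs below are invented for the sketch, not taken from the paper:

```python
import random
import statistics

def paired_permutation_test(scores_a, scores_b, num_permutations=10000, seed=0):
    """Illustrative paired sign-flip permutation test on per-task scores
    (e.g., 0/1 success indicators) for two agents evaluated on the same
    benchmark tasks. Returns the observed mean difference and a p-value."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = statistics.mean(diffs)
    count = 0
    for _ in range(num_permutations):
        # Under the null hypothesis the pairing is exchangeable,
        # so each per-task difference can have its sign flipped.
        permuted = statistics.mean(d * rng.choice((-1, 1)) for d in diffs)
        if abs(permuted) >= abs(observed):
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (count + 1) / (num_permutations + 1)
    return observed, p_value
```

With per-seed benchmark runs, the same routine applies to seed-level means rather than task-level indicators.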
- Referee: [World-model section] The description of the training objective and simulation supplies no quantitative validation of simulation fidelity, such as next-state prediction error on held-out real trajectories, KL divergence or other distributional metrics between simulated and real page representations, or an ablation of pure-simulated versus mixed versus pure-expert training; without these checks the transfer assumption remains unverified and the reported improvements could be driven primarily by the expert data.
  Authors: We acknowledge that quantitative fidelity metrics were not reported. In the revision we will add (i) next-state prediction error (L2 or cross-entropy) on held-out real trajectories, (ii) distributional metrics including KL divergence between simulated and real page-representation distributions, and (iii) an explicit ablation comparing pure-simulated rollouts, pure-expert trajectories, and the mixed schedule. These results will be presented in a new subsection of the World-Model section to directly verify the transfer assumption.
  revision: yes
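The fidelity metrics named in the response can be sketched directly. The helpers below are illustrative stand-ins, assuming page representations reduce to fixed-length vectors and to discrete frequency distributions; they are not the authors' evaluation code:

```python
import math

def next_state_l2_error(pred_states, real_states):
    """Mean L2 distance between predicted and held-out real next-state
    representations, each given as a numeric vector."""
    total = 0.0
    for p, r in zip(pred_states, real_states):
        total += math.sqrt(sum((pi - ri) ** 2 for pi, ri in zip(p, r)))
    return total / len(pred_states)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions, e.g., page-element
    or page-type frequencies estimated from simulated vs. real rollouts.
    eps guards against log of zero when q assigns no mass to a bin."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)
```

A large KL between simulated and real page-representation distributions would flag exactly the sim-to-real gap the load-bearing premise rules out.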
Circularity Check
No significant circularity: performance gains are measured on external benchmarks
Full rationale
The paper trains a world model to predict page representations from actions, generates simulated rollouts, interleaves them with real expert trajectories for policy optimization, and reports improvements on the held-out WebArena and WebVoyager benchmarks. No equations, definitions, or self-citations reduce the reported gains to quantities fitted from the same evaluation data by construction; the central empirical claim remains independent of the training inputs and does not rely on renaming fitted parameters as predictions or importing uniqueness from prior author work.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: A sufficiently accurate world model of web-page transitions exists and can be learned from limited interaction data.
invented entities (1)
- Web world model (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  "DynaWeb trains web agents through model-based reinforcement learning by relying on imagined rollouts generated by a learned web world model... GSPO optimizes the clipped objective J_GSPO(θ)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- GUI Agents with Reinforcement Learning: Toward Digital Inhabitants
  The paper delivers the first comprehensive overview of RL for GUI agents, organizing methods into offline, online, and hybrid strategies while analyzing trends in rewards, efficiency, and deliberation to outline a fut...