Mango: Multi-Agent Web Navigation via Global-View Optimization
Pith reviewed 2026-05-10 04:37 UTC · model grok-4.3
The pith
Mango lets web agents choose where to start on a site, using Thompson Sampling over candidate URLs drawn from the site's structure, instead of always beginning at the root.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Mango formulates starting-URL selection as a multi-armed bandit problem and solves it with Thompson Sampling while maintaining an episodic memory of navigation history, allowing a multi-agent system to reach target pages more reliably than root-first baselines on hierarchical websites.
What carries the argument
Multi-armed bandit formulation of URL selection solved by Thompson Sampling, combined with episodic memory that stores and reuses navigation history.
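A minimal sketch of how a Beta-Bernoulli Thompson Sampling loop could spread a navigation budget across candidate starting URLs. The uniform Beta(1, 1) prior, the binary success reward, the simulated success rates, and the names `ThompsonUrlSelector`, `select`, `update`, and `run_budget` are all assumptions made for illustration; this summary does not state the paper's actual reward definition or priors.

```python
import random


class ThompsonUrlSelector:
    """Beta-Bernoulli Thompson Sampling over candidate starting URLs (illustrative)."""

    def __init__(self, urls, seed=None):
        # One Beta(alpha, beta) posterior per URL; Beta(1, 1) is a uniform prior (assumption).
        self.posteriors = {url: [1.0, 1.0] for url in urls}
        self.rng = random.Random(seed)

    def select(self):
        # Draw one sample from each posterior and start from the URL with the largest draw.
        draws = {
            url: self.rng.betavariate(a, b)
            for url, (a, b) in self.posteriors.items()
        }
        return max(draws, key=draws.get)

    def update(self, url, success):
        # A successful navigation attempt raises alpha; a failed one raises beta.
        if success:
            self.posteriors[url][0] += 1.0
        else:
            self.posteriors[url][1] += 1.0


def run_budget(selector, true_rates, budget, rng):
    """Spend `budget` attempts; `true_rates` simulates hidden per-URL success chances."""
    counts = {url: 0 for url in true_rates}
    for _ in range(budget):
        url = selector.select()
        success = rng.random() < true_rates[url]
        selector.update(url, success)
        counts[url] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical URLs and success rates, chosen only to exercise the loop.
    rates = {
        "https://example.com/": 0.05,
        "https://example.com/docs/": 0.10,
        "https://example.com/docs/api/": 0.80,
    }
    selector = ThompsonUrlSelector(rates, seed=0)
    print(run_budget(selector, rates, budget=200, rng=random.Random(1)))
```

In a simulation like this, most of the budget tends to flow to the URL with the highest hidden success rate, while the others still receive occasional attempts. That is the exploration and exploitation trade-off the bandit formulation relies on.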
If this is right
- Agents reach target information within tighter step budgets on sites with deep hierarchies.
- Performance gains hold across both open-source and closed-source language models used as backbones.
- Episodic memory reduces repeated exploration of dead-end branches in subsequent episodes.
- The same bandit-plus-memory loop can be applied to other structured navigation domains that supply extractable candidate entry points.
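One way the episodic memory idea could be sketched: store each navigation attempt, then use the failures to filter out links that have repeatedly led nowhere. The `Episode` and `EpisodicMemory` names, the dead-end rule, and the `min_failures` threshold are hypothetical; the summary does not describe how Mango's memory is actually structured or queried.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Episode:
    """One navigation attempt: where it started, what it visited, how it ended."""
    start_url: str
    visited: list
    success: bool


@dataclass
class EpisodicMemory:
    episodes: list = field(default_factory=list)

    def record(self, start_url, visited, success):
        self.episodes.append(Episode(start_url, list(visited), success))

    def dead_ends(self, min_failures=2):
        # A URL counts as a dead end if it appears in at least `min_failures`
        # failed episodes and in no successful one (the threshold is an assumption).
        failures = defaultdict(int)
        on_success_path = set()
        for ep in self.episodes:
            for url in set(ep.visited):
                if ep.success:
                    on_success_path.add(url)
                else:
                    failures[url] += 1
        return {
            url for url, n in failures.items()
            if n >= min_failures and url not in on_success_path
        }

    def filter_candidates(self, links):
        # Drop links already known to lead nowhere, preserving the input order.
        blocked = self.dead_ends()
        return [link for link in links if link not in blocked]


if __name__ == "__main__":
    memory = EpisodicMemory()
    memory.record("/", ["/", "/blog", "/blog/2019"], success=False)
    memory.record("/", ["/", "/blog", "/blog/2020"], success=False)
    memory.record("/docs", ["/docs", "/docs/api"], success=True)
    print(memory.filter_candidates(["/blog", "/docs", "/about"]))
```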
Where Pith is reading between the lines
- The method could lower overall compute cost for web agents by pruning irrelevant subtrees early.
- It might combine with hierarchical planning layers to handle even larger sites without increasing the bandit arm count.
- If website structure changes frequently, an online update rule for the candidate set would be needed to keep the bandit current.
Load-bearing premise
Candidate starting URLs can be extracted reliably from website structure and Thompson Sampling can spread the navigation budget across them without missing good paths or incurring high overhead.
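To make the premise concrete, here is a minimal sketch of extracting candidate starting URLs from static HTML using only the Python standard library. It is not the paper's extraction procedure: the same-host filter, the shallowest-first ordering, the `max_candidates` cap, and the names `LinkCollector` and `candidate_urls` are assumptions. Because it only reads static markup, links injected by JavaScript would be missed, which is exactly the kind of omission the premise depends on avoiding.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in static HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def candidate_urls(base_url, html, max_candidates=20):
    """Return same-host absolute URLs found in `html`, shallowest paths first."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    seen = set()
    candidates = []
    for href in parser.hrefs:
        # Resolve relative links and strip #fragments so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https") or parsed.netloc != host:
            continue
        if absolute not in seen:
            seen.add(absolute)
            candidates.append(absolute)
    # Order by path depth (an assumed heuristic); the sort is stable, so ties keep page order.
    candidates.sort(key=lambda u: len([p for p in urlparse(u).path.split("/") if p]))
    return candidates[:max_candidates]


if __name__ == "__main__":
    page = '<a href="/docs/api/">API</a> <a href="/docs/">Docs</a> <a href="https://other.org/">Other</a>'
    print(candidate_urls("https://example.com/", page))
```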
What would settle it
On a collection of deep hierarchical sites, measure whether Mango's success rate falls below the best baseline when the extracted candidate URLs omit key branches or when the sampling routine consistently under-allocates budget to the correct path.
Original abstract
Existing web agents typically initiate exploration from the root URL, which is inefficient for complex websites with deep hierarchical structures. Without a global view of the website's structure, agents frequently fall into navigation traps, explore irrelevant branches, or fail to reach target information within a limited budget. We propose Mango, a multi-agent web navigation method that leverages the website structure to dynamically determine optimal starting points. We formulate URL selection as a multi-armed bandit problem and employ Thompson Sampling to adaptively allocate the navigation budget across candidate URLs. Furthermore, we introduce an episodic memory component to store navigation history, enabling the agent to learn from previous attempts. Experiments on WebVoyager demonstrate that Mango achieves a success rate of 63.6% when using GPT-5-mini, outperforming the best baseline by 7.3%. Furthermore, on WebWalkerQA, Mango attains a 52.5% success rate, surpassing the best baseline by 26.8%. We also demonstrate the generalizability of Mango using both open-source and closed-source models as backbones. Our data and code are open-source and available at https://github.com/VichyTong/Mango.
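The abstract does not say whether "by 7.3%" and "by 26.8%" are absolute percentage-point gains or relative improvements. The short sketch below works out the best-baseline success rate that each reading would imply. These are derived values under stated assumptions, not figures reported in the paper, and `implied_baseline` is a hypothetical helper.

```python
def implied_baseline(mango_rate, gain, relative):
    """Best-baseline success rate (%) implied by Mango's rate and the stated gain."""
    if relative:
        # Relative reading: mango = baseline * (1 + gain / 100).
        return round(mango_rate / (1 + gain / 100), 1)
    # Absolute reading: mango = baseline + gain percentage points.
    return round(mango_rate - gain, 1)


# Success rates and gains as quoted in the abstract.
for name, rate, gain in [("WebVoyager", 63.6, 7.3), ("WebWalkerQA", 52.5, 26.8)]:
    print(
        name,
        "absolute reading:", implied_baseline(rate, gain, relative=False),
        "relative reading:", implied_baseline(rate, gain, relative=True),
    )
```

The two readings diverge most on WebWalkerQA, so the paper's results tables are the place to confirm which one is intended.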
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Mango, a multi-agent web navigation framework that extracts candidate starting URLs from website structure, formulates their selection as a multi-armed bandit problem solved with Thompson Sampling to allocate limited navigation budget, and augments the process with episodic memory to retain navigation history. It reports concrete empirical gains: 63.6% success rate on WebVoyager (+7.3% over the strongest baseline) using GPT-5-mini, and 52.5% on WebWalkerQA (+26.8%), with additional experiments demonstrating generalizability across open- and closed-source model backbones.
Significance. If the reported success-rate improvements hold under the described experimental conditions, the work offers a practical mechanism for overcoming root-URL initialization inefficiencies in deep hierarchical websites. The combination of bandit-driven adaptive allocation and episodic memory is a clear engineering contribution, and the open-source release together with ablation tables that isolate each component (URL selection, sampling, memory) strengthens verifiability and potential for follow-on research in web agents.
minor comments (3)
- [§4.2, Table 2] The ablation rows isolate the contribution of Thompson Sampling and episodic memory, but the text does not report the number of random seeds or the variance across runs; adding these would strengthen the claim that the observed deltas are stable.
- [§3.1] The procedure for extracting candidate starting URLs from HTML structure is described at a high level; a short pseudocode block or explicit list of heuristics would improve reproducibility, especially for dynamic or JavaScript-heavy sites.
- [Related Work] Several recent multi-agent web navigation papers (e.g., those using hierarchical planning or graph-based exploration) are cited, but a direct comparison of computational overhead or memory footprint versus Mango is absent; a brief paragraph would help situate the efficiency claims.
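The first comment asks for seed counts and run-to-run variance. A minimal sketch of that reporting, using the standard library, might look as follows. The per-seed numbers are placeholders invented to exercise the function, not results from the paper, and `summarize_runs` is a hypothetical helper.

```python
import statistics


def summarize_runs(success_rates):
    """Mean and sample standard deviation of per-seed success rates (in %)."""
    if len(success_rates) < 2:
        raise ValueError("need at least two seeds to estimate variance")
    return statistics.mean(success_rates), statistics.stdev(success_rates)


# Placeholder per-seed success rates, NOT taken from the paper.
placeholder_runs = [60.0, 62.0, 64.0]
mean, std = summarize_runs(placeholder_runs)
print(f"{mean:.1f} ± {std:.1f} over {len(placeholder_runs)} seeds")
```

Reporting each ablation row in this form would let readers judge whether a delta between rows exceeds the spread across seeds.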
Simulated Author's Rebuttal
We thank the referee for the positive assessment of Mango and the recommendation for minor revision. The recognition of the practical value of bandit-driven adaptive allocation and episodic memory for web navigation is appreciated.
Circularity Check
No significant circularity in derivation chain
The paper presents Mango as an empirical multi-agent web navigation system that extracts candidate starting URLs from site structure, formulates selection as a standard multi-armed bandit problem solved via Thompson Sampling, and augments it with episodic memory. Reported success rates (63.6% on WebVoyager, 52.5% on WebWalkerQA) are benchmark outcomes supported by algorithmic descriptions, ablation tables, and open-source code. No equations, predictions, or uniqueness claims reduce by construction to fitted inputs or self-citations; the derivation chain consists of standard algorithmic choices whose performance is evaluated externally against baselines.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Thompson Sampling balances exploration and exploitation effectively for allocating navigation budget across candidate URLs.