
arxiv: 2604.18779 · v1 · submitted 2026-04-20 · 💻 cs.CL · cs.AI

Mango: Multi-Agent Web Navigation via Global-View Optimization

Pith reviewed 2026-05-10 04:37 UTC · model grok-4.3

keywords: web navigation · multi-agent systems · Thompson Sampling · multi-armed bandit · episodic memory · starting URL selection · global view optimization

The pith

Mango lets web agents pick optimal starting URLs via Thompson Sampling on website structure instead of always beginning at the root.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that web navigation agents waste effort when forced to start at a site's root URL on deep or heavily branched sites. Mango instead extracts candidate starting URLs from the website's overall structure and treats their selection as a multi-armed bandit problem, solved by Thompson Sampling, to spread a limited navigation budget across promising paths. An episodic memory module records past attempts so the agent can avoid repeating mistakes. On WebVoyager, Mango reaches a 63.6% success rate with GPT-5-mini, 7.3% above the best baseline; on WebWalkerQA it reaches 52.5%, 26.8% above the best baseline. The method works with both open-source and closed-source language models as backbones. A reader would care because many real-world information tasks require traversing complex sites without unlimited time or steps.

Core claim

Mango formulates starting-URL selection as a multi-armed bandit problem and solves it with Thompson Sampling while maintaining an episodic memory of navigation history, allowing a multi-agent system to reach target pages more reliably than root-first baselines on hierarchical websites.

What carries the argument

Multi-armed bandit formulation of URL selection solved by Thompson Sampling, combined with episodic memory that stores and reuses navigation history.
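
To make the bandit loop concrete, here is a minimal sketch of Thompson Sampling over candidate starting URLs. It assumes a Beta-Bernoulli model in which each navigation attempt yields a binary success signal. The paper's actual reward definition, priors, and hyperparameters are not given in this summary, so the class name `ThompsonUrlSelector`, the helper `run`, and the `attempt` callback are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional


class ThompsonUrlSelector:
    """Beta-Bernoulli Thompson Sampling over candidate URLs (illustrative only)."""

    def __init__(self, urls: List[str], seed: Optional[int] = None) -> None:
        if not urls:
            raise ValueError("need at least one candidate URL")
        self.rng = random.Random(seed)
        # Assumed uniform Beta(1, 1) prior per arm, stored as [alpha, beta].
        self.posterior: Dict[str, List[float]] = {u: [1.0, 1.0] for u in urls}

    def select(self) -> str:
        """Draw one posterior sample per arm and return the arm with the largest draw."""
        draws = {
            url: self.rng.betavariate(alpha, beta)
            for url, (alpha, beta) in self.posterior.items()
        }
        return max(draws, key=draws.get)

    def update(self, url: str, success: bool) -> None:
        """Add one observed success (alpha) or failure (beta) to the chosen arm."""
        self.posterior[url][0 if success else 1] += 1.0


def run(
    selector: ThompsonUrlSelector,
    attempt: Callable[[str], bool],
    rounds: int,
) -> Dict[str, int]:
    """Spend `rounds` navigation attempts and count how often each URL was chosen."""
    pulls = {url: 0 for url in selector.posterior}
    for _ in range(rounds):
        url = selector.select()
        selector.update(url, attempt(url))
        pulls[url] += 1
    return pulls
```

In a real agent, `attempt` would launch a budgeted navigation episode from the chosen URL and report whether it reached the target. Arms that keep failing see their posterior mass shift toward zero, so later rounds concentrate on the more promising starting points.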

If this is right

  • Agents reach target information within tighter step budgets on sites with deep hierarchies.
  • Performance gains hold across both open-source and closed-source language models used as backbones.
  • Episodic memory reduces repeated exploration of dead-end branches in subsequent episodes.
  • The same bandit-plus-memory loop can be applied to other structured navigation domains that supply extractable candidate entry points.
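
The memory-related consequence in the list above can be illustrated with a small sketch of an episodic store that records navigation attempts and reports pages seen only on failed attempts. The memory schema and retrieval logic used by Mango are not described in this summary, so `Episode`, `EpisodicMemory`, and `dead_ends` are hypothetical names for one possible design.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Tuple


@dataclass(frozen=True)
class Episode:
    """One navigation attempt: where it started, what it visited, how it ended."""
    start_url: str
    visited: Tuple[str, ...]
    success: bool


class EpisodicMemory:
    """Stores episodes grouped by starting URL (illustrative only)."""

    def __init__(self) -> None:
        self._by_start: DefaultDict[str, List[Episode]] = defaultdict(list)

    def record(self, episode: Episode) -> None:
        self._by_start[episode.start_url].append(episode)

    def recall(self, start_url: str) -> List[Episode]:
        """Return a copy of all episodes that began at `start_url`."""
        return list(self._by_start.get(start_url, []))

    def dead_ends(self, start_url: str) -> List[str]:
        """Pages visited only in failed episodes from `start_url`, in first-seen order."""
        episodes = self._by_start.get(start_url, [])
        on_success = {page for ep in episodes if ep.success for page in ep.visited}
        seen = set()
        result: List[str] = []
        for ep in episodes:
            if ep.success:
                continue
            for page in ep.visited:
                if page not in on_success and page not in seen:
                    seen.add(page)
                    result.append(page)
        return result
```

An agent could pass the `dead_ends` list into its next prompt for the same starting URL so that it deprioritizes branches that have only ever led to failure.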

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could lower overall compute cost for web agents by pruning irrelevant subtrees early.
  • It might combine with hierarchical planning layers to handle even larger sites without increasing the bandit arm count.
  • If website structure changes frequently, an online update rule for the candidate set would be needed to keep the bandit current.

Load-bearing premise

Candidate starting URLs can be extracted reliably from website structure and Thompson Sampling can spread the navigation budget across them without missing good paths or incurring high overhead.
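
Because this premise hinges on candidate extraction, a rough sketch of the simplest version may help: collecting same-site links from a single HTML page with the Python standard library. This is not the authors' procedure. The Figure 3 caption indicates that Mango draws candidates from crawling (with a crawl limit τ) and Google Search, neither of which is modeled here. The names `candidate_urls` and `_LinkCollector` and the `limit` parameter are assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects raw href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def candidate_urls(html: str, base_url: str, limit: int = 20) -> List[str]:
    """Return up to `limit` unique same-host http(s) links, in document order."""
    parser = _LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    seen = set()
    result: List[str] = []
    for href in parser.hrefs:
        if len(result) >= limit:
            break
        # Resolve relative links and drop "#fragment" suffixes.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https") or parts.netloc != host:
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

Static parsing like this misses links that JavaScript injects at runtime, which is one way the extracted candidate set could omit key branches.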

What would settle it

On a collection of deep hierarchical sites, measure whether Mango's success rate falls below the best baseline when the extracted candidate URLs omit key branches or when the sampling routine consistently under-allocates budget to the correct path.

Figures

Figures reproduced from arXiv: 2604.18779 by Tianyi Zhang, Weixi Tong, Yifeng Di.

Figure 1: Overview of the web navigation agent system.
Figure 2: Cumulative number of successful tasks rela…
Figure 3: Sensitivity analysis of MANGO regarding key hyperparameters: the navigation budget per URL selection b, the number of Thompson Sampling iterations, the weight parameter κ, the crawl limit τ, and the candidate set sizes of Crawling and Google Search.
Original abstract

Existing web agents typically initiate exploration from the root URL, which is inefficient for complex websites with deep hierarchical structures. Without a global view of the website's structure, agents frequently fall into navigation traps, explore irrelevant branches, or fail to reach target information within a limited budget. We propose Mango, a multi-agent web navigation method that leverages the website structure to dynamically determine optimal starting points. We formulate URL selection as a multi-armed bandit problem and employ Thompson Sampling to adaptively allocate the navigation budget across candidate URLs. Furthermore, we introduce an episodic memory component to store navigation history, enabling the agent to learn from previous attempts. Experiments on WebVoyager demonstrate that Mango achieves a success rate of 63.6% when using GPT-5-mini, outperforming the best baseline by 7.3%. Furthermore, on WebWalkerQA, Mango attains a 52.5% success rate, surpassing the best baseline by 26.8%. We also demonstrate the generalizability of Mango using both open-source and closed-source models as backbones. Our data and code are open-source and available at https://github.com/VichyTong/Mango.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces Mango, a multi-agent web navigation framework that extracts candidate starting URLs from website structure, formulates their selection as a multi-armed bandit problem solved with Thompson Sampling to allocate limited navigation budget, and augments the process with episodic memory to retain navigation history. It reports concrete empirical gains: 63.6% success rate on WebVoyager (+7.3% over the strongest baseline) using GPT-5-mini, and 52.5% on WebWalkerQA (+26.8%), with additional experiments demonstrating generalizability across open- and closed-source model backbones.

Significance. If the reported success-rate improvements hold under the described experimental conditions, the work offers a practical mechanism for overcoming root-URL initialization inefficiencies in deep hierarchical websites. The combination of bandit-driven adaptive allocation and episodic memory is a clear engineering contribution, and the open-source release together with ablation tables that isolate each component (URL selection, sampling, memory) strengthens verifiability and potential for follow-on research in web agents.

minor comments (3)
  1. [§4.2, Table 2] §4.2 and Table 2: the ablation rows isolate the contribution of Thompson Sampling and episodic memory, but the text does not report the number of random seeds or variance across runs; adding these would strengthen the claim that the observed deltas are stable.
  2. [§3.1] §3.1: the procedure for extracting candidate starting URLs from HTML structure is described at a high level; a short pseudocode block or explicit list of heuristics would improve reproducibility, especially for dynamic or JavaScript-heavy sites.
  3. [Related Work] Related Work: several recent multi-agent web navigation papers (e.g., those using hierarchical planning or graph-based exploration) are cited, but a direct comparison of computational overhead or memory footprint versus Mango is absent; a brief paragraph would help situate the efficiency claims.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of Mango and the recommendation for minor revision. The recognition of the practical value of bandit-driven adaptive allocation and episodic memory for web navigation is appreciated.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents Mango as an empirical multi-agent web navigation system that extracts candidate starting URLs from site structure, formulates selection as a standard multi-armed bandit problem solved via Thompson Sampling, and augments it with episodic memory. Reported success rates (63.6% on WebVoyager, 52.5% on WebWalkerQA) are benchmark outcomes supported by algorithmic descriptions, ablation tables, and open-source code. No equations, predictions, or uniqueness claims reduce by construction to fitted inputs or self-citations; the derivation chain consists of standard algorithmic choices whose performance is evaluated externally against baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract provides no explicit free parameters, invented entities, or non-standard axioms; relies on standard assumptions of multi-armed bandits and episodic memory.

axioms (1)
  • Domain assumption: Thompson Sampling balances exploration and exploitation effectively for allocating navigation budget across candidate URLs.
    Invoked by the formulation of URL selection as a bandit problem.

pith-pipeline@v0.9.0 · 5497 in / 1025 out tokens · 43772 ms · 2026-05-10T04:37:42.483006+00:00 · methodology

