pith. machine review for the scientific record. sign in

arxiv: 2601.07262 · v3 · submitted 2026-01-12 · 💻 cs.HC

Recognition: unknown

ColorBrowserAgent: Complex Long-Horizon Browser Agent with Adaptive Knowledge Evolution

Authors on Pith no claims yet
classification 💻 cs.HC
keywords agentcolorbrowseragentknowledgelong-horizonautomationbrowserchallengescomplex
0
0 comments X
read the original abstract

With the advancement of vision-language models, web automation has made significant progress. However, deploying autonomous agents in real-world settings remains challenging, primarily due to site heterogeneity, where generalist models lack domain-specific priors for diverse interfaces, and long-horizon instability, characterized by the accumulation of decision drift over extended interactions. To address these challenges, we introduce ColorBrowserAgent (Complex Long-Horizon Browser Agent), a knowledge-evolving agent for robust web automation. Our approach addresses these challenges through two synergistic mechanisms: human-in-the-loop knowledge adaptation that transforms sparse human feedback into reusable domain knowledge, and knowledge-aligned progressive summarization that stabilizes long interactions through memory compression. Extensive experiments on WebArena, WebChoreArena and industrial deployment show that ColorBrowserAgent consistently outperforms strong baselines. It achieves a state-of-the-art success rate of 71.2% on WebArena and maintains 47.4% performance under zero-shot transfer setting on WebChoreArena. In commercial deployment, it improves user satisfaction by 19.3% relatively, verifying its robustness in real-world scenarios.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark

    cs.AI 2026-04 unverdicted novelty 7.0

    WebForge is an automated multi-agent framework that creates realistic and reproducible browser agent benchmarks at scale, demonstrated via a 934-task benchmark that reveals distinct model capability profiles through m...