WebArena provides a realistic multi-domain web environment and benchmark where state-of-the-art LLM agents achieve 14.41% end-to-end task success compared to 78.24% for humans.
3 Pith papers cite this work.
citing papers explorer
- WebArena: A Realistic Web Environment for Building Autonomous Agents
  WebArena provides a realistic multi-domain web environment and benchmark where state-of-the-art LLM agents achieve 14.41% end-to-end task success compared to 78.24% for humans.
- Preference Redirection via Attention Concentration: An Attack on Computer Use Agents
  PRAC redirects Computer Use Agents' product selection by concentrating attention on an adversarial patch in the vision input, generalizing from white-box creation to fine-tuned models.
- VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
  The VisualWebArena benchmark demonstrates that state-of-the-art multimodal agents still exhibit significant limitations on visually grounded web tasks.