arXiv preprint arXiv:2407.05291 (2024)


5 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Beyond the All-in-One Agent: Benchmarking Role-Specialized Multi-Agent Collaboration in Enterprise Workflows

cs.MA · 2026-05-09 · unverdicted · novelty 7.0

EntCollabBench shows that today's LLM agents still struggle with delegation, context transfer, parameter grounding, workflow closure, and decision commitment when tested in a simulated enterprise with 11 role-specialized agents.

MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

Open 4B and 8B visual web agents achieve state-of-the-art results on browser benchmarks by predicting actions from screenshots and instructions, outperforming similar open models and some closed larger-model agents, with full release of data and code planned.

Signal-Driven Observation for Long-Horizon Web Agents

cs.CL · 2026-06-04 · unverdicted · novelty 6.0

Signal-Driven Observation decouples observation from action frequency in long-horizon web agents by invoking selective task-relevant DOM reads only on signals such as URL changes or action failures.

VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation

cs.CL · 2026-04-23 · conditional · novelty 6.0

VLAA-GUI adds mandatory visual verifiers, multi-tier loop breakers, and on-demand search to GUI agents, reaching 77.5% on OSWorld and 61.0% on WindowsAgentArena with some models exceeding human performance.

FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks

cs.AI · 2025-05-26

citing papers explorer


Beyond the All-in-One Agent: Benchmarking Role-Specialized Multi-Agent Collaboration in Enterprise Workflows cs.MA · 2026-05-09 · unverdicted · none · ref 5
MolmoWeb: Open Visual Web Agent and Open Data for the Open Web cs.CV · 2026-04-09 · unverdicted · none · ref 69
Signal-Driven Observation for Long-Horizon Web Agents cs.CL · 2026-06-04 · unverdicted · none · ref 9
VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation cs.CL · 2026-04-23 · conditional · none · ref 23
FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks cs.AI · 2025-05-26 · unreviewed · ref 4