hub Canonical reference

Weblinx: Real-world website navigation with multi-turn dialogue

Weblinx: Real-world website navigation with multi-turn dialogue , author= · 2024 · arXiv 2402.05930

Canonical reference. 80% of citing Pith papers cite this work as background.

17 Pith papers citing it

Background 80% of classified citations

read on arXiv browse 17 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4 dataset 1

citation-polarity summary

background 4 use dataset 1

representative citing papers

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

cs.AI · 2024-04-11 · accept · novelty 8.0

OSWorld provides the first unified real-computer benchmark for open-ended multimodal agent tasks, exposing large performance gaps between humans and state-of-the-art LLM/VLM agents.

Skim: Speculative Execution for Fast and Efficient Web Agents

cs.AI · 2026-05-15 · unverdicted · novelty 7.0

Skim profiles website patterns offline to enable fast-path speculative execution for web agents, cutting median cost by 1.9x and latency by 33.4% with no accuracy loss on benchmarks.

IndicMedDialog: A Parallel Multi-Turn Medical Dialogue Dataset for Accessible Healthcare in Indic Languages

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

A parallel multi-turn medical dialogue dataset spanning English and nine Indic languages is created from synthetic consultations to enable personalized AI healthcare interactions.

ViBR: Automated Bug Replay from Video-based Reports using Vision-Language Models

cs.SE · 2026-04-21 · unverdicted · novelty 7.0

ViBR reproduces 72% of bugs from video reports by segmenting actions with CLIP similarity and using VLMs for region-aware GUI state comparison, outperforming prior heuristics-based methods.

MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

Open 4B and 8B visual web agents achieve state-of-the-art results on browser benchmarks by predicting actions from screenshots and instructions, outperforming similar open models and some closed larger-model agents, with full release of data and code planned.

EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments

cs.CL · 2025-06-09 · unverdicted · novelty 7.0

EconWebArena is a new benchmark with 360 curated economic tasks across 82 authoritative websites for evaluating multimodal web agents on navigation, grounding, and data extraction.

WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?

cs.LG · 2024-03-12 · unverdicted · novelty 7.0

WorkArena benchmark shows LLM web agents achieve partial success on enterprise tasks but have a substantial gap to full automation and perform worse with open-source models.

OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks

cs.AI · 2026-06-28 · unverdicted · novelty 6.0

OSWorld 2.0 is a benchmark of 108 realistic long-horizon computer-use tasks where current agents achieve only 20.6% binary completion, struggling with state inference and constraint tracking.

VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation

cs.CL · 2026-04-23 · conditional · novelty 6.0

VLAA-GUI adds mandatory visual verifiers, multi-tier loop breakers, and on-demand search to GUI agents, reaching 77.5% on OSWorld and 61.0% on WindowsAgentArena with some models exceeding human performance.

WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces

cs.AI · 2026-03-05 · unverdicted · novelty 6.0

WebChain supplies the largest open dataset of real human web trajectories with triple-modal alignment and a dual mid-training method that separates grounding from planning to improve web agents.

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

cs.CL · 2024-12-05 · conditional · novelty 6.0

Aguvis presents a pure vision-based framework for autonomous GUI agents using structured reasoning via inner monologue, a new multimodal dataset, and two-stage training to reach SOTA on offline and online benchmarks.

WebCanvas: Benchmarking Web Agents in Online Environments

cs.CL · 2024-06-18 · unverdicted · novelty 6.0

WebCanvas creates a dynamic benchmark for web agents with a noise-resistant evaluation metric, the Mind2Web-Live dataset of 542 tasks, and open-source tools and agent framework for ongoing online testing.

When Web Agents Finish but Still Fail: Reproducible Triggers and Trace Diagnostics for Parallel Web Exploration

cs.AI · 2026-06-16 · unverdicted · novelty 5.0

Parallel WebBench reveals GRPO training raises web agent completion to 96% but leaves a large correctness gap from context-bound loops, premature termination, and synthesis collapse.

Toward Native Multimodal Modeling: A Roadmap

cs.CV · 2026-05-25 · unverdicted · novelty 3.0

A roadmap that defines architectural nativity for multimodal models and categorizes them into Multi-to-Text, Multi-to-Target, and Multi-to-Multi types while outlining an industrial pipeline toward unified transformer-based native multimodal modeling.

LLM-Powered AI Agent Systems and Their Applications in Industry

cs.AI · 2025-05-22 · unverdicted · novelty 2.0

A survey categorizing LLM-powered agent systems into software-based, physical, and hybrid types, covering industrial applications and challenges such as latency and security.

Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection

cs.LG · 2026-05-19

ClawEnvKit: Automatic Environment Generation for Claw-Like Agents

cs.AI · 2026-04-20

citing papers explorer

Showing 1 of 1 citing paper after filters.

LLM-Powered AI Agent Systems and Their Applications in Industry cs.AI · 2025-05-22 · unverdicted · none · ref 109
A survey categorizing LLM-powered agent systems into software-based, physical, and hybrid types, covering industrial applications and challenges such as latency and security.

Weblinx: Real-world website navigation with multi-turn dialogue

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer