VideoAgentTrek: Computer use pretraining from unlabeled videos, 2025

Lu, D · 2025 · arXiv 2510.19488

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction

cs.CV · 2026-06-28 · unverdicted · novelty 7.0

Introduces the VG-GUIBench benchmark and TASKER, a keyframe extraction algorithm that improves performance on VideoQA and video-guided agentic tasks.

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

cs.AI · 2026-05-25 · conditional · novelty 7.0

CUA-Gym generates 32,112 verified RLVR tuples across 110 mock environments, enabling trained models to reach 62.1% and 72.6% on OSWorld-Verified while transferring to WebArena.

OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks

cs.AI · 2026-06-28 · unverdicted · novelty 6.0

OSWorld 2.0 is a benchmark of 108 realistic long-horizon computer-use tasks where current agents achieve only 20.6% binary completion, struggling with state inference and constraint tracking.

PhoneWorld: Scaling Phone-Use Agent Environments

cs.CL · 2026-05-28 · unverdicted · novelty 6.0

PhoneWorld is a pipeline that converts real mobile trajectories into scalable controllable environments, yielding large gains on four benchmarks when used to supplement training data.

citing papers explorer

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

cs.AI · 2026-05-25 · conditional · none · ref 12

CUA-Gym generates 32,112 verified RLVR tuples across 110 mock environments, enabling trained models to reach 62.1% and 72.6% on OSWorld-Verified while transferring to WebArena.

OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks

cs.AI · 2026-06-28 · unverdicted · none · ref 56

OSWorld 2.0 is a benchmark of 108 realistic long-horizon computer-use tasks where current agents achieve only 20.6% binary completion, struggling with state inference and constraint tracking.
