GUIGuard-Bench is a new benchmark with annotated GUI screenshots that measures privacy recognition, planning fidelity under protection, and utility impact for trajectory-based GUI agents.
hub
Fara-7b: An efficient agentic model for computer use
13 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
years
2026 13representative citing papers
AOI adds keyframe capture, volume-gated audio transcription, and visual narration to computer-use agents, producing +17 to +48 pp gains over screenshot baselines on DynaCU-Bench with no retraining.
HiViG is a test-time critic that combines macro-action history summarization with visual grounding of execution coordinates to reduce short-sighted and visually erroneous actions in long-horizon GUI agents.
Weblica scales RL training for visual web agents by building thousands of reproducible environments through HTTP caching for stable replays and LLM synthesis from real sites, yielding an 8B model that beats similar open baselines on navigation benchmarks.
Open 4B and 8B visual web agents achieve state-of-the-art results on browser benchmarks by predicting actions from screenshots and instructions, outperforming similar open models and some closed larger-model agents, with full release of data and code planned.
OpenWebRL trains a 4B visual web agent with online RL on live sites using 0.4K init trajectories and 2.2K RL tasks to reach 67% success on Online-Mind2Web and 64% on DeepShop, outperforming prior open agents.
A manager-driven DAG decomposition with parallel subagents improves computer use agent success rates by 3.4-25.5% and reduces wall-clock time on long-horizon benchmarks.
LearnWeak specializes small CUAs via weakness detection by a reference agent, targeted task synthesis, and error-aware training, delivering 11+ point gains on OSWorld.
SimGym is a browser-based VLM agent framework that simulates A/B test outcomes on e-commerce storefronts with 77% directional agreement on add-to-cart shifts from real buyer traffic.
ReVision reduces token usage by 46% and improves success rate by 3% on OSWorld, WebTailBench, and AgentNetBench by removing redundant visual patches from 5-history trajectories with Qwen2.5-VL-7B.
Diverse teacher-generated rationales improve MLLM visual persuasiveness prediction via supervised fine-tuning, while a new three-dimensional faithfulness framework shows that prediction accuracy alone does not ensure faithful reasoning and that decision sensitivity best matches human preferences.
The authors built 1,000 synthetic computers and ran long-horizon agent simulations on them, generating experiential data that improves AI performance on both similar and new productivity tasks.
Generalizable agents require environment scaling via diverse executable rule-sets, distinguished from trajectory and task scaling in a new taxonomy.
citing papers explorer
-
Multi-Agent Computer Use
A manager-driven DAG decomposition with parallel subagents improves computer use agent success rates by 3.4-25.5% and reduces wall-clock time on long-horizon benchmarks.