{"total":11,"items":[{"citing_arxiv_id":"2604.14668","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Beyond Chat and Clicks: GUI Agents for In-Situ Assistance via Live Interface Transformation","primary_cat":"cs.HC","submitted_at":"2026-04-16T06:22:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"GUI agents can transform live web interfaces in real-time via DOM manipulations to deliver contextual assistance directly within the application.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"augments GUIs with a separate conversational panel that provides on-demand guidance [7, 24, 48]. However, users often struggle to articulate the full interface context in language [53], and must further translate the system's response back into the spatial and interactive structure of the GUI. Recent agentic systems, such as Gemini in Chrome [17] and ChatGPT Atlas [33], can respond to user requests through low-level interface actions such as clicking, typing, and scrolling [22]. However, their performance degrades on complex professional environments involving careful decision-making (lower than 20% accuracy for domain specific workflows [42]). 
Another line of work uses LLMs to dynamically generate task-specific"},{"citing_arxiv_id":"2604.07929","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Same Outcomes, Different Journeys: A Trace-Level Framework for Comparing Human and GUI-Agent Behavior in Production Search Systems","primary_cat":"cs.IR","submitted_at":"2026-04-09T07:49:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A controlled study in an audio-streaming search app shows GUI agents match human task success and query patterns but use more search-centric, low-branching navigation while humans are content-centric and exploratory.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.19750","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Coding with Eyes: Visual Feedback Unlocks Reliable GUI Code Generating and Debugging","primary_cat":"cs.SE","submitted_at":"2026-03-14T05:40:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"VF-Coder raises GUI code success rate from 21.68% to 28.29% and visual score from 0.4284 to 0.5584 on a new 984-task benchmark by adding direct visual perception and interaction.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.10139","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Anonymization-Enhanced Privacy Protection for Mobile GUI Agents: Available but Invisible","primary_cat":"cs.CR","submitted_at":"2026-02-08T15:50:04+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"An anonymization framework replaces sensitive UI content with deterministic placeholders to protect privacy in mobile 
GUI agents while preserving task performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.10371","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management","primary_cat":"cs.AI","submitted_at":"2025-12-11T07:37:38+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AgentProg reframes interaction history as a program with variables and control flow, plus a belief state for partial observability, achieving SOTA success rates on long-horizon GUI benchmarks while baselines degrade.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.24168","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MGA: Memory-Driven GUI Agent for Observation-Centric Interaction","primary_cat":"cs.AI","submitted_at":"2025-10-28T08:19:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MGA is a memory-driven GUI agent that uses an observer for bias-free screen reading and structured memory for compact state transitions to enable efficient long-horizon automation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.02544","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning","primary_cat":"cs.AI","submitted_at":"2025-09-02T17:44:45+00:00","verdict":"CONDITIONAL","verdict_confidence":"UNKNOWN","novelty_score":5.0,"formal_verification":"none","one_line_summary":"UI-TARS-2 reaches 88.2 on Online-Mind2Web, 47.5 on OSWorld, 50.6 on WindowsAgentArena, and 73.3 on AndroidWorld 
while attaining 59.8 mean normalized score on a 15-game suite through multi-turn RL and scalable data generation.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"nature, 518(7540):529-533, 2015. [39] MoonshotAI. Kimi-researcher: End-to-end rl training for emerging agentic capabilities.https://moonshotai. github.io/Kimi-Researcher/, 2025. [40] Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, Ryan Aponte, Yu Xia, et al. Gui agents: A survey.arXiv preprint arXiv:2412.13501, 2024. [41] OpenAI. OpenAI: Introducing ChatGPT, 2022. URLhttps://openai.com/blog/chatgpt. [42] OpenAI. Introducing gpt 5, 2025. URLhttps://openai.com/index/introducing-gpt-5/. [43] OpenAI. Introducing deep research - openai.https://openai.com/index/introducing-deep-research/, 2025. [44] OpenAI. Openai o3 and o4-mini system card. https://cdn.openai.com/pdf/ 2221c875-02dc-4789-800b-e7758f3722c1/o3-and-o4-mini-system-card."},{"citing_arxiv_id":"2507.10610","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI Agents","primary_cat":"cs.CR","submitted_at":"2025-07-13T08:36:09+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LaSM is a layer-wise scaling mechanism that amplifies attention and MLP modules in critical layers to defend GUI agents against pop-up attacks by correcting attention misalignment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.04227","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Mobile GUI Agents under Real-world Threats: Are We There 
Yet?","primary_cat":"cs.CR","submitted_at":"2025-07-06T03:31:36+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces an app-content instrumentation framework and benchmark showing that examined GUI agents suffer 42.0% and 36.1% average misleading rates from third-party content in dynamic and static tests respectively.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.20332","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Mobile-R1: Towards Interactive Capability for VLM-Based Mobile Agent via Systematic Training","primary_cat":"cs.AI","submitted_at":"2025-06-25T11:34:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Mobile-R1 introduces a hierarchical three-stage curriculum that combines format alignment, verifiable action feedback, and multi-turn environment training to improve exploration and self-correction in VLM-based mobile agents, plus a new Chinese GUI dataset and benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2411.18279","ref_index":65,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Large Language Model-Brained GUI Agents: A Survey","primary_cat":"cs.AI","submitted_at":"2024-11-27T12:13:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"MODERATE","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":", [60] A survey of multimodal interaction with AI agents. 
✓ ⃝ Wu et al., [61] A survey of foundations and trend on multimodal mobile agents. ✓ ✓ Wang et al., [62] A survey on the integration of foundation models with GUI agents. ✓ ✓ Gao et al., [63] A survey on autonomous agents across digital platforms. ✓ ✓ Dang et al., [64] A survey on GUI agents. ✓ ✓ Liu et al., [65] A survey on GUI agent on phone automation. ✓ ✓ Hu et al., [66] A survey on MLLM based agents for OS. ✓ ✓ Shi et al., [67] A survey of building trustworthy GUI agents. ✓ ✓ Ning et al., [68] A survey of agents for Web automation. ✓ ✓ Tang et al., [69] A survey of GUI agents powered by (multimodal) LLMs. ✓ ✓ Li and Huang et al., [70] A summary of GUI agents powered by foundation models and enhanced through reinforcement learning ✓ ✓"}],"limit":50,"offset":0}