Mirage-1: Augmenting and updating gui agent with hierarchical multimodal skills.arXiv preprint arXiv:2506.10387,

Xie, Y · 2025 · arXiv 2506.10387

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

representative citing papers

MMSkills: Towards Multimodal Skills for General Visual Agents

cs.AI · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

MMSkills creates compact multimodal skill packages from trajectories and uses a branch-loaded agent to improve visual decision-making on GUI and game benchmarks.

PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records

cs.AI · 2026-01-14 · conditional · novelty 7.0

PersonalAlign introduces a hierarchical memory agent that uses long-term user records to resolve vague GUI instructions and provide proactive assistance, improving execution by 15.7% and proactive performance by 7.3% on the new AndroidIntent benchmark.

MGA: Memory-Driven GUI Agent for Observation-Centric Interaction

cs.AI · 2025-10-28 · unverdicted · novelty 6.0

MGA is a memory-driven GUI agent that uses an observer for bias-free screen reading and structured memory for compact state transitions to enable efficient long-horizon automation.

From Abstraction to Instantiation: Learning Behavioral Representation for Vision-Language-Action Model

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

BehaviorVLA introduces a symmetric encoder-decoder architecture with causal Mamba and phase conditioning to learn unified long-horizon behavioral representations for improved generalization in VLA models.

citing papers explorer

Showing 4 of 4 citing papers.

MMSkills: Towards Multimodal Skills for General Visual Agents cs.AI · 2026-05-13 · unverdicted · none · ref 31 · 2 links
MMSkills creates compact multimodal skill packages from trajectories and uses a branch-loaded agent to improve visual decision-making on GUI and game benchmarks.
PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records cs.AI · 2026-01-14 · conditional · none · ref 4
PersonalAlign introduces a hierarchical memory agent that uses long-term user records to resolve vague GUI instructions and provide proactive assistance, improving execution by 15.7% and proactive performance by 7.3% on the new AndroidIntent benchmark.
MGA: Memory-Driven GUI Agent for Observation-Centric Interaction cs.AI · 2025-10-28 · unverdicted · none · ref 35
MGA is a memory-driven GUI agent that uses an observer for bias-free screen reading and structured memory for compact state transitions to enable efficient long-horizon automation.
From Abstraction to Instantiation: Learning Behavioral Representation for Vision-Language-Action Model cs.CV · 2026-05-21 · unverdicted · none · ref 27
BehaviorVLA introduces a symmetric encoder-decoder architecture with causal Mamba and phase conditioning to learn unified long-horizon behavioral representations for improved generalization in VLA models.

Mirage-1: Augmenting and updating gui agent with hierarchical multimodal skills.arXiv preprint arXiv:2506.10387,

fields

years

verdicts

representative citing papers

citing papers explorer