Mirage-1: Augmenting and updating GUI agent with hierarchical multimodal skills. arXiv preprint arXiv:2506.10387.
4 Pith papers cite this work.
citing papers explorer
- MMSkills: Towards Multimodal Skills for General Visual Agents
  MMSkills creates compact multimodal skill packages from trajectories and uses a branch-loaded agent to improve visual decision-making on GUI and game benchmarks.
- PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records
  PersonalAlign introduces a hierarchical memory agent that uses long-term user records to resolve vague GUI instructions and provide proactive assistance, improving execution by 15.7% and proactive performance by 7.3% on the new AndroidIntent benchmark.
- MGA: Memory-Driven GUI Agent for Observation-Centric Interaction
  MGA is a memory-driven GUI agent that uses an observer for bias-free screen reading and structured memory for compact state transitions to enable efficient long-horizon automation.
- From Abstraction to Instantiation: Learning Behavioral Representation for Vision-Language-Action Model
  BehaviorVLA introduces a symmetric encoder-decoder architecture with causal Mamba and phase conditioning to learn unified long-horizon behavioral representations for improved generalization in VLA models.