Falcon-UI: Understanding GUI Before Following User Instructions
Three Pith papers cite this work. Polarity classification is still being indexed.
Citation-role and citation-polarity summary
Fields: cs.AI (3)
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Roles: background (1)
Polarities: background (1)
Citing papers
- DRS-GUI: Dynamic Region Search for Training-Free GUI Grounding
  DRS-GUI introduces a dynamic region search method with Focus/Shift/Scatter actions and MCTS-based planning that improves GUI grounding accuracy by 14% on ScreenSpot-Pro for both general-purpose and GUI-specific MLLMs, without any training.
- GUI Agents with Reinforcement Learning: Toward Digital Inhabitants
  The paper delivers the first comprehensive overview of RL for GUI agents, organizing methods into offline, online, and hybrid strategies while analyzing trends in rewards, efficiency, and deliberation to outline a future roadmap.
- How Mobile World Model Guides GUI Agents?
  World models trained on delta text, full text, diffusion images, and renderable code achieve state-of-the-art results on two benchmarks and improve downstream GUI agent performance on three mobile datasets, with each modality showing distinct strengths.