GAIA: A Data Flywheel System for Training GUI Test-Time Scaling Critic Models

Bin Qin; Jiahui Yang; Jian Luan; Pei Fu; Ruoceng Zhang; Shaojie Zhang; Shaokang Wang; Xiuwen Xi; Ying Huang; Zhenbo Luo

arxiv: 2601.18197 · v2 · pith:BHWLSZW7new · submitted 2026-01-26 · 💻 cs.AI

keywords: critic, agent, models, data, GAIA, performance, test-time

While Large Vision-Language Models (LVLMs) have significantly advanced GUI agents' capabilities in parsing textual instructions, interpreting screen content, and executing tasks, a critical challenge persists: agent operations are often irreversible, so a single erroneous action can trigger catastrophic deviations. To address this, we propose the GUI Action Critic's Data Flywheel System (GAIA), a training framework that gives models iteratively improving critic capabilities, which are used to improve the performance of basic GUI agents through Test-Time Scaling (TTS). Specifically, we first train an Intuitive Critic Model (ICM) on positive and negative action examples from a base agent. This critic evaluates the immediate correctness of the agent's intended actions, so that operations with a higher probability of success are selected. The initial critic then guides agent actions to collect refined positive and negative samples, initiating the self-improving cycle. The augmented data is then used to train a second-round critic with enhanced discernment. We conduct experiments on various datasets and demonstrate that the proposed ICM improves the test-time performance of various closed-source and open-source models, and that performance improves gradually as the data is recycled. The code, dataset, and accompanying datasheet will be publicly released at https://github.com/SeerRay-Lab/GAIA.
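The loop described in the abstract (train a critic on labeled actions, use it to pick among candidate actions at test time, collect the critic-guided picks as new samples, retrain) can be illustrated with a toy sketch. This is not the authors' implementation: the count-based `train_critic`, the helper names, the `GOLD` oracle, and the three-screen environment are all assumptions made up for illustration, and a real critic would be a trained LVLM rather than a lookup table.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str, int]          # (screen state, action, label); label 1 = correct
Critic = Callable[[str, str], float]   # returns a score strictly between 0 and 1


def train_critic(samples: List[Sample]) -> Critic:
    """Toy stand-in for critic training: Laplace-smoothed positive rate per (state, action)."""
    counts: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for state, action, label in samples:
        pos, total = counts.get((state, action), (0, 0))
        counts[(state, action)] = (pos + label, total + 1)

    def score(state: str, action: str) -> float:
        pos, total = counts.get((state, action), (0, 0))
        return (pos + 1) / (total + 2)  # unseen pairs score 0.5

    return score


def select_action(critic: Critic, state: str, candidates: List[str]) -> str:
    """Best-of-N selection: keep the candidate the critic scores highest."""
    return max(candidates, key=lambda action: critic(state, action))


def collect_guided_samples(critic, propose, is_correct, states) -> List[Sample]:
    """One flywheel turn: the critic picks among proposals, and the pick is labeled."""
    samples = []
    for state in states:
        chosen = select_action(critic, state, propose(state))
        samples.append((state, chosen, int(is_correct(state, chosen))))
    return samples


# Hypothetical toy environment: three screens, each with one correct action.
GOLD = {"home": "tap", "list": "scroll", "form": "type"}
ACTIONS = ["scroll", "tap", "type"]


def propose(state: str) -> List[str]:
    """Stand-in for a base agent that proposes every action as a candidate."""
    return list(ACTIONS)


def is_correct(state: str, action: str) -> bool:
    """Stand-in for ground-truth labeling of an action."""
    return GOLD[state] == action


# Round 1: label base-agent proposals on two screens and train the first critic.
seed = [(s, a, int(is_correct(s, a))) for s in ["home", "list"] for a in propose(s)]
critic_v1 = train_critic(seed)

# Round 2: collect critic-guided samples on all screens, then retrain on the augmented data.
guided = collect_guided_samples(critic_v1, propose, is_correct, list(GOLD))
critic_v2 = train_critic(seed + guided)
```

In this toy run the first critic has never seen the `form` screen, so its pick there is labeled as a negative sample, and the second-round critic scores that action lower than before. The sketch only shows how recycling critic-guided samples changes the critic's scores; it says nothing about the gains reported in the paper.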

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment
cs.LG · 2026-05 · unverdicted · novelty 7.0

BBCritic uses contrastive learning to align GUI actions in a continuous affordance space, outperforming larger binary critic models on a new four-level hierarchical benchmark while enabling zero-shot transfer.

Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment
cs.LG · 2026-05 · unverdicted · novelty 7.0

BBCritic reframes GUI critique as continuous semantic alignment via contrastive learning in an affordance space, outperforming larger binary SOTA models on a new four-level hierarchical benchmark without extra annotations.