
arXiv: 2606.07027 · v1 · pith:CCHIQR23 · submitted 2026-06-05 · 💻 cs.AI

StainFlow: Entity-Stain Tracking and Evidence Linking for Process Rewards in GUI Agents

Pith reviewed 2026-06-27 22:22 UTC · model grok-4.3

classification 💻 cs.AI
keywords: GUI agents · process reward models · reinforcement learning · entity tracking · stain flow · visual evidence · AndroidWorld

The pith

StainFlow tracks entity stains to provide objective process rewards for GUI agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to improve process reward models for reinforcement learning in GUI agents by replacing subjective global milestones and fixed local windows with tracking of entity stain concentrations and states. This approach uses visual changes in task entities to objectively divide trajectories into phases and link relevant evidence for verification. A sympathetic reader would care because sparse feedback currently limits effective credit assignment in long-horizon tasks with multiple valid paths, and better rewards could lead to more capable digital agents. Experiments on AndroidWorld and OGRBench report relative gains in both online RL success and trajectory-completion judgment accuracy when these modules are applied.

Core claim

StainFlow introduces two modules. The Global Entity Stain Tracking module extracts visually verifiable task entities and monitors how their stain concentrations and states evolve, so that task phases can be separated objectively. The Local Stain Evidence Linking module centers on the triggering entities of each candidate key node, retrieves the steps stained by those entities, and builds high-density evidence windows for verification.
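The abstract gives no formulas for global tracking, so the following is only a hypothetical reading of it, not the paper's algorithm. It assumes each step reports which entities it touched and their visible state afterwards. A touched entity is re-stained at concentration 1, every stain otherwise decays by a per-entity factor (echoing the decay factors γ_i for persistent and transient entities named in Figure 3's caption), and a new phase opens whenever some entity's visible state changes. `Step`, `StainTracker`, and `segment_phases` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    # Hypothetical per-step observation: entity name -> visible state after the step.
    touched: dict[str, str]


@dataclass
class StainTracker:
    # Hypothetical decay factor per entity (close to 1 for persistent, smaller for transient).
    decay: dict[str, float]
    concentration: dict[str, float] = field(default_factory=dict)
    state: dict[str, str] = field(default_factory=dict)

    def update(self, step: Step) -> bool:
        """Apply one step; return True if any entity changed its visible state."""
        # Every existing stain decays before the new step is applied.
        for name in self.concentration:
            self.concentration[name] *= self.decay.get(name, 1.0)
        changed = False
        for name, new_state in step.touched.items():
            # Touching an entity re-stains it at full concentration.
            self.concentration[name] = 1.0
            if self.state.get(name) != new_state:
                self.state[name] = new_state
                changed = True
        return changed


def segment_phases(steps: list[Step], decay: dict[str, float]) -> list[list[int]]:
    """Split step indices into phases, opening a new phase at each state change."""
    tracker = StainTracker(decay=decay)
    phases: list[list[int]] = []
    for i, step in enumerate(steps):
        # update() runs first, so stains are tracked on every step.
        if tracker.update(step) or not phases:
            phases.append([])
        phases[-1].append(i)
    return phases


# Toy trajectory with hypothetical entity names.
demo = [
    Step({"settings_app": "open"}),
    Step({}),
    Step({"wifi_toggle": "off"}),
    Step({"wifi_toggle": "off"}),  # re-touched, but no state change
    Step({"wifi_toggle": "on"}),
]
print(segment_phases(demo, {"settings_app": 1.0, "wifi_toggle": 0.5}))
# [[0, 1], [2, 3], [4]]
```

In this sketch, phase boundaries come only from state changes, while the decaying concentrations are kept for the local linking stage, where they indicate which steps are still "stained" by an entity.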

What carries the argument

The entity-stain tracking mechanism, which follows how the stains on task entities change along a trajectory and uses those changes to separate phases and link evidence.

If this is right

  • Online RL for GUI agents achieves a relative 3.2% improvement in success rate.
  • Accuracy in judging trajectory completion improves by a relative 1.8% on the tested benchmarks.
  • Multiple valid execution paths can be accommodated through objective evidence flows rather than a single fixed decomposition.
  • Local verification no longer misses long-range evidence or dilutes the decision signal with irrelevant frames.
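Both headline numbers are relative gains, which are easy to misread as percentage points. The small helper below makes the distinction concrete; the 50% baseline is a hypothetical value chosen for illustration, not a number from the paper.

```python
def relative_gain(baseline: float, new: float) -> float:
    """Relative improvement of `new` over `baseline`, as a fraction."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (new - baseline) / baseline


def apply_relative_gain(baseline: float, rel: float) -> float:
    """Score implied by a relative gain `rel` over `baseline`."""
    return baseline * (1.0 + rel)


# Hypothetical 50% baseline success rate (not from the paper).
implied = apply_relative_gain(0.50, 0.032)
print(f"{implied:.3f}")  # 0.516
```

On that hypothetical baseline, a relative 3.2% gain corresponds to 51.6% success, i.e. 1.6 percentage points; the absolute gap shrinks further on lower baselines.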

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The stain tracking idea could apply to other sequential decision tasks with visual states if entity extraction generalizes.
  • Agents might learn to plan by anticipating stain state transitions without explicit human-defined milestones.
  • Combining this with existing PRMs could create hybrid rewards that balance global flow and local details.

Load-bearing premise

Changes in visually extracted entity stain concentrations and states can objectively separate task phases and construct high-density evidence windows without introducing new subjectivity or missing long-range dependencies.

What would settle it

A set of GUI trajectories where stain concentration changes fail to mark actual task progress points, causing the reward model to assign high rewards to failing paths or low rewards to successful ones.
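The falsification test above can be made concrete as a ranking check: given trajectories with ground-truth outcomes, count how often the reward model scores a failing trajectory at least as high as a successful one. The function name and toy scores below are hypothetical; without ties, the returned rate equals 1 minus the ROC AUC of the scores.

```python
from itertools import product


def inversion_rate(scores: list[float], succeeded: list[bool]) -> float:
    """Fraction of (success, failure) pairs where the failure scores >= the success.

    0.0 means every successful trajectory outranks every failing one;
    values around 0.5 mean the scores carry roughly no ranking signal.
    """
    if len(scores) != len(succeeded):
        raise ValueError("scores and succeeded must have the same length")
    pos = [s for s, ok in zip(scores, succeeded) if ok]
    neg = [s for s, ok in zip(scores, succeeded) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one successful and one failing trajectory")
    # Ties count against the reward model.
    bad = sum(1 for p, n in product(pos, neg) if n >= p)
    return bad / (len(pos) * len(neg))


# Toy scores: one failing trajectory (0.85) outranks one success (0.8).
print(inversion_rate([0.9, 0.8, 0.3, 0.85], [True, True, False, False]))  # 0.25
```

A stain-based reward that tracked real progress should keep this rate low on trajectories where the stains and the true outcome disagree most.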

Figures

Figures reproduced from arXiv: 2606.07027 by Dongshuo Huang, Haojie Hao, Hongyu Lin, Jiakai Wang, Lanqing Hong, Longkun Hao, Xianglong Liu, Yan Bai, Yihang Lou, Zhenyang Li, Zhichao Yang.

Figure 1. Our proposed StainFlow dynamically and objectively partitions task stages and introduces …
Figure 2. Overall workflow of StainFlow, where Global Entity Stain Tracking and Local Stain …
Figure 3. Effects of advantage weight η and stain decay factors γ_i for persistent and transient entities.
Figure 4. StainFlow training reward curves and evidence span ranges.
Figure 5. StainFlow staining and reward assignment example for a mobile GUI trajectory. Nodes …
Figure 6. StainFlow staining and reward assignment example for a desktop GUI trajectory. Nodes …
Original abstract

Reinforcement Learning (RL) has become a promising approach for improving GUI Agents in long-horizon, stochastic digital environments, but trajectory-level success feedback is too sparse to provide reliable credit assignment for intermediate exploration steps. To mitigate this issue, recent studies introduce Process Reward Models (PRMs), which provide finer-grained training feedback through global milestone verification or local step-level evaluation. However, these methods still suffer from two level-specific limitations: global milestone decomposition is subjective and singular, making it difficult to accommodate the multiple valid execution paths in real GUI tasks, while fixed local judging windows may miss long-range key evidence or dilute the decision signal with irrelevant frames. Inspired by stain-tracing mechanisms in network flow analysis, we propose StainFlow, an entity-stain-flow process reward model for GUI Agents. To reduce the subjectivity of global partitioning, we introduce the Global Entity Stain Tracking module, which extracts visually verifiable task entities and tracks how their stain concentrations and states evolve along the trajectory, allowing task phases to be objectively separated by changes in the entity evidence flow. To improve the accuracy of local verification, we introduce the Local Stain Evidence Linking module. Centered on the triggering entities of each candidate key node, it retrieves relevant steps based on their stain concentrations and state changes, and dynamically constructs high-density evidence windows for verifying true key nodes. Extensive experiments on AndroidWorld and OGRBench show that StainFlow relatively improves online RL success by 3.2% and trajectory completion judgment accuracy by 1.8%.
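The abstract contrasts fixed local judging windows with dynamically linked evidence windows but does not give the retrieval rule. The sketch below is one hypothetical rule, not the paper's: around candidate key node `k`, keep every step where a triggering entity changed state (at any distance, which is what lets long-range evidence in) or is still stained above a threshold `tau`, plus a short `lookahead` for a later confirmation screen. `tau`, `lookahead`, and the input layout are assumptions.

```python
def fixed_window(k: int, width: int) -> list[int]:
    """Baseline: the `width` consecutive steps ending at candidate node k."""
    return list(range(max(0, k - width + 1), k + 1))


def evidence_window(
    k: int,
    triggers: set[str],
    concentration: list[dict[str, float]],
    state_changed: list[set[str]],
    tau: float = 0.5,
    lookahead: int = 1,
) -> list[int]:
    """Collect steps linked to the triggering entities of candidate node k.

    concentration[j]: hypothetical per-entity stain concentration at step j.
    state_changed[j]: entities whose visible state changed at step j.
    """
    last = min(len(concentration) - 1, k + lookahead)
    window = {k}
    for j in range(last + 1):
        changed = triggers & state_changed[j]  # non-empty set is truthy
        stained = any(concentration[j].get(e, 0.0) >= tau for e in triggers)
        if changed or stained:
            window.add(j)
    return sorted(window)


# Toy 8-step trajectory; candidate key node k = 6 re-touches "file_name".
conc = [{}, {"file_name": 1.0}, {"file_name": 0.5}, {"file_name": 0.25},
        {"file_name": 0.125, "menu": 1.0}, {"file_name": 0.0625},
        {"file_name": 1.0}, {"file_name": 0.5}]
changes = [set(), {"file_name"}, set(), set(), {"menu"}, set(), {"file_name"}, set()]
print(fixed_window(6, 3))                                # [4, 5, 6]
print(evidence_window(6, {"file_name"}, conc, changes))  # [1, 2, 6, 7]
```

In the toy case the fixed window spends its budget on an unrelated menu step and misses where `file_name` was first set, while the linked window is non-contiguous and reaches back to step 1.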

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes StainFlow, a process reward model for GUI agents that draws on network flow stain-tracing. The Global Entity Stain Tracking module extracts visually verifiable task entities and tracks their stain concentrations and states to separate task phases more objectively than subjective global milestone decomposition. The Local Stain Evidence Linking module uses stain concentrations and state changes around triggering entities to dynamically build high-density evidence windows, addressing limitations of fixed local judging windows. Experiments on AndroidWorld and OGRBench report relative gains of 3.2% in online RL success and 1.8% in trajectory completion judgment accuracy.

Significance. If the empirical claims hold under detailed scrutiny, the work addresses a genuine limitation in process supervision for long-horizon GUI agents by attempting to replace subjective partitioning with evidence-flow tracking. The network-flow analogy is a distinctive framing. The reported gains are modest, so the primary value would lie in whether the stain-based partitioning demonstrably reduces reliance on human-designed heuristics while preserving coverage of multiple valid paths.

major comments (3)
  1. [Global Entity Stain Tracking module] (abstract and method description): the claim that phases are 'objectively separated by changes in the entity evidence flow' rests on extraction of 'visually verifiable task entities' and quantification of 'stain concentrations and states,' yet no formal definition, selection criteria, threshold, or algorithmic procedure is supplied. This directly undermines the central assertion that the approach reduces subjectivity relative to prior global milestone methods.
  2. [Local Stain Evidence Linking module] (abstract and method description): the module 'retrieves relevant steps based on their stain concentrations and state changes' and 'dynamically constructs high-density evidence windows,' but the precise retrieval rule, handling of sparse or noisy stain signals, and mechanism for capturing long-range dependencies are unspecified. These details are load-bearing for the claimed improvement over fixed local windows.
  3. [Experiments] Experimental results (abstract): the headline figures (3.2% relative RL success improvement, 1.8% judgment accuracy improvement) are stated without reference to any table, baseline methods, number of runs, statistical tests, or ablation isolating the contribution of each module. Without these, the numerical claims cannot be evaluated against the method's stated advantages.
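Comment 3 asks for the number of runs and statistical tests behind the headline gains. One standard option, shown here as an illustration rather than what the paper reports, is a paired percentile bootstrap over tasks: resample tasks with replacement and recompute the success-rate difference between the two methods on the same tasks. The function name and defaults are hypothetical.

```python
import random


def bootstrap_diff_ci(
    base: list[int],
    new: list[int],
    n_boot: int = 10_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float, float]:
    """Paired percentile-bootstrap CI for the success-rate difference new - base.

    base[i], new[i]: 1/0 outcome of the same task i under each method.
    Returns (observed difference, lower bound, upper bound).
    """
    if len(base) != len(new) or not base:
        raise ValueError("need equal-length, non-empty paired outcomes")
    n = len(base)
    observed = (sum(new) - sum(base)) / n
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample whole tasks so each task keeps its paired outcomes.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(new[i] - base[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return observed, lo, hi
```

Pairing by task matters because task difficulty varies far more than the method effect; if the interval comfortably excludes zero, the gain is unlikely to be resampling noise, though repeated RL training seeds would still be needed to capture run-to-run variance.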
minor comments (1)
  1. [Abstract] The abstract would be clearer if it explicitly referenced the tables or figures that report the 3.2% and 1.8% figures and listed the exact baselines used.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. Below we provide point-by-point responses to the major comments, indicating where revisions will be made to address the concerns.

  1. Referee: [Global Entity Stain Tracking module] (abstract and method description): the claim that phases are 'objectively separated by changes in the entity evidence flow' rests on extraction of 'visually verifiable task entities' and quantification of 'stain concentrations and states,' yet no formal definition, selection criteria, threshold, or algorithmic procedure is supplied. This directly undermines the central assertion that the approach reduces subjectivity relative to prior global milestone methods.

    Authors: We agree with the referee that the method description lacks a formal definition, selection criteria, threshold, or algorithmic procedure for the Global Entity Stain Tracking module. This is a valid point, and we will revise the manuscript to supply these details, including a mathematical formulation for stain concentrations and states, criteria for selecting visually verifiable entities, and the procedure for detecting changes in entity evidence flow to separate phases.

    Revision: yes

  2. Referee: [Local Stain Evidence Linking module] (abstract and method description): the module 'retrieves relevant steps based on their stain concentrations and state changes' and 'dynamically constructs high-density evidence windows,' but the precise retrieval rule, handling of sparse or noisy stain signals, and mechanism for capturing long-range dependencies are unspecified. These details are load-bearing for the claimed improvement over fixed local windows.

    Authors: We agree that the precise retrieval rule, handling of sparse or noisy signals, and long-range dependency mechanism for the Local Stain Evidence Linking module are not fully specified. We will revise to include these details, such as the rule for selecting steps based on stain concentration thresholds, use of smoothing for noisy signals, and propagation for long-range links.

    Revision: yes

  3. Referee: [Experiments] Experimental results (abstract): the headline figures (3.2% relative RL success improvement, 1.8% judgment accuracy improvement) are stated without reference to any table, baseline methods, number of runs, statistical tests, or ablation isolating the contribution of each module. Without these, the numerical claims cannot be evaluated against the method's stated advantages.

    Authors: We acknowledge that the abstract states the headline figures without referencing tables, baselines, runs, tests, or ablations. The experimental section of the manuscript contains supporting results, but to improve clarity we will revise the abstract to include a reference to the results section and ensure all supporting information is explicitly linked in the text. Space constraints in the abstract limit full inclusion of every detail.

    Revision: partial
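The second response promises concentration thresholds and smoothing for noisy stain signals without saying which. One plausible combination, assumed here purely for illustration, is an exponential moving average followed by hysteresis thresholding (separate "on" and "off" thresholds), so a single noisy frame neither opens nor closes an evidence span. `ema`, `active_steps`, and the parameters `alpha`, `on`, and `off` are hypothetical.

```python
def ema(xs: list[float], alpha: float) -> list[float]:
    """Exponential moving average; alpha in (0, 1], larger means less smoothing."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    out: list[float] = []
    for x in xs:
        out.append(x if not out else alpha * x + (1 - alpha) * out[-1])
    return out


def active_steps(
    xs: list[float], alpha: float = 0.5, on: float = 0.6, off: float = 0.3
) -> list[bool]:
    """Hysteresis on the smoothed signal: switch on at >= `on`, off below `off`."""
    if off > on:
        raise ValueError("off threshold must not exceed on threshold")
    active, flags = False, []
    for s in ema(xs, alpha):
        if not active and s >= on:
            active = True
        elif active and s < off:
            active = False
        flags.append(active)
    return flags


# Raw stain signal with a one-frame dropout at step 3.
print(active_steps([0, 1, 1, 0, 1, 0, 0, 0]))
# [False, False, True, True, True, True, False, False]
```

The dropout at step 3 is bridged because the smoothed value (0.375) stays above the "off" threshold; a single threshold of 0.6 would have split the span in two.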

Circularity Check

0 steps flagged

No circularity: heuristic analogy with empirical validation, no equations or self-referential reductions


The paper introduces StainFlow via an external analogy to stain-tracing in network flow analysis and describes two modules (Global Entity Stain Tracking, Local Stain Evidence Linking) that operate on visually extracted entities and their state changes. No equations, parameter fits, predictions, or derivations appear in the provided text. Claims of reduced subjectivity and improved accuracy rest on empirical results from AndroidWorld and OGRBench rather than any mathematical reduction to inputs. No self-citations are invoked as load-bearing uniqueness theorems. The method is therefore self-contained as a proposed heuristic without circular derivation steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Only the abstract is available; no explicit free parameters, axioms, or invented entities beyond the high-level description of stain concentrations are stated.

invented entities (1)
  • Entity stain concentration and state (no independent evidence)
    purpose: To provide an objective signal for separating task phases and linking evidence windows.
    Introduced in the description of the Global Entity Stain Tracking module as the core tracking quantity.

pith-pipeline@v0.9.1-grok · 5838 in / 1288 out tokens · 25795 ms · 2026-06-27T22:22:23.939617+00:00 · methodology

