MobileForge: Annotation-Free Adaptation for Mobile GUI Agents with Hierarchical Feedback-Guided Policy Optimization

Congxiao Liu; Gao Wu; Guangyi Liu; Liang Guo; Liang Liu; Mading Li; Mengyan Wang; Pengxiang Zhao; Yiwen Yin; Yong Liu

arxiv: 2606.19930 · v1 · pith:5THPWALWnew · submitted 2026-06-18 · 💻 cs.HC


Guangyi Liu, Pengxiang Zhao, Gao Wu, Yiwen Yin, Mading Li, Liang Liu, Congxiao Liu, Zhang Qi, Mengyan Wang, Liang Guo, Yong Liu


Pith reviewed 2026-06-26 15:56 UTC · model grok-4.3

classification 💻 cs.HC

keywords: mobile GUI agents · annotation-free adaptation · hierarchical policy optimization · MLLM · AndroidWorld · GRPO · MobileGym


The pith

MobileForge adapts mobile GUI agents to new apps using only automatically generated data and hierarchical feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that MLLM-based mobile GUI agents can be adapted to numerous real target apps without human-written tasks, demonstrations, or reward labels. MobileForge introduces MobileGym to ground automatic task generation and rollout evaluation in actual app interactions, paired with Hierarchical Feedback-Guided Policy Optimization (HiFPO), which converts trajectory outcomes, step-level process feedback, and corrective hints into hint-contextualized GRPO updates. This yields 67.2% Pass@3 on AndroidWorld for an adapted Qwen3-VL-8B, close to the 69.0% of the closed-data GUI-specialized GUI-Owl-1.5-8B base model. The MobileForge-adapted ForgeOwl-8B goes further, reaching 77.6% Pass@3 on AndroidWorld and 41.0% success on the out-of-domain MobileWorld GUI-only split. The result matters because manual annotation costs have blocked scaling agents across numerous, frequently updated mobile apps.

Core claim

MobileForge is an annotation-free adaptation system for mobile GUI agents. It consists of MobileGym, which grounds task generation and rollout evaluation in real mobile app interaction, and Hierarchical Feedback-Guided Policy Optimization (HiFPO), which turns trajectory outcomes, step-level process feedback, and corrective hints into hint-contextualized step-level GRPO updates. Using only automatically generated adaptation data, it adapts Qwen3-VL-8B to 67.2% Pass@3 on AndroidWorld, and the adapted ForgeOwl-8B reaches 77.6% Pass@3 on AndroidWorld and 41.0% success on the out-of-domain MobileWorld GUI-only split.

What carries the argument

Hierarchical Feedback-Guided Policy Optimization (HiFPO), which converts automatically generated trajectory outcomes, step-level process feedback, and corrective hints into hint-contextualized step-level GRPO updates.
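For readers unfamiliar with GRPO, the following is a minimal sketch of the group-relative advantage step that GRPO-style methods generally use: each rollout's reward is normalized against the other rollouts sampled for the same prompt. It is not the authors' implementation. The function name, the example rewards, and the use of the population standard deviation are assumptions made for illustration, and the page does not say how HiFPO forms the rewards that enter this step.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other rollouts in the same group.

    Assumption: population standard deviation; real implementations differ
    in this detail and in how they guard against zero variance.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of the same step under the same
# hint context (values invented for illustration).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When every rollout in a group earns the same reward, all advantages are zero and the group contributes no gradient, which is one reason finer-grained step feedback can matter for this family of methods.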

If this is right

Base MLLMs can reach performance near closed-data GUI-specialized models on AndroidWorld using only automatic adaptation data.
The adapted ForgeOwl-8B achieves 77.6% Pass@3 on AndroidWorld and 41.0% success on the out-of-domain MobileWorld GUI-only split.
Adaptation becomes feasible for many apps because MobileGym supplies the full substrate of task generation, rollout, and feedback without manual labels.
Policy optimization shifts from isolated rollouts and coarse rewards to hint-contextualized step-level GRPO updates.
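The Pass@3 figures above can be read with a small sketch of the metric. This page does not define Pass@3, so the code assumes the common reading that a task counts as solved if at least one of its first three attempts succeeds. The function name and the task results are hypothetical.

```python
def pass_at_k(attempts_by_task, k=3):
    """Percentage of tasks solved in at least one of the first k attempts.

    Assumption: this "any of k attempts" reading matches the paper's Pass@3.
    """
    if not attempts_by_task:
        raise ValueError("no tasks to score")
    solved = sum(any(attempts[:k]) for attempts in attempts_by_task.values())
    return 100.0 * solved / len(attempts_by_task)


# Invented outcomes for four tasks, three attempts each.
results = {
    "task_a": [False, True, False],
    "task_b": [False, False, False],
    "task_c": [True, True, True],
    "task_d": [False, False, True],
}
score = pass_at_k(results)
```

Under this reading, Pass@3 is never lower than single-attempt success on the same tasks, so the two kinds of number should not be compared directly.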

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same automatic feedback loop could be tested on non-mobile GUI environments or web agents where app state is similarly observable.
If step-level hints prove reliable, the approach might reduce reliance on expensive human preference data in other agent training pipelines.
Scaling MobileGym across more apps could create large open adaptation datasets that further close the gap to closed models.

Load-bearing premise

Trajectory outcomes, step-level process feedback, and corrective hints can be generated automatically inside MobileGym and converted into reliable step-level GRPO improvement signals without human-written tasks, demonstrations, or reward labels.
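To make the premise concrete, here is one hypothetical way a judge verdict could be turned into per-step rewards. The field names decision, reasonable_steps, and unreasonable_steps come from the evaluator prompt fragment quoted among the extracted references on this page. Everything else is an assumption for illustration: the function name, the equal weighting, the neutral value for unlabeled steps, and 1-indexed steps. The paper's actual conversion rule is not described here.

```python
import json


def step_rewards(judge_json, num_steps, outcome_weight=0.5, step_weight=0.5):
    """Blend a trajectory-level outcome with step-level labels.

    Hypothetical scheme: the weights and the 0.5 neutral value are
    illustrative choices, not values taken from the paper.
    """
    verdict = json.loads(judge_json)
    outcome = float(verdict["decision"])  # 1 = task completed, 0 = failed
    good = set(verdict.get("reasonable_steps", []))
    bad = set(verdict.get("unreasonable_steps", []))
    rewards = []
    for step in range(1, num_steps + 1):  # steps assumed to be 1-indexed
        if step in good:
            process = 1.0
        elif step in bad:
            process = 0.0
        else:
            process = 0.5  # step not labeled by the judge
        rewards.append(outcome_weight * outcome + step_weight * process)
    return rewards


example = (
    '{"decision": 0, "failure_step": 4, '
    '"reasonable_steps": [1, 2, 4], "unreasonable_steps": [3]}'
)
rewards = step_rewards(example, num_steps=4)
```

In this toy scheme a failed trajectory still credits its reasonable steps, which mirrors the prompt's note that failed trajectories may contain reasonable steps.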

What would settle it

An experiment that replaces MobileGym's automatic feedback with equivalent human-generated step-level hints and tests whether the resulting GRPO updates produce weaker or stronger policy gains on the same base model and test split.

Original abstract

MLLM-based mobile GUI agents have made substantial progress in UI understanding and action execution, but adapting them to real target apps remains costly because mobile apps are numerous, frequently updated, and hard to cover with human-written tasks, demonstrations, or reward labels. Existing annotation-free GUI learning reduces manual supervision, yet lacks a unified substrate connecting target-app exploration, curriculum mining, rollout execution, and feedback, while policy optimization often relies on isolated rollouts and coarse rewards that are hard to convert into reliable improvement signals. We present MobileForge, an annotation-free adaptation system for mobile GUI agents. MobileForge consists of MobileGym, which grounds task generation and rollout evaluation in real mobile app interaction, and Hierarchical Feedback-Guided Policy Optimization (HiFPO), which turns trajectory outcomes, step-level process feedback, and corrective hints into hint-contextualized step-level GRPO updates. Using only automatically generated annotation-free adaptation data, MobileForge adapts Qwen3-VL-8B to 67.2% Pass@3 on AndroidWorld, close to the closed-data GUI-specialized GUI-Owl-1.5-8B base model at 69.0%. The MobileForge-adapted ForgeOwl-8B further reaches 77.6% Pass@3 on AndroidWorld and 41.0% success on the out-of-domain MobileWorld GUI-only split, establishing the strongest open-data mobile GUI agent in our evaluation. Code, data, and trained models will be released at https://mobile-forge.github.io/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MobileForge shows you can adapt GUI agents to new apps without annotations by running a gym for task generation and using hierarchical feedback in GRPO, and the reported numbers on AndroidWorld and MobileWorld are competitive.


The main advance is the end-to-end system that combines MobileGym for automatic task generation, rollout, and evaluation inside real apps with HiFPO, which converts trajectory outcomes, step feedback, and hints into contextualized GRPO updates. This produces an adapted Qwen3-VL-8B at 67.2% Pass@3 on AndroidWorld, close to the closed-data GUI-Owl-1.5-8B at 69.0%, and the further ForgeOwl-8B version reaches 77.6% Pass@3 there plus 41.0% success on the out-of-domain MobileWorld GUI-only split. The plan to release code, data, and models is useful for the subfield.

The work is grounded in external benchmarks rather than self-generated metrics, which helps. The hierarchical feedback angle looks like a reasonable way to get finer signals than coarse rewards alone.

The central assumption is that MobileGym can reliably produce step-level process feedback and corrective hints without human input. If the full paper shows concrete extraction rules, ablations on each feedback type, and controls for noise in those signals, the claim holds; otherwise the performance edge could trace to implementation details rather than the method. The abstract leaves those mechanics high-level, so the paper needs to make the pipeline reproducible.

This is worth a serious referee for groups building mobile or GUI agents who care about lowering annotation costs. The empirical results address a practical bottleneck even if the feedback quality needs tighter validation.

Referee Report

2 major / 0 minor

Summary. The manuscript presents MobileForge, an annotation-free adaptation system for MLLM-based mobile GUI agents. It consists of MobileGym, which grounds task generation and rollout evaluation in real mobile app interactions, and Hierarchical Feedback-Guided Policy Optimization (HiFPO), which converts automatically generated trajectory outcomes, step-level process feedback, and corrective hints into hint-contextualized step-level GRPO updates. Using only such data, the system adapts Qwen3-VL-8B to 67.2% Pass@3 on AndroidWorld (close to the 69.0% of the closed-data GUI-Owl-1.5-8B), with the further-adapted ForgeOwl-8B reaching 77.6% Pass@3 on AndroidWorld and 41.0% success on the out-of-domain MobileWorld GUI-only split.

Significance. If the automatic feedback mechanisms prove reliable and the performance gains are reproducible, the work would be a meaningful advance in scalable GUI agent adaptation by eliminating the need for human-written tasks, demonstrations, or reward labels across numerous and frequently updated apps. The planned public release of code, data, and trained models is a clear strength that supports verification and extension by the community.

major comments (2)

[Method] The method overview provides no concrete description or pseudocode for how trajectory outcomes, step-level process feedback, and corrective hints are automatically extracted inside MobileGym and converted into reliable GRPO improvement signals; this mechanism is load-bearing for the central annotation-free claim and the reported Pass@3 numbers.
[Experiments] No details are given on the GRPO update implementation, hyperparameter choices for HiFPO, or experimental controls (e.g., data generation protocol, ablation of feedback types); without these, the support for the performance claims on AndroidWorld and MobileWorld cannot be verified.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review. The comments correctly identify areas where the manuscript would benefit from expanded technical detail to support verification of the annotation-free claims. We address each point below and will revise the manuscript accordingly.


Referee: [Method] The method overview provides no concrete description or pseudocode for how trajectory outcomes, step-level process feedback, and corrective hints are automatically extracted inside MobileGym and converted into reliable GRPO improvement signals; this mechanism is load-bearing for the central annotation-free claim and the reported Pass@3 numbers.

Authors: We agree that the current description in Section 3 is high-level and lacks the requested concrete details and pseudocode. In the revision we will add an expanded subsection with (1) the precise algorithms used inside MobileGym to derive trajectory outcomes, step-level process feedback, and corrective hints from real app interactions, and (2) pseudocode showing how these signals are formatted into hint-contextualized step-level GRPO updates. This will make the load-bearing annotation-free pipeline explicit.

Revision: yes

Referee: [Experiments] No details are given on the GRPO update implementation, hyperparameter choices for HiFPO, or experimental controls (e.g., data generation protocol, ablation of feedback types); without these, the support for the performance claims on AndroidWorld and MobileWorld cannot be verified.

Authors: We concur that additional implementation and control details are needed for reproducibility. The revised manuscript will include a new experimental appendix or subsection specifying the exact GRPO formulation and update rule, all HiFPO hyperparameters (learning rate, KL coefficient, feedback weighting, etc.), the full data-generation protocol, and ablation results isolating each feedback type. These additions will directly support the AndroidWorld and MobileWorld numbers.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity


The paper presents an empirical system (MobileGym + HiFPO) whose central claims are measured Pass@3 and success rates on external benchmarks (AndroidWorld, MobileWorld). No equations, derivations, or self-citations are shown to reduce these reported outcomes to quantities fitted inside the same loop or to rename inputs as predictions; the adaptation results are presented as measured consequences of the described process on held-out tasks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The central claim rests on the new system components MobileGym and HiFPO plus standard reinforcement-learning assumptions for turning automatic feedback into policy updates; no numerical free parameters are named in the abstract.

free parameters (1)

HiFPO and GRPO hyperparameters
Policy optimization methods of this type require multiple hyperparameters whose values are not stated in the abstract.

axioms (1)

Domain assumption: Automatically generated step-level feedback and corrective hints constitute valid training signals for GRPO updates in GUI tasks.
The HiFPO description in the abstract invokes this without further justification.

invented entities (2)

MobileGym (no independent evidence)
Purpose: Grounds task generation and rollout evaluation in real mobile app interaction.
New environment component introduced to supply annotation-free data.

HiFPO (no independent evidence)
Purpose: Converts trajectory outcomes, step-level feedback, and hints into hint-contextualized GRPO updates.
New optimization procedure proposed in the paper.

pith-pipeline@v0.9.1-grok · 5840 in / 1559 out tokens · 42292 ms · 2026-06-26T15:56:48.363131+00:00 · methodology


Reference graph

Works this paper leans on

36 extracted references · 9 linked inside Pith

[1] Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report. arXiv preprint arXiv:2511.21631, 2025. [Pith, arXiv]

[2] Kanzhi Cheng, Zehao Li, Zheng Ma, Nuo Chen, Jialin Cao, Qiushi Sun, Zichen Ding, Fangzhi Xu, Hang Yan, Jiajun Chen, et al. OpenMobile: Building open mobile agents with task and trajectory synthesis. arXiv preprint arXiv:2604.15093, 2026. [Pith, arXiv]

[3] Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025. [Pith, arXiv]

[4] Changlong Gao, Zhangxuan Gu, Yulin Liu, Xinyu Qiu, Shuheng Shen, Yue Wen, Tianyu Xia, Zhenyu Xu, Zhengwen Zeng, Beitong Zhou, et al. Ui-venus-1.5 technical report. arXiv e-prints, pages arXiv–2602, 2026.

[5] Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxiao Dong, Ming Ding, et al. Cogagent: A visual language model for gui agents. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14281–14290, 2024.

[6] Quyu Kong, Xu Zhang, Zhenyu Yang, Nolan Gao, Chen Liu, Panrong Tong, Chenglin Cai, Hanzhang Zhou, Jianan Zhang, Liangyu Chen, et al. Mobileworld: Benchmarking autonomous mobile agents in agent-user interactive and mcp-augmented environments. arXiv preprint, 2026.

[7] Guangyi Liu, Pengxiang Zhao, Yaozhen Liang, Liang Liu, Yaxuan Guo, Han Xiao, Weifeng Lin, Yuxiang Chai, Yue Han, Shuai Ren, et al. Llm-powered gui agents in phone automation: Surveying progress and prospects. arXiv preprint arXiv:2504.19838, 2025. [arXiv]

[8] Zhaoyang Liu, JingJing Xie, Zichen Ding, Zehao Li, Bowen Yang, Zhenyu Wu, Xuehui Wang, Qiushi Sun, Shi Liu, Weiyun Wang, et al. Scalecua: Scaling open-source computer use agents with cross-platform data. arXiv preprint arXiv:2509.15221, 2025. [arXiv]

[9] Zhengxi Lu, Yuxiang Chai, Yaxuan Guo, Xi Yin, Liang Liu, Hao Wang, Han Xiao, Shuai Ren, Guanjing Xiong, and Hongsheng Li. Ui-r1: Enhancing efficient action prediction of gui agents by reinforcement learning. arXiv preprint arXiv:2503.21620, 2025. [Pith, arXiv]

[10] Christopher Rawles, Sarah Clinckemaillie, Yifan Chang, Jonathan Waltz, Gabrielle Lau, Marybeth Fair, Alice Li, William Bishop, Wei Li, Folawiyo Campbell-Ajala, et al. Androidworld: A dynamic benchmarking environment for autonomous agents. arXiv preprint arXiv:2405.14573, 2024. [Pith, arXiv]

[11] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024. [Pith, arXiv]

[12] Yucheng Shi, Wenhao Yu, Zaitang Li, Yonglin Wang, Hongming Zhang, Ninghao Liu, Haitao Mi, and Dong Yu. Mobilegui-rl: Advancing mobile gui agent through reinforcement learning in online environment. arXiv preprint arXiv:2507.05720, 2025. [arXiv]

[13] Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, et al. Os-genesis: Automating gui agent trajectory construction via reverse task synthesis. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5555–5579, 2025.

[14] Zeyi Sun, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiaoyi Dong, Tong Wu, Dahua Lin, and Jiaqi Wang. Seagent: Self-evolving computer use agent with autonomous learning from experience. arXiv preprint arXiv:2508.04700, 2025. [arXiv]

[15] Fei Tang, Zhiqiong Lu, Boxuan Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, and Yongliang Shen. ClawGUI: A unified framework for training, evaluating, and deploying gui agents. arXiv preprint arXiv:2604.11784, 2026. [Pith, arXiv]

[16] Wenhao Wang, Mengying Yuan, Zijie Yu, Guangyi Liu, Rui Ye, Tian Jin, Siheng Chen, and Yanfeng Wang. Mobilea3gent: Training mobile gui agents using decentralized self-sourced data from diverse users. arXiv preprint arXiv:2502.02982, 2025. [arXiv]

[17] Mengzhou Wu, Yuzhe Guo, Yuan Cao, Haochuan Lu, Songhe Zhu, Pingzhe Qu, Xin Chen, Kang Qin, Zhongpu Wang, Xiaode Zhang, et al. Ui-oceanus: Scaling gui agents with synthetic environmental dynamics. arXiv preprint arXiv:2604.02345, 2026. [Pith, arXiv]

[18] Bin Xie, Rui Shao, Gongwei Chen, Kaiwen Zhou, Yinchuan Li, Jie Liu, Min Zhang, and Liqiang Nie. Gui-explorer: Autonomous exploration and mining of transition-aware knowledge for gui agent. arXiv preprint arXiv:2505.16827, 2025. [arXiv]

[19] Haiyang Xu, Xi Zhang, Haowei Liu, Junyang Wang, Zhaozai Zhu, Shengjie Zhou, Xuhao Hu, Feiyu Gao, Junjie Cao, Zihua Wang, et al. Mobile-agent-v3.5: Multi-platform fundamental gui agents. arXiv preprint arXiv:2602.16855, 2026. [arXiv]

[20] Tianci Xue, Zeyi Liao, Tianneng Shi, Zilu Wang, Kai Zhang, Dawn Song, Yu Su, and Huan Sun. Autonomous continual learning of computer-use agents for environment adaptation. arXiv preprint arXiv:2602.10356, 2026. [Pith, arXiv]

[21] Chenyu Yang, Shiqian Su, Shi Liu, Xuan Dong, Yue Yu, Weijie Su, Xuehui Wang, Zhaoyang Liu, Jinguo Zhu, Hao Li, et al. Zerogui: Automating online gui learning at zero human cost. arXiv preprint arXiv:2505.23762, 2025. [arXiv]

[22] Bofei Zhang, Zirui Shang, Zhi Gao, Wang Zhang, Rui Xie, Xiaojian Ma, Tao Yuan, Xinxiao Wu, Song-Chun Zhu, and Qing Li. Tongui: Building generalized gui agents by learning from multimodal web tutorials. arXiv e-prints, pages arXiv–2504, 2025.

[23] Hanzhang Zhou, Xu Zhang, Panrong Tong, Jianan Zhang, Liangyu Chen, Quyu Kong, Chenglin Cai, Chen Liu, Yue Wang, Jingren Zhou, et al. Mai-ui technical report: Real-world centric foundation gui agents. arXiv preprint arXiv:2512.22047, 2025. [arXiv]

A Detailed Related Work
This appendix expands the concise related-work discussion in Section 1. We organize prior wo...
[24]

EVALUATE the original task for reasonableness and completion
[25]

GENERATE new diverse curriculum tasks that comprehensively cover the app's functionality.

## App Information
App Name: {app_name}
Original Task Goal: {original_goal}

## Few-shot Examples
{fewshot_examples}

## Task Generation Principles
{task_principles}

## Already Generated Tasks for {app_name}
{existing_tasks}

IMPORTANT: Do not generate tasks that are to...
[26]

Reasonableness Assessment:
- Is this a reasonable task that a user might actually want to perform in this app?
- Are the requirements clear and achievable?
- Does the task make sense in the context of the app?
[27]

- A reasonable step logically progresses toward task completion

Step-by-Step Quality Analysis:
- Analyze representative visible steps in the screenshot sequence.
- A reasonable step logically progresses toward task completion.
- An unreasonable step is unnecessary, wrong, counterproductive, stuck in a loop, or moves backward unnecessarily.
- Failed trajectories may contain reasonable steps.
- Successful trajectorie...
[28]

evaluation

Overall Completion Assessment:
- Did the agent complete the stated task?
- Were the required steps performed correctly?
- Did the agent reach the intended goal state?

### Step 2: Curriculum Task Generation
Generate 3-8 new learning tasks that:
- cover different core functionalities of {app_name};
- vary in length from 1 to 40 steps;
- are pedagogically us...
[29]

Decide whether the attempt completed the task
[30]

Assess whether the task itself is feasible
[31]

If failed, identify the failure_step
[32]

decision

Analyze every step for reasonableness and provide a concise rationale. Return JSON:
{
  "decision": 1 or 0,
  "reason": "Explanation of the final judgment",
  "failure_step": 4,
  "task_feasible": true/false,
  "task_feasible_reason": "Why the task is feasible or infeasible",
  "task_barriers": [],
  "reasonable_steps": [1, 2, 4],
  "unreasonable_steps": [3],
  "step_analy...
[33]

Identify key mistakes, especially unreasonable steps
[34]

Specify what to avoid in future attempts
[35]

Propose concrete alternative approaches
[36]

key_mistake

Extract important task insights. Return JSON:
{
  "key_mistake": "Concise summary of the main mistake",
  "what_to_avoid": ["..."],
  "suggested_approach": ["..."],
  "important_insights": ["..."],
  "hint_summary": "Brief self-reminder for the next attempt"
}

G.3 Hint-Guided Rollout Prompts
During HiFPO rollout, the hint context η<k is appended to the task instruct...

2048

[1] [1]

Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025

Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025

Pith/arXiv arXiv 2025

[2] [2]

OpenMobile: Building open mobile agents with task and trajectory synthesis.arXiv preprint arXiv:2604.15093, 2026

Kanzhi Cheng, Zehao Li, Zheng Ma, Nuo Chen, Jialin Cao, Qiushi Sun, Zichen Ding, Fangzhi Xu, Hang Yan, Jiajun Chen, et al. OpenMobile: Building open mobile agents with task and trajectory synthesis.arXiv preprint arXiv:2604.15093, 2026

Pith/arXiv arXiv 2026

[3] [3]

Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities.arXiv preprint arXiv:2507.06261, 2025

Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities.arXiv preprint arXiv:2507.06261, 2025

Pith/arXiv arXiv 2025

[4] [4]

Ui-venus-1.5 technical report.arXiv e-prints, pages arXiv–2602, 2026

Changlong Gao, Zhangxuan Gu, Yulin Liu, Xinyu Qiu, Shuheng Shen, Yue Wen, Tianyu Xia, Zhenyu Xu, Zhengwen Zeng, Beitong Zhou, et al. Ui-venus-1.5 technical report.arXiv e-prints, pages arXiv–2602, 2026

2026

[5] [5]

Cogagent: A visual language model for gui agents

Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxiao Dong, Ming Ding, et al. Cogagent: A visual language model for gui agents. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14281–14290, 2024

2024

[6] [6]

Mobileworld: Benchmarking autonomous mobile agents in agent-user interactive and mcp-augmented environments.arXiv preprint, 2026

Quyu Kong, Xu Zhang, Zhenyu Yang, Nolan Gao, Chen Liu, Panrong Tong, Chenglin Cai, Hanzhang Zhou, Jianan Zhang, Liangyu Chen, et al. Mobileworld: Benchmarking autonomous mobile agents in agent-user interactive and mcp-augmented environments.arXiv preprint, 2026

2026

[7] [7]

Llm-powered gui agents in phone automation: Surveying progress and prospects

Guangyi Liu, Pengxiang Zhao, Yaozhen Liang, Liang Liu, Yaxuan Guo, Han Xiao, Weifeng Lin, Yuxiang Chai, Yue Han, Shuai Ren, et al. Llm-powered gui agents in phone automation: Surveying progress and prospects. arXiv preprint arXiv:2504.19838, 2025. 16

arXiv 2025

[8] [8]

Scalecua: Scaling open-source computer use agents with cross-platform data.arXiv preprint arXiv:2509.15221, 2025

Zhaoyang Liu, JingJing Xie, Zichen Ding, Zehao Li, Bowen Yang, Zhenyu Wu, Xuehui Wang, Qiushi Sun, Shi Liu, Weiyun Wang, et al. Scalecua: Scaling open-source computer use agents with cross-platform data.arXiv preprint arXiv:2509.15221, 2025

arXiv 2025

[9] [9]

Ui-r1: Enhancing efficient action prediction of gui agents by reinforcement learning.arXiv preprint arXiv:2503.21620, 2025

Zhengxi Lu, Yuxiang Chai, Yaxuan Guo, Xi Yin, Liang Liu, Hao Wang, Han Xiao, Shuai Ren, Guanjing Xiong, and Hongsheng Li. Ui-r1: Enhancing efficient action prediction of gui agents by reinforcement learning.arXiv preprint arXiv:2503.21620, 2025

Pith/arXiv arXiv 2025

[10] [10]

Androidworld: A dynamic benchmarking environment for autonomous agents.arXiv preprint arXiv:2405.14573, 2024

Christopher Rawles, Sarah Clinckemaillie, Yifan Chang, Jonathan Waltz, Gabrielle Lau, Marybeth Fair, Alice Li, William Bishop, Wei Li, Folawiyo Campbell-Ajala, et al. Androidworld: A dynamic benchmarking environment for autonomous agents.arXiv preprint arXiv:2405.14573, 2024

Pith/arXiv arXiv 2024

[11] [11]

Deepseekmath: Pushing the limits of mathematical reasoning in open language models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024

Pith/arXiv arXiv 2024

[12] [12]

Mobilegui-rl: Advancing mobile gui agent through reinforcement learning in online environment.arXiv preprint arXiv:2507.05720, 2025

Yucheng Shi, Wenhao Yu, Zaitang Li, Yonglin Wang, Hongming Zhang, Ninghao Liu, Haitao Mi, and Dong Yu. Mobilegui-rl: Advancing mobile gui agent through reinforcement learning in online environment.arXiv preprint arXiv:2507.05720, 2025

arXiv 2025

[13] [13]

Os-genesis: Automating gui agent trajectory construction via reverse task synthesis

Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, et al. Os-genesis: Automating gui agent trajectory construction via reverse task synthesis. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5555–5579, 2025

2025

[14] [14]

Seagent: Self-evolving computer use agent with autonomous learning from experience.arXiv preprint arXiv:2508.04700, 2025

Zeyi Sun, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiaoyi Dong, Tong Wu, Dahua Lin, and Jiaqi Wang. Seagent: Self-evolving computer use agent with autonomous learning from experience.arXiv preprint arXiv:2508.04700, 2025

arXiv 2025

[15] [15]

ClawGUI: A unified framework for training, evaluating, and deploying gui agents.arXiv preprint arXiv:2604.11784, 2026

Fei Tang, Zhiqiong Lu, Boxuan Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, and Yongliang Shen. ClawGUI: A unified framework for training, evaluating, and deploying gui agents.arXiv preprint arXiv:2604.11784, 2026

Pith/arXiv arXiv 2026

[16] [16]

Mobilea3gent: Training mobile gui agents using decentralized self-sourced data from diverse users.arXiv preprint arXiv:2502.02982, 2025

Wenhao Wang, Mengying Yuan, Zijie Yu, Guangyi Liu, Rui Ye, Tian Jin, Siheng Chen, and Yanfeng Wang. Mobilea3gent: Training mobile gui agents using decentralized self-sourced data from diverse users.arXiv preprint arXiv:2502.02982, 2025

arXiv 2025

[17] [17]

Ui-oceanus: Scaling gui agents with synthetic environmental dynamics.arXiv preprint arXiv:2604.02345, 2026

Mengzhou Wu, Yuzhe Guo, Yuan Cao, Haochuan Lu, Songhe Zhu, Pingzhe Qu, Xin Chen, Kang Qin, Zhongpu Wang, Xiaode Zhang, et al. Ui-oceanus: Scaling gui agents with synthetic environmental dynamics.arXiv preprint arXiv:2604.02345, 2026

Pith/arXiv arXiv 2026

[18] [18]

Gui-explorer: Autonomous exploration and mining of transition-aware knowledge for gui agent.arXiv preprint arXiv:2505.16827, 2025

Bin Xie, Rui Shao, Gongwei Chen, Kaiwen Zhou, Yinchuan Li, Jie Liu, Min Zhang, and Liqiang Nie. Gui-explorer: Autonomous exploration and mining of transition-aware knowledge for gui agent.arXiv preprint arXiv:2505.16827, 2025

arXiv 2025

[19] Haiyang Xu, Xi Zhang, Haowei Liu, Junyang Wang, Zhaozai Zhu, Shengjie Zhou, Xuhao Hu, Feiyu Gao, Junjie Cao, Zihua Wang, et al. Mobile-agent-v3.5: Multi-platform fundamental GUI agents. arXiv preprint arXiv:2602.16855, 2026.

[20] Tianci Xue, Zeyi Liao, Tianneng Shi, Zilu Wang, Kai Zhang, Dawn Song, Yu Su, and Huan Sun. Autonomous continual learning of computer-use agents for environment adaptation. arXiv preprint arXiv:2602.10356, 2026.

[21] Chenyu Yang, Shiqian Su, Shi Liu, Xuan Dong, Yue Yu, Weijie Su, Xuehui Wang, Zhaoyang Liu, Jinguo Zhu, Hao Li, et al. Zerogui: Automating online GUI learning at zero human cost. arXiv preprint arXiv:2505.23762, 2025.

[22] Bofei Zhang, Zirui Shang, Zhi Gao, Wang Zhang, Rui Xie, Xiaojian Ma, Tao Yuan, Xinxiao Wu, Song-Chun Zhu, and Qing Li. Tongui: Building generalized GUI agents by learning from multimodal web tutorials. arXiv e-prints, pages arXiv–2504, 2025.

[23] Hanzhang Zhou, Xu Zhang, Panrong Tong, Jianan Zhang, Liangyu Chen, Quyu Kong, Chenglin Cai, Chen Liu, Yue Wang, Jingren Zhou, et al. Mai-ui technical report: Real-world centric foundation GUI agents. arXiv preprint arXiv:2512.22047, 2025.

A Detailed Related Work

This appendix expands the concise related-work discussion in Section 1. We organize prior wo...

1. EVALUATE the original task for reasonableness and completion
2. GENERATE new diverse curriculum tasks that comprehensively cover the app's functionality.

## App Information
App Name: {app_name}
Original Task Goal: {original_goal}

## Few-shot Examples
{fewshot_examples}

## Task Generation Principles
{task_principles}

## Already Generated Tasks for {app_name}
{existing_tasks}

IMPORTANT: Do not generate tasks that are to...
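The placeholders in the task-generation prompt above can be filled with ordinary string templating. The following is a minimal sketch, not the authors' implementation: the placeholder names are copied from the prompt, while `render_prompt`, the bullet rendering, and all example values are assumptions made for illustration.

```python
# Placeholder names come from the prompt text; the helper and the
# example values below are hypothetical.
TEMPLATE = """## App Information
App Name: {app_name}
Original Task Goal: {original_goal}

## Few-shot Examples
{fewshot_examples}

## Task Generation Principles
{task_principles}

## Already Generated Tasks for {app_name}
{existing_tasks}
"""


def render_prompt(app_name, original_goal, fewshot_examples,
                  task_principles, existing_tasks):
    """Fill the template; list-valued fields become '- item' bullets."""
    def bullets(items):
        return "\n".join(f"- {item}" for item in items) if items else "(none)"

    return TEMPLATE.format(
        app_name=app_name,
        original_goal=original_goal,
        fewshot_examples=bullets(fewshot_examples),
        task_principles=bullets(task_principles),
        existing_tasks=bullets(existing_tasks),
    )


prompt = render_prompt(
    app_name="Simple Calendar",
    original_goal="Create an event for tomorrow at 9am",
    fewshot_examples=["Delete the event named Standup"],
    task_principles=["Tasks must be verifiable from the final screen"],
    existing_tasks=[],
)
print(prompt)
```

Passing the already generated tasks back into the prompt is what lets the generator avoid repeats; an empty list is rendered as `(none)` here so the section is never blank.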

1. Reasonableness Assessment:
   - Is this a reasonable task that a user might actually want to perform in this app?
   - Are the requirements clear and achievable?
   - Does the task make sense in the context of the app?

2. Step-by-Step Quality Analysis:
   - Analyze representative visible steps in the screenshot sequence.
   - A reasonable step logically progresses toward task completion.
   - An unreasonable step is unnecessary, wrong, counterproductive, stuck in a loop, or moves backward unnecessarily.
   - Failed trajectories may contain reasonable steps.
   - Successful trajectorie...

3. Overall Completion Assessment:
   - Did the agent complete the stated task?
   - Were the required steps performed correctly?
   - Did the agent reach the intended goal state?

### Step 2: Curriculum Task Generation
Generate 3-8 new learning tasks that:
- cover different core functionalities of {app_name};
- vary in length from 1 to 40 steps;
- are pedagogically us...

1. Decide whether the attempt completed the task
2. Assess whether the task itself is feasible
3. If failed, identify the failure_step
4. Analyze every step for reasonableness and provide a concise rationale.

Return JSON:
{
  "decision": 1 or 0,
  "reason": "Explanation of the final judgment",
  "failure_step": 4,
  "task_feasible": true/false,
  "task_feasible_reason": "Why the task is feasible or infeasible",
  "task_barriers": [],
  "reasonable_steps": [1, 2, 4],
  "unreasonable_steps": [3],
  "step_analy...
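A judge reply in the format above has to be parsed into trajectory-level and step-level signals before it can be used. The sketch below shows one way to do that with the standard library. The field names are taken from the prompt; the `Judgment` class, `parse_judgment`, the tolerance for prose around the JSON object, and the choice to ignore `failure_step` on successful attempts are all assumptions for illustration, not the paper's code.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Judgment:
    """Hypothetical container for the fields named in the judge prompt."""
    success: bool
    reason: str
    task_feasible: bool
    failure_step: Optional[int] = None
    reasonable_steps: List[int] = field(default_factory=list)
    unreasonable_steps: List[int] = field(default_factory=list)


def parse_judgment(raw: str) -> Judgment:
    """Extract the outermost JSON object from a reply and validate it."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in judge reply")
    data = json.loads(raw[start:end + 1])

    decision = data["decision"]
    if decision not in (0, 1):
        raise ValueError(f"decision must be 0 or 1, got {decision!r}")
    success = decision == 1

    return Judgment(
        success=success,
        reason=str(data.get("reason", "")),
        task_feasible=bool(data.get("task_feasible", True)),
        # Assumption: failure_step is only meaningful for failed attempts.
        failure_step=None if success else data.get("failure_step"),
        reasonable_steps=list(data.get("reasonable_steps", [])),
        unreasonable_steps=list(data.get("unreasonable_steps", [])),
    )


reply = (
    "Here is my verdict:\n"
    '{"decision": 0, "reason": "Saved the wrong date", "failure_step": 3, '
    '"task_feasible": true, "task_barriers": [], '
    '"reasonable_steps": [1, 2, 4], "unreasonable_steps": [3]}'
)
judgment = parse_judgment(reply)
print(judgment)
```

Keeping `reasonable_steps` and `unreasonable_steps` separate from the binary `decision` reflects the prompt's point that failed trajectories may still contain reasonable steps.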

1. Identify key mistakes, especially unreasonable steps
2. Specify what to avoid in future attempts
3. Propose concrete alternative approaches
4. Extract important task insights.

Return JSON:
{
  "key_mistake": "Concise summary of the main mistake",
  "what_to_avoid": ["..."],
  "suggested_approach": ["..."],
  "important_insights": ["..."],
  "hint_summary": "Brief self-reminder for the next attempt"
}

G.3 Hint-Guided Rollout Prompts

During HiFPO rollout, the hint context $\eta_{<k}$ is appended to the task instruct...
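As a rough illustration of hint-guided rollout, the sketch below collects the hints produced after earlier attempts and appends them to the task instruction. It assumes that the hint context for attempt k consists of the hint summaries from attempts before k. The function names, the fallback to `key_mistake`, and the "Hints from previous attempts" wording are hypothetical, since the exact prompt format is not shown in this excerpt.

```python
import json
from typing import List


def extract_hint(raw_json: str) -> str:
    """Pull a one-line hint from a reflection reply (fields from the prompt)."""
    data = json.loads(raw_json)
    # Prefer the short self-reminder; fall back to the main mistake.
    return data.get("hint_summary") or data.get("key_mistake", "")


def build_instruction(task: str, hints: List[str]) -> str:
    """Append hints from earlier attempts to the task instruction."""
    hints = [h for h in hints if h]
    if not hints:
        return task  # first attempt: no hint context yet
    lines = [f"{i}. {h}" for i, h in enumerate(hints, 1)]
    # The header wording is an assumption, not the paper's prompt text.
    return task + "\n\nHints from previous attempts:\n" + "\n".join(lines)


reflection = json.dumps({
    "key_mistake": "Searched before opening the note list",
    "what_to_avoid": ["Typing into the global search bar"],
    "suggested_approach": ["Open the overflow menu first"],
    "important_insights": [],
    "hint_summary": "Open the overflow menu before searching",
})
task = "Delete the note titled Groceries"
instruction = build_instruction(task, [extract_hint(reflection)])
print(instruction)
```

With an empty hint list the instruction is returned unchanged, so the same function covers both the first attempt and later hint-conditioned attempts.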

2048