
arxiv: 2606.19930 · v1 · pith:5THPWALWnew · submitted 2026-06-18 · 💻 cs.HC

MobileForge: Annotation-Free Adaptation for Mobile GUI Agents with Hierarchical Feedback-Guided Policy Optimization

Pith reviewed 2026-06-26 15:56 UTC · model grok-4.3

classification 💻 cs.HC
keywords mobile GUI agents · annotation-free adaptation · hierarchical policy optimization · MLLM · AndroidWorld · GRPO · MobileGym

The pith

MobileForge adapts mobile GUI agents to new apps using only automatically generated data and hierarchical feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that MLLM-based mobile GUI agents can be adapted to many real target apps without human-written tasks, demonstrations, or reward labels. MobileForge introduces MobileGym to ground automatic task generation and rollout evaluation in actual app interactions, and pairs it with Hierarchical Feedback-Guided Policy Optimization (HiFPO), which converts trajectory outcomes, step-level process feedback, and corrective hints into hint-contextualized step-level GRPO updates. The adapted Qwen3-VL-8B reaches 67.2% Pass@3 on AndroidWorld, close to the 69.0% of the closed-data, GUI-specialized GUI-Owl-1.5-8B. The MobileForge-adapted ForgeOwl-8B goes further, reaching 77.6% Pass@3 on AndroidWorld and 41.0% success on the out-of-domain MobileWorld GUI-only split. This matters because manual annotation costs have blocked scaling agents across numerous, frequently updated mobile apps.

Core claim

MobileForge is an annotation-free adaptation system for mobile GUI agents. It consists of MobileGym, which grounds task generation and rollout evaluation in real mobile app interaction, and Hierarchical Feedback-Guided Policy Optimization (HiFPO), which turns trajectory outcomes, step-level process feedback, and corrective hints into hint-contextualized step-level GRPO updates. Using only automatically generated adaptation data, it adapts base models to strong performance on AndroidWorld and out-of-domain splits.

What carries the argument

Hierarchical Feedback-Guided Policy Optimization (HiFPO), which converts automatically generated trajectory outcomes, step-level process feedback, and corrective hints into hint-contextualized step-level GRPO updates.
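
This page gives no formula for these updates, so the following is only a minimal sketch of the general idea, not the authors' implementation. It shows the group-relative normalization that GRPO applies to rewards from rollouts of the same task, followed by a hypothetical step-level adjustment. The function names, the +1/-1 step scores, and the `step_weight` parameter are all assumptions introduced for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style normalization: center and scale rewards within one rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def step_level_advantages(
    trajectory_advantage: float,
    step_is_reasonable: List[bool],
    step_weight: float = 0.5,  # hypothetical weighting, not taken from the paper
) -> List[float]:
    """Hypothetical shaping: shift the trajectory advantage per step by a process score."""
    return [
        trajectory_advantage + step_weight * (1.0 if ok else -1.0)
        for ok in step_is_reasonable
    ]


# Four rollouts of one task: 1.0 = trajectory judged successful, 0.0 = failed.
outcomes = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(outcomes)

# Step-level feedback for the first rollout: the second step was judged unreasonable.
first_rollout_steps = step_level_advantages(advantages[0], [True, False, True])
```

Under this normalization, a group in which every rollout receives the same reward yields all-zero advantages and therefore no learning signal, which is one plausible motivation for adding finer-grained step-level feedback.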

If this is right

  • Base MLLMs can reach performance near closed-data GUI-specialized models on AndroidWorld using only automatic adaptation data.
  • The adapted ForgeOwl-8B achieves 77.6% Pass@3 on AndroidWorld and 41.0% success on the out-of-domain MobileWorld GUI-only split.
  • Adaptation becomes feasible for many apps because MobileGym supplies the full substrate of task generation, rollout, and feedback without manual labels.
  • Policy optimization shifts from isolated rollouts and coarse rewards to hint-contextualized step-level GRPO updates.
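
The Pass@3 figures above are not defined on this page. A minimal sketch, assuming the common reading in which a task counts as solved if at least one of its first k attempts succeeds; the function name and the task identifiers are hypothetical.

```python
from typing import Dict, List


def pass_at_k(attempts: Dict[str, List[bool]], k: int) -> float:
    """Fraction of tasks with at least one success among the first k attempts."""
    if not attempts:
        raise ValueError("no tasks to evaluate")
    solved = 0
    for task, results in attempts.items():
        if len(results) < k:
            raise ValueError(f"task {task!r} has fewer than {k} attempts")
        if any(results[:k]):
            solved += 1
    return solved / len(attempts)


# Hypothetical per-task attempt outcomes (True = the attempt succeeded).
example = {
    "task_a": [False, True, False],
    "task_b": [False, False, False],
}
pass_at_3 = pass_at_k(example, k=3)
pass_at_1 = pass_at_k(example, k=1)
```

Because a single success among the k attempts is enough, Pass@3 is never lower than the single-attempt success rate on the same runs, so the two kinds of figures should not be compared directly.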

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same automatic feedback loop could be tested on non-mobile GUI environments or web agents where app state is similarly observable.
  • If step-level hints prove reliable, the approach might reduce reliance on expensive human preference data in other agent training pipelines.
  • Scaling MobileGym across more apps could create large open adaptation datasets that further close the gap to closed models.

Load-bearing premise

Trajectory outcomes, step-level process feedback, and corrective hints can be generated automatically inside MobileGym and converted into reliable step-level GRPO improvement signals without human-written tasks, demonstrations, or reward labels.

What would settle it

An experiment that replaces MobileGym's automatic feedback with equivalent human-written step-level hints and checks whether the resulting GRPO updates yield weaker or stronger policy gains on the same base model and test split.

Original abstract

MLLM-based mobile GUI agents have made substantial progress in UI understanding and action execution, but adapting them to real target apps remains costly because mobile apps are numerous, frequently updated, and hard to cover with human-written tasks, demonstrations, or reward labels. Existing annotation-free GUI learning reduces manual supervision, yet lacks a unified substrate connecting target-app exploration, curriculum mining, rollout execution, and feedback, while policy optimization often relies on isolated rollouts and coarse rewards that are hard to convert into reliable improvement signals. We present MobileForge, an annotation-free adaptation system for mobile GUI agents. MobileForge consists of MobileGym, which grounds task generation and rollout evaluation in real mobile app interaction, and Hierarchical Feedback-Guided Policy Optimization (HiFPO), which turns trajectory outcomes, step-level process feedback, and corrective hints into hint-contextualized step-level GRPO updates. Using only automatically generated annotation-free adaptation data, MobileForge adapts Qwen3-VL-8B to 67.2% Pass@3 on AndroidWorld, close to the closed-data GUI-specialized GUI-Owl-1.5-8B base model at 69.0%. The MobileForge-adapted ForgeOwl-8B further reaches 77.6% Pass@3 on AndroidWorld and 41.0% success on the out-of-domain MobileWorld GUI-only split, establishing the strongest open-data mobile GUI agent in our evaluation. Code, data, and trained models will be released at https://mobile-forge.github.io/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript presents MobileForge, an annotation-free adaptation system for MLLM-based mobile GUI agents. It consists of MobileGym, which grounds task generation and rollout evaluation in real mobile app interactions, and Hierarchical Feedback-Guided Policy Optimization (HiFPO), which converts automatically generated trajectory outcomes, step-level process feedback, and corrective hints into hint-contextualized step-level GRPO updates. Using only such data, the system adapts Qwen3-VL-8B to 67.2% Pass@3 on AndroidWorld (close to the 69.0% of the closed-data GUI-Owl-1.5-8B), with the further-adapted ForgeOwl-8B reaching 77.6% Pass@3 on AndroidWorld and 41.0% success on the out-of-domain MobileWorld GUI-only split.

Significance. If the automatic feedback mechanisms prove reliable and the performance gains are reproducible, the work would be a meaningful advance in scalable GUI agent adaptation by eliminating the need for human-written tasks, demonstrations, or reward labels across numerous and frequently updated apps. The planned public release of code, data, and trained models is a clear strength that supports verification and extension by the community.

major comments (2)
  1. [Method] The method overview provides no concrete description or pseudocode for how trajectory outcomes, step-level process feedback, and corrective hints are automatically extracted inside MobileGym and converted into reliable GRPO improvement signals; this mechanism is load-bearing for the central annotation-free claim and the reported Pass@3 numbers.
  2. [Experiments] No details are given on the GRPO update implementation, hyperparameter choices for HiFPO, or experimental controls (e.g., data generation protocol, ablation of feedback types); without these, the support for the performance claims on AndroidWorld and MobileWorld cannot be verified.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review. The comments correctly identify areas where the manuscript would benefit from expanded technical detail to support verification of the annotation-free claims. We address each point below and will revise the manuscript accordingly.

point-by-point responses
  1. Referee: [Method] The method overview provides no concrete description or pseudocode for how trajectory outcomes, step-level process feedback, and corrective hints are automatically extracted inside MobileGym and converted into reliable GRPO improvement signals; this mechanism is load-bearing for the central annotation-free claim and the reported Pass@3 numbers.

    Authors: We agree that the current description in Section 3 is high-level and lacks the requested concrete details and pseudocode. In the revision we will add an expanded subsection with (1) the precise algorithms used inside MobileGym to derive trajectory outcomes, step-level process feedback, and corrective hints from real app interactions, and (2) pseudocode showing how these signals are formatted into hint-contextualized step-level GRPO updates. This will make the load-bearing annotation-free pipeline explicit.
    Revision: yes

  2. Referee: [Experiments] No details are given on the GRPO update implementation, hyperparameter choices for HiFPO, or experimental controls (e.g., data generation protocol, ablation of feedback types); without these, the support for the performance claims on AndroidWorld and MobileWorld cannot be verified.

    Authors: We concur that additional implementation and control details are needed for reproducibility. The revised manuscript will include a new experimental appendix or subsection specifying the exact GRPO formulation and update rule, all HiFPO hyperparameters (learning rate, KL coefficient, feedback weighting, etc.), the full data-generation protocol, and ablation results isolating each feedback type. These additions will directly support the AndroidWorld and MobileWorld numbers.
    Revision: yes
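
For orientation, the standard GRPO objective as introduced in the DeepSeekMath paper is reproduced below. It is not the HiFPO formulation, which this page does not give. It only shows where a clipping range and a KL coefficient of the kind the rebuttal mentions would enter. Here G is the group size, ε the clipping range, β the KL coefficient, and r_i the scalar reward of rollout o_i for query q.

```latex
\mathcal{J}_{\mathrm{GRPO}}(\theta)
  = \mathbb{E}_{q,\,\{o_i\}_{i=1}^{G}\sim\pi_{\theta_{\mathrm{old}}}}
    \left[
      \frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}
      \left(
        \min\!\left(
          \rho_{i,t}\,\hat{A}_{i,t},\;
          \operatorname{clip}\!\left(\rho_{i,t},\,1-\varepsilon,\,1+\varepsilon\right)\hat{A}_{i,t}
        \right)
        - \beta\,\mathbb{D}_{\mathrm{KL}}\!\left[\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right]
      \right)
    \right],
\qquad
\rho_{i,t} = \frac{\pi_\theta\!\left(o_{i,t}\mid q,\,o_{i,<t}\right)}
                  {\pi_{\theta_{\mathrm{old}}}\!\left(o_{i,t}\mid q,\,o_{i,<t}\right)},
\qquad
\hat{A}_{i,t} = \frac{r_i - \operatorname{mean}\!\left(\{r_j\}_{j=1}^{G}\right)}
                     {\operatorname{std}\!\left(\{r_j\}_{j=1}^{G}\right)}.
```

In this outcome-supervised form every token of a rollout shares the same advantage. A step-level variant would presumably replace the shared advantage with a per-step quantity, but how HiFPO does so is exactly what the referee asks the authors to specify.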

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents an empirical system (MobileGym + HiFPO) whose central claims are measured Pass@3 and success rates on external benchmarks (AndroidWorld, MobileWorld). No equations, derivations, or self-citations are shown to reduce these reported outcomes to quantities fitted inside the same loop or to rename inputs as predictions; the adaptation results are presented as measured consequences of the described process on held-out tasks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The central claim rests on the new system components MobileGym and HiFPO plus standard reinforcement-learning assumptions for turning automatic feedback into policy updates; no numerical free parameters are named in the abstract.

free parameters (1)
  • HiFPO and GRPO hyperparameters
    Policy optimization methods of this type require multiple hyperparameters whose values are not stated in the abstract.
axioms (1)
  • domain assumption: Automatically generated step-level feedback and corrective hints constitute valid training signals for GRPO updates in GUI tasks.
    The HiFPO description in the abstract invokes this without further justification.
invented entities (2)
  • MobileGym (no independent evidence)
    purpose: Grounds task generation and rollout evaluation in real mobile app interaction
    New environment component introduced to supply annotation-free data.
  • HiFPO (no independent evidence)
    purpose: Converts trajectory outcomes, step-level feedback, and hints into hint-contextualized GRPO updates
    New optimization procedure proposed in the paper.


