Pith · machine review for the scientific record

arxiv: 2605.00642 · v3 · submitted 2026-05-01 · 💻 cs.AI · cs.CV

Recognition: 2 Lean theorem links

Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding

Authors on Pith · no claims yet

Pith reviewed 2026-05-12 01:58 UTC · model grok-4.3

classification 💻 cs.AI cs.CV
keywords GUI grounding · self-distillation · on-policy learning · vision-language models · reinforcement learning · coordinate prediction · autonomous agents

The pith

On-policy self-distillation with visual context improves GUI grounding accuracy and efficiency over reinforcement learning methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GUI-SD, a framework that applies on-policy self-distillation to GUI grounding. Instead of relying on the multiple expensive rollouts of GRPO-style methods, it derives dense supervision from a single rollout: the teacher receives a privileged context built from the target bounding box and a Gaussian soft mask, and the distillation loss weights tokens by digit significance and teacher confidence (measured via entropy). The approach is shown to outperform baselines on six benchmarks while being more efficient. Readers training vision-language models for agentic tasks will find this relevant because it reduces the computational cost of learning precise coordinate prediction.
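
The single-rollout recipe can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the tensor shapes, the softmax/KL arrangement, and the function names are assumptions.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_reverse_kl(student_logits, teacher_logits, weights):
    """Per-token reverse KL(student || teacher), weighted and averaged.

    Both logit tensors are (T, V): scores over the vocabulary for the T
    tokens of one student rollout. The teacher rescores the same tokens
    while conditioning on the privileged visual context. Uniform weights
    recover the naive OPSD objective.
    """
    p = softmax(student_logits)   # on-policy student distribution
    q = softmax(teacher_logits)   # privileged teacher distribution
    per_token = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1)
    return float((weights * per_token).sum() / weights.sum())
```

With identical student and teacher logits the loss is zero; GUI-SD's additions amount to making `weights` non-uniform (entropy-guided) and the teacher's context visually enriched.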

Core claim

GUI-SD demonstrates that on-policy self-distillation can be effectively adapted to GUI grounding by constructing a visually enriched privileged context for the teacher using a target bounding box and Gaussian soft mask, combined with entropy-guided weighting of tokens based on digit significance and teacher confidence, leading to consistent improvements in accuracy and training efficiency over GRPO-based methods and naive OPSD on six representative benchmarks.

What carries the argument

The GUI-SD framework's visually enriched privileged context (target bounding box plus Gaussian soft mask) and entropy-guided distillation that weights tokens by digit significance and teacher confidence.
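
As a concrete picture of that privileged context, here is a minimal sketch of a Gaussian soft mask peaked at the target box. Tying the spread to the box size via `sigma_scale` is a hypothetical parameterization; the text quoted here does not specify one.

```python
import numpy as np

def gaussian_soft_mask(h, w, bbox, sigma_scale=0.5):
    """Soft spatial mask peaked at the centre of the target bounding box.

    bbox = (x0, y0, x1, y1) in pixels. sigma_scale (assumed here) ties the
    Gaussian spread to the box dimensions.
    """
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    sx = max(sigma_scale * (x1 - x0), 1.0)
    sy = max(sigma_scale * (y1 - y0), 1.0)
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-(((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2) / 2.0)

# The mask peaks at the box centre and decays smoothly, so the teacher sees
# an approximate location cue rather than the coordinate string itself.
mask = gaussian_soft_mask(100, 100, (40, 40, 60, 60))
assert np.unravel_index(mask.argmax(), mask.shape) == (50, 50)
```

Even this soft version still peaks at the ground-truth target, which is precisely the leakage question the referee report below presses on.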

If this is right

  • GUI-SD outperforms GRPO-based methods in accuracy across six benchmarks.
  • It requires fewer computational resources by avoiding multiple rollouts.
  • Entropy-guided distillation focuses learning on significant digits and confident predictions.
  • The method provides a dense supervision signal from a single rollout, even on hard samples where GRPO's reward is zero.
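
The entropy-guided weighting in the bullets above can be illustrated with a toy weight function. The exact scheme is the paper's; this sketch only combines the two ingredients its abstract names, digit place value and teacher confidence derived from entropy, under assumed formulas.

```python
import numpy as np

def token_weights(teacher_probs, tokens, base=10.0):
    """Toy entropy-guided weights: digit significance x teacher confidence.

    teacher_probs: (T, V) teacher distributions over the vocabulary.
    tokens: decoded token strings; within a run of digits the leftmost
    (most significant) digit gets the largest weight.
    """
    V = teacher_probs.shape[-1]
    entropy = -(teacher_probs * np.log(teacher_probs + 1e-12)).sum(axis=-1)
    confidence = 1.0 - entropy / np.log(V)   # 1 = fully confident teacher
    sig = np.ones(len(tokens))
    place = 0
    for i in reversed(range(len(tokens))):   # scan right to left
        if tokens[i].isdigit():
            sig[i] = base ** place           # hundreds > tens > ones
            place += 1
        else:
            place = 0
    return sig * confidence
```

Under this sketch, a confident teacher and a hundreds-place digit dominate the loss, while tokens the teacher is itself unsure about are down-weighted.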

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Extending GUI-SD to other multimodal grounding tasks could improve efficiency in agent training.
  • The privileged context design might inspire similar techniques in other self-distillation scenarios to prevent information leakage.
  • Testing GUI-SD on larger models or different architectures would reveal its scalability.

Load-bearing premise

The visually enriched privileged context supplies useful guidance to the teacher without leaking the exact target coordinates, and entropy-guided weighting reliably concentrates learning on the most impactful and reliable tokens.

What would settle it

An ablation study removing the Gaussian soft mask and showing performance dropping to the level of naive OPSD would falsify the claim that the privileged context provides non-leaking guidance.

Figures

Figures reproduced from arXiv: 2605.00642 by Can Ma, Daiqing Wu, Huawen Shen, Yan Zhang, Yu Zhou.

Figure 1. (a) GRPO requires expensive multiple rollouts and produces zero reward on hard samples. (b) Naive OPSD forwards the policy twice and distills via reverse KL between student and teacher logits with uniform per-token weight w = 1.0, yet suffers from distillation-to-SFT collapse and indiscriminate optimization. (c) Ours addresses both issues via visual privileged guidance and entropy-guided optimization. …
Figure 2. Per-token analysis of teacher and student predictions on incorrectly predicted tokens across …
Figure 3. Overview of the GUI-SD framework. (a) The teacher branch receives a privileged context …
Figure 4. Training dynamics of GUI-SD, Standard Reverse KL, and GRPO-Gaussian over optimization …
Original abstract

Graphical User Interface (GUI) grounding maps natural language instructions to the visual coordinates of target elements and serves as a core capability for autonomous GUI agents. Recent reinforcement learning methods (e.g., GRPO) have achieved strong performance, but they rely on expensive multiple rollouts and suffer from sparse signals on hard samples. These limitations make on-policy self-distillation (OPSD), which provides dense token-level supervision from a single rollout, a promising alternative. However, its applicability to GUI grounding remains unexplored. In this paper, we present GUI-SD, the first OPSD framework tailored for GUI grounding. First, it constructs a visually enriched privileged context for the teacher using a target bounding box and a Gaussian soft mask, providing informative guidance without leaking exact coordinates. Second, it employs entropy-guided distillation, which adaptively weights tokens based on digit significance and teacher confidence, concentrating optimization on the most impactful and reliable positions. Extensive experiments on six representative GUI grounding benchmarks show that GUI-SD consistently outperforms GRPO-based methods and naive OPSD in both accuracy and training efficiency. Code and training data are available at https://zhangyan-ucas.github.io/GUI-SD/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces GUI-SD as the first on-policy self-distillation (OPSD) framework for GUI grounding. It constructs a visually enriched privileged context for the teacher via the target bounding box plus a Gaussian soft mask, applies entropy-guided distillation that weights tokens by digit significance and teacher confidence, and reports consistent outperformance over GRPO-based methods and naive OPSD across six GUI grounding benchmarks in both accuracy and training efficiency.

Significance. If the empirical claims hold, the work offers a computationally lighter alternative to rollout-heavy RL methods for training GUI agents, addressing sparse reward issues on hard samples while maintaining dense token-level supervision. The public release of code and training data is a clear strength that supports reproducibility.

major comments (1)
  1. [Privileged context construction] The central methodological claim (abstract and method description) asserts that the Gaussian soft mask 'provides informative guidance without leaking exact coordinates.' Because the mask is centered on the ground-truth target, its spatial peak directly encodes approximate location information unavailable to the student or to naive OPSD. An ablation that recenters the mask at random locations (while preserving shape and variance) is required to confirm that reported gains over baselines are attributable to entropy-guided weighting and on-policy distillation rather than this privileged cue.
minor comments (1)
  1. [Abstract] The abstract would be strengthened by including at least one or two key quantitative deltas (e.g., average accuracy lift and training-time reduction) rather than only the qualitative statement of 'consistent outperformance.'
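
The ablation requested in the major comment is cheap to implement: keep the Gaussian's shape and variance but discard the ground-truth centre. A sketch, assuming a Gaussian parameterized by box size (a hypothetical choice, since the excerpt does not give the mask's construction):

```python
import numpy as np

def recentered_mask(h, w, bbox, rng, sigma_scale=0.5):
    """Control mask for the ablation: same Gaussian shape and variance as a
    box-centred privileged mask, but placed at a random location.

    If GUI-SD's gains persist with this mask, they stem from entropy-guided
    on-policy distillation; if they collapse toward naive OPSD, the original
    mask was leaking target location. sigma_scale is an assumed knob.
    """
    x0, y0, x1, y1 = bbox
    sx = max(sigma_scale * (x1 - x0), 1.0)         # spread from box width
    sy = max(sigma_scale * (y1 - y0), 1.0)         # spread from box height
    cx, cy = rng.uniform(0, w), rng.uniform(0, h)  # random centre, not GT
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-(((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2) / 2.0)
```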

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and detailed review of our manuscript. We address the single major comment below.

Point-by-point responses
  1. Referee: [Privileged context construction] The central methodological claim (abstract and method description) asserts that the Gaussian soft mask 'provides informative guidance without leaking exact coordinates.' Because the mask is centered on the ground-truth target, its spatial peak directly encodes approximate location information unavailable to the student or to naive OPSD. An ablation that recenters the mask at random locations (while preserving shape and variance) is required to confirm that reported gains over baselines are attributable to entropy-guided weighting and on-policy distillation rather than this privileged cue.

    Authors: We appreciate the referee's observation. The Gaussian soft mask is indeed centered on the ground-truth target, which provides a soft spatial prior unavailable to the student or naive OPSD. While the mask remains probabilistic and does not encode exact pixel coordinates, it does convey approximate location information. To rigorously isolate this effect from the entropy-guided weighting and on-policy distillation, we agree that the proposed ablation (recentering the mask at random locations while preserving shape and variance) is necessary. We will conduct this experiment and include the results in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework validated on external benchmarks

full rationale

The paper introduces GUI-SD as an on-policy self-distillation method for GUI grounding, consisting of a privileged teacher context (bounding box + Gaussian mask) and entropy-guided token weighting. These are presented as design choices, not derived predictions. Performance claims rest entirely on comparative experiments across six independent benchmarks against GRPO and naive OPSD baselines. No equations, fitted parameters renamed as predictions, or self-citations appear in the provided text. The derivation chain is absent; the work is self-contained via external empirical evaluation rather than internal reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on standard supervised-distillation assumptions plus one domain-specific assumption about non-leaking privileged context; no new physical entities are postulated and the only free parameters are typical training hyperparameters.

free parameters (1)
  • Gaussian mask spread
    The width of the soft mask around the target box is a tunable hyperparameter required to construct the privileged context.
axioms (1)
  • domain assumption Privileged visual context supplies useful guidance without coordinate leakage
    Invoked when constructing the teacher input from the target bounding box and Gaussian mask.

pith-pipeline@v0.9.0 · 5515 in / 1217 out tokens · 61802 ms · 2026-05-12T01:58:21.866201+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 17 internal anchors

  1. [1]

    Qwen3-VL Technical Report

    Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025

  2. [2]

    GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents

    Chen Chen, Jiawei Shao, Dakuan Lu, Haoyi Hu, Xiangcheng Liu, Hantao Yao, and Wu Liu. Gui-eyes: Tool-augmented perception for visual grounding in gui agents.arXiv preprint arXiv:2601.09770, 2026

  3. [3]

    UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning

    Liangyu Chen, Hanzhang Zhou, Chenglin Cai, Jianan Zhang, Panrong Tong, Quyu Kong, Xu Zhang, Chen Liu, Yuqi Liu, Wenxuan Wang, et al. Ui-ins: Enhancing gui grounding with multi-perspective instruction-as-reasoning.arXiv preprint arXiv:2510.20286, 2025

  4. [4]

    Seeclick: Harnessing gui grounding for advanced visual gui agents

    Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Li YanTao, Jianbing Zhang, and Zhiyong Wu. Seeclick: Harnessing gui grounding for advanced visual gui agents. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9313–9332, 2024

  5. [5]

    WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

    Sicheng Fan, Qingyun Shi, Shengze Xu, Shengbo Cai, Tieyong Zeng, Li Ling, Yanyi Shang, and Dehan Kong. Webfactory: Automated compression of foundational language intelligence into grounded web agents.arXiv preprint arXiv:2603.05044, 2026

  6. [6]

    Gui-bee: Align gui action grounding to novel environments via autonomous exploration

    Yue Fan, Handong Zhao, Ruiyi Zhang, Yu Shen, Xin Eric Wang, and Gang Wu. Gui-bee: Align gui action grounding to novel environments via autonomous exploration. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 33249–33266, 2025

  7. [7]

    Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

    Boyu Gou, Ruohan Wang, Boyuan Zheng, Yanan Xie, Cheng Chang, Yiheng Shu, Huan Sun, and Yu Su. Navigating the digital world as humans do: Universal visual grounding for gui agents.arXiv preprint arXiv:2410.05243, 2024

  8. [8]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025

  9. [9]

    MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

    Tanmay Gupta, Piper Wolters, Zixian Ma, Peter Sushko, Rock Yuren Pang, Diego Llanes, Yue Yang, Taira Anderson, Boyuan Zheng, Zhongzheng Ren, et al. Molmoweb: Open visual web agent and open data for the open web.arXiv preprint arXiv:2604.08516, 2026

  10. [10]

    Distilling the Knowledge in a Neural Network

    Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015

  11. [11]

    MobileIPL: Enhancing Mobile Agents' Thinking Process via Iterative Preference Learning

    Kun Huang, Weikai Xu, Yuxuan Liu, Quandong Wang, Pengzhi Gao, Wei Liu, Jian Luan, Bin Wang, and Bo An. Mobileipl: Enhancing mobile agents thinking process via iterative preference learning.arXiv preprint arXiv:2505.12299, 2025

  12. [12]

    Reinforcement Learning via Self-Distillation

    Jonas Hübotter, Frederike Lübeck, Lejs Behric, Anton Baumann, Marco Bagatella, Daniel Marta, Ido Hakimi, Idan Shenfeld, Thomas Kleine Buening, Carlos Guestrin, et al. Reinforcement learning via self-distillation. arXiv preprint arXiv:2601.20802, 2026

  13. [13]

    Todi: Token-Wise Distillation via Fine-Grained Divergence Control

    Seongryong Jung, Suwan Yoon, DongGeon Kim, and Hwanhee Lee. Todi: Token-wise distillation via fine-grained divergence control. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 8089–8102, 2025

  14. [14]

    GUIRLVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning

    Weitai Kang, Bin Lei, Gaowen Liu, Caiwen Ding, and Yan Yan. Guirlvg: Incentivize gui visual grounding via empirical exploration on reinforcement learning.arXiv preprint arXiv:2508.04389, 2025

  15. [15]

    Trust the Uncertain Teacher: Distilling Dark Knowledge via Calibrated Uncertainty

    Jeonghyun Kim, SooKyung Kim, Richeng Xuan, and Hyunsoo Cho. Trust the uncertain teacher: distilling dark knowledge via calibrated uncertainty.arXiv preprint arXiv:2602.12687, 2026

  16. [16]

    ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents

    Hanyu Lai, Xiao Liu, Yanxiao Zhao, Han Xu, Hanchen Zhang, Bohao Jing, Yanyu Ren, Shuntian Yao, Yuxiao Dong, and Jie Tang. Computerrl: Scaling end-to-end online reinforcement learning for computer use agents.arXiv preprint arXiv:2508.14040, 2025

  17. [17]

    Screenspot-pro: Gui grounding for professional high-resolution computer use

    Kaixin Li, Ziyang Meng, Hongzhan Lin, Ziyang Luo, Yuchen Tian, Jing Ma, Zhiyong Huang, and Tat-Seng Chua. Screenspot-pro: Gui grounding for professional high-resolution computer use. InProceedings of the 33rd ACM International Conference on Multimedia, pages 8778– 8786, 2025

  18. [18]

    Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

    Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Cheng Qian, Tianyu Yu, Huan-ang Gao, Wenkai Yang, Zhiyuan Liu, et al. Rethinking on-policy distillation of large language models: Phenomenology, mechanism, and recipe.arXiv preprint arXiv:2604.13016, 2026

  19. [19]

    InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners

    Yuhang Liu, Pengxiang Li, Congkai Xie, Xavier Hu, Xiaotian Han, Shengyu Zhang, Hongxia Yang, and Fei Wu. Infigui-r1: Advancing multimodal gui agents from reactive actors to deliberative reasoners.arXiv preprint arXiv:2504.14239, 2025

  20. [20]

    ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

    Zhaoyang Liu, JingJing Xie, Zichen Ding, Zehao Li, Bowen Yang, Zhenyu Wu, Xuehui Wang, Qiushi Sun, Shi Liu, Weiyun Wang, et al. Scalecua: Scaling open-source computer use agents with cross-platform data.arXiv preprint arXiv:2509.15221, 2025

  21. [21]

    Zoom to Essence: Trainless GUI Grounding by Inferring upon Interface Elements

    Ziwei Liu, Tao Feng, Borui Kang, Yanbing Yang, and Jun Luo. Zoom to essence: Trainless gui grounding by inferring upon interface elements.arXiv preprint arXiv:2603.14448, 2026

  22. [22]

    Ui-r1: Enhancing efficient action prediction of gui agents by reinforcement learning

    Zhengxi Lu, Yuxiang Chai, Yaxuan Guo, Xi Yin, Liang Liu, Hao Wang, Han Xiao, Shuai Ren, Pengxiang Zhao, Guangyi Liu, et al. Ui-r1: Enhancing efficient action prediction of gui agents by reinforcement learning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 17608–17616, 2026

  23. [23]

    GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents

    Run Luo, Lu Wang, Wanwei He, Longze Chen, Jiaming Li, and Xiaobo Xia. Gui-r1: A generalist r1-style vision-language action model for gui agents.arXiv preprint arXiv:2504.10458, 2025

  24. [24]

    UI-Vision: A Desktop-Centric GUI Benchmark for Visual Perception and Interaction

    Shravan Nayak, Xiangru Jian, Kevin Qinghong Lin, Juan A Rodriguez, Montek Kalsi, Rabiul Awal, Nicolas Chapados, M Tamer Özsu, Aishwarya Agrawal, David Vazquez, et al. Ui- vision: A desktop-centric gui benchmark for visual perception and interaction.arXiv preprint arXiv:2503.15661, 2025

  25. [25]

    UI-TARS: Pioneering Automated GUI Interaction with Native Agents

    Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, et al. Ui-tars: Pioneering automated gui interaction with native agents.arXiv preprint arXiv:2501.12326, 2025

  26. [26]

    POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration

    Yuxiao Qu, Amrith Setlur, Virginia Smith, Ruslan Salakhutdinov, and Aviral Kumar. Pope: Learning to reason on hard problems via privileged on-policy exploration.arXiv preprint arXiv:2601.18779, 2026

  27. [27]

    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300, 2024

  28. [28]

    Self-Distillation Enables Continual Learning

    Idan Shenfeld, Mehul Damani, Jonas Hübotter, and Pulkit Agrawal. Self-distillation enables continual learning. arXiv preprint arXiv:2601.19897, 2026

  29. [29]

    A Survey of On-Policy Distillation for Large Language Models

    Mingyang Song and Mao Zheng. A survey of on-policy distillation for large language models. arXiv preprint arXiv:2604.00626, 2026

  30. [30]

    Expanding the Capabilities of Reinforcement Learning via Text Feedback

    Yuda Song, Lili Chen, Fahim Tajwar, Remi Munos, Deepak Pathak, J Andrew Bagnell, Aarti Singh, and Andrea Zanette. Expanding the capabilities of reinforcement learning via text feedback.arXiv preprint arXiv:2602.02482, 2026

  31. [31]

    Ea-kd: Entropy-based adaptive knowledge distillation

    Chi-Ping Su, Ching-Hsun Tseng, Bin Pu, Lei Zhao, Jiewen Yang, Zhuangzhuang Chen, and Shin-Jye Lee. Ea-kd: Entropy-based adaptive knowledge distillation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 731–740, 2025

  32. [32]

    Gui-g2: Gaussian reward modeling for gui grounding

    Fei Tang, Zhangxuan Gu, Zhengxi Lu, Xuyang Liu, Shuheng Shen, Changhua Meng, Wen Wang, Wenqi Zhang, Yongliang Shen, Weiming Lu, et al. Gui-g2: Gaussian reward modeling for gui grounding. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 33214–33222, 2026

  33. [33]

    LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization

    Jiaqi Tang, Yu Xia, Yi-Feng Wu, Yuwei Hu, Yuhui Chen, Qing-Guo Chen, Xiaogang Xu, Xiangyu Wu, Hao Lu, Yanqing Ma, et al. Lpo: Towards accurate gui agent interaction via location preference optimization.arXiv preprint arXiv:2506.09373, 2025

  34. [34]

    UI-Venus-1.5 Technical Report

    Venus Team, Changlong Gao, Zhangxuan Gu, Yulin Liu, Xinyu Qiu, Shuheng Shen, Yue Wen, Tianyu Xia, Zhenyu Xu, Zhengwen Zeng, et al. Ui-venus-1.5 technical report.arXiv preprint arXiv:2602.09082, 2026

  35. [35]

    Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models

    Hao Wang, Hao Gu, Hongming Piao, Kaixiong Gong, Yuxiao Ye, Xiangyu Yue, Sirui Han, Yike Guo, and Dapeng Wu. Learning while staying curious: Entropy-preserving supervised fine-tuning via adaptive self-distillation for large reasoning models. arXiv preprint arXiv:2602.02244, 2026

  36. [36]

    Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding

    Wenkai Wang, Xiyun Li, Hongcan Guo, Wenhao Yu, Tianqing Fang, Haitao Mi, Dong Yu, and Shengyu Zhang. Measure twice, click once: Co-evolving proposer and visual critic via reinforcement learning for gui grounding.arXiv preprint arXiv:2604.21268, 2026

  37. [37]

    MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

    Xuehui Wang, Zhenyu Wu, JingJing Xie, Zichen Ding, Bowen Yang, Zehao Li, Zhaoyang Liu, Qingyun Li, Xuan Dong, Zhe Chen, et al. Mmbench-gui: Hierarchical multi-platform evaluation framework for gui agents.arXiv preprint arXiv:2507.19478, 2025

  38. [38]

    Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception

    Lai Wei, Liangbo He, Jun Lan, Lingzhong Dong, Yutong Cai, Siyuan Li, Huijia Zhu, Weiqiang Wang, Linghe Kong, Yue Wang, et al. Zooming without zooming: Region-to-image distillation for fine-grained multimodal perception.arXiv preprint arXiv:2602.11858, 2026

  39. [39]

    GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

    Qianhui Wu, Kanzhi Cheng, Rui Yang, Chaoyun Zhang, Jianwei Yang, Huiqiang Jiang, Jian Mu, Baolin Peng, Bo Qiao, Reuben Tan, et al. Gui-actor: Coordinate-free visual grounding for gui agents.arXiv preprint arXiv:2506.03143, 2025

  40. [40]

    OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

    Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, et al. Os-atlas: A foundation action model for generalist gui agents, 2024.URL https://arxiv. org/abs/2410.23218

  41. [42]

    Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

    Tianbao Xie, Jiaqi Deng, Xiaochuan Li, Junlin Yang, Haoyuan Wu, Jixuan Chen, Wenjing Hu, Xinyuan Wang, Yuhui Xu, Zekun Wang, et al. Scaling computer-use grounding via user interface decomposition and synthesis.arXiv preprint arXiv:2505.13227, 2025

  42. [43]

    Mobilerl: Online agentic reinforcement learning for mobile gui agents

    Yifan Xu, Xiao Liu, Xinghan Liu, Jiaqi Fu, Hanchen Zhang, Bohao Jing, Shudan Zhang, Yuting Wang, Wenyi Zhao, and Yuxiao Dong. Mobilerl: Online agentic reinforcement learning for mobile gui agents.arXiv preprint arXiv:2509.18119, 2025

  43. [44]

    Self-Distilled RLVR

    Chenxu Yang, Chuanyu Qin, Qingyi Si, Minghui Chen, Naibin Gu, Dingyu Yao, Zheng Lin, Weiping Wang, Jiaqi Wang, and Nan Duan. Self-distilled rlvr. arXiv preprint arXiv:2604.03128, 2026

  44. [45]

    GTA1: GUI Test-Time Scaling Agent

    Yan Yang, Dongxu Li, Yutong Dai, Yuhao Yang, Ziyang Luo, Zirui Zhao, Zhiyuan Hu, Junzhe Huang, Amrita Saha, Zeyuan Chen, et al. Gta1: Gui test-time scaling agent.arXiv preprint arXiv:2507.05791, 2025

  45. [48]

    Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning

    Xinbin Yuan, Jian Zhang, Kaixin Li, Zhuoxuan Cai, Lujian Yao, Jie Chen, Enguang Wang, Qibin Hou, Jinwei Chen, Peng-Tao Jiang, et al. Enhancing visual grounding for gui agents via self-evolutionary reinforcement learning.arXiv preprint arXiv:2505.12370, 2025

  46. [49]

    Fdc-ground: Improving grpo for gui grounding via exponential rewards and fact-aligned pruning

    Xiangjian Zeng, Wenjing Li, Qingqiang Wu, and Liang Zhang. Fdc-ground: Improving grpo for gui grounding via exponential rewards and fact-aligned pruning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 28122–28130, 2026

  47. [50]

    TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials

    Bofei Zhang, Zirui Shang, Zhi Gao, Wang Zhang, Rui Xie, Xiaojian Ma, Tao Yuan, Xinxiao Wu, Song-Chun Zhu, and Qing Li. Tongui: Building generalized gui agents by learning from multimodal web tutorials.arXiv e-prints, pages arXiv–2504, 2025

  48. [51]

    HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration

    Shaojie Zhang, Pei Fu, Ruoceng Zhang, Jiahui Yang, Anan Du, Xiuwen Xi, Shaokang Wang, Ying Huang, Bin Qin, Zhenbo Luo, et al. Hyperclick: Advancing reliable gui grounding via uncertainty calibration.arXiv preprint arXiv:2510.27266, 2025

  49. [52]

    Btl-ui: Blink-think-link reasoning model for gui agent

    Shaojie Zhang, Ruoceng Zhang, Pei Fu, Shaokang Wang, Jiahui Yang, Xin Du, Shiqi Cui, Bin Qin, Ying Huang, Zhenbo Luo, et al. Btl-ui: Blink-think-link reasoning model for gui agent. arXiv preprint arXiv:2509.15566, 2025

  50. [53]

    OPSDL: On-Policy Self-Distillation for Long-Context Language Models

    Xinsen Zhang, Zhenkai Ding, Tianjun Pan, Run Yang, Chun Kang, Xue Xiong, and Jingnan Gu. Opsdl: On-policy self-distillation for long-context language models.arXiv preprint arXiv:2604.17535, 2026

  51. [54]

    Co-EPG: A Framework for Co-Evolution of Planning and Grounding in Autonomous GUI Agents

    Yuan Zhao, Hualei Zhu, Tingyu Jiang, Shen Li, Xiaohang Xu, and Hao Henry Wang. Co-epg: A framework for co-evolution of planning and grounding in autonomous gui agents. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 36582–36590, 2026

  52. [55]

    GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents

    Yuqi Zhou, Sunhao Dai, Shuai Wang, Kaiwen Zhou, Qinglin Jia, and Jun Xu. Gui-g1: Understanding r1-zero-like training for visual grounding in gui agents. arXiv preprint arXiv:2505.15810, 2025