LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization
Pith reviewed 2026-05-19 09:43 UTC · model grok-4.3
The pith
Location Preference Optimization improves GUI agent accuracy by rewarding positions based on physical distance and information entropy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LPO optimizes interaction preferences from locational data. It uses information entropy to focus on information-rich zones, and a dynamic location reward based on physical distance to reflect the varying importance of interaction positions. Training is supported by Group Relative Preference Optimization (GRPO), with the aim of improving interaction precision across GUI environments.
What carries the argument
Location Preference Optimization (LPO), a method that selects zones via information entropy and scores actions with a physical-distance reward, then trains via Group Relative Preference Optimization.
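As a rough illustration of the group-relative step, the sketch below normalizes each sampled action's reward against the mean and standard deviation of its own group, which is the advantage computation generally associated with GRPO-style training. The function name, the toy rewards, and the choice of the population standard deviation are assumptions made for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar per sampled action for the same instruction.
    `eps` (an assumed constant) avoids division by zero when all rewards tie.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group: four sampled click predictions scored by some location reward.
rewards = [0.9, 0.4, 0.1, 0.6]
advantages = group_relative_advantages(rewards)
best = max(range(len(rewards)), key=lambda i: advantages[i])
```

Samples that score above the group mean receive positive advantages and those below receive negative ones, so the signal depends only on how a prediction ranks within its own group.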
If this is right
- Higher success rates on offline GUI agent benchmarks compared with prior supervised and reinforcement methods.
- State-of-the-art results on real-world online evaluations of live interface interactions.
- More thorough exploration of GUI states during training, leading to better positional choices.
- Reduced need for manual tuning when moving the agent across different applications.
Where Pith is reading between the lines
- The same distance-plus-entropy reward structure could be tested on non-screen interfaces that still require spatial actions, such as robotic arms or AR overlays.
- If the method generalizes, it might cut the volume of labeled demonstrations needed to train new GUI agents.
- Running the approach on mobile versus desktop layouts would test whether entropy selection stays unbiased across screen densities.
- Pairing LPO with additional visual features could further tighten the distance-based reward signal.
Load-bearing premise
That a reward function based on physical distance between predicted and target locations, combined with entropy-based zone selection, provides a reliable and generalizable signal for positional accuracy without introducing bias toward particular GUI layouts or requiring extensive per-app tuning.
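To make the premise concrete, here is one possible shape for a distance-based location reward. The paper's actual formula is not given in this review, so everything here is hypothetical: the exponential decay, the normalization by the screen diagonal, and the `sigma` scaling constant are all assumptions chosen only to show what such a reward could look like.

```python
import math


def distance_reward(pred, target, screen_w, screen_h, sigma=0.1):
    """Hypothetical reward in (0, 1]: 1.0 at the target, decaying with distance.

    The pixel distance is divided by the screen diagonal so that the same
    relative miss scores the same on different resolutions. `sigma` is an
    assumed scaling constant, not a value from the paper.
    """
    diagonal = math.hypot(screen_w, screen_h)
    d = math.hypot(pred[0] - target[0], pred[1] - target[1]) / diagonal
    return math.exp(-d / sigma)


target = (960, 540)
near = distance_reward((970, 545), target, 1920, 1080)
far = distance_reward((1400, 900), target, 1920, 1080)
```

A form like this is monotonic in distance, which matches the premise, but the choice of `sigma` controls how sharply misses are penalized and is exactly the kind of free parameter whose value would need to be reported.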
What would settle it
Direct comparison of positional error rates or task success rates on the paper's offline benchmarks when LPO is replaced by plain supervised fine-tuning or standard reinforcement learning; if the gap disappears, the central claim does not hold.
Original abstract
The advent of autonomous agents is transforming interactions with Graphical User Interfaces (GUIs) by employing natural language as a powerful intermediary. Despite the predominance of Supervised Fine-Tuning (SFT) methods in current GUI agents for achieving spatial localization, these methods face substantial challenges due to their limited capacity to accurately perceive positional data. Existing strategies, such as reinforcement learning, often fail to assess positional accuracy effectively, thereby restricting their utility. In response, we introduce Location Preference Optimization (LPO), a novel approach that leverages locational data to optimize interaction preferences. LPO uses information entropy to predict interaction positions by focusing on zones rich in information. Besides, it further introduces a dynamic location reward function based on physical distance, reflecting the varying importance of interaction positions. Supported by Group Relative Preference Optimization (GRPO), LPO facilitates an extensive exploration of GUI environments and significantly enhances interaction precision. Comprehensive experiments demonstrate LPO's superior performance, achieving SOTA results across both offline benchmarks and real-world online evaluations. Our code will be made publicly available soon, at https://github.com/jqtangust/LPO.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Location Preference Optimization (LPO) for improving spatial localization in GUI agents. It combines information entropy to identify high-information zones for position prediction with a dynamic location reward based on physical distance, optimized under Group Relative Preference Optimization (GRPO). The central claim is that this yields superior performance, achieving SOTA results on both offline benchmarks and real-world online evaluations.
Significance. If the results hold after detailed validation, LPO could meaningfully advance GUI agent reliability by supplying a more targeted preference signal for positional accuracy than standard SFT or generic RL approaches. The entropy-driven zone selection and distance-based reward constitute a concrete attempt to address a known weakness in current methods; the planned public code release would further strengthen the contribution.
major comments (2)
- [§3 (Method)] The dynamic location reward is described as a function of physical distance, yet its exact mathematical form, normalization procedure, and any scaling constants are not specified. These constants are identified as free parameters in the supporting analysis; without their explicit definition the reward signal cannot be reproduced or checked for layout-specific bias.
- [§4 (Experiments)] No quantitative metrics, ablation results isolating the entropy zone selector versus the distance reward, or error analysis stratified by element size, density, or screen resolution are referenced. The SOTA claim on both offline and online settings rests on these missing controls; the skeptic concern that Euclidean distance may be a poor proxy for large tappable regions therefore remains unaddressed.
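The concern about Euclidean distance and large tappable regions can be shown with a small sketch. It compares the distance from a predicted click to an element's center with a simple inside-the-bounding-box check. The element boxes and click coordinates are made up for illustration, and measuring distance to the center is an assumption; the review does not say which reference point the paper uses.

```python
import math


def center_distance(pred, box):
    """Euclidean distance in pixels from a click to the center of a box."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    return math.hypot(pred[0] - cx, pred[1] - cy)


def inside(pred, box):
    """True if the click lands within the (x0, y0, x1, y1) bounding box."""
    x0, y0, x1, y1 = box
    return x0 <= pred[0] <= x1 and y0 <= pred[1] <= y1


# Hypothetical elements: a wide banner button and a small toolbar icon.
wide_button = (0, 0, 800, 60)
small_icon = (0, 0, 24, 24)

hit_far = (780, 30)    # valid click near the right edge of the wide button
miss_near = (30, 12)   # just outside the small icon

far_but_valid = (center_distance(hit_far, wide_button), inside(hit_far, wide_button))
near_but_invalid = (center_distance(miss_near, small_icon), inside(miss_near, small_icon))
```

In this toy case the valid click sits much farther from its element's center than the invalid one does from its own, so a reward that depends on center distance alone would rank them in the wrong order unless it also accounts for element extent.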
minor comments (2)
- [Abstract] The relationship between GRPO and prior preference-optimization algorithms (DPO, PPO, etc.) should be stated with citations so readers can assess novelty.
- [Throughout] Notation: Define all acronyms (SFT, GRPO, LPO) at first use and ensure consistent use of “location” versus “positional” terminology throughout.
Simulated Author's Rebuttal
We sincerely thank the referee for the thorough review and valuable feedback on our manuscript introducing Location Preference Optimization (LPO). The comments have helped us identify areas for improvement in clarity and completeness. We address each major comment below and have updated the manuscript to incorporate the suggested changes where appropriate.
Point-by-point responses
- Referee: [§3 (Method)] The dynamic location reward is described as a function of physical distance, yet its exact mathematical form, normalization procedure, and any scaling constants are not specified. These constants are identified as free parameters in the supporting analysis; without their explicit definition the reward signal cannot be reproduced or checked for layout-specific bias.
  Authors: We agree that providing the exact mathematical form is necessary for reproducibility. In the revised manuscript, we have explicitly specified the dynamic location reward function in Section 3, including its dependence on physical distance, the normalization procedure to ensure scale-invariance across different screen sizes, and the values of scaling constants used. This allows readers to reproduce the reward signal and assess any potential layout-specific biases. We have also added a brief analysis of the reward's sensitivity to these parameters.
  revision: yes
- Referee: [§4 (Experiments)] No quantitative metrics, ablation results isolating the entropy zone selector versus the distance reward, or error analysis stratified by element size, density, or screen resolution are referenced. The SOTA claim on both offline and online settings rests on these missing controls; the skeptic concern that Euclidean distance may be a poor proxy for large tappable regions therefore remains unaddressed.
  Authors: We thank the referee for this important suggestion. Although the main experimental results show SOTA performance, we recognize that more detailed ablations and analyses would better isolate the contributions of each component and address potential limitations of the distance-based reward. In the revised version, we have included quantitative ablation studies comparing variants with and without the entropy zone selector and the distance reward. We have also added error analyses stratified by element size, UI density, and screen resolution. To address the concern about Euclidean distance for large tappable regions, we discuss this limitation and show through additional metrics that our method maintains advantages even in such cases. These revisions provide stronger support for our claims.
  revision: yes
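A stratified error analysis of the kind discussed in this exchange could be organized along the following lines. This is only a sketch: the bucket thresholds, the hit criterion (click inside the bounding box), and the four toy records are all assumptions for illustration and do not reproduce any result from the paper.

```python
from collections import defaultdict


def size_bucket(box, small=32 * 32, large=128 * 128):
    """Assign a box to a size bucket by pixel area (thresholds are assumed)."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area < small:
        return "small"
    if area < large:
        return "medium"
    return "large"


def hit_rate_by_size(records):
    """records: iterable of (predicted_point, target_box) pairs."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for pred, box in records:
        x0, y0, x1, y1 = box
        bucket = size_bucket(box)
        totals[bucket] += 1
        hits[bucket] += int(x0 <= pred[0] <= x1 and y0 <= pred[1] <= y1)
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


# Made-up predictions, for illustration only.
records = [
    ((10, 10), (0, 0, 24, 24)),     # small element, hit
    ((40, 10), (0, 0, 24, 24)),     # small element, miss
    ((50, 50), (0, 0, 100, 100)),   # medium element, hit
    ((300, 20), (0, 0, 800, 60)),   # large element, hit
]
rates = hit_rate_by_size(records)
```

Reporting per-bucket rates alongside the aggregate number would show whether gains are concentrated in one element-size regime, which is the control the referee asks for.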
Circularity Check
No significant circularity; derivation remains self-contained
The abstract presents LPO as a new method that combines entropy-based zone selection with a dynamic location reward based on physical distance, then applies GRPO for optimization. No equations, fitted parameters renamed as predictions, or self-referential definitions appear in the provided text. GRPO is invoked as supporting framework without any load-bearing self-citation chain or uniqueness theorem that reduces the central claim to prior author work by construction. The performance claims rest on experimental results rather than tautological re-labeling of inputs. This is the normal case of an independent proposal whose validity can be checked externally via the promised code and benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- scaling constants in dynamic location reward
axioms (2)
- domain assumption: Information entropy computed over screen regions identifies zones that are most informative for interaction decisions.
- domain assumption: Physical distance between predicted and target locations provides a monotonic and generalizable measure of interaction quality.
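The first assumption can be made tangible with a sketch that scores grid cells of a screenshot by Shannon entropy. How the paper actually computes entropy is not described in this review; the choice of a pixel-intensity histogram per fixed-size cell, the function names, and the tiny 4x4 toy image are all assumptions for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cell_entropies(image, cell):
    """Entropy of each `cell` x `cell` block of a grayscale image.

    `image` is a list of rows of integer intensities; keys of the result
    are the (top, left) pixel coordinates of each block.
    """
    height, width = len(image), len(image[0])
    result = {}
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            pixels = [
                image[y][x]
                for y in range(top, min(top + cell, height))
                for x in range(left, min(left + cell, width))
            ]
            result[(top, left)] = shannon_entropy(pixels)
    return result


# Toy image: flat background on the left, varied content on the right.
image = [
    [0, 0, 10, 200],
    [0, 0, 90, 30],
    [0, 0, 10, 200],
    [0, 0, 90, 30],
]
entropies = cell_entropies(image, cell=2)
richest = max(entropies, key=entropies.get)
```

Under this proxy, flat regions score zero and visually busy regions score high, which also hints at the risk the ledger flags: a dense but non-interactive area (for example a photograph) would be ranked as highly informative.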
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "LPO uses information entropy to predict interaction positions by focusing on zones rich in information... dynamic location reward function based on physical distance"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, LogicNat recovery
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "Supported by Group Relative Preference Optimization (GRPO)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 3 Pith papers
- Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding
  GUI-SD is the first on-policy self-distillation framework for GUI grounding that adds privileged bounding-box context and entropy-guided weighting to outperform GRPO methods on six benchmarks in accuracy and efficiency.
- Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding
  GUI-SD introduces on-policy self-distillation with visually enriched privileged context and entropy-guided weighting, outperforming GRPO and naive OPSD on six GUI grounding benchmarks while improving training efficiency.
- GUI Agents with Reinforcement Learning: Toward Digital Inhabitants
  The paper delivers the first comprehensive overview of RL for GUI agents, organizing methods into offline, online, and hybrid strategies while analyzing trends in rewards, efficiency, and deliberation to outline a fut...