Anonymization-Enhanced Privacy Protection for Mobile GUI Agents: Available but Invisible
Pith reviewed 2026-05-16 05:56 UTC · model grok-4.3
The pith
Anonymization replaces sensitive mobile UI content with semantic placeholders so GUI agents can complete tasks without seeing private data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework enforces available-but-invisible access: a PII detector identifies sensitive UI elements, a UI transformer substitutes them with placeholders such as PHONE_NUMBER#a1b2c, and a layered architecture of detector, transformer, secure interaction proxy, and privacy gatekeeper keeps raw data local while allowing the agent to operate over the anonymized view across instructions, XML hierarchies, and screenshots.
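One way to picture how the Secure Interaction Proxy keeps raw data local is as a rehydration step. The cloud agent emits actions that mention placeholders, and an on-device component swaps in the real values just before execution. The sketch below is a guess at that step, not the paper's code. The action-as-dict format, the `rehydrate_action` name, and the placeholder regex (an upper-case type tag, `#`, five hex digits, modeled on `PHONE_NUMBER#a1b2c`) are all hypothetical.

```python
import re
from typing import Dict

# Hypothetical placeholder syntax: an upper-case type tag, '#', then five hex digits.
PLACEHOLDER_RE = re.compile(r"\b[A-Z][A-Z_]*#[0-9a-f]{5}\b")


def rehydrate_action(action: Dict[str, object], vault: Dict[str, str]) -> Dict[str, object]:
    """Return a copy of an agent action with known placeholders replaced by raw values.

    Intended to run on-device only, so the result never goes back to the cloud agent.
    Placeholders missing from the vault are left unchanged rather than guessed.
    """
    def swap(match):
        token = match.group(0)
        return vault.get(token, token)

    rehydrated = {}
    for key, value in action.items():
        # Only string fields (e.g. text to type) can carry placeholders;
        # coordinates and other fields pass through untouched.
        rehydrated[key] = PLACEHOLDER_RE.sub(swap, value) if isinstance(value, str) else value
    return rehydrated


# Usage: the agent only ever saw "PHONE_NUMBER#a1b2c".
vault = {"PHONE_NUMBER#a1b2c": "+1 555 0100"}
print(rehydrate_action({"action": "type", "text": "Call PHONE_NUMBER#a1b2c"}, vault))
# {'action': 'type', 'text': 'Call +1 555 0100'}
```

The proxy leaves unknown placeholders as they are. Guessing a value would make the proxy itself a source of errors.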
What carries the argument
Deterministic type-preserving placeholders that replace detected PII while preserving semantic category information for multimodal agent reasoning.
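The following sketch shows what "deterministic and type-preserving" could mean in practice. It assumes the suffix is a truncated HMAC-SHA256 of the raw value under a device-local key. That keyed hash is a stand-in chosen for illustration; the paper's actual suffix scheme is not stated here. `PlaceholderVault` and its methods are hypothetical names.

```python
import hashlib
import hmac


class PlaceholderVault:
    """Deterministic, type-preserving placeholders plus a device-local reverse map.

    Assumption: HMAC-SHA256 under a device-local key yields the short suffix;
    the paper's exact scheme may differ.
    """

    def __init__(self, key: bytes, suffix_len: int = 5) -> None:
        self._key = key
        self._suffix_len = suffix_len
        self._reverse = {}  # placeholder -> raw value; never sent to the cloud agent

    def anonymize(self, entity_type: str, value: str) -> str:
        msg = f"{entity_type}:{value}".encode("utf-8")
        suffix = hmac.new(self._key, msg, hashlib.sha256).hexdigest()[: self._suffix_len]
        placeholder = f"{entity_type}#{suffix}"
        known = self._reverse.get(placeholder)
        if known is not None and known != value:
            # Two raw values truncated to the same suffix: refuse rather than merge them.
            raise ValueError("placeholder collision; increase suffix_len")
        self._reverse[placeholder] = value
        return placeholder

    def resolve(self, placeholder: str) -> str:
        """Map a placeholder back to its raw value (local use only)."""
        return self._reverse[placeholder]


vault = PlaceholderVault(key=b"device-local-secret")
p = vault.anonymize("PHONE_NUMBER", "+1 555 0100")
assert p == vault.anonymize("PHONE_NUMBER", "+1 555 0100")  # same input, same placeholder
print(p, "->", vault.resolve(p))
```

A keyed hash is the important choice here. An unkeyed hash of a short value such as a ten-digit phone number could be brute-forced from the placeholder alone. Truncating to five hex digits keeps tokens short, but it makes collisions possible, so the sketch checks for them explicitly.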
If this is right
- Privacy leakage drops substantially across several multimodal models.
- Task success rate declines only modestly on the evaluated benchmarks.
- The same anonymization applies consistently to user instructions, XML layouts, and screenshots.
- Narrowly scoped local computations can still be invoked when raw values are required.
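Consistency across modalities mostly comes down to building one value-to-placeholder mapping and reusing it everywhere. The sketch below assumes that a PII detector (not shown, hypothetical here) has already returned `(entity_type, value)` pairs, and that the UI hierarchy is an Android-style XML dump with `text` and `content-desc` attributes. Screenshots would need the same mapping applied to OCR boxes before re-rendering. That step is omitted to keep the sketch to the standard library.

```python
import hashlib
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def placeholder(entity_type: str, value: str, salt: str = "device-local-salt") -> str:
    # Assumed scheme for illustration: salted SHA-256, first five hex digits.
    digest = hashlib.sha256(f"{salt}:{entity_type}:{value}".encode("utf-8")).hexdigest()
    return f"{entity_type}#{digest[:5]}"


def build_mapping(detections: List[Tuple[str, str]]) -> Dict[str, str]:
    """One placeholder per detected raw value, shared by every modality."""
    return {value: placeholder(etype, value) for etype, value in detections}


def anonymize_text(text: str, mapping: Dict[str, str]) -> str:
    if not mapping:
        return text
    # Single regex pass, longest values first, so a replacement is never re-scanned.
    pattern = re.compile("|".join(re.escape(v) for v in sorted(mapping, key=len, reverse=True)))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def anonymize_xml(xml: str, mapping: Dict[str, str],
                  attrs: Tuple[str, ...] = ("text", "content-desc")) -> str:
    """Rewrite text-bearing attributes of a UI hierarchy dump with the same mapping."""
    root = ET.fromstring(xml)
    for node in root.iter():
        for attr in attrs:
            if attr in node.attrib:
                node.set(attr, anonymize_text(node.attrib[attr], mapping))
    return ET.tostring(root, encoding="unicode")


# Hypothetical detector output for one screen plus the user's instruction.
mapping = build_mapping([("PHONE_NUMBER", "555-0100"), ("PERSON", "Alice Chen")])
print(anonymize_text("Text Alice Chen at 555-0100", mapping))
print(anonymize_xml('<hierarchy><node text="Alice Chen" content-desc="call 555-0100"/></hierarchy>', mapping))
```

Because the instruction and the XML go through the same `mapping`, the agent sees the same `PERSON#…` token in both. That shared token is what lets it link "text Alice Chen" to the right on-screen element without seeing the name.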
Where Pith is reading between the lines
- The same placeholder technique could be applied to web or desktop GUI agents that face comparable screen-exposure risks.
- Users could be given controls to tune detection sensitivity for different categories of information.
- On-device detection models would further reduce the amount of raw screen data that ever leaves the device.
Load-bearing premise
The PII recognition model catches every sensitive element and the placeholders supply enough semantic detail for agents to reason correctly over the anonymized interface.
What would settle it
An experiment in which the agent either fails to complete tasks at the reported success rate or still leaks identifiable values through the anonymized screenshots or XML on the AndroidLab and PrivScreen benchmarks.
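A falsification test of this kind needs a leakage measure. A minimal version, assuming ground-truth sensitive values are known for each task, counts how many of them still appear in anything bound for the cloud: instructions, XML dumps, OCR text extracted from anonymized screenshots, and agent outputs. The whitespace/case normalization is an assumption, and the function names are hypothetical.

```python
from typing import Iterable, List


def _normalize(s: str) -> str:
    # Assumed normalization: lowercase and drop all whitespace, so "555 0100"
    # and "5550100" count as the same string.
    return "".join(s.lower().split())


def leakage_rate(sensitive_values: List[str], cloud_payloads: Iterable[str]) -> float:
    """Fraction of ground-truth sensitive values found verbatim in any cloud-bound payload
    (instructions, XML dumps, OCR text of screenshots, agent outputs)."""
    if not sensitive_values:
        return 0.0
    corpus = [_normalize(p) for p in cloud_payloads]
    leaked = sum(1 for v in sensitive_values if any(_normalize(v) in p for p in corpus))
    return leaked / len(sensitive_values)


values = ["555 0100", "alice@example.com", "42 Elm St"]
payloads = ["Type PHONE_NUMBER#a1b2c", "Found email Alice@Example.com in inbox"]
print(leakage_rate(values, payloads))  # 0.3333333333333333 (the email leaked)
```

Verbatim matching is only a lower bound. It misses partial leaks, such as the last four digits of a card, and paraphrased leaks.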
Original abstract
Mobile Graphical User Interface (GUI) agents have demonstrated strong capabilities in automating complex smartphone tasks by leveraging multimodal large language models (MLLMs) and system-level control interfaces. However, this paradigm introduces significant privacy risks, as agents typically capture and process entire screen contents, thereby exposing sensitive personal data such as phone numbers, addresses, messages, and financial information. Existing defenses either reduce UI exposure, obfuscate only task-irrelevant content, or rely on user authorization, but none can protect task-critical sensitive information while preserving seamless agent usability. We propose an anonymization-based privacy protection framework that enforces the principle of available-but-invisible access to sensitive data: sensitive information remains usable for task execution but is never directly visible to the cloud-based agent. Our system detects sensitive UI content using a PII-aware recognition model and replaces it with deterministic, type-preserving placeholders (e.g., PHONE_NUMBER#a1b2c) that retain semantic categories while removing identifying details. A layered architecture comprising a PII Detector, UI Transformer, Secure Interaction Proxy, and Privacy Gatekeeper ensures consistent anonymization across user instructions, XML hierarchies, and screenshots, mediates all agent actions over anonymized interfaces, and supports narrowly scoped local computations when reasoning over raw values is necessary. Extensive experiments on the AndroidLab and PrivScreen benchmarks show that our framework substantially reduces privacy leakage across multiple models while incurring only modest utility degradation, achieving the best observed privacy-utility trade-off among existing methods. Code available at: https://github.com/one-step-beh1nd/gui_privacy_protection
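The "narrowly scoped local computations" in the abstract can be read as a gatekeeper. It runs a small whitelist of operations on raw values on-device and returns only a minimal result to the agent. The sketch assumes boolean-only outputs and two example predicates. The class name, the operations, and the sample placeholders are all hypothetical, not taken from the paper.

```python
from typing import Callable, Dict


class PrivacyGatekeeper:
    """Hypothetical gatekeeper: evaluates whitelisted predicates over raw values
    on-device and releases only a boolean, never the values themselves."""

    def __init__(self, vault: Dict[str, str]) -> None:
        self._vault = vault  # placeholder -> raw value, kept local
        self._ops: Dict[str, Callable[[str, str], bool]] = {
            "equals": lambda a, b: a == b,
            "greater_than": lambda a, b: float(a) > float(b),
        }

    def query(self, op: str, left: str, right: str) -> bool:
        if op not in self._ops:
            raise PermissionError(f"operation {op!r} is not whitelisted")
        if left not in self._vault or right not in self._vault:
            raise KeyError("unknown placeholder")
        return bool(self._ops[op](self._vault[left], self._vault[right]))


# "Is the first bill larger than the second?" answered without revealing either amount.
gate = PrivacyGatekeeper({"AMOUNT#3f9a1": "120.50", "AMOUNT#07c2e": "99"})
print(gate.query("greater_than", "AMOUNT#3f9a1", "AMOUNT#07c2e"))  # True
```

Boolean outputs limit what a single query reveals. Many queries can still leak information in aggregate, so a deployed gatekeeper would presumably also need per-task policies or query budgets.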
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an anonymization framework for mobile GUI agents that uses a PII-aware recognition model to detect sensitive UI content and replaces it with deterministic, type-preserving placeholders (e.g., PHONE_NUMBER#a1b2c). A layered architecture (PII Detector, UI Transformer, Secure Interaction Proxy, Privacy Gatekeeper) ensures consistent anonymization across instructions, XML, and screenshots while mediating agent actions. Experiments on AndroidLab and PrivScreen benchmarks are reported to show substantial privacy leakage reduction across models with only modest utility degradation, achieving the best observed privacy-utility trade-off among existing methods.
Significance. If the results hold after addressing the quantification gaps, the work is significant for providing a practical, available-but-invisible privacy mechanism for MLLM-based GUI agents that avoids both full UI exposure and task-irrelevant obfuscation. The open-sourced code is a positive contribution that supports reproducibility and extension.
major comments (2)
- [Experiments] Experiments section: No precision, recall, or F1 scores are reported for the PII-aware recognition model on AndroidLab or PrivScreen. This is load-bearing for the central privacy-reduction claim; without near-zero false-negative rates, measured leakage reductions would be inflated.
- [Experiments] Experiments section: No ablations isolate the effect of the deterministic placeholder scheme on downstream MLLM task accuracy. This undermines the utility-degradation and trade-off claims, as it is unclear whether observed performance stems from anonymization or other factors.
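The metrics requested in the first major comment are standard, and recall is the one that matters for privacy: recall = 1 − (false-negative rate). Every missed span reaches the cloud agent unmasked. The sketch below scores detected spans by exact match on `(start, end, entity_type)`. That matching rule is an assumption; partial-overlap scoring is a common alternative.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type); exact-match scoring assumed


def span_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 for detected PII spans."""
    tp = len(gold & pred)
    fp = len(pred - gold)  # over-masking: costs utility
    fn = len(gold - pred)  # missed PII: reaches the cloud agent in the clear
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {(0, 8, "PHONE_NUMBER"), (10, 20, "PERSON"), (25, 42, "EMAIL")}
pred = {(0, 8, "PHONE_NUMBER"), (10, 20, "PERSON"), (50, 60, "ADDRESS")}
print(span_prf(gold, pred))  # P = R = F1 = 2/3 (up to floating-point rounding)
```

In this toy case the missed `EMAIL` span is exactly the kind of false negative the referee argues would inflate measured leakage reductions.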
minor comments (1)
- [Abstract] Abstract: The claim of 'best observed privacy-utility trade-off' is stated without naming the specific baselines or reporting the exact quantitative deltas used for comparison.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the opportunity to improve the manuscript. We address each major comment below and will revise the experiments section accordingly to strengthen the privacy and utility claims.
Point-by-point responses
Referee: [Experiments] Experiments section: No precision, recall, or F1 scores are reported for the PII-aware recognition model on AndroidLab or PrivScreen. This is load-bearing for the central privacy-reduction claim; without near-zero false-negative rates, measured leakage reductions would be inflated.
Authors: We agree that explicit performance metrics for the PII-aware recognition model are necessary to fully support the privacy-reduction results. Privacy leakage was measured directly via the presence of sensitive content in agent outputs and interaction traces rather than by assuming perfect detection. In the revised version we will add precision, recall, and F1 scores for the detector evaluated on both AndroidLab and PrivScreen, along with a brief discussion of the impact of false negatives on the observed leakage figures. (Revision: yes.)
Referee: [Experiments] Experiments section: No ablations isolate the effect of the deterministic placeholder scheme on downstream MLLM task accuracy. This undermines the utility-degradation and trade-off claims, as it is unclear whether observed performance stems from anonymization or other factors.
Authors: We acknowledge that dedicated ablations would better isolate the contribution of the deterministic placeholder scheme. The current utility results compare the full anonymization pipeline against non-anonymized baselines, but do not vary the placeholder mechanism itself. We will add ablation experiments in the revision that replace deterministic placeholders with random strings or task-irrelevant tokens while keeping the rest of the pipeline fixed, thereby clarifying the source of any accuracy changes. (Revision: yes.)
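The proposed ablation amounts to swapping the placeholder function while everything else stays fixed. Below is a minimal sketch of three interchangeable strategies matching the responses above: deterministic typed placeholders, random strings, and a single task-irrelevant token. All names are hypothetical. The unkeyed hash in `deterministic` is used only for brevity and is not a claim about the paper's scheme.

```python
import hashlib
import secrets
from typing import Callable, Dict, List, Tuple

Strategy = Callable[[str, str], str]


def deterministic(entity_type: str, value: str) -> str:
    # Same (type, value) always maps to the same typed placeholder.
    digest = hashlib.sha256(f"{entity_type}:{value}".encode("utf-8")).hexdigest()
    return f"{entity_type}#{digest[:5]}"


def random_string(entity_type: str, value: str) -> str:
    # Ablation: a fresh random token per occurrence; no type, no cross-screen stability.
    return secrets.token_hex(4)


def generic_token(entity_type: str, value: str) -> str:
    # Ablation: one task-irrelevant token for every entity; type information is lost.
    return "[REDACTED]"


STRATEGIES: Dict[str, Strategy] = {
    "deterministic": deterministic,
    "random": random_string,
    "generic": generic_token,
}


def anonymize_values(values: List[Tuple[str, str]], strategy: Strategy) -> List[str]:
    """Apply one strategy while the rest of the pipeline stays fixed."""
    return [strategy(entity_type, value) for entity_type, value in values]


# The same phone number seen on two screens.
seen = [("PHONE_NUMBER", "555-0100"), ("PHONE_NUMBER", "555-0100")]
for name, strategy in STRATEGIES.items():
    print(name, anonymize_values(seen, strategy))
```

Only the deterministic strategy keeps both properties the framework relies on. It preserves the semantic category, and it maps the same value to the same token across screens. Comparing task success across the three isolates how much each property contributes.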
Circularity Check
No circularity: empirical framework evaluated on external benchmarks
The paper describes a systems framework for anonymizing sensitive UI content via PII detection and deterministic placeholders, with claims resting on experiments using the external AndroidLab and PrivScreen benchmarks. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central privacy-utility results are presented as direct empirical outcomes rather than reductions to author-defined inputs by construction, rendering the derivation chain self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the PII-aware recognition model reliably identifies sensitive UI content across screenshots, XML, and instructions.
Forward citations
Cited by 1 Pith paper
Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization
TIPO applies preference-intensity weighting and padding gating to stabilize preference optimization for privacy personalization in mobile GUI agents, yielding higher alignment and distinction metrics than prior methods.
Reference graph
Works this paper leans on
[1] Yuyang Zhao, Wentao Shi, Fuli Feng, and Xiangnan He. Appagent-pro: A proactive gui agent system for multidomain information integration and user assistance. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, CIKM ’25, pages 6767–6771. ACM, November 2025.
[2] Jiabo Ye, Xi Zhang, Haiyang Xu, Haowei Liu, Junyang Wang, Zhaoqing Zhu, Ziwei Zheng, Feiyu Gao, Junjie Cao, Zhengxi Lu, Jitong Liao, Qi Zheng, Fei Huang, Jingren Zhou, and Ming Yan. Mobile-agent-v3: Fundamental agents for gui automation, 2025.
[3] Haoming Wang, Haoyang Zou, Huatong Song, et al. Ui-tars-2 technical report: Advancing gui agent with multi-turn reinforcement learning, 2025.
[4] Xiao Liu, Bo Qin, Dongzhu Liang, Guang Dong, Hanyu Lai, Hanchen Zhang, Hanlin Zhao, Iat Long Iong, Jiadai Sun, Jiaqi Wang, et al. Autoglm: Autonomous foundation agents for guis. arXiv preprint arXiv:2411.00820, 2024.
[5] Jingru Fan, Yufan Dang, Jingyao Wu, Huatao Li, Runde Yang, Xiyuan Yang, Yuheng Wang, and Chen Qian. Appcopilot: Toward general, accurate, long-horizon, and efficient mobile agent. arXiv preprint arXiv:2509.02444, 2025.
[6] Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. Qwen2.5-vl technical report, 2025.
[7] Gucongcong Fan, Chaoyue Niu, Chengfei Lyu, Fan Wu, and Guihai Chen. Core: Reducing ui exposure in mobile agents via collaboration between cloud and local llms, 2025.
[8] Fuyao Zhang, Jiaming Zhang, Che Wang, Xiongtao Sun, Yurong Hao, Guowei Guan, Wenjie Li, Longtao Huang, and Wei Yang Bryan Lim. Dualtap: A dual-task adversarial protector for mobile mllm agents, 2025.
[9] Shuning Zhang, Yutong Jiang, Rongjun Ma, Yuting Yang, Mingyao Xu, Zhixin Huang, Xin Yi, and Hewu Li. Privweb: Unobtrusive and content-aware privacy protection for web agents, 2025.
[10] Yanxi Wang, Zhiling Zhang, Wenbo Zhou, Weiming Zhang, Jie Zhang, Qiannan Zhu, Yu Shi, Shuxin Zheng, and Jiyan He. Guiguard: Toward a general framework for privacy-preserving gui agents, 2026.
[11] Yucheng Shi, Wenhao Yu, Wenlin Yao, Wenhu Chen, and Ninghao Liu. Towards trustworthy gui agents: A survey, 2025.
[12] Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, Ryan Aponte, Yu Xia, Xintong Li, Jing Shi, Hongjie Chen, Viet Dac Lai, Zhouhang Xie, Sungchul Kim, Ruiyi Zhang, Tong Yu, Mehrab Tanjim, Nesreen K. Ahmed, Puneet Mathur, Seunghyun Yoon, Lina Yao, Branislav Kveton, Jihyung Kil, Thien Huu Nguyen, Trung Bui, Tianyi Zho...
[13] Xueyu Hu, Tao Xiong, Biao Yi, et al. Os agents: A survey on mllm-based agents for general computing devices. arXiv preprint arXiv:2508.04482, 2025.
[14] Dang Nguyen, Jian Chen, Yu Wang, et al. Gui agents: A survey. arXiv preprint arXiv:2412.13501, 2024.
[15] Zhixin Lin, Jungang Li, Shidong Pan, Yibo Shi, Yue Yao, and Dongliang Xu. Mind the third eye! benchmarking privacy awareness in mllm-powered smartphone agents, 2025.
[16] Zikang Guo, Benfeng Xu, Chiwei Zhu, et al. Mcp-agentbench: Evaluating real-world language agent performance with mcp-mediated tools. arXiv preprint arXiv:2509.09734, 2025.
[17] Yunhe Yan, Shihe Wang, Jiajun Du, et al. Mcpworld: A unified benchmarking testbed for api, gui, and hybrid agents. arXiv preprint arXiv:2506.07672, 2025.
[18] Qitian Jason Hu, Jacob Bieker, Xiuyu Li, et al. Routerbench: A benchmark for multi-llm routing systems. arXiv preprint arXiv:2403.12031, 2024.
[19] Chaoran Chen, Zhiping Zhang, Bingcan Guo, Shang Ma, Ibrahim Khalilov, Simret A Gebreegziabher, Yanfang Ye, Ziang Xiao, Yaxing Yao, Tianshi Li, and Toby Jia-Jun Li. The obvious invisible threat: Llm-powered gui agents’ vulnerability to fine-print injections. In Proceedings of the 2025 USENIX Symposium on Usable Privacy and Security (SOUPS), 2025.
[20] Urchade Zaratiana, Nadi Tomeh, Pierre Holat, and Thierry Charnois. GLiNER: Generalist model for named entity recognition using bidirectional transformer. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2024.
[21] Ihor Stepanov and Mykhailo Shtopko. Gliner multi-task: Generalist lightweight model for various information extraction tasks, 2024.
[22] Isotonic. distilbert_finetuned_ai4privacy_v2 (revision 51d7b98), 2025.
[23] Microsoft. Microsoft presidio: Open-source pii detection and anonymization framework. https://github.com/microsoft/presidio, 2025. Open-source project under MIT License.
[24] Knowledgator and Wordcab. knowledgator/gliner-pii-large-v1.0. https://huggingface.co/knowledgator/gliner-pii-large-v1.0, 2025. Hugging Face pre-trained model.
[25] Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, and Furu Wei. Layoutlmv3: Pre-training for document ai with unified text and image masking. In Proceedings of the 30th ACM International Conference on Multimedia, pages 4083–4091, 2022.
[26] Gilles Baechler, Srinadh Srinivas, Ping-Yu Wang, Jason Howard, et al. Screenai: A vision-language model for ui and infographics understanding. arXiv preprint arXiv:2402.04615, 2024.
[27] Jiannan Wu, Muyan Zhong, Sen Xing, et al. Visionllm v2: An end-to-end generalist multimodal large language model. NeurIPS, 2024.
[28] Gemini Team. Gemini 2.5: Pushing the frontier of multimodal reasoning and long-context understanding. arXiv preprint, 2025.
[29] Jiaming Zhang, Qi Yi, and Jitao Sang. Towards adversarial attack on vision-language pre-training models. Proceedings of ACM Multimedia, 2022.
[30] Jiaming Zhang, Junhong Ye, Xingjun Ma, Yige Li, Yunfan Yang, Chen Yunhao, Jitao Sang, and Dit-Yan Yeung. Anyattack: Towards large-scale self-supervised adversarial attacks on vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025.
[31] Xiaojun Jia, Sensen Gao, Simeng Qin, Tianyu Pang, Chao Du, Yihao Huang, Xinfeng Li, Yiming Li, Bo Li, and Yang Liu. Adversarial attacks against closed-source MLLMs via feature optimal alignment. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
[32] Hanene F. Z. Brachemi Meftah, Wassim Hamidouche, Sid Ahmed Fezza, and Olivier Déforges. Vip: Visual information protection through adversarial attacks on vision-language models, 2025.
[33] JaidedAI. Easyocr: Ready-to-use ocr with 80+ supported languages. https://github.com/JaidedAI/EasyOCR,
[34] Yifan Xu, Xiao Liu, Xueqiao Sun, Siyi Cheng, Hao Yu, Hanyu Lai, Shudan Zhang, Dan Zhang, Jie Tang, and Yuxiao Dong. Androidlab: Training and systematic benchmarking of android autonomous agents. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, pages 2144–2166, 2025.
[35] An Yang et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.
[36] Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, et al. Ui-tars: Pioneering automated gui interaction with native agents. arXiv preprint arXiv:2501.12326, 2025.