
arxiv: 2602.10139 · v3 · submitted 2026-02-08 · 💻 cs.CR · cs.AI

Anonymization-Enhanced Privacy Protection for Mobile GUI Agents: Available but Invisible

Pith reviewed 2026-05-16 05:56 UTC · model grok-4.3

keywords privacy protection · GUI agents · anonymization · PII detection · mobile security · multimodal models · data obfuscation

The pith

Anonymization replaces sensitive mobile UI content with semantic placeholders so GUI agents can complete tasks without seeing private data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Mobile GUI agents process full screen contents and therefore expose personal details such as phone numbers, addresses, and messages to cloud models. The paper introduces a framework that detects personally identifiable information on the device, replaces it with deterministic placeholders that keep type and category information, and routes all agent actions through a secure proxy. With this setup the cloud-based model never receives raw sensitive values, yet it still receives enough structure to reason about the interface. Experiments on the AndroidLab and PrivScreen benchmarks report large drops in privacy leakage together with only modest losses in task success rate. The paper presents the method as achieving the strongest privacy-utility balance among current defenses.

Core claim

The framework enforces available-but-invisible access: a PII detector identifies sensitive UI elements, a UI transformer substitutes them with placeholders such as PHONE_NUMBER#a1b2c, and a layered architecture of detector, transformer, secure interaction proxy, and privacy gatekeeper keeps raw data local while allowing the agent to operate over the anonymized view across instructions, XML hierarchies, and screenshots.

What carries the argument

Deterministic type-preserving placeholders that replace detected PII while preserving semantic category information for multimodal agent reasoning.
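
As an editorial illustration, one way to obtain a deterministic, type-preserving placeholder is to append a short keyed hash of the raw value to its category. This is a minimal sketch under assumptions: the summary does not say how the paper derives the suffix, and `DEVICE_KEY` and `make_placeholder` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical device-local secret. The source does not describe how
# placeholder suffixes are derived; a keyed hash is one plausible choice.
DEVICE_KEY = b"example-device-local-key"


def make_placeholder(category: str, raw_value: str, key: bytes = DEVICE_KEY) -> str:
    """Return CATEGORY#xxxxx, where xxxxx is a truncated keyed hash of raw_value."""
    digest = hmac.new(key, raw_value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Five hex characters mirror the PHONE_NUMBER#a1b2c example; a suffix this
    # short can collide, so a real system would need a collision policy.
    return f"{category}#{digest[:5]}"


# The same raw value maps to the same placeholder wherever it appears
# (instruction, XML, screenshot text), which keeps the views consistent.
in_instruction = make_placeholder("PHONE_NUMBER", "+1 555 0100")
in_xml = make_placeholder("PHONE_NUMBER", "+1 555 0100")
print(in_instruction == in_xml)  # True
```

Keying the hash with a secret that stays on the device is a design choice in this sketch: an unkeyed hash of a low-entropy value such as a phone number could be reversed by brute force.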

If this is right

  • Privacy leakage drops substantially across several multimodal models.
  • Task success rate declines only modestly on the evaluated benchmarks.
  • The same anonymization applies consistently to user instructions, XML layouts, and screenshots.
  • Narrowly scoped local computations can still be invoked when raw values are required.
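
The last two points can be pictured with a toy on-device proxy. The sketch below is an assumption-laden illustration, not the paper's Secure Interaction Proxy: the class `SecureProxy`, its methods, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SecureProxy:
    """Toy on-device proxy that keeps the placeholder-to-raw mapping local."""

    vault: dict[str, str] = field(default_factory=dict)

    def register(self, placeholder: str, raw_value: str) -> None:
        # Called locally when the transformer anonymizes a value.
        self.vault[placeholder] = raw_value

    def resolve_type_action(self, text: str) -> str:
        """Swap placeholders in the agent's typed text for raw values, on device."""
        for placeholder, raw in self.vault.items():
            text = text.replace(placeholder, raw)
        return text

    def has_prefix(self, placeholder: str, prefix: str) -> bool:
        """Narrowly scoped local computation: only a boolean leaves the proxy."""
        return self.vault[placeholder].startswith(prefix)


proxy = SecureProxy()
proxy.register("PHONE_NUMBER#a1b2c", "+1 555 0100")

# The cloud agent only ever emits the placeholder in its action.
agent_action = {"action": "type", "text": "Call PHONE_NUMBER#a1b2c"}
print(proxy.resolve_type_action(agent_action["text"]))  # Call +1 555 0100
print(proxy.has_prefix("PHONE_NUMBER#a1b2c", "+1"))     # True
```

The point of the sketch is the data flow: raw values are substituted only at execution time on the device, and a query such as `has_prefix` returns a narrow answer rather than the value itself.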

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same placeholder technique could be applied to web or desktop GUI agents that face comparable screen-exposure risks.
  • Users could be given controls to tune detection sensitivity for different categories of information.
  • On-device detection models would further reduce the amount of raw screen data that ever leaves the device.

Load-bearing premise

The PII recognition model catches every sensitive element and the placeholders supply enough semantic detail for agents to reason correctly over the anonymized interface.

What would settle it

An experiment in which the agent either fails to complete tasks at the reported success rate or still leaks identifiable values through the anonymized screenshots or XML on the AndroidLab and PrivScreen benchmarks.
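
The leakage half of such an experiment could start from a check like the one below. It is a rough sketch under assumptions: `leaked_values` is a hypothetical helper, it covers only the XML view, and it uses exact substring matching, so reformatted or paraphrased values would slip through. Screenshots would additionally need OCR, which is not shown.

```python
import xml.etree.ElementTree as ET


def leaked_values(anonymized_xml: str, raw_values: list[str]) -> list[str]:
    """Return the raw sensitive values that still appear in the XML view."""
    root = ET.fromstring(anonymized_xml)
    fragments = []
    for node in root.iter():
        # Collect every attribute value and any element text.
        fragments.extend(node.attrib.values())
        if node.text:
            fragments.append(node.text)
    blob = "\n".join(fragments)
    return [value for value in raw_values if value in blob]


# Example view in which the detector missed a name (illustrative data only).
xml_view = (
    "<hierarchy>"
    '<node text="Call PHONE_NUMBER#a1b2c" />'
    '<node text="Alice Example" />'
    "</hierarchy>"
)
print(leaked_values(xml_view, ["+1 555 0100", "Alice Example"]))  # ['Alice Example']
```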

Figures

Figures reproduced from arXiv: 2602.10139 by Lepeng Zhao, Shuo Li, Zhenhua Zou, Zhuotao Liu.

Figure 1: Overview of the proposed privacy protection framework for mobile GUI agents. The system inserts a …
Figure 2: Example of category-preserving anonymization of user instructions.
Figure 3: Comparison of screenshots before and after anonymization. The left image shows the original screen before …
Figure 4: Example of Type proxy resolution. The text in black regions highlights enlarged excerpts of the magenta regions to illustrate the corresponding content.
Original abstract

Mobile Graphical User Interface (GUI) agents have demonstrated strong capabilities in automating complex smartphone tasks by leveraging multimodal large language models (MLLMs) and system-level control interfaces. However, this paradigm introduces significant privacy risks, as agents typically capture and process entire screen contents, thereby exposing sensitive personal data such as phone numbers, addresses, messages, and financial information. Existing defenses either reduce UI exposure, obfuscate only task-irrelevant content, or rely on user authorization, but none can protect task-critical sensitive information while preserving seamless agent usability. We propose an anonymization-based privacy protection framework that enforces the principle of available-but-invisible access to sensitive data: sensitive information remains usable for task execution but is never directly visible to the cloud-based agent. Our system detects sensitive UI content using a PII-aware recognition model and replaces it with deterministic, type-preserving placeholders (e.g., PHONE_NUMBER#a1b2c) that retain semantic categories while removing identifying details. A layered architecture comprising a PII Detector, UI Transformer, Secure Interaction Proxy, and Privacy Gatekeeper ensures consistent anonymization across user instructions, XML hierarchies, and screenshots, mediates all agent actions over anonymized interfaces, and supports narrowly scoped local computations when reasoning over raw values is necessary. Extensive experiments on the AndroidLab and PrivScreen benchmarks show that our framework substantially reduces privacy leakage across multiple models while incurring only modest utility degradation, achieving the best observed privacy-utility trade-off among existing methods. Code available at: https://github.com/one-step-beh1nd/gui_privacy_protection

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes an anonymization framework for mobile GUI agents that uses a PII-aware recognition model to detect sensitive UI content and replaces it with deterministic, type-preserving placeholders (e.g., PHONE_NUMBER#a1b2c). A layered architecture (PII Detector, UI Transformer, Secure Interaction Proxy, Privacy Gatekeeper) ensures consistent anonymization across instructions, XML, and screenshots while mediating agent actions. Experiments on AndroidLab and PrivScreen benchmarks are reported to show substantial privacy leakage reduction across models with only modest utility degradation, achieving the best observed privacy-utility trade-off among existing methods.

Significance. If the results hold after addressing the quantification gaps, the work is significant for providing a practical, available-but-invisible privacy mechanism for MLLM-based GUI agents that avoids both full UI exposure and task-irrelevant obfuscation. The open-sourced code is a positive contribution that supports reproducibility and extension.

major comments (2)
  1. [Experiments] Experiments section: No precision, recall, or F1 scores are reported for the PII-aware recognition model on AndroidLab or PrivScreen. This is load-bearing for the central privacy-reduction claim; without near-zero false-negative rates, measured leakage reductions would be inflated.
  2. [Experiments] Experiments section: No ablations isolate the effect of the deterministic placeholder scheme on downstream MLLM task accuracy. This undermines the utility-degradation and trade-off claims, as it is unclear whether observed performance stems from anonymization or other factors.
minor comments (1)
  1. [Abstract] Abstract: The claim of 'best observed privacy-utility trade-off' is stated without naming the specific baselines or reporting the exact quantitative deltas used for comparison.
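
For concreteness, the detector metrics requested in major comment 1 could be computed along these lines. This is an illustrative sketch, assuming exact-match scoring over (element id, category) pairs; the function name and the example annotations are hypothetical and do not come from the paper.

```python
def detection_scores(predicted: set, gold: set) -> dict:
    """Precision, recall, and F1 over exact-match (element_id, category) pairs."""
    tp = len(predicted & gold)   # sensitive elements correctly flagged
    fp = len(predicted - gold)   # elements flagged but not sensitive
    fn = len(gold - predicted)   # sensitive elements missed: these leak
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "false_negatives": fn}


# Illustrative annotations for one screen (not real benchmark data).
gold = {("node_3", "PHONE_NUMBER"), ("node_7", "ADDRESS"), ("node_9", "NAME")}
pred = {("node_3", "PHONE_NUMBER"), ("node_7", "ADDRESS"), ("node_5", "NAME")}

scores = detection_scores(pred, gold)
print(scores["false_negatives"])  # 1
```

Recall is the figure that bears on the privacy claim, since each false negative is a sensitive element that reaches the cloud model unmasked.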

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the opportunity to improve the manuscript. We address each major comment below and will revise the experiments section accordingly to strengthen the privacy and utility claims.

  1. Referee: [Experiments] Experiments section: No precision, recall, or F1 scores are reported for the PII-aware recognition model on AndroidLab or PrivScreen. This is load-bearing for the central privacy-reduction claim; without near-zero false-negative rates, measured leakage reductions would be inflated.

    Authors: We agree that explicit performance metrics for the PII-aware recognition model are necessary to fully support the privacy-reduction results. Privacy leakage was measured directly via the presence of sensitive content in agent outputs and interaction traces rather than assuming perfect detection. In the revised version we will add precision, recall, and F1 scores for the detector evaluated on both AndroidLab and PrivScreen, along with a brief discussion of false-negative impact on the observed leakage figures.

    Revision: yes

  2. Referee: [Experiments] Experiments section: No ablations isolate the effect of the deterministic placeholder scheme on downstream MLLM task accuracy. This undermines the utility-degradation and trade-off claims, as it is unclear whether observed performance stems from anonymization or other factors.

    Authors: We acknowledge that dedicated ablations would better isolate the contribution of the deterministic placeholder scheme. The current utility results compare the full anonymization pipeline against non-anonymized baselines, but do not vary the placeholder mechanism itself. We will add ablation experiments in the revision that replace deterministic placeholders with random strings or task-irrelevant tokens while keeping the rest of the pipeline fixed, thereby clarifying the source of any accuracy changes.

    Revision: yes
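
The ablation described in the second response can be pictured as swapping one placeholder strategy for another while the rest of the pipeline stays fixed. The sketch below is hypothetical throughout: the strategy names, the unkeyed hash, and `anonymize_views` are illustrative assumptions rather than the authors' planned setup.

```python
import hashlib
import random
from typing import Callable

Strategy = Callable[[str, str], str]  # (category, raw_value) -> replacement


def deterministic(category: str, raw: str) -> str:
    # Unkeyed hash for brevity; an assumption, not the paper's scheme.
    return f"{category}#{hashlib.sha256(raw.encode('utf-8')).hexdigest()[:5]}"


def make_random_strategy(seed: int) -> Strategy:
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

    def strategy(category: str, raw: str) -> str:
        # Fresh random string per call: no category, no cross-view consistency.
        return "".join(rng.choice(alphabet) for _ in range(12))

    return strategy


def generic_token(category: str, raw: str) -> str:
    # Task-irrelevant token: every sensitive value collapses to one string.
    return "[REDACTED]"


def anonymize_views(views: list[str], category: str, raw: str, strategy: Strategy) -> list[str]:
    """Replace raw in each view, calling the strategy once per view."""
    return [view.replace(raw, strategy(category, raw)) for view in views]


views = ["Text +1 555 0100 to Bob", '<node text="+1 555 0100" />']
for name, strategy in [
    ("deterministic", deterministic),
    ("random", make_random_strategy(seed=0)),
    ("generic", generic_token),
]:
    print(name, anonymize_views(views, "PHONE_NUMBER", "+1 555 0100", strategy))
```

Only the deterministic variant keeps both the category and a stable identity across the instruction and the XML view, which is the property the ablation would be testing.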

Circularity Check

0 steps flagged

No circularity: empirical framework evaluated on external benchmarks


The paper describes a systems framework for anonymizing sensitive UI content via PII detection and deterministic placeholders, with claims resting on experiments using the external AndroidLab and PrivScreen benchmarks. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central privacy-utility results are presented as direct empirical outcomes rather than reductions to author-defined inputs by construction, rendering the derivation chain self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that anonymized placeholders retain sufficient semantics for agent reasoning and that the detector covers all relevant PII without false negatives.

axioms (1)
  • Domain assumption: the PII-aware recognition model reliably identifies sensitive UI content across screenshots, XML, and instructions.
    Invoked as the foundation for the UI Transformer and Privacy Gatekeeper components.

pith-pipeline@v0.9.0 · 5588 in / 1147 out tokens · 41644 ms · 2026-05-16T05:56:19.940949+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization

    cs.AI · 2026-04 · unverdicted · novelty 6.0

    TIPO applies preference-intensity weighting and padding gating to stabilize preference optimization for privacy personalization in mobile GUI agents, yielding higher alignment and distinction metrics than prior methods.
