Recognition: no theorem link
GUIGuard-Bench: Toward a General Evaluation for Privacy-Preserving GUI Agents
Pith reviewed 2026-05-16 11:28 UTC · model grok-4.3
The pith
GUIGuard-Bench shows current models detect private information in GUI screenshots but struggle with precise localization, category recognition, risk assessment, and judging task necessity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GUIGuard-Bench supplies 241 trajectory-based GUI workflows with 4,080 screenshots annotated at the region level for privacy bounding boxes, categories, risk levels, and task necessity. It measures privacy recognition accuracy, offline planner fidelity after protection is applied to screenshots, and the utility cost of protection strategies. The evaluation finds that models can usually identify whether a screenshot contains private information, yet they falter on fine-grained localization, category recognition, risk assessment, and determining whether the private element is required for the task. Closed-source models maintain largely consistent planner semantics in Android environments once privacy protection is applied to the screenshots.
What carries the argument
The GUIGuard-Bench dataset of trajectory screenshots carrying region-level annotations for privacy elements, risk, and task necessity.
Load-bearing premise
The human-provided region-level annotations for privacy bounding boxes, categories, risk levels, and task necessity accurately capture real-world GUI privacy risks across the collected trajectories.
What would settle it
A new collection of GUI trajectories whose privacy regions and necessity judgments are independently verified by multiple annotators or by observing actual data leaks in controlled agent runs, then re-testing the same models on localization and necessity accuracy.
Original abstract
As GUI agents increasingly rely on screenshots to perceive and operate digital environments, they may inadvertently expose sensitive information such as identities, accounts, locations, and behavioral traces. While existing benchmarks primarily focus on task completion, grounding, or defenses against third-party attacks, current visual privacy datasets remain largely restricted to static natural images, limiting their ability to capture the contextual dependence and task relevance of privacy risks in GUI task trajectories. To bridge this gap, we introduce \textbf{GUIGuard-Bench}, a first-step benchmark for studying privacy-preserving GUI agents in trajectory-based GUI workflows. GUIGuard-Bench contains 241 real GUI-agent trajectories with 4,080 screenshots across Android and PC environments. Each screenshot is annotated at the region level with privacy bounding boxes, semantic privacy categories, risk levels, and whether the private information is necessary for completing the task. Built on these annotations, GUIGuard-Bench supports three complementary evaluations: privacy recognition, offline planning fidelity under protected screenshots, and the utility impact of different protection strategies. Our results show that current models can often detect whether a screenshot contains private information, but they struggle with fine-grained localization, category recognition, risk assessment, and task-necessity judgment. We also find that closed-source models, exemplified by Claude Sonnet 4.6, can maintain largely consistent planner semantics in Android environments after privacy protection is applied. Our results highlight privacy recognition as a critical bottleneck for practical GUI agents. Project: https://futuresis.github.io/GUIGuard-page/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces GUIGuard-Bench, a benchmark with 241 real GUI-agent trajectories and 4,080 screenshots from Android and PC environments. Each screenshot receives region-level annotations for privacy bounding boxes, semantic categories, risk levels, and task necessity. The benchmark supports three evaluations: privacy recognition by models, offline planning fidelity on protected screenshots, and utility impact of protection strategies. Key findings are that current models often detect private information presence but struggle with fine-grained localization, category recognition, risk assessment, and task-necessity judgment, while closed-source models such as Claude Sonnet 4.6 maintain largely consistent planner semantics in Android environments after privacy protection is applied.
Significance. If the human annotations prove reliable, the benchmark addresses a clear gap between static-image privacy datasets and the contextual, trajectory-based privacy risks faced by GUI agents. The dual focus on recognition failures and downstream planning consistency provides actionable diagnostics for privacy-preserving agent design. The open release of trajectories and annotations could enable reproducible follow-up work on protection mechanisms.
major comments (3)
- [Dataset Construction and Annotation] Dataset annotation section: No inter-annotator agreement statistics, multiple-annotator protocol, or external validation against privacy experts are reported for the subjective labels (risk levels and task necessity). These labels directly underpin the headline claims about model struggles with risk assessment and task-necessity judgment; without agreement metrics, systematic annotator bias cannot be ruled out as an alternative explanation for the observed performance gaps.
- [Evaluation Results] Evaluation results: The claims that models 'can often detect' private information yet 'struggle' with localization, categories, risk assessment, and task-necessity judgment are presented without accompanying quantitative metrics (precision/recall, accuracy, or confusion matrices), error analysis, or the full evaluation protocol. This absence prevents assessment of effect sizes and reproducibility of the reported bottlenecks.
- [Offline Planning Fidelity] Planning fidelity evaluation: The consistency result for Claude Sonnet 4.6 after privacy protection is stated at a high level but lacks the concrete measurement protocol (e.g., semantic similarity metric, planner output comparison method, or control conditions) needed to substantiate that the planner semantics remain 'largely consistent.'
minor comments (2)
- [Abstract] Abstract: The summary of results would be strengthened by including at least one or two key quantitative figures (e.g., detection accuracy or consistency score) rather than purely qualitative statements.
- [Figures] Figure and table captions: Ensure all figures showing annotation examples or model outputs include explicit scale bars, color legends, and sample sizes so readers can interpret them without returning to the main text.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate the suggested improvements for greater rigor and clarity.
Point-by-point responses
-
Referee: [Dataset Construction and Annotation] Dataset annotation section: No inter-annotator agreement statistics, multiple-annotator protocol, or external validation against privacy experts are reported for the subjective labels (risk levels and task necessity). These labels directly underpin the headline claims about model struggles with risk assessment and task-necessity judgment; without agreement metrics, systematic annotator bias cannot be ruled out as an alternative explanation for the observed performance gaps.
Authors: We agree that inter-annotator agreement metrics are essential for subjective labels. The revised manuscript will include a detailed description of the multiple-annotator protocol (three annotators per label with disagreement resolution via discussion) and report agreement statistics such as Fleiss' kappa for risk levels and task necessity. We will also acknowledge that external validation by privacy experts was not performed and discuss this as a limitation. revision: yes
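The Fleiss' kappa the authors propose for the three-annotator protocol can be sketched as follows. This is a minimal illustration of the statistic itself, not the paper's code; the toy labels (binary risk levels over a handful of screenshots) are made up.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a ratings matrix.

    counts[i][j] = number of annotators who assigned item i to category j;
    every row must sum to the same number of annotators n.
    """
    N = len(counts)        # number of annotated items
    n = sum(counts[0])     # annotators per item
    k = len(counts[0])     # number of categories

    # Mean per-item agreement: fraction of annotator pairs that agree.
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N

    # Chance agreement from the marginal category proportions.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)

    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 3 annotators give high/low risk labels to 4
# screenshots. Perfect agreement on a balanced sample yields kappa = 1.
print(fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]]))  # 1.0
```

Note that kappa is undefined when every item lands in one category (the chance-agreement term reaches 1), so the risk-level label distribution matters when reporting it.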
-
Referee: [Evaluation Results] Evaluation results: The claims that models 'can often detect' private information yet 'struggle' with localization, categories, risk assessment, and task-necessity judgment are presented without accompanying quantitative metrics (precision/recall, accuracy, or confusion matrices), error analysis, or the full evaluation protocol. This absence prevents assessment of effect sizes and reproducibility of the reported bottlenecks.
Authors: We agree that the evaluation section would benefit from explicit quantitative support. The revised manuscript will add precision, recall, accuracy, confusion matrices, and a dedicated error analysis subsection for the privacy recognition tasks. The full evaluation protocol will be described in the main text (with additional details moved from the appendix) to ensure reproducibility. revision: yes
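The per-class metrics the referee asks for can be sketched in a few lines. The privacy-category labels below are hypothetical stand-ins, not drawn from the benchmark; the point is only how precision, recall, and a confusion matrix fall out of paired gold/predicted labels.

```python
from collections import Counter

def per_class_metrics(gold, pred, classes):
    """Per-class precision/recall plus a confusion matrix from paired labels."""
    confusion = Counter(zip(gold, pred))  # (gold, pred) -> count
    metrics = {}
    for c in classes:
        tp = confusion[(c, c)]
        fp = sum(confusion[(g, c)] for g in classes if g != c)
        fn = sum(confusion[(c, p)] for p in classes if p != c)
        metrics[c] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return metrics, confusion

# Hypothetical privacy-category predictions for six annotated regions.
gold = ["identity", "account", "identity", "location", "account", "identity"]
pred = ["identity", "identity", "identity", "location", "account", "account"]
m, conf = per_class_metrics(gold, pred, ["identity", "account", "location"])
print(m["identity"])  # precision and recall are both 2/3 here
```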
-
Referee: [Offline Planning Fidelity] Planning fidelity evaluation: The consistency result for Claude Sonnet 4.6 after privacy protection is stated at a high level but lacks the concrete measurement protocol (e.g., semantic similarity metric, planner output comparison method, or control conditions) needed to substantiate that the planner semantics remain 'largely consistent.'
Authors: We will expand the offline planning fidelity section in the revision to specify the concrete protocol. This will include the semantic similarity metric (embedding cosine similarity), the exact method for comparing planner outputs, and the control conditions used to support the consistency claim for Claude Sonnet 4.6 in Android environments. revision: yes
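The embedding cosine similarity mentioned above can be illustrated with a minimal sketch. Any sentence-embedding model could supply the vectors for the raw and protected planner outputs; the vectors below are invented for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical embeddings of a planner step before and after privacy masking.
plan_raw = [0.2, 0.7, 0.1, 0.6]
plan_protected = [0.25, 0.65, 0.05, 0.62]
print(cosine_similarity(plan_raw, plan_protected))  # close to 1.0 for consistent plans
```

A consistency claim would then amount to this score staying near 1.0 across trajectory steps, compared against an unprotected control run.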
Circularity Check
No circularity: empirical benchmark with direct observations
full rationale
The paper constructs GUIGuard-Bench from 241 trajectories and 4,080 human-annotated screenshots, then reports direct model evaluations on privacy detection, localization, and planning fidelity. No equations, fitted parameters, or predictions appear; results are observational comparisons against the annotations rather than derivations that reduce to inputs by construction. Self-citations are absent from load-bearing claims, and the benchmark is externally falsifiable via the released annotations and trajectories.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Human annotations for privacy bounding boxes, semantic categories, risk levels, and task necessity are accurate and unbiased.
Forward citations
Cited by 4 Pith papers
-
Contrastive Privacy: A Semantic Approach to Measuring Privacy of AI-based Sanitization
Contrastive privacy is a new corpus-contrast test for semantic privacy in AI-sanitized media that uses latent concept measures and requires no manual labeling.
-
RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management
RiskWebWorld is the first realistic interactive benchmark for GUI agents in e-commerce risk management, revealing a large gap between generalist and specialized models plus RL gains.
-
Safe, or Simply Incapable? Rethinking Safety Evaluation for Phone-Use Agents
Phone-use agents avoid harm more often through inability to act than through deliberate safe choices, so benchmarks must separate unsafe judgment from capability failure.
-
Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability
The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment inter...
Reference graph
Works this paper leans on
-
[1]
Google DeepMind. Project Astra: A Universal Multimodal AI Assistant. https://blog.google/technology/google-deepmind/gemini-universal-ai-assistant/. 2025
work page 2025
-
[2]
Surfer 2: The Next Generation of Cross-Platform Computer Use Agents
Mathieu Andreux et al. “Surfer 2: The Next Generation of Cross-Platform Computer Use Agents”. In:arXiv preprint arXiv:2510.19949(2025)
-
[3]
Dang Nguyen et al. “GUI Agents: A Survey”. In: Findings of the Association for Computational Linguistics: ACL 2025. Ed. by Wanxiang Che et al. Vienna, Austria: Association for Computational Linguistics, July 2025, pp. 22522–22538. DOI: 10.18653/v1/2025.findings-acl.1158
-
[4]
Chiyu Chen et al. “GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?” In:arXiv preprint arXiv:2510.20333(2025)
-
[5]
ByteDance (Doubao Team). Doubao Phone Assistant (Technical Preview). Product webpage. Launched Dec 1, 2025. Accessed Dec 19, 2025. Dec. 2025. URL: https://o.doubao.com/
work page 2025
-
[6]
Alibaba Cloud. Wuying AgentBay (AgentBay): All-scenario AI Agent Execution Platform. Product webpage. Accessed Dec 19, 2025. 2025. URL: https://www.aliyun.com/product/agentbay
work page 2025
-
[7]
OpenAI. Introducing ChatGPT Atlas. OpenAI product announcement. Accessed Dec 19, 2025. Oct. 2025. URL: https://openai.com/index/introducing-chatgpt-atlas/
work page 2025
-
[9]
Paul G Mastrokostas et al. “GPT-4 as a source of patient information for anterior cervical discectomy and fusion: a comparative analysis against Google web search”. In: Global Spine Journal 14.8 (2024), pp. 2389–2398
work page 2024
-
[10]
Matus Formanek. “Exploring the potential of large language models and generative artificial intelligence (GPT): Applications in Library and Information Science”. In: Journal of Librarianship and Information Science 57.2 (2025), pp. 568–590
work page 2025
-
[11]
Large language models empowered personalized web agents
Hongru Cai et al. “Large language models empowered personalized web agents”. In: Proceedings of the ACM on Web Conference 2025. 2025, pp. 198–215
work page 2025
-
[12]
Bearcubs: A benchmark for computer-using web agents
Yixiao Song et al. “Bearcubs: A benchmark for computer-using web agents”. In:arXiv preprint arXiv:2503.07919(2025)
-
[13]
Liangbo Ning et al. “A survey of webagents: Towards next-generation ai agents for web automation with large foundation models”. In: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2. 2025, pp. 6140–6150
work page 2025
-
[14]
Websight: A vision-first architecture for robust web agents
Tanvir Bhathal and Asanshay Gupta. “Websight: A vision-first architecture for robust web agents”. In:arXiv preprint arXiv:2508.16987(2025)
-
[15]
Gui testing arena: A unified benchmark for advancing autonomous gui testing agent
Kangjia Zhao et al. “Gui testing arena: A unified benchmark for advancing autonomous gui testing agent”. In:arXiv preprint arXiv:2412.18426(2024)
-
[16]
Towards trustworthy gui agents: A survey
Yucheng Shi et al. “Towards trustworthy gui agents: A survey”. In:arXiv preprint arXiv:2503.23434(2025)
-
[17]
Chaoran Chen et al. “Clear: Towards contextual llm-empowered privacy policy analysis and risk generation for large language model applications”. In:Proceedings of the 30th International Conference on Intelligent User Interfaces. 2025, pp. 277–297
work page 2025
-
[18]
Large Language Model-Brained GUI Agents: A Survey
Chaoyun Zhang et al. “Large Language Model-Brained GUI Agents: A Survey”. In: Transactions on Machine Learning Research 2025.1 (2025)
work page 2025
-
[19]
Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models
Pete Janowczyk et al. “Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models”. In:arXiv preprint arXiv:2411.05056(2024)
-
[20]
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
Zeyi Liao et al. “EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage”. In:Proceedings of the International Conference on Representation Learning (ICLR) 2025. 2025
work page 2025
-
[21]
When LLMs Go Online: The Emerging Threat of Web-Enabled LLMs
Hanna Kim et al. “When LLMs Go Online: The Emerging Threat of Web-Enabled LLMs”. In:34th USENIX Security Symposium (USENIX Security 25). 2025, pp. 1729–1748
work page 2025
-
[22]
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation
Yuyang Wanyan et al. “Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation”. In:arXiv preprint arXiv:2506.04614(2025)
-
[23]
Anthropic. Agents and Tools: Computer Use. Online. Accessed: March 16, 2025. 2025
work page 2025
-
[25]
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
Haoming Wang et al. “Ui-tars-2 technical report: Advancing gui agent with multi-turn reinforcement learning”. In:arXiv preprint arXiv:2509.02544(2025)
work page 2025
-
[26]
Privacyasst: Safeguarding user privacy in tool-using large language model agents
Xinyu Zhang et al. “Privacyasst: Safeguarding user privacy in tool-using large language model agents”. In: IEEE Transactions on Dependable and Secure Computing 21.6 (2024), pp. 5242–5258
work page 2024
-
[27]
Guardagent: Safeguard llm agents by a guard agent via knowledge-enabled reasoning
Zhen Xiang et al. “Guardagent: Safeguard llm agents by a guard agent via knowledge-enabled reasoning”. In:arXiv preprint arXiv:2406.09187(2024)
-
[28]
OpenAI. Computer-using Agent. Online. Accessed: March 16, 2025. 2025
work page 2025
-
[29]
Adversaflow: Visual red teaming for large language models with multi-level adversarial flow
Dazhen Deng et al. “Adversaflow: Visual red teaming for large language models with multi-level adversarial flow”. In: IEEE Transactions on Visualization and Computer Graphics (2024)
work page 2024
-
[30]
Autodroid: Llm-powered task automation in android
Hao Wen et al. “Autodroid: Llm-powered task automation in android”. In: Proceedings of the 30th Annual International Conference on Mobile Computing and Networking. 2024, pp. 543–557
work page 2024
-
[31]
Mla-trust: Benchmarking trustworthiness of multimodal llm agents in gui environments
Xiao Yang et al. “Mla-trust: Benchmarking trustworthiness of multimodal llm agents in gui environments”. In:arXiv preprint arXiv:2506.01616(2025)
-
[32]
Caution for the environment: Multimodal agents are susceptible to environmental distractions
Xinbei Ma et al. “Caution for the environment: Multimodal agents are susceptible to environmental distractions”. In: arXiv preprint arXiv:2408.02544 (2024)
-
[33]
OpenAI. GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum. Tech. rep. OpenAI, 2025. URL: https://cdn.openai.com/pdf/4173ec8d-1229-47db-96de-06d87147e07e/5_1_system_card.pdf
work page 2025
-
[34]
Google DeepMind. Gemini 3 Developer Guide. https://ai.google.dev/gemini-api/docs/gemini-3. 2025
work page 2025
-
[35]
Anthropic. Claude 4.5 Models. https://platform.claude.com/docs/models. 2025
work page 2025
-
[36]
WebGPT: Browser-assisted question-answering with human feedback
Reiichiro Nakano et al. “WebGPT: Browser-assisted question-answering with human feedback”. In: arXiv preprint arXiv:2112.09332 (2021)
work page 2021
-
[37]
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao et al. “ReAct: Synergizing Reasoning and Acting in Language Models”. In: International Conference on Learning Representations. 2023
work page 2023
-
[38]
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta et al. “Multimodal Web Navigation with Instruction-Finetuned Foundation Models”. In:International Conference on Learning Representations. 2024
work page 2024
-
[39]
GPT-4V(ision) is a Generalist Web Agent, if Grounded
Boyuan Zheng et al. “GPT-4V(ision) is a Generalist Web Agent, if Grounded”. In: Proceedings of the 41st International Conference on Machine Learning. Ed. by Ruslan Salakhutdinov et al. Vol. 235. Proceedings of Machine Learning Research. PMLR, 2024, pp. 61349–61385
work page 2024
-
[40]
AppAgent: Multimodal Agents as Smartphone Users
Chi Zhang et al. “AppAgent: Multimodal Agents as Smartphone Users”. In:arXiv preprint arXiv:2312.13771(2023)
-
[41]
AppAgent v2: Advanced Agent for Flexible Mobile Interactions
Yanda Li et al. “AppAgent v2: Advanced Agent for Flexible Mobile Interactions”. In:arXiv preprint arXiv:2408.11824(2024)
-
[42]
From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces
Peter Shaw et al. “From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces”. In: Advances in Neural Information Processing Systems. Ed. by A. Oh et al. Vol. 36. Curran Associates, Inc., 2023, pp. 34354–34370
work page 2023
-
[43]
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Kanzhi Cheng et al. “SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents”. In:Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Bangkok, Thailand: Association for Computational Linguistics, Aug. 2024, pp. 9313–9332.DOI:10.18653/v1/2024.acl-long.505
-
[44]
ScreenAgent: A Vision Language Model-driven Computer Control Agent
Runliang Niu et al. “ScreenAgent: A Vision Language Model-driven Computer Control Agent”. In:Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI-24). IJCAI, 2024, pp. 6433–6441.DOI:10.24963/ijcai.2024/711
-
[45]
OmniParser for Pure Vision Based GUI Agent
Yadong Lu et al. “OmniParser for Pure Vision Based GUI Agent”. In:arXiv preprint arXiv:2408.00203(2024)
-
[46]
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You et al. “Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs”. In: Computer Vision – ECCV 2024. Vol. 15122. Lecture Notes in Computer Science. Springer Nature Switzerland AG, 2024, pp. 240–255. DOI: 10.1007/978-3-031-73039-9_14
-
[47]
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Zhangheng Li et al. “Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms”. In:Proceedings of the International Conference on Learning Representations (ICLR) 2025. 2025
work page 2025
-
[48]
OpenAI. Introducing Operator. https://openai.com/zh-Hans-CN/index/introducing-operator/. Accessed: 2025-11-23. 2024
work page 2025
-
[49]
OpenAI. Computer-Using Agent. https://openai.com/zh-Hans-CN/index/computer-using-agent/. Accessed: 2025-11-24. 2024
work page 2025
-
[50]
Anthropic. Developing Computer Use. https://www.anthropic.com/news/developing-computer-use. Accessed: 2025-11-24. 2024
work page 2025
-
[51]
Google DeepMind. Introducing the Gemini 2.5 Computer Use model. https://blog.google/technology/google-deepmind/gemini-computer-use-model/. Accessed: 2025-11-24. 2025
work page 2025
-
[54]
OpenCUA: Open Foundations for Computer-Use Agents
Xinyuan Wang et al. “OpenCUA: Open Foundations for Computer-Use Agents”. In: arXiv preprint arXiv:2508.09123 (2025)
-
[55]
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
Zeyi Liao et al. “EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage”. In:Proceedings of the International Conference on Learning Representations (ICLR) 2025. Poster. 2025
work page 2025
-
[56]
Imprompter: Tricking LLM Agents into Improper Tool Use
Xiaohan Fu et al. “Imprompter: Tricking LLM Agents into Improper Tool Use”. In:arXiv preprint arXiv:2410.14923(2024)
-
[57]
The Obvious Invisible Threat: LLM-Powered GUI Agents’ Vulnerability to Fine-Print Injections
Chaoran Chen et al. “The Obvious Invisible Threat: LLM-Powered GUI Agents’ Vulnerability to Fine-Print Injections”. In:arXiv preprint arXiv:2504.11281(2025)
-
[58]
Simple Prompt Injection Attacks Can Leak Personal Data Observed by LLM Agents During Task Execution
Meysam Alizadeh et al. “Simple Prompt Injection Attacks Can Leak Personal Data Observed by LLM Agents During Task Execution”. In:arXiv preprint arXiv:2506.01055(2025)
-
[59]
Unveiling Privacy Risks in LLM Agent Memory
Bo Wang et al. “Unveiling Privacy Risks in LLM Agent Memory”. In:Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vienna, Austria: Association for Computational Linguistics, 2025, pp. 25241–25260. DOI:10.18653/v1/2025.acl-long.1227
-
[60]
Private Attribute Inference from Images with Vision-Language Models
Batuhan Tömekçe et al. “Private Attribute Inference from Images with Vision-Language Models”. In: Advances in Neural Information Processing Systems 37. 2024, pp. 103619–103651. DOI: 10.52202/079017-3291
-
[61]
Weidi Luo et al. “Doxing via the Lens: Revealing Privacy Leakage in Image Geolocation for Agentic Multi-Modal Large Reasoning Models”. In: arXiv preprint arXiv:2504.19373 (2025)
-
[62]
Human-Centered Privacy Research in the Age of Large Language Models
Tianshi Li et al. “Human-Centered Privacy Research in the Age of Large Language Models”. In:Extended Abstracts of the ACM Conference on Human Factors in Computing Systems (CHI) 2024. 2024, Paper No. 59.DOI:10.1145/3613905.3643983
-
[63]
Zhiping Zhang, Bingcan Guo, and Tianshi Li. “Can Humans Oversee Agents to Prevent Privacy Leakage? A Study on Privacy Awareness, Preferences, and Trust in Language Model Agents”. In: arXiv preprint arXiv:2411.01344 (2024)
-
[64]
Apple Inc. About iCloud Private Relay. Apple Support. Accessed: 2025-12-04. 2023. URL: https://support.apple.com/en-sg/102602
work page 2025
-
[65]
Victor Costan and Srinivas Devadas. Intel SGX Explained. Tech. rep. 2016/086. IACR Cryptology ePrint Archive, 2016. URL: https://eprint.iacr.org/2016/086.pdf
work page 2016
-
[66]
Attestation Mechanisms for Trusted Execution Environments Demystified
Jämes Ménétrey et al. “Attestation Mechanisms for Trusted Execution Environments Demystified”. In: Proceedings of the 2022 Workshop on System Software for Trusted Execution (SysTEX). 2022. DOI: 10.1007/978-3-031-16092-9_7
-
[67]
Trusted Mobile Computing: An Overview of Existing Solutions
M. Amine Bouazzouni, Emmanuel Conchon, and Fabrice Peyrard. “Trusted Mobile Computing: An Overview of Existing Solutions”. In: Future Generation Computer Systems 80 (2018), pp. 596–612. DOI: 10.1016/j.future.2017.05.029
-
[68]
PrivacyAsst: Safeguarding User Privacy in Tool-Using Large Language Model Agents
Xinyu Zhang et al. “PrivacyAsst: Safeguarding User Privacy in Tool-Using Large Language Model Agents”. In: IEEE Transactions on Dependable and Secure Computing 21.6 (2024), pp. 5242–5258. DOI: 10.1109/TDSC.2024.3372777
-
[69]
MMPro: A Decoupled Perception-Thinking-Execution Framework for Secure GUI Agent
Benlong Wu et al. “MMPro: A Decoupled Perception-Thinking-Execution Framework for Secure GUI Agent”. In: Proceedings of the 33rd ACM International Conference on Multimedia (MM 2025). 2025, pp. 4679–4688. DOI: 10.1145/3746027.3755553
-
[70]
Tanaos. tanaos-text-anonymizer-v1: A small but performant Text Anonymization model. Hugging Face Model Card. Accessed: 2025-12-05. 2025. URL: https://huggingface.co/tanaos/tanaos-text-anonymizer-v1
work page 2025
-
[71]
GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning
Zhen Xiang et al. “GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning”. In:Proceedings of the 38th International Conference on Machine Learning (ICML 2025). 2025
work page 2025
-
[72]
AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection
Weidi Luo et al. “AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection”. In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Long Papers. 2025, pp. 8104–8139. DOI: 10.18653/v1/2025.acl-long.399
-
[73]
Towards a visual privacy advisor: Understanding and predicting privacy risks in images
Tribhuvanesh Orekondy, Bernt Schiele, and Mario Fritz. “Towards a visual privacy advisor: Understanding and predicting privacy risks in images”. In: Proceedings of the IEEE international conference on computer vision. 2017, pp. 3686–3695
work page 2017
-
[74]
Privacyalert: A dataset for image privacy prediction
Chenye Zhao et al. “Privacyalert: A dataset for image privacy prediction”. In: Proceedings of the International AAAI Conference on Web and Social Media. Vol. 16. 2022, pp. 1352–1361
work page 2022
-
[75]
Evaluation of Human Visual Privacy Protection: Three-Dimensional Framework and Benchmark Dataset
Sara Abdulaziz, Giacomo D’amicantonio, and Egor Bondarev. “Evaluation of Human Visual Privacy Protection: Three-Dimensional Framework and Benchmark Dataset”. In:Proceedings of the IEEE/CVF International Conference on Computer Vision. 2025, pp. 5893–5902
work page 2025
-
[76]
Biv-priv-seg: Locating private content in images taken by people with visual impairments
Yu-Yun Tseng et al. “Biv-priv-seg: Locating private content in images taken by people with visual impairments”. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE. 2025, pp. 430–440
work page 2025
-
[77]
DIPA2: An Image Dataset with Cross-cultural Privacy Perception Annotations
Anran Xu et al. “DIPA2: An Image Dataset with Cross-cultural Privacy Perception Annotations”. In: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7.4 (2024), pp. 1–30
work page 2024
-
[78]
Multi-P²A: A Multi-perspective Benchmark on Privacy Assessment for Large Vision-Language Models
Jie Zhang et al. “Multi-P²A: A Multi-perspective Benchmark on Privacy Assessment for Large Vision-Language Models”. In: arXiv preprint arXiv:2412.19496 (2024)
-
[79]
RootsAutomation. ScreenSpot. https://huggingface.co/datasets/rootsautomation/ScreenSpot. Accessed: 2025-11-26
work page 2025
-
[80]
Screenspot-pro: Gui grounding for professional high-resolution computer use
Kaixin Li et al. “Screenspot-pro: Gui grounding for professional high-resolution computer use”. In:Proceedings of the 33rd ACM International Conference on Multimedia. 2025, pp. 8778–8786
work page 2025
-
[81]
Visualwebarena: Evaluating multimodal agents on realistic visual web tasks
Jing Yu Koh et al. “Visualwebarena: Evaluating multimodal agents on realistic visual web tasks”. In:Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024, pp. 881–905
work page 2024
-
[82]
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Quanfeng Lu et al. “GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices”. In:Proceedings of the IEEE/CVF International Conference on Computer Vision. 2025, pp. 22404–22414
work page 2025
-
[83]
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie et al. “OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments”. In:Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2024), Datasets & Benchmarks Track. 2024, pp. 52040–52094. DOI:10.52202/079017-1650