pith. machine review for the scientific record. sign in

arxiv: 2510.10073 · v2 · submitted 2025-10-11 · 💻 cs.CR · cs.CV

SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents

Pith reviewed 2026-05-18 08:12 UTC · model grok-4.3

classification 💻 cs.CR cs.CV
keywords LVLM web agentssecurity benchmarkadversarial attacksweb automationagent vulnerabilitymulti-layered evaluationsimulated environmentsagent safety
0
0 comments X

The pith

A new benchmark shows that all tested vision-language web agents remain vulnerable to subtle adversarial manipulations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SecureWebArena to fill gaps in existing evaluations by offering a broad test of security risks for agents that combine vision and language models to handle web tasks. It supplies six realistic simulated environments such as shopping sites and forums, along with thousands of task trajectories and a clear breakdown of six distinct attack types that target either the user instructions or the surrounding site conditions. Large experiments across nine models from different design families demonstrate that every agent can be led astray by small changes, and that gains in task specialization often come with reduced resistance to these changes. A multi-layered check tracks not just final success but also the agent's internal steps and full sequence of actions to pinpoint failure points. This matters for anyone building or using automated web tools, since undetected exploits could lead to unintended purchases, data leaks, or other real harms once agents leave controlled tests.

Core claim

SecureWebArena supplies the first unified suite for security testing of LVLM-based web agents through six simulated environments, 2970 trajectories, a taxonomy of six attack vectors that cover both user-level and environment-level manipulations, and a three-part evaluation protocol that separately examines internal reasoning, full behavioral trajectories, and final task outcomes. Application of this suite to nine models spanning general-purpose, agent-specialized, and GUI-grounded categories establishes that every tested agent fails under subtle adversarial inputs and that model specialization introduces measurable security trade-offs.

What carries the argument

The unified evaluation suite that combines six realistic web environments with a structured taxonomy of six attack vectors and a multi-layered protocol for scoring failures in reasoning, trajectory, and outcome.

If this is right

  • Web-agent designs must add explicit protections against both prompt changes and site-level manipulations to reach reliable real-world use.
  • Agent-specialized models require targeted security training to offset the vulnerabilities that accompany their performance gains.
  • Security checks for these agents should routinely inspect reasoning steps and action sequences rather than measuring only final task success.
  • The benchmark supplies a reusable foundation that future work can extend to develop agents suitable for trustworthy online automation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Real-world deployment on live websites may surface additional attack surfaces not captured inside the simulated environments.
  • Combining general and specialized models in a single system could reduce the security trade-offs seen when using either type alone.
  • Continuous monitoring of agent reasoning during operation might allow early detection of the subtle manipulations the benchmark identifies.

Load-bearing premise

The six simulated environments and the six attack categories are broad enough to represent the security threats that would appear when these agents run on actual live websites.

What would settle it

Running one of the nine tested agents on a real e-commerce or forum site and successfully triggering one of the benchmark attack vectors to produce the predicted failure mode would support the results; repeated inability to reproduce the same failures on live sites would weaken the claim that the benchmark captures meaningful risks.

Figures

Figures reproduced from arXiv: 2510.10073 by Aishan Liu, Gan Xu, Jianle Gan, Junzheng Shi, Mingchuan Zhang, Quanchen Zou, Wenxin Zhang, Xianglong Liu, Yangguang Shao, Zhenfei Yin, Zonghao Ying.

Figure 1
Figure 1. Figure 1: Overall illustration of our SecureWebArena, the first holistic benchmark for evaluating the security of LVLM￾based web agents. 1 Introduction Large vision language models (LVLMs) [12, 33, 38] have equipped autonomous agents with powerful capabilities to perceive and rea￾son across language, vision, and user interface elements [11, 32, 58]. As web agents, these models can navigate complex websites, fill out… view at source ↗
Figure 2
Figure 2. Figure 2: SecureWebArena framework. It integrates simulated environments, diverse attack vectors, and multi-level evaluation to assess agent safety performance to adversarial manipulation. The six environments span a broad range of real-world interac￾tion contexts, grouped into four representative categories. Informa￾tion Retrieval and Navigation environments (e.g., Wikipedia and Reddit) feature dense, user-generate… view at source ↗
Figure 3
Figure 3. Figure 3: Overall comparison of agents’ vulnerability scores (RVR, BCR, and PDR) across 6 attack vectors. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Comparison of vulnerability scores (RVR, BCR, and PDR) of representative LVLM-based agents across 6 attack vectors. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Examples of evaluated environments. GPT-5 Claude Sonnet 4 Gemini 2.5 Pro GPT-4o Claude Sonnet 3.7 Seed-1.5-VL GLM-4.5V UI-TARS-1.5 Aguvis 20 40 60 80 100 Shopping GPT-5 Claude Sonnet 4 Gemini 2.5 Pro GPT-4o Claude Sonnet 3.7 Seed-1.5-VL GLM-4.5V UI-TARS-1.5 Aguvis 20 40 60 80 100 Classifieds GPT-5 Claude Sonnet 4 Gemini 2.5 Pro GPT-4o Claude Sonnet 3.7 Seed-1.5-VL GLM-4.5V UI-TARS-1.5 Aguvis 20 40 60 80 10… view at source ↗
Figure 6
Figure 6. Figure 6: PDR (%) of all evaluated LVLM-based agents across 6 environments and 6 attack types. [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Case study illustrating an indirect prompt injection during an online shopping task. [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗
read the original abstract

Large vision-language model (LVLM)-based web agents are emerging as powerful tools for automating complex online tasks. However, when deployed in real-world environments, they face serious security risks, motivating the design of security evaluation benchmarks. Existing benchmarks provide only partial coverage, typically restricted to narrow scenarios such as user-level prompt manipulation, and thus fail to capture the broad range of agent vulnerabilities. To address this gap, we present \tool{}, the first holistic benchmark for evaluating the security of LVLM-based web agents. \tool{} first introduces a unified evaluation suite comprising six simulated but realistic web environments (\eg, e-commerce platforms, community forums) and includes 2,970 high-quality trajectories spanning diverse tasks and attack settings. The suite defines a structured taxonomy of six attack vectors spanning both user-level and environment-level manipulations. In addition, we introduce a multi-layered evaluation protocol that analyzes agent failures across three critical dimensions: internal reasoning, behavioral trajectory, and task outcome, facilitating a fine-grained risk analysis that goes far beyond simple success metrics. Using this benchmark, we conduct large-scale experiments on 9 representative LVLMs, which fall into three categories: general-purpose, agent-specialized, and GUI-grounded. Our results show that all tested agents are consistently vulnerable to subtle adversarial manipulations and reveal critical trade-offs between model specialization and security. By providing (1) a comprehensive benchmark suite with diverse environments and a multi-layered evaluation pipeline, and (2) empirical insights into the security challenges of modern LVLM-based web agents, \tool{} establishes a foundation for advancing trustworthy web agent deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces SecureWebArena as the first holistic security benchmark for LVLM-based web agents. It comprises six simulated realistic web environments (e.g., e-commerce and forums), 2,970 trajectories, a taxonomy of six attack vectors covering user- and environment-level manipulations, and a multi-layered evaluation protocol assessing internal reasoning, behavioral trajectories, and task outcomes. Large-scale experiments on nine LVLMs across general-purpose, agent-specialized, and GUI-grounded categories show that all agents are vulnerable to subtle adversarial manipulations, with observed trade-offs between model specialization and security.

Significance. If the simulation fidelity and evaluation protocol hold, this benchmark fills a gap in partial existing evaluations by providing comprehensive coverage and fine-grained failure analysis, establishing a foundation for trustworthy LVLM web agent deployment. The scale (nine models, nearly three thousand trajectories) and multi-dimensional protocol are strengths that enable more nuanced risk assessment than success-rate-only metrics.

major comments (1)
  1. [Abstract and §3 (Benchmark Construction)] The central claim that all tested agents are consistently vulnerable and that the benchmark captures broad real-world threats rests on the six simulated environments and six-vector taxonomy. However, the manuscript provides no explicit validation (e.g., comparison of observable states, DOM dynamics, or authentication redirects) that these simulations reproduce the security-relevant behaviors of live web sites; without such evidence the reported failure rates and specialization-security trade-offs may not transfer.
minor comments (2)
  1. [Abstract] The abstract summarizes high-level outcomes but supplies no quantitative results, error bars, or statistical controls; moving at least one key table or figure summary into the abstract would strengthen immediate verifiability.
  2. [§4 (Experiments)] Clarify whether the 2,970 trajectories include balanced coverage across all six attack vectors and environments, or whether some combinations are underrepresented.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for highlighting the importance of simulation fidelity in establishing the transferability of our benchmark results. We agree that explicit validation would strengthen the central claims regarding agent vulnerabilities and the broad applicability of the observed trade-offs. We address this comment below and will incorporate revisions to improve the manuscript.

read point-by-point responses
  1. Referee: [Abstract and §3 (Benchmark Construction)] The central claim that all tested agents are consistently vulnerable and that the benchmark captures broad real-world threats rests on the six simulated environments and six-vector taxonomy. However, the manuscript provides no explicit validation (e.g., comparison of observable states, DOM dynamics, or authentication redirects) that these simulations reproduce the security-relevant behaviors of live web sites; without such evidence the reported failure rates and specialization-security trade-offs may not transfer.

    Authors: We acknowledge that the current manuscript does not present a direct side-by-side empirical validation (such as quantitative comparisons of DOM trees, state transitions, or authentication redirect behaviors) between the simulated environments and live websites. The environments were implemented using standard web frameworks to replicate core interaction patterns and security surfaces observed in real platforms (e.g., dynamic product updates in e-commerce and threaded discussions in forums). However, we recognize that this design rationale alone is insufficient to fully address transferability concerns. In the revised manuscript, we will expand §3 with a dedicated subsection on environment construction that includes concrete examples of replicated observable states and attack surfaces. We will also add an explicit limitations paragraph discussing potential discrepancies with live deployments and how future work could incorporate real-site validation. These changes will allow readers to better evaluate the generalizability of the reported failure rates and specialization-security trade-offs. revision: yes

Circularity Check

0 steps flagged

Empirical benchmark creation and testing exhibits no circularity

full rationale

This paper introduces a new security evaluation benchmark for LVLM-based web agents, defines six simulated environments and a taxonomy of six attack vectors, generates 2,970 trajectories, and reports empirical results from testing nine existing models across three categories. The central claims consist of observed failure rates and specialization-security trade-offs derived directly from these experiments. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided text; the work is self-contained as an empirical evaluation effort without any reduction of results to self-referential inputs or self-citation chains.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The paper contributes a constructed evaluation framework rather than a derivation from first principles. It relies on domain assumptions about simulation fidelity and attack coverage instead of introducing fitted parameters or new physical entities.

free parameters (2)
  • Selection and design of the six web environments
    Chosen to represent diverse real-world tasks such as e-commerce and community forums.
  • Definition of the six attack vectors in the taxonomy
    Structured to span user-level and environment-level manipulations.
axioms (1)
  • domain assumption Simulated web environments can serve as valid proxies for evaluating real-world security risks to deployed LVLM-based agents.
    Invoked to justify the relevance of the benchmark results to practical deployment.

pith-pipeline@v0.9.0 · 5855 in / 1348 out tokens · 22616 ms · 2026-05-18T08:12:15.964394+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages · 15 internal anchors

  1. [1]

    Tamer Abuelsaad, Deepak Akkil, Prasenjit Dey, Ashish Jagmohan, Aditya Vem- paty, and Ravi Kokku. 2024. Agent-e: From autonomous web navigation to foun- dational design principles in agentic systems.arXiv preprint arXiv:2407.13032 (2024)

  2. [2]

    2025.Introducing Claude 3.7 Sonnet and Claude Code

    Anthropic. 2025.Introducing Claude 3.7 Sonnet and Claude Code. https://www. anthropic.com/news/claude-3-7-sonnet Accessed: 2025-10-04

  3. [3]

    2025.Introducing Claude 4

    Anthropic. 2025.Introducing Claude 4. https://www.anthropic.com/news/claude- 4 Accessed: 2025-10-04

  4. [4]

    Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. 2025. Gemini 2.5: Pushing the frontier with advanced reasoning, multi- modality, long context, and next generation agentic capabilities.arXiv preprint arXiv:2507.06261(2025)

  5. [5]

    Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Sam Stevens, Boshi Wang, Huan Sun, and Yu Su. 2023. Mind2web: Towards a generalist agent for the web. Advances in Neural Information Processing Systems36 (2023), 28091–28114

  6. [6]

    Yue Deng, Wenxuan Zhang, Sinno Jialin Pan, and Lidong Bing. 2023. Multilingual jailbreak challenges in large language models.arXiv preprint arXiv:2310.06474 (2023)

  7. [7]

    Peng Ding, Jun Kuang, Dan Ma, Xuezhi Cao, Yunsen Xian, Jiajun Chen, and Shujian Huang. 2023. A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily.arXiv preprint arXiv:2311.08268 (2023)

  8. [8]

    Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori, Chuan Guo, and Kamalika Chaudhuri. 2025. Wasp: Benchmarking web agent security against prompt injection attacks.arXiv preprint arXiv:2504.18575(2025)

  9. [9]

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not what you’ve signed up for: Compromising real- world llm-integrated applications with indirect prompt injection. InProceedings of the 16th ACM workshop on artificial intelligence and security. 79–90

  10. [10]

    Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, et al

  11. [11]

    Seed1.5-VL Technical Report

    Seed1.5-VL Technical Report. arXiv:2505.07062 [cs.CV] https://arxiv.org/ abs/2505.07062

  12. [12]

    Jiaxing Huang, Jingyi Zhang, Kai Jiang, Han Qiu, Xiaoqin Zhang, Ling Shao, Shijian Lu, and Dacheng Tao. 2025. Visual Instruction Tuning towards General- Purpose Multimodal Large Language Model: A Survey.International Journal of Computer Vision(2025), 1–39

  13. [13]

    Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. 2024. Gpt-4o system card.arXiv preprint arXiv:2410.21276(2024)

  14. [14]

    Sam Johnson, Viet Pham, and Thai Le. 2025. Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree.arXiv preprint arXiv:2507.14799(2025)

  15. [15]

    Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, and Daniel Fried

  16. [16]

    Visualwebarena: Evaluating multimodal agents on realistic visual web tasks.arXiv preprint arXiv:2401.13649(2024)

  17. [17]

    Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, Elaine Chang, Vaughn Robinson, Sean Hendryx, Shuyan Zhou, Matt Fredrikson, et al. 2024. Refusal-trained llms are easily jailbroken as browser agents.arXiv preprint arXiv:2410.13886(2024)

  18. [18]

    Hanyu Lai, Xiao Liu, Iat Long Iong, Shuntian Yao, Yuxuan Chen, Pengbo Shen, Hao Yu, Hanchen Zhang, Xiaohan Zhang, Yuxiao Dong, et al. 2024. Autowebglm: A large language model-based web navigating agent. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5295–5306

  19. [19]

    Ido Levy, Ben Wiesel, Sami Marreed, Alon Oved, Avi Yaeli, and Segev Shlomov

  20. [20]

    St-webagentbench: A benchmark for evaluating safety and trustworthiness in web agents.arXiv preprint arXiv:2410.06703(2024)

  21. [21]

    Xuan Li, Zhanke Zhou, Jianing Zhu, Jiangchao Yao, Tongliang Liu, and Bo Han

  22. [22]

    Deepinception: Hypnotize large language model to be jailbreaker.arXiv preprint arXiv:2311.03191(2023)

  23. [23]

    Aishan Liu, Jun Guo, Jiakai Wang, Siyuan Liang, Renshuai Tao, Wenbo Zhou, Cong Liu, Xianglong Liu, and Dacheng Tao. 2023. X-Adv: Physical Adversarial Object Attacks against X-ray Prohibited Item Detection. InUSENIX Security Symposium

  24. [24]

    Aishan Liu, Tairan Huang, Xianglong Liu, Yitao Xu, Yuqing Ma, Xinyun Chen, Stephen J Maybank, and Dacheng Tao. 2020. Spatiotemporal attacks for embodied agents. InECCV

  25. [25]

    Aishan Liu, Xianglong Liu, Jiaxin Fan, Yuqing Ma, Anlan Zhang, Huiyuan Xie, and Dacheng Tao. 2019. Perceptual-sensitive gan for generating adversarial patches. InAAAI

  26. [26]

    Aishan Liu, Xianglong Liu, Hang Yu, Chongzhi Zhang, Qiang Liu, and Dacheng Tao. 2021. Training robust deep neural networks via adversarial noise propagation. TIP(2021)

  27. [27]

    Aishan Liu, Jiakai Wang, Xianglong Liu, Bowen Cao, Chongzhi Zhang, and Hang Yu. 2020. Bias-based universal adversarial patch attack for automatic check-out. InECCV

  28. [28]

    Aishan Liu, Zonghao Ying, Le Wang, Junjie Mu, Jinyang Guo, Jiakai Wang, Yuqing Ma, Siyuan Liang, Mingchuan Zhang, Xianglong Liu, et al. 2025. AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions.arXiv preprint arXiv:2506.14697(2025)

  29. [29]

    Xinbei Ma, Yiting Wang, Yao Yao, Tongxin Yuan, Aston Zhang, Zhuosheng Zhang, and Hai Zhao. 2025. Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 22324–22339

  30. [30]

    Junjie Mu, Zonghao Ying, Zhekui Fan, Zonglei Jing, Yaoyuan Zhang, Zhengmin Yu, Wenxin Zhang, Quanchen Zou, and Xiangzheng Zhang. 2025. Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks?arXiv preprint arXiv:2509.06350(2025)

  31. [31]

    Liangbo Ning, Ziran Liang, Zhuohang Jiang, Haohao Qu, Yujuan Ding, Wenqi Fan, Xiao-yong Wei, Shanru Lin, Hui Liu, Philip S Yu, et al. 2025. A survey of webagents: Towards next-generation ai agents for web automation with large foundation models. InProceedings of the 31st ACM SIGKDD Conference on Knowl- edge Discovery and Data Mining V. 2. 6140–6150

  32. [32]

    2025.GPT-5 is here

    OpenAI. 2025.GPT-5 is here. https://openai.com/gpt-5/ Accessed: 2025-10-04. Conference acronym ’XX, June 03–05, 2018, Woodstock, NY Trovato et al

  33. [33]

    Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, et al. 2025. UI-TARS: Pioneering Automated GUI Interaction with Native Agents.arXiv preprint arXiv:2501.12326 (2025)

  34. [34]

    Haoyi Qiu, Alexander R Fabbri, Divyansh Agarwal, Kung-Hsiang Huang, Sarah Tan, Nanyun Peng, and Chien-Sheng Wu. 2024. Evaluating cultural and social awareness of llm web agents.arXiv preprint arXiv:2410.23252(2024)

  35. [35]

    Yangguang Shao, Xinjie Lin, Haozheng Luo, Chengshang Hou, Gang Xiong, Jiahao Yu, and Junzheng Shi. 2025. POISONCRAFT: Practical Poisoning of Retrieval- Augmented Generation for Large Language Models. arXiv:2505.06579 [cs.CR] https://arxiv.org/abs/2505.06579

  36. [36]

    Settaluri Lakshmi Sravanthi, Ankit Mishra, Debjyoti Mondal, Subhadarshi Panda, Rituraj Singh, and Pushpak Bhattacharyya. 2025. From Perception to Reasoning: Enhancing Vision-Language Models for Mobile UI Understanding. InFindings of the Association for Computational Linguistics: ACL 2025. 25250–25269

  37. [37]

    Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. 2023. Gemini: a family of highly capable multimodal models.arXiv preprint arXiv:2312.11805(2023)

  38. [38]

    V Team, Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, et al. [n. d.]. Glm-4.5 v and glm-4.1 v-thinking: Towards versatile multi-modal reasoning with scalable reinforcement learning, 2025.URL https://arxiv. org/abs/2507.01006([n. d.])

  39. [39]

    Ada Defne Tur, Nicholas Meade, Xing Han Lù, Alejandra Zambrano, Arkil Patel, Esin Durmus, Spandana Gella, Karolina Stańczak, and Siva Reddy. 2025. Safearena: Evaluating the safety of autonomous web agents.arXiv preprint arXiv:2503.04957 (2025)

  40. [40]

    Haowei Wang, Junjie Wang, Xiaojun Jia, Rupeng Zhang, Mingyang Li, Zhe Liu, Yang Liu, and Qing Wang. 2025. AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery.arXiv preprint arXiv:2505.21499(2025)

  41. [41]

    Le Wang, Zonghao Ying, Tianyuan Zhang, Siyuan Liang, Shengshan Hu, Mingchuan Zhang, Aishan Liu, and Xianglong Liu. 2025. Manipulating Multi- modal Agents via Cross-Modal Prompt Injection.arXiv preprint arXiv:2504.14348 (2025)

  42. [42]

    Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, et al. 2024. Qwen2-vl: Enhancing vision-language model’s perception of the world at any resolution.arXiv preprint arXiv:2409.12191(2024)

  43. [43]

    Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail?Advances in Neural Information Processing Systems 36 (2023), 80079–80110

  44. [44]

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems35 (2022), 24824–24837

  45. [45]

    Yisong Xiao, Aishan Liu, Tianlin Li, and Xianglong Liu. 2023. Latent imitator: Generating natural individual discriminatory instances for black-box fairness testing. InISSTA

  46. [46]

    Yisong Xiao, Aishan Liu, Siyuan Liang, Xianglong Liu, and Dacheng Tao. 2025. Fairness mediator: Neutralize stereotype associations to mitigate bias in large language models. InISSTA

  47. [47]

    Yisong Xiao, Aishan Liu, Siyuan Liang, Zonghao Ying, Xianglong Liu, and Dacheng Tao. 2025. Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing.arXiv preprint arXiv:2510.01243(2025)

  48. [48]

    Yisong Xiao, Aishan Liu, Tianyuan Zhang, Haotong Qin, Jinyang Guo, and Xianglong Liu. 2023. Robustmq: benchmarking robustness of quantized models. Visual Intelligence(2023)

  49. [49]

    Yisong Xiao, Aishan Liu, Xinwei Zhang, Tianyuan Zhang, Tianlin Li, Siyuan Liang, Xianglong Liu, Yang Liu, and Dacheng Tao. 2025. BDefects4NN: A Back- door Defect Database for Controlled Localization Studies in Neural Networks. In ICSE

  50. [50]

    Yisong Xiao, Xianglong Liu, QianJia Cheng, Zhenfei Yin, Siyuan Liang, Jiapeng Li, Jing Shao, Aishan Liu, and Dacheng Tao. 2025. GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing: Y. Xiao et al.International Journal of Computer Vision(2025)

  51. [51]

    Yiheng Xu, Zekun Wang, Junli Wang, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, and Caiming Xiong. 2024. Aguvis: Unified pure vision agents for autonomous gui interaction.arXiv preprint arXiv:2412.04454(2024)

  52. [52]

    Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, and Jianfeng Gao

  53. [53]

    Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

    Set-of-mark prompting unleashes extraordinary visual grounding in gpt-4v. arXiv preprint arXiv:2310.11441(2023)

  54. [54]

    Shunyu Yao, Howard Chen, John Yang, and Karthik Narasimhan. 2022. Webshop: Towards scalable real-world web interaction with grounded language agents. Advances in Neural Information Processing Systems35 (2022), 20744–20757

  55. [55]

    Zonghao Ying, Aishan Liu, Siyuan Liang, Lei Huang, Jinyang Guo, Wenbo Zhou, Xianglong Liu, and Dacheng Tao. 2024. Safebench: A safety evaluation framework for multimodal large language models.arXiv preprint arXiv:2410.18927(2024)

  56. [56]

    Zonghao Ying, Aishan Liu, Xianglong Liu, and Dacheng Tao. 2024. Unveiling the safety of gpt-4o: An empirical study using jailbreak attacks.arXiv preprint arXiv:2406.06302(2024)

  57. [57]

    Zonghao Ying, Aishan Liu, Tianyuan Zhang, Zhengmin Yu, Siyuan Liang, Xiang- long Liu, and Dacheng Tao. 2025. Jailbreak vision language models via bi-modal adversarial prompt.IEEE Transactions on Information Forensics and Security (2025)

  58. [58]

    Zonghao Ying, Siyang Wu, Run Hao, Peng Ying, Shixuan Sun, Pengyu Chen, Junze Chen, Hao Du, Kaiwen Shen, Shangkun Wu, et al. 2025. Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025.arXiv preprint arXiv:2506.12430(2025)

  59. [59]

    Zonghao Ying, Deyue Zhang, Zonglei Jing, Yisong Xiao, Quanchen Zou, Aishan Liu, Siyuan Liang, Xiangzheng Zhang, Xianglong Liu, and Dacheng Tao. 2025. Reasoning-augmented conversation for multi-turn jailbreak attacks on large language models.arXiv preprint arXiv:2502.11054(2025)

  60. [60]

    Zonghao Ying, Guangyi Zheng, Yongxin Huang, Deyue Zhang, Wenxin Zhang, Quanchen Zou, Aishan Liu, Xianglong Liu, and Dacheng Tao. 2025. Towards understanding the safety boundaries of deepseek models: Evaluation and findings. arXiv preprint arXiv:2503.15092(2025)

  61. [61]

    Jiahao Yu, Yangguang Shao, Hanwen Miao, and Junzheng Shi. 2025. PROMPT- FUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs. arXiv:2409.14729 [cs.CR] https://arxiv.org/abs/2409.14729

  62. [62]

    Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Pinjia He, Shum- ing Shi, and Zhaopeng Tu. 2023. Gpt-4 is too smart to be safe: Stealthy chat with llms via cipher.arXiv preprint arXiv:2308.06463(2023)

  63. [63]

    Aohan Zeng, Xin Lv, Qinkai Zheng, Zhenyu Hou, Bin Chen, Chengxing Xie, Cunxiang Wang, Da Yin, Hao Zeng, Jiajie Zhang, et al. 2025. Glm-4.5: Agentic, reasoning, and coding (arc) foundation models.arXiv preprint arXiv:2508.06471 (2025)

  64. [64]

    Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, and Weiyan Shi

  65. [65]

    InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

    How johnny can persuade llms to jailbreak them: Rethinking persuasion to challenge ai safety by humanizing llms. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 14322–14350

  66. [66]

    Yanzhe Zhang, Tao Yu, and Diyi Yang. 2024. Attacking vision-language computer agents via pop-ups.arXiv preprint arXiv:2411.02391(2024)

  67. [67]

    Shuyan Zhou, Frank F Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, et al. 2023. Webarena: A realistic web environment for building autonomous agents.arXiv preprint arXiv:2307.13854(2023)

  68. [68]

    Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J Zico Kolter, and Matt Fredrikson. 2023. Universal and transferable adversarial attacks on aligned language models.arXiv preprint arXiv:2307.15043(2023)

  69. [69]

    PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking

    Quanchen Zou, Zonghao Ying, Moyang Chen, Wenzhuo Xu, Yisong Xiao, Yakai Li, Deyue Zhang, Dongdong Yang, Zhao Liu, and Xiangzheng Zhang. 2025. PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreak- ing.arXiv preprint arXiv:2507.21540(2025). A Appendix A.1 Environment Examples Fig. 5 presents representative screenshots from the 6...