arxiv: 2510.10073 · v2 · submitted 2025-10-11 · 💻 cs.CR · cs.CV

SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents

Zonghao Ying , Yangguang Shao , Jianle Gan , Gan Xu , Wenxin Zhang , Quanchen Zou , Junzheng Shi , Zhenfei Yin

show 3 more authors

Mingchuan Zhang Aishan Liu Xianglong Liu

This is my paper

Pith reviewed 2026-05-18 08:12 UTC · model grok-4.3

classification 💻 cs.CR cs.CV

keywords LVLM web agentssecurity benchmarkadversarial attacksweb automationagent vulnerabilitymulti-layered evaluationsimulated environmentsagent safety

0 comments

The pith

A new benchmark shows that all tested vision-language web agents remain vulnerable to subtle adversarial manipulations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SecureWebArena to fill gaps in existing evaluations by offering a broad test of security risks for agents that combine vision and language models to handle web tasks. It supplies six realistic simulated environments such as shopping sites and forums, along with thousands of task trajectories and a clear breakdown of six distinct attack types that target either the user instructions or the surrounding site conditions. Large experiments across nine models from different design families demonstrate that every agent can be led astray by small changes, and that gains in task specialization often come with reduced resistance to these changes. A multi-layered check tracks not just final success but also the agent's internal steps and full sequence of actions to pinpoint failure points. This matters for anyone building or using automated web tools, since undetected exploits could lead to unintended purchases, data leaks, or other real harms once agents leave controlled tests.

Core claim

SecureWebArena supplies the first unified suite for security testing of LVLM-based web agents through six simulated environments, 2970 trajectories, a taxonomy of six attack vectors that cover both user-level and environment-level manipulations, and a three-part evaluation protocol that separately examines internal reasoning, full behavioral trajectories, and final task outcomes. Application of this suite to nine models spanning general-purpose, agent-specialized, and GUI-grounded categories establishes that every tested agent fails under subtle adversarial inputs and that model specialization introduces measurable security trade-offs.

What carries the argument

The unified evaluation suite that combines six realistic web environments with a structured taxonomy of six attack vectors and a multi-layered protocol for scoring failures in reasoning, trajectory, and outcome.

If this is right

Web-agent designs must add explicit protections against both prompt changes and site-level manipulations to reach reliable real-world use.
Agent-specialized models require targeted security training to offset the vulnerabilities that accompany their performance gains.
Security checks for these agents should routinely inspect reasoning steps and action sequences rather than measuring only final task success.
The benchmark supplies a reusable foundation that future work can extend to develop agents suitable for trustworthy online automation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Real-world deployment on live websites may surface additional attack surfaces not captured inside the simulated environments.
Combining general and specialized models in a single system could reduce the security trade-offs seen when using either type alone.
Continuous monitoring of agent reasoning during operation might allow early detection of the subtle manipulations the benchmark identifies.

Load-bearing premise

The six simulated environments and the six attack categories are broad enough to represent the security threats that would appear when these agents run on actual live websites.

What would settle it

Running one of the nine tested agents on a real e-commerce or forum site and successfully triggering one of the benchmark attack vectors to produce the predicted failure mode would support the results; repeated inability to reproduce the same failures on live sites would weaken the claim that the benchmark captures meaningful risks.

Figures

Figures reproduced from arXiv: 2510.10073 by Aishan Liu, Gan Xu, Jianle Gan, Junzheng Shi, Mingchuan Zhang, Quanchen Zou, Wenxin Zhang, Xianglong Liu, Yangguang Shao, Zhenfei Yin, Zonghao Ying.

**Figure 1.** Figure 1: Overall illustration of our SecureWebArena, the first holistic benchmark for evaluating the security of LVLMbased web agents. 1 Introduction Large vision language models (LVLMs) [12, 33, 38] have equipped autonomous agents with powerful capabilities to perceive and reason across language, vision, and user interface elements [11, 32, 58]. As web agents, these models can navigate complex websites, fill out… view at source ↗

**Figure 2.** Figure 2: SecureWebArena framework. It integrates simulated environments, diverse attack vectors, and multi-level evaluation to assess agent safety performance to adversarial manipulation. The six environments span a broad range of real-world interaction contexts, grouped into four representative categories. Information Retrieval and Navigation environments (e.g., Wikipedia and Reddit) feature dense, user-generate… view at source ↗

**Figure 3.** Figure 3: Overall comparison of agents’ vulnerability scores (RVR, BCR, and PDR) across 6 attack vectors. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Comparison of vulnerability scores (RVR, BCR, and PDR) of representative LVLM-based agents across 6 attack vectors. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Examples of evaluated environments. GPT-5 Claude Sonnet 4 Gemini 2.5 Pro GPT-4o Claude Sonnet 3.7 Seed-1.5-VL GLM-4.5V UI-TARS-1.5 Aguvis 20 40 60 80 100 Shopping GPT-5 Claude Sonnet 4 Gemini 2.5 Pro GPT-4o Claude Sonnet 3.7 Seed-1.5-VL GLM-4.5V UI-TARS-1.5 Aguvis 20 40 60 80 100 Classifieds GPT-5 Claude Sonnet 4 Gemini 2.5 Pro GPT-4o Claude Sonnet 3.7 Seed-1.5-VL GLM-4.5V UI-TARS-1.5 Aguvis 20 40 60 80 10… view at source ↗

**Figure 6.** Figure 6: PDR (%) of all evaluated LVLM-based agents across 6 environments and 6 attack types. [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

**Figure 7.** Figure 7: Case study illustrating an indirect prompt injection during an online shopping task. [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗

read the original abstract

Large vision-language model (LVLM)-based web agents are emerging as powerful tools for automating complex online tasks. However, when deployed in real-world environments, they face serious security risks, motivating the design of security evaluation benchmarks. Existing benchmarks provide only partial coverage, typically restricted to narrow scenarios such as user-level prompt manipulation, and thus fail to capture the broad range of agent vulnerabilities. To address this gap, we present \tool{}, the first holistic benchmark for evaluating the security of LVLM-based web agents. \tool{} first introduces a unified evaluation suite comprising six simulated but realistic web environments (\eg, e-commerce platforms, community forums) and includes 2,970 high-quality trajectories spanning diverse tasks and attack settings. The suite defines a structured taxonomy of six attack vectors spanning both user-level and environment-level manipulations. In addition, we introduce a multi-layered evaluation protocol that analyzes agent failures across three critical dimensions: internal reasoning, behavioral trajectory, and task outcome, facilitating a fine-grained risk analysis that goes far beyond simple success metrics. Using this benchmark, we conduct large-scale experiments on 9 representative LVLMs, which fall into three categories: general-purpose, agent-specialized, and GUI-grounded. Our results show that all tested agents are consistently vulnerable to subtle adversarial manipulations and reveal critical trade-offs between model specialization and security. By providing (1) a comprehensive benchmark suite with diverse environments and a multi-layered evaluation pipeline, and (2) empirical insights into the security challenges of modern LVLM-based web agents, \tool{} establishes a foundation for advancing trustworthy web agent deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SecureWebArena offers a broader benchmark for LVLM web agent security than prior prompt-only tests, but its simulated environments leave open questions about how well the vulnerability findings transfer to live sites.

read the letter

The paper's main contribution is a new benchmark suite called SecureWebArena that expands security testing for vision-language model web agents beyond narrow prompt manipulations. It includes six simulated environments like e-commerce and forums, 2,970 trajectories, a taxonomy of six attack vectors covering user and environment levels, and a three-part evaluation protocol that examines reasoning, trajectories, and task outcomes. They test nine models split into general, agent-specialized, and GUI-grounded categories and report that all show consistent vulnerabilities with some specialization-security trade-offs. This setup is new in its scope and multi-layered analysis, and it gives credit to earlier partial benchmarks while trying to fill the gaps with more realistic task diversity and attack types. The structured taxonomy and scale of the test cases are clear improvements over limited prior scenarios. The soft spots center on the simulations themselves. The environments are described as realistic but controlled, and without details on how they handle dynamic elements like live DOM changes, redirects, or rate limits, the reported failure rates may not hold up outside the lab. The abstract gives high-level outcomes but no specific numbers, error breakdowns, or statistical checks, so the strength of the vulnerability claims is difficult to judge from what's visible. The weakest link is the assumption that these six setups and vectors capture the main real-world threats. This work is aimed at researchers building or securing AI web agents and related trustworthy AI efforts. Readers who need concrete test suites for agent robustness would find the design and initial results useful as a starting point. It deserves peer review so that referees can examine the full methodology, data, and whether the simulations support the broader security conclusions.

Referee Report

1 major / 2 minor

Summary. The paper introduces SecureWebArena as the first holistic security benchmark for LVLM-based web agents. It comprises six simulated realistic web environments (e.g., e-commerce and forums), 2,970 trajectories, a taxonomy of six attack vectors covering user- and environment-level manipulations, and a multi-layered evaluation protocol assessing internal reasoning, behavioral trajectories, and task outcomes. Large-scale experiments on nine LVLMs across general-purpose, agent-specialized, and GUI-grounded categories show that all agents are vulnerable to subtle adversarial manipulations, with observed trade-offs between model specialization and security.

Significance. If the simulation fidelity and evaluation protocol hold, this benchmark fills a gap in partial existing evaluations by providing comprehensive coverage and fine-grained failure analysis, establishing a foundation for trustworthy LVLM web agent deployment. The scale (nine models, nearly three thousand trajectories) and multi-dimensional protocol are strengths that enable more nuanced risk assessment than success-rate-only metrics.

major comments (1)

[Abstract and §3 (Benchmark Construction)] The central claim that all tested agents are consistently vulnerable and that the benchmark captures broad real-world threats rests on the six simulated environments and six-vector taxonomy. However, the manuscript provides no explicit validation (e.g., comparison of observable states, DOM dynamics, or authentication redirects) that these simulations reproduce the security-relevant behaviors of live web sites; without such evidence the reported failure rates and specialization-security trade-offs may not transfer.

minor comments (2)

[Abstract] The abstract summarizes high-level outcomes but supplies no quantitative results, error bars, or statistical controls; moving at least one key table or figure summary into the abstract would strengthen immediate verifiability.
[§4 (Experiments)] Clarify whether the 2,970 trajectories include balanced coverage across all six attack vectors and environments, or whether some combinations are underrepresented.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for highlighting the importance of simulation fidelity in establishing the transferability of our benchmark results. We agree that explicit validation would strengthen the central claims regarding agent vulnerabilities and the broad applicability of the observed trade-offs. We address this comment below and will incorporate revisions to improve the manuscript.

read point-by-point responses

Referee: [Abstract and §3 (Benchmark Construction)] The central claim that all tested agents are consistently vulnerable and that the benchmark captures broad real-world threats rests on the six simulated environments and six-vector taxonomy. However, the manuscript provides no explicit validation (e.g., comparison of observable states, DOM dynamics, or authentication redirects) that these simulations reproduce the security-relevant behaviors of live web sites; without such evidence the reported failure rates and specialization-security trade-offs may not transfer.

Authors: We acknowledge that the current manuscript does not present a direct side-by-side empirical validation (such as quantitative comparisons of DOM trees, state transitions, or authentication redirect behaviors) between the simulated environments and live websites. The environments were implemented using standard web frameworks to replicate core interaction patterns and security surfaces observed in real platforms (e.g., dynamic product updates in e-commerce and threaded discussions in forums). However, we recognize that this design rationale alone is insufficient to fully address transferability concerns. In the revised manuscript, we will expand §3 with a dedicated subsection on environment construction that includes concrete examples of replicated observable states and attack surfaces. We will also add an explicit limitations paragraph discussing potential discrepancies with live deployments and how future work could incorporate real-site validation. These changes will allow readers to better evaluate the generalizability of the reported failure rates and specialization-security trade-offs. revision: yes

Circularity Check

0 steps flagged

Empirical benchmark creation and testing exhibits no circularity

full rationale

This paper introduces a new security evaluation benchmark for LVLM-based web agents, defines six simulated environments and a taxonomy of six attack vectors, generates 2,970 trajectories, and reports empirical results from testing nine existing models across three categories. The central claims consist of observed failure rates and specialization-security trade-offs derived directly from these experiments. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided text; the work is self-contained as an empirical evaluation effort without any reduction of results to self-referential inputs or self-citation chains.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The paper contributes a constructed evaluation framework rather than a derivation from first principles. It relies on domain assumptions about simulation fidelity and attack coverage instead of introducing fitted parameters or new physical entities.

free parameters (2)

Selection and design of the six web environments
Chosen to represent diverse real-world tasks such as e-commerce and community forums.
Definition of the six attack vectors in the taxonomy
Structured to span user-level and environment-level manipulations.

axioms (1)

domain assumption Simulated web environments can serve as valid proxies for evaluating real-world security risks to deployed LVLM-based agents.
Invoked to justify the relevance of the benchmark results to practical deployment.

pith-pipeline@v0.9.0 · 5855 in / 1348 out tokens · 22616 ms · 2026-05-18T08:12:15.964394+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

SecureWebArena first introduces a unified evaluation suite comprising six simulated but realistic web environments ... and a structured taxonomy of six attack vectors spanning both user-level and environment-level manipulations.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages · 15 internal anchors

[1]

Tamer Abuelsaad, Deepak Akkil, Prasenjit Dey, Ashish Jagmohan, Aditya Vem- paty, and Ravi Kokku. 2024. Agent-e: From autonomous web navigation to foun- dational design principles in agentic systems.arXiv preprint arXiv:2407.13032 (2024)

work page arXiv 2024
[2]

2025.Introducing Claude 3.7 Sonnet and Claude Code

Anthropic. 2025.Introducing Claude 3.7 Sonnet and Claude Code. https://www. anthropic.com/news/claude-3-7-sonnet Accessed: 2025-10-04

work page 2025
[3]

2025.Introducing Claude 4

Anthropic. 2025.Introducing Claude 4. https://www.anthropic.com/news/claude- 4 Accessed: 2025-10-04

work page 2025
[4]

Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. 2025. Gemini 2.5: Pushing the frontier with advanced reasoning, multi- modality, long context, and next generation agentic capabilities.arXiv preprint arXiv:2507.06261(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[5]

Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Sam Stevens, Boshi Wang, Huan Sun, and Yu Su. 2023. Mind2web: Towards a generalist agent for the web. Advances in Neural Information Processing Systems36 (2023), 28091–28114

work page 2023
[6]

Yue Deng, Wenxuan Zhang, Sinno Jialin Pan, and Lidong Bing. 2023. Multilingual jailbreak challenges in large language models.arXiv preprint arXiv:2310.06474 (2023)

work page arXiv 2023
[7]

Peng Ding, Jun Kuang, Dan Ma, Xuezhi Cao, Yunsen Xian, Jiajun Chen, and Shujian Huang. 2023. A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily.arXiv preprint arXiv:2311.08268 (2023)

work page arXiv 2023
[8]

Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori, Chuan Guo, and Kamalika Chaudhuri. 2025. Wasp: Benchmarking web agent security against prompt injection attacks.arXiv preprint arXiv:2504.18575(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[9]

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not what you’ve signed up for: Compromising real- world llm-integrated applications with indirect prompt injection. InProceedings of the 16th ACM workshop on artificial intelligence and security. 79–90

work page 2023
[10]

Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, et al

work page
[11]

Seed1.5-VL Technical Report

Seed1.5-VL Technical Report. arXiv:2505.07062 [cs.CV] https://arxiv.org/ abs/2505.07062

work page internal anchor Pith review Pith/arXiv arXiv
[12]

Jiaxing Huang, Jingyi Zhang, Kai Jiang, Han Qiu, Xiaoqin Zhang, Ling Shao, Shijian Lu, and Dacheng Tao. 2025. Visual Instruction Tuning towards General- Purpose Multimodal Large Language Model: A Survey.International Journal of Computer Vision(2025), 1–39

work page 2025
[13]

Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. 2024. Gpt-4o system card.arXiv preprint arXiv:2410.21276(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[14]

Sam Johnson, Viet Pham, and Thai Le. 2025. Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree.arXiv preprint arXiv:2507.14799(2025)

work page arXiv 2025
[15]

Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, and Daniel Fried

work page
[16]

Visualwebarena: Evaluating multimodal agents on realistic visual web tasks.arXiv preprint arXiv:2401.13649(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[17]

Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, Elaine Chang, Vaughn Robinson, Sean Hendryx, Shuyan Zhou, Matt Fredrikson, et al. 2024. Refusal-trained llms are easily jailbroken as browser agents.arXiv preprint arXiv:2410.13886(2024)

work page arXiv 2024
[18]

Hanyu Lai, Xiao Liu, Iat Long Iong, Shuntian Yao, Yuxuan Chen, Pengbo Shen, Hao Yu, Hanchen Zhang, Xiaohan Zhang, Yuxiao Dong, et al. 2024. Autowebglm: A large language model-based web navigating agent. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5295–5306

work page 2024
[19]

Ido Levy, Ben Wiesel, Sami Marreed, Alon Oved, Avi Yaeli, and Segev Shlomov

work page
[20]

St-webagentbench: A benchmark for evaluating safety and trustworthiness in web agents.arXiv preprint arXiv:2410.06703(2024)

work page arXiv 2024
[21]

Xuan Li, Zhanke Zhou, Jianing Zhu, Jiangchao Yao, Tongliang Liu, and Bo Han

work page
[22]

Deepinception: Hypnotize large language model to be jailbreaker.arXiv preprint arXiv:2311.03191(2023)

work page arXiv 2023
[23]

Aishan Liu, Jun Guo, Jiakai Wang, Siyuan Liang, Renshuai Tao, Wenbo Zhou, Cong Liu, Xianglong Liu, and Dacheng Tao. 2023. X-Adv: Physical Adversarial Object Attacks against X-ray Prohibited Item Detection. InUSENIX Security Symposium

work page 2023
[24]

Aishan Liu, Tairan Huang, Xianglong Liu, Yitao Xu, Yuqing Ma, Xinyun Chen, Stephen J Maybank, and Dacheng Tao. 2020. Spatiotemporal attacks for embodied agents. InECCV

work page 2020
[25]

Aishan Liu, Xianglong Liu, Jiaxin Fan, Yuqing Ma, Anlan Zhang, Huiyuan Xie, and Dacheng Tao. 2019. Perceptual-sensitive gan for generating adversarial patches. InAAAI

work page 2019
[26]

Aishan Liu, Xianglong Liu, Hang Yu, Chongzhi Zhang, Qiang Liu, and Dacheng Tao. 2021. Training robust deep neural networks via adversarial noise propagation. TIP(2021)

work page 2021
[27]

Aishan Liu, Jiakai Wang, Xianglong Liu, Bowen Cao, Chongzhi Zhang, and Hang Yu. 2020. Bias-based universal adversarial patch attack for automatic check-out. InECCV

work page 2020
[28]

Aishan Liu, Zonghao Ying, Le Wang, Junjie Mu, Jinyang Guo, Jiakai Wang, Yuqing Ma, Siyuan Liang, Mingchuan Zhang, Xianglong Liu, et al. 2025. AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions.arXiv preprint arXiv:2506.14697(2025)

work page arXiv 2025
[29]

Xinbei Ma, Yiting Wang, Yao Yao, Tongxin Yuan, Aston Zhang, Zhuosheng Zhang, and Hai Zhao. 2025. Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 22324–22339

work page 2025
[30]

Junjie Mu, Zonghao Ying, Zhekui Fan, Zonglei Jing, Yaoyuan Zhang, Zhengmin Yu, Wenxin Zhang, Quanchen Zou, and Xiangzheng Zhang. 2025. Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks?arXiv preprint arXiv:2509.06350(2025)

work page arXiv 2025
[31]

Liangbo Ning, Ziran Liang, Zhuohang Jiang, Haohao Qu, Yujuan Ding, Wenqi Fan, Xiao-yong Wei, Shanru Lin, Hui Liu, Philip S Yu, et al. 2025. A survey of webagents: Towards next-generation ai agents for web automation with large foundation models. InProceedings of the 31st ACM SIGKDD Conference on Knowl- edge Discovery and Data Mining V. 2. 6140–6150

work page 2025
[32]

2025.GPT-5 is here

OpenAI. 2025.GPT-5 is here. https://openai.com/gpt-5/ Accessed: 2025-10-04. Conference acronym ’XX, June 03–05, 2018, Woodstock, NY Trovato et al

work page 2025
[33]

Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, et al. 2025. UI-TARS: Pioneering Automated GUI Interaction with Native Agents.arXiv preprint arXiv:2501.12326 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[34]

Haoyi Qiu, Alexander R Fabbri, Divyansh Agarwal, Kung-Hsiang Huang, Sarah Tan, Nanyun Peng, and Chien-Sheng Wu. 2024. Evaluating cultural and social awareness of llm web agents.arXiv preprint arXiv:2410.23252(2024)

work page arXiv 2024
[35]

Yangguang Shao, Xinjie Lin, Haozheng Luo, Chengshang Hou, Gang Xiong, Jiahao Yu, and Junzheng Shi. 2025. POISONCRAFT: Practical Poisoning of Retrieval- Augmented Generation for Large Language Models. arXiv:2505.06579 [cs.CR] https://arxiv.org/abs/2505.06579

work page arXiv 2025
[36]

Settaluri Lakshmi Sravanthi, Ankit Mishra, Debjyoti Mondal, Subhadarshi Panda, Rituraj Singh, and Pushpak Bhattacharyya. 2025. From Perception to Reasoning: Enhancing Vision-Language Models for Mobile UI Understanding. InFindings of the Association for Computational Linguistics: ACL 2025. 25250–25269

work page 2025
[37]

Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. 2023. Gemini: a family of highly capable multimodal models.arXiv preprint arXiv:2312.11805(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[38]

V Team, Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, et al. [n. d.]. Glm-4.5 v and glm-4.1 v-thinking: Towards versatile multi-modal reasoning with scalable reinforcement learning, 2025.URL https://arxiv. org/abs/2507.01006([n. d.])

work page internal anchor Pith review Pith/arXiv arXiv 2025
[39]

Ada Defne Tur, Nicholas Meade, Xing Han Lù, Alejandra Zambrano, Arkil Patel, Esin Durmus, Spandana Gella, Karolina Stańczak, and Siva Reddy. 2025. Safearena: Evaluating the safety of autonomous web agents.arXiv preprint arXiv:2503.04957 (2025)

work page arXiv 2025
[40]

Haowei Wang, Junjie Wang, Xiaojun Jia, Rupeng Zhang, Mingyang Li, Zhe Liu, Yang Liu, and Qing Wang. 2025. AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery.arXiv preprint arXiv:2505.21499(2025)

work page arXiv 2025
[41]

Le Wang, Zonghao Ying, Tianyuan Zhang, Siyuan Liang, Shengshan Hu, Mingchuan Zhang, Aishan Liu, and Xianglong Liu. 2025. Manipulating Multi- modal Agents via Cross-Modal Prompt Injection.arXiv preprint arXiv:2504.14348 (2025)

work page arXiv 2025
[42]

Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, et al. 2024. Qwen2-vl: Enhancing vision-language model’s perception of the world at any resolution.arXiv preprint arXiv:2409.12191(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[43]

Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail?Advances in Neural Information Processing Systems 36 (2023), 80079–80110

work page 2023
[44]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems35 (2022), 24824–24837

work page 2022
[45]

Yisong Xiao, Aishan Liu, Tianlin Li, and Xianglong Liu. 2023. Latent imitator: Generating natural individual discriminatory instances for black-box fairness testing. InISSTA

work page 2023
[46]

Yisong Xiao, Aishan Liu, Siyuan Liang, Xianglong Liu, and Dacheng Tao. 2025. Fairness mediator: Neutralize stereotype associations to mitigate bias in large language models. InISSTA

work page 2025
[47]

Yisong Xiao, Aishan Liu, Siyuan Liang, Zonghao Ying, Xianglong Liu, and Dacheng Tao. 2025. Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing.arXiv preprint arXiv:2510.01243(2025)

work page arXiv 2025
[48]

Yisong Xiao, Aishan Liu, Tianyuan Zhang, Haotong Qin, Jinyang Guo, and Xianglong Liu. 2023. Robustmq: benchmarking robustness of quantized models. Visual Intelligence(2023)

work page 2023
[49]

Yisong Xiao, Aishan Liu, Xinwei Zhang, Tianyuan Zhang, Tianlin Li, Siyuan Liang, Xianglong Liu, Yang Liu, and Dacheng Tao. 2025. BDefects4NN: A Back- door Defect Database for Controlled Localization Studies in Neural Networks. In ICSE

work page 2025
[50]

Yisong Xiao, Xianglong Liu, QianJia Cheng, Zhenfei Yin, Siyuan Liang, Jiapeng Li, Jing Shao, Aishan Liu, and Dacheng Tao. 2025. GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing: Y. Xiao et al.International Journal of Computer Vision(2025)

work page 2025
[51]

Yiheng Xu, Zekun Wang, Junli Wang, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, and Caiming Xiong. 2024. Aguvis: Unified pure vision agents for autonomous gui interaction.arXiv preprint arXiv:2412.04454(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[52]

Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, and Jianfeng Gao

work page
[53]

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

Set-of-mark prompting unleashes extraordinary visual grounding in gpt-4v. arXiv preprint arXiv:2310.11441(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[54]

Shunyu Yao, Howard Chen, John Yang, and Karthik Narasimhan. 2022. Webshop: Towards scalable real-world web interaction with grounded language agents. Advances in Neural Information Processing Systems35 (2022), 20744–20757

work page 2022
[55]

Zonghao Ying, Aishan Liu, Siyuan Liang, Lei Huang, Jinyang Guo, Wenbo Zhou, Xianglong Liu, and Dacheng Tao. 2024. Safebench: A safety evaluation framework for multimodal large language models.arXiv preprint arXiv:2410.18927(2024)

work page arXiv 2024
[56]

Zonghao Ying, Aishan Liu, Xianglong Liu, and Dacheng Tao. 2024. Unveiling the safety of gpt-4o: An empirical study using jailbreak attacks.arXiv preprint arXiv:2406.06302(2024)

work page arXiv 2024
[57]

Zonghao Ying, Aishan Liu, Tianyuan Zhang, Zhengmin Yu, Siyuan Liang, Xiang- long Liu, and Dacheng Tao. 2025. Jailbreak vision language models via bi-modal adversarial prompt.IEEE Transactions on Information Forensics and Security (2025)

work page 2025
[58]

Zonghao Ying, Siyang Wu, Run Hao, Peng Ying, Shixuan Sun, Pengyu Chen, Junze Chen, Hao Du, Kaiwen Shen, Shangkun Wu, et al. 2025. Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025.arXiv preprint arXiv:2506.12430(2025)

work page arXiv 2025
[59]

Zonghao Ying, Deyue Zhang, Zonglei Jing, Yisong Xiao, Quanchen Zou, Aishan Liu, Siyuan Liang, Xiangzheng Zhang, Xianglong Liu, and Dacheng Tao. 2025. Reasoning-augmented conversation for multi-turn jailbreak attacks on large language models.arXiv preprint arXiv:2502.11054(2025)

work page arXiv 2025
[60]

Zonghao Ying, Guangyi Zheng, Yongxin Huang, Deyue Zhang, Wenxin Zhang, Quanchen Zou, Aishan Liu, Xianglong Liu, and Dacheng Tao. 2025. Towards understanding the safety boundaries of deepseek models: Evaluation and findings. arXiv preprint arXiv:2503.15092(2025)

work page arXiv 2025
[61]

Jiahao Yu, Yangguang Shao, Hanwen Miao, and Junzheng Shi. 2025. PROMPT- FUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs. arXiv:2409.14729 [cs.CR] https://arxiv.org/abs/2409.14729

work page arXiv 2025
[62]

Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Pinjia He, Shum- ing Shi, and Zhaopeng Tu. 2023. Gpt-4 is too smart to be safe: Stealthy chat with llms via cipher.arXiv preprint arXiv:2308.06463(2023)

work page arXiv 2023
[63]

Aohan Zeng, Xin Lv, Qinkai Zheng, Zhenyu Hou, Bin Chen, Chengxing Xie, Cunxiang Wang, Da Yin, Hao Zeng, Jiajie Zhang, et al. 2025. Glm-4.5: Agentic, reasoning, and coding (arc) foundation models.arXiv preprint arXiv:2508.06471 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[64]

Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, and Weiyan Shi

work page
[65]

InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

How johnny can persuade llms to jailbreak them: Rethinking persuasion to challenge ai safety by humanizing llms. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 14322–14350

work page
[66]

Yanzhe Zhang, Tao Yu, and Diyi Yang. 2024. Attacking vision-language computer agents via pop-ups.arXiv preprint arXiv:2411.02391(2024)

work page arXiv 2024
[67]

Shuyan Zhou, Frank F Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, et al. 2023. Webarena: A realistic web environment for building autonomous agents.arXiv preprint arXiv:2307.13854(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[68]

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J Zico Kolter, and Matt Fredrikson. 2023. Universal and transferable adversarial attacks on aligned language models.arXiv preprint arXiv:2307.15043(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[69]

PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking

Quanchen Zou, Zonghao Ying, Moyang Chen, Wenzhuo Xu, Yisong Xiao, Yakai Li, Deyue Zhang, Dongdong Yang, Zhao Liu, and Xiangzheng Zhang. 2025. PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreak- ing.arXiv preprint arXiv:2507.21540(2025). A Appendix A.1 Environment Examples Fig. 5 presents representative screenshots from the 6...

work page internal anchor Pith review Pith/arXiv arXiv 2025