WAInjectBench: Benchmarking prompt injection detections for web agents

Reports 42 · 2024 · arXiv 2510.01354

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · dataset: 1

citation-polarity summary

background: 2 · use dataset: 1

representative citing papers

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening

cs.CR · 2026-05-27 · unverdicted · novelty 8.0

Roughly 1% of real resumes contain hidden prompt injections against LLM screeners, prevalence has risen over 1-2 years, and over 90% avoid explicit instructions.

Formal Security Analysis of Agent Protocol Composition

cs.CR · 2026-06-27 · unverdicted · novelty 7.0

AgentThread analyzes five agent protocols with formal TLA+ invariants and SDK tests, reporting 35 specification findings, 80 implementation tests, 30 composition-only failures, and a cross-protocol responsibility gap in security enforcement.

Same-Origin Policy for Agentic Browsers

cs.CR · 2026-06-12 · unverdicted · novelty 7.0

The paper builds SOPBench showing frequent SOP violations in agentic browsers and introduces SOPGuard to enforce the policy with low overhead in BrowserOS.

Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming

cs.CR · 2026-06-03 · unverdicted · novelty 7.0

Frontier browser agents show strong resistance to hand-crafted multi-step prompt injections (0/140 success), unlike coding agents (up to 100%), indicating domain-conditioned safety and that prior high ASR reports may not generalize.

Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

cs.CR · 2026-05-29 · unverdicted · novelty 6.0

SCOUT adaptively allocates heterogeneous prompt-injection detectors via pre-hoc reliability prediction, cutting attack success by 46% and wall-clock time by 40% versus always-on GPT-4o on the new SCOUT-450 benchmark, at a modest utility cost and with transfer to other benchmarks.

Web Agents Should Adopt the Plan-Then-Execute Paradigm

cs.CR · 2026-05-14 · unverdicted · novelty 6.0

Web agents should default to planning a complete task program before observing live web content to reduce prompt injection exposure, since WebArena tasks are compatible and 80% need no runtime LLM calls.

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

cs.CR · 2026-04-28 · unverdicted · novelty 6.0

SnapGuard detects prompt injection attacks on screenshot-based web agents via visual stability indicators and contrast-polarity textual signals, reaching F1 0.75 while running 8x faster than GPT-4o with no added memory cost.

Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability

cs.CL · 2026-05-08 · unverdicted · novelty 4.0

The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment interventions.

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

cs.CR · 2026-06-09 · unverdicted · novelty 3.0

A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.

citing papers explorer

Showing 9 of 9 citing papers.

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening
cs.CR · 2026-05-27 · unverdicted · none · ref 24

Formal Security Analysis of Agent Protocol Composition
cs.CR · 2026-06-27 · unverdicted · none · ref 42

Same-Origin Policy for Agentic Browsers
cs.CR · 2026-06-12 · unverdicted · none · ref 11

Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming
cs.CR · 2026-06-03 · unverdicted · none · ref 13

Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense
cs.CR · 2026-05-29 · unverdicted · none · ref 4

Web Agents Should Adopt the Plan-Then-Execute Paradigm
cs.CR · 2026-05-14 · unverdicted · none · ref 16

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents
cs.CR · 2026-04-28 · unverdicted · none · ref 23

Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability
cs.CL · 2026-05-08 · unverdicted · none · ref 42

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation
cs.CR · 2026-06-09 · unverdicted · none · ref 109
