Pith Number

pith:U4LWUM4H

pith:2025:U4LWUM4HR6F4JRSRGMW53APVDW

not attested · not anchored · not stored · refs resolved

WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks

Aaron Grattafiori, Arman Zharmagambetov, Chuan Guo, Ivan Evtimov, Kamalika Chaudhuri

The WASP benchmark shows that top web agents are deceived by simple prompt injections, with attacks partially succeeding in up to 86 percent of cases.

arxiv:2504.18575 v3 · 2025-04-22 · cs.CR · cs.AI

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{U4LWUM4HR6F4JRSRGMW53APVDW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Evaluating with WASP shows that even top-tier AI models, including those with advanced reasoning capabilities, can be deceived by simple, low-effort human-written injections in very realistic scenarios. Our end-to-end evaluation reveals a previously unobserved insight: while attacks partially succeed in up to 86% of cases, even state-of-the-art agents often struggle to fully complete the attacker's goals.

C2 weakest assumption

The benchmark tasks and injection examples accurately represent real-world web agent usage and attacker capabilities without over-simplifying or granting attackers unrealistic control.

C3 one-line summary

The WASP benchmark reveals that web agents are vulnerable to simple prompt injections, with partial success rates up to 86%, but agents frequently fail to fully complete the attacker's objectives.

References

22 extracted · 22 resolved · 1 Pith anchor

Cited by

26 papers in Pith

A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?

LLM Agents Are the Antidote to Walled Gardens

Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents

LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection

LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI Agents

Receipt and verification

First computed	2026-05-17T23:38:49.957466Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

a7176a33878f8bc4c651332ddd81f51d85f305df7dfc3bbd3e42fc27d184197b

Aliases

arxiv: 2504.18575 · arxiv_version: 2504.18575v3 · doi: 10.48550/arxiv.2504.18575 · pith_short_12: U4LWUM4HR6F4 · pith_short_16: U4LWUM4HR6F4JRSR · pith_short_8: U4LWUM4H
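
The short aliases above are plain prefixes of the 26-character Pith Number. The number itself appears to be the leading characters of the base32 encoding of the canonical hash. That relationship is inferred from the values on this page rather than taken from a published specification, so the following Python sketch should be read as an assumption; `pith_from_hash` and `short_alias` are hypothetical helper names.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "a7176a33878f8bc4c651332ddd81f51d85f305df7dfc3bbd3e42fc27d184197b"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed derivation: RFC 4648 base32 of the raw digest, truncated to `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_alias(pith: str, length: int) -> str:
    """Short aliases are prefixes of the full Pith Number."""
    return pith[:length]


pith = pith_from_hash(CANONICAL_HASH)
print(pith)  # U4LWUM4HR6F4JRSRGMW53APVDW
for n in (8, 12, 16):
    print(f"pith_short_{n}: {short_alias(pith, n)}")
```

Under this assumption, the 8-, 12-, and 16-character outputs match the `pith_short_*` aliases listed above.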

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/U4LWUM4HR6F4JRSRGMW53APVDW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a7176a33878f8bc4c651332ddd81f51d85f305df7dfc3bbd3e42fc27d184197b
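
The same check can be run offline against a saved copy of the record. The sketch below mirrors the serialization used in the one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes). `canonical_hash` is a hypothetical helper name, and the embedded record is copied from the Canonical record JSON shown on this page.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 over the record serialized with sorted keys and compact separators."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "7195a4ebdc15e95bc2fac8acabd99815e08a6520136f3539a878f772e241b857",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.CR",
        "submitted_at": "2025-04-22T17:51:03Z",
        "title_canon_sha256": "d82c1e88660f2ab70e699549c3e5c7b3f0c0ff29856e788c2914471738702a6d",
    },
    "schema_version": "1.0",
    "source": {"id": "2504.18575", "kind": "arxiv", "version": 3},
}

digest = canonical_hash(record)
print(digest)  # compare with the canonical hash listed on this page
```

Because keys are sorted before hashing, the order in which fields appear in the saved file does not affect the digest.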

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "7195a4ebdc15e95bc2fac8acabd99815e08a6520136f3539a878f772e241b857",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2025-04-22T17:51:03Z",
    "title_canon_sha256": "d82c1e88660f2ab70e699549c3e5c7b3f0c0ff29856e788c2914471738702a6d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.18575",
    "kind": "arxiv",
    "version": 3
  }
}