WebGuard: Building a Generalizable Guardrail for Web Agents

· 2025 · arXiv 2507.14293

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

representative citing papers

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

cs.CR · 2026-04-28 · unverdicted · novelty 6.0

SnapGuard detects prompt injection attacks on screenshot-based web agents via visual stability indicators and contrast-polarity textual signals, reaching F1 0.75 while running 8x faster than GPT-4o with no added memory cost.

From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails

cs.AI · 2025-10-15 · unverdicted · novelty 6.0

Control-theoretic guardrails enable proactive correction of risky LLM agent actions in latent space, preventing catastrophes like collisions or bankruptcy while preserving task performance in simulated environments.

Beyond Task Success: An Evidence-Synthesis Framework for Evaluating, Governing, and Orchestrating Agentic AI

cs.SE · 2026-04-18 · unverdicted · novelty 5.0

Agentic AI evaluation and governance lack mechanisms to bind obligations to actions and prove compliance at runtime; a new synthesis framework with ODTA criteria and action-evidence bundles addresses this closure gap.

From Governance Norms to Enforceable Controls: A Layered Translation Method for Runtime Guardrails in Agentic AI

cs.AI · 2026-04-06 · unverdicted · novelty 4.0

The paper presents a layered method to translate governance objectives from standards such as ISO/IEC 42001 into four control layers for agentic AI, with runtime guardrails limited to observable, determinate, and time-sensitive controls, shown via a procurement-agent case study.

citing papers explorer

Showing 4 of 4 citing papers.

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents cs.CR · 2026-04-28 · unverdicted · none · ref 58
SnapGuard detects prompt injection attacks on screenshot-based web agents via visual stability indicators and contrast-polarity textual signals, reaching F1 0.75 while running 8x faster than GPT-4o with no added memory cost.
From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails cs.AI · 2025-10-15 · unverdicted · none · ref 36
Control-theoretic guardrails enable proactive correction of risky LLM agent actions in latent space, preventing catastrophes like collisions or bankruptcy while preserving task performance in simulated environments.
Beyond Task Success: An Evidence-Synthesis Framework for Evaluating, Governing, and Orchestrating Agentic AI cs.SE · 2026-04-18 · unverdicted · none · ref 11
Agentic AI evaluation and governance lack mechanisms to bind obligations to actions and prove compliance at runtime; a new synthesis framework with ODTA criteria and action-evidence bundles addresses this closure gap.
From Governance Norms to Enforceable Controls: A Layered Translation Method for Runtime Guardrails in Agentic AI cs.AI · 2026-04-06 · unverdicted · none · ref 22
The paper presents a layered method to translate governance objectives from standards such as ISO/IEC 42001 into four control layers for agentic AI, with runtime guardrails limited to observable, determinate, and time-sensitive controls, shown via a procurement-agent case study.

WebGuard: Building a Generalizable Guardrail for Web Agents

fields

years

verdicts

representative citing papers

citing papers explorer