GUIGuard-Bench is a new benchmark with annotated GUI screenshots that measures privacy recognition, planning fidelity under protection, and utility impact for trajectory-based GUI agents.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
Single-agent systems with tools provide the optimal performance-efficiency trade-off for small language models, outperforming base models and multi-agent setups.
The paper measures policy-carriage failures during LLM context assembly and evaluates SafeContext as a partial mitigation on Llama, Qwen, and Mistral models.
The paper analyzes security, privacy, and ethical risks in the OpenClaw AI agent system arising from its architecture, storage, tool use, and integrations, arguing these form major barriers to trustworthy adoption.
citing papers explorer
-
GUIGuard-Bench: Toward a General Evaluation for Privacy-Preserving GUI Agents
GUIGuard-Bench is a new benchmark with annotated GUI screenshots that measures privacy recognition, planning fidelity under protection, and utility impact for trajectory-based GUI agents.
-
Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms
Single-agent systems with tools provide the optimal performance-efficiency trade-off for small language models, outperforming base models and multi-agent setups.
-
Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly
The paper measures policy-carriage failures during LLM context assembly and evaluates SafeContext as a partial mitigation on Llama, Qwen, and Mistral models.
-
Security, Privacy, and Ethical Risks in OpenClaw
The paper analyzes security, privacy, and ethical risks in the OpenClaw AI agent system arising from its architecture, storage, tool use, and integrations, arguing these form major barriers to trustworthy adoption.