Jailbreak-induced performance loss shrinks as model capability grows, with the strongest models showing almost no degradation on benchmarks.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Evidence-carrying multimodal agents decompose tool calls into predicates, obtain certificates from DOM/OCR/AX verifiers, and use a deterministic gate to authorize actions only when certificates support them, achieving zero unsafe executions in tested tasks.
citing papers explorer
-
Jailbroken Frontier Models Retain Their Capabilities
Jailbreak-induced performance loss shrinks as model capability grows, with the strongest models showing almost no degradation on benchmarks.
-
Hallucination as Exploit: Evidence-Carrying Multimodal Agents
Evidence-carrying multimodal agents decompose tool calls into predicates, obtain certificates from DOM/OCR/AX verifiers, and use a deterministic gate to authorize actions only when certificates support them, achieving zero unsafe executions in tested tasks.