Invisible to Humans, Triggered by Agents: Stealthy Jailbreak Attacks on Mobile Vision-Language Agents
Pith reviewed 2026-05-18 09:34 UTC · model grok-4.3
The pith
A difference in touch signals lets attackers jailbreak mobile vision-language agents with content humans do not notice.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We uncover a consistent discrepancy between human and agent interactions where automated agents generate near-zero contact touch signals. Building on this, we propose agent-only perceptual injection, in which malicious content is exposed only during agent interactions while remaining not readily perceived by human users. To fit mobile UI constraints and one-shot settings, we introduce HG-IDA*, an efficient optimization method for constructing jailbreak prompts that evade LVLM safety filters. Experiments demonstrate that our approach induces unauthorized cross-app actions, achieving 82.5% planning and 75.0% execution hijack rates on GPT-4o.
What carries the argument
Agent-only perceptual injection, the paradigm that uses near-zero touch signals from automated agents to hide malicious visual prompts from humans while triggering agent actions.
If this is right
- Mobile agents can be made to execute cross-app actions without leaving persistent visual traces that users would notice.
- One-shot optimized prompts can bypass existing LVLM safety filters under realistic mobile deployment limits.
- Security for vision-language agents must incorporate interaction-level signals such as touch data rather than relying only on visual content filters.
- Similar perceptual gaps may allow stealthy attacks on other autonomous agent systems that process visual inputs differently from human observers.
Where Pith is reading between the lines
- Agent designs could add simulated human touch patterns or consistency checks between human-view and agent-view renders to flag anomalies.
- Platform providers might need to expose touch-signal metadata to safety layers so that injected content can be filtered before execution.
- The same discrepancy could be tested in non-mobile settings such as desktop agents or robotic vision systems that share visual interfaces with humans.
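The consistency check suggested in the first bullet above could be sketched as a simple comparison of two renders of the same screen. This is only an illustration under stated assumptions: screenshots are modeled as equal-sized grids of pixel values, and the function names and the `max_fraction` threshold are hypothetical rather than anything described in the paper.

```python
def differing_fraction(human_view, agent_view):
    """Fraction of pixel positions whose values differ between two renders."""
    same_shape = len(human_view) == len(agent_view) and all(
        len(h_row) == len(a_row) for h_row, a_row in zip(human_view, agent_view)
    )
    if not same_shape:
        raise ValueError("renders must have identical dimensions")
    total = sum(len(row) for row in human_view)
    if total == 0:
        return 0.0
    differing = sum(
        1
        for h_row, a_row in zip(human_view, agent_view)
        for h, a in zip(h_row, a_row)
        if h != a
    )
    return differing / total


def flag_anomaly(human_view, agent_view, max_fraction=0.01):
    """Flag the screen when the two renders diverge beyond a hypothetical tolerance."""
    return differing_fraction(human_view, agent_view) > max_fraction


# Toy 2x2 grayscale renders: the agent-side render has one extra bright pixel.
human_render = [[0, 0], [0, 0]]
agent_render = [[0, 0], [0, 255]]
print(flag_anomaly(human_render, agent_render))
```

A real check would need to tolerate benign differences such as animations or clock updates, so the tolerance would have to be tuned rather than fixed.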
Load-bearing premise
Automated agents generate near-zero contact touch signals, so malicious visual content stays imperceptible to humans during one-shot interactions.
What would settle it
Measure actual touch signal rates in deployed mobile agents during normal tasks and test whether human users can detect the injected malicious content when viewing the same screens the agent processes.
Original abstract
Large Vision-Language Models (LVLMs) empower autonomous mobile agents, yet their security under realistic mobile deployment constraints remains underexplored. While agents are vulnerable to visual prompt injections, stealthily executing such attacks without requiring system-level privileges remains challenging, as existing methods rely on persistent visual manipulations that are noticeable to users. We uncover a consistent discrepancy between human and agent interactions: automated agents generate near-zero contact touch signals. Building on this insight, we propose a new attack paradigm, agent-only perceptual injection, where malicious content is exposed only during agent interactions, while remaining not readily perceived by human users. To accommodate mobile UI constraints and one-shot interaction settings, we introduce HG-IDA*, an efficient one-shot optimization method for constructing jailbreak prompts that evade LVLM safety filters. Experiments demonstrate that our approach induces unauthorized cross-app actions, achieving 82.5% planning and 75.0% execution hijack rates on GPT-4o. Our findings highlight a previously underexplored attack surface in mobile agent systems and underscore the need for defenses that incorporate interaction-level signals.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a new attack paradigm termed 'agent-only perceptual injection' for stealthy jailbreaks on mobile vision-language agents. It builds on the claimed discrepancy that automated agents produce near-zero contact touch signals (unlike humans), allowing malicious UI content to evade human perception while triggering agent actions. The authors present HG-IDA*, an efficient one-shot optimization method to generate prompts that bypass LVLM safety filters under mobile UI and one-shot constraints. Experiments report 82.5% planning hijack and 75.0% execution hijack rates inducing unauthorized cross-app actions on GPT-4o.
Significance. If the core assumption and experimental claims hold after verification, the work is significant for identifying an underexplored attack surface in deployed mobile agent systems that relies on interaction modality differences rather than persistent visual changes. It could motivate defenses that incorporate touch or interaction signals and extends visual prompt injection research to agent-specific settings.
major comments (2)
- [Abstract] The stealth property and new paradigm rest on the unquantified assumption that 'automated agents generate near-zero contact touch signals' while malicious content remains 'not readily perceived' by humans in one-shot flows. No touch-event logs, gaze data, human detection rates, or timing/rendering details are provided to substantiate the discrepancy, which is load-bearing for distinguishing this from standard visual prompt injection.
- [Abstract, experiments paragraph] The headline rates of 82.5% planning and 75.0% execution hijack on GPT-4o are presented without any mention of trial count, baseline comparisons, experimental controls, statistical error bars, or optimization robustness checks. This absence prevents assessment of whether the results support the claimed effectiveness of HG-IDA*.
minor comments (1)
- [Abstract] The acronym HG-IDA* is used without expansion or definition in the abstract; provide the full name and a brief description of the method on first use.
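To make the second major comment concrete, the sketch below shows how the uncertainty around a reported rate depends on the trial count. It uses the standard Wilson score interval. The trial counts are hypothetical and are not taken from the paper, which is exactly the information the referee notes is missing from the abstract.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical trial counts, NOT reported in the abstract; illustration only.
for n in (40, 200, 1000):
    k = round(0.825 * n)  # number of successes that would give an 82.5% rate
    low, high = wilson_interval(k, n)
    print(f"n={n}: 82.5% rate, interval [{low:.3f}, {high:.3f}]")
```

The interval narrows as the hypothetical trial count grows, which is why the same headline percentage can carry very different evidential weight.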
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of clarity in the abstract regarding the core assumption and experimental reporting. We address each point below with clarifications drawn from the full manuscript and propose targeted revisions to strengthen the presentation without altering the underlying contributions.
Point-by-point responses
- Referee: [Abstract] The stealth property and new paradigm rest on the unquantified assumption that 'automated agents generate near-zero contact touch signals' while malicious content remains 'not readily perceived' by humans in one-shot flows. No touch-event logs, gaze data, human detection rates, or timing/rendering details are provided to substantiate the discrepancy, which is load-bearing for distinguishing this from standard visual prompt injection.
  Authors: We agree that the abstract does not quantify the discrepancy with logs or human studies. The manuscript grounds the claim in the fundamental interaction differences: human users produce measurable physical contact touch events on mobile devices, whereas automated agents operate through simulated inputs or accessibility APIs that generate near-zero contact signals by design. This modality gap enables the agent-only perceptual injection without persistent visual changes. To address the concern, we will revise the introduction to include a dedicated paragraph with illustrative agent trace examples and a clear explanation of why this distinguishes the attack from standard visual prompt injection. We will also note the practical challenges of conducting human perception studies in an attack paper.
  Revision: yes
- Referee: [Abstract] The headline rates of 82.5% planning and 75.0% execution hijack on GPT-4o are presented without any mention of trial count, baseline comparisons, experimental controls, statistical error bars, or optimization robustness checks. This absence prevents assessment of whether the results support the claimed effectiveness of HG-IDA*.
  Authors: The abstract is a high-level summary of results; the full experimental details (including trial counts, baseline comparisons to existing methods, UI variation controls, robustness checks for the one-shot HG-IDA* optimizer, and statistical reporting) are provided in Section 4 (Experiments) and the appendix. We acknowledge that a brief reference to the evaluation scale would improve the abstract. We will revise the abstract's experiments paragraph to indicate that the rates are obtained from systematic multi-trial evaluations with controls and direct readers to the detailed methodology and analysis in the main text for full context.
  Revision: yes
Circularity Check
Empirical attack rates rest on experimental measurement with no reduction to self-referential inputs
Full rationale
The paper presents an empirical security study whose central results are attack success rates (82.5% planning, 75.0% execution on GPT-4o) obtained by running the HG-IDA* optimization against real LVLM agents. The claimed discrepancy in touch signals is introduced as an observed premise that motivates the agent-only perceptual injection paradigm, yet the reported hijack percentages are produced by direct experimentation rather than by any equation, fitted parameter, or self-citation that reduces the outcome to the premise by construction. No derivation chain, uniqueness theorem, or ansatz is invoked that would make the measured rates tautological with the input assumption.
Axiom & Free-Parameter Ledger
invented entities (1)
- agent-only perceptual injection: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, absolute_floor_iff_bare_distinguishability
  Tag: echoes. This paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Paper passage: We map this record to an agent-attributable event indicator via a simple classifier
  $$e_t = f(r_t) = \begin{cases} 1, & \text{size}_t \le \epsilon_s \ \lor\ \text{pressure}_t \le \epsilon_p \\ 0, & \text{otherwise} \end{cases}$$
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, embed_injective
  Tag: unclear. Relation between the paper passage and the cited Recognition theorem.
  Paper passage: agent-attributable activation, which leverages input attribution signals to distinguish agent from human interactions
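The threshold classifier quoted in the first passage above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `TouchRecord` type, the field names, and the threshold values are assumptions, since the quoted passage names the thresholds only symbolically (ϵs for size, ϵp for pressure).

```python
from dataclasses import dataclass

# Hypothetical threshold values; the quoted passage does not give numbers.
EPS_SIZE = 0.01
EPS_PRESSURE = 0.01


@dataclass(frozen=True)
class TouchRecord:
    """One touch event record (hypothetical structure for illustration)."""
    size: float      # reported contact size of the touch
    pressure: float  # reported contact pressure of the touch


def agent_attributable(record, eps_size=EPS_SIZE, eps_pressure=EPS_PRESSURE):
    """Return 1 if either signal is at or below its threshold, else 0."""
    if record.size <= eps_size or record.pressure <= eps_pressure:
        return 1
    return 0


# Illustrative values only: a physical touch versus a synthetic, zero-contact one.
human_like = TouchRecord(size=0.12, pressure=0.45)
agent_like = TouchRecord(size=0.0, pressure=0.0)
print(agent_attributable(human_like), agent_attributable(agent_like))
```

Because the rule is a disjunction, a single near-zero signal is enough to mark an event as agent-attributable, so the false-positive rate on light human touches depends directly on how the thresholds are chosen.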
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.