Invisible to Humans, Triggered by Agents: Stealthy Jailbreak Attacks on Mobile Vision-Language Agents

Jun Luo; Jun Zhu; Kun He; Renhua Ding; Xiao Yang; Zhengwei Fang

arXiv: 2510.07809 · v4 · submitted 2025-10-09 · 💻 cs.CR · cs.AI

Renhua Ding, Xiao Yang, Zhengwei Fang, Jun Luo, Kun He, Jun Zhu

Pith reviewed 2026-05-18 09:34 UTC · model grok-4.3

keywords jailbreak attacks · vision-language models · mobile agents · visual prompt injection · AI security · perceptual injection · cross-app actions · agent safety

The pith

A difference in touch signals lets attackers jailbreak mobile vision-language agents with content humans do not notice.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that automated agents produce almost no contact touch signals when interacting with mobile screens, unlike humans. This gap allows visual jailbreak content to be injected so that only the agent perceives and acts on it during one-shot sessions, while it stays hidden from users. The authors introduce an agent-only perceptual injection paradigm and develop HG-IDA*, a one-shot optimization method to generate prompts that bypass LVLM safety filters under mobile UI limits. Experiments show the method triggers unauthorized cross-app actions at 82.5 percent planning and 75 percent execution hijack rates on GPT-4o. If correct, the work shows that current visual safety mechanisms miss interaction-level differences and must account for agent-specific signals to prevent stealthy hijacks.

Core claim

We uncover a consistent discrepancy between human and agent interactions where automated agents generate near-zero contact touch signals. Building on this, we propose agent-only perceptual injection, in which malicious content is exposed only during agent interactions while remaining not readily perceived by human users. To fit mobile UI constraints and one-shot settings, we introduce HG-IDA*, an efficient optimization method for constructing jailbreak prompts that evade LVLM safety filters. Experiments demonstrate that our approach induces unauthorized cross-app actions, achieving 82.5% planning and 75.0% execution hijack rates on GPT-4o.

What carries the argument

Agent-only perceptual injection, the paradigm that uses near-zero touch signals from automated agents to hide malicious visual prompts from humans while triggering agent actions.

If this is right

Mobile agents can be made to execute cross-app actions without leaving persistent visual traces that users would notice.
One-shot optimized prompts can bypass existing LVLM safety filters under realistic mobile deployment limits.
Security for vision-language agents must incorporate interaction-level signals such as touch data rather than relying only on visual content filters.
Similar perceptual gaps may allow stealthy attacks on other autonomous agent systems that process visual inputs differently from human observers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Agent designs could add simulated human touch patterns or consistency checks between human-view and agent-view renders to flag anomalies.
Platform providers might need to expose touch-signal metadata to safety layers so that injected content can be filtered before execution.
The same discrepancy could be tested in non-mobile settings such as desktop agents or robotic vision systems that share visual interfaces with humans.
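The first suggestion, a consistency check between the human-view and agent-view renders, can be sketched as a pixel-difference test. This is a minimal illustration under stated assumptions, not a method from the paper: both renders are assumed to be same-sized grayscale frames (lists of rows of 0-255 integers), and the helper names `diff_fraction` and `renders_diverge`, the tolerance `tol`, and the threshold `max_fraction` are all hypothetical.

```python
from typing import List

Frame = List[List[int]]  # hypothetical representation: grayscale pixels in [0, 255], row-major


def diff_fraction(a: Frame, b: Frame, tol: int = 8) -> float:
    """Fraction of pixels whose intensities differ by more than `tol`."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("frames must have identical dimensions")
    total = changed = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if abs(pa - pb) > tol:
                changed += 1
    return changed / total if total else 0.0


def renders_diverge(human_view: Frame, agent_view: Frame,
                    tol: int = 8, max_fraction: float = 0.01) -> bool:
    """Flag the pair of renders when more than `max_fraction` of pixels differ."""
    return diff_fraction(human_view, agent_view, tol) > max_fraction


# Usage: a uniform screen, and a copy whose top row has been replaced.
human = [[10] * 10 for _ in range(10)]
agent = [row[:] for row in human]
agent[0] = [200] * 10
print(diff_fraction(human, agent))    # 0.1
print(renders_diverge(human, agent))  # True
```

A real deployment would have to tune `tol` and `max_fraction` against legitimate UI changes, such as animations or content that updates between the two captures; the strict per-pixel comparison is used here only because it is the simplest instance of the check.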

Load-bearing premise

Automated agents generate near-zero contact touch signals, so malicious visual content gated on those signals stays imperceptible to humans during one-shot interactions.

What would settle it

Measure actual touch signal rates in deployed mobile agents during normal tasks and test whether human users can detect the injected malicious content when viewing the same screens the agent processes.
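The first half of that test, measuring touch-signal rates, amounts to comparing how often each interaction source produces near-zero contact readings. A minimal sketch, assuming a controlled study in which every logged touch event carries its true source label (for example `"agent"` or `"human"`) along with a normalized contact size and pressure; the function name, the thresholds, and the example events are hypothetical, and the near-zero rule mirrors the size-or-pressure classifier quoted from the paper further down this page.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical log entry: (source label, normalized contact size, normalized contact pressure)
TouchEvent = Tuple[str, float, float]


def near_zero_rates(events: Iterable[TouchEvent],
                    eps_size: float, eps_pressure: float) -> Dict[str, float]:
    """Per-source fraction of events whose size or pressure is at or below its threshold."""
    totals: Dict[str, int] = defaultdict(int)
    near_zero: Dict[str, int] = defaultdict(int)
    for source, size, pressure in events:
        totals[source] += 1
        if size <= eps_size or pressure <= eps_pressure:
            near_zero[source] += 1
    return {source: near_zero[source] / totals[source] for source in totals}


# Illustrative, made-up events; these are not measurements from the paper.
log = [
    ("agent", 0.0, 0.0),
    ("agent", 0.0, 0.0),
    ("human", 0.21, 0.48),
    ("human", 0.17, 0.0),
]
print(near_zero_rates(log, eps_size=0.01, eps_pressure=0.01))  # {'agent': 1.0, 'human': 0.5}
```

A large gap between the two rates would support the premise. The second half of the test, whether human users notice the injected content, needs a user study and cannot be reduced to a log computation.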

Original abstract

Large Vision-Language Models (LVLMs) empower autonomous mobile agents, yet their security under realistic mobile deployment constraints remains underexplored. While agents are vulnerable to visual prompt injections, stealthily executing such attacks without requiring system-level privileges remains challenging, as existing methods rely on persistent visual manipulations that are noticeable to users. We uncover a consistent discrepancy between human and agent interactions: automated agents generate near-zero contact touch signals. Building on this insight, we propose a new attack paradigm, agent-only perceptual injection, where malicious content is exposed only during agent interactions, while remaining not readily perceived by human users. To accommodate mobile UI constraints and one-shot interaction settings, we introduce HG-IDA*, an efficient one-shot optimization method for constructing jailbreak prompts that evade LVLM safety filters. Experiments demonstrate that our approach induces unauthorized cross-app actions, achieving 82.5% planning and 75.0% execution hijack rates on GPT-4o. Our findings highlight a previously underexplored attack surface in mobile agent systems and underscore the need for defenses that incorporate interaction-level signals.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note · private letter to a colleague

The paper's main contribution is a new framing for stealthy visual attacks on mobile agents that exploits human-agent differences in touch signals, but the supporting evidence for that stealth remains thin.

The central point is that the authors frame a practical attack on mobile vision-language agents: malicious UI content is hidden from humans while still fooling the agent. They call it agent-only perceptual injection and tie it to the observation that automated agents produce almost no contact touch signals. From there they build HG-IDA*, a one-shot optimization method meant to work under the tight constraints of mobile UIs and single interactions. The reported numbers are 82.5 percent planning hijack and 75 percent execution hijack on GPT-4o, with the attacks triggering cross-app actions without needing system privileges. That is the concrete result worth noting, and the optimization technique itself looks like a reasonable engineering step for the mobile setting.

The paper does a service by pointing out that most prior visual prompt injection work assumes persistent or obvious changes that users would spot. The new angle is the claim that the attack can stay invisible in normal one-shot flows because of the touch-signal gap, an idea that is not obviously present in the earlier literature they cite.

The soft spot is the stealth claim itself. It depends on humans not noticing the injected element, yet the abstract gives no human-subject data, no touch-event measurements, and no comparison of detection rates. Without that evidence, the attack reduces to ordinary visual injection, which existing filters already target. The experimental description also omits baselines, trial counts, and controls, so it is hard to judge how much the new method actually improves on prior approaches.

This paper is for researchers who work on agent security and mobile AI deployments. Anyone thinking about real-world attack surfaces on GPT-4o-style agents will find the numbers and the optimization method worth examining. It is not a foundational result, but the topic is timely enough that a serious referee should look at the full experiments and the human-perception measurements. I would send it to peer review with the expectation that reviewers will ask for those missing controls and human studies.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a new attack paradigm termed 'agent-only perceptual injection' for stealthy jailbreaks on mobile vision-language agents. It builds on the claimed discrepancy that automated agents produce near-zero contact touch signals (unlike humans), allowing malicious UI content to evade human perception while triggering agent actions. The authors present HG-IDA*, an efficient one-shot optimization method to generate prompts that bypass LVLM safety filters under mobile UI and one-shot constraints. Experiments report 82.5% planning hijack and 75.0% execution hijack rates inducing unauthorized cross-app actions on GPT-4o.

Significance. If the core assumption and experimental claims hold after verification, the work is significant for identifying an underexplored attack surface in deployed mobile agent systems that relies on interaction modality differences rather than persistent visual changes. It could motivate defenses that incorporate touch or interaction signals and extends visual prompt injection research to agent-specific settings.

major comments (2)

[Abstract] The stealth property and new paradigm rest on the unquantified assumption that 'automated agents generate near-zero contact touch signals' while malicious content remains 'not readily perceived' by humans in one-shot flows. No touch-event logs, gaze data, human detection rates, or timing/rendering details are provided to substantiate the discrepancy, which is load-bearing for distinguishing this from standard visual prompt injection.
[Abstract, experiments paragraph] The headline rates of 82.5% planning and 75.0% execution hijack on GPT-4o are presented without any mention of trial count, baseline comparisons, experimental controls, statistical error bars, or optimization robustness checks. This absence prevents assessment of whether the results support the claimed effectiveness of HG-IDA*.
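The error bars requested in the second comment are straightforward to attach once trial counts are reported. A minimal sketch using the Wilson score interval for a binomial proportion, a standard choice for success rates measured over a modest number of trials; the function name is hypothetical, and the counts in the example are illustrative rather than the paper's.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a success rate; z = 1.96 gives roughly 95% coverage."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie in [0, trials]")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Illustrative counts only: 8 hijacks out of 10 trials.
lo, hi = wilson_interval(8, 10)
print(f"{lo:.3f} {hi:.3f}")  # 0.490 0.943
```

With only ten trials the interval spans roughly 0.49 to 0.94, which is why the trial count behind a headline rate matters as much as the rate itself.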

minor comments (1)

[Abstract] The acronym HG-IDA* is used without expansion or definition in the abstract; provide the full name and a brief description of the method on first use.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of clarity in the abstract regarding the core assumption and experimental reporting. We address each point below with clarifications drawn from the full manuscript and propose targeted revisions to strengthen the presentation without altering the underlying contributions.

Referee: [Abstract] The stealth property and new paradigm rest on the unquantified assumption that 'automated agents generate near-zero contact touch signals' while malicious content remains 'not readily perceived' by humans in one-shot flows. No touch-event logs, gaze data, human detection rates, or timing/rendering details are provided to substantiate the discrepancy, which is load-bearing for distinguishing this from standard visual prompt injection.

Authors: We agree that the abstract does not quantify the discrepancy with logs or human studies. The manuscript grounds the claim in the fundamental interaction differences: human users produce measurable physical contact touch events on mobile devices, whereas automated agents operate through simulated inputs or accessibility APIs that generate near-zero contact signals by design. This modality gap enables the agent-only perceptual injection without persistent visual changes. To address the concern, we will revise the introduction to include a dedicated paragraph with illustrative agent trace examples and a clear explanation of why this distinguishes the attack from standard visual prompt injection. We will also note the practical challenges of conducting human perception studies in an attack paper. revision: yes
Referee: [Abstract] The headline rates of 82.5% planning and 75.0% execution hijack on GPT-4o are presented without any mention of trial count, baseline comparisons, experimental controls, statistical error bars, or optimization robustness checks. This absence prevents assessment of whether the results support the claimed effectiveness of HG-IDA*.

Authors: The abstract is a high-level summary of results; the full experimental details, including trial counts, baseline comparisons to existing methods, UI variation controls, robustness checks for the one-shot HG-IDA* optimizer, and statistical reporting, are provided in Section 4 (Experiments) and the appendix. We acknowledge that a brief reference to the evaluation scale would improve the abstract. We will revise the abstract's experiments paragraph to indicate that the rates are obtained from systematic multi-trial evaluations with controls and direct readers to the detailed methodology and analysis in the main text for full context. revision: yes

Circularity Check

0 steps flagged

Empirical attack rates rest on experimental measurement with no reduction to self-referential inputs

The paper presents an empirical security study whose central results are attack success rates (82.5% planning, 75% execution on GPT-4o) obtained by running the HG-IDA* optimization against real LVLM agents. The claimed discrepancy in touch signals is introduced as an observed premise that motivates the agent-only perceptual injection paradigm, yet the reported hijack percentages are produced by direct experimentation rather than any equation, fitted parameter, or self-citation that reduces the outcome to the premise by construction. No derivation chain, uniqueness theorem, or ansatz is invoked that would make the measured rates tautological with the input assumption.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review provides insufficient detail to enumerate specific free parameters or axioms; the central claim rests on the empirical observation of near-zero touch signals and the effectiveness of the proposed optimization.

invented entities (1)

agent-only perceptual injection (no independent evidence)
purpose: New attack paradigm that exposes malicious content only during agent interactions
Introduced as the core contribution building on the human-agent interaction discrepancy

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

We map this record to an agent-attributable event indicator via a simple classifier $e_t = f(r_t) = \begin{cases} 1, & \text{size}_t \le \epsilon_s \lor \text{pressure}_t \le \epsilon_p \\ 0, & \text{otherwise} \end{cases}$
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_injective · unclear

Relation between the paper passage and the cited Recognition theorem.

agent-attributable activation, which leverages input attribution signals to distinguish agent from human interactions
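The classifier quoted in the first passage above is a two-threshold rule applied to each touch record r_t. A minimal sketch, assuming each record exposes a normalized contact size and pressure; the `TouchRecord` type, the `agent_attributable` function, and the threshold values in the example are hypothetical stand-ins (`eps_s` and `eps_p` play the roles of ϵs and ϵp, which the passage does not specify).

```python
from dataclasses import dataclass


@dataclass
class TouchRecord:
    size: float      # hypothetical: normalized contact size reported for the touch
    pressure: float  # hypothetical: normalized contact pressure reported for the touch


def agent_attributable(record: TouchRecord, eps_s: float, eps_p: float) -> int:
    """e_t = f(r_t): 1 if size <= eps_s or pressure <= eps_p, otherwise 0."""
    return 1 if record.size <= eps_s or record.pressure <= eps_p else 0


# Illustrative records and thresholds, not values from the paper.
records = [TouchRecord(0.0, 0.0), TouchRecord(0.12, 0.45), TouchRecord(0.30, 0.0)]
print([agent_attributable(r, eps_s=0.01, eps_p=0.01) for r in records])  # [1, 0, 1]
```

Because the rule is a disjunction, a near-zero reading on either channel is enough to flag an event, so the third record counts as agent-attributable even though its contact size is large.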

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.