
arxiv: 2510.07809 · v4 · submitted 2025-10-09 · 💻 cs.CR · cs.AI

Invisible to Humans, Triggered by Agents: Stealthy Jailbreak Attacks on Mobile Vision-Language Agents

Pith reviewed 2026-05-18 09:34 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords jailbreak attacks · vision-language models · mobile agents · visual prompt injection · AI security · perceptual injection · cross-app actions · agent safety

The pith

A difference in touch signals lets attackers jailbreak mobile vision-language agents with content humans do not notice.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports that automated agents produce almost no contact touch signals when interacting with mobile screens, unlike humans. This gap allows visual jailbreak content to be injected so that only the agent perceives and acts on it during one-shot sessions, while it stays hidden from users. The authors introduce an agent-only perceptual injection paradigm and develop HG-IDA*, a one-shot optimization method that generates prompts able to bypass LVLM safety filters under mobile UI constraints. Experiments show the method triggers unauthorized cross-app actions, with 82.5% planning and 75.0% execution hijack rates on GPT-4o. If correct, the work shows that current visual safety mechanisms miss interaction-level differences and must account for agent-specific signals to prevent stealthy hijacks.

Core claim

We uncover a consistent discrepancy between human and agent interactions where automated agents generate near-zero contact touch signals. Building on this, we propose agent-only perceptual injection, in which malicious content is exposed only during agent interactions while remaining not readily perceived by human users. To fit mobile UI constraints and one-shot settings, we introduce HG-IDA*, an efficient optimization method for constructing jailbreak prompts that evade LVLM safety filters. Experiments demonstrate that our approach induces unauthorized cross-app actions, achieving 82.5% planning and 75.0% execution hijack rates on GPT-4o.

What carries the argument

Agent-only perceptual injection, the paradigm that uses near-zero touch signals from automated agents to hide malicious visual prompts from humans while triggering agent actions.

If this is right

  • Mobile agents can be made to execute cross-app actions without leaving persistent visual traces that users would notice.
  • One-shot optimized prompts can bypass existing LVLM safety filters under realistic mobile deployment limits.
  • Security for vision-language agents must incorporate interaction-level signals such as touch data rather than relying only on visual content filters.
  • Similar perceptual gaps may allow stealthy attacks on other autonomous agent systems that process visual inputs differently from human observers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Agent designs could add simulated human touch patterns or consistency checks between human-view and agent-view renders to flag anomalies.
  • Platform providers might need to expose touch-signal metadata to safety layers so that injected content can be filtered before execution.
  • The same discrepancy could be tested in non-mobile settings such as desktop agents or robotic vision systems that share visual interfaces with humans.
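
The first extension above, a consistency check between the human-view and agent-view renders, can be sketched as follows. This is a minimal illustration, not anything described in the paper: the `RenderedView` structure and the idea that each capture has already been reduced to a set of visible text strings are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class RenderedView:
    """Text visible in one capture of a screen (hypothetical structure)."""
    source: str                 # e.g. "human" or "agent"
    text_items: FrozenSet[str]  # strings extracted from the capture


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def agent_only_text(human: RenderedView, agent: RenderedView) -> Set[str]:
    """Return text present in the agent's capture but absent from the human's."""
    human_norm = {_normalize(t) for t in human.text_items}
    return {t for t in agent.text_items if _normalize(t) not in human_norm}


# Illustrative data only: the agent capture contains one extra string.
human = RenderedView("human", frozenset({"Send", "Cancel", "Inbox"}))
agent = RenderedView(
    "agent", frozenset({"send", "Cancel", "Inbox", "hidden instruction text"})
)
flagged = agent_only_text(human, agent)
```

Any non-empty `flagged` set would be a reason to pause the agent before it acts. How the two captures are obtained and how text is extracted from them is left open here.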

Load-bearing premise

Automated agents generate near-zero contact touch signals, so malicious visual content stays imperceptible to humans during one-shot interactions.

What would settle it

Measure actual touch signal rates in deployed mobile agents during normal tasks and test whether human users can detect the injected malicious content when viewing the same screens the agent processes.
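
One way the first measurement could be set up is sketched below. The abstract does not say which touch fields the authors inspect, so the `TouchSample` fields (`pressure`, `contact_size`), their 0 to 1 scale, and the `eps` threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class TouchSample:
    """One logged touch event (hypothetical fields, normalized to 0..1)."""
    pressure: float
    contact_size: float


def near_zero_contact_rate(samples: Sequence[TouchSample], eps: float = 0.01) -> float:
    """Fraction of samples whose pressure and contact size are both <= eps."""
    if not samples:
        raise ValueError("no touch samples recorded")
    return mean(
        1.0 if s.pressure <= eps and s.contact_size <= eps else 0.0
        for s in samples
    )


# Illustrative values only, not measurements from the paper.
human_like = [TouchSample(0.42, 0.18), TouchSample(0.37, 0.21)]
injected = [TouchSample(0.0, 0.0), TouchSample(0.0, 0.0)]

human_rate = near_zero_contact_rate(human_like)
agent_rate = near_zero_contact_rate(injected)
```

Comparing the two rates across many real sessions, rather than the toy lists above, is what would show whether the claimed discrepancy is consistent.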

Original abstract

Large Vision-Language Models (LVLMs) empower autonomous mobile agents, yet their security under realistic mobile deployment constraints remains underexplored. While agents are vulnerable to visual prompt injections, stealthily executing such attacks without requiring system-level privileges remains challenging, as existing methods rely on persistent visual manipulations that are noticeable to users. We uncover a consistent discrepancy between human and agent interactions: automated agents generate near-zero contact touch signals. Building on this insight, we propose a new attack paradigm, agent-only perceptual injection, where malicious content is exposed only during agent interactions, while remaining not readily perceived by human users. To accommodate mobile UI constraints and one-shot interaction settings, we introduce HG-IDA*, an efficient one-shot optimization method for constructing jailbreak prompts that evade LVLM safety filters. Experiments demonstrate that our approach induces unauthorized cross-app actions, achieving 82.5% planning and 75.0% execution hijack rates on GPT-4o. Our findings highlight a previously underexplored attack surface in mobile agent systems and underscore the need for defenses that incorporate interaction-level signals.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a new attack paradigm termed 'agent-only perceptual injection' for stealthy jailbreaks on mobile vision-language agents. It builds on the claimed discrepancy that automated agents produce near-zero contact touch signals (unlike humans), allowing malicious UI content to evade human perception while triggering agent actions. The authors present HG-IDA*, an efficient one-shot optimization method to generate prompts that bypass LVLM safety filters under mobile UI and one-shot constraints. Experiments report 82.5% planning hijack and 75.0% execution hijack rates inducing unauthorized cross-app actions on GPT-4o.

Significance. If the core assumption and experimental claims hold after verification, the work is significant for identifying an underexplored attack surface in deployed mobile agent systems that relies on interaction modality differences rather than persistent visual changes. It could motivate defenses that incorporate touch or interaction signals and extends visual prompt injection research to agent-specific settings.

major comments (2)
  1. [Abstract] The stealth property and new paradigm rest on the unquantified assumption that 'automated agents generate near-zero contact touch signals' while malicious content remains 'not readily perceived' by humans in one-shot flows. No touch-event logs, gaze data, human detection rates, or timing/rendering details are provided to substantiate the discrepancy, which is load-bearing for distinguishing this from standard visual prompt injection.
  2. [Abstract] Abstract (experiments paragraph): The headline rates of 82.5% planning and 75.0% execution hijack on GPT-4o are presented without any mention of trial count, baseline comparisons, experimental controls, statistical error bars, or optimization robustness checks. This absence prevents assessment of whether the results support the claimed effectiveness of HG-IDA*.
minor comments (1)
  1. [Abstract] The acronym HG-IDA* is used without expansion or definition in the abstract; provide the full name and a brief description of the method on first use.
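
To illustrate why the second major comment matters, the sketch below computes a Wilson score interval for a reported rate. The trial counts are not stated in the abstract, so the values of `n` used here are purely hypothetical and chosen only to show how the uncertainty around 82.5% depends on the number of trials.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


# Hypothetical trial counts; the paper's actual counts are not given in the abstract.
intervals = {}
for n in (40, 400):
    hits = round(0.825 * n)  # number of successes that would yield 82.5%
    intervals[n] = wilson_interval(hits, n)
    low, high = intervals[n]
    print(f"n={n}: 82.5% with interval [{low:.3f}, {high:.3f}]")
```

The same headline rate is far less informative at a small trial count than at a large one, which is why the report asks for counts and error bars.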

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of clarity in the abstract regarding the core assumption and experimental reporting. We address each point below with clarifications drawn from the full manuscript and propose targeted revisions to strengthen the presentation without altering the underlying contributions.

Point-by-point responses
  1. Referee: [Abstract] The stealth property and new paradigm rest on the unquantified assumption that 'automated agents generate near-zero contact touch signals' while malicious content remains 'not readily perceived' by humans in one-shot flows. No touch-event logs, gaze data, human detection rates, or timing/rendering details are provided to substantiate the discrepancy, which is load-bearing for distinguishing this from standard visual prompt injection.

    Authors: We agree that the abstract does not quantify the discrepancy with logs or human studies. The manuscript grounds the claim in the fundamental interaction differences: human users produce measurable physical contact touch events on mobile devices, whereas automated agents operate through simulated inputs or accessibility APIs that generate near-zero contact signals by design. This modality gap enables the agent-only perceptual injection without persistent visual changes. To address the concern, we will revise the introduction to include a dedicated paragraph with illustrative agent trace examples and a clear explanation of why this distinguishes the attack from standard visual prompt injection. We will also note the practical challenges of conducting human perception studies in an attack paper. revision: yes

  2. Referee: [Abstract] The headline rates of 82.5% planning and 75.0% execution hijack on GPT-4o are presented without any mention of trial count, baseline comparisons, experimental controls, statistical error bars, or optimization robustness checks. This absence prevents assessment of whether the results support the claimed effectiveness of HG-IDA*.

    Authors: The abstract is a high-level summary of results; the full experimental details—including trial counts, baseline comparisons to existing methods, UI variation controls, robustness checks for the one-shot HG-IDA* optimizer, and statistical reporting—are provided in Section 4 (Experiments) and the appendix. We acknowledge that a brief reference to the evaluation scale would improve the abstract. We will revise the abstract's experiments paragraph to indicate that the rates are obtained from systematic multi-trial evaluations with controls and direct readers to the detailed methodology and analysis in the main text for full context. revision: yes

Circularity Check

0 steps flagged

Empirical attack rates rest on experimental measurement with no reduction to self-referential inputs

Full rationale

The paper presents an empirical security study whose central results are attack success rates (82.5% planning, 75.0% execution on GPT-4o) obtained by running the HG-IDA* optimization against real LVLM agents. The claimed discrepancy in touch signals is introduced as an observed premise that motivates the agent-only perceptual injection paradigm, yet the reported hijack percentages are produced by direct experimentation rather than any equation, fitted parameter, or self-citation that reduces the outcome to the premise by construction. No derivation chain, uniqueness theorem, or ansatz is invoked that would make the measured rates tautological with the input assumption.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review provides insufficient detail to enumerate specific free parameters or axioms; the central claim rests on the empirical observation of near-zero touch signals and the effectiveness of the proposed optimization.

invented entities (1)
  • agent-only perceptual injection · no independent evidence
    purpose: New attack paradigm that exposes malicious content only during agent interactions
    Introduced as the core contribution building on the human-agent interaction discrepancy

pith-pipeline@v0.9.0 · 5733 in / 1181 out tokens · 35573 ms · 2026-05-18T09:34:46.824255+00:00 · methodology

