pith. sign in

arxiv: 2605.30534 · v1 · pith:XAJGOAQUnew · submitted 2026-05-28 · 💻 cs.CR

Strengthening Polymorphic Prompt Assembling: Dynamic Separator Generation Against Emerging Prompt Injection Attacks

Pith reviewed 2026-06-29 06:32 UTC · model grok-4.3

classification 💻 cs.CR
keywords prompt injectionLLM securitypolymorphic prompt assemblingdynamic separatorsSHA-256attack mitigationseparator leakage
0
0 comments X

The pith

Dynamic per-request separators generated via SHA-256 limit prompt injection leakage to single requests and reduce attack success rates by a factor of 2.3.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends Polymorphic Prompt Assembling by replacing a fixed pool of separators with unique pairs created on each request from domain-separated SHA-256 digests keyed on timestamp, session identifier, and nonce. This change is meant to ensure that any separator discovered by an attacker cannot be reused in future interactions, shrinking the blast radius of successful leaks. Tests against 16 injection payloads on Llama-3.3-70B-Instruct-Turbo show the dynamic mode lowers attack success rate on an obfuscated payload from 0.88 to 0.38 and drives separator leakage rate to zero on a format-breakout attack that previously succeeded at 0.467. The approach adds only microseconds of overhead and needs no model changes.

Core claim

Generating a fresh (BEGIN, END) canary pair for every assembled prompt using domain-separated SHA-256 digests keyed on timestamp, session identifier, and cryptographic nonce confines leakage exposure to that single request and thereby reduces the exploitable surface of prompt injection attacks.

What carries the argument

Dynamic per-request separator generation via domain-separated SHA-256 digests keyed on timestamp, session identifier, and nonce.

If this is right

  • Attack success rate on the M1 leetspeak-plus-urgency payload falls from 0.88 to 0.38 with non-overlapping 95% Wilson intervals.
  • Separator leakage rate on format_breakout_salad drops from 0.467 to 0.000.
  • The defense requires no fine-tuning and adds 2.7 microseconds of assembly cost per request.
  • The method remains compatible with existing PPA SDK deployments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same keyed-digest technique could be applied to other prompt-assembly or context-isolation methods that currently rely on static delimiters.
  • If the generation keys themselves become compromised, the blast-radius benefit disappears, suggesting a need for frequent key rotation.
  • Cross-model validation on DeepSeek-V4-Flash indicates the mitigation is not tied to one specific model architecture.

Load-bearing premise

Attackers without access to the timestamp, session identifier, and nonce cannot predict or forge the generated separators.

What would settle it

An attacker who never receives the per-request keys successfully predicts or reuses a separator in a later request, producing a non-zero leakage rate under the dynamic mode.

Figures

Figures reproduced from arXiv: 2605.30534 by Nima Dorzhiev, Peng Liu.

Figure 1
Figure 1. Figure 1: Static vs. Dynamic PPA: Assembly Pipeline and Leakage Blast Radius. Left panel shows static mode with pool reuse; right panel shows dynamic mode [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
read the original abstract

Polymorphic Prompt Assembling (PPA) defends LLM agents against prompt injections by randomly selecting separator pairs from a fixed pool to isolate user input from system instructions. Although effective, static pool reuse exposes a blast-radius vulnerability: once a separator leaks, it can be exploited in future requests. We propose a dynamic per-request separator generation using domain-separated SHA-256 digests keyed on the timestamp, session identifier, and cryptographic nonce. Each assembled prompt receives a unique (BEGIN, END) canary pair, thereby limiting leakage exposure to a single request. We evaluated our extension against 16 injection payloads on Llama-3.3-70B-Instruct-Turbo, with cross-model validation on DeepSeek-V4-Flash model. Against the M1 obfuscation payload (leetspeak + urgency), the dynamic mode reduces the Attack Success Rate (ASR) from 0.88 to 0.38, yielding a statistically significant 2.3 x mitigation verified by non-overlapping 95% Wilson confidence intervals. Against format_breakout_salad, static separator leakage (leak_rate = 0.467) is eliminated entirely in the dynamic mode (0.000), confirming the blast-radius reduction in practice. The implementation requires no model fine-tuning, adds 2.7 microseconds prompt-assembly overhead per request, and is backward compatible with the existing PPA SDK.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes strengthening Polymorphic Prompt Assembling (PPA) via dynamic per-request separator generation using domain-separated SHA-256 digests keyed on timestamp, session identifier, and cryptographic nonce. This produces unique (BEGIN, END) pairs per assembled prompt to limit leakage exposure to a single request. Evaluation against 16 injection payloads on Llama-3.3-70B-Instruct-Turbo (with cross-model validation on DeepSeek-V4-Flash) reports that dynamic mode reduces ASR from 0.88 to 0.38 against the M1 obfuscation payload (statistically significant 2.3x mitigation via non-overlapping 95% Wilson CIs) and eliminates leakage entirely (leak_rate from 0.467 to 0.000) for format_breakout_salad, with 2.7 microseconds added overhead and no model fine-tuning required.

Significance. If the dynamic separators remain unpredictable to attackers, the work provides a practical, low-overhead extension to PPA that directly addresses the static-pool blast-radius vulnerability with quantitative evidence including CIs and cross-model checks. The absence of fine-tuning and backward compatibility are practical strengths. Significance is reduced by the lack of adversarial testing of the core unforgeability assumption.

major comments (2)
  1. [Dynamic per-request separator generation method] Dynamic per-request separator generation method: The headline blast-radius reduction (ASR 0.88→0.38; leak_rate 0.467→0.000) rests on the claim that keyed SHA-256 digests produce separators that attackers cannot predict or forge. No experiment tests an informed adversary who knows the generation algorithm and attempts to recover the per-request (BEGIN, END) pair by influencing or guessing the timestamp/session/nonce inputs. This is load-bearing for the central security claim.
  2. [Evaluation section] Evaluation section: Specific quantitative reductions with Wilson CIs and cross-model validation are reported, but the manuscript provides neither full experimental details, raw data, nor code. This prevents verification against post-hoc selection or unstated factors affecting the ASR and leak_rate results.
minor comments (2)
  1. The abstract states the method is backward compatible with the existing PPA SDK, but no dedicated section or pseudocode describes the integration points or API changes.
  2. Consider adding a limitations subsection explicitly discussing the untested informed-adversary scenario for the dynamic generation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for stronger validation of the core security assumption and improved reproducibility. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core claims.

read point-by-point responses
  1. Referee: Dynamic per-request separator generation method: The headline blast-radius reduction (ASR 0.88→0.38; leak_rate 0.467→0.000) rests on the claim that keyed SHA-256 digests produce separators that attackers cannot predict or forge. No experiment tests an informed adversary who knows the generation algorithm and attempts to recover the per-request (BEGIN, END) pair by influencing or guessing the timestamp/session/nonce inputs. This is load-bearing for the central security claim.

    Authors: We acknowledge that the manuscript does not include direct experiments against an informed adversary who knows the separator generation algorithm. The design relies on standard cryptographic assumptions: domain-separated SHA-256 is one-way, and the per-request inputs (timestamp, session identifier, and nonce) are treated as secret. An informed adversary would still need to guess or influence these secrets to forge a valid pair, which is computationally infeasible under the threat model. We will revise the manuscript to add an explicit threat model section and a limitations paragraph discussing the unforgeability assumption and the absence of such adversarial experiments. revision: yes

  2. Referee: Evaluation section: Specific quantitative reductions with Wilson CIs and cross-model validation are reported, but the manuscript provides neither full experimental details, raw data, nor code. This prevents verification against post-hoc selection or unstated factors affecting the ASR and leak_rate results.

    Authors: We agree that additional details are needed for full reproducibility. In the revised manuscript we will expand the Evaluation section with the precise parameters used for SHA-256 domain separation, the complete list of 16 payloads with their generation methods, the exact prompting templates, and the statistical procedures for Wilson CIs. We will also add a Data Availability statement committing to release of the full source code, raw evaluation logs, and analysis scripts in a public repository upon acceptance. revision: yes

Circularity Check

0 steps flagged

No circularity; claims rest on independent empirical measurements

full rationale

The paper's central claims consist of measured reductions in ASR (0.88 to 0.38) and leak_rate (0.467 to 0.000) under fixed payloads, validated by non-overlapping Wilson intervals. The dynamic separator construction is presented as an engineering proposal whose security properties are tested directly rather than derived from equations that reduce to the paper's own fitted inputs or self-defined quantities. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked to force the result; the evaluation stands on external experimental outcomes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The construction relies on standard cryptographic assumptions for hash security with no fitted parameters, new entities, or ad-hoc axioms beyond the described method.

axioms (1)
  • standard math SHA-256 is a secure cryptographic hash function resistant to preimage and collision attacks under standard assumptions
    Invoked to ensure unpredictability of the generated separators from the keyed digest.

pith-pipeline@v0.9.1-grok · 5776 in / 1473 out tokens · 36126 ms · 2026-06-29T06:32:31.586407+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

13 extracted references · 8 canonical work pages · 3 internal anchors

  1. [1]

    The Rise and Potential of Large Language Model Based Agents: A Survey

    Z. Xi et al., “The rise and potential of large language model based agents: A survey,”arXiv:2309.07864, 2023

  2. [2]

    Ignore Previous Prompt: Attack Techniques For Language Models

    F. Perez and I. Ribeiro, “Ignore previous prompt: Attack techniques for language models,”arXiv:2211.09527, 2022

  3. [3]

    The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise

    M. Lupinacci et al., “Dark side of LLMs: Prompt injection attacks against LLM-integrated applications,”arXiv:2507.06850, 2025

  4. [4]

    To protect the LLM agent against the prompt injection attack with polymorphic prompt assembling,

    Z. Wang et al., “To protect the LLM agent against the prompt injection attack with polymorphic prompt assembling,”arXiv:2506.05739, 2025

  5. [5]

    Formalizing and benchmarking prompt injection attacks and defenses,

    Y . Liu et al., “Formalizing and benchmarking prompt injection attacks and defenses,”arXiv:2310.12815, 2024

  6. [6]

    An early categorization of prompt injection attacks on large language models,

    S. Rossi et al., “An early categorization of prompt injection attacks on large language models,”arXiv:2402.00898, 2024

  7. [7]

    Lakera Guard,

    Lakera, “Lakera Guard,” https://www.lakera.ai/lakera-guard, 2024

  8. [8]

    Azure AI Prompt Shield,

    Microsoft, “Azure AI Prompt Shield,” https://learn.microsoft.com/azure/ ai-services/content-safety, 2024

  9. [9]

    Prompt-Guard-86M,

    Meta AI, “Prompt-Guard-86M,” https://huggingface.co/meta-llama/ Prompt-Guard-86M, 2024

  10. [10]

    deberta-v3-base-prompt-injection-v2,

    ProtectAI, “deberta-v3-base-prompt-injection-v2,” https://huggingface.co/ protectai, 2024

  11. [11]

    PINT Benchmark,

    Lakera AI, “PINT Benchmark,” https://github.com/lakeraai/ pint-benchmark, 2024

  12. [12]

    SPIN: Self-supervised prompt injection,

    L. Zhou et al., “SPIN: Self-supervised prompt injection,” arXiv:2410.13236, 2024

  13. [13]

    Defense against prompt injection by leveraging attack techniques,

    Y . Chen et al., “Defense against prompt injection by leveraging attack techniques,”arXiv:2411.00459, 2025