
arxiv: 2603.11875 · v2 · submitted 2026-03-12 · 💻 cs.CR · cs.AI

The Mirror Design Pattern: Strict Data Geometry over Model Scale for Prompt Injection Detection

Pith reviewed 2026-05-15 12:24 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords prompt injection · data geometry · mirror design pattern · linear SVM · security screening · L1 detection · character n-gram · adversarial robustness

The pith

A 32-cell mirror topology of matched positive and negative data cells lets a linear SVM detect prompt injections at 96% recall while a 22-million-parameter model reaches only 44%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that first-line prompt injection screening benefits more from strict data geometry than from larger neural models. By organizing corpora into a symmetric 32-cell structure of matched positive and negative examples, a classifier is forced to learn control-plane attack mechanics instead of incidental corpus patterns. Using only public data, the authors train a sparse character n-gram linear SVM, compile it to a static binary, and report 95.97% recall and 92.07% F1 on a 524-case holdout at sub-millisecond latency. The same holdout yields 44.35% recall for a 22-million-parameter model at substantially higher latency. The work positions this approach as an auditable, deterministic L1 filter that leaves residual semantic questions for downstream layers.
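To make the "sparse character n-gram linear model" idea concrete, here is a minimal sketch of how such a detector could score a request once its weights are frozen. The n-gram lengths (3 to 5), the toy weights, the bias, and the function names are all illustrative assumptions, not values or identifiers from the paper, and the paper's artifact is compiled to Rust rather than written in Python.

```python
from collections import Counter

def char_ngrams(text, n_min=3, n_max=5):
    """Count overlapping character n-grams of lengths n_min..n_max (assumed range)."""
    text = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

def linear_score(text, weights, bias):
    """Sparse dot product w.x + b, touching only n-grams present in the text."""
    feats = char_ngrams(text)
    return bias + sum(weights.get(gram, 0.0) * count for gram, count in feats.items())

def is_injection(text, weights, bias):
    """Flag the text when the linear decision value is positive."""
    return linear_score(text, weights, bias) > 0.0

# Hypothetical toy weights for illustration only; NOT taken from the paper.
TOY_WEIGHTS = {"ignor": 1.5, "prior": 0.5, "instr": 0.7}
TOY_BIAS = -1.0
```

Because the decision is a fixed weighted sum over substrings, the same input always yields the same score, and each contributing n-gram can be listed for audit. That is the sense in which a filter of this kind can be deterministic and non-promptable.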

Core claim

The Mirror design pattern organizes prompt injection corpora into a 32-cell topology of matched positive and negative cells so that a classifier learns control-plane attack mechanics rather than incidental corpus shortcuts; when 31 cells are filled with 5,000 strictly curated public samples and a sparse character n-gram linear SVM is trained on the resulting geometry, the model achieves 95.97% recall and 92.07% F1 on a 524-case holdout at sub-millisecond latency, outperforming a 22-million-parameter neural detector that reaches only 44.35% recall on the identical set.

What carries the argument

The 32-cell mirror topology, a symmetric grid of matched positive and negative cells that enforces geometric constraints on the training distribution so the learner must capture attack mechanics.

If this is right

  • A sparse linear SVM compiled to a static Rust binary runs at sub-millisecond latency with no external model dependencies.
  • The same holdout set shows the 22-million-parameter model achieving only 44.35% recall and 59.14% F1 at 49 ms median latency.
  • Linear models still leave use-versus-mention ambiguities and other semantic edge cases for later pipeline stages.
  • The approach requires only public data under a strict validity contract and produces an auditable, deterministic artifact.
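The summary reports recall and F1 but not precision. Since F1 is the harmonic mean of precision and recall, precision can be recovered from the two reported figures. The sketch below does that arithmetic; the function name is hypothetical, and because the inputs are rounded to two decimals the outputs are approximate.

```python
def precision_from_recall_f1(recall, f1):
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    denom = 2.0 * recall - f1
    if denom <= 0:
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denom

# Reported figures from the summary above (as fractions).
svm_precision = precision_from_recall_f1(0.9597, 0.9207)
pg2_precision = precision_from_recall_f1(0.4435, 0.5914)

print(f"Implied SVM precision:            {svm_precision:.4f}")
print(f"Implied Prompt Guard 2 precision: {pg2_precision:.4f}")
```

Under these rounded inputs both implied precisions land between 88% and 89%, which would suggest the reported gap between the two detectors sits almost entirely in recall. The paper's own confusion matrix, if available, should take priority over this back-calculation.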

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same cell-matching discipline could be applied to other L1 security filters such as jailbreak or policy-violation screening without requiring large models.
  • If the mirror structure generalizes, organizations could maintain high-performance detectors by curating public data cells rather than scaling model size.
  • A direct test would be to retrain the identical SVM on a non-mirrored corpus of equal size and measure the drop in recall on the original holdout.

Load-bearing premise

That the 524-case holdout is representative and contains none of the incidental shortcuts that the mirror topology deliberately excludes from the training cells.

What would settle it

A fresh 524-case test set constructed by sampling without regard to the 32-cell mirror structure, on which the compiled SVM recall drops below 80%.
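The falsification test above reduces to a recall computation against a fixed threshold. A minimal sketch follows, assuming binary labels (1 for injection, 0 for benign) and hypothetical function names; the 0.80 default mirrors the 80% figure stated above.

```python
def recall(y_true, y_pred):
    """Fraction of true injections that the detector flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp + fn == 0:
        raise ValueError("test set contains no positive cases")
    return tp / (tp + fn)

def claim_survives(y_true, y_pred, threshold=0.80):
    """True when recall on the fresh set stays at or above the threshold."""
    return recall(y_true, y_pred) >= threshold
```

For example, `claim_survives([1, 1, 1, 1, 0], [1, 1, 1, 0, 0])` evaluates a detector that catches three of four injections, giving 75% recall, so the check fails.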

Original abstract

Prompt injection defenses are often framed as semantic understanding problems and delegated to increasingly large neural detectors. For the first screening layer, however, the requirements are different: the detector runs on every request and therefore must be fast, deterministic, non-promptable, and auditable. We introduce Mirror, a data-curation design pattern that organizes prompt injection corpora into matched positive and negative cells so that a classifier learns control-plane attack mechanics rather than incidental corpus shortcuts. Using 5,000 strictly curated open-source samples (the largest corpus supportable under our public-data validity contract), we define a 32-cell mirror topology, fill 31 of those cells with public data, train a sparse character n-gram linear SVM, compile its weights into a static Rust artifact, and obtain 95.97% recall and 92.07% F1 on a 524-case holdout at sub-millisecond latency with no external model runtime dependencies. On the same holdout, our next line of defense, a 22-million-parameter Prompt Guard 2 model reaches 44.35% recall and 59.14% F1 at 49 ms median and 324 ms p95 latency. Linear models still leave residual semantic ambiguities such as use-versus-mention for later pipeline layers, but within that scope our results show that for L1 prompt injection screening, strict data geometry can matter more than model scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the Mirror design pattern for organizing prompt injection corpora into a 32-cell topology of matched positive and negative cells. Using 5,000 strictly curated public samples to populate 31 cells, it trains a sparse character n-gram linear SVM, compiles it to a static Rust artifact, and reports 95.97% recall and 92.07% F1 on a 524-case holdout at sub-millisecond latency. On the same holdout the 22M-parameter Prompt Guard 2 model achieves only 44.35% recall and 59.14% F1 at 49 ms median latency. The central claim is that strict data geometry can matter more than model scale for L1 prompt-injection screening.

Significance. If the topology demonstrably blocks corpus shortcuts while preserving attack mechanics, the result would be significant for production L1 defenses: it supplies a fast, deterministic, auditable, non-promptable baseline that can be compiled to a static binary with no runtime model dependencies. This would support hybrid pipelines in which geometric classifiers handle the common case and larger models are reserved for residual semantic ambiguities.

major comments (2)
  1. [Abstract] Abstract: the 32-cell mirror topology is asserted to force the SVM to learn control-plane attack mechanics rather than incidental corpus shortcuts, yet the manuscript supplies no definition of cell boundaries, no positive-negative pairing rules, and no assignment criteria. Because the performance gap (95.97% vs 44.35% recall) is attributed to this geometry, the absence of these specifications leaves the central claim unsupported by inspectable evidence.
  2. [Abstract] Abstract (comparison paragraph): the 22M-parameter model is evaluated on the identical 524-case holdout, but the text does not state whether that model was trained or fine-tuned on equivalently curated data or on the same public corpus. Without this information the reported advantage cannot be isolated to the mirror geometry versus differences in training-data organization.
minor comments (1)
  1. [Abstract] The latency figures mix 'sub-millisecond' with '49 ms median and 324 ms p95'; reporting both mean and standard deviation for the SVM and consistent units throughout would improve clarity.
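The consistent latency reporting requested here could be produced with a small summary routine like the one below, which is only a sketch. The function name and the choice of the "inclusive" quantile method are assumptions; the paper does not say how its median and p95 were computed.

```python
import statistics

def latency_summary(samples_ms):
    """Summarize per-request latencies (milliseconds) with one consistent set of stats."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.mean(samples_ms),
        "stdev": statistics.stdev(samples_ms),
        "median": statistics.median(samples_ms),
        "p95": cuts[94],
    }
```

Running the same routine over both detectors' timings would put the SVM and the 22M-parameter model in identical units and statistics, rather than comparing "sub-millisecond" with a median and a p95.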

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment point by point below. Where the manuscript requires additional clarity to support the central claims, we will revise accordingly while preserving the reported results and methodology.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the 32-cell mirror topology is asserted to force the SVM to learn control-plane attack mechanics rather than incidental corpus shortcuts, yet the manuscript supplies no definition of cell boundaries, no positive-negative pairing rules, and no assignment criteria. Because the performance gap (95.97% vs 44.35% recall) is attributed to this geometry, the absence of these specifications leaves the central claim unsupported by inspectable evidence.

    Authors: We agree that the abstract alone does not supply the requested definitions. The full manuscript (Section 3) defines the 32-cell topology by partitioning on control-plane dimensions such as injection type (direct, indirect, role override), payload location, and context interaction; positive-negative pairs are formed by holding the base prompt fixed and varying only the injection element; assignment uses multi-annotator review against explicit exclusion rules for corpus artifacts. To make these fully inspectable in support of the geometry claim, we will expand the abstract with a concise summary of the rules and add an appendix containing the complete cell boundary table, pairing protocol, and assignment criteria with examples. revision: yes

  2. Referee: [Abstract] Abstract (comparison paragraph): the 22M-parameter model is evaluated on the identical 524-case holdout, but the text does not state whether that model was trained or fine-tuned on equivalently curated data or on the same public corpus. Without this information the reported advantage cannot be isolated to the mirror geometry versus differences in training-data organization.

    Authors: Prompt Guard 2 is an off-the-shelf pre-trained model released by its authors and was neither trained nor fine-tuned on our 5,000-sample corpus, the 524-case holdout, or any data organized under the mirror topology. It is evaluated zero-shot on the holdout solely as a scale baseline. We will add an explicit statement to this effect in the revised comparison paragraph so that the performance differential can be attributed to the mirror geometry and linear SVM training rather than training-data differences. revision: yes
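The pairing discipline described in the first response (each attack cell matched by a benign counterpart) can be checked mechanically. The sketch below audits cell coverage under a loudly stated assumption: the 32 cells are modeled as 16 hypothetical "mechanic" ids times two polarities. The summary does not specify how the 32 cells actually decompose, so this layout, the tuple format, and the function name are illustrative only.

```python
from collections import defaultdict

def audit_mirror(samples, n_mechanics=16):
    """Report empty cells and mechanics lacking a mirror.

    samples: iterable of (mechanic_id, is_attack, text) tuples.
    Assumes a hypothetical layout of n_mechanics x {attack, benign} cells.
    """
    filled = defaultdict(int)
    for mechanic_id, is_attack, _text in samples:
        if not 0 <= mechanic_id < n_mechanics:
            raise ValueError(f"unknown mechanic id: {mechanic_id}")
        filled[(mechanic_id, bool(is_attack))] += 1

    all_cells = [(m, pol) for m in range(n_mechanics) for pol in (True, False)]
    empty = [cell for cell in all_cells if filled[cell] == 0]
    # A mechanic is unmatched when exactly one of its two polarities has data.
    unmatched = [
        m for m in range(n_mechanics)
        if (filled[(m, True)] > 0) != (filled[(m, False)] > 0)
    ]
    return {
        "total_cells": len(all_cells),
        "filled_cells": len(all_cells) - len(empty),
        "empty": empty,
        "unmatched_mechanics": unmatched,
    }
```

An audit like this would make the "31 of 32 cells filled" statement inspectable: the one empty cell and the mechanic it leaves without a mirror would appear directly in the report.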

Circularity Check

0 steps flagged

No circularity: standard curation, training, and holdout evaluation with independent geometry claim

Full rationale

The derivation consists of defining a 32-cell mirror topology, populating 31 cells from public data under a stated validity contract, training a character n-gram linear SVM, and reporting metrics on a separate 524-case holdout. No equations or steps reduce by construction to fitted inputs, self-citations, or renamed ansatzes. The performance comparison to the 22M-parameter model uses the identical holdout, making the geometry-over-scale claim externally falsifiable from the curation rules. This is a conventional supervised learning pipeline with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the premise that the 32-cell matched topology isolates control-plane mechanics and that the public-data corpus plus holdout are free of the shortcuts the method claims to avoid.

axioms (1)
  • domain assumption: Matched positive and negative cells in the 32-cell mirror topology force the linear SVM to learn attack mechanics rather than incidental corpus artifacts.
    Stated directly in the abstract as the reason the method succeeds over standard training.
invented entities (1)
  • Mirror design pattern (no independent evidence)
    purpose: Data-curation structure that organizes prompt injection corpora into matched cells
    Newly introduced in the paper as the core contribution.

