pith. machine review for the scientific record.

arxiv: 2605.13764 · v1 · submitted 2026-05-13 · 💻 cs.CR · cs.IR · cs.LG


VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense


Pith reviewed 2026-05-14 17:42 UTC · model grok-4.3

classification 💻 cs.CR · cs.IR · cs.LG

keywords steganography · vector databases · RAG security · embeddings · data exfiltration · cryptographic provenance · embedding integrity

The pith

Embeddings can hide stolen data via small rotations that evade detectors, but signatures block the changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that an attacker with write access to an embedding pipeline can perturb vectors after creation to carry hidden payloads while keeping the same top retrieval results for normal queries. Simple distribution checks catch many changes, yet small-angle orthogonal rotations succeed across every model and corpus tested because they leave surface statistics intact. The core defense is VectorPin, which attaches to each embedding an Ed25519 signature over the vector, its source content, and the producing model, so that any later modification fails verification. This matters for RAG systems that currently treat stored vectors as opaque numerical artifacts without integrity controls.

Core claim

Post-embedding perturbations such as small-angle orthogonal rotations allow an attacker to encode payload bits inside vectors without shifting the distributional properties that anomaly detectors monitor, so retrieval behavior for legitimate users remains unchanged. Real embedding manifolds keep usable capacity well below the theoretical ceiling of floor(d/2) * b bits from a disjoint-Givens encoder. VectorPin counters the attack by producing an Ed25519 signature over a canonical byte representation of the embedding, its originating content, and the model; any alteration invalidates the signature.
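The disjoint-Givens idea can be sketched in a few lines. This is our illustrative reconstruction, not the paper's encoder: it hides one bit per plane (b = 1), carried by the sign of a small rotation angle; the function names and the theta value are hypothetical.

```python
import numpy as np

def encode_bits_givens(v, bits, theta=1e-4):
    """Hide one bit per disjoint coordinate plane (2i, 2i+1) as the sign
    of a small Givens rotation. Illustrative sketch, b = 1 bit per plane."""
    out = np.asarray(v, dtype=np.float64).copy()
    for i, bit in enumerate(bits):
        a, b = 2 * i, 2 * i + 1
        angle = theta if bit else -theta
        c, s = np.cos(angle), np.sin(angle)
        # Rotate the (a, b) plane; orthogonal, so the norm is preserved.
        out[a], out[b] = c * out[a] - s * out[b], s * out[a] + c * out[b]
    return out

def decode_bits_givens(v_orig, v_enc, n_bits):
    """Recover each bit from the signed angle change in its plane.
    Needs the original vector, i.e. a shared-reference channel."""
    bits = []
    for i in range(n_bits):
        a, b = 2 * i, 2 * i + 1
        delta = np.arctan2(v_enc[b], v_enc[a]) - np.arctan2(v_orig[b], v_orig[a])
        delta = (delta + np.pi) % (2 * np.pi) - np.pi  # wrap to (-pi, pi]
        bits.append(1 if delta > 0 else 0)
    return bits
```

At d = 3072 (text-embedding-3-large) this toy channel carries floor(d/2) = 1536 bits per vector; the paper's floor(d/2) * b ceiling generalizes this to 2^b distinguishable angle levels per plane, with the retrieval-preserving operating point sitting well below it.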

What carries the argument

VectorPin, the protocol that computes an Ed25519 signature over a canonical byte representation of each embedding together with its source content and producing model.
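A minimal sketch of that mechanism, assuming the Python `cryptography` package; the canonical byte layout below (sorted-key JSON over content and vector hashes) is our stand-in, not VectorPin's actual wire format:

```python
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def canonical_bytes(embedding, source_text, model_id):
    """Illustrative canonical encoding binding vector, source, and model."""
    payload = {
        "vec_hash": hashlib.sha256(
            b"".join(f"{x:.17g}".encode() for x in embedding)
        ).hexdigest(),
        "source_hash": hashlib.sha256(source_text.encode("utf-8")).hexdigest(),
        "model": model_id,
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")

key = Ed25519PrivateKey.generate()
emb = [0.12, -0.34, 0.56]
sig = key.sign(canonical_bytes(emb, "doc text", "text-embedding-3-large"))

# Verification succeeds on the untouched embedding ...
key.public_key().verify(sig, canonical_bytes(emb, "doc text", "text-embedding-3-large"))

# ... and fails after any post-embedding perturbation, however small.
tampered = [emb[0] + 1e-6, emb[1], emb[2]]
try:
    key.public_key().verify(sig, canonical_bytes(tampered, "doc text", "text-embedding-3-large"))
    raise AssertionError("tampering went undetected")
except InvalidSignature:
    pass  # expected: the pin is broken
```

The point the paper leans on is that the signature covers the vector bytes themselves, so even a rotation that is statistically invisible still changes the signed message.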

If this is right

  • Vector stores without provenance checks remain open to steganographic exfiltration from any party that can modify embeddings at ingestion time.
  • Distribution-based anomaly detectors fail against rotation-based hiding on every tested model and corpus pair.
  • A capacity-detectability trade-off appears in practice: usable hidden bits per vector sit well below the theoretical maximum once retrieval must stay unchanged.
  • Embedding-level cryptographic pinning becomes a deployable control that closes the entire class of post-creation modifications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same rotation technique could be tested on image or multimodal embeddings to see whether retrieval-preserving hiding generalizes beyond text.
  • Signature verification at query time would add a fixed per-vector cost but would detect tampering regardless of how the change was made.
  • Manifold geometry appears to set a hard limit on how much data can be hidden without moving the vector enough to change retrieval rankings.

Load-bearing premise

Perturbations exist that preserve identical top-k retrieval results for the same queries after the embedding is changed.

What would settle it

Compare top-k retrieval sets on a held-out query set using original versus small-angle-rotated embeddings to check whether the sets match exactly, or attempt signature verification on a single modified vector to see whether it fails.
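The first experiment can be sketched on synthetic data; the Gaussian "documents," single-plane rotation, and 1e-4 angle are our toy assumptions, not the paper's setup, and the open question is the same measurement on real embedding corpora.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 64, 500, 5
docs = rng.normal(size=(n, d))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = rng.normal(size=d)
query /= np.linalg.norm(query)

def top_k(q, vectors, k):
    """Indices of the k nearest stored vectors by cosine similarity."""
    return set(np.argsort(vectors @ q)[-k:])

# Small-angle rotation in a single coordinate plane, applied to every
# stored vector (far simpler than the paper's encoder).
theta = 1e-4
rot = np.eye(d)
rot[0, 0] = rot[1, 1] = np.cos(theta)
rot[0, 1], rot[1, 0] = -np.sin(theta), np.sin(theta)
rotated = docs @ rot.T

overlap = len(top_k(query, docs, k) & top_k(query, rotated, k)) / k
print(f"top-{k} overlap after rotation: {overlap:.2f}")
```

Since a rotation by theta moves a unit vector by at most about theta, cosine scores shift by at most ~1e-4, far below typical neighbor-similarity gaps, so the top-k sets should coincide here; whether that holds for real (model, corpus) pairs is exactly what the referee asks the authors to report.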

Figures

Figures reproduced from arXiv: 2605.13764 by Jascha Wanger.

Figure 1. Per-vector cosine similarity (left) and pairwise similarity matrix correlation (right) across …
Figure 2. Kolmogorov–Smirnov statistic between clean and obfuscated embedding-component …
Figure 3. Per-technique wall-clock time per batch (mean across noise configurations). Bars represent …
read the original abstract

Modern retrieval-augmented generation (RAG) systems convert sensitive content into high-dimensional embeddings and store them in vector databases that treat the resulting numerical artifacts as opaque. Major vector-store products do not provide native controls for embedding integrity, ingestion-time distributional anomaly detection, or cryptographic provenance attestation. We show this opens a class of steganographic exfiltration attacks: an attacker with write access to the ingestion pipeline can hide payload data inside embeddings using simple post-embedding perturbations (noise injection, rotation, scaling, offset, fragmentation, and combinations thereof) while preserving the surface-level retrieval behavior the RAG system exposes to legitimate users. We evaluate these techniques across a synthetic-PII corpus on text-embedding-3-large, four locally hosted open embedding models, a cross-corpus replication on BEIR NFCorpus and a Quora subset (over 26,000 chunks combined), seven vector-store configurations, an adaptive-attacker variant of the detector evaluation, and a paraphrased-query retrieval benchmark. Distribution-shifting perturbations are often caught by simple anomaly detectors; small-angle orthogonal rotation defeats distribution-based detection across every (model, corpus) pair tested. A disjoint-Givens rotation encoder gives a closed-form per-vector capacity ceiling of floor(d/2) * b bits, but real embedding manifolds impose a capacity-detectability trade-off, and the retrieval-preserving operating point sits well below it. We propose VectorPin, a cryptographic provenance protocol that pins each embedding to its source content and producing model via an Ed25519 signature over a canonical byte representation. Any post-embedding modification breaks signature verification. Embedding-level integrity is a deployable, standardizable control that closes this attack class.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that attackers with write access to RAG ingestion pipelines can perform steganographic exfiltration by applying post-embedding perturbations (noise, rotation, scaling, etc.) to hide payloads in vector stores while preserving surface retrieval behavior. It reports that small-angle orthogonal rotations defeat distribution-based detectors across all tested models and corpora (including >26k chunks on paraphrased queries), derives a closed-form capacity bound for disjoint-Givens rotations, and proposes VectorPin: an Ed25519 signature protocol over canonical byte representations that detects any post-embedding modification.

Significance. If the empirical claims hold, the work is significant for RAG security: it identifies a practical attack class on opaque embedding stores and supplies a deployable, parameter-free cryptographic control (standard Ed25519) that closes the vector. The breadth of evaluation (multiple models, BEIR/Quora corpora, seven stores, adaptive attacker) and the explicit capacity-detectability trade-off discussion are strengths; the defense requires no new primitives and directly addresses the identified gap.

major comments (2)
  1. [Evaluation (paraphrased-query retrieval benchmark and cross-corpus replication)] The central usability claim—that small-angle orthogonal rotations preserve surface-level retrieval behavior across every (model, corpus) pair—rests on unquantified assertions. The paraphrased-query benchmark on >26k chunks is described, but no top-k overlap, MRR delta, recall@5 change, or similar metrics are reported between clean and rotated embeddings. This metric gap is load-bearing for both the 'defeats detection while remaining usable' conclusion and the practical stealth assessment.
  2. [Abstract and Evaluation sections] The statement that 'small-angle orthogonal rotation defeats distribution-based detection across every (model, corpus) pair tested' is presented as a universal result, yet the abstract and evaluation summary provide no per-pair detector scores, false-positive rates, or adaptive-attacker success rates. Without these numbers or the exact detector implementations, the scope of the 'defeats' claim cannot be verified.
minor comments (1)
  1. [Capacity analysis] The disjoint-Givens rotation capacity formula (floor(d/2) * b bits) is stated without an accompanying derivation or reference to the underlying linear-algebra construction; a short appendix or inline proof sketch would improve clarity.
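A plausible sketch of the missing derivation, hedged (our reconstruction; the paper may argue it differently): pairing the d coordinates into disjoint planes yields floor(d/2) independent Givens angles, and quantizing each angle into 2^b distinguishable levels stores b bits per plane.

```latex
\underbrace{\left\lfloor d/2 \right\rfloor}_{\text{disjoint planes}}
\;\times\;
\underbrace{\log_2\!\left(2^{b}\right)}_{\text{bits per quantized angle}}
\;=\;
\left\lfloor d/2 \right\rfloor \, b \ \text{bits per vector.}
\]
Capacity is an upper bound: the planes must be disjoint for the angles to be
independent, and detectability constraints shrink the usable angle alphabet
below \(2^{b}\) in practice.
\[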

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The two major comments correctly identify gaps in the quantitative reporting of our evaluation results. We will address both by expanding the evaluation section with the requested metrics and tables in the revised manuscript.

read point-by-point responses
  1. Referee: [Evaluation (paraphrased-query retrieval benchmark and cross-corpus replication)] The central usability claim—that small-angle orthogonal rotations preserve surface-level retrieval behavior across every (model, corpus) pair—rests on unquantified assertions. The paraphrased-query benchmark on >26k chunks is described, but no top-k overlap, MRR delta, recall@5 change, or similar metrics are reported between clean and rotated embeddings. This metric gap is load-bearing for both the 'defeats detection while remaining usable' conclusion and the practical stealth assessment.

    Authors: We agree that the manuscript describes the paraphrased-query benchmark and cross-corpus replication but does not report the explicit numerical deltas. In the revision we will add a table (and accompanying text) that reports top-k overlap, MRR, recall@5, and nDCG deltas between clean and rotated embeddings for every model-corpus pair. These numbers will directly quantify the retrieval-preservation claim and support the practical stealth assessment. revision: yes

  2. Referee: [Abstract and Evaluation sections] The statement that 'small-angle orthogonal rotation defeats distribution-based detection across every (model, corpus) pair tested' is presented as a universal result, yet the abstract and evaluation summary provide no per-pair detector scores, false-positive rates, or adaptive-attacker success rates. Without these numbers or the exact detector implementations, the scope of the 'defeats' claim cannot be verified.

    Authors: We acknowledge that the abstract and high-level evaluation summary omit the per-pair numerical results. The evaluation section describes the detector implementations and the adaptive-attacker protocol, but the concrete scores are only summarized. We will revise both the abstract and the evaluation section to include detailed tables listing detector scores, false-positive rates, and adaptive-attacker success rates for each (model, corpus) pair, together with the precise detector configurations used. revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical evaluations and standard Ed25519 application

full rationale

The paper's core claims rest on empirical measurements of perturbation effects (noise, rotation, scaling) across multiple models, corpora, and benchmarks, with no fitted parameters or self-referential predictions. The VectorPin defense is a direct, parameter-free application of the standard Ed25519 signature scheme over canonical byte representations of embeddings, source content, and model identifiers. No equations or derivations reduce to their inputs by construction, no self-citations form load-bearing uniqueness arguments, and no ansatzes are smuggled via prior work. The attack results are falsifiable via the reported retrieval and detection metrics; the defense is independently verifiable against the Ed25519 specification. This yields a self-contained, non-circular contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claims rest on the domain assumption that embedding manifolds allow small perturbations to carry payload while preserving nearest-neighbor behavior, plus standard cryptographic assumptions for Ed25519.

axioms (2)
  • domain assumption Small post-embedding perturbations can preserve retrieval behavior while embedding hidden data
    Invoked throughout the attack description and evaluation claims
  • standard math Ed25519 signatures over canonical byte representations provide integrity against post-creation modification
    Basis for the VectorPin protocol
invented entities (1)
  • VectorPin protocol no independent evidence
    purpose: Cryptographic provenance attestation for embeddings
    New protocol introduced to close the described attack class

pith-pipeline@v0.9.0 · 5610 in / 1222 out tokens · 29840 ms · 2026-05-14T17:42:35.641055+00:00 · methodology


Reference graph

Works this paper leans on

42 extracted references · 3 canonical work pages · 2 internal anchors

  1. [1]

    Turning your weakness into a strength: Watermarking deep neural networks by backdooring

    Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, and Joseph Keshet. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In USENIX Security Symposium, 2018

  2. [2]

    High-speed high-security signatures

    Daniel J. Bernstein, Niels Duif, Tanja Lange, Peter Schwabe, and Bo-Yin Yang. High-speed high-security signatures. Journal of Cryptographic Engineering, 2, 2012

  3. [3]

    Extracting training data from large language models

    Nicholas Carlini, Florian Tramer, Eric Wallace, et al. Extracting training data from large language models. In USENIX Security Symposium, 2021

  4. [4]

    C2PA technical specification, version 2.0

    Coalition for Content Provenance and Authenticity. C2PA technical specification, version 2.0. https://c2pa.org/specifications/, 2024

  5. [5]

    Secure spread spectrum watermarking for multimedia

    Ingemar J. Cox, Joe Kilian, F. Thomson Leighton, and Talal Shamoon. Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing, 6(12), 1997

  6. [6]

    Regulation (EU) 2024/1689 on artificial intelligence

    European Parliament and Council. Regulation (EU) 2024/1689 on artificial intelligence. https://eur-lex.europa.eu/eli/reg/2024/1689/oj, 2024

  7. [7]

    Steganography in Digital Media: Principles, Algorithms, and Applications

    Jessica Fridrich. Steganography in Digital Media: Principles, Algorithms, and Applications. Cambridge University Press, 2009

  8. [8]

    BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

    Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733, 2017

  9. [9]

    Billion-scale similarity search with GPUs

    Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3), 2021

  10. [10]

    RFC 7515: JSON web signature (JWS)

    Michael B. Jones, John Bradley, and Nat Sakimura. RFC 7515: JSON web signature (JWS). https://datatracker.ietf.org/doc/html/rfc7515, 2015

  11. [11]

    RFC 8032: Edwards-curve digital signature algorithm (EdDSA)

    Simon Josefsson and Ilari Liusvaara. RFC 8032: Edwards-curve digital signature algorithm (EdDSA). https://datatracker.ietf.org/doc/html/rfc8032, 2017

  12. [12]

    Dense passage retrieval for open-domain question answering

    Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Empirical Methods in Natural Language Processing (EMNLP), 2020

  13. [13]

    RFC 3339: Date and time on the Internet: Timestamps

    Graham Klyne and Chris Newman. RFC 3339: Date and time on the Internet: Timestamps. https://datatracker.ietf.org/doc/html/rfc3339, 2002

  14. [14]

    Retrieval-augmented generation for knowledge-intensive NLP tasks

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems (NeurIPS), 2020

  15. [15]

    Isolation forest

    Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In IEEE International Conference on Data Mining (ICDM), 2008

  16. [16]

    Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs

    Yury A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4), 2020

  17. [17]

    AI risk management framework (AI RMF 1.0)

    National Institute of Standards and Technology. AI risk management framework (AI RMF 1.0). https://www.nist.gov/itl/ai-risk-management-framework, 2023

  18. [18]

    sigstore: Software signing for everybody

    Zachary Newman, John Speed Meyers, and Santiago Torres-Arias. sigstore: Software signing for everybody. In ACM Conference on Computer and Communications Security (CCS), 2022

  19. [19]

    SLSA: Supply-chain levels for software artifacts

    Open Source Security Foundation. SLSA: Supply-chain levels for software artifacts. https://slsa.dev/, 2023

  20. [20]

    New embedding models and API updates

    OpenAI. New embedding models and API updates. https://openai.com/index/new-embedding-models-and-api-updates/, 2024. Announcement of text-embedding-3-large and related models

  21. [21]

    Practical black-box attacks against machine learning

    Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In ACM Asia Conference on Computer and Communications Security (ASIACCS), 2017

  22. [22]

    Scikit-learn: Machine learning in Python

    Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2011

  23. [23]

    Hide and seek: An introduction to steganography

    Niels Provos and Peter Honeyman. Hide and seek: An introduction to steganography. IEEE Security & Privacy, 1(3), 2003

  24. [24]

    Qdrant documentation: Vector quantization

    Qdrant Team. Qdrant documentation: Vector quantization. https://qdrant.tech/documentation/guides/quantization/, 2024

  25. [25]

    RFC 8392: CBOR object signing and encryption (COSE)

    Jim Schaad. RFC 8392: CBOR object signing and encryption (COSE). https://datatracker.ietf.org/doc/html/rfc8392, 2018

  26. [26]

    Estimating the support of a high-dimensional distribution

    Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, and Robert C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7), 2001

  27. [27]

    Membership inference attacks against machine learning models

    Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy, 2017

  28. [28]

    in-toto: Providing farm-to-table guarantees for bits and bytes

    Santiago Torres-Arias, Hammad Afzali, Trishank Karthik Kuppusamy, Reza Curtmola, and Justin Cappos. in-toto: Providing farm-to-table guarantees for bits and bytes. In USENIX Security Symposium, 2019

  29. [29]

    The Space of Transferable Adversarial Examples

    Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. The space of transferable adversarial examples. arXiv preprint arXiv:1704.03453, 2017

  30. [30]

    SchemaPin: Cryptographic provenance for tool schemas

    Jascha Wanger. SchemaPin: Cryptographic provenance for tool schemas. https://github.com/ThirdKeyAI/SchemaPin, 2025. Apache-2.0

  31. [31]

    Symbiont: Policy-governed agent runtime

    Jascha Wanger. Symbiont: Policy-governed agent runtime. https://github.com/ThirdKeyAI/Symbiont, 2025. Apache-2.0

  32. [32]

    VectorPin: Verifiable integrity for AI embedding stores

    Jascha Wanger. VectorPin: Verifiable integrity for AI embedding stores. https://github.com/ThirdKeyAI/VectorPin, 2025. Apache-2.0

  33. [33]

    VectorSmuggle: A research framework for vector-based data exfiltration

    Jascha Wanger. VectorSmuggle: A research framework for vector-based data exfiltration. https://github.com/jaschadub/VectorSmuggle, 2025. Apache-2.0

  34. [34]

    F5—a steganographic algorithm: High capacity despite better steganalysis

    Andreas Westfeld. F5—a steganographic algorithm: High capacity despite better steganalysis. In Information Hiding (IH ’01), 2001

  35. [35]

    PoisonedRAG: Knowledge corruption attacks to retrieval-augmented generation of large language models

    Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. PoisonedRAG: Knowledge corruption attacks to retrieval-augmented generation of large language models. https://arxiv.org/abs/2402.07867, 2024

Appendix A fragment: VectorPin protocol specification (v1)

Reference slots 36–42 captured pieces of the paper's Appendix A rather than citations. The recoverable content:

  • Canonical hashing: hash_text(s) := … ∥ hex(SHA256(UTF8(NFC(s)))). Text MUST be normalized to Unicode NFC before encoding, and implementations MUST reject input that cannot be normalized. A companion rule, Vector.hash_vector(v, dtype) := …, is truncated in the extraction.
  • A separate-language implementation that follows the appendix should produce signatures and verifications byte-for-byte compatible with the Python and Rust reference implementations [32].
  • Goals: a VectorPin Pin is a compact attestation that travels with an embedding through a vector database; it guarantees that the embedding matches a specific source…
  • Verification: a verifier MUST distinguish at least these failure modes:
      1. Reject pins whose v field is unknown to it (UNSUPPORTED_VERSION).
      2. Reject pins whose kid is not in its key registry (UNKNOWN_KEY).
      3. Reconstruct the canonical byte sequence and verify sig against the registered public key for kid (SIGNATURE_INVALID on failure).
      4. If ground-truth source was supplied, recompute hash_text(source) and compare to source_hash (SOURCE_MISMATCH on mismatch).
      5. If a ground-truth vector was supplied, recompute hash_vector(vector, vec_dtype) and compare to vec_hash; also check the supplied vector's shape matches vec_dim (VECTOR_TAMPERED or SHAPE_MISMATCH on mismatch).
      6. If an expected model identifier was supplied, compare to model (MODEL_MISMATCH on mismatch). Other implementations MAY use different identifiers for the modes but MUST distinguish the cases.
  • Storage conventions: adapter implementations SHOULD store pins under the metadata key vectorpin.

    If an expected model identifier was supplied, compare tomodel (MODEL_MISMATCH on mismatch). Verifiers MUST distinguish at least these failure modes. Other implementations MAY use different identifiers for the modes but MUST distinguish the cases. A.7 Storage conventions Adapter implementations SHOULD store pins under the metadata keyvectorpin. Backends wi...