pith. machine review for the scientific record.

arxiv: 2605.13764 · v1 · submitted 2026-05-13 · 💻 cs.CR · cs.IR · cs.LG


VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense


Pith reviewed 2026-05-14 17:42 UTC · model grok-4.3

classification 💻 cs.CR · cs.IR · cs.LG

keywords steganography · vector databases · RAG security · embeddings · data exfiltration · cryptographic provenance · embedding integrity

The pith

Embeddings can hide stolen data via small rotations that evade detectors, but signatures block the changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that an attacker with write access to an embedding pipeline can perturb vectors after creation to carry hidden payloads while keeping the same top retrieval results for normal queries. Simple distribution checks catch many changes, yet small-angle orthogonal rotations succeed across every model and corpus tested because they leave surface statistics intact. The core defense is VectorPin, which attaches to each embedding an Ed25519 signature over the vector, its source content, and the producing model, so that any later modification fails verification. This matters for RAG systems that currently treat stored vectors as opaque numerical artifacts without integrity controls.

Core claim

Post-embedding perturbations such as small-angle orthogonal rotations allow an attacker to encode payload bits inside vectors without shifting the distributional properties that anomaly detectors monitor, so retrieval behavior for legitimate users remains unchanged. Real embedding manifolds keep usable capacity well below the theoretical ceiling of floor(d/2) * b bits from a disjoint-Givens encoder. VectorPin counters the attack by producing an Ed25519 signature over a canonical byte representation of the embedding, its originating content, and the model; any alteration invalidates the signature.
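The disjoint-Givens idea can be sketched in a few lines. This is our illustrative reconstruction, not the paper's encoder: it hides one bit per plane (b = 1), carried by the sign of a small rotation angle; the function names and the theta value are hypothetical.

```python
import numpy as np

def encode_bits_givens(v, bits, theta=1e-4):
    """Hide one bit per disjoint coordinate plane (2i, 2i+1) as the sign
    of a small Givens rotation. Illustrative sketch, b = 1 bit per plane."""
    out = np.asarray(v, dtype=np.float64).copy()
    for i, bit in enumerate(bits):
        a, b = 2 * i, 2 * i + 1
        angle = theta if bit else -theta
        c, s = np.cos(angle), np.sin(angle)
        # Rotate the (a, b) plane; orthogonal, so the norm is preserved.
        out[a], out[b] = c * out[a] - s * out[b], s * out[a] + c * out[b]
    return out

def decode_bits_givens(v_orig, v_enc, n_bits):
    """Recover each bit from the signed angle change in its plane.
    Needs the original vector, i.e. a shared-reference channel."""
    bits = []
    for i in range(n_bits):
        a, b = 2 * i, 2 * i + 1
        delta = np.arctan2(v_enc[b], v_enc[a]) - np.arctan2(v_orig[b], v_orig[a])
        delta = (delta + np.pi) % (2 * np.pi) - np.pi  # wrap to (-pi, pi]
        bits.append(1 if delta > 0 else 0)
    return bits
```

At d = 3072 (text-embedding-3-large) this toy channel carries floor(d/2) = 1536 bits per vector; the paper's floor(d/2) * b ceiling generalizes this to 2^b distinguishable angle levels per plane, with the retrieval-preserving operating point sitting well below it.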

What carries the argument

VectorPin, the protocol that computes an Ed25519 signature over a canonical byte representation of each embedding together with its source content and producing model.
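A minimal sketch of that mechanism, assuming the Python `cryptography` package; the canonical byte layout below (sorted-key JSON over content and vector hashes) is our stand-in, not VectorPin's actual wire format:

```python
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def canonical_bytes(embedding, source_text, model_id):
    """Illustrative canonical encoding binding vector, source, and model."""
    payload = {
        "vec_hash": hashlib.sha256(
            b"".join(f"{x:.17g}".encode() for x in embedding)
        ).hexdigest(),
        "source_hash": hashlib.sha256(source_text.encode("utf-8")).hexdigest(),
        "model": model_id,
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")

key = Ed25519PrivateKey.generate()
emb = [0.12, -0.34, 0.56]
sig = key.sign(canonical_bytes(emb, "doc text", "text-embedding-3-large"))

# Verification succeeds on the untouched embedding ...
key.public_key().verify(sig, canonical_bytes(emb, "doc text", "text-embedding-3-large"))

# ... and fails after any post-embedding perturbation, however small.
tampered = [emb[0] + 1e-6, emb[1], emb[2]]
try:
    key.public_key().verify(sig, canonical_bytes(tampered, "doc text", "text-embedding-3-large"))
    raise AssertionError("tampering went undetected")
except InvalidSignature:
    pass  # expected: the pin is broken
```

The point the paper leans on is that the signature covers the vector bytes themselves, so even a rotation that is statistically invisible still changes the signed message.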

If this is right

  • Vector stores without provenance checks remain open to steganographic exfiltration from any party that can modify embeddings at ingestion time.
  • Distribution-based anomaly detectors fail against rotation-based hiding on every tested model and corpus pair.
  • A capacity-detectability trade-off appears in practice: usable hidden bits per vector sit well below the theoretical maximum once retrieval must stay unchanged.
  • Embedding-level cryptographic pinning becomes a deployable control that closes the entire class of post-creation modifications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same rotation technique could be tested on image or multimodal embeddings to see whether retrieval-preserving hiding generalizes beyond text.
  • Signature verification at query time would add a fixed per-vector cost but would detect tampering regardless of how the change was made.
  • Manifold geometry appears to set a hard limit on how much data can be hidden without moving the vector enough to change retrieval rankings.

Load-bearing premise

Perturbations exist that preserve identical top-k retrieval results for the same queries after the embedding is changed.

What would settle it

Compare top-k retrieval sets on a held-out query set using original versus small-angle-rotated embeddings to check whether the sets match exactly, or attempt signature verification on a single modified vector to see whether it fails.
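The first experiment can be sketched on synthetic data; the Gaussian "documents," single-plane rotation, and 1e-4 angle are our toy assumptions, not the paper's setup, and the open question is the same measurement on real embedding corpora.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 64, 500, 5
docs = rng.normal(size=(n, d))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = rng.normal(size=d)
query /= np.linalg.norm(query)

def top_k(q, vectors, k):
    """Indices of the k nearest stored vectors by cosine similarity."""
    return set(np.argsort(vectors @ q)[-k:])

# Small-angle rotation in a single coordinate plane, applied to every
# stored vector (far simpler than the paper's encoder).
theta = 1e-4
rot = np.eye(d)
rot[0, 0] = rot[1, 1] = np.cos(theta)
rot[0, 1], rot[1, 0] = -np.sin(theta), np.sin(theta)
rotated = docs @ rot.T

overlap = len(top_k(query, docs, k) & top_k(query, rotated, k)) / k
print(f"top-{k} overlap after rotation: {overlap:.2f}")
```

Since a rotation by theta moves a unit vector by at most about theta, cosine scores shift by at most ~1e-4, far below typical neighbor-similarity gaps, so the top-k sets should coincide here; whether that holds for real (model, corpus) pairs is exactly what the referee asks the authors to report.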

Figures

Figures reproduced from arXiv: 2605.13764 by Jascha Wanger.

Figure 1. Per-vector cosine similarity (left) and pairwise similarity matrix correlation (right) across …
Figure 2. Kolmogorov–Smirnov statistic between clean and obfuscated embedding-component …
Figure 3. Per-technique wall-clock time per batch (mean across noise configurations). Bars represent …
read the original abstract

Modern retrieval-augmented generation (RAG) systems convert sensitive content into high-dimensional embeddings and store them in vector databases that treat the resulting numerical artifacts as opaque. Major vector-store products do not provide native controls for embedding integrity, ingestion-time distributional anomaly detection, or cryptographic provenance attestation. We show this opens a class of steganographic exfiltration attacks: an attacker with write access to the ingestion pipeline can hide payload data inside embeddings using simple post-embedding perturbations (noise injection, rotation, scaling, offset, fragmentation, and combinations thereof) while preserving the surface-level retrieval behavior the RAG system exposes to legitimate users. We evaluate these techniques across a synthetic-PII corpus on text-embedding-3-large, four locally hosted open embedding models, a cross-corpus replication on BEIR NFCorpus and a Quora subset (over 26,000 chunks combined), seven vector-store configurations, an adaptive-attacker variant of the detector evaluation, and a paraphrased-query retrieval benchmark. Distribution-shifting perturbations are often caught by simple anomaly detectors; small-angle orthogonal rotation defeats distribution-based detection across every (model, corpus) pair tested. A disjoint-Givens rotation encoder gives a closed-form per-vector capacity ceiling of floor(d/2) * b bits, but real embedding manifolds impose a capacity-detectability trade-off, and the retrieval-preserving operating point sits well below it. We propose VectorPin, a cryptographic provenance protocol that pins each embedding to its source content and producing model via an Ed25519 signature over a canonical byte representation. Any post-embedding modification breaks signature verification. Embedding-level integrity is a deployable, standardizable control that closes this attack class.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that attackers with write access to RAG ingestion pipelines can perform steganographic exfiltration by applying post-embedding perturbations (noise, rotation, scaling, etc.) to hide payloads in vector stores while preserving surface retrieval behavior. It reports that small-angle orthogonal rotations defeat distribution-based detectors across all tested models and corpora (including >26k chunks on paraphrased queries), derives a closed-form capacity bound for disjoint-Givens rotations, and proposes VectorPin: an Ed25519 signature protocol over canonical byte representations that detects any post-embedding modification.

Significance. If the empirical claims hold, the work is significant for RAG security: it identifies a practical attack class on opaque embedding stores and supplies a deployable, parameter-free cryptographic control (standard Ed25519) that closes the vector. The breadth of evaluation (multiple models, BEIR/Quora corpora, seven stores, adaptive attacker) and the explicit capacity-detectability trade-off discussion are strengths; the defense requires no new primitives and directly addresses the identified gap.

major comments (2)
  1. [Evaluation (paraphrased-query retrieval benchmark and cross-corpus replication)] The central usability claim—that small-angle orthogonal rotations preserve surface-level retrieval behavior across every (model, corpus) pair—rests on unquantified assertions. The paraphrased-query benchmark on >26k chunks is described, but no top-k overlap, MRR delta, recall@5 change, or similar metrics are reported between clean and rotated embeddings. This metric gap is load-bearing for both the 'defeats detection while remaining usable' conclusion and the practical stealth assessment.
  2. [Abstract and Evaluation sections] The statement that 'small-angle orthogonal rotation defeats distribution-based detection across every (model, corpus) pair tested' is presented as a universal result, yet the abstract and evaluation summary provide no per-pair detector scores, false-positive rates, or adaptive-attacker success rates. Without these numbers or the exact detector implementations, the scope of the 'defeats' claim cannot be verified.
minor comments (1)
  1. [Capacity analysis] The disjoint-Givens rotation capacity formula (floor(d/2) * b bits) is stated without an accompanying derivation or reference to the underlying linear-algebra construction; a short appendix or inline proof sketch would improve clarity.
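A plausible sketch of the missing derivation, hedged (our reconstruction; the paper may argue it differently): pairing the d coordinates into disjoint planes yields floor(d/2) independent Givens angles, and quantizing each angle into 2^b distinguishable levels stores b bits per plane.

```latex
\underbrace{\left\lfloor d/2 \right\rfloor}_{\text{disjoint planes}}
\;\times\;
\underbrace{\log_2\!\left(2^{b}\right)}_{\text{bits per quantized angle}}
\;=\;
\left\lfloor d/2 \right\rfloor \, b \ \text{bits per vector.}
\]
Capacity is an upper bound: the planes must be disjoint for the angles to be
independent, and detectability constraints shrink the usable angle alphabet
below \(2^{b}\) in practice.
\[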

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The two major comments correctly identify gaps in the quantitative reporting of our evaluation results. We will address both by expanding the evaluation section with the requested metrics and tables in the revised manuscript.

read point-by-point responses
  1. Referee: [Evaluation (paraphrased-query retrieval benchmark and cross-corpus replication)] The central usability claim—that small-angle orthogonal rotations preserve surface-level retrieval behavior across every (model, corpus) pair—rests on unquantified assertions. The paraphrased-query benchmark on >26k chunks is described, but no top-k overlap, MRR delta, recall@5 change, or similar metrics are reported between clean and rotated embeddings. This metric gap is load-bearing for both the 'defeats detection while remaining usable' conclusion and the practical stealth assessment.

    Authors: We agree that the manuscript describes the paraphrased-query benchmark and cross-corpus replication but does not report the explicit numerical deltas. In the revision we will add a table (and accompanying text) that reports top-k overlap, MRR, recall@5, and nDCG deltas between clean and rotated embeddings for every model-corpus pair. These numbers will directly quantify the retrieval-preservation claim and support the practical stealth assessment. revision: yes

  2. Referee: [Abstract and Evaluation sections] The statement that 'small-angle orthogonal rotation defeats distribution-based detection across every (model, corpus) pair tested' is presented as a universal result, yet the abstract and evaluation summary provide no per-pair detector scores, false-positive rates, or adaptive-attacker success rates. Without these numbers or the exact detector implementations, the scope of the 'defeats' claim cannot be verified.

    Authors: We acknowledge that the abstract and high-level evaluation summary omit the per-pair numerical results. The evaluation section describes the detector implementations and the adaptive-attacker protocol, but the concrete scores are only summarized. We will revise both the abstract and the evaluation section to include detailed tables listing detector scores, false-positive rates, and adaptive-attacker success rates for each (model, corpus) pair, together with the precise detector configurations used. revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical evaluations and standard Ed25519 application

full rationale

The paper's core claims rest on empirical measurements of perturbation effects (noise, rotation, scaling) across multiple models, corpora, and benchmarks, with no fitted parameters or self-referential predictions. The VectorPin defense is a direct, parameter-free application of the standard Ed25519 signature scheme over canonical byte representations of embeddings, source content, and model identifiers. No equations or derivations reduce to their inputs by construction, no self-citations form load-bearing uniqueness arguments, and no ansatzes are smuggled via prior work. The attack results are falsifiable via the reported retrieval and detection metrics; the defense is independently verifiable against the Ed25519 specification. This yields a self-contained, non-circular contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claims rest on the domain assumption that embedding manifolds allow small perturbations to carry payload while preserving nearest-neighbor behavior, plus standard cryptographic assumptions for Ed25519.

axioms (2)
  • domain assumption Small post-embedding perturbations can preserve retrieval behavior while embedding hidden data
    Invoked throughout the attack description and evaluation claims
  • standard math Ed25519 signatures over canonical byte representations provide integrity against post-creation modification
    Basis for the VectorPin protocol
invented entities (1)
  • VectorPin protocol no independent evidence
    purpose: Cryptographic provenance attestation for embeddings
    New protocol introduced to close the described attack class

pith-pipeline@v0.9.0 · 5610 in / 1222 out tokens · 29840 ms · 2026-05-14T17:42:35.641055+00:00 · methodology


Reference graph

Works this paper leans on

42 extracted references · 3 canonical work pages · 2 internal anchors

  1. [1]

    Turning your weakness into a strength: Watermarking deep neural networks by backdooring

    Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, and Joseph Keshet. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In USENIX Security Symposium, 2018

  2. [2]

    High-speed high-security signatures

    Daniel J. Bernstein, Niels Duif, Tanja Lange, Peter Schwabe, and Bo-Yin Yang. High-speed high-security signatures. Journal of Cryptographic Engineering, 2, 2012

  3. [3]

    Extracting training data from large language models

    Nicholas Carlini, Florian Tramer, Eric Wallace, et al. Extracting training data from large language models. In USENIX Security Symposium, 2021

  4. [4]

    C2PA technical specification, version 2.0

    Coalition for Content Provenance and Authenticity. C2PA technical specification, version 2.0. https://c2pa.org/specifications/, 2024

  5. [5]

    Secure spread spectrum watermarking for multimedia

    Ingemar J. Cox, Joe Kilian, F. Thomson Leighton, and Talal Shamoon. Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing, 6(12), 1997

  6. [6]

    Regulation (EU) 2024/1689 on artificial intelligence

    European Parliament and Council. Regulation (EU) 2024/1689 on artificial intelligence. https://eur-lex.europa.eu/eli/reg/2024/1689/oj, 2024

  7. [7]

    Steganography in Digital Media: Principles, Algorithms, and Applications

    Jessica Fridrich. Steganography in Digital Media: Principles, Algorithms, and Applications. Cambridge University Press, 2009

  8. [8]

    BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

    Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733, 2017

  9. [9]

    Billion-scale similarity search with GPUs

    Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3), 2021

  10. [10]

    RFC 7515: JSON web signature (JWS)

    Michael B. Jones, John Bradley, and Nat Sakimura. RFC 7515: JSON web signature (JWS). https://datatracker.ietf.org/doc/html/rfc7515, 2015

  11. [11]

    RFC 8032: Edwards-curve digital signature algorithm (EdDSA)

    Simon Josefsson and Ilari Liusvaara. RFC 8032: Edwards-curve digital signature algorithm (EdDSA). https://datatracker.ietf.org/doc/html/rfc8032, 2017

  12. [12]

    Dense passage retrieval for open-domain question answering

    Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Empirical Methods in Natural Language Processing (EMNLP), 2020

  13. [13]

    RFC 3339: Date and time on the Internet: Timestamps

    Graham Klyne and Chris Newman. RFC 3339: Date and time on the Internet: Timestamps. https://datatracker.ietf.org/doc/html/rfc3339, 2002

  14. [14]

    Retrieval-augmented generation for knowledge-intensive NLP tasks

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems (NeurIPS), 2020

  15. [15]

    Isolation forest

    Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In IEEE International Conference on Data Mining (ICDM), 2008

  16. [16]

    Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs

    Yury A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4), 2020

  17. [17]

    AI risk management framework (AI RMF 1.0)

    National Institute of Standards and Technology. AI risk management framework (AI RMF 1.0). https://www.nist.gov/itl/ai-risk-management-framework, 2023

  18. [18]

    sigstore: Software signing for everybody

    Zachary Newman, John Speed Meyers, and Santiago Torres-Arias. sigstore: Software signing for everybody. In ACM Conference on Computer and Communications Security (CCS), 2022

  19. [19]

    SLSA: Supply-chain levels for software artifacts

    Open Source Security Foundation. SLSA: Supply-chain levels for software artifacts. https://slsa.dev/, 2023

  20. [20]

    New embedding models and API updates

    OpenAI. New embedding models and API updates. https://openai.com/index/new-embedding-models-and-api-updates/, 2024. Announcement of text-embedding-3-large and related models

  21. [21]

    Practical black-box attacks against machine learning

    Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In ACM Asia Conference on Computer and Communications Security (ASIACCS), 2017

  22. [22]

    Scikit-learn: Machine learning in Python

    Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2011

  23. [23]

    Hide and seek: An introduction to steganography

    Niels Provos and Peter Honeyman. Hide and seek: An introduction to steganography. IEEE Security & Privacy, 1(3), 2003

  24. [24]

    Qdrant documentation: Vector quantization

    Qdrant Team. Qdrant documentation: Vector quantization. https://qdrant.tech/documentation/guides/quantization/, 2024

  25. [25]

    RFC 8392: CBOR object signing and encryption (COSE)

    Jim Schaad. RFC 8392: CBOR object signing and encryption (COSE). https://datatracker.ietf.org/doc/html/rfc8392, 2018

  26. [26]

    Estimating the support of a high-dimensional distribution

    Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, and Robert C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7), 2001

  27. [27]

    Membership inference attacks against machine learning models

    Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy, 2017

  28. [28]

    in-toto: Providing farm-to-table guarantees for bits and bytes

    Santiago Torres-Arias, Hammad Afzali, Trishank Karthik Kuppusamy, Reza Curtmola, and Justin Cappos. in-toto: Providing farm-to-table guarantees for bits and bytes. In USENIX Security Symposium, 2019

  29. [29]

    The Space of Transferable Adversarial Examples

    Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. The space of transferable adversarial examples. arXiv preprint arXiv:1704.03453, 2017

  30. [30]

    SchemaPin: Cryptographic provenance for tool schemas

    Jascha Wanger. SchemaPin: Cryptographic provenance for tool schemas. https://github.com/ThirdKeyAI/SchemaPin, 2025. Apache-2.0

  31. [31]

    Symbiont: Policy-governed agent runtime

    Jascha Wanger. Symbiont: Policy-governed agent runtime. https://github.com/ThirdKeyAI/Symbiont, 2025. Apache-2.0

  32. [32]

    VectorPin: Verifiable integrity for AI embedding stores

    Jascha Wanger. VectorPin: Verifiable integrity for AI embedding stores. https://github.com/ThirdKeyAI/VectorPin, 2025. Apache-2.0

  33. [33]

    VectorSmuggle: A research framework for vector-based data exfiltration

    Jascha Wanger. VectorSmuggle: A research framework for vector-based data exfiltration. https://github.com/jaschadub/VectorSmuggle, 2025. Apache-2.0

  34. [34]

    F5—a steganographic algorithm: High capacity despite better steganalysis

    Andreas Westfeld. F5—a steganographic algorithm: High capacity despite better steganalysis. In Information Hiding (IH ’01), 2001

  35. [35]

    PoisonedRAG: Knowledge corruption attacks to retrieval-augmented generation of large language models

    Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. PoisonedRAG: Knowledge corruption attacks to retrieval-augmented generation of large language models. https://arxiv.org/abs/2402.07867, 2024

Appendix A fragment: VectorPin protocol specification (v1)

Reference slots 36–42 captured pieces of the paper's Appendix A rather than citations. The recoverable content:

  • Canonical hashing: hash_text(s) := … ∥ hex(SHA256(UTF8(NFC(s)))). Text MUST be normalized to Unicode NFC before encoding, and implementations MUST reject input that cannot be normalized. A companion rule, Vector.hash_vector(v, dtype) := …, is truncated in the extraction.
  • A separate-language implementation that follows the appendix should produce signatures and verifications byte-for-byte compatible with the Python and Rust reference implementations [32].
  • Goals: a VectorPin Pin is a compact attestation that travels with an embedding through a vector database; it guarantees that the embedding matches a specific source…
  • Verification: a verifier MUST distinguish at least these failure modes:
      1. Reject pins whose v field is unknown to it (UNSUPPORTED_VERSION).
      2. Reject pins whose kid is not in its key registry (UNKNOWN_KEY).
      3. Reconstruct the canonical byte sequence and verify sig against the registered public key for kid (SIGNATURE_INVALID on failure).
      4. If ground-truth source was supplied, recompute hash_text(source) and compare to source_hash (SOURCE_MISMATCH on mismatch).
      5. If a ground-truth vector was supplied, recompute hash_vector(vector, vec_dtype) and compare to vec_hash; also check the supplied vector's shape matches vec_dim (VECTOR_TAMPERED or SHAPE_MISMATCH on mismatch).
      6. If an expected model identifier was supplied, compare to model (MODEL_MISMATCH on mismatch). Other implementations MAY use different identifiers for the modes but MUST distinguish the cases.
  • Storage conventions: adapter implementations SHOULD store pins under the metadata key vectorpin.

    If an expected model identifier was supplied, compare tomodel (MODEL_MISMATCH on mismatch). Verifiers MUST distinguish at least these failure modes. Other implementations MAY use different identifiers for the modes but MUST distinguish the cases. A.7 Storage conventions Adapter implementations SHOULD store pins under the metadata keyvectorpin. Backends wi...