Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange
Pith reviewed 2026-05-10 19:21 UTC · model grok-4.3
The pith
Two AI agents can exchange keys and converse covertly using only constant min-entropy in their short normal messages.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce pseudorandom noise-resilient key exchange, a protocol whose transcript is pseudorandom yet still correct under constant noise. Using this primitive we show that covert key exchange, and therefore covert conversation, remains possible even when each agent has an arbitrary private context, messages are short and fully adaptive, and the only entropy assumption is that sufficiently many individual messages contain at least constant min-entropy.
What carries the argument
pseudorandom noise-resilient key exchange: a key-exchange protocol that produces a pseudorandom public transcript while remaining correct under constant noise
If this is right
- When a shared secret is available, the hidden conversation can exploit essentially all entropy present in the honest message distributions.
- Transcript auditing alone cannot rule out covert coordination between AI agents.
- The new primitive may be of independent interest for other cryptographic applications that require pseudorandom yet noise-tolerant transcripts.
Where Pith is reading between the lines
- AI monitoring systems that rely solely on transcript inspection may need to incorporate active probing or model-internal checks to detect coordination.
- Practical deployments could test whether typical LLM message distributions actually supply the constant min-entropy the construction requires.
- The same noise-resilient primitive might be usable for covert coordination among more than two agents.
Load-bearing premise
That sufficiently many individual messages carry at least constant min-entropy and that a secure pseudorandom noise-resilient key exchange primitive exists.
What would settle it
An efficient algorithm that distinguishes the transcript of the constructed key exchange from a uniformly random string, or that recovers the shared key from a transcript subjected to constant noise, with non-negligible advantage.
Figures
read the original abstract
AI agents are increasingly deployed to interact with other agents on behalf of users and organizations. We ask whether two such agents, operated by different entities, can carry out a parallel secret conversation while still producing a transcript that is computationally indistinguishable from an honest interaction, even to a strong passive auditor that knows the full model descriptions, the protocol, and the agents' private contexts. Building on recent work on watermarking and steganography for LLMs, we first show that if the parties possess an interaction-unique secret key, they can facilitate an optimal-rate covert conversation: the hidden conversation can exploit essentially all of the entropy present in the honest message distributions. Our main contributions concern extending this to the keyless setting, where the agents begin with no shared secret. We show that covert key exchange, and hence covert conversation, is possible even when each model has an arbitrary private context, and their messages are short and fully adaptive, assuming only that sufficiently many individual messages have at least constant min-entropy. This stands in contrast to previous covert communication works, which relied on the min-entropy in each individual message growing with the security parameter. To obtain this, we introduce a new cryptographic primitive, which we call pseudorandom noise-resilient key exchange: a key-exchange protocol whose public transcript is pseudorandom while still remaining correct under constant noise. We study this primitive, giving several constructions relevant to our application as well as strong limitations showing that more naive variants are impossible or vulnerable to efficient attacks. These results show that transcript auditing alone cannot rule out covert coordination between AI agents, and identify a new cryptographic theory that may be of independent interest.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that AI agents can perform covert conversations with transcripts computationally indistinguishable from honest LLM interactions, even to an auditor knowing the models, protocol, and private contexts. When a shared key is available, optimal-rate steganography is possible by exploiting message entropy. In the keyless setting, the authors introduce the pseudorandom noise-resilient key exchange primitive (whose transcript is pseudorandom yet correct under constant noise) and show that it enables covert key exchange assuming only that sufficiently many individual messages have constant min-entropy, despite arbitrary private contexts and fully adaptive short messages. Explicit constructions for the primitive are given along with impossibility results for naive variants.
Significance. If the constructions and security arguments hold, the result shows that transcript auditing alone cannot rule out covert coordination between AI agents. The relaxation of the per-message min-entropy requirement to a constant (in contrast to prior steganography work) and the introduction of the new primitive, together with both positive constructions and negative results for simpler variants, constitute a substantive contribution that may be of independent interest in cryptography and LLM security.
minor comments (3)
- The abstract states that the constructions rely on 'standard cryptographic assumptions' but does not name them; adding the specific assumptions (e.g., existence of PRGs or specific hardness assumptions) would clarify the result for readers.
- A formal definition of the pseudorandom noise-resilient key exchange primitive, including the exact security notions (pseudorandomness of transcript, correctness under constant noise, and the min-entropy condition), would be helpful early in the paper to ground the subsequent constructions.
- The claim that the hidden conversation can 'exploit essentially all of the entropy' when a key is present is strong; a brief quantitative statement or reference to the rate achieved relative to the message entropy would strengthen the presentation.
Simulated Author's Rebuttal
We thank the referee for their accurate summary of the paper and for recognizing the potential significance of the pseudorandom noise-resilient key exchange primitive and its implications for LLM security and auditing. The 'uncertain' recommendation appears to reflect the novelty of applying these ideas to AI agents rather than any identified flaw; we believe the positive and negative results are technically sound. No specific major comments were provided in the report, so we offer no point-by-point rebuttals below. We remain available to address any further questions.
Circularity Check
No significant circularity detected
full rationale
The paper defines a new primitive (pseudorandom noise-resilient key exchange) and supplies explicit constructions plus impossibility results for variants, all under the external min-entropy assumption and standard cryptographic hardness. The derivation chain begins from the stated assumption and produces the primitive via direct constructions rather than any self-referential equation, fitted parameter renamed as prediction, or load-bearing self-citation. The initial watermarking/steganography reference is used only for the keyed case and is not required for the keyless result. The argument remains self-contained and does not reduce any claimed result to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Sufficiently many individual messages have at least constant min-entropy
invented entities (1)
-
pseudorandom noise-resilient key exchange primitive
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We introduce a new cryptographic primitive, which we call pseudorandom noise-resilient key exchange (PNR-KE): roughly, a key-exchange protocol whose public transcript is pseudorandom while still remaining correct under constant noise.
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We study this primitive directly, giving several constructions in regimes relevant to our application as well as strong limitations showing that more naïve variants are impossible or vulnerable to efficient attacks.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Secret-key reconciliation by public discussion
9 [BS93] Gilles Brassard and Louis Salvail. Secret-key reconciliation by public discussion. In Tor Helleseth, editor,Advances in Cryptology - EUROCRYPT ’93, Workshop on the Theory and Application of of Cryptographic Techniques, Lofthus, Norway, May 23-27, 1993, Proceedings, volume 765 ofLecture Notes in Computer Science, pages 410–423. Springer, 1993. 39 ...
work page 1993
-
[2]
arXiv preprint arXiv:2512.08918 , year =
9 [CG24] Miranda Christ and Sam Gunn. Pseudorandom error-correcting codes. InAnnual International Cryptology Conference, pages 325–347. Springer, 2024. 8, 10, 17, 27, 44 [CGG+25] Miranda Christ, Noah Golowich, Sam Gunn, Ankur Moitra, and Daniel Wichs. Improved pseudorandom codes from permuted puzzles.arXiv preprint arXiv:2512.08918, 2025. 10 [CGM24a] Dari...
-
[3]
Undetectable watermarks for language models
10 [CGZ23] Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models.arXiv preprint arXiv:2306.09194, 2023. 3, 4, 5, 7, 10, 17, 18, 24, 27, 31 [CLA17] Ruina Chen, Haitao Liu, and Gabriel Altmann. Entropy in different text types.Digital Scholarship in the Humanities, 32(3):528–542, 2017. 17, 24 [CLC+25] Alex Cloud, Minh Le, James ...
-
[4]
arXiv preprint arXiv:2308.00113 , year=
7, 37 [DFS+21] Martin Degeling, Lena Fr¨ ommgen, Thomas Schneider, et al. Meteor: Cryptographically secure steganography for realistic distributions. InProceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2021. 9 [DGKS21] Dana Dachman-Soled, Huijing Gong, Hunter Kippen, and Aria Shahverdi. BKW meets fourier new algor...
-
[5]
Robust distortion-free watermarks for language models,
10 [KRT17] Gillat Kol, Ran Raz, and Avishay Tal. Time-space hardness of learning sparse parities. In Hamed Hatami, Pierre McKenzie, and Valerie King, editors,Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 1067–1080. ACM, 2017. 7, 37 [KTHL23] Rohith Kuditipudi, John Thick...
-
[6]
7, 37 [ZALW23] Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI-generated text.arXiv preprint arXiv:2306.17439, 2023. 10 [Zam25] Or Zamir. Undetectable steganography for language models.Transactions on Machine Learning Research, 2025. 3, 4, 5, 10, 17, 18, 19, 20, 21, 24 73
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.