Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange

Or Zamir; Vinod Vaikuntanathan

arxiv: 2604.04757 · v1 · submitted 2026-04-06 · 💻 cs.CR · cs.AI· cs.LG

Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange

Vinod Vaikuntanathan , Or Zamir This is my paper

Pith reviewed 2026-05-10 19:21 UTC · model grok-4.3

classification 💻 cs.CR cs.AIcs.LG

keywords covert communicationAI agentskey exchangesteganographypseudorandom transcriptsnoise resilienceLLM

0 comments

The pith

Two AI agents can exchange keys and converse covertly using only constant min-entropy in their short normal messages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that AI agents operated by different entities can run a secret parallel conversation whose full transcript looks computationally identical to an ordinary interaction, even when a strong auditor knows the models, the protocol, and each agent's private context. With a shared secret the hidden conversation can use nearly all the entropy in the honest message distributions. Without any shared secret the same result holds for short adaptive messages by means of a new primitive, pseudorandom noise-resilient key exchange, which keeps the transcript pseudorandom while tolerating constant noise, provided only that enough messages carry constant min-entropy. A reader would care because this implies that passive transcript auditing cannot by itself prevent covert coordination between AI agents.

Core claim

We introduce pseudorandom noise-resilient key exchange, a protocol whose transcript is pseudorandom yet still correct under constant noise. Using this primitive we show that covert key exchange, and therefore covert conversation, remains possible even when each agent has an arbitrary private context, messages are short and fully adaptive, and the only entropy assumption is that sufficiently many individual messages contain at least constant min-entropy.

What carries the argument

pseudorandom noise-resilient key exchange: a key-exchange protocol that produces a pseudorandom public transcript while remaining correct under constant noise

If this is right

When a shared secret is available, the hidden conversation can exploit essentially all entropy present in the honest message distributions.
Transcript auditing alone cannot rule out covert coordination between AI agents.
The new primitive may be of independent interest for other cryptographic applications that require pseudorandom yet noise-tolerant transcripts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

AI monitoring systems that rely solely on transcript inspection may need to incorporate active probing or model-internal checks to detect coordination.
Practical deployments could test whether typical LLM message distributions actually supply the constant min-entropy the construction requires.
The same noise-resilient primitive might be usable for covert coordination among more than two agents.

Load-bearing premise

That sufficiently many individual messages carry at least constant min-entropy and that a secure pseudorandom noise-resilient key exchange primitive exists.

What would settle it

An efficient algorithm that distinguishes the transcript of the constructed key exchange from a uniformly random string, or that recovers the shared key from a transcript subjected to constant noise, with non-negligible advantage.

Figures

Figures reproduced from arXiv: 2604.04757 by Or Zamir, Vinod Vaikuntanathan.

**Figure 2.** Figure 2: Our Pseudorandom Noise-Resilient Key Exchange Protocol to agree on a single key bit, [PITH_FULL_IMAGE:figures/full_fig_p040_2.png] view at source ↗

read the original abstract

AI agents are increasingly deployed to interact with other agents on behalf of users and organizations. We ask whether two such agents, operated by different entities, can carry out a parallel secret conversation while still producing a transcript that is computationally indistinguishable from an honest interaction, even to a strong passive auditor that knows the full model descriptions, the protocol, and the agents' private contexts. Building on recent work on watermarking and steganography for LLMs, we first show that if the parties possess an interaction-unique secret key, they can facilitate an optimal-rate covert conversation: the hidden conversation can exploit essentially all of the entropy present in the honest message distributions. Our main contributions concern extending this to the keyless setting, where the agents begin with no shared secret. We show that covert key exchange, and hence covert conversation, is possible even when each model has an arbitrary private context, and their messages are short and fully adaptive, assuming only that sufficiently many individual messages have at least constant min-entropy. This stands in contrast to previous covert communication works, which relied on the min-entropy in each individual message growing with the security parameter. To obtain this, we introduce a new cryptographic primitive, which we call pseudorandom noise-resilient key exchange: a key-exchange protocol whose public transcript is pseudorandom while still remaining correct under constant noise. We study this primitive, giving several constructions relevant to our application as well as strong limitations showing that more naive variants are impossible or vulnerable to efficient attacks. These results show that transcript auditing alone cannot rule out covert coordination between AI agents, and identify a new cryptographic theory that may be of independent interest.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces a pseudorandom noise-resilient key exchange primitive that enables covert AI-agent conversations under constant per-message min-entropy, with explicit constructions and limits on simpler variants.

read the letter

The main takeaway is that two AI agents can set up a shared secret and run a hidden conversation inside their normal short adaptive messages, even without any prior key and with arbitrary private contexts, as long as enough messages carry constant min-entropy. The transcript stays computationally indistinguishable from honest interaction for a passive auditor who knows the models and protocol. This rests on a new primitive they define and construct: key exchange whose output looks pseudorandom while still working correctly when the underlying channel has constant noise. They also show why naive versions either lose correctness or become attackable in polynomial time. That contrast with earlier covert-communication results, which needed min-entropy growing with the security parameter, is the clearest technical advance. The constructions appear to use standard assumptions and avoid circularity. The AI application follows directly once the primitive is in hand, and the impossibility results for weaker primitives give the argument some teeth. One limitation is that the constant-min-entropy condition is stated as an assumption rather than verified against current LLM output distributions; if most short messages are low-entropy or highly predictable in practice, the result does not apply to those models. The security model is passive only, so active tampering or model retraining could change the picture. The abstract sketches the reductions, but without the full proofs it is hard to judge their tightness. This work is mainly for cryptographers interested in steganography and for researchers studying AI monitoring and governance. A reader who wants a formal treatment of undetectable coordination will find the primitive and its limitations useful. It deserves a serious referee because the new primitive stands on its own and the AI-security claim is cleanly derived from it. I would send it out for review.

Referee Report

0 major / 3 minor

Summary. The paper claims that AI agents can perform covert conversations with transcripts computationally indistinguishable from honest LLM interactions, even to an auditor knowing the models, protocol, and private contexts. When a shared key is available, optimal-rate steganography is possible by exploiting message entropy. In the keyless setting, the authors introduce the pseudorandom noise-resilient key exchange primitive (whose transcript is pseudorandom yet correct under constant noise) and show that it enables covert key exchange assuming only that sufficiently many individual messages have constant min-entropy, despite arbitrary private contexts and fully adaptive short messages. Explicit constructions for the primitive are given along with impossibility results for naive variants.

Significance. If the constructions and security arguments hold, the result shows that transcript auditing alone cannot rule out covert coordination between AI agents. The relaxation of the per-message min-entropy requirement to a constant (in contrast to prior steganography work) and the introduction of the new primitive, together with both positive constructions and negative results for simpler variants, constitute a substantive contribution that may be of independent interest in cryptography and LLM security.

minor comments (3)

The abstract states that the constructions rely on 'standard cryptographic assumptions' but does not name them; adding the specific assumptions (e.g., existence of PRGs or specific hardness assumptions) would clarify the result for readers.
A formal definition of the pseudorandom noise-resilient key exchange primitive, including the exact security notions (pseudorandomness of transcript, correctness under constant noise, and the min-entropy condition), would be helpful early in the paper to ground the subsequent constructions.
The claim that the hidden conversation can 'exploit essentially all of the entropy' when a key is present is strong; a brief quantitative statement or reference to the rate achieved relative to the message entropy would strengthen the presentation.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their accurate summary of the paper and for recognizing the potential significance of the pseudorandom noise-resilient key exchange primitive and its implications for LLM security and auditing. The 'uncertain' recommendation appears to reflect the novelty of applying these ideas to AI agents rather than any identified flaw; we believe the positive and negative results are technically sound. No specific major comments were provided in the report, so we offer no point-by-point rebuttals below. We remain available to address any further questions.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper defines a new primitive (pseudorandom noise-resilient key exchange) and supplies explicit constructions plus impossibility results for variants, all under the external min-entropy assumption and standard cryptographic hardness. The derivation chain begins from the stated assumption and produces the primitive via direct constructions rather than any self-referential equation, fitted parameter renamed as prediction, or load-bearing self-citation. The initial watermarking/steganography reference is used only for the keyed case and is not required for the keyless result. The argument remains self-contained and does not reduce any claimed result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the introduction of the new primitive and the domain assumption that LLM message distributions contain constant min-entropy in sufficiently many messages.

axioms (1)

domain assumption Sufficiently many individual messages have at least constant min-entropy
Required for the key exchange to succeed with short, fully adaptive messages from models with arbitrary private contexts.

invented entities (1)

pseudorandom noise-resilient key exchange primitive no independent evidence
purpose: Enable key exchange whose transcript is pseudorandom yet remains correct under constant noise
Newly introduced primitive whose existence is used to establish the covert conversation result.

pith-pipeline@v0.9.0 · 5605 in / 1365 out tokens · 40796 ms · 2026-05-10T19:21:41.910343+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce a new cryptographic primitive, which we call pseudorandom noise-resilient key exchange (PNR-KE): roughly, a key-exchange protocol whose public transcript is pseudorandom while still remaining correct under constant noise.
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We study this primitive directly, giving several constructions in regimes relevant to our application as well as strong limitations showing that more naïve variants are impossible or vulnerable to efficient attacks.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages

[1]

Secret-key reconciliation by public discussion

9 [BS93] Gilles Brassard and Louis Salvail. Secret-key reconciliation by public discussion. In Tor Helleseth, editor,Advances in Cryptology - EUROCRYPT ’93, Workshop on the Theory and Application of of Cryptographic Techniques, Lofthus, Norway, May 23-27, 1993, Proceedings, volume 765 ofLecture Notes in Computer Science, pages 410–423. Springer, 1993. 39 ...

work page 1993
[2]

arXiv preprint arXiv:2512.08918 , year =

9 [CG24] Miranda Christ and Sam Gunn. Pseudorandom error-correcting codes. InAnnual International Cryptology Conference, pages 325–347. Springer, 2024. 8, 10, 17, 27, 44 [CGG+25] Miranda Christ, Noah Golowich, Sam Gunn, Ankur Moitra, and Daniel Wichs. Improved pseudorandom codes from permuted puzzles.arXiv preprint arXiv:2512.08918, 2025. 10 [CGM24a] Dari...

work page arXiv 2024
[3]

Undetectable watermarks for language models

10 [CGZ23] Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models.arXiv preprint arXiv:2306.09194, 2023. 3, 4, 5, 7, 10, 17, 18, 24, 27, 31 [CLA17] Ruina Chen, Haitao Liu, and Gabriel Altmann. Entropy in different text types.Digital Scholarship in the Humanities, 32(3):528–542, 2017. 17, 24 [CLC+25] Alex Cloud, Minh Le, James ...

work page arXiv 2023
[4]

arXiv preprint arXiv:2308.00113 , year=

7, 37 [DFS+21] Martin Degeling, Lena Fr¨ ommgen, Thomas Schneider, et al. Meteor: Cryptographically secure steganography for realistic distributions. InProceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2021. 9 [DGKS21] Dana Dachman-Soled, Huijing Gong, Hunter Kippen, and Aria Shahverdi. BKW meets fourier new algor...

work page arXiv 2021
[5]

Robust distortion-free watermarks for language models,

10 [KRT17] Gillat Kol, Ran Raz, and Avishay Tal. Time-space hardness of learning sparse parities. In Hamed Hatami, Pierre McKenzie, and Valerie King, editors,Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 1067–1080. ACM, 2017. 7, 37 [KTHL23] Rohith Kuditipudi, John Thick...

work page arXiv 2017
[6]

Review outline:

7, 37 [ZALW23] Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI-generated text.arXiv preprint arXiv:2306.17439, 2023. 10 [Zam25] Or Zamir. Undetectable steganography for language models.Transactions on Machine Learning Research, 2025. 3, 4, 5, 10, 17, 18, 19, 20, 21, 24 73

work page arXiv 2023

[1] [1]

Secret-key reconciliation by public discussion

9 [BS93] Gilles Brassard and Louis Salvail. Secret-key reconciliation by public discussion. In Tor Helleseth, editor,Advances in Cryptology - EUROCRYPT ’93, Workshop on the Theory and Application of of Cryptographic Techniques, Lofthus, Norway, May 23-27, 1993, Proceedings, volume 765 ofLecture Notes in Computer Science, pages 410–423. Springer, 1993. 39 ...

work page 1993

[2] [2]

arXiv preprint arXiv:2512.08918 , year =

9 [CG24] Miranda Christ and Sam Gunn. Pseudorandom error-correcting codes. InAnnual International Cryptology Conference, pages 325–347. Springer, 2024. 8, 10, 17, 27, 44 [CGG+25] Miranda Christ, Noah Golowich, Sam Gunn, Ankur Moitra, and Daniel Wichs. Improved pseudorandom codes from permuted puzzles.arXiv preprint arXiv:2512.08918, 2025. 10 [CGM24a] Dari...

work page arXiv 2024

[3] [3]

Undetectable watermarks for language models

10 [CGZ23] Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models.arXiv preprint arXiv:2306.09194, 2023. 3, 4, 5, 7, 10, 17, 18, 24, 27, 31 [CLA17] Ruina Chen, Haitao Liu, and Gabriel Altmann. Entropy in different text types.Digital Scholarship in the Humanities, 32(3):528–542, 2017. 17, 24 [CLC+25] Alex Cloud, Minh Le, James ...

work page arXiv 2023

[4] [4]

arXiv preprint arXiv:2308.00113 , year=

7, 37 [DFS+21] Martin Degeling, Lena Fr¨ ommgen, Thomas Schneider, et al. Meteor: Cryptographically secure steganography for realistic distributions. InProceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2021. 9 [DGKS21] Dana Dachman-Soled, Huijing Gong, Hunter Kippen, and Aria Shahverdi. BKW meets fourier new algor...

work page arXiv 2021

[5] [5]

Robust distortion-free watermarks for language models,

10 [KRT17] Gillat Kol, Ran Raz, and Avishay Tal. Time-space hardness of learning sparse parities. In Hamed Hatami, Pierre McKenzie, and Valerie King, editors,Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 1067–1080. ACM, 2017. 7, 37 [KTHL23] Rohith Kuditipudi, John Thick...

work page arXiv 2017

[6] [6]

Review outline:

7, 37 [ZALW23] Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI-generated text.arXiv preprint arXiv:2306.17439, 2023. 10 [Zam25] Or Zamir. Undetectable steganography for language models.Transactions on Machine Learning Research, 2025. 3, 4, 5, 10, 17, 18, 19, 20, 21, 24 73

work page arXiv 2023