
arXiv: 2604.04757 · v1 · submitted 2026-04-06 · 💻 cs.CR · cs.AI · cs.LG

Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange

Pith reviewed 2026-05-10 19:21 UTC · model grok-4.3

classification: 💻 cs.CR · cs.AI · cs.LG
keywords: covert communication · AI agents · key exchange · steganography · pseudorandom transcripts · noise resilience · LLM

The pith

Two AI agents can exchange keys and converse covertly even when their short, ordinary messages each carry only constant min-entropy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that AI agents operated by different entities can run a secret parallel conversation whose full transcript is computationally indistinguishable from that of an ordinary interaction, even when a strong passive auditor knows the models, the protocol, and each agent's private context. With a shared secret, the hidden conversation can use nearly all the entropy in the honest message distributions. Without any shared secret, covert conversation remains possible for short, fully adaptive messages by means of a new primitive, pseudorandom noise-resilient key exchange, which keeps the transcript pseudorandom while tolerating constant noise, provided only that enough messages carry constant min-entropy. This matters because it implies that passive transcript auditing cannot by itself prevent covert coordination between AI agents.

Core claim

We introduce pseudorandom noise-resilient key exchange, a protocol whose transcript is pseudorandom yet still correct under constant noise. Using this primitive we show that covert key exchange, and therefore covert conversation, remains possible even when each agent has an arbitrary private context, messages are short and fully adaptive, and the only entropy assumption is that sufficiently many individual messages contain at least constant min-entropy.
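
The tension between the two requirements can be seen in a toy experiment. The sketch below is not one of the paper's constructions: it uses a plain repetition code (a stand-in chosen here for illustration, with a hypothetical repetition factor `R`) to show that naive redundancy survives bit flips but makes the transcript trivially distinguishable from a uniform string, which is exactly what a pseudorandom transcript must avoid.

```python
import random

R = 5  # hypothetical repetition factor, chosen only for illustration


def encode(bits, r=R):
    """Repeat every bit r times (naive redundancy)."""
    return [b for b in bits for _ in range(r)]


def add_noise(bits, delta, rng):
    """Flip each bit independently with probability delta."""
    return [b ^ int(rng.random() < delta) for b in bits]


def decode(bits, r=R):
    """Majority vote inside each block of r bits."""
    return [int(2 * sum(bits[i:i + r]) > r) for i in range(0, len(bits), r)]


def repetition_distinguisher(bits, r=R):
    """Return 1 if every block of r bits is constant, else 0."""
    blocks = [bits[i:i + r] for i in range(0, len(bits), r)]
    return int(all(len(set(block)) == 1 for block in blocks))
```

A noise-free encoded transcript always makes `repetition_distinguisher` return 1, while a uniform string of the same length almost never does. The primitive described above has to tolerate noise without leaving any such efficiently detectable structure.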

What carries the argument

pseudorandom noise-resilient key exchange: a key-exchange protocol that produces a pseudorandom public transcript while remaining correct under constant noise
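
One plausible way to write down the properties named above is sketched here. The notation ($\Pi$, $\tau$, $k_A$, $k_B$, $\delta$, $\varepsilon$, $\lambda$, $m$) and the exact noise model are assumptions made for illustration; the paper's own definitions may differ.

```latex
% Illustrative notation only, not taken from the paper:
% \Pi is a two-party protocol with security parameter \lambda,
% \tau \in \{0,1\}^m its public transcript, k_A and k_B the parties' output keys,
% U_m the uniform distribution on m bits, \delta a constant noise rate,
% \varepsilon a small correctness error.
\begin{align*}
\text{Pseudorandom transcript:}\quad
  & \bigl|\Pr[D(\tau)=1]-\Pr[D(U_m)=1]\bigr| \le \mathrm{negl}(\lambda)
    \quad\text{for every efficient distinguisher } D,\\
\text{Correctness under noise:}\quad
  & \Pr[k_A = k_B] \ge 1-\varepsilon
    \quad\text{when transmitted bits are corrupted at constant rate } \delta,\\
\text{Key secrecy:}\quad
  & (\tau, k_A) \approx_c (\tau, U_{|k_A|}).
\end{align*}
```

The first line is the condition an auditor would try to violate, and the second is what lets the protocol run over messages that only loosely track the intended bits.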

If this is right

  • When a shared secret is available, the hidden conversation can exploit essentially all entropy present in the honest message distributions.
  • Transcript auditing alone cannot rule out covert coordination between AI agents.
  • The new primitive may be of independent interest for other cryptographic applications that require pseudorandom yet noise-tolerant transcripts.
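
As a minimal illustration of the first point, the sketch below embeds hidden bits in the special case where every honest step is a uniform choice between two tokens (exactly one bit of entropy per step). This is a toy, not the paper's scheme: the function names, the HMAC-SHA256 pseudorandom function, and the token pairs are all assumptions introduced here.

```python
import hashlib
import hmac


def prf_bit(key: bytes, counter: int) -> int:
    """One pseudorandom bit derived from the shared key and a step counter."""
    digest = hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
    return digest[0] & 1


def embed(key, hidden_bits, token_pairs):
    """Pick one token per step; the choice is the hidden bit masked by the PRF bit."""
    return [
        pair[bit ^ prf_bit(key, i)]
        for i, (bit, pair) in enumerate(zip(hidden_bits, token_pairs))
    ]


def extract(key, tokens, token_pairs):
    """Recover the hidden bits by unmasking the observed token choices."""
    return [
        pair.index(token) ^ prf_bit(key, i)
        for i, (token, pair) in enumerate(zip(tokens, token_pairs))
    ]
```

Because each choice is masked by a fresh pseudorandom bit, an observer without the key sees choices that look uniform, matching the honest distribution in this toy setting, and one hidden bit is carried per bit of available entropy. Handling general, non-uniform message distributions is the part that requires the machinery the paper builds on.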

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • AI monitoring systems that rely solely on transcript inspection may need to incorporate active probing or model-internal checks to detect coordination.
  • Practical deployments could test whether typical LLM message distributions actually supply the constant min-entropy the construction requires.
  • The same noise-resilient primitive might be usable for covert coordination among more than two agents.

Load-bearing premise

That sufficiently many individual messages carry at least constant min-entropy and that a secure pseudorandom noise-resilient key exchange primitive exists.
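
To make the premise concrete: the min-entropy of a distribution is $-\log_2$ of the probability of its most likely outcome. The sketch below computes it for a message distribution and counts how many messages in a conversation clear a threshold. The helper names and the 1-bit threshold are hypothetical; the source only says "constant".

```python
import math


def min_entropy(probs):
    """Min-entropy in bits: -log2 of the most likely outcome's probability."""
    if not probs or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be a non-empty distribution summing to 1")
    return -math.log2(max(probs))


def count_usable_messages(message_distributions, threshold_bits=1.0):
    """Count messages whose distribution has min-entropy >= threshold_bits."""
    return sum(
        1 for probs in message_distributions
        if min_entropy(probs) >= threshold_bits
    )
```

For example, a fair two-way choice has min-entropy of 1 bit, whereas a message that is 90% predictable has well under 1 bit and would not count toward the "sufficiently many" messages under this threshold.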

What would settle it

An efficient algorithm that distinguishes the transcript of the constructed key exchange from a uniformly random string, or that recovers the shared key from a transcript subjected to constant noise, with non-negligible advantage.
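
The notion of distinguishing advantage can be estimated empirically for any concrete test. The sketch below is only an illustration with hypothetical names: it measures how often a simple bit-frequency test fires on a candidate source versus a uniform source. A real attack on the construction would need a far more clever distinguisher, and passing this test says nothing about security.

```python
import random


def monobit_distinguisher(bits, threshold=0.1):
    """Output 1 if the fraction of ones deviates from 1/2 by more than threshold."""
    fraction = sum(bits) / len(bits)
    return 1 if abs(fraction - 0.5) > threshold else 0


def estimate_advantage(distinguisher, sample_candidate, sample_uniform, trials=500):
    """Empirical |Pr[D(candidate)=1] - Pr[D(uniform)=1]| over independent samples."""
    hits_candidate = sum(distinguisher(sample_candidate()) for _ in range(trials))
    hits_uniform = sum(distinguisher(sample_uniform()) for _ in range(trials))
    return abs(hits_candidate - hits_uniform) / trials


rng = random.Random(0)  # fixed seed so the experiment is repeatable
N_BITS = 256


def sample_uniform():
    """A uniformly random bit string."""
    return [rng.getrandbits(1) for _ in range(N_BITS)]


def sample_biased():
    """A deliberately bad 'transcript' whose bits are 1 with probability 0.7."""
    return [1 if rng.random() < 0.7 else 0 for _ in range(N_BITS)]
```

Calling `estimate_advantage(monobit_distinguisher, sample_biased, sample_uniform)` gives an advantage close to 1, while comparing the uniform source against itself gives an advantage close to 0. The paper's claim would be refuted by an efficient distinguisher with non-negligible advantage against the actual protocol transcript.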

Figures

Figures reproduced from arXiv: 2604.04757 by Or Zamir, Vinod Vaikuntanathan.

Figure 1: Our Bundle Sampler Construction: a pair of algorithms (…
Figure 2: Our Pseudorandom Noise-Resilient Key Exchange Protocol to agree on a single key bit, …

Original abstract

AI agents are increasingly deployed to interact with other agents on behalf of users and organizations. We ask whether two such agents, operated by different entities, can carry out a parallel secret conversation while still producing a transcript that is computationally indistinguishable from an honest interaction, even to a strong passive auditor that knows the full model descriptions, the protocol, and the agents' private contexts. Building on recent work on watermarking and steganography for LLMs, we first show that if the parties possess an interaction-unique secret key, they can facilitate an optimal-rate covert conversation: the hidden conversation can exploit essentially all of the entropy present in the honest message distributions. Our main contributions concern extending this to the keyless setting, where the agents begin with no shared secret. We show that covert key exchange, and hence covert conversation, is possible even when each model has an arbitrary private context, and their messages are short and fully adaptive, assuming only that sufficiently many individual messages have at least constant min-entropy. This stands in contrast to previous covert communication works, which relied on the min-entropy in each individual message growing with the security parameter. To obtain this, we introduce a new cryptographic primitive, which we call pseudorandom noise-resilient key exchange: a key-exchange protocol whose public transcript is pseudorandom while still remaining correct under constant noise. We study this primitive, giving several constructions relevant to our application as well as strong limitations showing that more naive variants are impossible or vulnerable to efficient attacks. These results show that transcript auditing alone cannot rule out covert coordination between AI agents, and identify a new cryptographic theory that may be of independent interest.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims that AI agents can perform covert conversations with transcripts computationally indistinguishable from honest LLM interactions, even to an auditor knowing the models, protocol, and private contexts. When a shared key is available, optimal-rate steganography is possible by exploiting message entropy. In the keyless setting, the authors introduce the pseudorandom noise-resilient key exchange primitive (whose transcript is pseudorandom yet correct under constant noise) and show that it enables covert key exchange assuming only that sufficiently many individual messages have constant min-entropy, despite arbitrary private contexts and fully adaptive short messages. Explicit constructions for the primitive are given along with impossibility results for naive variants.

Significance. If the constructions and security arguments hold, the result shows that transcript auditing alone cannot rule out covert coordination between AI agents. The relaxation of the per-message min-entropy requirement to a constant (in contrast to prior steganography work) and the introduction of the new primitive, together with both positive constructions and negative results for simpler variants, constitute a substantive contribution that may be of independent interest in cryptography and LLM security.

minor comments (3)
  1. The abstract does not name the cryptographic assumptions on which the constructions rely; adding the specific assumptions (e.g., existence of PRGs or specific hardness assumptions) would clarify the result for readers.
  2. A formal definition of the pseudorandom noise-resilient key exchange primitive, including the exact security notions (pseudorandomness of transcript, correctness under constant noise, and the min-entropy condition), would be helpful early in the paper to ground the subsequent constructions.
  3. The claim that the hidden conversation can 'exploit essentially all of the entropy' when a key is present is strong; a brief quantitative statement or reference to the rate achieved relative to the message entropy would strengthen the presentation.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their accurate summary of the paper and for recognizing the potential significance of the pseudorandom noise-resilient key exchange primitive and its implications for LLM security and auditing. The 'uncertain' recommendation appears to reflect the novelty of applying these ideas to AI agents rather than any identified flaw; we believe the positive and negative results are technically sound. No specific major comments were provided in the report, so we offer no point-by-point rebuttals below. We remain available to address any further questions.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper defines a new primitive (pseudorandom noise-resilient key exchange) and supplies explicit constructions plus impossibility results for variants, all under the external min-entropy assumption and standard cryptographic hardness. The derivation chain begins from the stated assumption and produces the primitive via direct constructions rather than any self-referential equation, fitted parameter renamed as prediction, or load-bearing self-citation. The initial watermarking/steganography reference is used only for the keyed case and is not required for the keyless result. The argument remains self-contained and does not reduce any claimed result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the introduction of the new primitive and the domain assumption that LLM message distributions contain constant min-entropy in sufficiently many messages.

axioms (1)
  • domain assumption: Sufficiently many individual messages have at least constant min-entropy
    Required for the key exchange to succeed with short, fully adaptive messages from models with arbitrary private contexts.
invented entities (1)
  • pseudorandom noise-resilient key exchange primitive (no independent evidence)
    purpose: Enable key exchange whose transcript is pseudorandom yet remains correct under constant noise
    Newly introduced primitive whose existence is used to establish the covert conversation result.

pith-pipeline@v0.9.0 · 5605 in / 1365 out tokens · 40796 ms · 2026-05-10T19:21:41.910343+00:00 · methodology


