Exploring the Cryptographic Limits of Transformer Networks

Andis Draguns; Christian Schroeder de Witt; Isaac Robinson; Philip Torr; Stefan Domunco

arxiv: 2606.29389 · v1 · pith:7UXUUP7Hnew · submitted 2026-06-28 · 💻 cs.CR · cs.LG

Exploring the Cryptographic Limits of Transformer Networks

Stefan Domunco , Andis Draguns , Philip Torr , Isaac Robinson , Christian Schroeder de Witt This is my paper

Pith reviewed 2026-06-30 07:27 UTC · model grok-4.3

classification 💻 cs.CR cs.LG

keywords transformerscryptographythreshold circuitsKeccakMerkle-DamgardMerkle treesscaling lawssteganography

0 comments

The pith

Transformers of given depth and width cannot implement arbitrary cryptographic functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives constructive upper bounds showing what cryptographic functions a transformer of fixed depth and width can or cannot realize. It first builds threshold circuits for Keccak, Merkle-Damgård constructions, and Merkle trees, then maps those circuits onto transformer architectures using no-attention and tokens-as-gates schemes. Scaling laws are obtained that translate circuit size directly into required transformer width and depth. A sympathetic reader would care because the bounds give a way to decide whether transformer-based agents could generate source-free randomness for steganographic or other hidden communication tasks. The approach supplies structural guarantees on capacity instead of relying solely on empirical checks.

Core claim

By generating threshold circuits for Keccak functions, Merkle-Damgård constructions and Merkle trees, then mapping the circuits to transformer architectures via no-attention and tokens-as-gates schemes, the work obtains verified scaling laws for width and depth that yield constructive upper bounds on the cryptographic functions any transformer of those dimensions can compute.

What carries the argument

The exact mapping from threshold-circuit representations of cryptographic primitives onto transformer layers that preserves functionality.

If this is right

Transformers below the derived bounds cannot perform the listed cryptographic operations and therefore cannot use them for source-free randomness in steganography.
Capability evaluations of transformer-based AI systems can apply these bounds to rule out certain malicious behaviors.
The same circuit-to-transformer mapping technique can be reused to obtain bounds for additional functions beyond the three constructions studied.
Each cryptographic construction has its own explicit width and depth scaling law that must be met for exact implementation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same circuit-mapping method could be extended to arithmetic or logical primitives to produce broader capacity limits for transformers.
Actual trained models might approximate the functions with fewer resources than the exact constructive mappings require.
Hardware designers could use the scaling laws to set minimum transformer sizes needed for security-related computations.

Load-bearing premise

Saturated transformers can be treated as threshold circuits whose generated circuits for the cryptographic constructions can be mapped to the proposed transformer architectures while preserving exact functionality.

What would settle it

A concrete transformer architecture whose depth and width fall below the derived scaling-law bounds yet still computes one of the three cryptographic constructions exactly.

Figures

Figures reproduced from arXiv: 2606.29389 by Andis Draguns, Christian Schroeder de Witt, Isaac Robinson, Philip Torr, Stefan Domunco.

**Figure 2.** Figure 2: Plot for Keccak depth scaling. Differences between predicted and actual values come from [PITH_FULL_IMAGE:figures/full_fig_p013_2.png] view at source ↗

**Figure 3.** Figure 3: Plot for Keccak width scaling each Keccak state has 50 entries, two leaves being processed in parallel gives 2 · 50 = 100 nodes. The next six layers perform the Keccak toy function f . Each of these layers has a width twice as large as its corresponding layer in the circuit representation of f . Layers 9 and 10, the first layers after fleaves has been computed, XOR the previous results and thus start compu… view at source ↗

**Figure 4.** Figure 4: Circuit compiled for MD with compression function Keccak with [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗

**Figure 5.** Figure 5: Plot for MD depth scaling E Tokens-as-Gates Correctness For the Tokens-as-Gates mapping we get the following flow of computation for the attention sublayer: qi = Qxi ki = Kxi vi = V xi si, j = q T i kj ai, j = ( 1 if si, j = 1 0 otherwise outi = ∑ j ai, j vj Looking at equations 26, 25, 27 note that K is a fixed matrix. When applied to xj = ej , it returns a vector whose i-th entry is 1 if gate j is a pred… view at source ↗

**Figure 6.** Figure 6: Plot for MD width scaling [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗

**Figure 7.** Figure 7: Circuit compiled for MT with compression function Keccak with [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗

**Figure 8.** Figure 8: Plot for MT depth scaling 1 2 4 8 16 Number of leaves nleaves 2000 4000 6000 8000 Circuit width Width vs nleaves (fixed: log2 w = 1, nrounds = 1, r = 12, c = 38, widthc = 550, leaf bits = 4) Actual Predicted: nleaves widthc [PITH_FULL_IMAGE:figures/full_fig_p017_8.png] view at source ↗

**Figure 9.** Figure 9: Plot for MT width scaling 17 [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗

read the original abstract

In recent work it has been shown that colluding AI agents can use steganographic methods to exchange malicious information. Whether a transformer can implement steganographic methods depends on what cryptographic functions it can implement, since a transformer that can implement a cryptographic function within its layers has source-free randomness access. Despite existing circuit-complexity results, no prior work maps specific cryptographic constructions to transformer architectures. As Merrill et al. have shown that saturated transformers can be seen as threshold circuits, we first generate threshold circuits for three different cryptographic constructions (Keccak functions, Merkle--Damgard constructions and Merkle Trees) and then map these circuits to different transformer architectures. We derive verified scaling laws for the width and depth of the circuits which implement each cryptographic construction and propose two different mappings: no-attention mapping, tokens-as-gates mapping. Beyond its security implications, this work contributes to by establishing a methodology for deriving structural guarantees on transformer computational capacity. Specifically, we derive constructive upper bounds on what a transformer of a given depth and width could plausibly compute, providing a principled foundation for capability evaluations of transformer-based AI systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper builds concrete threshold-circuit mappings for Keccak, Merkle-Damgård and Merkle trees into two transformer styles and gives scaling laws, but the headline claim of 'constructive upper bounds' on fixed-size transformer capacity does not follow from the constructions.

read the letter

The useful part is the explicit circuit constructions and the two mappings (no-attention and tokens-as-gates). They take the saturated-transformer-to-threshold-circuit reduction from Merrill et al. and apply it to three standard crypto primitives, then derive width and depth scaling for each. That is a legitimate incremental step beyond the prior circuit view, and the scaling laws are at least stated as verified.

The central claim does not hold up. Showing that a transformer can implement these functions once it is large enough is a lower bound on the resources required, not an upper bound on what a given depth and width can compute. An upper bound would need a proof that certain functions are impossible below a size threshold; the existence proofs here supply no such impossibility result. The functionality-preserving mapping does not fix the direction.

The abstract supplies no circuit diagrams, gate counts, or error analysis, so the concrete mappings cannot be checked from the text alone. The security motivation (steganography between colluding agents) is clear but sits on top of the same bound-direction issue.

This is for readers who want explicit examples of crypto primitives realized in transformer architectures. Someone already working on circuit characterizations of attention models will find the constructions worth looking at; the rest of the interpretation needs tightening.

I would send it to a serious referee. The constructions are specific enough to be worth checking, and the authors should be asked to correct the bound language and supply the missing verification details.

Referee Report

2 major / 1 minor

Summary. The paper constructs threshold circuits for Keccak functions, Merkle-Damgård constructions, and Merkle Trees, maps them to two transformer architectures (no-attention mapping and tokens-as-gates mapping), derives scaling laws for width and depth, and claims this provides constructive upper bounds on the computational capacity of transformers of given depth and width, with implications for steganographic capabilities in AI agents.

Significance. The constructive approach, including explicit circuit generations and mappings to transformers, offers a concrete methodology for linking cryptographic primitives to neural architectures. If the functionality-preserving mappings hold, this could inform capability evaluations, though the interpretation as upper bounds requires clarification as the results primarily establish lower bounds on required transformer sizes.

major comments (2)

[Abstract] Abstract: The central claim that the work derives 'constructive upper bounds on what a transformer of a given depth and width could plausibly compute' is not supported by the described constructions. Generating threshold circuits and mapping them demonstrates that transformers of sufficient size can implement these functions, providing lower bounds on capacity rather than upper bounds.
[Mapping to transformer architectures] Mapping to transformer architectures: The paper relies on the assumption from Merrill et al. that saturated transformers can be viewed as threshold circuits, and that the generated circuits can be mapped while preserving exact functionality. This mapping step is load-bearing for the claims but the abstract provides no verification details, circuit constructions, or error analysis to assess its validity.

minor comments (1)

The abstract mentions 'verified scaling laws' but does not specify the verification method or provide the laws explicitly in the summary.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful review and for highlighting points that require clarification. We respond to each major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim that the work derives 'constructive upper bounds on what a transformer of a given depth and width could plausibly compute' is not supported by the described constructions. Generating threshold circuits and mapping them demonstrates that transformers of sufficient size can implement these functions, providing lower bounds on capacity rather than upper bounds.

Authors: We agree that the constructions demonstrate sufficiency: transformers meeting the derived width and depth scaling can implement the cryptographic functions. This yields lower bounds on the minimal resources required. The abstract phrasing 'constructive upper bounds on what a transformer of a given depth and width could plausibly compute' was meant to convey that models falling below these scalings cannot implement the functions, but the wording is imprecise and risks overstating the result. We will revise the abstract to state that the work supplies constructive lower bounds on transformer size for these functions (with consequent implications for the capabilities of smaller models). revision: yes
Referee: [Mapping to transformer architectures] Mapping to transformer architectures: The paper relies on the assumption from Merrill et al. that saturated transformers can be viewed as threshold circuits, and that the generated circuits can be mapped while preserving exact functionality. This mapping step is load-bearing for the claims but the abstract provides no verification details, circuit constructions, or error analysis to assess its validity.

Authors: The manuscript body contains the explicit threshold-circuit constructions for Keccak, Merkle-Damgård, and Merkle trees, together with the two architecture mappings. These rely on the saturated-transformer equivalence established by Merrill et al. We will revise the abstract to reference the verification approach used and will add an appendix supplying sample circuit diagrams, the precise token-to-gate encoding, and any bounded approximation error introduced by the mapping, so that readers can directly evaluate functionality preservation. revision: partial

Circularity Check

0 steps flagged

No circularity; external citation and constructive circuit mappings are independent of target bounds

full rationale

The derivation begins from the external Merrill et al. result that saturated transformers equal threshold circuits. It then explicitly constructs threshold circuits for Keccak, Merkle-Damgård and Merkle trees, derives their width/depth scaling laws from those constructions, and maps the circuits to two transformer architectures while preserving functionality. None of these steps reduce by definition or by self-citation to the claimed upper bounds; the scaling laws are outputs of the circuit constructions, not inputs. No fitted parameters are relabeled as predictions, no ansatz is smuggled via self-citation, and no uniqueness theorem from the same authors is invoked. The directional claim (upper vs. lower bounds) is a separate correctness question and does not create circularity in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the external result that saturated transformers equal threshold circuits; no free parameters or new entities are introduced in the abstract.

axioms (1)

domain assumption Saturated transformers can be seen as threshold circuits (Merrill et al.)
Invoked to justify converting cryptographic circuits into transformer architectures.

pith-pipeline@v0.9.1-grok · 5730 in / 1131 out tokens · 48516 ms · 2026-06-30T07:27:34.083842+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

17 extracted references · 8 canonical work pages · 2 internal anchors

[1]

Sumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip H. S. Torr, Lewis Hammond, and Christian Schroeder de Witt. Secret collusion among ai agents: Multi-agent deception via steganography. InAdvances in Neural Information Processing Systems, 2024

2024
[2]

William Merrill, Ashish Sabharwal, and Noah A. Smith. Saturated transformers are constant-depth threshold circuits. InTransactions of the Association for Computational Linguistics, 2022

2022
[3]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Open X-Embodiment Collaboration. Open X-embodiment: Robotic learning datasets and RT-X models.arXiv preprint arXiv:2310.08864, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

Revealing robust oil and gas company macro-strategies using deep multi-agent reinforcement learning.arXiv preprint arXiv:2211.11043, 2022

Dylan Radovic, Lucas Kruitwagen, Christian Schroeder de Witt, Ben Caldecott, Shane Tomlinson, and Mark Workman. Revealing robust oil and gas company macro-strategies using deep multi-agent reinforcement learning.arXiv preprint arXiv:2211.11043, 2022

work page arXiv 2022
[5]

An information-theoretic model for steganography.Information and Computation, 192(1):41–56, July 2004

Christian Cachin. An information-theoretic model for steganography.Information and Computation, 192(1):41–56, July 2004. doi: 10.1016/j.ic.2004.02.003

work page doi:10.1016/j.ic.2004.02.003 2004
[6]

Unelicitable backdoors in language models via cryptographic transformer circuits

Andis Draguns, Andrew Gritsevskiy, Sumeet Ramesh Motwani, and Christian Schroeder de Witt. Unelicitable backdoors in language models via cryptographic transformer circuits. InAdvances in Neural Information Processing Systems, 2024

2024
[7]

Manning, Christopher Ré, Diana Acosta-Navas, Drew A

Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu...
[8]

URLhttps://arxiv.org/abs/2211.09110

work page internal anchor Pith review Pith/arXiv arXiv
[9]

Gomez, Łukasz Kaiser, and Illia Polosukhin

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. InAdvances in Neural Information Processing Systems (NeurIPS), volume 30, 2017. 9

2017
[10]

GLU variants improve transformer, 2020

Noam Shazeer. GLU variants improve transformer, 2020

2020
[11]

Threshold circuits of bounded depth.Journal of Computer and System Sciences, 46(2):129–154, 1993

András Hajnal, Wolfgang Maass, Pavel Pudlák, Mario Szegedy, and György Turán. Threshold circuits of bounded depth.Journal of Computer and System Sciences, 46(2):129–154, 1993. doi: 10.1016/0022-0000(93)90001-D

work page doi:10.1016/0022-0000(93)90001-d 1993
[12]

Ralph C. Merkle. One way hash functions and DES. InAdvances in Cryptology — CRYPTO ’89, volume 435 ofLecture Notes in Computer Science, pages 428–446. Springer, 1989. doi: 10.1007/0-387-34805-0_40

work page doi:10.1007/0-387-34805-0_40 1989
[13]

A design principle for hash functions

Ivan Bjerre Damgård. A design principle for hash functions. InAdvances in Cryptology — CRYPTO ’89, volume 435 ofLecture Notes in Computer Science, pages 416–427. Springer, 1989. doi: 10.1007/0-387-34805-0_39

work page doi:10.1007/0-387-34805-0_39 1989
[14]

Secrecy, Authentication, and Public Key Systems

Ralph C. Merkle. A certified digital signature. In Gilles Brassard, editor,Advances in Cryptology — CRYPTO ’89 Proceedings, volume 435 ofLecture Notes in Computer Science, pages 218–238, New York, NY , 1989. Springer. doi: 10.1007/0-387-34805-0\_21. Work originally presented in Merkle’s 1979 Stanford PhD thesis “Secrecy, Authentication, and Public Key Systems”

work page doi:10.1007/0-387-34805-0 1989
[15]

The sponge and duplex constructions.https://keccak.team/sponge_duplex.html, 2011

Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche. The sponge and duplex constructions.https://keccak.team/sponge_duplex.html, 2011

2011
[16]

Reifier: Compile algorithms into neural networks

Andis Draguns. Reifier: Compile algorithms into neural networks. https://github.com/ contramont/reifier
[17]

Part-b-project-results

Crroco. Part-b-project-results. https://github.com/Crroco/Part-B-Project-Results/ tree/main. A Keccak Operation Definitions The state a defined in Section 2.6 has length 25w where w∈ {1,2,4,8,16,32,64} , depending on the version of Keccak that is used. Capacity has lengthcand rate has lengthr. Define GF(p) for p a prime number as the Galois Field with p e...

2000

[1] [1]

Sumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip H. S. Torr, Lewis Hammond, and Christian Schroeder de Witt. Secret collusion among ai agents: Multi-agent deception via steganography. InAdvances in Neural Information Processing Systems, 2024

2024

[2] [2]

William Merrill, Ashish Sabharwal, and Noah A. Smith. Saturated transformers are constant-depth threshold circuits. InTransactions of the Association for Computational Linguistics, 2022

2022

[3] [3]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Open X-Embodiment Collaboration. Open X-embodiment: Robotic learning datasets and RT-X models.arXiv preprint arXiv:2310.08864, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[4] [4]

Revealing robust oil and gas company macro-strategies using deep multi-agent reinforcement learning.arXiv preprint arXiv:2211.11043, 2022

Dylan Radovic, Lucas Kruitwagen, Christian Schroeder de Witt, Ben Caldecott, Shane Tomlinson, and Mark Workman. Revealing robust oil and gas company macro-strategies using deep multi-agent reinforcement learning.arXiv preprint arXiv:2211.11043, 2022

work page arXiv 2022

[5] [5]

An information-theoretic model for steganography.Information and Computation, 192(1):41–56, July 2004

Christian Cachin. An information-theoretic model for steganography.Information and Computation, 192(1):41–56, July 2004. doi: 10.1016/j.ic.2004.02.003

work page doi:10.1016/j.ic.2004.02.003 2004

[6] [6]

Unelicitable backdoors in language models via cryptographic transformer circuits

Andis Draguns, Andrew Gritsevskiy, Sumeet Ramesh Motwani, and Christian Schroeder de Witt. Unelicitable backdoors in language models via cryptographic transformer circuits. InAdvances in Neural Information Processing Systems, 2024

2024

[7] [7]

Manning, Christopher Ré, Diana Acosta-Navas, Drew A

Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu...

[8] [8]

URLhttps://arxiv.org/abs/2211.09110

work page internal anchor Pith review Pith/arXiv arXiv

[9] [9]

Gomez, Łukasz Kaiser, and Illia Polosukhin

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. InAdvances in Neural Information Processing Systems (NeurIPS), volume 30, 2017. 9

2017

[10] [10]

GLU variants improve transformer, 2020

Noam Shazeer. GLU variants improve transformer, 2020

2020

[11] [11]

Threshold circuits of bounded depth.Journal of Computer and System Sciences, 46(2):129–154, 1993

András Hajnal, Wolfgang Maass, Pavel Pudlák, Mario Szegedy, and György Turán. Threshold circuits of bounded depth.Journal of Computer and System Sciences, 46(2):129–154, 1993. doi: 10.1016/0022-0000(93)90001-D

work page doi:10.1016/0022-0000(93)90001-d 1993

[12] [12]

Ralph C. Merkle. One way hash functions and DES. InAdvances in Cryptology — CRYPTO ’89, volume 435 ofLecture Notes in Computer Science, pages 428–446. Springer, 1989. doi: 10.1007/0-387-34805-0_40

work page doi:10.1007/0-387-34805-0_40 1989

[13] [13]

A design principle for hash functions

Ivan Bjerre Damgård. A design principle for hash functions. InAdvances in Cryptology — CRYPTO ’89, volume 435 ofLecture Notes in Computer Science, pages 416–427. Springer, 1989. doi: 10.1007/0-387-34805-0_39

work page doi:10.1007/0-387-34805-0_39 1989

[14] [14]

Secrecy, Authentication, and Public Key Systems

Ralph C. Merkle. A certified digital signature. In Gilles Brassard, editor,Advances in Cryptology — CRYPTO ’89 Proceedings, volume 435 ofLecture Notes in Computer Science, pages 218–238, New York, NY , 1989. Springer. doi: 10.1007/0-387-34805-0\_21. Work originally presented in Merkle’s 1979 Stanford PhD thesis “Secrecy, Authentication, and Public Key Systems”

work page doi:10.1007/0-387-34805-0 1989

[15] [15]

The sponge and duplex constructions.https://keccak.team/sponge_duplex.html, 2011

Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche. The sponge and duplex constructions.https://keccak.team/sponge_duplex.html, 2011

2011

[16] [16]

Reifier: Compile algorithms into neural networks

Andis Draguns. Reifier: Compile algorithms into neural networks. https://github.com/ contramont/reifier

[17] [17]

Part-b-project-results

Crroco. Part-b-project-results. https://github.com/Crroco/Part-B-Project-Results/ tree/main. A Keccak Operation Definitions The state a defined in Section 2.6 has length 25w where w∈ {1,2,4,8,16,32,64} , depending on the version of Keccak that is used. Capacity has lengthcand rate has lengthr. Define GF(p) for p a prime number as the Galois Field with p e...

2000