Exploring the Cryptographic Limits of Transformer Networks
Pith reviewed 2026-06-30 07:27 UTC · model grok-4.3
The pith
Transformers of given depth and width cannot implement arbitrary cryptographic functions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By generating threshold circuits for Keccak functions, Merkle-Damgård constructions and Merkle trees, then mapping the circuits to transformer architectures via no-attention and tokens-as-gates schemes, the work obtains verified scaling laws for width and depth that yield constructive upper bounds on the cryptographic functions any transformer of those dimensions can compute.
What carries the argument
The exact mapping from threshold-circuit representations of cryptographic primitives onto transformer layers that preserves functionality.
If this is right
- Transformers below the derived bounds cannot perform the listed cryptographic operations and therefore cannot use them for source-free randomness in steganography.
- Capability evaluations of transformer-based AI systems can apply these bounds to rule out certain malicious behaviors.
- The same circuit-to-transformer mapping technique can be reused to obtain bounds for additional functions beyond the three constructions studied.
- Each cryptographic construction has its own explicit width and depth scaling law that must be met for exact implementation.
Where Pith is reading between the lines
- The same circuit-mapping method could be extended to arithmetic or logical primitives to produce broader capacity limits for transformers.
- Actual trained models might approximate the functions with fewer resources than the exact constructive mappings require.
- Hardware designers could use the scaling laws to set minimum transformer sizes needed for security-related computations.
Load-bearing premise
Saturated transformers can be treated as threshold circuits whose generated circuits for the cryptographic constructions can be mapped to the proposed transformer architectures while preserving exact functionality.
What would settle it
A concrete transformer architecture whose depth and width fall below the derived scaling-law bounds yet still computes one of the three cryptographic constructions exactly.
Figures
read the original abstract
In recent work it has been shown that colluding AI agents can use steganographic methods to exchange malicious information. Whether a transformer can implement steganographic methods depends on what cryptographic functions it can implement, since a transformer that can implement a cryptographic function within its layers has source-free randomness access. Despite existing circuit-complexity results, no prior work maps specific cryptographic constructions to transformer architectures. As Merrill et al. have shown that saturated transformers can be seen as threshold circuits, we first generate threshold circuits for three different cryptographic constructions (Keccak functions, Merkle--Damgard constructions and Merkle Trees) and then map these circuits to different transformer architectures. We derive verified scaling laws for the width and depth of the circuits which implement each cryptographic construction and propose two different mappings: no-attention mapping, tokens-as-gates mapping. Beyond its security implications, this work contributes to by establishing a methodology for deriving structural guarantees on transformer computational capacity. Specifically, we derive constructive upper bounds on what a transformer of a given depth and width could plausibly compute, providing a principled foundation for capability evaluations of transformer-based AI systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper constructs threshold circuits for Keccak functions, Merkle-Damgård constructions, and Merkle Trees, maps them to two transformer architectures (no-attention mapping and tokens-as-gates mapping), derives scaling laws for width and depth, and claims this provides constructive upper bounds on the computational capacity of transformers of given depth and width, with implications for steganographic capabilities in AI agents.
Significance. The constructive approach, including explicit circuit generations and mappings to transformers, offers a concrete methodology for linking cryptographic primitives to neural architectures. If the functionality-preserving mappings hold, this could inform capability evaluations, though the interpretation as upper bounds requires clarification as the results primarily establish lower bounds on required transformer sizes.
major comments (2)
- [Abstract] Abstract: The central claim that the work derives 'constructive upper bounds on what a transformer of a given depth and width could plausibly compute' is not supported by the described constructions. Generating threshold circuits and mapping them demonstrates that transformers of sufficient size can implement these functions, providing lower bounds on capacity rather than upper bounds.
- [Mapping to transformer architectures] Mapping to transformer architectures: The paper relies on the assumption from Merrill et al. that saturated transformers can be viewed as threshold circuits, and that the generated circuits can be mapped while preserving exact functionality. This mapping step is load-bearing for the claims but the abstract provides no verification details, circuit constructions, or error analysis to assess its validity.
minor comments (1)
- The abstract mentions 'verified scaling laws' but does not specify the verification method or provide the laws explicitly in the summary.
Simulated Author's Rebuttal
We thank the referee for their careful review and for highlighting points that require clarification. We respond to each major comment below.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that the work derives 'constructive upper bounds on what a transformer of a given depth and width could plausibly compute' is not supported by the described constructions. Generating threshold circuits and mapping them demonstrates that transformers of sufficient size can implement these functions, providing lower bounds on capacity rather than upper bounds.
Authors: We agree that the constructions demonstrate sufficiency: transformers meeting the derived width and depth scaling can implement the cryptographic functions. This yields lower bounds on the minimal resources required. The abstract phrasing 'constructive upper bounds on what a transformer of a given depth and width could plausibly compute' was meant to convey that models falling below these scalings cannot implement the functions, but the wording is imprecise and risks overstating the result. We will revise the abstract to state that the work supplies constructive lower bounds on transformer size for these functions (with consequent implications for the capabilities of smaller models). revision: yes
-
Referee: [Mapping to transformer architectures] Mapping to transformer architectures: The paper relies on the assumption from Merrill et al. that saturated transformers can be viewed as threshold circuits, and that the generated circuits can be mapped while preserving exact functionality. This mapping step is load-bearing for the claims but the abstract provides no verification details, circuit constructions, or error analysis to assess its validity.
Authors: The manuscript body contains the explicit threshold-circuit constructions for Keccak, Merkle-Damgård, and Merkle trees, together with the two architecture mappings. These rely on the saturated-transformer equivalence established by Merrill et al. We will revise the abstract to reference the verification approach used and will add an appendix supplying sample circuit diagrams, the precise token-to-gate encoding, and any bounded approximation error introduced by the mapping, so that readers can directly evaluate functionality preservation. revision: partial
Circularity Check
No circularity; external citation and constructive circuit mappings are independent of target bounds
full rationale
The derivation begins from the external Merrill et al. result that saturated transformers equal threshold circuits. It then explicitly constructs threshold circuits for Keccak, Merkle-Damgård and Merkle trees, derives their width/depth scaling laws from those constructions, and maps the circuits to two transformer architectures while preserving functionality. None of these steps reduce by definition or by self-citation to the claimed upper bounds; the scaling laws are outputs of the circuit constructions, not inputs. No fitted parameters are relabeled as predictions, no ansatz is smuggled via self-citation, and no uniqueness theorem from the same authors is invoked. The directional claim (upper vs. lower bounds) is a separate correctness question and does not create circularity in the derivation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Saturated transformers can be seen as threshold circuits (Merrill et al.)
Reference graph
Works this paper leans on
-
[1]
Sumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip H. S. Torr, Lewis Hammond, and Christian Schroeder de Witt. Secret collusion among ai agents: Multi-agent deception via steganography. InAdvances in Neural Information Processing Systems, 2024
2024
-
[2]
William Merrill, Ashish Sabharwal, and Noah A. Smith. Saturated transformers are constant-depth threshold circuits. InTransactions of the Association for Computational Linguistics, 2022
2022
-
[3]
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment Collaboration. Open X-embodiment: Robotic learning datasets and RT-X models.arXiv preprint arXiv:2310.08864, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[4]
Dylan Radovic, Lucas Kruitwagen, Christian Schroeder de Witt, Ben Caldecott, Shane Tomlinson, and Mark Workman. Revealing robust oil and gas company macro-strategies using deep multi-agent reinforcement learning.arXiv preprint arXiv:2211.11043, 2022
-
[5]
Christian Cachin. An information-theoretic model for steganography.Information and Computation, 192(1):41–56, July 2004. doi: 10.1016/j.ic.2004.02.003
-
[6]
Unelicitable backdoors in language models via cryptographic transformer circuits
Andis Draguns, Andrew Gritsevskiy, Sumeet Ramesh Motwani, and Christian Schroeder de Witt. Unelicitable backdoors in language models via cryptographic transformer circuits. InAdvances in Neural Information Processing Systems, 2024
2024
-
[7]
Manning, Christopher Ré, Diana Acosta-Navas, Drew A
Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu...
-
[8]
URLhttps://arxiv.org/abs/2211.09110
work page internal anchor Pith review Pith/arXiv arXiv
-
[9]
Gomez, Łukasz Kaiser, and Illia Polosukhin
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. InAdvances in Neural Information Processing Systems (NeurIPS), volume 30, 2017. 9
2017
-
[10]
GLU variants improve transformer, 2020
Noam Shazeer. GLU variants improve transformer, 2020
2020
-
[11]
Threshold circuits of bounded depth.Journal of Computer and System Sciences, 46(2):129–154, 1993
András Hajnal, Wolfgang Maass, Pavel Pudlák, Mario Szegedy, and György Turán. Threshold circuits of bounded depth.Journal of Computer and System Sciences, 46(2):129–154, 1993. doi: 10.1016/0022-0000(93)90001-D
-
[12]
Ralph C. Merkle. One way hash functions and DES. InAdvances in Cryptology — CRYPTO ’89, volume 435 ofLecture Notes in Computer Science, pages 428–446. Springer, 1989. doi: 10.1007/0-387-34805-0_40
-
[13]
A design principle for hash functions
Ivan Bjerre Damgård. A design principle for hash functions. InAdvances in Cryptology — CRYPTO ’89, volume 435 ofLecture Notes in Computer Science, pages 416–427. Springer, 1989. doi: 10.1007/0-387-34805-0_39
-
[14]
Secrecy, Authentication, and Public Key Systems
Ralph C. Merkle. A certified digital signature. In Gilles Brassard, editor,Advances in Cryptology — CRYPTO ’89 Proceedings, volume 435 ofLecture Notes in Computer Science, pages 218–238, New York, NY , 1989. Springer. doi: 10.1007/0-387-34805-0\_21. Work originally presented in Merkle’s 1979 Stanford PhD thesis “Secrecy, Authentication, and Public Key Systems”
-
[15]
The sponge and duplex constructions.https://keccak.team/sponge_duplex.html, 2011
Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche. The sponge and duplex constructions.https://keccak.team/sponge_duplex.html, 2011
2011
-
[16]
Reifier: Compile algorithms into neural networks
Andis Draguns. Reifier: Compile algorithms into neural networks. https://github.com/ contramont/reifier
-
[17]
Part-b-project-results
Crroco. Part-b-project-results. https://github.com/Crroco/Part-B-Project-Results/ tree/main. A Keccak Operation Definitions The state a defined in Section 2.6 has length 25w where w∈ {1,2,4,8,16,32,64} , depending on the version of Keccak that is used. Capacity has lengthcand rate has lengthr. Define GF(p) for p a prime number as the Galois Field with p e...
2000
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.