Synthesis and Verification of Transformer Programs (Technical Report)
Pith reviewed 2026-05-21 12:53 UTC · model grok-4.3
The pith
C-RASP programs that capture transformer concepts can be verified by translation to Lustre and model checking, and learned from examples by local search.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
C-RASP programs can be verified by establishing an equivalence-preserving translation to Lustre programs, which are then checked with state-of-the-art model checkers and SMT solvers, while a separate local search procedure synthesizes C-RASP programs directly from input-output examples.
What carries the argument
The semantic-preserving translation from C-RASP to Lustre programs, which reduces verification to existing model-checking infrastructure, together with a local search procedure that iteratively refines candidate programs against examples.
If this is right
- Verification of C-RASP programs becomes feasible using highly optimized SMT-based model checkers already developed for Lustre.
- Transformer program optimization can be performed by synthesizing improved C-RASP variants that still satisfy the original specification.
- Constrained learning of transformer programs is supported when only a partial specification is available as examples.
- Existing benchmarks of C-RASP programs from the literature become automatically checkable for safety and functional properties.
Where Pith is reading between the lines
- The Lustre connection might allow reuse of Lustre-specific abstractions such as clock calculus when analyzing transformer-like control flow.
- Local search could be hybridized with gradient-based methods if C-RASP programs are later embedded in differentiable settings.
- If the translation scales, the same infrastructure might verify programs that model attention mechanisms or other transformer subroutines directly.
- Demonstrations on literature benchmarks suggest the approach could extend to programs that implement specific transformer layers rather than abstract concepts.
Load-bearing premise
C-RASP programs can be translated into equivalent Lustre programs without semantic loss or prohibitive state-space explosion, and local search will locate programs that match the examples.
What would settle it
A concrete C-RASP program whose Lustre translation produces a state-space explosion that prevents the model checker from terminating, or a set of examples for which the local search algorithm consistently returns a program that fails to match the examples on unseen inputs.
read the original abstract
C-RASP is a simple programming language that was recently shown to capture concepts expressible by transformers. In this paper, we develop new algorithmic techniques for automatically verifying C-RASPs. To this end, we establish a connection to the verification of synchronous dataflow programs in Lustre, which enables us to exploit state-of-the-art model checkers utilizing highly optimized SMT-solvers. Our second contribution addresses learning a C-RASP program in the first place. To this end, we provide a new algorithm for learning a C-RASP from examples using local search. We demonstrate efficacy of our implementation for benchmarks of C-RASPs in the literature, in particular in connection to the following applications: (1) transformer program optimization, and (2) constrained learning of transformer programs (based on a partial specification).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to develop algorithmic techniques for automatically verifying C-RASP programs by translating them into equivalent Lustre synchronous dataflow programs, enabling the use of state-of-the-art SMT-based model checkers. It also introduces a local search algorithm for synthesizing C-RASP programs from examples and reports efficacy on literature benchmarks for applications including transformer program optimization and constrained learning based on partial specifications.
Significance. If the central claims hold, this work would be significant for bridging formal verification techniques with transformer modeling via C-RASP. The connection to Lustre allows reuse of optimized existing tools rather than developing new verifiers, and the local search synthesis method provides a practical approach for program learning under constraints. These could support optimization and correctness guarantees for programs capturing transformer behaviors, with potential impact in program synthesis and neural network verification.
major comments (2)
- [Verification section] Verification section (translation to Lustre): The central verification claim rests on a semantics-preserving translation from C-RASP to Lustre that enables effective model checking. However, the manuscript provides no formal proof of semantic equivalence, no complexity analysis, and no discussion of state-space bounds or abstraction for constructs like attention or position encodings. This is load-bearing, as the skeptic concern about exponential blowup in realistic cases is not addressed despite the abstract's efficacy claims on benchmarks.
- [Synthesis section] Synthesis section (local search algorithm): The new local search algorithm for learning C-RASP from examples is presented as the second contribution, but without analysis of convergence properties, escape from local optima, or detailed success metrics beyond high-level benchmark mentions, the reliability claim for constrained learning remains unsubstantiated.
minor comments (2)
- The abstract states efficacy on benchmarks but supplies no specific error metrics, success rates, or comparison baselines; adding a summary table of results would improve clarity.
- Notation for the C-RASP to Lustre mapping could benefit from a small illustrative example to aid readability.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and for acknowledging the potential significance of our work in bridging formal verification with transformer modeling via C-RASP. We address each major comment below and outline the revisions we will incorporate.
read point-by-point responses
-
Referee: [Verification section] Verification section (translation to Lustre): The central verification claim rests on a semantics-preserving translation from C-RASP to Lustre that enables effective model checking. However, the manuscript provides no formal proof of semantic equivalence, no complexity analysis, and no discussion of state-space bounds or abstraction for constructs like attention or position encodings. This is load-bearing, as the skeptic concern about exponential blowup in realistic cases is not addressed despite the abstract's efficacy claims on benchmarks.
Authors: We agree that a formal proof of semantic equivalence is necessary to substantiate the central claim. In the revised manuscript we will add a complete proof establishing that the translation preserves the semantics of C-RASP programs. We will also include a complexity analysis demonstrating that the translation is linear in the size of the input program. Regarding state-space bounds and abstractions, we will expand the verification section to discuss how attention mechanisms and position encodings are encoded in Lustre, explain the practical bounds observed in our benchmarks, and address potential exponential blowup by referencing the performance of the underlying SMT-based model checkers on the reported instances. These additions will directly respond to concerns about scalability and the efficacy claims. revision: yes
-
Referee: [Synthesis section] Synthesis section (local search algorithm): The new local search algorithm for learning C-RASP from examples is presented as the second contribution, but without analysis of convergence properties, escape from local optima, or detailed success metrics beyond high-level benchmark mentions, the reliability claim for constrained learning remains unsubstantiated.
Authors: We accept that additional analysis is required to strengthen the claims about the local search algorithm. In the revision we will add a discussion of convergence properties under the problem constraints we consider, describe the use of random perturbations and restarts to escape local optima, and report more granular success metrics from the benchmarks, including success rates, average iteration counts, and failure modes. These details will be supported by the experimental data already collected. revision: yes
Circularity Check
Independent algorithmic steps for C-RASP verification via Lustre and local-search learning; minor reliance on prior literature without reduction
full rationale
The paper's core contributions are a semantics-preserving translation from C-RASP to Lustre programs (enabling SMT-based model checking) and a new local-search algorithm for synthesizing C-RASP programs from examples. These are presented as novel algorithmic techniques that build upon but do not reduce to the prior definition of C-RASP as a transformer-capturing language. No equations or steps in the provided abstract or described claims equate a derived result to its inputs by construction, rename a fitted parameter as a prediction, or rest the central claim solely on an unverified self-citation chain. The work remains self-contained against external model-checking benchmarks and example-driven synthesis, yielding only a minor self-citation score for dependence on the originating C-RASP literature.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption C-RASP captures concepts expressible by transformers
- domain assumption C-RASP programs can be translated to Lustre while preserving semantics for model checking
Reference graph
Works this paper leans on
-
[1]
Foundations of symbolic languages for model interpretability
Marcelo Arenas, Daniel B´ aez, Pablo Barcel´ o, Jorge P´ erez, and Bernardo Subercaseaux. Foundations of symbolic languages for model interpretability. In Marc’Aurelio Ranzato, 2Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. ...
work page 2021
-
[2]
Pablo Barcel´ o, Alexander Kozachinskiy, Anthony Widjaja Lin, and Vladimir V. Podolskii. Logical languages accepted by transformer encoders with hard attention. InThe Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024
work page 2024
-
[3]
Model inter- pretability through the lens of computational complexity
Pablo Barcel´ o, Mika¨ el Monet, Jorge P´ erez, and Bernardo Subercaseaux. Model inter- pretability through the lens of computational complexity. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors,Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing ...
work page 2020
-
[4]
Christopher Brix, Stanley Bak, Taylor T. Johnson, and Haoze Wu. The fifth international verification of neural networks competition (VNN-COMP 2024): Summary and results. CoRR, abs/2412.19985, 2024
- [5]
-
[6]
Association for Computing Machinery
-
[7]
Adrien Champion, Alain Mebsout, Christoph Sticksel, and Cesare Tinelli. The Kind2 model checker. In Swarat Chaudhuri and Azadeh Farzan, editors,Computer Aided Verifi- cation, pages 510–517, Cham, 2016. Springer International Publishing
work page 2016
-
[8]
Dan Friedman, Alexander Wettig, and Danqi Chen. Learning transformer programs. In Proceedings of the 37th International Conference on Neural Information Processing Sys- tems, NIPS ’23, Red Hook, NY, USA, 2023. Curran Associates Inc
work page 2023
-
[9]
Darrall Henderson, Sheldon H. Jacobson, and Alan W. Johnson.The Theory and Practice of Simulated Annealing, pages 287–319. Springer US, Boston, MA, 2003
work page 2003
-
[10]
Lin, Oliver Markgraf, Julian Parsert, and Tony Tan
Chih-Duo Hong, Hongjian Jiang, Anthony W. Lin, Oliver Markgraf, Julian Parsert, and Tony Tan. Extracting robust register automata from neural networks over data sequences. CoRR, abs/2511.19100, 2025
-
[11]
Xiaowei Huang, Daniel Kroening, Wenjie Ruan, James Sharp, Youcheng Sun, Emese Thamo, Min Wu, and Xinping Yi. A survey of safety and trustworthiness of deep neu- ral networks: Verification, testing, adversarial attack and defence, and interpretability. Computer Science Review, 37:100270, 2020
work page 2020
-
[12]
Safety verification of deep neural networks
Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. Safety verification of deep neural networks. In Rupak Majumdar and Viktor Kuncak, editors,Computer Aided Ver- ification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I, volume 10426 ofLecture Notes in Computer Science, pages 3–29. Springer, 2017. ...
work page 2017
-
[13]
A formal framework for understanding length generalization in transformers
Xinting Huang, Andy Yang, Satwik Bhattamishra, Yash Raj Sarrof, Andreas Krebs, Hattie Zhou, Preetum Nakkiran, and Michael Hahn. A formal framework for understanding length generalization in transformers. InThe Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025
work page 2025
-
[14]
Softmax trans- formers are turing-complete.CoRR, abs/2511.20038, 2025
Hongjian Jiang, Michael Hahn, Georg Zetzsche, and Anthony Widjaja Lin. Softmax trans- formers are turing-complete.CoRR, abs/2511.20038, 2025
-
[15]
Guy Katz, Clark W. Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. Reluplex: a calculus for reasoning about deep neural networks.Formal Methods Syst. Des., 60(1):87–116, 2022
work page 2022
-
[16]
S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983
work page 1983
-
[17]
Characterizing the expressivity of fixed-precision trans- former language models
Jiaoda Li and Ryan Cotterell. Characterizing the expressivity of fixed-precision trans- former language models. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
work page 2025
-
[18]
Ash, Surbhi Goel, Akshay Krishnamurthy, and Cyril Zhang
Bingbin Liu, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, and Cyril Zhang. Ex- posing attention glitches with flip-flop language modeling. InThirty-seventh Conference on Neural Information Processing Systems, 2023
work page 2023
-
[19]
Changliu Liu, Tomer Arnon, Christopher Lazarus, Christopher A. Strong, Clark W. Bar- rett, and Mykel J. Kochenderfer. Algorithms for verifying deep neural networks.Found. Trends Optim., 4(3-4):244–404, 2021
work page 2021
-
[20]
Joao Marques-Silva and Alexey Ignatiev. No silver bullet: interpretable ml models must be explained.Frontiers in Artificial Intelligence, 6, 2023
work page 2023
-
[21]
The expressive power of transformers with chain of thought
William Merrill and Ashish Sabharwal. The expressive power of transformers with chain of thought. InThe Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024
work page 2024
- [22]
-
[23]
Weighted automata extraction from recurrent neural networks via regression on state spaces
Takamasa Okudono, Masaki Waga, Taro Sekiyama, and Ichiro Hasuo. Weighted automata extraction from recurrent neural networks via regression on state spaces. InThe Thirty- Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second In- novative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Edu...
work page 2020
-
[24]
Attention is turing-complete.J
Jorge P´ erez, Pablo Barcel´ o, and Javier Marinkovic. Attention is turing-complete.J. Mach. Learn. Res., 22:75:1–75:35, 2021
work page 2021
-
[25]
Transformer encoder satisfiability: Com- plexity and impact on formal reasoning
Marco S¨ alzer, Eric Alsmann, and Martin Lange. Transformer encoder satisfiability: Com- plexity and impact on formal reasoning. InThe Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025
work page 2025
-
[26]
What formal languages can transformers express? A survey.Trans
Lena Strobl, William Merrill, Gail Weiss, David Chiang, and Dana Angluin. What formal languages can transformers express? A survey.Trans. Assoc. Comput. Linguistics, 12:543– 561, 2024. 15 Synthesis and Verification of Transformer Programs H. Jiang M. Hague P. R¨ ummer A. W. Lin
work page 2024
-
[27]
The counting power of transformers.CoRR, abs/2505.11199, 2025
Marco S¨ alzer, Chris K¨ ocher, Alexander Kozachinskiy, Georg Zetzsche, and Anthony Wid- jaja Lin. The counting power of transformers.CoRR, abs/2505.11199, 2025
-
[28]
Gomez, Lukasz Kaiser, and Illia Polosukhin
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors,Advances in Neural Information Processing Systems 30: An- nual Conferenc...
work page 2017
-
[29]
Gail Weiss, Yoav Goldberg, and Eran Yahav. Thinking like transformers. InInternational Conference on Machine Learning, pages 11080–11090. PMLR, 2021
work page 2021
-
[30]
Gail Weiss, Yoav Goldberg, and Eran Yahav. Extracting automata from recurrent neural networks using queries and counterexamples (extended version).Mach. Learn., 113(5):2877–2919, 2024
work page 2024
-
[31]
Length generalization bounds for transformers, 2026
Andy Yang, Pascal Bergstr¨ aßer, Georg Zetzsche, David Chiang, and Anthony Lin. Length generalization bounds for transformers, 2026. Under submission
work page 2026
-
[32]
Knee-deep in c-RASP: A transformer depth hierarchy
Andy Yang, Micha¨ el Cadilhac, and David Chiang. Knee-deep in c-RASP: A transformer depth hierarchy. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
work page 2025
-
[33]
Andy Yang and David Chiang. Counting like transformers: Compiling temporal counting logic into softmax transformers.CoRR, abs/2404.04393, 2024
-
[34]
Masked hard-attention transformers recog- nize exactly the star-free languages
Andy Yang, David Chiang, and Dana Angluin. Masked hard-attention transformers recog- nize exactly the star-free languages. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors,Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Syste...
work page 2024
-
[35]
Susskind, Samy Bengio, and Preetum Nakkiran
Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Joshua M. Susskind, Samy Bengio, and Preetum Nakkiran. What algorithms can transformers learn? A study in length generalization. InThe Twelfth International Conference on Learning Represen- tations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024. 16 Synthesis and Verifica...
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.