pith. sign in

arxiv: 2602.16473 · v2 · pith:XTVICAYWnew · submitted 2026-02-18 · 💻 cs.LG · cs.FL· cs.LO

Synthesis and Verification of Transformer Programs (Technical Report)

Pith reviewed 2026-05-21 12:53 UTC · model grok-4.3

classification 💻 cs.LG cs.FLcs.LO
keywords C-RASPtransformer programsprogram verificationLustremodel checkinglocal searchprogram synthesisSMT solvers
0
0 comments X

The pith

C-RASP programs that capture transformer concepts can be verified by translation to Lustre and model checking, and learned from examples by local search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops techniques to automatically verify C-RASP programs, a simple language shown to express transformer concepts, by connecting them to the verification of synchronous dataflow programs in Lustre. This link allows use of existing model checkers that rely on optimized SMT solvers. It also presents a local search algorithm that learns a C-RASP program matching given examples. These methods are demonstrated on literature benchmarks for tasks including optimization of transformer programs and learning under partial specifications.

Core claim

C-RASP programs can be verified by establishing an equivalence-preserving translation to Lustre programs, which are then checked with state-of-the-art model checkers and SMT solvers, while a separate local search procedure synthesizes C-RASP programs directly from input-output examples.

What carries the argument

The semantic-preserving translation from C-RASP to Lustre programs, which reduces verification to existing model-checking infrastructure, together with a local search procedure that iteratively refines candidate programs against examples.

If this is right

  • Verification of C-RASP programs becomes feasible using highly optimized SMT-based model checkers already developed for Lustre.
  • Transformer program optimization can be performed by synthesizing improved C-RASP variants that still satisfy the original specification.
  • Constrained learning of transformer programs is supported when only a partial specification is available as examples.
  • Existing benchmarks of C-RASP programs from the literature become automatically checkable for safety and functional properties.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The Lustre connection might allow reuse of Lustre-specific abstractions such as clock calculus when analyzing transformer-like control flow.
  • Local search could be hybridized with gradient-based methods if C-RASP programs are later embedded in differentiable settings.
  • If the translation scales, the same infrastructure might verify programs that model attention mechanisms or other transformer subroutines directly.
  • Demonstrations on literature benchmarks suggest the approach could extend to programs that implement specific transformer layers rather than abstract concepts.

Load-bearing premise

C-RASP programs can be translated into equivalent Lustre programs without semantic loss or prohibitive state-space explosion, and local search will locate programs that match the examples.

What would settle it

A concrete C-RASP program whose Lustre translation produces a state-space explosion that prevents the model checker from terminating, or a set of examples for which the local search algorithm consistently returns a program that fails to match the examples on unseen inputs.

read the original abstract

C-RASP is a simple programming language that was recently shown to capture concepts expressible by transformers. In this paper, we develop new algorithmic techniques for automatically verifying C-RASPs. To this end, we establish a connection to the verification of synchronous dataflow programs in Lustre, which enables us to exploit state-of-the-art model checkers utilizing highly optimized SMT-solvers. Our second contribution addresses learning a C-RASP program in the first place. To this end, we provide a new algorithm for learning a C-RASP from examples using local search. We demonstrate efficacy of our implementation for benchmarks of C-RASPs in the literature, in particular in connection to the following applications: (1) transformer program optimization, and (2) constrained learning of transformer programs (based on a partial specification).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to develop algorithmic techniques for automatically verifying C-RASP programs by translating them into equivalent Lustre synchronous dataflow programs, enabling the use of state-of-the-art SMT-based model checkers. It also introduces a local search algorithm for synthesizing C-RASP programs from examples and reports efficacy on literature benchmarks for applications including transformer program optimization and constrained learning based on partial specifications.

Significance. If the central claims hold, this work would be significant for bridging formal verification techniques with transformer modeling via C-RASP. The connection to Lustre allows reuse of optimized existing tools rather than developing new verifiers, and the local search synthesis method provides a practical approach for program learning under constraints. These could support optimization and correctness guarantees for programs capturing transformer behaviors, with potential impact in program synthesis and neural network verification.

major comments (2)
  1. [Verification section] Verification section (translation to Lustre): The central verification claim rests on a semantics-preserving translation from C-RASP to Lustre that enables effective model checking. However, the manuscript provides no formal proof of semantic equivalence, no complexity analysis, and no discussion of state-space bounds or abstraction for constructs like attention or position encodings. This is load-bearing, as the skeptic concern about exponential blowup in realistic cases is not addressed despite the abstract's efficacy claims on benchmarks.
  2. [Synthesis section] Synthesis section (local search algorithm): The new local search algorithm for learning C-RASP from examples is presented as the second contribution, but without analysis of convergence properties, escape from local optima, or detailed success metrics beyond high-level benchmark mentions, the reliability claim for constrained learning remains unsubstantiated.
minor comments (2)
  1. The abstract states efficacy on benchmarks but supplies no specific error metrics, success rates, or comparison baselines; adding a summary table of results would improve clarity.
  2. Notation for the C-RASP to Lustre mapping could benefit from a small illustrative example to aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and for acknowledging the potential significance of our work in bridging formal verification with transformer modeling via C-RASP. We address each major comment below and outline the revisions we will incorporate.

read point-by-point responses
  1. Referee: [Verification section] Verification section (translation to Lustre): The central verification claim rests on a semantics-preserving translation from C-RASP to Lustre that enables effective model checking. However, the manuscript provides no formal proof of semantic equivalence, no complexity analysis, and no discussion of state-space bounds or abstraction for constructs like attention or position encodings. This is load-bearing, as the skeptic concern about exponential blowup in realistic cases is not addressed despite the abstract's efficacy claims on benchmarks.

    Authors: We agree that a formal proof of semantic equivalence is necessary to substantiate the central claim. In the revised manuscript we will add a complete proof establishing that the translation preserves the semantics of C-RASP programs. We will also include a complexity analysis demonstrating that the translation is linear in the size of the input program. Regarding state-space bounds and abstractions, we will expand the verification section to discuss how attention mechanisms and position encodings are encoded in Lustre, explain the practical bounds observed in our benchmarks, and address potential exponential blowup by referencing the performance of the underlying SMT-based model checkers on the reported instances. These additions will directly respond to concerns about scalability and the efficacy claims. revision: yes

  2. Referee: [Synthesis section] Synthesis section (local search algorithm): The new local search algorithm for learning C-RASP from examples is presented as the second contribution, but without analysis of convergence properties, escape from local optima, or detailed success metrics beyond high-level benchmark mentions, the reliability claim for constrained learning remains unsubstantiated.

    Authors: We accept that additional analysis is required to strengthen the claims about the local search algorithm. In the revision we will add a discussion of convergence properties under the problem constraints we consider, describe the use of random perturbations and restarts to escape local optima, and report more granular success metrics from the benchmarks, including success rates, average iteration counts, and failure modes. These details will be supported by the experimental data already collected. revision: yes

Circularity Check

0 steps flagged

Independent algorithmic steps for C-RASP verification via Lustre and local-search learning; minor reliance on prior literature without reduction

full rationale

The paper's core contributions are a semantics-preserving translation from C-RASP to Lustre programs (enabling SMT-based model checking) and a new local-search algorithm for synthesizing C-RASP programs from examples. These are presented as novel algorithmic techniques that build upon but do not reduce to the prior definition of C-RASP as a transformer-capturing language. No equations or steps in the provided abstract or described claims equate a derived result to its inputs by construction, rename a fitted parameter as a prediction, or rest the central claim solely on an unverified self-citation chain. The work remains self-contained against external model-checking benchmarks and example-driven synthesis, yielding only a minor self-citation score for dependence on the originating C-RASP literature.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the domain assumption that C-RASP faithfully captures transformer-expressible concepts and on the unproven translation soundness between C-RASP and Lustre.

axioms (2)
  • domain assumption C-RASP captures concepts expressible by transformers
    Stated as background fact in the abstract.
  • domain assumption C-RASP programs can be translated to Lustre while preserving semantics for model checking
    Required for the claimed exploitation of SMT-based checkers.

pith-pipeline@v0.9.0 · 5671 in / 1244 out tokens · 61299 ms · 2026-05-21T12:53:39.717206+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

  1. [1]

    Foundations of symbolic languages for model interpretability

    Marcelo Arenas, Daniel B´ aez, Pablo Barcel´ o, Jorge P´ erez, and Bernardo Subercaseaux. Foundations of symbolic languages for model interpretability. In Marc’Aurelio Ranzato, 2Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. ...

  2. [2]

    Podolskii

    Pablo Barcel´ o, Alexander Kozachinskiy, Anthony Widjaja Lin, and Vladimir V. Podolskii. Logical languages accepted by transformer encoders with hard attention. InThe Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024

  3. [3]

    Model inter- pretability through the lens of computational complexity

    Pablo Barcel´ o, Mika¨ el Monet, Jorge P´ erez, and Bernardo Subercaseaux. Model inter- pretability through the lens of computational complexity. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors,Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing ...

  4. [4]

    Johnson, and Haoze Wu

    Christopher Brix, Stanley Bak, Taylor T. Johnson, and Haoze Wu. The fifth international verification of neural networks competition (VNN-COMP 2024): Summary and results. CoRR, abs/2412.19985, 2024

  5. [5]

    Caspi, D

    P. Caspi, D. Pilaud, N. Halbwachs, and J. A. Plaice. LUSTRE: a declarative language for real-time programming. InProceedings of the 14th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL ’87, page 178–188, New York, NY, USA,

  6. [6]

    Association for Computing Machinery

  7. [7]

    The Kind2 model checker

    Adrien Champion, Alain Mebsout, Christoph Sticksel, and Cesare Tinelli. The Kind2 model checker. In Swarat Chaudhuri and Azadeh Farzan, editors,Computer Aided Verifi- cation, pages 510–517, Cham, 2016. Springer International Publishing

  8. [8]

    Learning transformer programs

    Dan Friedman, Alexander Wettig, and Danqi Chen. Learning transformer programs. In Proceedings of the 37th International Conference on Neural Information Processing Sys- tems, NIPS ’23, Red Hook, NY, USA, 2023. Curran Associates Inc

  9. [9]

    Jacobson, and Alan W

    Darrall Henderson, Sheldon H. Jacobson, and Alan W. Johnson.The Theory and Practice of Simulated Annealing, pages 287–319. Springer US, Boston, MA, 2003

  10. [10]

    Lin, Oliver Markgraf, Julian Parsert, and Tony Tan

    Chih-Duo Hong, Hongjian Jiang, Anthony W. Lin, Oliver Markgraf, Julian Parsert, and Tony Tan. Extracting robust register automata from neural networks over data sequences. CoRR, abs/2511.19100, 2025

  11. [11]

    A survey of safety and trustworthiness of deep neu- ral networks: Verification, testing, adversarial attack and defence, and interpretability

    Xiaowei Huang, Daniel Kroening, Wenjie Ruan, James Sharp, Youcheng Sun, Emese Thamo, Min Wu, and Xinping Yi. A survey of safety and trustworthiness of deep neu- ral networks: Verification, testing, adversarial attack and defence, and interpretability. Computer Science Review, 37:100270, 2020

  12. [12]

    Safety verification of deep neural networks

    Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. Safety verification of deep neural networks. In Rupak Majumdar and Viktor Kuncak, editors,Computer Aided Ver- ification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I, volume 10426 ofLecture Notes in Computer Science, pages 3–29. Springer, 2017. ...

  13. [13]

    A formal framework for understanding length generalization in transformers

    Xinting Huang, Andy Yang, Satwik Bhattamishra, Yash Raj Sarrof, Andreas Krebs, Hattie Zhou, Preetum Nakkiran, and Michael Hahn. A formal framework for understanding length generalization in transformers. InThe Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025

  14. [14]

    Softmax trans- formers are turing-complete.CoRR, abs/2511.20038, 2025

    Hongjian Jiang, Michael Hahn, Georg Zetzsche, and Anthony Widjaja Lin. Softmax trans- formers are turing-complete.CoRR, abs/2511.20038, 2025

  15. [15]

    Barrett, David L

    Guy Katz, Clark W. Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. Reluplex: a calculus for reasoning about deep neural networks.Formal Methods Syst. Des., 60(1):87–116, 2022

  16. [16]

    Kirkpatrick, C

    S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983

  17. [17]

    Characterizing the expressivity of fixed-precision trans- former language models

    Jiaoda Li and Ryan Cotterell. Characterizing the expressivity of fixed-precision trans- former language models. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  18. [18]

    Ash, Surbhi Goel, Akshay Krishnamurthy, and Cyril Zhang

    Bingbin Liu, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, and Cyril Zhang. Ex- posing attention glitches with flip-flop language modeling. InThirty-seventh Conference on Neural Information Processing Systems, 2023

  19. [19]

    Strong, Clark W

    Changliu Liu, Tomer Arnon, Christopher Lazarus, Christopher A. Strong, Clark W. Bar- rett, and Mykel J. Kochenderfer. Algorithms for verifying deep neural networks.Found. Trends Optim., 4(3-4):244–404, 2021

  20. [20]

    No silver bullet: interpretable ml models must be explained.Frontiers in Artificial Intelligence, 6, 2023

    Joao Marques-Silva and Alexey Ignatiev. No silver bullet: interpretable ml models must be explained.Frontiers in Artificial Intelligence, 6, 2023

  21. [21]

    The expressive power of transformers with chain of thought

    William Merrill and Ashish Sabharwal. The expressive power of transformers with chain of thought. InThe Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024

  22. [22]

    2nd edition, 2022

    Christoph Molnar.Interpretable Machine Learning. 2nd edition, 2022

  23. [23]

    Weighted automata extraction from recurrent neural networks via regression on state spaces

    Takamasa Okudono, Masaki Waga, Taro Sekiyama, and Ichiro Hasuo. Weighted automata extraction from recurrent neural networks via regression on state spaces. InThe Thirty- Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second In- novative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Edu...

  24. [24]

    Attention is turing-complete.J

    Jorge P´ erez, Pablo Barcel´ o, and Javier Marinkovic. Attention is turing-complete.J. Mach. Learn. Res., 22:75:1–75:35, 2021

  25. [25]

    Transformer encoder satisfiability: Com- plexity and impact on formal reasoning

    Marco S¨ alzer, Eric Alsmann, and Martin Lange. Transformer encoder satisfiability: Com- plexity and impact on formal reasoning. InThe Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025

  26. [26]

    What formal languages can transformers express? A survey.Trans

    Lena Strobl, William Merrill, Gail Weiss, David Chiang, and Dana Angluin. What formal languages can transformers express? A survey.Trans. Assoc. Comput. Linguistics, 12:543– 561, 2024. 15 Synthesis and Verification of Transformer Programs H. Jiang M. Hague P. R¨ ummer A. W. Lin

  27. [27]

    The counting power of transformers.CoRR, abs/2505.11199, 2025

    Marco S¨ alzer, Chris K¨ ocher, Alexander Kozachinskiy, Georg Zetzsche, and Anthony Wid- jaja Lin. The counting power of transformers.CoRR, abs/2505.11199, 2025

  28. [28]

    Gomez, Lukasz Kaiser, and Illia Polosukhin

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors,Advances in Neural Information Processing Systems 30: An- nual Conferenc...

  29. [29]

    Thinking like transformers

    Gail Weiss, Yoav Goldberg, and Eran Yahav. Thinking like transformers. InInternational Conference on Machine Learning, pages 11080–11090. PMLR, 2021

  30. [30]

    Extracting automata from recurrent neural networks using queries and counterexamples (extended version).Mach

    Gail Weiss, Yoav Goldberg, and Eran Yahav. Extracting automata from recurrent neural networks using queries and counterexamples (extended version).Mach. Learn., 113(5):2877–2919, 2024

  31. [31]

    Length generalization bounds for transformers, 2026

    Andy Yang, Pascal Bergstr¨ aßer, Georg Zetzsche, David Chiang, and Anthony Lin. Length generalization bounds for transformers, 2026. Under submission

  32. [32]

    Knee-deep in c-RASP: A transformer depth hierarchy

    Andy Yang, Micha¨ el Cadilhac, and David Chiang. Knee-deep in c-RASP: A transformer depth hierarchy. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  33. [33]

    Counting like transformers: Compiling temporal counting logic into softmax transformers.CoRR, abs/2404.04393, 2024

    Andy Yang and David Chiang. Counting like transformers: Compiling temporal counting logic into softmax transformers.CoRR, abs/2404.04393, 2024

  34. [34]

    Masked hard-attention transformers recog- nize exactly the star-free languages

    Andy Yang, David Chiang, and Dana Angluin. Masked hard-attention transformers recog- nize exactly the star-free languages. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors,Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Syste...

  35. [35]

    Susskind, Samy Bengio, and Preetum Nakkiran

    Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Joshua M. Susskind, Samy Bengio, and Preetum Nakkiran. What algorithms can transformers learn? A study in length generalization. InThe Twelfth International Conference on Learning Represen- tations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024. 16 Synthesis and Verifica...