Interpretable and Verifiable Hardware Generation with LLM-Driven Stepwise Refinement
Pith reviewed 2026-06-27 00:24 UTC · model grok-4.3
The pith
LLMs generate correct RTL hardware by applying a set of correctness-preserving transformation rules iteratively.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that by iteratively applying a devised set of transformation rules that cover various design decisions and hardware features, an LLM agent can convert a design specification into an RTL program with guaranteed correctness.
What carries the argument
The set of transformation rules, each of which preserves semantic and logical correctness when applied.
If this is right
- Design specifications can be turned into RTL programs without introducing subtle errors.
- The generation process becomes interpretable through the sequence of applied rules.
- Hardware engineers can rely on the output with mathematical rigor from the rules.
- The framework demonstrates effectiveness in experiments for efficiency and correctness.
Where Pith is reading between the lines
- This approach might be adapted to generate verified code in other formal languages beyond hardware.
- Expanding the rule set could handle more complex or novel hardware architectures.
- Integration with existing formal verification tools could further strengthen the guarantees.
Load-bearing premise
The set of transformation rules is comprehensive enough to cover all relevant design decisions and hardware features.
What would settle it
Demonstrating a design specification for which no complete sequence of rules produces a correct RTL implementation, or where an applied rule changes the intended behavior.
Figures
read the original abstract
Large language models (LLMs) have achieved remarkable success in software development. However, they are susceptible to hallucinations, meaning that they can introduce subtle semantic and logical errors. Due to the high stakes in chip design and manufacturing, hardware engineers are still reluctant to rely on LLMs for register-transfer level (RTL) generation. In this paper, we propose a hardware generation framework that combines the creativity and broad knowledge of LLMs with the explainability and mathematical rigor of formal methods. Specifically, we devise a set of transformation rules that cover various design decisions and hardware features. By iteratively applying these rules, an LLM agent can convert a design specification into an RTL program with guaranteed correctness. Experimental results demonstrate the effectiveness and efficiency of the framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a hardware generation framework that combines LLMs with formal methods for RTL code generation. It devises a set of transformation rules covering design decisions and hardware features; an LLM agent iteratively applies these rules to convert a design specification into an RTL program asserted to have guaranteed correctness. Experimental results are said to demonstrate the framework's effectiveness and efficiency.
Significance. If the central claim of guaranteed correctness holds, the work could meaningfully reduce reliance on ad-hoc LLM prompting for hardware by supplying an interpretable, rule-based refinement process that inherits formal guarantees. This would be a useful contribution at the intersection of AI-assisted design and verification, particularly if the rules and their application can be shown to be sound and sufficiently complete.
major comments (3)
- [Abstract and §1] Abstract and §1: The claim that iterative rule application yields RTL 'with guaranteed correctness' is load-bearing for the entire contribution, yet the manuscript supplies neither an inductive soundness argument for the individual transformation rules nor a completeness argument that the enumerated rules cover all relevant design decisions and hardware features. Without these, the guarantee cannot be established.
- [§3] §3 (Transformation Rules): The rules are described at a high level, but no formal semantics, preservation lemmas, or verification (model checking, theorem proving, or exhaustive case analysis) of the rules themselves is reported. Any undetected unsound rule or coverage gap falsifies the guarantee when an LLM performs the applications.
- [§4] §4 (Experiments): The reported experiments evaluate effectiveness and efficiency but contain no evaluation of the correctness guarantee, such as equivalence checking of generated RTL against the specification or formal verification of rule applications on the produced designs.
minor comments (2)
- The abstract would be clearer if it briefly characterized the scope of the transformation rules or provided a small illustrative example of one rule application.
- [§2] Notation for the design specification and RTL representation should be introduced consistently in §2 before the rules are presented.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. The feedback highlights the need for stronger formal justification of our correctness claims. We address each point below and outline the revisions we will make.
read point-by-point responses
-
Referee: [Abstract and §1] The claim that iterative rule application yields RTL 'with guaranteed correctness' is load-bearing for the entire contribution, yet the manuscript supplies neither an inductive soundness argument for the individual transformation rules nor a completeness argument that the enumerated rules cover all relevant design decisions and hardware features. Without these, the guarantee cannot be established.
Authors: We agree that a formal inductive soundness argument and a discussion of completeness would strengthen the paper. The transformation rules were designed to be semantics-preserving, but the manuscript does not explicitly prove this. In the revised version, we will add an inductive soundness argument in a new subsection of §3 and discuss the coverage of the rule set with respect to common hardware design patterns. revision: yes
-
Referee: [§3] The rules are described at a high level, but no formal semantics, preservation lemmas, or verification (model checking, theorem proving, or exhaustive case analysis) of the rules themselves is reported. Any undetected unsound rule or coverage gap falsifies the guarantee when an LLM performs the applications.
Authors: The referee is correct that formal semantics and preservation lemmas are not provided. We will revise §3 to include formal definitions of the transformation rules' semantics and prove preservation lemmas for key properties such as functional equivalence and timing correctness. revision: yes
-
Referee: [§4] The reported experiments evaluate effectiveness and efficiency but contain no evaluation of the correctness guarantee, such as equivalence checking of generated RTL against the specification or formal verification of rule applications on the produced designs.
Authors: We acknowledge the lack of direct evaluation of the correctness guarantee in the experiments. In the revised manuscript, we will augment §4 with results from equivalence checking (using tools like Synopsys Formality) on the generated designs against reference implementations, and report on any formal verification performed. revision: yes
Circularity Check
No circularity: framework claim rests on unproven rule soundness, not self-referential derivation
full rationale
The paper asserts that iterative application of a devised set of transformation rules yields RTL with guaranteed correctness. This claim depends on external properties (soundness and completeness of the rule set) that are not shown via equations, self-citation chains, or fitted inputs within the provided text. No derivation reduces to its own inputs by construction; the abstract and description contain no mathematical steps, parameters, or renamings that would trigger the enumerated circularity patterns. The burden of proof for the guarantee lies outside the derivation itself.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The set of transformation rules covers various design decisions and hardware features sufficiently to guarantee correctness.
Reference graph
Works this paper leans on
-
[1]
Martin Abadi and Leslie Lamport. 1993. Composing specifications.ACM Trans- actions on Programming Languages and Systems (TOPLAS)15, 1 (1993), 73–132
1993
-
[2]
Mohammad Akyash, Kimia Azar, and Hadi Kamali. 2025. Rtl++: Graph-enhanced llm for rtl code generation. InICLAD 2025
2025
-
[3]
1993.Structured logic design with VHDL
James R Armstrong and F Gail Gray. 1993.Structured logic design with VHDL. Prentice-Hall, Inc
1993
-
[4]
2000.VHDL design representation and synthesis
James R Armstrong and F Gail Gray. 2000.VHDL design representation and synthesis. Prentice Hall PTR
2000
-
[5]
Ralph-JR Back. 1988. A calculus of refinements for program derivations.Acta Informatica25, 6 (1988), 593–624
1988
-
[6]
Friedrich L Bauer, Bernhard Moller, Helmuth Partsch, and Peter Pepper. 1989. Formal program construction by transformations-computer-aided, intuition- guided programming.IEEE Transactions on Software Engineering15, 2 (1989), 165–180
1989
-
[7]
Amulya Bhattaram, Janani Ramamoorthy, Ranit Gupta, Diana Marculescu, and Dimitrios Stamoulis. 2025. Automated Multi-Agent Workflows for RTL Design. NeurIPS 2025 Workshop MLForSys(2025)
2025
-
[8]
Peter T Breuer, Carlos Kloos Delgado, Andres Lopez Marin, Natividad Mar- tinez Madrid, and Luis Sanchez Fernandez. 1997. A refinement calculus for the synthesis of verified hardware descriptions in VHDL.ACM Transactions on Programming Languages and Systems (TOPLAS)19, 4 (1997), 586–616
1997
-
[9]
Peter T Breuer, Luis Sanchez Fernandez, and Carlos Delgado Kloos. 1995. A simple denotational semantics, proof theory and a validation condition generator for unit-delay VHDL.Formal Methods in System Design7, 1 (1995), 27–51
1995
-
[10]
Peter T Breuer, Natividad Martínez Madrid, Luis Sánchez, Andrés Marin, and Carlos Delgado Kloos. 1996. A formal method for specification and refinement of real-time systems. InProceedings of the Eighth Euromicro Workshop on Real-Time Systems. IEEE, 200–204
1996
-
[11]
Yufan Cai, Zhe Hou, David Sanán, Xiaokun Luan, Yun Lin, Jun Sun, and Jin Song Dong. 2025. Automated program refinement: Guide and verify code large lan- guage model with refinement calculus.POPL 2025(2025)
2025
-
[12]
Zhiteng Chao, Xinyu Zhang, Yonghao Wang, Bin Sun, Tianyun Ma, Tianmeng Yang, Jing Justin Ye, Jianan Mu, and Huawei Li. 2026. RTLSeek: Boosting the LLM-Based RTL Generation with Diversity-Oriented Reinforcement Learning. (2026)
2026
-
[13]
Luca Collini, Siddharth Garg, and Ramesh Karri. 2025. C2hlsc: Leveraging large language models to bridge the software-to-hardware design gap.ACM Transactions on Design Automation of Electronic Systems30, 6 (2025), 1–24
2025
-
[14]
Jason Cong, Bin Liu, Stephen Neuendorffer, Juanjo Noguera, Kees Vissers, and Zhiru Zhang. 2011. High-level synthesis for FPGAs: From prototyping to deploy- ment.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems30, 4 (2011), 473–491
2011
-
[15]
Ole-Johan Dahl, Edsger Wybe Dijkstra, and Charles Antony Richard Hoare. 1972. Structured programming. Academic Press Ltd
1972
-
[16]
1976.A discipline of programming
Edsger Dijkstra. 1976.A discipline of programming. Prentice-Hall, Inc
1976
-
[17]
Jürgen Dingel. 2002. A refinement calculus for shared-variable parallel and distributed programming.Formal Aspects of Computing14, 2 (2002), 123–197
2002
-
[18]
Iulia Dragomir, Viorel Preoteasa, and Stavros Tripakis. 2020. The refinement calculus of reactive systems toolset.International Journal on Software Tools for Technology Transfer22, 6 (2020), 689–708
2020
-
[19]
Fanghao Fan, Yingjie Xia, and Li Kuang. 2025. SecV: LLM-based Secure Verilog Generation with Clue-Guided Exploration on Hardware-CWE Knowledge Graph. InIJCAI 2025
2025
-
[20]
Ian J Hayes and Mark Utting. 2001. A sequential real-time refinement calculus. Acta Informatica37, 6 (2001), 385–448
2001
-
[21]
Chia-Tung Ho, Haoxing Ren, and Brucek Khailany. 2025. Verilogcoder: Au- tonomous verilog coding agents with graph-based planning and abstract syntax tree (ast)-based waveform tracing tool. InAAAI 2025
2025
-
[22]
Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, and Boris Ginsburg. 2024. RULER: What’s the Real Context Size of Your Long-Context Language Models?. InCOLM 2024
2024
- [23]
-
[24]
Cliff B. Jones. 1983. Tentative steps toward a development method for interfering programs.ACM Transactions on Programming Languages and Systems (TOPLAS) 5, 4 (1983), 596–619
1983
-
[25]
Jason Koenig and K Rustan M Leino. 2016. Programming language features for refinement.arXiv preprint arXiv:1606.02022(2016)
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[26]
Peter Lammich and Andreas Lochbihler. 2019. Automatic refinement to efficient data structures: A comparison of two approaches.Journal of Automated Reasoning 63, 1 (2019), 53–94
2019
-
[27]
K Rustan M Leino. 2010. Dafny: An automatic program verifier for functional cor- rectness. InInternational conference on logic for programming artificial intelligence and reasoning. Springer, 348–370. 9
2010
-
[28]
Brendan P Mahony and Ian J Hayes. 1992. A case-study in timed refinement: A mine pump.IEEE transactions on Software Engineering18, 9 (1992), 817–826
1992
-
[29]
Zohar Manna and Amir Pnueli. 1990. A hierarchy of temporal properties (invited paper, 1989). InProceedings of the ninth annual ACM symposium on Principles of distributed computing. 377–410
1990
-
[30]
Patrick Meredith, Michael Katelman, José Meseguer, and Grigore Roşu. 2010. A formal executable semantics of Verilog. InMEMOCODE 2010
2010
-
[31]
Carroll Morgan. 1988. The specification statement.ACM Transactions on Pro- gramming Languages and Systems10, 3 (1988), 403–419
1988
-
[32]
1990.Programming from specifications
Carroll Morgan. 1990.Programming from specifications. Prentice-Hall, Inc
1990
-
[33]
Joseph M Morris. 1987. A theoretical basis for stepwise refinement and the programming calculus.Science of Computer programming9, 3 (1987), 287–306
1987
-
[34]
Nathaniel Pinckney, Christopher Batten, Mingjie Liu, Haoxing Ren, and Brucek Khailany. 2025. Revisiting verilogeval: A year of improvements in large-language models for hardware code generation.ACM Transactions on Design Automation of Electronic Systems30, 6 (2025), 1–20
2025
-
[35]
Nathaniel Pinckney, Chenhui Deng, Chia-Tung Ho, Yun-Da Tsai, Mingjie Liu, Wenfei Zhou, Brucek Khailany, and Haoxing Ren. 2025. Comprehensive Verilog design problems: A next-generation benchmark dataset for evaluating large language models and agents on rtl design and verification.arXiv preprint arXiv:2506.14074(2025)
-
[36]
1996.Requirements engineering: An overview
Klaus Pohl. 1996.Requirements engineering: An overview. RWTH, Fachgruppe Informatik Aachen
1996
-
[37]
Viorel Preoteasa and Stavros Tripakis. 2014. Refinement calculus of reactive systems. InProceedings of the 14th International Conference on Embedded Software. 1–10
2014
-
[38]
Farzaneh Rabiei, Mark Zakharov, and Jose Renau. 2025. Beyond Verilog: Agents for Emerging HDLs. InIEEE International Conference on Omni-layer Intelligent Systems (COINS)
2025
-
[39]
Bernhard Rumpe and Cornel Klein. 1996. Automata describing object behavior. InObject-Oriented Behavioral Specifications. Springer, 265–286
1996
-
[40]
Peter Scholz. 1998. A refinement calculus for statecharts. InInternational Confer- ence on Fundamental Approaches to Software Engineering. Springer, 285–301
1998
-
[41]
Douglas J Smith. 1996. VHDL & Verilog compared & contrasted—plus modeled example written in VHDL, Verilog and C. InDAC 1996
1996
- [42]
-
[43]
YunDa Tsai, Mingjie Liu, and Haoxing Ren. 2024. Rtlfixer: Automatically fixing rtl syntax errors with large language model. InDAC 2024
2024
- [44]
-
[45]
Niklaus Wirth. 1971. Program development by stepwise refinement.Commun. ACM14, 4 (1971), 221–227
1971
-
[46]
Clifford Wolf, Johann Glaser, and Johannes Kepler. 2013. Yosys-a free verilog synthesis suite. InProceedings of the 21st Austrian Workshop on Microelectronics (Austrochip), Vol. 97. 1–6
2013
-
[47]
Zhongzhi Yu, Mingjie Liu, Michael Zimmer, Yingyan Celine, Yong Liu, and Haoxing Ren. 2025. Spec2RTL-Agent: Automated Hardware Code Generation from Complex Specifications Using LLM Agent Systems. InIEEE International Conference on LLM-Aided Design (ICLAD). 37–43
2025
-
[48]
Yujie Zhao, Hejia Zhang, Hanxian Huang, Zhongming Yu, and Jishen Zhao. 2025. Mage: A multi-agent engine for automated rtl code generation. InDAC 2025
2025
-
[49]
Iteration
Yingjie Zhou, Renzhi Chen, Xinyu Li, Jingkai Wang, Zhigang Fang, Bowei Wang, Wenqiang Bai, Qilin Cao, and Lei Wang. 2025. VToT: Automatic Verilog Genera- tion via LLMs with Tree of Thoughts Prompting. InDATE 2025. 8 Appendix A In this appendix, we select 3 simple examples to illustrate the hardware refinement process. 8.1 Oscillator The following example ...
2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.