Interpretable and Verifiable Hardware Generation with LLM-Driven Stepwise Refinement

David Z. Pan; Samuel Mandell; You Li

arxiv: 2606.19387 · v1 · pith:4FAWM67Xnew · submitted 2026-06-16 · 💻 cs.SE · cs.AI

Interpretable and Verifiable Hardware Generation with LLM-Driven Stepwise Refinement

You Li , Samuel Mandell , David Z. Pan This is my paper

Pith reviewed 2026-06-27 00:24 UTC · model grok-4.3

classification 💻 cs.SE cs.AI

keywords hardware generationLLMRTLtransformation rulesformal methodsstepwise refinementverifiable design

0 comments

The pith

LLMs generate correct RTL hardware by applying a set of correctness-preserving transformation rules iteratively.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that uses LLMs to refine hardware design specifications into register-transfer level code. A collection of transformation rules guides the process so that each change keeps the design logically and semantically correct. This combination aims to eliminate the risk of errors that LLMs sometimes introduce in complex tasks like chip design. A sympathetic reader would see this as a way to make automated hardware generation both creative and trustworthy.

Core claim

The authors claim that by iteratively applying a devised set of transformation rules that cover various design decisions and hardware features, an LLM agent can convert a design specification into an RTL program with guaranteed correctness.

What carries the argument

The set of transformation rules, each of which preserves semantic and logical correctness when applied.

If this is right

Design specifications can be turned into RTL programs without introducing subtle errors.
The generation process becomes interpretable through the sequence of applied rules.
Hardware engineers can rely on the output with mathematical rigor from the rules.
The framework demonstrates effectiveness in experiments for efficiency and correctness.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach might be adapted to generate verified code in other formal languages beyond hardware.
Expanding the rule set could handle more complex or novel hardware architectures.
Integration with existing formal verification tools could further strengthen the guarantees.

Load-bearing premise

The set of transformation rules is comprehensive enough to cover all relevant design decisions and hardware features.

What would settle it

Demonstrating a design specification for which no complete sequence of rules produces a correct RTL implementation, or where an applied rule changes the intended behavior.

Figures

Figures reproduced from arXiv: 2606.19387 by David Z. Pan, Samuel Mandell, You Li.

**Figure 2.** Figure 2: Architecture of the agentic system for correct-by-construction RTL generation. [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Number of tokens consumed in stage 2 vs. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Total runtime of stage 2 vs. search depth on [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

read the original abstract

Large language models (LLMs) have achieved remarkable success in software development. However, they are susceptible to hallucinations, meaning that they can introduce subtle semantic and logical errors. Due to the high stakes in chip design and manufacturing, hardware engineers are still reluctant to rely on LLMs for register-transfer level (RTL) generation. In this paper, we propose a hardware generation framework that combines the creativity and broad knowledge of LLMs with the explainability and mathematical rigor of formal methods. Specifically, we devise a set of transformation rules that cover various design decisions and hardware features. By iteratively applying these rules, an LLM agent can convert a design specification into an RTL program with guaranteed correctness. Experimental results demonstrate the effectiveness and efficiency of the framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper sketches an LLM agent applying formal rules for verifiable RTL generation but offers no details or evidence to support the correctness guarantee.

read the letter

The paper's main claim is that an LLM agent can generate correct RTL by iteratively applying a set of formal transformation rules to a design spec. The abstract positions this as a way to get both LLM flexibility and formal guarantees for hardware.

What stands out is the attempt to bridge LLMs and formal methods in a practical way for chip design. Hardware has real safety and cost issues, so reducing reliance on manual verification while keeping some rigor makes sense. They say they created rules covering various design decisions and hardware features, and that experiments show the framework works.

That combination is a reasonable direction. If the rules are well-chosen and the process is truly stepwise with checks, it could help make generations more explainable.

The problem is that none of the actual rules appear in the abstract, and there's no sketch of why they preserve correctness or how complete the set is. The guarantee is stated but not backed by any derivation or example. The experiments are mentioned without numbers or setup details. This leaves the central promise untested in the provided text.

If the full paper includes the rule definitions, a soundness argument, and real results, it could be worth following. As it stands, the work reads as an early outline rather than a completed method.

This kind of paper is aimed at the intersection of AI for code and hardware design automation. Someone already working on LLM agents for RTL might find the high-level framing useful as a starting point, but it won't give them a method to implement or build on yet.

I would not bring this to a reading group in its current form because there's not enough substance to discuss. I wouldn't cite it until more details are available. It probably does not deserve peer review yet because the key claims lack any supporting evidence or artifacts.

Referee Report

3 major / 2 minor

Summary. The paper proposes a hardware generation framework that combines LLMs with formal methods for RTL code generation. It devises a set of transformation rules covering design decisions and hardware features; an LLM agent iteratively applies these rules to convert a design specification into an RTL program asserted to have guaranteed correctness. Experimental results are said to demonstrate the framework's effectiveness and efficiency.

Significance. If the central claim of guaranteed correctness holds, the work could meaningfully reduce reliance on ad-hoc LLM prompting for hardware by supplying an interpretable, rule-based refinement process that inherits formal guarantees. This would be a useful contribution at the intersection of AI-assisted design and verification, particularly if the rules and their application can be shown to be sound and sufficiently complete.

major comments (3)

[Abstract and §1] Abstract and §1: The claim that iterative rule application yields RTL 'with guaranteed correctness' is load-bearing for the entire contribution, yet the manuscript supplies neither an inductive soundness argument for the individual transformation rules nor a completeness argument that the enumerated rules cover all relevant design decisions and hardware features. Without these, the guarantee cannot be established.
[§3] §3 (Transformation Rules): The rules are described at a high level, but no formal semantics, preservation lemmas, or verification (model checking, theorem proving, or exhaustive case analysis) of the rules themselves is reported. Any undetected unsound rule or coverage gap falsifies the guarantee when an LLM performs the applications.
[§4] §4 (Experiments): The reported experiments evaluate effectiveness and efficiency but contain no evaluation of the correctness guarantee, such as equivalence checking of generated RTL against the specification or formal verification of rule applications on the produced designs.

minor comments (2)

The abstract would be clearer if it briefly characterized the scope of the transformation rules or provided a small illustrative example of one rule application.
[§2] Notation for the design specification and RTL representation should be introduced consistently in §2 before the rules are presented.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. The feedback highlights the need for stronger formal justification of our correctness claims. We address each point below and outline the revisions we will make.

read point-by-point responses

Referee: [Abstract and §1] The claim that iterative rule application yields RTL 'with guaranteed correctness' is load-bearing for the entire contribution, yet the manuscript supplies neither an inductive soundness argument for the individual transformation rules nor a completeness argument that the enumerated rules cover all relevant design decisions and hardware features. Without these, the guarantee cannot be established.

Authors: We agree that a formal inductive soundness argument and a discussion of completeness would strengthen the paper. The transformation rules were designed to be semantics-preserving, but the manuscript does not explicitly prove this. In the revised version, we will add an inductive soundness argument in a new subsection of §3 and discuss the coverage of the rule set with respect to common hardware design patterns. revision: yes
Referee: [§3] The rules are described at a high level, but no formal semantics, preservation lemmas, or verification (model checking, theorem proving, or exhaustive case analysis) of the rules themselves is reported. Any undetected unsound rule or coverage gap falsifies the guarantee when an LLM performs the applications.

Authors: The referee is correct that formal semantics and preservation lemmas are not provided. We will revise §3 to include formal definitions of the transformation rules' semantics and prove preservation lemmas for key properties such as functional equivalence and timing correctness. revision: yes
Referee: [§4] The reported experiments evaluate effectiveness and efficiency but contain no evaluation of the correctness guarantee, such as equivalence checking of generated RTL against the specification or formal verification of rule applications on the produced designs.

Authors: We acknowledge the lack of direct evaluation of the correctness guarantee in the experiments. In the revised manuscript, we will augment §4 with results from equivalence checking (using tools like Synopsys Formality) on the generated designs against reference implementations, and report on any formal verification performed. revision: yes

Circularity Check

0 steps flagged

No circularity: framework claim rests on unproven rule soundness, not self-referential derivation

full rationale

The paper asserts that iterative application of a devised set of transformation rules yields RTL with guaranteed correctness. This claim depends on external properties (soundness and completeness of the rule set) that are not shown via equations, self-citation chains, or fitted inputs within the provided text. No derivation reduces to its own inputs by construction; the abstract and description contain no mathematical steps, parameters, or renamings that would trigger the enumerated circularity patterns. The burden of proof for the guarantee lies outside the derivation itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on the unverified assumption that the transformation rules are both complete and correctness-preserving; no free parameters or invented entities are described.

axioms (1)

domain assumption The set of transformation rules covers various design decisions and hardware features sufficiently to guarantee correctness.
Directly stated in the abstract as the basis for iterative application by the LLM agent.

pith-pipeline@v0.9.1-grok · 5650 in / 1023 out tokens · 22341 ms · 2026-06-27T00:24:10.870338+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

49 extracted references · 5 canonical work pages · 1 internal anchor

[1]

Martin Abadi and Leslie Lamport. 1993. Composing specifications.ACM Trans- actions on Programming Languages and Systems (TOPLAS)15, 1 (1993), 73–132

1993
[2]

Mohammad Akyash, Kimia Azar, and Hadi Kamali. 2025. Rtl++: Graph-enhanced llm for rtl code generation. InICLAD 2025

2025
[3]

1993.Structured logic design with VHDL

James R Armstrong and F Gail Gray. 1993.Structured logic design with VHDL. Prentice-Hall, Inc

1993
[4]

2000.VHDL design representation and synthesis

James R Armstrong and F Gail Gray. 2000.VHDL design representation and synthesis. Prentice Hall PTR

2000
[5]

Ralph-JR Back. 1988. A calculus of refinements for program derivations.Acta Informatica25, 6 (1988), 593–624

1988
[6]

Friedrich L Bauer, Bernhard Moller, Helmuth Partsch, and Peter Pepper. 1989. Formal program construction by transformations-computer-aided, intuition- guided programming.IEEE Transactions on Software Engineering15, 2 (1989), 165–180

1989
[7]

Amulya Bhattaram, Janani Ramamoorthy, Ranit Gupta, Diana Marculescu, and Dimitrios Stamoulis. 2025. Automated Multi-Agent Workflows for RTL Design. NeurIPS 2025 Workshop MLForSys(2025)

2025
[8]

Peter T Breuer, Carlos Kloos Delgado, Andres Lopez Marin, Natividad Mar- tinez Madrid, and Luis Sanchez Fernandez. 1997. A refinement calculus for the synthesis of verified hardware descriptions in VHDL.ACM Transactions on Programming Languages and Systems (TOPLAS)19, 4 (1997), 586–616

1997
[9]

Peter T Breuer, Luis Sanchez Fernandez, and Carlos Delgado Kloos. 1995. A simple denotational semantics, proof theory and a validation condition generator for unit-delay VHDL.Formal Methods in System Design7, 1 (1995), 27–51

1995
[10]

Peter T Breuer, Natividad Martínez Madrid, Luis Sánchez, Andrés Marin, and Carlos Delgado Kloos. 1996. A formal method for specification and refinement of real-time systems. InProceedings of the Eighth Euromicro Workshop on Real-Time Systems. IEEE, 200–204

1996
[11]

Yufan Cai, Zhe Hou, David Sanán, Xiaokun Luan, Yun Lin, Jun Sun, and Jin Song Dong. 2025. Automated program refinement: Guide and verify code large lan- guage model with refinement calculus.POPL 2025(2025)

2025
[12]

Zhiteng Chao, Xinyu Zhang, Yonghao Wang, Bin Sun, Tianyun Ma, Tianmeng Yang, Jing Justin Ye, Jianan Mu, and Huawei Li. 2026. RTLSeek: Boosting the LLM-Based RTL Generation with Diversity-Oriented Reinforcement Learning. (2026)

2026
[13]

Luca Collini, Siddharth Garg, and Ramesh Karri. 2025. C2hlsc: Leveraging large language models to bridge the software-to-hardware design gap.ACM Transactions on Design Automation of Electronic Systems30, 6 (2025), 1–24

2025
[14]

Jason Cong, Bin Liu, Stephen Neuendorffer, Juanjo Noguera, Kees Vissers, and Zhiru Zhang. 2011. High-level synthesis for FPGAs: From prototyping to deploy- ment.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems30, 4 (2011), 473–491

2011
[15]

Ole-Johan Dahl, Edsger Wybe Dijkstra, and Charles Antony Richard Hoare. 1972. Structured programming. Academic Press Ltd

1972
[16]

1976.A discipline of programming

Edsger Dijkstra. 1976.A discipline of programming. Prentice-Hall, Inc

1976
[17]

Jürgen Dingel. 2002. A refinement calculus for shared-variable parallel and distributed programming.Formal Aspects of Computing14, 2 (2002), 123–197

2002
[18]

Iulia Dragomir, Viorel Preoteasa, and Stavros Tripakis. 2020. The refinement calculus of reactive systems toolset.International Journal on Software Tools for Technology Transfer22, 6 (2020), 689–708

2020
[19]

Fanghao Fan, Yingjie Xia, and Li Kuang. 2025. SecV: LLM-based Secure Verilog Generation with Clue-Guided Exploration on Hardware-CWE Knowledge Graph. InIJCAI 2025

2025
[20]

Ian J Hayes and Mark Utting. 2001. A sequential real-time refinement calculus. Acta Informatica37, 6 (2001), 385–448

2001
[21]

Chia-Tung Ho, Haoxing Ren, and Brucek Khailany. 2025. Verilogcoder: Au- tonomous verilog coding agents with graph-based planning and abstract syntax tree (ast)-based waveform tracing tool. InAAAI 2025

2025
[22]

Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, and Boris Ginsburg. 2024. RULER: What’s the Real Context Size of Your Long-Context Language Models?. InCOLM 2024

2024
[23]

Hanxian Huang, Zhenghan Lin, Zixuan Wang, Xin Chen, Ke Ding, and Jishen Zhao. 2024. Towards llm-powered verilog rtl assistant: Self-verification and self-correction.arXiv preprint arXiv:2406.00115(2024)

work page arXiv 2024
[24]

Cliff B. Jones. 1983. Tentative steps toward a development method for interfering programs.ACM Transactions on Programming Languages and Systems (TOPLAS) 5, 4 (1983), 596–619

1983
[25]

Jason Koenig and K Rustan M Leino. 2016. Programming language features for refinement.arXiv preprint arXiv:1606.02022(2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016
[26]

Peter Lammich and Andreas Lochbihler. 2019. Automatic refinement to efficient data structures: A comparison of two approaches.Journal of Automated Reasoning 63, 1 (2019), 53–94

2019
[27]

K Rustan M Leino. 2010. Dafny: An automatic program verifier for functional cor- rectness. InInternational conference on logic for programming artificial intelligence and reasoning. Springer, 348–370. 9

2010
[28]

Brendan P Mahony and Ian J Hayes. 1992. A case-study in timed refinement: A mine pump.IEEE transactions on Software Engineering18, 9 (1992), 817–826

1992
[29]

Zohar Manna and Amir Pnueli. 1990. A hierarchy of temporal properties (invited paper, 1989). InProceedings of the ninth annual ACM symposium on Principles of distributed computing. 377–410

1990
[30]

Patrick Meredith, Michael Katelman, José Meseguer, and Grigore Roşu. 2010. A formal executable semantics of Verilog. InMEMOCODE 2010

2010
[31]

Carroll Morgan. 1988. The specification statement.ACM Transactions on Pro- gramming Languages and Systems10, 3 (1988), 403–419

1988
[32]

1990.Programming from specifications

Carroll Morgan. 1990.Programming from specifications. Prentice-Hall, Inc

1990
[33]

Joseph M Morris. 1987. A theoretical basis for stepwise refinement and the programming calculus.Science of Computer programming9, 3 (1987), 287–306

1987
[34]

Nathaniel Pinckney, Christopher Batten, Mingjie Liu, Haoxing Ren, and Brucek Khailany. 2025. Revisiting verilogeval: A year of improvements in large-language models for hardware code generation.ACM Transactions on Design Automation of Electronic Systems30, 6 (2025), 1–20

2025
[35]

Nathaniel Pinckney, Chenhui Deng, Chia-Tung Ho, Yun-Da Tsai, Mingjie Liu, Wenfei Zhou, Brucek Khailany, and Haoxing Ren. 2025. Comprehensive Verilog design problems: A next-generation benchmark dataset for evaluating large language models and agents on rtl design and verification.arXiv preprint arXiv:2506.14074(2025)

work page arXiv 2025
[36]

1996.Requirements engineering: An overview

Klaus Pohl. 1996.Requirements engineering: An overview. RWTH, Fachgruppe Informatik Aachen

1996
[37]

Viorel Preoteasa and Stavros Tripakis. 2014. Refinement calculus of reactive systems. InProceedings of the 14th International Conference on Embedded Software. 1–10

2014
[38]

Farzaneh Rabiei, Mark Zakharov, and Jose Renau. 2025. Beyond Verilog: Agents for Emerging HDLs. InIEEE International Conference on Omni-layer Intelligent Systems (COINS)

2025
[39]

Bernhard Rumpe and Cornel Klein. 1996. Automata describing object behavior. InObject-Oriented Behavioral Specifications. Springer, 265–286

1996
[40]

Peter Scholz. 1998. A refinement calculus for statecharts. InInternational Confer- ence on Fundamental Approaches to Software Engineering. Springer, 285–301

1998
[41]

Douglas J Smith. 1996. VHDL & Verilog compared & contrasted—plus modeled example written in VHDL, Verilog and C. InDAC 1996

1996
[42]

Shailja Thakur, Jason Blocklove, Hammond Pearce, Benjamin Tan, Siddharth Garg, and Ramesh Karri. 2023. Autochip: Automating hdl generation using llm feedback.arXiv preprint arXiv:2311.04887(2023)

work page arXiv 2023
[43]

YunDa Tsai, Mingjie Liu, and Haoxing Ren. 2024. Rtlfixer: Automatically fixing rtl syntax errors with large language model. InDAC 2024

2024
[44]

Yiting Wang, Guoheng Sun, Wanghao Ye, Gang Qu, and Ang Li. 2025. Verireason: Reinforcement learning with testbench feedback for reasoning-enhanced verilog generation.arXiv preprint arXiv:2505.11849(2025)

work page arXiv 2025
[45]

Niklaus Wirth. 1971. Program development by stepwise refinement.Commun. ACM14, 4 (1971), 221–227

1971
[46]

Clifford Wolf, Johann Glaser, and Johannes Kepler. 2013. Yosys-a free verilog synthesis suite. InProceedings of the 21st Austrian Workshop on Microelectronics (Austrochip), Vol. 97. 1–6

2013
[47]

Zhongzhi Yu, Mingjie Liu, Michael Zimmer, Yingyan Celine, Yong Liu, and Haoxing Ren. 2025. Spec2RTL-Agent: Automated Hardware Code Generation from Complex Specifications Using LLM Agent Systems. InIEEE International Conference on LLM-Aided Design (ICLAD). 37–43

2025
[48]

Yujie Zhao, Hejia Zhang, Hanxian Huang, Zhongming Yu, and Jishen Zhao. 2025. Mage: A multi-agent engine for automated rtl code generation. InDAC 2025

2025
[49]

Iteration

Yingjie Zhou, Renzhi Chen, Xinyu Li, Jingkai Wang, Zhigang Fang, Bowei Wang, Wenqiang Bai, Qilin Cao, and Lei Wang. 2025. VToT: Automatic Verilog Genera- tion via LLMs with Tree of Thoughts Prompting. InDATE 2025. 8 Appendix A In this appendix, we select 3 simple examples to illustrate the hardware refinement process. 8.1 Oscillator The following example ...

2025

[1] [1]

Martin Abadi and Leslie Lamport. 1993. Composing specifications.ACM Trans- actions on Programming Languages and Systems (TOPLAS)15, 1 (1993), 73–132

1993

[2] [2]

Mohammad Akyash, Kimia Azar, and Hadi Kamali. 2025. Rtl++: Graph-enhanced llm for rtl code generation. InICLAD 2025

2025

[3] [3]

1993.Structured logic design with VHDL

James R Armstrong and F Gail Gray. 1993.Structured logic design with VHDL. Prentice-Hall, Inc

1993

[4] [4]

2000.VHDL design representation and synthesis

James R Armstrong and F Gail Gray. 2000.VHDL design representation and synthesis. Prentice Hall PTR

2000

[5] [5]

Ralph-JR Back. 1988. A calculus of refinements for program derivations.Acta Informatica25, 6 (1988), 593–624

1988

[6] [6]

Friedrich L Bauer, Bernhard Moller, Helmuth Partsch, and Peter Pepper. 1989. Formal program construction by transformations-computer-aided, intuition- guided programming.IEEE Transactions on Software Engineering15, 2 (1989), 165–180

1989

[7] [7]

Amulya Bhattaram, Janani Ramamoorthy, Ranit Gupta, Diana Marculescu, and Dimitrios Stamoulis. 2025. Automated Multi-Agent Workflows for RTL Design. NeurIPS 2025 Workshop MLForSys(2025)

2025

[8] [8]

Peter T Breuer, Carlos Kloos Delgado, Andres Lopez Marin, Natividad Mar- tinez Madrid, and Luis Sanchez Fernandez. 1997. A refinement calculus for the synthesis of verified hardware descriptions in VHDL.ACM Transactions on Programming Languages and Systems (TOPLAS)19, 4 (1997), 586–616

1997

[9] [9]

Peter T Breuer, Luis Sanchez Fernandez, and Carlos Delgado Kloos. 1995. A simple denotational semantics, proof theory and a validation condition generator for unit-delay VHDL.Formal Methods in System Design7, 1 (1995), 27–51

1995

[10] [10]

Peter T Breuer, Natividad Martínez Madrid, Luis Sánchez, Andrés Marin, and Carlos Delgado Kloos. 1996. A formal method for specification and refinement of real-time systems. InProceedings of the Eighth Euromicro Workshop on Real-Time Systems. IEEE, 200–204

1996

[11] [11]

Yufan Cai, Zhe Hou, David Sanán, Xiaokun Luan, Yun Lin, Jun Sun, and Jin Song Dong. 2025. Automated program refinement: Guide and verify code large lan- guage model with refinement calculus.POPL 2025(2025)

2025

[12] [12]

Zhiteng Chao, Xinyu Zhang, Yonghao Wang, Bin Sun, Tianyun Ma, Tianmeng Yang, Jing Justin Ye, Jianan Mu, and Huawei Li. 2026. RTLSeek: Boosting the LLM-Based RTL Generation with Diversity-Oriented Reinforcement Learning. (2026)

2026

[13] [13]

Luca Collini, Siddharth Garg, and Ramesh Karri. 2025. C2hlsc: Leveraging large language models to bridge the software-to-hardware design gap.ACM Transactions on Design Automation of Electronic Systems30, 6 (2025), 1–24

2025

[14] [14]

Jason Cong, Bin Liu, Stephen Neuendorffer, Juanjo Noguera, Kees Vissers, and Zhiru Zhang. 2011. High-level synthesis for FPGAs: From prototyping to deploy- ment.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems30, 4 (2011), 473–491

2011

[15] [15]

Ole-Johan Dahl, Edsger Wybe Dijkstra, and Charles Antony Richard Hoare. 1972. Structured programming. Academic Press Ltd

1972

[16] [16]

1976.A discipline of programming

Edsger Dijkstra. 1976.A discipline of programming. Prentice-Hall, Inc

1976

[17] [17]

Jürgen Dingel. 2002. A refinement calculus for shared-variable parallel and distributed programming.Formal Aspects of Computing14, 2 (2002), 123–197

2002

[18] [18]

Iulia Dragomir, Viorel Preoteasa, and Stavros Tripakis. 2020. The refinement calculus of reactive systems toolset.International Journal on Software Tools for Technology Transfer22, 6 (2020), 689–708

2020

[19] [19]

Fanghao Fan, Yingjie Xia, and Li Kuang. 2025. SecV: LLM-based Secure Verilog Generation with Clue-Guided Exploration on Hardware-CWE Knowledge Graph. InIJCAI 2025

2025

[20] [20]

Ian J Hayes and Mark Utting. 2001. A sequential real-time refinement calculus. Acta Informatica37, 6 (2001), 385–448

2001

[21] [21]

Chia-Tung Ho, Haoxing Ren, and Brucek Khailany. 2025. Verilogcoder: Au- tonomous verilog coding agents with graph-based planning and abstract syntax tree (ast)-based waveform tracing tool. InAAAI 2025

2025

[22] [22]

Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, and Boris Ginsburg. 2024. RULER: What’s the Real Context Size of Your Long-Context Language Models?. InCOLM 2024

2024

[23] [23]

Hanxian Huang, Zhenghan Lin, Zixuan Wang, Xin Chen, Ke Ding, and Jishen Zhao. 2024. Towards llm-powered verilog rtl assistant: Self-verification and self-correction.arXiv preprint arXiv:2406.00115(2024)

work page arXiv 2024

[24] [24]

Cliff B. Jones. 1983. Tentative steps toward a development method for interfering programs.ACM Transactions on Programming Languages and Systems (TOPLAS) 5, 4 (1983), 596–619

1983

[25] [25]

Jason Koenig and K Rustan M Leino. 2016. Programming language features for refinement.arXiv preprint arXiv:1606.02022(2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016

[26] [26]

Peter Lammich and Andreas Lochbihler. 2019. Automatic refinement to efficient data structures: A comparison of two approaches.Journal of Automated Reasoning 63, 1 (2019), 53–94

2019

[27] [27]

K Rustan M Leino. 2010. Dafny: An automatic program verifier for functional cor- rectness. InInternational conference on logic for programming artificial intelligence and reasoning. Springer, 348–370. 9

2010

[28] [28]

Brendan P Mahony and Ian J Hayes. 1992. A case-study in timed refinement: A mine pump.IEEE transactions on Software Engineering18, 9 (1992), 817–826

1992

[29] [29]

Zohar Manna and Amir Pnueli. 1990. A hierarchy of temporal properties (invited paper, 1989). InProceedings of the ninth annual ACM symposium on Principles of distributed computing. 377–410

1990

[30] [30]

Patrick Meredith, Michael Katelman, José Meseguer, and Grigore Roşu. 2010. A formal executable semantics of Verilog. InMEMOCODE 2010

2010

[31] [31]

Carroll Morgan. 1988. The specification statement.ACM Transactions on Pro- gramming Languages and Systems10, 3 (1988), 403–419

1988

[32] [32]

1990.Programming from specifications

Carroll Morgan. 1990.Programming from specifications. Prentice-Hall, Inc

1990

[33] [33]

Joseph M Morris. 1987. A theoretical basis for stepwise refinement and the programming calculus.Science of Computer programming9, 3 (1987), 287–306

1987

[34] [34]

Nathaniel Pinckney, Christopher Batten, Mingjie Liu, Haoxing Ren, and Brucek Khailany. 2025. Revisiting verilogeval: A year of improvements in large-language models for hardware code generation.ACM Transactions on Design Automation of Electronic Systems30, 6 (2025), 1–20

2025

[35] [35]

Nathaniel Pinckney, Chenhui Deng, Chia-Tung Ho, Yun-Da Tsai, Mingjie Liu, Wenfei Zhou, Brucek Khailany, and Haoxing Ren. 2025. Comprehensive Verilog design problems: A next-generation benchmark dataset for evaluating large language models and agents on rtl design and verification.arXiv preprint arXiv:2506.14074(2025)

work page arXiv 2025

[36] [36]

1996.Requirements engineering: An overview

Klaus Pohl. 1996.Requirements engineering: An overview. RWTH, Fachgruppe Informatik Aachen

1996

[37] [37]

Viorel Preoteasa and Stavros Tripakis. 2014. Refinement calculus of reactive systems. InProceedings of the 14th International Conference on Embedded Software. 1–10

2014

[38] [38]

Farzaneh Rabiei, Mark Zakharov, and Jose Renau. 2025. Beyond Verilog: Agents for Emerging HDLs. InIEEE International Conference on Omni-layer Intelligent Systems (COINS)

2025

[39] [39]

Bernhard Rumpe and Cornel Klein. 1996. Automata describing object behavior. InObject-Oriented Behavioral Specifications. Springer, 265–286

1996

[40] [40]

Peter Scholz. 1998. A refinement calculus for statecharts. InInternational Confer- ence on Fundamental Approaches to Software Engineering. Springer, 285–301

1998

[41] [41]

Douglas J Smith. 1996. VHDL & Verilog compared & contrasted—plus modeled example written in VHDL, Verilog and C. InDAC 1996

1996

[42] [42]

Shailja Thakur, Jason Blocklove, Hammond Pearce, Benjamin Tan, Siddharth Garg, and Ramesh Karri. 2023. Autochip: Automating hdl generation using llm feedback.arXiv preprint arXiv:2311.04887(2023)

work page arXiv 2023

[43] [43]

YunDa Tsai, Mingjie Liu, and Haoxing Ren. 2024. Rtlfixer: Automatically fixing rtl syntax errors with large language model. InDAC 2024

2024

[44] [44]

Yiting Wang, Guoheng Sun, Wanghao Ye, Gang Qu, and Ang Li. 2025. Verireason: Reinforcement learning with testbench feedback for reasoning-enhanced verilog generation.arXiv preprint arXiv:2505.11849(2025)

work page arXiv 2025

[45] [45]

Niklaus Wirth. 1971. Program development by stepwise refinement.Commun. ACM14, 4 (1971), 221–227

1971

[46] [46]

Clifford Wolf, Johann Glaser, and Johannes Kepler. 2013. Yosys-a free verilog synthesis suite. InProceedings of the 21st Austrian Workshop on Microelectronics (Austrochip), Vol. 97. 1–6

2013

[47] [47]

Zhongzhi Yu, Mingjie Liu, Michael Zimmer, Yingyan Celine, Yong Liu, and Haoxing Ren. 2025. Spec2RTL-Agent: Automated Hardware Code Generation from Complex Specifications Using LLM Agent Systems. InIEEE International Conference on LLM-Aided Design (ICLAD). 37–43

2025

[48] [48]

Yujie Zhao, Hejia Zhang, Hanxian Huang, Zhongming Yu, and Jishen Zhao. 2025. Mage: A multi-agent engine for automated rtl code generation. InDAC 2025

2025

[49] [49]

Iteration

Yingjie Zhou, Renzhi Chen, Xinyu Li, Jingkai Wang, Zhigang Fang, Bowei Wang, Wenqiang Bai, Qilin Cao, and Lei Wang. 2025. VToT: Automatic Verilog Genera- tion via LLMs with Tree of Thoughts Prompting. InDATE 2025. 8 Appendix A In this appendix, we select 3 simple examples to illustrate the hardware refinement process. 8.1 Oscillator The following example ...

2025