VeriGraphi: A Multi-Agent Framework of Hierarchical RTL Generation for Large Hardware Designs
Pith reviewed 2026-05-10 09:28 UTC · model grok-4.3
The pith
A knowledge graph built by multiple agents lets LLMs generate correct hierarchical RTL for large designs like RISC-V with little human help.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VeriGraphi introduces a spec-anchored Knowledge Graph, termed the Hierarchical Design Architecture (HDA), that explicitly encodes module hierarchy, port-level interfaces, wiring semantics, and inter-module dependencies. The graph is assembled through iterative multi-agent examination of informal specification documents. Guided by this deterministic scaffold, a progressive coding stage generates pseudo-code followed by synthesizable RTL while enforcing interface and dependency rules at each submodule. Evaluation on NIST benchmark specifications and a detailed RV32I RISC-V case study shows the pipeline produces functionally correct hierarchical designs with minimal human intervention.
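One way to picture "enforcing interface rules" is to derive each module header from the graph, so that a language model only ever fills in the body. The sketch below is a minimal illustration under that assumption; the `Port` class, `render_module_header`, and the `alu` port list are hypothetical names and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    direction: str  # "input" or "output"
    width: int      # number of bits


def render_module_header(module_name, ports):
    """Return a Verilog-2001 style module header built only from graph data."""
    decls = []
    for p in ports:
        # Single-bit ports get no range; wider ports get [width-1:0].
        rng = f"[{p.width - 1}:0] " if p.width > 1 else ""
        decls.append(f"    {p.direction} wire {rng}{p.name}")
    return f"module {module_name} (\n" + ",\n".join(decls) + "\n);"


# Hypothetical port list for an ALU node in the graph.
alu_ports = [
    Port("clk", "input", 1),
    Port("a", "input", 32),
    Port("b", "input", 32),
    Port("op", "input", 4),
    Port("result", "output", 32),
]

header = render_module_header("alu", alu_ports)
print(header)
```

Because the header is rendered rather than generated, a port that is absent from the graph cannot appear in the emitted interface.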
What carries the argument
The spec-anchored Knowledge Graph (HDA), which acts as the central structural scaffold by representing hierarchy, interfaces, and dependencies in a form that can be checked before any code is written.
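To make "checked before any code is written" concrete, here is a minimal sketch of a structural check over such a graph. The schema (a dictionary of modules and a list of wire tuples) and the module names are assumptions chosen for illustration; the paper's actual HDA representation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    direction: str  # "input" or "output"
    width: int      # number of bits


# Hypothetical graph contents: module name -> {port name -> Port}.
modules = {
    "pc": {"pc_out": Port("output", 32)},
    "imem": {"addr": Port("input", 32), "instr": Port("output", 32)},
    "decoder": {"instr": Port("input", 32), "rd": Port("output", 5)},
}

# Each wire is (source module, source port, destination module, destination port).
wires = [
    ("pc", "pc_out", "imem", "addr"),
    ("imem", "instr", "decoder", "instr"),
    ("decoder", "rd", "regfile", "waddr"),  # regfile is never declared
]


def check_wires(modules, wires):
    """Return a list of human-readable problems found in the wiring."""
    errors = []
    for src_mod, src_port, dst_mod, dst_port in wires:
        src = modules.get(src_mod, {}).get(src_port)
        dst = modules.get(dst_mod, {}).get(dst_port)
        if src is None:
            errors.append(f"unknown source {src_mod}.{src_port}")
            continue
        if dst is None:
            errors.append(f"unknown destination {dst_mod}.{dst_port}")
            continue
        if src.direction != "output" or dst.direction != "input":
            errors.append(f"direction mismatch on {src_mod}.{src_port} -> {dst_mod}.{dst_port}")
        if src.width != dst.width:
            errors.append(f"width mismatch on {src_mod}.{src_port} -> {dst_mod}.{dst_port}")
    return errors


errors = check_wires(modules, wires)
print(errors)
```

A check of this kind catches fabricated wiring and undeclared modules, but it cannot tell whether a wire that the specification requires is missing from the graph altogether.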
If this is right
- Reliable hierarchical RTL generation becomes possible for RISC-V processors from natural-language specifications.
- Strong functional correctness is maintained in the generated designs.
- Human intervention drops because the knowledge graph supplies a fixed reference for all modules.
- Interface consistency and dependency correctness are enforced automatically at each generation step.
- Specifications containing mixed prose, figures, and tables can be processed into usable hardware descriptions.
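The step-by-step enforcement described above implies some ordering over submodules. One plausible ordering, assumed here rather than taken from the paper, is to generate leaf modules first so that every parent is written against interfaces that are already fixed. The sketch uses Python's standard `graphlib`; the module names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency relation: module -> set of submodules it instantiates.
depends_on = {
    "rv32i_core": {"alu", "regfile", "decoder"},
    "decoder": {"imm_gen"},
    "alu": set(),
    "regfile": set(),
    "imm_gen": set(),
}

# static_order() yields each module only after all modules it depends on,
# and raises graphlib.CycleError if the dependencies contain a cycle.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

A cycle in the extracted dependencies would surface here as an exception, which is itself a cheap sanity check on the graph.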
Where Pith is reading between the lines
- The same graph-first structure could be tested on hardware designs larger than the RV32I to determine scaling limits.
- Because the intermediate graph is machine-checkable, it might be fed directly into existing formal verification tools to catch issues before RTL simulation.
- The approach could be tried in related engineering tasks where loose documents must be turned into precise modular implementations.
Load-bearing premise
The multi-agent analysis of informal specifications, figures, and tables can produce a complete, accurate knowledge graph that contains no errors or omissions introduced by the language models.
What would settle it
If the RTL generated for the RV32I processor fails standard simulation or formal verification tests against the expected behavior described in the original specification, the claim of reliable generation would not hold.
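A test of this kind can be sketched as a comparison between simulator output and a small golden model. The example below models four RV32I register-register operations; the trace format and the trace values are hypothetical, and the last row is deliberately wrong to show how a mismatch is reported.

```python
MASK = 0xFFFFFFFF  # RV32I registers are 32 bits wide


def to_signed(x):
    """Interpret a 32-bit value as a two's-complement signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def golden_alu(op, a, b):
    """Reference results for a few RV32I register-register operations."""
    if op == "ADD":
        return (a + b) & MASK
    if op == "SUB":
        return (a - b) & MASK
    if op == "SLT":
        return int(to_signed(a) < to_signed(b))
    if op == "SLTU":
        return int(a < b)
    raise ValueError(f"unsupported op {op}")


# Hypothetical simulation trace: (op, rs1 value, rs2 value, observed rd value).
trace = [
    ("ADD", 0xFFFFFFFF, 1, 0),           # wraps around to 0
    ("SUB", 0, 1, 0xFFFFFFFF),           # wraps around to -1
    ("SLT", 0xFFFFFFFF, 1, 1),           # signed: -1 < 1
    ("SLTU", 0xFFFFFFFF, 1, 1),          # deliberately wrong: unsigned result is 0
]

mismatches = [row for row in trace if golden_alu(row[0], row[1], row[2]) != row[3]]
pass_rate = 1 - len(mismatches) / len(trace)
print(mismatches, pass_rate)
```

In practice the trace would come from a simulator run over an established RISC-V test suite rather than from hand-written rows.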
Original abstract
Generating synthesizable Verilog for large, hierarchical hardware designs remains a significant challenge for large language models (LLMs), which struggle to replicate the structured reasoning that human experts employ when translating complex specifications into RTL. When tasked with producing hierarchical Verilog, LLMs frequently lose context across modules, hallucinate interfaces, fabricate inter-module wiring, and fail to maintain structural coherence - failures that intensify as design complexity grows and specifications involve informal prose, figures, and tables that resist direct operationalization. To address these challenges, we present VeriGraphi, a framework that introduces a spec-anchored Knowledge Graph as the architectural substrate driving the RTL generation pipeline. VeriGraphi constructs an HDA, a structured knowledge graph that explicitly encodes module hierarchy, port-level interfaces, wiring semantics, and inter-module dependencies as first-class graph entities and relations. Built through iterative multi-agent analysis of the specification, this Knowledge Graph provides a deterministic, machine-checkable structural scaffold before code generation. Guided by the KG, a progressive coding module incrementally generates pseudo-code and synthesizable RTL while enforcing interface consistency and dependency correctness at each submodule stage. We evaluate VeriGraphi on a benchmark of three representative specification documents from the National Institute of Standards and Technology and their corresponding implementations, and we present an RV32I processor as a detailed case study to illustrate the full pipeline. The results demonstrate that VeriGraphi enables reliable hierarchical RTL generation with minimal human intervention for RISC-V, marking a significant milestone for LLM-generated hardware design while maintaining strong functional correctness.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces VeriGraphi, a multi-agent framework for generating hierarchical RTL Verilog from complex hardware specifications. It builds a Hierarchical Design Architecture (HDA) knowledge graph via iterative LLM analysis of informal prose, figures, and tables to explicitly encode module hierarchy, port interfaces, wiring, and dependencies as a deterministic, machine-checkable scaffold. A progressive coding module then uses this graph to incrementally generate pseudo-code and synthesizable RTL while enforcing interface consistency. The work evaluates the approach on three NIST specification documents and presents a detailed RV32I RISC-V processor case study, claiming reliable hierarchical RTL generation with minimal human intervention and strong functional correctness.
Significance. If the central claims hold with supporting evidence, VeriGraphi would represent a meaningful methodological advance in LLM-assisted hardware design by mitigating context loss, interface hallucinations, and structural incoherence in large hierarchical designs. The use of an explicit, checkable knowledge-graph intermediate representation is a promising architectural choice that could generalize to other complex design domains.
major comments (2)
- [Abstract] The central claim that VeriGraphi achieves 'reliable hierarchical RTL generation ... while maintaining strong functional correctness' for the RV32I case study is unsupported by any reported quantitative metrics (e.g., simulation pass rates, functional coverage, bug counts, or comparisons against direct LLM baselines or human designs). Without these data the soundness of the reliability assertion cannot be assessed.
- [Framework description] HDA knowledge-graph construction: The correctness of the entire pipeline rests on the HDA graph being a complete and accurate encoding of inter-module dependencies. The manuscript describes only LLM-driven iterative multi-agent extraction from informal sources and asserts that the result is 'machine-checkable,' but provides no independent validation step (manual cross-check against a human reference graph, formal consistency proof, or comparison to ground-truth netlists) that would detect omissions or hallucinations in wiring and dependency relations.
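The cross-check against a human reference graph that the second comment asks for could be scored with edge-level precision and recall. The sketch below is one possible formulation, not a procedure from the manuscript; the edge tuples and module names are hypothetical.

```python
def edge_scores(extracted, reference):
    """Compare two sets of (source, destination) edges.

    Returns precision, recall, hallucinated edges, and omitted edges.
    """
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    hallucinated = sorted(extracted - reference)  # in the graph, not in the reference
    omitted = sorted(reference - extracted)       # in the reference, missing from the graph
    return precision, recall, hallucinated, omitted


# Hypothetical human-curated reference wiring.
reference = {
    ("pc", "imem"),
    ("imem", "decoder"),
    ("decoder", "regfile"),
    ("regfile", "alu"),
}

# Hypothetical LLM-extracted wiring with one fabricated and one missing edge.
extracted = {
    ("pc", "imem"),
    ("imem", "decoder"),
    ("decoder", "regfile"),
    ("decoder", "alu"),
}

precision, recall, hallucinated, omitted = edge_scores(extracted, reference)
print(precision, recall, hallucinated, omitted)
```

Reporting hallucinated and omitted edges separately matters here, because the two failure modes named in the comment (fabricated wiring and missing dependencies) affect precision and recall respectively.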
Simulated Author's Rebuttal
We thank the referee for the constructive review and for highlighting areas where additional evidence would strengthen the manuscript. We address each major comment below and will revise the paper accordingly.
- Referee: [Abstract] The central claim that VeriGraphi achieves 'reliable hierarchical RTL generation ... while maintaining strong functional correctness' for the RV32I case study is unsupported by any reported quantitative metrics (e.g., simulation pass rates, functional coverage, bug counts, or comparisons against direct LLM baselines or human designs). Without these data the soundness of the reliability assertion cannot be assessed.
  Authors: We agree that the abstract's assertion of strong functional correctness requires supporting quantitative data to be fully substantiated. The RV32I case study describes successful RTL generation and notes that the resulting design was simulated for basic functional verification, yet we did not report explicit metrics such as pass rates, coverage numbers, bug counts, or baseline comparisons. In the revised manuscript we will add a dedicated evaluation subsection (or table) in the results section that includes these quantitative metrics along with direct LLM baseline comparisons.
  Revision: yes
- Referee: [Framework description] HDA knowledge-graph construction: The correctness of the entire pipeline rests on the HDA graph being a complete and accurate encoding of inter-module dependencies. The manuscript describes only LLM-driven iterative multi-agent extraction from informal sources and asserts that the result is 'machine-checkable,' but provides no independent validation step (manual cross-check against a human reference graph, formal consistency proof, or comparison to ground-truth netlists) that would detect omissions or hallucinations in wiring and dependency relations.
  Authors: The referee correctly notes the absence of an explicit independent validation step for the HDA graph. Although the graph is iteratively refined and its relations are enforced during progressive code generation, the manuscript does not describe a separate manual cross-check, ground-truth comparison, or formal check. We will revise the framework section to add a new paragraph (and possibly a figure) detailing the validation procedures applied during the NIST and RISC-V experiments, including any human review of extracted hierarchy and wiring, thereby making the validation process transparent.
  Revision: yes
Circularity Check
No circularity: framework claims rest on external benchmark evaluation rather than self-definition or fitted inputs.
The paper describes a multi-agent LLM framework that builds a knowledge graph from informal specifications and then uses it to guide progressive RTL code generation. No equations, parameters, or quantitative predictions appear in the derivation chain. The central claim of reliable hierarchical generation for RISC-V is supported by evaluation on NIST benchmarks and a detailed case study, which constitute independent external checks rather than quantities defined by the framework's own outputs. No self-citations, ansatzes, or uniqueness theorems are invoked in a load-bearing way that reduces the result to its inputs by construction. The absence of any fitted-input-called-prediction or self-definitional step keeps the analysis self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Iterative multi-agent analysis of specifications can produce a complete and correct structural knowledge graph encoding hierarchy, interfaces, wiring, and dependencies.
invented entities (1)
- HDA (Hierarchical Design Architecture) knowledge graph: no independent evidence