arXiv: 2604.14550 · v2 · submitted 2026-04-16 · cs.AR · cs.AI · cs.LG · cs.MA · cs.PL

VeriGraphi: A Multi-Agent Framework of Hierarchical RTL Generation for Large Hardware Designs

Sazzadul Islam, Tasnim Tabassum, Hao Zheng

Pith reviewed 2026-05-10 09:28 UTC · model grok-4.3


keywords: RTL generation · knowledge graph · multi-agent framework · Verilog · RISC-V · LLM hardware design · hierarchical design


The pith

A knowledge graph built by multiple agents lets LLMs generate correct hierarchical RTL for large designs like RISC-V with little human help.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models often lose context and create invalid connections when asked to turn complex hardware specifications into multi-module Verilog code. VeriGraphi tackles this by first building a detailed, machine-checkable knowledge graph that records the full module hierarchy, ports, wiring rules, and dependencies. Multiple agents analyze the specification documents step by step to construct this graph from prose, figures, and tables. The graph then directs an incremental process that produces pseudo-code and then synthesizable RTL while checking consistency at every stage. If the method holds, it would allow reliable creation of substantial hardware designs such as processors with far less manual repair than direct LLM prompting.

Core claim

VeriGraphi introduces a spec-anchored Knowledge Graph, termed the Hierarchical Design Architecture (HDA), that explicitly encodes module hierarchy, port-level interfaces, wiring semantics, and inter-module dependencies. The graph is assembled through iterative multi-agent examination of informal specification documents. Guided by this deterministic scaffold, a progressive coding stage generates pseudo-code followed by synthesizable RTL while enforcing interface and dependency rules at each submodule. Evaluation on three NIST benchmark specifications and a detailed RV32I RISC-V case study is reported to show that the pipeline produces functionally correct hierarchical designs with minimal human intervention.
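To make the four kinds of information in the graph concrete, here is a minimal sketch of a data structure that records hierarchy, ports, wiring, and dependencies. It is not the paper's HDA schema, which this page does not reproduce; the class names (`Port`, `Module`, `Wire`, `DesignGraph`) and the example modules are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Port:
    name: str
    direction: str  # "input" or "output"
    width: int      # number of bits


@dataclass
class Module:
    name: str
    ports: Dict[str, Port] = field(default_factory=dict)
    children: List[str] = field(default_factory=list)  # submodules it instantiates


@dataclass(frozen=True)
class Wire:
    src: Tuple[str, str]  # (module name, port name) that drives the wire
    dst: Tuple[str, str]  # (module name, port name) that receives it


@dataclass
class DesignGraph:
    """Hypothetical container; the real HDA schema may differ."""
    modules: Dict[str, Module] = field(default_factory=dict)
    wires: List[Wire] = field(default_factory=list)

    def add_module(self, name, ports, children=()):
        self.modules[name] = Module(name, {p.name: p for p in ports}, list(children))

    def connect(self, src, dst):
        self.wires.append(Wire(src, dst))

    def dependencies(self, name):
        """Submodules that must exist before `name` can be written."""
        return list(self.modules[name].children)


# Illustrative fragment of a processor hierarchy (module names are assumptions).
graph = DesignGraph()
graph.add_module("alu", [Port("a", "input", 32), Port("b", "input", 32),
                         Port("result", "output", 32)])
graph.add_module("regfile", [Port("rs1_data", "output", 32),
                             Port("rd_data", "input", 32)])
graph.add_module("core", [], children=["alu", "regfile"])
graph.connect(("regfile", "rs1_data"), ("alu", "a"))
graph.connect(("alu", "result"), ("regfile", "rd_data"))
```

Keeping ports and wires as explicit records, rather than as free text in a prompt, is what would let later stages look up an interface instead of asking a model to recall it.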

What carries the argument

The spec-anchored Knowledge Graph (HDA), which acts as the central structural scaffold by representing hierarchy, interfaces, and dependencies in a form that can be checked before any code is written.
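One way to read "checked before any code is written" is as a structural lint over the graph. The sketch below is an assumption about what such a check could look like, not the paper's procedure: it flags wires that reference unknown ports, connect the wrong directions, mismatch in width, or drive an input twice. It only covers sibling-to-sibling connections, and the port table is made up for illustration.

```python
def check_wiring(ports, wires):
    """Return a list of problems; an empty list means the wiring is consistent.

    ports: {(module, port): (direction, width)}
    wires: [((src_module, src_port), (dst_module, dst_port)), ...]
    """
    problems = []
    driven = set()
    for src, dst in wires:
        missing = [end for end in (src, dst) if end not in ports]
        for module, port in missing:
            problems.append(f"unknown port {module}.{port}")
        if missing:
            continue
        (src_dir, src_w), (dst_dir, dst_w) = ports[src], ports[dst]
        if src_dir != "output":
            problems.append(f"{src[0]}.{src[1]} is not an output")
        if dst_dir != "input":
            problems.append(f"{dst[0]}.{dst[1]} is not an input")
        if src_w != dst_w:
            problems.append(
                f"width mismatch {src[0]}.{src[1]}[{src_w}] -> {dst[0]}.{dst[1]}[{dst_w}]"
            )
        if dst in driven:
            problems.append(f"{dst[0]}.{dst[1]} has multiple drivers")
        driven.add(dst)
    return problems


# Hypothetical port table for illustration only.
ports = {
    ("regfile", "rs1_data"): ("output", 32),
    ("alu", "a"): ("input", 32),
    ("alu", "b"): ("input", 32),
    ("alu", "result"): ("output", 32),
    ("imm_gen", "imm"): ("output", 12),
}
good = [(("regfile", "rs1_data"), ("alu", "a"))]
bad = [
    (("imm_gen", "imm"), ("alu", "b")),    # 12-bit source into a 32-bit input
    (("alu", "result"), ("pc", "next")),   # destination was never declared
]
problems = check_wiring(ports, bad)
```

A check like this catches internal inconsistency only. It cannot tell whether a consistent graph matches the specification, which is the gap the referee report raises.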

If this is right

Reliable hierarchical RTL generation becomes possible for RISC-V processors from natural-language specifications.
Strong functional correctness is maintained in the generated designs.
Human intervention drops because the knowledge graph supplies a fixed reference for all modules.
Interface consistency and dependency correctness are enforced automatically at each generation step.
Specifications containing mixed prose, figures, and tables can be processed into usable hardware descriptions.
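The idea of enforcing dependencies at each generation step can be illustrated with a dependency-ordered traversal. The page does not state how VeriGraphi actually schedules submodules, so this is a sketch under the assumption that leaf modules are generated first; the hierarchy is hypothetical, and `graphlib` is the Python standard-library module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical hierarchy: each module maps to the submodules it instantiates.
hierarchy = {
    "core": {"datapath", "control"},
    "datapath": {"alu", "regfile"},
    "control": {"decoder"},
    "alu": set(),
    "regfile": set(),
    "decoder": set(),
}


def generation_order(hierarchy):
    """Order modules so every submodule precedes the modules that instantiate it."""
    # TopologicalSorter treats each mapped value as the set of predecessors,
    # so submodules are emitted before their parents.
    return list(TopologicalSorter(hierarchy).static_order())


order = generation_order(hierarchy)
```

With this ordering, the interface of every submodule is already fixed when its parent is generated. `TopologicalSorter` raises `graphlib.CycleError` on a circular hierarchy, which would itself indicate an extraction error in the graph.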

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same graph-first structure could be tested on hardware designs larger than the RV32I to determine scaling limits.
Because the intermediate graph is machine-checkable, it might be fed directly into existing formal verification tools to catch issues before RTL simulation.
The approach could be tried in related engineering tasks where loose documents must be turned into precise modular implementations.

Load-bearing premise

The multi-agent analysis of informal specifications, figures, and tables can produce a complete, accurate knowledge graph that contains no errors or omissions introduced by the language models.

What would settle it

If the RTL generated for the RV32I processor fails standard simulation or formal verification tests against the expected behavior described in the original specification, the claim of reliable generation would not hold.
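A simulation-based test of this kind usually compares the design's outputs with an independent reference model. The sketch below is illustrative only and is not taken from the paper: it models a few RV32I register-register operations and computes a pass rate over observations that would, in practice, come from a simulation log. The function names and the observation format are assumptions.

```python
MASK = 0xFFFFFFFF  # RV32I registers are 32 bits wide


def to_signed(x):
    """Interpret a 32-bit value as a two's-complement signed integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu_reference(op, a, b):
    """Expected 32-bit result for a handful of RV32I register-register operations."""
    a &= MASK
    b &= MASK
    shamt = b & 0x1F  # shifts use only the low five bits of the second operand
    if op == "add":
        return (a + b) & MASK
    if op == "sub":
        return (a - b) & MASK
    if op == "slt":
        return int(to_signed(a) < to_signed(b))
    if op == "sltu":
        return int(a < b)
    if op == "sll":
        return (a << shamt) & MASK
    if op == "srl":
        return a >> shamt
    if op == "sra":
        return (to_signed(a) >> shamt) & MASK
    raise ValueError(f"unsupported op: {op}")


def pass_rate(observations):
    """observations: iterable of (op, a, b, dut_result) tuples."""
    observations = list(observations)
    if not observations:
        raise ValueError("no observations to score")
    passed = sum(alu_reference(op, a, b) == (dut & MASK)
                 for op, a, b, dut in observations)
    return passed / len(observations)
```

A pass rate computed this way is the sort of quantitative metric the referee report asks for. It covers only the operations modelled here, so it is a floor on the verification effort rather than evidence of full functional correctness.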

Figures

Figures reproduced from arXiv: 2604.14550 by Hao Zheng, Sazzadul Islam, Tasnim Tabassum.

**Figure 1.** System Architecture. [PITH_FULL_IMAGE:figures/full_fig_p003_1.png]

**Figure 2.** Hierarchical Design Architecture (HDA) for RV32I. [PITH_FULL_IMAGE:figures/full_fig_p007_2.png]

Original abstract

Generating synthesizable Verilog for large, hierarchical hardware designs remains a significant challenge for large language models (LLMs), which struggle to replicate the structured reasoning that human experts employ when translating complex specifications into RTL. When tasked with producing hierarchical Verilog, LLMs frequently lose context across modules, hallucinate interfaces, fabricate inter-module wiring, and fail to maintain structural coherence - failures that intensify as design complexity grows and specifications involve informal prose, figures, and tables that resist direct operationalization. To address these challenges, we present VeriGraphi, a framework that introduces a spec-anchored Knowledge Graph as the architectural substrate driving the RTL generation pipeline. VeriGraphi constructs an HDA, a structured knowledge graph that explicitly encodes module hierarchy, port-level interfaces, wiring semantics, and inter-module dependencies as first-class graph entities and relations. Built through iterative multi-agent analysis of the specification, this Knowledge Graph provides a deterministic, machine-checkable structural scaffold before code generation. Guided by the KG, a progressive coding module incrementally generates pseudo-code and synthesizable RTL while enforcing interface consistency and dependency correctness at each submodule stage. We evaluate VeriGraphi on a benchmark of three representative specification documents from the National Institute of Standards and Technology and their corresponding implementations, and we present an RV32I processor as a detailed case study to illustrate the full pipeline. The results demonstrate that VeriGraphi enables reliable hierarchical RTL generation with minimal human intervention for RISC-V, marking a significant milestone for LLM-generated hardware design while maintaining strong functional correctness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

VeriGraphi builds a spec-derived knowledge graph to steer multi-agent hierarchical RTL generation, but the absence of independent checks on that graph leaves the reliability claims hard to assess.


The core idea is to extract a structured HDA knowledge graph from informal specs, figures, and tables using iterative multi-agent LLM analysis, then use that graph as a fixed scaffold for progressive pseudo-code and Verilog generation while enforcing interfaces and dependencies at each step. This is a reasonable response to the well-known context-loss and hallucination problems in direct LLM RTL output for designs like RISC-V processors. The framework description is clear on the pipeline stages and the intent to keep the graph machine-checkable before code starts. The RISC-V case study and the three NIST benchmark documents give a concrete illustration of how the process is meant to work with minimal human fixes. That part is useful for anyone thinking about LLM-assisted hardware flows.

The main soft spot is exactly the one flagged in the stress test: the graph itself is produced solely by LLM agents reading prose and diagrams, with no described external validation step such as manual cross-check against a human reference graph, formal consistency checks, or comparison to known correct dependencies. Any missed inter-module relation or wrong port semantics would propagate directly into the coding stage and undermine the functional-correctness guarantee. The abstract claims strong results on the RV32I example, but without numbers on error rates, baseline comparisons, or how graph accuracy was measured, it is difficult to tell whether the method actually delivers on that. The evaluation section appears to rest on the case study rather than controlled experiments.

This work is aimed at researchers in LLM-driven EDA and custom silicon automation who already follow multi-agent and graph-based approaches. It is coherent enough on its own terms to warrant a serious referee, provided the authors can supply the missing validation details and quantitative metrics in revision. I would send it to review rather than desk-reject.

Referee Report

2 major / 0 minor

Summary. The paper introduces VeriGraphi, a multi-agent framework for generating hierarchical RTL Verilog from complex hardware specifications. It builds a Hierarchical Design Architecture (HDA) knowledge graph via iterative LLM analysis of informal prose, figures, and tables to explicitly encode module hierarchy, port interfaces, wiring, and dependencies as a deterministic, machine-checkable scaffold. A progressive coding module then uses this graph to incrementally generate pseudo-code and synthesizable RTL while enforcing interface consistency. The work evaluates the approach on three NIST specification documents and presents a detailed RV32I RISC-V processor case study, claiming reliable hierarchical RTL generation with minimal human intervention and strong functional correctness.

Significance. If the central claims hold with supporting evidence, VeriGraphi would represent a meaningful methodological advance in LLM-assisted hardware design by mitigating context loss, interface hallucinations, and structural incoherence in large hierarchical designs. The use of an explicit, checkable knowledge-graph intermediate representation is a promising architectural choice that could generalize to other complex design domains.

major comments (2)

[Abstract] The central claim that VeriGraphi achieves 'reliable hierarchical RTL generation ... while maintaining strong functional correctness' for the RV32I case study is unsupported by any reported quantitative metrics (e.g., simulation pass rates, functional coverage, bug counts, or comparisons against direct LLM baselines or human designs). Without these data the soundness of the reliability assertion cannot be assessed.

[Framework description] HDA knowledge-graph construction: The correctness of the entire pipeline rests on the HDA graph being a complete and accurate encoding of inter-module dependencies. The manuscript describes only LLM-driven iterative multi-agent extraction from informal sources and asserts that the result is 'machine-checkable,' but provides no independent validation step (manual cross-check against a human reference graph, formal consistency proof, or comparison to ground-truth netlists) that would detect omissions or hallucinations in wiring and dependency relations.
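The cross-check against a human reference graph that the second comment asks for could be quantified as precision and recall over wiring edges. This is a sketch of one possible metric, not something the manuscript reports; the edge encoding and the example edges are hypothetical.

```python
def edge_scores(extracted, reference):
    """Compare extracted wiring edges with a human-written reference set.

    Each edge is a (source, destination) pair of "module.port" strings.
    """
    extracted, reference = set(extracted), set(reference)
    true_pos = len(extracted & reference)
    # By convention here, an empty set scores 1.0 rather than dividing by zero.
    precision = true_pos / len(extracted) if extracted else 1.0
    recall = true_pos / len(reference) if reference else 1.0
    return {
        "precision": precision,                        # share of extracted edges that are real
        "recall": recall,                              # share of real edges that were found
        "missing": sorted(reference - extracted),      # omissions
        "spurious": sorted(extracted - reference),     # hallucinated connections
    }


# Hypothetical edges for illustration only.
reference = {
    ("regfile.rs1_data", "alu.a"),
    ("regfile.rs2_data", "alu.b"),
    ("alu.result", "regfile.rd_data"),
}
extracted = {
    ("regfile.rs1_data", "alu.a"),
    ("alu.result", "regfile.rd_data"),
    ("alu.result", "decoder.opcode"),
}
scores = edge_scores(extracted, reference)
```

Reporting the `missing` and `spurious` lists alongside the two ratios separates omissions from hallucinated wiring, which are the two failure modes the comment names.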

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for highlighting areas where additional evidence would strengthen the manuscript. We address each major comment below and will revise the paper accordingly.


Referee: [Abstract] The central claim that VeriGraphi achieves 'reliable hierarchical RTL generation ... while maintaining strong functional correctness' for the RV32I case study is unsupported by any reported quantitative metrics (e.g., simulation pass rates, functional coverage, bug counts, or comparisons against direct LLM baselines or human designs). Without these data the soundness of the reliability assertion cannot be assessed.

Authors: We agree that the abstract's assertion of strong functional correctness requires supporting quantitative data to be fully substantiated. The RV32I case study describes successful RTL generation and notes that the resulting design was simulated for basic functional verification, yet we did not report explicit metrics such as pass rates, coverage numbers, bug counts, or baseline comparisons. In the revised manuscript we will add a dedicated evaluation subsection (or table) in the results section that includes these quantitative metrics along with direct LLM baseline comparisons.

Revision: yes

Referee: [Framework description] HDA knowledge-graph construction: The correctness of the entire pipeline rests on the HDA graph being a complete and accurate encoding of inter-module dependencies. The manuscript describes only LLM-driven iterative multi-agent extraction from informal sources and asserts that the result is 'machine-checkable,' but provides no independent validation step (manual cross-check against a human reference graph, formal consistency proof, or comparison to ground-truth netlists) that would detect omissions or hallucinations in wiring and dependency relations.

Authors: The referee correctly notes the absence of an explicit independent validation step for the HDA graph. Although the graph is iteratively refined and its relations are enforced during progressive code generation, the manuscript does not describe a separate manual cross-check, ground-truth comparison, or formal check. We will revise the framework section to add a new paragraph (and possibly a figure) detailing the validation procedures applied during the NIST and RISC-V experiments, including any human review of extracted hierarchy and wiring, thereby making the validation process transparent.

Revision: yes

Circularity Check

0 steps flagged

No circularity: framework claims rest on external benchmark evaluation rather than self-definition or fitted inputs.


The paper describes a multi-agent LLM framework that builds a knowledge graph from informal specifications and then uses it to guide progressive RTL code generation. No equations, parameters, or quantitative predictions appear in the derivation chain. The central claim of reliable hierarchical generation for RISC-V is supported by evaluation on NIST benchmarks and a detailed case study, which constitute independent external checks rather than quantities defined by the framework's own outputs. No self-citations, ansatzes, or uniqueness theorems are invoked in a load-bearing way that reduces the result to its inputs by construction. The absence of any fitted-input-called-prediction or self-definitional step keeps the analysis self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the domain assumption that LLMs can reliably extract and structure hardware design knowledge from informal text and diagrams into a graph that is sufficient to drive correct code generation.

axioms (1)

domain assumption: Iterative multi-agent analysis of specifications can produce a complete and correct structural knowledge graph encoding hierarchy, interfaces, wiring, and dependencies.
Invoked as the foundation of the HDA construction step before any code generation occurs.

invented entities (1)

HDA (Hierarchical Design Architecture) knowledge graph · no independent evidence
purpose: Serves as the deterministic, machine-checkable structural scaffold that guides all subsequent RTL generation steps.
New architectural substrate introduced to address LLM context and consistency failures.


