LongRTL: Graph-Similarity-Guided LLM-driven Long Context RTL Optimization

Bei Yu; Che-Kuan Shen; Shuo Yin; Tsung-Yi Ho; Xiangfei Hu; Xufeng Yao; Yuchen Liu; Yuyang Ye

arxiv: 2606.08944 · v1 · pith:R3IGKW4Gnew · submitted 2026-06-08 · 💻 cs.AR · cs.PL

Yuyang Ye, Che-Kuan Shen, Xiangfei Hu, Yuchen Liu, Shuo Yin, Xufeng Yao, Bei Yu, Tsung-Yi Ho

Pith reviewed 2026-06-27 15:00 UTC · model grok-4.3

classification 💻 cs.AR cs.PL

keywords RTL optimization · large language models · graph similarity · abstract syntax tree · retrieval augmented generation · hardware design · multi-agent framework

The pith

Graph similarity on ASTs lets LLMs optimize long RTL designs by partitioning them into subtrees, optimizing the parts with retrieval, and reassembling them while preserving overall function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that breaks long register-transfer level (RTL) hardware code into smaller pieces by matching abstract syntax tree (AST) subgraphs against known design templates. Three agents then handle the pieces separately: one partitions, one optimizes each piece with multi-modal retrieval of examples, and one puts them back together using logic-aware ordering. This targets the problem that direct LLM use fails on real industrial RTL because the code is too long, tangled, and poorly structured. If the approach works, it would let AI tools improve large hardware descriptions that today require heavy manual effort. The method is presented as a way to move from small test cases to practical scale.
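The reassembly step can be made concrete with a small sketch. The abstract does not define "logic-aware ordering", so the code below assumes one plausible reading: place each submodule after the submodules that drive its inputs, which is a topological order over signal dependencies. The function name `reassembly_order`, the `reads`/`writes` dictionary layout, and the signal names are all hypothetical and do not come from the paper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def reassembly_order(submodules):
    """Order submodules so each appears after the submodules driving its inputs.

    `submodules` maps a name to {"reads": set_of_signals, "writes": set_of_signals}.
    This layout is an illustrative assumption, not the paper's data model.
    """
    # Map every written signal to the submodule that drives it.
    driver = {sig: name for name, io in submodules.items() for sig in io["writes"]}
    # For each submodule, collect the other submodules it depends on.
    graph = {
        name: {driver[sig] for sig in io["reads"]
               if sig in driver and driver[sig] != name}
        for name, io in submodules.items()
    }
    # static_order() yields predecessors first and raises graphlib.CycleError
    # on a dependency cycle (for example, feedback between submodules).
    return list(TopologicalSorter(graph).static_order())


# Hypothetical three-stage datapath, listed out of order on purpose.
PARTS = {
    "alu": {"reads": {"op_a", "op_b"}, "writes": {"alu_out"}},
    "decode": {"reads": {"instr"}, "writes": {"op_a", "op_b"}},
    "writeback": {"reads": {"alu_out"}, "writes": {"rd_data"}},
}
order = reassembly_order(PARTS)
```

Here `order` lists `decode` before `alu` before `writeback`. An ordering of this kind only arranges the pieces; it does not by itself establish that the reassembled design behaves like the original.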

Core claim

The authors claim that AST graph similarity to reusable design templates identifies semantically meaningful subtrees for independent optimization; multi-modal RAG then generates improved submodule code; and logic-aware Graph-RAG reassembly produces a complete design that maintains global functional equivalence, thereby scaling LLM optimization to long-context industrial RTL.

What carries the argument

The Partition Agent that decomposes RTL designs into AST subtrees guided by graph similarity to reusable design templates.
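To illustrate what similarity-guided partitioning might look like, the sketch below scores every subtree of a toy AST against a template and picks the best match. The supplied text does not say which similarity metric the paper uses, so a multiset Jaccard score over parent-child label pairs stands in for it here. The tuple-based tree encoding, the node labels, and the names `edge_bag`, `similarity`, and `best_match` are assumptions made for illustration only.

```python
from collections import Counter


def edge_bag(node, bag=None):
    """Collect a multiset of (parent_label, child_label) edges from a tree.

    A tree is a (label, [children]) tuple; labels are hypothetical.
    """
    if bag is None:
        bag = Counter()
    label, children = node
    for child in children:
        bag[(label, child[0])] += 1
        edge_bag(child, bag)
    return bag


def similarity(a, b):
    """Multiset Jaccard similarity between two trees' edge bags, in [0, 1]."""
    bag_a, bag_b = edge_bag(a), edge_bag(b)
    union = sum((bag_a | bag_b).values())  # per-edge maximum counts
    if union == 0:  # both trees are single leaves
        return 1.0 if a[0] == b[0] else 0.0
    return sum((bag_a & bag_b).values()) / union  # per-edge minimum counts


def subtrees(node):
    """Yield every subtree of a tree, root first."""
    yield node
    for child in node[1]:
        yield from subtrees(child)


def best_match(design, templates):
    """Return (score, template_name, subtree) for the best-scoring pairing."""
    return max(
        ((similarity(sub, tpl), name, sub)
         for sub in subtrees(design)
         for name, tpl in templates.items()),
        key=lambda t: t[0],
    )


# Toy template: an always block holding a two-way conditional assignment.
MUX = ("Always", [("If", [("Cond", []), ("Assign", []), ("Assign", [])])])
DESIGN = ("Module", [
    ("Assign", [("Plus", [])]),
    ("Always", [("If", [("Cond", []), ("Assign", []), ("Assign", [])])]),
])
score, name, sub = best_match(DESIGN, {"mux2": MUX})
```

In this toy case the `Always` subtree matches the template exactly, while the whole module scores lower because of its extra edges. A purely structural score like this one cannot tell apart subtrees that differ only in signal widths or constants, which is the failure mode the load-bearing premise has to rule out.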

If this is right

Enables structure-aware optimization on entangled, poorly modularized long RTL code.
Preserves global functional equivalence through the reconstruction step.
Scales LLM use from toy examples to industrial-scale hardware codebases.
Combines partitioning, multi-modal RAG optimization, and Graph-RAG reassembly into one workflow.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same partitioning-plus-reassembly pattern could apply to long code in other structured domains such as verification scripts or embedded software.
Automated equivalence checking during reassembly might be added as an explicit step to catch errors earlier.
The method could be tested by measuring optimization quality on public large RTL repositories with known golden outputs.
Integration with existing simulation or synthesis tools would give immediate feedback on whether each optimized subtree still works.

Load-bearing premise

That AST graph similarity to design templates will reliably find subtrees whose independent optimization and logic-aware reassembly will keep the original design's behavior unchanged.

What would settle it

Apply the full pipeline to a known large open RTL design, then run the original and optimized versions through the same testbench and check whether every output vector matches.
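The check described above can be sketched in miniature. The code below uses two Python functions as stand-ins for simulating an original and an optimized design, drives both with the same stimulus, and reports the first input vector on which their outputs differ. The functions `original`, `optimized`, and `first_mismatch` and the 8-bit datapath they model are invented for illustration; a real flow would compare simulator outputs for the actual RTL.

```python
from itertools import product


def original(a, b, sel):
    """Reference behavior: 8-bit result, add when sel is 1, else pass a through."""
    return ((a + b) if sel else a) & 0xFF


def optimized(a, b, sel):
    """Candidate rewrite: gate operand b with sel instead of muxing the result."""
    return (a + (b if sel else 0)) & 0xFF


def first_mismatch(ref, dut):
    """Apply identical stimulus to both models.

    Returns the first (a, b, sel) vector whose outputs differ, or None.
    The toy input space is small enough to enumerate exhaustively.
    """
    for vec in product(range(256), range(256), range(2)):
        if ref(*vec) != dut(*vec):
            return vec
    return None


def buggy(a, b, sel):
    """A faulty rewrite that ignores sel, included to show a detected failure."""
    return (a + b) & 0xFF


ok = first_mismatch(original, optimized)
bad = first_mismatch(original, buggy)
```

Here `ok` is `None` and `bad` is the first failing vector. Exhaustive enumeration works only because the toy has 17 input bits; for a large design, matching outputs on a finite testbench is evidence rather than proof, and a formal equivalence check would be needed for a guarantee.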

Original abstract

Large Language Models (LLMs) show great promise in RTL code generation and optimization. However, real-world RTL designs are typically long, entangled, and poorly modularized, posing a major challenge due to context-length limitations and lack of structure. To overcome these obstacles, we propose a scalable LLM-based RTL optimization framework guided by graph similarity. Our method introduces three collaborative agents: (1) a Partition Agent that decomposes RTL designs into semantically meaningful AST subtrees, guided by AST graph similarity to reusable design templates; (2) an Optimization Agent that generates RTL submodule code based on partitioned subtrees using multi-modal Retrieval-Augmented Generation (RAG) with both AST and RTL guidance; and (3) a Reconstruction Agent that reassembles optimized submodules based on logic-aware ordering and Graph-RAG prompting, ensuring global functional equivalence. Together, these components enable robust, structure-aware optimization of long-context RTL designs, bridging the gap between toy examples and industrial-scale hardware codebases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper describes a three-agent LLM framework for long RTL optimization via AST graph similarity but supplies no experiments or equivalence checks.

The main thing here is a proposed three-agent system for LLM-driven RTL optimization: a Partition Agent that splits designs using AST graph similarity to reusable templates, an Optimization Agent that applies multi-modal RAG on AST and RTL data, and a Reconstruction Agent that reassembles via logic-aware ordering and Graph-RAG. The abstract presents this as a way to scale beyond toy examples to industrial hardware code.

What is new is the specific integration of graph-similarity partitioning with multi-modal RAG and logic-aware reassembly for the RTL domain. The paper does a clear job naming the real constraints—context length and tangled structure in actual designs—and sketching a decomposition strategy that tries to respect module semantics.

The soft spots are straightforward. The description claims the reconstruction step preserves global functional equivalence, yet it gives no mechanism, test, or post-reassembly check for that. There are no benchmarks, error rates, simulation results, or comparisons to baselines. The stress-test concern about missing equivalence preservation is on target: without a post-reassembly check, any LLM-induced change in signal widths, state machines, or timing would go undetected. The full paper might contain those details, but nothing in the supplied text supports the central claim.

This is for people working on LLM tools in electronic design automation. A reader looking for concrete ideas on structuring multi-agent flows for hardware code could extract some practical pointers, though the lack of validation limits how far the ideas can be taken.

It deserves a serious referee to assess whether the full version includes reproducible experiments on real designs and proper verification steps. I would recommend sending it to peer review.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes LongRTL, a graph-similarity-guided LLM framework for optimizing long-context RTL designs. It describes three collaborative agents: a Partition Agent that decomposes RTL into AST subtrees via graph similarity to reusable templates, an Optimization Agent that generates optimized submodule code using multi-modal RAG, and a Reconstruction Agent that reassembles submodules via logic-aware ordering and Graph-RAG prompting while claiming to preserve global functional equivalence. The work positions this as a scalable approach bridging toy examples to industrial-scale hardware codebases.

Significance. If the equivalence-preservation and scalability claims hold, the multi-agent graph-guided decomposition could meaningfully extend LLM applicability to complex, long RTL modules that exceed context limits. The combination of AST similarity partitioning with RAG-based optimization is a plausible direction for structure-aware hardware code improvement. However, the complete absence of any experimental results, benchmarks, error metrics, or verification procedures in the manuscript leaves the practical significance unassessable.

major comments (3)

[Abstract] The central guarantee that the Reconstruction Agent 'ensures global functional equivalence' is asserted without any described mechanism (formal verification, equivalence checking, post-reassembly simulation, or even functional test vectors). This assumption is load-bearing for the entire framework.
[Abstract] No experimental results, benchmarks, error metrics, comparisons to baselines, or even small-scale case studies are reported to support the claim of 'robust, structure-aware optimization' for long-context or industrial-scale RTL. The soundness of the central claim therefore cannot be evaluated from the provided text.
[Abstract] The Partition Agent's reliance on AST graph similarity to 'reusable design templates' to produce 'semantically meaningful subtrees' whose independent optimization preserves semantics is stated without any supporting evidence, test cases, or discussion of failure modes (e.g., altered signal widths or state-machine semantics).

minor comments (1)

[Abstract] The abstract introduces several agent names and technical terms (Partition Agent, multi-modal RAG, Graph-RAG) without defining their precise interfaces or data flows, which would benefit from a high-level diagram or pseudocode even in an early draft.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment point by point below, with planned revisions where appropriate. The manuscript presents a framework proposal, and we acknowledge areas where additional clarification or discussion is warranted.

Referee: [Abstract] The central guarantee that the Reconstruction Agent 'ensures global functional equivalence' is asserted without any described mechanism (formal verification, equivalence checking, post-reassembly simulation, or even functional test vectors). This assumption is load-bearing for the entire framework.

Authors: We agree that the manuscript asserts preservation of global functional equivalence via the Reconstruction Agent's logic-aware ordering and Graph-RAG prompting without specifying a verification mechanism. The design intends to maintain equivalence through structural preservation during reassembly, but no formal or empirical verification procedure is detailed. We will revise the relevant sections to include a discussion of verification strategies, such as post-reconstruction simulation or equivalence checking where feasible.

Revision: yes

Referee: [Abstract] No experimental results, benchmarks, error metrics, comparisons to baselines, or even small-scale case studies are reported to support the claim of 'robust, structure-aware optimization' for long-context or industrial-scale RTL. The soundness of the central claim therefore cannot be evaluated from the provided text.

Authors: The current manuscript focuses on describing the proposed three-agent framework as a conceptual approach to overcome context limitations in RTL optimization. We acknowledge the absence of empirical results, benchmarks, or case studies, which limits direct assessment of practical performance. We will revise the abstract and introduction to more clearly position the work as a framework proposal and outline directions for future empirical evaluation.

Revision: partial

Referee: [Abstract] The Partition Agent's reliance on AST graph similarity to 'reusable design templates' to produce 'semantically meaningful subtrees' whose independent optimization preserves semantics is stated without any supporting evidence, test cases, or discussion of failure modes (e.g., altered signal widths or state-machine semantics).

Authors: We recognize that the Partition Agent description relies on AST graph similarity without providing concrete evidence, test cases, or analysis of failure modes. The approach uses graph similarity metrics on ASTs to identify subtrees aligned with reusable templates, with the intent of preserving semantics through structural correspondence. We will revise to elaborate on the similarity computation and add a discussion of assumptions and potential edge cases such as changes in signal semantics or state machines.

Revision: yes

Circularity Check

0 steps flagged

No circularity: framework proposal with asserted properties but no self-referential derivation or fitted predictions.

The paper describes a three-agent LLM framework for RTL optimization. The Partition Agent uses AST graph similarity to templates, the Optimization Agent uses multi-modal RAG, and the Reconstruction Agent uses logic-aware ordering and Graph-RAG to 'ensure global functional equivalence.' This is a methodological claim about component behavior rather than a derivation chain, equation, or prediction that reduces to its own inputs by construction. No equations, fitted parameters, self-citations as load-bearing premises, or uniqueness theorems appear in the abstract or described structure. The equivalence guarantee is presented as an outcome of the agent design, not derived from or equivalent to prior fitted results within the paper. The work is therefore self-contained as an engineering proposal without the circular patterns enumerated.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no information on free parameters, axioms, or invented entities.
