
arxiv: 2604.14989 · v2 · submitted 2026-04-16 · 💻 cs.AI · cs.AR

Dr. RTL: Autonomous Agentic RTL Optimization through Tool-Grounded Self-Improvement

Pith reviewed 2026-05-10 10:42 UTC · model grok-4.3

classification 💻 cs.AI cs.AR
keywords RTL optimization · agentic framework · timing optimization · skill learning · multi-agent systems · EDA automation · PPA improvement

The pith

Dr. RTL deploys a multi-agent system to rewrite RTL code, evaluate changes with real tools, and distill successes into a reusable skill library, achieving an average 21 percent improvement in worst negative slack (WNS) over a commercial synthesis tool on 20 real-world designs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Dr. RTL as an autonomous framework that applies large language models in a multi-agent loop to optimize register-transfer level (RTL) hardware descriptions, primarily for timing, while also tracking area and power. It replaces unrealistic test setups with an industrial synthesis flow and more complex designs, running closed-loop cycles of critical-path analysis, parallel code rewrites, tool-based scoring, and extraction of winning patterns into a shared library. The library currently holds 47 pattern-strategy pairs that support reuse across designs, and it can keep growing over time. A sympathetic reader would care because prior automatic methods were evaluated on manually degraded, small-scale designs with weak open-source tools, leaving practical chip design dependent on slow manual tuning.

Core claim

Dr. RTL performs closed-loop optimization through a multi-agent framework for critical-path analysis, parallel RTL rewriting, and tool-based evaluation. It introduces group-relative skill learning, which compares parallel RTL rewrites and distills the optimization experience into an interpretable skill library containing 47 pattern-strategy entries for cross-design reuse. Evaluated on 20 real-world RTL designs, it achieves average WNS/TNS improvements of 21 percent and 17 percent with a 6 percent area reduction over the industry-leading commercial synthesis tool.

What carries the argument

A multi-agent closed-loop system that analyzes critical paths, generates parallel RTL rewrites, scores them with commercial EDA tools, and applies group-relative skill learning to build and reuse an interpretable library of optimization patterns and strategies.
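The control flow of such a loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `closed_loop`, `toy_analyze`, `toy_propose`, and `toy_evaluate` are hypothetical names, and the toy stand-ins replace the timing-analysis agent, the rewriting agents, and the EDA tool run so that the sketch executes end to end.

```python
def closed_loop(design, analyze, propose, evaluate, library, iterations=3, width=4):
    """Keep the best design seen so far; each round scores `width` parallel rewrites."""
    best, best_wns = design, evaluate(design)
    for _ in range(iterations):
        pattern = analyze(best)                           # critical-path analysis
        group = propose(best, pattern, library, width)    # parallel rewrites: (strategy, design) pairs
        scored = [(evaluate(d), s, d) for s, d in group]  # tool-based evaluation
        wns, strategy, winner = max(scored, key=lambda t: t[0])
        if wns > best_wns:                                # accept only measured gains
            best, best_wns = winner, wns
            library.append((pattern, strategy))           # record what worked
    return best, best_wns


# Toy stand-ins (assumptions): a "design" is just the logic depth of its critical path.
CLOCK_NS, GATE_NS = 1.0, 0.2

def toy_evaluate(depth):
    return CLOCK_NS - depth * GATE_NS                     # WNS in ns, higher is better

def toy_analyze(depth):
    return f"depth-{depth} chain"

def toy_propose(depth, pattern, library, width):
    return [(f"rebalance by {k}", max(1, depth - k)) for k in range(width)]


library = []
best, wns = closed_loop(8, toy_analyze, toy_propose, toy_evaluate, library,
                        iterations=2, width=3)
print(best, round(wns, 2), library)
```

The acceptance test compares against the best measured result rather than the previous round, so a bad group of rewrites cannot make the design worse.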

Load-bearing premise

The multi-agent closed-loop process with parallel rewriting and tool-grounded evaluation produces genuinely superior and generalizable optimizations rather than improvements tied to the specific 20 designs or the particular commercial tool baseline.

What would settle it

Applying Dr. RTL to a fresh collection of RTL designs never seen during its skill-library construction and checking whether the reported WNS, TNS, and area gains still appear when measured against the same commercial synthesis tool.
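One way such a held-out check could be scored is a paired comparison per design. The sketch below is an assumption-laden illustration: this page does not give the paper's exact metric definition, so `relative_gain` assumes the improvement is measured relative to the magnitude of the baseline WNS, and the slack values (in picoseconds) are made up. `signed_rank_statistic` computes the two rank sums used by the Wilcoxon signed-rank test.

```python
from statistics import mean


def relative_gain(baseline_wns, new_wns):
    """Fractional WNS improvement; assumes baseline_wns is non-zero (a failing path)."""
    return (new_wns - baseline_wns) / abs(baseline_wns)


def signed_rank_statistic(diffs):
    """Return (W+, W-): rank sums of positive and negative paired differences.

    Zero differences are dropped and tied magnitudes share their average rank.
    """
    nonzero = [d for d in diffs if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1            # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return w_plus, w_minus


# Illustrative numbers only: WNS in picoseconds for four hypothetical held-out designs.
baseline_ps = [-500, -400, -800, -250]
dr_rtl_ps = [-420, -300, -830, -190]

gains = [relative_gain(b, n) for b, n in zip(baseline_ps, dr_rtl_ps)]
diffs = [n - b for b, n in zip(baseline_ps, dr_rtl_ps)]
print(mean(gains), signed_rank_statistic(diffs))
```

A small W- relative to the total rank sum indicates that gains are consistent across designs rather than driven by one outlier; a library routine such as `scipy.stats.wilcoxon` can turn the paired samples into a p-value.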

Figures

Figures reproduced from arXiv: 2604.14989 by Fengbin Tu, Jing Wang, Junxian He, Shang Liu, Wenji Fang, Yao Lu, Zhiyao Xie, Ziyan Guo.

Figure 1: Dr. RTL iteratively optimizes RTL PPA via closed …
Figure 2: Proposed industrial-standard RTL optimization evaluation and overview of Dr. RTL for agentic RTL optimization.
Figure 3: Multi-agent framework of Dr. RTL. Dr. RTL coordinates timing analysis, RTL optimization, and evaluation agents in a …
Figure 4: Group-relative skill learning. Parallel RTL candi…
Figure 5: Optimization trajectories of Dr. RTL across iterations of two design examples (router and pcie). WNS, TNS, area, and …
Figure 6: Ablation studies of Dr. RTL on register-level slack …
Figure 7: Distilling hierarchical trajectory into skill library. Optimization trajectories are organized across iterations, parallel …
Figure 9: Impact of different EDA tools on Dr. RTL results.
Original abstract

Recent advances in large language models (LLMs) have sparked growing interest in automatic RTL optimization for better performance, power, and area (PPA). However, existing methods are still far from realistic RTL optimization. Their evaluation settings are often unrealistic: they are tested on manually degraded, small-scale RTL designs and rely on weak open-source tools. Their optimization methods are also limited, relying on coarse design-level feedback and simple pre-defined rewriting rules. To address these limitations, we present Dr. RTL, an agentic framework for RTL timing optimization in a realistic evaluation environment, with continual self-improvement through reusable optimization skills. We establish a realistic evaluation setting with more challenging RTL designs and an industrial EDA workflow. Within this setting, Dr. RTL performs closed-loop optimization through a multi-agent framework for critical-path analysis, parallel RTL rewriting, and tool-based evaluation. We further introduce group-relative skill learning, which compares parallel RTL rewrites and distills the optimization experience into an interpretable skill library. Currently, this library contains 47 pattern--strategy entries for cross-design reuse to improve PPA and accelerate convergence, and it can continue evolving over time. Evaluated on 20 real-world RTL designs, Dr. RTL achieves average WNS/TNS improvements of 21%/17% with a 6% area reduction over the industry-leading commercial synthesis tool.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Dr. RTL, a multi-agent LLM framework for autonomous RTL timing optimization. It performs closed-loop critical-path analysis, parallel rewriting, and tool-grounded evaluation, augmented by group-relative skill learning that distills experience into a reusable, interpretable library of 47 pattern-strategy entries. In a claimed realistic industrial EDA setting, the system is evaluated on 20 real-world RTL designs and reports average improvements of 21% WNS, 17% TNS, and 6% area reduction relative to an industry-leading commercial synthesis tool.

Significance. If the performance deltas are shown to arise from the agentic loop and skill library rather than from design selection or baseline configuration, the work would constitute a meaningful step toward self-improving, tool-integrated agents for hardware design. The emphasis on cross-design reuse via an explicit skill library and the use of real commercial EDA flows are positive features that distinguish it from prior LLM-based RTL work.

major comments (3)
  1. [§5 and abstract] §5 (Evaluation) and abstract: The headline claim of 21%/17% WNS/TNS and 6% area gains is presented as averages over 20 designs, yet no per-design results, standard deviations, number of independent runs, or statistical significance tests are reported. Given the stochastic nature of LLM agents and parallel rewriting, this information is required to establish that the improvements are reproducible and attributable to the method rather than run variance.
  2. [§5.1] §5.1 (Design selection and baseline): No selection criteria, size distribution, or domain diversity statistics are given for the 20 real-world designs, nor is it stated whether the commercial tool baseline was run with maximum effort (all optimization passes, highest effort settings, equivalent runtime budgets). These omissions are load-bearing because the skeptic's concern, that gains may reflect under-optimized baselines or favorably chosen designs, cannot be ruled out without them.
  3. [§4.3] §4.3 (Group-relative skill learning): The 47-entry skill library is constructed from tool feedback on the same 20 designs used for final evaluation, with no mention of held-out designs, temporal separation, or cross-validation. This directly affects the central claim of generalizable cross-design reuse; without such separation the reported acceleration and PPA benefits risk circularity.
minor comments (2)
  1. [§3.2] The description of how parallel rewrites are ranked and distilled into the skill library (group-relative comparison) would benefit from a short pseudocode listing or explicit scoring formula.
  2. [§5] Figure captions and axis labels in the experimental section should explicitly state whether reported numbers are single-run or averaged, and whether area is post-synthesis or post-P&R.
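Minor comment 1 asks for an explicit scoring formula. This page does not reproduce the paper's formula, so the following is only one plausible form, assumed here for illustration: each candidate's tool-measured score is standardized against the mean and spread of its own group of parallel rewrites, and candidates well above or below the group become positive or negative examples for distillation. The function names and the `margin` threshold are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_scores(scores):
    """Standardize each candidate's score against its own group (higher is better)."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:                      # all candidates tied: the group carries no signal
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]


def split_group(candidates, scores, margin=0.5):
    """Separate clear winners and clear losers; candidates near the mean are ignored."""
    rel = group_relative_scores(scores)
    winners = [c for c, r in zip(candidates, rel) if r >= margin]
    losers = [c for c, r in zip(candidates, rel) if r <= -margin]
    return winners, losers


# Illustrative WNS values in picoseconds for four hypothetical rewrites of one design.
names = ["rewrite_a", "rewrite_b", "rewrite_c", "rewrite_d"]
wns_ps = [-300, -200, -100, -200]
print(group_relative_scores(wns_ps))
print(split_group(names, wns_ps))
```

Scoring relative to the group, rather than against an absolute target, keeps the comparison meaningful across designs whose slack values differ by orders of magnitude.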

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We appreciate the emphasis on evaluation rigor, baseline fairness, and avoiding circularity in the skill library. We address each major comment point-by-point below and indicate planned revisions to the manuscript.

Point-by-point responses
  1. Referee: [§5 and abstract] §5 (Evaluation) and abstract: The headline claim of 21%/17% WNS/TNS and 6% area gains is presented as averages over 20 designs, yet no per-design results, standard deviations, number of independent runs, or statistical significance tests are reported. Given the stochastic nature of LLM agents and parallel rewriting, this information is required to establish that the improvements are reproducible and attributable to the method rather than run variance.

    Authors: We agree that aggregate averages alone are insufficient given the stochastic elements in our framework. In the revised manuscript we will add a table in §5 reporting per-design WNS, TNS, and area deltas. We will also rerun the full evaluation over five independent trials per design, report means with standard deviations, and include paired statistical significance tests (e.g., Wilcoxon signed-rank) against the commercial baseline. These additions will be summarized in the abstract as well. revision: yes

  2. Referee: [§5.1] §5.1 (Design selection and baseline): No selection criteria, size distribution, or domain diversity statistics are given for the 20 real-world designs, nor is it stated whether the commercial tool baseline was run with maximum effort (all optimization passes, highest effort settings, equivalent runtime budgets). These omissions are load-bearing because the skeptic's concern, that gains may reflect under-optimized baselines or favorably chosen designs, cannot be ruled out without them.

    Authors: We will expand §5.1 with a new table listing the 20 designs, their gate counts (ranging 8k–620k), and domain categories (CPU cores, DSP blocks, interconnect, etc.). Selection was performed in collaboration with industrial partners to ensure realism; we will state the exact criteria. For the baseline, the commercial tool was invoked with the highest effort preset, all optimization passes enabled, and total runtime budget matched to the cumulative tool invocations of Dr. RTL. These settings will be documented explicitly. revision: yes

  3. Referee: [§4.3] §4.3 (Group-relative skill learning): The 47-entry skill library is constructed from tool feedback on the same 20 designs used for final evaluation, with no mention of held-out designs, temporal separation, or cross-validation. This directly affects the central claim of generalizable cross-design reuse; without such separation the reported acceleration and PPA benefits risk circularity.

    Authors: We acknowledge the circularity risk. The library is built incrementally from tool feedback during optimization. In the revision we will add an explicit description of the construction process, include a leave-one-out ablation (library for each design built without that design’s feedback), and report the resulting PPA and convergence metrics. We will also discuss the limitation that a fully external held-out corpus was not available and note this as an area for future work with larger design collections. revision: partial
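The leave-one-out ablation promised in the third response can be sketched as follows. This is a hypothetical outline, not the authors' procedure: `build_library` and `run_with_library` are placeholders for the skill-distillation step and a full optimization run, stubbed here with toy functions, and the strategy names and the `fifo` design are invented for illustration (`router` and `pcie` are design names taken from the Figure 5 caption).

```python
def leave_one_out(feedback_by_design, build_library, run_with_library):
    """For each design, build the library from every other design's feedback, then evaluate."""
    results = {}
    for held_out in feedback_by_design:
        training = {name: fb for name, fb in feedback_by_design.items()
                    if name != held_out}          # exclude the held-out design's feedback
        library = build_library(training)
        results[held_out] = run_with_library(held_out, library)
    return results


# Toy stand-ins (assumptions): feedback is a list of strategy names per design.
def toy_build(training):
    return sorted({skill for skills in training.values() for skill in skills})

def toy_run(design, library):
    return library        # a real run would return PPA and convergence metrics


feedback = {
    "router": ["retime"],
    "pcie": ["balance mux"],
    "fifo": ["retime"],
}
print(leave_one_out(feedback, toy_build, toy_run))
```

In this toy run the library used for `pcie` contains only strategies learned elsewhere, which is the property the ablation is meant to guarantee.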

Circularity Check

0 steps flagged

No significant circularity in the derivation or claims

Full rationale

The paper's central results are empirical measurements of WNS/TNS and area improvements on 20 real-world RTL designs against an external industry-leading commercial synthesis tool. The multi-agent framework, parallel rewriting, tool-grounded evaluation, and group-relative skill learning (distilling 47 pattern-strategy entries from tool feedback) are all grounded in external EDA tool outputs rather than internal fitted parameters, self-definitions, or self-citation chains. No equations, derivations, or load-bearing steps reduce the reported deltas to inputs by construction; the evaluation setting uses realistic industrial workflows and external benchmarks, rendering the claims self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on domain assumptions about LLM code-rewriting reliability and tool accuracy rather than new mathematical axioms or free parameters; the skill library is an invented construct whose value is demonstrated only within the reported experiments.

axioms (2)
  • Domain assumption: Large language models can reliably identify critical paths and generate functionally correct RTL rewrites when given tool feedback.
    Central to the closed-loop multi-agent process described in the abstract.
  • Domain assumption: Industrial EDA tool outputs provide unbiased and sufficiently precise PPA metrics for guiding optimization.
    Required for the evaluation and skill-learning steps to be trustworthy.
invented entities (1)
  • Group-relative skill library (no independent evidence)
    Purpose: Distill successful parallel rewrites into 47 reusable pattern-strategy entries for cross-design reuse and continual improvement.
    New mechanism introduced to enable self-improvement; no independent evidence outside the 20-design evaluation is provided.
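To make the ledger entry concrete, a pattern-strategy entry could be stored as a small record. The schema below is an assumption (this page does not describe the paper's format): the `pattern`, `strategy`, and `source_design` fields and the keyword-based `lookup` are hypothetical. Recording the source design is what would allow a later audit of whether a skill was learned on the design it is applied to.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SkillEntry:
    pattern: str         # what the critical path looks like
    strategy: str        # the rewrite that helped
    source_design: str   # where the entry was learned


class SkillLibrary:
    def __init__(self) -> None:
        self._entries: List[SkillEntry] = []

    def add(self, entry: SkillEntry) -> None:
        if entry not in self._entries:          # skip exact duplicates
            self._entries.append(entry)

    def lookup(self, keyword: str) -> List[SkillEntry]:
        """Return entries whose pattern mentions the keyword (case-insensitive)."""
        return [e for e in self._entries if keyword.lower() in e.pattern.lower()]

    def learned_outside(self, design: str) -> List[SkillEntry]:
        """Entries that did not originate from the given design."""
        return [e for e in self._entries if e.source_design != design]

    def __len__(self) -> int:
        return len(self._entries)


# Illustrative entries only; the patterns and strategies are invented.
skills = SkillLibrary()
skills.add(SkillEntry("long adder chain before register", "rebalance into a tree", "router"))
skills.add(SkillEntry("wide priority mux", "split into two stages", "pcie"))
skills.add(SkillEntry("wide priority mux", "split into two stages", "pcie"))  # duplicate, ignored
print(len(skills), skills.lookup("adder"))
```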


