RTL-BenchMT: Dynamic Maintenance of RTL Generation Benchmark Through Agent-Assisted Analysis and Revision

arxiv: 2605.15537 · v1 · submitted 2026-05-15 · 💻 cs.AI

Jing Wang, Shang Liu, Hangan Zhou, Zhiyao Xie

Pith reviewed 2026-05-19 14:37 UTC · model grok-4.3

classification 💻 cs.AI

keywords RTL generation · benchmark maintenance · agentic framework · LLM-assisted EDA · overfitting detection · flawed cases · hardware design automation

The pith

An agentic framework automatically identifies flawed RTL benchmark cases and detects overfitting to produce a refined suite.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces RTL-BenchMT as an automated agentic framework designed to maintain RTL generation benchmarks used in LLM-assisted electronic design automation. It targets two persistent problems that manual effort struggles to fix: the presence of flawed cases within benchmarks and the tendency of models to overfit to those benchmarks. The framework applies AI agents to spot flawed cases, revise them, identify overfitting instances, and update the benchmark set accordingly. This process yields a cleaner benchmark suite that the authors plan to release openly. A sympathetic reader would care because reliable benchmarks are essential for measuring genuine progress in automated hardware description language generation rather than artifacts of poor test data.

Core claim

RTL-BenchMT is an agentic framework that automates the identification and revision of flawed benchmark cases along with the detection and updating of overfitting cases in RTL generation benchmarks, enabling a thorough analysis that produces a refined benchmark suite open-sourced to the community.

What carries the argument

RTL-BenchMT, an agentic framework that automates flaw identification, case revision, and overfitting detection to sustain benchmark quality with reduced human input.

If this is right

The refined benchmark suite raises the quality of evaluation data available for LLM-based RTL generators.
Ongoing human effort required to keep RTL benchmarks current drops substantially.
Detection of overfitting instances allows benchmark updates that better test generalization in generated hardware descriptions.
Community access to the revised suite supports more reproducible comparisons across different RTL generation approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same agent-assisted maintenance approach could transfer to benchmarks in adjacent areas such as high-level synthesis or formal verification.
Continuous application of the framework might allow benchmarks to evolve automatically alongside new LLM capabilities without periodic full redesigns.

Load-bearing premise

AI agents can accurately detect flawed cases and overfitting instances in RTL benchmarks without introducing new errors or needing extensive human review.

What would settle it

Apply the framework to a benchmark containing known flawed cases identified by human experts and check whether the agents flag the same cases and produce revisions that pass expert validation.
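The check described above reduces to comparing two sets of case IDs. A minimal sketch in Python, assuming the expert labels and the agent's flags are both available as collections of identifiers; the case IDs and the `score_flags` helper are hypothetical and not taken from the paper:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionScore:
    precision: float
    recall: float
    missed: frozenset    # expert-labelled flawed cases the agent did not flag
    spurious: frozenset  # cases the agent flagged that experts consider sound


def score_flags(agent_flagged, expert_flawed):
    """Compare agent-flagged case IDs with expert-labelled flawed case IDs."""
    agent = frozenset(agent_flagged)
    expert = frozenset(expert_flawed)
    true_pos = agent & expert
    precision = len(true_pos) / len(agent) if agent else 0.0
    recall = len(true_pos) / len(expert) if expert else 0.0
    return DetectionScore(precision, recall, expert - agent, agent - expert)


# Hypothetical case IDs, for illustration only.
expert_flawed = {"case_03", "case_07", "case_11", "case_20"}
agent_flagged = {"case_03", "case_07", "case_15"}

score = score_flags(agent_flagged, expert_flawed)
print(f"precision={score.precision:.2f} recall={score.recall:.2f}")
print("missed:", sorted(score.missed))
print("spurious:", sorted(score.spurious))
```

The `missed` and `spurious` sets matter as much as the two ratios: they are the cases an expert would need to re-inspect, and a separate expert pass would still be needed to validate the agent's revisions.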

Figures

Figures reproduced from arXiv: 2605.15537 by Hangan Zhou, Jing Wang, Shang Liu, Zhiyao Xie.

**Figure 1.** (1) Flawed cases and (2) overfitting are two significant challenges for RTL generation benchmarks. RTL-BenchMT resolves the challenges by dynamically maintaining benchmarks. RTL-BenchMT contributes in two important aspects: (1) automatically identifying and revising flawed cases and (2) automatically detecting and updating overfitting cases.

Adjacent text: Challenge 2. Overfitting on the benchmark. Public RTL benchmarks…

**Figure 2.** Overview of RTL-BenchMT agentic framework. The multi-agent system interacts with the environment through …

**Figure 3.** Example of automated flawed-case identification.

**Figure 4.** Undefined module name. The testbench requires the module name to be ‘TopModule,’ but this is not specified in the design description.

Adjacent text (3.1 Syntax ambiguity): We introduce three identified situations of syntax ambiguity: (1) undefined module name, (2) unclear port type, and (3) syntax errors in code example. Undefined module name refers to the situation where the design description only specifies the function…

**Figure 5.** Code sample with syntax error. Task ID: binary_to_gray_0001 of cid002 in CVDP benchmark. The code sample in the design description contains a syntactically incorrect parameter configuration that misleads LLMs into generating the same syntax error.

**Figure 6.** Diagram ambiguity. Task ID: HumanEval v2, 116 m2014_q3. The specified input is ‘x[4:1]’, while in the reference code, the input is ‘x[3:0].’

Adjacent text: … port of a design to be a register. However, the testbenches require the output to be initialized to fixed values (e.g., 0), which is not specified in the design description. As a result, LLMs that generate correct logic can still fail due to missing initial assig…

**Figure 7.** Performance evaluation based on the rewritten …
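The undefined-module-name flaw shown in Figure 4 lends itself to a simple consistency check between a testbench and its design description. The sketch below is a hypothetical regex-based heuristic in Python, not the detection method used by RTL-BenchMT; the testbench and description strings are invented for illustration, and a production check would need a real Verilog parser.

```python
import re

# Matches a simple module instantiation at the start of a line, such as
# "TopModule dut (" or "TopModule #(.W(8)) dut (". This is a simplification
# and will miss or misread less regular code.
INSTANCE_RE = re.compile(
    r"^\s*([A-Za-z_]\w*)\b(?:\s*#\s*\(.*?\)\s*|\s+)([A-Za-z_]\w*)\s*\(",
    re.MULTILINE,
)

# Words that can precede "name (" without being a module instantiation.
VERILOG_KEYWORDS = {"module", "task", "function", "primitive", "program"}


def instantiated_modules(testbench_src):
    """Return the module names the testbench appears to instantiate."""
    names = {m.group(1) for m in INSTANCE_RE.finditer(testbench_src)}
    return names - VERILOG_KEYWORDS


def missing_module_names(description, testbench_src):
    """List instantiated module names that the description never mentions."""
    return sorted(
        name
        for name in instantiated_modules(testbench_src)
        if not re.search(rf"\b{re.escape(name)}\b", description)
    )


# Invented example inputs.
testbench = """\
module tb;
  reg clk;
  wire [3:0] q;
  TopModule dut (.clk(clk), .q(q));
endmodule
"""
description = "Design a 4-bit Gray counter with input clk and a 4-bit output q."

print(missing_module_names(description, testbench))
```

A non-empty result marks the case as a candidate for revision, for example by adding the required module name to the description.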

Original abstract

This paper introduces RTL-BenchMT, an agentic framework for dynamically maintaining RTL generation benchmarks. Large Language Models (LLMs) assisted automated RTL generation is one of the most important directions in EDA research. However, current RTL benchmarks face two critical challenges: (1) flawed cases in the benchmarks and (2) overfitting to the benchmarks. Both challenges are difficult to resolve purely by manual engineering effort. To address these issues and systematically reduce human maintenance costs, we propose an automated agentic framework, RTL-BenchMT. RTL-BenchMT focuses on two key applications: (1) automatically identifying and revising flawed benchmark cases and (2) automatically detecting and updating overfitting cases. With the assistance of RTL-BenchMT, we conduct a thorough, in-depth analysis of flawed and overfitting cases and produce a refined benchmark suite that will be open-sourced to the community.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

RTL-BenchMT sketches an agent framework to spot and fix flawed or overfit RTL benchmark cases, but supplies no validation data or implementation details to show the agents work reliably.

The paper proposes an agentic system that keeps RTL generation benchmarks current by automatically finding bad test cases and catching overfitting, yet the write-up gives almost nothing to judge whether the agents deliver on that without creating new problems or needing heavy human cleanup afterward. The stress-test note lands cleanly on this one. The central claim rests on agents doing accurate detection and revision at low cost, but there are no precision or recall numbers, no ablation on prompts or models, and no protocol shown for confirming that revised cases stay functionally correct and non-overfit. Without those, any refined benchmark could just be trading one set of artifacts for another.

What is new is the targeted application of agent assistance to ongoing benchmark maintenance in the RTL/EDA space rather than a one-time creation effort. Most papers in this area release a static suite and stop; thinking explicitly about dynamic revision and open-sourcing the result is a reasonable next step for a niche that moves fast as LLMs improve. The paper states the two problems, flawed cases and overfitting, clearly and explains why purely manual fixes will not scale, which is useful framing even if the solution side stays high-level.

The soft spots are mostly in the execution details. The description stays at the level of what the framework focuses on without showing agent architecture, example outputs, or any before-and-after metrics on the benchmark cases. That leaves the claim of reduced human maintenance costs untested.

The paper is aimed at researchers working on LLM-assisted hardware description language generation who already care about evaluation quality. A reader in that group might pick up ideas for their own benchmark hygiene, but the lack of concrete results limits how far the paper can be taken right now.

I would send it to peer review so the authors get concrete feedback on what validation experiments would make the claims credible, rather than desk-rejecting an idea that addresses a genuine pain point.

Referee Report

2 major / 1 minor

Summary. The paper introduces RTL-BenchMT, an agentic framework for dynamically maintaining RTL generation benchmarks used in LLM-assisted EDA research. It targets two challenges—flawed benchmark cases and overfitting—by using AI agents to automatically identify and revise flawed cases and to detect and update overfitting instances, with the goal of reducing manual engineering effort. The authors report conducting a thorough analysis via this framework and producing a refined benchmark suite that will be open-sourced.

Significance. If the agent-assisted detection and revision steps can be shown to operate with high reliability and low error introduction, the work would offer a practical, scalable approach to benchmark curation in a rapidly evolving subfield. This could meaningfully lower the barrier to maintaining trustworthy evaluation suites for RTL generation and encourage more reproducible progress in LLM-based hardware design.

major comments (2)

[Abstract] Abstract: The central claim that RTL-BenchMT 'automatically identifies and revises flawed benchmark cases' and 'automatically detects and updates overfitting cases' is load-bearing for the entire contribution, yet the manuscript provides no precision, recall, or error-bound figures for the agent detection steps, no ablation on prompt/model choices, and no explicit protocol confirming that revised cases remain functionally correct and non-overfit.
[Framework overview] The description of the agentic workflow does not quantify the residual human validation effort required after agent processing, leaving open the possibility that the reported reduction in maintenance cost is not realized in practice.

minor comments (1)

[Abstract] The abstract states that the refined suite 'will be open-sourced,' but no repository link, license, or access instructions appear in the text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and have revised the manuscript to strengthen the presentation of the agent evaluation and human effort quantification.

Referee: [Abstract] Abstract: The central claim that RTL-BenchMT 'automatically identifies and revises flawed benchmark cases' and 'automatically detects and updates overfitting cases' is load-bearing for the entire contribution, yet the manuscript provides no precision, recall, or error-bound figures for the agent detection steps, no ablation on prompt/model choices, and no explicit protocol confirming that revised cases remain functionally correct and non-overfit.

Authors: We acknowledge the importance of quantitative validation for the agent components. The original manuscript emphasizes the framework and the resulting refined benchmark rather than a standalone agent benchmark study. In the revision we will add a dedicated evaluation subsection that reports precision, recall, and error rates for both the flaw-detection and overfitting-detection agents, obtained via manual review of a representative sample of outputs. We will also include ablations across prompt variants and model choices, and we will explicitly describe the post-revision verification protocol (simulation-based functional checks plus equivalence testing against original specifications) used to confirm that revised cases remain correct and non-overfit. revision: yes
Referee: [Framework overview] The description of the agentic workflow does not quantify the residual human validation effort required after agent processing, leaving open the possibility that the reported reduction in maintenance cost is not realized in practice.

Authors: We agree that concrete quantification is required to support claims of reduced maintenance cost. The revised manuscript will report the measured human validation time, the percentage of cases that required manual correction after agent processing, and a direct comparison against the effort needed for fully manual curation of the same benchmark set. revision: yes
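The quantities named in the second response (human validation time, share of cases needing manual correction, and comparison against fully manual curation) can be summarized from per-case review logs. A minimal sketch in Python, assuming such logs exist; the `ReviewRecord` fields and every number below are hypothetical placeholders, not results from the paper:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewRecord:
    case_id: str
    needed_manual_fix: bool          # reviewer had to correct the agent's revision
    review_minutes: float            # human time spent checking the agent's output
    manual_baseline_minutes: float   # estimated time to curate the case by hand


def summarize(records):
    """Aggregate residual human effort after agent processing."""
    if not records:
        raise ValueError("no review records")
    fixed = sum(r.needed_manual_fix for r in records)
    review = sum(r.review_minutes for r in records)
    baseline = sum(r.manual_baseline_minutes for r in records)
    return {
        "manual_fix_rate": fixed / len(records),
        "review_minutes": review,
        "baseline_minutes": baseline,
        # Fraction of the manual-curation effort avoided; 0.0 if no baseline.
        "effort_saved": 1 - review / baseline if baseline else 0.0,
    }


# Placeholder log entries, for illustration only.
records = [
    ReviewRecord("case_03", False, 4.0, 20.0),
    ReviewRecord("case_07", True, 12.0, 25.0),
    ReviewRecord("case_15", False, 4.0, 15.0),
    ReviewRecord("case_20", False, 5.0, 40.0),
]

summary = summarize(records)
print(summary)
```

The baseline column is the fragile part of any such comparison: it has to come from timed manual curation of the same cases rather than from an estimate made after the fact.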

Circularity Check

0 steps flagged

No significant circularity in framework proposal or benchmark refinement

The paper introduces RTL-BenchMT as a new agentic framework for identifying flawed RTL benchmark cases and detecting overfitting instances, then applies it to produce a refined open-sourced suite. No equations, parameters, or derivations are present that reduce by construction to fitted inputs or self-definitions. The central claims rest on the proposed automation reducing manual effort, with no load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work. The derivation chain is self-contained as a methodological proposal whose outputs (revised cases) are presented as independent results of the framework rather than tautological renamings or forced predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities; the proposal relies on high-level descriptions of agent assistance without technical specifics.

pith-pipeline@v0.9.0 · 5684 in / 1141 out tokens · 58979 ms · 2026-05-19T14:37:03.553750+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction

Relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: RTL-BenchMT focuses on two key applications: (1) automatically identifying and revising flawed benchmark cases and (2) automatically detecting and updating overfitting cases.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 3 internal anchors

[1] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023).

[2] Mohammad Akyash, Kimia Azar, and Hadi Kamali. 2025. DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs. arXiv preprint arXiv:2507.02226 (2025).

[3] Nurit Cohen-Inger, Yehonatan Elisha, Bracha Shapira, Lior Rokach, and Seffi Cohen. 2025. Forget What You Know about LLMs Evaluations–LLMs are Like a Chameleon. arXiv preprint arXiv:2502.07445 (2025).

[4] Fan Cui, Chenyang Yin, Kexing Zhou, Youwei Xiao, Guangyu Sun, Qiang Xu, Qipeng Guo, Demin Song, Dahua Lin, Xingcheng Zhang, et al. 2024. OriGen: Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection. arXiv preprint arXiv:2407.16237 (2024).

[5] Chia-Tung Ho, Haoxing Ren, and Brucek Khailany. 2024. VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool. arXiv preprint arXiv:2408.08927 (2024).

[6] Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, et al. 2024. Deepseek-v2: A strong, economical, and efficient mixture-of-experts language model. arXiv preprint arXiv:2405.04434 (2024).

[7] Mingjie Liu, Nathaniel Pinckney, Brucek Khailany, and Haoxing Ren. 2023. Verilogeval: Evaluating large language models for verilog code generation. In 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE.

[8, 9] Shang Liu, Wenji Fang, Yao Lu, Qijun Zhang, Hongce Zhang, and Zhiyao Xie. Rtlcoder: Outperforming gpt-3.5 in design rtl generation with our open-source dataset and lightweight solution. In 2024 IEEE LLM Aided Design Workshop (LAD). IEEE.

[10] Shang Liu, Yao Lu, Wenji Fang, Mengming Li, and Zhiyao Xie. 2024. Openllm-rtl: Open dataset and benchmark for llm-aided design rtl generation. In Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design. 1–9.

[11] Yao Lu, Shang Liu, Qijun Zhang, and Zhiyao Xie. 2024. Rtllm: An open-source benchmark for design rtl generation with large language model. In 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE.

[12] Ruiyang Ma, Yuxin Yang, Ziqian Liu, Jiaxi Zhang, Min Li, Junhua Huang, and Guojie Luo. 2024. VerilogReader: LLM-Aided Hardware Test Generation. arXiv preprint arXiv:2406.04373 (2024).

[13] Zehua Pei, Hui-Ling Zhen, Mingxuan Yuan, Yu Huang, and Bei Yu. 2024. Betterv: Controlled verilog generation with discriminative guidance. arXiv preprint arXiv:2402.03375 (2024).

[14] Nathaniel Pinckney, Christopher Batten, Mingjie Liu, Haoxing Ren, and Brucek Khailany. 2024. Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks. arXiv preprint arXiv:2408.11053 (2024).

[15] Nathaniel Pinckney, Chenhui Deng, Chia-Tung Ho, Yun-Da Tsai, Mingjie Liu, Wenfei Zhou, Brucek Khailany, and Haoxing Ren. 2025. Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification. arXiv preprint arXiv:2506.14074 (2025).

[16] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).

[17] Xufeng Yao, Yiwen Wang, Xing Li, Yingzhao Lian, Ran Chen, Lei Chen, Mingxuan Yuan, Hong Xu, and Bei Yu. 2024. RTLRewriter: Methodologies for Large Models aided RTL Code Optimization. arXiv preprint arXiv:2409.11414 (2024).

[18] Yang Zhao, Di Huang, Chongxiao Li, Pengwei Jin, Ziyuan Nan, Tianyun Ma, Lei Qi, Yansong Pan, Zhenxing Zhang, Rui Zhang, et al. 2024. Codev: Empowering llms for verilog generation through multi-level summarization. arXiv preprint arXiv:2407.10424 (2024).

[19, 20] Yujie Zhao, Hejia Zhang, Hanxian Huang, Zhongming Yu, and Jishen Zhao. MAGE: A Multi-Agent Engine for Automated RTL Code Generation. arXiv preprint arXiv:2412.07822 (2024).
