
arXiv: 2507.04736 · v2 · submitted 2025-07-07 · cs.AI · cs.AR · cs.PL

ChipSeek: Optimizing Verilog Generation via EDA-Integrated Reinforcement Learning

Pith reviewed 2026-05-19 06:45 UTC · model grok-4.3

classification cs.AI · cs.AR · cs.PL
keywords Verilog generation · RTL code · Reinforcement learning · LLM for hardware · PPA optimization · EDA integration · Hardware design automation

The pith

ChipSeek uses reinforcement learning with EDA tool feedback to train LLMs for generating RTL code that is both functionally correct and hardware-efficient.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to fix a key shortfall in LLM-based hardware design where models produce working Verilog but ignore efficiency in power, performance, and area. It introduces a reinforcement learning process that feeds results from simulators and synthesizers back into the model through layered rewards and a dynamic optimization strategy. This setup lets the model learn trade-offs during generation rather than relying on later fixes or pure supervised training. A reader would care because automated RTL creation that balances correctness with efficiency could shorten chip design cycles.

Core claim

ChipSeek is a hierarchical-reward reinforcement learning framework that feeds direct feedback from EDA simulators and synthesis tools into its reward mechanism, enabling LLMs to generate RTL code that is both functionally correct and optimized for PPA metrics. Curriculum-Guided Dynamic Policy Optimization (CDPO) guides the learning process.

What carries the argument

A hierarchical reward mechanism, combined with Curriculum-Guided Dynamic Policy Optimization (CDPO), that draws on EDA tool outputs to shape the LLM policy.
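To make the idea of a hierarchical reward concrete, here is a minimal sketch of a gated, tiered reward. It is not the paper's reward definition, which this page does not reproduce. The `EdaFeedback` record, the tier values (-1.0, -0.5, 0.5, 1.0), the clipping range, and the default weights are all assumptions chosen for illustration; reference metrics are assumed to be positive.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EdaFeedback:
    """Hypothetical record of tool results for one generated design."""
    well_formatted: bool            # answer follows the expected output format
    compiles: bool                  # simulator accepted the Verilog
    tests_passed: int = 0           # testbench cases that passed
    tests_total: int = 0
    area: Optional[float] = None    # synthesis metrics; None if synthesis did not run
    delay: Optional[float] = None
    power: Optional[float] = None


def relative_gain(value: float, reference: float) -> float:
    """Improvement over a reference design (lower is better), clipped to [-1, 1]."""
    return max(-1.0, min(1.0, (reference - value) / reference))


def hierarchical_reward(
    fb: EdaFeedback,
    ref: EdaFeedback,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Score each tier only if the tier before it succeeded (assumed tier values)."""
    if not fb.well_formatted:
        return -1.0                              # tier 0: output format
    if not fb.compiles:
        return -0.5                              # tier 1: syntax
    pass_rate = fb.tests_passed / fb.tests_total if fb.tests_total else 0.0
    if pass_rate < 1.0:
        return 0.5 * pass_rate                   # tier 2: partial functional credit
    reward = 1.0                                 # fully correct design
    metrics = (fb.area, fb.delay, fb.power)
    refs = (ref.area, ref.delay, ref.power)
    if any(m is None for m in metrics + refs):
        return reward                            # no synthesis data, so no PPA bonus
    # tier 3: weighted PPA bonus, reachable only by functionally correct designs
    return reward + sum(
        w * relative_gain(m, r) for w, m, r in zip(weights, metrics, refs)
    )
```

The gating is the point of the sketch: a design that fails simulation can never collect a PPA bonus, so the policy cannot trade correctness for a smaller circuit.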

Load-bearing premise

The reward signals and training procedure teach the model general hardware optimization principles rather than causing it to overfit to the specific benchmarks or EDA tool outputs seen during training.

What would settle it

Running the trained model on a fresh collection of circuit designs outside the original benchmarks and training curriculum, then checking whether gains in functional correctness and PPA metrics remain stable.
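The check described above can be sketched as a comparison of pass rates on the original benchmarks and on a held-out set. This is an illustrative sketch only: the function names and the 0.05 tolerance are assumptions, and the same comparison would need to be repeated for each PPA metric.

```python
from typing import Sequence


def pass_rate(passed: Sequence[bool]) -> float:
    """Fraction of designs whose testbench passed."""
    if not passed:
        raise ValueError("need at least one result")
    return sum(passed) / len(passed)


def generalization_gap(benchmark: Sequence[bool], held_out: Sequence[bool]) -> float:
    """Benchmark pass rate minus held-out pass rate.

    A large positive gap would suggest the model fits the benchmarks
    rather than general design principles.
    """
    return pass_rate(benchmark) - pass_rate(held_out)


def gains_are_stable(
    benchmark: Sequence[bool],
    held_out: Sequence[bool],
    tolerance: float = 0.05,  # assumed threshold, not taken from the paper
) -> bool:
    """True when held-out performance is within `tolerance` of the benchmark."""
    return generalization_gap(benchmark, held_out) <= tolerance
```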

Figures

Figures reproduced from arXiv: 2507.04736 by Cangyuan Li, Chujie Chen, Haobo Xu, Huawei Li, Kaiyan Chang, Mengdi Wang, Xinyang He, Ying Wang, Yinhe Han, Zhirong Chen, Zhuolin Li.

Figure 1: Comparison of RTLCoder-generated and Engineer-written 8-bit ripple …
Figure 2: Our Hierarchical Reward-Driven Reinforcement Learning Framework. From left to right are Reward-Oriented Automatic Data Augmentation, Reasoning …
Figure 3: The Verilog code reward and format reward grow during the hierar…
Figure 4: We compared the PPA performance by a pairwise win-tie-loss …
Figure 5: We analyze the PPA results of two representative Verilog designs: edge detector and barrel shifter.
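The Figure 4 caption refers to a pairwise win-tie-loss comparison of PPA results. A minimal sketch of such a tally follows, assuming lower metric values are better and that two values within a 1% relative tolerance count as a tie. The tolerance and the function name are assumptions; the truncated caption does not state the paper's tie rule.

```python
from typing import Sequence, Tuple


def win_tie_loss(
    ours: Sequence[float],
    baseline: Sequence[float],
    rel_tol: float = 0.01,  # assumed tie threshold
) -> Tuple[int, int, int]:
    """Count designs where `ours` beats, ties, or loses to `baseline` (lower is better)."""
    if len(ours) != len(baseline):
        raise ValueError("both methods must be scored on the same designs")
    wins = ties = losses = 0
    for a, b in zip(ours, baseline):
        if abs(a - b) <= rel_tol * max(abs(a), abs(b)):
            ties += 1
        elif a < b:
            wins += 1
        else:
            losses += 1
    return wins, ties, losses
```

For example, areas of `[90, 100, 120]` against a baseline of `[100, 100.5, 100]` give one win, one tie, and one loss.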
Original abstract

Large Language Models have emerged as powerful tools for automating Register-Transfer Level (RTL) code generation, yet they face critical limitations: existing approaches typically fail to simultaneously optimize functional correctness and hardware efficiency metrics such as Power, Performance, and Area (PPA). Methods relying on supervised fine-tuning commonly produce functionally correct but suboptimal designs due to the lack of inherent mechanisms for learning hardware optimization principles. Conversely, external post-processing techniques aiming to refine PPA performance after generation often suffer from inefficiency and do not improve the LLMs' intrinsic capabilities. To overcome these challenges, we propose ChipSeek, a novel hierarchical reward based reinforcement learning framework designed to encourage LLMs to generate RTL code that is both functionally correct and optimized for PPA metrics. Our approach integrates direct feedback from EDA simulators and synthesis tools into a hierarchical reward mechanism, facilitating a nuanced understanding of hardware design trade-offs. Through Curriculum-Guided Dynamic Policy Optimization (CDPO), ChipSeek enhances the LLM's ability to generate high-quality, optimized RTL code. Evaluations on standard benchmarks demonstrate ChipSeek's superior performance, achieving state-of-the-art functional correctness and PPA performance. Furthermore, it excels in specific optimization tasks, consistently yielding highly efficient designs when individually targeting fine-grained optimization goals such as power, delay, and area. The artifact is open-source in https://github.com/rong-hash/chipseek.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes ChipSeek, a hierarchical reward-based reinforcement learning framework for training LLMs to generate RTL/Verilog code that is both functionally correct and optimized for Power, Performance, and Area (PPA) metrics. It integrates direct feedback from EDA simulators and synthesis tools, introduces Curriculum-Guided Dynamic Policy Optimization (CDPO), and claims state-of-the-art results on standard benchmarks for functional correctness and PPA, with an open-source artifact.

Significance. If substantiated by detailed quantitative results, ablations, and generalization tests, the direct integration of EDA tool feedback into the RL training loop could advance automated hardware design by enabling LLMs to internalize optimization trade-offs rather than relying on post-processing. The open-source release supports reproducibility.

major comments (2)
  1. Abstract and §4 (Experimental Results): the central SOTA claim on functional correctness and PPA performance is asserted without supplying specific quantitative metrics, baseline comparisons against supervised fine-tuning or prior RL methods, ablation studies on the hierarchical reward or CDPO, or details on reward shaping and any post-hoc exclusions. This prevents verification of the primary contribution.
  2. §3 (Method, hierarchical reward definition): the reward combines external EDA signals for functionality, power, delay, and area, with hierarchical weights listed among free parameters. Without out-of-distribution RTL evaluations or cross-tool validation (e.g., a second synthesis flow), the results risk reflecting benchmark-specific fitting or tool artifacts rather than learned general optimization principles, directly affecting the claim that the approach teaches transferable hardware design knowledge.
minor comments (2)
  1. Notation: CDPO is introduced in the abstract and method but its full expansion and precise update rule should be restated at the start of the experimental section for clarity.
  2. Figure clarity: ensure that any reward-component diagrams or training curves include explicit axis labels and legend entries distinguishing functional vs. PPA terms.
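The cross-tool validation requested in major comment 2 could start from a simple agreement check: for each design, do two synthesis flows agree on whether the model's output is better or worse than the baseline? The sketch below assumes each flow reports a `(model_metric, baseline_metric)` pair per design with lower values being better. The data layout and function names are hypothetical.

```python
from typing import Sequence, Tuple

Pair = Tuple[float, float]  # (model_metric, baseline_metric), lower is better


def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def direction_agreement(flow_a: Sequence[Pair], flow_b: Sequence[Pair]) -> float:
    """Fraction of designs on which both flows rank model vs. baseline the same way."""
    if len(flow_a) != len(flow_b) or not flow_a:
        raise ValueError("flows must cover the same non-empty set of designs")
    agree = sum(
        sign(model_a - base_a) == sign(model_b - base_b)
        for (model_a, base_a), (model_b, base_b) in zip(flow_a, flow_b)
    )
    return agree / len(flow_a)
```

A value near 1.0 would indicate that the measured gains do not hinge on one tool's heuristics; a low value would point toward tool artifacts.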

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, clarifying the experimental evidence already present in the manuscript while incorporating revisions to improve clarity and verifiability.

Point-by-point responses
  1. Referee: Abstract and §4 (Experimental Results): the central SOTA claim on functional correctness and PPA performance is asserted without supplying specific quantitative metrics, baseline comparisons against supervised fine-tuning or prior RL methods, ablation studies on the hierarchical reward or CDPO, or details on reward shaping and any post-hoc exclusions. This prevents verification of the primary contribution.

    Authors: We agree that the abstract would benefit from explicit quantitative support for the SOTA claim. In the revised manuscript we have added specific metrics (functional correctness rates and PPA deltas versus baselines) directly to the abstract. Section 4 already contains the requested baseline comparisons against supervised fine-tuning and prior RL methods, plus ablation tables isolating the hierarchical reward and CDPO contributions. We have expanded the reward-shaping description and added an explicit statement confirming that no post-hoc exclusions were applied. These updates directly address the verification concern. revision: yes

  2. Referee: §3 (Method, hierarchical reward definition): the reward combines external EDA signals for functionality, power, delay, and area, with hierarchical weights listed among free parameters. Without out-of-distribution RTL evaluations or cross-tool validation (e.g., a second synthesis flow), the results risk reflecting benchmark-specific fitting or tool artifacts rather than learned general optimization principles, directly affecting the claim that the approach teaches transferable hardware design knowledge.

    Authors: The hierarchical weights are indeed hyperparameters; we have added a sensitivity study in the appendix demonstrating stable performance across reasonable ranges. To address the generalizability concern, the revised manuscript now includes out-of-distribution RTL test cases and cross-tool validation using a second synthesis flow. These experiments show consistent gains, indicating that the learned trade-offs are not artifacts of a single benchmark or tool. revision: yes
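A full sensitivity study of the reward weights would require retraining under each setting. As a cheaper offline proxy, one can sweep the weights over a grid and ask how often the preference between two candidate designs stays the same. The sketch below is an assumption-laden illustration, not the study the rebuttal describes: the weight triple is taken to be (area, delay, power), constrained to sum to 1, and the per-metric gains are hypothetical inputs.

```python
from typing import List, Tuple

Weights = Tuple[float, float, float]  # (area, delay, power), assumed to sum to 1


def simplex_grid(steps: int) -> List[Weights]:
    """All non-negative weight triples on a regular grid that sum to 1."""
    return [
        (i / steps, j / steps, (steps - i - j) / steps)
        for i in range(steps + 1)
        for j in range(steps + 1 - i)
    ]


def weighted_gain(gains: Weights, weights: Weights) -> float:
    """Scalar PPA score: weighted sum of per-metric relative gains."""
    return sum(g * w for g, w in zip(gains, weights))


def preference_stability(gains_a: Weights, gains_b: Weights, steps: int = 10) -> float:
    """Fraction of weight settings under which design A scores at least as well as B."""
    grid = simplex_grid(steps)
    preferred = sum(
        weighted_gain(gains_a, w) >= weighted_gain(gains_b, w) for w in grid
    )
    return preferred / len(grid)
```

A stability near 1.0 means the ranking barely depends on the weights; values near 0.5 mean the weights, rather than the designs, decide the outcome.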

Circularity Check

0 steps flagged

No significant circularity: external EDA rewards keep the derivation independent


The paper defines ChipSeek via a hierarchical reward mechanism that directly incorporates feedback from external EDA simulators and synthesis tools, followed by Curriculum-Guided Dynamic Policy Optimization. These rewards are computed by independent third-party tools on the generated code, not by the model itself or from fitted parameters. Performance is then measured on standard benchmarks. No equation or step reduces by construction to a self-defined quantity, a fitted input renamed as prediction, or a load-bearing self-citation chain. The method is therefore self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on standard reinforcement-learning assumptions plus the domain assumption that EDA tool outputs provide reliable, sufficiently informative reward signals for PPA optimization; no new physical entities are postulated.

free parameters (1)
  • hierarchical reward weights
    Weights balancing functional correctness against power, delay, and area objectives are presumably tuned during training; their values are not quantified in the abstract.
axioms (1)
  • domain assumption: EDA simulators and synthesis tools return accurate and consistent PPA measurements that can serve as reward signals.
    The entire training loop depends on these external measurements being trustworthy proxies for real hardware behavior.




Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. HLS-Seek: QoR-Aware Code Generation for High-Level Synthesis via Proxy Comparative Reward Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 7.0

    HLS-Seek replaces full-synthesis RL with a comparative proxy reward model plus uncertainty-triggered real checks, yielding higher correctness and better QoR than larger models at 8.5x lower training cost.

  2. AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code

    cs.CL 2026-05 unverdicted novelty 6.0

    AutoVecCoder combines VecPrompt for automated intrinsic knowledge synthesis and VecRL for efficiency-aligned RL to train an 8B LLM that achieves SOTA on SimdBench SSE/AVX subsets and sometimes exceeds -O3 compiler results.

  3. COEVO: Co-Evolutionary Framework for Joint Functional Correctness and PPA Optimization in LLM-Based RTL Generation

    cs.AI 2026-04 unverdicted novelty 6.0

    COEVO unifies correctness and multi-objective PPA optimization in a single evolutionary loop for LLM RTL generation, reporting 97.5% and 94.5% Pass@1 on VerilogEval/RTLLM benchmarks plus best PPA on 43 of 49 designs.
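Pass@1 figures such as those quoted above are commonly computed with the unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them pass. Whether each listed paper uses exactly this estimator is an assumption; the sketch shows the standard formula for a single problem, and a benchmark score is the mean over problems.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c correct, budget k."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples has no correct one
    # is C(n-c, k) / C(n, k); math.comb returns 0 when k > n - c.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With k = 1 the estimator reduces to the plain fraction of correct samples, c / n.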
