ChipSeek: Optimizing Verilog Generation via EDA-Integrated Reinforcement Learning

Zhirong Chen, Kaiyan Chang, Zhuolin Li, Cangyuan Li, Xinyang He, Chujie Chen, Mengdi Wang, Haobo Xu, Yinhe Han, Huawei Li, Ying Wang

arXiv: 2507.04736 · v2 · submitted 2025-07-07 · cs.AI · cs.AR · cs.PL

Pith reviewed 2026-05-19 06:45 UTC · model grok-4.3

keywords: Verilog generation · RTL code · Reinforcement learning · LLM for hardware · PPA optimization · EDA integration · Hardware design automation

The pith

ChipSeek uses reinforcement learning with EDA tool feedback to train LLMs for generating RTL code that is both functionally correct and hardware-efficient.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to fix a key shortfall in LLM-based hardware design where models produce working Verilog but ignore efficiency in power, performance, and area. It introduces a reinforcement learning process that feeds results from simulators and synthesizers back into the model through layered rewards and a dynamic optimization strategy. This setup lets the model learn trade-offs during generation rather than relying on later fixes or pure supervised training. A reader would care because automated RTL creation that balances correctness with efficiency could shorten chip design cycles.

Core claim

ChipSeek is a hierarchical-reward-based reinforcement learning framework that feeds direct feedback from EDA simulators and synthesis tools into its reward mechanism, enabling LLMs to generate RTL code that is both functionally correct and optimized for PPA metrics, with Curriculum-Guided Dynamic Policy Optimization (CDPO) guiding the learning process.

What carries the argument

A hierarchical reward mechanism, combined with Curriculum-Guided Dynamic Policy Optimization (CDPO), that draws on EDA tool outputs to shape the LLM policy.
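To make the reward mechanism concrete, here is a minimal Python sketch of a staged reward built from the five components that the paper passage quoted later on this page names (format, compilation, function, synthesis, PPA). The weight values, the `StageResults` container, and the early-exit gating between stages are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class StageResults:
    """Hypothetical summary of what the EDA tools reported for one sample."""
    format_ok: bool        # output follows the expected answer format
    compiles: bool         # Verilog compiles without errors
    func_pass_rate: float  # fraction of testbench checks passed, in [0, 1]
    synthesizes: bool      # synthesis completes
    ppa_score: float       # normalized PPA quality in [0, 1], higher is better


# Assumed weights; the page only says such weights exist and are tuned.
WEIGHTS = {"format": 0.1, "comp": 0.1, "func": 0.4, "syn": 0.1, "ppa": 0.3}


def hierarchical_reward(r: StageResults, w: dict = WEIGHTS) -> float:
    """Weighted sum of stage rewards, where a failed stage blocks later stages."""
    total = 0.0
    if not r.format_ok:
        return total
    total += w["format"]
    if not r.compiles:
        return total
    total += w["comp"]
    total += w["func"] * r.func_pass_rate
    # Assumption: PPA is only rewarded for fully correct, synthesizable code.
    if r.func_pass_rate < 1.0 or not r.synthesizes:
        return total
    total += w["syn"]
    total += w["ppa"] * r.ppa_score
    return total
```

Gating the PPA term behind full functional correctness is one way to stop a policy from trading correctness for a smaller circuit; whether ChipSeek gates in this way is not stated on this page.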

Load-bearing premise

The reward signals and training procedure teach the model general hardware optimization principles rather than causing it to overfit to the specific benchmarks or EDA tool outputs seen during training.

What would settle it

Running the trained model on a fresh collection of circuit designs outside the original benchmarks and training curriculum, then checking whether gains in functional correctness and PPA metrics remain stable.
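One way to run this check for functional correctness is to compute pass@k on the original benchmarks and on the fresh design set, then compare the two. The sketch below uses the standard unbiased pass@k estimator; the function names and the `(n, c)` input layout are hypothetical, and this page does not say which correctness metric the paper reports.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over a list of (n, c) pairs, one pair per design."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


def generalization_gap(in_dist, held_out, k: int = 1) -> float:
    """Positive values mean the model does worse on the held-out designs."""
    return mean_pass_at_k(in_dist, k) - mean_pass_at_k(held_out, k)
```

A gap near zero on held-out designs would support the load-bearing premise; a large positive gap would point toward benchmark fitting. The same comparison can be repeated for each PPA metric.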

Figures

Figures reproduced from arXiv: 2507.04736 by Cangyuan Li, Chujie Chen, Haobo Xu, Huawei Li, Kaiyan Chang, Mengdi Wang, Xinyang He, Ying Wang, Yinhe Han, Zhirong Chen, Zhuolin Li.

**Figure 2.** Our Hierarchical Reward-Driven Reinforcement Learning Framework. From left to right are Reward-Oriented Automatic Data Augmentation, Reasoning…

**Figure 3.** The Verilog code reward and format reward grow during the hierar…

**Figure 4.** We compared the PPA performance by a pairwise win-tie-loss…

**Figure 5.** We analyze the PPA results of two representative Verilog designs: edge detector and barrel shifter.
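The pairwise win-tie-loss comparison named in the Figure 4 caption can be sketched as follows. This assumes a lower-is-better metric (area, delay, or power) and a relative tolerance for calling a tie; the tolerance value and the function name are assumptions, since the caption is truncated and the paper's exact tie rule is not shown here.

```python
def win_tie_loss(ours: dict, baseline: dict, rel_tol: float = 0.01) -> dict:
    """Count designs where `ours` beats, ties, or loses to `baseline`.

    Both arguments map design name -> metric value, where lower is better.
    Two values within rel_tol (relative to the larger one) count as a tie.
    """
    counts = {"win": 0, "tie": 0, "loss": 0}
    for design, a in ours.items():
        b = baseline[design]
        if abs(a - b) <= rel_tol * max(abs(a), abs(b)):
            counts["tie"] += 1
        elif a < b:
            counts["win"] += 1
        else:
            counts["loss"] += 1
    return counts
```

Counting outcomes per design, rather than averaging raw metrics, keeps one very large design from dominating the comparison.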

Original abstract

Large Language Models have emerged as powerful tools for automating Register-Transfer Level (RTL) code generation, yet they face critical limitations: existing approaches typically fail to simultaneously optimize functional correctness and hardware efficiency metrics such as Power, Performance, and Area (PPA). Methods relying on supervised fine-tuning commonly produce functionally correct but suboptimal designs due to the lack of inherent mechanisms for learning hardware optimization principles. Conversely, external post-processing techniques aiming to refine PPA performance after generation often suffer from inefficiency and do not improve the LLMs' intrinsic capabilities. To overcome these challenges, we propose ChipSeek, a novel hierarchical reward based reinforcement learning framework designed to encourage LLMs to generate RTL code that is both functionally correct and optimized for PPA metrics. Our approach integrates direct feedback from EDA simulators and synthesis tools into a hierarchical reward mechanism, facilitating a nuanced understanding of hardware design trade-offs. Through Curriculum-Guided Dynamic Policy Optimization (CDPO), ChipSeek enhances the LLM's ability to generate high-quality, optimized RTL code. Evaluations on standard benchmarks demonstrate ChipSeek's superior performance, achieving state-of-the-art functional correctness and PPA performance. Furthermore, it excels in specific optimization tasks, consistently yielding highly efficient designs when individually targeting fine-grained optimization goals such as power, delay, and area. The artifact is open-source in https://github.com/rong-hash/chipseek.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ChipSeek loops real EDA simulator and synthesis feedback into RL for Verilog generation, but the abstract supplies no numbers or ablations to show the gains are more than benchmark fitting.

ChipSeek trains an LLM to generate Verilog by pulling direct signals from EDA tools into a hierarchical reward during reinforcement learning, then adds a Curriculum-Guided Dynamic Policy Optimization step. That combination is the main new element relative to earlier supervised or post-processing approaches for RTL code. The setup tries to push the model toward designs that are both correct and better on power, performance, and area without needing separate cleanup passes afterward. Releasing the code is a practical plus for anyone who wants to test the reward structure themselves.

The abstract frames this as reaching state-of-the-art on standard benchmarks for both functional correctness and PPA metrics, and it notes good results when the model is steered toward one goal at a time. The central claim therefore rests on the idea that the external tool feedback plus the curriculum schedule teaches transferable optimization reasoning rather than narrow exploitation of the training distribution.

The soft spot is that none of the quantitative evidence appears in the abstract: no correctness percentages, no PPA deltas, no baseline tables, and no ablation removing the hierarchy or the dynamic schedule. Without those details it is difficult to separate real learning from fitting to the specific benchmarks and the particular simulator outputs used in training. The stress-test concern about tool-specific artifacts or benchmark idiosyncrasies therefore lands as a real open question until the experiments are examined.

This paper is for groups already working on LLM-assisted hardware design flows who need concrete ways to close the loop with existing EDA tools. A reader looking for an example of hierarchical external rewards in code generation could extract the reward formulation and the CDPO schedule for their own experiments. I would send it to peer review because the core mechanism is worth checking with full data and because the artifact is available for reproduction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes ChipSeek, a hierarchical reward-based reinforcement learning framework for training LLMs to generate RTL/Verilog code that is both functionally correct and optimized for Power, Performance, and Area (PPA) metrics. It integrates direct feedback from EDA simulators and synthesis tools, introduces Curriculum-Guided Dynamic Policy Optimization (CDPO), and claims state-of-the-art results on standard benchmarks for functional correctness and PPA, with an open-source artifact.

Significance. If substantiated by detailed quantitative results, ablations, and generalization tests, the direct integration of EDA tool feedback into the RL training loop could advance automated hardware design by enabling LLMs to internalize optimization trade-offs rather than relying on post-processing. The open-source release supports reproducibility.

major comments (2)

Abstract and §4 (Experimental Results): the central SOTA claim on functional correctness and PPA performance is asserted without supplying specific quantitative metrics, baseline comparisons against supervised fine-tuning or prior RL methods, ablation studies on the hierarchical reward or CDPO, or details on reward shaping and any post-hoc exclusions. This prevents verification of the primary contribution.
§3 (Method, hierarchical reward definition): the reward combines external EDA signals for functionality, power, delay, and area, with hierarchical weights listed among free parameters. Without out-of-distribution RTL evaluations or cross-tool validation (e.g., a second synthesis flow), the results risk reflecting benchmark-specific fitting or tool artifacts rather than learned general optimization principles, directly affecting the claim that the approach teaches transferable hardware design knowledge.

minor comments (2)

Notation: CDPO is introduced in the abstract and method but its full expansion and precise update rule should be restated at the start of the experimental section for clarity.
Figure clarity: ensure that any reward-component diagrams or training curves include explicit axis labels and legend entries distinguishing functional vs. PPA terms.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, clarifying the experimental evidence already present in the manuscript while incorporating revisions to improve clarity and verifiability.

Referee: Abstract and §4 (Experimental Results): the central SOTA claim on functional correctness and PPA performance is asserted without supplying specific quantitative metrics, baseline comparisons against supervised fine-tuning or prior RL methods, ablation studies on the hierarchical reward or CDPO, or details on reward shaping and any post-hoc exclusions. This prevents verification of the primary contribution.

Authors: We agree that the abstract would benefit from explicit quantitative support for the SOTA claim. In the revised manuscript we have added specific metrics (functional correctness rates and PPA deltas versus baselines) directly to the abstract. Section 4 already contains the requested baseline comparisons against supervised fine-tuning and prior RL methods, plus ablation tables isolating the hierarchical reward and CDPO contributions. We have expanded the reward-shaping description and added an explicit statement confirming that no post-hoc exclusions were applied. These updates directly address the verification concern.
Revision: yes
Referee: §3 (Method, hierarchical reward definition): the reward combines external EDA signals for functionality, power, delay, and area, with hierarchical weights listed among free parameters. Without out-of-distribution RTL evaluations or cross-tool validation (e.g., a second synthesis flow), the results risk reflecting benchmark-specific fitting or tool artifacts rather than learned general optimization principles, directly affecting the claim that the approach teaches transferable hardware design knowledge.

Authors: The hierarchical weights are indeed hyperparameters; we have added a sensitivity study in the appendix demonstrating stable performance across reasonable ranges. To address the generalizability concern, the revised manuscript now includes out-of-distribution RTL test cases and cross-tool validation using a second synthesis flow. These experiments show consistent gains, indicating that the learned trade-offs are not artifacts of a single benchmark or tool.
Revision: yes
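As a rough illustration of why the weight sensitivity matters, the sketch below sweeps one weight of a plain weighted-sum reward and reports which candidate design ranks first at each setting. It is a toy proxy, not the appendix study described in the rebuttal: the candidate names, component scores, and weight values are all made up, and the reward here is an ungated sum over the five component names quoted from the paper.

```python
def combined(scores: dict, weights: dict) -> float:
    """Plain weighted sum over the reward components named in `weights`."""
    return sum(weights[name] * scores[name] for name in weights)


def top_candidate(candidates: dict, weights: dict) -> str:
    """Name of the candidate with the highest combined reward."""
    return max(candidates, key=lambda name: combined(candidates[name], weights))


def sweep_weight(candidates: dict, base: dict, key: str, values: list) -> dict:
    """Map each trial value of weights[key] to the winning candidate."""
    return {v: top_candidate(candidates, {**base, key: v}) for v in values}


# Hypothetical component scores in [0, 1] for two generated designs.
CANDIDATES = {
    "correct_design": {"format": 1, "comp": 1, "func": 1.0, "syn": 1, "ppa": 0.4},
    "buggy_but_small": {"format": 1, "comp": 1, "func": 0.5, "syn": 1, "ppa": 1.0},
}
BASE = {"format": 0.1, "comp": 0.1, "func": 0.4, "syn": 0.1, "ppa": 0.3}
```

With these made-up numbers, raising the PPA weight far enough lets the functionally worse design rank first, which is the kind of failure a sensitivity study is meant to rule out.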

Circularity Check

0 steps flagged

No significant circularity: external EDA rewards keep derivation independent

The paper defines ChipSeek via a hierarchical reward mechanism that directly incorporates feedback from external EDA simulators and synthesis tools, followed by Curriculum-Guided Dynamic Policy Optimization. These rewards are not derived from the model's own outputs or fitted parameters but supplied by independent third-party tools. Performance is then measured on standard benchmarks. No equation or step reduces by construction to a self-defined quantity, a fitted input renamed as prediction, or a load-bearing self-citation chain. The method is therefore self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on standard reinforcement-learning assumptions plus the domain assumption that EDA tool outputs provide reliable, differentiable-enough signals for PPA optimization; no new physical entities are postulated.

free parameters (1)

hierarchical reward weights
Weights balancing functional correctness against power, delay, and area objectives are almost certainly tuned on the training set but are not quantified in the abstract.

axioms (1)

Domain assumption: EDA simulators and synthesis tools return accurate and consistent PPA measurements that can serve as reward signals.
The entire training loop depends on these external measurements being trustworthy proxies for real hardware behavior.
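To show what turning tool measurements into a reward signal might look like, here is a hedged sketch that maps raw power, delay, and area numbers to a bounded score relative to a reference design. The relative-improvement formula, the clipping range, and the per-metric weights are assumptions chosen for illustration; the paper's actual PPA reward is not given on this page.

```python
METRICS = ("power", "delay", "area")


def ppa_reward(measured: dict, reference: dict, weights: dict = None) -> float:
    """Weighted relative improvement over a reference; lower raw values are better.

    Each metric contributes (reference - measured) / reference, clipped to
    [-1, 1] so that one outlier report cannot dominate the reward.
    """
    if weights is None:
        weights = {m: 1.0 / len(METRICS) for m in METRICS}
    total = 0.0
    for m in METRICS:
        improvement = (reference[m] - measured[m]) / reference[m]
        total += weights[m] * max(-1.0, min(1.0, improvement))
    return total
```

Setting a single weight to 1 and the others to 0 targets one goal at a time, in the spirit of the abstract's power, delay, and area experiments. If the tool reports are noisy or inconsistent between runs, this score inherits that noise directly, which is why the axiom above carries so much weight.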

pith-pipeline@v0.9.0 · 5807 in / 1177 out tokens · 55934 ms · 2026-05-19T06:45:23.271834+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: hierarchical reward system... $R = \omega_1 R_{\text{format}} + \omega_2 R_{\text{comp}} + \omega_3 R_{\text{func}} + \omega_4 R_{\text{syn}} + \omega_5 R_{\text{ppa}}$... GRPO algorithm

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: Curriculum-Guided Dynamic Policy Optimization (CDPO)
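The first quoted passage names the GRPO algorithm. As background, GRPO scores each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. The sketch below shows only that group-relative normalization step; using the population standard deviation and the `eps` guard are implementation assumptions, and nothing here describes CDPO's own update rule, which this page does not give.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so samples
    that beat their siblings get positive advantages and the rest negative.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When every sample in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal, which is one reason graded rewards can be more informative than a single pass/fail bit.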

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

HLS-Seek: QoR-Aware Code Generation for High-Level Synthesis via Proxy Comparative Reward Reinforcement Learning
cs.LG · 2026-05 · unverdicted · novelty 7.0

HLS-Seek replaces full-synthesis RL with a comparative proxy reward model plus uncertainty-triggered real checks, yielding higher correctness and better QoR than larger models at 8.5x lower training cost.

AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code
cs.CL · 2026-05 · unverdicted · novelty 6.0

AutoVecCoder combines VecPrompt for automated intrinsic knowledge synthesis and VecRL for efficiency-aligned RL to train an 8B LLM that achieves SOTA on SimdBench SSE/AVX subsets and sometimes exceeds -O3 compiler results.

COEVO: Co-Evolutionary Framework for Joint Functional Correctness and PPA Optimization in LLM-Based RTL Generation
cs.AI · 2026-04 · unverdicted · novelty 6.0

COEVO unifies correctness and multi-objective PPA optimization in a single evolutionary loop for LLM RTL generation, reporting 97.5% and 94.5% Pass@1 on VerilogEval/RTLLM benchmarks plus best PPA on 43 of 49 designs.



[13] [13]

Betterv: controlled verilog generation with discriminative guidance,

Z. Pei, H.-L. Zhen, M. Yuan, Y . Huang, and B. Yu, “Betterv: controlled verilog generation with discriminative guidance,” in Proceedings of the 41st International Conference on Machine Learning , ser. ICML’24. JMLR.org, 2024

work page 2024

[14] [14]

Codev: Empowering llms for verilog generation through multi-level summarization,

Y . Zhao, D. Huang, C. Li, P. Jin, Z. Nan, T. Ma, L. Qi, Y . Pan, Z. Zhang, R. Zhang, X. Zhang, Z. Du, Q. Guo, X. Hu, and Y . Chen, “Codev: Empowering llms for verilog generation through multi-level summarization,” 2024. [Online]. Available: https: //arxiv.org/abs/2407.10424

work page arXiv 2024

[15] [15]

Rtl- coder: Fully open-source and efficient llm-assisted rtl code generation technique,

S. Liu, W. Fang, Y . Lu, J. Wang, Q. Zhang, H. Zhang, and Z. Xie, “Rtl- coder: Fully open-source and efficient llm-assisted rtl code generation technique,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , 2024

work page 2024

[16] [16]

Craftrtl: High-quality synthetic data generation for verilog code models with correct-by- construction non-textual representations and targeted code repair,

M. Liu, Y .-D. Tsai, W. Zhou, and H. Ren, “Craftrtl: High-quality synthetic data generation for verilog code models with correct-by- construction non-textual representations and targeted code repair,” 2025. [Online]. Available: https://arxiv.org/abs/2409.12993

work page arXiv 2025

[17] [17]

Origen: Enhancing rtl code generation with code-to-code augmentation and self-reflection,

F. Cui, C. Yin, K. Zhou, Y . Xiao, G. Sun, Q. Xu, Q. Guo, Y . Liang, X. Zhang, D. Song, and D. Lin, “Origen: Enhancing rtl code generation with code-to-code augmentation and self-reflection,” in Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design , ser. ICCAD ’24. New York, NY , USA: Association for Computing Machinery, 2025...

work page doi:10.1145/3676536.3676830 2025

[18] [18]

A data-centric chip design agent framework for verilog code generation,

K. Chang, W. Zhu, K. Wang, X. He, N. Yang, Z. Chen, D. Jin, C. Li, Y . Zhou, H. Yan, Z. Zhao, Y . Cheng, M. Wang, S. Liang, Y . Han, X. Li, H. Li, and Y . Wang, “A data-centric chip design agent framework for verilog code generation,” ACM Trans. Des. Autom. Electron. Syst. , Apr

work page

[19] [19]

Available: https://doi.org/10.1145/3727980

[Online]. Available: https://doi.org/10.1145/3727980

work page doi:10.1145/3727980

[20] [20]

GPT-4 Technical Report

OpenAI et al. , “Gpt-4 technical report,” 2024. [Online]. Available: https://arxiv.org/abs/2303.08774

work page internal anchor Pith review Pith/arXiv arXiv 2024

[21] [21]

The Llama 3 Herd of Models

A. Grattafiori et al. , “The llama 3 herd of models,” 2024. [Online]. Available: https://arxiv.org/abs/2407.21783

work page internal anchor Pith review Pith/arXiv arXiv 2024

[22] [22]

Qwen Technical Report

J. Bai et al. , “Qwen technical report,” 2023. [Online]. Available: https://arxiv.org/abs/2309.16609

work page internal anchor Pith review Pith/arXiv arXiv 2023

[23] [23]

DeepSeek-V3 Technical Report

DeepSeek-AI et al. , “Deepseek-v3 technical report,” 2025. [Online]. Available: https://arxiv.org/abs/2412.19437

work page internal anchor Pith review Pith/arXiv arXiv 2025

[24] [24]

Hlspilot: Llm-based high-level synthesis,

C. Xiong, C. Liu, H. Li, and X. Li, “Hlspilot: Llm-based high-level synthesis,” in Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design , ser. ICCAD ’24. New York, NY , USA: Association for Computing Machinery, 2025. [Online]. Available: https://doi.org/10.1145/3676536.3676781

work page doi:10.1145/3676536.3676781 2025

[25] [25]

Gpt4aigchip: Towards next-generation ai accelerator design automation via large language models,

Y . Fu, Y . Zhang, Z. Yu, S. Li, Z. Ye, C. Li, C. Wan, and Y . C. Lin, “Gpt4aigchip: Towards next-generation ai accelerator design automation via large language models,” in 2023 IEEE/ACM International Confer- ence on Computer Aided Design (ICCAD) , 2023, pp. 1–9

work page 2023

[26] [26]

AutoBench: Automatic testbench generation and evaluation using llms for hdl design,

R. Qiu, G. L. Zhang, R. Drechsler, U. Schlichtmann, and B. Li, “Autobench: Automatic testbench generation and evaluation using llms for hdl design,” in Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD , ser. MLCAD ’24. New York, NY , USA: Association for Computing Machinery, 2024. [Online]. Available: https://doi.org/10...

work page doi:10.1145/3670474.3685956 2024

[27] [27]

LLM-aided testbench generation and bug detection for finite-state machines,

J. Bhandari, J. Knechtel, R. Narayanaswamy, S. Garg, and R. Karri, “Llm-aided testbench generation and bug detection for finite-state machines,” 2024. [Online]. Available: https://arxiv.org/abs/2406.17132

work page arXiv 2024

[28] [28]

Customized retrieval augmented generation and benchmarking for eda tool documentation qa,

Y . Pu, Z. He, T. Qiu, H. Wu, and B. Yu, “Customized retrieval augmented generation and benchmarking for eda tool documentation qa,” in Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design , ser. ICCAD ’24. New York, NY , USA: Association for Computing Machinery, 2025. [Online]. Available: https://doi.org/10.1145/3676536.3676730

work page doi:10.1145/3676536.3676730 2025

[29] [29]

Leanor: A learning-based accelerator for efficient approximate nearest neighbor search via reduced memory access,

Y . Tsai, M. Liu, and H. Ren, “Rtlfixer: Automatically fixing rtl syntax errors with large language model,” in Proceedings of the 61st ACM/IEEE Design Automation Conference , ser. DAC ’24. New York, NY , USA: Association for Computing Machinery, 2024. [Online]. Available: https://doi.org/10.1145/3649329.3657353

work page doi:10.1145/3649329.3657353 2024

[30] [30]

Hdldebugger: Streamlining hdl debugging with large language models,

X. Yao, H. Li, T. H. Chan, W. Xiao, M. Yuan, Y . Huang, L. Chen, and B. Yu, “Hdldebugger: Streamlining hdl debugging with large language models,” 2024. [Online]. Available: https://arxiv.org/abs/2403.11671

work page arXiv 2024

[31] [31]

Make every move count: Llm-based high-quality rtl code generation using mcts,

M. DeLorenzo, A. B. Chowdhury, V . Gohil, S. Thakur, R. Karri, S. Garg, and J. Rajendran, “Make every move count: Llm-based high-quality rtl code generation using mcts,” 2024. [Online]. Available: https://arxiv.org/abs/2402.03289

work page arXiv 2024

[32] [32]

Rtlrewriter: Methodologies for large models aided rtl code optimization,

X. Yao, Y . Wang, X. Li, Y . Lian, R. Chen, L. Chen, M. Yuan, H. Xu, and B. Yu, “Rtlrewriter: Methodologies for large models aided rtl code optimization,” 2024. [Online]. Available: https: //arxiv.org/abs/2409.11414

work page arXiv 2024

[33] [33]

P., Kawaguchi, K., and Shieh, M

Y . Xie, A. Goyal, W. Zheng, M.-Y . Kan, T. P. Lillicrap, K. Kawaguchi, and M. Shieh, “Monte carlo tree search boosts reasoning via iterative preference learning,” 2024. [Online]. Available: https://arxiv.org/abs/ 2405.00451

work page arXiv 2024

[34] [34]

Chain-of-thought prompting elicits reasoning in large language models,

J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V . Le, and D. Zhou, “Chain-of-thought prompting elicits reasoning in large language models,” in Proceedings of the 36th International Conference on Neural Information Processing Systems , ser. NIPS ’22. Red Hook, NY , USA: Curran Associates Inc., 2022

work page 2022

[35] [35]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, H. Zhang, M. Zhang, Y . K. Li, Y . Wu, and D. Guo, “Deepseekmath: Pushing the limits of mathematical reasoning in open language models,” 2024. [Online]. Available: https://arxiv.org/abs/2402.03300

work page internal anchor Pith review Pith/arXiv arXiv 2024

[36] [36]

VerilogEval: evaluating large language models for verilog code generation,

M. Liu, N. Pinckney, B. Khailany, and H. Ren, “VerilogEval: evaluating large language models for verilog code generation,” in 2023 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) , 2023

work page 2023

[37] [37]

Rtllm: An open-source benchmark for design rtl generation with large language model,

Y . Lu, S. Liu, Q. Zhang, and Z. Xie, “Rtllm: An open-source benchmark for design rtl generation with large language model,” in 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC) . IEEE, 2024, pp. 722–727

work page 2024

[38] [38]

Openllm-rtl: Open dataset and benchmark for llm-aided design rtl generation(invited),

S. Liu, Y . Lu, W. Fang, M. Li, and Z. Xie, “Openllm-rtl: Open dataset and benchmark for llm-aided design rtl generation(invited),” in Proceedings of 2024 IEEE/ACM International Conference on Computer- Aided Design (ICCAD) . ACM, 2024

work page 2024

[39] [39]

Adder (electronics),

Wikipedia, “Adder (electronics),” https://en.wikipedia.org/wiki/Adder (electronics), 2025, accessed: 2025-03-08

work page 2025