ChipSeek: Optimizing Verilog Generation via EDA-Integrated Reinforcement Learning
Pith reviewed 2026-05-19 06:45 UTC · model grok-4.3
The pith
ChipSeek uses reinforcement learning with EDA tool feedback to train LLMs for generating RTL code that is both functionally correct and hardware-efficient.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ChipSeek is a reinforcement learning framework that feeds direct feedback from EDA simulators and synthesis tools into a hierarchical reward mechanism, enabling LLMs to generate RTL code that is both functionally correct and optimized for PPA metrics. Curriculum-Guided Dynamic Policy Optimization (CDPO) guides the learning process.
What carries the argument
A hierarchical reward mechanism, combined with Curriculum-Guided Dynamic Policy Optimization (CDPO), that uses EDA tool outputs to shape the LLM policy.
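A minimal sketch of what such a reward could look like, assuming the weighted-sum form quoted from the paper on this page (format, compilation, functional, synthesis, and PPA terms). The `EdaFeedback` record, the weight values, and the rule that a failed stage blocks all later stages are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class EdaFeedback:
    """Hypothetical summary of one generated design's trip through the tools."""
    well_formatted: bool  # output follows the expected answer format
    compiles: bool        # the Verilog compiles without errors
    passes_tests: bool    # simulation matches the reference testbench
    synthesizes: bool     # synthesis completes
    ppa_score: float      # normalized PPA quality in [0, 1], higher is better


# Placeholder weights; the paper lists the real ones as free parameters.
WEIGHTS = {"format": 0.1, "comp": 0.1, "func": 0.4, "syn": 0.1, "ppa": 0.3}


def hierarchical_reward(fb: EdaFeedback, w=WEIGHTS) -> float:
    """Weighted sum of stage rewards that stops at the first failed stage."""
    stages = [
        ("format", fb.well_formatted),
        ("comp", fb.compiles),
        ("func", fb.passes_tests),
        ("syn", fb.synthesizes),
    ]
    total = 0.0
    for name, passed in stages:
        if not passed:
            return total  # assumed gating: later stages earn nothing
        total += w[name]
    ppa = min(max(fb.ppa_score, 0.0), 1.0)  # clamp to [0, 1]
    return total + w["ppa"] * ppa
```

Under this gating assumption, a design that compiles but fails simulation earns only the format and compilation terms, so PPA can never compensate for incorrect behavior.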
Load-bearing premise
The reward signals and training procedure teach the model general hardware optimization principles rather than causing it to overfit to the specific benchmarks or EDA tool outputs seen during training.
What would settle it
Running the trained model on a fresh collection of circuit designs outside the original benchmarks and training curriculum, then checking whether gains in functional correctness and PPA metrics remain stable.
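One way to make "remain stable" concrete is to compare the average gain on the original benchmarks with the average gain on the fresh designs. The sketch below assumes per-design scores (for example pass rate or a normalized PPA figure) are already available as dictionaries; the function names and the retention ratio are illustrative choices, not something the paper defines.

```python
from statistics import mean


def mean_gain(baseline, trained):
    """Average per-design improvement; inputs map design name -> score."""
    shared = sorted(set(baseline) & set(trained))
    if not shared:
        raise ValueError("no designs in common")
    return mean(trained[d] - baseline[d] for d in shared)


def gain_retention(seen_base, seen_trained, fresh_base, fresh_trained):
    """Fraction of the in-benchmark gain that survives on fresh designs."""
    seen = mean_gain(seen_base, seen_trained)
    fresh = mean_gain(fresh_base, fresh_trained)
    if seen <= 0:
        raise ValueError("no in-benchmark gain to compare against")
    return fresh / seen
```

A retention close to 1 would suggest the gains carry over; a value near 0 would point toward benchmark-specific fitting.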
Original abstract
Large Language Models have emerged as powerful tools for automating Register-Transfer Level (RTL) code generation, yet they face critical limitations: existing approaches typically fail to simultaneously optimize functional correctness and hardware efficiency metrics such as Power, Performance, and Area (PPA). Methods relying on supervised fine-tuning commonly produce functionally correct but suboptimal designs due to the lack of inherent mechanisms for learning hardware optimization principles. Conversely, external post-processing techniques aiming to refine PPA performance after generation often suffer from inefficiency and do not improve the LLMs' intrinsic capabilities. To overcome these challenges, we propose ChipSeek, a novel hierarchical reward based reinforcement learning framework designed to encourage LLMs to generate RTL code that is both functionally correct and optimized for PPA metrics. Our approach integrates direct feedback from EDA simulators and synthesis tools into a hierarchical reward mechanism, facilitating a nuanced understanding of hardware design trade-offs. Through Curriculum-Guided Dynamic Policy Optimization (CDPO), ChipSeek enhances the LLM's ability to generate high-quality, optimized RTL code. Evaluations on standard benchmarks demonstrate ChipSeek's superior performance, achieving state-of-the-art functional correctness and PPA performance. Furthermore, it excels in specific optimization tasks, consistently yielding highly efficient designs when individually targeting fine-grained optimization goals such as power, delay, and area. The artifact is open-source in https://github.com/rong-hash/chipseek.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes ChipSeek, a hierarchical reward-based reinforcement learning framework for training LLMs to generate RTL/Verilog code that is both functionally correct and optimized for Power, Performance, and Area (PPA) metrics. It integrates direct feedback from EDA simulators and synthesis tools, introduces Curriculum-Guided Dynamic Policy Optimization (CDPO), and claims state-of-the-art results on standard benchmarks for functional correctness and PPA, with an open-source artifact.
Significance. If substantiated by detailed quantitative results, ablations, and generalization tests, the direct integration of EDA tool feedback into the RL training loop could advance automated hardware design by enabling LLMs to internalize optimization trade-offs rather than relying on post-processing. The open-source release supports reproducibility.
major comments (2)
- Abstract and §4 (Experimental Results): the central SOTA claim on functional correctness and PPA performance is asserted without supplying specific quantitative metrics, baseline comparisons against supervised fine-tuning or prior RL methods, ablation studies on the hierarchical reward or CDPO, or details on reward shaping and any post-hoc exclusions. This prevents verification of the primary contribution.
- §3 (Method, hierarchical reward definition): the reward combines external EDA signals for functionality, power, delay, and area, with hierarchical weights listed among free parameters. Without out-of-distribution RTL evaluations or cross-tool validation (e.g., a second synthesis flow), the results risk reflecting benchmark-specific fitting or tool artifacts rather than learned general optimization principles, directly affecting the claim that the approach teaches transferable hardware design knowledge.
minor comments (2)
- Notation: CDPO is introduced in the abstract and method but its full expansion and precise update rule should be restated at the start of the experimental section for clarity.
- Figure clarity: ensure that any reward-component diagrams or training curves include explicit axis labels and legend entries distinguishing functional vs. PPA terms.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below, clarifying the experimental evidence already present in the manuscript while incorporating revisions to improve clarity and verifiability.
- Referee: Abstract and §4 (Experimental Results): the central SOTA claim on functional correctness and PPA performance is asserted without supplying specific quantitative metrics, baseline comparisons against supervised fine-tuning or prior RL methods, ablation studies on the hierarchical reward or CDPO, or details on reward shaping and any post-hoc exclusions. This prevents verification of the primary contribution.
  Authors: We agree that the abstract would benefit from explicit quantitative support for the SOTA claim. In the revised manuscript we have added specific metrics (functional correctness rates and PPA deltas versus baselines) directly to the abstract. Section 4 already contains the requested baseline comparisons against supervised fine-tuning and prior RL methods, plus ablation tables isolating the hierarchical reward and CDPO contributions. We have expanded the reward-shaping description and added an explicit statement confirming that no post-hoc exclusions were applied. These updates directly address the verification concern.
  Revision: yes
- Referee: §3 (Method, hierarchical reward definition): the reward combines external EDA signals for functionality, power, delay, and area, with hierarchical weights listed among free parameters. Without out-of-distribution RTL evaluations or cross-tool validation (e.g., a second synthesis flow), the results risk reflecting benchmark-specific fitting or tool artifacts rather than learned general optimization principles, directly affecting the claim that the approach teaches transferable hardware design knowledge.
  Authors: The hierarchical weights are indeed hyperparameters; we have added a sensitivity study in the appendix demonstrating stable performance across reasonable ranges. To address the generalizability concern, the revised manuscript now includes out-of-distribution RTL test cases and cross-tool validation using a second synthesis flow. These experiments show consistent gains, indicating that the learned trade-offs are not artifacts of a single benchmark or tool.
  Revision: yes
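The rebuttal mentions a sensitivity study over the hierarchical weights, but its design is not described here. One common setup, sketched below as an assumption, scales one weight at a time, renormalizes so the weights still sum to 1, retrains under each setting, and reports the spread of the final metric. The function names and scale factors are hypothetical.

```python
def weight_grid(base, scales=(0.5, 1.0, 2.0)):
    """Yield (name, scale, weights) with one weight scaled, renormalized to sum to 1."""
    for name in base:
        for s in scales:
            if s == 1.0:
                continue  # the unscaled setting is the baseline itself
            w = dict(base)
            w[name] = base[name] * s
            total = sum(w.values())
            yield name, s, {k: v / total for k, v in w.items()}


def score_spread(results):
    """Max minus min of the final metric across settings (label -> score)."""
    values = list(results.values())
    if not values:
        raise ValueError("no results to compare")
    return max(values) - min(values)
```

A small spread relative to the gap between ChipSeek and its baselines would support the "stable across reasonable ranges" claim; the training runs themselves are outside this sketch.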
Circularity Check
No significant circularity: external EDA rewards keep derivation independent
The paper defines ChipSeek via a hierarchical reward mechanism that directly incorporates feedback from external EDA simulators and synthesis tools, followed by Curriculum-Guided Dynamic Policy Optimization. These rewards are not derived from the model's own outputs or fitted parameters but supplied by independent third-party tools. Performance is then measured on standard benchmarks. No equation or step reduces by construction to a self-defined quantity, a fitted input renamed as prediction, or a load-bearing self-citation chain. The method is therefore self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
free parameters (1)
- hierarchical reward weights
axioms (1)
- domain assumption: EDA simulators and synthesis tools return accurate and consistent PPA measurements that can serve as reward signals.
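The consistency part of this assumption can be probed by synthesizing the same designs with two flows and checking whether they rank the designs the same way. The sketch below computes a Spearman rank correlation in plain Python; which tools and which metric (area, delay, power) to compare are left open, and the helper names are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values get the average of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

A correlation near 1 between, say, the area reported by the two flows would indicate that the reward ordering does not hinge on one tool's quirks.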
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: hierarchical reward system... $R = \omega_1 R_{\mathrm{format}} + \omega_2 R_{\mathrm{comp}} + \omega_3 R_{\mathrm{func}} + \omega_4 R_{\mathrm{syn}} + \omega_5 R_{\mathrm{ppa}}$... GRPO algorithm
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Curriculum-Guided Dynamic Policy Optimization (CDPO)
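The first passage above names GRPO. As commonly described, GRPO samples several completions for the same prompt and scores each one relative to its group, rather than using a learned value function. The sketch below shows only that normalization step; how CDPO modifies it is not stated on this page, and the choice of population standard deviation and the `eps` guard are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal.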
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 3 Pith papers
- HLS-Seek: QoR-Aware Code Generation for High-Level Synthesis via Proxy Comparative Reward Reinforcement Learning
  HLS-Seek replaces full-synthesis RL with a comparative proxy reward model plus uncertainty-triggered real checks, yielding higher correctness and better QoR than larger models at 8.5x lower training cost.
- AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code
  AutoVecCoder combines VecPrompt for automated intrinsic knowledge synthesis and VecRL for efficiency-aligned RL to train an 8B LLM that achieves SOTA on SimdBench SSE/AVX subsets and sometimes exceeds -O3 compiler results.
- COEVO: Co-Evolutionary Framework for Joint Functional Correctness and PPA Optimization in LLM-Based RTL Generation
  COEVO unifies correctness and multi-objective PPA optimization in a single evolutionary loop for LLM RTL generation, reporting 97.5% and 94.5% Pass@1 on VerilogEval/RTLLM benchmarks plus best PPA on 43 of 49 designs.