
arxiv: 2606.10285 · v1 · pith:24WNE4XBnew · submitted 2026-06-09 · 💻 cs.CL

OpenRTLSet: A Fully Open-Source Dataset for Large Language Model-based Verilog Module Design

Pith reviewed 2026-06-27 13:43 UTC · model grok-4.3

classification 💻 cs.CL
keywords OpenRTLSet · Verilog dataset · LLM fine-tuning · hardware design · open-source dataset · code generation · VHDL translation · C/C++ translation

The pith

OpenRTLSet supplies what its authors describe as the largest fully open Verilog dataset, over 131,000 modules, for training language models on hardware design.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OpenRTLSet to give researchers and industry a large collection of Verilog code that anyone can use without proprietary restrictions. It gathers 102,000 modules directly from GitHub, adds 5,000 from VHDL translations and 24,000 from C/C++ translations, then pairs every sample with a natural-language description generated by DeepSeek-R1. The dataset supports fine-tuning models such as Qwen and Granite, and the authors claim that open resources can reach higher performance on hardware tasks than closed alternatives. If that holds, it would remove a major barrier that has kept hardware design work dependent on proprietary data.

Core claim

OpenRTLSet is the largest fully open-source dataset for hardware design, containing over 131,000 Verilog modules drawn from GitHub repositories, VHDL-to-Verilog translations, and synthesizable C/C++ translations, each accompanied by natural language descriptions generated by DeepSeek-R1 so that language models can be fine-tuned for Verilog module creation.

What carries the argument

The OpenRTLSet dataset, which merges raw open Verilog, translated modules, and paired natural-language descriptions to support LLM fine-tuning for hardware code generation.
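To make the shape of that data concrete, here is a minimal sketch of how one description/code pair and the reported source mix might be represented. The class name `RtlSample`, the field names, and the three source labels are hypothetical; the page does not describe the released file format.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical labels for the three origins the paper lists.
SOURCES = ("github", "vhdl_translation", "c_cpp_translation")


@dataclass(frozen=True)
class RtlSample:
    """One training pair: a description and the Verilog module it describes."""
    description: str  # natural-language text (generated by DeepSeek-R1 in the paper)
    verilog: str      # Verilog module source
    source: str       # one of SOURCES

    def __post_init__(self):
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")


def composition(samples):
    """Count samples per origin so a local copy can be compared with the reported mix."""
    return Counter(s.source for s in samples)


# Split reported in the abstract, rounded to thousands of modules.
REPORTED = {"github": 102_000, "vhdl_translation": 5_000, "c_cpp_translation": 24_000}
REPORTED_TOTAL = sum(REPORTED.values())  # 131_000
```

Keeping the origin on every record is a design choice made for this sketch: it lets a user train or evaluate on the GitHub subset separately from the translated subsets, whose correctness the review flags as unverified.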

If this is right

  • Open-source datasets can match or exceed the results of restricted datasets for Verilog generation tasks.
  • Models from 7B to 32B parameters can be fine-tuned on the data, with Verilator-generated C++ files available as extra context during labeling and a choice between INT4 and BF16 quantization.
  • The combination of multiple source types expands the diversity of training examples available to the community.
  • The same approach can be repeated to grow the dataset further without licensing barriers.
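For a sense of what the INT4 versus BF16 choice means at the 7B and 32B ends of that range, the arithmetic below estimates weight storage only. It assumes exactly 16 bits per parameter for BF16 and 4 bits for INT4; real INT4 schemes add some overhead for scales and zero points, and activations, optimizer state, and KV cache are ignored. None of these figures come from the paper.

```python
# Bytes per parameter under the simplifying assumptions stated above.
BYTES_PER_PARAM = {"bf16": 2.0, "int4": 0.5}


def weight_gb(n_params: float, fmt: str) -> float:
    """Approximate weight storage in GB (10**9 bytes), weights only."""
    return n_params * BYTES_PER_PARAM[fmt] / 1e9


for size in (7e9, 32e9):
    print(
        f"{size / 1e9:.0f}B params: "
        f"bf16 = {weight_gb(size, 'bf16'):.1f} GB, "
        f"int4 = {weight_gb(size, 'int4'):.1f} GB"
    )
# 7B params: bf16 = 14.0 GB, int4 = 3.5 GB
# 32B params: bf16 = 64.0 GB, int4 = 16.0 GB
```

Under these assumptions INT4 cuts weight storage by a factor of four, which is the practical reason a quantization comparison matters for groups without large accelerators.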

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The translation pipeline could be applied to additional legacy codebases to grow open hardware training resources.
  • Researchers might add automated verification steps to confirm the correctness of translated modules before release.
  • The dataset lowers the entry cost for academic groups to experiment with LLM-based hardware tools.
  • Similar collection methods could be used for other hardware description languages beyond Verilog.

Load-bearing premise

The automatic translations from VHDL and C/C++ produce correct and synthesizable Verilog modules and the generated natural language descriptions accurately reflect each module's function without errors.
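One cheap way to probe the first half of this premise is to measure how many modules pass a tool check at all. The sketch below is not the authors' method: it assumes Verilator is installed and uses its `--lint-only` mode as the check, and the helper names are hypothetical. A clean lint is weaker than synthesizability and says nothing about equivalence with the VHDL or C/C++ original.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def lint_with_verilator(verilog: str) -> bool:
    """Return True if `verilator --lint-only` exits with status 0 on the source."""
    if shutil.which("verilator") is None:
        raise RuntimeError("verilator not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.v"
        path.write_text(verilog)
        result = subprocess.run(
            ["verilator", "--lint-only", str(path)],
            capture_output=True,
            text=True,
        )
    return result.returncode == 0


def pass_rate(samples: Iterable[str], check: Callable[[str], bool]) -> float:
    """Fraction of samples for which `check` returns True (0.0 if there are none)."""
    results = [check(s) for s in samples]
    return sum(results) / len(results) if results else 0.0
```

Calling `pass_rate(modules, lint_with_verilator)` separately on the GitHub, VHDL-translated, and C/C++-translated subsets would give the kind of per-source pass rate the referee report asks for. The checker is passed in as a parameter so a synthesis or equivalence tool can be substituted later.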

What would settle it

Running the same fine-tuning and evaluation pipeline on OpenRTLSet versus a comparable proprietary dataset and finding that models trained on OpenRTLSet produce lower rates of functionally correct Verilog on standard hardware benchmarks.

Figures

Figures reproduced from arXiv: 2606.10285 by Deming Chen, Jinghua Wang, Kaiwen Cao, Lily Jiaxin Wan, Manvi Jha, Sanjana Pingali, Scott Smith, Shalini Sivakumar, Xing Zhao.

Figure 1: The end-to-end flow for OpenRTLSet creation, labeling, fine-tuning, and evaluation. The process begins with raw data collection from GitHub repositories, followed by labeling using DeepSeek-R1 70B, fine-tuning of multiple LLM architectures, and evaluation using the VerilogEval benchmark.
Original abstract

OpenRTLSet introduces the largest fully open-source dataset for hardware design, offering over 131,000 diverse Verilog code samples to the research community and industry. Our dataset uniquely combines Verilog code from GitHub repositories (102k modules), VHDL translations (5k modules), and synthesizable C/C++ translations (24k modules), all freely accessible without proprietary restrictions. Using the reasoning model DeepSeek-R1, we generated paired natural language descriptions for each code sample, enabling fine-tuning of various language model families (e.g., Qwen and Granite) for Verilog code generation. Our dataset explores multiple options, including Verilator-generated C++ files as additional context during labeling, quantization techniques (INT4 vs. BF16), and performance differences across model sizes (7B-32B parameters). OpenRTLSet demonstrates that open-source approaches can achieve superior performance in hardware design tasks, establishing a new foundation for accessible research and commercial use in this domain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces OpenRTLSet as the largest fully open-source dataset for LLM-based Verilog module design, containing over 131,000 samples: 102k Verilog modules from GitHub, 5k from VHDL translations, and 24k from synthesizable C/C++ translations. Paired natural language descriptions are generated for each using DeepSeek-R1, with explorations of fine-tuning Qwen and Granite models under varying conditions (e.g., Verilator context, INT4 vs. BF16 quantization, 7B-32B sizes). The work claims this demonstrates that open-source approaches can achieve superior performance in hardware design tasks.

Significance. If the translations are functionally equivalent and the descriptions are accurate, the fully open release of a dataset at this scale would be a valuable contribution, enabling reproducible fine-tuning experiments and lowering barriers for research in LLM-assisted hardware design.

major comments (2)
  1. [Abstract] The central claim that the 5k VHDL and 24k C/C++ translations yield correct, synthesizable Verilog and that DeepSeek-R1 descriptions faithfully capture functionality rests on unverified assertions. No equivalence checking, synthesis pass rates, testbench coverage, or validation of the generated descriptions is reported, which is load-bearing for the dataset's utility in fine-tuning and the 'superior performance' assertion.
  2. [Abstract] The statement that the dataset 'demonstrates that open-source approaches can achieve superior performance' lacks any reported quantitative results, baselines, metrics, or evaluation protocol, preventing assessment of whether the claim holds for the translated subsets or overall.
minor comments (1)
  1. The abstract would benefit from explicit definitions of key terms such as 'synthesizable' and 'superior performance' to improve clarity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We agree that the claims regarding translation correctness and performance superiority require either supporting evidence or appropriate qualification, and we will revise the manuscript to address these issues directly.

  1. Referee: [Abstract] The central claim that the 5k VHDL and 24k C/C++ translations yield correct, synthesizable Verilog and that DeepSeek-R1 descriptions faithfully capture functionality rests on unverified assertions. No equivalence checking, synthesis pass rates, testbench coverage, or validation of the generated descriptions is reported, which is load-bearing for the dataset's utility in fine-tuning and the 'superior performance' assertion.

    Authors: We acknowledge that the manuscript reports no equivalence checking, synthesis pass rates, testbench coverage, or other validation for the VHDL-to-Verilog and C/C++-to-Verilog translations, nor for the accuracy of the DeepSeek-R1 descriptions. The translations were generated using standard conversion approaches, but these steps were not formally verified at scale. In the revised version we will add an explicit limitations subsection describing the dataset construction process, state that the translated modules are released without additional equivalence or synthesis validation, and qualify all related claims accordingly. revision: yes

  2. Referee: [Abstract] The statement that the dataset 'demonstrates that open-source approaches can achieve superior performance' lacks any reported quantitative results, baselines, metrics, or evaluation protocol, preventing assessment of whether the claim holds for the translated subsets or overall.

    Authors: The abstract claim that the dataset demonstrates superior performance of open-source approaches is not supported by quantitative results, baselines, or an evaluation protocol in the current manuscript. The fine-tuning explorations are described qualitatively but without the metrics needed to substantiate superiority. We will revise the abstract to remove this claim and instead highlight the dataset's role in enabling future reproducible evaluations by the community. revision: yes

Circularity Check

0 steps flagged

No circularity: dataset release without derivation chain


The paper introduces OpenRTLSet as a data collection assembled from GitHub Verilog, VHDL-to-Verilog translations, C/C++-to-Verilog translations, and DeepSeek-R1-generated natural-language descriptions. No equations, fitted parameters, uniqueness theorems, or predictive claims appear in the manuscript. The central contribution is the release of 131k modules; any performance claims about downstream fine-tuning rest on external use of the dataset rather than on an internal derivation that reduces to the paper's own inputs by construction. Load-bearing self-citation, ansatz smuggling, and renaming of known results are all absent.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Dataset release paper; no free parameters, mathematical axioms, or invented entities are introduced.

pith-pipeline@v0.9.1-grok · 5726 in / 1086 out tokens · 15613 ms · 2026-06-27T13:43:07.666232+00:00 · methodology

