Spec2Cov: An Agentic Framework for Code Coverage Closure of Digital Hardware Designs

Alma Babbit; Aman Arora; Elias Hilaneh; Nakul Gopalan; Sean Lowe; Vidya Chhabria

arxiv: 2604.15606 · v2 · pith:QJIDUJWInew · submitted 2026-04-17 · 💻 cs.AR

Pith reviewed 2026-05-22 10:14 UTC · model grok-4.3

classification 💻 cs.AR

keywords: hardware verification, code coverage closure, test stimulus generation, agentic workflows, large language models, digital design validation, simulation feedback loop

The pith

An agentic LLM framework generates test stimuli from hardware design specifications and iteratively refines them via simulator feedback to close code coverage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Spec2Cov to automate the manual and slow process of coverage closure in hardware verification. It demonstrates that large language models can act as agents that produce test cases directly from specifications, interact with simulators, handle errors, and use coverage reports to improve results over multiple rounds. The work evaluates this approach across designs of different sizes and shows it reaches complete coverage on simpler cases while making substantial progress on harder ones. A sympathetic reader would care because this could reduce the time and human effort currently required to validate digital hardware before production.

Core claim

Spec2Cov is an agentic framework that automatically and iteratively generates test stimulus directly from design specifications by coordinating interactions between an LLM and a hardware simulator, managing compilation and simulation errors, parsing coverage reports, and feeding results back to the model for refinement without additional fine-tuning.

What carries the argument

The closed-loop agentic workflow that connects the LLM to the simulator for error management, coverage parsing, and iterative stimulus refinement.
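A minimal Python sketch of such a closed loop follows. It is not the authors' implementation: `generate`, `simulate`, `SimResult`, the prompt wording, and the iteration cap are all hypothetical stand-ins for whatever LLM client and simulator wrapper a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class SimResult:
    ok: bool               # True if compilation and simulation both succeeded
    log: str               # error text on failure, coverage report text on success
    coverage: float = 0.0  # percent; only meaningful when ok is True


def close_coverage(
    spec: str,
    generate: Callable[[str], str],        # hypothetical LLM call: prompt -> testcase
    simulate: Callable[[str], SimResult],  # hypothetical simulator wrapper
    target: float = 100.0,
    max_iters: int = 10,
) -> Tuple[Optional[str], float]:
    """Iterate generate -> simulate -> feed back; return best testcase and coverage."""
    best_test, best_cov = None, 0.0
    prompt = f"Specification:\n{spec}\n\nWrite test stimulus for this design."
    for _ in range(max_iters):
        test = generate(prompt)
        result = simulate(test)
        if not result.ok:
            # Compilation or runtime error: ask the model to repair its own test.
            prompt = (
                f"Specification:\n{spec}\n\nPrevious test:\n{test}\n\n"
                f"It failed with:\n{result.log}\n\nReturn a corrected test."
            )
            continue
        if result.coverage > best_cov:
            best_test, best_cov = test, result.coverage
        if best_cov >= target:
            break
        # Successful run below target: feed the coverage report back.
        prompt = (
            f"Specification:\n{spec}\n\nPrevious test:\n{test}\n\n"
            f"Coverage report:\n{result.log}\n\nExtend the test to cover what was missed."
        )
    return best_test, best_cov
```

Passing the LLM and the simulator in as plain callables keeps the loop testable with stubs, which is a design choice of this sketch rather than something the paper states.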

If this is right

Verification teams could shift from manual test writing to overseeing automated coverage closure loops.
Simpler designs reach 100 percent coverage while complex ones reach up to 49 percent across the evaluated set.
Specific framework features improve performance without requiring model retraining.
The same loop structure applies to problems drawn from existing benchmark suites of varying complexity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The approach may extend to mixed-signal or system-level designs if the simulator interface remains stable.
Integration with existing constrained-random tools could further boost coverage on the hardest cases.
Success on larger designs would depend on how well future models handle longer context from detailed coverage reports.
Continuous verification pipelines could adopt the framework to maintain coverage as designs evolve.

Load-bearing premise

The LLM can interpret coverage reports and simulation errors well enough to generate improved test stimulus in later iterations without fine-tuning or human intervention.

What would settle it

A design where multiple iterations produce no measurable increase in coverage metrics despite the model receiving full simulator feedback and error logs each round.
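One way to make that test operational is a simple stall check over the per-iteration coverage history. The sketch below is an illustration only; the function name, the window of three rounds, and the 0.01-point threshold are assumptions, not values from the paper.

```python
def has_stalled(history, window=3, min_gain=0.01):
    """Return True if the last `window` rounds added less than `min_gain`
    coverage points over the best value seen before them.

    `history` is a list of coverage percentages, one per iteration.
    """
    if len(history) <= window:
        return False  # not enough rounds to judge
    baseline = max(history[:-window])
    recent_best = max(history[-window:])
    return recent_best - baseline < min_gain
```

For example, a history of 10, 40, 40, 40, 40 counts as stalled under these defaults, while 10, 20, 30, 40, 50 does not.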

Figures

Figures reproduced from arXiv: 2604.15606 by Alma Babbit, Aman Arora, Elias Hilaneh, Nakul Gopalan, Sean Lowe, Vidya Chhabria.

**Figure 2.** Detailed flow of the Spec2Cov framework. […] successful generation, the testcase is inserted into an autogenerated testbench template (which instantiates the design and includes clock generation logic). The design and testbench are then passed to the simulator, which performs simulation with coverage metrics enabled. Compilation and runtime errors are fed back to the LLM for correction. Upon successful simulat…
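The template step in the Figure 2 caption can be illustrated with a small Python helper that splices a generated testcase into a Verilog testbench skeleton. The template text, the single `clk` port, and the `build_testbench` name are hypothetical; the paper's actual template is not reproduced here.

```python
import textwrap

# Hypothetical testbench skeleton: clock generation plus a design instance.
TESTBENCH_TEMPLATE = """\
module tb;
  reg clk = 0;
  always #5 clk = ~clk;  // clock generation

  // Instantiate the design under test; a real template would wire every port.
  {module_name} dut (.clk(clk));

  initial begin
{testcase}
    $finish;
  end
endmodule
"""


def build_testbench(module_name: str, testcase: str) -> str:
    """Indent the generated stimulus and place it inside the initial block."""
    body = textwrap.indent(textwrap.dedent(testcase).strip(), "    ")
    return TESTBENCH_TEMPLATE.format(module_name=module_name, testcase=body)
```

Keeping the clock and instantiation in a fixed template means the model only has to produce the stimulus statements, which narrows the space of compile errors it can introduce.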

**Figure 3.** Agentic approach achieves significantly higher coverage, whereas the single iteration approach fails to produce any coverage in some cases. GM = Geometric Mean. […] best of our knowledge, there is no prior work that addresses automated testcase generation from specifications for closing code coverage for CVDP designs. Consequently, our results cannot be compared against a prior baseline, and we instead report a…
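For readers unfamiliar with the GM summary, the geometric mean of n coverage values is the n-th root of their product, so a single design at 0% drives the whole mean to 0. The sketch below shows the computation; the optional `floor` parameter is an assumption for illustration, and how the paper treats zero-coverage cases is not stated in the excerpt.

```python
import math


def geometric_mean(values, floor=None):
    """Geometric mean of coverage percentages.

    If `floor` is given, values below it are raised to it first, which is
    one common (assumed, not paper-specified) way to keep zeros from
    collapsing the mean.
    """
    if not values:
        raise ValueError("need at least one value")
    if floor is not None:
        values = [max(v, floor) for v in values]
    if any(v < 0 for v in values):
        raise ValueError("coverage cannot be negative")
    if any(v == 0 for v in values):
        return 0.0
    # Sum of logs avoids overflow from multiplying many values.
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

As a quick check, designs at 100% and 25% give a geometric mean of 50%, whereas their arithmetic mean would be 62.5%.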

**Figure 5.** Average generation time and average total token usage increases as design complexity increases. […] VI. DISCUSSION. Code Coverage Focus: Spec2Cov targets code coverage because it is a standard sign-off gate before broader verification closure, and manual closure remains a major verification bottleneck. Automating this stage provides immediate workflow impact and establishes a foundation for future functional…

Original abstract

Hardware verification is one of the most challenging stages of the hardware design process, requiring significant time and resources to ensure a design is fully validated and production-ready. Verification teams aim to maximize design coverage while ensuring correct behavior and alignment with the specification. Coverage closure, which relies on iterative constrained-random and directed testing, is still largely manual and therefore slow and labor-intensive. Recent advances show that the code generation capabilities of Large Language Models (LLMs) can be integrated with external tools to build agentic workflows that autonomously perform hardware design and verification tasks. In this work, we introduce Spec2Cov, an agentic framework that automatically and iteratively generates test stimulus directly from design specifications to accelerate coverage closure. Spec2Cov coordinates interactions between an LLM and a hardware simulator, managing compilation and simulation errors, parsing coverage reports, and feeding results back to the model for refinement. We present features that improve Spec2Cov's effectiveness without additional fine-tuning and evaluate their impact. Across 26 designs of varying size and complexity, including problems from the CVDP benchmark suite, Spec2Cov demonstrates promising performance, achieving 100% coverage on simpler designs and up to 49% on more complex designs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Spec2Cov shows an LLM agent can drive iterative test generation and coverage improvement on hardware designs, but the results are hard to interpret without baselines against standard verification flows.

The core contribution is a closed-loop agent that takes design specs, has an LLM generate stimulus, runs it through a hardware simulator, handles compile and runtime errors, parses coverage reports, and feeds the outcomes back for the next round. It reaches full coverage on simpler designs and up to 49 percent on more complex ones across 26 examples that include CVDP benchmarks. The loop runs without model fine-tuning and includes some targeted features to make the iterations more effective. That specific coordination of LLM output with simulator feedback and coverage parsing for digital hardware is the part that was not already in the literature they cite. The implementation details and the choice of benchmarks give a concrete picture of how such an agent can be built for this domain. The work is a practical demonstration rather than a theoretical advance, and it lands on a real bottleneck in semiconductor design.

The main gap is the missing baselines. The paper does not show what constrained-random testing or existing EDA flows achieve on the identical designs, simulators, and coverage metrics. Without those numbers it is difficult to know whether the reported coverage reflects the agentic loop or simply the characteristics of the test cases. The abstract also gives limited information on iteration counts, failure cases, and how coverage is distributed, which leaves the empirical claim thinner than it needs to be. The assumption that the LLM can keep improving from error messages and coverage data without extra training appears to hold in their runs, but that is exactly the kind of claim that needs side-by-side data to be convincing.

This paper is aimed at hardware verification researchers and engineers who are already experimenting with LLMs for EDA tasks. A reader looking for a working example of an agentic verification loop would get usable implementation ideas and benchmark results.

It is worth sending to peer review because the problem is timely, the framework is a clear new application, and the evaluation can be strengthened with straightforward additions rather than requiring a complete redesign.

Referee Report

2 major / 2 minor

Summary. The paper introduces Spec2Cov, an agentic framework that uses LLMs to automatically and iteratively generate test stimulus from design specifications for hardware verification. It coordinates with a simulator to handle compilation/simulation errors, parse coverage reports, and refine stimuli in a feedback loop without additional fine-tuning. Evaluation across 26 designs of varying size and complexity, including CVDP benchmarks, reports 100% coverage on simpler designs and up to 49% on more complex designs.

Significance. If the central empirical claims hold after addressing evaluation gaps, the work could have practical significance by automating a labor-intensive aspect of hardware verification. The demonstration of an iterative LLM-simulator loop with features to improve effectiveness without fine-tuning represents a relevant direction for integrating agentic AI into EDA workflows.

major comments (2)

[Evaluation] Evaluation section: The central claim of 'promising performance' and acceleration of coverage closure rests on coverage numbers (100% on simple designs, up to 49% on complex ones) across 26 designs but supplies no baseline comparisons to conventional methods such as constrained-random testing or standard EDA flows on the same designs, simulators, and metrics. This omission is load-bearing, as the observed results could reflect design simplicity or default simulator behavior rather than the contribution of the agentic feedback loop.
[Results] Results and discussion: The manuscript provides no statistical details, failure analysis, or exact coverage distributions for the 26 designs, leaving the 'promising performance' assertion under-supported and difficult to interpret or reproduce.

minor comments (2)

[Abstract] Abstract: The phrase 'up to 49%' should be supplemented with more precise metrics (e.g., mean coverage, variance, or per-design breakdown) to strengthen the empirical summary.
[Framework] Framework description: Additional details on the specific LLM model, prompt templates, and exact parsing logic for coverage reports would improve reproducibility and clarity.
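To show the kind of parsing logic the last comment asks the authors to document, here is a small Python sketch. The report format it reads (lines such as `line: 45/60`) is invented for illustration; real simulators emit their own formats, and the paper's parser is not described in this excerpt.

```python
import re

# Assumed, simplified report line: "<metric>: <hit>/<total>".
REPORT_LINE = re.compile(r"^\s*(\w+)\s*:\s*(\d+)\s*/\s*(\d+)\s*$")


def parse_coverage(report: str) -> dict:
    """Map each metric name (lowercased) to its coverage percentage.

    Lines that do not match the assumed format are ignored, as are
    metrics with a total of zero.
    """
    result = {}
    for line in report.splitlines():
        match = REPORT_LINE.match(line)
        if not match:
            continue
        kind = match.group(1).lower()
        hit, total = int(match.group(2)), int(match.group(3))
        if total == 0:
            continue
        result[kind] = 100.0 * hit / total
    return result
```

Publishing even this much (the expected format and how unmatched lines are treated) would let others reproduce how raw simulator output becomes the numbers fed back to the model.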

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful comments and positive assessment of the potential impact of our work. We address each major comment below and will revise the manuscript to incorporate the suggested improvements where appropriate.

Referee: [Evaluation] Evaluation section: The central claim of 'promising performance' and acceleration of coverage closure rests on coverage numbers (100% on simple designs, up to 49% on complex ones) across 26 designs but supplies no baseline comparisons to conventional methods such as constrained-random testing or standard EDA flows on the same designs, simulators, and metrics. This omission is load-bearing, as the observed results could reflect design simplicity or default simulator behavior rather than the contribution of the agentic feedback loop.

Authors: We concur that baseline comparisons are essential to substantiate the claims regarding the acceleration of coverage closure. In the revised version, we will include results from constrained-random testing on the same set of designs and using the identical simulator setup. For the simpler designs where we achieve 100% coverage, we will demonstrate that random testing alone does not reach full coverage within comparable simulation budgets. For complex designs, we will report the coverage achieved by standard methods to highlight the relative improvement from the agentic approach. We note that implementing full standard EDA flows may be beyond the scope, but these additions will address the core concern.
revision: yes

Referee: [Results] Results and discussion: The manuscript provides no statistical details, failure analysis, or exact coverage distributions for the 26 designs, leaving the 'promising performance' assertion under-supported and difficult to interpret or reproduce.

Authors: We acknowledge this limitation in the current manuscript. We will expand the results section to include exact coverage percentages for each of the 26 designs, along with any statistical measures such as averages over multiple runs if applicable. A new subsection on failure analysis will be added, discussing the designs where coverage fell short of 100%, including factors like design complexity, specification ambiguity, or LLM limitations in generating effective stimuli. This will provide better support for our assertions and aid reproducibility.
revision: yes
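The per-design reporting promised here could be summarized along the following lines. This is a sketch using Python's standard `statistics` module; the `summarize` name and the choice of fields are assumptions, and any numbers passed to it in examples are placeholders rather than results from the paper.

```python
import statistics


def summarize(per_design: dict) -> dict:
    """Summary statistics over a mapping of design name -> coverage percent."""
    values = list(per_design.values())
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
        "fully_covered": sum(1 for v in values if v >= 100.0),
    }
```

Reporting the median and the count of fully covered designs alongside the mean would make it harder for a few 100% results to mask a long tail of low-coverage designs.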

Circularity Check

0 steps flagged

No circularity: empirical framework demonstration with results grounded in external simulator outcomes

The paper introduces Spec2Cov as an agentic LLM-simulator workflow and reports coverage results across 26 designs. No mathematical derivations, equations, fitted parameters, or self-referential predictions appear. Coverage numbers are produced by running the framework against a hardware simulator and parsing its reports; these outcomes are independent of any internal definition or self-citation chain. The evaluation is therefore self-contained against external benchmarks (simulator behavior on the given designs) and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the unproven assumption that current LLMs possess sufficient zero-shot reasoning to generate valid hardware test stimulus and usefully refine it from simulator feedback without fine-tuning.

axioms (1)

domain assumption: Large language models can parse hardware design specifications and generate syntactically and semantically correct test stimulus that a simulator can execute.
This capability is required for the agent to produce initial tests and respond to coverage feedback.

invented entities (1)

Spec2Cov agentic loop (no independent evidence)
purpose: To manage iterative interactions between LLM and hardware simulator for coverage closure.
The framework itself is the primary contribution; its effectiveness is shown only through the reported runs rather than independent validation.

pith-pipeline@v0.9.0 · 5757 in / 1312 out tokens · 45846 ms · 2026-05-22T10:14:52.750833+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "Spec2Cov coordinates interactions between an LLM and a hardware simulator, managing compilation and simulation errors, parsing coverage reports, and feeding results back to the model for refinement."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "achieving 100% coverage on simpler designs and up to 49% on more complex designs"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 1 internal anchor

[1] J. Blocklove, S. Garg, R. Karri, and H. Pearce, “Chip-Chat: Challenges and Opportunities in Conversational Hardware Design,” in 2023 ACM/IEEE 5th Workshop on Machine Learning for CAD (MLCAD), Sep. 2023, pp. 1–6.

[2] R. Ma, Y. Yang, Z. Liu, J. Zhang, M. Li, J. Huang, and G. Luo, “VerilogReader: LLM-Aided Hardware Test Generation,” in 2024 IEEE LLM Aided Design Workshop (LAD), Jun. 2024, pp. 1–5.

[3] R. Qiu, G. L. Zhang, R. Drechsler, U. Schlichtmann, and B. Li, “AutoBench: Automatic Testbench Generation and Evaluation Using LLMs for HDL Design,” in Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD, ser. MLCAD ’24. New York, NY, USA: Association for Computing Machinery, Sep. 2024, pp. 1–10.

[4] Z. Zhang, G. Chadwick, H. McNally, Y. Zhao, and R. Mullins, “LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation,” Oct. 2023.

[5] B. Mali, K. Maddala, V. Gupta, S. Reddy, C. Karfa, and R. Karri, “ChIRAAG: ChatGPT Informed Rapid and Automated Assertion Generation,” in 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). Knoxville, TN, USA: IEEE, Jul. 2024, pp. 680–683.

[6] W. Fang, M. Li, M. Li, Z. Yan, S. Liu, H. Zhang, and Z. Xie, “AssertLLM: Generating Hardware Verification Assertions from Design Specifications via Multi-LLMs,” in 2024 IEEE LLM Aided Design Workshop (LAD), Jun. 2024, pp. 1–1.

[7] “Illm4dv: Using large language models for hardware test stimuli generation.”

[8] M. Hassan, M. Nadeem, K. Qayyum, C. K. Jha, and R. Drechsler, “Prompt. Verify. Repeat. LLMs in the Hardware Verification Cycle,” in 2025 IEEE International Conference on Omni-layer Intelligent Systems (COINS), 2025, pp. 1–6.

[9] J. Ye, Y. Hu, K. Xu, D. Pan, Q. Chen, J. Zhou, S. Zhao, X. Fang, X. Wang, N. Guan, and Z. Jiang, “From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification,” 2025. [Online]. Available: https://arxiv.org/abs/2504.19959

[10] N. Pinckney, C. Deng, C.-T. Ho, Y.-D. Tsai, M. Liu, W. Zhou, B. Khailany, and H. Ren, “Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification,” 2025. [Online]. Available: https://arxiv.org/abs/2506.14074

[11] Anonymous, “Spec2Cov,” 2025, https://anonymous.4open.science/r/spec2cov

[12] S. Thakur, B. Ahmad, H. Pearce, B. Tan, B. Dolan-Gavitt, R. Karri, and S. Garg, “VeriGen: A Large Language Model for Verilog Code Generation,” ACM Transactions on Design Automation of Electronic Systems, p. 3643681, Feb. 2024.

[13] Y. Fu, Y. Zhang, Z. Yu, S. Li, Z. Ye, C. Li, C. Wan, and Y. C. Lin, “GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models,” 2025. [Online]. Available: https://arxiv.org/abs/2309.10730

[14] P. E. Calzada, Z. Ibnat, T. Rahman, K. Kandula, D. Lu, S. K. Saha, F. Farahmandi, and M. Tehranipoor, “VerilogDB: The Largest, Highest-Quality Dataset with a Preprocessing Framework for LLM-based RTL Generation,” 2025. [Online]. Available: https://arxiv.org/abs/2507.13369

[15] K. Xu, J. Sun, Y. Hu, X. Fang, W. Shan, X. Wang, and Z. Jiang, MEIC: Re-thinking RTL Debug Automation using LLMs. New York, NY, USA: Association for Computing Machinery, 2025. [Online]. Available: https://doi-org.ezproxy1.lib.asu.edu/10.1145/3676536.3676801

[16] Y. Hu, J. Ye, K. Xu, J. Sun, S. Zhang, X. Jiao, D. Pan, J. Zhou, N. Wang, W. Shan, X. Fang, X. Wang, N. Guan, and Z. Jiang, “UVLLM: An Automated Universal RTL Verification Framework using LLMs,” 2024. [Online]. Available: https://arxiv.org/abs/2411.16238

[17] W. Kwon, Z. Li, S. Zhuang, Y. Sheng, L. Zheng, C. H. Yu, J. E. Gonzalez, H. Zhang, and I. Stoica, “Efficient Memory Management for Large Language Model Serving with PagedAttention,” in Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023.

[18] J. Strömbergson, “vndecorrelator: Verilog implementation of a von Neumann decorrelator,” https://github.com/secworks/vndecorrelator, 2016.

[19] A. Vashist, “FIFO SystemVerilog Assertion: Synchronous FIFO with SystemVerilog Assertions,” https://github.com/avashist003/FIFO SystemVerilog Assertion, 2020.

[20] J. Strömbergson, “uart: Verilog implementation of a simple UART core,” https://github.com/secworks/uart, 2014.

[21] J. Strömbergson, “sha1: Verilog implementation of the SHA-1 hash function,” https://github.com/secworks/sha1, 2014.

[22] J. Strömbergson, “chacha: Verilog implementation of the ChaCha stream cipher,” https://github.com/secworks/chacha, 2014.

[23] J. Strömbergson, “trng: True Random Number Generator core implemented in Verilog,” https://github.com/secworks/trng, 2014.

[24] M. Czerski, “SD-card-controller: SD/SDHC card controller for Wishbone bus,” https://github.com/mczerski/SD-card-controller, 2013.

[25] S. Mehta, “DSP Slice: Floating Point Units,” https://github.com/samidhm/DSP Slice/tree/main/Floating Point Units, 2020.

[26] V. Patel and UT-LCA, “tpu like design: TPU-like design with pooling unit,” https://github.com/UT-LCA/tpu like design/tree/master/design ws vedant, 2019.
