AssertLLM: Generating and evaluating hardware verification assertions from design specifications via multi-LLMs
8 Pith papers cite this work.
citing papers explorer
- Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench
  Phoenix-bench shows agentic AI systems lose 37-58% resolved rate when moving from SWE-bench Verified to hardware tasks because bugs spread across parallel modules via signal flow, with testbench feedback lifting performance by 42-45% while file-level oracles add only 1.4%.
- UVMarvel: an Automated LLM-aided UVM Machine for Subsystem-level RTL Verification
  UVMarvel automatically constructs subsystem-level UVM testbenches for mainstream bus protocols using LLMs, an IR, and supporting libraries, reaching 95.65% average code coverage in 4.5 hours of automated runtime.
- From Language to Logic: Bridging LLMs & Formal Representations for RTL Assertion Generation
  ProofLoop achieves 93.7% syntax correctness and 82.0% functional correctness for SVA generation from natural language by combining retrieval, EDA tools, and up to three rounds of JasperGold formal feedback.
- FVRuleLearner: Operator-Level Reasoning Tree (OP-Tree)-Based Rules Learning for Formal Verification
  FVRuleLearner introduces an Operator Reasoning Tree to learn operator-specific rules that improve natural-language to SystemVerilog assertion generation, raising syntax correctness by 3.95% and functional correctness by 31.17% over baselines.
- Autoformalizing Memory Specifications with Agents
  An agent system autoformalizes industry DRAM specifications into DRAMPyML for verification tasks such as assertion generation; the DRAMBench dataset is released for benchmarking.
- From Indiscriminate to Targeted: Efficient RTL Verification via Functionally Key Signal-Driven LLM Assertion Generation
  AgileAssert identifies top critical signals via hybrid scoring on RTL graphs and uses structure-aware slicing to let LLMs generate targeted assertions, cutting assertion count by 66.68% and token use by 64% while matching or exceeding prior coverage and error detection.
- CoverAssert: Iterative LLM Assertion Generation Driven by Functional Coverage via Syntax-Semantic Representations
  CoverAssert iteratively improves LLM-generated assertions via syntax-semantic clustering and coverage feedback, yielding 9.57% branch, 9.64% statement, and 15.69% toggle coverage gains on four open-source designs when combined with prior tools.
- From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification
  UVM^2 is an LLM-driven system that generates and refines UVM testbenches for RTL verification, reporting substantial time savings and average code/function coverage of 87.44%/89.58% on designs up to 1.6K lines, outperforming prior methods.