CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization
7 Pith papers cite this work.
citing papers explorer
- HWE-Bench: Benchmarking LLM Agents on Real-World Hardware Bug Repair Tasks
  HWE-Bench is the first repository-level benchmark for LLM agents on real hardware bug repair, where the best agent fixes 70.7% of 417 tasks but drops below 65% on complex SoC projects.
- ChipCraftBrain: Validation-First RTL Generation via Multi-Agent Orchestration
  ChipCraftBrain achieves 97.2% pass rate on VerilogEval and 94.7% on CVDP benchmarks for generating functional RTL code using adaptive multi-agent orchestration and hybrid reasoning.
- RTL-BenchMT: Dynamic Maintenance of RTL Generation Benchmark Through Agent-Assisted Analysis and Revision
  RTL-BenchMT is an agent-assisted framework for dynamically maintaining RTL generation benchmarks by fixing flaws and reducing overfitting in LLM-based EDA applications.
- Dr. RTL: Autonomous Agentic RTL Optimization through Tool-Grounded Self-Improvement
  Dr. RTL's multi-agent framework with group-relative skill learning achieves 21% WNS and 17% TNS timing improvements plus 6% area reduction on 20 real-world RTL designs over commercial synthesis tools.
- VeriRAG: A Retrieval-Augmented Framework for Automated RTL Testability Repair
  VeriRAG is a RAG-based LLM framework that repairs Verilog RTL designs for DFT compliance using a curated dataset VeriDFT and achieves a 7.72-fold higher successful repair rate than zero-shot prompting.
- ChipSeek: Optimizing Verilog Generation via EDA-Integrated Reinforcement Learning
  ChipSeek is a hierarchical-reward reinforcement learning framework with Curriculum-Guided Dynamic Policy Optimization that integrates EDA simulator feedback to improve LLM-generated RTL code on both functional correctness and PPA metrics.
- Mitigating hallucinations and omissions in LLMs for invertible problems: An application to hardware logic design automation
  Using LLMs to encode logic condition tables into HDL code and decode back to tables mitigates hallucinations in hardware design automation.