LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

Brucek Khailany; Chia-Tung Ho; Haoxing Ren; Hejia Zhang; Jishen Zhao; Zhongming Yu

arxiv: 2602.16953 · v3 · pith:KFGVWSZVnew · submitted 2026-02-18 · 💻 cs.AI · cs.LG

Hejia Zhang, Zhongming Yu, Chia-Tung Ho, Haoxing Ren, Brucek Khailany, Jishen Zhao

keywords: learning, agentic, verification, data, evaluation, execution, execution-aware, feedback

Abstract

Execution-aware LLM agents offer a promising paradigm for learning from tool feedback, but such feedback can be expensive and slow to obtain, making online reinforcement learning (RL) less practical in certain scenarios. High-coverage hardware verification exemplifies this challenge due to its reliance on industrial simulators and non-differentiable execution signals. We propose LLM4Cov, an offline agent-learning framework that models verification as single-step state transitions guided by deterministic evaluators. Building on this formulation, we introduce execution-validated data curation, policy-aware agentic data synthesis, and worst-state-prioritized sampling to enable scalable learning under execution constraints. We further curate a reality-aligned benchmark adapted from an existing verification suite through a revised evaluation protocol. Using the proposed pipeline, a compact 4B-parameter model achieves a 69.2% pass rate and 90.4% average coverage on CVDP-ECov under agentic evaluation, outperforming its teacher by 5.3% and 10.5%, respectively, and demonstrating competitive performance against models an order of magnitude larger.
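The abstract names worst-state-prioritized sampling without describing its mechanics. As a rough illustration only, the sketch below shows one plausible reading: states with lower evaluator-reported coverage are drawn more often when assembling training data. The `State` record, the `floor` parameter, and the weighting rule (coverage gap plus a small floor) are all assumptions made for this example and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical record of one testbench attempt and its evaluator score."""
    testbench_id: str
    coverage: float  # fraction in [0, 1], as reported by a deterministic evaluator


def worst_state_weights(states, floor=0.1):
    """Give lower-coverage states larger weights (assumed rule, not the paper's)."""
    # The weight is the coverage gap (1 - coverage) plus a small floor, so that
    # well-covered states can still be sampled occasionally.
    return [max(1.0 - s.coverage, 0.0) + floor for s in states]


def sample_states(states, k, seed=0):
    """Draw k states with replacement, biased toward the worst-covered ones."""
    rng = random.Random(seed)
    return rng.choices(states, weights=worst_state_weights(states), k=k)


states = [
    State("tb_a", 0.95),
    State("tb_b", 0.40),
    State("tb_c", 0.10),
]
batch = sample_states(states, k=1000)

# Count how often each state was drawn.
counts = {s.testbench_id: 0 for s in states}
for s in batch:
    counts[s.testbench_id] += 1
print(counts)
```

Under this assumed rule, the lowest-coverage state (`tb_c`) dominates the batch, which concentrates the limited budget of simulator runs on the cases where the current policy performs worst.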

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

HAVEN: Hybrid Automated Verification ENgine for UVM Testbench Synthesis with LLMs
cs.AR · 2026-04 · unverdicted · novelty 7.0

HAVEN combines LLM agents for planning and gap analysis with protocol-specific templates and a custom DSL to generate correct UVM testbenches, achieving 100% compilation success, 90.6% code coverage, and 87.9% functio...