pith. machine review for the scientific record. sign in

arxiv: 2604.08932 · v1 · submitted 2026-04-10 · 💻 cs.AR

Recognition: unknown

From Indiscriminate to Targeted: Efficient RTL Verification via Functionally Key Signal-Driven LLM Assertion Generation

Authors on Pith no claims yet

Pith reviewed 2026-05-10 17:22 UTC · model grok-4.3

classification 💻 cs.AR
keywords RTL verificationassertion-based verificationLLM assertion generationsemantic graphskey signal identificationtargeted verificationfunctional coveragemutation testing
0
0 comments X

The pith

AgileAssert uses RTL semantic graphs and hybrid scoring to identify critical signals, then slices context so LLMs generate targeted assertions that cut total count by two thirds while holding or raising coverage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to replace indiscriminate assertion generation with a focused approach that first locates the signals whose values most control design behavior and error spread. It builds a semantic graph of the RTL, scores signals by a combination of functional importance and propagation potential, keeps only the top signals, and slices the surrounding code structure to give the LLM just the relevant context. This produces fewer assertions that still verify the key parts of the design. The shift matters because verification dominates IC development time and cost, so any method that delivers equivalent or better checks with substantially less output and input effort can reduce both engineering hours and compute resources.

Core claim

AgileAssert constructs RTL semantic graphs and identifies the top-K critical signals via a hybrid scoring and selection mechanism, followed by structure-aware RTL slicing to provide the LLM with precise targets and contextual information, thereby guiding LLMs to generate tightly constrained targeted assertions for efficient verification.

What carries the argument

RTL semantic graphs with hybrid scoring and selection to pick top-K functionally critical signals, followed by structure-aware slicing that supplies the LLM with focused context for targeted assertion generation.

If this is right

  • On evaluated block- and CPU-level designs the approach reduces the number of assertions by an average of 66.68 percent compared with three existing state-of-the-art methods.
  • Coverage metrics improve while the tokens fed to the LLM drop by 64 percent.
  • In mutation testing the method exceeds prior approaches in error detection rate even though the average number of assertions falls by 72.74 percent.
  • Verification effort moves from blanket coverage to cost-effective targeted coverage that still exercises the design's core functionality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph-plus-scoring step could be reused with other assertion languages or formal engines that accept signal-focused constraints.
  • Lower token consumption opens the method to designs too large for full-context prompting without truncation.
  • Targeted assertions may prove easier to review and maintain because each one is tied to a documented high-impact signal.
  • Combining the sliced context with static timing or power analysis could further prioritize signals that affect both correctness and physical constraints.

Load-bearing premise

The hybrid scoring mechanism on the semantic graphs reliably identifies signals with the greatest impact on functionality and error propagation.

What would settle it

Run the method on a fresh block- or CPU-level design and measure whether the generated assertions achieve at least the same functional coverage and mutation error detection as full indiscriminate generation; if coverage falls or critical bugs are missed while assertion count drops, the claim does not hold.

Figures

Figures reproduced from arXiv: 2604.08932 by Boling Chen, Feng Gu, Hongqin Lyu, Huawei Li, Jianan Mu, Kan Shi, MinYang Bao, Tiancheng Wang, Wenchao Ding, Yonghao Wang, Zhiteng Chao.

Figure 1
Figure 1. Figure 1: Efficient RTL verification through key signal-driven [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The workflow of the AgileAssert framework. Our RTL parser constructs a semantic graph from the RTL code. Signals [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Example of a simplified RTL semantic graph from an [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Example prompt for targeted assertion genera [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Signal-centric RTL slicing and prompt-driven assertion [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: COI coverage vs. input token consumption across [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Ablation Study on COI* Coverage Using Different Signal Ranking Configurations [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗
read the original abstract

Functional verification has become the most time-consuming phase in IC development, and Assertion-Based Verification (ABV) is key to reducing debugging time. However, existing LLM-based assertion generation methods typically pursue indiscriminate verification, aiming for maximal coverage without considering signal criticality, whereas industrial practice demands maximizing coverage with minimal verification cost. Consequently, identifying signals that have the greatest impact on design functionality and error propagation-enabling a shift from indiscriminate to targeted verification-remains a key challenge. To address this, we propose AgileAssert, a key signal-driven assertion generation framework that constructs RTL semantic graphs and identifies the top-K critical signals via a hybrid scoring and selection mechanism, followed by structure-aware RTL slicing to provide the LLM with precise targets and contextual information, thereby guiding LLMs to generate tightly constrained targeted assertions for efficient verification. Evaluated on block- and CPU-level designs, with an average 66.68% reduction in assertions, our approach outperforms three existing SOTA methods, and significantly improving coverage metrics while reducing input token consumption by 64%. In mutation testing, when our approach surpasses existing methods in error detection rate, the average number of assertions used decreases by 72.74%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces AgileAssert, a framework for targeted assertion generation in RTL verification. It builds RTL semantic graphs, applies a hybrid scoring mechanism to select top-K critical signals based on functional impact and error propagation, performs structure-aware slicing to provide focused context, and uses LLMs to generate efficient assertions. On block- and CPU-level designs, it reports an average 66.68% reduction in assertions and 64% fewer input tokens versus three SOTA methods, with improved coverage metrics and, in mutation testing, a 72.74% reduction in assertions while achieving higher error detection rates.

Significance. If the results hold under rigorous validation, the work could meaningfully advance assertion-based verification by reducing the cost of generating and maintaining assertions without sacrificing effectiveness. The graph-driven targeting of signals addresses a practical gap between maximal-coverage LLM methods and industrial constraints on verification overhead.

major comments (2)
  1. §5 (Experimental Evaluation): The abstract and results claim concrete gains (66.68% assertion reduction, 64% token reduction, superior coverage and mutation scores) but supply no description of experimental setup, baseline reproduction details, number of designs/runs, statistical significance tests, or error bars; without these the outperformance claims cannot be assessed.
  2. §3.2 (Hybrid Scoring and Signal Selection): The hybrid scoring mechanism (graph centrality + fan-in/out + semantic features) is load-bearing for the central claim that targeted slicing produces better assertions than indiscriminate generation; however, no ablation studies, sensitivity analysis on K, or correlation of scores with injected faults/mutation outcomes are reported to show the scoring actually isolates high-impact signals rather than simply reducing context size.
minor comments (1)
  1. Abstract: The phrase 'significantly improving coverage metrics' is vague; specific metrics (e.g., line, branch, or functional coverage) and quantitative deltas should be stated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to strengthen the experimental reporting and analysis of the hybrid scoring mechanism.

read point-by-point responses
  1. Referee: §5 (Experimental Evaluation): The abstract and results claim concrete gains (66.68% assertion reduction, 64% token reduction, superior coverage and mutation scores) but supply no description of experimental setup, baseline reproduction details, number of designs/runs, statistical significance tests, or error bars; without these the outperformance claims cannot be assessed.

    Authors: We agree that additional experimental details are needed for full reproducibility and assessment of the claims. In the revised manuscript we will expand §5 to include: a complete list of the block-level and CPU-level designs with their characteristics, explicit reproduction steps and hyperparameters for the three SOTA baselines, the number of independent runs performed, statistical significance tests (e.g., paired t-tests with p-values), and error bars or standard deviations on all reported metrics including assertion counts, token usage, coverage, and mutation scores. revision: yes

  2. Referee: §3.2 (Hybrid Scoring and Signal Selection): The hybrid scoring mechanism (graph centrality + fan-in/out + semantic features) is load-bearing for the central claim that targeted slicing produces better assertions than indiscriminate generation; however, no ablation studies, sensitivity analysis on K, or correlation of scores with injected faults/mutation outcomes are reported to show the scoring actually isolates high-impact signals rather than simply reducing context size.

    Authors: We acknowledge that dedicated ablations would more directly isolate the contribution of the hybrid scoring. While the end-to-end comparisons already show gains in coverage and mutation detection, we will add the requested analyses in the revision: (1) component-wise ablations of the hybrid score, (2) sensitivity curves for different values of K reporting effects on assertion count, token usage, coverage, and error detection, and (3) correlation plots or tables linking per-signal hybrid scores to mutation outcomes. These additions will demonstrate that high-impact signals are preferentially selected rather than the benefit arising solely from smaller context. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external SOTA comparisons, not internal derivations or self-referential fits

full rationale

The provided abstract and context contain no equations, derivations, or first-principles claims that reduce to their own inputs. The framework (RTL semantic graphs, hybrid scoring for top-K signals, structure-aware slicing, LLM assertion generation) is described procedurally, with performance quantified solely via empirical evaluation on block- and CPU-level designs against three external SOTA baselines. Metrics such as 66.68% assertion reduction, 64% token reduction, and mutation-testing gains are presented as measured outcomes rather than predictions forced by fitting or self-definition. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results appears. The derivation chain is therefore self-contained against external benchmarks, yielding a normal non-finding.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Abstract-only review supplies insufficient detail to enumerate free parameters, axioms, or invented entities with precision; the top-K selection and hybrid scoring are implied but not quantified or justified.

free parameters (1)
  • K (number of top critical signals)
    Central parameter in the hybrid scoring and selection mechanism; its value and selection criteria are not numerically specified.
axioms (1)
  • domain assumption RTL semantic graphs accurately capture design functionality and error propagation paths
    Invoked when constructing graphs to identify critical signals.

pith-pipeline@v0.9.0 · 5543 in / 1540 out tokens · 80440 ms · 2026-05-10T17:22:25.761426+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

27 extracted references · 7 canonical work pages · 1 internal anchor

  1. [1]

    https://iwls.org/iwls2005/benchmarks.html

    Iwls 2005 benchmaeks. https://iwls.org/iwls2005/benchmarks.html

  2. [2]

    https://community.cadence.com/ cadence blogs 8/b/breakfast-bytes/posts/jasper-ml

    Jaspergold: the next generation. https://community.cadence.com/ cadence blogs 8/b/breakfast-bytes/posts/jasper-ml

  3. [3]

    https://github.com/liangkangnan/tinyriscv

    tinyriscv. https://github.com/liangkangnan/tinyriscv

  4. [4]

    Focs–automatic generation of simulation checkers from formal specifications

    Yael Abarbanel, Ilan Beer, Leonid Gluhovsky, Sharon Keidar, and Yaron Wolfsthal. Focs–automatic generation of simulation checkers from formal specifications. InComputer Aided Verification: 12th International Conference, CAV 2000, Chicago, IL, USA, July 15-19,

  5. [5]

    Springer, 2000

    Proceedings 12, pages 538–542. Springer, 2000

  6. [6]

    A survey of computer architecture simulation techniques and tools.Ieee Access, 7:78120–78145, 2019

    Ayaz Akram and Lina Sawalha. A survey of computer architecture simulation techniques and tools.Ieee Access, 7:78120–78145, 2019

  7. [7]

    Assertionforge: Enhancing formal verification assertion generation with structured representation of specifications and rtl.arXiv preprint arXiv:2503.19174, 2025

    Yunsheng Bai, Ghaith Bany Hamad, Syed Suhaib, and Haoxing Ren. Assertionforge: Enhancing formal verification assertion generation with structured representation of specifications and rtl.arXiv preprint arXiv:2503.19174, 2025

  8. [8]

    Betweenness centrality in large complex networks

    Marc Barthelemy. Betweenness centrality in large complex networks. The European physical journal B, 38(2):163–168, 2004

  9. [9]

    Improving design verifiability by early rtl coverability analysis

    Kai-Hui Chang, Chia-Wei Chang, Jie-Hong Roland Jiang, and Chien- Nan Jimmy Liu. Improving design verifiability by early rtl coverability analysis. InTenth ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMCODE2012), pages 25–32. IEEE, 2012

  10. [10]

    LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

    Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, and Mao Yang. Longrope: Extend- ing llm context window beyond 2 million tokens.arXiv preprint arXiv:2402.13753, 2024

  11. [11]

    Assertllm: Generating and evaluating hardware verification assertions from design specifications via multi-llms,

    Wenji Fang, Mengming Li, Min Li, Zhiyuan Yan, Shang Liu, Zhiyao Xie, and Hongce Zhang. Assertllm: Generating and evaluating hardware verification assertions from design specifications via multi-llms.arXiv preprint arXiv:2402.00386, 2024

  12. [12]

    Harry Foster. 2024 wilson research group ic/asic functional verification trend report.https://resources.sw.siemens.com/zh-TW/white-paper-2024- wilson-research-group-ic-asic-functional-verification-trend-report/

  13. [13]

    Assertion-based verification: Industry myths to realities (invited tutorial)

    Harry Foster. Assertion-based verification: Industry myths to realities (invited tutorial). InComputer Aided Verification: 20th International Conference, CAV 2008 Princeton, NJ, USA, July 7-14, 2008 Proceedings 20, pages 5–10. Springer, 2008

  14. [14]

    Pagerank beyond the web.siam REVIEW, 57(3):321– 363, 2015

    David F Gleich. Pagerank beyond the web.siam REVIEW, 57(3):321– 363, 2015

  15. [15]

    Realbench: Benchmarking verilog generation models with real-world ip designs.arXiv preprint arXiv:2507.16200,

    Pengwei Jin, Di Huang, Chongxiao Li, Shuyao Cheng, Yang Zhao, Xinyao Zheng, Jiaguo Zhu, Shuyi Xing, Bohan Dou, Rui Zhang, et al. Realbench: Benchmarking verilog generation models with real-world ip designs.arXiv preprint arXiv:2507.16200, 2025

  16. [16]

    (security) assertions by large language models.IEEE Transactions on Information Forensics and Security, 2024

    Rahul Kande, Hammond Pearce, Benjamin Tan, Brendan Dolan-Gavitt, Shailja Thakur, Ramesh Karri, and Jeyavijayan Rajendran. (security) assertions by large language models.IEEE Transactions on Information Forensics and Security, 2024

  17. [17]

    Ai-accelerating coverage closure using intelligent stimulus generation

    Jainender Kumar, Ronak Bhatt, Garima Srivastava, Ashutosh Sinha, and Prashant Teotia. Ai-accelerating coverage closure using intelligent stimulus generation

  18. [18]

    Assertgen: Enhancement of llm-aided assertion generation through cross-layer signal bridging

    Hongqin Lyu, Yonghao Wang, Yunlin Du, Mingyu Shi, Zhiteng Chao, Wenxing Li, Tiancheng Wang, and Huawei Li. Assertgen: Enhancement of llm-aided assertion generation through cross-layer signal bridging. In 31th Asia and South Pacific Design Automation Conference. IEEE, 2026

  19. [19]

    Assertminer: Module-level spec generation and assertion mining using static analysis guided llms

    Hongqin Lyu, Yonghao Wang, Jiaxin Zhou, Zhiteng Chao, Tiancheng Wang, and Huawei Li. Assertminer: Module-level spec generation and assertion mining using static analysis guided llms. In2026 31st Asia and South Pacific Design Automation Conference (ASP-DAC), pages 432– 438, 2026

  20. [20]

    Using of jaccard coefficient for keywords similarity

    Suphakit Niwattanakul, Jatsada Singthongchai, Ekkachai Naenudorn, and Supachanun Wanapu. Using of jaccard coefficient for keywords similarity. InProceedings of the international multiconference of engineers and computer scientists, volume 1, pages 380–384, 2013

  21. [21]

    Enhanced vlsi assertion generation: Con- forming to high-level specifications and reducing llm hallucinations with rag

    Hafiz Abdul Quddus, Md Sanowar Hossain, Ziya Cevahir, Alexander Jesser, and Md Nur Amin. Enhanced vlsi assertion generation: Con- forming to high-level specifications and reducing llm hallucinations with rag. InDVCon Europe 2024; Design and Verification Conference and Exhibition Europe, pages 57–62. VDE, 2024

  22. [22]

    Spotsigs: robust and efficient near duplicate detection in large web collections

    Martin Theobald, Jonathan Siddharth, and Andreas Paepcke. Spotsigs: robust and efficient near duplicate detection in large web collections. InProceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 563–570, 2008

  23. [23]

    Deepassert: An llm-aided verification framework with fine-grained assertion generation for modules with extracted module specifications.arXiv preprint arXiv:2509.14668, 2025

    Yonghao Wang, Jiaxin Zhou, Hongqin Lyu, Zhiteng Chao, Tiancheng Wang, and Huawei Li. Deepassert: An llm-aided verification framework with fine-grained assertion generation for modules with extracted module specifications.arXiv preprint arXiv:2509.14668, 2025

  24. [24]

    Iter- ative llm-based assertion generation using syntax-semantic represen- tations for functional coverage-guided verification.arXiv preprint arXiv:2602.15388, 2026

    Yonghao Wang, Jiaxin Zhou, Yang Yin, Hongqin Lyu, Zhiteng Chao, Wenchao Ding, Jing Ye, Tiancheng Wang, and Huawei Li. Iter- ative llm-based assertion generation using syntax-semantic represen- tations for functional coverage-guided verification.arXiv preprint arXiv:2602.15388, 2026

  25. [25]

    A survey on assertion-based hardware verification.ACM Computing Surveys (CSUR), 54(11s):1–33, 2022

    Hasini Witharana, Yangdi Lyu, Subodha Charles, and Prabhat Mishra. A survey on assertion-based hardware verification.ACM Computing Surveys (CSUR), 54(11s):1–33, 2022

  26. [26]

    Spec2Assertion: Automatic pre- RTL assertion generation using large language models with progressive regularization,

    Fenghua Wu, Evan Pan, Rahul Kande, Michael Quinn, Aakash Tyagi, David Kebo, Jeyavijayan Rajendran, and Jiang Hu. Spec2assertion: Automatic pre-rtl assertion generation using large language models with progressive regularization.arXiv preprint arXiv:2505.07995, 2025

  27. [27]

    Picorv32: A size-optimized risc-v cpu core

    YosysHQ. Picorv32: A size-optimized risc-v cpu core. https://github. com/YosysHQ/picorv32