
arxiv: 2604.23602 · v1 · submitted 2026-04-26 · 💻 cs.AR · cs.LG

TimingLLM: A Two-Stage Retrieval-Augmented Framework for Pre-Synthesis Timing Prediction from Verilog

Pith reviewed 2026-05-08 05:15 UTC · model grok-4.3

keywords: Verilog timing prediction, retrieval-augmented LLM, pre-synthesis estimation, worst negative slack, total negative slack, RTL design, hardware synthesis, EDA acceleration

The pith

A retrieval-augmented LLM predicts post-synthesis timing slacks directly from Verilog modules.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a two-stage method to estimate worst negative slack (WNS) and total negative slack (TNS) for hardware designs written in Verilog without executing synthesis tools. The first stage uses a fine-tuned LLM to generate path-level arrival and required times, which are reduced to compact structural features such as gate counts and critical-path depth. The second stage retrieves similar labeled examples and applies a learned adjustment vector inside an LLM regressor to produce the final slack predictions. If the predictions stay accurate across designs, technology libraries, and PVT corners, this removes a major bottleneck in early RTL exploration, because engineers can check the timing impact of a change without waiting for a synthesis run.
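For readers less familiar with the two target quantities, the sketch below shows how WNS and TNS are conventionally derived from path-level arrival and required times for setup checks. It assumes slack is defined as required time minus arrival time. The function name `slack_summary` and the sample numbers are illustrative and do not come from the paper. Some tools also clamp WNS at zero when no path fails, which this sketch does not do.

```python
def slack_summary(paths):
    """Compute (WNS, TNS) from a list of (arrival_ns, required_ns) setup paths.

    Assumption: slack = required - arrival, so a negative slack is a violation.
    """
    slacks = [required - arrival for arrival, required in paths]
    wns = min(slacks)                       # worst (most negative) slack
    tns = sum(s for s in slacks if s < 0)   # sum over violating paths only
    return wns, tns


# Hypothetical endpoints: one path meets timing, two violate it.
paths = [(1.5, 2.0), (2.75, 2.0), (2.25, 2.0)]
wns, tns = slack_summary(paths)
print(wns, tns)  # -0.75 -1.0
```

TimingLLM aims to predict these two numbers directly from Verilog, so the per-path list above is exactly what is unavailable before synthesis.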

Core claim

TimingLLM is a two-stage retrieval-augmented LLM pipeline that estimates worst negative slack and total negative slack directly from Verilog. Stage 1 fine-tunes an LLM to act as a compact post-synthesis timing oracle that outputs path-level arrival and required times, which are summarized into lightweight structural-timing cues. Stage 2 employs an LLM-based regressor that predicts the slacks and applies a learned diagonal steering vector, computed from the k nearest timing-labeled modules in a disjoint retrieval bank, at the last transformer block.
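The retrieval-and-steering step of Stage 2 can be pictured with a small sketch. The summary does not give the exact steering formula, so everything here is an assumption made for illustration: cosine similarity over cue vectors for retrieval, and an additive update that moves the hidden state toward the mean hidden state of the retrieved neighbors, scaled per dimension by a diagonal vector `diag`. The names `k_nearest`, `steer`, `cues`, and `hidden` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def k_nearest(query_cues, bank, k):
    """Return the k bank entries whose cue vectors are most similar to the query."""
    ranked = sorted(bank, key=lambda e: cosine(query_cues, e["cues"]), reverse=True)
    return ranked[:k]


def steer(hidden, neighbors, diag):
    """Assumed update: h' = h + diag * (mean(neighbor hidden states) - h)."""
    k = len(neighbors)
    mean = [sum(n["hidden"][i] for n in neighbors) / k for i in range(len(hidden))]
    return [h + d * (m - h) for h, d, m in zip(hidden, diag, mean)]


# Hypothetical retrieval bank of timing-labeled modules (disjoint from the query).
bank = [
    {"cues": [1.0, 0.0], "hidden": [2.0, 4.0]},
    {"cues": [0.9, 0.1], "hidden": [4.0, 0.0]},
    {"cues": [0.0, 1.0], "hidden": [-8.0, -8.0]},
]
neighbors = k_nearest([1.0, 0.05], bank, k=2)
steered = steer([1.0, 1.0], neighbors, diag=[0.5, 0.0])
print(steered)  # [2.0, 1.0]
```

A diagonal vector keeps the number of learned steering parameters equal to the hidden width, which is far cheaper than a full matrix. A zero entry in `diag` leaves that dimension untouched.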

What carries the argument

The two-stage retrieval-augmented pipeline in which the first LLM extracts lightweight structural-timing cues and the second LLM regressor steers its output with a vector derived from nearest-neighbor retrieval.

If this is right

  • On VerilogEval the method attains R_WNS of 0.91 with 12 percent MAPE and R_TNS of 0.97 with 16 percent MAPE.
  • Runtime is 1.3 to 1.6 times faster than prior methods.
  • After initial training the model adapts to new technology libraries and PVT corners by refitting only the small regression head on 1000 labeled modules per setting while still outperforming baselines.
  • A new 60k-module Verilog corpus with synthesis reports is created and will be released to support further work.
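The two figures of merit in the first bullet can be computed as sketched below. The sketch assumes that R denotes the Pearson correlation coefficient between predicted and reference slacks, which the summary does not state explicitly. The sample values are invented for illustration. MAPE is undefined when a reference slack is exactly zero, so such modules would need separate handling.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between reference and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def mape(y_true, y_pred):
    """Mean absolute percentage error; requires every reference value to be nonzero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical WNS values in nanoseconds.
y_true = [-1.0, -2.0, -4.0]
y_pred = [-1.0, -2.5, -4.0]
r = pearson_r(y_true, y_pred)
err = mape(y_true, y_pred)
print(round(err, 2))  # 8.33
```

A high R together with a double-digit MAPE, as reported here, is not contradictory: correlation rewards getting the ranking and trend right, while MAPE penalizes relative error on each module, including modules with small slacks.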

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Designers could explore substantially more RTL variants within a fixed time budget because each timing query no longer requires a full synthesis run.
  • The same retrieval-plus-steering pattern may apply to other pre-synthesis predictions such as power or area if suitable cue extractors and labeled banks are built.
  • Success depends on maintaining a sufficiently diverse and up-to-date retrieval bank, which implies that organizations will need shared or curated libraries of synthesized modules.

Load-bearing premise

Lightweight structural cues extracted by the first-stage LLM together with retrieval from a bank of labeled modules contain enough information to predict timing slacks accurately for previously unseen Verilog modules and different technology nodes.

What would settle it

Evaluating the full pipeline on a set of Verilog modules synthesized under a new technology library and PVT corner without refitting the regression head and observing whether the correlation coefficients for WNS and TNS drop below 0.7.

Figures

Figures reproduced from arXiv: 2604.23602 by Armin Abdollahi, Massoud Pedram, Mehdi Kamal, Negin Ashrafi.

Figure 1: Dataset curation and stratified selection pipeline.
Figure 2: Two-stage TimingLLM: fingerprint generation and retrieval-steered timing prediction.
Figure 3: Top-2 setup paths: STA vs. Stage-1 LLM agreement.
Figure 4: Qualitative retrieval for feature-space steering.
Figure 5: Details of the employed dataset. The distribution of synthesized gate counts across the 60k-module corpus is concentrated in small-scale designs: 33% of modules contain between 1 and 10 gates and 31% span 10–50 gates, together accounting for roughly two-thirds of the dataset. Modules with moderate complexity (50–100 gates) comprise 14%, while designs with 100–200 gates and 200+ gates each repre…
Abstract

Early, tool-free prediction of post-synthesis timing remains a key obstacle to rapid RTL iteration. We introduce TimingLLM, a two-stage retrieval-augmented LLM pipeline that estimates worst negative slack (WNS) and total negative slack (TNS) directly from Verilog. Stage 1 is a fine-tuned LLM that acts as a compact post-synthesis timing oracle, producing path-level arrival/required times that are summarized into lightweight structural-timing cues (e.g., bag-of-gates counts, critical-path depth, gate-type patterns). Stage 2 is an LLM-based regressor that predicts WNS/TNS and applies a learned diagonal steering vector at the last transformer block, computed from the k nearest timing-labeled modules in a disjoint retrieval bank. On VerilogEval, TimingLLM attains R_WNS = 0.91 (MAPE 12%) and R_TNS = 0.97 (MAPE 16%) while running 1.3-1.6 times faster than prior methods. Training uses a new 60k-module Verilog corpus with synthesis reports, which we will release. After training once, TimingLLM can be adapted to new technology libraries and PVT corners by refitting only a small regression head on 1000 labeled modules per setting, consistently outperforming state-of-the-art baselines.
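The adaptation claim in the last sentence amounts to keeping the backbone frozen and fitting only a small output head on new labels. The sketch below illustrates that idea with a linear head fitted by ridge least squares on fixed feature vectors. The abstract does not specify the head's architecture or training procedure, so the linear form, the ridge term, and the names `fit_head` and `predict` are assumptions. The toy data stands in for the 1000 labeled modules per setting.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_head(features, targets, ridge=1e-9):
    """Fit weights plus bias on frozen features via the ridge normal equations."""
    x = [f + [1.0] for f in features]  # append a constant column for the bias
    d = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * t for row, t in zip(x, targets)) for i in range(d)]
    return solve(xtx, xty)


def predict(head, feature):
    """Apply the fitted head to one frozen feature vector."""
    return sum(w * v for w, v in zip(head, feature + [1.0]))


# Toy stand-in for frozen backbone features and WNS labels under a new library.
features = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]]
targets = [-0.3, -0.6, -0.8, -1.3]  # generated as -0.5*f0 - 0.2*f1 - 0.1
head = fit_head(features, targets)
estimate = predict(head, [2.0, 0.0])  # close to -1.1 for this toy data
```

Refitting a head this small is cheap, but it can only rescale and recombine what the frozen features already encode. If a new library changes which paths are critical, a head-only refit may not recover that.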

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces TimingLLM, a two-stage retrieval-augmented LLM framework for pre-synthesis prediction of worst negative slack (WNS) and total negative slack (TNS) directly from Verilog. Stage 1 fine-tunes an LLM to act as a timing oracle that extracts and summarizes lightweight structural cues (bag-of-gates counts, critical-path depth, gate-type patterns). Stage 2 performs k-NN retrieval over a disjoint labeled bank and uses an LLM regressor with a learned diagonal steering vector to output WNS/TNS. On VerilogEval the method reports R_WNS=0.91 (MAPE 12%) and R_TNS=0.97 (MAPE 16%), runs 1.3-1.6x faster than prior work, and claims that after one-time training it can be adapted to new technology libraries/PVT corners by refitting only the regression head on 1000 labeled modules. A 60k-module Verilog corpus with synthesis reports will be released.

Significance. If the reported correlations hold under proper leakage controls and the cross-node adaptation claim is validated, the work could meaningfully accelerate RTL iteration by providing fast, tool-free timing estimates. The two-stage retrieval-augmented design and the planned dataset release are concrete strengths that would support follow-on research in ML-assisted EDA.

major comments (2)
  1. Abstract: the central performance numbers (R_WNS=0.91, MAPE 12%; R_TNS=0.97, MAPE 16%) and the adaptability claim are presented without any information on train-test split ratios, retrieval-bank construction details, baseline re-implementations, or explicit checks for data leakage between the 60k training corpus and VerilogEval. These omissions directly affect the soundness of the quantitative support for the main claims.
  2. Adaptation section (presumably §5 or §6): the assertion that refitting only the small regression head on 1000 new modules suffices for new technology libraries and PVT corners rests on the untested premise that Stage-1 structural cues remain sufficiently informative once cell delays, drive strengths, and wire RC change. No cross-node or cross-corner hold-out experiments are described that would confirm the cues transfer without retraining the LLM or updating the retrieval bank.
minor comments (1)
  1. Abstract: the speed-up factor (1.3-1.6x) is stated without naming the prior methods or the measurement conditions (e.g., hardware, synthesis tool version), reducing clarity of the efficiency claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps clarify the presentation of our evaluation protocol and the scope of our adaptation claims. We address each major comment below and indicate planned revisions to the manuscript.

Point-by-point responses
  1. Referee: Abstract: the central performance numbers (R_WNS=0.91, MAPE 12%; R_TNS=0.97, MAPE 16%) and the adaptability claim are presented without any information on train-test split ratios, retrieval-bank construction details, baseline re-implementations, or explicit checks for data leakage between the 60k training corpus and VerilogEval. These omissions directly affect the soundness of the quantitative support for the main claims.

    Authors: We agree that the abstract omits these details due to space constraints. The full experimental protocol—including an 80/20 train-test split on the 60k-module corpus, construction of a disjoint retrieval bank from the training portion only, baseline re-implementations using identical splits, and explicit leakage-prevention steps (no overlap between retrieval bank and test modules)—is described in Sections 4 and 5. We will revise the abstract to include a concise statement summarizing the split ratios, disjoint bank construction, and leakage controls. revision: yes

  2. Referee: Adaptation section (presumably §5 or §6): the assertion that refitting only the small regression head on 1000 new modules suffices for new technology libraries and PVT corners rests on the untested premise that Stage-1 structural cues remain sufficiently informative once cell delays, drive strengths, and wire RC change. No cross-node or cross-corner hold-out experiments are described that would confirm the cues transfer without retraining the LLM or updating the retrieval bank.

    Authors: The Stage-1 cues (bag-of-gates counts, critical-path depth, gate-type patterns) are deliberately chosen as RTL-level structural descriptors that do not depend on absolute cell delays or wire RC. The retrieval bank and regression head are the only components refit per technology/PVT. We acknowledge that the current manuscript does not report explicit cross-node or cross-corner hold-out experiments validating cue transfer without LLM retraining. We will revise the adaptation section to explicitly state this design assumption, add a limitations paragraph, and include preliminary adaptation results on a second technology node if additional synthesis data can be obtained in time for the revision. revision: partial
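The leakage controls described in the first response (an 80/20 split, a retrieval bank built from the training portion only, and no overlap with test modules) can be checked mechanically. The sketch below is one simple way to do so and is not the authors' procedure: it fingerprints each module by hashing its whitespace-normalized source, deduplicates before splitting, and reports any test module that also appears in the bank. The function names and the toy corpus are hypothetical.

```python
import hashlib
import random


def fingerprint(verilog_src):
    """Hash whitespace-normalized source so trivially reformatted copies collide."""
    normalized = " ".join(verilog_src.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def split_corpus(modules, train_frac=0.8, seed=0):
    """Deduplicate by fingerprint, then shuffle and split into (train, test)."""
    unique = list({fingerprint(m): m for m in modules}.values())
    random.Random(seed).shuffle(unique)
    cut = int(len(unique) * train_frac)
    return unique[:cut], unique[cut:]


def leaked(bank, test):
    """Return the test modules whose fingerprint also appears in the retrieval bank."""
    bank_ids = {fingerprint(m) for m in bank}
    return [m for m in test if fingerprint(m) in bank_ids]


# Toy corpus: ten distinct modules plus a reformatted duplicate of the first one.
modules = [f"module m{i}(input a, output y); assign y = a; endmodule" for i in range(10)]
modules.append("module m0(input a,   output y);\n  assign y = a;\nendmodule")

train, test = split_corpus(modules)
bank = train  # retrieval bank drawn from the training portion only
print(len(train), len(test), len(leaked(bank, test)))  # 8 2 0
```

Exact-hash matching only catches verbatim or reformatted copies. Modules that differ in comments or identifier names but share the same logic would pass this check, so a check between the 60k corpus and VerilogEval would likely need a structural or near-duplicate comparison as well.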

Circularity Check

0 steps flagged

No circularity: predictions evaluated on held-out modules with disjoint retrieval bank


The paper's pipeline extracts structural cues via a fine-tuned LLM (Stage 1) and uses k-NN retrieval from a disjoint labeled bank plus a regression head (Stage 2) to predict WNS/TNS. Reported metrics (R_WNS=0.91, R_TNS=0.97) are on held-out VerilogEval modules; adaptation to new libraries/PVT refits only the head on 1000 new modules. No derivation step reduces by construction to reuse of fitted quantities on the test set, no self-definitional loops, and no load-bearing self-citations. The chain is empirically grounded rather than tautological.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The framework depends on a large number of data-fitted parameters inside the LLMs and regression head plus domain assumptions about what structural summaries preserve timing information; no new physical entities are introduced.

free parameters (3)
  • stage-1 LLM fine-tuning weights
    The first LLM is fine-tuned on the 60k timing-labeled modules to produce path-level arrival and required times.
  • stage-2 regression head weights
    The final regressor is trained to map summarized cues plus retrieval features to WNS and TNS.
  • diagonal steering vector
    Learned from the k nearest modules in the disjoint retrieval bank and applied at the last transformer block.
axioms (2)
  • domain assumption: Path-level timing estimates can be summarized into bag-of-gates counts, critical-path depth, and gate-type patterns without losing the information needed to predict overall slack.
    This premise justifies the compression step between stage 1 and stage 2.
  • domain assumption: Modules retrieved by structural similarity from a pre-labeled bank supply useful guidance for timing prediction on new designs.
    This underpins the retrieval-augmented component of stage 2.
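To make the first axiom concrete, the sketch below shows what two of the named cues look like when computed from an explicit gate-level view: a bag-of-gates count and the critical-path depth measured in gate levels. In TimingLLM these cues are derived from the Stage-1 LLM's output rather than from a real netlist, so this only illustrates the quantities themselves. The netlist encoding and the name `structural_cues` are assumptions, and the sketch requires an acyclic combinational netlist.

```python
from collections import Counter
from functools import lru_cache


def structural_cues(gates):
    """Return (bag_of_gates, critical_depth) for an acyclic netlist.

    gates maps a net name to (gate_type, [fan-in net names]).
    Names that are not keys of gates are treated as primary inputs.
    """
    bag = Counter(gate_type for gate_type, _ in gates.values())

    @lru_cache(maxsize=None)
    def depth(name):
        if name not in gates:
            return 0  # primary input
        return 1 + max((depth(f) for f in gates[name][1]), default=0)

    critical_depth = max((depth(g) for g in gates), default=0)
    return dict(bag), critical_depth


# Hypothetical four-gate netlist with primary inputs a, b, c.
gates = {
    "n1": ("AND2", ["a", "b"]),
    "n2": ("INV", ["n1"]),
    "n3": ("AND2", ["n2", "c"]),
    "n4": ("INV", ["a"]),
}
bag, critical_depth = structural_cues(gates)
print(bag, critical_depth)  # {'AND2': 2, 'INV': 2} 3
```

Two netlists with the same bag and depth can still differ in fan-out, drive strength, and wire load, which is why the axiom is an assumption rather than a guarantee.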


