TimingLLM: A Two-Stage Retrieval-Augmented Framework for Pre-Synthesis Timing Prediction from Verilog
Pith reviewed 2026-05-08 05:15 UTC · model grok-4.3
The pith
A retrieval-augmented LLM predicts post-synthesis timing slacks directly from Verilog modules.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TimingLLM is a two-stage retrieval-augmented LLM pipeline that estimates worst negative slack and total negative slack directly from Verilog. Stage 1 fine-tunes an LLM to act as a compact post-synthesis timing oracle that outputs path-level arrival and required times, which are summarized into lightweight structural-timing cues. Stage 2 employs an LLM-based regressor that predicts the slacks and applies a learned diagonal steering vector, computed from the k nearest timing-labeled modules in a disjoint retrieval bank, at the last transformer block.
What carries the argument
The two-stage retrieval-augmented pipeline in which the first LLM extracts lightweight structural-timing cues and the second LLM regressor steers its output with a vector derived from nearest-neighbor retrieval.
If this is right
- On VerilogEval the method attains R_WNS of 0.91 with 12 percent MAPE and R_TNS of 0.97 with 16 percent MAPE.
- Runtime is 1.3 to 1.6 times faster than prior methods.
- After initial training the model adapts to new technology libraries and PVT corners by refitting only the small regression head on 1000 labeled modules per setting while still outperforming baselines.
- A new 60k-module Verilog corpus with synthesis reports is created and will be released to support further work.
Where Pith is reading between the lines
- Designers could explore substantially more RTL variants within a fixed time budget because each timing query no longer requires a full synthesis run.
- The same retrieval-plus-steering pattern may apply to other pre-synthesis predictions such as power or area if suitable cue extractors and labeled banks are built.
- Success depends on maintaining a sufficiently diverse and up-to-date retrieval bank, which implies that organizations will need shared or curated libraries of synthesized modules.
Load-bearing premise
Lightweight structural cues extracted by the first-stage LLM together with retrieval from a bank of labeled modules contain enough information to predict timing slacks accurately for previously unseen Verilog modules and different technology nodes.
What would settle it
Evaluating the full pipeline on a set of Verilog modules synthesized under a new technology library and PVT corner without refitting the regression head and observing whether the correlation coefficients for WNS and TNS drop below 0.7.
Figures
read the original abstract
Early, tool-free prediction of post-synthesis timing remains a key obstacle to rapid RTL iteration. We introduce TimingLLM, a two-stage retrieval-augmented LLM pipeline that estimates worst negative slack (WNS) and total negative slack (TNS) directly from Verilog. Stage 1 is a fine-tuned LLM that acts as a compact post-synthesis timing oracle, producing path-level arrivals/required times that are summarized into lightweight structural-timing cues (e.g., bag-of-gates counts, critical-path depth, gate-type patterns). Stage 2 is an LLM-based regressor that predicts WNS/TNS and applies a learned diagonal steering vector at the last transformer block, computed from the k nearest timing-labeled modules in a disjoint retrieval bank. On VerilogEval, TimingLLM attains R_WNS = 0.91 (MAPE 12%) and R_TNS=0.97 (MAPE 16%) while running 1.3-1.6 times faster than prior methods. Training uses a new 60k-module Verilog corpus with synthesis reports, which we will release. After training once, TimingLLM can be adapted to new technology libraries and PVT corners by refitting only a small regression head on 1000 labeled modules per setting, consistently outperforming state-of-the-art baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TimingLLM, a two-stage retrieval-augmented LLM framework for pre-synthesis prediction of worst negative slack (WNS) and total negative slack (TNS) directly from Verilog. Stage 1 fine-tunes an LLM to act as a timing oracle that extracts and summarizes lightweight structural cues (bag-of-gates counts, critical-path depth, gate-type patterns). Stage 2 performs k-NN retrieval over a disjoint labeled bank and uses an LLM regressor with a learned diagonal steering vector to output WNS/TNS. On VerilogEval the method reports R_WNS=0.91 (MAPE 12%) and R_TNS=0.97 (MAPE 16%), runs 1.3-1.6x faster than prior work, and claims that after one-time training it can be adapted to new technology libraries/PVT corners by refitting only the regression head on 1000 labeled modules. A 60k-module Verilog corpus with synthesis reports will be released.
Significance. If the reported correlations hold under proper leakage controls and the cross-node adaptation claim is validated, the work could meaningfully accelerate RTL iteration by providing fast, tool-free timing estimates. The two-stage retrieval-augmented design and the planned dataset release are concrete strengths that would support follow-on research in ML-assisted EDA.
major comments (2)
- Abstract: the central performance numbers (R_WNS=0.91, MAPE 12%; R_TNS=0.97, MAPE 16%) and the adaptability claim are presented without any information on train-test split ratios, retrieval-bank construction details, baseline re-implementations, or explicit checks for data leakage between the 60k training corpus and VerilogEval. These omissions directly affect the soundness of the quantitative support for the main claims.
- Adaptation section (presumably §5 or §6): the assertion that refitting only the small regression head on 1000 new modules suffices for new technology libraries and PVT corners rests on the untested premise that Stage-1 structural cues remain sufficiently informative once cell delays, drive strengths, and wire RC change. No cross-node or cross-corner hold-out experiments are described that would confirm the cues transfer without retraining the LLM or updating the retrieval bank.
minor comments (1)
- Abstract: the speed-up factor (1.3-1.6x) is stated without naming the prior methods or the measurement conditions (e.g., hardware, synthesis tool version), reducing clarity of the efficiency claim.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which helps clarify the presentation of our evaluation protocol and the scope of our adaptation claims. We address each major comment below and indicate planned revisions to the manuscript.
read point-by-point responses
-
Referee: Abstract: the central performance numbers (R_WNS=0.91, MAPE 12%; R_TNS=0.97, MAPE 16%) and the adaptability claim are presented without any information on train-test split ratios, retrieval-bank construction details, baseline re-implementations, or explicit checks for data leakage between the 60k training corpus and VerilogEval. These omissions directly affect the soundness of the quantitative support for the main claims.
Authors: We agree that the abstract omits these details due to space constraints. The full experimental protocol—including an 80/20 train-test split on the 60k-module corpus, construction of a disjoint retrieval bank from the training portion only, baseline re-implementations using identical splits, and explicit leakage-prevention steps (no overlap between retrieval bank and test modules)—is described in Sections 4 and 5. We will revise the abstract to include a concise statement summarizing the split ratios, disjoint bank construction, and leakage controls. revision: yes
-
Referee: Adaptation section (presumably §5 or §6): the assertion that refitting only the small regression head on 1000 new modules suffices for new technology libraries and PVT corners rests on the untested premise that Stage-1 structural cues remain sufficiently informative once cell delays, drive strengths, and wire RC change. No cross-node or cross-corner hold-out experiments are described that would confirm the cues transfer without retraining the LLM or updating the retrieval bank.
Authors: The Stage-1 cues (bag-of-gates counts, critical-path depth, gate-type patterns) are deliberately chosen as RTL-level structural descriptors that do not depend on absolute cell delays or wire RC. The retrieval bank and regression head are the only components refit per technology/PVT. We acknowledge that the current manuscript does not report explicit cross-node or cross-corner hold-out experiments validating cue transfer without LLM retraining. We will revise the adaptation section to explicitly state this design assumption, add a limitations paragraph, and include preliminary adaptation results on a second technology node if additional synthesis data can be obtained in time for the revision. revision: partial
Circularity Check
No circularity: predictions evaluated on held-out modules with disjoint retrieval bank
full rationale
The paper's pipeline extracts structural cues via a fine-tuned LLM (Stage 1) and uses k-NN retrieval from a disjoint labeled bank plus a regression head (Stage 2) to predict WNS/TNS. Reported metrics (R_WNS=0.91, R_TNS=0.97) are on held-out VerilogEval modules; adaptation to new libraries/PVT refits only the head on 1000 new modules. No derivation step reduces by construction to reuse of fitted quantities on the test set, no self-definitional loops, and no load-bearing self-citations. The chain is empirically grounded rather than tautological.
Axiom & Free-Parameter Ledger
free parameters (3)
- stage-1 LLM fine-tuning weights
- stage-2 regression head weights
- diagonal steering vector
axioms (2)
- domain assumption Path-level timing estimates can be losslessly summarized into bag-of-gates counts, critical-path depth, and gate-type patterns while retaining predictive value for overall slack.
- domain assumption Modules retrieved by structural similarity from a pre-labeled bank supply useful guidance for timing prediction on new designs.
Reference graph
Works this paper leans on
-
[1]
Deeprtl: Bridging verilog understanding and generation with a unified representation model
Yi Liu, XU Changran, Yunhao Zhou, Zeju Li, and Qiang Xu. Deeprtl: Bridging verilog understanding and generation with a unified representation model. In The Thirteenth International Conference on Learning Representations
-
[2]
Betterv: Controlled verilog generation with discriminative guidance
PEI Zehua, Huiling Zhen, Mingxuan Yuan, Yu Huang, and Bei Yu. Betterv: Controlled verilog generation with discriminative guidance. InForty-first Inter- national Conference on Machine Learning, 2024
2024
-
[3]
Yiyao Yang, Fu Teng, Pengju Liu, Mengnan Qi, Chenyang Lv, Ji Li, Xuhong Zhang, and Zhezhi He. Haven: Hallucination-mitigated llm for verilog code generation aligned with hdl engineers.arXiv preprint arXiv:2501.04908, 2025
-
[4]
Hivegen–hierarchical llm-based verilog generation for scalable chip design
Jinwei Tang, Jiayin Qin, Kiran Thorat, Chen Zhu-Tian, Yu Cao, Caiwen Ding, et al. Hivegen–hierarchical llm-based verilog generation for scalable chip design. arXiv preprint arXiv:2412.05393, 2024
-
[5]
Lintllm: An open-source verilog linting framework based on large language models, 2025
Zhigang Fang, Renzhi Chen, Zhijie Yang, Yang Guo, Huadong Dai, and Lei Wang. Lintllm: An open-source verilog linting framework based on large language models.arXiv preprint arXiv:2502.10815, 2025
-
[6]
Verigen: A large language model for verilog code generation.ACM Transactions on Design Automation of Electronic Systems, 29(3):1–31, 2024
Shailja Thakur, Baleegh Ahmad, Hammond Pearce, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh Karri, and Siddharth Garg. Verigen: A large language model for verilog code generation.ACM Transactions on Design Automation of Electronic Systems, 29(3):1–31, 2024
2024
-
[7]
Llm-aided efficient hardware design automation.arXiv preprint arXiv:2410.18582, 2024
Kangwei Xu, Ruidi Qiu, Zhuorui Zhao, Grace Li Zhang, Ulf Schlichtmann, and Bing Li. Llm-aided efficient hardware design automation.arXiv preprint arXiv:2410.18582, 2024
-
[8]
Mage: A multi-agent engine for automated rtl code generation.arXiv preprint arXiv:2412.07822, 2024
Yujie Zhao, Hejia Zhang, Hanxian Huang, Zhongming Yu, and Jishen Zhao. Mage: A multi-agent engine for automated rtl code generation.arXiv preprint arXiv:2412.07822, 2024
-
[9]
Codev: Empowering llms with hdl generation through multi-level summarization.IEEE Transactions on Computer- Aided Design of Integrated Circuits and Systems, 2025
Yang Zhao, Di Huang, Chongxiao Li, Pengwei Jin, Muxin Song, Yinan Xu, Ziyuan Nan, Mingju Gao, Tianyun Ma, Lei Qi, et al. Codev: Empowering llms with hdl generation through multi-level summarization.IEEE Transactions on Computer- Aided Design of Integrated Circuits and Systems, 2025
2025
-
[10]
ChipNeMo: Domain- adapted llms for chip design,
Mingjie Liu, Teodor-Dumitru Ene, Robert Kirby, Chris Cheng, Nathaniel Pinck- ney, Rongjian Liang, Jonah Alben, Himyanshu Anand, Sanmitra Banerjee, Ismet Bayraktaroglu, et al. Chipnemo: Domain-adapted llms for chip design.arXiv preprint arXiv:2311.00176, 2023
-
[11]
A multi-expert large language model architecture for verilog code generation
Bardia Nadimi and Hao Zheng. A multi-expert large language model architecture for verilog code generation. In2024 IEEE LLM Aided Design Workshop (LAD), pages 1–5. IEEE, 2024
2024
- [12]
-
[13]
Armin Abdollahi, Saeid Shokoufa, Negin Ashrafi, Mehdi Kamal, and Massoud Pedram. Hdlforge: A two-stage multi-agent framework for efficient verilog code generation with adaptive model escalation.arXiv preprint arXiv:2603.04646, 2026
-
[14]
Optimizing urban mobility through complex network analysis and big data from smart cards.IoT, 6(3):44, 2025
Li Sun, Negin Ashrafi, and Maryam Pishgar. Optimizing urban mobility through complex network analysis and big data from smart cards.IoT, 6(3):44, 2025
2025
-
[15]
Houji Jin, Negin Ashrafi, Kamiar Alaei, Elham Pishgar, Greg Placencia, and Maryam Pishgar. A novel multi-task teacher–student architecture with self- supervised pretraining for 48-hour vasoactive-inotropic trend analysis in sepsis mortality prediction.IEEE Journal of Biomedical and Health Informatics, 2025
2025
-
[16]
Fan Cui, Chenyang Yin, Kexing Zhou, Youwei Xiao, Guangyu Sun, Qiang Xu, Qipeng Guo, Demin Song, Dahua Lin, Xingcheng Zhang, et al. Origen: Enhancing rtl code generation with code-to-code augmentation and self-reflection.arXiv preprint arXiv:2407.16237, 2024
-
[17]
Masterrtl: A pre-synthesis ppa estimation framework for any rtl design
Wenji Fang, Yao Lu, Shang Liu, Qijun Zhang, Ceyu Xu, Lisa Wu Wills, Hongce Zhang, and Zhiyao Xie. Masterrtl: A pre-synthesis ppa estimation framework for any rtl design. In2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), pages 1–9. IEEE, 2023
2023
-
[18]
Annotating slack directly on your verilog: Fine-grained rtl timing evaluation for early optimization
Wenji Fang, Shang Liu, Hongce Zhang, and Zhiyao Xie. Annotating slack directly on your verilog: Fine-grained rtl timing evaluation for early optimization. In Proceedings of the 61st ACM/IEEE Design Automation Conference, pages 1–6, 2024
2024
-
[19]
Icd 2 s: A hybrid ising- classical-machines data-driven qubo solver method
Armin Abdollahi, Mehdi Kamal, and Massoud Pedram. Icd 2 s: A hybrid ising- classical-machines data-driven qubo solver method. InProceedings of the 30th Asia and South Pacific Design Automation Conference, pages 914–920, 2025
2025
-
[20]
Armin Abdollahi, Mehdi Kamal, and Massoud Pedram. Menage: Mixed-signal event-driven neuromorphic accelerator for edge applications.arXiv preprint arXiv:2410.08403, 2024
-
[21]
Llsm: Llm-enhanced logic synthesis model with eda-guided cot prompting, hybrid embedding and aig-tailored acceleration
Shan Huang, Jinhao Li, Zhen Yu, Jiancai Ye, Jiaming Xu, Ningyi Xu, and Guohao Dai. Llsm: Llm-enhanced logic synthesis model with eda-guided cot prompting, hybrid embedding and aig-tailored acceleration. InProceedings of the 30th Asia and South Pacific Design Automation Conference, pages 974–980, 2025
2025
-
[22]
Wenji Fang, Shang Liu, Jing Wang, and Zhiyao Xie. Circuitfusion: multimodal cir- cuit representation learning for agile chip design.arXiv preprint arXiv:2505.02168, 2025
-
[23]
Zeju Li, Changran Xu, Zhengyuan Shi, Zedong Peng, Yi Liu, Yunhao Zhou, Lingfeng Zhou, Chengyu Ma, Jianyuan Zhong, Xi Wang, et al. Deepcircuitx: A comprehensive repository-level dataset for rtl code understanding, generation, and ppa analysis.arXiv preprint arXiv:2502.18297, 2025
-
[24]
Rocketppa: Ultra-fast llm- based ppa estimator at code-level abstraction.arXiv e-prints, pages arXiv–2503, 2025
Armin Abdollahi, Mehdi Kamal, and Massoud Pedram. Rocketppa: Ultra-fast llm- based ppa estimator at code-level abstraction.arXiv e-prints, pages arXiv–2503, 2025
2025
-
[25]
Pyranet: A multi-layered hierarchical dataset for verilog.arXiv preprint arXiv:2412.06947, 2024
Bardia Nadimi, Ghali Omar Boutaib, and Hao Zheng. Pyranet: A multi-layered hierarchical dataset for verilog.arXiv preprint arXiv:2412.06947, 2024
-
[26]
Verilogeval: Evaluating large language models for verilog code generation
Mingjie Liu, Nathaniel Pinckney, Brucek Khailany, and Haoxing Ren. Verilogeval: Evaluating large language models for verilog code generation. In2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), pages 1–8. IEEE, 2023
2023
-
[27]
Hugo Touvron, Louis Martin, Kevin Stone, et al. Llama 3: Open foundation and instruction-tuned models.arXiv preprint arXiv:2407.21783, 2024
work page internal anchor Pith review arXiv 2024
-
[28]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI. Deepseek-r1: Incentivizing reasoning capability in llms via rein- forcement learning.arXiv preprint arXiv:2501.12948, 2025
work page internal anchor Pith review arXiv 2025
-
[29]
Nangate open cell library 45nm
Nangate Inc. Nangate open cell library 45nm. http://www.nangate.com, 2011. Accessed: 2026
2011
-
[30]
Martins, J
M. Martins, J. Herrmann, M. S. Martins, et al. Open cell library in 15nm freepdk technology. InProceedings of the International Symposium on Quality Electronic Design (ISQED), pages 171–178, 2015
2015
-
[31]
Vashishtha, M
V. Vashishtha, M. Vangala, and L. T. Clark. Asap7 predictive design kit de- velopment and cell design technology co-optimization. InProceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 992–998, 2017
2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.