arxiv: 2606.08161 · v1 · pith:5B55ESLKnew · submitted 2026-06-06 · 💻 cs.LG · cs.AR · cs.NA · math.NA

AttentionCap: Transformer Based Capacitance Matrix Learning Toward Full-Chip Extraction

Pith reviewed 2026-06-27 20:21 UTC · model grok-4.3

classification 💻 cs.LG cs.AR cs.NA math.NA
keywords capacitance extraction · transformer · deep learning · EDA · capacitance matrix · process node · symmetric attention · normalized Laplacian loss

The pith

A Transformer learns chip capacitance matrices from synthetic data and generalizes to real multi-node designs with under 4 percent error.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Rule-based pattern matching for capacitance extraction is becoming hard to keep accurate at advanced process nodes, prompting a shift toward learned 2D models; existing MLP- and CNN-based models, however, accept only fixed metal-layer combinations in a single process node. AttentionCap frames capacitance matrix learning as an attention task inside a Transformer, adding a Gram representation, a symmetric-attention output layer, a normalized Laplacian loss, and a process-node embedding so that one model can train across nodes. Trained only on synthetic layouts, it reaches 0.67 percent self-capacitance and 3.99 percent coupling-capacitance error on unseen real designs, runs 192 times faster at inference than the CNN-Cap baseline, and transfers to a new node with only 5K samples and 4K fine-tuning steps. These results matter because accurate, fast capacitance data feeds directly into timing, power, and signal-integrity analysis throughout chip design.

Core claim

AttentionCap is a Transformer customized with a Gram representation, a physics-aligned symmetric-attention output layer, a normalized Laplacian loss, and a process-node embedding. Trained on synthetic data, it attains 0.67 percent self-capacitance and 3.99 percent coupling-capacitance error on unseen real multi-layer, multi-node designs. Relative to the CNN-Cap baseline, that is 4.6 times lower self-capacitance error and 5.7 times lower coupling-capacitance error, with 192 times faster inference. A pretrained model also transfers accurately to an unseen node using only 5K samples and 4K fine-tuning steps.
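
As a rough consistency check on the stated 4.6 and 5.7 ratios, the implied baseline errors can be worked out directly. This is an inference from the numbers above, not a figure quoted from the paper, and it assumes each ratio is the CNN-Cap error divided by the AttentionCap error. Because the ratios are rounded, the results are approximate.

```latex
e_{\text{self}}^{\text{CNN-Cap}} \approx 4.6 \times 0.67\% \approx 3.1\%,
\qquad
e_{\text{coupling}}^{\text{CNN-Cap}} \approx 5.7 \times 3.99\% \approx 22.7\%
```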

What carries the argument

Symmetric-attention output layer together with normalized Laplacian loss that enforces the physical symmetry and positive-definiteness properties of the capacitance matrix inside the Transformer architecture.
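
Why an attention-style output can respect these properties may be easier to see in a small sketch. The code below is not the authors' layer: it assumes a single shared projection `W` standing in for both query and key, so that the score matrix is a Gram matrix, which is symmetric and positive semi-definite by construction. The function name, shapes, and random inputs are all hypothetical, and the sketch does not address the sign pattern or scaling of a physical capacitance matrix.

```python
import numpy as np

def symmetric_attention_output(H, W):
    """Map per-conductor features H (n x d) to an n x n symmetric PSD matrix.

    Hypothetical sketch: one shared projection W (d x k) is used for both
    the "query" and the "key", so the scores form a Gram matrix Z Z^T.
    """
    Z = H @ W        # shared projection, shape (n, k)
    return Z @ Z.T   # Gram matrix: symmetric, positive semi-definite

rng = np.random.default_rng(0)
H = rng.standard_normal((5, 16))   # 5 conductors, 16 features each (made up)
W = rng.standard_normal((16, 8))   # projection to 8 dimensions (made up)
G = symmetric_attention_output(H, W)

is_symmetric = np.allclose(G, G.T)
min_eig = np.linalg.eigvalsh(G).min()   # non-negative up to rounding error
```

With separate query and key projections the score matrix would generally be asymmetric, which is the design point a symmetric output layer is meant to avoid.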

If this is right

  • Full-chip capacitance extraction can be performed with deep learning at speeds that support large-scale EDA flows.
  • A single pretrained model can be adapted to new manufacturing nodes without collecting large new labeled datasets.
  • Downstream timing and power analysis tools receive lower-error parasitic data at lower computational cost.
  • Multi-layer and multi-node settings become practical without hand-crafted rules per layer combination.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same attention-plus-physics-loss pattern could be tested on related parasitic extraction tasks such as inductance or resistance matrices.
  • If transfer works across nodes, the approach may reduce the frequency of full electromagnetic field-solver runs during technology development.
  • Embedding additional physical quantities such as temperature or frequency dependence could extend the model beyond static capacitance.

Load-bearing premise

Synthetic training layouts capture enough of the geometric and material variation present in real chip designs across different process nodes.

What would settle it

Run the pretrained model on a fresh collection of real layouts from a process node never seen during pretraining or fine-tuning and measure whether self-capacitance error stays below 1 percent and coupling error stays below 4 percent.
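
A check of this kind could be scripted along the following lines. The sketch assumes the error metric is a mean relative error, with self capacitances on the diagonal and coupling capacitances off the diagonal; the page does not state whether the paper uses mean or median, so that choice is an assumption. The function name and the toy matrices are hypothetical.

```python
import numpy as np

def capacitance_errors(C_pred, C_ref):
    """Return (self_error, coupling_error) as mean relative errors.

    Assumed metric: self capacitances are the diagonal entries; coupling
    capacitances are the off-diagonal entries with a nonzero reference.
    """
    C_pred = np.asarray(C_pred, dtype=float)
    C_ref = np.asarray(C_ref, dtype=float)
    diag = np.eye(C_ref.shape[0], dtype=bool)
    coupling = ~diag & (C_ref != 0)

    def mean_rel(mask):
        return np.mean(np.abs(C_pred[mask] - C_ref[mask]) / np.abs(C_ref[mask]))

    return mean_rel(diag), mean_rel(coupling)

# Toy 3-conductor reference matrix (values are made up).
C_ref = np.array([[ 2.0, -0.5, -0.2],
                  [-0.5,  1.5, -0.4],
                  [-0.2, -0.4,  1.0]])
# Perturb diagonal entries by 0.5% and off-diagonal entries by 3%.
scale = np.where(np.eye(3, dtype=bool), 1.005, 1.03)
C_pred = C_ref * scale

self_err, coupling_err = capacitance_errors(C_pred, C_ref)
# Thresholds from the criterion above: 1 percent self, 4 percent coupling.
passes = self_err < 0.01 and coupling_err < 0.04
```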

Figures

Figures reproduced from arXiv: 2606.08161 by Dingcheng Yang, Hector R. Rodriguez, Jiechen Huang, Wenjian Yu, Yibo Lin, Zuochang Ye.

Figure 1: Modern Transformer architecture.
Figure 2: Overview of the AttentionCap framework for capacitance matrix learning.
Figure 3: AttentionCap learns meaningful Gram represen…
Figure 4: Example cross-sections from ASAP7 real-design…
Figure 5: Strong transferability of AttentionCap: with only 5K…
Original abstract

As capacitance extraction accuracy of rule-based pattern matching becomes difficult to sustain at advanced nodes, a growing trend emerges to develop deep-learning-based 2D capacitance models. However, existing MLP- and CNN-based methods constrain their input to fixed metal-layer combinations in a specific process node, limiting their usability in practice. Recognizing the inherent similarity between capacitance matrix and the prevailing attention mechanism, we propose AttentionCap, a customized Transformer for capacitance matrix learning, with a Gram representation framework, a physics-aligned symmetric-attention output layer, and a novel normalized Laplacian loss. We also introduce a process-node embedding to enable multi-node learning. Trained on synthetic data, AttentionCap attains 0.67%/3.99% self/coupling-capacitance error on unseen real designs under a multi-layer and multi-node setting, surpassing the CNN-Cap baseline with 4.6×/5.7× lower self/coupling error and 192× faster inference speed. A pretrained AttentionCap accurately transfers to an unseen node with only 5K samples and 4K finetuning steps. With sufficient accuracy on unseen real designs and strong transferability to new process nodes, AttentionCap offers highly practical value for modern EDA workflows. Code and data are available at https://github.com/THU-numbda/AttentionCap.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes AttentionCap, a Transformer-based model for learning full-chip capacitance matrices. It introduces a Gram representation framework, physics-aligned symmetric-attention output layer, normalized Laplacian loss, and process-node embedding to support multi-layer and multi-node settings. Trained on synthetic data, the model reports 0.67%/3.99% self/coupling capacitance error on unseen real designs (outperforming CNN-Cap by 4.6×/5.7× with 192× faster inference) and demonstrates transfer to an unseen process node using only 5K samples and 4K finetuning steps.

Significance. If the synthetic-to-real generalization holds, the work provides a practical advance for EDA capacitance extraction by enabling accurate, fast inference across process nodes without per-node retraining from scratch. The availability of code and data strengthens reproducibility.

major comments (2)
  1. [abstract and §3 (data generation)] The central claim of 0.67%/3.99% error on unseen real designs (abstract) rests on the assumption that the synthetic training distribution matches the statistics of real multi-layer, multi-node layouts. No section details the synthetic generator's coverage of long-range coupling, irregular density gradients, or node-specific via/metal-stack variations; without this, the reported errors and 5K-sample transfer results are conditional on test designs lying inside the synthetic manifold.
  2. [abstract and transfer experiment section] The multi-node transfer experiment (abstract) uses a pretrained model with process-node embedding, but lacks explicit validation that the embedding dimensions capture node-specific variations rather than overfitting to the training nodes. The claim that it 'accurately transfers' with 5K samples requires quantitative comparison to training from scratch on the target node.
minor comments (2)
  1. [method section] Notation for the normalized Laplacian loss should be defined with an equation number for clarity.
  2. [figures] Figure captions should explicitly state whether error metrics are mean or median and over which capacitance types.
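
For background on the first minor comment, the standard graph-theoretic normalized Laplacian is given below, followed by the analogous diagonal scaling of a capacitance matrix. Only the first line is a textbook definition (with adjacency matrix A and degree matrix D, assuming no isolated vertices). The second line is an assumption about how such a normalization might carry over to a capacitance matrix C; the paper's actual loss is not reproduced on this page and may differ.

```latex
L = D - A, \qquad
\mathcal{L} = D^{-1/2} L \, D^{-1/2} = I - D^{-1/2} A \, D^{-1/2}

\tilde{C} = \operatorname{diag}(C)^{-1/2} \, C \, \operatorname{diag}(C)^{-1/2},
\qquad
\tilde{C}_{ij} = \frac{C_{ij}}{\sqrt{C_{ii}\, C_{jj}}}
```

Under this scaling the diagonal of the normalized matrix is 1 and each off-diagonal entry is dimensionless, so small and large couplings would contribute on a comparable scale.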

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. Below we address each major comment point by point, indicating the revisions we will make to strengthen the manuscript.

  1. Referee: [abstract and §3 (data generation)] The central claim of 0.67%/3.99% error on unseen real designs (abstract) rests on the assumption that the synthetic training distribution matches the statistics of real multi-layer, multi-node layouts. No section details the synthetic generator's coverage of long-range coupling, irregular density gradients, or node-specific via/metal-stack variations; without this, the reported errors and 5K-sample transfer results are conditional on test designs lying inside the synthetic manifold.

    Authors: We agree that the current §3 description does not provide sufficient quantitative detail on the synthetic generator's coverage of long-range coupling, density gradients, and node-specific variations. In the revised manuscript we will expand §3 with a dedicated subsection that reports statistics on coupling distance distributions, density gradient histograms, and via/metal-stack configurations in the generated data, together with direct comparisons to the statistics of the real test designs. This addition will clarify the extent to which the reported errors rest on the synthetic manifold. revision: yes

  2. Referee: [abstract and transfer experiment section] The multi-node transfer experiment (abstract) uses a pretrained model with process-node embedding, but lacks explicit validation that the embedding dimensions capture node-specific variations rather than overfitting to the training nodes. The claim that it 'accurately transfers' with 5K samples requires quantitative comparison to training from scratch on the target node.

    Authors: We concur that an explicit comparison to training from scratch on the target node with the same 5K samples, as well as validation that the process-node embedding captures node-specific features, would strengthen the transfer claims. In the revised version we will add (i) a side-by-side accuracy comparison of the fine-tuned pretrained model versus a model trained from scratch on the target node and (ii) an analysis of the learned embeddings (e.g., pairwise distances or ablation removing the embedding) to demonstrate that they encode node-specific information rather than overfitting. revision: yes

Circularity Check

0 steps flagged

No circularity; training on synthetic data and evaluation on independent real designs are externally verifiable.

full rationale

The paper trains a Transformer model (AttentionCap) on synthetic layouts and reports quantitative errors (0.67%/3.99% self/coupling) plus transfer metrics on unseen real chip designs. These benchmarks are measured against external ground truth and are not derived from the model's own fitted parameters or self-citations. No equations reduce by construction to inputs, no uniqueness theorems are imported from prior author work, and the central claims rest on standard supervised learning plus independent test sets rather than self-referential definitions.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The model introduces several custom components but relies on standard ML training assumptions and physics properties of capacitance. No invented entities are introduced.

free parameters (1)
  • process-node embedding dimensions
    The embedding for multi-node learning is a learned parameter, though the exact count is not specified in the abstract.
axioms (1)
  • domain assumption: The capacitance matrix is symmetric and positive semi-definite as per physics.
    Invoked in the design of the symmetric-attention output layer.


