arxiv: 2606.11348 · v1 · pith:FZI4E55Inew · submitted 2026-06-09 · 💻 cs.LG

SwiftCTS: Fast Cross-Design Prediction and Pareto Optimization of Clock Tree Metrics via Few-Shot Calibration

Pith reviewed 2026-06-27 13:59 UTC · model grok-4.3

classification 💻 cs.LG
keywords swiftcts · power · prediction · under · wirelength · calibration · clock · computationally

The pith

SwiftCTS uses gradient-boosted models on physics features plus K-shot multiplicative calibration to predict clock tree power, wirelength and skew on unseen macros after one or two physical runs and to evaluate 100,000 configurations in under ten seconds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Clock Tree Synthesis requires repeated EDA tool calls to trade off power, wirelength and timing skew across large configuration spaces. Prior machine-learning predictors need expensive retraining or fine-tuning when the macro changes. SwiftCTS couples lightweight physics-grounded statistical features with gradient-boosted ensembles that train in under five seconds on CPU and infer in sub-milliseconds. A K-shot multiplicative calibration step then anchors those predictions to one or two physical reference runs on the new design, cutting power error from 24.5 percent to 3.3 percent and wirelength error from 56.6 percent to under 1 percent. The calibrated surrogate plugs into an evolutionary optimizer to produce Pareto fronts that are physically validated inside the OpenROAD flow and consistently beat default tool heuristics.

Core claim

SwiftCTS is a physics-informed surrogate that trains in under five seconds, delivers sub-millisecond inference, and applies a K-shot multiplicative calibration to one or two physical reference runs; this reduces out-of-distribution power prediction error from 24.5 percent to 3.3 percent and wirelength error from 56.6 percent to under 1 percent, enabling an evolutionary optimizer to evaluate 100,000 CTS configurations in under ten seconds and return Pareto-optimal clock trees whose predictions are confirmed within 0.5 percent for power and wirelength and five picoseconds for skew when re-run in the physical flow.

What carries the argument

The K-shot multiplicative calibration mechanism, which scales the surrogate outputs by the ratio of one or two measured physical values on the target macro to the surrogate's own prediction on the same points.
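The ratio-based correction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `k_shot_scale` and `calibrate` are hypothetical, the numbers are made up, and averaging the K per-run ratios arithmetically is an assumption (the paper may aggregate them differently).

```python
from statistics import mean

def k_shot_scale(measured, predicted):
    """Return one multiplicative correction factor from K reference runs.

    Assumption: the K per-run ratios (measured / predicted) are averaged
    arithmetically; this aggregation rule is not taken from the paper.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need K >= 1 matching reference pairs")
    return mean(m / p for m, p in zip(measured, predicted))

def calibrate(raw_predictions, scale):
    """Scale every raw surrogate output by the same factor."""
    return [scale * y for y in raw_predictions]

# Hypothetical values: two reference runs (K = 2) on the target macro.
measured_power = [12.0, 15.0]    # from the physical flow
predicted_power = [10.0, 12.5]   # surrogate output for the same two configs

scale = k_shot_scale(measured_power, predicted_power)
calibrated = calibrate([8.0, 11.0], scale)  # new configs, never run physically
```

A single scalar per metric keeps the correction cheap, but it can only fix errors that are proportional to the prediction, which is the point the referee report raises.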

If this is right

  • Evaluates 100,000 CTS configurations in under ten seconds.
  • Delivers closed-loop validated errors below 0.5 percent for power and wirelength and within five picoseconds for timing skew on out-of-distribution benchmarks.
  • Produces Pareto fronts that outperform default tool heuristics on all three target metrics when inserted back into the OpenROAD flow.
  • Trains the base surrogate in under five seconds on CPU with no GPU required.
  • Removes the need for retraining or fine-tuning when moving to a new macro architecture.
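How a fast surrogate turns into a Pareto front can be illustrated with a plain non-dominated filter. This sketch swaps in random sampling for the paper's evolutionary optimizer, and `toy_surrogate` with its two configuration knobs is a hypothetical stand-in for the calibrated model, so only the dominance logic carries over.

```python
import random

def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better
    on at least one. All objectives are minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset using a simple O(n^2) filter."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def toy_surrogate(cfg):
    """Hypothetical surrogate: maps (buffer_knob, cluster_knob) to
    (power, wirelength, skew). The formulas are invented for illustration."""
    buf, cluster = cfg
    power = 1.0 + 0.05 * buf
    wirelength = 100.0 / cluster + 2.0 * buf
    skew = 50.0 / buf + 0.5 * cluster
    return (power, wirelength, skew)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
configs = [(rng.randint(1, 20), rng.randint(1, 20)) for _ in range(500)]
scores = [toy_surrogate(c) for c in configs]
front = pareto_front(scores)
```

The quadratic filter is adequate for a few hundred points; at the 100,000-configuration scale the paper reports, a sort-based or vectorized non-dominated sort would be the more realistic choice.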

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same calibration pattern could be tested on other expensive EDA stages such as placement or routing where only a few full-tool runs are affordable.
  • Because the optimizer explores 100,000 points in seconds, designers could afford tighter iteration loops inside the same tape-out schedule.
  • The physics-grounded feature set may transfer to surrogate models for related combinatorial problems that also admit cheap statistical descriptors.
  • If the calibration proves stable across more foundry nodes, the method could shrink the compute barrier that currently limits exhaustive search in physical design.

Load-bearing premise

The multiplicative adjustment computed from one or two physical reference runs will remain accurate for any new macro architecture without further model changes.

What would settle it

Measure power, wirelength and skew on a previously unseen macro after applying the two-shot calibration; if the errors stay above the reported 3.3 percent for power or 1 percent for wirelength, the central claim fails.
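The check described here reduces to a percentage-error computation against fixed limits. The sketch below assumes mean absolute percentage error as the metric, which the abstract does not specify; the helper names and sample values are hypothetical, and the limits are passed in explicitly (here the calibrated figures the abstract reports).

```python
def mean_abs_pct_error(measured, predicted):
    """Mean absolute percentage error, in percent, relative to measured values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need matching, non-empty sequences")
    return 100.0 * sum(abs(p - m) / abs(m)
                       for m, p in zip(measured, predicted)) / len(measured)

def claim_holds(power_err_pct, wl_err_pct, power_limit_pct, wl_limit_pct):
    """True if both calibrated errors stay within the stated limits."""
    return power_err_pct <= power_limit_pct and wl_err_pct <= wl_limit_pct

# Hypothetical measurements on an unseen macro versus calibrated predictions.
power_err = mean_abs_pct_error([10.0, 20.0], [10.2, 19.6])
wl_err = mean_abs_pct_error([1000.0, 2000.0], [1005.0, 1990.0])

# Limits taken from the abstract's calibrated figures: 3.3 % power, 1 % wirelength.
ok = claim_holds(power_err, wl_err, power_limit_pct=3.3, wl_limit_pct=1.0)
```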

Figures

Figures reproduced from arXiv: 2606.11348 by Barsat Khadka, Kawsher Roxy, Md Rubel Ahmed.

Figure 1: High-level overview of the SwiftCTS framework. The pipeline begins with feature extraction from placement data, followed by a lightweight surrogate ...
Figure 2: Predictive limitations of standalone models. (Left) LightGBM severely ...
Figure 3: Closed-loop physical validation comparing SwiftCTS predictions ...
Figure 4: Clock tree layout comparison on the AES benchmark, highlighting the ...
Original abstract

Clock Tree Synthesis (CTS) is a computationally expensive stage in the physical design flow, requiring iterative EDA tool invocations to navigate a vast configuration space for optimal power, wirelength, and timing skew. Existing machine learning approaches require computationally expensive retraining or fine-tuning cycles to adapt to unseen macro architectures and are architecturally mismatched to the millions of evaluations demanded by exhaustive combinatorial search. We present SwiftCTS, a physics-informed surrogate framework that addresses both limitations simultaneously. By coupling lightweight, physics-grounded statistical features with gradient-boosted ensembles, SwiftCTS trains in under five seconds on a CPU and delivers sub-millisecond inference without GPU support. To handle out-of-distribution (OOD) designs without retraining or fine-tuning, we introduce a K-shot multiplicative calibration mechanism that anchors predictions to just one or two physical reference runs, reducing power prediction error from 24.5% to 3.3% and wirelength error from 56.6% to under 1% on unseen macros. Integrating this engine with an evolutionary optimizer, SwiftCTS evaluates 100,000 CTS configurations in under ten seconds, yielding Pareto-optimal frontiers that are physically validated within the OpenROAD flow. Closed-loop validation confirms prediction errors below 0.5% for power and wirelength, and timing skew predictions within five picoseconds on an OOD benchmark, consistently outperforming default tool heuristics across all target metrics. Code publicly available at: https://github.com/BarsatKhadka/SwiftCTS (link target: https://anonymous.4open.science/r/SwiftCTS-7E6E)

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper presents SwiftCTS, a physics-informed surrogate framework for clock tree synthesis (CTS) metrics (power, wirelength, timing skew). It couples lightweight physics-grounded statistical features with gradient-boosted ensembles for sub-5-second CPU training and sub-millisecond inference. A K-shot multiplicative calibration mechanism is introduced to handle OOD macros using only 1-2 physical reference runs, reducing reported OOD errors from 24.5% to 3.3% (power) and 56.6% to <1% (wirelength). The surrogate is integrated with an evolutionary optimizer to evaluate 100k CTS configurations in <10s, producing Pareto fronts that are closed-loop validated in the OpenROAD flow with final errors <0.5% (power/wirelength) and <5ps (skew), outperforming default heuristics.

Significance. If the calibration generalizes, the work offers a practical route to exhaustive combinatorial search in CTS without repeated full EDA invocations or retraining, which could meaningfully accelerate physical design iteration. Public code release and closed-loop physical validation in a standard open-source flow are concrete strengths that support reproducibility and applicability.

major comments (1)
  1. [Calibration mechanism (abstract and methods describing K-shot correction)] The central OOD performance claim (reduction from 24.5%/56.6% to 3.3%/<1% with 1-2 references) rests on the K-shot multiplicative calibration. No derivation or ablation is provided showing why the residual is purely multiplicative rather than containing additive or topology-dependent terms when macros differ in size, aspect ratio, or congestion; a single scalar may fail to correct such cases. This directly underpins the headline numbers and the claim of no retraining/fine-tuning for arbitrary unseen macros.
minor comments (1)
  1. [Abstract] The abstract claims that the approach is "consistently outperforming default tool heuristics across all target metrics" but gives no quantitative deltas or baseline definitions to support this.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the practical strengths of SwiftCTS, including the public code release and closed-loop validation. The single major comment raises a valid point about the justification for the multiplicative calibration form. We address it directly below and indicate how we will strengthen the manuscript.

Point-by-point responses
  1. Referee: [Calibration mechanism (abstract and methods describing K-shot correction)] The central OOD performance claim (reduction from 24.5%/56.6% to 3.3%/<1% with 1-2 references) rests on the K-shot multiplicative calibration. No derivation or ablation is provided showing why the residual is purely multiplicative rather than containing additive or topology-dependent terms when macros differ in size, aspect ratio, or congestion; a single scalar may fail to correct such cases. This directly underpins the headline numbers and the claim of no retraining/fine-tuning for arbitrary unseen macros.

    Authors: We agree that the manuscript would benefit from a clearer justification of the multiplicative form. The choice is motivated by the physics-informed features (which encode relative scaling of capacitance, resistance, and fanout) and by consistent empirical observation across the evaluated macro set that residuals for power and wirelength are predominantly proportional rather than additive. Nevertheless, no formal derivation or systematic ablation (multiplicative vs. additive, or across explicit variations in aspect ratio and congestion) is currently present. In the revised manuscript we will add (i) a short derivation sketch linking the residual structure to the underlying delay and power equations and (ii) an ablation table that reports calibration error for both multiplicative and additive scalars on macros deliberately varied in size, aspect ratio, and congestion. These additions will directly support the headline OOD numbers and the no-retraining claim while preserving the core K-shot mechanism. revision: partial
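The multiplicative-versus-additive ablation discussed in this exchange can be mocked up on synthetic data. Everything below is hypothetical: the residual structures are constructed by hand, the helper names are illustrative, and K = 1 is used for brevity. The sketch only shows that each scalar form corrects its own residual type exactly and fails on the other.

```python
def fit_multiplicative(measured, predicted):
    """Average ratio measured / predicted over the reference runs."""
    return sum(m / p for m, p in zip(measured, predicted)) / len(measured)

def fit_additive(measured, predicted):
    """Average difference measured - predicted over the reference runs."""
    return sum(m - p for m, p in zip(measured, predicted)) / len(measured)

def mape(measured, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(p - m) / abs(m)
                       for m, p in zip(measured, predicted)) / len(measured)

def ablate(raw, truth):
    """Calibrate on the first point (K = 1), score both forms on the rest."""
    scale = fit_multiplicative(truth[:1], raw[:1])
    offset = fit_additive(truth[:1], raw[:1])
    mult_err = mape(truth[1:], [scale * y for y in raw[1:]])
    add_err = mape(truth[1:], [y + offset for y in raw[1:]])
    return mult_err, add_err

raw = [10.0, 20.0, 40.0, 80.0]  # synthetic raw surrogate outputs

# Case A: residual is purely proportional (truth = 1.5 * raw).
mult_a, add_a = ablate(raw, [1.5 * y for y in raw])

# Case B: residual is a constant offset (truth = raw + 5).
mult_b, add_b = ablate(raw, [y + 5.0 for y in raw])
```

In case A the multiplicative scalar is exact and the additive one is not; in case B the roles reverse. Real macros that differ in size, aspect ratio, or congestion may mix both residual types, which is what the promised ablation table would need to quantify.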

Circularity Check

0 steps flagged

No circularity; calibration anchored by external physical references

full rationale

The SwiftCTS framework trains a gradient-boosted ensemble on physics-grounded features and applies a K-shot multiplicative calibration using one or two external physical reference runs for OOD adjustment. This calibration step is defined in terms of measured reference values rather than the model's own predictions or fitted parameters, and closed-loop validation against the OpenROAD flow supplies independent physical confirmation. No equation or claim reduces a target quantity to a self-derived fit or self-citation chain; the derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are described in sufficient detail to enumerate.

pith-pipeline@v0.9.1-grok · 5830 in / 1118 out tokens · 16224 ms · 2026-06-27T13:59:41.238659+00:00 · methodology

