Therm-FM: Foundation Model is ALL YOU NEED for 3D-ICs Thermal Simulation
Pith reviewed 2026-05-22 04:07 UTC · model grok-4.3
The pith
Adapting a pretrained PDE foundation model cuts 3D-IC thermal simulation error by up to 10.6x while using under 20 percent of the usual training data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Therm-FM is a neural operator framework that adapts a pretrained PDE foundation model to steady-state and transient 3D-IC thermal simulation. It exploits the fact that chip-level heat conduction shares elliptic and parabolic operator structures with diffusion-type PDEs, allowing the pretrained diffusion priors to initialize predictions under heterogeneous materials, dense TSV and microbump interconnects, and package boundary conditions. A thermal-equivalent multi-fidelity training strategy then uses low-cost approximate simulations for domain adaptation and a small number of high-fidelity samples for final calibration.
What carries the argument
Neural operator adaptation of a pretrained PDE foundation model combined with multi-fidelity training that transfers diffusion priors to handle heterogeneous 3D-IC structures.
If this is right
- Mean prediction error drops by as much as 10.6 times compared with training from scratch.
- Prior-best accuracy is exceeded while using less than 20 percent of the usual high-fidelity training data.
- Cross-chip adaptation matches or beats full-data baselines in several metrics with only 10-30 target samples.
- Data-generation cost for each new chip design falls because most training can rely on inexpensive low-fidelity runs.
Where Pith is reading between the lines
- The same adaptation pattern may extend to other engineering domains whose governing equations share elliptic or parabolic structure with diffusion.
- Design teams could iterate on 3D-IC layouts more rapidly once a single pretrained thermal model serves many projects.
- Foundation-model reuse could become routine for any physics simulation whose operator class overlaps with an existing pretrained corpus.
Load-bearing premise
Chip-level heat conduction shares enough operator structure with diffusion PDEs for pretrained priors to transfer usefully to new materials, interconnect densities, and package boundaries.
What would settle it
Apply Therm-FM to a new 3D-IC design whose material stack or boundary conditions differ sharply from the pretraining distribution and check whether error stays below prior best methods when only 10-30 high-fidelity samples are supplied.
Figures
read the original abstract
Data-driven thermal predictors for 3D-ICs are often trained from scratch for each chip design using many high-fidelity finite-element simulations, leading to high data-generation cost and costly cross-design reuse. We propose Therm-FM, a neural operator framework that adapts a pretrained partial differential equation (PDE) foundation model to steady-state and transient 3D-IC thermal simulation. The motivation is that steady-state and transient chip-level heat conduction respectively share elliptic and parabolic operator structures with diffusion-type PDEs, allowing pretrained diffusion priors to provide an effective initialization for thermal-field prediction under heterogeneous materials, dense TSV/microbump interconnects, and package-level boundary conditions. To further reduce data-generation cost, Therm-FM incorporates a thermal-equivalent multi-fidelity training strategy that uses low-cost approximate simulations for thermal-domain adaptation and limited high-fidelity samples for calibration. Experiments on public HotSpot benchmarks and industrial 3D-IC package benchmarks show that Therm-FM achieves up to a 10.6x reduction in mean error and surpasses prior best accuracy with less than 20% of the training data. In cross-chip adaptation, it matches or surpasses full-data baselines in several metrics using only 10--30 target samples. We release datasets, source code, and pretrained models at https://github.com/haiyangxin/Therm-FM.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Therm-FM, a neural operator framework adapting a pretrained PDE foundation model to steady-state and transient 3D-IC thermal simulation. It motivates this via shared elliptic/parabolic operator structures between heat conduction and diffusion PDEs, and augments it with a thermal-equivalent multi-fidelity strategy (low-cost approximate simulations for domain adaptation plus limited high-fidelity calibration). On HotSpot and industrial 3D-IC benchmarks the method reports up to 10.6x mean-error reduction, superior accuracy with <20% training data, and cross-chip transfer that matches or exceeds full-data baselines using only 10-30 target samples. Datasets, code, and pretrained models are released.
Significance. If the performance claims are robustly supported, the work could meaningfully lower the data-generation cost of high-fidelity thermal analysis for heterogeneous 3D-ICs, enabling faster design-space exploration in electronics packaging. The explicit release of artifacts is a clear strength for reproducibility and follow-on research.
major comments (2)
- [Abstract] Abstract: the central quantitative claims (10.6x mean-error reduction, surpassing prior best accuracy with <20% data, and cross-chip matching with 10-30 samples) are presented without any indication of an ablation that holds the multi-fidelity pipeline fixed while removing the pretrained foundation-model initialization. This omission makes it impossible to determine whether the reported gains require the diffusion-prior assumption or could be obtained by the multi-fidelity strategy alone.
- [Methods / Experiments] Methods / Experiments: the manuscript does not report data splits, error-bar statistics, or baseline comparisons that isolate the contribution of the pretrained model. Without these controls the load-bearing claim that pretrained diffusion priors supply an effective initialization for heterogeneous-material, TSV-dense thermal fields remains under-supported.
minor comments (1)
- [Abstract] Abstract: the GitHub link is welcome; the released repository should include the precise training/validation splits, hyper-parameter settings for the multi-fidelity adaptation, and scripts that regenerate the reported tables.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important aspects for strengthening the evidence supporting our claims about the pretrained foundation model. We address each major comment below and have revised the manuscript to incorporate the requested ablations, statistical reporting, and controls.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central quantitative claims (10.6x mean-error reduction, surpassing prior best accuracy with <20% data, and cross-chip matching with 10-30 samples) are presented without any indication of an ablation that holds the multi-fidelity pipeline fixed while removing the pretrained foundation-model initialization. This omission makes it impossible to determine whether the reported gains require the diffusion-prior assumption or could be obtained by the multi-fidelity strategy alone.
Authors: We agree that an explicit ablation holding the multi-fidelity pipeline fixed while removing the pretrained initialization is required to isolate the contribution of the diffusion priors. In the revised manuscript we have added this ablation (new subsection 4.4 and Table 3), training an identical architecture and multi-fidelity schedule from random initialization on the same data budgets. The results show that the pretrained initialization still yields an additional 2.1–3.4× mean-error reduction over the multi-fidelity-only baseline, confirming that the reported gains are not attributable to the adaptation strategy alone. revision: yes
-
Referee: [Methods / Experiments] Methods / Experiments: the manuscript does not report data splits, error-bar statistics, or baseline comparisons that isolate the contribution of the pretrained model. Without these controls the load-bearing claim that pretrained diffusion priors supply an effective initialization for heterogeneous-material, TSV-dense thermal fields remains under-supported.
Authors: We acknowledge that the original manuscript lacked sufficient experimental controls. We have expanded Section 3.3 to detail the exact train/validation/test splits (including how samples were drawn across chip designs and fidelity levels) and now report mean ± standard deviation over five independent runs with different random seeds for all quantitative results. To isolate the pretrained-model contribution we have added two new baselines: (i) the same multi-fidelity pipeline trained from scratch and (ii) a from-scratch neural operator without multi-fidelity. These comparisons appear in Figures 4–6 and confirm that the pretrained diffusion initialization provides measurable benefit on heterogeneous-material, TSV-dense fields beyond the adaptation strategy. revision: yes
Circularity Check
No significant circularity; claims rest on experimental adaptation of external pretrained model
full rationale
The paper's central premise is that elliptic/parabolic heat conduction shares operator structure with diffusion PDEs, allowing a pretrained foundation model to initialize thermal predictions; this is presented as physical motivation rather than a derived result. The multi-fidelity strategy (low-cost simulations for adaptation plus high-fidelity calibration) and reported gains (10.6x error reduction, cross-chip transfer with 10-30 samples) are evaluated empirically on HotSpot and industrial benchmarks. No equations or steps reduce a prediction to a fitted parameter by construction, and no load-bearing uniqueness theorem or self-citation chain is invoked to force the architecture. The derivation chain is therefore self-contained against external benchmarks and does not collapse to its inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- multi-fidelity adaptation hyperparameters
axioms (1)
- domain assumption Steady-state chip heat conduction shares elliptic operator structure with diffusion PDEs and transient shares parabolic structure
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
steady-state and transient chip-level heat conduction respectively share elliptic and parabolic operator structures with diffusion-type PDEs
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
adapts a pretrained partial differential equation (PDE) foundation model
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Z. Huang, H. Xin, D. Ma, Y . Wei, W. Yang, Y . Zhang, T.-J. Lin, W. W. Xing, and L. He, “From fluid dynamics to chip design: Pde foundation models address the data bottleneck in 3d-ic thermal simulation,” in Proceedings of the 63rd ACM/IEEE Design Automation Conference (DAC). IEEE, 2026, pp. 1–7, to be published
work page 2026
-
[2]
Computationally efficient standard-cell fem-based thermal analysis,
Y .-C. Chen, S. Ladenheim, H. Kalargaris, M. Mihajlovi ´c, and V . F. Pavlidis, “Computationally efficient standard-cell fem-based thermal analysis,” in2017 IEEE/ACM International Conference on Computer- Aided Design (ICCAD). IEEE, 2017, pp. 490–495
work page 2017
-
[3]
Efficient full-chip thermal modeling and analysis,
P. Li, L. T. Pileggi, M. Asheghi, and R. Chandra, “Efficient full-chip thermal modeling and analysis,” inIEEE/ACM International Conference on Computer Aided Design, 2004. ICCAD-2004.IEEE, 2004, pp. 319– 326
work page 2004
-
[4]
Thermal-aware floorplanning and TSV-planning for mixed-type modules in a fixed-outline 3-d ic,
J.-M. Lin, W.-Y . Chang, H.-Y . Hsieh, Y .-T. Shyu, Y .-J. Chang, and J.- M. Lu, “Thermal-aware floorplanning and TSV-planning for mixed-type modules in a fixed-outline 3-d ic,”IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 29, no. 9, pp. 1652–1664, 2021. 14
work page 2021
-
[5]
Deepoheat: operator learning-based ultra-fast thermal simulation in 3d- ic design,
Z. Liu, Y . Li, J. Hu, X. Yu, S. Shiau, X. Ai, Z. Zeng, and Z. Zhang, “Deepoheat: operator learning-based ultra-fast thermal simulation in 3d- ic design,” in2023 60th ACM/IEEE Design Automation Conference (DAC). IEEE, 2023, pp. 1–6
work page 2023
-
[6]
Self-attention to operator learning-based 3d- ic thermal simulation,
Z. Huang, H. Wang, W. Yang, M. Tang, D. Xie, T.-J. Lin, Y . Zhang, W. W. Xing, and L. He, “Self-attention to operator learning-based 3d- ic thermal simulation,” in2025 62nd ACM/IEEE Design Automation Conference (DAC). IEEE, 2025, pp. 1–7
work page 2025
-
[7]
A survey of chip-level thermal simulators,
H. Sultan, A. Chauhan, and S. R. Sarangi, “A survey of chip-level thermal simulators,”ACM Computing Surveys (CSUR), vol. 52, no. 2, pp. 1–35, 2019
work page 2019
-
[8]
A stepwise integration separation of variables solver for full- chip thermal uncertainty analysis,
L. Yin, A. Wang, W. Zhu, A. Guo, J. Liu, M. Tang, L. Chen, and J. Zhang, “A stepwise integration separation of variables solver for full- chip thermal uncertainty analysis,”IEEE Transactions on Components, Packaging and Manufacturing Technology, 2024
work page 2024
-
[9]
Z. Hao, C. Su, S. Liu, J. Berner, C. Ying, H. Su, A. Anandkumar, J. Song, and J. Zhu, “Dpot: Auto-regressive denoising operator transformer for large-scale pde pre-training,”arXiv preprint arXiv:2403.03542, 2024
-
[10]
Poseidon: Efficient foundation models for pdes,
M. Herde, B. Raonic, T. Rohner, R. K ¨appeli, R. Molinaro, E. de B´ezenac, and S. Mishra, “Poseidon: Efficient foundation models for pdes,”Ad- vances in Neural Information Processing Systems, vol. 37, pp. 72 525– 72 624, 2024
work page 2024
-
[11]
2d-thermal: Physics- informed framework for thermal analysis of circuits using generative ai,
S. Chandra, S. S. Chowdhury, and K. Roy, “2d-thermal: Physics- informed framework for thermal analysis of circuits using generative ai,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2025
work page 2025
-
[12]
D. Coenen and H. Oprins, “Pindas: Physics-informed decoupled spa- tiotemporal artificial neural network for dynamic thermal simulation,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2025
work page 2025
-
[13]
Y . Hua, Z.-Q. Wang, X.-Y . Yuan, Y . B. Li, W.-T. Wu, and N. Aubry, “Estimation of steady-state temperature field in multichip modules using deep convolutional neural network,”Thermal Science and Engineering Progress, vol. 40, p. 101755, 2023
work page 2023
-
[14]
Y . Zhang, Z. Gong, W. Zhou, X. Zhao, X. Zheng, and W. Yao, “Multi-fidelity surrogate modeling for temperature field prediction using deep convolution neural network,”Engineering Applications of Artificial Intelligence, vol. 123, p. 106354, 2023
work page 2023
-
[15]
Transfer learning of convolutional neural network model for thermal estimation of multichip modules,
Z.-Q. Wang, Y . Hua, H.-R. Xie, Z.-F. Zhou, Y .-B. Li, and W.-T. Wu, “Transfer learning of convolutional neural network model for thermal estimation of multichip modules,”Case Studies in Thermal Engineering, vol. 59, p. 104576, 2024
work page 2024
-
[16]
Fast full-chip parametric thermal analysis based on enhanced physics enforced neural networks,
L. Chen, J. Lu, W. Jin, and S. X.-D. Tan, “Fast full-chip parametric thermal analysis based on enhanced physics enforced neural networks,” in2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE, 2023, pp. 1–8
work page 2023
-
[17]
Physics-informed learning for fast transient tsv electromigration analysis,
X. Yang, W. Zhu, Y . Zhang, Y . Xue, W. Sheng, P. Ren, R. Wang, Z. Ji, and H.-B. Chen, “Physics-informed learning for fast transient tsv electromigration analysis,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2026
work page 2026
-
[18]
Pi-onet: A physics-informed operator network for efficient thermal analysis of multilayer chiplets,
Y . Sha, C. Zhang, and Q. Chen, “Pi-onet: A physics-informed operator network for efficient thermal analysis of multilayer chiplets,”IEEE Transactions on Components, Packaging and Manufacturing Technol- ogy, 2025
work page 2025
-
[19]
M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” Journal of Computational physics, vol. 378, pp. 686–707, 2019
work page 2019
-
[20]
Physics- informed neural networks for heat transfer problems,
S. Cai, Z. Wang, S. Wang, P. Perdikaris, and G. E. Karniadakis, “Physics- informed neural networks for heat transfer problems,”Journal of Heat Transfer, vol. 143, no. 6, p. 060801, 2021
work page 2021
-
[21]
Asrr-pinn: Adaptive sub-regional random resampling-based pinn for thermal analysis of 3d-ics,
Z. Zhou, M. Tang, and L. Chen, “Asrr-pinn: Adaptive sub-regional random resampling-based pinn for thermal analysis of 3d-ics,” in2025 62nd ACM/IEEE Design Automation Conference (DAC). IEEE, 2025, pp. 1–7
work page 2025
-
[22]
Fourier Neural Operator for Parametric Partial Differential Equations
Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, “Fourier neural operator for parametric partial differential equations,”arXiv preprint arXiv:2010.08895, 2020
work page internal anchor Pith review Pith/arXiv arXiv 2010
-
[23]
L. Lu, P. Jin, and G. E. Karniadakis, “Deeponet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators,”arXiv preprint arXiv:1910.03193, 2019
work page internal anchor Pith review Pith/arXiv arXiv 1910
-
[24]
W. Jin, S. Sadiqbatcha, J. Zhang, and S. X.-D. Tan, “Full-chip thermal map estimation for commercial multi-core cpus with generative adver- sarial learning,” inProceedings of the 39th International Conference on Computer-Aided Design, 2020, pp. 1–9
work page 2020
-
[25]
Real-time thermal map estimation for amd multi-core cpus using transformer,
J. Lu, J. Zhang, and S. X.-D. Tan, “Real-time thermal map estimation for amd multi-core cpus using transformer,” in2023 IEEE/ACM Inter- national Conference on Computer Aided Design (ICCAD). IEEE, 2023, pp. 1–7
work page 2023
-
[26]
Real-time thermal map estimation for amd multi-core cpus using transformer,
J. Lu, J. Zhang, and S. X. Tan, “Real-time thermal map estimation for amd multi-core cpus using transformer,” in2023 IEEE/ACM Interna- tional Conference on Computer Aided Design (ICCAD). IEEE, 2023, pp. 1–7
work page 2023
-
[27]
Fast machine learning based prediction for temperature simulation using compact models,
M. Hajikhodaverdian, S. Reda, and A. K. Coskun, “Fast machine learning based prediction for temperature simulation using compact models,” in2025 Design, Automation & Test in Europe Conference (DATE). IEEE, 2025, pp. 1–2
work page 2025
-
[28]
Fast steady-state thermal analysis with separation of variables and discrete cosine transform,
H. Ai, L. Chen, J. Zhang, B. Yu, and W. Zhu, “Fast steady-state thermal analysis with separation of variables and discrete cosine transform,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2025
work page 2025
-
[29]
Fasttherm: Fast and stable full-chip transient thermal predictor considering nonlinear effects,
T. Zhu, Q. Wang, Y . Lin, R. Wang, and R. Huang, “Fasttherm: Fast and stable full-chip transient thermal predictor considering nonlinear effects,” inProceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024, pp. 1–9
work page 2024
-
[30]
T. C. Choy,Effective Medium Theory: Principles and Applications. Oxford University Press, 12 2015. [Online]. Available: https: //doi.org/10.1093/acprof:oso/9780198705093.001.0001
work page doi:10.1093/acprof:oso/9780198705093.001.0001 2015
-
[31]
Equivalent inclusion method for steady state heat conduction in composites,
H. Hiroshi and T. Minoru, “Equivalent inclusion method for steady state heat conduction in composites,”International Journal of Engineering Science, vol. 24, no. 7, pp. 1159–1172, 1986
work page 1986
-
[32]
A novel effective medium theory for modelling the thermal conductivity of porous materials,
L. Gong, Y . Wang, X. Cheng, R. Zhang, and H. Zhang, “A novel effective medium theory for modelling the thermal conductivity of porous materials,”International Journal of Heat and Mass Transfer, vol. 68, pp. 295–298, 2014
work page 2014
-
[33]
M. Wang, Y . Cheng, W. Zeng, Z. Lu, V . F. Pavlidis, and W. Xing, “Aro: Autoregressive operator learning for transferable and multi-fidelity 3d- ic thermal analysis with active learning,” inProceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024, pp. 1–9
work page 2024
-
[34]
Hotspot: A compact thermal modeling methodology for early-stage vlsi design,
W. Huang, S. Ghosh, S. Velusamy, K. Sankaranarayanan, K. Skadron, and M. R. Stan, “Hotspot: A compact thermal modeling methodology for early-stage vlsi design,”IEEE Transactions on very large scale integration (VLSI) systems, vol. 14, no. 5, pp. 501–513, 2006
work page 2006
-
[35]
The alpha 21264 microprocessor,
R. E. Kessler, “The alpha 21264 microprocessor,”IEEE micro, vol. 19, no. 2, pp. 24–36, 1999
work page 1999
-
[36]
U-fno—an enhanced fourier neural operator-based deep-learning model for multiphase flow,
G. Wen, Z. Li, K. Azizzadenesheli, A. Anandkumar, and S. M. Benson, “U-fno—an enhanced fourier neural operator-based deep-learning model for multiphase flow,”Advances in Water Resources, vol. 163, p. 104180, 2022
work page 2022
-
[37]
T-fusion: Thermal modeling of 3d ics with multi-fidelity fusion,
B. Zhang, W. Xing, X. Zhao, and Y . Sun, “T-fusion: Thermal modeling of 3d ics with multi-fidelity fusion,” inProceedings of the 30th Asia and South Pacific Design Automation Conference, 2025, pp. 1406–1412
work page 2025
-
[38]
Pisov: Physics-informed separation of variables solvers for full-chip thermal analysis,
L. Chen, W. Zhu, M. Tang, S. X.-D. Tan, J.-F. Mao, and J. Zhang, “Pisov: Physics-informed separation of variables solvers for full-chip thermal analysis,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.