Therm-FM: Foundation Model is ALL YOU NEED for 3D-ICs Thermal Simulation
Pith reviewed 2026-05-22 04:07 UTC · model grok-4.3
The pith
Adapting a pretrained PDE foundation model cuts 3D-IC thermal simulation error by up to 10.6x while using under 20 percent of the usual training data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Therm-FM is a neural operator framework that adapts a pretrained PDE foundation model to steady-state and transient 3D-IC thermal simulation. It exploits the fact that chip-level heat conduction shares elliptic and parabolic operator structures with diffusion-type PDEs, allowing the pretrained diffusion priors to initialize predictions under heterogeneous materials, dense TSV and microbump interconnects, and package boundary conditions. A thermal-equivalent multi-fidelity training strategy then uses low-cost approximate simulations for domain adaptation and a small number of high-fidelity samples for final calibration.
What carries the argument
Neural operator adaptation of a pretrained PDE foundation model combined with multi-fidelity training that transfers diffusion priors to handle heterogeneous 3D-IC structures.
If this is right
- Mean prediction error drops by as much as 10.6 times compared with training from scratch.
- Prior-best accuracy is exceeded while using less than 20 percent of the usual high-fidelity training data.
- Cross-chip adaptation matches or beats full-data baselines in several metrics with only 10-30 target samples.
- Data-generation cost for each new chip design falls because most training can rely on inexpensive low-fidelity runs.
Where Pith is reading between the lines
- The same adaptation pattern may extend to other engineering domains whose governing equations share elliptic or parabolic structure with diffusion.
- Design teams could iterate on 3D-IC layouts more rapidly once a single pretrained thermal model serves many projects.
- Foundation-model reuse could become routine for any physics simulation whose operator class overlaps with an existing pretrained corpus.
Load-bearing premise
Chip-level heat conduction shares enough operator structure with diffusion PDEs for pretrained priors to transfer usefully to new materials, interconnect densities, and package boundaries.
What would settle it
Apply Therm-FM to a new 3D-IC design whose material stack or boundary conditions differ sharply from the pretraining distribution and check whether error stays below prior best methods when only 10-30 high-fidelity samples are supplied.
Figures
read the original abstract
Data-driven thermal predictors for 3D-ICs are often trained from scratch for each chip design using many high-fidelity finite-element simulations, leading to high data-generation cost and costly cross-design reuse. We propose Therm-FM, a neural operator framework that adapts a pretrained partial differential equation (PDE) foundation model to steady-state and transient 3D-IC thermal simulation. The motivation is that steady-state and transient chip-level heat conduction respectively share elliptic and parabolic operator structures with diffusion-type PDEs, allowing pretrained diffusion priors to provide an effective initialization for thermal-field prediction under heterogeneous materials, dense TSV/microbump interconnects, and package-level boundary conditions. To further reduce data-generation cost, Therm-FM incorporates a thermal-equivalent multi-fidelity training strategy that uses low-cost approximate simulations for thermal-domain adaptation and limited high-fidelity samples for calibration. Experiments on public HotSpot benchmarks and industrial 3D-IC package benchmarks show that Therm-FM achieves up to a 10.6x reduction in mean error and surpasses prior best accuracy with less than 20% of the training data. In cross-chip adaptation, it matches or surpasses full-data baselines in several metrics using only 10--30 target samples. We release datasets, source code, and pretrained models at https://github.com/haiyangxin/Therm-FM.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Therm-FM, a neural operator framework adapting a pretrained PDE foundation model to steady-state and transient 3D-IC thermal simulation. It motivates this via shared elliptic/parabolic operator structures between heat conduction and diffusion PDEs, and augments it with a thermal-equivalent multi-fidelity strategy (low-cost approximate simulations for domain adaptation plus limited high-fidelity calibration). On HotSpot and industrial 3D-IC benchmarks the method reports up to 10.6x mean-error reduction, superior accuracy with <20% training data, and cross-chip transfer that matches or exceeds full-data baselines using only 10-30 target samples. Datasets, code, and pretrained models are released.
Significance. If the performance claims are robustly supported, the work could meaningfully lower the data-generation cost of high-fidelity thermal analysis for heterogeneous 3D-ICs, enabling faster design-space exploration in electronics packaging. The explicit release of artifacts is a clear strength for reproducibility and follow-on research.
major comments (2)
- [Abstract] Abstract: the central quantitative claims (10.6x mean-error reduction, surpassing prior best accuracy with <20% data, and cross-chip matching with 10-30 samples) are presented without any indication of an ablation that holds the multi-fidelity pipeline fixed while removing the pretrained foundation-model initialization. This omission makes it impossible to determine whether the reported gains require the diffusion-prior assumption or could be obtained by the multi-fidelity strategy alone.
- [Methods / Experiments] Methods / Experiments: the manuscript does not report data splits, error-bar statistics, or baseline comparisons that isolate the contribution of the pretrained model. Without these controls the load-bearing claim that pretrained diffusion priors supply an effective initialization for heterogeneous-material, TSV-dense thermal fields remains under-supported.
minor comments (1)
- [Abstract] Abstract: the GitHub link is welcome; the released repository should include the precise training/validation splits, hyper-parameter settings for the multi-fidelity adaptation, and scripts that regenerate the reported tables.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important aspects for strengthening the evidence supporting our claims about the pretrained foundation model. We address each major comment below and have revised the manuscript to incorporate the requested ablations, statistical reporting, and controls.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central quantitative claims (10.6x mean-error reduction, surpassing prior best accuracy with <20% data, and cross-chip matching with 10-30 samples) are presented without any indication of an ablation that holds the multi-fidelity pipeline fixed while removing the pretrained foundation-model initialization. This omission makes it impossible to determine whether the reported gains require the diffusion-prior assumption or could be obtained by the multi-fidelity strategy alone.
Authors: We agree that an explicit ablation holding the multi-fidelity pipeline fixed while removing the pretrained initialization is required to isolate the contribution of the diffusion priors. In the revised manuscript we have added this ablation (new subsection 4.4 and Table 3), training an identical architecture and multi-fidelity schedule from random initialization on the same data budgets. The results show that the pretrained initialization still yields an additional 2.1–3.4× mean-error reduction over the multi-fidelity-only baseline, confirming that the reported gains are not attributable to the adaptation strategy alone. revision: yes
-
Referee: [Methods / Experiments] Methods / Experiments: the manuscript does not report data splits, error-bar statistics, or baseline comparisons that isolate the contribution of the pretrained model. Without these controls the load-bearing claim that pretrained diffusion priors supply an effective initialization for heterogeneous-material, TSV-dense thermal fields remains under-supported.
Authors: We acknowledge that the original manuscript lacked sufficient experimental controls. We have expanded Section 3.3 to detail the exact train/validation/test splits (including how samples were drawn across chip designs and fidelity levels) and now report mean ± standard deviation over five independent runs with different random seeds for all quantitative results. To isolate the pretrained-model contribution we have added two new baselines: (i) the same multi-fidelity pipeline trained from scratch and (ii) a from-scratch neural operator without multi-fidelity. These comparisons appear in Figures 4–6 and confirm that the pretrained diffusion initialization provides measurable benefit on heterogeneous-material, TSV-dense fields beyond the adaptation strategy. revision: yes
Circularity Check
No significant circularity; claims rest on experimental adaptation of external pretrained model
full rationale
The paper's central premise is that elliptic/parabolic heat conduction shares operator structure with diffusion PDEs, allowing a pretrained foundation model to initialize thermal predictions; this is presented as physical motivation rather than a derived result. The multi-fidelity strategy (low-cost simulations for adaptation plus high-fidelity calibration) and reported gains (10.6x error reduction, cross-chip transfer with 10-30 samples) are evaluated empirically on HotSpot and industrial benchmarks. No equations or steps reduce a prediction to a fitted parameter by construction, and no load-bearing uniqueness theorem or self-citation chain is invoked to force the architecture. The derivation chain is therefore self-contained against external benchmarks and does not collapse to its inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- multi-fidelity adaptation hyperparameters
axioms (1)
- domain assumption Steady-state chip heat conduction shares elliptic operator structure with diffusion PDEs and transient shares parabolic structure
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
steady-state and transient chip-level heat conduction respectively share elliptic and parabolic operator structures with diffusion-type PDEs
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
adapts a pretrained partial differential equation (PDE) foundation model
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.