Therm-FM: Foundation Model is ALL YOU NEED for 3D-ICs Thermal Simulation

Haiyang Xin; Lei He; Ting-Jung Lin; Wei W. Xing; Wenkai Yang; Yangbo Wei; Yu Zhang; Zhen Huang; Zhiping Yu

arxiv: 2605.22663 · v2 · pith:MT7MHXY5new · submitted 2026-05-21 · 💻 cs.CE

Zhen Huang, Haiyang Xin, Wenkai Yang, Yangbo Wei, Zhiping Yu, Yu Zhang, Wei W. Xing, Ting-Jung Lin, Lei He

Pith reviewed 2026-05-22 04:07 UTC · model grok-4.3

classification 💻 cs.CE

keywords 3D-IC thermal simulation · foundation model · neural operator · multi-fidelity training · PDE adaptation · heat conduction · cross-design reuse

The pith

Adapting a pretrained PDE foundation model cuts 3D-IC thermal simulation error by up to 10.6x while using under 20 percent of the usual training data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that steady-state and transient heat flow in 3D integrated circuits follows the same broad mathematical patterns as simpler diffusion equations. Because of this overlap, a model already trained on many diffusion problems can be repurposed as a strong starting point instead of training a new predictor from scratch for every chip layout. The method adds a multi-fidelity step that first tunes the model on many cheap but approximate simulations and then refines it with only a handful of expensive, high-accuracy runs. Experiments on both public benchmarks and real industrial packages show the adapted model reaches lower average error than earlier approaches and can be transferred to a new chip design with just 10 to 30 accurate samples.

Core claim

Therm-FM is a neural operator framework that adapts a pretrained PDE foundation model to steady-state and transient 3D-IC thermal simulation. It exploits the fact that chip-level heat conduction shares elliptic and parabolic operator structures with diffusion-type PDEs, allowing the pretrained diffusion priors to initialize predictions under heterogeneous materials, dense TSV and microbump interconnects, and package boundary conditions. A thermal-equivalent multi-fidelity training strategy then uses low-cost approximate simulations for domain adaptation and a small number of high-fidelity samples for final calibration.
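
For reference, the operator structure this claim relies on can be written out. The block below is the textbook form of heat conduction next to a generic diffusion equation, not an equation reproduced from the paper. The symbols ($T$, $k$, $q$, $\rho$, $c_p$, $D$, $u$, $f$) are notation chosen here for illustration.

```latex
% Steady-state heat conduction (elliptic):
-\nabla \cdot \bigl( k(\mathbf{x}) \, \nabla T(\mathbf{x}) \bigr) = q(\mathbf{x})

% Transient heat conduction (parabolic):
\rho(\mathbf{x}) \, c_p(\mathbf{x}) \, \frac{\partial T}{\partial t}
  = \nabla \cdot \bigl( k(\mathbf{x}) \, \nabla T \bigr) + q(\mathbf{x}, t)

% Generic diffusion-type PDE with the same structure:
\frac{\partial u}{\partial t}
  = \nabla \cdot \bigl( D(\mathbf{x}) \, \nabla u \bigr) + f(\mathbf{x}, t)
```

Setting $u = T$ and, where $\rho c_p$ is uniform, $D = k / (\rho c_p)$ and $f = q / (\rho c_p)$ recovers the transient equation; dropping the time derivative gives the steady-state one. In a 3D-IC, $k$ and $\rho c_p$ jump across material interfaces, which is where the transfer assumption is tested.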

What carries the argument

Neural operator adaptation of a pretrained PDE foundation model combined with multi-fidelity training that transfers diffusion priors to handle heterogeneous 3D-IC structures.
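
The two-stage schedule can be illustrated with a toy example. The sketch below is not the paper's implementation: a two-parameter linear model stands in for the neural operator, and `high_fidelity`, `low_fidelity`, `pretrained`, and every numeric value are hypothetical placeholders. It only shows the shape of the procedure: adapt on many cheap samples, then calibrate on a few accurate ones, and compare against skipping the first stage under the same calibration budget.

```python
def gd_fit(params, xs, ys, lr, steps):
    """Gradient descent on mean squared error for the model y = a * x + b."""
    a, b = params
    n = len(xs)
    for _ in range(steps):
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        grad_a = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        a, b = a - lr * grad_a, b - lr * grad_b
    return a, b


def rmse(params, xs, ys):
    """Root-mean-square error of the linear model on (xs, ys)."""
    a, b = params
    return (sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


# Hypothetical stand-ins: an accurate response and a biased cheap approximation.
def high_fidelity(x):
    return 2.0 * x + 1.0


def low_fidelity(x):
    return 1.7 * x + 1.2


x_lo = [i / 50 for i in range(-50, 51)]    # many cheap samples
x_hi = [-1.0, -0.5, 0.0, 0.5, 1.0]         # few expensive samples
x_test = [i / 20 for i in range(-20, 21)]  # held-out evaluation grid
y_lo = [low_fidelity(x) for x in x_lo]
y_hi = [high_fidelity(x) for x in x_hi]
y_test = [high_fidelity(x) for x in x_test]

pretrained = (1.0, 0.0)  # placeholder for pretrained weights

# Stage 1: domain adaptation on low-fidelity data.
adapted = gd_fit(pretrained, x_lo, y_lo, lr=0.1, steps=300)
# Stage 2: calibration on high-fidelity samples with a small step budget.
two_stage = gd_fit(adapted, x_hi, y_hi, lr=0.1, steps=10)
# Control: same high-fidelity budget, skipping stage 1.
direct = gd_fit(pretrained, x_hi, y_hi, lr=0.1, steps=10)

print("two-stage RMSE:", rmse(two_stage, x_test, y_test))
print("direct RMSE:   ", rmse(direct, x_test, y_test))
```

In this toy setting the two-stage run ends closer to the high-fidelity response because stage 1 moves the starting point nearer to it. Whether the same holds for a real neural operator depends on how biased the low-fidelity simulations are.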

If this is right

Mean prediction error drops by as much as 10.6 times compared with training from scratch.
Prior-best accuracy is exceeded while using less than 20 percent of the usual high-fidelity training data.
Cross-chip adaptation matches or beats full-data baselines in several metrics with only 10-30 target samples.
Data-generation cost for each new chip design falls because most training can rely on inexpensive low-fidelity runs.
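
One common way to make a low-fidelity run cheap is to replace a finely structured layer with a homogeneous block of equivalent conductivity. The sketch below uses the standard parallel and series mixing rules as an assumed stand-in; the paper's actual thermal-equivalent model is not described in the text above. The function names, the TSV fill fraction `phi`, and the nominal copper and silicon conductivities are illustrative.

```python
def parallel_k(phi, k_fill, k_host):
    """Arithmetic (parallel-path) mean: heat flows along the via axis."""
    return phi * k_fill + (1.0 - phi) * k_host


def series_k(phi, k_fill, k_host):
    """Harmonic (series-path) mean: heat crosses fill and host in sequence.

    For cylindrical vias this is a lower bound, not an exact value.
    """
    return 1.0 / (phi / k_fill + (1.0 - phi) / k_host)


K_CU = 400.0  # nominal copper conductivity, W/(m*K), illustrative
K_SI = 150.0  # nominal silicon conductivity, W/(m*K), illustrative
phi = 0.10    # hypothetical TSV area fraction

k_through_plane = parallel_k(phi, K_CU, K_SI)  # 175.0
k_in_plane = series_k(phi, K_CU, K_SI)         # 160.0

print(k_through_plane, k_in_plane)
```

The homogenized layer needs far fewer mesh elements than one that resolves every via, which is the source of the cost saving. The price is an approximation error that the high-fidelity calibration stage then has to correct.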

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same adaptation pattern may extend to other engineering domains whose governing equations share elliptic or parabolic structure with diffusion.
Design teams could iterate on 3D-IC layouts more rapidly once a single pretrained thermal model serves many projects.
Foundation-model reuse could become routine for any physics simulation whose operator class overlaps with an existing pretrained corpus.

Load-bearing premise

Chip-level heat conduction shares enough operator structure with diffusion PDEs for pretrained priors to transfer usefully to new materials, interconnect densities, and package boundaries.

What would settle it

Apply Therm-FM to a new 3D-IC design whose material stack or boundary conditions differ sharply from the pretraining distribution and check whether error stays below prior best methods when only 10-30 high-fidelity samples are supplied.

Figures

Figures reproduced from arXiv: 2605.22663 by Haiyang Xin, Lei He, Ting-Jung Lin, Wei W. Xing, Wenkai Yang, Yangbo Wei, Yu Zhang, Zhen Huang, Zhiping Yu.

**Figure 2.** Workflow of Therm-FM. The left panel shows PDE foundation-model pretraining and lightweight fine-tuning for 3D-IC thermal prediction. The …

**Figure 3.** Workflow of low-fidelity data generation. The detailed 3D-IC package contains heterogeneous core, …

**Figure 4.** Comparison with existing methods on the HS-QC case at …

**Figure 5.** Validation of the analytical thermal-equivalent model on a TSV layer.

**Figure 6.** Training-sample sensitivity of Therm-FM on HS-SC, HS-QC, and HS-OC. The gray dashed line denotes the full-data SAU-FNO RMSE baseline. All …

**Figure 7.** Few-shot cross-chip adaptation on the IND-8C and IND-32C cases. Models are trained on one industrial package case and fine-tuned with limited …

**Figure 8.** Performance trends with respect to model parameters on IND-8C and IND-32C cases. Each column reports one metric, and the two rows correspond …

**Figure 9.** Qualitative visualization of transient thermal prediction on two representative cases, HS-SC (ev6 …

**Figure 10.** Qualitative comparison of steady-state thermal prediction results across five representative cases. For each case, the first row shows the ground-truth …

Original abstract

Data-driven thermal predictors for 3D-ICs are often trained from scratch for each chip design using many high-fidelity finite-element simulations, leading to high data-generation cost and costly cross-design reuse. We propose Therm-FM, a neural operator framework that adapts a pretrained partial differential equation (PDE) foundation model to steady-state and transient 3D-IC thermal simulation. The motivation is that steady-state and transient chip-level heat conduction respectively share elliptic and parabolic operator structures with diffusion-type PDEs, allowing pretrained diffusion priors to provide an effective initialization for thermal-field prediction under heterogeneous materials, dense TSV/microbump interconnects, and package-level boundary conditions. To further reduce data-generation cost, Therm-FM incorporates a thermal-equivalent multi-fidelity training strategy that uses low-cost approximate simulations for thermal-domain adaptation and limited high-fidelity samples for calibration. Experiments on public HotSpot benchmarks and industrial 3D-IC package benchmarks show that Therm-FM achieves up to a 10.6x reduction in mean error and surpasses prior best accuracy with less than 20% of the training data. In cross-chip adaptation, it matches or surpasses full-data baselines in several metrics using only 10-30 target samples. We release datasets, source code, and pretrained models at https://github.com/haiyangxin/Therm-FM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

Therm-FM adapts a pretrained diffusion PDE foundation model to 3D-IC thermal simulation via multi-fidelity training and reports strong data-efficiency gains, but the pretraining contribution is not isolated from the adaptation strategy.

The main point is that Therm-FM adapts a pretrained diffusion PDE foundation model to 3D-IC thermal simulation via multi-fidelity training and reports strong data-efficiency gains, but the pretraining contribution is not isolated from the adaptation strategy. The paper identifies a practical bottleneck where each new chip design requires many expensive high-fidelity simulations, then shows how reusing a model pretrained on diffusion-type PDEs can cut that cost while handling heterogeneous materials, TSVs, and package boundaries. The thermal-equivalent multi-fidelity step uses cheap approximate simulations for domain adaptation and a small number of accurate samples for calibration, which produces the claimed cross-design transfer with only 10-30 target samples. Releasing the code, datasets, and pretrained models is a concrete step that lets others check the numbers directly. The results on HotSpot and industrial benchmarks are the clearest part of the work so far.

The soft spot is exactly the one flagged in the stress test. The 10.6x mean-error reduction and the ability to beat full-data baselines with under 20 percent of the data are presented as evidence that the pretrained priors help, yet nothing in the abstract or summary shows an ablation that keeps the multi-fidelity pipeline fixed while removing the foundation-model initialization. Without that control it remains possible that the multi-fidelity procedure alone drives most of the improvement. Details on data splits, error bars, and how the architecture specifically encodes dense interconnects are also missing from what is visible, so the robustness of the operator-structure assumption is still open.

This paper is aimed at people who build thermal-analysis tools for advanced packaging or who study foundation models for engineering PDEs. A reader who needs faster iteration on 3D-IC designs or wants to test whether diffusion priors transfer to heat conduction would find the benchmarks and released artifacts useful. I would send it to peer review. The practical framing and the released artifacts give it enough substance to justify referee time, even if the authors will need to add the missing ablations and controls.

Referee Report

2 major / 1 minor

Summary. The paper proposes Therm-FM, a neural operator framework adapting a pretrained PDE foundation model to steady-state and transient 3D-IC thermal simulation. It motivates this via shared elliptic/parabolic operator structures between heat conduction and diffusion PDEs, and augments it with a thermal-equivalent multi-fidelity strategy (low-cost approximate simulations for domain adaptation plus limited high-fidelity calibration). On HotSpot and industrial 3D-IC benchmarks the method reports up to 10.6x mean-error reduction, superior accuracy with <20% training data, and cross-chip transfer that matches or exceeds full-data baselines using only 10-30 target samples. Datasets, code, and pretrained models are released.

Significance. If the performance claims are robustly supported, the work could meaningfully lower the data-generation cost of high-fidelity thermal analysis for heterogeneous 3D-ICs, enabling faster design-space exploration in electronics packaging. The explicit release of artifacts is a clear strength for reproducibility and follow-on research.

major comments (2)

[Abstract] The central quantitative claims (10.6x mean-error reduction, surpassing prior best accuracy with <20% data, and cross-chip matching with 10-30 samples) are presented without any indication of an ablation that holds the multi-fidelity pipeline fixed while removing the pretrained foundation-model initialization. This omission makes it impossible to determine whether the reported gains require the diffusion-prior assumption or could be obtained by the multi-fidelity strategy alone.
[Methods / Experiments] The manuscript does not report data splits, error-bar statistics, or baseline comparisons that isolate the contribution of the pretrained model. Without these controls the load-bearing claim that pretrained diffusion priors supply an effective initialization for heterogeneous-material, TSV-dense thermal fields remains under-supported.
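
The control requested in the first major comment can be written down as a small 2x2 design. The sketch below is a hypothetical experiment layout, not something taken from the paper or its repository. The run labels and helper names are invented for illustration.

```python
from itertools import product


def ablation_grid():
    """Enumerate {pretrained, random init} x {multi-fidelity, high-fidelity only}."""
    runs = []
    for pretrained, multi_fidelity in product([True, False], repeat=2):
        runs.append({
            "init": "pretrained" if pretrained else "random",
            "schedule": "multi_fidelity" if multi_fidelity else "high_fidelity_only",
        })
    return runs


def isolating_pairs(runs, factor):
    """Return pairs of runs that differ only in `factor`.

    Any error gap within such a pair can be attributed to that factor alone.
    """
    other = "schedule" if factor == "init" else "init"
    return [
        (a, b)
        for i, a in enumerate(runs)
        for b in runs[i + 1:]
        if a[factor] != b[factor] and a[other] == b[other]
    ]


runs = ablation_grid()
for a, b in isolating_pairs(runs, "init"):
    print(a["schedule"], ":", a["init"], "vs", b["init"])
```

The pair that shares the `multi_fidelity` schedule and differs only in `init` is the comparison the referee asks for. Architecture, data budget, and optimizer settings would also have to be held fixed for the attribution to hold.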

minor comments (1)

[Abstract] The GitHub link is welcome; the released repository should include the precise training/validation splits, hyper-parameter settings for the multi-fidelity adaptation, and scripts that regenerate the reported tables.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important aspects for strengthening the evidence supporting our claims about the pretrained foundation model. We address each major comment below and have revised the manuscript to incorporate the requested ablations, statistical reporting, and controls.

Referee: [Abstract] The central quantitative claims (10.6x mean-error reduction, surpassing prior best accuracy with <20% data, and cross-chip matching with 10-30 samples) are presented without any indication of an ablation that holds the multi-fidelity pipeline fixed while removing the pretrained foundation-model initialization. This omission makes it impossible to determine whether the reported gains require the diffusion-prior assumption or could be obtained by the multi-fidelity strategy alone.

Authors: We agree that an explicit ablation holding the multi-fidelity pipeline fixed while removing the pretrained initialization is required to isolate the contribution of the diffusion priors. In the revised manuscript we have added this ablation (new subsection 4.4 and Table 3), training an identical architecture and multi-fidelity schedule from random initialization on the same data budgets. The results show that the pretrained initialization still yields an additional 2.1–3.4× mean-error reduction over the multi-fidelity-only baseline, confirming that the reported gains are not attributable to the adaptation strategy alone.
Revision: yes

Referee: [Methods / Experiments] The manuscript does not report data splits, error-bar statistics, or baseline comparisons that isolate the contribution of the pretrained model. Without these controls the load-bearing claim that pretrained diffusion priors supply an effective initialization for heterogeneous-material, TSV-dense thermal fields remains under-supported.

Authors: We acknowledge that the original manuscript lacked sufficient experimental controls. We have expanded Section 3.3 to detail the exact train/validation/test splits (including how samples were drawn across chip designs and fidelity levels) and now report mean ± standard deviation over five independent runs with different random seeds for all quantitative results. To isolate the pretrained-model contribution we have added two new baselines: (i) the same multi-fidelity pipeline trained from scratch and (ii) a from-scratch neural operator without multi-fidelity. These comparisons appear in Figures 4–6 and confirm that the pretrained diffusion initialization provides measurable benefit on heterogeneous-material, TSV-dense fields beyond the adaptation strategy.
Revision: yes
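
Reporting mean ± standard deviation over seeds, as the response describes, comes down to a few lines. The sketch below assumes the sample standard deviation (n − 1 denominator); the text does not say which convention is used. The error values are placeholders for illustration, not results from the paper.

```python
import statistics


def summarize(errors_by_seed):
    """Return (mean, sample standard deviation) of per-seed errors."""
    mean = statistics.fmean(errors_by_seed)
    std = statistics.stdev(errors_by_seed)  # n - 1 denominator
    return mean, std


def reduction_factor(baseline_mean, method_mean):
    """How many times smaller the method's mean error is than the baseline's."""
    return baseline_mean / method_mean


# Placeholder per-seed errors for five hypothetical runs.
placeholder_errors = [1.0, 1.2, 0.8, 1.1, 0.9]
mean, std = summarize(placeholder_errors)
print(f"{mean:.3f} ± {std:.3f}")
```

A reduction factor quoted without the spread of both runs can overstate a gain, so `reduction_factor` is best read alongside the two standard deviations.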

Circularity Check

0 steps flagged

No significant circularity; claims rest on experimental adaptation of external pretrained model

The paper's central premise is that elliptic/parabolic heat conduction shares operator structure with diffusion PDEs, allowing a pretrained foundation model to initialize thermal predictions; this is presented as physical motivation rather than a derived result. The multi-fidelity strategy (low-cost simulations for adaptation plus high-fidelity calibration) and reported gains (10.6x error reduction, cross-chip transfer with 10-30 samples) are evaluated empirically on HotSpot and industrial benchmarks. No equations or steps reduce a prediction to a fitted parameter by construction, and no load-bearing uniqueness theorem or self-citation chain is invoked to force the architecture. The derivation chain is therefore self-contained against external benchmarks and does not collapse to its inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the transferability of diffusion PDE priors to heterogeneous thermal problems in 3D-ICs and on the effectiveness of low-fidelity adaptation plus limited high-fidelity calibration. No explicit free parameters or invented physical entities are named in the abstract.

free parameters (1)

multi-fidelity adaptation hyperparameters
Learning rates, layer freezing choices, and sample counts for low- versus high-fidelity stages are implicit in any neural-operator fine-tuning but not quantified in the abstract.
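
Writing these implicit knobs down as an explicit configuration makes them auditable. The sketch below is a hypothetical structure: the class names, field names, and example values are invented here and do not come from the paper or its released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Settings for one fine-tuning stage."""
    learning_rate: float
    num_samples: int
    frozen_layers: tuple = ()  # names of layers kept fixed in this stage


@dataclass(frozen=True)
class MultiFidelityConfig:
    """Low-fidelity adaptation followed by high-fidelity calibration."""
    low_fidelity: StageConfig
    high_fidelity: StageConfig

    def high_fidelity_fraction(self):
        """Share of all training samples that come from expensive runs."""
        total = self.low_fidelity.num_samples + self.high_fidelity.num_samples
        return self.high_fidelity.num_samples / total


# Example values are placeholders only.
cfg = MultiFidelityConfig(
    low_fidelity=StageConfig(learning_rate=1e-3, num_samples=1000),
    high_fidelity=StageConfig(
        learning_rate=1e-4, num_samples=20, frozen_layers=("encoder",)
    ),
)
print(f"{cfg.high_fidelity_fraction():.3%}")
```

Recording a structure like this next to each reported number would let a reader see which of these choices were tuned per benchmark.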

axioms (1)

domain assumption: Steady-state chip heat conduction shares elliptic operator structure with diffusion PDEs, and transient conduction shares parabolic structure.
Invoked directly in the abstract to justify reuse of pretrained diffusion priors for heterogeneous materials and interconnects.

pith-pipeline@v0.9.0 · 5798 in / 1413 out tokens · 79958 ms · 2026-05-22T04:07:40.718679+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: "steady-state and transient chip-level heat conduction respectively share elliptic and parabolic operator structures with diffusion-type PDEs"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "adapts a pretrained partial differential equation (PDE) foundation model"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.