pith. sign in

arxiv: 2501.04066 · v2 · submitted 2025-01-07 · 💻 cs.LG · cs.AR

Federated Knowledge Distillation for Multi-Model Architectures Lithography Hotspot Detection

Pith reviewed 2026-05-23 05:38 UTC · model grok-4.3

classification 💻 cs.LG cs.AR
keywords federated learningknowledge distillationlithography hotspot detectionprivacy preservationmulti-model architecturessemiconductor manufacturing
0
0 comments X

The pith

A hybrid federated knowledge distillation framework improves lithography hotspot detection while preserving data privacy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FedKD-hybrid to solve privacy issues in training models for lithography hotspot detection, a task in semiconductor manufacturing that requires strong data protection. Existing federated methods use either only parameter averaging or only knowledge distillation, but this approach combines both by having clients share parameters from agreed layers and logits using a public dataset for alignment. The hybrid aggregation helps refine each client's local model more effectively than single-paradigm methods. Experiments on benchmark and real factory data show consistent gains in accuracy and robustness. This matters because it enables collaborative improvement across different organizations without exposing sensitive manufacturing data.

Core claim

FedKD-hybrid utilizes a public dataset to facilitate consensus, where clients exchange both parameters of agreed-upon layers and logits. This hybrid information is aggregated to refine local models, enhancing knowledge transfer and outperforming state-of-the-art methods in effectiveness and robustness on ICCAD-2012 and real-world FAB datasets.

What carries the argument

The FedKD-hybrid framework that aggregates both selected model parameters and output logits exchanged over a public dataset to update local models.

If this is right

  • Local models trained on private data achieve higher hotspot detection performance through combined knowledge sources.
  • Privacy is maintained as private datasets remain local while still benefiting from collaboration.
  • The method supports multi-model architectures by allowing exchange only on agreed layers.
  • Performance gains hold on both public benchmarks and real manufacturing datasets.
  • Robustness to variations in data distributions increases compared to pure parameter or distillation approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar hybrid exchange could apply to other domains needing privacy like medical imaging analysis.
  • If the public dataset is well-chosen, it might reduce the number of communication rounds needed.
  • Testing the sensitivity to public dataset choice would clarify how critical that component is.
  • The approach might generalize to other computer vision tasks in industrial settings.

Load-bearing premise

There exists a public dataset that enables useful consensus across clients without introducing bias or privacy risks to the private training data.

What would settle it

Demonstrating that models trained with the hybrid method perform no better than those using only parameters or only distillation on the same datasets would challenge the central claim.

Figures

Figures reproduced from arXiv: 2501.04066 by Chuanguang Yang, Jianping Gou, Kai Zhang, Tingwen Huang, Xingyou Lin, Yanli Li, Yingli Tian, Yuqi Li, Zhongliang Guo.

Figure 1
Figure 1. Figure 1: The illustration of non-hotspot and hotspot pattern of lithography. [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The overview of FedKD-hybrid algorithm. transfer across multiple clients based on different scenar￾ios: parameter-based and non-parameter-based, respectively. In lithography hotspot detection (LHD) scenarios, clients may use different model architectures depending on their available computing resources. Additionally, due to bandwidth hetero￾geneity, clients are assumed to participate in the learning task a… view at source ↗
Figure 3
Figure 3. Figure 3: Test results on ICCAD and FAB set using synchronous updates. [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Test results on ICCAD and FAB set using 80% asynchronous updates. [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
read the original abstract

As a special type of multimedia data, Lithography Hotspot Detection (LHD) training often requires stronger privacy protection than conventional multimedia data, and federated learning provides a promising potential solution to this challenge. However, existing approaches rely solely on either parameter aggregation or Knowledge Distillation (KD), failing to fully exploit the potential of collaborative learning. To address this, we propose FedKD-hybrid, a novel framework that synergizes the strengths of both paradigms. Specifically, FedKD-hybrid utilizes a public dataset to facilitate consensus, where clients exchange both parameters of agreed-upon layers and logits. This hybrid information is aggregated to refine local models, enhancing knowledge transfer. Extensive experiments on ICCAD-2012 and real-world FAB datasets demonstrate that FedKD-hybrid consistently outperforms state-of-the-art methods in both effectiveness and robustness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes FedKD-hybrid, a federated learning framework for lithography hotspot detection (LHD) that combines parameter aggregation of agreed-upon layers with logit exchange via knowledge distillation, using a public dataset to enable client consensus and model refinement. It claims this hybrid approach enhances knowledge transfer and consistently outperforms state-of-the-art methods in effectiveness and robustness on the ICCAD-2012 benchmark and real-world FAB datasets.

Significance. If the experimental results hold under proper validation, the hybrid parameter-plus-logit aggregation in a federated setting with a public dataset could offer a practical advance for privacy-sensitive collaborative training in semiconductor manufacturing, where LHD data distributions are proprietary. The approach addresses a gap between pure parameter-based FL and pure KD methods, but its significance depends on demonstrating that the public dataset enables unbiased transfer without introducing distribution shift.

major comments (2)
  1. [Abstract and experimental evaluation section] The central experimental claim (abstract) that FedKD-hybrid 'consistently outperforms state-of-the-art methods' on ICCAD-2012 and real-world FAB datasets cannot be evaluated because the manuscript provides no description of the public dataset (source, size, label distribution, or statistical distance to private client data), no baselines, no statistical significance tests, and no ablation studies isolating the hybrid aggregation benefit. This directly undermines the assertion that the public dataset 'facilitates consensus' and 'enhances knowledge transfer' without bias or leakage.
  2. [Method description of FedKD-hybrid] The method relies on the assumption that a public dataset exists which matches private LHD distributions closely enough for hybrid aggregation to transfer useful knowledge (abstract and method description). No analysis of distribution shift, privacy leakage risk, or sensitivity to public-set choice is provided; if the public set is drawn from ICCAD-2012 itself, the reported gains may reflect benchmark artifacts rather than generalization to proprietary FAB data.
minor comments (2)
  1. [Method] Notation for 'agreed-upon layers' and the aggregation procedure for hybrid information should be formalized with equations or pseudocode for reproducibility.
  2. [Abstract and introduction] The abstract mentions 'multi-model architectures' in the title but provides no details on how the framework handles heterogeneous client models beyond layer agreement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We agree that additional details on the public dataset, baselines, statistical tests, ablations, distribution shift, and privacy analysis are needed to strengthen the claims. The revised manuscript will incorporate these elements.

read point-by-point responses
  1. Referee: [Abstract and experimental evaluation section] The central experimental claim (abstract) that FedKD-hybrid 'consistently outperforms state-of-the-art methods' on ICCAD-2012 and real-world FAB datasets cannot be evaluated because the manuscript provides no description of the public dataset (source, size, label distribution, or statistical distance to private client data), no baselines, no statistical significance tests, and no ablation studies isolating the hybrid aggregation benefit. This directly undermines the assertion that the public dataset 'facilitates consensus' and 'enhances knowledge transfer' without bias or leakage.

    Authors: We agree that the current version lacks sufficient experimental transparency. In the revision we will add: a full description of the public dataset (source, size, label distribution, and statistical distance metrics to private data); explicit listing of all baselines; statistical significance tests; and ablation studies isolating the hybrid aggregation components. These additions will allow proper evaluation of the claims regarding consensus and knowledge transfer. revision: yes

  2. Referee: [Method description of FedKD-hybrid] The method relies on the assumption that a public dataset exists which matches private LHD distributions closely enough for hybrid aggregation to transfer useful knowledge (abstract and method description). No analysis of distribution shift, privacy leakage risk, or sensitivity to public-set choice is provided; if the public set is drawn from ICCAD-2012 itself, the reported gains may reflect benchmark artifacts rather than generalization to proprietary FAB data.

    Authors: We agree that the manuscript would be strengthened by explicit analysis of these factors. The revision will include quantitative assessment of distribution shift, privacy leakage evaluation, and sensitivity experiments across different public-set choices. We will also clarify the relationship of the public dataset to the ICCAD-2012 benchmark to address concerns about potential artifacts. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on experimental comparisons

full rationale

The paper introduces FedKD-hybrid as a hybrid federated KD framework that aggregates layer parameters and logits over a public dataset. All performance claims are grounded in reported experiments on ICCAD-2012 and real-world FAB data versus prior methods. No equations, fitted parameters renamed as predictions, self-definitional constructions, or load-bearing self-citations appear in the provided text. The derivation chain consists of a proposed architecture plus empirical validation and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no free parameters, axioms, or invented entities are specified.

pith-pipeline@v0.9.0 · 5690 in / 971 out tokens · 45731 ms · 2026-05-23T05:38:22.849563+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 1 internal anchor

  1. [1]

    Reducing dfm to practice: the lithography manufacturability assessor,

    L. Liebmann, S. Mansfield, G. Han, J. Culp, J. Hibbeler, and R. Tsai, “Reducing dfm to practice: the lithography manufacturability assessor,” in Design and Process Integration for Microelectronic Manufacturing IV, vol. 6156. SPIE, 2006, pp. 178–189

  2. [2]

    Cramming more components onto integrated circuits,

    G. E. Moore, “Cramming more components onto integrated circuits,” Proceedings of the IEEE , vol. 86, no. 1, pp. 82–85, 1998

  3. [3]

    Lithography hotspot de- tection based on heterogeneous federated learning with local adaptation and feature selection,

    J. Pan, X. Lin, J. Xu, Y . Chen, and C. Zhuo, “Lithography hotspot de- tection based on heterogeneous federated learning with local adaptation and feature selection,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , 2023

  4. [4]

    C. Finn, P. Abbeel, and S. Levine, 2017

  5. [5]

    Hotspot prediction: Sem image generation with potential lithography hotspots,

    J. Kim, J. Lim, J. Lee, T.-Y . Kim, Y . Nam, K. Kim, and D.-N. Kim, “Hotspot prediction: Sem image generation with potential lithography hotspots,” IEEE Transactions on Semiconductor Manufacturing , 2023

  6. [6]

    Hotspot detection on post-opc layout using full chip simulation based verification tool: A case study with aerial image simulation,

    J. Kim and M. Fan, “Hotspot detection on post-opc layout using full chip simulation based verification tool: A case study with aerial image simulation,” Proc. SPIE, vol. 5256, 2003

  7. [7]

    Automated full-chip hotspot detection and removal flow for interconnect layers of cell-based designs,

    E. Roseboom, M. Rossman, F. C. Chang, and P. Hurat, “Automated full-chip hotspot detection and removal flow for interconnect layers of cell-based designs,” Proceedings of Spie the International Society for Optical Engineering, 2007

  8. [8]

    Accurate process-hotspot detection using critical design rule extraction,

    Y . T. Yu, Y . C. Chan, S. Sinha, H. R. Jiang, and C. Chiang, “Accurate process-hotspot detection using critical design rule extraction,” in ACM, 2012

  9. [9]

    A fuzzy-matching model with grid reduction for lithography hotspot detection,

    Chang, S., J., Chen, Lin, Wen, and W., “A fuzzy-matching model with grid reduction for lithography hotspot detection,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems: A publication of the IEEE Circuits and Systems Society , vol. 33, no. 11, pp. 1671–1680, 2014

  10. [10]

    Improved tangent space-based distance metric for lithographic hotspot classification,

    Fan, Yang, Subarna, Sinha, Charles, C., Chiang, Xuan, Zeng, and Dian, “Improved tangent space-based distance metric for lithographic hotspot classification,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 36, no. 9, pp. 1545–1556, 2017

  11. [11]

    Grasp based metaheuristics for layout pattern classification,

    M. Woo, S. Kim, and S. Kang, “Grasp based metaheuristics for layout pattern classification,” in 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) , 2017

  12. [12]

    Accurate lithography hotspot detection using deep convolutional neural networks,

    M. Shin and J. H. Lee, “Accurate lithography hotspot detection using deep convolutional neural networks,” Journal of Micro/nanolithography Mems & Moems , vol. 15, no. 4, p. 043507, 2016

  13. [13]

    Imbalance aware lithography hotspot detection: a deep learning approach,

    H. Yang, L. Luo, S. Jing, C. Lin, and Y . Bei, “Imbalance aware lithography hotspot detection: a deep learning approach,” Journal of Micro/nanolithography Mems & Moems , vol. 16, no. 3, p. 1, 2017

  14. [14]

    Lithography hotspot detection: From shallow to deep learning,

    H. Yang, Y . Lin, Y . Bei, and E. Young, “Lithography hotspot detection: From shallow to deep learning,” in 2017 30th IEEE International System-on-Chip Conference (SOCC) , 2017

  15. [15]

    Lithography hotspot detection via heterogeneous federated learning with local adaptation,

    X. Lin, J. Pan, J. Xu, Y . Chen, and C. Zhuo, “Lithography hotspot detection via heterogeneous federated learning with local adaptation,” 2021

  16. [16]

    Communication-efficient learning of deep networks from decentralized data,

    H. B. Mcmahan, E. Moore, D. Ramage, S. Hampson, and B. Arcas, “Communication-efficient learning of deep networks from decentralized data,” 2016

  17. [17]

    Federated multi-task learning,

    V . Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar, “Federated multi-task learning,”Advances in neural information processing systems, vol. 30, 2017

  18. [18]

    Federated Learning with Personalization Layers

    M. G. Arivazhagan, V . Aggarwal, A. K. Singh, and S. Choud- hary, “Federated learning with personalization layers,” arXiv preprint arXiv:1912.00818, 2019

  19. [19]

    Fedmd: Heterogenous federated learning via model distillation,

    D. Li and J. Wang, “Fedmd: Heterogenous federated learning via model distillation,” arXiv preprint arXiv:1910.03581 , 2019

  20. [20]

    Federated optimization in heterogeneous networks,

    T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V . Smith, “Federated optimization in heterogeneous networks,” Proceedings of Machine learning and systems , vol. 2, pp. 429–450, 2020

  21. [21]

    Knowledge distillation: A survey,

    J. Gou, B. Yu, S. J. Maybank, and D. Tao, “Knowledge distillation: A survey,” International Journal of Computer Vision , vol. 129, no. 6, pp. 1789–1819, 2021

  22. [22]

    Hierarchical self-supervised augmented knowledge distillation,

    C. Yang, Z. An, L. Cai, and Y . Xu, “Hierarchical self-supervised augmented knowledge distillation,” International Joint Conference on Artificial Intelligence, pp. 1217–1223, 2021

  23. [23]

    Cross- image relational knowledge distillation for semantic segmentation,

    C. Yang, H. Zhou, Z. An, X. Jiang, Y . Xu, and Q. Zhang, “Cross- image relational knowledge distillation for semantic segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12 319–12 328

  24. [24]

    Mixskd: Self-knowledge distillation from mixup for image recogni- tion,

    C. Yang, Z. An, H. Zhou, L. Cai, X. Zhi, J. Wu, Y . Xu, and Q. Zhang, “Mixskd: Self-knowledge distillation from mixup for image recogni- tion,” in European Conference on Computer Vision . Springer, 2022, pp. 534–551

  25. [25]

    Federated distillation: A survey,

    L. Li, J. Gou, B. Yu, L. Du, and Z. Y . D. Tao, “Federated distillation: A survey,” arXiv preprint arXiv:2404.08564 , 2024

  26. [26]

    Clip-kd: An empirical study of clip model distillation,

    C. Yang, Z. An, L. Huang, J. Bi, X. Yu, H. Yang, B. Diao, and Y . Xu, “Clip-kd: An empirical study of clip model distillation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 15 952–15 962

  27. [27]

    Iccad-2012 cad contest in fuzzy pattern matching for physical verification and benchmark suite,

    J. A. Torres, “Iccad-2012 cad contest in fuzzy pattern matching for physical verification and benchmark suite,” IEEE, 2012

  28. [28]

    Ensemble distillation for robust model fusion in federated learning,

    T. Lin, L. Kong, S. U. Stich, and M. Jaggi, “Ensemble distillation for robust model fusion in federated learning,” Advances in neural information processing systems , vol. 33, pp. 2351–2363, 2020

  29. [29]

    Knowledge dis- tillation for federated learning: a practical guide,

    A. Mora, I. Tenison, P. Bellavista, and I. Rish, “Knowledge dis- tillation for federated learning: a practical guide,” arXiv preprint arXiv:2211.04742, 2022

  30. [30]

    Data-free knowledge distillation for het- erogeneous federated learning,

    Z. Zhu, J. Hong, and J. Zhou, “Data-free knowledge distillation for het- erogeneous federated learning,” in International conference on machine learning. PMLR, 2021, pp. 12 878–12 889

  31. [31]

    Communication-efficient federated learning via knowledge distillation,

    C. Wu, F. Wu, L. Lyu, Y . Huang, and X. Xie, “Communication-efficient federated learning via knowledge distillation,” Nature communications, vol. 13, no. 1, p. 2032, 2022

  32. [32]

    When federated learning meets knowledge distillation,

    X. Pang, J. Hu, P. Sun, J. Ren, and Z. Wang, “When federated learning meets knowledge distillation,” IEEE Wireless Communications, vol. 31, no. 5, pp. 208–214, 2024

  33. [33]

    Survey of personalization techniques for federated learning,

    V . Kulkarni, M. Kulkarni, and A. Pant, “Survey of personalization techniques for federated learning,” in 2020 fourth world conference on smart trends in systems, security and sustainability (WorldS4) . IEEE, 2020, pp. 794–797

  34. [34]

    Lithography hotspots detection using deep learning,

    V . Borisov and J. Scheible, “Lithography hotspots detection using deep learning,” in2018 15th International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD), 2018, pp. 145–148

  35. [35]

    Lithography hotspot detection based on residual network,

    M. Lin, F. Zeng, Y . Shen, and Y . Wei, “Lithography hotspot detection based on residual network,” in DTCO and Computational Patterning II , vol. 12495. SPIE, 2023, pp. 354–361

  36. [36]

    Data augmentation in hotspot detection based on generative adversarial network,

    S. Wang, T. Gai, T. Qu, B. Ma, X. Su, L. Dong, L. Zhang, P. Xu, Y . Su, and Y . Wei, “Data augmentation in hotspot detection based on generative adversarial network,” Journal of Micro/Nanopatterning, Materials, and Metrology, vol. 20, no. 3, pp. 034 201–034 201, 2021

  37. [37]

    Lithography hotspot detection via heterogeneous federated learning with local adaptation,

    X. Lin, J. Pan, J. Xu, Y . Chen, and C. Zhuo, “Lithography hotspot detection via heterogeneous federated learning with local adaptation,” in 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC), 2022, pp. 166–171

  38. [38]

    Lithography hotspot de- tection based on heterogeneous federated learning with local adaptation and feature selection,

    J. Pan, X. Lin, J. Xu, Y . Chen, and C. Zhuo, “Lithography hotspot de- tection based on heterogeneous federated learning with local adaptation and feature selection,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , vol. 43, no. 5, pp. 1484–1496, 2024

  39. [39]

    Lithography hotspot detection method based on transfer learning using pre-trained deep convolutional neural network,

    L. Liao, S. Li, Y . Che, W. Shi, and X. Wang, “Lithography hotspot detection method based on transfer learning using pre-trained deep convolutional neural network,” Applied Sciences, vol. 12, no. 4, p. 2192, 2022

  40. [40]

    : Toward heterogeneous federated learning via global knowledge distillation,

    D. Yao, W. Pan, Y . Dai, Y . Wan, X. Ding, C. Yu, H. Jin, Z. Xu, and L. Sun, “: Toward heterogeneous federated learning via global knowledge distillation,”IEEE Transactions on Computers, vol. 73, no. 1, pp. 3–17, 2024

  41. [41]

    Adam: A method for stochastic optimization,

    D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” Computer Science, 2014

  42. [42]

    Federated learning with partial model personalization,

    K. Pillutla, K. Malik, A.-R. Mohamed, M. Rabbat, M. Sanjabi, and L. Xiao, “Federated learning with partial model personalization,” in International Conference on Machine Learning . PMLR, 2022, pp. 17 716–17 758

  43. [43]

    Online knowledge distillation via mutual contrastive learning for visual recogni- tion,

    C. Yang, Z. An, H. Zhou, F. Zhuang, Y . Xu, and Q. Zhang, “Online knowledge distillation via mutual contrastive learning for visual recogni- tion,” IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 45, no. 8, pp. 10 212–10 227, 2023