
arxiv: 2606.26713 · v1 · pith:RLDY74KDnew · submitted 2026-06-25 · 💻 cs.AI

LithoDreamer: A Physics-Informed World Model for Multi-Stage Computational Lithography

Pith reviewed 2026-06-26 04:48 UTC · model grok-4.3

classification 💻 cs.AI
keywords computational lithography · world model · physics-informed · mask optimization · resist simulation · multi-stage process · variational optimization

The pith

LithoDreamer is the first physics-informed world model for the multi-stage lithography process from layout to after-development image.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LithoDreamer to address the limitation of existing models that do not capture the continuous physical process of lithography involving multiple stages. It formulates the pipeline as a decision-driven multi-step evolution system that models changes in feature spaces between states using physics-informed latent representations. A contrastive variational optimization paradigm enables the model to learn consistent evolutions by contrasting intervention paths without needing continuous supervision. This results in improved accuracy for both simulating the forward process and performing inverse planning for optimization. A sympathetic reader would care because accurate modeling could lead to better mask designs and higher manufacturing yields as technology nodes shrink.

Core claim

LithoDreamer formulates the Layout-Mask-Resist Image-After Development Image pipeline as a decision-driven multi-step evolution system. It captures feature changes between adjacent states to create stage-specific physics-informed latent spaces that control process interventions and drive state transitions. The contrastive variational optimization paradigm contrasts latent differences between intervention paths with variational evolution constraints to ensure consistency with real lithography physics without continuous supervision.
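As a rough illustration of the "decision-driven multi-step evolution" framing, the sketch below rolls a state through Layout, Mask, Resist Image, and ADI, with each transition taking the current state plus one stage-specific latent intervention. Only the stage names come from the source. The transition (`toy_transition`, a blur followed by a latent-controlled threshold), the scalar latents, and the `rollout` helper are hypothetical placeholders, not the paper's learned networks.

```python
from typing import Dict, Sequence

import numpy as np

STAGES = ["Layout", "Mask", "Resist Image", "ADI"]  # stage names from the paper


def box_blur(img: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge padding (a stand-in for a learned operator)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def toy_transition(state: np.ndarray, z: float) -> np.ndarray:
    """Hypothetical transition: blur, then threshold at a level set by latent z."""
    return (box_blur(state) > z).astype(float)


def rollout(layout: np.ndarray, interventions: Sequence[float]) -> Dict[str, np.ndarray]:
    """Evolve the layout through every stage, one intervention per transition."""
    if len(interventions) != len(STAGES) - 1:
        raise ValueError("need exactly one intervention per stage transition")
    current = np.asarray(layout, dtype=float)
    states = {STAGES[0]: current}
    for name, z in zip(STAGES[1:], interventions):
        current = toy_transition(current, z)
        states[name] = current
    return states


layout = np.zeros((8, 8))
layout[2:6, 2:6] = 1.0  # a single square feature
states = rollout(layout, interventions=[0.3, 0.5, 0.5])
```

In this toy setting, inverse planning would amount to searching over `interventions` so that `states["ADI"]` matches a target pattern. The paper performs that search in learned latent spaces rather than over scalar thresholds.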

What carries the argument

The contrastive variational optimization paradigm that guides generation of physically consistent evolutions by contrasting latent differences between intervention paths under variational constraints.
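The page does not spell out the loss, so the following is only one plausible reading, offered as a minimal sketch. A KL term keeps each stage latent close to a unit Gaussian prior (standing in for the "variational constraint"), and an InfoNCE-style term pulls together the latent differences of intervention paths that are assumed to match while pushing apart the rest. The function names, the use of InfoNCE with cosine similarity, the temperature, and the weight `beta` are all assumptions, not details taken from the paper.

```python
import numpy as np


def kl_to_unit_gaussian(mu: np.ndarray, logvar: np.ndarray) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), averaged over the batch."""
    per_sample = 0.5 * np.sum(mu ** 2 + np.exp(logvar) - logvar - 1.0, axis=1)
    return float(np.mean(per_sample))


def info_nce(anchor: np.ndarray, candidates: np.ndarray,
             positive_idx: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE on cosine similarity; anchor[i] is paired with candidates[positive_idx[i]]."""
    a = anchor / (np.linalg.norm(anchor, axis=1, keepdims=True) + 1e-12)
    c = candidates / (np.linalg.norm(candidates, axis=1, keepdims=True) + 1e-12)
    logits = a @ c.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(log_prob[np.arange(len(a)), positive_idx]))


def contrastive_variational_loss(mu: np.ndarray, logvar: np.ndarray,
                                 delta_a: np.ndarray, delta_b: np.ndarray,
                                 beta: float = 1.0) -> float:
    """delta_a[i] and delta_b[i] are latent differences of two paths assumed to match."""
    positives = np.arange(len(delta_a))
    contrast = info_nce(delta_a, delta_b, positives)
    return contrast + beta * kl_to_unit_gaussian(mu, logvar)
```

Under this reading, "without continuous supervision" would mean that the contrastive term needs only pairings between intervention paths, not a ground-truth latent at every intermediate step. Whether the paper enforces its constraint this way is not stated on this page.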

Load-bearing premise

The contrastive variational optimization paradigm can guide the model to generate evolutions consistent with real lithography physics without continuous supervision.

What would settle it

Measuring the error between LithoDreamer predictions and ground-truth physical simulations or real experimental data for after-development images on a held-out set of process parameters and masks.
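One simple way to run such a check, assuming predicted and reference ADIs are available as same-shaped arrays with values in [0, 1], is to average a pixel error and an overlap score over the held-out pairs. The metric choice (MSE and IoU), the 0.5 threshold, and the helper names below are illustrative assumptions, not the paper's evaluation protocol.

```python
import numpy as np


def mse(pred, target) -> float:
    """Mean squared pixel error between two same-shaped images."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    return float(np.mean((pred - target) ** 2))


def iou(pred, target, threshold: float = 0.5) -> float:
    """Intersection-over-union of the two images after thresholding."""
    p = np.asarray(pred) >= threshold
    t = np.asarray(target) >= threshold
    union = np.logical_or(p, t).sum()
    if union == 0:
        return 1.0  # both empty: treat as perfect agreement
    return float(np.logical_and(p, t).sum() / union)


def evaluate_heldout(preds, targets) -> dict:
    """Average MSE and IoU over paired held-out samples."""
    pairs = list(zip(preds, targets))
    if not pairs:
        raise ValueError("no samples to evaluate")
    return {
        "mse": float(np.mean([mse(p, t) for p, t in pairs])),
        "iou": float(np.mean([iou(p, t) for p, t in pairs])),
    }


# Toy data: a 2x2 feature and a copy shifted one pixel to the right.
target = np.zeros((4, 4))
target[1:3, 1:3] = 1.0
shifted = np.zeros((4, 4))
shifted[1:3, 2:4] = 1.0
report = evaluate_heldout([target, shifted], [target, target])
```

A real test would keep the held-out process parameters and masks disjoint from anything seen in training, since that separation is what makes the comparison informative.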

Figures

Figures reproduced from arXiv: 2606.26713 by Cheng Zhuo, Jinyuan Deng, Qian Jin, Qi Sun, Xunzhao Yin, Yucheng Cui, Yu Li, Yumeng Liu, Yuqi Jiang, Zimu Li.

Figure 1. Comparison of the different processes: (a) Typical commercial simulation workflow; (b) Actual physical lithography manufacturing process; (c) The evolution workflow of our LithoDreamer's process intervention and lithography state.
Figure 2. Overview of the LithoDreamer framework.
Figure 3. Three types of light sources in the dataset, each with configurable parameters, such as radius.
Figure 4. Visualization of inverse planning on the ID dataset. Given the input layout and target ADI, LithoDreamer plans latent interventions and evolves the Mask, Resist Image, and ADI state to achieve the target pattern.
Figure 5. Schematic illustration of gauge-based EPE measurement. Local measurement gauges are placed on the target resist image contour, and edge displacement is evaluated along the corresponding contour-normal direction. The magnified view highlights how the measured offset captures local contour placement deviation between the generated and target resist image patterns.
Figure 6. Representative LRC violation categories used for manufacturability assessment. Pinch captures locally narrowed printed features, Bridge captures unintended connections or insufficient spacing between neighboring structures, and EPE captures excessive contour displacement beyond the allowed placement tolerance. Red markers indicate the detected violation regions.
Figure 7. Qualitative comparison of forward evolution results on OOD samples at the 55 nm process node.
Figure 8. Forward evolution results on curved and irregular OOD layouts. Panel labels: GT and LithoDreamer (Ours); Layout, Mask, Resist Image, ADI.
Figure 9. Forward evolution results on isolated contact-like OOD layouts.
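The gauge-based EPE measurement described in the Figure 5 caption can be sketched roughly as follows, assuming both contours are available as simple polygons. For brevity, this version places one gauge at the midpoint of each target edge instead of sampling points along the contour, and it requires the target polygon to be listed counter-clockwise so that the normal points outward. The function names and the sign convention (positive when the predicted contour lies outside the target) are assumptions made for illustration.

```python
import numpy as np


def cross2(a: np.ndarray, b: np.ndarray) -> float:
    """z-component of the 2D cross product."""
    return float(a[0] * b[1] - a[1] * b[0])


def offset_along_normal(point: np.ndarray, normal: np.ndarray, polygon: np.ndarray):
    """Smallest-magnitude t with point + t * normal on a polygon edge, else None."""
    best = None
    count = len(polygon)
    for i in range(count):
        a, b = polygon[i], polygon[(i + 1) % count]
        edge = b - a
        denom = cross2(normal, edge)
        if abs(denom) < 1e-12:
            continue  # gauge line is parallel to this edge
        t = cross2(a - point, edge) / denom    # signed distance along the normal
        s = cross2(a - point, normal) / denom  # position along the edge, 0..1
        if 0.0 <= s <= 1.0 and (best is None or abs(t) < abs(best)):
            best = t
    return best


def gauge_epe(target, predicted) -> list:
    """EPE at each target-edge midpoint, measured along the outward normal."""
    target = np.asarray(target, float)
    predicted = np.asarray(predicted, float)
    epes = []
    count = len(target)
    for i in range(count):
        a, b = target[i], target[(i + 1) % count]
        edge = b - a
        # Outward normal of a counter-clockwise polygon edge.
        normal = np.array([edge[1], -edge[0]]) / np.linalg.norm(edge)
        epes.append(offset_along_normal((a + b) / 2.0, normal, predicted))
    return epes


target_square = [(0, 0), (4, 0), (4, 4), (0, 4)]  # counter-clockwise
grown = [(-1, -1), (5, -1), (5, 5), (-1, 5)]      # printed one unit too large
shrunk = [(1, 1), (3, 1), (3, 3), (1, 3)]         # printed one unit too small
epe_grown = gauge_epe(target_square, grown)
epe_shrunk = gauge_epe(target_square, shrunk)
```

Offsets come out in the same units as the polygon coordinates. An EPE violation of the kind shown in Figure 6 would then correspond to an absolute offset above some placement tolerance, whose value is not given on this page.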
Original abstract

As semiconductor technology nodes scale, computational lithography is essential for ensuring yield and performance. However, lithography is a continuous physical process involving mask optimization, optical imaging, resist exposure, and development, which existing models fail to capture. To overcome this limitation, we present LithoDreamer, the first physics-informed World Model (WM) framework for computational lithography, which formulates the "Layout-Mask-Resist Image-After Development Image (ADI)" pipeline as a decision-driven multi-step evolution system. LithoDreamer captures feature changes between adjacent states to model stage-specific physics-informed latent spaces, in which it controls process intervention exploration and drives subsequent state transitions. To achieve interpretable intervention optimization without continuous supervision, we propose a contrastive variational optimization paradigm that contrasts the latent differences between intervention paths with variational evolution constraints, guiding the model to generate evolutions consistent with real lithography physics. Experiments show LithoDreamer achieves state-of-the-art performance in forward evolution and inverse planning. Our lithography dataset is publicly available at GitHub (https://github.com/7jiangyq/lithodreamer.git).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces LithoDreamer, the first physics-informed world model for computational lithography. It formulates the Layout-Mask-Resist Image-After Development Image (ADI) pipeline as a decision-driven multi-step evolution system, captures stage-specific physics-informed latent spaces via feature changes between adjacent states, and introduces a contrastive variational optimization paradigm to enable interpretable intervention optimization without continuous supervision. Experiments claim state-of-the-art performance on forward evolution and inverse planning tasks, with a publicly released dataset.

Significance. If the reported metrics hold under the stated conditions, the work provides a unified multi-stage framework that integrates optical imaging, resist exposure, and development physics within a single latent-space world model, addressing fragmentation in existing stage-specific approaches. The public dataset release is a clear strength that enables external verification and extension.

minor comments (3)
  1. §3.2: the description of the contrastive variational objective would benefit from an explicit statement of how the variational evolution constraints are enforced in the loss (e.g., via KL term or reconstruction) to clarify the 'without continuous supervision' claim.
  2. Table 2 and Table 3: axis labels and units for the reported metrics (e.g., CD error, process window) are not fully specified; adding these would improve reproducibility.
  3. §4.1: the baseline methods are listed but the exact hyper-parameter settings used for each (especially for the non-world-model competitors) are not tabulated; a supplementary table would strengthen the SOTA comparison.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of LithoDreamer and the recommendation for minor revision. The report does not enumerate any specific major comments requiring point-by-point rebuttal.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper formulates lithography as a multi-step evolution system and introduces a contrastive variational optimization paradigm to enforce physics consistency without continuous supervision. No equations, fitted parameters renamed as predictions, or self-citation chains are shown that reduce the central claims (forward evolution, inverse planning, or latent intervention) to inputs by construction. The public dataset link permits external falsification of reported metrics. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Abstract-only review limits ledger to explicitly stated elements; full parameter count and derivation details unavailable.

axioms (1)
  • [domain assumption] The lithography process can be formulated as a decision-driven multi-step evolution system with stage-specific physics-informed latent spaces.
    Directly stated as the core formulation in the abstract.
invented entities (1)
  • contrastive variational optimization paradigm (no independent evidence)
    purpose: Achieve interpretable intervention optimization without continuous supervision by contrasting latent differences between intervention paths under variational evolution constraints.
    Introduced in the abstract as the key training method.

pith-pipeline@v0.9.1-grok · 5756 in / 1136 out tokens · 34026 ms · 2026-06-26T04:48:59.584655+00:00 · methodology


Reference graph

Works this paper leans on

31 extracted references · 4 linked inside Pith

  1. Variational autoencoder for deep learning of images, labels and captions. Advances in Neural Information Processing Systems.
  2. Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  3. BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers).
  4. From IC layout to die photograph: a CNN-based data-driven approach. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020.
  5. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
  6. Litho-AsymVnet: super-resolution lithography modeling with an asymmetric V-net architecture. Science China Information Sciences, 2023.
  7. Scalable diffusion models with transformers. Proceedings of the IEEE/CVF International Conference on Computer Vision.
  8. L2O-ILT: Learning to optimize inverse lithography techniques. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023.
  9. DINO-WM: World models on pre-trained visual features enable zero-shot planning. arXiv preprint arXiv:2411.04983.
  10. FabGPT: An efficient large multimodal model for complex wafer defect knowledge queries. Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design.
  11. Intelligent OPC engineer assistant for semiconductor manufacturing. Proceedings of the AAAI Conference on Artificial Intelligence.
  12. OminiControl: Minimal and universal control for diffusion transformer. Proceedings of the IEEE/CVF International Conference on Computer Vision.
  13. Fast-DDPM: Fast denoising diffusion probabilistic models for medical image-to-image generation. IEEE Journal of Biomedical and Health Informatics.
  14. Feature purification matters: Suppressing outlier propagation for training-free open-vocabulary semantic segmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision.
  15. Large language diffusion models. arXiv preprint arXiv:2502.09992.
  16. MMaDA: Multimodal large diffusion language models. arXiv preprint arXiv:2505.15809.
  17. Unitho: A Unified Multi-Task Framework for Computational Lithography. Proceedings of the 44th IEEE/ACM International Conference on Computer-Aided Design.
  18. LMLitho: A Large Vision Model-Driven Lithography Simulation Framework. 2025 IEEE/ACM International Conference on Computer Aided Design (ICCAD).
  19. Navigation world models. Proceedings of the Computer Vision and Pattern Recognition Conference.
  20. DriveDreamer-2: LLM-enhanced world models for diverse driving video generation. Proceedings of the AAAI Conference on Artificial Intelligence.
  21. Recent advances in computational lithography technology. Moore and More, 2025.
  22. World Models Should Prioritize the Unification of Physical and Social Dynamics. arXiv preprint arXiv:2510.21219.
  23. Diffusion for World Modeling: Visual Details Matter in Atari. Advances in Neural Information Processing Systems.
  24. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models. Advances in Neural Information Processing Systems.
  25. Huang, Yuhang and Zhang, Jiazhao and Zou, Shilong and Liu, Xinwang and Hu, Ruizhen and Xu, Kai.
  26. DAMO: Deep agile mask optimization for full chip scale. Proceedings of the 39th International Conference on Computer-Aided Design.
  27. Enabling scalable … Yang, Haoyu and Ren, Haoxing.
  28. Advancements and challenges in inverse lithography technology: a review of artificial intelligence-based approaches. Light: Science & Applications.
  29. Fabthink: A wafer analysis multimodal LLM via chain-of-thought-driven retrieval augmentation. 2025 IEEE/ACM International Conference on Computer Aided Design (ICCAD).
  30. LithoSim: A Large, Holistic Lithography Simulation Benchmark for AI-Driven Semiconductor Manufacturing. Advances in Neural Information Processing Systems.
  31. Circuit-Think: A Multimodal Reasoning Framework for Automated Circuit-to-Netlist Translation with Trajectory-Guided Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence.