LithoDreamer: A Physics-Informed World Model for Multi-Stage Computational Lithography

Cheng Zhuo; Jinyuan Deng; Qian Jin; Qi Sun; Xunzhao Yin; Yucheng Cui; Yu Li; Yumeng Liu; Yuqi Jiang; Zimu Li

arxiv: 2606.26713 · v1 · pith:RLDY74KDnew · submitted 2026-06-25 · 💻 cs.AI

LithoDreamer: A Physics-Informed World Model for Multi-Stage Computational Lithography

Yuqi Jiang , Yumeng Liu , Zimu Li , Jinyuan Deng , Qian Jin , Yucheng Cui , Yu Li , Xunzhao Yin

show 2 more authors

Qi Sun Cheng Zhuo

This is my paper

Pith reviewed 2026-06-26 04:48 UTC · model grok-4.3

classification 💻 cs.AI

keywords computational lithographyworld modelphysics-informedmask optimizationresist simulationmulti-stage processvariational optimization

0 comments

The pith

LithoDreamer is the first physics-informed world model for the multi-stage lithography process from layout to after-development image.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LithoDreamer to address the limitation of existing models that do not capture the continuous physical process of lithography involving multiple stages. It formulates the pipeline as a decision-driven multi-step evolution system that models changes in feature spaces between states using physics-informed latent representations. A contrastive variational optimization paradigm enables the model to learn consistent evolutions by contrasting intervention paths without needing continuous supervision. This results in improved accuracy for both simulating the forward process and performing inverse planning for optimization. A sympathetic reader would care because accurate modeling could lead to better mask designs and higher manufacturing yields as technology nodes shrink.

Core claim

LithoDreamer formulates the Layout-Mask-Resist Image-After Development Image pipeline as a decision-driven multi-step evolution system. It captures feature changes between adjacent states to create stage-specific physics-informed latent spaces that control process interventions and drive state transitions. The contrastive variational optimization paradigm contrasts latent differences between intervention paths with variational evolution constraints to ensure consistency with real lithography physics without continuous supervision.

What carries the argument

The contrastive variational optimization paradigm that guides generation of physically consistent evolutions by contrasting latent differences between intervention paths under variational constraints.

Load-bearing premise

The contrastive variational optimization paradigm can guide the model to generate evolutions consistent with real lithography physics without continuous supervision.

What would settle it

Measuring the error between LithoDreamer predictions and ground-truth physical simulations or real experimental data for after-development images on a held-out set of process parameters and masks.

Figures

Figures reproduced from arXiv: 2606.26713 by Cheng Zhuo, Jinyuan Deng, Qian Jin, Qi Sun, Xunzhao Yin, Yucheng Cui, Yu Li, Yumeng Liu, Yuqi Jiang, Zimu Li.

**Figure 1.** Figure 1: Comparison of the different processes: (a) Typical commercial simulation workflow; (b) Actual physical lithography manufacturing process; (c) The evolution workflow of our LithoDreamer’s process intervention and lithography state. ing, and photoresist reactions, simulates the practical manufacturing process and plays a key role in addressing imaging complexities and manufacturing constraints (Yang & Ren,… view at source ↗

**Figure 2.** Figure 2: Overview of the LithoDreamer framework. 2.3. Applications of World Models WMs learn latent environment dynamics from historical observations, enabling multi-step state prediction and planning. This paradigm has been widely used in embodied intelligence and autonomous driving to support long-horizon reasoning and efficient decision-making. For example, DriveDreamer-2 (Zhao et al., 2025) combines LLMs with… view at source ↗

**Figure 3.** Figure 3: Three types of light sources in the dataset, each with configurable parameters, such as radius. parameters: source type ( [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Visualization of inverse planning on the ID dataset. Given the input layout and target ADI, LithoDreamer plans latent interventions and evolves the Mask, Resist Image, and ADI state to achieve the target pattern. terns into polygonal contours, sample gauge points along the target contour, and measure the local displacement from the target contour to the predicted contour along the target normal direction. … view at source ↗

**Figure 5.** Figure 5: Schematic illustration of gauge-based EPE measurement. Local measurement gauges are placed on the target resist image contour, and edge displacement is evaluated along the corresponding contour-normal direction. The magnified view highlights how the measured offset captures local contour placement deviation between the generated and target resist image patterns. B. Principles of Lithography Metric Calculat… view at source ↗

**Figure 6.** Figure 6: Representative LRC violation categories used for manufacturability assessment. Pinch captures locally narrowed printed features, Bridge captures unintended connections or insufficient spacing between neighboring structures, and EPE captures excessive contour displacement beyond the allowed placement tolerance. Red markers indicate the detected violation regions. 14 [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗

**Figure 7.** Figure 7: Qualitative comparison of forward evolution results on OOD samples at the 55 nm process node. 17 [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗

**Figure 8.** Figure 8: Forward evolution results on curved and irregular OOD layouts. GT LithoDreamer (Ours) GT LithoDreamer (Ours) GT LithoDreamer (Ours) GT LithoDreamer (Ours) Layout Mask Resist Image ADI [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗

**Figure 9.** Figure 9: Forward evolution results on isolated contact-like OOD layouts. 18 [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗

read the original abstract

As semiconductor technology nodes scale, computational lithography is essential for ensuring yield and performance. However, lithography is a continuous physical process involving mask optimization, optical imaging, resist exposure, and development, which existing models fail to capture. To overcome this limitation, we present LithoDreamer, the first physics-informed World Model (WM) framework for computational lithography, which formulates the ``Layout-Mask-Resist Image-After Development Image (ADI)'' pipeline as a decision-driven multi-step evolution system. LithoDreamer captures feature changes between adjacent states to model stage-specific physics-informed latent spaces, in which it controls process intervention exploration and drives subsequent state transitions. To achieve interpretable intervention optimization without continuous supervision, we propose a contrastive variational optimization paradigm that contrasts the latent differences between intervention paths with variational evolution constraints, guiding the model to generate evolutions consistent with real lithography physics. Experiments show LithoDreamer achieves state-of-the-art performance in forward evolution and inverse planning. Our lithography dataset is publicly available at GitHub (https://github.com/7jiangyq/lithodreamer.git).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LithoDreamer models the full lithography pipeline as a multi-step world model with contrastive variational optimization and ships a public dataset.

read the letter

The core contribution is treating the Layout-Mask-Resist-ADI sequence as a decision-driven evolution in latent space, where interventions are optimized via contrastive variational constraints rather than dense supervision. They also release the dataset on GitHub, which is the most immediately useful part for anyone in the area.

The framing is new for this domain. Most prior work handles individual stages separately; here the model learns stage-specific physics in latent space and uses the contrastive objective to steer transitions. The construction is presented coherently, with no internal contradictions in the latent intervention mechanism or the variational constraints. The public data link means the reported gains in forward prediction and inverse planning can be checked directly.

The main soft spot is the central claim that the contrastive objective produces physics-consistent rollouts without continuous supervision. The paper shows the method works on their metrics and baselines, but the strength of that result hinges on how well the ablations isolate the contrastive term and whether the chosen baselines are representative. If those comparisons are fair, the result holds; if not, the physics alignment is harder to judge.

This is for people working on computational lithography tools or on world models for multi-stage physical processes. A reader who needs either the dataset or a concrete example of latent-space intervention in a real engineering pipeline will get value. It is worth sending to peer review because the problem matters, the dataset is shared, and the approach is spelled out enough for referees to evaluate the claims on the data.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces LithoDreamer, the first physics-informed world model for computational lithography. It formulates the Layout-Mask-Resist Image-After Development Image (ADI) pipeline as a decision-driven multi-step evolution system, captures stage-specific physics-informed latent spaces via feature changes between adjacent states, and introduces a contrastive variational optimization paradigm to enable interpretable intervention optimization without continuous supervision. Experiments claim state-of-the-art performance on forward evolution and inverse planning tasks, with a publicly released dataset.

Significance. If the reported metrics hold under the stated conditions, the work provides a unified multi-stage framework that integrates optical imaging, resist exposure, and development physics within a single latent-space world model, addressing fragmentation in existing stage-specific approaches. The public dataset release is a clear strength that enables external verification and extension.

minor comments (3)

§3.2: the description of the contrastive variational objective would benefit from an explicit statement of how the variational evolution constraints are enforced in the loss (e.g., via KL term or reconstruction) to clarify the 'without continuous supervision' claim.
Table 2 and Table 3: axis labels and units for the reported metrics (e.g., CD error, process window) are not fully specified; adding these would improve reproducibility.
§4.1: the baseline methods are listed but the exact hyper-parameter settings used for each (especially for the non-world-model competitors) are not tabulated; a supplementary table would strengthen the SOTA comparison.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of LithoDreamer and the recommendation for minor revision. The report does not enumerate any specific major comments requiring point-by-point rebuttal.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper formulates lithography as a multi-step evolution system and introduces a contrastive variational optimization paradigm to enforce physics consistency without continuous supervision. No equations, fitted parameters renamed as predictions, or self-citation chains are shown that reduce the central claims (forward evolution, inverse planning, or latent intervention) to inputs by construction. The public dataset link permits external falsification of reported metrics. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only review limits ledger to explicitly stated elements; full parameter count and derivation details unavailable.

axioms (1)

domain assumption The lithography process can be formulated as a decision-driven multi-step evolution system with stage-specific physics-informed latent spaces.
Directly stated as the core formulation in the abstract.

invented entities (1)

contrastive variational optimization paradigm no independent evidence
purpose: Achieve interpretable intervention optimization without continuous supervision by contrasting latent differences between intervention paths under variational evolution constraints.
Introduced in the abstract as the key training method.

pith-pipeline@v0.9.1-grok · 5756 in / 1136 out tokens · 34026 ms · 2026-06-26T04:48:59.584655+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 4 linked inside Pith

[1]

Advances in neural information processing systems , volume=

Variational autoencoder for deep learning of images, labels and captions , author=. Advances in neural information processing systems , volume=
[2]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

Image-to-image translation with conditional adversarial networks , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=
[3]

Bert: Pre-training of deep bidirectional transformers for language understanding , author=. Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers) , pages=

2019
[4]

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , volume=

From IC layout to die photograph: a CNN-based data-driven approach , author=. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , volume=. 2020 , publisher=

2020
[5]

arXiv preprint arXiv:2010.11929 , year=

An image is worth 16x16 words: Transformers for image recognition at scale , author=. arXiv preprint arXiv:2010.11929 , year=

Pith/arXiv arXiv 2010
[6]

Science China Information Sciences , volume=

Litho-AsymVnet: super-resolution lithography modeling with an asymmetric V-net architecture , author=. Science China Information Sciences , volume=. 2023 , publisher=

2023
[7]

Proceedings of the IEEE/CVF international conference on computer vision , pages=

Scalable diffusion models with transformers , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=
[8]

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , volume=

L2O-ILT: Learning to optimize inverse lithography techniques , author=. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , volume=. 2023 , publisher=

2023
[9]

arXiv preprint arXiv:2411.04983 , year=

Dino-wm: World models on pre-trained visual features enable zero-shot planning , author=. arXiv preprint arXiv:2411.04983 , year=

Pith/arXiv arXiv
[10]

Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design , pages=

Fabgpt: An efficient large multimodal model for complex wafer defect knowledge queries , author=. Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design , pages=
[11]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Intelligent opc engineer assistant for semiconductor manufacturing , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
[12]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Ominicontrol: Minimal and universal control for diffusion transformer , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
[13]

IEEE Journal of Biomedical and Health Informatics , year=

Fast-DDPM: Fast denoising diffusion probabilistic models for medical image-to-image generation , author=. IEEE Journal of Biomedical and Health Informatics , year=
[14]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Feature purification matters: Suppressing outlier propagation for training-free open-vocabulary semantic segmentation , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
[15]

arXiv preprint arXiv:2502.09992 , year=

Large language diffusion models , author=. arXiv preprint arXiv:2502.09992 , year=

Pith/arXiv arXiv
[16]

arXiv preprint arXiv:2505.15809 , year=

Mmada: Multimodal large diffusion language models , author=. arXiv preprint arXiv:2505.15809 , year=

Pith/arXiv arXiv
[17]

Proceedings of the 44rd IEEE/ACM International Conference on Computer-Aided Design , pages=

Unitho: A Unified Multi-Task Framework for Computational Lithography , author=. Proceedings of the 44rd IEEE/ACM International Conference on Computer-Aided Design , pages=
[18]

2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD) , pages=

LMLitho: A Large Vision Model-Driven Lithography Simulation Framework , author=. 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD) , pages=. 2025 , organization=

2025
[19]

Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

Navigation world models , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=
[20]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Drivedreamer-2: Llm-enhanced world models for diverse driving video generation , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
[21]

Moore and More , volume=

Recent advances in computational lithography technology , author=. Moore and More , volume=. 2025 , publisher=

2025
[22]

arXiv preprint arXiv:2510.21219 , year=

World Models Should Prioritize the Unification of Physical and Social Dynamics , author=. arXiv preprint arXiv:2510.21219 , year=

arXiv
[23]

Advances in Neural Information Processing Systems , volume=

Diffusion for World Modeling: Visual Details Matter in Atari , author=. Advances in Neural Information Processing Systems , volume=
[24]

Advances in Neural Information Processing Systems , volume=

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models , author=. Advances in Neural Information Processing Systems , volume=
[25]

Huang, Yuhang and Zhang, Jiazhao and Zou, Shilong and Liu, Xinwang and Hu, Ruizhen and Xu, Kai , journal=
[26]

Proceedings of the 39th International Conference on Computer-Aided Design , pages=

DAMO: Deep agile mask optimization for full chip scale , author=. Proceedings of the 39th International Conference on Computer-Aided Design , pages=
[27]

Enabling scalable

Yang, Haoyu and Ren, Haoxing , booktitle=. Enabling scalable
[28]

Light: Science & Applications , volume=

Advancements and challenges in inverse lithography technology: a review of artificial intelligence-based approaches , author=. Light: Science & Applications , volume=
[29]

2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD) , pages=

Fabthink: A wafer analysis multimodal llm via chain-of-thought-driven retrieval augmentation , author=. 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD) , pages=. 2025 , organization=

2025
[30]

Advances in Neural Information Processing Systems , volume=

LithoSim: A Large, Holistic Lithography Simulation Benchmark for AI-Driven Semiconductor Manufacturing , author=. Advances in Neural Information Processing Systems , volume=
[31]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Circuit-Think: A Multimodal Reasoning Framework for Automated Circuit-to-Netlist Translation with Trajectory-Guided Reinforcement Learning , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

[1] [1]

Advances in neural information processing systems , volume=

Variational autoencoder for deep learning of images, labels and captions , author=. Advances in neural information processing systems , volume=

[2] [2]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

Image-to-image translation with conditional adversarial networks , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

[3] [3]

Bert: Pre-training of deep bidirectional transformers for language understanding , author=. Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers) , pages=

2019

[4] [4]

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , volume=

From IC layout to die photograph: a CNN-based data-driven approach , author=. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , volume=. 2020 , publisher=

2020

[5] [5]

arXiv preprint arXiv:2010.11929 , year=

An image is worth 16x16 words: Transformers for image recognition at scale , author=. arXiv preprint arXiv:2010.11929 , year=

Pith/arXiv arXiv 2010

[6] [6]

Science China Information Sciences , volume=

Litho-AsymVnet: super-resolution lithography modeling with an asymmetric V-net architecture , author=. Science China Information Sciences , volume=. 2023 , publisher=

2023

[7] [7]

Proceedings of the IEEE/CVF international conference on computer vision , pages=

Scalable diffusion models with transformers , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

[8] [8]

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , volume=

L2O-ILT: Learning to optimize inverse lithography techniques , author=. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , volume=. 2023 , publisher=

2023

[9] [9]

arXiv preprint arXiv:2411.04983 , year=

Dino-wm: World models on pre-trained visual features enable zero-shot planning , author=. arXiv preprint arXiv:2411.04983 , year=

Pith/arXiv arXiv

[10] [10]

Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design , pages=

Fabgpt: An efficient large multimodal model for complex wafer defect knowledge queries , author=. Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design , pages=

[11] [11]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Intelligent opc engineer assistant for semiconductor manufacturing , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

[12] [12]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Ominicontrol: Minimal and universal control for diffusion transformer , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

[13] [13]

IEEE Journal of Biomedical and Health Informatics , year=

Fast-DDPM: Fast denoising diffusion probabilistic models for medical image-to-image generation , author=. IEEE Journal of Biomedical and Health Informatics , year=

[14] [14]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Feature purification matters: Suppressing outlier propagation for training-free open-vocabulary semantic segmentation , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

[15] [15]

arXiv preprint arXiv:2502.09992 , year=

Large language diffusion models , author=. arXiv preprint arXiv:2502.09992 , year=

Pith/arXiv arXiv

[16] [16]

arXiv preprint arXiv:2505.15809 , year=

Mmada: Multimodal large diffusion language models , author=. arXiv preprint arXiv:2505.15809 , year=

Pith/arXiv arXiv

[17] [17]

Proceedings of the 44rd IEEE/ACM International Conference on Computer-Aided Design , pages=

Unitho: A Unified Multi-Task Framework for Computational Lithography , author=. Proceedings of the 44rd IEEE/ACM International Conference on Computer-Aided Design , pages=

[18] [18]

2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD) , pages=

LMLitho: A Large Vision Model-Driven Lithography Simulation Framework , author=. 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD) , pages=. 2025 , organization=

2025

[19] [19]

Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

Navigation world models , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

[20] [20]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Drivedreamer-2: Llm-enhanced world models for diverse driving video generation , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

[21] [21]

Moore and More , volume=

Recent advances in computational lithography technology , author=. Moore and More , volume=. 2025 , publisher=

2025

[22] [22]

arXiv preprint arXiv:2510.21219 , year=

World Models Should Prioritize the Unification of Physical and Social Dynamics , author=. arXiv preprint arXiv:2510.21219 , year=

arXiv

[23] [23]

Advances in Neural Information Processing Systems , volume=

Diffusion for World Modeling: Visual Details Matter in Atari , author=. Advances in Neural Information Processing Systems , volume=

[24] [24]

Advances in Neural Information Processing Systems , volume=

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models , author=. Advances in Neural Information Processing Systems , volume=

[25] [25]

Huang, Yuhang and Zhang, Jiazhao and Zou, Shilong and Liu, Xinwang and Hu, Ruizhen and Xu, Kai , journal=

[26] [26]

Proceedings of the 39th International Conference on Computer-Aided Design , pages=

DAMO: Deep agile mask optimization for full chip scale , author=. Proceedings of the 39th International Conference on Computer-Aided Design , pages=

[27] [27]

Enabling scalable

Yang, Haoyu and Ren, Haoxing , booktitle=. Enabling scalable

[28] [28]

Light: Science & Applications , volume=

Advancements and challenges in inverse lithography technology: a review of artificial intelligence-based approaches , author=. Light: Science & Applications , volume=

[29] [29]

2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD) , pages=

Fabthink: A wafer analysis multimodal llm via chain-of-thought-driven retrieval augmentation , author=. 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD) , pages=. 2025 , organization=

2025

[30] [30]

Advances in Neural Information Processing Systems , volume=

LithoSim: A Large, Holistic Lithography Simulation Benchmark for AI-Driven Semiconductor Manufacturing , author=. Advances in Neural Information Processing Systems , volume=

[31] [31]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Circuit-Think: A Multimodal Reasoning Framework for Automated Circuit-to-Netlist Translation with Trajectory-Guided Reinforcement Learning , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=