pith. sign in

arxiv: 2511.17340 · v3 · submitted 2025-11-21 · 💻 cs.CV

Refracting Reality: Generating Images with Realistic Transparent Objects

Pith reviewed 2026-05-17 20:24 UTC · model grok-4.3

classification 💻 cs.CV
keywords generative image modelstransparent objectsrefractionSnell's Lawimage synthesisoptical effectspixel warping
0
0 comments X

The pith

Generative models can synthesize transparent objects with accurate refraction by warping pixels using Snell's Law at each generation step.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current generative image models often fail to produce realistic transparent objects because they do not account for how light refracts through them according to physical laws. This paper introduces a technique that corrects this by synchronizing the pixels inside the object with the outside scene through warping and merging based on Snell's Law during the entire image generation process. It also synchronizes with a generated panorama to handle surfaces visible only through refraction or reflection. A sympathetic reader would care because this makes it possible to generate convincing images of everyday transparent items like bottles or windows from text descriptions alone, without manual 3D modeling or ray tracing.

Core claim

The paper claims that synchronizing pixels within the object's boundary with those outside by warping and merging using Snell's Law of Refraction at each step of the generation trajectory produces much more optically-plausible images. For surfaces not directly observed but visible via refraction or reflection, their appearance is recovered by synchronizing with a second generated panorama image centered at the object using the same procedure. This respects the physical constraints of refraction without requiring explicit 3D geometry.

What carries the argument

The mechanism of pixel synchronization by warping and merging according to Snell's Law of Refraction, applied at every step of the generation trajectory and extended via a panorama image for unobserved surfaces.

If this is right

  • Generated images of transparent objects will show correct distortion and alignment of background elements as seen through the object.
  • Complex refractive effects become feasible in text-to-image synthesis without additional 3D reconstruction.
  • The method integrates into existing generative pipelines by modifying the generation trajectory.
  • Images respect optical constraints, leading to fewer physically implausible artifacts in transparent regions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar pixel-based enforcement of physical laws could be applied to other phenomena like reflections or shadows in generative models.
  • Adapting the approach to video generation could ensure consistent refraction across frames.
  • This suggests that embedding explicit physical rules into the sampling process may improve accuracy for specific optical effects beyond what data alone provides.

Load-bearing premise

That enforcing refraction through pixel warping with Snell's Law at each generation step and synchronizing to a panorama image suffices to produce accurate results without full 3D geometry or ray-tracing.

What would settle it

Create an image of a refractive sphere over a grid pattern and verify if the observed distortion of the grid matches the predictions from the law of refraction for the given index and geometry.

Figures

Figures reproduced from arXiv: 2511.17340 by Dylan Campbell, Enze Tao, Yue Yin.

Figure 1
Figure 1. Figure 1: Image generation with optically-accurate refractions and consistent reflections. (Top) Images generated by Flux [ [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Snellcaster flowchart. (Top) An initial image [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Qualitative comparison across six scenes. The first column is rendered in Blender with the estimated geometry and appearance [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Kitchen scene example of the synchronized object [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Ablation study of the proposed method. We compare the full model with variants that remove individual components: detail [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Image synthesis for another object type (polygonal fox): [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Prompts used for generating scenes. The full prompts [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: An example of extracting the object specific prompt [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Additional qualitative results on complex transparent objects. We show outputs from Snellcaster (ours), Flux inpainting [ [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Additional qualitative results on complex transparent objects (continued). This second part supplements Figure [PITH_FULL_IMAGE:figures/full_fig_p012_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Appearance variation under different refractive indices for a glass sphere. We render the same scene while sweeping the [PITH_FULL_IMAGE:figures/full_fig_p012_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Qualitative comparison of time travel repeat counts. The top half of the figure shows configurations where the same repeat count [PITH_FULL_IMAGE:figures/full_fig_p013_12.png] view at source ↗
read the original abstract

Generative image models can produce convincingly real images, with plausible shapes, textures, layouts and lighting. However, one domain in which they perform notably poorly is in the synthesis of transparent objects, which exhibit refraction, reflection, absorption and scattering. Refraction is a particular challenge, because refracted pixel rays often intersect with surfaces observed in other parts of the image, providing a constraint on the color. It is clear from inspection that generative models have not distilled the laws of optics sufficiently well to accurately render refractive objects. In this work, we consider the problem of generating images with accurate refraction, given a text prompt. We synchronize the pixels within the object's boundary with those outside by warping and merging the pixels using Snell's Law of Refraction, at each step of the generation trajectory. For those surfaces that are not directly observed in the image, but are visible via refraction or reflection, we recover their appearance by synchronizing the image with a second generated image -- a panorama centered at the object -- using the same warping and merging procedure. We demonstrate that our approach generates much more optically-plausible images that respect the physical constraints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a method to generate images with realistic transparent objects from text prompts. It modifies the generative trajectory by warping and merging pixels within the object's boundary using Snell's Law of Refraction at each step. For surfaces visible only via refraction or reflection, it generates and synchronizes with a second panorama image centered at the object using the same warping procedure, claiming this yields more optically plausible results that respect physical constraints.

Significance. If the central mechanism can be shown to produce accurate refraction without explicit 3D geometry or full ray tracing, the work would offer a meaningful step toward embedding optical physics into diffusion-based image synthesis. This could improve fidelity in domains such as product visualization and scene rendering where transparent materials are common. The auxiliary panorama idea addresses an important coverage issue for unobserved surfaces.

major comments (2)
  1. [Abstract] Abstract: The core claim that pixels are synchronized 'by warping and merging the pixels using Snell's Law of Refraction, at each step of the generation trajectory' is load-bearing. Snell's Law in vector form requires the surface normal n, incident direction, and refractive index to compute the refracted ray and source pixel location. The manuscript provides no procedure for recovering these quantities (depth, normals, or equivalent) from the 2D prompt or intermediate generation state, making the described synchronization step impossible to execute as stated.
  2. [Method] Method (or equivalent section describing the warping): Without an explicit mechanism for obtaining surface normals or depth (e.g., monocular depth estimation, prompt-derived shape prior, or per-step segmentation), the pixel-warping operation cannot be performed. This omission directly undermines the claim of producing refraction that respects physical constraints while avoiding explicit 3D geometry.
minor comments (1)
  1. [Abstract] Abstract: The phrase 'synchronize the pixels' is repeated without a precise definition of the merging operation (e.g., alpha blending weights, handling of multiple refractions, or occlusion resolution).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments correctly identify that the abstract and method description are concise and would benefit from greater explicitness regarding the recovery of geometric quantities needed for Snell's Law. We address each major comment below and will incorporate clarifications in the revision.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The core claim that pixels are synchronized 'by warping and merging the pixels using Snell's Law of Refraction, at each step of the generation trajectory' is load-bearing. Snell's Law in vector form requires the surface normal n, incident direction, and refractive index to compute the refracted ray and source pixel location. The manuscript provides no procedure for recovering these quantities (depth, normals, or equivalent) from the 2D prompt or intermediate generation state, making the described synchronization step impossible to execute as stated.

    Authors: We acknowledge that the abstract does not detail the recovery procedure. In the full Method section we apply a pre-trained monocular depth and normal estimator to the intermediate denoised image at each step, conditioned on the text prompt to focus on the object region. The incident direction is obtained from the pixel coordinate under a standard pinhole camera assumption, and the refractive index is taken from material keywords in the prompt (e.g., glass = 1.5). We will revise the abstract to briefly reference this estimation step and add pseudocode plus a pipeline diagram in the Method section. revision: yes

  2. Referee: [Method] Method (or equivalent section describing the warping): Without an explicit mechanism for obtaining surface normals or depth (e.g., monocular depth estimation, prompt-derived shape prior, or per-step segmentation), the pixel-warping operation cannot be performed. This omission directly undermines the claim of producing refraction that respects physical constraints while avoiding explicit 3D geometry.

    Authors: We agree that the current wording leaves the mechanism underspecified. Our approach integrates a monocular depth/normal network run on the current generation state at every denoising step; the resulting normals and depths are used directly to evaluate the vector form of Snell's Law for the warping and merging operation. No explicit 3D mesh or full ray-tracing is constructed. We will expand the Method section with equations, a step-by-step algorithm box, and an additional figure showing the per-step estimation and warping to make the procedure fully reproducible. revision: yes

Circularity Check

0 steps flagged

No circularity: method applies external physical law without self-referential reduction

full rationale

The paper's core procedure applies Snell's Law of Refraction (an established result from optics) to warp and merge pixels during the diffusion trajectory and to synchronize with a generated panorama. No equations or steps reduce the claimed output to fitted parameters, self-defined quantities, or a chain of self-citations whose validity depends on the present work. The derivation remains self-contained because it imports an independent physical constraint rather than deriving the refraction behavior from the generative model's own statistics or renaming an empirical pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard optics and generative modeling assumptions; no free parameters, ad-hoc axioms, or new invented entities are mentioned in the abstract.

axioms (2)
  • standard math Snell's Law accurately describes refraction at object boundaries
    Invoked directly as the warping rule in the method.
  • domain assumption Diffusion models can be steered by pixel-level warping operations without breaking the generative process
    Assumed when the warping is applied at each step of the generation trajectory.

pith-pipeline@v0.9.0 · 5491 in / 1329 out tokens · 29947 ms · 2026-05-17T20:24:13.463819+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 1 internal anchor

  1. [1]

    Elsevier, 2013

    Max Born and Emil Wolf.Principles of Optics: Electromag- netic Theory of Propagation, Interference and Diffraction of Light. Elsevier, 2013. 2, 3, 4

  2. [2]

    LookingGlass: Generative anamor- phoses via laplacian pyramid warping

    Pascal Chang, Sergio Sancho, Jingwei Tang, Markus Gross, and Vinicius Azevedo. LookingGlass: Generative anamor- phoses via laplacian pyramid warping. InCVPR, pages 24– 33, 2025. 3, 4, 5, 6, 1

  3. [3]

    Scribblelight: Single image indoor relighting with scribbles

    Jun Myeong Choi, Annie Wang, Pieter Peers, Anand Bhat- tad, and Roni Sengupta. Scribblelight: Single image indoor relighting with scribbles. InCVPR, pages 5720–5731, 2025. 2

  4. [4]

    Blender Foundation, Stichting Blender Foundation, Amsterdam, 2024

    Blender Online Community.Blender - a 3D modelling and rendering package. Blender Foundation, Stichting Blender Foundation, Amsterdam, 2024. 5

  5. [5]

    Latent swap joint diffusion for 2d long-form latent gen- eration

    Yusheng Dai, Chenxi Wang, Chang Li, Chen Wang, Kewei Li, Jun Du, Lei Sun, Jianqing Gao, Ruoyu Wang, and Jiefeng Ma. Latent swap joint diffusion for 2d long-form latent gen- eration. InICCV, pages 11006–11015, 2025. 2

  6. [6]

    Flashtex: Fast relightable mesh texturing with lightcontrolnet

    Kangle Deng, Timothy Omernick, Alexander Weiss, Deva Ramanan, Jun-Yan Zhu, Tinghui Zhou, and Maneesh Agrawala. Flashtex: Fast relightable mesh texturing with lightcontrolnet. InECCV, pages 90–107. Springer, 2024. 2

  7. [7]

    Reflecting reality: Enabling diffusion models to produce faithful mirror reflections

    Ankit Dhiman, Manan Shah, Rishubh Parihar, Yash Bhalgat, Lokesh R Boregowda, and R Venkatesh Babu. Reflecting reality: Enabling diffusion models to produce faithful mirror reflections. In2025 International Conference on 3D Vision (3DV), pages 824–834. IEEE, 2025. 2

  8. [8]

    Visual ana- grams: Generating multi-view optical illusions with diffu- sion models

    Daniel Geng, Inbum Park, and Andrew Owens. Visual ana- grams: Generating multi-view optical illusions with diffu- sion models. InCVPR, pages 24154–24163, 2024. 2

  9. [9]

    Pearson Education India, 2012

    Eugene Hecht.Optics. Pearson Education India, 2012. 5

  10. [10]

    Clipscore: A reference-free evaluation met- ric for image captioning

    Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. Clipscore: A reference-free evaluation met- ric for image captioning. InProceedings of the 2021 confer- ence on empirical methods in natural language processing, pages 7514–7528, 2021. 5

  11. [11]

    Shadow generation for composite image in real-world scenes

    Yan Hong, Li Niu, and Jianfu Zhang. Shadow generation for composite image in real-world scenes. InAAAI, pages 914–922, 2022. 2

  12. [12]

    Lora: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. In ICLR, page 3, 2022. 2

  13. [13]

    Mask-shadowgan: Learning to remove shadows from unpaired data

    Xiaowei Hu, Yitong Jiang, Chi-Wing Fu, and Pheng-Ann Heng. Mask-shadowgan: Learning to remove shadows from unpaired data. InICCV, pages 2472–2481, 2019. 2

  14. [14]

    Neural gaffer: Relighting any object via diffusion.NeurIPS, 37: 141129–141152, 2024

    Haian Jin, Yuan Li, Fujun Luan, Yuanbo Xiangli, Sai Bi, Kai Zhang, Zexiang Xu, Jin Sun, and Noah Snavely. Neural gaffer: Relighting any object via diffusion.NeurIPS, 37: 141129–141152, 2024. 2

  15. [15]

    Automatic scene inference for 3d object compositing.ACM TOG, 33(3):1–15, 2014

    Kevin Karsch, Kalyan Sunkavalli, Sunil Hadap, Nathan Carr, Hailin Jin, Rafael Fonte, Michael Sittig, and David Forsyth. Automatic scene inference for 3d object compositing.ACM TOG, 33(3):1–15, 2014. 2

  16. [16]

    Exposing photo manipulation from shading and shadows.ACM TOG, 33(5): 1–21, 2014

    Eric Kee, James F O’brien, and Hany Farid. Exposing photo manipulation from shading and shadows.ACM TOG, 33(5): 1–21, 2014. 2

  17. [17]

    Switchlight: Co-design of physics- driven architecture and pre-training framework for human portrait relighting

    Hoon Kim, Minje Jang, Wonjun Yoon, Jisoo Lee, Donghyun Na, and Sanghyun Woo. Switchlight: Co-design of physics- driven architecture and pre-training framework for human portrait relighting. InCVPR, pages 25096–25106, 2024. 2

  18. [18]

    SyncTweedies: A general generative framework based on synchronized diffusions

    Jaihoon Kim, Juil Koo, Kyeongmin Yeo, and Minhyuk Sung. SyncTweedies: A general generative framework based on synchronized diffusions. InNeurIPS, pages 95198–95237,

  19. [19]

    Lightit: Illumination modeling and control for diffusion models

    Peter Kocsis, Julien Philip, Kalyan Sunkavalli, Matthias Nießner, and Yannick Hold-Geoffroy. Lightit: Illumination modeling and control for diffusion models. InCVPR, pages 9359–9369, 2024. 2

  20. [20]

    Flux.1 kontext: Flow matching for in-context image generation and editing in latent space,

    Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dock- horn, Jack English, Zion English, Patrick Esser, Sumith Ku- lal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas M¨uller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. Flux.1 kontext: Flow matching for in-context i...

  21. [21]

    Syncsde: A probabilistic framework for diffusion synchronization

    Hyunjun Lee, Hyunsoo Lee, and Sookwan Han. Syncsde: A probabilistic framework for diffusion synchronization. In CVPR, pages 17508–17517, 2025. 2

  22. [22]

    Syncdiffusion: Coherent montage via synchronized joint diffusions.NeurIPS, 36:50648–50660, 2023

    Yuseung Lee, Kunho Kim, Hyunjin Kim, and Minhyuk Sung. Syncdiffusion: Coherent montage via synchronized joint diffusions.NeurIPS, 36:50648–50660, 2023. 2

  23. [23]

    Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maxim- ilian Nickel, and Matthew Le. Flow matching for generative modeling. InICLR, 2023. 3

  24. [24]

    Arshadowgan: Shadow generative adversarial network for augmented reality in sin- gle light scenes

    Daquan Liu, Chengjiang Long, Hongpan Zhang, Hanning Yu, Xinzhi Dong, and Chunxia Xiao. Arshadowgan: Shadow generative adversarial network for augmented reality in sin- gle light scenes. InCVPR, pages 8139–8148, 2020. 2

  25. [25]

    Shadow generation for composite image using diffusion model

    Qingyang Liu, Junqi You, Jianting Wang, Xinhao Tao, Bo Zhang, and Li Niu. Shadow generation for composite image using diffusion model. InCVPR, pages 8121–8130, 2024. 2

  26. [26]

    Learning physics-guided face re- lighting under directional light

    Thomas Nestmeyer, Jean-Franc ¸ois Lalonde, Iain Matthews, and Andreas Lehrmann. Learning physics-guided face re- lighting under directional light. InCVPR, pages 5124–5133,

  27. [27]

    Chatgpt.https://openai.com, 2025

    OpenAI. Chatgpt.https://openai.com, 2025. Ver- sion 5.1. 1

  28. [28]

    Diffusionlight: Light probes for free by painting a chrome ball

    Pakkapon Phongthawee, Worameth Chinchuthakun, Non- taphat Sinsunthithet, Varun Jampani, Amit Raj, Pramook Khungurn, and Supasorn Suwajanakorn. Diffusionlight: Light probes for free by painting a chrome ball. InCVPR, pages 98–108, 2024. 2

  29. [29]

    SDXL: Improving Latent Diffusion Mod- els for High-Resolution Image Synthesis

    Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas M ¨uller, Joe Penna, and Robin Rombach. SDXL: Improving Latent Diffusion Mod- els for High-Resolution Image Synthesis. InThe Twelfth In- ternational Conference on Learning Representations, 2023. 2

  30. [30]

    Relightful harmonization: Lighting-aware portrait background replacement

    Mengwei Ren, Wei Xiong, Jae Shin Yoon, Zhixin Shu, Jianming Zhang, HyunJoon Jung, Guido Gerig, and He 7 Zhang. Relightful harmonization: Lighting-aware portrait background replacement. InCVPR, pages 6452–6462, 2024. 2

  31. [31]

    An empirical bayes approach to statis- tics

    Herbert E Robbins. An empirical bayes approach to statis- tics. InBreakthroughs in Statistics: Foundations and basic theory, pages 388–394. Springer, 1992. 2, 3

  32. [32]

    High-resolution Image Synthesis with Latent Diffusion Models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution Image Synthesis with Latent Diffusion Models. InCVPR, pages 10684–10695, 2022. 2

  33. [33]

    A light stage on every desk

    Soumyadip Sengupta, Brian Curless, Ira Kemelmacher- Shlizerman, and Steven M Seitz. A light stage on every desk. InICCV, pages 2420–2429, 2021. 2

  34. [34]

    Ssn: Soft shadow network for image compositing

    Yichen Sheng, Jianming Zhang, and Bedrich Benes. Ssn: Soft shadow network for image compositing. InCVPR, pages 4380–4390, 2021. 2

  35. [35]

    Controllable shadow generation using pixel height maps

    Yichen Sheng, Yifan Liu, Jianming Zhang, Wei Yin, A Cen- giz Oztireli, He Zhang, Zhe Lin, Eli Shechtman, and Bedrich Benes. Controllable shadow generation using pixel height maps. InECCV, pages 240–256. Springer, 2022

  36. [36]

    Pixht-lab: Pixel height based light effect generation for im- age compositing

    Yichen Sheng, Jianming Zhang, Julien Philip, Yannick Hold- Geoffroy, Xin Sun, He Zhang, Lu Ling, and Bedrich Benes. Pixht-lab: Pixel height based light effect generation for im- age compositing. InCVPR, pages 16643–16653, 2023. 2

  37. [37]

    Barron, Yun-Ta Tsai, Zexiang Xu, Xueming Yu, Graham Fyffe, Christoph Rhemann, Jay Busch, Paul Debevec, and Ravi Ramamoorthi

    Tiancheng Sun, Jonathan T. Barron, Yun-Ta Tsai, Zexiang Xu, Xueming Yu, Graham Fyffe, Christoph Rhemann, Jay Busch, Paul Debevec, and Ravi Ramamoorthi. Single image portrait relighting.ACM TOG, 38(4), 2019. 2

  38. [38]

    Shadow generation with decomposed mask prediction and attentive shadow filling

    Xinhao Tao, Junyan Cao, Yan Hong, and Li Niu. Shadow generation with decomposed mask prediction and attentive shadow filling. InAAAI, pages 5198–5206, 2024. 2

  39. [39]

    Ref-NeRF: Structured View-Dependent Appearance for Neural Radi- ance Fields

    Dor Verbin, Peter Hedman, Ben Mildenhall, Todd Zickler, Jonathan T Barron, and Pratul P Srinivasan. Ref-NeRF: Structured View-Dependent Appearance for Neural Radi- ance Fields. InCVPR, pages 5481–5490. IEEE, 2022. 5

  40. [40]

    MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details

    Ruicheng Wang, Sicheng Xu, Yue Dong, Yu Deng, Jianfeng Xiang, Zelong Lv, Guangzhong Sun, Xin Tong, and Jiaolong Yang. Moge-2: Accurate monocular geometry with metric scale and sharp details.arXiv preprint arXiv:2507.02546,

  41. [41]

    Sunstage: Portrait reconstruction and relighting using the sun as a light stage

    Yifan Wang, Aleksander Holynski, Xiuming Zhang, and Xu- aner Zhang. Sunstage: Portrait reconstruction and relighting using the sun as a light stage. InCVPR, pages 20792–20802,

  42. [42]

    Zero-shot image restoration using denoising diffusion null-space model

    Yinhuai Wang, Jiwen Yu, and Jian Zhang. Zero-shot image restoration using denoising diffusion null-space model. In ICLR, 2023. 5, 1

  43. [43]

    Structured 3d latents for scalable and versatile 3d gen- eration

    Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. Structured 3d latents for scalable and versatile 3d gen- eration. InCVPR, pages 21469–21480, 2025. 1

  44. [44]

    Imagereward: learning and evaluating human preferences for text-to-image generation

    Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. Imagereward: learning and evaluating human preferences for text-to-image generation. InNeurIPS, pages 15903–15935, 2023. 5

  45. [45]

    Refref: A synthetic dataset and benchmark for recon- structing refractive and reflective objects.arXiv preprint arXiv:2505.05848, 2025

    Yue Yin, Enze Tao, Weijian Deng, and Dylan Campbell. Refref: A synthetic dataset and benchmark for recon- structing refractive and reflective objects.arXiv preprint arXiv:2505.05848, 2025. 1

  46. [46]

    Freedom: Training-free energy-guided condi- tional diffusion model

    Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, and Jian Zhang. Freedom: Training-free energy-guided condi- tional diffusion model. InICCV, pages 23174–23184, 2023. 5

  47. [47]

    Dilightnet: Fine-grained light- ing control for diffusion-based image generation

    Chong Zeng, Yue Dong, Pieter Peers, Youkang Kong, Hongzhi Wu, and Xin Tong. Dilightnet: Fine-grained light- ing control for diffusion-based image generation. InACM SIGGRAPH 2024 Conference Papers, pages 1–12, 2024. 2

  48. [48]

    Scaling in-the-wild training for diffusion-based illumination harmo- nization and editing by imposing consistent light transport

    Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Scaling in-the-wild training for diffusion-based illumination harmo- nization and editing by imposing consistent light transport. InICLR, 2025. 2

  49. [49]

    Shadowgan: Shadow synthesis for virtual objects with conditional adver- sarial networks.Computational Visual Media, 5(1):105–115,

    Shuyang Zhang, Runze Liang, and Miao Wang. Shadowgan: Shadow synthesis for virtual objects with conditional adver- sarial networks.Computational Visual Media, 5(1):105–115,

  50. [50]

    Shadow generation using diffusion model with geometry prior

    Haonan Zhao, Qingyang Liu, Xinhao Tao, Li Niu, and Guangtao Zhai. Shadow generation using diffusion model with geometry prior. InCVPR, pages 7603–7612, 2025. 2

  51. [51]

    Re- lightable neural human assets from multi-view gradient illu- minations

    Taotao Zhou, Kai He, Di Wu, Teng Xu, Qixuan Zhang, Kuix- iang Shao, Wenzheng Chen, Lan Xu, and Jingyi Yu. Re- lightable neural human assets from multi-view gradient illu- minations. InCVPR, pages 4315–4327, 2023. 2 8