pith. machine review for the scientific record. sign in

arxiv: 2604.15310 · v2 · submitted 2026-04-16 · 💻 cs.CV · cs.GR

Recognition: unknown

TokenLight: Precise Lighting Control in Images using Attribute Tokens

Authors on Pith no claims yet

Pith reviewed 2026-05-10 11:24 UTC · model grok-4.3

classification 💻 cs.CV cs.GR
keywords image relightingattribute tokenslighting controlconditional image generationsynthetic datalight transportvirtual light sources
0
0 comments X

The pith

Attribute tokens enable a generative model to precisely control multiple lighting attributes in a single image.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that encoding separate lighting factors as attribute tokens lets a conditional image generator perform detailed relighting edits. Training happens mostly on synthetic scenes that supply exact ground-truth lighting labels, with a smaller set of real photographs added for realism. The resulting model handles tasks such as moving in-scene lights or introducing virtual sources while producing effects that respect geometry, occlusion, and materials. A sympathetic reader cares because the method turns professional-grade illumination adjustments into a direct, continuous operation on ordinary photos.

Core claim

TokenLight formulates relighting as a conditional image generation task and introduces attribute tokens to encode distinct lighting factors such as intensity, color, ambient illumination, diffuse level, and 3D light positions. The model is trained on a large-scale synthetic dataset with ground-truth lighting annotations, supplemented by a small set of real captures to enhance realism and generalization. It achieves state-of-the-art quantitative and qualitative performance on relighting tasks including controlling in-scene lighting fixtures and editing environment illumination using virtual light sources, on synthetic and real images. Without explicit inverse rendering supervision, the model

What carries the argument

Attribute tokens that encode distinct lighting factors such as intensity, color, ambient illumination, diffuse level, and 3D light positions inside a conditional image generation network.

If this is right

  • Continuous and precise control over multiple independent lighting attributes becomes available within one model.
  • Convincing results appear in traditionally difficult cases such as lights placed inside objects or edits on transparent materials.
  • State-of-the-art scores are reached on both quantitative metrics and visual quality for synthetic and real inputs.
  • Generalization occurs from mostly synthetic training data to arbitrary real photographs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The token approach could be extended to control other scene properties such as material appearance or viewpoint in the same framework.
  • Implicit capture of light transport from data may reduce reliance on explicit physics simulators for training relighting systems.
  • Adding temporal consistency tokens might allow the same control to be applied across video frames without flickering.

Load-bearing premise

Training primarily on large-scale synthetic data with ground-truth lighting annotations, supplemented by a small set of real captures, produces a model that generalizes to arbitrary real-world images and complex lighting edits without introducing artifacts or incorrect light transport.

What would settle it

A set of real photographs with known but edited light positions where the output images either match or fail to match measured shadow directions, reflection patterns, and occlusion boundaries.

Figures

Figures reproduced from arXiv: 2604.15310 by He Zhang, Jingyuan Liu, Julie Dorsey, Mengwei Ren, Sumit Chaturvedi, Yannick Hold-Geoffroy, Yiqun Mei, Zhixin Shu.

Figure 1
Figure 1. Figure 1: TokenLight enables intuitive, flexible lighting edits for any image. Using a tokenized lighting-attribute design for diffusion transformers, it provides precise control over a light’s intensity, color, diffuseness, 3D location, and more. This enables effects like placing a light inside a pumpkin, adding back-lighting to a lion, simulating Rembrandt-style setups, diffusing shadows and shifting color tones. … view at source ↗
Figure 2
Figure 2. Figure 2: Overview of our data construction and lighting controls. We generate per-light renders, ambient images, and light-visibility [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Architecture overview, shown here for the 3D [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Example trajectory and confusion maps measure preci￾sion as sensitivity (B/A ↑) and accuracy (A ↓). Our method is closer to ground truth than Neural Gaffer [30] [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
Figure 4
Figure 4. Figure 4: Synthetic data evaluation with Neural Gaffer [ [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 6
Figure 6. Figure 6: VisibleFixture-60. We show two cases: (i) turning a [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: In-the-wild examples showcasing the method’s precise lighting control. Best viewed on a screen—zoom in for details. [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Relighting comparison with Careaga et al. [ [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Comparison with GenLit [3]: (i) Centered pumpkin light reproduced only by ours; (ii) Frontal view where both place the top light correctly; (iii) Low-angle view where GenLit’s top light drifts, while ours remains consistent. 4.2. Qualitative Comparisons We compare with recent relighting methods: GenLit [3], Careaga et al. [8], and LightLab [41]. Since all are closed￾source, results for GenLit and Careaga e… view at source ↗
Figure 10
Figure 10. Figure 10: An illustration of the scene-agnostic camera and lighting parameterization, implemented via a transformation from a canonical [PITH_FULL_IMAGE:figures/full_fig_p014_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: User Study Questionnaire. Here, we provide additional details for the user study summarized in Tab. 2 of the main paper. In the absence of ground-truth lighting, quantitative evaluation on in-the-wild images is challenging. We therefore conduct a user study comparing TokenLight against the closest spatial-control baselines, GenLit [3] and Careaga et al. [8]. As shown in [PITH_FULL_IMAGE:figures/full_fig_… view at source ↗
Figure 12
Figure 12. Figure 12: Effect of sampling steps on relighting quality. Quantitative evaluation of PSNR, SSIM, LPIPS versus sampling steps and inference time (top). In-the-wild qualitative results at varying step counts for inputs from [PITH_FULL_IMAGE:figures/full_fig_p018_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Spatial lighting for portraits: For each input, we show three relit outputs with lights inserted at different 3D locations. The results demonstrate high-quality portrait relighting and complex light–geometry interactions, including challenging cases such as the veiled subject [PITH_FULL_IMAGE:figures/full_fig_p019_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Extreme camera viewpoints. We test on top-down views (left) moving the point light along two trajectories (top). The model follows both motions, left → right (green border) and down → up (blue border), with coherent shading and shadows [PITH_FULL_IMAGE:figures/full_fig_p020_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Additional comparison with Careaga et al. [8] We show the input image and results for two lighting directions (top-lit and left-lit). In the first row, our method produces plausible relighting on fur. In the second row, it better preserves the material appearance of the input. In the last row, it relights the building while maintaining its albedo [PITH_FULL_IMAGE:figures/full_fig_p021_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Additional comparison with GenLit [3] Across these additional examples, our method produces more consistent light placement, while GenLit often exhibits drift in the intended light position [PITH_FULL_IMAGE:figures/full_fig_p022_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Continuous ambient–intensity control. Each row shows an input image and its light-fixture mask. All remaining illumination is treated as ambient and is progressively reduced by our method. In the first row, for instance, the ambient light from the window fades while the masked light source’s reflection in the glass remains unchanged, illustrating proper separation of ambient and the masked light-fixture … view at source ↗
Figure 18
Figure 18. Figure 18: Ambient-light diffusion for shadow softness. In each row, we show an input and three results with increasing global diffuse levels that increase shadow softness [PITH_FULL_IMAGE:figures/full_fig_p024_18.png] view at source ↗
Figure 19
Figure 19. Figure 19: Ambient-light diffusion for shadow sharpening. In each row, we show an input and three results with different global diffuse levels. By providing negative dg values, we can adjust the ambient light diffuse level to make shadows sharper [PITH_FULL_IMAGE:figures/full_fig_p025_19.png] view at source ↗
Figure 20
Figure 20. Figure 20: Light-intensity control to gradually turn off lights. Our method gradually turns off light in input images, provided a mask and different intensity levels, even generalizing to complicated light fixtures such as chandeliers, different types of lamps and streetlights. In the last two rows we show how the mask can be used to localize lighting edits, turning off street lights one at a time [PITH_FULL_IMAGE:… view at source ↗
Figure 21
Figure 21. Figure 21: Light-intensity control for turning lights on: Given a light-visibility mask, our method increases intensity with realistic light falloffs, i.e., the illuminated portions on nearby walls retain plausible boundary based on light-fixture shape [PITH_FULL_IMAGE:figures/full_fig_p027_21.png] view at source ↗
Figure 22
Figure 22. Figure 22: Outdoor light disentanglement. Turning off a car’s backlight suppresses only the light, without darkening the environment. Our pipeline renders localized light edits under diverse environment maps, promoting this separation [PITH_FULL_IMAGE:figures/full_fig_p028_22.png] view at source ↗
Figure 23
Figure 23. Figure 23: We present additional results on our VisibleFixture-60 test set with available ground truth. Rows (i-iii) show examples where our method turns lights on while rows (iv-v) show examples where our method turns lights off [PITH_FULL_IMAGE:figures/full_fig_p029_23.png] view at source ↗
Figure 24
Figure 24. Figure 24: Seed-dependent variation across diffuse levels. We insert a virtual light at the same 3D location and vary its diffuse level across the three rows (low to high). Each row shows three outputs generated with different inference seeds. At low diffuse levels, shadows become sharper and exhibit minor seed-dependent shifts. As the diffuse level increases, results become naturally more consistent across seeds. A… view at source ↗
Figure 25
Figure 25. Figure 25: Comparison with LightLab [41]: Using results provided by the authors, we qualitatively compare our method—reproducing photorealistic effects such as mug shadows and light reflecting from tabletop in (i), and the lamp turning off in (ii) [PITH_FULL_IMAGE:figures/full_fig_p031_25.png] view at source ↗
Figure 26
Figure 26. Figure 26: Our compact light representation can easily be extended to support simultaneous editing of multiple light sources with different [PITH_FULL_IMAGE:figures/full_fig_p032_26.png] view at source ↗
read the original abstract

This paper presents a method for image relighting that enables precise and continuous control over multiple illumination attributes in a photograph. We formulate relighting as a conditional image generation task and introduce attribute tokens to encode distinct lighting factors such as intensity, color, ambient illumination, diffuse level, and 3D light positions. The model is trained on a large-scale synthetic dataset with ground-truth lighting annotations, supplemented by a small set of real captures to enhance realism and generalization. We validate our approach across a variety of relighting tasks, including controlling in-scene lighting fixtures and editing environment illumination using virtual light sources, on synthetic and real images. Our method achieves state-of-the-art quantitative and qualitative performance compared to prior work. Remarkably, without explicit inverse rendering supervision, the model exhibits an inherent understanding of how light interacts with scene geometry, occlusion, and materials, yielding convincing lighting effects even in traditionally challenging scenarios such as placing lights within objects or relighting transparent materials plausibly. Project page: vrroom.github.io/tokenlight/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces TokenLight, a conditional generative model for image relighting that encodes multiple lighting attributes (intensity, color, ambient illumination, diffuse level, 3D positions) via learned attribute tokens. Trained primarily on large-scale synthetic data with ground-truth lighting annotations and supplemented by a small real-capture set, the method claims state-of-the-art quantitative and qualitative results on synthetic and real images. It further asserts that the model acquires an implicit understanding of light transport, occlusion, and materials without explicit inverse-rendering losses, enabling plausible edits in challenging cases such as in-object lighting and transparent materials.

Significance. If the generalization and implicit-physics claims are substantiated with quantitative evidence, the work would offer a practical token-based interface for continuous, multi-attribute lighting control that avoids explicit inverse rendering. This could impact downstream applications in computational photography, virtual production, and image editing. The absence of reported metrics, ablations, or failure statistics on held-out real data currently prevents assessment of whether the approach delivers a genuine advance over existing relighting techniques.

major comments (3)
  1. [Abstract] Abstract: the claim of 'state-of-the-art quantitative and qualitative performance' is unsupported by any reported metrics, baselines, or ablation tables. Without these, the SOTA assertion cannot be evaluated and the central generalization claim remains unverified.
  2. [Validation / Experiments] Validation section (implied by abstract description of experiments): no quantitative error maps, perceptual scores, or failure-rate statistics are provided for held-out real photographs with complex lighting. Reliance on qualitative examples alone leaves open whether observed effects are interpolations within the synthetic distribution or genuine out-of-distribution generalization.
  3. [Method / Training] Training description: the paper states that the model is 'trained on a large-scale synthetic dataset with ground-truth lighting annotations, supplemented by a small set of real captures,' yet provides no details on the relative weighting, domain-adaptation losses, or ablation of the real-data component. This information is load-bearing for the claim that the attribute tokens encode physically plausible light transport on arbitrary real scenes.
minor comments (2)
  1. [Abstract] The abstract mentions 'continuous control' but does not specify the parameterization or interpolation mechanism for the attribute tokens; a brief description of the token embedding space would improve clarity.
  2. [Abstract] Project page is referenced but no link or supplementary material identifier is given in the manuscript text provided.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We agree that the current manuscript version does not provide sufficient quantitative metrics, ablations, or training details to fully support the claims made in the abstract and method description. We will revise the paper to include these elements, strengthening the evidence for generalization and the role of attribute tokens. Below we respond point-by-point to the major comments.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim of 'state-of-the-art quantitative and qualitative performance' is unsupported by any reported metrics, baselines, or ablation tables. Without these, the SOTA assertion cannot be evaluated and the central generalization claim remains unverified.

    Authors: We acknowledge that the abstract asserts state-of-the-art quantitative and qualitative performance, yet the submitted manuscript does not include the corresponding metric tables, baseline comparisons, or ablations to substantiate this. This is a clear presentation gap. In the revised version we will add comprehensive quantitative results, including tables reporting PSNR, SSIM, LPIPS and other perceptual metrics against relevant prior relighting methods on both synthetic and real test sets, plus ablation studies on the attribute-token architecture and training data. These additions will allow direct evaluation of the SOTA claim and the generalization assertions. revision: yes

  2. Referee: [Validation / Experiments] Validation section (implied by abstract description of experiments): no quantitative error maps, perceptual scores, or failure-rate statistics are provided for held-out real photographs with complex lighting. Reliance on qualitative examples alone leaves open whether observed effects are interpolations within the synthetic distribution or genuine out-of-distribution generalization.

    Authors: We agree that the current validation section relies primarily on qualitative examples for real images and lacks quantitative support for out-of-distribution generalization. In the revision we will add quantitative evaluations on held-out real photographs, including perceptual scores (LPIPS, FID), error maps (where proxy ground truth can be obtained), and failure-case statistics for complex lighting scenarios. This will help distinguish genuine generalization from interpolation within the synthetic distribution. revision: yes

  3. Referee: [Method / Training] Training description: the paper states that the model is 'trained on a large-scale synthetic dataset with ground-truth lighting annotations, supplemented by a small set of real captures,' yet provides no details on the relative weighting, domain-adaptation losses, or ablation of the real-data component. This information is load-bearing for the claim that the attribute tokens encode physically plausible light transport on arbitrary real scenes.

    Authors: We recognize that the lack of training details undermines the claim that attribute tokens learn physically plausible light transport on real scenes. In the revised manuscript we will expand the training section with the exact dataset sizes and mixing ratios, the training schedule, any domain-adaptation mechanisms employed, and ablation experiments that isolate the contribution of the real-capture data to performance on real-world images. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on external synthetic GT and real-image validation

full rationale

The paper formulates relighting as conditional generation with attribute tokens and trains on large-scale synthetic data with ground-truth lighting annotations plus a small real-capture supplement. No equations, derivations, or self-citations are shown that reduce the claimed implicit light-transport understanding or generalization performance to quantities defined by the authors' own fitted parameters. The central result is presented as an observed outcome of standard supervised training rather than a self-referential construction, and validation is described as relying on external benchmarks and qualitative comparisons.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on the empirical effectiveness of token-based conditioning and the transfer from synthetic to real data; no explicit free parameters beyond standard neural network weights are named, but the approach implicitly assumes disentanglement of lighting factors via tokens.

free parameters (1)
  • neural network weights and token embeddings
    Learned during training on the synthetic-plus-real dataset; central to all performance claims.
axioms (1)
  • domain assumption Synthetic images with perfect lighting ground truth plus limited real captures suffice to learn generalizable light transport behavior
    Invoked when stating that the model exhibits inherent understanding without explicit supervision.
invented entities (1)
  • attribute tokens no independent evidence
    purpose: To encode and disentangle distinct lighting factors for conditional generation
    New representational device introduced by the paper; no independent evidence outside the trained model is provided.

pith-pipeline@v0.9.0 · 5497 in / 1376 out tokens · 54874 ms · 2026-05-10T11:24:32.320797+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

82 extracted references · 5 canonical work pages · 1 internal anchor

  1. [1]

    Inverse lighting and photorealistic rendering for augmented reality.Vis

    Miika Aittala. Inverse lighting and photorealistic rendering for augmented reality.Vis. Comput., 26(6–8):669–678, 2010. 2

  2. [2]

    Barron and Jitendra Malik

    Jonathan T. Barron and Jitendra Malik. Shape, Illumination, and Reflectance from Shading.IEEE Trans. Pattern Anal. Mach. Intell., 37(8):1670–1687, 2015. 3

  3. [3]

    Shrisha Bharadwaj, Haiwen Feng, Giorgio Becherini, Victo- ria Fernandez Abrevaya, and Michael J. Black. GenLit: Re- formulating Single-Image Relighting as Video Generation. arXiv preprint arxiv:2412.11224, 2025. 3, 6, 8, 10

  4. [4]

    Anand Bhattad, James Soole, and D.A. Forsyth. StyLitGAN: Image-based Relighting via Latent Control. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR), 2024. 3

  5. [5]

    Blender: Open-source 3d cre- ation suite, 2025

    Blender Online Community. Blender: Open-source 3d cre- ation suite, 2025. Version 4.2 LTS. 2

  6. [6]

    Barron, Ce Liu, and Hendrik P.A

    Mark Boss, Raphael Braun, Varun Jampani, Jonathan T. Barron, Ce Liu, and Hendrik P.A. Lensch. NeRD: Neu- ral Reflectance Decomposition from Image Collections . In 2021 IEEE/CVF International Conference on Computer Vi- sion (ICCV), pages 12664–12674, Los Alamitos, CA, USA,

  7. [7]

    IEEE Computer Society. 3

  8. [8]

    Real-time 3d-aware portrait video relighting

    Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen, Yu-Kun Lai, Hongbo Fu, Boxin Shi, and Lin Gao. Real-time 3d-aware portrait video relighting. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), pages 6221–6231, 2024. 2

  9. [9]

    Physically controllable re- lighting of photographs

    Chris Careaga and Ya ˘gız Aksoy. Physically controllable re- lighting of photographs. InProceedings of the Special Inter- est Group on Computer Graphics and Interactive Techniques Conference Conference Papers, New York, NY , USA, 2025. Association for Computing Machinery. 2, 3, 5, 6, 8, 9

  10. [10]

    Text2relight: creative portrait relighting with text guidance

    Junuk Cha, Mengwei Ren, Krishna Kumar Singh, He Zhang, Yannick Hold-Geoffroy, Seunghyun Yoon, HyunJoon Jung, Jae Shin Yoon, and Seungryul Baek. Text2relight: creative portrait relighting with text guidance. InProceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence and Thirty-Seventh Conference on Innovative Applications of Ar- tificial ...

  11. [11]

    Sumit Chaturvedi, Mengwei Ren, Yannick Hold-Geoffroy, Jingyuan Liu, Julie Dorsey, and Zhixin Shu. Synthlight: Por- trait relighting with diffusion model by learning to re-render synthetic faces.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025. 2, 3, 4

  12. [12]

    DIB-R++: learning to predict lighting and material with a hybrid differentiable renderer

    Wenzheng Chen, Joey Litalien, Jun Gao, Zian Wang, Clement Fuji Tsang, Sameh Khamis, Or Litany, and Sanja Fidler. DIB-R++: learning to predict lighting and material with a hybrid differentiable renderer. InProceedings of the 35th International Conference on Neural Information Pro- cessing Systems, Red Hook, NY , USA, 2021. Curran Asso- ciates Inc. 3

  13. [13]

    ScribbleLight: Single Image Indoor Relighting with Scribbles

    Jun Myeong Choi, Annie Wang, Pieter Peers, Anand Bhat- tad, and Roni Sengupta. ScribbleLight: Single Image Indoor Relighting with Scribbles. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5720–5731, 2025. 3, 6, 7

  14. [14]

    Acquiring the reflectance field of a human face

    Paul Debevec, Tim Hawkins, Chris Tchou, Haarm-Pieter Duiker, Westley Sarokin, and Mark Sagar. Acquiring the reflectance field of a human face. InProceedings of the 27th Annual Conference on Computer Graphics and In- teractive Techniques, page 145–156, USA, 2000. ACM Press/Addison-Wesley Publishing Co. 3

  15. [15]

    Deep Neural Models for Illumination Estimation and Re- lighting: A Survey.Computer Graphics Forum, 40(6):315– 331, 2021

    Farshad Einabadi, Jean-Yves Guillemaut, and Adrian Hilton. Deep Neural Models for Illumination Estimation and Re- lighting: A Survey.Computer Graphics Forum, 40(6):315– 331, 2021. 3

  16. [16]

    Bermano, and Christian Theobalt

    Yotam Erel, Rishabh Dabral, Vladislav Golyanik, Amit H. Bermano, and Christian Theobalt. Practilight: Practical light control using foundational diffusion models, 2025. 2, 3

  17. [17]

    arXiv preprint arXiv:2501.16330 (2025)

    Ye Fang, Zeyi Sun, Shangzhan Zhang, Tong Wu, Yinghao Xu, Pan Zhang, Jiaqi Wang, Gordon Wetzstein, and Dahua Lin. RelightVid: Temporal-Consistent Diffusion Model for Video Relighting.arXiv preprint arXiv:2501.16330, 2025. 5

  18. [18]

    Spotlight: Shadow-guided object relighting via diffusion,

    Fr ´ed´eric Fortier-Chouinard, Zitian Zhang, Louis-Etienne Messier, Mathieu Garon, Anand Bhattad, and Jean-Franc ¸ois Lalonde. Spotlight: Shadow-guided object relighting via dif- fusion.arXiv preprint arXiv:2411.18665, 2024. 3

  19. [19]

    Controllable Light Diffusion for Portraits

    David Futschik, Kelvin Ritland, James Vecore, Sean Fanello, Sergio Orts-Escolano, Brian Curless, Daniel Sykora, and Ro- hit Pandey. Controllable Light Diffusion for Portraits . InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8412–8421, Los Alamitos, CA, USA, 2023. IEEE Computer Society. 3

  20. [20]

    Out- cast: Outdoor single-image relighting with cast shadows

    David Griffiths, Tobias Ritschel, and Julien Philip. Out- cast: Outdoor single-image relighting with cast shadows. In Computer Graphics Forum, pages 179–193. Wiley Online Library, 2022. 3

  21. [21]

    Synthetic Lighting for Photography.https: / / www

    Paul Haeberli. Synthetic Lighting for Photography.https: / / www . graficaobscura . com / synth / index . html, 1992. Accessed: 2025-10-19. 3

  22. [22]

    Shape, light, and material decomposition from images us- ing monte carlo rendering and denoising

    Jon Hasselgren, Nikolai Hofmann, and Jacob Munkberg. Shape, light, and material decomposition from images us- ing monte carlo rendering and denoising. InProceedings of the 36th International Conference on Neural Information Processing Systems, Red Hook, NY , USA, 2022. Curran As- sociates Inc. 3

  23. [23]

    Unirelight: Learning joint decomposition and synthesis for video relight- ing, 2025

    Kai He, Ruofan Liang, Jacob Munkberg, Jon Hasselgren, Nandita Vijaykumar, Alexander Keller, Sanja Fidler, Igor Gilitschenski, Zan Gojcic, and Zian Wang. Unirelight: Learning joint decomposition and synthesis for video relight- ing, 2025. 3

  24. [24]

    DifFRelight: Diffusion-Based Facial Performance Relighting

    Mingming He, Pascal Clausen, Ahmet Levent Tas ¸el, Li Ma, Oliver Pilarski, Wenqi Xian, Laszlo Rikker, Xueming Yu, Ryan Burgert, Ning Yu, and Paul Debevec. DifFRelight: Diffusion-Based Facial Performance Relighting. InSIG- GRAPH Asia 2024 Conference Papers, New York, NY , USA,

  25. [25]

    Association for Computing Machinery. 5

  26. [26]

    Classifier-free diffusion guidance

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. InNeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021. 5

  27. [27]

    Denoising Diffu- sion Probabilistic Models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising Diffu- sion Probabilistic Models. InProceedings of the 34th Inter- national Conference on Neural Information Processing Sys- tems, Red Hook, NY , USA, 2020. Curran Associates Inc. 3

  28. [28]

    Face Relighting with Geometrically Consis- tent Shadows

    Andrew Hou, Michel Sarkis, Ning Bi, Yiying Tong, and Xi- aoming Liu. Face Relighting with Geometrically Consis- tent Shadows . InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4207–4216, Los Alamitos, CA, USA, 2022. IEEE Computer Society. 3

  29. [29]

    COMPOSE: Comprehensive Portrait Shadow Editing

    Andrew Hou, Zhixin Shu, Xuaner Zhang, He Zhang, Yan- nick Hold-Geoffroy, Jae Shin Yoon, and Xiaoming Liu. COMPOSE: Comprehensive Portrait Shadow Editing. In Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXI, page 356–373, Berlin, Heidelberg, 2024. Springer- Verlag. 3

  30. [30]

    Self forcing: Bridging the train-test gap in autoregressive video diffusion

    Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, and Eli Shechtman. Self forcing: Bridging the train-test gap in autoregressive video diffusion. InProceedings of the 39th International Conference on Neural Information Processing Systems, 2025. 5

  31. [31]

    GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflec- tive Surfaces

    Yingwenqi Jiang, Jiadong Tu, Yuan Liu, Xifeng Gao, Xiaox- iao Long, Wenping Wang, and Yuexin Ma. GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflec- tive Surfaces . InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5322–5332, Los Alamitos, CA, USA, 2024. IEEE Computer Society. 3

  32. [32]

    Neural Gaffer: Relighting Any Object via Diffusion

    Haian Jin, Yuan Li, Fujun Luan, Yuanbo Xiangli, Sai Bi, Kai Zhang, Zexiang Xu, Jin Sun, and Noah Snavely. Neural Gaffer: Relighting Any Object via Diffusion. InProceedings of the 38th International Conference on Neural Information Processing Systems, 2024. 2, 3, 4, 6

  33. [33]

    SwitchLight: Co-design of Physics- driven Architecture and Pre-training Framework for Human Portrait Relighting

    Hoon Kim, Minje Jang, Wonjun Yoon, Jisoo Lee, Donghyun Na, and Sanghyun Woo. SwitchLight: Co-design of Physics- driven Architecture and Pre-training Framework for Human Portrait Relighting. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), pages 25096–25106, 2024. 2, 3

  34. [34]

    LightIt: Illumination modeling and control for diffusion models

    Peter Kocsis, Julien Philip, Kalyan Sunkavalli, Matthias Nießner, and Yannick Hold-Geoffroy. LightIt: Illumination modeling and control for diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9359–9369, 2024. 3

  35. [35]

    Diffusion- Renderer: Neural Inverse and Forward Rendering with Video Diffusion Models

    Ruofan Liang, Zan Gojcic, Huan Ling, Jacob Munkberg, Jon Hasselgren, Zhi-Hao Lin, Jun Gao, Alexander Keller, Nan- dita Vijaykumar, Sanja Fidler, and Zian Wang. Diffusion- Renderer: Neural Inverse and Forward Rendering with Video Diffusion Models. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR),

  36. [36]

    Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maxim- ilian Nickel, and Matthew Le. Flow matching for generative modeling. InInternational Conference on Learning Repre- sentations, 2023. 5

  37. [37]

    Flow Matching Guide and Code

    Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky TQ Chen, David Lopez- Paz, Heli Ben-Hamu, and Itai Gat. Flow matching guide and code.arXiv preprint arXiv:2412.06264, 2024. 5

  38. [38]

    UniLumos: Fast and unified im- age and video relighting with physics-plausible feedback

    Ropeway Liu, Hangjie Yuan, Bo Dong, Jiazheng Xing, Jin- wang Wang, Rui Zhao, Yan Xing, Weihua Chen, and Fan Wang. UniLumos: Fast and Unified Image and Video Re- lighting with Physics-Plausible Feedback.arXiv preprint arXiv:2511.01678, 2025. 5

  39. [39]

    Blind Removal of Facial Foreign Shadows

    Yaojie Liu*, Andrew Hou*, Xinyu Huang, Liu Ren, and Xi- aoming Liu. Blind Removal of Facial Foreign Shadows . InIn Proceedings of British Machine Vision Conference (BMVC), London, UK, 2022. 3

  40. [40]

    Decoupled weight de- cay regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight de- cay regularization. InInternational Conference on Learning Representations, 2019. 5

  41. [41]

    Intrinsicedit: Precise gen- erative image manipulation in intrinsic space.ACM Trans

    Linjie Lyu, Valentin Deschaintre, Yannick Hold-Geoffroy, Miloˇs Haˇsan, Jae Shin Yoon, Thomas Leimk¨uhler, Christian Theobalt, and Iliyan Georgiev. Intrinsicedit: Precise gen- erative image manipulation in intrinsic space.ACM Trans. Graph., 44(4):1–13, 2025. 3

  42. [42]

    Facelift: Learning generalizable single image 3d face re- construction from synthetic heads

    Weijie Lyu, Yi Zhou, Ming-Hsuan Yang, and Zhixin Shu. Facelift: Learning generalizable single image 3d face re- construction from synthetic heads. In2025 IEEE/CVF In- ternational Conference on Computer Vision (ICCV), pages 12691–12701, 2025. 4

  43. [43]

    LightLab: Con- trolling Light Sources in Images with Diffusion Models

    Nadav Magar, Amir Hertz, Eric Tabellion, Yael Pritch, Alex Rav-Acha, Ariel Shamir, and Yedid Hoshen. LightLab: Con- trolling Light Sources in Images with Diffusion Models. InProceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers, New York, NY , USA, 2025. Association for Comput- ing Machinery...

  44. [44]

    Yiqun Mei, He Zhang, Xuaner Zhang, Jianming Zhang, Zhixin Shu, Yilin Wang, Zijun Wei, Shi Yan, HyunJoon Jung, and Vishal M. Patel. LightPainter: Interactive Por- trait Relighting with Freehand Scribble . InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 195–205, Los Alamitos, CA, USA, 2023. IEEE Computer Society. 3

  45. [45]

    Yiqun Mei, Mingming He, Li Ma, Julien Philip, Wenqi Xian, David M George, Xueming Yu, Gabriel Dedic, Ahmet Lev- ent Tas ¸el, Ning Yu, Vishal M Patel, and Paul Debevec. Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset.Proceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recogn...

  46. [46]

    Total relighting: learning to relight portraits for background replacement.ACM Trans

    Rohit Pandey, Sergio Orts Escolano, Chloe Legendre, Chris- tian H ¨ane, Sofien Bouaziz, Christoph Rhemann, Paul De- bevec, and Sean Fanello. Total relighting: learning to relight portraits for background replacement.ACM Trans. Graph., 40(4), 2021. 2, 3

  47. [47]

    Scalable Diffusion Models with Transformers

    William Peebles and Saining Xie. Scalable Diffusion Models with Transformers. In2023 IEEE/CVF International Confer- ence on Computer Vision (ICCV), pages 4195–4205, 2023. 3, 5

  48. [48]

    MIT Press, 2023

    Matt Pharr, Wenzel Jakob, and Greg Humphreys.Physi- cally based rendering: From theory to implementation. MIT Press, 2023. 1

  49. [49]

    Poly haven.https : / / polyhaven.com/, 2021

    Poly Haven Contributors. Poly haven.https : / / polyhaven.com/, 2021. Free public asset library for 3D artists. 4

  50. [50]

    DiFaReli: Diffusion face relighting

    Puntawat Ponglertnapakorn, Nontawat Tritrong, and Supa- sorn Suwajanakorn. DiFaReli: Diffusion face relighting. In 2023 IEEE/CVF International Conference on Computer Vi- sion (ICCV), pages 22589–22600, 2023. 2, 3

  51. [51]

    Photographic tone reproduction for digital images

    Erik Reinhard, Michael Stark, Peter Shirley, and James Fer- werda. Photographic tone reproduction for digital images. InProceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, page 267–276, New York, NY , USA, 2002. Association for Computing Machin- ery. 4, 1, 3

  52. [52]

    Relightful Harmonization: Lighting-Aware Portrait Back- ground Replacement

    Mengwei Ren, Wei Xiong, Jae Shin Yoon, Zhixin Shu, Jian- ming Zhang, HyunJoon Jung, Guido Gerig, and He Zhang. Relightful Harmonization: Lighting-Aware Portrait Back- ground Replacement. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6452–6462, 2024. 2, 3

  53. [53]

    Image based relighting using neural networks.ACM Trans

    Peiran Ren, Yue Dong, Stephen Lin, Xin Tong, and Baining Guo. Image based relighting using neural networks.ACM Trans. Graph., 34(4), 2015. 3

  54. [54]

    High-Resolution Image Synthesis with Latent Diffusion Models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-Resolution Image Synthesis with Latent Diffusion Models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10674–10685, 2022. 3, 5

  55. [55]

    Palette: Image-to-Image Diffusion Models

    Chitwan Saharia, William Chan, Huiwen Chang, Chris Lee, Jonathan Ho, Tim Salimans, David Fleet, and Mohammad Norouzi. Palette: Image-to-Image Diffusion Models. In ACM SIGGRAPH 2022 Conference Proceedings, New York, NY , USA, 2022. Association for Computing Machinery. 3

  56. [56]

    Painting with light

    Chris Schoeneman, Julie Dorsey, Brian Smits, James Arvo, and Donald Greenberg. Painting with light. InProceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, page 143–146, New York, NY , USA,

  57. [57]

    Association for Computing Machinery. 3

  58. [58]

    Portrait lighting trans- fer using a mass transport approach.ACM Trans

    Zhixin Shu, Sunil Hadap, Eli Shechtman, Kalyan Sunkavalli, Sylvain Paris, and Dimitris Samaras. Portrait lighting trans- fer using a mass transport approach.ACM Trans. Graph., 37 (2), 2017. 3

  59. [59]

    Denois- ing diffusion implicit models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denois- ing diffusion implicit models. InInternational Conference on Learning Representations, 2021. 5

  60. [60]

    Consistency models

    Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. InProceedings of the 40th International Conference on Machine Learning. JMLR.org,

  61. [61]

    RoFormer: Enhanced transformer with Rotary Position Embedding.Neurocomput., 568(C), 2024

    Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. RoFormer: Enhanced transformer with Rotary Position Embedding.Neurocomput., 568(C), 2024. 5

  62. [62]

    Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ra- mamoorthi, Jonathan T

    Matthew Tancik, Pratul P. Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ra- mamoorthi, Jonathan T. Barron, and Ren Ng. Fourier Fea- tures Let Networks Learn High Frequency Functions in Low Dimensional Domains.NeurIPS, 2020. 5

  63. [63]

    Gomez, Łukasz Kaiser, and Il- lia Polosukhin

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko- reit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Il- lia Polosukhin. Attention is all you need. InProceedings of the 31st International Conference on Neural Information Processing Systems, page 6000–6010, Red Hook, NY , USA,

  64. [64]

    Curran Associates Inc. 5

  65. [65]

    Wan: Open and Advanced Large-Scale Video Generative Models, 2025

    Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianx- iao Yang, Jianyuan Zeng, Jiayu Wang, Jingfeng Zhang, Jin- gren Zhou, Jinkai Wang, Jixuan Chen, Kai Zhu, Kang Zhao, Keyu Yan, Lianghua Huang, Mengyang Feng, Ningyi Zhang, Pandeng Li, Pingyu Wu, Ruihang Chu, Ruili Feng, Shiwei Zhang, Siyang Sun, Tao Fan...

  66. [66]

    Stylelight: Hdr panorama generation for lighting estimation and editing

    Guangcong Wang, Yinuo Yang, Chen Change Loy, and Zi- wei Liu. Stylelight: Hdr panorama generation for lighting estimation and editing. InComputer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XV, page 477–492, Berlin, Heidel- berg, 2022. Springer-Verlag. 3

  67. [67]

    Wang, Simon S

    Junying Wang, Jingyuan Liu, Xin Sun, Krishna Kumar Singh, Zhixin Shu, He Zhang, Jimei Yang, Nanxuan Zhao, Tuanfeng Y . Wang, Simon S. Chen, Ulrich Neumann, and Jae Shin Yoon. Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmo- nization . InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogn...

  68. [68]

    Cashman, and Jamie Shotton

    Erroll Wood, Tadas Baltru ˇsaitis, Charlie Hewitt, Sebastian Dziadzio, Thomas J. Cashman, and Jamie Shotton. Fake it till you make it: face analysis in the wild using synthetic data alone. In2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 3661–3671, 2021. 4

  69. [69]

    LumiNet: Latent Intrinsics Meets Dif- fusion Models for Indoor Scene Relighting

    Xiaoyan Xing, Konrad Groh, Sezer Karaoglu, Theo Gevers, and Anand Bhattad. LumiNet: Latent Intrinsics Meets Dif- fusion Models for Indoor Scene Relighting . InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 442–452, Los Alamitos, CA, USA, 2025. IEEE Computer Society. 3

  70. [70]

    Deep image-based relighting from optimal sparse samples.ACM Trans

    Zexiang Xu, Kalyan Sunkavalli, Sunil Hadap, and Ravi Ramamoorthi. Deep image-based relighting from optimal sparse samples.ACM Trans. Graph., 37(4):1–13, 2018. 3

  71. [71]

    Weakly-supervised Single-view Image Relighting

    Renjiao Yi, Chenyang Zhu, and Kai Xu. Weakly-supervised Single-view Image Relighting . InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8402–8411, Los Alamitos, CA, USA, 2023. IEEE Computer Society. 3

  72. [72]

    Freeman, and Taesung Park

    Tianwei Yin, Michael Gharbi, Richard Zhang, Eli Shecht- man, Fredo Durand, William T. Freeman, and Taesung Park. One-Step Diffusion with Distribution Matching Distillation . InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6613–6623, Los Alamitos, CA, USA, 2024. IEEE Computer Society. 3

  73. [73]

    Free- man, Fredo Durand, Eli Shechtman, and Xun Huang

    Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Free- man, Fredo Durand, Eli Shechtman, and Xun Huang. From Slow Bidirectional to Fast Autoregressive Video Diffusion Models . InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22963–22974, Los Alamitos, CA, USA, 2025. IEEE Com- puter Society. 5

  74. [74]

    Generative Portrait Shadow Removal.ACM Trans

    Jae Shin Yoon, Zhixin Shu, Mengwei Ren, Cecilia Zhang, Yannick Hold-Geoffroy, Krishna kumar Singh, and He Zhang. Generative Portrait Shadow Removal.ACM Trans. Graph., 43(6), 2024. 3

  75. [75]

    DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation

    Chong Zeng, Yue Dong, Pieter Peers, Youkang Kong, Hongzhi Wu, and Xin Tong. DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation. In ACM SIGGRAPH 2024 Conference Papers, New York, NY , USA, 2024. Association for Computing Machinery. 2, 3

  76. [76]

    RGB↔X: Image decomposition and synthe- sis using material- and lighting-aware diffusion models

    Zheng Zeng, Valentin Deschaintre, Iliyan Georgiev, Yannick Hold-Geoffroy, Yiwei Hu, Fujun Luan, Ling-Qi Yan, and Miloˇs Haˇsan. RGB↔X: Image decomposition and synthe- sis using material- and lighting-aware diffusion models. In ACM SIGGRAPH 2024 Conference Papers, New York, NY , USA, 2024. Association for Computing Machinery. 3

  77. [77]

    Scaling In-the-Wild Training for Diffusion-based Illumination Har- monization and Editing by Imposing Consistent Light Trans- port

    Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Scaling In-the-Wild Training for Diffusion-based Illumination Har- monization and Editing by Imposing Consistent Light Trans- port. InInternational Conference on Learning Representa- tions, 2025. 2, 3

  78. [78]

    Efros, Eli Shecht- man, and Oliver Wang

    Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shecht- man, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 586–595, 2018. 6

  79. [79]

    Freeman, Kai Zhang, and Fujun Luan

    Tianyuan Zhang, Zhengfei Kuang, Haian Jin, Zexiang Xu, Sai Bi, Hao Tan, He Zhang, Yiwei Hu, Milos Hasan, William T. Freeman, Kai Zhang, and Fujun Luan. Re- litLRM: Generative relightable radiance for large reconstruc- tion models. InInternational Conference on Learning Rep- resentations, 2025. 2

  80. [80]

    Barron, Yun-Ta Tsai, Rohit Pandey, Xiuming Zhang, Ren Ng, and David E

    Xuaner (Cecilia) Zhang, Jonathan T. Barron, Yun-Ta Tsai, Rohit Pandey, Xiuming Zhang, Ren Ng, and David E. Ja- cobs. Portrait shadow manipulation.ACM Trans. Graph., 39 (4), 2020. 3

Showing first 80 references.