GeoRelight: Learning Joint Geometrical Relighting and Reconstruction with Flexible Multi-Modal Diffusion Transformers
Pith reviewed 2026-05-10 01:15 UTC · model grok-4.3
The pith
A unified diffusion transformer can jointly estimate 3D geometry and relight a person from a single photo.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GeoRelight is a Multi-Modal Diffusion Transformer that jointly solves for 3D geometry and relighting from a single image. It achieves this through the isotropic NDC-Orthographic Depth (iNOD) representation, which provides a distortion-free 3D encoding compatible with latent diffusion models, combined with a mixed-data training strategy that uses both synthetic renders and auto-labeled real images.
What carries the argument
The isotropic NDC-Orthographic Depth (iNOD) representation serves as the central mechanism, allowing the diffusion transformer to process geometry and lighting variables jointly without distortion.
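The paper under review does not reproduce the iNOD construction here, so the following is only a guess at its shape: a minimal NumPy sketch, assuming the encoding centers the subject's depth and divides by the same half-extent used for the image plane, so that no axis is stretched relative to another. All names (`inod_encode`, the [-1, 1] range, the bounding-box convention) are hypothetical.

```python
import numpy as np

def inod_encode(depth, mask, xy_half_extent):
    """Hypothetical isotropic NDC-orthographic depth encoding (an assumption,
    not the paper's definition). Depth is centered on the visible subject and
    divided by the SAME half-extent as the image plane, so the resulting map
    introduces no anisotropic distortion across x, y, and z."""
    z = depth[mask]                              # metric depth on the subject
    center = 0.5 * (z.min() + z.max())           # mid-depth of the subject
    # isotropic scale: one number shared by all three axes
    s = max(xy_half_extent, 0.5 * (z.max() - z.min()))
    inod = np.zeros_like(depth)
    inod[mask] = (depth[mask] - center) / s      # roughly in [-1, 1]
    return inod, center, s

def inod_decode(inod, mask, center, s):
    """Invert the hypothetical encoding back to metric depth."""
    depth = np.zeros_like(inod)
    depth[mask] = inod[mask] * s + center
    return depth
```

Under this reading, "distortion-free" would mean the single shared scale s; a perspective NDC depth would instead compress far values nonlinearly.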
If this is right
- The joint model outperforms sequential pipelines by avoiding error accumulation between geometry and relighting steps.
- Explicit use of estimated geometry during relighting produces outputs with greater physical consistency.
- Mixed training on synthetic and real data enables generalization without dataset-specific tuning or post-hoc fixes (see the sketch after this list).
- Joint solving removes the need for separate post-processing stages in both tasks.
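To make the mixed-training point above concrete: a minimal sketch of what such training could look like, assuming synthetic samples carry full ground truth while auto-labeled real samples supervise only what their pseudo-labels cover. Every name and weight below is a hypothetical illustration, not the paper's recipe.

```python
import random
import numpy as np

def l1(a, b):
    return float(np.mean(np.abs(a - b)))

def mixed_batch_loss(model, synthetic_pool, real_pool, p_synth=0.5):
    """Hypothetical mixed-data objective: alternate synthetic and auto-labeled
    real samples, masking out loss terms a sample cannot supervise."""
    if random.random() < p_synth:
        sample = random.choice(synthetic_pool)
        weights = {"depth": 1.0, "relit": 1.0}    # renders give exact GT
    else:
        sample = random.choice(real_pool)
        weights = {"depth": 0.5, "relit": 1.0}    # pseudo-labels: down-weight
    pred = model(sample["image"], sample["target_light"])
    loss = 0.0
    for key, w in weights.items():
        if sample.get(key) is not None:           # skip missing pseudo-labels
            loss += w * l1(pred[key], sample[key])
    return loss
```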
Where Pith is reading between the lines
- This joint training strategy could be tested on related inverse graphics problems such as estimating surface materials from images.
- Extending the model to handle multiple input views or video sequences might further improve geometry accuracy.
- If iNOD proves stable, it may serve as a drop-in replacement for other depth representations in diffusion-based 3D generation pipelines.
Load-bearing premise
The assumption that joint training on the proposed representation and mixed data actually prevents error accumulation and produces outputs that are physically consistent without additional corrections.
What would settle it
A controlled experiment on images with known ground-truth 3D geometry and lighting, where the model's relit output is compared against a physically-based renderer using the estimated geometry and lights; significant deviations from expected results would falsify the consistency benefit.
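A minimal sketch of that falsification test, using a purely Lambertian stand-in for the physically-based renderer (every function and argument below is an assumption, not the paper's protocol):

```python
import numpy as np

def normals_from_depth(depth):
    """Finite-difference surface normals from an orthographic depth map."""
    dz_dy, dz_dx = np.gradient(depth)
    n = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def consistency_gap(pred_depth, pred_relit, gt_albedo, light_dir):
    """Hypothetical check: diffusely re-render the model's OWN estimated depth
    under the target light, then measure how far the model's relit output
    drifts from that reference. A persistently large gap on ground-truth
    scenes would falsify the claimed physical-consistency benefit."""
    n = normals_from_depth(pred_depth)
    shading = np.clip(n @ np.asarray(light_dir, dtype=float), 0.0, None)
    reference = gt_albedo * shading[..., None]    # diffuse-only PBR stand-in
    return float(np.mean(np.abs(reference - pred_relit)))
```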
Original abstract
Relighting a person from a single photo is an attractive but ill-posed task, as a 2D image ambiguously entangles 3D geometry, intrinsic appearance, and illumination. Current methods either use sequential pipelines that suffer from error accumulation, or they do not explicitly leverage 3D geometry during relighting, which limits physical consistency. Since relighting and estimation of 3D geometry are mutually beneficial tasks, we propose a unified Multi-Modal Diffusion Transformer (DiT) that jointly solves for both: GeoRelight. We make this possible through two key technical contributions: isotropic NDC-Orthographic Depth (iNOD), a distortion-free 3D representation compatible with latent diffusion models; and a strategic mixed-data training method that combines synthetic and auto-labeled real data. By solving geometry and relighting jointly, GeoRelight achieves better performance than both sequential models and previous systems that ignored geometry.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes GeoRelight, a unified Multi-Modal Diffusion Transformer (DiT) that jointly performs relighting and 3D geometry reconstruction from a single photo. It introduces isotropic NDC-Orthographic Depth (iNOD) as a distortion-free representation compatible with latent diffusion and employs mixed-data training on synthetic plus auto-labeled real data, claiming this mutual-benefit approach avoids the error accumulation of sequential pipelines and yields superior performance to prior systems that ignore geometry.
Significance. If the joint optimization demonstrably produces physically consistent outputs with reduced error accumulation, the work would advance single-image relighting and reconstruction by unifying two interdependent tasks inside a flexible diffusion transformer, offering a template for other appearance-geometry problems.
major comments (3)
- [Training Strategy] The central claim that joint solving via iNOD and mixed training inherently avoids error accumulation and ensures physical consistency rests on the training dynamics, yet the manuscript provides no explicit cross-consistency term in the diffusion objective that penalizes mismatches between predicted depth and relit appearance (see the description of the training objective).
- [Experiments] No ablation studies isolate the contribution of joint training versus sequential pipelines or quantify whether mixed-data training reduces inconsistency rather than averaging label noise from auto-labeled real data; this directly undermines the assertion of a unified advantage over sequential models.
- [Method] The iNOD representation is asserted to be distortion-free and DiT-compatible, but the manuscript supplies neither a derivation comparing it to standard NDC/orthographic projections nor empirical verification that it preserves the mutual-benefit premise without additional post-hoc fixes.
minor comments (2)
- [Abstract] The abstract introduces 'Multi-Modal Diffusion Transformer' without immediately clarifying the modalities; a brief parenthetical in the first sentence would improve readability.
- [Method] Notation for the iNOD projection (e.g., the exact mapping from 3D points to the latent space) should be formalized with an equation rather than prose description.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed comments. We appreciate the focus on the core claims of joint optimization and physical consistency. Below we respond point-by-point to the major comments, clarifying our approach and committing to revisions that strengthen the presentation and validation of these claims.
Point-by-point responses
- Referee: The central claim that joint solving via iNOD and mixed training inherently avoids error accumulation and ensures physical consistency rests on the training dynamics, yet the manuscript provides no explicit cross-consistency term in the diffusion objective that penalizes mismatches between predicted depth and relit appearance (see the description of the training objective).
  Authors: We agree that an explicit cross-consistency term would make the mutual-benefit argument more direct. While the shared DiT backbone and joint denoising process on the combined iNOD+appearance latent encourage consistency through data-driven supervision (synthetic data provides perfect alignment and real auto-labels provide scale), the current objective does not add an auxiliary penalty for depth-appearance mismatch. In the revision we will introduce a lightweight consistency regularizer (e.g., a rendered shading consistency loss between predicted depth and relit image) into the training objective and report its effect. Revision: yes.
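A hedged PyTorch sketch of what the rendered-shading consistency term proposed above could look like. The function name, the single-directional-light assumption, and the gray-luminance proxy (a full version would account for albedo) are ours, not the authors':

```python
import torch
import torch.nn.functional as F

def shading_consistency_loss(pred_depth, pred_relit, light_dir):
    """Hypothetical auxiliary loss (a sketch, not the paper's objective):
    penalize disagreement between the diffuse shading implied by the
    predicted depth and the luminance of the predicted relit image.
    pred_depth: (B, 1, H, W), pred_relit: (B, 3, H, W), light_dir: (B, 3)."""
    # finite-difference normals from the depth map
    dz_dx = pred_depth[..., :, 1:] - pred_depth[..., :, :-1]
    dz_dy = pred_depth[..., 1:, :] - pred_depth[..., :-1, :]
    dz_dx = F.pad(dz_dx, (0, 1))            # pad width back to W
    dz_dy = F.pad(dz_dy, (0, 0, 0, 1))      # pad height back to H
    n = torch.cat([-dz_dx, -dz_dy, torch.ones_like(pred_depth)], dim=1)
    n = F.normalize(n, dim=1)
    # diffuse shading under a single directional light -> (B, 1, H, W)
    l = light_dir.view(-1, 3, 1, 1)
    shading = (n * l).sum(dim=1, keepdim=True).clamp(min=0.0)
    luminance = pred_relit.mean(dim=1, keepdim=True)   # crude gray proxy
    return F.l1_loss(shading, luminance)
```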
- Referee: No ablation studies isolate the contribution of joint training versus sequential pipelines or quantify whether mixed-data training reduces inconsistency rather than averaging label noise from auto-labeled real data; this directly undermines the assertion of a unified advantage over sequential models.
  Authors: We acknowledge the absence of these targeted ablations. In the revised manuscript we will add (1) a direct comparison of the joint GeoRelight model against a sequential baseline (depth estimation followed by a separate relighting network) using the same backbone and data, and (2) quantitative consistency metrics (e.g., normal-shading error and depth-relighting alignment on a held-out synthetic test set) that separate the effect of joint training from potential label-noise averaging in the mixed-data regime. Revision: yes.
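One way the promised ablation could be organized, as a sketch only: the `joint`, `depth_only`, and `relight_only` entry points are hypothetical stand-ins for running the same backbone in different modes, and `consistency_metric` is any depth-vs-relighting agreement score such as the gap measure sketched earlier.

```python
def run_ablation(backbone, test_set, consistency_metric):
    """Hypothetical harness: score joint vs. sequential inference with the
    same backbone and the same metric, so any gap is attributable to joint
    training rather than to capacity or data differences."""
    rows = []
    for sample in test_set:
        # joint: depth and relit image denoised together in one pass
        joint = backbone.joint(sample["image"], sample["target_light"])
        # sequential: estimate depth first, then relight with depth frozen
        depth = backbone.depth_only(sample["image"])
        relit = backbone.relight_only(sample["image"], depth,
                                      sample["target_light"])
        rows.append({
            "joint": consistency_metric(joint["depth"], joint["relit"], sample),
            "sequential": consistency_metric(depth, relit, sample),
        })
    return rows
```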
- Referee: The iNOD representation is asserted to be distortion-free and DiT-compatible, but the manuscript supplies neither a derivation comparing it to standard NDC/orthographic projections nor empirical verification that it preserves the mutual-benefit premise without additional post-hoc fixes.
  Authors: We will expand the method section with a concise derivation showing that iNOD applies isotropic scaling within normalized device coordinates to eliminate the non-uniform stretching present in both standard NDC perspective and pure orthographic projections, while remaining compatible with the fixed-resolution latent grid of the DiT. We will also add empirical verification: side-by-side reconstruction and relighting error tables on synthetic data, plus qualitative examples demonstrating that the joint model benefits from iNOD without requiring post-hoc alignment steps. Revision: yes.
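Lacking the paper's notation, a hedged guess at the shape such a derivation could take (every symbol below is our assumption): an orthographic map with one scale s shared by all axes, contrasted with the nonlinear depth of perspective NDC.

```latex
% Hypothetical iNOD mapping; all symbols are assumptions, not the paper's
% notation. Subject bounding box: center (c_x, c_y, c_z), extents
% (\Delta_x, \Delta_y, \Delta_z).
\[
  (u, v, d) = \left( \frac{x - c_x}{s},\; \frac{y - c_y}{s},\;
                     \frac{z - c_z}{s} \right),
  \qquad
  s = \tfrac{1}{2}\max(\Delta_x, \Delta_y, \Delta_z),
\]
% One shared scale s keeps the map isotropic. Perspective NDC depth,
\[
  d_{\mathrm{ndc}} = \frac{f + n}{f - n} - \frac{2 f n}{(f - n)\, z},
\]
% by contrast compresses far depths nonlinearly (near/far planes n, f),
% which is exactly the distortion the orthographic form avoids.
```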
Circularity Check
No significant circularity; claims rest on empirical training rather than definitional reduction
Full rationale
The paper motivates its unified DiT by stating that relighting and 3D geometry are mutually beneficial tasks, then introduces the iNOD representation and mixed synthetic/auto-labeled training as technical contributions. Performance superiority over sequential pipelines is asserted as an outcome of joint training and evaluated empirically, without any equation or result that reduces by construction to the inputs (e.g., no fitted parameter renamed as prediction, no self-citation chain invoked as a uniqueness theorem, and no ansatz smuggled via prior work). The derivation chain is self-contained as a proposal of architecture plus data strategy whose validity is tested externally via experiments rather than forced analytically.
Axiom & Free-Parameter Ledger
axioms (1)
- [Domain Assumption] Relighting and 3D geometry estimation are mutually beneficial tasks whose joint solution avoids error accumulation.
invented entities (1)
- isotropic NDC-Orthographic Depth (iNOD): no independent evidence