DiFaReli++: Diffusion Face Relighting with Consistent Cast Shadows
Pith reviewed 2026-05-24 08:40 UTC · model grok-4.3
The pith
A conditional diffusion model relights single-view faces with consistent cast shadows by modulating DDIM steps with rendered shading and an inferred shadow map.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By conditioning a DDIM decoder on a disentangled light encoding together with a rendered shading reference and an inferred shadow map, the model can synthesize relit face images that preserve identity and geometry while adding realistic, temporally consistent cast shadows, all from a single network pass and without any ground-truth lighting supervision.
What carries the argument
Conditional DDIM whose spatial modulation is performed by a rendered shading reference combined with a shadow map inferred from the input geometry.
If this is right
- Relighting no longer requires light-stage data, relit pairs, or multi-view images.
- A single forward pass produces the relit image once pre-processing is done.
- Cast shadows remain consistent when the same face is shown under changing target lights.
- Performance exceeds the teacher model on all reported metrics.
Where Pith is reading between the lines
- The same conditioning trick could be tested on non-face objects once reliable shape estimators exist.
- If the shadow-map step generalizes, it may reduce the need for full global-illumination simulation in other diffusion relighting tasks.
- Single-pass operation opens the possibility of applying the model to short video clips without per-frame retraining.
Load-bearing premise
Off-the-shelf 3D shape and facial-identity estimators supply inputs accurate enough that the simple shadow-map modulation does not introduce large visible errors in the final output.
What would settle it
Run the method on Multi-PIE test sequences using the same off-the-shelf estimators; if the generated cast shadows fail to match ground-truth shadow boundaries or show temporal flicker, the claim is falsified.
Figures
read the original abstract
We introduce a novel approach to single-view face relighting in the wild, addressing challenges such as global illumination and cast shadows. A common scheme in recent methods involves intrinsically decomposing an input image into 3D shape, albedo, and lighting, then recomposing it with the target lighting. However, estimating these components is error-prone and requires many training examples with ground-truth lighting to generalize well. Our work bypasses the need for accurate intrinsic estimation and can be trained solely on 2D images without any light stage data, relit pairs, multi-view images, or lighting ground truth. Our key idea is to leverage a conditional diffusion implicit model (DDIM) for decoding a disentangled light encoding along with other encodings related to 3D shape and facial identity inferred from off-the-shelf estimators. We propose a novel conditioning technique that simplifies modeling the complex interaction between light and geometry. It uses a rendered shading reference along with a shadow map, inferred using a simple and effective technique, to spatially modulate the DDIM. Moreover, we propose a single-shot relighting framework that requires just one network pass, given pre-processed data, and even outperforms the teacher model across all metrics. Our method realistically relights in-the-wild images with temporally consistent cast shadows under varying lighting conditions. We achieve state-of-the-art performance on the standard benchmark Multi-PIE and rank highest in user studies. Please visit our page: https://diffusion-face-relighting-pp.github.io
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces DiFaReli++, a single-view face relighting method for in-the-wild images that uses a conditional DDIM decoder on disentangled light, 3D shape, and identity encodings obtained from off-the-shelf estimators. It proposes a conditioning technique that spatially modulates the DDIM via a rendered shading reference plus an inferred shadow map (obtained by a 'simple and effective technique') to model light-geometry interactions, including cast shadows. The method is trained only on 2D images without light-stage data, relit pairs, or lighting ground truth; it claims a single-shot inference pass, SOTA quantitative results on Multi-PIE, highest user-study rankings, and temporally consistent cast shadows under varying lighting.
Significance. If the conditioning technique and error propagation from off-the-shelf estimators can be shown to be robust, the approach would be significant for enabling realistic relighting with consistent shadows without requiring accurate intrinsic decomposition or specialized training data. The single-pass inference and avoidance of light-stage supervision are practical strengths.
major comments (3)
- [Abstract, §3] Abstract and §3 (Method): the central claim that the rendered shading + inferred shadow map 'simplifies modeling the complex interaction between light and geometry' and produces 'temporally consistent cast shadows' rests on unverified accuracy of the shadow map when derived from off-the-shelf 3D estimators; no quantitative propagation analysis, no ablation with ground-truth geometry, and no failure-case study on pose/expression variation (where shape errors are known to be large) are provided, leaving the link between 2D-only training and output consistency unsecured.
- [§4] §4 (Experiments): the assertion of 'state-of-the-art performance on the standard benchmark Multi-PIE' and 'outperforms the teacher model across all metrics' is stated without any reported quantitative tables, error bars, or per-metric comparisons in the abstract and is not accompanied by ablation details on the shadow-map component, which is load-bearing for the consistency claim.
- [§3.2] §3.2 (Conditioning technique): the shadow-map inference is described as 'simple and effective' yet no explicit formulation, pseudocode, or sensitivity analysis to input shape error (e.g., angular or depth deviation) is given, so it is impossible to assess whether the DDIM can correct misaligned shadows or merely propagates them.
minor comments (2)
- [Abstract] The abstract states results on Multi-PIE and user studies but does not cite the exact table or figure numbers where these appear; adding explicit cross-references would improve readability.
- [§3] Notation for the 'disentangled light encoding' and 'other encodings related to 3D shape and facial identity' should be defined with symbols or a diagram in §3 to avoid ambiguity when describing the DDIM conditioning.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, proposing revisions where appropriate to strengthen the paper while remaining faithful to the presented work.
read point-by-point responses
-
Referee: [Abstract, §3] Abstract and §3 (Method): the central claim that the rendered shading + inferred shadow map 'simplifies modeling the complex interaction between light and geometry' and produces 'temporally consistent cast shadows' rests on unverified accuracy of the shadow map when derived from off-the-shelf 3D estimators; no quantitative propagation analysis, no ablation with ground-truth geometry, and no failure-case study on pose/expression variation (where shape errors are known to be large) are provided, leaving the link between 2D-only training and output consistency unsecured.
Authors: We agree that additional analysis would strengthen the claims regarding robustness. The current manuscript does not include quantitative error propagation analysis or ablations with ground-truth geometry, as our approach is explicitly designed for 2D-only training without such supervision. In revision, we will add a failure-case study examining performance under pose and expression variations to better illustrate behavior with shape estimation errors. We maintain that the Multi-PIE results and user study provide supporting evidence for consistency, but acknowledge the value of the suggested additions. revision: partial
-
Referee: [§4] §4 (Experiments): the assertion of 'state-of-the-art performance on the standard benchmark Multi-PIE' and 'outperforms the teacher model across all metrics' is stated without any reported quantitative tables, error bars, or per-metric comparisons in the abstract and is not accompanied by ablation details on the shadow-map component, which is load-bearing for the consistency claim.
Authors: Quantitative tables with per-metric comparisons on Multi-PIE, including outperformance over the teacher model, are reported in Section 4 of the manuscript. Abstracts are summaries and do not typically contain full tables or error bars. To address the concern, we will add error bars to the existing tables and include a dedicated ablation study on the shadow-map component in the revised experiments section. revision: yes
-
Referee: [§3.2] §3.2 (Conditioning technique): the shadow-map inference is described as 'simple and effective' yet no explicit formulation, pseudocode, or sensitivity analysis to input shape error (e.g., angular or depth deviation) is given, so it is impossible to assess whether the DDIM can correct misaligned shadows or merely propagates them.
Authors: We will revise Section 3.2 to include the explicit formulation of the shadow-map inference technique, accompanying pseudocode, and a sensitivity analysis to input shape errors (such as angular or depth deviations) to the extent possible with available data. This will allow readers to better evaluate the conditioning mechanism. revision: yes
- Quantitative ablation studies using ground-truth geometry for error propagation analysis, as the method is trained solely on 2D images and does not have access to such ground-truth data.
Circularity Check
No circularity: method uses external estimators and standard DDIM conditioning without self-referential reductions
full rationale
The paper's approach relies on off-the-shelf 3D shape and identity estimators plus a simple shadow map inference to condition a standard DDIM, with training solely on 2D images. No equations, predictions, or derivations in the abstract or described framework reduce outputs to quantities defined by the method's own fitted parameters or self-citations. The conditioning technique is presented as a practical modulation step rather than a tautological construction, and claims rest on benchmark performance and user studies rather than internal self-definition. This is a standard empirical method paper with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Off-the-shelf 3D shape and identity estimators supply inputs accurate enough for the downstream diffusion conditioning to succeed.
- ad hoc to paper A simple shadow-map inference technique combined with rendered shading can spatially modulate the DDIM to model light-geometry interactions.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Our key idea is to leverage a conditional diffusion implicit model (DDIM) for decoding a disentangled light encoding along with other encodings related to 3D shape and facial identity inferred from off-the-shelf estimators. We propose a novel conditioning technique that simplifies modeling the complex interaction between light and geometry. It uses a rendered shading reference along with a shadow map, inferred using a simple and effective technique, to spatially modulate the DDIM.
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We achieve state-of-the-art performance on the standard benchmark Multi-PIE and rank highest in user studies.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Segdiff: Image segmentation with diffusion probabilistic models
Tomer Amit, Eliya Nachmani, Tal Shaharbany, and Lior Wolf. Segdiff: Image segmentation with diffusion probabilistic models. arXiv:2112.00390, 2021. 17
-
[2]
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, et al. ediffi: Text-to-image diffusion models with an ensemble of expert denoisers. arXiv:2211.01324, 2022. 17
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[3]
Fan Bao, Chongxuan Li, Jun Zhu, and Bo Zhang. Analytic-dpm: an an- alytic estimate of the optimal reverse variance in diffusion probabilistic models. arXiv:2201.06503, 2022. 4
-
[4]
Label-efficient semantic segmentation with diffu- sion models
Dmitry Baranchuk, Ivan Rubachev, Andrey V oynov, Valentin Khrulkov, and Artem Babenko. Label-efficient semantic segmentation with diffu- sion models. arXiv:2112.03126, 2021. 17
-
[5]
Shape, illumination, and reflectance from shading
Jonathan T Barron and Jitendra Malik. Shape, illumination, and reflectance from shading. IEEE transactions on pattern analysis and machine intelligence, 37(8):1670–1687, 2014. 1, 3
work page 2014
-
[6]
A morphable model for the synthesis of 3d faces
V olker Blanz and Thomas Vetter. A morphable model for the synthesis of 3d faces. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques , pages 187–194, 1999. 3, 17
work page 1999
-
[7]
Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas Guibas, Jonathan Trem- blay, Sameh Khamis, Tero Karras, and Gordon Wetzstein. Efficient geometry-aware 3d generative adversarial networks, 2022. 4
work page 2022
-
[8]
Denoising likelihood score matching for conditional score-based data generation
Chen-Hao Chao, Wei-Fang Sun, Bo-Wun Cheng, Yi-Chen Lo, Chia-Che Chang, Yu-Lun Liu, Yu-Lin Chang, Chia-Ping Chen, and Chun-Yi Lee. Denoising likelihood score matching for conditional score-based data generation. arXiv:2203.14206, 2022. 17
-
[9]
Diffusion models in vision: A survey
Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah. Diffusion models in vision: A survey. arXiv:2209.04747, 2022. 17 14
-
[10]
Arcface: Additive angular margin loss for deep face recognition
Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages 4690–4699, 2019. 2, 5, 17
work page 2019
-
[11]
Accurate 3d face reconstruction with weakly-supervised learning: From single image to image set
Yu Deng, Jiaolong Yang, Sicheng Xu, Dong Chen, Yunde Jia, and Xin Tong. Accurate 3d face reconstruction with weakly-supervised learning: From single image to image set. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops , pages 0–0, 2019. 17
work page 2019
-
[12]
Diffusion models beat gans on image synthesis
Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in Neural Information Processing Systems , 34:8780–8794, 2021. 6, 16, 17, 19
work page 2021
-
[13]
Sigmoid-weighted linear units for neural network function approximation in reinforcement learning
Stefan Elfwing, Eiji Uchibe, and Kenji Doya. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks , 107:3–11, 2018. 6
work page 2018
-
[14]
Qianli Feng, Viraj Shah, Raghudeep Gadde, Pietro Perona, and Aleix Martinez. Near perfect gan inversion. arXiv:2202.11833, 2022. 4
-
[15]
Yao Feng, Haiwen Feng, Michael J. Black, and Timo Bolkart. Learning an animatable detailed 3D face model from in-the-wild images. vol- ume 40, 2021. 2, 4, 5, 8, 18, 19
work page 2021
-
[16]
Learning an animatable detailed 3d face model from in-the-wild images
Yao Feng, Haiwen Feng, Michael J Black, and Timo Bolkart. Learning an animatable detailed 3d face model from in-the-wild images. ACM Transactions on Graphics (ToG) , 40(4):1–13, 2021. 17
work page 2021
-
[17]
Con- trollable light diffusion for portraits, 2023
David Futschik, Kelvin Ritland, James Vecore, Sean Fanello, Sergio Orts-Escolano, Brian Curless, Daniel S ´ykora, and Rohit Pandey. Con- trollable light diffusion for portraits, 2023. 3
work page 2023
-
[18]
Unsupervised training for 3d morphable model regression
Kyle Genova, Forrester Cole, Aaron Maschinot, Aaron Sarna, Daniel Vlasic, and William T Freeman. Unsupervised training for 3d morphable model regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 8377–8386, 2018. 17
work page 2018
-
[19]
Gen- erative adversarial networks
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Gen- erative adversarial networks. Communications of the ACM , 63(11):139– 144, 2020. 4
work page 2020
- [20]
-
[21]
Deep residual learning for image recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , 2016. 5
work page 2016
-
[22]
Diffrelight: Diffusion-based facial performance relighting
Mingming He, Pascal Clausen, Ahmet Levent Tas ¸el, Li Ma, Oliver Pilarski, Wenqi Xian, Laszlo Rikker, Xueming Yu, Ryan Burgert, Ning Yu, et al. Diffrelight: Diffusion-based facial performance relighting. In SIGGRAPH Asia 2024 Conference Papers , pages 1–12, 2024. 4
work page 2024
-
[23]
Denoising diffusion prob- abilistic models
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion prob- abilistic models. Advances in Neural Information Processing Systems , 33:6840–6851, 2020. 5, 6, 17
work page 2020
-
[24]
Cascaded diffusion models for high fidelity image generation
Jonathan Ho, Chitwan Saharia, William Chan, David J Fleet, Moham- mad Norouzi, and Tim Salimans. Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. , 23:47–1, 2022. 17
work page 2022
-
[25]
Classifier-Free Diffusion Guidance
Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv:2207.12598, 2022. 17
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[26]
Face relighting with geometrically consistent shadows
Andrew Hou, Michel Sarkis, Ning Bi, Yiying Tong, and Xiaoming Liu. Face relighting with geometrically consistent shadows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 4217–4226, 2022. 1, 3, 4, 7, 8, 9, 10, 11, 12, 16, 18
work page 2022
-
[27]
Towards high fidelity face relighting with realistic shadows
Andrew Hou, Ze Zhang, Michel Sarkis, Ning Bi, Yiying Tong, and Xiaoming Liu. Towards high fidelity face relighting with realistic shadows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 14719–14728, 2021. 1, 3, 4, 8, 9, 10, 11, 12, 18
work page 2021
-
[28]
3d face reconstruction with geometry details from a single image
Luo Jiang, Juyong Zhang, Bailin Deng, Hao Li, and Ligang Liu. 3d face reconstruction with geometry details from a single image. IEEE Transactions on Image Processing , 27(10):4756–4770, 2018. 17
work page 2018
-
[29]
A style-based generator architecture for generative adversarial networks
Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , 2019. 4, 9, 12, 16, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32
work page 2019
-
[30]
Zhanghan Ke, Chunyi Sun, Lei Zhu, Ke Xu, and Rynson W.H. Lau. Harmonizer: Learning to perform white-box image and video harmo- nization. In European Conference on Computer Vision , 2022. 4
work page 2022
-
[31]
Hoon Kim, Minje Jang, Wonjun Yoon, Jisoo Lee, Donghyun Na, and Sanghyun Woo. Switchlight: Co-design of physics-driven architecture and pre-training framework for human portrait relighting, 2024. 1, 3, 4, 9, 17, 18, 20
work page 2024
-
[32]
Illumination-invariant face recog- nition with deep relit face images
Ha A Le and Ioannis A Kakadiaris. Illumination-invariant face recog- nition with deep relit face images. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV) . IEEE, 2019. 1, 3
work page 2019
-
[33]
Tianye Li, Timo Bolkart, Michael. J. Black, Hao Li, and Javier Romero. Learning a model of facial shape and expression from 4D scans. ACM Transactions on Graphics, (Proc. SIGGRAPH Asia) , 2017. 5, 6, 17
work page 2017
-
[34]
A closed-form solution to photorealistic image stylization
Yijun Li, Ming-Yu Liu, Xueting Li, Ming-Hsuan Yang, and Jan Kautz. A closed-form solution to photorealistic image stylization. In Proceedings of the European Conference on Computer Vision (ECCV) , 2018. 4
work page 2018
-
[35]
Feature- preserving detailed 3d face reconstruction from a single image
Yue Li, Liqian Ma, Haoqiang Fan, and Kenny Mitchell. Feature- preserving detailed 3d face reconstruction from a single image. In Proceedings of the 15th ACM SIGGRAPH European Conference on Visual Media Production , pages 1–9, 2018. 17
work page 2018
-
[36]
Targeting Ultimate Accuracy: Face Recognition via Deep Embedding
Jingtuo Liu, Yafeng Deng, Tao Bai, Zhengping Wei, and Chang Huang. Targeting ultimate accuracy: Face recognition via deep embedding. arXiv:1506.07310, 2015. 17
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[37]
Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems, 35:5775–5787, 2022. 4, 10
work page 2022
-
[38]
DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver++: Fast solver for guided sampling of diffusion probabilistic models. arXiv:2211.01095, 2022. 4, 10
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[39]
Fujun Luan, Sylvain Paris, Eli Shechtman, and Kavita Bala. Deep photo style transfer. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 4990–4998, 2017. 4
work page 2017
-
[40]
Photoapp: Photorealistic appearance editing of head portraits
BR Mallikarjun, Ayush Tewari, Abdallah Dib, Tim Weyrich, Bernd Bickel, Hans Peter Seidel, Hanspeter Pfister, Wojciech Matusik, Louis Chevallier, Mohamed A Elgharib, et al. Photoapp: Photorealistic appearance editing of head portraits. ACM Transactions on Graphics ,
-
[41]
Face-specific data augmentation for unconstrained face recog- nition
Iacopo Masi, Anh Tu ˆan Tr ˆa`n, Tal Hassner, Gozde Sahin, and G ´erard Medioni. Face-specific data augmentation for unconstrained face recog- nition. International Journal of Computer Vision , 127, 2019. 17
work page 2019
-
[42]
Holo- relighting: Controllable volumetric portrait relighting from a single image
Yiqun Mei, Yu Zeng, He Zhang, Zhixin Shu, Xuaner Zhang, Sai Bi, Jianming Zhang, HyunJoon Jung, and Vishal M Patel. Holo- relighting: Controllable volumetric portrait relighting from a single image. arXiv:2403.09632, 2024. 1, 3, 4, 9, 17, 18, 19
-
[43]
Sdedit: Guided image synthesis and editing with stochastic differential equations
Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. Sdedit: Guided image synthesis and editing with stochastic differential equations. In International Conference on Learning Representations , 2021. 17
work page 2021
-
[44]
Learning physics-guided face relighting under directional light
Thomas Nestmeyer, Jean-Franc ¸ois Lalonde, Iain Matthews, and Andreas Lehrmann. Learning physics-guided face relighting under directional light. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 5124–5133, 2020. 1, 3, 4, 8, 9, 10, 18
work page 2020
-
[45]
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv:2112.10741, 2021. 6, 17
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[46]
Vaes meet diffusion models: Efficient and high-fidelity generation
Kushagra Pandey, Avideep Mukherjee, Piyush Rai, and Abhishek Ku- mar. Vaes meet diffusion models: Efficient and high-fidelity generation. In NeurIPS 2021 Workshop on Deep Generative Models and Down- stream Applications, 2021. 17
work page 2021
-
[47]
Total relighting: learning to relight portraits for background replacement
Rohit Pandey, Sergio Orts Escolano, Chloe Legendre, Christian Haene, Sofien Bouaziz, Christoph Rhemann, Paul Debevec, and Sean Fanello. Total relighting: learning to relight portraits for background replacement. ACM Transactions on Graphics (TOG) , 40(4):1–21, 2021. 1, 3, 4, 9, 10, 11, 17, 18
work page 2021
-
[48]
Relightify: Relightable 3d faces from a single image via diffusion models
Foivos Paraperas Papantoniou, Alexandros Lattas, Stylianos Moschoglou, and Stefanos Zafeiriou. Relightify: Relightable 3d faces from a single image via diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision , 2023. 3
work page 2023
-
[49]
Omkar M Parkhi, Andrea Vedaldi, and Andrew Zisserman. Deep face recognition. 2015. 17
work page 2015
-
[50]
DiFaReli++: Diffusion Face Relighting with Consistent Cast Shadows
Puntawat Ponglertnapakorn, Nontawat Tritrong, and Supasorn Suwa- janakorn. Difareli: Diffusion face relighting. arXiv:2304.09479, 2023. 2, 3, 7, 8, 9, 11, 12, 18
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[51]
DreamFusion: Text-to-3D using 2D Diffusion
Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. Dream- fusion: Text-to-3d using 2d diffusion. arXiv:2209.14988, 2022. 17
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[52]
Diffusion autoencoders: Toward a meaningful and decodable representation
Konpat Preechakul, Nattanat Chatthee, Suttisak Wizadwongsa, and Su- pasorn Suwajanakorn. Diffusion autoencoders: Toward a meaningful and decodable representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10619–10629, 2022. 2, 5, 6, 7, 17, 19
work page 2022
-
[53]
Learning transferable visual models from natural language supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021. 17
work page 2021
-
[54]
Explor- ing the limits of transfer learning with a unified text-to-text transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Explor- ing the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research , 21(1):5485–5551, 2020. 17 15
work page 2020
-
[55]
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. arXiv:2204.06125, 2022. 17
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[56]
Facelit: Neural 3d relightable faces, 2023
Anurag Ranjan, Kwang Moo Yi, Jen-Hao Rick Chang, and Oncel Tuzel. Facelit: Neural 3d relightable faces, 2023. 4
work page 2023
-
[57]
Relightful harmonization: Lighting-aware portrait background replacement, 2023
Mengwei Ren, Wei Xiong, Jae Shin Yoon, Zhixin Shu, Jianming Zhang, HyunJoon Jung, Guido Gerig, and He Zhang. Relightful harmonization: Lighting-aware portrait background replacement, 2023. 3, 4, 9
work page 2023
-
[58]
Encoding in style: a stylegan encoder for image-to-image translation
Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, and Daniel Cohen-Or. Encoding in style: a stylegan encoder for image-to-image translation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , 2021. 4
work page 2021
-
[59]
Pivotal tuning for latent-based editing of real images
Daniel Roich, Ron Mokady, Amit H Bermano, and Daniel Cohen-Or. Pivotal tuning for latent-based editing of real images. ACM Transactions on Graphics (TOG) , 42(1):1–13, 2022. 4
work page 2022
-
[60]
High-resolution image synthesis with latent diffusion models, 2021
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj¨orn Ommer. High-resolution image synthesis with latent diffusion models, 2021. 6
work page 2021
-
[61]
High-resolution image synthesis with latent diffusion models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj¨orn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 10684–10695, 2022. 17
work page 2022
-
[62]
Dreambooth: Fine tuning text-to-image diffusion models for subject- driven generation,
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. arXiv:2208.12242, 2022. 17
-
[63]
Palette: Image-to-image diffusion models
Chitwan Saharia, William Chan, Huiwen Chang, Chris Lee, Jonathan Ho, Tim Salimans, David Fleet, and Mohammad Norouzi. Palette: Image-to-image diffusion models. In ACM SIGGRAPH 2022 Conference Proceedings, pages 1–10, 2022. 17
work page 2022
-
[64]
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S Sara Mahdavi, Rapha Gontijo Lopes, et al. Photorealistic text-to-image diffusion models with deep language understanding. arXiv:2205.11487,
work page internal anchor Pith review Pith/arXiv arXiv
-
[65]
Relightable gaussian codec avatars
Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, and Giljoo Nam. Relightable gaussian codec avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 130–141, 2024. 4
work page 2024
-
[66]
Adversarial diffusion distillation
Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial diffusion distillation. arXiv:2311.17042, 2023. 3, 4
-
[67]
Facenet: A unified embedding for face recognition and clustering
Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 815–823, 2015. 17
work page 2015
-
[68]
Sfsnet: Learning shape, reflectance and illuminance of facesin the wild
Soumyadip Sengupta, Angjoo Kanazawa, Carlos D Castillo, and David W Jacobs. Sfsnet: Learning shape, reflectance and illuminance of facesin the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition , 2018. 1, 3, 10, 17, 18
work page 2018
-
[69]
Style transfer for headshot portraits
YiChang Shih, Sylvain Paris, Connelly Barnes, William T Freeman, and Fr´edo Durand. Style transfer for headshot portraits. 2014. 4
work page 2014
-
[70]
Portrait lighting transfer using a mass transport approach
Zhixin Shu, Sunil Hadap, Eli Shechtman, Kalyan Sunkavalli, Sylvain Paris, and Dimitris Samaras. Portrait lighting transfer using a mass transport approach. ACM Transactions on Graphics (TOG) , 2017. 4, 17
work page 2017
-
[71]
Neural face editing with intrinsic image disentangling
Zhixin Shu, Ersin Yumer, Sunil Hadap, Kalyan Sunkavalli, Eli Shecht- man, and Dimitris Samaras. Neural face editing with intrinsic image disentangling. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 5541–5550, 2017. 1, 3
work page 2017
-
[72]
D2c: Diffusion-decoding models for few-shot conditional generation
Abhishek Sinha, Jiaming Song, Chenlin Meng, and Stefano Ermon. D2c: Diffusion-decoding models for few-shot conditional generation. Advances in Neural Information Processing Systems , 34, 2021. 17
work page 2021
-
[73]
Deep unsupervised learning using nonequilibrium thermody- namics
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermody- namics. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning , Proceedings of Machine Learning Research, pages 2256–2265. PMLR, 2015. 5, 17
work page 2015
-
[74]
Denoising diffusion implicit models
Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representa- tions, 2021. 2, 5, 6
work page 2021
-
[75]
Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consis- tency models. arXiv:2303.01469, 2023. 3, 4
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[76]
Generative modeling by estimating gradients of the data distribution
Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch´e-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems , volume 32. Curran Asso- ciates, Inc., 2019. 5, 17
work page 2019
-
[77]
Single image portrait relighting
Tiancheng Sun, Jonathan T Barron, Yun-Ta Tsai, Zexiang Xu, Xueming Yu, Graham Fyffe, Christoph Rhemann, Jay Busch, Paul E Debevec, and Ravi Ramamoorthi. Single image portrait relighting. ACM Trans. Graph., 38(4):79–1, 2019. 3, 10, 18
work page 2019
-
[78]
Deepface: Closing the gap to human-level performance in face veri- fication
Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, and Lior Wolf. Deepface: Closing the gap to human-level performance in face veri- fication. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1701–1708, 2014. 17
work page 2014
-
[79]
Pie: Portrait image embedding for semantic control
Ayush Tewari, Mohamed Elgharib, Florian Bernard, Hans-Peter Seidel, Patrick P ´erez, Michael Zollh ¨ofer, and Christian Theobalt. Pie: Portrait image embedding for semantic control. ACM Transactions on Graphics (TOG), 39(6):1–14, 2020. 4
work page 2020
-
[80]
Stylerig: Rigging stylegan for 3d control over portrait images
Ayush Tewari, Mohamed Elgharib, Gaurav Bharaj, Florian Bernard, Hans-Peter Seidel, Patrick P ´erez, Michael Zollhofer, and Christian Theobalt. Stylerig: Rigging stylegan for 3d control over portrait images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6142–6151, 2020. 4
work page 2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.